13 – Sequence-Batching

One of the most difficult parts of building networks for me is getting the batches right. It’s more of a programming challenge than anything deep learning specific. So here I’m going to walk you through how batching works for RNNs.

With RNNs we’re training on sequences of data like text, stock values, audio, etc. By taking a sequence and splitting it into multiple shorter sequences, we can take advantage of matrix operations to make training more efficient. In effect, the RNN is training on multiple sequences in parallel.

Let’s look at a simple example: a sequence of numbers from 1 to 12. We could pass these into an RNN as one sequence. Or, better, we could split it in half and pass in two sequences. The batch size corresponds to the number of sequences we’re using, so here we’d say the batch size is 2.

Along with the batch size we also choose the length of the sequences we feed to the network. For example, let’s consider using a sequence length of 3. Then the first batch of data we pass into the network is the first 3 values in each mini-sequence. The next batch contains the next three values, and so on until we run out of data. We can retain the hidden state from one batch and use it at the start of the next batch. This way the sequence information is transferred across batches for each mini-sequence.

Next up you’ll see how to actually build a recurrent network. Cheers.
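The batching scheme described above can be sketched in a few lines of Python. This is a minimal illustration (the generator name `get_batches` and use of NumPy are my own choices, not anything fixed by the lesson): the flat sequence is reshaped into `batch_size` rows, and then consecutive windows of `seq_len` steps are yielded across all rows at once.

```python
import numpy as np

def get_batches(data, batch_size, seq_len):
    """Split a flat sequence into batch_size mini-sequences (rows),
    then yield consecutive windows of seq_len steps across all rows."""
    # Keep only as many values as divide evenly into batch_size rows
    total = (len(data) // batch_size) * batch_size
    arr = np.array(list(data)[:total]).reshape(batch_size, -1)
    # Each yielded batch has shape (batch_size, seq_len)
    for start in range(0, arr.shape[1], seq_len):
        yield arr[:, start:start + seq_len]

# The example from the text: numbers 1 to 12, batch size 2, sequence length 3
batches = list(get_batches(range(1, 13), batch_size=2, seq_len=3))
# First batch:  [[1, 2, 3], [7, 8, 9]]
# Second batch: [[4, 5, 6], [10, 11, 12]]
```

Note that row 0 carries values 1–6 and row 1 carries 7–12, so when the hidden state is carried over between batches, each row continues its own mini-sequence from where the previous batch left off.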
