10 – Sequence Batching

One of the most difficult parts of building networks, for me, is getting the batches right. It's more of a programming challenge than anything specific to deep learning. So here, I'm going to walk you through how batching works for RNNs.

With RNNs, we're training on sequences of data like text, stock values, audio, etc. By taking a sequence and splitting it into multiple shorter sequences, we can take advantage of matrix operations to make training more efficient; in effect, the RNN trains on multiple sequences in parallel.

Let's look at a simple example: a sequence of the numbers 1 to 12. We could pass these into an RNN as one sequence. But better, we could split it in half and pass in two sequences. The batch size corresponds to the number of sequences we're using, so here the batch size is two.

Along with the batch size, we also choose the length of the sequences we feed to the network. For example, let's consider a sequence length of three. Then the first batch of data we pass into the network contains the first three values of each mini-sequence. The next batch contains the next three values, and so on until we run out of data.

We can retain the hidden state from one batch and use it at the start of the next batch. This way, the sequence information is carried across batches for each mini-sequence.

Next up, you'll see how to actually build a recurrent network. Cheers!
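The batching scheme described above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the lesson: `get_batches` is a hypothetical helper name, and it simply reshapes the data into `batch_size` mini-sequences and then steps through them `seq_len` values at a time.

```python
import numpy as np

def get_batches(arr, batch_size, seq_len):
    """Yield batches of shape (batch_size, seq_len) from a 1-D sequence.

    Hypothetical helper for illustration: splits the sequence into
    `batch_size` mini-sequences (one per row), then steps through the
    rows `seq_len` values at a time.
    """
    # Keep only enough values to fill every row completely
    n_per_row = len(arr) // batch_size
    arr = np.asarray(arr[: batch_size * n_per_row])
    # Each row is one mini-sequence processed in parallel
    arr = arr.reshape(batch_size, -1)
    # Slide a window of seq_len columns across the rows
    for start in range(0, arr.shape[1], seq_len):
        yield arr[:, start : start + seq_len]

# The example from above: the numbers 1 to 12, batch size 2, sequence length 3
data = list(range(1, 13))
batches = list(get_batches(data, batch_size=2, seq_len=3))
# First batch:  [[1, 2, 3], [7, 8, 9]]
# Second batch: [[4, 5, 6], [10, 11, 12]]
```

Note that the first batch pairs `[1, 2, 3]` with `[7, 8, 9]`, because the second half of the original sequence becomes the second mini-sequence; the hidden state for each row then carries over to the next batch.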
