The unfolding-in-time scheme can be confusing, so let's go back for a bit, look at it closely, and see what's actually going on there. First, we take the Elman network and tilt it by 90 degrees counter-clockwise, since in RNNs we usually display the flow of information from the bottom to the top. For a single hidden layer, without stacking further, this is what the unfolded model looks like.

At any given time t, we have an input vector X of t, connected to the state by the weight matrix WX, and the previous state vector S of t minus one, connected to the state by the weight matrix WS. Together, X of t and S of t minus one produce the new state vector S of t, and the weight matrix WY in turn produces the output vector Y of t. Now, at time t plus one, the system again receives two inputs: the vector X of t plus one, and the previous state vector S of t, which was our hidden-layer activation at the previous time step. The weights, however, remain the same, because it is the same system, only at a different time.

To avoid messy sketches, we can use the unfolded scheme, which is visibly cleaner, with fewer connecting lines. To represent the system at time t plus one and beyond, we simply chain the copies together this way. Adding the weights to the sketch gives us the complete unfolded system, where each arrow stands for an entire vector or weight matrix rather than a single variable. Understanding this principle, and realizing that we can stack an arbitrary number of recurrent layers, makes it easy to sketch an RNN with any number of layers.
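To make the roles of WX, WS, and WY concrete, here is a minimal Python sketch of one unfolded pass over a short sequence. The dimensions, the tanh activation, and all variable names are illustrative assumptions, not from the original; the point it demonstrates is that the same three weight matrices are reused at every time step, while only the input and the carried-over state change.

```python
import numpy as np

# Minimal sketch of an unfolded Elman RNN (assumed dimensions and activation):
#   s_t = tanh(W_x @ x_t + W_s @ s_{t-1})   -- new state from input and previous state
#   y_t = W_y @ s_t                         -- output read out from the new state

rng = np.random.default_rng(0)

input_dim, state_dim, output_dim = 4, 3, 2       # hypothetical sizes
W_x = rng.normal(size=(state_dim, input_dim))    # input-to-state weights (WX)
W_s = rng.normal(size=(state_dim, state_dim))    # state-to-state recurrent weights (WS)
W_y = rng.normal(size=(output_dim, state_dim))   # state-to-output weights (WY)

T = 5                                  # number of unfolded time steps
xs = rng.normal(size=(T, input_dim))   # sequence of input vectors x_t
s = np.zeros(state_dim)                # initial state s_0

for t in range(T):
    # The same three weight matrices appear at every step of the unfolded
    # sketch; only x_t and the previous state s differ from step to step.
    s = np.tanh(W_x @ xs[t] + W_s @ s)  # s_t from x_t and s_{t-1}
    y = W_y @ s                         # y_t from s_t
    print(f"t={t}: y={y}")
```

Each arrow in the unfolded sketch corresponds to one of these matrix-vector products, which is why a single arrow can hide many variables at once.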