In this video, I will show you a pair of similar architectures that also work well, but there are many variations of LSTMs and we encourage you to study them further. Here's a simple architecture that also works well. It's called the gated recurrent unit, or GRU for short. It combines the forget and the learn gate into an update gate and then runs this through a combine gate. It keeps only one working memory instead of a pair of long- and short-term memories, but it actually seems to work very well in practice too. I won't go much into the details here, though there is a small code sketch of one GRU step below, and in the instructor comments I'll recommend some very good references to learn more about gated recurrent units.

Here's another observation. Let's remember the forget gate. The forget factor f_t was calculated using as input a combination of the short-term memory and the event. But what about the long-term memory? It seems like we left it out of the decision. Why does the long-term memory not have a say in which things get remembered or not? Well, let's fix that. Let's also connect the long-term memory into the neural network that calculates the forget factor. Mathematically, this just means the input matrix is larger, since we're also concatenating it with the long-term memory matrix. This is called a peephole connection, since now the long-term memory has more access to the decisions made inside the LSTM. We can do this for every one of the forget-type nodes, and this is what we get: an LSTM with peephole connections.
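
To make the GRU idea concrete, here is a minimal NumPy sketch of a single GRU step. The weight names (W_z, W_r, W_h), the omission of bias terms, and the tiny usage loop are my own illustrative simplifications, not notation from the video; the gate equations follow the standard GRU formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W_z, W_r, W_h):
    """One GRU step: a single working memory h instead of separate
    long- and short-term memories. Each weight matrix acts on the
    concatenation of the previous memory and the current event.
    Bias terms are omitted for brevity."""
    concat = np.concatenate([h_prev, x_t])
    z = sigmoid(W_z @ concat)                                    # update gate (merges forget + learn)
    r = sigmoid(W_r @ concat)                                    # reset gate
    h_cand = np.tanh(W_h @ np.concatenate([r * h_prev, x_t]))   # candidate memory
    h_new = (1 - z) * h_prev + z * h_cand                        # combine old and candidate memory
    return h_new

# Tiny usage example with random weights (hidden size 3, input size 2).
rng = np.random.default_rng(0)
h = np.zeros(3)
W_z, W_r, W_h = (rng.normal(size=(3, 5)) for _ in range(3))
for x in [np.array([1.0, 0.0]), np.array([0.0, 1.0])]:
    h = gru_step(x, h, W_z, W_r, W_h)
```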
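
And here is a correspondingly small sketch of the peephole idea for the forget factor: the long-term memory is simply concatenated with the short-term memory and the event before the gate's weight matrix is applied, so that matrix needs more columns. The function name and the omission of a bias term are again my own simplifications.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forget_factor_peephole(x_t, h_prev, c_prev, W_f):
    """Forget factor with a peephole connection: the long-term memory
    c_prev now joins the short-term memory h_prev and the event x_t
    as input, so it gets a say in what is remembered or forgotten."""
    concat = np.concatenate([c_prev, h_prev, x_t])   # larger input: long-term memory included
    return sigmoid(W_f @ concat)                     # forget factor f_t
```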