So here we go. As we’ve seen before, here is the architecture for an LSTM with the four gates. There is the forget gate, which takes the long-term memory and forgets part of it. The learn gate puts the short-term memory together with the event as the information we’ve recently learned. The remember gate joins the long-term memory that we haven’t yet forgotten with the new information we’ve learned in order to update our long-term memory and output it. And finally, the use gate also takes the information we just learned together with the long-term memory we haven’t yet forgotten, and it uses them to make a prediction and update the short-term memory. So this is how it looks all put together. It’s not so complicated after all, is it?

Now you may be thinking, wait a minute, this looks too arbitrary. Why use tanh sometimes and sigmoid other times? Why multiply sometimes, add other times, and other times apply a more complicated linear function? You can probably think of different architectures that make more sense or that are simpler, and you are absolutely right. This is an arbitrary construction. And as with many things in machine learning, the reason it is like this is simply that it works.

In the following section, we’ll see some other architectures, some simpler and some more complex, that also do the job. But you’re welcome to look for others and experiment. This is an area very much under development, so if you come up with a different architecture and it works, that is wonderful.
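To make the four gates concrete, here is a rough sketch of a single LSTM step in NumPy. It uses the standard formulation (forget, input, candidate, and output gates), with comments mapping each line to the gate names above. The function name, variable names, and shapes are illustrative assumptions, not part of any particular library:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step: x is the event (input), h_prev the short-term
    memory, c_prev the long-term memory. W maps the concatenated
    [h_prev, x] to the four gate pre-activations; b is the bias."""
    z = W @ np.concatenate([h_prev, x]) + b
    n = h_prev.size
    f, i, g, o = z[:n], z[n:2*n], z[2*n:3*n], z[3*n:]

    kept = sigmoid(f) * c_prev           # forget gate: drop part of the long-term memory
    learned = sigmoid(i) * np.tanh(g)    # learn gate: new information from event + short-term memory
    c_new = kept + learned               # remember gate: updated long-term memory
    h_new = sigmoid(o) * np.tanh(c_new)  # use gate: prediction / new short-term memory
    return h_new, c_new
```

Note how the sigmoids always act as soft on/off switches (values between 0 and 1 that scale what passes through), while tanh produces the actual candidate values; that is the pattern behind the seemingly arbitrary mix of activations.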