After the first wave of artificial neural networks in the mid-80s, it became clear that feedforward networks are limited, since they are unable to capture temporal dependencies, which, as we said before, are dependencies that change over time. Modeling temporal data is critical in most real-world applications, since natural signals like speech and video have time-varying properties and are characterized by dependencies across time. By the way, biological neural networks have recurrent connections, so applying recurrence to artificial feedforward neural networks made natural sense.

The first attempt to add memory to neural networks was the Time Delay Neural Network, or TDNN for short. In TDNNs, inputs from past timesteps were introduced to the network input, effectively widening what the network sees at each step. This had the clear advantage of allowing the network to look beyond the current timestep, but it also introduced a clear disadvantage: the temporal dependencies the network could capture were limited to the chosen time window. Simple RNNs, also known as Elman networks and Jordan networks, were next to follow. We will talk about all of those later in the lesson.

It was recognized in the early 90s that all of these networks suffer from what we call the vanishing gradient problem, in which the contribution of information decays geometrically over time. So, capturing relationships that spanned more than eight or ten steps back was practically impossible. Despite the elegance of these networks, they all had this key flaw. We will discuss the vanishing gradient problem in detail later. You can also find more information about this topic right after this video.

In the mid-90s, Long Short-Term Memory cells, or LSTMs for short, were invented to address this very problem. The key novelty in LSTMs was the idea that some signals, what we call state variables, can be kept fixed by using gates, and reintroduced or not at an appropriate time in the future. In this way, arbitrary time intervals can be represented, and long-term temporal dependencies can be captured. Don't be alarmed by this sketch. We will get into all of the LSTM details later in this lesson. Variations on LSTMs, such as Gated Recurrent Units, or GRUs for short, further refined this theme, and nowadays represent another mainstream approach to realizing RNNs.
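To make the "decays geometrically" point a bit more concrete before we cover it properly, here is a minimal toy sketch. It is not from the lesson: it assumes a scalar recurrent weight of 0.5 purely for illustration, and simply shows how a gradient contribution shrinks as it is multiplied by that weight once per timestep when backpropagating through time.

```python
# Toy illustration (assumed values, not the lesson's material):
# in a simple RNN, a gradient contribution from t steps back is scaled
# by the recurrent factor once per step. With a factor below 1, the
# contribution vanishes geometrically.

w_rec = 0.5              # assumed scalar recurrent weight
contribution = 1.0       # gradient contribution at the current step

for steps_back in range(1, 11):
    contribution *= w_rec    # one multiplication per unrolled timestep
    print(f"{steps_back:2d} steps back: contribution ~ {contribution:.6f}")

# After 10 steps the contribution is about 0.001, which is why
# dependencies spanning more than eight or ten timesteps were
# practically impossible for these early networks to learn.
```

The same arithmetic run in reverse (a factor above 1) gives exploding gradients; LSTM gates are one way to keep the relevant signal from being repeatedly squashed or blown up, as we will see later in the lesson.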