3 – LSTM Architecture

So, to study the architecture of an LSTM, let's quickly recall the architecture of an RNN. Basically, we take our event E_t and our memory M_{t-1}, coming from the previous point in time, and apply a simple tanh or sigmoid activation function to obtain the output, which is also our new memory M_t. To be more specific, we join (concatenate) these two vectors, multiply the result by a matrix W, add a bias b, and then squish this with the tanh function: M_t = tanh(W[E_t, M_{t-1}] + b). That output M_t is both the prediction and the memory we carry to the next node. The LSTM architecture is very similar, except that it has a lot more nodes inside, and it has two inputs and two outputs, since it keeps track of both the long-term and the short-term memory. And as I said, the short-term memory is, again, the output or prediction. Don't get scared. These are actually not as complicated as they look. We'll break them down in the next few videos.
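The RNN step recalled above can be sketched in a few lines of Python. This is a minimal illustration under some assumptions of my own: it uses plain Python lists instead of a tensor library, concatenates the vectors in the order [E_t, M_{t-1}], uses tanh (the transcript mentions tanh or sigmoid), and the function name `rnn_cell` and the toy weights are hypothetical, chosen only for illustration.

```python
import math

def rnn_cell(event, memory, W, b):
    """One step of a basic RNN cell: M_t = tanh(W [E_t, M_{t-1}] + b).

    event:  list of floats, the current input E_t
    memory: list of floats, the previous memory M_{t-1}
    W:      weight matrix as a list of rows; each row has
            len(event) + len(memory) entries (hypothetical toy layout)
    b:      bias vector, one entry per row of W
    Returns the new memory M_t, which is also the cell's output (prediction).
    """
    # Join (concatenate) the event and the previous memory into one vector.
    joined = event + memory
    if any(len(row) != len(joined) for row in W) or len(W) != len(b):
        raise ValueError("W and b must match the joined vector and each other")
    new_memory = []
    for row, bias in zip(W, b):
        # Linear step: one row of W dotted with the joined vector, plus the bias.
        z = sum(w * x for w, x in zip(row, joined)) + bias
        # Squish with tanh so every entry lies strictly between -1 and 1.
        new_memory.append(math.tanh(z))
    return new_memory


# Usage: feed a short sequence of one-number events through the cell,
# carrying the memory from each step to the next (toy, hypothetical weights).
W = [[0.5, -0.3, 0.8],   # 2 memory units; each row sees 1 event value + 2 memory values
     [0.1, 0.2, -0.4]]
b = [0.0, 0.1]
memory = [0.0, 0.0]      # M_0: start with an empty memory
for e in (1.0, 0.5, -1.0):
    memory = rnn_cell([e], memory, W, b)
    print(memory)        # the output at this step is the new memory
```

Note that the single returned vector plays both roles, prediction and memory. An LSTM cell differs precisely here: it takes and returns two vectors per step, the long-term memory and the short-term memory.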
