Understanding the maths behind Long Short-Term Memory (LSTM) Networks: What happens inside an LSTM cell?

Introduction

Traditional RNNs, limited by their simple structure, struggle to retain information over long sequences. During training, the gradients propagated back through many time steps shrink toward zero, which is the infamous vanishing gradient problem. Long Short-Term Memory (LSTM) networks were designed to mitigate this problem and can capture and preserve long-term dependencies in sequential data. But how is an LSTM able to do this? What happens inside an LSTM cell? In this tutorial, we take a journey through the inside of an LSTM cell and investigate what exactly happens there. We take a step-by-step look at the math underlying an LSTM cell and work through the equations that control its gates, memory cell and output.

Basics

In an earlier post, we explained how an LSTM cell is structured, looking at its architecture and its individual gates. Be sure to read that post first if you are not yet familiar with the basics of an LSTM cell.

Mathematical View

Now let's go further and analyze, step by step, what goes on inside an LSTM cell. We will explain what happens at each gate and take a detailed look at the mathematical functions involved.

The following illustration shows the inside of an LSTM cell.

Forget Gate, Input Gate and Output Gate

Let's examine where to find the Forget Gate, the Input Gate and the Output Gate in the LSTM cell:

To fully understand the process, we will go through the illustration step by step and explain what happens at each point from a mathematical perspective.
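
As a rough preview of where the walkthrough is heading, one forward step of an LSTM cell can be sketched in a few lines of NumPy. This is a minimal sketch of the commonly used LSTM formulation and may not match the notation of the illustration exactly. The function name lstm_step, the dictionary keys "f", "i", "c" and "o", and the choice to concatenate the previous hidden state with the current input are assumptions made for this example.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, squashes values into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One forward step of an LSTM cell (illustrative sketch).

    x_t:    input at the current time step, shape (n_in,)
    h_prev: previous hidden state, shape (n_hidden,)
    c_prev: previous cell state, shape (n_hidden,)
    W, b:   dicts with keys "f", "i", "c", "o" (hypothetical naming);
            each W[k] has shape (n_hidden, n_hidden + n_in),
            each b[k] has shape (n_hidden,)
    """
    # Previous hidden state and current input feed every gate.
    z = np.concatenate([h_prev, x_t])

    f_t = sigmoid(W["f"] @ z + b["f"])      # forget gate: how much old memory to keep
    i_t = sigmoid(W["i"] @ z + b["i"])      # input gate: how much new information to write
    c_tilde = np.tanh(W["c"] @ z + b["c"])  # candidate values for the cell state
    o_t = sigmoid(W["o"] @ z + b["o"])      # output gate: how much memory to expose

    c_t = f_t * c_prev + i_t * c_tilde      # updated cell state
    h_t = o_t * np.tanh(c_t)                # updated hidden state (the cell's output)
    return h_t, c_t


# Usage: run a short random sequence through the cell.
rng = np.random.default_rng(0)
n_hidden, n_in = 4, 3
W = {k: rng.normal(scale=0.1, size=(n_hidden, n_hidden + n_in)) for k in "fico"}
b = {k: np.zeros(n_hidden) for k in "fico"}

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x_t in rng.normal(size=(5, n_in)):
    h, c = lstm_step(x_t, h, c, W, b)
```

The three sigmoid gates produce values between 0 and 1 and act as element-wise switches, while the two tanh calls keep the candidate values and the output between -1 and 1. The sections below look at each of these lines in turn.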