Introduction

In 1997, Sepp Hochreiter and Jürgen Schmidhuber introduced Long Short-Term Memory (LSTM), a special type of Recurrent Neural Network (RNN). Their motivation was to overcome the problems traditional RNNs face when dealing with long sequences, and that is exactly what LSTMs are designed to do. The special architecture of LSTMs gives them a kind of memory, which allows them to handle sequences containing long-term dependencies very well. The concept of LSTM was a groundbreaking achievement in the field of Deep Learning and has become a fundamental part of Natural Language Processing (NLP) applications. In this tutorial, we take a closer look at LSTMs and explore how exactly they work. In particular, we uncover the power of LSTMs and their unique architecture.

Challenges of RNNs

If you are not yet familiar with RNNs, check out our post about how RNNs work:

Deep Learning - How Recurrent Neural Networks (RNNs) work

Traditional RNNs are well suited for tasks where short-term dependencies need to be taken into account. However, they struggle when processing long sequences that contain long-term dependencies, mainly because the gradients propagated back through many time steps tend to vanish (or explode), so the influence of early inputs on learning fades away.
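
To make this concrete, here is a minimal sketch (not from the original post) that illustrates the vanishing-gradient effect in a vanilla RNN using PyTorch. The sequence length, layer sizes, and random data are arbitrary choices for illustration only; the point is simply that the gradient reaching the earliest time step is typically many orders of magnitude smaller than the gradient at the latest time step.

```python
import torch
import torch.nn as nn

# Illustrative sketch: a vanilla RNN unrolled over a long random sequence.
# We backpropagate from the output at the final time step and compare the
# gradient magnitude at the first vs. the last input time step.
torch.manual_seed(0)

seq_len, batch, features, hidden = 200, 1, 8, 32  # arbitrary example sizes
rnn = nn.RNN(input_size=features, hidden_size=hidden)

x = torch.randn(seq_len, batch, features, requires_grad=True)
output, _ = rnn(x)

# Treat the last time step's output as the "prediction" and backpropagate.
loss = output[-1].sum()
loss.backward()

# The gradient at t=0 is usually far smaller than at the final time step,
# which is the long-term dependency problem that LSTMs were designed to fix.
print("grad norm at first time step:", x.grad[0].norm().item())
print("grad norm at last time step: ", x.grad[-1].norm().item())
```

Running this sketch typically shows the gradient norm at the first time step shrinking toward zero, which is why a plain RNN has difficulty learning dependencies that span many time steps.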
