Table of contents
1. Introduction
2. Applications of Sequence Models
  2.1. Speech Recognition
  2.2. Sentiment Classification
  2.3. Video Activity Recognition
3. Recurrent Neural Network
  3.1. RNN Architectures
4. Long Short-Term Memory
  4.1. Architecture of LSTM
  4.2. Vectorized formulas for the forward pass of an LSTM layer
5. Frequently Asked Questions
6. Key Takeaways
Last Updated: Mar 27, 2024

Sequence Models

Author: Soham Medewar

Introduction

Machine learning models that take sequential data as input or produce it as output are called sequence models. Sequential data includes text streams, audio clips, video clips, time-series data, etc. Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks are popular architectures used in sequence models.

Applications of Sequence Models

Speech Recognition

Speech recognition is a technology in which an audio input is given to the model and its textual transcript is obtained as the output. In this case, both the input and the output are sequences.


Sentiment Classification

In sentiment classification, every sentence is categorized according to the sentiment it expresses. The input is a sequence of words.

Since the order of words changes a sentence's meaning, we use sequence models to preserve the word order during classification.


Video Activity Recognition

Video activity recognition identifies the activity in a video clip by analyzing every frame of the clip. Since a video clip is a sequence of frames, the input data has an inherent order.


All the above examples show that sequence models have a large number of applications. They cover three possible cases: the input can be sequential, the output can be sequential, or both the input and the output can be sequential. The recurrent neural network (RNN) is a popular sequence model that performs well on sequential data.

Recurrent Neural Network

An RNN is a special neural network suited to sequential (or recurrent) data. Examples of sequential data include:

  • Sentences (sequences of words).
  • Time series (sequences of stock prices, for instance).
  • Videos (sequences of frames).


Such data qualifies as recurrent because each time step is related to the previous ones.

While RNNs were originally developed for time series analysis and natural language processing tasks, they are now applied to various computer vision tasks.

Standard neural networks cannot share features across positions in a sequence, so one of the reasons for using a recurrent neural network is that features can be shared: the weights of an RNN are reused across time. RNNs can also remember their previous inputs through the hidden state, which standard neural networks cannot do; an RNN therefore uses historical information in its computation.

In an RNN, the loss function is defined as the sum of the losses at the individual time steps.

Backpropagation is carried out through every time step, a procedure known as backpropagation through time.
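
To make the recurrence concrete, here is a minimal NumPy sketch of a vanilla RNN forward pass; the weight names (W_xh, W_hh, b_h) and the toy dimensions are illustrative assumptions, not taken from the article.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b_h, h0):
    """Run a vanilla RNN over an input sequence.

    xs : (T, input_dim) input sequence
    h0 : (hidden_dim,) initial hidden state
    Returns one hidden state per time step.
    """
    h = h0
    hs = []
    for x in xs:
        # The same weights W_xh and W_hh are reused at every time
        # step: this is the parameter sharing described above.
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        hs.append(h)
    return np.stack(hs)

# Toy usage: a sequence of 5 steps, 3-dim inputs, 4-dim hidden state.
rng = np.random.default_rng(0)
xs = rng.normal(size=(5, 3))
W_xh = 0.1 * rng.normal(size=(4, 3))
W_hh = 0.1 * rng.normal(size=(4, 4))
hs = rnn_forward(xs, W_xh, W_hh, np.zeros(4), np.zeros(4))
print(hs.shape)  # (5, 4): one hidden state per time step
```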

RNN Architectures

There are several RNN architectures depending on the number of inputs and outputs.

1. One to Many Architecture: A good example of this architecture is image captioning. Image captioning takes one image as input and outputs a sequence of words, so there is a single input and a sequence of outputs.

2. Many to One Architecture: Sentiment classification is a good example of this architecture. In sentiment classification, a given sentence is classified as positive or negative; the input is a sequence of words, and the output is a binary label.

3. Many to Many Architecture: There are two cases in many-to-many architectures.

The first type is when the input length equals the output length. Named entity recognition is a good example: the number of words in the input sequence equals the number of words in the output sequence.

The second type is when the input length does not equal the output length. Machine translation is a good scenario for this architecture: the RNN reads a sentence in one language and then converts it into another language, so the input and output lengths differ.
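
The architectures above differ mainly in how the recurrent core's outputs are read out. A small self-contained NumPy sketch (with illustrative, assumed weight shapes) shows the many-to-many and many-to-one cases:

```python
import numpy as np

def rnn_states(xs, W_xh, W_hh, b_h):
    """Return the hidden state of a vanilla RNN at every time step."""
    h = np.zeros(W_hh.shape[0])
    states = []
    for x in xs:
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(1)
xs = rng.normal(size=(6, 3))          # a sequence of 6 input vectors
W_xh = 0.1 * rng.normal(size=(4, 3))
W_hh = 0.1 * rng.normal(size=(4, 4))
states = rnn_states(xs, W_xh, W_hh, np.zeros(4))

many_to_many = states      # one output per input step (e.g. NER tagging)
many_to_one = states[-1]   # only the final state (e.g. sentiment)
print(many_to_many.shape, many_to_one.shape)  # (6, 4) (4,)
```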


Long Short-Term Memory

Hochreiter and Schmidhuber introduced LSTM networks in 1997. They are the most commonly used variant of recurrent neural networks.

The critical components of the LSTM are the memory cell and the gates (including the forget gate and the input gate). The contents of the memory cell are modulated by the input and forget gates: if both gates are closed, the contents remain unchanged from one time step to the next. This gating structure allows information to be retained across many time steps, and consequently allows gradients to flow across many time steps. This helps the LSTM model overcome the vanishing gradient problem that affects most recurrent neural network models.
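
A toy numeric illustration of why the gating matters (the per-step factors below are assumed values, chosen only for intuition): gradients in a plain RNN involve a product of many per-step factors, while the LSTM's additively updated cell state is scaled only by the forget gate.

```python
# Toy illustration of vanishing gradients (assumed numbers).
T = 50

# Plain RNN: the gradient is a product of T per-step factors;
# if each factor is below 1, the product shrinks exponentially.
rnn_factor = 0.9
print(rnn_factor ** T)     # ~0.005: the gradient has vanished

# LSTM: the cell state is updated additively and scaled by the
# forget gate, so a forget gate near 1 preserves the gradient.
forget_gate = 0.99
print(forget_gate ** T)    # ~0.6: the gradient survives
```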

LSTMs are valued mainly for two improvements over plain RNNs: more stable gradient flow during backpropagation and a more effective update equation for the cell state.

Applications of LSTM:

  • Handwriting recognition
  • Generating sentences
  • Speech recognition

Architecture of LSTM

The memory cell allows the LSTM network to maintain its state over time. The LSTM block is the main body of the LSTM unit.

An LSTM unit has three gates:

Input gate: It protects the LSTM unit from irrelevant input events.

Output gate: It helps to expose the contents of the memory at the output of the LSTM unit.

Forget gate: It helps the LSTM unit to forget unnecessary previous data.

The output of the LSTM block is recurrently connected back to the block input and to all of the gates. The input, forget, and output gates use sigmoid activation functions, which restrict their values to the range [0, 1]. The tanh activation function is generally used at the input and the output of the LSTM block.

Vectorized formulas for the forward pass of an LSTM layer
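
A standard statement of these formulas, following the common peephole-LSTM notation that matches the symbol list below (σ denotes the logistic sigmoid, g and h the tanh activations at the block input and output, ⊙ element-wise multiplication, y_{t-1} the previous block output, and c_{t-1} the previous cell state):

z_t = g(W_z x_t + R_z y_{t-1} + b_z)   (block input)
i_t = σ(W_i x_t + R_i y_{t-1} + p_i ⊙ c_{t-1} + b_i)   (input gate)
f_t = σ(W_f x_t + R_f y_{t-1} + p_f ⊙ c_{t-1} + b_f)   (forget gate)
c_t = i_t ⊙ z_t + f_t ⊙ c_{t-1}   (cell state)
o_t = σ(W_o x_t + R_o y_{t-1} + p_o ⊙ c_t + b_o)   (output gate)
y_t = o_t ⊙ h(c_t)   (block output)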

Here,

  • x_t is the input vector at time t.
  • W are the rectangular input weight matrices.
  • R are the square recurrent weight matrices.
  • p are the peephole weight vectors.
  • b are the bias vectors.
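
The same forward pass can be written as a short NumPy sketch; the dictionary-based parameter layout and the toy dimensions are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, y_prev, c_prev, W, R, p, b):
    """One forward step of a peephole LSTM, following the formulas above.

    W, R, b map each of 'z', 'i', 'f', 'o' to the input weight matrix,
    recurrent weight matrix, and bias for the block input and the three
    gates; p maps each gate to its peephole weight vector.
    """
    z = np.tanh(W["z"] @ x_t + R["z"] @ y_prev + b["z"])                    # block input
    i = sigmoid(W["i"] @ x_t + R["i"] @ y_prev + p["i"] * c_prev + b["i"])  # input gate
    f = sigmoid(W["f"] @ x_t + R["f"] @ y_prev + p["f"] * c_prev + b["f"])  # forget gate
    c = i * z + f * c_prev                                                  # new cell state
    o = sigmoid(W["o"] @ x_t + R["o"] @ y_prev + p["o"] * c + b["o"])       # output gate
    y = o * np.tanh(c)                                                      # block output
    return y, c

# Toy usage with 3-dim inputs and a 4-dim cell.
rng = np.random.default_rng(2)
W = {k: 0.1 * rng.normal(size=(4, 3)) for k in "zifo"}
R = {k: 0.1 * rng.normal(size=(4, 4)) for k in "zifo"}
b = {k: np.zeros(4) for k in "zifo"}
p = {k: 0.1 * rng.normal(size=4) for k in "ifo"}
y, c = lstm_step(rng.normal(size=3), np.zeros(4), np.zeros(4), W, R, p, b)
print(y.shape, c.shape)  # (4,) (4,)
```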

Frequently Asked Questions

  1. What is sequence modeling?
    Sequence modeling is the task of generating a sequence of output values by analyzing a series of input values.
     
  2. What is bidirectional LSTM?
    A Bidirectional LSTM, or biLSTM, is a sequence processing model that consists of two LSTMs: one taking the input in a forward direction and the other in a backward direction.
     
  3. What is the difference between RNN and CNN?
    The main difference between a CNN and an RNN is the ability to process temporal information, i.e., data that comes in sequences, such as a sentence. Recurrent neural networks are designed for this very purpose, while convolutional neural networks cannot effectively interpret temporal information.
     
  4. Difference between RNN and LSTM?
    RNNs do not have a cell state. They only have hidden states, and those hidden states serve as the memory for RNNs. Meanwhile, LSTM has both cell states and hidden states. The cell state can remove or add information to the cell, regulated by "gates".
     
  5. Which model is best suited for sequential data?
    Recurrent neural networks work best for sequential data because RNNs can remember important things about the input they received, which allows them to be very precise in predicting what comes next.

Key Takeaways

In this article, we have discussed the following topics:

  • Introduction to sequence models
  • Architecture of RNN
  • Architecture of LSTM

Check out this problem - Shortest Common Supersequence
Want to learn more about Machine Learning? Here is an excellent course that can guide you in your learning.

Happy Coding!
