Video Activity Recognition
Video activity recognition identifies the activity in a video clip by analyzing its frames. Since a video clip is a sequence of frames, the input data for video activity recognition is inherently sequential.

All the above examples show that sequence models have a wide range of applications. There are three possible cases when using sequence models: the input data can be sequential, the output data can be sequential, or both the input and the output can be sequential. A recurrent neural network (RNN) is a popular sequence model that has shown strong performance on sequential data.
Recurrent Neural Network
RNN is a special neural network suited for sequential (or recurrent) data. Examples of sequential data include:
- Sentences (sequences of words).
- Time series (sequences of stock prices, for instance).
- Videos (sequences of frames).
These qualify as recurrent data because each time step is related to the previous ones.
While RNNs were originally developed for time series analysis and natural language processing tasks, they are now applied to various computer vision tasks.
Standard neural networks cannot share features across positions in a sequence, while an RNN shares its weights across time steps, so features learned at one position carry over to the others. RNNs can also remember their previous inputs through a hidden state, which standard neural networks cannot do; the RNN uses this historical information in its computation.
In an RNN, the overall loss is defined as the sum of the losses at the individual time steps: L = Σₜ Lₜ(ŷₜ, yₜ).

Backpropagation is performed through each point in time, which is why training an RNN is known as backpropagation through time (BPTT).
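To make the recurrence, the weight sharing, and the summed loss concrete, here is a minimal NumPy sketch of a vanilla RNN forward pass. The weight names (Wxh, Whh, Why) and the dimensions are assumptions made for this illustration, not taken from the article.

```python
import numpy as np

def rnn_forward(xs, targets, Wxh, Whh, Why, bh, by):
    """Run a vanilla RNN over a sequence and sum the per-step losses.

    xs      -- list of input vectors, one per time step
    targets -- list of integer class labels, one per time step
    The same weight matrices are reused at every time step (weight sharing).
    """
    h = np.zeros(Whh.shape[0])           # initial hidden state
    total_loss = 0.0
    for x, t in zip(xs, targets):
        # The hidden state mixes the current input with the previous state.
        h = np.tanh(Wxh @ x + Whh @ h + bh)
        logits = Why @ h + by
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()             # softmax over the output classes
        total_loss += -np.log(probs[t])  # cross-entropy loss at this step
    return total_loss

# Toy usage: 5 time steps, 3-dimensional inputs, 4 hidden units, 2 classes.
rng = np.random.default_rng(0)
Wxh, Whh = rng.normal(size=(4, 3)), rng.normal(size=(4, 4))
Why, bh, by = rng.normal(size=(2, 4)), np.zeros(4), np.zeros(2)
xs = [rng.normal(size=3) for _ in range(5)]
targets = [0, 1, 1, 0, 1]
print(rnn_forward(xs, targets, Wxh, Whh, Why, bh, by))
```

Backpropagation through time differentiates this summed loss with respect to the shared weights, accumulating gradient contributions from every time step.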

RNN Architectures

There are several RNN architectures depending on the number of inputs and outputs.
1. One to Many Architecture: Image captioning is a good example of this architecture. It takes a single image as input and outputs a sequence of words, so there is one input and a sequence of outputs.
2. Many to One Architecture: Sentiment classification is a good example of this architecture. A given sentence is classified as positive or negative, so the input is a sequence of words and the output is a binary classification (a minimal sketch of this case appears after the list below).
3. Many to Many Architecture: There are two cases in many to many architectures.
The first type is when the input length equals the output length. Named entity recognition is a good example: each word in the input sequence receives a corresponding label in the output sequence, so the two lengths match.
The second type is when the input length does not equal the output length. Machine translation is a good scenario for this architecture: the RNN reads a sentence in one language and then produces it in another, and the two sentences generally differ in length.
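To make the many to one case concrete, here is a minimal PyTorch sketch of a sentiment classifier. The class name, vocabulary size, and layer dimensions are assumptions made for this illustration.

```python
import torch
import torch.nn as nn

class SentimentRNN(nn.Module):
    """Many to one: read a whole token sequence, emit a single prediction."""

    def __init__(self, vocab_size=1000, embed_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 2)   # positive / negative

    def forward(self, token_ids):
        x = self.embed(token_ids)             # (batch, seq_len, embed_dim)
        _, h_last = self.rnn(x)               # final hidden state: (1, batch, hidden_dim)
        return self.head(h_last.squeeze(0))   # one prediction per sequence

model = SentimentRNN()
batch = torch.randint(0, 1000, (8, 20))       # 8 sentences of 20 token ids each
logits = model(batch)
print(logits.shape)                           # torch.Size([8, 2])
```

The whole sequence is consumed before the single output is produced, which is exactly the many to one pattern described above.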

Long Short-Term Memory
Hochreiter and Schmidhuber introduced LSTM networks in 1997. They are the most commonly used variant of recurrent neural networks.
The critical components of the LSTM are the memory cell and the gates (including the forget gate and the input gate). The contents of the memory cell are modulated by the input and forget gates: as long as both of these gates are closed, the contents of the memory cell remain unchanged from one time step to the next. This gating structure allows information to be retained across many time steps, and consequently allows gradients to flow across many time steps as well. It is what lets the LSTM overcome the vanishing gradient problem that affects most recurrent neural network models.
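To see why the gating structure preserves gradients, consider the cell-state update in a common LSTM formulation (the same one written out in the formulas section below). Because the cell state is updated additively, and ignoring the gates' own dependence on earlier states, the gradient between consecutive cell states is governed by the forget gate rather than by repeated multiplication with a recurrent weight matrix:

```latex
c_t = f_t \odot c_{t-1} + i_t \odot z_t
\qquad\Longrightarrow\qquad
\frac{\partial c_t}{\partial c_{t-1}} \approx \operatorname{diag}(f_t)
```

When the forget gate stays close to 1, this factor stays close to the identity over many time steps, so gradients along the cell-state path neither vanish nor explode.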
LSTMs are known for two main improvements over vanilla RNNs: gradients propagate better during backpropagation, and the gated update equations give finer control over what is stored in memory.
Applications of LSTM:
- Handwriting recognition
- Generating sentences
- Speech recognition
Architecture of LSTM
The memory cell allows the LSTM network to maintain its state over time, and the LSTM block is the main body of the LSTM unit.

Three gates of the LSTM unit
Input gate: It protects the LSTM unit from irrelevant input events.
Output gate: It helps to expose the contents of the memory at the output of the LSTM unit.
Forget gate: It helps the LSTM unit to forget the unnecessary previous data.
The output of the LSTM block is recurrently connected back to the block input and to all of the gates. The input, forget, and output gates use sigmoid activation functions, which restrict their outputs to the range [0, 1]. The tanh activation function is generally used at the input and the output of the LSTM block.
Vectorized formulas for forward pass of LSTM layer
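A standard way to write these formulas (following Greff et al.'s peephole LSTM formulation, which uses exactly the symbols defined below) is:

```latex
\begin{aligned}
z_t &= \tanh(W_z x_t + R_z y_{t-1} + b_z) && \text{block input}\\
i_t &= \sigma(W_i x_t + R_i y_{t-1} + p_i \odot c_{t-1} + b_i) && \text{input gate}\\
f_t &= \sigma(W_f x_t + R_f y_{t-1} + p_f \odot c_{t-1} + b_f) && \text{forget gate}\\
c_t &= z_t \odot i_t + c_{t-1} \odot f_t && \text{cell state}\\
o_t &= \sigma(W_o x_t + R_o y_{t-1} + p_o \odot c_t + b_o) && \text{output gate}\\
y_t &= \tanh(c_t) \odot o_t && \text{block output}
\end{aligned}
```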

Here,
- x_t is the input vector at time t.
- W are the rectangular input weight matrices.
- R are the square recurrent weight matrices.
- P are the peephole weight vectors.
- b are the bias vectors.
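As a concrete companion to these formulas, here is a minimal NumPy sketch of a single forward step of a peephole LSTM layer; the function name, the dictionary layout of the weights, and the dimensions are assumptions made for this illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, y_prev, c_prev, W, R, p, b):
    """One forward step of a peephole LSTM layer.

    W, R, b hold the input weights, recurrent weights, and biases for the
    block input (z) and the three gates (i, f, o); p holds the peephole
    weight vectors of the gates.
    """
    z = np.tanh(W['z'] @ x_t + R['z'] @ y_prev + b['z'])                    # block input
    i = sigmoid(W['i'] @ x_t + R['i'] @ y_prev + p['i'] * c_prev + b['i'])  # input gate
    f = sigmoid(W['f'] @ x_t + R['f'] @ y_prev + p['f'] * c_prev + b['f'])  # forget gate
    c = z * i + c_prev * f                                                  # new cell state
    o = sigmoid(W['o'] @ x_t + R['o'] @ y_prev + p['o'] * c + b['o'])       # output gate
    y = np.tanh(c) * o                                                      # block output
    return y, c

# Toy usage: 3-dimensional input, 4 LSTM units.
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(4, 3)) for k in 'zifo'}
R = {k: rng.normal(size=(4, 4)) for k in 'zifo'}
p = {k: rng.normal(size=4) for k in 'ifo'}
b = {k: np.zeros(4) for k in 'zifo'}
y, c = lstm_step(rng.normal(size=3), np.zeros(4), np.zeros(4), W, R, p, b)
print(y.shape, c.shape)   # (4,) (4,)
```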
Frequently Asked Questions
What is sequence modeling?
Sequence modeling is generating a sequence of values by analyzing a series of input values.
What is bidirectional LSTM?
A Bidirectional LSTM, or biLSTM, is a sequence processing model that consists of two LSTMs: one taking the input in a forward direction and the other in a backward direction.
What is the difference between RNN and CNN?
The main difference between a CNN and an RNN is the ability to process temporal information, i.e., data that comes in sequences, such as a sentence. Recurrent neural networks are designed for this very purpose, while convolutional neural networks cannot effectively interpret temporal information.
Difference between RNN and LSTM?
RNNs do not have a cell state. They only have hidden states, and those hidden states serve as the memory for RNNs. Meanwhile, LSTM has both cell states and hidden states. The cell state can remove or add information to the cell, regulated by "gates".
Which model is best suited for sequential data?
Recurrent neural networks work best for sequential data because they can remember important things about the input they have received, which allows them to be very precise in predicting what comes next.
Key Takeaways
In this article, we have discussed the following topics:
- Introduction to sequence models
- Architecture of RNN
- Architecture of LSTM
Check out this problem - Shortest Common Supersequence
Want to learn more about Machine Learning? Here is an excellent course that can guide you in learning.
Happy Coding!