Code360 powered by Coding Ninjas
Last Updated: Mar 27, 2024

Time Series Prediction with GRU



The Gated Recurrent Unit, or GRU, is a kind of Recurrent Neural Network (RNN). It is younger than the more popular Long Short-Term Memory (LSTM) network. Like its sibling, the GRU can retain long-term dependencies in sequential data, addressing the "short-term memory" problem that plagues vanilla RNNs.
Given the legacy of recurrent architectures in sequence modelling and prediction, the GRU is an appealing alternative to its elder sibling: it trains faster while achieving comparable accuracy and effectiveness. In this article, we'll go over the concepts of GRUs and compare their mechanisms to those of LSTMs. We'll also look at the differences in performance between these two RNN variants.



If the reader is unfamiliar with RNNs or LSTMs, they should have a look through the following topics:

1. Recurrent Neural Networks.

2. Long Short-Term Memory.


Gated Recurrent Units

As the name implies, a Gated Recurrent Unit (GRU) is a variant of the RNN architecture. It controls and manages the flow of information between cells in the neural network using gating mechanisms. Cho and colleagues first introduced GRUs in 2014, making them a relatively new architecture compared to the widely used LSTM, which Sepp Hochreiter and Jürgen Schmidhuber proposed in 1997.

Overview of GRU

The GRU's structure lets it adaptively capture dependencies from long data sequences without discarding information from earlier parts of the sequence. It accomplishes this through gating units, similar to those found in LSTMs, which solve the vanishing/exploding gradient problem of traditional RNNs. These gates control what information is retained or discarded at each time step. Later in this article, we'll get into the specifics of how these gates work and how they overcome the issues mentioned above.

How Does GRU Work

GRUs operate in much the same manner as traditional RNNs and are closely analogous to LSTMs. Like the LSTM, the Gated Recurrent Unit controls the flow of data using gates. Although it is a newer design, the GRU has a simpler architecture with fewer parameters, which often lets it train faster than the LSTM while achieving comparable results.


The GRU's ability to retain long-term dependencies or memory arises from the computations performed within the GRU cell to produce the hidden state. While LSTMs transfer two states between cells — the cell state and the hidden state, which carry long and short-term memory, respectively — GRUs only transfer one hidden state between time steps. Because of the gating mechanisms and computations that the hidden state and input data go through, this hidden state can hold both long-term and short-term dependencies at the same time.
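Those computations can be sketched in a few lines of NumPy. This follows the standard textbook GRU formulation with toy dimensions; it is an illustration of the idea, not the Keras implementation used later in this article.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, params):
    """One GRU time step (textbook formulation)."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    z = sigmoid(Wz @ x + Uz @ h_prev + bz)              # update gate
    r = sigmoid(Wr @ x + Ur @ h_prev + br)              # reset gate
    h_cand = np.tanh(Wh @ x + Uh @ (r * h_prev) + bh)   # candidate state
    return (1.0 - z) * h_prev + z * h_cand              # new hidden state

# Toy dimensions: input size 3, hidden size 4, random weights for illustration.
rng = np.random.default_rng(0)
params = [rng.standard_normal(s) for s in
          [(4, 3), (4, 4), (4,), (4, 3), (4, 4), (4,), (4, 3), (4, 4), (4,)]]
h = np.zeros(4)
for x in rng.standard_normal((5, 3)):   # run 5 time steps
    h = gru_step(x, h, params)
print(h.shape)  # (4,)
```

Note that the single hidden state `h` is all that moves from one time step to the next: the update gate `z` decides how much of the old state to keep and how much of the candidate to blend in.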


Even though both GRUs and LSTMs contain gates, the main distinction between these structures is the number of gates and their specific functions. The Update gate in the GRU performs a similar function to the Input and Forget gates in the LSTM. The control of new memory content added to the network, on the other hand, differs between these two.

The Forget gate in the LSTM determines which part of the previous cell state to retain, while the Input gate determines how much new memory to add. These two gates are independent of one another, meaning that the amount of new information added via the Input gate is altogether independent of the data retained via the Forget gate.

In the GRU, the Update gate determines which information from the previous memory to retain and, at the same time, controls how much new memory is added. This means that in the GRU, memory retention and the addition of new information are NOT independent processes.
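This coupling is easy to see when the two state updates are written side by side in standard textbook notation:

```latex
% LSTM: independent Forget (f_t) and Input (i_t) gates update the cell state
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
% GRU: a single Update gate z_t governs both retention and addition
h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
```

In the LSTM, `f_t` and `i_t` can take any values independently; in the GRU, keeping more of the old state (larger `1 - z_t`) necessarily means adding less of the candidate state.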


Another significant distinction between the structures is the absence of a cell state in the GRU. The LSTM stores its long-term dependencies in the cell state and its short-term memory in the hidden state, while the GRU keeps both in a single hidden state. Nonetheless, both architectures have been shown to retain long-term information effectively.

Speed Differences

GRUs train faster than LSTMs because there are fewer weights and parameters to update during training. This is because the GRU cell has two gates, one fewer than the LSTM's three.
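A quick back-of-the-envelope count shows the difference for the 128-unit, univariate models used later in this article. The figures below use the textbook parameterization (one kernel, one recurrent kernel, and one bias per gate-like block); Keras's default GRU (`reset_after=True`) carries a second bias per gate, so its reported count is slightly higher.

```python
def rnn_param_count(gate_blocks, input_dim, units):
    # Each gate-like block: kernel (input_dim x units)
    # + recurrent kernel (units x units) + bias (units)
    per_block = input_dim * units + units * units + units
    return gate_blocks * per_block

input_dim, units = 1, 128  # matches the models in this article
lstm_params = rnn_param_count(4, input_dim, units)  # i, f, o gates + cell candidate
gru_params = rnn_param_count(3, input_dim, units)   # z, r gates + candidate
print(lstm_params, gru_params)  # 66560 49920
```

With 128 units the GRU needs only 75% of the LSTM's parameters, which is where its training-speed advantage comes from.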

In this article's code walkthrough, we'll directly compare the speed of training an LSTM versus a GRU on the same task.

Performance Evaluations

When deciding which model to use for a task, the model's accuracy, whether measured by the margin of error or by the proportion of correct classifications, is generally the most crucial factor to consider. GRUs and LSTMs are RNN variants that can often be used interchangeably to achieve comparable results.

Code Implementation

We've learned about the GRU's theoretical concepts. It's now time to put what we have understood to use.

We will write code to implement a GRU model on a time-series prediction task. To supplement our GRU-LSTM comparison, we will perform the same task with an LSTM model and then compare the performance of the two models.



#To run the experiment, change the values below.
experiment = 1 #BANKEX = 1; ACTIVITIES = 2
f = 20 #Length of the future window in days; set f to the number of steps ahead (1 or 20)
thepath = 'sample_data' #Change this to the path where you want the models saved


from numpy.random import seed
import tensorflow
import math
import numpy as np
import pandas as pd
from keras.layers import Dense, LSTM, Dropout, GRU
from numpy import genfromtxt
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
import matplotlib.pyplot as plt
from pandas_datareader import data as pdr
import as cm
from sklearn.metrics import mean_squared_error
from scipy.stats import mannwhitneyu


def GRU_Model(output_window):
  model = Sequential()
  model.add(GRU(128, return_sequences=False, input_shape=(X.shape[1],1)))
  model.add(Dense(output_window)) #Output layer: one value per future step
  model.compile(optimizer='adam', loss='mean_squared_error')
  return model


def LSTM_Model(output_window):
  model = Sequential()
  model.add(LSTM(128, return_sequences=False, input_shape=(X.shape[1],1)))
  model.add(Dense(output_window)) #Output layer: one value per future step
  model.compile(optimizer='adam', loss='mean_squared_error')
  return model


#Based on the Video :
# "Computer Science: Stock Price Prediction Using Python and Machine Learning."  
#Link: ""
def data_preparation(w, scaled_data, N, f):
  window = w + f
  Q = len(scaled_data)
  X = [] #Collects every sliding window of length w + f
  for i in range(Q-window+1):
    X.append(scaled_data[i:i+window, 0])

  X = np.array(X)
  X = np.reshape(X, (X.shape[0], X.shape[1], 1))

  trainX, trainY = X[0:N, 0:w], X[0:N, w:w+f]
  testX, testY = X[N:Q-w, 0:w], X[N:Q-w, w:w+f]
  X = trainX
  return trainX, trainY, testX, testY, X
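The sliding-window idea behind this function can be checked on a toy series. The numbers below are purely illustrative and unrelated to the BANKEX data: each row holds w past values as input and f future values as the target.

```python
import numpy as np

series = np.arange(10, dtype=float)  # toy "price" series: 0..9
w, f = 3, 2                          # past window and future window
X = np.array([series[i:i + w + f] for i in range(len(series) - (w + f) + 1)])
pastX, futureY = X[:, :w], X[:, w:]  # split each window into input and target
print(pastX[0], futureY[0])          # [0. 1. 2.] [3. 4.]
print(pastX.shape, futureY.shape)    # (6, 3) (6, 2)
```

Every window of w + f consecutive values becomes one sample, so a series of length Q yields Q - (w + f) + 1 samples.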


#The last known value is repeated f times.
def baselinef(U,f):
  last = U.shape[0]
  yhat = np.zeros((last, f))
  for j in range(0,last):
    yhat[j,0:f] = np.repeat(U[j,U.shape[1]-1], f)
  return yhat
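As a sanity check, here is a vectorized sketch of the same naive baseline on a toy 2-D array (illustrative values only); `np.repeat` does the per-row copying in one call:

```python
import numpy as np

U = np.array([[1., 2., 3.],
              [4., 5., 6.]])              # two windows of past values
f = 2
yhat = np.repeat(U[:, -1:], f, axis=1)    # repeat each row's last value f times
print(yhat)  # each row repeats its last value: [[3. 3.], [6. 6.]]
```

This "persistence" forecast is a standard baseline for time series: any trained model should at least beat repeating the last known value.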


#Directional accuracy
# taken from the link:
def mda(actual: np.ndarray, predicted: np.ndarray):
  return np.mean((np.sign(actual[1:] - actual[:-1]) == np.sign(predicted[1:] - predicted[:-1])).astype(int))
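To see what the directional-accuracy metric returns, it can be exercised on a toy pair of series; the helper is restated below so the snippet runs on its own:

```python
import numpy as np

def mda(actual, predicted):
    # Fraction of steps where the predicted direction matches the actual one
    return np.mean((np.sign(actual[1:] - actual[:-1])
                    == np.sign(predicted[1:] - predicted[:-1])).astype(int))

actual = np.array([1.0, 2.0, 1.5, 2.5])      # moves: up, down, up
predicted = np.array([1.1, 1.9, 2.0, 2.4])   # moves: up, up, up
print(mda(actual, predicted))  # 2 of 3 directions match -> 0.666...
```

Directional accuracy ignores the size of the error and only asks whether the forecast moved the right way, which is often what matters in trading-style applications.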


#Normalizing the data between 0 and 1
def scaleit(DATAX):
  mima = np.zeros((DATAX.shape[0], 2)) #To save min and max values
  for i in range(DATAX.shape[0]):
    mima[i,0],mima[i,1] = DATAX[i,:].min(), DATAX[i,:].max()
    DATAX[i,:] = (DATAX[i,:]-DATAX[i,:].min())/(DATAX[i,:].max()-DATAX[i,:].min())
  return DATAX, mima


#Rescaling to the original values
def rescaleit(y,mima,i):
  yt = (y*(mima[i,1]-mima[i,0]))+mima[i,0]
  return yt
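The scale/rescale pair amounts to a per-row min-max transform and its inverse; a toy round trip (illustrative values only) confirms the two cancel out:

```python
import numpy as np

row = np.array([10.0, 20.0, 15.0, 30.0])
lo, hi = row.min(), row.max()
scaled = (row - lo) / (hi - lo)        # map into [0, 1], as scaleit does
restored = scaled * (hi - lo) + lo     # invert, as rescaleit does
print(scaled)                          # [0.   0.5  0.25 1.  ]
print(np.allclose(restored, row))      # True
```

Saving the per-row min and max (the `mima` array above) is what makes the inverse transform possible after prediction.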


#This is Based on suggestions from
#The link: ""
#This code is to plot series of different colors
def plot_series(X):
  colors = cm.rainbow(np.linspace(0, 1, 10))
  for i in range(10):
    plt.plot(X[i], label='%s ' % (i+1), color=colors[i,:])
  plt.legend(bbox_to_anchor=(1.04,0.5), loc="center left", borderaxespad=0)
  plt.ylabel("Closing Price")


#The Statistical tests
def statisticaltests(s):
  print('LSTM and Baseline (RMSE)')
  U1, p = mannwhitneyu(s[:,0],s[:,2], alternative = 'two-sided')
  print('U='+ str(U1) + '. p = ' + str(p))
  print('GRU and Baseline (RMSE)')
  U1, p = mannwhitneyu(s[:,1],s[:,2], alternative = 'two-sided')
  print('U='+ str(U1) + '. p = ' + str(p))
  print('LSTM and GRU (RMSE)')
  U1, p = mannwhitneyu(s[:,0],s[:,1], alternative = 'two-sided')
  print('U='+ str(U1) + '. p = ' + str(p))
  print('LSTM and Baseline (DA)')
  U1, p = mannwhitneyu(s[:,0+3],s[:,2+3], alternative = 'two-sided')
  print('U='+ str(U1) + '. p = ' + str(p))
  print('GRU and Baseline (DA)')
  U1, p = mannwhitneyu(s[:,1+3],s[:,2+3], alternative = 'two-sided')
  print('U='+ str(U1) + '. p = ' + str(p))
  print('LSTM and GRU (DA)')
  U1, p = mannwhitneyu(s[:,0+3],s[:,1+3], alternative = 'two-sided')
  print('U='+ str(U1) + '. p = ' + str(p))


if experiment == 1:
  #Retrieving the BANKEX dataset
  DATAX = genfromtxt('BANKEX.csv', delimiter=',')
else:
  #Retrieving the Activities dataset
  DATAX = genfromtxt('activities.csv', delimiter=',')


#Normalizing the data between 0 and 1
DATAX, mima = scaleit(DATAX)
selected_series = 0 #Select one signal arbitrarily to train the dataset
scaled_data = DATAX[selected_series, :]
scaled_data = np.reshape(scaled_data, (len(scaled_data),1))


#w < N < Q
window = 60 #Size of the window in days
test_samples = 251 #Number of test samples
N = len(scaled_data) - test_samples - window


trainX, trainY, testX, testY, X = data_preparation(window, scaled_data, N,f)


lstm_model = LSTM_Model(f)
gru_model = GRU_Model(f)
epochs = 200


# Training the LSTM model
lstm_trained =, trainY, shuffle=True, epochs=epochs)
# Training the GRU model
gru_trained =, trainY, shuffle=True, epochs=epochs)


s = np.zeros((10, 6)) #Results: columns 0-2 hold RMSE and columns 3-5 hold DA for LSTM, GRU, and baseline

for j in range(0,10):
  scaled_data = DATAX[j, :] #Modifies the index of "time_series"
  scaled_data = np.reshape(scaled_data, (len(scaled_data),1))
  _ , _ , testX, testY, _ = data_preparation(window, scaled_data, N, f)
  y_pred_lstm = lstm_model.predict(testX)
  y_pred_gru = gru_model.predict(testX)
  y_baseline = baselinef(testX,f)
  testY = np.reshape(testY, (testY.shape[0],testY.shape[1]))
  s[j,0] = np.sqrt(mean_squared_error(testY, y_pred_lstm))
  s[j,1] = np.sqrt(mean_squared_error(testY, y_pred_gru))
  s[j,2] = np.sqrt(mean_squared_error(testY, y_baseline))
  s[j,0+3] = mda(testY, y_pred_lstm)
  s[j,1+3] = mda(testY, y_pred_gru)
  s[j,2+3] = mda(testY, y_baseline)


print('Mean values')
print(np.mean(s, axis=0))
print('Standard Deviation')
print(np.std(s, axis=0))
#The Statistical tests
statisticaltests(s)


#To save the models, use this code.
if experiment == 1:
  ex = 'B'
else:
  ex = 'A' + '/GRU_' + str(ex) + str(f)) #Saves the GRU + '/LSTM_' + str(ex) + str(f)) #Saves the LSTM


#Rescale for visual examination
testY = rescaleit(testY,mima,j)
y_pred_lstm = rescaleit(y_pred_lstm,mima,j)
y_pred_gru = rescaleit(y_pred_gru,mima,j)
y_baseline = rescaleit(y_baseline,mima,j)


if f == 1:
  plt.plot(testY[0:100], label = 'Actual')
  plt.plot(y_baseline[0:100], label = 'Baseline prediction')
  plt.plot(y_pred_lstm[0:100], label = 'LSTM prediction')
  plt.plot(y_pred_gru[0:100], label = 'GRU prediction')
  plt.legend(loc='upper center', bbox_to_anchor=(0.5, 1.1),
          fancybox=True, shadow=True, ncol=5)
  if experiment ==1:
    plt.ylabel("Closing Price")


g = 230 #Selects one of the test samples
if f == 20:
  plt.plot(testY[g,:], label = 'Actual')
  plt.plot(y_baseline[g,:], label = 'Baseline prediction')
  plt.plot(y_pred_lstm[g,:], label = 'LSTM prediction')
  plt.plot(y_pred_gru[g,:], label = 'GRU prediction')
  days = np.arange(testY.shape[1]+1)
  new_list = range(math.floor(min(days)), math.ceil(max(days))+1)
  plt.xticks(new_list) #Force integer ticks on the day axis
  plt.legend()


Plot showing the Actual series alongside the Baseline, LSTM, and GRU predictions


#"""Predictions and loss function plot"""
plt.plot(lstm_trained.history['loss'], label = 'LSTM')
plt.plot(gru_trained.history['loss'], label = 'GRU los')
plt.ylabel('MSE Loss')
Predictions and loss function plot

Frequently Asked Questions

How is GRU used in machine learning?

A gated recurrent unit (GRU) is a type of recurrent neural network that uses connections through a series of nodes to perform machine learning tasks related to memory and clustering, such as speech recognition.

Does GRU have a forget gate?

The GRU functions similarly to a long short-term memory (LSTM) with a forget gate, but with fewer parameters because it lacks an output gate.

Can we combine LSTM and GRU?

Although the LSTM came before the GRU and the GRU involves less computation, the two are roughly on par in terms of performance. Stacking LSTM and GRU cells (or any other cells) may be interesting, but it will not significantly improve performance over simply stacking LSTM or GRU cells.


In this article, we read about time series prediction with GRUs. We covered the prerequisites and compared the GRU with the LSTM. Finally, we implemented code to run and compare time series predictions. Time series prediction comes in handy whenever you need to forecast an outcome. You can also learn about Cloud Computing and find our courses on Data Science and Machine Learning. Do not forget to check out more blogs on GRU.

Thank you

Explore Coding Ninjas Studio to find more exciting stuff. Happy Coding!
