Table of contents
The Importance of Learning Rates
Introduction to Keras
Introducing Learning Rate Schedulers
Step Decay
Exponential Decay
Cyclical Learning Rates
Implementing Learning Rate Schedulers in Keras
Choosing the Right Scheduler
Frequently Asked Questions
Why do I need a learning rate scheduler?
How does the LearningRateScheduler work?
What is exponential decay in learning rate scheduling?
How does cosine annealing work in Keras?
Can I create a custom learning rate schedule in Keras?
Last Updated: Mar 27, 2024

LR Schedulers in Keras

Author Juhi Pathak


When training deep learning models, choosing a suitable learning rate can significantly affect your model's performance and convergence speed. Learning rate schedulers automate the adjustment of the learning rate during training, helping your model learn more efficiently.

This article delves into the world of learning rate schedulers in Keras, providing a beginner-friendly guide to understanding and implementing them effectively.

The Importance of Learning Rates

Before we dive into learning rate schedulers, let's quickly recap the importance of learning rates in training neural networks. The learning rate determines the step size at which the model adjusts its weights during optimization. Too high a learning rate might cause the model to overshoot optimal values, leading to instability or divergence. Conversely, a learning rate that's too small could result in slow convergence or getting stuck in local minima.
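
To make the effect of the step size concrete, here is a minimal sketch that runs plain gradient descent on the toy function f(w) = w**2. The function, the starting point, and the three learning rates are assumptions chosen for illustration; they are not taken from any Keras model.

```python
def gradient_descent(lr, steps=50, w0=1.0):
    """Minimize f(w) = w**2 with plain gradient descent and return the final w."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w      # derivative of w**2
        w = w - lr * grad   # gradient descent update
    return w

too_high = gradient_descent(lr=1.1)    # each step multiplies w by -1.2, so |w| grows
too_low = gradient_descent(lr=0.001)   # each step multiplies w by 0.998, so w barely moves
reasonable = gradient_descent(lr=0.1)  # each step multiplies w by 0.8, so w shrinks toward 0

print(abs(too_high), too_low, reasonable)
```

With the largest rate the weight moves further from the minimum at every step, with the smallest it has hardly moved after 50 steps, and the middle value ends up close to zero.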

Introduction to Keras

Keras, developed by François Chollet, first appeared in 2015 as an interface for quickly building and training neural networks. It was designed for simplicity and flexibility, enabling users to prototype models rapidly without writing low-level code for layers and optimization algorithms. Keras is a high-level API (Application Programming Interface) that originally ran on top of machine learning libraries such as TensorFlow, Theano, and Microsoft Cognitive Toolkit (CNTK); recent versions run on TensorFlow, JAX, and PyTorch.

Learning rate scheduling, often referred to as learning rate annealing or decay, is a technique used in training machine learning models, especially deep neural networks, to adjust the learning rate during the optimization process. The learning rate is a critical hyperparameter that determines the step size taken during gradient descent, the optimization algorithm used to update the model's parameters (weights and biases) based on the calculated gradients of the loss function.

The goal of learning rate scheduling is to strike a balance between two main concerns:

  • Convergence Speed: A high learning rate allows the model to take significant steps in parameter space, potentially leading to faster convergence in the early stages of training.
  • Stability and Precision: A low learning rate makes the optimization process more stable, preventing overshooting and allowing the model to settle into a good minimum, especially as training progresses.

Learning rate scheduling recognizes that using a single fixed learning rate throughout the training process may not be optimal. Instead, learning rate scheduling techniques dynamically adjust the learning rate during training to adapt to the changing landscape of the loss function and improve optimization performance.

Introducing Learning Rate Schedulers

Learning rate schedulers offer a dynamic approach to setting the learning rate during training. Instead of using a fixed learning rate throughout the entire training process, these schedulers automatically adjust the learning rate at predefined intervals or based on specific conditions. This adaptability can help overcome challenges such as performance plateaus and overshooting.

Keras supports several learning rate scheduling strategies, some through built-in schedules and callbacks and others through a custom function, each with its own characteristics. Let's explore a few of the most common ones:

Step Decay

The step decay scheduler reduces the learning rate by a specific factor at fixed intervals. This approach is practical when you expect the model's performance to plateau after several epochs. You can specify the decay factor and the step size.
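
As a rough sketch, step decay can be written as a small function of the epoch number. The name step_decay and the default values below are illustrative assumptions; the (epoch, lr) signature is chosen so the function could be handed to the LearningRateScheduler callback, which passes a 0-indexed epoch and the current learning rate.

```python
def step_decay(epoch, lr=None, initial_lr=0.1, drop_factor=0.5, epochs_drop=10):
    """Return the learning rate for a 0-indexed epoch under step decay.

    The lr argument (the current rate) is accepted for callback compatibility
    but ignored, because the rate is recomputed from initial_lr each time.
    """
    num_drops = epoch // epochs_drop  # how many full intervals have passed
    return initial_lr * (drop_factor ** num_drops)

for epoch in (0, 9, 10, 25):
    print(epoch, step_decay(epoch))
```

Here the rate stays at 0.1 for the first ten epochs (indices 0 to 9), then halves at the start of each following block of ten. Keras also ships a PiecewiseConstantDecay schedule in keras.optimizers.schedules, which may be a better fit when the boundaries are expressed in optimizer steps rather than epochs.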

Exponential Decay

Exponential decay reduces the learning rate exponentially over epochs. It's controlled by a decay rate parameter, which determines how quickly the learning rate decreases. This scheduler is suitable when you want a gradual and smooth reduction in the learning rate.
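
A minimal sketch of exponential decay, assuming the common form initial_lr * decay_rate ** (epoch / decay_steps). The function name and default values are illustrative assumptions, not Keras defaults.

```python
def exponential_decay(epoch, lr=None, initial_lr=0.1, decay_rate=0.5, decay_steps=10):
    """Return a smoothly decaying learning rate for a 0-indexed epoch.

    With these defaults the rate halves every decay_steps epochs, but it
    shrinks a little every epoch instead of dropping all at once.
    The lr argument is ignored; it is only there for callback compatibility.
    """
    return initial_lr * decay_rate ** (epoch / decay_steps)

for epoch in (0, 5, 10, 20):
    print(epoch, exponential_decay(epoch))
```

Keras offers a built-in ExponentialDecay schedule in keras.optimizers.schedules that applies the same kind of formula per optimizer step; the function above is only a per-epoch approximation of that idea.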

ReduceLROnPlateau

This scheduler adjusts the learning rate when the model's validation loss stagnates. It monitors a specified metric (usually validation loss) and reduces the learning rate if the metric stops improving for a set number of epochs. This is particularly useful when training has reached a plateau.
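
The plateau logic can be sketched in a few lines of plain Python. The class PlateauReducer below is a hypothetical helper written for illustration, not the Keras implementation; in Keras the equivalent behavior comes from the ReduceLROnPlateau callback, which takes similar monitor, factor, patience, and min_lr arguments.

```python
class PlateauReducer:
    """Hypothetical helper that mimics reduce-on-plateau logic (not the Keras class)."""

    def __init__(self, lr, factor=0.5, patience=2, min_lr=1e-5):
        self.lr = lr
        self.factor = factor      # multiply the rate by this when progress stalls
        self.patience = patience  # epochs without improvement to tolerate
        self.min_lr = min_lr      # never go below this rate
        self.best = float("inf")
        self.wait = 0

    def step(self, val_loss):
        """Record one epoch's validation loss and return the current learning rate."""
        if val_loss < self.best:
            self.best = val_loss
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.wait = 0
        return self.lr

reducer = PlateauReducer(lr=0.1)
val_losses = [1.0, 0.8, 0.7, 0.7, 0.7, 0.65, 0.66, 0.67]  # made-up values
history = [reducer.step(loss) for loss in val_losses]
print(history)
```

In this made-up run the rate is halved twice: once after the loss sits at 0.7 for two epochs without improving, and again after two epochs fail to beat 0.65.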

Cyclical Learning Rates

Cyclical learning rate schedulers alternate between lower and higher learning rates within a defined range. This can help the model escape local minima and explore different parts of the loss landscape.
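
One common variant is the triangular policy, where the rate climbs linearly from a lower bound to an upper bound and back again. The sketch below computes it by hand; the function name, the bounds, and the step size are assumptions for illustration rather than a specific Keras class.

```python
def triangular_clr(iteration, base_lr=0.001, max_lr=0.01, step_size=100):
    """Return the learning rate at a given iteration under a triangular cycle.

    One full cycle lasts 2 * step_size iterations: step_size iterations going
    up from base_lr to max_lr, then step_size iterations coming back down.
    """
    cycle_pos = iteration % (2 * step_size)
    if cycle_pos <= step_size:
        fraction = cycle_pos / step_size                    # rising half
    else:
        fraction = (2 * step_size - cycle_pos) / step_size  # falling half
    return base_lr + (max_lr - base_lr) * fraction

for it in (0, 50, 100, 150, 200):
    print(it, triangular_clr(it))
```

Because the schedule is indexed by iteration rather than epoch, it would typically be applied per batch, for example from a custom callback.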

Implementing Learning Rate Schedulers in Keras

Implementing learning rate schedulers in Keras is straightforward. We follow these steps:

1. First, import the necessary libraries.

from keras.optimizers import SGD
from keras.callbacks import LearningRateScheduler

2. Next, define the learning rate scheduler function.

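def lr_scheduler(epoch, lr):
    # Keras passes the current rate as lr, so scale it only every 10th epoch
    if epoch > 0 and epoch % 10 == 0:
        return lr * 0.1  # Adjust the factor as needed
    return lr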

3. Then compile the model with an optimizer and create the scheduler callback. The scheduler is a callback, so it is passed to fit rather than attached to the optimizer.

model.compile(optimizer=SGD(learning_rate=0.1), loss='mean_squared_error', metrics=['accuracy'])
scheduler = LearningRateScheduler(lr_scheduler)

4. Include the scheduler in the fit function's callbacks list.

model.fit(X_train, y_train, epochs=50, callbacks=[scheduler])


The complete implementation framework would be like the following example:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
from keras.callbacks import LearningRateScheduler

# Placeholder dimensions and random data so the example runs; replace with your own dataset
input_dim, num_classes = 20, 3
X_train = np.random.rand(1000, input_dim)
y_train = np.eye(num_classes)[np.random.randint(num_classes, size=1000)]  # one-hot labels

# Create a simple neural network model
model = Sequential([
    Dense(units=128, activation='relu', input_shape=(input_dim,)),
    Dense(units=64, activation='relu'),
    Dense(units=num_classes, activation='softmax')
])

# Compile the model
model.compile(optimizer=SGD(learning_rate=0.1), loss='categorical_crossentropy', metrics=['accuracy'])

# Define the learning rate scheduler function
def step_decay(epoch, lr):
    initial_lr = 0.1   # Initial learning rate
    drop_factor = 0.5  # Factor by which the learning rate will be reduced
    epochs_drop = 10   # Number of epochs after which to drop the learning rate
    # Keras passes a 0-indexed epoch, so the first drop happens at the 11th epoch
    new_lr = initial_lr * np.power(drop_factor, np.floor(epoch / epochs_drop))
    return float(new_lr)

# Create a LearningRateScheduler callback
lr_scheduler = LearningRateScheduler(step_decay)

# Train the model with the learning rate scheduler callback
model.fit(X_train, y_train, epochs=50, callbacks=[lr_scheduler])


The learning rate scheduler will reduce the learning rate by a factor of 0.5 every 10 epochs. The initial learning rate is 0.1. Therefore, the learning rate schedule will look like this:

Epochs 1-10: Learning rate = 0.1

Epochs 11-20: Learning rate = 0.1 * 0.5 = 0.05

Epochs 21-30: Learning rate = 0.05 * 0.5 = 0.025

Epochs 31-40: Learning rate = 0.025 * 0.5 = 0.0125

Epochs 41-50: Learning rate = 0.0125 * 0.5 = 0.00625

The output of this code is the training log of the neural network, with loss and accuracy values for each epoch. To also print the adjusted learning rate at the start of each epoch, create the callback with verbose=1, as in LearningRateScheduler(step_decay, verbose=1).

Choosing the Right Scheduler

Selecting the appropriate learning rate scheduler depends on your model's architecture, dataset, and training characteristics. Experimentation is vital to finding the best fit for your specific task. Start with a simple scheduler like Step Decay and gradually explore other options to observe their impact on your model's performance.

Learning rate schedulers in Keras provide a dynamic and automated approach to adjusting the learning rate during training. These tools can help your model converge faster, avoid stagnation, and improve overall performance. By understanding the different types of schedulers and how to implement them, you can empower yourself to train more effective deep-learning models. Remember that no one-size-fits-all solution exists, so be prepared to experiment and fine-tune the best scheduler for your specific use case.

Frequently Asked Questions

Why do I need a learning rate scheduler?

Learning rate schedulers can help balance rapid convergence at the beginning of training with finer adjustments as the optimization process progresses. This can lead to faster convergence and better final performance.

How does the LearningRateScheduler work?

The LearningRateScheduler callback in Keras takes a function that maps the current epoch index (starting at 0) and the current learning rate to a new learning rate. Keras calls this function at the beginning of each epoch and sets the optimizer's learning rate to the returned value.
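
The calling pattern can be imitated without Keras to see how a schedule function behaves. In the sketch below, run_schedule is a hypothetical stand-in for the callback's per-epoch loop, and the two schedule functions are illustrative assumptions.

```python
def run_schedule(schedule, initial_lr, epochs):
    """Hypothetical stand-in for the callback loop: call schedule(epoch, lr) once per epoch."""
    lr = initial_lr
    history = []
    for epoch in range(epochs):
        lr = schedule(epoch, lr)  # the returned value becomes the current rate
        history.append(lr)
    return history

def compounding(epoch, lr):
    # Scales the current rate, so the reduction compounds every epoch
    return lr * 0.5

def absolute(epoch, lr):
    # Ignores the current rate and recomputes it from the epoch index
    return 0.1 * 0.5 ** (epoch // 2)

print(run_schedule(compounding, 0.1, 4))
print(run_schedule(absolute, 0.1, 4))
```

The difference matters in practice: a function that multiplies the incoming lr shrinks the rate on every call, while one that computes the rate from the epoch index only changes it when the formula says so.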

What is exponential decay in learning rate scheduling?

Exponential decay shrinks the learning rate smoothly by multiplying it by a constant factor over time, rather than dropping it in discrete steps. It typically follows a formula like new_lr = initial_lr * decay_rate^(epoch/decay_steps), where decay_rate is between 0 and 1.

How does cosine annealing work in Keras?

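Cosine annealing is a technique where the learning rate follows a cosine curve: it starts high and decreases smoothly to a minimum value. In the warm-restart variant, the rate then jumps back up and the cycle repeats, which can help models escape local minima and find better optima during training. Keras provides both variants as the CosineDecay and CosineDecayRestarts schedules in keras.optimizers.schedules.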
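
A minimal sketch of a single cosine annealing cycle without restarts, written as a per-epoch function. The name cosine_annealing and the default bounds are assumptions for illustration.

```python
import math

def cosine_annealing(epoch, lr=None, max_lr=0.1, min_lr=0.001, total_epochs=50):
    """Return the learning rate for a 0-indexed epoch on a half-cosine curve.

    The rate starts at max_lr, reaches min_lr at total_epochs, and stays there.
    The lr argument is ignored; it is only there for callback compatibility.
    """
    progress = min(epoch, total_epochs) / total_epochs     # 0.0 to 1.0
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))    # 1.0 down to 0.0
    return min_lr + (max_lr - min_lr) * cosine

for epoch in (0, 25, 50):
    print(epoch, cosine_annealing(epoch))
```

The curve is flat near the start and the end and steepest in the middle, so the rate lingers at high and low values and moves quickly between them.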

Can I create a custom learning rate schedule in Keras?

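Yes. Keras lets you define custom learning rate schedules by writing a function that calculates the learning rate from the current epoch, and then passing that function to the LearningRateScheduler callback. For schedules that change on every optimizer step, you can instead subclass keras.optimizers.schedules.LearningRateSchedule and pass an instance to the optimizer.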
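
As one example of a custom schedule, the sketch below combines a linear warmup with exponential decay afterwards. The function name, the warmup length, and the decay rate are all assumptions chosen for illustration; the (epoch, lr) signature matches what the LearningRateScheduler callback passes in.

```python
def warmup_then_decay(epoch, lr=None, peak_lr=0.1, warmup_epochs=5, decay_rate=0.9):
    """Ramp the rate up linearly for warmup_epochs, then decay it each epoch.

    The epoch is 0-indexed and the lr argument is ignored.
    """
    if epoch < warmup_epochs:
        # Linear ramp: reaches peak_lr on the last warmup epoch
        return peak_lr * (epoch + 1) / warmup_epochs
    # After warmup, multiply by decay_rate once per epoch
    return peak_lr * decay_rate ** (epoch - warmup_epochs + 1)

for epoch in range(8):
    print(epoch, warmup_then_decay(epoch))
```

With Keras installed, a function like this would be used as LearningRateScheduler(warmup_then_decay) in the callbacks list passed to fit.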


This article discussed LR schedulers in Keras, exploring their importance, types, implementation, and how to choose among them.


Happy Learning!
