Code360 powered by Coding Ninjas X Code360 powered by Coding Ninjas X
Table of contents
Exploration-Exploitation in Epsilon Greedy Algorithm
Epsilon Greedy Action Selection
Epsilon greedy policy improvement
Pseudocode of Epsilon Greedy
Implementation of Epsilon Greedy Algorithm
Example of Epsilon Greedy Algorithm
Applications of Epsilon Greedy Algorithm
Advantages and Disadvantages of Epsilon Greedy Algorithm
Advantages of Epsilon Greedy Algorithm
Disadvantages of Epsilon Greedy Algorithm
Frequently Asked Questions
Key Takeaways
Last Updated: Mar 27, 2024

Epsilon Greedy Algorithm

Author Tashmit
1 upvote
Master Python: Predicting weather forecasts
Ashwin Goyal
Product Manager @


Let us say your friend group really enjoys going to the pizza parlor regularly. But one day, one of your friends said to visit the Chinese restaurant that opened a week back. The whole group is in a dilemma, whether you guys should continue to see the pizza parlor you already know is fantastic or try a new restaurant that could be worse or better than the pizza parlor.

Source: Link

Now, this is the point where you have to make a decision. 

Similarly, computers also learn to make decisions by making mistakes, trying new things, and sometimes taking the same conclusion they took last. This kind of learning is known as Reinforcement learning, a branch of Artificial Intelligence. According to Reinforcement learning, the dilemma to eat at a pizza parlor or the new Chinese restaurant is known as the exploration-exploitation tradeoff. This kind of problem is called the multi-armed bandit problem.

Exploration-Exploitation in Epsilon Greedy Algorithm

Exploitation is when the agent knows all his options and chooses the best option based on the previous success rates. Whereas exploration is the concept where the agent is unaware of his opportunities and tries to explore other options to better predict and earn rewards. As in the example above, going to the same pizza parlor and ordering the best pizza is an example of exploitation; here, the user knows all his options and selects the best option. On the other hand, going to the new Chinese restaurant and trying new things would come under exploration. The user is exploring new things; it could be better or worse, the user is unaware of the result.

To know about Machine Learning click here.

Get the tech career you deserve, faster!
Connect with our expert counsellors to understand how to hack your way to success
User rating 4.7/5
1:1 doubt support
95% placement record
Akash Pal
Senior Software Engineer
326% Hike After Job Bootcamp
Himanshu Gusain
Programmer Analyst
32 LPA After Job Bootcamp
After Job

Epsilon Greedy Action Selection

The epsilon greedy algorithm chooses between exploration and exploitation by estimating the highest rewards. It determines the optimal action. It takes advantage of previous knowledge to choose exploitation, looks for new options, and select exploration. 

Source: Link

Mathematically it is represented as:

Source: Link

The decision for action is made by the following method:

Source: Link

Epsilon greedy policy improvement

The policy improvement is a theorem that states For any epsilon greedy policy π, the epsilon greedy policy π' concerning qπ is an improvement. Therefore, the reward for π' will be more. The inequality is because the max function returns a more excellent value than the arbitrary weighted sum.

Pseudocode of Epsilon Greedy

Source: Link

Implementation of Epsilon Greedy Algorithm

A sample code for implementation of epsilon greedy algorithm for choosing the restaurant is as follows:

#Creating class
class Restaurant:
#Defining the various choices
  Chinese_rest= [0,0,0,0]
#Creating function for Exploration 
  def exploration():
    return random.choice(Pizza_parlor)
#Creating function for Exploitation 
  def exploitation():
    return Chinese_rest.index(max(Chinese_rest))
  def Choose_restaurant():
    if random.random()>epsilon:

Example of Epsilon Greedy Algorithm

A classic problem named a multi-armed bandit problem is an example of a greedy epsilon algorithm. In this problem n arms or bandits are provided to the machine with the probability rate of success. Each arm is pulled out one by one, and based on the success rate, the reward is given. If a success, the reward is incremented by 1; otherwise, the reward is 0. Our objective is to pull out the arms sequentially to have the maximum reward at the end. 

Bandit 1 Bandit 2 Bandit 3 Bandit 4 Bandit 5
Success rate: 25% Success rate: 67% Success rate: 79% Success rate: 12% Success rate: 55%

The Epsilon greedy algorithm does not just work on the pre-known notion; it explores various other ways with the method of tries and errors to get rewards. We will set the epsilon exploration probability as 10%, and run it for 100 cycles for this case. At the beginning of the algorithm, the agent will be at the exploration stage during the first few cycles, i.e., it will explore all possible arms to get the reward. 

Later, as the name suggests, our agent is greedy; therefore, its first preference would be picking the Bandit 3 as it has the maximum success rate. Following this, the second preference would be Bandit 2; now, the agent knows all its bandit's success rates and would select those with high success rates to get the maximum reward.

Source: Link

Applications of Epsilon Greedy Algorithm

  • Predict stock prices
  • Summarizing texts
  • Chatbot 
  • Decision-making Algorithms

Advantages and Disadvantages of Epsilon Greedy Algorithm

Advantages of Epsilon Greedy Algorithm

The significant advantage of the epsilon greedy algorithm is that it gets the opportunity to learn from past experiences, like other decision-making models, and explore new outcomes. It can explore new situations and have diverse knowledge, which helps in improved decision-making. 

Disadvantages of Epsilon Greedy Algorithm

Epsilon greedy algorithm could sometimes explore new parameters and decide which user is not satisfied. 

Also Read - Kadanes Algorithm

Frequently Asked Questions

Q1. What is Epsilon probability?

Ans. It represents an arbitrarily small positive number in regression.


Q2. What is Epsilon in Reinforcement Learning?

Ans. Whenever we select actions based on our already known values, it is known as Epsilon. 


Q3. Why does Epsilon decay in reinforcement learning?

Ans. A decay in epsilon value gives the user an optimal action and the desired output.

Key Takeaways

The Epsilon greedy algorithm is all about exploring new parameters and exploiting the already known facts to make a better decision. This article included the fundamental understating of the Epsilon greedy algorithm, its pseudocode, along with an example of practical implementation. Its applications, advantages, and disadvantages. To get an in-depth understanding of Reinforcement Learning, check out this article.

Previous article
Next article
What is Genetic Algorithm?
Live masterclass