Table of contents
What is reinforcement learning?
Reinforcement Learning: The constituent elements
How does Reinforcement learning work?
Reinforcement Learning: The approaches
Reinforcement learning algorithms
Applications of Reinforcement Learning
Advantages of Reinforcement learning
Disadvantages of Reinforcement learning
Frequently Asked Questions
Key Takeaways
Last Updated: Mar 27, 2024

Introduction to Reinforcement Learning

Author Pratyksh


Ever wondered how the recommendation system on your favorite website works, or how the NPCs in the biggest games decide what to do? The generic answer is machine learning. Although machine learning is often viewed as a monolith, it has several sub-types. This article covers one of them: reinforcement learning. It gets less attention than supervised and unsupervised learning, but it is one of the most prominent topics in Artificial Intelligence, and its popularity is only increasing.

What is reinforcement learning?

Reinforcement Learning (RL) is a feedback-based Machine Learning approach in which an agent learns how to behave in a given environment by executing actions and observing the outcomes of those actions. For each good action, the agent receives positive feedback (a reward); for each poor action, it receives negative feedback (a penalty). The agent learns via trial and error, and as it gains experience, it learns how to complete the task more effectively.


Reinforcement Learning: The constituent elements

Before we dive into the workings of Reinforcement learning, let's understand a few technical terms that are used very frequently when talking about RL problems.

  • Agent: An entity that can perceive its surroundings and act on them.
  • Environment: The world, physical or simulated, in which the agent operates.
  • Action: A move made by the agent.
  • State: The current situation of the agent in the environment.
  • Reward: Immediate feedback from the environment after it evaluates the agent's action.
  • Policy: The rule the agent uses to choose its next action based on the current state.
  • Value: The expected long-term reward an agent would earn by performing an action in a specific state, as opposed to the immediate reward.
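
The terms above fit together in a simple interaction loop. The following is a minimal sketch, not code from any RL library: `CorridorEnv`, `random_policy`, and `run_episode` are hypothetical names, and the reward values are assumptions chosen purely for illustration.

```python
import random


class CorridorEnv:
    """Hypothetical environment: a 1-D corridor with cells 0..length-1.

    The agent starts in cell 0 and the episode ends at the last cell.
    """

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (move left) or +1 (move right); walls clamp the position
        self.state = max(0, min(self.length - 1, self.state + action))
        done = self.state == self.length - 1
        # Assumed rewards: +1.0 for reaching the goal, -0.1 for every other step
        reward = 1.0 if done else -0.1
        return self.state, reward, done


def random_policy(state):
    """A policy maps a state to an action; this one ignores the state."""
    return random.choice([-1, 1])


def run_episode(env, policy, max_steps=100):
    """Run one episode and return the total (undiscounted) reward."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                  # agent picks an action
        state, reward, done = env.step(action)  # environment responds
        total_reward += reward
        if done:
            break
    return total_reward
```

Calling `run_episode(CorridorEnv(), random_policy)` plays one episode with random moves; swapping in a better policy is what learning is supposed to achieve.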

How does Reinforcement learning work?

Games are one of the easiest ways to understand an RL problem.

Consider the classic game PacMan, in which the agent's (PacMan's) purpose is to consume the food in the grid while dodging the ghosts in its path. 

In this example, the grid world is the environment in which the agent operates and interacts. The agent is rewarded for consuming food and punished if it is caught by a ghost (it loses the game).

A state represents the agent's position in the grid environment, and a high overall cumulative reward corresponds to the agent winning the game.
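
To make "cumulative reward" concrete, here is a small sketch of how a sequence of rewards is commonly summed in RL. The discount factor `gamma` is an assumption not mentioned in this article: it is a standard way to weight near-term rewards more heavily than distant ones. The function name and the reward numbers are hypothetical.

```python
def discounted_return(rewards, gamma=0.9):
    """Return r0 + gamma*r1 + gamma^2*r2 + ... for a list of rewards."""
    g = 0.0
    # Work backwards so each step folds in the already-discounted future
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical PacMan episode: three food pellets, then caught by a ghost
episode_rewards = [1.0, 1.0, 1.0, -10.0]
print(discounted_return(episode_rewards, gamma=0.9))
```

With `gamma=1.0` this reduces to a plain sum of the rewards.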

Now that we've defined each constituent, let's take a look at how we can solve the problem!



To construct an optimal strategy, the agent must balance exploring new states, which may reveal better rewards, against exploiting what it already knows to maximize its total reward. This is referred to as the Exploration vs. Exploitation trade-off. Striking this balance may require short-term sacrifices: the agent sometimes takes an action that looks worse right now so that it gathers enough knowledge to make the best overall decision in the future.
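
One common way to handle this trade-off is the epsilon-greedy rule, sketched below. This is an illustrative technique that the article does not name; `epsilon_greedy` and its parameters are hypothetical names, and `q_values` is assumed to be a list with one estimated value per action.

```python
import random


def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """Pick an action index from a list of estimated action values.

    With probability epsilon, explore (uniformly random action);
    otherwise exploit (action with the highest current estimate).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                        # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])    # exploit
```

A larger `epsilon` means more exploration. In practice it is often started high and reduced as the agent gains experience.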

Reinforcement Learning: The approaches

There are primarily three approaches to implementing reinforcement learning in ML, and they are as follows:

  1. Value based
    The value-based approach is concerned with determining the optimal value function, which gives the maximum value achievable from a given state under any policy.
  2. Policy based
    The policy-based approach searches directly for the policy that yields the greatest future reward, without using a value function. In this technique, the agent strives to adopt a strategy in which the action taken at each step contributes to maximizing the future reward. A policy can be one of two kinds:
    1. Stochastic: Here, probability determines the action.
    2. Deterministic: The policy always produces the same action for a given state.
  3. Model based
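    In the model-based approach, the agent learns or is given a model of the environment, that is, a way to predict the next state and the immediate reward that follow an action. It then learns optimal behavior indirectly, by planning with that model rather than relying only on direct trial and error.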
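
The difference between the two kinds of policy can be sketched in a few lines. The state names, action names, and probabilities below are hypothetical, loosely inspired by the PacMan example.

```python
import random

# Deterministic policy: each state maps to exactly one action
deterministic_policy = {
    "ghost_near": "flee",
    "food_near": "eat",
    "clear": "explore",
}

# Stochastic policy: each state maps to a probability distribution over actions
stochastic_policy = {
    "ghost_near": {"flee": 0.9, "explore": 0.1},
    "food_near": {"eat": 0.8, "explore": 0.2},
    "clear": {"explore": 1.0},
}


def act_deterministic(state):
    """Always returns the same action for the same state."""
    return deterministic_policy[state]


def act_stochastic(state, rng=random):
    """Samples an action according to the state's probabilities."""
    dist = stochastic_policy[state]
    actions = list(dist.keys())
    weights = list(dist.values())
    return rng.choices(actions, weights=weights, k=1)[0]
```

Calling `act_deterministic("ghost_near")` returns `"flee"` every time, while `act_stochastic("ghost_near")` usually returns `"flee"` but occasionally `"explore"`.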


Reinforcement learning algorithms

The algorithms used most often in Reinforcement learning include:

  1. Q-Learning
    It is a value-based, off-policy algorithm built on temporal difference (TD) learning. TD methods update their value estimates by comparing predictions made at consecutive time steps.
  2. SARSA
    SARSA (State-Action-Reward-State-Action) is an on-policy temporal difference algorithm. It updates its estimates using the next action chosen by the same policy that decided the current action.
  3. DQN
    Deep Q-Network, or DQN, is Q-learning with a neural network in place of the Q-table. Defining and updating a Q-table is impractical in an environment with a large state space. To address this, DQN uses a neural network to approximate the Q-values of the actions available in a given state.
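
The off-policy versus on-policy distinction between Q-Learning and SARSA shows up in a single line of their update rules. The sketch below uses a tabular Q stored in a dictionary; the function names, the learning rate `alpha`, and the discount factor `gamma` are assumptions for illustration, and terminal-state handling is omitted for brevity.

```python
from collections import defaultdict


def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Off-policy: bootstrap from the best action available in s_next,
    regardless of which action the agent will actually take there."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy: bootstrap from a_next, the action the current policy
    actually chose in s_next."""
    Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])


# Usage: Q maps (state, action) pairs to estimated values, defaulting to 0.0
Q = defaultdict(float)
q_learning_update(Q, s=0, a="right", r=1.0, s_next=1, actions=["left", "right"])
```

DQN keeps the same Q-learning target but replaces the dictionary with a neural network that estimates the values.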

Applications of Reinforcement Learning

  1. Self Driving cars
    There are several factors to take into account with self-driving cars, such as speed limits in various locations, drivable zones, and collision avoidance, to name a few. Motion planning, trajectory optimization, and scenario-based policies for highways are some aspects that can be automated with the help of reinforcement learning.
  2. Robotics
    There has been a lot of progress in using RL in robotics. Reinforcement learning is one of the primary machine learning approaches being explored in controlled environments where industrial robots may work from fixed positions under less hazardous conditions.
  3. Gaming
    Reinforcement learning is often used to build agents for complex interactive video games. We saw the example of PacMan earlier. AlphaGo, developed by Google's DeepMind, is another well-known RL-based program that shot to fame after defeating a professional human player in the game of Go.
  4. Traffic control
    With the help of reinforcement learning, it's possible to create traffic systems that not only provide insights into historical data but also help city planners understand the population's behavioral trends.

Advantages of Reinforcement learning

  1. Reinforcement Learning is used to address complicated issues that conventional approaches cannot handle.
  2. The model can fix mistakes made during the training phase.
  3. In RL, training data is gathered through the agent's direct interaction with the environment. The training data is the learning agent's own experience, not a separate dataset that must be supplied to the algorithm. This considerably reduces the workload of whoever supervises the training process.
  4. Traditional machine learning algorithms are built to excel at individual subtasks, with no regard for the larger picture. RL, on the other hand, does not break the problem down into subproblems; instead, it works straight to maximize the long-term goal.

Disadvantages of Reinforcement learning

  1. RL approaches produce training data on their own by interacting with the environment. As a result, the rate of data gathering is constrained by the environment's dynamics. High latency environments slow down the learning curve.
  2. The learning agent can trade off short-term benefits for long-term advantages. While this basic premise makes RL valuable, it also makes it difficult for the agent to find the best policy.


Frequently Asked Questions

  1. How is reinforcement learning different from supervised learning?
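    In supervised learning, the model is trained on labeled examples that state the correct answer. In reinforcement learning, the agent has no such labels and learns autonomously from the feedback (rewards and penalties) it receives for its actions.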
  2. Is Reinforcement Learning only trial-and-error learning, or does it also require planning?
    Modern reinforcement learning is concerned with both trial-and-error learning and deliberative planning with a model of the environment. Here, a model means anything the agent can use to predict how the environment will behave in response to its actions.
  3. Is Q-Learning a subset of Reinforcement learning?
    Yes. Q-learning is one of the most widely used algorithms for solving RL problems, so it falls under reinforcement learning. In Q-learning, an agent attempts to learn the best policy from its previous interactions with the environment. The agent's prior experience is a sequence of states, actions, and rewards.

Key Takeaways

Now we have a fair idea about Reinforcement Learning, its workings, the algorithms involved, and some of its wide applications. This is just the tip of the iceberg as far as reinforcement learning is concerned. You can check out articles on other types of Machine learning too. To learn more about Supervised Learning, check this article out, and to find out more about Unsupervised learning, you can check this article out.
If you're interested in learning about Machine Learning in-depth, you should check out this course.

Happy Learning!
