Introduction
Machine learning tasks are generally grouped into three categories: supervised learning, unsupervised learning, and reinforcement learning (RL). We have discussed the first two in detail in earlier blogs, which you may wish to read to clearly differentiate between the three. We also have a blog explaining RL in detail, so here we will just revise the basic idea.
RL models are trained using a reward system. Consider a child learning to ride a bicycle. The skill comes from experience, which can be good or bad. Initially, the child makes mistakes and may fall. This counts as a bad experience, and the child learns to avoid it over time. The joy of riding can be seen as the reward for learning. Over time, the child learns what to do to gain the reward and avoid the punishment (falling off the bicycle).
RL models work on a similar idea. The RL agent explores an environment, and its objective is to take actions that lead to higher rewards. Initially, it takes random actions to discover which actions lead to better outcomes and which should be avoided. This is a high-level overview of how RL works.
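To make the explore-then-exploit idea concrete, here is a minimal sketch of an epsilon-greedy agent on a toy problem with three actions. Everything in it is an assumption made for illustration: the function name `run_bandit`, the reward values, the noise level, and the schedule that lowers the exploration rate over time. It is not taken from any particular RL library.

```python
import random

def run_bandit(true_rewards, steps=2000, seed=0):
    """Epsilon-greedy agent: mostly random at first, mostly greedy later."""
    rng = random.Random(seed)
    n = len(true_rewards)
    estimates = [0.0] * n   # the agent's running estimate of each action's reward
    counts = [0] * n        # how often each action has been tried
    for t in range(steps):
        # Hypothetical schedule: explore fully at the start, then decay to 5%.
        epsilon = max(0.05, 1.0 - t / (steps / 2))
        if rng.random() < epsilon:
            action = rng.randrange(n)                          # explore
        else:
            action = max(range(n), key=lambda a: estimates[a])  # exploit
        # Noisy reward from the environment (negative values act as punishment).
        reward = true_rewards[action] + rng.gauss(0, 0.1)
        counts[action] += 1
        # Incremental average of the rewards observed for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

# Action 0 is a "fall" (punishment); action 2 gives the highest reward.
estimates, counts = run_bandit([-1.0, 0.2, 1.0])
best_arm = max(range(3), key=lambda a: estimates[a])
print("best action:", best_arm, "times each action was tried:", counts)
```

As the exploration rate drops, the agent should spend most of its steps on the action it has found to be most rewarding, much like the child who stops repeating the moves that led to a fall.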
Prerequisites
Before moving forward, readers are strongly advised to learn the basics of reinforcement learning. This will make it easier to grasp the key points of this blog.
Applications of Reinforcement Learning
Self-driving cars
AI models in self-driving cars can apply the principles of reinforcement learning. An autonomous driving model has several aspects, such as trajectory optimisation and motion planning. These can be learned through reinforcement learning, where the agent is trained to take actions that avoid a collision, which is the punishment in this case. Tesla, a market leader in the autonomous vehicle industry, makes extensive use of reinforcement learning in its AI models.
Robotics
RL principles are a core part of the robotics industry. Industrial robots can be trained to attain maximum efficiency, as robots are better suited than humans to this kind of work. Moreover, some tasks may not be feasible for humans at all.
Google's data centers are now largely automated by AI that keeps the servers up and running while minimising costs. This has led to savings of up to 40% in the energy used for cooling. The model takes snapshots of the data center's state at regular intervals and feeds them to a neural network, which predicts an optimal course of action to minimise energy consumption.
Healthcare
RL finds uses in the healthcare industry too. Here, reinforcement learning is applied to sequential decisions in order to suggest dynamic treatment regimes (DTRs). A DTR proposes an optimal course of drugs and treatments, which can minimise human error and significantly reduce the side effects of poorly scheduled medication. DTRs can propose treatments for complex conditions such as HIV and cancer.
Gaming
No surprises here either. The gaming industry makes extensive use of the principles of reinforcement learning. For example, the AI controlling the goalkeeper in FIFA is trained so that it learns from user behaviour even during a game, making the experience more realistic and fulfilling for the user. As the AI learns, it becomes tougher for the user to score. This is just one well-known example of RL in the gaming industry. AI in gaming today is smarter than ever.
Finance and trading
AI models can predict future market trends based on past data. However, the course of action is left to the user. This is where RL comes into the picture. An RL agent can decide on its own whether to buy, sell or hold a stock. It determines an optimal course of action and acts on it. IBM has a sophisticated reinforcement learning agent that automates the entire trading process.
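The buy, sell or hold decision can be sketched with tabular Q-learning on a toy market. This is only an illustration under strong assumptions: the price follows a made-up repeating cycle (`PRICES`), there are no transaction costs, and the names `step`, `train` and `best_action` are hypothetical helpers written for this blog. It does not represent how IBM's agent or any real trading system works.

```python
import random

PRICES = [10, 11, 12, 11, 10, 9]   # hypothetical repeating price cycle
HOLD, BUY, SELL = 0, 1, 2
ACTION_NAMES = ["hold", "buy", "sell"]

def step(t, holding, action):
    """Apply an action, then move one tick forward in the price cycle."""
    if action == BUY:
        holding = 1
    elif action == SELL:
        holding = 0
    next_t = (t + 1) % len(PRICES)
    # Reward: the price change over the next tick, earned only while holding.
    reward = holding * (PRICES[next_t] - PRICES[t])
    return next_t, holding, reward

def train(steps=50_000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    # One row of action values per state (position in cycle, holding flag).
    q = {(t, h): [0.0, 0.0, 0.0] for t in range(len(PRICES)) for h in (0, 1)}
    t, holding = 0, 0
    for _ in range(steps):
        state = (t, holding)
        if rng.random() < epsilon:
            action = rng.randrange(3)                           # explore
        else:
            action = max(range(3), key=lambda a: q[state][a])   # exploit
        t, holding, reward = step(t, holding, action)
        # Q-learning update towards reward plus discounted best next value.
        target = reward + gamma * max(q[(t, holding)])
        q[state][action] += alpha * (target - q[state][action])
    return q

def best_action(q, t, holding):
    """Name of the highest-valued action in the given state."""
    return ACTION_NAMES[max(range(3), key=lambda a: q[(t, holding)][a])]

q_table = train()
print(best_action(q_table, 0, 0))   # price is about to rise, no stock held
print(best_action(q_table, 2, 1))   # price is about to fall, stock held
```

In this toy setup the agent should learn to buy just before the price rises and to sell just before it falls, without being told the rule explicitly. Real markets are noisy and costly to trade in, so a practical agent would need a far richer state and reward.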
Business and Marketing
Business is all about increasing revenue and decreasing costs. Strategising and optimising advertisements is essential to maximise the ROI per ad.
With multi-agent reinforcement learning, a large number of advertisers are sub-divided and clustered into groups, each with a real-time bidding agent. A study found that this approach outperforms single-agent RL models.
Engineering
Reinforcement learning is used to optimise large-scale production systems. Meta's open-source platform, Horizon, was developed to serve this purpose. The organisation makes extensive use of RL agents to personalise the experience for every user. It can also predict the streaming quality based on factors such as the streaming buffer, ensuring that the content quality is not degraded so much that it becomes annoying for the user.
Recommendation engines
AI models can be used to recommend personalised content to users, and RL agents can be employed to track how users respond and optimise the content accordingly.