In my previous posts, I explained unsupervised learning along with other important machine learning concepts. Building on that, there are two other machine learning techniques worth knowing:
- Reinforcement Learning
- Semi-Supervised Learning
Semi-supervised Learning
Semi-supervised learning lies somewhere between supervised and unsupervised learning. It trains on a combination of labeled and unlabeled data. This means that for some of the data the relationship has to be figured out by the algorithm, whereas for the rest it is provided beforehand. It is useful in areas like web page scraping or classification, speech recognition, and genetic sequencing. In practice, much real-world data has few or no labels, so it is often processed with unsupervised or semi-supervised learning algorithms.
Today, we will explore reinforcement learning. Reinforcement learning has been called the "hope of artificial intelligence". You will soon know why. Let us dive into this exciting world.
What is Reinforcement Learning?
Consider the following example, which sets a foundation for how reinforcement learning can be interpreted: How do humans learn? We learn by observing our surroundings and slowly trying to adapt to them. We try something, and based on the result of that trial we judge the situation (incident or experiment) to be positive or negative, which means we LEARN from it. We interact with the people and things around us and learn.
Consider one more example: How does a kid learn to speak a language? The kid will first learn the letters of the language and then try small words. Further, the child learns more complex words and eventually learns to form full-fledged sentences.
So basically, in reinforcement learning, a machine interacts with its surroundings (much like the human in the example above), receives rewards for performing actions, and learns from them. In short, it is the computational approach to learning from action.
The reward and feedback mechanism gives the machine insight, so that it learns that certain actions yield a better reward than others. This reward is also known as the "reinforcement signal". The machine is not told which actions to perform; it has to discover and learn over time which actions yield the maximum reward. There is a mapping from an input to an output, i.e. if "this" is the input, then "that" is the output. The ultimate goal is to achieve the maximum total reward for the actions performed. Rewards are not necessarily given after every step; the reward (or feedback) is often delayed. Each reward is simply a number, a scalar feedback signal.
After reading this, you might wonder how reinforcement learning differs from unsupervised learning. Here are some simple differences between the two:
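To make the idea of "discovering which actions yield the maximum reward" concrete, here is a minimal sketch, not a definitive implementation. It assumes a made-up environment with three actions whose average rewards are hidden from the learner, and it uses an epsilon-greedy rule (mostly pick the best-looking action, occasionally try a random one). The names `TRUE_MEAN_REWARD`, `pull`, and `run_bandit`, as well as all the numbers, are hypothetical. For simplicity the reward here arrives immediately after each action, so the sketch does not cover delayed rewards.

```python
import random

# Hypothetical environment: three actions whose average rewards are hidden
# from the learner (the numbers are made up for illustration).
TRUE_MEAN_REWARD = [1.0, 5.0, 2.0]


def pull(action, rng):
    """Return a noisy scalar reward (the reinforcement signal) for an action."""
    return TRUE_MEAN_REWARD[action] + rng.gauss(0.0, 0.5)


def run_bandit(steps=2000, epsilon=0.1, seed=0):
    """Learn by trial and error which action pays best (epsilon-greedy)."""
    rng = random.Random(seed)
    n_actions = len(TRUE_MEAN_REWARD)
    estimates = [0.0] * n_actions  # running average reward per action
    counts = [0] * n_actions       # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore: try a random action
        else:
            # exploit: pick the action with the best estimate so far
            action = max(range(n_actions), key=lambda a: estimates[a])
        reward = pull(action, rng)
        counts[action] += 1
        # incremental update of the average reward seen for this action
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


estimates, counts = run_bandit()
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print("estimated rewards:", [round(e, 2) for e in estimates])
print("times each action was tried:", counts)
print("best action found:", best_action)
```

Nobody tells the learner that the second action is the best one; it works that out purely from the numbers it receives back.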
| Reinforcement Learning | Unsupervised Learning |
| --- | --- |
| There is a mapping from input to output. | There is no mapping. |
| The main task of the algorithm is to find the best way to earn the highest (best) reward (the reinforcement signal). | The main task of the algorithm is to find relationships, patterns and structure in the given dataset. |
| Example: In a game of chess, only the end result (win/lose) is provided to the algorithm. The algorithm plays more games and learns which moves will eventually make it win. | Example: A dataset contains 2 different types of fruits with no labels (or names) whatsoever. The algorithm has to differentiate between the 2 types of fruits and group them into their respective categories. |
Terms used in Reinforcement Learning
- Action (A): The set of all possible moves the algorithm can make.
- State (S): The current situation of the environment.
- Reward (R): An immediate return (the reinforcement signal) sent from the environment to evaluate the last action.
- Policy (π): The strategy that the algorithm employs to determine the next action based on the current state.
- Value (V): The expected long-term return from the current state, as opposed to the immediate reward R.
- Q-value or action-value (Q): The Q-value is like the Value, except that it takes an extra parameter, the current action A. Qπ(S, A) refers to the long-term return from taking action A in the current state S and following policy π afterwards.
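The terms above can be tied together in a small sketch. Everything in it is an assumption made for illustration: a hypothetical corridor of 5 cells where only reaching the last cell is rewarded, a fixed policy that always steps right, and a discount factor `GAMMA` that makes later rewards count slightly less (the discount factor is not mentioned in this post). Because this toy environment is deterministic, the "expected" return is simply the return of a single run.

```python
# Hypothetical toy environment: a corridor of 5 cells (states 0..4).
# The agent is rewarded only when it reaches the last cell.
GOAL = 4             # terminal state
ACTIONS = (-1, +1)   # Action (A): step left or step right
GAMMA = 0.9          # discount factor (an assumption, not from the post)


def step(state, action):
    """Environment: return (next state, reward, done) for one action."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def policy(state):
    """Policy (pi): a fixed strategy that always steps right."""
    return +1


def value(state, max_steps=100):
    """Value V(S): discounted return from `state` when following the policy."""
    total, discount = 0.0, 1.0
    for _ in range(max_steps):
        if state == GOAL:
            break
        state, reward, _done = step(state, policy(state))
        total += discount * reward
        discount *= GAMMA
    return total


def q_value(state, action):
    """Q(S, A): take `action` first, then follow the policy afterwards."""
    next_state, reward, done = step(state, action)
    return reward + (0.0 if done else GAMMA * value(next_state))


print("V(0) =", round(value(0), 3))
print("Q(2, left) =", round(q_value(2, -1), 3))
print("Q(2, right) =", round(q_value(2, +1), 3))
```

Stepping right from the middle cell has a higher Q-value than stepping left, because stepping left only postpones the single reward at the end of the corridor.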
Some well-known Reinforcement Learning algorithms
- Q learning
- SARSA (State-Action-Reward-State-Action)
- Deep Q network (DQN)
- Deep deterministic policy gradient (DDPG)
We will look at Q learning and various other algorithms in detail in the upcoming posts.
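Until then, here is a rough preview of the first algorithm in the list, tabular Q learning, as a minimal sketch under stated assumptions. The task is a hypothetical corridor of 5 cells with a reward of 1 for reaching the last cell, and the hyperparameters `ALPHA` (learning rate), `GAMMA` (discount factor) and `EPSILON` (exploration rate) are arbitrary choices for illustration. The core of the algorithm is the single update line that nudges Q(S, A) towards the reward plus the discounted best Q-value of the next state.

```python
import random

N_STATES = 5         # states 0..4; state 4 is the goal (hypothetical toy task)
GOAL = N_STATES - 1
ACTIONS = (-1, +1)   # index 0 = step left, index 1 = step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2  # assumed hyperparameters


def step(state, action):
    """Environment: return (next state, reward, done) for one action."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def choose_action(q, state, rng):
    """Epsilon-greedy action choice with random tie-breaking."""
    if rng.random() < EPSILON or q[state][0] == q[state][1]:
        return rng.randrange(len(ACTIONS))
    return 0 if q[state][0] > q[state][1] else 1


def train(episodes=500, max_steps=1000, seed=0):
    """Run tabular Q learning and return the learned Q-table."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action index]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            a = choose_action(q, state, rng)
            next_state, reward, done = step(state, ACTIONS[a])
            # Q learning target: reward now plus best estimated value later
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
            if done:
                break
    return q


q_table = train()
greedy_policy = ["left" if q[0] > q[1] else "right" for q in q_table[:GOAL]]
print("learned policy for states 0..3:", greedy_policy)
```

Note that the only feedback the algorithm ever receives is the reward at the goal; the preference for stepping right in the earlier cells is propagated backwards through the Q-table over many episodes.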
Applications of Reinforcement learning
Following are some common applications of Reinforcement Learning:
1. Robotics
Reinforcement learning is used where robots need to make decisions. For example, a robot exploring a building has to work out how much power it needs to see the whole building and then return to its starting point. Whether or not the robot successfully reaches the starting point, it records what it encountered, gains the knowledge it needs, and trains itself further. Many warehousing facilities used by eCommerce sites and supermarkets use these intelligent robots to sort their millions of products every day, helping them deliver the right products to the right customers. Tesla's factory, for instance, has more than 160 robots that do the major part of the work on its cars to reduce the risk of defects.
2. Manufacturing
Robots use deep reinforcement learning to speed up or carry out tasks required by the manufacturing company. As with the warehouses and e-commerce sites mentioned above, these intelligent robots sort and store products efficiently so that the right products reach the right customers.
3. Finance
Reinforcement learning can be used in stock market trading, where the Q-learning algorithm learns a trading strategy from just one basic instruction: maximize the value of the company's portfolio. The algorithm weighs the factors available to it before making each decision, which can benefit the company using it.
4. Game theory
To determine the best move to win a game against an opponent, players consider various factors. A machine can be built to think likewise: it covers the possibilities that follow from a certain step and performs the move most likely to win in the end. Even when it loses, it learns from the loss and gains insights.
5. Medicine
Algorithms may be used to personalize treatments for patients, support drug discovery for newer ailments, assist with diagnosis, and even read through thousands of studies related to radiotherapy. They may also be used to monitor and predict epidemic outbreaks.
6. Computer networks
Certain algorithms can be used to find the optimum network configuration and setup for data movement, predict when and where the network might become unstable, and provide backups or alternatives for those cases.
These are just a handful of the many applications of reinforcement learning, and of machine learning in general. It will be interesting to see how this kind of technology shapes our lives in the near future.