Unit 5 ML
Reinforcement learning is also widely used in robotics, healthcare (personalized treatment planning), and
finance (portfolio optimization). The commonality across these examples is that the agent improves its
performance over time through feedback from its actions.
Key Components of a Markov Decision Process (MDP)
1. States (S):
Represent the current situation or position of the agent within the environment.
Examples: The current location of a robot in a warehouse or the configuration of a
chessboard in a game.
2. Actions (A):
The set of all possible moves or choices the agent can make in any given state.
Examples: Moving left, right, or staying still in a grid environment.
3. Transition Probability (P):
Defines the likelihood of moving from one state to another after taking a specific action.
Represented as P(s′ | s, a), where s is the current state, a is the action taken,
and s′ is the next state.
Example: In a dice game, rolling a six might lead to a certain state with a probability
of 1/6.
4. Rewards (R):
The immediate feedback or outcome received after the agent takes an action in a state.
Rewards guide the agent toward desirable outcomes by reinforcing good actions and
discouraging bad ones.
Examples: Gaining points for completing a task or receiving a penalty for an error.
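To make these components concrete, the sketch below (in Python) encodes a tiny, made-up two-state MDP as plain dictionaries; the state names, action names, probabilities, and rewards are all invented for illustration and are not part of the notes above.

```python
import random

# A toy MDP, invented for illustration: two states ("A", "B") and two actions.
states = ["A", "B"]
actions = ["stay", "move"]

# Transition probabilities P(s' | s, a): transition[s][a] maps next state -> probability.
transition = {
    "A": {"stay": {"A": 1.0}, "move": {"A": 0.2, "B": 0.8}},
    "B": {"stay": {"B": 1.0}, "move": {"A": 0.9, "B": 0.1}},
}

# Immediate rewards R(s, a) for taking action a in state s.
reward = {
    "A": {"stay": 0.0, "move": 1.0},
    "B": {"stay": 0.5, "move": -1.0},
}

def step(s, a):
    """Sample the next state from P(. | s, a) and return (next_state, reward)."""
    next_states = list(transition[s][a])
    probs = [transition[s][a][s2] for s2 in next_states]
    s_next = random.choices(next_states, weights=probs, k=1)[0]
    return s_next, reward[s][a]

print(step("A", "move"))  # e.g. ('B', 1.0), since P(B | A, move) = 0.8
```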
Applications of MDP
By using MDPs, reinforcement learning can solve complex decision-making problems with structured
and mathematical precision.
Q-Learning
Q-Learning is a widely used and powerful model-free reinforcement learning algorithm. Its primary goal
is to train an agent to determine the most beneficial action to take in any given state to maximize its
cumulative rewards over time. Unlike other methods, Q-Learning does not require prior knowledge of the
environment's dynamics, making it "model-free."
Policy:
The agent's strategy for choosing actions based on the Q-values.
An optimal policy ensures that the agent always selects the action with the highest Q-value.
Exploration vs. Exploitation:
The agent must balance exploration (trying new actions to gather information) against exploitation (choosing the best-known action to maximize reward).
Q-Learning Function
The Q-value for a state-action pair is updated using the following rule:
Q(s, a) ← Q(s, a) + α [ r + γ · max_a′ Q(s′, a′) − Q(s, a) ]
where α is the learning rate, γ is the discount factor, r is the immediate reward, and s′ is the next state.
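As a small worked illustration with arbitrarily chosen values: suppose Q(s, a) = 0, the agent receives reward r = 1, the best Q-value available in the next state is 0.5, the learning rate is α = 0.1, and the discount factor is γ = 0.9. The update then gives Q(s, a) ← 0 + 0.1 × (1 + 0.9 × 0.5 − 0) = 0.145.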
Q-Learning Algorithm
The Q-Learning algorithm operates through the following steps:
1. Initialization:
The Q-table is initialized (typically with zeros) for every state-action pair.
2. State Selection:
The agent observes the current state of the environment.
3. Action Selection:
The agent selects an action based on current Q-values or an exploration strategy.
o Exploration: Trying out new actions.
o Exploitation: Using the current best-known action.
4. Receiving Reward:
The agent performs the chosen action, receives an immediate reward, and observes the next state.
5. Q-Value Update:
The Q-value of the state-action pair is updated using the Q-Learning update rule.
6. Iteration:
Repeat this process until the agent learns an optimal policy for all states.
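As a minimal sketch of these six steps in Python, the loop below trains a tabular agent on a tiny, made-up corridor environment; the environment itself, its rewards, and the values of alpha, gamma, and epsilon are illustrative assumptions rather than details from the notes.

```python
import random
from collections import defaultdict

N_STATES, GOAL = 5, 4          # hypothetical 5-cell corridor, goal in the last cell
ACTIONS = [-1, +1]             # move left or move right

alpha, gamma, epsilon = 0.1, 0.9, 0.2   # learning rate, discount factor, exploration rate
Q = defaultdict(float)                  # 1. Initialization: all Q-values start at 0

def step(state, action):
    """Toy environment dynamics: +10 for reaching the goal, -1 for every other move."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (10 if nxt == GOAL else -1)

for episode in range(500):
    state = 0                                            # 2. State selection
    while state != GOAL:
        if random.random() < epsilon:                    # 3. Action selection: exploration
            action = random.choice(ACTIONS)
        else:                                            #    ... or exploitation
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        nxt, reward = step(state, action)                # 4. Receiving reward
        best_next = max(Q[(nxt, a)] for a in ACTIONS)
        # 5. Q-value update, using the rule shown above
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = nxt                                      # 6. Iterate until the goal is reached

# The learned (greedy) policy: +1 means "move right" toward the goal.
print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)})
```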
Advantages of Q-Learning
1. Model-Free:
Requires no prior model of the environment's dynamics.
2. Off-Policy Learning:
Can learn the optimal policy even while following an exploratory behavior policy.
3. Simplicity:
Easy to implement with a simple Q-table for small, discrete state spaces.
Scenario:
Environment: A maze where the agent must reach the goal while avoiding obstacles.
Rewards (R): For example, a positive reward for reaching the goal, a penalty for hitting an obstacle, and a small negative reward for each step taken.
Applications of Reinforcement Learning
1. Robotics
Task Automation: RL is used to train robots to perform tasks like object manipulation, assembling
parts, or warehouse management.
Navigation: Autonomous robots use RL to navigate complex environments while avoiding
obstacles.
Human-Robot Interaction: RL enables robots to adapt to human behavior and work
collaboratively in shared environments.
2. Gaming
Game Playing: RL powers AI agents in video games, enabling them to learn optimal strategies
(e.g., AlphaGo mastering the game of Go).
Game Testing: Automates the testing of game mechanics and ensures challenging gameplay.
Dynamic Difficulty Adjustment: Adapts game difficulty based on the player’s skill level.
3. Autonomous Vehicles
Path Planning: RL helps vehicles learn to navigate roads and traffic efficiently.
Collision Avoidance: Ensures safety by training vehicles to avoid obstacles dynamically.
Driving Policies: Develops efficient driving strategies under varying conditions like weather or
traffic.
4. Healthcare
Personalized Treatment Plans: RL helps design optimal treatment strategies for patients, such as
adjusting medication doses.
Medical Imaging: Enhances image analysis by learning patterns in X-rays, MRIs, or CT scans.
Drug Discovery: Optimizes the process of identifying effective drug combinations.
Deep Q-Learning:
Deep Q-Learning is a method of reinforcement learning that combines the Q-Learning algorithm with
deep neural networks. It trains an agent to make the right decisions in large and complex environments.
Neural networks are used to estimate Q-Values and maximize rewards.
Applications of Deep Q-Learning
1. Game Playing:
o Deep Q-Learning is widely used in training AI to play video games. For example,
DeepMind's AlphaGo and AI agents in Atari games use it to make optimal moves and
improve their strategies.
2. Robotics:
o It helps robots learn complex tasks like object manipulation, navigation, and motion
planning without explicit programming.
3. Autonomous Vehicles:
o Deep Q-Learning is applied in self-driving cars to make decisions such as lane-changing,
obstacle avoidance, and path planning.
4. Healthcare:
o It can assist in medical diagnosis, treatment planning, and optimizing drug doses by
analyzing complex datasets.
5. Finance:
o Deep Q-Learning is used in stock trading to predict market trends and optimize
investment strategies.
Advantages of Deep Q-Learning
1. Scalability:
Handles large and continuous state spaces efficiently.
2. Efficient Learning:
Experience replay improves sample efficiency and stability.
3. Versatility:
Can be applied to problems with high-dimensional input spaces, such as images or videos
(e.g., playing Atari games).
Scenario:
Environment: A paddle-based video game in which the agent observes the game screen and must move the paddle to keep scoring.
Learning Process:
The deep neural network learns to predict the optimal paddle movement based on the game
screen, maximizing the score over time.
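The snippet below is a compact sketch of this idea in Python with PyTorch. To keep it self-contained it uses a small fully connected network on a made-up 4-dimensional state instead of raw game pixels, fills the replay buffer with dummy transitions, and picks arbitrary hyperparameters; none of these specifics come from the notes.

```python
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 4, 3    # illustrative sizes: a 4-number state, 3 paddle actions

# Online Q-network and a separate target network (both tiny, for illustration).
q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
target_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

replay = deque(maxlen=10_000)  # experience replay buffer
gamma, epsilon = 0.99, 0.1

def select_action(state):
    """Epsilon-greedy action selection from the Q-network."""
    if random.random() < epsilon:
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        return int(q_net(state).argmax())

def train_step(batch_size=32):
    """One gradient step on a sampled mini-batch: the core Deep Q-Learning update."""
    if len(replay) < batch_size:
        return
    s, a, r, s2, done = map(torch.stack, zip(*random.sample(list(replay), batch_size)))
    q_sa = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)        # Q(s, a)
    with torch.no_grad():                                              # bootstrapped target
        target = r + gamma * target_net(s2).max(1).values * (1 - done)
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Fill the buffer with dummy transitions so the sketch runs end to end.
for _ in range(64):
    s = torch.rand(STATE_DIM)
    a = torch.tensor(float(select_action(s)))
    replay.append((s, a, torch.tensor(1.0), torch.rand(STATE_DIM), torch.tensor(0.0)))
train_step()
```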
Genetic Algorithm (GA)
A Genetic Algorithm (GA) is a search and optimization technique inspired by the principles of
natural selection and genetics. It is part of evolutionary algorithms and is used to solve
optimization and search problems by mimicking biological processes like reproduction,
mutation, and survival of the fittest.
Key Components of Genetic Algorithms
1. Population:
A set of potential solutions to the optimization problem, where each solution is
represented as an individual (often encoded as a string or array, e.g., binary or real-
valued).
2. Chromosome:
Representation of a candidate solution. For example, in binary encoding, a chromosome
might look like 101011.
3. Gene:
A part of a chromosome representing a specific trait or variable in the solution. For
example, in the chromosome 101011, each digit is a gene.
4. Fitness Function:
A function that evaluates how good a solution is by assigning it a "fitness" score. Higher
fitness scores indicate better solutions.
5. Selection:
A process to choose individuals from the population for reproduction based on their
fitness. Common selection methods include:
o Roulette Wheel Selection: Probability of selection proportional to fitness.
o Tournament Selection: Selects the best individual from a randomly chosen
subset.
6. Crossover (Recombination):
A process where two parent solutions combine to create offspring. This introduces
diversity into the population. Common techniques include:
o Single-Point Crossover: Split chromosomes at one point and exchange parts.
o Two-Point Crossover: Split chromosomes at two points.
o Uniform Crossover: Randomly mix genes from both parents.
7. Mutation:
Randomly changes some genes in an individual to maintain diversity and avoid local
optima. For example, flipping a bit in binary encoding (e.g., 101011 becomes 101111).
8. Termination Criteria:
The algorithm stops when:
o A maximum number of generations is reached.
o An acceptable fitness level is achieved.
o The population converges to a solution.
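The crossover and mutation operators described above can be sketched in a few lines of Python on binary-string chromosomes. The chromosome 101011 is the example from the notes; the second parent and the mutation rate are made up for illustration.

```python
import random

def single_point_crossover(p1, p2):
    """Single-point crossover: split both parents at one point and swap the tails."""
    point = random.randint(1, len(p1) - 1)
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def uniform_crossover(p1, p2):
    """Uniform crossover: each gene is copied from either parent with equal probability."""
    return "".join(g1 if random.random() < 0.5 else g2 for g1, g2 in zip(p1, p2))

def mutate(chrom, rate=0.1):
    """Bit-flip mutation: flip each gene independently with the given rate."""
    return "".join(("1" if g == "0" else "0") if random.random() < rate else g for g in chrom)

parent1, parent2 = "101011", "110100"   # first parent taken from the notes, second is invented
print(single_point_crossover(parent1, parent2))
print(uniform_crossover(parent1, parent2))
print(mutate(parent1))
```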
Working of a Genetic Algorithm
1. Initialization:
Generate an initial population of solutions randomly.
2. Evaluation:
Compute the fitness of each individual in the population using the fitness function.
3. Selection:
Select individuals for reproduction based on their fitness.
4. Crossover and Mutation:
o Perform crossover to create new offspring.
o Apply mutation to introduce variability.
5. Replacement:
Replace the old population with the new one, keeping the best solutions.
6. Repeat:
Iterate the process until the termination criteria are met.
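Putting the six steps together, here is a minimal end-to-end sketch in Python on the classic OneMax problem (maximize the number of 1-bits in a chromosome); the problem choice, population size, chromosome length, and rates are illustrative assumptions.

```python
import random

CHROM_LEN, POP_SIZE, GENERATIONS, MUT_RATE = 20, 30, 50, 0.02

def fitness(chrom):
    """OneMax fitness: number of 1-bits (higher is better)."""
    return chrom.count("1")

def tournament(pop, k=3):
    """Tournament selection: best individual out of k random picks."""
    return max(random.sample(pop, k), key=fitness)

def crossover(p1, p2):
    """Single-point crossover producing one child."""
    point = random.randint(1, CHROM_LEN - 1)
    return p1[:point] + p2[point:]

def mutate(chrom):
    """Flip each bit with probability MUT_RATE."""
    return "".join(("1" if g == "0" else "0") if random.random() < MUT_RATE else g for g in chrom)

# 1. Initialization: a random population of bit-string chromosomes.
population = ["".join(random.choice("01") for _ in range(CHROM_LEN)) for _ in range(POP_SIZE)]

for gen in range(GENERATIONS):
    best = max(population, key=fitness)          # 2. Evaluation (best kept for elitism)
    offspring = [mutate(crossover(tournament(population), tournament(population)))
                 for _ in range(POP_SIZE - 1)]   # 3-4. Selection, crossover, mutation
    population = [best] + offspring              # 5. Replacement, keeping the best solution

best = max(population, key=fitness)              # 6. Stop after a fixed number of generations
print(best, fitness(best))
```

Elitism (copying the best chromosome unchanged into the next generation) is one common way to ensure good solutions are never lost during replacement.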
Advantages of Genetic Algorithms
1. Robustness:
Can handle complex and multi-dimensional problems.
2. Exploration and Exploitation:
Efficiently searches large spaces and avoids getting stuck in local optima.
3. Adaptability:
Can be applied to various optimization problems across domains.
4. Parallelism:
Evaluates multiple solutions simultaneously, making it suitable for parallel computing.
Applications of Genetic Algorithms
1. Optimization:
o Traveling Salesman Problem (TSP): Finding the shortest route between cities.
o Resource Allocation: Optimizing the use of resources in industries.
2. Machine Learning:
o Feature Selection: Choosing the most relevant features for training models.
o Hyperparameter Tuning: Optimizing hyperparameters of algorithms.
3. Engineering Design:
o Circuit Design: Optimizing electronic circuit layouts.
o Structural Design: Designing efficient mechanical structures.
4. Bioinformatics:
o Protein Folding: Understanding protein structures.
o Genetic Research: Analyzing DNA sequences.
5. Robotics:
o Path Planning: Finding optimal paths for robots in dynamic environments.
6. Finance:
o Portfolio Optimization: Allocating assets for maximum return.
o Trading Strategies: Developing efficient stock trading strategies.
7. Game Development:
o AI in Games: Evolving game characters or strategies.
Example: Solving the Traveling Salesman Problem with a GA
Problem:
Find the shortest path to visit a set of cities and return to the starting point.
Steps in GA:
1. Representation:
Encode the order of cities as chromosomes. For example, A-B-C-D-E.
2. Fitness Function:
Calculate the total distance of the path. Lower distances indicate better fitness.
3. Crossover:
Combine two parent paths to create new paths (offspring).
4. Mutation:
Swap cities randomly to explore new paths.
5. Result:
Over generations, the algorithm evolves to find the optimal or near-optimal path.
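A sketch of how this example could be represented in Python is shown below; the city coordinates are invented, and a simple swap mutation is used in place of the full permutation-aware crossover machinery.

```python
import math
import random

# Hypothetical city coordinates; a chromosome is a visiting order such as A-B-C-D-E.
cities = {"A": (0, 0), "B": (1, 5), "C": (4, 3), "D": (6, 1), "E": (3, 0)}

def tour_length(order):
    """Total distance of the closed tour (fitness: lower is better)."""
    total = 0.0
    for i in range(len(order)):
        x1, y1 = cities[order[i]]
        x2, y2 = cities[order[(i + 1) % len(order)]]
        total += math.hypot(x2 - x1, y2 - y1)
    return total

def swap_mutation(order):
    """Mutation for permutations: swap two randomly chosen cities."""
    i, j = random.sample(range(len(order)), 2)
    order = list(order)
    order[i], order[j] = order[j], order[i]
    return order

tour = list("ABCDE")
print(tour_length(tour), tour_length(swap_mutation(tour)))
```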
Crossover and Mutation in Genetic Programming
Crossover:
Crossover is a genetic operator used to combine two parent solutions (programs) to create a new
offspring solution. It involves exchanging parts of the parent programs (like code segments) to
form a new program that may have better or different characteristics. This process mimics
natural reproduction, where offspring inherit traits from both parents.
Example:
If Parent 1 has the program x + y and Parent 2 has x * y, a crossover might result in an
offspring program like x + y * y.
Mutation:
Mutation is a genetic operator that introduces random changes to a solution. This can involve
altering parts of the program (such as changing an operator or a constant) to explore new areas of
the solution space. Mutation helps maintain diversity in the population and can prevent the
algorithm from getting stuck in local optima.
Example:
If a program is x + y, a mutation could change it to x * y or x - y, introducing a new
variation.
What is Genetic Programming?
In GP, solutions to problems are represented as tree-like structures, where nodes correspond to
operations (like addition or multiplication) and leaves represent input variables or constants.
These structures are evolved over successive generations using evolutionary operators like
selection, crossover, and mutation, to find the best-performing program.
Working of Genetic Programming: An Example
1. Initial Population:
Random programs like y = x + 2 or y = x * x are generated.
2. Fitness Evaluation:
Compare the output of each program with the expected output for given input values.
3. Evolution:
o Combine programs like y = x + 2 and y = x * x to produce y = x * x + 2.
o Introduce mutations, such as adding a term to create y = x * x + 3x + 2.
4. Optimal Solution:
After several generations, GP evolves a program close to y = x^2 + 3x + 2.
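The sketch below illustrates these steps in Python for the same target, y = x^2 + 3x + 2. Programs are represented as nested tuples such as ("+", ("*", "x", "x"), 2); the population size, depth limits, and the simplified crossover and mutation operators are assumptions made only to keep the example short.

```python
import operator
import random

# A leaf is "x" or a constant; an internal node is (operator_name, left_subtree, right_subtree).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(tree, x):
    """Recursively evaluate an expression tree for a given input x."""
    if tree == "x":
        return x
    if isinstance(tree, (int, float)):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

def random_tree(depth=2):
    """Grow a random program of limited depth."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(["x", random.randint(-5, 5)])
    return (random.choice(list(OPS)), random_tree(depth - 1), random_tree(depth - 1))

def fitness(tree, samples=range(-5, 6)):
    """Sum of squared errors against the target y = x^2 + 3x + 2 (lower is better)."""
    return sum((evaluate(tree, x) - (x * x + 3 * x + 2)) ** 2 for x in samples)

def size(tree):
    """Number of nodes in an expression tree."""
    return 1 if not isinstance(tree, tuple) else 1 + size(tree[1]) + size(tree[2])

def crossover(t1, t2):
    """Simplified subtree crossover: replace one branch of t1 with t2 (size-capped to limit bloat)."""
    if not isinstance(t1, tuple):
        return t2
    if size(t1) + size(t2) > 30:
        return t1
    op, left, right = t1
    return (op, t2, right) if random.random() < 0.5 else (op, left, t2)

def mutate(tree):
    """Mutation: graft a fresh random subtree, or replace the program entirely."""
    return crossover(tree, random_tree(1)) if random.random() < 0.5 else random_tree(2)

population = [random_tree() for _ in range(60)]
for gen in range(40):
    population.sort(key=fitness)                    # evaluate and rank programs
    parents = population[:20]                       # truncation selection
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(40)]
    population = parents + children

best = min(population, key=fitness)
print(best, fitness(best))
```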
Conclusion:
Genetic Programming is a powerful tool for solving complex problems where traditional
programming methods are infeasible or time-consuming. By mimicking evolution, it allows
programs to improve automatically and adapt to the requirements of the task.
Models of Evolution and Learning
Models of evolution and learning explain how concepts of natural evolution and learning are
applied in computer science and artificial intelligence to solve complex problems. These models
use principles like selection, crossover, and mutation to create systems that learn and evolve on
their own.
1. Evolutionary Models:
These models simulate biological evolution and are used for optimization and problem-
solving:
o Genetic Algorithms (GA):
A population of solutions evolves, and the best solutions are selected in each
generation.
Example: Finding the shortest route in the Traveling Salesman Problem.
o Genetic Programming (GP):
Evolves algorithms or mathematical expressions that solve specific problems.
Example: Designing automated algorithms.
o Neuroevolution:
Evolves the architecture and weights of neural networks.
Example: Optimizing deep learning models.
o Differential Evolution (DE):
Optimizes problems in continuous spaces.
Example: Tuning hyperparameters in machine learning models.
Applications of Models of Evolution and Learning
1. Optimization Problems:
Finding solutions for resource allocation and scheduling problems.
o Example: Supply chain optimization.
2. Engineering Design:
Optimizing complex mechanical and structural designs.
o Example: Designing aerodynamic vehicles.
3. Machine Learning:
Selecting features and tuning hyperparameters for models.
o Example: Evolving the architecture of neural networks.
4. Robotics:
Creating motion planning and control strategies for autonomous robots.
o Example: Path optimization for self-driving cars.
5. Healthcare:
Optimizing treatment plans and discovering drugs.
o Example: Creating personalized cancer treatment plans.
6. Finance:
Optimizing portfolios and predicting stock market trends.
o Example: Improving investment strategies.