ML Unit 1

UNIT-1

Introduction To Machine
Learning
By
D JAYANARAYANA REDDY
Assistant Professor
Department of CSE
UNIT - 1
1. Introduction
What is Machine Learning?
The Field of study that gives computers a capability to learn without
being explicitly programmed.
Ex: online shopping
Machine learning adopts to the user based on data
Artificial Intelligence ⊃ Machine Learning ⊃ Deep Learning
(Deep Learning is a subset of Machine Learning, which in turn is a subset of Artificial Intelligence)
Well Posed Learning Problems
• A program learns from experience ‘E’ with respect to some task ‘T’ and performance measure ‘P’
• If its performance at ‘T’, as measured by ‘P’, improves with experience ‘E’ (learning by experience)
• Examples: 1) Playing checkers problem
2) Handwritten recognition problem
3) Robot driving learning problem

Problem                  | Task (T)                         | Performance (P)                       | Experience (E)
Playing checkers         | Playing against opponents to     | Percentage of games won (making       | Playing practice games
problem                  | win the game                     | the moves that win the game)          | against itself
Handwriting recognition  | Classifying handwritten words    | Percentage of words correctly         | A database of handwritten
learning problem         | within images                    | classified                            | words with given labels
Robot driving learning   | Driving the car on a 4-lane      | Average distance travelled before     | Images and steering commands
problem                  | highway using vision sensors     | an error (long & safe)                | recorded from human drivers
2. Perspectives and Issues of Machine Learning
• One perspective of machine learning is that it involves searching a very large space of
possible hypotheses to determine the one that best fits the observed data and
any prior knowledge held by the learner.
Issues in Machine Learning
1. What algorithm should be used?
2. Which algorithm performs best for which types of problems?
3. How much training and testing data is sufficient?
4. What kind of methods should be used?
5. What methods should be used to reduce learning overhead?
6. For which type of data should which methods be used?
Designing a Learning System
• To build a successful learning system, several design steps should be followed:
1. Choosing the training experience
2. Choosing the target function
3. Choosing a representation for target function
4. Choosing a learning algorithm for approximating target function.
5. Final Design
Step 1: Choosing a Training Experience
• In choosing a training experience, 3 attributes are considered:
1. Type of feedback -> direct/indirect
2. Degree of learner control
3. Distribution of examples
Type of Feedback: Whether the training experience provides
direct or indirect feedback regarding the choices made by the performance
system – e.g., learning to drive.
Degree: The degree to which the learner controls the sequence of training
examples – with a trainer, partially on its own, or completely on its own.
Distribution of examples: How well the training experience represents the distribution of
examples over which the performance of the final system is measured.
Step 2: Choosing a Target Function
• What type of knowledge is learnt and how it is used by the performance system. The example is the
checkers game.
• The set of all moves a piece may make from a given board state is called the legal moves:
• Travel only in the forward direction
• Only one move per chance
• Only in a diagonal direction
• Jump over an opponent
• Target function -> v(b)
• Board state -> b
• Set of legal board states -> B
1. If b is a final board state that is won, then v(b) = 100
2. If b is a final board state that is lost, then v(b) = -100
3. If b is a final board state that is a draw, then v(b) = 0
4. If b is not a final board state, then v(b) = v(b'), where b' is the best final board state
reachable from b assuming both sides play optimally until the end of the game
Step 3: Choosing a representation for Target Function
• For any board state ‘b’, we calculate the approximate target function v̂(b) as a linear combination
of the following board features:
• Features:
x1 -> No. of black pieces on board
x2 -> No. of red pieces on board
x3 -> No. of black kings on board
x4 -> No. of red kings on board
x5 -> No. of black pieces threatened by red
(blacks which can be beaten by red)
x6 -> No. of red pieces threatened by black
(red which can be beaten by black)
v̂(b) = w0 + w1*x1 + w2*x2 + w3*x3 + w4*x4 + w5*x5 + w6*x6
Where w1 to w6 are numerical coefficients (weights) of each feature
and w0 is an additive constant
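The linear representation above is just a dot product plus a constant. A minimal sketch follows; the weight values are made up for illustration, not learned:

```python
def v_hat(features, weights, w0):
    """Linear evaluation function: w0 + w1*x1 + ... + w6*x6."""
    return w0 + sum(w * x for w, x in zip(weights, features))

# A sample board state: 3 black pieces, 1 black king, nothing threatened.
b = [3, 0, 1, 0, 0, 0]                       # x1..x6
weights = [1.0, -1.0, 2.0, -2.0, -0.5, 0.5]  # w1..w6 (illustrative values)
print(v_hat(b, weights, 0.0))                # 3*1.0 + 1*2.0 = 5.0
```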
Step 4: Choosing a Learning algorithm for approximating the Target
Function
• To learn the target function we need a set of training examples, each describing a particular
board state b together with a training value V_train(b)
• Ordered pair: <b, V_train(b)>
• Example: Black won the game (i.e., x2 = 0, which means no red pieces remain)
• b = (x1=3, x2=0, x3=1, x4=0, x5=0, x6=0)
• <(x1=3, x2=0, x3=1, x4=0, x5=0, x6=0), +100>. We need to do 2 steps in this phase.
1. Estimating Training Values
At every intermediate step, we estimate the training value from the successor board state
(the next state from which the program again gets to move):
V_train(b) <- v̂(Successor(b))
where v̂ represents the current approximation of the target function.
2. Adjusting the Weights
There are several algorithms to find the weights of a linear function.
Here we use the LMS (Least Mean Squares) rule to minimize the
error.
For each training example, error = V_train(b) - v̂(b):
If error = 0, the weights are left unchanged.
If error is positive, each weight is increased in proportion to its feature value.
If error is negative, each weight is decreased in proportion to its feature value.
Update rule: wi <- wi + η * error * xi (η is a small learning-rate constant)

Squared error E = Σ over training examples of (V_train(b) - v̂(b))²
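A minimal sketch of the LMS update rule on a single made-up training pair; the learning rate η and the iteration count are illustrative choices, not values from the text:

```python
def v_hat(x, w):
    """w[0] is the additive constant w0; x holds the 6 features x1..x6."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

def lms_update(w, x, v_train, eta=0.01):
    """One LMS step: w_i <- w_i + eta * error * x_i (x0 = 1 for w0)."""
    error = v_train - v_hat(x, w)
    w[0] += eta * error
    for i, xi in enumerate(x):
        w[i + 1] += eta * error * xi
    return error

# Final won board state -> V_train(b) = +100
w = [0.0] * 7
x = [3, 0, 1, 0, 0, 0]
for _ in range(500):
    lms_update(w, x, 100.0)
print(round(v_hat(x, w), 3))  # converges towards 100.0
```

Each pass shrinks the error by a constant factor, so repeated updates drive v̂(b) toward the training value.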
Step 5: Final Design
• The final design of our checkers learning system can be naturally described by four
distinct program modules that represent the central components in many learning
systems: the Performance System, the Critic, the Generalizer, and the Experiment
Generator.
3. Concept Learning
Concept Learning can be viewed as the task of searching through a large space of
hypotheses implicitly defined by the hypothesis representation.
Example:
Features (Binary Valued Attributes)
Size – large, small -> x1
Color – black, blue -> x2
Screentype – Flat, Folded -> x3
Shape – Square, rectangle -> x4
Concept = <x1, x2, x3, x4>
Tablet = <large, black, flat, square>
Smart phone = < small, blue, folded, rectangle>
Number of possible instances = 2 * 2 * 2 * 2 = 16
Where <ɸ, ɸ, ɸ, ɸ> -> Reject all (Most specific hypothesis)
<?, ?, ?, ?> -> Accept all (Most general hypothesis)

Concept Learning As Search


-> The main goal of this search is to find the hypothesis that best fits the
training examples.
Example :
• The most general hypothesis <?, ?, ?, ?, ?, ?>
• The most specific possible hypothesis <ɸ, ɸ, ɸ, ɸ, ɸ, ɸ>
Given : Instances
Sky -> 3 values -> (sunny, cloudy, rainy)
Airtemp -> 2 values -> (warm, cold)
Humidity -> 2 values -> (normal, high)
Wind -> 2 values -> (Strong, weak)
Water -> 2 values -> (warm, cool)
Forecast -> 2 values -> (same , change)
-> Different instances possible = 3*2*2*2*2*2
= 96 distinct instances
-> Syntactically distinct hypotheses = 5*4*4*4*4*4
= 5120
-> Semantically distinct hypotheses = 1 + (4*3*3*3*3*3)
= 973
-> After finding all the syntactically and semantically distinct hypotheses, we search for
the best match among them (i.e., the one closest to our learning problem)
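The three counts above can be checked mechanically:

```python
# Values per attribute: Sky=3, AirTemp=2, Humidity=2, Wind=2, Water=2, Forecast=2
values = [3, 2, 2, 2, 2, 2]

instances = 1
for v in values:
    instances *= v          # 3*2*2*2*2*2 = 96 distinct instances

syntactic = 1
for v in values:
    syntactic *= v + 2      # each attribute also allows '?' and 'ɸ'

semantic = 1
for v in values:
    semantic *= v + 1       # attribute values plus '?'
semantic += 1               # plus the single all-ɸ (reject-all) hypothesis

print(instances, syntactic, semantic)  # 96 5120 973
```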
FIND-S Algorithm (Finding a Maximally Specific
Hypothesis)
• This algorithm considers only positive examples
• Most specific hypothesis => ɸ
• Most general hypothesis => ?
• Algorithm
Step 1: Initialize h with most specific hypothesis (ɸ) in H
h0= <ɸ, ɸ, ɸ, ɸ, ɸ > => 5 attributes (depends on attributes)
Step 2: For each positive example,
For each attribute,
if (attribute value = hypothesis value) => keep the hypothesis value
else
Replace the hypothesis value with the next more general constraint (?)
(on the first positive example, h is set to that example itself)
Step 3: Output hypothesis h
Example:
      Origin   Manufacturer   Color   Year   Type      Class
  1   Japan    Honda          Blue    1980   Economy   Yes
  2   Japan    Toyota         Green   1970   Sports    No
  3   Japan    Toyota         Blue    1990   Economy   Yes
  4   USA      Audi           Red     1980   Economy   No
  5   Japan    Honda          White   1980   Economy   Yes
  6   Japan    Toyota         Green   1980   Economy   Yes
  7   Japan    Toyota         Red     1980   Economy   No

Step 1: h0 = <ɸ, ɸ, ɸ, ɸ, ɸ>

Step 2: h1 = <Japan, Honda, Blue, 1980, Economy>
h2 = h1 (example 2 is negative, so it is ignored)
h3 = <Japan, ?, Blue, ?, Economy>
h4 = h3 (example 4 is negative, so it is ignored)
h5 = <Japan, ?, ?, ?, Economy>
h6 = <Japan, ?, ?, ?, Economy>
Disadvantages
-> Considers only positive (Yes) examples
-> h6 may not be the sole hypothesis that fits the complete data.
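The trace above can be reproduced with a short FIND-S implementation run on the car dataset:

```python
# FIND-S on the car dataset: start from the most specific hypothesis
# and generalize on every positive example.

data = [
    (("Japan", "Honda",  "Blue",  "1980", "Economy"), "Yes"),
    (("Japan", "Toyota", "Green", "1970", "Sports"),  "No"),
    (("Japan", "Toyota", "Blue",  "1990", "Economy"), "Yes"),
    (("USA",   "Audi",   "Red",   "1980", "Economy"), "No"),
    (("Japan", "Honda",  "White", "1980", "Economy"), "Yes"),
    (("Japan", "Toyota", "Green", "1980", "Economy"), "Yes"),
    (("Japan", "Toyota", "Red",   "1980", "Economy"), "No"),
]

def find_s(examples):
    h = None                                  # stands in for all-ɸ: no positives seen yet
    for x, label in examples:
        if label != "Yes":
            continue                          # FIND-S ignores negative examples
        if h is None:
            h = list(x)                       # first positive: copy it verbatim
        else:
            h = [hi if hi == xi else "?" for hi, xi in zip(h, x)]
    return h

print(find_s(data))  # ['Japan', '?', '?', '?', 'Economy']
```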
4. Version Spaces and Candidate Elimination
Algorithm
• The version space is the subset of hypotheses from ‘H’ consistent with the training examples ‘D’
• VS(H,D) = { h ∈ H | Consistent(h, D) }
• Where H is the hypothesis space and D is the set of training examples
Algorithm to obtain the Version Space (List-Then-Eliminate algorithm)
• Step 1: VersionSpace <- a list containing every hypothesis in ‘H’
• Step 2: From this step, we keep on removing inconsistent hypotheses from the version
space
For each training example <x, c(x)>, remove any hypothesis h for which h(x) ≠ c(x)
• Step 3: Output the list of hypotheses remaining in the version space after checking all
training examples.
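The list-then-eliminate steps above can be sketched for the EnjoySport data (assuming Mitchell's four standard training examples, written in lowercase). Hypotheses are tuples of attribute values or ‘?’; the all-ɸ hypothesis covers nothing, so it is inconsistent with any positive example and can be omitted from the enumeration:

```python
from itertools import product

domains = [("sunny", "cloudy", "rainy"), ("warm", "cold"),
           ("normal", "high"), ("strong", "weak"),
           ("warm", "cool"), ("same", "change")]

examples = [  # Sky, AirTemp, Humidity, Wind, Water, Forecast -> EnjoySport?
    (("sunny", "warm", "normal", "strong", "warm", "same"), True),
    (("sunny", "warm", "high",   "strong", "warm", "same"), True),
    (("rainy", "cold", "high",   "strong", "warm", "change"), False),
    (("sunny", "warm", "high",   "strong", "cool", "change"), True),
]

def covers(h, x):
    """A hypothesis covers an instance if every constraint is '?' or matches."""
    return all(hi == "?" or hi == xi for hi, xi in zip(h, x))

# List every hypothesis, then eliminate the ones inconsistent with any example.
version_space = [
    h for h in product(*(d + ("?",) for d in domains))
    if all(covers(h, x) == label for x, label in examples)
]
print(len(version_space))  # 6 consistent hypotheses remain
```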
Candidate Elimination Algorithm
• It uses the concept of the version space
• It considers both positive and negative examples (Yes & No)
• It maintains both a specific and a general boundary hypothesis
• For positive examples, the specific boundary S is generalized
• For negative examples, the general boundary G is specialized
• S starts at <ɸ, ɸ, ɸ, ɸ, ɸ> ---> generalized by positives
• G starts at <?, ?, ?, ?, ?> ---> specialized by negatives
Algorithm
Step 1: Initialize both general and specific hypothesis (S and G)
S= {ɸ, ɸ, ɸ, ɸ, ɸ }
G= {?, ?, ?, ?, ?} depends on number of attributes
Step 2: For each example,
if example is positive
Make specific to general
if example is negative
Make general to specific
Example : Enjoy sport
S0 = {ɸ, ɸ, ɸ, ɸ, ɸ, ɸ } G0= {?, ?, ?, ?, ?, ?}

1. For positive: s1={‘sunny’, ’warm’, ’normal’, ‘strong’, ’warm’, ’same’}


G1= {?, ?, ?, ?, ?, ?}
2. For positive: s2 = {‘sunny’, ‘warm’, ?, ‘strong’, ‘warm’, ‘same’}
G2= {?, ?, ?, ?, ?, ?}
3. For negative: s3 = {‘sunny’, ‘warm’, ?, ‘strong’, ‘warm’, ‘same’}
G3 = {<‘sunny’, ?, ?, ?, ?, ?>, <?, ’warm’, ?, ?, ?, ?>, <?, ?, ?, ?, ?, ’same’>}
4. For positive: s4={‘sunny’, ‘warm’, ?, ‘strong’, ?, ?}
G4= {<‘sunny’,?,?,?,?,?> , <?, ’warm’, ?,?,?,?> }
Therefore S4 and G4 are the final boundary sets of the version space.
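As a quick sanity check, S4 and each member of G4 can be tested for consistency against the four EnjoySport examples (assuming Mitchell's standard training data, written in lowercase):

```python
examples = [  # Sky, AirTemp, Humidity, Wind, Water, Forecast -> EnjoySport?
    (("sunny", "warm", "normal", "strong", "warm", "same"), True),
    (("sunny", "warm", "high",   "strong", "warm", "same"), True),
    (("rainy", "cold", "high",   "strong", "warm", "change"), False),
    (("sunny", "warm", "high",   "strong", "cool", "change"), True),
]

def covers(h, x):
    """A hypothesis covers an instance if every constraint is '?' or matches."""
    return all(hi == "?" or hi == xi for hi, xi in zip(h, x))

S4 = [("sunny", "warm", "?", "strong", "?", "?")]
G4 = [("sunny", "?", "?", "?", "?", "?"), ("?", "warm", "?", "?", "?", "?")]

for h in S4 + G4:
    # Consistent: covers every positive, rejects every negative.
    assert all(covers(h, x) == label for x, label in examples)
print("S4 and G4 are consistent with all training examples")
```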
5. Inductive Bias
• Remarks on the candidate elimination and version space algorithms
1. Will the CE algorithm give us the correct hypothesis?
2. What training example should the learner request next?
Inductive Learning: From examples we derive rules.
Deductive Learning: Already existing rules are applied to our examples.
Biased hypothesis space: Does not consider all types of training examples
(a purely conjunctive space cannot represent every target concept).
Solution -> include all hypotheses.
Ex: sunny ^ warm ^ normal ^ strong ^ cool ^ change -> Yes
Unbiased hypothesis space
Provides a hypothesis space capable of representing every possible set of examples.
Possible instances: 3*2*2*2*2*2 = 96
Possible target concepts: 2^96 (huge; enumerating them is practically not possible)
Idea of Inductive Bias

The learner generalizes beyond the observed training examples to classify new instances.
Notation: x ≻ y means "y is inductively inferred from x".

Example: Learning Algorithm => L
Training Data => D_c = {<x, c(x)>}
New instance => x_i
The classification of x_i is represented as L(x_i, D_c)
(D_c ^ x_i) ≻ L(x_i, D_c)
The inductive bias is the minimal set of assumptions B such that, for every new instance x_i,
(B ^ D_c ^ x_i) |- L(x_i, D_c), where |- denotes deductive inference.
6. Decision Tree Learning
• It is used for tree-structured classification and regression
• Dataset -> Algorithm -> classifies the data (decision tree algorithm)
• 2 types of nodes
1. Decision node
2. Leaf node
Example:
Algorithm: ID3, which stands for Iterative Dichotomiser 3

1. In the given dataset, choose a target attribute

2. Calculate the information of the target attribute (with p positive and n negative examples)
I(p, n) = -(p/(p+n)) * log2(p/(p+n)) - (n/(p+n)) * log2(n/(p+n))
3. For each remaining attribute A, find the expected entropy
(each value's information I(p_v, n_v) weighted by the probability of that value)
E(A) = Σ_v ((p_v + n_v)/(p+n)) * I(p_v, n_v)
4. Calculate Gain(A) = I(p, n) - E(A)
Example:
  Age   Competition   Type   Profit
  old   Yes           s/w    Down
  old   No            s/w    Down
  old   No            h/w    Down
  mid   Yes           s/w    Down
  mid   Yes           h/w    Down
  mid   No            h/w    Up
  mid   No            s/w    Up
  new   Yes           s/w    Up
  new   No            h/w    Up
  new   No            s/w    Up
Step 1: Target Attribute = Profit

Step 2: Information of the target attribute
Where P = count(down) = 5, N = count(up) = 5
Substituting, I(P, N) = 1
Step 3: Calculate the expected entropy for each remaining attribute
E(A) = Σ_v ((p_v + n_v)/(P+N)) * I(p_v, n_v)
Prepare a table for each attribute
Rows -> values of the undertaken attribute (old, mid, new)
Columns -> values of the target attribute (down, up)
I(old) = 0 ; probability = 3/10 ; weighted entropy(old) = 0 * 3/10 = 0
In the same manner I(mid) = 1 , probability = 4/10 , weighted entropy(mid) = 0.4
I(new) = 0 , probability = 3/10 , weighted entropy(new) = 0
So, therefore E(Age) = E(old) + E(mid) + E(new)
= 0 + 0.4 + 0
= 0.4
Step 4: Gain(Age) = I(P, N) - E(Age)
= 1 - 0.4
= 0.6
In the same way, calculate the Gain of the other attributes.
Gain(Competition) = 0.124
Gain(Type) = 0
Gain(Age) = 0.6
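The gains above can be recomputed directly from the table (note that Gain(Competition) is 0.1245…, which the slides truncate to 0.124):

```python
from math import log2
from collections import Counter

rows = [  # Age, Competition, Type, Profit
    ("old", "yes", "s/w", "down"), ("old", "no", "s/w", "down"),
    ("old", "no", "h/w", "down"),  ("mid", "yes", "s/w", "down"),
    ("mid", "yes", "h/w", "down"), ("mid", "no", "h/w", "up"),
    ("mid", "no", "s/w", "up"),    ("new", "yes", "s/w", "up"),
    ("new", "no", "h/w", "up"),    ("new", "no", "s/w", "up"),
]

def info(labels):
    """I(p, n) generalized to any class counts."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def gain(rows, col):
    labels = [r[-1] for r in rows]
    e = 0.0
    for v in set(r[col] for r in rows):
        sub = [r[-1] for r in rows if r[col] == v]
        e += len(sub) / len(rows) * info(sub)   # weighted entropy E(A)
    return info(labels) - e

print(f"Age:         {gain(rows, 0):.3f}")  # 0.600
print(f"Competition: {gain(rows, 1):.3f}")  # 0.125 (i.e., 0.1245...)
print(f"Type:        {gain(rows, 2):.3f}")  # 0.000
```

Age has the highest gain, so ID3 would place it at the root of the tree.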
Problem: Training examples for the target concept PlayTennis.

Here the target attribute is PlayTennis. The gains at the root:
Gain(S, Outlook) = 0.246
Gain(S, Humidity) = 0.151
Gain(S, Wind) = 0.048
Gain(S, Temperature) = 0.029
First Attribute - Outlook
Categorical values - sunny, overcast and rain
H(Outlook=sunny) = -(2/5)*log(2/5)-(3/5)*log(3/5) =0.971
H(Outlook=rain) = -(3/5)*log(3/5)-(2/5)*log(2/5) =0.971
H(Outlook=overcast) = -(4/4)*log(4/4)-0 = 0
Average Entropy Information for Outlook –
I(Outlook) = p(sunny) * H(Outlook=sunny) + p(rain) * H(Outlook=rain) + p(overcast) *
H(Outlook=overcast)
= (5/14)*0.971 + (5/14)*0.971 + (4/14)*0 = 0.693

Information Gain = H(S) - I(Outlook) = 0.94 - 0.693 = 0.247

Second Attribute - Temperature

Categorical values - hot, mild, cool


H(Temperature=hot) = -(2/4)*log(2/4)-(2/4)*log(2/4) = 1
H(Temperature=cool) = -(3/4)*log(3/4)-(1/4)*log(1/4) = 0.811
H(Temperature=mild) = -(4/6)*log(4/6)-(2/6)*log(2/6) = 0.9179
Average Entropy Information for Temperature –
I(Temperature) = p(hot)*H(Temperature=hot) + p(mild)*H(Temperature=mild) +
p(cool)*H(Temperature=cool)
= (4/14)*1 + (6/14)*0.9179 + (4/14)*0.811 = 0.9108

Information Gain = H(S) - I(Temperature)
= 0.94 - 0.9108
= 0.029
Third Attribute - Humidity
Categorical values - high, normal
H(Humidity=high) = -(3/7)*log(3/7)-(4/7)*log(4/7) = 0.983
H(Humidity=normal) = -(6/7)*log(6/7)-(1/7)*log(1/7) = 0.591
Average Entropy Information for Humidity –
I(Humidity) = p(high)*H(Humidity=high) + p(normal)*H(Humidity=normal)
= (7/14)*0.983 + (7/14)*0.591
= 0.787
Information Gain = H(S) - I(Humidity)
= 0.94 - 0.787
= 0.153
Fourth Attribute - Wind
Categorical values - weak, strong
H(Wind=weak) = -(6/8)*log(6/8)-(2/8)*log(2/8) = 0.811
H(Wind=strong) = -(3/6)*log(3/6)-(3/6)*log(3/6) = 1
Average Entropy Information for Wind –
I(Wind) = p(weak)*H(Wind=weak) + p(strong)*H(Wind=strong)
= (8/14)*0.811 + (6/14)*1
= 0.892
Information Gain = H(S) - I(Wind)
= 0.94 - 0.892
= 0.048
At the root, the attribute with maximum information gain is Outlook, so it becomes the
root of the tree built so far. Within the Outlook = Rain branch, Wind has the maximum
gain and is chosen as the next split.

Here, when Outlook = Rain and Wind = Strong, it is a pure class of category
"no". And when Outlook = Rain and Wind = Weak, it is again a pure class
of category "yes".
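The slides quote the PlayTennis table from Mitchell without reproducing it; assuming the standard 14 examples, the root-level gains can be recomputed:

```python
from math import log2
from collections import Counter

data = [  # Outlook, Temperature, Humidity, Wind, PlayTennis
    ("Sunny", "Hot", "High", "Weak", "No"),       ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),   ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),   ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"), ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"), ("Rain", "Mild", "High", "Strong", "No"),
]

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def gain(rows, col):
    labels = [r[-1] for r in rows]
    rem = 0.0
    for v in set(r[col] for r in rows):
        sub = [r[-1] for r in rows if r[col] == v]
        rem += len(sub) / len(rows) * entropy(sub)
    return entropy(labels) - rem

for i, name in enumerate(["Outlook", "Temperature", "Humidity", "Wind"]):
    print(f"{name}: {gain(data, i):.3f}")
# Outlook 0.247, Temperature 0.029, Humidity 0.152, Wind 0.048
```

Outlook wins at the root; repeating the same computation on the Outlook = Rain subset selects Wind for the next split.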
Appropriate Problems for Decision Tree
Learning
Decision tree learning is best suited to problems with the following characteristics:
1. Instances are represented by attribute-value pairs.
2. The target function has discrete output values.
3. The training data may contain errors.
4. The training data may contain missing attribute values.
7. Hypothesis Space Search in Decision Tree Learning

• ID3 can be characterized as searching a space of hypotheses for one
that fits the training examples.
• ID3 searches the set of possible decision trees in the available hypothesis space.
• ID3 performs simple-to-complex searching:
• First start with an empty tree and keep on adding nodes.
• Every discrete-valued function can be described by some decision tree,
so ID3 avoids the major risk of searching an incomplete hypothesis space.
• ID3 maintains only a single current hypothesis.
• It cannot determine the set of alternative decision trees consistent with the data.
• Backtracking is not possible (it never reconsiders earlier attribute choices).
Inductive Bias in the Decision Tree Algorithm
• The inductive bias is the set of assumptions the learner makes.
• The inductive bias of ID3 consists of describing the basis by which ID3
chooses one consistent decision tree over all the possible decision trees.
• ID3 search strategy
1. Selects in favour of shorter trees over longer ones.
2. Selects attributes with the highest information gain closest to the root.
Types of Inductive Bias
1. Restriction bias -> based on conditions (restricts the hypothesis space)
2. Preference bias -> based on priorities (prefers some hypotheses over others)
Issues in Decision Tree Learning
• Overfitting the data
• Incorporating continuous valued attributes
• Determining depth of tree for correct classification
• Handling attributes with different costs
• Alternative measures for selecting attributes.
THE END
