DMML Unit 3

Uploaded by Aditya Khajuria

Given a picture of a face, whose face does it belong to? Suppose you wish to implement a home security system that recognizes who is entering your house. Given this problem, I propose the following two solutions:

• A non-machine-learning algorithm would define a face as having a roundish structure, two eyes, hair, a nose, and so on. The algorithm then looks for these hard-coded features in the photo and returns whether or not it was able to find them.

• A machine learning algorithm would work a bit differently. The model would only be given several pictures of faces and non-faces that are labeled as such. From these examples (called the training set) it would figure out its own definition of a face.
The following assumptions are universal for any machine learning model:
• The data used has to be preprocessed and cleaned. Almost no machine learning model will tolerate dirty data with missing values or categorical values. Use dummy variables and filling/dropping techniques to handle these discrepancies.
• Each row of a cleaned dataset represents a single observation of the environment we are trying to model.
• If our goal is to find relationships between variables, then we assume that some kind of relationship between these variables actually exists. This assumption is particularly important: many machine learning models take it very seriously, and they are not able to communicate that there might not be a relationship.
• Machine learning models are generally considered semi-automatic, which means that intelligent decisions by humans are still needed. The output of most models is a series of numbers and metrics attempting to quantify how well the model did. It is up to a human to put these metrics into perspective and communicate the results to an audience.
• Most machine learning models are sensitive to noisy data. This means that the models get confused when you include data that doesn't make sense.
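As a concrete sketch of the preprocessing point above, the snippet below fills a missing numeric value with the column mean and replaces a categorical column with dummy (one-hot) variables. The column names and values are invented purely for illustration; real pipelines typically use a library such as pandas for this.

```python
# Hypothetical raw data with one missing value and one categorical column.
raw = [
    {"age": 25,   "city": "Delhi"},
    {"age": None, "city": "Mumbai"},   # missing value
    {"age": 35,   "city": "Delhi"},
]

# Filling technique: replace missing 'age' with the mean of observed ages.
observed = [r["age"] for r in raw if r["age"] is not None]
mean_age = sum(observed) / len(observed)
for r in raw:
    if r["age"] is None:
        r["age"] = mean_age

# Dummy variables: one 0/1 column per category, replacing the original column.
cities = sorted({r["city"] for r in raw})
clean = []
for r in raw:
    row = {"age": r["age"]}
    for c in cities:
        row[f"city_{c}"] = 1 if r["city"] == c else 0
    clean.append(row)
```

After this step every column is numeric, which is the form most models expect.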
How does machine learning work?
Machine learning works by taking in data, finding
relationships within the data, and giving as output
what the model learned, as illustrated in the
following diagram:
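The data-in, learned-relationship-out loop in the diagram can be sketched with a toy model. The example below uses a 1-nearest-neighbour rule on made-up "face"/"non-face" feature vectors; both the choice of model and the features are assumptions made only for illustration.

```python
# Labeled training data: (feature vector, label) pairs. The two numeric
# features here are invented stand-ins for whatever a real system extracts.
training = [((1.0, 1.0), "face"), ((1.2, 0.9), "face"),
            ((5.0, 5.0), "non-face"), ((5.5, 4.8), "non-face")]

def predict(x):
    # Output what the "model" learned: the label of the closest training point.
    def dist2(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return min(training, key=lambda ex: dist2(ex[0], x))[1]
```

Calling `predict((1.1, 1.0))` returns `"face"`: the relationship was found in the data, not hard-coded.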
Which of the following is a supervised learning problem?

1. Predicting credit approval based on historical data.
2. Grouping people in a social network.
3. Predicting the gender of a person from his/her image. You are given the data of 1 million images along with the gender.
4. Given the class labels of old news articles, predicting the class of a new news article from its content. The class of a news article can be, for example, sports, politics, technology, etc.
Which of the following are classification problems?
1. Predicting the temperature (in Celsius) of a room from other environmental features (such as atmospheric pressure, humidity, etc.).
2. Predicting if a cricket player is a batsman or a bowler given his playing records.
3. Finding the shorter of two existing routes between two points.
4. Predicting if a particular route between two points has a traffic jam or not, based on the travel time of vehicles.
5. Filtering of spam messages.
Which of the following is a regression task?
1. Predicting the monthly sales of a cloth store in rupees.
2. Predicting if a user would like to listen to a newly
released song or not based on historical data.
3. Predicting the confirmation probability (in fraction) of
your train ticket whose current status is waiting list
based on historical data.
4. Predicting if a patient has diabetes or not based on
historical medical records.
5. Predicting the gender of a human.
Which of the following is an unsupervised task?
1. Learning to play chess.
2. Predicting if an edible item is sweet or spicy
based on the information of the ingredients
and their quantities.
3. Grouping related documents from an
unannotated corpus.
4. All of the above.
Ans: 3 (grouping documents from an unannotated corpus requires no labels)
Issues in Machine Learning
1. Poor Quality of Data
2. Underfitting of Training Data
3. Overfitting of Training Data
4. Machine Learning is a Complex Process
5. Lack of Training Data
6. Slow Implementation
7. Imperfections in the Algorithm When Data
Grows
Concept Learning

1. FIND-S Algorithm
2. Candidate Elimination Algorithm
Concept Learning
• A formal definition of concept learning: inferring a boolean-valued function from training examples of its input and output.
• An example of concept learning is learning the bird concept from given examples of birds (positive examples) and non-birds (negative examples).
• We are trying to learn the definition of a concept from given examples.
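Under the definition above, a conjunctive hypothesis is literally a boolean-valued function of an instance. A minimal sketch, using attribute values in the style of the EnjoySport examples that appear later:

```python
# A hypothesis as a boolean-valued function: '?' accepts any attribute
# value, while a literal value must match exactly.
def matches(hypothesis, instance):
    return all(h == "?" or h == v for h, v in zip(hypothesis, instance))

h = ("Sunny", "Warm", "?", "Strong", "?", "?")
```

Here `matches(h, x)` is True exactly for the instances the hypothesis classifies as positive.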

h0 = <0, 0, 0, 0, 0, 0>
h1 = <Sunny, Warm, Normal, Strong, Warm, Same>
h2 = <Sunny, Warm, ?, Strong, Warm, Same>
h3 = h2 (negative example, so the hypothesis is unchanged)
h4 = <Sunny, Warm, ?, Strong, ?, ?>
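The trace above can be reproduced with a short from-scratch implementation of FIND-S for conjunctive hypotheses. The training examples used below are the classic EnjoySport dataset that this trace appears to follow; since the slide does not restate the raw table, reconstructing it is an assumption.

```python
EMPTY = "0"   # the maximally specific "accept nothing" value

def find_s(examples):
    n = len(examples[0][0])
    h = [EMPTY] * n                   # h0 = <0, 0, 0, 0, 0, 0>
    for x, label in examples:
        if label != "Yes":            # FIND-S ignores negative examples
            continue
        for i, v in enumerate(x):
            if h[i] == EMPTY:         # first positive example: copy it
                h[i] = v
            elif h[i] != v:           # disagreement: generalize to '?'
                h[i] = "?"
    return h

data = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), "Yes"),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), "Yes"),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), "No"),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), "Yes"),
]
```

Running `find_s(data)` yields `['Sunny', 'Warm', '?', 'Strong', '?', '?']`, matching h4 in the trace.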
Apply the FIND-S algorithm to the given dataset to find the most specific hypothesis that is consistent with it.

h0 = <0, 0, 0, 0, 0>
h1 = h2 = <Japan, Honda, Blue, 1980, Economy>
h3 = h4 = <Japan, ?, Blue, ?, Economy>
h5 = h6 = h7 = <Japan, ?, ?, ?, Economy>
Algorithm:
For each training example d, do:

  If d is a positive example:
    Remove from G any hypothesis h inconsistent with d
    For each hypothesis s in S not consistent with d:
      Remove s from S
      Add to S all minimal generalizations of s consistent with d that have a generalization in G
    Remove from S any hypothesis that is more general than another hypothesis in S

  If d is a negative example:
    Remove from S any hypothesis h inconsistent with d
    For each hypothesis g in G not consistent with d:
      Remove g from G
      Add to G all minimal specializations of g consistent with d that have a specialization in S
    Remove from G any hypothesis that is more specific than another hypothesis in G
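A simplified, from-scratch sketch of the loop above is given below, restricted to conjunctive hypotheses (so S holds a single hypothesis and specializations of G draw their values from the S member, a common classroom simplification rather than the fully general algorithm). The training data is the classic Japanese-economy-car dataset that the later traces appear to use; reconstructing it is an assumption, since the slides do not restate the raw table.

```python
def matches(h, x):
    # '?' accepts any value; a literal must match exactly.
    return all(a == "?" or a == b for a, b in zip(h, x))

def more_general(g, h):
    # True if g is at least as general as h.
    return all(a == "?" or a == b for a, b in zip(g, h))

def generalize(s, x):
    # Minimal generalization of s that covers instance x.
    return tuple(xi if si == "0" else (si if si == xi else "?")
                 for si, xi in zip(s, x))

def candidate_elimination(examples):
    n = len(examples[0][0])
    S = [tuple(["0"] * n)]          # maximally specific boundary
    G = [tuple(["?"] * n)]          # maximally general boundary
    for x, positive in examples:
        if positive:
            G = [g for g in G if matches(g, x)]
            S = [s if matches(s, x) else generalize(s, x) for s in S]
            S = [s for s in S if any(more_general(g, s) for g in G)]
        else:
            S = [s for s in S if not matches(s, x)]
            new_G = []
            for g in G:
                if not matches(g, x):
                    new_G.append(g)
                    continue
                # minimal specializations consistent with the negative x,
                # using attribute values taken from the specific boundary S
                for s in S:
                    for i in range(n):
                        if g[i] == "?" and s[i] not in ("?", "0") and s[i] != x[i]:
                            new_G.append(g[:i] + (s[i],) + g[i + 1:])
            # drop any member strictly more specific than another in G
            G = [g for g in new_G
                 if not any(h != g and more_general(h, g) for h in new_G)]
    return S, G

cars = [
    (("Japan", "Honda", "Blue", "1980", "Economy"), True),
    (("Japan", "Toyota", "Green", "1970", "Sports"), False),
    (("Japan", "Toyota", "Blue", "1990", "Economy"), True),
    (("USA", "Chrysler", "Red", "1980", "Economy"), False),
    (("Japan", "Honda", "White", "1980", "Economy"), True),
]
```

On these five examples the boundaries converge to the single hypothesis <Japan, ?, ?, ?, Economy>, in agreement with the trace on the later slides.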
Example  Citations  Size    InLibrary  Price       Editions  Buy
1        Some       Small   No         Affordable  One       No
2        Many       Big     No         Expensive   Many      Yes
3        Many       Medium  No         Expensive   Few       Yes
4        Many       Small   No         Affordable  Many      Yes

S0 = S1 = {<0, 0, 0, 0, 0>}          (S grows on positive instances)
S2 = {<Many, Big, No, Expensive, Many>}
S3 = {<Many, ?, No, Expensive, ?>}
S4 = {<Many, ?, No, ?, ?>}

G0 = {<?, ?, ?, ?, ?>}               (G shrinks on negative instances)
G1 = {<Many, ?, ?, ?, ?>, <?, Big, ?, ?, ?>, <?, Medium, ?, ?, ?>, <?, ?, ?, Expensive, ?>, <?, ?, ?, ?, Many>, <?, ?, ?, ?, Few>}
G2 = {<Many, ?, ?, ?, ?>, <?, Big, ?, ?, ?>, <?, ?, ?, Expensive, ?>, <?, ?, ?, ?, Many>}
G3 = {<Many, ?, ?, ?, ?>, <?, ?, ?, Expensive, ?>}
G4 = {<Many, ?, ?, ?, ?>}
S0 = {<0, 0, 0, 0, 0>}               (S grows on positive instances)
S1 = S2 = {<Japan, Honda, Blue, 1980, Economy>}
S3 = S4 = {<Japan, ?, Blue, ?, Economy>}
S5 = {<Japan, ?, ?, ?, Economy>}

G0 = G1 = {<?, ?, ?, ?, ?>}          (G shrinks on negative instances)
G2 = {<?, Honda, ?, ?, ?>, <?, ?, Blue, ?, ?>, <?, ?, ?, 1980, ?>, <?, ?, ?, ?, Economy>}
G3 = {<?, ?, Blue, ?, ?>, <?, ?, ?, ?, Economy>}
G4 = {<?, ?, Blue, ?, ?>, <Japan, ?, ?, ?, Economy>, <?, ?, Blue, ?, Economy>}
G5 = {<Japan, ?, ?, ?, Economy>}
S0 = {<0, 0, 0, 0, 0>}               (S grows on positive instances)
S1 = S2 = {<Japan, Honda, Blue, 1980, Economy>}
S3 = S4 = {<Japan, ?, Blue, ?, Economy>}
S5 = {<Japan, ?, ?, ?, Economy>}
S6 = {<Japan, ?, ?, ?, Economy>}
S7 = {}

G0 = G1 = {<?, ?, ?, ?, ?>}          (G shrinks on negative instances)
G2 = {<?, Honda, ?, ?, ?>, <?, ?, Blue, ?, ?>, <?, ?, ?, 1980, ?>, <?, ?, ?, ?, Economy>}
G3 = {<?, ?, Blue, ?, ?>, <?, ?, ?, ?, Economy>}
G4 = {<?, ?, Blue, ?, ?>, <Japan, ?, ?, ?, Economy>}
G5 = G6 = {<Japan, ?, ?, ?, Economy>}
G7 = {}
S0 = {<0, 0, 0, 0, 0>}               (S grows on positive instances)
S1 = S2 = {<Round, Triangle, Round, Purple, Yes>}
S3 = S4 = {<?, Triangle, Round, ?, Yes>}
S5 = {<?, ?, Round, ?, Yes>}

G0 = G1 = {<?, ?, ?, ?, ?>}          (G shrinks on negative instances)
G2 = {<Round, ?, ?, ?, ?>, <?, Triangle, ?, ?, ?>, <?, ?, Round, ?, ?>, <?, ?, ?, Purple, ?>}
G3 = {<?, Triangle, ?, ?, ?>, <?, ?, Round, ?, ?>}
G4 = {<?, Triangle, ?, ?, Yes>, <?, ?, Round, ?, Yes>}
G5 = {<?, ?, Round, ?, Yes>}

2. What is the maximum number of semantically distinct hypotheses in the hypothesis space?

Ans: 81
