ML 1 (Introduction)
● Learning by self
❖ In many situations, humans are left to learn on their own.
❖ A classic example is a baby learning to walk past obstacles.
❖ The baby bumps into obstacles and falls down many times before learning that
an obstacle has to be crossed over.
❖ Not everything is taught by others.
❖ Many things can be learnt only from mistakes made in the past.
AI-ML-DL:
1. Artificial Intelligence (AI): AI aims to enable machines to
think and act without human intervention. It is a broad area of
computer science.
2. Machine Learning (ML): ML is a subset of AI that uses
statistical learning algorithms to build smart systems. ML
systems can automatically learn and improve without
being explicitly programmed.
3. Deep Learning (DL): DL is a subset of ML (and therefore of AI). It is a
technique inspired by the way the human brain filters information.
Evolution of Machine Learning…
What is Machine Learning?
❖ Before answering this, we should be able to answer more
fundamental questions, such as:
➢ Do machines really learn?
➢ If so, how do they learn?
➢ Which problems do we consider well-posed learning
problems?
➢ What are the important features required to define a
learning problem well?
Well posed:
A problem is well posed if it has a unique solution whose value changes only slightly
when the initial conditions change slightly.
What is Machine Learning…
[Figure: input data and abstraction]
Abstraction
● During the machine learning process, knowledge is fed in as
input data. However, the data cannot be used in its original shape and
form.
● Abstraction helps in deriving a conceptual map from the input data.
● This map, known as a model in machine learning, is a
summarized knowledge representation of the raw data.
● The model may take one of the following forms:
○ If/else rules
○ Mathematical equations
○ Data structures such as trees or graphs
○ Logical groupings of similar observations
Contd.
● The choice of model to solve a specific learning problem is a human task. The
decision depends on several aspects, such as
○ The type of problem to be solved
○ The nature of the input data
○ The domain of the problem
● Once the model is chosen, the next task is to fit the model to the input
data.
● For example:
○ Suppose the model is represented by a mathematical equation, say ‘y = c1 +
c2x’. Based on the input data, we have to find the values of c1 and c2.
○ Without those values the equation is of no use.
○ So fitting the model, in this case, means finding the values of the unknown coefficients or
constants of the equation.
○ This process of fitting the model to the input data is known as training.
○ The input data used to finalize the model is known as training
data.
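The fitting step for ‘y = c1 + c2x’ can be sketched in a few lines of Python. This is only an illustration: it assumes ordinary least squares as the fitting criterion (the slides do not name one), and the function name fit_line and the four training points are hypothetical.

```python
def fit_line(xs, ys):
    """Return (c1, c2) that minimise the squared error of y = c1 + c2 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    c2 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    c1 = mean_y - c2 * mean_x
    return c1, c2


# Hypothetical training data that lies exactly on y = 2 + 3x.
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [2.0, 5.0, 8.0, 11.0]

c1, c2 = fit_line(train_x, train_y)
print(f"c1 = {c1:.2f}, c2 = {c2:.2f}")
```

Here "training" is nothing more than computing c1 and c2 from the training data; once they are known, the equation can be used on new values of x.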
Generalization
● The next part is to tune the abstracted knowledge into a form that can be
used to take future decisions.
● This is achieved as part of generalization.
● This part is quite difficult to achieve.
● The reason is that the model is trained on a finite set of data, which may
possess a limited set of characteristics.
● When we apply the model to take decisions on a set of
unknown data, usually termed test data, we may encounter two
problems.
○ The trained model is aligned too closely with the training data, and hence may not
portray the actual trend.
○ The test data possess certain characteristics that were not present in the
training data.
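The first problem can be illustrated with a small sketch. It compares a model that simply memorises the training data with a fitted straight line, and measures the error of each on training and test data. Everything here is an assumption made for illustration: the data values are hypothetical (a trend of y = 2x with noise added to the training targets), mean squared error is chosen as the error measure, and the memorising model is a nearest-neighbour lookup.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = c1 + c2 * x; returns (c1, c2)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    c2 = sxy / sxx
    return mean_y - c2 * mean_x, c2


def mse(predict, xs, ys):
    """Mean squared error of a prediction function on a data set."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: the underlying trend is y = 2x.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [2.5, 3.5, 6.5, 7.5]      # noisy training targets
test_x = [1.2, 2.2, 3.2, 4.2]
test_y = [2.4, 4.4, 6.4, 8.4]       # unseen data following the trend


def memoriser(x):
    """Return the target of the closest training point (pure memorisation)."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


c1, c2 = fit_line(train_x, train_y)


def line(x):
    """Prediction of the fitted straight line."""
    return c1 + c2 * x


for name, model in (("memoriser", memoriser), ("line", line)):
    print(name,
          "train MSE:", round(mse(model, train_x, train_y), 4),
          "test MSE:", round(mse(model, test_x, test_y), 4))
```

The memoriser has zero error on the training data but a larger error than the line on the test data, which is the "aligned too closely with the training data" situation described above.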
Types of Machine Learning
Problem 2: You’d like software to examine individual customer accounts and, for each
account, decide whether it has been hacked/compromised. Should you treat this as a
classification or as a regression problem?
Example 1:
Given data about the size of houses on the real estate market, try to predict their price. Price
as a function of size is a continuous output, so this is a regression problem.
Example 2:
Given a picture of a person, predict their age on the basis of that picture.
Regression- Contd
● Let us take the example of the yearly budgeting exercise of sales
managers.
● They have to give a sales prediction for the next year based on the sales figures of
previous years and the investment put in.
● Both the past data and the value to be predicted are continuous in nature.
● A simple linear regression model can be applied, with
investment as the predictor variable and sales revenue as the target variable.
Regression- Contd
● A typical linear regression model can be represented in
the form
○ y = α + βx
○ where x is the predictor variable and y is the target variable
● In the figure shown here, the input data comes from a famous
multivariate data set named Iris, introduced by the British
statistician and biologist Ronald Fisher.
● The data set consists of 50 samples from each of three
species of Iris: Iris setosa, Iris virginica and Iris
versicolor.
● Four features were measured for each sample to
distinguish different species of flower-
○ Sepal length
○ Sepal width
○ Petal length
○ Petal width
Regression- Contd.
● The Iris data set is typically used as training data for
solving the classification problem of predicting the
flower species based on feature values.
● But we can also demonstrate regression using this data
set by predicting the value of one feature using another
feature as predictor.
● In the figure given previously, petal length is predictor
variable , which helps in predicting the value of target
variable petal width.
● Typical applications of regression
○ Forecasting in retail
○ Sales prediction
○ Price prediction
○ Weather forecasting
○ Skill demand forecasting
Unsupervised learning
● There is no labelled training data to learn from and no prediction to be made.
● The objective is to take a dataset as input and try to
find natural groupings or patterns within the data
● Unsupervised learning is often termed a descriptive model, and the process of unsupervised
learning is referred to as pattern discovery or knowledge discovery.
● Clustering is the main type of unsupervised learning.
○ It aims to group or organize similar objects together.
○ Objects belonging to the same cluster are quite similar to each other, while
objects belonging to different clusters are quite dissimilar.
○ The objective of clustering is to discover the intrinsic grouping of unlabelled data and
form clusters.
○ Different measures of similarity can be applied for clustering
Question?
Of the following examples, which would you address using an unsupervised
learning algorithm? (Check all that apply.)
● Given a set of news articles found on the web, group them into
sets of articles about the same story.
● Given a database of customer data, automatically discover
market segments and group customers into different market
segments.
● A common similarity measure is distance.
● Two data items are considered part of the same cluster if the distance
between them is small.
● If the distance between the data items is large, the items generally do not
belong to the same cluster.
● This is known as distance-based clustering.
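A minimal sketch of distance-based clustering follows. It is not a specific named algorithm from the slides: it assumes Euclidean distance as the similarity measure and a hypothetical cut-off (threshold) below which two items count as "close", and it places items in the same cluster when they are linked by a chain of close neighbours. The points and the threshold value are made up for illustration.

```python
import math


def cluster_by_distance(points, threshold):
    """Group point indices so that points closer than `threshold`
    (directly or through a chain of close points) share a cluster."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = [seed], [seed]
        while frontier:
            current = frontier.pop()
            # Unvisited points that lie within the threshold of the current point.
            near = [i for i in unvisited
                    if math.dist(points[current], points[i]) < threshold]
            for i in near:
                unvisited.remove(i)
            cluster.extend(near)
            frontier.extend(near)
        clusters.append(sorted(cluster))
    return sorted(clusters)


# Hypothetical 2-D data: three points near (1, 1) and two near (8, 8).
points = [(1.0, 1.0), (1.5, 1.2), (1.2, 0.8), (8.0, 8.0), (8.3, 7.9)]
clusters = cluster_by_distance(points, threshold=1.0)
print(clusters)
```

With these points the first three items end up in one cluster and the last two in another, because distances within each group are small while distances between the groups are large.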
Association analysis
● Association analysis is another variant of unsupervised learning.
● It identifies associations between data items.
● Example: Market basket analysis.
○ From past transaction data in a grocery store, it may be observed that
most of the customers who bought item A also bought item
B and item C, or at least one of them.
○ This means that there is a strong association of the event ‘purchase of
item A’ with the event ‘purchase of item B’ or ‘purchase of item C’.
○ Identifying these sorts of associations is the goal of association analysis.
● Applications:
○ Market basket analysis
○ Recommender systems
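The strength of an association such as ‘purchase of item A’ with ‘purchase of item B’ can be quantified. The sketch below uses two standard measures that the slides do not name, support and confidence, and a hypothetical list of five grocery transactions; the function names are illustrative.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction that
    also contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical past transactions; each basket is a set of purchased items.
transactions = [
    {"A", "B"},
    {"A", "C"},
    {"A", "B", "C"},
    {"B"},
    {"A"},
]

print("support(A)       =", support(transactions, {"A"}))
print("confidence(A->B) =", confidence(transactions, {"A"}, {"B"}))
print("confidence(A->C) =", confidence(transactions, {"A"}, {"C"}))
```

A confidence close to 1 for A->B would indicate that customers who buy A almost always buy B as well, which is the kind of association market basket analysis looks for.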
Reinforcement learning
● Example: Babies learn to walk without any prior knowledge of how to do it.
○ First they notice how others do it.
○ They understand that the legs have to be used, one at a time, to take a step.
○ While walking, they sometimes fall after hitting an obstacle, and at other times they are able to
walk smoothly.
○ Babies might get a reward, such as parents clapping or chocolates.
○ There are no claps when the baby falls.
○ Eventually the babies learn from their mistakes and are able to walk with ease.
● In the same way, machines can learn to do tasks automatically.
● The machine is given a task with hurdles.
● It tries to improve its performance on the task.
● When a subtask is completed successfully, a reward is given.
● When a subtask is not completed successfully, no reward is given.
● This continues until the task is completed successfully.
● This process of learning is called reinforcement learning.
● Applications
○ Self driving cars
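The reward-driven loop described above can be sketched with tabular Q-learning, one common reinforcement learning algorithm (the slides do not name a specific one). The environment is a hypothetical corridor of five cells: the agent starts in cell 0 and receives a reward of 1 only on reaching cell 4. The learning rate, discount factor, exploration rate and random seed are assumed values chosen for illustration.

```python
import random

N_STATES = 5            # corridor cells 0..4; cell 4 is the goal
GOAL = N_STATES - 1
ACTIONS = (-1, +1)      # step left or step right


def step(state, action):
    """Move within the corridor; the reward is 1 only when the goal is reached."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def best_action(q, state, rng):
    """Action with the highest estimated value, breaking ties at random."""
    top = max(q[state].values())
    return rng.choice([a for a in ACTIONS if q[state][a] == top])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in ACTIONS} for s in range(N_STATES)}
    for _ in range(episodes):
        state = 0
        for _ in range(100):                      # cap on steps per episode
            if rng.random() < epsilon:            # explore occasionally
                action = rng.choice(ACTIONS)
            else:                                 # otherwise act on what was learnt
                action = best_action(q, state, rng)
            next_state, reward = step(state, action)
            # Q-learning update: move the estimate towards reward + discounted future value.
            target = reward + gamma * max(q[next_state].values())
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if state == GOAL:
                break
    return q


q_table = train()
# Learned policy: the preferred action in each non-goal cell.
policy = {s: max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(GOAL)}
print(policy)
```

No one tells the agent which way to move; it only receives a reward on success. After enough episodes the learned policy prefers stepping right (+1) in every cell, much as the baby gradually learns which movements lead to applause.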
Comparison: supervised, unsupervised and reinforcement learning
Question?
Some of the problems below are best addressed using a supervised learning algorithm, and the
others with an unsupervised learning algorithm. Which of the following would you apply
supervised learning to? (Select all that apply.) In each case, assume some appropriate dataset
is available for your algorithm to learn from.
1. Given a large dataset of medical records from patients suffering from heart disease, try to learn
whether there might be different clusters of such patients for which we might tailor separate
treatments.
2. Have a computer examine an audio clip of a piece of music, and classify whether or not there are
vocals (i.e., a human voice singing) in that audio clip, or if it is a clip of only musical instruments (and
no vocals).
3. Given data on how 1000 medical patients respond to an experimental drug (such as effectiveness of
the treatment, side effects, etc.), discover whether there are different categories or "types" of
patients in terms of how they respond to the drug, and if so what these categories are.
4. Given genetic (DNA) data from a person, predict the odds of him/her developing diabetes over the next
10 years.