Lecture 2
by
Rajdip Nayek
Assistant Professor,
Applied Mechanics Department,
IIT Delhi
Example of Supervised learning
(Figure: a misclassification example)
Example of Supervised learning
Object Recognition: Detect the class of the object
• ImageNet: 1000 classes
• Lots of variability in lighting, viewpoint, etc.
• Deep neural networks reduced error rates from 26% to under 4%
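As a concrete illustration, here is a minimal sketch of object recognition with a pretrained ImageNet classifier. It assumes a recent torchvision is installed; "example.jpg" is a placeholder path for an image you supply.

```python
# A minimal sketch of object recognition with a pretrained ImageNet
# classifier, assuming a recent torchvision; "example.jpg" is a
# placeholder path for an image you supply.
import torch
from PIL import Image
from torchvision import models, transforms

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.eval()

# Standard ImageNet preprocessing: resize, crop, scale, normalize
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = preprocess(Image.open("example.jpg")).unsqueeze(0)  # add batch dim
with torch.no_grad():
    logits = model(img)                 # scores over the 1000 ImageNet classes
pred = logits.argmax(dim=1).item()      # index of the most likely class
print("Predicted ImageNet class index:", pred)
```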
Example of Unsupervised learning
Unsupervised learning: no labelled examples; you only have input data, and you are looking for interesting patterns in it. For example, consider measurements of two material properties:
Elastic modulus   Poisson’s ratio
210 GPa           0.279
70 GPa            0.325
⋮                 ⋮
190 GPa           0.267
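A minimal sketch of what "looking for patterns" can mean here: clustering the (elastic modulus, Poisson's ratio) pairs with k-means. The extra rows are made-up values added so the clustering has enough data to work with.

```python
# Cluster material-property pairs with k-means; no labels y are given,
# so the algorithm looks for structure in the inputs alone.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[210.0, 0.279],   # [elastic modulus (GPa), Poisson's ratio]
              [ 70.0, 0.325],
              [190.0, 0.267],
              [ 69.0, 0.330],   # made-up extra samples for illustration
              [205.0, 0.285]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)            # cluster assignment for each material
print(kmeans.cluster_centers_)   # centre of each discovered group
```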
Example of Unsupervised learning
• In generative modeling, we want to learn a distribution over some dataset, such as natural images. We can then sample from the generative model and check whether the samples look like the data.
(Figure: generated faces, not real faces)
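To make the idea concrete, here is a toy sketch of generative modeling: fit a multivariate Gaussian to 2-D data, then sample new points from it. Real image generators follow the same learn-a-distribution-then-sample recipe with far richer models; the Gaussian and the synthetic data are assumptions for the demo.

```python
# Toy generative model: "learn a distribution" by estimating a Gaussian
# from data, then "sample from the model" and compare with the data.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=[2.0, -1.0], scale=[0.5, 1.5], size=(1000, 2))

# Learn a distribution over the dataset: estimate mean and covariance
mu = data.mean(axis=0)
cov = np.cov(data, rowvar=False)

# Sample from the generative model
samples = rng.multivariate_normal(mu, cov, size=5)
print(samples)   # new points that should look like the training data
```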
Example of Reinforcement learning
Computer playing a game: the computer is the agent that performs actions, and the game is the environment
• Negative reinforcements
• Hunger
• Pain
• Positive reinforcements
• Food
• Pleasure
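A minimal sketch of the agent-environment loop, using a hypothetical one-dimensional environment: the agent is penalised a little each step (hunger/pain) and rewarded on reaching the goal (food/pleasure). The environment class and reward values are made up for illustration.

```python
# Agent-environment loop with a tiny hand-written environment.
import random

class LineWorld:
    """Hypothetical environment: walk along a line until you reach the goal."""
    def __init__(self, goal=5):
        self.goal = goal
        self.state = 0

    def step(self, action):             # action is -1 (left) or +1 (right)
        self.state += action
        done = (self.state == self.goal)
        reward = 1.0 if done else -0.1  # positive vs negative reinforcement
        return self.state, reward, done

env = LineWorld()
total_reward = 0.0
for _ in range(1000):                   # cap the episode length
    action = random.choice([-1, +1])    # a very naive random agent
    state, reward, done = env.step(action)
    total_reward += reward
    if done:
        break
print("Total reward collected:", total_reward)
```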
Supervised Learning
Supervised Learning: Background
▪ We start with Supervised Learning; it is the most common type of machine learning (it will span most of this course)
▪ The task is to learn the function 𝑓 that best maps certain input (𝐱) to output (𝑦)
$y = f(\mathbf{x})$
▪ In statistics, one uses the terminology: 𝐱 → independent variable, regressor, covariate; 𝑦 → dependent variable, response
$\text{Dependent variable} = f(\text{Independent variable})$
▪ In computer science, one uses the terminology: 𝐱 → input attribute, feature; 𝑦 → output attribute, output
$\text{Output attribute} = \text{Program}(\text{Input attribute})$, i.e., $\text{Output} = \text{Program}(\text{Input features})$
Supervised Learning: What do we do in supervised ML?
▪ The task in supervised ML is to learn the function 𝑓 that best maps certain input (𝐱) to output (𝑦)
$y = f(\mathbf{x})$
▪ We don’t know what the function (𝑓) looks like or its form
▪ If we knew the form, we would use it directly and we would not need to learn it from data
▪ Moreover, the output 𝑦 is often observed with some error 𝑒 that is independent of the input 𝐱
▪ The error could be due to measurement instrument errors
▪ The error could be due to not including enough input features to sufficiently characterize the mapping from 𝐱 to 𝑦
$y = f(\mathbf{x}) + e$
▪ In supervised ML, we use some labelled training data (input-output pairs) that contains examples of how some
input 𝐱 relates to output 𝑦 to learn the input-output mapping
▪ Say, we have 𝑁 examples of labelled training data: $(\mathbf{x}^{(1)}, y^{(1)}), (\mathbf{x}^{(2)}, y^{(2)}), \ldots, (\mathbf{x}^{(N)}, y^{(N)})$
▪ Labelled data: each input $\mathbf{x}^{(i)}$ is accompanied by an associated label $y^{(i)}$, which may be recorded jointly with the input or added later by some domain expert
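A small sketch of this setup with synthetic data: we generate 𝑁 labelled pairs from a known 𝑓 (slope 3, intercept 2, assumptions chosen for the demo) plus independent error 𝑒, then estimate the mapping with scikit-learn.

```python
# Learn f from N labelled (x, y) pairs generated as y = f(x) + e.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
N = 100
X = rng.uniform(0, 10, size=(N, 1))     # inputs x^(1), ..., x^(N)
e = rng.normal(0, 1.0, size=N)          # observation error e
y = 3.0 * X[:, 0] + 2.0 + e             # labels y^(i) = f(x^(i)) + e

model = LinearRegression().fit(X, y)    # estimate f from the labelled pairs
print(model.coef_, model.intercept_)    # estimates should be close to 3 and 2
```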
Supervised Learning: What is the reason for learning a function?
$y = f(\mathbf{x}) + e$
▪ The most common use of ML is to learn the mapping $y = f(\mathbf{x})$ to make good predictions of the output for new examples of input (say, $\mathbf{x}^*$) → to generalize well beyond the training data
▪ This is called predictive modeling or predictive analytics and our goal is to make the most accurate
predictions possible
▪ Hence, we are not really interested in the form of the function (𝑓) that we are learning, only that it makes
accurate predictions
▪ We could also learn the mapping $y = f(\mathbf{x})$ to understand more about the relationship in the data → statistical inference
▪ If this were the goal, we would use simpler methods and give more importance to understanding the
learned form of 𝑓 than making accurate predictions
▪ E.g. “Does eating seafood increase life expectancy?” requires careful reasoning about the function that was
learned
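The two goals can be illustrated on one fitted model, again with synthetic data (the true slope 1.5 is an assumption for the demo): prediction treats the model as a black box, while inference inspects the learned form of 𝑓 itself.

```python
# Prediction vs inference on the same fitted model.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.uniform(0, 5, size=(200, 1))
y = 1.5 * X[:, 0] + rng.normal(0, 0.5, size=200)

model = LinearRegression().fit(X, y)

# Predictive modeling: all that matters is an accurate output for new x*
x_star = np.array([[2.0]])
print("prediction at x* = 2:", model.predict(x_star))

# Statistical inference: examine the learned relationship itself
print("estimated effect of x on y (slope):", model.coef_[0])
```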
Supervised Learning: Learning a function
$y = f(\mathbf{x}) + e$
▪ Learning a function 𝑓 means estimating its form from the noisy data available to us
▪ The estimate will have errors and will not be exactly the same as the underlying true mapping from 𝐱 → 𝑦
▪ Much time in applied machine learning is spent attempting to improve the estimate of the underlying function and in
turn improve the performance of the predictions made by the model
▪ Supervised ML algorithms are techniques for estimating the target function 𝑓 to predict the output 𝑦 given input 𝐱
▪ Different ML algorithms make different assumptions about the form of the function being learned
▪ Linear vs nonlinear models
▪ Parametric vs non-parametric models
▪ How to optimize to approximate the mapping
Parametric vs Non-parametric algorithms
▪ Assumptions about the unknown 𝑓 can greatly simplify the learning process, but can also limit what can be learned
Parametric models
• Pros:
1. Often simpler and faster
2. May require less training data
• Cons:
1. Constrained: the functional form is fixed
2. Poor fit: unlikely to match the underlying true function

Non-parametric models
• Pros:
1. Flexible: can fit many different forms of functions
2. Powerful: can result in better predictions
• Cons:
1. More data: require a lot more training data
2. Overfitting: higher risk of overfitting the training data, and it is harder to explain why specific predictions are made
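A code sketch of this trade-off: a parametric model (linear regression, fixed functional form) versus a non-parametric one (k-nearest neighbours, shape driven by the data) on the same nonlinear data. The sine-shaped true function and the noise level are assumptions for the demo.

```python
# Parametric vs non-parametric fits on the same nonlinear data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 2 * np.pi, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)   # true f is nonlinear

parametric = LinearRegression().fit(X, y)                     # constrained to a line
nonparametric = KNeighborsRegressor(n_neighbors=5).fit(X, y)  # flexible

X_test = np.linspace(0, 2 * np.pi, 5).reshape(-1, 1)
print("linear:", parametric.predict(X_test))         # poor fit to sin(x)
print("k-NN:  ", nonparametric.predict(X_test))      # tracks sin(x) closely
```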
Supervised learning: Two types of data
The variables contained in the data (input 𝐱 as well as output 𝑦) can be of two different types:
• Numerical (quantitative)
• Has a natural ordering, i.e., a numerical variable may be larger or smaller than another one
• Can be continuous or discrete
• Categorical (qualitative)
• Lacks a natural ordering
• Is always discrete
• The notion of categorical vs. numerical applies both to the output variable 𝑦 and to the 𝑝 elements $x_j$ of the input vector $\mathbf{x} = [x_1\ x_2\ \cdots\ x_p]^T$
• Also, the 𝑝 components of the input vector do not have to be of the same type; they can be a mix of categorical and numerical inputs
▪ For example, having no bike is qualitatively different from having bikes, and we can use the categorical
variable ‘bikes: yes/no’ instead of the numerical ‘0, 1 or 2 bikes’
▪ Therefore, it is up to the ML engineer to decide whether a certain variable should be treated as numerical or categorical
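A short sketch of a mixed input vector: a numerical feature together with the categorical 'bikes: yes/no' variable from the example above, one-hot encoded so that no artificial ordering is imposed on the categories. The 'income' column and its values are made up for illustration.

```python
# One numerical and one categorical input feature, one-hot encoded.
import pandas as pd

df = pd.DataFrame({
    "income": [42_000, 55_000, 38_000],   # numerical (quantitative)
    "bikes":  ["yes", "no", "yes"],       # categorical (qualitative)
})

# One-hot encoding turns the categorical column into 0/1 indicator columns
encoded = pd.get_dummies(df, columns=["bikes"])
print(encoded)
```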
Supervised ML: Regression vs Classification
▪ Output variable 𝑦 is categorical → Classification
▪ Output variable 𝑦 is numerical → Regression
▪ Note that the 𝑝-dimensional input vector $\mathbf{x} = [x_1\ x_2\ \cdots\ x_p]^T$ can be either numerical or categorical for both regression and classification problems
▪ It is only the type of the output that determines whether a problem is a regression or a classification
problem
▪ Multi-class classification: the output can take more than two values, e.g. “Sweden”, “Norway”, “Finland”, “Denmark”
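A sketch of how the output type alone picks the problem: the same made-up input with a numerical output gives regression, and with a categorical (country) output gives multi-class classification. All values are invented for illustration.

```python
# Same inputs; the output type decides regression vs classification.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.array([[59.0], [61.5], [64.0], [68.0]])   # one made-up numerical feature

# Numerical output -> regression
y_numerical = np.array([5.1, 3.8, 2.9, 1.2])
reg = LinearRegression().fit(X, y_numerical)
print(reg.predict([[62.0]]))                     # a continuous value

# Categorical output -> (multi-class) classification
y_categorical = np.array(["Denmark", "Sweden", "Sweden", "Finland"])
clf = LogisticRegression().fit(X, y_categorical)
print(clf.predict([[62.0]]))                     # one of the class labels
```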
Examples of classification and regression
(Table: example problems with their inputs and outputs, and whether each is classification or regression)
Bias-Variance Trade-Off
▪ The goal of any supervised machine learning algorithm is to best estimate the mapping function 𝑓 for the output variable 𝑦 given the input data $\mathbf{x} = [x_1\ x_2\ \cdots\ x_p]^T$
▪ The prediction error for any machine learning algorithm results from three things:
▪ Bias
▪ Variance
▪ Irreducible error, which cannot be reduced regardless of the algorithm used; it is caused by factors such as partially known inputs or noise in the observations
▪ Bias - the simplifying assumptions made by a model to make the target function easier to learn
▪ They make algorithms easier to understand but are generally less flexible
▪ Low bias: suggests fewer assumptions about the form of the function 𝑓
▪ High bias: suggests more assumptions about the form of the function 𝑓
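A sketch of the bias end of the trade-off: a degree-1 polynomial makes strong assumptions about 𝑓 (high bias), while a degree-9 polynomial makes far fewer (low bias) but starts chasing the noise (high variance). The sine-shaped true function and noise level are assumptions for the demo.

```python
# High-bias vs low-bias fits to the same noisy samples of f.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=20)  # noisy samples of f

for degree in (1, 9):
    coeffs = np.polyfit(x, y, degree)      # fit a polynomial of this degree
    y_hat = np.polyval(coeffs, x)
    mse = np.mean((y - y_hat) ** 2)
    print(f"degree {degree}: training MSE = {mse:.4f}")
# The degree-9 fit drives training error down, but its wiggles would
# predict new data poorly; that is the variance side of the trade-off
```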