ML - Introduction - Linear Regression - Regularization

The document provides an introduction to machine learning (ML), defining it as a process where a computer program improves its performance on a task through experience. It outlines various types of ML, including supervised, unsupervised, and reinforcement learning, along with key concepts such as regression, classification, and decision-making. Additionally, it highlights the importance of metrics for evaluating model performance in both regression and classification tasks.

Introduction to Machine Learning

Dr. Saketh Athkuri


What we have seen so far
Statistics
• Mean
• Median
• Mode
• IQR
• CI
• Hypothesis testing
• 𝑡-test
• 𝑧-test
• 𝜒²-test
• 𝐹-test

10-09-2024 10:35 AM 2
ML definition
A computer program is said to learn from Experience E with respect to
task T and performance measure P, if its performance at task T as
measured by P improves with experience E.

Example: Alan Turing, Loan approval example

1. Mitchell T M (1997), Machine Learning. McGraw-Hill, New York


2. https://www.wordstream.com/blog/ws/2017/07/28/machine-learning-applications
Artificial Intelligence

Enigma code

Data RULES Output

Machine Learning

Features
Black box Rules
Output

Deep Learning

Raw data
Black box Rules
Output

AI and its fields

Artificial Intelligence

Machine Learning

Deep Learning

ML overview
Supervised
• Regression: Linear regression, Forecasting, Decision trees, SVM, knn, Ensemble techniques
• Classification: Logistic regression, Naive-Bayes, Decision trees, SVM, knn, Ensemble techniques
Unsupervised
• Dimensionality reduction (PCA)
• Clustering: k-means, Hierarchical, DB-Scan
• Association rules
Reinforcement
Optimization
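Machine Learning – Experience, E
Input $\boldsymbol{x} \in \mathbb{R}^p$, where $p$ is the number of features in the data.

Each observation is a p-tuple $\boldsymbol{x} = (x_1, x_2, x_3, \dots, x_p)$ with a label $y$:
$\boldsymbol{x}_1 = (x_{11}, x_{12}, x_{13}, \dots, x_{1p})$, label $y_1$
⋮
$\boldsymbol{x}_n = (x_{n1}, x_{n2}, x_{n3}, \dots, x_{np})$, label $y_n$

Stacked as a table with one row per observation:
$$
\begin{array}{c|ccccc}
 & x_1 & x_2 & x_3 & \cdots & x_p \\
\hline
\boldsymbol{x}_1 & x_{11} & x_{12} & x_{13} & \cdots & x_{1p} \\
\boldsymbol{x}_2 & x_{21} & x_{22} & x_{23} & \cdots & x_{2p} \\
\vdots & \vdots & \vdots & \vdots & \cdots & \vdots \\
\boldsymbol{x}_n & x_{n1} & x_{n2} & x_{n3} & \cdots & x_{np}
\end{array}
$$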
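As a minimal sketch of this layout, assuming NumPy and made-up numbers (the values and the names `X`, `y`, `x_2`, `x_23` are illustrative and not from the slides), the n observations can be held in an n × p array and the labels in a length-n array:

```python
import numpy as np

# Hypothetical data: n = 4 observations, p = 3 features each.
X = np.array([
    [5.1, 3.5, 1.4],
    [4.9, 3.0, 1.4],
    [6.2, 2.9, 4.3],
    [5.9, 3.0, 5.1],
])
y = np.array([0, 0, 1, 1])  # one label per observation

n, p = X.shape
x_2 = X[1]      # second observation as a p-tuple (row index 1, zero-based)
x_23 = X[1, 2]  # third feature of the second observation
print(n, p, x_2, x_23)
```

Note that the slide indexes rows and columns from 1, while NumPy indexes from 0.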

Machine Learning – Task, T
• Predict or forecast a value
• Classify an object into one of ‘n’ given categories
• Anomaly detection
• Transcription
• Translation
• Synthesis of a new exemplar
• Determination of missing value – imputation
• Data Cleaning – Denoising
• Estimation of probability mass function or density
• Group objects
• Identify areas of interest in an image – segmentation
• Fastest route between two cities
• Combination of stocks with maximum ROI
• Predict the next product the customer will buy
What is decision making?
• Decision making is the process of identifying and selecting a
course of action among several alternatives to achieve a
desired outcome.

• Decision making is essential for navigating uncertainties and achieving
organizational goals.
Types of decision making
1. Certainty
2. Risk
3. Uncertainty
Examples shown on the slide: NIFTY50, Mid cap, Small cap
Supervised Learning
Linear regression and logistic regression
Supervised learning
• Labelled data – target column or dependent variable
• Labelled data can be numerical or categorical
• What is numerical data – Eg: Age
• What is categorical data – Eg: Type
• Generally, the model assumes some form of relationship. Eg: Linear and logistic regression
• Applications (identify the right applications)
• Image Classification, Spam Detection, Customer Segmentation, Network
Anomaly, Fraud Detection, House Price Prediction, Handwriting Recognition
Metrics
Regression Classification
MAE Accuracy
MSE Recall
RMSE Precision
MAPE F1-score
R-square
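
The metrics in this table can be computed directly from their usual definitions. The following is a minimal sketch assuming NumPy and small made-up arrays (all values and variable names are illustrative, not from the slides):

```python
import numpy as np

# --- Regression metrics on hypothetical true and predicted values ---
y_true = np.array([3.0, 5.0, 2.0, 8.0])
y_pred = np.array([2.5, 5.0, 3.0, 7.0])

errors = y_true - y_pred
mae = np.mean(np.abs(errors))                  # mean absolute error
mse = np.mean(errors ** 2)                     # mean squared error
rmse = np.sqrt(mse)                            # root mean squared error
mape = np.mean(np.abs(errors / y_true)) * 100  # mean absolute percentage error
r2 = 1 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)

# --- Classification metrics on hypothetical binary labels (1 = positive) ---
labels = np.array([1, 0, 1, 1, 0, 1, 0, 0])
preds = np.array([1, 0, 0, 1, 0, 1, 1, 0])

tp = np.sum((preds == 1) & (labels == 1))  # true positives
fp = np.sum((preds == 1) & (labels == 0))  # false positives
fn = np.sum((preds == 0) & (labels == 1))  # false negatives
tn = np.sum((preds == 0) & (labels == 0))  # true negatives

accuracy = (tp + tn) / len(labels)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(mae, mse, rmse, mape, r2)
print(accuracy, precision, recall, f1)
```

MAPE is undefined when a true value is zero, so this sketch assumes all entries of `y_true` are non-zero.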

Linear regression
Simple and Multiple

MPG – application in automobile sector
• Suppose you want to launch a new car model and want to find
its mileage.

• How to find it?

Dataset
Columns: y (target), X (feature)
$y = \beta_1 X + \beta_0$
Model summary (R)
What is p-value?

$\text{mpg} = -0.0077\,(\text{weight}) + 46.31$
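
To make the fitted line concrete, here is a small sketch that plugs a weight into the equation above. The function name `predict_mpg` and the example weight of 3000 are assumptions for illustration; the weight must be in the same units as the dataset used to fit the model.

```python
def predict_mpg(weight):
    """Predicted mileage from the fitted line mpg = -0.0077 * weight + 46.31."""
    return -0.0077 * weight + 46.31

# Hypothetical new car weighing 3000 (same units as the dataset's weight column).
example = predict_mpg(3000)
print(round(example, 2))

# The slope says each extra 100 units of weight lowers predicted mpg by 0.77.
drop_per_100 = predict_mpg(3000) - predict_mpg(3100)
print(round(drop_per_100, 2))
```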


Multiple linear regression
$\hat{y} = \sum_i \beta_i X_i + \beta_0$
Model summary (MLR)

Mutual fund manager skill

Application

𝜶, 𝜷 values

How to find $\beta_i$?
$\hat{y} = \beta_1 x_1 + \beta_2 x_2 + \beta_0$

SSE: $\sum (y - \hat{y})^2$
Minimize SSE to get the $\beta$s.

Obj function: $\sum \left(y - (\beta_1 x_1 + \beta_2 x_2 + \beta_0)\right)^2$
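
A minimal sketch of minimizing the SSE for the two-feature model, assuming NumPy and synthetic noise-free data. The data, the true coefficients (5, 2, -3) and the helper name `sse` are made up for illustration; `np.linalg.lstsq` is used here as one standard least-squares solver, not necessarily the method used in the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.uniform(0, 10, size=50)
x2 = rng.uniform(0, 10, size=50)
y = 2.0 * x1 - 3.0 * x2 + 5.0  # hypothetical data with beta1=2, beta2=-3, beta0=5

# Design matrix: a column of ones for beta0, then x1 and x2.
X = np.column_stack([np.ones_like(x1), x1, x2])

def sse(beta):
    """Sum of squared errors for coefficients beta = (beta0, beta1, beta2)."""
    residuals = y - X @ beta
    return float(residuals @ residuals)

# Least-squares solution: the beta that minimizes sse(beta).
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat, sse(beta_hat))
```

Moving `beta_hat` in any direction, for example `sse(beta_hat + 0.1)`, gives a larger SSE, which is what "minimize SSE" means.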

Visualization
Assumptions
1. Linearity: The relationship between independent and dependent
variables is linear. This can be checked using scatter plots or
residual plots.
2. Independence: Observations are independent of each other. This
assumption is often verified through knowledge of the data collection
and experiment design.
3. Homoscedasticity: The variance of the residuals (or "errors")
should be constant across all levels of the independent variables. A
plot of residuals vs. predicted values can help check this.
4. Normality of Errors: The residuals (or "errors") should be
approximately normally distributed. This can be checked using
histograms or QQ-plots of residuals.
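
The quantities behind these checks can be sketched numerically. The example below assumes NumPy, the standard-library `statistics.NormalDist`, and synthetic data generated to satisfy the assumptions; the variable names and the two summary numbers (`spread_ratio`, `qq_corr`) are illustrative stand-ins for the plots, not formal tests.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 200)
y = 3.0 + 2.0 * x + rng.normal(0, 1.0, 200)  # hypothetical linear data, constant noise

# Fit a simple linear regression and compute fitted values and residuals.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
residuals = y - fitted  # plot these against `fitted` to check linearity

# Homoscedasticity: compare residual spread for low vs. high fitted values.
order = np.argsort(fitted)
low, high = residuals[order[:100]], residuals[order[100:]]
spread_ratio = high.std() / low.std()  # close to 1 when the variance is constant

# Normality: QQ-plot coordinates (theoretical vs. sample quantiles).
n = len(residuals)
probs = (np.arange(1, n + 1) - 0.5) / n
theoretical = np.array([NormalDist().inv_cdf(q) for q in probs])
sample = np.sort((residuals - residuals.mean()) / residuals.std())
qq_corr = np.corrcoef(theoretical, sample)[0, 1]  # near 1 when points lie on a line

print(spread_ratio, qq_corr)
```

In practice one would plot `residuals` against `fitted` and `sample` against `theoretical` and judge the patterns by eye.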

Residual plots
Panels: Linearity, Normality, Homoscedasticity
Beware of Influential points
• Leverage – measure of how much the independent variable values
of an observation differ from the mean of those independent
variables.

• High residual points – points having high residuals can also be
influential points.

• Cook's Distance: Cook's Distance is a measure that combines
leverage and residual to identify influential points. It measures the
effect of deleting a given observation.
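
A minimal sketch of leverage and Cook's Distance, assuming NumPy and synthetic data with one deliberately planted influential point. The data and variable names are made up; the formula used is the standard one, D_i = e_i² h_i / (p s² (1 − h_i)²), where h_i is the leverage, e_i the residual, p the number of fitted coefficients and s² the residual variance estimate.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 30)
y = 1.0 + 0.5 * x + rng.normal(0, 0.3, 30)

# Hypothetical influential point: far from the mean of x and far off the line.
x = np.append(x, 25.0)
y = np.append(y, 2.0)

X = np.column_stack([np.ones_like(x), x])
n, p = X.shape  # p counts the intercept too

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta
s2 = residuals @ residuals / (n - p)  # residual variance estimate

# Leverage: diagonal of the hat matrix H = X (X'X)^-1 X'.
H = X @ np.linalg.inv(X.T @ X) @ X.T
leverage = np.diag(H)

# Cook's Distance combines squared residual and leverage for each observation.
cooks_d = (residuals ** 2 / (p * s2)) * leverage / (1 - leverage) ** 2

most_influential = int(np.argmax(cooks_d))
print(most_influential, leverage[most_influential], cooks_d[most_influential])
```

The same value can be obtained by refitting without observation i and measuring how far the coefficients move, which is the "effect of deleting" interpretation in the bullet above.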



Python code using statsmodels
import statsmodels.api as sm

# df is assumed to be a pandas DataFrame with 'mpg', 'horsepower' and 'weight' columns

# Define independent variables (X) and dependent variable (y)
X = df[['horsepower', 'weight']]
X = sm.add_constant(X)  # Add a constant column for the intercept
y = df['mpg']

# Fit the linear regression model
model_statsmodels = sm.OLS(y, X).fit()

# Print the summary of the regression
print(model_statsmodels.summary())



Python code using sklearn
from sklearn.linear_model import LinearRegression

# df is assumed to be a pandas DataFrame with 'mpg', 'horsepower' and 'weight' columns

# Define independent variables (X) and dependent variable (y)
X = df[['horsepower', 'weight']]
y = df['mpg']

# Initialize and fit the linear regression model (the intercept is fitted by default)
model_sklearn = LinearRegression().fit(X, y)

# Print the coefficients and intercept
print("Intercept:", model_sklearn.intercept_)
print("Coefficients:", model_sklearn.coef_)



Multicollinearity
What is multi-collinearity?

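Variance Inflation Factor (VIF):

$VIF(X_i) = \frac{1}{1 - R_i^2}$

Here $R_i^2$ is the R-square obtained by regressing $X_i$ on the remaining independent variables.

In practice, a VIF value exceeding 5 or 10 suggests that
multicollinearity may be a problem and should be further investigated.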
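
The VIF formula can be sketched directly: regress each feature on the others, take the R-square, and apply 1 / (1 − R²). The example assumes NumPy and synthetic features; the helper name `vif` and the data are illustrative only.

```python
import numpy as np

def vif(X):
    """VIF for each column of X (shape n x p, without a constant column)."""
    n, p = X.shape
    out = []
    for i in range(p):
        target = X[:, i]
        # Regress column i on an intercept plus all the other columns.
        others = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2 = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(3)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)  # hypothetical near-copy of x1
x3 = rng.normal(size=200)             # hypothetical unrelated feature

vifs = vif(np.column_stack([x1, x2, x3]))
print(vifs)
```

In this setup the two nearly identical features get large VIFs while the unrelated one stays close to 1, so the rule of thumb above would flag `x1` and `x2`.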

Numerical attributes


Handling Categorical Attributes
Qualification   Qualification_btech   Qualification_mtech   Qualification_phd
Btech           1                     0                     0
Mtech           0                     1                     0
Phd             0                     0                     1
Phd             0                     0                     1
Mtech           0                     1                     0
Mtech           0                     1                     0
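
The encoding in this table can be reproduced with pandas. This is a sketch assuming pandas is available; `pd.get_dummies` is one common way to do it and is not necessarily the tool used in the lecture, and its column names follow the category spelling (`Qualification_Btech` rather than `Qualification_btech`).

```python
import pandas as pd

# The six rows from the slide.
df = pd.DataFrame({"Qualification": ["Btech", "Mtech", "Phd", "Phd", "Mtech", "Mtech"]})

# One 0/1 column per category.
one_hot = pd.get_dummies(df["Qualification"], prefix="Qualification", dtype=int)
print(one_hot)

# Dropping one category avoids columns that always sum to 1, which would be
# perfectly collinear with the intercept in a linear regression.
reduced = pd.get_dummies(df["Qualification"], prefix="Qualification",
                         dtype=int, drop_first=True)
print(reduced)
```

With `drop_first=True` the first category in sorted order (here Btech) becomes the baseline and is represented by zeros in the remaining columns.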

Transformations – handle non-linear data


Box-Cox transformations
• Learn it on your own
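
As a starting point for that self-study, here is a minimal sketch of the standard Box-Cox definition: (y^λ − 1) / λ for λ ≠ 0 and log(y) for λ = 0, defined for strictly positive y. It assumes NumPy; the function name `box_cox` and the sample values are illustrative. SciPy's `scipy.stats.boxcox` provides the same transform and can also estimate λ from the data.

```python
import numpy as np

def box_cox(y, lam):
    """Box-Cox transform of strictly positive values y for a given lambda."""
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return np.log(y)
    return (y ** lam - 1.0) / lam

y = np.array([1.0, 4.0, 9.0, 100.0])  # hypothetical right-skewed values
print(box_cox(y, 0))    # log transform
print(box_cox(y, 0.5))  # square-root-like transform
print(box_cox(y, 1))    # only shifts the data by 1
```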



Regularization
• Bias – Variance tradeoff

Figure panels: Data, Unseen data
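
The "Data" versus "Unseen data" comparison can be sketched by fitting polynomials of increasing degree and measuring the error on both sets. This example assumes NumPy and synthetic data; the sine-shaped target, the noise level, the sample sizes and the degrees are all made up for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(4)

def make_data(n):
    """Hypothetical noisy samples from a smooth curve."""
    x = rng.uniform(-1, 1, n)
    y = np.sin(3 * x) + rng.normal(0, 0.2, n)
    return x, y

x_train, y_train = make_data(15)   # the "Data" panel
x_test, y_test = make_data(200)    # the "Unseen data" panel

def mse(model, x, y):
    return float(np.mean((y - model(x)) ** 2))

results = {}
for degree in [1, 3, 10]:
    model = Polynomial.fit(x_train, y_train, degree)  # least-squares polynomial fit
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    print(degree, results[degree])
```

The training error can only go down as the degree grows, because each model contains the previous one. The error on unseen data typically stops improving and then worsens for very flexible models, which is the tradeoff the slides illustrate.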

Regularization
So, we understand that we should reduce model complexity.

What is model complexity?

$\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6 + \cdots$

How to reduce the complexity now?


Regularization
Loss function:

$\sum (y - \hat{y})^2$
or
$\sum \left(y - (\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6 + \cdots)\right)^2$

Loss function with a penalty on the coefficients:

$error = \sum (y - \hat{y})^2 + \lambda \sum_i |\beta_i|$

What if $\lambda$ is high?

Consider how $\lambda$, $\beta_i$ and the error change together.
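
One way to explore the "what if λ is high" question is to fit an L1-penalized regression for several penalty weights and watch the coefficients and the squared error. This sketch assumes scikit-learn's `Lasso` and synthetic data; the true coefficients and the grid of λ values are made up. scikit-learn names the penalty weight `alpha` and scales the squared-error term by 1/(2n), so its `alpha` is not numerically identical to the λ in the loss above.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 6))
true_beta = np.array([3.0, -2.0, 0.0, 0.0, 1.0, 0.0])  # hypothetical coefficients
y = X @ true_beta + rng.normal(scale=0.5, size=200)

lambdas = [0.01, 0.1, 1.0, 5.0]
l1_norms, sses = [], []
for lam in lambdas:
    model = Lasso(alpha=lam).fit(X, y)  # `alpha` plays the role of lambda
    l1_norms.append(float(np.sum(np.abs(model.coef_))))      # sum of |beta_i|
    sses.append(float(np.sum((y - model.predict(X)) ** 2)))  # training SSE
    print(f"lambda={lam}: sum|beta|={l1_norms[-1]:.3f}, SSE={sses[-1]:.1f}")
```

As the penalty weight grows, the coefficients shrink toward zero (some become exactly zero) and the training SSE rises, so a very high λ gives a very simple model that fits the training data poorly.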



Applications

Index tracker
