

Contents
1. Loading the dataset and fitting the model
1.1. Dataset
1.2. Model to fit

2. Evaluation Metrics
2.1. R2 score
2.1.1. Definition
2.1.2. Computed
2.1.3. sklearn
2.1.4. Both
2.1.5. Let's Visualize
2.2. Explained Variance score
2.2.1. Definition
2.2.2. Computed
2.2.3. sklearn
2.2.4. Both
2.3. MSE: Mean Squared Error
2.3.1. Definition
2.3.2. Computed
2.3.3. sklearn
2.3.4. Both
2.4. RMSE: Root Mean Squared Error
2.4.1. Definition
2.4.2. Computed
2.4.3. sklearn
2.4.4. Both
2.5. MAE: Mean Absolute Error
2.5.1. Definition
2.5.2. Computed
2.5.3. sklearn
2.5.4. Both
2.6. MAPE: Mean Absolute Percentage Error
2.6.1. Definition
2.6.2. Computed
2.6.3. sklearn
2.6.4. Both
2.7. MedAE: Median Absolute Error
2.7.1. Definition
2.7.2. Computed
2.7.3. sklearn
2.7.4. Both

3. Evaluation metrics to compare among models



Evaluation Metrics For Regression Models

To measure the performance of a regression model and to compare several models, various metrics can be used: the R2 score, explained variance, MSE, RMSE, MAE, MAPE, and MedAE.

In the following, you will find a definition of each metric. Furthermore, based on the famous "Diabetes" dataset used in the LARS (Least Angle Regression) paper, you will also find a practical example in Python, showing how to compute each metric step by step and how to use the scikit-learn methods. Finally, using two regression models, you will see how to exploit those measures to choose the better one.

In all the metric formulas below, we define:

● $y_i$: the true value of the dependent variable for observation $i$
● $\hat{y}_i$: the predicted value for observation $i$
● $\bar{y}$: the mean of the dependent variable
● $N$: the number of observations in our dataset
● $p$: the number of independent variables in our dataset

1. Loading the dataset and fitting the model

1.1. Dataset
We will use the Diabetes data used in the "Least Angle Regression" paper: N=442 patients,
p=10 predictors. Each row corresponds to one patient, and the last column is the response variable.

You can find the dataset (raw) in: https://www4.stat.ncsu.edu/~boos/var.select/diabetes.tab.txt
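The snippets in this notebook assume the following imports; this is a minimal set inferred from the calls used below, since the original import cell is not shown:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler, normalize
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.metrics import (r2_score, explained_variance_score,
                             mean_squared_error, mean_absolute_error,
                             mean_absolute_percentage_error,
                             median_absolute_error)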

df=pd.read_csv("https://www4.stat.ncsu.edu/~boos/var.select/diabetes.tab.txt", sep="\t")
print(df.shape)
df.head()

(442, 11)

AGE SEX BMI BP S1 S2 S3 S4 S5 S6 Y
0 59 2 32.1 101.0 157 93.2 38.0 4.0 4.8598 87 151
1 48 1 21.6 87.0 183 103.2 70.0 3.0 3.8918 69 75

We will also scale the data, to have the same order of magnitude among the independent variables:

#### Scaled data: X - mean
# Start by subtracting the mean with StandardScaler, without dividing by the std (with_std=False)
df_sc=StandardScaler(with_std=False).fit_transform(df)
df_sc=pd.DataFrame(data=df_sc, columns=df.columns)

#### Normalize data: divide each feature by its L2 norm

# With axis=0 there is no need to transpose; this option is available in the "normalize"
# function but not in the "Normalizer" class
df_norm=normalize(df_sc.iloc[:,:-1],axis=0,norm='l2')
# Alternatively, transpose the dataframe (axis=1 is the default):
# df_norm=normalize(df_sc.iloc[:,:-1].T,norm='l2')
# (do not forget to transpose the result too, to recover the initial shape)

df_norm=pd.DataFrame(data=df_norm, columns=df.columns[:-1])

df_norm['Y']=df_sc['Y']
print("Normalized data: Scaled data/L2 norm")
df_norm.head()
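As a quick sanity check (an optional step, not in the original notebook), each feature column should now have unit L2 norm:

# Each of the 10 feature columns should have an L2 norm of 1 (up to floating-point error)
print(np.linalg.norm(df_norm[df.columns[:-1]].to_numpy(), axis=0))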

1.2. Model to fit


To illustrate the computation of the different metrics step by step, let's take the predicted
values given by the OLS model, "y_predict_ols". In this example we will not split the dataset
into train and test sets; that is not the objective of this notebook.

features=df.columns[:-1]
X=df_norm[features]
y=df_norm['Y']

#### OLS: Ordinary Least Squares


reg_ols=LinearRegression(fit_intercept=False)
reg_ols.fit(X,y)
#Predict values
y_predict_ols=reg_ols.predict(X)
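To get a feel for the fitted model (an optional inspection, not in the original notebook), one can look at the estimated coefficients:

# Map each feature name to its fitted coefficient
print(dict(zip(features, np.round(reg_ols.coef_, 2))))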

2. Evaluation Metrics:

2.1. R2 score

2.1.1. Definition

The R2 score, or coefficient of determination, measures how much of the variance of the dependent variable is explained by the explanatory variables. It indicates whether the fitted model is a good one, and whether it could be used to predict unseen values.

The best possible value of R2 is 1, meaning that the model fits our dataset perfectly. A value of 0 means the model is constant and always predicts the expected average value of the dependent variable y, regardless of the input variables. R2 can also be negative, since a model can be arbitrarily worse than the mean.
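A small illustration of these three regimes (a toy example, not from the original notebook):

y_true = np.array([1.0, 2.0, 3.0, 4.0])
# Predicting the mean for every observation gives R2 = 0
print(r2_score(y_true, np.full_like(y_true, y_true.mean())))   # 0.0
# A predictor worse than the mean gives a negative R2
print(r2_score(y_true, np.array([4.0, 3.0, 2.0, 1.0])))        # -3.0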

With $\hat{y}_i$ the predicted value for observation $i$, and $\bar{y}$ the average value of the dependent variable:

$$R^2 = 1 - \frac{\sum_{i=1}^{N}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{N}(y_i - \bar{y})^2}$$

Let's introduce another measure, the RSS ("Residual Sum of Squares"), which, as its name indicates, sums the squared residuals (the differences between the true and the predicted values):

$$RSS = \sum_{i=1}^{N}(y_i - \hat{y}_i)^2$$

We can then write the formula of the R2 as:

$$R^2 = 1 - \frac{RSS}{\sum_{i=1}^{N}(y_i - \bar{y})^2}$$

By adding more and more independent variables to the dataset, the R2 score mechanically improves, which does not mean that those variables really improve the accuracy of the prediction. The adjusted R2 is therefore introduced to provide a fairer measure, by taking into account how many independent variables are used:

$$R^2_{adj} = 1 - (1 - R^2)\,\frac{N - 1}{N - p - 1}$$
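The adjusted R2 is not computed later in this notebook; here is a minimal sketch, plugging in the N=442 and p=10 of the Diabetes data:

def adjusted_r2(r2, n_obs, n_features):
    # Penalize R2 for the number of predictors used
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_features - 1)

print(adjusted_r2(0.5177484222203499, 442, 10))  # ~0.5066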

2.1.2. Computed

df_errors=pd.DataFrame()
df_errors['y_true']=y
df_errors['yhat']=y_predict_ols

df_errors['y_yhat']=df_errors['y_true']-df_errors['yhat']
df_errors['(y_yhat)**2']=df_errors['y_yhat']**2

mean_y=df_errors['y_true'].mean()
df_errors['(y_ymean)**2']=(df_errors['y_true']-mean_y)**2

r2_recomp=1-(np.sum(df_errors['(y_yhat)**2'])/np.sum(df_errors['(y_ymean)**2']))
r2_recomp

0.5177484222203499

2.1.3. sklearn

r2_sk_learn=r2_score(y, y_predict_ols)
r2_sk_learn

0.5177484222203499

2.1.4. Both
print("r2 recomputed:",r2_recomp, ", r2_sk_learn:",r2_sk_learn)

r2 recomputed: 0.5177484222203499 , r2_sk_learn: 0.5177484222203499

2.1.5. Let’s Visualize


plt.scatter(y_predict_ols, y)
plt.plot(y, y, '-r')
plt.annotate(r"R2={0}".format(round(r2_recomp, 3)), xy=(200, 200),
             xytext=(-65, -10), textcoords='offset points', fontsize=12)
plt.xlabel("y_predict_ols")
plt.ylabel("y_true")
plt.title("Regression: Predicted vs True y")
plt.show()

2.2. Explained Variance score

2.2.1. Definition

The explained variance score is computed as:

$$EV = 1 - \frac{\operatorname{Var}(y - \hat{y})}{\operatorname{Var}(y)}$$

with:

$$\operatorname{Var}(y - \hat{y}) = \frac{1}{N}\sum_{i=1}^{N}\left((y_i - \hat{y}_i) - \overline{(y - \hat{y})}\right)^2$$

and:

$$\operatorname{Var}(y) = \frac{1}{N}\sum_{i=1}^{N}(y_i - \bar{y})^2$$

When the residuals have zero mean, the explained variance score becomes equal to the R2 score.
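A toy example (not from the original notebook) showing the two scores diverging when the residuals have a non-zero mean:

y_true = np.array([1.0, 2.0, 3.0])
y_pred = y_true + 1.0  # constant bias: the residual mean is -1, not 0
print(r2_score(y_true, y_pred))                  # -0.5
print(explained_variance_score(y_true, y_pred))  # 1.0: the bias is ignored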

2.2.2. Computed
N=y.shape[0]

df_errors=pd.DataFrame()
df_errors['y_true']=y
df_errors['yhat']=y_predict_ols

mean_y=df_errors['y_true'].mean()
df_errors['(y_ymean)**2']=(df_errors['y_true']-mean_y)**2

var_y=np.sum(df_errors['(y_ymean)**2'])/N

df_errors['y_yhat']=df_errors['y_true']-df_errors['yhat']
mean_yyhat=df_errors['y_yhat'].mean()
df_errors['(y_yhat_yyhatmean)**2']=(df_errors['y_yhat']-mean_yyhat)**2

var_yyhat=np.sum(df_errors['(y_yhat_yyhatmean)**2'])/N

explained_variance_recomp=1-var_yyhat/var_y
explained_variance_recomp

0.5177484222203499

2.2.3. sklearn
explained_variance_sklearn=explained_variance_score(y, y_predict_ols)
explained_variance_sklearn

0.5177484222203499

2.2.4. Both
print("Explained variance recomputed:",explained_variance_recomp, ", Explained variance
from sk_learn:",explained_variance_sklearn)

Explained variance recomputed: 0.5177484222203499 , Explained variance from sk_learn: 0.5177484222203499

One can see that the explained variance and the R2 score are equal. That's because the mean
of the residuals is ~0:

mean_yyhat=df_errors['y_yhat'].mean()
mean_yyhat

-4.4810811672653376e-14

2.3. MSE: Mean Squared Error

2.3.1. Definition

The Mean Squared Error is the average squared error between the true values and the predicted ones:

$$MSE = \frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2$$

Recalling the RSS (Residual Sum of Squares):

$$RSS = \sum_{i=1}^{N}(y_i - \hat{y}_i)^2$$

we then have:

$$MSE = \frac{RSS}{N}$$

Because the errors are squared, the MSE gives more weight to the largest errors, and is therefore more sensitive to outliers.
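A toy illustration of this sensitivity (not from the original notebook): a single large residual dominates the MSE but not the MAE:

y_true = np.zeros(4)
y_pred = np.array([1.0, 1.0, 1.0, 10.0])  # one large residual
print(mean_squared_error(y_true, y_pred))   # 25.75, dominated by the outlier
print(mean_absolute_error(y_true, y_pred))  # 3.25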

2.3.2. Computed
N=y.shape[0]

df_errors=pd.DataFrame()
df_errors['y_true']=y
df_errors['yhat']=y_predict_ols
df_errors['y_yhat']=df_errors['y_true']-df_errors['yhat']
df_errors['(y_yhat)**2']=df_errors['y_yhat']**2

mse_recomp=np.sum(df_errors['(y_yhat)**2'])/N
mse_recomp

2859.69634758675

2.3.3. sklearn
mse_sklearn=mean_squared_error(y, y_predict_ols)
mse_sklearn

2859.69634758675

2.3.4. Both
print("MSE recomputed:",mse_recomp, ", MSE from sk_learn:",mse_sklearn)

MSE recomputed: 2859.69634758675 , MSE from sk_learn: 2859.69634758675

2.4. RMSE: Root Mean Squared Error

2.4.1. Definition

RMSE is the square root of the MSE; unlike the MSE, it is expressed in the same unit as the dependent variable:

$$RMSE = \sqrt{MSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2}$$



2.4.2. Computed
rmse_recomp=np.sqrt(mse_recomp)
rmse_recomp

53.47612876402657

2.4.3. sklearn
# Note: newer scikit-learn versions (>=1.4) also provide root_mean_squared_error for this
rmse_sklearn=mean_squared_error(y, y_predict_ols,squared=False)
rmse_sklearn

53.47612876402657

2.4.4. Both
print("RMSE recomputed:",rmse_recomp, ", RMSE from sk_learn:",rmse_sklearn)

RMSE recomputed: 53.47612876402657 , RMSE from sk_learn: 53.47612876402657

2.5. MAE: Mean Absolute Error

2.5.1. Definition

The MAE, or Mean Absolute Error, is more robust to outliers than the MSE:

$$MAE = \frac{1}{N}\sum_{i=1}^{N}|y_i - \hat{y}_i|$$

2.5.2. Computed
N=y.shape[0]

df_errors=pd.DataFrame()
df_errors['y_true']=y
df_errors['yhat']=y_predict_ols
df_errors['y_yhat']=df_errors['y_true']-df_errors['yhat']
df_errors['abs(y_yhat)']=df_errors['y_yhat'].abs()

mae_recomp=np.sum(df_errors['abs(y_yhat)'])/N
mae_recomp

43.27745202531506

2.5.3. sklearn
mae_sklearn=mean_absolute_error(y, y_predict_ols)
mae_sklearn

43.27745202531506

2.5.4. Both
print("MAE recomputed:",mae_recomp, ", MAE sklearn:",mae_sklearn)

MAE recomputed: 43.27745202531506 , MAE sklearn: 43.27745202531506

2.6. MAPE: Mean Absolute Percentage Error

2.6.1. Definition

MAPE is also known as the Mean Absolute Percentage Deviation (MAPD). This measure is
sensitive to relative errors, and is therefore insensitive to a global scaling of the target variable.
It is also considered more robust to outliers than the MAE:

$$MAPE = \frac{1}{N}\sum_{i=1}^{N}\frac{|y_i - \hat{y}_i|}{\max(\epsilon, |y_i|)}$$

with $\epsilon$ an arbitrarily small, strictly positive value, to avoid an undefined result when $y_i = 0$.
The score can be large when the true $y$ is small, or when the absolute error is large.
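Two toy calls (not from the original notebook) showing that the same absolute error yields very different MAPE values depending on the magnitude of y:

# Absolute error of 1.0 in both cases
print(mean_absolute_percentage_error([100.0], [101.0]))  # 0.01
print(mean_absolute_percentage_error([0.1], [1.1]))      # 10.0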

2.6.2. Computed
N=y.shape[0]
epsilon=0.0001

df_errors=pd.DataFrame()
df_errors['y_true']=y
df_errors['yhat']=y_predict_ols
df_errors['y_yhat']=df_errors['y_true']-df_errors['yhat']
df_errors['abs(y_yhat)']=df_errors['y_yhat'].abs()
df_errors['abs(y_true)']= df_errors['y_true'].apply(lambda x: max(epsilon,np.abs(x)))
df_errors['(y_yhat)/y']=df_errors['abs(y_yhat)']/df_errors['abs(y_true)']

mape_recomp=np.sum(df_errors['(y_yhat)/y'])/N
mape_recomp

3.2636897018293114

2.6.3. sklearn
mape_sklearn=mean_absolute_percentage_error(y, y_predict_ols)
mape_sklearn

3.2636897018293114

2.6.4. Both
print("MAPE recomputed:",mape_recomp, ", MAPE sklearn:",mape_sklearn)

MAPE recomputed: 3.2636897018293114 , MAPE sklearn: 3.2636897018293114

2.7. MedAE: Median Absolute Error

2.7.1. Definition

The MedAE metric computes the median of all absolute residuals:

$$MedAE = \operatorname{median}\left(|y_1 - \hat{y}_1|, \ldots, |y_N - \hat{y}_N|\right)$$

Because the median ignores extreme values, this metric is robust to outliers.
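A toy comparison (not from the original notebook): one extreme residual pulls the MAE up but leaves the MedAE unchanged:

y_true = np.zeros(5)
y_pred = np.array([1.0, 1.0, 1.0, 1.0, 100.0])  # one outlier
print(mean_absolute_error(y_true, y_pred))    # 20.8, pulled up by the outlier
print(median_absolute_error(y_true, y_pred))  # 1.0, unaffected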

2.7.2. Computed

df_errors=pd.DataFrame()
df_errors['y_true']=y
df_errors['yhat']=y_predict_ols
df_errors['y_yhat']=df_errors['y_true']-df_errors['yhat']
df_errors['abs(y_yhat)']=df_errors['y_yhat'].abs()

medae_recomp=np.median(df_errors['abs(y_yhat)'])
medae_recomp

38.5247645673831

2.7.3. sklearn
medae_sklearn=median_absolute_error(y, y_predict_ols)
medae_sklearn

38.5247645673831

2.7.4. Both
print("MAPE recomputed:",mape_recomp, ", MAPE sklearn:",mape_sklearn)

MAPE recomputed: 3.2636897018293114 , MAPE sklearn: 3.2636897018293114

3. Evaluation metrics to compare among models


We will use two regression models to fit the data and evaluate their performance. As explained
at the beginning, we will not split the dataset into train and test sets; that is not the objective of this
notebook.

features=df.columns[:-1]
X=df_norm[features]
y=df_norm['Y']

#### OLS: Ordinary Least Squares


reg_ols=LinearRegression(fit_intercept=False)
reg_ols.fit(X,y)
#Predict values
y_predict_ols=reg_ols.predict(X)

#### LASSO
# Without cross-validation to find the best alpha; that's not the objective here
reg_lasso=Lasso(alpha=1,fit_intercept=False)
reg_lasso.fit(X,y)
#Predict values
y_predict_lasso=reg_lasso.predict(X)

Now that we fit the models, let’s compute the different metrics for each of them:

predicted_values=[y_predict_ols,y_predict_lasso]
models=['OLS','LASSO']
measures_list=[]

for model, y_predict in zip(models, predicted_values):
    r2=r2_score(y, y_predict)
    explained_variance=explained_variance_score(y, y_predict)
    mse=mean_squared_error(y, y_predict)
    rmse=mean_squared_error(y, y_predict,squared=False)
    mae=mean_absolute_error(y, y_predict)
    mape=mean_absolute_percentage_error(y, y_predict)
    medae=median_absolute_error(y, y_predict)
    measures_list.append([model,r2,explained_variance,mse,rmse,mae,mape,medae])

df_results=pd.DataFrame(data=measures_list,
                        columns=['model','r2','explained_var','mse','rmse','mae','mape','medae'])

df_results

As shown in the results, the R2 is higher for the OLS model (0.52) than for the Lasso model (0.36), even if neither value is very high in absolute terms. At this stage, we can assume that OLS is a better model than Lasso for our dataset.
Furthermore, the MSE and RMSE are lower for OLS than for Lasso. Once again, OLS is the better fit for our dataset.
MAE and MedAE also show better results for OLS than for Lasso.
However, the MAPE is lower for Lasso than for OLS. Since the MAPE weights each residual by the magnitude of the true y, a few large true values can make the relative residuals look small. It could be interesting to study the outliers in the dataset, and remove them if any.
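As a starting point for that investigation (a sketch, not in the original notebook), one can list the observations with the largest relative errors; note that y here is mean-centered, so values near zero inflate the ratio:

# Relative error per observation, clipping |y| away from 0 as in the MAPE definition
rel_err=(y-y_predict_ols).abs()/y.abs().clip(lower=1e-4)
print(rel_err.sort_values(ascending=False).head())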

Globally, the OLS model shows better metrics than the Lasso model. Thus, between these two
models, OLS is the better fit for our dataset.

I hope you enjoyed it.

If you want to learn more about data science, have a look on my blog:
https://machinelearning-basic.blogspot.com/
Download for free an ebook on the most famous regression models, simply explained:
https://sites.google.com/view/machinelearning-sample/download-free-ebook
