6 - Classification and Regression Tasks
Regression Tasks
Intro to AI and Data Science
NGN 112 – Fall 2024
Amer S. Zakaria
Department of Electrical Engineering
College of Engineering
Regression
Applications of Regression
Classification
Applications of Classification
Regression and Classification
◻ Common things:
▪ Regression and classification are both supervised learning methods
▪ They both require a dataset for training so they can make predictions
Regression versus Classification
Difference:
◻ In regression:
◻ Predict the price of a house based on variables like size of the house,
number of rooms, school district, neighborhood, etc.
◻ Predict the net worth of people based on variables like their age,
income, education, etc.
◻ Regression
A statistical technique that uses independent (input) variables to predict the
outcome of a dependent (output) variable.
◻ Linear regression
◻ Simple regression
It establishes a relationship between one independent (input) variable and one
dependent (output) variable. It attempts to draw the line or curve that best fits the
data and minimizes the regression errors.
▪ Example of Simple Linear Regression:
Equation of a line: $y = b_1 x_1 + b_0$
where $x_1$ is the input variable, $y$ is the output variable, and $b_1$, $b_0$ are the coefficients.
◻ Multiple regression
It establishes a relationship between multiple independent (input) variables and one
dependent (output) variable.
▪ Example of Multiple Linear Regression:
$y = b_n x_n + \dots + b_2 x_2 + b_1 x_1 + b_0$
where $x_1, \dots, x_n$ are the input variables, $y$ is the output variable, and $b_n, \dots, b_0$ are the
coefficients.
The objective in a regression problem is to calculate the
coefficients using optimization techniques.
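For the simple (one-input) case, the coefficient calculation can be illustrated directly. The sketch below assumes ordinary least squares as the optimization criterion, which the slides do not name explicitly. The function name `fit_simple_linear_regression` and the data points are made up for illustration.

```python
def fit_simple_linear_regression(x, y):
    """Return (b1, b0) minimizing the sum of squared errors for y = b1*x + b0."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope: co-variation of x and y divided by the variation of x
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = s_xy / s_xx
    # Intercept: the fitted line passes through the point of means
    b0 = y_mean - b1 * x_mean
    return b1, b0


# Made-up data lying exactly on y = 2x + 1
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
b1, b0 = fit_simple_linear_regression(x, y)
print(b1, b0)  # 2.0 1.0
```

Because the made-up points lie exactly on a line, the fit recovers the slope and intercept with zero error; with noisy data the same formulas give the line with the smallest sum of squared errors.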
Simple Linear Regression
Example
◻ Estimating the net worth of people based on their age.
[Figure: net worth versus age]
Simple Linear Regression (cont.)
Example
◻ If you want to draw a line representing the data, which line of
[Figure: net worth versus age, with candidate lines A, B, and C]
Answer is Line B
Simple Linear Regression: Training & Fitting
Example
◻ A simple linear regression model can be represented by the line that
best fits the data.
◻ Can you give the model (line) equation, given a point that the line
passes through?
[Figure: fitted line on net worth versus age axes, with the value 500 marked on the net worth axis and 80 on the age axis]
Line equation: $y = b_1 x + b_0$
Here:
(Net worth) = $b_1$ (age) + $b_0$
where $b_1$ is the slope and $b_0$ is the y-intercept (value of y when x = 0)
(Net worth) = (500/80) (age) + 0
Simple Linear Regression: Prediction
Example
◻ Using this model, predict the net worth of a person of age 36
Given the line equation:
(Net worth) = (500/80) (age) + 0
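Substituting age = 36 into the line equation gives the prediction. A minimal sketch of that arithmetic, with variable names chosen here for illustration:

```python
# Line from the slide: (Net worth) = (500/80) * (age) + 0
b1 = 500 / 80   # slope
b0 = 0          # y-intercept
age = 36

net_worth = b1 * age + b0
print(net_worth)  # 225.0
```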
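◻ In $SSE_{Total}$, $\bar{y}$ is the mean of the actual values.
◻ The value of $R^2$ is typically between 0.0 and 1.0; on test data it can be negative when the model predicts worse than the mean of the actual values.
➢ A value near 0.0 means the regression model captures little of the trend
in the data (it does no better than always predicting the mean).
➢ A value near 1.0 means the regression model describes the
relationship between the input(s) and the output well.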
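As a sketch of how these quantities are computed, the snippet below assumes the standard definition of R squared as one minus the ratio of the residual sum of squared errors to the total sum of squared errors. The name `sse_residual` is chosen here for illustration, and the data values are made up.

```python
y_actual = [3.0, 5.0, 7.0, 9.0]      # made-up actual values
y_predicted = [2.5, 5.5, 7.0, 9.5]   # made-up model predictions

n = len(y_actual)
y_mean = sum(y_actual) / n

# Sum of squared errors between actual and predicted values
sse_residual = sum((a - p) ** 2 for a, p in zip(y_actual, y_predicted))
# Sum of squared deviations of the actual values from their mean
sse_total = sum((a - y_mean) ** 2 for a in y_actual)

mse = sse_residual / n
r2 = 1 - sse_residual / sse_total
print(mse)           # 0.1875
print(round(r2, 4))  # 0.9625
```

If the predictions were all equal to the mean of the actual values, `sse_residual` would equal `sse_total` and R squared would be 0.0.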
Regression Task Pipeline
6. Model evaluation
from sklearn.metrics import mean_squared_error, r2_score
MSE = mean_squared_error(y_test,y_pred)
R2 = r2_score(y_test,y_pred)
Applications of Regression in Engineering-
Example 1: Simple Regression
Dataset link:
https://www.kaggle.com/datasets/himanshunakrani/student-study-hours?ref=machinelearningnuggets.com
Applications of Regression in Engineering-
Example 1: Simple Regression (cont.)
import pandas as pd
# Loading dataset
stud_scores = pd.read_csv('student_scores.csv')
stud_scores.describe()
Output
Applications of Regression in Engineering-
Example 1: Simple Regression (cont.)
X = X.to_numpy()
print(X)
Output
array([2.5, 5.1, 3.2, 8.5, 3.5, 1.5, 9.2, 5.5, 8.3, 2.7, 7.7,
5.9, 4.5, 3.3, 1.1, 8.9, 2.5, 1.9, 6.1, 7.4, 2.7, 4.8, 3.8,
6.9, 7.8])
Applications of Regression in Engineering-
Example 1: Simple Regression (cont.)
X = X.reshape(-1, 1)
print(X)
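scikit-learn estimators expect the input as a 2-D array of shape (n_samples, n_features), which is why the 1-D array of hours is reshaped. A small sketch using only the first three values of the array, for brevity:

```python
import numpy as np

X = np.array([2.5, 5.1, 3.2])   # 1-D array, shape (3,)
print(X.shape)

# -1 lets NumPy infer the number of rows; 1 fixes a single column (one feature)
X_2d = X.reshape(-1, 1)
print(X_2d.shape)   # (3, 1)
print(X_2d)
```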
Applications of Regression in Engineering-
Example 1: Simple Regression (cont.)
Output
array([21, 47, 27, 75, 30, 20, 88, 60, 81, 25, 85, 62, 41, 42,
17, 95, 30, 24, 67, 69, 30, 54, 35, 76, 86], dtype=int64)
Applications of Regression in Engineering-
Example 1: Simple Regression (cont.)
Output
Coefficient: [10.41]
Intercept: -1.51
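For reference, a minimal sketch that fits a line to the 25 hours and scores printed in the earlier outputs. It assumes scikit-learn's `LinearRegression` and fits on all 25 points, so the coefficient and intercept will not match the output above exactly, which presumably came from a model trained on a subset of the data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

hours = np.array([2.5, 5.1, 3.2, 8.5, 3.5, 1.5, 9.2, 5.5, 8.3, 2.7, 7.7,
                  5.9, 4.5, 3.3, 1.1, 8.9, 2.5, 1.9, 6.1, 7.4, 2.7, 4.8, 3.8,
                  6.9, 7.8])
scores = np.array([21, 47, 27, 75, 30, 20, 88, 60, 81, 25, 85, 62, 41, 42,
                   17, 95, 30, 24, 67, 69, 30, 54, 35, 76, 86])

X = hours.reshape(-1, 1)   # 2-D input: one column per feature
y = scores

regressor = LinearRegression()
regressor.fit(X, y)

print("Coefficient:", np.round(regressor.coef_, 2))
print("Intercept:", round(float(regressor.intercept_), 2))
```

The coefficient is the estimated change in score per additional hour of study, and the intercept is the predicted score at zero hours.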
Applications of Regression in Engineering-
Example 1: Simple Regression (cont.)
Output
Predictions:
[ 9.93952968 32.84320126 18.26813752 86.97915227 48.45934097
78.65054442 61.99332873 75.52731648]
Applications of Regression in Engineering-
Example 1: Simple Regression (cont.)
Output
Mean squared error (MSE): 56.09
Coefficient of determination(R squared): 0.89
Coefficient of determination(R squared) using score function: 0.89
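The last two output lines agree because, for scikit-learn regressors, the `score` method returns the same R squared that `r2_score` computes from the predictions. A sketch on made-up data (the arrays below are illustrative and not taken from the dataset):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Made-up training and test data for illustration only
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_train = np.array([12.0, 19.0, 33.0, 41.0, 48.0])
X_test = np.array([[1.5], [3.5], [4.5]])
y_test = np.array([16.0, 35.0, 47.0])

regressor = LinearRegression().fit(X_train, y_train)
y_pred = regressor.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
r2_from_metric = r2_score(y_test, y_pred)
r2_from_score = regressor.score(X_test, y_test)  # predicts internally, then computes R squared

print("MSE:", round(mse, 2))
print("R squared (r2_score):", round(r2_from_metric, 2))
print("R squared (score):", round(r2_from_score, 2))
```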
Applications of Regression in Engineering-
Example 1: Simple Regression (cont.)
import matplotlib.pyplot as plt
# Label the axes, show the legend, and display the figure
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.show()
Applications of Regression in Engineering-
Example 1: Simple Regression (cont.)
Dataset link:
http://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_california_housing.html
Applications of Regression in Engineering-
Example 2: Multiple Regression (cont.)
Output
Feature names: ['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms',
'Population', 'AveOccup', 'Latitude', 'Longitude']
Target names: ['MedHouseVal']
Shape of dataset (20640, 8)
Applications of Regression in Engineering-
Example 2: Multiple Regression (cont.)
import pandas as pd
# Convert X and y into a DataFrame
df = pd.DataFrame(data=X, columns=feature_names)
df['House_Value'] = y  # new column, target
# Print the DataFrame
df
df.head()
# For DataFrame data: separate the inputs from the target
X = df.drop('House_Value', axis=1)  # axis=1 drops a column (axis=0 would drop a row)
y = df['House_Value']  # target column
# Or select the input columns by label
selected_columns = ['label 1', 'label 2', ...]  # placeholder labels
X = df[selected_columns]
Output
Applications of Regression in Engineering-
Example 2: Multiple Regression (cont.)
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)
Output
(14448, 8)
(6192, 8)
(14448,)
(6192,)
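The shapes above are consistent with a 70/30 split of the 20,640 records (14,448 for training and 6,192 for testing). The sketch below reproduces them with `train_test_split`, assuming `test_size=0.3`, since the actual call is not shown in this section. It uses placeholder arrays of zeros instead of the real data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the same shape as the California housing data
X = np.zeros((20640, 8))
y = np.zeros(20640)

# test_size=0.3 is an assumption inferred from the printed shapes
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

print(X_train.shape)  # (14448, 8)
print(X_test.shape)   # (6192, 8)
print(y_train.shape)  # (14448,)
print(y_test.shape)   # (6192,)
```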
Applications of Regression in Engineering-
Example 2: Multiple Regression (cont.)
import numpy as np
print("Coefficients:\n", np.round(regressor.coef_,2))
print('Intercept:\n', round(regressor.intercept_,2))
Output
# expose the model to new values and predict the target vector
y_predictions = regressor.predict(X_test)
print('Predictions:', y_predictions)
Output
◻ Model evaluation
#Model evaluation
from sklearn.metrics import mean_squared_error, r2_score
# The mean squared error
print("Mean squared error: " , round(mean_squared_error(y_test,
y_predictions),2))
# The coefficient of determination: 1 is perfect prediction
print("Coefficient of determination: ", round(r2_score(y_test,
y_predictions),2))
Output
Diabetes Dataset
Dataset link:
https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html
Applications of Regression in Engineering-
Example 3: Multiple Regression (cont.)
Output
Coefficients:
[ -30.77592231 -197.11523603 519.50634733 346.49118652 -688.21410873
431.49892496 19.3325826 94.20724607 716.79048049 75.26379265]
y-intercept:
152.09140122905802
Applications of Regression in Engineering-
Example 3: Multiple Regression (cont.)
◻ Model evaluation
from sklearn.metrics import mean_squared_error, r2_score
Output
Training/learning results in a trained model
[Figure: decision tree with root test “Sunny?” and second-level tests “X2 < 2?” and “X2 < 4?”, each with Yes/No branches]
Classification Algorithms- DT
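Splitting Attributes
Home Owner?
  Yes → NO
  No → MarSt?
    Married → NO
    Single, Divorced → Income?
      < 80K → NO
      > 80K → YES
[Figure: a second tree over the same attributes, with MarSt at the root]
MarSt?
  Married → NO
  Single, Divorced → Home Owner?
    Yes → NO
    No → Income?
      < 80K → NO
      > 80K → YES
[Figure: Training Data]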
Apply the Trained Model to Predict the Class of a Test Sample
[Figure sequence: the test sample is passed down the trained tree one split at a time, starting at the Home Owner root, until it reaches a leaf.]
Predicted class is “No”
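Applying a trained tree to a sample amounts to following one root-to-leaf path. A minimal sketch of the Home Owner / MarSt / Income tree as nested conditions is shown below. The function name, the argument names, and the test samples are hypothetical. Treating an income of exactly 80K as belonging to the "> 80K" branch is also an assumption, since the slide shows only "< 80K" and "> 80K".

```python
def predict_class(home_owner, marital_status, income_k):
    """Walk the tree from the root to a leaf and return the class label."""
    if home_owner:                      # Home Owner = Yes
        return "NO"
    if marital_status == "Married":     # MarSt = Married
        return "NO"
    # MarSt is Single or Divorced: decide on income (in thousands)
    if income_k < 80:
        return "NO"
    return "YES"


# Hypothetical test samples, not taken from the slides
print(predict_class(home_owner=False, marital_status="Married", income_k=100))  # NO
print(predict_class(home_owner=False, marital_status="Single", income_k=100))   # YES
```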
Classification Algorithms- DT: Python
Output
[1 2]
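The code that produced this output is not included in this section. As a rough sketch of how a decision tree classifier is typically trained and used in scikit-learn, the example below assumes the Iris dataset and two illustrative measurement vectors. It is not meant to reproduce the output above.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# random_state is fixed only so that repeated runs build the same tree
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

# Two illustrative samples: sepal length, sepal width, petal length, petal width (cm)
new_samples = [[5.9, 3.0, 4.2, 1.5], [6.5, 3.0, 5.5, 1.8]]
y_pred = clf.predict(new_samples)
print(y_pred)
```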
Classification Algorithms- KNN
Output
[1 2]
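The KNN code behind this output is not shown here either. To illustrate the idea of k-nearest neighbors, the sketch below implements a majority vote among the k closest training points from scratch. It is not scikit-learn's implementation, and the function name `knn_predict` and the 2-D data are made up.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = [(math.dist(p, query), label)
                 for p, label in zip(train_points, train_labels)]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up 2-D training data with two classes (1 and 2)
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0),
                (6.0, 6.0), (6.5, 7.0), (7.0, 6.0)]
train_labels = [1, 1, 1, 2, 2, 2]

predictions = [knn_predict(train_points, train_labels, q)
               for q in [(1.2, 1.5), (6.4, 6.2)]]
print(predictions)  # [1, 2]
```

An odd k is a common choice for two-class problems because it avoids tied votes.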
Classification Algorithms- NB
❑ Example:
❑ Assume you have two friends, Adam and Lena.
❑ You received a message from one of them, but you do not
know who the sender is.
❑ You would like to use machine learning to “predict” the
sender.
❑ Assuming equal prior probability: Both Adam and Lena
have access to the internet and can write emails.
■ P(Adam) = 0.5
■ P(Lena) = 0.5
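To show how these priors enter a prediction, the sketch below applies Bayes' rule to a single observed word. The priors come from the slide. The likelihood values are hypothetical numbers invented for this illustration and do not appear in the slides.

```python
prior = {"Adam": 0.5, "Lena": 0.5}        # from the slide
# Hypothetical likelihoods P(word | sender); these values are NOT from the slides
likelihood = {"Adam": 0.1, "Lena": 0.3}

# Joint probability: P(sender) * P(word | sender)
joint = {s: prior[s] * likelihood[s] for s in prior}
evidence = sum(joint.values())            # P(word)
posterior = {s: joint[s] / evidence for s in joint}

predicted_sender = max(posterior, key=posterior.get)
print(posterior)
print(predicted_sender)  # Lena
```

With equal priors, the predicted sender is simply the one with the larger likelihood for the observed word. Unequal priors would shift the posterior toward the more frequent sender.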
Classification Algorithms- NB: Example (cont.)
◻ Data Preprocessing
from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC
X, y = load_iris(return_X_y=True)
# Output is already encoded as numbers, so no label encoding is needed
# Linear SVM
clf = LinearSVC()
print(y_pred)
Output
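The splitting, fitting, and prediction steps are not shown in this section. A hedged end-to-end sketch on the Iris data follows, covering the three SVM variants named later in the plotting slide. The 80/20 split, `random_state`, and `max_iter` values are assumptions made for this example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, LinearSVC

X, y = load_iris(return_X_y=True)

# The 80/20 split and random_state are assumptions made for this sketch
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

classifiers = {
    "Linear SVM": LinearSVC(max_iter=10000),   # larger max_iter to help convergence
    "SVM with poly kernel": SVC(kernel="poly"),
    "SVM with RBF kernel": SVC(kernel="rbf"),
}

predictions = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)                  # train on the training portion
    predictions[name] = clf.predict(X_test)    # predict classes of unseen samples
    print(name, predictions[name])
```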
Applications of Classification in Engineering-
Example 1: Model Evaluation using Accuracy
◻ Model evaluation
#Evaluating the model
#print the accuracy score
Output
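Accuracy is the fraction of test samples whose predicted class matches the true class. A small from-scratch sketch with made-up labels is shown below. scikit-learn's `accuracy_score(y_true, y_pred)` computes the same quantity.

```python
y_true = [0, 1, 2, 2, 1]   # made-up true class labels
y_pred = [0, 1, 2, 1, 1]   # made-up predicted labels

# Count the positions where prediction and truth agree
correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
accuracy = correct / len(y_true)
print(accuracy)  # 0.8
```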
Applications of Classification in Engineering-
Example 1: Model Evaluation via Plotting (DT)
Output
Applications of Classification in Engineering-
Example 1: Model Evaluation via Plotting
Output
Applications of Classification in Engineering-
Example 1: Model Evaluation via Plotting
[Figure panels: Linear SVM, SVM with poly kernel, SVM with RBF kernel]
Summary: Regression vs. Classification
Purpose
▪ Classification: To find the decision boundary, which divides the dataset into different classes.
▪ Regression: To find the line/curve that best fits the data and predicts the output more accurately.
Evaluation
▪ Classification: Accuracy, and other measures such as F-score, precision, recall, and confusion matrix.
▪ Regression: MSE, R2, and other metrics such as SSE, MAE, and MAPE.
Learning Outcomes
◻ Another approach in splitting data into the training set and testing set
# Split the data and targets - SHORT WAY
# Use the function train_test_split to split the data and targets into
# training and testing sets. Testing data size is 20% of the data, and the rest
# is the training portion
from sklearn.model_selection import train_test_split
diabetes_X_train, diabetes_X_test, diabetes_y_train, diabetes_y_test = train_test_split(
    diabetes_X, diabetes_y, test_size=0.2)
# OR: Split the data into training and testing sets - LONG WAY
diabetes_X_train = diabetes_X[:-20]  # all records except the last 20
diabetes_X_test = diabetes_X[-20:]   # the last 20 records
diabetes_y_train = diabetes_y[:-20]  # targets of the training records
diabetes_y_test = diabetes_y[-20:]   # targets of the last 20 records
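Both splitting approaches can be tried on the diabetes dataset bundled with scikit-learn. The sketch below is an illustration under assumptions: `random_state` is added for repeatability, and the model is fitted on the "long way" split. The original may have been configured differently.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

diabetes_X, diabetes_y = load_diabetes(return_X_y=True)

# Short way: random 80/20 split (random_state is an addition for repeatability)
X_train, X_test, y_train, y_test = train_test_split(
    diabetes_X, diabetes_y, test_size=0.2, random_state=0)
print(X_train.shape, X_test.shape)

# Long way: hold out the last 20 records, for inputs and targets alike
X_train_long, X_test_long = diabetes_X[:-20], diabetes_X[-20:]
y_train_long, y_test_long = diabetes_y[:-20], diabetes_y[-20:]
print(X_train_long.shape, X_test_long.shape)

# Fit and evaluate on the long-way split
regressor = LinearRegression().fit(X_train_long, y_train_long)
y_pred = regressor.predict(X_test_long)
print("MSE:", round(mean_squared_error(y_test_long, y_pred), 2))
print("R squared:", round(r2_score(y_test_long, y_pred), 2))
```

The short way shuffles the records before splitting, whereas the long way keeps the original order, so the two approaches generally produce different test sets and different scores.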