6 - Classification and Regression Tasks
Intro to AI and Data Science
NGN 112 – Fall 2024
Ammar Hasan
Department of Electrical Engineering
College of Engineering
Regression
Applications of Regression
Classification
Applications of Classification
Regression and Classification
3
◻ What they have in common:
▪ Regression and classification are both supervised learning methods
▪ They both require a dataset for training so they can make predictions
Regression versus Classification
4
Difference:
◻ In regression, the output (dependent) variable is continuous, i.e., a numeric value.
◻ In classification, the output variable is categorical, i.e., a discrete class label.
Regression
Regression
6
◻ Predict the price of a house based on variables like size of the house,
number of rooms, school district, neighborhood, etc.
◻ Predict the net worth of people based on variables like their age,
income, education, etc.
◻ Linear regression
A statistical technique that uses independent (input) variables to predict the
outcome of a dependent (output) variable. The dependent variable shows
a linear relationship with each of the independent variables.
◻ Non-linear regression
The dependent variable shows a non-linear relationship with the
independent variables.
Types of Regression (cont.)
13
◻ Simple regression
It establishes a relationship between one independent (input) variable and one
dependent (output) variable. It attempts to draw a line or curve that fits the data most
and minimizes regression errors.
▪ Example of Simple Linear Regression:
Equation of a line: y = b1·x1 + b0
where x1 is the input variable, y is the output variable, and b1, b0 are the coefficients.
◻ Multiple regression
It establishes a relationship between multiple independent (input) variables and one
dependent (output) variable.
▪ Example of Multiple Linear Regression:
y = bn·xn + ... + b2·x2 + b1·x1 + b0
where x1, ..., xn are the input variables, y is the output variable, and bn, ..., b0 are the coefficients.
The objective in a regression problem is to calculate the coefficients using optimization techniques.
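For instance, scikit-learn's LinearRegression computes the coefficients with a least-squares fit. A minimal sketch with hypothetical data (the values below are illustrative, not from the slides):
import numpy as np
from sklearn.linear_model import LinearRegression
X = np.array([[1], [2], [3], [4]])  # hypothetical inputs, one feature
y = np.array([3, 5, 7, 9])          # hypothetical outputs (exactly y = 2x + 1)
regressor = LinearRegression().fit(X, y)
print(regressor.coef_, regressor.intercept_)  # approximately [2.] and 1.0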
Simple Linear Regression
14
Example
◻ Estimating the net worth of people based on their age.
(Plot: scatter of Net worth vs. Age)
Simple Linear Regression (cont.)
15
Example
◻ If you want to draw a line representing the data, which of the lines A, B, or C fits best?
(Plot: three candidate lines, A, B, and C, drawn over the Net worth vs. Age scatter.)
Answer: Line B
Simple Linear Regression: Training & Fitting
16
Example
◻ A Simple Linear Regression model can be represented by the line that best fits the data.
◻ Can you give the model (line) equation, given a point that the line passes through?
Line equation: y = b1·x + b0
Here:
(Net worth) = b1·(age) + b0
where b1 is the slope and b0 is the y-intercept (the value of y when x = 0).
(Plot: the fitted line passes through the origin and the point age = 80, net worth = 500.)
(Net worth) = (500/80) (age) + 0
Simple Linear Regression: Prediction
17
Example
◻ Using this model, predict the net worth of a person of age 36
Given the line equation:
(Net worth) = (500/80) (age) + 0
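Substituting age = 36 into the model gives the prediction; a one-line check (plain arithmetic, no library needed):
print((500/80) * 36)  # 225.0, the predicted net worth at age 36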
◻ A regression model is evaluated with the coefficient of determination:
R² = 1 − SSE_regression / SSE_Total
where SSE_regression = Σ (y_i − ŷ_i)² and SSE_Total = Σ (y_i − ȳ)², summed over all samples i = 1, ..., n.
◻ In SSE_Total, ȳ is the mean of the actual values.
◻ The value of R² is between 0.0 and 1.0.
➢ 0.0 means the regression model is not doing a good job of capturing the
trend in the data.
➢ 1.0 means the regression model is doing a good job of describing the
relationship between the input(s) and the output.
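As a check, the formula above can be computed by hand and compared with scikit-learn's r2_score; the arrays below are hypothetical:
import numpy as np
from sklearn.metrics import r2_score
y_actual = np.array([3.0, 5.0, 7.0, 9.0])     # hypothetical actual values
y_predicted = np.array([2.8, 5.1, 7.4, 8.9])  # hypothetical predictions
sse_reg = np.sum((y_actual - y_predicted) ** 2)
sse_total = np.sum((y_actual - y_actual.mean()) ** 2)
print(1 - sse_reg / sse_total)          # manual R²
print(r2_score(y_actual, y_predicted))  # same value from scikit-learn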
Regression Task Pipeline
20
1. Data preprocessing
2. Splitting data into a training set and a testing set
3. Selecting or creating the model
4. Training the model
5. Prediction using the trained model
6. Model evaluation
Common Python Codes – Regression
22
3. Selecting and creating the model: choose one of the code options from the next slide
4. Training the model:
regressor.fit(X_train, y_train)
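5. Prediction using the trained model (this step is not shown on the slide; presumably the usual scikit-learn call):
y_pred = regressor.predict(X_test)  # assumed; produces the y_pred used in step 6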
6. Model evaluation
from sklearn.metrics import mean_squared_error, r2_score
MSE = mean_squared_error(y_test,y_pred)
R2 = r2_score(y_test,y_pred)
Details of Selecting and creating a model
23
# Creating the linear model
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
# OR
# Creating the non-linear (polynomial) model
from sklearn.svm import SVR
regressor = SVR(kernel = 'poly')  # degree 3 is the default value
#regressor = SVR(kernel = 'poly', degree = 4)  # degree 4
# OR
# Creating the non-linear (RBF) model
from sklearn.svm import SVR
regressor = SVR(kernel = 'rbf')
24
Applications of Regression
Applications of Regression in Engineering-
Example 1: Simple Regression
25
Dataset link:
https://www.kaggle.com/datasets/himanshunakrani/student-study-hours?ref=machinelearningnuggets.com
Applications of Regression in Engineering-
Example 1: Simple Regression (cont.)
26
# LOADING DATASET
import pandas as pd
stud_scores = pd.read_csv('student_scores.csv')
stud_scores.describe()
Output
Applications of Regression in Engineering-
Example 1: Simple Regression (cont.)
27
Output
Applications of Regression in Engineering-
Example 1: Simple Regression (cont.)
28
X = X.to_numpy()
print(X)
Output
array([2.5, 5.1, 3.2, 8.5, 3.5, 1.5, 9.2, 5.5, 8.3, 2.7, 7.7,
5.9, 4.5, 3.3, 1.1, 8.9, 2.5, 1.9, 6.1, 7.4, 2.7, 4.8, 3.8,
6.9, 7.8])
Applications of Regression in Engineering-
Example 1: Simple Regression (cont.)
29
X = X.reshape(-1, 1)  # reshape into a 2-D array of shape (n_samples, 1), as scikit-learn expects
print(X)
Applications of Regression in Engineering-
Example 1: Simple Regression (cont.)
30
Output
array([21, 47, 27, 75, 30, 20, 88, 60, 81, 25, 85, 62, 41, 42,
17, 95, 30, 24, 67, 69, 30, 54, 35, 76, 86], dtype=int64)
Applications of Regression in Engineering-
Example 1: Simple Regression (cont.)
31
Output
Predictions:
[ 9.93952968 32.84320126 18.26813752 86.97915227 48.45934097
78.65054442 61.99332873 75.52731648]
Applications of Regression in Engineering-
Example 1: Simple Regression (cont.)
35
Output
Mean squared error (MSE): 56.09
Coefficient of determination (R squared): 0.89
Coefficient of determination (R squared) using the score function: 0.89
Applications of Regression in Engineering-
Example 1: Simple Regression (cont.)
36
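The plotting calls at the top of this slide are not in the extract; presumably a scatter of the actual points and the fitted line, along these lines (variable names assumed from the earlier steps):
import matplotlib.pyplot as plt
plt.scatter(X_test, y_test, label = 'Actual')
plt.plot(X_test, y_pred, color = 'red', label = 'Predicted')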
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.show()
Applications of Regression in Engineering-
Example 2: Multiple Regression
37
from sklearn.datasets import fetch_california_housing
housing_data = fetch_california_housing()  # loads the California Housing dataset
X, y = housing_data.data, housing_data.target
feature_names = housing_data.feature_names
target_names = housing_data.target_names
print('Feature names: ', feature_names)
print('\nTarget names: ', target_names)  # Median house value for households
print('\nShape of dataset', X.shape)
Output
Feature names: ['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms',
'Population', 'AveOccup', 'Latitude', 'Longitude']
Target names: ['MedHouseVal']
Shape of dataset (20640, 8)
Applications of Regression in Engineering-
Example 2: Multiple Regression (cont.)
42
import pandas as pd
# Convert X and y into a DataFrame
df = pd.DataFrame(data=X, columns=feature_names)
df['House_Value'] = y  # new column for the target
# Print the DataFrame
df
df.head()
# For DataFrame data, extract the features and the target:
X = df.drop('House_Value', axis=1)  # drop the target column
y = df['House_Value']  # target column
# Or select specific feature columns:
selected_columns = ['label 1', 'label 2', ...]
X = df[selected_columns]
Output
Applications of Regression in Engineering-
Example 2: Multiple Regression (cont.)
43
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)
Output:
(14448, 8)
(6192, 8)
(14448,)
(6192,)
Applications of Regression in Engineering-
Example 2: Multiple Regression (cont.)
44
import numpy as np
print("Coefficients:\n", np.round(regressor.coef_,2))
print('Intercept:\n', round(regressor.intercept_,2))
Output
# expose the model to new values and predict the target vector
y_pred = regressor.predict(X_test)
print('Predictions:', y_pred)
Output
◻ Model evaluation
#Model evaluation
from sklearn.metrics import mean_squared_error, r2_score
# The mean squared error
print("Mean squared error: ", round(mean_squared_error(y_test, y_pred), 2))
# The coefficient of determination: 1 is perfect prediction
print("Coefficient of determination: ", round(r2_score(y_test, y_pred), 2))
Output
Mean squared error: 0.55
Coefficient of determination: 0.57
Complete code for California Housing Dataset
after excluding some optional code
48
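The code screenshot itself is not reproduced. A minimal sketch assembling the steps from the preceding slides (test_size=0.3 matches the printed shapes; the random_state is an assumption):
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# 1-2. Load the data and split it into training and testing sets
housing_data = fetch_california_housing()
X, y = housing_data.data, housing_data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)  # random_state assumed
# 3-4. Create and train the linear model
regressor = LinearRegression()
regressor.fit(X_train, y_train)
# 5. Predict on the test set
y_pred = regressor.predict(X_test)
# 6. Evaluate the model
print("Mean squared error: ", round(mean_squared_error(y_test, y_pred), 2))
print("Coefficient of determination: ", round(r2_score(y_test, y_pred), 2))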
Diabetes Dataset
Input Variable(s):
age: Age in years
sex: Gender of the patient
bmi: Body mass index
bp: Average blood pressure
tc: Total serum cholesterol
ldl: Low-density lipoproteins
hdl: High-density lipoproteins
tch: Total cholesterol / HDL
ltg: Possibly log of serum triglycerides level
glu: Blood sugar level
Output Variable(s):
Measure of disease progression
Dataset link:
https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html
Applications of Regression in Engineering-
Example 3: Multiple Regression (cont.)
50
# OR: Split the data into training and testing sets – LONG WAY
X_train = X[:-20]  # all records except the last 20
X_test = X[-20:]   # the last 20 records
y_train = y[:-20]  # matching split of the target
y_test = y[-20:]
regressor = LinearRegression()
regressor.fit(X_train, y_train)  # needed before the coefficients below
Output:
Coefficients:
[ -30.77592231 -197.11523603 519.50634733 346.49118652 -688.21410873
431.49892496 19.3325826 94.20724607 716.79048049 75.26379265]
y-intercept:
152.09140122905802
Applications of Regression in Engineering-
Example 3: Multiple Regression (cont.)
54
◻ Model evaluation
from sklearn.metrics import mean_squared_error, r2_score
Output:
Mean squared error: 2197.35
Coefficient of determination: 0.57
Complete code for Diabetes Dataset
after excluding some optional code
56
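The code screenshot is not reproduced. A minimal sketch assembling the steps from the preceding slides (the manual 20-record split follows slide 50):
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# 1-2. Load the data and hold out the last 20 records for testing
X, y = load_diabetes(return_X_y=True)
X_train, X_test = X[:-20], X[-20:]
y_train, y_test = y[:-20], y[-20:]
# 3-4. Create and train the linear model
regressor = LinearRegression()
regressor.fit(X_train, y_train)
# 5-6. Predict and evaluate
y_pred = regressor.predict(X_test)
print("Mean squared error: ", round(mean_squared_error(y_test, y_pred), 2))
print("Coefficient of determination: ", round(r2_score(y_test, y_pred), 2))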
1. Try different values of test_size in Step 2 and notice the effect on the MSE
and R2 scores.
2. Try different models in Step 3 and identify the model that gives the best
R2 score. The model options are given on the "Details of Selecting and
creating a model" slide. Note that if you use a nonlinear model, delete the
code for printing the coefficients and intercept.
3. The diabetes dataset has 10 features, so a single scatter plot of the whole
model is not possible. However, a scatter plot is possible with one feature.
Use the following code to draw the scatter plot for the feature at index 2,
i.e., BMI, which has been normalized with Z-score normalization.
import matplotlib.pyplot as plt
# plot basic scatterplot of the actual values
plt.scatter(X_test[:, 2], y_test, label = 'Actual')
# plot the predicted values
plt.scatter(X_test[:, 2], y_pred, label = 'Predicted')
plt.xlabel('Scaled BMI')
plt.ylabel('Disease Progression')
plt.legend()
plt.show()
58
Classification
Classification
59
Classification Task Pipeline
1. Data preprocessing
2. Splitting data into a training set and a testing set
3. Selecting or creating the model
4. Training the model
5. Prediction using the trained model
6. Model evaluation
Common Python Codes – Classification
67
3. Selecting and creating the model: choose one of the code options from the next slide (sketched below)
4. Training the model:
clf.fit(X_train, y_train)
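The code options themselves are not reproduced in this extract; a sketch of the scikit-learn classifiers this section covers (the hyperparameter values are assumptions):
# Creating a Decision Tree model
from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier()
# OR
# Creating a K-Nearest Neighbors model
from sklearn.neighbors import KNeighborsClassifier
clf = KNeighborsClassifier(n_neighbors=5)  # n_neighbors assumed
# OR
# Creating a Naive Bayes model
from sklearn.naive_bayes import GaussianNB
clf = GaussianNB()
# OR
# Creating a linear SVM model (used in the Iris example later)
from sklearn.svm import LinearSVC
clf = LinearSVC()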
Training/learning results in a trained model.
(Illustration: a small decision tree whose root tests "Sunny?"; its branches lead to further tests "X2 < 2?" and "X2 < 4?", each with Yes/No leaves.)
Classification Algorithms- DT
76
Splitting attributes: more than one tree can fit the same training data.
Tree 1:
Home Owner? Yes → NO; No → MarSt?
MarSt? Single, Divorced → Income?; Married → NO
Income? < 80K → NO; > 80K → YES
Tree 2:
MarSt? Married → NO; Single, Divorced → Home Owner?
Home Owner? Yes → NO; No → Income?
Income? < 80K → NO; > 80K → YES
(Both trees fit the same training data.)
78
Apply the Trained Model to Predict the Class of a Test Sample
Starting at the root node, follow the branch that matches the test sample's attribute value at each step:
Home Owner? Yes → NO; No → continue to MarSt?
MarSt? Single, Divorced → continue to Income?; Married → NO
Income? < 80K → NO; > 80K → YES
Predicted class is "No"
Classification Algorithms- DT: Python
85
Output:
[1 2]
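The code for this slide is not in the extract; a minimal sketch of training and querying a decision tree in scikit-learn (the variable names and the data behind the printed classes are assumptions):
from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)       # assumes X_train, y_train from the pipeline steps
print(clf.predict(X_test[:2]))  # predicted classes of two test samples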
Classification Algorithms- KNN
86
Output:
[1 2]
Classification Algorithms- NB
90
❑ Example:
❑ Assume you have two friends, Adam and Lena.
❑ You received a message from one of them, but you do not know who the sender is.
❑ You would like to use machine learning to "predict" the sender.
❑ Assuming equal prior probabilities: both Adam and Lena have access to the internet and can write emails.
■ P(Adam) = 0.5
■ P(Lena) = 0.5
Classification Algorithms- NB
92-96
(Tables: word frequencies for Adam and Lena; the message word "Great" is looked up in each sender's table to obtain its likelihood given that sender.)
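With the priors above and word likelihoods from the tables, Bayes' rule gives the posterior probability of each sender. A minimal sketch for a one-word message "Great" (the likelihood values below are hypothetical, since the slide tables are not reproduced):
# Hypothetical likelihoods of the word "Great" for each sender
p_great_given_adam = 0.8
p_great_given_lena = 0.2
p_adam = p_lena = 0.5  # equal priors from the slide
# Bayes' rule: posterior is proportional to likelihood x prior
post_adam = p_great_given_adam * p_adam
post_lena = p_great_given_lena * p_lena
total = post_adam + post_lena
print('P(Adam | "Great") =', post_adam / total)  # 0.8
print('P(Lena | "Great") =', post_lena / total)  # 0.2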
Classification Algorithms- NB: Python
97
Output:
[1 2]
Evaluation of Classification: Accuracy
98
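The slide body is not reproduced; accuracy is the fraction of test samples whose predicted class matches the actual class. A minimal sketch:
# accuracy = (number of correct predictions) / (total number of predictions)
from sklearn.metrics import accuracy_score
acc = accuracy_score(y_test, y_pred)  # assumes y_test, y_pred from the pipeline
print("Accuracy: ", round(acc, 2))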
Applications of Classification
Applications of Classification
100
◻ Data Preprocessing
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
#Output already encoded to numbers so no need for labeling
#Linear SVM
from sklearn.svm import LinearSVC
clf = LinearSVC()
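The splitting, training, and prediction steps are not shown in the extract; presumably, following the pipeline:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)  # test_size assumed
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)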
print(y_pred)
Output:
Applications of Classification in Engineering-
Example 1: Model Evaluation using Accuracy
108
◻ Model evaluation
#Evaluating the model
#print the accuracy score
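from sklearn.metrics import accuracy_score  # the evaluation line is missing from the extract; presumably:
print("Accuracy: ", accuracy_score(y_test, y_pred))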
Output:
Applications of Classification in Engineering-
Example 1: Model Evaluation via Plotting (DT)
109
(Plots: decision boundaries for Linear SVM, SVM with poly kernel, and SVM with RBF kernel)
Complete code for Iris Dataset
after excluding some optional code
112
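The code screenshot is not reproduced. A minimal sketch assembling the steps above (the test_size is an assumption):
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score
# 1-2. Load the data and split it
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)  # assumed
# 3-4. Create and train the linear SVM
clf = LinearSVC()
clf.fit(X_train, y_train)
# 5-6. Predict and evaluate
y_pred = clf.predict(X_test)
print("Accuracy: ", accuracy_score(y_test, y_pred))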
Purpose:
◻ Classification: to find the decision boundary that divides the dataset into different classes.
◻ Regression: to find the line/curve that best fits the data and predicts the output more accurately.
Evaluation:
◻ Classification: Accuracy (and other measures not covered in this course, such as F-score, precision, recall, and the confusion matrix).
◻ Regression: MSE and R2 (and other metrics not covered in this course, such as SSE, MAE, and MAPE).
Learning Outcomes
115