6 - Classification and Regression Tasks

Uploaded by

b00098269

Supervised Learning: Classification and Regression Tasks
Intro to AI and Data Science
NGN 112 – Fall 2024

Amer S. Zakaria
Department of Electrical Engineering
College of Engineering

American University of Sharjah

Prepared by Dr. Salam Dhou, CSE

Last Updated on: 13th of Nov. 2024

Table of Contents
2

Regression vs. Classification

Regression

Applications of Regression

Classification

Applications of Classification
Regression and Classification
3

◻ Regression is a method for understanding the relationship between independent variables (features) and a dependent variable (output). Once this relationship has been estimated, the output can be predicted.

◻ Classification is a method for finding a function that divides the dataset into classes based on its variables. A computer program is trained on the training dataset and, based on that training, categorizes data into the different classes.

◻ Common things:
▪ Regression and classification are both supervised learning methods
▪ They both require a dataset for training so they can make predictions
Regression versus Classification
4

Difference:
◻ In regression:

▪ Output is continuous (numbers).

▪ The purpose is to find the line or curve that best fits the data and predicts the
output more accurately.
◻ In classification:
▪ Output is discrete (class labels).
▪ The purpose is to find the decision boundary, which divides the dataset into
different classes.
Regression
5

◻ The objective is to find a line, curve, or surface that best fits the data.
◻ Finding the regression line or curve is an optimization problem. The best line or curve is the one that minimizes the distance (error) between the line or curve and the data points.
◻ Given the training data, a regression algorithm searches for the line or curve that best fits the data.
◻ This model (line or curve) is used later for prediction.
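As a minimal sketch of this optimization, assume the simplest case: a straight line, with the error measured as the squared vertical distance between the line and each point. Under that assumption the best slope and intercept have a closed-form solution. The helper name `fit_line` and the sample points are illustrative and do not come from the slides.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = b1 * x + b0 (illustrative helper)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means
    b0 = y_mean - b1 * x_mean
    return b1, b0

# Hypothetical training points that lie exactly on y = 2x + 1
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]
b1, b0 = fit_line(xs, ys)
print(b1, b0)  # 2.0 1.0
```

The fitted pair (b1, b0) is the "model" that is later used for prediction.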
Exercise
6

Exercise: The following are regression tasks. Identify the input

variable(s) and output variable(s) for each task.
Examples of Regression Tasks
7

◻ Predict the price of a house based on variables like size of the house,
number of rooms, school district, neighborhood, etc.

Input Variable(s)         Output Variable(s)
• Size of the house       House price
• Number of rooms
• School district
• Neighborhood
Examples of Regression Tasks (cont.)
8

◻ Predict the net worth of people based on variables like their age,
income, education, etc.

Input Variable(s)         Output Variable(s)
• Age                     Person’s net worth
• Income
• Education
Examples of Regression Tasks (cont.)
9

◻ Predicting the sales amount of a new product based on advertising expenditure.

Input Variable(s)             Output Variable(s)
• Advertising expenditure     Amount of sales

Examples of Regression Tasks (cont.)
10

◻ Predicting wind velocities as a function of temperature, humidity, air pressure, etc.

Input Variable(s)         Output Variable(s)
• Temperature             Wind velocity
• Humidity
• Air pressure
Types of Regression
11

◻ Regression
A statistical technique that uses independent (input) variables to predict the outcome of a dependent (output) variable.
◻ Linear regression
The dependent variable shows a linear relationship with each of the independent variables.
◻ Non-linear regression
The dependent variable shows a non-linear relationship with the independent variables.
Types of Regression (cont.)
12

◻ Simple regression
It establishes a relationship between one independent (input) variable and one dependent (output) variable. It attempts to draw the line or curve that best fits the data and minimizes the regression errors.
▪ Example of Simple Linear Regression:
Equation of a line: $y = b_1 x_1 + b_0$
where $x_1$ is the input variable, $y$ is the output variable, and $b_1$, $b_0$ are the coefficients.
◻ Multiple regression
It establishes a relationship between multiple independent (input) variables and one dependent (output) variable.
▪ Example of Multiple Linear Regression:
$y = b_n x_n + \dots + b_2 x_2 + b_1 x_1 + b_0$
where $x_1, \dots, x_n$ are the input variables, $y$ is the output variable, and $b_n, \dots, b_0$ are the coefficients.
The objective in a regression problem is to calculate the coefficients using optimization techniques.
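To make "calculating the coefficients" concrete, here is a small sketch for a multiple linear regression with two inputs. It assumes ordinary least squares as the optimization technique and uses NumPy's `np.linalg.lstsq`. The data is hypothetical, generated from known coefficients so the result can be checked.

```python
import numpy as np

# Hypothetical data generated from y = 3*x1 - 2*x2 + 5 (no noise)
X = np.array([[1.0, 2.0], [2.0, 0.0], [3.0, 1.0], [4.0, 5.0], [0.0, 1.0]])
y = 3 * X[:, 0] - 2 * X[:, 1] + 5

# Append a column of ones so the intercept b0 is estimated
# together with the other coefficients
A = np.column_stack([X, np.ones(len(X))])

# Least-squares solution of A @ coeffs = y
coeffs, _, _, _ = np.linalg.lstsq(A, y, rcond=None)
b1, b2, b0 = coeffs
print(np.round(coeffs, 2))
```

Because the data contains no noise, the recovered values match the coefficients used to generate it (3, -2, and 5).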
Simple Linear Regression
13

Example
◻ Estimating the net worth of people based on their age.
◻ One feature: Age, output: Net worth
[Figure: scatter plot of net worth (y-axis) against age (x-axis)]
Simple Linear Regression (cont.)
14

Example
◻ If you want to draw a line representing the data, which of the following lines is the best?
[Figure: scatter plot of net worth (y-axis) against age (x-axis) with three candidate lines A, B, and C]
Answer: Line B
Simple Linear Regression: Training & Fitting
15

Example
◻ A simple linear regression model can be represented by the line that best fits the data.
◻ Can you give the model (line) equation, given a point that the line passes through?
[Figure: fitted line on the net worth (y-axis) vs. age (x-axis) plot, with the point (age = 80, net worth = 500) marked]
Line equation: $y = b_1 x + b_0$
Here:
(Net worth) = $b_1$ (age) + $b_0$
where $b_1$ is the slope and $b_0$ is the y-intercept (the value of y when x = 0).
(Net worth) = (500/80) (age) + 0
Simple Linear Regression: Prediction
16

Example
◻ Using this model, predict the net worth of a person of age 36.
[Figure: fitted line on the net worth (y-axis) vs. age (x-axis) plot, with the unknown net worth at age = 36 marked "?"]
Given the line equation:
(Net worth) = (500/80) (age) + 0
By substituting in the equation:
(Net worth) = (500/80) (36) + 0 = 225
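The same substitution can be written as a few lines of Python. This is only a sketch of the worked example above. The function name `predict_net_worth` is illustrative, and the slope and intercept are taken directly from the line equation on the slide.

```python
# Line from the slide: (Net worth) = (500/80) * (age) + 0
b1 = 500 / 80   # slope
b0 = 0          # y-intercept

def predict_net_worth(age):
    """Evaluate the fitted line at the given age."""
    return b1 * age + b0

print(predict_net_worth(36))  # 225.0
```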
Evaluation of Regression: MSE
17

◻ Mean squared error (MSE) is an accuracy measure: the average of the squared differences (errors) between the actual values and the predicted values.
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
◻ Here $n$ is the number of data points, $y_i$ is the actual value, and $\hat{y}_i$ is the predicted value.
◻ The smaller the value of MSE (the closer to zero), the better.
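As a quick check of the formula, the MSE can be computed by hand on a tiny example. The values below are hypothetical, and the helper name `mean_squared_error_manual` is chosen to avoid confusion with scikit-learn's `mean_squared_error`, which is the function used later in the slides.

```python
def mean_squared_error_manual(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    n = len(y_true)
    return sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / n

y_true = [3, 5, 7]   # actual values (hypothetical)
y_pred = [2, 5, 9]   # predicted values (hypothetical)

# Errors are 1, 0, -2; squared errors are 1, 0, 4; mean is 5/3
mse = mean_squared_error_manual(y_true, y_pred)
print(round(mse, 2))  # 1.67
```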
Evaluation of Regression: R2 Coefficient
18

◻ The coefficient of determination ($R^2$) is an accuracy measure. It measures how much of the variation in the output is explained by the input.
$$R^2 = 1 - \frac{SSE_{\text{Regression}}}{SSE_{\text{Total}}}$$
◻ Here
$$SSE_{\text{Regression}} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \quad \text{and} \quad SSE_{\text{Total}} = \sum_{i=1}^{n} (y_i - \bar{y})^2$$
◻ In $SSE_{\text{Total}}$, $\bar{y}$ is the mean of the actual values.
◻ The value of $R^2$ is typically between 0.0 and 1.0 (it can be negative when the model fits worse than always predicting the mean).
➢ 0.0 means the regression model is not doing a good job of capturing the trend in the data.
➢ 1.0 means the regression model is doing a good job of describing the relationship between the input(s) and the output.
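The definition can be traced step by step on a small hypothetical example. The helper name `r2_manual` is illustrative. In practice the slides use scikit-learn's `r2_score`.

```python
def r2_manual(y_true, y_pred):
    """R^2 = 1 - SSE_Regression / SSE_Total, following the slide's definition."""
    y_mean = sum(y_true) / len(y_true)
    sse_regression = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    sse_total = sum((yt - y_mean) ** 2 for yt in y_true)
    return 1 - sse_regression / sse_total

y_true = [3, 5, 7]                       # actual values (hypothetical), mean = 5
r2_model = r2_manual(y_true, [2, 5, 9])  # SSE_Regression = 5, SSE_Total = 8
r2_mean = r2_manual(y_true, [5, 5, 5])   # always predicting the mean
print(r2_model, r2_mean)  # 0.375 0.0
```

A model that always predicts the mean of the actual values gets exactly 0.0, which is why values near 0.0 indicate that the model captures little of the trend.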
Regression Task Pipeline
19

1. Data Preprocessing (perform normalization if necessary)

2. Splitting data into a training set and testing set
3. Selecting and creating the model
4. Training the model
5. Prediction using the trained model
6. Model evaluation
Common Python Codes – Regression
20

Regression Task Pipeline:

1. Data Preprocessing (perform normalization if necessary)
2. Splitting data into training set and testing set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=1)

3. Selecting and creating the model (see slide)

4. Training the model:
regressor.fit(X_train, y_train)

5. Prediction using the trained model:
y_pred = regressor.predict(X_test)

6. Model evaluation
from sklearn.metrics import mean_squared_error, r2_score
MSE = mean_squared_error(y_test, y_pred)
R2 = r2_score(y_test, y_pred)
Applications of Regression in Engineering-
Example 1: Simple Regression
21

Predicting students scores based on study hours

Dataset link:
https://www.kaggle.com/datasets/himanshunakrani/student-study-hours?ref=machinelearningnuggets.com
Applications of Regression in Engineering-
Example 1: Simple Regression (cont.)
22

◻ Data Preparation and Preprocessing

import pandas as pd

# Loading dataset
stud_scores = pd.read_csv('student_scores.csv')
stud_scores.describe()
Output
Applications of Regression in Engineering-
Example 1: Simple Regression (cont.)
23

◻ Data Preparation and Preprocessing

# Print the first 5 records

stud_scores.head()

Output
Applications of Regression in Engineering-
Example 1: Simple Regression (cont.)
24

◻ Data Preparation and Preprocessing

# Creating input data and output variable
X = stud_scores['Hours'] # input variable
y = stud_scores['Scores'] # output variable

# The input to machine learning methods have to be arrays

# converting X to an array

X = X.to_numpy()
print(X)
Output
array([2.5, 5.1, 3.2, 8.5, 3.5, 1.5, 9.2, 5.5, 8.3, 2.7, 7.7,
5.9, 4.5, 3.3, 1.1, 8.9, 2.5, 1.9, 6.1, 7.4, 2.7, 4.8, 3.8,
6.9, 7.8])
Applications of Regression in Engineering-
Example 1: Simple Regression (cont.)
25

◻ Data Preparation and Preprocessing

# The data has to be represented as an array of records, so the
# input needs to be reshaped if it has a single feature.
# The function reshape changes the shape (dimensions) of an array
# without changing its data.
# The 1 argument indicates that we want to have 1 column. The -1
# argument indicates that we want NumPy to automatically determine
# the number of rows needed based on the total number of elements
# in the array.
X = X.reshape(-1, 1)
print(X)
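The effect of the reshape is easiest to see on the array's shape. The sketch below uses only the first three 'Hours' values from the output shown earlier. The variable name `X_2d` is illustrative.

```python
import numpy as np

X = np.array([2.5, 5.1, 3.2])   # first three 'Hours' values
print(X.shape)                  # (3,)  -> one-dimensional array

# -1: let NumPy infer the number of rows; 1: exactly one column
X_2d = X.reshape(-1, 1)
print(X_2d.shape)               # (3, 1) -> three records, one feature each
print(X_2d)
```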
Applications of Regression in Engineering-
Example 1: Simple Regression (cont.)
26

◻ Data Preparation and Preprocessing

# The training data passed to machine learning methods has to be arrays
# Converting y to an array
y = y.to_numpy()
print(y)

Output

array([21, 47, 27, 75, 30, 20, 88, 60, 81, 25, 85, 62, 41, 42,
17, 95, 30, 24, 67, 69, 30, 54, 35, 76, 86], dtype=int64)
Applications of Regression in Engineering-
Example 1: Simple Regression (cont.)
27

◻ Splitting the data into training and testing

# Splitting the data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=1)

# The parameter random_state controls the shuffling applied to the
# data before applying the split. It is set to None by default.
# Set random_state to an integer for reproducible output across
# multiple function calls (in other words, if you want to get the
# same results).
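A short experiment illustrates the role of `random_state`. This sketch uses a hypothetical toy dataset of ten samples and assumes scikit-learn is installed. Two calls with the same integer seed return identical splits.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(10).reshape(-1, 1)   # hypothetical inputs 0..9, one feature
y = np.arange(10)                  # hypothetical outputs

# Same integer random_state -> the shuffling, and so the split, is identical
_, X_test_a, _, y_test_a = train_test_split(X, y, test_size=0.30, random_state=1)
_, X_test_b, _, y_test_b = train_test_split(X, y, test_size=0.30, random_state=1)
print(np.array_equal(X_test_a, X_test_b))  # True
```

With `random_state=None` (the default) the two test sets would generally differ from run to run.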
Applications of Regression in Engineering-
Example 1: Simple Regression (cont.)
28

◻ Creating the model

# Choose ONE of the following models.
# Creating the LINEAR model (the model considered in this example)
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
# OR
# Creating the non-linear (polynomial) model
from sklearn.svm import SVR
regressor = SVR(kernel='poly')  # degree 3 is the default value
# regressor = SVR(kernel='poly', degree=4)  # degree 4
# OR
# Creating the non-linear (RBF) model
from sklearn.svm import SVR
regressor = SVR(kernel='rbf')
Applications of Regression in Engineering-
Example 1: Simple Regression (cont.)
29

◻ Training the model

# Training the model
import numpy as np
regressor.fit(X_train, y_train)

# GETTING THE COEFFICIENTS AND INTERCEPT (for linear models only)
# Note: coef_ is only available when using a linear model
print('Coefficient: ', np.round(regressor.coef_, 2))
# np.round is used because regressor.coef_ is an array. You need to
# remove this operation for nonlinear regression models.
print('Intercept: ', np.round(regressor.intercept_, 2))
# You can use np.round or round on a floating point variable.

Output
Coefficient: [10.41]
Intercept: -1.51
Applications of Regression in Engineering-
Example 1: Simple Regression (cont.)
30

◻ Prediction using the trained model

# PREDICTION OF TEST RESULT

y_pred = regressor.predict(X_test)
print('Predictions:\n', y_pred)

Output
Predictions:
[ 9.93952968 32.84320126 18.26813752 86.97915227 48.45934097
78.65054442 61.99332873 75.52731648]
Applications of Regression in Engineering-
Example 1: Simple Regression (cont.)
31

◻ Evaluating the model

#Model Evaluation
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
MSE = mean_squared_error(y_test,y_pred)
R2 = r2_score(y_test,y_pred)
# Note: function r2_score takes the true labels (y_test) and the predicted ones (y_pred)

print("Mean squared error (MSE):", round (MSE,2))

print('Coefficient of determination(R squared): ', round(R2, 2) )

#Note:Alternative way to calculate R squared

print('Coefficient of determination(R squared) using score function: ',
round(regressor.score(X_test, y_test), 2) )
#Note:function score takes the X_test and y_test

Output
Mean squared error (MSE): 56.09
Coefficient of determination(R squared): 0.89
Coefficient of determination(R squared) using score function: 0.89
Applications of Regression in Engineering-
Example 1: Simple Regression (cont.)
32

◻ Evaluating the model

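import matplotlib.pyplot as plt

# Plot basic scatterplot of the actual test values
plt.scatter(X_test, y_test, label='Actual')
# Plot the predicted values (points on the regression line)
plt.scatter(X_test, y_pred, label='Predicted')

plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.show()

Output
[Figure: scatter plot of actual and predicted values for the test set]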
Applications of Regression in Engineering-
Example 1: Simple Regression (cont.)
33

◻ Evaluating the model

import matplotlib.pyplot as plt

# Using all data
# Plot the basic scatterplot
plt.plot(X, y, 'o', label='Actual')
# 'o' makes a scatter plot; alternatively you can write:
# plt.scatter(X, y, label='Actual')
# Plot the predicted regression line
y_pred_all_data = regressor.predict(X)
plt.plot(X, y_pred_all_data, 'o', label='Predicted')

Output
[Figure: scatter plot of actual and predicted values for all data]
Applications of Regression in Engineering-
Example 1: Simple Regression (cont.)
34

◻ Regression line/curve produced by several regression models:
[Figure: three panels titled "Linear Regression", "Non-Linear Regression (Polynomial of degree 3)", and "RBF Kernel"]
Applications of Regression in Engineering-
Example 2: Multiple Regression (cont.)
35

California housing dataset

Dataset link:
http://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_california_housing.html
Applications of Regression in Engineering-
Example 2: Multiple Regression (cont.)
36

◻ Data Preparation and Preprocessing

from sklearn.datasets import fetch_california_housing

housing_data = fetch_california_housing() #load the dataset

X = housing_data.data # represent the feature matrix
y = housing_data.target # represent the response vector/target
#OR
#X,y = fetch_california_housing(return_X_y = True)
# Extra: For creating a dataframe (next slide)
feature_names = housing_data.feature_names
target_names = housing_data.target_names
print('Feature names: ', feature_names)
print('\nTarget names: ', target_names)#Median house value for households
print('\nShape of dataset', X.shape)

Output
Feature names: ['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms',
'Population', 'AveOccup', 'Latitude', 'Longitude']
Target names: ['MedHouseVal']
Shape of dataset (20640, 8)
Applications of Regression in Engineering-
Example 2: Multiple Regression (cont.)
37

◻ Extra: Data Preparation and Preprocessing – Create a DataFrame

# If you want to display the dataset and visualize it as a table,
# you need to convert X and y into a DataFrame
import pandas as pd

# Convert X and y into a DataFrame
df = pd.DataFrame(data=X, columns=feature_names)
df['House_Value'] = y  # new column, target

# Print the DataFrame
df
df.head()

# For DataFrame data, the inputs and the target can be extracted as follows:
X = df.drop('House_Value', axis=1)  # axis: 0 = row, 1 = column
y = df['House_Value']  # target column
# Or
selected_columns = ['label 1', 'label 2', ..]
X = df[selected_columns]
Applications of Regression in Engineering-
Example 2: Multiple Regression (cont.)
38

◻ Splitting data into the training set and testing set

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30)

# Note: In this example, random_state is not set to an int, so expect
# different splits and consequently different results!

print(X_train.shape)
print(X_test.shape)

print(y_train.shape)
print(y_test.shape)

Output
(14448, 8)
(6192, 8)
(14448,)
(6192,)
Applications of Regression in Engineering-
Example 2: Multiple Regression (cont.)
39

◻ Creating the model

# importing the linearRegression class
from sklearn.linear_model import LinearRegression

# instantiate the Linear Regression model

regressor = LinearRegression()

#You can also create other non-linear models as in Example 1

Applications of Regression in Engineering-
Example 2: Multiple Regression (cont.)
40

◻ Training the model

# training the model

regressor.fit(X_train, y_train)
# get the coefficients and intercept

import numpy as np
print("Coefficients:\n", np.round(regressor.coef_,2))
print('Intercept:\n', round(regressor.intercept_,2))

Output

Coefficients: [ 0.45 0.01 -0.13 0.84 -0. -0. -0.42 -0.44]

Intercept: -37.35
Applications of Regression in Engineering-
Example 2: Multiple Regression (cont.)
41

◻ Prediction using the trained model

# expose the model to new values and predict the target vector
y_predictions = regressor.predict(X_test)
print('Predictions:', y_predictions)

Output

Predictions: [2.54827248 2.98136965 2.10894987 ...

2.82017938 6.84565693 2.68012622]
Applications of Regression in Engineering-
Example 2: Multiple Regression (cont.)
42

◻ Model evaluation
#Model evaluation
from sklearn.metrics import mean_squared_error, r2_score
# The mean squared error
print("Mean squared error: " , round(mean_squared_error(y_test,
y_predictions),2))
# The coefficient of determination: 1 is perfect prediction
print("Coefficient of determination: ", round(r2_score(y_test,
y_predictions),2))

#or you can use the score function as in Example 1

Output

Mean squared error: 0.55

Coefficient of determination: 0.57
Applications of Regression in Engineering-
Example 3: Multiple Regression
43

Diabetes Dataset

Dataset link:
https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html
Applications of Regression in Engineering-
Example 3: Multiple Regression (cont.)
44

◻ Data Preparation and Preprocessing

from sklearn import datasets

# Load the diabetes dataset

diabetes_X, diabetes_y = datasets.load_diabetes(return_X_y=True)
Applications of Regression in Engineering-
Example 3: Multiple Regression (cont.)
45

◻ Splitting data into the training set and testing set

# Split the data and targets - SHORT WAY
# Use the function train_test_split to split the data and targets into
# training and testing sets. Testing data size is 20% of the data, and
# the rest is the training portion.
from sklearn.model_selection import train_test_split
diabetes_X_train, diabetes_X_test, diabetes_y_train, diabetes_y_test = train_test_split(
    diabetes_X, diabetes_y, test_size=0.2)

# OR: Longer way: Check extra slide 100

Applications of Regression in Engineering-
Example 3: Multiple Regression (cont.)
46

◻ Creating the model

from sklearn import linear_model

# Create linear regression object

regr = linear_model.LinearRegression()

# You can also create non-linear models as well

Applications of Regression in Engineering-
Example 3: Multiple Regression (cont.)
47

◻ Training the model

# Train the model using the training set
regr.fit (diabetes_X_train, diabetes_y_train)

# Get the model equation

# The coefficients
print("Coefficients: \n", regr.coef_)
# The y-intercept
print("y-intercept: \n", regr.intercept_)

Output
Coefficients:
[ -30.77592231 -197.11523603 519.50634733 346.49118652 -688.21410873
431.49892496 19.3325826 94.20724607 716.79048049 75.26379265]
y-intercept:
152.09140122905802
Applications of Regression in Engineering-
Example 3: Multiple Regression (cont.)
48

◻ Prediction using the trained model

# Make predictions using the testing set
diabetes_y_pred = regr.predict(diabetes_X_test)
Applications of Regression in Engineering-
Example 3: Multiple Regression (cont.)
49

◻ Model evaluation
from sklearn.metrics import mean_squared_error, r2_score

# The mean squared error

print("Mean squared error: " ,
round(mean_squared_error(diabetes_y_test, diabetes_y_pred),2))

# The coefficient of determination: 1 is perfect prediction

print("Coefficient of determination: ",
round(r2_score(diabetes_y_test, diabetes_y_pred),2))

Output

Mean squared error: 2197.35

Coefficient of determination: 0.57
Classification
50

❑ The objective is to find a decision boundary or decision surface that separates the classes.
❑ The decision boundary partitions the samples in the dataset into two or more sets, one for each class.
❑ Each machine learning algorithm has its own way of finding that decision boundary, that is, how a machine learning model might draw a line, a set of lines, or a curve to separate the classes.
❑ Different decision boundaries for the same dataset:
Exercise
51

Exercise: The following are classification tasks. Identify the input

variable(s) and output variable(s) for each task.
Examples of Classification Tasks
52

◻ Classifying credit card transactions as legitimate or fraudulent

Input Variable(s)                          Output Variable(s)
• Features of credit card transactions,    Classes such as legitimate or fraudulent
  such as date and time of transaction,
  amount, etc.
Examples of Classification Tasks (cont.)
53

◻ Classifying land covers (water bodies, urban areas, forests, etc.) using satellite data

Input Variable(s)       Output Variable(s)
• Satellite images      Classes such as water bodies, urban areas, forests, etc.
Examples of Classification Tasks (cont.)
54

◻ Categorizing news stories as finance, weather, entertainment, sports, etc.

Input Variable(s)       Output Variable(s)
• News stories          Classes such as finance, entertainment, sports, etc.
Examples of Classification Tasks (cont.)
55

◻ Predicting tumor cells as benign or malignant

Input Variable(s)                  Output Variable(s)
• Features describing tumor        Classes such as benign or malignant
  shape and texture
Classification Algorithms
56

◻ There are several types of classification algorithms you can

use depending on the dataset you’re working with. The
following are five of the most common classification
algorithms:
▪ Decision Tree (DT)
▪ K-Nearest Neighbors (KNN)
▪ Naïve Bayes
▪ Logistic Regression
▪ Support Vector Machines (SVM)
Classification Algorithms (cont.)
57

◻ There are different machine learning algorithms.
◻ Training Phase: A learning algorithm is used to build the model. Training/learning results in a trained model.
◻ Testing Phase: The trained model is used for prediction.
Classification Algorithms- Decision Tree
58

◻ Decision Trees (DT)

▪ Decision Tree is a supervised learning technique that can be used for
both classification and Regression.
▪ It is a tree-structured classifier where internal nodes represent the
features of a dataset, branches represent the decision rules, and each
leaf node represents the outcome.
Classification Algorithms- DT (cont.)
59

◻ Decision nodes represent questions about features (e.g., Home Owner?), which have two or more branches (e.g., Yes and No).
◻ Leaf nodes (e.g., Defaulted Borrower --> Yes, Defaulted Borrower --> No) represent a classification or decision.
◻ Decision trees can handle both categorical and numerical data.
Classification Algorithms- DT (cont.)
60

◻ Decision trees allow you to ask multiple “linear questions” to classify a non-linearly separable dataset.
◻ Example 1: The following is a dataset with two features, Sun and Wind, and two classes:
■ Good day for surfing
■ Not a good day for surfing
[Figure: the samples plotted on a grid with rows "sunny" / "not sunny" and columns "not windy" / "windy"]
[Figure: sample decision tree to separate the classes of this dataset, with the nodes "Windy?" (branches Yes / No) and "Sunny?" (branches Yes / No)]

Classification Algorithms- DT (cont.)
61

◻ Example 2: The following is a dataset with two features, X1 and X2, and two classes (marked with different symbols in the figure).
◻ Can we build the decision tree to classify this sample set?
◻ Hint: Start splitting using X1
Sample decision tree to separate the classes of this dataset:
X1 < 3?
  Yes: X2 < 2? (branches Yes / No)
  No:  X2 < 4? (branches Yes / No)
Classification Algorithms- DT
62

◻ The DT algorithm decides where to split the data based on impurity.
◻ It finds split points that result in subsets that are as pure as possible.
◻ A subset is purer when most of the data in it belong to the same class.
[Figure: candidate splits of a dataset, with the annotation "This is a better split as it results in purer subsets"]
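The slides do not say which impurity measure is used. As an illustration, the sketch below assumes Gini impurity, which is the default criterion of scikit-learn's `DecisionTreeClassifier`. The helper name `gini_impurity` and the label lists are hypothetical.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1 - sum((count / n) ** 2 for count in Counter(labels).values())

mixed = ["yes", "yes", "no", "no"]       # half and half: least pure
mostly_no = ["no", "no", "no", "yes"]    # mostly one class: purer
pure = ["no", "no", "no", "no"]          # a single class: perfectly pure

print(gini_impurity(mixed), gini_impurity(mostly_no), gini_impurity(pure))
# 0.5 0.375 0.0
```

Lower values mean purer subsets, so under this assumption a split is preferred when the subsets it produces have low impurity.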
Example of a Decision Tree

[Figure: training data table (left) and the decision tree model (right)]

Model: Decision Tree (splitting attributes: Home Owner, MarSt, Income)

Home Owner?
  Yes: NO
  No: MarSt?
    Married: NO
    Single, Divorced: Income?
      < 80K: NO
      > 80K: YES

63
Another Example of a Decision Tree

[Figure: training data table]

MarSt?
  Married: NO
  Single, Divorced: Home Owner?
    Yes: NO
    No: Income?
      < 80K: NO
      > 80K: YES

There could be more than one tree that fits the same data!
64
Apply the Trained Model to Predict the Class of a Test Sample

Start from the root of the tree.
[Figure: the test sample record next to the decision tree]

Home Owner?
  Yes: NO
  No: MarSt?
    Married: NO
    Single, Divorced: Income?
      < 80K: NO
      > 80K: YES

65
Apply the Trained Model to Predict the Class of a Test Sample (cont.)

[Figure: the test sample record traced down the decision tree to a leaf]

Home Owner?
  Yes: NO
  No: MarSt?
    Married: NO
    Single, Divorced: Income?
      < 80K: NO
      > 80K: YES

Predicted class is “No”

70
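Applying the trained tree amounts to following one path of if/else tests from the root to a leaf. The sketch below hard-codes the tree from the slides. The function name, argument names, and sample records are hypothetical, because the test sample's values are only shown in the figure.

```python
def predict_defaulted(home_owner, marital_status, income_k):
    """Follow the tree from the root to a leaf and return 'Yes' or 'No'."""
    if home_owner == "Yes":
        return "No"
    if marital_status == "Married":
        return "No"
    # Single or Divorced: decide on income (in thousands).
    # The slides label the branches '< 80K' and '> 80K'; treating exactly
    # 80K as the '< 80K' side is an assumption of this sketch.
    return "Yes" if income_k > 80 else "No"

# Hypothetical test samples
print(predict_defaulted("No", "Married", 80))    # No
print(predict_defaulted("No", "Single", 90))     # Yes
print(predict_defaulted("Yes", "Divorced", 125)) # No
```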
Classification Algorithms- DT: Python
71

◻ Example of classification using DT in Python

import numpy as np
#creating a dataset of 6 samples, where
#X is the array of feature vectors
#y is the array of labels
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 2, 2, 2])

#import the classifier package

from sklearn.tree import DecisionTreeClassifier

#create the classifier

clf = DecisionTreeClassifier()

#train the classifier on the whole dataset

clf.fit(X, y)

#use the trained classifier to predict the classes of two samples

print(clf.predict([[-0.8, -1],[4, 1]]))

Output
[1 2]
Classification Algorithms- KNN
72

◻ K-Nearest Neighbor (KNN)

▪ Simple algorithm for classification.
▪ Stores all the available training samples and classifies the new
samples based on the similarity measure (e.g., distance
functions).
▪ Basic idea: If it walks like a duck, quacks like a duck, then it’s
probably a duck.
Classification Algorithms- KNN (cont.)
73

◻ Requires the following:

▪ A set of labeled records
▪ Proximity metric to compute
distance/similarity between a pair
of records. For example, calculating
the Euclidean distance between the
sample pairs.
▪ The value of K is the number of
nearest neighbors to consider.
▪ One method for using class labels of
K nearest neighbors to determine
the class label of an unknown
record is, for example, by taking a
majority vote.
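These ingredients fit into a few lines of plain Python. The sketch below is an illustration rather than scikit-learn's implementation. It uses Euclidean distance and a majority vote, reuses the six labeled samples from the decision tree example, and assumes k = 3 (scikit-learn's `KNeighborsClassifier` uses 5 neighbors by default).

```python
import math
from collections import Counter

def knn_predict(X_train, y_train, query, k=3):
    """Classify `query` by a majority vote among its k nearest labeled records."""
    # Euclidean distance from the query to every labeled record
    distances = [(math.dist(x, query), label) for x, label in zip(X_train, y_train)]
    # Keep the k records with the smallest distances
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote among the labels of the k nearest neighbors
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

X_train = [[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]]
y_train = [1, 1, 1, 2, 2, 2]

print(knn_predict(X_train, y_train, [-0.8, -1]))  # 1
print(knn_predict(X_train, y_train, [4, 1]))      # 2
```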
Classification Algorithms- KNN (cont.)
74

◻ The k-nearest neighbors of a record x are the data points that have the k smallest distances to x.
◻ Different values of k affect the predicted class.
[Figure: three panels with different values of k. Captions: "Predicted class is -", "Predicted class is either + or -", "Predicted class is +"]
Classification Algorithms- KNN: Python
75

◻ Example of classification using KNN in Python

#import the classifier package

from sklearn.neighbors import KNeighborsClassifier

#create the classifier

clf = KNeighborsClassifier()

#train the classifier on the whole dataset

clf.fit(X, y)

#use the trained classifier to predict the classes of two samples

print(clf.predict([[-0.8, -1],[4, 1]]))

Output
[1 2]
Classification Algorithms- NB
76

❑ Naïve Bayes (NB): a probabilistic framework for solving classification problems, based on Bayes' theorem:
$$P(Y \mid X) = \frac{P(X \mid Y)\, P(Y)}{P(X)}$$
❑ Consider each attribute and class label as random variables
❑ Given a record 𝑋 with attributes (𝑋1, 𝑋2, … , 𝑋𝑑)
▪ Goal is to predict class 𝑌
▪ Specifically, we want to find the value of Y that maximizes 𝑃(𝑌|𝑋)

❑ We can estimate 𝑃(𝑌|𝑋) directly from data.

Classification Algorithms- NB: Example
77

❑ Example:
❑ Assume you have two friends, Adam and Lena.
❑ You received a message from one of them, but you do not
know who is the sender.
❑ You would like to use machine learning to “predict” the
sender.
❑ Assuming equal prior probability: Both Adam and Lena
have access to internet and can write emails.
■ P(Adam) = 0.5
■ P(Lena) = 0.5
Classification Algorithms- NB: Example (cont.)
78

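◻ The model is trained on the following probabilities, which describe how frequently each person mentions specific words in their conversations (assume the language consists of three words for simplicity).
◻ Lena mentions ‘Love’ in 50% of her conversations, while she mentions ‘Deal’ and ‘Life’ in 20% and 30% of her conversations, respectively.
◻ Adam mentions ‘Love’ in 10% of his conversations, while he mentions ‘Deal’ and ‘Life’ in 80% and 10% of his conversations, respectively (these are the values used in the calculations on the following slides).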
Classification Algorithms- NB: Example (cont.)
79

◻ Assume you received an Email with contents:

Love!
◻ Whom do you think would be the sender of the email?
Lena! Why? Because Lena has higher probability of using the
word ‘Love’ than Adam.
Classification Algorithms- NB: Example (cont.)
80

◻ Assume you received an Email with contents:

Love Life!
◻ Whom do you think would be the sender of the email?
Lena! Why? Because Lena has higher probability of using both
the word ‘Love’ and the word ‘Life’ than Adam.
Classification Algorithms- NB: Example (cont.)
81

◻ Assume you received an Email with contents:

Life Deal!
◻ Whom do you think would be the sender of the email?
Let’s calculate:
P(Adam is sender of “Life Deal”) = prior_probability × P(Adam saying ‘life’) × P(Adam saying ‘Deal’)
= 0.5 × 0.1 × 0.8 = 0.04
P(Lena is sender of “Life Deal”) = prior_probability × P(Lena saying ‘life’) × P(Lena saying ‘Deal’)
= 0.5 × 0.3 × 0.2 = 0.03
Adam! Why? because of the higher probability
Classification Algorithms- NB: Example (cont.)
82

◻ Assume you received an Email with contents:

Love Deal!
◻ Whom do you think would be the sender of the email?
Let’s calculate: P(Adam is sender of “Love Deal”) = 0.5 × 0.1 × 0.8 = 0.04
P(Lena is sender of “Love Deal”) = 0.5 × 0.5 × 0.2 = 0.05
Lena! Why? because of the higher probability
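The calculations in this example can be reproduced with a short script. This is a sketch of the hand computation, not of scikit-learn's Naïve Bayes classes. The word probabilities are the ones used in the calculations above, and the function names are illustrative.

```python
priors = {"Adam": 0.5, "Lena": 0.5}
word_probs = {
    "Adam": {"Love": 0.1, "Deal": 0.8, "Life": 0.1},
    "Lena": {"Love": 0.5, "Deal": 0.2, "Life": 0.3},
}

def sender_scores(words):
    """Prior times the product of the per-word probabilities, for each person."""
    scores = {}
    for person, prior in priors.items():
        score = prior
        for word in words:
            score *= word_probs[person][word]
        scores[person] = score
    return scores

def predict_sender(words):
    """Return the person with the highest score."""
    scores = sender_scores(words)
    return max(scores, key=scores.get)

for message in (["Love"], ["Love", "Life"], ["Life", "Deal"], ["Love", "Deal"]):
    print(message, predict_sender(message))
```

Multiplying the per-word probabilities treats the words as independent given the sender, which is the "naïve" assumption in Naïve Bayes.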
Classification Algorithms- NB: Python
83

◻ Example of classification using NB in Python

#import the classifier package

from sklearn.naive_bayes import GaussianNB

#create the classifier

clf = GaussianNB()

#train the classifier on the whole dataset

clf.fit(X, y)

#use the trained classifier to predict the classes of two samples

print(clf.predict([[-0.8, -1],[4, 1]]))
Output
[1 2]
Evaluation of Classification: Accuracy
84

◻ Accuracy measures how many samples were classified correctly out of the total number of samples used in the prediction.
$$\text{accuracy} = \frac{\text{Number of correctly classified samples}}{\text{Total number of samples used in the prediction}}$$
◻ The value of accuracy is between 0.0 and 1.0.
➢ 0.0 means the model did not make any correct predictions.
➢ 1.0 means the model predicted ALL the tested samples correctly.
➢ The higher the accuracy (closer to 1.0), the better the model.
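Accuracy is simple enough to compute by hand. The sketch below uses hypothetical true and predicted labels. In the pipeline that follows, the same quantity is obtained from the classifier's `score` method.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(1 for yt, yp in zip(y_true, y_pred) if yt == yp)
    return correct / len(y_true)

y_true = [0, 1, 1, 0, 1]   # true class labels (hypothetical)
y_pred = [0, 1, 0, 0, 1]   # predicted class labels (hypothetical)

acc = accuracy(y_true, y_pred)
print(acc)  # 0.8 (4 of 5 samples classified correctly)
```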
Classification Task Pipeline
85

1. Data Preprocessing, which includes labeling any non-numerical

output, and, if necessary, normalizing numerical data.
2. Splitting data into a training set and testing set
3. Selecting and creating the classification model
4. Training the model
5. Prediction using the trained model
6. Model evaluation
Common Python Codes – Classification
86

Classification Task Pipeline:

1. Data preprocessing (if necessary: reshaping, labeling, and normalization)
2. Splitting data into a training set and a testing set:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=1)

3. Selecting and creating the model (see slide)

4. Training the model:
clf.fit(X_train, y_train)

5. Prediction using the trained model:
y_pred = clf.predict(X_test)

6. Model evaluation (calculating accuracy):
clf.score(X_test, y_test)
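The six steps can be put together into one runnable script. This is a sketch under assumptions: the Iris dataset and the Gaussian Naive Bayes classifier are arbitrary choices made here for illustration. It also shows that clf.score on the test set gives the same number as accuracy_score applied to the predictions.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Step 1: load a dataset whose labels are already numeric (assumed choice: Iris)
X, y = load_iris(return_X_y=True)

# Step 2: hold out 30% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=1)

# Steps 3 and 4: create and train the classifier (assumed choice: Gaussian NB)
clf = GaussianNB()
clf.fit(X_train, y_train)

# Step 5: predict the labels of the test features
y_pred = clf.predict(X_test)

# Step 6: accuracy, computed in two equivalent ways
acc_from_score = clf.score(X_test, y_test)
acc_from_metric = accuracy_score(y_test, y_pred)
print(acc_from_score, acc_from_metric)
```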

Applications of Classification in Engineering-
Example 1: Data Processing
87

◻ Data Preprocessing

from sklearn.datasets import load_iris  # import the dataset

# Explore the dataset
# data = load_iris(return_X_y=False)
# print(data.target_names)  # class labels

X, y = load_iris(return_X_y=True)
# Output is already encoded as numbers, so no labeling is needed

# We only take the first two features for visualization purposes
X = X[:, :2]
# Another option: X = X[:, [0, 3]] selects the 1st and 4th feature columns

Dataset link:
https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html#sklearn.datasets.load_iris
Applications of Classification in Engineering-
Example 1: Data Processing with Labeling
88

◻ Data Preprocessing, with labeling

◻ If the target classes are categorical text, you need to encode them as numbers before you proceed with machine learning.
◻ The simplest form of encoding is a label encoder.
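A minimal sketch of label encoding with scikit-learn's LabelEncoder is shown below. The text labels are hypothetical and are not taken from the slides.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical text labels (not from the slides)
labels = ["setosa", "virginica", "versicolor", "setosa"]

# Fit the encoder and convert the text labels to integers;
# the classes are numbered in sorted (alphabetical) order
encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)

print(list(encoder.classes_))
print(list(encoded))

# The original text labels can be recovered when needed
decoded = encoder.inverse_transform(encoded)
print(list(decoded))
```

The encoded array can then be used as y in the rest of the pipeline, and inverse_transform converts predictions back to readable class names.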
Applications of Classification in Engineering-
Example 1: Data Splitting
89

◻ Splitting data into a training set and testing set

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Note: random_state is not set to an int, so you may get different results on each run
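To make the split repeatable, random_state can be fixed to an integer. The sketch below illustrates this on the Iris data; the value 1 is an arbitrary assumption.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Two calls with the same random_state (assumed value: 1) give the same split
split_a = train_test_split(X, y, test_size=0.2, random_state=1)
split_b = train_test_split(X, y, test_size=0.2, random_state=1)
same_split = np.array_equal(split_a[1], split_b[1])

X_train, X_test, y_train, y_test = split_a
print(X_train.shape, X_test.shape, same_split)
```

With 150 samples and test_size=0.2, 30 samples go to the testing set and 120 to the training set.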
Applications of Classification in Engineering-
Example 1: Selecting and Creating a Model
90

◻ Selecting and creating the classifier model

from sklearn.tree import DecisionTreeClassifier  # import the decision tree class
from sklearn.neighbors import KNeighborsClassifier  # import the KNN class
from sklearn.naive_bayes import GaussianNB  # import the NB class
from sklearn.svm import LinearSVC  # import the Linear Support Vector Classifier
from sklearn.svm import SVC  # import the Support Vector Classifier
Applications of Classification in Engineering-
Example 1: Selecting and Creating a Model
91

◻ Creating the model

◻ It can be ONE of the following (here, let’s consider the Decision Tree Classifier):
# Create the classifier: keep only ONE of the assignments below

# Decision Tree Classifier
clf = DecisionTreeClassifier()

# K-Nearest Neighbors Classifier
clf = KNeighborsClassifier()

# Naive Bayes Classifier
clf = GaussianNB()

# Linear SVM
clf = LinearSVC()

# Non-Linear SVM with polynomial kernel
clf = SVC(kernel='poly')

# Non-Linear SVM with Radial Basis Function (RBF) kernel
clf = SVC(kernel='rbf')
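Since all of these classifiers share the same fit and score interface, they can also be tried one after another in a loop. The following is a sketch, not the slides' own code: random_state=1 and max_iter=10000 are assumptions added here for repeatability and to give the linear SVM enough iterations, and the dictionary keys are only display names.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC, LinearSVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X = X[:, :2]  # first two features, as in the example
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)  # random_state is an assumption

# One entry per candidate classifier
candidates = {
    "Decision tree": DecisionTreeClassifier(),
    "KNN": KNeighborsClassifier(),
    "Naive Bayes": GaussianNB(),
    "Linear SVM": LinearSVC(max_iter=10000),  # max_iter is an assumption
    "SVM (poly kernel)": SVC(kernel="poly"),
    "SVM (RBF kernel)": SVC(kernel="rbf"),
}

# Train each classifier and record its accuracy on the testing set
accuracies = {}
for name, clf in candidates.items():
    clf.fit(X_train, y_train)
    accuracies[name] = clf.score(X_test, y_test)
    print(f"{name}: {accuracies[name]:.2f}")
```

The accuracies depend on the split, so a different random_state may change which classifier scores highest.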
Applications of Classification in Engineering-
Example 1: Training
92

◻ Training the model

# train the model by calling the fit function
# and passing the training features and labels
clf = clf.fit(X_train, y_train)

Applications of Classification in Engineering-
Example 1: Using the Model for Prediction
93

◻ Prediction using the trained model

#predicting the labels of the test features

y_pred = clf.predict(X_test)

print(y_pred)

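Output: [an array of predicted class labels (0, 1, or 2), one per test sample; the exact values depend on the random split]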
Applications of Classification in Engineering-
Example 1: Model Evaluation using Accuracy
94

◻ Model evaluation
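# Evaluate the model: print the accuracy score
print('Accuracy is:', round(clf.score(X_test, y_test), 2))

Output: [the accuracy rounded to two decimal places; the value depends on the random split and the chosen classifier]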
Applications of Classification in Engineering-
Example 1: Model Evaluation via Plotting (DT)
95

◻ Model evaluation via plotting for the DT algorithm

# plot the decision tree (only if using the DT classifier)
from sklearn import tree
tree.plot_tree(clf)

Output: [Figure: plot of the trained decision tree]
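When a figure is not convenient, the learned rules of a decision tree can also be printed as text with export_text from sklearn.tree. This is a sketch: max_depth=2 and random_state=0 are assumptions used only to keep the printout short and repeatable, and the tree is fitted on the whole two-feature Iris data rather than on a training split.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X = X[:, :2]  # first two features: sepal length and sepal width

# max_depth and random_state are assumptions for a short, repeatable printout
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Print the tree as indented if/else rules
rules = export_text(clf, feature_names=["sepal length", "sepal width"])
print(rules)
```

Each indented line is a threshold test on one feature, and each leaf line reports the predicted class.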
Applications of Classification in Engineering-
Example 1: Model Evaluation via Plotting
96

◻ Plot the Decision Boundary produced by the classifier:

# plot the decision boundary using plot_decision_regions from the mlxtend library
import matplotlib.pyplot as plt
from mlxtend.plotting import plot_decision_regions
plot_decision_regions(X_train, y_train, clf)
plt.xlabel('Sepal Length')
plt.ylabel('Sepal Width')

Output: [Figure: decision regions of the classifier over sepal length and sepal width]
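If mlxtend is not installed, a similar plot can be drawn with NumPy and matplotlib alone by predicting the class of every point on a dense grid. This is a sketch under assumptions: the KNN classifier, the grid step of 0.05, and the output file name are arbitrary choices made here.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X = X[:, :2]  # sepal length and sepal width
clf = KNeighborsClassifier().fit(X, y)  # assumed choice of classifier

# Build a grid that covers the range of both features
xx, yy = np.meshgrid(
    np.arange(X[:, 0].min() - 1, X[:, 0].max() + 1, 0.05),
    np.arange(X[:, 1].min() - 1, X[:, 1].max() + 1, 0.05),
)

# Predict the class of every grid point and reshape back to the grid
grid = np.c_[xx.ravel(), yy.ravel()]
Z = clf.predict(grid).reshape(xx.shape)

# Colored regions show the predicted class; dots show the actual samples
plt.contourf(xx, yy, Z, alpha=0.3)
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor="k")
plt.xlabel("Sepal Length")
plt.ylabel("Sepal Width")
plt.savefig("decision_boundary.png")  # hypothetical file name
```

The borders between the colored regions are the decision boundaries of the classifier.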
Applications of Classification in Engineering-
Example 1: Model Evaluation via Plotting
97

◻ Decision Boundaries produced by several classifiers:

[Figure: decision boundaries for six classifiers. Panels: DT, KNN, NB, Linear SVM, SVM with polynomial kernel, SVM with RBF kernel]
Summary: Regression vs. Classification
98

◻ Differences between classification and regression

| Property | Classification | Regression |
|---|---|---|
| Output type | Discrete (class labels) | Continuous (numbers) |
| Purpose | To find the decision boundary, which divides the dataset into different classes. | To find the line/curve that best fits the data and predicts the output more accurately. |
| Evaluation | Accuracy, and other measures such as F-score, precision, recall, and confusion matrix | MSE, R², and other metrics such as SSE, MAE, and MAPE. |
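The difference in evaluation can be illustrated with a few hand-computed metrics. All numbers below are hypothetical and chosen only to keep the arithmetic simple.

```python
import numpy as np

# Classification: hypothetical discrete labels, evaluated with accuracy
labels_true = np.array([0, 1, 1, 2])
labels_pred = np.array([0, 1, 2, 2])
accuracy = np.mean(labels_true == labels_pred)

# Regression: hypothetical continuous targets, evaluated with error metrics
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.0, 8.0])
errors = y_true - y_pred

mse = np.mean(errors ** 2)      # mean squared error
mae = np.mean(np.abs(errors))   # mean absolute error
# R^2 = 1 - (sum of squared errors) / (total sum of squares around the mean)
r2 = 1 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print(accuracy, mse, mae, r2)
```

Accuracy counts exact matches, which only makes sense for discrete labels, while MSE, MAE, and R² measure how far continuous predictions are from the true values.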
Learning Outcomes
99

Upon completion of the course, students will be able to:

1. Identify the importance of AI and Data Science for society
2. Perform data loading, preprocessing, summarization and
visualization
3. Apply machine learning methods to solve basic regression
and classification problems
4. Apply artificial neural networks to solve simple engineering
problems
5. Implement basic data science and machine learning tasks
using programming tools
Extra: Applications of Regression in
Engineering- Example 3: Multiple Regression
100

◻ Another approach to splitting the data into a training set and a testing set
# Split the data and targets - SHORT WAY
# Use the function train_test_split to split the data and targets into
# training and testing sets. Testing data size is 20% of the data, and the rest
# is the training portion
from sklearn.model_selection import train_test_split
diabetes_X_train, diabetes_X_test, diabetes_y_train, diabetes_y_test = train_test_split(
    diabetes_X, diabetes_y, test_size=0.2)

# OR: Split the data into training and testing sets - LONG WAY
diabetes_X_train = diabetes_X[:-20]  # the first part of the array, excluding the last 20 records
diabetes_X_test = diabetes_X[-20:]   # the last 20 records

# Split the labels into training and testing sets
diabetes_y_train = diabetes_y[:-20]  # the first part of the array, excluding the last 20 records
diabetes_y_test = diabetes_y[-20:]   # the last 20 records
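The two approaches are not equivalent: the short way shuffles the data and holds out 20% of the samples, while the long way keeps the original order and holds out exactly the last 20 records. The sketch below compares the resulting sizes. It assumes that diabetes_X and diabetes_y come from scikit-learn's load_diabetes, and random_state=0 is an arbitrary choice added for repeatability.

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

# Assumption: the slides' diabetes_X and diabetes_y are scikit-learn's diabetes data
diabetes_X, diabetes_y = load_diabetes(return_X_y=True)

# SHORT WAY: shuffled split, testing set is 20% of the samples
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    diabetes_X, diabetes_y, test_size=0.2, random_state=0)  # random_state is an assumption

# LONG WAY: no shuffling, testing set is exactly the last 20 records
X_train_l, X_test_l = diabetes_X[:-20], diabetes_X[-20:]
y_train_l, y_test_l = diabetes_y[:-20], diabetes_y[-20:]

print("Total samples:", len(diabetes_X))
print("Short way test size:", len(X_test_s))
print("Long way test size:", len(X_test_l))
```

Because the long way does not shuffle, it is only a fair split when the records are not sorted in a way that relates to the target.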
