
AE4800 - SPECIAL TOPICS IN AEE:

Aerospace Engineering Project

Machine Learning and Artificial Neural Networks


Application

Aerospace Engineering Department

Submitted by:
Safa YILMAZ

Professor:
Assoc. Prof. Dr. Ercan GÜRSES

Date: February 1, 2021


Contents

1 INTRODUCTION
2 Types of Compilers
3 Types of Libraries
3.1 Built-In Libraries
3.2 Standard Libraries
3.3 3rd Party Libraries
4 Learning Process
4.1 Data Preprocessing
4.1.1 Import Libraries
4.1.2 Import Data Set
4.1.3 Optimization Missing Data
4.1.4 Encoding Categorical Data
4.1.5 Arranging Training and Test Data
4.1.6 Feature Scaling
4.2 Regression
4.2.1 Simple Linear Regression
4.2.2 Multiple Linear Regression
4.2.3 Polynomial Regression
4.2.4 Support Vector Regression (SVR)
4.2.5 Decision Tree Regression
4.2.6 Random Forest Regression
4.3 Classification
4.3.1 Logistic Regression
4.3.2 K-Nearest Neighbors (K-NN)
4.3.3 Support Vector Machine (SVM)
4.3.4 Kernel SVM
4.3.5 Naive Bayes
4.3.6 Decision Tree Classification
4.3.7 Random Forest Classification
4.4 Clustering
4.4.1 K-Means Clustering
4.4.2 Hierarchical Clustering
4.5 Deep Learning
4.5.1 Artificial Neural Networks
4.5.2 Convolutional Neural Networks
4.6 Dimensionality Reduction
4.6.1 Principal Component Analysis (PCA)
4.6.2 Linear Discriminant Analysis
4.6.3 Kernel PCA
4.7 Model Selection and Boosting
4.7.1 Model Selection
4.7.2 XGBoost
5 CONCLUSION
6 REFERENCES
List of Figures
1 Simple Linear Function
2 Simple Linear Regression Application
3 Simple Linear Regression Error Contribution
4 Simple Linear Regression Applications
5 Multiple Linear Function
6 Polynomial Regression
7 Polynomial Regression Approximation
8 Regression Application Differences on Training Data
9 Polynomial Regression Application on Training Data
10 Support Vector Regression (SVR)
11 Support Vector Regression (SVR) Application on Training Data
12 Support Vector Regression (SVR) Application on Training Data with Radial Basis Function (RBF)
13 Decision Tree Regression Split Map
14 Decision Tree Regression Tree Map
15 Decision Tree Regression Application on Training Data
16 Random Forest Regression Application on Training Data
17 Probability Distribution of Data Set
18 Probability Shifting for Predicted Values
19 Logistic Regression Application
20 Euclidean Distance
21 Categories and New Data Position
22 K-NN Classification Process
23 K-Nearest Neighbors (K-NN) Application
24 Support Vector Machine Application
25 Support Vector Machine (SVM) Application
26 Mapping Function Application
27 Kernels of Mapping Function
28 The Gaussian RBF Kernel Application
29 Kernel SVM Application
30 Probabilities of Two Different Classes on the Data Set
31 Naive Bayes Application
32 Decision Tree Classification Separation of Data Set
33 Decision Tree Classification Tree
34 Decision Tree Classification Application
35 Random Forest Classification Application
36 Randomly Selected Points for Each Clusters
37 Movement to the New Centroid Point Calculated
38 New Clusters for New Closest Centroid
39 Final Clusters
40 K-Means Application Process
41 Optimum Number of Clusters Determination
42 K-Means Clustering Application
43 Euclidean Distance
44 Distance Between Two Clusters
45 Hierarchical Clustering Model Optimization of Cluster Number
46 Final Optimum Clusters for Hierarchical Clustering Model
47 Hierarchical Clustering Application
48 Biological and Artificial Neurons
49 Standardization of Independent Variables
50 Transfer Function Step in Neurons
51 Activation Function Step in Neurons
52 Activation Functions of Neural Networks
53 Multi Layer Neurons with Different Activation Functions
54 Arranging Weight with Back Propagation
55 Minimizing Error
56 Gradient Descent
57 Back Propagation with Gradient Descent
58 Example of Gradient Descent
59 Stochastic Gradient Descent
60 Difference Between Batch Gradient Descent and Stochastic Gradient Descent
61 Types of Propagation on Neural Networks
62 Convolutional Neural Network Structure
63 Dimensions of Images
64 Pixels in Binary Systems
65 Feature Map
66 Convolutional Layer
67 Example of Convolutional Layer Application
68 Rectifier Activation (ReLU) Layer
69 Pooled Feature Map
70 Pooling Layer
71 Converting of Pooling Layer to Input Layer (Flattening)
72 Convolutional Neural Networks Preparation Steps
73 Input Application on Artificial Neural Networks for Convolutional Neural Networks
74 Convolutional Neural Network Structure Included All Steps
75 Difference Between PCA and LDA
76 k-Fold Cross Validation
Listings
1 Import Libraries Code
2 Import Data Set Code
3 Optimization Missing Data Code
4 Encoding Categorical Data Code
5 Arranging Training and Test Data Code
6 Feature Scaling Code
7 Simple Linear Regression Code
8 Multiple Linear Regression Code
9 Polynomial Regression Code
10 Support Vector Regression Code
11 Decision Tree Regression Code
12 Random Forest Regression Code
13 Logistic Regression Code
14 K-Nearest Neighbors (K-NN) Code
15 Support Vector Machine Code
16 Kernel SVM Code
17 Naive Bayes Code
18 Decision Tree Classification Code
19 Random Forest Classification Code
20 K-Means Clustering Code
21 Hierarchical Clustering Code
22 Artificial Neural Networks Code
23 Convolutional Neural Networks Code
24 Principal Component Analysis (PCA) Code
25 Linear Discriminant Analysis Code
26 Kernel PCA Code
27 K-Fold Cross Validation Code
28 Grid Search Code
29 XGBoost Code
Abstract
The aims of this report are to explain how machine learning is applied to a data set to reach accurate results, the types of learning models, how a data set is prepared for the learning process, natural language processing applications, and the types of neural networks. By the end of the report, the reader is expected to have gained knowledge of these applications of artificial intelligence. Each process is explained step by step for the reader's benefit, and all code and graphs are included in the report to show how to apply it. Because the data sets are large and can be swapped freely in the code, they are not included in the report; readers can apply their own data sets to deepen their knowledge of machine learning. All code is written in Python 3, which is preferred here because of the wide availability of open-source documentation and examples.

1 INTRODUCTION

In machine learning, a person chooses or builds the underlying algorithm. However, the algorithm itself learns the parameters that form a mathematical model for making forecasts, rather than having them set by direct human interference. Human beings do not know these parameters or set them; the computer does. Put another way, a mathematical model is trained on a collection of data so that it learns what to do with similar data it sees in the future. Models usually take data as input and then generate a forecast of some quantity of interest.

Managers do not have to be specialists in machine learning, but they should have at least a limited amount of knowledge. If you can grasp the specific kinds of things ML can do, you will understand where to start and what to reach for, and you will not have to blindly tell the technical staff to "go do magic" and hope they excel. In this report we walk through machine-learning techniques, explore the mathematics and computation that make them possible, address fundamental problems in ML, and delve into deep learning. All in all, the goal is a more fruitful dialogue with data scientists and engineers.

2 Types of Compilers

Two languages are commonly used for machine learning applications: Python and R. In addition, many IDEs (Integrated Development Environments) are available for them. An IDE helps the programmer consolidate the various facets of writing a computer program; IDEs improve programmer effectiveness by integrating common software-writing tasks into a single application: source code editing, executable creation, and debugging. Many IDEs can be used for these applications.

Some IDEs for Python programming are:

• PyCharm

• Kite

• Spyder

• IDLE

• Visual Studio Code

• Sublime Text 3

• Atom

• Jupyter

• Pydev

• Thonny

• Wing

• ActivePython

• ...

3 Types of Libraries

There are three types of libraries for Python, which are:

• Built-In Libraries

• Standard Libraries

• 3rd Party Libraries

3.1 Built-In Libraries

Built-in libraries are part of the Python interpreter itself and provide simple functions such as:

• abs

• all

• any

• bin

• dir

• enumerate

• isinstance

• iter

• bytes

• compile

• ...

The built-in functions include basic mathematical and inspection utilities. For instance, abs returns the absolute value of a number, and isinstance checks whether an object is an instance of a given type.
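
As a brief illustration, here is a minimal sketch of a few of these built-ins with arbitrary values (not taken from the report's data sets):

# A few built-in functions in action
print(abs(-7.5))                     # 7.5: absolute value
print(isinstance(3.14, float))       # True: type check
for index, name in enumerate(["lift", "drag"]):
    print(index, name)               # 0 lift, then 1 drag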

3.2 Standard Libraries

The standard library includes modules for time, system, file handling, and other utility tasks, such as:

• time

• sys

• os

• math

• random

• pickle

• urllib

• re

• cgi

• socket

• prefix

• suffix

• warning

• locale

• ...

For instance, the time module counts seconds and lets a program produce output at exact moments, so the sequence of information printed to the console can be controlled over a time period.

With the os module we can reach directories and control the file system. For example, a script can check whether a file exists before running or compiling it.
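
A small sketch of both ideas (the file name data.csv here is only a placeholder):

import os
import time

start = time.time()                  # current time in seconds since the epoch
if os.path.exists('data.csv'):       # check the file before trying to use it
    print('data.csv found')
else:
    print('data.csv is missing')
print('check took', time.time() - start, 'seconds')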

3.3 3rd Party Libraries

These are the most commonly used libraries in Python because of how much functionality they add. They are imported from outside sources, and most of them are open source. For instance, the built-in and standard libraries do not provide a convenient 'mean' routine for arrays, so we cannot calculate the mean of our data directly; with NumPy, however, we can do it with a single call. Each of these libraries targets a different kind of application. Since this report focuses on machine learning and artificial intelligence, we use several of them to compute predictions and to prepare the data set used to train the machine.
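
For example, computing a mean with NumPy takes a single call (values chosen arbitrarily):

import numpy as np

values = np.array([39343.0, 46205.0, 37731.0, 43525.0])
print(np.mean(values))   # arithmetic mean of the array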

Some 3rd party libraries are,

• TensorFlow

• Scikit-Learn

• Numpy

• Keras

• PyTorch

• LightGBM

• Eli5

• SciPy

• Theano

• Pandas

• MatPlotLib

In this report, we mostly use TensorFlow for ANN applications, Scikit-Learn for machine learning models, NumPy for mathematical operations, Pandas for importing data sets, and Matplotlib for visualizing the results.

4 Learning Process

The learning process includes several types of application, which are:

• Data Preprocessing

• Regression
• Classification
• Clustering
• Association Rule Learning
• Reinforcement Learning
• Natural Language Processing
• Deep Learning
• Dimensionality Reduction
• Model Selection and Boosting

4.1 Data Preprocessing

The data preprocessing step includes several operations that prepare the data set for the training steps. These steps are important for the training process to reach optimum learning results. They are:

• Importing Libraries
• Importing Data Set
• Optimization of Missing Data
• Encoding Categorical Data
• Arranging Training and Test Data
• Feature Scaling

4.1.1 Import Libraries

Importing libraries is the first step of coding the application. The necessary 3rd party libraries are imported here. As mentioned in the previous section, the chosen libraries are imported with the following syntax.

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
Listing 1: Import Libraries Code

4.1.2 Import Data Set

The second preprocessing step is importing the data. The data set used in the training process is loaded here. Using the Pandas library, the data set is read into the code so that the other modules can train on it. The import is done with the following code.

# Importing the dataset
dataset = pd.read_csv('Data.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
print(X)
print(y)
Listing 2: Import Data Set Code

4.1.3 Optimization Missing Data

There can be missing entries in the data set, and missing data is a problem for the training process. That is why we can apply a rule such as replacing a missing entry with the mean of the whole column. Missing-data optimization is applied with the following code.

# Taking care of missing data
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer.fit(X[:, 1:3])
X[:, 1:3] = imputer.transform(X[:, 1:3])
print(X)
Listing 3: Optimization Missing Data Code

4.1.4 Encoding Categorical Data

Some entries in the data set are strings, that is, words or similar labels. These entries must be converted to numerical values so that the machine can understand them, so this step is applied to the string columns. Encoding of the categorical data is applied with the following code.

# Encoding categorical data

# Encoding the Independent Variable
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [0])], remainder='passthrough')
X = np.array(ct.fit_transform(X))
print(X)

# Encoding the Dependent Variable
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
y = le.fit_transform(y)
print(y)
Listing 4: Encoding Categorical Data Code

4.1.5 Arranging Training and Test Data

The training process needs a held-out test part to measure the accuracy of the learning process. For this reason, the data set is split into two parts: training data and test data. The training data is used for the learning process, while the test part is used to check that the trained model gives correct results. Splitting the data set into training and test parts is applied with the following code.

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
print(X_train)
print(X_test)
print(y_train)
print(y_test)
Listing 5: Arranging Training and Test Data Code

4.1.6 Feature Scaling

Each column of the data set holds values in its own units. To compare these different values, we need to normalize or standardize them. Feature scaling rescales each column to non-dimensional values, which makes the columns comparable with one another. Feature scaling is applied with the following code.

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train[:, 3:] = sc.fit_transform(X_train[:, 3:])
X_test[:, 3:] = sc.transform(X_test[:, 3:])
print(X_train)
print(X_test)
Listing 6: Feature Scaling Code
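
For reference, the StandardScaler used above standardizes each column x with the training-set mean µ and standard deviation σ:

x' = (x − µ) / σ

so every scaled column has zero mean and unit variance on the training data, and the same µ and σ are reused to transform the test data.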

4.2 Regression

Regression approximates the dependent values of a data set. The dependent values are the expected outputs of the learning process, and the machine is trained with them. Then, when the machine is asked for a prediction of the dependent value, it returns an estimate around these approximations.

The regression types covered in this report are:

• Simple Linear Regression

• Multiple Linear Regression

• Polynomial Regression

• Support Vector Regression (SVR)

• Decision Tree Regression

• Random Forest Regression

4.2.1 Simple Linear Regression

Simple linear regression is applied to predict a simple linear relation within the training data set. The formula of the linear function is shown below.

Figure 1: Simple Linear Function

Simple linear regression is used for a data set with a single dependent and a single independent variable. Multiplying the independent value by a coefficient and adding an initial value (intercept) gives the dependent value. At the end of the process, the regression module obtains the coefficient relating the dependent and independent values of the data set. This relation should be linear for the regression prediction to be accurate.
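
For reference (presumably what Figure 1 depicts), the simple linear model is y = b0 + b1 x, where b0 is the intercept and b1 is the slope learned from the training data.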

Figure 2: Simple Linear Regression Application

Based on the fitted linear trend of the data, the machine makes predictions around this line. The sum of the squared differences between the actual dependent values and the predicted values gives the error of the prediction.

Figure 3: Simple Linear Regression Error Contribution

# Simple Linear Regression

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Salary_Data.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)

# Training the Simple Linear Regression model on the Training set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Predicting the Test set results
y_pred = regressor.predict(X_test)

# Visualising the Training set results
plt.scatter(X_train, y_train, color='red')
plt.plot(X_train, regressor.predict(X_train), color='blue')
plt.title('Salary vs Experience (Training set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()

# Visualising the Test set results
plt.scatter(X_test, y_test, color='red')
plt.plot(X_train, regressor.predict(X_train), color='blue')
plt.title('Salary vs Experience (Test set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()
Listing 7: Simple Linear Regression Code
Using a data set that relates employees' yearly salary to their years of experience, the linear trend can be observed in the graphs of the training and test data below.

(a) Training Data (b) Test Data

Figure 4: Simple Linear Regression Applications

4.2.2 Multiple Linear Regression

Multiple linear regression is similar to simple linear regression; however, in this regression there is more than one independent variable as input.

Figure 5: Multiple Linear Function
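
In the same notation (presumably what Figure 5 depicts), the multiple linear model is y = b0 + b1 x1 + b2 x2 + ... + bn xn, with one coefficient bi fitted for each independent variable xi.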

There are several approaches to building a multiple linear regression model, which are listed below.

• Backward Elimination (a code sketch follows this list)

– Select a significance level to stay in the model (e.g. SL = 0.005)
– Fit the full model with all possible predictors
– Consider the predictor with the highest P-value. If P > SL, go to the next step; otherwise the model is final
– Remove that predictor
– Fit the model without this variable and repeat from the previous step

• Forward Selection

– Select a significance level to enter the model (e.g. SL = 0.005)
– Fit all simple regression models y ~ xn and select the one with the lowest P-value
– Keep this variable and fit all models with one extra predictor added; if the lowest new P-value is below SL, repeat this step, otherwise keep the previous model as final

• Bidirectional Elimination

– Select significance levels to enter and to stay in the model (e.g. SLENTER = 0.05, SLSTAY = 0.05)
– Perform the next step of Forward Selection (a new variable must have P < SLENTER to enter)
– Perform ALL steps of Backward Elimination (old variables must have P < SLSTAY to stay)
– Stop when no new variables can enter and no old variables can exit

• All Possible Models

– Select a criterion of goodness of fit (e.g. the Akaike criterion)
– Construct all possible regression models: 2^N − 1 total combinations
– Select the one with the best criterion
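
As referenced above, here is a minimal sketch of Backward Elimination. It assumes the statsmodels package (not used elsewhere in this report) and that X and y are already numeric NumPy arrays; the significance level is an illustrative choice.

import numpy as np
import statsmodels.api as sm

def backward_elimination(X, y, significance_level=0.05):
    # Repeatedly drop the predictor whose p-value is highest and above SL
    X_opt = sm.add_constant(X.astype(float))      # prepend the intercept column
    while True:
        model = sm.OLS(y, X_opt).fit()            # ordinary least squares fit
        p_values = np.asarray(model.pvalues)
        worst = int(np.argmax(p_values))
        if p_values[worst] <= significance_level:
            return model                          # all remaining predictors are significant
        X_opt = np.delete(X_opt, worst, axis=1)   # remove that predictor and refit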

The following Python code trains the full multiple linear regression model with scikit-learn.
1 # Multiple Linear Regression
2
3 # Importing the libraries
4 import numpy as np
5 import matplotlib . pyplot as plt
6 import pandas as pd
7
8 # Importing the dataset
9 dataset = pd . read_csv ( ’ 50 _Startups . csv ’)
10 X = dataset . iloc [: , : -1]. values
11 y = dataset . iloc [: , -1]. values
12 print ( X )
13
14 # Encoding categorical data
15 from sklearn . compose import ColumnTransformer
16 from sklearn . preprocessing import OneHotEncoder
17 ct = ColumnTransformer ( transformers =[( ’ encoder ’ , OneHotEncoder () , [3]) ] ,
remainder = ’ passthrough ’)
18 X = np . array ( ct . fit_transform ( X ) )
19 print ( X )
20
21 # Splitting the dataset into the Training set and Test set
22 from sklearn . model_selection import train_test_split

23 X_train , X_test , y_train , y_test = train_test_split (X , y , test_size = 0.2 ,
random_state = 0)
24
25 # Training the Multiple Linear Regression model on the Training set
26 from sklearn . linear_model import LinearRegression
27 regressor = LinearRegression ()
28 regressor . fit ( X_train , y_train )
29

30 # Predicting the Test set results


31 y_pred = regressor . predict ( X_test )
32 np . set_printoptions ( precision =2)
33 print ( np . concatenate (( y_pred . reshape ( len ( y_pred ) ,1) , y_test . reshape ( len (
y_test ) ,1) ) ,1) )
Listing 8: Multiple Linear Regression Code

4.2.3 Polynomial Regression

Polynomial regression is a polynomial approach to fitting the training data in order to improve accuracy. Nonlinear relations between the independent and dependent variables cause errors when plain linear regression is used, which is why polynomial regression is useful for the training process. As shown in the following figure, the model stays linear in its coefficients while polynomial terms of the independent variable are added. With this model, the error caused by the nonlinear relation becomes smaller.

Figure 6: Polynomial Regression

The fitted trend line looks like the following figure. Each additional polynomial term lets the curve follow the data with a more accurate trend.

Figure 7: Polynomial Regression Approximation

The following Python code applies this model in the training process.

1 # Polynomial Regression
2
3 # Importing the libraries
4 import numpy as np
5 import matplotlib . pyplot as plt
6 import pandas as pd
7
8 # Importing the dataset
9 dataset = pd . read_csv ( ’ Position_Salaries . csv ’)
10 X = dataset . iloc [: , 1: -1]. values
11 y = dataset . iloc [: , -1]. values
12
13 # Training the Linear Regression model on the whole dataset
14 from sklearn . linear_model import LinearRegression
15 lin_reg = LinearRegression ()
16 lin_reg . fit (X , y )
17

18 # Training the Polynomial Regression model on the whole dataset


19 from sklearn . preprocessing import PolynomialFeatures
20 poly_reg = PolynomialFeatures ( degree = 4)
21 X_poly = poly_reg . fit_transform ( X )
22 lin_reg_2 = LinearRegression ()
23 lin_reg_2 . fit ( X_poly , y )
24
25 # Visualising the Linear Regression results
26 plt . scatter (X , y , color = ’ red ’)
27 plt . plot (X , lin_reg . predict ( X ) , color = ’ blue ’)
28 plt . title ( ’ Truth or Bluff ( Linear Regression ) ’)
29 plt . xlabel ( ’ Position Level ’)
30 plt . ylabel ( ’ Salary ’)
31 plt . show ()

32
33 # Visualising the Polynomial Regression results
34 plt . scatter (X , y , color = ’ red ’)
35 plt . plot (X , lin_reg_2 . predict ( poly_reg . fit_transform ( X ) ) , color = ’ blue ’)
36 plt . title ( ’ Truth or Bluff ( Polynomial Regression ) ’)
37 plt . xlabel ( ’ Position level ’)
38 plt . ylabel ( ’ Salary ’)
39 plt . show ()
40
41 # Visualising the Polynomial Regression results ( for higher resolution and
smoother curve )
42 X_grid = np . arange ( min ( X ) , max ( X ) , 0.1)
43 X_grid = X_grid . reshape (( len ( X_grid ) , 1) )
44 plt . scatter (X , y , color = ’ red ’)
45 plt . plot ( X_grid , lin_reg_2 . predict ( poly_reg . fit_transform ( X_grid ) ) , color
= ’ blue ’)
46 plt . title ( ’ Truth or Bluff ( Polynomial Regression ) ’)
47 plt . xlabel ( ’ Position level ’)
48 plt . ylabel ( ’ Salary ’)
49 plt . show ()
50
51 # Predicting a new result with Linear Regression
52 lin_reg . predict ([[6.5]])
53
54 # Predicting a new result with Polynomial Regression
55 lin_reg_2 . predict ( poly_reg . fit_transform ([[6.5]]) )
Listing 9: Polynomial Regression Code
In the following Figure 8, the training data shows larger error when linear regression is used. However, polynomial regression changes the trend line to the minimum-error version.

(a) Simple Linear Regression Application on Training Data; (b) Polynomial Regression Application on Training Data with Rough Line

Figure 8: Regression Application Differences on Training Data

Evaluating the model on a fine grid produces the smooth curves shown in Figure 9. The polynomial degree is the controlling parameter of this method: a higher degree makes the prediction more accurate, and the curve approaches the training data with minimum error.

(a) With Low Degree (b) With High Degree

Figure 9: Polynomial Regression Application on Training Data

4.2.4 Support Vector Regression (SVR)

Support vector regression fits a prediction line to the data set, and an ε-insensitive tube is drawn around that line between +ε and −ε, as shown in the following figure. Errors inside the tube are ignored, and only the points outside it (the support vectors) contribute to the fit, which makes the prediction more robust.

Figure 10: Support Vector Regression (SVR)

The error term minimized by SVR is

$$\min_{w}\ \frac{1}{2}\,\|w\|^{2} + C\sum_{i=1}^{m}\left(\epsilon_i + \epsilon_i^{*}\right)$$
To apply this model in the training process, the following Python code can be a useful starting point.

1 # Support Vector Regression ( SVR )


2
3 # Importing the libraries
4 import numpy as np
5 import matplotlib . pyplot as plt
6 import pandas as pd
7
8 # Importing the dataset
9 dataset = pd . read_csv ( ’ Position_Salaries . csv ’)
10 X = dataset . iloc [: , 1: -1]. values
11 y = dataset . iloc [: , -1]. values
12 print ( X )
13 print ( y )
14 y = y . reshape ( len ( y ) ,1)
15 print ( y )
16
17 # Feature Scaling
18 from sklearn . preprocessing import StandardScaler
19 sc_X = StandardScaler ()
20 sc_y = StandardScaler ()
21 X = sc_X . fit_transform ( X )
22 y = sc_y . fit_transform ( y )
23 print ( X )
24 print ( y )
25
26 # Training the SVR model on the whole dataset
27 from sklearn . svm import SVR
28 regressor = SVR ( kernel = ’ rbf ’)
29 regressor . fit (X , y )
30
31 # Predicting a new result
32 sc_y . inverse_transform ( regressor . predict ( sc_X . transform ([[6.5]]) ) )
33

34 # Visualising the SVR results


35 plt . scatter ( sc_X . inverse_transform ( X ) , sc_y . inverse_transform ( y ) , color =
’ red ’)
36 plt . plot ( sc_X . inverse_transform ( X ) , sc_y . inverse_transform ( regressor .
predict ( X ) ) , color = ’ blue ’)
37 plt . title ( ’ Truth or Bluff ( SVR ) ’)
38 plt . xlabel ( ’ Position level ’)
39 plt . ylabel ( ’ Salary ’)
40 plt . show ()
41
42 # Visualising the SVR results ( for higher resolution and smoother curve )
43 X_grid = np . arange ( min ( sc_X . inverse_transform ( X ) ) , max ( sc_X .
inverse_transform ( X ) ) , 0.1)
44 X_grid = X_grid . reshape (( len ( X_grid ) , 1) )
45 plt . scatter ( sc_X . inverse_transform ( X ) , sc_y . inverse_transform ( y ) , color =

’ red ’)
46 plt . plot ( X_grid , sc_y . inverse_transform ( regressor . predict ( sc_X . transform (
X_grid ) ) ) , color = ’ blue ’)
47 plt . title ( ’ Truth or Bluff ( SVR ) ’)
48 plt . xlabel ( ’ Position level ’)
49 plt . ylabel ( ’ Salary ’)
50 plt . show ()
Listing 10: Support Vector Regression Code
The application of support vector regression to the training data set is illustrated in the following figures.

(a) With Rough Line (b) With Smooth Line

Figure 11: Support Vector Regression (SVR) Application on Training Data

The smooth, higher-resolution application is shown below.

Figure 12: Support Vector Regression (SVR) Application on Training Data with Radial
Basis Function (RBF)

4.2.5 Decision Tree Regression

Decision tree regression builds a tree of splits over the independent variables and bases its prediction on them. As seen in the following figure, all the values from the data set lie in a coordinate system that is divided by boundaries.

Figure 13: Decision Tree Regression Split Map

For X1 < 20, the values are split into two parts in the X2 direction: the X2 < 200 part and the X2 > 200 part. This is illustrated in the following figure as the first separation in the tree. Then, for each region, the next separation is added to the tree. Finally, the leaf reached in the last step (the average of the training values falling in that region) gives the prediction for a new input. Decision tree regression works on this principle in code.

Figure 14: Decision Tree Regression Tree Map

The following Python code is an example application of decision tree regression.

1 # Decision Tree Regression


2

3 # Importing the libraries


4 import numpy as np
5 import matplotlib . pyplot as plt
6 import pandas as pd
7
8 # Importing the dataset
9 dataset = pd . read_csv ( ’ Position_Salaries . csv ’)
10 X = dataset . iloc [: , 1: -1]. values
11 y = dataset . iloc [: , -1]. values
12
13 # Training the Decision Tree Regression model on the whole dataset
14 from sklearn . tree import DecisionTreeRegressor
15 regressor = DecisionTreeRegressor ( random_state = 0)
16 regressor . fit (X , y )
17
18 # Predicting a new result
19 regressor . predict ([[6.5]])
20
21 # Visualising the Decision Tree Regression results ( higher resolution )
22 X_grid = np . arange ( min ( X ) , max ( X ) , 0.01)
23 X_grid = X_grid . reshape (( len ( X_grid ) , 1) )

24 plt . scatter (X , y , color = ’ red ’)
25 plt . plot ( X_grid , regressor . predict ( X_grid ) , color = ’ blue ’)
26 plt . title ( ’ Truth or Bluff ( Decision Tree Regression ) ’)
27 plt . xlabel ( ’ Position level ’)
28 plt . ylabel ( ’ Salary ’)
29 plt . show ()
Listing 11: Decision Tree Regression Code
The decision tree regression model is best suited to data sets with multiple inputs. That is why the graph in the following figure is less accurate: the data set here has only a single feature.

Figure 15: Decision Tree Regression Application on Training Data

4.2.6 Random Forest Regression

Random forest regression is basically a repeated application of decision tree regression. As the names suggest, decision tree regression has only one tree in the learning model, whereas random forest regression has many trees, like a forest.

The application steps are as follows:

• Pick K data points at random from the training set.

• Build the decision tree associated with these K data points.

• Choose the number Ntree of trees you want to build and repeat the previous steps.

• For a new data point, make each of the Ntree trees predict the value of Y for that point, and assign the new data point the average of all the predicted Y values.

The code below applies the random forest regression model for machine learning, written in Python.

1 # Random Forest Regression


2
3 # Importing the libraries
4 import numpy as np
5 import matplotlib . pyplot as plt
6 import pandas as pd
7
8 # Importing the dataset
9 dataset = pd . read_csv ( ’ Position_Salaries . csv ’)
10 X = dataset . iloc [: , 1: -1]. values
11 y = dataset . iloc [: , -1]. values
12

13 # Training the Random Forest Regression model on the whole dataset


14 from sklearn . ensemble import RandomForestRegressor
15 regressor = RandomForestRegressor ( n_estimators = 10 , random_state = 0)
16 regressor . fit (X , y )
17
18 # Predicting a new result
19 regressor . predict ([[6.5]])
20
21 # Visualising the Random Forest Regression results ( higher resolution )
22 X_grid = np . arange ( min ( X ) , max ( X ) , 0.01)
23 X_grid = X_grid . reshape (( len ( X_grid ) , 1) )
24 plt . scatter (X , y , color = ’ red ’)
25 plt . plot ( X_grid , regressor . predict ( X_grid ) , color = ’ blue ’)
26 plt . title ( ’ Truth or Bluff ( Random Forest Regression ) ’)
27 plt . xlabel ( ’ Position level ’)
28 plt . ylabel ( ’ Salary ’)
29 plt . show ()
Listing 12: Random Forest Regression Code
Similarly, random forest regression is a less accurate model for a single-input data set.

Figure 16: Random Forest Regression Application on Training Data

4.3 Classification

Generally, classification models are used to assign the values of a data set to classes, grouping related samples based on multiple inputs. We will examine these models in this part. Classification is studied under several subtopics, which are:

• Logistic Regression

• K-Nearest Neighbors (K-NN)

• Support Vector Machine (SVM)

• Kernel SVM

• Naive Bayes

• Decision Tree Classification

• Random Forest Classification

4.3.1 Logistic Regression

Logistic regression uses a probability approach to predict a new result. In the data set we have inputs with their outputs, and the machine calculates the probability of each class from this data.

Figure 17: Probability Distribution of Data Set

After this step, the midpoint of the probability range is used as a reference value for prediction: new inputs whose predicted probability falls below this value are assigned to one class, and those above it to the other.

Figure 18: Probability Shifting for Predicted Values

To apply logistic regression, the following Python code is a good modelling example.

1 # Logistic Regression

2
3 # Importing the libraries
4 import numpy as np
5 import matplotlib . pyplot as plt
6 import pandas as pd
7
8 # Importing the dataset
9 dataset = pd . read_csv ( ’ Social_Network_Ads . csv ’)
10 X = dataset . iloc [: , : -1]. values
11 y = dataset . iloc [: , -1]. values
12
13 # Splitting the dataset into the Training set and Test set
14 from sklearn . model_selection import train_test_split
15 X_train , X_test , y_train , y_test = train_test_split (X , y , test_size =
0.25 , random_state = 0)
16 print ( X_train )
17 print ( y_train )
18 print ( X_test )
19 print ( y_test )
20

21 # Feature Scaling
22 from sklearn . preprocessing import StandardScaler
23 sc = StandardScaler ()
24 X_train = sc . fit_transform ( X_train )
25 X_test = sc . transform ( X_test )
26 print ( X_train )
27 print ( X_test )
28
29 # Training the Logistic Regression model on the Training set
30 from sklearn . linear_model import LogisticRegression
31 classifier = LogisticRegression ( random_state = 0)
32 classifier . fit ( X_train , y_train )
33
34 # Predicting a new result
35 print ( classifier . predict ( sc . transform ([[30 ,87000]]) ) )
36
37 # Predicting the Test set results
38 y_pred = classifier . predict ( X_test )
39 print ( np . concatenate (( y_pred . reshape ( len ( y_pred ) ,1) , y_test . reshape ( len (
y_test ) ,1) ) ,1) )
40
41 # Making the Confusion Matrix
42 from sklearn . metrics import confusion_matrix , accuracy_score
43 cm = confusion_matrix ( y_test , y_pred )
44 print ( cm )
45 accuracy_score ( y_test , y_pred )
46
47 # Visualising the Training set results
48 from matplotlib . colors import ListedColormap
49 X_set , y_set = sc . inverse_transform ( X_train ) , y_train
50 X1 , X2 = np . meshgrid ( np . arange ( start = X_set [: , 0]. min () - 10 , stop =
X_set [: , 0]. max () + 10 , step = 0.25) ,
51 np . arange ( start = X_set [: , 1]. min () - 1000 , stop =
X_set [: , 1]. max () + 1000 , step = 0.25) )

52 plt . contourf ( X1 , X2 , classifier . predict ( sc . transform ( np . array ([ X1 . ravel () ,
X2 . ravel () ]) . T ) ) . reshape ( X1 . shape ) ,
53 alpha = 0.75 , cmap = ListedColormap (( ’ red ’ , ’ green ’) ) )
54 plt . xlim ( X1 . min () , X1 . max () )
55 plt . ylim ( X2 . min () , X2 . max () )
56 for i , j in enumerate ( np . unique ( y_set ) ) :
57 plt . scatter ( X_set [ y_set == j , 0] , X_set [ y_set == j , 1] , c =
ListedColormap (( ’ red ’ , ’ green ’) ) ( i ) , label = j )
58 plt . title ( ’ Logistic Regression ( Training set ) ’)
59 plt . xlabel ( ’ Age ’)
60 plt . ylabel ( ’ Estimated Salary ’)
61 plt . legend ()
62 plt . show ()
63

64 # Visualising the Test set results


65 from matplotlib . colors import ListedColormap
66 X_set , y_set = sc . inverse_transform ( X_test ) , y_test
67 X1 , X2 = np . meshgrid ( np . arange ( start = X_set [: , 0]. min () - 10 , stop =
X_set [: , 0]. max () + 10 , step = 0.25) ,
68 np . arange ( start = X_set [: , 1]. min () - 1000 , stop =
X_set [: , 1]. max () + 1000 , step = 0.25) )
69 plt . contourf ( X1 , X2 , classifier . predict ( sc . transform ( np . array ([ X1 . ravel () ,
X2 . ravel () ]) . T ) ) . reshape ( X1 . shape ) ,
70 alpha = 0.75 , cmap = ListedColormap (( ’ red ’ , ’ green ’) ) )
71 plt . xlim ( X1 . min () , X1 . max () )
72 plt . ylim ( X2 . min () , X2 . max () )
73 for i , j in enumerate ( np . unique ( y_set ) ) :
74 plt . scatter ( X_set [ y_set == j , 0] , X_set [ y_set == j , 1] , c =
ListedColormap (( ’ red ’ , ’ green ’) ) ( i ) , label = j )
75 plt . title ( ’ Logistic Regression ( Test set ) ’)
76 plt . xlabel ( ’ Age ’)
77 plt . ylabel ( ’ Estimated Salary ’)
78 plt . legend ()
79 plt . show ()
Listing 13: Logistic Regression Code
Logistic regression is similar to linear regression; however, it uses probabilities to improve prediction accuracy.

(a) Training Data (b) Test Data

Figure 19: Logistic Regression Application

4.3.2 K-Nearest Neighbors (K-NN)

The K-nearest neighbors classification model is based on the Euclidean distance, which is formulated as:

Figure 20: Euclidean Distance

$$\text{Distance} = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$
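
A quick numeric check of this formula with NumPy (points chosen arbitrarily):

import numpy as np

p1 = np.array([1.0, 2.0])
p2 = np.array([4.0, 6.0])
print(np.sqrt(np.sum((p2 - p1) ** 2)))   # 5.0, the Euclidean distance between p1 and p2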

First of all, the K value, which is the number of neighbors, is determined. In the following figure, K = 5.

Figure 21: Categories and New Data Position

When a new data input is entered, the machine finds the K training points with the smallest Euclidean distance to the new point. In the end, the new data point is assigned to the class most common among those neighbors.

Figure 22: K-NN Classification Process

We can summarize all the steps as follows:

• Choose the number K of neighbors

• Take the K nearest neighbors of the new data point, according to the Euclidean distance

• Among these K neighbors, count the number of data points in each category

• Assign the new data point to the category where you counted the most neighbors

• The model is ready

To apply this model to your own data set, the following Python code is a good example.

1 # K - Nearest Neighbors (K - NN )
2
3 # Importing the libraries
4 import numpy as np
5 import matplotlib . pyplot as plt
6 import pandas as pd
7
8 # Importing the dataset
9 dataset = pd . read_csv ( ’ Social_Network_Ads . csv ’)
10 X = dataset . iloc [: , : -1]. values
11 y = dataset . iloc [: , -1]. values
12
13 # Splitting the dataset into the Training set and Test set
14 from sklearn . model_selection import train_test_split
15 X_train , X_test , y_train , y_test = train_test_split (X , y , test_size =
0.25 , random_state = 0)
16 print ( X_train )
17 print ( y_train )
18 print ( X_test )
19 print ( y_test )
20

21 # Feature Scaling
22 from sklearn . preprocessing import StandardScaler
23 sc = StandardScaler ()
24 X_train = sc . fit_transform ( X_train )
25 X_test = sc . transform ( X_test )
26 print ( X_train )
27 print ( X_test )
28
29 # Training the K - NN model on the Training set
30 from sklearn . neighbors import KNeighborsClassifier
31 classifier = KNeighborsClassifier ( n_neighbors = 5 , metric = ’ minkowski ’ , p
= 2)
32 classifier . fit ( X_train , y_train )
33
34 # Predicting a new result
35 print ( classifier . predict ( sc . transform ([[30 ,87000]]) ) )
36
37 # Predicting the Test set results
38 y_pred = classifier . predict ( X_test )
39 print ( np . concatenate (( y_pred . reshape ( len ( y_pred ) ,1) , y_test . reshape ( len (
y_test ) ,1) ) ,1) )
40
41 # Making the Confusion Matrix
42 from sklearn . metrics import confusion_matrix , accuracy_score
43 cm = confusion_matrix ( y_test , y_pred )
44 print ( cm )
45 accuracy_score ( y_test , y_pred )

46
47 # Visualising the Training set results
48 from matplotlib . colors import ListedColormap
49 X_set , y_set = sc . inverse_transform ( X_train ) , y_train
50 X1 , X2 = np . meshgrid ( np . arange ( start = X_set [: , 0]. min () - 10 , stop =
X_set [: , 0]. max () + 10 , step = 1) ,
51 np . arange ( start = X_set [: , 1]. min () - 1000 , stop =
X_set [: , 1]. max () + 1000 , step = 1) )
52 plt . contourf ( X1 , X2 , classifier . predict ( sc . transform ( np . array ([ X1 . ravel () ,
X2 . ravel () ]) . T ) ) . reshape ( X1 . shape ) ,
53 alpha = 0.75 , cmap = ListedColormap (( ’ red ’ , ’ green ’) ) )
54 plt . xlim ( X1 . min () , X1 . max () )
55 plt . ylim ( X2 . min () , X2 . max () )
56 for i , j in enumerate ( np . unique ( y_set ) ) :
57 plt . scatter ( X_set [ y_set == j , 0] , X_set [ y_set == j , 1] , c =
ListedColormap (( ’ red ’ , ’ green ’) ) ( i ) , label = j )
58 plt . title ( ’K - NN ( Training set ) ’)
59 plt . xlabel ( ’ Age ’)
60 plt . ylabel ( ’ Estimated Salary ’)
61 plt . legend ()
62 plt . show ()
63
64 # Visualising the Test set results
65 from matplotlib . colors import ListedColormap
66 X_set , y_set = sc . inverse_transform ( X_test ) , y_test
67 X1 , X2 = np . meshgrid ( np . arange ( start = X_set [: , 0]. min () - 10 , stop =
X_set [: , 0]. max () + 10 , step = 1) ,
68 np . arange ( start = X_set [: , 1]. min () - 1000 , stop =
X_set [: , 1]. max () + 1000 , step = 1) )
69 plt . contourf ( X1 , X2 , classifier . predict ( sc . transform ( np . array ([ X1 . ravel () ,
X2 . ravel () ]) . T ) ) . reshape ( X1 . shape ) ,
70 alpha = 0.75 , cmap = ListedColormap (( ’ red ’ , ’ green ’) ) )
71 plt . xlim ( X1 . min () , X1 . max () )
72 plt . ylim ( X2 . min () , X2 . max () )
73 for i , j in enumerate ( np . unique ( y_set ) ) :
74 plt . scatter ( X_set [ y_set == j , 0] , X_set [ y_set == j , 1] , c =
ListedColormap (( ’ red ’ , ’ green ’) ) ( i ) , label = j )
75 plt . title ( ’K - NN ( Test set ) ’)
76 plt . xlabel ( ’ Age ’)
77 plt . ylabel ( ’ Estimated Salary ’)
78 plt . legend ()
79 plt . show ()
Listing 14: K-Nearest Neighbors (K-NN) Code
As seen in the next two figures, the decision boundary is not linear. That is why the accuracy of this model is higher than that of logistic regression.

(a) Training Data (b) Test Data

Figure 23: K-Nearest Neighbors (K-NN) Application

4.3.3 Support Vector Machine (SVM)

Support vector machine classification is similar to support vector regression. The main idea is to obtain a line that lies at an equal distance from the support vectors of each class. This middle line is flanked by a negative and a positive hyperplane, and it is the boundary used for prediction.

Figure 24: Support Vector Machine Application

Application of Support Vector Machine in Python is written in the following code.

1 # Support Vector Machine ( SVM )


2

3 # Importing the libraries
4 import numpy as np
5 import matplotlib . pyplot as plt
6 import pandas as pd
7
8 # Importing the dataset
9 dataset = pd . read_csv ( ’ Social_Network_Ads . csv ’)
10 X = dataset . iloc [: , : -1]. values
11 y = dataset . iloc [: , -1]. values
12
13 # Splitting the dataset into the Training set and Test set
14 from sklearn . model_selection import train_test_split
15 X_train , X_test , y_train , y_test = train_test_split (X , y , test_size =
0.25 , random_state = 0)
16 print ( X_train )
17 print ( y_train )
18 print ( X_test )
19 print ( y_test )
20
21 # Feature Scaling
22 from sklearn . preprocessing import StandardScaler
23 sc = StandardScaler ()
24 X_train = sc . fit_transform ( X_train )
25 X_test = sc . transform ( X_test )
26 print ( X_train )
27 print ( X_test )
28
29 # Training the SVM model on the Training set
30 from sklearn . svm import SVC
31 classifier = SVC ( kernel = ’ linear ’ , random_state = 0)
32 classifier . fit ( X_train , y_train )
33

34 # Predicting a new result


35 print ( classifier . predict ( sc . transform ([[30 ,87000]]) ) )
36
37 # Predicting the Test set results
38 y_pred = classifier . predict ( X_test )
39 print ( np . concatenate (( y_pred . reshape ( len ( y_pred ) ,1) , y_test . reshape ( len (
y_test ) ,1) ) ,1) )
40
41 # Making the Confusion Matrix
42 from sklearn . metrics import confusion_matrix , accuracy_score
43 cm = confusion_matrix ( y_test , y_pred )
44 print ( cm )
45 accuracy_score ( y_test , y_pred )
46
47 # Visualising the Training set results
48 from matplotlib . colors import ListedColormap
49 X_set , y_set = sc . inverse_transform ( X_train ) , y_train
50 X1 , X2 = np . meshgrid ( np . arange ( start = X_set [: , 0]. min () - 10 , stop =
X_set [: , 0]. max () + 10 , step = 0.25) ,
51 np . arange ( start = X_set [: , 1]. min () - 1000 , stop =
X_set [: , 1]. max () + 1000 , step = 0.25) )
52 plt . contourf ( X1 , X2 , classifier . predict ( sc . transform ( np . array ([ X1 . ravel () ,

X2 . ravel () ]) . T ) ) . reshape ( X1 . shape ) ,
53 alpha = 0.75 , cmap = ListedColormap (( ’ red ’ , ’ green ’) ) )
54 plt . xlim ( X1 . min () , X1 . max () )
55 plt . ylim ( X2 . min () , X2 . max () )
56 for i , j in enumerate ( np . unique ( y_set ) ) :
57 plt . scatter ( X_set [ y_set == j , 0] , X_set [ y_set == j , 1] , c =
ListedColormap (( ’ red ’ , ’ green ’) ) ( i ) , label = j )
58 plt . title ( ’ SVM ( Training set ) ’)
59 plt . xlabel ( ’ Age ’)
60 plt . ylabel ( ’ Estimated Salary ’)
61 plt . legend ()
62 plt . show ()
63
64 # Visualising the Test set results
65 from matplotlib . colors import ListedColormap
66 X_set , y_set = sc . inverse_transform ( X_test ) , y_test
67 X1 , X2 = np . meshgrid ( np . arange ( start = X_set [: , 0]. min () - 10 , stop =
X_set [: , 0]. max () + 10 , step = 0.25) ,
68 np . arange ( start = X_set [: , 1]. min () - 1000 , stop =
X_set [: , 1]. max () + 1000 , step = 0.25) )
69 plt . contourf ( X1 , X2 , classifier . predict ( sc . transform ( np . array ([ X1 . ravel () ,
X2 . ravel () ]) . T ) ) . reshape ( X1 . shape ) ,
70 alpha = 0.75 , cmap = ListedColormap (( ’ red ’ , ’ green ’) ) )
71 plt . xlim ( X1 . min () , X1 . max () )
72 plt . ylim ( X2 . min () , X2 . max () )
73 for i , j in enumerate ( np . unique ( y_set ) ) :
74 plt . scatter ( X_set [ y_set == j , 0] , X_set [ y_set == j , 1] , c =
ListedColormap (( ’ red ’ , ’ green ’) ) ( i ) , label = j )
75 plt . title ( ’ SVM ( Test set ) ’)
76 plt . xlabel ( ’ Age ’)
77 plt . ylabel ( ’ Estimated Salary ’)
78 plt . legend ()
79 plt . show ()
Listing 15: Support Vector Machine Code
The hyperplanes and the maximum margin are shown in the following figures. As seen there, some data points from the green class fall on the red side of the hyperplane. The machine predicts new data entries with this model.

(a) Training Data (b) Test Data

Figure 25: Support Vector Machine (SVM) Application

4.3.4 Kernel SVM

Like the previous kernel application (the RBF kernel in SVR), this model solves the prediction with a radial basis function inside the support vector machine. The only difference from the plain SVM is that the decision boundary is not a linear approximation: the data set is mapped to a higher-dimensional space (3D in the illustration), where the two parts can be separated.

Figure 26: Mapping Function Application

There are different types of kernels that can improve the accuracy of the SVM model, shown below.

Figure 27: Kernels of Mapping Function

The Gaussian RBF kernel is a common choice and is used in the following figure as an example. The kernel maps the 2D data set into a higher-dimensional space with the following formula:

$$K(\vec{x}, \vec{l}_i) = e^{-\frac{\|\vec{x} - \vec{l}_i\|^{2}}{\sigma^{2}}}$$

Figure 28: The Gaussian RBF Kernel Application
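
A short numeric check of this kernel (the point, the landmark, and sigma are arbitrary choices):

import numpy as np

def gaussian_rbf(x, landmark, sigma=1.0):
    # K(x, l) = exp(-||x - l||^2 / sigma^2), the formula above
    return np.exp(-np.sum((x - landmark) ** 2) / sigma ** 2)

x = np.array([0.5, 1.0])
landmark = np.array([0.0, 0.0])
print(gaussian_rbf(x, landmark))   # about 0.29; it equals 1 at the landmark and decays with distance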

An example of the kernel SVM application in Python is:

1 # Kernel SVM
2
3 # Importing the libraries
4 import numpy as np
5 import matplotlib . pyplot as plt

6 import pandas as pd
7

8 # Importing the dataset


9 dataset = pd . read_csv ( ’ Social_Network_Ads . csv ’)
10 X = dataset . iloc [: , : -1]. values
11 y = dataset . iloc [: , -1]. values
12
13 # Splitting the dataset into the Training set and Test set
14 from sklearn . model_selection import train_test_split
15 X_train , X_test , y_train , y_test = train_test_split (X , y , test_size =
0.25 , random_state = 0)
16 print ( X_train )
17 print ( y_train )
18 print ( X_test )
19 print ( y_test )
20
21 # Feature Scaling
22 from sklearn . preprocessing import StandardScaler
23 sc = StandardScaler ()
24 X_train = sc . fit_transform ( X_train )
25 X_test = sc . transform ( X_test )
26 print ( X_train )
27 print ( X_test )
28
29 # Training the Kernel SVM model on the Training set
30 from sklearn . svm import SVC
31 classifier = SVC ( kernel = ’ rbf ’ , random_state = 0)
32 classifier . fit ( X_train , y_train )
33
34 # Predicting a new result
35 print ( classifier . predict ( sc . transform ([[30 ,87000]]) ) )
36

37 # Predicting the Test set results


38 y_pred = classifier . predict ( X_test )
39 print ( np . concatenate (( y_pred . reshape ( len ( y_pred ) ,1) , y_test . reshape ( len (
y_test ) ,1) ) ,1) )
40
41 # Making the Confusion Matrix
42 from sklearn . metrics import confusion_matrix , accuracy_score
43 cm = confusion_matrix ( y_test , y_pred )
44 print ( cm )
45 accuracy_score ( y_test , y_pred )
46
47 # Visualising the Training set results
48 from matplotlib . colors import ListedColormap
49 X_set , y_set = sc . inverse_transform ( X_train ) , y_train
50 X1 , X2 = np . meshgrid ( np . arange ( start = X_set [: , 0]. min () - 10 , stop =
X_set [: , 0]. max () + 10 , step = 0.25) ,
51 np . arange ( start = X_set [: , 1]. min () - 1000 , stop =
X_set [: , 1]. max () + 1000 , step = 0.25) )
52 plt . contourf ( X1 , X2 , classifier . predict ( sc . transform ( np . array ([ X1 . ravel () ,
X2 . ravel () ]) . T ) ) . reshape ( X1 . shape ) ,
53 alpha = 0.75 , cmap = ListedColormap (( ’ red ’ , ’ green ’) ) )
54 plt . xlim ( X1 . min () , X1 . max () )

55 plt . ylim ( X2 . min () , X2 . max () )
56 for i , j in enumerate ( np . unique ( y_set ) ) :
57 plt . scatter ( X_set [ y_set == j , 0] , X_set [ y_set == j , 1] , c =
ListedColormap (( ’ red ’ , ’ green ’) ) ( i ) , label = j )
58 plt . title ( ’ Kernel SVM ( Training set ) ’)
59 plt . xlabel ( ’ Age ’)
60 plt . ylabel ( ’ Estimated Salary ’)
61 plt . legend ()
62 plt . show ()
63
64 # Visualising the Test set results
65 from matplotlib . colors import ListedColormap
66 X_set , y_set = sc . inverse_transform ( X_test ) , y_test
67 X1 , X2 = np . meshgrid ( np . arange ( start = X_set [: , 0]. min () - 10 , stop =
X_set [: , 0]. max () + 10 , step = 0.25) ,
68 np . arange ( start = X_set [: , 1]. min () - 1000 , stop =
X_set [: , 1]. max () + 1000 , step = 0.25) )
69 plt . contourf ( X1 , X2 , classifier . predict ( sc . transform ( np . array ([ X1 . ravel () ,
X2 . ravel () ]) . T ) ) . reshape ( X1 . shape ) ,
70 alpha = 0.75 , cmap = ListedColormap (( ’ red ’ , ’ green ’) ) )
71 plt . xlim ( X1 . min () , X1 . max () )
72 plt . ylim ( X2 . min () , X2 . max () )
73 for i , j in enumerate ( np . unique ( y_set ) ) :
74 plt . scatter ( X_set [ y_set == j , 0] , X_set [ y_set == j , 1] , c =
ListedColormap (( ’ red ’ , ’ green ’) ) ( i ) , label = j )
75 plt . title ( ’ Kernel SVM ( Test set ) ’)
76 plt . xlabel ( ’ Age ’)
77 plt . ylabel ( ’ Estimated Salary ’)
78 plt . legend ()
79 plt . show ()
Listing 16: Kernel SVM Code
As shown in the following figures, the Kernel SVM has a nonlinear margin, in contrast to the classical SVM. This improves the prediction accuracy.

(a) Training Data (b) Test Data

Figure 29: Kernel SVM Application

4.3.5 Naive Bayes

Naive Bayes classification is based on Bayes' probability theorem:

P(A \mid B) = \frac{P(B \mid A) \times P(A)}{P(B)}
In this model, Bayes' theorem is applied to each class, as in the following figures.

(a) Class A Probability Related to Total Probability (b) Class B Probability Related to Total Probability

Figure 30: Probabilities of Two Different Class on the Data Set

Then, the comparison of the two classes' posterior probabilities is used to classify the new data point; a small numerical sketch is given below the formula.

P(Walks \mid X) \text{ vs. } P(Drives \mid X)
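As a small numerical sketch of this comparison (the counts below are made up for illustration and are not taken from the Social_Network_Ads data set), the posterior of each class can be computed directly from Bayes' theorem:

# Hypothetical counts for a small neighbourhood around a new observation X
total_points = 30        # all observations
walkers = 10             # observations labelled "Walks"
drivers = 20             # observations labelled "Drives"
similar_to_x = 4         # observations similar to X (inside the circle around X)
similar_walkers = 3      # of those, how many walk
similar_drivers = 1      # of those, how many drive

# Prior, likelihood and marginal probability for each class
p_walks = walkers / total_points
p_drives = drivers / total_points
p_x = similar_to_x / total_points
p_x_given_walks = similar_walkers / walkers
p_x_given_drives = similar_drivers / drivers

# Bayes' theorem: P(class | X) = P(X | class) * P(class) / P(X)
p_walks_given_x = p_x_given_walks * p_walks / p_x      # 0.75
p_drives_given_x = p_x_given_drives * p_drives / p_x   # 0.25

print('Walks' if p_walks_given_x > p_drives_given_x else 'Drives')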

The full application of this model is given in the following Python code.

1 # Naive Bayes
2
3 # Importing the libraries
4 import numpy as np
5 import matplotlib . pyplot as plt
6 import pandas as pd
7
8 # Importing the dataset
9 dataset = pd . read_csv ( ’ Social_Network_Ads . csv ’)
10 X = dataset . iloc [: , : -1]. values
11 y = dataset . iloc [: , -1]. values
12

13 # Splitting the dataset into the Training set and Test set
14 from sklearn . model_selection import train_test_split
15 X_train , X_test , y_train , y_test = train_test_split (X , y , test_size =
0.25 , random_state = 0)

16 print ( X_train )
17 print ( y_train )
18 print ( X_test )
19 print ( y_test )
20
21 # Feature Scaling
22 from sklearn . preprocessing import StandardScaler
23 sc = StandardScaler ()
24 X_train = sc . fit_transform ( X_train )
25 X_test = sc . transform ( X_test )
26 print ( X_train )
27 print ( X_test )
28
29 # Training the Naive Bayes model on the Training set
30 from sklearn . naive_bayes import GaussianNB
31 classifier = GaussianNB ()
32 classifier . fit ( X_train , y_train )
33
34 # Predicting a new result
35 print ( classifier . predict ( sc . transform ([[30 ,87000]]) ) )
36
37 # Predicting the Test set results
38 y_pred = classifier . predict ( X_test )
39 print ( np . concatenate (( y_pred . reshape ( len ( y_pred ) ,1) , y_test . reshape ( len (
y_test ) ,1) ) ,1) )
40

41 # Making the Confusion Matrix


42 from sklearn . metrics import confusion_matrix , accuracy_score
43 cm = confusion_matrix ( y_test , y_pred )
44 print ( cm )
45 accuracy_score ( y_test , y_pred )
46

47 # Visualising the Training set results


48 from matplotlib . colors import ListedColormap
49 X_set , y_set = sc . inverse_transform ( X_train ) , y_train
50 X1 , X2 = np . meshgrid ( np . arange ( start = X_set [: , 0]. min () - 10 , stop =
X_set [: , 0]. max () + 10 , step = 0.25) ,
51 np . arange ( start = X_set [: , 1]. min () - 1000 , stop =
X_set [: , 1]. max () + 1000 , step = 0.25) )
52 plt . contourf ( X1 , X2 , classifier . predict ( sc . transform ( np . array ([ X1 . ravel () ,
X2 . ravel () ]) . T ) ) . reshape ( X1 . shape ) ,
53 alpha = 0.75 , cmap = ListedColormap (( ’ red ’ , ’ green ’) ) )
54 plt . xlim ( X1 . min () , X1 . max () )
55 plt . ylim ( X2 . min () , X2 . max () )
56 for i , j in enumerate ( np . unique ( y_set ) ) :
57 plt . scatter ( X_set [ y_set == j , 0] , X_set [ y_set == j , 1] , c =
ListedColormap (( ’ red ’ , ’ green ’) ) ( i ) , label = j )
58 plt . title ( ’ Naive Bayes ( Training set ) ’)
59 plt . xlabel ( ’ Age ’)
60 plt . ylabel ( ’ Estimated Salary ’)
61 plt . legend ()
62 plt . show ()
63
64 # Visualising the Test set results

65 from matplotlib . colors import ListedColormap
66 X_set , y_set = sc . inverse_transform ( X_test ) , y_test
67 X1 , X2 = np . meshgrid ( np . arange ( start = X_set [: , 0]. min () - 10 , stop =
X_set [: , 0]. max () + 10 , step = 0.25) ,
68 np . arange ( start = X_set [: , 1]. min () - 1000 , stop =
X_set [: , 1]. max () + 1000 , step = 0.25) )
69 plt . contourf ( X1 , X2 , classifier . predict ( sc . transform ( np . array ([ X1 . ravel () ,
X2 . ravel () ]) . T ) ) . reshape ( X1 . shape ) ,
70 alpha = 0.75 , cmap = ListedColormap (( ’ red ’ , ’ green ’) ) )
71 plt . xlim ( X1 . min () , X1 . max () )
72 plt . ylim ( X2 . min () , X2 . max () )
73 for i , j in enumerate ( np . unique ( y_set ) ) :
74 plt . scatter ( X_set [ y_set == j , 0] , X_set [ y_set == j , 1] , c =
ListedColormap (( ’ red ’ , ’ green ’) ) ( i ) , label = j )
75 plt . title ( ’ Naive Bayes ( Test set ) ’)
76 plt . xlabel ( ’ Age ’)
77 plt . ylabel ( ’ Estimated Salary ’)
78 plt . legend ()
79 plt . show ()
Listing 17: Naive Bayes Code
The result of applying this model is illustrated in the following figures, which show the distribution of the data set.

(a) Training Data (b) Test Data

Figure 31: Naive Bayes Application

4.3.6 Decision Tree Classification

Decision Tree Classification is similar to Decision Tree Regression. The data set is again separated into regions, as in the following figure.

Figure 32: Decision Tree Classification Separation of Data Set

Then, similar to Decision Tree Regression, a tree is built from the data set and the boundary of each class is specified. The only difference from Decision Tree Regression is that a class, rather than a numerical value, is assigned to a new data entry.

Figure 33: Decision Tree Classification Tree

The Decision Tree Classification application is written in Python as in the following code.
1 # Decision Tree Classification
2

3 # Importing the libraries


4 import numpy as np
5 import matplotlib . pyplot as plt
6 import pandas as pd
7
8 # Importing the dataset
9 dataset = pd . read_csv ( ’ Social_Network_Ads . csv ’)
10 X = dataset . iloc [: , : -1]. values
11 y = dataset . iloc [: , -1]. values
12
13 # Splitting the dataset into the Training set and Test set
14 from sklearn . model_selection import train_test_split
15 X_train , X_test , y_train , y_test = train_test_split (X , y , test_size =
0.25 , random_state = 0)
16 print ( X_train )
17 print ( y_train )
18 print ( X_test )
19 print ( y_test )
20
21 # Feature Scaling
22 from sklearn . preprocessing import StandardScaler
23 sc = StandardScaler ()
24 X_train = sc . fit_transform ( X_train )
25 X_test = sc . transform ( X_test )
26 print ( X_train )
27 print ( X_test )
28
29 # Training the Decision Tree Classification model on the Training set
30 from sklearn . tree import DecisionTreeClassifier
31 classifier = DecisionTreeClassifier ( criterion = ’ entropy ’ , random_state = 0)
32 classifier . fit ( X_train , y_train )
33
34 # Predicting a new result
35 print ( classifier . predict ( sc . transform ([[30 ,87000]]) ) )
36

37 # Predicting the Test set results


38 y_pred = classifier . predict ( X_test )
39 print ( np . concatenate (( y_pred . reshape ( len ( y_pred ) ,1) , y_test . reshape ( len (
y_test ) ,1) ) ,1) )
40
41 # Making the Confusion Matrix
42 from sklearn . metrics import confusion_matrix , accuracy_score
43 cm = confusion_matrix ( y_test , y_pred )
44 print ( cm )
45 accuracy_score ( y_test , y_pred )
46
47 # Visualising the Training set results
48 from matplotlib . colors import ListedColormap
49 X_set , y_set = sc . inverse_transform ( X_train ) , y_train
50 X1 , X2 = np . meshgrid ( np . arange ( start = X_set [: , 0]. min () - 10 , stop =
X_set [: , 0]. max () + 10 , step = 0.25) ,

51 np . arange ( start = X_set [: , 1]. min () - 1000 , stop =
X_set [: , 1]. max () + 1000 , step = 0.25) )
52 plt . contourf ( X1 , X2 , classifier . predict ( sc . transform ( np . array ([ X1 . ravel () ,
X2 . ravel () ]) . T ) ) . reshape ( X1 . shape ) ,
53 alpha = 0.75 , cmap = ListedColormap (( ’ red ’ , ’ green ’) ) )
54 plt . xlim ( X1 . min () , X1 . max () )
55 plt . ylim ( X2 . min () , X2 . max () )
56 for i , j in enumerate ( np . unique ( y_set ) ) :
57 plt . scatter ( X_set [ y_set == j , 0] , X_set [ y_set == j , 1] , c =
ListedColormap (( ’ red ’ , ’ green ’) ) ( i ) , label = j )
58 plt . title ( ’ Decision Tree Classification ( Training set ) ’)
59 plt . xlabel ( ’ Age ’)
60 plt . ylabel ( ’ Estimated Salary ’)
61 plt . legend ()
62 plt . show ()
63
64 # Visualising the Test set results
65 from matplotlib . colors import ListedColormap
66 X_set , y_set = sc . inverse_transform ( X_test ) , y_test
67 X1 , X2 = np . meshgrid ( np . arange ( start = X_set [: , 0]. min () - 10 , stop =
X_set [: , 0]. max () + 10 , step = 0.25) ,
68 np . arange ( start = X_set [: , 1]. min () - 1000 , stop =
X_set [: , 1]. max () + 1000 , step = 0.25) )
69 plt . contourf ( X1 , X2 , classifier . predict ( sc . transform ( np . array ([ X1 . ravel () ,
X2 . ravel () ]) . T ) ) . reshape ( X1 . shape ) ,
70 alpha = 0.75 , cmap = ListedColormap (( ’ red ’ , ’ green ’) ) )
71 plt . xlim ( X1 . min () , X1 . max () )
72 plt . ylim ( X2 . min () , X2 . max () )
73 for i , j in enumerate ( np . unique ( y_set ) ) :
74 plt . scatter ( X_set [ y_set == j , 0] , X_set [ y_set == j , 1] , c =
ListedColormap (( ’ red ’ , ’ green ’) ) ( i ) , label = j )
75 plt . title ( ’ Decision Tree Classification ( Test set ) ’)
76 plt . xlabel ( ’ Age ’)
77 plt . ylabel ( ’ Estimated Salary ’)
78 plt . legend ()
79 plt . show ()
Listing 18: Decision Tree Classification Code
As shown in the following figures, this model uses more than one region to separate the classes. That is why predictions can be made with high accuracy relative to the data set size.
(a) Training Data (b) Test Data

Figure 34: Decision Tree Classification Application

4.3.7 Random Forest Classification

In this classification model, the steps are the same as in Random Forest Regression; the only difference is that the model makes a classification rather than a regression prediction. The steps are:

• Pick at random K data points from the Training set.

• Build the Decision Tree associated to these K data points

• Choose the number Ntree of trees you want to build and repeat previous steps

• For a new data point, make each one of your Ntree trees predict the class of the data point in question, and assign the new data point to the class that wins the majority vote across the Ntree predictions.

The application of this classifier is given in the following Python code:

1 # Random Forest Classification


2
3 # Importing the libraries
4 import numpy as np
5 import matplotlib . pyplot as plt
6 import pandas as pd
7

8 # Importing the dataset


9 dataset = pd . read_csv ( ’ Social_Network_Ads . csv ’)
10 X = dataset . iloc [: , : -1]. values
11 y = dataset . iloc [: , -1]. values
12
13 # Splitting the dataset into the Training set and Test set
14 from sklearn . model_selection import train_test_split

15 X_train , X_test , y_train , y_test = train_test_split (X , y , test_size =
0.25 , random_state = 0)
16 print ( X_train )
17 print ( y_train )
18 print ( X_test )
19 print ( y_test )
20
21 # Feature Scaling
22 from sklearn . preprocessing import StandardScaler
23 sc = StandardScaler ()
24 X_train = sc . fit_transform ( X_train )
25 X_test = sc . transform ( X_test )
26 print ( X_train )
27 print ( X_test )
28
29 # Training the Random Forest Classification model on the Training set
30 from sklearn . ensemble import RandomForestClassifier
31 classifier = RandomForestClassifier ( n_estimators = 10 , criterion = ’ entropy ’ , random_state = 0)
32 classifier . fit ( X_train , y_train )
33
34 # Predicting a new result
35 print ( classifier . predict ( sc . transform ([[30 ,87000]]) ) )
36
37 # Predicting the Test set results
38 y_pred = classifier . predict ( X_test )
39 print ( np . concatenate (( y_pred . reshape ( len ( y_pred ) ,1) , y_test . reshape ( len (
y_test ) ,1) ) ,1) )
40
41 # Making the Confusion Matrix
42 from sklearn . metrics import confusion_matrix , accuracy_score
43 cm = confusion_matrix ( y_test , y_pred )
44 print ( cm )
45 accuracy_score ( y_test , y_pred )
46
47 # Visualising the Training set results
48 from matplotlib . colors import ListedColormap
49 X_set , y_set = sc . inverse_transform ( X_train ) , y_train
50 X1 , X2 = np . meshgrid ( np . arange ( start = X_set [: , 0]. min () - 10 , stop =
X_set [: , 0]. max () + 10 , step = 0.25) ,
51 np . arange ( start = X_set [: , 1]. min () - 1000 , stop =
X_set [: , 1]. max () + 1000 , step = 0.25) )
52 plt . contourf ( X1 , X2 , classifier . predict ( sc . transform ( np . array ([ X1 . ravel () ,
X2 . ravel () ]) . T ) ) . reshape ( X1 . shape ) ,
53 alpha = 0.75 , cmap = ListedColormap (( ’ red ’ , ’ green ’) ) )
54 plt . xlim ( X1 . min () , X1 . max () )
55 plt . ylim ( X2 . min () , X2 . max () )
56 for i , j in enumerate ( np . unique ( y_set ) ) :
57 plt . scatter ( X_set [ y_set == j , 0] , X_set [ y_set == j , 1] , c =
ListedColormap (( ’ red ’ , ’ green ’) ) ( i ) , label = j )
58 plt . title ( ’ Random Forest Classification ( Training set ) ’)
59 plt . xlabel ( ’ Age ’)
60 plt . ylabel ( ’ Estimated Salary ’)
61 plt . legend ()

62 plt . show ()
63

64 # Visualising the Test set results


65 from matplotlib . colors import ListedColormap
66 X_set , y_set = sc . inverse_transform ( X_test ) , y_test
67 X1 , X2 = np . meshgrid ( np . arange ( start = X_set [: , 0]. min () - 10 , stop =
X_set [: , 0]. max () + 10 , step = 0.25) ,
68 np . arange ( start = X_set [: , 1]. min () - 1000 , stop =
X_set [: , 1]. max () + 1000 , step = 0.25) )
69 plt . contourf ( X1 , X2 , classifier . predict ( sc . transform ( np . array ([ X1 . ravel () ,
X2 . ravel () ]) . T ) ) . reshape ( X1 . shape ) ,
70 alpha = 0.75 , cmap = ListedColormap (( ’ red ’ , ’ green ’) ) )
71 plt . xlim ( X1 . min () , X1 . max () )
72 plt . ylim ( X2 . min () , X2 . max () )
73 for i , j in enumerate ( np . unique ( y_set ) ) :
74 plt . scatter ( X_set [ y_set == j , 0] , X_set [ y_set == j , 1] , c =
ListedColormap (( ’ red ’ , ’ green ’) ) ( i ) , label = j )
75 plt . title ( ’ Random Forest Classification ( Test set ) ’)
76 plt . xlabel ( ’ Age ’)
77 plt . ylabel ( ’ Estimated Salary ’)
78 plt . legend ()
79 plt . show ()
Listing 19: Random Forest Classification Code
The distribution of the data set entries is visualized in the following figures:

(a) Training Data (b) Test Data

Figure 35: Random Forest Classification Application

4.4 Clustering

Clustering is a different way to obtain predictions for new data. In this type of model, predictions are made by grouping the training data entries into clusters. Two types of clustering are covered:

• K-Means Clustering
• Hierarchical Clustering

4.4.1 K-Means Clustering

K-Means Clustering determines clusters in the data set so that a new entry can be assigned to its related cluster. First, the number of clusters is specified; then the centroids of these clusters are computed iteratively, as follows.
• Choose the number K of clusters
• Select at random K points, the centroids (not necessarily from your data set)

Figure 36: Randomly Selected Points for Each Clusters

• Assign each data point to the closest centroid; this forms K clusters

Figure 37: Movement to the New Centroid Point Calculated

• Compute and place the new centroid of each cluster

Figure 38: New Clusters for New Closest Centroid

• Reassign each data point to the new closest centroid. If any reassignment took place, go back to the previous step; otherwise the clustering is final

Figure 39: Final Clusters

• Your model is Ready

Figure 40: K-Means Application Process

To obtain the number of clusters K, the Elbow Method is used, based on the within-cluster sum of squares (WCSS), which is calculated as (a short sketch follows the figure):

WCSS = \sum_{P_i \in Cluster_1} \mathrm{distance}(P_i, C_1)^2 + \sum_{P_i \in Cluster_2} \mathrm{distance}(P_i, C_2)^2 + \sum_{P_i \in Cluster_3} \mathrm{distance}(P_i, C_3)^2 + \ldots

Figure 41: Optimum Number of Clusters Determination
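As a short sketch of how WCSS can be computed by hand (scikit-learn stores the same quantity in the inertia_ attribute; the small 2D data set below is made up for illustration and is not the Mall_Customers data):

import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])

kmeans = KMeans(n_clusters=2, init='k-means++', n_init=10, random_state=42).fit(X)

# WCSS: sum of squared distances of each point to its cluster centroid
wcss = 0.0
for k, center in enumerate(kmeans.cluster_centers_):
    members = X[kmeans.labels_ == k]
    wcss += np.sum((members - center) ** 2)

print(wcss, kmeans.inertia_)   # the two values agree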

To apply the K-Means Clustering model to a data set, the following Python code is a good example.
1 # K - Means Clustering
2

3 # Importing the libraries


4 import numpy as np
5 import matplotlib . pyplot as plt
6 import pandas as pd
7
8 # Importing the dataset
9 dataset = pd . read_csv ( ’ Mall_Customers . csv ’)
10 X = dataset . iloc [: , [3 , 4]]. values
11
12 # Using the elbow method to find the optimal number of clusters
13 from sklearn . cluster import KMeans
14 wcss = []
15 for i in range (1 , 11) :
16 kmeans = KMeans ( n_clusters = i , init = ’k - means ++ ’ , random_state = 42)
17 kmeans . fit ( X )
18 wcss . append ( kmeans . inertia_ )
19 plt . plot ( range (1 , 11) , wcss )
20 plt . title ( ’ The Elbow Method ’)
21 plt . xlabel ( ’ Number of clusters ’)
22 plt . ylabel ( ’ WCSS ’)
23 plt . show ()
24
25 # Training the K - Means model on the dataset
26 kmeans = KMeans ( n_clusters = 5 , init = ’k - means ++ ’ , random_state = 42)
27 y_kmeans = kmeans . fit_predict ( X )
28
29 # Visualising the clusters
30 plt . scatter ( X [ y_kmeans == 0 , 0] , X [ y_kmeans == 0 , 1] , s = 100 , c = ’ red ’ ,
label = ’ Cluster 1 ’)
31 plt . scatter ( X [ y_kmeans == 1 , 0] , X [ y_kmeans == 1 , 1] , s = 100 , c = ’ blue ’ ,
label = ’ Cluster 2 ’)
32 plt . scatter ( X [ y_kmeans == 2 , 0] , X [ y_kmeans == 2 , 1] , s = 100 , c = ’ green ’
, label = ’ Cluster 3 ’)
33 plt . scatter ( X [ y_kmeans == 3 , 0] , X [ y_kmeans == 3 , 1] , s = 100 , c = ’ cyan ’ ,
label = ’ Cluster 4 ’)
34 plt . scatter ( X [ y_kmeans == 4 , 0] , X [ y_kmeans == 4 , 1] , s = 100 , c = ’
magenta ’ , label = ’ Cluster 5 ’)
35 plt . scatter ( kmeans . cluster_centers_ [: , 0] , kmeans . cluster_centers_ [: , 1] ,
s = 300 , c = ’ yellow ’ , label = ’ Centroids ’)
36 plt . title ( ’ Clusters of customers ’)
37 plt . xlabel ( ’ Annual Income ( k$ ) ’)
38 plt . ylabel ( ’ Spending Score (1 -100) ’)
39 plt . legend ()
40 plt . show ()
Listing 20: K-Means Clustering Code
As shown in the following figures, the elbow method determined 5 as the optimum number of clusters, and each cluster occupies a distinct region.

(a) The Elbow Method (b) Training Data

Figure 42: K-Means Clustering Application

4.4.2 Hierarchical Clustering

This model is based on hierarchical clustering, which means every data point has a relation to the other points. The nearest points are then merged into a single cluster, with closeness measured by the Euclidean distance.

Figure 43: Euclidean Distance

Distance = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}

Figure 44: Distance Between Two Clusters

The application steps are,

• Make each data point a single-point cluster that forms N clusters

• Take the two closest data points and make them one cluster that forms N − 1 clusters

• Take the two closest clusters and make them one cluster that forms N − 2 clusters

• Repeat previous step until there is only one cluster

• Model is ready

At the end of these steps, the main issue is how many clusters to select. At this point, a dendrogram is used to specify the number of clusters. As shown in the following figures, the chosen number of clusters is determined on the dendrogram by finding the single longest vertical leg that is not crossed by any horizontal line; the number of vertical legs cut at that level gives the number of clusters.

(a) 2 Cluster Example (b) 3 Cluster Example

Figure 45: Hierarchical Clustering Model Optimization of Cluster Number

Then, the model is finalized with this number of clusters for the training data set.

Figure 46: Final Optimum Clusters for Hierarchical Clustering Model

All of these steps are combined in the following Python script as a single example.

1 # Hierarchical Clustering
2
3 # Importing the libraries
4 import numpy as np
5 import matplotlib . pyplot as plt
6 import pandas as pd
7
8 # Importing the dataset
9 dataset = pd . read_csv ( ’ Mall_Customers . csv ’)
10 X = dataset . iloc [: , [3 , 4]]. values
11

12 # Using the dendrogram to find the optimal number of clusters


13 import scipy . cluster . hierarchy as sch
14 dendrogram = sch . dendrogram ( sch . linkage (X , method = ’ ward ’) )
15 plt . title ( ’ Dendrogram ’)
16 plt . xlabel ( ’ Customers ’)
17 plt . ylabel ( ’ Euclidean distances ’)
18 plt . show ()
19
20 # Training the Hierarchical Clustering model on the dataset

21 from sklearn . cluster import AgglomerativeClustering
22 hc = AgglomerativeClustering ( n_clusters = 5 , affinity = ’ euclidean ’ , linkage = ’ ward ’)
23 y_hc = hc . fit_predict ( X )
24
25 # Visualising the clusters
26 plt . scatter ( X [ y_hc == 0 , 0] , X [ y_hc == 0 , 1] , s = 100 , c = ’ red ’ , label =
’ Cluster 1 ’)
27 plt . scatter ( X [ y_hc == 1 , 0] , X [ y_hc == 1 , 1] , s = 100 , c = ’ blue ’ , label =
’ Cluster 2 ’)
28 plt . scatter ( X [ y_hc == 2 , 0] , X [ y_hc == 2 , 1] , s = 100 , c = ’ green ’ , label
= ’ Cluster 3 ’)
29 plt . scatter ( X [ y_hc == 3 , 0] , X [ y_hc == 3 , 1] , s = 100 , c = ’ cyan ’ , label =
’ Cluster 4 ’)
30 plt . scatter ( X [ y_hc == 4 , 0] , X [ y_hc == 4 , 1] , s = 100 , c = ’ magenta ’ ,
label = ’ Cluster 5 ’)
31 plt . title ( ’ Clusters of customers ’)
32 plt . xlabel ( ’ Annual Income ( k$ ) ’)
33 plt . ylabel ( ’ Spending Score (1 -100) ’)
34 plt . legend ()
35 plt . show ()
Listing 21: Hierarchical Clustering Code
The dendrogram and the training data visualization are shown in the following figures, using the same data set as in the previous K-Means Clustering model.

(a) The Dendrogram (b) Training Data

Figure 47: Hierarchical Clustering Application

4.5 Deep Learning

Deep Learning is a type of machine learning. Deep Learning applications learn by themselves, much like the working principle of the human brain. Deep Learning consists of two subtopics, which are,

• Artificial Neural Networks


• Convolutional Neural Networks

4.5.1 Artificial Neural Networks

Artificial Neural Networks work like a brain; that is why the structure of this model is similar to a biological neuron in the human brain.

Figure 48: Biological and Artificial Neurons

The input values have different units. Therefore, standardizing the input values is important before applying them to the transfer function.

Figure 49: Standardization of Independent Variables

Each standardized input variable is multiplied by a weight that reflects the importance of that input; the weighted sum of these products gives the transfer function.

Figure 50: Transfer Function Step in Neurons

Then, the weighted sum is passed to the activation function, which finalizes the neuron's computation and connects it to the output step.

Figure 51: Activation Function Step in Neurons

Activation functions are used for different kinds of applications; the common choices are listed below, and a small numerical sketch is given after the figure.

(a) Threshold Function (b) Sigmoid Function (c) Rectifier (ReLU) Function (d) Hyperbolic Tangent (tanh) Function

Figure 52: Activation Functions of Neural Networks
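A minimal NumPy sketch of these four activation functions, applied to the weighted sum of a single neuron (the input and weight values below are arbitrary illustration values, not taken from any data set in this report):

import numpy as np

def threshold(z):
    return np.where(z >= 0, 1.0, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

def tanh(z):
    return np.tanh(z)

x = np.array([0.5, -1.2, 0.3])   # standardized inputs
w = np.array([0.8, 0.1, -0.4])   # weights
z = np.dot(w, x)                 # transfer function: weighted sum

for activation in (threshold, sigmoid, relu, tanh):
    print(activation.__name__, activation(z))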

The network can contain more than one layer of neurons, called hidden layers, to improve accuracy and the effective learning time. In the hidden layers and the output layer, different activation functions can be used to reach the optimum learning model, as you can see in the following figure.

Figure 53: Multi Layer Neurons with Different Activation Functions

To update the weights and minimize the error, the network uses the back-propagation rule.

Figure 54: Arranging Weight with Back Propagation

The error is calculated with a loss function, which should be minimized.

Figure 55: Minimizing Error

To minimize the error, two optimization rules are used together with the loss function:

• Gradient Descent

• Stochastic Gradient Descent

Gradient Descent

Gradient descent follows the slope of the loss function step by step until the error reaches its minimum.

Figure 56: Gradient Descent

The weights are then updated using back-propagation.

Figure 57: Back Propagation with Gradient Descent

For instance, the error is minimized when the gradient reaches the minimum point of the loss curve, as in the following figure; a small sketch follows the figure.

Figure 58: Example of Gradient Descent
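A minimal sketch of this idea on the toy loss C(w) = (w - 3)^2 (an assumed convex loss chosen only for illustration, not the network's actual cost function), showing the weight being pushed towards the minimum step by step:

# Gradient descent on C(w) = (w - 3)**2, whose gradient is 2 * (w - 3)
w = 0.0
learning_rate = 0.1

for epoch in range(50):
    grad = 2 * (w - 3)             # dC/dw at the current weight
    w = w - learning_rate * grad   # update rule: w := w - eta * gradient

print(w)   # converges towards 3, the minimum of the loss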

Stochastic Gradient Descent

As shown in the following figure, some loss surfaces are not convex like the previous one, so plain gradient descent can settle in a local minimum and the "minimized" error is wrong in this case. That is why Stochastic Gradient Descent is used to update the weights and reach the true minimum error of the model.

Figure 59: Stochastic Gradient Descent

The difference between Gradient Descent and Stochastic Gradient Descent is shown in the following figure. Stochastic Gradient Descent updates the weights after examining each single input, whereas batch Gradient Descent uses the whole batch of inputs (their total or summed error) to compute and minimize the error; a toy comparison follows the figure.

Figure 60: Difference Between Batch Gradient Descent and Stochastic Gradient Descent
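A toy comparison of the two update schemes on a simple linear model y = w * x (the data and learning rate are made-up illustration values):

import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])   # generated by the true weight w = 2
eta = 0.01

# Batch gradient descent: one update per epoch using the whole data set
w_batch = 0.0
for epoch in range(100):
    grad = np.mean(2 * (w_batch * X - y) * X)
    w_batch -= eta * grad

# Stochastic gradient descent: one update per single sample
w_sgd = 0.0
for epoch in range(100):
    for xi, yi in zip(X, y):
        grad = 2 * (w_sgd * xi - yi) * xi
        w_sgd -= eta * grad

print(w_batch, w_sgd)   # both approach w = 2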

There are two types of propagation, which are,

• Forward Propagation

• Backward Propagation

The direction of each propagation is shown in the following figures.

(a) Forward Propagation (b) Backward Propagation

Figure 61: Types of Propagation on Neural Networks

To apply an Artificial Neural Network to a data set, all steps are included in the following Python code.

1 # Artificial Neural Network


2

3 # Importing the libraries


4 import numpy as np
5 import pandas as pd
6 import tensorflow as tf
7 tf . __version__
8

9 # Part 1 - Data Preprocessing


10
11 # Importing the dataset
12 dataset = pd . read_csv ( ’ Churn_Modelling . csv ’)
13 X = dataset . iloc [: , 3: -1]. values
14 y = dataset . iloc [: , -1]. values
15 print ( X )
16 print ( y )
17
18 # Encoding categorical data
19 # Label Encoding the " Gender " column
20 from sklearn . preprocessing import LabelEncoder
21 le = LabelEncoder ()
22 X [: , 2] = le . fit_transform ( X [: , 2])
23 print ( X )
24 # One Hot Encoding the " Geography " column
25 from sklearn . compose import ColumnTransformer
26 from sklearn . preprocessing import OneHotEncoder
27 ct = ColumnTransformer ( transformers =[( ’ encoder ’ , OneHotEncoder () , [1]) ] ,
remainder = ’ passthrough ’)
28 X = np . array ( ct . fit_transform ( X ) )
29 print ( X )

30
31 # Splitting the dataset into the Training set and Test set
32 from sklearn . model_selection import train_test_split
33 X_train , X_test , y_train , y_test = train_test_split (X , y , test_size = 0.2 ,
random_state = 0)
34
35 # Feature Scaling
36 from sklearn . preprocessing import StandardScaler
37 sc = StandardScaler ()
38 X_train = sc . fit_transform ( X_train )
39 X_test = sc . transform ( X_test )
40
41 # Part 2 - Building the ANN
42

43 # Initializing the ANN


44 ann = tf . keras . models . Sequential ()
45
46 # Adding the input layer and the first hidden layer
47 ann . add ( tf . keras . layers . Dense ( units =6 , activation = ’ relu ’) )
48

49 # Adding the second hidden layer


50 ann . add ( tf . keras . layers . Dense ( units =6 , activation = ’ relu ’) )
51
52 # Adding the output layer
53 ann . add ( tf . keras . layers . Dense ( units =1 , activation = ’ sigmoid ’) )
54

55 # Part 3 - Training the ANN


56
57 # Compiling the ANN
58 ann . compile ( optimizer = ’ adam ’ , loss = ’ binary_crossentropy ’ , metrics = [ ’
accuracy ’ ])
59

60 # Training the ANN on the Training set


61 ann . fit ( X_train , y_train , batch_size = 32 , epochs = 100)
62
63 # Part 4 - Making the predictions and evaluating the model
64
65 # Predicting the result of a single observation
66
67 """
68 Homework :
69 Use our ANN model to predict if the customer with the following
informations will leave the bank :
70 Geography : France
71 Credit Score : 600
72 Gender : Male
73 Age : 40 years old
74 Tenure : 3 years
75 Balance : $ 60000
76 Number of Products : 2
77 Does this customer have a credit card ? Yes
78 Is this customer an Active Member : Yes
79 Estimated Salary : $ 50000
80 So , should we say goodbye to that customer ?

81
82 Solution :
83 """
84
85 print ( ann . predict ( sc . transform ([[1 , 0 , 0 , 600 , 1 , 40 , 3 , 60000 , 2 , 1 , 1 ,
50000]]) ) > 0.5)
86
87 """
88 Therefore , our ANN model predicts that this customer stays in the bank !
89 Important note 1: Notice that the values of the features were all input in
a double pair of square brackets . That ’s because the " predict " method
always expects a 2 D array as the format of its inputs . And putting our
values into a double pair of square brackets makes the input exactly a
2 D array .
90 Important note 2: Notice also that the " France " country was not input as a
string in the last column but as "1 , 0 , 0" in the first three columns .
That ’s because of course the predict method expects the one - hot -
encoded values of the state , and as we see in the first row of the
matrix of features X , " France " was encoded as "1 , 0 , 0". And be careful
to include these values in the first three columns , because the dummy
variables are always created in the first columns .
91 """
92
93 # Predicting the Test set results
94 y_pred = ann . predict ( X_test )
95 y_pred = ( y_pred > 0.5)
96 print ( np . concatenate (( y_pred . reshape ( len ( y_pred ) ,1) , y_test . reshape ( len (
y_test ) ,1) ) ,1) )
97
98 # Making the Confusion Matrix
99 from sklearn . metrics import confusion_matrix , accuracy_score
100 cm = confusion_matrix ( y_test , y_pred )
101 print ( cm )
102 accuracy_score ( y_test , y_pred )
Listing 22: Artificial Neural Networks Code

4.5.2 Convolutional Neural Networks

Convolutional Neural Networks are basically used in image-processing applications. The machine is trained to recognize a new image entry with the structure in the following figure.

Figure 62: Convolutional Neural Network Structure

All images consist of pixels. If the image is black and white, the array used in image processing is a 2D array. A colored image, in contrast, has three channels: red, green and blue.

Figure 63: Dimensions of Images

Each pixel of the image is represented by a number for image processing. For black-and-white images, black is 1 and white is 0.

Figure 64: Pixels in Binary Systems

There are some operations used to reduce this pixel map. The first method is convolution, defined as,

(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau

To apply this method, the first step is choosing a feature detector, which is a small filter taken over parts of the input image. Then, for each part of the input image, the similarity with this detector is computed. Hence, a feature map is prepared; a small sketch is given after the figure.

Figure 65: Feature Map
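A minimal NumPy sketch of how a feature map is produced by sliding a 3x3 feature detector over a small binary image (the image and detector values are made up for illustration; real CNN libraries perform this far more efficiently):

import numpy as np

def feature_map(image, detector):
    # Slide the detector over the image (stride 1, no padding) and
    # record the element-wise match score at every position.
    ih, iw = image.shape
    dh, dw = detector.shape
    out = np.zeros((ih - dh + 1, iw - dw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            patch = image[r:r + dh, c:c + dw]
            out[r, c] = np.sum(patch * detector)
    return out

image = np.array([[0, 1, 1, 0, 0],
                  [0, 1, 1, 0, 0],
                  [0, 0, 1, 1, 0],
                  [0, 0, 1, 1, 0],
                  [0, 0, 0, 1, 1]])
detector = np.array([[1, 0, 0],
                     [0, 1, 0],
                     [0, 0, 1]])   # looks for a small diagonal pattern

print(feature_map(image, detector))   # 3 x 3 feature map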

Applying several different feature detectors generates several feature maps, which together form the convolutional layer.

Figure 66: Convolutional Layer

At the end, we have a convolutional layer, and the application of this layer to the input image is stored in computer memory as in the following figure.

Figure 67: Example of Convolutional Layer Application

Then, the first activation function is applied there, forming the ReLU layer.

Figure 68: Rectifier Activation (ReLU) Layer

Each feature map in the convolutional layer is then turned into a pooled feature map, as shown in the following figure.

Figure 69: Pooled Feature Map

Similar to the steps that build the convolutional layer, these steps build the pooling layer from the pooled feature maps; a small sketch follows the figure.

Figure 70: Pooling Layer
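A minimal sketch of 2x2 max pooling applied to one feature map, followed by flattening (the feature-map values below are arbitrary illustration values):

import numpy as np

def max_pool(fmap, size=2, stride=2):
    # Keep the maximum of each size x size window of the feature map
    rows = (fmap.shape[0] - size) // stride + 1
    cols = (fmap.shape[1] - size) // stride + 1
    pooled = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            window = fmap[r * stride:r * stride + size,
                          c * stride:c * stride + size]
            pooled[r, c] = window.max()
    return pooled

fmap = np.array([[0, 1, 0, 0],
                 [1, 4, 2, 1],
                 [0, 2, 1, 0],
                 [0, 1, 0, 3]])

pooled = max_pool(fmap)        # 2 x 2 pooled feature map
flattened = pooled.flatten()   # flattening: this becomes the ANN input vector
print(pooled)
print(flattened)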

Then the pooling layer is converted into the input layer of an Artificial Neural Network. This step is called flattening.

Figure 71: Converting of Pooling Layer to Input Layer (Flattening)

The preparation of the Convolutional Neural Network, with all of these steps, is shown in the following figure.

Figure 72: Convolutional Neural Networks Preparation Steps

After applying all of the steps mentioned above, the flattened layer is used as the input of an Artificial Neural Network.

Figure 73: Input Application on Artificial Neural Networks for Convolutional Neural Networks

Finally, the Convolutional Neural Network with all steps, seen in the following figure, is ready to be trained on a data set.

Figure 74: Convolutional Neural Network Structure Included All Steps

To apply this model to a data set, the following code is written in Python.

1 # Convolutional Neural Network


2
3 # Importing the libraries
4 import tensorflow as tf
5 from keras . preprocessing . image import ImageDataGenerator
6 tf . __version__

7
8 # Part 1 - Data Preprocessing
9
10 # Preprocessing the Training set
11 train_datagen = ImageDataGenerator ( rescale = 1./255 ,
12 shear_range = 0.2 ,
13 zoom_range = 0.2 ,
14 horizontal_flip = True )
15 training_set = train_datagen . flow_from_directory ( ’ dataset / training_set ’ ,
16 target_size = (64 , 64) ,
17 batch_size = 32 ,
18 class_mode = ’ binary ’)
19
20 # Preprocessing the Test set
21 test_datagen = ImageDataGenerator ( rescale = 1./255)
22 test_set = test_datagen . flow_from_directory ( ’ dataset / test_set ’ ,
23 target_size = (64 , 64) ,
24 batch_size = 32 ,
25 class_mode = ’ binary ’)
26

27 # Part 2 - Building the CNN


28
29 # Initialising the CNN
30 cnn = tf . keras . models . Sequential ()
31
32 # Step 1 - Convolution
33 cnn . add ( tf . keras . layers . Conv2D ( filters =32 , kernel_size =3 , activation = ’ relu
’ , input_shape =[64 , 64 , 3]) )
34
35 # Step 2 - Pooling
36 cnn . add ( tf . keras . layers . MaxPool2D ( pool_size =2 , strides =2) )
37

38 # Adding a second convolutional layer


39 cnn . add ( tf . keras . layers . Conv2D ( filters =32 , kernel_size =3 , activation = ’ relu
’) )
40 cnn . add ( tf . keras . layers . MaxPool2D ( pool_size =2 , strides =2) )
41
42 # Step 3 - Flattening
43 cnn . add ( tf . keras . layers . Flatten () )
44
45 # Step 4 - Full Connection
46 cnn . add ( tf . keras . layers . Dense ( units =128 , activation = ’ relu ’) )
47
48 # Step 5 - Output Layer
49 cnn . add ( tf . keras . layers . Dense ( units =1 , activation = ’ sigmoid ’) )
50
51 # Part 3 - Training the CNN
52
53 # Compiling the CNN
54 cnn . compile ( optimizer = ’ adam ’ , loss = ’ binary_crossentropy ’ , metrics = [ ’
accuracy ’ ])
55
56 # Training the CNN on the Training set and evaluating it on the Test set
57 cnn . fit ( x = training_set , validation_data = test_set , epochs = 25)

58
59 # Part 4 - Making a single prediction
60
61 import numpy as np
62 from keras . preprocessing import image
63 test_image = image . load_img ( ’ dataset / single_prediction / cat_or_dog_1 . jpg ’ ,
target_size = (64 , 64) )
64 test_image = image . img_to_array ( test_image )
65 test_image = np . expand_dims ( test_image , axis = 0)
66 result = cnn . predict ( test_image )
67 training_set . class_indices
68 if result [0][0] == 1:
69 prediction = ’ dog ’
70 else :
71 prediction = ’ cat ’
72 print ( prediction )
Listing 23: Convolutional Neural Networks Code

4.6 Dimensionality Reduction

Dimensionality Reduction is applied in three different ways, which are,

• Principal Component Analysis (PCA)


• Linear Discriminant Analysis
• Kernel PCA

4.6.1 Principal Component Analysis (PCA)

The goal of PCA is to identify patterns in data and detect the correlation between variables. PCA is used in,

• Noise Filtering
• Visualization
• Feature Extraction
• Stock Market Prediction
• Gene Data Analysis
For the application of PCA, the following Python code will be useful when applying it to a data set.

1 # Principal Component Analysis ( PCA )
2

3 # Importing the libraries


4 import numpy as np
5 import matplotlib . pyplot as plt
6 import pandas as pd
7
8 # Importing the dataset
9 dataset = pd . read_csv ( ’ Wine . csv ’)
10 X = dataset . iloc [: , : -1]. values
11 y = dataset . iloc [: , -1]. values
12
13 # Splitting the dataset into the Training set and Test set
14 from sklearn . model_selection import train_test_split
15 X_train , X_test , y_train , y_test = train_test_split (X , y , test_size = 0.2 ,
random_state = 0)
16
17 # Feature Scaling
18 from sklearn . preprocessing import StandardScaler
19 sc = StandardScaler ()
20 X_train = sc . fit_transform ( X_train )
21 X_test = sc . transform ( X_test )
22
23 # Applying PCA
24 from sklearn . decomposition import PCA
25 pca = PCA ( n_components = 2)
26 X_train = pca . fit_transform ( X_train )
27 X_test = pca . transform ( X_test )
28
29 # Training the Logistic Regression model on the Training set
30 from sklearn . linear_model import LogisticRegression
31 classifier = LogisticRegression ( random_state = 0)
32 classifier . fit ( X_train , y_train )
33
34 # Making the Confusion Matrix
35 from sklearn . metrics import confusion_matrix , accuracy_score
36 y_pred = classifier . predict ( X_test )
37 cm = confusion_matrix ( y_test , y_pred )
38 print ( cm )
39 accuracy_score ( y_test , y_pred )
40
41 # Visualising the Training set results
42 from matplotlib . colors import ListedColormap
43 X_set , y_set = X_train , y_train
44 X1 , X2 = np . meshgrid ( np . arange ( start = X_set [: , 0]. min () - 1 , stop = X_set
[: , 0]. max () + 1 , step = 0.01) ,
45 np . arange ( start = X_set [: , 1]. min () - 1 , stop = X_set
[: , 1]. max () + 1 , step = 0.01) )
46 plt . contourf ( X1 , X2 , classifier . predict ( np . array ([ X1 . ravel () , X2 . ravel () ])
. T ) . reshape ( X1 . shape ) ,
47 alpha = 0.75 , cmap = ListedColormap (( ’ red ’ , ’ green ’ , ’ blue ’) )
)
48 plt . xlim ( X1 . min () , X1 . max () )
49 plt . ylim ( X2 . min () , X2 . max () )

50 for i , j in enumerate ( np . unique ( y_set ) ) :
51 plt . scatter ( X_set [ y_set == j , 0] , X_set [ y_set == j , 1] ,
52 c = ListedColormap (( ’ red ’ , ’ green ’ , ’ blue ’) ) ( i ) , label = j
)
53 plt . title ( ’ Logistic Regression ( Training set ) ’)
54 plt . xlabel ( ’ PC1 ’)
55 plt . ylabel ( ’ PC2 ’)
56 plt . legend ()
57 plt . show ()
58
59 # Visualising the Test set results
60 from matplotlib . colors import ListedColormap
61 X_set , y_set = X_test , y_test
62 X1 , X2 = np . meshgrid ( np . arange ( start = X_set [: , 0]. min () - 1 , stop = X_set
[: , 0]. max () + 1 , step = 0.01) ,
63 np . arange ( start = X_set [: , 1]. min () - 1 , stop = X_set
[: , 1]. max () + 1 , step = 0.01) )
64 plt . contourf ( X1 , X2 , classifier . predict ( np . array ([ X1 . ravel () , X2 . ravel () ])
. T ) . reshape ( X1 . shape ) ,
65 alpha = 0.75 , cmap = ListedColormap (( ’ red ’ , ’ green ’ , ’ blue ’) )
)
66 plt . xlim ( X1 . min () , X1 . max () )
67 plt . ylim ( X2 . min () , X2 . max () )
68 for i , j in enumerate ( np . unique ( y_set ) ) :
69 plt . scatter ( X_set [ y_set == j , 0] , X_set [ y_set == j , 1] ,
70 c = ListedColormap (( ’ red ’ , ’ green ’ , ’ blue ’) ) ( i ) , label = j
)
71 plt . title ( ’ Logistic Regression ( Test set ) ’)
72 plt . xlabel ( ’ PC1 ’)
73 plt . ylabel ( ’ PC2 ’)
74 plt . legend ()
75 plt . show ()
Listing 24: Principal Component Analysis (PCA) Code

4.6.2 Linear Discriminant Analysis

Linear Discriminant Analysis is used as a pre-processing step for pattern classification in machine learning applications. This dimensionality reduction method aims to project a data set onto a lower-dimensional space. The difference from PCA is that LDA maximizes the separation between multiple classes.

Figure 75: Difference Between PCA and LDA

The following Python code applies Linear Discriminant Analysis to a data set.

1 # Linear Discriminant Analysis ( LDA )


2

3 # Importing the libraries


4 import numpy as np
5 import matplotlib . pyplot as plt
6 import pandas as pd
7
8 # Importing the dataset
9 dataset = pd . read_csv ( ’ Wine . csv ’)
10 X = dataset . iloc [: , : -1]. values
11 y = dataset . iloc [: , -1]. values
12
13 # Splitting the dataset into the Training set and Test set
14 from sklearn . model_selection import train_test_split
15 X_train , X_test , y_train , y_test = train_test_split (X , y , test_size = 0.2 ,
random_state = 0)
16
17 # Feature Scaling
18 from sklearn . preprocessing import StandardScaler
19 sc = StandardScaler ()
20 X_train = sc . fit_transform ( X_train )
21 X_test = sc . transform ( X_test )
22
23 # Applying LDA
24 from sklearn . discriminant_analysis import LinearDiscriminantAnalysis as LDA
25 lda = LDA ( n_components = 2)
26 X_train = lda . fit_transform ( X_train , y_train )
27 X_test = lda . transform ( X_test )

28
29 # Training the Logistic Regression model on the Training set
30 from sklearn . linear_model import LogisticRegression
31 classifier = LogisticRegression ( random_state = 0)
32 classifier . fit ( X_train , y_train )
33
34 # Making the Confusion Matrix
35 from sklearn . metrics import confusion_matrix , accuracy_score
36 y_pred = classifier . predict ( X_test )
37 cm = confusion_matrix ( y_test , y_pred )
38 print ( cm )
39 accuracy_score ( y_test , y_pred )
40
41 # Visualising the Training set results
42 from matplotlib . colors import ListedColormap
43 X_set , y_set = X_train , y_train
44 X1 , X2 = np . meshgrid ( np . arange ( start = X_set [: , 0]. min () - 1 , stop = X_set
[: , 0]. max () + 1 , step = 0.01) ,
45 np . arange ( start = X_set [: , 1]. min () - 1 , stop = X_set
[: , 1]. max () + 1 , step = 0.01) )
46 plt . contourf ( X1 , X2 , classifier . predict ( np . array ([ X1 . ravel () , X2 . ravel () ])
. T ) . reshape ( X1 . shape ) ,
47 alpha = 0.75 , cmap = ListedColormap (( ’ red ’ , ’ green ’ , ’ blue ’) )
)
48 plt . xlim ( X1 . min () , X1 . max () )
49 plt . ylim ( X2 . min () , X2 . max () )
50 for i , j in enumerate ( np . unique ( y_set ) ) :
51 plt . scatter ( X_set [ y_set == j , 0] , X_set [ y_set == j , 1] ,
52 c = ListedColormap (( ’ red ’ , ’ green ’ , ’ blue ’) ) ( i ) , label = j
)
53 plt . title ( ’ Logistic Regression ( Training set ) ’)
54 plt . xlabel ( ’ LD1 ’)
55 plt . ylabel ( ’ LD2 ’)
56 plt . legend ()
57 plt . show ()
58
59 # Visualising the Test set results
60 from matplotlib . colors import ListedColormap
61 X_set , y_set = X_test , y_test
62 X1 , X2 = np . meshgrid ( np . arange ( start = X_set [: , 0]. min () - 1 , stop = X_set
[: , 0]. max () + 1 , step = 0.01) ,
63 np . arange ( start = X_set [: , 1]. min () - 1 , stop = X_set
[: , 1]. max () + 1 , step = 0.01) )
64 plt . contourf ( X1 , X2 , classifier . predict ( np . array ([ X1 . ravel () , X2 . ravel () ])
. T ) . reshape ( X1 . shape ) ,
65 alpha = 0.75 , cmap = ListedColormap (( ’ red ’ , ’ green ’ , ’ blue ’) )
)
66 plt . xlim ( X1 . min () , X1 . max () )
67 plt . ylim ( X2 . min () , X2 . max () )
68 for i , j in enumerate ( np . unique ( y_set ) ) :
69 plt . scatter ( X_set [ y_set == j , 0] , X_set [ y_set == j , 1] ,
70 c = ListedColormap (( ’ red ’ , ’ green ’ , ’ blue ’) ) ( i ) , label = j
)
71 plt . title ( ’ Logistic Regression ( Test set ) ’)

72 plt . xlabel ( ’ LD1 ’)
73 plt . ylabel ( ’ LD2 ’)
74 plt . legend ()
75 plt . show ()
Listing 25: Linear Discriminant Analysis Code

4.6.3 Kernel PCA

Kernel PCA is the same as Principal Component Analysis; the only difference between PCA and Kernel PCA is that Kernel PCA applies a Radial Basis Function kernel.

Kernel PCA can be applied to a data set as in the following Python code.

1 # Kernel PCA
2
3 # Importing the libraries
4 import numpy as np
5 import matplotlib . pyplot as plt
6 import pandas as pd
7
8 # Importing the dataset
9 dataset = pd . read_csv ( ’ Wine . csv ’)
10 X = dataset . iloc [: , : -1]. values
11 y = dataset . iloc [: , -1]. values
12
13 # Splitting the dataset into the Training set and Test set
14 from sklearn . model_selection import train_test_split
15 X_train , X_test , y_train , y_test = train_test_split (X , y , test_size = 0.2 ,
random_state = 0)
16
17 # Feature Scaling
18 from sklearn . preprocessing import StandardScaler
19 sc = StandardScaler ()
20 X_train = sc . fit_transform ( X_train )
21 X_test = sc . transform ( X_test )
22
23 # Applying Kernel PCA
24 from sklearn . decomposition import KernelPCA
25 kpca = KernelPCA ( n_components = 2 , kernel = ’ rbf ’)
26 X_train = kpca . fit_transform ( X_train )
27 X_test = kpca . transform ( X_test )
28
29 # Training the Logistic Regression model on the Training set
30 from sklearn . linear_model import LogisticRegression
31 classifier = LogisticRegression ( random_state = 0)
32 classifier . fit ( X_train , y_train )
33

34 # Making the Confusion Matrix


35 from sklearn . metrics import confusion_matrix , accuracy_score

36 y_pred = classifier . predict ( X_test )
37 cm = confusion_matrix ( y_test , y_pred )
38 print ( cm )
39 accuracy_score ( y_test , y_pred )
40
41 # Visualising the Training set results
42 from matplotlib . colors import ListedColormap
43 X_set , y_set = X_train , y_train
44 X1 , X2 = np . meshgrid ( np . arange ( start = X_set [: , 0]. min () - 1 , stop = X_set
[: , 0]. max () + 1 , step = 0.01) ,
45 np . arange ( start = X_set [: , 1]. min () - 1 , stop = X_set
[: , 1]. max () + 1 , step = 0.01) )
46 plt . contourf ( X1 , X2 , classifier . predict ( np . array ([ X1 . ravel () , X2 . ravel () ])
. T ) . reshape ( X1 . shape ) ,
47 alpha = 0.75 , cmap = ListedColormap (( ’ red ’ , ’ green ’ , ’ blue ’) )
)
48 plt . xlim ( X1 . min () , X1 . max () )
49 plt . ylim ( X2 . min () , X2 . max () )
50 for i , j in enumerate ( np . unique ( y_set ) ) :
51 plt . scatter ( X_set [ y_set == j , 0] , X_set [ y_set == j , 1] ,
52 c = ListedColormap (( ’ red ’ , ’ green ’ , ’ blue ’) ) ( i ) , label = j
)
53 plt . title ( ’ Logistic Regression ( Training set ) ’)
54 plt . xlabel ( ’ PC1 ’)
55 plt . ylabel ( ’ PC2 ’)
56 plt . legend ()
57 plt . show ()
58
59 # Visualising the Test set results
60 from matplotlib . colors import ListedColormap
61 X_set , y_set = X_test , y_test
62 X1 , X2 = np . meshgrid ( np . arange ( start = X_set [: , 0]. min () - 1 , stop = X_set
[: , 0]. max () + 1 , step = 0.01) ,
63 np . arange ( start = X_set [: , 1]. min () - 1 , stop = X_set
[: , 1]. max () + 1 , step = 0.01) )
64 plt . contourf ( X1 , X2 , classifier . predict ( np . array ([ X1 . ravel () , X2 . ravel () ])
. T ) . reshape ( X1 . shape ) ,
65 alpha = 0.75 , cmap = ListedColormap (( ’ red ’ , ’ green ’ , ’ blue ’) )
)
66 plt . xlim ( X1 . min () , X1 . max () )
67 plt . ylim ( X2 . min () , X2 . max () )
68 for i , j in enumerate ( np . unique ( y_set ) ) :
69 plt . scatter ( X_set [ y_set == j , 0] , X_set [ y_set == j , 1] ,
70 c = ListedColormap (( ’ red ’ , ’ green ’ , ’ blue ’) ) ( i ) , label = j
)
71 plt . title ( ’ Logistic Regression ( Test set ) ’)
72 plt . xlabel ( ’ PC1 ’)
73 plt . ylabel ( ’ PC2 ’)
74 plt . legend ()
75 plt . show ()
Listing 26: Kernel PCA Code

4.7 Model Selection and Boosting

In this part, two topics are included which are,

• Model Selection

• Boosting

4.7.1 Model Selection

There are two types of model selection, which are,

• k-Fold Cross Validation

• Grid Search

K-Fold Cross Validation


The k-Fold Cross Validation technique splits the training set into folds and uses a different fold for validation in each iteration of the chosen model. At the end of this process, it gives us the accuracy of that specific model. Applying this procedure to each candidate model reveals the most accurate model for the selected data set.

Figure 76: k-Fold Cross Validation

The k-Fold Cross Validation technique for model selection can be applied as in the following Python code.

1 # k - Fold Cross Validation
2

3 # Importing the libraries


4 import numpy as np
5 import matplotlib . pyplot as plt
6 import pandas as pd
7
8 # Importing the dataset
9 dataset = pd . read_csv ( ’ Social_Network_Ads . csv ’)
10 X = dataset . iloc [: , : -1]. values
11 y = dataset . iloc [: , -1]. values
12
13 # Splitting the dataset into the Training set and Test set
14 from sklearn . model_selection import train_test_split
15 X_train , X_test , y_train , y_test = train_test_split (X , y , test_size =
0.25 , random_state = 0)
16
17 # Feature Scaling
18 from sklearn . preprocessing import StandardScaler
19 sc = StandardScaler ()
20 X_train = sc . fit_transform ( X_train )
21 X_test = sc . transform ( X_test )
22
23 # Training the Kernel SVM model on the Training set
24 from sklearn . svm import SVC
25 classifier = SVC ( kernel = ’ rbf ’ , random_state = 0)
26 classifier . fit ( X_train , y_train )
27
28 # Making the Confusion Matrix
29 from sklearn . metrics import confusion_matrix , accuracy_score
30 y_pred = classifier . predict ( X_test )
31 cm = confusion_matrix ( y_test , y_pred )
32 print ( cm )
33 accuracy_score ( y_test , y_pred )
34
35 # Applying k - Fold Cross Validation
36 from sklearn . model_selection import cross_val_score
37 accuracies = cross_val_score ( estimator = classifier , X = X_train , y =
y_train , cv = 10)
38 print ( " Accuracy : {:.2 f } % " . format ( accuracies . mean () *100) )
39 print ( " Standard Deviation : {:.2 f } % " . format ( accuracies . std () *100) )
40
41 # Visualising the Training set results
42 from matplotlib . colors import ListedColormap
43 X_set , y_set = X_train , y_train
44 X1 , X2 = np . meshgrid ( np . arange ( start = X_set [: , 0]. min () - 1 , stop = X_set
[: , 0]. max () + 1 , step = 0.01) ,
45 np . arange ( start = X_set [: , 1]. min () - 1 , stop = X_set
[: , 1]. max () + 1 , step = 0.01) )
46 plt . contourf ( X1 , X2 , classifier . predict ( np . array ([ X1 . ravel () , X2 . ravel () ])
. T ) . reshape ( X1 . shape ) ,
47 alpha = 0.75 , cmap = ListedColormap (( ’ red ’ , ’ green ’) ) )
48 plt . xlim ( X1 . min () , X1 . max () )
49 plt . ylim ( X2 . min () , X2 . max () )

50 for i , j in enumerate ( np . unique ( y_set ) ) :
51 plt . scatter ( X_set [ y_set == j , 0] , X_set [ y_set == j , 1] ,
52 c = ListedColormap (( ’ red ’ , ’ green ’) ) ( i ) , label = j )
53 plt . title ( ’ Kernel SVM ( Training set ) ’)
54 plt . xlabel ( ’ Age ’)
55 plt . ylabel ( ’ Estimated Salary ’)
56 plt . legend ()
57 plt . show ()
58
59 # Visualising the Test set results
60 from matplotlib . colors import ListedColormap
61 X_set , y_set = X_test , y_test
62 X1 , X2 = np . meshgrid ( np . arange ( start = X_set [: , 0]. min () - 1 , stop = X_set
[: , 0]. max () + 1 , step = 0.01) ,
63 np . arange ( start = X_set [: , 1]. min () - 1 , stop = X_set
[: , 1]. max () + 1 , step = 0.01) )
64 plt . contourf ( X1 , X2 , classifier . predict ( np . array ([ X1 . ravel () , X2 . ravel () ])
. T ) . reshape ( X1 . shape ) ,
65 alpha = 0.75 , cmap = ListedColormap (( ’ red ’ , ’ green ’) ) )
66 plt . xlim ( X1 . min () , X1 . max () )
67 plt . ylim ( X2 . min () , X2 . max () )
68 for i , j in enumerate ( np . unique ( y_set ) ) :
69 plt . scatter ( X_set [ y_set == j , 0] , X_set [ y_set == j , 1] ,
70 c = ListedColormap (( ’ red ’ , ’ green ’) ) ( i ) , label = j )
71 plt . title ( ’ Kernel SVM ( Test set ) ’)
72 plt . xlabel ( ’ Age ’)
73 plt . ylabel ( ’ Estimated Salary ’)
74 plt . legend ()
75 plt . show ()
Listing 27: K-Fold Cross Validation Code
Grid Search

The next model selection technique is Grid Search, which compares candidate models, together with the parameters inside those models, as applied to the data set. It then specifies the best model with the best parameters. With this technique, we can also tune all the parameters of a given model to reach the highest accuracy for the learning process on the data set.

For the Grid Search technique, the following Python code can be useful as an example.

1 # Grid Search
2
3 # Importing the libraries
4 import numpy as np
5 import matplotlib . pyplot as plt
6 import pandas as pd
7
8 # Importing the dataset
9 dataset = pd . read_csv ( ’ Social_Network_Ads . csv ’)
10 X = dataset . iloc [: , : -1]. values

11 y = dataset . iloc [: , -1]. values
12

13 # Splitting the dataset into the Training set and Test set
14 from sklearn . model_selection import train_test_split
15 X_train , X_test , y_train , y_test = train_test_split (X , y , test_size =
0.25 , random_state = 0)
16
17 # Feature Scaling
18 from sklearn . preprocessing import StandardScaler
19 sc = StandardScaler ()
20 X_train = sc . fit_transform ( X_train )
21 X_test = sc . transform ( X_test )
22
23 # Training the Kernel SVM model on the Training set
24 from sklearn . svm import SVC
25 classifier = SVC ( kernel = ’ rbf ’ , random_state = 0)
26 classifier . fit ( X_train , y_train )
27
28 # Making the Confusion Matrix
29 from sklearn . metrics import confusion_matrix , accuracy_score
30 y_pred = classifier . predict ( X_test )
31 cm = confusion_matrix ( y_test , y_pred )
32 print ( cm )
33 accuracy_score ( y_test , y_pred )
34
35 # Applying k - Fold Cross Validation
36 from sklearn . model_selection import cross_val_score
37 accuracies = cross_val_score ( estimator = classifier , X = X_train , y =
y_train , cv = 10)
38 print ( " Accuracy : {:.2 f } % " . format ( accuracies . mean () *100) )
39 print ( " Standard Deviation : {:.2 f } % " . format ( accuracies . std () *100) )
40

41 # Applying Grid Search to find the best model and the best parameters
42 from sklearn . model_selection import GridSearchCV
43 parameters = [{ ’C ’: [0.25 , 0.5 , 0.75 , 1] , ’ kernel ’: [ ’ linear ’]} ,
44 { ’C ’: [0.25 , 0.5 , 0.75 , 1] , ’ kernel ’: [ ’ rbf ’] , ’ gamma ’:
[0.1 , 0.2 , 0.3 , 0.4 , 0.5 , 0.6 , 0.7 , 0.8 , 0.9]}]
45 grid_search = GridSearchCV ( estimator = classifier ,
46 param_grid = parameters ,
47 scoring = ’ accuracy ’ ,
48 cv = 10 ,
49 n_jobs = -1)
50 grid_search . fit ( X_train , y_train )
51 best_accuracy = grid_search . best_score_
52 best_parameters = grid_search . best_params_
53 print ( " Best Accuracy : {:.2 f } % " . format ( best_accuracy *100) )
54 print ( " Best Parameters : " , best_parameters )
55
56 # Visualising the Training set results
57 from matplotlib . colors import ListedColormap
58 X_set , y_set = X_train , y_train
59 X1 , X2 = np . meshgrid ( np . arange ( start = X_set [: , 0]. min () - 1 , stop = X_set
[: , 0]. max () + 1 , step = 0.01) ,
60 np . arange ( start = X_set [: , 1]. min () - 1 , stop = X_set

[: , 1]. max () + 1 , step = 0.01) )
61 plt . contourf ( X1 , X2 , classifier . predict ( np . array ([ X1 . ravel () , X2 . ravel () ])
. T ) . reshape ( X1 . shape ) ,
62 alpha = 0.75 , cmap = ListedColormap (( ’ red ’ , ’ green ’) ) )
63 plt . xlim ( X1 . min () , X1 . max () )
64 plt . ylim ( X2 . min () , X2 . max () )
65 for i , j in enumerate ( np . unique ( y_set ) ) :
66 plt . scatter ( X_set [ y_set == j , 0] , X_set [ y_set == j , 1] ,
67 c = ListedColormap (( ’ red ’ , ’ green ’) ) ( i ) , label = j )
68 plt . title ( ’ Kernel SVM ( Training set ) ’)
69 plt . xlabel ( ’ Age ’)
70 plt . ylabel ( ’ Estimated Salary ’)
71 plt . legend ()
72 plt . show ()
73
74 # Visualising the Test set results
75 from matplotlib . colors import ListedColormap
76 X_set , y_set = X_test , y_test
77 X1 , X2 = np . meshgrid ( np . arange ( start = X_set [: , 0]. min () - 1 , stop = X_set
[: , 0]. max () + 1 , step = 0.01) ,
78 np . arange ( start = X_set [: , 1]. min () - 1 , stop = X_set
[: , 1]. max () + 1 , step = 0.01) )
79 plt . contourf ( X1 , X2 , classifier . predict ( np . array ([ X1 . ravel () , X2 . ravel () ])
. T ) . reshape ( X1 . shape ) ,
80 alpha = 0.75 , cmap = ListedColormap (( ’ red ’ , ’ green ’) ) )
81 plt . xlim ( X1 . min () , X1 . max () )
82 plt . ylim ( X2 . min () , X2 . max () )
83 for i , j in enumerate ( np . unique ( y_set ) ) :
84 plt . scatter ( X_set [ y_set == j , 0] , X_set [ y_set == j , 1] ,
85 c = ListedColormap (( ’ red ’ , ’ green ’) ) ( i ) , label = j )
86 plt . title ( ’ Kernel SVM ( Test set ) ’)
87 plt . xlabel ( ’ Age ’)
88 plt . ylabel ( ’ Estimated Salary ’)
89 plt . legend ()
90 plt . show ()
Listing 28: Grid Search Code

4.7.2 XGBoost

XGBoost is a model that often trains with the highest accuracy among all the other models. The XGBoost model is a simple and useful way to train a machine on a data set. There are four types of XGBoost models, which are,

• XGBClassifier

• XGBRegressor

• XGBRFClassifier

• XGBRFRegressor

For applying XGBoost models, the following Python code is a good example for your application.

1 # XGBoost
2
3 # Importing the libraries
4 import numpy as np
5 import matplotlib . pyplot as plt
6 import pandas as pd
7
8 # Importing the dataset
9 dataset = pd . read_csv ( ’ Data . csv ’)
10 X = dataset . iloc [: , : -1]. values
11 y = dataset . iloc [: , -1]. values
12
13 # Splitting the dataset into the Training set and Test set
14 from sklearn . model_selection import train_test_split
15 X_train , X_test , y_train , y_test = train_test_split (X , y , test_size = 0.2 ,
random_state = 0)
16
17 # Training XGBoost on the Training set
18 from xgboost import XGBClassifier
19 classifier = XGBClassifier ()
20 classifier . fit ( X_train , y_train )
21

22 # Making the Confusion Matrix


23 from sklearn . metrics import confusion_matrix , accuracy_score
24 y_pred = classifier . predict ( X_test )
25 cm = confusion_matrix ( y_test , y_pred )
26 print ( cm )
27 accuracy_score ( y_test , y_pred )
28
29 # Applying k - Fold Cross Validation
30 from sklearn . model_selection import cross_val_score
31 accuracies = cross_val_score ( estimator = classifier , X = X_train , y =
y_train , cv = 10)
32 print ( " Accuracy : {:.2 f } % " . format ( accuracies . mean () *100) )
33 print ( " Standard Deviation : {:.2 f } % " . format ( accuracies . std () *100) )
Listing 29: XGBoost Code

5 CONCLUSION

In conclusion, this report has explained each application and how it works. For each part, a short explanation is given, and the codes are included so that they can be reused in your own project or for free training practice. As you may have realized, there are many approaches to the machine learning process. Regression, classification and clustering models are presented as the basic machine learning models, and the deep learning applications are explained for the neural network model types with example codes. With all of this information, the reader can build machine learning or deep learning models: easily build, train, and deploy machine learning models.
