Machine Learning Revision
Submitted by:
Safa YILMAZ
Professor:
Assoc. Prof. Dr. Ercan GÜRSES
2 Types of Compilers 7
3 Types of Libraries 8
3.1 Built-In Libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.2 Standard Libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.3 3rd Party Libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
4 Learning Process 10
4.1 Data Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
4.1.1 Import Libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
4.1.2 Import Data Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
4.1.3 Optimization Missing Data . . . . . . . . . . . . . . . . . . . . . . . . 12
4.1.4 Encoding Categorical Data . . . . . . . . . . . . . . . . . . . . . . . . 12
4.1.5 Arranging Training and Test Data . . . . . . . . . . . . . . . . . . . 13
4.1.6 Feature Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
4.2 Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
4.2.1 Simple Linear Regression . . . . . . . . . . . . . . . . . . . . . . . . . 14
4.2.2 Multiple Linear Regression . . . . . . . . . . . . . . . . . . . . . . . . 17
4.2.3 Polynomial Regression . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.2.4 Support Vector Regression (SVR) . . . . . . . . . . . . . . . . . . . . 22
4.2.5 Decision Tree Regression . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.2.6 Random Forest Regression . . . . . . . . . . . . . . . . . . . . . . . . 27
4.3 Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.3.1 Logistic Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.3.2 K-Nearest Neighbors (K-NN) . . . . . . . . . . . . . . . . . . . . . . 33
4.3.3 Support Vector Machine (SVM) . . . . . . . . . . . . . . . . . . . . . 37
4.3.4 Kernel SVM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.3.5 Naive Bayes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.3.6 Decision Tree Classification . . . . . . . . . . . . . . . . . . . . . . . 46
4.3.7 Random Forest Classification . . . . . . . . . . . . . . . . . . . . . . 50
4.4 Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.4.1 K-Means Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.4.2 Hierarchical Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.5 Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.5.1 Artificial Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . 61
4.5.2 Convolutional Neural Networks . . . . . . . . . . . . . . . . . . . . . 70
4.6 Dimensionality Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
4.6.1 Principal Component Analysis (PCA) . . . . . . . . . . . . . . . . . . 78
4.6.2 Linear Discriminant Analysis . . . . . . . . . . . . . . . . . . . . . . 80
4.6.3 Kernel PCA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.7 Model Selection and Boosting . . . . . . . . . . . . . . . . . . . . . . . . . . 85
4.7.1 Model Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
4.7.2 XGBoost . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
5 CONCLUSION 91
6 REFERENCES 92
List of Figures
1 Simple Linear Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2 Simple Linear Regression Application . . . . . . . . . . . . . . . . . . . . . . 15
3 Simple Linear Regression Error Contribution . . . . . . . . . . . . . . . . . . 15
4 Simple Linear Regression Applications . . . . . . . . . . . . . . . . . . . . . 17
5 Multiple Linear Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
6 Polynomial Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
7 Polynomial Regression Approximation . . . . . . . . . . . . . . . . . . . . . 20
8 Regression Application Differences on Training Data . . . . . . . . . . . . . 21
9 Polynomial Regression Application on Training Data . . . . . . . . . . . . . 22
10 Support Vector Regression (SVR) . . . . . . . . . . . . . . . . . . . . . . . . 22
11 Support Vector Regression (SVR) Application on Training Data . . . . . . . 24
12 Support Vector Regression (SVR) Application on Training Data with Radial
Basis Function (RBF) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
13 Decision Tree Regression Split Map . . . . . . . . . . . . . . . . . . . . . . . 25
14 Decision Tree Regression Tree Map . . . . . . . . . . . . . . . . . . . . . . . 26
15 Decision Regression Tree Application on Training Data . . . . . . . . . . . . 27
16 Random Forest Regression Application on Training Data . . . . . . . . . . . 29
17 Probability Distribution of Data Set . . . . . . . . . . . . . . . . . . . . . . . 30
18 Probability Shifting for Predicted Values . . . . . . . . . . . . . . . . . . . . 30
19 Logistic Regression Application . . . . . . . . . . . . . . . . . . . . . . . . . 33
20 Euclidean Distance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
21 Categories and New Data Position . . . . . . . . . . . . . . . . . . . . . . . . 34
22 K-NN Classification Process . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
23 K-Nearest Neighbors (K-NN) Application . . . . . . . . . . . . . . . . . . . . 37
24 Support Vector Machine Application . . . . . . . . . . . . . . . . . . . . . . 37
25 Support Vector Machine (SVM) Application . . . . . . . . . . . . . . . . . . 40
26 Mapping Function Application . . . . . . . . . . . . . . . . . . . . . . . . . . 40
27 Kernels of Mapping Function . . . . . . . . . . . . . . . . . . . . . . . . . . 41
28 The Gaussian RBF Kernel Application . . . . . . . . . . . . . . . . . . . . . 41
29 Kernel SVM Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
30 Probabilities of Two Different Class on the Data Set . . . . . . . . . . . . . . 44
31 Naive Bayes Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
32 Decision Tree Classification Separation of Data Set . . . . . . . . . . . . . . 47
33 Decision Tree Classification Tree . . . . . . . . . . . . . . . . . . . . . . . . 47
34 Decision Tree Classification Application . . . . . . . . . . . . . . . . . . . . . 50
35 Random Forest Classification Application . . . . . . . . . . . . . . . . . . . . 52
36 Randomly Selected Points for Each Clusters . . . . . . . . . . . . . . . . . . 53
37 Movement to the New Centroid Point Calculated . . . . . . . . . . . . . . . 53
38 New Clusters for New Closest Centroid . . . . . . . . . . . . . . . . . . . . . 54
39 Final Clusters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
40 K-Means Application Process . . . . . . . . . . . . . . . . . . . . . . . . . . 55
41 Optimum Number of Clusters Determination . . . . . . . . . . . . . . . . . . 55
42 K-Means Clustering Application . . . . . . . . . . . . . . . . . . . . . . . . . 57
43 Euclidean Distance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
44 Distance Between Two Clusters . . . . . . . . . . . . . . . . . . . . . . . . . 58
45 Hierarchical Clustering Model Optimization of Cluster Number . . . . . . . . 59
46 Final Optimum Clusters for Hierarchical Clustering Model . . . . . . . . . . 59
47 Hierarchical Clustering Application . . . . . . . . . . . . . . . . . . . . . . . 60
48 Biological and Artificial Neurons . . . . . . . . . . . . . . . . . . . . . . . . . 61
49 Standardization of Independent Variables . . . . . . . . . . . . . . . . . . . . 61
50 Transfer Function Step in Neurons . . . . . . . . . . . . . . . . . . . . . . . 62
51 Activation Function Step in Neurons . . . . . . . . . . . . . . . . . . . . . . 62
52 Activation Functions of Neural Networks . . . . . . . . . . . . . . . . . . . . 63
53 Multi Layer Neurons with Different Activation Functions . . . . . . . . . . . 64
54 Arranging Weight with Back Propagation . . . . . . . . . . . . . . . . . . . 64
55 Minimizing Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
56 Gradient Descent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
57 Back Propagation with Gradient Descent . . . . . . . . . . . . . . . . . . . . 66
58 Example of Gradient Descent . . . . . . . . . . . . . . . . . . . . . . . . . . 66
59 Stochastic Gradient Descent . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
60 Difference Between Batch Gradient Descent and Stochastic Gradient Descent 67
61 Types of Propagation on Neural Networks . . . . . . . . . . . . . . . . . . . 68
62 Convolutional Neural Network Structure . . . . . . . . . . . . . . . . . . . . 71
63 Dimensions of Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
64 Pixels in Binary Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
65 Feature Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
66 Convolutional Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
67 Example of Convolutional Layer Application . . . . . . . . . . . . . . . . . . 73
68 Rectifier Activation (ReLU) Layer . . . . . . . . . . . . . . . . . . . . . . . . 73
69 Pooled Feature Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
70 Pooling Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
71 Converting of Pooling Layer to Input Layer (Flattening) . . . . . . . . . . . 75
72 Convolutional Neural Networks Preparation Steps . . . . . . . . . . . . . . . 75
73 Input Application on Artificial Neural Networks for Convolutional Neural Net-
works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
74 Convolutional Neural Network Structure Included All Steps . . . . . . . . . 76
75 Difference Between PCA and LDA . . . . . . . . . . . . . . . . . . . . . . . 81
76 k-Fold Cross Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
Listings
1 Import Libraries Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2 Import Data Set Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3 Optimization Missing Data Code . . . . . . . . . . . . . . . . . . . . . . . . 12
4 Encoding Categorical Data Code . . . . . . . . . . . . . . . . . . . . . . . . 12
5 Arranging Training and Test Data Code . . . . . . . . . . . . . . . . . . . . 13
6 Feature Scaling Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
7 Simple Linear Regression Code . . . . . . . . . . . . . . . . . . . . . . . . . 15
8 Multiple Linear Regression Code . . . . . . . . . . . . . . . . . . . . . . . . 18
9 Polynomial Regression Code . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
10 Support Vector Regression Code . . . . . . . . . . . . . . . . . . . . . . . . . 23
11 Decision Tree Regression Code . . . . . . . . . . . . . . . . . . . . . . . . . . 26
12 Random Forest Regression Code . . . . . . . . . . . . . . . . . . . . . . . . . 28
13 Logistic Regression Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
14 K-Nearest Neighbors (K-NN) Code . . . . . . . . . . . . . . . . . . . . . . . 35
15 Support Vector Machine Code . . . . . . . . . . . . . . . . . . . . . . . . . . 37
16 Kernel SVM Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
17 Naive Bayes Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
18 Decision Tree Classification Code . . . . . . . . . . . . . . . . . . . . . . . . 48
19 Random Forest Classification Code . . . . . . . . . . . . . . . . . . . . . . . 50
20 K-Means Clustering Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
21 Hierarchical Clustering Code . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
22 Artificial Neural Networks Code . . . . . . . . . . . . . . . . . . . . . . . . . 68
23 Convolutional Neural Networks Code . . . . . . . . . . . . . . . . . . . . . . 76
24 Principal Component Analysis (PCA) Code . . . . . . . . . . . . . . . . . . 79
25 Linear Discriminant Analysis Code . . . . . . . . . . . . . . . . . . . . . . . 81
26 Kernel PCA Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
27 K-Fold Cross Validation Code . . . . . . . . . . . . . . . . . . . . . . . . . . 86
28 Grid Search Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
29 XGBoost Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
Abstract
This article covers how machine learning is applied to a data set to reach accurate
results, the types of learning models, how a data set is prepared for the learning process,
natural language processing applications, and the main neural network types. By the end of
the report the reader is expected to have gained an overview of all of these artificial
intelligence applications. Each process is explained step by step, and the corresponding
code and graphs are included in the report to show how to apply it. The data sets themselves
are large and interchangeable, so they are not included in the report; readers can apply
their own data sets to deepen their knowledge of machine learning. All code is written in
Python 3, which was chosen because a large amount of open-source documentation and example
material is available for it.
1 INTRODUCTION
In machine learning, a person chooses or builds the underlying algorithm, but the algorithm
itself learns the parameters that form a mathematical model for making forecasts, rather than
relying on direct human intervention. Human beings do not know or set these parameters; the
computer does. In other words, a mathematical model is trained on a collection of data so that
it learns what to do with similar data it sees in the future. Models usually take data as
input and then generate a forecast of some quantity of interest.
Managers do not have to be specialists in machine learning, but they should have at least a
basic understanding of it. If you can grasp the specific kinds of things ML can do, you will
know where to start and what to reach for, and you will not have to blindly tell the technical
staff to "go do magic" and hope they succeed. In this report we walk through machine-learning
techniques, explore the mathematical and computational ideas that make them possible, work
through the fundamental problems of ML, and delve into deep learning. All in all, the goal is
a more fruitful dialogue with data scientists and engineers.
2 Types of Compilers
Two languages dominate machine learning applications: Python and R (both are usually run
through interpreters rather than compilers). In addition, many IDEs (Integrated Development
Environments) are available for them. An IDE helps the programmer consolidate the various
facets of writing a computer program, and it improves the programmer's effectiveness by
integrating the common software-writing tasks into a single application: source code editing,
executable creation, and debugging. Some widely used IDEs are listed below.
• PyCharm
• Kite
• Spyder
• IDLE
• Sublime Text 3
• Atom
• Jupyter
• Pydev
• Thonny
• Wing
• ActivePython
• ...
3 Types of Libraries
• Built-In Libraries
• Standard Libraries
• 3rd Party Libraries
3.1 Built-In Libraries
Built-in functions are always available in Python without any import. Some of them are:
• abs
• all
• any
• bin
• dir
• enumerate
• isinstance
• iter
• bytes
• compile
• ...
3.2 Standard Libraries
Standard library modules ship with every Python installation and are used through a simple import. Some of them are:
• time
• sys
• os
• math
• random
• pickle
• urllib
• re
• cgi
• socket
• prefix
• suffix
• warnings
• locale
• ...
For instance, the time module counts seconds and lets the program act at exact moments:
Python can print a result to the console at each time step, which gives us control over the
sequence of information in an exact time period.
The os module, on the other hand, gives access to directories and to the operating system.
For instance, we can create or inspect directory entries and run a script only if it exists:
the code checks for the existence of the files and then runs them.
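A small sketch of both ideas follows; the file name checked here is only an example, not one from the report.
import time
import os

for step in range(3):
    print('step', step)          # one result per time period on the console
    time.sleep(1)                # wait exactly one second between results

script = 'analysis.py'           # hypothetical file name
if os.path.exists(script):       # check the existence of the file before running it
    os.system('python3 ' + script)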
3.3 3rd Party Libraries
These are among the most commonly used Python libraries because of how much functionality
they add. They are imported from outside the standard distribution, and most of them are open
source. For instance, plain Python does not provide a 'mean' function, so we cannot directly
compute the mean of our data; with NumPy, however, we can do it with a single call. Each of
these libraries targets a different kind of application. Since this report focuses on machine
learning and artificial intelligence, we use some of them to compute our predictions and to
prepare our data set to train the machine.
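For example, the mean of a data column can be computed with a single NumPy call:
import numpy as np
data = [12.0, 15.5, 11.2, 14.3]   # example values
print(np.mean(data))              # 13.25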
• TensorFlow
• Scikit-Learn
• Numpy
• Keras
• PyTorch
• LightGBM
• Eli5
• SciPy
• Theano
• Pandas
• MatPlotLib
In this report we mostly use TensorFlow for the ANN applications, Scikit-Learn for classical
machine learning, NumPy for mathematical operations, Pandas for importing the data sets, and
Matplotlib for visualizing the results.
4 Learning Process
• Data Preprocessing
• Regression
• Classification
• Clustering
• Association Rule Learning
• Reinforcement Learning
• Natural Language Processing
• Deep Learning
• Dimensionality Reduction
• Model Selection and Boosting
4.1 Data Preprocessing
The data preprocessing step includes several operations that prepare the data set for the
training steps. These operations are important for the training process to reach an optimum
learning result. They are:
• Importing Libraries
• Importing Data Set
• Optimization of Missing Data
• Encoding Categorical Data
• Arranging Training and Test Data
• Feature Scaling
4.1.1 Import Libraries
Importing libraries is the first step of coding the application. The necessary 3rd party
libraries are imported into the code in this part. As mentioned in the previous section, the
chosen libraries are imported with the following syntax.
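The original Listing 1 is not reproduced here; a minimal sketch of the imports used throughout this report (np, plt, and pd are the conventional aliases) is:
# Importing the libraries
import numpy as np                # numerical arrays and linear algebra
import matplotlib.pyplot as plt   # plotting the results
import pandas as pd               # loading and handling the data set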
4.1.2 Import Data Set
The second step of preprocessing is importing the data. The data set used in the training
process is loaded in this step. Using the Pandas library, the data set is read into the code
and split into the independent and dependent variables. The data set import is applied with
the following code.
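The original Listing 2 is not reproduced here; a minimal sketch, assuming a file such as Data.csv (a hypothetical name) with the dependent variable in the last column, is:
# Importing the dataset
dataset = pd.read_csv('Data.csv')   # hypothetical file name
X = dataset.iloc[:, :-1].values     # independent variables (all columns but the last)
y = dataset.iloc[:, -1].values      # dependent variable (last column)
print(X)
print(y)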
4.1.3 Optimization of Missing Data
A data set can contain missing entries, and missing data is a problem for the training
process. That is why we apply a rule such as replacing a missing entry with the mean of the
whole column. Missing data optimization is applied with the following code.
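The original Listing 3 is not reproduced here; a minimal sketch using scikit-learn's SimpleImputer (the column range 1:3 is only an example) is:
# Taking care of missing data
import numpy as np
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer.fit(X[:, 1:3])                     # learn the mean of each selected column
X[:, 1:3] = imputer.transform(X[:, 1:3])   # replace missing entries with those means
print(X)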
4.1.4 Encoding Categorical Data
Some entries in the data set are strings, i.e. words or labels. These entries have to be
converted to numerical values so that they can be understood by the machine, which is why
this step is applied to the string columns. Encoding the categorical data is applied with
the following code.
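Lines 1-6 of the original listing are not reproduced here; a sketch of the missing part, which one-hot encodes an independent categorical column (the column index 0 is only an example), is:
# Encoding the Independent Variable
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [0])], remainder='passthrough')
X = np.array(ct.fit_transform(X))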
7 print ( X )
8 # Encoding the Dependent Variable
9 from sklearn . preprocessing import LabelEncoder
10 le = LabelEncoder ()
11 y = le . fit_transform ( y )
12 print ( y )
Listing 4: Encoding Categorical Data Code
4.1.5 Arranging Training and Test Data
The learning process has to be checked against data it has not seen, in order to measure its
accuracy. For this reason the data set is split into two parts: training data and test data.
The training data is used for the learning process, while the test part is used to check that
the learned model gives correct results. Arranging the training and test parts of the data
set is applied with the following code.
1 # Splitting the dataset into the Training set and Test set
2 from sklearn . model_selection import train_test_split
3 X_train , X_test , y_train , y_test = train_test_split (X , y , test_size = 0.2 ,
random_state = 1)
4 print ( X_train )
5 print ( X_test )
6 print ( y_train )
7 print ( y_test )
Listing 5: Arranging Training and Test Data Code
4.1.6 Feature Scaling
Each column of the data set contains values with its own units. To compare these different
values we need to normalize or standardize them. Feature scaling rescales the values of a
column into non-dimensional values, which makes them comparable with the values in the other
columns. Feature scaling is applied with the following code.
1 # Feature Scaling
2 from sklearn . preprocessing import StandardScaler
3 sc = StandardScaler ()
4 X_train [: , 3:] = sc . fit_transform ( X_train [: , 3:])
5 X_test [: , 3:] = sc . transform ( X_test [: , 3:])
6 print ( X_train )
7 print ( X_test )
Listing 6: Feature Scaling Code
4.2 Regression
Regression models are studied under the following subtopics:
• Simple Linear Regression
• Multiple Linear Regression
• Polynomial Regression
• Support Vector Regression (SVR)
• Decision Tree Regression
• Random Forest Regression
4.2.1 Simple Linear Regression
Simple linear regression is applied to predict a simple linear relation in the training data
set. The well-known formula of a linear function is
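In the usual notation, with $b_0$ the intercept and $b_1$ the slope,
$$ y = b_0 + b_1 x_1 $$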
Simple linear regression is used for data sets with a single independent and a single
dependent variable. Multiplying the independent value by a coefficient and adding the
intercept gives the dependent value. At the end of the process the regression module has
obtained the coefficient that relates the dependent and independent values of the data set.
This relation should be close to linear, otherwise the prediction accuracy of the model
suffers.
With respect to the obtained linear trend, the machine makes predictions along this line.
The sum of the squares of the differences between the observed and predicted values gives
the error of the prediction.
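In symbols, with $y_i$ the observed and $\hat{y}_i$ the predicted values,
$$ \mathrm{SSE} = \sum_i (y_i - \hat{y}_i)^2 $$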
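Only a fragment of the original Listing 7 survived; a minimal sketch of a simple linear regression in scikit-learn, assuming a hypothetical single-feature file such as Salary_Data.csv, is:
# Simple Linear Regression
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

dataset = pd.read_csv('Salary_Data.csv')    # hypothetical file name
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)             # learn the intercept and the slope
y_pred = regressor.predict(X_test)          # predict on unseen data

plt.scatter(X_train, y_train, color = 'red')
plt.plot(X_train, regressor.predict(X_train), color = 'blue')
plt.title('Simple Linear Regression (Training set)')
plt.show()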
Figure 4: Simple Linear Regression Applications: (a) Training Data, (b) Test Data
4.2.2 Multiple Linear Regression
Multiple linear regression is similar to simple linear regression; however, in this model
there is more than one independent variable as input.
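With $n$ independent variables the function becomes
$$ y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n $$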
There are several methods for selecting variables when building a multiple linear regression model, which are:
• Backward Elimination
• Forward Selection
• Bidirectional Elimination
– Select a significance level to enter and to stay in the model (e.g. SLENTER = 0.05,
SLSTAY = 0.05)
– Perform the next step of Forward Selection (a new variable must have P < SLENTER to enter)
– Perform ALL steps of Backward Elimination (old variables must have P < SLSTAY to stay)
– Repeat until no new variables can enter and no old variables can exit
For training the model with multiple linear regression, the following Python code can be used.
1 # Multiple Linear Regression
2
3 # Importing the libraries
4 import numpy as np
5 import matplotlib . pyplot as plt
6 import pandas as pd
7
8 # Importing the dataset
9 dataset = pd.read_csv('50_Startups.csv')
10 X = dataset . iloc [: , : -1]. values
11 y = dataset . iloc [: , -1]. values
12 print ( X )
13
14 # Encoding categorical data
15 from sklearn . compose import ColumnTransformer
16 from sklearn . preprocessing import OneHotEncoder
17 ct = ColumnTransformer(transformers = [('encoder', OneHotEncoder(), [3])], remainder = 'passthrough')
18 X = np . array ( ct . fit_transform ( X ) )
19 print ( X )
20
21 # Splitting the dataset into the Training set and Test set
22 from sklearn . model_selection import train_test_split
23 X_train , X_test , y_train , y_test = train_test_split (X , y , test_size = 0.2 ,
random_state = 0)
24
25 # Training the Multiple Linear Regression model on the Training set
26 from sklearn . linear_model import LinearRegression
27 regressor = LinearRegression ()
28 regressor . fit ( X_train , y_train )
29
4.2.3 Polynomial Regression
Polynomial regression is a polynomial approach to the training data set that improves
accuracy. Nonlinear relations between the independent and dependent variables cause errors
when a linear regression is used, which is why polynomial regression is useful for the
training process. As shown in the following figure, the model stays linear in its
coefficients while polynomial terms of the independent variable are added. With this model
the error caused by the nonlinear relation gets smaller.
The training trend line looks like the following figure: each added polynomial term lets the
curve follow the data with a more accurate trend.
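In formula form, with a single feature $x$,
$$ y = b_0 + b_1 x + b_2 x^2 + \dots + b_n x^n $$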
Figure 7: Polynomial Regression Approximation
The following code written in Python is useful to apply this model on training process.
1 # Polynomial Regression
2
3 # Importing the libraries
4 import numpy as np
5 import matplotlib . pyplot as plt
6 import pandas as pd
7
8 # Importing the dataset
9 dataset = pd.read_csv('Position_Salaries.csv')
10 X = dataset . iloc [: , 1: -1]. values
11 y = dataset . iloc [: , -1]. values
12
13 # Training the Linear Regression model on the whole dataset
14 from sklearn . linear_model import LinearRegression
15 lin_reg = LinearRegression ()
16 lin_reg . fit (X , y )
17
32
33 # Visualising the Polynomial Regression results
34 plt . scatter (X , y , color = ’ red ’)
35 plt . plot (X , lin_reg_2 . predict ( poly_reg . fit_transform ( X ) ) , color = ’ blue ’)
36 plt . title ( ’ Truth or Bluff ( Polynomial Regression ) ’)
37 plt . xlabel ( ’ Position level ’)
38 plt . ylabel ( ’ Salary ’)
39 plt . show ()
40
41 # Visualising the Polynomial Regression results ( for higher resolution and
smoother curve )
42 X_grid = np . arange ( min ( X ) , max ( X ) , 0.1)
43 X_grid = X_grid . reshape (( len ( X_grid ) , 1) )
44 plt . scatter (X , y , color = ’ red ’)
45 plt . plot ( X_grid , lin_reg_2 . predict ( poly_reg . fit_transform ( X_grid ) ) , color
= ’ blue ’)
46 plt . title ( ’ Truth or Bluff ( Polynomial Regression ) ’)
47 plt . xlabel ( ’ Position level ’)
48 plt . ylabel ( ’ Salary ’)
49 plt . show ()
50
51 # Predicting a new result with Linear Regression
52 lin_reg . predict ([[6.5]])
53
54 # Predicting a new result with Polynomial Regression
55 lin_reg_2 . predict ( poly_reg . fit_transform ([[6.5]]) )
Listing 9: Polynomial Regression Code
In the figure below, the training data set shows a larger error when linear regression is
used, whereas polynomial regression changes the trend line to the minimum-error version.
Evaluating the curve on a finer grid gives the smooth version shown in the next figure. The
polynomial degree is a parameter of this method: a higher degree makes the prediction on the
training data more accurate, as the curve approaches the training points with minimum error.
4.2.4 Support Vector Regression (SVR)
Support vector regression fits a prediction line to the data set, in the spirit of ordinary
least squares. An ε-insensitive tube is then drawn around the prediction line between +ε and
−ε, as shown in the following figure; points that fall inside the tube contribute no error,
which makes the prediction more robust around the data nearest to the line. The model
minimizes
$$ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{m} (\xi_i + \xi_i^*) \;\longrightarrow\; \min $$
where $\xi_i$ and $\xi_i^*$ are the distances of the points lying above and below the tube.
For the application of this model in the training process, the following Python code can be
a useful starting point.
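The earlier lines of Listing 10 are not reproduced here; a minimal sketch of the training part (feature scaling plus an RBF-kernel SVR, with the reshaping chosen here as an assumption) is:
# Support Vector Regression (SVR)
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:-1].values
y = dataset.iloc[:, -1].values.reshape(-1, 1)        # 2D target so it can be scaled

from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
sc_y = StandardScaler()
X = sc_X.fit_transform(X)
y = sc_y.fit_transform(y)

from sklearn.svm import SVR
regressor = SVR(kernel = 'rbf')                      # Radial Basis Function kernel
regressor.fit(X, y.ravel())

# Predict a new result: scale the input, then undo the scaling of the output
sc_y.inverse_transform(regressor.predict(sc_X.transform([[6.5]])).reshape(-1, 1))

# Fine grid in the original units for the smooth plot below
X_orig = sc_X.inverse_transform(X)
X_grid = np.arange(X_orig.min(), X_orig.max(), 0.1).reshape(-1, 1)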
45 plt.scatter(sc_X.inverse_transform(X), sc_y.inverse_transform(y), color = 'red')
46 plt.plot(X_grid, sc_y.inverse_transform(regressor.predict(sc_X.transform(X_grid))), color = 'blue')
47 plt.title('Truth or Bluff (SVR)')
48 plt.xlabel('Position level')
49 plt.ylabel('Salary')
50 plt.show()
Listing 10: Support Vector Regression Code
Support Vector Regression application on the training data set is illustrated with
following figures.
Figure 12: Support Vector Regression (SVR) Application on Training Data with Radial
Basis Function (RBF)
4.2.5 Decision Tree Regression
Decision tree regression is based on building a tree of splits over the independent
variables. As shown in the following figure, all values from the data set lie in a coordinate
system that is divided by boundaries.
For example, the region X1 < 20 is split into two parts in the X2 direction, X2 < 200 and
X2 > 200. This is illustrated in the following figure as the first separation in the tree.
Then, for each region, the next separation is added to the tree. Finally, the value assigned
to the leaf reached in the last step gives the prediction for a new input. The decision tree
regression code works on this principle.
The following Python code is an example of the application of decision tree regression.
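Only the plotting lines of the original Listing 11 survive below; a minimal sketch of the training part, assuming the same Position_Salaries.csv data set, is:
# Decision Tree Regression
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:-1].values
y = dataset.iloc[:, -1].values

from sklearn.tree import DecisionTreeRegressor
regressor = DecisionTreeRegressor(random_state = 0)
regressor.fit(X, y)                    # build the tree of splits
regressor.predict([[6.5]])             # predict a new result

# A fine grid shows the stepwise shape of the tree predictions
X_grid = np.arange(X.min(), X.max(), 0.01).reshape(-1, 1)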
24 plt . scatter (X , y , color = ’ red ’)
25 plt . plot ( X_grid , regressor . predict ( X_grid ) , color = ’ blue ’)
26 plt . title ( ’ Truth or Bluff ( Decision Tree Regression ) ’)
27 plt . xlabel ( ’ Position level ’)
28 plt . ylabel ( ’ Salary ’)
29 plt . show ()
Listing 11: Decision Tree Regression Code
The decision tree regression model is best suited to data sets with multiple input features.
That is why the graph in the following figure, which is based on a single feature, is less
accurate.
4.2.6 Random Forest Regression
Random forest regression is essentially a repeated application of decision tree regression.
As the names suggest, decision tree regression has only one tree in the learning model,
whereas random forest regression has many trees, like a forest. The steps are:
• Pick K data points at random from the training set and build a decision tree on them
• Choose the number Ntree of trees you want to build and repeat the previous step
• For a new data point, make each one of your Ntree trees predict the value of Y for the
data point in question, and assign to the new data point the average of all the predicted
Y values
The following Python code applies the random forest regression model to the training data.
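Since the original Listing 12 is not reproduced here, a minimal sketch using scikit-learn's RandomForestRegressor (10 trees, as in the ensemble examples elsewhere in this report) is:
# Random Forest Regression
import numpy as np
import pandas as pd

dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:-1].values
y = dataset.iloc[:, -1].values

from sklearn.ensemble import RandomForestRegressor
regressor = RandomForestRegressor(n_estimators = 10, random_state = 0)   # Ntree = 10
regressor.fit(X, y)
regressor.predict([[6.5]])    # average of the predictions of the 10 trees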
Figure 16: Random Forest Regression Application on Training Data
4.3 Classification
Generally, a classification model is used to assign data set values, possibly with multiple
inputs, to the class of the group they belong to. We will examine all of these models in this
part. Classification is studied under the following subtopics:
• Logistic Regression
• K-Nearest Neighbors (K-NN)
• Support Vector Machine (SVM)
• Kernel SVM
• Naive Bayes
• Decision Tree Classification
• Random Forest Classification
4.3.1 Logistic Regression
Logistic regression uses a probability approach to predict a new result. The data set
contains inputs with their outputs, and the machine models the probability of each output
over this data set.
Figure 17: Probability Distribution of Data Set
After this step a reference value on the probability curve is chosen. The machine uses this
reference value for the prediction: probabilities below and above it give the two different
predictions for a new input.
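The probability curve is the sigmoid function applied to a linear combination of the inputs,
$$ p = \frac{1}{1 + e^{-(b_0 + b_1 x_1)}} $$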
To apply logistic regression, the following code written in Python is a good modelling
example.
1 # Logistic Regression
2
3 # Importing the libraries
4 import numpy as np
5 import matplotlib . pyplot as plt
6 import pandas as pd
7
8 # Importing the dataset
9 dataset = pd.read_csv('Social_Network_Ads.csv')
10 X = dataset . iloc [: , : -1]. values
11 y = dataset . iloc [: , -1]. values
12
13 # Splitting the dataset into the Training set and Test set
14 from sklearn . model_selection import train_test_split
15 X_train , X_test , y_train , y_test = train_test_split (X , y , test_size =
0.25 , random_state = 0)
16 print ( X_train )
17 print ( y_train )
18 print ( X_test )
19 print ( y_test )
20
21 # Feature Scaling
22 from sklearn . preprocessing import StandardScaler
23 sc = StandardScaler ()
24 X_train = sc . fit_transform ( X_train )
25 X_test = sc . transform ( X_test )
26 print ( X_train )
27 print ( X_test )
28
29 # Training the Logistic Regression model on the Training set
30 from sklearn . linear_model import LogisticRegression
31 classifier = LogisticRegression ( random_state = 0)
32 classifier . fit ( X_train , y_train )
33
34 # Predicting a new result
35 print ( classifier . predict ( sc . transform ([[30 ,87000]]) ) )
36
37 # Predicting the Test set results
38 y_pred = classifier . predict ( X_test )
39 print ( np . concatenate (( y_pred . reshape ( len ( y_pred ) ,1) , y_test . reshape ( len (
y_test ) ,1) ) ,1) )
40
41 # Making the Confusion Matrix
42 from sklearn . metrics import confusion_matrix , accuracy_score
43 cm = confusion_matrix ( y_test , y_pred )
44 print ( cm )
45 accuracy_score ( y_test , y_pred )
46
47 # Visualising the Training set results
48 from matplotlib . colors import ListedColormap
49 X_set , y_set = sc . inverse_transform ( X_train ) , y_train
50 X1 , X2 = np . meshgrid ( np . arange ( start = X_set [: , 0]. min () - 10 , stop =
X_set [: , 0]. max () + 10 , step = 0.25) ,
51 np . arange ( start = X_set [: , 1]. min () - 1000 , stop =
X_set [: , 1]. max () + 1000 , step = 0.25) )
52 plt . contourf ( X1 , X2 , classifier . predict ( sc . transform ( np . array ([ X1 . ravel () ,
X2 . ravel () ]) . T ) ) . reshape ( X1 . shape ) ,
53 alpha = 0.75 , cmap = ListedColormap (( ’ red ’ , ’ green ’) ) )
54 plt . xlim ( X1 . min () , X1 . max () )
55 plt . ylim ( X2 . min () , X2 . max () )
56 for i , j in enumerate ( np . unique ( y_set ) ) :
57 plt . scatter ( X_set [ y_set == j , 0] , X_set [ y_set == j , 1] , c =
ListedColormap (( ’ red ’ , ’ green ’) ) ( i ) , label = j )
58 plt . title ( ’ Logistic Regression ( Training set ) ’)
59 plt . xlabel ( ’ Age ’)
60 plt . ylabel ( ’ Estimated Salary ’)
61 plt . legend ()
62 plt . show ()
63
Figure 19: Logistic Regression Application: (a) Training Data, (b) Test Data
4.3.2 K-Nearest Neighbors (K-NN)
The K-nearest neighbors classification model is based on the Euclidean distance, which is
formulated as
$$ \mathrm{Distance} = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} $$
First of all, the value of K, the number of neighbors, is determined. In the following
figure, K = 5.
Figure 21: Categories and New Data Position
When a new data input is entered, the machine determines which training points have the
smallest Euclidean distance to it. In the end, the new data point is assigned to the class
of those neighbors. The steps are:
• Take the K nearest neighbor of the new data point, according to the Euclidean Distance
• Among these K neighbors, count the number of data points in each category
• Assign the new data point to the category where you counted the most neighbors
• The model is ready
To apply this model to your own data set, the following code written in Python is a good
example.
1 # K - Nearest Neighbors (K - NN )
2
3 # Importing the libraries
4 import numpy as np
5 import matplotlib . pyplot as plt
6 import pandas as pd
7
8 # Importing the dataset
9 dataset = pd.read_csv('Social_Network_Ads.csv')
10 X = dataset . iloc [: , : -1]. values
11 y = dataset . iloc [: , -1]. values
12
13 # Splitting the dataset into the Training set and Test set
14 from sklearn . model_selection import train_test_split
15 X_train , X_test , y_train , y_test = train_test_split (X , y , test_size =
0.25 , random_state = 0)
16 print ( X_train )
17 print ( y_train )
18 print ( X_test )
19 print ( y_test )
20
21 # Feature Scaling
22 from sklearn . preprocessing import StandardScaler
23 sc = StandardScaler ()
24 X_train = sc . fit_transform ( X_train )
25 X_test = sc . transform ( X_test )
26 print ( X_train )
27 print ( X_test )
28
29 # Training the K - NN model on the Training set
30 from sklearn . neighbors import KNeighborsClassifier
31 classifier = KNeighborsClassifier ( n_neighbors = 5 , metric = ’ minkowski ’ , p
= 2)
32 classifier . fit ( X_train , y_train )
33
34 # Predicting a new result
35 print ( classifier . predict ( sc . transform ([[30 ,87000]]) ) )
36
37 # Predicting the Test set results
38 y_pred = classifier . predict ( X_test )
39 print ( np . concatenate (( y_pred . reshape ( len ( y_pred ) ,1) , y_test . reshape ( len (
y_test ) ,1) ) ,1) )
40
41 # Making the Confusion Matrix
42 from sklearn . metrics import confusion_matrix , accuracy_score
43 cm = confusion_matrix ( y_test , y_pred )
44 print ( cm )
45 accuracy_score ( y_test , y_pred )
46
47 # Visualising the Training set results
48 from matplotlib . colors import ListedColormap
49 X_set , y_set = sc . inverse_transform ( X_train ) , y_train
50 X1 , X2 = np . meshgrid ( np . arange ( start = X_set [: , 0]. min () - 10 , stop =
X_set [: , 0]. max () + 10 , step = 1) ,
51 np . arange ( start = X_set [: , 1]. min () - 1000 , stop =
X_set [: , 1]. max () + 1000 , step = 1) )
52 plt . contourf ( X1 , X2 , classifier . predict ( sc . transform ( np . array ([ X1 . ravel () ,
X2 . ravel () ]) . T ) ) . reshape ( X1 . shape ) ,
53 alpha = 0.75 , cmap = ListedColormap (( ’ red ’ , ’ green ’) ) )
54 plt . xlim ( X1 . min () , X1 . max () )
55 plt . ylim ( X2 . min () , X2 . max () )
56 for i , j in enumerate ( np . unique ( y_set ) ) :
57 plt . scatter ( X_set [ y_set == j , 0] , X_set [ y_set == j , 1] , c =
ListedColormap (( ’ red ’ , ’ green ’) ) ( i ) , label = j )
58 plt . title ( ’K - NN ( Training set ) ’)
59 plt . xlabel ( ’ Age ’)
60 plt . ylabel ( ’ Estimated Salary ’)
61 plt . legend ()
62 plt . show ()
63
64 # Visualising the Test set results
65 from matplotlib . colors import ListedColormap
66 X_set , y_set = sc . inverse_transform ( X_test ) , y_test
67 X1 , X2 = np . meshgrid ( np . arange ( start = X_set [: , 0]. min () - 10 , stop =
X_set [: , 0]. max () + 10 , step = 1) ,
68 np . arange ( start = X_set [: , 1]. min () - 1000 , stop =
X_set [: , 1]. max () + 1000 , step = 1) )
69 plt . contourf ( X1 , X2 , classifier . predict ( sc . transform ( np . array ([ X1 . ravel () ,
X2 . ravel () ]) . T ) ) . reshape ( X1 . shape ) ,
70 alpha = 0.75 , cmap = ListedColormap (( ’ red ’ , ’ green ’) ) )
71 plt . xlim ( X1 . min () , X1 . max () )
72 plt . ylim ( X2 . min () , X2 . max () )
73 for i , j in enumerate ( np . unique ( y_set ) ) :
74 plt . scatter ( X_set [ y_set == j , 0] , X_set [ y_set == j , 1] , c =
ListedColormap (( ’ red ’ , ’ green ’) ) ( i ) , label = j )
75 plt . title ( ’K - NN ( Test set ) ’)
76 plt . xlabel ( ’ Age ’)
77 plt . ylabel ( ’ Estimated Salary ’)
78 plt . legend ()
79 plt . show ()
Listing 14: K-Nearest Neighbors (K-NN) Code
As seen in the next two figures, the decision boundary is not linear. That is why the
accuracy of this model is higher than that of logistic regression on this data set.
Figure 23: K-Nearest Neighbors (K-NN) Application: (a) Training Data, (b) Test Data
4.3.3 Support Vector Machine (SVM)
Support vector machine classification is similar to support vector regression. The main idea
is to obtain a line that lies at an equal distance from the support vectors of the two
classes. This maximum-margin line is flanked by a negative and a positive hyperplane, and it
is the boundary used for the prediction.
3 # Importing the libraries
4 import numpy as np
5 import matplotlib . pyplot as plt
6 import pandas as pd
7
8 # Importing the dataset
9 dataset = pd.read_csv('Social_Network_Ads.csv')
10 X = dataset . iloc [: , : -1]. values
11 y = dataset . iloc [: , -1]. values
12
13 # Splitting the dataset into the Training set and Test set
14 from sklearn . model_selection import train_test_split
15 X_train , X_test , y_train , y_test = train_test_split (X , y , test_size =
0.25 , random_state = 0)
16 print ( X_train )
17 print ( y_train )
18 print ( X_test )
19 print ( y_test )
20
21 # Feature Scaling
22 from sklearn . preprocessing import StandardScaler
23 sc = StandardScaler ()
24 X_train = sc . fit_transform ( X_train )
25 X_test = sc . transform ( X_test )
26 print ( X_train )
27 print ( X_test )
28
29 # Training the SVM model on the Training set
30 from sklearn . svm import SVC
31 classifier = SVC ( kernel = ’ linear ’ , random_state = 0)
32 classifier . fit ( X_train , y_train )
33
X2 . ravel () ]) . T ) ) . reshape ( X1 . shape ) ,
53 alpha = 0.75 , cmap = ListedColormap (( ’ red ’ , ’ green ’) ) )
54 plt . xlim ( X1 . min () , X1 . max () )
55 plt . ylim ( X2 . min () , X2 . max () )
56 for i , j in enumerate ( np . unique ( y_set ) ) :
57 plt . scatter ( X_set [ y_set == j , 0] , X_set [ y_set == j , 1] , c =
ListedColormap (( ’ red ’ , ’ green ’) ) ( i ) , label = j )
58 plt . title ( ’ SVM ( Training set ) ’)
59 plt . xlabel ( ’ Age ’)
60 plt . ylabel ( ’ Estimated Salary ’)
61 plt . legend ()
62 plt . show ()
63
64 # Visualising the Test set results
65 from matplotlib . colors import ListedColormap
66 X_set , y_set = sc . inverse_transform ( X_test ) , y_test
67 X1 , X2 = np . meshgrid ( np . arange ( start = X_set [: , 0]. min () - 10 , stop =
X_set [: , 0]. max () + 10 , step = 0.25) ,
68 np . arange ( start = X_set [: , 1]. min () - 1000 , stop =
X_set [: , 1]. max () + 1000 , step = 0.25) )
69 plt . contourf ( X1 , X2 , classifier . predict ( sc . transform ( np . array ([ X1 . ravel () ,
X2 . ravel () ]) . T ) ) . reshape ( X1 . shape ) ,
70 alpha = 0.75 , cmap = ListedColormap (( ’ red ’ , ’ green ’) ) )
71 plt . xlim ( X1 . min () , X1 . max () )
72 plt . ylim ( X2 . min () , X2 . max () )
73 for i , j in enumerate ( np . unique ( y_set ) ) :
74 plt . scatter ( X_set [ y_set == j , 0] , X_set [ y_set == j , 1] , c =
ListedColormap (( ’ red ’ , ’ green ’) ) ( i ) , label = j )
75 plt . title ( ’ SVM ( Test set ) ’)
76 plt . xlabel ( ’ Age ’)
77 plt . ylabel ( ’ Estimated Salary ’)
78 plt . legend ()
79 plt . show ()
Listing 15: Support Vector Machine Code
The hyperplanes and the maximum margin are shown in the following figures. As you can see,
some data points from the green class fall on the red side of the hyperplane. The machine
predicts new data entries with this model.
Figure 25: Support Vector Machine (SVM) Application: (a) Training Data, (b) Test Data
4.3.4 Kernel SVM
As in the previous kernel applications, this model solves the prediction with a Radial Basis
Function inside the support vector machine. The only difference from the plain SVM is that
the decision boundary is no longer a linear approximation: the data set is mapped into a
higher-dimensional space and separated there. There are different types of kernels to improve
the accuracy of the SVM model, which are shown in the following figure.
Figure 27: Kernels of Mapping Function
The Gaussian RBF kernel, shown in the following figure, is a common example. The kernel maps
the 2D data set into a higher-dimensional space with the following formula:
$$ K(\vec{x}, \vec{l}^{\,i}) = e^{-\frac{\|\vec{x} - \vec{l}^{\,i}\|^2}{\sigma^2}} $$
where $\vec{l}^{\,i}$ is a landmark point and $\sigma$ controls the width of the kernel.
1 # Kernel SVM
2
3 # Importing the libraries
4 import numpy as np
5 import matplotlib . pyplot as plt
6 import pandas as pd
7
55 plt . ylim ( X2 . min () , X2 . max () )
56 for i , j in enumerate ( np . unique ( y_set ) ) :
57 plt . scatter ( X_set [ y_set == j , 0] , X_set [ y_set == j , 1] , c =
ListedColormap (( ’ red ’ , ’ green ’) ) ( i ) , label = j )
58 plt . title ( ’ Kernel SVM ( Training set ) ’)
59 plt . xlabel ( ’ Age ’)
60 plt . ylabel ( ’ Estimated Salary ’)
61 plt . legend ()
62 plt . show ()
63
64 # Visualising the Test set results
65 from matplotlib . colors import ListedColormap
66 X_set , y_set = sc . inverse_transform ( X_test ) , y_test
67 X1 , X2 = np . meshgrid ( np . arange ( start = X_set [: , 0]. min () - 10 , stop =
X_set [: , 0]. max () + 10 , step = 0.25) ,
68 np . arange ( start = X_set [: , 1]. min () - 1000 , stop =
X_set [: , 1]. max () + 1000 , step = 0.25) )
69 plt . contourf ( X1 , X2 , classifier . predict ( sc . transform ( np . array ([ X1 . ravel () ,
X2 . ravel () ]) . T ) ) . reshape ( X1 . shape ) ,
70 alpha = 0.75 , cmap = ListedColormap (( ’ red ’ , ’ green ’) ) )
71 plt . xlim ( X1 . min () , X1 . max () )
72 plt . ylim ( X2 . min () , X2 . max () )
73 for i , j in enumerate ( np . unique ( y_set ) ) :
74 plt . scatter ( X_set [ y_set == j , 0] , X_set [ y_set == j , 1] , c =
ListedColormap (( ’ red ’ , ’ green ’) ) ( i ) , label = j )
75 plt . title ( ’ Kernel SVM ( Test set ) ’)
76 plt . xlabel ( ’ Age ’)
77 plt . ylabel ( ’ Estimated Salary ’)
78 plt . legend ()
79 plt . show ()
Listing 16: Kernel SVM Code
As shown in the following figures, the kernel SVM has a nonlinear boundary, in contrast to
the classical SVM. This improves the accuracy of the prediction.
4.3.5 Naive Bayes
Naive Bayes classification is based on Bayes' probability theorem,
$$ P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)} $$
In this model, Bayes' theorem is applied to each class, as in the following figures.
Figure 30: Probabilities of Two Different Classes on the Data Set: (a) Class A Probability Related to Total Probability, (b) Class B Probability Related to Total Probability
Then the comparison of the two class probabilities is used for the prediction on new data.
1 # Naive Bayes
2
3 # Importing the libraries
4 import numpy as np
5 import matplotlib . pyplot as plt
6 import pandas as pd
7
8 # Importing the dataset
9 dataset = pd.read_csv('Social_Network_Ads.csv')
10 X = dataset . iloc [: , : -1]. values
11 y = dataset . iloc [: , -1]. values
12
13 # Splitting the dataset into the Training set and Test set
14 from sklearn . model_selection import train_test_split
15 X_train , X_test , y_train , y_test = train_test_split (X , y , test_size =
0.25 , random_state = 0)
16 print ( X_train )
17 print ( y_train )
18 print ( X_test )
19 print ( y_test )
20
21 # Feature Scaling
22 from sklearn . preprocessing import StandardScaler
23 sc = StandardScaler ()
24 X_train = sc . fit_transform ( X_train )
25 X_test = sc . transform ( X_test )
26 print ( X_train )
27 print ( X_test )
28
29 # Training the Naive Bayes model on the Training set
30 from sklearn . naive_bayes import GaussianNB
31 classifier = GaussianNB ()
32 classifier . fit ( X_train , y_train )
33
34 # Predicting a new result
35 print ( classifier . predict ( sc . transform ([[30 ,87000]]) ) )
36
37 # Predicting the Test set results
38 y_pred = classifier . predict ( X_test )
39 print ( np . concatenate (( y_pred . reshape ( len ( y_pred ) ,1) , y_test . reshape ( len (
y_test ) ,1) ) ,1) )
40
65 from matplotlib . colors import ListedColormap
66 X_set , y_set = sc . inverse_transform ( X_test ) , y_test
67 X1 , X2 = np . meshgrid ( np . arange ( start = X_set [: , 0]. min () - 10 , stop =
X_set [: , 0]. max () + 10 , step = 0.25) ,
68 np . arange ( start = X_set [: , 1]. min () - 1000 , stop =
X_set [: , 1]. max () + 1000 , step = 0.25) )
69 plt . contourf ( X1 , X2 , classifier . predict ( sc . transform ( np . array ([ X1 . ravel () ,
X2 . ravel () ]) . T ) ) . reshape ( X1 . shape ) ,
70 alpha = 0.75 , cmap = ListedColormap (( ’ red ’ , ’ green ’) ) )
71 plt . xlim ( X1 . min () , X1 . max () )
72 plt . ylim ( X2 . min () , X2 . max () )
73 for i , j in enumerate ( np . unique ( y_set ) ) :
74 plt . scatter ( X_set [ y_set == j , 0] , X_set [ y_set == j , 1] , c =
ListedColormap (( ’ red ’ , ’ green ’) ) ( i ) , label = j )
75 plt . title ( ’ Naive Bayes ( Test set ) ’)
76 plt . xlabel ( ’ Age ’)
77 plt . ylabel ( ’ Estimated Salary ’)
78 plt . legend ()
79 plt . show ()
Listing 17: Naive Bayes Code
The result of applying this model is illustrated in the following figure, showing the
distribution of the data set.
4.3.6 Decision Tree Classification
Decision tree classification is similar to decision tree regression. The data set is again
separated into parts, as in the following figure.
Figure 32: Decision Tree Classification Separation of Data Set
Then, as in decision tree regression, the tree over the data set is determined and the
boundary of each class is specified. The only difference from decision tree regression is
that each leaf assigns a class to a new data entry instead of a numerical value.
1 # Decision Tree Classification
2
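# The preprocessing and training lines of the original listing are not reproduced here;
# a minimal sketch of the missing part, following the pattern of the other classifiers:
from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier(criterion = 'entropy', random_state = 0)   # split on information gain
classifier.fit(X_train, y_train)    # X_train and y_train prepared as in the previous listings
y_pred = classifier.predict(X_test)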
51 np . arange ( start = X_set [: , 1]. min () - 1000 , stop =
X_set [: , 1]. max () + 1000 , step = 0.25) )
52 plt . contourf ( X1 , X2 , classifier . predict ( sc . transform ( np . array ([ X1 . ravel () ,
X2 . ravel () ]) . T ) ) . reshape ( X1 . shape ) ,
53 alpha = 0.75 , cmap = ListedColormap (( ’ red ’ , ’ green ’) ) )
54 plt . xlim ( X1 . min () , X1 . max () )
55 plt . ylim ( X2 . min () , X2 . max () )
56 for i , j in enumerate ( np . unique ( y_set ) ) :
57 plt . scatter ( X_set [ y_set == j , 0] , X_set [ y_set == j , 1] , c =
ListedColormap (( ’ red ’ , ’ green ’) ) ( i ) , label = j )
58 plt . title ( ’ Decision Tree Classification ( Training set ) ’)
59 plt . xlabel ( ’ Age ’)
60 plt . ylabel ( ’ Estimated Salary ’)
61 plt . legend ()
62 plt . show ()
63
64 # Visualising the Test set results
65 from matplotlib . colors import ListedColormap
66 X_set , y_set = sc . inverse_transform ( X_test ) , y_test
67 X1 , X2 = np . meshgrid ( np . arange ( start = X_set [: , 0]. min () - 10 , stop =
X_set [: , 0]. max () + 10 , step = 0.25) ,
68 np . arange ( start = X_set [: , 1]. min () - 1000 , stop =
X_set [: , 1]. max () + 1000 , step = 0.25) )
69 plt . contourf ( X1 , X2 , classifier . predict ( sc . transform ( np . array ([ X1 . ravel () ,
X2 . ravel () ]) . T ) ) . reshape ( X1 . shape ) ,
70 alpha = 0.75 , cmap = ListedColormap (( ’ red ’ , ’ green ’) ) )
71 plt . xlim ( X1 . min () , X1 . max () )
72 plt . ylim ( X2 . min () , X2 . max () )
73 for i , j in enumerate ( np . unique ( y_set ) ) :
74 plt . scatter ( X_set [ y_set == j , 0] , X_set [ y_set == j , 1] , c =
ListedColormap (( ’ red ’ , ’ green ’) ) ( i ) , label = j )
75 plt . title ( ’ Decision Tree Classification ( Test set ) ’)
76 plt . xlabel ( ’ Age ’)
77 plt . ylabel ( ’ Estimated Salary ’)
78 plt . legend ()
79 plt . show ()
Listing 18: Decision Tree Classification Code
As shown in the following figures, this model produces more than one region per class. That
is why the prediction can reach high accuracy relative to the size of the data set.
Figure 34: Decision Tree Classification Application: (a) Training Data, (b) Test Data
4.3.7 Random Forest Classification
In this classification model the steps are the same as in random forest regression; the only
difference is that the prediction is a class rather than a value. The steps are:
• Pick K data points at random from the training set and build a decision tree on them
• Choose the number Ntree of trees you want to build and repeat the previous step
• For a new data point, make each one of your Ntree trees predict the class of the data
point in question, and assign the new data point to the class that wins the majority vote
The application of this classification model is given in the following Python code.
15 X_train , X_test , y_train , y_test = train_test_split (X , y , test_size =
0.25 , random_state = 0)
16 print ( X_train )
17 print ( y_train )
18 print ( X_test )
19 print ( y_test )
20
21 # Feature Scaling
22 from sklearn . preprocessing import StandardScaler
23 sc = StandardScaler ()
24 X_train = sc . fit_transform ( X_train )
25 X_test = sc . transform ( X_test )
26 print ( X_train )
27 print ( X_test )
28
29 # Training the Random Forest Classification model on the Training set
30 from sklearn.ensemble import RandomForestClassifier
31 classifier = RandomForestClassifier(n_estimators = 10, criterion = 'entropy', random_state = 0)
32 classifier . fit ( X_train , y_train )
33
34 # Predicting a new result
35 print ( classifier . predict ( sc . transform ([[30 ,87000]]) ) )
36
37 # Predicting the Test set results
38 y_pred = classifier . predict ( X_test )
39 print ( np . concatenate (( y_pred . reshape ( len ( y_pred ) ,1) , y_test . reshape ( len (
y_test ) ,1) ) ,1) )
40
41 # Making the Confusion Matrix
42 from sklearn . metrics import confusion_matrix , accuracy_score
43 cm = confusion_matrix ( y_test , y_pred )
44 print ( cm )
45 accuracy_score ( y_test , y_pred )
46
47 # Visualising the Training set results
48 from matplotlib . colors import ListedColormap
49 X_set , y_set = sc . inverse_transform ( X_train ) , y_train
50 X1 , X2 = np . meshgrid ( np . arange ( start = X_set [: , 0]. min () - 10 , stop =
X_set [: , 0]. max () + 10 , step = 0.25) ,
51 np . arange ( start = X_set [: , 1]. min () - 1000 , stop =
X_set [: , 1]. max () + 1000 , step = 0.25) )
52 plt . contourf ( X1 , X2 , classifier . predict ( sc . transform ( np . array ([ X1 . ravel () ,
X2 . ravel () ]) . T ) ) . reshape ( X1 . shape ) ,
53 alpha = 0.75 , cmap = ListedColormap (( ’ red ’ , ’ green ’) ) )
54 plt . xlim ( X1 . min () , X1 . max () )
55 plt . ylim ( X2 . min () , X2 . max () )
56 for i , j in enumerate ( np . unique ( y_set ) ) :
57 plt . scatter ( X_set [ y_set == j , 0] , X_set [ y_set == j , 1] , c =
ListedColormap (( ’ red ’ , ’ green ’) ) ( i ) , label = j )
58 plt . title ( ’ Random Forest Classification ( Training set ) ’)
59 plt . xlabel ( ’ Age ’)
60 plt . ylabel ( ’ Estimated Salary ’)
61 plt . legend ()
62 plt . show ()
63
4.4 Clustering
Clustering is a different way of handling new data. In this type of model, groups (clusters)
are formed from the data entries during training, and a new entry is then assigned to one of
them. There are two types of clustering, which are:
• K-Means Clustering
• Hierarchical Clustering
4.4.1 K-Means Clustering
K-means clustering determines clusters in the data set so that new entries can be assigned to
their related cluster. First the number of clusters is specified; then the centroids of these
clusters are placed and iteratively updated. The steps are:
• Choose the number K of clusters
• Select at random K points, the centroids (not necessarily from your data set)
• Assign each data point to the closest centroid, which forms K clusters
• Compute and place the new centroid of each cluster
• Reassign each data point to the new closest centroid. If any reassignment took place,
go to previous step, otherwise go to finalize
Figure 40: K-Means Application Process
To choose the number of clusters K, the Elbow Method is used, based on the within-cluster
sum of squares
$$ \mathrm{WCSS} = \sum_{P_i \in \mathrm{Cluster\,1}} \mathrm{distance}(P_i, C_1)^2 + \sum_{P_i \in \mathrm{Cluster\,2}} \mathrm{distance}(P_i, C_2)^2 + \sum_{P_i \in \mathrm{Cluster\,3}} \mathrm{distance}(P_i, C_3)^2 + \dots $$
To apply the K-means clustering model to the data set, the following code written in Python
is a good example.
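The original Listing 20 is not reproduced here; a minimal sketch, assuming the Mall_Customers.csv data set used in the hierarchical clustering example, is:
# K-Means Clustering
import matplotlib.pyplot as plt
import pandas as pd

dataset = pd.read_csv('Mall_Customers.csv')
X = dataset.iloc[:, [3, 4]].values               # annual income and spending score

# The Elbow Method: plot WCSS against the number of clusters
from sklearn.cluster import KMeans
wcss = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters = i, init = 'k-means++', random_state = 42)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)                 # inertia_ is the WCSS value
plt.plot(range(1, 11), wcss)
plt.title('The Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()

# Train the final model with the chosen number of clusters
kmeans = KMeans(n_clusters = 5, init = 'k-means++', random_state = 42)
y_kmeans = kmeans.fit_predict(X)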
(a) The Elbow Method (b) Training Data
4.4.2 Hierarchical Clustering
This model is based on hierarchical (agglomerative) clustering, which means that every data
point starts in relation to all the other points. The nearest points are then merged into one
cluster, using the Euclidean distance
$$ \mathrm{Distance} = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} $$
Figure 44: Distance Between Two Clusters
• Take the two closest data points and make them one cluster that forms N − 1 clusters
• Take the two closest clusters and make them one cluster that forms N − 2 clusters
• Model is ready
At the end of these steps, the main issue is how many clusters to keep. At this point the
number of clusters is chosen with a dendrogram. As shown in the following figures, the chosen
number of clusters is determined by finding the largest vertical distance on the dendrogram
that no horizontal merge line crosses; the number of vertical legs cut at that level gives
the number of clusters.
(a) 2 Cluster Example (b) 3 Cluster Example
Then the model is finalized with this cluster number for the training data set. All of these
steps are collected in the following Python script as a single example.
1 # Hierarchical Clustering
2
3 # Importing the libraries
4 import numpy as np
5 import matplotlib . pyplot as plt
6 import pandas as pd
7
8 # Importing the dataset
9 dataset = pd.read_csv('Mall_Customers.csv')
10 X = dataset . iloc [: , [3 , 4]]. values
11
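# The dendrogram lines of the original listing are not reproduced here; a minimal sketch
# of the missing step, using SciPy's hierarchical clustering tools:
import scipy.cluster.hierarchy as sch
dendrogram = sch.dendrogram(sch.linkage(X, method = 'ward'))   # ward minimizes within-cluster variance
plt.title('Dendrogram')
plt.xlabel('Customers')
plt.ylabel('Euclidean distances')
plt.show()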
21 from sklearn.cluster import AgglomerativeClustering
22 hc = AgglomerativeClustering(n_clusters = 5, affinity = 'euclidean', linkage = 'ward')
23 y_hc = hc . fit_predict ( X )
24
25 # Visualising the clusters
26 plt . scatter ( X [ y_hc == 0 , 0] , X [ y_hc == 0 , 1] , s = 100 , c = ’ red ’ , label =
’ Cluster 1 ’)
27 plt . scatter ( X [ y_hc == 1 , 0] , X [ y_hc == 1 , 1] , s = 100 , c = ’ blue ’ , label =
’ Cluster 2 ’)
28 plt . scatter ( X [ y_hc == 2 , 0] , X [ y_hc == 2 , 1] , s = 100 , c = ’ green ’ , label
= ’ Cluster 3 ’)
29 plt . scatter ( X [ y_hc == 3 , 0] , X [ y_hc == 3 , 1] , s = 100 , c = ’ cyan ’ , label =
’ Cluster 4 ’)
30 plt . scatter ( X [ y_hc == 4 , 0] , X [ y_hc == 4 , 1] , s = 100 , c = ’ magenta ’ ,
label = ’ Cluster 5 ’)
31 plt . title ( ’ Clusters of customers ’)
32 plt . xlabel ( ’ Annual Income ( k$ ) ’)
33 plt . ylabel ( ’ Spending Score (1 -100) ’)
34 plt . legend ()
35 plt . show ()
Listing 21: Hierarchical Clustering Code
The dendrogram and the training data visualization are shown in the following figures, using
the same data set as in the previous K-means clustering model.
4.5 Deep Learning
Deep learning consists of two subtopics, which are:
• Artificial Neural Networks
• Convolutional Neural Networks
4.5.1 Artificial Neural Networks
Artificial neural networks work like a brain; that is why the structure of this model is
similar to the neuron, the human brain cell.
The input values have different units, so standardization of the inputs is important before
they enter the transfer function. The standardized input variables are each multiplied by a
weight, related to the importance of that input, and summed; this weighted sum is the
transfer function.
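In symbols, for standardized inputs $x_i$ with weights $w_i$,
$$ z = \sum_{i=1}^{n} w_i x_i $$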
Figure 50: Transfer Function Step in Neurons
Then the weighted sum is passed to the activation function, which produces the neuron's
output and is connected to the next layer or to the output step.
(a) Threshold Function (b) Sigmoid Function
The network can consist of more than one layer of neurons; the intermediate ones are called
hidden layers, and they improve the accuracy at the cost of learning time. In the hidden
layers and the output layer, different activation functions can be used to reach the optimum
learning model, as you can see in the following figure.
Figure 53: Multi Layer Neurons with Different Activation Functions
To update the weights and minimize the error, the network uses the back propagation rule.
Figure 55: Minimizing Error
To minimize the error, some update rules are used together with the loss function, which are:
• Gradient Descent
• Stochastic Gradient Descent
Gradient Descent
Gradient descent follows the slope of the loss function step by step until the error is
minimum.
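Each weight is moved against the gradient of the loss function $L$, with a learning rate $\eta$,
$$ w \leftarrow w - \eta \, \frac{\partial L}{\partial w} $$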
Figure 57: Back Propagation with Gradient Descent
For instance, the error is minimized when the descent reaches the minimum point of the curve
in the following figure.
As shown in the next figure, some loss curves do not look like the previous one: plain
gradient descent can then get stuck in a local minimum, and the minimized error is wrong.
That is why stochastic gradient descent is used to update the weights and reach the true
minimum error of the model.
Figure 59: Stochastic Gradient Descent
The difference between gradient descent and stochastic gradient descent is shown in the
following figure. Stochastic gradient descent updates the weights after each single input
row, whereas (batch) gradient descent uses the whole batch of inputs, summing their errors
before one update.
Figure 60: Difference Between Batch Gradient Descent and Stochastic Gradient Descent
There are two types of propagation in a neural network, which are:
• Forward Propagation
• Backward Propagation
The direction of each propagation is shown in the following figures.
To apply an Artificial Neural Network to the data set, all the steps are included in the following Python code.
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Part 2 - Building the ANN
# ... (the lines that build, compile and train the ANN, together with the text of
# the prediction exercise, are not reproduced in this extract) ...

"""
Solution:
"""

print(ann.predict(sc.transform([[1, 0, 0, 600, 1, 40, 3, 60000, 2, 1, 1, 50000]])) > 0.5)

"""
Therefore, our ANN model predicts that this customer stays in the bank!
Important note 1: Notice that the values of the features were all input in a double
pair of square brackets. That's because the "predict" method always expects a 2D array
as the format of its inputs. And putting our values into a double pair of square
brackets makes the input exactly a 2D array.
Important note 2: Notice also that the "France" country was not input as a string in
the last column but as "1, 0, 0" in the first three columns. That's because of course
the predict method expects the one-hot-encoded values of the state, and as we see in
the first row of the matrix of features X, "France" was encoded as "1, 0, 0". And be
careful to include these values in the first three columns, because the dummy
variables are always created in the first columns.
"""

# Predicting the Test set results
y_pred = ann.predict(X_test)
y_pred = (y_pred > 0.5)
print(np.concatenate((y_pred.reshape(len(y_pred), 1), y_test.reshape(len(y_test), 1)), 1))

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
print(cm)
accuracy_score(y_test, y_pred)
Listing 22: Artificial Neural Networks Code
4.5.2 Convolutional Neural Networks
Convolutional Neural Networks are mainly used in image-processing applications. The machine is trained to recognize new image inputs using the structure shown in the following figure.
Figure 62: Convolutional Neural Network Structure
Every image consists of pixels. If the image is black and white, it is represented as a 2D array; a colored image has three dimensions, one each for red, green and blue.
For image processing, each pixel of the image is given a number; for black-and-white images, black is 1 and white is 0.
Figure 64: Pixels in Binary Systems
There are several operations used to reduce this pixel map. The first one is convolution, defined as
\[
(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau
\]
To apply this operation, the first step is to choose a feature detector, which is a small part of the input image. Then the similarity between this detector and every part of the input image is computed, and the result forms a feature map.
Applying several different feature detectors produces several feature maps, which together form the convolutional layer.
Figure 66: Convolutional Layer
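The following is a toy sketch of the feature-map computation described above: a small feature detector (kernel) is slid over the input image and, at each position, the element-wise products are summed. The 0/1 image and the detector values are invented purely to illustrate the mechanics.

import numpy as np

image = np.array([[0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0],
                  [1, 1, 0, 0, 1],
                  [1, 0, 0, 1, 1],
                  [0, 0, 1, 1, 0]])

detector = np.array([[1, 0],
                     [0, 1]])                      # 2x2 feature detector

h = image.shape[0] - detector.shape[0] + 1
w = image.shape[1] - detector.shape[1] + 1
feature_map = np.zeros((h, w))
for i in range(h):
    for j in range(w):
        patch = image[i:i + 2, j:j + 2]            # part of the input image
        feature_map[i, j] = np.sum(patch * detector)  # similarity with the detector

print(feature_map)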
At the end, we have a convolutional layer; the result of applying this layer to the input image is stored in computer memory as shown in the following figure.
Then, the first activation function (ReLU) is applied at this point, forming the ReLU layer.
The feature maps in the convolutional layer are then turned into pooled feature maps, as shown in the following figure.
Similarly to the construction of the convolutional layer, these pooled feature maps together form the pooling layer.
Finally, the pooling layer is converted into the input layer of an Artificial Neural Network; this step is called flattening.
Figure 71: Converting of Pooling Layer to Input Layer (Flattening)
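The following is a minimal sketch of the pooling and flattening steps described above: 2x2 max pooling shrinks the feature map, and flattening turns the pooled map into the 1D input vector fed to the fully connected (ANN) part. The feature-map values are invented for illustration.

import numpy as np

feature_map = np.array([[1., 0., 2., 3.],
                        [4., 6., 6., 8.],
                        [3., 1., 1., 0.],
                        [1., 2., 2., 4.]])

pooled = feature_map.reshape(2, 2, 2, 2).max(axis = (1, 3))   # 2x2 max pooling
flattened = pooled.flatten()                                  # input layer for the ANN

print(pooled)       # [[6. 8.] [3. 4.]]
print(flattened)    # [6. 8. 3. 4.]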
The preparation of the Convolutional Neural Network, with all its steps, is shown in the following figure.
After applying all the steps mentioned above, the flattened layer is used as the input of an Artificial Neural Network.
Figure 73: Input Application on Artificial Neural Networks for Convolutional Neural Networks
Finally, the Convolutional Neural Network with all the steps seen in the following figure is ready to be trained on a data set.
To apply this model to a data set, the following code written in Python can be used.
# Part 1 - Data Preprocessing

# Preprocessing the Training set
train_datagen = ImageDataGenerator(rescale = 1./255,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True)
training_set = train_datagen.flow_from_directory('dataset/training_set',
                                                 target_size = (64, 64),
                                                 batch_size = 32,
                                                 class_mode = 'binary')

# Preprocessing the Test set
test_datagen = ImageDataGenerator(rescale = 1./255)
test_set = test_datagen.flow_from_directory('dataset/test_set',
                                            target_size = (64, 64),
                                            batch_size = 32,
                                            class_mode = 'binary')

# ... (Parts 2 and 3 of the listing, which build and train the CNN, are not
# reproduced in this extract) ...

# Part 4 - Making a single prediction

import numpy as np
from keras.preprocessing import image
test_image = image.load_img('dataset/single_prediction/cat_or_dog_1.jpg', target_size = (64, 64))
test_image = image.img_to_array(test_image)
test_image = np.expand_dims(test_image, axis = 0)
result = cnn.predict(test_image)
training_set.class_indices
if result[0][0] == 1:
    prediction = 'dog'
else:
    prediction = 'cat'
print(prediction)
Listing 23: Convolutional Neural Networks Code
4.6 Dimensionality Reduction
4.6.1 Principal Component Analysis (PCA)
The goal of PCA is to identify patterns in the data and to detect the correlation between variables. PCA is used in:
• Noise Filtering
• Visualization
• Feature Extraction
• Stock Market Prediction
• Gene Data Analysis
To apply PCA to a data set, the following code written in Python will be useful.
# Principal Component Analysis (PCA)

# ... (the earlier part of the listing is not reproduced in this extract) ...

for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green', 'blue'))(i), label = j)
plt.title('Logistic Regression (Training set)')
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.legend()
plt.show()

# Visualising the Test set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_test, y_test
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green', 'blue')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green', 'blue'))(i), label = j)
plt.title('Logistic Regression (Test set)')
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.legend()
plt.show()
Listing 24: Principal Component Analysis (PCA) Code
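The PCA transformation itself falls on a page that is not reproduced above, so here is a minimal sketch of how it is typically done with scikit-learn. The use of the Wine.csv data set mirrors the Kernel PCA listing below and is an assumption here.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

dataset = pd.read_csv('Wine.csv')          # assumed data set
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

pca = PCA(n_components = 2)                # keep the two principal components
X_train = pca.fit_transform(X_train)
X_test = pca.transform(X_test)
print(pca.explained_variance_ratio_)       # variance captured by PC1 and PC2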
4.6.2 Linear Discriminant Analysis
Linear Discriminant Analysis (LDA) is used as a pre-processing step for pattern classification in machine learning applications. Like PCA, this dimensionality-reduction method projects a data set onto a lower-dimensional space; the difference from PCA is that LDA maximizes the separation between multiple classes.
Figure 75: Difference Between PCA and LDA
The following code written in Python shows Linear Discriminant Analysis applied to a data set.
# Training the Logistic Regression model on the Training set
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state = 0)
classifier.fit(X_train, y_train)

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix, accuracy_score
y_pred = classifier.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)
accuracy_score(y_test, y_pred)

# Visualising the Training set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_train, y_train
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green', 'blue')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green', 'blue'))(i), label = j)
plt.title('Logistic Regression (Training set)')
plt.xlabel('LD1')
plt.ylabel('LD2')
plt.legend()
plt.show()

# Visualising the Test set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_test, y_test
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green', 'blue')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green', 'blue'))(i), label = j)
plt.title('Logistic Regression (Test set)')
plt.xlabel('LD1')
plt.ylabel('LD2')
plt.legend()
plt.show()
Listing 25: Linear Discriminant Analysis Code
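The LDA step itself is not visible in the listing above (the extract starts at the logistic-regression part), so here is a minimal sketch of how it is usually applied with scikit-learn; the Wine.csv data set is an assumption, as in the PCA sketch.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

dataset = pd.read_csv('Wine.csv')          # assumed data set
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

lda = LDA(n_components = 2)                      # at most (number of classes - 1) components
X_train = lda.fit_transform(X_train, y_train)    # LDA is supervised: it uses the class labels
X_test = lda.transform(X_test)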
4.6.3 Kernel PCA
Kernel PCA is essentially the same as Principal Component Analysis; the only difference is that Kernel PCA applies the Radial Basis Function (RBF) kernel before the projection.
Kernel PCA can be applied to a data set with the following Python code.
# Kernel PCA

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Wine.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Applying Kernel PCA
from sklearn.decomposition import KernelPCA
kpca = KernelPCA(n_components = 2, kernel = 'rbf')
X_train = kpca.fit_transform(X_train)
X_test = kpca.transform(X_test)

# Training the Logistic Regression model on the Training set
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state = 0)
classifier.fit(X_train, y_train)

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix, accuracy_score
y_pred = classifier.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)
accuracy_score(y_test, y_pred)

# Visualising the Training set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_train, y_train
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green', 'blue')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green', 'blue'))(i), label = j)
plt.title('Logistic Regression (Training set)')
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.legend()
plt.show()

# Visualising the Test set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_test, y_test
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green', 'blue')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green', 'blue'))(i), label = j)
plt.title('Logistic Regression (Test set)')
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.legend()
plt.show()
Listing 26: Kernel PCA Code
4.7 Model Selection and Boosting
This part covers the following topics:
• Model Selection
• Boosting
• Grid Search
4.7.1 Model Selection
The k-Fold Cross Validation technique for model selection can be applied with the following Python code.
# k-Fold Cross Validation

# ... (the earlier part of the listing is not reproduced in this extract) ...

for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Kernel SVM (Training set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()

# Visualising the Test set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_test, y_test
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Kernel SVM (Test set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()
Listing 27: K-Fold Cross Validation Code
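The cross-validation call itself falls on a page that is not reproduced above; the same call appears inside the Grid Search listing below. A minimal sketch, assuming a classifier already trained on X_train and y_train:

from sklearn.model_selection import cross_val_score

# evaluate the classifier on 10 different train/validation folds
accuracies = cross_val_score(estimator = classifier, X = X_train, y = y_train, cv = 10)
print("Accuracy: {:.2f} %".format(accuracies.mean() * 100))
print("Standard Deviation: {:.2f} %".format(accuracies.std() * 100))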
Grid Search
The next model-selection technique is Grid Search. It compares the models that can be applied to the data set, together with the parameters used in those models, and then identifies the best model with the best parameters. With this technique we can also tune all the parameters of the chosen model to reach the highest accuracy of the learning process on the data set.
The following code written in Python is an example of applying the Grid Search technique.
# Grid Search

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Training the Kernel SVM model on the Training set
from sklearn.svm import SVC
classifier = SVC(kernel = 'rbf', random_state = 0)
classifier.fit(X_train, y_train)

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix, accuracy_score
y_pred = classifier.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)
accuracy_score(y_test, y_pred)

# Applying k-Fold Cross Validation
from sklearn.model_selection import cross_val_score
accuracies = cross_val_score(estimator = classifier, X = X_train, y = y_train, cv = 10)
print("Accuracy: {:.2f} %".format(accuracies.mean() * 100))
print("Standard Deviation: {:.2f} %".format(accuracies.std() * 100))

# Applying Grid Search to find the best model and the best parameters
from sklearn.model_selection import GridSearchCV
parameters = [{'C': [0.25, 0.5, 0.75, 1], 'kernel': ['linear']},
              {'C': [0.25, 0.5, 0.75, 1], 'kernel': ['rbf'],
               'gamma': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]}]
grid_search = GridSearchCV(estimator = classifier,
                           param_grid = parameters,
                           scoring = 'accuracy',
                           cv = 10,
                           n_jobs = -1)
grid_search.fit(X_train, y_train)
best_accuracy = grid_search.best_score_
best_parameters = grid_search.best_params_
print("Best Accuracy: {:.2f} %".format(best_accuracy * 100))
print("Best Parameters:", best_parameters)

# Visualising the Training set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_train, y_train
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Kernel SVM (Training set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()

# Visualising the Test set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_test, y_test
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Kernel SVM (Test set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()
Listing 28: Grid Search Code
4.7.2 XGBoost
XGBoost is a model that usually trains with the highest accuracy among the models discussed here. It is a simple and useful model for training a machine on a data set. There are four types of XGBoost models:
• XGBClassifier
• XGBRegressor
• XGBRFClassifier
• XGBRFRegressor
The following code written in Python is a good example of applying an XGBoost model.
# XGBoost

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Data.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

# Training XGBoost on the Training set
from xgboost import XGBClassifier
classifier = XGBClassifier()
classifier.fit(X_train, y_train)
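The extract above ends right after fitting the classifier. A minimal sketch of how such a model is usually evaluated, reusing the confusion-matrix and k-fold cross-validation steps shown in the earlier listings:

from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn.model_selection import cross_val_score

# Evaluate on the held-out test set
y_pred = classifier.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print(accuracy_score(y_test, y_pred))

# Evaluate with 10-fold cross validation on the training set
accuracies = cross_val_score(estimator = classifier, X = X_train, y = y_train, cv = 10)
print("Accuracy: {:.2f} %".format(accuracies.mean() * 100))
print("Standard Deviation: {:.2f} %".format(accuracies.std() * 100))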
5 CONCLUSION
In conclusion, this report has explained each machine learning application and how it is carried out. A short explanation is given for each part, and the corresponding code is included so that it can be used in your own projects or for free training practice. As can be seen, there are many approaches to the machine learning process. Regression, classification and clustering models are presented as the basic machine learning models, and deep learning is explained through the neural network model types with example codes. With all of this information, the reader can easily build, train and deploy machine learning and deep learning models.
6 REFERENCES
1. https://www.udemy.com/course/machinelearning/
2. https://www.udemy.com/course/the-python-mega-course/learn/lecture/7627770?start=15overview
3. https://www.superdatascience.com/podcast/sds-002-machine-learning-recommender-systems-and-the-future-of-data-with-hadelin-de-ponteves
4. https://www.superdatascience.com/pages/welcome-to-faqbot
5. https://www.superdatascience.com/podcast/sds-041-inspiring-journey-totally-different-background-data-science
6. https://www.superdatascience.com/podcast/sds-002-machine-learning-recommender-systems-and-the-future-of-data-with-hadelin-de-ponteves