Machine Learning
Dr. D. Senthilkumar
University College of Engineering (BIT Campus)
Anna University, Tiruchirappalli
[Figure: data science diagram with the labels Mathematics/Statistics, CS/IT, Domain/Business Knowledge, Research, Software Development, Machine Learning, and Data Science]
[Figure: diagram with the labels Sensor technology, Communication technology, Machine Learning, and Human-machine interface (UI/UX)]
• Need a new series to fill the binge void? Netflix can recommend one.
In fact, it probably already has — just check your homepage. Using
machine learning to curate its enormous collection of TV shows and
movies, Netflix taps the streaming history and habits of its millions of
users to predict what individual viewers will likely enjoy.
• Example: Optimail
• Application area: Marketing
• Optimail uses artificial intelligence and machine learning to deliver
more effective email marketing campaigns by customizing and
personalizing content, as well as adjusting scheduling, to have the
greatest impact on each recipient.
• Define Problem.
• Prepare Data.
• Evaluate Algorithms.
• Improve Results.
• Present Results.
• Loading data
• Summarizing your data
• Evaluating algorithms
• Making some predictions
1.2 Install R
• There are no special requirements.
• For help installing, see the R Installation and Administration manual.
1.3 Start R
• You can start R from whatever menu system you use on your operating system.
• We can see how many instances (rows) and how many attributes (columns) the
data contains with the dim function.
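As a minimal sketch of this step, the call below assumes that R's built-in iris data stands in for the data set (the slides do not name it here) and that the first 40 rows of each species mimic a 120-row training split. The name dataset is an illustrative choice.

```r
# Assumption: the built-in iris data stands in for the tutorial's data set.
# Keeping the first 40 rows of each species mimics a 120-row training split.
dataset <- iris[c(1:40, 51:90, 101:140), ]

# dim() returns the number of rows followed by the number of columns
dim(dataset)
# [1] 120   5
```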
• You should see that all of the inputs are double and that the class value
is a factor:
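One way to check the attribute types is sketched below, again assuming the built-in iris data as a stand-in. Note that class() reports the four inputs as "numeric", while typeof() shows that they are stored as "double".

```r
# Assumption: iris stands in for the data set, 40 rows per species.
dataset <- iris[c(1:40, 51:90, 101:140), ]

# class() of every column: four numeric inputs and one factor (the class value)
sapply(dataset, class)

# typeof() shows the storage mode of the four inputs, which is "double"
sapply(dataset[, 1:4], typeof)
```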
• We can see that each class has the same number of instances (40, or 33%
of the dataset).
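The class distribution could be computed as in the sketch below. It assumes the iris data with 40 rows kept per species, so the counts match the figures quoted above; the variable names are illustrative.

```r
# Assumption: iris stands in for the data set, 40 rows per species.
dataset <- iris[c(1:40, 51:90, 101:140), ]

# Absolute count and percentage of rows for each class value
counts <- table(dataset$Species)
percentage <- prop.table(counts) * 100
class_dist <- cbind(freq = counts, percentage = percentage)
class_dist
```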
• We can see that all of the numerical values have the same scale
(centimeters) and similar ranges (0 to 8 centimeters).
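A short sketch of how the scale and ranges might be inspected, assuming the iris stand-in used for illustration:

```r
# Assumption: iris stands in for the data set, 40 rows per species.
dataset <- iris[c(1:40, 51:90, 101:140), ]

# Min, quartiles, mean, and max for each numeric attribute,
# plus a count per level for the class attribute
summary(dataset)

# Smallest and largest value of each numeric attribute, in centimeters
attr_ranges <- sapply(dataset[, 1:4], range)
attr_ranges
```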
barchart(y)
• This is useful to see that there are clearly different distributions of the
attributes for each class value.
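The per-class comparison can be sketched with base R graphics as below. This is a stand-in under stated assumptions, not necessarily the plotting call used in the slides: it assumes the iris data and draws one box plot panel per attribute, then prints the per-class medians as a numeric check.

```r
# Assumption: iris stands in for the data set, 40 rows per species.
dataset <- iris[c(1:40, 51:90, 101:140), ]

# One panel per attribute, one box per class value
op <- par(mfrow = c(1, 4))
for (name in names(dataset)[1:4]) {
  boxplot(split(dataset[[name]], dataset$Species), main = name, ylab = "cm")
}
par(op)

# Median of every attribute within each class
class_medians <- aggregate(. ~ Species, data = dataset, FUN = median)
class_medians
```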
• Like the boxplots, we can see the difference in distribution of each attribute
by class value. We can also see the Gaussian-like distribution (bell curve)
of each attribute.
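Density curves per class could be drawn as in this sketch, which uses base R's density() on the assumed iris stand-in rather than whatever plotting helper the slides used.

```r
# Assumption: iris stands in for the data set, 40 rows per species.
dataset <- iris[c(1:40, 51:90, 101:140), ]

op <- par(mfrow = c(2, 2))
for (name in names(dataset)[1:4]) {
  # Kernel density estimate of this attribute within each class
  dens <- lapply(split(dataset[[name]], dataset$Species), density)
  x_lim <- range(sapply(dens, function(d) range(d$x)))
  y_max <- max(sapply(dens, function(d) max(d$y)))
  # Empty frame sized to fit all three curves, then one line per class
  plot(0, 0, type = "n", xlim = x_lim, ylim = c(0, y_max),
       main = name, xlab = "cm", ylab = "Density")
  for (i in seq_along(dens)) lines(dens[[i]], col = i)
}
legend("topright", legend = names(dens), col = seq_along(dens), lty = 1)
par(op)
```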
• The plots show that some of the classes are partially linearly separable in some
dimensions.
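To illustrate the separability claim, the sketch below draws a scatterplot matrix colored by class and checks one dimension numerically. It assumes the iris stand-in; in that data, a single threshold on petal length separates setosa from the other two species.

```r
# Assumption: iris stands in for the data set, 40 rows per species.
dataset <- iris[c(1:40, 51:90, 101:140), ]

# Scatterplot matrix of the four attributes, one color per class
pairs(dataset[, 1:4], col = as.integer(dataset$Species), pch = 19)

# Numeric check in one dimension: every setosa petal is shorter
# than every petal of the other two species
setosa <- dataset$Species == "setosa"
separable <- max(dataset$Petal.Length[setosa]) < min(dataset$Petal.Length[!setosa])
separable
# [1] TRUE
```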
• Let’s evaluate 5 different algorithms:
• Linear Discriminant Analysis (LDA)
• Classification and Regression Trees (CART)
• k-Nearest Neighbors (kNN)
• Support Vector Machines (SVM) with a linear kernel
• Random Forest (RF)
• This is a good mixture of simple linear (LDA), nonlinear (CART,
kNN) and complex nonlinear methods (SVM, RF).
Dr. D. Senthil Kumar (AP/CSE), University College of Engineering, Trichy, 6/16/2021
Let’s build our five models
The caret package does support configuring and tuning each model, but we are
not going to cover that in this tutorial.
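A minimal sketch of building the five models with caret's default settings follows. Several choices here are assumptions rather than details from the slides: the iris stand-in data, 10-fold cross-validation, the seed value, and the caret method names (svmLinear is picked to match the linear kernel named in the list). The MASS, rpart, kernlab, and randomForest packages need to be installed alongside caret.

```r
library(caret)  # also needs MASS, rpart, kernlab, and randomForest installed

# Assumption: iris stands in for the data set, 40 rows per species.
dataset <- iris[c(1:40, 51:90, 101:140), ]

# Assumption: 10-fold cross-validation scored by accuracy
control <- trainControl(method = "cv", number = 10)
metric <- "Accuracy"

# caret method name for each of the five algorithms
methods <- c(lda = "lda", cart = "rpart", knn = "knn",
             svm = "svmLinear", rf = "rf")

fits <- lapply(methods, function(m) {
  set.seed(7)  # same seed before each call so every model sees the same folds
  train(Species ~ ., data = dataset, method = m,
        metric = metric, trControl = control)
})

# Collect the cross-validation results of all models for comparison
results <- resamples(fits)
summary(results)
```

Resetting the seed before each train() call keeps the folds identical across models, so the accuracy figures are directly comparable.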
• This gives a nice summary of what was used to train the model and
the mean and standard deviation (SD) accuracy achieved,
specifically 97.5% accuracy +/- 4%.
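The kind of summary described above can be sketched as follows. This assumes the model in question is an LDA fit trained with caret under 10-fold cross-validation on the iris stand-in; the exact accuracy and SD depend on the data split and the seed, so they will not necessarily match the figures on the slide.

```r
library(caret)  # the "lda" method also needs the MASS package

# Assumption: iris stands in for the data set, 40 rows per species.
dataset <- iris[c(1:40, 51:90, 101:140), ]

set.seed(7)
fit.lda <- train(Species ~ ., data = dataset, method = "lda",
                 metric = "Accuracy",
                 trControl = trainControl(method = "cv", number = 10))

# Summary of the training setup and the resampled accuracy
print(fit.lda)

# Mean accuracy and its standard deviation across the folds
fit.lda$results[, c("Accuracy", "AccuracySD")]
```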