Machine Learning: Professional CORE (CET3006B) T. Y. B.Tech CSE
Course Outcomes:
After completion of the course the students will be able to:
1. Analyze and apply different data preparation techniques for Machine Learning applications
2. Identify, analyze, and compare appropriate supervised learning algorithms for a given problem
Unit 1. Introduction to ML
Unit 5. Trends in ML
Course Contents:
Laboratory Exercises:
1. Demonstrate feature selection techniques for a dataset from UCI ML library.
2. Implement exploratory data analysis for IRIS dataset.
3. Implementation of tree-based classifiers.
4. Implementation of SVM; comparison with tree-based classifiers.
5. Implementation of ensembles (Random Forests); analysis of the performance.
6. Implementation and Comparison of various clustering techniques such as Spectral & DBSCAN.
7. Perform analysis on a given dataset to find accuracy, precision, recall, and the
confusion matrix for supervised learning algorithms.
8. Mini-Project based on suitable Machine Learning dataset
Course Contents:
Text Books:
1. E. Alpaydin, Introduction to Machine Learning, PHI, 2004.
2. Peter Flach, Machine Learning: The Art and Science of Algorithms that Make Sense of
Data, Cambridge University Press, 2012.
3. T. Mitchell, Machine Learning, McGraw-Hill, 1997.
4. Josh Patterson and Adam Gibson, Deep Learning: A Practitioner's Approach, O'Reilly,
SPD, ISBN 978-93-5213-604-9, 1st Edition, 2017.
Reference Books:
1. C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 1st Edition, 2013.
2. Ian H. Witten, Eibe Frank, and Mark A. Hall, Data Mining: Practical Machine Learning
Tools and Techniques, Elsevier, 3rd Edition.
3. Shai Shalev-Shwartz and Shai Ben-David, Understanding Machine Learning: From Theory
to Algorithms, Cambridge University Press, ISBN 978-1-107-51282-5, 2014.
Course Contents:
Supplementary Reading:
1. Aurélien Géron, Hands-On Machine Learning with Scikit-Learn and TensorFlow,
O'Reilly Media.
Web Resources:
1. Popular dataset resource for ML beginners: http://archive.ics.uci.edu/ml/index.php
Web links:
1. https://www.kaggle.com/datasets
2. http://deeplearning.net/datasets/
MOOCs:
1. https://swayam.gov.in/nd1_noc20_cs29/preview
2. https://swayam.gov.in/nd1_noc20_cs44/preview
Syllabus-Unit 1
Introduction to ML:
Introduction, Data Preparation
Data Encoding Techniques
Data Pre-processing techniques for ML applications.
Feature Engineering:
Dimensionality Reduction using PCA
Exploratory Data Analysis
Feature Selection
AI Vs. ML
INTRODUCTION
Cntd..
To solve a problem on a computer, we need an algorithm.
An algorithm is a sequence of instructions that should be carried out to
transform the input to output.
Example: sorting. Input: a set of numbers; output: an ordered list.
For some tasks, however, we do not have an algorithm.
Example: telling spam emails from legitimate emails.
Input: an email document (a file of characters); output: a yes/no indication of
whether the message is spam or not.
We would like the computer (machine) to extract the algorithm for this task automatically.
Cntd..
(source: https://medium.com/analytics-vidhya/introduction-to-machine-learning-e1b9c055039c)
• Machine learning is the "field of study that gives computers the ability to learn without
being explicitly programmed." – Arthur Samuel (1959)
• In other words, it is concerned with the question of how to construct computer programs
that automatically improve with experience.
Cntd..
• A computer program is said to learn from experience 'E' with respect to some
class of tasks 'T' and performance measure 'P' if its performance at tasks in 'T',
as measured by 'P', improves with experience 'E'. – Tom M. Mitchell
Cntd..
Example 1
Classify Email as spam or not spam
• Task (T): Classify email as spam or not spam
• Experience (E): watching the user mark/label emails as spam or
not spam
• Performance (P): the number or fraction of emails correctly
classified as spam or not spam
Cntd..
Example 2
Recognizing hand-written digits/characters
• Task (T): recognizing hand-written digits
• Experience (E): watching the user mark/label hand-written digits
into 10 classes (0-9) and identifying the underlying pattern
• Performance (P): the number or fraction of hand-written digits
correctly classified
Why is Machine Learning Important?
• Human expertise does not exist
Navigating on Mars
industrial/manufacturing control
mass spectrometer analysis, drug design, astronomic discovery
• Black-box human expertise OR Some tasks cannot be defined well, except by
examples
face/handwriting/speech recognition/ recognizing people
driving a car, flying a plane
How does machine learning help us in daily life?
Social networking :
• Use of the appropriate emotions, suggestions about friend tags on Facebook, filters on Instagram,
content recommendations and suggested followers on social media platforms, etc., are examples of
how machine learning helps us in social networking.
Personal finance and banking solutions
• Whether it's fraud prevention, credit decisions, or checking deposits on our smartphones,
machine learning does it all.
Commute estimation
• Identification of the route to our selected destination, estimation of the time required to reach that
destination using different transportation modes, calculation of traffic time, and so on are all made
possible by machine learning.
Applications of Machine Learning
• Face detection
• Speech recognition
• Stock prediction
• Hand-written digit recognition
• Spam email detection
• Computational biology
• Machine translation
• Recommender systems
• Self-parking cars
• Guiding robots
• Airplane navigation systems
• Space exploration
• Medicine
• Supermarket chains
• Data mining
Examples…
Example 1: hand-written digit recognition (sample outputs were shown as an image in the original slide)
Example 2: A Crop Yield Prediction App in Senegal Using Satellite Imagery (Video Link):
https://www.youtube.com/watch?v=4OnBGkhA4jc&t=160s
Data Preparation
Sometimes, data in data sets has missing or incomplete information, which leads to less
accurate or incorrect predictions.
Further, sometimes data sets are clean but not adequately shaped, such as aggregated or
pivoted data, and some have less business context.
Hence, after collecting data from various data sources, data preparation is needed to
transform the raw data.
Significant advantages of data preparation in machine learning are as follows:
• It helps to provide reliable prediction outcomes in various analytics operations.
• It helps identify data issues or errors and significantly reduces the chances of errors.
• It increases decision-making capability.
• It reduces overall project cost (data management and analytic cost).
• It helps to remove duplicate content to make it worthwhile for different applications.
• It increases model performance.
Steps in Data Preparation Process
(The steps of the data preparation process, and issue 1, were shown as figures in the original slides.)
Cntd..
2. Outliers or anomalies: unexpected values
• ML algorithms are sensitive to the range and distribution of values when data
comes from unknown sources.
• These values can spoil the entire machine learning training process and the
performance of the model.
• Hence, it is essential to detect these outliers or anomalies, for example through
visualization techniques (see the sketch below).
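A minimal sketch of one such detection approach: visual inspection with a box plot plus the common 1.5×IQR rule. The column name and data are hypothetical, chosen only to illustrate the idea.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical measurements with two injected outliers (120 and -50)
df = pd.DataFrame({"value": [12, 14, 13, 15, 14, 13, 120, 12, 14, -50]})

# Visual inspection: outliers appear as points beyond the whiskers
sns.boxplot(x=df["value"])
plt.show()

# 1.5*IQR rule: flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, q3 = df["value"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["value"] < q1 - 1.5 * iqr) | (df["value"] > q3 + 1.5 * iqr)]
print(outliers)  # the rows containing 120 and -50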
Cntd..
3. Unstructured data format:
• Data comes from various sources and needs to be extracted into a different
format.
• Hence, before deploying an ML project, always consult with domain experts or
import data from known sources.
4. Limited or sparse features/attributes:
• Whenever data comes from a single source, it contains limited features,
• so it is necessary to import data from various sources for feature enrichment,
or to build multiple features in the datasets.
5. Understanding feature engineering:
• Feature engineering helps develop additional content in the ML models,
increasing model performance and accuracy of predictions.
Feature Engineering
Feature engineering is the pre-processing step of machine learning, which
is used to transform raw data into features that can be used for creating a
predictive model using machine learning or statistical modelling.
Feature Engineering
3. Feature Extraction: Feature extraction is an automated feature
engineering process that generates new variables by extracting them
from the raw data.
The main aim of this step is to reduce the volume of data so that it can be
easily used and managed for data modelling.
Feature extraction methods include cluster analysis, text analytics, edge
detection algorithms, and principal component analysis (PCA).
4. Feature Selection: Feature selection is a way of selecting the subset of
the most relevant features from the original feature set by removing the
redundant, irrelevant, or noisy features.
Feature Engineering
Steps in Feature Engineering
Data Preparation:
• In this step, raw data acquired from different sources is prepared and put into a suitable format so that it can be
used in the ML model.
• Data preparation may involve cleaning, delivery, augmentation, fusion, ingestion, or loading of data.
Exploratory Analysis:
• This step involves analyzing and investigating the data set and summarizing its main characteristics.
• Different data visualization techniques are used to better understand the manipulation of data sources, to find the
most appropriate statistical technique for data analysis, and to select the best features for the data.
Benchmark:
• Benchmarking is a process of setting a standard baseline for accuracy to compare all the variables from this
baseline.
• The benchmarking process is used to improve the predictability of the model and reduce the error rate.
Feature Engineering
Feature Engineering Techniques:
1. Imputation: Imputation is responsible for handling irregularities (such as missing
values) within the dataset.
• For numerical data imputation, a default value can be imputed in a column, or
missing values can be filled with the mean or median of the column.
• For categorical data imputation, missing values can be replaced with the
most frequently occurring value in the column.
2. Handling Outliers: This technique first identifies the outliers and then removes
them.
• Standard deviation can be used to identify outliers.
• The z-score can also be used to detect outliers (see the sketch after this list).
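A minimal pandas sketch of both techniques; the column names, the injected values, and the z-score threshold of 3 are illustrative assumptions.

import numpy as np
import pandas as pd

# Hypothetical data: 99 well-behaved incomes plus one extreme value (500)
rng = np.random.default_rng(0)
df = pd.DataFrame({"income": np.append(rng.normal(50, 5, 99), 500.0)})
df["city"] = ["Pune"] * 60 + [None] * 10 + ["Mumbai"] * 30
df.loc[[3, 17, 42], "income"] = np.nan   # sprinkle in some missing values

# 1. Imputation: fill numerical gaps with the median,
#    categorical gaps with the most frequently occurring value (mode)
df["income"] = df["income"].fillna(df["income"].median())
df["city"] = df["city"].fillna(df["city"].mode()[0])

# 2. Handling outliers: drop rows whose z-score magnitude exceeds 3
z = (df["income"] - df["income"].mean()) / df["income"].std()
df = df[z.abs() <= 3]
print(len(df))   # 99 — only the injected outlier was removed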
Feature Engineering
Feature Engineering Techniques:
3. Log transform: The log transform helps in handling skewed data; it makes
the distribution more approximately normal after transformation.
4. Binning: Binning can be used to normalize noisy data. This process involves
segmenting different features into bins.
5. Feature Split: Feature split is the process of splitting a feature into two or more parts
to make new features.
6. One-hot encoding: A technique that converts categorical data into a form that
can be easily understood by machine learning algorithms and hence can help
make good predictions.
A combined sketch of techniques 3-6 follows below.
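A minimal pandas sketch of techniques 3-6 on a tiny hypothetical frame (all column names and values are assumptions for illustration):

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "income": [20_000, 35_000, 50_000, 1_200_000],      # right-skewed
    "age": [23, 37, 45, 61],
    "full_name": ["Asha Patil", "Ravi Kumar", "John Doe", "Mary Jane"],
    "city": ["Pune", "Mumbai", "Pune", "Delhi"],
})

# 3. Log transform: log1p compresses the long right tail (and handles zeros)
df["log_income"] = np.log1p(df["income"])

# 4. Binning: segment a numeric feature into discrete bins
df["age_group"] = pd.cut(df["age"], bins=[0, 30, 50, 100],
                         labels=["young", "middle", "senior"])

# 5. Feature split: derive new features from parts of an existing one
df[["first_name", "last_name"]] = df["full_name"].str.split(" ", n=1, expand=True)

# 6. One-hot encoding: one binary column per category
df = pd.get_dummies(df, columns=["city"])
print(df.head())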
Dimensionality Reduction: PCA
• We can see/visualize 2-D and 3-D data by scatter plot.
• For 4-D, 5-D, 6-D data we can use a pair plot (nC2 pairs of 2-D plots).
• What about 10-D, 100-D, 1000-D data?
• How do we visualize high-dimensional (n-dim) data?
n-D → reduce to 2-D or 3-D
• Map high-dimensional data into low dimensions while preserving as much of the structure as possible.
• Use dimensionality reduction techniques like PCA and t-SNE to visualize high-dimensional
data.
• PCA tries to preserve linear structure, MDS tries to preserve global geometry, and
t-SNE tries to preserve topology (neighborhood structure).
Principal Component Analysis (PCA)
Why PCA?
• For dimensionality reduction, i.e., d-dim → d'-dim. E.g., the MNIST dataset: 784-dim to 2-dim.
# MNIST dataset downloaded from Kaggle :
#https://www.kaggle.com/c/digit-recognizer/data
Application:
• Visualization of high-dimensional data using scatter plots, pair plots, etc.
Principal Component Analysis (PCA)
PCA Steps for dimensionality reduction:
Principal Component Analysis (PCA)
1. Standardization of the data
• Missing out on standardization will probably result in a biased outcome.
• Standardization is all about scaling your data in such a way that all the variables and their values lie
within a similar range.
• E.g., say we have 2 variables in our data set: one has values ranging between 10-100 and the
other has values between 1000-5000.
• In such a scenario, the output calculated by using these predictor variables is going to
be biased toward the variable with the larger range.
• Hence, standardizing the data into a comparable range is very important.
Principal Component Analysis (PCA)
2. Computing the covariance matrix
• A covariance matrix expresses the correlation between the different variables in the data set.
• It is essential to identify heavily dependent variables because they contain biased and redundant
information which reduces the overall performance of the model.
• A covariance matrix is a p × p matrix, where p represents the number of dimensions of the data set.
• For a 2-dimensional data set with variables a and b, the covariance matrix is the 2 × 2 matrix shown below.
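The matrix itself appeared as an image in the original slide; its standard form is:

        | cov(a,a)  cov(a,b) |
  Σ  =  | cov(b,a)  cov(b,b) |

where cov(a,a) = var(a) and cov(a,b) = cov(b,a).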
• If a covariance value is negative, the respective variables are inversely proportional to each
other.
• A positive covariance denotes that the respective variables are directly proportional to each other.
Principal Component Analysis (PCA)
3. Calculating the Eigenvectors and Eigenvalues
Eigenvectors and eigenvalues are computed from the covariance matrix in order to determine the
principal components of the data set.
What are Principal Components?
• Principal components are the new set of variables that are obtained from the initial set of variables.
• The principal components are computed in such a manner that newly obtained variables are highly
significant and independent of each other.
• The principal components compress and possess most of the useful information that was scattered among
the initial variables.
• E.g., if the data set is 5-dimensional, 5 principal components are computed, such that the first principal
component stores the maximum possible information, the second stores the remaining maximum
information, and so on.
Principal Component Analysis (PCA)
4. Computing the Principal Components
• The eigenvectors are placed in descending order of their eigenvalues,
• where the eigenvector with the highest eigenvalue is the most significant and thus forms the
first principal component.
• The principal components of lesser significance can then be removed in order to reduce the
dimensions of the data.
• The final step in computing the principal components is to form a matrix, known as the feature
matrix (or feature vector), that contains the significant eigenvectors — the directions that possess
maximum information about the data.
Principal Component Analysis (PCA)
5. Reducing the dimensions of the data set
• The last step in performing PCA is to re-arrange the original data in terms of the final principal
components, which represent the maximum and most significant information of the data set.
• In order to replace the original data axes with the newly formed principal components, you
simply multiply the transpose of the original data set by the transpose of the obtained feature
vector.
A NumPy sketch of steps 1-5 follows below.
T-SNE
https://distill.pub/2016/misread-tsne/
https://colah.github.io/posts/2014-10-Visualizing-MNIST/
t-SNE is t-distributed stochastic neighbor embedding.
• Used for dimensionality reduction, especially for visualizing high-dimensional data.
T-SNE.
Neighborhood and Embedding
• Neighborhood: points that are geometrically close together.
• Embedding: placing each high-dimensional point at a position in the low-dimensional space.
• Stochastic: probabilistic.
T-SNE
Crowding problem:
• E.g., mapping 2-dim data to 1-dim: sometimes it is impossible to preserve the distances to all
the neighborhood points. Such a problem is called the crowding problem.
T-SNE
https://distill.pub/2016/misread-tsne/
• Run t-SNE on a simple dataset (the linked article provides an interactive demo).
• Perplexity: roughly, the effective number of neighbors each point considers.
• Epsilon: the learning rate.
• Steps: the number of iterations.
t-SNE is an iterative algorithm: run it until the points stop moving (a stable
configuration whose shape no longer changes).
1. Always run t-SNE until the shape stops changing.
2. Always run t-SNE with multiple perplexity values.
3. Keep perplexity in the range 2 <= p <= N (the number of points).
A scikit-learn sketch follows below.
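A minimal scikit-learn sketch following these guidelines; the digits dataset (64-dim) stands in for MNIST, and the perplexity values are illustrative assumptions.

import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)          # 64-dim stand-in for 784-dim MNIST

# Guideline 2: always try multiple perplexity values
for perplexity in (5, 30, 50):
    # learning rate ("epsilon") and iteration count ("steps") are left at their defaults
    X_2d = TSNE(n_components=2, perplexity=perplexity,
                random_state=42).fit_transform(X)
    plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y, s=5, cmap="tab10")
    plt.title(f"t-SNE, perplexity={perplexity}")
    plt.show()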
Exploratory Data Analysis
In statistics, exploratory data analysis (EDA) is an approach to analyzing
data sets to summarize their main characteristics, often with visual
methods.
• Understanding a dataset by using tools from statistics and simple plotting
tools.
• What does it mean to understand the data?
• E.g., the Iris dataset: the flower species are visually similar.
• https://en.wikipedia.org/wiki/Iris_flower_data_set
Exploratory Data Analysis
Typical graphical techniques used in EDA are:
• Box plot
• Histogram
• Multi-vari chart
• Run chart
• Pareto chart
• Scatter plot
• Stem-and-leaf plot
• Parallel coordinates
• Odds ratio
• Targeted projection pursuit
EDA Example
• Wine quality data set from UCI ML repository
• Import the necessary libraries (for this example: pandas, numpy,
matplotlib, and seaborn) and load the data set, as sketched below.
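A minimal sketch of that setup; the URL points at the white-wine file in the UCI repository, which is semicolon-separated (the exact path and variable names are assumptions).

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Wine quality data (white) from the UCI ML repository; note the ';' separator
url = ("https://archive.ics.uci.edu/ml/machine-learning-databases/"
       "wine-quality/winequality-white.csv")
wine = pd.read_csv(url, sep=";")
print(wine.head())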
EDA Techniques
• Find the total number of rows and columns in the data set using
".shape".
• The dataset comprises 4898 observations and 12 characteristics.
• One of these is the dependent variable, and the remaining 11 are independent
variables — the physico-chemical characteristics.
• It is also good practice to know the columns and their corresponding
data types, along with finding whether they contain null values or not (see the sketch below).
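Continuing the sketch above, the corresponding checks:

print(wine.shape)           # (4898, 12): observations x characteristics
print(wine.dtypes)          # data type of every column
print(wine.isnull().sum())  # null values per column (all zero for this dataset)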
EDA: Exploratory Data Analysis
What about a 4-D, 5-D or n-D scatter plot?

3D scatter plot (https://plot.ly/pandas/3d-scatter-plots/):

import plotly.express as px

iris = px.data.iris()
fig = px.scatter_3d(iris, x='sepal_length', y='sepal_width',
                    z='petal_width', color='species')
fig.show()

Pair plot (only possible to view 2-D patterns, nC2 pairs):

import matplotlib.pyplot as plt
import seaborn as sns

iris = sns.load_dataset("iris")
plt.close()
sns.set_style("whitegrid")
sns.pairplot(iris, hue="species", height=3)  # 'size' was renamed 'height' in newer seaborn
plt.show()

Violin plot:

sns.violinplot(x="species", y="petal_length", data=iris)
plt.show()
Progressive Data Analysis
• Data has only float and integer values.
• No variable column has null/missing values.
PDA Techniques
• Here, as you can notice, the mean value is less than the median value of each
column; the median is represented by 50% (the 50th percentile) in the index column.
• There is a notably large difference between the 75th percentile and the max values
of the predictors "residual sugar", "free sulfur dioxide", and "total sulfur
dioxide" (see the sketch below).
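A sketch of the summary table these observations come from, continuing the wine example above:

# Transposed summary: one row per column, with mean, median (50%), 75% and max side by side
print(wine.describe().T[["mean", "50%", "75%", "max"]])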
Graph Visualisation Techniques
Data Pre-processing techniques for ML applications
Box Plot
A box plot displays the five-number summary of a distribution:
• Minimum
• First quartile
• Median
• Third quartile
• Maximum
In the simplest box plot, the central rectangle spans the first quartile to
the third quartile (the interquartile range, or IQR).
A small matplotlib sketch follows below.
Sparse Matrix
• In numerical analysis and scientific computing, a sparse matrix or sparse
array is a matrix in which most of the elements are zero.
• By contrast, if most of the elements are nonzero, then the matrix is
considered dense.
• The number of zero-valued elements divided by the total number of elements
(e.g., m × n for an m × n matrix) is called the sparsity of the matrix (which is
equal to 1 minus the density of the matrix).
• Using those definitions, a matrix is considered sparse when its sparsity is greater
than 0.5.
• Conceptually, sparsity corresponds to systems with few pairwise interactions
(see the sketch below).
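A small sketch with NumPy and SciPy illustrating the definition:

import numpy as np
from scipy.sparse import csr_matrix

A = np.array([[0, 0, 3],
              [4, 0, 0],
              [0, 0, 0]])

# sparsity = zero-valued elements / total elements = 1 - density
sparsity = 1.0 - np.count_nonzero(A) / A.size
print(sparsity)        # 0.78 > 0.5, so A is considered sparse

S = csr_matrix(A)      # compressed storage: only the non-zero entries are kept
print(S.nnz)           # 2 stored values instead of 9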
Feature Engineering: Feature selection
• There are two main types of feature selection techniques: supervised and
unsupervised
• Supervised methods may be divided into wrapper, filter and intrinsic
Feature Engineering: Feature selection
• Unsupervised feature selection techniques ignore the target variable; e.g., methods that remove redundant
variables using correlation.
• Supervised feature selection techniques use the target variable; e.g., methods that remove irrelevant variables.
Feature Engineering: Feature selection
Filter:
• Filter feature selection methods evaluate the relationship between each input variable
and the target variable using statistics, and select those input variables that have the
strongest relationship with the target variable.
• These methods can be fast and effective, although the choice of statistical measure
depends on the data type of both the input and output variables (see the sketch below).
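A minimal scikit-learn sketch of a filter method, scoring features with the ANOVA F-test for a classification target; the choice of k=2 is an illustrative assumption.

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Score each input variable against the target, keep the 2 strongest
selector = SelectKBest(score_func=f_classif, k=2)
X_new = selector.fit_transform(X, y)
print(selector.scores_)   # strength of each feature's relationship with y
print(X_new.shape)        # (150, 2)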
Feature Engineering: Feature selection
Wrapper:
• Wrapper feature selection methods create many models with different subsets of
input features and select those features that result in the best-performing model
according to a performance metric.
Intrinsic:
• Some machine learning algorithms perform feature selection automatically as part of
learning the model. These techniques are considered intrinsic feature selection
methods.
• E.g., decision trees. A sketch of both approaches follows below.
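A minimal scikit-learn sketch of both approaches; the model choices are illustrative assumptions.

from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Wrapper: RFE repeatedly fits the model and prunes the weakest feature
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=2)
rfe.fit(X, y)
print(rfe.support_)               # boolean mask of the selected features

# Intrinsic: a decision tree ranks features as a by-product of training
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
print(tree.feature_importances_)  # importance assigned to each feature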