
Research Trends in Machine Learning
Muhammad Kashif Hanif
5/28/21
Acknowledgement
Slide contents are based on
• Introduction to Machine Learning with Python by Andreas C. Müller and Sarah Guido
• Data Mining: Concepts and Techniques by Han et al.
• Examples are taken from scikit-learn

2
Today’s Topic
• Machine Learning
• Supervised Machine Learning
– Classification
– Regression
• Unsupervised Machine Learning
• Preprocessing Techniques

3
Machine Learning
• Machine learning is about extracting knowledge
from data.
• Intersection of
– Statistics
– Computer Science
– Mathematics
• Also called predictive analytics or statistical
learning

4
Machine Learning

– Machine Learning is the ability to teach a computer without explicitly programming it
– Examples are used to train computers to perform tasks that would be difficult to program
Types of Machine Learning

• Supervised Learning
  – Training data is labeled
  – Goal is to correctly label new data
• Reinforcement Learning
  – Training data is unlabeled
  – System receives feedback for its actions
  – Goal is to perform better actions
• Unsupervised Learning
  – Training data is unlabeled
  – Goal is to categorize the observations
Applications of Machine Learning

• Handwriting Recognition
  – convert written letters into digital letters
• Language Translation
  – translate spoken and/or written languages (e.g., Google Translate)
• Speech Recognition
  – convert voice snippets to text (e.g., Siri, Cortana, and Alexa)
• Image Classification
  – label images with appropriate categories (e.g., Google Photos)
• Autonomous Driving
  – enable cars to drive without a human driver
Features in Machine Learning

• Features are the observations that are used to form predictions
  – For image classification, the pixels are the features
  – For voice recognition, the pitch and volume of the sound samples are the features
  – For autonomous cars, data from the cameras, range sensors, and GPS are the features
• Extracting relevant features is important for building a model
  – Time of day is an irrelevant feature when classifying images
  – Time of day is relevant when classifying emails, because spam often arrives at night
• Common Types of Features in Robotics
  – Pixels (RGB data)
  – Depth data (sonar, laser rangefinders)
  – Movement (encoder values)
  – Orientation or Acceleration (gyroscope, accelerometer, compass)
Measuring Success for Classification

• True Positive: correctly identified as relevant
• True Negative: correctly identified as not relevant
• False Positive: incorrectly labeled as relevant
• False Negative: incorrectly labeled as not relevant
Example: Identify Cats

[Figure: four sample images with their outcomes labeled True Positive, True Negative, False Negative, and False Positive. Images from the STL-10 dataset.]


Precision, Recall, and Accuracy

– Precision
– Percentage of positive labels that are correct
– Precision = (# true positives) / (# true positives + # false positives)
– Recall
– Percentage of positive examples that are correctly labeled
– Recall = (# true positives) / (# true positives + # false negatives)
– Accuracy
– Percentage of correct labels
– Accuracy = (# true positives + # true negatives) / (# of samples)
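
A minimal sketch of these three metrics, assuming scikit-learn; the cat-vs-not-cat labels below are hypothetical (1 = cat, 0 = not cat).

from sklearn.metrics import precision_score, recall_score, accuracy_score

y_true = [1, 1, 0, 0, 1, 0, 1, 0]   # ground-truth labels
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]   # model predictions

print("Precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("Recall:   ", recall_score(y_true, y_pred))     # TP / (TP + FN)
print("Accuracy: ", accuracy_score(y_true, y_pred))   # (TP + TN) / total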
Training and Test Data

– Training Data
– data used to learn a model
– Test Data
– data used to assess the accuracy of model

– Overfitting
– Model performs well on training data but poorly on test data
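
A short sketch of the train/test split idea, assuming scikit-learn and its built-in iris data: hold out a test set and compare training vs. test accuracy to spot overfitting.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = DecisionTreeClassifier().fit(X_train, y_train)
print("Training accuracy:", model.score(X_train, y_train))  # often near 1.0
print("Test accuracy:    ", model.score(X_test, y_test))    # noticeably lower if the model overfits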
Bias and Variance

– Bias: expected difference between model’s prediction and truth


– Variance: how much the model differs among training sets

– Model Scenarios
– High Bias: Model makes inaccurate predictions on training data
– High Variance: Model does not generalize to new datasets
– Low Bias: Model makes accurate predictions on training data
– Low Variance: Model generalizes to new datasets
Supervised Machine Learning
• Supervised learning uses an algorithm to learn a mapping from input variables (X) to an output variable (Y):
      Y = f(X)
• The goal is to approximate the mapping function so well that, given new input data (X), you can predict the output variable (Y) for that data.
• Supervised learning uses labeled data, i.e., a data set that has already been classified, to infer a learning algorithm.
• This data set is used as the basis for predicting the labels of other, unlabeled data through machine learning algorithms.
• It is called supervised because the learning process is guided by the observed labels of the training data.

Source: https://machinelearningmastery.com/supervised-and-unsupervised-machine-learning-algorithms/
https://www.sciencedirect.com/bookseries/advances-in-computers
Supervised Learning (cont…)
• Supervised learning means tuning model parameters using labeled data sets so that the tuned model works on larger, unseen data

Training Dataset → Algorithm → Model Creation → Prediction


Labeled Data
• Labeled data, used by supervised learning, adds meaningful tags, labels, or classes to the observations (rows).
• These tags can come from observations or from asking people or specialists about the data.

16
https://stackoverflow.com/questions/19170603/what-is-the-difference-between-labeled-and-unlabeled-data
Example

17
Example (cont…)

18
Supervised Learning Algorithms
• Decision Trees

• Random Forest

• Support Vector Machines

• Linear Regression

• K-Nearest Neighbor

• Neural Networks
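
A rough sketch, assuming scikit-learn and its built-in breast cancer dataset, of trying several of the listed algorithms on the same data with cross-validation; the dataset choice is illustrative only.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "SVM": SVC(),
    "k-NN": KNeighborsClassifier(),
}
for name, model in models.items():
    # 5-fold cross-validation accuracy for each classifier
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")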
Supervised Learning Frameworks

Tool          Uses                                    Language
Scikit-Learn  Classification, Regression, Clustering  Python
Spark MLlib   Classification, Regression, Clustering  Scala, R, Java
Weka          Classification, Regression, Clustering  Java
Caffe         Neural Networks                         C++, Python
TensorFlow    Neural Networks                         Python
Flow of Data to Machine Learning

Source: https://cloud.google.com/solutions/machine-learning/data-preprocessing-for-ml-with-tf-transform-pt1 21
Features
• Dataset
– Collection of data objects
• Records, vectors, observations, samples, etc.
• Data objects can be described by features
• A feature is an individual measurable property or
characteristic of a phenomenon being observed
(Wikipedia).
• Features are also called variables,
characteristics, attributes or dimensions

Source: 22
https://towardsdatascience.com/data-preprocessing-concepts-fa946d11c825#:~:text=In%20any%20Machine%20Learning%20process,easily%20interpreted%2
0by%20the%20algorithm.
Data Types

Source: 23
https://towardsdatascience.com/data-preprocessing-concepts-fa946d11c825#:~:text=In%20any%20Machine%20Learning%20process,easily%20interpreted%2
0by%20the%20algorithm.
Data Types

Source: 24
https://towardsdatascience.com/data-preprocessing-concepts-fa946d11c825#:~:text=In%20any%20Machine%20Learning%20process,easily%20interpreted%2
0by%20the%20algorithm.
Acknowledgement
Slide contents are based on
• Introduction to Machine Learning with Python by Andreas C. Müller and Sarah Guido
• Data Mining: Concepts and Techniques by Han et al.
• Examples are taken from scikit-learn

26
Decision Tree
• Decision tree induction is the learning of decision
trees from class-labeled training tuples.
• A decision tree is a flowchart-like tree structure,
where
– each internal node (nonleaf node) denotes a test on
an attribute,
– each branch represents an outcome of the test, and
– each leaf node (or terminal node) holds a class label.
– The topmost node in a tree is the root node.
• Widely used decision tree algorithms are
– ID3 (Iterative Dichotomiser)
– C4.5 (a successor of ID3)
27
– CART (Classification and Regression Trees)
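
A minimal sketch of inducing a decision tree from class-labeled data, assuming scikit-learn (whose DecisionTreeClassifier is an optimized CART-style learner); the iris data is used only for illustration.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Each internal node tests an attribute, each branch is a test outcome,
# and each leaf holds a class label.
print(export_text(tree, feature_names=load_iris().feature_names))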
Example

Source: Data Mining Concepts and Techniques by Han et al.


28
Example

Source: Data Mining Concepts and Techniques by Han et al.


29
Source: Data Mining Concepts and Techniques by Han et al.
30
Decision Tree
• Binary tree?
• Classification
• Greedy approach
• Most algorithms construct the decision tree in a top-down recursive divide-and-conquer manner

31
Basic algorithm

Source: Data Mining Concepts and Techniques by Han et al.


32
Partitioning Tuple on Splitting Criteria

Source: Data Mining Concepts and Techniques by Han et al.


33
Attribute Selection Measure
By Muhammad Kashif Hanif
Acknowledgement
Slide contents are based on
• Data Mining: Concepts and Techniques by Han et al.

35
Attribute Selection Measures
• Problem
  – selecting the splitting criterion that "best" separates a given data partition, D, of class-labeled training tuples into individual classes.
• Also known as splitting rules
• The measure provides a ranking for each attribute
• The attribute with the best score is chosen as the splitting attribute
• Methods
  – Information gain
  – Gain ratio
  – Gini index

Source: Data Mining Concepts and Techniques by Han et al.


Attribute Selection Measures
• Let D, the data partition, be a training set of
class-labeled tuples.
• Suppose the class label attribute has m distinct
values defining m distinct classes, Ci (for i = 1,
… , m).
• Let Ci,D be the set of tuples of class Ci in D.
• Let |D| and |Ci,D| denote the number of tuples in
D and Ci,D, respectively.

Source: Data Mining Concepts and Techniques by Han et al.


Acknowledgement
Slide contents are based on
• Data Mining: Concepts and Techniques by Han et al.
Information Gain
• Attribute selection measure
• Attribute with the highest information gain is chosen as the
splitting attribute
• The expected information needed to classify a tuple in D is given by

      Info(D) = -Σ_{i=1..m} p_i log2(p_i)

• where p_i is the nonzero probability that an arbitrary tuple in D belongs to class Ci and is estimated by |Ci,D|/|D|.
• A log function to the base 2 is used, because the information is encoded in bits.
• Info(D) is just the average amount of information needed to identify the class label of a tuple in D.
• Info(D) is also known as the entropy of D.

Source: Data Mining Concepts and Techniques by Han et al.
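
The Info(D) and Gain(A) formulas can be sketched in a few lines of Python (NumPy and pandas assumed); the toy table below is illustrative only, not the AllElectronics data from Han et al.

import numpy as np
import pandas as pd

def info(labels):
    """Entropy Info(D) = -sum(p_i * log2(p_i)) over the classes in D."""
    p = labels.value_counts(normalize=True)
    return float(-(p * np.log2(p)).sum())

def gain(df, attribute, target):
    """Gain(A) = Info(D) - sum(|D_j|/|D| * Info(D_j)) over the partitions on A."""
    weighted = sum(len(part) / len(df) * info(part[target])
                   for _, part in df.groupby(attribute))
    return info(df[target]) - weighted

df = pd.DataFrame({
    "student": ["yes", "yes", "no", "no", "yes", "no"],
    "buys_computer": ["yes", "yes", "no", "yes", "yes", "no"],
})
print("Info(D) =", info(df["buys_computer"]))
print("Gain(student) =", gain(df, "student", "buys_computer"))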


Information Gain

Source: Data Mining Concepts and Techniques by Han et al.


Information Gain Example

Source: Data Mining Concepts and Techniques by Han et al.


Information Gain Example

• Gain(income) = 0.029 bits


• Gain(student) = 0.151 bits
• Gain(credit-rating) = 0.048 bits

Source: Data Mining Concepts and Techniques by Han et al.


Information Gain Example

Source: Data Mining Concepts and Techniques by Han et al.


Acknowledgement
Slide contents are based on
• Data Mining: Concepts and Techniques by Han et al.
Gain Ratio
• The information gain measure is biased toward tests with many
outcomes.

• It prefers to select attributes having a large number of values.

• For example, consider an attribute that acts as a unique identifier, such as product_ID.
  – A split on product_ID would result in a large number of partitions (as many as there are values), each one containing just one tuple.
  – Because each partition is pure, the information required to classify data set D based on this partitioning would be Info_product_ID(D) = 0.
  – Therefore, the information gained by partitioning on this attribute is maximal.

Source: Data Mining Concepts and Techniques by Han et al.


Example

Source: Data Mining Concepts and Techniques by Han et al.


Gain Ratio
• Gain ratio applies a kind of normalization to information gain using a "split information" value, defined analogously to Info(D) as

      SplitInfo_A(D) = -Σ_{j=1..v} (|D_j|/|D|) log2(|D_j|/|D|)

      GainRatio(A) = Gain(A) / SplitInfo_A(D)

• Gain(income) = 0.029 bits
• SplitInfo_income(D) = 1.557
• GainRatio(income) = 0.029 / 1.557 = 0.019
• The attribute with the maximum gain ratio is selected as the splitting attribute.

Source: Data Mining Concepts and Techniques by Han et al.


Acknowledgement
Slide contents are based on
• Data Mining: Concepts and Techniques by Han et al.
Gini Index
• The Gini index measures the impurity of D, a data partition or set of training tuples:

      Gini(D) = 1 - Σ_{i=1..m} p_i^2

  where p_i is the probability that a tuple in D belongs to class Ci and is estimated by |Ci,D|/|D|. The sum is computed over m classes.

• The Gini index considers a binary split for each attribute.

Source: Data Mining Concepts and Techniques by Han et al.


Gini Index
• If a binary split on A partitions D into D1 and D2, the Gini index of D given that partitioning is

      Gini_A(D) = (|D1|/|D|) Gini(D1) + (|D2|/|D|) Gini(D2)

• For each attribute, each of the possible binary splits is considered.
• The reduction in impurity that would be incurred by a binary split on A is

      ΔGini(A) = Gini(D) - Gini_A(D)

Source: Data Mining Concepts and Techniques by Han et al.


Example

Source: Data Mining Concepts and Techniques by Han et al.


Gini Index Example

• Consider a binary split on the income attribute with D1 = {low, medium} and D2 = {high}
• The Gini index for the split {low, high} and {medium} is 0.458
• The Gini index for the split {medium, high} and {low} is 0.450
• We choose the split with the minimum Gini index, which is the split into {low, medium} and {high}
• Similar calculations are done for the other attributes.

Source: Data Mining Concepts and Techniques by Han et al.
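
A brief sketch (pandas assumed) of Gini(D) and the Gini index of a binary split, mirroring the income example above; the toy data below is illustrative only, not the book's table.

import pandas as pd

def gini(labels):
    """Gini(D) = 1 - sum(p_i^2) over the classes in D."""
    p = labels.value_counts(normalize=True)
    return float(1 - (p ** 2).sum())

def gini_binary_split(df, attribute, subset, target):
    """Gini_A(D) = |D1|/|D| * Gini(D1) + |D2|/|D| * Gini(D2), splitting A into subset vs. the rest."""
    mask = df[attribute].isin(subset)
    d1, d2 = df[mask], df[~mask]
    return (len(d1) / len(df)) * gini(d1[target]) + (len(d2) / len(df)) * gini(d2[target])

df = pd.DataFrame({
    "income": ["low", "medium", "high", "medium", "low", "high"],
    "buys_computer": ["yes", "yes", "no", "yes", "no", "no"],
})
print(gini_binary_split(df, "income", {"low", "medium"}, "buys_computer"))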


Confusion Matrix
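
The confusion matrix figure from the slide is not reproduced here; a minimal sketch of computing one, assuming scikit-learn and hypothetical labels:

from sklearn.metrics import confusion_matrix

y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

# Rows are actual classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))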
What is Preprocessing?
• Preprocessing is the step in which the data is transformed, or encoded, into a state that the machine can easily parse. In other words, the features of the data can then be easily interpreted by the algorithm.

Source: 54
https://towardsdatascience.com/data-preprocessing-concepts-fa946d11c825#:~:text=In%20any%20Machine%20Learning%20process,easily%20interpreted%2
0by%20the%20algorithm.
Preprocessing Techniques
Some of data preprocessing operations are
• Data cleaning
– Removing or correcting records with corrupted or
invalid values from raw data
– Handling missing values
• removing records which are missing a large number of
columns.
• Filling in missing values
– Smoothing noisy data
– Handling outliers
• Outliers are values which are significantly different from
other observations.
Source: https://cloud.google.com/solutions/machine-learning/data-preprocessing-for-ml-with-tf-transform-pt1 56
Preprocessing Techniques
• Instance selection and partition
– Selecting data points from the input dataset to create training, evaluation (validation), and test sets
  • repeatable random sampling
  • oversampling of minority classes

Source: https://cloud.google.com/solutions/machine-learning/data-preprocessing-for-ml-with-tf-transform-pt1 57
Preprocessing Techniques (cont..)

• Feature tuning
–  Improving the quality of a feature for ML
• Scaling
• normalizing numeric values
• imputing missing values
• clipping outliers
• adjusting values with skewed distributions.

Source: https://cloud.google.com/solutions/machine-learning/data-preprocessing-for-ml-with-tf-transform-pt1 58
Preprocessing Techniques (cont..)
• Representation transformation
– Some ML models work with
• Numeric features
• Categorical features
• Mixed type features
– Techniques to convert numeric feature to categorical
feature
• Bucketization
– Techniques to convert categorical feature to numerical
feature
• Encoding
• Learning with count
• Sparse feature embedding
Source: https://cloud.google.com/solutions/machine-learning/data-preprocessing-for-ml-with-tf-transform-pt1 59
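
A rough sketch (pandas assumed) of the two directions mentioned above, bucketizing a numeric feature and one-hot encoding a categorical feature; the column names and values are illustrative only.

import pandas as pd

df = pd.DataFrame({"age": [22, 35, 47, 61],
                   "city": ["Lahore", "Faisalabad", "Lahore", "Karachi"]})

# Numeric -> categorical (bucketization): cut ages into labeled bins
df["age_bucket"] = pd.cut(df["age"], bins=[0, 30, 50, 120],
                          labels=["young", "middle", "senior"])

# Categorical -> numeric (one-hot encoding)
encoded = pd.get_dummies(df, columns=["city"])
print(encoded)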
Preprocessing Techniques (cont..)
• Feature extraction
– Dimensionality Reduction Techniques
• Feature selection
– Filter methods
– Wrapper method
• Feature construction
– New features can be created using
• Polynomial expansion
• Feature crossing

Source: https://cloud.google.com/solutions/machine-learning/data-preprocessing-for-ml-with-tf-transform-pt1 60
Preprocessing Techniques (cont..)
• For unstructured data
– Deep learning folds domain-knowledge-based feature engineering into the model architecture.
  • For example, a convolutional layer is an automatic feature preprocessor.
– Some amount of preprocessing is still required in some situations, for example:
  • For text documents: stemming and lemmatization, TF-IDF calculation, n-gram extraction, and embedding lookup.
  • For images: clipping, resizing, cropping, Gaussian blur, and Canny filters.

Source: https://cloud.google.com/solutions/machine-learning/data-preprocessing-for-ml-with-tf-transform-pt1 61
Data Cleaning
• Incorrect data is either
– Corrected
– Removed
– Imputed
• Irrelevant data
• Duplicates
• Type conversion
– Numbers are stored as numerical data type
– Date should be as date object or time stamp
– Categorical values can be converted into and from
numbers.
– Values which cannot be converted to the specified type
should be converted to NA
62
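
A small sketch (pandas assumed) of the type conversions described above; the column names are hypothetical, and unparseable values become NA via errors="coerce".

import pandas as pd

df = pd.DataFrame({
    "price": ["10", "12.5", "abc"],                         # numbers stored as strings
    "date": ["2021-05-28", "not a date", "2021-06-01"],
    "grade": ["low", "high", "low"],
})

df["price"] = pd.to_numeric(df["price"], errors="coerce")   # "abc" -> NaN
df["date"] = pd.to_datetime(df["date"], errors="coerce")    # bad dates -> NaT
df["grade"] = df["grade"].astype("category").cat.codes      # categories -> integer codes
print(df)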
Data Cleaning (cont…)
• Syntax errors
– Remove white spaces
– Pad strings
– Fix typos

63
Missing Values
• Ignore
– Not useful
• Drop
– Some of the values in a column are missing and occur at random
• drop rows (observations) having missing values.
– Most of the column’s values are missing, and occur at
random
• drop the whole column.

64
Missing Values (cont…)
• Impute
– Compute missing values based on other observations
• Use statistical values
– Mean (if data is not skewed)
– Median (for skewed data)
– Not useful for
» Biased data
» Many missing values
• Hot-deck
– Copying values from similar records
• Values estimated by other predictive models
– Linear regression
» Sensitive to outliers
– K nearest neighbours
65
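
A minimal sketch (pandas and scikit-learn assumed) of the imputation options listed above: mean and median fills plus a k-nearest-neighbours imputer; the toy data is illustrative only.

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer, KNNImputer

df = pd.DataFrame({"x1": [1.0, 2.0, np.nan, 4.0],
                   "x2": [10.0, np.nan, 30.0, 40.0]})

mean_filled = pd.DataFrame(SimpleImputer(strategy="mean").fit_transform(df), columns=df.columns)
median_filled = df.fillna(df.median())            # better suited to skewed data
knn_filled = pd.DataFrame(KNNImputer(n_neighbors=2).fit_transform(df), columns=df.columns)
print(knn_filled)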
Missing Values (cont…)
• Flag
– Filling in missing values can lead to loss of information
– Fill missing numeric data with 0
– Fill missing categorical data with “missing”

66
Missing Values Examples

67
Acknowledgement
• Slide contents are based on Introduction to Machine Learning with Python by Andreas C. Müller and Sarah Guido
• Examples are taken from scikit-learn
• https://benalexkeen.com/feature-scaling-with-scikit-learn/
• https://towardsdatascience.com/introduction-to-data-preprocessing-in-machine-learning-a9fa83a5dc9d
• Data related to data cleaning is taken from https://towardsdatascience.com/the-ultimate-guide-to-data-cleaning-3969843991d4
• Examples
  – https://machinelearningmastery.com/handle-missing-data-python/

68
Standardize
• String
– All values are in upper or lower case
• Numerical values
– All values use the same measurement unit
  • For example, lengths should all be in meters, or all in kilometers
• Dates
– Timestamp
– Date object
– Time zone
– etc

69
Feature Scaling
• Feature scaling standardizes the independent features present in the data to a fixed range.
• Used when feature values vary widely.
• Without feature scaling, many machine learning algorithms implicitly treat features with larger values as more important and give them higher weight.

70
Feature Scaling (cont…)
• Scaling Training and Test Data the Same Way

Introduction to Machine Learning with Python by Andreas C. Müller and Sarah Guido 71
• StandardScaler
• RobustScaler
• MinMaxScaler
• Normalizer

72
StandardScaler
• StandardScaler works well when the data is normally distributed within each feature
• After scaling, each feature's distribution has
  – zero mean (the mean is removed, i.e., mean = 0)
  – unit variance
• x = (x – mean) / standard_deviation
• As a result of applying StandardScaler, all features have the same magnitude.
• This technique does not guarantee any particular minimum or maximum values for the features.
73
StandardScaler Example 1
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler

var = [[6, 4], [5, 2], [1, 5], [3, 4], [3, 6]]

df = pd.DataFrame(var)
df.columns = ['X', 'Y']

# scale each feature to zero mean and unit variance
scaler = StandardScaler()
scaled_df = scaler.fit_transform(df)
scaled_df = pd.DataFrame(scaled_df, columns=['X', 'Y'])

fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(6, 5))
ax1.set_title('Before Scaling')
sns.scatterplot(x="X", y="Y", ax=ax1, data=df)
ax2.set_title('After Standard Scaler')
sns.scatterplot(x="X", y="Y", ax=ax2, data=scaled_df)
plt.show()

74
StandardScaler Example 1

75
StandardScaler Example 2
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import preprocessing

np.random.seed(1)
df = pd.DataFrame({
    'x1': np.random.normal(15, 5, 100),
    'x2': np.random.normal(20, 8, 100)
})

# scale both features to zero mean and unit variance
scaler = preprocessing.StandardScaler()
scaled_df = scaler.fit_transform(df)
scaled_df = pd.DataFrame(scaled_df, columns=['x1', 'x2'])

fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(6, 5))
ax1.set_title('Before Scaling')
sns.kdeplot(df['x1'], ax=ax1)
sns.kdeplot(df['x2'], ax=ax1)
ax2.set_title('After Standard Scaler')
sns.kdeplot(scaled_df['x1'], ax=ax2)
sns.kdeplot(scaled_df['x2'], ax=ax2)
plt.show()
76
StandardScaler Example 2

77
StandardScaler Example 3

78
MinMaxScaler
• Rescales data to a predefined range
  – Normally, the range is
    • [0, 1]
    • [-1, 1], if negative values exist
• X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
• X_scaled = X_std * (max - min) + min
• Less suitable for Gaussian-distributed data (StandardScaler is usually preferred there)
• Works even when the standard deviation is very small
• Sensitive to outliers

79
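
A short sketch of MinMaxScaler, assuming scikit-learn; it applies the X_std formula above to rescale each feature to [0, 1].

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({"X": [6, 5, 1, 3, 3], "Y": [4, 2, 5, 4, 6]})
scaled = pd.DataFrame(MinMaxScaler().fit_transform(df), columns=df.columns)
print(scaled)  # every column now lies between 0 and 1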
Example

Source: https://jovianlin.io/feature-scaling/
80
RobustScaler

81
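
The RobustScaler figure from the slide is not reproduced here. A minimal sketch, assuming scikit-learn: RobustScaler centers each feature on its median and scales by the interquartile range, so extreme outliers influence the result far less than with StandardScaler.

import pandas as pd
from sklearn.preprocessing import RobustScaler

df = pd.DataFrame({"X": [6, 5, 1, 3, 3, 100]})   # 100 is an outlier
scaled = pd.DataFrame(RobustScaler().fit_transform(df), columns=df.columns)
print(scaled)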
Normalization
• Rescaling the data from its original range so that all values lie within the new range of 0 to 1.
• Normalization requires that you know or are able
to accurately estimate the minimum and
maximum observable values.
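
A small sketch (NumPy assumed) of the normalization described above: rescaling values into [0, 1] using the observed minimum and maximum.

import numpy as np

x = np.array([2.0, 8.0, 5.0, 11.0])
x_norm = (x - x.min()) / (x.max() - x.min())
print(x_norm)   # [0.  0.667  0.333  1.]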
