Research Trends in Machine Learning: Muhammad Kashif Hanif
Today’s Topic
• Machine Learning
• Supervised Machine Learning
– Classification
– Regression
• Unsupervised Machine Learning
• Preprocessing Techniques
Machine Learning
• Machine learning is about extracting knowledge
from data.
• Intersection of
– Statistics
– Computer Science
– Mathematics
• Also called predictive analytics or statistical
learning
Machine Learning
• Supervised Learning
– Training data is labeled
– Goal is to correctly label new data
• Reinforcement Learning
– Training data is unlabeled
– System receives feedback for its actions
– Goal is to learn to perform better actions
• Unsupervised Learning
– Training data is unlabeled
– Goal is to categorize the observations
Applications of Machine Learning
• Handwriting Recognition
– convert handwritten characters into digital text
• Language Translation
– translate spoken and/or written languages (e.g. Google Translate)
• Speech Recognition
– convert voice snippets to text (e.g. Siri, Cortana, and Alexa)
• Image Classification
– label images with appropriate categories (e.g. Google Photos)
• Autonomous Driving
– enable cars to drive themselves
Features in Machine Learning
[Figure: an example image and the model’s prediction]
Evaluation Metrics
• Precision
– Percentage of positive labels that are correct
– Precision = (# true positives) / (# true positives + # false positives)
• Recall
– Percentage of positive examples that are correctly labeled
– Recall = (# true positives) / (# true positives + # false negatives)
• Accuracy
– Percentage of correct labels
– Accuracy = (# true positives + # true negatives) / (# of samples)
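As a quick check of the three formulas above, here is a minimal Python sketch; the confusion-matrix counts (tp, fp, fn, tn) are made-up values for illustration.

# Hypothetical confusion-matrix counts, chosen only to illustrate the formulas.
tp, fp, fn, tn = 40, 10, 5, 45

precision = tp / (tp + fp)                    # 40 / 50 = 0.80
recall = tp / (tp + fn)                       # 40 / 45 ≈ 0.89
accuracy = (tp + tn) / (tp + fp + fn + tn)    # 85 / 100 = 0.85
print(precision, recall, accuracy)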
Training and Test Data
• Training Data
– data used to learn a model
• Test Data
– data used to assess the accuracy of the model
• Overfitting
– Model performs well on training data but poorly on test data
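A minimal sketch of how this is checked in practice, assuming scikit-learn is available: fit a model on the training split only, then compare training and test accuracy; a large gap between the two scores is the usual symptom of overfitting.

# Split the data, fit a model, and compare train vs. test accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = DecisionTreeClassifier().fit(X_train, y_train)
print("train accuracy:", model.score(X_train, y_train))
print("test accuracy: ", model.score(X_test, y_test))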
Bias and Variance
• Model Scenarios
– High Bias: Model makes inaccurate predictions on training data
– High Variance: Model does not generalize to new datasets
– Low Bias: Model makes accurate predictions on training data
– Low Variance: Model generalizes to new datasets
Supervised Machine Learning
• Supervised learning uses an algorithm to learn the mapping from input
variables (X) to the output variable (Y)
Y = f(X)
• The goal is to approximate the mapping function so well that, given
new input data (X), you can predict the output variable (Y) for that
data.
Source: https://machinelearningmastery.com/supervised-and-unsupervised-machine-learning-algorithms/
https://www.sciencedirect.com/bookseries/advances-in-computers
Supervised Learning (cont…)
• Supervised learning tunes model parameters on labeled data sets so
that the tuned model also works on larger, unseen data
https://stackoverflow.com/questions/19170603/what-is-the-difference-between-labeled-and-unlabeled-data
Supervised Learning Algorithms
• Decision Trees
• Random Forest
• Linear Regression
• K-Nearest Neighbor
• Neural Networks
Supervised Learning Frameworks
Source: https://cloud.google.com/solutions/machine-learning/data-preprocessing-for-ml-with-tf-transform-pt1
Features
• Dataset
– Collection of data objects
• Records, vectors, observations, samples, etc.
• Data objects can be described by features
• A feature is an individual measurable property or
characteristic of a phenomenon being observed
(Wikipedia).
• Features are also called variables,
characteristics, attributes or dimensions
Source: https://towardsdatascience.com/data-preprocessing-concepts-fa946d11c825
Data Types
[Figures: overview of data types]
Source: https://towardsdatascience.com/data-preprocessing-concepts-fa946d11c825
Acknowledgement
Slides contents are based on
• Introduction to Machine Learning with Python by
Andreas C. Müller and Sarah Guido
• Data Mining Concepts and Techniques by Han et al.
• Examples are taken from scikit-learn
Decision Tree
• Decision tree induction is the learning of decision
trees from class-labeled training tuples.
• A decision tree is a flowchart-like tree structure,
where
– each internal node (nonleaf node) denotes a test on
an attribute,
– each branch represents an outcome of the test, and
– each leaf node (or terminal node) holds a class label.
– The topmost node in a tree is the root node.
• Widely used decision tree algorithms are
– ID3 (Iterative Dichotomiser)
– C4.5 (a successor of ID3)
– CART (Classification and Regression Trees)
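A minimal sketch of decision tree learning with scikit-learn, whose DecisionTreeClassifier implements an optimized version of CART (not ID3 or C4.5); the shortened feature names are illustrative.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Print the learned flowchart: internal nodes test an attribute,
# branches are test outcomes, and leaves hold class labels.
print(export_text(tree, feature_names=["sepal len", "sepal wid",
                                       "petal len", "petal wid"]))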
Example
• Classification
• Greedy approach
Basic Algorithm
[Figure: listing of the basic decision tree induction algorithm (after Han et al.)]
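Since the algorithm listing itself is an image in the original deck, here is a hedged, self-contained Python sketch of the basic greedy induction loop (after Han et al.): recursively pick the attribute with the highest information gain, partition on it, and stop at pure or exhausted nodes. A teaching sketch, not a production implementation.

import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    expected = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - expected

def build_tree(rows, labels, attrs):
    if len(set(labels)) == 1:                  # all tuples in one class: leaf
        return labels[0]
    if not attrs:                              # no attributes left to test
        return Counter(labels).most_common(1)[0][0]   # majority-class leaf
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    node = {best: {}}                          # internal node tests `best`
    for value in {row[best] for row in rows}:  # one branch per outcome
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        node[best][value] = build_tree([rows[i] for i in idx],
                                       [labels[i] for i in idx],
                                       [a for a in attrs if a != best])
    return node

# Tiny made-up example: rows are dicts of categorical attributes.
rows = [{"outlook": "sunny", "windy": "no"},
        {"outlook": "sunny", "windy": "yes"},
        {"outlook": "rain", "windy": "no"}]
labels = ["play", "stay", "play"]
print(build_tree(rows, labels, ["outlook", "windy"]))
# {'windy': {'no': 'play', 'yes': 'stay'}}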
Attribute Selection Measures
• Problem
– selecting the splitting criterion that “best” separates a
given data partition, D, of class-labeled training tuples
into individual classes.
• Splitting rule
– The measure ranks each attribute; the best-ranked attribute becomes
the splitting attribute
• Methods
– Information gain
– Gain ratio
– Gini index
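As a small numeric sketch of two of these measures: for a data partition D with 9 tuples of one class and 5 of the other (the class proportions of Han et al.'s running example), entropy and the Gini index can be computed directly from the class probabilities.

import math

def entropy(p):
    return -sum(x * math.log2(x) for x in p if x > 0)

def gini(p):
    return 1 - sum(x * x for x in p)

parent = [9 / 14, 5 / 14]          # class probabilities of partition D
print("entropy(D) =", round(entropy(parent), 3))   # 0.94
print("gini(D)    =", round(gini(parent), 3))      # 0.459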
Source: https://towardsdatascience.com/data-preprocessing-concepts-fa946d11c825
Preprocessing Techniques
Some of data preprocessing operations are
• Data cleaning
– Removing or correcting records with corrupted or
invalid values from raw data
– Handling missing values
• removing records that are missing a large number of columns
• filling in missing values
– Smoothing noisy data
– Handling outliers
• Outliers are values which are significantly different from
other observations.
Source: https://cloud.google.com/solutions/machine-learning/data-preprocessing-for-ml-with-tf-transform-pt1
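A minimal pandas sketch of one operation from this list, handling outliers by clipping to a percentile range; the series values are made up.

import pandas as pd

s = pd.Series([12.0, 15.0, 14.0, 13.0, 900.0])   # 900 looks like an outlier
lower, upper = s.quantile(0.05), s.quantile(0.95)
print(s.clip(lower, upper))                      # extreme value pulled in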
Preprocessing Techniques
• Instance selection and partitioning
– Selecting data points from the input dataset to create training,
evaluation (validation), and test sets
• repeatable random sampling
• oversampling of minority classes
Source: https://cloud.google.com/solutions/machine-learning/data-preprocessing-for-ml-with-tf-transform-pt1
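A minimal sketch of the two bullets above, assuming scikit-learn: a repeatable (seeded) stratified split, plus naive oversampling of the minority class by resampling with replacement. The data is made up.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

X = np.arange(20).reshape(-1, 1)
y = np.array([0] * 15 + [1] * 5)           # imbalanced labels

# Repeatable random sampling: fixed random_state, stratified by y.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

# Oversample the minority class in the training set only.
minority = X_train[y_train == 1]
extra = resample(minority, replace=True, random_state=42,
                 n_samples=len(X_train[y_train == 0]) - len(minority))
X_balanced = np.vstack([X_train, extra])
y_balanced = np.concatenate([y_train, np.ones(len(extra), dtype=int)])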
Preprocessing Techniques (cont..)
• Feature tuning
– Improving the quality of a feature for ML
• Scaling
• normalizing numeric values
• imputing missing values
• clipping outliers
• adjusting values with skewed distributions.
Source: https://cloud.google.com/solutions/machine-learning/data-preprocessing-for-ml-with-tf-transform-pt1
Preprocessing Techniques (cont..)
• Representation transformation
– Some ML models work with
• Numeric features
• Categorical features
• Mixed type features
– Techniques to convert a numeric feature into a categorical feature
• Bucketization
– Techniques to convert a categorical feature into a numeric feature
• Encoding
• Learning with count
• Sparse feature embedding
Source: https://cloud.google.com/solutions/machine-learning/data-preprocessing-for-ml-with-tf-transform-pt1
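A minimal pandas sketch of both directions above: bucketizing a numeric feature into categories, and one-hot encoding a categorical feature into numeric columns. The column names and bins are illustrative.

import pandas as pd

df = pd.DataFrame({"age": [15, 32, 47, 68], "city": ["A", "B", "A", "C"]})

# Bucketization: numeric -> categorical.
df["age_bucket"] = pd.cut(df["age"], bins=[0, 18, 40, 65, 120],
                          labels=["child", "young", "middle", "senior"])

# One-hot encoding: categorical -> numeric.
encoded = pd.get_dummies(df, columns=["city"], prefix="city")
print(encoded)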
Preprocessing Techniques (cont..)
• Feature extraction
– Dimensionality Reduction Techniques
• Feature selection
– Filter methods
– Wrapper methods
• Feature construction
– New features can be created using
• Polynomial expansion
• Feature crossing
Source: https://cloud.google.com/solutions/machine-learning/data-preprocessing-for-ml-with-tf-transform-pt1
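A minimal scikit-learn sketch of feature construction: degree-2 polynomial expansion, which also produces the pairwise product terms that feature crossing yields for numeric inputs. The input values are made up.

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2.0, 3.0]])
poly = PolynomialFeatures(degree=2, include_bias=False)
print(poly.fit_transform(X))                 # [[2. 3. 4. 6. 9.]]
print(poly.get_feature_names_out(["x1", "x2"]))   # x1, x2, x1^2, x1 x2, x2^2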
Preprocessing Techniques (cont..)
• For unstructured data
– Deep learning folds domain-knowledge-based feature engineering into
the model architecture
• For example, a convolutional layer acts as an automatic feature
preprocessor
– Some amount of preprocessing is still required in some situations
• For text documents: stemming and lemmatization, TF-IDF
calculation, n-gram extraction, and embedding lookup
• For images: clipping, resizing, cropping, Gaussian blur, and
Canny filters
Source: https://cloud.google.com/solutions/machine-learning/data-preprocessing-for-ml-with-tf-transform-pt1
Data Cleaning
• Incorrect data is either
– Corrected
– Removed
– Imputed
• Irrelevant data
• Duplicates
• Type conversion
– Numbers should be stored as a numerical data type
– Dates should be stored as date objects or timestamps
– Categorical values can be converted to and from numbers
– Values that cannot be converted to the specified type should be set
to NA
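A minimal pandas sketch of these conversions; unparseable values become NA (NaN / NaT) instead of silently staying as strings. The data is made up.

import pandas as pd

df = pd.DataFrame({"amount": ["10", "20", "oops"],
                   "when": ["2021-01-05", "not a date", "2021-02-10"]})

df["amount"] = pd.to_numeric(df["amount"], errors="coerce")  # "oops" -> NaN
df["when"] = pd.to_datetime(df["when"], errors="coerce")     # bad date -> NaT

colors = pd.Series(["red", "blue", "red"], dtype="category")
print(colors.cat.codes.tolist())   # categories as numbers: [1, 0, 1]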
Data Cleaning (cont…)
• Syntax errors
– Remove white spaces
– Pad strings
– Fix typos
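A minimal pandas sketch of these three fixes; the example strings and the typo mapping are made up.

import pandas as pd

s = pd.Series(["  Lahore ", "lahor", "LAHORE "])
s = s.str.strip()                                 # remove white spaces
s = s.str.lower().replace({"lahor": "lahore"})    # fix a known typo
codes = pd.Series(["7", "42"]).str.zfill(5)       # pad strings: 00007, 00042
print(s.tolist(), codes.tolist())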
Missing Values
• Ignore
– Not useful
• Drop
– If some of a column’s values are missing and occur at random
• drop the rows (observations) having missing values
– If most of a column’s values are missing and occur at random
• drop the whole column
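A minimal pandas sketch of both drop strategies; the toy frame is made up.

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, np.nan, 3],
                   "b": [np.nan, np.nan, np.nan],  # mostly missing column
                   "c": [4, 5, 6]})

df = df.drop(columns=["b"])     # drop a column that is mostly missing
df = df.dropna()                # drop rows that still have missing values
print(df)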
Missing Values (cont…)
• Impute
– Compute missing values based on other observations
• Use statistical values
– Mean (if data is not skewed)
– Median (for skewed data)
– Not useful for
» Biased data
» Many missing values
• Hot-deck
– Copying values from similar records
• Values estimated by other predictive models
– Linear regression
» Sensitive to outliers
– K nearest neighbours
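A minimal scikit-learn sketch of two of the imputation options above; the array values are made up.

import numpy as np
from sklearn.impute import SimpleImputer, KNNImputer

X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, 6.0]])

mean_imp = SimpleImputer(strategy="mean")   # use strategy="median" for skewed data
print(mean_imp.fit_transform(X))            # NaN -> (1 + 7) / 2 = 4

knn_imp = KNNImputer(n_neighbors=2)         # k-nearest-neighbours imputation
print(knn_imp.fit_transform(X))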
Missing Values (cont…)
• Flag
– Imputing missing values can lead to loss of information (the fact
that a value was missing)
– Fill missing numeric data with 0
– Fill missing categorical data with “missing”
Missing Values Examples
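The examples on this slide are an image in the original deck; below is a hedged reconstruction in the spirit of the machinelearningmastery.com post cited in the acknowledgements: flag missing values before filling them in. The data is made up.

import numpy as np
import pandas as pd

df = pd.DataFrame({"income": [100.0, np.nan, 250.0],
                   "city": ["A", None, "B"]})

df["income_missing"] = df["income"].isna()   # flag before filling
df["income"] = df["income"].fillna(0)        # numeric -> 0
df["city"] = df["city"].fillna("missing")    # categorical -> "missing"
print(df)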
Acknowledgement
• Slides contents are based on Introduction to Machine
Learning with Python by Andreas C. Müller and Sarah
Guido
• Examples are taken from scikit-learn
• https://benalexkeen.com/feature-scaling-with-scikit-learn/
• https://towardsdatascience.com/introduction-to-data-preprocessing-in-machine-learning-a9fa83a5dc9d
• Data related to data cleaning is taken from
https://towardsdatascience.com/the-ultimate-guide-to-data-cleaning-3969843991d4
• Examples
– https://machinelearningmastery.com/handle-missing-data-python/
Standardize
• String
– All values are in upper or lower case
• Numerical values
– All values have similar measurement units
• For example, length can be in meters, kilometers, etc.
• Dates
– Timestamp
– Date object
– Time zone
– etc
Feature Scaling
• Feature scaling standardizes the independent features in the data to
a fixed range.
• Used when feature values vary widely.
• Without feature scaling, machine learning algorithms tend to give
features with larger numeric values higher weight, regardless of their
actual importance.
Feature Scaling (cont…)
• Scaling Training and Test Data the Same Way
Introduction to Machine Learning with Python by Andreas C. Müller and Sarah Guido
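A minimal sketch of the point this figure makes, assuming scikit-learn: fit the scaler on the training set only, then apply the same learned transformation to both sets.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

X, _ = load_iris(return_X_y=True)
X_train, X_test = train_test_split(X, random_state=0)

scaler = MinMaxScaler().fit(X_train)   # learn min/max from training data only
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)    # test values may fall outside [0, 1]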
• StandardScaler
• RobustScaler
• MinMaxScaler
• Normalizer
StandardScaler
• StandardScaler works well when data is normally distributed within
each feature
• The scaled feature distribution has
– zero mean (the mean is removed)
– unit variance
• x = (x – mean) / standard_deviation
• As a result of applying StandardScaler, all features have the same
magnitude.
• This technique does not guarantee any particular minimum or maximum
values for features.
StandardScaler Example 1
# Imports added so the example runs stand-alone.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler

var = [[6, 4], [5, 2], [1, 5], [3, 4], [3, 6]]
df = pd.DataFrame(var, columns=['X', 'Y'])

scaler = StandardScaler()
scaled_df = scaler.fit_transform(df)          # returns a NumPy array
scaled_df = pd.DataFrame(scaled_df, columns=['X', 'Y'])

fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(6, 5))
ax1.set_title('Before Scaling')
sns.scatterplot(x="X", y="Y", ax=ax1, data=df)
ax2.set_title('After Standard Scaler')
sns.scatterplot(x="X", y="Y", ax=ax2, data=scaled_df)
plt.show()
StandardScaler Example 1
[Figure: scatter plots of the data before and after standard scaling]
StandardScaler Example 2
# Imports added so the example runs stand-alone.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import preprocessing

np.random.seed(1)
df = pd.DataFrame({
    'x1': np.random.normal(15, 5, 100),
    'x2': np.random.normal(20, 8, 100)
})

scaler = preprocessing.StandardScaler()
scaled_df = scaler.fit_transform(df)
scaled_df = pd.DataFrame(scaled_df, columns=['x1', 'x2'])

fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(6, 5))
ax1.set_title('Before Scaling')
sns.kdeplot(df['x1'], ax=ax1)
sns.kdeplot(df['x2'], ax=ax1)
ax2.set_title('After Standard Scaler')
sns.kdeplot(scaled_df['x1'], ax=ax2)
sns.kdeplot(scaled_df['x2'], ax=ax2)
plt.show()
StandardScaler Example 2
[Figure: density plots of x1 and x2 before and after standard scaling]
StandardScaler Example 3
[Figure: a further StandardScaler example]
MinMaxScaler
• Rescales data to a predefined range
– Typically
• [0, 1]
• [-1, 1], if negative values exist
• X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
• X_scaled = X_std * (max - min) + min
• Not suitable for
– Gaussian-distributed features
– features whose standard deviation is very small
• Sensitive to outliers
Example
Source: https://jovianlin.io/feature-scaling/
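The example on this slide is an image from the linked post; here is a hedged sketch of the same idea with scikit-learn's MinMaxScaler and its default [0, 1] range (the data values are made up).

import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[1.0, 10.0], [2.0, 20.0], [4.0, 40.0]])
scaler = MinMaxScaler()                 # feature_range=(0, 1) by default
print(scaler.fit_transform(X))
# column 1: (x - 1) / (4 - 1) -> [0, 1/3, 1]; column 2 scales the same way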
RobustScaler
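The body of this slide is an image in the original deck. As a hedged sketch: scikit-learn's RobustScaler centres each feature on its median and scales by the interquartile range, so extreme outliers influence it far less than StandardScaler (the data values below are made up).

import numpy as np
from sklearn.preprocessing import RobustScaler

X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])   # 100 is an outlier
print(RobustScaler().fit_transform(X).ravel())
# (x - median) / IQR with median = 3, IQR = 2 -> [-1, -0.5, 0, 0.5, 48.5]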
Normalization
• Rescaling of the data from the original range so that all values lie
within a new range of 0 to 1.
• Normalization requires that you know, or can accurately estimate, the
minimum and maximum observable values.
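A minimal sketch of this definition: rescale with the known (or estimated) minimum and maximum so all values land in [0, 1]. The values are made up.

import numpy as np

x = np.array([5.0, 10.0, 25.0])
x_min, x_max = x.min(), x.max()        # must be known or well estimated
x_norm = (x - x_min) / (x_max - x_min)
print(x_norm)                          # [0.   0.25 1.  ]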