
Machine Learning Fundamentals

Introduction

This webinar covers:

• Identifying needs and goals
• Analysing the requirements
• Gathering and preprocessing data
• Understanding how to apply machine learning in commercial settings

Lecturer: Samson Hui


IT Support for Research: https://www.polyu.edu.hk/its/researchsupport/en/
Materials on Git Repo: https://polyu.hk/OJETT
Contact Person

Timothy Yim
Senior Specialist
Information and Technology Service
[email protected]
Computer vs Human

• Computers are good at

  • 94893 × 1235 = 117192855, 2394 ÷ 13804 = 0.17342799…
  • Fast memory
  • Fast calculation
  • Fast signal transmission

• Humans are good at

  • Recognition
  • Thinking out of the box
  • Making decisions based on intelligence and life experience
Our Goal

• Develop algorithms and models so that computers can perform tasks that humans have traditionally been better at.

• With the help of high computational power and large data storage, computers can hopefully outperform humans in terms of accuracy, speed and volume.
AI vs Machine Learning vs Deep Learning

• Artificial intelligence – programs and machines that solve problems like humans do

• Machine learning is a subset of AI – learning without being explicitly programmed

• Deep learning is a subset of machine learning – based on neural networks


Data Visualization

• Data visualization is the graphical representation of information and data.

• Charts, graphs and maps

• Key tools for telling stories with data

• Curating data into a form that is easier to understand


Data Visualization – Data Table vs Graph
Data Visualization – Classifying Iris

Problem:
• Classifying three types of Iris, Setosa, Versicolour and Virginica.

Existing dataset information


• Sepal length (cm)
• Sepal width (cm)
• Petal length (cm)
• Petal width (cm)
Data Visualization – Demo with Orange
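
The original demo uses the Orange GUI and is not reproduced in this extract. As a rough Python stand-in (an assumption, not the webinar's own code), a scatter plot of two of the four features already separates the three species fairly well:

# Hypothetical Python stand-in for the Orange demo: scatter plot of the Iris data.
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

# Plot petal length against petal width, coloured by species.
for label, name in enumerate(iris.target_names):
    mask = y == label
    plt.scatter(X[mask, 2], X[mask, 3], label=name)

plt.xlabel(iris.feature_names[2])   # petal length (cm)
plt.ylabel(iris.feature_names[3])   # petal width (cm)
plt.legend()
plt.show()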
Data Preprocessing

• Data Cleaning

• Data Integration

• Data Transformation
Data Transformation

• Example: distance measured in different units (1 km, 1 m, 1 cm)

• Measurements on different scales do not contribute equally to the analysis.

• The correlation between features matters more than their absolute magnitude.

• Data transformation, or feature scaling, methods are therefore needed.
Standardization

• Standardization is a widely used data transformation technique that changes feature vectors into a common representation.

• Transform the data to center it by removing the mean value of each feature, then scale it by dividing non-constant features by their standard deviation.

• Therefore, the scaled data has zero mean and unit variance.


Standardization Demo with Python
Example Python Code for Standardization
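
The slide's example code is not included in this extract; a minimal sketch of the same idea with scikit-learn's StandardScaler (an assumed library choice) would be:

# Minimal standardization sketch: each feature ends up with zero mean and unit variance.
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two features on very different scales (e.g. metres vs centimetres).
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 500.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)   # (x - mean) / standard deviation, per column

print(X_scaled.mean(axis=0))   # approximately [0. 0.]
print(X_scaled.std(axis=0))    # approximately [1. 1.]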
Machine Learning Algorithms

• Machine learning algorithms or models are used to make decisions or predictions from data, e.g. KNN, Neural Network, SVM…

• The model is said to learn from existing data and to give outputs for new data.

• For example, traffic pattern prediction

• We will be focusing on machine learning algorithms in later webinars.
Validation

We need to know that our trained algorithm is working as expected. Validation is important before we publish our machine learning program to the world.

The challenges of validation

• Limited existing data
• Past data may not represent the future

We will introduce one of the most widely used validation methods to address these problems.
K-Fold Cross-Validation

• Cross-validation is a resampling procedure for evaluating a model on a limited data sample.
• k refers to the number of folds.

Scenario (see the sketch below)
• The data set is divided into k groups, e.g. 10 groups.
• 9 groups are used to train the machine learning model and the remaining group is used for testing.
• Each group takes a turn as the testing data set.
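
A minimal sketch of this scenario with scikit-learn (assuming the Iris data and a KNN model for illustration; the webinar's own demo may differ):

# 10-fold cross-validation: every group takes one turn as the test set.
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
kf = KFold(n_splits=10, shuffle=True, random_state=0)
model = KNeighborsClassifier(n_neighbors=3)

scores = []
for train_idx, test_idx in kf.split(X):
    model.fit(X[train_idx], y[train_idx])                 # train on 9 groups
    scores.append(model.score(X[test_idx], y[test_idx]))  # test on the remaining group

print(sum(scores) / len(scores))   # average accuracy over the 10 folds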
K-Fold Cross-Validation
Applied Machine Learning in Business
Thanks!
Machine Learning Fundamentals (Session 2)
Objectives

• Machine learning is a subset of AI

• Learning without being explicitly programmed

• Supervised learning vs unsupervised learning

• Now we are going to learn how to build a self-learning program
How Human Learns

Imagine we are learning how to throw darts….


[Diagram: the eyes provide feedback to the brain, which adjusts the throwing technique – the "algorithm".]

Our goal is to hit specific targets, e.g. bull's eye, triple 20, single 16…
Supervised Learning

• Trained on a pre-defined set of labelled data

• Reaches conclusions when given new data

• Develop the function y = f(x), where x is the input and y is the predicted output


Supervised Learning – Classification vs Regression

• Supervised learning problems can be further grouped into classification and regression problems.

• Classification – When the output variable is a category, e.g. true or false, red or
blue

• Regression – When the output variable is a real value, e.g. exchange rate,
weight
Supervised Learning – K Nearest Neighbors

• K nearest neighbours is a simple algorithm that stores all available cases and
classifies new cases based on a similarity measure (e.g. distance function).

• A new case is classified by a majority vote of its neighbours

• Similarity is measured by a distance function

• If K = 1, the new case is assigned to the class of its single nearest neighbour


Supervised Learning – K Nearest Neighbors

• When K=3, Class B

• When K=6, Class A


Supervised Learning – K Nearest Neighbors

The black line is the decision boundary.
KNN – How K Influences the algorithm

• The boundary becomes smoother as the value of K increases.

• When K is 1, the algorithm overfits the boundary.

• When K approaches the size of the data set, the prediction collapses to the single overall majority class, which is useless (see the sketch below).
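
A minimal sketch (assuming scikit-learn and the Iris data used elsewhere in the webinar) of how the choice of K affects accuracy:

# Compare several values of K for a KNN classifier using cross-validation.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

for k in (1, 3, 15, 75):
    model = KNeighborsClassifier(n_neighbors=k)
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"K={k:3d}  mean accuracy={acc:.3f}")
# Very small K tends to overfit; very large K drifts towards the overall majority class.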
Error Rates

Most of the time, our trained model will have errors:

• Classifying the target into a wrong class
• The predicted value is not exactly equal to the real value

We calculate the error rate to evaluate the effectiveness of our trained model (see the sketch below).

Bayes Error
• The lowest possible error rate for any classifier of a random outcome; it is analogous to the irreducible error.
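
A minimal sketch of measuring an error rate on held-out data (the model and split here are illustrative assumptions, not the webinar's exact example):

# Error rate = fraction of test samples that are classified wrongly.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
error_rate = 1 - model.score(X_test, y_test)   # score() returns accuracy
print(f"error rate: {error_rate:.3f}")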
Error Rates

In the KNN example, we fine-tune the value of K to lower the error as much as possible.

But what if we cannot improve the success rate any further and it is still bad…
Supervised Learning – Neural Network
Supervised Learning – Neural Network History

• Warren McCulloch and Walter Pitts (1943) opened the subject by creating a computational model for neural networks.

• The first functional networks with many layers were published by Ivakhnenko and Lapa in 1965.

• The basics of continuous backpropagation were derived in the context of control theory by Kelley in 1960 and by Bryson in 1961, using principles of dynamic programming.

• In the 1970s, a lot of research was carried out, but progress stagnated because computers at that time lacked sufficient power to process useful neural networks.

• Recently, the rise of high-performance GPUs and CPUs has made multi-layer neural networks feasible, and neural networks have become popular.
Supervised Learning – Neural Network

Neural networks are computing systems vaguely inspired by the biological neural networks that constitute animal brains.

Components
• Neurons
• Input layer
• Hidden layer
• Output layer

• Connections and Weights


Supervised Learning – Classifying Iris

Problem:
• Classifying three types of Iris, Setosa, Versicolour and Virginica.

Existing dataset information


• Sepal length (cm)
• Sepal width (cm)
• Petal length (cm)
• Petal width (cm)
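
The original neural-network demo is not shown in this extract; a minimal sketch using scikit-learn's MLPClassifier (an assumed stand-in) for this Iris problem might look like:

# Small feed-forward neural network for the Iris classification problem.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Standardize the inputs first (see the earlier standardization slides).
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Input layer (4 features) -> one hidden layer of 10 neurons -> output layer (3 classes).
model = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=0)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))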
Unsupervised Learning

• Dataset without labelled responses

• Find hidden patterns

• Find groupings in data

• Usually less accurate and less trustworthy

• Clustering is a common technique
Clustering

• Involves the grouping of data points

• Similar properties in the same group

• Highly dissimilar properties in different groups

• Works best if the classes do not overlap

Examples:
• K-means clustering
• Hierarchical clustering
• Fuzzy c-means clustering
K-means Clustering

• Target number k – number of centroids

• A centroid is the imaginary or real location representing the center of the cluster

• Allocates every data point to the nearest centroid

• Keeps the clusters as compact as possible (small distances to the centroid)


K-means Clustering - Steps

1. Randomly initialize a number of classes/groups (centres)

2. Assign each point to the closest centre

3. Re-compute the centres as the means of the assigned data points

4. Iterate a set number of times or until the centres no longer change much (see the sketch below)
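
A minimal sketch of these steps with scikit-learn's KMeans (assuming the Iris measurements as unlabeled input):

# K-means clustering of the Iris measurements, ignoring the species labels.
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X, _ = load_iris(return_X_y=True)    # keep only the features, discard the labels

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)   # k = 3 centroids
labels = kmeans.fit_predict(X)       # steps 1-4 above are run inside fit()

print(labels[:10])                   # cluster assigned to the first ten samples
print(kmeans.cluster_centers_)       # final centroid positions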


Summary

Supervised Learning
• Labelled data
• Develop a finely tuned function to predict outputs from inputs
• Can be very precise, but labelled data are harder to collect

Unsupervised learning
• Unlabelled data
• Find hidden patterns
• Less trustworthy, but unlabelled data are easier to collect
