
Machine Learning Fundamentals

Introduction

This webinar covers:

• Identifying needs and goals
• Analysing the requirements
• Gathering and preprocessing data
• Understanding how to apply machine learning in commercial settings

Lecturer: Samson Hui


IT Support for Research: https://www.polyu.edu.hk/its/researchsupport/en/
Materials on Git Repo: https://polyu.hk/OJETT
Contact Person

Timothy Yim
Senior Specialist
Information and Technology Service
[email protected]
Computer vs Human

• Computers are good at

  • 94893 × 1235 = 117192855, 2394 ÷ 13804 = 0.17342799…
  • Fast memory
  • Fast calculation
  • Fast signal transmission

• Humans are good at

  • Recognition
  • Thinking out of the box
  • Making decisions based on intelligence and life experience
Our Goal

• Develop algorithms and models so that computers can perform tasks that humans have traditionally been better at.

• With the help of high computational power and large data storage, computers can hopefully outperform humans in terms of accuracy, speed and volume.
AI vs Machine Learning vs Deep Learning

• Artificial intelligence – programs and machines that solve problems like humans do

• Machine learning is a subset of AI – learning without being explicitly programmed

• Deep learning is a subset of machine learning – based on neural networks


Data Visualization

• Data visualization is the graphical representation of information and data.

• Charts, graphs and maps

• Key tools for telling stories with data

• Curating data into a form that is easier to understand


Data Visualization – Data Table vs Graph
Data Visualization – Classifying Iris

Problem:
• Classifying three types of Iris, Setosa, Versicolour and Virginica.

Existing dataset information


• Sepal length (cm)
• Sepal width (cm)
• Petal length (cm)
• Petal width (cm)
Data Visualization – Demo with Orange
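
The original demo uses the Orange GUI and is not reproduced in this extract. As a rough Python stand-in (an assumption, not the webinar's own code), a scatter plot of two of the four features already separates the three species fairly well:

# Hypothetical Python stand-in for the Orange demo: scatter plot of the Iris data.
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

# Plot petal length against petal width, coloured by species.
for label, name in enumerate(iris.target_names):
    mask = y == label
    plt.scatter(X[mask, 2], X[mask, 3], label=name)

plt.xlabel(iris.feature_names[2])   # petal length (cm)
plt.ylabel(iris.feature_names[3])   # petal width (cm)
plt.legend()
plt.show()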
Data Preprocessing

• Data Cleaning

• Data Integration

• Data Transformation
Data Transformation

• Example: distance measured in different units (1 km, 1 m, 1 cm)

• Measurements on different scales do not contribute equally to the analysis.

• The correlation between features matters more than their absolute magnitude.

• Data transformation, or feature scaling, methods are therefore needed.
Standardization

• Standardization is a widely used data transformation technique that changes feature vectors into a common representation.

• Transform the data to center it by removing the mean value of each feature, then scale it by dividing non-constant features by their standard deviation.

• Therefore, the scaled data has zero mean and unit variance.


Standardization Demo with Python
Example Python Code for Standardization
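
The slide's example code is not included in this extract; a minimal sketch of the same idea with scikit-learn's StandardScaler (an assumed library choice) would be:

# Minimal standardization sketch: each feature ends up with zero mean and unit variance.
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two features on very different scales (e.g. metres vs centimetres).
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 500.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)   # (x - mean) / standard deviation, per column

print(X_scaled.mean(axis=0))   # approximately [0. 0.]
print(X_scaled.std(axis=0))    # approximately [1. 1.]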
Machine Learning Algorithms

• Machine learning algorithms or models are used to make decisions or predictions from data, e.g. KNN, Neural Network, SVM…

• The model is said to learn from existing data and to give outputs for new data.

• For example, traffic pattern prediction

• We will be focusing on machine learning algorithms in later webinars.
Validation

We need to know that our trained algorithm is working as expected. Validation is important before we publish our machine learning program to the world.

The challenges of validation

• Limited existing data
• Past data may not represent the future

We will introduce one of the most widely used validation methods to address these problems.
K-Fold Cross-Validation

• Cross-validation is a resampling procedure for evaluating a model on a limited data sample.
• k refers to the number of folds.

Scenario (see the sketch below)
• The data set is divided into k groups, e.g. 10 groups.
• 9 groups are used to train the machine learning model and the remaining group is used for testing.
• Each group takes a turn as the testing data set.
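
A minimal sketch of this scenario with scikit-learn (assuming the Iris data and a KNN model for illustration; the webinar's own demo may differ):

# 10-fold cross-validation: every group takes one turn as the test set.
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
kf = KFold(n_splits=10, shuffle=True, random_state=0)
model = KNeighborsClassifier(n_neighbors=3)

scores = []
for train_idx, test_idx in kf.split(X):
    model.fit(X[train_idx], y[train_idx])                 # train on 9 groups
    scores.append(model.score(X[test_idx], y[test_idx]))  # test on the remaining group

print(sum(scores) / len(scores))   # average accuracy over the 10 folds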
K-Fold Cross-Validation
Applied Machine Learning in Business
Thanks!
Machine Learning Fundamentals (Session 2)
Objectives

• Machine learning is a subset of AI

• Learning without being explicitly programmed

• Supervised learning vs unsupervised learning

• Now we are going to learn how to build a self-learning program
How Human Learns

Imagine we are learning how to throw darts….


[Diagram: the eyes provide feedback to the brain, which adjusts the throwing technique – the "algorithm".]

Our goal is to hit specific targets, e.g. bull's eye, triple 20, single 16…
Supervised Learning

• Trained on a pre-defined set of labelled data

• Reaches conclusions when given new data

• Develop the function y = f(x), where x is the input and y is the predicted output


Supervised Learning – Classification vs Regression

• Supervised learning problems can be further grouped into classification and regression problems.

• Classification – When the output variable is a category, e.g. true or false, red or
blue

• Regression – When the output variable is a real value, e.g. exchange rate,
weight
Supervised Learning – K Nearest Neighbors

• K nearest neighbours is a simple algorithm that stores all available cases and
classifies new cases based on a similarity measure (e.g. distance function).

• A new case is classified by a majority vote of its neighbours

• Similarity is measured by a distance function

• If K = 1, the new case is assigned to the class of its single nearest neighbour


Supervised Learning – K Nearest Neighbors

• When K=3, Class B

• When K=6, Class A


Supervised Learning – K Nearest Neighbors

The black line is the decision boundary.
KNN – How K Influences the algorithm

• The boundary becomes smoother as the value of K increases.

• When K is 1, the algorithm overfits the boundary.

• When K approaches the size of the data set, the prediction collapses to the single overall majority class, which is useless (see the sketch below).
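
A minimal sketch (assuming scikit-learn and the Iris data used elsewhere in the webinar) of how the choice of K affects accuracy:

# Compare several values of K for a KNN classifier using cross-validation.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

for k in (1, 3, 15, 75):
    model = KNeighborsClassifier(n_neighbors=k)
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"K={k:3d}  mean accuracy={acc:.3f}")
# Very small K tends to overfit; very large K drifts towards the overall majority class.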
Error Rates

Most of the time, our trained model will have errors:

• Classifying the target into a wrong class
• The predicted value is not exactly equal to the real value

We calculate the error rate to evaluate the effectiveness of our trained model (see the sketch below).

Bayes Error
• The lowest possible error rate for any classifier of a random outcome; it is analogous to the irreducible error.
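
A minimal sketch of measuring an error rate on held-out data (the model and split here are illustrative assumptions, not the webinar's exact example):

# Error rate = fraction of test samples that are classified wrongly.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
error_rate = 1 - model.score(X_test, y_test)   # score() returns accuracy
print(f"error rate: {error_rate:.3f}")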
Error Rates

In the KNN example, we fine-tune the value of K to lower the error as much as possible.

But what if we cannot improve the success rate any further and it is still bad…
Supervised Learning – Neural Network
Supervised Learning – Neural Network History

• Warren McCulloch and Walter Pitts (1943) opened the subject by creating a computational model for neural networks.

• The first functional networks with many layers were published by Ivakhnenko and Lapa in 1965.

• The basics of continuous backpropagation were derived in the context of control theory by Kelley in 1960 and by Bryson in 1961, using principles of dynamic programming.

• In the 1970s, a lot of research was carried out, but progress stagnated because computers at that time lacked sufficient power to process useful neural networks.

• Recently, the rise of high-performance GPUs and CPUs has made multi-layer neural networks feasible, and neural networks have become popular.
Supervised Learning – Neural Network

Neural networks are computing systems vaguely inspired by the biological neural networks that constitute animal brains.

Components
• Neurons
• Input layer
• Hidden layer
• Output layer

• Connections and Weights


Supervised Learning – Classifying Iris

Problem:
• Classifying three types of Iris, Setosa, Versicolour and Virginica.

Existing dataset information


• Sepal length (cm)
• Sepal width (cm)
• Petal length (cm)
• Petal width (cm)
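
The original neural-network demo is not shown in this extract; a minimal sketch using scikit-learn's MLPClassifier (an assumed stand-in) for this Iris problem might look like:

# Small feed-forward neural network for the Iris classification problem.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Standardize the inputs first (see the earlier standardization slides).
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Input layer (4 features) -> one hidden layer of 10 neurons -> output layer (3 classes).
model = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=0)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))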
Unsupervised Learning

• Dataset without labelled responses

• Find hidden patterns

• Find groupings in data

• Usually less accurate and less trustworthy

• Clustering is a common technique
Clustering

• Involves the grouping of data points

• Similar properties in the same group

• Highly dissimilar properties in different groups

• Works best if the classes do not overlap

Examples:
• K-means clustering
• Hierarchical clustering
• Fuzzy c-means clustering
K-means Clustering

• Target number k – number of centroids

• A centroid is the imaginary or real location representing the center of the cluster

• Allocates every data point to the nearest centroid

• Keeps the clusters as compact as possible (small distances to the centroid)


K-means Clustering - Steps

1. Randomly initialize a number of classes/groups (centres)

2. Assign each point to the closest centre

3. Re-compute the centres as the means of the assigned data points

4. Iterate a set number of times or until the centres no longer change much (see the sketch below)
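
A minimal sketch of these steps with scikit-learn's KMeans (assuming the Iris measurements as unlabeled input):

# K-means clustering of the Iris measurements, ignoring the species labels.
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X, _ = load_iris(return_X_y=True)    # keep only the features, discard the labels

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)   # k = 3 centroids
labels = kmeans.fit_predict(X)       # steps 1-4 above are run inside fit()

print(labels[:10])                   # cluster assigned to the first ten samples
print(kmeans.cluster_centers_)       # final centroid positions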


Summary

Supervised Learning
• Labelled data
• Develop a finely tuned function to predict outputs from inputs
• Can be very precise, but labelled data are harder to collect

Unsupervised learning
• Unlabelled data
• Find hidden patterns
• Less trustworthy, but unlabelled data are easier to collect
