Lesson3-IntroML

Uploaded by

Huy Nguyen
Introduction to machine learning

■ Outline:
1. What is machine learning?
2. Types of machine learning
3. Data for machine learning
4. Machine learning for classification
What is machine learning?

§ The process of solving a practical problem by:

Ø Gathering a dataset

Ø Building a statistical model from that dataset, which is then used to solve the practical problem.
History

§ In 1959, Arthur Samuel (an American pioneer in computer gaming and AI) coined the term “machine learning” while at IBM.

§ In 1960, IBM used the cool new term “machine learning” to attract clients and talented employees.
Three forces brought AI to life
What is machine learning?
Machine learning algorithm

§ Finding a mathematical formula based on a collection of inputs (i.e., “training data”).
§ Applying the formula to the training inputs → produces the desired outputs.
§ Applying the formula to novel inputs → generates the correct outputs.
§ Assumption: new inputs come from the same or a similar statistical distribution as the training inputs.
Traditional programming vs. ML

Traditional programming                  Machine learning

■ Feed the computer rules                ■ Feed the computer a huge amount of data
■ Computer utilizes its computing power  ■ Computer processes the data
■ Comes up with answers                  ■ Comes up with a trained model that can solve
                                           unseen problems of the real world
Example of Fizzbuzz game solving
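
The traditional-programming side of this example can be sketched as follows. This is a minimal sketch that assumes the standard FizzBuzz rules (multiples of 3 give "Fizz", multiples of 5 give "Buzz", multiples of both give "FizzBuzz"), which the slide text itself does not spell out; the function name `fizzbuzz` is illustrative.

```python
def fizzbuzz(n: int) -> str:
    """Return the FizzBuzz answer for n using hand-written rules."""
    # The programmer supplies the rules explicitly; nothing is learned from data.
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)


# Answers for the numbers 1..15.
results = [fizzbuzz(i) for i in range(1, 16)]
print(" ".join(results))
```

An ML approach would instead be given many (number, answer) pairs and would have to learn these rules from them.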

Group exercise

■ Problem: Given real numbers 𝑎 ≠ 0, 𝑏, 𝑐, find x satisfying the equation ax² + bx + c = 0.

■ Write a program that solves this equation using the traditional approach.
■ Investigate how to solve this equation using ML.

Search hint: quadratic equation solving
Example of quadratic equation solving

■ Given 𝑎 ≠ 0, 𝑏, 𝑐

■ Find x such that ax² + bx + c = 0

■ Solving:
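
The solving step itself is not preserved in the text. A minimal sketch of the traditional approach, assuming the standard discriminant-based quadratic formula and that only real roots are wanted (the function name `solve_quadratic` is illustrative):

```python
import math


def solve_quadratic(a: float, b: float, c: float) -> list:
    """Return the real roots of a*x^2 + b*x + c = 0 in ascending order."""
    if a == 0:
        raise ValueError("a must be non-zero for a quadratic equation")
    disc = b * b - 4 * a * c  # discriminant
    if disc < 0:
        return []  # no real roots
    if disc == 0:
        return [-b / (2 * a)]  # one repeated root
    sqrt_disc = math.sqrt(disc)
    return sorted([(-b - sqrt_disc) / (2 * a), (-b + sqrt_disc) / (2 * a)])


print(solve_quadratic(1, -3, 2))  # two real roots
print(solve_quadratic(1, 2, 1))   # one repeated root
print(solve_quadratic(1, 0, 1))   # no real roots
```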
Example of quadratic equation solving

■ Given y = ax² + bx + c

■ Find y at x = 8?

■ Using ML → y = 3.078x² + 1.701x + 1.106

■ x = 8 → y = ?
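
The slide does not say how the formula above was learned. One common option is a least-squares fit of the three coefficients to example (x, y) pairs. The sketch below assumes hypothetical training data generated from y = 3x² + 2x + 1 (not taken from the slides) and then evaluates the slide's formula at x = 8.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c via the normal equations."""
    # Each design-matrix row is [x^2, x, 1]; build A^T A (3x3) and A^T y (3).
    rows = [[x * x, x, 1.0] for x in xs]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    # Solve the 3x3 system by Gaussian elimination with partial pivoting.
    m = [ata[i] + [aty[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= factor * m[col][k]
    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for i in range(2, -1, -1):
        s = sum(m[i][k] * coef[k] for k in range(i + 1, 3))
        coef[i] = (m[i][3] - s) / m[i][i]
    return coef  # [a, b, c]


def predict(coef, x):
    a, b, c = coef
    return a * x * x + b * x + c


# Hypothetical, noise-free training data from y = 3x^2 + 2x + 1.
xs = [float(x) for x in range(-5, 6)]
ys = [3 * x * x + 2 * x + 1 for x in xs]
coef = fit_quadratic(xs, ys)

# Evaluate the formula quoted on the slide at x = 8.
y_slide = predict([3.078, 1.701, 1.106], 8)
print(coef, y_slide)
```

With the slide's coefficients, x = 8 gives y ≈ 211.706. Note the contrast with the previous slide: here the model predicts y for a given x, it does not find the roots.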
When is ML used?
Types of ML

■ Supervised learning: training data includes desired outputs (labels)
Ø Classification
Ø Regression
■ Unsupervised learning: training data does not include desired outputs (labels)
Ø Clustering
■ Reinforcement: rewards from a sequence of actions
Supervised learning

■ Dataset: set of labeled examples (labeled data)

Feature vector                   Label
[height weight gender age]       {normal, thin, fat}

■ Goal: to produce a model that takes a feature vector as input and outputs the label for this feature vector.
■ Ex: detection, classification
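
To make the "feature vector in, label out" idea concrete, here is a minimal sketch using a 1-nearest-neighbour rule. The choice of algorithm, the function names, and all numbers are assumptions for illustration; the slides do not prescribe a particular model.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def predict_1nn(training_set, features):
    """Return the label of the training example closest to `features`."""
    _, nearest_label = min(
        training_set,
        key=lambda example: squared_distance(example[0], features),
    )
    return nearest_label


# Hypothetical labeled examples: ([height, weight, gender, age], label).
training_set = [
    ([170.0, 65.0, 1.0, 30.0], "normal"),
    ([172.0, 48.0, 1.0, 28.0], "thin"),
    ([168.0, 95.0, 0.0, 35.0], "fat"),
]

label = predict_1nn(training_set, [169.0, 93.0, 0.0, 40.0])
print(label)
```

In practice the features would usually be rescaled first so that no single feature (here, weight) dominates the distance.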
Classification

■ An application of supervised learning

■ Automatically assigning a label to an unlabeled example

■ A classification learning algorithm takes a collection of labeled examples as input and produces a model that can take an unlabeled example as input and output a label (or a numeric label)

■ Ex: Covid-19 detection, traffic light classification
Regression

■ An application of supervised learning

■ Automatically predicting a real-valued label (i.e., target) given an unlabeled example.

■ A regression learning algorithm takes a collection of labeled examples as input and produces a model that can take an unlabeled example as input and output a target.

■ Ex: estimating house price based on house features [area, # bedrooms, location, etc.]
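
As an illustration of regression, the sketch below fits a straight line price ≈ slope × area + intercept by ordinary least squares. It assumes a single feature (area) and hypothetical data; a real house-price model would use more features.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: area in m^2 and price in arbitrary units.
areas = [50.0, 70.0, 90.0, 110.0]
prices = [100.0, 140.0, 180.0, 220.0]

slope, intercept = fit_line(areas, prices)

# Predict the real-valued target for an unseen house of 80 m^2.
predicted_price = slope * 80.0 + intercept
print(slope, intercept, predicted_price)
```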
Unsupervised learning

■ Dataset: set of unlabeled examples

Feature vector
[height weight gender age]

■ Goal: to produce a model that takes a feature vector as input and transforms it to another vector or to a value used to solve a practical problem.

■ Ex: clustering, outlier detection
Clustering
■ An application of unsupervised learning

■ Automatically assigning a label to examples

■ Dividing the examples into a number of groups/clusters such that examples in the same group are more similar to each other than to those in other groups.
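
One widely used clustering algorithm is k-means; the slides do not name a specific algorithm, so this is only one possible choice. A minimal sketch, assuming hypothetical 2-D points and fixed initial centroids so that the result is deterministic:

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))


def mean_point(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Keep the old centroid if a group ends up empty.
        centroids = [mean_point(g) if g else c for g, c in zip(groups, centroids)]
    labels = [nearest(p, centroids) for p in points]
    return labels, centroids


# Hypothetical unlabeled examples forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels, centroids)
```

The cluster indices returned in `labels` play the role of the automatically assigned labels.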

An example of clustering
Reinforcement

■ The machine is capable of perceiving the state of the surrounding environment as a feature vector.
■ The machine can execute actions in every state.
■ Different actions bring different rewards and can move the machine to another state.
■ Goal: to learn a policy (a function f, analogous to the model in supervised learning) that takes the feature vector of a state as input and outputs an optimal action (= the action that maximizes the expected average reward).
■ Ex: game playing, robotics, logistics.
Importance of data

§ ML depends heavily on data.

§ In every ML/AI project, data preparation takes most of the time.

§ Data in an unorganized format is not useful, because machines cannot ingest the useful information from it.

§ Flawed data can make an ML system harmful.

Ex: self-driving car crash in Florida (2017), Amazon’s AI recruiting tool that “learnt” gender bias
Data is used for…

§ Training the model

§ Evaluating the model

[Figure labels: Universal set (unobserved); Data acquisition; Training set (observed): train; Testing set (unobserved): evaluate/test; Practical usage]
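
A minimal sketch of splitting an acquired dataset into a training set and a testing set. The 80/20 ratio, the fixed seed, and the function name `train_test_split` are assumptions for illustration, not values given in the slides.

```python
import random


def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle the examples and hold out a fraction of them for testing."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical dataset of 10 examples, identified here by their index.
examples = list(range(10))
train, test = train_test_split(examples, test_fraction=0.2)
print(len(train), len(test))
```

The model is fitted on `train` only; `test` stands in for the unobserved data the model will meet in practical usage.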
What factors make a good dataset?

§ The right quantity

§ The approach to split data

§ The past history

§ Domain expertise (two key qualities: independence and identical distribution)

§ The right kind of data transformation

https://www.promptcloud.com/blog/what-to-look-for-in-training-dataset/
Dataset structure

§ A dataset comprises data and labels:

Ø Data: an array of shape [m, k] that stores the k-dimensional feature vectors of m objects

Ø Labels: the labels of the m objects

§ Label types:

Ø Integer numbers

Ø String (class name)

Ø Soft: real numbers in the interval [0, 1]

Ø Target: numeric values in the interval (−∞, +∞)
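
The structure above can be sketched with plain Python lists. All values are hypothetical, and the integer encoding (alphabetical order of class names) is one arbitrary convention among several.

```python
# Data: m = 3 objects, each with k = 4 features [height, weight, gender, age].
data = [
    [170.0, 65.0, 1.0, 30.0],
    [172.0, 48.0, 1.0, 28.0],
    [168.0, 95.0, 0.0, 35.0],
]

# String labels (class names), one per object.
string_labels = ["normal", "thin", "fat"]

# Integer labels: map each class name to its index in the sorted class list.
class_names = sorted(set(string_labels))
integer_labels = [class_names.index(name) for name in string_labels]

m, k = len(data), len(data[0])
print(m, k, class_names, integer_labels)
```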


How to build a dataset?
§ Start small and reduce the complexity of the data.

§ Articulate the problem early (e.g., classification, detection, ranking, …)

§ Establish data collection mechanisms

§ Check the data quality (human errors, technical problems, missing features; is the data adequate? is it imbalanced?)

§ Format data

§ Clean data

§ Segmentation

§ Complete feature engineering


How to build a dataset?
An example
Iris dataset (iris flower database)
Iris dataset

§ Perhaps the best-known database

§ The dataset contains 3 classes of 50 instances each, where each class refers to a type of iris plant.

§ Inputs: sepal length in cm, sepal width in cm, petal length in cm, petal width in cm

§ Outputs: Iris Setosa, Iris Versicolour, Iris Virginica


Basic terms of classification
Types of classification
Implementation of classification

■ Template matching (~ grid-by-grid comparison)

■ Machine learning
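
A minimal sketch of template matching on feature vectors: the sample receives the label of the stored template it is closest to, under either the norm-1 (sum of absolute differences) or the norm-2 (Euclidean) distance. The templates, class names, and sample below are hypothetical.

```python
import math


def l1_distance(u, v):
    """Norm-1 distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def l2_distance(u, v):
    """Norm-2 (Euclidean) distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def classify(sample, templates, distance):
    """Return the label whose template is closest to the sample."""
    return min(templates, key=lambda label: distance(sample, templates[label]))


# Hypothetical templates: one reference feature vector per class.
templates = {
    "circle": [1.0, 0.0, 2.0],
    "square": [4.0, 4.0, 0.0],
}
sample = [1.5, 0.5, 1.0]

print(classify(sample, templates, l1_distance))
print(classify(sample, templates, l2_distance))
```

The two distances can disagree on borderline samples, which is why it is worth trying both.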

Exercise

■ The feature vector extracted from a photo of an orange is {2.7887, 6.5063, 9.4425, 9.8402, -19.5930}, and the one extracted from a photo of an apple is {2.6743, 5.7745, 9.9031, 11.0016, -21.4722}.

■ Using template matching with two distance measures, norm 1 and norm 2, classify the fruit photo whose feature vector is {2.6588, 5.7358, 9.6682, 10.7427, -20.9914}: which fruit does it show? Assume the photo shows only one kind of fruit, either an orange or an apple.
Extended exercise

■ Problem: Classify Italian wines using the template matching method

■ Dataset: https://github.com/MukeshTirupathi/Wine-Classifier-Italy?tab=readme-ov-file

These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wines.

■ Data split: 1 sample for testing; all the remaining samples for training
