L1 Intro
Khoat Than
School of Information and Communication Technology
Hanoi University of Science and Technology
2024
Contents
Introduction to Machine Learning & Data Mining
Supervised learning
Unsupervised learning
Performance evaluation
Practical advice
Who is real? Who is fake?
Why ML & DM?
“The most important general-purpose technology of our era is artificial
intelligence, particularly machine learning” – Harvard Business
Review
https://hbr.org/cover-story/2017/07/the-business-of-artificial-intelligence
[Chart; source: Statista]
Why? Industry 4.0
https://www.pwc.com/ca/en/industries/industry-4-0.html
Why? AI & DS & Industry 4.0
[Diagram: Artificial Intelligence, Machine Learning, Data Science, Industry 4.0]
Some successes: Amazon’s secret
Ian Goodfellow
Artificial faces
Goodfellow, Ian, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair,
Aaron Courville, and Yoshua Bengio. "Generative adversarial nets." In NIPS, pp. 2672-2680. 2014.
Some successes: AlphaGo (2016)
Google DeepMind's AlphaGo defeated a world champion at Go in March 2016.
Go is a 2500-year-old game.
Go is one of the most complex games.
[Figure: images generated by Midjourney, DALL-E 2, and Imagen, e.g., "A bowl of soup" …]
Some successes: ChatGPT (2022)
Human-level chatting, writing, QA, …
“Why ChatGPT is about to change how you work, like it or not?” – Forbes, Feb. 2, 2023
Some successes: Sora (2024)
Generates videos from short text descriptions
Machine Learning vs Data Mining
Machine Learning (ML): to build computer systems that can improve themselves by learning from data.
Data Mining (DM): to find new and useful knowledge from large datasets.
Typical data: texts (websites, emails, articles, tweets), 2D/3D images, videos + metadata, spectrograms, DNA, …
Methodology: product-driven
[Diagram: Business understanding, Analytic approach, Data requirements, Data collection, Data understanding, Data preparation, Modeling, Evaluation, Deployment, Feedback. Source: http://www.theta.co.nz/]
Methodology: insight-driven
[Chart: Precision vs. % Answered for successive system versions: Baseline (12/06), v0.1 (12/07), v0.2 (05/08), v0.3 (08/08), v0.4 (12/08), v0.5 (05/09). © Data Science Laboratory, SOICT, HUST, 2017]
What is Machine Learning?
Machine Learning (ML) is an active subfield of Artificial
Intelligence.
ML seeks to answer the question [Mitchell, 2006]
How can we build computer systems that automatically improve with
experience, and what are the fundamental laws that govern all
learning processes?
[Figure: images generated from prompts "a small hedgehog holding a piece of watermelon", "a girl giving cat a gentle hug", "lychee-inspired spherical chair"]
After learning:
We obtain a model, new knowledge, or new experience (f).
We can use that model/function to do prediction or inference for future
observations, e.g.,
$y = f(x)$
Two basic learning problems
There is an unknown function $y^*$ that maps each $x$ to a number $y^*(x)$.
In practice, we can collect some pairs $(x_i, y_i)$, where $y_i = y^*(x_i)$.
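A minimal sketch of this setting, under illustrative assumptions: the hypothetical function `y_star(x) = 2x + 1` stands in for the unknown $y^*$, and the learner fits a straight line by ordinary least squares (one simple choice among many, not a method prescribed by the lecture). The learned function `f` is then used for prediction on a new observation, $y = f(x)$.

```python
from typing import Callable, List, Tuple


def y_star(x: float) -> float:
    # Hypothetical "unknown" target; in practice we only observe samples of it.
    return 2.0 * x + 1.0


def fit_line(pairs: List[Tuple[float, float]]) -> Callable[[float], float]:
    """Learn f(x) = a*x + b from pairs (x_i, y_i) by ordinary least squares."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return lambda x: a * x + b


# Collect training pairs (x_i, y_i) with y_i = y*(x_i).
train = [(x, y_star(x)) for x in [0.0, 1.0, 2.0, 3.0]]
f = fit_line(train)

# Predict for a future observation: y = f(x).
print(f(10.0))  # 21.0
```

Because these labels are noise-free and the hypothetical $y^*$ is itself a line, the fit recovers it exactly; with noisy labels or a non-linear $y^*$, `f` would only approximate $y^*$.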
Community detection
Detect communities in online social networks
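Community detection needs no labels. As one illustration (not necessarily the method used in this course), here is a minimal sketch of label propagation on a tiny hypothetical graph: every node starts in its own community and repeatedly adopts the most frequent label among its neighbours. To keep the output reproducible, this sketch updates nodes in a fixed order and breaks ties deterministically, whereas the original algorithm randomizes both, so results can depend on those choices.

```python
from collections import Counter
from typing import Dict, List


def label_propagation(adj: Dict[int, List[int]], max_iter: int = 100) -> Dict[int, int]:
    labels = {v: v for v in adj}  # every node starts in its own community
    for _ in range(max_iter):
        changed = False
        for v in sorted(adj):  # fixed update order (the original method uses a random order)
            counts = Counter(labels[u] for u in adj[v])
            if not counts:
                continue  # an isolated node keeps its own label
            best = max(counts.values())
            candidates = [lab for lab, c in counts.items() if c == best]
            # Keep the current label if it is among the most frequent; else take the smallest.
            new = labels[v] if labels[v] in candidates else min(candidates)
            if new != labels[v]:
                labels[v] = new
                changed = True
        if not changed:  # stop once no node changes its label
            break
    return labels


def communities(labels: Dict[int, int]) -> List[List[int]]:
    groups: Dict[int, List[int]] = {}
    for v, lab in labels.items():
        groups.setdefault(lab, []).append(v)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical graph: two triangles {0, 1, 4} and {2, 3, 5} joined by the edge 4-5.
adj = {0: [1, 4], 1: [0, 4], 4: [0, 1, 5], 2: [3, 5], 3: [2, 5], 5: [2, 3, 4]}
print(communities(label_propagation(adj)))  # [[0, 1, 4], [2, 3, 5]]
```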
Unsupervised learning: examples (2)
Trends detection
Discover the trends, demands, and future needs
of online users
Design a learning system (1)
Several issues should be carefully considered when designing
a learning system.
Determine the type of the function to be learned, e.g., $y^*: X \to$ set of labels/tags.
Collect a training set.
Modeling: ID3? …
[Diagram: the methodology cycle (Business understanding, Analytic approach, Data requirements, Data understanding, Data preparation, Modeling, Evaluation, Feedback)]
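ID3 grows a decision tree top-down, choosing at each node the attribute with the highest information gain, $IG(S, A) = H(S) - \sum_{v} \frac{|S_v|}{|S|} H(S_v)$, where $H$ is the entropy of the labels and $S_v$ is the subset of examples with $A = v$. The sketch below shows only that attribute-selection step, not a full tree learner, on a hypothetical four-row dataset with made-up attributes `outlook` and `windy`.

```python
import math
from collections import Counter
from typing import Dict, List

Row = Dict[str, str]


def entropy(labels: List[str]) -> float:
    # H(S) = -sum_c p_c * log2(p_c), over the label frequencies in S.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows: List[Row], attr: str, target: str) -> float:
    base = entropy([r[target] for r in rows])
    n = len(rows)
    remainder = 0.0
    for value in {r[attr] for r in rows}:  # split S by each value of the attribute
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return base - remainder


def best_attribute(rows: List[Row], attrs: List[str], target: str) -> str:
    # ID3's greedy choice: the attribute with maximal information gain.
    return max(attrs, key=lambda a: information_gain(rows, a, target))


# Hypothetical toy data.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "yes"},
]
print(information_gain(rows, "outlook", "play"))  # 1.0
print(information_gain(rows, "windy", "play"))    # 0.0
print(best_attribute(rows, ["outlook", "windy"], "play"))  # outlook
```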
ML: some issues (1)
Learning algorithm
Under what conditions will the chosen algorithm (asymptotically)
converge?
For a given application/domain and a given objective function, which
algorithm performs best?
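The convergence question can be made concrete on a toy problem. This sketch uses plain gradient descent (an illustrative choice; the lecture does not name an algorithm here) on the hypothetical objective $J(w) = (w - 3)^2$. Each step maps $w - 3$ to $(1 - 2\eta)(w - 3)$, so the iterates converge to the minimiser $w = 3$ exactly when $0 < \eta < 1$ and blow up when $\eta > 1$: convergence depends on the step size $\eta$.

```python
def gradient_descent(step: float, w0: float = 0.0, iters: int = 50) -> float:
    """Minimise J(w) = (w - 3)^2, whose gradient is 2 * (w - 3)."""
    w = w0
    for _ in range(iters):
        w -= step * 2.0 * (w - 3.0)  # w <- w - eta * J'(w)
    return w


print(gradient_descent(step=0.1))  # very close to 3, the minimiser
print(gradient_descent(step=1.1))  # far from 3: the iterates oscillate with growing size
```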
[Figure: training error vs. test error]
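A small sketch of why training error alone can be misleading, assuming a 1-nearest-neighbour classifier and hypothetical one-dimensional data (neither comes from the lecture): training error is measured on the examples the model was built from, test error on held-out examples.

```python
from typing import List, Tuple

Example = Tuple[float, int]


def predict_1nn(train: List[Example], x: float) -> int:
    # Return the label of the closest training point (1-nearest-neighbour).
    _, label = min(train, key=lambda ex: abs(ex[0] - x))
    return label


def error_rate(train: List[Example], data: List[Example]) -> float:
    wrong = sum(predict_1nn(train, x) != y for x, y in data)
    return wrong / len(data)


# Hypothetical data: roughly y = 1 when x >= 5, with one noisy training label at x = 3.
train = [(1, 0), (2, 0), (3, 1), (4, 0), (6, 1), (7, 1), (8, 1)]
test = [(2.6, 0), (3.2, 0), (4.4, 0), (5.5, 1)]

print(error_rate(train, train))  # 0.0: each training point is its own nearest neighbour
print(error_rate(train, test))   # 0.5: the noisy point at x = 3 misleads nearby test points
```

Comparing error on held-out data, rather than training error, is also the usual way to judge which algorithm performs best for a given task.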