2 Machine Learning Overview
Objectives
2 Huawei Confidential
Contents
Machine Learning Algorithms (1)
⚫ Machine learning is often combined with deep learning methods to study and observe AI
algorithms. A computer program is said to learn from experience 𝐸 with respect to some class
of tasks 𝑇 and performance measure 𝑃, if its performance at tasks in 𝑇, as measured by 𝑃,
improves with experience 𝐸.
(Figure: Data (experience 𝐸) → Learning algorithm (task 𝑇) → Understanding (performance measure 𝑃).)
Machine Learning Algorithms (2)
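(Figure: Human learning: experience is summarized into rules; a new problem is input to the rules to predict the future. Machine learning: historical data is used to train a model; new data is input to the model to predict future attributes.)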
Created by: Jim Liang
(Figure: Training data → Machine learning → Model; New data → Model → Prediction.)
When to Use Machine Learning (1)
⚫ Rules are complex or difficult to describe, for example, speech recognition.
⚫ Task rules change over time, for example, part-of-speech tagging, in which new words or word meanings can be generated at any time.
⚫ Data distribution changes over time and programs need to adapt to new data constantly, for example, sales trend forecasting.
When to Use Machine Learning (2)
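(Figure: quadrant chart of complexity of rules (vertical axis, low to high) against scale of the problem (horizontal axis, small to large).)
◼ High complexity, small scale: manual rules
◼ High complexity, large scale: machine learning algorithms
◼ Low complexity, small scale: simple questions
◼ Low complexity, large scale: rule-based algorithms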
Rationale of Machine Learning Algorithms
(Figure: Ideal: target function 𝑓: 𝑋 → 𝑌. Actual: training data 𝐷: {(𝑥1 , 𝑦1 ), ⋯ , (𝑥𝑛 , 𝑦𝑛 )} → learning algorithm → hypothesis function 𝑔 ≈ 𝑓.)
⚫ The target function 𝑓 is unknown, and the learning algorithm cannot recover 𝑓 exactly.
⚫ The hypothesis function 𝑔 approximates 𝑓, but may differ from it.
Main Problems Solved by Machine Learning
⚫ Machine learning can solve many types of tasks. The three most common types are:
◼ Classification: The program assigns the input to one of k categories. The learning algorithm usually outputs a function
𝑓: ℝⁿ → {1, 2, …, 𝑘}. For example, image classification algorithms in computer vision solve classification tasks.
◼ Regression: The program predicts a numeric output for a given input. The learning algorithm usually outputs a function 𝑓: ℝⁿ → ℝ.
Such tasks include predicting the claim amount of a policy holder to set an insurance premium, or predicting a security price.
◼ Clustering: Based on internal similarities, the program groups a large amount of unlabeled data into multiple classes. Data in the
same class is more similar than data across classes. Clustering tasks include search by image and user profiling.
⚫ Classification and regression are the two major types of prediction tasks. The output of classification is a discrete class
value, and the output of regression is a continuous value.
Types of Machine Learning
⚫ Supervised learning: The program takes a set of labeled samples and trains a model to generate predictions. The
trained model then maps inputs to outputs and applies a simple decision to the outputs. In this way, unknown data is
classified.
⚫ Unsupervised learning: The program builds a model based on unlabeled input data. For example, a clustering model
groups objects based on similarities. Unsupervised learning algorithms model highly similar samples, calculate
the similarity between new and existing samples, and classify new samples by similarity.
⚫ Semi-supervised learning: The program trains a model on a combination of a small amount of labeled data and
a large amount of unlabeled data.
⚫ Reinforcement learning: The learning system learns behavior from the environment to maximize the value of a reward
(reinforcement) signal. Reinforcement learning differs from supervised learning in that, instead of telling the system
the correct action, the environment provides scalar reinforcement signals that evaluate the actions taken.
⚫ The evolution of machine learning is producing new learning types, for example, self-supervised learning,
contrastive learning, and generative learning.
Supervised Learning
(Figure: Feature 1 ······ Feature n and Target are fed into the supervised learning algorithm.)
Weather   Temperature   Wind Speed   Suitable for Exercise
Sunny     High          High         Yes
Rainy     Low           Medium       No
Sunny     Low           Low          Yes
Supervised Learning - Regression
⚫ Regression reflects the features of sample attributes in a dataset. A function is used to express
the sample mapping relationship and further discover the dependency between attributes.
Examples include:
◼ How much money can I make from stocks next week?
◼ What will the temperature be on Tuesday?
Monday Tuesday
38° ?
Supervised Learning - Classification
⚫ Classification uses a classification model to map samples in a dataset to a given category.
◼ What category of garbage does the plastic bottle belong to?
◼ Is the email spam?
Unsupervised Learning
Data features
Unsupervised Learning - Clustering
⚫ Clustering uses a clustering model to classify samples in a dataset into several categories based
on similarity.
◼ Defining fish of the same species.
◼ Recommending movies for users.
Semi-supervised Learning
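(Figure: data with Feature 1 ······ Feature n and unknown targets → semi-supervised learning algorithm → model.)
(Figure, reinforcement learning: the environment emits the reward 𝑟𝑡+1 and the state 𝑠𝑡+1.)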
Reinforcement Learning - Best Action
⚫ Reinforcement learning always tries to find the best action.
◼ Autonomous vehicles: The traffic lights are flashing yellow. Should the vehicle brake or accelerate?
◼ Robot vacuum: The battery level is 10%, and a small area is not cleaned. Should the robot continue cleaning or
recharge?
Machine Learning Process
(Figure: Data preparation → Data cleansing → Feature extraction and selection → Model training → Model evaluation → Model deployment and integration, with feedback and iteration.)
Machine Learning Basic Concept - Dataset
⚫ Dataset: a collection of data used in machine learning tasks, where each piece of data is called a
sample. Items or attributes that reflect the presentation or nature of a sample in a particular
aspect are called features.
◼ Training set: the dataset used in the training process, where each sample is called a training sample.
Learning (or training) is the process of building a model from data.
◼ Test set: the dataset used in the testing process, where each sample is called a test sample. Testing is
the process in which the learned model is used for prediction.
Data Overview
⚫ Typical dataset composition
4 80 9 Southeast 1100
Importance of Data Processing
⚫ Data is crucial to models and determines the scope of model capabilities. All good models
require good data.
Data preprocessing includes:
◼ Data cleansing: Fill in missing values, and detect and eliminate noise and other abnormal points.
◼ Data standardization: Standardize data to reduce noise and improve model accuracy.
◼ Data dimension reduction: Simplify data attributes to avoid the curse of dimensionality.
Data Cleansing
⚫ Most machine learning models process features, which are usually numeric representations of
input variables that can be used in the model.
⚫ In most cases, algorithms can use only preprocessed data. Data preprocessing involves
the following operations:
◼ Data filtering
◼ Handling of missing data
◼ Handling of possible erroneous or abnormal values
◼ Merging of data from multiple sources
◼ Data consolidation
Dirty Data
⚫ Raw data usually contains data quality problems:
◼ Incompleteness: Incomplete data or lack of relevant attributes or values.
◼ Noise: Data contains incorrect records or abnormal points.
◼ Inconsistency: Data contains conflicting records.
Missing value
Invalid value
Misfielded value
Data Conversion
⚫ Preprocessed data needs to be converted into a representation suitable for machine learning models.
The following conversions are typical:
◼ Encoding categorical data as numbers for classification
◼ Converting numeric data into categorical data to reduce the number of distinct values (for example, segmenting age
data)
◼ Other data:
  ◼ Word embedding: converting the words in a text into word vectors (typically with models such as word2vec and BERT)
  ◼ Image data processing, such as color space conversion, grayscale image conversion, geometric conversion, Haar-like
  features, and image enhancement
◼ Feature engineering:
  ◼ Normalizing and standardizing features so that different input variables of a model fall into the same value range
  ◼ Feature augmentation: combining or converting existing variables to generate new features, such as averages
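The conversions listed above can be illustrated with a minimal Python sketch. The function names, the age segment edges (18, 40, 65), and the sample values are assumptions made up for illustration; they do not come from the slides.

```python
from statistics import mean, pstdev

def one_hot(values):
    """Encode categorical values as one-hot vectors (categories sorted alphabetically)."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

def bin_age(age, edges=(18, 40, 65)):
    """Convert a numeric age into a segment index; the edges are hypothetical."""
    return sum(age >= e for e in edges)

def min_max_normalize(xs):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def standardize(xs):
    """Shift and scale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]

colors = ["red", "blue", "red", "yellow"]   # hypothetical categorical feature
ages = [12, 25, 47, 70]                     # hypothetical numeric feature

encoded = one_hot(colors)
segments = [bin_age(a) for a in ages]
scaled = min_max_normalize(ages)
zscores = standardize(ages)
```

Min-max normalization and standardization are two common ways to bring features into a comparable range; which one suits a model depends on the data and the algorithm.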
Necessity of Feature Selection
⚫ Generally, a dataset has many features, some of which may be unnecessary or irrelevant to the
values to be predicted.
⚫ Feature selection is necessary in the following aspects:
◼ Simplifies models for easy interpretation
◼ Shortens training time
◼ Avoids the curse of dimensionality
◼ Improves model generalization and avoids overfitting
Feature Selection Methods - Filter
⚫ Filter methods are independent of models during feature selection.
Feature Selection Methods - Wrapper
⚫ Wrapper methods use a prediction model to score a feature subset.
Feature Selection Methods - Embedded
⚫ Embedded methods treat feature selection as a part of the modeling process.
Common method:
• LASSO regression
(Figure: embedded method process.)
Supervised Learning Example - Learning Phase
⚫ Use a classification model to determine whether a person is a basketball player based on
specific features.
Features (attributes) Target (label)
What Is a Good Model?
• Generalization
The accuracy of predictions based on actual data
• Explainability
Predicted results are easy to explain
• Prediction speed
The time needed to make a prediction
Model Effectiveness (1)
⚫ Generalization capability: Machine learning aims to ensure models perform well on new
samples, not just those used for training. Generalization capability, also called robustness, is
the extent to which a learned model can be applied to new samples.
⚫ Error is the difference between the prediction of a learned model on a sample and the actual
result of the sample.
◼ Training error is the error of the model on the training set.
◼ Generalization error is the error of the model on new samples. Obviously, we prefer a model
with a smaller generalization error.
⚫ Underfitting: The training error is large.
⚫ Overfitting: The training error of a trained model is small while the generalization error is large.
Model Effectiveness (2)
⚫ Model capacity, also known as model complexity, is the capability of the model to fit various functions.
◼ With sufficient capacity to handle task complexity and training data volumes, the algorithm results are optimal.
◼ Models with an insufficient capacity cannot handle complex tasks because underfitting may occur.
◼ Models with a large capacity can handle complex tasks, but overfitting may occur when the capacity is greater
than the amount required by a task.
(Figure: three fits. Underfitting: features not learned. Good fitting. Overfitting: noises learned.)
Cause of Overfitting - Errors
⚫ Prediction error = Bias² + Variance + Irreducible error
⚫ In general, the two main factors of prediction error are variance and bias.
(Figure labels: Variance, Bias.)
Variance and Bias
⚫ Different combinations of variance and bias are as
follows:
◼ Low bias & low variance ➜ good model
◼ Low bias & high variance ➜ inadequate model
◼ High bias & low variance ➜ inadequate model
◼ High bias & high variance ➜ bad model
⚫ An ideal model accurately captures the rules in the
training data and generalizes to unseen (new)
data. In practice, a model can rarely do both
perfectly at the same time.
Complexity and Errors of Models
⚫ The more complex a model is, the smaller its training error is.
⚫ As the model complexity increases, the test error decreases before increasing again, forming a
convex curve.
(Figure: error against model complexity, with one curve for training error and one for test error.)
Performance Evaluation of Machine Learning - Regression
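⚫ Mean absolute error (MAE). An MAE value closer to 0 indicates the model fits the training data
better.
$\mathrm{MAE} = \frac{1}{m} \sum_{i=1}^{m} \left| y_i - \hat{y}_i \right|$
⚫ Mean squared error (MSE).
$\mathrm{MSE} = \frac{1}{m} \sum_{i=1}^{m} \left( y_i - \hat{y}_i \right)^2$
⚫ The value of 𝑅² usually lies in [0,1] (it becomes negative when the model fits worse than always
predicting the mean). A larger value indicates that the model fits the training data
better. 𝑇𝑆𝑆 indicates the difference between samples, and 𝑅𝑆𝑆 indicates the difference
between the predicted values and sample values.
$R^2 = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum_{i=1}^{m} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{m} \left( y_i - \bar{y} \right)^2}$
Here, $y_i$ is the sample value, $\hat{y}_i$ is the predicted value, and $\bar{y}$ is the mean of the sample values.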
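As a minimal sketch, the three regression metrics above can be computed directly from their definitions. The sample values in `y_true` and `y_pred` are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def mae(y_true, y_pred):
    """Mean absolute error: average of |y_i - y_hat_i|."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error: average of (y_i - y_hat_i)^2."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - RSS / TSS."""
    y_bar = sum(y_true) / len(y_true)
    rss = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    tss = sum((y - y_bar) ** 2 for y in y_true)
    return 1 - rss / tss

y_true = [3.0, 5.0, 7.0, 9.0]   # hypothetical sample values
y_pred = [2.5, 5.0, 7.5, 9.0]   # hypothetical model predictions

print(mae(y_true, y_pred), mse(y_true, y_pred), r2(y_true, y_pred))
```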
Performance Evaluation of Machine Learning - Classification (1)
⚫ Terms:
◼ 𝑃: positive, indicating the number of real positive cases in the data.
◼ 𝑁: negative, indicating the number of real negative cases in the data.
◼ 𝑇𝑃: true positive, indicating the number of positive cases that are correctly classified as positive.
◼ 𝑇𝑁: true negative, indicating the number of negative cases that are correctly classified as negative.
◼ 𝐹𝑃: false positive, indicating the number of negative cases that are incorrectly classified as positive.
◼ 𝐹𝑁: false negative, indicating the number of positive cases that are incorrectly classified as negative.
Confusion matrix:
Actual \ Predicted   Yes   No    Total
Yes                  𝑇𝑃    𝐹𝑁    𝑃
No                   𝐹𝑃    𝑇𝑁    𝑁
Total                𝑃′    𝑁′    𝑃+𝑁
⚫ For 𝑚 classes, the confusion matrix is a table of at least 𝑚 × 𝑚. The entry 𝐶𝑀𝑖,𝑗 in the first 𝑚 rows and 𝑚 columns indicates the number of
cases that belong to class 𝑖 but are labeled as 𝑗.
◼ For classifiers with high accuracy, most of the cases should be represented by entries on the diagonal of the confusion matrix from 𝐶𝑀1,1 to
𝐶𝑀𝑚,𝑚, while other entries are 0 or close to 0. That is, 𝐹𝑃 and 𝐹𝑁 are close to 0.
Performance Evaluation of Machine Learning - Classification (2)
Measurement                                      Formula
Accuracy, recognition rate                       (𝑇𝑃 + 𝑇𝑁) / (𝑃 + 𝑁)
Error rate, misclassification rate               (𝐹𝑃 + 𝐹𝑁) / (𝑃 + 𝑁)
True positive rate, sensitivity, recall          𝑇𝑃 / 𝑃
True negative rate, specificity                  𝑇𝑁 / 𝑁
Precision                                        𝑇𝑃 / (𝑇𝑃 + 𝐹𝑃)
𝐹1 value, harmonic mean of precision and recall  (2 × 𝑝𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 × 𝑟𝑒𝑐𝑎𝑙𝑙) / (𝑝𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 + 𝑟𝑒𝑐𝑎𝑙𝑙)
Performance Evaluation of Machine Learning - Example
⚫ In this example, an ML model was trained to identify images of cats. To evaluate the model's
performance, 200 images were used, of which 170 were cats.
⚫ The model reported that 160 images were cats.
Actual \ Predicted   𝒚𝒆𝒔   𝒏𝒐   Total
𝑦𝑒𝑠                  140   30   170
𝑛𝑜                   20    10   30
Precision: 𝑃 = 𝑇𝑃 / (𝑇𝑃 + 𝐹𝑃) = 140 / (140 + 20) = 87.5%
Recall: 𝑅 = 𝑇𝑃 / 𝑃 = 140 / 170 = 82.4%
Accuracy: 𝐴𝐶𝐶 = (𝑇𝑃 + 𝑇𝑁) / (𝑃 + 𝑁) = (140 + 10) / (170 + 30) = 75%
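The cat example can be checked with a short Python sketch that derives every measurement in the formula table from the four confusion-matrix counts. The function name `classification_metrics` is an assumption for illustration.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute the standard binary classification measures from confusion-matrix counts."""
    p, n = tp + fn, fp + tn            # real positives and real negatives
    precision = tp / (tp + fp)
    recall = tp / p
    return {
        "accuracy": (tp + tn) / (p + n),
        "error_rate": (fp + fn) / (p + n),
        "recall": recall,              # true positive rate, sensitivity
        "specificity": tn / n,         # true negative rate
        "precision": precision,
        "f1": 2 * precision * recall / (precision + recall),
    }

# Counts from the cat example: 170 real cats, 160 images reported as cats.
metrics = classification_metrics(tp=140, fn=30, fp=20, tn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that accuracy alone can look acceptable on an imbalanced dataset like this one, which is why precision, recall, and specificity are reported alongside it.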
Machine Learning Training Methods - Gradient Descent (1)
⚫ This method uses the negative gradient direction
of the current position as the search direction,
which is the fastest descent direction of the
current position. The formula is as follows:
$w_{k+1} = w_k - \eta \nabla f_{w_k}(x^i)$
⚫ 𝜂 is the learning rate. 𝑖 indicates the 𝑖-th data sample.
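A minimal sketch of the update rule, assuming a hypothetical one-dimensional objective f(w) = (w − 3)² whose gradient is 2(w − 3). The objective, the starting point, and the learning rate are illustrative choices, not values from the slides.

```python
def gradient_descent(grad, w0, eta=0.1, steps=100):
    """Repeatedly step against the gradient: w <- w - eta * grad(w)."""
    w = w0
    for _ in range(steps):
        w = w - eta * grad(w)
    return w

# Hypothetical objective f(w) = (w - 3)^2, minimized at w = 3.
w_min = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
print(w_min)
```

With a learning rate that is too large the iterates can overshoot and diverge; with one that is too small, convergence is slow.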
Parameters and Hyperparameters
⚫ A model contains not only parameters but also hyperparameters. Hyperparameters enable the
model to learn the optimal configurations of the parameters.
◼ Parameters are automatically learned by models.
◼ Hyperparameters are manually set.
(Figure: parameters are "distilled" from data when the model is trained; hyperparameters are used to control training.)
Hyperparameters
Hyperparameters are configurations outside the model.
• Commonly used for model parameter estimation.
• Specified by the user.
• Set heuristically.
• Often tuned for a given predictive modeling problem.
Common hyperparameters
• λ of Lasso/Ridge regression
• Learning rate, number of iterations, batch size, activation function, and number of neurons of a neural network
to be trained
• 𝐶 and 𝜎 of support vector machines (SVMs)
• k in the k-nearest neighbors (k-NN) algorithm
• Number of trees in a random forest
Hyperparameter Search Process and Methods
Hyperparameter search general process
1. Divide a dataset into a training set, validation set, and test set.
2. Optimize the model parameters using the training set based on the model
performance metrics.
3. Search for model hyperparameters using the validation set based on model
performance metrics.
4. Perform step 2 and step 3 alternately until model parameters and
hyperparameters are determined, and assess the model using the test set.
Search algorithms (step 3)
• Grid search
• Random search
• Heuristic intelligent search
• Bayesian search
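Step 3 can be sketched for grid search and random search as follows. The search space, the function names, and in particular `toy_validation_score` are assumptions for illustration; in a real workflow the score would come from training a model on the training set and evaluating it on the validation set.

```python
import itertools
import random

# Hypothetical search space for two hyperparameters.
space = {"learning_rate": [0.01, 0.1, 1.0], "batch_size": [16, 32, 64]}

def grid_candidates(space):
    """Grid search: enumerate every combination of the listed values."""
    keys = list(space)
    return [dict(zip(keys, combo)) for combo in itertools.product(*(space[k] for k in keys))]

def random_candidates(space, n, seed=0):
    """Random search: sample each hyperparameter independently, n times."""
    rng = random.Random(seed)
    return [{k: rng.choice(v) for k, v in space.items()} for _ in range(n)]

def toy_validation_score(h):
    """Stand-in for a validation metric; it peaks at learning_rate=0.1, batch_size=32."""
    return -((h["learning_rate"] - 0.1) ** 2) - ((h["batch_size"] - 32) ** 2) / 1000

def search(candidates, score):
    """Return the candidate with the highest validation score."""
    return max(candidates, key=score)

best_grid = search(grid_candidates(space), toy_validation_score)
best_random = search(random_candidates(space, n=5), toy_validation_score)
print(best_grid, best_random)
```

Grid search evaluates every combination, so its cost grows multiplicatively with each added hyperparameter, whereas random search keeps the number of evaluations fixed at `n`.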
Hyperparameter Tuning Methods - Grid Search
(Figure: grid search over Hyperparameter 1 and Hyperparameter 2.)
Hyperparameter Tuning Methods - Random Search
⚫ If the hyperparameter search space is large, random search is
more appropriate than grid search.
⚫ In a random search, each setting item is sampled from possible
parameter values to find the most appropriate parameter
subset.
⚫ Note:
◼ In a random search, a search is first performed within a broad range, and
then the range is narrowed based on the location of the best result.
◼ Some hyperparameters are more important than others and affect random
search preferences.
(Figure: random search over Hyperparameter 1 and Hyperparameter 2.)
Cross-Validation (1)
⚫ Cross-validation is a statistical analysis method used to check the performance of classifiers. It splits the original
data into the training set and validation set. The former is used to train a classifier, whereas the latter is used to
evaluate the classifier by testing the trained model.
⚫ k-fold cross-validation (k-fold CV):
◼ Divides the original data into 𝑘 (usually equal-sized) subsets.
◼ Each unique group is treated as a validation set, and the remaining 𝑘 − 1 groups are treated as the training set.
In this way, 𝑘 models are obtained.
◼ The average classification accuracy score of the 𝑘 models on the validation set is used as the performance metric
for k-fold CV classifiers.
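The k-fold procedure can be sketched as follows. The helper names, the majority-class "model", and the label list are hypothetical and exist only to show how the folds are built and how the scores are averaged.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k roughly equal-sized folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size

def cross_validate(ys, k, fit, score):
    """Train k models, each validated on a different fold, and average the scores."""
    scores = []
    for train, val in k_fold_indices(len(ys), k):
        model = fit([ys[i] for i in train])
        scores.append(score(model, [ys[i] for i in val]))
    return sum(scores) / k

# Toy "classifier" that always predicts the majority label of its training fold.
def fit_majority(train_labels):
    return max(set(train_labels), key=train_labels.count)

def accuracy(predicted_label, val_labels):
    return sum(y == predicted_label for y in val_labels) / len(val_labels)

labels = [1, 1, 0, 1, 1, 0, 1, 1, 1, 1]   # hypothetical labels
cv_accuracy = cross_validate(labels, k=5, fit=fit_majority, score=accuracy)
print(cv_accuracy)
```

In practice the data is usually shuffled before it is split, so that each fold has a similar class distribution; this sketch keeps the original order for readability.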
Cross-Validation (2)
Full dataset
Machine Learning Algorithm Overview
Machine learning
Naive Bayes
Linear Regression (1)
⚫ Linear regression uses the regression analysis of mathematical statistics to determine the
quantitative relationship between two or more variables.
⚫ Linear regression is a type of supervised learning.
Linear Regression (2)
⚫ The model function of linear regression is as follows, where 𝑤 is the weight parameter, 𝑏 is the bias, and 𝑥
represents the sample:
$h_w(x) = w^T x + b$
⚫ The relationship between the value predicted by the model and the actual value is as follows, where 𝑦 indicates the
actual value, and 𝜀 indicates the error:
$y = w^T x + b + \varepsilon$
⚫ The error 𝜀 is affected by many independent factors. Linear regression assumes that the error 𝜀 follows a normal
distribution. The loss function of linear regression can be obtained using the normal distribution function and
maximum likelihood estimation (MLE), where 𝑚 is the number of samples:
$J(w) = \frac{1}{2m} \sum \left( h_w(x) - y \right)^2$
⚫ We want the predicted value to be as close to the actual value as possible, that is, to minimize the loss. We
can use a gradient descent algorithm to find the weight parameter 𝑤 at which the loss function reaches its
minimum, thereby completing model building.
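A minimal sketch of fitting a one-feature linear regression by gradient descent on the loss J(w) above. The data (generated from y = 2x + 1), the learning rate, and the step count are assumptions chosen for illustration.

```python
def loss(xs, ys, w, b):
    """J(w) = 1/(2m) * sum((h_w(x) - y)^2) with h_w(x) = w*x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / (2 * len(xs))

def fit_linear_regression(xs, ys, eta=0.05, steps=5000):
    """Batch gradient descent on J for a single feature."""
    w, b = 0.0, 0.0
    m = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / m   # dJ/dw
        grad_b = sum(errors) / m                              # dJ/db
        w -= eta * grad_w
        b -= eta * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]    # hypothetical samples
ys = [1.0, 3.0, 5.0, 7.0, 9.0]    # generated from y = 2x + 1, no noise

w, b = fit_linear_regression(xs, ys)
print(w, b, loss(xs, ys, w, b))
```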
Linear Regression Extension - Polynomial Regression
⚫ Polynomial regression is an extension of linear regression. It is used when the dataset is too
complex to be fitted with a straight line (the original linear regression model would obviously
underfit).
$h_w(x) = w_1 x + w_2 x^2 + \cdots + w_n x^n + b$
Here, 𝑛 is the degree of the polynomial.
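As a small sketch, the polynomial model is a linear model applied to the powers of x. The function names and the example weights are hypothetical.

```python
def polynomial_features(x, degree):
    """Expand a scalar x into [x, x^2, ..., x^degree]."""
    return [x ** p for p in range(1, degree + 1)]

def poly_predict(x, weights, bias):
    """h_w(x) = w_1*x + w_2*x^2 + ... + w_n*x^n + b, with n = len(weights)."""
    features = polynomial_features(x, len(weights))
    return sum(w * f for w, f in zip(weights, features)) + bias

# Hypothetical degree-2 model: h(x) = 1.0*x + 0.5*x^2 + 1.0
print(poly_predict(2.0, weights=[1.0, 0.5], bias=1.0))
```

Because the model stays linear in the weights, the same loss function and gradient descent procedure used for linear regression still apply after the feature expansion.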
Logistic Regression (1)
⚫ The logistic regression model is a classification model used to resolve classification problems. The model
is defined as follows:
$P(Y = 0 \mid x) = \frac{e^{-(wx+b)}}{1 + e^{-(wx+b)}}$
$P(Y = 1 \mid x) = \frac{1}{1 + e^{-(wx+b)}}$
𝑤 represents the weight, 𝑏 represents the bias, and 𝑤𝑥 + 𝑏 represents a linear function with respect to 𝑥.
Compare the preceding two probability values. 𝑥 belongs to the type with a larger probability value.
Logistic Regression (2)
⚫ Logistic regression and linear regression are both linear models in a broad sense. The former
introduces a non-linear factor (the sigmoid function) on top of the latter and sets a threshold.
Therefore, logistic regression applies to binary classification.
⚫ According to the model function of logistic regression, the loss function of logistic regression
can be derived through maximum likelihood estimation as follows:
$J(w) = -\frac{1}{m} \sum \left( y \ln h_w(x) + (1 - y) \ln\left(1 - h_w(x)\right) \right)$
⚫ In the formula, 𝑤 indicates the weight parameter, 𝑚 indicates the number of samples, 𝑥
indicates the sample, and 𝑦 indicates the actual value. The values of all the
weight parameters 𝑤 can also be obtained by using a gradient descent algorithm.
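The model probabilities and the loss function can be sketched for a single feature as follows. The sample data and the compared weights are hypothetical; `h_w(x)` is taken to be the sigmoid of 𝑤𝑥 + 𝑏, that is, P(Y = 1 | x).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, w, b):
    """Return (P(Y=0|x), P(Y=1|x)) for a single feature x."""
    p1 = sigmoid(w * x + b)
    return 1.0 - p1, p1

def cross_entropy_loss(xs, ys, w, b):
    """J(w) = -1/m * sum(y*ln(h) + (1-y)*ln(1-h)) with h = sigmoid(w*x + b)."""
    total = 0.0
    for x, y in zip(xs, ys):
        h = sigmoid(w * x + b)
        total += y * math.log(h) + (1 - y) * math.log(1 - h)
    return -total / len(xs)

xs = [-2.0, -1.0, 1.0, 2.0]   # hypothetical samples
ys = [0, 0, 1, 1]             # hypothetical labels

print(predict_proba(0.0, w=2.0, b=0.0))
print(cross_entropy_loss(xs, ys, w=0.0, b=0.0), cross_entropy_loss(xs, ys, w=1.0, b=0.0))
```

The loss drops when the weight moves from 0 to 1 because the predicted probabilities move toward the actual labels.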
Decision Tree
⚫ Each non-leaf node of the decision tree denotes a test on an attribute; each branch represents the output of a test;
and each leaf (or terminal) node holds a class label. The algorithm starts at the root node (topmost node in the
tree), tests the selected attributes on the intermediate (internal) nodes, and generates branches according to the
output of the tests. Then, it saves the class labels on the leaf nodes as the decision results.
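(Figure: example decision tree for identifying an animal. Surviving labels: a root node that splits on Small/Large; further splits on Short-nosed/Long-nosed and Stays on land/Stays in water; leaves such as "It could be a squirrel", "It could be a rat", "It could be a giraffe", and "It could be an elephant"; node types Root node and Subnode.)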
Key to Decision Tree Construction
⚫ A decision tree requires feature attributes and an appropriate tree structure. The key step of
constructing a decision tree is to divide the data on each feature attribute, compare the result sets in terms of
purity, and select the attribute with the highest purity as the data point for dataset division.
⚫ Purity is measured mainly through the information entropy and the Gini coefficient (lower values mean higher purity).
The formulas are as follows:
$H(X) = -\sum_{k=1}^{K} p_k \log_2(p_k)$
$\mathrm{Gini} = 1 - \sum_{k=1}^{K} p_k^2$
For regression trees, the split feature 𝑗 and split point 𝑠 are chosen to minimize the squared error of the two resulting regions:
$\min_{j,s} \left[ \min_{c_1} \sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 + \min_{c_2} \sum_{x_i \in R_2(j,s)} (y_i - c_2)^2 \right]$
⚫ 𝑝𝑘 indicates the probability that a sample belongs to category 𝑘 (in a total of K categories). A larger purity difference
between the sample before and after division indicates a better decision tree.
⚫ Common decision tree algorithms include ID3, C4.5, and CART.
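The two purity measures can be computed from class frequencies, as in this minimal sketch. The function names and the label lists are hypothetical.

```python
import math
from collections import Counter

def class_probabilities(labels):
    """Estimate p_k as the fraction of samples in each class."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]

def entropy(labels):
    """H(X) = -sum(p_k * log2(p_k))."""
    return -sum(p * math.log2(p) for p in class_probabilities(labels))

def gini(labels):
    """Gini = 1 - sum(p_k^2)."""
    return 1 - sum(p ** 2 for p in class_probabilities(labels))

mixed = ["yes", "yes", "no", "no"]    # hypothetical node with two balanced classes
pure = ["yes", "yes", "yes", "yes"]   # hypothetical node with a single class

print(entropy(mixed), gini(mixed))
print(entropy(pure), gini(pure))
```

A split is attractive when the child nodes have clearly lower entropy or Gini values than the parent node.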
Decision Tree Construction Process
⚫ Feature selection: Select one of the features of the training data as the split standard of the
current node. (Different standards distinguish different decision tree algorithms.)
⚫ Decision tree generation: Generate subnodes from the top down based on the selected feature,
and stop when the dataset can no longer be split.
⚫ Pruning: The decision tree may easily overfit unless necessary pruning (including
pre-pruning and post-pruning) is performed to reduce the tree size and optimize its node
structure.
Decision Tree Example
⚫ The following figure shows a decision tree for a classification problem. The classification result is affected
by three attributes: refund, marital status, and taxable income.
Support Vector Machine
⚫ Support vector machines (SVMs) are binary classification models. Their basic model is the linear classifier
that maximizes the width of the gap (margin) between the two categories in the feature space. With the
kernel trick, SVMs can also act as non-linear classifiers. Training an SVM amounts to solving a
convex quadratic programming problem.
Mapping
Linear SVM (1)
⚫ How can we divide the red and blue data points with just one line?
(Figure: a two-dimensional dataset with two sample categories, and two alternative dividing lines. Both division methods can divide the data. But which is correct?)
Linear SVM (2)
⚫ We can use different straight lines to divide data into different categories. SVMs find the straight line that
keeps the nearest points as far from the line as possible. This gives the model a strong generalization
capability. These nearest points are called support vectors.
⚫ In the two-dimensional space, a straight line is used for division; in the high-dimensional space, a
hyperplane is used for division.
(Figure: maximize the distance from each support vector to the line.)
Non-linear SVM (1)
⚫ How can we divide a linearly inseparable dataset?
Non-linear SVM (2)
⚫ Kernel functions can be used to create non-linear SVMs.
⚫ Kernel functions allow algorithms to fit a maximum-margin hyperplane in a transformed high-
dimensional feature space.
Common kernel functions
◼ Linear kernel
◼ Polynomial kernel
◼ Gaussian kernel
◼ Sigmoid kernel
(Figure: mapping from the input space to a high-dimensional feature space.)
k-Nearest Neighbors Algorithm (1)
⚫ The k-nearest neighbor (k-NN) classification algorithm
is a theoretically mature method and one of the
simplest machine learning algorithms. The idea of k-NN
classification is that, if most of the k closest samples
(nearest neighbors) of a sample in the feature space
belong to a category, the sample also belongs to this
category.
k-Nearest Neighbors Algorithm (2)
⚫ The logic of k-NN is simple: If most of an object's k nearest neighbors belong to a class, so does the
object.
⚫ k-NN is a non-parametric method and is often used for datasets with irregular decision
boundaries.
◼ k-NN typically uses the majority voting method to predict classification, and uses the mean value
method to predict regression.
⚫ k-NN requires a very large amount of computing.
k-Nearest Neighbors Algorithm (3)
⚫ Typically, a larger k value reduces the impact of noise on classification, but makes the boundary between
classes less obvious.
◼ A large k value indicates a higher probability of underfitting because the division is too rough; while a small k
value indicates a higher probability of overfitting because the division is too refined.
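A minimal k-NN classification sketch using Euclidean distance and majority voting. The training points, labels, and query points are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k):
    """Classify query by majority vote among its k nearest training samples."""
    by_distance = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-dimensional training data with two classes.
train_x = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_y = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_x, train_y, query=(2, 2), k=3))
print(knn_predict(train_x, train_y, query=(6, 5), k=3))
```

Every prediction computes the distance to all training samples, which is why k-NN becomes expensive on large datasets. Setting k to the size of the training set makes the vote ignore locality entirely, which is the underfitting extreme described above.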
Naive Bayes (1)
⚫ Naive Bayes classifiers are a family of simple "probabilistic classifiers" based on Bayes' theorem with strong
independence assumptions between the features. For given sample features 𝑋1 , … , 𝑋𝑛 , the probability that the sample
belongs to category 𝐶𝑘 is:
$P(C_k \mid X_1, \ldots, X_n) = \frac{P(X_1, \ldots, X_n \mid C_k)\, P(C_k)}{P(X_1, \ldots, X_n)}$
◼ 𝑋1 , 𝑋2 , … , 𝑋𝑛 are data features, which are usually described by m measurement values of the attribute set.
◼ For example, the attribute values of the color feature may be red, yellow, and blue.
Naive Bayes (2)
⚫ Feature independence assumption example:
◼ If a fruit is red, round, and about 10 cm in diameter, it can be considered an apple.
◼ A Naive Bayes classifier believes that each of these features independently contributes to the
probability of the fruit being an apple, regardless of any possible correlation between color,
roundness, and diameter features.
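Under the independence assumption, the class score is the prior P(C) multiplied by one conditional probability per feature. The sketch below estimates these probabilities by counting in a tiny, made-up fruit dataset; it omits smoothing, so a feature value never seen with a class yields a score of zero. The function names and the data are assumptions for illustration.

```python
from collections import Counter, defaultdict

def train_naive_bayes(samples, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    # feature_counts[c][j][v] = number of class-c samples whose j-th feature equals v
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    for x, c in zip(samples, labels):
        for j, v in enumerate(x):
            feature_counts[c][j][v] += 1
    return class_counts, feature_counts

def nb_predict(model, x):
    """Pick the class maximizing P(C) * product over j of P(X_j = x_j | C)."""
    class_counts, feature_counts = model
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / total                            # prior P(C)
        for j, v in enumerate(x):
            score *= feature_counts[c][j][v] / n_c     # likelihood P(X_j = v | C)
        scores[c] = score
    return max(scores, key=scores.get)

# Hypothetical samples described by (color, shape).
samples = [("red", "round"), ("red", "round"), ("yellow", "round"),
           ("yellow", "long"), ("yellow", "long")]
labels = ["apple", "apple", "apple", "banana", "banana"]

model = train_naive_bayes(samples, labels)
print(nb_predict(model, ("red", "round")), nb_predict(model, ("yellow", "long")))
```

The denominator P(X₁, …, Xₙ) is the same for every class, so it can be dropped when only the most probable class is needed.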
Ensemble Learning
⚫ Ensemble learning is a machine learning paradigm in which multiple learners are trained and combined to resolve an
issue. When multiple learners are used, the generalization capability of the ensemble can be much stronger than
that of a single learner.
⚫ For example, if you ask thousands of people at random a complex question and then summarize their answers, the
summarized answer is more accurate than an expert's answer in most cases. This is the wisdom of the crowd.
(Figure: a training set is used to train multiple learners, which are combined into an ensemble model.)
Types of Ensemble Learning
Ensemble Learning - Random Forest
⚫ Random forest = Bagging + Classification and regression tree (CART)
⚫ Random forest builds multiple decision trees and aggregates their results to make prediction more accurate and
stable.
◼ The random forest algorithm can be used for classification and regression problems.
(Figure: Bootstrap sampling → Build trees → Aggregate results. Subset 1 through Subset n each produce a tree, giving Prediction 1 through Prediction n, which are aggregated.)
Ensemble Learning - Gradient Boosted Decision Tree
⚫ Gradient boosted decision tree (GBDT) is a type of boosting algorithm.
⚫ The prediction result of the ensemble model is the sum of the results of all base learners. The essence of GBDT is that
each new base learner tries to fit the residual left by the learners before it, where the residual is the
difference between the actual value and the current prediction value.
⚫ During GBDT model training, the loss function value of the sample predicted by the model must be as small as
possible.
(Figure: for a value of 30, the first learner predicts 20 and the residual is 10; the second learner predicts 9 and the residual is 1; the third learner predicts 1.)
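The additive structure can be traced with a small sketch that uses the numbers surviving from the figure, read here as an actual value of 30 and stage predictions of 20, 9, and 1 (this reading of the figure is an assumption). The stage outputs are fixed constants rather than trained trees, so the sketch shows only how predictions are summed and how residuals shrink.

```python
def boosted_prediction(target, stage_outputs):
    """Sum the stage predictions and record the residual left after each stage."""
    prediction = 0
    residuals = []
    for output in stage_outputs:
        prediction += output                     # ensemble output is the sum of all stages
        residuals.append(target - prediction)    # what the next stage would try to fit
    return prediction, residuals

prediction, residuals = boosted_prediction(target=30, stage_outputs=[20, 9, 1])
print(prediction, residuals)
```

In an actual GBDT, each stage output would come from a regression tree trained on the residuals of the previous stages.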
Unsupervised Learning - k-Means Clustering
⚫ k-means clustering takes the number of clusters k and a dataset of n objects as inputs, and outputs k
clusters with minimized within-cluster variances.
⚫ In the k-means algorithm, the number of clusters is k, and n data objects are split into k clusters. The
obtained clusters meet the following requirements: high similarity between objects in the same cluster,
and low similarity between objects in different clusters.
(Figure: unlabeled data plotted on axes x1 and x2, before and after k-means clustering. k-means clustering automatically classifies unlabeled data.)
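A minimal sketch of the standard iterative k-means procedure (often called Lloyd's algorithm): assign each point to its nearest centroid, then move each centroid to the mean of its cluster, and repeat until nothing changes. The points, the seed, and the iteration cap are hypothetical.

```python
import math
import random

def k_means(points, k, iterations=100, seed=0):
    """Cluster points into k groups by alternating assignment and centroid updates."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)            # initialize with k distinct data points
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:           # assignments are stable, so stop
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical data: two well-separated groups in the (x1, x2) plane.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = k_means(points, k=2)
print(sorted(centroids))
```

The result depends on the initial centroids, so practical implementations usually run several random initializations and keep the clustering with the lowest within-cluster variance.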
Unsupervised Learning - Hierarchical Clustering
⚫ Hierarchical clustering divides a dataset at different layers and forms a tree-like clustering structure. The
dataset division may use a "bottom-up" aggregation policy, or a "top-down" splitting policy. The
hierarchy of clustering is represented in a tree diagram. The root is the only cluster of all samples, and
the leaves are clusters of single samples.
Summary
⚫ This course first describes the definition and types of machine learning, as well as the
problems machine learning solves. Then, it introduces key knowledge points of
machine learning, including the overall procedure (data preparation, data cleansing,
feature selection, model evaluation, and model deployment), common algorithms
(including linear regression, logistic regression, decision tree, SVM, Naive Bayes,
k-NN, ensemble learning, and k-means clustering), and hyperparameters.
Quiz
Recommendations
⚫ Huawei Talent
https://e.huawei.com/en/talent/portal/#/
Thank you.
Bring digital to every person, home, and organization for a fully connected,
intelligent world.