ML Revision

Lecture 2 discusses key issues of machine learning, emphasizing the importance of data quality, iterative improvement, and model optimization. It covers various machine learning techniques, including supervised, unsupervised, and reinforcement learning, along with their applications in fields like healthcare and finance. Additionally, it highlights concepts such as ensemble learning, semi-supervised learning, and dimensionality reduction, along with the principles of model selection and evaluation.

Uploaded by

taphuansgkvan2019

Lecture 2: Key Issues of Machine Learning

1, Machine Learning: a subfield of computer science that focuses on algorithms which
help a computer learn from data without being explicitly programmed. These algorithms can
identify patterns, make predictions, and perform tasks that would be difficult or impossible
for humans to do manually.

#Key concepts in machine learning:

● Data: The raw material that machine learning algorithms use to learn.
● Algorithm: The set of rules that the machine follows to learn from the data.
● Model: The output of the learning process, which can be used to make predictions or
decisions.

#Given:

● A dataset: This is a collection of data points, each consisting of two parts:

○ Description (x_i): This represents the features or attributes of an object.
○ Label (y_i): This is the corresponding output or target value associated with the
object.
● Label types: The labels can be:
○ Categorical: Belonging to a specific category (e.g., "cat", "dog").
○ Continuous: Representing a real-valued quantity (e.g., temperature).
○ Sequence or structure: Having a specific order or arrangement (e.g., text,
images).

Examples:

● Electronic medical records: Each record can be considered a data point, with features
like age, symptoms, and test results, and a label like diagnosis or treatment.

#Find:

● Predictive function (f(x)): This function takes an input feature vector (x) and predicts a
corresponding output (y). This is commonly used in supervised learning tasks like
classification or regression.
● Modeling function (h(x)): This function models the underlying distribution of the data.
It can be used to generate new data points similar to the existing ones (e.g., generative
models).
● Conditional generative function (g(z|x) or g(z|y)): This function generates new data
points or labels conditioned on latent variables (z) or input features (x). This is
commonly used in generative models like GANs and VAEs.
● Diagnosis or treatment regimen: This is a specific application of machine learning,
where the goal is to use the learned models to predict a diagnosis or recommend a
treatment plan for a patient based on their medical records.

2, Key issues of machine learning:

Principle about data quality and quantity: High-quality data in sufficient quantity is
crucial for training effective ML models.
a. Data types and structure:

b. Process:
Principle about iterative improvement: Machine learning is an iterative process involving
continuous refinement.
Principle about training and optimization: The learning process involves optimizing a loss
function to minimize prediction errors.

#The data analytics process:

1. Data Collection and Understanding:
Step 1: Develop an understanding of the purpose of the machine learning project
2. Data Processing:
Step 2: Obtain the data to be used in the analysis
Step 3: Explore, clean, and preprocess the data
3. Data Transformation:
Step 4: Reduce the data dimension, if necessary
4. Data Analysis:
Step 5: Determine the machine learning task
Step 6: Partition the data
Step 7: Choose the machine learning techniques to be used
Step 8: Use algorithms to perform the task
5. Interpretation & Evaluation:
Step 9: Interpret the results of the algorithms
Step 10: Deploy the model
#Loss functions in machine learning: A loss function measures how well the model's
predictions match the actual outcomes. It is a key component in training models, guiding the
optimization process.
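
As an illustration of how a loss function scores predictions, here is a minimal sketch of two common choices: mean squared error for regression and binary cross-entropy for classification. The function names and sample values are illustrative assumptions, not part of the lecture.

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average squared difference; a common loss for regression."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Average negative log-likelihood of 0/1 labels under predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)

# Made-up targets and predictions
mse = mean_squared_error([3.0, 5.0, 2.0], [2.5, 5.0, 3.0])
bce_good = binary_cross_entropy([1, 0, 1], [0.9, 0.1, 0.8])  # confident and right
bce_bad = binary_cross_entropy([1, 0, 1], [0.4, 0.6, 0.3])   # leaning the wrong way
print(mse, bce_good, bce_bad)
```

In both cases a lower value means the predictions are closer to the actual outcomes, which is the quantity the optimization process tries to reduce.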

#Data transformation:
Key Components:
● Oligonucleotides: These are short, single-stranded DNA or RNA molecules. They serve
as the fundamental units of the data.
● Pairwise Similarity: This measures the degree of similarity between two
oligonucleotides. It's often calculated based on factors like sequence alignment, base
composition, or structural properties.
● Matrix Representation: A matrix is a rectangular array of numbers. In this case, each
row and column represents an oligonucleotide, and the values within the matrix
correspond to the pairwise similarity between the corresponding oligonucleotides.
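
A minimal sketch of the matrix representation described above. The similarity measure used here (fraction of matching positions between equal-length sequences) and the example sequences are assumptions chosen for simplicity; real pipelines may rely on alignment scores or other measures.

```python
def match_fraction(a, b):
    """Fraction of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def similarity_matrix(seqs):
    """Square matrix whose entry [i][j] is the similarity of seqs[i] and seqs[j]."""
    n = len(seqs)
    return [[match_fraction(seqs[i], seqs[j]) for j in range(n)] for i in range(n)]

oligos = ["ACGT", "ACGA", "TCGA"]  # hypothetical oligonucleotides
S = similarity_matrix(oligos)
for row in S:
    print(row)
```

The resulting matrix is symmetric with ones on the diagonal, since every sequence is identical to itself.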

c. Methods:
Principle about generalization: A model's ability to perform well on unseen data is more
important than its performance on training data.
Principle on feature engineering: Transforming raw data into features that better represent
the underlying patterns can improve model performance.
Principle about bias-variance tradeoff: The balance between bias (error due to overly
simplistic models) and variance (error due to overly complex models) must be managed
carefully.

#Machine learning (predictive): view by data, labeled vs. unlabeled data

● Labeled data is a type of dataset where each data point is paired with a corresponding
label or target variable. This label represents the desired output or outcome associated
with that data point. For instance, in an image classification task, labeled data would
consist of images along with their corresponding labels (e.g., "cat," "dog," "car").

● Unlabeled data, on the other hand, only contains the data points themselves without
any associated labels. In the same image classification example, unlabeled data would
consist of images without their corresponding labels.
#####Types of machine learning:
● Supervised learning: The algorithm is trained on a dataset with labeled inputs, allowing it
to learn a mapping between inputs and outputs.
○ Examples:
■ Classification: image classification (e.g., identifying cats vs. dogs in images);
fraud detection; spam detection; diagnostics
■ Regression: risk assessment; price prediction (e.g., predicting house prices)

● Unsupervised learning: The algorithm is trained on a dataset without labels, allowing it to
discover patterns and relationships within the data.
○ Examples:
■ Clustering: biology (grouping plant and animal species based on genetic
characteristics); customer grouping (e.g., grouping similar customers);
enterprise grouping (grouping businesses based on revenue, size, and field of
operation)
■ Dimensionality reduction (e.g., simplifying complex data): text mining; face
recognition; big data visualization; image recognition

● Reinforcement learning: The algorithm learns by interacting with an environment and
receiving rewards or penalties based on its actions.
○ Examples include gaming, the finance sector, manufacturing, inventory management,
and robot navigation.

#####Applications of machine learning:

● Natural language processing: Machine learning is used to develop chatbots,
language translation systems, and sentiment analysis tools.
● Computer vision: Machine learning is used to develop image recognition
systems, object detection systems, and autonomous vehicles.
● Recommendation systems: Machine learning is used to develop
recommendation systems for products, movies, and music.
● Healthcare: Machine learning is used to develop medical diagnosis systems,
drug discovery systems, and personalized medicine.
● Finance: Machine learning is used to develop fraud detection systems, credit
scoring systems, and algorithmic trading systems.

#Kernel methods:
Kernel Methods are a powerful technique in machine learning, used in particular in models
like Support Vector Machines (SVMs). Their primary objective is to handle non-linear data by
mapping it into a higher-dimensional feature space where the data can become linearly
separable, thus simplifying classification or regression tasks.

1. Why are Kernel Methods needed?

When data cannot be easily classified or predicted in its original space (e.g., using linear
models), Kernel Methods facilitate the mapping of data into an alternative feature space where
linear models can be effectively applied. For instance, in a 2D space, data might not be linearly
separable using a straight line. However, by projecting the data into a 3D space, a plane can be
used to easily separate the classes.
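
The 2D-to-3D idea can be sketched as follows. The feature map (adding the squared distance from the origin as a third coordinate), the sample points, and the threshold are assumptions made for illustration; an actual SVM would use a kernel function instead of computing the map explicitly.

```python
def lift(point):
    """Map a 2D point (x1, x2) to 3D by adding z = x1^2 + x2^2."""
    x1, x2 = point
    return (x1, x2, x1 * x1 + x2 * x2)

# The outer points surround the inner ones, so no straight line separates them in 2D.
inner = [(0.5, 0.0), (0.0, -0.5), (-0.3, 0.4)]
outer = [(2.0, 0.0), (0.0, -2.0), (-1.5, 1.5)]

def classify(point, threshold=2.0):
    """Linear rule in the lifted space: which side of the plane z = threshold."""
    _, _, z = lift(point)
    return "outer" if z > threshold else "inner"

print([classify(p) for p in inner])
print([classify(p) for p in outer])
```

After the mapping, the horizontal plane z = 2 plays the role of the linear separator that did not exist in the original space.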

#Probabilistic graphical models: A probabilistic graphical model (graphical model) is a way
of representing probabilistic relationships between random variables. It brings together
graph theory and probability theory in a powerful formalism for multivariate statistical
modeling.

#Ensemble learning: Ensemble methods employ multiple learners and combine their
predictions to achieve higher performance than that of a single learner.
• Boosting: Make examples currently misclassified more important
● This technique trains a sequence of models, where each new model tries to correct
the errors of the previous models.
● Boosting focuses on the data points that were misclassified or had high errors in
previous iterations and assigns higher weights to them in subsequent training
rounds.
● Purpose: To reduce the bias of the model.
• Bagging: Use different subsets of the training data for each model (e.g., Random Forest)
● This technique creates multiple models by using different subsets of the training
data (sampling with replacement, meaning the same sample can appear multiple
times in a subset).
● Each model is trained independently, and the predictions of the models are
combined (usually by taking the average for regression problems or by majority
voting for classification problems).
● Purpose: To reduce the variance of the model and improve stability.
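
A minimal sketch of bagging (bootstrap sampling plus majority voting) as described in this section. The base learner (a one-feature threshold "stump"), the toy data, and all function names are assumptions for illustration; Random Forest uses decision trees and additional feature randomization.

```python
import random
from collections import Counter

def fit_stump(sample):
    """Pick the threshold on a 1-D feature that misclassifies the fewest points.
    The stump predicts class 1 when x > threshold, else class 0."""
    xs = sorted({x for x, _ in sample})
    candidates = [xs[0] - 1.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1.0]
    best_t, best_err = None, None
    for t in candidates:
        err = sum((x > t) != bool(y) for x, y in sample)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t

def bagging_fit(data, n_models=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(data) for _ in data]) for _ in range(n_models)]

def bagging_predict(models, x):
    """Combine the individual predictions by majority vote."""
    votes = Counter(int(x > t) for t in models)
    return votes.most_common(1)[0][0]

data = [(1.0, 0), (2.0, 0), (3.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]
models = bagging_fit(data)
print(bagging_predict(models, 1.5), bagging_predict(models, 7.5))
```

Each stump sees a slightly different sample, so individual thresholds vary, while the vote across all of them is more stable than any single model.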

######Advantages of Ensemble learning:

● Improve accuracy: By combining multiple models, we can reduce errors and increase
prediction accuracy.
● Reduce overfitting: Ensemble learning helps reduce overfitting, where a model learns
too much from the training data and fails to generalize well to new data.
● Increase stability: Models in an ensemble are often trained on different datasets or
using different algorithms, which helps reduce dependence on a particular model.
● Flexible: Can combine many different types of models.

######Disadvantages of Ensemble learning:

● Complexity: More time-consuming and computationally expensive than individual
models.
● Difficult to interpret: Ensemble models are often more complex, so their results are
harder to interpret.

#Semi-supervised learning: A class of machine learning techniques that make use of both
labeled and unlabeled data for training, typically a small amount of labeled data with a
large amount of unlabeled data.

Why is semi-supervised learning needed? In many practical cases, collecting labeled data is
costly and time-consuming, whereas unlabeled data is very easy to collect. Semi-supervised
learning helps make the most of the available data, especially when the amount of labeled
data is very limited.

How it works:

● Labeled data: Helps the model learn the relationships between features and labels.
● Unlabeled data: Helps the model learn the structure of the data, the probability
distribution of the data, and the underlying relationships between data points.

Classes of SSL methods

● Generative models: Use generative models to model the probability distribution of the
data and to generate new data.
● Graph-based methods: Build a graph connecting the data points, then use information from
the graph to propagate labels from labeled data to unlabeled data.
● Low-density separation: Assumes that different classes are usually separated by regions
with a low density of data points.
● Change of representation

######Advantages of semi-supervised learning:

● Maximizes data utilization: Improves model performance when labeled data is limited.
● Flexibility: Applicable to a wide range of problems.

######Disadvantages of semi-supervised learning:

● Reliance on assumptions: The effectiveness of semi-supervised learning methods
depends on the accuracy of assumptions about the data.
● Complexity: Semi-supervised learning algorithms are often more complex than
supervised learning algorithms.

######Applications: Semi-supervised learning is widely applied in various fields, such as:

● Natural language processing: Labeling words, sentences, and paragraphs.
● Image processing: Object recognition, image classification.
● Bioinformatics: Classifying genes and proteins.

#Dimensionality reduction: The process of reducing the number of random variables under
consideration. It can be divided into feature selection (selecting a subset of the features)
and feature extraction (creating new features from existing features).
d. Model Selection:
Principle about model selection and complexity: The choice of model and its
complexity should match the problem’s complexity.
Principle about regularization: Regularization techniques help prevent overfitting by
penalizing overly complex models.
Principle about ethics and fairness: ML models should be designed with fairness
and ethical considerations in mind.
Principle about evaluation and validation: Continuous evaluation using validation data
is necessary to monitor and improve model performance.

#Model selection:
In machine learning (ML), a model is a mathematical function (parametric collection of
probability distributions) or algorithm that maps input data (features) to output data
(predictions or decisions).

#Some key concepts and issues:

● Overfitting: Overfitting tends to occur when the model is too large or too complicated,
i.e., it has too many parameters relative to the available data.
● Regularization: A technique to prevent overfitting by adding a penalty to the loss
function, discouraging the model from becoming overly complex.
Lecture 6: K-NN and Naive Bayes Classifiers

1, Classification vs Prediction:

Classification: In classification problems, the response variable is discrete or categorical.
This means it can take on a finite number of values, often representing different classes or
categories.

Examples:

● Email spam detection: Classifying emails as either "spam" or "not spam."

● Image recognition: Classifying images as "cat," "dog," or "other."
● Customer churn prediction: Predicting whether a customer will churn (leave) or
stay.

Prediction: In prediction problems, the response variable is continuous. This means it can take
on any value within a given range.

Examples:

● Stock price prediction: Predicting the future price of a stock.

● Sales forecasting: Predicting the amount of sales a company will make in the
next quarter.
● Weather forecasting: Predicting the temperature or rainfall for a given location.

#Classification/Prediction – a two-step process

● Model construction: involves creating a model based on labeled data.

○ Each tuple/object is assumed to belong to a predefined class, as determined by
the class label attribute
○ The set of tuples used for model construction: training set. Using a subset of
the data (the training set) to teach the model the patterns and relationships
between the features and the class labels.
○ The model is represented as classification rules, decision trees, or mathematical
formulae that can be used to make predictions or classifications.
● Model usage: involves using the model to classify future or unknown objects

Estimate accuracy of the model:

○ The known label of each test object is compared with the classified result from the
model
○ Accuracy rate is the percentage of test set objects that are correctly classified by
the model
○ The test set must be independent of the training set; otherwise the accuracy estimate
is overly optimistic and overfitting goes undetected (overfitting occurs when the
model becomes too complex and fits the training data too closely, leading to poor
performance on new, unseen data)

# Criteria for classification methods:

● Predictive accuracy: the ability of the classifier to correctly predict unseen data
● Speed: refers to computation cost
● Robustness: the ability of the classifier to make correct predictions given noisy data or
data with missing values
● Scalability: the ability to construct the classifier efficiently given large amounts of data
● Interpretability: the level of understanding and insight that is provided by the classifier

#Performance measures of classifiers:

=> If many (thousands of) examples are available, including several hundred examples
from each class, we evaluate the classifier by randomly splitting the data into training and
test sets (usually 2/3 for training, 1/3 for testing). Build a classifier using the training
set and evaluate it using the test set.
2, The k-NN classifier:

a. Cross-validation: A statistical method used to assess the performance of a machine
learning model on a dataset. Instead of dividing the data into a single training set and a
single test set, cross-validation partitions the data into multiple subsets. Each subset is
used as the test set once, while the remaining subsets are used as the training set. This
helps detect overfitting and provides a more reliable estimate of the model's performance
on unseen data.

● K-fold cross-validation:
○ First step: The dataset is divided into k equal-sized partitions.
○ Second step: The model is trained on k-1 partitions and evaluated on the
remaining partition. This process is repeated k times, so that each partition
serves as the test set once, and the results are averaged.
○ Stratified k-fold: a variant in which each partition keeps roughly the same class
proportions as the full dataset.
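
The two k-fold steps above can be sketched as follows. The function names and the `evaluate` callback (which would train on the training indices and return a score on the test indices) are hypothetical; the folds here are contiguous, so in practice the data would usually be shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

def cross_validate(n_samples, k, evaluate):
    """Average the score returned by evaluate(train, test) over the k folds."""
    scores = [evaluate(train, test) for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)

folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print("train:", train, "test:", test)
```

Every sample appears in exactly one test fold, so each data point is used for evaluation once and for training k-1 times.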

b. The k-NN classifier:

● k-Nearest Neighbor (k-NN) algorithm can be used for classification (of a categorical
outcome) or prediction (of a numerical outcome)
● To classify or predict a new object, the method determines the similarity between a
new object and the existing objects in the training dataset. This similarity is typically
measured using a distance metric like Euclidean distance or Manhattan distance.
● The k nearest neighbors (based on their calculated similarities) are determined, and the
new object is assigned to the predominant class among these k neighbors.
○ Classification: The new object is assigned to the class that is most common
among its k nearest neighbors. For example, if the majority of the k nearest
neighbors are "cat," then the new object is likely also a "cat."
○ Prediction: In the case of numerical prediction, the average or median of the
values of the k nearest neighbors is used as the predicted value for the new
object.

#Determining Neighbors: There are various measures of distance (similarity) between
two objects, depending on the data type of their predictor variables. If all predictor
variables are numerical, the most popular measure of distance is the Euclidean
distance.

● Classification Rule:
○ Find the nearest k neighbors to the record to be classified.
○ Use a majority decision rule to classify the record, where the record is
classified as a member of the majority class of the k neighbors

● Choosing Parameter k:
○ If k is too low, we may be fitting to the noise in the data
○ If k is too high, we will miss local structure in the data (important)
○ Typically, k falls in the range 1–20; an odd number is often chosen to avoid ties
○ Try several values of k and choose the one that gives the lowest error rate on
validation data
● Weighted k-NN:
○ Each neighbor's weight is determined by the inverse of its distance from the
record to be predicted.
○ Given distances d1, d2, ... , dk to each of the k neighbors, the weight for
neighbor i is computed from its inverse distance, commonly normalized so that
the weights sum to 1:
w_i = (1/d_i) / (1/d1 + 1/d2 + ... + 1/dk)
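
A minimal sketch of the k-NN classification rule, with optional inverse-distance weighting of the votes. The function names, the toy data, and the small constant added to the distance (to avoid division by zero) are assumptions; the weights are left unnormalized because normalization does not change which class wins.

```python
import math
from collections import defaultdict

def euclidean(a, b):
    """Euclidean distance between two numeric feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(train, query, k=3, weighted=False):
    """train is a list of (features, label) pairs; returns the predicted label."""
    neighbors = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = defaultdict(float)
    for features, label in neighbors:
        d = euclidean(features, query)
        # Plain k-NN: one vote per neighbor. Weighted k-NN: vote of 1/distance.
        votes[label] += 1.0 / (d + 1e-9) if weighted else 1.0
    return max(votes, key=votes.get)

train = [((1.0, 1.0), "cat"), ((1.2, 0.8), "cat"), ((5.0, 5.0), "dog"),
         ((5.5, 4.5), "dog"), ((6.0, 5.0), "dog")]
print(knn_classify(train, (1.1, 1.0)))   # two of the three nearest are cats
print(knn_classify(train, (5.2, 5.0)))   # all three nearest are dogs

# One very close "cat" versus two distant "dog" points
pts = [((1.1, 1.0), "cat"), ((3.0, 3.0), "dog"), ((3.0, 3.2), "dog")]
print(knn_classify(pts, (1.0, 1.0), k=3))                 # majority vote
print(knn_classify(pts, (1.0, 1.0), k=3, weighted=True))  # distance-weighted vote
```

The last two calls show why weighting can matter: the plain majority vote follows the two distant points, while the weighted vote follows the single neighbor that is much closer.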

#Advantages and Disadvantages of k-NN algorithms:

● Advantages:
○ Simplicity: k-NN is a straightforward algorithm that is easy to
understand and implement.
○ Non-parametric: k-NN is a non-parametric algorithm, meaning it
doesn't make assumptions about the underlying data distribution.
○ Performs well for mixed data (for instance, real estate databases
with attributes such as home type, number of rooms, neighborhood,
asking price, etc.)
● Disadvantages:
○ Computational complexity: For large datasets, finding the k
nearest neighbors can be computationally expensive, especially
for high-dimensional data.
○ Hard to use for real-time prediction

2, Bayesian methods in machine learning:

● Bayesian machine learning leverages Bayesian probability to construct models. Unlike
traditional approaches that seek a single optimal solution, Bayesian methods determine a
probability distribution over all possible solutions, allowing us to quantify the
uncertainty in our predictions.
○ Handling uncertainty: Real-world data is often noisy or incomplete. Bayesian
methods provide a principled way to model this uncertainty.
○ Making probabilistic predictions: Instead of providing a single point estimate,
Bayesian methods yield a probability distribution over possible outcomes,
allowing us to assess the confidence of our predictions.
● Core Concepts:
○ Prior distribution: Represents our beliefs about the model parameters before
observing the data.
○ Likelihood: The probability of observing the data given the model parameters. It
reflects how likely the data is under different model assumptions (the probability
of e occurring if H is assumed to be true).
○ Posterior distribution: The updated beliefs about the model parameters after
observing the data. This is what we are ultimately interested in (the probability of
hypothesis H, or of class H, given the data or feature e).

These quantities are related by Bayes' theorem:
P(H|e) = P(e|H) P(H) / P(e)

Where:

● P(H∣e): The posterior probability, or the probability of hypothesis H being true given
the data e.
● P(e∣H): The likelihood, or the probability of the data given the hypothesis.
● P(H): The prior probability, or the initial belief about the hypothesis before observing
the data.
● P(e): The marginal likelihood or the total probability of observing the data under all
possible hypotheses.

#Bayes theorem: describes the probability of a hypothesis, based on conditions that
might be related to the hypothesis.
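
A small worked example of computing the posterior P(H|e) from the prior P(H), the likelihood P(e|H), and the marginal P(e). The scenario (a diagnostic test) and all numbers are made up for illustration.

```python
def posterior(prior, likelihood, likelihood_if_false):
    """P(H|e) for a binary hypothesis H.

    prior               = P(H)
    likelihood          = P(e|H)
    likelihood_if_false = P(e|not H)
    """
    # P(e): total probability of the evidence under both hypotheses
    evidence = likelihood * prior + likelihood_if_false * (1.0 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: 1% of patients have the disease (H); the test is
# positive (e) for 95% of sick patients and for 5% of healthy patients.
p = posterior(prior=0.01, likelihood=0.95, likelihood_if_false=0.05)
print(round(p, 4))
```

Even with a fairly accurate test, the posterior stays well below the likelihood because the prior is small, which is the kind of update Bayes' theorem formalizes.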

Maximum A Posteriori (MAP) and Maximum Likelihood Hypothesis (MLH) are two widely
used criteria in parameter estimation and machine learning for finding the most probable value
of a parameter in a statistical model. Both aim to maximize a probability function, but they
differ in their approaches.

Maximum Likelihood Hypothesis (MLH)

● Core idea: MLH finds the value of the parameter that makes the observed data most
likely. In other words, it seeks the parameter that best "explains" the data.

Formula:
θ_ML = argmax_θ P(D|θ)

● Where:
○ θ: The parameter to be estimated
○ D: The observed data
○ P(D|θ): The probability of the data D given the parameter θ (likelihood function)
● Advantages:
○ Simple and intuitive.
○ Widely used in many statistical models.
● Disadvantages:
○ Can lead to overfitting when the data is noisy or the sample size is small.
○ Does not utilize prior information about the parameter.

Maximum A Posteriori (MAP)

● Core idea: MAP incorporates both the observed data and prior information about the
parameter to find the most probable value of the parameter. It treats the parameter θ as
a random variable and finds the value of θ that maximizes the posterior probability
P(θ|D).

Formula:
θ_MAP = argmax_θ P(θ|D) = argmax_θ P(D|θ)P(θ)

● Where:
○ P(θ): The prior probability of the parameter θ, representing prior knowledge
about the parameter.
● Advantages:
○ More flexible than MLH as it utilizes both data and prior information.
○ Can help reduce overfitting.
● Disadvantages:
○ The choice of prior distribution can influence the results.
○ More complex than MLH.
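
To make the two argmax formulas concrete, here is a minimal numerical sketch for a coin with unknown heads probability θ. The data (7 heads in 10 flips), the Beta(2, 2)-shaped prior, and the grid search are assumptions chosen for illustration; closed-form solutions exist for this model.

```python
def argmax_on_grid(score, steps=1000):
    """Return the grid value in (0, 1) that maximizes score."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=score)

heads, flips = 7, 10

def likelihood(theta):
    """P(D|θ) up to a constant factor that does not depend on θ."""
    return theta ** heads * (1 - theta) ** (flips - heads)

def prior(theta):
    """Assumed prior P(θ): Beta(2, 2) shape, mildly favoring a fair coin."""
    return theta * (1 - theta)

theta_ml = argmax_on_grid(likelihood)                         # argmax P(D|θ)
theta_map = argmax_on_grid(lambda t: likelihood(t) * prior(t))  # argmax P(D|θ)P(θ)
print(theta_ml, theta_map)
```

The MAP estimate is pulled from the maximum likelihood estimate toward 0.5, the value favored by the prior; with more data the two estimates move closer together.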

#Probability vs. Likelihood (khả năng xảy ra):

● Probability is used before data are available to describe possible future outcomes given
a fixed value for the parameter (or parameter vector).
○ For example: You flip a coin with probability p=0.5 of getting heads. The
probability of getting 7 heads in 10 flips is an example of probability.
● Likelihood is used after data are available to describe a function of a parameter (or
parameter vector) for a given outcome.
○ For example, if you flip a coin 10 times and get 7 heads, you do not know in
advance what the probability of the coin landing on heads is. You would then
use the likelihood function to evaluate how well different values of the
parameter p explain the observed data (7 heads).
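
The coin example can be computed directly with the binomial formula. The same function is read as a probability when p is fixed and the outcome varies, and as a likelihood when the outcome is fixed and p varies. The candidate values of p below are arbitrary choices for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k heads in n independent flips with P(heads) = p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Probability: p is fixed at 0.5, and we ask about a possible future outcome.
prob_7_heads_fair = binomial_pmf(7, 10, 0.5)

# Likelihood: the outcome (7 heads in 10 flips) is fixed, and p varies.
likelihoods = {p: binomial_pmf(7, 10, p) for p in (0.3, 0.5, 0.7, 0.9)}
best_p = max(likelihoods, key=likelihoods.get)

print(prob_7_heads_fair)
print(likelihoods)
print(best_p)
```

Among the candidate values tried here, the observed data is most likely under p = 0.7, which matches the observed fraction of heads.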

3, What is Bayesian classification?

● Bayesian classification is classification based on Bayes theorem.

● Naïve Bayesian classifiers (simple yet powerful machine learning algorithms,
commonly used in text classification, spam filtering, and multiclass classification
problems) assume that the effect of an attribute value on a given class is
independent of the values of the other attributes.
○ For example, in email classification, the words "buy" and "discount" may both
appear in a spam email, but we assume that the occurrence of one word does not
affect the occurrence of the other once we know whether the email is spam.

○ Naïve Bayesian learning makes computation possible

■ It yields optimal classifiers when the independence assumption is satisfied
■ But the assumption is seldom satisfied in practice, as attributes (variables) are
often correlated. Attempts to overcome this limitation include, among others:
● Bayesian networks, which combine Bayesian reasoning with causal
relationships between attributes
● Decision trees, which reason on one attribute at a time,
considering the most important attributes first
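
A minimal sketch of a Naïve Bayes text classifier in the spirit of the spam example above. The tiny training set, the function names, and the use of Laplace (add-one) smoothing are assumptions for illustration. Each word contributes an independent factor P(word | class), which is the naïve independence assumption.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs):
    """docs is a list of (list_of_words, label) pairs."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in docs:
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def predict(model, words):
    """Return the class with the highest log posterior (up to a constant)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior P(class)
        total_words = sum(word_counts[label].values())
        for w in words:
            if w not in vocab:
                continue  # ignore words never seen in training
            # Laplace-smoothed estimate of P(word | class)
            score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

docs = [
    (["buy", "discount", "now"], "spam"),
    (["discount", "offer", "buy"], "spam"),
    (["meeting", "tomorrow", "agenda"], "ham"),
    (["project", "meeting", "notes"], "ham"),
]
model = train_naive_bayes(docs)
print(predict(model, ["buy", "discount"]))
print(predict(model, ["meeting", "notes"]))
```

Working in log space turns the product of per-word probabilities into a sum, which avoids numerical underflow on longer documents.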
● Bayesian belief networks (BBNs) are probabilistic graphical models (mô hình đồ họa
xác suất) that represent a set of random variables and their conditional dependencies.
They are a powerful tool for modeling complex relationships between variables,
especially when dealing with uncertainty.
○ Bayesian belief networks allow class conditional independencies to be defined
between subsets of variables
■ In a Bayesian network, conditional independence refers to how certain
variables can become independent of others. This means that given
certain information about some variables, other variables become
independent.

Example:

● Consider a BBN with three variables: A, B, and C.

● If the DAG shows an arc from A to B and an arc from A to C, but no
direct arc between B and C, then B and C are conditionally independent
given A.
● This means that knowing the value of A makes the values of B and C
independent of each other.
● First component: Network Structure (Directed Acyclic Graph, DAG)
○ Each node in the DAG represents a random variable.
○ Each directed arc (arrow) represents a probabilistic
dependence between the variables.
■ Ex: If there is an arrow from node A to node B, this means that A
influences B, or B is conditionally dependent on A.
● Second component: Conditional probability table (CPT)
○ Each node (random variable) in the network has a corresponding CPT
that specifies the probability of a variable taking on a particular value
given the values of its parent variables in the DAG.

#How These Two Components Work Together:

● The network structure (DAG) defines the relationships and conditional dependencies
between the variables. It shows which variables directly affect others.
● The CPTs provide the actual probabilities that quantify the strength of these
relationships.

These two components allow the Bayesian network to represent a complex joint probability
distribution in a compact form by factoring it into conditional probabilities.
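
A minimal sketch of this factorization for a three-variable network with arcs A → B and A → C, so that P(A, B, C) = P(A) P(B|A) P(C|A). The CPT values are made up for illustration, and the helper names are hypothetical.

```python
from itertools import product

# CPTs (hypothetical numbers). Variables are binary: 1 = true, 0 = false.
p_a = {1: 0.3, 0: 0.7}
p_b_given_a = {1: {1: 0.8, 0: 0.2}, 0: {1: 0.1, 0: 0.9}}  # p_b_given_a[a][b]
p_c_given_a = {1: {1: 0.6, 0: 0.4}, 0: {1: 0.2, 0: 0.8}}  # p_c_given_a[a][c]

def joint(a, b, c):
    """Joint probability from the DAG factorization P(A) P(B|A) P(C|A)."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_a[a][c]

def prob(**fixed):
    """Probability of a partial assignment, e.g. prob(b=1, c=1), by summing the joint."""
    total = 0.0
    for a, b, c in product((0, 1), repeat=3):
        values = {"a": a, "b": b, "c": c}
        if all(values[name] == v for name, v in fixed.items()):
            total += joint(a, b, c)
    return total

# Given A, the variables B and C are independent ...
p_bc_given_a = prob(a=1, b=1, c=1) / prob(a=1)
p_b_given_a1 = prob(a=1, b=1) / prob(a=1)
p_c_given_a1 = prob(a=1, c=1) / prob(a=1)
print(p_bc_given_a, p_b_given_a1 * p_c_given_a1)

# ... but without knowing A they are not.
print(prob(b=1, c=1), prob(b=1) * prob(c=1))
```

Three small tables (1 + 2 + 2 free parameters) are enough to define all eight entries of the joint distribution, which is the compactness the DAG and CPTs provide together.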
#Benefits of Bayesian Belief Networks:

● Efficient Representation: BBNs reduce the complexity of representing joint probability
distributions by exploiting conditional independencies between variables.
● Reasoning under Uncertainty: They allow for reasoning and inference in uncertain
environments, which makes them powerful tools in decision-making systems.
● Modular: The DAG structure makes it easy to add or remove variables and update the
network as needed.

#Several cases of learning Bayesian belief networks

● Given both the network structure and all variables observable: easy
● Given the network structure but only some variables observable (-> parameter learning)
● When the network structure is not known in advance (-> structure learning)

When to use Bayesian Belief Networks:

1. When there are dependencies between variables:

BBNs are useful when you want to model the dependencies between variables in a
structured way. Instead of assuming that all variables are independent (as in Naive
Bayes), BBNs allow you to model how variables affect each other.
2. When solving complex problems with multiple causes:
Bayesian networks are well-suited for modeling problems where there are multiple
causes or factors affecting an outcome. For example, in medical diagnosis, you can
model how symptoms depend on different underlying causes (such as diseases).
3. When you want to combine expert knowledge with data:
BBNs allow you to incorporate expert knowledge into the model in the form of prior
probabilities and relationships between variables. This is useful in fields like medicine,
finance, or engineering, where you can combine real-world data with domain-specific
expertise.
4. When you have missing or incomplete data:
Bayesian networks handle missing data effectively. Since BBNs model the probabilities
of variables, you can infer the values of missing variables based on the remaining
observed variables.
5. When you need prediction or inference:
BBNs are widely used for inference and prediction tasks, such as calculating the
probability of an event based on other known events. For example, if you know the
values of some variables (i.e., certain events have occurred), you can compute the
probabilities of other variables using a Bayesian network.
6. When making decisions in uncertain environments:
BBNs support decision-making in uncertain environments by calculating conditional
probabilities based on available information. This is useful in various applications like
risk management, system diagnostics, and supply chain management.
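
The inference described in items 4 and 5 can be sketched on a tiny network. This is a minimal illustration, not a general BBN library: the three-node structure (Rain -> WetGrass <- Sprinkler) and every probability value are made up for the example. It computes P(Rain | WetGrass) by summing the factorized joint distribution over the unobserved variable.

```python
from itertools import product

# Hypothetical network: Rain -> WetGrass <- Sprinkler.
# All probabilities are illustrative values, not real data.
P_RAIN = {True: 0.2, False: 0.8}
P_SPRINKLER = {True: 0.1, False: 0.9}
# P(WetGrass=True | Rain, Sprinkler)
P_WET_GIVEN = {
    (True, True): 0.99,
    (True, False): 0.90,
    (False, True): 0.80,
    (False, False): 0.05,
}


def joint(rain, sprinkler, wet):
    """Joint probability from the factorization P(R) * P(S) * P(W | R, S)."""
    p_wet = P_WET_GIVEN[(rain, sprinkler)]
    return P_RAIN[rain] * P_SPRINKLER[sprinkler] * (p_wet if wet else 1.0 - p_wet)


def posterior_rain_given_wet():
    """P(Rain=True | WetGrass=True), summing out the unobserved Sprinkler."""
    numerator = sum(joint(True, s, True) for s in (True, False))
    evidence = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
    return numerator / evidence
```

The network stores 1 + 1 + 4 numbers instead of the 7 needed for an unrestricted joint table over three binary variables, which is the "efficient representation" benefit in miniature.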
Example applications of Bayesian Belief Networks:

● Medicine: Modeling relationships between symptoms and diseases to assist in
diagnosis.
● Finance: Predicting market risks and financial events by modeling risk factors.
● Recommendation systems: Predicting user preferences based on past behavior and
relationships between different options.

When to use K-Nearest Neighbors (K-NN):

1. No assumption about data distribution is required: K-NN is a non-parametric method,
meaning it doesn't assume any specific distribution for the data. If you have no prior
knowledge about the distribution, K-NN is a good choice.
2. When the data has few features and is of moderate size: K-NN works well when the
data has few features because it relies on calculating the distance between data points.
With too many features, it may face the "curse of dimensionality."
3. When simplicity and ease of implementation are needed: K-NN is one of the simplest
algorithms to implement. It works by calculating the distance between data points and
finding the nearest neighbors.
4. When the data allows easy distance computation: K-NN requires you to compute the
distance between points. If a distance metric (such as Euclidean distance) makes sense
for your data, K-NN is a feasible option.
5. Classification or regression problems: K-NN can be used for both classification and
regression problems. In classification, it predicts the class of a new point based on
nearby points, while in regression, it calculates the average value of the nearest points.
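
A minimal sketch of K-NN for both tasks, assuming numeric feature tuples and Euclidean distance. The function names (`knn_classify`, `knn_regress`) and the toy data in the usage note are illustrative only.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(train, query, k):
    """Return the k (features, target) pairs in train closest to query."""
    return sorted(train, key=lambda pair: euclidean(pair[0], query))[:k]


def knn_classify(train, query, k=3):
    """Predict the majority class among the k nearest neighbors."""
    labels = [label for _, label in k_nearest(train, query, k)]
    return Counter(labels).most_common(1)[0][0]


def knn_regress(train, query, k=3):
    """Predict the average target value of the k nearest neighbors."""
    values = [value for _, value in k_nearest(train, query, k)]
    return sum(values) / len(values)
```

For example, with `train = [((1, 1), "A"), ((1, 2), "A"), ((8, 8), "B"), ((9, 8), "B")]`, calling `knn_classify(train, (2, 2), k=3)` returns the class held by most of the three closest points. Every prediction scans the whole training set, which is why K-NN suits moderate data sizes.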

When to use Bayes' Theorem (Bayesian Classification):

1. Data has a clear probabilistic distribution: Bayes' Theorem works well when the data
can be modeled using probability distributions, especially when you have prior
knowledge about the probabilities of the classes.
2. Classification problems with independent features: Naive Bayes, a popular variant of
Bayes' Theorem, assumes that the features of the data are independent, which
simplifies calculations. Although this assumption may not be entirely true, Naive Bayes
still performs well in many real-world datasets.
3. When a fast and efficient algorithm is needed for large datasets: Naive Bayes is very
fast because it only requires the computation of probabilities of features and classes,
making it scalable for large datasets.
4. Problems with imbalanced class distributions: Bayesian classification can account for
imbalanced data, where one class has significantly fewer examples than another, because
the class priors explicitly encode how frequent each class is.
5. Text classification or natural language processing problems: Naive Bayes is
commonly used in text classification, such as spam email detection or document topic
categorization, as it works well with discrete data and is scalable to large datasets.
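
A minimal sketch of Naive Bayes for text classification, under assumptions not stated in the notes: a multinomial word-count model with Laplace (add-one) smoothing, and documents given as lists of words. The function names are illustrative.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(documents):
    """documents: list of (list_of_words, label). Returns the counts needed to predict."""
    class_counts = Counter(label for _, label in documents)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for words, label in documents:
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary


def predict(model, words):
    """Return the label with the highest log posterior for the given words."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        # log prior; the prior is where class frequency enters the decision
        score = math.log(count / total_docs)
        total_words = sum(word_counts[label].values())
        for word in words:
            if word not in vocabulary:
                continue  # skip words never seen in training
            # Laplace smoothing keeps unseen (word, class) pairs from zeroing the product
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)  # independence assumption: log-likelihoods add up
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Training is a single counting pass over the data, which is why the method scales to large datasets.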

Summary:

● Use K-NN when you have small datasets, lack prior knowledge of the distribution, and
don't need to make assumptions about the data.
● Use Bayes' Theorem when you can model the data probabilistically and need a fast
method, especially for imbalanced or large datasets

Lecture 7: Decision Tree Learning

Decision Tree Learning is a supervised machine learning technique used for both classification
and regression tasks. It constructs a decision tree to represent a set of rules that can be used to
predict a target variable.

A decision tree is a flow-chart-like tree structure:

● Each internal node denotes a test on an attribute (e.g., "Age > 30?", "Gender =
Male?")
● Each branch represents an outcome of the test (e.g., "True" or "False")
● Leaf nodes represent classes or class distributions (e.g., "Buys" or
"Does not buy"). Splitting continues until every node is either a leaf node or has
no more attributes to split on. Each leaf node represents a predicted value of the target
variable.
● The top-most node in a tree is the root node, which represents the entire
dataset.

1, Decision tree induction (DTI)

Decision tree generation consists of two phases:

● Tree construction
○ Partition examples recursively based on selected attributes
○ At start, all the training objects are at the root
● Tree pruning
○ Identify and remove branches that reflect noise or outliers
● Use of decision trees: Classify unknown objects
○ Test the attribute values of the object against the decision tree

#Tree construction general algorithm

Two steps: recursively generate the tree (1-4), and prune the tree (5):

1. At each node, choose the “best” attribute by a given measure for attribute selection

2. Extend tree by adding new branch for each value of the attribute

3. Sort training examples to leaf nodes

4. If all examples in a node belong to one class, stop; otherwise repeat steps 1-4 for the leaf nodes

5. Prune the tree to avoid over-fitting
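
Steps 1-4 can be sketched as a short recursive function. This is a simplified illustration, assuming categorical attributes, rows stored as dictionaries, and information gain as the "best attribute" measure. Pruning (step 5) is left out, and the helper names are made up for the example.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())


def partition(rows, attribute):
    """Group rows by their value of the given attribute."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row)
    return groups


def gain(rows, attribute, target):
    remainder = sum(
        len(subset) / len(rows) * entropy([r[target] for r in subset])
        for subset in partition(rows, attribute).values()
    )
    return entropy([r[target] for r in rows]) - remainder


def build_tree(rows, attributes, target):
    labels = [r[target] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    # Step 4: stop when the node is pure or no attributes are left
    if len(set(labels)) == 1 or not attributes:
        return majority
    best = max(attributes, key=lambda a: gain(rows, a, target))  # step 1
    remaining = [a for a in attributes if a != best]
    children = {
        value: build_tree(subset, remaining, target)  # steps 2-3, then recurse
        for value, subset in partition(rows, best).items()
    }
    return {"attribute": best, "children": children, "default": majority}


def classify(tree, example):
    """Walk from the root to a leaf; fall back to the node's majority class."""
    while isinstance(tree, dict):
        tree = tree["children"].get(example[tree["attribute"]], tree["default"])
    return tree
```

Leaves are plain class labels and internal nodes are dictionaries, so the tree can be printed and inspected directly.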

#Entropy

Entropy characterizes the impurity of an arbitrary collection of objects: it measures
the level of uncertainty or "mess" in a data set. When building decision trees, the
goal is to reduce entropy by choosing the attribute that best splits the data, reducing
uncertainty.

● Entropy equals 0: When all members of the data set S belong to the same class, entropy
is 0. The data is completely pure and there is no uncertainty.
● Entropy equals 1: When the data set contains equal numbers of positive and negative
examples (the two-class case), entropy is 1. This is the highest level of uncertainty,
because nothing makes one class more likely than the other.
● Entropy between 0 and 1: If the numbers of positive and negative examples are unequal,
entropy lies between 0 and 1. The closer to 0, the lower the uncertainty; the closer
to 1, the higher the uncertainty.
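
The three cases above can be checked numerically. This sketch assumes the standard Shannon entropy with base-2 logarithm, Entropy(S) = -sum over classes of p_i * log2(p_i); the function name is illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())


pure = entropy(["+", "+", "+", "+"])      # one class only
balanced = entropy(["+", "+", "-", "-"])  # equal positives and negatives
skewed = entropy(["+", "+", "+", "-"])    # unequal counts
```

Here `pure` is 0, `balanced` is 1, and `skewed` lies strictly between them. With more than two classes the maximum is log2 of the number of classes rather than 1.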

#Information gain measures the expected reduction in entropy

We define a measure, called information gain, of the effectiveness of an attribute in
classifying data. It is the expected reduction in entropy caused by partitioning the
objects according to this attribute.

Information gain is the difference between the entropy before the split and the weighted
sum of the entropies of the subsets after the split. A high information gain means that
attribute A has reduced the uncertainty of the data after partitioning.
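
A minimal sketch of the calculation, assuming the usual definition Gain(S, A) = Entropy(S) - sum over values v of (|S_v| / |S|) * Entropy(S_v), with rows stored as dictionaries. The attribute names in the usage note are made up.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the whole set minus the size-weighted entropy of each subset."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder
```

On a toy set where "outlook" separates the classes perfectly and "windy" is unrelated to the class, `information_gain` returns the full entropy of the set for "outlook" and 0 for "windy", so "outlook" would be chosen for the split.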

Gain Ratio(S, A) = Gain(S, A) / SplitInfo(S, A), where:

● SplitInfo(S, A) measures how attribute A partitions the set S. It grows as A
splits S into more subsets and is largest when those subsets are of roughly equal size.

ID3 (Iterative Dichotomiser 3): a decision tree construction algorithm based on Information
Gain. It selects the attribute with the highest Information Gain to split the data into
smaller groups.

● Information Gain measures the reduction in uncertainty (entropy) when the data set
is split on an attribute. The attribute that reduces uncertainty the most is
selected.

C4.5: An improvement on ID3 that addresses some of its limitations, including its bias
toward attributes with many values. (ID3 tends to favor attributes with many values because
they can produce many small subsets, which lowers entropy more. This can lead to
decision trees that are too deep and complex.)

● Gain Ratio is the ratio of Information Gain to Split Information. Split Information
measures how broadly an attribute divides the data set into groups.
● Split Information increases when an attribute has many values, because that attribute
divides the data set into many small groups. This lowers the Gain Ratio, so an
attribute with many values is penalized if the fine-grained split is not actually
useful.

#Example:

Consider an attribute like "day of the month". If we split on this attribute in ID3, we may end up
with 31 small subsets, each containing very few examples, and each subset may have nearly
pure classes, leading to high information gain.

However, in C4.5, the Split Information would be very high for this attribute because it's
splitting the data into many small subsets. As a result, the Gain Ratio would be lower,
preventing the algorithm from choosing it unless it truly provides useful information.
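
The effect can be reproduced on a small made-up data set. This sketch assumes SplitInfo(S, A) is the entropy of the attribute's own value distribution. The 8 examples, the "day" attribute (unique per example), and the binary "weekend" attribute are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(values):
    total = len(values)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(values).values())


def gain_and_ratio(rows, attribute, target):
    """Return (information gain, gain ratio) for splitting rows on attribute."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    n = len(rows)
    gain = entropy([r[target] for r in rows]) - sum(
        len(g) / n * entropy(g) for g in groups.values()
    )
    # SplitInfo: entropy of how the attribute distributes rows across its values
    split_info = entropy([r[attribute] for r in rows])
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio


# Toy data: every example has its own "day"; "weekend" is a two-valued attribute.
rows = [
    {"day": d, "weekend": d >= 6, "buy": "yes" if d >= 6 else "no"}
    for d in range(1, 9)
]
gain_day, ratio_day = gain_and_ratio(rows, "day", "buy")
gain_weekend, ratio_weekend = gain_and_ratio(rows, "weekend", "buy")
```

Both attributes reach the same information gain because both produce pure subsets, so ID3 cannot tell them apart. The gain ratio of "day" is much lower because its Split Information is log2(8) = 3, so C4.5 prefers "weekend".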

#Summary:

● ID3: Chooses attributes based on information gain, which can lead to overfitting when
attributes have many unique values.
● C4.5: Uses Gain Ratio, which adjusts for the number of values an attribute has by
penalizing attributes that split the data into many small, overly specific subsets,
reducing the risk of overfitting.

#Stopping Conditions in Decision Tree Construction

When building a decision tree, the splitting process continues until one of the following
stopping conditions is met:

● All attributes have been used: If all possible attributes have been considered along the
current branch, the splitting process stops. This means there are no more attributes
available to further divide the data at that node.
● Zero entropy: If all data instances at a leaf node belong to the same class (i.e., the
node's entropy is zero), the splitting process stops. This indicates that the data at that
node is pure and does not require further partitioning.

#Generalization problem in classification

● One of the most common tasks is to fit a “model” to a set of training data, so as to be
able to make reliable predictions on general unseen data.
● Overfitting: A statistical model describes random error or noise instead of the
underlying relationship.
● Overfitting occurs when a model is excessively complex, such as having too many
parameters relative to the number of observations.
● A model that has been overfitted has poor predictive performance, as it overreacts to
minor fluctuations in the training data.

#Over-fitting in decision trees

● The generated tree may overfit the training data
○ Too many branches, some of which may reflect anomalies due to noise or outliers
○ Results in poor accuracy for unseen objects
● Two approaches to avoid overfitting
○ Prepruning: Halt tree construction early; do not split a node if this
would result in the goodness measure falling below a threshold.
■ Difficult to choose an appropriate threshold
○ Postpruning: Remove branches from a “fully grown” tree to get a
sequence of progressively pruned trees
■ Use a set of data different from the training data to decide which is the
“best pruned tree”.

Causes of Overfitting in Decision Trees

● Too many branches: An overly complex decision tree can have too many branches, so
it learns noise patterns or outliers in the training data.
● Reflecting noise: Some branches may reflect noise or outliers, making the tree too
specific to the training data and less effective on new data.

#Random forest: Random Forest is an ensemble learning algorithm made up of many
randomized decision trees. The main idea of Random Forest is to reduce the risk of
overfitting that a single decision tree carries by combining multiple trees and relying
on the majority vote of the trees to make a prediction.

Comparison with Bagging

Bagging (Bootstrap Aggregating): This is an ensemble learning technique that combines
multiple machine learning models to improve accuracy. It creates multiple subsamples of the
original dataset by sampling with replacement, then trains a model on each subsample, and
combines the results of these models.

Random Forest: Similar to Bagging, Random Forest also uses multiple decision trees. However,
unlike Bagging, Random Forest considers only a random subset of features at each split
when constructing each tree.

Random Forest Building Process:

1. Bagging:
○ Given a training set S = {(x1,y1), (x2,y2), ..., (xn,yn)}, randomly sample (with
replacement) examples (xi,yi) from the original dataset to create subsamples. Each
subsample has the same size as the original dataset.
○ A decision tree is learned from each subsample.
○ When building each decision tree, instead of considering all features at each node,
only a random subset of features is selected for evaluation. This helps reduce the
correlation between trees in the forest and increases diversity among the trees.
2. Build multiple trees (K trees):
○ Repeat this process K times to create K decision trees. Each tree will be slightly
different due to the randomness in sampling and feature selection.
3. Prediction:
○ When a new example is given, each tree in the forest makes its own prediction.
○ The final prediction is determined by a majority vote: the prediction that is agreed
upon by the most trees is the final result.
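
The three stages can be sketched in plain Python. To keep the example short, each "tree" is a one-split decision stump, which is an assumption of this sketch rather than part of the method: a real Random Forest grows full trees and draws a fresh feature subset at every node. All function names are illustrative.

```python
import random
from collections import Counter


def bootstrap(data, rng):
    """Stage 1: sample len(data) examples with replacement."""
    return [rng.choice(data) for _ in data]


def fit_stump(data, feature_indices):
    """Pick the (feature, threshold) with the fewest training errors among the allowed features."""
    best = None
    for f in feature_indices:
        for threshold in sorted({x[f] for x, _ in data}):
            left = [y for x, y in data if x[f] <= threshold]
            right = [y for x, y in data if x[f] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    return best[1:]


def fit_forest(data, n_trees=25, n_features=1, seed=0):
    """Stage 2: repeat bootstrap + random feature subset K = n_trees times."""
    rng = random.Random(seed)
    all_features = list(range(len(data[0][0])))
    forest = []
    for _ in range(n_trees):
        sample = bootstrap(data, rng)
        subset = rng.sample(all_features, n_features)  # random feature subset
        forest.append(fit_stump(sample, subset))
    return forest


def predict(forest, x):
    """Stage 3: every tree votes; the most common vote wins."""
    votes = [left if x[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]
```

Setting `n_features` to the total number of features turns the sketch into plain Bagging, since every stump may then choose among all features.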

Key differences between Bagging and Random Forest:

Feature selection: Random Forest introduces additional randomness by selecting a random
subset of features at each node, while Bagging considers all features.

Diversity: Random Forest generally leads to more diverse trees due to the feature selection
process, which can improve the overall performance of the ensemble.

Bias-variance trade-off: Both Bagging and Random Forest mainly reduce variance. Random
Forest's feature subsampling further decorrelates the trees, which usually lowers variance
more than plain Bagging, sometimes at the cost of a slight increase in bias.

Lecture 3: Neural Networks

What is an Artificial Neural Network (ANN)?

An Artificial Neural Network (ANN) is a computational model designed to mimic the
functioning of the human brain. It consists of a network of interconnected nodes (neurons) that
process information and transmit results to other nodes. The connections between nodes have
varying weights, representing the influence of one node on another.

Structure of an Artificial Neural Network:

1, Layers: An ANN usually consists of 3 types of layers:

● Input Layer: Where the input data enters the network. Each neuron in this layer
corresponds to one feature of the data.
● Hidden Layers: The layers between the input layer and the output layer. This is where
the main computations of the network take place. A network can have many hidden layers.
● Output Layer: Where the network produces its final result. The number of neurons in
the output layer depends on the type of problem (classification, regression, etc.).

2, Artificial Neuron:

● The neuron is the basic unit of an ANN, analogous to a nerve cell in the brain. Each
neuron receives inputs, performs a computation, and produces an output.
● Each neuron has an activation function that determines its output value based on
its input. Common activation functions include sigmoid, tanh, and ReLU
(Rectified Linear Unit).

3, Weights: Each connection between neurons is assigned a weight. The weight determines
how much the input affects the neuron's output. During network training, these weights
are adjusted to optimize the output.

● Connection Strength:
○ Weights dictate how much influence the input signal from one neuron has on
the output of the next neuron.
○ A higher weight means the input signal has a stronger influence on the output,
while a lower weight means it has a weaker influence.
● Learning Process:
○ During the training of a neural network, weights are adjusted through a process
called backpropagation.
○ The goal is to minimize the error between the network's predictions and the
actual target values.
○ By tweaking the weights, the network learns to make more accurate predictions.
● Transformation of Input:
○ When an input is fed into a neuron, it is multiplied by the corresponding weight.
This weighted input is then summed with other weighted inputs and passed
through an activation function to produce the output. This transformation
process allows the network to learn complex patterns in the data.
● Bias: In addition to the weights, each neuron also has a bias value. Bias helps shift
the output value of the activation function to increase flexibility in learning.
○ A bias can be thought of as a weight with a constant input of 1. It allows the
activation function to be shifted to the left or right, providing additional flexibility
in the model.
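
The transformation of input described above (weighted sum, plus bias, through an activation function) can be written in a few lines. This is a minimal sketch for a single neuron, assuming a sigmoid activation; the function name is illustrative.

```python
import math


def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus bias, passed through a sigmoid activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))
```

With `inputs = [1.0, 2.0]` and `weights = [0.5, -0.25]` the weighted sum is 0, so the output with zero bias is 0.5. Raising the bias pushes the output toward 1 for the same inputs, which is the "shift" the bias provides.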

#Common Activation Functions: In Artificial Neural Networks (ANN), activation functions play
a crucial role in determining whether a neuron should be activated or not by converting the
input signal into an output. Different types of activation functions have different
characteristics, and they are chosen based on the problem being solved.

● Sigmoid (Logistic) Function: The Sigmoid function outputs values in the range of (0,
1), making it useful for binary classification problems where the output needs to be
interpreted as probabilities.
○ Range: (0, 1)
○ Use cases: Typically used in the output layer for binary classification.
○ Pros: Smooth gradient, output is probabilistic and can be interpreted as
probabilities.
○ Cons: Can suffer from vanishing gradient problem in deep networks, slow
convergence due to gradient saturation (when inputs are far from 0, gradient
becomes very small).
● Tanh (Hyperbolic Tangent) Function
○ The Tanh function is similar to the Sigmoid function but its output lies in
(-1, 1). It is symmetric around the origin, meaning
negative inputs produce negative outputs, and positive inputs produce positive
outputs.
○ Range: (-1, 1)
○ Use cases: Commonly used in hidden layers of neural networks.
○ Pros: Zero-centered output (helps with optimization), steeper gradient compared
to Sigmoid.
○ Cons: Still suffers from the vanishing gradient problem for very large or very
small inputs.
● ReLU (Rectified Linear Unit) Function: One of the most popular activation functions in
deep learning. It outputs the input directly if it’s positive; otherwise, it outputs zero.
○ Range: [0, ∞)
○ Use cases: Most commonly used in hidden layers of deep networks.
○ Pros: Computationally efficient, does not saturate for positive inputs, helps
alleviate the vanishing gradient problem.
○ Cons: Dying ReLU problem: a neuron can get "stuck" during training (outputting
zero for all inputs and no longer learning) if its weighted input stays negative,
because the gradient there is zero.
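
The three functions and their derivatives are short enough to write out, which makes the pros and cons above easy to check numerically. A minimal sketch using only the standard library; the function names are illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # peaks at 0.25 when z = 0


def tanh(z):
    return math.tanh(z)


def tanh_grad(z):
    return 1.0 - math.tanh(z) ** 2  # peaks at 1.0 when z = 0


def relu(z):
    return max(0.0, z)


def relu_grad(z):
    return 1.0 if z > 0 else 0.0  # zero for negative inputs (dying ReLU)
```

Evaluating `sigmoid_grad(10.0)` gives a value below 0.0001, which illustrates saturation: far from 0 the gradient almost vanishes. `relu_grad` stays at 1 for any positive input, and `tanh_grad(0.0)` is four times `sigmoid_grad(0.0)`, the "steeper gradient" noted above.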

#Types of connectivity:

● Feedforward Neural Networks (FFNs): the connections between neurons are
unidirectional, meaning that information moves in one direction, from
the input layer to the output layer, without cycles or loops.
○ Structure:
■ Input Layer: This is where the input data enters the network. Each
neuron in this layer represents one feature of the input data.
■ Hidden Layers: These layers perform the bulk of the computation, where
neurons apply activation functions to the weighted inputs.
■ Output Layer: The final layer produces the output, which could be a
single value for regression tasks or a set of values for classification.
● Recurrent Neural Networks (RNNs): a type of neural network where connections between
neurons can form directed cycles, allowing information to be passed not only from input
to output but also from one time step to the next.
○ Structure:
■ Input Layer: Just like feedforward networks, the input layer accepts input
data, but this input is typically a sequence.
■ Hidden Layers with Recurrence: Each neuron in the hidden layer takes
not only the current input but also the output from the previous time step
as input, allowing for temporal dependencies to be learned.
■ Output Layer: This layer generates the output based on the hidden
states, either for each time step (in tasks like sequence prediction) or at
the end of the sequence (in tasks like sentiment analysis).
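
The recurrence can be illustrated with a single hidden unit. This sketch assumes scalar inputs, a tanh activation, and arbitrary illustrative weights `w_x`, `w_h`, and `b`; it shows only the forward pass.

```python
import math


def rnn_forward(sequence, w_x=0.5, w_h=0.8, b=0.0):
    """Return the hidden state after each time step of a one-unit recurrent layer."""
    h = 0.0  # initial hidden state
    states = []
    for x in sequence:
        # the new state depends on the current input AND the previous state
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states
```

For the sequence `[1.0, 0.0, 0.0]` the last state is still positive, because the first input is carried forward through `w_h`. Setting `w_h=0.0` removes the recurrence and the last state becomes 0, as in a feedforward layer.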

#Training an ANN

Training lets the network learn patterns and relationships within a given dataset by
adjusting the weights during the learning process.

● Purposes
○ Pattern Recognition: Identifying patterns in data, such as recognizing images,
detecting anomalies, or understanding speech.
○ Prediction: Making predictions based on historical data, such as forecasting stock
prices or weather conditions.
○ Classification: Categorizing data into predefined classes, such as spam
detection in emails or diagnosis in medical imaging.
○ Function Approximation: Approximating complex functions that are difficult to
model mathematically.

Training an ANN network: 5 steps

1. Initialize the weights and biases (w0, w1, ..., wn)

● Initialize weights and biases, typically with small random values to break symmetry.
● Initialization methods like Xavier or He initialization can be used
depending on the activation functions.

2. Forward Propagation

● Pass the input data through the network, layer by layer, applying weights, biases, and
activation functions to compute the output.
● Calculate the predicted output for the given input.

3. Calculate the loss

● Define a loss function (e.g., Mean Squared Error for regression, Cross-Entropy Loss for
classification).
● Compute the loss by comparing the predicted output with the actual target values.
○ After the network produces its output, the loss function quantifies the
discrepancy between the predicted output and the actual result. The loss is
commonly used to measure the performance of the model.

4. Backward Propagation

● Calculate the gradient of the loss function with respect to each weight and bias in the
network.
● Use these gradients to update the weights and biases in the direction that minimizes
the loss function.
● This step involves the chain rule of calculus to propagate the error backward through
the network.

5. Update the weights and biases

● Use an optimization algorithm (e.g., Stochastic Gradient Descent, Adam) to update the
weights and biases based on the gradients computed during backpropagation.
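
The five steps can be traced in a minimal training loop. To stay short, this sketch trains a single sigmoid neuron rather than a multi-layer network, uses cross-entropy loss, and updates with plain batch gradient descent instead of SGD or Adam. The OR-gate data, learning rate, and epoch count are illustrative choices.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(data, epochs=2000, learning_rate=0.5, seed=0):
    rng = random.Random(seed)
    # Step 1: initialize weights with small random values, bias with zero
    weights = [rng.uniform(-0.1, 0.1) for _ in data[0][0]]
    bias = 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, target in data:
            # Step 2: forward propagation
            prediction = predict(weights, bias, x)
            # Step 3: cross-entropy loss; for a sigmoid output its derivative
            # with respect to the weighted sum is (prediction - target)
            error = prediction - target
            # Step 4: backward propagation (chain rule down to each parameter)
            for i, xi in enumerate(x):
                grad_w[i] += error * xi
            grad_b += error
        # Step 5: update weights and bias against the averaged gradient
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Toy data: the logical OR function
OR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
```

After `weights, bias = train(OR_DATA)`, calling `predict(weights, bias, x)` gives a value above 0.5 for the three positive inputs and below 0.5 for `(0, 0)`.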

#Several types of modern practical Deep NN

1. Perceptron (P):
● The Perceptron is the simplest type of artificial neural network and serves as the
building block for more complex networks. It models a single neuron and is used to
model logic gates for performing the logical operation.

How it works:

● Feedforward: Information flows in a single direction, from the input layer to the
output layer.
● Modeling logic gates: A perceptron can model linearly separable logic gates such as
AND, OR, and NOT (but not XOR, which is not linearly separable).
Training:

● Supervised Learning: Perceptrons are trained using supervised learning, where the
network is provided with input data and the corresponding correct output.
● Backpropagation: While the backpropagation algorithm is more commonly associated
with multilayer neural networks, it can be applied to perceptrons to update the weights
and biases.
○ The weights of the perceptron are updated based on the error between the
predicted output and the desired output. This error is often computed with
measures such as mean squared error (MSE).
● Application: The simple structure limits the applications, but they are popularly
combined with other networks to form new NN structures.
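
A minimal sketch of a perceptron learning the AND gate. It uses the classic perceptron learning rule (add `learning_rate * error * input` to each weight), named here as a simpler substitute for the gradient-based update described above. The step threshold, learning rate, and epoch count are illustrative assumptions.

```python
def step(z):
    """Threshold activation: fire only for a strictly positive weighted sum."""
    return 1 if z > 0 else 0


def perceptron_predict(weights, bias, x):
    return step(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_perceptron(data, epochs=20, learning_rate=1.0):
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in data:
            error = target - perceptron_predict(weights, bias, x)
            # No change when the prediction is right (error == 0)
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias


# Supervised training data: inputs with the correct AND output
AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
```

Because AND is linearly separable, the rule settles on weights that classify all four rows correctly within a few passes. The same code cannot fit XOR, however many epochs are used.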
2. Feedforward Neural Network (FFNN)

A Feedforward Neural Network (FFNN) is a more advanced type of artificial neural network
than the perceptron. It consists of multiple layers of neurons, making it capable of solving more
complex problems.

● Single Direction: All connections run forward from the input to the output.
There is no loop going back to the previous layers. Backpropagation is used only in the
training phase, to minimise the difference between the
target value and the ANN output.
○ The error (difference between predicted and actual output) is propagated back
through the network, and the weights are updated to reduce the error.
○ The process repeats for multiple iterations (epochs) until the network's
predictions are accurate.
● When deploying the trained ANN in an application, the computation is very fast and
thus suitable for real-time scenarios.

● Input Layer: This is where the input data is fed into the network. Each node in this layer
represents a feature of the data.
● Hidden Layers: Located between the input and output layers, hidden layers perform
complex calculations to extract high-level features from the data. The number of
hidden layers and the number of nodes in each hidden layer can vary depending on the
network architecture and the specific problem.
● Output Layer: This layer provides the final output of the neural network. The number of
nodes in the output layer corresponds to the number of
classes we want to predict.

Q1: How can you tell whether a data set is classified well? This requires performance
metrics and evaluation techniques for the model.

Classification Methods:

Classification methods are used when the output variable is a discrete category or class.
These methods focus on predicting which class a data point belongs to.

Regression Methods:

Regression methods are used when the output variable is continuous and not categorical.
These methods aim to predict a number.

When to Use Which?

● Use classification when you need to categorize data points into distinct groups (e.g.,
"yes" or "no").
● Use regression when you're predicting a continuous quantity (e.g., predicting house
prices).

To tell whether a data set is classified well, use the following common metrics and
techniques for evaluating a classification model:

1. Accuracy

● Accuracy measures the percentage of the model's predictions that are correct out of
the total number of samples.
● Formula: Accuracy = (Number of correct predictions) / (Total number of samples)
● However, accuracy does not always reflect the quality of the model, especially
when the data is imbalanced (e.g., 90% class A and 10% class B). In that case,
other metrics become more important.
2. Confusion Matrix

The confusion matrix gives a clearer picture of how the model classifies each specific class.

● True Positive (TP): Correct prediction; the sample belongs to the positive class.
● True Negative (TN): Correct prediction; the sample belongs to the negative class.
● False Positive (FP): Wrong prediction; the model predicts positive but the sample is negative.
● False Negative (FN): Wrong prediction; the model predicts negative but the sample is positive.
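
The four counts and the accuracy built from them can be computed directly. A minimal sketch for the two-class case; the function names and the example labels in the usage note are illustrative.

```python
def confusion_counts(actual, predicted, positive):
    """Count TP, TN, FP, FN for a binary problem with the given positive label."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif a == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


def accuracy(counts):
    """(TP + TN) divided by the total number of samples."""
    return (counts["TP"] + counts["TN"]) / sum(counts.values())
```

With 3 positive and 7 negative samples, a model that always predicts "negative" reaches an accuracy of 0.7 while its TP count is 0. This is the imbalance problem mentioned under Accuracy: the confusion matrix exposes what the single number hides.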

6. Cross-Validation

● Cross-validation (most commonly k-fold cross-validation) splits the data
into k parts; each part is used in turn as the test set while the remaining parts
form the training set. This evaluates the model on different
segments of the data and gives more confidence that it generalizes well.
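
The splitting scheme can be sketched as an index generator. This version assumes the samples are already shuffled and uses contiguous folds; the function name is illustrative.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) once for each of the k folds."""
    # Spread the remainder so fold sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size
```

A model is then trained and scored once per `(train, test)` pair and the k scores are averaged. Every sample lands in the test set exactly once.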

Q2: Is KNN prone to overfitting?

Yes, KNN is prone to overfitting.

Here's why:

● Small K values: When the value of K is too small, the model becomes overly sensitive
to the local structure of the data. This can lead to it memorizing the training data rather
than learning general patterns, causing overfitting.
● High-dimensional data: In high-dimensional spaces, the "curse of dimensionality" can
come into play. This phenomenon makes it difficult for KNN to find meaningful
neighbors, leading to overfitting.
● Noisy data: If the data contains a lot of noise or outliers, KNN can be easily influenced
by these points, leading to overfitting.

To mitigate overfitting in KNN:

● Choose an appropriate K value: Experiment with different K values to find the optimal
balance between underfitting and overfitting. Cross-validation can help you select the
best K.
● Consider dimensionality reduction techniques: Techniques like PCA or t-SNE can help
reduce the dimensionality of your data, making it easier for KNN to find meaningful
neighbors.
● Handle outliers: Identify and remove or handle outliers in your data to prevent them
from unduly influencing the model.
● Use distance metrics that are appropriate for your data: The choice of distance metric
can significantly impact the performance of KNN. Consider using metrics like Manhattan
distance or cosine similarity for different types of data.
#Data types and structure

Types of data:

Quantitative:

● Discrete (distinct/separate values): It can only take specific values. Data that can be
"counted". Discrete variables are numeric and countable, often non-negative
integers. E.g.: the result of a dice roll
● Continuous (any value in an interval): It can take on any value in an interval.
Variables in continuous data sets often carry decimal points, to whatever precision the
measurement allows. Data that can be "measured". E.g.: Temperature

Qualitative:

● Nominal: Nominal data is categorical and represents labels or names without any
inherent order.
○ This type of data categorizes items based on their characteristics, but the
categories don't have a ranked or logical sequence
○ You can think of it as simple "labels" that differentiate one category from
another
○ For example: Gender (Male, Female, Non-binary); Blood Type (A, B, AB, O)
● Ordinal: Ordinal data is also categorical, but unlike nominal data, it has a meaningful
order or ranking.
○ However, the intervals between these ranked categories are not necessarily
equal or known.
○ Ordinal data provides a sense of sequence, but it doesn't quantify the difference
between ranks.
○ Example: Educational level: High school, Bachelor's Degree, Master's Degree, PhD
■ Movie Ratings: 1 star, 2 stars, 3 stars, 4 stars, 5 stars

Conversion

● Discretization: transforms continuous variables into discrete variables

Example: transforming temperature data from continuous values into classification levels
such as "High", "Medium", "Low".
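
As a rough sketch of discretization, the snippet below maps continuous temperature readings to the three levels named above. The cut-off values (15 and 28 degrees Celsius) and the function name `discretize_temperature` are assumptions chosen only for illustration; in practice the thresholds depend on the problem.

```python
def discretize_temperature(celsius, low_max=15.0, high_min=28.0):
    """Map a continuous temperature to a level; thresholds are illustrative assumptions."""
    if celsius < low_max:
        return "Low"
    if celsius < high_min:
        return "Medium"
    return "High"

readings = [3.5, 14.9, 15.0, 22.1, 28.0, 35.2]
levels = [discretize_temperature(t) for t in readings]
print(levels)  # ['Low', 'Low', 'Medium', 'Medium', 'High', 'High']
```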

● Word Embedding: A word embedding (or word vector) is an approach for representing
documents and words numerically.
○ It is defined as a numeric vector input that allows words with similar meanings
to have similar representations.
○ Example: In natural language processing (NLP), words are initially just character
strings (e.g. "frog", "toad"), which have no numerical meaning. With word embedding,
each word is converted into a vector of numbers in a continuous space. These vectors
represent relationships between words; for example, the words "frog" and "toad"
will have vectors close together because they have similar meanings.
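
To make "vectors close together" concrete, closeness is often measured with cosine similarity. The sketch below uses hand-picked, hypothetical 3-dimensional vectors (real embeddings are learned from text and usually have hundreds of dimensions), so the numbers only illustrate the idea.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy embeddings, chosen by hand for illustration only.
embeddings = {
    "frog": [0.90, 0.80, 0.10],
    "toad": [0.85, 0.75, 0.20],
    "car": [0.10, 0.20, 0.95],
}

sim_frog_toad = cosine_similarity(embeddings["frog"], embeddings["toad"])
sim_frog_car = cosine_similarity(embeddings["frog"], embeddings["car"])
print(round(sim_frog_toad, 3), round(sim_frog_car, 3))
```

With these toy vectors, "frog" and "toad" score much higher than "frog" and "car", which is the behaviour a trained embedding is expected to show for words with similar meanings.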

(1) Overfitting in Machine Learning:

● Definition: Overfitting occurs when a machine learning model learns the training data
too well, including noise and details that don't generalize to new, unseen data.
● Phenomenon: The model performs exceptionally well on the training data but poorly
on the test data. This happens when the model is overly complex relative to the amount
of data or the noise in the data.

Example: A Decision Tree that perfectly classifies the training data but fails to generalize to
unseen examples because it has learned irrelevant patterns.

(2) Solutions to Avoid Overfitting:

General Solutions:

● Cross-Validation: Use k-fold cross-validation to check that the model generalizes
well to data it was not trained on.
● Regularization: Apply techniques like L1 (Lasso) and L2 (Ridge) regularization to
penalize large coefficients.
● Simpler Models: Use simpler models with fewer parameters to reduce the chance of
overfitting.
● Pruning (for decision trees): Trim the branches that provide little power in classifying
data points.
● Dropout (for neural networks): Randomly drop neurons during training to prevent the
network from becoming too reliant on specific nodes.
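
To illustrate how regularization "penalizes large coefficients", here is a minimal sketch of L2 (Ridge) regularization for a model with a single weight and no intercept. Minimizing sum((y - w*x)^2) + lam * w^2 gives the closed form w = sum(x*y) / (sum(x^2) + lam). The data points and the name `ridge_slope` are assumptions made for the example.

```python
def ridge_slope(xs, ys, lam):
    """Closed-form minimizer of sum((y - w*x)^2) + lam * w^2 for a single weight w."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

# Toy data (assumption): roughly y = 2x with a little noise.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

# lam = 0 is ordinary least squares; larger lam shrinks the weight toward zero.
slopes = {lam: ridge_slope(xs, ys, lam) for lam in (0.0, 1.0, 10.0, 100.0)}
for lam, w in slopes.items():
    print(f"lambda={lam:>5}: w={w:.4f}")
```

The weight shrinks steadily as the penalty grows, which is the mechanism that keeps a regularized model from fitting noise with extreme coefficients.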

Algorithm-Specific Solutions:

● For Decision Trees: Use pruning techniques or set a maximum depth.
● For Random Forests: Limit the number of features used at each split, and increase the
number of trees.
● For Neural Networks: Apply regularization, dropout, or early stopping.
● For SVM: Use a simpler kernel (e.g., linear instead of radial basis function) or
strengthen regularization by lowering the C parameter.

Q2- What is the difference between supervised and unsupervised machine learning? More
reading: What is the difference between supervised and unsupervised machine learning?
(Quora) Supervised learning requires labeled training data. For example, in order to do
classification (a supervised learning task), you’ll need to first label the data you’ll use to train
the model to classify data into your labeled groups. Unsupervised learning, in contrast, does
not require labeling data explicitly.

Q7- Why is “Naive” Bayes naive? More reading: Why is “naive Bayes” naive? (Quora) Despite
its practical applications, especially in text mining, Naive Bayes is considered “Naive” because it
makes an assumption that is virtually impossible to see in real-life data: the conditional
probability of the features given the class is calculated as the pure product of the individual
feature probabilities. This implies that the features are completely independent of one another
once the class is known, a condition probably never met in real life. As
a Quora commenter put it whimsically, a Naive Bayes classifier that figured out that you liked
pickles and ice cream would probably naively recommend you a pickle ice cream.
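
The "pure product" assumption can be sketched as follows. The class priors and per-word likelihoods below are made-up numbers for a hypothetical spam filter, and `naive_bayes_posterior` is an illustrative helper, not a library function.

```python
def naive_bayes_posterior(priors, likelihoods, features):
    """Posterior over classes, multiplying per-feature likelihoods as if independent."""
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for f in features:
            score *= likelihoods[cls][f]  # the "naive" independence assumption
        scores[cls] = score
    total = sum(scores.values())
    return {cls: s / total for cls, s in scores.items()}

# Hypothetical probabilities, chosen only to illustrate the calculation.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"free": 0.30, "meeting": 0.02},
    "ham": {"free": 0.05, "meeting": 0.20},
}

posterior = naive_bayes_posterior(priors, likelihoods, ["free", "meeting"])
print(posterior)
```

Each class score is simply prior times the product of the individual word likelihoods; no interaction between "free" and "meeting" is modelled, which is exactly what makes the classifier naive.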

Q13- What is deep learning, and how does it contrast with other machine learning
algorithms? More reading: Deep learning (Wikipedia) Deep learning is a subset of machine
learning that is concerned with neural networks: how to use backpropagation and certain
principles from neuroscience to more accurately model large sets of unlabelled or
semi-structured data. In that sense, deep learning learns representations of data through
multi-layer neural nets rather than relying on hand-engineered features, and it can be applied
to supervised as well as unsupervised learning tasks.

Q15- What cross-validation technique would you use on a time series dataset? More
reading: Using k-fold cross-validation for time-series model selection (CrossValidated) Instead
of using standard k-folds cross-validation, you have to pay attention to the fact that a time
series is not randomly distributed data — it is inherently ordered by chronological order. If a
pattern emerges in later time periods for example, your model may still pick up on it even if
that effect doesn’t hold in earlier years! You’ll want to do something like forward chaining
where you’ll be able to model on past data then look at forward-facing data.
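
Forward chaining can be sketched as an expanding training window in which every test block comes strictly after the data used for training. The function name `forward_chaining_splits` and the equal block sizes are assumptions for this illustration.

```python
def forward_chaining_splits(n_samples, n_folds):
    """Yield (train_indices, test_indices); each test block follows its training data in time.

    Samples are assumed to be sorted chronologically. Any remainder after
    dividing into n_folds + 1 equal blocks is left unused in this sketch.
    """
    block = n_samples // (n_folds + 1)
    for i in range(1, n_folds + 1):
        train = list(range(0, i * block))
        test = list(range(i * block, (i + 1) * block))
        yield train, test

splits = list(forward_chaining_splits(n_samples=12, n_folds=3))
for train, test in splits:
    print("train:", train, "test:", test)
# train: [0, 1, 2] test: [3, 4, 5]
# train: [0, 1, 2, 3, 4, 5] test: [6, 7, 8]
# train: [0, 1, 2, 3, 4, 5, 6, 7, 8] test: [9, 10, 11]
```

Unlike standard k-fold, no fold ever trains on observations that come later than the ones it is tested on.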

Q16- How is a decision tree pruned? More reading: Pruning (decision trees) Pruning is what
happens in decision trees when branches that have weak predictive power are removed in
order to reduce the complexity of the model and increase the predictive accuracy of a decision
tree model. Pruning can happen bottom-up and top-down, with approaches such as reduced
error pruning and cost complexity pruning. Reduced error pruning is perhaps the simplest
version: starting at the leaves, replace each node with its most popular class. If this doesn’t
decrease predictive accuracy on a validation set, keep it pruned.

Q20- When should you use classification over regression? More reading: Regression vs
Classification (Math StackExchange) Classification produces discrete values and assigns data
points to strict categories, while regression gives you continuous results that allow you to better
distinguish differences between individual points. You would use classification over regression
if you wanted your results to reflect the belongingness of data points in your dataset to certain
explicit categories (ex: If you wanted to know whether a name was male or female rather than
just how correlated it was with male and female names.)

Q22- How do you ensure you’re not overfitting with a model? More reading: How can I avoid
overfitting? (Quora) This is a simple restatement of a fundamental problem in machine
learning: the possibility of overfitting training data and carrying the noise of that data through
to the test set, thereby providing inaccurate generalizations. There are three main methods to
avoid overfitting: 1- Keep the model simpler: reduce variance by taking into account fewer
variables and parameters, thereby removing some of the noise in the training data. 2- Use
cross-validation techniques such as k-folds cross-validation. 3- Use regularization techniques
such as LASSO that penalize certain model parameters if they’re likely to cause overfitting.

Q23- What evaluation approaches would you use to gauge the effectiveness of a machine
learning model? More reading: How to Evaluate Machine Learning Algorithms (Machine
Learning Mastery) You would first split the dataset into training and test sets, or perhaps use
cross-validation techniques to further segment the dataset into composite sets of training and
test sets within the data. You should then choose a selection of performance metrics suited to
the task. You could use measures such as the F1 score, the accuracy,
and the confusion matrix. What’s important here is to demonstrate that you understand the
nuances of how a model is measured and how to choose the right performance measures for
the right situations.
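
As a small illustration of the measures mentioned above, the sketch below computes the confusion-matrix counts, accuracy, and F1 score for a binary problem in plain Python. The label lists and the helper names are assumptions made for the example.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn

def accuracy_and_f1(y_true, y_pred, positive=1):
    """Accuracy and F1 (harmonic mean of precision and recall)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1

# Made-up labels for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
accuracy, f1 = accuracy_and_f1(y_true, y_pred)
print(counts)        # (3, 1, 1, 3)
print(accuracy, f1)  # 0.75 0.75
```

Accuracy and F1 coincide here only because the toy data is balanced; on imbalanced data they can diverge sharply, which is one reason to look at the confusion matrix rather than a single number.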
