AI14 - MachineLearning

Chapter 14 covers basic concepts of machine learning, including supervised and unsupervised learning techniques, with a focus on linear regression and the k-NN algorithm. It explains how machine learning allows computers to learn from data to make predictions and decisions, and discusses issues like overfitting and underfitting. The chapter also includes practical examples using Python libraries like Scikit-learn.

Uploaded by

truongquangmy1991


Chapter 14. Basic Concepts of Machine Learning
Table of contents
• 14.1 Introduction
• 14.2 Linear regression
• 14.3 k-NN algorithm
• 14.4 Overfitting and underfitting
14.1 Introduction

Big Data Analytics & Artificial Intelligence
Starting with Python
Introduction
• Rule-based method: a programmer writes explicit instructions (rules) that tell the computer exactly what to do.
• Machine learning method: the computer learns on its own from data in order to solve problems.
As with AlphaGo, if you give the computer only the rules of Go and the records of previous games, it can learn the principles of Go on its own and play the game.

[Figure: Go has many game rules → apply machine learning → the computer program that beat Lee Sedol]

Introduction

• Machine learning is a field of artificial intelligence that studies how to give computers the ability to learn.
• Machine learning, which evolved from pattern recognition and computational learning theory, makes computers learn how to make decisions by looking at given data.
• The performance of such decision-making algorithms generally improves as more training data becomes available.
• Unlike algorithms made of commands that always perform predetermined operations, learning algorithms make predictions and decisions based on data.
• Machine learning is mainly used for problems where it is difficult to tell the computer an explicit solution method. For example, it is widely used in spam filtering, automatic detection of network intruders, computer vision, and autonomous driving.
Introduction

• Types of machine learning techniques: machine learning is generally divided into supervised learning and unsupervised learning, depending on whether there is a “teacher” who provides the correct answers. Reinforcement learning forms a third category.

[Figure: Machine Learning
 – Supervised learning: regression analysis, random forest, decision tree, classification
 – Unsupervised learning: clustering, dimensionality reduction
 – Reinforcement learning]
Introduction

• Supervised learning: the computer is given examples together with the correct answers (or labels) provided by the "teacher". The goal of supervised learning is to learn general rules that map inputs to outputs.

[Figure: data → labeling (e.g., "cat") → learning using data and labels]

Introduction

• Clustering is a representative example of unsupervised learning. No labels are given: the computer looks at the data by itself and learns the rules, for example that the data can be divided into two large groups.
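As a minimal sketch of clustering, the example below uses scikit-learn's KMeans on a small set of made-up 2-D points (the data values are hypothetical). No labels are supplied; the algorithm groups the points into two clusters on its own.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical unlabeled 2-D data: two loose groups of points
X = np.array([
    [1.0, 1.2], [1.5, 1.8], [1.2, 0.9], [0.8, 1.1],   # group near (1, 1)
    [8.0, 8.2], [8.5, 7.9], [7.8, 8.6], [8.2, 8.1],   # group near (8, 8)
])

# Ask for two clusters; n_init and random_state make the run repeatable
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print(labels)                    # cluster index assigned to each point
print(kmeans.cluster_centers_)   # the two learned cluster centres
```

The cluster numbers (0 or 1) are arbitrary; what matters is that the first four points share one label and the last four share the other.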
Introduction

• In reinforcement learning, the learning data is given in the form of rewards and punishments. Only feedback on the program's actions is provided, in a dynamic environment such as driving a vehicle or playing a game against an opponent.
14.2 Linear regression

Linear regression

Supervised learning: learning from problems together with their correct answers
• After learning from given input-output pairs, a supervised learning model predicts a reasonable output value when a new input comes in.
• In other words, given inputs (x) and outputs (y), supervised learning learns a mapping function f(x) from input to output.
Linear regression
• Suppose we are given the points (1, 10), (2, 20), (3, 30), and (4, 40) as input data in the form (x, y). The computer does not yet know that this data follows the equation 𝑦 = 10𝑥. The goal is for the computer to learn from these 4 data points so that, once training is finished, it answers 50 when given x = 5.
• When the computer finds, on its own, the function that best explains the given input data, this problem is called regression analysis, one type of supervised learning.
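The y = 10x example above can be sketched with scikit-learn's LinearRegression. The model is only given the four points; it is never told the equation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# The four (x, y) training points from the text; y = 10x, but the model is not told that
X = np.array([[1], [2], [3], [4]])   # inputs must be 2-D: one row per sample
y = np.array([10, 20, 30, 40])

model = LinearRegression()
model.fit(X, y)                      # learn slope and intercept from the data

print(model.coef_, model.intercept_) # slope close to 10, intercept close to 0
print(model.predict([[5]]))          # prediction for x = 5, close to 50
```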
Linear regression

Find a function that describes the data well: a regression problem

[Figure: examples of linear regression, nonlinear regression, and classification]

• Regression is the problem of finding the straight line or curve that best describes the data, usually after plotting the data in a multidimensional space.

• In other words, estimating the function 𝑓 in 𝑦 = 𝑓(𝑥) from the observed inputs 𝑥 and outputs 𝑦 is called regression.
Linear regression

• Scikit-learn

– A library for machine learning

– It includes classification, regression, and clustering algorithms such as linear regression, the k-NN algorithm, support vector machines, random forests, gradient boosting, and k-means, which makes it a popular tool for people new to machine learning.

• How to install Scikit-learn in Anaconda?

pip install numpy==1.19.2 scipy==1.9.3 scikit-learn
Linear regression

The simplest regression: linear regression

• Linear regression is a technique for modeling the correlation between a random variable 𝑥 and another variable 𝑦 that depends on it:

$$y = mx + b$$

where x is the feature of the data, m is the slope, and b is the intercept.
• Here the model relates two variables (x and y), so it is a straight line; in three or more dimensions, it becomes a hyperplane.
• When there are more than two variables, it is called multiple linear regression.
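For a single feature, the least-squares slope and intercept of y = mx + b have a well-known closed form: m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b = ȳ − m·x̄. The NumPy sketch below applies it to the height/weight data used in the next example. It illustrates the formula only and is not a description of how scikit-learn computes its fit internally; the helper name fit_line is hypothetical.

```python
import numpy as np

def fit_line(x, y):
    """Return slope m and intercept b of the least-squares line y = m*x + b."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: sum of cross-deviations divided by sum of squared x-deviations
    m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through the point (x_mean, y_mean)
    b = y_mean - m * x_mean
    return m, b

heights = [164, 179, 162, 170]   # cm
weights = [53, 63, 55, 59]       # kg
m, b = fit_line(heights, weights)
print(f"m = {m:.4f}, b = {b:.4f}")  # m = 0.5522, b = -35.6867
```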
Linear regression
Let's implement linear regression with the Scikit-Learn library
• Suppose we are examining the height and weight of students in a classroom.
• Suppose that, in general, taller students weigh more.
• Height and weight were measured for several students.
• Student A, whose height is known exactly but whose weight is unknown, is absent from school.
• Can we accurately estimate student A's weight?

• If we could create a formula that quantifies the correlation between height and weight, we would be able to estimate the weight of student A.
Linear regression
Let's implement linear regression with the Scikit-Learn library
• Four students were randomly selected and measured: their heights were 164, 179, 162, and 170 cm, and their weights were 53, 63, 55, and 59 kg, respectively.

[Figure: linear regression on a scatter plot of height vs. weight. Callouts: "The input values must be arranged as a 2D array" and "Write the most suitable linear equation to describe this distribution"]
Linear regression
Let's implement linear regression with the Scikit-Learn library
Caution: the input values are the students' heights, 164, 179, 162, and 170, respectively.
-> The input to linear regression must be a multi-dimensional (2D) array, with one row per student.

The target values are the weights.

LinearRegression() is the function that creates a linear regression model; in other words, it is a model generator. Fitting the model to the input X finds the line that best predicts the target values y.
Linear regression
Check and predict the results of linear regression learning

To check the slope and intercept of the fitted straight line, inspect the attributes coef_ and intercept_. To check how well these values predict Y for the input X, use the score() function.

How well the model predicts the data: about 90 points (a score of about 0.90)
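The fit-and-inspect steps described above can be sketched as follows. The variable name regr follows the text; building the 2-D input with reshape(-1, 1) is an assumption about how the original (not reproduced) code prepared X.

```python
import numpy as np
from sklearn import linear_model

# Heights (cm) as a 2-D array: one row per student, one column (feature)
X = np.array([164, 179, 162, 170]).reshape(-1, 1)
y = np.array([53, 63, 55, 59])   # weights (kg), the target values

regr = linear_model.LinearRegression()   # model generator
regr.fit(X, y)                           # learn the slope and intercept

print("slope (coef_):", regr.coef_)                # about [0.5522]
print("intercept (intercept_):", regr.intercept_)  # about -35.69
print("score:", regr.score(X, y))                  # R^2, about 0.90
```

For regression models, score() returns the coefficient of determination R², which is where the "about 90 points" figure comes from.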
Linear regression
Check and predict the results of linear regression learning

Now, for students with heights of 180 cm and 185 cm, let's find out what weights the linear regression model regr we created predicts. To do this, prepare the input data.

Height 180 → regr.predict() → 63.71
Height 185 → regr.predict() → 66.47

Use the predict() function of the linear regression model regr, passing the students' heights to it. Based on the model regr, it returns the students' estimated weights.
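A minimal sketch of the prediction step, refitting the same four-student model so the block runs on its own:

```python
import numpy as np
from sklearn import linear_model

X = np.array([[164], [179], [162], [170]])   # heights (cm), 2-D input
y = np.array([53, 63, 55, 59])               # weights (kg)
regr = linear_model.LinearRegression().fit(X, y)

# New students whose weights we want to estimate; again a 2-D array
new_heights = np.array([[180], [185]])
predicted = regr.predict(new_heights)
print(predicted.round(2))   # [63.71 66.47]
```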
Linear regression

Use the Matplotlib library to graph this linear regression.
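One way to draw the data points and the fitted line is sketched below. The Agg backend and the output file name height_weight.png are assumptions made so the script also runs without a display; in an interactive session, plt.show() can be used instead.

```python
import matplotlib
matplotlib.use("Agg")            # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt
import numpy as np
from sklearn import linear_model

X = np.array([[164], [179], [162], [170]])   # heights (cm)
y = np.array([53, 63, 55, 59])               # weights (kg)
regr = linear_model.LinearRegression().fit(X, y)

# Evaluate the fitted line over the observed height range
line_x = np.linspace(X.min(), X.max(), 100).reshape(-1, 1)
line_y = regr.predict(line_x)

plt.scatter(X.ravel(), y, color="black", label="measured students")
plt.plot(line_x.ravel(), line_y, color="blue", linewidth=2, label="fitted line")
plt.xlabel("Height (cm)")
plt.ylabel("Weight (kg)")
plt.legend()
plt.savefig("height_weight.png")   # hypothetical file name
```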

Linear regression

Question: Use a linear regression model to predict the weights of students with the features [166, 0] and [170, 1].

Student       1    2    3    4    5    6    7    8
Height (cm)   164  167  165  170  179  163  159  166
Sex           1    1    0    0    0    1    0    1
Weight (kg)   43   48   47   66   67   50   52   44

Sex: man = 0 ("I am a man (0)"), woman = 1 ("I am a woman (1)").

Even if their heights are similar, men and women will have different weights, so sex (man = 0, woman = 1) is added as a feature used as input to the linear regression. For example, the feature value of student 4 is [170, 0].
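A minimal sketch of this two-feature (multiple) linear regression, using the table above with each row encoded as [height, sex]:

```python
import numpy as np
from sklearn import linear_model

# Table data: each row is [height (cm), sex], with man = 0 and woman = 1
X = np.array([
    [164, 1], [167, 1], [165, 0], [170, 0],
    [179, 0], [163, 1], [159, 0], [166, 1],
])
y = np.array([43, 48, 47, 66, 67, 50, 52, 44])   # weights (kg)

regr = linear_model.LinearRegression()
regr.fit(X, y)            # multiple linear regression: two input features

print("coefficients:", regr.coef_)   # one coefficient per feature
print("intercept:", regr.intercept_)
pred = regr.predict([[166, 0], [170, 1]])   # answers to the question
print(pred)
```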
Linear regression
Diabetes example:

• The sklearn library includes a dataset collected from diabetes patients.

• This dataset has more samples and more features than the examples above.
Linear regression
Diabetes example: diabetes dataset

The dataset includes data (the input values), target (the values to be learned), and feature_names (the names of the input features).
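A quick way to look at these three parts of the dataset, using sklearn.datasets.load_diabetes (the data ships with scikit-learn, so no download is needed):

```python
from sklearn import datasets

diabetes = datasets.load_diabetes()

print(diabetes.data.shape)        # (442, 10): 442 patients, 10 input features
print(diabetes.target.shape)      # (442,): one target value per patient
print(diabetes.feature_names)     # names of the 10 input features; index 2 is 'bmi'
```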
Linear regression

Diabetes example: diabetes dataset

• Out of the 10 features, extract only the third one, bmi (body mass index), and use it as the input of the regr model used earlier.

• The input to the function must be a two-dimensional array, so increase the dimension of the array using np.newaxis.

Now this data X, which contains only bmi, can be used as the input of a linear regression model.
Linear regression

Diabetes example: What is the correlation between the body mass index and the diabetes level?

• Linear regression learning
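Extracting the bmi column with np.newaxis and fitting a regression on it might look like this (a sketch; it trains on all 442 samples, before any train/test split):

```python
import numpy as np
from sklearn import datasets, linear_model

diabetes = datasets.load_diabetes()

# Column 2 is bmi; np.newaxis keeps the result 2-D with shape (442, 1)
X = diabetes.data[:, np.newaxis, 2]
y = diabetes.target

regr = linear_model.LinearRegression()
regr.fit(X, y)
print(X.shape)              # (442, 1)
print(regr.coef_)           # slope of the target versus bmi
print(regr.score(X, y))     # R^2 of the bmi-only model on all samples
```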

Linear regression
Diabetes example: split the diabetes data into training and test data

Let's see how accurate this model is by using 20% of the data as test data (new data).

[Figure: diabetes dataset (load_diabetes()) → train_test_split() → training data (X_train, y_train) and test data (X_test, y_test); linear regression learning with regr.fit() on the training data; final performance evaluation (model accuracy) with regr.predict() on the test data]
Linear regression
Diabetes example:
Split the diabetes data into training and test data
• Only 80% of the 442 samples are used for learning (training).
• The remaining 20% are used for testing.

A linear regression model is trained on the training data X_train and y_train (using only the bmi data).

The training data and test data give scores of about 35 and 31 points, respectively (R² values of about 0.35 and 0.31).
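A sketch of the 80/20 split with train_test_split. The random_state value is an arbitrary assumption, so the printed scores will not necessarily match the 35 and 31 points quoted above; the split sizes, however, follow directly from test_size=0.2.

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.model_selection import train_test_split

diabetes = datasets.load_diabetes()
X = diabetes.data[:, np.newaxis, 2]   # bmi only, as a 2-D array
y = diabetes.target

# Hold out 20% of the 442 samples as test data; random_state is an arbitrary choice
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
print(len(X_train), len(X_test))      # 353 89

regr = linear_model.LinearRegression()
regr.fit(X_train, y_train)            # learn only from the training data

print("train score:", regr.score(X_train, y_train))
print("test score:", regr.score(X_test, y_test))
```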
Linear regression

Diabetes example: use all the features in the dataset for linear regression

[Figure: predicted values plotted against actual values]
Linear regression

Diabetes example: mean squared error (MSE)

There are various methods of calculating the error between y_pred and y_test. One of them is the mean squared error (MSE):

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(H(X_i) - y_i\right)^2$$

where $N$ is the number of elements, $y_i$ is the $i$-th y_test value, and $H(X_i)$ is the y_pred value estimated by the linear regression model, corresponding to $y_i$.
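The MSE formula translates directly into NumPy. The sketch below compares a hand-written version (the helper name mse is hypothetical) with scikit-learn's mean_squared_error on three made-up values.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

def mse(y_pred, y_true):
    """Mean squared error: average of (H(X_i) - y_i)^2 over all N elements."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return np.mean((y_pred - y_true) ** 2)

y_test = [3.0, 5.0, 2.0]   # hypothetical actual values
y_pred = [2.0, 7.0, 2.0]   # hypothetical model predictions
print(mse(y_pred, y_test))                  # (1 + 4 + 0) / 3 = 1.666...
print(mean_squared_error(y_test, y_pred))   # same value from scikit-learn
```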
Linear regression

Diabetes example: Python program

Linear regression

Diabetes example: Python program: results
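The program itself is not reproduced in this copy. A minimal sketch of such a program, assuming it trains on all 10 features with an 80/20 split and reports the MSE, might look like this; random_state is an arbitrary choice, and the numbers are not claimed to match the original results.

```python
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

diabetes = datasets.load_diabetes()
X, y = diabetes.data, diabetes.target     # all 10 features this time

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)  # random_state is an arbitrary choice

regr = linear_model.LinearRegression()
regr.fit(X_train, y_train)
y_pred = regr.predict(X_test)

print("test score (R^2):", regr.score(X_test, y_test))
print("MSE:", mean_squared_error(y_test, y_pred))

# Side-by-side view of the first few predicted and actual values
for predicted, actual in zip(y_pred[:5], y_test[:5]):
    print(f"predicted {predicted:7.1f}   actual {actual:7.1f}")
```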

14.3 k-NN algorithm

k-NN algorithm
The problem of classifying Dachshunds and Samoyeds
Assume a simple case in which the feature space consists of two features and each data item is plotted as a point in this space.

[Figure: Samoyeds and Dachshunds plotted in a feature space with the axes Length and Height]

• The Samoyed is tall relative to its body length, while the Dachshund is short relative to its body length.
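The k-NN idea (find the k closest training points and take a majority vote) fits in a few lines of NumPy. This is a from-scratch sketch for illustration; the helper name knn_predict and all the dog measurements are hypothetical.

```python
import numpy as np
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    train_X = np.asarray(train_X, dtype=float)
    # Euclidean distance from the query to every training point
    distances = np.linalg.norm(train_X - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(distances)[:k]           # indices of the k closest points
    votes = Counter(train_y[i] for i in nearest)  # count labels among them
    return votes.most_common(1)[0][0]

# Hypothetical [length, height] measurements in cm (made-up values)
dogs_X = [[75, 24], [77, 26], [83, 24], [76, 23],    # Dachshunds: long and low
          [56, 52], [52, 50], [57, 53], [55, 54]]    # Samoyeds: tall for their length
dogs_y = ["Dachshund"] * 4 + ["Samoyed"] * 4

print(knn_predict(dogs_X, dogs_y, [79, 35], k=3))   # Dachshund
print(knn_predict(dogs_X, dogs_y, [60, 45], k=3))   # Samoyed
```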
k-NN algorithm
If you classify a new item by looking at its 3 nearest neighbors, it belongs to class B, but if you look at its 5 nearest neighbors, it belongs to class A.

[Figure: a new item surrounded by class A and class B items]

When k = 3: 2 class B, 1 class A → class B

When k = 5: 2 class B, 3 class A → class A
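The effect of k can be reproduced with scikit-learn's KNeighborsClassifier on a handful of hypothetical points, placed so that the query's neighbors match the counts above (two B points closest, three A points farther away).

```python
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical points arranged so the answer changes with k.
# Distances from the query (0, 0): B=1.0, B=1.5, A=2.0, A=3.0, A=3.5
X = [[1.0, 0.0], [0.0, 1.5], [2.0, 0.0], [0.0, 3.0], [3.5, 0.0]]
y = ["B", "B", "A", "A", "A"]
query = [[0.0, 0.0]]

results = {}
for k in (3, 5):
    clf = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    results[k] = clf.predict(query)[0]
    print(k, results[k])   # k=3 -> B (2 B vs 1 A); k=5 -> A (2 B vs 3 A)
```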
k-NN algorithm

Get ready to classify the beautiful irises

k-NN algorithm

[Figure: iris flower with sepal length, sepal width, petal length, and petal width labeled]

Setosa: 0
Versicolor: 1
Virginica: 2
k-NN algorithm

You can also see that the labels are encoded as 0, 1, 2

Setosa : 0
Versicolor : 1
Virginica: 2
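The iris data can be loaded with sklearn.datasets.load_iris; the sketch below shows the four measurements and the 0, 1, 2 label encoding.

```python
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.data.shape)        # (150, 4): 150 flowers, 4 measurements each
print(iris.feature_names)     # sepal length/width and petal length/width, in cm
print(iris.target_names)      # ['setosa' 'versicolor' 'virginica'] -> labels 0, 1, 2
print(iris.target[:5])        # the first flowers in the dataset are setosa (0)
```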
k-NN algorithm
Apply the k-NN algorithm
Train the model using 80% of the total data as training data, then check that it predicts the remaining 20% (the test data) well.

Training and testing using KNeighborsClassifier
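A sketch of training and testing KNeighborsClassifier on an 80/20 split of the iris data. The choices k = 5 and random_state=4 are illustrative assumptions, not values taken from the original slides.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
# 80% training data, 20% test data; random_state is an arbitrary choice for repeatability
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=4)

knn = KNeighborsClassifier(n_neighbors=5)   # k = 5 is an illustrative choice
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
print(y_pred)                 # predicted labels (0, 1, 2) for the 30 test flowers
```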
k-NN algorithm

Let's apply the model to classify new flowers.
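Classifying new flowers is a call to predict() on new measurement rows. In this sketch the model is trained on the full dataset with k = 5 (an assumption), and the two flowers' measurements are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
knn = KNeighborsClassifier(n_neighbors=5).fit(iris.data, iris.target)

# Hypothetical new flowers: [sepal length, sepal width, petal length, petal width] in cm
new_flowers = np.array([[5.0, 3.5, 1.4, 0.2],
                        [6.4, 3.0, 5.3, 2.1]])
pred = knn.predict(new_flowers)
print(iris.target_names[pred])   # label numbers mapped back to species names
```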

k-NN algorithm
Let's find out the accuracy of the classifier
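Accuracy is the fraction of test samples whose predicted label matches the true label. The sketch below computes it with sklearn.metrics.accuracy_score and also prints a confusion matrix, which is an addition not shown in the source; the split and k are the same illustrative assumptions as before.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=4)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Accuracy: fraction of test flowers whose predicted label matches the true label
print("accuracy:", accuracy_score(y_test, y_pred))
# Rows are true classes, columns are predicted classes
print(confusion_matrix(y_test, y_pred))
```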
14.4 Overfitting and underfitting

Overfitting and underfitting

• Underfitting: poor performance on both the training data and new data.
• Overfitting: excellent performance on the training data, but poor performance on new data.

Underfitting and overfitting
1) What causes underfitting and overfitting?
2) What techniques reduce underfitting and overfitting?

[Figure: underfit, good fit, overfit]

(https://www.geeksforgeeks.org/underfitting-and-overfitting-in-machine-learning)
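One common way to see underfitting and overfitting is to fit polynomials of increasing degree to noisy data and compare the error on the training data with the error on new data. This is a sketch on synthetic data (a noisy sine curve, entirely made up); the exact errors depend on the random seed. Typically the degree-1 model has high error on both sets (underfitting), while the degree-9 model has the lowest training error but can do worse on new data (overfitting).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a noisy sine curve (made-up for illustration)
x_train = np.sort(rng.uniform(0, 1, 20))
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, x_train.size)
x_new = np.sort(rng.uniform(0, 1, 20))                       # "new data"
y_new = np.sin(2 * np.pi * x_new) + rng.normal(0, 0.2, x_new.size)

def mse(y_hat, y):
    return np.mean((y_hat - y) ** 2)

train_err, new_err = {}, {}
for degree in (1, 3, 9):   # too simple, about right, very flexible
    coeffs = np.polyfit(x_train, y_train, degree)   # least-squares polynomial fit
    train_err[degree] = mse(np.polyval(coeffs, x_train), y_train)
    new_err[degree] = mse(np.polyval(coeffs, x_new), y_new)
    print(f"degree {degree}: training MSE {train_err[degree]:.3f}, "
          f"new-data MSE {new_err[degree]:.3f}")
```

Because each higher-degree polynomial can represent every lower-degree one, its training error can only stay the same or drop; the error on new data is what reveals overfitting.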
