
AI14 - MachineLearning

Chapter 14 covers basic concepts of machine learning, including supervised and unsupervised learning techniques, with a focus on linear regression and the k-NN algorithm. It explains how machine learning allows computers to learn from data to make predictions and decisions, and discusses issues like overfitting and underfitting. The chapter also includes practical examples using Python libraries like Scikit-learn.

Chapter 14. Basic Concepts of Machine Learning
Table of contents
• 14.1 Introduction
• 14.2 Linear regression
• 14.3 k-NN algorithm
• 14.4 Overfitting and underfitting
14.1 Introduction

Big Data Analytics & Artificial Intelligence Starting with Python
• Rule-based method: A programmer writes explicit rules that tell the computer exactly what to do.
• Machine learning method: The computer learns from data on its own in order to solve a problem.
For example, a program like "AlphaGo" is given only the rules of Go and the records (notation) of previous games. From these, the computer learns the principles of Go on its own and plays the game.

[Figure: Go has many game rules, so machine learning is applied. AlphaGo is the computer program that beat Lee Sedol.]


• Machine learning is a field of artificial intelligence that studies how to give computers the ability to learn.
• Machine learning evolved from pattern recognition and computational learning theory. It makes computers learn how to make decisions by looking at given data.
• The performance of a decision-making algorithm generally improves as more training data becomes available.
• Unlike algorithms composed of commands that always perform predetermined operations, learning algorithms make predictions and decisions based on data.
• Machine learning is mainly used for problems where it is difficult to specify a solution method explicitly. For example, it is actively used in spam mail filtering, automatic detection of network intrusions, computer vision, and autonomous driving.

• Types of machine learning techniques: Machine learning is generally divided into supervised learning and unsupervised learning, depending on whether there is a "teacher" who provides the correct answers. Reinforcement learning is a third type.
Machine Learning
– Supervised learning: regression analysis, classification, decision tree, random forest
– Unsupervised learning: clustering, dimensionality reduction
– Reinforcement learning

• Supervised learning: The computer is given examples together with the correct answers (labels) provided by a "teacher". The goal of supervised learning is to learn a general rule that maps inputs to outputs.
[Figure: Labeling. Each data item is given a label such as "cat", and the computer learns from the data and the labels.]

• Unsupervised learning: No correct answers are given. Clustering is the representative example: the data may, for instance, fall into two large groups, and the computer learns the rule for grouping by itself just by looking at the data.

• Reinforcement learning: The learning data is given in the form of rewards and punishments. Only feedback on the program's behavior is provided in a dynamic environment, such as driving a vehicle or playing against an opponent.
14.2 Linear regression


Supervised learning: learning from problems given together with their correct answers
• After learning from given input-output pairs, supervised learning predicts a reasonable output value when a new input value comes in.
• In other words, when inputs (x) and outputs (y) are given, supervised learning learns a mapping function f(x) from input to output.
• Suppose we are given the points (1, 10), (2, 20), (3, 30), and (4, 40) as input data in the form (x, y). The computer does not yet know that y can be expressed in terms of x by the equation y = 10x. We want the computer to learn from the 4 given data points so that, after learning is finished, it answers 50 when x = 5 is input.
• In supervised learning, the computer itself finds the function that best explains the relationship between the given inputs and outputs. Among supervised learning problems, this kind of problem is called regression analysis.
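The y = 10x example can be tried directly. The following is a minimal sketch, assuming Scikit-learn's LinearRegression is used as the learner; the variable names are illustrative and not from the slides.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# The four (x, y) points from the slide; x is a 2D array with one row per sample.
X = np.array([[1], [2], [3], [4]])
y = np.array([10, 20, 30, 40])

# Learn the mapping from x to y without being told that y = 10x.
model = LinearRegression()
model.fit(X, y)

# Ask the trained model for the output at the new input x = 5.
prediction = model.predict(np.array([[5]]))
print(prediction[0])  # approximately 50
```

The model is never given the equation; it recovers the slope from the four points alone.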
Find a function that describes the data well: a regression problem
[Figure: three panels titled Linear regression, Nonlinear regression, and Classification.]
• Regression is the problem of finding the straight line or curve that best describes the data, usually after plotting the data in a multidimensional space.
• In other words, a regression technique estimates the function f(x) in y = f(x) by looking at the inputs x and the outputs y.
• Scikit-learn
– A library for machine learning.
– It includes classification, regression, and clustering algorithms such as linear regression, the k-NN algorithm, support vector machines, random forests, gradient boosting, and k-means, which makes it a popular tool for those new to machine learning.
• How to install Scikit-learn in Anaconda:
pip install numpy==1.19.2 scipy==1.9.3 scikit-learn
The Simplest Regression: Linear Regression
• Linear regression is a technique for modeling the relationship between a variable x and another variable y that depends on it as a straight line, y = mx + b (x is the feature of the data, m is the slope, b is the intercept).
• Here there are only two variables (x and y), so the model is a straight line. In three or more dimensions it is a hyperplane.
• When there are more than two variables, it is called multiple linear regression.
Let's implement linear regression with the Scikit-Learn library
• Suppose we are examining the height and weight of students in a classroom.
• Suppose that, in general, taller students weigh more.
• Height and weight were measured for several students.
• Student A, whose height is accurately known but whose weight is unknown, is absent from school.
• Can we estimate student A's weight?
• If we could create a formula that quantifies the correlation between height and weight, we would be able to estimate the weight of student A.
• Four students were randomly selected and their heights and weights were measured: the heights were 164, 179, 162, and 170 cm, and the weights were 53, 63, 55, and 59 kg, respectively.
[Figure: scatter plot of weight against height. Linear regression finds the linear equation that best describes this distribution. Note that the input values must be arranged as a 2D array.]
Caution: The input values are the students' heights: 164, 179, 162, and 170.
-> The input of linear regression must be given as a multi-dimensional (2D) array, with one row per sample.
• X: the input values (heights). y: the target values (weights).
• LinearRegression() is the function that creates a linear regression model. In other words, it is a model generator.
• fit(X, y) trains the model so that the input X best predicts the target value y.
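The code on this slide was lost in extraction. A minimal sketch of what it likely looked like, assuming NumPy arrays and the variable name regr that appears on later slides:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Heights in cm: a plain list is 1D, so reshape it to 2D, shape (4, 1).
heights = [164, 179, 162, 170]
X = np.array(heights).reshape(-1, 1)

# Target values: weights in kg, in the same order as the heights.
y = np.array([53, 63, 55, 59])

# Create the model (the "model generator") and train it.
regr = LinearRegression()
regr.fit(X, y)

print(X.shape)  # (4, 1)
```

Passing the 1D list of heights directly to fit() raises an error, which is why the reshape step is needed.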
Check and predict the results of linear regression learning
If you want to check the slope and intercept of the fitted straight line, check the attributes coef_ and intercept_. To see how well the model predicts y for the input X, use the score() function, which returns the coefficient of determination R².
How well the model predicts the data: about 0.90 (about 90 points out of 100)
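A minimal sketch of this check on the four-student data, assuming the model is named regr as on the following slides:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([164, 179, 162, 170]).reshape(-1, 1)  # heights in cm
y = np.array([53, 63, 55, 59])                     # weights in kg

regr = LinearRegression()
regr.fit(X, y)

slope = regr.coef_[0]         # m in y = mx + b
intercept = regr.intercept_   # b in y = mx + b
r2 = regr.score(X, y)         # coefficient of determination on the training data

print("slope:", round(slope, 3))
print("intercept:", round(intercept, 2))
print("score:", round(r2, 3))  # about 0.90
```

Note that coef_ is an array with one entry per input feature, while intercept_ is a single number for this one-target problem.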
Now, for students with a height of 180 or 185 cm, we would like to find out what weight the linear regression model regr we created predicts. To do this, prepare the input data.
Use the predict() function of the linear regression model regr. Pass the students' heights to this function, and it returns the weights estimated by the model:
Height 180 -> regr.predict() -> 63.71
Height 185 -> regr.predict() -> 66.47
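The prediction step can be sketched as follows. The name new_heights is an illustrative assumption; the data and the model name regr are from the slides.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([164, 179, 162, 170]).reshape(-1, 1)  # heights in cm
y = np.array([53, 63, 55, 59])                     # weights in kg
regr = LinearRegression().fit(X, y)

# The new inputs must also be a 2D array: one row per student.
new_heights = np.array([[180], [185]])
predicted_weights = regr.predict(new_heights)

for height, weight in zip(new_heights[:, 0], predicted_weights):
    print(height, round(weight, 2))  # about 63.71 and 66.47, as on the slide
```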
Use the Matplotlib library to graph this linear regression
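The plotting code was an image in the original slide. The following is one possible sketch, assuming an off-screen backend and a hypothetical output file name (height_weight_regression.png); the colors and labels are also illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: draw off-screen; remove this line to use a window
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([164, 179, 162, 170]).reshape(-1, 1)  # heights in cm
y = np.array([53, 63, 55, 59])                     # weights in kg
regr = LinearRegression().fit(X, y)

# Evaluate the fitted line on an evenly spaced range of heights.
x_line = np.linspace(160, 185, 50).reshape(-1, 1)
y_line = regr.predict(x_line)

fig, ax = plt.subplots()
ax.scatter(X[:, 0], y, color="black", label="measured students")
ax.plot(x_line[:, 0], y_line, color="blue", label="fitted line")
ax.set_xlabel("Height (cm)")
ax.set_ylabel("Weight (kg)")
ax.legend()
fig.savefig("height_weight_regression.png")  # hypothetical file name
```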


Even if the height is similar, the weights of men and women will differ, so sex is added as a second input feature of the linear regression: man (0), woman (1).

              Student 1  Student 2  Student 3  Student 4  Student 5  Student 6  Student 7  Student 8
Height (cm)   164        167        165        170        179        163        159        166
Sex           1          1          0          0          0          1          0          1
Weight (kg)   43         48         47         66         67         50         52         44

The feature value of student 4 is [170, 0].
Question: Use a linear regression model to predict the weights of students with the features [166, 0] and [170, 1].
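One way to approach the question is sketched below, assuming the same LinearRegression workflow as before. The variable names are illustrative, and this is not necessarily the solution intended by the slides.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Each row is [height in cm, sex], with man = 0 and woman = 1 (from the table).
X = np.array([
    [164, 1], [167, 1], [165, 0], [170, 0],
    [179, 0], [163, 1], [159, 0], [166, 1],
])
y = np.array([43, 48, 47, 66, 67, 50, 52, 44])  # weights in kg

regr = LinearRegression()
regr.fit(X, y)

# The two students from the question: a 166 cm man and a 170 cm woman.
questions = np.array([[166, 0], [170, 1]])
predicted = regr.predict(questions)
print(predicted)

# One coefficient per feature: [height, sex].
print(regr.coef_)
```

With two input features this is a multiple linear regression, so coef_ holds two values instead of one.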
Diabetes examples: diabetes dataset
• The sklearn library includes a diabetes dataset.
• This dataset has more samples and more features than the examples above.
• The dataset object includes data (used as the input), target (used as the correct answers for learning), and feature_names (the names of the input features).
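A short sketch of loading the dataset and inspecting these three attributes, using Scikit-learn's load_diabetes():

```python
from sklearn.datasets import load_diabetes

diabetes = load_diabetes()

# data: the input features, one row per patient.
print(diabetes.data.shape)     # (442, 10)
# target: the value to predict for each patient.
print(diabetes.target.shape)   # (442,)
# feature_names: the names of the 10 input columns.
print(diabetes.feature_names)
```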
Diabetes examples: diabetes dataset
• Of the 10 features, extract only the third one, which corresponds to the body mass index (bmi), and use it as the input of a linear regression model regr like the one used earlier.
• The data used as the input of the function must be a two-dimensional array, so increase the dimension of the array using np.newaxis.
• Now this data X can be used as the input of a linear regression model.
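The extraction step might look like the sketch below. The intermediate name bmi_1d is an illustrative assumption.

```python
import numpy as np
from sklearn.datasets import load_diabetes

diabetes = load_diabetes()

# Column index 2 is the third feature, bmi. Plain indexing gives a 1D array.
bmi_1d = diabetes.data[:, 2]
print(bmi_1d.shape)  # (442,)

# np.newaxis inserts a new axis, turning the 1D array into a 2D column.
X = diabetes.data[:, np.newaxis, 2]
print(X.shape)       # (442, 1)
```

The same result can be obtained with bmi_1d.reshape(-1, 1); np.newaxis is simply the approach named on the slide.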
Diabetes examples: What is the correlation between the body mass index and the diabetes level?
• Train a linear regression model on the bmi data.
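A minimal sketch of this learning step, assuming the model is trained on all 442 samples at this point (the split into training and test data comes on the next slides):

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression

diabetes = load_diabetes()
X = diabetes.data[:, np.newaxis, 2]  # bmi only, shape (442, 1)
y = diabetes.target

regr = LinearRegression()
regr.fit(X, y)

# A positive slope means that a higher bmi goes with a higher diabetes level.
print("slope:", regr.coef_[0])
print("intercept:", regr.intercept_)
print("score:", regr.score(X, y))
```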


Diabetes examples: Separate the diabetes example into training and test data
Let's see how accurate this model is on the 20% of the data held out as test data (new data).
• load_diabetes() loads the diabetes dataset.
• train_test_split() divides it into learning (training) data, X_train and y_train, and test data, X_test and y_test.
• regr.fit() trains the linear regression model on the learning data.
• regr.predict() is applied to the test data for the final performance evaluation (model accuracy).
• Only 80% of the total 442 samples are used for learning (training).
• The remaining 20% is used for testing.
• The linear regression model is trained on the learning data X_train and y_train (using only the bmi feature).
• The scores on the training data and the test data are about 0.35 and 0.31 (35 and 31 points), respectively.
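The split and evaluation can be sketched as follows. The value random_state=0 is an assumption made only for reproducibility; the slides do not say which split was used, so the scores printed here may differ from 0.35 and 0.31.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

diabetes = load_diabetes()
X = diabetes.data[:, np.newaxis, 2]  # bmi only
y = diabetes.target

# Hold out 20% of the samples as test data (random_state is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

regr = LinearRegression()
regr.fit(X_train, y_train)  # learn from the training data only

print("training samples:", len(X_train))
print("test samples:", len(X_test))
print("training score:", regr.score(X_train, y_train))
print("test score:", regr.score(X_test, y_test))
```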
Diabetes examples: Use all the features in the dataset for linear regression
[Figure: scatter plot comparing the predicted value with the actual value.]
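A sketch of the same workflow with all 10 features, again assuming test_size=0.2 and random_state=0 (the original split is not given in the slides):

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

diabetes = load_diabetes()

# Use every column of data as input; it is already a 2D array.
X_train, X_test, y_train, y_test = train_test_split(
    diabetes.data, diabetes.target, test_size=0.2, random_state=0
)

regr = LinearRegression()
regr.fit(X_train, y_train)

# Predicted values for the test data, to compare with the actual y_test.
y_pred = regr.predict(X_test)

print("training score:", regr.score(X_train, y_train))
print("test score:", regr.score(X_test, y_test))
print("first predictions:", y_pred[:3])
print("first actual values:", y_test[:3])
```

Plotting y_test against y_pred as a scatter plot gives a figure like the one on the slide.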
Diabetes examples: Mean squared error (MSE)
There are various methods of calculating the error between y_pred and y_test. One of them is the mean squared error (MSE):

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(H(X_i) - y_i\right)^2$$

where N is the number of elements, y_i is the i-th y_test value, and H(X_i) is the y_pred value estimated by the linear regression model, corresponding to y_i.
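The formula can be checked by computing it by hand with NumPy and comparing the result with Scikit-learn's mean_squared_error(). This is a sketch under the same assumed split as above (test_size=0.2, random_state=0).

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

diabetes = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(
    diabetes.data, diabetes.target, test_size=0.2, random_state=0
)
regr = LinearRegression().fit(X_train, y_train)
y_pred = regr.predict(X_test)  # H(X_i) for every test sample

# MSE written out as in the formula: average of the squared differences.
N = len(y_test)
mse_manual = np.sum((y_pred - y_test) ** 2) / N

# The same quantity computed by the library function.
mse_sklearn = mean_squared_error(y_test, y_pred)

print(mse_manual, mse_sklearn)
```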
Linear regression

Diabetes examples: Python program


Linear regression

Diabetes examples: Python program: Results


14.3 k-NN algorithm

The problem of classifying Dachshund and Samoyed dogs
Assume a simple case where the feature space of the data consists of two features, length and height, and each dog is displayed as a point in this feature space.
[Figure: Samoyed and Dachshund dogs plotted with length on the horizontal axis and height on the vertical axis.]
• The Samoyed has a high height value compared to its length, and the Dachshund has a low height value compared to its length.
• If you classify the new item by looking at its 3 nearest neighbors, it belongs to class B, but if you classify it by looking at its 5 nearest neighbors, it belongs to class A.
When k = 3: 2 class B, 1 class A
When k = 5: 2 class B, 3 class A
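The voting rule can be written out in a few lines. The points below are hypothetical coordinates chosen only so that the votes match the slide (2 B and 1 A for k = 3, 2 B and 3 A for k = 5); they are not taken from the original figure.

```python
from collections import Counter

import numpy as np

# Hypothetical labeled points around the query point (0, 0).
points = np.array([
    [1.0, 0.0], [0.0, 1.2],              # two class B points, closest
    [1.5, 0.0], [0.0, 2.0], [2.2, 0.0],  # three class A points, a bit farther
    [5.0, 5.0],                          # one distant class B point
])
labels = ["B", "B", "A", "A", "A", "B"]

def knn_predict(query, k):
    """Return the majority class among the k points nearest to query."""
    distances = np.linalg.norm(points - np.asarray(query), axis=1)
    nearest = np.argsort(distances)[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict([0.0, 0.0], k=3))  # B: the votes are 2 B, 1 A
print(knn_predict([0.0, 0.0], k=5))  # A: the votes are 2 B, 3 A
```

The same query point gets a different class depending on k, which is why the choice of k matters.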
Get ready to classify the beautiful irises
• Features: sepal length, sepal width, petal length, petal width
• You can also see that the labels are encoded as 0, 1, 2:
Setosa: 0
Versicolor: 1
Virginica: 2
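A short sketch of loading the iris data with Scikit-learn's load_iris() and checking the features and the label encoding:

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

print(iris.data.shape)        # (150, 4): 150 flowers, 4 features each
print(iris.feature_names)     # sepal and petal length and width, in cm
print(iris.target_names)      # setosa, versicolor, virginica
print(np.unique(iris.target)) # the labels are encoded as 0, 1, 2
```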
Apply the k-NN algorithm
• Use 80% of the total data as training data to train the model.
• Use the remaining 20% as test data to make sure the model predicts well.
• Training and testing use KNeighborsClassifier.
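The code on this slide was lost. A minimal sketch follows, assuming k = 5 neighbors and random_state=0; neither value is given in the slides.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()

# 80% training data, 20% test data (random_state is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0
)

# n_neighbors is the k in k-NN; 5 is an assumed value.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# Fraction of test flowers whose species is predicted correctly.
test_accuracy = knn.score(X_test, y_test)
print("test accuracy:", test_accuracy)
```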
Let's classify new flowers by applying the model
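A sketch of classifying new flowers. The two measurement rows are hypothetical inputs chosen for illustration, and the model here is trained on all 150 flowers with an assumed k = 5.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
knn = KNeighborsClassifier(n_neighbors=5)  # k = 5 is an assumption
knn.fit(iris.data, iris.target)

# Hypothetical new flowers: [sepal length, sepal width, petal length, petal width] in cm.
new_flowers = np.array([
    [5.1, 3.5, 1.4, 0.2],  # short petals
    [6.7, 3.0, 5.2, 2.3],  # long petals
])

predicted_labels = knn.predict(new_flowers)            # encoded as 0, 1, 2
predicted_names = iris.target_names[predicted_labels]  # decoded species names
print(predicted_labels, predicted_names)
```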


Let's find out the accuracy of the classifier
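The three accuracy slides contained only code screenshots. One common way to measure accuracy is sketched below, assuming accuracy_score() and confusion_matrix() from sklearn.metrics, with the same assumed k = 5 and random_state=0 as before.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0
)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Accuracy: the fraction of test flowers classified correctly.
accuracy = accuracy_score(y_test, y_pred)

# Confusion matrix: rows are actual classes, columns are predicted classes.
matrix = confusion_matrix(y_test, y_pred)

print("accuracy:", accuracy)
print(matrix)
```

The diagonal of the confusion matrix counts the correct predictions for each species.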
14.4 Overfitting and underfitting

• Overfitting: performance is excellent on the trained data, but performance on new data is poor.
• Underfitting: performance is poor on both the trained data and new data.

Underfitting and overfitting
1) Reasons for underfitting and overfitting?
2) Techniques to reduce underfitting and overfitting?

[Figure: Underfit, Good fit, Overfit]
(https://www.geeksforgeeks.org/underfitting-and-overfitting-in-machine-learning)
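The difference can be observed by fitting polynomials of different degrees to a small noisy sample. This is an illustrative sketch, not from the slides: the sine-shaped data, the noise level, the sample sizes, and the degrees 1, 3, and 9 are all assumptions.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

def make_data(n):
    """Noisy samples of a sine curve (a hypothetical data source)."""
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, n)
    return x, y

x_train, y_train = make_data(15)   # small training set
x_test, y_test = make_data(100)    # new data the model has not seen

def mse(model, x, y):
    """Mean squared error of model on the points (x, y)."""
    return float(np.mean((model(x) - y) ** 2))

results = {}
for degree in (1, 3, 9):
    # Least-squares polynomial fit of the given degree.
    model = Polynomial.fit(x_train, y_train, degree)
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    print(degree, results[degree])
```

A degree that is too low tends to give a high error on both sets (underfit), while a very high degree drives the training error down without necessarily improving the error on new data (overfit).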
