Modul 7 Praktikum Machine Learning Python

Machine learning involves analyzing data to make predictions. It works by studying large datasets to learn patterns and predict outcomes. Key steps include understanding data types, calculating statistical values like the mean, median, and mode to summarize data, and visualizing data distributions using histograms. These techniques help machine learning models learn from data to make informed predictions.

Uploaded by

Anandita Putri

Machine Learning

Machine Learning is making the computer learn from studying data and statistics.

Machine Learning is a step in the direction of artificial intelligence (AI).

Machine Learning is a program that analyses data and learns to predict the outcome.

Where To Start?
In this tutorial we will go back to mathematics and study statistics, and how to calculate
important numbers based on data sets.

We will also learn how to use various Python modules to get the answers we need.

And we will learn how to make functions that are able to predict the outcome based on what
we have learned.

Data Set
In the mind of a computer, a data set is any collection of data. It can be anything from an
array to a complete database.

Example of an array:

[99,86,87,88,111,86,103,87,94,78,77,85,86]

By looking at the array, we can guess that the average value is probably around 80 or 90, and
we are also able to determine the highest value and the lowest value, but what else can we
do?

And by looking at the database we can see that the most popular color is white, and the oldest
car is 17 years, but what if we could predict if a car had an AutoPass, just by looking at the
other values?

That is what Machine Learning is for! Analyzing data and predicting the outcome!

In Machine Learning it is common to work with very large data sets. In this tutorial we will
try to make it as easy as possible to understand the different concepts of machine learning,
and we will work with small easy-to-understand data sets.

Data Types
To analyze data, it is important to know what type of data we are dealing with.

We can split the data types into three main categories:

Maman Somantri – Modul Praktikum Algoritma & Pemrograman 1

• Numerical
• Categorical
• Ordinal

Numerical data are numbers, and can be split into two numerical categories:

• Discrete Data
- counted values that are limited to integers. Example: The number of cars passing by.
• Continuous Data
- measured values that can take any value within a range. Example: The price of an item, or
the size of an item.

Categorical data are values that cannot be measured up against each other. Example: a color
value, or any yes/no values.

Ordinal data are like categorical data, but can be measured up against each other. Example:
school grades where A is better than B and so on.

By knowing the data type of your data source, you will be able to know what technique to use
when analyzing them.

You will learn more about statistics and analyzing data in the next chapters.

Mean, Median, and Mode

What can we learn from looking at a group of numbers?

In Machine Learning (and in mathematics) there are often three values that interest us:

• Mean - The average value

• Median - The mid point value
• Mode - The most common value

Example: We have registered the speed of 13 cars:

speed = [99,86,87,88,111,86,103,87,94,78,77,85,86]

What is the average, the middle, or the most common speed value?

Mean
The mean value is the average value.

To calculate the mean, find the sum of all values, and divide the sum by the number of
values:

(99+86+87+88+111+86+103+87+94+78+77+85+86) / 13 = 89.77

The NumPy module has a method for this. Learn about the NumPy module in our NumPy
Tutorial.

Example

Use the NumPy mean() method to find the average speed:
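
The call could look like the following minimal sketch. It assumes NumPy is installed; the variable name average_speed is only illustrative.

```python
import numpy

speed = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

# numpy.mean() adds up all values and divides by how many there are
average_speed = numpy.mean(speed)

print(average_speed)
```

Rounded to two decimals, the printed value matches the hand calculation above.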

Median
The median value is the value in the middle, after you have sorted all the values:

77, 78, 85, 86, 86, 86, 87, 87, 88, 94, 99, 103, 111

It is important that the numbers are sorted before you can find the median.

The NumPy module has a method for this:

Example

Use the NumPy median() method to find the middle value:
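
A minimal sketch of that call, assuming NumPy is installed (middle_speed is an illustrative name). Note that numpy.median() sorts the values internally, so the list can be passed unsorted.

```python
import numpy

speed = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

# The 13 values are sorted internally; the 7th one is returned
middle_speed = numpy.median(speed)

print(middle_speed)
```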

If there are two numbers in the middle, divide the sum of those numbers by two.

77, 78, 85, 86, 86, 86, 87, 87, 94, 98, 99, 103

(86 + 87) / 2 = 86.5

Example

Using the NumPy module:
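
As a sketch, the same numpy.median() call can be applied to the 12 values listed above (the variable names are illustrative):

```python
import numpy

speed = [77, 78, 85, 86, 86, 86, 87, 87, 94, 98, 99, 103]

# With an even number of values, the two middle values are averaged
middle_speed = numpy.median(speed)

print(middle_speed)
```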

Mode
The mode is the value that appears most often:

99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86

Here the mode is 86, which appears three times.

The SciPy module has a method for this. Learn about the SciPy module in our SciPy Tutorial.

Example

Use the SciPy mode() method to find the number that appears the most:
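
A minimal sketch, assuming SciPy is installed. The returned object has a mode field and a count field; depending on the SciPy version, these are reported either as single numbers or as one-element arrays.

```python
from scipy import stats

speed = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

# stats.mode() reports the most common value and how often it occurs
result = stats.mode(speed)

print(result)
```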

The Mean, Median, and Mode are techniques that are often used in Machine Learning, so it is
important to understand the concept behind them.

Standard Deviation
Standard deviation is a number that describes how spread out the values are.

A low standard deviation means that most of the numbers are close to the mean (average)
value.

A high standard deviation means that the values are spread out over a wider range.

Example: This time we have registered the speed of 7 cars:

speed = [86,87,88,86,87,85,86]

The standard deviation is:

0.9

Meaning that most of the values are within the range of 0.9 from the mean value, which is
86.4.

Let us do the same with a selection of numbers with a wider range:

speed = [32,111,138,28,59,77,97]

The standard deviation is:

37.85

Meaning that most of the values are within the range of 37.85 from the mean value, which is
77.4.

As you can see, a higher standard deviation indicates that the values are spread out over a
wider range.

The NumPy module has a method to calculate the standard deviation:

Example

Use the NumPy std() method to find the standard deviation:
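
The sketch below runs numpy.std() on both speed lists from this chapter. The names narrow and wide are illustrative. By default numpy.std() divides by the number of values (the population standard deviation), which is the definition used in this tutorial.

```python
import numpy

narrow = [86, 87, 88, 86, 87, 85, 86]
wide = [32, 111, 138, 28, 59, 77, 97]

# A small result means the values sit close to the mean
narrow_std = numpy.std(narrow)
wide_std = numpy.std(wide)

print(narrow_std)
print(wide_std)
```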

Variance
Variance is another number that indicates how spread out the values are.

In fact, if you take the square root of the variance, you get the standard deviation!

Or the other way around, if you multiply the standard deviation by itself, you get the
variance!

To calculate the variance you have to do as follows:

1. Find the mean:

(32+111+138+28+59+77+97) / 7 = 77.4

2. For each value: find the difference from the mean:

32 - 77.4 = -45.4
111 - 77.4 = 33.6
138 - 77.4 = 60.6
28 - 77.4 = -49.4
59 - 77.4 = -18.4
77 - 77.4 = -0.4
97 - 77.4 = 19.6

3. For each difference: find the square value:

(-45.4)² = 2061.16
(33.6)² = 1128.96
(60.6)² = 3672.36
(-49.4)² = 2440.36
(-18.4)² = 338.56
(-0.4)² = 0.16
(19.6)² = 384.16

4. The variance is the average of these squared differences:

(2061.16+1128.96+3672.36+2440.36+338.56+0.16+384.16) / 7 ≈ 1432.25

Luckily, NumPy has a method to calculate the variance:

Example

Use the NumPy var() method to find the variance:
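
The sketch below repeats the four steps in plain Python and compares the result with numpy.var(). The variable names are illustrative. Because the code uses the unrounded mean instead of 77.4, it prints about 1432.24, slightly different from the hand calculation.

```python
import numpy

speed = [32, 111, 138, 28, 59, 77, 97]

# Steps 1 to 4 by hand: mean, differences, squares, average of the squares
mean = sum(speed) / len(speed)
squared_diffs = [(value - mean) ** 2 for value in speed]
manual_variance = sum(squared_diffs) / len(speed)

# The same number in one call
numpy_variance = numpy.var(speed)

print(manual_variance)
print(numpy_variance)
```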

As we have learned, the standard deviation is the square root of the variance:

√1432.25 = 37.85

Or, as in the example from before, use NumPy to calculate the standard deviation:

Example

Use the NumPy std() method to find the standard deviation:
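
A short sketch that also confirms the link between the two numbers: the result of numpy.std() should equal the square root of numpy.var(). The variable names are illustrative.

```python
import math
import numpy

speed = [32, 111, 138, 28, 59, 77, 97]

std_from_numpy = numpy.std(speed)
# Taking the square root of the variance should give the same number
std_from_variance = math.sqrt(numpy.var(speed))

print(std_from_numpy)
print(std_from_variance)
```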

Symbols

Standard Deviation is often represented by the symbol Sigma: σ

Variance is often represented by the symbol Sigma Squared: σ²

The Standard Deviation and Variance are terms that are often used in Machine Learning, so it
is important to understand how to get them, and the concept behind them.

Percentiles
Percentiles are used in statistics to give you a number that a given percentage of the values
fall at or below.

Example: Let's say we have an array of the ages of all the people that live in a street.

ages = [5,31,43,48,50,41,7,11,15,39,80,82,32,2,8,6,25,36,27,61,31]

What is the 75th percentile? The answer is 43, meaning that 75% of the people are 43 or
younger.

The NumPy module has a method for finding the specified percentile:

Example

Use the NumPy percentile() method to find the percentiles:
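
A minimal sketch, assuming NumPy is installed (age_75 is an illustrative name). The second argument is the percentage, given as a number between 0 and 100.

```python
import numpy

ages = [5, 31, 43, 48, 50, 41, 7, 11, 15, 39, 80, 82, 32, 2, 8, 6, 25, 36, 27, 61, 31]

# 75 means "the value that 75% of the ages fall at or below"
age_75 = numpy.percentile(ages, 75)

print(age_75)
```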

Example

What is the age that 90% of the people are younger than?
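
The same call with 90 as the percentage answers this question. This is a sketch with an illustrative variable name:

```python
import numpy

ages = [5, 31, 43, 48, 50, 41, 7, 11, 15, 39, 80, 82, 32, 2, 8, 6, 25, 36, 27, 61, 31]

age_90 = numpy.percentile(ages, 90)

print(age_90)
```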

Data Distribution
Earlier in this tutorial we have worked with very small amounts of data in our examples, just
to understand the different concepts.

In the real world, the data sets are much bigger, but it can be difficult to gather real world
data, at least at an early stage of a project.

How Can we Get Big Data Sets?

To create big data sets for testing, we use the Python module NumPy, which comes with a
number of methods to create random data sets, of any size.

Example

Create an array containing 250 random floats between 0 and 5:
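
One way to do this is sketched below. It assumes that numpy.random.uniform() is the routine meant here; it draws values evenly spread between a lower and an upper limit.

```python
import numpy

# 250 random floats, each at least 0.0 and below 5.0
x = numpy.random.uniform(0.0, 5.0, 250)

print(x)
```

The numbers are different on every run, because no seed is set.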

Histogram
To visualize the data set we can draw a histogram with the data we collected.

We will use the Python module Matplotlib to draw a histogram.

Learn about the Matplotlib module in our Matplotlib Tutorial.

Example

Draw a histogram:
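
A sketch of such a histogram, assuming NumPy and Matplotlib are installed. The range=(0, 5) argument is an addition made here so that the five bars cover exactly the intervals 0 to 1, 1 to 2, and so on; without it, the bars span from the smallest to the largest value in the array.

```python
import numpy
import matplotlib.pyplot as plt

x = numpy.random.uniform(0.0, 5.0, 250)

# 5 bars of equal width between 0 and 5; counts holds the height of each bar
counts, edges, _ = plt.hist(x, bins=5, range=(0, 5))
plt.show()

print(counts)
```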

Histogram Explained

We use the array from the example above to draw a histogram with 5 bars.

The first bar represents how many values in the array are between 0 and 1.

The second bar represents how many values are between 1 and 2.

Etc.

Which gives us this result:

• 52 values are between 0 and 1

• 48 values are between 1 and 2
• 49 values are between 2 and 3
• 51 values are between 3 and 4
• 50 values are between 4 and 5

Note: The array values are random numbers and will not show the exact same result on your
computer.

Big Data Distributions

An array containing 250 values is not considered very big, but now you know how to create a
random set of values, and by changing the parameters, you can create the data set as big as
you want.

Example

Create an array with 100000 random numbers, and display them using a histogram with 100
bars:
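
A sketch under the same assumptions as before (numpy.random.uniform() as the generator, values between 0 and 5):

```python
import numpy
import matplotlib.pyplot as plt

x = numpy.random.uniform(0.0, 5.0, 100000)

# With this many values, all 100 bars end up at roughly the same height
counts, edges, _ = plt.hist(x, 100)
plt.show()
```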

Normal Data Distribution
In the previous chapter we learned how to create a completely random array, of a given size,
and between two given values.

In this chapter we will learn how to create an array where the values are concentrated around
a given value.

In probability theory this kind of data distribution is known as the normal data distribution,
or the Gaussian data distribution, after the mathematician Carl Friedrich Gauss who came up
with the formula of this data distribution.

Example

A typical normal data distribution:
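
A sketch that uses numpy.random.normal() with a mean of 5.0 and a standard deviation of 1.0. The last lines, which count how many values lie within 1.0 of the mean, are an addition for illustration; for a normal distribution this share is about 68%.

```python
import numpy
import matplotlib.pyplot as plt

# 100000 values concentrated around 5.0 with a standard deviation of 1.0
x = numpy.random.normal(5.0, 1.0, 100000)

plt.hist(x, 100)
plt.show()

# Share of the values that are no further than 1.0 from the mean
share_within_one_std = numpy.mean(numpy.abs(x - 5.0) <= 1.0)
print(share_within_one_std)
```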

Note: A normal distribution graph is also known as the bell curve because of its
characteristic bell shape.

Histogram Explained

We use the array from the numpy.random.normal() method, with 100000 values, to draw a
histogram with 100 bars.

We specify that the mean value is 5.0, and the standard deviation is 1.0.

Meaning that the values should be concentrated around 5.0, with about 68% of them no further
away than 1.0 from the mean.

And as you can see from the histogram, most values are between 4.0 and 6.0, with a top at
approximately 5.0.

Scatter Plot
A scatter plot is a diagram where each value in the data set is represented by a dot.

The Matplotlib module has a method for drawing scatter plots. It needs two arrays of the
same length, one for the values of the x-axis, and one for the values of the y-axis:

x = [5,7,8,7,2,17,2,9,4,11,12,9,6]

y = [99,86,87,88,111,86,103,87,94,78,77,85,86]

The x array represents the age of each car.

The y array represents the speed of each car.

Example

Use the scatter() method to draw a scatter plot diagram:
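
A minimal sketch, assuming Matplotlib is installed:

```python
import matplotlib.pyplot as plt

x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6]             # age of each car
y = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]  # speed of each car

# One dot per car: age on the x-axis, speed on the y-axis
plt.scatter(x, y)
plt.show()
```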

Scatter Plot Explained

The x-axis represents ages, and the y-axis represents speeds.

What we can read from the diagram is that the two fastest cars were both 2 years old, and the
slowest car was 12 years old.

Note: It seems that the newer the car, the faster it drives, but that could be a coincidence.
After all, we only registered 13 cars.

Random Data Distributions

In Machine Learning the data sets can contain thousands, or even millions, of values.

You might not have real world data when you are testing an algorithm, so you might have to
use randomly generated values.

As we have learned in the previous chapter, the NumPy module can help us with that!

Let us create two arrays that are both filled with 1000 random numbers from a normal data
distribution.

The first array will have the mean set to 5.0 with a standard deviation of 1.0.

The second array will have the mean set to 10.0 with a standard deviation of 2.0:

Example

A scatter plot with 1000 dots:
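
A sketch of the two arrays described above, drawn with numpy.random.normal() and plotted against each other:

```python
import numpy
import matplotlib.pyplot as plt

x = numpy.random.normal(5.0, 1.0, 1000)   # mean 5.0, standard deviation 1.0
y = numpy.random.normal(10.0, 2.0, 1000)  # mean 10.0, standard deviation 2.0

plt.scatter(x, y)
plt.show()
```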

Scatter Plot Explained

We can see that the dots are concentrated around the value 5 on the x-axis, and 10 on the
y-axis.

We can also see that the spread is wider on the y-axis than on the x-axis.

Regression
The term regression is used when you try to find the relationship between variables.

In Machine Learning, and in statistical modeling, that relationship is used to predict the
outcome of future events.

Linear Regression
Linear regression uses the relationship between the data points to fit a straight line that
lies as close as possible to all of them.

This line can be used to predict future values.

In Machine Learning, predicting the future is very important.

How Does it Work?

Python has methods for finding a relationship between data points and for drawing a line of
linear regression. We will show you how to use these methods instead of going through the
mathematical formula.

In the example below, the x-axis represents age, and the y-axis represents speed. We have
registered the age and speed of 13 cars as they were passing a tollbooth. Let us see if the data
we collected could be used in a linear regression:

Example

Start by drawing a scatter plot:

Example

Import scipy and draw the line of Linear Regression:

Example Explained

Import the modules you need.

You can learn about the Matplotlib module in our Matplotlib Tutorial.

You can learn about the SciPy module in our SciPy Tutorial.

import matplotlib.pyplot as plt

from scipy import stats

Create the arrays that represent the values of the x and y axis:

x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]

Execute a method that returns some important key values of Linear Regression:

slope, intercept, r, p, std_err = stats.linregress(x, y)

Create a function that uses the slope and intercept values to return a new value. This new
value represents where on the y-axis the corresponding x value will be placed:

def myfunc(x):
    return slope * x + intercept

Run each value of the x array through the function. This will result in a new array with new
values for the y-axis:

mymodel = list(map(myfunc, x))

Draw the original scatter plot:

plt.scatter(x, y)

Draw the line of linear regression:

plt.plot(x, mymodel)

Display the diagram:

plt.show()

R for Relationship
It is important to know how strong the relationship between the values of the x-axis and the
values of the y-axis is. If there is no relationship, the linear regression cannot be used to
predict anything.

This relationship, the coefficient of correlation, is called r.

The r value ranges from -1 to 1, where 0 means no linear relationship, and 1 (and -1) means
100% related.

Python and the SciPy module will compute this value for you. All you have to do is feed it
with the x and y values.

Example

How well does my data fit in a linear regression?
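
A minimal sketch that reuses the age and speed arrays of the 13 cars. The r value is the third item returned by stats.linregress().

```python
from scipy import stats

x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6]
y = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

slope, intercept, r, p, std_err = stats.linregress(x, y)

# r is negative here because speed goes down as age goes up
print(r)
```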

Note: The result -0.76 shows that there is a relationship, not perfect, but it indicates that we
could use linear regression in future predictions.

Predict Future Values

Now we can use the information we have gathered to predict future values.

Example: Let us try to predict the speed of a 10-year-old car.

To do so, we need the same myfunc() function from the example above:

def myfunc(x):
    return slope * x + intercept

Example

Predict the speed of a 10-year-old car:
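
A sketch that puts the pieces together. It fits the line to the 13 cars and then evaluates it at an age of 10; the variable name speed is illustrative.

```python
from scipy import stats

x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6]
y = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

slope, intercept, r, p, std_err = stats.linregress(x, y)

def myfunc(x):
    # Point on the regression line for a given age
    return slope * x + intercept

speed = myfunc(10)

print(speed)
```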

The example predicted a speed of 85.6, which we could also read from the diagram:

Bad Fit?
Let us create an example where linear regression would not be the best method to predict
future values.

Example

These values for the x- and y-axis should result in a very bad fit for linear regression:

Example

You should get a very low r value.

The result: 0.013 indicates a very bad relationship, and tells us that this data set is not suitable
for linear regression.

Polynomial Regression
If your data points clearly will not fit a linear regression (a straight line that follows
the data points), it might be ideal for polynomial regression.

Polynomial regression, like linear regression, uses the relationship between the variables x
and y to find the best way to draw a line through the data points.

How Does it Work?
Python has methods for finding a relationship between data points and for drawing a line of
polynomial regression. We will show you how to use these methods instead of going through
the mathematical formula.

In the example below, we have registered 18 cars as they were passing a certain tollbooth.

We have registered the car's speed, and the time of day (hour) the passing occurred.

The x-axis represents the hours of the day and the y-axis represents the speed:

Example

Start by drawing a scatter plot:

Example

Import numpy and matplotlib then draw the line of Polynomial Regression:

Example Explained

Import the modules you need.

You can learn about the NumPy module in our NumPy Tutorial.

You can learn about the Matplotlib module in our Matplotlib Tutorial.

import numpy
import matplotlib.pyplot as plt

Create the arrays that represent the values of the x and y axis:

x = [1,2,3,5,6,7,8,9,10,12,13,14,15,16,18,19,21,22]
y = [100,90,80,60,60,55,60,65,70,70,75,76,78,79,90,99,99,100]

NumPy has a method that lets us make a polynomial model:

mymodel = numpy.poly1d(numpy.polyfit(x, y, 3))

Then specify how the line will display. We start at position 1, and end at position 22:

myline = numpy.linspace(1, 22, 100)

Draw the original scatter plot:

plt.scatter(x, y)

Draw the line of polynomial regression:

plt.plot(myline, mymodel(myline))

Display the diagram:

plt.show()

R-Squared
It is important to know how strong the relationship between the values of the x- and y-axis
is. If there is no relationship, the polynomial regression cannot be used to predict anything.

The relationship is measured with a value called the r-squared.

The r-squared value ranges from 0 to 1, where 0 means no relationship, and 1 means 100%
related.

Python and the sklearn module will compute this value for you. All you have to do is feed it
with the x and y arrays:

Example

How well does my data fit in a polynomial regression?
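
A sketch that uses r2_score() from sklearn.metrics, assuming scikit-learn is installed. It compares the real speeds with the speeds that the degree-3 model gives for the same hours; the name r_squared is illustrative.

```python
import numpy
from sklearn.metrics import r2_score

x = [1, 2, 3, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16, 18, 19, 21, 22]
y = [100, 90, 80, 60, 60, 55, 60, 65, 70, 70, 75, 76, 78, 79, 90, 99, 99, 100]

mymodel = numpy.poly1d(numpy.polyfit(x, y, 3))

# First argument: the real values; second argument: the model's values
r_squared = r2_score(y, mymodel(x))

print(r_squared)
```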

Note: The result 0.94 shows that there is a very good relationship, and we can use
polynomial regression in future predictions.

Predict Future Values

Now we can use the information we have gathered to predict future values.

Example: Let us try to predict the speed of a car that passes the tollbooth at around 17:00:

To do so, we need the same mymodel object from the example above:

mymodel = numpy.poly1d(numpy.polyfit(x, y, 3))

Example

Predict the speed of a car passing at 17:00:
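
A sketch of the prediction. The object returned by numpy.poly1d() can be called like a function, so the hour is simply passed in; the variable name speed is illustrative.

```python
import numpy

x = [1, 2, 3, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16, 18, 19, 21, 22]
y = [100, 90, 80, 60, 60, 55, 60, 65, 70, 70, 75, 76, 78, 79, 90, 99, 99, 100]

mymodel = numpy.poly1d(numpy.polyfit(x, y, 3))

# Evaluate the fitted polynomial at hour 17
speed = mymodel(17)

print(speed)
```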

The example predicted a speed of 88.87, which we could also read from the diagram:

Bad Fit?
Let us create an example where polynomial regression would not be the best method to
predict future values.

Example

These values for the x- and y-axis should result in a very bad fit for polynomial regression:

And the r-squared value?

Example

You should get a very low r-squared value.

The result: 0.00995 indicates a very bad relationship, and tells us that this data set is not
suitable for polynomial regression.

Multiple Regression
Multiple regression is like linear regression, but with more than one independent variable,
meaning that we try to predict a value based on two or more variables.

Take a look at the data set below. It contains some information about cars.

We can predict the CO2 emission of a car based on the size of the engine, but with multiple
regression we can throw in more variables, like the weight of the car, to make the prediction
more accurate.

How Does it Work?

In Python we have modules that will do the work for us. Start by importing the Pandas
module.

import pandas

Learn about the Pandas module in our Pandas Tutorial.

The Pandas module allows us to read csv files and return a DataFrame object.

The file is meant for testing purposes only. You can use the included cars.csv file.

df = pandas.read_csv("cars.csv")

Then make a list of the independent values and call this variable X.

Put the dependent values in a variable called y.

X = df[['Weight', 'Volume']]
y = df['CO2']

Tip: It is common to name the list of independent values with an uppercase X, and the list of
dependent values with a lowercase y.

We will use some methods from the sklearn module, so we will have to import that module
as well:

from sklearn import linear_model

From the sklearn module we will use the LinearRegression() method to create a linear
regression object.

This object has a method called fit() that takes the independent and dependent values as
parameters and fills the regression object with data that describes the relationship:

regr = linear_model.LinearRegression()
regr.fit(X, y)

Now we have a regression object that is ready to predict CO2 values based on a car's weight
and volume:

# predict the CO2 emission of a car where the weight is 2300kg,
# and the volume is 1300cm3:
predictedCO2 = regr.predict([[2300, 1300]])

Example

See the whole example in action:

We have predicted that a car with 1.3 liter engine, and a weight of 2300 kg, will release
approximately 107 grams of CO2 for every kilometer it drives.
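
The steps can be tried without cars.csv using the sketch below. Everything about the data is an assumption: the five cars are made up, and their CO2 values are generated from the invented rule CO2 = 80 + 0.0075 * Weight + 0.0078 * Volume. The printed prediction therefore comes from this stand-in data, not from the real file.

```python
import pandas
from sklearn import linear_model

# Hypothetical stand-in for cars.csv: five made-up cars
df = pandas.DataFrame({
    "Weight": [800, 1100, 950, 1400, 1700],
    "Volume": [1000, 1200, 1000, 1600, 2000],
})
# Invented rule, used only to have CO2 values to learn from
df["CO2"] = 80 + 0.0075 * df["Weight"] + 0.0078 * df["Volume"]

X = df[["Weight", "Volume"]]
y = df["CO2"]

regr = linear_model.LinearRegression()
regr.fit(X, y)

# Same column names as the training data, one row for the new car
new_car = pandas.DataFrame({"Weight": [2300], "Volume": [1300]})
predictedCO2 = regr.predict(new_car)

print(predictedCO2)
```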

Coefficient
The coefficient is a factor that describes the relationship with an unknown variable.

Example: if x is a variable, then 2x is x two times. x is the unknown variable, and the number
2 is the coefficient.

In this case, we can ask for the coefficient value of weight against CO2, and for volume
against CO2. The answer(s) we get tells us what would happen if we increase, or decrease,
one of the independent values.

Example

Print the coefficient values of the regression object:

Result Explained
The result array represents the coefficient values of weight and volume.

Weight: 0.00755095
Volume: 0.00780526

These values tell us that if the weight increases by 1 kg, the CO2 emission increases by
0.00755095 g.

And if the engine size (Volume) increases by 1 cm3, the CO2 emission increases by
0.00780526 g.

I think that is a fair guess, but let us test it!

We have already predicted that if a car with a 1300cm3 engine weighs 2300kg, the CO2
emission will be approximately 107g.

What if we increase the weight by 1000 kg?

Example

Copy the example from before, but change the weight from 2300 to 3300:

We have predicted that a car with 1.3 liter engine, and a weight of 3300 kg, will release
approximately 115 grams of CO2 for every kilometer it drives.

Which shows that the coefficient of 0.00755095 is correct:

107.2087328 + (1000 * 0.00755095) = 114.75968
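
The same kind of check can be sketched in code. The data below is a made-up stand-in for cars.csv (an assumption), generated with a weight coefficient of 0.0075, so the numbers differ slightly from the ones above. The point is that the difference between the two predictions equals 1000 times the weight coefficient stored in regr.coef_.

```python
import pandas
from sklearn import linear_model

# Hypothetical stand-in for cars.csv, built from an invented rule
df = pandas.DataFrame({
    "Weight": [800, 1100, 950, 1400, 1700],
    "Volume": [1000, 1200, 1000, 1600, 2000],
})
df["CO2"] = 80 + 0.0075 * df["Weight"] + 0.0078 * df["Volume"]

regr = linear_model.LinearRegression()
regr.fit(df[["Weight", "Volume"]], df["CO2"])

# coef_ holds one coefficient per column, in the order Weight, Volume
print(regr.coef_)

# Two cars with the same engine, 1000 kg apart in weight
cars = pandas.DataFrame({"Weight": [2300, 3300], "Volume": [1300, 1300]})
light, heavy = regr.predict(cars)
difference = heavy - light

print(difference, 1000 * regr.coef_[0])
```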

Scale Features
When your data has different values, and even different measurement units, it can be difficult
to compare them. How do kilograms compare to meters? Or altitude to time?

The answer to this problem is scaling. We can scale data into new values that are easier to
compare.

Take a look at the table below, it is the same data set that we used in the multiple regression,
but this time the volume column contains values in liters instead of cm3 (1.0 instead of 1000).

It can be difficult to compare the volume 1.0 with the weight 790, but if we scale them both
into comparable values, we can easily see how much one value is compared to the other.

There are different methods for scaling data. In this tutorial we will use a method called
standardization.

The standardization method uses this formula:

z = (x - u) / s

Where z is the new value, x is the original value, u is the mean and s is the standard
deviation.

If you take the weight column from the data set above, the first value is 790, and the scaled
value will be:

(790 - 1292.23) / 238.74 = -2.1

If you take the volume column from the data set above, the first value is 1.0, and the scaled
value will be:

(1.0 - 1.61) / 0.38 = -1.59

Note: 1.61 and 0.38 are rounded, so repeating this last calculation by hand gives a slightly
different result (about -1.61).

Now you can compare -2.1 with -1.59 instead of comparing 790 with 1.0.

You do not have to do this manually. The Python sklearn module has a method called
StandardScaler() which returns a Scaler object with methods for transforming data sets.

Example

Scale all values in the Weight and Volume columns:
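
A sketch using StandardScaler from sklearn.preprocessing. The five cars are made-up stand-ins for the real data set (an assumption), with the volume given in liters. The last lines apply the formula z = (x - u) / s by hand to the first weight, to show that it matches the scaler's output.

```python
import pandas
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the car data, volume in liters
df = pandas.DataFrame({
    "Weight": [800, 1100, 950, 1400, 1700],
    "Volume": [1.0, 1.2, 1.0, 1.6, 2.0],
})

scale = StandardScaler()
X = df[["Weight", "Volume"]]

# Each column is shifted to mean 0 and scaled to standard deviation 1
scaledX = scale.fit_transform(X)
print(scaledX)

# The same formula by hand for the first weight (ddof=0: divide by n)
weight = df["Weight"]
manual_z = (weight[0] - weight.mean()) / weight.std(ddof=0)
print(manual_z)
```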

Predict CO2 Values

The task in the Multiple Regression chapter was to predict the CO2 emission from a car when
you only knew its weight and volume.

When the data set is scaled, you will have to use the scale when you predict values:

Example

Predict the CO2 emission from a 1.3 liter car that weighs 2300 kilograms:
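
A sketch of the idea, again with made-up cars instead of the real data set (an assumption; the CO2 values follow the invented rule 80 + 0.0075 * Weight + 7.8 * Volume). The important step is that the new car goes through scale.transform() before it is given to predict(), so that it is on the same scale as the training data.

```python
import pandas
from sklearn import linear_model
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the car data, volume in liters
df = pandas.DataFrame({
    "Weight": [800, 1100, 950, 1400, 1700],
    "Volume": [1.0, 1.2, 1.0, 1.6, 2.0],
})
df["CO2"] = 80 + 0.0075 * df["Weight"] + 7.8 * df["Volume"]

X = df[["Weight", "Volume"]]
y = df["CO2"]

scale = StandardScaler()
scaledX = scale.fit_transform(X)

regr = linear_model.LinearRegression()
regr.fit(scaledX, y)

# Scale the new car with the SAME scaler before predicting
new_car = pandas.DataFrame({"Weight": [2300], "Volume": [1.3]})
scaled = scale.transform(new_car)
predictedCO2 = regr.predict(scaled)

print(predictedCO2)
```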

Evaluate Your Model
In Machine Learning we create models to predict the outcome of certain events, like in the
previous chapter where we predicted the CO2 emission of a car when we knew the weight
and engine size.

To measure if the model is good enough, we can use a method called Train/Test.

What is Train/Test
Train/Test is a method to measure the accuracy of your model.

It is called Train/Test because you split the data set into two sets: a training set and a
testing set.

80% for training, and 20% for testing.

You train the model using the training set.

You test the model using the testing set.

Training the model means creating the model.

Testing the model means testing the accuracy of the model.

Start With a Data Set
Start with a data set you want to test.

Our data set illustrates 100 customers in a shop, and their shopping habits.

Example

Split Into Train/Test

The training set should be a random selection of 80% of the original data.

The testing set should be the remaining 20%.

Slicing off the first 80 values only gives a random selection if the data set is in random
order:

train_x = x[:80]
train_y = y[:80]

test_x = x[80:]
test_y = y[80:]
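
If the data might be sorted, the rows can be shuffled before the cut. The sketch below is one way to do it with NumPy. The data is hypothetical (minutes in the shop and money spent, generated from an invented rule), and the names rng, order, train_idx and test_idx are illustrative.

```python
import numpy

rng = numpy.random.default_rng(seed=1)

# Hypothetical data for 100 customers: minutes in the shop and money spent
x = rng.uniform(1, 6, 100)
y = 300 / x + rng.normal(0, 10, 100)

# Shuffle the row positions, then cut the shuffled order at 80%
order = rng.permutation(len(x))
train_idx, test_idx = order[:80], order[80:]

train_x, train_y = x[train_idx], y[train_idx]
test_x, test_y = x[test_idx], y[test_idx]

print(len(train_x), len(test_x))
```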

Display the Training Set

Display the same scatter plot with the training set:

Example

Display the Testing Set

To make sure the testing set is not completely different, we will take a look at the testing set
as well.

Example

Fit the Data Set

What does the data set look like? In my opinion the best fit would be a polynomial
regression, so let us draw a line of polynomial regression.

To draw a line through the data points, we use the plot() method of the matplotlib module:

Example

Draw a polynomial regression line through the data points:

The result can back my suggestion of the data set fitting a polynomial regression, even
though it would give us some weird results if we try to predict values outside of the data set.
Example: the line indicates that a customer spending 6 minutes in the shop would make a
purchase worth 200. That is probably a sign of overfitting.

But what about the R-squared score? The R-squared score is a good indicator of how well my
data set is fitting the model.

Remember R2, also known as R-squared?

It measures the relationship between the x axis and the y axis, and the value ranges from 0 to
1, where 0 means no relationship, and 1 means totally related.

The sklearn module has a method called r2_score() that will help us find this relationship.

In this case we would like to measure the relationship between the minutes a customer stays
in the shop and how much money they spend.

Example

How well does my training data fit in a polynomial regression?

Note: The result 0.799 shows that there is an OK relationship.

Bring in the Testing Set

Now we have made a model that is OK, at least when it comes to training data.

Now we want to test the model with the testing data as well, to see if it gives us the same
result.

Example

Let us find the R2 score when using testing data:

Predict Values
Now that we have established that our model is OK, we can start predicting new values.

Example

How much money will a buying customer spend, if she or he stays in the shop for 5 minutes?
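
The whole Train/Test workflow can be sketched as below. Several things are assumptions: the customer data is generated from an invented rule, the polynomial degree 4 is a guess, and the seed is arbitrary. The scores printed here will therefore not match the 0.799 mentioned earlier.

```python
import numpy
from sklearn.metrics import r2_score

rng = numpy.random.default_rng(seed=2)

# Hypothetical data for 100 customers, already in random order
x = rng.uniform(1, 6, 100)             # minutes in the shop
y = 300 / x + rng.normal(0, 10, 100)   # money spent

train_x, train_y = x[:80], y[:80]
test_x, test_y = x[80:], y[80:]

# Train: fit the polynomial on the training set only
mymodel = numpy.poly1d(numpy.polyfit(train_x, train_y, 4))

# Test: score the model on both sets; similar scores suggest no overfitting
train_r2 = r2_score(train_y, mymodel(train_x))
test_r2 = r2_score(test_y, mymodel(test_x))
print(train_r2, test_r2)

# Predict the amount spent by a customer who stays for 5 minutes
prediction = mymodel(5)
print(prediction)
```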

17 pages
1804 03209
No ratings yet
1804 03209
11 pages
English Alphabet Recognition With Telephone Speech: Bid, Pit, Viz IS. Is, Pit, BID Viz
No ratings yet
English Alphabet Recognition With Telephone Speech: Bid, Pit, Viz IS. Is, Pit, BID Viz
10 pages
Gumpy Tutorial
No ratings yet
Gumpy Tutorial
26 pages
Machine Translation: A Presentation By: Julie Conlonova, Rob Chase, and Eric Pomerleau
No ratings yet
Machine Translation: A Presentation By: Julie Conlonova, Rob Chase, and Eric Pomerleau
31 pages
PerceptiLabs-ML Handbook
No ratings yet
PerceptiLabs-ML Handbook
31 pages
Planar Data Classification With One Hidden Layer v5
No ratings yet
Planar Data Classification With One Hidden Layer v5
19 pages
DATA SCIENCE Indeks Standar Pencemaran Udara (ISPU) PROVINSI DKI JAKARTA Tahun 2020
No ratings yet
DATA SCIENCE Indeks Standar Pencemaran Udara (ISPU) PROVINSI DKI JAKARTA Tahun 2020
21 pages
Keyence Image Processing Useful Tips Vol.7 Pre Processing
No ratings yet
Keyence Image Processing Useful Tips Vol.7 Pre Processing
6 pages
A Neural Attention Model For Speech Command Recognition: A B C C
No ratings yet
A Neural Attention Model For Speech Command Recognition: A B C C
18 pages
ML - Expectation-Maximization Algorithm
No ratings yet
ML - Expectation-Maximization Algorithm
3 pages
Predicting Breast Cancer Recurrence Using Machine Learning Techniques: A Systematic Review
No ratings yet
Predicting Breast Cancer Recurrence Using Machine Learning Techniques: A Systematic Review
41 pages
Data Mining Concepts
No ratings yet
Data Mining Concepts
35 pages
The Role of Artificial Intelligence in Cyber Security
No ratings yet
The Role of Artificial Intelligence in Cyber Security
24 pages
Linear Regression With Multiple Variables: Reading Material: Part 1 of Lecture Notes 1
No ratings yet
Linear Regression With Multiple Variables: Reading Material: Part 1 of Lecture Notes 1
24 pages
UNIT-1 Introduction: Dr. C.Nagaraju Head of Cse Ysrec of YVU Proddatur
100% (1)
UNIT-1 Introduction: Dr. C.Nagaraju Head of Cse Ysrec of YVU Proddatur
86 pages
Machine Learning: Data Set
100% (1)
Machine Learning: Data Set
52 pages
AI and Machine Learning Module Resources
No ratings yet
AI and Machine Learning Module Resources
56 pages
Data Mining:: Concepts and Techniques
100% (1)
Data Mining:: Concepts and Techniques
63 pages
Bayesian Learning
No ratings yet
Bayesian Learning
49 pages
Data Science Report
No ratings yet
Data Science Report
35 pages
2021 - Inductive Logic Programming at 30
No ratings yet
2021 - Inductive Logic Programming at 30
23 pages
06 - Classification Algorithms - Part II
No ratings yet
06 - Classification Algorithms - Part II
28 pages
Seminar Report Machine Learning
No ratings yet
Seminar Report Machine Learning
20 pages
Lecture 9.1 - Model Evaluations - Train Test Cross-Validate (Autosaved)
No ratings yet
Lecture 9.1 - Model Evaluations - Train Test Cross-Validate (Autosaved)
33 pages
Supervised Machine Learning
No ratings yet
Supervised Machine Learning
112 pages
Machine Learning Textbook
No ratings yet
Machine Learning Textbook
191 pages
1811 01348 PDF
No ratings yet
1811 01348 PDF
27 pages
Poly
100% (1)
Poly
108 pages
Machine Learning and Neural Networks: Riccardo Rizzo
100% (1)
Machine Learning and Neural Networks: Riccardo Rizzo
113 pages
01-Introduction Machine Learning
100% (1)
01-Introduction Machine Learning
48 pages
Crash Course Coding Companion
No ratings yet
Crash Course Coding Companion
136 pages
TF Idf
100% (3)
TF Idf
38 pages
Bayes Classification
No ratings yet
Bayes Classification
86 pages
ClassXI DS Student Handbook
No ratings yet
ClassXI DS Student Handbook
107 pages
SMOTE For Imbalanced Classification With Python
No ratings yet
SMOTE For Imbalanced Classification With Python
75 pages
1 - Machine Learning (Start)
No ratings yet
1 - Machine Learning (Start)
32 pages
Data Mining Graded Assignment: Problem 1: Clustering Analysis
100% (3)
Data Mining Graded Assignment: Problem 1: Clustering Analysis
39 pages
Lecture13 ANFIS
No ratings yet
Lecture13 ANFIS
43 pages
Top 9 Data Science Algorithms
No ratings yet
Top 9 Data Science Algorithms
152 pages
Machine Learning
No ratings yet
Machine Learning
33 pages
Prolog 101 PDF
No ratings yet
Prolog 101 PDF
61 pages
Natural Language Toolkit NLTK PDF
No ratings yet
Natural Language Toolkit NLTK PDF
23 pages
Preparing Data For Machine Learning - Pluralsight PDF
No ratings yet
Preparing Data For Machine Learning - Pluralsight PDF
74 pages
LPTHW
100% (1)
LPTHW
220 pages
Machine Learning
No ratings yet
Machine Learning
27 pages
0802 Python Tutorial
100% (1)
0802 Python Tutorial
155 pages
Dokumen - Pub Approaching Almost Any Machine Learning Problem 9788269211528 L 5276104
100% (1)
Dokumen - Pub Approaching Almost Any Machine Learning Problem 9788269211528 L 5276104
151 pages
Module 5 - AI
No ratings yet
Module 5 - AI
16 pages
Pytorch Lightning Readthedocs Latest
100% (1)
Pytorch Lightning Readthedocs Latest
421 pages
Photon Prog Guide
100% (1)
Photon Prog Guide
919 pages