Article Module 4

Uploaded by Permana Agung
Copyright © All Rights Reserved
OPTIMIZATION AND SIMULATION LABORATORY
Document No.:
Form No.:
Effective Date:

4th MODULE
Module Name: Regression Model Development
Labwork: Analytic Data Logistics

Program Learning Outcomes
PLO 7. Able to apply mathematics and science knowledge, engineering principles, and information technology needed to solve engineering problems in Logistics Systems.

Course Learning Outcomes
CLO 2. Students are able to work and contribute effectively in teams in building data analytics programs.
CLO 4. Students are able to apply the concepts of simple regression, classification, clustering, and model validation.
CLO 5. Students are able to use the Python programming language to perform structured data analysis.

MODULE 4
REGRESSION MODEL DEVELOPMENT

1. Model Development
A model can be interpreted as a mathematical equation that is used to predict a value. Model development connects one or more independent variables to the dependent variable. There are several types of models. One of them is the regression model, whose development we will learn in this module.

2. Linear Regression
One of the subcategories of supervised learning is regression analysis. The purpose of regression analysis is to predict outputs on a continuous scale from one or more explanatory variables and to estimate the relationship between these variables.
Linear regression is a linear approach to modelling the relationship between a scalar response and one or more explanatory variables. How much of the variation in the response variable the predictors can explain depends on which variables are included in the model. The goal of linear regression is to model the relationship between one or multiple features and a continuous target variable. The type of data commonly used for linear regression analysis is interval or ratio data.
There are two types of linear regression, namely Simple Linear Regression and
Multiple Linear Regression.

2.1. Simple Linear Regression


The purpose of simple linear regression is to model the relationship between a
single feature (explanatory variable, 𝑥) and a continuous-valued target (response
variable, 𝑦). The equation of a linear model with one explanatory variable is defined
as follows:
$$y = w_0 + w_1 x$$

The weight $w_0$ represents the $y$-axis intercept, and $w_1$ is the weight coefficient of the explanatory variable.
Based on the linear equation above, linear regression can be understood as finding the best-fitting straight line through the training examples, as shown in the following figure:

Picture 2.1.1 Simple Linear Regression

This best-fitting line is also called the regression line, and the vertical lines from
the regression line to the training examples are the so-called offsets or residuals—the
errors of our prediction.
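The module does not state how the best-fitting line is found. The snippet below is a minimal sketch that assumes ordinary least squares, the usual choice. It computes $w_0$ and $w_1$ with the closed-form formulas and then derives the residuals. It assumes NumPy is installed. The data are synthetic and hypothetical (generated from $y = 2 + 3x$ plus noise) and do not come from this module.

```python
import numpy as np

# Hypothetical training data generated from y = 2 + 3x plus Gaussian noise
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 + 3.0 * x + rng.normal(0, 1.0, size=x.size)

# Ordinary least-squares estimates for y = w0 + w1 * x
x_mean, y_mean = x.mean(), y.mean()
w1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
w0 = y_mean - w1 * x_mean

# Residuals (offsets): vertical distances from each training example to the line
y_hat = w0 + w1 * x
residuals = y - y_hat

print(f"w0 (intercept) = {w0:.3f}, w1 (slope) = {w1:.3f}")
# With an intercept in the model, OLS residuals sum to (numerically) zero
print(f"sum of residuals = {residuals.sum():.2e}")
```

As a cross-check, `np.polyfit(x, y, 1)` fits the same degree-1 least-squares line. It returns the slope first, then the intercept.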
This method is used to test how strongly a variable assumed to be a causal factor is related to the variable it is expected to affect. Examples of using Simple Linear Regression analysis in production activities include:
1) The relationship between the duration of an employee's income and their happiness.
2) The relationship between the number of workers and the output produced.

2.2. Multiple Linear Regression


Multiple linear regression is a statistical analysis used to determine the effect of two or more independent variables on the dependent variable. The model assumes a linear relationship between the dependent variable and its predictors. The equation of a linear model with multiple explanatory variables is defined as follows:
$$y = w_0 x_0 + w_1 x_1 + \dots + w_m x_m = \sum_{i=0}^{m} w_i x_i = \mathbf{w}^T \mathbf{x}$$

Here, $w_0$ is the $y$-axis intercept with $x_0 = 1$.
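The vector form $y = \mathbf{w}^T \mathbf{x}$ with $x_0 = 1$ can be sketched in NumPy. The sketch prepends a column of ones to the feature matrix so that $w_0$ acts as the intercept, then solves for $\mathbf{w}$ by least squares with `np.linalg.lstsq`. That fitting method is an assumption, not one stated in the module. The data are hypothetical and noise-free (generated from $y = 1.5 + 2x_1 - 0.5x_2$), so the recovered weights should equal the generating ones.

```python
import numpy as np

# Hypothetical data with m = 2 features, generated without noise from
# y = 1.5 + 2.0 * x1 - 0.5 * x2
rng = np.random.default_rng(42)
X_features = rng.uniform(0, 10, size=(100, 2))
y = 1.5 + 2.0 * X_features[:, 0] - 0.5 * X_features[:, 1]

# Prepend x0 = 1 to every sample so that w0 plays the role of the intercept
X = np.column_stack([np.ones(len(X_features)), X_features])

# Least-squares solution of X w = y, i.e. y_i = w^T x_i for every sample
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print("w = [w0, w1, w2] =", np.round(w, 3))

# Prediction for a new sample: y = w^T x, again with x0 = 1
x_new = np.array([1.0, 4.0, 2.0])
print("prediction:", x_new @ w)
```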


The following figure shows how the two-dimensional, fitted hyperplane of a
multiple linear regression model with two features could look:

Picture 2.2.1 Multiple Linear Regression

Examples of using Multiple Linear Regression analysis include:
1) Predicting vehicle prices by considering the km driven, the ex-showroom price, and the year.
2) Predicting how much CO2 a car might emit based on independent variables such as the car's engine size, number of cylinders, and fuel consumption.

3. Model Evaluation
Model evaluation is a statistical assessment of each estimation method that has been applied. It is used to determine how well a model fits the data. To evaluate our models, we split the dataset into a training set and a test set. In Python, this split can be done with the train_test_split function. We split the data into training and test sets because we are interested in measuring how well the model generalizes to new, previously unseen data, that is, how well it can make predictions for data that was not observed during training.
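The split described above can be sketched with scikit-learn's `train_test_split` from `sklearn.model_selection`. The dataset is randomly generated and hypothetical. `test_size=0.25` and `random_state=0` are illustrative choices, not values prescribed by the module.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical dataset: 100 samples, 3 features, continuous target
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Hold out 25% of the samples as "unseen" test data;
# random_state makes the shuffle, and therefore the split, reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

print("train:", X_train.shape, "test:", X_test.shape)
```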

3.1. Training
The training set is used to build the model. In Python, the model is fitted to the training set by calling its fit method. Comparing the model's performance on the training set with its performance on the test set also helps judge how well the model generalizes.

3.2. Testing
The test set is used to evaluate the model that has been built. In general, a larger test set gives a more reliable estimate of the model's performance on new data, although it leaves fewer examples for training. In Python, the model is evaluated on the test set using the score method.
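The training and testing steps can be sketched with scikit-learn's `LinearRegression`. Here `fit` builds the model on the training set and `score` evaluates it. For scikit-learn regressors, `score` returns the R-squared value, so comparing the training and test scores gives a first impression of generalization. The data are synthetic and hypothetical. Leaving `test_size` unset assumes scikit-learn's default 25% hold-out.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical data generated from y = 3 + 1.2*x1 + 0.8*x2 plus noise
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(200, 2))
y = 3.0 + 1.2 * X[:, 0] + 0.8 * X[:, 1] + rng.normal(scale=0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 3.1 Training: build the model on the training set
model = LinearRegression()
model.fit(X_train, y_train)

# 3.2 Testing: score() returns R^2 for regressors
train_score = model.score(X_train, y_train)  # data the model has already seen
test_score = model.score(X_test, y_test)     # previously unseen data

print("intercept:", model.intercept_, "coefficients:", model.coef_)
print(f"train R^2 = {train_score:.3f}, test R^2 = {test_score:.3f}")
```

A test score much lower than the training score would suggest that the model does not generalize well to unseen data.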

4. Evaluation Metrics
Several evaluation metrics can be used to evaluate the performance of regression results, such as Mean Squared Error, R-Squared, and Accuracy. It is important to choose the right metric when selecting between models and tuning parameters. Each metric has its own characteristics and advantages.

4.1. Mean Squared Error (MSE)


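Mean squared error is the average of the squared differences between the predictions and the true values. MSE can be used to select the best model: among all candidate models, the one with the smallest MSE value is preferred. The following is the formula for MSE, where $e_i = y_i - \hat{y}_i$ is the error (residual) of the $i$-th of $n$ predictions:

$$MSE = \frac{1}{n} \sum_{i=1}^{n} e_i^2$$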
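The MSE formula and the "smallest MSE wins" rule can be checked with a few hand-made numbers. The true values and the predictions of the two hypothetical models A and B are invented for illustration. scikit-learn's `mean_squared_error` is used only to confirm the manual computation.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical true values and predictions from two candidate models
y_true = np.array([3.0, -0.5, 2.0, 7.0])
predictions = {
    "model A": np.array([2.5, 0.0, 2.0, 8.0]),
    "model B": np.array([3.0, -0.5, 2.5, 7.5]),
}

mses = {}
for name, y_pred in predictions.items():
    e = y_true - y_pred            # errors e_i
    mses[name] = np.mean(e ** 2)   # MSE = (1/n) * sum(e_i^2)
    # The manual result agrees with scikit-learn's implementation
    assert np.isclose(mses[name], mean_squared_error(y_true, y_pred))

for name, value in mses.items():
    print(f"{name}: MSE = {value:.3f}")  # model A: 0.375, model B: 0.125

best = min(mses, key=mses.get)           # model with the smallest MSE
print("best:", best)                     # best: model B
```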

4.2. R-Squared
R-squared is the coefficient of determination. It measures the extent to which the model is able to explain the variation of the dependent variable. The following is the formula for R-squared, where the denominator is the MSE obtained by predicting the average of the data for every sample:

$$R^2 = 1 - \frac{\text{MSE of the regression line}}{\text{MSE of the average of the data}}$$
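The ratio in the R-squared formula can be computed directly and compared with scikit-learn's `r2_score`. The numbers are hypothetical. The denominator is the MSE of a "model" that always predicts the mean of `y_true`.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical true values and predictions
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# Numerator: MSE of the regression line (the model's predictions)
mse_regression = np.mean((y_true - y_pred) ** 2)
# Denominator: MSE when every prediction is the average of the data
mse_average = np.mean((y_true - y_true.mean()) ** 2)

r2 = 1 - mse_regression / mse_average
print(f"R^2 (manual)       = {r2:.4f}")
print(f"R^2 (scikit-learn) = {r2_score(y_true, y_pred):.4f}")  # both print 0.9486
```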

4.3. Accuracy
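Accuracy is the proportion of correct predictions divided by the number of samples. It is mainly used to evaluate classification models, because predictions of a continuous target rarely match the true value exactly. The following is the formula for accuracy, where $\text{verdict}_i$ is 1 if the $i$-th prediction is correct and 0 otherwise:

$$Accuracy = \frac{1}{N} \sum_{i=1}^{N} \text{verdict}_i$$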
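The accuracy formula can be sketched on hypothetical class labels, the setting where accuracy is normally applied. `verdict` is 1 where the prediction equals the true label. scikit-learn's `accuracy_score` is used only as a cross-check.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical true class labels and model predictions
y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 0, 0, 1, 0, 1])

verdict = (y_true == y_pred).astype(int)  # 1 if prediction i is correct, else 0
accuracy = verdict.sum() / len(verdict)   # (1/N) * sum(verdict_i)

print(f"accuracy (manual)       = {accuracy:.3f}")
print(f"accuracy (scikit-learn) = {accuracy_score(y_true, y_pred):.3f}")
```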
