Article Module 4
OPTIMIZATION AND SIMULATION LABORATORY
4th MODULE
1. Model Development
Model development produces a mathematical equation that can be used to predict a
value. The resulting model relates one or more independent variables to a dependent
variable. There are several types of model development; this module covers one of them,
Regression Model Development.
2. Linear Regression
Regression analysis is one of the subcategories of supervised learning. Its purpose
is to predict an output on a continuous scale and to estimate the relationship between
two or more variables.
Linear regression is a linear approach for modelling the relationship between a
scalar response and one or more explanatory variables. How much of the variation in the
response variable is explained by the predictor variables depends on the number of
variables involved in the model. The goal of linear regression is to model the relationship
between one or multiple features and a continuous target variable. The data used for
linear regression analysis is usually measured on an interval or ratio scale.
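There are two types of linear regression, namely Simple Linear Regression and
Multiple Linear Regression.
Simple linear regression models the relationship between a single explanatory
variable $x$ and a continuous response $y$:
$$y = w_0 + w_1 x$$
The weight $w_0$ represents the $y$-axis intercept and $w_1$ is the weight coefficient
of the explanatory variable.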
Based on the linear equation above, linear regression can be understood as
finding the best-fitting straight line through the training examples, as shown in the
following figure:
This best-fitting line is also called the regression line, and the vertical lines from
the regression line to the training examples are the so-called offsets or residuals, that
is, the errors of our prediction.
Simple linear regression is used to test how strongly a causal factor (the
independent variable) affects a consequent (dependent) variable. Examples of using
Simple Linear Regression analysis in production activities include:
1) The relationship between an employee's income and their happiness.
2) The relationship between the number of workers and the output produced.
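To make the idea of a best-fitting line concrete, the following is a minimal sketch of an ordinary least-squares fit for one explanatory variable, using only the Python standard library. The function name `fit_simple_linear_regression` and the worker/output figures are hypothetical and chosen only for illustration, loosely following the second example above.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (w0, w1) that minimise the sum of squared residuals."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    w1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    w0 = y_mean - w1 * x_mean
    return w0, w1


# Hypothetical data: number of workers and units produced.
workers = [1, 2, 3, 4, 5]
output = [12, 19, 29, 37, 45]

w0, w1 = fit_simple_linear_regression(workers, output)

# Residuals: vertical offsets between each observation and the regression line.
residuals = [y - (w0 + w1 * x) for x, y in zip(workers, output)]
```

For a least-squares line with an intercept, the residuals sum to zero (up to floating-point rounding), which is a quick sanity check on the fit.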
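Multiple linear regression generalizes the model to $m$ explanatory variables:
$$y = w_0 x_0 + w_1 x_1 + \dots + w_m x_m = \sum_{i=0}^{m} w_i x_i = w^T x$$
Here $w_0$ is the $y$-axis intercept, with $x_0 = 1$.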
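The weighted sum in the multiple linear regression equation can be sketched as a plain dot product. This is only an illustration: the helper name `predict` and the weight values are hypothetical, and the sketch assumes the usual convention that $x_0 = 1$ so that $w_0$ acts as the intercept.

```python
def predict(weights, features):
    """Compute y = w^T x, prepending x_0 = 1 for the intercept w_0."""
    x = [1.0] + list(features)
    if len(x) != len(weights):
        raise ValueError("expected one weight per feature plus an intercept")
    return sum(w_i * x_i for w_i, x_i in zip(weights, x))


weights = [2.0, 0.5, -1.0]        # hypothetical w_0, w_1, w_2
y = predict(weights, [4.0, 3.0])  # 2.0 + 0.5 * 4.0 - 1.0 * 3.0
```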
3. Model Evaluation
Model evaluation is a statistical assessment of each estimation method that has been
applied. It is used to determine how well a model fits the data. To evaluate a model,
we split the dataset into a training set and a test set; in Python, this split can be
done with the train_test_split function. We split the data into training and test sets
because we are interested in measuring how well our model generalises to new, previously
unseen data, that is, how well it can make predictions for data that was not observed
during training.
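The splitting step can be sketched with the standard library alone. The helper `split_train_test` below is a hypothetical stand-in written for illustration; it is not scikit-learn's train_test_split, which offers more options. It assumes a simple shuffled split with a fixed seed so that the result is repeatable.

```python
import random


def split_train_test(X, y, test_size=0.25, seed=0):
    """Shuffle the sample indices, then hold out a fraction as the test set."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of samples")
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # seeded, so the split is repeatable
    n_test = max(1, round(len(X) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Hypothetical data: eight samples following y = 3x + 1.
X = list(range(1, 9))
y = [3 * x + 1 for x in X]
X_train, X_test, y_train, y_test = split_train_test(X, y, test_size=0.25, seed=42)
```

Each sample ends up in exactly one of the two sets, and every feature value stays paired with its own target.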
3.1. Training
In model evaluation, the training set is used to build the model. How well
that model generalizes is then judged on data that was held out from training. Build
a model on the training set by calling the fit method.
3.2. Testing
In model evaluation, the test set is used to test the model that has been
built. The larger the test set, the more reliable the estimate of how the model
performs on unseen data. In Python, evaluate the model on the test set using the
score method.
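The training and testing steps can be sketched together with a toy one-feature model. The class `SimpleLinearRegression` below is hypothetical: its method names only mirror the fit and score methods mentioned above, and it assumes, as scikit-learn's regressors do, that score reports R-squared. The data values are made up for illustration.

```python
class SimpleLinearRegression:
    """Toy one-feature least-squares model with fit, predict, and score."""

    def fit(self, xs, ys):
        n = len(xs)
        x_mean, y_mean = sum(xs) / n, sum(ys) / n
        sxx = sum((x - x_mean) ** 2 for x in xs)
        sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
        self.w1 = sxy / sxx
        self.w0 = y_mean - self.w1 * x_mean
        return self

    def predict(self, xs):
        return [self.w0 + self.w1 * x for x in xs]

    def score(self, xs, ys):
        """Return R-squared of the predictions on the given data."""
        y_mean = sum(ys) / len(ys)
        ss_res = sum((y - p) ** 2 for y, p in zip(ys, self.predict(xs)))
        ss_tot = sum((y - y_mean) ** 2 for y in ys)
        return 1 - ss_res / ss_tot


# Hypothetical data, already split into a training set and a test set.
x_train, y_train = [1, 2, 3, 4, 5, 6], [5.1, 6.9, 9.2, 10.8, 13.1, 14.9]
x_test, y_test = [7, 8], [17.0, 19.1]

model = SimpleLinearRegression().fit(x_train, y_train)  # training
train_score = model.score(x_train, y_train)
test_score = model.score(x_test, y_test)                # testing
```

Comparing `train_score` with `test_score` gives a rough indication of how well the model carries over to data it did not see during training.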
4. Evaluation Metrics
Several evaluation metrics can be used to evaluate the performance of the
regression results, such as Mean Squared Error, R-Squared, and Accuracy. It is important
to choose the right metric when selecting between models and adjusting parameters. Each
metric has its own characteristics and advantages.
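4.1. Mean Squared Error
Mean Squared Error (MSE) is the average of the squared prediction errors. The
following is the formula for MSE:
$$MSE = \frac{1}{n} \sum_{i=1}^{n} e_i^2$$
where $e_i$ is the error (residual) of the $i$-th prediction and $n$ is the number of
samples.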
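As a minimal sketch of the MSE calculation, the function below averages the squared differences between observed and predicted values. The name `mean_squared_error` and the sample values are chosen here for illustration only.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [3.0, 5.0, 7.0, 9.0]   # hypothetical observed values
y_pred = [2.5, 5.5, 6.5, 9.5]   # hypothetical predictions, each off by 0.5
mse = mean_squared_error(y_true, y_pred)
```

Because every error in this example has magnitude 0.5, each squared error is 0.25, and so is their average.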
4.2. R-Squared
R-squared is the coefficient of determination. It measures how much of the
variation in the dependent variable is explained by the model. The following is the
formula for R-Squared:
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
where $y_i$ is the observed value, $\hat{y}_i$ is the predicted value, and $\bar{y}$ is
the mean of the observed values.
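As a worked illustration, assume the common definition of R-squared as one minus the ratio of the residual sum of squares ($SS_{res}$) to the total sum of squares ($SS_{tot}$). The observed and predicted values below are hypothetical.

```latex
\begin{aligned}
y &= (3, 5, 7, 9), \quad \hat{y} = (2.5, 5.5, 6.5, 9.5), \quad \bar{y} = 6 \\
SS_{res} &= \sum_i (y_i - \hat{y}_i)^2 = 4 \times 0.5^2 = 1 \\
SS_{tot} &= \sum_i (y_i - \bar{y})^2 = 9 + 1 + 1 + 9 = 20 \\
R^2 &= 1 - \frac{SS_{res}}{SS_{tot}} = 1 - \frac{1}{20} = 0.95
\end{aligned}
```

Under these assumptions, the model explains 95% of the variation in the observed values.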
4.3. Accuracy
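Accuracy evaluates the performance of a model as the proportion of correct
predictions, that is, the number of correct predictions divided by the number of
samples. The following is the formula for accuracy:
$$\mathit{Accuracy} = \frac{1}{N} \sum_{i=1}^{N} \mathit{verdict}_i$$
where $\mathit{verdict}_i$ is 1 if the $i$-th prediction is correct and 0 otherwise, and
$N$ is the number of samples.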
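The accuracy formula can be sketched as follows, assuming the predictions are discrete labels (for continuous predictions an exact match is rarely meaningful). The function name `accuracy` and the labels are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Proportion of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    # verdict_i is 1 for a correct prediction and 0 otherwise.
    verdicts = [1 if t == p else 0 for t, p in zip(y_true, y_pred)]
    return sum(verdicts) / len(verdicts)


y_true = ["pass", "fail", "pass", "pass"]  # hypothetical true labels
y_pred = ["pass", "pass", "pass", "fail"]  # hypothetical predictions
acc = accuracy(y_true, y_pred)             # 2 of the 4 predictions match
```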
REFERENCES
Aminuddin, A., Sudarno, S., & Sugito, S. (2018). Pemilihan Model Regresi Linier
Multivariat Terbaik Dengan Kriteria Mean Square Error. Jurnal Gaussian, 2.
Muller, A., & Guido, S. (2017). Introduction to Machine Learning with Python.
O'Reilly Media, Inc.
Raschka, S., & Mirjalili, V. (2019). Python Machine Learning (3rd ed.). Packt
Publishing.