Regression Models With Python
Edited by AI Publishing
Ebook Converted and Cover by Gazler Studio
Published by AI Publishing LLC
ISBN-13: 978-1-7330426-7-3
Legal Notice:
You cannot amend, distribute, sell, use, quote, or paraphrase any part
of the content within this book without the consent of the author.
Disclaimer Notice:
Please note the information contained within this document is for
educational and entertainment purposes only. No warranties of
any kind are expressed or implied. Readers acknowledge that the
author is not engaging in the rendering of legal, financial, medical,
or professional advice. Please consult a licensed professional before
attempting any techniques outlined in this book.
1 Preface
1.1. Book Approach
1.2. Regression Analysis and Data Science
1.3. Who Is This Book For?
1.4. How to Use This Book
1.5. What is Regression and When to Use It?
1.6. Using Python for Regression Analysis
1.7. About the Author
3 Data Preparation
3.1. Missing Data
3.2. Outliers
3.3. Standardization
3.4. Normalization
3.5. Summarization by Binning
3.6. Qualitative Features Encoding
3.7. Dummy Coding with Pandas
3.8. Summary
3.9. Hands-On Project
6 Logistic Regression
6.1. Defining a Classification Problem
6.2. Evaluating the Classifier Performance
6.3. Logistic Regression Intuition
6.4. Logistic Regression Gradient Descent
6.5. Logistic Regression Pros and Cons
6.6. Hands-On Project
1
Preface
1.2. Regression Analysis and Data Science
Data science is not a usual field like other traditional fields. Instead, it is a multi-disciplinary field, which means it combines different fields such as domain expertise, computer science, and mathematics.
So, you might ask, what is the relation between data science
and regression analysis?
Also, you will notice that there are a lot of hands-on projects in this book. Try to run them yourself, and even try other approaches that you might find in the additional materials or that you come up with on your own.
Get in Touch With Us
https://www.aispublishing.net/book-regression-modeling
The two main versions of Python are 2.7 and 3.6. In this book, we will be using 3.6, as it has more features and capabilities, and because Python 2.7 will be deprecated very soon, which means that there will be no support for it in the future.
1+2
x = 1 + 2
x = 3
y = 3.5
print(type(x))
print(type(y))
<class 'int'>
<class 'float'>
From the code snippet, we can see that you can print the type of any variable using the built-in type() function.
x = 1
y = 2.5
z = x*y
print(type(z))
<class 'float'>
s = "Hello World! "
print(s)
print(type(s))
Hello World!
<class 'str'>
new_s = s[2:5]
print(new_s)
llo
newer_s = s + new_s
print(newer_s)
Hello World! llo
repeated_s = s*4
print(repeated_s)
Hello World! Hello World! Hello World! Hello World!
add_s_num = s + 4
--------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-c46ee2a4b7fb> in <module>
----> 1 add_s_num = s + 4
TypeError: can only concatenate str (not "int") to str
add_s_num = s + str(4)
print(add_s_num)
Hello World! 4
f_to_i = int(3.2)
i_to_f = float(3)
print(f_to_i)
print(type(f_to_i))
print(i_to_f)
print(type(i_to_f))
3
<class 'int'>
3.0
<class 'float'>
Congratulations! You now know all about the basic data types
in Python. Now, let us move to complex data types.
We can also use negative indexing to access the list from the
end instead of from the start.
print(l[-1])
hi
l_2.append(8)
print(l_2)
[4, 15, 7, 1, 4, 5.1, 'hi', 8]
l_2.remove(8)
print(l_2)
[4, 15, 7, 1, 4, 5.1, 'hi']
As you have seen, lists can be altered, which means they are
mutable, while on the other hand, tuples are immutable.
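As a quick sketch of this difference (the list and tuple values here are just illustrative):
my_list = [1, 2, 3]
my_list[0] = 10      # works: lists are mutable
my_tuple = (1, 2, 3)
my_tuple[0] = 10     # TypeError: 'tuple' object does not support item assignment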
For the next data type, we have the Dictionary, which is, just as its name suggests, like an address book, where you can find the address of anyone just by knowing his/her name. However, this name needs to be unique, as otherwise, you will not be entirely sure if this is the correct address or not. The person's name is called the key, while the address is called the value. The address does not need to be unique, as many people can have the same address.
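A minimal sketch of this behavior (the names and addresses here are made up); note what happens when the same key is assigned twice:
addresses = {'Alice': '12 Oak St', 'Bob': '34 Pine St'}
addresses['Alice'] = '56 Elm St'    # the same key assigned a second time
print(addresses)
{'Alice': '56 Elm St', 'Bob': '34 Pine St'}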
As you can see, the first value was overwritten by the second
one.
Finally, let us talk about sets, which are complex data types that can only hold unique values. Just like dictionaries, sets are defined using curly brackets; they do not have order and cannot be indexed.
s = {1,20,4,5,6,1,2,3,4,5,3}
print(s)
print(type(s))
{1, 2, 3, 4, 5, 6, 20}
<class 'set'>
s[1]
--------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-c46ee2a4b7fb> in <module>
----> 1 s[1]
TypeError: 'set' object does not support indexing
But the question now is, when should you use what? To answer this question, you can follow this list of use cases:
• Lists: The most generic data type. Use them when your data do not have any special cases, and you want to use indexing.
• Tuples: Mainly used when you know that the data should not be changed, no matter what.
• Dictionaries: Used when we want to have some sort of relation between some unique variables and other non-unique variables.
• Sets: Used when we know that any repeated data will be redundant.
Further Readings
For more information about Python data structures, you can go here: https://www.tutorialspoint.com/python/python_variable_types.htm
When you open the application, you will see this User Interface. To start, you click on New from the right corner. You will then see a new notebook created, which looks like this.
The most important part here is that you can either write in a code cell or a markdown cell. The markdown cell is used to beautify your notebook with formatted text that documents your code. Do not worry about all the other tabs for now, as we will be using Jupyter notebooks heavily in all our exercises.
Now, let us see how to get the shape of an array. This is crucial
in data science, as we are always working with arrays and
matrices.
import numpy as np
a = np.array([1, 2, 3])   # an example three-element array
a.shape
(3,)
Further Readings
For more information about NumPy, you can go here: https://www.numpy.org/devdocs/user/quickstart.html
2.6. Pandas
Pandas is another critical library in data science. It provides
high-performance data manipulation and analysis tools with
its powerful data structures.
import pandas as pd
Data Structure   Dimensions   Description
Series           1            1D labeled homogeneous array, size-mutable.
Data Frames      2            General 2D labeled, size-mutable tabular structure with potentially heterogeneously typed columns.
Panel            3            General 3D labeled, size-mutable array.
pd.Series([1,2,3])
0 1
1 2
2 3
dtype: int64
Pandas Panels are not used widely. Thus, we will focus only on
Series and Data Frames.
However, you can use Panels when your data are 3D.
• read_csv()
• read_excel()
• read_json()
• read_html()
• read_sql()
Let us now use the different reading functions that we have just mentioned. Pandas has a method called head that enables us to view the first few rows of a specific DataFrame.
iris_df = pd.read_csv('iris.csv')
iris_df.head()
cars_df = pd.read_excel('cars.xls')
cars_df.head()
titanic_df = pd.read_json('titanic.json')
titanic_df.head()
[The first rows of the titanic DataFrame are displayed, with columns datasetid, fields, recordid, and record_timestamp.]
Now, let us work with the cars dataset and see how to select
a column from it.
cars_df['MPG']
0 18.0
1 15.0
2 18.0
3 16.0
Further Readings
For more information about Pandas, you can go here:
https://pandas.pydata.org/pandas-docs/stable/
2.7. Matplotlib
Matplotlib is the fundamental library in Python for plotting 2D
and even some 3D data. You can use it to plot many different
plots, such as histograms, bar plots, heatmaps, line plots,
scatter plots, and many others.
Then, we simply call the scatter method and pass our dataset
variables.
plt.figure(figsize=(15, 10))
plt.scatter(cars_df.Horsepower, cars_df.MPG, c=cars_df.Year,
            s=cars_df.Displacement, alpha=0.5)
plt.xlabel(r'Horsepower', fontsize=15)
plt.ylabel(r'MPG', fontsize=15)
plt.title('MPG vs Horsepower by Year and Displacement', fontsize=25)
plt.show()
After that, we repeat the same code but on our cars’ dataset.
cars_df.MPG.hist()
plt.show()
From there, we can experiment with the bar plots and see
how they look and are used. Here, we combine them with
error bars that are frequently used when we have uncertainty
about our data.
bar_arr = np.array(['Spring', 'Summer', 'Fall', 'Winter'])
freq_arr = np.random.randint(0, 100, 4)
yerr_arr = np.random.randint(5, 10, 4)
fig, ax = plt.subplots()
ax.bar(bar_arr, freq_arr,  # X and Y
       yerr=yerr_arr,      # error bars
       color='red',
       )
The last type of plot that we are going to mention is the line
plot. We will artificially generate the data with the following
distribution so they can be interpreted easily in the plots.
dt = 0.01
t = np.arange(0, 30, dt)
nse1 = np.random.randn(len(t))
nse2 = np.random.randn(len(t))
r = np.exp(-t / 0.05)
cnse1 = np.convolve(nse1, r, mode='same') * dt
cnse2 = np.convolve(nse2, r, mode='same') * dt
s1 = 0.01 * np.sin(2 * np.pi * 10 * t) + cnse1
s2 = 0.01 * np.sin(2 * np.pi * 10 * t) + cnse2
Now, we can create two plots in one figure using the subplots function, as sketched below.
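A minimal sketch of such a figure, reusing t, s1, and s2 from above (and assuming matplotlib.pyplot is imported as plt):
fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
ax1.plot(t, s1)   # first signal in the top panel
ax2.plot(t, s2)   # second signal in the bottom panel
plt.show()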
Further Readings
For more information about Matplotlib, you can get back to
this tutorial here.
2.8. SciPy
SciPy is a very important library for linear algebra operations, and it is also used for Fourier transforms.
Note that the SciPy library depends on NumPy for all its
operations.
Further Readings
For more information about SciPy, you can get back to this
tutorial here.
2.9. Statsmodels
Given that this book focuses on one of the fundamental statistical methods, we will be using the Statsmodels library to understand the details of regression analysis. Statsmodels is a Python library that provides many functions that help to estimate many different statistical models, along with conducting statistical tests and statistical data exploration.
Then, using only one line of code, we can get the result of
dozens of statistical tests very easily.
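The results object comes from fitting a model first; a minimal sketch (the synthetic data here is made up for illustration) looks like this:
import numpy as np
import statsmodels.api as sm

np.random.seed(0)
X = sm.add_constant(np.random.rand(100, 2))            # add an intercept column
y = X @ np.array([1.0, 2.0, 3.0]) + 0.1 * np.random.randn(100)
results = sm.OLS(y, X).fit()                           # ordinary least squares fit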
print(results.summary())
2.10. Scikit-Learn
Let us now introduce one of the most important libraries for anyone starting to learn machine learning, Sklearn, or Scikit-Learn.
This library includes out-of-the-box, ready-to-use machine learning algorithms. It literally has most of the algorithms that we will talk about in this eBook. The beautiful thing about it is that it has very great documentation. But more than that, it is very simple and intuitive to use. We will look at how to use this library to do all kinds of regression analysis in this book, starting from chapter 3.
The library also provides many utilities for data preprocessing, data visualization, and evaluation.
We start by importing the modules that we will use from sklearn.
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
Then we call the algorithm that we will use, which is the linear
regression.
lm=LinearRegression()
Further Readings
For more information about Sklearn, you can go to https://scikit-learn.org/stable/documentation.html.
df.isnull()
That was easy! However, the tricky part is: how should we deal with missing values?
Then, let us explore our dataset by printing out the first five rows.
fords.head()
fords.describe()
Now, let us impute the missing values with the mean value. To
do so, we will use a function from Sklearn library as follows:
from sklearn.impute import SimpleImputer
fords = pd.read_csv('fords.csv')
fords = fords[['Year', 'Mileage', 'Price', 'Age']]
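The imputation itself could then be a sketch like this, using the mean strategy named above:
imputer = SimpleImputer(strategy='mean')
fords[['Year', 'Mileage', 'Price', 'Age']] = imputer.fit_transform(
    fords[['Year', 'Mileage', 'Price', 'Age']])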
3.2. Outliers
The second step in cleaning the data is to detect and remove
any outliers. But first, what is meant exactly by outliers?
We can frankly say that any data point that is very different
from the rest of the data points can be considered as an outlier.
This is not a solid or a scientific definition, but it can give you
an idea about what outliers mean.
Any data points that fall beyond an inner fence are considered mild outliers, and any data points that are beyond an outer fence are considered extreme outliers.
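As a sketch, the fences for a single column (using Year here as an example) can be computed from the interquartile range:
q1, q3 = fords.Year.quantile([0.25, 0.75])
iqr = q3 - q1
inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # mild-outlier fences
outer_low, outer_high = q1 - 3.0 * iqr, q3 + 3.0 * iqr   # extreme-outlier fences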
fords = fords[fords.Year<2090]
fig1, ax1 = plt.subplots()
ax1.boxplot(fords.Year)
3.3. Standardization
The third step in data preprocessing is standardization, or mean
removal and variance scaling. This is a crucial requirement for
most of the machine learning algorithms, including regression
analysis. This is because these algorithms assume that the
features look like standard normally distributed data, which is
a Gaussian with zero mean µ and unit variance σ².
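A minimal sketch of this step (the small matrix X here is made up for illustration):
from sklearn import preprocessing
import numpy as np

X = np.array([[1., -1., 2.],
              [2., 0., 0.],
              [0., 1., -1.]])
X_scaled = preprocessing.scale(X)   # remove the mean and scale to unit variance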
Let us see the mean and the variance of the data after
standardization.
print(X_scaled.mean(axis=0))
print(X_scaled.std(axis=0))
array([0., 0., 0.])
array([1., 1., 1.])
3.4. Normalization
Another important step in data preprocessing is normalization, which is the process of scaling individual observations to have unit norm.
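A minimal sketch, assuming a small non-negative matrix X for illustration:
from sklearn import preprocessing
import numpy as np

X = np.array([[4., 1., 2., 2.],
              [1., 3., 9., 3.],
              [5., 7., 5., 1.]])
X_normalized = preprocessing.normalize(X, norm='l2')   # each row scaled to unit length
print(X_normalized)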
As we can see, all the values are now in the range between
zero and 1.
There are two main methods for binning, which are K-bins and
Feature Binarization.
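As a sketch of both approaches (the thresholds and bin counts here are illustrative):
from sklearn.preprocessing import KBinsDiscretizer, Binarizer
import numpy as np

X = np.array([[-2.], [0.5], [1.5], [6.]])
kbins = KBinsDiscretizer(n_bins=3, encode='ordinal', strategy='uniform')
X_binned = kbins.fit_transform(X)                      # each value becomes its bin index
X_binary = Binarizer(threshold=1.0).fit_transform(X)   # 1 above the threshold, else 0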
So, all the techniques that we discussed so far will not work on categorical variables in their original state. Thus, we need to manipulate them one way or another so that all we discussed can be used.
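One common manipulation is dummy coding; a minimal sketch with pandas (the toy column here is made up):
import pandas as pd

df = pd.DataFrame({'color': ['red', 'blue', 'red']})
pd.get_dummies(df)                 # one 0/1 indicator column per category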
3.8. Summary
To summarize, in this chapter, we have explored different techniques of data preparation, such as dealing with missing values, removing outliers, standardization, normalization, binning, and encoding qualitative features.
fords = pd.read_csv('fords.csv')
fords.head()
fords.describe()
fords.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 635 entries, 0 to 634
Data columns (total 26 columns):
Year 635 non-null float64
Mileage 635 non-null float64
Price 635 non-null float64
Age 635 non-null float64
Color_beige 635 non-null uint8
Color_black 635 non-null uint8
Color_blue 635 non-null uint8
Color_brown 635 non-null uint8
Color_burgundy 635 non-null uint8
fords.Mileage.hist()
plt.show()
We then remove the outliers using the following line.
fords = fords.loc[~((fords.Price > 8000000) | (fords.Year > 2080)), :]
The next step is to standardize the numerical variables.
# Standardization
scaler = preprocessing.MinMaxScaler()
fords[['Year', 'Mileage', 'Price', 'Age']] = scaler.fit_transform(
    fords[['Year', 'Mileage', 'Price', 'Age']])
fords.head()
4.1. Defining a Regression Problem
Imagine that you want to know if the income of a person has anything to do with the area of his/her house. To test this hypothesis, you asked many of your friends about both their income and their houses' areas. Or, you simply downloaded a dataset from the internet that fits your use case. Then, you will simply try to plot the data and see if there is a clear linear relationship. This plotting is done using a mathematical equation or a model, which can then be used to predict the income of a person by just knowing her/his house area.
This equation is a simple one, which is as follows:

y = mx + b

In this equation, we can find the output y by multiplying the input x by the slope m and adding this to the y-intercept b. The output is also called the response variable, as we are trying to evaluate or predict it, while the input can be called the predictor variable.
After that, we choose the horsepower to be the input variable and the MPG to be the output variable.
y=cars_df.MPG
X=cars_df.Horsepower
Then, we add a constant column of ones, so that we have the same equation as the linear regression.
X_multi = sm.tools.tools.add_constant(X_train, prepend=True, has_constant='skip')
After that, we perform the Ordinary Least Squares (OLS) regression as below.
mod = sm.OLS(y_train, X_multi)
res = mod.fit()
print(res.summary())
One of the most important numbers in this summary is R², which is defined as follows:

R² = 1 − SS_res / SS_tot

where SS_res = Σ r_i² is the sum of squared residuals and SS_tot = Σ (y_i − ȳ)² is the total sum of squares. We know that r_i is the difference between the predicted value and the true value, also known as the residual. Therefore, we can say that R² is a measure of the reduction in the sum of squared values between the raw label values and the residuals. If R² = 0, then our model is useless and does not reduce the error. On the other hand, if every r_i = 0, then R² = 1, which is our ultimate target.
Another variation of R² is the adjusted R², which is the same except that the SS terms are replaced by the variance of the residuals and the variance of the true labels.
As we saw, both R² and adjusted R² are low, which indicates that the input variable cannot be used alone to predict the output variable correctly.
4.6. Optimization Algorithms
Going back to the simple linear regression analysis, we said that we want to predict the output by training a machine learning model on both the input and the output, so we can get the slope and use both the new input and the learned slope to predict the new output.
So, our goal is to find m and b, which we will call the weights and the bias from now on. Let us take our example one step further and plot it.
As we can see, there are infinite values for the weights, and we cannot tell, until now, which set of weights gives the best performance.
There are two main methods to determine these weights. Both are based on minimizing the error. However, they differ in their approaches to do so, as the first method, as we will see, does this by getting a closed-form mathematical solution, while the second one is an iterative solution that tries to converge to the correct answer.
We can find w using some mathematical manipulation that we will not be concerned about right now, but it has a closed-form solution that is applied:

w = (XᵀX)⁻¹ Xᵀ y

The second method is an iterative method called gradient descent. In this method, we have our cost function, which is the same as the sum of the squared errors. Our objective again is to find the weights that minimize the cost function as follows:

w := w − α ∂J(w)/∂w

where α is the learning rate.
So, you might have two questions. The first one is what should be the value of the learning rate. The answer is that it depends on the convergence rate. So, if we have a large error and are far from the right answer, then we will want a bigger learning rate. However, once we start converging, this big learning rate will make it difficult for us to reach the minimum value, as it may overshoot. Also, choosing a very small learning rate will make the model take too much time to converge, and it may also get stuck in a local minimum and never reach the global minimum. Nonetheless, people tend to use learning rates in the range of 10⁻² to 10⁻⁵. So, a good method to choose your learning rate is to start from 10⁻⁵ and increase it sharply if it gives you good results, then increase it carefully once you reach a critical value.
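A minimal sketch of these update rules for simple linear regression (the function, variable names, and synthetic data here are illustrative, not the book's code):
import numpy as np

def gradient_descent(x, y, lr=0.01, n_iters=5000):
    m, b = 0.0, 0.0                                  # start from zero weights
    n = len(x)
    for _ in range(n_iters):
        y_pred = m * x + b
        dm = (2.0 / n) * np.sum((y_pred - y) * x)    # gradient w.r.t. the slope
        db = (2.0 / n) * np.sum(y_pred - y)          # gradient w.r.t. the bias
        m -= lr * dm
        b -= lr * db
    return m, b

x = np.linspace(0, 10, 50)
y = 3 * x + 2 + np.random.randn(50)                  # noisy line: slope 3, intercept 2
m, b = gradient_descent(x, y)
print(m, b)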
We then choose the input and output variables and split the
data.
y=cars_df.MPG
X=cars_df.Horsepower
X_train,X_test,y_train,y_test=train_test_split(
pd.DataFrame(X),y,test_size=0.3,random_state=42)
We can see that R² and adjusted R² are 82.6 percent and 82.1 percent for multiple linear regression, which is much better than simple linear regression, where we got only 62.5 percent and 62.6 percent.
5.3. Multiple Linear Regression Equation
The simple linear regression equation was as follows:

y = mx + b

Now, we have multiple weights, one for each feature. Also, x is composed of different features, so we need a subscript for it too. Therefore, the generalized linear regression formula is as follows:

y = w₁x₁ + w₂x₂ + … + wₙxₙ + b = wᵀx + b

The T superscript that we use for w is called the transpose, and this equation is the same as the sum equation. But it is mainly used when we convert our variables into vectors and matrices. By converting them, we can avoid using loops, which take too much time to finish if we have a large number of inputs. Using vectors is always preferable, as computers are optimized to perform matrix multiplication more than loops. We call this paradigm vectorization.
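A quick sketch contrasting a Python loop with the vectorized dot product (illustrative, not the book's code):
import numpy as np

w = np.random.rand(1000)
x = np.random.rand(1000)

y_loop = 0.0                  # loop version: one multiplication at a time
for w_i, x_i in zip(w, x):
    y_loop += w_i * x_i

y_vec = w @ x                 # vectorized version: wᵀx as a single dot product
print(np.isclose(y_loop, y_vec))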
Also, note that this equation is true for simple linear regression as well.
5.4. Gradient Descent and Correlation Matrix
For optimizing a multiple linear regression model, the same concepts apply. This means that we can use the same equation for gradient descent:

wⱼ := wⱼ − α ∂J(w)/∂wⱼ

But now, we have multiple update equations, equal to the number of features that we have in our model.
However, the new thing to know is that instead of correlating a single value, we now have a correlation matrix. This matrix is a fancy word for a table that shows the correlation coefficients between all the variables and each other.
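As a sketch, pandas can compute such a table directly (assuming the numeric cars_df DataFrame used earlier):
corr_matrix = cars_df.corr()   # table of pairwise correlation coefficients
print(corr_matrix)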
As we can see, the house sizes range from 115 m² to 340 m², while the number of bedrooms and floors ranges from 1 to 5. This is a huge problem for the gradient descent algorithm because the range of the house sizes is huge, while the range of the other features is much smaller. This will lead the algorithm to find the best values of the weights for the bedrooms and the floors quickly, while it will take much longer to do the same for the house size.
Note that your model should not be exposed to the testing set
throughout the training process.
You might ask now, are there any guarantees that this splitting operation will ensure that the two datasets have the same distribution?
This is hard to answer, but data science pioneers made all their
algorithms based on the assumption that the data generation
process is I.I.D., which means that the data are independent of
each other and identically distributed.
So, what are the factors that determine how well the linear
regression model is doing?
We say that the model is overfitting when the gap between the
training and testing errors is large, as the model is capturing
even the noise among the data.
So, you might wonder, can we control this? The answer is yes,
and it can be done by changing the model capacity. Capacity
is a term that is used in many fields, but in the context of
machine learning and regression analysis, it is a measure of
how complex a relationship that the model can describe.
We say that a model that represents a quadratic function (a polynomial of the second degree) has more capacity than a model that can represent a linear function.
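As a sketch, a higher-capacity model can be built by expanding the inputs with polynomial terms (assuming a feature matrix X):
from sklearn.preprocessing import PolynomialFeatures

X_poly = PolynomialFeatures(degree=2).fit_transform(X)   # adds squared and interaction terms
# a LinearRegression fit on X_poly can now represent a quadratic relationship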
But the problem is, if the validation set is the same each time, we are back to square one again: the same concern that prevents us from using the testing set while training our model now applies to the validation set.
In each iteration, one fold is held out for validation while the remaining folds are used for training. The model's overall error is the average of all the fold errors.
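A minimal sketch of k-fold cross-validation with scikit-learn (the fold count and model are illustrative, and X and y are assumed to be defined):
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression

scores = cross_val_score(LinearRegression(), X, y, cv=5,
                         scoring='neg_mean_squared_error')
print(-scores.mean())   # the average error across the five folds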
import pandas as pd
import os
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from math import sqrt
from sklearn.metrics import mean_squared_error
import statsmodels.api as sm
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
import scipy as sp
import seaborn as sns
import statsmodels.formula.api as smf
y_prediction = regressor.predict(X_test_sc)
RMSE = sqrt(mean_squared_error(y_true = y_test, y_pred = y_
prediction))
print(RMSE)
3.216235254988254
Now, let us build a regression model which has the following equation:

MPG = b + w₁ · Weight + w₂ · Cylinders

model = smf.ols(formula='MPG ~ Weight + Cylinders', data=cars_df).fit()
summary = model.summary()
summary.tables[1]
[The coefficient table is displayed, with columns coef, std err, t, P>|t|, 0.025, and 0.975.]
We can also notice that all p-values (the fourth column) are significant. We say that a p-value is significant if it is less than 0.05.
For any variable in the regression model, we must select one of two hypotheses:
1. Null hypothesis: The coefficient for this variable is zero.
2. Alternative hypothesis: The coefficient for this variable is not zero.
If the p-value for a variable is significant (less than 0.05), then we reject the null hypothesis. Therefore, we reject the null hypothesis for both variables.
Now, let us model the interaction between the weight variable and the cylinders variable. We can do so in two ways. The first way is as follows.
model_interaction = smf.ols(formula='MPG ~ Weight + Cylinders + Weight:Cylinders', data=cars_df).fit()
summary = model_interaction.summary()
summary.tables[1]
We see that the equation now becomes:

MPG = b + w₁ · Weight + w₂ · Cylinders + w₃ · (Weight × Cylinders)

You can also notice that the p-value for the interaction variable is significant, confirming an interaction between the two variables.
The second way to model the interaction is by adding another variable, which is the multiplication of the two variables.
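A sketch of this second way (the new column name here is made up):
cars_df['Weight_Cylinders'] = cars_df.Weight * cars_df.Cylinders
model_interaction2 = smf.ols(formula='MPG ~ Weight + Cylinders + Weight_Cylinders',
                             data=cars_df).fit()
print(model_interaction2.summary().tables[1])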
Now, let us plot the interaction graph using the steps that we
explained earlier.
cars_df['cyl_med'] = cars_df.Cylinders > cars_df.Cylinders.median()
cars_df['cyl_med'] = np.where(cars_df.cyl_med == False, "Below Median", "Above Median")
sns.lmplot(x='Weight', y='MPG', hue='cyl_med', data=cars_df, ci=None, size=5, aspect=2.5);
We can deduce from this graph that when the cylinder value is small (below the median), the relationship between MPG and Weight is strongly negative, while when the cylinder value is big (above the median), the relationship between MPG and Weight is weaker.
From the confusion matrix, we can get the accuracy, which is:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Also, we can get another two metrics called the precision and the recall:

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
And, we can also get a combination of the precision and the recall, called the F-score, as follows:

F-score = 2 × (Precision × Recall) / (Precision + Recall)
So, why do we not just get the accuracy? Why do we need precision and recall?
To answer these questions, let us look at two different classification problems and see if the accuracy is the best evaluation metric to use.
Consider, for example, a spam classifier: a false positive sends a legitimate email to the spam folder, while a false negative lets a spam email into your main email folder. Of course, both are considered types of errors. But in our problem, we care more about having the minimum number of false positives. Thus, we use precision as our metric when evaluating the model.
To understand how the sigmoid function squashes our input into [0, 1], we can plot it using Python, and we would get the following curve.
We can generate this plot by simply writing the sigmoid as a Python function and then calling this function with different values of input.
As you can see, the output (the Y-axis) can only take values in the range [0, 1]; it reaches zero at negative infinity and reaches one at positive infinity. We can also see that the output is 0.5 when the input is zero. We can alter that by scaling the sigmoid function or changing the bias.
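A minimal sketch of the function and plot described above:
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    return 1 / (1 + np.exp(-z))   # squashes any real input into (0, 1)

z = np.linspace(-10, 10, 200)
plt.plot(z, sigmoid(z))
plt.xlabel('input')
plt.ylabel('sigmoid(input)')
plt.show()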
To address the overfitting problem, we use a technique called regularization.
Regularization is a fancy word for a penalty, as we penalize the model if it is getting more complex. We can understand this better by looking at how we update the weights when we introduce the regularization term:

w := w − α (∂J(w)/∂w + λw)

We say that α is the learning rate and λ is the penalization term. So, we see that the regularization term is added as a second term in the loss function. The purpose of this regularization term is to push the parameters toward smaller numbers, and thus, the model does not become more complex, and hence, it does not overfit.
There are many different methods to implement the regularization term. However, the two most used ways are the Lasso method, also called "L1," and the Ridge method, which is also called "L2."
The main difference between the two methods is that the Lasso
method tries to push all the parameters toward zero, while the
Ridge method tries to push all the parameters toward very
small numbers but not equal to zero. Both methods are used,
and you need to experiment with both to know which one
works best for each specific case.
• Cons:
a. Requires the data to be properly pre-processed and scaled.
b. It does not work very well, compared to more complex algorithms, especially if the data is complex by nature.
After that, we split the dataset, scale it, and train it.
X_train,X_test,Y_train,Y_test=train_test_split(X,
Y,test_size=0.3,random_state=42)
scaler=MinMaxScaler()
X_train=pd.DataFrame(scaler.fit_transform(X_train),
columns=X_train.columns)
X_test=pd.DataFrame(scaler.transform(X_test),
columns=X_test.columns)
logreg=LogisticRegression()
mod1=logreg.fit(X_train,Y_train)
Let us also plot the weight for each feature to see how
important it is.
plt.figure(figsize=(15,10))
plt.bar(X_train.columns.tolist(),logreg.coef_[0])
plt.xticks(rotation=90,size=10)
plt.show()
Now, let us see what the results would be with either L1 or L2 regularization.
logreg1 = LogisticRegression(penalty='l1', C=1)
mod2 = logreg1.fit(X_train, Y_train)
pred2 = logreg1.predict(X_test)
accuracy_score(y_true=Y_test, y_pred=pred2)
0.7566666666666667
logreg2 = LogisticRegression(penalty='l2', C=0.01)
mod3 = logreg2.fit(X_train, Y_train)
pred3 = logreg2.predict(X_test)
accuracy_score(y_true=Y_test, y_pred=pred3)
0.7