Lab Experiment 4 - AI

Chapter 10

Regression

Abstract Regression estimates the relationship between dependent variables and
independent variables. Linear regression is an easily understood, popular basic
technique that uses historical data to produce an output variable. Decision tree
regression arrives at an estimate by applying conditional rules to the data, narrowing
the range of possible values until a single prediction is made. Random forests are
ensembles of individual decision trees that produce a prediction by majority voting.
Neural networks are loosely modeled on the brain and learn from the data by
adjusting weights to minimize the prediction error. Proper data processing
techniques, such as ranking feature importance and removing outliers, can further
improve a model's predictions.
Learning outcomes:
• Learn and apply basic models for regression tasks using sklearn and keras.
• Learn data processing techniques to achieve better results.
• Learn how to use simple feature selection techniques to improve our model.
• Learn how data cleaning can help improve our model's RMSE.

Regression looks for relationships among variables. For example, you can
observe several employees of some company and try to understand how their salaries
depend on features such as experience, level of education, role, the city they work
in, and so on.
This is a regression problem where data related to each employee represent one
observation. The presumption is that the experience, education, role, and city are the
independent features, and the salary of the employee depends on them.
Similarly, you can try to establish a mathematical dependence of the prices of
houses on their areas, numbers of bedrooms, distances to the city center, and so on.
Generally, in regression analysis you consider some phenomenon of
interest and have a number of observations. Each observation has two or more
features. Following the assumption that (at least) one of the features depends on
the others, you try to establish a relation among them.
The dependent features are called the dependent variables, outputs, or responses.


The independent features are called the independent variables, inputs, or predictors.
Regression problems usually have one continuous and unbounded dependent
variable. The inputs, however, can be continuous, discrete, or even categorical data
such as gender, nationality, brand, and so on.
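Categorical inputs like these must be converted to numbers before a regression model can use them. As a brief aside, here is a minimal sketch of one common approach, one-hot encoding with pandas; the toy dataframe and its column names are made up for illustration and are not part of this chapter's dataset:

import pandas as pd

# Hypothetical toy data: one numeric input and one categorical input
toy = pd.DataFrame({
    "experience": [1, 3, 5],
    "city": ["Singapore", "Tokyo", "Singapore"],
})

# One-hot encode the categorical column into 0/1 indicator columns
encoded = pd.get_dummies(toy, columns=["city"])
print(encoded)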
It is common practice to denote the outputs with y and the inputs with x. If there
are two or more independent variables, they can be represented as the vector x =
(x1, ..., xr), where r is the number of inputs.
When Do You Need Regression?
Typically, you need regression to answer whether and how some phenomenon
influences another, or how several variables are related. For example, you can use
it to determine if and to what extent experience or gender impacts salaries.
Regression is also useful when you want to forecast a response using a new set
of predictors. For example, you could try to predict electricity consumption of a
household for the next hour given the outdoor temperature, time of day, and number
of residents in that household.
Regression is used in many different fields: economics, computer science, social
sciences, and so on. Its importance rises every day with the availability of large
amounts of data and increased awareness of the practical value of data.
It is important to note that regression does not imply causation. It is easy to find
examples of non-related data that, after a regression calculation, do pass all sorts of
statistical tests. The following is a popular example that illustrates the concept of
data-driven "causality."

It is often said that correlation does not imply causation, although, inadvertently,
we sometimes make the mistake of supposing that there is a causal link between two
variables that follow a certain common pattern.

Dataset: “Alumni Giving Regression (Edited).csv”


You can obtain the dataset from this link:

https://www.dropbox.com/s/veak3ugc4wj9luz/Alumni%20Giving%20Regression%20%28Edited%29.csv?dl=0

Also, you may run the following code in order to download the dataset in
Google Colab:

!wget --quiet "https://www.dropbox.com/s/veak3ugc4wj9luz/Alumni%20Giving%20Regression%20%28Edited%29.csv?dl=0" -O "./Alumni Giving Regression (Edited).csv"

from keras.models import Sequential
from keras.layers import Dense, Dropout
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np
from sklearn import linear_model
from sklearn import preprocessing
from sklearn import tree
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
import pandas as pd
import csv

Using TensorFlow backend.

In general, we will import structured datasets using pandas. We will
also demonstrate the code for loading a dataset using NumPy to show the differences
between the two libraries. Here, we are using a method in pandas called read_csv,
which takes the path of a CSV file. The "CSV" stands for comma-separated values. Thus,
if you open up the file in Excel, you would see values separated by commas.

np.random.seed(7)
df = pd.read_csv("Alumni Giving Regression (Edited).csv", delimiter=",")
df.head()

A B C D E F
0 24 0.42 0.16 0.59 0.81 0.08
1 19 0.49 0.04 0.37 0.69 0.11
2 18 0.24 0.17 0.66 0.87 0.31
3 8 0.74 0.00 0.81 0.88 0.11
4 8 0.95 0.00 0.86 0.92 0.28

In pandas, it is very convenient to handle numerical data. Before building any
model, it is good to take a look at some of the dataset's statistics to get a "feel"
for the data. Here, we can simply call df.describe, which is a method of the pandas
dataframe:

df.describe()

                A           B           C           D           E           F
count  123.000000  123.000000  123.000000  123.000000  123.000000  123.000000
mean    17.772358    0.403659    0.136260    0.645203    0.841138    0.141789
std      4.517385    0.133897    0.060101    0.169794    0.083942    0.080674
min      6.000000    0.140000    0.000000    0.260000    0.580000    0.020000
25%     16.000000    0.320000    0.095000    0.505000    0.780000    0.080000
50%     18.000000    0.380000    0.130000    0.640000    0.840000    0.130000
75%     20.000000    0.460000    0.180000    0.785000    0.910000    0.170000
max     31.000000    0.950000    0.310000    0.960000    0.980000    0.410000

Furthermore, pandas provides a helpful method to calculate the pairwise
correlation between variables. What is correlation?
The term "correlation" refers to a mutual relationship or association between
numerical quantities. In almost any business, it is very helpful to express
one quantity in terms of its relationship with others. We are concerned with
this because business plans and departments are not isolated! For example, sales
might increase when the marketing department spends more on advertisements,
or a customer's average purchase amount on an online site may depend on his
or her characteristics. Often, correlation is the first step to understanding these
relationships and subsequently building better business and statistical models.
For example, "D" and "E" have a strong correlation of 0.93, which means that
when D moves in the positive direction, E is likely to move in that direction too.
Here, notice that the correlation of A with A is 1; of course, a variable is always
perfectly correlated with itself.

corr = df.corr(method='pearson')
corr

A B C D E F
A 1.000000 -0.691900 0.414978 -0.604574 -0.521985 -0.549244
B -0.691900 1.000000 -0.581516 0.487248 0.376735 0.540427
C 0.414978 -0.581516 1.000000 0.017023 0.055766 -0.175102
D -0.604574 0.487248 0.017023 1.000000 0.934396 0.681660
E -0.521985 0.376735 0.055766 0.934396 1.000000 0.647625
F -0.549244 0.540427 -0.175102 0.681660 0.647625 1.000000

In general, we would need to test our model. train_test_split is a function
in sklearn's model selection module for splitting data arrays into two subsets: one
for training data and one for testing data. With this function, you do not need to
divide the dataset manually. You can import the function with the following code:
from sklearn.model_selection import train_test_split.
By default, sklearn's train_test_split will make random partitions for the two subsets.
However, you can also specify a random state for the operation.
Here, take note that we will need to pass in the X and Y to the function. X refers
to the features while Y refers to the target of the dataset.
Y_POSITION = 5
model_1_features = [i for i in range(0, Y_POSITION)]
X = df.iloc[:, model_1_features]
Y = df.iloc[:, Y_POSITION]

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.20, random_state=2020)

10.1 Linear Regression

Linear regression is a basic predictive analytics technique that uses historical data to
predict an output variable. It is popular for predictive modeling because it is easily
understood and can be explained using plain English.
The basic idea is that if we can fit a linear regression model to observed data, we
can then use the model to predict any future values. For example, let us assume that
we have found from historical data that the price (P) of a house is linearly dependent
upon its size (S); in fact, we found that a house's price is exactly 90 times its size.
The equation will look like this: P = 90*S
With this model, we can then predict the price of any house. If we have a
house that is 1,500 square feet, we can calculate its price to be: P = 90*1500
= $135,000
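To make this concrete, here is a minimal sketch of that toy model; the 90-times rule is the illustrative assumption from the text above, not a fitted model:

def predict_price(size_sqft):
    # Toy model from the text: price is exactly 90 times the size
    return 90 * size_sqft

print(predict_price(1500))  # 135000, matching the worked example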

There are two kinds of variables in a linear regression model:


• The input or predictor variable is the variable(s) that help predict the value of the
output variable. It is commonly referred to as X.
• The output variable is the variable that we want to predict. It is commonly
referred to as Y.
To estimate Y using linear regression, we assume the equation: Ye = α + β X
where Ye is the estimated or predicted value of Y based on our linear equation. Our
goal is to find statistically significant values of the parameters α and β that minimize
the difference between Y and Ye . If we are able to determine the optimum values of
these two parameters, then we will have the line of best fit that we can use to predict
the values of Y, given the value of X. So, how do we estimate α and β? We can use
a method called ordinary least squares.

The objective of the least squares method is to find values of α and β that
minimize the sum of the squared difference between Y and Ye . We will not delve
into the mathematics of least squares in our book.
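We will not derive it here, but for the single-input case the least squares solution has a simple closed form: β = cov(X, Y) / var(X) and α = mean(Y) - β * mean(X). Below is a minimal NumPy sketch of this closed form; the choice of column E as the lone input is purely for illustration:

import numpy as np

x = X_train["E"].values  # single predictor, chosen for illustration
y = y_train.values

# Closed-form ordinary least squares for one input
# (ddof=1 so the variance matches np.cov's default normalization)
beta = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
alpha = y.mean() - beta * x.mean()
print(alpha, beta)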
In the results below, we notice that when E increases by 1, our Y increases by
0.175399. Also, when C increases by 1, our Y falls by 0.044160.

model1 = linear_model.LinearRegression()
model1.fit(X_train, y_train)
y_pred_train1 = model1.predict(X_train)
print("Regression")
print("================================")
# Note: mean_squared_error returns the MSE; take its square root for the true RMSE
RMSE_train1 = mean_squared_error(y_train, y_pred_train1)
print("Regression Train set: RMSE {}".format(RMSE_train1))
print("================================")
y_pred1 = model1.predict(X_test)
RMSE_test1 = mean_squared_error(y_test, y_pred1)
print("Regression Test set: RMSE {}".format(RMSE_test1))
print("================================")

coef_dict = {}
for coef, feat in zip(model1.coef_, model_1_features):
    coef_dict[df.columns[feat]] = coef

print(coef_dict)

Regression
================================
Regression Train set: RMSE 0.0027616933222892287
================================
Regression Test set: RMSE 0.0042098240263563754
================================
{'A': -0.0009337757382417014, 'B': 0.16012156890162915, 'C': -0.04416001542534971, 'D': 0.15217907817100398, 'E': 0.17539950794101034}

10.2 Decision Tree Regression

A decision tree arrives at an estimate by asking a series of questions of the data,
each question narrowing the possible values until the model gets confident enough
to make a single prediction. The order of the questions and their content are
determined by the model. In addition, the questions asked are all in a True/False
form.
This is a little tough to grasp because it is not how humans naturally think, and
perhaps the best way to show this difference is to build a real decision tree. Suppose
x1 and x2 are two features that allow us to make predictions for the
target variable y by asking True/False questions, as in the sketch below.
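As a minimal sketch (not the book's exact code), a decision tree regressor from sklearn can be fit on the same train/test split and scored like the linear model above; the max_depth value is an illustrative assumption, not a tuned choice:

from sklearn import tree
from sklearn.metrics import mean_squared_error

# A shallow tree keeps the sequence of True/False questions easy to inspect
model2 = tree.DecisionTreeRegressor(max_depth=3, random_state=2020)
model2.fit(X_train, y_train)

print("Decision Tree Train set: MSE {}".format(
    mean_squared_error(y_train, model2.predict(X_train))))
print("Decision Tree Test set: MSE {}".format(
    mean_squared_error(y_test, model2.predict(X_test))))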
The decision of making strategic splits heavily affects a tree's accuracy. The
decision criteria are different for classification and regression trees. Decision tree
regressors normally use mean squared error (MSE) to decide whether to split a node
into two or more sub-nodes. Suppose we are building a binary tree; the algorithm
will first pick a value and split the data into two subsets. For each subset, it will
calculate the MSE separately. The tree chooses the value that results in the smallest MSE.
Let us examine how splitting is decided for a decision tree regressor in more
detail. The first step in creating a tree is to create the first binary decision. How are
you going to do it? One way is sketched below.
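Here is a minimal sketch of that first binary decision, assuming a single feature; the choice of column E is purely for illustration. We scan candidate thresholds, compute the size-weighted MSE of the two resulting groups for each, and keep the threshold with the smallest value:

import numpy as np

def split_mse(x, y, threshold):
    # Size-weighted MSE of the two groups created by splitting x at threshold
    total = 0.0
    for group in (y[x <= threshold], y[x > threshold]):
        if len(group) > 0:
            # Each group is predicted by its own mean, so its MSE is the
            # average squared deviation from that mean
            total += len(group) * np.mean((group - group.mean()) ** 2)
    return total / len(y)

x = X_train["E"].values
y = y_train.values

# Candidate thresholds: midpoints between consecutive distinct values of E
values = np.unique(x)
candidates = (values[:-1] + values[1:]) / 2
best = min(candidates, key=lambda t: split_mse(x, y, t))
print("Best first split on E:", best)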
