
112 Machine Learning using Python

from statsmodels.sandbox.regression.predstd import wls_prediction_std

# Predict y and the low/high prediction interval values for the test set
pred_y = mba_salary_lm.predict(test_X)
_, pred_y_low, pred_y_high = wls_prediction_std(mba_salary_lm,
                                                test_X,
                                                alpha = 0.1)

# Store the predictions along with the interval bounds in a DataFrame
pred_y_df = pd.DataFrame({'grade_10_perc': test_X['Percentage in Grade 10'],
                          'pred_y': pred_y,
                          'pred_y_left': pred_y_low,
                          'pred_y_right': pred_y_high})

pred_y_df[0:10]

grade_10_perc pred_y pred_y_left pred_y_right


6 70.0 279828.402452 158379.832044 401276.972860
36 68.0 272707.227686 151576.715020 393837.740352
37 52.0 215737.829560 92950.942395 338524.716726
28 58.0 237101.353858 115806.869618 358395.838097
43 74.5 295851.045675 173266.083342 418436.008008
49 60.8 247070.998530 126117.560983 368024.436076
5 55.0 226419.591709 104507.444388 348331.739030
33 78.0 308313.101515 184450.060488 432176.142542
20 63.0 254904.290772 134057.999258 375750.582286
42 74.4 295494.986937 172941.528691 418048.445182

4.5 | MULTIPLE LINEAR REGRESSION

Multiple linear regression captures the association (relationship) between a dependent variable (aka response variable or outcome variable) and several independent variables (aka explanatory variables, predictor variables, or features).

Yi = b0 + b1X1i + b2X2i + … + bkXki + ei
The regression coefficients b1, b2, …, bk are called partial regression coefficients, since the relationship between an explanatory variable and the response (outcome) variable is calculated after removing (or controlling for) the effect of all the other explanatory variables (features) in the model.
The assumptions that are made in the multiple linear regression model are as follows:

1. The regression model is linear in the regression parameters (b-values).
2. The residuals follow a normal distribution and the expected value (mean) of the residuals is zero.
3. In time series data, residuals are assumed to be uncorrelated.

Chapter 04_Linear Regression.indd 112 4/24/2019 6:54:46 PM


Chapter 4 • Linear Regression 113

4. The variance of the residuals is constant for all values of Xi. When the variance of the residuals
is constant for different values of Xi, it is called homoscedasticity. A non-constant variance of
residuals is called heteroscedasticity.
5. There is no high correlation between independent variables in the model (called multi-collinearity).
Multi-collinearity can destabilize the model and can result in an incorrect estimation of the
regression parameters.
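These assumptions can be checked numerically once a model is fitted. The sketch below is illustrative only: it runs on synthetic residuals rather than on a fitted model's, and uses the Jarque-Bera test for normality plus a hand-computed Durbin-Watson statistic (a value near 2 suggests no autocorrelation).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Synthetic, well-behaved residuals standing in for a model's residuals
residuals = rng.normal(loc=0.0, scale=1.0, size=200)

# Assumption 2: residuals follow a normal distribution with mean zero
mean_resid = float(np.mean(residuals))
jb_stat, jb_pvalue = stats.jarque_bera(residuals)

# Assumption 3: no autocorrelation -- Durbin-Watson statistic (~2 is ideal)
dw = float(np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2))

print(round(mean_resid, 3), round(dw, 2))
```

For a real model fitted with statsmodels, the residuals are available as the resid attribute of the fitted result, and the summary output already reports the Durbin-Watson and Jarque-Bera statistics.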

The partial regression coefficients are estimated by minimizing the sum of squared errors (SSE). We will
explain the multiple linear regression model by using the example of auction pricing of players in the
Indian Premier League (IPL).
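Minimizing SSE has a closed-form solution, the normal equations b = (XᵀX)⁻¹Xᵀy. The following sketch fits a two-feature model on synthetic data (all variable names here are illustrative, not from the IPL dataset) and recovers the true coefficients:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
X1 = rng.uniform(0, 10, n)
X2 = rng.uniform(0, 5, n)
# True model: Y = 2 + 3*X1 - 1.5*X2 + e
y = 2 + 3 * X1 - 1.5 * X2 + rng.normal(0, 0.1, n)

# Design matrix with a leading column of ones for the intercept b0
X = np.column_stack([np.ones(n), X1, X2])

# Normal equations: solve (X'X) b = X'y for the coefficient vector b
b = np.linalg.solve(X.T @ X, X.T @ y)
print(np.round(b, 2))  # close to [2.0, 3.0, -1.5]
```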

4.5.1 | Predicting the SOLD PRICE (Auction Price) of Players

The Indian Premier League (IPL) is a professional Twenty20 cricket league in India, in which international players and popular Indian players are auctioned.


Although the IPL follows the Twenty20 format of the game, it is possible that the performance of the players in the other
formats of the game, such as Test and One-Day matches, could influence player pricing. A few players
had excellent records in Test matches, but their records in Twenty20 matches were not very impressive.
The performance of the players up to 2011, measured
through various performance metrics, is provided in Table 4.3.

TABLE 4.3 Metadata of IPL dataset


Data Code Description
AGE Age of the player at the time of auction, classified into three categories. Category 1 (L25) means the player is less
than 25 years old, category 2 means that the age is between 25 and 35 years (B25−35), and category 3 means
that the age is more than 35 (A35).
RUNS-S Number of runs scored by a player.
RUNS-C Number of runs conceded by a player.
HS Highest score by a batsman in IPL.
AVE-B Average runs scored by a batsman in IPL.
AVE-BL Bowling average (number of runs conceded/number of wickets taken) in IPL.
SR-B Batting strike rate (ratio of the number of runs scored to the number of balls faced) in IPL.
SR-BL Bowling strike rate (ratio of the number of balls bowled to the number of wickets taken) in IPL.
SIXERS Number of sixes (six runs) scored by a player in IPL.
WKTS Number of wickets taken by a player in IPL.
ECON Economy rate of a bowler (number of runs conceded by the bowler per over) in IPL.
CAPTAINCY EXP Captained either a T20 team or a national team.
ODI-SR-B Batting strike rate in One-Day Internationals.
ODI-SR-BL Bowling strike rate in One-Day Internationals.
ODI-RUNS-S Runs scored in One-Day Internationals.
ODI-WKTS Wickets taken in One-Day Internationals.
T-RUNS-S Runs scored in Test matches.
T-WKTS Wickets taken in Test matches.
PLAYER-SKILL Player’s primary skill (batsman, bowler, or allrounder).
COUNTRY Country of origin of the player (AUS: Australia; IND: India; PAK: Pakistan; SA: South Africa; SL: Sri Lanka; NZ: New
Zealand; WI: West Indies; OTH: Other countries).
YEAR-A Year of Auction in IPL.
IPL TEAM Team(s) for which the player had played in the IPL (CSK: Chennai Super Kings; DC: Deccan Chargers; DD: Delhi
Daredevils; KXI: Kings XI Punjab; KKR: Kolkata Knight Riders; MI: Mumbai Indians; PWI: Pune Warriors India; RR: Rajasthan
Royals; RCB: Royal Challengers Bangalore). A + sign is used to indicate that the player has played for more than one
team. For example, CSK+ would mean that the player has played for CSK as well as for one or more other teams.

4.5.2 | Developing Multiple Linear Regression Model Using Python


In this section, we will discuss the various steps involved in developing a multiple linear regression model using Python.

4.5.2.1 Loading the Dataset


We load the file using pandas and print the metadata.

ipl_auction_df = pd.read_csv('IPL IMB381IPL2013.csv')

ipl_auction_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 130 entries, 0 to 129
Data columns (total 26 columns):
Sl.NO.           130 non-null int64
PLAYER NAME      130 non-null object
AGE              130 non-null int64
COUNTRY          130 non-null object
TEAM             130 non-null object
PLAYING ROLE     130 non-null object
T-RUNS           130 non-null int64
T-WKTS           130 non-null int64
ODI-RUNS-S       130 non-null int64
ODI-SR-B         130 non-null float64
ODI-WKTS         130 non-null int64
ODI-SR-BL        130 non-null float64
CAPTAINCY EXP    130 non-null int64
RUNS-S           130 non-null int64
HS               130 non-null int64
AVE              130 non-null float64
SR-B             130 non-null float64
SIXERS           130 non-null int64
RUNS-C           130 non-null int64
WKTS             130 non-null int64
AVE-BL           130 non-null float64
ECON             130 non-null float64
SR-BL            130 non-null float64
AUCTION YEAR     130 non-null int64
BASE PRICE       130 non-null int64
SOLD PRICE       130 non-null int64
dtypes: float64(7), int64(15), object(4)
memory usage: 26.5+ KB
There are 130 observations (records) and 26 columns (features) in the data, and there are no missing
values.
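The missing-value claim can be verified with pandas' isnull(). A self-contained sketch on a toy DataFrame (the real check would be ipl_auction_df.isnull().sum(), which returns all zeros for this dataset):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for ipl_auction_df; one missing value is planted
df = pd.DataFrame({'RUNS-S': [100, 250, 0],
                   'WKTS': [5.0, np.nan, 12.0]})

missing_per_column = df.isnull().sum()   # Series: count of NaNs per column
total_missing = int(missing_per_column.sum())
print(total_missing)  # 1
```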

4.5.2.2 Displaying the First Five Records


As the number of columns is very large, we will display only the initial 10 columns for the first 5 rows. The
function df.iloc() is used for displaying a subset of the dataset.

ipl_auction_df.iloc[0:5, 0:10]

Sl. NO. PLAYER NAME AGE COUNTRY TEAM PLAYING ROLE T-RUNS T-WKTS ODI-RUNS-S ODI-SR-B
0 1 Abdulla, YA 2 SA KXIP Allrounder 0 0 0 0.00
1 2 Abdur Razzak 2 BAN RCB Bowler 214 18 657 71.41
2 3 Agarkar, AB 2 IND KKR Bowler 571 58 1269 80.62
3 4 Ashwin, R 1 IND CSK Bowler 284 31 241 84.56
4 5 Badrinath, S 2 IND CSK Batsman 63 0 79 45.93

We can build a model to understand what features of players are influencing their SOLD PRICE, or to
predict players' auction prices in the future. However, not all columns are features. For example, Sl. NO.
is just a serial number and cannot be considered a feature of the player. We will build a model using only
the players' statistics, so BASE PRICE can also be removed. We will create a variable X_features, which will
contain the list of features that we will finally use for building the model, and ignore the rest of the columns
of the DataFrame. The following code is used for including the features in the model building.

X_features = ipl_auction_df.columns

Most of the features in the dataset are numerical (ratio scale), whereas features such as AGE, COUNTRY,
PLAYING ROLE, and CAPTAINCY EXP are categorical and hence need to be encoded before building the
model. Categorical variables cannot be directly included in the regression model; they must be
encoded using dummy variables before being incorporated into the model building.

X_features = ['AGE', 'COUNTRY', 'PLAYING ROLE',
              'T-RUNS', 'T-WKTS', 'ODI-RUNS-S', 'ODI-SR-B',
              'ODI-WKTS', 'ODI-SR-BL', 'CAPTAINCY EXP', 'RUNS-S',
              'HS', 'AVE', 'SR-B', 'SIXERS', 'RUNS-C', 'WKTS',
              'AVE-BL', 'ECON', 'SR-BL']

4.5.3 | Encoding Categorical Features


Qualitative variables or categorical variables need to be encoded using dummy variables before incorporating
them in the regression model. If a categorical variable has n categories (e.g., the player role in the data has four
categories, namely, batsman, bowler, wicket-keeper, and allrounder), then we will need n − 1 dummy variables.
So, in the case of PLAYING ROLE, we will need three dummy variables since there are four categories.
Finding the unique values of the column PLAYING ROLE shows the values Allrounder, Bowler, Batsman,
and W. Keeper:

ipl_auction_df['PLAYING ROLE'].unique()

array(['Allrounder', 'Bowler', 'Batsman', 'W. Keeper'], dtype=object)


The variable can be converted into four dummy variables, with the variable value set to 1 to indicate the role
of the player. This can be done using the pd.get_dummies() method. We will first create dummy variables for only
PLAYING ROLE to understand the encoding, and then create dummy variables for the rest of the categorical variables.

pd.get_dummies(ipl_auction_df['PLAYING ROLE'])[0:5]

Allrounder Batsman Bowler W. Keeper


0 1.0 0.0 0.0 0.0
1 0.0 0.0 1.0 0.0

2 0.0 0.0 1.0 0.0
3 0.0 0.0 1.0 0.0
4 0.0 1.0 0.0 0.0

As shown in the table above, the pd.get_dummies() method has created four dummy variables and has
set the appropriate dummy variable to 1 in each sample.
Whenever we have n levels (or categories) for a qualitative variable (categorical variable), we will use
(n − 1) dummy variables, where each dummy variable is a binary variable used for representing whether
an observation belongs to a category or not. The reason why we create only (n − 1) dummy variables
is that inclusion of dummy variables for all categories along with the constant in the regression equation will
create perfect multi-collinearity (will be discussed later). To drop one category, the parameter drop_first
should be set to True.
We must create dummy variables for all categorical (qualitative) variables present in the dataset.
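The effect of drop_first can be seen on a toy Series containing the four PLAYING ROLE categories (a minimal sketch, not the full dataset):

```python
import pandas as pd

roles = pd.Series(['Allrounder', 'Bowler', 'Batsman', 'W. Keeper'],
                  name='PLAYING ROLE')

full = pd.get_dummies(roles)                      # n = 4 dummy columns
reduced = pd.get_dummies(roles, drop_first=True)  # n - 1 = 3 dummy columns

# get_dummies orders categories alphabetically, so 'Allrounder' is dropped
print(full.shape[1], reduced.shape[1])
print(list(reduced.columns))
```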

categorical_features = ['AGE', 'COUNTRY', 'PLAYING ROLE', 'CAPTAINCY EXP']

ipl_auction_encoded_df = pd.get_dummies(ipl_auction_df[X_features],
                                        columns = categorical_features,
                                        drop_first = True)

ipl_auction_encoded_df.columns

Index(['T-RUNS', 'T-WKTS', 'ODI-RUNS-S', 'ODI-SR-B', 'ODI-WKTS',
       'ODI-SR-BL', 'RUNS-S', 'HS', 'AVE', 'SR-B', 'SIXERS',
       'RUNS-C', 'WKTS', 'AVE-BL', 'ECON', 'SR-BL', 'AGE_2',
       'AGE_3', 'COUNTRY_BAN', 'COUNTRY_ENG', 'COUNTRY_IND',
       'COUNTRY_NZ', 'COUNTRY_PAK', 'COUNTRY_SA', 'COUNTRY_SL',
       'COUNTRY_WI', 'COUNTRY_ZIM', 'PLAYING ROLE_Batsman',
       'PLAYING ROLE_Bowler', 'PLAYING ROLE_W. Keeper',
       'CAPTAINCY EXP_1'],
      dtype='object')
The dataset now contains the new dummy variables that have been created. We can reassign the new features
to the variable X_features, which we created earlier to keep track of all features that will be used to build
the model finally.

X_features = ipl_auction_encoded_df.columns

4.5.4 | Splitting the Dataset into Train and Validation Sets


Before building the model, we will split the dataset in an 80:20 ratio. The split function allows a
parameter random_state, which seeds the random number generator for reproducibility. This parameter is
not required to be passed; setting it to a fixed number will make sure that the records that go
into the training and test sets remain unchanged across runs, and hence the results can be reproduced. We will use the
value 42 (it is again selected arbitrarily). You can use the same random seed of 42 to reproduce the results
reported here; a different seed may produce different results.
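The effect of random_state can be demonstrated on toy arrays (illustrative data, not the IPL dataset): two calls with the same seed return identical splits.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Two splits with the same random_state produce the same partition
Xtr1, Xte1, ytr1, yte1 = train_test_split(X, y, train_size=0.8, random_state=42)
Xtr2, Xte2, ytr2, yte2 = train_test_split(X, y, train_size=0.8, random_state=42)

print(np.array_equal(Xtr1, Xtr2) and np.array_equal(yte1, yte2))  # True
```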

import statsmodels.api as sm
from sklearn.model_selection import train_test_split

X = sm.add_constant(ipl_auction_encoded_df)
Y = ipl_auction_df['SOLD PRICE']
train_X, test_X, train_y, test_y = train_test_split(X,
                                                    Y,
                                                    train_size = 0.8,
                                                    random_state = 42)

4.5.5 | Building the Model on the Training Dataset

We build the model using the OLS method in statsmodels on the training dataset. The summary2() method
provides details of the model accuracy, feature significance, and signs of any multi-collinearity effect,
which is discussed in detail in the next section.

ipl_model_1 = sm.OLS(train_y, train_X).fit()


ipl_model_1.summary2()

TABLE 4.4 Model summary for ipl_model_1


Model: OLS Adj. R-squared: 0.362
Dependent Variable: SOLD PRICE AIC: 2965.2841
Date: 2018-04-08 07:27 BIC: 3049.9046
No. Observations: 104 Log-Likelihood: −1450.6
Df Model: 31 F-statistic: 2.883
Df Residuals: 72 Prob (F-statistic): 0.000114
R-squared: 0.554 Scale: 1.1034e+11

Coef. Std.Err. t P > |t| [0.025 0.975]


const 375827.1991 228849.9306 1.6422 0.1049 −80376.7996 832031.1978
T-RUNS −53.7890 32.7172 −1.6441 0.1045 −119.0096 11.4316
T-WKTS −132.5967 609.7525 −0.2175 0.8285 −1348.1162 1082.9228
ODI-RUNS-S 57.9600 31.5071 1.8396 0.0700 −4.8482 120.7681
ODI-SR-B −524.1450 1576.6368 −0.3324 0.7405 −3667.1130 2618.8231
ODI-WKTS 815.3944 832.3883 0.9796 0.3306 −843.9413 2474.7301
ODI-SR-BL −773.3092 1536.3334 −0.5033 0.6163 −3835.9338 2289.3154
RUNS-S 114.7205 173.3088 0.6619 0.5101 −230.7643 460.2054
HS −5516.3354 2586.3277 −2.1329 0.0363 −10672.0855 −360.5853
AVE 21560.2760 7774.2419 2.7733 0.0071 6062.6080 37057.9439
SR-B −1324.7218 1373.1303 −0.9647 0.3379 −4062.0071 1412.5635
SIXERS 4264.1001 4089.6000 1.0427 0.3006 −3888.3685 12416.5687
RUNS-C 69.8250 297.6697 0.2346 0.8152 −523.5687 663.2187
WKTS 3075.2422 7262.4452 0.4234 0.6732 −11402.1778 17552.6622
AVE-BL 5182.9335 10230.1581 0.5066 0.6140 −15210.5140 25576.3810
ECON −6820.7781 13109.3693 −0.5203 0.6045 −32953.8282 19312.2721
SR-BL −7658.8094 14041.8735 −0.5454 0.5871 −35650.7726 20333.1539
AGE_2 −230767.6463 114117.2005 −2.0222 0.0469 −458256.1279 −3279.1648
AGE_3 −216827.0808 152246.6232 −1.4242 0.1587 −520325.1772 86671.0155
COUNTRY_BAN −122103.5196 438719.2796 −0.2783 0.7816 −996674.4194 752467.3801
COUNTRY_ENG 672410.7654 238386.2220 2.8207 0.0062 197196.5172 1147625.0135
COUNTRY_IND 155306.4011 126316.3449 1.2295 0.2229 −96500.6302 407113.4325
COUNTRY_NZ 194218.9120 173491.9293 1.1195 0.2667 −151630.9280 540068.7521
COUNTRY_PAK 75921.7670 193463.5545 0.3924 0.6959 −309740.7804 461584.3143
COUNTRY_SA 64283.3894 144587.6773 0.4446 0.6579 −223946.8775 352513.6563
COUNTRY_SL 17360.1530 176333.7497 0.0985 0.9218 −334154.7526 368875.0586
COUNTRY_WI 10607.7792 230686.7892 0.0460 0.9635 −449257.9303 470473.4887
COUNTRY_ZIM −145494.4793 401505.2815 −0.3624 0.7181 −945880.6296 654891.6710
PLAYING ROLE_Batsman 75724.7643 150250.0240 0.5040 0.6158 −223793.1844 375242.7130
PLAYING ROLE_Bowler 15395.8752 126308.1272 0.1219 0.9033 −236394.7744 267186.5249
PLAYING ROLE_W. Keeper −71358.6280 213585.7444 –0.3341 0.7393 −497134.0278 354416.7718
CAPTAINCY EXP_1 164113.3972 123430.6353 1.3296 0.1878 −81941.0772 410167.8716

Omnibus: 0.891 Durbin-Watson: 2.244


Prob(Omnibus): 0.640 Jarque-Bera (JB): 0.638
Skew: 0.190 Prob(JB): 0.727
Kurtosis: 3.059 Condition No.: 84116

Going by the p-values (< 0.05), only the features
HS, AGE_2, AVE, and COUNTRY_ENG have come out significant. The model says that none of the other
features are influencing SOLD PRICE (at a significance value of 0.05). This is not very intuitive and could
be a result of the multi-collinearity effect of the variables.
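A standard diagnostic for multi-collinearity is the variance inflation factor (VIF), available in statsmodels. The sketch below is illustrative, using synthetic features rather than the IPL data; x2 is built as a near copy of x1, so both get a large VIF, while the unrelated x3 stays near 1.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(7)
n = 200
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + rng.normal(scale=0.1, size=n)  # nearly a copy of x1
x3 = rng.normal(size=n)                          # unrelated feature

X = pd.DataFrame({'x1': x1, 'x2': x2, 'x3': x3})
vifs = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]

# High VIF (often > 4 or > 10 as a rule of thumb) flags collinear features
print([round(v, 1) for v in vifs])
```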

4.5.6 | Multi-Collinearity and Handling Multi-Collinearity


When the dataset has a large number of independent variables (features), it is possible that a few of these
independent variables (features) may be highly correlated. The existence of a high correlation between