Exercise 4: Simple and Multiple Linear Regression Analysis

This document presents an exercise on simple and multiple linear regression analysis using temperature data from three locations in Sweden (Falun, Gävle, and Knon) for November 1977. Simple linear regression is used to estimate missing temperature values for Falun from the other two locations. The correlation between Falun and each candidate predictor is calculated, and the highest correlation is found for a distance-weighted combination of the Gävle and Knon data. The regression coefficients are then calculated for the model based on this combined dataset, giving an R² value of 0.987, i.e. the model explains 98.7% of the variance in the observed Falun temperatures.

Uploaded by

Bikas C. Bhattarai


9/26/2020 Jupyter Notebook Viewer


In [1]:

%%html
<style>
table {float:left}
</style>

Exercise 4: Simple and multiple linear regression analysis
Name: Bikas Chandra Bhattarai

Date: 28 September, 2015

1. Simple regression

Temperature data for a certain month (November 1977) is available from Falun (Dalarna), Gävle (Gästrikland) and
Knon (Värmland) (file: temp_falun.txt). For Falun the data series is not complete.

We want to fill the missing data for Falun using the best correlated data set of the three possible data sets:

1. Only the data from Gävle

2. Only the data from Knon
3. Both Gävle and Knon and the information about distances (Gävle-Falun = 82 km, Knon-Falun = 110 km)

Question 1: Compute the correlation between Falun and (1), (2) and (3), and
determine which one shall be used as the independent variable.

In [2]:

# future import first, so that 1/82 is true division under Python 2 as well
from __future__ import division

# this allows plots to appear directly in the notebook
%matplotlib inline
import pandas as pd
import numpy as np
import scipy.stats
import matplotlib.pyplot as plt

https://nbviewer.jupyter.org/github/bikasbhattarai/Course-work-and-data-analysis/blob/master/Hydrology-Course/GEO4310_20… 1/15

In [3]:

df_temp = pd.read_table('temp_falun.dat') # reading the table; read_table already returns a dataframe

df_temp.head(2) # showing the first 2 rows of the dataframe

Out[3]:

Day T_Falun T_Gavle T_Knon

0 1 8.2 9.0 6.5

1 2 6.4 7.8 4.8

The third dataset is calculated from the T_Gavle and T_Knon temperatures by inverse distance weighting, with weights
proportional to the inverse of the squared distance to Falun.

The calculation is given by equation 1:
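$$T_{\text{Gävle+Knon}} = \frac{\left(\frac{1}{82}\right)^2}{\left(\frac{1}{82}\right)^2 + \left(\frac{1}{110}\right)^2}\, T_{\text{Gävle}} + \frac{\left(\frac{1}{110}\right)^2}{\left(\frac{1}{82}\right)^2 + \left(\frac{1}{110}\right)^2}\, T_{\text{Knon}} \qquad (1)$$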
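The weighting in equation 1 can be sketched as a small stand-alone function. This is only an illustration: `idw_combine` and its `power` parameter are hypothetical names that do not appear in the notebook, and the inputs are the day 1 and day 2 values of T_Gavle and T_Knon from the table above.

```python
def idw_combine(values, distances_km, power=2):
    """Inverse-distance-weighted average of station values (hypothetical helper)."""
    weights = [(1.0 / d) ** power for d in distances_km]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total

# Day 1: T_Gavle = 9.0, T_Knon = 6.5; distances to Falun are 82 km and 110 km
day1 = idw_combine([9.0, 6.5], [82, 110])
# Day 2: T_Gavle = 7.8, T_Knon = 4.8
day2 = idw_combine([7.8, 4.8], [82, 110])
print(round(day1, 2), round(day2, 2))  # 8.11 6.73
```

With these distances Gävle receives a weight of about 0.64 and Knon about 0.36, so the combined series stays closer to the nearer station.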

In [4]:

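# Calculating the third dataset with equation 1 and inserting it into the dataframe
# (assumption: the Knon term and the rounding are reconstructed from equation 1 and the printed output)
df_temp['T_Galve_Knon'] = (((1/82)**2/((1/82)**2+(1/110)**2))*df_temp['T_Gavle'] + ((1/110)**2/((1/82)**2+(1/110)**2))*df_temp['T_Knon']).round(2)
df_temp.head(2) # showing the first 2 rows of the dataframe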

Out[4]:

Day T_Falun T_Gavle T_Knon T_Galve_Knon

0 1 8.2 9.0 6.5 8.11

1 2 6.4 7.8 4.8 6.73

In [5]:

# printing the last 10 rows of the dataset

df_temp.tail(10)

Out[5]:

Day T_Falun T_Gavle T_Knon T_Galve_Knon

20 21 -1 -2.0 -3.1 -2.39

21 22 NaN 0.1 -1.1 -0.33

22 23 NaN -6.2 -5.4 -5.91

23 24 NaN 0.5 -1.9 -0.36

24 25 NaN -1.9 -3.7 -2.54

25 26 NaN -6.4 -10.7 -7.94

26 27 NaN -7.6 -14.9 -10.21

27 28 NaN 0.9 -5.8 -1.49

28 29 NaN 0.5 -10.0 -3.25

29 30 NaN -1.2 -14.2 -5.84

The table above shows that there are no temperature observations for Falun on days 22 to 30, so T_Falun is NaN for
these days.


The correlations and regression parameters are therefore calculated only from days 1 to 21, the period for which all
three stations have observations. Including later days would make the series not comparable to the data from Falun.

In [6]:

# removing the last 9 rows (days 22-30) and the Day column from the dataset

df_temp_21 = df_temp[:-9].drop(columns='Day') # [:-9] drops the last 9 rows; drop removes the Day column

In [7]:

# calculating the correlation between each pair of temperature datasets

df_corr = df_temp_21.corr()
print(df_corr.round(3))

T_Falun T_Gavle T_Knon T_Galve_Knon

T_Falun 1.000 0.984 0.970 0.993
T_Gavle 0.984 1.000 0.937 0.991
T_Knon 0.970 0.937 1.000 0.976
T_Galve_Knon 0.993 0.991 0.976 1.000

Conclusion:

The correlation coefficient between $T_{\text{Falun}}$ and $T_{\text{Gävle+Knon}}$ ($r_{x,y} = 0.9932$) is the highest
of the three, so this combined series has the strongest linear relationship with Falun. Therefore
$T_{\text{Gävle+Knon}}$ should be used as the independent variable to calculate the temperature in Falun.
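The coefficients in the matrix above are Pearson correlation coefficients. A minimal sketch of how one such coefficient could be computed by hand, using only the days where both series are observed, is given here. `pearson_r` is a hypothetical helper and the numbers are illustrative values, not the exercise data.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation using only positions where both series are observed."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ok = ~(np.isnan(x) | np.isnan(y))          # keep complete pairs only
    dx = x[ok] - x[ok].mean()
    dy = y[ok] - y[ok].mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Illustrative values only (not the exercise data); the last two targets are missing
target = [8.0, 6.5, 3.0, -1.0, np.nan, np.nan]
predictor = [8.1, 6.7, 2.5, -2.4, -0.3, -5.9]
r = pearson_r(target, predictor)
print(round(r, 3))
```

Masking the incomplete pairs has the same effect as restricting the analysis to days 1 to 21 before calling `corr()`.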

Question 2: Calculate the regression coefficients and how much of the variance is
explained by the regression model, i.e. the R² values.

The simple linear regression equation can be written in the following form:

$$y = a + bx \qquad (2)$$

where

a is the intercept

b is the coefficient (slope) for x

Together, a and b are called the regression coefficients. They can be calculated with the Python function used
below:


In [8]:

# Importing the statistical model

import statsmodels.formula.api as smf

# create a fitted model with T_Falun as dependent variable and T_Gavle as independent variable
fg = smf.ols(formula='T_Falun ~ T_Gavle', data=df_temp_21).fit()

#print summary statistics

print(fg.summary())

OLS Regression Results

==========================================================================
Dep. Variable: T_Falun R-squared: 0
Model: OLS Adj. R-squared: 0
Method: Least Squares F-statistic: 5
Date: Mon, 28 Sep 2015 Prob (F-statistic): 1.36
Time: 11:53:40 Log-Likelihood: -25
No. Observations: 21 AIC: 5
Df Residuals: 19 BIC: 5
Df Model: 1
Covariance Type: nonrobust
==========================================================================
coef std err t P>|t| [95.0% Conf.
-------------------------------------------------------------------------
Intercept -0.3989 0.222 -1.800 0.088 -0.863 0
T_Gavle 0.9292 0.039 23.759 0.000 0.847
==========================================================================
Omnibus: 2.569 Durbin-Watson:
Prob(Omnibus): 0.277 Jarque-Bera (JB):
Skew: -0.672 Prob(JB): 0
Kurtosis: 3.031 Cond. No.
==========================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is co

So equation (2) becomes:

y = −0.3989 + 0.9292 ∗ x

with a coefficient of determination of R² = 0.967.
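For reference, the intercept, slope and R² of a simple regression also follow from closed-form least-squares formulas. The sketch below shows those formulas with NumPy; `simple_ols` is a hypothetical helper, and the data are illustrative values (not the exercise data), so the printed numbers are not the coefficients above.

```python
import numpy as np

def simple_ols(x, y):
    """Return intercept a, slope b and R^2 for y = a + b*x (least squares)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    b = (dx * (y - y.mean())).sum() / (dx ** 2).sum()   # slope = Sxy / Sxx
    a = y.mean() - b * x.mean()                         # line passes through the means
    resid = y - (a + b * x)
    r2 = 1.0 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()
    return a, b, r2

# Illustrative values only (not the exercise data): points lying near y = 1 + 2x
x_demo = [0.0, 1.0, 2.0, 3.0, 4.0]
y_demo = [1.1, 2.9, 5.2, 6.8, 9.0]
a, b, r2 = simple_ols(x_demo, y_demo)
print(round(a, 3), round(b, 3), round(r2, 4))
```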

Linear regression equation for T_Falun as dependent variable and T_Knon as independent variable


In [9]:

# create a fitted model with T_Falun as dependent variable and T_Knon as independent variable
fg = smf.ols(formula='T_Falun ~ T_Knon', data=df_temp_21).fit()
#print summary statistics
print(fg.summary())

OLS Regression Results

==========================================================================
Dep. Variable: T_Falun R-squared: 0
Model: OLS Adj. R-squared: 0
Method: Least Squares F-statistic: 3
Date: Mon, 28 Sep 2015 Prob (F-statistic): 4.04
Time: 11:53:40 Log-Likelihood: -3
No. Observations: 21 AIC: 6
Df Residuals: 19 BIC: 7
Df Model: 1
Covariance Type: nonrobust
==========================================================================
coef std err t P>|t| [95.0% Conf.
-------------------------------------------------------------------------
Intercept 1.2902 0.262 4.930 0.000 0.742
T_Knon 0.8280 0.048 17.376 0.000 0.728 0
==========================================================================
Omnibus: 4.658 Durbin-Watson:
Prob(Omnibus): 0.097 Jarque-Bera (JB): 2
Skew: -0.860 Prob(JB): 0
Kurtosis: 3.500 Cond. No.
==========================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is co

With T_Falun as dependent variable and T_Knon as independent variable, the linear regression equation becomes:

y = 1.2902 + 0.8280 ∗ x

with a coefficient of determination of R² = 0.941.

Linear regression equation for T_Falun as dependent variable and T_Galve_Knon as independent variable


In [10]:

# create a fitted model with T_Falun as dependent variable and T_Galve_Knon as independent variable

fg = smf.ols(formula='T_Falun ~ T_Galve_Knon', data=df_temp_21).fit()
#print summary statistics
print(fg.summary())

OLS Regression Results

==========================================================================
Dep. Variable: T_Falun R-squared: 0
Model: OLS Adj. R-squared: 0
Method: Least Squares F-statistic:
Date: Mon, 28 Sep 2015 Prob (F-statistic): 3.00
Time: 11:53:40 Log-Likelihood: -16
No. Observations: 21 AIC: 3
Df Residuals: 19 BIC: 3
Df Model: 1
Covariance Type: nonrobust
==========================================================================
coef std err t P>|t| [95.0% Conf
-------------------------------------------------------------------------
Intercept 0.1841 0.134 1.369 0.187 -0.097
T_Galve_Knon 0.9176 0.025 37.353 0.000 0.866
==========================================================================
Omnibus: 0.498 Durbin-Watson:
Prob(Omnibus): 0.780 Jarque-Bera (JB): 0
Skew: 0.103 Prob(JB): 0
Kurtosis: 3.059 Cond. No.
==========================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is co

With T_Falun as dependent variable and T_Galve_Knon as independent variable, the linear regression equation
becomes:

y = 0.1841 + 0.9176 ∗ x

with a coefficient of determination of R² = 0.987.

In summary:

Table 1

| Dependent variable | Independent variable | Linear regression equation | R² |
|---|---|---|---|
| T_Falun | T_Gavle | y = −0.3989 + 0.9292 ∗ x | 0.967 |
| T_Falun | T_Knon | y = 1.2902 + 0.8280 ∗ x | 0.941 |
| T_Falun | T_Galve_Knon | y = 0.1841 + 0.9176 ∗ x | 0.987 |

Conclusion:

Table 1 shows that the coefficient of determination is highest (R² = 0.987) for the linear regression model with
T_Falun as dependent variable and T_Galve_Knon as independent variable, so this model should be used for
predicting the missing temperatures at the Falun station.
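The R² values in Table 1 can be cross-checked against the t statistics printed in the three summaries. In a simple regression the F statistic equals the squared t value of the slope, and R² = F / (F + residual degrees of freedom). A short sketch follows; `r2_from_t` is a hypothetical helper name.

```python
def r2_from_t(t_slope, df_resid):
    """R^2 of a simple regression from the slope's t statistic (F = t^2)."""
    f_stat = t_slope ** 2
    return f_stat / (f_stat + df_resid)

# t statistics of the slope as printed in the three summaries; Df Residuals = 19
checks = {}
for name, t_slope in [("T_Gavle", 23.759), ("T_Knon", 17.376), ("T_Galve_Knon", 37.353)]:
    checks[name] = round(r2_from_t(t_slope, 19), 3)
    print(name, checks[name])
# prints 0.967, 0.941 and 0.987, matching Table 1
```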

Question 3: Test the significance of the regression coefficients


Based on Table 1, the selected regression model is y = 0.1841 + 0.9176 ∗ x, because it has the highest
coefficient of determination.

Table 2

| Coefficient | t | p-value | 95% confidence interval |
|---|---|---|---|
| a | 1.369 | 0.187 | −0.097 to 0.466 |
| b | 37.353 | 0.000 | 0.866 to 0.969 |

Hypothesis test for a

The hypotheses for testing whether the coefficient is significantly different from zero are formulated as follows:

H0 : a = 0

Ha : a ≠ 0

All the statistics required for this test are taken from the summary of the best-fit model and are shown in Table 2,
where t is the calculated t value and the p-value is the corresponding probability.

Testing approach 1:

Based on the critical t value

If |t| > t_critical, then H0 is rejected

From the table, $t_{critical} = t_{1-\alpha/2;\,n-2} = t_{0.975;\,19} = 2.093$ (n = 21 observations, α = 0.05)

Here |t| = 1.369 is smaller than t_critical = 2.093, so H0 is not rejected

Approach 2:

Based on the p-value

If p < α, then H0 is rejected

Here the p-value (0.187) is not smaller than α = 0.05, so H0 is not rejected


Approach 3:

Based on the confidence interval

The 95% confidence interval (−0.097 to 0.466) contains the value 0, so H0 is not rejected

Conclusion

All three tests show that H0 is not rejected: the value of a is not significantly different from 0 at the 5%
significance level.

Hypothesis test for b

H0 : b = 0

Ha : b ≠ 0

Testing approach 1:

Based on the critical t value

If |t| > t_critical, then H0 is rejected

From the table, $t_{critical} = t_{1-\alpha/2;\,n-2} = t_{0.975;\,19} = 2.093$

and |t| = 37.353

Here |t| is greater than t_critical, so H0 is rejected

Approach 2:

Based on the p-value

If p < α, then H0 is rejected

Here the p-value (0.000) is smaller than α = 0.05, so H0 is rejected

Approach 3:

Based on the confidence interval

The 95% confidence interval (0.866 to 0.969) does not contain the value 0, so H0 is rejected

Conclusion

All three tests show that H0 is rejected: the value of b is significantly different from 0 at the 5% significance
level.
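The critical t value and the two-sided p-values used in the tests above can be reproduced with `scipy.stats`. The sketch below assumes 19 residual degrees of freedom (the "Df Residuals" entry of the summary) and takes the t values from Table 2; the variable names are illustrative.

```python
from scipy import stats

alpha = 0.05
df_resid = 19                      # Df Residuals: 21 observations minus 2 coefficients
t_crit = stats.t.ppf(1 - alpha / 2, df_resid)

# t values of the selected model as listed in Table 2
results = {}
for name, t_value in [("a", 1.369), ("b", 37.353)]:
    p_value = 2 * stats.t.sf(abs(t_value), df_resid)   # two-sided p-value
    reject = abs(t_value) > t_crit
    results[name] = (p_value, reject)
    print(name, round(p_value, 3), "reject H0" if reject else "do not reject H0")

print(round(t_crit, 3))  # 2.093
```

Both decision rules (|t| against the critical value, p-value against α) always give the same answer, because the p-value is computed from the same t distribution.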

Question 4: Plot the time series of the observed and calculated dependent variable,
including the extended values, on the same graph

Of the regressions analysed above, the best fit is obtained from the regression of T_Falun on T_Galve_Knon.
Hence the model used for the estimation is:

y = 0.1841 + 0.9176 ∗ x. With this equation the missing T_Falun data are estimated and plotted as
follows:


In [11]:

# Estimating the temperature by using best fitted regression equation

Estimated = (0.1841 +0.9176 * df_temp['T_Galve_Knon']).round(2)

# Filling missing value for T_Falun and assigning the name T_Falun_Fill
df_temp['T_Falun_Fill'] = df_temp['T_Falun'].fillna(Estimated)

In [12]:

# Showing the last 10 rows of the dataframe with the missing data filled in

print(df_temp.tail(10))

Day T_Falun T_Gavle T_Knon T_Galve_Knon T_Falun_Fill

20 21 -1 -2.0 -3.1 -2.39 -1.00
21 22 NaN 0.1 -1.1 -0.33 -0.12
22 23 NaN -6.2 -5.4 -5.91 -5.24
23 24 NaN 0.5 -1.9 -0.36 -0.15
24 25 NaN -1.9 -3.7 -2.54 -2.15
25 26 NaN -6.4 -10.7 -7.94 -7.10
26 27 NaN -7.6 -14.9 -10.21 -9.18
27 28 NaN 0.9 -5.8 -1.49 -1.18
28 29 NaN 0.5 -10.0 -3.25 -2.80
29 30 NaN -1.2 -14.2 -5.84 -5.17
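The filled values in the table above can be checked by hand. The sketch below applies the selected equation to days 21 to 23 only, using the T_Falun and T_Galve_Knon values from the table; it uses plain NumPy arrays rather than the notebook's dataframe, and the variable names are illustrative.

```python
import numpy as np

a, b = 0.1841, 0.9176                      # coefficients of the selected model

# Days 21-23 from the table above: observed Falun values and T_Galve_Knon
t_falun = np.array([-1.0, np.nan, np.nan])
t_gk = np.array([-2.39, -0.33, -5.91])

estimated = np.round(a + b * t_gk, 2)
filled = np.where(np.isnan(t_falun), estimated, t_falun)   # keep observations, fill gaps
print(filled)   # day 21 keeps -1.0; days 22 and 23 become -0.12 and -5.24
```

Observed values are never overwritten: the regression estimate is used only where T_Falun is NaN, which is also what `fillna` does in the notebook cell.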

In [21]:

# Plotting Observed and an Estimated temperature for T_Falun

plt.plot(df_temp['T_Falun_Fill'],'bo--')
plt.plot(df_temp['T_Falun'], 'ro-')
plt.plot(df_temp['T_Galve_Knon'],'go-')
plt.legend(['Estimated', 'Observed','Temp_G_K'])
plt.xlabel('Day', size = 15)
plt.ylabel('Temperature', size = 15)

Out[21]:

<matplotlib.text.Text at 0x7f6adb77b750>

2. Multiple linear regression

a) The file multidata.txt contains a number of numerical variables. Choose Y as dependent variable and x1, x2, x3
as independent variables. Perform a forward stepwise multiple regression and also a standard multiple regression.


In a forward stepwise multiple regression, start with a simple regression using the independent variable that is
best correlated with the dependent variable. Then add a second independent variable: the one with the highest
partial correlation with the dependent variable once the influence of the first independent variable is removed.
Continue this procedure to see whether adding a third independent variable is helpful. In a standard multiple
regression, all the independent variables are used in the regression model. By analysing the result of the
regression, you can find out whether some independent variables do not contribute significantly to the regression.
If there are any, remove them from the model and redo the regression with only the significant independent
variables.

b) Present in each case the R² values and the regression equations.

c) In the forward stepwise method present also your F-test results (use α = 5%)

d) What are your conclusions?
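The selection rule described in a) can be sketched as follows. This is a minimal illustration written for Python 3 with NumPy only: `residuals` and `forward_order` are hypothetical helpers, the data are synthetic (not multidata.txt), and the sketch only determines the order in which variables would enter. Whether each addition is significant still has to be decided with the F-test asked for in c).

```python
import numpy as np

def residuals(v, X_in):
    """Residuals of v after least-squares regression on X_in plus an intercept."""
    A = np.column_stack([np.ones(len(v))] + list(X_in))
    coef, *_ = np.linalg.lstsq(A, v, rcond=None)
    return v - A @ coef

def forward_order(y, candidates):
    """Order in which predictors enter, by highest absolute partial correlation."""
    selected, remaining = [], list(candidates)
    while remaining:
        X_in = [candidates[name] for name in selected]
        ry = residuals(y, X_in)                      # y with selected variables removed
        score = {name: abs(np.corrcoef(ry, residuals(candidates[name], X_in))[0, 1])
                 for name in remaining}
        best = max(score, key=score.get)
        selected.append(best)
        remaining.remove(best)
    return selected

# Synthetic data for illustration only: Y depends strongly on X1, weakly on X3, not on X2
rng = np.random.default_rng(0)
X = {name: rng.normal(size=200) for name in ("X1", "X2", "X3")}
Y = 5.0 * X["X1"] + 1.0 * X["X3"] + 0.1 * rng.normal(size=200)
order = forward_order(Y, X)
print(order)
```

In the first step no variable has been selected yet, so the partial correlation reduces to the ordinary correlation with Y.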

In [14]:

multi = pd.read_table('multidata.txt') #reading in the data

#defining the dataframe
df = pd.DataFrame(multi)
df.head()

Out[14]:

X1 X2 X3 Y

0 1 2 10 5.077

1 2 2 9 32.330

2 3 3 5 65.140

3 4 4 4 47.270

4 5 2 9 80.570

In [15]:

# calculating the correlation between all variables

multi.corr()

Out[15]:

X1 X2 X3 Y

X1 1.000000 -0.048404 -0.142982 0.624132

X2 -0.048404 1.000000 0.252610 0.264732

X3 -0.142982 0.252610 1.000000 0.586944

Y 0.624132 0.264732 0.586944 1.000000

The table shows that X1 has the highest correlation with the dependent variable Y (0.624), followed by X3 (0.587).
The first regression therefore uses X1 as the only independent variable:


In [16]:

# calculating the regression equation with the ols function, with Y as dependent and X1 as independent
# variable.
ols1 = smf.ols(formula='Y ~ X1', data=multi).fit()
print(ols1.summary())

OLS Regression Results

==========================================================================
Dep. Variable: Y R-squared: 0
Model: OLS Adj. R-squared: 0
Method: Least Squares F-statistic: 9
Date: Mon, 28 Sep 2015 Prob (F-statistic): 0.0
Time: 11:53:41 Log-Likelihood: -78
No. Observations: 17 AIC:
Df Residuals: 15 BIC:
Df Model: 1
Covariance Type: nonrobust
==========================================================================
coef std err t P>|t| [95.0% Conf.
-------------------------------------------------------------------------
Intercept 9.9956 14.461 0.691 0.500 -20.828 40
X1 11.9826 3.873 3.094 0.007 3.727 20
==========================================================================
Omnibus: 1.684 Durbin-Watson:
Prob(Omnibus): 0.431 Jarque-Bera (JB):
Skew: 0.570 Prob(JB): 0
Kurtosis: 2.198 Cond. No.
==========================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is co

/home/bikascb/anaconda/lib/python2.7/site-packages/scipy/stats/stats.py:12
int(n))

So the regression equation becomes:

y = 9.9956 + 11.9826 ∗ x1


In [17]:

# calculating the regression equation with the ols function, with Y as dependent and X1, X3 as independent variables

ols2 = smf.ols(formula='Y ~ X1 + X3', data=multi).fit()
print(ols2.summary())

OLS Regression Results

==========================================================================
Dep. Variable: Y R-squared: 0
Model: OLS Adj. R-squared: 0
Method: Least Squares F-statistic: 4
Date: Mon, 28 Sep 2015 Prob (F-statistic): 1.26
Time: 11:53:41 Log-Likelihood: -66
No. Observations: 17 AIC:
Df Residuals: 14 BIC:
Df Model: 2
Covariance Type: nonrobust
==========================================================================
coef std err t P>|t| [95.0% Conf.
-------------------------------------------------------------------------
Intercept -14.2231 8.102 -1.756 0.101 -31.600 3
X1 13.8775 1.965 7.062 0.000 9.663 18
X3 1.3498 0.200 6.744 0.000 0.921
==========================================================================
Omnibus: 3.571 Durbin-Watson:
Prob(Omnibus): 0.168 Jarque-Bera (JB):
Skew: 0.561 Prob(JB): 0
Kurtosis: 3.901 Cond. No.
==========================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is co

In this case the equation becomes: y = −14.2231 + 13.8775 ∗ x1 + 1.3498 ∗ x3


In [18]:

# calculating the regression coefficients with Y as dependent and X1, X2 and X3 as independent variables
ols3 = smf.ols(formula='Y ~ X1 + X2 + X3', data=multi).fit()
print(ols3.summary())

OLS Regression Results

==========================================================================
Dep. Variable: Y R-squared: 0
Model: OLS Adj. R-squared: 0
Method: Least Squares F-statistic: 2
Date: Mon, 28 Sep 2015 Prob (F-statistic): 4.27
Time: 11:53:41 Log-Likelihood: -65
No. Observations: 17 AIC:
Df Residuals: 13 BIC:
Df Model: 3
Covariance Type: nonrobust
==========================================================================
coef std err t P>|t| [95.0% Conf.
-------------------------------------------------------------------------
Intercept -22.8259 10.270 -2.223 0.045 -45.013 -0
X1 13.9098 1.917 7.257 0.000 9.769 18
X2 3.8826 2.961 1.311 0.212 -2.514 10
X3 1.2841 0.202 6.372 0.000 0.849
==========================================================================
Omnibus: 2.908 Durbin-Watson: 2
Prob(Omnibus): 0.234 Jarque-Bera (JB):
Skew: 0.709 Prob(JB): 0
Kurtosis: 3.232 Cond. No.
==========================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is co

Table 3

| Dependent variable | Independent variables | Regression equation | R² |
|---|---|---|---|
| Y | X1 | y = 9.9956 + 11.9826 ∗ x1 | 0.390 |
| Y | X1, X3 | y = −14.2231 + 13.8775 ∗ x1 + 1.3498 ∗ x3 | 0.856 |
| Y | X1, X2, X3 | y = −22.8259 + 13.9098 ∗ x1 + 3.8826 ∗ x2 + 1.2841 ∗ x3 | 0.873 |

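Conclusion

Of the combinations above, the model with Y as dependent variable and X1, X2 and X3 as independent variables
has the highest R² value (0.873). However, R² cannot decrease when a variable is added, and the coefficient of X2
has a p-value of 0.212 in the summary above, so the significance of each added variable is tested with the F-test
that follows.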
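Since R² alone always favours the larger model, the "Adj. R-squared" entry of the summaries is a common complement: it penalises each additional independent variable. A short sketch of the standard formula applied to the R² values of Table 3 is given below; `adjusted_r2` is a hypothetical helper, and the results are computed from the rounded R² values, so they may differ slightly in the last digit from the values in the summaries.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (N - 1) / (N - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# R^2 values from Table 3, N = 17 observations
adj = {}
for label, r2, p in [("X1", 0.390, 1), ("X1, X3", 0.856, 2), ("X1, X2, X3", 0.873, 3)]:
    adj[label] = round(adjusted_r2(r2, 17, p), 3)
    print(label, adj[label])
# prints 0.349, 0.835 and 0.844
```

The gain from adding X2 shrinks from 0.017 in R² to about 0.009 in adjusted R², which is consistent with X2 contributing little.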

Perform the F-test (α = 0.05) for adding independent variables.

Test for adding the second independent variable (X3)

The F-test is used to find out whether adding a variable is significant or not. The test statistic is

$$F = \frac{(1-R^2_{n-1})\,(N-n-1)}{(1-R^2_{n})\,(N-n-2)} \qquad (3)$$


where N is the number of data points and n is the number of independent variables used, including the one being
added. If $F > F_{1-\alpha;\,N-n-1;\,N-n-2}$, the addition of $X_n$ is significant. For the three variables X1, X3
and Y, the test is carried out at a significance level of α = 0.05.

Calculating the value of F with equation 3, where

$R^2_{n-1} = 0.390$

$R^2_{n} = 0.856$

N = 17 and n = 2

The calculation is done in the code cell below:

In [19]:

round(((1-0.390)*14)/((1-0.856)*13),2)

Out[19]:

4.56

Here,

F = 4.56

$F_{0.95;\,14;\,13} = 2.55$

Conclusion: F = 4.56 is greater than the critical value 2.55, so adding X3 to the regression is
significant.

Test for adding the third independent variable (X2)

The F-test is performed again to check whether adding the independent variable X2 to the regression is significant,
using the same equation as above (α = 0.05). The addition is significant if

$F > F_{0.95;\,13;\,12}$

Calculating the value of F with equation 3, where

$R^2_{n-1} = 0.856$

$R^2_{n} = 0.873$

N = 17 and n = 3

The calculation is done in the code cell below:

In [20]:

round(((1-0.856)*13)/((1-0.873)*12),2)

Out[20]:

1.23

Conclusion: F = 1.23 does not exceed the critical value $F_{0.95;\,13;\,12}$, so adding X2 as a third independent
variable to the regression is not significant. A regression with X1 and X3 as independent variables is therefore
adequate.
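Both F-tests can be reproduced in a few lines, with the critical values taken from `scipy.stats.f` instead of a printed table. This is a sketch under the definitions of equation 3; `f_added_variable` is a hypothetical helper, and the R² inputs are the rounded values from Table 3.

```python
from scipy import stats

def f_added_variable(r2_before, r2_after, n_obs, n_vars):
    """Equation 3; n_vars counts the independent variables including the added one."""
    df1 = n_obs - n_vars - 1
    df2 = n_obs - n_vars - 2
    f_value = ((1 - r2_before) * df1) / ((1 - r2_after) * df2)
    f_crit = stats.f.ppf(0.95, df1, df2)       # critical value for alpha = 0.05
    return f_value, f_crit

f_x3, crit_x3 = f_added_variable(0.390, 0.856, 17, 2)   # adding X3 to Y ~ X1
f_x2, crit_x2 = f_added_variable(0.856, 0.873, 17, 3)   # adding X2 to Y ~ X1 + X3
print(round(f_x3, 2), f_x3 > crit_x3)   # 4.56 True
print(round(f_x2, 2), f_x2 > crit_x2)   # 1.23 False
```

The first comparison reproduces the decision to keep X3, and the second the decision to leave X2 out.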
