Regression Analysis With Python

The document discusses regression analysis using Python. It describes how to calculate the slope, intercept, and correlation coefficient of a linear regression line from sample data. Code is provided to compute the regression parameters in Python using NumPy and SciPy functions and the results are printed.

Uploaded by

Sin Claire
Copyright
© All Rights Reserved
Regression Analysis with Python


by Rheinwerk Computing

A regression analysis is a statistical analysis method that examines the relationships
between two variables.

These are a dependent variable Y (also called the outcome variable) and an independent
variable X (also called the influencing variable). Regression analyses are always used when
correlations need to be described or predicted in terms of quantities. Mathematically, the
influence of X on Y is symbolized by an arrow:

X → Y

I want to use a simple example to describe the regression analysis method: In a storage
room for synthetic fibers, there is a certain relative humidity. Over a period of 15 days, the
relative humidity of the room (X) and the moisture content of the synthetic fiber (Y) are
measured once a day. A regression analysis will be used to investigate whether there is a
correlation between these two variables and, if so, how strong this correlation is. The
measurement values are the ones stored in the arrays X and Y in the listings below.

In the simplest case, a linear relationship exists between the X and Y values, which can be
described by a linear function with the slope m and the intercept a of the function line with
the y axis:

$$ y = mx + a $$

The m and a parameters are also referred to as regression parameters. The strength of the
correlation between X and Y of the two measurement series is determined by the correlation
coefficient r.
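
As a quick illustration of what fitting the line y = mx + a means in practice, the following minimal sketch uses np.polyfit with degree 1. This is a different route from the functions discussed in this post and is shown only for orientation; the arrays hold the measurement values used in the listings.

```python
import numpy as np

# Measurement values used in the listings of this post
X = np.array([46, 53, 29, 61, 36, 39, 47, 49, 52, 38, 55, 32, 57, 54, 44])
Y = np.array([12, 15, 7, 17, 10, 11, 11, 12, 14, 9, 16, 8, 18, 14, 12])

# A degree-1 least-squares polynomial fit returns the coefficients
# with the highest power first, that is [m, a]
m, a = np.polyfit(X, Y, 1)

print("slope m:", m)
print("intercept a:", a)
```

The fit should reproduce the slope of about 0.32 and the intercept of about −2.51 that the listings compute.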

Computing the Regression Parameters


The correlation coefficient r is defined as the quotient of the covariance s_xy of two
measurement series and the product of their standard deviations s_x and s_y:

$$ r = \frac{s_{xy}}{s_x s_y} $$

The slope of the regression line is the quotient of the covariance and the square of the
standard deviation of the x-values:

$$ m = \frac{s_{xy}}{s_x^2} $$

The slope can also be calculated using the correlation coefficient:

$$ m = r \, \frac{s_y}{s_x} $$

The intercept of the regression line is computed from the difference of the mean value of all
y-values and the product of the slope m with the mean value of all x-values:

$$ a = \bar{y} - m \bar{x} $$
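
The definitions above can be followed step by step without any library. The sketch below is one possible plain-Python transcription, assuming the sample versions of covariance and standard deviation (divisor n − 1); the variable names are chosen for illustration only.

```python
from math import sqrt

# Measurement values used in the listings of this post
X = [46, 53, 29, 61, 36, 39, 47, 49, 52, 38, 55, 32, 57, 54, 44]
Y = [12, 15, 7, 17, 10, 11, 11, 12, 14, 9, 16, 8, 18, 14, 12]

n = len(X)
x_mean = sum(X) / n
y_mean = sum(Y) / n

# Sample covariance and sample standard deviations (divisor n - 1)
s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(X, Y)) / (n - 1)
s_x = sqrt(sum((x - x_mean) ** 2 for x in X) / (n - 1))
s_y = sqrt(sum((y - y_mean) ** 2 for y in Y) / (n - 1))

r = s_xy / (s_x * s_y)     # correlation coefficient
m = s_xy / s_x ** 2        # slope from the covariance
m_alt = r * s_y / s_x      # slope from the correlation coefficient
a = y_mean - m * x_mean    # intercept

print("r:", r)
print("m:", m, "m_alt:", m_alt)
print("a:", a)
```

Both slope formulas should agree up to floating-point rounding, because the second is the first with s_xy replaced by r·s_x·s_y.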

All regression parameters can be computed directly using the following SciPy function:

m,a,r,p,e = stats.linregress(X,Y)

The return values p (the p-value of a test for a slope of zero) and e (the standard error
of the estimated slope) are not needed for computing the regression parameters.
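
Instead of unpacking into five variables, the result of stats.linregress can also be kept as a single object and read by attribute name, which makes it harder to mix up the positions. A minimal sketch, assuming a reasonably recent SciPy version:

```python
import numpy as np
from scipy import stats

# Measurement values used in the listings of this post
X = np.array([46, 53, 29, 61, 36, 39, 47, 49, 52, 38, 55, 32, 57, 54, 44])
Y = np.array([12, 15, 7, 17, 10, 11, 11, 12, 14, 9, 16, 8, 18, 14, 12])

result = stats.linregress(X, Y)

print("slope:", result.slope)
print("intercept:", result.intercept)
print("correlation coefficient:", result.rvalue)
print("p-value:", result.pvalue)
print("standard error of the slope:", result.stderr)
```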

The listing below calculates the slope m, the y-axis intercept a of the regression line and the
correlation coefficient r for the measurement values listed earlier. To show the different
options of Python, the parameters are calculated and compared by using the corresponding
NumPy and SciPy functions.

01 #14_correlation.py
02 import numpy as np
03 from scipy import stats
04 X=np.array([46,53,29,61,36,39,47,49,52,38,55,32,57,54,44])
05 Y=np.array([12,15,7,17,10,11,11,12,14,9,16,8,18,14,12])
06 xm=np.mean(X)
07 ym=np.mean(Y)
08 sx=np.std(X,ddof=1)
09 sy=np.std(Y,ddof=1)
10 sxy=np.cov(X,Y)
11 r1=sxy/(sx*sy)
12 r2=np.corrcoef(X,Y)
13 m1=sxy[0,1]/sx**2
14 m2=r2[0,1]*sy/sx
15 a1=ym-m2*xm
16 m3, a2, r3, p, e = stats.linregress(X,Y)
17 print("NumPy1 slope:",m1)
18 print("NumPy2 slope:",m2)
19 print("SciPy slope:",m3)
20 print("Intersection with the y-axis:",a1)
21 print("Intersection with the y-axis:",a2)
22 print("Def. correlation coefficient:",r1[0,1])
23 print("NumPy correlation coefficient:",r2[0,1])
24 print("SciPy correlation coefficient:",r3)
25 print("Estimated error:",e)

Output

NumPy1 slope: 0.32320356181404014
NumPy2 slope: 0.3232035618140402
SciPy slope: 0.3232035618140402
Intersection with the y-axis: -2.5104576516877213
Intersection with the y-axis: -2.5104576516877213
Def. correlation coefficient: 0.9546538498757964
NumPy correlation coefficient: 0.9546538498757965
SciPy correlation coefficient: 0.9546538498757965
Estimated error: 0.027955268902524828

Analysis

Lines 04 and 05 contain the measurement values for the relative humidity X and the
moisture content of the material Y. The program computes the regression parameters from
these measurement values and uses the correlation coefficient to check whether a
correlation exists between the influencing variable X and the outcome variable Y and to
determine how strong this correlation is.

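Line 10 computes the covariance sxy using the np.cov(X,Y) NumPy function. This function
returns a symmetric 2 × 2 matrix whose diagonal holds the variances of X and Y. The value
for sxy appears both in the first row, second column, and in the second row, first column.
By default, np.cov() divides by n − 1, which matches the ddof=1 setting used for the
standard deviations in lines 08 and 09. Line 11 divides the whole matrix by sx*sy, so only
the off-diagonal elements of r1 hold the correlation coefficient; line 22 prints r1[0,1].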
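
To make the layout of the covariance matrix concrete, the following sketch prints it and checks its structure. The name C is chosen for illustration; the data are the measurement values from the listing.

```python
import numpy as np

# Measurement values used in the listings of this post
X = np.array([46, 53, 29, 61, 36, 39, 47, 49, 52, 38, 55, 32, 57, 54, 44])
Y = np.array([12, 15, 7, 17, 10, 11, 11, 12, 14, 9, 16, 8, 18, 14, 12])

C = np.cov(X, Y)  # 2 x 2 matrix, divisor n - 1 by default
print(C)

# The off-diagonal elements both hold the covariance of X and Y
print("symmetric:", np.isclose(C[0, 1], C[1, 0]))

# The diagonal elements are the sample variances of X and Y
print("C[0,0] is var(X):", np.isclose(C[0, 0], np.var(X, ddof=1)))
print("C[1,1] is var(Y):", np.isclose(C[1, 1], np.var(Y, ddof=1)))

# The correlation coefficient needs only entries of this matrix
r = C[0, 1] / np.sqrt(C[0, 0] * C[1, 1])
print("r:", r)
```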

A simpler way to calculate the correlation coefficient is to use the NumPy function
np.corrcoef(X,Y) directly (line 12). This function also returns a symmetric 2 × 2 matrix.
The value for r2 appears both in the first row, second column, and in the second row, first
column; the diagonal elements are 1.
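
As a cross-check that is not part of the original listing, SciPy's stats.pearsonr returns the same coefficient as a scalar together with a p-value, so no matrix indexing is needed. A minimal sketch:

```python
import numpy as np
from scipy import stats

# Measurement values used in the listings of this post
X = np.array([46, 53, 29, 61, 36, 39, 47, 49, 52, 38, 55, 32, 57, 54, 44])
Y = np.array([12, 15, 7, 17, 10, 11, 11, 12, 14, 9, 16, 8, 18, 14, 12])

R = np.corrcoef(X, Y)                      # 2 x 2 correlation matrix
r_pearson, p_value = stats.pearsonr(X, Y)  # scalar coefficient and p-value

print(R)
print("np.corrcoef:", R[0, 1])
print("stats.pearsonr:", r_pearson)
```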

Line 13 calculates the slope m1 from the covariance sxy[0,1] and the square of the standard
deviation sx of the X measurement values. The slope m2 is calculated in line 14 using the
correlation coefficient r2[0,1] and the standard deviations sx and sy.

In line 15, the y-axis intercept a1 is calculated from the mean values of the X and Y values
using the conventional method. Instead of the slope m2, the slope m1 could have been used
as well.

The most concise way to calculate all three parameters with a single statement is
shown in line 16. The slope m3, the y-axis intercept a2, and the correlation coefficient r3 are
returned together in a tuple-like result by the SciPy function stats.linregress(X,Y).

The print() function outputs the regression parameters in lines 17 to 25. All computation
methods provide the same results up to floating-point rounding in the last digit. The slope
is about 0.32, and the y-axis intercept has a value of about −2.51. Thus, the regression line
adheres to the following equation:

$$ y = 0.32x - 2.51 $$
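
Once the regression parameters are known, the line can be used to estimate the moisture content for a given relative humidity. The sketch below assumes a hypothetical helper named predict_moisture and evaluates it only inside the measured humidity range of 29 to 61 percent; values outside that range would be extrapolations that the data do not support.

```python
import numpy as np
from scipy import stats

# Measurement values used in the listings of this post
X = np.array([46, 53, 29, 61, 36, 39, 47, 49, 52, 38, 55, 32, 57, 54, 44])
Y = np.array([12, 15, 7, 17, 10, 11, 11, 12, 14, 9, 16, 8, 18, 14, 12])

m, a, r, p, e = stats.linregress(X, Y)

def predict_moisture(humidity):
    """Hypothetical helper: evaluate the regression line y = m*x + a."""
    return m * humidity + a

# Estimates for a few humidity values within the measured range
for humidity in (35, 45, 55):
    print(humidity, "->", round(predict_moisture(humidity), 2))
```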

The correlation coefficient of r = 0.95465 is close to 1. Thus, a strong correlation exists
between the relative humidity (X) and the moisture content of the material (Y).
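
One common way to read the strength of a linear correlation is the square of r, the coefficient of determination, which gives the share of the variance in Y that the regression line accounts for. The following sketch, which goes slightly beyond the original listing, computes it from the residuals and compares it with r squared.

```python
import numpy as np
from scipy import stats

# Measurement values used in the listings of this post
X = np.array([46, 53, 29, 61, 36, 39, 47, 49, 52, 38, 55, 32, 57, 54, 44])
Y = np.array([12, 15, 7, 17, 10, 11, 11, 12, 14, 9, 16, 8, 18, 14, 12])

m, a, r, p, e = stats.linregress(X, Y)

residuals = Y - (m * X + a)              # vertical distances to the line
ss_res = np.sum(residuals ** 2)          # residual sum of squares
ss_tot = np.sum((Y - np.mean(Y)) ** 2)   # total sum of squares
r_squared = 1 - ss_res / ss_tot

print("1 - SS_res/SS_tot:", r_squared)
print("r squared:", r ** 2)
```

For a simple linear regression with an intercept, both values agree, here at roughly 0.91.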

Representing the Scatter Plot and the Regression Line


When the discrete yi values of the Y measurement series and the discrete xi values of the X
measurement series are plotted in an x-y coordinate system, this plot is referred to as a
scatter plot. This listing shows how to implement such a scatter plot with the values listed in
our table and the corresponding regression line.

01 #15_regeression_line.py
02 import numpy as np
03 import matplotlib.pyplot as plt
04 from scipy import stats
05 X=np.array([46,53,29,61,36,39,47,49,52,38,55,32,57,54,44])
06 Y=np.array([12,15,7,17,10,11,11,12,14,9,16,8,18,14,12])
07 m, a, r, p, e = stats.linregress(X,Y)
08 fig, ax=plt.subplots()
09 ax.plot(X, Y,'rx')
10 ax.plot(X, m*X+a)
11 ax.set_xlabel("Relative humidity in %")
12 ax.set_ylabel("Moisture content of the material")
13 plt.show()

Output

This figure shows the output regression line after the calculation.

Analysis

The program determines the slope m and the y-axis intercept a in line 07 using the
stats.linregress(X,Y) SciPy function. Line 09 plots the discrete xi and yi values as red
crosses using the ax.plot(X, Y, 'rx') method. The ax.plot(X, m*X+a) statement in line 10
draws the regression line. The crosses of the scatter plot lie close to this line, which
shows that a strong correlation exists between the relative humidity and the moisture
content of the material.
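
A possible variation of the plotting program is sketched below: it draws the line over an evenly spaced grid, adds a legend with the fitted equation, and writes the figure to a file instead of opening a window. The Agg backend, the output path in the temporary directory, and the file name are assumptions made so that the sketch also runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: no display; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

# Measurement values used in the listings of this post
X = np.array([46, 53, 29, 61, 36, 39, 47, 49, 52, 38, 55, 32, 57, 54, 44])
Y = np.array([12, 15, 7, 17, 10, 11, 11, 12, 14, 9, 16, 8, 18, 14, 12])

m, a, r, p, e = stats.linregress(X, Y)

# Evenly spaced x-values spanning the measured humidity range
x_line = np.linspace(X.min(), X.max(), 100)

fig, ax = plt.subplots()
ax.plot(X, Y, "rx", label="measurements")
ax.plot(x_line, m * x_line + a, label=f"y = {m:.2f}x {a:+.2f}")
ax.set_xlabel("Relative humidity in %")
ax.set_ylabel("Moisture content of the material")
ax.legend()

# Hypothetical output location; any writable path works
out_path = os.path.join(tempfile.gettempdir(), "regression_line.png")
fig.savefig(out_path)
print("saved to", out_path)
```

For interactive work, dropping the matplotlib.use("Agg") line and calling plt.show() instead of fig.savefig() gives the same window as the original program.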

Editor’s note: This post has been adapted from a section of the book Python for Engineering
and Scientific Computing by Veit Steinkamp.
