Computational Statistics Capstone Report

On
Karl Pearson’s Coefficient of Correlation

MTech Year I Semester-I

Computer Science and Engineering (Data Science)
By
Sushant Kothari (24CD1005)

Supervisor
Prof. Chandrakant Gaikwad
Theory
Karl Pearson's Coefficient of Correlation (r) is a widely used statistical
measure that quantifies the strength and direction of a linear relationship
between two variables. It is a value between -1 and 1, and the closer the
value is to either extreme, the stronger the relationship between the two
variables. It was developed by the British statistician Karl Pearson in 1896
and has since become a fundamental tool in statistics and data analysis.

Key Concepts in Pearson's Correlation:

• Linearity: Pearson’s (r) measures only linear relationships between two
variables. A linear relationship means that changes in one variable are
accompanied by proportional changes in the other, following a straight-line
pattern. If the relationship is non-linear (e.g., exponential, quadratic),
Pearson’s (r) may fail to capture the strength or direction of the relationship.

• Range: The coefficient (r) lies in the interval −1 ≤ r ≤ 1:

• r = 1: Indicates a perfect positive linear relationship. As one variable
increases, the other increases in a perfectly predictable manner.
• r = −1: Indicates a perfect negative linear relationship. As one variable increases, the
other decreases in a perfectly predictable manner.
• r = 0: Implies no linear relationship between the two variables. However, this does not
mean the variables are unrelated; they could have a non-linear relationship.
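
The r = 0 case can be made concrete with a small sketch. It assumes NumPy is available and uses made-up data in which Y is completely determined by X; the variable names are illustrative only.

```python
import numpy as np

# Hypothetical data: x is symmetric about zero, and y depends on x exactly.
x = np.arange(-5, 6, dtype=float)   # -5, -4, ..., 5
y_linear = 2.0 * x + 1.0            # perfectly linear in x
y_quadratic = x ** 2                # fully determined by x, but not linear

# np.corrcoef returns the 2x2 correlation matrix; entry [0, 1] is r.
r_linear = np.corrcoef(x, y_linear)[0, 1]
r_quadratic = np.corrcoef(x, y_quadratic)[0, 1]

print(f"linear:    r = {r_linear:.3f}")
print(f"quadratic: r = {r_quadratic:.3f}")
```

For the quadratic case r is zero up to rounding, even though Y is an exact function of X, which is why r = 0 should be read as "no linear relationship" rather than "no relationship".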

Formula:
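The Pearson correlation coefficient r between two variables X and Y is
given by the formula:

$$ r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}} $$

Where:
• $X_i$, $Y_i$ are the individual paired observations of X and Y
• $\bar{X}$, $\bar{Y}$ are the means of X and Y
• $n$ is the number of paired observations
Derivation of Pearson’s Correlation Coefficient: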
Step 1: Concept of Covariance:
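Covariance is a measure of how two variables vary together. If two
variables tend to increase together, the covariance is positive. If one variable
increases while the other decreases, the covariance is negative. The formula
for covariance is:

$$ \mathrm{Cov}(X, Y) = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) $$

Where:
• $X_i$, $Y_i$ are the individual paired observations
• $\bar{X}$, $\bar{Y}$ are the means of X and Y
• $n$ is the number of paired observations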

Step 2: Variance of Each Variable

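The variance measures the spread or dispersion of a set of values. It’s the
average of the squared deviations from the mean. The formula for the
variance of X and Y is:

$$ \sigma_X^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2, \qquad \sigma_Y^2 = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \bar{Y})^2 $$

The standard deviation is simply the square root of the variance:

$$ \sigma_X = \sqrt{\sigma_X^2}, \qquad \sigma_Y = \sqrt{\sigma_Y^2} $$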

Step 3: Normalization by Standard Deviations

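The Pearson correlation coefficient normalizes the covariance by
dividing it by the product of the standard deviations of X and Y. This
process scales the covariance to ensure that the resulting correlation
coefficient is a dimensionless value, bounded between -1 and 1. Hence, the
formula for Pearson’s (r) becomes:

$$ r = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \, \sigma_Y} $$

Substituting the expressions for covariance and standard deviations, the
factor 1/n in the numerator cancels against the two factors of
$\sqrt{1/n}$ in the denominator, and the formula becomes:

$$ r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}} $$

This formula gives the final value of Pearson’s correlation coefficient.
Because the normalizing factor cancels, using n − 1 instead of n throughout
gives the same value of r.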

Step 4: Interpretation of the Formula

The closer (r) is to 1 or -1, the stronger the linear relationship between X
and Y, and the closer it is to 0, the weaker the relationship.
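
The derivation steps above can be sketched directly in code. This is a minimal illustration using only the Python standard library; the function name `pearson_r` and the five data points are assumptions made for the example and are not taken from the report.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r computed step by step: covariance, variances, normalization."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Step 1: covariance (population form, dividing by n).
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

    # Step 2: variances and standard deviations.
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when either variable is constant")
    sd_x, sd_y = math.sqrt(var_x), math.sqrt(var_y)

    # Step 3: normalize the covariance by the product of standard deviations.
    return cov_xy / (sd_x * sd_y)

# Hypothetical illustrative data, not taken from the report.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(f"r = {r:.4f}")
```

For this toy data the sums can be checked by hand: the sum of cross-deviations is 6, and the sums of squared deviations are 10 and 6, so r = 6/√60 ≈ 0.7746.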
Python Simulation and Output
Let’s now use Python to calculate Pearson’s correlation coefficient for
two variables with a linear relationship.

Python Code:
Output:
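
A minimal sketch of such a calculation, assuming NumPy is available. The data are synthetic and invented for illustration (a straight line plus random noise); they are not the report's dataset, so the printed value need not equal the 0.98 discussed next.

```python
import numpy as np

# Synthetic data invented for illustration: a linear trend plus small noise.
rng = np.random.default_rng(seed=0)
x = np.linspace(0.0, 10.0, 50)
y = 3.0 * x + 2.0 + rng.normal(loc=0.0, scale=2.0, size=x.size)

# Entry [0, 1] of the correlation matrix is Pearson's r between x and y.
r = np.corrcoef(x, y)[0, 1]
print(f"Pearson's r = {r:.2f}")
```

Increasing the noise `scale` lowers r, and shrinking it toward zero pushes r toward 1.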
Understanding the Significance of r=0.98:
When the Pearson correlation coefficient, (r), is calculated as 0.98, it
implies an extremely strong positive linear relationship between the
two variables X and Y. Let’s break this down in detail:
1. Strength of the Relationship
• A correlation value of r=0.98 is very close to 1.0, which is the theoretical
maximum for Pearson’s correlation coefficient.
• This indicates that the data points almost perfectly align along a straight
line with a positive slope, suggesting minimal deviation from the
expected linear trend.
• In practical terms, a value this strong means that a linear function of X
accounts for about 96% of the variance in Y (r² = 0.98² ≈ 0.96), with very
little variation left unaccounted for.
2. Direction of the Relationship
• The positive sign of (r) signifies that as the values of X increase, the
values of Y also increase. This is called a direct relationship.
o For instance, if X represents advertising spend and Y represents sales
revenue, a correlation of 0.98 would indicate that increased spending on
advertising is strongly associated with increased revenue.
• Conversely, if the correlation were negative (e.g., r = −0.98), it would
mean that as X increases, Y decreases in a nearly perfect linear pattern.
3. Scatterplot Analysis
• A scatterplot of the data for r=0.98 would show that almost all data points
lie very close to or directly on a straight line with a positive slope.
• This alignment visually confirms the nearly perfect relationship. Unlike
lower correlation values where data points might show a more scattered
pattern, here the clustering along the line is extremely tight.
Results with Evaluation Parameters
Interpretation of Results:

1. Strength of Relationship:
• The Pearson correlation coefficient of r = 0.98 signifies an exceptionally
strong positive linear relationship. This means that as X increases, Y
tends to increase in a highly predictable manner with minimal deviation.
• A correlation this high implies that about 96% of the variation in Y
(r² = 0.9604) is accounted for by its linear relationship with X.
2. Direction of Relationship:
• The positive value of r = 0.98 indicates a direct relationship, meaning X
and Y increase together.
3. Evaluation Parameters:
• Strength: The correlation value of 0.98 indicates a nearly perfect
positive relationship.

• Outliers: Pearson’s correlation is sensitive to outliers. While the
correlation here is very strong, it is critical to ensure no extreme data
points distort this value.
• Linearity Assumption: Since Pearson’s method measures linear
relationships, it’s important to verify that the relationship between X and
Y is indeed linear, which scatterplots or regression residuals can confirm.
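
The sensitivity to outliers can be seen in a short sketch. It assumes NumPy is available; the ten on-line points and the single extreme point are hypothetical values chosen only to make the effect visible.

```python
import numpy as np

# Hypothetical data: ten points lying exactly on a line.
x = np.arange(1.0, 11.0)
y = 2.0 * x + 1.0
r_clean = np.corrcoef(x, y)[0, 1]

# Append one extreme point far from the line.
x_out = np.append(x, 11.0)
y_out = np.append(y, -40.0)
r_with_outlier = np.corrcoef(x_out, y_out)[0, 1]

print(f"without outlier:  r = {r_clean:.3f}")
print(f"with one outlier: r = {r_with_outlier:.3f}")
```

A single point is enough to pull r far away from 1 here. The effect also runs the other way: one extreme point placed along a trend can inflate r for otherwise weakly related data.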

4. Visual Representation:
1. Tight Clustering of Points:

• The scatterplot with r=0.98 would show that nearly all data points are
tightly clustered along a straight line with a positive slope.

• The small deviations from the line suggest minimal residual variance,
emphasizing the strength of the linear relationship.

2. Best-Fit Regression Line:

• The best-fit line (calculated using least squares regression) would pass
very close to most of the data points, confirming the high predictability of
Y based on X.

• The slope of this line would indicate the expected change in Y for a unit
change in X, which is meaningful in practical terms (e.g., a 1-unit increase
in marketing spend is associated with a measurable increase in revenue).
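
The link between the best-fit line and r can be sketched as follows, assuming NumPy is available and using synthetic data invented for illustration. For a simple least-squares line, the coefficient of determination R² equals r², which is where a figure such as 0.98² = 0.9604 comes from.

```python
import numpy as np

# Synthetic data invented for illustration.
rng = np.random.default_rng(seed=1)
x = np.linspace(0.0, 10.0, 40)
y = 1.5 * x + 4.0 + rng.normal(scale=1.0, size=x.size)

# Least-squares line: np.polyfit returns coefficients, highest power first.
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept

# Coefficient of determination computed from the residuals.
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared_fit = 1.0 - ss_res / ss_tot

r = np.corrcoef(x, y)[0, 1]
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"r^2 = {r**2:.4f}, R^2 from fit = {r_squared_fit:.4f}")
```

The two printed values agree up to rounding, so r² can be read as the share of variance in Y explained by the fitted line.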
Conclusion
1. Significance of r = 0.98:
• A Pearson correlation coefficient of 0.98 represents an exceptionally
strong positive linear relationship. Such a high value indicates that the
changes in one variable are almost entirely mirrored by proportional
changes in the other.
• Such a relationship is well suited to linear predictive modeling, since
about 96% of the variance in Y (r² = 0.9604) can be explained by X.

2. Implications:
• High Predictability: The high (r) value suggests that X is a very
effective predictor of Y, which is useful in fields like finance,
healthcare, and physics.
• Practical Utility: In applied scenarios, a correlation of 0.98 might
indicate near-perfect synchronization, such as the relationship
between:
• Marketing budget and revenue growth.
• Physical measurements like height and weight.
• Scientific measurements like temperature and reaction rates.

3. Cautions:
• Outliers: Even with r=0.98, a small number of extreme values could
artificially inflate the correlation. It is vital to preprocess the data and
remove or address such outliers.
• Causation: A high correlation does not imply causation. Both X and Y
could be influenced by external or latent factors.
4. Final Insight: Broader Implications of r=0.98:
4.1 Quantitative Strength of r=0.98:
• A correlation coefficient of r = 0.98 underscores a near-perfect linear
association, indicating that almost every fluctuation in one variable X is
mirrored by a proportional change in the other variable Y.
• This strong relationship suggests that linear predictive models based on
this data are likely to be accurate within the observed range of X,
reducing uncertainty in decision-making.

4.2 Practical Applications:

• Finance and Economics:
o In financial markets, such a high correlation might indicate that two
stocks or indices move almost in lockstep. This makes them candidates
for hedging strategies, while holding both adds little diversification
to a portfolio.
o In economics, r = 0.98 might suggest a strong linkage
between consumer spending and disposable income, aiding
policymaking.
• Healthcare:
o In health-tech, such a high correlation between a diagnostic metric
(e.g., blood pressure) and an outcome (e.g., risk of stroke) could lead to
early intervention strategies.
• Education:
o A study of test preparation hours and scores yielding r = 0.98
would suggest that targeted intervention programs
could have substantial impacts.
• Environmental Science:
o In climate studies, r = 0.98 between carbon emissions
and temperature rise would strongly emphasize the importance of
reducing emissions.

4.3. Reliability for Predictive Analysis:

• With r = 0.98, the predictability of Y given X is
exceptionally high, making this relationship a sound starting point for
machine learning models or statistical forecasting methods.
• Regression models built on such a relationship can be expected to have
small prediction errors within the observed range of the data, enabling
precise and actionable insights.

4.4. Confidence in Linearity:

• A correlation this high is consistent with a linear relationship with
minimal noise, but it does not by itself confirm linearity: a smoothly
curved relationship can also produce a large (r). A scatterplot or residual
plot should be checked before ruling out non-linear models.
• If those checks show no systematic pattern, deviations from linearity can
be treated as negligible, making this relationship well suited to
regression-based predictions without requiring complex transformations or
higher-order terms.
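
How far a high r can be trusted as evidence of linearity can be probed with a small sketch. It assumes NumPy is available and uses hypothetical data in which Y is an exact quadratic function of X.

```python
import numpy as np

# Hypothetical data: y is an exact quadratic function of x, not a linear one.
x = np.arange(1.0, 11.0)   # 1, 2, ..., 10
y = x ** 2

r = np.corrcoef(x, y)[0, 1]

# Residuals from the least-squares line show a systematic pattern:
# positive at both ends, negative in the middle.
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

print(f"r = {r:.4f}")
print("residuals:", np.round(residuals, 1))
```

Here r comes out at roughly 0.97 even though the relationship is curved, and the U-shaped residuals are what reveal the curvature. A high r is therefore best paired with a look at the residuals.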
Thus,
The Pearson correlation coefficient of r = 0.98 highlights a nearly perfect
positive linear relationship between two variables. This relationship is both
statistically strong and practically valuable, providing a strong foundation
for predictive analysis, strategic decision-making, and research exploration.
The high (r) value indicates that X is an excellent linear predictor of Y, with
96.04% (r² = 0.98²) of the variation in Y explained by X. However, careful
consideration must be given to potential outliers, causation, and the
assumption of linearity.
Ultimately, r = 0.98 demonstrates the usefulness of Karl Pearson’s correlation
method for capturing linear relationships and its role in a wide
array of disciplines, from business to science. This finding paves the way for
real-world applications, offering clarity, predictability, and
actionable insights.
