
Statistics Regression Final Project

The document discusses correlation and regression analysis. It defines correlation as the relationship between two variables, and identifies three types: positive, negative, and zero correlation. Regression lines represent the trend in the data and can be used to predict future outcomes. Five data sets are presented with varying strengths of correlation to demonstrate the practical applications of correlation plots and regression analysis. While correlation indicates a relationship, it does not necessarily imply causation.

Uploaded by

Henry Pinolla


Basic Statistics

FOR REFERENCE ONLY - PROPERTY OF AUTHOR

Practical applications of correlation plots and regression.

Introduction

I will explain what correlation is, what regression lines are, and how each is determined. To help visualize the practical applications of correlation plots and regressions, I will show 5 data sets with trend lines drawn in Excel. However, any program capable of drawing scatter plots with linear trend lines will suffice. After a trend line is added and a regression equation determined for each of the 5 data sets, I hope that their real-world application will be evident.

Definition of Correlation

A correlation is simply the relationship or interdependence of two variables. Correlations are useful because they can indicate a predictive relationship. For instance, consider the amount of time a student spends studying (the first variable) and their academic performance (the second variable). Common sense would dictate that the more time a student spends studying, the better their academic performance, and vice versa. Hence, the first variable and the second variable are linked in that there is, potentially, a positive effect.

In statistics, these variables are commonly defined as x and y. X is called the independent variable and Y is called the dependent variable. In the example above, the amount of time spent studying for an exam is independent, as it does not depend on the other variable and is up to each individual student. We can call that x. The students’ academic performance, on the other hand, does depend on the amount of time spent studying, and therefore we can call that the y variable.

There are three types of correlations. These are positive correlation, negative correlation, and zero (or no) correlation. A positive correlation is a relationship in which the x and y variables tend to increase together, giving a correlation greater than zero. For instance, height and weight are positively correlated since taller people tend to be heavier. A negative correlation is a relationship between two variables in which an increase in one variable is associated with a decrease in the other variable. For instance, the more time a student spends playing video games, the lower their GPA tends to be. While one variable increases (playing video games), the other decreases (their GPA). It’s important to note that a negative correlation does not imply a negative side effect. As a basic example, the more time a person spends exercising, the lower their weight tends to be; this is a negative correlation but not inherently bad. A zero correlation is one in which there is no relationship between two variables. For instance, the lumen output of a flashlight and how waterproof it is have no linear relationship whatsoever, and therefore we cannot determine a correlation between the two variables (luminosity and water-resistance).

A positive correlation ranges from just above 0 to +1, and the values in between indicate the strength of the correlation. A negative correlation ranges from just below 0 to -1, and a zero correlation is simply 0. A +1 correlation is a perfect positive correlation. Similarly, a -1 is a perfect negative correlation.

● A correlation of +0.5 is a stronger positive correlation than +0.33.

● A correlation of -0.2 is a weaker negative correlation than -0.

● The closer the data points are to the line, the “stronger” the correlation is. If many points lie far from the line, it can be said that the correlation is “moderate” or even “weak”. If there is no distinguishable relationship, then there is no correlation.

Source: danshiebler.com

Scatter Plots

A scatter plot is a graph that is used to plot the data points for two variables. Each scatter plot has a horizontal axis (x-axis) and a vertical axis (y-axis). One variable is plotted on each axis. Scatter plots are made up of marks; each mark represents one observation (for example, one study participant's measures) on the variables that are on the x-axis and y-axis. Many scatter plots include a line of best fit, which is a straight line drawn through the center of the data points that best represents the trend of the data. Scatter plots provide a visual representation of the relationship between the variables and make it easier to spot trends quickly.

In statistics, the correlation coefficient r measures the strength and direction of a linear relationship between two variables on a scatter plot. The correlation coefficient tells us how closely the data points of a scatter plot fall along a trend line (points closer to the trend line indicate a strong correlation, while points further away indicate a relatively weaker correlation).

The value of r is always between +1 and –1. To interpret its value, see which of the following values your correlation coefficient r is closest to:

● r = –1 A perfect downhill (negative) linear relationship

● r = –0.70 A strong downhill (negative) linear relationship

● r = –0.50 A moderate downhill (negative) relationship

● r = –0.30 A weak downhill (negative) linear relationship

● r = 0 No linear relationship (zero correlation)

● r = +0.30 A weak uphill (positive) linear relationship

● r = +0.50 A moderate uphill (positive) relationship

● r = +0.70 A strong uphill (positive) linear relationship

● r = +1 A perfect uphill (positive) linear relationship

Source: sciencedirect.com
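The essay reads r from Excel but does not show how it is computed. As a minimal sketch, assuming the standard Pearson formula (the sum of the products of the deviations from the means, divided by the square root of the product of the summed squared deviations), r could be calculated as follows. The function name and the sample numbers are illustrative only and are not taken from the five data sets.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up example values (not from the essay's data sets)
hours = [1, 2, 3, 4, 5]
uphill = [10, 20, 30, 40, 50]    # rises in a perfect straight line
downhill = [50, 40, 30, 20, 10]  # falls in a perfect straight line

print(pearson_r(hours, uphill))
print(pearson_r(hours, downhill))
```

Data lying exactly on a rising line gives r = +1 and data lying exactly on a falling line gives r = -1 (up to floating-point rounding); real data usually falls somewhere in between.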

Regression Lines

A regression line is a straight line that describes a data set in a visual way. It’s also known as a trend line or “line of best fit”. Regression lines are very useful for predicting future outcomes and trends. The purpose of the line is to describe the relationship between a dependent variable, y, and an independent variable, x. Regression lines are used in a variety of ways. Some of the more common uses are predicting pandemic infection rates, stock prices, and sports odds for gambling, as well as other areas where a strong trend may point to a predictable potential future outcome.
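The essay relies on Excel to draw the line, and Excel's linear trendline is a least-squares fit. The sketch below assumes that same ordinary least-squares method: the slope is the sum of the products of the deviations divided by the sum of the squared x deviations, and the intercept makes the line pass through the point of means. The function name `fit_line` and the sample points are hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up points that lie exactly on y = 2x + 1
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(f"y = {slope:.3f}x + {intercept:.3f}")
```

With real, scattered data the points will not sit exactly on the line; the fit is the line that makes the squared vertical distances to the points as small as possible.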

Data Sets and Examples

Data Set 1

This scatter plot shows a strong positive linear correlation, indicating that the more time a student spends studying, the higher their test score tends to be. As x increases, y increases with a linear upward trend. The correlation coefficient is 0.86. The closer a correlation coefficient is to 1, the stronger the correlation, so this is a strong positive correlation. The data points are close to the trend line, which is indicative of a strong correlation.

If the number of hours of studying is 7, the predicted test score is about 76. Using the slope-intercept equation y = 3.855x + 49.156, if x = 7, then y = (3.855 × 7) + 49.156 = 76.141.
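The same arithmetic can be sketched in a few lines of code. The slope and intercept are the values reported for Data Set 1; the function name `predict_score` is an illustrative choice, not something from the original project.

```python
# Regression equation reported for Data Set 1: y = 3.855x + 49.156
SLOPE = 3.855
INTERCEPT = 49.156

def predict_score(hours_studied):
    """Predicted test score for a given number of study hours."""
    return SLOPE * hours_studied + INTERCEPT

print(round(predict_score(7), 3))  # 76.141
```

Predictions of this kind are most trustworthy for x values inside the range of the original data.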

Data Set 2

This scatter plot represents a moderately weak negative linear correlation with a coefficient of -0.46. As x increases, y tends to decrease with a linear downward trend. However, compared to the first data set, it is easy to see that the correlation is not as strong, since the data points are further from the trend line. According to this graph, for one reason or another, the more time a student spends in a lab, the lower their course grade tends to be.

Data Set 3

This plot represents a strong (nearly perfect) negative linear correlation with a coefficient of -0.98. As the age of a person increases, the number of hours spent jogging per week decreases. The data points are almost on the trend line itself, indicating a very strong correlation.

To predict the number of hours a 40-year-old person jogs per week, we can use the regression equation y = -0.1396x + 10.199. If x = 40, then y = (-0.1396 × 40) + 10.199 = 4.615, so he or she is predicted to jog approximately 4.6 hours per week.

Data Set 4

This scatter plot represents a moderate positive linear correlation, with a correlation coefficient of r = 0.59. Some data points are near the trend line while others are further away, so the linear correlation is neither strong nor weak. According to this graph, spending more on advertising may influence the number of products sold in a positive way.

Data Set 5

The plot for data set 5 shows no linear relationship between the variables, as the data points do not follow a clear trend line and are scattered randomly throughout. According to this graph, there is no linear relationship between temperature and plant growth.

Causation versus correlation

Correlation, as defined above, indicates a simple relationship between the values of two variables. A scatter plot displays this data and is a useful tool for visually determining whether a correlation exists between the variables and how strong it is.

Causation means that one event causes another event to occur. Causation only applies when a change in one variable has been shown to cause a change in a dependent variable. Causation is typically established through controlled experiments and rigorous testing, with results commonly reported at a confidence level of at least 95%.

Causation and correlation can occur simultaneously between two data sets. However, correlation does not imply causation. As an example, there seems to be a correlation between the number of 5G cell phone towers and confirmed COVID-19 cases on maps. However, there is no evidence that the 5G cell phone towers actually cause or increase the risk of getting COVID-19. The correlation might arise because 5G towers tend to be located in large metropolitan areas with larger populations, and that may account for the higher number of COVID-19 cases compared to cities with lower populations and no 5G towers. The 5G towers can’t be said to “cause or increase the risk” of contracting COVID-19, and thus there is no causal link even though a positive correlation may be seen. We are always looking for patterns around us to explain what we see and to find links between things. Events that seem to “connect” based on our own common sense and judgement cannot be said to be causal unless tested, and should be assumed to be correlations.
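The population explanation can be illustrated with a small simulation. This is only a sketch with made-up numbers: it assumes hypothetical cities in which population alone drives both the tower count and the case count, with no link between towers and cases. The helper names, the noise levels, and the idea of "adjusting" by correlating what is left over after fitting each variable against population are all illustrative choices, not part of the original project.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def residuals(xs, ys):
    """What is left of ys after removing a least-squares line fitted on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical cities: population drives BOTH variables independently
population = [rng.uniform(10_000, 1_000_000) for _ in range(200)]
towers = [p / 10_000 + rng.gauss(0, 5) for p in population]
cases = [p / 100 + rng.gauss(0, 500) for p in population]

r_raw = pearson_r(towers, cases)
r_adjusted = pearson_r(residuals(population, towers),
                       residuals(population, cases))

print(f"raw r = {r_raw:.2f}")
print(f"r after adjusting for population = {r_adjusted:.2f}")
```

In this setup the raw correlation between towers and cases comes out strongly positive, while the correlation that remains after population is accounted for is close to zero, which is the pattern expected when a third variable is responsible for the apparent link.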

Conclusion

In conclusion, scatter plots, correlation and regression lines are very useful statistical tools that help determine the relationship between variables in a set of data. We can use them to forecast and predict future outcomes, which makes their usage and interpretation very important in a variety of settings. Scatter plots allow us to visualize data and quickly determine what type of correlation, if any, exists between two variables. One drawback of a scatter plot is that it may be used to present data that shows correlation but not causation as evidence of a false link between two variables. For instance, in data set 3, it can be said that as we age we tend to jog less per week. However, this is based on only 6 people, so it cannot be concluded that getting older results in jogging fewer hours per week. The sample size is very small, and the people may have been cherry-picked to imply causation. More rigorous experiments and studies would need to be done to determine if there is a causative factor. Knowing these important statistical measurement tools can help us better understand the relationship between various factors in our world.
