
Module 6 Content

Biostatistics for College

Uploaded by

sunshine.catan

Module 6

SIMPLE LINEAR REGRESSION

I. INTRODUCTION

In the previous topic (Pearson’s Moment Correlation), you were tasked to determine whether the
relationship between two variables is significant. You also drew findings from statistical results,
contextualized conclusions, and provided additional perspective based on the problem.
It was also mentioned that correlation is not causation. Regression analysis goes a step further
than correlation: it treats one variable as the predictor and the other as the response, although a
significant regression by itself still does not prove causation. The purpose of regression analysis
is to predict by looking at the trend of the regression line fitted to the data.

II. OBJECTIVES

At the end of this lesson, you (students) are expected to:


a. determine the assumptions of using Simple Linear Regression;
b. appropriately apply Simple Linear Regression using statistical software; and
c. write findings, conclusions, and additional perspectives based on the statistical results of
Simple Linear Regression.

III. LESSON PROPER


The following assumptions need to be satisfied before using simple linear regression analysis.

Assumptions of Simple Linear Regression (Bluman, 2009):

1.) The sample is a random sample.

2.) For any specific value of the independent variable x, the value of the dependent
variable y must be normally distributed about the regression line.

3.) The standard deviation of each of the dependent variables must be the same for
each value of the independent variable.

Extrapolation, or making predictions beyond the bounds of the data, must be interpreted cautiously.
For example, in 1979, some experts predicted that the United States would run out of oil by the year
2003. This prediction was based on the current consumption and on known oil reserves at that time.
However, since then, the automobile industry has produced many new fuel-efficient vehicles. Also,
there are many as yet undiscovered oil fields. Finally, science may someday discover a way to run a car
on something as unlikely but as common as peanut oil. In addition, the price of a gallon of gasoline
was predicted to reach $10 a few years later. Fortunately, this has not come to pass. Remember that
when predictions are made, they are based on present conditions or on the premise that present
trends will continue. This assumption may or may not prove true in the future (Bluman, 2009, p. 556).

Difference between correlation and regression analysis.
In Example 1, I will show you how to compute linear regression manually (however, you are not
required to compute manually). The purpose is for you to appreciate and understand deeply the
meaning of the best fit line and the coefficient of determination (r²).

Example 1. The data below are the Mathematics and Physics scores of the students. Answer the
following questions:

a. Given the Physics scores alone, guess the Physics score if we let another student take the
test.
b. What is the variation of the dependent variable (Physics scores)?
c. What percentage of the variation of the dependent variable (DV) is explained by the
independent variable (IV), and what is the best fit line?
d. Is the Mathematics score a significant predictor of the Physics score?
e. What possible conclusions can you draw based on the statistical results?

What do you think is the score of another student who took the test?

Given our lack of information about the test takers' characteristics, our best estimate is derived
from the average performance of the initial seven students who previously took the same Physics
test. That average serves as our best fit line and the basis of our prediction.
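
As a minimal sketch of this idea (it is not part of the Minitab procedure), the Python lines below compute the average of the seven Physics scores listed in this example. The variable names are my own and are only for illustration.

```python
# Physics scores of the seven students, as listed in Example 1.
physics_scores = [20, 30, 46, 15, 30, 35, 20]

# With no predictor available, the mean is the baseline prediction
# for the score of another student who takes the same test.
baseline_prediction = sum(physics_scores) / len(physics_scores)

print(baseline_prediction)
```

The result, 28, is the horizontal line that plays the role of the best fit line when the Mathematics score is not yet used.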

What is the variation of the dependent variable (Physics score)?

Note: The residual is the distance between the best fit line and the observed scores.
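
The sketch below illustrates one common way to express this variation, assuming the best fit line at this stage is simply the mean of the Physics scores: each residual is squared and the squares are added up (the total sum of squares). The variable names are illustrative only.

```python
# Physics scores of the seven students, as listed in Example 1.
physics_scores = [20, 30, 46, 15, 30, 35, 20]

# Baseline "best fit line" when no predictor is used: the mean score.
mean_score = sum(physics_scores) / len(physics_scores)

# Residual of each student measured from the mean, then squared.
squared_deviations = [(y - mean_score) ** 2 for y in physics_scores]

# Total variation of the dependent variable (total sum of squares).
total_variation = sum(squared_deviations)

print(squared_deviations)
print(total_variation)
```

The squared deviations are 64, 4, 324, 169, 4, 49, and 64, which add up to 678.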

What percentage of the variation of the DV (Physics score) is explained by the IV (Mathematics
score)?

Note that there are many possible lines to predict the variation of the DV; however, there is only
one best fit line.

The terms “regression line” and “best fit line” are synonymous.
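
To see why only one line is the best fit, here is a short sketch of the usual least-squares computation. The x and y values below are hypothetical (they are NOT the Mathematics and Physics scores of Example 1), and the function names `fit_line` and `sum_squared_residuals` are my own.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line for paired data."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sum_squared_residuals(x, y, intercept, slope):
    """Total squared error of the line y = intercept + slope * x."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data, used only to illustrate the computation.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

intercept, slope = fit_line(x, y)
best_error = sum_squared_residuals(x, y, intercept, slope)

# Any other candidate line gives a larger total squared error.
other_error = sum_squared_residuals(x, y, 2.0, 0.7)

print(intercept, slope)
print(best_error, other_error)
```

For these hypothetical values the fitted line has an intercept of about 2.2 and a slope of about 0.6, and its total squared error is smaller than that of the alternative line.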

Notice that the graph of the best fit line has the least error/residual.

Student                          1       2       3       4       5       6       7
Observed Physics score          20      30      46      15      30      35      20
Predicted score              22.48   35.36   43.65   16.95   21.56   33.52   22.48
Residual                     -2.48   -5.36    2.35   -1.95    8.44    1.48   -2.48
Squared residual              6.15   28.73    5.52    3.80   71.23    2.19    6.15
Squared deviation from mean     64       4     324     169       4      49      64

GOOD FIT!
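
As a rough check on the phrase "good fit", the sketch below computes the share of the variation explained by the best fit line, using the observed and predicted values listed above. Because the predicted values are rounded to two decimals, the result may differ slightly from what statistical software reports. The variable names are illustrative.

```python
# Observed Physics scores and the predicted scores from the best fit line.
observed = [20, 30, 46, 15, 30, 35, 20]
predicted = [22.48, 35.36, 43.65, 16.95, 21.56, 33.52, 22.48]

# Unexplained variation: sum of squared residuals around the best fit line.
residuals = [o - p for o, p in zip(observed, predicted)]
sse = sum(r ** 2 for r in residuals)

# Total variation: sum of squared deviations around the mean score.
mean_observed = sum(observed) / len(observed)
sst = sum((o - mean_observed) ** 2 for o in observed)

# Coefficient of determination: proportion of variation explained.
r_squared = 1 - sse / sst

print(round(sse, 2), sst, round(r_squared, 4))
```

With these rounded values, the unexplained variation is about 123.78 out of a total of 678, so roughly 82% of the variation in the Physics scores is explained by the line.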

It is easier to use statistical software than to compute manually. You may follow these steps
using the Minitab software:

It is not necessary to do steps 2 and 3 if the assumptions of using a parametric test have been
satisfied. You may continue running the regression analysis.

Notice that both the manual computation and the Minitab software yield the same result.

Below are the findings based on the Minitab statistical results.

Note: Notice that the slope of the best fit line is positive. Computing r using Pearson’s
moment correlation confirms that r is also positive, as shown below. In fact, by looking at the
slope of the best fit line, you will know whether r is positive or negative.
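
The relationship between the slope and the sign of r can be sketched as follows. In simple linear regression, r has the same sign as the slope and its size is the square root of R-sq. The R-sq values and slopes used here are the ones reported in the Minitab outputs of Examples 2 and 3 of this module; the function name `r_from_r_squared` is my own.

```python
import math


def r_from_r_squared(r_squared, slope):
    """Return r for simple linear regression: sqrt(R-sq) with the sign of the slope."""
    return math.copysign(math.sqrt(r_squared), slope)


# Example 2: R-sq = 26.46%, slope = -61.1 (negative slope, so r is negative).
r_example_2 = r_from_r_squared(0.2646, -61.1)

# Example 3: R-sq = 88.67%, slope = 1.093 (positive slope, so r is positive).
r_example_3 = r_from_r_squared(0.8867, 1.093)

print(round(r_example_2, 3), round(r_example_3, 3))
```

This shortcut applies only when there is a single predictor, as in simple linear regression.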

What possible conclusions and additional perspectives can you make?

Conclusions and other perspectives:

(1) Mastery of Mathematics concepts is essential to learning Physics; conversely, failure to
learn Mathematics concepts hinders learning Physics.
(2) There may be other factors (unexplained variation) that explain the Physics score aside
from the Mathematics score, such as critical thinking, problem-solving skills, etc.

Note: In item 2, it's worth noting that while some students excel in mathematics, they may
still struggle to develop the critical thinking and problem-solving skills essential for processing
and applying mathematical concepts to solve physics problems.

Example 2. These data were obtained for the years 1993 through 1998 and indicate the number of
fireworks (in millions) used and the related injuries. Predict the number of injuries if 100 million
fireworks are used during a given year. Use α = 0.05.

Step 1: Enter the data in separate columns.

Step 2: Click Regression > Regression > Fit Regression Model.

Step 3: Transfer the DV to Responses and the IV to Continuous predictors. If the IV is nominal or
ordinal data, it must be transferred to Categorical predictors.

Step 4: Interpret the results.

Regression Analysis: Related Injuries versus Fireworks in use

Analysis of Variance

Source              DF    Adj SS    Adj MS  F-Value  P-Value
Regression           1   8115590   8115590     1.44    0.296
  Fireworks in use   1   8115590   8115590     1.44    0.296
Error                4  22552744   5638186
Total                5  30668333

Model Summary

      S    R-sq  R-sq(adj)  R-sq(pred)
2374.49  26.46%      8.08%       0.00%

Regression Equation

Related Injuries = 16778 - 61.1 Fireworks in use
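
To connect the pieces of this output, the sketch below recomputes the F-value, R-sq, and S from the sums of squares in the Analysis of Variance table, and then plugs 100 million fireworks into the regression equation. The function name `predict_injuries` is my own, and the prediction is in whatever units the Related Injuries column uses.

```python
# Values copied from the Analysis of Variance table of Example 2.
ss_regression = 8115590
ss_error = 22552744
df_regression = 1
df_error = 4

# Mean squares, F-value, R-sq, and S follow directly from the table.
ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error
f_value = ms_regression / ms_error
r_squared = ss_regression / (ss_regression + ss_error)
s = ms_error ** 0.5


def predict_injuries(fireworks_millions):
    """Apply the reported regression equation: 16778 - 61.1 * fireworks."""
    return 16778 - 61.1 * fireworks_millions


predicted_at_100 = predict_injuries(100)

print(round(f_value, 2), round(r_squared, 4), round(s, 2))
print(round(predicted_at_100))
```

The equation gives about 10,668 related injuries for 100 million fireworks. Since the regression is not significant at α = 0.05 (p-value = 0.296), this figure should be read with caution.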

Findings:

Since the p-value (0.296) is greater than α = 0.05, the number of fireworks is not a significant
predictor of related injuries.

Conclusion and other perspective/s:

(1) The data do not show that increasing the number of fireworks used increases the number of
injuries.
(2) The number of fireworks used does not appear to drive the injuries; other factors, such as
other types of firecrackers, may cause the injuries.

In the next example, I use the problem from the pre-class activity in the topic Pearson’s Moment
Correlation so that you can distinguish between correlation and regression analysis.

Example 3. Is soluble protein a significant predictor of chlorophyll? Note that chlorosis is the
loss of chlorophyll in plants, especially in the leaves, which causes the plant to weaken and
eventually die. Use α = 0.01. Assume that all assumptions of the parametric test are met.

Statistical Result:

Regression Analysis: Chlorophyll versus Soluble Protein

Analysis of Variance

Source             DF  Adj SS   Adj MS  F-Value  P-Value
Regression          1  2.3843  2.38426    39.13    0.002
  Soluble Protein   1  2.3843  2.38426    39.13    0.002
Error               5  0.3046  0.06092
Total               6  2.6889

Model Summary

       S    R-sq  R-sq(adj)  R-sq(pred)
0.246829  88.67%     86.41%      66.49%

Regression Equation

Chlorophyll = -0.240 + 1.093 Soluble Protein

Findings:

(1) Since the p-value (0.002) is less than α = 0.01, soluble protein is a significant predictor of
the amount of chlorophyll in plants.

(2) 88.67% of the variation of chlorophyll in plants is explained by the amount of soluble
protein.

(3) For every 1-unit increase in soluble protein, there is a 1.093-unit increase in the chlorophyll
of plants.
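
Findings (2) and (3) can be checked with a short sketch based on the Minitab output above. The soluble protein values 2.0 and 3.0 are hypothetical inputs chosen only to show a 1-unit increase, and the function name `predict_chlorophyll` is my own.

```python
def predict_chlorophyll(soluble_protein):
    """Apply the reported regression equation: -0.240 + 1.093 * soluble protein."""
    return -0.240 + 1.093 * soluble_protein


# Finding (3): a 1-unit increase in soluble protein (hypothetical values 2.0 -> 3.0)
# changes the predicted chlorophyll by the slope.
increase_per_unit = predict_chlorophyll(3.0) - predict_chlorophyll(2.0)

# Finding (2): R-sq is the regression sum of squares over the total sum of squares.
r_squared = 2.3843 / 2.6889

print(round(increase_per_unit, 3), round(r_squared, 4))
```

The difference equals the slope, 1.093, and the ratio of the sums of squares reproduces the R-sq of 88.67% in the Model Summary.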

Conclusion/Other perspective/s:

(1) The increase in soluble protein in plants leads to high chlorophyll content. Conversely,
low soluble protein in plants leads to low chlorophyll content.
(2) Ensuring plants have a high concentration of soluble protein is vital for promoting their
overall health and vitality.
(3) Insufficient soluble protein in plants can lead to diseases and, in severe cases, can result
in death.

IV. REFERENCES

Books
Abbott, M. L. (2017). Using Statistics in the Social and Health Sciences with SPSS® and Excel®.
John Wiley & Sons, Inc.

Bluman, A. G. (2009). Elementary Statistics: A Step by Step Approach (Eighth Edition). McGraw-Hill

Chaudhary, K. (2020). Introduction to Biotechnology and Biostatistics. Delve Publishing

Ho, R. (2018). Understanding Statistics for the Social Sciences with IBM SPSS. Taylor & Francis
Group, LLC

Navidi, W. & Monk, B. (2019). Elementary Statistics (Third Edition). McGraw-Hill Education

Internet Source and Related Studies

ANOVA Examples. (n.d.).
https://www.people.vcu.edu/~wsstreet/courses/314_20033/Examples.ANOVA.pdf

ANOVA Test - Types, Table, Formula, Examples. (2021). Cuemath.
https://www.cuemath.com/anova-formula/

http://eagri.org/eagri50/STAM101/pdf/pract07.pdf

https://www.cimt.org.uk/projects/mepres/alevel/fstats_ch7.pdf

Indoria, A. K., Sharma, K. L., Reddy, K. S., & Rao, C. S. (2017). Role of soil physical properties in soil
health management and crop productivity in rainfed systems-I: Soil physical constraints
and scope. Current Science, 2405-2414.

https://www.kaggle.com/

https://sesricdiag.blob.core.windows.net/oicstatcom/TEXTBOOK-CORRELATION-AND-REGRESSION-ANALYSIS-EGYPT-EN.pdf

https://www.cimt.org.uk/projects/mepres/alevel/stats_ch12.pdf

https://02402.compute.dtu.dk/enotes/solutions-chapter5.pdf

Mathew, T. K., & Tadi, P. (2020). Blood glucose monitoring.

Utah State University. (2024). What is Iron Chlorosis and What Causes it? | Forestry | Extension.
Usu.edu. https://extension.usu.edu/forestry/trees-cities-towns/tree-care/causes-iron-chlorosis#:~:text=The%20primary%20symptom%20of%20iron,as%20the%20plant%20cells%20die.
