
Activity 5 - Correlation and Regression Analysis

Correlation and Regression Analysis

Uploaded by

guihulngan city

NAME: SETH FRANJO F. SARANILLO

Activity: Exercise 5 Correlation and Regression Analysis

Instruction: Download the data (Activity 5 Data.xls) provided along with this worksheet. Using
MS Excel, calculate the correlation coefficient between the patients' Age and Blood Salt
Concentration. Interpret the result. Then perform a linear regression analysis to determine the
functional relationship between the variables Age and Blood Salt Concentration. Report and
interpret the regression equation between Age and Blood Salt Concentration.

Table 1. Age and Blood Salt Concentration of 20 Patients

Patient   Age (year)   Blood Salt Concentration (mg)
1 37 1.435
2 39 1.526
3 44 1.686
4 45 1.548
5 55 1.131
6 56 1.526
7 58 1.988
8 60 1.099
9 64 2.46
10 67 1.386
11 71 2.002
12 72 2.617
13 74 1.917
14 76 1.723
15 76 2.054
16 77 1.825
17 81 2.054
18 84 1.932
19 89 2.262
20 91 2.701
Interpretation:

Table 1 lists the ages of the 20 patients and their corresponding blood salt concentrations.
Patient 20, the oldest patient (91 years old), has the highest blood salt concentration
(2.701 mg), while Patient 8 (60 years old) has the lowest (1.099 mg).

Table 2. Correlation Coefficient Between Patient’s Age and Blood Salt Concentration

                                Age (year)    Blood Salt Concentration (mg)
Age (year)                      1
Blood Salt Concentration (mg)   0.620137145   1

Interpretation:

Table 2 shows the correlation coefficient between a patient's age and blood salt concentration,
which is 0.620137145. This indicates a moderate positive correlation between the two variables:
as the age of the patient increases, the blood salt concentration also tends to increase.
However, it is important to note that correlation does not imply causation.
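The coefficient in Table 2 can be cross-checked outside Excel. The following is a minimal Python sketch, assuming the data are exactly the 20 rows of Table 1; the function name `pearson_r` and the variable names are illustrative and not part of the worksheet.

```python
import math

# Data copied from Table 1 (Age in years, Blood Salt Concentration in mg).
ages = [37, 39, 44, 45, 55, 56, 58, 60, 64, 67,
        71, 72, 74, 76, 76, 77, 81, 84, 89, 91]
salt = [1.435, 1.526, 1.686, 1.548, 1.131, 1.526, 1.988, 1.099, 2.46, 1.386,
        2.002, 2.617, 1.917, 1.723, 2.054, 1.825, 2.054, 1.932, 2.262, 2.701]


def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y over the product of their spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(ages, salt)
print(f"r = {r:.9f}")
```

The correlation is symmetric, so `pearson_r(salt, ages)` gives the same value; the result should agree with the 0.620137145 reported in Table 2.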
LINEAR REGRESSION ANALYSIS BETWEEN PATIENT’S
AGE AND BLOOD SALT CONCENTRATION

Table 3. Regression Statistics


Multiple R 0.620137145
R Square 0.384570079
Adjusted R Square 0.350379528
Standard Error 13.03455741
Observations 20

Interpretation:

Table 3 shows the regression statistics for the regression model relating a patient's age and
blood salt concentration. In this Excel output, Age (year) is the dependent variable and Blood
Salt Concentration (mg) is the single independent variable.

1. Multiple R is the correlation between the observed and predicted values of the dependent
variable. With one independent variable, it equals the absolute value of the correlation
coefficient in Table 2. In this case, the multiple R is 0.620137145, indicating a moderate
positive correlation between the variables.

2. R Square, also known as the coefficient of determination, measures the proportion of the
variance in the dependent variable that is explained by the independent variable. In this case,
the R Square value is 0.384570079, indicating that approximately 38.46% of the variance in age
is explained by blood salt concentration.

3. Adjusted R Square adjusts the R Square value for the number of independent variables and
the sample size. It penalizes the addition of unnecessary variables to the model. In this case, the
Adjusted R Square is 0.350379528, slightly lower than the R Square.

4. Standard Error measures the typical distance between the observed values of the dependent
variable and the values predicted by the regression model. In this case, the standard error is
13.03455741, so predicted ages are typically off by about 13 years.

5. Observations is the number of data points used in the regression analysis. In this case,
there are 20 observations, which are the 20 patients.
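The statistics in Table 3 follow from a few sums of squares. The sketch below is a minimal Python illustration rather than the Excel procedure itself; it assumes Age is the dependent variable and Blood Salt Concentration is the only independent variable, and all variable names are illustrative.

```python
import math

# Data copied from Table 1.
ages = [37, 39, 44, 45, 55, 56, 58, 60, 64, 67,
        71, 72, 74, 76, 76, 77, 81, 84, 89, 91]
salt = [1.435, 1.526, 1.686, 1.548, 1.131, 1.526, 1.988, 1.099, 2.46, 1.386,
        2.002, 2.617, 1.917, 1.723, 2.054, 1.825, 2.054, 1.932, 2.262, 2.701]

n = len(ages)   # Observations
k = 1           # number of independent variables

mean_x = sum(salt) / n
mean_y = sum(ages) / n
sxx = sum((x - mean_x) ** 2 for x in salt)
syy = sum((y - mean_y) ** 2 for y in ages)            # total sum of squares
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(salt, ages))

ss_regression = sxy ** 2 / sxx                         # variation explained by the line
ss_residual = syy - ss_regression                      # variation left unexplained

r_square = ss_regression / syy
multiple_r = math.sqrt(r_square)
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)
standard_error = math.sqrt(ss_residual / (n - k - 1))  # in years

print(f"Multiple R        {multiple_r:.9f}")
print(f"R Square          {r_square:.9f}")
print(f"Adjusted R Square {adjusted_r_square:.9f}")
print(f"Standard Error    {standard_error:.8f}")
print(f"Observations      {n}")
```

The adjustment factor (n - 1) / (n - k - 1) is 19/18 here, which is why the Adjusted R Square is only slightly below the R Square.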
Table 4. ANOVA
             df   SS            MS            F             Significance F
Regression   1    1911.005636   1911.005636   11.24784672   0.003535245
Residual     18   3058.194364   169.8996869
Total        19   4969.2

                                Coefficients   Standard Error   t Stat        P-value       Lower 95%      Upper 95%    Lower 95.0%   Upper 95.0%
Intercept                       24.47509489    12.66190537      1.932970921   0.069128486   -2.126581176   51.076771    -2.1265812    51.076771
Blood Salt Concentration (mg)   22.41533147    6.683600316      3.353780958   0.003535245   8.373608262    36.4570547   8.37360826    36.4570547

Interpretation:

Table 4 shows the analysis of variance (ANOVA) for the regression model, followed by the table
of estimated coefficients.

ANOVA Table:
- The "Regression" row gives the statistics for the variation explained by the regression model.
  - "df" is the degrees of freedom, which equals the number of predictors in the model
    (1 in this case).
  - "SS" is the regression sum of squares (1911.005636), which measures the variation in age
    explained by the model.
  - "MS" is the mean square, which is the sum of squares divided by its degrees of freedom.
  - "F" is the F-statistic, which is the ratio of the regression mean square to the residual
    mean square.
  - "Significance F" is the p-value associated with the F-statistic, indicating the
    significance of the regression model as a whole.
- The "Residual" row gives the statistics for the residuals, the variation not explained by
  the model.
  - "df" is the residual degrees of freedom (18 in this case: 20 observations minus the 2
    estimated coefficients).
  - "SS" is the sum of squares of the residuals (3058.194364).
  - "MS" is the mean square of the residuals (169.8996869).
- The "Total" row gives the total degrees of freedom (19) and the total sum of squares
  (4969.2), each the sum of the regression and residual values.
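The ANOVA decomposition described above can be sketched in Python as follows. This is an illustrative reconstruction under the assumption that Age is regressed on Blood Salt Concentration by ordinary least squares; the variable names are not from the worksheet.

```python
# Data copied from Table 1.
ages = [37, 39, 44, 45, 55, 56, 58, 60, 64, 67,
        71, 72, 74, 76, 76, 77, 81, 84, 89, 91]
salt = [1.435, 1.526, 1.686, 1.548, 1.131, 1.526, 1.988, 1.099, 2.46, 1.386,
        2.002, 2.617, 1.917, 1.723, 2.054, 1.825, 2.054, 1.932, 2.262, 2.701]

n = len(ages)
k = 1  # one predictor: Blood Salt Concentration

# Least-squares line for Age as a function of Blood Salt Concentration.
mean_x = sum(salt) / n
mean_y = sum(ages) / n
sxx = sum((x - mean_x) ** 2 for x in salt)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(salt, ages))
slope = sxy / sxx
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * x for x in salt]

# Sums of squares: explained, unexplained, and total variation in Age.
ss_regression = sum((p - mean_y) ** 2 for p in predicted)
ss_residual = sum((y - p) ** 2 for y, p in zip(ages, predicted))
ss_total = sum((y - mean_y) ** 2 for y in ages)

df_regression, df_residual, df_total = k, n - k - 1, n - 1
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual

print(f"Regression  df={df_regression}  SS={ss_regression:.6f}  MS={ms_regression:.6f}  F={f_stat:.8f}")
print(f"Residual    df={df_residual}  SS={ss_residual:.6f}  MS={ms_residual:.7f}")
print(f"Total       df={df_total}  SS={ss_total:.1f}")
```

Significance F is the upper-tail probability of the F distribution with (1, 18) degrees of freedom at `f_stat`; computing it needs a statistical library, so it is left out of this standard-library sketch.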

Coefficients Table:
- The coefficients table provides information about the estimated coefficients of the
regression model.
- In this case, there are two coefficients: "Intercept" and "Blood Salt Concentration (mg)".
- For each coefficient, the table provides the estimated value, standard error, t-statistic
(the estimate divided by its standard error), p-value, and 95% confidence interval.

The regression model has a significant F-statistic (11.24784672) with a p-value of 0.003535245,
indicating that the model as a whole is statistically significant at the 0.05 level. The
estimated regression equation, with Age as the response variable, is
Age (year) = 24.4751 + 22.4153 × Blood Salt Concentration (mg).
The coefficient for the "Intercept" is 24.4751 with a standard error of 12.6619. Its p-value
(0.069128486) is above the conventional significance level of 0.05, and its 95% confidence
interval (-2.1266, 51.0768) includes zero, so the intercept is not statistically significant.
The coefficient for "Blood Salt Concentration (mg)" is 22.4153 with a standard error of 6.6836,
meaning that each additional 1 mg of blood salt concentration is associated with an average
increase of about 22.4 years in predicted age. Its p-value (0.003535245) indicates that the
variable is statistically significant in predicting the response variable; with a single
predictor, this p-value is the same as Significance F. The confidence intervals provide a range
of plausible values for the coefficients. For example, the 95% confidence interval for the
coefficient of "Blood Salt Concentration (mg)" is (8.3736, 36.4571), which excludes zero.

Overall, the regression model suggests that "Blood Salt Concentration (mg)" is significantly
associated with the response variable, Age, while the intercept is not statistically
significant.
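The coefficient estimates, standard errors, t-statistics, and confidence intervals in Table 4 can be reproduced with the sketch below. It is a hedged illustration, not the Excel implementation: the critical value `T_CRIT` is hard-coded from a t table for 18 degrees of freedom (an assumption made to avoid a statistics library), and all names are illustrative.

```python
import math

# Data copied from Table 1.
ages = [37, 39, 44, 45, 55, 56, 58, 60, 64, 67,
        71, 72, 74, 76, 76, 77, 81, 84, 89, 91]
salt = [1.435, 1.526, 1.686, 1.548, 1.131, 1.526, 1.988, 1.099, 2.46, 1.386,
        2.002, 2.617, 1.917, 1.723, 2.054, 1.825, 2.054, 1.932, 2.262, 2.701]

# Assumption: two-sided 95% critical value of Student's t with 18 df, from a t table.
T_CRIT = 2.100922

n = len(ages)
mean_x = sum(salt) / n
mean_y = sum(ages) / n
sxx = sum((x - mean_x) ** 2 for x in salt)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(salt, ages))

# Least-squares estimates for Age = intercept + slope * Blood Salt Concentration.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual standard error (the "Standard Error" of Table 3).
ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(salt, ages))
s = math.sqrt(ss_residual / (n - 2))

# Standard errors of the two coefficients.
se_slope = s / math.sqrt(sxx)
se_intercept = s * math.sqrt(1 / n + mean_x ** 2 / sxx)

# t-statistics: each estimate divided by its standard error.
t_slope = slope / se_slope
t_intercept = intercept / se_intercept

# 95% confidence intervals: estimate plus or minus T_CRIT standard errors.
ci_slope = (slope - T_CRIT * se_slope, slope + T_CRIT * se_slope)
ci_intercept = (intercept - T_CRIT * se_intercept, intercept + T_CRIT * se_intercept)

print(f"Intercept  {intercept:.8f}  SE {se_intercept:.8f}  t {t_intercept:.6f}  CI {ci_intercept}")
print(f"Slope      {slope:.8f}  SE {se_slope:.8f}  t {t_slope:.6f}  CI {ci_slope}")
```

With one predictor, the square of the slope's t-statistic equals the F-statistic of the ANOVA table, which is why both tests share the same p-value.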
Table 5. Residual Output
Observation   Predicted Age (year)   Residuals   Standard Residuals
1 56.64109556 -19.64109556 -1.54813906
2 58.68089072 -19.68089072 -1.55127577
3 62.26734376 -18.26734376 -1.43985799
4 59.17402802 -14.17402802 -1.11721703
5 49.82683479 5.173165209 0.407756231
6 58.68089072 -2.680890724 -0.21131162
7 69.03677386 -11.03677386 -0.8699342
8 49.10954418 10.89045582 0.8584012
9 79.61681032 -15.61681032 -1.23093918
10 55.54274432 11.45725568 0.903077171
11 69.35058851 1.649411494 0.130008957
12 83.13601736 -11.13601736 -0.87775671
13 67.44528533 6.55471467 0.516651923
14 63.09671102 12.90328898 1.017055569
15 70.51618574 5.483814258 0.432242031
16 65.38307483 11.61692517 0.915662546
17 70.51618574 10.48381426 0.826349135
18 67.7815153 16.2184847 1.278364006
19 75.17857469 13.82142531 1.08942438
20 85.01890521 5.981094794 0.471438389

Graph 1. Blood Salt Concentration (mg) Residual Plot
[Plot of Residuals (y-axis) against Blood Salt Concentration (mg) (x-axis)]

Graph 2. Blood Salt Concentration (mg) Line Fit Plot
[Plot of Age (year) (y-axis) against Blood Salt Concentration (mg) (x-axis), with the fitted
line f(x) = 22.4153314742233 x + 24.475094894122]

Interpretation:

Table 5 shows the residual output for the 20 observations in the regression of a patient's age
on blood salt concentration.

- Observation column represents the index or number assigned to each observation.

- Predicted Age (year) column represents the age predicted by the regression equation for
each observation.
- Residuals column represents the difference between the actual age and the predicted age
for each observation. It is calculated as the actual age minus the predicted age. For
example, Patient 1 is 37 years old and has a predicted age of 56.64, giving a residual of
-19.64.

- Standard Residuals column represents the standardized residuals for each observation.
Standardized residuals are calculated by dividing the residual by the standard deviation
of the residuals. They measure how far each observation lies from the fitted regression
line in terms of standard deviations.

In this table, the predicted ages are compared to the actual ages, and the residuals and
standardized residuals are calculated to assess the accuracy of the predictions. Negative
residuals indicate that the predicted age is higher than the actual age, while positive residuals
indicate that the predicted age is lower than the actual age. All standardized residuals lie
between -2 and 2, so no observation stands out as an extreme outlier.
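The columns of Table 5 can be recomputed as sketched below. Two points are assumptions rather than documented facts: the residual is taken as actual minus predicted, and the standardized residual divides by the sample standard deviation of the residuals (n - 1 denominator), a convention inferred here because it reproduces the Table 5 values. Variable names are illustrative.

```python
import statistics

# Data copied from Table 1.
ages = [37, 39, 44, 45, 55, 56, 58, 60, 64, 67,
        71, 72, 74, 76, 76, 77, 81, 84, 89, 91]
salt = [1.435, 1.526, 1.686, 1.548, 1.131, 1.526, 1.988, 1.099, 2.46, 1.386,
        2.002, 2.617, 1.917, 1.723, 2.054, 1.825, 2.054, 1.932, 2.262, 2.701]

n = len(ages)
mean_x = sum(salt) / n
mean_y = sum(ages) / n
sxx = sum((x - mean_x) ** 2 for x in salt)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(salt, ages))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Predicted Age from the fitted regression equation.
predicted = [intercept + slope * x for x in salt]

# Residual = actual age minus predicted age.
residuals = [y - p for y, p in zip(ages, predicted)]

# Assumed convention: divide by the sample standard deviation of the residuals.
residual_sd = statistics.stdev(residuals)
standard_residuals = [e / residual_sd for e in residuals]

for i, (p, e, z) in enumerate(zip(predicted, residuals, standard_residuals), start=1):
    print(f"{i:2d}  {p:12.8f}  {e:13.8f}  {z:12.8f}")
```

Least-squares residuals sum to zero when the model includes an intercept, so the negative residuals of the younger patients are balanced by the positive residuals of the older ones.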

Table 6. Probability Output

Percentile   Age (year)
2.5 37
7.5 39
12.5 44
17.5 45
22.5 55
27.5 56
32.5 58
37.5 60
42.5 64
47.5 67
52.5 71
57.5 72
62.5 74
67.5 76
72.5 76
77.5 77
82.5 81
87.5 84
92.5 89
97.5 91

Graph 3. Normal Probability Plot
[Plot of Age (year) (y-axis) against Sample Percentile (x-axis)]

Interpretation:
Table 6 shows the probability output, which is used to assess whether the data follow a normal
distribution. In the normal probability plot (Graph 3), the x-axis represents the sample
percentile, while the y-axis represents the observed ages.

In Table 6, the first column gives the sample percentiles, ranging from 2.5% to 97.5%, and the
second column gives the corresponding ages in years, sorted from youngest to oldest.

The pattern of the dots in Graph 3 indicates whether the data follow a normal distribution. If
the dots roughly follow a straight line, the data are taken to be approximately normally
distributed. If the dots deviate significantly from a straight line, the data do not follow a
normal distribution. In this graph, the dots roughly follow a straight line, so I conclude that
the ages are approximately normally distributed; the more closely the points align with the
straight line, the stronger the evidence of normality.
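The percentiles in Table 6 are consistent with the rule 100 × (i − 0.5) / n for the i-th smallest of n ages. The sketch below rebuilds the table under that assumption, which is inferred from the listed values rather than taken from Excel documentation; the names are illustrative.

```python
# Ages copied from Table 1.
ages = [37, 39, 44, 45, 55, 56, 58, 60, 64, 67,
        71, 72, 74, 76, 76, 77, 81, 84, 89, 91]

n = len(ages)
sorted_ages = sorted(ages)

# Assumed rule: the i-th smallest value sits at the midpoint of the i-th of n equal slices.
percentiles = [100 * (i - 0.5) / n for i in range(1, n + 1)]

# Pair each sample percentile with the age at that rank.
probability_output = list(zip(percentiles, sorted_ages))

for pct, age in probability_output:
    print(f"{pct:5.1f}  {age}")
```

Plotting the second column against the first gives the points of Graph 3.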
