Adhithyan

The document is a practical record for the B.Sc. (Hons.) in Statistics program at St. Thomas College, detailing various statistical analyses conducted by a student named Adhithyan O.S. It includes calculations and inferences for correlation coefficients, scatter plots, multiple correlation, and partial correlation based on collected data. The document serves as a certification of the student's original work for the course STA2CJ101: Bivariate Data Analysis during the academic year 2024-2025.


ST. THOMAS COLLEGE (AUTONOMOUS)
THRISSUR

P.G & RESEARCH DEPARTMENT OF STATISTICS

PRACTICAL RECORD
of

STA2CJ101: BIVARIATE DATA ANALYSIS

NAME: ADHITHYAN O.S.
ROLL.NO: ST02

II Semester
B.Sc. (Hons.) in STATISTICS
2024-25
ST. THOMAS COLLEGE (AUTONOMOUS)
THRISSUR
Affiliated to the University of Calicut

CERTIFICATE
Certified that this is a bona fide record of the original work done by
Adhithyan O.S. (Roll No. ST02) of the II Semester B.Sc. (Hons.) in Statistics for the
practical of the course STA2CJ101: Bivariate Data Analysis during the year
2024-2025.

Dr. Jeena Joseph


Teacher in Charge

Submitted for the Examination held on....................................

Examiner 1:

Examiner 2:
CONTENTS

Sl. No.  Topic
1        Karl Pearson’s coefficient of correlation
2        Spearman’s rank correlation coefficient
3        Scatter plot
4        Multiple correlation
5        Partial correlation
6        Point biserial correlation
7        Regression
8        Curve fitting of the form y = a + bx
9        Curve fitting of the form y = ab^x
10       Curve fitting of the form y = ax^b
11       Curve fitting of the form y = ae^(bx)
KARL PEARSON’S COEFFICIENT OF CORRELATION

1. A researcher collects data on the daily temperature (in degrees Celsius) and the
number of cool drinks sold at a beach shop. Determine the correlation coefficient.
SL NO.  TEMPERATURE (°C)  COOL DRINKS SOLD    SL NO.  TEMPERATURE (°C)  COOL DRINKS SOLD
1       22                150                 16      25                185
2       24                155                 17      26                198
3       26                160                 18      19                147
4       17                190                 19      31.5              215
5       26                180                 20      24                178
6       23                175                 21      27                198
7       25                140                 22      21                160
8       24                185                 23      28                230
9       17                145                 24      23.5              173
10      32                210                 25      26.8              193
11      24                175                 26      18.9              142
12      27                195                 27      29.8              214
13      20                152                 28      22.8              178
14      29.5              225                 29      25                181

Input:
x <- c(22, 24, 26, 17, 26, 23, 25, 24, 17, 32, 24, 27, 20, 29.5,
       25, 26, 19, 31.5, 24, 27, 21, 28, 23.5, 26.8, 18.9, 29.8, 22.8, 25)   # temperature
y <- c(150, 155, 160, 190, 180, 175, 140, 185, 145, 210, 175, 195, 152, 225,
       185, 198, 147, 215, 178, 198, 160, 230, 173, 173, 142, 214, 178, 181)  # cool drinks sold
cor(x, y, method = "pearson")

Output:
[1] 0.735503

Inference:
Karl Pearson’s coefficient of correlation for the given data is 0.7355. This indicates
a fairly strong positive correlation between temperature and the number of cool
drinks sold: sales tend to be higher on hotter days.
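As a cross-check on cor(), Pearson’s coefficient can be computed directly from its definition, r = sum of products of deviations divided by the square root of the product of the sums of squared deviations. The sketch below is a minimal illustration in R that reuses the vectors from the Input above; it is not part of the original record.

```r
# Temperature (x) and cool drinks sold (y), as entered in the Input above
x <- c(22, 24, 26, 17, 26, 23, 25, 24, 17, 32, 24, 27, 20, 29.5,
       25, 26, 19, 31.5, 24, 27, 21, 28, 23.5, 26.8, 18.9, 29.8, 22.8, 25)
y <- c(150, 155, 160, 190, 180, 175, 140, 185, 145, 210, 175, 195, 152, 225,
       185, 198, 147, 215, 178, 198, 160, 230, 173, 173, 142, 214, 178, 181)

dx <- x - mean(x)                     # deviations of temperature from its mean
dy <- y - mean(y)                     # deviations of sales from their mean
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

print(r_manual)
print(all.equal(r_manual, cor(x, y, method = "pearson")))  # TRUE when both agree
```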

SPEARMAN’S RANK CORRELATION


2. A company surveys 34 customers about a product. Each customer rates the product
quality and their satisfaction level on a scale of 1 to 100. Compute Spearman’s rank
correlation coefficient.

SL NO. PRODUCT QUALITY SATISFACTION LEVEL SL NO. PRODUCT QUALITY SATISFACTION LEVEL
1 67 78 18 63 70
2 23 30 19 36 45
3 89 92 20 95 97
4 45 55 21 15 20
5 12 18 22 75 82
6 91 95 23 42 48
7 56 60 24 56 90
8 78 85 25 36 38
9 34 40 26 60 65
10 98 99 27 25 27
11 30 35 28 93 94
12 69 80 29 53 58
13 50 52 30 69 75
14 82 88 31 48 50
15 19 22 32 80 86
16 59 60 33 22 30
17 70 79 34 29 32

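Input:
x <- c(67, 23, 89, 45, 12, 91, 56, 78, 34, 98, 30, 69, 50, 82, 19, 59, 70,
       63, 36, 95, 15, 75, 42, 56, 36, 60, 25, 93, 53, 69, 48, 80, 22, 29)  # product quality
y <- c(78, 30, 92, 55, 18, 95, 60, 85, 40, 99, 35, 80, 52, 88, 22, 60, 79,
       70, 45, 97, 20, 82, 48, 90, 38, 65, 27, 94, 58, 75, 50, 86, 30, 32)  # satisfaction level
spearmans_corr <- cor(x, y, method = "spearman")
print(spearmans_corr)

Output:
[1] 0.9741689

Inference:
Spearman’s rank correlation coefficient is 0.974, so there is a very strong positive
rank correlation between product quality and satisfaction level: customers who rate
the quality higher generally report higher satisfaction.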
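Spearman’s coefficient is simply Pearson’s r computed on the ranks. The minimal R sketch below shows this with the table data; rank() gives tied values their average rank, and the textbook shortcut 1 - 6 sum(d^2) / (n(n^2 - 1)) is exact only when there are no ties (this data set has a few), so the two numbers may differ slightly.

```r
x <- c(67, 23, 89, 45, 12, 91, 56, 78, 34, 98, 30, 69, 50, 82, 19, 59, 70,
       63, 36, 95, 15, 75, 42, 56, 36, 60, 25, 93, 53, 69, 48, 80, 22, 29)  # product quality
y <- c(78, 30, 92, 55, 18, 95, 60, 85, 40, 99, 35, 80, 52, 88, 22, 60, 79,
       70, 45, 97, 20, 82, 48, 90, 38, 65, 27, 94, 58, 75, 50, 86, 30, 32)  # satisfaction level

rx <- rank(x)                 # tied values receive their average rank
ry <- rank(y)
rho_ranks <- cor(rx, ry)      # Pearson's r on the ranks = Spearman's rho

n <- length(x)
d <- rx - ry                  # rank differences
rho_shortcut <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))  # exact only without ties

print(c(ranks = rho_ranks, shortcut = rho_shortcut))
```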

SCATTER PLOT
3. An organisation recorded the number of hours studied by 30 students and their
corresponding exam scores. Create a scatter plot with study hours on the x-axis and
exam scores on the y-axis.
SL NO. STUDY HOURS EXAM SCORES SL NO. STUDY HOURS EXAM SCORES
1 8 70 16 36 99
2 10 73 17 37 100
3 13 75 18 38 100
4 14 85 19 40 80
5 16 86 20 44 82
6 18 88 21 45 85
7 20 90 22 46 83
8 22 91 23 47 66
9 23 93 24 49 69
10 26 95 25 50 100
11 28 95 26 51 70
12 30 96 27 55 72
13 32 98 28 56 69
14 34 99 29 58 75
15 35 99 30 60 84

Input:
x <- c(8, 10, 13, 14, 16, 18, 20, 22, 23, 26, 28, 30, 32, 34, 35,
       36, 37, 38, 40, 44, 45, 46, 47, 49, 50, 51, 55, 56, 58, 60)      # study hours
y <- c(70, 73, 75, 85, 86, 88, 90, 91, 93, 95, 95, 96, 98, 99, 99,
       99, 100, 100, 80, 82, 85, 83, 66, 69, 100, 70, 72, 69, 75, 84)   # exam scores
plot(x, y, main = "Scatter plot", xlab = "Study hours", ylab = "Exam scores",
     col = "blue", pch = 19)

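Output:
[Figure: scatter plot of exam scores (y-axis) against study hours (x-axis)]

Inference: The plot does not show a single linear trend. Exam scores rise steadily
with study hours up to about 38 hours, then drop and scatter between 66 and 100 for
longer study times. Over the full range the linear correlation is weak and slightly
negative (r is about -0.19), so a straight line describes these data poorly.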
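To put numbers on what the plot shows, the sketch below computes the correlation over the whole range and separately for the points up to and beyond 38 study hours, and overlays a lowess() smooth to make the change in trend visible. The 38-hour split point is an assumption read off the plot, not something stated in the problem.

```r
x <- c(8, 10, 13, 14, 16, 18, 20, 22, 23, 26, 28, 30, 32, 34, 35,
       36, 37, 38, 40, 44, 45, 46, 47, 49, 50, 51, 55, 56, 58, 60)      # study hours
y <- c(70, 73, 75, 85, 86, 88, 90, 91, 93, 95, 95, 96, 98, 99, 99,
       99, 100, 100, 80, 82, 85, 83, 66, 69, 100, 70, 72, 69, 75, 84)   # exam scores

cutoff <- 38                                  # assumed split point, read off the plot
r_all  <- cor(x, y)                           # linear association, full range
r_low  <- cor(x[x <= cutoff], y[x <= cutoff]) # up to the cutoff
r_high <- cor(x[x > cutoff],  y[x > cutoff])  # beyond the cutoff
print(round(c(all = r_all, up_to_cutoff = r_low, above_cutoff = r_high), 3))

plot(x, y, main = "Scatter plot", xlab = "Study hours", ylab = "Exam scores",
     col = "blue", pch = 19)
lines(lowess(x, y), col = "red", lwd = 2)     # smoothed trend line
```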


MULTIPLE CORRELATION
4. A researcher collected data on three variables, X1 (predictor 1), X2 (predictor 2)
and Z (dependent variable), across 24 observations. Using this dataset, compute the
multiple correlation coefficient R between the dependent variable Z and the two
predictor variables X1 and X2.

Observation X1 X2 Z Observation X1 X2 Z
1 5.2 3.1 12.4 13 5.1 3.36 12.1
2 4.9 2.7 11.8 14 5.08 3.38 12.05
3 6.1 3.8 14.2 15 5.06 3.4 12
4 5.5 3.3 13.1 16 5.04 3.42 11.95
5 4.8 2.9 11.5 17 5.02 3.44 11.9
6 5.24 3.22 12.45 18 5 3.46 11.85
7 5.22 3.24 12.4 19 4.98 3.48 11.8
8 5.2 3.26 12.35 20 4.96 3.5 11.75
9 5.18 3.28 12.3 21 4.94 3.52 11.7
10 5.16 3.3 12.25 22 4.92 3.54 11.65
11 5.14 3.32 12.2 23 4.9 3.56 11.6
12 5.12 3.34 12.15 24 4.88 3.58 11.55

Input:
data <- data.frame(
  x1 = c(5.2, 4.9, 6.1, 5.5, 4.8, 5.24, 5.22, 5.2, 5.18, 5.16, 5.14, 5.12,
         5.1, 5.08, 5.06, 5.04, 5.02, 5, 4.98, 4.96, 4.94, 4.92, 4.9, 4.88),
  x2 = c(3.1, 2.7, 3.8, 3.3, 2.9, 3.22, 3.24, 3.26, 3.28, 3.3, 3.32, 3.34,
         3.36, 3.38, 3.4, 3.42, 3.44, 3.46, 3.48, 3.5, 3.52, 3.54, 3.56, 3.58),
  z  = c(12.4, 11.8, 14.2, 13.1, 11.5, 12.45, 12.4, 12.35, 12.3, 12.25, 12.2, 12.15,
         12.1, 12.05, 12, 11.95, 11.9, 11.85, 11.8, 11.75, 11.7, 11.65, 11.6, 11.55))
model <- lm(z ~ x1 + x2, data = data)     # regress Z on both predictors
r_squared <- summary(model)$r.squared
multiple_correlation <- sqrt(r_squared)   # R = square root of R-squared
print(multiple_correlation)

Output:
[1] 0.9992921

Inference:
The multiple correlation coefficient of Z on X1 and X2 is R = 0.9993. Since R lies
between 0 and 1 and is very close to 1, Z is almost perfectly explained by a linear
combination of X1 and X2.
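The multiple correlation coefficient can also be obtained from the three simple correlations alone, using R^2 = (r_z1^2 + r_z2^2 - 2 r_z1 r_z2 r_12) / (1 - r_12^2). The minimal R sketch below, using the table data, compares this formula with the square root of R-squared from lm().

```r
x1 <- c(5.2, 4.9, 6.1, 5.5, 4.8, 5.24, 5.22, 5.2, 5.18, 5.16, 5.14, 5.12,
        5.1, 5.08, 5.06, 5.04, 5.02, 5, 4.98, 4.96, 4.94, 4.92, 4.9, 4.88)
x2 <- c(3.1, 2.7, 3.8, 3.3, 2.9, 3.22, 3.24, 3.26, 3.28, 3.3, 3.32, 3.34,
        3.36, 3.38, 3.4, 3.42, 3.44, 3.46, 3.48, 3.5, 3.52, 3.54, 3.56, 3.58)
z  <- c(12.4, 11.8, 14.2, 13.1, 11.5, 12.45, 12.4, 12.35, 12.3, 12.25, 12.2, 12.15,
        12.1, 12.05, 12, 11.95, 11.9, 11.85, 11.8, 11.75, 11.7, 11.65, 11.6, 11.55)

r_z1 <- cor(z, x1)     # correlation of Z with X1
r_z2 <- cor(z, x2)     # correlation of Z with X2
r_12 <- cor(x1, x2)    # correlation between the two predictors

# R(Z.12) from the three simple correlations
R_formula <- sqrt((r_z1^2 + r_z2^2 - 2 * r_z1 * r_z2 * r_12) / (1 - r_12^2))

# Same quantity from the fitted regression: R = sqrt(R-squared)
R_lm <- sqrt(summary(lm(z ~ x1 + x2))$r.squared)

print(c(formula = R_formula, lm = R_lm))
```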

PARTIAL CORRELATION
5. A doctor is studying the relationship between sleep duration and reaction time
while controlling for age. Calculate the partial correlation between sleep duration
and reaction time, eliminating the effect of age.

SL NO.  SLEEP DURATION  REACTION TIME  AGE    SL NO.  SLEEP DURATION  REACTION TIME  AGE
1       7               200            30     16      7               265            33
2       7               250            35     17      6               315            38
3       7               200            28     18      8               215            26
4       8               300            40     19      5               365            43
5       8               200            25     20      9               195            23
6       9               260            32     21      7               258            32
7       6               310            37     22      6               308            37
8       8               210            29     23      8               208            27
9       5               360            42     24      5               358            42
10      9               190            26     25      9               188            22
11      7               255            31     26      7               268            33
12      6               305            36     27      6               318            38
13      8               205            27     28      8               200            28
14      5               355            41     29      5               360            43
15      9               185            24     30      9               190            23

Input:
install.packages("ppcor")   # needed only once
library(ppcor)
data <- data.frame(
  x = c(7, 7, 7, 8, 8, 9, 6, 8, 5, 9, 7, 6, 8, 5, 9,
        7, 6, 8, 5, 9, 7, 6, 8, 5, 9, 7, 6, 8, 5, 9),                          # sleep duration
  y = c(200, 250, 200, 300, 200, 260, 310, 210, 360, 190, 255, 305, 205, 355, 185,
        265, 315, 215, 365, 195, 258, 308, 208, 358, 188, 268, 318, 200, 360, 190),  # reaction time
  z = c(30, 35, 28, 40, 25, 32, 37, 29, 42, 26, 31, 36, 27, 41, 24,
        33, 38, 26, 43, 23, 32, 37, 27, 42, 22, 33, 38, 28, 43, 23))               # age
partial_corr <- pcor.test(data$x, data$y, data$z)   # correlation of x and y given z
print(partial_corr)

Output:
estimate p.value statistic n gp Method
1 -0.2747443 0.1491924 -1.48475 30 1 pearson

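Inference: The partial correlation coefficient between sleep duration and reaction
time, controlling for age, is -0.2747. This indicates a weak negative correlation once
age is held fixed; with a p-value of 0.149 (greater than 0.05), it is not statistically
significant at the 5% level.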
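The first-order partial correlation can be computed without the ppcor package, either from the formula r_xy.z = (r_xy - r_xz r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)) or, equivalently, as the correlation between the residuals of x and y after each is regressed on z. The minimal base-R sketch below, using the table data, computes both so they can be compared.

```r
sleep    <- c(7, 7, 7, 8, 8, 9, 6, 8, 5, 9, 7, 6, 8, 5, 9,
              7, 6, 8, 5, 9, 7, 6, 8, 5, 9, 7, 6, 8, 5, 9)
reaction <- c(200, 250, 200, 300, 200, 260, 310, 210, 360, 190, 255, 305, 205, 355, 185,
              265, 315, 215, 365, 195, 258, 308, 208, 358, 188, 268, 318, 200, 360, 190)
age      <- c(30, 35, 28, 40, 25, 32, 37, 29, 42, 26, 31, 36, 27, 41, 24,
              33, 38, 26, 43, 23, 32, 37, 27, 42, 22, 33, 38, 28, 43, 23)

# Formula based on the three pairwise correlations
r_xy <- cor(sleep, reaction)
r_xz <- cor(sleep, age)
r_yz <- cor(reaction, age)
r_formula <- (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))

# Residual approach: correlate the parts of sleep and reaction not explained by age
e_x <- resid(lm(sleep ~ age))
e_y <- resid(lm(reaction ~ age))
r_resid <- cor(e_x, e_y)

print(c(formula = r_formula, residuals = r_resid))
```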
POINT BISERIAL CORRELATION
6. Find the correlation between gender (male/female) and test scores
(male = 1, female = 0).

X (Gender)  Y (Test score)    X (Gender)  Y (Test score)
1           75                1           65
0           88                0           90
1           80                1           80
0           92                0           96
1           67                1           73
0           85                0           88
1           78                1           85
0           90                0           84
1           70                1           76
0           95                0           91
1           82                1           69
0           91                0           89
1           68                1           74
0           87                0           93
1           74                1           78
0           94                0           97
1           66                1           71
0           93                0           85
1           77                1           63
0           89                0           82
1           79                1           68
0           86                0           80
1           72                1           75
0           92                0           94

Input:
X <- rep(c(1, 0), times = 24)   # gender: 1 = male, 0 = female
Y <- c(75, 88, 80, 92, 67, 85, 78, 90, 70, 95, 82, 91, 68, 87, 74, 94,
       66, 93, 77, 89, 79, 86, 72, 92, 65, 90, 80, 96, 73, 88, 85, 84,
       76, 91, 69, 89, 74, 93, 78, 97, 71, 85, 63, 82, 68, 80, 75, 94)
# Y follows the table order: left column top to bottom, then right column
r_pb <- cor(X, Y)   # point biserial r = Pearson's r with a 0/1 variable
print(r_pb)

Output:
[1] -0.8475026

Inference:
The point biserial correlation is -0.8475, a strong negative correlation between
gender and test scores. Because males are coded 1, the negative sign means males
scored lower on average (mean 73.5) than females (mean 89.6).
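The point biserial coefficient has a closed form, r_pb = (M1 - M0) / s_n * sqrt(p q), where M1 and M0 are the mean scores of the groups coded 1 and 0, p and q are the group proportions, and s_n is the standard deviation of all scores computed with divisor n. The minimal R sketch below, using the table data, evaluates this formula alongside cor().

```r
X <- rep(c(1, 0), times = 24)                  # 1 = male, 0 = female
Y <- c(75, 88, 80, 92, 67, 85, 78, 90, 70, 95, 82, 91, 68, 87, 74, 94,
       66, 93, 77, 89, 79, 86, 72, 92, 65, 90, 80, 96, 73, 88, 85, 84,
       76, 91, 69, 89, 74, 93, 78, 97, 71, 85, 63, 82, 68, 80, 75, 94)

m1  <- mean(Y[X == 1])                # mean score of the group coded 1 (male)
m0  <- mean(Y[X == 0])                # mean score of the group coded 0 (female)
p   <- mean(X == 1)                   # proportion coded 1
q   <- 1 - p                          # proportion coded 0
s_n <- sqrt(mean((Y - mean(Y))^2))    # SD with divisor n, not n - 1

r_pb_formula <- (m1 - m0) / s_n * sqrt(p * q)
print(c(formula = r_pb_formula, cor = cor(X, Y)))
```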

REGRESSION

7. The time (X) and speed (Y) of 96 objects are given below. Find the regression equation of Y on X.

X Y X Y X Y X Y
1 2.1 42 43.9 66 71.9 89 103
2 2.9 43 44.7 67 73.2 90 104.3
3 4.5 44 45.5 68 74.8 91 105.8
4 5 45 47.2 69 75.5 92 107.3
5 5.7 46 48 70 76.9 93 108.9
6 7.2 47 49.3 71 78.3 94 110.2
7 7.9 48 50.2 72 79.7 95 111.7
8 9.1 49 51.7 73 80.9 96 113
9 9.8 50 52.9 74 82.2 97 114.5
10 11.5 51 53.5 75 83.5 98 115.9
11 12.3 52 54.8 76 85 99 117.3
12 13.1 53 56.1 77 86.3 100 118.9
13 13.9 54 57.3 78 87.5 101 112.2
14 14.8 55 58.9 79 88.9 102 113.3
15 16.2 56 59.5 80 90.3 103 114.5
16 16.9 57 60.8 81 91.7 104 116.8
17 18.1 58 62.1 82 93.2 105 117.8
18 18.8 59 63.7 83 94.7 106 118.6
19 19.5 60 64.5 84 96 107 119.3
20 21 61 65.2 85 97.2 108 120.3
21 21.8 62 66.9 86 98.9 109 121.3
22 22.7 63 67.5 87 100.1 110 121.6
23 24.3 64 69.3 88 101.5 111 125.8
24 24.9 65 70.5 89 103 112 126.4

Input:

x <- c(1:24, 42:89, 89:112)   # time, in the same order as the y values below
y <- c(2.1, 2.9, 4.5, 5, 5.7, 7.2, 7.9, 9.1, 9.8, 11.5, 12.3, 13.1,
       13.9, 14.8, 16.2, 16.9, 18.1, 18.8, 19.5, 21, 21.8, 22.7, 24.3, 24.9,
       43.9, 44.7, 45.5, 47.2, 48, 49.3, 50.2, 51.7, 52.9, 53.5, 54.8, 56.1,
       57.3, 58.9, 59.5, 60.8, 62.1, 63.7, 64.5, 65.2, 66.9, 67.5, 69.3, 70.5,
       71.9, 73.2, 74.8, 75.5, 76.9, 78.3, 79.7, 80.9, 82.2, 83.5, 85, 86.3,
       87.5, 88.9, 90.3, 91.7, 93.2, 94.7, 96, 97.2, 98.9, 100.1, 101.5, 103,
       103, 104.3, 105.8, 107.3, 108.9, 110.2, 111.7, 113, 114.5, 115.9, 117.3, 118.9,
       112.2, 113.3, 114.5, 116.8, 117.8, 118.6, 119.3, 120.3, 121.3, 121.6, 125.8, 126.4)  # speed

model <- lm(y ~ x)   # regression of Y (speed) on X (time)

summary(model)

Output:

Coefficients:
            Estimate
(Intercept)   -2.153
x              1.152

Inference:
Regression equation of Y on X: y = -2.153 + 1.152x
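The least-squares estimates can be computed by hand as b = Sxy / Sxx and a = ybar - b * xbar. The minimal R sketch below does this with the X and Y pairs exactly as they appear in the table, and also computes the slope of the other regression line (X on Y), to show the standard property that the product of the two regression coefficients equals r^2.

```r
x <- c(1:24, 42:89, 89:112)   # time, block by block as in the table
y <- c(2.1, 2.9, 4.5, 5, 5.7, 7.2, 7.9, 9.1, 9.8, 11.5, 12.3, 13.1,
       13.9, 14.8, 16.2, 16.9, 18.1, 18.8, 19.5, 21, 21.8, 22.7, 24.3, 24.9,
       43.9, 44.7, 45.5, 47.2, 48, 49.3, 50.2, 51.7, 52.9, 53.5, 54.8, 56.1,
       57.3, 58.9, 59.5, 60.8, 62.1, 63.7, 64.5, 65.2, 66.9, 67.5, 69.3, 70.5,
       71.9, 73.2, 74.8, 75.5, 76.9, 78.3, 79.7, 80.9, 82.2, 83.5, 85, 86.3,
       87.5, 88.9, 90.3, 91.7, 93.2, 94.7, 96, 97.2, 98.9, 100.1, 101.5, 103,
       103, 104.3, 105.8, 107.3, 108.9, 110.2, 111.7, 113, 114.5, 115.9, 117.3, 118.9,
       112.2, 113.3, 114.5, 116.8, 117.8, 118.6, 119.3, 120.3, 121.3, 121.6, 125.8, 126.4)

dx <- x - mean(x)
dy <- y - mean(y)
b_yx <- sum(dx * dy) / sum(dx^2)    # slope of the regression of Y on X
a_yx <- mean(y) - b_yx * mean(x)    # the line passes through (xbar, ybar)
b_xy <- sum(dx * dy) / sum(dy^2)    # slope of the regression of X on Y

cat("Y on X: y =", a_yx, "+", b_yx, "* x\n")
print(coef(lm(y ~ x)))              # should match a_yx and b_yx
print(c(product_of_slopes = b_yx * b_xy, r_squared = cor(x, y)^2))
```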

CURVE FITTING OF THE FORM: y = a + bx

8. A researcher is studying the relationship between the number of hours studied
and the score obtained on a test. The following data were collected:

Hours studied (x)   1    2    3    4    5    6    7    8    9    10
Test score (y)      55   60   65   75   80   85   90   95   100  95

Using the least squares method, find the values of a and b for the equation y = a +
bx that best fit the data.

Input:
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(55, 60, 65, 75, 80, 85, 90, 95, 100, 95)

model <- lm(y ~ x)

summary(model)

coefficients <- coef(model)

a <- coefficients[1]   # intercept

b <- coefficients[2]   # slope

cat("fitted curve: y =", a, "+", b, "* x\n")

plot(x, y, main = "Curve fit: y = a + bx", xlab = "x", ylab = "y", pch = 17, col = "blue")

abline(model, col = "green", lwd = 3)

Output:
[Figure: data points with the fitted line y = a + bx]

Inference:

fitted curve: y = 52 + 5.090909 * x
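The least-squares values of a and b solve the two normal equations, sum(y) = n a + b sum(x) and sum(xy) = a sum(x) + b sum(x^2). The minimal R sketch below, using the table data, sets these up as a 2 x 2 linear system and solves it with solve(), so the result can be compared with lm().

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)            # hours studied
y <- c(55, 60, 65, 75, 80, 85, 90, 95, 100, 95)  # test score
n <- length(x)

# Normal equations for y = a + bx:
#   sum(y)   = n * a      + b * sum(x)
#   sum(x*y) = a * sum(x) + b * sum(x^2)
A   <- matrix(c(n, sum(x), sum(x), sum(x^2)), nrow = 2, byrow = TRUE)
rhs <- c(sum(y), sum(x * y))
ab  <- solve(A, rhs)
names(ab) <- c("a", "b")

print(ab)
print(coef(lm(y ~ x)))   # same values from the lm() fit
```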


CURVE FITTING OF THE FORM: y = ab^x

9. A biologist is studying the growth of a population of bacteria. The number of
bacteria at different times is recorded as follows:

Time in hours (x)   Population (y)
0                   200
1                   220
2                   240
3                   265
4                   290
5                   320
6                   339
7                   369
8                   387
9                   410
10                  434
11                  458
12                  482
13                  506
14                  530
15                  554
16                  577
17                  601
18                  625
19                  649
20                  673

Using the form y = ab^x, determine the values of a and b that best fit the data using
logarithmic transformation.

Input:

x <- c(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)

y <- c(20,22,24,26,29,32,33,36,37,41,43,45,48,50,53,55,57,60,62,65,67)

log_y <- log(y)

model <- lm(log_y ~ x)

coefficients <- coef(model)

log_a <- coefficients[1]

log_b <- coefficients[2]

a <- exp(log_a)

b <- exp(log_b)

cat("Fitted curve: y =", a, "*", b, "^x\n")

plot(x, y, main = "Curve Fit: y = a * b^x", xlab = "x", ylab = "y", pch = 19, col = "blue")

y_fitted <- a * b^x

lines(x, y_fitted, col = "red", lwd = 2)

Output:
[Figure: data points with the fitted curve y = ab^x]
Inference:

Fitted curve: y = 22.42318 * 1.060931 ^x
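Taking logarithms turns y = ab^x into the straight line log(y) = log(a) + x log(b), so log(b) is the least-squares slope and log(a) the intercept. The minimal R sketch below reuses the x and y vectors from the Input above, computes the slope and intercept directly with base-10 logarithms, and checks that the natural-log fit gives the same a and b; the choice of log base does not affect the fitted curve.

```r
x <- 0:20
y <- c(20, 22, 24, 26, 29, 32, 33, 36, 37, 41, 43, 45, 48, 50, 53, 55, 57, 60, 62, 65, 67)

Y <- log10(y)                                  # log10(y) = log10(a) + x * log10(b)
B <- sum((x - mean(x)) * (Y - mean(Y))) / sum((x - mean(x))^2)  # slope = log10(b)
A <- mean(Y) - B * mean(x)                     # intercept = log10(a)
a <- 10^A
b <- 10^B
cat("Fitted curve: y =", a, "*", b, "^x\n")

# The natural-log fit gives the same a and b after back-transforming with exp()
fit_ln <- lm(log(y) ~ x)
print(exp(coef(fit_ln)))
```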

CURVE FITTING OF THE FORM: y = ax^b

10. A researcher is studying the relationship between the pressure and volume of a
gas in a sealed container, and believes that the relationship follows a power law of
the form y = ax^b, where a and b are constants to be determined.

The following data was collected:

Volume (x)   Pressure (y)
20 5
19 4.9
18 4.7
17 4.5
16 4.2
15 4
14 3.8
13 3.6
12 3.3
11 3.17
10 2.96
9 2.75
8 2.54
7 2.33
6 2.12
5 1.91
4 1.71
3 1.49
2 1.28
1 1.07

Using the form y = ax^b, determine the values of a and b that best fit the data using
logarithmic transformation.

Input:

x <- c(20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1)   # volume

y <- c(5, 4.9, 4.7, 4.5, 4.2, 4, 3.8, 3.6, 3.3, 3.17,
       2.96, 2.75, 2.54, 2.33, 2.12, 1.91, 1.71, 1.49, 1.28, 1.07)               # pressure

log_x <- log(x)

log_y <- log(y)

model <- lm(log_y ~ log_x)   # log y = log a + b log x

coefficients <- coef(model)

log_a <- coefficients[1]

b <- coefficients[2]

a <- exp(log_a)

cat("Fitted curve: y =", a, "* x^", b, "\n")

plot(x, y, main = "Curve Fit: y = a * x^b", xlab = "x", ylab = "y", pch = 19, col = "blue")

y_fitted <- a * x^b

lines(x, y_fitted, col = "red", lwd = 2)


Inference:
Fitted curve: y = 0.8526409 * x^ 0.560916
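With X = log(x) and Y = log(y), the power law becomes the straight line Y = A + bX with A = log(a), so the textbook normal equations can be applied on the log scale. The minimal R sketch below, using the table data, solves those normal equations with solve() and back-transforms A to a; the result can be compared with an lm() fit of log(y) on log(x).

```r
x <- 20:1                                       # volume
y <- c(5, 4.9, 4.7, 4.5, 4.2, 4, 3.8, 3.6, 3.3, 3.17,
       2.96, 2.75, 2.54, 2.33, 2.12, 1.91, 1.71, 1.49, 1.28, 1.07)   # pressure

X <- log(x)                                     # log volume
Y <- log(y)                                     # log pressure: Y = log(a) + b * X
n <- length(x)

# Normal equations on the log scale, with A = log(a):
#   sum(Y)   = n * A      + b * sum(X)
#   sum(X*Y) = A * sum(X) + b * sum(X^2)
M   <- matrix(c(n, sum(X), sum(X), sum(X^2)), nrow = 2, byrow = TRUE)
sol <- solve(M, c(sum(Y), sum(X * Y)))
a <- exp(sol[1])
b <- sol[2]
cat("Fitted curve: y =", a, "* x^", b, "\n")
```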

CURVE FITTING OF THE FORM: y = ae^(bx)

11. A chemical reaction follows a rate of decay described by the equation y = ae^(bx),
where y is the amount of reactant remaining at time x, and a and b are constants. The
following data points were recorded during the reaction:

Time in minutes (x)   Amount of reactant (y)
0 250
2 225
4 212
6 200
8 190
10 182
12 170
14 165
16 160
18 155
20 150
22 145
24 140
26 135
28 130
30 125
32 120
34 115
36 110
38 105

Find the values of a and b that best fit the equation to the data provided.

Input:

x <- c(0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38)

y <- c(250, 225, 212, 200, 190, 182, 170, 165, 160, 155,
       150, 145, 140, 135, 130, 125, 120, 115, 110, 105)

log_y <- log(y)

model <- lm(log_y ~ x)

coefficients <- coef(model)

log_a <- coefficients[1]

b <- coefficients[2]

a <- exp(log_a)

cat("Fitted curve: y =", a, "* e^(", b, "* x)\n")

plot(x, y, main = "Curve Fit: y = ae^(bx)", xlab = "x", ylab = "y", pch = 19, col = "blue")

y_fitted <- a * exp(b * x)

lines(x, y_fitted, col = "red", lwd = 2)

Output:
[Figure: data points with the fitted curve y = ae^(bx)]
Inference:

Fitted curve: y = 228.8132 * e^( -0.02062442 * x)
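The forms y = ae^(bx) and y = ab^x describe the same family of curves: writing B = e^b turns one into the other, and for a decay b is negative, so B lies below 1. The minimal R sketch below, using the table data, fits log(y) on x, reports both parameterisations, and lists observed against fitted amounts so the quality of the fit can be inspected.

```r
x <- seq(0, 38, by = 2)                        # time in minutes
y <- c(250, 225, 212, 200, 190, 182, 170, 165, 160, 155,
       150, 145, 140, 135, 130, 125, 120, 115, 110, 105)   # amount of reactant

fit <- lm(log(y) ~ x)                          # log(y) = log(a) + b * x
a <- exp(coef(fit)[[1]])
b <- coef(fit)[[2]]                            # negative for a decay
B <- exp(b)                                    # same curve written as y = a * B^x

cat("y = a * e^(b x): a =", a, " b =", b, "\n")
cat("y = a * B^x    : a =", a, " B =", B, "\n")
print(round(data.frame(x, observed = y, fitted = a * exp(b * x)), 1))
```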
