Adhithyan
PRACTICAL RECORD
of
NAME: ADHITHYAN.O.S
ROLL.NO: ST02
II Semester
B.Sc. (Hons.) in STATISTICS
2024-25
ST. THOMAS COLLEGE (AUTONOMOUS)
THRISSUR
Affiliated to the University of Calicut
CERTIFICATE
Certified that this is a bona fide record of the original work done by
Adhithyan.o.s ST02 of the II semester B.Sc. (Hons.) in STATISTICS for the
practical of the course STA2CJ101: Bivariate Data Analysis during the year
2024-2025.
Examiner 1:
Examiner 2:
CONTENTS
Sl. No Topic
3 Scatter plot
4 Multiple correlation
5 Partial correlation
7 Regression
KARL PEARSON’S COEFFICIENT OF CORRELATION
Input:
x=c(22,24,26,17,26,23,25,24,17,32,24,27,20,29.5,25,26,19,31.5,24,27,21,28,23.5,
26.8,18.9,29.8,22.8,25)
y=c(150,155,160,190,180,175,140,185,145,210,175,195,152,225,185,198,147,215,
178,198,160,230,173,173,142,214,178,181)
cor(x, y, method = "pearson")
Output:
[1] 0.735503
Inference:
Karl Pearson’s coefficient of correlation for the given data is 0.735503, which
means there is a strong positive correlation between the temperature and the
number of cool drinks sold.
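As a minimal sketch of what cor() computes with method = "pearson", the coefficient can also be obtained from the sums of deviations about the means. The vectors temp and sales below are small hypothetical values chosen for illustration; they are not the data of this record.

```r
# Hypothetical illustrative data (not the data of this record)
temp  <- c(20, 22, 25, 27, 30)
sales <- c(140, 150, 170, 165, 200)

# Deviations of each observation from its mean
dx <- temp - mean(temp)
dy <- sales - mean(sales)

# Pearson's r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# Built-in calculation for comparison
r_builtin <- cor(temp, sales, method = "pearson")

print(c(manual = r_manual, builtin = r_builtin))
```

Both values should agree, since the built-in function evaluates the same ratio.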
SPEARMAN’S RANK CORRELATION COEFFICIENT
Input:
X=c(67,23,89,45,12,91,56,78,34,98,30,69,50,82,19,59,70,63,36,95,15,75,42,56,36,
60,25,93,53,69,48,80,22,29)
Y=c(78,30,92,55,18,95,60,85,40,99,35,80,52,88,22,60,79,70,45,97,20,82,48,90,38,
65,27,94,58,75,50,86,30,32)
# R is case-sensitive: use the X and Y vectors defined above
spearmans_corr <- cor(X, Y, method = "spearman")
print(spearmans_corr)
Output:
[1] 0.726450
Inference:
There is a strong positive correlation between product quality and satisfaction level.
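Spearman's coefficient is Pearson's coefficient applied to the ranks of the observations, and when there are no ties it reduces to 1 - 6*sum(d^2)/(n(n^2 - 1)), where d is the difference between paired ranks. The sketch below checks this on small hypothetical vectors (quality and satisfaction are illustrative values without ties, not the data of this record).

```r
# Hypothetical illustrative data without ties (not the data of this record)
quality      <- c(3, 7, 1, 9, 5, 8)
satisfaction <- c(4, 6, 2, 9, 7, 5)

# Ranks of each variable
rx <- rank(quality)
ry <- rank(satisfaction)

# Route 1: Pearson correlation of the ranks
rho_ranks <- cor(rx, ry)

# Route 2: shortcut formula, valid only when there are no tied ranks
n <- length(quality)
d <- rx - ry
rho_formula <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))

# Route 3: built-in Spearman method
rho_builtin <- cor(quality, satisfaction, method = "spearman")

print(c(ranks = rho_ranks, formula = rho_formula, builtin = rho_builtin))
```

With tied observations the shortcut formula is only approximate, so the rank-based route is the safer check.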
SCATTER PLOT
3. An organisation recorded the number of hours that 30 students studied and their
corresponding exam scores. Create a scatter plot with study hours on the x-axis and
exam scores on the y-axis.
SL NO. STUDY HOURS EXAM SCORES SL NO. STUDY HOURS EXAM SCORES
1 8 70 16 36 99
2 10 73 17 37 100
3 13 75 18 38 100
4 14 85 19 40 80
5 16 86 20 44 82
6 18 88 21 45 85
7 20 90 22 46 83
8 22 91 23 47 66
9 23 93 24 49 69
10 26 95 25 50 100
11 28 95 26 51 70
12 30 96 27 55 72
13 32 98 28 56 69
14 34 99 29 58 75
15 35 99 30 60 84
Input:
x=c(8,10,13,14,16,18,20,22,23,26,28,30,32,34,35,36,37,38,40,44,45,46,47,49,50,
51,55,56,58,60)
y=c(70,73,75,85,86,88,90,91,93,95,95,96,98,99,99,99,100,100,80,82,85,83,66,69,
100,70,72,69,75,84)
plot(x,y,main="Scatter plot",xlab="Study hours",ylab="Exam scores",
col="blue",pch=19)
Output:
MULTIPLE CORRELATION
Input:
data <- data.frame(
x1=c(5.1,5.08,5.06,5.04,5.02,5,4.98,4.96,4.94,4.92,4.9,4.88,5.1,5.08,5.06,
5.04,5.02,5,4.98,4.96,4.94,4.92,4.9,4.88),
x2=c(3.1,2.7,3.8,3.3,2.9,3.22,3.24,3.26,3.28,3.3,3.32,3.34,3.36,3.38,3.4,3.42,3.44,
3.46,3.48,3.5,3.52,3.54,3.56,3.58),
z=c(12.4,11.8,14.2,13.1,11.5,12.45,12.4,12.35,12.3,12.25,12.2,12.15,12.1,12.05,12,
11.95,11.9,11.85,11.8,11.75,11.7,11.65,11.6,11.55))
# z is the response variable in the data frame; x1 and x2 are the predictors
model <- lm(z ~ x1 + x2, data = data)
r_squared <- summary(model)$r.squared
# The multiple correlation coefficient R is the square root of R-squared
multiple_correlation <- sqrt(r_squared)
print(multiple_correlation)
Output:
[1] 0.312962
Inference:
The multiple correlation coefficient (R) of the Number of Bedrooms (z) on the
variables x1 and x2 is 0.312962.
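For two predictors, the multiple correlation coefficient can also be written in terms of the three pairwise correlations: R = sqrt((r_z1^2 + r_z2^2 - 2*r_z1*r_z2*r_12) / (1 - r_12^2)). The sketch below compares this formula with the square root of R-squared from lm(). The vectors are small hypothetical values for illustration only, not the data of this record.

```r
# Hypothetical illustrative data (not the data of this record)
x1 <- c(2, 4, 5, 7, 8, 10)
x2 <- c(1, 3, 2, 5, 4, 6)
z  <- c(5, 9, 10, 15, 14, 20)

# Pairwise Pearson correlations
r_z1 <- cor(z, x1)
r_z2 <- cor(z, x2)
r_12 <- cor(x1, x2)

# Multiple correlation of z on x1 and x2 from the pairwise correlations
R_formula <- sqrt((r_z1^2 + r_z2^2 - 2 * r_z1 * r_z2 * r_12) / (1 - r_12^2))

# Same quantity from the fitted regression
R_lm <- sqrt(summary(lm(z ~ x1 + x2))$r.squared)

print(c(formula = R_formula, lm = R_lm))
```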
PARTIAL CORRELATION
5. A doctor is studying the relationship between sleep time and reaction time while
controlling for age. Calculate the partial correlation.
SL NO. SLEEP TIME REACTION TIME AGE SL NO. SLEEP TIME REACTION TIME AGE
1 7 200 30 16 7 265 33
2 7 250 35 17 6 315 38
3 7 200 28 18 8 215 26
4 8 300 40 19 5 365 43
5 8 200 25 20 9 195 23
6 9 260 32 21 7 258 32
7 6 310 37 22 6 308 37
8 8 210 29 23 8 208 27
9 5 360 42 24 5 358 42
10 9 190 26 25 9 188 22
11 7 255 31 26 7 268 33
12 6 305 36 27 6 318 38
13 8 205 27 28 8 200 28
14 5 355 41 29 5 360 43
15 9 185 24 30 9 190 23
Input:
# install.packages("ppcor")  # run once if the package is not yet installed
library(ppcor)
data <- data.frame(x = c(7,7,7,8,8,9,6,8,5,9,7,6,8,5,9,7,6,8,5,9,7,6,8,5,9,7,6,8,5,9),
y=c(200,250,200,300,200,260,310,210,360,190,255,305,205,355,185,265,315,215,
365,195,258,308,208,358,188,268,318,200,360,190),
z=c(30,35,28,40,25,32,37,29,42,26,31,36,27,41,24,33,38,26,43,23,32,37,27,42,22,
33,38,28,43,23))
# x = sleep time, y = reaction time, z = age (the variable controlled for)
partial_corr <- pcor.test(data$x,data$y,data$z)
print(partial_corr)
Output:
estimate p.value statistic n gp Method
1 -0.2747443 0.1491924 -1.48475 30 1 pearson
Inference: The partial correlation coefficient between sleep time and reaction
time, while controlling for age, is -0.2747443.
This indicates a weak negative correlation, which is not statistically significant at
the 5% level (p-value = 0.1491924).
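The first-order partial correlation can also be obtained without an extra package, either from the pairwise correlations, r_xy.z = (r_xy - r_xz*r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)), or as the correlation between the residuals of x and y after each is regressed on z. The sketch below shows both routes in base R. The vectors sleep, reaction and age are short hypothetical values for illustration, not the table above.

```r
# Hypothetical illustrative data (not the data of this record)
sleep    <- c(7, 6, 8, 5, 9, 7, 6, 8)
reaction <- c(250, 300, 210, 350, 190, 260, 310, 215)
age      <- c(30, 36, 28, 42, 24, 33, 38, 27)

# Pairwise Pearson correlations
r_xy <- cor(sleep, reaction)
r_xz <- cor(sleep, age)
r_yz <- cor(reaction, age)

# Route 1: formula for the partial correlation of x and y given z
r_formula <- (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))

# Route 2: correlate what is left of x and y after removing the linear effect of z
res_x <- resid(lm(sleep ~ age))
res_y <- resid(lm(reaction ~ age))
r_resid <- cor(res_x, res_y)

print(c(formula = r_formula, residuals = r_resid))
```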
POINT BISERIAL CORRELATION
6) Find the correlation between gender (male/female) and test scores
X (Gender) Y (Test score)
1 75
0 88
1 80
0 92
1 67
0 85
1 78
0 90
1 70
0 95
1 82
0 91
1 68
0 87
1 74
0 94
1 66
0 93
1 77
0 89
1 79
0 86
1 72
0 92
1 65
0 90
1 80
0 96
1 73
0 88
1 85
0 84
1 76
0 91
1 69
0 89
1 74
0 93
1 78
0 97
1 71
0 85
1 63
0 82
1 68
0 80
1 75
0 94
Input:
# X (1 = Male, 0 = Female): one gender code for each of the 48 scores in Y
X <- rep(c(1, 0), times = 24)
Y <- c(78, 91, 82, 95, 70, 87, 76, 92, 68, 97, 80, 90, 67, 85, 74, 93,
65, 94, 79, 89, 81, 86, 72, 91, 63, 88, 77, 96, 75, 85, 84, 83,
73, 90, 71, 89, 69, 92, 78, 98, 66, 84, 61, 79, 64, 81, 70, 95) # Y (Test scores)
# The point-biserial correlation is Pearson's r with a 0/1 coded variable
r_pb <- cor(X, Y)
print(r_pb)
Output:
-0.8296
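Inference:
The point-biserial correlation coefficient between gender and test score is -0.8296.
This indicates a strong negative correlation: the group coded 1 (male) tends to have
lower test scores than the group coded 0 (female).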
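The point-biserial coefficient has the closed form r_pb = ((M1 - M0) / s_n) * sqrt(p*q), where M1 and M0 are the group means, p and q are the group proportions, and s_n is the standard deviation computed with divisor n. The sketch below checks this against cor(). The vectors g and score are small hypothetical values for illustration, not the data of this record.

```r
# Hypothetical illustrative data (not the data of this record)
g     <- c(1, 0, 1, 0, 1, 0, 1, 0)          # 1 = Male, 0 = Female
score <- c(70, 88, 75, 92, 68, 85, 80, 90)

# Group means and group proportions
m1 <- mean(score[g == 1])
m0 <- mean(score[g == 0])
p  <- mean(g == 1)
q  <- 1 - p

# Standard deviation with divisor n (not n - 1)
s_n <- sqrt(mean((score - mean(score))^2))

# Point-biserial correlation from the closed form
r_pb_formula <- (m1 - m0) / s_n * sqrt(p * q)

# Pearson's r with the 0/1 variable gives the same value
r_pb_cor <- cor(g, score)

print(c(formula = r_pb_formula, cor = r_pb_cor))
```

The sign depends only on the coding: swapping the 0 and 1 labels flips the sign but not the magnitude.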
REGRESSION
7. The time and speed of 96 objects are given below. Find the regression equation.
X Y X Y X Y X Y
1 2.1 42 43.9 66 71.9 89 103
2 2.9 43 44.7 67 73.2 90 104.3
3 4.5 44 45.5 68 74.8 91 105.8
4 5 45 47.2 69 75.5 92 107.3
5 5.7 46 48 70 76.9 93 108.9
6 7.2 47 49.3 71 78.3 94 110.2
7 7.9 48 50.2 72 79.7 95 111.7
8 9.1 49 51.7 73 80.9 96 113
9 9.8 50 52.9 74 82.2 97 114.5
10 11.5 51 53.5 75 83.5 98 115.9
11 12.3 52 54.8 76 85 99 117.3
12 13.1 53 56.1 77 86.3 100 118.9
13 13.9 54 57.3 78 87.5 101 112.2
14 14.8 55 58.9 79 88.9 102 113.3
15 16.2 56 59.5 80 90.3 103 114.5
16 16.9 57 60.8 81 91.7 104 116.8
17 18.1 58 62.1 82 93.2 105 117.8
18 18.8 59 63.7 83 94.7 106 118.6
19 19.5 60 64.5 84 96 107 119.3
20 21 61 65.2 85 97.2 108 120.3
21 21.8 62 66.9 86 98.9 109 121.3
22 22.7 63 67.5 87 100.1 110 121.6
23 24.3 64 69.3 88 101.5 111 125.8
24 24.9 65 70.5 89 103 112 126.4
Input:
x=c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
64, 65, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106,
107, 108, 109, 110, 111, 112, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79,
80, 81, 82, 83, 84, 85, 86, 87, 88, 89)
y=c(2.1, 2.9, 4.5, 5, 5.7, 7.2, 7.9, 9.1, 9.8, 11.5, 12.3, 13.1, 13.9, 14.8, 16.2, 16.9,
18.1, 18.8, 19.5, 21, 21.8, 22.7, 24.3, 24.9, 43.9, 44.7, 45.5, 47.2, 48, 49.3, 50.2,
51.7, 52.9, 53.5, 54.8, 56.1, 57.3, 58.9, 59.5, 60.8, 62.1, 63.7, 64.5, 65.2, 66.9, 67.5,
69.3, 70.5, 71.9, 73.2, 74.8, 75.5, 76.9, 78.3, 79.7, 80.9, 82.2, 83.5, 85, 86.3, 87.5,
88.9, 90.3, 91.7, 93.2, 94.7, 96, 97.2, 98.9, 100.1, 101.5, 103, 103, 104.3, 105.8,
107.3, 108.9, 110.2, 111.7, 113, 114.5, 115.9, 117.3, 118.9, 112.2, 113.3, 114.5,
116.8, 117.8, 118.6, 119.3, 120.3, 121.3, 121.6, 125.8, 126.4)
model=lm(x~y)  # lm() fits the linear model; here x is regressed on y
summary(model)
Output:
Coefficients:
Estimate
(Intercept) 9.47735
y 0.75654
Inference :
Regression equation: x = 9.477 + 0.7565*y
Hours Studied (x) 1 2 3 4 5 6 7 8 9 10
Test Score (y) 55 60 65 75 80 85 90 95 100 95
Using the least squares method, find the values of a and b for the equation y = a +
bx that best fit the data.
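The least squares estimates have a closed form: b = sum((x - mean(x))(y - mean(y))) / sum((x - mean(x))^2) and a = mean(y) - b*mean(x). As a minimal sketch, the code below applies these formulas to the values in the table above and compares them with the coefficients returned by lm(). The names b_hat and a_hat are chosen here for illustration.

```r
# Data from the table above
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(55, 60, 65, 75, 80, 85, 90, 95, 100, 95)

# Slope: sum of cross-deviations divided by the sum of squared x-deviations
b_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept: the fitted line passes through the point (mean(x), mean(y))
a_hat <- mean(y) - b_hat * mean(x)

# Coefficients from lm() for comparison
fit <- coef(lm(y ~ x))

print(c(a = a_hat, b = b_hat))
print(fit)
```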
Input:
x=c(1,2,3,4,5,6,7,8,9,10)
y=c(55,60,65,75,80,85,90,95,100,95)
model=lm(y~x)
summary(model)
coefficients=coef(model)
a=coefficients[1]  # intercept
b=coefficients[2]  # slope
cat("fitted curve: y =",a,"+",b,"*x\n")
plot(x,y,main="Curve fit: y = a + bx",xlab="x",ylab="y",pch=17,col="blue")
abline(model,col="green",lwd=3)
Output:
Inference:
Input:
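x <- c(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)
y <- c(20,22,24,26,29,32,33,36,37,41,43,45,48,50,53,55,57,60,62,65,67)
# Linearise y = a * b^x as log(y) = log(a) + x * log(b) and fit by least squares
model <- lm(log(y) ~ x)
log_a <- coef(model)[1]
log_b <- coef(model)[2]
a <- exp(log_a)
b <- exp(log_b)
plot(x, y, main = "Curve Fit: y = a * b^x", xlab = "x", ylab = "y", pch = 19, col = "blue")
lines(x, a * b^x, col = "green", lwd = 3)  # fitted curve
Output: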
Inference:
10. A researcher is studying the relationship between the pressure and volume of a
gas in a sealed container, and believes that the relationship follows a power law of
the form y = ax^b, where a and b are constants to be determined.
Using the form y = ax^b, determine the values of a and b that best fit the data using
logarithmic transformation.
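Taking logarithms of a power law gives log(y) = log(a) + b*log(x), which is a straight line in log(x) and log(y), so a and b can be estimated with lm(). The sketch below illustrates the transformation on synthetic values generated from a known curve (a = 3, b = 1.5). These values are an assumption made for illustration and are not the data of this record. All x values must be positive, because log(0) is not finite.

```r
# Synthetic illustrative data generated from y = 3 * x^1.5 (not the data of this record)
x <- c(1, 2, 3, 4, 5, 6)
y <- 3 * x^1.5

# Fit the linearised model log(y) = log(a) + b * log(x)
fit <- lm(log(y) ~ log(x))

# Back-transform the intercept to recover a; the slope is b
a_hat <- exp(unname(coef(fit)[1]))
b_hat <- unname(coef(fit)[2])

cat("fitted curve: y =", a_hat, "* x^", b_hat, "\n")
```

Because the synthetic values lie exactly on the curve, the fit recovers the generating constants up to rounding error.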
Input:
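x <- c(0,1,2,3,4,5,6,7,8,9)
y <- c(50,65,85,110,145,190,250,325,420,550)
# Fit log(y) = log(a) + b*x, the linearised form of y = a*e^(bx)
model <- lm(log(y) ~ x)
coefficients <- coef(model)
log_a <- coefficients[1]
b <- coefficients[2]
a <- exp(log_a)
plot(x, y, main = "Curve Fit: y = ae^(bx)", xlab = "x", ylab = "y", pch = 19, col = "blue")
lines(x, a * exp(b * x), col = "green", lwd = 3)  # fitted curve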
11. A chemical reaction follows a rate of decay described by the equation
y = ae^(bx), where y is the amount of reactant remaining at time x, and a and b are
constants. The following data points were recorded during the reaction:
Time (minutes) x Amount of reactant y
0 250
2 225
4 212
6 200
8 190
10 182
12 170
14 165
16 160
18 155
20 150
22 145
24 140
26 135
28 130
30 125
32 120
34 115
36 110
38 105
Find the values of a and b that best fit the equation to the data provided.
Input:
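x <- c(0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38)
y <- c(250,225,212,200,190,182,170,165,160,155,150,145,140,135,130,125,120,115,
110,105)
# Fit log(y) = log(a) + b*x, the linearised form of y = a*e^(bx)
model <- lm(log(y) ~ x)
coefficients <- coef(model)
log_a <- coefficients[1]
b <- coefficients[2]
a <- exp(log_a)
plot(x, y, main = "Curve Fit: y = ae^(bx)", xlab = "x", ylab = "y", pch = 19, col = "blue")
lines(x, a * exp(b * x), col = "green", lwd = 3)  # fitted curve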
Output:
Inference: