Assignment # 05
R Notebook
This is an R Markdown (http://rmarkdown.rstudio.com) Notebook. When you execute code within the notebook, the
results appear beneath the code.
Try executing this chunk by clicking the Run button within the chunk or by placing your cursor inside it and
pressing Ctrl+Shift+Enter.
plot(cars)
getwd()
## [1] "C:/Users/LENOVO/Documents/Statistics/Exercises"
Assignment-05
Exercise-46
Load the data
file:///C:/Users/LENOVO/Documents/Statistics/Exercises/Assignment-05.html 1/31
11/20/24, 2:35 PM R Notebook
head(OHS_data)
6 rows
str(OHS_data)
Performed Wilcoxon signed-rank test with approximate p-value (no exact p-value)
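The call behind the printed results is not shown in the notebook. A minimal sketch of a paired Wilcoxon signed-rank test with the normal approximation, assuming columns named OHS_1 and OHS_2; the data frame toy_ohs and its values are invented for illustration and are not the OHS_data values:

```r
# Toy data, invented for illustration only (not the OHS_data values).
toy_ohs <- data.frame(
  OHS_1 = c(3, 5, 4, 6, 2, 5, 4, 3, 6, 5),
  OHS_2 = c(4, 5, 6, 5, 4, 6, 4, 5, 7, 4)
)

# paired = TRUE runs the signed-rank test on the within-subject differences.
# exact = FALSE requests the normal approximation instead of an exact p-value;
# correct = TRUE applies the continuity correction.
toy_test <- wilcox.test(toy_ohs$OHS_1, toy_ohs$OHS_2,
                        paired = TRUE, exact = FALSE, correct = TRUE)
print(toy_test)
```

The approximation is the usual choice when the differences contain ties or zeros, since an exact p-value is typically not computed in that case.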
print(wilcox_time1_time2)
##
## Wilcoxon signed rank test with continuity correction
##
## data: OHS_data$OHS_1 and OHS_data$OHS_2
## V = 45.5, p-value = 0.08502
## alternative hypothesis: true location shift is not equal to 0
print(wilcox_time2_time3)
##
## Wilcoxon signed rank test with continuity correction
##
## data: OHS_data$OHS_2 and OHS_data$OHS_3
## V = 57.5, p-value = 0.6041
## alternative hypothesis: true location shift is not equal to 0
print(wilcox_time1_time3)
##
## Wilcoxon signed rank test with continuity correction
##
## data: OHS_data$OHS_1 and OHS_data$OHS_3
## V = 28, p-value = 0.07343
## alternative hypothesis: true location shift is not equal to 0
## [1] "Wilcoxon Test for OHS_1 vs OHS_2 (after removing zero differences):"
print(wilcox_time1_time2)
##
## Wilcoxon signed rank test with continuity correction
##
## data: OHS_data_filtered$OHS_1 and OHS_data_filtered$OHS_2
## V = 21.5, p-value = 0.3269
## alternative hypothesis: true location shift is not equal to 0
## [1] "Wilcoxon Test for OHS_2 vs OHS_3 (after removing zero differences):"
print(wilcox_time2_time3)
##
## Wilcoxon signed rank test with continuity correction
##
## data: OHS_data_filtered$OHS_2 and OHS_data_filtered$OHS_3
## V = 23, p-value = 0.3978
## alternative hypothesis: true location shift is not equal to 0
## [1] "Wilcoxon Test for OHS_1 vs OHS_3 (after removing zero differences):"
print(wilcox_time1_time3)
##
## Wilcoxon signed rank test with continuity correction
##
## data: OHS_data_filtered$OHS_1 and OHS_data_filtered$OHS_3
## V = 16, p-value = 0.1422
## alternative hypothesis: true location shift is not equal to 0
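The filtering step that produced OHS_data_filtered is not shown. One plausible way to drop pairs with a zero difference before testing is sketched here for a single pair of time points; toy_ohs and toy_filtered are hypothetical names with invented values, and the notebook's actual filter may differ (for example, it may filter on all three time points at once):

```r
# Toy data, invented for illustration only.
toy_ohs <- data.frame(
  OHS_1 = c(3, 5, 4, 6, 2, 5, 4, 3, 6, 5),
  OHS_2 = c(4, 5, 6, 5, 4, 6, 4, 5, 7, 4)
)

# Keep only the rows whose paired difference is non-zero.
toy_filtered <- toy_ohs[toy_ohs$OHS_1 != toy_ohs$OHS_2, ]

# Re-run the paired test on the filtered rows.
toy_test_filtered <- wilcox.test(toy_filtered$OHS_1, toy_filtered$OHS_2,
                                 paired = TRUE, exact = FALSE)
print(nrow(toy_filtered))
print(toy_test_filtered)
```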
Using Box plot to visualize the difference in OHS-1, OHS-2 and OHS-3 with p-value
library(ggplot2)
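The plotting code itself did not survive in this printout. A rough sketch of a box plot of the three time points with a p-value in the subtitle, assuming the ggplot2 package is installed; toy_ohs, toy_long and the values are invented for illustration, and only one pairwise p-value is shown to keep the example short:

```r
library(ggplot2)

# Toy data, invented for illustration only.
toy_ohs <- data.frame(
  OHS_1 = c(3, 5, 4, 6, 2, 5, 4, 3, 6, 5),
  OHS_2 = c(4, 5, 6, 5, 4, 6, 4, 5, 7, 4),
  OHS_3 = c(4, 6, 5, 6, 3, 6, 5, 4, 7, 5)
)

# Reshape to long format: one row per (time point, score) pair.
toy_long <- stack(toy_ohs)
names(toy_long) <- c("score", "time_point")

# p-value of one pairwise comparison, used as an annotation.
p_12 <- wilcox.test(toy_ohs$OHS_1, toy_ohs$OHS_2,
                    paired = TRUE, exact = FALSE)$p.value

toy_plot <- ggplot(toy_long, aes(x = time_point, y = score)) +
  geom_boxplot(fill = "lightblue", color = "darkblue") +
  labs(title = "Happiness scores by time point",
       subtitle = sprintf("OHS_1 vs OHS_2: p = %.3f", p_12),
       x = "Time point", y = "Score") +
  theme_minimal()
print(toy_plot)
```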
Conclusion:
#Null Hypothesis (H0): The happiness scores of students are the same (no significant difference)
across the three time points.
# Based on the Wilcoxon signed-rank tests, there are no statistically significant differences in
the students' happiness scores between the three time points OHS_1, OHS_2, and OHS_3. The p-values
for all comparisons (OHS_1 vs OHS_2, OHS_2 vs OHS_3, and OHS_1 vs OHS_3) are greater than the
conventional significance threshold of 0.05, both before and after removing zero differences. We
therefore fail to reject the null hypothesis: the data provide no evidence that students'
happiness levels changed over the observed periods.
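As a quick check of this conclusion, the six p-values printed in the test outputs of this exercise can be compared against the 0.05 threshold. The vector names are labels chosen here for readability, not names from the notebook:

```r
# p-values copied from the six test outputs printed in this exercise.
p_values <- c(
  all_rows_1_vs_2 = 0.08502,
  all_rows_2_vs_3 = 0.6041,
  all_rows_1_vs_3 = 0.07343,
  filtered_1_vs_2 = 0.3269,
  filtered_2_vs_3 = 0.3978,
  filtered_1_vs_3 = 0.1422
)

alpha <- 0.05

# TRUE marks a comparison that would be significant at the chosen threshold.
significant <- p_values < alpha
print(significant)
print(any(significant))  # FALSE: no comparison falls below 0.05
```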
Exercise-49
Load the data
ICM <- read.delim("C:\\Users\\LENOVO\\Documents\\Statistics\\Datasets\\ICM.txt", stringsAsFactors = FALSE)
head(ICM)
summary(ICM)
##
## Wilcoxon rank sum test with continuity correction
##
## data: OHS by Siblings
## W = 1956.5, p-value = 0.9803
## alternative hypothesis: true location shift is not equal to 0
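The call that produced this output is not shown. A minimal sketch of a two-sample Wilcoxon rank sum test using the formula interface, assuming a numeric column OHS and a two-level grouping column Siblings as named in the output; toy_icm and its values are invented for illustration:

```r
# Toy data, invented for illustration only (not the ICM values).
toy_icm <- data.frame(
  OHS      = c(4, 6, 5, 7, 3, 6, 5, 4, 6, 5, 7, 4),
  Siblings = rep(c("yes", "no"), each = 6)
)

# response ~ group compares the two independent groups;
# exact = FALSE uses the normal approximation, which is the usual choice with ties.
toy_ranksum <- wilcox.test(OHS ~ Siblings, data = toy_icm, exact = FALSE)
print(toy_ranksum)
```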
Box Plot to visualize difference between Communication style of students with siblings and students
without siblings
library(ggplot2)
Conclusion:
#Null Hypothesis (𝐻0): The communication styles of students with siblings and without siblings
have identical distributions.
#Alternative Hypothesis (𝐻a): The communication styles of students with siblings and without
siblings do not have identical distributions.
#Since the p-value (0.9803) is greater than 0.05, we fail to reject the null hypothesis. There is
no statistically significant difference in the distribution of OHS between the Siblings groups,
so the data provide no evidence of a non-zero location shift.
Exercise-50
Load the dataset
Check the summary statistics to get an idea of the mental health scores
summary(ICM)
##
## Wilcoxon rank sum test with continuity correction
##
## data: Mentalhealth by Children
## W = 2032.5, p-value = 0.09124
## alternative hypothesis: true location shift is not equal to 0
Create a boxplot to visualize the mental health scores by Has Children group
library(ggplot2)
Conclusion:
#Null Hypothesis (𝐻0): The mental health scores of students with children and students without
children have identical distributions.
#Alternative Hypothesis (𝐻a): The mental health scores of students with children and students
without children do not have identical distributions.
#Since the p-value (0.09124) is greater than 0.05, we fail to reject the null hypothesis. There is
no statistically significant difference in the distribution of mental health scores between
students with children and those without. The p-value is not far above 0.05, however, so the
comparison may warrant further investigation with a larger sample.
Exercise-53
Load the dataset
Check for summary statistics to get an idea of the mental health scores
summary(ICM)
unique(ICM$Socialmediahours)
##
## Wilcoxon rank sum test with continuity correction
##
## data: NegativeMood by Socialmediahours_bin
## W = 2014.5, p-value = 0.003768
## alternative hypothesis: true location shift is not equal to 0
Create a boxplot to visualize the negative mood scores by social media use
library(ggplot2)
Conclusion
#Null Hypothesis (H0): The negative mood scores of students do not differ depending on social
media use (i.e., they have identical distributions).
#Alternative Hypothesis (Ha): The negative mood scores of students do not have identical
distributions across the social media use groups.
#Since the p-value (0.003768) is less than 0.05, there is sufficient evidence to reject the null
hypothesis and conclude that the distribution of NegativeMood differs significantly between the
groups defined by Socialmediahours_bin. This suggests that social media use is associated with
negative mood; the test alone does not establish a causal effect.
Exercise-54
Load the data
Check for summary statistics to get an idea of the mental health scores
summary(ICM)
Group “Low” if time spent is less than or equal to 10 hours, else “High”
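The grouping rule can be sketched with ifelse(). The column name TimeWithFriends is a hypothetical stand-in, since the notebook does not show the name of the source column; TimeGroupBinary is the name that appears in the output, and the values are invented:

```r
# Toy data, invented for illustration; TimeWithFriends is a hypothetical column name.
toy_icm <- data.frame(TimeWithFriends = c(2, 10, 11, 25, 8, 40))

# "Low" if time spent is less than or equal to 10 hours, else "High".
toy_icm$TimeGroupBinary <- ifelse(toy_icm$TimeWithFriends <= 10, "Low", "High")

# Count how many observations fall in each group.
print(table(toy_icm$TimeGroupBinary))
```

Note that a value of exactly 10 hours falls in the "Low" group under this rule.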
table(ICM$TimeGroupBinary)
##
## High Low
## 152 47
wilcox.res
##
## Wilcoxon rank sum test with continuity correction
##
## data: Socialization by TimeGroupBinary
## W = 4673, p-value = 0.0001855
## alternative hypothesis: true location shift is not equal to 0
Create a boxplot to visualize the socialization scores by time spent with friends
library(ggplot2)
Conclusion:
#Null Hypothesis (H0): The socialization scores of students do not differ (have identical
distributions) depending on the time spent with friends.
#Alternative Hypothesis (Ha): The socialization scores of students differ (do not have identical
distributions) depending on the time spent with friends.
#Since the p-value (0.0001855) is much smaller than 0.05, we reject the null hypothesis and
conclude that the distribution of Socialization differs significantly between the two groups
defined by TimeGroupBinary ("Low" for 10 hours or less spent with friends, "High" otherwise).
Time spent with friends is therefore significantly associated with socialization scores.
Exercise-56
survey<-read.delim("C:\\Users\\LENOVO\\Documents\\Statistics\\Datasets\\survey_PCA.txt",
stringsAsFactors=F)
head(survey, 3)
1 2 male 10 48 3 1 1 2 4
2 3 female 10 47 1 3 1 2 3
3 4 female 12 22 3 3 3 3 2
library(ggpubr)
ggdensity(survey$openness,
main = "Density Plot of Openness",
xlab = "Openness",
fill = "blue",
color = "darkblue",
alpha = 0.4)
library(ggpubr)
ggqqplot(survey$openness,
title = "QQ Plot of Openness",
xlab = "Theoretical Quantiles",
ylab = "Sample Quantiles",
color = "steelblue",
shape = 21,
fill = "lightblue",
size = 2) +
theme_minimal()
library(gridExtra)
library(ggplot2)
library(gridExtra)
# QQ plot
qqplot <- ggplot(survey, aes(sample = openness)) +
geom_qq(aes(color = "Points"), size = 2, alpha = 0.7) +
geom_qq_line(color = "blue", size = 1, linetype = "dashed") +
theme_minimal() +
labs(title = "QQ Plot of Openness") +
theme(
plot.title = element_text(hjust = 0.5, face = "bold", size = 14),
legend.position = "none"
)
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
# Density plot
densityplot <- ggplot(survey, aes(openness)) +
geom_density(aes(fill = "Density", color = "Outline"), alpha = 0.4, size = 1) +
scale_fill_manual(values = c("Density" = "lightblue")) +
scale_color_manual(values = c("Outline" = "blue")) +
theme_minimal() +
labs(title = "Density Plot of Openness", x = "Openness", y = "Density") +
theme(
plot.title = element_text(hjust = 0.5, face = "bold", size = 14),
legend.position = "none"
)
# Box plot
boxplot <- ggplot(survey, aes(y = openness)) +
geom_boxplot(fill = "lightblue", color = "darkblue", alpha = 0.7, outlier.color = "black", outlier.size = 3) +
theme_minimal() +
labs(title = "Box Plot of Openness", x = NULL, y = "Openness") +
theme(
plot.title = element_text(hjust = 0.5, face = "bold", size = 14)
)
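The notebook loads gridExtra but the call that combines the three plots is not visible in this printout. A possible sketch, assuming ggplot2 and gridExtra are installed, that places a QQ plot, a density plot and a box plot side by side with grid.arrange(); the toy_ names and the simulated data are hypothetical and stand in for the survey columns:

```r
library(ggplot2)
library(gridExtra)

# Simulated data, invented for illustration only.
set.seed(1)
toy_survey <- data.frame(openness = rnorm(100, mean = 30, sd = 5))

toy_qq      <- ggplot(toy_survey, aes(sample = openness)) + geom_qq() + geom_qq_line()
toy_density <- ggplot(toy_survey, aes(openness)) + geom_density()
toy_box     <- ggplot(toy_survey, aes(y = openness)) + geom_boxplot()

# grid.arrange() draws the plots in one row and invisibly returns the layout object.
toy_panel <- grid.arrange(toy_qq, toy_density, toy_box, ncol = 3)
```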
shapiro.test(survey$openness)
##
## Shapiro-Wilk normality test
##
## data: survey$openness
## W = 0.97794, p-value = 0.08856
Conclusion
#Null Hypothesis (H₀): The variable "openness" follows a normal distribution.
#Alternative Hypothesis (H₁): The variable "openness" does not follow a normal distribution.
#From the output of the Shapiro-Wilk test, the p-value (0.08856) is greater than 0.05. This
indicates that the data do not significantly deviate from a normal distribution. Therefore, we
fail to reject the null hypothesis and can treat the data as approximately normally distributed.
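The decision rule used in this conclusion can be written out explicitly. A small sketch on simulated data; toy_scores, toy_sw and alpha are illustrative names, and the simulated values are not the survey data:

```r
# Simulated scores, invented for illustration only.
set.seed(42)
toy_scores <- rnorm(100, mean = 30, sd = 5)

# Shapiro-Wilk test: H0 is that the sample comes from a normal distribution.
toy_sw <- shapiro.test(toy_scores)
print(toy_sw)

alpha <- 0.05
if (toy_sw$p.value > alpha) {
  message("Fail to reject H0: no significant deviation from normality.")
} else {
  message("Reject H0: the data deviate significantly from normality.")
}
```

A p-value above the threshold does not prove normality; it only means the test found no significant departure from it, which is why the conclusion is phrased as failing to reject the null hypothesis.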
Exercise-57
survey<-read.delim("C:\\Users\\LENOVO\\Documents\\Statistics\\Datasets\\survey_PCA.txt",
stringsAsFactors=F)
head(survey, 3)
1 2 male 10 48 3 1 1 2 4
2 3 female 10 47 1 3 1 2 3
3 4 female 12 22 3 3 3 3 2
library(ggpubr)
ggdensity(survey$compatibility,
main = "Density Plot of compatibility",
xlab = "Compatibility",
fill = "skyblue",
alpha = 0.4)
library(ggpubr)
ggqqplot(survey$compatibility,
title = "QQ Plot of Compatibility",
xlab = "Theoretical Quantiles",
ylab = "Sample Quantiles",
color = "darkblue",
shape = 21,
fill = "skyblue",
size = 2) +
theme_minimal()
library(gridExtra)
library(ggplot2)
library(gridExtra)
# QQ plot
qqplot <- ggplot(survey, aes(sample = compatibility)) +
geom_qq(aes(color = "Points"), size = 2, alpha = 0.7) +
geom_qq_line(color = "darkblue", size = 1, linetype = "dashed") +
theme_minimal() +
labs(title = "QQ Plot of Compatibility") +
theme(
plot.title = element_text(hjust = 0.5, face = "bold", size = 14),
legend.position = "none"
)
# Density plot
densityplot <- ggplot(survey, aes(compatibility)) +
geom_density(aes(fill = "Density", color = "Outline"), alpha = 0.4, size = 1) +
scale_fill_manual(values = c("Density" = "skyblue")) +
scale_color_manual(values = c("Outline" = "darkblue")) +
theme_minimal() +
labs(title = "Density Plot of compatibility", x = "compatibility", y = "Density") +
theme(
plot.title = element_text(hjust = 0.5, face = "bold", size = 14),
legend.position = "none"
)
# Box plot
boxplot <- ggplot(survey, aes(y = compatibility)) +
geom_boxplot(fill = "skyblue", color = "darkblue", alpha = 0.7, outlier.color = "purple", outlier.size = 3) +
theme_minimal() +
labs(title = "Box Plot of compatibility", x = NULL, y = "compatibility") +
theme(
plot.title = element_text(hjust = 0.5, face = "bold", size = 14)
)
shapiro.test(survey$compatibility)
##
## Shapiro-Wilk normality test
##
## data: survey$compatibility
## W = 0.97105, p-value = 0.02543
Conclusion
#Null Hypothesis (H₀): The variable "compatibility" follows a normal distribution.
#Alternative Hypothesis (H₁): The variable "compatibility" does not follow a normal distribution.
#From the output of the Shapiro-Wilk test, the p-value (0.02543) is less than 0.05, so we reject
the null hypothesis. This indicates that the data deviate significantly from a normal
distribution, suggesting that the variable "compatibility" is not normally distributed.
Exercise-58
survey<-read.delim("C:\\Users\\LENOVO\\Documents\\Statistics\\Datasets\\survey_PCA.txt",
stringsAsFactors=F)
summary(survey$conscientiousness)
library(ggpubr)
library(ggpubr)
ggqqplot(survey$conscientiousness,
title = "Q-Q Plot of Conscientiousness",
xlab = "Theoretical Quantiles",
ylab = "Sample Quantiles",
color = "darkgreen",
shape = 21,
fill = "lightgreen",
size = 2) +
theme_minimal()
library(gridExtra)
library(ggplot2)
library(gridExtra)
# QQ plot
qqplot <- ggplot(survey, aes(sample = conscientiousness)) +
geom_qq(aes(color = "Points"), size = 2, alpha = 0.7) +
geom_qq_line(color = "darkgreen", size = 1, linetype = "dashed") +
theme_minimal() +
labs(title = "QQ Plot of conscientiousness") +
theme(
plot.title = element_text(hjust = 0.5, face = "bold", size = 14),
legend.position = "none"
)
# Density plot
densityplot <- ggplot(survey, aes(conscientiousness)) +
geom_density(aes(fill = "Density", color = "Outline"), alpha = 0.4, size = 1) +
scale_fill_manual(values = c("Density" = "darkgreen")) +
scale_color_manual(values = c("Outline" = "lightgreen")) +
theme_minimal() +
labs(title = "Density Plot of conscientiousness", x = "conscientiousness", y = "Density") +
theme(
plot.title = element_text(hjust = 0.5, face = "bold", size = 14),
legend.position = "none"
)
# Box plot
boxplot <- ggplot(survey, aes(y = conscientiousness)) +
geom_boxplot(fill = "lightgreen", color = "darkgreen", alpha = 0.7, outlier.color = "black", outlier.size = 3) +
theme_minimal() +
labs(title = "Box Plot of conscientiousness", x = NULL, y = "conscientiousness") +
theme(
plot.title = element_text(hjust = 0.5, face = "bold", size = 14)
)
shapiro.test(survey$conscientiousness)
##
## Shapiro-Wilk normality test
##
## data: survey$conscientiousness
## W = 0.98133, p-value = 0.1638
Conclusion
#Null Hypothesis (H₀): The variable "conscientiousness" follows a normal distribution.
#Alternative Hypothesis (H₁): The variable "conscientiousness" does not follow a normal
distribution.
#From the output of the Shapiro-Wilk test, the p-value (0.1638) is greater than 0.05. This
indicates that the data do not significantly deviate from a normal distribution. Therefore, we
fail to reject the null hypothesis and can treat the data as approximately normally distributed.