Assignment # 05

The document is an R Notebook detailing statistical analyses using Wilcoxon signed-rank tests on happiness scores across three time points (OHS_1, OHS_2, OHS_3) and a dataset related to students' communication styles. The results indicate no statistically significant differences in happiness scores over time, with p-values exceeding 0.05. Additionally, it includes data loading, structure checking, and visualization using box plots.

Uploaded by

shanza161199

11/20/24, 2:35 PM R Notebook

R Notebook
This is an R Markdown (http://rmarkdown.rstudio.com) Notebook. When you execute code within the notebook, the
results appear beneath the code.

Try executing this chunk by clicking the Run button within the chunk or by placing your cursor inside it and
pressing Ctrl+Shift+Enter.

plot(cars)

getwd()

## [1] "C:/Users/LENOVO/Documents/Statistics/Exercises"

Assignment-05
Exercise-46
Load the data

file:///C:/Users/LENOVO/Documents/Statistics/Exercises/Assignment-05.html 1/31

# Read the data correctly using tab separator


OHS_data <- read.delim("C:\\Users\\LENOVO\\Documents\\Statistics\\Datasets\\OHS_2020_paired.txt", sep="\t", stringsAsFactors = FALSE)

Check the structure of the dataset

head(OHS_data)

  Name     OHS_1 OHS_2 OHS_3
  <chr>    <dbl> <dbl> <dbl>
1 Jennifer    NA   4.8   5.2
2 Tanja      4.6   4.8    NA
3 Heike      3.7   3.8   4.5
4 David      4.6   5.0   4.9
5 Florian    4.2   4.6   4.6
6 Denise     4.6   5.4   5.3

6 rows

str(OHS_data)

## 'data.frame': 21 obs. of 4 variables:


## $ Name : chr "Jennifer" "Tanja" "Heike" "David" ...
## $ OHS_1: num NA 4.6 3.7 4.6 4.2 4.6 5 4.3 5.1 5.2 ...
## $ OHS_2: num 4.8 4.8 3.8 5 4.6 5.4 4.4 5.1 4.8 5.8 ...
## $ OHS_3: num 5.2 NA 4.5 4.9 4.6 5.3 4.1 4.9 4.6 5.9 ...

# Convert columns to numeric

OHS_data$OHS_1 <- as.numeric(OHS_data$OHS_1)


OHS_data$OHS_2 <- as.numeric(OHS_data$OHS_2)
OHS_data$OHS_3 <- as.numeric(OHS_data$OHS_3)

Perform the Wilcoxon signed-rank tests with an approximate p-value (exact = FALSE, no exact p-value)

wilcox_time1_time2 <- wilcox.test(OHS_data$OHS_1, OHS_data$OHS_2, paired = TRUE, exact = FALSE)


print("Wilcoxon Test for OHS_1 vs OHS_2 (approximate p-value):")

## [1] "Wilcoxon Test for OHS_1 vs OHS_2 (approximate p-value):"

print(wilcox_time1_time2)


##
## Wilcoxon signed rank test with continuity correction
##
## data: OHS_data$OHS_1 and OHS_data$OHS_2
## V = 45.5, p-value = 0.08502
## alternative hypothesis: true location shift is not equal to 0

wilcox_time2_time3 <- wilcox.test(OHS_data$OHS_2, OHS_data$OHS_3, paired = TRUE, exact = FALSE)


print("Wilcoxon Test for OHS_2 vs OHS_3 (approximate p-value):")

## [1] "Wilcoxon Test for OHS_2 vs OHS_3 (approximate p-value):"

print(wilcox_time2_time3)

##
## Wilcoxon signed rank test with continuity correction
##
## data: OHS_data$OHS_2 and OHS_data$OHS_3
## V = 57.5, p-value = 0.6041
## alternative hypothesis: true location shift is not equal to 0

wilcox_time1_time3 <- wilcox.test(OHS_data$OHS_1, OHS_data$OHS_3, paired = TRUE, exact = FALSE)


print("Wilcoxon Test for OHS_1 vs OHS_3 (approximate p-value):")

## [1] "Wilcoxon Test for OHS_1 vs OHS_3 (approximate p-value):"

print(wilcox_time1_time3)

##
## Wilcoxon signed rank test with continuity correction
##
## data: OHS_data$OHS_1 and OHS_data$OHS_3
## V = 28, p-value = 0.07343
## alternative hypothesis: true location shift is not equal to 0
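
For a paired test, the V value reported by wilcox.test() is the sum of the ranks of the positive differences, after zero differences are dropped and the absolute differences are ranked. The following is a minimal sketch that recomputes V by hand; the vectors `before` and `after` are hypothetical and are not taken from OHS_2020_paired.txt.

```r
# Hypothetical paired scores (not values from the assignment data)
before <- c(10, 12, 9, 14, 11, 13)
after  <- c(12, 11, 13, 14, 16, 10)

d <- before - after          # paired differences
d <- d[d != 0]               # zero differences carry no sign information
r <- rank(abs(d))            # rank the absolute differences
V_manual <- sum(r[d > 0])    # sum of ranks that belong to positive differences

res <- wilcox.test(before, after, paired = TRUE, exact = FALSE)

V_manual                     # hand-computed statistic
unname(res$statistic)        # statistic reported by wilcox.test()
```

Both values should agree, which shows that V depends only on the signs and ranks of the differences, not on their raw size.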

Remove pairs with zero differences from the dataset

OHS_data_filtered <- OHS_data[!(OHS_data$OHS_1 == OHS_data$OHS_2 |
                                OHS_data$OHS_2 == OHS_data$OHS_3 |
                                OHS_data$OHS_1 == OHS_data$OHS_3), ]

# Perform the Wilcoxon test after filtering out zero differences


wilcox_time1_time2 <- wilcox.test(OHS_data_filtered$OHS_1, OHS_data_filtered$OHS_2, paired = TRUE, exact = FALSE)
print("Wilcoxon Test for OHS_1 vs OHS_2 (after removing zero differences):")


## [1] "Wilcoxon Test for OHS_1 vs OHS_2 (after removing zero differences):"

print(wilcox_time1_time2)

##
## Wilcoxon signed rank test with continuity correction
##
## data: OHS_data_filtered$OHS_1 and OHS_data_filtered$OHS_2
## V = 21.5, p-value = 0.3269
## alternative hypothesis: true location shift is not equal to 0

wilcox_time2_time3 <- wilcox.test(OHS_data_filtered$OHS_2, OHS_data_filtered$OHS_3, paired = TRUE, exact = FALSE)
print("Wilcoxon Test for OHS_2 vs OHS_3 (after removing zero differences):")

## [1] "Wilcoxon Test for OHS_2 vs OHS_3 (after removing zero differences):"

print(wilcox_time2_time3)

##
## Wilcoxon signed rank test with continuity correction
##
## data: OHS_data_filtered$OHS_2 and OHS_data_filtered$OHS_3
## V = 23, p-value = 0.3978
## alternative hypothesis: true location shift is not equal to 0

wilcox_time1_time3 <- wilcox.test(OHS_data_filtered$OHS_1, OHS_data_filtered$OHS_3, paired = TRUE, exact = FALSE)
print("Wilcoxon Test for OHS_1 vs OHS_3 (after removing zero differences):")

## [1] "Wilcoxon Test for OHS_1 vs OHS_3 (after removing zero differences):"

print(wilcox_time1_time3)

##
## Wilcoxon signed rank test with continuity correction
##
## data: OHS_data_filtered$OHS_1 and OHS_data_filtered$OHS_3
## V = 16, p-value = 0.1422
## alternative hypothesis: true location shift is not equal to 0
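
With paired = TRUE, wilcox.test() already discards pairs whose difference is exactly zero, and pairs with a missing value, before ranking. Removing zero differences by hand for a single comparison should therefore leave that comparison unchanged. The row filter used above is stricter: it drops a row when any of the three comparisons is tied, and a row whose condition evaluates to NA comes back as an all-NA row. The sketch below illustrates both points on hypothetical data (`x`, `y` and `df` are made up).

```r
# Hypothetical paired scores with one zero difference (4th pair)
x <- c(10, 12, 9, 14, 11, 13)
y <- c(12, 11, 13, 14, 16, 10)

full <- wilcox.test(x, y, paired = TRUE, exact = FALSE)

keep   <- (x - y) != 0                       # drop the tied pair by hand
manual <- wilcox.test(x[keep], y[keep], paired = TRUE, exact = FALSE)

c(full = unname(full$statistic), manual = unname(manual$statistic))
c(full = full$p.value, manual = manual$p.value)

# Logical row indexing with NA: the NA condition yields an all-NA row
df <- data.frame(a = c(1, NA, 3), b = c(1, 2, 4))
filtered <- df[!(df$a == df$b), ]
filtered
```

A per-comparison approach, where each pair of columns is tested on its own complete cases, keeps more observations than filtering whole rows.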

Box plot to visualize the differences between OHS_1, OHS_2 and OHS_3


library(ggplot2)

OHS_data_long <- data.frame(
  Time = rep(c("OHS_1", "OHS_2", "OHS_3"), each = nrow(OHS_data)),
  Happiness = c(OHS_data$OHS_1, OHS_data$OHS_2, OHS_data$OHS_3)
)

ggplot(OHS_data_long, aes(x = Time, y = Happiness, fill = Time)) +
  geom_boxplot() +
  theme_minimal() +
  labs(title = "Box Plot of Happiness Scores at Different Time Points",
       x = "Time Points",
       y = "Happiness Score") +
  scale_fill_manual(values = c("skyblue", "lightgreen", "salmon"))

## Warning: Removed 3 rows containing non-finite outside the scale range


## (`stat_boxplot()`).
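
The long-format data frame relies on rep(..., each = nrow(OHS_data)) producing labels in the same order in which c() concatenates the three columns. The sketch below checks that alignment on a hypothetical three-row data frame (`toy`) and shows base R's stack() as one alternative that builds the same two columns. The NA values carried into the long format are the rows that ggplot2 reports as removed.

```r
# Hypothetical wide data with missing values (not the assignment data)
toy <- data.frame(
  OHS_1 = c(4.6, 3.7, NA),
  OHS_2 = c(4.8, 3.8, 5.0),
  OHS_3 = c(NA, 4.5, 4.9)
)

# Same construction as in the notebook
toy_long <- data.frame(
  Time = rep(c("OHS_1", "OHS_2", "OHS_3"), each = nrow(toy)),
  Happiness = c(toy$OHS_1, toy$OHS_2, toy$OHS_3)
)

# Base R alternative: returns columns `values` and `ind`
toy_stacked <- stack(toy)

toy_long
sum(is.na(toy_long$Happiness))   # number of rows a box plot would drop
```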


Conclusion:
#Null Hypothesis (H0): The happiness scores of students are the same (no difference in location) across the three time points.

#Alternative Hypothesis (Ha): The happiness scores differ between at least two of the three time points.

# Based on the Wilcoxon signed-rank tests, there are no statistically significant differences in students' happiness scores between the three time points OHS_1, OHS_2, and OHS_3. The p-values for all comparisons on the full data (OHS_1 vs OHS_2: 0.08502, OHS_2 vs OHS_3: 0.6041, OHS_1 vs OHS_3: 0.07343) are greater than the significance threshold of 0.05, and the same holds after filtering out zero differences. We therefore fail to reject the null hypothesis: the data provide no evidence that students' happiness levels changed over the observed periods.
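
Because three pairwise tests are run on the same students, one common precaution is to adjust the p-values for multiple comparisons. This was not part of the original analysis; the sketch below simply applies base R's p.adjust() with the Holm method to the three unfiltered p-values reported above.

```r
# p-values copied from the three unfiltered Wilcoxon outputs above
p_raw <- c(OHS1_vs_OHS2 = 0.08502,
           OHS2_vs_OHS3 = 0.6041,
           OHS1_vs_OHS3 = 0.07343)

# Holm step-down adjustment for three comparisons
p_holm <- p.adjust(p_raw, method = "holm")

p_holm
any(p_holm < 0.05)   # is any comparison significant after adjustment?
```

Adjustment can only increase p-values, so the conclusion of no significant difference is unchanged.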

Exercise-49
Load the data

ICM<-read.delim("C:\\Users\\LENOVO\\Documents\\Statistics\\Datasets\\ICM.txt",stringsAsFactors=F)
head(ICM)

  ID    Gen…   A…    Englishfluent Germanfluent Transport       Highest_level_of_education
  <int> <chr>  <int> <chr>         <chr>        <chr>           <chr>
1 75    female 22    yes           no           PublicTransport College
2 90    female 22    yes           no           PublicTransport College
3 173   female 37    yes           yes          Car             University
4 189   female 17    yes           yes          Car             none
5 100   female 19    yes           yes          Walk            HighSchool
6 155   female 16    yes           no           Walk            none

6 rows | 1-8 of 24 columns

Look at the data

summary(ICM)


## ID Gender Age Englishfluent


## Min. : 1.0 Length:199 Min. :16.00 Length:199
## 1st Qu.: 52.5 Class :character 1st Qu.:19.00 Class :character
## Median :103.0 Mode :character Median :20.00 Mode :character
## Mean :103.9 Mean :24.98
## 3rd Qu.:155.5 3rd Qu.:25.00
## Max. :209.0 Max. :87.00
##
## Germanfluent Transport Highest_level_of_education
## Length:199 Length:199 Length:199
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## Do_you_smoke Socialmediahours Timewithfriends Pet
## Length:199 Length:199 Length:199 Length:199
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## Siblings Children Relationshipstatus Activitieshours
## Length:199 Length:199 Length:199 Min. : 5.00
## Class :character Class :character Class :character 1st Qu.:10.00
## Mode :character Mode :character Mode :character Median :10.00
## Mean :16.51
## 3rd Qu.:20.00
## Max. :50.00
##
## NegativeMood PositiveMood Mentalhealth Socialization
## Min. :0.000 Min. :0.000 Min. :0.1667 Min. :0.500
## 1st Qu.:1.000 1st Qu.:1.792 1st Qu.:2.0000 1st Qu.:1.833
## Median :1.545 Median :2.333 Median :2.5000 Median :2.667
## Mean :1.684 Mean :2.273 Mean :2.4478 Mean :2.512
## 3rd Qu.:2.364 3rd Qu.:2.833 3rd Qu.:3.0000 3rd Qu.:3.000
## Max. :4.000 Max. :4.000 Max. :4.0000 Max. :4.000
## NA's :5 NA's :3 NA's :1 NA's :6
## Activity SocialSupport Communication_open_direct OHS
## Min. :0.400 Min. :0.3333 Min. :1.462 Min. :2.241
## 1st Qu.:2.200 1st Qu.:2.0000 1st Qu.:3.538 1st Qu.:3.586
## Median :2.600 Median :3.0000 Median :3.846 Median :4.276
## Mean :2.627 Mean :2.6700 Mean :3.746 Mean :4.205
## 3rd Qu.:3.000 3rd Qu.:3.3333 3rd Qu.:4.077 3rd Qu.:4.862
## Max. :4.000 Max. :4.0000 Max. :4.846 Max. :5.655
## NA's :2 NA's :23 NA's :18

Wilcoxon rank-sum test (two independent groups):


wilcox.res <- wilcox.test(OHS ~ Siblings, data=ICM)


wilcox.res

##
## Wilcoxon rank sum test with continuity correction
##
## data: OHS by Siblings
## W = 1956.5, p-value = 0.9803
## alternative hypothesis: true location shift is not equal to 0
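
When wilcox.test() is given a formula with a two-level grouping variable, it runs the unpaired rank-sum (Mann-Whitney) test, which is what the output header above reports. The W value is the sum of the ranks of the first group in the pooled sample minus n1(n1 + 1)/2. The following is a minimal sketch with hypothetical scores; `g_no` and `g_yes` are made up and are not ICM values.

```r
# Hypothetical scores for two independent groups
g_no  <- c(3.1, 4.2, 5.0, 4.8)
g_yes <- c(2.9, 3.5, 4.0, 4.4, 3.3)

pooled_ranks <- rank(c(g_no, g_yes))   # rank all observations together
n1 <- length(g_no)

# Rank sum of the first group, shifted so that the minimum possible value is 0
W_manual <- sum(pooled_ranks[seq_len(n1)]) - n1 * (n1 + 1) / 2

res <- wilcox.test(g_no, g_yes)

W_manual
unname(res$statistic)
```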

Box plot to visualize the difference in communication style between students with and without siblings

library(ggplot2)

ggplot(ICM, aes(x = Siblings, y = OHS)) +
  geom_boxplot(fill = c("lightblue", "lightgreen")) +
  labs(title = "Comparison of Communication Style (OHS) between Students with and without Siblings",
       x = "Siblings ( No = without sibling, Yes = with sibling)",
       y = "Communication Style (OHS)") +
  theme_minimal()

## Warning: Removed 18 rows containing non-finite outside the scale range


## (`stat_boxplot()`).


Conclusion:
#Null Hypothesis (𝐻0): The communication styles of students with siblings and without siblings have identical distributions.

#Alternative Hypothesis (𝐻a): The communication styles of students with siblings and without siblings do not have identical distributions.

#Since the p-value (0.9803) is greater than 0.05, we fail to reject the null hypothesis. There is no statistically significant difference in the distribution of OHS between students with and without siblings. Failing to reject does not prove that the location shift is exactly 0; it only means the data provide no evidence of a shift.

Exercise-50
Load the dataset

ICM <- read.delim("C:\\Users\\LENOVO\\Documents\\Statistics\\Datasets\\ICM.txt", stringsAsFactors = F)

Check the summary statistics to get an idea of the mental health scores

summary(ICM)


## ID Gender Age Englishfluent


## Min. : 1.0 Length:199 Min. :16.00 Length:199
## 1st Qu.: 52.5 Class :character 1st Qu.:19.00 Class :character
## Median :103.0 Mode :character Median :20.00 Mode :character
## Mean :103.9 Mean :24.98
## 3rd Qu.:155.5 3rd Qu.:25.00
## Max. :209.0 Max. :87.00
##
## Germanfluent Transport Highest_level_of_education
## Length:199 Length:199 Length:199
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## Do_you_smoke Socialmediahours Timewithfriends Pet
## Length:199 Length:199 Length:199 Length:199
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## Siblings Children Relationshipstatus Activitieshours
## Length:199 Length:199 Length:199 Min. : 5.00
## Class :character Class :character Class :character 1st Qu.:10.00
## Mode :character Mode :character Mode :character Median :10.00
## Mean :16.51
## 3rd Qu.:20.00
## Max. :50.00
##
## NegativeMood PositiveMood Mentalhealth Socialization
## Min. :0.000 Min. :0.000 Min. :0.1667 Min. :0.500
## 1st Qu.:1.000 1st Qu.:1.792 1st Qu.:2.0000 1st Qu.:1.833
## Median :1.545 Median :2.333 Median :2.5000 Median :2.667
## Mean :1.684 Mean :2.273 Mean :2.4478 Mean :2.512
## 3rd Qu.:2.364 3rd Qu.:2.833 3rd Qu.:3.0000 3rd Qu.:3.000
## Max. :4.000 Max. :4.000 Max. :4.0000 Max. :4.000
## NA's :5 NA's :3 NA's :1 NA's :6
## Activity SocialSupport Communication_open_direct OHS
## Min. :0.400 Min. :0.3333 Min. :1.462 Min. :2.241
## 1st Qu.:2.200 1st Qu.:2.0000 1st Qu.:3.538 1st Qu.:3.586
## Median :2.600 Median :3.0000 Median :3.846 Median :4.276
## Mean :2.627 Mean :2.6700 Mean :3.746 Mean :4.205
## 3rd Qu.:3.000 3rd Qu.:3.3333 3rd Qu.:4.077 3rd Qu.:4.862
## Max. :4.000 Max. :4.0000 Max. :4.846 Max. :5.655
## NA's :2 NA's :23 NA's :18

Wilcoxon rank-sum test (two independent groups):


wilcox.res <- wilcox.test(Mentalhealth ~ Children, data = ICM)


wilcox.res

##
## Wilcoxon rank sum test with continuity correction
##
## data: Mentalhealth by Children
## W = 2032.5, p-value = 0.09124
## alternative hypothesis: true location shift is not equal to 0

Create a boxplot to visualize the mental health scores by Has Children group

library(ggplot2)

ggplot(ICM, aes(x = factor(Children), y = Mentalhealth)) +
  geom_boxplot(fill = c("lightblue", "lightgreen")) +
  labs(title = "Mental Health Scores by Presence of Children",
       x = "Has Children ( No = does not have children, Yes = has children)",
       y = "Mental Health Score") +
  theme_minimal()

## Warning: Removed 1 row containing non-finite outside the scale range


## (`stat_boxplot()`).


Conclusion:
#Null Hypothesis (𝐻0): The mental health scores of students with children and students without children have identical distributions.

#Alternative Hypothesis (𝐻a): The mental health scores of students with children and students without children do not have identical distributions.

#Since the p-value (0.09124) is greater than 0.05, we fail to reject the null hypothesis. There is no statistically significant difference in the distribution of mental health scores between students with children and those without. However, the p-value is fairly close to 0.05, so the question may warrant further investigation with a larger sample.

Exercise-53
Load the dataset

ICM <- read.delim("C:\\Users\\LENOVO\\Documents\\Statistics\\Datasets\\ICM.txt", stringsAsFactors = F)

Check the summary statistics to get an idea of the negative mood scores

summary(ICM)


## ID Gender Age Englishfluent


## Min. : 1.0 Length:199 Min. :16.00 Length:199
## 1st Qu.: 52.5 Class :character 1st Qu.:19.00 Class :character
## Median :103.0 Mode :character Median :20.00 Mode :character
## Mean :103.9 Mean :24.98
## 3rd Qu.:155.5 3rd Qu.:25.00
## Max. :209.0 Max. :87.00
##
## Germanfluent Transport Highest_level_of_education
## Length:199 Length:199 Length:199
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## Do_you_smoke Socialmediahours Timewithfriends Pet
## Length:199 Length:199 Length:199 Length:199
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## Siblings Children Relationshipstatus Activitieshours
## Length:199 Length:199 Length:199 Min. : 5.00
## Class :character Class :character Class :character 1st Qu.:10.00
## Mode :character Mode :character Mode :character Median :10.00
## Mean :16.51
## 3rd Qu.:20.00
## Max. :50.00
##
## NegativeMood PositiveMood Mentalhealth Socialization
## Min. :0.000 Min. :0.000 Min. :0.1667 Min. :0.500
## 1st Qu.:1.000 1st Qu.:1.792 1st Qu.:2.0000 1st Qu.:1.833
## Median :1.545 Median :2.333 Median :2.5000 Median :2.667
## Mean :1.684 Mean :2.273 Mean :2.4478 Mean :2.512
## 3rd Qu.:2.364 3rd Qu.:2.833 3rd Qu.:3.0000 3rd Qu.:3.000
## Max. :4.000 Max. :4.000 Max. :4.0000 Max. :4.000
## NA's :5 NA's :3 NA's :1 NA's :6
## Activity SocialSupport Communication_open_direct OHS
## Min. :0.400 Min. :0.3333 Min. :1.462 Min. :2.241
## 1st Qu.:2.200 1st Qu.:2.0000 1st Qu.:3.538 1st Qu.:3.586
## Median :2.600 Median :3.0000 Median :3.846 Median :4.276
## Mean :2.627 Mean :2.6700 Mean :3.746 Mean :4.205
## 3rd Qu.:3.000 3rd Qu.:3.3333 3rd Qu.:4.077 3rd Qu.:4.862
## Max. :4.000 Max. :4.0000 Max. :4.846 Max. :5.655
## NA's :2 NA's :23 NA's :18

Identify the unique values of Socialmediahours


unique(ICM$Socialmediahours)

## [1] "1.5-3hrs/day" "<1.5hrs/day" "3-5hrs/day" ">5hours/day"

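Create a binary categorization. Note that Socialmediahours is a character column, so median() and > below operate on the labels as strings rather than on numeric hours.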

ICM$Socialmediahours_bin <- ifelse(ICM$Socialmediahours > median(ICM$Socialmediahours, na.rm = TRUE), "High", "Low")
ICM$Socialmediahours_bin <- factor(ICM$Socialmediahours_bin, levels = c("Low", "High"))
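
Socialmediahours holds text labels, so the comparison and the median above are computed on strings. One possible alternative, shown here only as a sketch, is to declare the order of the four categories explicitly and split on the integer codes. The vector `hours` is a hypothetical sample, and the level order is an assumption based on the labels printed by unique(); a split built this way may give a different W and p-value from those reported in this notebook.

```r
# Assumed ordering of the four categories, from least to most use
levels_ordered <- c("<1.5hrs/day", "1.5-3hrs/day", "3-5hrs/day", ">5hours/day")

# Hypothetical sample of responses (not the ICM column)
hours <- c("1.5-3hrs/day", "<1.5hrs/day", "3-5hrs/day", ">5hours/day", "<1.5hrs/day")

hours_f    <- factor(hours, levels = levels_ordered, ordered = TRUE)
hours_rank <- as.integer(hours_f)        # 1 = lowest category, 4 = highest

med <- median(hours_rank, na.rm = TRUE)  # median category code
bin <- factor(ifelse(hours_rank > med, "High", "Low"), levels = c("Low", "High"))

data.frame(hours, hours_rank, bin)
```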

Wilcoxon rank-sum test (two independent groups):

wilcox.res <- wilcox.test(NegativeMood ~ Socialmediahours_bin, data = ICM)


wilcox.res

##
## Wilcoxon rank sum test with continuity correction
##
## data: NegativeMood by Socialmediahours_bin
## W = 2014.5, p-value = 0.003768
## alternative hypothesis: true location shift is not equal to 0

Create a boxplot to visualize the negative mood scores by social media use

library(ggplot2)

ggplot(ICM, aes(x = factor(Socialmediahours_bin), y = NegativeMood)) +


geom_boxplot(fill = c("lightblue", "lightgreen")) +
labs(title = "Negative Mood Scores by Social Media Use",
x = "Social Media Use (High vs Low)",
y = "Negative Mood Score") +
theme_minimal()

## Warning: Removed 5 rows containing non-finite outside the scale range


## (`stat_boxplot()`).


Conclusion
#Null Hypothesis (H0): The negative mood scores of students have identical distributions regardless of social media use.

#Alternative Hypothesis (Ha): The negative mood scores of students do not have identical distributions across levels of social media use.

# Since the p-value (0.003768) is less than 0.05, we reject the null hypothesis and conclude that the distribution of NegativeMood differs significantly between the groups defined by Socialmediahours_bin. This indicates an association between social media use and negative mood; the test alone does not show that social media use causes the difference.

Exercise-54
Load the data

ICM <- read.delim("C:\\Users\\LENOVO\\Documents\\Statistics\\Datasets\\ICM.txt", stringsAsFactors = F)

Check the summary statistics to get an idea of the socialization scores

summary(ICM)


## ID Gender Age Englishfluent


## Min. : 1.0 Length:199 Min. :16.00 Length:199
## 1st Qu.: 52.5 Class :character 1st Qu.:19.00 Class :character
## Median :103.0 Mode :character Median :20.00 Mode :character
## Mean :103.9 Mean :24.98
## 3rd Qu.:155.5 3rd Qu.:25.00
## Max. :209.0 Max. :87.00
##
## Germanfluent Transport Highest_level_of_education
## Length:199 Length:199 Length:199
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## Do_you_smoke Socialmediahours Timewithfriends Pet
## Length:199 Length:199 Length:199 Length:199
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## Siblings Children Relationshipstatus Activitieshours
## Length:199 Length:199 Length:199 Min. : 5.00
## Class :character Class :character Class :character 1st Qu.:10.00
## Mode :character Mode :character Mode :character Median :10.00
## Mean :16.51
## 3rd Qu.:20.00
## Max. :50.00
##
## NegativeMood PositiveMood Mentalhealth Socialization
## Min. :0.000 Min. :0.000 Min. :0.1667 Min. :0.500
## 1st Qu.:1.000 1st Qu.:1.792 1st Qu.:2.0000 1st Qu.:1.833
## Median :1.545 Median :2.333 Median :2.5000 Median :2.667
## Mean :1.684 Mean :2.273 Mean :2.4478 Mean :2.512
## 3rd Qu.:2.364 3rd Qu.:2.833 3rd Qu.:3.0000 3rd Qu.:3.000
## Max. :4.000 Max. :4.000 Max. :4.0000 Max. :4.000
## NA's :5 NA's :3 NA's :1 NA's :6
## Activity SocialSupport Communication_open_direct OHS
## Min. :0.400 Min. :0.3333 Min. :1.462 Min. :2.241
## 1st Qu.:2.200 1st Qu.:2.0000 1st Qu.:3.538 1st Qu.:3.586
## Median :2.600 Median :3.0000 Median :3.846 Median :4.276
## Mean :2.627 Mean :2.6700 Mean :3.746 Mean :4.205
## 3rd Qu.:3.000 3rd Qu.:3.3333 3rd Qu.:4.077 3rd Qu.:4.862
## Max. :4.000 Max. :4.0000 Max. :4.846 Max. :5.655
## NA's :2 NA's :23 NA's :18

Group “Low” if time spent is less than or equal to 10 hours, else “High”


ICM$TimeGroupBinary <- ifelse(ICM$Timewithfriends <= 10, "Low", "High")

Check the distribution of the new ‘TimeGroupBinary’ variable

table(ICM$TimeGroupBinary)

##
## High Low
## 152 47
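
summary(ICM) lists Timewithfriends as a character column, so `<= 10` is evaluated as a string comparison: R converts 10 to "10" and compares text. The short sketch below shows how that can differ from a numeric comparison. The values in `time_chr` are hypothetical, since the actual labels stored in ICM.txt are not shown in this notebook.

```r
# Hypothetical values stored as text
time_chr <- c("2", "9", "10", "25")

as_text   <- time_chr <= 10              # 10 is coerced to "10"; text comparison
as_number <- as.numeric(time_chr) <= 10  # numeric comparison

data.frame(time_chr, as_text, as_number)
```

If the column holds category labels rather than plain numbers, an explicit mapping from each label to "Low" or "High" avoids relying on text ordering.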

Wilcoxon rank-sum test (two independent groups):

wilcox.res <- wilcox.test(Socialization ~ TimeGroupBinary, data = ICM)

wilcox.res

##
## Wilcoxon rank sum test with continuity correction
##
## data: Socialization by TimeGroupBinary
## W = 4673, p-value = 0.0001855
## alternative hypothesis: true location shift is not equal to 0

Create a boxplot to visualize the socialization scores by time spent with friends

library(ggplot2)

ggplot(ICM, aes(x = TimeGroupBinary, y = Socialization)) +


geom_boxplot(fill = c("lightblue", "lightgreen")) +
labs(title = "Socialization by Time Spent with Friends",
x = "Time Spent with Friends",
y = "Socialization Score") +
theme_minimal()

## Warning: Removed 6 rows containing non-finite outside the scale range


## (`stat_boxplot()`).


Conclusion:
#Null Hypothesis (H0): The socialization scores of students have identical distributions regardless of the time spent with friends.

#Alternative Hypothesis (Ha): The socialization scores of students do not have identical distributions across levels of time spent with friends.

#Since the p-value (0.0001855) is much smaller than 0.05, we reject the null hypothesis and conclude that the distribution of Socialization differs significantly between the two groups defined by TimeGroupBinary, i.e. between students in the "Low" and "High" time-with-friends groups.

Exercise-56

Load the data

survey<-read.delim("C:\\Users\\LENOVO\\Documents\\Statistics\\Datasets\\survey_PCA.txt",
stringsAsFactors=F)

look at the loaded data

head(survey, 3)


  vpn   sex    school_years age   n1    n2    n3    n4    n5
  <int> <chr>  <dbl>        <int> <int> <int> <int> <int> <int>
1 2     male   10           48    3     1     1     2     4
2 3     female 10           47    1     3     1     2     3
3 4     female 12           22    3     3     3     3     2

3 rows | 1-10 of 70 columns

Density plot for Openness

library(ggpubr)

## Warning: package 'ggpubr' was built under R version 4.4.2

ggdensity(survey$openness,
main = "Density Plot of Openness",
xlab = "Openness",
fill = "blue",
color = "darkblue",
alpha = 0.4)

Q-Q plot for openness


library(ggpubr)

ggqqplot(survey$openness,
title = "QQ Plot of Openness",
xlab = "Theoretical Quantiles",
ylab = "Sample Quantiles",
color = "steelblue",
shape = 21,
fill = "lightblue",
size = 2) +
theme_minimal()

load the library “gridExtra”

library(gridExtra)

Print QQ-Plot, Density-Plot and Box-Plot


library(ggplot2)
library(gridExtra)

# QQ plot
qqplot <- ggplot(survey, aes(sample = openness)) +
geom_qq(aes(color = "Points"), size = 2, alpha = 0.7) +
geom_qq_line(color = "blue", size = 1, linetype = "dashed") +
theme_minimal() +
labs(title = "QQ Plot of Openness") +
theme(
plot.title = element_text(hjust = 0.5, face = "bold", size = 14),
legend.position = "none"
)

## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

# Density plot
densityplot <- ggplot(survey, aes(openness)) +
geom_density(aes(fill = "Density", color = "Outline"), alpha = 0.4, size = 1) +
scale_fill_manual(values = c("Density" = "lightblue")) +
scale_color_manual(values = c("Outline" = "blue")) +
theme_minimal() +
labs(title = "Density Plot of Openness", x = "Openness", y = "Density") +
theme(
plot.title = element_text(hjust = 0.5, face = "bold", size = 14),
legend.position = "none"
)

# Box plot
boxplot <- ggplot(survey, aes(y = openness)) +
geom_boxplot(fill = "lightblue", color = "darkblue", alpha = 0.7, outlier.color = "black", outlier.size = 3) +
theme_minimal() +
labs(title = "Box Plot of Openness", x = NULL, y = "Openness") +
theme(
plot.title = element_text(hjust = 0.5, face = "bold", size = 14)
)

# Arrange the plots in a single column


grid.arrange(qqplot, densityplot, boxplot, ncol = 1)


Run shapiro.test() to perform the Shapiro-Wilk test of normality

shapiro.test(survey$openness)

##
## Shapiro-Wilk normality test
##
## data: survey$openness
## W = 0.97794, p-value = 0.08856

Conclusion
#Null Hypothesis (H₀): The variable "openness" follows a normal distribution.

#Alternative Hypothesis (H₁): The variable "openness" does not follow a normal distribution.

#From the output of the Shapiro-Wilk test, the p-value (0.08856) is greater than 0.05. This indicates that the data do not significantly deviate from a normal distribution. Therefore, we fail to reject the null hypothesis and can treat the data as approximately normally distributed.
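
The decision rule used in this conclusion can be wrapped in a small helper. This is only a sketch: `check_normality` is a hypothetical function name, the 0.05 threshold is the conventional choice used above, and the data are simulated rather than taken from survey_PCA.txt.

```r
# Hypothetical helper: run Shapiro-Wilk and report the decision at level alpha
check_normality <- function(x, alpha = 0.05) {
  x <- x[!is.na(x)]                  # drop missing values before testing
  res <- shapiro.test(x)
  list(
    W = unname(res$statistic),
    p_value = res$p.value,
    reject_normality = res$p.value < alpha
  )
}

set.seed(1)
skewed <- rexp(200)                  # strongly right-skewed simulated sample
result <- check_normality(skewed)
result
```

A p-value above alpha means normality is not rejected; it does not prove that the data are normal, which is why the density and Q-Q plots are inspected alongside the test.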

Exercise-57

Load the data

survey<-read.delim("C:\\Users\\LENOVO\\Documents\\Statistics\\Datasets\\survey_PCA.txt",
stringsAsFactors=F)

look at the loaded data

head(survey, 3)

  vpn   sex    school_years age   n1    n2    n3    n4    n5
  <int> <chr>  <dbl>        <int> <int> <int> <int> <int> <int>
1 2     male   10           48    3     1     1     2     4
2 3     female 10           47    1     3     1     2     3
3 4     female 12           22    3     3     3     3     2

3 rows | 1-10 of 70 columns

Density plot for compatibility

library(ggpubr)

ggdensity(survey$compatibility,
main = "Density Plot of compatibility",
xlab = "Compatibility",
fill = "skyblue",
alpha = 0.4)


Q-Q plot for compatibility

library(ggpubr)

ggqqplot(survey$compatibility,
title = "QQ Plot of Compatibility",
xlab = "Theoretical Quantiles",
ylab = "Sample Quantiles",
color = "darkblue",
shape = 21,
fill = "skyblue",
size = 2) +
theme_minimal()


load the library “gridExtra”

library(gridExtra)

Print QQ-Plot, Density-Plot and Box-Plot


library(ggplot2)
library(gridExtra)

# QQ plot
qqplot <- ggplot(survey, aes(sample = compatibility)) +
geom_qq(aes(color = "Points"), size = 2, alpha = 0.7) +
geom_qq_line(color = "darkblue", size = 1, linetype = "dashed") +
theme_minimal() +
labs(title = "QQ Plot of Compatibility") +
theme(
plot.title = element_text(hjust = 0.5, face = "bold", size = 14),
legend.position = "none"
)

# Density plot
densityplot <- ggplot(survey, aes(compatibility)) +
geom_density(aes(fill = "Density", color = "Outline"), alpha = 0.4, size = 1) +
scale_fill_manual(values = c("Density" = "skyblue")) +
scale_color_manual(values = c("Outline" = "darkblue")) +
theme_minimal() +
labs(title = "Density Plot of compatibility", x = "compatibility", y = "Density") +
theme(
plot.title = element_text(hjust = 0.5, face = "bold", size = 14),
legend.position = "none"
)

# Box plot
boxplot <- ggplot(survey, aes(y = compatibility)) +
geom_boxplot(fill = "skyblue", color = "darkblue", alpha = 0.7, outlier.color = "purple", outlier.size = 3) +
theme_minimal() +
labs(title = "Box Plot of compatibility", x = NULL, y = "compatibility") +
theme(
plot.title = element_text(hjust = 0.5, face = "bold", size = 14)
)

# Arrange the plots in a single column


grid.arrange(qqplot, densityplot, boxplot, ncol = 1)


Run shapiro.test() to perform the Shapiro-Wilk test of normality

shapiro.test(survey$compatibility)

##
## Shapiro-Wilk normality test
##
## data: survey$compatibility
## W = 0.97105, p-value = 0.02543

Conclusion

#Null Hypothesis (H₀): The variable "compatibility" follows a normal distribution.

#Alternative Hypothesis (H₁): The variable "compatibility" does not follow a normal distribution.

#From the output of the Shapiro-Wilk test, the p-value (0.02543) is less than 0.05, so we reject the null hypothesis. The data deviate significantly from a normal distribution, suggesting that the variable "compatibility" is not normally distributed.

Exercise-58

Load the data


survey<-read.delim("C:\\Users\\LENOVO\\Documents\\Statistics\\Datasets\\survey_PCA.txt",
stringsAsFactors=F)

look at the loaded data

summary(survey$conscientiousness)

## Min. 1st Qu. Median Mean 3rd Qu. Max.


## 12.00 27.00 32.00 31.18 36.00 48.00

Density plot for conscientiousness

library(ggpubr)

# Colorful density plot


ggdensity(survey$conscientiousness,
main = "Density Plot of Conscientiousness",
xlab = "conscientiousness",
fill = "darkgreen",
alpha = 0.4)

Q-Q plot for conscientiousness


library(ggpubr)

ggqqplot(survey$conscientiousness,
title = "Q-Q Plot of Conscientiousness",
xlab = "Theoretical Quantiles",
ylab = "Sample Quantiles",
color = "darkgreen",
shape = 21,
fill = "lightgreen",
size = 2) +
theme_minimal()

load the library “gridExtra”

library(gridExtra)

Print QQ-Plot, Density-Plot and Box-Plot


library(ggplot2)
library(gridExtra)

# QQ plot
qqplot <- ggplot(survey, aes(sample = conscientiousness)) +
geom_qq(aes(color = "Points"), size = 2, alpha = 0.7) +
geom_qq_line(color = "darkgreen", size = 1, linetype = "dashed") +
theme_minimal() +
labs(title = "QQ Plot of conscientiousness") +
theme(
plot.title = element_text(hjust = 0.5, face = "bold", size = 14),
legend.position = "none"
)

# Density plot
densityplot <- ggplot(survey, aes(conscientiousness)) +
geom_density(aes(fill = "Density", color = "Outline"), alpha = 0.4, size = 1) +
scale_fill_manual(values = c("Density" = "darkgreen")) +
scale_color_manual(values = c("Outline" = "lightgreen")) +
theme_minimal() +
labs(title = "Density Plot of conscientiousness", x = "conscientiousness", y = "Density") +
theme(
plot.title = element_text(hjust = 0.5, face = "bold", size = 14),
legend.position = "none"
)

# Box plot
boxplot <- ggplot(survey, aes(y = conscientiousness)) +
geom_boxplot(fill = "lightgreen", color = "darkgreen", alpha = 0.7, outlier.color = "black", outlier.size = 3) +
theme_minimal() +
labs(title = "Box Plot of conscientiousness", x = NULL, y = "conscientiousness") +
theme(
plot.title = element_text(hjust = 0.5, face = "bold", size = 14)
)

# Arrange the plots in a single column


grid.arrange(qqplot, densityplot, boxplot, ncol = 1)


Run shapiro.test() to perform the Shapiro-Wilk test of normality

shapiro.test(survey$conscientiousness)

##
## Shapiro-Wilk normality test
##
## data: survey$conscientiousness
## W = 0.98133, p-value = 0.1638

Conclusion
#Null Hypothesis (H₀): The variable "conscientiousness" follows a normal distribution.

#Alternative Hypothesis (H₁): The variable "conscientiousness" does not follow a normal distribution.

#From the output of the Shapiro-Wilk test, the p-value (0.1638) is greater than 0.05. This indicates that the data do not significantly deviate from a normal distribution. Therefore, we fail to reject the null hypothesis and can treat the data as approximately normally distributed.
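
For reference, the three Shapiro-Wilk results reported in Exercises 56 to 58 can be collected in one table. The W and p-values below are copied from the outputs above; the `reject_normality` column simply applies the 0.05 threshold and is not additional output from the notebook.

```r
# Values copied from the shapiro.test() outputs in Exercises 56, 57 and 58
shapiro_results <- data.frame(
  variable = c("openness", "compatibility", "conscientiousness"),
  W        = c(0.97794, 0.97105, 0.98133),
  p_value  = c(0.08856, 0.02543, 0.1638)
)

# Decision at the 5% level: TRUE means normality is rejected
shapiro_results$reject_normality <- shapiro_results$p_value < 0.05

shapiro_results
```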
