Rstudio Divya
PROFESSIONAL STUDIES-
TECHNICAL CAMPUS
BCOM 2023-2026
RStudio
INTRODUCTION TO R
4. R continues to use English for all messages and help files; this setting does not change that.
Click Next until R is installed; the default settings can be used safely.
Once R is installed, we can install RStudio. The steps to install RStudio are as follows:
1. Install R, keeping the installation parameters at their default values.
2. Open RStudio.
3. Select "Install Packages" under the "Packages" tab.
4. Type "Rcmdr" until a list of results appears, then select it.
5. Wait until the R Commander package has been installed completely.
RStudio version 2023.09.1, known as "Desert Sunflower," was released on October 17, 2023
and was the most recent version at the time of writing.
RStudio layout with snapshot. Explain the purpose of all panes.
There are four panes (also known as windows) in RStudio. These panes are:
1. Source-
This is the part of the window where we write our code. The code is not
evaluated until we run it in the console.
2. Console-
This is the part of the window where the code from the source is evaluated
by R. We can also use the console to perform quick calculations that we do not need to
save.
3. Environment/History-
This is the part of the window where we can see what objects are in our
working space.
4. Files/Plots/Packages/Help-
This is the part of the window where we can see file directories, view plots,
see our packages and access R help.
What is CRAN?
The official repository is the Comprehensive R Archive Network (CRAN), a global network
of web and file servers coordinated by the R community. For a package to be listed on CRAN,
it must pass a number of tests to make sure that it complies with CRAN guidelines.
2. The assignment operator <-, which consists of the less-than sign < immediately followed by
the minus sign -, is the second element.
3. The value or values to be assigned to the name make up the last part.
The symbol <- assigns the value on its right to the variable named on its left.
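As a brief illustration of the assignment operator, the sketch below stores values under names and reads them back. The names marks and total are hypothetical and used only for this example.

```r
# Assign a single value to the name `marks` (the value flows from right to left)
marks <- 85

# Assign a vector of values to another name
total <- c(85, 90, 78)

# Printing a name shows the value stored under it
print(marks)
print(sum(total))
```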
FUNCTION       DESCRIPTION
mean(x)        Mean of x
median(x)      Median of x
var(x)         Variance of x
sd(x)          Standard deviation of x
scale(x)       Standard scores (z-scores) of x
quantile(x)    The quartiles of x
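The functions in the table can be tried on any numeric vector. The following is a minimal sketch; the vector x holds made-up marks for ten students and is not taken from the original data.

```r
# Hypothetical marks of ten students (illustrative values only)
x <- c(56, 67, 45, 78, 89, 90, 34, 66, 72, 81)

mean(x)        # arithmetic mean
median(x)      # middle value
var(x)         # sample variance (divides by n - 1)
sd(x)          # sample standard deviation, the square root of var(x)
z <- scale(x)  # z-scores (x - mean(x)) / sd(x), returned as a one-column matrix
quantile(x)    # minimum, the three quartiles and the maximum
```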
Packages in R Programming
The “tidyr” Package
Apply the important functions (gather, separate, unite, spread, fill, full_seq, drop_na,
and replace_na) of the “tidyr” package to the following dataset
gather()
separate()
unite()
spread()
drop_na()
replace_na()
arrange()
select()
rename()
mutate()
transmute()
sample_n()
sample_frac()
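The original dataset and screenshots for these functions are not reproduced here, so the following is only a sketch on a small made-up data frame. It assumes the tidyr package is installed; the data frame marks and its columns are hypothetical. Note that arrange(), select(), rename(), mutate(), transmute(), sample_n() and sample_frac() are provided by the dplyr package rather than tidyr, and are not covered in this sketch.

```r
library(tidyr)

# Hypothetical marks data in wide form (not from the original file)
marks <- data.frame(
  student = c("S1", "S2", "S3"),
  maths   = c(78, NA, 65),
  stats   = c(82, 74, NA)
)

# gather(): wide -> long, one row per student/subject pair
long <- gather(marks, key = "subject", value = "score", maths, stats)

# spread(): long -> wide again
wide <- spread(long, key = subject, value = score)

# drop_na(): keep only rows without missing values
complete_rows <- drop_na(long)

# replace_na(): fill missing scores with 0
filled <- replace_na(long, list(score = 0))

# unite() and separate(): combine two columns into one, then split them back
united <- unite(long, "id", student, subject, sep = "_")
split_again <- separate(united, id, into = c("student", "subject"), sep = "_")
```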
Data Visualization in R Studio
Quick plot with ggplot2
Generate BCOM marks data containing the sections and overall percentage (5 sections
ranging from A to E), with 60 students in each section
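One possible way to generate such data in base R is sketched below. The seed, the uniform distribution and the 40 to 100 range are assumptions made for illustration; the original file does not state how the marks were generated.

```r
set.seed(2023)  # arbitrary seed so the sketch is reproducible

# Five sections (A to E) with 60 students each
sections <- rep(LETTERS[1:5], each = 60)

# Assumed: overall percentage drawn uniformly between 40 and 100
percentage <- round(runif(length(sections), min = 40, max = 100), 1)

bcom_marks <- data.frame(section = factor(sections), percentage = percentage)

# Check that every section has 60 students
table(bcom_marks$section)
```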
Plot
Create the following quick plots with customized labels (with your name and DOB) for both
axes and the main title of the chart
• Histogram plot
• Histogram fill color by group (Section)
• Basic density plot
• Density plot line color by group (Section) and change line type
• Draw a plot using data from numeric vectors where X contains values ranging from 10 to 20
and Y is the square of X
• Add to the dot plot for X & Y
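The plots listed above could be sketched as follows, assuming the ggplot2 package is installed. This sketch uses ggplot() rather than qplot(), because qplot() is deprecated in recent ggplot2 versions. The data frame bcom_marks is regenerated here with assumed values, and the label text "Your Name (DOB)" is a placeholder.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed for a reproducible illustration
bcom_marks <- data.frame(
  section    = rep(LETTERS[1:5], each = 60),
  percentage = runif(300, min = 40, max = 100)  # assumed range
)

# Histogram with fill color by section and customized labels
hist_plot <- ggplot(bcom_marks, aes(x = percentage, fill = section)) +
  geom_histogram(bins = 20) +
  labs(title = "BCOM marks - Your Name (DOB)",
       x = "Overall percentage - Your Name (DOB)",
       y = "Count - Your Name (DOB)")

# Density plot with line color and line type by section
dens_plot <- ggplot(bcom_marks,
                    aes(x = percentage, colour = section, linetype = section)) +
  geom_density()

# X from 10 to 20 and Y as the square of X, drawn as points
xy <- data.frame(x = 10:20, y = (10:20)^2)
xy_plot <- ggplot(xy, aes(x = x, y = y)) +
  geom_point()

# Use print(hist_plot), print(dens_plot) or print(xy_plot) to draw a plot
```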
HISTOGRAM PLOT:
DENSITY PLOT:
PLOT6: Scatter plot (Miles/(US) gallon on the y axis and Weight (lb/1000) on the x axis)
with a smoothed line and the point shape set by group (Number of gears)
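A sketch of this scatter plot using the built-in mtcars dataset, assuming ggplot2 is installed. The loess smoother is an assumption; the original plot may have used a different smoothing method.

```r
library(ggplot2)

# mpg = Miles/(US) gallon, wt = Weight (lb/1000), gear = Number of gears
scatter <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(shape = factor(gear)), size = 2) +      # point shape by gears
  geom_smooth(method = "loess", formula = y ~ x) +       # smoothed line
  labs(x = "Weight (lb/1000)",
       y = "Miles/(US) gallon",
       shape = "Number of gears")

# Use print(scatter) to draw the plot
```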
Provide 5 commands for Descriptive statistics
DATA:
STATISTICAL COMMANDS:
Q36. Provide summary statistics for the mtcars dataset while displaying a count summary of
categorical variables.
summary()
Count Summary
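One way to obtain both summaries in base R is sketched below. Treating cyl, gear and am as the categorical variables is an assumption; in mtcars they are stored as numbers, so they are converted to factors first.

```r
# Numeric summary of every column in the built-in dataset
summary(mtcars)

# Convert the assumed categorical columns to factors
cars <- transform(
  mtcars,
  cyl  = factor(cyl),
  gear = factor(gear),
  am   = factor(am, labels = c("automatic", "manual"))
)

# summary() on factor columns reports counts per level
summary(cars[, c("cyl", "gear", "am")])

# table() gives the same counts for a single variable
cyl_counts <- table(cars$cyl)
gear_counts <- table(cars$gear)
print(cyl_counts)
print(gear_counts)
```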
HYPOTHESIS TESTING using R studio
For all tests, import an Excel file containing the given data, saved as yourname_test.
T-TEST
One Sample t-Test using dummy data (One-Tailed)
File name example: divya_ttest
Problem 1:
Determine whether the population mean of age is equal to 30
at α=0.05.
Age
18
24
56
78
67
24
65
89
25
23
45
65
78
55
32
33
44
26
56
89
44
34
3
4
56
56
76
SOLUTION: -
H0 : Population mean of age is equal to 30
H0: µ = 30
H1 : Population mean of age is less than 30
H1: µ < 30
STEPS: -
1. In the File Tab, Click on the Import Dataset then click From Excel-
2. Click on Browse, select file, select sheet and import-
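CODING:
library(readxl)
# Import the worksheet (path as used on the author's machine)
Divya_ttest <- read_excel("C:/Users/mrvai/Desktop/divya singh/college/Divya_ttest.xlsx")
View(Divya_ttest)
# Keep only the Age column
data <- Divya_ttest[, c("Age")]
null_mean <- 30
# One-sample, one-tailed t-test of H1: mean < 30
t_test_result <- t.test(data$Age, mu = null_mean, alternative = "less")
print(t_test_result)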
RESULT-
data: data$Age
t = 3.5836, df = 26, p-value = 0.9993
alternative hypothesis: true mean is less than 30
95 percent confidence interval:
-Inf 54.87245
sample estimates:
mean of x
46.85185
DECISION RULE:-
If the p-value is less than α, reject the Null Hypothesis.
Inference:
Since the p-value (0.9993) is greater than α (0.05), we do not reject the Null Hypothesis.
Conclusion:-
There is not enough evidence to conclude that the population mean of age is less than 30 at α = 0.05.
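The same test can be reproduced without the Excel file by typing the Age values from the problem directly into a vector. This is a sketch of that alternative; the vector name age is chosen for illustration.

```r
# Age values as listed in the problem (27 observations)
age <- c(18, 24, 56, 78, 67, 24, 65, 89, 25, 23, 45, 65, 78, 55,
         32, 33, 44, 26, 56, 89, 44, 34, 3, 4, 56, 56, 76)

null_mean <- 30

# One-sample, one-tailed t-test of H1: mean < 30
result <- t.test(age, mu = null_mean, alternative = "less")
print(result)

# Decision rule: reject H0 only if the p-value is below alpha
alpha <- 0.05
reject_h0 <- result$p.value < alpha
print(reject_h0)
```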
SOLUTION:
Hypothesis Testing
H0: The time spent by full-time students in studying statistics is not different from the time
spent by part-time students.
H0: µF = µP
H0: µF - µP = 0
H1: The time spent by full-time students in studying statistics is different from the time spent
by part-time students.
H1: µF ≠ µP
H1: µF - µP ≠ 0
Steps-
1. In the File Tab, click on Import Dataset then click From Excel-
4. Write the coding in source, click on run and analyse the output from
console-
CODING:
library(readxl)
Divya_ttest2 <- read_excel("C:/Users/mrvai/Desktop/divya singh/college/Divya_ttest2.xlsx")
View(Divya_ttest2)
# Welch two-sample t-test (two-tailed) comparing the two columns
t_test_result <- t.test(Divya_ttest2$`Full Time`, Divya_ttest2$`Part Time`)
print(t_test_result)
RESULT:
Welch Two Sample t-test
DECISION RULE:
If the p-value is less than α, reject the Null Hypothesis.
Inference:
Since the p-value (0.8) is greater than α (0.005), we do not reject the Null Hypothesis.
Conclusion:
The time spent by full-time students in studying statistics is not different from the time spent
by part-time students at α = 0.005.
C. Two Sample t-Test
Problem 1:
Is there sufficient evidence to suggest that the
mean time to exhaustion is greater after chocolate
milk than after carbohydrate replacement drink?
Use a significance level of 0.05. (Use µCM-µCD in
hypothesis statements)
CYCLIST   CHOCOLATE MILK   CARBOHYDRATED REPLACEMENT MILK
1         50.46            32.9
2         47.08            20.1
3         57.51            41.67
4         46.6             32.69
5         49.1             46.33
6         27.5             31.63
7         23.87            50.61
8         28.65            14.99
9         35.37            20.11
SOLUTION:
HYPOTHESIS TESTING-
H0: The mean time to exhaustion is not greater after chocolate milk than after carbohydrate
replacement drink.
H0: µCM ≤ µCD
H0: µCM - µCD ≤ 0
H1: The mean time to exhaustion is greater after chocolate milk than after carbohydrate
replacement drink.
H1: µCM > µCD
H1: µCM - µCD > 0
Steps:
1. In the file tab, Click on Import Dataset then click From Excel:
4. Write the coding in source, click on run and analyse the output from
console:
CODING:
library(readxl)
Divya_ttest3 <- read_excel("Divya_ttest3.xlsx")
View(Divya_ttest3)
# Paired, one-tailed t-test of H1: µCM - µCD > 0
t_test_result <- t.test(Divya_ttest3$`CHOCOLATE MILK`,
                        Divya_ttest3$`CARBOHYDRATED REPLACEMENT MILK`,
                        paired = TRUE, alternative = "greater")
print(t_test_result)
RESULT:
Paired t-test
Decision Rule:
If the p-value is less than α, reject the Null Hypothesis.
Inference:
Since the p-value (0.007) is less than α (0.05), reject the Null Hypothesis.
Conclusion:
The mean time to exhaustion is greater after chocolate milk than after carbohydrate
replacement drink at α = 0.05.
D. Paired t-Test
Problem 1:
Coaching was given to students for Statistical software after their result was evaluated in
January in order to improve their performance in April exams. Determine if the coaching was
successful. (α = 0.05)
JAN MAY
45 56
54 57
44 32
56 67
34 44
45 34
34 34
56 76
45 56
54 45
67 55
56 87
56 66
56 65
76 45
45 76
SOLUTION:
HYPOTHESIS TESTING
H0: The coaching was not successful
H0: µMAY ≤ µJAN
H0: µMAY - µJAN ≤ 0
H1: The coaching was successful
H1: µMAY > µJAN
H1: µMAY - µJAN > 0
Steps:-
1. In the File Tab, click Import Dataset, then click sheet and import-
4. Write the coding in source, click on run and analyse the output from
console-
CODING:-
library(readxl)
Divya_ttest4 <- read_excel("C:/Users/mrvai/Desktop/divya singh/college/Divya_ttest4.xlsx")
View(Divya_ttest4)
# Paired, one-tailed t-test of H1: µMAY - µJAN > 0, so MAY is passed first
t_test_result <- t.test(Divya_ttest4$MAY, Divya_ttest4$JAN, paired = TRUE, alternative = "greater")
print(t_test_result)
Result:-
Paired t-test
Decision Rule:
If the p-value is less than α, reject the Null Hypothesis.
Inference:
Since the p-value (about 0.15) is greater than α (0.05), we do not reject the Null Hypothesis.
Conclusion:
There is not enough evidence to conclude that the coaching was successful at α = 0.05.
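The direction of a one-tailed paired test depends on the order of the two arguments: t.test(x, y, paired = TRUE, alternative = "greater") tests whether the mean of x - y is greater than zero. The sketch below enters the JAN and MAY marks from the table directly (the vector names are illustrative) and passes MAY first so that the test matches H1: µMAY - µJAN > 0.

```r
# Marks from the problem table, in row order
jan <- c(45, 54, 44, 56, 34, 45, 34, 56, 45, 54, 67, 56, 56, 56, 76, 45)
may <- c(56, 57, 32, 67, 44, 34, 34, 76, 56, 45, 55, 87, 66, 65, 45, 76)

# Mean improvement per student (MAY minus JAN)
mean_gain <- mean(may - jan)
print(mean_gain)

# Paired, one-tailed test of H1: mean(MAY - JAN) > 0
result <- t.test(may, jan, paired = TRUE, alternative = "greater")
print(result)

# Reject H0 only if the p-value is below alpha
alpha <- 0.05
reject_h0 <- result$p.value < alpha
print(reject_h0)
```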
E. Two Sample t Test
Problem 1:
Determine whether there is a significant difference
between the marks scored by class groups A & B
in mathematics at α=10%.
GROUP A GROUP B
76 95
55 97
76 87
76 89
89 56
65 98
76 76
88 56
78 76
87 56
87 87
65 76
76 87
89 88
65 76
78 66
69 45
65 76
89 77
SOLUTION:
HYPOTHESIS TESTING
H0: There is no significant difference between the marks scored by class group A and B in
mathematics.
H0: µA = µB
H0: µA - µB = 0
H1: There is a significant difference between the marks scored by class group A and B in
mathematics.
H1: µA ≠ µB
H1: µA - µB ≠ 0
Steps:-
1. In the File Tab, click Import Dataset, then click sheet and import-
4. Write the coding in source, click on run and analyse the output from
console-
CODING:-
library(readxl)
Divya_ttest <- read_excel("C:/Users/mrvai/Desktop/divya singh/college/Divya_ttest.xlsx",
                          sheet = "Sheet2")
View(Divya_ttest)
# Welch two-sample t-test (two-tailed); conf.level = 0.90 matches α = 10%
t_test_result <- t.test(Divya_ttest$`GROUP A`, Divya_ttest$`GROUP B`, conf.level = 0.90)
print(t_test_result)
Result:-
Welch Two Sample t-test
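Decision Rule:
If the p-value is less than α, reject the Null Hypothesis.
Inference:
Since the p-value is greater than α (0.10), we do not reject the Null Hypothesis.
Conclusion:
There is no significant difference between the marks scored by class group A and B in
mathematics at α = 0.10.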
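For readers without the Excel sheet, the test can be sketched with the marks from the table typed in directly. The vector names group_a and group_b are illustrative; conf.level = 0.90 is used so that the confidence interval corresponds to α = 10%.

```r
# Marks from the problem table, in row order
group_a <- c(76, 55, 76, 76, 89, 65, 76, 88, 78, 87, 87, 65, 76, 89, 65, 78, 69, 65, 89)
group_b <- c(95, 97, 87, 89, 56, 98, 76, 56, 76, 56, 87, 76, 87, 88, 76, 66, 45, 76, 77)

# Welch two-sample t-test, two-tailed (the default in t.test)
result <- t.test(group_a, group_b, conf.level = 0.90)
print(result)

# Reject H0 only if the p-value is below alpha
alpha <- 0.10
reject_h0 <- result$p.value < alpha
print(reject_h0)
```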
F. F Test
Problem 1:
Determine whether or not there is a significant
difference between variances of two data sets
GROUP1 GROUP2
150 170
125 165
160 130
130 155
160 125
125 150
SOLUTION:
HYPOTHESIS TESTING
H0: There is no significant difference between variances of two data sets.
H0: σ1² = σ2²
H0: σ1² / σ2² = 1
H1: There is a significant difference between variances of two data sets.
H1: σ1² ≠ σ2²
4. Write the coding in source, click on run and analyse the output from
console-
CODING:-
library(readxl)
Divya_ttest <- read_excel("C:/Users/mrvai/Desktop/Divya singh/college/Divya_ttest.xlsx",
                          sheet = "Sheet3")
View(Divya_ttest)
# F test of the ratio of the two sample variances
var.test(Divya_ttest$GROUP1, Divya_ttest$GROUP2)
RESULT:-
F test to compare two variances
DECISION RULE:-
If the p-value is less than α, reject the Null Hypothesis.
Inference:-
Since the p-value (0.87) is greater than α (0.005), we do not reject the Null Hypothesis.
Conclusion:
There is no significant difference between variances of two data sets.
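The F statistic reported by var.test() is the ratio of the two sample variances. The sketch below enters the two groups from the table directly (vector names are illustrative) and compares the statistic with that ratio.

```r
# Values from the problem table
group1 <- c(150, 125, 160, 130, 160, 125)
group2 <- c(170, 165, 130, 155, 125, 150)

# F test to compare two variances (two-tailed by default)
result <- var.test(group1, group2)
print(result)

# The F statistic equals var(group1) / var(group2)
f_ratio <- var(group1) / var(group2)
print(f_ratio)
```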
G. One Way Anova
Problem 1:
The marks for 3 different groups in Economics,
Science, History are given. Determine whether
there is a significant difference between the means
of population.
SOLUTION:
HYPOTHESIS TESTING:
H0: There is no significant difference between the population means.
H0: µ1 = µ2 = µ3
H1: There is a significant difference between the population means.
H1: At least one of the means is different.
STEPS:
1. In the File Tab, click Import Dataset, then click sheet and import-
4. Write the coding in source, click on run and analyse the output from
console-
CODING:
library(readxl)
singh_ttest <- read_excel("C:/Users/mrvai/Desktop/singh/college/singh_ttest.xlsx",
                          sheet = "Sheet4")
View(singh_ttest)
# One vector of marks per subject
group1 <- singh_ttest$ECONOMICS
group2 <- singh_ttest$SCIENCE
group3 <- singh_ttest$HISTORY
combined_group <- data.frame(group1, group2, group3)
summary(combined_group)
# Reshape to long form: `values` holds the marks, `ind` holds the group label
stacked_group <- stack(combined_group)
stacked_group
# One-way ANOVA of marks by group
anova_result <- aov(values ~ ind, data = stacked_group)
print(summary(anova_result))
RESULT:
            Df Sum Sq Mean Sq F value Pr(>F)
ind          2   1125   562.4   3.238 0.0585 .
Residuals   22   3821   173.7
Decision Rule:
If the p-value is less than α, reject the Null Hypothesis.
Inference:
Since the p-value (0.0585) is greater than α (0.005), we do not reject the Null Hypothesis.
Conclusion:
There is no significant difference between the population means.
H. Chi Square Test
Problem 1:
Determine whether brand preference is
independent of age group.
AGE/BRAND BRAND 1 BRAND 2 BRAND 3
15-25 75 56 72
26-35 60 40 64
36-45 45 52 50
45-55 55 35 45
SOLUTION:
HYPOTHESIS TESTING:
H0: There is no association between brand preference and age group.
H1: There is an association between brand preference and age group.
STEPS:-
1. In the File Tab, click Import Dataset, then click sheet and import-
2. Click on Browse, select file, select sheet and import-
CODING:
library(readxl)
Divya_ttest <- read_excel("C:/Users/mrvai/Desktop/divya singh/college/Divya_ttest.xlsx",
                          sheet = "Sheet5")
View(Divya_ttest)
# Chi-square test of independence on the three brand columns
chi_square_result <- chisq.test(Divya_ttest[, c("BRAND 1", "BRAND 2", "BRAND 3")])
print(chi_square_result)
RESULT:
Pearson's Chi-squared test
Conclusion:
There is no association between brand preference and age group.
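The test can also be run without the Excel file by entering the contingency table as a matrix. This is a sketch under that assumption; the object name brand_table is illustrative, and the row labels are copied from the table as printed.

```r
# Observed counts from the problem table: rows are age groups, columns are brands
brand_table <- matrix(
  c(75, 56, 72,
    60, 40, 64,
    45, 52, 50,
    55, 35, 45),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    age   = c("15-25", "26-35", "36-45", "45-55"),
    brand = c("BRAND 1", "BRAND 2", "BRAND 3")
  )
)

# Pearson's chi-squared test of independence
result <- chisq.test(brand_table)
print(result)

# Expected counts under independence: row total * column total / grand total
print(round(result$expected, 2))

# Reject H0 only if the p-value is below alpha (0.05 assumed here)
reject_h0 <- result$p.value < 0.05
print(reject_h0)
```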