R Functions

Here are some common R functions for statistical analysis and visualization:

# Data manipulation
read.csv() - Import CSV data
str() - View the structure of data
subset() - Subset data
merge() - Join data frames

# Summarizing data
summary() - Summary statistics
table() - Frequency tables
boxplot() - Box plots
hist() - Histograms

# Inferential stats
t.test() - Student's t-test
prop.test() - Test proportions
chisq.test() - Chi-squared test
cor.test() - Correlation test
lm() - Linear regression
aov() - ANOVA

# Visualization
plot() - Basic plots

Uploaded by

Shreya Ghosh


A pain researcher is interested in finding methods to reduce lower back pain in individuals without having to use drugs. The researcher thinks that having acupuncture in the lower back might reduce back pain. To investigate this, the researcher recruits 25 participants to the study. At the beginning of the study, the researcher asks the participants to rate their back pain on a scale of 1 to 10, with 10 indicating the greatest level of pain. After 4 weeks of twice-weekly acupuncture, the participants are asked again to rate their back pain on the same 1-to-10 scale. The researcher wishes to understand whether the participants' pain levels changed after they had undergone the acupuncture. Because each participant is measured twice and the ratings are ordinal, a Wilcoxon signed-rank test (the non-parametric counterpart of the paired t-test) is run.
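A minimal R sketch of this analysis, assuming the ratings are stored in two numeric vectors `before` and `after` with one entry per participant, in the same participant order. The 25 ratings are made-up, hypothetical values for illustration only, not data from the study. `exact = FALSE` asks for the normal approximation, because tied and zero differences rule out an exact p-value.

```r
# Hypothetical pain ratings (1-10) for 25 participants; made-up values for illustration
before <- c(7, 8, 6, 9, 5, 7, 8, 6, 7, 9, 6, 8, 7, 5, 8, 7, 6, 9, 8, 7, 6, 8, 7, 9, 6)
after  <- c(5, 6, 6, 7, 4, 5, 6, 5, 6, 7, 4, 6, 5, 5, 6, 6, 5, 7, 6, 5, 5, 7, 6, 8, 5)

# Wilcoxon signed-rank test on the paired differences (before - after).
# Zero differences are dropped; ties prevent an exact p-value,
# so the normal approximation is requested explicitly.
result <- wilcox.test(before, after, paired = TRUE, exact = FALSE)
print(result)

# Median change in rating (positive = pain went down)
median(before - after)
```

A small p-value would suggest that pain ratings changed after the acupuncture; the sign of the paired differences shows the direction of the change.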

R Functions
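library(help = "datasets")   # list the datasets that ship with R
data(cars)                   # load the built-in cars data
View(cars)                   # open it in the spreadsheet-style viewer
str(cars)                    # show its structure: 50 obs. of 2 numeric variables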

# Set the working directory: Session > Set Working Directory in RStudio, or setwd()

# Titanic data: read the data. Objective?
 My_Test <- read.csv("test.csv", header = TRUE)
 View(My_Test)
 My_Train <- read.csv("train.csv", header = TRUE)
 str(My_Train)
 str(My_Test)
 hist(My_Train$Age)
 boxplot(My_Train$Age)

# To combine the two sets with rbind() they must have the same columns, so add a Survived column to My_Test (indexing is My_Test[row, column])

My_Test.survived <- data.frame(Survived = rep ("None", nrow(My_Test)), My_Test[,])

#combine the dataset
data.combined <- rbind(My_Train,My_Test.survived)
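A small self-contained sketch of the same step, using two tiny hypothetical data frames (`train`, `test`) as stand-ins for the Titanic files. It shows that `rbind()` on data frames matches columns by name, so the placeholder `Survived` column can sit in a different position.

```r
# Hypothetical stand-ins for train.csv (has Survived) and test.csv (does not)
train <- data.frame(PassengerId = 1:3, Survived = c(0, 1, 1), Age = c(22, 38, 26))
test  <- data.frame(PassengerId = 4:5, Age = c(35, NA))

# Add a placeholder Survived column so both frames have the same column names
test.survived <- data.frame(Survived = rep("None", nrow(test)), test)

# rbind() on data frames matches columns by name, so column order may differ
combined <- rbind(train, test.survived)
str(combined)
```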

# factor: a categorical variable with a fixed set of levels, like a drop-down list or an enumeration (e.g. the states of India)

# str() shows how R has read each column from the CSV file, so columns may need converting before analysis (e.g. Pclass to a factor)
 is.factor(My_Test$Pclass)
 data.combined$Pclass <- as.factor(data.combined$Pclass)
 str(data.combined$Pclass)
# distribution of data
 table(data.combined$Pclass)
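A toy version of the factor conversion, using a short hypothetical `pclass` vector in place of the Titanic column, to show what `is.factor()`, `as.factor()` and `table()` return.

```r
# Hypothetical passenger classes, read from a CSV as plain numbers
pclass <- c(3, 1, 3, 2, 3, 1)
is.factor(pclass)            # FALSE: R stores the column as numeric

pclass <- as.factor(pclass)  # treat it as a categorical variable
levels(pclass)               # "1" "2" "3"
table(pclass)                # number of passengers in each class
```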
# ggplot2: install the package once with install.packages("ggplot2"), then load it with library(ggplot2).
# If loading fails with an error about a missing package (e.g. stringr), install that package and run library(ggplot2) again.
 library (ggplot2)

 hist(My_Train$Age)
# ggplot gives a more powerful and informative graph: survival counts by sex, one panel per class (test rows show Survived = "None")
# Inside aes(), refer to columns by name (Sex), not as data.combined$Sex
 ggplot(data.combined, aes(x = Sex, fill = Survived)) +
   geom_bar(width = 0.5) +
   facet_wrap(~Pclass) + ggtitle("Pclass") +
   xlab("Sex") + ylab("Total count") +
   labs(fill = "Survived")
 My_Train$Survived <- as.factor(My_Train$Survived)
 ggplot(My_Train, aes(x = Sex, fill = Survived)) +
   geom_bar(width = 0.5) +
   xlab("Sex") + ylab("Total count") +
   labs(fill = "Survived")

 ggplot(My_Train, aes(x = Sex, fill = Survived)) +
   geom_bar(width = 0.5) +
   facet_wrap(~Pclass) + ggtitle("Pclass") +
   xlab("Sex") + ylab("Total count") +
   labs(fill = "Survived")

# Added Title: trainTitle.csv is the training data with an extra Title column
TrainTitle <- read.csv("trainTitle.csv", header = TRUE)
TrainTitle$Title <- as.factor(TrainTitle$Title)
TrainTitle$Survived <- as.factor(TrainTitle$Survived)
 ggplot(TrainTitle, aes(x = Age, fill = Survived)) +
   geom_bar(width = 0.5) +
   facet_wrap(~Pclass + Sex) + ggtitle("Pclass and Sex") +
   xlab("Age") + ylab("Total count") +
   labs(fill = "Survived")

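# Group ages into 10-year bins; geom_bar(width = 10) would only draw overlapping bars, one per distinct age
 ggplot(TrainTitle, aes(x = Age, fill = Survived)) +
   geom_histogram(binwidth = 10) +
   facet_wrap(~Pclass + Sex) + ggtitle("Pclass and Sex") +
   xlab("Age") + ylab("Total count") +
   labs(fill = "Survived")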
Statistical Operations
# create a dataframe from scratch
age <- c(25, 30, 56)
gender <- c("male", "female", "male")
weight <- c(160, 110, 220)
mydata <- data.frame(age,gender,weight)
How to deal with Missing Data
 My_Train_NA_Omit <- na.omit(My_Train)
 summary(My_Train_NA_Omit$Age)
 summary(My_Train$Age)
 mean(My_Train_NA_Omit$Age)
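`na.omit()` removes a row if any of its columns is NA, which can discard more rows than an analysis of Age alone needs. A small hypothetical data frame shows the difference between `na.omit()` and filtering on a single column with `is.na()`.

```r
# Hypothetical passengers: row 2 is missing Age, row 4 is missing Fare
df <- data.frame(Age  = c(22, NA, 26, 35),
                 Fare = c(7.25, 71.28, 7.93, NA))

complete <- na.omit(df)    # drops every row with an NA in ANY column
nrow(complete)             # 2: rows 2 and 4 are both removed

# To drop only the rows where Age is missing, filter on that column
age_known <- df[!is.na(df$Age), ]
nrow(age_known)            # 3
mean(age_known$Age)        # same value as mean(df$Age, na.rm = TRUE)
```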

dnorm(x): normal density function (by default mean = 0, sd = 1)

# plot the standard normal curve
x <- pretty(c(-3, 3), 30)
y <- dnorm(x)
plot(x, y, type = "l", xlab = "Normal Deviate", ylab = "Density", yaxs = "i")

pnorm(q)

cumulative normal probability for q (area under the normal curve to the left of q)

pnorm(1.96) is 0.975
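A quick check of the tail convention: `pnorm()` gives the lower (left) tail, `lower.tail = FALSE` gives the upper (right) tail, and `qnorm()` is the inverse of `pnorm()`.

```r
# Left-tail area: P(Z <= 1.96)
pnorm(1.96)                       # about 0.975
# Right-tail area: P(Z > 1.96), same as 1 - pnorm(1.96)
pnorm(1.96, lower.tail = FALSE)   # about 0.025
# Quantile with a left-tail area of 0.975
qnorm(0.975)                      # about 1.96
# Two-sided probability P(|Z| > 1.96)
2 * pnorm(-1.96)                  # about 0.05
```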

x <- c(1, 2, NA, 3)
mean(x)                  # returns NA
mean(x, na.rm = TRUE)    # returns 2
 mean(My_Train_NA_Omit$Age)
 sd(My_Train_NA_Omit$Age)
 range(My_Train_NA_Omit$Age)

 write.csv(My_Train_NA_Omit, "My_Train_NA_Omit.csv")

One sample T test

Before running an analysis, it is good practice to plot the data with a histogram, scatter plot or box plot.
 boxplot(My_Train_NA_Omit$Age)
# H0: mu = 35 vs H1: mu != 35 (two-tailed), tested at the 95% confidence level. The test can also be one-tailed, e.g. H1: mu > 35 or H1: mu < 35.
 t.test(My_Train_NA_Omit$Age, mu = 35)
 t.test(My_Train_NA_Omit$Age, mu = 35, alternative = "two.sided", conf.level = 0.90)

 t.test(My_Train_NA_Omit$Age, mu = 35, alternative = "less", conf.level= 0.95)

# alternative must be one of "two.sided", "less" or "greater"

"two.sided" is the default.

 t.test(My_Train_NA_Omit$Age, mu = 35, alternative = "two.sided", conf.level = 0.95)
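To see what `t.test()` computes, a sketch that rebuilds the one-sample t statistic, t = (mean - mu0) / (sd / sqrt(n)) on n - 1 degrees of freedom, and its two-sided p-value. The `age` vector is hypothetical, made-up data without NAs.

```r
# Hypothetical ages (no missing values); made-up for illustration
age <- c(22, 38, 26, 35, 54, 2, 27, 14, 58, 20)
mu0 <- 35                                       # hypothesised mean under H0

n  <- length(age)
se <- sd(age) / sqrt(n)                         # standard error of the mean
t_manual <- (mean(age) - mu0) / se              # t statistic, n - 1 df
p_manual <- 2 * pt(-abs(t_manual), df = n - 1)  # two-sided p-value

res <- t.test(age, mu = mu0)                    # two-sided, 95% CI by default
c(manual = t_manual, t.test = unname(res$statistic))
c(manual = p_manual, t.test = res$p.value)
```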

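Two Sample T test (Independent Two-Sample Test)
# Formula: numeric response ~ two-level grouping factor. Unpaired is the default, so paired = FALSE is not needed.
# R's default is the Welch test (var.equal = FALSE); var.equal = TRUE gives the pooled-variance Student t-test.
 t.test(Age ~ Sex, data = My_Train)
 t.test(Age ~ Sex, data = My_Train, mu = 0, alternative = "two.sided",
        conf.level = 0.95, var.equal = TRUE)
 t.test(Age ~ Sex, data = My_Train_NA_Omit, mu = 0, alternative = "two.sided",
        conf.level = 0.95, var.equal = TRUE)

 boxplot(Age ~ Sex, data = My_Train_NA_Omit)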
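A sketch of what `var.equal = TRUE` changes: it pools the two sample variances and uses n1 + n2 - 2 degrees of freedom, whereas the default Welch test does not pool. The `male` and `female` vectors are hypothetical, made-up ages.

```r
# Hypothetical ages for two independent groups; made-up for illustration
male   <- c(22, 35, 54, 2, 20, 39, 28, 40)
female <- c(38, 26, 35, 27, 14, 4, 58)

n1 <- length(male)
n2 <- length(female)
# Pooled variance: weighted average of the two sample variances
sp2 <- ((n1 - 1) * var(male) + (n2 - 1) * var(female)) / (n1 + n2 - 2)
t_pooled <- (mean(male) - mean(female)) / sqrt(sp2 * (1 / n1 + 1 / n2))

student <- t.test(male, female, var.equal = TRUE)  # pooled, df = n1 + n2 - 2
welch   <- t.test(male, female)                    # default Welch test
c(pooled_manual = t_pooled,
  student = unname(student$statistic),
  welch = unname(welch$statistic))
```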

Paired T test (two-tailed)

 t.test(Test_paired$Hotel.16, Test_paired$Hotel.17, paired = TRUE)
 t.test(Test_paired$Hotel.16, Test_paired$Hotel.17, mu = 0, alternative = "two.sided",
        paired = TRUE, conf.level = 0.95)
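A paired t-test is a one-sample t-test on the within-pair differences. The sketch below checks this with hypothetical vectors `hotel16` and `hotel17` (made-up scores for the same 6 units measured twice), not the `Test_paired` data.

```r
# Hypothetical paired scores for the same 6 units measured twice; made-up values
hotel16 <- c(7.8, 8.1, 6.5, 9.0, 7.2, 8.4)
hotel17 <- c(8.2, 8.0, 7.1, 9.3, 7.9, 8.6)

paired   <- t.test(hotel16, hotel17, paired = TRUE)
one_samp <- t.test(hotel16 - hotel17, mu = 0)   # one-sample test on differences

# Both calls give the same t statistic, degrees of freedom and p-value
rbind(paired   = c(t = unname(paired$statistic),   p = paired$p.value),
      one_samp = c(t = unname(one_samp$statistic), p = one_samp$p.value))
```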

Anova
# One Way Anova (Completely Randomized Design)
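fit <- aov(y ~ A, data = mydataframe)
# y is the numeric response (values) and A is the categorical factor

Group1 <- c(1,3,4,6,7,8,6,4,5,3,5)
Group2 <- c(3,6,3,4,5,6,7,5,6)
Group3 <- c(4,5,6,7,8,9)
# The groups have different lengths, so cbind() would silently recycle the shorter ones.
# Stack a named list instead: one column of values and one group factor (ind).
S <- stack(list(Group1 = Group1, Group2 = Group2, Group3 = Group3))
ANV <- aov(values ~ ind, data = S)
summary(ANV)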
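To see where the F value in `summary(ANV)` comes from, a sketch that recomputes it from the between-group and within-group sums of squares for the same three groups: F = (SS_between / (k - 1)) / (SS_within / (N - k)).

```r
Group1 <- c(1, 3, 4, 6, 7, 8, 6, 4, 5, 3, 5)
Group2 <- c(3, 6, 3, 4, 5, 6, 7, 5, 6)
Group3 <- c(4, 5, 6, 7, 8, 9)
S <- stack(list(Group1 = Group1, Group2 = Group2, Group3 = Group3))

k <- nlevels(S$ind)                  # number of groups
N <- nrow(S)                         # total number of observations
grand <- mean(S$values)              # grand mean
group_mean <- ave(S$values, S$ind)   # each observation's own group mean

ss_between <- sum((group_mean - grand)^2)     # between-group sum of squares
ss_within  <- sum((S$values - group_mean)^2)  # within-group (residual) sum of squares
F_manual <- (ss_between / (k - 1)) / (ss_within / (N - k))

F_aov <- summary(aov(values ~ ind, data = S))[[1]][["F value"]][1]
c(manual = F_manual, aov = F_aov)
```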

Correlation

 cor(x, y)          # x and y are numeric vectors of the same length
 cor.test(x, y)     # formula form: cor.test(~ x + y, data = ...)
 x <- data.frame(My_Test$Age, My_Test$Fare)
 cor(x, use = "complete.obs", method = "pearson")
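A runnable sketch on the built-in `mtcars` data (chosen here only as an example; it is not used elsewhere in these notes). It shows the two ways of calling `cor.test()` and that its Pearson test statistic is t = r * sqrt(n - 2) / sqrt(1 - r^2) on n - 2 degrees of freedom.

```r
# Built-in mtcars data: car weight (wt) vs fuel economy (mpg)
r  <- cor(mtcars$wt, mtcars$mpg)                   # Pearson correlation coefficient
ct <- cor.test(mtcars$wt, mtcars$mpg)              # test of H0: rho = 0
ct_formula <- cor.test(~ wt + mpg, data = mtcars)  # same test, formula interface

# Test statistic rebuilt from r
n <- nrow(mtcars)
t_manual <- r * sqrt(n - 2) / sqrt(1 - r^2)
c(r = r, t_manual = t_manual, t_cor.test = unname(ct$statistic))
```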

Simple Linear regression

 lm(a ~ b)   # a is the numeric response, b the predictor
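A sketch using the built-in `cars` data loaded earlier with `data(cars)`: fit stopping distance on speed and check the coefficients against the closed-form least-squares estimates, slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x).

```r
# Built-in cars data: speed (mph) and stopping distance (ft)
fit <- lm(dist ~ speed, data = cars)   # response ~ predictor
coef(fit)

# Closed-form least-squares estimates for a single predictor
slope     <- cov(cars$speed, cars$dist) / var(cars$speed)
intercept <- mean(cars$dist) - slope * mean(cars$speed)
c(intercept = intercept, slope = slope)
```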

Multiple Regression
# Multiple Linear Regression Example
fit <- lm(y ~ x1 + x2 + x3, data=mydata)
summary(fit) # show results
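A concrete, runnable instance of the template above, using the built-in `mtcars` data as an assumed example (not part of the original notes), followed by a prediction for one hypothetical car.

```r
# Built-in mtcars data: fuel economy (mpg) from weight (wt, 1000 lbs) and horsepower (hp)
fit <- lm(mpg ~ wt + hp, data = mtcars)
summary(fit)     # coefficients, standard errors, R-squared and overall F-test

# Predicted mpg for a hypothetical car weighing 3000 lbs with 150 hp
new_car <- data.frame(wt = 3.0, hp = 150)
predict(fit, newdata = new_car)
```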

Non Parametric Test

chisq.test(My_Test$Pclass, My_Test$Sex)   # chi-squared test of independence

wilcox.test(y ~ A)   # rank-sum test: y is numeric and A is a binary factor

wilcox.test(y, x)    # rank-sum (Mann-Whitney) test: y and x are independent numeric samples
wilcox.test(y1, y2, paired = TRUE)   # signed-rank test: y1 and y2 are paired numeric vectors
kruskal.test(y ~ A)  # Kruskal-Wallis test: y is numeric and A is a factor
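A self-contained sketch of two of these tests: `chisq.test()` on a 2 x 2 table of hypothetical, made-up counts, and `kruskal.test()` on the three groups from the ANOVA example, passed as a list of samples instead of a formula.

```r
# Chi-squared test of independence on a 2 x 2 table of hypothetical counts
counts <- matrix(c(30, 10, 20, 40), nrow = 2,
                 dimnames = list(Sex = c("female", "male"),
                                 Survived = c("yes", "no")))
chi <- chisq.test(counts)   # Yates' continuity correction is applied to 2 x 2 tables
chi

# Kruskal-Wallis test on the three groups from the ANOVA example
Group1 <- c(1, 3, 4, 6, 7, 8, 6, 4, 5, 3, 5)
Group2 <- c(3, 6, 3, 4, 5, 6, 7, 5, 6)
Group3 <- c(4, 5, 6, 7, 8, 9)
kw <- kruskal.test(list(Group1, Group2, Group3))   # a list of samples is accepted
kw
```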

