Introduction To R PDF
Statistics
Statistics is a branch of mathematics dealing with:
Data collection
Organization
Analysis
Interpretation
Making decisions
• Data consists of information coming from
observations, counts, measurements, or responses.
The workspace
Your R objects are stored in a workspace.
To list the objects in your workspace:
> ls()
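A short sketch of how objects enter and leave the workspace. Objects created with an assignment appear in ls(), and the base function rm() removes them. The names x and y are hypothetical.

```r
# Create two objects in the workspace (hypothetical names)
x <- 10
y <- c(1, 2, 3)

ls()          # lists "x" and "y" (plus any other objects already present)

rm(x)         # remove the object x from the workspace
"x" %in% ls() # FALSE: x is no longer in the workspace
```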
History
• To work with your previous commands:
> history()                 # display the last 25 commands
> history(max.show = Inf)   # display all previous commands
> (5+(6+7)*(pi^2))/8
[1] 16.66311
> log(exp(1))
[1] 1
> log(10000, 10)
[1] 4
> sin(pi/3)^2 + cos(pi/3)^2
[1] 1
> Sin(pi/3)^2 + cos(pi/3)^2
Error in Sin(pi/3) : could not find function "Sin"
> ExP(-1)
Error in ExP(-1) : could not find function "ExP"
> exp(-1)
[1] 0.3678794
Naming Variables
• Three types of variables:
Numeric {Ex: 3, 4.098, 1234}
Character {Ex: "Andrew", "today", "RRR"}
Logical {Ex: TRUE, FALSE}
• Names can be built from letters, digits, the period (.) and the underscore (_).
• Names must not start with a digit, an underscore, or a period followed by a digit.
• Names are case-sensitive.
• Some names are already used by the system for built-in functions and constants. Avoid using the following as variable names, because doing so masks the built-in object:
Eg: c, q, t, D, F, I, T, diff, df, pt
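The sketch below checks the three basic types with class() and tests the naming rules. The helper is_valid_name() is hypothetical. It relies on the base function make.names(), which rewrites an invalid name (for example by prepending "X"), so a name is valid exactly when make.names() leaves it unchanged. The last lines show why built-in names such as T should be avoided.

```r
# The three basic types, checked with class()
class(3)          # "numeric"
class("Andrew")   # "character"
class(TRUE)       # "logical"

# Hypothetical helper: a name is valid if make.names() does not change it
is_valid_name <- function(nm) identical(make.names(nm), nm)
is_valid_name("height.cm")  # TRUE
is_valid_name("my_var")     # TRUE
is_valid_name("2nd")        # FALSE: starts with a digit
is_valid_name(".2a")        # FALSE: period followed by a digit

# Built-in names such as T can be overwritten, which is why they are best avoided
T <- 0
isTRUE(T)   # FALSE: T no longer means TRUE
rm(T)       # removing our copy restores access to the built-in T
```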
Assigning values to variables
i. a = Y + Y
ii. b = Y *(1/2)* Y
iii. c = a + b
iv. d = 1/c
v. Print Y, a, b, c, d
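One way to work through the exercise is sketched below. The definition of Y is not shown in these notes, so the vector used here is hypothetical. Arithmetic in R is element-wise, so each result has the same length as Y. The third result is stored as cc rather than c, following the earlier advice not to mask the built-in function c().

```r
# Hypothetical vector: the original definition of Y is not shown here
Y <- c(1, 2, 3, 4)

a  <- Y + Y          # element-wise sum: 2 4 6 8
b  <- Y * (1/2) * Y  # element-wise: 0.5 2.0 4.5 8.0
cc <- a + b          # named cc rather than c, to avoid masking c()
d  <- 1/cc           # element-wise reciprocal

print(Y); print(a); print(b); print(cc); print(d)
```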
• Suppose you want to access the 2nd element of Y:
> Y[2]
[1] 4
> Y[1:3]
[1] 2 4 3
> Y[5:8]
[1] 5 1 7 8
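Square-bracket indexing accepts more than single positions and ranges. The sketch below uses a hypothetical vector v to show arbitrary positions, negative indices (which drop elements) and logical conditions.

```r
# Hypothetical vector used only for illustration
v <- c(10, 20, 30, 40, 50)

v[2]          # 20: a single element
v[1:3]        # 10 20 30: a range of positions
v[c(1, 5)]    # 10 50: arbitrary positions
v[-1]         # 20 30 40 50: a negative index drops that element
v[v > 25]     # 30 40 50: a logical condition selects elements
```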
Character Vectors
• A character vector is a vector of text strings, whose elements are specified and printed in quotes.
> x = c("Wednesday", "Tuesday", "Monday")
> x
[1] "Wednesday" "Tuesday" "Monday"
> color[2]
[1] "Blue"
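A few base functions that work on character vectors are shown below, applied to the vector x from the example above.

```r
x <- c("Wednesday", "Tuesday", "Monday")

length(x)            # 3 elements
nchar(x)             # 9 7 6: number of characters in each element
toupper(x[3])        # "MONDAY"
sort(x)              # "Monday" "Tuesday" "Wednesday" (alphabetical)
paste("Day:", x[1])  # "Day: Wednesday"
```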
Logical Vectors
> 0/0
[1] NaN
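Comparisons in R return logical vectors, which can be counted with sum() or used for indexing. The sketch below uses hypothetical marks and a pass mark of 50 chosen for illustration. It also shows that the NaN produced by 0/0 is detected by is.nan() and is treated as missing by is.na().

```r
# Comparisons produce logical vectors
marks  <- c(35, 72, 58, 90)    # hypothetical values
passed <- marks >= 50          # FALSE TRUE TRUE TRUE
sum(passed)                    # 3: TRUE counts as 1, FALSE as 0
marks[passed]                  # 72 58 90

# 0/0 is NaN ("Not a Number"), which is.nan() detects
is.nan(0/0)                    # TRUE
is.na(0/0)                     # TRUE: NaN also counts as missing
```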
R as a Number Generator
• Generate a variable with numbers ranging from 1
to 12:
> x <- 1:12
>x
[1] 1 2 3 4 5 6 7 8 9 10 11 12
> rep(c(1:4), 3)
[1] 1 2 3 4 1 2 3 4 1 2 3 4
> gl(2, 4, 8)
[1] 1 1 1 1 2 2 2 2
Levels: 1 2
> sample(1:60, 5)   # random draw; your output will differ
[1] 32 26 6 18 9
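Two related generators are seq(), for regular sequences, and rep() with the each argument. The sketch also uses set.seed() so that sample() returns the same draw each time it is run. The seed value 42 is arbitrary.

```r
seq(1, 12, by = 2)         # 1 3 5 7 9 11
seq(0, 1, length.out = 5)  # 0.00 0.25 0.50 0.75 1.00
rep(1:4, each = 3)         # 1 1 1 2 2 2 3 3 3 4 4 4

# set.seed() makes sample() reproducible; the seed 42 is arbitrary
set.seed(42)
s <- sample(1:60, 5)       # 5 distinct values from 1 to 60 (no replacement)
set.seed(42)
identical(s, sample(1:60, 5))   # TRUE: same seed, same draw
```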
Data frames
Syntax
> data2 = read.table("H/marks.txt", header = T)
Or
> data2 = read.table(file.choose(), header = T)
• header = T indicates that the first row of the file contains the column headings.
> data.frame.name = read.table("Drive\\Directory\\FileName.extension", header = T)
Variations of read.table
1. read.csv
fields are separated by commas
2. Using History Window
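The sketch below lets the read.table() call run without the marks.txt file. It passes a small hypothetical data set through the text argument, which read.table() also accepts. The same data are then written to a temporary CSV file and read back with read.csv().

```r
# Hypothetical data typed inline; read.table() also accepts text = "..."
marks_txt <- "Index Weight Height Sex
1 55 160 F
2 70 175 M
3 62 168 M"

data2 <- read.table(text = marks_txt, header = TRUE)  # first row = headings
str(data2)

# Same data round-tripped through a temporary CSV file with read.csv()
tmp <- tempfile(fileext = ".csv")
write.csv(data2, tmp, row.names = FALSE)
data3 <- read.csv(tmp)
all.equal(data2, data3)   # TRUE
```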
Naming Columns
• Columns can be named after importing the data set into R.
Syntax:
> names(dataset_name) = c("var_name1", "var_name2", ...)
Eg:
> names(data) = c("Index", "Weight", "Height", "Sex", "Sub1", "Sub2", "Sub3", "Class")
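names() can rename all columns at once or a single column through indexing. The sketch below uses a small hypothetical data frame whose columns start out with automatically generated names.

```r
# A small data frame with automatically generated column names (hypothetical values)
data <- data.frame(c(1, 2, 3), c(55, 70, 62))
names(data)                          # automatically generated names

names(data) <- c("Index", "Weight")  # rename all columns
names(data)[2] <- "Weight_kg"        # rename a single column
names(data)                          # "Index" "Weight_kg"
```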
To separate the data items into separate vectors
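• Syntax: > variable_name = data_frame_name[, column_no]
Eg: > Sub1 = data2[, 5]   # Sub1 is the 5th column in the naming example above
• Note: data_frame_name[column_no] (without the comma) returns a one-column data frame, not a vector.
OR
• Syntax:
> variable_name = data_frame_name$column_name
Eg: > Sub1 = data2$Sub1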
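The difference between single and double brackets matters for the statistics that follow. For example, mean() warns and returns NA when it is given a one-column data frame. The data frame below is a hypothetical stand-in for data2.

```r
# Hypothetical data frame standing in for data2
data2 <- data.frame(Index = 1:3, Sub1 = c(45, 70, 85))

data2[2]        # single brackets: a one-column data frame
data2[[2]]      # double brackets: the column as a vector
data2[, 2]      # [row, column] indexing: also a vector
data2$Sub1      # by name: also a vector

Sub1 <- data2$Sub1
mean(Sub1)      # 66.66667: works because Sub1 is a numeric vector
```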
Descriptive Statistics
• Predefined functions can be used to compute the necessary statistics one at a time.
• Syntax:
> mean(variable_name)
> sd(variable_name)
> var(variable_name)
> min(variable_name)
> max(variable_name)
> median(variable_name)
• Eg:
>mean(Height)
>var(Height)
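• Several of these statistics (minimum, quartiles, median, mean and maximum) can be obtained at once with the function 'summary'. It does not report the standard deviation or the variance.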
• Syntax: > summary(variable_name)
• Eg: >summary(Height)
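The functions above are applied below to a hypothetical Height vector whose results are easy to check by hand. Note that var() and sd() use the sample formulas, with divisor n - 1.

```r
Height <- c(150, 160, 165, 170, 180)   # hypothetical heights in cm

mean(Height)     # 165
median(Height)   # 165
var(Height)      # 125 (sample variance: 500 / (5 - 1))
sd(Height)       # 11.18034, the square root of 125
min(Height); max(Height)   # 150 and 180
summary(Height)  # Min., 1st Qu., Median, Mean, 3rd Qu., Max.
```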
• If a variable contains any missing value (NA), functions such as mean() and var() return NA as the result.
• To avoid that problem, give the argument 'na.rm' (not available, remove) to request that missing values be removed before the calculation.
• Syntax: > mean(variable_name, na.rm=T)
• Eg:
>mean(Sub1)
>mean(Sub1,na.rm=T)
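The effect of na.rm is shown below on hypothetical marks that contain one missing value. is.na() counts the missing entries. summary() reports them separately rather than returning NA.

```r
Sub1 <- c(45, NA, 70, 85)   # hypothetical marks with one missing value

mean(Sub1)                  # NA: one missing value makes the result missing
mean(Sub1, na.rm = TRUE)    # 66.66667: NA dropped before averaging
sum(is.na(Sub1))            # 1: how many values are missing
summary(Sub1)               # reports the count of missing values as "NA's"
```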
How to use a by variable?
1st method
• Compute a statistic for each level of a given categorical variable.
• Syntax: > tapply(association_var, classification_var, statistic)
association_var - any continuous variable
classification_var - any categorical variable (the by variable)
statistic - any statistic (function) that you want to compute
• Eg:
> tapply(Height,Sex,mean)
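A runnable version of the tapply() example is sketched below, with hypothetical Height and Sex values. tapply() splits Height by the levels of Sex and applies the given function to each group. The groups are returned in alphabetical order of the levels.

```r
# Hypothetical data: a continuous variable and a categorical "by" variable
Height <- c(160, 175, 168, 155, 180)
Sex    <- c("F", "M", "M", "F", "M")

tapply(Height, Sex, mean)   # F: 157.5, M: 174.3333
tapply(Height, Sex, max)    # F: 160, M: 180; any summary function can be used
```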
2nd method
Consider only a given level of a given category
• Syntax:
> summary(association_var[classification_var == level])
• Instead of 'summary', any predefined function for descriptive statistics can also be used here.
Eg:
> summary(Height[Sex == 'M'])
> mean(Height[Sex == 'M'])
> var(Height[Sex == 'M'])
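The expression inside the brackets is a logical vector, so this method is ordinary logical indexing. The sketch below reuses the hypothetical Height and Sex values from the tapply() sketch and shows that the mean for "M" agrees with the corresponding tapply() entry.

```r
Height <- c(160, 175, 168, 155, 180)   # hypothetical values
Sex    <- c("F", "M", "M", "F", "M")

Sex == "M"                 # FALSE TRUE TRUE FALSE TRUE
Height[Sex == "M"]         # 175 168 180
mean(Height[Sex == "M"])   # 174.3333, the "M" entry of tapply(Height, Sex, mean)
summary(Height[Sex == "M"])
```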
Tally and Contingency Tables
• The function table() uses cross-classifying factors to build a contingency table of the counts at each combination of factor levels.
Tally table
Eg: >table(Sex)
Contingency Tables
• Syntax:
> table(var_name1, var_name2)
Eg:
>table(Sex,Class)
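Both kinds of table are sketched below with hypothetical Sex and Class values. Individual counts are read with [row, column] indexing. addmargins(), from the stats package that R attaches by default, appends row and column totals.

```r
Sex   <- c("F", "M", "M", "F", "M")   # hypothetical data
Class <- c("A", "A", "B", "B", "B")

table(Sex)                # tally: F 2, M 3
tab <- table(Sex, Class)  # contingency table of counts
tab                       # rows = Sex, columns = Class
tab["M", "B"]             # 2: number of males in class B
addmargins(tab)           # adds row and column totals (labelled Sum)
```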