R Intro
6. Vector operations-
• Vector operations are functions that make
calculations on a complete vector, like sum().
• Each result depends on more than one value of the
vector.
7. Matrix operations-
• These functions are used for operations and
calculations on matrices.
Objects
• To create new variables, use the assignment operator (<-).
• Unlike C++ and Java, R does not require declaring a variable's data type.
Instead, a variable is assigned an R object. The most commonly used
objects are:
• Vectors
• Factors
• Lists
• Data Frames
• Matrices
• The data type of the object in R becomes the data type of
the variable by definition.
• R's basic data types are character, numeric, integer,
complex, and logical.
vector
• A vector is the simplest type of data structure in R. A
vector is a sequence of data elements of the same
basic type.
• There are six data types of the simplest object - vector:
1. Logical
2. Numeric
3. Integer
4. Character
5. Raw
6. Complex
• If you want to check the variable type, use class().
• A vector is a sequence of elements that share
the same data type. These elements are
known as components of a vector.
• R vectors come in two kinds: atomic
vectors and lists.
• All elements of an atomic vector must be of
the same type, whereas the elements of a list
can have different types.
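The difference between the two kinds can be seen in a minimal sketch like the one below. The variable names v and l are illustrative and do not come from the slides.

```r
# An atomic vector holds a single type; mixed input is coerced to a common type.
v <- c(1, "a", TRUE)
class(v)              # "character"

# A list keeps each element's own type.
l <- list(1, "a", TRUE)
sapply(l, class)      # "numeric" "character" "logical"
```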
Atomic Vectors in R
[1] 6
r1 r2 r3 r4
 1  4  7 10
   c1 c2
r1  1  2
r2  4  5
r3  7  8
r4 10 11
   c1 c2
r1  4  6
r2  7  9
Matrix Arithmetic
• The matrices involved in the operation must have the
same dimensions (number of rows and columns). The
operations are applied element by element.
• Matrix1 <- matrix(c(10,20,30,40,50,60), nrow=2)
• Matrix2 <- matrix(c(1,2,3,4,5,6), nrow=2)
• Sum <- Matrix1 + Matrix2
• Difference <- Matrix1 - Matrix2
• Product <- Matrix1 * Matrix2
• Quotient <- Matrix1 / Matrix2
     [,1] [,2] [,3]
[1,]   10   20   30
[2,]   40   50   60
[1,]    1    5    9
[1] 4
[1] 24
[1] 3 6 9 12
[1] 22 23 24
Array Element Manipulation
• We can do calculations across the elements in an
array using the apply() function.
• Syntax- apply(X, MARGIN, FUN)
• X is an array, MARGIN gives the dimension to apply over
(1 for rows, 2 for columns), and FUN is the function to be applied.
• V1 <- c(1,2,3)
• V2 <- c(10,20,30,40,50,60)
• A<- array(c(V1,V2), dim=c(3,3,2))
• B <- apply(A, c(1), sum)
• C <- apply(A, c(2), sum)
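A runnable sketch of the same calculation is given below, with the results written as comments. The names row_sums and col_sums are illustrative choices, not from the slides.

```r
V1 <- c(1, 2, 3)
V2 <- c(10, 20, 30, 40, 50, 60)
# The 9 input values are recycled to fill the 3 x 3 x 2 array,
# so both layers hold the same 3 x 3 matrix.
A <- array(c(V1, V2), dim = c(3, 3, 2))

row_sums <- apply(A, 1, sum)   # margin 1: one sum per row, over columns and layers
col_sums <- apply(A, 2, sum)   # margin 2: one sum per column, over rows and layers
print(row_sums)   # 102 144 186
print(col_sums)   # 12 120 300
```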
Array Arithmetic
• To perform arithmetic on the layers of a
multi-dimensional array, we first extract each layer
as a two-dimensional matrix.
• V1 <- c(1,2,3)
• V2 <- c(10,20,30,40,50,60)
• A<- array(c(V1,V2), dim=c(3,3,2))
• mat.a <- A[ , , 1]
• mat.b <- A[ , ,2]
• mat.a + mat.b
• mat.a - mat.b
• mat.a * mat.b
• mat.a / mat.b
Factors
• A factor is a data structure used for fields that take
only a predefined, finite number of values, that is,
categorical data.
• Factors categorize the data and store it
as levels.
• They can store both strings and integers.
• For example, a data field such as marital status may
contain only the values single, married,
separated, divorced and widowed. In such a case,
the possible values are predefined and distinct,
and are called levels.
Creating factors
• Factors are created with the factor() function,
which takes a vector as input.
• A factor contains a predefined set of values called
levels. By default, R sorts the levels in
alphabetical order.
• directions <- c("North", "North", "West", "South")
• factor(directions)
• o/p= Levels: North South West
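The sketch below shows how the levels and the internal integer codes of such a factor can be inspected. The second factor, f2, uses an explicit level order that is an assumption made only for illustration.

```r
directions <- c("North", "North", "West", "South")
f <- factor(directions)
levels(f)        # "North" "South" "West" (alphabetical by default)
as.integer(f)    # 1 1 3 2 (internal integer codes)
table(f)         # counts per level

# Levels can also be given explicitly to control their order.
f2 <- factor(directions, levels = c("North", "East", "South", "West"))
levels(f2)
```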
Accessing Factor
• There are various ways to access the elements
of a factor in R. Some of the ways are as
follows:
• data <- factor(c("East", "West", "East", "North"))
• data[4]
• data[c(2,3)]
• data[-1]
• data[c(TRUE, FALSE, TRUE, TRUE)]
Modifying Factor
• When modifying a factor, we can only assign
values that belong to its predefined
levels.
• print(data)
• data[2] <- "North"
• data[3] <- "South" # "South" is not a level, so R stores NA and gives a warning
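If a value outside the current levels is needed, one possible approach is to extend the levels first. This is a sketch under that assumption, not necessarily the method the slides intend.

```r
data <- factor(c("East", "West", "East", "North"))
data[2] <- "North"            # allowed: "North" is an existing level

# "South" is not a level yet; assigning it directly would give NA with a warning.
levels(data) <- c(levels(data), "South")   # add the new level first
data[3] <- "South"
print(data)
```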
Data Frames
• names(X)
• nrow(X)
• ncol(X)
• str(X)
• summary(X)
Accessing Data Frame Components
• Components of data frame can be accessed like a
list or like a matrix.
(a) Accessing like a list – we can use either [[ or $
operator to access columns of data frame.
• Accessing with [[ and $ is similar.
• X <- data.frame("roll"=1:2, "name"=c("jack","jill"), "age"=c(20,22))
• X$name
• X[["name"]]
• X[[3]] # retrieves the third column as a vector
(b) Accessing like a Matrix – Data frame can be
accessed like a matrix by providing index for
row and column.
• We can use [] for indexing. With a single index, as in
X["name"], this returns a data frame, whereas [[ and $
reduce the result to a vector.
• We can use the head() function to display the first
n rows.
• Negative indices are also allowed in data frames;
they exclude the given rows or columns.
• X <- data.frame("roll"=1:3, "name"=c("jack","jill","Tom"), "age"=c(20,22,23))
• X["name"]
• X[1:2,]
• X[, 2:3]
• X[c(1,2),c(2,3)]
• X[,-1]
• X[-1,]
• X[X$age>21,]
• head(X,2)
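The difference between the access styles can be checked with class(), as in the sketch below. The stringsAsFactors = FALSE argument and the name older are additions made here so that the result is the same across R versions.

```r
X <- data.frame(roll = 1:3, name = c("jack", "jill", "Tom"),
                age = c(20, 22, 23), stringsAsFactors = FALSE)
class(X["name"])      # "data.frame": single brackets keep the data frame
class(X[["name"]])    # "character": double brackets return the column vector

older <- X[X$age > 21, ]   # rows where age exceeds 21
print(older)

X[, 2]                 # one column selected matrix-style is reduced to a vector
X[, 2, drop = FALSE]   # drop = FALSE keeps it as a data frame
```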
Modifying Data Frames
• Data frames can be modified through reassignment,
just like matrices.
• X <- data.frame("roll"=1:3, "name"=c("jack","jill","Tom"), "age"=c(20,22,23))
• X[1,"age"] <- 25
• A data frame can be expanded by adding columns and rows.
• We can add the column vector using a new column name.
• Columns can also be added using the cbind() function.
• Similarly rows can be added using the rbind() function.
• Data frame columns can be deleted by assigning NULL to them.
• Similarly, rows can be deleted through reassignment.
• print(X$bloodgroup <- c("A+","B-","AB+"))
#sort by name
• newdata <- X[order(X$name),]
• print(df1)
• print(df2)
• # inner join
• merge(df1,df2, by= "CustomerId")
• # outer join
• merge(x=df1,y=df2, by= "CustomerId",all=TRUE)
• #cross join
• merge(x=df1,y=df2, by= NULL)
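The slides do not show how df1 and df2 are built, so the sketch below uses two hypothetical tables. Only the key column name CustomerId comes from the calls above; the other columns and values are made up for illustration.

```r
# Hypothetical tables sharing the key column CustomerId.
df1 <- data.frame(CustomerId = c(1, 2, 3), Product = c("TV", "Radio", "Phone"))
df2 <- data.frame(CustomerId = c(2, 3, 4), State = c("Ohio", "Texas", "Utah"))

inner <- merge(df1, df2, by = "CustomerId")                # ids 2 and 3 only
outer <- merge(df1, df2, by = "CustomerId", all = TRUE)    # ids 1 to 4, NA where missing
left  <- merge(df1, df2, by = "CustomerId", all.x = TRUE)  # every row of df1
cross <- merge(df1, df2, by = NULL)                        # 3 x 3 = 9 combinations
print(outer)
```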
Reshaping Data
• R provides a variety of methods for reshaping
data prior to analysis.
• Two important functions for reshaping data are
the melt() and cast() functions.
• These functions are available in the reshape package.
• Before using these functions, make sure that the
package is installed on your system.
• We can “melt” the data so that each row is a
unique id-variable combination. Then we can
“cast” the melted data into any shape we would
like.
• y <- data.frame("id"=c(1,2,1,2,1), "age"=c(20,20,21,21,19),
"marks1"=c(80,60,70,80,90),"marks2"=c(100,98,99,75,80))
• print(y)
• #melting data
• mdata= melt(y, id=c("id","age"))
• newdata <- subset(X, age>=25 & age<30, select=c(roll,name,age))
• print(newdata)
• newdata <- subset(X, name=="smith" | name=="john", select=roll:age)
• print(newdata)
Data Type Conversion
• We can convert one data type to another data
type as in any programming language.
• We can convert any basic data type to
numeric using the function as.numeric().
• Similarly as.integer() converts to integer,
as.character() converts to character,
as.logical() converts to logical and
as.complex() converts to complex data types.
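A few conversions are sketched below with their results as comments. The input values are illustrative, and the name bad is a hypothetical variable used to show a failed conversion.

```r
as.numeric("3.14")      # 3.14
as.integer(3.9)         # 3 (the fractional part is dropped, not rounded)
as.character(42)        # "42"
as.logical(c(0, 1, 5))  # FALSE TRUE TRUE
as.complex(2)           # 2+0i

# Text that is not a number converts to NA (normally with a warning).
bad <- suppressWarnings(as.numeric("abc"))
is.na(bad)              # TRUE
```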
Unit 3
Conditions and loops
• Decision making structures are used by the
programmer to specify one or more
conditions to be evaluated or tested by the
program.
• One or more statements are
executed if the condition is TRUE and,
optionally, other statements are executed if
the condition is FALSE.
Decision Making
• R provides the following types of decision
making statements which includes if
statement, if..else statement, nested if…else
statement, ifelse() function and switch
statement.
if Statement
• An if statement consists of a boolean
expression followed by one or more
statements. The syntax is-
• if (boolean_expression)
{
# statements execute if the boolean
# expression is true
}
• If the boolean_expression evaluates to TRUE,
then the block of code inside the if statement
will be executed.
• If boolean_expression evaluates to FALSE,
then the first statement after the end of the if
statement will be executed.
• Here the boolean expression must be a single logical or
numeric value. Older versions of R used only the first
element of a longer vector, with a warning; since R 4.2.0,
a condition of length greater than one is an error.
• In the case of a numeric value, zero is taken as
FALSE and any other number as TRUE.
• x <- 10
if (x > 0)
{
cat(x, "is a positive number\n")
}
if….else Statement
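• An if statement can be followed by an optional else
statement, which executes when the boolean expression is
FALSE.
• The syntax of if…else is-
if (boolean_expression)
{
# executes if the expression is true
} else {
# executes if the expression is false
}
• The else keyword must be on the same line as the closing
brace of the if block; otherwise R treats the if statement
as complete when the code is run at the top level.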
• If the boolean_expression evaluates to be
TRUE, then if block of code will be executed,
otherwise else block of code will be executed.
• x <- -5
if (x > 0) {
cat(x, "is a positive number\n")
} else {
cat(x, "is a negative number\n")
}
• We can write the if…else statement on a single
line if the if and else blocks each contain only one
statement, as follows.
• if (x > 0) cat(x, "is a positive no\n") else cat(x, "is a negative no\n")
Nested if…else Statement
• An if statement can be followed by an optional
else if…else chain, which is useful for
testing several conditions in a single
statement.
• We can chain as many else if branches as we
want.
• Only one block will get executed,
depending on which boolean_expression is TRUE first.
• if (boolean_expression_1) {
# executes when expression 1 is true
} else if (boolean_expression_2) {
# executes when expression 2 is true
} else if (boolean_expression_3) {
# executes when expression 3 is true
} else {
# executes when none of the above conditions is true
}
• x <- 19
if (x < 0) {
cat(x, "is a negative number")
} else if (x > 0) {
cat(x, "is a positive number")
} else {
print("zero")
}
ifelse() function
• Most functions in R take a vector as input and
output a resultant vector.
• This vectorization of code is much faster
than applying the same function to each element
of the vector individually.
• There is an easier way to use the if…else statement
specifically for vectors in R.
• We can use the ifelse() function instead, which is the
vector equivalent of the if…else statement.
• ifelse(boolean_expression, x, y)
• Here, boolean_expression must be a logical
vector.
• The return value is a vector with the same length
as boolean_expression.
• This returned vector has an element from x if the
corresponding value of boolean_expression is
TRUE, or from y if the corresponding value of
boolean_expression is FALSE.
• For example, the ith element of result will be x[i],
if boolean_expression[i] is TRUE else it will take
the value of y[i].
• The vectors x and y are recycled whenever
necessary.
• a = c(5,7,2,9)
ifelse(a %% 2 == 0, "even", "odd")
• o/p = ?
• In the above example, the boolean_expression
is a %% 2 == 0, which results in the
vector (FALSE, FALSE, TRUE, FALSE).
• Similarly, the other two vectors in the function
arguments get recycled to ("even", "even",
"even", "even") and ("odd", "odd", "odd",
"odd") respectively.
• Hence the result is evaluated accordingly.
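The same mechanism can be sketched with a different, hypothetical input: a vector of scores and a pass mark of 40, both assumptions made only for this example.

```r
scores <- c(35, 80, 50, 20)
# The condition is evaluated element by element; "pass" and "fail" are recycled.
result <- ifelse(scores >= 40, "pass", "fail")
print(result)    # "fail" "pass" "pass" "fail"

# The second and third arguments may also be full-length vectors.
b <- c(-1, 4, -3)
ifelse(b < 0, -b, b)   # 1 4 3
```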
switch Statement
• A switch statement allows a variable to be tested
for equality against a list of values.
• Each value is called a case, and the variable being
switched on is checked for each case.
• switch(expression, case1, case2, case3, ...)
• If the value of expression is not a character string,
it is coerced to integer.
• We can have any number of cases within a
switch.
• Each case is written as a value or as a
name = value pair, and the cases are separated by commas.
• If the value of the integer is between 1 and
nargs()-1 (the number of cases), then the
corresponding case is
evaluated and the result is returned.
• If expression evaluates to a character string,
then the string is matched (exactly) to the
names of the elements.
• If there is more than one match, the first
matching element is returned.
• For a numeric expression, no default is available. For a
character expression, an unnamed extra argument is used as
the default when no name matches.
• switch(2, "red", "green", "blue")
• switch("color", "color" = "red", "shape" = "square", "length" = 5)
• Output- [1] "green"
[1] "red"
• If the value evaluated is a number, that item of the list
is returned.
• In the above example, "red", "green", "blue" form a
three-item list. The switch() function returns the
item corresponding to the numeric value evaluated.
• In the above example, green is returned.
• The expression can be a string as well.
• In this case, the matching named item's value is
returned.
• In the above example, "color" is the string that is
matched and its value "red" is returned.
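A slightly larger sketch of switch() with a character selector is given below. The helper function sides() and its shape names are hypothetical; the example also shows that an unnamed final argument serves as the fallback for character input, while a numeric selector has no fallback.

```r
# Hypothetical helper: maps a shape name to its number of sides.
sides <- function(shape) {
  switch(shape,
         triangle = 3,
         square = ,        # an empty case falls through to the next value
         rectangle = 4,
         NA)               # unnamed last argument: used when no name matches
}
sides("square")     # 4
sides("circle")     # NA

# Numeric selector: an out-of-range index returns NULL.
is.null(switch(4, "red", "green", "blue"))   # TRUE
```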
Loops
• In general, statements are executed
sequentially.
• Loops are used in programming to repeat a
specific block of code.
• R provides various looping structures like for
loop, while loop and repeat loop.
for loop
• A for loop is a repetition control structure that allows us
to efficiently write a loop that needs to execute a
specific number of times.
• A for loop is used to iterate over a vector in R
programming.
for ( value in sequence)
{
statements
}
• Here sequence is a vector and value takes on each of
its values in turn during the loop.
• In each iteration, statements are evaluated.
• X <- c(2,5,3,9,8,11,6)
count <- 0
for(val in X)
{
if (val %% 2 == 0)
count = count+1
}
cat("no of even numbers in", X, "is", count, "\n")
• o/p = ?
• The for loop in R is flexible in that it is not
limited to integers in the input.
• We can pass a character vector, logical vector,
list or expression.
• Ex-
• V <- c("a", "e", "i", "o", "u")
for (vowel in V)
{
print(vowel)
}
• o/p- ?
while loop
• A while loop is used to loop until a specific condition is
met.
• Syntax-
while ( test_expression)
{ statement
}
• Here, test expression is evaluated and the body of the
loop is entered if the result is TRUE.
• The statements inside the loop are executed and the
flow returns to evaluate the test_expression again.
• This is repeated each time until test_expression
evaluates to FALSE, in which case the loop exits.
num = 5
sum = 0
while (num > 0)
{ sum = sum + num
num = num - 1
}
cat("the sum is", sum, "\n")
repeat loop
• A repeat loop is used to iterate over a block of
code multiple times.
• There is no condition check in repeat loop to
exit the loop. We must ourselves put a
condition explicitly inside the body of the loop
and use the break statement to exit the loop.
• Otherwise it will result in an infinite loop.
repeat {
statements
if (condition)
{
break
}
}
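A concrete instance of this pattern might look like the sketch below. The counter x and the limit of 5 are illustrative choices.

```r
x <- 1
repeat {
  print(x)
  x <- x + 1
  if (x > 5) {   # the exit condition is checked inside the body
    break
  }
}
# prints 1 to 5; x is 6 when the loop ends
```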
Loop Control Statements
• Loop control statements are also known as
jump statements.
• Loop control statements change execution
from its normal sequence.
• When execution leaves a scope, all automatic
objects that were created in that scope are
destroyed.
• The loop control statements in R are break
statement and next statement.
break statement
• A break statement is used inside a loop
(repeat, for, while) to stop the iterations and
flow the control outside of the loop.
• In a nested looping situation, where there is a
loop inside another loop, this statement exits
from the innermost loop that is being
evaluated.
• x<- 1:10
for( val in x) {
if (val == 3) {
break
}
print(val)
}
• o/p = ?
• In the above example, we iterate over the vector
x, which has consecutive numbers from 1 to 10.
• Inside the for loop we have used an if condition
to break if the current value is equal to 3.
next statement
• A next statement is useful when we want to
skip the current iteration of a loop without
terminating it.
• On encountering next, R skips
further evaluation of the current iteration and starts
the next iteration of the loop.
• This is equivalent to the continue statement in
C, Java and Python.
• X <- 1:10
for( val in X) {
if ( val == 3) {
next
}
print( val)
}
• We use the next statement inside a condition to
check if the value is equal to 3.
• If the value is equal to 3, the current evaluation
stops( value is not printed) but the loop continues
with the next iteration.
Functions
• Functions are used to logically break our code
into simpler parts that are easy to
maintain and understand.
• A function is a set of statements organized
together to perform a specific task.
• R has a large number of built-in functions, and
users can create their own functions.
• A function is an object, with or without
arguments.
Function Definition
• The reserved word function is used to declare a
function in R.
• func_name <- function(argument)
{
statements
}
• This function object is given a name by assigning it to a
variable, func_name.
• The statements within the curly braces form the body
of the function. These braces are optional if the body
contains only a single expression.
• Following are the components of a function in R-
1. Function Name – This is the actual name of the
function. It is stored in R environment as an
object with this name.
2. Arguments – When a function is invoked, we can
pass values to the arguments. Arguments are
optional. A function may or may not contain
arguments. The arguments can also have default
values.
3. Function Body – The function body contains a
collection of statements that defines what the
function does.
Function Calling
• We can create user-defined functions in R. They are
specific to what a user wants and, once created, they can
be used like built-in functions.
• power <- function(x,y)
{
result <- x^y
cat(x, "raised to the power", y, "is", result, "\n")
}
• power(2,3)
• Here, the arguments used in the function declaration, x
and y, are called formal arguments, and those used while
calling the function are called actual arguments.
Function without Arguments
• It is possible to create a function in R without
arguments.
• square <- function()
{
for( i in 1:5)
cat("square of", i, "is", (i*i), "\n")
}
• square()
Function with named Arguments
• When calling a function with named arguments, the order of the
actual arguments does not matter, so we can pass the
arguments in a shuffled order.
• For example, all the function calls given below are
equivalent.
• power <- function(x,y)
{
result <- x^y
cat(x, "raised to the power", y, "is", result, "\n")
}
• power(2,3)
• power(x=2,y=3)
• power(y=3,x=2)
• Further we can use named and unnamed
arguments in a single function call.
• In such a case, all the named arguments are
matched first and then the remaining
unnamed arguments are matched in
positional order.
• power( x=2,3)
• power(2, y=3)
Function with default Arguments
• We can assign default values to arguments in a
function in R. This is done by providing an
appropriate value to the formal argument in the
function declaration.
• The function named power is defined with a
default value for y in the following example
program. If no value is passed for y, then the
default value is taken.
• If a value is passed for y, then the default value
will be overridden.
• power <- function(x,y=2)
{
result <- x^y
cat(x, "raised to the power", y, "is", result, "\n")
}
• power(2)
• power(2,3)
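The examples above print their result with cat(). A function can also hand its result back to the caller, as in the sketch below; power_value() is a hypothetical variant introduced here and does not appear in the slides.

```r
# Hypothetical variant of power() that returns its result instead of printing it.
power_value <- function(x, y = 2) {
  x^y   # the value of the last expression is returned
}

sq <- power_value(3)                # y takes its default value 2
cube <- power_value(y = 3, x = 2)   # named arguments in shuffled order
print(c(sq, cube))                  # 9 8
```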
Built-in Functions
• There are several built-in functions available in
R. These functions can be used directly in
user-written programs.
• The built-in functions can be grouped into
mathematical functions, character functions,
statistical functions, probability functions,
date functions, time functions and other
useful functions.
Mathematical functions
1. abs()- this function computes the absolute value
of numeric data.
• The syntax is abs(x), where x is any numeric
value, array or vector.
• abs(-1)
• x <- c( -2,4,0,45,9,-4)
• abs(x)
• x <- matrix (c( -3,5,-7,1,-9,4), nrow= 3, ncol=2,
byrow=TRUE)
• abs(x[1, ])
• abs (x[, 1])
2. sin(), cos() and tan()- the function sin()
computes the sine value, cos() computes the
cosine value and tan() computes the tangent
value of numeric data given in radians.
• Syntax is sin(x), cos(x), tan(x), where x is any
numeric value, array or vector.
• sin(10) , cos(90) , tan(50)
• x <- c( -2,4,0,45,9,-4)
• sin(x) , cos(x) , tan(x)
• x <- matrix (c( -3,5,-7,1,-9,4), nrow= 3, ncol=2,
byrow=TRUE)
• sin(x[1, ]) ,cos(x[,1 ]), tan(x[1,])
3. asin(), acos() and atan() – asin() computes the
inverse sine value, acos() computes the inverse cosine
value and atan() computes the inverse tangent value
of numeric data; the results are in radians.
• asin(1), acos(1), atan(50)
4. exp(x) – this function computes the exponential
value of a number or numeric vector, e^x.
• x=5 , exp(x)
5. ceiling()- This function returns the smallest integer
not less than the given number.
• x <- 2.5
• ceiling(x)
• 3
6. floor()- This function returns the largest integer
not greater than the given number.
• x <- 2.5
• floor(x)
7. round()- This function rounds a number
to the given number of decimal places.
• The syntax is round(x, digits=n), where x is a
numeric variable or a vector and digits specifies
the number of decimal places to round to.
• x <- 2.587888
• round(x,3)
8. trunc()- This function returns the integer part
of a number by discarding the decimal part.
• x <- 2.99
• trunc(x)
9. signif(x, digits=n)- This function rounds the
values in its first argument to the specified
number of significant digits.
• x <- 2.587888
• signif(x,3)
• 2.59
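The five rounding functions differ mainly in how they treat negative numbers and significant digits. The sketch below compares them on a few illustrative values chosen for this example.

```r
x <- c(2.5, -2.5, 2.99, -2.99)
ceiling(x)   #  3 -2  3 -2
floor(x)     #  2 -3  2 -3
trunc(x)     #  2 -2  2 -2 (always toward zero)

round(2.587888, 3)    # 2.588 (three decimal places)
signif(2.587888, 3)   # 2.59  (three significant digits)
signif(123456, 2)     # 120000
```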
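10. log(), log10(), log2(), log(x,b)- The log() function
computes natural logarithms of a number or vector; log10() and
log2() compute base-10 and base-2 logarithms, and log(x,b)
computes the logarithm to base b.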
11. max() and min() – max() function computes the
max value of a vector and min() function
computes the minimum value of a vector.
• x <- c(10, 289, -100, 8000)
• max(x) , min(x)
12. beta() and lbeta()- beta() returns the value of the
beta function and lbeta() returns the natural logarithm of
the beta function.
• beta(4,9)
• lbeta(4,9)
o/p - 0.0005050, -7.590852
13. gamma()- this function returns the value of the gamma
function Γ(x).
• x=5
• gamma(x)
• o/p – 24
14. factorial ()- this function computes factorial
of a number or a numeric vector.
• x=5
• factorial(x)
Character Function
• These functions are used for string handling
operations like extracting characters from a
string, extracting substrings from a string,
concatenation of strings, matching strings,
inserting strings, converting strings from one
case to another and so on.
1. agrep()- this function searches for
approximate matches to pattern within each
element of the character vector x.
• agrep(pattern, x, ignore.case=FALSE, value=FALSE,
max.distance=0.1, useBytes=FALSE)
• x <- c("R language", "and", "SAND")
• agrep("an", x)
• agrep("an", x, ignore.case=TRUE)
• agrep("uag", x, ignore.case=TRUE, max=1)
• agrep("uag", x, ignore.case=TRUE, max=2)
• [1] 1 2
• [1] 1 2 3
• [1] 1
• [1] 1 2 3
2. char.expand()- This function seeks a unique
match of its first argument among the elements
of its second.
• If successful, it returns this element; otherwise, it
performs the action specified by the third
argument. The syntax is as follows-
char.expand(input, target, nomatch = stop("no match"))
• Here input is the character string to be
expanded, target is the character vector with the
values to be matched against, and nomatch is an R
expression to be evaluated in case expansion was
not possible, for example a call to warning() that
prints a warning message.
• Matching is done only at the beginning of each target string.
• x <- c("sand", "and", "land")
• char.expand("an", x, warning("no expand"))
• char.expand("a", x, warning("no expand"))
3. charmatch()- This function finds partial matches
of its first argument among the elements of its second
and returns the index position.
• charmatch(x, table, nomatch = NA_integer_)
• Here x gives the value to be matched, table
gives the values to be matched against and
nomatch gives the value to be returned at
non-matching positions.
• charmatch("an", c("and", "sand"))
• charmatch("an", "sand")
• [1] 1
• [1] NA
4. charToRaw() – This function converts a character
string to "raw" objects, shown as hexadecimal ASCII codes.
• x <- charToRaw("a")
• Y <- charToRaw("AB")
• [1] 61
• [1] 41 42
5. chartr() – this function is used for character
substitutions.
• chartr(old, new, x)
• x <- "apples are red"
• chartr("a", "g", x)
6. dQuote()- this function is used for putting double
quotes on a text.
• x <- '2013-06-12'
• dQuote(x)
7. format()- numbers and strings can be formatted
to a specific style using format() function.
• Ex- format(x, digits, nsmall, scientific, width,
justify = c("left", "right", "centre", "none"))
8. gsub()- this function replaces all matches of a
pattern in a string. If x is a character vector, it returns
a character vector of the same length and with the
same attributes.
• gsub(pattern, replacement, x, ignore.case=FALSE)
Ex- x <- "apples are red"
gsub("are", "were", x)
o/p- "apples were red"
9. nchar() & nzchar()- nchar() determines
the size of each element of a character
vector. nzchar() tests whether elements of a
character vector are non-empty strings.
Syn- nchar(x, type="chars", allowNA=FALSE)
Syn- nzchar(x)
10. noquote()- This function prints out strings
without quotes. The syntax is noquote(x),
where x is a character vector.
Ex- letters
noquote(letters)
11. paste()- Strings in R are combined using the
paste() function. It can take any number of string
arguments to be combined together.
Syn- paste(..., sep = " ", collapse = NULL)
• Here ... represents any number of arguments
to be combined and sep represents the separator
placed between the arguments. It is optional.
• collapse is an optional string used to join the
elements of the resulting vector into a single
string.
• Ex- a <- "hello"
• b <- "everyone"
• print(paste(a, b))
• print(paste(a, b, sep = "-"))
• print(paste(a, b, sep = "", collapse = ""))
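The roles of sep and collapse are easiest to tell apart when one of the arguments is a vector. The sketch below uses illustrative names (ids, joined) that are not part of the slides.

```r
a <- "hello"
b <- "everyone"
paste(a, b)                 # "hello everyone"
paste(a, b, sep = "-")      # "hello-everyone"
paste0(a, b)                # "helloeveryone" (same as sep = "")

# sep joins the arguments element by element, giving a vector of strings.
ids <- paste("id", 1:3, sep = "_")      # "id_1" "id_2" "id_3"
# collapse then joins that vector into one string.
joined <- paste(ids, collapse = ", ")   # "id_1, id_2, id_3"
```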
12. replace()- This function replaces the values in x
at the indices given in list with those given in values.
If necessary, the values in 'values' are recycled.
Syn- replace(x, list, values)
Ex- x <- c("green", "red", "yellow")
y <- replace(x, 1, "black")
13. sQuote()- This function is used for putting single
quotes on a text.
X <- "2013-06-12 19:18:05"
sQuote(X)
14. strsplit()- This function splits the elements of a
character vector x into substrings at
the matches of the pattern split within them.
Syn- strsplit(x, split)
15. substr()- This function extracts or replaces
substrings in a character vector.
Syn- substr(x, start, stop)
substr(x, start, stop) <- value
Ex- substr("programming", 2, 3)
x = c("red", "blue", "green", "yellow")
substr(x, 2, 3) <- "gh"
16. tolower() – This function converts a string to
lower case.
Ex- tolower("R Programming")
17. toString() – This function produces a single
character string describing an R object.
Syn- toString(x)
toString(x, width = NULL)
18. toupper() – This function converts a string to
upper case.
Ex- toupper("r programming")
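Several of the character functions above can be tried together on one string, as in the sketch below. The variable names words and colours are illustrative.

```r
x <- "apples are red"
gsub("are", "were", x)          # "apples were red"
chartr("a", "g", x)             # "gpples gre red"
substr("programming", 2, 3)     # "ro"

words <- strsplit(x, " ")[[1]]  # "apples" "are" "red"
nchar(words)                    # 6 3 3
toupper(words[1])               # "APPLES"

# Replacement form of substr(): characters 2 to 3 of each element become "gh".
colours <- c("red", "blue", "green", "yellow")
substr(colours, 2, 3) <- "gh"
print(colours)                  # "rgh" "bghe" "gghen" "yghlow"
```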
Statistical Function
1. mean()- The function mean() is used to
calculate average or mean in R.
Syn- mean(x, trim= 0, na.rm = FALSE)
trim is the fraction of observations dropped from
each end of the sorted vector before the mean is
computed, and na.rm is used to remove the missing
values from the input vector.
2. median()- The middle-most value in a data
series is called the median. The median() function
is used in R to calculate this value.
Syn- median(x, na.rm = FALSE)
3. var()- returns the estimated variance of the
population from which the numbers in vector x are
sampled.
Ex- x <- c(10,2,30,2,5,8)
var(x, na.rm = TRUE)
4. sd()- returns the estimated standard deviation of
the population from which the numbers in vector x are
sampled.
Syn- sd(x, na.rm = TRUE)
5. scale()- returns the standard scores (z-scores) for
the numbers in x. It is used to standardize the
columns of a matrix.
Ex- x <- matrix(1:9, 3, 3)
scale(x)
6. sum()- adds up all elements of a vector.
Syn- sum(x)
Ex- sum(c(1:10))
7. diff(x,lag=1)- returns suitably lagged and iterated
differences.
Syn- diff(x, lag, differences)
Where x is a numeric vector or matrix containing the
values to be differenced, lag is an integer indicating
which lag to use and differences is an integer indicating
the order of the difference.
• For ex., if lag=2, the differences between the third and first
values, between the fourth and second values, and so on are
calculated.
• The argument differences gives the order of differencing;
differences=2 returns the differences of differences.
8. range()- returns a vector of the minimum and
maximum values.
Syn- x<- c(10,2,14,67,86,54)
range(x)
o/p- 2 86
9. rank()- This function returns the rank of the
numbers( in increasing order) in vector x.
Syn- rank(x, na.last = TRUE)
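10. skewness()- measures how asymmetric the distribution of
the values in x is; a normal distribution has skewness 0. This
function is not part of base R; it is provided by add-on
packages such as e1071 and moments.
Syn- skewness(x)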
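The base-R functions from this list can be compared on one small data set, as sketched below. The vector y is an illustrative addition used to show the lag and differences arguments of diff().

```r
x <- c(10, 2, 30, 2, 5, 8)
mean(x)               # 9.5
mean(x, trim = 0.2)   # 6.25: one value is dropped at each end of the sorted data
median(x)             # 6.5
range(x)              # 2 30
rank(x)               # 5.0 1.5 6.0 1.5 3.0 4.0 (ties share the average rank)
all.equal(sd(x), sqrt(var(x)))   # TRUE: sd is the square root of var

y <- c(1, 4, 9, 16)
diff(y)                   # 3 5 7
diff(y, lag = 2)          # 8 12
diff(y, differences = 2)  # 2 2

mean(c(1, NA, 3), na.rm = TRUE)  # 2: the missing value is ignored
```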
Date and Time Functions
• R provides several options for dealing with dates and
date/times.
• Three date/time classes commonly used in R are Date,
POSIXct and POSIXlt.
1. Date – The date() function returns the current system date and
time as a character string. Sys.Date() returns the system's
current date and Sys.time() returns the system's current date and time.
Syn- date()
Sys.Date()
Sys.time()
• We can create a date as follows-
• Dt <- as.Date("2012-07-22")
• When creating a date from a string in a non-standard format,
the format must be specified.
• Dt2 <- as.Date("04/20/2011", format = "%m/%d/%Y")
• Dt3 <- as.Date("October 6, 2010", format = "%B %d, %Y")
2. POSIXct- If we have times in our data, this is
usually the best class to use. In POSIXct, "ct"
stands for calendar time.
• We can create some POSIXct objects as follows.
Tm1 <- as.POSIXct("2013-07-24 23:55:26")
o/p – "2013-07-24 23:55:26 PDT" (the time zone shown depends on the system's settings)
Tm2 <- as.POSIXct("25072012 08:32:07", format = "%d%m%Y %H:%M:%S")
• We can specify the time zone as follows.
Tm3 <- as.POSIXct("2010-12-01 11:42:03", tz = "GMT")
• Times can be compared as follows.
• Tm2> Tm1
• We can add or subtract seconds as follows.
• Tm1 +30
• Tm1- 30
• Tm2 - Tm1
3. POSIXlt- This class enables easy extraction of
specific components of a time. In POSIXlt, "lt"
stands for local time.
• "lt" also helps one remember that POSIXlt objects
are lists.
• Tm1.lt <- as.POSIXlt("2013-07-24 23:55:26")
• o/p- "2013-07-24 23:55:26"
• We can extract the components of a time as follows.
• unlist(Tm1.lt)
sec min hour mday mon year wday yday isdst
 26  55   23   24   6  113    3  204     1
• mday, wday and yday stand for day of the month, day of
the week and day of the year respectively. mon counts months
from 0 and year counts years from 1900.
• A particular component of a time can be extracted as
follows.
• Tm1.lt$sec
• We can truncate or round off the times as given below.
• trunc(Tm1.lt, "days") o/p - "2013-07-24"
• trunc(Tm1.lt, "mins") o/p – "2013-07-24 23:55:00"
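The three classes can be tried together in a short sketch like the one below. The time zone "UTC" and the variable names d1, d2, t1, t2 and lt are assumptions made here so that the results do not depend on the system's settings.

```r
# Date: calendar dates without a time of day.
d1 <- as.Date("2012-07-22")
d2 <- as.Date("04/20/2011", format = "%m/%d/%Y")
d1 - d2                     # time difference in days
format(d2, "%d-%m-%Y")      # "20-04-2011"

# POSIXct: a date-time stored as seconds since 1970-01-01.
t1 <- as.POSIXct("2013-07-24 23:55:26", tz = "UTC")
t2 <- t1 + 30               # 30 seconds later
difftime(t2, t1, units = "secs")

# POSIXlt: a list of components that can be read with $.
lt <- as.POSIXlt(t1)
lt$hour           # 23
lt$mday           # 24
lt$mon + 1        # 7 (mon counts from 0)
lt$year + 1900    # 2013 (year counts from 1900)
```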
Other Functions
1. rep(x, times) – This function repeats x the given
number of times.
Ex.- rep( 1:3,4)