Functions and Flow Control
Presidency University
November, 2024
User Defined Functions
[1] 10
Doing more than one computation
testfunction <- function(x, y)
{
  prod = x*y
  su = x+y
  return(c(prod, su))
}
testfunction(2,5)
[1] 10 7
result=testfunction(2,5)
result[1]
[1] 10
result[2]
[1] 7
• Alternatively, multiple outputs can be returned in a list(). This lets us extract them by name as well as by index.
testfunction <- function(x, y)
{
  prod = x*y
  su = x+y
  output = list(prod, su)               #--- creates the list
  names(output) = c("Product", "Sum")   #--- names them (optional)
  return(output)                        #--- returns the list
}
result=testfunction(2,5)
result
$Product
[1] 10
$Sum
[1] 7
[1] 10
[1] 7
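Here is a minimal sketch of the ways to pull components out of the returned list. It redefines testfunction() with the same behaviour, written more compactly, so the block runs on its own. Note that single brackets return a sub-list rather than the value itself.

```r
testfunction <- function(x, y) {
  output <- list(x * y, x + y)
  names(output) <- c("Product", "Sum")
  output
}
result <- testfunction(2, 5)

result$Product     # by name with $: 10
result[["Sum"]]    # by name with [[ ]]: 7
result[[1]]        # by index with [[ ]]: 10
result["Sum"]      # single brackets keep the list structure
```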
Default argument of a function
• R lets us set default values for arguments when the function is defined. A default is used whenever the function is called without a value for that argument. A value supplied in the call overrides it.
testfunction <- function(x=1, y=1)
{
  prod = x*y
  su = x+y
  output = list(prod, su)               #--- creates the list
  names(output) = c("Product", "Sum")   #--- names them (optional)
  return(output)                        #--- returns the list
}
testfunction() #--call with no argument
$Product
[1] 1
$Sum
[1] 2
testfunction(x=4)
$Product
[1] 4
$Sum
[1] 5
Additional Arguments
• Additional arguments (typically optional ones that cannot be fixed in advance) can be accepted through the special argument ..., which collects them so they can be passed on to other functions inside the body.
$Product
[1] 10
$Sum
[1] 7
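The code behind the output above is not shown. As a sketch of how ... is typically used, the hypothetical function my_summary() below forwards any extra arguments to mean() and sd(). The hypothetical count_extras() shows that list(...) captures them.

```r
# Hypothetical helper: extra arguments (e.g. na.rm = TRUE) go to mean() and sd()
my_summary <- function(v, ...) {
  c(mean = mean(v, ...), sd = sd(v, ...))
}

v <- c(2, 4, NA, 6)
my_summary(v)                 # both NA because of the missing value
my_summary(v, na.rm = TRUE)   # mean = 4, sd = 2

# list(...) captures the extra arguments if we want to inspect them
count_extras <- function(...) length(list(...))
count_extras(1, "a", TRUE)    # 3
```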
Data types of arguments
• Since argument types are not declared when a function is defined, the arguments can be of any data type, provided the code inside the function works for that type (e.g. numeric vectors, since * and + work element-wise).
$Product
[1] 3 8
$Sum
[1] 4 6
testfunction = function(x, y)
{
  #--- stop with an error if either argument is a character
  stopifnot(typeof(x) != "character", typeof(y) != "character")
  prod = x*y
  su = x+y
  output = list(prod, su)               #--- creates the list
  names(output) = c("Product", "Sum")   #--- names them (optional)
  return(output)
}
testfunction("F","M")
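The call testfunction("F","M") stops with an error, because typeof("F") is "character". Checking typeof() only rules out characters, so other non-numeric types such as logicals still get through. A minimal sketch of a stricter check follows, using is.numeric() and stop() with a custom message. The name safe_testfunction is hypothetical.

```r
safe_testfunction <- function(x, y) {
  # Require numeric input and give a readable error message otherwise
  if (!is.numeric(x) || !is.numeric(y)) {
    stop("x and y must be numeric")
  }
  list(Product = x * y, Sum = x + y)
}

safe_testfunction(c(1, 2), c(3, 4))   # Product 3 8, Sum 4 6

# tryCatch() lets us handle the error instead of halting
msg <- tryCatch(safe_testfunction("F", "M"),
                error = function(e) conditionMessage(e))
msg   # "x and y must be numeric"
```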
Example: Does the limit of sin(1/x) as x → 0 exist?
[Figure: plot of sin(1/x) against x for x between -2 and 2]
Example (Contd.): Zoom at the origin
[Figure: plot of sin(1/x) against x, zoomed in near x = 0]
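Here is a sketch of how such plots can be drawn with curve(), together with a numerical check. The two sequences x_plus and x_minus are our own illustration. Both tend to 0, but sin(1/x) equals 1 along the first and -1 along the second, so the limit cannot exist.

```r
f <- function(x) sin(1 / x)

# Full view; an even number of points keeps x = 0 (where 1/x is infinite) off the grid
curve(f, from = -2, to = 2, n = 2000, xlab = "x", ylab = "sin(1/x)")

# Zoom at the origin: the oscillations get faster as x approaches 0
curve(f, from = -0.1, to = 0.1, n = 5000, xlab = "x", ylab = "sin(1/x)")

# Two sequences tending to 0 along which f has different limits
k <- 1:5
x_plus  <- 1 / (pi / 2 + 2 * pi * k)       # f(x_plus) = 1 for every k
x_minus <- 1 / (3 * pi / 2 + 2 * pi * k)   # f(x_minus) = -1 for every k
f(x_plus)
f(x_minus)
```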
Applications: Solving Equations
• For equations in one variable we can use uniroot(f, interval, ...), which searches the given interval for a root of f(x) = 0.
• For solving e^x = sin(x), i.e. finding a root of e^x - sin(x), we write
$root
[1] -3.183063
$f.root
[1] -1.359327e-08
$iter
[1] 8
$init.it
[1] NA
$estim.prec
[1] 6.103516e-05
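The call that produced this output is not shown. Below is a minimal sketch, assuming the search interval c(-4, -3). That interval is our choice: the function changes sign there, which uniroot() requires. A tighter tol than the default is also our choice.

```r
g <- function(x) exp(x) - sin(x)   # root of g  <=>  exp(x) = sin(x)

# uniroot() needs g(lower) and g(upper) to have opposite signs
g(-4)   # negative
g(-3)   # positive

sol <- uniroot(g, interval = c(-4, -3), tol = 1e-10)
sol$root   # about -3.183063
sol$iter   # number of iterations used
```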
• To find the real or complex roots of a polynomial, use polyroot(). To solve a system of n nonlinear equations, we can use multiroot() from the package rootSolve.
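Here is a short sketch of polyroot(). Its argument is the vector of coefficients in increasing order of degree, and the roots come back as complex numbers even when they are real.

```r
# Roots of x^2 - 5x + 6 = (x - 2)(x - 3); coefficients 6, -5, 1
r <- polyroot(c(6, -5, 1))
r        # complex vector; imaginary parts are (numerically) zero
Re(r)    # 2 and 3

# Complex roots: x^2 + 1 = 0 has roots i and -i
polyroot(c(1, 0, 1))
```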
Applications: Calculus
• Definite integrals can be computed using integrate(). E.g. ∫₀¹ x dx can be computed using
integrate(function(x) x, 0, 1)
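Here is a sketch of working with the result of integrate(). The object prints the value together with an error estimate, and $value extracts the number. Infinite limits are allowed. The integrand must be vectorised: it receives a vector of x values and must return a vector of the same length. A constant function written as function(x) 1 therefore makes integrate() stop with an error.

```r
res <- integrate(function(x) x, 0, 1)
res          # the value 0.5 with an absolute error estimate
res$value    # 0.5

# Infinite limits: the standard normal density integrates to 1
integrate(dnorm, -Inf, Inf)$value

# A vectorised constant function: returns one value per x
integrate(function(x) rep(1, length(x)), 0, 3)$value   # 3
```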
Applications: Optimization
optimize(function(x) exp(-x),c(0,5))
$minimum
[1] 4.999936
$objective
[1] 0.006738379
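optimize() searches an interval for a minimum of a one-variable function, or a maximum with maximum = TRUE. Since exp(-x) is decreasing, its minimum on [0, 5] is at the boundary x = 5, which is why the reported minimum is just below 5. Here is a sketch with interior optima (the example functions are our own).

```r
# Interior minimum: (x - 2)^2 is smallest at x = 2
opt <- optimize(function(x) (x - 2)^2, interval = c(0, 5))
opt$minimum     # close to 2
opt$objective   # close to 0

# Maximum: x * exp(-x) is largest at x = 1, where it equals exp(-1)
mx <- optimize(function(x) x * exp(-x), interval = c(0, 5), maximum = TRUE)
mx$maximum      # close to 1
mx$objective    # close to 0.3679
```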
Loops in R
• Loops let us repeat a task. We start with the for loop.
• The syntax is
for (variable in sequence)
{
    expression to be evaluated
}
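Here is a short sketch of two common for-loop patterns. The first accumulates a running total. The second fills a preallocated result vector, looping over positions with seq_along().

```r
# Sum of squares 1^2 + 2^2 + ... + 10^2
total <- 0
for (i in 1:10) {
  total <- total + i^2
}
total     # 385

# Fill a preallocated vector, indexing by position
x <- c(3, 1, 4, 1, 5)
squares <- numeric(length(x))
for (i in seq_along(x)) {
  squares[i] <- x[i]^2
}
squares   # 9 1 16 1 25
```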
While Loop
• The syntax is
while (condition)
{
    expression to be evaluated
}
• The loop repeats its body as long as the condition is TRUE, i.e. until the condition is no longer satisfied.
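Here is a short sketch that finds the smallest n for which 2^n exceeds 1000. The condition is checked before every pass, so if it is FALSE at the start, the body never runs.

```r
n <- 0
while (2^n <= 1000) {
  n <- n + 1
}
n   # 10, since 2^9 = 512 and 2^10 = 1024
```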
If-Else function
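• The syntax is
new_variable = ifelse(condition, value if condition is TRUE, value if condition is FALSE)
• ifelse() is vectorized: the condition is checked element by element, and the result has the same length as the condition.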
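Here is a sketch with a made-up vector x, including a nested ifelse() that handles three cases element by element.

```r
x <- c(-2, 0, 3, 7)
sign_label <- ifelse(x > 0, "positive", "non-positive")
sign_label   # "non-positive" "non-positive" "positive" "positive"

# Nested ifelse() for three cases
three_way <- ifelse(x > 0, "positive", ifelse(x == 0, "zero", "negative"))
three_way    # "negative" "zero" "positive" "positive"
```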
Else if Ladder
• When we have more than two cases we can use an else-if ladder
f = function(x)
{
  if (x == 1) print("a")
  else if (x == 2) print("b")
  else print("c")
}
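Note that if() expects a single TRUE or FALSE. Since R 4.2.0, a condition of length greater than one is an error. Here is a sketch of a variant of f() that returns the label instead of printing it, applied to several values with sapply().

```r
f <- function(x) {
  if (x == 1) "a"
  else if (x == 2) "b"
  else "c"
}

f(2)             # "b"
sapply(1:4, f)   # "a" "b" "c" "c"
```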
Switch Statement
• E.g. switch(2, "A", "B", "C") returns "B". When the first argument is a number, switch() selects that item from the remaining arguments.
[1] 5.5
[1] 5
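When the first argument is a character string, switch() instead matches it against the names of the remaining arguments. An unnamed last argument acts as a default. Here is a sketch using a hypothetical function centre().

```r
centre <- function(x, type) {
  switch(type,
         mean    = mean(x),
         median  = median(x),
         trimmed = mean(x, trim = 0.2),
         stop("unknown type: ", type))   # unnamed last argument: the default
}

x <- c(1, 2, 3, 4, 100)
centre(x, "mean")      # 22
centre(x, "median")    # 3
centre(x, "trimmed")   # 3 (drops one value from each end)
switch(2, "A", "B", "C")   # "B"
```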
Repeat Loop
• Basic syntax is
repeat
{
    expression to be evaluated
}
• The loop keeps running until a break statement inside it is executed.
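Here is a sketch of a repeat loop with break. The example, our own choice, is Newton's iteration for the square root of 2. It stops when successive values agree to within 1e-10.

```r
x <- 1
iterations <- 0
repeat {
  x_new <- (x + 2 / x) / 2   # Newton step for x^2 = 2
  iterations <- iterations + 1
  if (abs(x_new - x) < 1e-10) break   # without a break, repeat never stops
  x <- x_new
}
x_new        # 1.414214
iterations
```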
[Figure: the data plotted against Population]
• Suppose we choose y0 = 6611. We want to fit the model
Y = y0 N^a + noise
$a
[1] 0.1258166
$iterations
[1] 58
$converged
[1] TRUE
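The fitting code and data behind this output are not shown. Here is a minimal sketch under stated assumptions:

- The data are hypothetical, generated exactly from the model with a = 0.125 and no noise.
- a is estimated by minimising the sum of squared errors with Newton's method, using the analytic first and second derivatives. This is a swapped-in technique and may differ from the one that produced the output above.

The function returns a list shaped like that output.

```r
# Hypothetical data generated exactly from Y = y0 * N^a with a = 0.125
y0 <- 6611
N <- seq(1e5, 1e7, length.out = 50)
Y <- y0 * N^0.125

# Newton's method for the a minimising sum((Y - y0 * N^a)^2)
fit.exponent <- function(Y, N, y0, a = 0.15, tol = 1e-10, max.iter = 100) {
  converged <- FALSE
  iterations <- 0
  while (!converged && iterations < max.iter) {
    iterations <- iterations + 1
    pred <- y0 * N^a
    resid <- Y - pred
    # first and second derivatives of the sum of squares with respect to a
    d1 <- -2 * sum(resid * pred * log(N))
    d2 <- 2 * sum((pred * log(N))^2 - resid * pred * log(N)^2)
    step <- d1 / d2
    a <- a - step
    converged <- abs(step) < tol
  }
  list(a = a, iterations = iterations, converged = converged)
}

fit <- fit.exponent(Y, N, y0)
fit   # a close to 0.125, converged TRUE
```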
What’s wrong with this?
Avoid using loops in R
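x[x>2]     # elements of x greater than 2, selected without a loop
sum(x*y)   # sum of element-wise products (inner product) of x and y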
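The loop code the slide questions is not shown. As a sketch with made-up vectors x and y, here are loop versions of the two expressions above next to the vectorised ones. The vectorised forms are shorter, clearer and usually much faster.

```r
x <- c(1, 5, 2, 8, 3)
y <- c(2, 1, 0, 1, 4)

# Loop version of x[x > 2]: grow a result vector element by element
big <- c()
for (v in x) {
  if (v > 2) big <- c(big, v)
}

# Loop version of sum(x * y)
s <- 0
for (i in seq_along(x)) {
  s <- s + x[i] * y[i]
}

# Vectorised versions
x[x > 2]     # 5 8 3
sum(x * y)   # 2 + 5 + 0 + 8 + 12 = 27
```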
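mydata = na.omit(airquality)       # drop rows with missing values
apply(mydata, MARGIN=2, FUN=min)   # minimum of each column
trimmed_mean = function(v) {
  # mean of the values lying between the 10% and 90% quantiles
  q1 = quantile(v, probs=0.1)
  q2 = quantile(v, probs=0.9)
  return(mean(v[q1 <= v & v <= q2]))
}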
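Here is a sketch of using trimmed_mean() with apply(). A small made-up matrix shows the effect of an outlier, and the function is then applied to the airquality columns. The function is repeated so the block runs on its own.

```r
trimmed_mean <- function(v) {
  q1 <- quantile(v, probs = 0.1)
  q2 <- quantile(v, probs = 0.9)
  mean(v[q1 <= v & v <= q2])
}

# Toy matrix: column b has one large outlier
m <- cbind(a = 1:10, b = c(1:9, 100))
apply(m, MARGIN = 2, FUN = mean)           # a 5.5, b 14.5
apply(m, MARGIN = 2, FUN = trimmed_mean)   # a 5.5, b 5.5

# The same call on the airquality columns
mydata <- na.omit(airquality)
apply(mydata, MARGIN = 2, FUN = trimmed_mean)
```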
• What data type does apply() return? It depends on the function we pass. Suppose we use FUN=my.fun; then:
  • if my.fun() returns a single value, apply() returns a vector.
  • if my.fun() returns k values, apply() returns a matrix with k rows (this is true whether MARGIN=1 or MARGIN=2).
  • if my.fun() returns outputs of different lengths for different inputs, apply() returns a list.
  • if my.fun() returns a list, apply() returns a list.
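Here is a sketch of the first three cases on a small made-up matrix.

```r
m <- matrix(c(1, 8, 3, 9, 2, 4), nrow = 2)   # columns: (1, 8), (3, 9), (2, 4)

apply(m, MARGIN = 2, FUN = sum)     # one value per column: vector 9 12 6
apply(m, MARGIN = 2, FUN = range)   # two values per column: 2 x 3 matrix
apply(m, MARGIN = 1, FUN = range)   # still 2 rows (one column per row of m)
apply(m, MARGIN = 2, FUN = function(v) v[v > 2])   # lengths 1, 2, 1: a list
```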
A word of caution
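• For common tasks, base R provides dedicated functions that are much faster than apply(). For example:
  • rowSums(), colSums(): for computing the row and column sums of a matrix
  • rowMeans(), colMeans(): for computing the row and column means of a matrix
  • max.col(): for finding the position of the maximum in each row of a matrix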
• Combining these functions with logical indexing and vectorized operations lets you do quite a lot.
• E.g., how do we count the number of positive entries in each row of a matrix?
x = matrix(rnorm(9), 3, 3)
# Don't do this (much slower for big matrices)
apply(x, MARGIN=1, function(v) { return(sum(v > 0)) })
[1] 2 1 0
[1] 2 1 0
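The faster alternative is not shown above. Given the functions just listed, rowSums() applied to the logical matrix x > 0 is a natural candidate. Here is a sketch, with set.seed() added for reproducibility. The comparison uses == because apply() returns integers here while rowSums() returns doubles.

```r
set.seed(1)
x <- matrix(rnorm(9), 3, 3)

# Slow: one R-level function call per row
slow <- apply(x, MARGIN = 1, function(v) sum(v > 0))

# Fast: x > 0 is a logical matrix; rowSums() counts the TRUEs in each row
fast <- rowSums(x > 0)

all(slow == fast)   # TRUE
```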
Using lapply()
The lapply() function takes inputs as in: lapply(x, FUN=my.fun),
to apply my.fun() across elements of a list or vector x. The output
is always a list.
Consider the following
x=2:5
lapply(x, FUN=log) #same as log(x)
[[1]]
[1] 0.6931472
[[2]]
[1] 1.098612
[[3]]
[1] 1.386294
[[4]]
[1] 1.609438
• Let us prepare a list and apply the mean function to every element of it
$nums
[1] 0.2
$chars
[1] NA
$bools
[1] 0.3333333
lapply(my.list, FUN=summary)
$nums
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.10 0.15 0.20 0.20 0.25 0.30
$chars
Length Class Mode
3 character character
$bools
Mode FALSE TRUE
logical 2 1
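Extra arguments given after FUN are passed on to the function through ..., as with our own functions earlier. Here is a minimal sketch with a made-up list.

```r
my_list <- list(a = c(1, NA, 3), b = c(4, 5))
lapply(my_list, FUN = mean)                 # a is NA because of the missing value
lapply(my_list, FUN = mean, na.rm = TRUE)   # a = 2, b = 4.5
```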
Using sapply()
The sapply() function works just like lapply(), but tries to simplify the return value whenever possible. The most common simplification is from a list to a vector.
Let us use sapply() in the previous example
sapply(my.list, FUN=mean) # Simplifies the result, now a vector
$nums
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.10 0.15 0.20 0.20 0.25 0.30
$chars
Length Class Mode
3 character character
$bools
Mode FALSE TRUE
logical 2 1
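Here is a sketch of the three possible shapes of sapply() output, using small anonymous functions.

```r
sapply(1:3, function(i) i^2)          # one value each: vector 1 4 9
sapply(1:3, function(i) c(i, i^2))    # two values each: 2 x 3 matrix
sapply(1:3, function(i) seq_len(i))   # different lengths: stays a list
```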
Using tapply()
[1] "list"
[1] "data.frame"
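The code that creates state.by.reg is not shown. Its printed output is consistent with splitting the built-in state.x77 data by state.region, as in the sketch below. That construction is an assumption. tapply(X, INDEX, FUN) then does the split and apply in one step for a single vector.

```r
# Assumed construction of state.by.reg
state.df <- data.frame(state.x77)   # names become e.g. Life.Exp, HS.Grad
state.by.reg <- split(state.df, f = state.region)
class(state.by.reg)          # "list"
class(state.by.reg[[1]])     # "data.frame"
sapply(state.by.reg, nrow)   # number of states in each region

# tapply(): mean population in each region, without an explicit split
tapply(state.x77[, "Population"], state.region, mean)
```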
# For each region, display the first 3 rows of the data frame
lapply(state.by.reg, FUN=head, 3)
$Northeast
Population Income Illiteracy Life.Exp Murder HS.Grad Frost Area
Connecticut 3100 5348 1.1 72.48 3.1 56.0 139 4862
Maine 1058 3694 0.7 70.39 2.7 54.7 161 30920
Massachusetts 5814 4755 1.1 71.83 3.3 58.5 103 7826
$South
Population Income Illiteracy Life.Exp Murder HS.Grad Frost Area
Alabama 3615 3624 2.1 69.05 15.1 41.3 20 50708
Arkansas 2110 3378 1.9 70.66 10.1 39.9 65 51945
Delaware 579 4809 0.9 70.06 6.2 54.6 103 1982
$`North Central`
Population Income Illiteracy Life.Exp Murder HS.Grad Frost Area
Illinois 11197 5107 0.9 70.14 10.3 52.6 127 55748
Indiana 5313 4458 0.7 70.88 7.1 52.9 122 36097
Iowa 2861 4628 0.5 72.56 2.3 59.0 140 55941
$West
Population Income Illiteracy Life.Exp Murder HS.Grad Frost Area
Alaska 365 6315 1.5 69.31 11.3 66.7 152 566432
Arizona 2212 4530 1.8 70.55 7.8 58.1 15 113417
California 21198 5114 1.1 71.71 10.3 62.6 20 156361
# For each region, average each of the 8 numeric variables
lapply(state.by.reg, FUN=function(df) {
return(apply(df, MARGIN=2, mean))
})
$Northeast
Population Income Illiteracy Life.Exp Murder HS.Grad
5495.111111 4570.222222 1.000000 71.264444 4.722222 53.966667
Frost Area
132.777778 18141.000000
$South
Population Income Illiteracy Life.Exp Murder HS.Grad
4208.12500 4011.93750 1.73750 69.70625 10.58125 44.34375
Frost Area
64.62500 54605.12500
$`North Central`
Population Income Illiteracy Life.Exp Murder HS.Grad
4803.00000 4611.08333 0.70000 71.76667 5.27500 54.51667
Frost Area
138.83333 62652.00000
$West
Population Income Illiteracy Life.Exp Murder HS.Grad
2.915308e+03 4.702615e+03 1.023077e+00 7.123462e+01 7.215385e+00 6.200000e+01
Frost Area
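colMeans() can replace apply(df, MARGIN=2, mean), and sapply() can then combine the four regional results into one matrix. Here is a sketch, repeating the assumed construction of state.by.reg from state.x77 so the block is self-contained.

```r
state.df <- data.frame(state.x77)
state.by.reg <- split(state.df, f = state.region)

# 8 x 4 matrix: one row per variable, one column per region
region.means <- sapply(state.by.reg, colMeans)
round(region.means, 2)
```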
Split Apply Combine Procedure
[1] 625 8
head(strikes.df)
Call:
lm(formula = strike.volume ~ left.parliament, data = strikes.df.italy)
Residuals:
Min 1Q Median 3Q Max
-930.2 -411.6 -137.3 387.2 1901.4
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -738.75 1200.62 -0.615 0.543
left.parliament 40.29 27.76 1.451 0.156
[Figure: scatter plot of Strike volume against Leftwing alignment]
• (Functionalization) The next step is to turn this into a function
my.strike.lm = function(country.df) {
  coef(lm(strike.volume ~ left.parliament, data=country.df))
}
my.strike.lm(strikes.df.italy)
(Intercept) left.parliament
-738.74531 40.29109
• (Split data into appropriate chunks) Next we split our data into chunks, each of which can be handled by our function. For this purpose, the function split() in R is often helpful: split(df, f=my.factor) splits a data frame df into several data frames, one for each level of the factor my.factor. So we want to split strikes.df into 18 smaller data frames, each of which has the data for just one country.
strikes.by.country = split(strikes.df, f=strikes.df$country)
class(strikes.by.country)
[1] "list"
names(strikes.by.country) # It has one element for each country
[1] "Australia" "Austria" "Belgium" "Canada" "Denmark"
[6] "Finland" "France" "Germany" "Ireland" "Italy"
[11] "Japan" "Netherlands" "New.Zealand" "Norway" "Sweden"
[16] "Switzerland" "UK" "USA"
head(strikes.by.country$Italy) # Same as what we saw before
[Figure: estimated regression coefficient for each country]
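Here is a sketch of the same three steps on the built-in mtcars data. It stands in for strikes.df, which is not part of base R. The data are split by number of cylinders, mpg ~ wt is fitted on each piece, and the coefficients are combined with sapply(). The names cars.by.cyl and my.lm are hypothetical.

```r
# Split: one data frame per number of cylinders
cars.by.cyl <- split(mtcars, f = mtcars$cyl)
names(cars.by.cyl)   # "4" "6" "8"

# Apply: the same model-fitting function on every piece
my.lm <- function(df) coef(lm(mpg ~ wt, data = df))
coef.list <- lapply(cars.by.cyl, my.lm)

# Combine: sapply() binds the coefficient vectors into a 2 x 3 matrix
coef.mat <- sapply(cars.by.cyl, my.lm)
coef.mat
```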
Using plyr
• Here
  • .data : an array
  • .margins : index (or indices) to split the array by
  • .fun : the function to be applied to each piece
  • ... : additional arguments to be passed to the function.
• Note that this looks like apply(X, MARGIN, FUN, ...).
library(plyr)
X1 V1
1 row1 117
2 row2 126
3 row3 135
$`1`
[1] 117
$`2`
[1] 126
$`3`
[1] 135
attr(,"split_type")
[1] "array"
attr(,"split_labels")
X1
1 row1
2 row2
3 row3
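The slides do not show how new.array was created. The printed sums are consistent with the construction below, which is an assumption. It also lets us check the plyr results against base R's apply().

```r
# Assumed construction: the numbers 1 to 27 in a 3 x 3 x 3 array
new.array <- array(1:27, dim = c(3, 3, 3),
                   dimnames = list(c("row1", "row2", "row3"),
                                   c("column1", "column2", "column3"),
                                   c("Group1", "Group2", "Group3")))

# Base R counterparts of the plyr calls
apply(new.array, 1, sum)     # row1 117, row2 126, row3 135
apply(new.array, 2:3, sum)   # 3 x 3 matrix, e.g. column1/Group1 = 6
```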
• Now we change the margins, which splits the array differently.
         X2
X1        Group1 Group2 Group3
  column1      6     33     60
  column2     15     42     69
  column3     24     51     78
X1 X2 V1
1 column1 Group1 6
2 column2 Group1 15
3 column3 Group1 24
4 column1 Group2 33
5 column2 Group2 42
6 column3 Group2 51
7 column1 Group3 60
8 column2 Group3 69
9 column3 Group3 78
alply(new.array, 2:3, sum) # Get back a list
$`1`
[1] 6
$`2`
[1] 15
$`3`
[1] 24
$`4`
[1] 33
$`5`
[1] 42
$`6`
[1] 51
$`7`
[1] 60
$`8`
[1] 69
$`9`
[1] 78
l*ply() - the input is a list
Here
• .data : a list
• .fun : the function to be applied to each element
• ... : additional arguments to be passed to the function
Note that this looks like lapply(X, FUN, ...).
1 2
[1,] "-3.66418302870311" "2.6689240524252"
[2,] "a" "z"
[3,] "365" "21198"
ldply(my.list, range) # Get back a data frame
.id V1 V2
1 nums -3.66418302870311 2.6689240524252
2 lets a z
3 pops 365 21198
$nums
[1] -3.664183 2.668924
$lets
[1] "a" "z"
$pops
[1] 365 21198
laply(my.list, summary) # Doesn't work! Outputs have different types/lengths
$nums
Min. 1st Qu. Median Mean 3rd Qu. Max.
-3.66418 -0.70961 -0.01693 -0.02758 0.63021 2.66892
$lets
Length Class Mode
26 character character
$pops
Min. 1st Qu. Median Mean 3rd Qu. Max.
365 1080 2838 4246 4968 21198
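The fourth option for * (I)
• Besides a (array), d (data frame) and l (list), the output type can be _, which discards the output. This is useful for functions called for their side effects, such as plot().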
par(mfrow=c(3,3), mar=c(4,4,1,1))
a_ply(new.array, 2:3, plot, ylim=range(new.array), pch=19, c
The fourth option for * (II)
[Figure: 3 × 3 grid of plots produced by a_ply(), one panel per piece of new.array, each plotting the values of that piece (y-axis: piece)]
$Australia
(Intercept) left.parliament
414.7712254 -0.8638052
$Austria
(Intercept) left.parliament
423.077279 -8.210886
$Belgium
(Intercept) left.parliament
-56.926780 8.447463
$Canada
(Intercept) left.parliament
-227.8218 17.6766
$Denmark
(Intercept) left.parliament
-1399.35735 34.34477
$Finland
(Intercept) left.parliament
108.2245 12.8422
Splitting on two (or more) variables
# First create a variable that indicates whether the year is 1975 or earlier,
# and add it to the data frame
strikes.df$yearPre1975 = strikes.df$year <= 1975
# Then use (say) ddply() to compute regression coefficients for each country
[1] 36 4
head(strikes.coefs.1975)
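The ddply() call is not shown. Here is a sketch of the same idea in base R on the built-in mtcars data, our substitute for strikes.df. split() accepts a list of factors and makes one piece per combination of levels. Here that means six pieces for cylinders × transmission type, and one row of coefficients per piece.

```r
# One piece per (cyl, am) combination
pieces <- split(mtcars, f = list(mtcars$cyl, mtcars$am))
names(pieces)   # "4.0" "6.0" "8.0" "4.1" "6.1" "8.1"

my.lm <- function(df) coef(lm(mpg ~ wt, data = df))
coefs <- t(sapply(pieces, my.lm))   # 6 rows: one per combination
coefs
```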