R Lectures Chapter 4
1 Summarizing objects
1. The summary() command is a general command that provides a summary
of an object.
2. If you have numerical data, then you get a numerical summary (for
example, mean, max, min). If the data are text stored as a factor, you
get a count of how many items fall in each level; a plain character
vector reports only its length, class, and mode.
3. The summary() command works for both matrix and data frame objects
by summarizing the columns rather than the rows.
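
The points above can be illustrated with a short sketch. The vectors and the data frame are hypothetical examples, not data from the lecture, and the names (num_vec, chr_vec, fac_vec, df) are chosen only for illustration.

```r
# Hypothetical example data, not taken from the lecture
num_vec <- c(3, 5, 7, 5, 3, 2, 6, 8, 5, 6, 9)
chr_vec <- c("Jan", "Feb", "Mar", "Jan")
fac_vec <- factor(chr_vec)

num_summary <- summary(num_vec)  # Min., 1st Qu., Median, Mean, 3rd Qu., Max.
chr_summary <- summary(chr_vec)  # length, class, and mode of the vector
fac_summary <- summary(fac_vec)  # number of items in each factor level

# A data frame is summarized column by column, not row by row
df <- data.frame(count = c(4, 8, 15), site = factor(c("a", "b", "a")))
df_summary <- summary(df)

print(num_summary)
print(chr_summary)
print(fac_summary)
print(df_summary)
```

The same call adapts to the type of object it receives, which is why the printed result differs for each of the four objects.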
2 Summarizing samples
1. Numerical samples can be summarized by many commands. The following
commands each produce a single value as a summary statistic:
Command                     Explanation
max(x, na.rm = FALSE)       Shows the maximum value. By default NA values
                            are not removed. If NA values are present, the
                            result is NA unless na.rm = TRUE is used.
min(x, na.rm = FALSE)       Shows the minimum value in a vector. If there
                            are NA values, this returns a value of NA
                            unless na.rm = TRUE is used.
length(x)                   Gives the length of the vector and includes any
                            NA values. The na.rm = instruction does not
                            work with this command.
sum(x, na.rm = FALSE)       Shows the sum of the vector elements.
mean(x, na.rm = FALSE)      Shows the arithmetic mean.
median(x, na.rm = FALSE)    Shows the median value of the vector.
sd(x, na.rm = FALSE)        Shows the standard deviation.
var(x, na.rm = FALSE)       Shows the variance.
mad(x, na.rm = FALSE)       Shows the median absolute deviation.
2. Other commands produce several values. The quantile() command, for
example, returns five values by default (the minimum, the lower quartile,
the median, the upper quartile, and the maximum).
3. You can use the na.omit() command to strip out NA items. Essentially,
you use this to temporarily remove NA items like so:
length(na.omit(object.name))
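
As a minimal sketch of how NA values affect these commands, the example below uses a hypothetical vector x with one missing item. The variable names are illustrative only.

```r
# Hypothetical sample containing one missing value
x <- c(3, 5, NA, 7, 2, 8)

max_with_na <- max(x)                 # NA, because the missing value is kept
max_clean   <- max(x, na.rm = TRUE)   # 8
mean_clean  <- mean(x, na.rm = TRUE)  # 5
n_all       <- length(x)              # 6, the NA item is counted
n_clean     <- length(na.omit(x))     # 5, the NA item is stripped first

# quantile() stops with an error if NA items are present and na.rm is FALSE
q <- quantile(x, na.rm = TRUE)

print(q)
```

Because length() has no na.rm = instruction, wrapping the vector in na.omit() is the way to count only the non-missing items.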
Command           Explanation
max(frame)        The largest value in the entire data frame
min(frame)        The smallest value in the entire data frame
sum(frame)        The sum of the entire data frame
fivenum(frame)    Tukey's five-number summary for the entire data frame:
                  minimum, first (or lower) quartile, median (or second
                  quartile), upper (or third) quartile, maximum
length(frame)     The number of columns in the data frame
summary(frame)    Gives a summary for each column
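
A small sketch of these commands on a hypothetical all-numeric data frame follows. The data frame and its column names are invented for illustration. As a conservative choice, fivenum() is applied here to the unlisted values rather than to the data frame itself, because its behaviour on a whole data frame may depend on the R version.

```r
# Hypothetical numeric data frame, not taken from the lecture
frame <- data.frame(a = c(1, 4, 2), b = c(10, 3, 6))

frame_max <- max(frame)     # 10, largest value across all columns
frame_min <- min(frame)     # 1, smallest value across all columns
frame_sum <- sum(frame)     # 26, sum of every value
frame_len <- length(frame)  # 2, the number of columns (not rows or cells)

# Five-number summary of all values, pooled into one vector with unlist()
frame_five <- as.numeric(fivenum(unlist(frame)))

print(frame_five)
print(summary(frame))       # one summary per column
```

Note that max(), min(), and sum() only work on a data frame when every column is numeric.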
3 Cumulative statistics
1. The following commands produce cumulative values:
Command Explanation
cumsum(x) The cumulative sum of a vector
cummax(x) The cumulative maximum value
cummin(x) The cumulative minimum value
cumprod(x) The cumulative product
2. The seq_along() command creates a simple index (1, 2, 3, ...) that is
as long as its argument. For more details, see page 117 of the referenced
book.
3. These commands can be combined to create a range of cumulative
statistics, for example the running mean:
cumsum(my.data) / seq_along(my.data)
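
The cumulative commands can be tried on a short vector. The values in my.data below are hypothetical and chosen so that the results are easy to check by hand.

```r
# Hypothetical sample, not taken from the lecture
my.data <- c(4, 2, 6, 8, 0)

c_sum  <- cumsum(my.data)     # 4  6 12 20 20
c_max  <- cummax(my.data)     # 4  4  6  8  8
c_min  <- cummin(my.data)     # 4  2  2  2  0
c_prod <- cumprod(my.data)    # 4  8 48 384 0
index  <- seq_along(my.data)  # 1  2  3  4  5

# Running mean: the cumulative sum divided by the number of items so far
running_mean <- cumsum(my.data) / seq_along(my.data)  # 4 3 4 5 4

print(running_mean)
```

Each element of the running mean is the mean of all items up to and including that position.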
2. The apply() command is more flexible in that any function can be applied
to the rows (MARGIN = 1) or columns (MARGIN = 2) of a data frame or matrix.
Tables can be summarized in exactly the same way as data frames and matrix
objects by using apply(). The lapply() and sapply() commands are similar
but are designed to work with list objects: lapply() returns a list,
whereas sapply() simplifies the result to a vector or matrix where it can.
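
A minimal sketch of apply(), lapply(), and sapply() is given below. The matrix m and the list lst are hypothetical objects created only for this illustration.

```r
# Hypothetical 2 x 3 matrix with named rows and columns
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))

col_medians <- apply(m, 2, median)  # MARGIN = 2: one median per column
row_sums    <- apply(m, 1, sum)     # MARGIN = 1: one sum per row

# Hypothetical list with elements of different lengths
lst <- list(first = c(1, 2, 3), second = c(10, 20))

means_list <- lapply(lst, mean)     # result is a list
means_vec  <- sapply(lst, mean)     # result is simplified to a named vector

print(col_medians)
print(row_sums)
print(means_vec)
```

Any function that accepts a vector, including one you write yourself, can take the place of median, sum, or mean here.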
5 Summarizing Tables
1. table() : Contingency tables can be created using the table() command.
2. ftable() : A flat table can be produced using the ftable() command.
3. xtabs() : Data can be cross-tabulated to form contingency tables using
the xtabs() command.
4. margin.table() : The margin.table() command gives sums for rows/columns.
5. prop.table() : The prop.table() command determines the proportion
that table entries make toward the total.
6. addmargins() : The addmargins() command adds margins to a table by
applying a function (sum by default) to its rows/columns.
7. class() and any() : The class() command can be used to view or set
the current type of an object. Objects can have more than one class so
if a test is required the any() command can be used to match any of the
classes that may be present.
8. is.table() and is.matrix() : These commands will test for a table and a
matrix, respectively. These commands produce a TRUE or FALSE result.
9. for() : The for() command can be used to create loops (for example, in
creating cumulative statistics like a running median).
10. if() : The if() command is used to test some condition and carry out a
command if the result is TRUE. The else command can be added to the end
to carry out a different command when the result is FALSE.
11. any() : The any() command can be used in testing conditions. It returns
TRUE if at least one of the items being tested is TRUE.
12. function() : Customized commands can be created using the function()
command.
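13. attach() : The attach() command opens an enclosing object, such as a
data frame, so that its columns can be referred to directly by name.
14. detach() : The detach() command closes the enclosing object again.
15. with() : The with() command enables temporary access to the contents
of an enclosing object for the duration of a single command.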
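
The commands in this list can be seen working together in the sketch below. The survey data frame, its column names, and the running_median() function are hypothetical and were written only for this illustration.

```r
# Hypothetical survey data, not taken from the lecture
survey <- data.frame(
  site    = c("north", "north", "south", "south", "south", "north"),
  species = c("oak", "ash", "oak", "oak", "ash", "oak")
)

# Two ways to build the same contingency table (rows = site)
tab <- table(survey$site, survey$species)
xt  <- xtabs(~ site + species, data = survey)

row_totals  <- margin.table(tab, 1)  # sums for rows
col_totals  <- margin.table(tab, 2)  # sums for columns
proportions <- prop.table(tab)       # each cell as a share of the total
with_sums   <- addmargins(tab)       # adds a "Sum" row and column
print(ftable(xt))                    # flat version of the table

# An object can have more than one class, so test with any()
if (any(class(xt) == "table")) {
  kind <- "table"
} else {
  kind <- "other"
}
tab_checks <- c(is.table(tab), is.matrix(tab))

# A customized command: running median built with a for() loop
running_median <- function(x) {
  result <- numeric(length(x))
  for (i in seq_along(x)) {
    result[i] <- median(x[1:i])  # median of the items seen so far
  }
  result
}
run_med <- running_median(c(1, 9, 2, 8))

# with() gives temporary access to the columns without attach()
tab_with <- with(survey, table(site, species))
```

Using with() rather than attach() and detach() keeps the access temporary, so there is nothing to close afterwards.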