Introduction To R For Business Analytics
12 October 2021
Vector
R is a vector-based language. A vector is a collection of data items of the same type. In the example below, we create a vector x holding the values 1 to 10. We can create a vector using the c() function or an operator such as the sequence operator (:).
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
or
x <- 1:10
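To access elements of a vector, we use square brackets. For example, the following returns the 3rd to 5th elements of x:
x[3:5]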
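The sketch below shows a few more ways to create and index vectors. seq(), rep() and length() are base R functions that the text above does not mention. All values are arbitrary examples.

```r
# Different ways to build a vector
x <- 1:10                      # sequence operator
evens <- seq(2, 20, by = 2)    # seq(): a sequence with a chosen step
ones <- rep(1, times = 5)      # rep(): repeat a value

length(x)      # 10: number of elements
x[3]           # 3: a single element (indexing starts at 1, not 0)
x[3:5]         # 3 4 5: elements 3 to 5
x[-1]          # all elements except the first
x[x > 7]       # 8 9 10: only the elements that satisfy a condition

# All elements of a vector share one type; mixing types converts them
mixed <- c(1, "a")
class(mixed)   # "character": the number 1 became the text "1"
```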
In the following example, we create two vectors, x and y, each holding 50 normally distributed random numbers, and plot them against each other as a scatter plot.
x <- rnorm(50)
y <- rnorm(50)
plot(x, y)
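Because rnorm() draws new random numbers on every call, the plot changes each time the code runs. A minimal sketch for reproducible results: set.seed() fixes the random number generator, and the mean and sd arguments of rnorm() (0 and 1 by default) change the distribution. The seed 42, the second mean and sd, and the plot labels are arbitrary choices.

```r
set.seed(42)                          # same seed -> same random numbers on every run
x <- rnorm(50)                        # 50 draws with mean 0 and standard deviation 1
y <- rnorm(50, mean = 10, sd = 2)     # 50 draws with mean 10 and standard deviation 2

plot(x, y, main = "50 random points", xlab = "x", ylab = "y")

# Setting the same seed again reproduces exactly the same draws
set.seed(42)
x_again <- rnorm(50)
identical(x, x_again)                 # TRUE
```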
Object
NOTE: Object names must start with a letter or a dot (a name that starts with a dot cannot have a number as its second character). Names may contain only letters, numbers, underscores (_) and dots (.). Names cannot be the same as R reserved words such as if, else and for.
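The naming rules can be checked in code. The object names below are made-up examples, and the invalid ones are commented out because R would stop with an error. make.names() is a base R function that converts any text into a valid name, which shows how R itself applies the rules.

```r
# Objects are created with the assignment operator <-
total_sales <- 100       # valid: letters and an underscore
sales.2021 <- 250        # valid: letters, a dot and numbers
.temp <- 1               # valid: starts with a dot (hidden from ls())

# 2021sales <- 250       # invalid: starts with a number
# _sales <- 250          # invalid: starts with an underscore
# if <- 5                # invalid: 'if' is a reserved word

ls()                     # includes total_sales and sales.2021, but not .temp

# make.names() repairs invalid names
make.names(c("2021sales", "if", "total sales"))
# "X2021sales" "if." "total.sales"
```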
Packages
Arithmetic
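In R, we can apply the following arithmetic operators: +, -, *, /, ^ (power) and %% (modulo, the remainder after division). The operators work on every element of a vector at once. For example, the following adds 1 to each element of x: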
x <- 1:10
x <- x + 1
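The sketch below applies each operator to the vector 1:10. The second vector y is an extra example, added to show that two vectors of the same length are combined element by element.

```r
x <- 1:10

x + 1     # 2 3 4 ... 11: adds 1 to every element
x - 1     # 0 1 2 ... 9
x * 2     # 2 4 6 ... 20
x / 4     # 0.25 0.50 ... 2.50
x ^ 2     # 1 4 9 ... 100
x %% 3    # 1 2 0 1 2 0 1 2 0 1: remainder after dividing by 3

# Two vectors of the same length are combined element by element
y <- 10:1
x + y     # 11 11 11 ... 11
```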
Functions
R comes with many built-in mathematical functions, such as abs(), exp(), sqrt(), min() and sum(). For example, the following stores the sum of the numbers 1 to 10 (that is, 55) in z:
x <- 1:10
z <- sum(x)
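A quick tour of these functions on a small made-up vector, so that every result can be checked by hand. max() and mean() are further base R functions not listed above.

```r
x <- c(-4, 9, 16)

abs(x)          # 4 9 16: absolute values
sqrt(abs(x))    # 2 3 4: square roots (sqrt() of a negative number gives NaN)
exp(0)          # 1: e raised to the power 0
min(x)          # -4: smallest value
max(x)          # 16: largest value
sum(x)          # 21: total
mean(x)         # 7: average

# The example above in one line: the sum of 1 to 10
z <- sum(1:10)  # 55
```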
Matrix
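We can define a matrix using the matrix() function. By default, matrix() fills the values column by column; setting byrow = TRUE fills them row by row instead. Compare the two commands below.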
matrix_A <- matrix(1:12, ncol = 4)
matrix_B <- matrix(1:12, ncol = 4, byrow = TRUE)
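Printing the matrices shows the difference in fill order. The output in the comments is what R prints for these two commands.

```r
matrix_A <- matrix(1:12, ncol = 4)                # filled column by column
matrix_B <- matrix(1:12, ncol = 4, byrow = TRUE)  # filled row by row

matrix_A
#      [,1] [,2] [,3] [,4]
# [1,]    1    4    7   10
# [2,]    2    5    8   11
# [3,]    3    6    9   12

matrix_A[1, ]   # 1 4 7 10: the first row of the column-filled matrix
matrix_B[1, ]   # 1 2 3 4: the first row of the row-filled matrix
```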
We can find the dimensions of a matrix (rows and columns) using the dim() function, the number of rows using the nrow() function, and the number of columns using the ncol() function.
dim(matrix_A)
nrow(matrix_A)
ncol(matrix_A)
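For the matrices above, only ncol = 4 was given; R works out the number of rows from the length of the data. A short check of the results:

```r
matrix_A <- matrix(1:12, ncol = 4)
matrix_B <- matrix(1:12, ncol = 4, byrow = TRUE)

dim(matrix_A)    # 3 4: rows first, then columns
nrow(matrix_A)   # 3: 12 values / 4 columns = 3 rows
ncol(matrix_A)   # 4

# byrow changes only the fill order, not the shape
identical(dim(matrix_A), dim(matrix_B))   # TRUE
```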
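We can select parts of a matrix with square brackets, [row, column]. Leaving the column index empty selects all columns, so the following command returns rows 1 to 3 with every column:
matrix_A[1:3, ]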
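Some further selections on the same matrix: a single cell, a whole column, a block of rows and columns, and a negative index, which drops rows instead of keeping them.

```r
matrix_A <- matrix(1:12, ncol = 4)

matrix_A[2, 3]           # 8: the value in row 2, column 3
matrix_A[, 2]            # 4 5 6: all rows of column 2
matrix_A[1:2, c(1, 4)]   # rows 1 and 2 of columns 1 and 4
matrix_A[-1, ]           # every row except the first
```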
Data frame
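We usually store data in a data frame before analysing it. A data frame is a table in which each column is a variable and each row is an observation. The following example builds a data frame from two vectors (in practice, the data is usually read from a file).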
x <- rnorm(10)
y <- rnorm(10)
df <- data.frame(x = x, y = y)
df
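Once data is in a data frame, we can pick out a column with $, select rows and columns with [row, column] as for a matrix, and get a quick overview with str() and summary(). The small data frame below uses made-up values so that the results are predictable.

```r
df <- data.frame(x = c(1, 2, 3, 4), y = c(10, 20, 30, 40))

df$y             # 10 20 30 40: one column, returned as a vector
df[2, ]          # the second row (all columns)
df[df$x > 2, ]   # the rows where x is greater than 2
nrow(df)         # 4 observations
ncol(df)         # 2 variables
names(df)        # "x" "y"
head(df, 2)      # the first two rows
str(df)          # structure: column names and types
summary(df)      # minimum, quartiles, mean and maximum of each column
```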
When reading data from a csv file, it is easier if we first change the working directory to the folder where the file is located. If you do not have a csv file to practise with, you can use revenue.csv from Blackboard. To change the working directory in RStudio, select Session -> Set Working Directory -> Choose Directory.
We can read a csv file using one of the following commands. The read.csv() function is for files that use a comma as the separator. The read.csv2() function is for files that use a semicolon as the separator (and a comma as the decimal mark). The read.table() function is the most flexible, as we can specify the separator with the sep argument. Set the header argument to TRUE if the first line of the file contains the variable names (read.csv() and read.csv2() assume this by default; read.table() does not). Note that each command below stores the data in a data frame called df, so everything from the earlier section on data frames applies.
df <- read.csv("mydata.csv", header = TRUE)
df <- read.csv2("mydata.csv", header = TRUE)
df <- read.table("mydata.csv", header = TRUE, sep = ",")
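The sketch below makes the three readers concrete without needing any file from Blackboard: it writes two tiny csv files to a temporary location with tempfile() and writeLines(), then reads them back. The region and revenue values are made up for illustration.

```r
# A small comma-separated file (hypothetical data)
path <- tempfile(fileext = ".csv")
writeLines(c("region,revenue",
             "North,120",
             "South,95"), path)

df1 <- read.csv(path)                              # header = TRUE is the default
df2 <- read.table(path, header = TRUE, sep = ",")  # same file, separator given explicitly
all.equal(df1, df2)                                # TRUE: both give the same data frame

# A semicolon-separated file with decimal commas, as read.csv2() expects
path2 <- tempfile(fileext = ".csv")
writeLines(c("region;revenue",
             "North;120,5",
             "South;95,25"), path2)

df3 <- read.csv2(path2)
df3$revenue                                        # 120.50 95.25
```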
To check that you are in the right working directory, use the getwd() function. You can also check whether the data file is in that directory with the list.files() function.
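These checks can be run before reading the data. The pattern argument of list.files() and file.exists() are base R features not mentioned above, and the setwd() path is a hypothetical example, so it is left commented out.

```r
getwd()                            # the current working directory
list.files()                       # all files in the working directory
list.files(pattern = "\\.csv$")    # only the files ending in .csv
file.exists("revenue.csv")         # TRUE if revenue.csv is in the working directory

# setwd() changes the working directory in code; this path is hypothetical
# setwd("C:/Users/me/Documents/R")
```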
Packages
Descriptive Statistics
Scatter plot
plot(df$NorthAm, df$SouthAm)
Measures of association:
cov(x=df$NorthAm, y=df$SouthAm)
cor(x=df$NorthAm, y=df$SouthAm)
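cov() shows whether two variables move together, but its size depends on the units of the data; cor() rescales it to lie between -1 and 1. The sketch below uses made-up numbers (not the revenue.csv data) so that the results can be checked by hand.

```r
north <- c(10, 12, 14, 16, 18)  # hypothetical revenue figures
south <- c(5, 6, 7, 8, 9)

cov(north, south)        # 5: positive, so the two move together
cor(north, south)        # 1: a perfect positive linear relationship
cor(north, rev(south))   # -1: a perfect negative linear relationship

# Doubling one variable doubles the covariance but leaves the correlation unchanged
cov(north * 2, south)    # 10
cor(north * 2, south)    # 1

# On a data frame of numeric columns, cor() returns a correlation matrix
cor(data.frame(north, south))
```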
Nominal data
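R uses a special data structure called a factor for nominal data. To create a factor, we use the factor() function, which takes the vector that we want to turn into a factor. By default, the levels (categories) are the unique values of the vector, sorted alphabetically. The optional levels argument sets the levels and their order explicitly, and the optional labels argument gives each level a new name, matched to the levels in the same order. Without levels, the labels below would be matched to the alphabetical levels (East, North, South, West), so North would wrongly become E. Alternatively, we can use the as.factor() function, which always uses the default levels. For example:
directions <- c("North", "East", "South", "West")
dir_cat <- factor(directions, levels = c("North", "East", "South", "West"),
                  labels = c("N", "E", "S", "W"))
dir_cat2 <- as.factor(directions)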
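The sketch below checks the levels that each approach produces. A repeated "North" is added to the vector so that table() has something to count.

```r
directions <- c("North", "East", "South", "West", "North")

# as.factor() uses the default levels: the unique values in alphabetical order
dir_cat2 <- as.factor(directions)
levels(dir_cat2)     # "East" "North" "South" "West"
table(dir_cat2)      # how often each level occurs (North appears twice)

# levels fixes the order; labels renames each level in that same order
dir_cat <- factor(directions,
                  levels = c("North", "East", "South", "West"),
                  labels = c("N", "E", "S", "W"))
as.character(dir_cat)  # "N" "E" "S" "W" "N"

# Without levels, the labels are matched to the alphabetical levels
wrong <- factor(directions, labels = c("N", "E", "S", "W"))
as.character(wrong)    # "E" "N" "S" "W" "E": North is labelled E
```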
Ordinal data
We can also use the factor() function for ordinal data by setting ordered = TRUE. For example:
scale <- c("Low", "Medium", "High")
scale_cat <- factor(scale, ordered = TRUE)
Is there anything wrong with the content of scale_cat? Now try the following:
scale_ord <- factor(scale, ordered = TRUE, levels = c("Low", "Medium", "High"))
Setting the correct data type allows R to analyse the data correctly. To demonstrate this, load college.csv from Blackboard and check the state field. It is stored as character data; hence, when you obtain summary statistics with the summary() function, the output is not useful: it only reports the number of values (Length), the Class and the Mode. Compare this with the output of summary() after you convert the state field into a factor (nominal, or categorical, data), which counts how many rows belong to each state.
college <- read.csv("college.csv")
college$state
summary(college$state)
my_states <- as.factor(college$state)
summary(my_states)
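The same comparison can be tried without the college.csv file. The state codes below are hypothetical stand-ins for college$state.

```r
# Hypothetical state codes standing in for college$state
states <- c("CA", "NY", "CA", "TX", "NY", "CA")

summary(states)        # Length 6, Class character, Mode character

my_states <- as.factor(states)
summary(my_states)     # CA 3, NY 2, TX 1: a count for each state
```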
Dates
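The quarters() function takes a Date object, here a_date, and returns its quarter as "Q1", "Q2", "Q3" or "Q4":
which_q <- quarters(a_date)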
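A short sketch of creating and using a Date object, with the date on the title page (12 October 2021) as an arbitrary example. as.Date() and format() are base R functions not covered in the text above.

```r
# as.Date() converts text in year-month-day format into a Date object
a_date <- as.Date("2021-10-12")
class(a_date)                         # "Date"
quarters(a_date)                      # "Q4"

# Other text formats need a format string: %d day, %m month, %Y four-digit year
same_date <- as.Date("12/10/2021", format = "%d/%m/%Y")
same_date == a_date                   # TRUE

format(a_date, "%d/%m/%Y")            # "12/10/2021"
format(a_date, "%Y")                  # "2021"

# Subtracting two dates gives the number of days between them
as.numeric(as.Date("2021-12-25") - a_date)  # 74
```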
The end