Introduction to R for Business Analytics

Professor Stephan Onggo

12 October 2021

Vector

R is a vector-based language. A vector is a collection of data items. In the example below, we create
a vector x and assign it the values from 1 to 10. We can create a vector using the c() function or an
operator such as the sequence operator (:):
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

or
x <- 1:10

To see the content of x:


x

To see the content of a specific element of x:


x[3]

or
x[3:5]

We can apply standard operations to the vector such as:


x <- x * 2

We can apply some functions to the vector


z <- sum(x)
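
The vector operations above can be combined into one short session. This is a minimal sketch; the variable names x_seq, same_values, third, middle, doubled and total are illustrative and not part of the original notes.

```r
# Build the same vector two ways and confirm they hold equal values.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
x_seq <- 1:10
same_values <- all(x == x_seq)   # TRUE

# Indexing starts at 1: a single element, then a range of elements.
third <- x[3]      # 3
middle <- x[3:5]   # 3 4 5

# Arithmetic is applied to every element of the vector.
doubled <- x * 2         # 2 4 6 ... 20
total <- sum(doubled)    # 110
```

Note that R counts elements from 1, so x[3] is the third element.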

In the following example, we create two vectors of 50 normally distributed random numbers and
plot them.
x <- rnorm(50)
y <- rnorm(50)
plot(x, y)

Object

You can list the available R objects using the ls() function.


ls()

You can remove an object using the rm() function.


rm(object_name)

NOTE: Object names must start with a letter, or with a dot that is not followed by a digit. The names
may contain only letters, digits, underscores (_) and dots (.). The names cannot be the same as R
reserved words such as if, else and for.
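
One way to check a candidate name against these rules is to compare it with the output of the base R function make.names(), which rewrites an invalid name into a valid one. The helper is_valid_name below is a hypothetical name introduced for this sketch only.

```r
# A name is valid if make.names() leaves it unchanged.
is_valid_name <- function(name) {
  name == make.names(name)
}

ok_plain <- is_valid_name("sales_2021")     # TRUE: letters, digits, underscore
ok_dot <- is_valid_name(".rate")            # TRUE: starts with a dot
bad_digit <- is_valid_name("2021_sales")    # FALSE: starts with a digit
bad_keyword <- is_valid_name("if")          # FALSE: reserved word
```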

Packages

To install a package, use the install.packages() function.


install.packages("package_name")

To load a package, use the library() function.


library("package_name")

Arithmetic

In R, we can apply the following arithmetic operators: +, -, *, /, ^ and %% (modulo). For example:
x <- 1:10
x <- x + 1
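
The other operators work element by element in the same way. A small sketch, with illustrative variable names that are not part of the original notes:

```r
x <- 1:10

plus_one <- x + 1     # 2 3 4 ... 11
squares <- x ^ 2      # 1 4 9 ... 100
remainder <- x %% 3   # 1 2 0 1 2 0 1 2 0 1

# The modulo operator is handy for picking out even numbers.
evens <- x[x %% 2 == 0]   # 2 4 6 8 10
```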

Functions

R comes with many mathematical functions such as abs(), exp(), sqrt(), min() and
sum(). For example:
x <- 1:10
z <- sum(x)

Matrix

We can define a matrix using the matrix() function. Compare the two commands below.
matrix_A <- matrix(1:12, ncol = 4)
matrix_B <- matrix(1:12, ncol = 4, byrow = TRUE)
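
The difference between the two commands is the fill order: by default R fills a matrix column by column, while byrow = TRUE fills it row by row. A short sketch that makes this visible (first_row_A and first_row_B are illustrative names):

```r
matrix_A <- matrix(1:12, ncol = 4)                 # filled column by column
matrix_B <- matrix(1:12, ncol = 4, byrow = TRUE)   # filled row by row

# Both matrices have 3 rows and 4 columns, but their first rows differ.
first_row_A <- matrix_A[1, ]   # 1 4 7 10
first_row_B <- matrix_B[1, ]   # 1 2 3 4
```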

We can find the dimensions of a matrix using the dim() function, the number of rows using the nrow()
function, and the number of columns using the ncol() function.
dim(matrix_A)
nrow(matrix_A)
ncol(matrix_A)

To see the content of a specific element of a matrix:


matrix_A[1, 2]
matrix_A[1:3, 2]
matrix_A[1:3, ]

We can apply standard matrix operations such as:


matrix_C <- matrix_A + matrix_B
t(matrix_C)
matrix_C <- matrix_A * matrix_B
matrix_C <- matrix_A %*% t(matrix_B)
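
Note that * multiplies matrices element by element, whereas %*% is the matrix product, which requires the number of columns of the first matrix to equal the number of rows of the second. The sketch below contrasts the two; elementwise and product are illustrative names.

```r
matrix_A <- matrix(1:12, ncol = 4)
matrix_B <- matrix(1:12, ncol = 4, byrow = TRUE)

# Element-wise product: both inputs are 3 x 4, and so is the result.
elementwise <- matrix_A * matrix_B
elementwise[1, 2]   # 4 * 2 = 8

# Matrix product: (3 x 4) times (4 x 3) gives a 3 x 3 matrix.
product <- matrix_A %*% t(matrix_B)
product[1, 1]       # 1*1 + 4*2 + 7*3 + 10*4 = 70
```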

Data frame

We often store our data in a data frame before we do some analysis. The following example shows a
data frame (in practice, the data is usually read from a file).
x <- rnorm(10)
y <- rnorm(10)
df <- data.frame(x = x, y = y)
df

We can access the content using the following commands


df$x
df$y
df[1,1]

We can filter the data using an expression. For example:


df[df$x < 0,]

We can sort the data using the order() function. For example:


df[order(df$x),]
df[order(df$x, decreasing = TRUE),]

We can use the rbind() function to append a row to a data frame:


z <- rbind(df, c(5, 5))

We can delete a row using the following command


z <- z[-c(11),]
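
Because rnorm() gives different numbers on every run, the data frame operations above are easier to follow on fixed values. The sketch below uses a small made-up data frame; the values and the names small_df, negatives, sorted_df, with_row and without_row are assumptions for illustration only.

```r
# A small data frame with fixed, made-up values.
small_df <- data.frame(x = c(0.5, -1.2, 2.0), y = c(1, 2, 3))

negatives <- small_df[small_df$x < 0, ]     # keeps only the row with x = -1.2
sorted_df <- small_df[order(small_df$x), ]  # rows ordered by x, ascending

with_row <- rbind(small_df, c(5, 5))        # appends a fourth row
without_row <- with_row[-4, ]               # a negative index drops row 4
```

A negative row index removes that row, so without_row has the same three rows as small_df.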

We can apply some functions to a column.


z <- sum(df$x)

Reading csv files

It is easier to read a csv file if the working directory is the folder where the file is located. If you
do not have any csv file to play with, you can use revenue.csv from Blackboard. To change the working
directory in RStudio, select Session -> Set Working Directory -> Choose Directory.

We can read a csv file using one of the following commands. The read.csv() function works if
the file uses a comma as the separator. The read.csv2() function works if the file uses a
semicolon as the separator (and a comma as the decimal mark). The read.table() function is the
most flexible, as we can specify the separator ourselves. The header argument is set to TRUE if the
first line of the file contains the variable names. Please note that the data will be stored as the
data frame df in the example below (hence, you can apply what you have learned from the earlier
section on data frames).
df <- read.csv("mydata.csv", header = TRUE)
df <- read.csv2("mydata.csv", header= TRUE)
df <- read.table("mydata.csv", header = TRUE, sep = ",")
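
To try these functions without a data file, we can write a tiny csv file to a temporary location first. This is a sketch under stated assumptions: the file contents, the column names region and revenue, and the variable names are all made up for illustration.

```r
# Write a small comma-separated file to a temporary path.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("region,revenue", "North,100", "South,250"), csv_path)

df_comma <- read.csv(csv_path, header = TRUE)
df_table <- read.table(csv_path, header = TRUE, sep = ",")

# Write a semicolon-separated file that uses a comma as the decimal mark.
semi_path <- tempfile(fileext = ".csv")
writeLines(c("region;revenue", "North;100,5", "South;250,0"), semi_path)

df_semi <- read.csv2(semi_path, header = TRUE)   # revenue is read as 100.5 and 250
```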

To check that you are in the right working directory, you can use the getwd() function. You can also
check whether the data file is in the directory by using the list.files() function.

Packages

To install a package, use the install.packages("package_name") function. For example:


install.packages("moments")

To load a package, use the library("package_name") function. For example:


library("moments")

User-defined functions

To define a function, use this template


function_name <- function(parameters) {
  # do your calculation here
  return(a_value)
}
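
As a concrete instance of the template, here is a minimal sketch of a function that computes a percentage change. The function pct_change and its parameters old_value and new_value are hypothetical names chosen for this example.

```r
# Hypothetical helper: percentage change from old_value to new_value.
pct_change <- function(old_value, new_value) {
  change <- (new_value - old_value) / old_value * 100
  return(change)
}

growth <- pct_change(200, 250)   # 25
```

The function is stored in an object like any other value, so it is called by the name it was assigned to.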

Descriptive Statistics

Measures for centrality:


mean(df$NorthAm)
median(df$NorthAm)
midrange <- function(v) {
  return((min(v) + max(v)) / 2)
}
getmode <- function(v) {
  unique_val <- unique(v)
  return(unique_val[which.max(tabulate(match(v, unique_val)))])
}
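
The two helper functions can be checked on a small vector where the answers are easy to verify by hand. The vector sales below is made-up data, and the functions are repeated so that the sketch runs on its own.

```r
midrange <- function(v) {
  return((min(v) + max(v)) / 2)
}
getmode <- function(v) {
  unique_val <- unique(v)
  return(unique_val[which.max(tabulate(match(v, unique_val)))])
}

sales <- c(3, 7, 7, 2, 9, 7, 3)   # made-up values

sales_median <- median(sales)       # 7
sales_midrange <- midrange(sales)   # (2 + 9) / 2 = 5.5
sales_mode <- getmode(sales)        # 7, which occurs three times
```

If several values are tied for the highest count, getmode() returns the one that appears first in the vector.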

Measures for dispersion:


range(df$NorthAm)
quantile(df$NorthAm)
var(df$NorthAm)
sd(df$NorthAm)
scale(df$NorthAm)
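
The dispersion measures are related to each other: sd() is the square root of var(), and scale() subtracts the mean and divides by the standard deviation. A sketch on made-up values (the vector v and the other names are assumptions for illustration):

```r
v <- c(2, 4, 4, 4, 5, 5, 7, 9)   # made-up values with mean 5

v_range <- range(v)   # 2 9
v_var <- var(v)       # sample variance: 32 / 7
v_sd <- sd(v)         # square root of the sample variance

# scale() returns a one-column matrix of standardized values.
v_std <- as.vector(scale(v))
```

After standardization the values have mean 0 and standard deviation 1.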

Measures for shape:


library("moments")
skewness(df$NorthAm)
kurtosis(df$NorthAm)

Plot the histogram


hist(df$NorthAm)
hist(df$NorthAm, breaks = c(500000, 1000000, 1500000, 2000000))

Scatter plot
plot(df$NorthAm, df$SouthAm)

Measures of association:
cov(x=df$NorthAm, y=df$SouthAm)
cor(x=df$NorthAm, y=df$SouthAm)
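
The correlation is the covariance rescaled by the two standard deviations, which is why it always lies between -1 and 1. The sketch below checks this relationship on made-up numbers; the vectors north and south are illustrative and are not the values in revenue.csv.

```r
north <- c(10, 20, 30, 40, 50)   # made-up values
south <- c(12, 24, 33, 46, 55)   # made-up values

cov_ns <- cov(north, south)   # 270
cor_ns <- cor(north, south)

# Correlation computed by hand from the covariance.
cor_manual <- cov_ns / (sd(north) * sd(south))
```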

Nominal data

R uses a special data structure called a factor for nominal data. To create a factor, we use the factor()
function, which takes the vector that we want to turn into a factor. We can also include the optional
levels argument (in case we want the levels to differ from the ones R derives from the vector) and the
labels argument (to rename the levels; labels are matched to levels by position).
Alternatively, we can use the as.factor() function. For example:
directions <- c("North", "East", "South", "West")
dir_cat <- factor(directions, levels = c("North", "East", "South", "West"),
                  labels = c("N", "E", "S", "W"))
dir_cat2 <- as.factor(directions)
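
When no levels are given, R sorts the distinct values alphabetically to form the levels, and labels are then matched to those levels by position. The sketch below shows the default order and a factor where both levels and labels are stated explicitly; default_cat and custom_cat are illustrative names.

```r
directions <- c("North", "East", "South", "West")

# Default levels are the sorted distinct values.
default_cat <- as.factor(directions)
levels(default_cat)   # "East" "North" "South" "West"

# Stating the levels fixes the order that the labels are matched against.
custom_cat <- factor(directions,
                     levels = c("North", "East", "South", "West"),
                     labels = c("N", "E", "S", "W"))
as.character(custom_cat)   # "N" "E" "S" "W"
```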

Ordinal data

We can also use factor() function for ordinal data by setting ordered=TRUE. For example:
scale <- c("Low", "Medium", "High")
scale_cat <- factor(scale, ordered = TRUE)

Is there anything wrong with the content of scale_cat? Now try the following:
scale_ord <- factor(scale, ordered = TRUE, levels=c("Low",
"Medium", "High"))
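
One way to see the difference between the two factors is to inspect their levels and compare two elements. This is a minimal sketch; scale_values, alphabetical and intended are illustrative names.

```r
scale_values <- c("Low", "Medium", "High")

# Without explicit levels, the order is alphabetical: High < Low < Medium.
alphabetical <- factor(scale_values, ordered = TRUE)
levels(alphabetical)                    # "High" "Low" "Medium"
low_below_high_1 <- alphabetical[1] < alphabetical[3]   # FALSE

# With explicit levels, the order is Low < Medium < High.
intended <- factor(scale_values, ordered = TRUE,
                   levels = c("Low", "Medium", "High"))
low_below_high_2 <- intended[1] < intended[3]           # TRUE
```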

Note that setting the correct data type allows R to conduct the correct analysis. To demonstrate this,
please load college.csv from Blackboard and check the state field. It is stored as character data;
hence, when you obtain the summary statistics using the summary() function, it does not show
anything useful (only the length, class and mode of the vector). Compare the output of
the summary() function after you convert the state field into a factor, i.e. nominal (categorical) data.
college <- read.csv("college.csv")
college$state
summary(college$state)
my_states <- as.factor(college$state)
summary(my_states)

Dates

We can use as.Date() function to create a date. For example:


a_date <- as.Date("2021-10-14")

R provides several functions to work on date variables. For example:


weekdays(a_date)
what_month <- months(a_date)
which_q <- quarters(a_date)

You can also apply some arithmetic operators. For example:


next_week <- a_date + 7
last_week <- a_date - 7
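
Subtracting one date from another gives the number of days between them, and format() controls how a date is printed. A short sketch; the second date and the variable names are chosen only for illustration.

```r
a_date <- as.Date("2021-10-14")

next_week <- a_date + 7        # "2021-10-21"
which_q <- quarters(a_date)    # "Q4"

# Days from a_date to 25 December 2021: 17 + 30 + 25 = 72.
days_between <- as.numeric(as.Date("2021-12-25") - a_date)

# Print the date as day/month/year.
uk_style <- format(a_date, "%d/%m/%Y")   # "14/10/2021"
```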

The end
