
Lab 3 (Tutorial 1)

Lab lecture notes on the R language.

Uploaded by

neilzhaony
Copyright
© All Rights Reserved

Basic data analysis using R

MACC7006 Accounting Data and Analytics

Keri Hu, Lin Lin, Shuling Chen

Faculty of Business and Economics

Today: more basic commands in R

By the end of today’s tutorial, you should be able to:

• Combine vectors into a data frame, and add vectors to an existing data frame
• Understand some logical statements in R
• Generate subsets of data using logical operators and $
• Generate plots and tables

Recall: create vectors

Create two vectors, countries and life expectancy.

CountryName <- c("Brazil", "China", "India")
CountryName

## [1] "Brazil" "China" "India"

LifeExpectancy <- c(74, 75, 66)
LifeExpectancy

## [1] 74 75 66

Combine vectors into a data frame

We use the function data.frame().

CountryData <- data.frame(CountryName, LifeExpectancy)
CountryData

##   CountryName LifeExpectancy
## 1      Brazil             74
## 2       China             75
## 3       India             66

Add a vector to a data frame

We use $:
(name of the data frame)$(name of the new variable) <- (vector of values)

CountryData$Population <- c(199000000, 1390000000, 1240000000)
str(CountryData)

The $ operator selects a single variable (column) of a data frame; each column is stored as a vector with one value per row.
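A minimal sketch of using $ both to write a new column and to read one back, rebuilding the tutorial's CountryData. The PopulationMillions column is a hypothetical addition for illustration only. The assigned vector must supply one value per row (a single value is recycled to every row).

```r
# Rebuild the data frame from the tutorial
CountryName <- c("Brazil", "China", "India")
LifeExpectancy <- c(74, 75, 66)
CountryData <- data.frame(CountryName, LifeExpectancy)

# Write: add a new column with $ (one value per row)
CountryData$Population <- c(199000000, 1390000000, 1240000000)

# Read: $ returns the column as an ordinary vector
pop <- CountryData$Population

# Hypothetical derived column: population in millions
CountryData$PopulationMillions <- CountryData$Population / 1e6
CountryData$PopulationMillions
## [1]  199 1390 1240
```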

Some logical statements in R

==, !=, >, <, >=, <=, |, &

• Distinguish = (assignment) from == (comparison):
1==0
## [1] FALSE
• R is case sensitive.
"HKU" != "Hku"
## [1] TRUE

Some logical statements in R

==, !=, >, <, >=, <=, |, &

• |: OR

1==0 | "HKU" != "Hku"

## [1] TRUE

• &: AND
1==0 & "HKU" != "Hku"
## [1] FALSE
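The operators above also work on whole vectors: each comparison is applied element by element and returns one TRUE/FALSE per element. A small sketch, reusing the LifeExpectancy vector from earlier; the thresholds are arbitrary choices for illustration.

```r
LifeExpectancy <- c(74, 75, 66)

# Each comparison is applied element by element and returns a logical vector
LifeExpectancy >= 74
## [1]  TRUE  TRUE FALSE
LifeExpectancy != 75
## [1]  TRUE FALSE  TRUE

# & (AND) and | (OR) also combine logical vectors element by element
LifeExpectancy > 70 & LifeExpectancy < 75
## [1]  TRUE FALSE FALSE
LifeExpectancy < 70 | LifeExpectancy == 75
## [1] FALSE  TRUE  TRUE
```

These element-by-element results are what subset() and table() use later in this tutorial.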

The class() function returns the class (type) of an object

x = 5
class(x)

## [1] "numeric"

y = "I am studying data analytics."
class(y)

## [1] "character"

z = TRUE
class(z)

## [1] "logical"
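class() works the same way on the objects built earlier in this tutorial. A short sketch, rebuilding the vectors and data frame from the previous slides, showing that a comparison produces a logical result and that data.frame() produces an object of class "data.frame".

```r
LifeExpectancy <- c(74, 75, 66)
CountryData <- data.frame(CountryName = c("Brazil", "China", "India"), LifeExpectancy)

class(LifeExpectancy)          # a numeric vector
## [1] "numeric"
class(LifeExpectancy > 70)     # comparisons produce logical values
## [1] "logical"
class(CountryData)             # the combined object
## [1] "data.frame"
```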

Subset the data

It is often useful to create a subset of a data frame to be used in analysis or for building models.

• $ refers to a particular variable.

Example: Subset the data frame WHO to contain only countries in the Europe region with less than 14% of the population under 15 and a fertility rate below 1.5:

Europelowfertility <- subset(WHO, WHO$Region == "Europe" &
  WHO$Under15 < 14 & WHO$FertilityRate < 1.5)
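Since the WHO data set is not loaded in these notes, here is a sketch of the same pattern on the small CountryData frame from earlier; the conditions are chosen only for illustration. subset() keeps the rows where the combined condition is TRUE. Inside subset() the column names can also be written without the data-frame-name$ prefix, because the condition is evaluated within the data frame. The last lines show an equivalent selection using a logical row index with square brackets.

```r
CountryName <- c("Brazil", "China", "India")
LifeExpectancy <- c(74, 75, 66)
Population <- c(199000000, 1390000000, 1240000000)
CountryData <- data.frame(CountryName, LifeExpectancy, Population)

# Keep rows where both conditions are TRUE
BigLongLived <- subset(CountryData, LifeExpectancy > 70 & Population > 1e9)
nrow(BigLongLived)
## [1] 1

# Equivalent selection with a logical row index and $
SameRows <- CountryData[CountryData$LifeExpectancy > 70 & CountryData$Population > 1e9, ]
```

One practical difference: subset() drops rows where the condition is NA, whereas square-bracket indexing returns a row of NAs for them.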

Identify the extreme value

• The which.min() function returns the index of the observation with the minimum value of a variable; which.min(WHO$Under15) returns the index of the country with the lowest value of Under15.
• Looking up the country name of that observation (the 86th), we can see that the country is Japan.
• The which.max() function returns the index of the observation with the maximum value of the variable.
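A minimal sketch of the two-step pattern described above (find the index, then use it to look up the name), shown on the small CountryData frame rather than the WHO data.

```r
CountryName <- c("Brazil", "China", "India")
LifeExpectancy <- c(74, 75, 66)
CountryData <- data.frame(CountryName, LifeExpectancy)

# Position (row index) of the smallest and largest values
which.min(CountryData$LifeExpectancy)
## [1] 3
which.max(CountryData$LifeExpectancy)
## [1] 2

# Use the index to look up the matching country name (India here)
CountryData$CountryName[which.min(CountryData$LifeExpectancy)]
```

If several observations tie for the minimum or maximum, these functions return the first one; missing values are ignored.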

Basic plots in R

A scatterplot with axis labels and a title (shows the relationship between two variables):

plot(WHO$GNI, WHO$FertilityRate, ylab = "Fertility rate",
     xlab = "GNI", main = "WHO data")

Basic plots in R

A histogram (shows the distribution of a single variable):

hist(WHO$CellularSubscribers)
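Each bar of a histogram is the number of observations falling in one interval (bin). A small sketch with made-up values: passing plot = FALSE makes hist() return the binning without drawing it, so the counts behind the bars can be inspected. The breaks are chosen by hand here; by default R picks them automatically, and intervals are closed on the right, (a, b], with the lowest one also including its left end.

```r
x <- c(1, 2, 2, 3, 3, 3)   # hypothetical values for illustration

# plot = FALSE returns the binning without drawing the chart
h <- hist(x, breaks = c(0, 1, 2, 3), plot = FALSE)
h$counts
## [1] 1 2 3
```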

Basic plots in R

A box plot of LifeExpectancy grouped by Region (shows the median, spread, and outliers within each region):
boxplot(WHO$LifeExpectancy ~ WHO$Region)

Summary tables in R

Count the number of observations in each category of Region:

table(WHO$Region)
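A sketch of what table() computes, using a short hypothetical Region vector (not the WHO data): it counts how many times each distinct value occurs and lists the categories in sorted order.

```r
Region <- c("Europe", "Africa", "Europe", "Americas", "Africa", "Europe")  # hypothetical

# One count per distinct value: Africa 2, Americas 1, Europe 3
counts <- table(Region)
counts

# A single count can be read by name
counts["Europe"]
```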

Summary tables in R

Use logical statements to count observations

• How many countries have a life expectancy greater than 75?

table(WHO$LifeExpectancy > 75)
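Counting with a logical statement works because the comparison produces one TRUE/FALSE per observation, and table() then counts the TRUEs and FALSEs. A sketch on the LifeExpectancy vector from earlier (a threshold of 70 is used so that both outcomes appear). sum() gives the TRUE count directly, since R treats TRUE as 1 and FALSE as 0 in arithmetic.

```r
LifeExpectancy <- c(74, 75, 66)

# One TRUE/FALSE per country; table() counts each value (FALSE 1, TRUE 2)
table(LifeExpectancy > 70)

# sum() counts the TRUEs directly
sum(LifeExpectancy > 70)
## [1] 2
```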

Summary tables in R

A two-dimensional table to compare values of two variables:

• How does the life expectancy vary across regions?

table(WHO$Region, WHO$LifeExpectancy > 75)
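A two-dimensional table cross-counts two variables: rows are the categories of the first argument and columns are the values of the second. A sketch with a few hypothetical observations (not the WHO data), where the second variable is again a logical comparison.

```r
# Hypothetical toy data (not the WHO data set)
Region <- c("Europe", "Africa", "Europe", "Africa", "Americas")
LifeExpectancy <- c(81, 62, 78, 77, 74)

# Rows: Africa, Americas, Europe; columns: FALSE, TRUE
tab <- table(Region, LifeExpectancy > 75)
tab

# Number of European countries with life expectancy above 75
tab["Europe", "TRUE"]
```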

Summary tables in R

Compute the mean of Over60 (% of population over 60) for each Region:
tapply(WHO$Over60, WHO$Region, mean)
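tapply(X, INDEX, FUN) splits X into groups by the values of INDEX and applies FUN to each group. A sketch with hypothetical values (not the WHO data); the last line computes one group by hand to show it matches tapply()'s result.

```r
Region <- c("Europe", "Africa", "Europe", "Africa", "Americas")
Over60 <- c(24, 5, 20, 7, 12)   # hypothetical

# One mean per region: Africa 6, Americas 12, Europe 22
means <- tapply(Over60, Region, mean)
means

# The same value for one group, computed directly
mean(Over60[Region == "Europe"])
## [1] 22
```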

Summary tables in R

Compute the minimum of LiteracyRate for each Region:

• Without na.rm, any region with a missing LiteracyRate value returns NA:
tapply(WHO$LiteracyRate, WHO$Region, min)
• With na.rm=TRUE, missing values are removed before the minimum is computed:
tapply(WHO$LiteracyRate, WHO$Region, min, na.rm=TRUE)
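A sketch of the difference, using hypothetical values with one missing entry (not the WHO data). Extra arguments after FUN, such as na.rm = TRUE, are passed on to min() for every group.

```r
Region <- c("Europe", "Africa", "Europe", "Africa")
LiteracyRate <- c(99, NA, 98, 71)   # hypothetical, with one missing value

with_na <- tapply(LiteracyRate, Region, min)                   # Africa: NA, Europe: 98
without_na <- tapply(LiteracyRate, Region, min, na.rm = TRUE)  # Africa: 71, Europe: 98
with_na
without_na
```

If every value in a group were missing, min(..., na.rm = TRUE) would return Inf with a warning for that group.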

Here are the commands/operators we covered today:

• data.frame()
• $
• ==, !=, >, <, >=, <=, |, &
• class()
• subset()
• which.min(), which.max()
• plot(), hist(), boxplot()
• table(), tapply()

Acknowledgement

I received enormous help from Xiaowei Zhang and Zack Goodman in developing this course and preparing the relevant materials.
