Lab 3 (Tutorial 1)

lab lecture notes of R language.

Basic data analysis using R

MACC7006 Accounting Data and Analytics

Keri Hu, Lin Lin, Shuling Chen

Faculty of Business and Economics

Today: more basic commands in R

By the end of today’s tutorial, you should be able to:

• Combine/add vectors into a data frame
• Understand some logical statements in R
• Generate subsets of data using logical operators and $
• Generate plots and tables

Recall: create vectors

Create two vectors: country names and life expectancy.

CountryName <- c("Brazil", "China", "India")


CountryName

## [1] "Brazil" "China" "India"

LifeExpectancy <- c(74, 75, 66)


LifeExpectancy

## [1] 74 75 66

Combine vectors into a data frame

We use the function data.frame().

CountryData <- data.frame(CountryName, LifeExpectancy)


CountryData

##   CountryName LifeExpectancy
## 1      Brazil             74
## 2       China             75
## 3       India             66

Add a vector to a data frame

We use $:
name of the data frame $ name of the new variable <- vector of values

CountryData$Population <- c(199000000, 1390000000, 1240000000)
str(CountryData)

The $ operator refers to a single variable (column) of a data frame, which is itself a vector.
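
As a minimal sketch of the $ syntax, the example below rebuilds the small data frame from the earlier slides, adds Population, and then derives one more column. The name LifeExpectancyMonths is made up for illustration and does not appear in the course material.

```r
# Rebuild the small data frame from the earlier slides.
CountryData <- data.frame(
  CountryName = c("Brazil", "China", "India"),
  LifeExpectancy = c(74, 75, 66)
)

# Add a column: the vector needs one value per row.
CountryData$Population <- c(199000000, 1390000000, 1240000000)

# Read a column back as a plain vector.
CountryData$LifeExpectancy

# Derive a new column from an existing one (hypothetical column name).
CountryData$LifeExpectancyMonths <- CountryData$LifeExpectancy * 12

# Inspect the structure: 3 observations of 4 variables.
str(CountryData)
```

Assigning to a name that already exists in the data frame overwrites that column instead of adding a new one.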

Some logical statements in R

==, !=, >, <, >=, <=, |, &

• Differentiate: = (assignment) versus == (comparison)


1==0
## [1] FALSE
• R is case sensitive.
"HKU" != "Hku"
## [1] TRUE

Some logical statements in R

==, !=, >, <, >=, <=, |, &

• |: OR

1==0 | "HKU" != "Hku"

## [1] TRUE

• &: AND
1==0 & "HKU" != "Hku"
## [1] FALSE
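
The same operators also work element by element on whole vectors, which is what makes them useful for subsetting later on. A small sketch, reusing the LifeExpectancy values from the earlier slide (the variable names above70, below75, both and either are made up for illustration):

```r
LifeExpectancy <- c(74, 75, 66)

# Each comparison is applied to every element of the vector.
above70 <- LifeExpectancy > 70   # TRUE TRUE FALSE
below75 <- LifeExpectancy < 75   # TRUE FALSE TRUE

# & and | combine two logical vectors element by element.
both   <- above70 & below75      # TRUE FALSE FALSE
either <- above70 | below75      # TRUE TRUE TRUE

both
either
```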

The class function outputs the class of the object

x = 5
class(x)

## [1] "numeric"

y = "I am studying data analytics."


class(y)

## [1] "character"

z = TRUE
class(z)

## [1] "logical"
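
class() can also be applied to a data frame, to one of its columns, or to the result of a comparison. A short sketch using the small data frame from the earlier slides:

```r
CountryData <- data.frame(
  CountryName = c("Brazil", "China", "India"),
  LifeExpectancy = c(74, 75, 66)
)

class(CountryData)                       # "data.frame"
class(CountryData$LifeExpectancy)        # "numeric"
class(CountryData$LifeExpectancy > 70)   # "logical"
```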

Subset the data

It is often useful to create a subset of a data frame to be used in analysis or for building models.

• $ refers to a particular variable.

Example: Subset the data frame WHO to contain only countries in the Europe region where the % of population under 15 is < 14 and the fertility rate is < 1.5.

Europelowfertility <- subset(WHO, WHO$Region == "Europe" &
                             WHO$Under15 < 14 & WHO$FertilityRate < 1.5)
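
To try the same pattern without the WHO file, here is a sketch on a toy data frame. The data frame Toy and all its values are made up for illustration and are not real WHO figures; only the column names mirror the ones used above.

```r
# Toy data with made-up values; these are NOT real WHO figures.
Toy <- data.frame(
  Country = c("A", "B", "C", "D"),
  Region = c("Europe", "Europe", "Africa", "Europe"),
  Under15 = c(13.1, 16.0, 40.2, 13.5),
  FertilityRate = c(1.4, 1.9, 5.1, 1.6)
)

# Keep only the rows where all three conditions are TRUE.
LowFertility <- subset(Toy, Toy$Region == "Europe" &
                         Toy$Under15 < 14 & Toy$FertilityRate < 1.5)

# The same selection with bracket indexing: a logical vector picks the rows,
# and the empty slot after the comma keeps all columns.
keep <- Toy$Region == "Europe" & Toy$Under15 < 14 & Toy$FertilityRate < 1.5
LowFertility2 <- Toy[keep, ]

LowFertility
```

The two forms agree here because no values are missing. When the condition evaluates to NA for a row, subset() drops that row, while bracket indexing returns a row of NAs for it.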

Identify the extreme value

• The which.min() function returns the index of the observation with the minimum value of a variable, here Under15.
• In the WHO data this index is 86. Looking up the country name for the 86th observation shows that the country is Japan.
• The which.max() function returns the index of the observation with the maximum value of the variable.
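
The two steps described above (find the index, then look up the name at that index) can be sketched on toy data. The data frame Toy, its values, and the column name Country are assumptions made for illustration; the actual WHO file may name its columns differently.

```r
# Toy data with made-up values; these are NOT real WHO figures.
Toy <- data.frame(
  Country = c("A", "B", "C", "D"),
  Under15 = c(13.1, 16.0, 40.2, 13.5)
)

# Position (row index) of the smallest and largest Under15 value.
i_min <- which.min(Toy$Under15)
i_max <- which.max(Toy$Under15)

# Use the index inside square brackets to look up the matching country.
Toy$Country[i_min]
Toy$Country[i_max]
```

If several observations tie for the minimum or maximum, which.min() and which.max() return only the first of them.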

Basic plots in R

A scatterplot with options (show the relationship):


plot(WHO$GNI, WHO$FertilityRate, ylab = "Fertility rate",
xlab = "GNI", main = "WHO data")

Basic plots in R

A histogram (show the distribution):


hist(WHO$CellularSubscribers)

Basic plots in R

A box plot of LifeExpectancy with observations grouped by Region (show the statistical range):

boxplot(WHO$LifeExpectancy ~ WHO$Region)
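
The three plot types can be tried without the WHO file. The sketch below uses a toy data frame whose values are made up for illustration and are not real WHO figures. The names h and b are also illustrative; they store what hist() and boxplot() return in addition to drawing the plot.

```r
# Toy data with made-up values; these are NOT real WHO figures.
Toy <- data.frame(
  Region = c("Europe", "Europe", "Europe", "Africa", "Africa", "Africa"),
  GNI = c(30000, 42000, 25000, 1500, 3200, 900),
  FertilityRate = c(1.4, 1.9, 1.6, 5.1, 4.3, 6.0),
  LifeExpectancy = c(80, 82, 78, 55, 61, 58)
)

# Scatterplot: x variable first, then y variable.
plot(Toy$GNI, Toy$FertilityRate, xlab = "GNI",
     ylab = "Fertility rate", main = "Toy data")

# Histogram: h$counts holds the number of observations in each bin.
h <- hist(Toy$LifeExpectancy)

# Box plot: the formula y ~ group draws one box per group.
b <- boxplot(Toy$LifeExpectancy ~ Toy$Region)
b$names   # group labels, in the order the boxes are drawn
b$n       # number of observations in each group
```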

Summary tables in R

Summary of the variable Region:


table(WHO$Region)

Summary tables in R

Use logical statements to count observations

• How many countries have a life expectancy greater than 75?

table(WHO$LifeExpectancy > 75)
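
This works because the comparison produces one TRUE or FALSE per observation, and table() counts each. A sketch on a made-up vector (the values are illustrative, not real WHO figures) also shows that sum() and mean() treat TRUE as 1 and FALSE as 0:

```r
# Made-up life expectancy values for illustration only.
LifeExpectancy <- c(80, 82, 78, 55, 61, 58)

# Counts of FALSE and TRUE.
counts <- table(LifeExpectancy > 75)
counts

# TRUE counts as 1, so sum() gives the number of TRUE values
# and mean() gives their share.
n_above <- sum(LifeExpectancy > 75)
share_above <- mean(LifeExpectancy > 75)
```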

Summary tables in R

A two-dimensional table to compare values of two variables:

• How does the life expectancy vary across regions?

table(WHO$Region, WHO$LifeExpectancy > 75)
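
In the result, the first argument defines the rows and the second the columns. A sketch on toy data (made-up values, not real WHO figures); naming the arguments, here Region and Above75, is optional and only labels the two dimensions of the output:

```r
# Toy data with made-up values; these are NOT real WHO figures.
Toy <- data.frame(
  Region = c("Europe", "Europe", "Europe", "Africa", "Africa", "Africa"),
  LifeExpectancy = c(80, 82, 74, 55, 61, 76)
)

# Rows: regions. Columns: FALSE / TRUE for the comparison.
tab <- table(Region = Toy$Region, Above75 = Toy$LifeExpectancy > 75)
tab

# Individual cells can be read by row and column label.
tab["Europe", "TRUE"]
tab["Africa", "TRUE"]
```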

Summary tables in R

Compute the mean of Over60 (% of population over 60), with observations grouped by Region:

tapply(WHO$Over60, WHO$Region, mean)
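
The three arguments are the values, the grouping variable, and the function to apply within each group. A sketch on toy data (made-up values, not real WHO figures; the name means is illustrative):

```r
# Toy data with made-up values; these are NOT real WHO figures.
Toy <- data.frame(
  Region = c("Europe", "Europe", "Africa", "Africa"),
  Over60 = c(22, 24, 5, 7)
)

# Split Over60 by Region, then take the mean of each group.
means <- tapply(Toy$Over60, Toy$Region, mean)
means

# The result is labelled by group, so one group can be picked by name.
means[["Europe"]]
```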

Summary tables in R

Compute the minimum of LiteracyRate, grouped by Region:

• Include missing values of LiteracyRate (any region with a missing value returns NA):
tapply(WHO$LiteracyRate, WHO$Region, min)
• Exclude missing values of LiteracyRate:
tapply(WHO$LiteracyRate, WHO$Region, min, na.rm=TRUE)
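
The effect of na.rm can be seen on a toy data frame with one missing value. The values are made up for illustration and are not real WHO figures; the names with_na and without_na are illustrative.

```r
# Toy data with made-up values; these are NOT real WHO figures.
Toy <- data.frame(
  Region = c("Europe", "Europe", "Africa", "Africa"),
  LiteracyRate = c(99, NA, 60, 70)
)

# Without na.rm, one NA in a group makes that group's minimum NA.
with_na <- tapply(Toy$LiteracyRate, Toy$Region, min)
with_na

# na.rm = TRUE is passed on to min(), which then ignores the NA.
without_na <- tapply(Toy$LiteracyRate, Toy$Region, min, na.rm = TRUE)
without_na
```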

Here are the commands/operators we covered today:

• data.frame()
• $
• ==, !=, >, <, >=, <=, |, &
• class()
• subset()
• which.min(), which.max()
• plot(), hist(), boxplot()
• table(), tapply()

Acknowledgement

I received enormous help from Xiaowei Zhang and Zack Goodman in developing this course and preparing relevant materials.
