DSC551: Programming for Data Science (R Programming)
3. Data Management in R Programming

Lecturer, Department of Statistics


2024-10-01

Asmui Abd Rahim, DSC551:R , Oct 2024


Introduction
Aims:

1. Explains how to read data from various sources and how to write data to CSV and text files.
2. Discusses how to identify, handle, and replace missing values in datasets.
3. Explores converting data between different classes and highlights the importance of understanding coercion rules.
4. Demonstrates how to combine datasets and how to select specific variables or observations.
5. Illustrates how to sort data in ascending or descending order.



1. Getting data in and out
Reading data in R.
Reading data from external files.
Reading data with read.table(), read.delim() and read.csv().
Writing data into CSV and text files.

Read data into R from:
- text files (.txt, .csv)
- MS Excel files (.xlsx)
- SPSS files (.sav)

Write data from R to:
- CSV files (.csv)
- text files (.txt)
- Excel files
- SPSS files (.sav)



Some functions

Function          Explanation
data()            Lists the datasets available in R.
data(trees)       Loads the built-in trees dataset.
edit(trees)       Invokes a text editor on an R object.
str(trees)        Displays the internal structure of an R object.
colMeans(trees)   Computes column means.
rowMeans(trees)   Computes row means.

Warning

Make sure you close the edit() window before you run the next function or expression. The console cannot run anything else until the window is closed. Note that edit(trees) does not change trees itself; assign the result (e.g., trees2 <- edit(trees)) to keep your changes.
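The non-interactive functions in the table can be tried together in one short script. This is a minimal sketch; edit() is left commented out because it opens a window and waits for you to close it.

```r
# Load the built-in trees dataset (31 trees, 3 numeric variables)
data(trees)

str(trees)                      # 31 obs. of 3 variables: Girth, Height, Volume
col_means <- colMeans(trees)    # one mean per column (variable)
row_means <- rowMeans(trees)    # one mean per row (tree)

col_means
head(row_means)

# edit() is interactive and returns an edited copy; trees itself is unchanged
# trees_edited <- edit(trees)
```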



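Read Data
Example using read.delim("clipboard")
Step 1: Type the data in an Excel file first, then copy it (this saves it to the clipboard).
id Age Gender Height Weight
1 1 25 female 158 55
2 2 24 male 172 70
3 3 27 male 181 64

Step 2: Use read.delim("clipboard")

dt.char <- read.delim("clipboard")   # "clipboard" works on Windows; on macOS use read.delim(pipe("pbpaste"))
dt.char

# attach the data so its columns can be referred to by name alone
attach(dt.char)

# demonstrate with and without attach
mean(dt.char$Age)   # without attach: data frame name, $, column name
mean(Age)           # with attach: the column name alone is enough
detach(dt.char)     # detach when finished to avoid name clashes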
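The clipboard is not available in every setting, for example when a script runs non-interactively. In that case you can pass the same tab-separated text to read.delim() through its text argument. The sketch below assumes the three rows from Step 1. It also uses with(), which evaluates an expression inside a data frame without calling attach().

```r
# Tab-separated text, as it would appear on the clipboard after copying from Excel
pasted <- "id\tAge\tGender\tHeight\tWeight
1\t25\tfemale\t158\t55
2\t24\tmale\t172\t70
3\t27\tmale\t181\t64"

# read.delim() defaults: header = TRUE, sep = "\t"
dt.char <- read.delim(text = pasted)
str(dt.char)

# with() looks up Age inside dt.char, so attach()/detach() are not needed
with(dt.char, mean(Age))
```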



Example: reading data with the read.table() function.

?read.table   # open the help page for read.table()

file: the name of the file, or a connection (which will be opened for reading if necessary).
header: a logical value indicating whether the file contains the names of the variables as its first line.
sep: the field separator character. Values on each line of the file are separated by this character. If sep = "" (the default for read.table) the separator is ‘white space’, that is one or more spaces, tabs, newlines or carriage returns.
col.names: a vector of optional names for the variables. The default is to use "V" followed by the column number.
stringsAsFactors: should character variables be converted to factors? The default is FALSE (since R 4.0.0; it was TRUE in earlier versions).
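The sketch below shows how these arguments interact. It uses three columns from the first three rows of telco.csv, passed as whitespace-separated text with no header line so that the file does not need to be downloaded.

```r
# Whitespace-separated data without a header line
raw_text <- "Male Statistics 14.6
Female Business 15.7
Female Sciences 14.8"

# header = FALSE: the first line is data, so the default names V1, V2, V3 are used
d1 <- read.table(text = raw_text, header = FALSE)
names(d1)

# col.names supplies our own names; stringsAsFactors = TRUE turns text columns into factors
d2 <- read.table(text = raw_text, header = FALSE, sep = "",
                 col.names = c("Gender", "Programs", "Usage_GB"),
                 stringsAsFactors = TRUE)
str(d2)
```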



To read data from a CSV file:
For example, download the CSV file telco.csv from my ufuture files folder.

mydata1 <- read.csv("telco.csv")
mydata1
Gender Programs Car_Ownership Telco_Prefer Usage_GB Hour_Perday
1 Male Statistics Yes Celcom 14.6 3.0
2 Female Business Yes DiGi 15.7 3.8
3 Female Sciences No Celcom 14.8 4.0
4 Female Statistics No Maxis 15.4 3.5
5 Female Sciences No U-Mobile 12.9 3.0
6 Female Sciences Yes Celcom 22.4 6.0
7 Male Account Yes Maxis 28.0 6.5
8 Male Statistics No DiGi 19.2 5.5
9 Female Statistics Yes DiGi 25.4 6.0
10 Female Sciences No Maxis 25.3 6.0
11 Female Account Yes U-Mobile 5.0 1.5
12 Male Business Yes DiGi 27.5 6.0
13 Female Sciences Yes Celcom 9.8 3.0
14 Female Account Yes Maxis 8.3 2.0
15 Female Business Yes DiGi 23.5 6.0
16 Male Statistics No DiGi 23.5 6.2
17 Male Business Yes Maxis 15.2 3.5
18 Male Sciences No DiGi 13.7 3.4
19 Male Sciences Yes Maxis 28.2 6.0
20 Female Statistics No Celcom 30.4 8.0
21 Female Account No DiGi 32.4 8.0
22 Female Account No Maxis 17.9 4.5
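(Output truncated: only the first 22 of the 45 rows are shown.)
If you use read.table() instead, you need to specify header=TRUE and sep=",", because read.table() defaults to header=FALSE and to white space as the separator.

mydata2 <- read.table("telco.csv", header=TRUE, sep=",")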
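read.csv() is read.table() with CSV-friendly defaults. This minimal sketch checks that the two calls give the same result. It writes a small sample to a temporary file first, so it does not depend on telco.csv being present.

```r
# Write a small CSV to a temporary file so the example is self-contained
tmp <- tempfile(fileext = ".csv")
sample_df <- data.frame(Gender = c("Male", "Female", "Female"),
                        Usage_GB = c(14.6, 15.7, 14.8))
write.csv(sample_df, tmp, row.names = FALSE)

# read.csv() assumes header = TRUE and sep = ","
a <- read.csv(tmp)
# read.table() needs both settings spelled out
b <- read.table(tmp, header = TRUE, sep = ",")

all.equal(a, b)   # TRUE: both calls produce the same data frame
```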


You can use file.choose() to open a window where you can browse your computer and select the file directly.

mydata1 <- read.csv(file.choose())

Tip

To read data from Excel files, you can use the readxl or openxlsx package, or the RStudio menu File > Import Dataset >…

Warning

Make sure you close the window opened by file.choose() before you run the next line of your R script. The console cannot run the next expression until the window is closed.



Write Data
To export data from R to a CSV file:

data.trees <- trees
write.csv(data.trees, "trees.csv")

# write the CSV without row names
write.csv(data.trees, "trees2.csv", row.names = FALSE)

# tab-separated text file; col.names = NA adds a blank header cell above the row names
write.table(data.trees, "trees.txt", sep="\t", col.names=NA)
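This sketch shows what row.names = FALSE changes. It writes the built-in trees data to temporary files (created with tempfile(), so nothing is left in the working directory) and then reads them back.

```r
# Write trees twice: with the default row names and without them
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
write.csv(trees, f1)                     # default row.names = TRUE
write.csv(trees, f2, row.names = FALSE)  # no row-name column

back1 <- read.csv(f1)
back2 <- read.csv(f2)

names(back1)   # "X" "Girth" "Height" "Volume": the row names come back as a column X
names(back2)   # "Girth" "Height" "Volume"

# If a file does contain row names, row.names = 1 turns that column back into row names
back3 <- read.csv(f1, row.names = 1)
ncol(back3)    # 3
```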

Tip
# Directly export to Excel format
library(openxlsx)
write.xlsx(your_data_frame, file = "my_data.xlsx")

# Export to SPSS statistical software
library(haven)
write_sav(your_data_frame, "my_data.sav")



2. Dealing with Missing Values
A missing value is denoted by NA, while NaN ("Not a Number") is the result of an undefined mathematical operation such as 0/0.
is.na() tests which elements of an object are NA (it also returns TRUE for NaN).
is.nan() tests which elements are NaN.

vec1 <- c(1, 5, 7, NA, 10, 3)
vec2 <- c(NaN, 5, 7, NA, 10, 3)

# return a logical vector indicating which elements are NA
is.na(vec1)
[1] FALSE FALSE FALSE TRUE FALSE FALSE
is.na(vec2)
[1] TRUE FALSE FALSE TRUE FALSE FALSE
# return a logical vector indicating which elements are NaN
is.nan(vec1)
[1] FALSE FALSE FALSE FALSE FALSE FALSE
is.nan(vec2)
[1] TRUE FALSE FALSE FALSE FALSE FALSE
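A few one-line checks make the difference between NA and NaN concrete. Counting with sum() works because TRUE counts as 1 and FALSE as 0.

```r
vec2 <- c(NaN, 5, 7, NA, 10, 3)

0 / 0              # NaN: an undefined arithmetic result
is.na(NaN)         # TRUE: NaN is also treated as missing
is.nan(NA)         # FALSE: NA is not NaN
NA > 1             # NA: a comparison involving a missing value is also missing

sum(is.na(vec2))   # 2: both the NaN and the NA are counted
sum(is.nan(vec2))  # 1: only the NaN is counted
```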



Note

You can use the is. functions to test an object's type or structure, for example is.numeric(), is.matrix() and is.data.frame().

Arithmetic expressions and functions applied to data that contain missing values return missing values.

sum(vec1)
[1] NA

Add the option na.rm=TRUE to remove missing values before the calculation; the function is then applied to the remaining values.

sum(vec1, na.rm=TRUE)
[1] 26

You can remove every observation with missing data by using na.omit().
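For a vector, na.omit() returns the non-missing values. The result also carries an "na.action" attribute that records what was dropped. A minimal sketch using vec1:

```r
vec1 <- c(1, 5, 7, NA, 10, 3)

clean <- na.omit(vec1)        # drops the NA and records it in attr(clean, "na.action")
as.vector(clean)              # 1 5 7 10 3: attributes removed
mean(vec1, na.rm = TRUE)      # 5.2, the same as mean(clean)
```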



Replacing 99 with NA.

w <- c(-2, 0, 4, 99, 8)

# replace 99 with NA
w[w==99] <- NA
w
[1] -2 0 4 NA 8

Tracing missing values

is.na(w)
[1] FALSE FALSE FALSE TRUE FALSE
# count the number of NAs
sum(is.na(w))
[1] 1



If missing values are coded as 99:

w <- c(-2, 99, 4, 99, 8, 10, 11, 20, 99)
mean(w)
[1] 38.66667

This mean is distorted because the three 99 codes are treated as real values. Recode them as NA first:

w[w==99] <- NA
w
[1] -2 NA 4 NA 8 10 11 20 NA
mean(w, na.rm=TRUE)
[1] 8.5
# display only the non-missing values
w[!is.na(w)]
[1] -2 4 8 10 11 20
# or
w[complete.cases(w)]
[1] -2 4 8 10 11 20



More examples

xx <- airquality[1:10,]
xx
Ozone Solar.R Wind Temp Month Day
1 41 190 7.4 67 5 1
2 36 118 8.0 72 5 2
3 12 149 12.6 74 5 3
4 18 313 11.5 62 5 4
5 NA NA 14.3 56 5 5
6 28 NA 14.9 66 5 6
7 23 299 8.6 65 5 7
8 19 99 13.8 59 5 8
9 8 19 20.1 61 5 9
10 NA 194 8.6 69 5 10
colMeans(xx)
Ozone Solar.R Wind Temp Month Day
NA NA 11.98 65.10 5.00 5.50
colMeans(xx, na.rm=TRUE)
Ozone Solar.R Wind Temp Month Day
23.125 172.625 11.980 65.100 5.000 5.500



xx[complete.cases(xx),]
Ozone Solar.R Wind Temp Month Day
1 41 190 7.4 67 5 1
2 36 118 8.0 72 5 2
3 12 149 12.6 74 5 3
4 18 313 11.5 62 5 4
7 23 299 8.6 65 5 7
8 19 99 13.8 59 5 8
9 8 19 20.1 61 5 9
na.omit(xx)
Ozone Solar.R Wind Temp Month Day
1 41 190 7.4 67 5 1
2 36 118 8.0 72 5 2
3 12 149 12.6 74 5 3
4 18 313 11.5 62 5 4
7 23 299 8.6 65 5 7
8 19 99 13.8 59 5 8
9 8 19 20.1 61 5 9
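Before you drop rows, it helps to know how many values are missing in each column and how many rows are incomplete. The sketch below uses the same xx. It also shows why sentinel codes should be recoded column by column: in row 8, Solar.R = 99 is a genuine measurement.

```r
xx <- airquality[1:10, ]

colSums(is.na(xx))         # number of NAs in each column
sum(!complete.cases(xx))   # number of rows with at least one NA
nrow(na.omit(xx))          # rows left after dropping the incomplete ones

# Caution: xx$Solar.R[8] is a real value of 99, so a blanket xx[xx == 99] <- NA
# would wrongly delete it. Recode only columns where 99 is known to mean "missing".
xx$Solar.R[8]
```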



3. Coercion
What happens when R objects of different classes are mixed together?

y1 <- c(1.7, "a")
y2 <- c(TRUE, 2)
y3 <- c("a", T)

class(y1)
[1] "character"
class(y2)
[1] "numeric"
class(y3)
[1] "character"

Coercion occurs so that every element in the vector has the same class.
R tries to find a way to represent all of the objects in the vector in a reasonable fashion.
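Implicit coercion follows a fixed order: logical < integer < double (numeric) < complex < character. A combined vector takes the highest type present. These checks trace each step of that order:

```r
class(c(TRUE, 2L))     # "integer": logical -> integer (TRUE becomes 1L)
class(c(2L, 1.5))      # "numeric": integer -> double
class(c(1.5, 2i))      # "complex": double -> complex
class(c(2i, "a"))      # "character": everything becomes text
c(TRUE, FALSE, 2)      # 1 0 2
c(1.7, "a")            # "1.7" "a"
```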



Objects can be explicitly coerced from one class to another using the as. functions, if available:

x1 <- 1:6
class(x1)
[1] "integer"
as.numeric(x1)
[1] 1 2 3 4 5 6
as.logical(x1)   # any non-zero number becomes TRUE (0 would become FALSE)
[1] TRUE TRUE TRUE TRUE TRUE TRUE
as.character(x1)
[1] "1" "2" "3" "4" "5" "6"



Sometimes R cannot figure out how to coerce an object, and NAs are produced instead (as.numeric(), for example, also gives the warning "NAs introduced by coercion").

x2 <- c("a", "b", "c")

as.numeric(x2)
[1] NA NA NA
as.logical(x2)
[1] NA NA NA
as.complex(x2)
[1] NA NA NA
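Real data often mix values that convert cleanly with values that do not. In that case only the failures become NA. The sketch below uses made-up strings, counts the failures, and shows which strings as.logical() recognises.

```r
x3 <- c("1", "2.5", "three", "TRUE")

# Only strings that look like numbers convert; suppressWarnings() hides the coercion warning
nums <- suppressWarnings(as.numeric(x3))
nums                 # 1.0 2.5 NA NA

# as.logical() recognises "TRUE"/"T"/"true"/"True" and the matching FALSE forms only
as.logical(c("TRUE", "T", "yes", "false"))   # TRUE TRUE NA FALSE

# Count how many values failed to convert
sum(is.na(nums))     # 2
```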



Changing object type

as.numeric()     # change object type to numeric
as.character()   # change object type to character
as.logical()     # change object type to logical
as.integer()     # change object type to integer

Changing object structure

as.factor()      # change object structure to factor
as.vector()      # change object structure to vector
as.matrix()      # change object structure to matrix
as.array()       # change object structure to array
as.data.frame()  # change object structure to data frame
as.list()        # change object structure to list



Example of change of object structure

mat1 <- matrix(1:20, nrow=5, ncol=4)
vec3 <- as.vector(mat1)   # elements are read column by column
df1 <- as.data.frame(mat1)

print(mat1)
[,1] [,2] [,3] [,4]
[1,] 1 6 11 16
[2,] 2 7 12 17
[3,] 3 8 13 18
[4,] 4 9 14 19
[5,] 5 10 15 20
print(vec3)
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
print(df1)
V1 V2 V3 V4
1 1 6 11 16
2 2 7 12 17
3 3 8 13 18
4 4 9 14 19
5 5 10 15 20

Note

You can check the structure of an object by using the str() function.
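Applied to the three objects from the example above, str() shows how each conversion changed the structure. The last step shows that as.matrix() turns the all-numeric data frame back into a matrix.

```r
mat1 <- matrix(1:20, nrow = 5, ncol = 4)
vec3 <- as.vector(mat1)
df1  <- as.data.frame(mat1)

str(mat1)   # int [1:5, 1:4] 1 2 3 4 5 ...: an integer matrix
str(vec3)   # int [1:20] 1 2 3 4 5 ...: a plain integer vector
str(df1)    # 'data.frame': 5 obs. of 4 variables: V1, V2, V3, V4

# A data frame of numbers can be turned back into a matrix
back <- as.matrix(df1)
dim(back)   # 5 4
```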


Conversion example



Why use it? Sometimes R needs data in a specific format to perform certain operations.
Explicit coercion ensures your data is compatible and consistent for your analysis.
How it works: You use a function like as.numeric() to tell R, “Hey, treat this data as a
number.”
Example: Let’s say you have a variable storing the value “10” as a character string. To use it
in a calculation, you’d use as.numeric("10") to convert it to a numeric data type.

Warning

While powerful, explicit coercion requires careful consideration. Forcing a conversion when
it’s not appropriate can lead to data loss or unexpected results. Always make sure the
conversion makes sense in the context of your data and analysis.
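Here is the "10" example run in R. It is followed by a common case of "unexpected results": as.numeric() on a factor returns the internal level codes, not the numbers that are printed. The factor values below are made up for illustration.

```r
x <- "10"
as.numeric(x) + 5                   # 15; "10" + 5 without conversion is an error

f <- factor(c("10", "20", "20", "5"))
levels(f)                           # "10" "20" "5": levels are sorted as text
as.numeric(f)                       # 1 2 2 3: level codes, not the numbers
as.numeric(as.character(f))         # 10 20 20 5: convert via character first
```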



4. Merging and Subsetting Datasets
If your data exist in multiple locations, you will need to combine them.
Adding columns (variables) to a data frame.
Use merge() to join two data frames on one or more common key variables. By default this is an inner join: only rows whose key appears in both data frames are kept.

stud_score <- merge(dataframeA, dataframeB, by="ID")
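dataframeA and dataframeB above are placeholders. This sketch uses two small hypothetical data frames that share an ID column. It shows the default inner join, and a full outer join with all = TRUE.

```r
# Hypothetical example data sharing the key column ID
scores  <- data.frame(ID = c(1, 2, 3), Test = c(78, 85, 90))
details <- data.frame(ID = c(2, 3, 4), Name = c("Ali", "Siti", "Wong"))

# Inner join (default): only IDs present in both data frames (2 and 3)
inner <- merge(scores, details, by = "ID")
inner

# Full outer join: keep every ID and fill the gaps with NA
outer <- merge(scores, details, by = "ID", all = TRUE)
outer
```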

If you don't need to match on a common key (the rows are already in the same order), you can use the cbind() function.

A <- data.frame(V1=c("a", "b"), V2=c(10, 25))
B <- data.frame(Var1=c(100, 400), Var2=c("DD", "GG"))

AB <- cbind(A,B)
print(AB)
V1 V2 Var1 Var2
1 a 10 100 DD
2 b 25 400 GG

You can add a variable using the $ operator.

AB$Var3 <- c(8.5, 7.5)
AB
V1 V2 Var1 Var2 Var3
1 a 10 100 DD 8.5
2 b 25 400 GG 7.5
Adding rows (observations) to a data frame

H <- data.frame(V1="c", V2=44, Var1=200, Var2="AA", Var3=3.5)
ABH <- rbind(AB,H)

Warning

The two data frames must have the same variables, but they don't have to be in the same order.

If dataframeA has variables that dataframeB lacks, then before joining them with dfAB <- rbind(dataframeA, dataframeB), do one of the following:

1. Delete the extra variables in dataframeA.
2. Create the additional variables in dataframeB and set them to NA.
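The second option can be sketched with two small hypothetical data frames. dataframeB lacks Grade and stores its columns in a different order. rbind() still works once Grade has been created, because it matches columns by name.

```r
dataframeA <- data.frame(ID = 1:2, Score = c(80, 90), Grade = c("B", "A"))
dataframeB <- data.frame(Score = c(70, 60), ID = 3:4)   # no Grade; different column order

# Option 2: create the missing variable in dataframeB and set it to NA
dataframeB$Grade <- NA

dfAB <- rbind(dataframeA, dataframeB)   # columns are matched by name
dfAB
```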



5. Selecting variables
Use telco.csv and name the data frame mydata1.
There are several methods for keeping or deleting variables and observations.
Example: selecting variables.

str(mydata1)
'data.frame': 45 obs. of 6 variables:
$ Gender : chr "Male" "Female" "Female" "Female" ...
$ Programs : chr "Statistics" "Business" "Sciences" "Statistics" ...
$ Car_Ownership: chr "Yes" "Yes" "No" "No" ...
$ Telco_Prefer : chr "Celcom" "DiGi" "Celcom" "Maxis" ...
$ Usage_GB : num 14.6 15.7 14.8 15.4 12.9 22.4 28 19.2 25.4 25.3 ...
$ Hour_Perday : num 3 3.8 4 3.5 3 6 6.5 5.5 6 6 ...
newdata1 <- mydata1[, c(1:3)]
head(newdata1)
Gender Programs Car_Ownership
1 Male Statistics Yes
2 Female Business Yes
3 Female Sciences No
4 Female Statistics No
5 Female Sciences No
6 Female Sciences Yes



Leaving the row index blank, as in [, j], selects all rows by default.

# use variable names
vars <- c("Gender", "Programs", "Usage_GB")
newdata2 <- mydata1[, vars]
head(newdata2)
Gender Programs Usage_GB
1 Male Statistics 14.6
2 Female Business 15.7
3 Female Sciences 14.8
4 Female Statistics 15.4
5 Female Sciences 12.9
6 Female Sciences 22.4



Adding variables

head(newdata2)
Gender Programs Usage_GB
1 Male Statistics 14.6
2 Female Business 15.7
3 Female Sciences 14.8
4 Female Statistics 15.4
5 Female Sciences 12.9
6 Female Sciences 22.4
newdata2$Telco <- mydata1$Telco_Prefer
head(newdata2)
Gender Programs Usage_GB Telco
1 Male Statistics 14.6 Celcom
2 Female Business 15.7 DiGi
3 Female Sciences 14.8 Celcom
4 Female Statistics 15.4 Maxis
5 Female Sciences 12.9 U-Mobile
6 Female Sciences 22.4 Celcom



Dropping variables

newdata3 <- mydata1[c(-2,-4)]   # negative indices drop columns 2 (Programs) and 4 (Telco_Prefer)
head(newdata3)
Gender Car_Ownership Usage_GB Hour_Perday
1 Male Yes 14.6 3.0
2 Female Yes 15.7 3.8
3 Female No 14.8 4.0
4 Female No 15.4 3.5
5 Female No 12.9 3.0
6 Female Yes 22.4 6.0

Tip

When you modify variables, be careful not to change the original data. It is advisable to use the source dataset to create a fresh data.frame object. That way, even if you edit the contents incorrectly, the original data stay intact.
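Dropping columns by position breaks silently if the column order changes. Dropping them by name is more robust. This sketch uses the first two rows of telco.csv, stored in a new object telco_sample so that mydata1 is left untouched.

```r
# First two rows of telco.csv, typed in so the example runs on its own
telco_sample <- data.frame(Gender = c("Male", "Female"),
                           Programs = c("Statistics", "Business"),
                           Car_Ownership = c("Yes", "Yes"),
                           Telco_Prefer = c("Celcom", "DiGi"),
                           Usage_GB = c(14.6, 15.7),
                           Hour_Perday = c(3.0, 3.8))

# Drop columns by name instead of by position
drop <- c("Programs", "Telco_Prefer")
newdata3 <- telco_sample[, !(names(telco_sample) %in% drop)]
names(newdata3)   # "Gender" "Car_Ownership" "Usage_GB" "Hour_Perday"
```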



Selecting observations
newdata4 <- mydata1[1:5, ]
newdata4
Gender Programs Car_Ownership Telco_Prefer Usage_GB Hour_Perday
1 Male Statistics Yes Celcom 14.6 3.0
2 Female Business Yes DiGi 15.7 3.8
3 Female Sciences No Celcom 14.8 4.0
4 Female Statistics No Maxis 15.4 3.5
5 Female Sciences No U-Mobile 12.9 3.0
# use logical operators to select observations
newdata5 <- mydata1[mydata1$Gender=="Male" & mydata1$Usage_GB>15, ]

newdata5
Gender Programs Car_Ownership Telco_Prefer Usage_GB Hour_Perday
7 Male Account Yes Maxis 28.0 6.5
8 Male Statistics No DiGi 19.2 5.5
12 Male Business Yes DiGi 27.5 6.0
16 Male Statistics No DiGi 23.5 6.2
17 Male Business Yes Maxis 15.2 3.5
19 Male Sciences Yes Maxis 28.2 6.0
24 Male Account No Maxis 21.5 5.0
27 Male Business No Maxis 16.2 4.0
30 Male Statistics No DiGi 23.2 6.0
31 Male Account Yes Maxis 16.5 4.0
32 Male Statistics Yes DiGi 19.7 5.0
39 Male Statistics Yes Maxis 21.5 5.0



The subset() function is the easiest way to select variables and observations.

newdata6 <- subset(mydata1, Telco_Prefer=="Maxis")
newdata6
Gender Programs Car_Ownership Telco_Prefer Usage_GB Hour_Perday
4 Female Statistics No Maxis 15.4 3.5
7 Male Account Yes Maxis 28.0 6.5
10 Female Sciences No Maxis 25.3 6.0
14 Female Account Yes Maxis 8.3 2.0
17 Male Business Yes Maxis 15.2 3.5
19 Male Sciences Yes Maxis 28.2 6.0
22 Female Account No Maxis 17.9 4.5
24 Male Account No Maxis 21.5 5.0
27 Male Business No Maxis 16.2 4.0
29 Female Business Yes Maxis 13.3 3.5
31 Male Account Yes Maxis 16.5 4.0
34 Male Statistics No Maxis 11.9 3.0
36 Female Account Yes Maxis 16.5 4.0
37 Female Account No Maxis 11.7 3.0
39 Male Statistics Yes Maxis 21.5 5.0
41 Female Sciences Yes Maxis 7.6 2.0



newdata7 <- subset(mydata1, Telco_Prefer=="Maxis", select=c(1,2,4))
newdata7
Gender Programs Telco_Prefer
4 Female Statistics Maxis
7 Male Account Maxis
10 Female Sciences Maxis
14 Female Account Maxis
17 Male Business Maxis
19 Male Sciences Maxis
22 Female Account Maxis
24 Male Account Maxis
27 Male Business Maxis
29 Female Business Maxis
31 Male Account Maxis
34 Male Statistics Maxis
36 Female Account Maxis
37 Female Account Maxis
39 Male Statistics Maxis
41 Female Sciences Maxis
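subset() and logical indexing with [ ] differ when the condition contains NA. Logical indexing returns an all-NA row, while subset() treats NA as FALSE and drops the row. The data frame below is hypothetical. The last line also shows select= with a negated column name.

```r
phones <- data.frame(Telco = c("Maxis", NA, "DiGi", "Maxis"),
                     Usage_GB = c(15.4, 20.1, 15.7, 28.0))

# Logical indexing: the NA in the condition produces a row of NAs
by_index <- phones[phones$Telco == "Maxis", ]
nrow(by_index)    # 3 (two Maxis rows plus one all-NA row)

# subset() treats an NA condition as FALSE and drops that row
by_subset <- subset(phones, Telco == "Maxis")
nrow(by_subset)   # 2

# select= also accepts column names, including negative selection
subset(phones, Usage_GB > 15, select = -Telco)
```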



6. Sort, Rank and Order
xs <- c(10, 15, 6, 13, 18)

# increasing order
sort(xs)
[1] 6 10 13 15 18
# decreasing order
sort(xs, decreasing=TRUE)
[1] 18 15 13 10 6
rev(sort(xs))
[1] 18 15 13 10 6
# order(): the positions of the original elements, listed in sorted order
order(xs)
[1] 3 1 4 2 5
# rank(): the rank of each element, kept in its original position
rank(xs)
[1] 2 4 1 3 5
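These checks use xs to show how sort(), order() and rank() are related. The relationships hold exactly here because xs has no ties.

```r
xs <- c(10, 15, 6, 13, 18)

identical(xs[order(xs)], sort(xs))                        # TRUE: order() gives the indices that sort xs
identical(order(xs, decreasing = TRUE), rev(order(xs)))   # TRUE here because there are no ties
rank(xs)                                                  # 2 4 1 3 5
order(order(xs))                                          # 2 4 1 3 5: ranks are the "inverse" of order()
```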



The sort() function is great for sorting vectors, but it does not work directly on data frames. To sort a data frame, use the order() function together with indexing.

# sort newdata6 by Usage_GB in ascending order
newdata6[order(newdata6$Usage_GB), ]
Gender Programs Car_Ownership Telco_Prefer Usage_GB Hour_Perday
41 Female Sciences Yes Maxis 7.6 2.0
14 Female Account Yes Maxis 8.3 2.0
37 Female Account No Maxis 11.7 3.0
34 Male Statistics No Maxis 11.9 3.0
29 Female Business Yes Maxis 13.3 3.5
17 Male Business Yes Maxis 15.2 3.5
4 Female Statistics No Maxis 15.4 3.5
27 Male Business No Maxis 16.2 4.0
31 Male Account Yes Maxis 16.5 4.0
36 Female Account Yes Maxis 16.5 4.0
22 Female Account No Maxis 17.9 4.5
24 Male Account No Maxis 21.5 5.0
39 Male Statistics Yes Maxis 21.5 5.0
10 Female Sciences No Maxis 25.3 6.0
7 Male Account Yes Maxis 28.0 6.5
19 Male Sciences Yes Maxis 28.2 6.0



For descending/decreasing order:

# sort newdata6 by Usage_GB in descending order
newdata6[order(newdata6$Usage_GB, decreasing=TRUE), ]
Gender Programs Car_Ownership Telco_Prefer Usage_GB Hour_Perday
19 Male Sciences Yes Maxis 28.2 6.0
7 Male Account Yes Maxis 28.0 6.5
10 Female Sciences No Maxis 25.3 6.0
24 Male Account No Maxis 21.5 5.0
39 Male Statistics Yes Maxis 21.5 5.0
22 Female Account No Maxis 17.9 4.5
31 Male Account Yes Maxis 16.5 4.0
36 Female Account Yes Maxis 16.5 4.0
27 Male Business No Maxis 16.2 4.0
4 Female Statistics No Maxis 15.4 3.5
17 Male Business Yes Maxis 15.2 3.5
29 Female Business Yes Maxis 13.3 3.5
34 Male Statistics No Maxis 11.9 3.0
37 Female Account No Maxis 11.7 3.0
14 Female Account Yes Maxis 8.3 2.0
41 Female Sciences Yes Maxis 7.6 2.0



For multiple columns (the second column is used only to break ties in the first):

newdata6[order(newdata6$Usage_GB, newdata6$Hour_Perday), ]
Gender Programs Car_Ownership Telco_Prefer Usage_GB Hour_Perday
41 Female Sciences Yes Maxis 7.6 2.0
14 Female Account Yes Maxis 8.3 2.0
37 Female Account No Maxis 11.7 3.0
34 Male Statistics No Maxis 11.9 3.0
29 Female Business Yes Maxis 13.3 3.5
17 Male Business Yes Maxis 15.2 3.5
4 Female Statistics No Maxis 15.4 3.5
27 Male Business No Maxis 16.2 4.0
31 Male Account Yes Maxis 16.5 4.0
36 Female Account Yes Maxis 16.5 4.0
22 Female Account No Maxis 17.9 4.5
24 Male Account No Maxis 21.5 5.0
39 Male Statistics Yes Maxis 21.5 5.0
10 Female Sciences No Maxis 25.3 6.0
7 Male Account Yes Maxis 28.0 6.5
19 Male Sciences Yes Maxis 28.2 6.0
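In the output above, Hour_Perday does not change anything. The tied Usage_GB values (16.5 and 21.5) also have equal hours. To sort one key in descending order and another in ascending order, you can negate a numeric key. Alternatively, pass a vector to decreasing together with method = "radix". The data frame below is hypothetical.

```r
sample_usage <- data.frame(Gender = c("Male", "Female", "Male", "Female"),
                           Usage_GB = c(16.5, 16.5, 21.5, 11.7),
                           Hour_Perday = c(4.0, 3.5, 5.0, 3.0))

# Usage_GB descending, then Hour_Perday ascending within ties (negate the numeric key)
sample_usage[order(-sample_usage$Usage_GB, sample_usage$Hour_Perday), ]

# Same ordering with a per-key decreasing vector (requires method = "radix")
idx <- order(sample_usage$Usage_GB, sample_usage$Hour_Perday,
             decreasing = c(TRUE, FALSE), method = "radix")
sample_usage[idx, ]
```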



The rank() function has several other arguments that you can use to customize the ranking behavior. For example, the ties.method argument specifies how ties are handled.
With ties.method = "first", for instance, tied values are ranked by the order in which they appear in the data.

# Create a data frame
df <- data.frame(
  name = c("Alice", "Bob", "Charlie", "David", "Emily"),
  score = c(85, 92, 78, 92, 88)
)

# Rank the scores in ascending order (tied scores get the average rank)
df$rank_asc <- rank(df$score)
# Rank the scores in descending order (tied scores get the minimum rank)
df$rank_desc <- rank(-df$score, ties.method = "min")

print(df)
name score rank_asc rank_desc
1 Alice 85 2.0 4
2 Bob 92 4.5 1
3 Charlie 78 1.0 5
4 David 92 4.5 1
5 Emily 88 3.0 3
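The same scores ranked with four ties.method options show how Bob's and David's tied 92s are treated in each case:

```r
score <- c(85, 92, 78, 92, 88)

rank(score)                          # "average" (default): 2.0 4.5 1.0 4.5 3.0
rank(score, ties.method = "min")     # 2 4 1 4 3
rank(score, ties.method = "max")     # 2 5 1 5 3
rank(score, ties.method = "first")   # 2 4 1 5 3: ties broken by order of appearance
```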



Summary
1. Getting Data In and Out: It explains how to import data from various sources like CSV, Excel,
and SPSS files, as well as how to export data to CSV and text files.
2. Handling Missing Values: This section details how to identify, manage, and replace missing
values (NA and NaN) within datasets using functions like `is.na()`, `is.nan()`, and
`na.omit()`.
3. Coercion: It explores the concept of converting data between different classes (e.g.,
numeric, character, logical) using `as.` functions and emphasizes understanding coercion
rules.
4. Merging and Subsetting Datasets: This part demonstrates how to combine datasets by
adding columns or rows using functions like `merge()`, `cbind()`, and `rbind()`. It also
covers how to select specific variables or observations using indexing, logical operators, and
the `subset()` function.
5. Sorting, Ranking, and Ordering: This section illustrates how to sort data in ascending or
descending order using `sort()`, `rev()`, `order()`, and `rank()`.
End of Slides