Module 2.9
9 APPLYING R ON DATASETS
> str(students)
# str() is a function that displays the structure of the Object
> dim(students)
# dim() displays the dimensions of the Object
> names(students)
# names() lists the names of the object (for a data frame, its columns)
> summary(students)
# summary() gives a general description of the object;
# the displayed result varies with the type of object
> plot(students)
# plot() draws a general visualization based on the records and
# columns of the object.
# This helps you assess which variables are worth processing and
# evaluating further to better describe the dataset.
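The commands above assume that a data frame named students already exists. As a minimal sketch, the block below builds a small made-up data frame with the Name, Age, and Score columns used in this module (the six records are hypothetical, not the course dataset) and runs the inspection functions on it.

```r
# Hypothetical sample data; the real students dataset is not shown in this module
students <- data.frame(
  Name  = c("Ana", "Ben", "Cara", "Dino", "Ella", "Finn"),
  Age   = c(19, 20, 21, 22, 23, 20),
  Score = c(88, 92, 75, 67, 92, 81),
  stringsAsFactors = FALSE
)

str(students)                  # compact view of column types and first values

dim_result <- dim(students)    # number of rows, then number of columns
col_names  <- names(students)  # column names of the data frame
print(dim_result)
print(col_names)

summary(students)              # per-column summary (quartiles for numeric columns)
```

With your own dataset, replace the data.frame() call with whatever step loads the data; the inspection calls stay the same.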
Now let’s investigate our dataset further and use the basic
concepts of R to process the data. Let’s start by asking some
questions to better understand the dataset, then implement
some R code to get the results.
> dim(students)
# dim() displays 2 values, rows and columns
> nrow(students)
# nrow() displays the number of rows in an object.
# ncol() displays the number of columns in an object.
# dim() displays both
Answer:
> table(students$Age)
# table() displays the count of each unique value of a specific column
Answer: 19 20 21 22 23
> unique(students$Age)
# unique() displays each unique value of a specific column
Answer:
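To make the difference between table() and unique() concrete, here is a short sketch on a hypothetical vector of ages (made-up values, not the course data). table() returns a count for each distinct value, while unique() returns only the distinct values, in order of first appearance.

```r
# Hypothetical ages for illustration only
ages <- c(19, 20, 21, 22, 23, 20)

age_counts <- table(ages)    # count of each distinct age
age_values <- unique(ages)   # distinct ages, in order of first appearance

print(age_counts)
print(age_values)
print(sort(age_values))      # sort() if the values should be in ascending order
```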
> mean(students$Score)
# mean() calculates and displays the average of the values of the column
Answer:
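One thing to watch for with mean(): if the column contains a missing value (NA), the result is also NA unless na.rm = TRUE is given. The sketch below uses a hypothetical score vector with one missing entry; whether the course dataset contains any NA values is not stated here.

```r
# Hypothetical scores; NA marks a missing score
scores <- c(88, 92, NA, 67, 92, 81)

mean_with_na <- mean(scores)                # NA, because one value is missing
mean_no_na   <- mean(scores, na.rm = TRUE)  # average of the five known scores

print(mean_with_na)
print(mean_no_na)
```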
5. Which student has the highest score?
> max(students$Score)
# max() displays maximum value of the provided data
> students[which.max(students$Score), ]
# which.max() returns the index of the (first) maximum value; using that
# index as the row selector returns the complete record (row) of the dataset
Answer:
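which.max() returns only the first position of the maximum, so if two students share the top score, only one row comes back. The sketch below, on hypothetical data, contrasts that with a logical comparison against max(), which returns every tied row.

```r
# Hypothetical sample data with a tie for the highest score
students <- data.frame(
  Name  = c("Ana", "Ben", "Cara", "Dino", "Ella", "Finn"),
  Age   = c(19, 20, 21, 22, 23, 20),
  Score = c(88, 92, 75, 67, 92, 81),
  stringsAsFactors = FALSE
)

# First student with the highest score only
first_top <- students[which.max(students$Score), ]

# Every student whose score equals the maximum
all_top <- students[students$Score == max(students$Score), ]

print(first_top)
print(all_top)
```

The same pattern works for the lowest score with which.min() and min().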
> min(students$Score)
# min() displays minimum value of the provided data
> students[which.min(students$Score), ]
# which.min() returns the index of the (first) minimum value; using that
# index as the row selector returns the complete record (row) of the dataset
Answer:
> median(students$Age)
# median() computes and displays the median (middle) value of the provided data
Answer:
Answer:
9. What is the age range of the students
(oldest and youngest)?
> range(students$Age)
# range() displays the minimum and maximum values of the provided data
# min() and max() can also be used, but they are 2 separate commands
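As a brief illustration on hypothetical ages, range() returns the same two numbers as min() and max() in a single vector, and diff() on that vector gives the spread between the youngest and the oldest.

```r
# Hypothetical ages for illustration only
ages <- c(19, 20, 21, 22, 23, 20)

age_range <- range(ages)       # youngest and oldest in one vector
age_span  <- diff(age_range)   # oldest minus youngest

print(age_range)
print(c(min(ages), max(ages))) # same values from two separate calls
print(age_span)
```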
> table(students$Score)
# this is the simplest approach: any score with a count greater than
# 1 (one) is shared by more than one student
> sum(duplicated(students$Score))
# duplicated() flags each value that repeats an earlier one;
# sum() counts the flagged (TRUE) values
Answer:
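The two approaches answer slightly different questions, which the sketch below tries to make explicit on a hypothetical score vector. sum(duplicated(...)) counts the repeat occurrences only (the first appearance of each score is not flagged), while filtering the table() output shows which scores are shared and by how many students.

```r
# Hypothetical scores: 88 appears twice, 92 appears three times
scores <- c(88, 92, 75, 67, 92, 81, 88, 92)

# Number of repeat occurrences (first appearances are not counted)
extra_occurrences <- sum(duplicated(scores))

# Scores that occur more than once, with their counts
score_counts  <- table(scores)
shared_scores <- score_counts[score_counts > 1]

n_shared_scores    <- length(shared_scores)  # how many distinct scores are shared
n_students_sharing <- sum(shared_scores)     # how many students have a shared score

print(extra_occurrences)
print(shared_scores)
print(n_students_sharing)
```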
Answer: [18,20) [20,23)
Answer:
Scenario:
You are provided with a dataset of at least 3 columns
and 40 records. You are then asked to describe the
dataset and perform some data processing operations
to produce a valuable result.
Steps:
> nrow(students)
> table(students$Age)
> unique(students$Age)
> mean(students$Score)
> max(students$Score)
> students[which.max(students$Score), ]
> min(students$Score)
> students[which.min(students$Score), ]
> median(students$Age)
10. Are there any students with the same score? If so, how
many?
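> myTab <- table(students$Score)
> myTab_duplicate <- myTab[myTab > 1]
# assumed definition (it is missing in the source): keeps only the scores
# that occur more than once
> nrow(myTab_duplicate)
> sum(duplicated(students$Score))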
11. How many students fall within specific age groups (e.g., 18-20, 21-23)?
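> myTab_by_Age <- cut(students$Age, breaks = c(18, 20, 23), right = FALSE)
# with right = FALSE the groups are [18,20) and [20,23), so an age of
# exactly 23 falls outside both groups and becomes NA
> table(myTab_by_Age)
# table() counts the students in each group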
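With breaks = c(18, 20, 23) and right = FALSE, the intervals are [18,20) and [20,23): age 20 lands in the second group and age 23 in neither. If the intended groups are 18-20 and 21-23, one possible adjustment, sketched below on hypothetical whole-number ages, is to move the break points to 18, 21, and 24. The labels argument is optional and used here only for readability.

```r
# Hypothetical whole-number ages for illustration only
ages <- c(19, 20, 21, 22, 23, 20)

# Breaks as given above: 23 is outside [20,23) and becomes NA
original_groups <- cut(ages, breaks = c(18, 20, 23), right = FALSE)

# Adjusted breaks: [18,21) covers 18-20 and [21,24) covers 21-23
adjusted_groups <- cut(ages, breaks = c(18, 21, 24), right = FALSE,
                       labels = c("18-20", "21-23"))

group_counts <- table(adjusted_groups)  # number of students per age group

print(table(original_groups, useNA = "ifany"))
print(group_counts)
```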
13. List all the students and add the remark Passed when the score is 75
or above, and Failed if not.
> for (i in 1:nrow(students)) {
    if (students$Score[i] >= 75) {
      cat(students$Name[i], "Passed.\n")
    } else {
      cat(students$Name[i], "Failed.\n")
    }
  }
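The loop above can also be written without an explicit loop, because ifelse() works on a whole column at once. The sketch below uses hypothetical data and a hypothetical column name, Remark, which is not part of the original exercise.

```r
# Hypothetical sample data for illustration only
students <- data.frame(
  Name  = c("Ana", "Ben", "Cara", "Dino", "Ella", "Finn"),
  Age   = c(19, 20, 21, 22, 23, 20),
  Score = c(88, 92, 75, 67, 92, 81),
  stringsAsFactors = FALSE
)

# Vectorized remark: one comparison over the entire Score column
students$Remark <- ifelse(students$Score >= 75, "Passed", "Failed")

# Print one line per student, e.g. "Ana Passed."
cat(paste(students$Name, paste0(students$Remark, ".")), sep = "\n")
```

Storing the remark in a column also keeps it available for later steps such as table(students$Remark).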
14. Create an additional column in the students dataset named Grade, where A is
given to scores of 90 and above, B to scores from 80 to 89, and C to scores below 80.
> students$Grade <- ifelse(students$Score >= 90, "A",
                    ifelse(students$Score >= 80, "B", "C"))
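Nested ifelse() calls become hard to read as more grade bands are added. As an alternative sketch (not the method this module prescribes), cut() can assign the same grades from a vector of break points. The scores below are hypothetical and include the boundary values 80 and 90 to show that both approaches agree.

```r
# Hypothetical scores, including boundary values 80 and 90
scores <- c(88, 92, 75, 67, 92, 81, 90, 80, 79.5)

# Nested ifelse(), as in the exercise
grade_nested <- ifelse(scores >= 90, "A",
                ifelse(scores >= 80, "B", "C"))

# cut() with left-closed intervals: [-Inf,80) is C, [80,90) is B, [90,Inf) is A
grade_cut <- cut(scores, breaks = c(-Inf, 80, 90, Inf),
                 labels = c("C", "B", "A"), right = FALSE)

print(grade_nested)
print(as.character(grade_cut))
```

Note that cut() returns a factor, so as.character() is needed if plain text grades are wanted.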
(Add more sheets when needed)