Module 2.9

The document provides a comprehensive guide on applying R to analyze a dataset of students, detailing commands for inspecting the dataset structure, dimensions, and general descriptions. It includes specific R functions to answer various analytical questions about the dataset, such as calculating averages, identifying unique values, and categorizing scores. Additionally, it outlines a hands-on activity for users to create their own dataset and apply similar analytical techniques.

Uploaded by

anonatnem

2.9 APPLYING R ON DATASETS

Now that we have a dataset that we can work on, let’s inspect it first to better understand what we are dealing with before doing something with it.

• Display the Dataset Structure

> str(students)
# str() is a function that displays the structure of the Object

• Check the Dimension of the Dataset

> dim(students)
# dim() displays the dimensions of the Object

• Draw out the names of the Columns

> names(students)
# names() will list down the Names of the Object (Columns)

• Get the General Description of the Dataset

> summary(students)
# summary() will list down a General Description of the Object
# the displayed result will vary based on the Object itself

• Plot a General Visualization of the Dataset
(more on data exploration and visualization in Module 3)

> plot(students)
# plot() draws a general visualization based on the records and
# columns of the Object (for a data frame with several columns, typically
# a matrix of pairwise scatterplots).
# this helps you assess which variables are best suited for further
# processing and evaluation to better describe the dataset
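
To try the inspection commands without the original file, a small stand-in data frame can be built by hand. This is only a sketch: the names, ages, and scores below are hypothetical, and only the column names Name, Age, and Score are taken from the commands used in this module.

```r
# Hypothetical stand-in for the students dataset (values invented for illustration)
students <- data.frame(
  Name  = c("Ana", "Ben", "Carla", "Dino", "Ella", "Finn"),
  Age   = c(19, 20, 21, 22, 23, 20),
  Score = c(88, 75, 92, 67, 80, 92),
  stringsAsFactors = FALSE
)

str(students)      # column types and the first few values of each column
dim(students)      # number of rows, then number of columns
names(students)    # column names
summary(students)  # per-column summary; numeric columns show min, quartiles, mean, max
```

Running the four commands on a dataset this small makes it easy to check each result by eye before moving on to a larger file.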

Now let’s investigate our dataset further and use the Basic Concepts of R to process the data. Let’s start by asking some questions to better understand the dataset, and we will write some R code to get the results.

1. How many records does the dataset contain?

> dim(students)
# dim() displays 2 values, rows and columns

> nrow(students)
# nrow() displays the number of rows in an object.
# ncol() displays the number of columns in an object.
# dim() displays both

Answer:

2. How many students are there in each Age?

> table(students$Age)
# table() displays the count of each unique value of a specific column

Answer: 19 20 21 22 23

3. What are the unique ages represented in the dataset?

> unique(students$Age)
# unique() displays each unique value of a specific column

Answer:
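
The difference between table() and unique() is easier to see side by side. A minimal sketch on a hypothetical vector of ages (the values are invented, not taken from the students dataset):

```r
ages <- c(19, 20, 21, 22, 23, 20)  # hypothetical ages

age_counts    <- table(ages)   # one count per distinct age, sorted by age
distinct_ages <- unique(ages)  # distinct ages, in order of first appearance

print(age_counts)
print(distinct_ages)
length(distinct_ages)          # how many different ages there are
```

table() answers "how many of each", while unique() only lists which values occur.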

4. What is the average score of the students?

> mean(students$Score)
# mean() calculates and displays the average of the values of the column

Answer:

5. Which student has the highest score?

> max(students$Score)
# max() displays maximum value of the provided data

> students[which.max(students$Score), ]
# which.max() returns the index (position) of the first maximum value;
# using that index inside students[ , ] returns the complete record (row)

Answer:
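
One detail worth knowing: which.max() reports only the first position of the maximum, so tied top scorers are not all returned. A small sketch with hypothetical data (names and scores invented for illustration) shows one way to list every student who shares the highest score:

```r
# Hypothetical data with a tie for the highest score
students <- data.frame(
  Name  = c("Ana", "Ben", "Carla", "Dino"),
  Score = c(88, 92, 75, 92),
  stringsAsFactors = FALSE
)

top_index <- which.max(students$Score)  # position of the first maximum only
students[top_index, ]                    # the row at that position

# All rows whose Score equals the maximum, ties included
top_rows <- students[students$Score == max(students$Score), ]
print(top_rows)
```

The same idea applies to which.min() and min() for the lowest score.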

6. Which student has the lowest score?

> min(students$Score)
# min() displays minimum value of the provided data

> students[which.min(students$Score), ]
# which.min() returns the index (position) of the first minimum value;
# using that index inside students[ , ] returns the complete record (row)

Answer:

7. What is the median age of the students?

> median(students$Age)
# median() searches and displays the median value of the provided data

Answer:

8. How many students scored above 80?

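> number_of_students_above_80 <- sum(students$Score > 80)
# students$Score > 80 gives TRUE or FALSE for each student; sum() treats TRUE
# as 1 and FALSE as 0, so it behaves like a count() in other languages.
# the result is stored in number_of_students_above_80; type the variable
# name to display it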

Answer:
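
The counting trick can be broken into its two steps. A minimal sketch on hypothetical scores (values invented for illustration):

```r
scores <- c(88, 75, 92, 67, 80, 92)  # hypothetical scores

above_80 <- scores > 80              # logical vector, one TRUE/FALSE per score
number_above_80 <- sum(above_80)     # TRUE counts as 1, FALSE as 0

print(above_80)
print(number_above_80)
```

Because the comparison is strict, a score of exactly 80 is not counted; use >= if 80 should be included.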

9. What is the age range of the students (oldest and youngest)?

> range(students$Age)
# range() displays the minimum and maximum value of a specific dataset
# min() and max() can also be used. But these are 2 different commands
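
As a quick check that range() is simply min() and max() together, here is a sketch on hypothetical ages (values invented for illustration):

```r
ages <- c(19, 20, 21, 22, 23, 20)  # hypothetical ages

age_range <- range(ages)           # youngest first, then oldest
age_span  <- diff(age_range)       # oldest minus youngest

print(age_range)
print(c(min(ages), max(ages)))     # same two values, from two commands
print(age_span)
```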

10. Are there any students with the same score? If so, how many?

> table(students$Score)
# this is the simplest command to execute: look for any score whose count
# is greater than 1 (one) and that’s it

> myTab <- table(students$Score)
> myTab_duplicate <- myTab[myTab > 1]
> nrow(myTab_duplicate)
# this is a bit more involved: the first command creates a table of the unique
# values of Score with the count of each; the second command keeps only the
# Scores whose count is greater than 1.
# the last command counts how many such Scores remain
# (length(myTab_duplicate) gives the same number).

> sum(duplicated(students$Score))
# duplicated() marks every item that repeats an earlier value as TRUE
# (the first occurrence is not marked); sum() is used as a counter.
# note that this counts the repeats, not the number of distinct shared Scores

Answer:
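
The commands above answer slightly different versions of "how many". A sketch on hypothetical scores (values invented for illustration) puts the three counts next to each other:

```r
scores <- c(88, 75, 92, 67, 92, 88, 92)  # hypothetical: 88 twice, 92 three times

score_counts <- table(scores)
repeated     <- score_counts[score_counts > 1]  # scores that occur more than once

n_shared_scores   <- length(repeated)           # distinct scores that are shared
n_students_shared <- sum(repeated)              # students who share a score with someone
n_repeats         <- sum(duplicated(scores))    # occurrences after the first one

print(c(n_shared_scores, n_students_shared, n_repeats))
```

Which of the three is the right answer depends on how the question is read, so it helps to state which one is being reported.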

11. How many students fall within specific age groups (e.g., 18-20, 21-23)?

> myTab_by_Age <- cut(students$Age, breaks = c(18, 21, 24), right = FALSE)
> myTab_count <- table(myTab_by_Age)
# cut() categorizes the ages into the intervals defined by breaks.
# right = FALSE makes each interval include its left break and exclude its
# right break, so [18,21) holds ages 18 to 20 and [21,24) holds ages 21 to 23.
# (with breaks = c(18, 20, 23) an age of 23 would fall outside and become NA)
# table() then counts how many students fall in each category.

Answer: [18,21) [21,24)
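
How breaks and right interact is easiest to see on a few whole-number ages. The sketch below uses hypothetical ages and assumes the intended groups are 18-20 and 21-23; the labels argument is an optional extra, not something the module requires.

```r
ages <- c(18, 19, 20, 21, 22, 23)  # hypothetical ages

# right = FALSE: each interval includes its left break and excludes its right break
groups       <- cut(ages, breaks = c(18, 21, 24), right = FALSE)
group_counts <- table(groups)
print(group_counts)

# Same grouping with readable labels
labelled <- cut(ages, breaks = c(18, 21, 24), right = FALSE,
                labels = c("18-20", "21-23"))
print(table(labelled))

# A value equal to the last break lies outside every interval and becomes NA
edge_case <- cut(23, breaks = c(18, 20, 23), right = FALSE)
print(edge_case)
```

Checking for NA values after cut() is a simple way to confirm that the breaks cover every age in the dataset.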

12. What percentage of students scored above the average score?

> mean(students$Score > mean(students$Score)) * 100
# the inner mean() is the average of the Score.
# the comparison marks each Score that is greater than the average as TRUE;
# the outer mean() of those TRUE/FALSE values is the proportion of TRUE,
# and multiplying by 100 turns it into a percentage.

Answer:
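
The one-line command can be unpacked into its steps. A sketch on hypothetical scores chosen so the arithmetic is easy to follow (values invented for illustration):

```r
scores <- c(60, 70, 80, 90)          # hypothetical scores; their average is 75

average   <- mean(scores)
above_avg <- scores > average        # FALSE FALSE TRUE TRUE
pct_above <- mean(above_avg) * 100   # proportion of TRUE values, as a percentage

print(average)
print(above_avg)
print(pct_above)
```

Two of the four scores are above 75, so the result is 50.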

13. List down all the students and put a remark Passed when the score is 75 or above and Failed if not.

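> for (i in seq_len(nrow(students))) {
    if (students$Score[i] >= 75) {
      cat(students$Name[i], "Passed.\n")
    } else {
      cat(students$Name[i], "Failed.\n")
    }
  }
# the loop visits each row; cat() prints the Name followed by the remark.
# seq_len(nrow(students)) is 1, 2, ... up to the number of rows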
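
The same remarks can be produced without a loop, since ifelse() works on a whole column at once. This is an alternative sketch, not the method used in the module, and the data below is hypothetical:

```r
# Hypothetical data (names and scores invented for illustration)
students <- data.frame(
  Name  = c("Ana", "Ben", "Carla"),
  Score = c(88, 74, 75),
  stringsAsFactors = FALSE
)

# One remark per student, computed for the whole column in a single call
remarks <- ifelse(students$Score >= 75, "Passed.", "Failed.")

# Print one "Name remark" line per student
cat(paste(students$Name, remarks), sep = "\n")
```

The loop version is useful for learning if and else; the vectorized version is shorter and is the more common style in R.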

14. Create an additional column in the students dataset named Grade, where A is given to Scores of 90 and above, B to Scores from 80 to 89, and C to Scores below 80.

> students$Grade <- ifelse(students$Score >= 90, "A",
                    ifelse(students$Score >= 80, "B", "C"))
# creating a new column in a dataset is as simple as calling the students
# dataset and placing a new column name Grade (students$Grade)
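
To check the grade boundaries, the nested ifelse() can be compared with cut(), which was used earlier for the age groups. This is a sketch on hypothetical scores picked to sit on and around the boundaries; the cut() version is an alternative, not part of the module's answer.

```r
scores <- c(95, 90, 89, 80, 79, 50)  # hypothetical scores around the grade boundaries

# Nested ifelse(): the first condition that is TRUE decides the grade
grade_ifelse <- ifelse(scores >= 90, "A",
                ifelse(scores >= 80, "B", "C"))

# Same grading with cut(): [-Inf,80) is C, [80,90) is B, [90,Inf) is A
grade_cut <- cut(scores, breaks = c(-Inf, 80, 90, Inf),
                 labels = c("C", "B", "A"), right = FALSE)

print(grade_ifelse)
print(identical(grade_ifelse, as.character(grade_cut)))
```

With more than three grades, the cut() form tends to stay readable while nested ifelse() calls get harder to follow.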
HANDS-ON ACTIVITY # 4
Well then, now that we have learned how to use the basic commands in R to process data, let us now emulate what we have done in this session.

Scenario:
You are provided a dataset with at least 3 columns and 40 records. You are then asked to describe the dataset and perform some data processing operations to produce a valuable result.

Data: You can create a dataset on your own or find some simple datasets online.

Assessing and Describing a Dataset:

Objective: Apply some analysis to describe your dataset and present some valuable data that further describes the said dataset.

Steps:

1. Create your dataset.

2. Load your dataset to R (refer to pages 77 to 80).

3. Use the basic R commands that will describe the dataset (refer to page 81).

4. Once you are done with the basic dataset description, do some data processing to better evaluate and process the information in your dataset (use the example questions found in pages 81 to 85 as reference).
Paste or write your R Scripts below.
(Should you be using a CSV file for your dataset, please include a copy within
this document)

1. How many records does the dataset contain?

> dim(students)

> nrow(students)

2. How many students in each Age?

> table(students$Age)

3. What are the unique ages represented in the dataset?

> unique(students$Age)

4. What is the average score of the students?

> mean(students$Score)

5. Which student has the highest score?

> max(students$Score)

> students[which.max(students$Score), ]

6. Which student has the lowest score?

> min(students$Score)

> students[which.min(students$Score), ]

7. What is the median age of the students?

> median(students$Age)

8. How many students scored above 80?

> number_of_students_above_80 <- sum(students$Score > 80)

9. What is the age range of the students (oldest and youngest)?

> range(students$Age)

(Add more sheets when needed)

10. Are there any students with the same score? If so, how many?

> table(students$Score)

> myTab <- table(students$Score)

> myTab_duplicate <- myTab[myTab > 1]

> nrow(myTab_duplicate)

> sum(duplicated(students$Score))

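11. How many students fall within specific age groups (e.g., 18-20, 21-23)?

> myTab_by_Age <- cut(students$Age, breaks = c(18, 21, 24), right = FALSE)
# breaks = c(18, 21, 24) with right = FALSE gives [18,21) and [21,24),
# which correspond to ages 18 to 20 and 21 to 23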

> myTab_count <- table(myTab_by_Age)

12. What percentage of students scored above the average score?

> mean(students$Score > mean(students$Score)) * 100

13. List down all the students and put a remark Passed when the score is 75 or above and Failed if not.

> for (i in seq_len(nrow(students))) {
    if (students$Score[i] >= 75) {
      cat(students$Name[i], "Passed.\n")
    } else {
      cat(students$Name[i], "Failed.\n")
    }
  }

14. Create an additional column in the students dataset named Grade, where A is given to Scores of 90 and above, B to Scores from 80 to 89, and C to Scores below 80.

> students$Grade <- ifelse(students$Score >= 90, "A",
                    ifelse(students$Score >= 80, "B", "C"))
