Module 2.9

The document provides a comprehensive guide on applying R to analyze a dataset of students, detailing commands for inspecting the dataset structure, dimensions, and general descriptions. It includes specific R functions to answer various analytical questions about the dataset, such as calculating averages, identifying unique values, and categorizing scores. Additionally, it outlines a hands-on activity for users to create their own dataset and apply similar analytical techniques.

Uploaded by

anonatnem


2.9 APPLYING R ON DATASETS

Now that we have a dataset that we can work on, let's inspect it first to better understand what we are dealing with before doing anything with it.

• Display the Dataset Structure

> str(students)
# str() displays the structure of the object: its class, its dimensions, and
# the type and first few values of each column

• Check the Dimensions of the Dataset

> dim(students)
# dim() displays the dimensions of the object: the number of rows, then the number of columns

• List the Names of the Columns

> names(students)
# names() lists the names of the object's columns

• Get a General Description of the Dataset

> summary(students)
# summary() prints a general description of each column of the object:
# for numeric columns, the minimum, quartiles, median, mean, and maximum
# the displayed result varies with the type of each column

• Plot a General Visualization of the Dataset

(more on data exploration and visualization in Module 3)

> plot(students)
# for a data frame with three or more columns, plot() draws a scatterplot
# matrix: one small plot for each pair of columns of the object.
# this helps you assess which variables are best suited for further
# processing and evaluation to describe the dataset
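A minimal, self-contained sketch of the inspection commands above, assuming a small hypothetical `students` data frame. The column names follow this module (Name, Age, Score), but the rows are made up for illustration; your own dataset is loaded as described earlier in the module. `plot()` is left out so the sketch runs without a graphics window.

```r
# Hypothetical sample data: names and values are made up for illustration
students <- data.frame(
  Name  = c("Ana", "Ben", "Cora", "Dan", "Eve"),
  Age   = c(19, 20, 21, 22, 23),
  Score = c(88, 74, 95, 81, 74)
)

str(students)      # class, dimensions, and type of each column
dim(students)      # 5 rows, 3 columns
names(students)    # "Name" "Age" "Score"
summary(students)  # min, quartiles, median, mean, max for the numeric columns
```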

Now let's investigate our dataset further and use the basic concepts of R to process the data. Let's start by asking some questions to better understand the dataset, and implement some R code to get the results.

1. How many records does the dataset contain?

> dim(students)
# dim() displays 2 values, rows and columns

> nrow(students)
# nrow() displays the number of rows in an object.
# ncol() displays the number of columns in an object.
# dim() displays both

Answer:

2. How many students are there of each age?

> table(students$Age)
# table() displays the count of each unique value of a specific column

Answer: 19 20 21 22 23

3. What are the unique ages represented in the dataset?

> unique(students$Age)
# unique() displays each unique value of a specific column

Answer:
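The sketch below, using a hypothetical vector of ages, contrasts questions 2 and 3: `unique()` returns the distinct values in order of first appearance, while `table()` sorts the distinct values and counts each one. Either way, the number of distinct ages is the same.

```r
# Hypothetical ages, for illustration only
ages <- c(21, 19, 20, 21, 23, 19, 21)

age_counts <- table(ages)   # how many students have each age
age_counts
unique(ages)                # distinct ages in order of first appearance: 21 19 20 23
sort(unique(ages))          # the same values in ascending order: 19 20 21 23
names(age_counts)           # table() keys are the sorted distinct values, as text
```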

4. What is the average score of the students?

> mean(students$Score)
# mean() calculates and displays the average of the values of the column

Answer:
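One detail worth checking when computing an average: if any score is missing (`NA`), `mean()` returns `NA` unless you pass `na.rm = TRUE`. A short sketch with hypothetical scores:

```r
# Hypothetical scores; NA stands for a missing score
scores <- c(80, 90, NA, 85)

mean(scores)                # NA: one missing value makes the mean unknown
mean(scores, na.rm = TRUE)  # 85: (80 + 90 + 85) / 3, ignoring the NA
```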

5. Which student has the highest score?

> max(students$Score)
# max() displays the maximum value of the provided data

> students[which.max(students$Score), ]
# which.max() returns the row index of the (first) maximum score; indexing
# the dataset with that index returns the complete record (row)

Answer:

6. Which student has the lowest score?

> min(students$Score)
# min() displays the minimum value of the provided data

> students[which.min(students$Score), ]
# which.min() returns the row index of the (first) minimum score; indexing
# the dataset with that index returns the complete record (row)

Answer:
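Because `which.max()` and `which.min()` return only the first match, students tied for the top (or bottom) score can be missed. A sketch with hypothetical data showing a comparison-based alternative that returns every tied record:

```r
# Hypothetical data with a tie for the highest score
students <- data.frame(
  Name  = c("Ana", "Ben", "Cora", "Dan"),
  Score = c(95, 80, 95, 70)
)

students[which.max(students$Score), ]              # only Ana (the first maximum)
students[students$Score == max(students$Score), ]  # Ana and Cora (all tied maxima)
students[students$Score == min(students$Score), ]  # Dan
```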

7. What is the median age of the students?

> median(students$Age)
# median() computes and displays the middle value of the sorted data

Answer:
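A worked check of how `median()` behaves, using hypothetical ages: with an odd number of values it returns the middle one, and with an even number it averages the two middle values.

```r
# Hypothetical ages; an even count has two middle values
ages_even <- c(23, 19, 22, 20)
median(ages_even)   # sorted: 19 20 22 23 -> (20 + 22) / 2 = 21

# Hypothetical ages; an odd count has a single middle value
ages_odd <- c(23, 19, 21)
median(ages_odd)    # sorted: 19 21 23 -> 21
```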

8. How many students scored above 80?

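> number_of_students_above_80 <- sum(students$Score > 80)
> number_of_students_above_80

# students$Score > 80 produces TRUE or FALSE for each student; sum() treats
# TRUE as 1 and FALSE as 0, so used this way it behaves like count() in other languages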

Answer:
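To see the counting trick step by step, the sketch below uses hypothetical scores. Note that "above 80" excludes a score of exactly 80; use `>=` if 80 should be included.

```r
# Hypothetical scores for illustration
scores <- c(92, 78, 81, 80, 85)

above_80 <- scores > 80   # TRUE FALSE TRUE FALSE TRUE (80 itself is not above 80)
sum(above_80)             # 3: each TRUE counts as 1, each FALSE as 0
sum(scores >= 80)         # 4: >= also counts the score of exactly 80
```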

9. What is the age range of the students (oldest and youngest)?

> range(students$Age)
# range() displays the minimum and maximum values of a vector
# min() and max() return the same two values, but as two separate commands
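`range()` returns a two-element vector (youngest, oldest), so the spread between them is one subtraction away. A sketch with hypothetical ages:

```r
# Hypothetical ages for illustration
ages <- c(21, 19, 23, 20)

age_range <- range(ages)      # c(19, 23): youngest and oldest
age_range[2] - age_range[1]   # 4 years between the oldest and the youngest
diff(range(ages))             # the same result in one call
```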

10. Are there any students with the same score? If so, how many?

> table(students$Score)
# this is the simplest command to execute: look for any score whose count is
# greater than 1 (one), and that's it

> myTab <- table(students$Score)

> myTab_duplicate <- myTab[myTab > 1]
> length(myTab_duplicate)
# this looks complicated, but the first command creates a table of the unique
# Score values and how many times each one appears; the second command keeps
# only the scores whose count is greater than 1.
# the last command counts how many distinct scores are shared by two or more students.

> sum(duplicated(students$Score))
# duplicated() marks each value that already appeared earlier as TRUE, and
# sum() counts those TRUEs. This counts the extra copies, so it can differ
# from length(myTab_duplicate) when a score appears three or more times

Answer:
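The two approaches above answer slightly different questions, which the hypothetical scores below make visible: the filtered table counts distinct shared scores, `duplicated()` counts extra copies, and `%in%` counts the students involved.

```r
# Hypothetical scores: 85 appears three times, 90 twice, 70 once
scores <- c(85, 90, 85, 70, 90, 85)

score_counts <- table(scores)
repeated <- score_counts[score_counts > 1]
length(repeated)          # 2: two distinct scores (85 and 90) are shared
sum(duplicated(scores))   # 3: extra copies (two more 85s, one more 90)

# number of students whose score is shared with at least one other student
sum(scores %in% as.numeric(names(repeated)))   # 5
```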

11. How many students fall within specific age groups (e.g., 18-20, 21-23)?

> myTab_by_Age <- cut(students$Age, breaks = c(18, 21, 24), right = FALSE, labels = c("18-20", "21-23"))
> myTab_count <- table(myTab_by_Age)
> myTab_count
# cut() assigns each age to an interval defined by breaks; right = FALSE makes
# each interval include its left end and exclude its right end, so [18,21)
# holds ages 18-20 and [21,24) holds ages 21-23.
# an age outside every interval becomes NA and is left out of the count.
# running table() then counts how many students fall into each category.

Answer: 18-20 21-23
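A sketch with hypothetical whole-number ages showing how `breaks` and `right` interact: the left-closed version and the default right-closed version with shifted breaks produce the same two groups, and a value outside the breaks becomes `NA`.

```r
# Hypothetical whole-number ages
ages <- c(18, 19, 20, 21, 22, 23, 20)

# right = FALSE: intervals [18,21) and [21,24) -> ages 18-20 and 21-23
groups_left <- cut(ages, breaks = c(18, 21, 24), right = FALSE,
                   labels = c("18-20", "21-23"))
table(groups_left)   # 18-20: 4, 21-23: 3

# default right = TRUE: intervals (17,20] and (20,23] give the same groups here
groups_right <- cut(ages, breaks = c(17, 20, 23), labels = c("18-20", "21-23"))
identical(groups_left, groups_right)   # TRUE

# 24 lies outside [18,21) and [21,24), so it is not assigned to any group
cut(24, breaks = c(18, 21, 24), right = FALSE)   # NA
```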

12. What percentage of students scored above the average score?

> mean(students$Score > mean(students$Score)) * 100

# the inner mean() computes the average score.
# students$Score > mean(students$Score) gives TRUE for each student above the
# average and FALSE otherwise; the outer mean() of these values is the
# proportion of TRUEs, and multiplying by 100 converts it to a percentage.

Answer:
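A worked version of the same one-liner, split into steps with hypothetical scores so each intermediate value can be checked by hand:

```r
# Hypothetical scores for illustration
scores <- c(70, 80, 90, 100)

avg <- mean(scores)                # 85
above <- scores > avg              # FALSE FALSE TRUE TRUE
mean(above) * 100                  # 50: 2 of 4 students, as a percentage
sum(above) / length(above) * 100   # the same calculation written out
```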

13. List all the students with the remark Passed when the score is 75 or above, and Failed if not.

> for (i in seq_len(nrow(students))) {
    # seq_len() visits every row index (and does nothing if there are no rows)
    if (students$Score[i] >= 75) {
      cat(students$Name[i], "Passed.\n")
    } else {
      cat(students$Name[i], "Failed.\n")
    }
  }
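The loop above can also be written without a loop, since `ifelse()` works on a whole column at once. A sketch with a hypothetical three-student data frame (note that a score of exactly 75 passes):

```r
# Hypothetical data for illustration
students <- data.frame(
  Name  = c("Ana", "Ben", "Cora"),
  Score = c(75, 74, 90)
)

# one remark per student, computed for the whole column at once
remarks <- ifelse(students$Score >= 75, "Passed.", "Failed.")
cat(paste(students$Name, remarks), sep = "\n")
```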

14. Create an additional column named Grade in the students dataset, where A is given to scores of 90 and above, B to scores from 80 to 89, and C to scores below 80.

> students$Grade <- ifelse(students$Score >= 90, "A",
                           ifelse(students$Score >= 80, "B", "C"))
# creating a new column in a dataset is as simple as assigning values to a new
# column name (students$Grade); the nested ifelse() checks >= 90 first, then
# >= 80, and gives "C" to every remaining score
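As a cross-check on the boundaries (90 is an A, 80 is a B, 79 is a C), the sketch below applies the same nested `ifelse()` to hypothetical scores and compares it with an equivalent `cut()` formulation. The `cut()` version is one alternative, not the module's method.

```r
# Hypothetical scores chosen to sit on and around the grade boundaries
scores <- c(95, 90, 89, 80, 79, 60)

grade_ifelse <- ifelse(scores >= 90, "A", ifelse(scores >= 80, "B", "C"))

# same rule with cut(): [-Inf,80) -> C, [80,90) -> B, [90,Inf) -> A
grade_cut <- as.character(cut(scores, breaks = c(-Inf, 80, 90, Inf),
                              labels = c("C", "B", "A"), right = FALSE))

grade_ifelse                        # "A" "A" "B" "B" "C" "C"
identical(grade_ifelse, grade_cut)  # TRUE
```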
HANDS-ON ACTIVITY # 4
Well then, now that we have learned how to use the basic commands in R to process data, let us now put into practice what we have done in this session.

Scenario:
You are given a dataset with at least 3 columns and 40 records. You are asked to describe the dataset, apply some data processing operations, and produce a valuable result.

Data: You can create a dataset on your own or find some simple datasets online.

Assessing and Describing a Dataset:

Objective: Apply some analysis to describe your dataset and present valuable information that further describes it.

Steps:

1. Create your dataset.

2. Load your dataset into R (refer to pages 77 to 80).

3. Use the basic R commands that describe the dataset (refer to page 81).

4. Once you are done with the basic dataset description, do some data processing to better evaluate the information in your dataset (use the example questions found on pages 81 to 85 as reference).
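For steps 1 to 3, a minimal sketch of creating a dataset that meets the scenario (40 records, 3 columns), saving it as a CSV file, and loading it back. The column names `ID`, `Hours`, and `Score`, the random values, and the file name `my_data.csv` are all hypothetical placeholders for your own data.

```r
# Hypothetical dataset: 40 records, 3 columns
set.seed(42)  # makes the random values reproducible
my_data <- data.frame(
  ID    = 1:40,
  Hours = sample(1:10, 40, replace = TRUE),
  Score = round(runif(40, min = 50, max = 100))
)

# save to a CSV file and load it back, as you would with your own file
csv_path <- file.path(tempdir(), "my_data.csv")   # hypothetical file name
write.csv(my_data, csv_path, row.names = FALSE)
loaded <- read.csv(csv_path)

dim(loaded)       # 40 3
str(loaded)
summary(loaded)
```

Writing with `row.names = FALSE` keeps R from adding an extra unnamed index column to the file, so the reloaded data has the same three columns.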
Paste or write your R scripts below.
(If you are using a CSV file for your dataset, please include a copy within this document.)

1. How many records does the dataset contain?

> dim(students)

> nrow(students)

2. How many students are there of each age?

> table(students$Age)

3. What are the unique ages represented in the dataset?

> unique(students$Age)

4. What is the average score of the students?

> mean(students$Score)

5. Which student has the highest score?

> max(students$Score)

> students[which.max(students$Score), ]

6. Which student has the lowest score?

> min(students$Score)

> students[which.min(students$Score), ]

7. What is the median age of the students?

> median(students$Age)

8. How many students scored above 80?

> number_of_students_above_80 <- sum(students$Score > 80)

9. What is the age range of the students (oldest and youngest)?

> range(students$Age)

(Add more sheets when needed)

10. Are there any students with the same score? If so, how many?
> table(students$Score)

> myTab <- table(students$Score)

> myTab_duplicate <- myTab[myTab > 1]

> length(myTab_duplicate)

> sum(duplicated(students$Score))

11. How many students fall within specific age groups (e.g., 18-20, 21-23)?
> myTab_by_Age <- cut(students$Age, breaks = c(18, 21, 24), right = FALSE, labels = c("18-20", "21-23"))

> myTab_count <- table(myTab_by_Age)

12. What percentage of students scored above the average score?

> mean(students$Score > mean(students$Score)) * 100

13. List all the students with the remark Passed when the score is 75 or above, and Failed if not.
> for (i in seq_len(nrow(students))) {
    if (students$Score[i] >= 75) {
      cat(students$Name[i], "Passed.\n")
    } else {
      cat(students$Name[i], "Failed.\n")
    }
  }
14. Create an additional column named Grade in the students dataset, where A is given to scores of 90 and above, B to scores from 80 to 89, and C to scores below 80.
> students$Grade <- ifelse(students$Score >= 90, "A",
                           ifelse(students$Score >= 80, "B", "C"))

(Add more sheets when needed)

