BigData - BCom Unit 4

This document provides an overview of data frames in R, including how to create, access, and manipulate them. It covers the creation of a data frame for employee data, methods for accessing data using indices and column names, and functions for exploring data such as dim(), nrow(), and summary(). Additionally, it explains how to load data from external files and subset data frames based on specific conditions.

Uploaded by

Murali Mohan Reddy E


Unit-IV: EXPLORING DATA IN R

Data Frames
A data frame is similar to a database table or an Excel spreadsheet. It
has a fixed set of columns, each of which is expected to hold values of a
single data type, and any number of rows, where each row is a set of
related values, one per column.
Suppose we have been asked to store data about our employees (employee
ID, name and the project that they are working on). We have been given
three independent vectors, “EmpNo”, “EmpName” and “ProjName”, that hold
the employee IDs, employee names and project names, respectively.
> EmpNo <- c(1000, 1001, 1002, 1003, 1004)
> EmpName <- c("Jack", "Jane", "Margaritta", "Joe", "Dave")
> ProjName <- c("P01", "P02", "P03", "P04", "P05")
However, we need a data structure similar to a database table or an Excel
spreadsheet that can bind all these details together. We create a data
frame named “Employee” to store all three vectors together.
> Employee <- data.frame(EmpNo, EmpName, ProjName)
Let us print the content of the data frame, “Employee”.
> Employee
  EmpNo    EmpName ProjName
1  1000       Jack      P01
2  1001       Jane      P02
3  1002 Margaritta      P03
4  1003        Joe      P04
5  1004       Dave      P05
We have just created a data frame, “Employee”, with the data neatly
organised into rows and the variable names serving as column names
across the top.
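Whether text columns end up as character vectors or factors depends on the stringsAsFactors argument of data.frame(), whose default changed to FALSE in R 4.0.0. The sketch below sets the argument explicitly so the result does not depend on the R version. The names EmployeeChr and EmployeeFct are used only for this illustration.

```r
# The three employee vectors.
EmpNo    <- c(1000, 1001, 1002, 1003, 1004)
EmpName  <- c("Jack", "Jane", "Margaritta", "Joe", "Dave")
ProjName <- c("P01", "P02", "P03", "P04", "P05")

# Keep the text columns as plain character vectors.
EmployeeChr <- data.frame(EmpNo, EmpName, ProjName, stringsAsFactors = FALSE)

# Convert the text columns to factors (the default before R 4.0.0).
EmployeeFct <- data.frame(EmpNo, EmpName, ProjName, stringsAsFactors = TRUE)

class(EmployeeChr$EmpName)   # "character"
class(EmployeeFct$EmpName)   # "factor"
levels(EmployeeFct$EmpName)  # the distinct names, sorted
```

Factor columns are what produce the “Levels:” lines and the “Factor w/ 5 levels” entries in printed output.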

Data Frame Access

There are two ways to access the content of data frames:
i. By providing the index number in square brackets
ii. By providing the column name as a string in double brackets.
By Providing the Index Number in Square Brackets
Example 1: To access the second column, “EmpName”, we type the
following command at the R prompt.
> Employee[2]
     EmpName
1       Jack
2       Jane
3 Margaritta
4        Joe
5       Dave
Example 2: To access the first and the second column, “EmpNo” and
“EmpName”, we type the following command at the R prompt.
> Employee[1:2]
  EmpNo    EmpName
1  1000       Jack
2  1001       Jane
3  1002 Margaritta
4  1003        Joe
5  1004       Dave
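Example 3: To access the third row (with all columns), we place the row
index before a comma and leave the column position empty.
> Employee[3,]
  EmpNo    EmpName ProjName
3  1002 Margaritta      P03
Notice the comma inside the square brackets. It is not a typo: the index
before the comma selects rows, and the empty position after it selects
all columns.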
Example 4: Let us define row names for the rows in the data frame.
> row.names(Employee) <- c("Employee 1", "Employee 2", "Employee 3",
+                          "Employee 4", "Employee 5")
> row.names(Employee)
[1] "Employee 1" "Employee 2" "Employee 3" "Employee 4" "Employee 5"
> Employee
           EmpNo    EmpName ProjName
Employee 1  1000       Jack      P01
Employee 2  1001       Jane      P02
Employee 3  1002 Margaritta      P03
Employee 4  1003        Joe      P04
Employee 5  1004       Dave      P05
Let us retrieve a row by its name.
> Employee["Employee 1",]
           EmpNo EmpName ProjName
Employee 1  1000    Jack      P01

Let us pack the row names in an index vector in order to retrieve multiple
rows.
> Employee[c("Employee 3", "Employee 5"),]
           EmpNo    EmpName ProjName
Employee 3  1002 Margaritta      P03
Employee 5  1004       Dave      P05
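By Providing the Column Name as a String in Double Brackets
> Employee[["EmpName"]]
[1] Jack       Jane       Margaritta Joe        Dave
Levels: Dave Jack Jane Joe Margaritta
The “Levels:” line appears because the column is stored as a factor, which
is how data.frame() treated character vectors by default before R 4.0.0.
In R 4.0.0 and later the column stays a character vector unless
stringsAsFactors = TRUE is passed, and the names are printed in quotes
without a “Levels:” line.
Typing so many double brackets can get unwieldy, so the $ (dollar)
notation is a simpler alternative.
> Employee$EmpName
[1] Jack       Jane       Margaritta Joe        Dave
Levels: Dave Jack Jane Joe Margaritta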
To retrieve a data frame slice with the two columns, “EmpNo” and
“ProjName”, we pack the column names in an index vector inside the single
square bracket operator.
> Employee[c("EmpNo", "ProjName")]
  EmpNo ProjName
1  1000      P01
2  1001      P02
3  1002      P03
4  1003      P04
5  1004      P05
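The access forms differ in what they return: single brackets keep the data frame structure, while double brackets and $ extract the column itself as a vector. A small sketch of the difference follows. The Employee data frame is rebuilt inside the snippet, with stringsAsFactors = FALSE as an assumption, so that it runs on its own.

```r
# Rebuild the example data frame so the snippet is self-contained.
Employee <- data.frame(
  EmpNo    = c(1000, 1001, 1002, 1003, 1004),
  EmpName  = c("Jack", "Jane", "Margaritta", "Joe", "Dave"),
  ProjName = c("P01", "P02", "P03", "P04", "P05"),
  stringsAsFactors = FALSE
)

class(Employee["EmpName"])    # "data.frame": a one-column data frame
class(Employee[["EmpName"]])  # "character": the column as a vector

# Double brackets and $ give the same result.
identical(Employee[["EmpName"]], Employee$EmpName)  # TRUE

# Row and column indices can be combined: row 3, column "ProjName".
Employee[3, "ProjName"]       # "P03"
```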
Let us add a new column to the data frame.
To add a new column, “EmpExpYears”, that stores the total number of years
of experience each employee has in the organisation, assign a vector to
the new column name:
> Employee$EmpExpYears <- c(5, 9, 6, 12, 7)
Print the contents of the data frame, “Employee”, to verify the addition
of the new column.
> Employee
  EmpNo    EmpName ProjName EmpExpYears
1  1000       Jack      P01           5
2  1001       Jane      P02           9
3  1002 Margaritta      P03           6
4  1003        Joe      P04          12
5  1004       Dave      P05           7
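A new column can also be computed from existing ones. The sketch below adds EmpExpYears and then a derived logical column. The column name IsSenior and the 6-year threshold are hypothetical and not part of the original example.

```r
# Rebuild the example data frame so the snippet is self-contained.
Employee <- data.frame(
  EmpNo    = c(1000, 1001, 1002, 1003, 1004),
  EmpName  = c("Jack", "Jane", "Margaritta", "Joe", "Dave"),
  ProjName = c("P01", "P02", "P03", "P04", "P05"),
  stringsAsFactors = FALSE
)

# Add a column by assigning a vector with one value per row.
Employee$EmpExpYears <- c(5, 9, 6, 12, 7)

# Hypothetical derived column: TRUE for more than 6 years of experience.
Employee$IsSenior <- Employee$EmpExpYears > 6

# Use the logical column to pick rows, and a name vector to pick columns.
Employee[Employee$IsSenior, c("EmpName", "EmpExpYears")]
```

The assigned vector must have one value per row (or a length that recycles evenly); otherwise R reports an error.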

Ordering the Data Frames

Let us display the content of the data frame, “Employee”, in ascending
order of “EmpExpYears”.
> Employee[order(Employee$EmpExpYears),]
  EmpNo    EmpName ProjName EmpExpYears
1  1000       Jack      P01           5
3  1002 Margaritta      P03           6
5  1004       Dave      P05           7
2  1001       Jane      P02           9
4  1003        Joe      P04          12
Use the syntax shown next to display the content of the data frame,
“Employee”, in descending order of “EmpExpYears”.
> Employee[order(-Employee$EmpExpYears),]
  EmpNo    EmpName ProjName EmpExpYears
4  1003        Joe      P04          12
2  1001       Jane      P02           9
5  1004       Dave      P05           7
3  1002 Margaritta      P03           6
1  1000       Jack      P01           5
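Negating the column works only for numeric data. A sketch of a more general form uses the decreasing argument of order(), which also applies to character columns. The variable names byYearsDesc and byNameDesc are illustrative.

```r
# Rebuild the example data frame so the snippet is self-contained.
Employee <- data.frame(
  EmpNo       = c(1000, 1001, 1002, 1003, 1004),
  EmpName     = c("Jack", "Jane", "Margaritta", "Joe", "Dave"),
  ProjName    = c("P01", "P02", "P03", "P04", "P05"),
  EmpExpYears = c(5, 9, 6, 12, 7),
  stringsAsFactors = FALSE
)

# Descending order of experience, without negating the column.
byYearsDesc <- Employee[order(Employee$EmpExpYears, decreasing = TRUE), ]

# Descending alphabetical order of names (a character column).
byNameDesc <- Employee[order(Employee$EmpName, decreasing = TRUE), ]

byYearsDesc$EmpName  # "Joe" "Jane" "Dave" "Margaritta" "Jack"
```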
R Functions for Understanding Data in Data Frames
We will explore the data held in the data frame with the help of the
following R functions: dim(), nrow(), ncol(), str(), summary(), names(),
head(), tail() and edit().

dim() Function
The dim() function returns the dimensions of a data frame, that is, the
number of rows followed by the number of columns.
> dim(Employee)
[1] 5 4
The data frame, “Employee” has 5 rows and 4 columns.
nrow() Function
The nrow() function returns the number of rows in a data frame.
> nrow(Employee)
[1] 5
The data frame, “Employee” has 5 rows.
ncol() Function
The ncol() function returns the number of columns in a data frame.
> ncol(Employee)
[1] 4
The data frame, “Employee” has 4 columns.
str() Function
The str() function compactly displays the internal structure of an R
object. We will use it to display the internal structure of the dataset,
“Employee”.
> str(Employee)
'data.frame':   5 obs. of  4 variables:
 $ EmpNo      : num  1000 1001 1002 1003 1004
 $ EmpName    : Factor w/ 5 levels "Dave","Jack",..: 2 3 5 4 1
 $ ProjName   : Factor w/ 5 levels "P01","P02","P03",..: 1 2 3 4 5
 $ EmpExpYears: num  5 9 6 12 7
summary() Function
The summary() function returns a summary of each column of the dataset:
the minimum, quartiles, mean and maximum for numeric columns, and the
count per level for factor columns.
> summary(Employee)
     EmpNo            EmpName   ProjName  EmpExpYears
 Min.   :1000   Dave      :1   P01:1     Min.   : 5.0
 1st Qu.:1001   Jack      :1   P02:1     1st Qu.: 6.0
 Median :1002   Jane      :1   P03:1     Median : 7.0
 Mean   :1002   Joe       :1   P04:1     Mean   : 7.8
 3rd Qu.:1003   Margaritta:1   P05:1     3rd Qu.: 9.0
 Max.   :1004                            Max.   :12.0
names() Function
The names() function returns the names of an object. For a data frame,
these are the column headers. We will use it to return the column headers
of the dataset, “Employee”.
> names(Employee)
[1] "EmpNo"       "EmpName"     "ProjName"    "EmpExpYears"
Whereas names() returns only the column headers, str() shows the basic
structure of the dataset, including the type of each column, and so
provides an overall view of it.
head() Function
The head() function returns the first n observations, where n is 6 by
default.
Example 1: Here n is set to 3, so the output contains the first 3
observations of the dataset.
> head(Employee, n=3)
  EmpNo    EmpName ProjName EmpExpYears
1  1000       Jack      P01           5
2  1001       Jane      P02           9
3  1002 Margaritta      P03           6
Example 2: A negative value of n returns all but the last |n|
observations. If x is the total number of observations, the output
contains the first x + n observations. Here x = 5 and n = -2, so the
number of observations returned is
x + n = 5 + (-2) = 3
> head(Employee, n=-2)
  EmpNo    EmpName ProjName EmpExpYears
1  1000       Jack      P01           5
2  1001       Jane      P02           9
3  1002 Margaritta      P03           6
tail() Function
The tail() function returns the last n observations, where n is 6 by
default.
> tail(Employee, n=3)
  EmpNo    EmpName ProjName EmpExpYears
3  1002 Margaritta      P03           6
4  1003        Joe      P04          12
5  1004       Dave      P05           7
Example: When n is negative, tail() returns all but the first |n|
observations, that is, the last x + n observations, where x is the total
number of observations. Here x = 5 and n = -2, so the following call
returns the last 3 records from the dataset, “Employee”.
> tail(Employee, n=-2)
  EmpNo    EmpName ProjName EmpExpYears
3  1002 Margaritta      P03           6
4  1003        Joe      P04          12
5  1004       Dave      P05           7
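The negative-n rule for head() and tail() can be checked directly. A minimal sketch, with the data frame rebuilt so that it runs on its own. The variable names firstRows and lastRows are illustrative.

```r
# Rebuild the example data frame so the snippet is self-contained.
Employee <- data.frame(
  EmpNo       = c(1000, 1001, 1002, 1003, 1004),
  EmpName     = c("Jack", "Jane", "Margaritta", "Joe", "Dave"),
  ProjName    = c("P01", "P02", "P03", "P04", "P05"),
  EmpExpYears = c(5, 9, 6, 12, 7),
  stringsAsFactors = FALSE
)

x <- nrow(Employee)  # total number of observations: 5
n <- -2

firstRows <- head(Employee, n = n)  # all but the last 2 rows
lastRows  <- tail(Employee, n = n)  # all but the first 2 rows

nrow(firstRows) == x + n  # TRUE: 3 rows are returned
firstRows$EmpNo           # 1000 1001 1002
lastRows$EmpNo            # 1002 1003 1004
```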
edit() Function
The edit() function invokes an editor on an R object. For a data frame,
it opens a spreadsheet-like data editor. It returns the edited copy
rather than changing the original object, so assign the result to a
variable if the changes are to be kept. We will use the edit() function
to open the dataset, “Employee”, in the editor.
> edit(Employee)
To retrieve the first three rows (with all columns) from the dataset,
“Employee”, use the following syntax:
> Employee[1:3,]
  EmpNo    EmpName ProjName EmpExpYears
1  1000       Jack      P01           5
2  1001       Jane      P02           9
3  1002 Margaritta      P03           6
To retrieve the first three rows (with the first two columns) from the
dataset, “Employee”, use the following syntax:
> Employee[1:3, 1:2]
  EmpNo    EmpName
1  1000       Jack
2  1001       Jane
3  1002 Margaritta

A brief summary of functions for exploring data in R

Function Name          Description
nrow(x)                Returns the number of rows
ncol(x)                Returns the number of columns
str(mydata)            Displays the structure of a dataset
summary(mydata)        Provides basic descriptive statistics and frequencies
edit(mydata)           Opens the data editor
names(mydata)          Returns the list of variables in a dataset
head(mydata)           Returns the first n rows of a dataset. By default, n = 6
head(mydata, n=10)     Returns the first 10 rows of a dataset
head(mydata, n=-10)    Returns all the rows but the last 10
tail(mydata)           Returns the last n rows. By default, n = 6
tail(mydata, n=10)     Returns the last 10 rows
tail(mydata, n=-10)    Returns all the rows but the first 10
mydata[1:10, ]         Returns the first 10 rows
mydata[1:10, 1:3]      Returns the first 10 rows of data of the first 3 variables

Load Data Frames

Let us look at how R can load data into data frames from external files.
Reading from a .csv (Comma Separated Values) File
We have created a .csv file named “item.csv” in the D:\ drive. Viewed in
a spreadsheet, it has the following content:
     A          B                  C
1    Itemcode   ItemCategory       ItemPrice
2    I1001      Electronics        700
3    I1002      Desktop supplies   300
4    I1003      Office supplies    350

Let us load this file using the read.csv function.

> ItemDataFrame <- read.csv("D:/item.csv")
> ItemDataFrame
  Itemcode     ItemCategory ItemPrice
1    I1001      Electronics       700
2    I1002 Desktop supplies       300
3    I1003  Office supplies       350
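To try this without a D:\ drive, the sketch below writes the same three items to a temporary file and reads them back. The temporary path (csvPath) and the explicit stringsAsFactors = FALSE are assumptions made so that the snippet is self-contained and behaves the same across R versions.

```r
# Write the example content to a temporary .csv file.
csvPath <- tempfile(fileext = ".csv")
writeLines(c(
  "Itemcode,ItemCategory,ItemPrice",
  "I1001,Electronics,700",
  "I1002,Desktop supplies,300",
  "I1003,Office supplies,350"
), csvPath)

# read.csv treats the first line as the column header by default.
ItemDataFrame <- read.csv(csvPath, stringsAsFactors = FALSE)

names(ItemDataFrame)  # "Itemcode" "ItemCategory" "ItemPrice"
str(ItemDataFrame)    # shows the type chosen for each column
```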
Subsetting Data Frame
Let us subset the data frame to display the details of only those items
whose price is greater than or equal to 350.
> subset(ItemDataFrame, ItemPrice >= 350)
  Itemcode    ItemCategory ItemPrice
1    I1001     Electronics       700
3    I1003 Office supplies       350
To display only the category of those items (items whose price is greater
than or equal to 350), add the select argument.
> subset(ItemDataFrame, ItemPrice >= 350, select = c(ItemCategory))
     ItemCategory
1     Electronics
3 Office supplies
To display only the items whose category is either “Office supplies” or
“Desktop supplies”, combine two conditions with the | (or) operator.
> subset(ItemDataFrame, ItemCategory == "Office supplies" |
+        ItemCategory == "Desktop supplies")
  Itemcode     ItemCategory ItemPrice
2    I1002 Desktop supplies       300
3    I1003  Office supplies       350
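The same selections can be written with logical indexing in square brackets, and a chain of == comparisons joined by | can be replaced by %in%. This is a sketch of an alternative form, not a replacement for subset(). The item data is rebuilt inline, and the names expensive and supplies are illustrative.

```r
# Rebuild the item data so the snippet is self-contained.
ItemDataFrame <- data.frame(
  Itemcode     = c("I1001", "I1002", "I1003"),
  ItemCategory = c("Electronics", "Desktop supplies", "Office supplies"),
  ItemPrice    = c(700, 300, 350),
  stringsAsFactors = FALSE
)

# Rows where the price is at least 350 (logical vector before the comma).
expensive <- ItemDataFrame[ItemDataFrame$ItemPrice >= 350, ]

# Rows whose category is one of several values.
supplies <- ItemDataFrame[
  ItemDataFrame$ItemCategory %in% c("Office supplies", "Desktop supplies"),
]

expensive$Itemcode  # "I1001" "I1003"
supplies$Itemcode   # "I1002" "I1003"
```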
Reading from a Tab Separated Value File
For any file that uses a delimiter other than a comma, one can use the
“read.table” command.
Example: We have created a tab separated file named “item-tab-sep.txt” in
the D:\ drive. It has the following content.
Itemcode   ItemQtyOnHand   ItemReorderLvl
I1001      75              25
I1002      30              25
I1003      35              25
Let us load this file using the “read.table” function. We will read the
content from the file but will not store it in a data frame.
> read.table("d:/item-tab-sep.txt", sep="\t")
        V1            V2             V3
1 Itemcode ItemQtyOnHand ItemReorderLvl
2    I1001            75             25
3    I1002            30             25
4    I1003            35             25

Notice the use of V1, V2 and V3 as column headings. It means that the
column names in the file, “Itemcode”, “ItemQtyOnHand” and
“ItemReorderLvl”, are not treated as headings; they appear as the first
data row instead. In other words, the first line is not automatically
treated as a column header.
Let us modify the syntax so that the first line is treated as a column
header.
> read.table("d:/item-tab-sep.txt", sep="\t", header=TRUE)
  Itemcode ItemQtyOnHand ItemReorderLvl
1    I1001            75             25
2    I1002            30             25
3    I1003            35             25
Now let us read the content of the specified file into the data frame,
“ItemDataFrame”.
> ItemDataFrame <- read.table("D:/item-tab-sep.txt", sep="\t",
+                             header=TRUE)
> ItemDataFrame
  Itemcode ItemQtyOnHand ItemReorderLvl
1    I1001            75             25
2    I1002            30             25
3    I1003            35             25
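The effect of the header argument can be reproduced with a temporary file, which avoids depending on a D:\ drive. In this sketch the temporary path (tabPath) and the variable names noHeader and withHeader are assumptions made for illustration.

```r
# Write the example content to a temporary tab separated file.
tabPath <- tempfile(fileext = ".txt")
writeLines(c(
  "Itemcode\tItemQtyOnHand\tItemReorderLvl",
  "I1001\t75\t25",
  "I1002\t30\t25",
  "I1003\t35\t25"
), tabPath)

# Without header = TRUE the first line is read as an ordinary data row.
noHeader <- read.table(tabPath, sep = "\t")

# With header = TRUE the first line supplies the column names.
withHeader <- read.table(tabPath, sep = "\t", header = TRUE)

names(noHeader)    # "V1" "V2" "V3"
nrow(noHeader)     # 4
names(withHeader)  # "Itemcode" "ItemQtyOnHand" "ItemReorderLvl"
nrow(withHeader)   # 3
```

Reading the header as data also turns the numeric columns into text, because each column then mixes a name with numbers.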
Reading from a Table
A data table can reside in a text file. The cells inside the table are
separated by blank characters. An example of a table with 4 rows and 3
columns is given as follows:
1001 Physics 85
2001 Chemistry 87
3001 Mathematics 93
4001 English 84
Copy and paste the table in a file named “d:/mydata.txt” with a text
editor and then load the data into the workspace with the function
“read.table”.
> mydata = read.table("d:/mydata.txt")
> mydata
    V1          V2 V3
1 1001     Physics 85
2 2001   Chemistry 87
3 3001 Mathematics 93
4 4001     English 84
Merging Data Frames
Let us now merge two data frames using the merge function. The merge
function takes an x frame (here read from item.csv) and a y frame (here
read from item-tab-sep.txt) as arguments. By default, it joins the two
frames on the columns that have the same name in both (here, the two
“Itemcode” columns).
> csvitem <- read.csv("d:/item.csv")
> tabitem <- read.table("d:/item-tab-sep.txt", sep="\t", header=TRUE)
> merge(x=csvitem, y=tabitem)
  Itemcode     ItemCategory ItemPrice ItemQtyOnHand ItemReorderLvl
1    I1001      Electronics       700            75             25
2    I1002 Desktop supplies       300            30             25
3    I1003  Office supplies       350            35             25
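The join column can also be named explicitly with the by argument, and all.x = TRUE keeps rows of x that have no match in y. A minimal sketch follows, with both frames built inline rather than read from files. Dropping the third row of tabitem is a hypothetical change made only to show an unmatched row.

```r
# Build the two frames inline so the snippet is self-contained.
csvitem <- data.frame(
  Itemcode     = c("I1001", "I1002", "I1003"),
  ItemCategory = c("Electronics", "Desktop supplies", "Office supplies"),
  ItemPrice    = c(700, 300, 350),
  stringsAsFactors = FALSE
)
tabitem <- data.frame(
  Itemcode       = c("I1001", "I1002", "I1003"),
  ItemQtyOnHand  = c(75, 30, 35),
  ItemReorderLvl = c(25, 25, 25),
  stringsAsFactors = FALSE
)

# Same join as the default, with the key column stated explicitly.
merged <- merge(x = csvitem, y = tabitem, by = "Itemcode")

# Hypothetical case: y lacks item I1003. all.x = TRUE keeps the x row
# and fills the columns that come from y with NA.
partial <- merge(csvitem, tabitem[-3, ], by = "Itemcode", all.x = TRUE)

dim(merged)   # 3 5
partial       # ItemQtyOnHand and ItemReorderLvl are NA for I1003
```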
