Merging and Importing Data Additional Material

The document provides an overview of built-in functions in R for exploring the iris dataset, including how to determine its dimensions, structure, and summary statistics. It also discusses the use of head and tail functions to view the first and last observations of the dataset. Additionally, it outlines the dependencies required for reading XML and Excel files in R, including necessary package installations on different operating systems.

Uploaded by

D.R.Anitha Sofia Liz CSE STAFF

Additional Material for Merging and Importing Data

Built-in functions for exploring a data frame

We will use the built-in dataset iris to explore some of the useful functions that ship with base R. To
find the dimensions of iris, we use the dim function. The output of dim is a vector whose
elements are the number of rows and the number of columns, respectively.
dim(iris)

## [1] 150 5
We can also use nrow and ncol to get the number of rows and number of columns, respectively.
nrow(iris)

## [1] 150
ncol(iris)

## [1] 5
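As a small sketch of how these three functions relate, the two elements returned by dim can be pulled out by position and compared with nrow and ncol. The variable names used here (iris_dims, n_rows, n_cols) are illustrative and not part of the original material.

```r
# dim() returns an integer vector of the form c(rows, columns)
iris_dims <- dim(iris)

# Extract each element by position
n_rows <- iris_dims[1]
n_cols <- iris_dims[2]

# Both comparisons should evaluate to TRUE
n_rows == nrow(iris)
n_cols == ncol(iris)
```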
Thus, iris has 150 rows and 5 columns, which can also be verified using the str function. str also returns
other useful information, including the data type of each column.
str(iris)

## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
The first line of the output indicates that this dataset is a data frame with 150 observations of 5 variables.
num denotes that the variables Sepal.Length, Sepal.Width, Petal.Length and Petal.Width are numeric.
Factor denotes that the variable Species is categorical with 3 levels (setosa, versicolor, virginica).
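The same type information can be retrieved programmatically rather than read off the printed str output. The following is a minimal sketch using only base R functions; the variable names are chosen for illustration.

```r
# Class of every column, returned as a named character vector
column_classes <- sapply(iris, class)
column_classes

# Names of the numeric columns only
numeric_columns <- names(iris)[sapply(iris, is.numeric)]
numeric_columns

# Levels of the categorical variable
species_levels <- levels(iris$Species)
species_levels
```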
To see the range of values in iris, we use the summary function. For each numeric column it reports the
minimum, maximum, quartiles, median and mean; for a factor it reports the number of observations in each
level (Shaughnessy, Hasenmueller, and Prener 2018).
summary(iris)

##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
##        Species
##  setosa    :50
##  versicolor:50
##  virginica :50
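The individual statistics that summary reports can also be computed one at a time, which is convenient when only a single value is needed. The sketch below does this for one column; the choice of Petal.Length and the variable name petal_length are illustrative.

```r
petal_length <- iris$Petal.Length

range(petal_length)    # minimum and maximum
median(petal_length)
mean(petal_length)

# First and third quartiles
quantile(petal_length, probs = c(0.25, 0.75))

# For a factor, table() gives the per-level counts that summary() reports
table(iris$Species)
```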

Figure 1: Installing readxl and Rcpp

We use head to obtain the first n observations and tail to obtain the last n observations; by default, n = 6.
These are good commands for obtaining an intuitive idea of what the data look like without revealing the
entire dataset, which could have millions of rows and thousands of columns (Cai 2013).
head(iris, 2)

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
tail(iris, 2)

##     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
## 149          6.2         3.4          5.4         2.3 virginica
## 150          5.9         3.0          5.1         1.8 virginica
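A few further calls illustrate the default value of n and one less obvious option, a negative n. This is a minimal sketch; the variable names first_rows and last_rows are illustrative.

```r
# With no second argument, head() and tail() each return 6 rows
first_rows <- head(iris)
last_rows  <- tail(iris)
nrow(first_rows)
nrow(last_rows)

# A negative n drops rows instead: all rows except the last 145
head(iris, -145)

# tail() keeps the original row names, which show where the rows came from
rownames(tail(iris, 3))
```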

Dependencies for reading datasets in R

To read XML files in R, we need to install the XML package. On Ubuntu, however, the system package
libxml2-dev must be installed first (Stack Overflow 2013). On Ubuntu or another Linux system that uses
apt-get, open the terminal and type the following commands.
sudo apt-get update
sudo apt-get install libxml2-dev
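Once the system library is in place, the XML package can be installed from within R and used to parse a document. The sketch below assumes the XML package from CRAN; the two-record XML string is made up purely for illustration, and the installation line is left commented out because it needs a network connection.

```r
# One-time installation from CRAN (needs a network connection):
# install.packages("XML")

# Hypothetical two-record document, used only for illustration
xml_text <- "<records>
  <record><id>1</id><species>setosa</species></record>
  <record><id>2</id><species>virginica</species></record>
</records>"

if (requireNamespace("XML", quietly = TRUE)) {
  # asText = TRUE tells xmlParse() the input is XML content, not a file name
  doc <- XML::xmlParse(xml_text, asText = TRUE)

  # Each <record> becomes a row and each child element a column
  records <- XML::xmlToDataFrame(doc)
  print(records)
} else {
  message("The XML package is not installed.")
}
```

For a file on disk, the file path would be passed to xmlParse in place of xml_text, without asText = TRUE.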
Similarly, to import Excel data in R, we need the readxl and Rcpp packages. If these packages are not
installed when you try to import Excel data, a pop-up message like the one shown in Figure 1 is displayed.
Clicking Yes in this message installs the packages.
If you are using Windows, you do not need to install these packages.
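As a rough sketch of reading an Excel sheet from code once readxl is available, the example below uses datasets.xlsx, an example workbook bundled with the readxl package itself. That workbook is not part of the original material, and the installation line is commented out because it needs a network connection.

```r
# One-time installation from CRAN (needs a network connection):
# install.packages("readxl")

if (requireNamespace("readxl", quietly = TRUE)) {
  # Path to an example workbook that ships with readxl
  example_path <- readxl::readxl_example("datasets.xlsx")

  # List the sheets, then read the one named "iris"
  print(readxl::excel_sheets(example_path))
  iris_from_excel <- readxl::read_excel(example_path, sheet = "iris")
  print(dim(iris_from_excel))
} else {
  message("The readxl package is not installed.")
}
```

To read your own workbook, replace example_path with the path to the file.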

References
Shaughnessy, Andrew, Elizabeth Hasenmueller, and Christopher Prener. 2018. “Exploring Data in R.”
https://cran.r-project.org/web/packages/driftR/vignettes/ExploringData.html.
Cai, Eric. 2013. “Exploratory Data Analysis: Useful R Functions for Exploring a Data Frame.”
https://chemicalstatistician.wordpress.com/2013/08/19/exploratory-data-analysis-useful-r-functions-for-exploring-a-data-frame/.
Stack Overflow. 2013. “Unable to install R package in Ubuntu 11.04.”
https://stackoverflow.com/questions/7765429/unable-to-install-r-package-in-ubuntu-11-04.
