R Lab
Importing Data in R
Estimated time needed: 15 minutes
Objectives
After completing this lab you will be able to:
Import CSV and Excel files
Access rows and columns of a dataset
Access R's built-in datasets
Table of Contents
About the Dataset
Reading CSV Files
Reading Excel Files
Accessing Rows and Columns from dataset
Accessing Built-in Datasets in R
Let's learn how to import and read data from two common types of files used to store
tabular data (data stored in a table or a spreadsheet):
CSV files (.csv)
Excel files (.xls or .xlsx)
To begin, we'll need to download the data!
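# CSV file: download.file() saves the file at the URL into the current folder.
download.file("https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-RP0101EN-Coursera/v2/dataset/movies-db.csv",
    destfile="movies-db.csv")
# XLS file: mode="wb" keeps the binary Excel file intact on Windows.
download.file("https://cf-courses-data.s3.us.cloud-object-storage.appdom",
    destfile="movies-db.xls", mode="wb")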
https://labs.cognitiveclass.ai/v2/tools/jupyterlab?ulid=ulid-6080b253d3a4cb69b2885c96311644ebe83b4200 2/7
11/17/24, 5:44 PM lab1_jupyter_importing-data
If you ran the cell above, you have now downloaded the following files to your
current folder:
movies-db.csv
movies-db.xls
After the CSV file is read with read.csv, for example my_data <- read.csv("movies-db.csv"),
the data is stored in the my_data variable. Instead of viewing all the data at once, we can
call head(my_data) to look at only the first six rows of our table.
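By default head returns the first six rows; its n argument changes how many rows come back, and tail does the same from the bottom of the table. A small sketch that uses R's built-in women dataset as a stand-in for my_data, so it runs without the downloaded file:

```r
# head() returns the first 6 rows of a data frame by default.
first_rows <- head(women)
print(first_rows)

# The n argument controls how many rows are returned.
first_three <- head(women, n = 3)

# tail() works the same way, starting from the last row.
last_two <- tail(women, n = 2)
print(last_two)
```

With the lab data, head(my_data, n = 10) would show the first ten movies instead.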
You may also want to look at the structure of your newly created table. R provides a
function called str that summarizes a table's properties: the number of rows and columns,
and the name and type of each column. Let's try it out.
# Prints out the structure of your table.
str(my_data)
When we loaded the file with the read.csv function, we only had to pass it one
parameter: the path to our file.
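To see that read.csv needs nothing but a path, here is a self-contained sketch that writes the built-in women dataset to a temporary CSV file and reads it back; the temporary file (csv_path, a name chosen for this example) stands in for movies-db.csv:

```r
# Write a built-in dataset to a temporary CSV file so the example needs no download.
csv_path <- tempfile(fileext = ".csv")
write.csv(women, csv_path, row.names = FALSE)

# read.csv only needs the path; by default it expects a header row and commas.
csv_data <- read.csv(csv_path)

# str() shows the row count, column names and the type inferred for each column.
str(csv_data)
```

Whole-number columns come back as integers here, because read.csv infers each column's type from the text in the file.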
Coding Exercise: in the code cell below, get the summary of the my_data data frame.
# Write your code below. Don't forget to press Shift+Enter to execute the cell.
Once the readxl library is loaded with library(readxl) and its functions are ready, we can
move on to actually reading the file. readxl provides a function called read_excel, which
does all the work for us. You can use it like this:
# Read data from the XLS file and assign the table to the my_excel_data variable.
my_excel_data <- read_excel("movies-db.xls")
Since my_excel_data is now a data frame in R (read_excel returns a tibble, which is a kind
of data frame), much like the one we created from the CSV file, the usual R functions such
as head and str can be applied to it.
# Prints out the structure of your table.
# Tells you how many rows and columns there are, and the name and type of each column.
# The columns should match the table we created from the CSV file, as both files hold the same data.
str(my_excel_data)
Much like the read.csv function, read_excel takes as its main parameter the path
to the desired file.
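read_excel also accepts optional parameters; for instance, a workbook can contain several sheets and the sheet argument picks one. A sketch assuming the readxl package is installed; readxl_example() returns the path to a sample workbook bundled with readxl, used here instead of movies-db.xls:

```r
# Assumes readxl is installed, e.g. with install.packages("readxl").
library(readxl)

# Path to a sample workbook that ships with the readxl package.
xlsx_path <- readxl_example("datasets.xlsx")

# excel_sheets() lists the sheet names in a workbook.
sheets <- excel_sheets(xlsx_path)
print(sheets)

# read_excel() reads the first sheet by default; sheet selects another by name or position.
mtcars_xl <- read_excel(xlsx_path, sheet = "mtcars")

# The result is a tibble (class tbl_df), which is also a data frame.
print(class(mtcars_xl))
```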
[Tip] A library (in R, a package) is a collection of functions and classes used to perform
specific operations. You can install and load libraries to add functions that are not
included in base R. For example, the readxl library adds functions to read data from
Excel files.
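Even read.csv comes from a package: utils, which R attaches automatically at startup. A small sketch, assuming a standard R session; requireNamespace only checks whether readxl can be loaded and does not attach it:

```r
# Packages currently attached to the session (readxl appears after library(readxl)).
print(search())

# read.csv is defined in the "utils" package.
csv_home <- environmentName(environment(read.csv))
print(csv_home)

# Check whether readxl is installed without attaching it.
has_readxl <- requireNamespace("readxl", quietly = TRUE)
if (!has_readxl) {
  message("readxl is not installed; run install.packages(\"readxl\") first.")
}
```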
Many other libraries are available for a wide variety of tasks, and there are several
other libraries for reading Excel files; readxl is just one of them.
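One way to pull out a column is single square brackets with the column name, which keeps the result as a one-column data frame. A sketch using the built-in women dataset in place of my_data (with the lab data, my_data["name"] behaves the same way):

```r
# Single brackets with a column name return a one-column data frame, not a vector.
height_df <- women["height"]
class(height_df)   # "data.frame"
head(height_df, 3)
```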
Another way to access a column is the $ notation, which returns the column as a
vector:
# Retrieve the data for the "name" column in the data frame.
my_data$name
You can also do the same thing with double square brackets, which likewise return the
name column as a vector.
my_data[["name"]]
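The $ and [[ ]] forms return the same vector; one practical difference is that [[ ]] also accepts a column name stored in a variable. A sketch on the built-in women dataset (columns height and weight) standing in for my_data:

```r
# $ and [[ ]] both extract the column itself, as a plain vector.
by_dollar <- women$height
by_brackets <- women[["height"]]
identical(by_dollar, by_brackets)   # TRUE

# [[ ]] works with a name held in a variable; women$col would look for a column named "col".
col <- "weight"
weights <- women[[col]]
length(weights)                     # 15
```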
Similarly, any particular row of the dataset can be accessed. For example, to get the
first row of the dataset with all of its column values, we can use:
# Retrieve the first row of the data frame.
my_data[1,]
The value before the comma selects the row(s) of the dataset, and the value after the
comma (blank in the example above) selects the column(s) to retrieve. Setting the first
value to 1 says we want data from row 1; leaving the column blank says we want all the
columns in that row.
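The same [row, column] rule works with a column position instead of a name, and a blank position means "everything" in that dimension. A sketch on the built-in women dataset (columns height and weight) standing in for my_data:

```r
# [row, column]: a blank position keeps everything in that dimension.
first_row <- women[1, ]    # row 1, all columns (a one-row data frame)
second_col <- women[, 2]   # all rows, column 2 (a vector)
one_value <- women[1, 2]   # row 1, column 2 (a single value)
one_value                  # 115
```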
We can specify more than one row or column with c, the combine (concatenate) function.
By using c to combine several elements into a vector, we tell R which rows or columns we
want from the data frame. Let's try it out.
# Retrieve the first row of the data frame, but only the "name" and "length_min" columns.
my_data[1, c("name","length_min")]
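c can select several rows as well as several columns, and a range such as 1:3 is shorthand for c(1, 2, 3). A sketch on the built-in women dataset in place of my_data:

```r
# c() builds vectors of row positions and column names to select several at once.
subset_rows <- women[c(1, 3, 5), c("height", "weight")]
print(subset_rows)

# 1:3 is the same as c(1, 2, 3).
first_three_heights <- women[1:3, "height"]
first_three_heights   # 58 59 60
```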
R comes with many datasets already built into its environment, and you can list them
with the data() function. Going through each of them to look at its structure and figure
out what it represents would be tiring. Thankfully, R provides documentation for each
built-in dataset, which you can view with the help function.
For example, to learn more about the women dataset, we can use the
following function:
# Opens up the documentation for the built-in "women" dataset.
help(women)
Since these datasets are built in, you do not need to import or load them to use them.
If you reference one by its name, R already has the data frame ready.
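A short sketch of listing the built-in datasets and confirming where R finds one of them; builtin is a name chosen for this example, and data(package = "datasets") restricts the listing to R's standard datasets package:

```r
# data() lists datasets; the "Item" column of $results holds their names.
builtin <- data(package = "datasets")$results[, "Item"]
head(builtin)

# Built-in datasets need no loading; find() reports where R locates the name.
find("women")   # "package:datasets"
dim(women)      # 15 2
```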
women
Coding Exercise: in the code cell below, get the CO2 dataset.
# Write your code below. Don't forget to press Shift+Enter to execute the cell.
Authors
Hi! We are Iqbal Singh and Walter Gomes, the authors of this notebook. We hope you found it
easy to learn how to import data into R! Feel free to connect with us if you have any
questions.
Other Contributors
Yan Luo
Change Log
Date (YYYY-MM-DD) | Version | Changed By | Change Description
2021-03-04 | 2.0 | Yan | Added coding tasks