R - Lecture 4

R data frames are two-dimensional data structures that allow for the storage of data with mixed data types across multiple variables. Data frames can be loaded from built-in datasets, external files like CSVs, or created from vectors. They can be manipulated by subsetting, extracting variables, editing values, and transforming or summarizing the data.

Lecture 4: R-Blocks

Data Frames

Ivan Belik

Assembled and built based on:


https://openlibra.com/en/book/download/an-introduction-to-r-2
R: Data Frames
• A data frame is a two-dimensional data structure.

• It is similar to a list, but all components of a data frame must have the same length.

In data frames:

• Each component forms a column

• The elements of each component (i.e., column) form the rows

• Consider the built-in R data frame “BOD”:
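
A minimal sketch of what this looks like in the console, assuming a standard R installation in which the datasets package (which provides BOD) is attached. The variable name column_lengths is only illustrative:

```r
# BOD ships with R in the 'datasets' package
BOD                                    # small enough to print in full

# Each component is a column, and all columns have the same length
column_lengths <- sapply(BOD, length)
column_lengths
```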


R: Data Frames
• Let’s start with loading some datasets.

• The easiest way to load data is to use the data() function.

• If entered without arguments, it will bring up a list of all datasets that come bundled with R.
R: Data Frames
• To load a data frame, pass its name as an argument to the data() function

• For example, to load the built-in data frame BOD:

• Running the class() function, we can verify that BOD is indeed a data frame object:
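
The two steps might be typed as follows (a sketch; bod_class is an illustrative name, not taken from the slides):

```r
data(BOD)                  # load the built-in data frame into the workspace
bod_class <- class(BOD)    # ask R what kind of object BOD is
bod_class
```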
R: Data Frames
• We can find names of components with the names() function:

• “BOD” – built-in R data frame:

• “iris” – data frame from the built-in package “datasets”:

To access all datasets known to R, type:
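
The exact commands shown on this slide are not preserved, so the following is only a sketch. The names() calls follow the bullets above. The data(package = ...) call is one documented way to list datasets and is an assumption about what the slide showed:

```r
bod_names  <- names(BOD)     # component names of BOD
iris_names <- names(iris)    # component names of iris
bod_names
iris_names

# List the data sets of one package. To cover every installed package,
# the documented form is data(package = .packages(all.available = TRUE)).
listing <- data(package = "datasets")
head(listing$results[, c("Package", "Item")])
```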


R: Data Frames
• We can also check some characteristics:

• dim() – the dimension of data frame

• nrow() – the number of observations

• ncol() – the number of variables
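
Applied to the built-in iris data frame, these checks could look like this (variable names are illustrative):

```r
iris_dim  <- dim(iris)     # c(rows, columns)
iris_rows <- nrow(iris)    # number of observations
iris_cols <- ncol(iris)    # number of variables
iris_dim
```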


R: Data Frames
• To save a data frame to a file, use the save() function:

• Note:

• Depending on your R installation, you may need to write the file path with doubled backslashes ( \\ ):

save(BOD, file = "C:\\Files\\BOD.Rdata")


R: Data Frames
• To load a saved data frame back into R, use the load() function:

• If you are a Mac user, you can find out how to specify the file path at the following link:

• http://rischanlab.github.io/SaveLoad.html

• It will be similar to the following:
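
A hedged sketch of a full save/load round trip. To keep it runnable on any system, it writes to a temporary folder instead of a fixed location. On a Mac, a path such as "~/Desktop/BOD.Rdata" (a hypothetical location) could be used in its place. The object name my_bod is illustrative:

```r
my_bod <- BOD                                   # working copy to save
rdata_path <- file.path(tempdir(), "BOD.Rdata") # platform-independent path

save(my_bod, file = rdata_path)   # write the object to disk
rm(my_bod)                        # remove it from the workspace

loaded_names <- load(rdata_path)  # restore it; returns the restored names
loaded_names
```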


R: Data Frames
• Frequently, you will create or obtain data using “external” software:

• One of the most common data formats is .csv

• CSV stands for comma-separated values

• The CSV format is supported by most popular data analytics tools, including R

• It can be created in Excel or even in a plain text editor

• A typical .csv file looks as follows:


R: Data Frames
• Assume that we have a my_csv.csv file at the following location:

"C:\\Files\\my_csv.csv"

• We can read it into R using the read.csv() function:

Note for Mac users:

The file path can start with ~/

For example: "~/Desktop/my_csv.csv" if “my_csv.csv” is located in the Desktop folder
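
A minimal, self-contained sketch: it first writes a small csv file and then reads it back. The column names Col1, Col2, Col3 follow the slides, but the cell values are made up, and a temporary folder stands in for "C:\\Files":

```r
# Create a small example file (values are purely illustrative)
csv_path <- file.path(tempdir(), "my_csv.csv")
writeLines(c("Col1,Col2,Col3",
             "1,a,10.5",
             "2,b,20.5",
             "3,c,30.5",
             "4,d,40.5"),
           csv_path)

# Read the file into a data frame
my_data <- read.csv(csv_path)
my_data
```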
R: Data Frames
• The read.csv() function can take the following arguments:
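
The argument list from the slide is not preserved. As an illustration only, here are a few commonly used, documented arguments of read.csv() and the underlying read.table() (header, sep, dec, na.strings, stringsAsFactors), applied to made-up inline text:

```r
# Semicolon-separated text with a decimal comma (made-up data)
csv_text <- "id;price;city\n1;2,5;Bergen\n2;NA;Oslo\n"

prices <- read.csv(text = csv_text,
                   header = TRUE,             # first line holds column names
                   sep = ";",                 # field separator
                   dec = ",",                 # decimal mark
                   na.strings = "NA",         # strings treated as missing
                   stringsAsFactors = FALSE)  # keep text columns as character
prices
```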
R: Data Frames
• We can also load data frames directly from the Internet if we have a valid link

• For example, we can load a csv file from Statistics Norway (www.ssb.no):

https://data.ssb.no/api/v0/dataset/85430.csv?lang=en


R: Data Frames
• You can also download the csv file to a specific folder

• That way you have a physical copy on your disk

download.file(url = "https://data.ssb.no/api/v0/dataset/85430.csv?lang=en",
              destfile = "C:/Files/data_from_web.csv")

• Warnings may appear, since data from the web is frequently not well formatted

• The downloaded csv file will then appear in the specified location:

• "C:/Files/data_from_web.csv"

• Note: your file path will be different, since you will save to a folder on your own PC

• Later, you can load the downloaded file with the read.csv() function (see the previous slides)
R: Data Frames
• Once again, you can read a csv file from your local drive:

• R opens my_csv.csv:

• This data frame has three variables (Col1, Col2, Col3) with 4 observations each
R: Data Frames
• Now, I can make some changes to the imported data and write them back to my_csv.csv

row.names=FALSE: required to avoid adding a column of row names to the updated csv file (see next slide)
R: Data Frames
• If we omit row.names=FALSE, we get the following result:
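
The effect of row.names = FALSE can be checked with a short sketch. The data values and temporary file paths are illustrative, not taken from the slides:

```r
my_data <- data.frame(Col1 = 1:4,
                      Col2 = c("a", "b", "c", "d"),
                      Col3 = c(10.5, 20.5, 30.5, 40.5))

path_clean <- tempfile(fileext = ".csv")
path_rows  <- tempfile(fileext = ".csv")

write.csv(my_data, path_clean, row.names = FALSE)  # no row-name column
write.csv(my_data, path_rows)                      # default: row names written

names_clean <- names(read.csv(path_clean))
names_rows  <- names(read.csv(path_rows))  # extra first column, read back as "X"
names_clean
names_rows
```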
R: Data Frames
• We can create a data frame from vectors:

# 1. Create DataFrame from vectors


employee <- c('Lasse Lien', 'Eirik Knudsen', 'Ivan Belik')
salary <- c(25300, 24400, 23800)
start_date <- as.Date(c('2020-3-1','2019-3-25','2018-3-14'))
employ_data <- data.frame(employee, salary, start_date)
employ_data

• We can create a data frame from a matrix:

# 2. Create DataFrame from Matrix


x <- matrix(data=1:9, nrow=3, ncol=3)
y <- data.frame(x)
y
Data Frame Manipulations
R: Manipulating Data Frames
• Consider the built-in data frame “iris” for our future work:
> data("iris")

• The first thing to note about data frames:

• It is usually not very helpful to print the whole object to the screen

• R will literally print all the data to the screen (this is fine if you are dealing with small data frames)

• Data frames are often very large, so it is important to explore their structure
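
Instead of printing everything, one common option is to look at only a few rows. A small sketch using head() and tail() from base R (the variable names are illustrative):

```r
first_rows <- head(iris, n = 3)   # first three observations
last_rows  <- tail(iris, n = 3)   # last three observations
first_rows
last_rows
```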
R: Manipulating Data Frames
• First, the best thing to do is to use the names() and dim() functions

• names() gives you all variable names of the data frame

• dim() gives you an idea of the size of the data frame

5 variables (columns) in iris

5 columns (variables), 150 rows (observations) in iris


R: Extraction
• Most of the basic extraction principles, namely the [ ] notation that we used for matrices, also work for data frames.

• But you should remember that data frames are a special type of list

• This means that we can use $ to retrieve data.

• Let’s consider the built-in “iris” data frame and its columns (i.e., variables):

> data ("iris")


> names(iris)
[1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"

• For example, let’s try to extract the variable called "Petal.Width" (see next slide)
R: Extraction
• We know that Petal.Width is the fourth variable in the “iris” data frame
> names(iris)
[1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"

• You can combine the $ notation and the [ ] notation, since $ extracts a vector and vectors are indexed via [ ]
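
A sketch of the equivalent ways to extract Petal.Width, plus the combined $ and [ ] notation (variable names are illustrative):

```r
pw_dollar <- iris$Petal.Width        # by name, via $
pw_index  <- iris[, 4]               # by position: fourth column
pw_name   <- iris[["Petal.Width"]]   # by name, list-style

# $ returns a vector, so it can be indexed with [ ]
first_five <- iris$Petal.Width[1:5]
first_five
```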
R: Extraction
• You can manipulate a data frame’s variables in the same way as vectors
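
For instance, ordinary vector operations apply directly to a column. This is a sketch: the unit conversion assumes the iris measurements are in centimetres, as stated in the dataset's help page:

```r
mean_sepal <- mean(iris$Sepal.Length)   # summary of one column
sepal_mm   <- iris$Sepal.Length * 10    # element-wise arithmetic (cm to mm)
is_wide    <- iris$Sepal.Width > 3.5    # element-wise comparison
sum(is_wide)                            # number of flowers with wide sepals
```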
R: The with() function
• The $ notation can be inconvenient when you have to use many variables of the same data frame

• R has a function that makes things easier

• Whenever you need to extract or index multiple variables and you don’t feel like typing dataset$variable.name each time, just use the with() function:
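
A short comparison of the two styles. The computed ratio is just an illustrative expression, not one from the slides:

```r
# With $ notation, the data frame name is repeated for every variable
ratio_dollar <- mean(iris$Sepal.Length / iris$Sepal.Width)

# With with(), variables are looked up inside iris directly
ratio_with <- with(iris, mean(Sepal.Length / Sepal.Width))
ratio_with
```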
R: Subsetting
• The subset( ) function is a convenient way to select variables and observations:

• Here, we retrieve the rows of iris with Sepal.Length in the open interval (7.0, 7.3)
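
The call on the slide is not preserved. A sketch of a subset() call matching the description (mid_sepal is an illustrative name):

```r
# Keep rows whose Sepal.Length lies strictly between 7.0 and 7.3
mid_sepal <- subset(iris, Sepal.Length > 7.0 & Sepal.Length < 7.3)
mid_sepal
```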


R: Subsetting
• The subset( ) function can filter on conditions that combine several variables of the data frame:
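
As an illustration (the condition and the selected columns are assumptions, not the slide's original example), rows can be filtered on two variables while the select argument keeps only some columns:

```r
long_virginica <- subset(iris,
                         Species == "virginica" & Petal.Length > 6,
                         select = c(Petal.Length, Petal.Width, Species))
long_virginica
```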
R: Editing
• To edit your data manually, as in a spreadsheet, you can use the edit() function

• Type edit(iris) and you will get the following basic spreadsheet (R data editor):

• It is not very stable. If you have a choice, please use another tool (such as MS Excel) for this purpose
R: Editing
• Let’s do some manual data entry.

• We will use the $ operator to create a new (sixth) variable in the “iris” data frame:
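
A sketch of adding a sixth variable. The column name Sepal.Ratio and its definition are hypothetical, and a copy is used so that the built-in iris stays unchanged:

```r
iris_copy <- iris   # work on a copy

# Assigning to a name that does not exist yet creates a new column
iris_copy$Sepal.Ratio <- iris_copy$Sepal.Length / iris_copy$Sepal.Width

names(iris_copy)
```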
R: Function transform()
• To edit or transform several variables at once, use the transform() function:
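
A minimal sketch of transform(): one existing variable is rescaled and one new variable is added in a single call. Petal.Area is a hypothetical column name:

```r
iris_t <- transform(iris,
                    Sepal.Length = Sepal.Length * 10,         # overwrite
                    Petal.Area   = Petal.Length * Petal.Width) # add new

head(iris_t, n = 3)
```

transform() returns a modified copy, so iris itself is not changed unless the result is assigned back to it.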
R: More about objects and modes
• To detect the type of object you are dealing with, use the class() function

• First, let’s extract six components of the iris data frame

• Second, apply class() to each of them and check the result

• Factors (e.g., Species) are special vectors that carry an attribute called levels

• They are different from numeric vectors (see next slide)
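
A sketch using the built-in five-column iris. The names Sepal_Length and Species follow the slides, while sapply() is added here as a shortcut for checking all columns at once:

```r
Sepal_Length <- iris$Sepal.Length
Species      <- iris$Species

class_numeric <- class(Sepal_Length)   # class of a numeric column
class_factor  <- class(Species)        # class of a factor column

# class() applied to every column in one call
column_classes <- sapply(iris, class)
column_classes
```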


R: More about objects and modes
• To make it clear, print Sepal_Length and Species:

• In the given example we can see that printing each object produces different results:
R: More about objects and modes
• Sepal_Length is a plain numeric vector

• Running Sepal_Length returns all its elements

• Species is a factor

• A factor is a special type of vector that contains categorical data (see the red “box” below):

• It is often important to know the levels when you do complex statistical modeling in R
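
The levels of a factor can also be inspected directly. A small sketch (variable names are illustrative):

```r
species_levels <- levels(iris$Species)       # the distinct categories
n_levels       <- nlevels(iris$Species)      # how many categories there are
species_codes  <- as.integer(iris$Species)   # internal integer codes

species_levels
unique(species_codes)
```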
R: Changing object types
• R can convert objects from one type to another:

• For example:

• The functions listed above are well described in the R documentation


• Check the documentation when you need to use any of these functions
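
The slide's list of conversion functions is not preserved. As an assumption, the sketch below uses a few standard base R converters (as.character(), as.factor(), as.numeric()) on made-up values:

```r
species_chr <- as.character(iris$Species)          # factor -> character
species_fac <- as.factor(species_chr)              # character -> factor
numbers     <- as.numeric(c("1.5", "2", "3.25"))   # character -> numeric

# Caution: as.numeric() on a factor returns the internal codes,
# not the labels; convert to character first to get the labels.
year_fac    <- factor(c("2019", "2020", "2018"))
year_codes  <- as.numeric(year_fac)
year_values <- as.numeric(as.character(year_fac))
year_codes
year_values
```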
R: Data Summaries
• Summaries can be computed with functions that work with vectors and matrices:

• You can also apply the summary() function to the entire data frame:
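
A sketch with a few standard summary functions applied to one column, followed by summary() on the whole data frame (the choice of functions is illustrative):

```r
sepal_mean   <- mean(iris$Sepal.Length)
sepal_median <- median(iris$Sepal.Length)
sepal_sd     <- sd(iris$Sepal.Length)
sepal_range  <- range(iris$Sepal.Length)   # c(min, max)

# Column-by-column summary of the entire data frame
summary(iris)
```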
R: Data Summaries
• Another useful function is table()

• It groups the data by distinct value and shows the frequency of each value:

• The str() function summarizes the structure of a dataset
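
Both functions applied to iris, as a brief sketch (species_counts is an illustrative name):

```r
species_counts <- table(iris$Species)   # frequency of each species
species_counts

str(iris)   # compact overview: dimensions, column types, first values
```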


R: Data Summaries
• To get a more detailed summary of data frames, you can load the Hmisc package.

• You may have to install it first:


R: Data Summaries
• After installing the Hmisc package, we load it into the R session and run the describe() function (from Hmisc) for a detailed summary (including quantiles):
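
A cautious sketch of both steps. The installation line is commented out because it needs an Internet connection and only has to run once. The requireNamespace() guard is an addition so that the snippet does not fail when Hmisc is missing:

```r
# install.packages("Hmisc")   # run once if Hmisc is not installed yet

has_hmisc <- requireNamespace("Hmisc", quietly = TRUE)

if (has_hmisc) {
  library(Hmisc)            # attach the package
  print(describe(iris))     # detailed per-variable summary
} else {
  message("Hmisc is not installed; install it first to run describe().")
}
```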
