R - Lecture 4
Data Frames
Ivan Belik
• A data frame is similar to a list, BUT all components in the data frame are of equal length.
In data frames:
• If the data() function is entered without arguments, it lists all datasets that come bundled with R.
R: Data Frames
• To load a data frame, simply pass its name as an argument to the data() function
• Running the class() function, we can verify that BOD is indeed a data frame object:
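As a minimal sketch of these steps (BOD ships with R's built-in datasets package; the variable name `bundled` is ours, chosen for illustration):

```r
# Names of the datasets bundled with the 'datasets' package
bundled <- data(package = "datasets")$results[, "Item"]
head(bundled)

# Load the BOD data frame into the workspace and check its class
data(BOD)
class(BOD)   # "data.frame"
```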
• We can find names of components with the names() function:
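For instance, applied to the built-in BOD data frame, a short sketch looks like this:

```r
data(BOD)

# Column (component) names of the data frame
bod_names <- names(BOD)
bod_names      # "Time" "demand"

# A data frame has one component per column
length(BOD)    # 2
```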
• Note:
• Depending on your R installation, you may alternatively specify the file address with double backslashes ( use \\ ):
"C:\\Files\\my_csv.csv"
• If you are a Mac user, you can find how to specify the file address at the following link:
• http://rischanlab.github.io/SaveLoad.html
• CSV format is supported by most popular data analytics tools, including R
download.file(url = "https://data.ssb.no/api/v0/dataset/85430.csv?lang=en",
              destfile = "C:/Files/data_from_web.csv")
• Many warnings may appear, since data from the web is frequently not well formatted
• "C:/Files/data_from_web.csv"
• Note: your location address will be different, since you will save to a local folder on your own PC
• Later you can load the downloaded file with the read.csv() function (see the previous slides)
• Once again, you can read a CSV file from your local drive:
• R opens my_csv.csv:
• This data frame has three variables (Col1, Col2, Col3) and corresponding data (4 observations for each variable)
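A minimal, self-contained sketch of reading such a file. Since the original my_csv.csv is not available here, the code first writes a stand-in file to a temporary folder; the cell values are made up for illustration, and only the column names and the number of observations follow the slide.

```r
# Create a stand-in for my_csv.csv in a temporary folder (values are made up)
csv_path <- file.path(tempdir(), "my_csv.csv")
writeLines(c("Col1,Col2,Col3",
             "1,a,TRUE",
             "2,b,FALSE",
             "3,c,TRUE",
             "4,d,FALSE"), csv_path)

# Read the CSV file into a data frame
my_data <- read.csv(csv_path)
my_data

names(my_data)   # the three variables
nrow(my_data)    # the number of observations
```

On your own PC you would pass the real file address to read.csv() instead of `csv_path`.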
• Now, I can do some manipulations with the imported csv file and write all changes back to my_csv.csv
• row.names=FALSE: required to avoid adding new row names to the updated CSV file (see next slide)
• If we omit row.names=FALSE, we get the following result:
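The effect of row.names=FALSE can be sketched as follows. The data frame and the two file names are hypothetical stand-ins, written to a temporary folder so the example runs anywhere.

```r
# A small stand-in data frame (values are made up)
my_data <- data.frame(Col1 = 1:4,
                      Col2 = c("a", "b", "c", "d"),
                      Col3 = c(TRUE, FALSE, TRUE, FALSE))

with_rn    <- file.path(tempdir(), "with_row_names.csv")
without_rn <- file.path(tempdir(), "without_row_names.csv")

write.csv(my_data, with_rn)                        # default: row names are written
write.csv(my_data, without_rn, row.names = FALSE)  # row names are skipped

# Compare the first lines of both files
readLines(with_rn, n = 2)
readLines(without_rn, n = 2)

# Reading the first file back yields an extra leading column of row names
names_with    <- names(read.csv(with_rn))
names_without <- names(read.csv(without_rn))
names_with
names_without
```

Each write-and-read cycle without row.names=FALSE would add another such column, which is why the argument matters when a file is updated repeatedly.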
• We can create a data frame based on vectors:
• Also, you will have some idea about the size of the data frame
• But you should remember that data frames are a special type of list
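A minimal sketch of these points. The vector names and values are invented for illustration; str() and dim() are one way (an assumption about what the slide showed) to get an idea of the size.

```r
# Three vectors of equal length (made-up values)
student <- c("Anna", "Ben", "Carl")
age     <- c(21, 23, 22)
passed  <- c(TRUE, FALSE, TRUE)

# Combine them into a data frame; each vector becomes a column
df <- data.frame(student, age, passed)
df

str(df)       # compact overview: number of observations and variables
dim(df)       # rows and columns
is.list(df)   # a data frame is a special type of list
```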
• Let’s consider the built-in “iris”-data frame and its columns (i.e., variables):
• For example, let's try to extract the variable called "Petal.Width" (see next slide)
R: Extraction
• We know that Petal.Width is the 4th variable in the “iris” data frame
> names(iris)
[1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
• You can combine the $-notation and the [ ] notation since the $ extracts a vector,
and vectors are indexed via [ ]
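The extraction options can be sketched as follows; the variable names on the left-hand side are ours, and the exact calls shown on the original slide may have differed.

```r
# Three equivalent ways to extract the 4th variable as a vector
pw_by_name     <- iris$Petal.Width
pw_by_position <- iris[[4]]
pw_by_column   <- iris[, 4]

identical(pw_by_name, pw_by_position)
identical(pw_by_name, pw_by_column)

# $ returns a vector, so it can be indexed with [ ] right away
first_five <- iris$Petal.Width[1:5]
first_five

# Logical indexing works as well
setosa_pw <- iris$Petal.Width[iris$Species == "setosa"]
length(setosa_pw)
```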
• You can manipulate the data frame’s variables in the same way as with vectors
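A short sketch of such vector-style manipulations on iris columns (the new variable names are hypothetical):

```r
# Summary statistic of one variable
mean_sepal <- mean(iris$Sepal.Length)
mean_sepal

# Element-wise arithmetic: convert centimeters to millimeters
sepal_mm <- iris$Sepal.Length * 10
head(sepal_mm)

# Element-wise comparison gives a logical vector that can be counted
long_petals <- iris$Petal.Length > 5
sum(long_petals)
```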
R: The with() function
• It can be inconvenient to use the $-notation when you need many variables from the same data frame
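As a sketch, with() evaluates an expression inside the data frame, so column names can be used directly. The ratio computed here is only an illustration.

```r
# With the $-notation the data frame name is repeated for every variable
ratio_dollar <- mean(iris$Sepal.Length / iris$Sepal.Width)

# With with(), the columns are referenced by name only
ratio_with <- with(iris, mean(Sepal.Length / Sepal.Width))

identical(ratio_dollar, ratio_with)
```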
• Type edit(iris) and you will get the following basic spreadsheet (R data editor):
• The R data editor is not very stable. If you have a choice, please use another tool (such as MS Excel) for this purpose
R: Editing
• Let’s do some manual data entry.
• We will use the $-operator to create a new (6th) variable in the “iris” data frame:
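A minimal sketch of adding a sixth variable. The column name Sepal.Ratio is hypothetical, and the code works on a copy so that the built-in iris stays unchanged.

```r
# Work on a copy of the built-in data frame
iris_copy <- iris

# Assigning to a new name with $ appends a new variable
iris_copy$Sepal.Ratio <- iris_copy$Sepal.Length / iris_copy$Sepal.Width

names(iris_copy)
ncol(iris_copy)
```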
R: Function transform()
• To edit or transform a number of variables at once, the transform() function can be used:
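For example, the following sketch rescales one existing variable and adds a new one in a single call. Petal.Ratio is a hypothetical name; transform() returns a modified copy and leaves iris itself untouched.

```r
iris_t <- transform(iris,
                    Sepal.Length = Sepal.Length * 10,           # overwrite an existing variable
                    Petal.Ratio  = Petal.Length / Petal.Width)  # add a new variable

head(iris_t, 3)
names(iris_t)
```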
R: More about objects and modes
• To detect the type of object you are dealing with, use the class() function
• Factors (e.g., Species) are special vectors that carry an attribute called levels
• In the given example we can see that printing each object produces different results:
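A sketch of such a check on the iris data frame (the objects inspected on the original slide may have differed):

```r
class(iris)                # the whole object is a data frame
class(iris$Sepal.Length)   # a numeric vector
class(iris$Species)        # a factor

# A factor stores its categories in the "levels" attribute
attributes(iris$Species)

# Printing a numeric vector and a factor looks different
head(iris$Sepal.Length)
head(iris$Species)
```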
• Sepal_Length is a trivial numeric vector
• Species is a factor
• Factor is a special type of vector that contains categorical data (see the red “box” below):
• Frequently, it is important to know the levels when you work with complex statistical modeling in R
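The levels of a factor can be inspected directly, as in this short sketch:

```r
is.factor(iris$Species)

# The distinct categories and how many there are
species_levels <- levels(iris$Species)
species_levels            # "setosa" "versicolor" "virginica"
nlevels(iris$Species)     # 3
```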
R: Changing object types
• R is capable of changing object type:
• For example:
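A minimal sketch of type conversion with the as.*() family. The specific conversions are our choice of illustration and the original slide may have shown others.

```r
# Factor to character vector
species_chr <- as.character(iris$Species)
class(species_chr)

# Factor to integer: this returns the internal level codes, not the labels
species_code <- as.integer(iris$Species)
head(species_code)

# Character to numeric
num <- as.numeric("3.14")
num

# For a factor with number-like labels, convert via character first
f <- factor(c("10", "20", "10"))
f_values <- as.numeric(as.character(f))
f_values
```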
• You can also apply the summary() function to the entire data frame:
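As a sketch, summary() adapts to what it is given: quartiles and the mean for a numeric vector, counts per level for a factor, and one such block per column for a data frame.

```r
# Summary of a single numeric variable
sepal_summary <- summary(iris$Sepal.Length)
sepal_summary

# Summary of a factor: counts per level
summary(iris$Species)

# Summary of the entire data frame
iris_summary <- summary(iris)
iris_summary
```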
R: Data Summaries
• Another useful function is table()
• It combines data into subsets and shows the frequency of each element:
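A short sketch using the iris data. The cut-off of 1 in the second call is arbitrary and serves only to show a two-way table.

```r
# Frequency of each species
species_counts <- table(iris$Species)
species_counts

# Cross-tabulation of two variables: species vs. whether Petal.Width exceeds 1
wide_by_species <- table(iris$Species, iris$Petal.Width > 1)
wide_by_species
```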