L3 Notes-1
> print(min(v))
[1] 1
> print(max(v))
[1] 9
Special Values in R
R uses a few special values: NA, Inf, -Inf, NaN and NULL.
NA
In R, NA represents a missing value. (NA stands for “not available.”)
Text data loaded into R may contain NA values that mark missing entries.
If you expand a vector (or matrix or array) beyond the size where values were
defined, the new positions take the value NA:
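> v <- c(1, 2, 3)
> v
[1] 1 2 3
> length(v)
[1] 3
> length(v) <- 4
> v
[1]  1  2  3 NA
The is.na() function tests each element of a vector for a missing value:
> z <- c(1,2,NA,8,3,NA,3)
> z
[1]  1  2 NA  8  3 NA  3
> is.na(z)
[1] FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE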
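A short sketch of common ways to work with the missing values in z follows. The variable names n_missing, missing_at, mean_all and mean_obs are illustrative and do not come from the notes.

```r
# Vector with two missing values, as in the notes
z <- c(1, 2, NA, 8, 3, NA, 3)

n_missing  <- sum(is.na(z))          # how many entries are missing
missing_at <- which(is.na(z))        # positions of the missing entries
mean_all   <- mean(z)                # NA propagates through arithmetic
mean_obs   <- mean(z, na.rm = TRUE)  # ignore NA when averaging

print(n_missing)
print(missing_at)
print(mean_all)
print(mean_obs)
```

Many summary functions (sum, mean, min, max) accept na.rm = TRUE in the same way.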
Inf and -Inf
If a computation produces a number too large in magnitude to be represented, R returns Inf
for a positive number and -Inf for a negative number (meaning positive and negative infinity,
respectively). Dividing a nonzero number by zero also gives Inf or -Inf:
> 2 ^ 1024
[1] Inf
> - 2 ^ 1024
[1] -Inf
> 1/0
[1] Inf
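Infinite results can be detected with is.finite() and is.infinite(). The sketch below also uses .Machine$double.xmax, the largest finite double on the current platform. The variable names are illustrative.

```r
big     <- 2^1024                # overflows to Inf
largest <- .Machine$double.xmax  # largest finite double on this machine

print(is.infinite(big))          # TRUE
print(is.finite(big))            # FALSE
print(is.finite(largest))        # TRUE
print(is.infinite(largest * 2))  # TRUE: doubling it overflows
print(is.infinite(-1 / 0))       # TRUE: -Inf is infinite as well
```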
NaN
Sometimes, a computation will produce a result that makes no sense.
In these cases, R will often return NaN (meaning “not a number”):
> x <- 10/0
> x
[1] Inf
> y <- -10/0
> y
[1] -Inf
> x + y
[1] NaN
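NaN can be tested for with is.nan(). Note that is.na() also returns TRUE for NaN, while is.nan() returns FALSE for NA. A minimal sketch, with illustrative variable names:

```r
pos_inf <- 10 / 0             # Inf
neg_inf <- -10 / 0            # -Inf
result  <- pos_inf + neg_inf  # Inf + (-Inf) is undefined, so NaN

print(is.nan(result))  # TRUE
print(is.na(result))   # TRUE: NaN counts as missing
print(is.nan(NA))      # FALSE: NA is not NaN
print(is.nan(0 / 0))   # TRUE: another undefined computation
```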
NULL
> print(z)
[1] 1 3
> print(x)
[1] 1 3 5
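The definitions of z and x are not shown in these notes. NULL denotes the absence of a value, and the sketch below illustrates one way output like the above can arise, on the assumption that the vectors were built with c() from elements that included NULL. The names with_null and with_na are hypothetical.

```r
# Hypothetical vectors (not the z and x from the notes)
with_null <- c(1, NULL, 3)  # NULL is dropped: it occupies no position
with_na   <- c(1, NA, 3)    # NA keeps its position as a missing value

print(with_null)
print(length(with_null))  # 2
print(length(with_na))    # 3
print(is.null(NULL))      # TRUE
print(length(NULL))       # 0
```

Unlike NA, which marks a missing element inside a vector, NULL is an empty object of length zero.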
Data Frames
A data frame has three components: the data, the rows, and the columns. It stores data
in tabular form, where each column holds one variable and each row holds one record.
Different columns can have different data types (character, integer, logical, etc.), but all
values within a single column share one type.
To do statistics in R, we will usually import external, ready-made data, which is commonly
available as comma-separated values (CSV) files or plain text files. Data frames are the
structure R uses to hold data read from such files.
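As a minimal sketch of reading a CSV file into a data frame, the code below first writes a tiny file to a temporary location so that it runs anywhere. The file contents, the path csv_path and the name players are assumptions made for illustration; in practice you would pass the path of your own file to read.csv().

```r
# Write a small CSV file to a temporary location (illustrative data)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,name,height",
             "1,Albert,170",
             "2,Bob,179"), csv_path)

# read.csv() returns a data frame; the first line supplies the column names
players <- read.csv(csv_path)

print(players)
print(class(players))
```

For whitespace-separated text files, read.table(path, header = TRUE) works in the same way.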
Example:
# Create vectors x, y, z.
# x contains numbers 1 to 5
# y contains capital letters from A to E
# z contains names Albert, Bob, Charlie, Denver and Elie
x <- 1:5
y <- LETTERS[1:5]
z <- c("Albert", "Bob", "Charlie", "Denver", "Elie")
x
y
z
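# Create a data frame from vectors x, y & z, name the columns player_id,
# player_initial and player_name, and give the data frame the name df
df <- data.frame(player_id = x, player_initial = y, player_name = z)
# Print data frame df
print(df)
  player_id player_initial player_name
1         1              A      Albert
2         2              B         Bob
3         3              C     Charlie
4         4              D      Denver
5         5              E        Elie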
Common Data Frame operations
# Find Number of Rows, Number of Columns, Dimension and Class of a data frame
using nrow(), ncol(), dim() and class() functions
> nrow(df)
[1] 5
> ncol(df)
[1] 3
> dim(df)
[1] 5 3
> class(df)
[1] "data.frame"
# Get the Structure of (an R) Data Frame
# One can get the structure of a data frame using str() function in R. It displays the internal
structure of a data frame, all objects in it, their contents and specifications.
# Example
> str(df)
Output
'data.frame': 5 obs. of 3 variables:
$ player_id : int 1 2 3 4 5
$ player_initial: chr "A" "B" "C" "D" ...
$ player_name : chr "Albert" "Bob" "Charlie" "Denver" ...
# Find Summary of data in a data frame
# In R data frame, statistical summary and nature of the data can be obtained by
applying summary() function.
# It is a generic function used to produce result summaries.
Example:
> print(summary(df))
   player_id player_initial     player_name
 Min.   :1   Length:5           Length:5
 1st Qu.:2   Class :character   Class :character
 Median :3   Mode  :character   Mode  :character
 Mean   :3
 3rd Qu.:4
 Max.   :5
# You have received height measurements of the players too, viz., (170, 179, 176,
182, 184). Expand your data frame df by specifying a new column “player_height”
> df$player_height <- c(170, 179, 176, 182, 184)
> # Print the expanded data frame
> print(df)
  player_id player_initial player_name player_height
1         1              A      Albert           170
2         2              B         Bob           179
3         3              C     Charlie           176
4         4              D      Denver           182
5         5              E        Elie           184
Extracting Data from a Data Frame
Extracting data from a data frame means to access its rows or columns.
The syntax for the data frame named df is:
df[val1, val2]
where
df = dataframe object name
val1 = rows of the data frame (can also be a vector of values such as 1:2 or 2:3)
val2 = columns of the data frame (can likewise be a vector of values such as 1:2 or 2:3)
If you specify only one index, as in df[val2], it selects columns, and the result is again
a data frame.
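The indexing forms described above can be sketched as follows. The data frame is rebuilt here so the example is self-contained, and the result names (second_row, block, heights, last_two) are illustrative.

```r
# Rebuild the players data frame from the notes
df <- data.frame(player_id      = 1:5,
                 player_initial = LETTERS[1:5],
                 player_name    = c("Albert", "Bob", "Charlie", "Denver", "Elie"),
                 player_height  = c(170, 179, 176, 182, 184))

second_row <- df[2, ]       # row 2, all columns
block      <- df[1:2, 3:4]  # rows 1 to 2, columns 3 to 4
heights    <- df[, 4]       # all rows of column 4, returned as a vector
last_two   <- df[3:4]       # a single index selects columns 3 and 4

print(second_row)
print(block)
print(heights)
print(last_two)
```

Leaving val1 or val2 empty (but keeping the comma) selects all rows or all columns respectively.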
# Try print(df$player_name) and compare the output with that of print(df[3])
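The difference between the two forms is in the type of the result: df$player_name gives the column as a plain vector, while df[3] gives a one-column data frame. A minimal sketch on a self-contained copy of the data (the result names are illustrative):

```r
df <- data.frame(player_id      = 1:5,
                 player_initial = LETTERS[1:5],
                 player_name    = c("Albert", "Bob", "Charlie", "Denver", "Elie"))

by_dollar <- df$player_name  # the column itself, as a vector
by_single <- df[3]           # a data frame containing only column 3
by_double <- df[[3]]         # double brackets also return the column itself

print(is.data.frame(by_dollar))         # FALSE
print(is.data.frame(by_single))         # TRUE
print(identical(by_dollar, by_double))  # TRUE
```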
# Select the subset of the data frame where player_name is equal to Albert OR
player_height is greater than 180
newDf = subset(df, player_name == "Albert" | player_height > 180)
print(newDf)
  player_id player_initial player_name player_height
1         1              A      Albert           170
4         4              D      Denver           182
5         5              E        Elie           184
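The same rows can be selected without subset() by building a logical vector and using it as the row index. This is a sketch on a self-contained copy of the data. The names keep and newDf2 are illustrative.

```r
df <- data.frame(player_id      = 1:5,
                 player_initial = LETTERS[1:5],
                 player_name    = c("Albert", "Bob", "Charlie", "Denver", "Elie"),
                 player_height  = c(170, 179, 176, 182, 184))

# TRUE for every row that satisfies either condition
keep <- df$player_name == "Albert" | df$player_height > 180

# Use the logical vector as the row index; keep all columns
newDf2 <- df[keep, ]
print(newDf2)
```

One difference worth knowing: when the condition evaluates to NA for some row, subset() drops that row, whereas logical indexing returns a row filled with NA.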
# A certain competitive exam can be taken up to three times. The average score
obtained by a candidate over all attempts decides whether he or she qualifies:
a candidate with an average score >= 12.5 qualifies, otherwise not. Ten students named
Anastasia, Dima, Katherine, James, Emily, Michael, Matthew, Laura, Kevin and
Jonas took the examination, a few of them multiple times. The numbers of
attempts they made were (1, 3, 2, 3, 2, 3, 1, 1, 2, 1) respectively. The
(average) scores obtained by the candidates were (12.5, 9, 16.5, 12, 9, 20, 14.5,
13.5, 8, 19). Using this information, create and print a data frame named
exam_data. This data frame should have four columns: name, score, attempts and
qualify. Describe the structure of the exam_data data frame using the str()
function. Produce a summary using the summary() function.
> name = c('Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura',
'Kevin', 'Jonas')
> score = c(12.5, 9, 16.5, 12, 9, 20, 14.5, 13.5, 8, 19)
> attempts = c(1, 3, 2, 3, 2, 3, 1, 1, 2, 1)
> qualify = c('yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'yes', 'no', 'yes')
> exam_data = data.frame(name, score, attempts, qualify)
> exam_data
        name score attempts qualify
1  Anastasia  12.5        1     yes
2       Dima   9.0        3      no
3  Katherine  16.5        2     yes
4      James  12.0        3      no
5      Emily   9.0        2      no
6    Michael  20.0        3     yes
7    Matthew  14.5        1     yes
8      Laura  13.5        1     yes
9      Kevin   8.0        2      no
10     Jonas  19.0        1     yes
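The qualify column above was typed by hand. As an alternative sketch, it can be derived from the scores with ifelse(), using the threshold of 12.5 stated in the exercise. The name qualify_computed is illustrative.

```r
score <- c(12.5, 9, 16.5, 12, 9, 20, 14.5, 13.5, 8, 19)

# "yes" where the average score meets the threshold, "no" otherwise
qualify_computed <- ifelse(score >= 12.5, "yes", "no")
print(qualify_computed)
```

Deriving the column this way keeps it consistent with the scores if they are later corrected.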
> str(exam_data)
'data.frame': 10 obs. of 4 variables:
$ name : chr "Anastasia" "Dima" "Katherine" "James" ...
$ score : num 12.5 9 16.5 12 9 20 14.5 13.5 8 19
$ attempts: num 1 3 2 3 2 3 1 1 2 1
$ qualify : chr "yes" "no" "yes" "no" ...
> print(summary(exam_data))
     name               score          attempts      qualify
 Length:10          Min.   : 8.00   Min.   :1.00   Length:10
 Class :character   1st Qu.: 9.75   1st Qu.:1.00   Class :character
 Mode  :character   Median :13.00   Median :2.00   Mode  :character
                    Mean   :13.40   Mean   :1.90
                    3rd Qu.:16.00   3rd Qu.:2.75
                    Max.   :20.00   Max.   :3.00
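The individual figures in the score column of this summary can be reproduced with mean(), median() and quantile(). A minimal sketch on the score vector alone (the result names are illustrative):

```r
score <- c(12.5, 9, 16.5, 12, 9, 20, 14.5, 13.5, 8, 19)

score_mean      <- mean(score)                             # 13.4
score_median    <- median(score)                           # 13
score_quartiles <- quantile(score, probs = c(0.25, 0.75))  # 9.75 and 16

print(score_mean)
print(score_median)
print(score_quartiles)
```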
# Extract the second row of the exam_data
> exam_data[2,]
name score attempts qualify
2 Dima 9 3 no
# Extract the first two rows of exam_data and store them in a
> a <- exam_data[1:2,]
> a
       name score attempts qualify
1 Anastasia  12.5        1     yes
2      Dima   9.0        3      no
# More information is available about the test-takers: their last degrees are,
respectively, "UG","UG","PG","PG","UG","UG","UG","PG","PG","UG".
Incorporate this information into the existing data frame by adding another column
called degree with these values.
> exam_data$degree=c("UG","UG","PG","PG","UG","UG","UG","PG","PG","UG")
> print(exam_data)
        name score attempts qualify degree
1  Anastasia  12.5        1     yes     UG
2       Dima   9.0        3      no     UG
3  Katherine  16.5        2     yes     PG
4      James  12.0        3      no     PG
5      Emily   9.0        2      no     UG
6    Michael  20.0        3     yes     UG
7    Matthew  14.5        1     yes     UG
8      Laura  13.5        1     yes     PG
9      Kevin   8.0        2      no     PG
10     Jonas  19.0        1     yes     UG
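Once the degree column is in place, a quick check is to count how many candidates fall in each category with table(). The sketch below uses a trimmed, self-contained copy of exam_data with only the score and degree columns. The name degree_counts is illustrative.

```r
# Trimmed copy of exam_data (score and degree only)
exam_data <- data.frame(
  score  = c(12.5, 9, 16.5, 12, 9, 20, 14.5, 13.5, 8, 19),
  degree = c("UG", "UG", "PG", "PG", "UG", "UG", "UG", "PG", "PG", "UG")
)

# Number of candidates per degree
degree_counts <- table(exam_data$degree)
print(degree_counts)
```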
Saving Your Work
You have been programming at the R command prompt.
To save your R session for later use, go to the File menu of RGui:
File > Save to File… and save it as a text file (with a .txt extension) at the desired location.
This file can be opened with Notepad and edited.
R Script File
Alternatively, you can write your programs in script files, which lets you save, edit and
reuse your work, for example in RStudio. These scripts can be executed from the operating
system's command prompt with Rscript, the R script interpreter.
Example:
myString <- "Hello, World!"
print(myString)
We save the above code in a file called test.R.
We can use “Source R code” in the File menu to run the script file test.R.
It produces the following result:
[1] "Hello, World!"
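A script file can also be run from within R using source(). The sketch below writes a two-line script to a temporary directory so that it is runnable anywhere. The location script_path is an assumption for illustration, and in practice you would point source() at your own test.R.

```r
# Create a small script file in a temporary directory (illustrative location)
script_path <- file.path(tempdir(), "test.R")
writeLines(c('myString <- "Hello, World!"',
             'print(myString)'), script_path)

# Run every line of the script in the current session
source(script_path)
```

From the operating system's command prompt, the same file can be run with the command Rscript test.R, provided Rscript is on the system path.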