L3 Notes-1

The document provides an overview of vectors and special values in R, including NA, Inf, -Inf, NaN, and NULL, along with their usage. It explains how to create and manipulate data frames, including adding columns, extracting data, and performing operations like subsetting and summarizing. Additionally, it covers saving work in R and using script files for programming.

Uploaded by

Pragya Madaan
Copyright
© All Rights Reserved

Vectors

> v <- c(2,4,1,9,8)
> print(min(v))
[1] 1
> print(max(v))
[1] 9

Special Values in R
There are a few special values in R: NA, Inf, -Inf, NaN and NULL.
NA
• In R, NA represents a missing value. (NA stands for “not available.”)
• NA values may appear in data loaded into R, where they mark missing entries.
• If you expand a vector (or matrix or array) beyond the size where values were defined,
the new positions hold the value NA:

> v <- c(1,2,3)
> v
[1] 1 2 3
> length(v)
[1] 3
> length(v) <- 4
> v
[1]  1  2  3 NA

The is.na() function tests each element of a vector for NA:

> z <- c(1,2,NA,8,3,NA,3)
> z
[1]  1  2 NA  8  3 NA  3
> is.na(z)
[1] FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE
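Missing values also affect summaries such as mean(). The following is a small sketch, reusing the vector z from the example above, of how the na.rm argument and is.na() can be used to work around NA values:

```r
# z is the vector from the notes above
z <- c(1, 2, NA, 8, 3, NA, 3)

# Summaries return NA if any element is missing ...
mean(z)                  # NA
# ... unless missing values are dropped with na.rm = TRUE
mean(z, na.rm = TRUE)    # 3.4

# Count the missing values, then keep only the observed ones
sum(is.na(z))            # 2
z[!is.na(z)]             # 1 2 8 3 3
```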

Inf and -Inf

• If a computation produces a number too large in magnitude to be represented, R returns Inf
for a positive number and -Inf for a negative number (positive and negative infinity, respectively):

> 2 ^ 1024
[1] Inf
> - 2 ^ 1024
[1] -Inf

Dividing a nonzero number by 0 also returns Inf or -Inf, depending on the sign of the numerator:

> 1/0
[1] Inf

NaN
• Sometimes a computation produces a result that is mathematically undefined.
• In these cases, R will often return NaN (meaning “not a number”):
> x <- 10/0
> y <- -10/0
> x
[1] Inf
> y
[1] -Inf
> x + y
[1] NaN
> Inf - Inf
[1] NaN
> 0/0
[1] NaN
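Base R provides predicate functions for telling these special values apart. The sketch below uses a made-up vector named vals (not part of the notes) to show how each predicate responds. Note that is.na() is TRUE for NaN as well as for NA, while is.nan() is TRUE only for NaN.

```r
# vals is an illustrative vector holding one value of each kind
vals <- c(1, Inf, -Inf, NaN, NA)

is.finite(vals)    # TRUE  FALSE FALSE FALSE FALSE
is.infinite(vals)  # FALSE TRUE  TRUE  FALSE FALSE
is.nan(vals)       # FALSE FALSE FALSE TRUE  FALSE
is.na(vals)        # FALSE FALSE FALSE TRUE  TRUE
```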

NULL

• There is a null object in R, represented by the symbol NULL.
• The symbol NULL always points to the same object.
• NULL is often used as an argument in functions to mean that no value was assigned to the
argument.
• Additionally, some functions may return NULL.
• NULL is not the same as NA, Inf, -Inf, or NaN.
• When NULL appears inside c(), it contributes no element, so it is dropped from the result:

> z <- c(1, NULL, 3)
> print(z)
[1] 1 3
> x <- c(1, NULL, 3, NULL, 5)
> print(x)
[1] 1 3 5
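The difference between NULL and NA, and the use of NULL as a "no value supplied" argument, can be sketched as follows. The function describe() and its argument label are hypothetical names invented for this illustration.

```r
# NULL has length 0, whereas NA is a value of length 1
length(NULL)      # 0
length(NA)        # 1
is.null(NULL)     # TRUE
is.null(NA)       # FALSE

# Hypothetical function: label defaults to NULL, meaning "not supplied"
describe <- function(values, label = NULL) {
  if (is.null(label)) {
    "no label supplied"
  } else {
    paste("label:", label)
  }
}

describe(1:3)              # "no label supplied"
describe(1:3, "heights")   # "label: heights"
```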

Data Frames

A data frame is made up of three components: the data, the rows, and the columns. It stores
data in tabular form, where each column is a variable and each row is a record. Each column
holds values of a single type (character, integer, logical, etc.), but different columns may
have different types.

For doing statistics in R we will be importing data into R. Such external, ready-made data is
commonly available in comma-separated values (CSV) and text file formats, and data frames are
the structure into which these files are read.

Creating a Data Frame using Vectors

A data frame can be created using the data.frame() function.

Example:

# Create vectors x, y, z.
# x contains numbers 1 to 5
# y contains capital letters from A to E
# z contains names Albert, Bob, Charlie, Denver and Elie
x <- 1:5
y <- LETTERS[1:5]
z <- c("Albert", "Bob", "Charlie", "Denver", "Elie")
x
y
z
# Create a data frame of vectors x, y & z and give it name df
df <- data.frame(x, y, z)
# Print data frame df
print(df)

Adding Names of Columns

# Give names player_id, player_initial and player_name to vectors x, y and z
# respectively while creating the data frame df
> df <- data.frame(player_id = x, player_initial = y, player_name = z)
> print(df)
  player_id player_initial player_name
1         1              A      Albert
2         2              B         Bob
3         3              C     Charlie
4         4              D      Denver
5         5              E        Elie
Common Data Frame operations

# Find the number of rows, number of columns, dimensions and class of a data frame
# using the nrow(), ncol(), dim() and class() functions

> nrow(df)
[1] 5
> ncol(df)
[1] 3
> dim(df)
[1] 5 3
> class(df)
[1] "data.frame"
# Get the structure of a data frame
# The str() function displays the internal structure of a data frame: its columns,
# their types and their first few values. str() prints this itself, so it does not
# need to be wrapped in print() (doing so prints an extra NULL).
# Example
> str(df)
'data.frame':	5 obs. of  3 variables:
 $ player_id     : int  1 2 3 4 5
 $ player_initial: chr  "A" "B" "C" "D" ...
 $ player_name   : chr  "Albert" "Bob" "Charlie" "Denver" ...
# Find a summary of the data in a data frame
# A statistical summary of each column of a data frame can be obtained by
# applying the summary() function.
# It is a generic function used to produce result summaries.
# Example:
> print(summary(df))
   player_id player_initial     player_name
 Min.   :1   Length:5           Length:5
 1st Qu.:2   Class :character   Class :character
 Median :3   Mode  :character   Mode  :character
 Mean   :3
 3rd Qu.:4
 Max.   :5
# You have received height measurements of the players too, viz., (170, 179, 176,
# 182, 184). Expand your data frame df by adding a new column "player_height"
> df$player_height <- c(170, 179, 176, 182, 184)
# Print the expanded data frame
> print(df)
  player_id player_initial player_name player_height
1         1              A      Albert           170
2         2              B         Bob           179
3         3              C     Charlie           176
4         4              D      Denver           182
5         5              E        Elie           184
Extracting Data from a Data Frame
• Extracting data from a data frame means accessing its rows or columns.
• The syntax for a data frame named df is:
df[val1, val2]

where
df = data frame object name
val1 = rows of the data frame (can also be a vector of values such as 1:2 or 2:3)
val2 = columns of the data frame (likewise, can also be a vector of values)
• If you specify only one index, as in df[val2], it selects columns of the data frame.

# Access the first and second rows of the data frame df
> print(df[1:2, ])
  player_id player_initial player_name player_height
1         1              A      Albert           170
2         2              B         Bob           179

# Access the third column player_name of df
> print(df[3])
  player_name
1      Albert
2         Bob
3     Charlie
4      Denver
5        Elie
# Alternatively, one can extract a specific column from a data frame using its column name.
# Extract the player_name column of the data frame df using its column name:
# try print(df$player_name) and compare the output with that of print(df[3]).
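The comparison suggested above can be sketched as follows. The data frame is rebuilt here so that the block runs on its own; stringsAsFactors = FALSE is passed explicitly (an addition, not in the notes) so that the result does not depend on the R version. The names by_position and by_name are illustrative.

```r
# Rebuild the players data frame from the notes
df <- data.frame(player_id = 1:5,
                 player_initial = LETTERS[1:5],
                 player_name = c("Albert", "Bob", "Charlie", "Denver", "Elie"),
                 stringsAsFactors = FALSE)

by_position <- df[3]            # single-bracket indexing keeps the data frame
by_name     <- df$player_name   # $ returns the column itself, as a vector

class(by_position)   # "data.frame"
class(by_name)       # "character"

# Double brackets also return the bare column
identical(df[[3]], by_name)     # TRUE
```

In short, df[3] is still a one-column table, while df$player_name is a plain character vector that can be passed directly to functions such as nchar() or toupper().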

Selecting a Subset of a Data Frame based on some Conditions

A subset of a data frame can also be created based on certain conditions. The syntax is:
newDF = subset(df, conditions)
where
df = original data frame
conditions = logical condition(s) that the selected rows must satisfy

# Select the subset of the data frame where player_name is equal to Albert OR
# player_height is greater than 180
> newDf = subset(df, player_name == "Albert" | player_height > 180)
> print(newDf)
  player_id player_initial player_name player_height
1         1              A      Albert           170
4         4              D      Denver           182
5         5              E        Elie           184
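The same selection can also be written with the df[val1, val2] syntax by placing a logical vector in the row position. This is a sketch with a reduced copy of the players data (only the two columns used in the condition); the names keep, by_index and by_subset are illustrative.

```r
# Reduced copy of the players data frame from the notes
df <- data.frame(player_name = c("Albert", "Bob", "Charlie", "Denver", "Elie"),
                 player_height = c(170, 179, 176, 182, 184),
                 stringsAsFactors = FALSE)

# One TRUE/FALSE per row: TRUE where the row satisfies the condition
keep <- df$player_name == "Albert" | df$player_height > 180

by_index  <- df[keep, ]   # rows where keep is TRUE, all columns
by_subset <- subset(df, player_name == "Albert" | player_height > 180)

all.equal(by_index, by_subset)   # TRUE
```

One difference worth knowing: if the condition evaluates to NA for some row, subset() drops that row, whereas logical indexing returns a row filled with NA values.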
# A certain competitive exam can be taken up to three times. The average score
# obtained by a candidate across all attempts is used to decide qualification:
# a candidate qualifies if the average score is >= 12.5, otherwise not. Ten students
# named Anastasia, Dima, Katherine, James, Emily, Michael, Matthew, Laura, Kevin and
# Jonas took the examination, a few of them multiple times. The numbers of
# attempts by each of them were (1, 3, 2, 3, 2, 3, 1, 1, 2, 1) respectively. The
# (average) scores obtained by the candidates were (12.5, 9, 16.5, 12, 9, 20, 14.5,
# 13.5, 8, 19). Using this information, create and print a data frame named
# exam_data. This data frame should have four columns: name, score, attempts and
# qualify. Describe the structure of the exam_data data frame using the str()
# function. Produce a summary using the summary() function.
> name = c('Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura',
+ 'Kevin', 'Jonas')
> score = c(12.5, 9, 16.5, 12, 9, 20, 14.5, 13.5, 8, 19)
> attempts = c(1, 3, 2, 3, 2, 3, 1, 1, 2, 1)
> qualify = c('yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'yes', 'no', 'yes')
> exam_data = data.frame(name, score, attempts, qualify)
> exam_data
        name score attempts qualify
1  Anastasia  12.5        1     yes
2       Dima   9.0        3      no
3  Katherine  16.5        2     yes
4      James  12.0        3      no
5      Emily   9.0        2      no
6    Michael  20.0        3     yes
7    Matthew  14.5        1     yes
8      Laura  13.5        1     yes
9      Kevin   8.0        2      no
10     Jonas  19.0        1     yes
> str(exam_data)
'data.frame':	10 obs. of  4 variables:
 $ name    : chr  "Anastasia" "Dima" "Katherine" "James" ...
 $ score   : num  12.5 9 16.5 12 9 20 14.5 13.5 8 19
 $ attempts: num  1 3 2 3 2 3 1 1 2 1
 $ qualify : chr  "yes" "no" "yes" "no" ...
> print(summary(exam_data))
     name               score          attempts      qualify
 Length:10          Min.   : 8.00   Min.   :1.00   Length:10
 Class :character   1st Qu.: 9.75   1st Qu.:1.00   Class :character
 Mode  :character   Median :13.00   Median :2.00   Mode  :character
                    Mean   :13.40   Mean   :1.90
                    3rd Qu.:16.00   3rd Qu.:2.75
                    Max.   :20.00   Max.   :3.00
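In the exercise the qualify column is typed in by hand. Since the rule is stated as score >= 12.5, the column could instead be derived from the scores. The sketch below uses base R's ifelse() for this; it is one possible approach and not the method used in the notes.

```r
# Scores from the exercise above
score <- c(12.5, 9, 16.5, 12, 9, 20, 14.5, 13.5, 8, 19)

# Apply the qualification rule element by element
qualify <- ifelse(score >= 12.5, "yes", "no")
qualify
# "yes" "no" "yes" "no" "no" "yes" "yes" "yes" "no" "yes"

# Count how many candidates fall in each group
table(qualify)
```

Deriving the column this way keeps it consistent with the scores if any score is later corrected.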
# Extract the second row of exam_data
> exam_data[2,]
  name score attempts qualify
2 Dima     9        3      no
# Extract the first two rows of exam_data and store them in a
> a <- exam_data[1:2,]
> a
       name score attempts qualify
1 Anastasia  12.5        1     yes
2      Dima   9.0        3      no
# More information is available about the test-takers: their last degrees are,
# respectively, "UG","UG","PG","PG","UG","UG","UG","PG","PG","UG".
# Incorporate this information in the existing data frame by adding another column
# called degree with these values.
> exam_data$degree = c("UG","UG","PG","PG","UG","UG","UG","PG","PG","UG")
> print(exam_data)
        name score attempts qualify degree
1  Anastasia  12.5        1     yes     UG
2       Dima   9.0        3      no     UG
3  Katherine  16.5        2     yes     PG
4      James  12.0        3      no     PG
5      Emily   9.0        2      no     UG
6    Michael  20.0        3     yes     UG
7    Matthew  14.5        1     yes     UG
8      Laura  13.5        1     yes     PG
9      Kevin   8.0        2      no     PG
10     Jonas  19.0        1     yes     UG
Saving Your Work
So far you have been programming at the R command prompt.
• To save your R session for later use, go to the File menu of RGui, choose
File > Save to File… and save it as a text file (with .txt extension) at the desired location.
• This file can be opened with Notepad and is editable.

R Script File
Alternatively, you can write your programs in script files, which lets you save, edit and reuse
your work, for example in RStudio. Scripts can also be executed from the operating system's
command prompt using Rscript, the command-line front end to the R interpreter.
Example:

# My first program in R Programming
myString <- "Hello, World!"
print(myString)

We save the above code in a file called test.R.
We can use “Source R code” in the File menu to run the script file test.R.
It produces the following result:
[1] "Hello, World!"
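A script can also be run from the R prompt with the source() function instead of the menu. The sketch below writes the two-line script to a temporary folder first so that it is self-contained; the temporary location and the name script_path are assumptions made for illustration, and in practice you would pass the path of your own test.R.

```r
# Write the example script to a temporary location (illustrative path)
script_path <- file.path(tempdir(), "test.R")
writeLines(c('myString <- "Hello, World!"',
             'print(myString)'),
           script_path)

# Run every line of the script in the current session
source(script_path)   # prints [1] "Hello, World!"
```

From the operating system's command prompt, the equivalent is to run Rscript test.R in the folder that contains the file.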
