
Sds 02

R is an interpreted scripting language that allows both console-based and script-based interactions. It uses variables to store data, functions to process data, and can perform limited input/output with external files. RStudio is a popular graphical user interface for R that allows package installation, visualizing variables, and accessing help. Workspaces containing defined data and functions can be saved, loaded, and restored between sessions. Scripts allow executing a sequence of R statements from a text file. Common data types in R include vectors, arrays, matrices, lists, and data frames. Logical operators and indexing allow selecting elements or subsets of data structures.


R: BASICS

Andrea Passarella

(plus some additions by Salvatore Ruggieri)


BASIC CONCEPTS
• R is an interpreted scripting language
• Types of interactions
  • Console based
    • Input commands into the console
    • Examine results
  • Scripting
    • Sequence of statements in a text file
    • Use the source() command to process the file
    • Equivalent to providing the sequence of statements to the console
• How we will use it
  • Variables to store data
  • Functions (either existing in the packages or new ones written on purpose) to process data
  • (Limited) I/O with external files for
    • Input/output of data
    • Scripting

15/02/2022 [email protected] 2
LAUNCH, HELP, SAVE, EXIT

• Launching the “R” application means running the interpreter shell

LAUNCH, HELP, SAVE, EXIT

• RStudio is a front-end to the language
  • Embeds the interpreter shell (Console)
  • Visualisation of available variables
  • Package installation
  • Help

LAUNCH, HELP, SAVE, EXIT

• Help
  • Search with the user interface
  • help() function from the console (or its shorthand ?, e.g., ?mean)

LAUNCH, HELP, SAVE, EXIT

• Workspace = set of data, functions, ... defined during a session
• The elements of the workspace are shown in the “Environment” pane
  • or can be listed with ls() from the console
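A minimal console sketch of listing and removing workspace elements; the variable names a and b are only illustrative.

```r
# Define two variables: they become part of the workspace
a <- 15
b <- "hello"

# ls() returns the names of the workspace elements as a character vector
print(ls())

# rm() removes an element from the workspace
rm(b)
print(exists("b"))  # FALSE: b is no longer defined
```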

LAUNCH, HELP, SAVE, EXIT

• Workspaces can be saved and restored from previous sessions
  • Either through the UI in RStudio
  • or via the save.image() and load() functions from the R console
• Automatic actions (upon launching/exiting R/RStudio)
  • Load the workspace from a file “.RData” in the working directory upon launch
  • Ask whether to save to “.RData” in the working directory upon exiting
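A small sketch of a save/clean/restore cycle from the console. It assumes a temporary file instead of the default “.RData”, so that a real workspace is not overwritten; the path variable ws_file is kept while cleaning so it can be reused for load().

```r
# Temporary file used here instead of ".RData" (an assumption of this sketch)
ws_file <- tempfile(fileext = ".RData")

a <- 15
save.image(file = ws_file)           # save the whole workspace

rm(list = setdiff(ls(), "ws_file"))  # clean the workspace, keeping the path
print(exists("a"))                   # FALSE

load(ws_file)                        # restore the saved elements
print(a)                             # a is back, with value 15
```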

SCRIPTING

• For non-toy use, most likely you want to
  • Write a script with a set of R statements
  • Execute the script and get the results
• Writing a script
  • Write the script as a text file in any text editor
    • NOT using Word: use a real plain-text editor
  • Or use the file editor integrated in RStudio
• Executing the script
  • Using the source() function
  • Loading the script file into the editor and “sourcing” from there
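A minimal sketch of running a script with source(). Here the script is written to a temporary file from R itself, standing in for a file created in a text editor; the file contents are hypothetical.

```r
# Write a two-line script to a temporary file
script_file <- tempfile(fileext = ".R")
writeLines(c("x <- 1:10",
             "s <- sum(x)"), script_file)

# source() runs every statement of the file, as if typed in the console
source(script_file)
print(s)  # the sum of 1..10, i.e. 55
```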

LOADING DATASETS

• Function data() lists the available datasets provided by the currently loaded packages
• data(iris) loads the dataset named iris into the current workspace
  • a variable (a data frame, see later) called iris is added to the workspace
• Depending on the dataset format, it might be necessary to access the data frame to “expand” it
  • E.g., ls(iris)
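A quick look at the iris dataset after loading it; iris comes with the datasets package, which R loads by default.

```r
data(iris)

print(dim(iris))    # 150 rows, 5 columns
print(names(iris))  # column names, in their original order
print(ls(iris))     # the same names, sorted
```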

VARIABLES

• Defined as they are needed
• Assignment operator, <- or =
  • a = 15 defines variable a, with value 15
  • From then on, a is available in the workspace
• Looking into variables
  • Type the name in the console
  • summary(variable_name) shows a summary, which depends on the type of the variable
    • e.g., if p is a set of numeric values, summary(p) shows their minimum, first quartile, median, mean, third quartile and maximum
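Both assignment operators and summary() on a small numeric vector; the values in p are made up for illustration.

```r
a = 15          # assignment with =
b <- a * 2      # assignment with <- (the more common R style)

p <- c(2, 4, 4, 5, 7, 9)
summary(p)      # Min., 1st Qu., Median, Mean, 3rd Qu., Max.
```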

VECTORS, ARRAYS

• Vectors are the most basic structure in R
  • a collection of values of the same type
• Function c() returns a vector combining its arguments
  • a is a vector of numbers
  • b is a vector of character strings
  • note the difference between
    • 5 in a (a number)
    • “5” in b (a character string)
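A sketch of the number-versus-string difference; the contents of a and b are illustrative, not the slide's original values.

```r
a <- c(1, 2, 5)        # numeric vector
b <- c("1", "2", "5")  # character vector

print(class(a))  # "numeric"
print(class(b))  # "character"

print(a[3] + 1)              # arithmetic works on numbers
# b[3] + 1 would fail: "5" is a string, not a number
print(as.numeric(b[3]) + 1)  # works after explicit conversion
```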

VECTORS, ARRAYS

• Arrays are vectors with given dimensions
  • A plain collection of values has no dimension attribute
  • It can then be given a single dimension
  • Or 2 dimensions, e.g., 2 rows and 5 columns
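One way to reproduce these steps is to assign the dim attribute directly. This is a sketch; the slide's exact statements are not recoverable.

```r
x <- 1:10
print(dim(x))      # NULL: no dimension attribute yet

dim(x) <- 10       # now x has a single dimension
print(dim(x))

dim(x) <- c(2, 5)  # 2 dimensions: 2 rows, 5 columns (filled column by column)
print(x)
```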

VECTORS, ARRAYS

• Arrays can be created more simply with array()
  • seq() generates a sequence of values between the given extremes
  • the dim parameter of the function sets the dimensions
• Matrices are arrays with 2 dimensions only
  • Note that arrays can have more than 2 dimensions
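A sketch with array() and seq(). The variable is called c to match the later slides. This works because R skips non-function objects when it looks up a name used in a function call, so c(4, 5) still calls the c() function.

```r
# 4 x 5 array filled with the values 1..20 (column by column)
c <- array(seq(1, 20), dim = c(4, 5))
print(is.matrix(c))     # TRUE: a 2-dimensional array is a matrix

# matrix() is an equivalent shortcut for the 2-dimensional case
m <- matrix(seq(1, 20), nrow = 4, ncol = 5)
print(identical(c, m))

# arrays can have more than 2 dimensions
a3 <- array(1:24, dim = c(2, 3, 4))
print(dim(a3))
```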

ACCESSING VECTOR/ARRAY ELEMENTS

• The [] operator
  • Start counting from 1, not from 0!
  • c[1,3]: element with index (1,3)
  • c[1,]: all elements of the first row
  • c[,2]: all elements of the second column
  • NB: c[,2] is itself a vector, thus one can further index it
    • c[,2][1]: first element of c[,2] (equivalent to c[1,2])
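The same accesses on a concrete 4 x 5 matrix. The matrix content is an assumption, chosen so each result is easy to check by hand.

```r
c <- array(1:20, dim = c(4, 5))  # columns: 1-4, 5-8, 9-12, 13-16, 17-20

print(c[1, 3])    # element (1,3)
print(c[1, ])     # first row
print(c[, 2])     # second column
print(c[, 2][1])  # first element of the second column, same as c[1, 2]
```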

ACCESSING VECTOR/ARRAY ELEMENTS

• Negative indices
  • c[,-2]: c with all columns except column 2
  • In general, elements at negative indices are excluded, e.g., c[,c(-1,-3)]
    • here c is both the variable (the matrix) and the combination function c()
• Range indices
  • c[,2:4]: columns 2 to 4 of matrix c
• Expressions as indices
  • c[c>5]: all values greater than 5
  • c[c>5 & c<10]: all values strictly between 5 and 10
  • the return value is a vector
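Negative, range and logical indices on the same illustrative 4 x 5 matrix:

```r
c <- array(1:20, dim = c(4, 5))

print(dim(c[, -2]))         # column 2 dropped: 4 x 4
print(dim(c[, c(-1, -3)]))  # columns 1 and 3 dropped: 4 x 3
print(c[, 2:4])             # columns 2, 3 and 4
print(c[c > 5])             # all values greater than 5, as a vector
print(c[c > 5 & c < 10])    # values strictly between 5 and 10
```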

LOGICAL OPERATORS

• Standard set of operators of any programming language
  • ! Unary not
  • < Less than, binary
  • > Greater than, binary
  • == Equal to, binary
  • != Not equal to, binary
  • >= Greater than or equal to, binary
  • <= Less than or equal to, binary
  • & And, binary, vectorized
  • && And, binary, not vectorized
  • | Or, binary, vectorized
  • || Or, binary, not vectorized

LOGICAL OPERATORS: VECTORISED VS NON-VECTORISED
• c[c>5 & c<10]: all values between 5 and 10
• Steps
  • c>5: a matrix with the same dimensions as c, with TRUE or FALSE values
  • c<10: likewise
  • c>5 & c<10: a matrix with the same dimensions as c, with the logical AND of the two expressions
    • We need to compute the logical AND element by element on the two matrices
    • This is obtained with the vectorised version of the operator, “&”
  • c>5 && c<10: non-vectorised version
    • Applicable to single-element data
    • With longer vectors, older versions of R used only the first element; since R 4.3.0 this is an error
    • Typically used for conditions in if statements and while loops
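A short contrast between the two forms. The vector x and the scalar y are illustrative. The last line also relies on && short-circuiting: the right side is evaluated only if the left side is TRUE.

```r
x <- c(3, 6, 12)

# Vectorised: one TRUE/FALSE per element
print(x > 5 & x < 10)     # FALSE TRUE FALSE
print(x[x > 5 & x < 10])  # 6

# Non-vectorised: single values, e.g. in if/while conditions
y <- 7
if (is.numeric(y) && y > 5 && y < 10) print("y is between 5 and 10")
```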

BUILDING MATRICES

• Sometimes it is useful to build matrices by stitching together existing arrays or matrices
  • cbind() joins together vectors/matrices by column
  • rbind() joins together vectors/matrices by row
• In the example, the previous matrix c and a vector d are joined, without assigning names to the columns
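A sketch of cbind() and rbind() on an illustrative matrix c and vector d. By default, cbind() names the new column after the argument d. Setting deparse.level = 0 is one way to avoid assigning such names; this is our guess at what the slide's “do not assign names to columns” annotation refers to.

```r
c <- array(1:20, dim = c(4, 5))
d <- c(100, 200, 300, 400)

cd <- cbind(c, d)                      # d becomes a 6th column
print(dim(cd))                         # 4 6
print(colnames(cd))                    # only the last column is named, "d"

cd0 <- cbind(c, d, deparse.level = 0)  # no names taken from the arguments
print(colnames(cd0))                   # NULL

rc <- rbind(c, 101:105)                # a new 5th row
print(dim(rc))                         # 5 5
```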

LISTS, DATA FRAMES

• Lists are collections of arbitrary data types
  • in the example, components include a character string, an integer and a vector of 3 elements
• Function length()
  • size of the variable (for a list, the number of its components)
  • different from dim()
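An illustrative list named Lst, in the spirit of the slide's example. The component names and values are assumptions, not the original definition.

```r
Lst <- list(name = "Fred",             # character string
            no.children = 3L,          # integer
            child.ages = c(4, 7, 9))   # vector of 3 elements

print(length(Lst))             # 3 components
print(dim(Lst))                # NULL: a list has no dimensions
print(length(Lst$child.ages))  # 3 elements in this component
```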

LISTS, DATA FRAMES

• Data frames
  • lists whose components all have the same length
  • If components are seen as columns of a matrix, all columns must have the same size
  • Unlike matrices, columns can be of different types
  • Note the difference with the definition of Lst!
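A hypothetical data frame with columns of different types but equal length. Columns of different lengths are fine in a list, but data.frame() rejects them (unless they can be recycled). The "character" class assumes R 4.0.0 or later; older versions turned strings into factors by default.

```r
df <- data.frame(name = c("Ann", "Bob", "Carl"),
                 age  = c(15, 22, 17))
print(dim(df))            # 3 rows, 2 columns
print(sapply(df, class))  # "character" "numeric" (R >= 4.0.0)

# Lengths 3 and 2: accepted in a list, rejected by data.frame()
lst <- list(name = c("Ann", "Bob", "Carl"), age = c(15, 22))
ok <- tryCatch({
  data.frame(name = c("Ann", "Bob", "Carl"), age = c(15, 22))
  TRUE
}, error = function(e) FALSE)
print(ok)                 # FALSE
```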

ACCESSING ELEMENTS OF LISTS AND DATA FRAMES
• $ or [[]] operator
  • Selection of elements in a list or data frame
  • Either by position: df[[1]]
  • Or by name: df[["name"]], df$name
• Levels are the unique values found in a factor, if defined
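Access by position, by name and with $ on a hypothetical data frame, plus the levels of a small factor. With stringsAsFactors = TRUE, the default before R 4.0.0, data.frame() also turns string columns into factors.

```r
df <- data.frame(name = c("Ann", "Bob", "Carl"),
                 age  = c(15, 22, 17))

print(df[[1]])       # first column, by position
print(df[["name"]])  # same column, by name
print(df$age)        # $ shortcut

f <- factor(c("red", "green", "red"))
print(levels(f))     # unique values, sorted: "green" "red"
```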

ADDING/REMOVING ELEMENTS FROM LISTS/DATA FRAMES
• Assigning NULL to an element drops that element
• Create a new element by just assigning values to the name of the new element
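Adding and dropping a column in a hypothetical data frame; the height values are made up.

```r
df <- data.frame(name = c("Ann", "Bob", "Carl"),
                 age  = c(15, 22, 17))

df$height <- c(160, 181, 175)  # new column, created by assignment
print(names(df))               # "name" "age" "height"

df$height <- NULL              # assigning NULL drops the column
print(names(df))               # "name" "age"
```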

MODIFYING ELEMENTS IN A LIST/DATA FRAME

• [[]] or $ operators return the selected component (for a data frame, a column vector)
  • Whose elements can be managed with the normal index operators
  • E.g., []
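Changing single elements inside a column, on the same hypothetical data frame:

```r
df <- data.frame(name = c("Ann", "Bob", "Carl"),
                 age  = c(15, 22, 17))

df$age[2] <- 23                       # modify one element of a column
df[["age"]][df$name == "Carl"] <- 18  # logical index inside a column
print(df$age)                         # 15 23 18
```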

DATA FRAMES AS MATRICES

• Sometimes it is useful to access data frames as matrices
  • Names of the columns: access and modify via colnames(df)
  • Names of the rows: access and modify via rownames(df)
  • Matrix part of the data frame: access and modify via the [,] operator
• T/F index vectors can also be applied to columns!
  • Select only those columns for which the condition is true
• Example: select people whose age is greater than 16
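A sketch of matrix-style access on a hypothetical data frame, including row selection by a condition on age and column selection with a T/F vector.

```r
df <- data.frame(name   = c("Ann", "Bob", "Carl"),
                 age    = c(15, 22, 17),
                 height = c(160, 181, 175))

colnames(df)[3] <- "height_cm"      # rename a column
rownames(df) <- df$name             # use names as row names
print(df["Bob", "age"])             # [,] access by row/column name
df[1, "age"] <- 16                  # [,] modification

print(df[df$age > 16, ])            # people whose age is greater than 16
print(df[, sapply(df, is.numeric)]) # only the numeric columns
```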

ARITHMETIC OPERATIONS

• With arrays, operations are applied element by element

• Same semantics with matrices

• Use %*% for the standard matrix product
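Element-wise operations versus the matrix product, on small illustrative values that can be checked by hand:

```r
a <- c(1, 2, 3, 4)
print(a * 2)     # 2 4 6 8
print(a + a)     # 2 4 6 8

m <- matrix(1:4, nrow = 2)  # columns (1, 2) and (3, 4)
print(m * m)     # element by element: each entry squared
print(m %*% m)   # matrix product: rows (7, 15) and (10, 22)
```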

CONDITIONAL STATEMENT

• General form
  • if (condition)
      statement1
    else
      statement2

• Example
  • if (x > 0) {
      count = count+1
      x = x+1
      print(x)
    } else {
      count = count-1
      x = x-1
      print(x)
    }
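The example above made runnable by giving x and count initial values (chosen arbitrarily). One practical detail: at top level, a newline right after the closing } ends the if statement, so else must stay on the same line as }.

```r
x <- 3
count <- 0

if (x > 0) {
  count <- count + 1
  x <- x + 1
  print(x)
} else {          # else on the same line as the closing brace
  count <- count - 1
  x <- x - 1
  print(x)
}
```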

LOOP STATEMENT

• While loop
  • while (expression) statement

• For loop
  • for (name in statement1) statement2
  • NB: statement1 is typically a set of values (e.g., a vector)
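One small loop of each kind; the values are illustrative.

```r
# while: repeat as long as the condition is TRUE
i <- 1
total <- 0
while (i <= 5) {
  total <- total + i
  i <- i + 1
}
print(total)    # 1 + 2 + 3 + 4 + 5 = 15

# for: v takes each value of the vector in turn
squares <- c()
for (v in c(2, 4, 6)) {
  squares <- c(squares, v^2)
}
print(squares)  # 4 16 36
```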

FUNCTIONS

• General form
  • name <- function(arg_1, arg_2, ...) expression

• Example: return the max of two arguments

• Example: return the max and whether it was the first or second argument
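A possible version of the two examples. The function names my_max and my_max_which are hypothetical. The second function returns two results together as a list.

```r
my_max <- function(a, b) {
  if (a >= b) a else b
}

my_max_which <- function(a, b) {
  if (a >= b) list(max = a, which = "first")
  else list(max = b, which = "second")
}

print(my_max(3, 7))              # 7
print(my_max_which(3, 7)$which)  # "second"
```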

DEFAULT AND NAMED ARGUMENTS

• Functions may be defined with default arguments

• Parameters can also be given by name (instead of by position)
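A hypothetical power() function with a default exponent, called by position and by name:

```r
power <- function(x, exponent = 2) {
  x^exponent
}

print(power(3))                    # default exponent: 9
print(power(3, 3))                 # by position: 27
print(power(exponent = 3, x = 2))  # by name, in any order: 8
```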

IMPLICIT LOOPS

• lapply(ls, f)
  • Applies function f() to each element of list ls. Returns a list of results.

• sapply(ls, f)
  • Applies function f() to each element of list ls. Returns a vector or matrix of results when they can be simplified, otherwise a list.
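lapply() versus sapply() on a small list. The list is called lst here, so it does not mask the ls() function.

```r
lst <- list(a = 1:3, b = 4:6)

res_l <- lapply(lst, sum)
print(class(res_l))       # "list"

res_s <- sapply(lst, sum)
print(res_s)              # named vector: a = 6, b = 15

# when each result has several values, sapply() builds a matrix
print(sapply(lst, range))
```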

PROBABILITY DISTRIBUTIONS
• R includes a family of functions to manage the most popular distributions
• Given a specific distribution (e.g., normal, named “norm” in R)
  • rnorm(100, mean=0, sd=1)
    • Generates 100 samples from a normal distribution with mean 0 and standard deviation 1
  • dnorm(3, mean=0, sd=1)
    • Density function computed at 3 (f(3))
  • pnorm(3, mean=0, sd=1)
    • Distribution function computed at 3 (F(3) = P(X<=3) = 0.9986501)
  • qnorm(0.9986501, mean=0, sd=1)
    • Percentile corresponding to 0.9986501 (t s.t. P(X<=t) = 0.9986501)
[Figure: distribution function F(k), rising from 0 to 1; x = pnorm(t) maps a value t to a probability x, and t = qnorm(x) maps it back]

• Given a set of values in a vector x
  • mean(x) gives the average
  • sd(x) gives the standard deviation
  • summary(x) gives the minimum, quartiles, median, mean and maximum of the values
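The four functions for the normal distribution, plus the sample statistics. set.seed() is added only to make the random samples reproducible; the seed value is arbitrary.

```r
set.seed(42)
x <- rnorm(100, mean = 0, sd = 1)

print(dnorm(3, mean = 0, sd = 1))  # density f(3)
print(pnorm(3, mean = 0, sd = 1))  # F(3) = P(X <= 3), about 0.9986501
print(qnorm(pnorm(3)))             # qnorm() inverts pnorm(): back to 3

print(mean(x))
print(sd(x))
summary(x)
```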

PROBABILITY DISTRIBUTIONS

• Parameters to the p, q, r, d functions depend on the particular distribution
  • See also https://CRAN.R-project.org/view=Distributions
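A few other distributions from base R, each with its own parameters; the specific values are illustrative.

```r
# Uniform on [0, 10]: parameters min and max
print(punif(5, min = 0, max = 10))      # 0.5

# Poisson with rate lambda = 2: P(X = 0) = exp(-2)
print(dpois(0, lambda = 2))

# Binomial: number of trials (size) and success probability (prob)
print(rbinom(5, size = 10, prob = 0.5)) # 5 random counts between 0 and 10
```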

BASIC I/O

• Read values into a vector
  • scan() function, here reading the file “sample.txt”
  • Initial lines to skip: skip argument
  • A path to the file to read
    • if relative, the working directory is assumed
    • Use getwd() for the name of the working directory
    • Equivalent to paste(getwd(), "/sample.txt", sep="")
  • By default, elements are separated by white space or end-of-line
    • can be modified through the sep argument
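A self-contained sketch that first writes a small “sample.txt” in a temporary directory (the file contents are made up) and then reads it back. quiet = TRUE only suppresses scan()'s “Read N items” message.

```r
f <- file.path(tempdir(), "sample.txt")
writeLines(c("some header line",
             "1 2 3",
             "4 5 6"), f)

v <- scan(f, skip = 1, quiet = TRUE)  # skip the first line
print(v)                              # 1 2 3 4 5 6

# comma-separated values need sep = ","
writeLines("7,8,9", f)
w <- scan(f, sep = ",", quiet = TRUE)
print(w)                              # 7 8 9
```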

BASIC I/O

• Read structured data into data frames
  • read.table() function, here reading the file “sample.txt”
  • The header argument sets whether the first line should be used to get the column names
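A sketch using a temporary “sample.txt” whose contents are hypothetical: a header line followed by two rows.

```r
f <- file.path(tempdir(), "sample.txt")
writeLines(c("name age",
             "Ann 15",
             "Bob 22"), f)

df <- read.table(f, header = TRUE)  # first line gives the column names
print(df)
print(names(df))                    # "name" "age"
```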

WRITING DATA FRAMES TO FILES

• write.table() function, with arguments for
  • the data frame to write
  • where to write it
  • whether to put row names (usually numbers) in front of rows
  • the separator (here, tab)
  • whether to put quotes around character strings
• Output: file “out_df.txt”
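One possible call matching the options listed above. The data frame is hypothetical, and the output goes to a temporary directory.

```r
df <- data.frame(name = c("Ann", "Bob"), age = c(15, 22))
out <- file.path(tempdir(), "out_df.txt")

write.table(df, file = out,
            row.names = FALSE,  # no row numbers in front of rows
            sep = "\t",         # tab as separator
            quote = FALSE)      # no quotes around strings

print(readLines(out))  # "name\tage" "Ann\t15" "Bob\t22"
```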

WRITING VECTORS, LISTS, OR MATRICES

• write() function, with arguments for
  • the object to write
  • where to write it
  • the separator (here, tab)
  • the number of columns in the output file
    • here equal to the number of columns of the matrix
    • same as function ncol(c)
• Output: file “out_matrix.txt”
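A sketch with an illustrative 2 x 3 matrix. write() outputs values in storage (column) order, so the matrix is transposed with t() to keep its rows as lines of the file.

```r
m <- matrix(1:6, nrow = 2)  # rows (1, 3, 5) and (2, 4, 6)
out <- file.path(tempdir(), "out_matrix.txt")

write(t(m), file = out, ncolumns = ncol(m), sep = "\t")

print(readLines(out))       # "1\t3\t5" "2\t4\t6"
```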

EXERCISE

1. Install (if needed) the MASS package and load it

2. Load the “Animals” data set

3. Calculate the ratio between animals' brain size and their body
size, adding the result as a new column called “proportions” to
the Animals data frame

4. Calculate average and standard deviation of the “proportions”

5. Remove the column “proportions” from the data frame

6. Select animals with body size > 100

7. Get a list of animals' names with body size > 100 and brain size > 100

EXERCISE
8. Find the average body and brain size for the first 10 animals in the dataset

9. Write a function that returns a list of two elements containing the mean
value and the standard deviation of a vector of elements
• Apply this to the body and brain sizes of Animals

10. Create a vector called body_norm with 100 samples from a Normal random
variable with average and standard deviation equal to those of body sizes in
the Animals dataset
• print the summary of the generated dataset
• compare the summary with another dataset of 100 samples with the same average and sd = 1

11. Save the Animals data frame to a file named “animals_a.txt” with row
and column names

12. Create a copy of the file, named “animals_b.txt”, then
• modify some data in it
• read the file into a new data frame, Animals_b
• write a function that returns the rows that differ between Animals and Animals_b

13. Save the workspace to a file, clean the workspace, restore the workspace
from the file
