Sds 02
Andrea Passarella
15/02/2022 [email protected] 2
LAUNCH, HELP, SAVE, EXIT
• Help
• Search from the user interface (e.g., the Help pane in RStudio)
• Call the help() function from the console, e.g. help(mean) or ?mean
• Workspace: the set of data, functions, ... defined during a session
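A minimal sketch of listing, saving, cleaning and restoring the workspace (the same steps as exercise 13). It writes to a temporary file instead of a named file such as "my_workspace.RData"; the variable names are illustrative only. save(list = ls(), ...) is used because it saves the current environment; save.image() is the standard shortcut that saves the whole global workspace.

```r
x <- 1:10            # illustrative objects
greeting <- "hello"

ls()                 # names of the objects currently in the workspace

# Save the workspace (save.image(file = ws_file) does the same for the
# global workspace)
ws_file <- tempfile(fileext = ".RData")
save(list = ls(), file = ws_file)

# Clean the workspace, keeping only the file name we still need
rm(list = setdiff(ls(), "ws_file"))

# Restore every saved object from the file
load(ws_file)

# q() exits R and asks whether to save the workspace (not run here)
```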
SCRIPTING
• Writing a script
• Write the script as a plain-text file in any text editor
• NOT in a word processor such as Word, but in a real plain-text editor
• Or use the file editor integrated in RStudio
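Once a script is saved, source() runs all of its lines in the current session. This sketch writes a two-line script to a temporary file so it is self-contained; in practice the file (hypothetically "my_script.R") would be written in the RStudio editor.

```r
# Stand-in for a script saved from the editor
script_file <- tempfile(fileext = ".R")
writeLines(c(
  "squares <- (1:5)^2",
  "total <- sum(squares)"
), script_file)

# Run every line of the script
source(script_file)

total   # 55
```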
LOADING DATASETS
• data(iris) loads the dataset named iris into the current workspace
• a variable (a data frame, see later) called iris is added to the workspace
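A short check of what data(iris) adds to the workspace, using only base R functions:

```r
data(iris)       # copies the iris data frame into the workspace
class(iris)      # "data.frame"
dim(iris)        # 150 rows, 5 columns
names(iris)      # column names, the last one is "Species"
head(iris, 3)    # first three rows
```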
VARIABLES
VECTORS, ARRAYS
a is a vector of integers
Collection of values without any specific dimension attribute
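The original code for "a is a vector of integers" is not in the text, so this is a sketch with assumed values. It shows that a plain vector has no dim attribute, while an array holding the same values does. Note that c(1, 2, 3) is "numeric" (double); the L suffix or the : operator gives integers.

```r
a <- c(2L, 4L, 6L)   # the L suffix makes the values integers
class(a)             # "integer"

b <- 1:6             # the : operator also produces integers
dim(b)               # NULL: a plain vector has no dimension attribute

arr <- array(1:6, dim = c(2, 3))   # same values, now with a dim attribute
dim(arr)             # 2 3
```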
ACCESSING VECTOR/ARRAY ELEMENTS
• The [] operator
• Indices start from 1, not from 0!
• Element with index (1,3): row 1, column 3
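A sketch of 1-based indexing with assumed values. The matrix is filled column by column, so its element (1,3) is 5; index 0 does not refer to the first element but selects nothing.

```r
v <- c(10, 20, 30)
v[1]        # 10: the first element has index 1
v[0]        # numeric(0): index 0 selects nothing

m <- matrix(1:6, nrow = 2)   # 2 rows, 3 columns, filled column by column
m[1, 3]     # 5: element in row 1, column 3
```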
ACCESSING VECTOR/ARRAY ELEMENTS
• Negative indices
• c[,-2]: c with all columns except column 2
• In general, negative indices exclude the corresponding elements, e.g. c[,c(-1,-3)] drops columns 1 and 3
• Range indices
• c[,2:4]: columns 2 to 4 (inclusive) of matrix c
• Logical expressions as indices
• c[c>5]: all values greater than 5
• c[c>5 & c<10]: all values strictly between 5 and 10
• The return value is a vector
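The indexing forms above, applied to an assumed 3 x 4 matrix (named m here; it plays the role of c in the slides):

```r
m <- matrix(1:12, nrow = 3)   # columns: 1:3, 4:6, 7:9, 10:12

m[, -2]             # all columns except column 2
m[, c(-1, -3)]      # all columns except columns 1 and 3
m[, 2:4]            # columns 2, 3 and 4
m[m > 5]            # 6 7 8 9 10 11 12, returned as a vector
m[m > 5 & m < 10]   # 6 7 8 9
```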
LOGICAL OPERATORS
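LOGICAL OPERATORS: VECTORISED VS NON-VECTORISED
• c[c>5 & c<10]: all values strictly between 5 and 10
• Steps
• c>5: a logical matrix with the same dimensions as c, containing TRUE or FALSE values
• c<10: another logical matrix with the same dimensions
• &: element-wise (vectorised) AND of the two logical matrices
• c[...]: the elements of c where the result is TRUE, returned as a vector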
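A sketch contrasting the vectorised & with the non-vectorised &&, on assumed values. & returns one result per element; && works on single TRUE/FALSE values and stops as soon as the result is known. Since R 4.3.0, giving && a vector longer than one is an error.

```r
x <- c(3, 7, 12)

# Vectorised: one TRUE/FALSE per element
x > 5 & x < 10              # FALSE TRUE FALSE

# Non-vectorised: single values only, evaluated left to right;
# the right-hand side is skipped when the left-hand side is FALSE
length(x) > 0 && x[1] > 5   # FALSE
```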
BUILDING MATRICES
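The slide content is not in the text, so this is a sketch of common base R ways to build matrices, with assumed values: matrix() (filled by column unless byrow = TRUE), rbind() and cbind().

```r
m1 <- matrix(1:6, nrow = 2)                 # filled column by column
m2 <- matrix(1:6, nrow = 2, byrow = TRUE)   # filled row by row

r <- rbind(c(1, 2), c(3, 4))   # vectors stacked as rows
k <- cbind(c(1, 2), c(3, 4))   # vectors bound as columns

m1[2, 1]   # 2
m2[2, 1]   # 4
```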
LISTS, DATA FRAMES
(Figure: example list whose annotated components include a character string and a vector of 3 elements)
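A list can hold components of different types and lengths. A sketch with hypothetical component names, one a character string and one a vector of 3 elements:

```r
person <- list(name = "Fred", n_children = 3, child_ages = c(4, 7, 9))

person$name         # a character string
person$child_ages   # a vector of 3 elements
length(person)      # 3 components
```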
• Data frames
• lists whose components all have the same length
• If components are seen as columns of a matrix, all columns must have the same size
• Unlike matrices, columns can be of different types
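A sketch of both rules with hypothetical columns: equal-length columns of different types are accepted, while columns of incompatible lengths are rejected by data.frame().

```r
df <- data.frame(
  name   = c("a", "b", "c"),    # character (R >= 4.0)
  weight = c(1.5, 2.0, 3.2),    # numeric
  ok     = c(TRUE, FALSE, TRUE) # logical
)
sapply(df, class)   # one type per column
is.list(df)         # TRUE: a data frame is a list of columns

# Columns of lengths 3 and 2 cannot form a data frame
res <- tryCatch(data.frame(x = 1:3, y = 1:2), error = function(e) "error")
res                 # "error"
```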
ACCESSING ELEMENTS OF LISTS AND DATA FRAMES
• $ or [[]] operator
• Selection of elements in a list or data frame
• Either by position: df[[1]]
• Or by name: df[["name"]], df$name
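The three access forms on a hypothetical data frame. For comparison, single brackets return a data frame (a sub-list) rather than the column itself.

```r
df <- data.frame(name = c("a", "b"), weight = c(1.5, 2.0))

df[[2]]          # by position: c(1.5, 2.0)
df[["weight"]]   # by name
df$weight        # by name, with $
df[2]            # single brackets: a one-column data frame
```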
ADDING/REMOVING ELEMENTS FROM LISTS/DATA FRAMES
• Assigning NULL to an element drops that element
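A sketch with hypothetical names: assigning to a new name adds an element, and assigning NULL drops one, for lists and data frame columns alike.

```r
lst <- list(a = 1, b = "x")
lst$c <- TRUE      # new name: adds a component
lst$a <- NULL      # NULL: drops component a
names(lst)         # "b" "c"

df <- data.frame(x = 1:3)
df$y <- df$x * 2   # adds a column
df$x <- NULL       # drops column x
names(df)          # "y"
```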
MODIFYING ELEMENTS IN A LIST/DATA FRAME
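The slide content is not in the text; a sketch of modifying existing elements by assignment, with hypothetical names and values:

```r
df <- data.frame(name = c("a", "b", "c"), weight = c(1.5, 2.0, 3.2))
df$weight[2] <- 2.5                   # change a single value
df[["weight"]] <- df$weight * 1000    # replace a whole column

lst <- list(a = 1, b = 2)
lst$b <- "two"                        # a list component can change type
```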
DATA FRAMES AS MATRICES
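A data frame can also be indexed like a matrix, with [row, column]. A sketch on a hypothetical data frame:

```r
df <- data.frame(x = 1:4, y = c(10, 20, 30, 40))

df[2, "y"]        # row 2, column "y": 20
df[df$x > 2, ]    # rows where x > 2, all columns
df[, 1]           # first column, as a vector
dim(df)           # 4 2
```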
ARITHMETIC OPERATIONS
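The slide content is not in the text; a sketch of R's element-wise (vectorised) arithmetic on assumed vectors, including recycling of the shorter operand:

```r
x <- c(1, 2, 3, 4)
y <- c(10, 20, 30, 40)

x + y          # 11 22 33 44: element by element
x * 2          # 2 4 6 8: the scalar is recycled
x ^ 2          # 1 4 9 16
7 %/% 2        # 3: integer division
7 %% 2         # 1: remainder
x + c(1, 10)   # 2 12 4 14: the shorter vector is recycled
sqrt(x)        # functions such as sqrt(), exp(), log() are vectorised too
```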
CONDITIONAL STATEMENT
• General form
• if (condition)
    statement1
  else
    statement2
• Example
• if (x > 0) {
    count = count+1
    x = x+1
    print(x)
  } else {
    count = count-1
    x = x-1
    print(x)
  }
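Two details of if/else in R, sketched with an assumed value of x. At top level, else must appear on the same line as the closing brace (as in the example above), otherwise R treats the if as already complete. Also, if expects a single TRUE/FALSE; ifelse() is the vectorised counterpart.

```r
x <- -2   # assumed value, for illustration only

# if/else is an expression, so its value can be assigned
label <- if (x > 0) "positive" else "non-positive"
label     # "non-positive"

# Vectorised version: one result per element
labels <- ifelse(c(-2, 3) > 0, "positive", "non-positive")
labels    # "non-positive" "positive"
```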
LOOP STATEMENT
• While loop
• while (condition)
    statement
• statement is repeated as long as condition is TRUE
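A minimal while loop, chosen only for illustration: add 1, 2, 3, ... until the running total exceeds 20.

```r
total <- 0
i <- 0
while (total <= 20) {   # condition checked before each iteration
  i <- i + 1
  total <- total + i
}
c(i, total)   # 6 21
```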
FUNCTIONS
• General form
• name <- function(arg_1, arg_2, ...)
expression
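An instance of the general form, with a hypothetical function name. The value of the last evaluated expression is returned.

```r
# Width of the range of a numeric vector
range_width <- function(v) {
  max(v) - min(v)   # last expression: its value is returned
}

range_width(c(3, 9, 4))   # 6
```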
DEFAULT AND NAMED ARGUMENTS
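The slide content is not in the text; a sketch of default and named arguments using a hypothetical function pow():

```r
# 'power' has a default value of 2
pow <- function(base, power = 2) {
  base ^ power
}

pow(3)                     # 9: power takes its default
pow(3, 3)                  # 27: arguments matched by position
pow(power = 3, base = 2)   # 8: named arguments, any order
```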
IMPLICIT LOOPS
• lapply(ls, f)
• Applies function f() to each element of list ls. Returns a list of results.
• sapply(ls, f)
• Applies function f() to each element of list ls. Simplifies the results to a vector or matrix when possible (otherwise returns a list).
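lapply() versus sapply() on a hypothetical list (named lst_example to avoid confusion with the ls() function):

```r
lst_example <- list(a = 1:3, b = c(10, 20))

lapply(lst_example, sum)     # a list: $a 6, $b 30
sapply(lst_example, sum)     # a named vector: a 6, b 30
sapply(lst_example, range)   # a 2 x 2 matrix: each result has length 2
```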
PROBABILITY DISTRIBUTIONS
• R includes a family of functions for the most common probability distributions
• Given a specific distribution (e.g., the normal, named “norm” in R), the prefixes r, d, p, q give:
• rnorm(100, mean=0, sd=1)
• Generates 100 samples from a normal distribution with mean 0 and standard deviation 1
• dnorm(3, mean=0, sd=1)
• Density function computed at 3 (f(3))
• pnorm(3, mean=0, sd=1)
• Distribution function computed at 3 (F(3) = P(X<=3) = 0.9986501)
• qnorm(0.9986501, mean=0, sd=1)
• Quantile (percentile) corresponding to 0.9986501 (t such that P(X<=t) = 0.9986501, i.e. t ≈ 3)
(Figure: distribution function F(k) against k, rising from 0 to 1; x = pnorm(t) maps a value t to a probability x, and t = qnorm(x) maps it back)
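The four functions in action; note the standard deviation argument is sd. set.seed() is added only to make the samples reproducible, and the exponential lines illustrate that the same prefixes apply to other distributions.

```r
set.seed(1)
s <- rnorm(100, mean = 0, sd = 1)   # 100 samples from N(0, 1)

dnorm(3, mean = 0, sd = 1)          # f(3), about 0.004431848
p <- pnorm(3, mean = 0, sd = 1)     # F(3) = P(X <= 3), about 0.9986501
qnorm(p, mean = 0, sd = 1)          # about 3: qnorm inverts pnorm

# Same prefixes, exponential distribution
rexp(5, rate = 2)
pexp(1, rate = 2)                   # 1 - exp(-2)
```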
BASIC I/O
File “sample.txt”
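The content of “sample.txt” is not in the text, so this sketch writes a hypothetical whitespace-separated file with a header line to a temporary path and reads it back with read.table().

```r
sample_file <- tempfile(fileext = ".txt")   # stands in for "sample.txt"
writeLines(c(
  "name weight height",
  "a 1.5 10",
  "b 2.0 12"
), sample_file)

d <- read.table(sample_file, header = TRUE)  # first line: column names
d
str(d)
```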
WRITING DATA FRAMES TO FILES
• write.table() function
File “out_df.txt”
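A sketch of write.table() with row and column names, on a hypothetical data frame and a temporary file standing in for “out_df.txt”. Because the header has one field fewer than the data rows, read.table() uses the first column as row names when reading it back.

```r
df <- data.frame(name = c("a", "b"), weight = c(1.5, 2.0))
out_file <- tempfile(fileext = ".txt")

write.table(df, file = out_file, row.names = TRUE, col.names = TRUE)
readLines(out_file)

back <- read.table(out_file, header = TRUE)
back
```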
WRITING VECTORS, LISTS, OR MATRICES
• write() function
File “out_matrix.txt”
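write() outputs values column by column, 5 per line by default for numbers. To keep a matrix's row layout, a common approach is to transpose it and set ncolumns. A sketch with an assumed matrix and a temporary file in place of “out_matrix.txt”:

```r
m <- matrix(1:6, nrow = 2)               # rows: 1 3 5 and 2 4 6
out_file <- tempfile(fileext = ".txt")

write(t(m), file = out_file, ncolumns = ncol(m))
readLines(out_file)                      # "1 3 5" "2 4 6"
```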
EXERCISE
3. Calculate the ratio between animals' brain size and their body
size, adding the result as a new column called “proportions” to
the Animals data frame
7. Get a list of animals' names with body size > 100 and brain size >
100
8. Find the average body and brain size for the first 10 animals in the dataset
9. Write a function that returns a list of two elements containing the mean
value and the standard deviation of a vector of elements
• Apply this to the body and brain sizes of Animals
10. Create a vector called body_norm with 100 samples from a Normal random
variable with average and standard deviation equal to those of body sizes in
the Animals dataset
• print the summary of the generated dataset
• compare the summary with another dataset of 100 samples with same average and sd = 1
11. Save the Animals data frame to a file named “animals_a.txt” with row
and column names
13. Save the workspace to a file, clean the workspace, restore the workspace
from the file