Bio 9
What is programming?
Programming is a way to instruct the computer to perform certain tasks. The instructions are what we call
programs. In summary, we want the computer to run specific tasks, and we need to learn how to
write such instructions.
Why do we want the computer to run the tasks?
Because computers:
1. are fast,
2. are cheaper than the human time it would take to perform the same tasks,
3. can work 24 hours a day.
Writing out these instructions is called coding: we write a to-do list for the
computer.
Why R?
-R was originally designed by Ross Ihaka and Robert Gentleman.
-R was originally designed for statistical analysis.
-R was designed as an interpreted language.
That means we can run the code without a compiler. Interpreted means the code, our orders, can
be run line by line. Compiled means we need to write the entire program and translate it as a
whole before running it.
In R, what we need is a command-line interpreter. We write the code, one order per line, and the
interpreter translates the orders and passes them to the computer. A few last things: R is a very
popular environment. It is open source, released under the GNU General Public License, and
written primarily in C and Fortran.
What is RStudio?
RStudio is an Integrated Development Environment (IDE) for R. An IDE is a software
application that provides comprehensive facilities for software development, that is, an
environment in which to write and run code.
Data types:
1-Vector: a sequence of values of the same type, created by concatenating them with c().
Syntax:
Vec1 <- c(1,2,3)
2-List: if we want to combine values of different types, we use a list.
Syntax:
L1 <- list('a', 'b')
4-DataFrames:
You can think of a DataFrame as a collection of columns, where every column is a vector and
different columns can have different types.
df1 <- data.frame(col1 = v1, col2 = v2, col3 = v3, name = v4)
# Convert data to factor, all levels are considered equal, i.e. no order
data.factor <- factor(data.vec)
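Other useful functions:
class(): the class (data type) of an object
head(): the first few rows or elements of an object
dim(): the dimensions (rows, columns) of an object
nrow(): the number of rows
ncol(): the number of columns
length(): the number of elements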
Control flow:
# Basic syntax
## Switch-statements
```{r}
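# colorMapper() is assumed to be a function built around switch(); its definition is not included in these notes
colorMapper('red')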
colorMapper('tree')
```
## For-loops
```{r}
# Create dummy matrix
mat <- matrix(
  data = rnorm(20),
  nrow = 5,
  ncol = 4,
  dimnames = list(NULL, c('col1', 'col2', 'col3', 'col4'))
)
```
## While-loops
```{r}
# Initialize a vector of 0
items <- vector('numeric', length = 3)
# Add a vector of random numbers to the initial vector until the total sum is
# larger than 10;
# the total number of iterations is not known beforehand
iter <- 0
while (sum(items) < 10) {
  iter <- iter + 1
  items <- items + rnorm(length(items))
}
iter
items
```
Week 3
Python
Lists:
- Lists are ordered collections of elements in Python.
- Elements in a list can be of different types, such as numbers, strings, Boolean values, or even
other lists.
- Lists can be modified by adding, removing, or changing elements.
- List elements can be accessed using indexing, starting from 0.
- Negative indexing can be used to access elements from the end of the list.
- Slicing can be used to access a subset of elements in a list.
- Functions like `len()` and methods like `reverse()` and `sort()` can be used to manipulate lists.
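The list operations above can be sketched in a few lines. This is a minimal illustration only; the names `mixed` and `counts` and all values are made up for the example.

```python
# Hypothetical values, chosen only for illustration.
mixed = ["TP53", 42, 3.14, True, ["a", "b"]]   # different types, including a nested list

first = mixed[0]        # indexing starts at 0
last = mixed[-1]        # a negative index counts from the end
subset = mixed[1:3]     # slice: elements at positions 1 and 2

counts = [5, 2, 9]
counts.append(7)        # add an element
counts[0] = 1           # change an element
counts.remove(2)        # remove the first element equal to 2
counts.sort()           # sort in place (ascending)
counts.reverse()        # reverse in place
n = len(counts)         # number of elements

print(first, last, subset, counts, n)
```

Note that `sort()` and `reverse()` change the list itself and return `None`, so their result should not be assigned to a variable.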
Tuples:
- Tuples are similar to lists, but they are immutable, meaning they cannot be modified once
created.
- Tuples are created using round brackets instead of square brackets.
- Tuples can contain elements of different types, similar to lists.
- Tuple elements can be accessed using indexing, starting from 0.
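A short sketch of what immutability means in practice. The tuple `point` and its values are hypothetical, used only to show the behaviour.

```python
# Hypothetical example tuple with elements of different types.
point = (3, "chr1", True)

chromosome = point[1]   # indexing works as for lists, starting from 0

# Assigning to an element raises a TypeError, because tuples are immutable.
try:
    point[0] = 10
    modified = True
except TypeError:
    modified = False

print(chromosome, modified, point)
```

The tuple is unchanged after the failed assignment; to "change" a tuple, a new one has to be created.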
Dictionaries:
- Dictionaries are collections of key-value pairs (since Python 3.7 they preserve insertion order).
- Each element in a dictionary consists of a key and its corresponding value, separated by a colon
(:).
- Dictionaries are defined using curly braces ({}) and can be empty or contain elements.
- Keys in a dictionary must be unique, but values can be duplicated.
- Dictionary elements are not accessed by position, but rather by their keys.
- The `keys()` method can be used to retrieve all the keys in a dictionary.
- Dictionaries can be modified by adding, updating, or deleting key-value pairs.
- Adding a new element involves assigning a value to a new key.
- Updating an element involves assigning a new value to an existing key.
- Deleting an element can be done using the `del` keyword followed by the key.
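The add, update, and delete steps above might look like this. The dictionary `gene_chrom` and its contents are invented for illustration.

```python
# Hypothetical gene -> chromosome mapping, used only for illustration.
gene_chrom = {"TP53": "chr17", "BRCA1": "chr17"}   # values may repeat, keys may not

tp53_chrom = gene_chrom["TP53"]    # access by key, not by position

gene_chrom["EGFR"] = "chr7"        # add: assign a value to a new key
gene_chrom["BRCA1"] = "17q21"      # update: assign a new value to an existing key
del gene_chrom["TP53"]             # delete: `del` followed by the key

remaining_keys = list(gene_chrom.keys())   # all keys currently in the dictionary

print(tp53_chrom, remaining_keys, gene_chrom)
```

Accessing a key that does not exist raises a `KeyError`; the `get()` method returns `None` (or a chosen default) instead.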
Sets:
- Sets are unordered collections of unique elements.
- Sets are defined using curly braces, e.g. `{1, 2, 3}`, or the `set()` function; an empty set must be written `set()`, because `{}` creates an empty dictionary.
- Sets automatically remove duplicate values, so each element appears only once.
- Sets are useful for operations such as intersection, union, and difference.
- The `union()` method or the `|` operator can be used to find the union of two sets.
- The `intersection()` method or the `&` operator can be used to find the intersection of two sets.
- The `difference()` method or the `-` operator can be used to find the difference between two
sets.
- The `symmetric_difference()` method or the `^` operator can be used to find elements that are
in either set, but not in both.
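The four set operations can be compared side by side. The sets `a` and `b` below are arbitrary example values.

```python
# Arbitrary example sets, chosen only for illustration.
a = set([1, 2, 2, 3])   # the duplicate 2 is dropped automatically
b = {3, 4}

union = a | b           # same as a.union(b): elements in a or b
common = a & b          # same as a.intersection(b): elements in both
only_a = a - b          # same as a.difference(b): elements in a but not in b
either = a ^ b          # same as a.symmetric_difference(b): in exactly one of the two

print(a, union, common, only_a, either)
```

Because sets are unordered, the printed order of elements is not guaranteed; compare sets with `==` rather than relying on how they print.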
Week 4
What is programming?
a) A way to instruct the computer to perform certain tasks.
b) A method of organizing data.
c) A process of analyzing biological sequences.
d) A technique for visualizing complex data.
What is RStudio?
a) An integrated development environment (IDE) for R.
b) A programming language used for statistical analysis.
c) A database management system.
d) A web-based server for running R scripts.
Which data type in R is used for concatenating values of the same type?
a) Vector.
b) List.
c) Matrix.
d) DataFrame.
Which set operation can be used to find the union of two sets in Python?
a) |
b) &
c) -
d) ==
What information is typically stored in the obs and var data frames in single-cell RNA-seq data
representation?
a) Cell-related metadata and gene-related metadata, respectively.
b) Gene-related metadata and cell-related metadata, respectively.
c) Pairwise similarities between cells and genes, respectively.
d) Gene expression values and cell expression values, respectively.
What is the role of the obsm data frame in single-cell RNA-seq data representation?
a) Storing pairwise similarities between cells.
b) Storing pairwise similarities between genes.
c) Storing additional encodings for gene expression values.
d) Storing additional encodings for cell expression values.
Which R function is used to determine the class or data type of an object?
a) class()
b) type()
c) typeof()
d) dtype()
In R, which function is used to display the first few rows of a data frame?
a) head()
b) tail()
c) first()
d) top()
What does the var data frame represent in single-cell RNA-seq data representation?
a) Cell-related metadata.
b) Gene-related metadata.
c) Pairwise similarities between cells.
d) Pairwise similarities between genes.
What is the significance of the obsp data frame in single-cell RNA-seq data representation?
a) It represents gene expression values in a PCA space.
b) It stores cell-related metadata.
c) It stores pairwise similarities between cells.
d) It stores pairwise similarities between genes.