Bio 9


Bioinformatics (week 1)

What is programming?
Programming is a way to instruct the computer to perform certain tasks. The instructions are what we
define as programs. In summary, we want the computer to run specific tasks, and we need to learn how
to write such instructions.
Why do we want the computer to run the tasks?
Because computers:
1. are fast,
2. are cheaper than the human time needed to perform the same tasks,
3. can work 24 hours a day.
Writing out these instructions is called coding: we generate a to-do list for the computer.

Why R?
-R was originally designed by Ross Ihaka and Robert Gentleman.
-R was originally designed for statistical analysis.
-R was designed as an interpreted language.
That means we can run the code without a compiler. Interpreted means we can run the code, our
orders, line by line. Compiled means we need to write the entire program and compile it before we
can run it.

In R, what we need is a command-line interpreter. We write the code, one order per line, and the
interpreter translates the orders and passes them to the computer. A few last things: R is a very
popular, resourceful environment. It is open source, released under the GNU General Public License,
and written primarily in C and Fortran.

What is RStudio?
RStudio is an Integrated Development Environment (IDE) for R. An IDE is a software
application that provides comprehensive facilities to computer programmers for software
development, all within a single environment.

Data types:
1-Vector: created by concatenating values of the same type.
Syntax:
Vec1 <- c(1,2,3)

2-List: if we want to hold values of different types, we use a list.
Syntax:
L1 <- list('a', 'b')

3-Matrix: all values must be of the same type.
We need to provide:
- the name of the matrix,
- the values to fill it with,
- the number of rows,
- the number of columns.
Syntax:
M1 <- matrix(fill, nrow, ncol)
M1 <- matrix(0, 2, 3)

4-DataFrame:
You can think of a DataFrame as a collection of columns, where every column is a vector and
different columns can hold different types.
df1 <- data.frame(col1 = v1, col2 = v2, col3 = v3, name = v4)

5-Factor: a variable whose values are drawn from a predefined set of values, called levels.

# Initial data vector
data.vec <- c("small", "small", "medium", "large", "medium")
class(data.vec)

# Convert data to factor, all levels are considered equal, i.e. no order
data.factor <- factor(data.vec)

Other useful functions:
class()
head()
dim()
nrow()
ncol()
length()

Control flow:
# Basic syntax
x1 <- 5
if (x1 < 10) {print("A")}

# Basic syntax: else
# At top level, else must follow the closing brace on the same line
if (x1 < 10) {print("A")} else {print("B")}

## Switch-statements

```{r}

colorMapper <- function(x) {


switch(x,
red = "#FF0000",
green = "#00FF00",
blue = "#0000FF",
stop("Invalid color name")
)
}

colorMapper('red')
# 'tree' matches no named case, so the default stop() raises an error
colorMapper('tree')

```

## For-loops

```{r}
# Create dummy matrix
mat <- matrix(
data = rnorm(20),
nrow = 5,
ncol = 4,
dimnames = list(NULL, c('col1', 'col2', 'col3', 'col4')))

# Initialize result vector. We know how large the result is.
means <- vector("list", ncol(mat))

# Iterate over matrix columns and populate result
for (i in seq_len(ncol(mat))) {
  means[[i]] <- mean(mat[, i])
}

```

## While-loops

```{r}

# Initialize a vector of 0
items <- vector('numeric', length = 3)

# Add a vector of random numbers to the initial vector until the total sum is
# at least 10; the total number of iterations is not known beforehand

iter <- 0
while(sum(items) < 10) {
iter <- iter + 1
items <- items + rnorm(length(items))
}
iter
items
```

Week 3
Python
Lists:
- Lists are ordered collections of elements in Python.
- Elements in a list can be of different types, such as numbers, strings, Boolean values, or even
other lists.
- Lists can be modified by adding, removing, or changing elements.
- List elements can be accessed using indexing, starting from 0.
- Negative indexing can be used to access elements from the end of the list.
- Slicing can be used to access a subset of elements in a list.
- The `len()` function returns the length of a list, and methods like `reverse()` and `sort()`
modify the list in place.
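
The list operations above can be tried out with a short sketch. The values are hypothetical and only illustrate indexing, slicing, and in-place modification.

```python
# Hypothetical example values; a list may mix types, including nested lists.
samples = ["liver", 3, True, ["a", "b"]]

first = samples[0]      # indexing starts at 0
last = samples[-1]      # a negative index counts from the end
subset = samples[1:3]   # slice: positions 1 and 2 (the stop index is excluded)

samples.append("kidney")  # add an element at the end
samples[1] = 4            # change an element in place
del samples[2]            # remove the element at position 2

counts = [5, 2, 9]
n = len(counts)   # number of elements
counts.sort()     # sorts in place, ascending
counts.reverse()  # reverses in place

print(first, last, subset)
print(samples)
print(n, counts)
```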

Tuples:
- Tuples are similar to lists, but they are immutable, meaning they cannot be modified once
created.
- Tuples are created using round brackets instead of square brackets.
- Tuples can contain elements of different types, similar to lists.
- Tuple elements can be accessed using indexing, starting from 0.
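
A minimal sketch of tuple immutability, using hypothetical values: reading by index works as with lists, while assigning to a position raises a `TypeError`.

```python
# Hypothetical example values.
region = ("chr1", 12345, True)  # round brackets create a tuple

chrom = region[0]  # indexing starts at 0, as with lists

# Tuples are immutable: item assignment raises TypeError.
try:
    region[0] = "chr2"
    modified = True
except TypeError:
    modified = False

# A one-element tuple needs a trailing comma.
single = ("chr1",)

print(chrom, modified, single)
```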

Dictionaries:
- Dictionaries are collections of key-value pairs. Since Python 3.7 they preserve insertion order.
- Each element in a dictionary consists of a key and its corresponding value, separated by a colon
(:).
- Dictionaries are defined using curly braces ({}) and can be empty or contain elements.
- Keys in a dictionary must be unique, but values can be duplicated.
- Dictionary elements are not accessed by position, but rather by their keys.
- The `keys()` method can be used to retrieve all the keys in a dictionary.
- Dictionaries can be modified by adding, updating, or deleting key-value pairs.
- Adding a new element involves assigning a value to a new key.
- Updating an element involves assigning a new value to an existing key.
- Deleting an element can be done using the `del` keyword followed by the key.
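
The add, update, and delete steps above can be sketched as follows. The codon table is a hypothetical example; `pop()` is shown as an alternative to `del` that also returns the removed value.

```python
# Hypothetical example: a tiny codon lookup table.
codons = {"ATG": "Met", "TGG": "Trp"}

met = codons["ATG"]      # access by key, not by position

codons["TAA"] = "Stop"   # add: assign a value to a new key
codons["TAA"] = "STOP"   # update: assign a new value to an existing key
del codons["TGG"]        # delete a key-value pair

keys = list(codons.keys())   # all keys currently in the dictionary
removed = codons.pop("ATG")  # remove a key and get its value back

print(met, keys, removed, codons)
```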

Sets:
- Sets are unordered collections of unique elements.
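- Sets are defined using curly braces around their elements, e.g. `{1, 2}`, or the `set()` function.
An empty `{}` creates a dictionary, so an empty set must be written `set()`.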
- Sets automatically remove duplicate values, so each element appears only once.
- Sets are useful for operations such as intersection, union, and difference.
- The `union()` method or the `|` operator can be used to find the union of two sets.
- The `intersection()` method or the `&` operator can be used to find the intersection of two sets.
- The `difference()` method or the `-` operator can be used to find the difference between two
sets.
- The `symmetric_difference()` method or the `^` operator can be used to find elements that are
in either set, but not in both.
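
A short sketch of the set operations listed above, with hypothetical values. Each operator has an equivalent method, for example `a | b` and `a.union(b)`.

```python
# Hypothetical example values; the duplicate 3 is dropped automatically.
a = {1, 2, 3, 3}
b = {3, 4}

union = a | b           # elements in either set
common = a & b          # elements in both sets
only_a = a - b          # elements in a but not in b
either = a ^ b          # elements in exactly one of the two sets

same_union = a.union(b) == union  # method form gives the same result

empty_set = set()       # {} would create an empty dictionary instead
braces_type = type({})

print(union, common, only_a, either)
```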

Week 4

Representing scRNA-seq experiments in Python

- Single-cell RNA-seq data in Python is typically stored within an annotated data object.
- The main component of the data object is the X matrix, which contains the gene expression
values. Cells are represented as rows, and genes as columns. Additional layers with the same shape
as X are possible, such as raw counts and normalized data.
- The obs data frame contains information specific to each cell, such as donor name,
experimental group, and other cell-related metadata.
- The var data frame contains information specific to each gene, such as the chromosome where
the gene is located or other gene-related metadata.
- The obsm attribute typically contains additional per-cell encodings of the gene expression
values, such as the new coordinates of each cell in a principal component analysis (PCA) space. It
holds arrays with one row per cell rather than a single data frame.
- The obsp attribute contains information about pairwise similarities between cells, indicating
how similar each cell is to all the others. It holds cell-by-cell matrices.
- The varp attribute is similar to obsp but contains information about pairwise similarities
between genes, stored as gene-by-gene matrices.
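
To see how these pieces line up, here is a plain-Python sketch of the layout only. It uses ordinary lists and dictionaries, not the annotated data library itself, and every name and number in it is hypothetical. The point is that each component shares a dimension with X.

```python
# Plain-Python sketch of the layout only; this is NOT the annotated data library's API.
# All names and numbers below are hypothetical.
cells = ["cell_1", "cell_2", "cell_3"]
genes = ["gene_A", "gene_B"]

# X: expression values, one row per cell and one column per gene
X = [
    [0, 5],
    [2, 0],
    [1, 1],
]

# obs: per-cell metadata, one entry per cell in every column
obs = {"donor": ["d1", "d1", "d2"], "group": ["control", "treated", "treated"]}

# var: per-gene metadata, one entry per gene in every column
var = {"chromosome": ["chr1", "chrX"]}

# obsm: per-cell encodings, here two made-up PCA coordinates per cell
obsm = {"pca": [[0.1, -0.3], [1.2, 0.4], [-0.9, 0.2]]}

# obsp: cell-by-cell pairwise values
obsp = {"similarity": [[1.0, 0.2, 0.5], [0.2, 1.0, 0.7], [0.5, 0.7, 1.0]]}

# varp: gene-by-gene pairwise values
varp = {"similarity": [[1.0, 0.3], [0.3, 1.0]]}

n_cells, n_genes = len(cells), len(genes)
shape_X = (len(X), len(X[0]))
obs_ok = all(len(col) == n_cells for col in obs.values())
var_ok = all(len(col) == n_genes for col in var.values())
obsm_ok = all(len(m) == n_cells for m in obsm.values())
obsp_ok = all(len(m) == n_cells and len(m[0]) == n_cells for m in obsp.values())
varp_ok = all(len(m) == n_genes and len(m[0]) == n_genes for m in varp.values())

print(shape_X, obs_ok, var_ok, obsm_ok, obsp_ok, varp_ok)
```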

What is programming?
a) A way to instruct the computer to perform certain tasks.
b) A method of organizing data.
c) A process of analyzing biological sequences.
d) A technique for visualizing complex data.

Why do we want the computer to run tasks?
a) Computers are fast.
b) Computers are cheaper than human labor.
c) Computers can work 24 hours.
d) All of the above.

Why was R originally designed as an interpreted language?
a) It allows running the code without a compiler.
b) It has better performance compared to compiled languages.
c) It requires less memory to execute programs.
d) It supports parallel processing.

What is RStudio?
a) An integrated development environment (IDE) for R.
b) A programming language used for statistical analysis.
c) A database management system.
d) A web-based server for running R scripts.

Which data type in R is used for concatenating values of the same type?
a) Vector.
b) List.
c) Matrix.
d) DataFrame.

What is a list in Python?
a) An ordered collection of elements.
b) An immutable data structure.
c) A key-value pair.
d) A set of unique elements.

How are elements accessed in a list?
a) Using indexing, starting from 1.
b) Using indexing, starting from 0.
c) Using negative indexing.
d) Using slicing.

What is a dictionary in Python?
a) An ordered collection of elements.
b) A mutable data structure.
c) A key-value pair.
d) A set of unique elements.

Which set operation can be used to find the union of two sets in Python?
a) |
b) &
c) -
d) ==

How are single-cell RNA-seq experiments represented in Python?
a) Using a list of lists.
b) Using a dictionary of lists.
c) Using an annotated data object.
d) Using a matrix of values.

What is the purpose of the X matrix in single-cell RNA-seq data representation?
a) It contains gene expression values.
b) It stores cell-related metadata.
c) It represents pairwise similarities between cells.
d) It encodes gene expression values in a PCA space.

What information is typically stored in the obs and var data frames in single-cell RNA-seq data
representation?
a) Cell-related metadata and gene-related metadata, respectively.
b) Gene-related metadata and cell-related metadata, respectively.
c) Pairwise similarities between cells and genes, respectively.
d) Gene expression values and cell expression values, respectively.

What is the role of the obsm attribute in single-cell RNA-seq data representation?
a) Storing pairwise similarities between cells.
b) Storing pairwise similarities between genes.
c) Storing additional encodings for gene expression values.
d) Storing additional encodings for cell expression values.

Which R function is used to determine the class or data type of an object?
a) class()
b) type()
c) typeof()
d) dtype()

In R, which function is used to display the first few rows of a data frame?
a) head()
b) tail()
c) first()
d) top()

What is the purpose of an if-else statement in R?
a) To perform a specific action based on a condition.
b) To iterate over a sequence of values.
c) To define a custom function.
d) To handle exceptions and errors.

How are elements added to a list in Python?
a) Using the add() function.
b) Using the insert() function.
c) Using the append() function.
d) Using the extend() function.

What is the key difference between a list and a tuple in Python?
a) Lists are mutable, while tuples are immutable.
b) Lists can store elements of different data types, while tuples cannot.
c) Lists have a fixed length, while tuples can have a variable length.
d) Lists are ordered, while tuples are unordered.

Which method is used to remove an element from a dictionary in Python?
a) remove()
b) pop()
c) delete()
d) discard()

What is the purpose of the intersection operation on sets in Python?
a) To find the common elements between two sets.
b) To combine the elements of two sets.
c) To find the unique elements in two sets.
d) To remove the common elements between two sets.

How is gene expression data typically represented in a single-cell RNA-seq experiment?
a) As a matrix with cells as rows and genes as columns.
b) As a matrix with genes as rows and cells as columns.
c) As a list of gene expression values.
d) As a dictionary with cells as keys and gene expression values as values.

What does the var data frame represent in single-cell RNA-seq data representation?
a) Cell-related metadata.
b) Gene-related metadata.
c) Pairwise similarities between cells.
d) Pairwise similarities between genes.

What is the significance of the obsp attribute in single-cell RNA-seq data representation?
a) It represents gene expression values in a PCA space.
b) It stores cell-related metadata.
c) It stores pairwise similarities between cells.
d) It stores pairwise similarities between genes.
