
Translational Bioinformatics Application (CSE-896)

Omics Data Integration

Spring 2025
Lecture 09
Middle Integration: most commonly used

Uses ML models to consolidate data without concatenating features before analysis
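As a rough illustration of the idea (not the mixOmics method), here is a minimal base-R sketch of one simple flavour of middle integration. Each omics block is turned into its own sample-by-sample similarity matrix, and only those matrices are combined, so the features of the two blocks are never concatenated. The data are simulated, and the block names (`rna`, `prot`), the use of correlation as similarity, and the unweighted average are assumptions made for illustration.

```r
# Sketch of similarity-based middle integration on simulated data (illustrative only).
set.seed(1)
n <- 20                                    # samples measured in both omics layers
samples <- paste0("S", seq_len(n))
group <- rep(c("A", "B"), each = n / 2)    # hypothetical biological groups

# Two simulated blocks (samples x features) with different numbers of features
rna  <- matrix(rnorm(n * 500), nrow = n, ncol = 500, dimnames = list(samples, NULL))
prot <- matrix(rnorm(n * 50),  nrow = n, ncol = 50,  dimnames = list(samples, NULL))

# Add a group signal to a subset of features in each block
rna[group == "B", 1:50]  <- rna[group == "B", 1:50] + 1.5
prot[group == "B", 1:10] <- prot[group == "B", 1:10] + 1.5

# Summarise each block separately as a sample-by-sample correlation matrix
similarity <- function(x) cor(t(scale(x)))
k_rna  <- similarity(rna)
k_prot <- similarity(prot)

# Combine the per-block similarities (assumption: simple unweighted average)
k_joint <- (k_rna + k_prot) / 2

# Cluster samples on the joint similarity, using 1 - correlation as a distance
hc <- hclust(as.dist(1 - k_joint), method = "average")
clusters <- cutree(hc, k = 2)
print(table(cluster = clusters, group = group))
```

An unweighted average treats both blocks as equally informative; dedicated integration methods typically learn block weights or shared latent components instead.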
Major Challenges in Integration:
• Some of the major challenges that must be tackled for reliable
integration of omics datasets include:

1. Data heterogeneity
   e.g., whole-genome vs. transcriptome data

2. Uneven and missing data
   Variations in the number of features/samples
   Features may be absent in certain samples

3. High dimensionality
   Large p, small n problem (far more features, p, than samples, n)

4. Computational performance
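The first three challenges can be made concrete with simulated data. The sketch below is base R; the block sizes, the naive mean-imputation step, and all object names are illustrative assumptions, not a recommended pipeline. It aligns two blocks measured on partly different samples, fills a few missing values, scales each block, and shows the large p, small n situation.

```r
set.seed(2)
# Hypothetical blocks: the transcriptome has more samples and far more features
rna <- matrix(rnorm(30 * 2000, mean = 8, sd = 2), nrow = 30, ncol = 2000,
              dimnames = list(paste0("S", 1:30), paste0("gene", 1:2000)))
prot <- matrix(rnorm(25 * 300, mean = 0.5, sd = 0.1), nrow = 25, ncol = 300,
               dimnames = list(paste0("S", 6:30), paste0("prot", 1:300)))
prot[sample(length(prot), 100)] <- NA      # some features absent in some samples

# Uneven data: keep only the samples measured in both blocks, in the same order
shared <- intersect(rownames(rna), rownames(prot))
rna  <- rna[shared, ]
prot <- prot[shared, ]
cat("Shared samples:", length(shared), "\n")

# Missing data: report, then fill with the feature (column) mean as a naive placeholder
cat("Missing proteomics values:", sum(is.na(prot)), "\n")
col_means <- colMeans(prot, na.rm = TRUE)
na_pos <- which(is.na(prot), arr.ind = TRUE)
prot[na_pos] <- col_means[na_pos[, "col"]]

# Heterogeneity: put each block on a comparable scale (per-feature z-scores)
rna_z  <- scale(rna)
prot_z <- scale(prot)

# High dimensionality: p >> n, so the data matrix has rank at most n
cat("RNA block: n =", nrow(rna_z), "samples, p =", ncol(rna_z), "features\n")
cat("Rank of the RNA matrix:", qr(rna_z)$rank, "\n")
```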
ML Methods for Multiomic Integration:
• mixOmics:

mixOmics is an R package for exploring and integrating omics data, including transcriptomics,
proteomics, lipidomics, microbiome, metagenomics and beyond. The mixOmics package includes tools
for data integration, biomarker discovery, and data visualization.

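mixOmics is distributed through Bioconductor, so it is usually installed with BiocManager rather than with `install.packages()` alone. A minimal sketch, assuming you want to check first and install only on request; the helper name `missing_packages()` is hypothetical.

```r
# Hypothetical helper: return the packages that are not installed yet.
# With install = TRUE, missing packages are installed from Bioconductor via BiocManager.
missing_packages <- function(pkgs, install = FALSE) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!installed]
  if (install && length(missing) > 0) {
    if (!requireNamespace("BiocManager", quietly = TRUE)) {
      install.packages("BiocManager")
    }
    BiocManager::install(missing)
  }
  missing
}

missing_packages("mixOmics")                      # check only
# missing_packages("mixOmics", install = TRUE)    # uncomment to install

# Record the installed version for reproducibility
if (requireNamespace("mixOmics", quietly = TRUE)) {
  print(packageVersion("mixOmics"))
}
```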
ML Methods for Multiomic Integration:

R and RStudio were set up …

mixOmics was installed …
ML Methods for Multiomic Integration:

Reproducibility is essential when performing data analysis
ML Methods for Multiomic Integration:
• Reproducibility means …

• That the methods of an experiment can be repeated?

• That subsequent experiments based on those methods would generate identical results?

• That if two groups were analyzing the same data, they would reach the same conclusions?
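One concrete source of non-identical results in R is randomness: random subsampling, permutations, cross-validation folds, or random initialisations. A minimal sketch with a hypothetical `run_analysis()` function shows that fixing the seed with `set.seed()` makes a rerun give identical results.

```r
# Hypothetical analysis with a random step (drawing data, then a random subsample)
run_analysis <- function(seed) {
  set.seed(seed)                        # fix the random-number generator state
  x <- rnorm(100)
  idx <- sample(seq_along(x), 20)       # e.g. a random subsample of observations
  mean(x[idx])
}

r1 <- run_analysis(123)
r2 <- run_analysis(123)
identical(r1, r2)   # TRUE: same seed, same result
```

The same seed can still give different numbers under a different R version or random-number settings (for example, the default behaviour of `sample()` changed in R 3.6.0), which is one reason to record software versions as well.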
ML Methods for Multiomic Integration:

“Reproducibility is a minimum necessary condition for a finding to be believable and informative.”
Goodman et al., 2016
ML Methods for Multiomic Integration:
• Ensuring reproducibility … Data Organization

• Make plans for appropriate storage of raw and processed data

• Create project directories

• Stay organized during the analysis

You can't have any sort of reproducibility without good data management

Adapted from Goldman, June 2020 and Goldman and Obrycki, December 2020
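A minimal sketch of these points in R, assuming a hypothetical project folder created under `tempdir()` (in practice this would be your RStudio project directory). Raw data live in `data/raw` and are only read; anything derived is written to `data/processed`. The file names and the log2 transform are illustrative.

```r
# Hypothetical project root (stand-in for your RStudio project directory)
project <- file.path(tempdir(), "my_omics_project")

# Separate locations for raw (never edited) and processed data
for (d in file.path(project, c("data/raw", "data/processed"))) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}

# Example raw file, written once and afterwards only read
raw_file <- file.path(project, "data", "raw", "counts.csv")
write.csv(data.frame(gene = c("g1", "g2"), count = c(10, 0)), raw_file, row.names = FALSE)

# Derived data go to data/processed, leaving the raw file untouched
counts <- read.csv(raw_file)
counts$log_count <- log2(counts$count + 1)
write.csv(counts, file.path(project, "data", "processed", "counts_log2.csv"),
          row.names = FALSE)

list.files(project, recursive = TRUE)
```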
Setting up R/RStudio:

https://phdcomics.com/comics.php?f=1689
ML Methods for Multiomic Integration:
• Ensuring reproducibility … How to?

• Document everything

• Create READMEs that detail data organization, analysis methods, dates, naming conventions, etc.

• Versions and parameters that were used for the analysis tools

• What were the exact commands that you ran throughout the analysis?

• Annotate code with comments

Adapted from Goldman, June 2020
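Versions and parameters can be captured from within R itself. A minimal sketch: `R.version.string` and `sessionInfo()` are base R, while the output location and the `params` list are hypothetical placeholders for your own analysis settings.

```r
# Hypothetical results folder inside a project created under tempdir()
log_dir <- file.path(tempdir(), "my_omics_project", "results")
dir.create(log_dir, recursive = TRUE, showWarnings = FALSE)

# Illustrative analysis parameters (replace with the ones you actually use)
params <- list(seed = 123, n_components = 2, normalisation = "log2(count + 1)")

# Record date, R version, parameters and the full session information
session_file <- file.path(log_dir, "session_info.txt")
writeLines(c(
  paste("Date:", format(Sys.time(), "%Y-%m-%d %H:%M:%S")),
  paste("R version:", R.version.string),
  paste0("Parameter ", names(params), ": ", unlist(params)),
  "",
  capture.output(sessionInfo())
), session_file)

readLines(session_file, n = 5)
```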


ML Methods for Multiomic Integration:
• Ensuring reproducibility … Tools for documentation

• Documentation … R Markdown, Jupyter Notebook

• Version Control … Git

• Collaboration and Version Control … GitHub, Bitbucket

• Containerization … Docker
Basic R and RStudio:

R is not just a programming language … but an environment for statistical computing and graphics
Basic R and RStudio:

R for Data Science is available here: https://r4ds.hadley.nz/
Basic R and RStudio:

RStudio is a freely available, open-source IDE (integrated development environment) for R

It runs on all major platforms: Windows, macOS, UNIX/Linux.
Basic R and RStudio:
• Let’s create our first project directory using RStudio.

• Open RStudio

• Go to the File menu and select New Project.

• In the New Project window, choose New Directory. Then, choose New
Project. Name your new directory Intro-to-R and then “Create the
project as subdirectory of:” the Desktop (or location of your choice).

• Click on Create Project.

Basic R and RStudio:
• Go to the File menu, select New File, and then select R Script.
• Go to the File menu, select Save As..., type Intro-to-R.R and click Save.
Basic R and RStudio:

RStudio panes:
1. Code Editor
2. Console
3. History/Environment
4. Plots/Help/Packages
Basic R and RStudio:

What is a project in RStudio?

Basic R and RStudio:
• A directory that contains everything related to your analyses for
a specific project.

.Rproj file is created

… RStudio keeps track of the command history and the variables in the environment (saved in the project directory as .Rhistory and .RData)

… the .Rproj file can be used to reopen the project in its current state
Basic R and RStudio:
• When a project is (re)opened within RStudio, the following actions are
taken:
• A new R session (process) is started

• The .RData file in the project’s main directory is loaded, populating the environment with
any objects that were present when the project was closed

• The .Rhistory file in the project’s main directory is loaded into the RStudio History pane
(and used for Console Up/Down arrow command history).

• The current working directory is set to the project directory.

• Previously edited source documents are restored into editor tabs

• Other RStudio settings (e.g. active tabs, splitter positions, etc.) are restored to where they
were the last time the project was closed.
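The environment part of this behaviour amounts to saving objects to an `.RData` file and loading them back. A minimal sketch using `save.image()` and `load()`, with a temporary file standing in for the project's own `.RData` (whether RStudio does this automatically depends on its workspace options):

```r
# Stand-in for the project's .RData file
rdata <- file.path(tempdir(), "demo.RData")

counts <- c(g1 = 10, g2 = 0)
norm_counts <- log2(counts + 1)
save.image(file = rdata)               # save every object in the global environment

rm(list = setdiff(ls(), "rdata"))      # mimic a fresh session, keeping only the file path
exists("norm_counts")                  # FALSE
load(rdata)                            # restore the saved objects
exists("norm_counts")                  # TRUE
```

Many R users switch off automatic saving and restoring of `.RData` and rerun their scripts instead, so that the environment can always be rebuilt from code.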
Basic R and RStudio:
• Setting Up Working Directory:

• Use the getwd() function to print the current working directory

• Changing the working directory (setwd() function, or Session > Set Working Directory in RStudio)
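A minimal sketch of checking and changing the working directory from the console; the target folder under `tempdir()` is illustrative only.

```r
old_wd <- getwd()                      # store the current working directory
print(old_wd)

# Illustrative target folder (not your real project path)
new_wd <- file.path(tempdir(), "Intro-to-R")
dir.create(new_wd, showWarnings = FALSE)
setwd(new_wd)                          # change the working directory
getwd()

setwd(old_wd)                          # change back
```

Inside an RStudio project the working directory is already the project folder, so scripts can use relative paths (e.g. `data/...`) instead of hard-coded `setwd()` calls, which tend to break on other computers.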
Basic R and RStudio:
• Structuring your working directory:
For instance, separate directories for raw & processed data

Create three directories:
• data
• results
• figures
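A minimal sketch of creating the three directories from R and writing into them with paths built by `file.path()`. The project location under `tempdir()`, the simulated data, and the file names are assumptions for illustration.

```r
# Stand-in for the Intro-to-R project directory
project <- file.path(tempdir(), "Intro-to-R")
for (d in c("data", "results", "figures")) {
  dir.create(file.path(project, d), recursive = TRUE, showWarnings = FALSE)
}

# Simulated example data saved to data/
expr <- data.frame(sample = paste0("S", 1:6),
                   value  = c(2.1, 2.5, 1.9, 3.8, 4.0, 3.6),
                   group  = rep(c("ctrl", "treat"), each = 3))
write.csv(expr, file.path(project, "data", "expression.csv"), row.names = FALSE)

# A summary table goes to results/
summary_tab <- aggregate(value ~ group, data = expr, FUN = mean)
write.csv(summary_tab, file.path(project, "results", "group_means.csv"), row.names = FALSE)

# A plot goes to figures/
pdf(file.path(project, "figures", "values_by_group.pdf"))
boxplot(value ~ group, data = expr)
invisible(dev.off())

list.files(project, recursive = TRUE)
```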
