R & NGS

Dr. G. Ramesh Kumar, PhD.,

AU-KBC Research Centre,
MIT, Anna University,
Chromepet, Chennai-44.
UNIT III
• Application of R in NGS analysis (5 topics):
• Introduction to Bioconductor GR
• Reading of RNA-seq data (ShortRead, Rsamtools, GenomicRanges),
• annotation (biomaRt, genomeIntervals),
• reads coverage and assign counts (IRanges, GenomicFeatures),
• differential expression (DESeq).
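As a rough illustration of the "reads coverage and assign counts" topic, the base R sketch below computes per-base coverage from read start/end positions and then counts the reads overlapping each feature. The coordinates, the feature names (geneA, geneB) and all variable names are invented for illustration. The IRanges and GenomicFeatures packages listed above provide optimised equivalents, which this sketch does not attempt to reproduce.

```r
# Toy aligned reads on one chromosome (1-based, inclusive coordinates; invented data)
read_start <- c(1L, 3L, 5L, 10L)
read_end   <- c(4L, 7L, 8L, 12L)
chrom_len  <- 12L

# Per-base coverage: add 1 at every position spanned by each read
coverage <- integer(chrom_len)
for (i in seq_along(read_start)) {
  idx <- read_start[i]:read_end[i]
  coverage[idx] <- coverage[idx] + 1L
}
print(coverage)

# Hypothetical annotated features (e.g. genes) on the same chromosome
feat_start <- c(geneA = 2L, geneB = 9L)
feat_end   <- c(geneA = 6L, geneB = 12L)

# A read overlaps a feature if it starts no later than the feature ends
# and ends no earlier than the feature starts
counts <- vapply(names(feat_start), function(g) {
  sum(read_start <= feat_end[[g]] & read_end >= feat_start[[g]])
}, integer(1))
print(counts)
```

The resulting `counts` vector (one integer per feature) is the kind of input a differential expression step would start from.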
REF

• http://manuals.bioinformatics.ucr.edu/home/ht-seq#R_BACK
Application of R in NGS analysis
• R and Bioconductor tools are central to many applications in:
• genome annotation and
• NGS analysis areas, such as
• RNA-Seq,
• ChIP-Seq and
• SNP-Seq.
Application of R in NGS analysis

• Seq2pathway: an R/Bioconductor package for pathway analysis of next-generation sequencing data
R
• In recent years the R language has become the
lingua franca of data-intensive research, and is
now by far the most widely used data analysis
programming language in bioinformatics.
• One of the outstanding strengths of the R
language is the ease of programming extensions
to automate the analysis and mining of almost
any data type.
R

• The following topics will be introduced:

• (1) conditional executions,
• (2) loops,
• (3) writing custom functions,
• (4) calling external software,
• (5) running and debugging R programs, and
• (6) building custom R packages.
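A minimal base R sketch of topics (1) to (5) is given here. The function names (`classify_read`, `safe_log`), the 36 bp threshold and the read lengths are hypothetical examples, not taken from the course material. The external call assumes the `Rscript` executable that ships with R is found under `R.home("bin")`.

```r
# (1) + (3): a custom function containing a conditional
classify_read <- function(len, min_len = 36) {
  if (len >= min_len) "keep" else "discard"
}

# (2): an explicit loop over read lengths
read_lengths <- c(25, 36, 50)
status <- character(length(read_lengths))
for (i in seq_along(read_lengths)) {
  status[i] <- classify_read(read_lengths[i])
}

# The same loop written with vapply, which is usually preferred in R
status_vec <- vapply(read_lengths, classify_read, character(1))

# (4): calling external software; here the Rscript binary bundled with R
rscript <- file.path(R.home("bin"), "Rscript")
version_info <- system2(rscript, "--version", stdout = TRUE, stderr = TRUE)

# (5): a simple debugging aid; turn a warning into a missing value
safe_log <- function(x) {
  tryCatch(log(x), warning = function(w) NA_real_)
}

print(status)
print(safe_log(-1))
```

For interactive debugging, base R also offers `traceback()`, `browser()` and `debug()`.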
R
• R (http://www.r-project.org) is a versatile data
analysis environment that has a broad
application spectrum in all experimental and
quantitative scientific areas.
• The associated Bioconductor project provides
access to over 700 R extension packages for
the analysis of modern biological and
biomedical data sets, such as next generation
sequences, comparative genomics, network
modeling and statistical analysis.
R
• The R software is free and runs on all common operating
systems.

• The following topics will be covered:

• (1) command syntax,
• (2) basic functions,
• (3) data import/export,
• (4) data/object types,
• (5) graphical display,
• (6) usage of R packages/libraries (e.g. Bioconductor) and
• (7) using R for basic data analysis operations.
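Several of these topics (command syntax, data import/export, object types and basic analysis operations) can be sketched in a few lines of base R. The small count table is invented for illustration and is written to a temporary file so the example is self-contained.

```r
# Build a small data frame of made-up read counts
counts <- data.frame(
  gene    = c("geneA", "geneB", "geneC"),
  control = c(10L, 0L, 250L),
  treated = c(40L, 2L, 240L)
)

# Export to CSV and import it again
path <- tempfile(fileext = ".csv")
write.csv(counts, path, row.names = FALSE)
imported <- read.csv(path, stringsAsFactors = FALSE)

# Inspect object and column types
print(class(imported))
print(sapply(imported, class))

# Basic analysis operations: a derived column and per-sample means
imported$total <- imported$control + imported$treated
sample_means <- colMeans(imported[, c("control", "treated")])
print(imported)
print(sample_means)
```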
Bioconductor
• https://en.wikipedia.org/wiki/Bioconductor
• Bioconductor is a free, open source and open
development software project for the analysis and
comprehension of genomic data generated by wet
lab experiments in molecular biology.
• https://www.bioconductor.org/
• Bioconductor provides tools for the analysis and
comprehension of high-throughput genomic data.
• Bioconductor uses the R statistical programming
language, and is open source and open
development.
Why Open Source
• so that you can find out what algorithm is being
used, and how it is being used
• so that you can modify these algorithms to try
out new ideas or to accommodate local
conditions or needs
• so you can read the code, find bugs, suggest
improvements etc.
• so that the algorithms can be used as components
(potentially modified) in other people's software
Overview
• biology is a computational science
• problems of data analysis, data generation,
reproducibility require computational support and
computational solutions
• we value code reuse
– many of the tasks have already been solved
– if we use those solutions we can put effort into new
research
• well designed, self-describing data structures help us
deal with complex data
Goals
• Provide access to powerful statistical and graphical methods
for the analysis of genomic data.
• Facilitate the integration of biological metadata (GenBank,
GO, Entrez Gene, PubMed) in the analysis of experimental
data.
• Allow the rapid development of extensible, interoperable, and
scalable software.
• Promote high-quality documentation and reproducible
research.
• Provide training in computational and statistical methods.
Bioconductor packages
Release 2.10, 554 Software Packages!
• General infrastructure:
Biobase, Biostrings, biocViews
• Annotation:
annotate, annaffy, biomaRt, AnnotationDbi, annotation data packages
• Graphics/GUIs:
geneplotter, hexbin, limmaGUI, exploRase
• Pre-processing:
affy, affycomp, oligo, makecdfenv, vsn, gcrma, limma
• Differential gene expression:
genefilter, limma, ROC, siggenes, EBarrays, factDesign
• GSEA/Hypergeometric testing:
GSEABase, Category, GOstats, topGO
• Graphs and networks:
graph, RBGL, Rgraphviz
• Flow Cytometry:
flowCore, flowViz, flowUtils
• Protein Interactions:
ppiData, ppiStats, ScISI, Rintact
• Sequence Data:
Biostrings, ShortRead, rtracklayer, IRanges, GenomicFeatures,
VariantAnnotation
• Other data:
xcms, DNAcopy, PROcess, aCGH, rsbml, SBMLR, Rdisop
Component software

• interesting problems will require the coordinated application of many different techniques
• thus we need integrated interoperable software
• of primary importance is well designed and shared data structures
Data complexity
• Dimensionality.
• Dynamic/evolving data: e.g., gene annotation, sequence,
literature.
• Multiple data sources and locations: in-house, WWW.
• Multiple data types: numeric, textual, graphical.
No longer just a single n × p data matrix X!
We distinguish between biological metadata and
experimental metadata.
Experimental metadata

• when and how the samples were processed
• what arrays or kits were used
• whether size selection of some sort (e.g.
fractionation for proteomics
experiments) was used
• date the samples were run
• lane or chip information
• treatments
Biological metadata
• Biological attributes that can be applied to the
experimental data.
• E.g. for genes
– chromosomal location;
– gene annotation (Entrez Gene, GO);
– gene models
– relevant literature (PubMed)
• Biological metadata sets are large, evolving rapidly, and
typically distributed via the WWW.
• Tools: the annotate, biomaRt, AnnotationDbi and
GenomicFeatures packages, plus annotation data packages.
Annotation packages
annotate, annaffy, biomaRt, and AnnotationDbi
• Assemble and process genomic annotation data from public repositories.
• Build annotation data packages.
• Associate experimental data in real time to biological metadata from web databases such as GenBank, GO, KEGG, Entrez Gene, and PubMed.
• Process and store query results: e.g., search PubMed abstracts.
• Generate HTML reports of analyses.
Metadata package hgu95av2: mappings between different gene IDs for this chip, e.g. for AffyID 41046_s_at:
– GENENAME: zinc finger protein 261
– ENTREZID: 9203
– ACCNUM: X95808
– MAP: Xq13.1
– SYMBOL: ZNF261
– PMID: 10486218, 9205841, 8817323
– GO: GO:0003677, GO:0007275, GO:0016021
– + many other mappings
Sequence Annotation
• for a given gene:
– gene models
– sequence
– exon/intron boundaries
– location
– conservation
• often in the form of tracks
• it is important to keep track of the reference
genome being used
Vignettes
• Bioconductor developed a new documentation
paradigm, the vignette.
• A vignette is an executable document consisting of a
collection of documentation text and code chunks.
• Vignettes form dynamic, integrated, and reproducible
statistical documents that can be automatically
updated if either data or analyses are changed.
• Vignettes can be generated using the Sweave
function from R's utils package.
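To make the idea of an executable document concrete, the sketch below writes a tiny Sweave source file (text plus one code chunk) to a temporary directory and weaves it with `utils::Sweave`. The file name `demo.Rnw`, the chunk label and the numbers are made up for illustration. Weaving only produces a `.tex` file; turning that into a PDF would additionally need a LaTeX installation, which is not attempted here.

```r
# A minimal Sweave document: LaTeX text with one embedded R code chunk
rnw <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "Mean read count for three samples:",
  "<<mean-count, echo=TRUE>>=",
  "counts <- c(10, 20, 30)",
  "mean(counts)",
  "@",
  "\\end{document}"
)

# Work in a temporary directory so no files are left behind in the project
old_wd <- setwd(tempdir())
writeLines(rnw, "demo.Rnw")

# Run the code chunk and merge its output into the LaTeX text
utils::Sweave("demo.Rnw", quiet = TRUE)
tex <- readLines("demo.tex")
setwd(old_wd)

# The woven document now contains the computed result
print(grep("[1] 20", tex, fixed = TRUE, value = TRUE))
```

If the data or the code in the chunk changes, re-running `Sweave` regenerates the document with updated results, which is the reproducibility property described above.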
Bioconductor Software

• concentrate development resources on a few important aspects
• Biobase: core classes and definitions that allow for
succinct description and handling of the data
• annotate: generic functions for annotation that can be
specialized
• genefilter/limma/DESeq/DEXSeq: differential
expression
• ShortRead/IRanges/GenomicFeatures/
VariantAnnotation: string manipulations, sequence
analysis
Quality Assessment
• ensuring that the data are of sufficient quality
is an essential first step
• arrayQualityMetrics: comprehensive quality
assessment of microarrays (one-color or
two-color)
– modifications are coming to make it more suitable
for sequence data
• ShortRead: tools for QA of short reads,
primarily Illumina
Biobase:ExpressionSet
• software should help organize and manipulate your
data
• the data need to be assembled correctly once, and
then they can be processed, subset etc without
worrying about them
• we developed the ExpressionSet class
• the SummarizedExperiment class is the next iteration in
this process (introduced in the GenomicRanges package; in current
Bioconductor releases it lives in the SummarizedExperiment package)
Microarray data analysis
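[Figure: microarray data analysis workflow]
• Input files: CEL, CDF (Affymetrix); .gpr, .Spot (two-color arrays)
• Pre-processing: affy, vsn (Affymetrix); marray, limma, vsn (two-color); result stored as an ExpressionSet
• Differential expression: edd, genefilter, limma, multtest, ROC, + CRAN
• Graphs & networks: graph, RBGL, Rgraphviz
• Cluster analysis: CRAN packages class, cluster, MASS, mva
• Prediction: CRAN packages class, e1071, ipred, LogitBoost, MASS, nnet, randomForest, rpart
• Annotation: annotate, annaffy, biomaRt, + metadata packages
• Graphics: geneplotter, hexbin, + CRAN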
Differential Expression
• limma: provides a linear models interface for
DE
– uses a moderated variance
– a variety of p-value correction methods are
provided
• DESeq and edgeR: for sequence data
– similar approach to limma
– make use of count data (negative binomial model)
• DEXSeq for exon level differential expression
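Two points from this slide can be illustrated in base R without any Bioconductor package: why count data are modelled with a negative binomial rather than a Poisson distribution, and what p-value correction does. This is only a sketch under assumed parameters (mean 100, dispersion parameter `size = 5`, four invented p-values); it is not the estimation procedure used by DESeq, edgeR or limma.

```r
set.seed(1)
mu   <- 100  # assumed mean read count for one gene
size <- 5    # assumed negative binomial size (smaller means more overdispersion)

# Poisson counts have variance equal to the mean;
# negative binomial counts have variance mu + mu^2 / size
x_pois <- rpois(10000, lambda = mu)
x_nb   <- rnbinom(10000, mu = mu, size = size)
variances <- c(
  poisson   = var(x_pois),
  negbin    = var(x_nb),
  nb_theory = mu + mu^2 / size
)
print(round(variances))

# Multiple-testing correction of raw p-values (one per gene)
p_raw  <- c(0.001, 0.01, 0.03, 0.5)
p_bh   <- p.adjust(p_raw, method = "BH")          # false discovery rate control
p_bonf <- p.adjust(p_raw, method = "bonferroni")  # family-wise error control
print(rbind(raw = p_raw, BH = p_bh, bonferroni = p_bonf))
```

The extra variance term `mu^2 / size` is what lets the negative binomial absorb biological variability between replicates that a Poisson model would understate.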
Machine Learning
• Software for machine learning has been written by many
different people
– the calling sequences and return values are unique to each
method
• MLInterfaces provides uniform calling sequences and return values for
all machine learning algorithms
• MLearn is the main wrapper function
– methods, e.g. knnI, are passed to the wrapper
• return values are of class MLOutput
• see the MLInterfaces vignette for more details
Publications
• Bioconductor: Open software development for
computational biology and bioinformatics, Genome
Biology 2004, 5:R80,
http://genomebiology.com/2004/5/10/R80
• Bioinformatics and Computational Biology Solutions
using R and Bioconductor, Springer, 2005, R.
Gentleman, V. Carey, W. Huber, R. Irizarry, S. Dudoit
eds.
• Bioconductor Case Studies, Springer
• R Programming for Bioinformatics, Chapman Hall
Comprehensive R Archive Network
• CRAN is a network of ftp and web servers
around the world that store identical,
up-to-date versions of code and documentation for
R.
• https://cran.r-project.org/
