
Introduction to Bioinformatics - Notes

Uploaded by DARLA ASHISH RATNA


5/5/22, 8:42 PM Unit I - Introduction to Bioinformatics - Notes

Unit I - Introduction to Bioinformatics - Notes


BioInformatics

Unit I - Introduction to BioInformatics

Topics for Discussion

Introduction

Branches in BioInformatics

Aim and Scope of BioInformatics

Sequence File Formats

Sequence Conversion Tools

Molecular File Formats

Molecular File Format Conversion

Questions

Introduction
The word “bioinformatics” is a shortened form of “biological informatics”. The huge demand
for the analysis and interpretation of biological data is being met by the evolving
science of bioinformatics. Bioinformatics is defined as the application of computational and
analytical tools to capture and interpret biological data.

Bioinformatics is often focused on obtaining biologically oriented data such as nucleic acid
(DNA/RNA) and protein sequences, structures, functions, pathways, and interactions;
organizing these data into databases; developing methods to extract useful information from
these databases; and devising methods to integrate related data from disparate
sources. These computer databases and algorithms are developed to speed up and enhance
biological research.

Bioinformatics is defined as the application of tools of computation and analysis to the
capture and interpretation of biological data.

Bioinformatics can be understood as the study of biological information through
computational science and tools. It could also be considered a “Biology as a Data
Science” course.

Bioinformatics is an interdisciplinary field mainly involving molecular biology and
genetics, computer science, mathematics, and statistics.

Data-intensive, large-scale biological problems are addressed from a computational point
of view.

The most common problems are modelling biological processes at the molecular level and
making inferences from collected data.

A bioinformatics solution usually involves the following steps:

Collect statistics from biological data.
Build a computational model.
Solve a computational modelling problem.
Test and evaluate a computational algorithm.
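As a toy illustration of these four steps (not taken from the notes), the Python sketch below collects base counts from a few training DNA sequences (statistics), turns them into base frequencies (a simple model that assumes every position is independent), and scores a new sequence by its log-likelihood under that model (solving and evaluating). The function names, the pseudocount, and the restriction to the letters A, C, G, and T are illustrative assumptions.

```python
import math
from collections import Counter


def base_frequencies(training: list[str], pseudocount: float = 1.0) -> dict[str, float]:
    """Steps 1-2: collect base counts and turn them into a frequency model.

    A pseudocount is added to every base so that bases unseen in the
    training data do not get probability zero.
    """
    counts = Counter("".join(training).upper())
    total = sum(counts[b] for b in "ACGT") + 4 * pseudocount
    return {b: (counts[b] + pseudocount) / total for b in "ACGT"}


def log_likelihood(seq: str, freqs: dict[str, float]) -> float:
    """Step 3: score a sequence, assuming independent positions (A/C/G/T only)."""
    return sum(math.log(freqs[b]) for b in seq.upper())


if __name__ == "__main__":
    # Step 4: evaluate. A model trained on AT-rich data should prefer an AT-rich query.
    model = base_frequencies(["ATATTA", "TTAAAT", "ATTTAA"])
    print(log_likelihood("ATTA", model) > log_likelihood("GCGC", model))  # True
```

Real analyses use richer models (for example, ones that condition on neighbouring bases), but the collect, model, solve, and evaluate cycle stays the same.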

Bioinformatics work involves applying computer science techniques to biological
problems, especially those related to sequence analysis, alignment, and assembly.

Bioinformaticians are needed to perform tasks such as:

Modelling: estimation of protein structures and simulation of molecular interactions.

Data Processing: processing and analyzing sequencing data, for example from
next-generation sequencing or single-cell sequencing.
Virtual Screening: discovery of leads (potential new drugs) using computational
methods.
Data Science: analysis and interpretation of data.

A Few Biological Terms to Be Familiar With:

DNA (Deoxyribonucleic Acid): It is the hereditary material in humans and almost all
other organisms. Nearly every cell in a person’s body has the same DNA.
RNA (Ribonucleic Acid): It is a molecule similar to DNA. Unlike DNA, RNA is usually
single-stranded. An RNA strand has a backbone made of alternating sugar (ribose) and
phosphate groups.
Gene: A gene is the basic physical and functional unit of heredity. Genes are made up
of DNA.
Amino Acid: Amino acids are molecules that combine to form proteins. Amino acids
and proteins are the building blocks of life.

Deoxyribonucleic Acid (DNA)

DNA, the carrier of hereditary information, is written in an alphabet of only four letters
(bases): A, T, G, and C.

The human genome contains roughly 20,000 protein-coding genes, distributed among the 23
pairs of chromosomes in a cell.

The genes are the recipes for proteins, the building blocks and workers in the body.

Different genes are active in different types of cells, e.g., a liver cell does not express the
same genes as a brain cell.

Some proteins are vital for the survival of a cell and their corresponding genes are therefore
active in all cell types and are known as “Housekeeping Genes”.

A gene consists of three major structures:

Gene Regulatory Segment

Exon
Intron

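The gene regulatory segment contains the structures involved in the initiation and
regulation of transcription.

Exons are the segments of the gene retained in the mature mRNA; they carry the
protein-coding sequence.

Introns are non-coding segments of the gene that are removed (spliced out) from the RNA
transcript.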

The flow of information from the genes determines the protein composition and thereby the
functions of the cell.

DNA is situated in the nucleus of the cell, organized into chromosomes.

Every cell must contain the genetic information, so the DNA is duplicated before a cell
divides; this process is known as replication.

In eukaryotic cells, the nuclear DNA never leaves the nucleus; instead, the genetic recipe
(the genes) is copied into RNA, which in turn is decoded (translated) into proteins in the
cytoplasm.
The DNA itself is not translated into proteins directly, for several reasons:
Security: using the DNA daily as a direct template for protein synthesis would risk
damaging it, and the DNA has to stay intact to maintain life.
Regulation of the rate of protein synthesis: by controlling how many RNA copies are made
from a gene, the cell controls how fast the corresponding protein is produced.

Information Flow from DNA to Protein Through Transcription and Translation

The journey from gene to protein is complex and tightly controlled within each cell.
It consists of two major steps:

Transcription
Translation

Together, transcription and translation are known as gene expression.

During the process of transcription, the information stored in a gene's DNA is
passed to a similar molecule called RNA (ribonucleic acid) in the cell nucleus.
Both RNA and DNA are made up of a chain of building blocks called nucleotides, but they
have slightly different chemical properties.
The type of RNA that contains the information for making a protein is called messenger
RNA (mRNA) because it carries the information, or message, from the DNA out of the
nucleus into the cytoplasm.
Translation, the second step in getting from a gene to a protein, takes place in the
cytoplasm.
The mRNA interacts with a specialized complex called a ribosome, which "reads"
the sequence of mRNA nucleotides.
Each sequence of three nucleotides, called a codon, usually codes for one particular amino
acid.
Amino acids are the building blocks of proteins.
Transfer RNA (tRNA) molecules bring the matching amino acids to the ribosome, where the protein is assembled one amino acid at a time.
Protein assembly continues until the ribosome encounters a “stop” codon (a sequence of
three nucleotides that does not code for an amino acid).
The flow of information from DNA to RNA to proteins is one of the fundamental principles
of molecular biology. It is so important that it is sometimes called the “central dogma.”
DNA makes RNA makes protein.
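The two steps of the central dogma can be mimicked on strings. The Python sketch below is a simplification under stated assumptions: the input DNA is the coding (sense) strand, it starts at a start codon, it contains no introns, and it uses only A, C, G, and T. It "transcribes" by replacing T with U and "translates" codon by codon with the standard genetic code (NCBI translation table 1), stopping at the first stop codon. The function names are illustrative.

```python
# Standard genetic code (NCBI translation table 1). Codons are indexed in
# U, C, A, G order: index = 16*first + 4*second + third; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_strand: str) -> str:
    """DNA -> mRNA: the mRNA matches the coding strand with T replaced by U."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """mRNA -> protein: read codons from the first base until a stop codon."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3].upper()]
        if amino_acid == "*":  # stop codon: no amino acid, translation ends
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate(transcribe("ATGGCCATTGTAATGGGCCGCTGA")))  # MAIVMGR
```

Leftover bases that do not form a complete codon are ignored; any character outside A, C, G, U raises a `KeyError`.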

Amino Acids:
Amino acids are molecules that combine to form proteins. Amino acids and proteins are
the building blocks of life.
When proteins are digested or broken down, amino acids are left. The human body uses
amino acids to make proteins to help the body:

Break down food

Grow
Repair body tissue
Perform many other body functions

Amino acids can also be used as a source of energy by the body.

Amino acids are classified into three groups:

Essential Amino Acids

Nonessential Amino Acids
Conditional Amino Acids

Essential Amino Acids:

Essential amino acids cannot be made by the body. As a result, they must come from
food.
The 9 essential amino acids are: histidine, isoleucine, leucine, lysine, methionine,
phenylalanine, threonine, tryptophan, and valine.

Non-Essential Amino Acids:


Nonessential means that our bodies can produce the amino acid, even if we do not get it
from the food we eat.
Nonessential amino acids include: alanine, arginine, asparagine, aspartic acid, cysteine,
glutamic acid, glutamine, glycine, proline, serine, and tyrosine.

Conditional Amino Acids:

Conditional amino acids are usually not essential, except in times of illness and stress.
Conditional amino acids include: arginine, cysteine, glutamine, tyrosine, glycine,
ornithine, proline, and serine.

Branches in BioInformatics
A living cell is a system where cellular components such as genome, the gene transcript,
and the proteins interact with each other, and these interactions determine the fate of the
cell, e.g., whether a stem cell is going to become a liver cell or a cancer cell. The
characterization of these three types of components and the associated development of
analytical methods lead to the establishment of the three closely related branches of
bioinformatics: genomics, transcriptomics, and proteomics.

Genomics:
Genomics is the study of all of a person's genes (the genome), including interactions of
those genes with each other and with the person's environment.
It involves mapping the nucleotide sequences of all the chromosomes of an organism,
thereby determining the locations and sequences of its genes.
This involves extensive analysis of the nucleic acids through molecular biology techniques
before the data are ready for processing by computers.
It is a science that attempts to describe a living organism in terms of the sequence of its
genome (its constituent genetic material).
Genomics uses the techniques of molecular biology and bioinformatics to identify cellular
components such as proteins, rRNA, tRNA, etc., and analyse the sequences attributed to
the structural genes, regulatory sequences, and even non-coding sequences.
Genomics is closely related to, and sometimes considered a branch of genetics, the study of
genes and heredity.
The first automatic DNA sequencer was developed in 1986 by Leroy Hood. This paved the
way for the official beginning of the Human Genome Project (HGP) in 1990, which gave a
boost to genomics.
A large number of bacterial genomes have already been fully sequenced and put in the
public domain.
Haemophilus influenzae was the first bacterium to be sequenced in 1995. The sequencing
of bacterial genomes was followed by the first sequenced eukaryotic organism, the
unicellular genetic model system Saccharomyces cerevisiae (commonly known as baker’s
yeast).
In December 1998, the first multicellular organism was added to the list, the nematode
Caenorhabditis elegans, which is now considered as a model organism to provide us with
information about unique functions in organisms of greater complexity.
The sum of all this information is enormous, and its potential for our understanding of
life processes can be explored with the help of genomics, which is almost synonymous with
bioinformatics.

Transcriptomics:
It is the study of the transcriptome - the complete set of RNA transcripts that are produced
by the genome, under specific circumstances or in a specific cell - using high-throughput
methods, such as microarray analysis.
Transcriptomics is the study of the transcriptome, which includes the whole set of mRNA
molecules (or transcripts) in one or a population of biological cells for a given set of
environmental circumstances.
This study helps us depict the expression levels of genes, often using techniques such as
DNA microarrays, which are capable of sampling tens of thousands of different mRNAs at a
time.
Limitation of Transcriptomics:

The relative abundance of transcripts, as characterized by serial analysis of gene
expression (SAGE) or microarray experiments, is not always a good predictor of the
relative abundance of proteins.

Proteomics:
It is the systematic, large-scale analysis of proteins. It is based on the concept of the
proteome as a complete set of proteins produced by a given cell or organism under a
defined set of conditions.
Proteomics represents the earliest attempt to identify a major sub-class of cellular
components - the proteins - and their interactions.
Proteomics involves the sequencing of amino acids in a protein, determining its 3D
structure and relating it to the function of the protein.
Metabolic proteins such as haemoglobin and insulin have been subjected to intensive
proteomic investigation.
The term ‘proteomics’ was coined to make an analogy with genomics, and while it is often
viewed as the ‘next step’, proteomics is much more complicated than genomics.
A single organism has radically different protein expressions in different parts of its body,
in different stages of its life cycle and in different environmental conditions.
The complete set of proteins existing in an organism throughout its life cycle or, on a
smaller scale, the set of proteins found in a particular cell type under a particular type of
stimulation, is referred to as the proteome of the organism or cell type, respectively.

Aim and Scope of BioInformatics

The aim of bioinformatics is fourfold and includes:

Data Acquisition
Tool and Database Development
Data Analysis
Data Integration

Data Acquisition:

Data acquisition is primarily concerned with accessing and storing data generated
directly from the biological experiments.
The data generated by various sequencing projects have to be retrieved in the
appropriate format, and be capable of being linked to all the information related to the
DNA samples, such as the species, tissue type, and quality parameters used in the
experiments.
The data are organized in different databases so that the researchers can access
existing information and submit new entries as and when they are produced.
Examples of such databases are the Entrez Genome of NCBI (for genome data) and the
Protein Data Bank (for 3D macromolecular structure data).

Tool and Database Development:

Many laboratories generate large volumes of data such as DNA sequences, gene
expression information, 3D molecular structure, and high-throughput screening.
Consequently, they must develop effective databases for storing and quickly accessing
data.
The other aim is to develop tools and resources that aid in the analysis of data.
For example, having sequenced a particular protein, it is of interest to compare it with
previously characterized sequences.
Programs such as FASTA and PSI-BLAST must consider what constitutes a biologically
significant match.

Data Analysis:

The third aim is to use these tools to analyze the data and interpret the results in a
biologically meaningful manner.
Traditionally, biological studies examined individual systems in detail, and compared
those with a few related systems.
In bioinformatics, we can now conduct a global analysis of all the available data with
the aim of unveiling common principles that apply across many systems and highlight
novel features.
Efficient analysis requires an efficiently designed database.
It must allow researchers to place their query effectively and provide them with all the
information they need to begin their data analysis.

Data Integration:

Once information has been analyzed, a researcher must often associate or integrate it
with the related data from other databases.
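For example, a scientist may run a series of gene expression analysis experiments and
observe that a particular set of 100 genes is more highly expressed in cancerous lung
tissue than in normal lung tissue. To interpret this result, the scientist would then
integrate it with other databases, for example to find out what is already known about
the functions and pathways of those genes.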

Sequence File Formats

The biological data stored in databases are broadly represented either as sequence or
molecular coordinates.
Each database has its own file format for storing data.

File formats are categorized as:

Sequence File Format

Molecular File Format

Sequence File Format:

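A sequence file format stores biological sequences (nucleotide or amino acid), usually as
plain text together with an identifier and optional annotation. GenBank, FASTA, GCG-MSF,
EMBL, Clustal, PHYLIP, and NEXUS, described below, are common examples.

The similarly named Hadoop SequenceFile is unrelated: it is a flat file consisting of
binary key/value pairs that is extensively used in MapReduce as an input/output format,
and internally the temporary outputs of maps are also stored using SequenceFile.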

GenBank Flat File Format:

GenBank flatfile (GBF) format is one of the most popular sequence file formats because
of its detailed sequence features and ease of readability.
For a computer to use the data in the file, the file must first be parsed according to
the grammar that defines the sequence and its description in a GBF.
Each GenBank entry includes a concise description of the sequence, the scientific name
and taxonomy of the source organism, and a table of features that identifies the coding
regions and other sites of biological significance (such as transcription units, sites of
mutations or modifications, and repeats).
The GenBank flat file format has three sections:
Header
Features
Sequence
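To illustrate the parsing step, here is a minimal Python sketch that splits one GenBank record into those three sections. It assumes a single record, relies only on the keywords FEATURES, ORIGIN, and the "//" terminator, and does not interpret individual header fields or feature qualifiers; the function name and the toy record in the comment are hypothetical. For real work, a maintained parser such as Biopython's SeqIO module is the safer choice.

```python
def split_genbank(record: str) -> dict[str, str]:
    """Split one GenBank flat-file record into header, features and sequence."""
    header, features, seq_lines = [], [], []
    section = "header"
    for line in record.splitlines():
        if line.startswith("FEATURES"):
            section = "features"
        elif line.startswith("ORIGIN"):
            section = "sequence"
            continue  # the ORIGIN line itself carries no bases
        elif line.startswith("//"):
            break  # end of the record
        if section == "header":
            header.append(line)
        elif section == "features":
            features.append(line)
        else:
            # e.g. "        1 atggccattg taatgggccg": keep only the bases
            seq_lines.append("".join(ch for ch in line if ch.isalpha()))
    return {
        "header": "\n".join(header),
        "features": "\n".join(features),
        "sequence": "".join(seq_lines).upper(),
    }
```

The sequence section numbers each line and groups bases in blocks of ten, which is why only alphabetic characters are kept.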


FASTA Format:

FASTA format is a text-based format for representing either nucleotide sequences or
peptide sequences, in which nucleotides or amino acids are represented using single-letter
codes.
A sequence in FASTA format begins with a single-line description, followed by lines of
sequence data.
The description line is distinguished from the sequence data by a greater-than (">")
symbol in the first column.
It is recommended that all lines of text be shorter than 80 characters in length.
A sequence in FASTA format consists of one line starting with a ">" sign, followed by a
sequence identification code.
The identifier may optionally be followed by a textual description of the sequence.
For Example:
>gi|129295|sp|P01013|OVAX_CHICK GENE X PROTEIN (OVALBUMIN-RELATED)
QIKDLLVSSSTDLDTTLVLVNAIYFKGMWKTAFNAEDTREMPFHVTKQESKPVQMMCMNNSF
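Writing a record in this layout takes only a few lines. The Python sketch below is a minimal, assumed helper (not a standard tool) that builds the ">" description line and wraps the sequence at 60 characters by default, which keeps every line under the recommended 80.

```python
def to_fasta(identifier: str, sequence: str, description: str = "", width: int = 60) -> str:
    """Format one FASTA record: a '>' description line, then wrapped sequence lines."""
    header = f">{identifier}" + (f" {description}" if description else "")
    lines = [header]
    # Break the sequence into lines of at most `width` characters.
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_fasta(
        "gi|129295|sp|P01013|OVAX_CHICK",
        "QIKDLLVSSSTDLDTTLVLVNAIYFKGMWKTAFNAEDTREMPFHVTKQESKPVQMMCMNNSF",
        "GENE X PROTEIN (OVALBUMIN-RELATED)",
    ), end="")
```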

Multi-FASTA Format:

A multi-FASTA file is a text file containing several sequences in FASTA format. Every FASTA
entry has two fundamental blocks.

The first is a single text line starting with the '>' character, followed by a sequence
description. The second block is the sequence itself, which may span several lines.
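Because each entry starts with '>' and its sequence may span several lines, reading a multi-FASTA file amounts to starting a new record at every '>' and joining the lines in between. The Python sketch below is a minimal illustration under that assumption; the function name is hypothetical, and text before the first '>' is ignored.

```python
def parse_multi_fasta(text: str) -> list[tuple[str, str]]:
    """Return (description line without '>', sequence) pairs from multi-FASTA text."""
    records: list[tuple[str, str]] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:  # close the previous record
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # a sequence may span several lines
    if header is not None:  # close the last record
        records.append((header, "".join(chunks)))
    return records
```

For example, `parse_multi_fasta(">seq1 first\nACGT\nAC\n>seq2 second\nTTGG\n")` yields two records whose sequences are "ACGTAC" and "TTGG".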


GCG-MSF Format:

We can combine multiple sequences in a single file, called a Multiple Sequence Format
(MSF) file. MSF files include not only the sequence name but also the sequence itself,
which is usually aligned with the other sequences in the file.
We can specify a single sequence within an MSF file, a subset of sequences, or all
sequences. Like other sequences, those in an MSF file can be used with other GCG
programs.

EMBL Format:

The European Molecular Biology Laboratory (EMBL) file format stores a sequence and its
annotation together.

The start of the annotation section is marked by a line beginning with the word “ID”.

The start of sequence section is marked by a line beginning with the word “SQ”.

The “//” (terminator) line contains no data or comments and designates the end of an
entry.

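The ID, SQ, and "//" markers are enough to convert a simple EMBL entry to FASTA. The Python sketch below is a minimal illustration, assuming a single entry whose sequence lines follow the SQ line; it takes the first semicolon-separated field of the ID line as the identifier and ignores all other annotation lines. The function name is hypothetical.

```python
def embl_to_fasta(record: str, width: int = 60) -> str:
    """Convert one EMBL entry to FASTA, using the first ID field as the identifier."""
    identifier, seq_parts, in_sequence = "unknown", [], False
    for line in record.splitlines():
        if line.startswith("ID"):
            # e.g. "ID   TOY0001; SV 1; ..." -> "TOY0001"
            identifier = line[2:].split(";")[0].strip()
        elif line.startswith("SQ"):
            in_sequence = True  # sequence data starts on the next line
        elif line.startswith("//"):
            break  # terminator: end of the entry
        elif in_sequence:
            # drop spaces and the trailing base count on each sequence line
            seq_parts.append("".join(ch for ch in line if ch.isalpha()))
    sequence = "".join(seq_parts).upper()
    body = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([f">{identifier}", *body]) + "\n"
```

The same idea (read the source format's markers, then write the target format's layout) underlies the dedicated conversion tools discussed in these notes, which handle many more formats and edge cases.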


Clustal Format:

A clustal-formatted file is a plain text format. It can optionally have a header, which
states the clustal version number.
This is followed by the multiple sequence alignment, and optional information about the
degree of conservation at each position in the alignment.
Each sequence in the alignment is divided into subsequences each at most 60
characters long.
The sequence identifier for each sequence precedes each subsequence.
Each subsequence can optionally be followed by the cumulative number of non-gap
characters up to that point in the full sequence.
ClustalW is a widely used system for aligning any number of homologous nucleotide or
protein sequences.
For multi-sequence alignments, ClustalW uses progressive alignment methods. In
these, the most similar sequences, that is, those with the best alignment score are
aligned first.
Then progressively more distant groups of sequences are aligned until a global
alignment is obtained.
This heuristic approach is necessary because finding the global optimal solution is
prohibitive in both memory and time requirements.
ClustalW performs very well in practice. The algorithm starts by computing a rough
distance matrix between each pair of sequences based on pairwise sequence alignment
scores.
These scores are computed using the pairwise alignment parameters for DNA and
protein sequences.
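ClustalW itself builds its distance matrix from pairwise alignments of the unaligned input sequences. As a simplified stand-in, the Python sketch below reads an existing Clustal-formatted alignment (optional CLUSTAL header, blocks of identifier plus subsequence with optional counts, conservation lines that start with whitespace) and uses 1 minus the fraction of identical residues over gap-free columns as the distance. The function names and this distance definition are assumptions, not ClustalW's exact scoring.

```python
from itertools import combinations


def parse_clustal(text: str) -> dict[str, str]:
    """Concatenate the per-block subsequences of a Clustal alignment by identifier."""
    seqs: dict[str, str] = {}
    for line in text.splitlines():
        # Skip blank lines, the optional CLUSTAL header, and conservation
        # lines (which start with whitespace instead of an identifier).
        if not line.strip() or line.startswith("CLUSTAL") or line[0].isspace():
            continue
        fields = line.split()
        # fields[2], if present, is the running count of non-gap characters.
        name, chunk = fields[0], fields[1]
        seqs[name] = seqs.get(name, "") + chunk
    return seqs


def identity_distance(a: str, b: str, gap: str = "-") -> float:
    """1 - fraction of identical residues over columns where neither has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 1.0  # no comparable columns: treated as maximally distant (assumption)
    matches = sum(x == y for x, y in pairs)
    return 1.0 - matches / len(pairs)


def distance_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """Distances for every unordered pair of sequences."""
    return {(p, q): identity_distance(seqs[p], seqs[q]) for p, q in combinations(seqs, 2)}
```

In a progressive aligner, a guide tree built from such a matrix decides which sequences or groups are joined first.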

Phylip Format:

PHYLIP format is a plain text format containing exactly two sections: a header
describing the dimensions of the alignment, followed by the multiple sequence
alignment itself.
PHYLIP requires that each sequence identifier occupy exactly 10 characters (shorter names are padded with spaces).

The header consists of a single line describing the dimensions of the alignment. It must
be the first line in the file.
The header consists of optional spaces, followed by two positive integers (n and m)
separated by one or more spaces.
The first integer (n) specifies the number of sequences (i.e., the number of rows) in the
alignment.
The second integer (m) specifies the length of the sequences (i.e., the number of
columns) in the alignment.
The smallest supported alignment dimensions are 1 x 1 (one sequence of length one).
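The header and identifier rules translate directly into a writer. The Python sketch below produces strict, sequential (non-interleaved) PHYLIP output; the function name is hypothetical, and it assumes identifiers contain no characters that downstream programs would reject.

```python
def to_phylip(alignment: dict[str, str]) -> str:
    """Write aligned sequences in strict sequential PHYLIP format."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1 or 0 in lengths:
        raise ValueError("need at least one non-empty sequence, all of the same length")
    n, m = len(alignment), lengths.pop()
    lines = [f"{n} {m}"]  # header: number of sequences, then alignment length
    for name, seq in alignment.items():
        # identifier truncated or space-padded to exactly 10 characters
        lines.append(f"{name[:10]:<10}{seq}")
    return "\n".join(lines) + "\n"
```

Truncating names to 10 characters can make two identifiers identical, so in practice it is worth renaming long identifiers before export.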

Nexus Format:

NEXUS is the file format used by many popular programs like GDA, PAUP*, Mesquite,
ModelTest, MrBayes, and MacClade. NEXUS file names often have a .nxs or .nex extension.

The NEXUS format conveys data organized according to the character state data model, in
which the features of operational taxonomic units (OTUs) (e.g., species, individuals, genes,
genomes, etc.) are observable states of underlying homologous characters.

For instance, in a protein sequence alignment, proteins are the OTUs, alignment columns
are characters, and amino acids (or gaps) are states.

In evolutionary analysis, it is typical to consider differences as the result of state
transitions that take place on the branches of a tree; therefore, the NEXUS file provides a
means to represent a tree (in the standard Newick, a.k.a. New Hampshire, format).

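Syntactically, a NEXUS file begins with the token #NEXUS, followed by a series of blocks, each of which starts with "BEGIN <block name>;" and ends with "END;".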


Each of the pre-defined types of public blocks may appear only once. The TAXA block is the
only necessary block.
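A small alignment can be written as a NEXUS file with a TAXA block followed by a CHARACTERS block. The Python sketch below is a minimal illustration, assuming identifiers without spaces or punctuation (which would otherwise need quoting) and a DNA alignment by default; the function name is hypothetical, and programs differ in which optional commands they expect.

```python
def to_nexus(alignment: dict[str, str], datatype: str = "DNA") -> str:
    """Write an alignment as NEXUS with a TAXA block and a CHARACTERS block."""
    if not alignment:
        raise ValueError("alignment is empty")
    names = list(alignment)
    nchar = len(alignment[names[0]])
    if any(len(seq) != nchar for seq in alignment.values()):
        raise ValueError("all sequences must have the same length")
    lines = [
        "#NEXUS",
        "BEGIN TAXA;",
        f"    DIMENSIONS NTAX={len(names)};",
        f"    TAXLABELS {' '.join(names)};",
        "END;",
        "BEGIN CHARACTERS;",
        f"    DIMENSIONS NCHAR={nchar};",
        f"    FORMAT DATATYPE={datatype} GAP=- MISSING=?;",
        "    MATRIX",
    ]
    # One row per taxon (OTU): its name, then its character states.
    lines += [f"    {name} {seq}" for name, seq in alignment.items()]
    lines += ["    ;", "END;"]
    return "\n".join(lines) + "\n"
```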

Sequence Conversion Tools

GCG:

ReadSeq:

SeqVerter:

SeqVerter can help you to view automatic DNA sequencer chromatogram files. It is a
free sequence file format conversion utility by GeneStudio, Inc.
SeqVerter encapsulates a small subset of the features offered by the GeneStudio Pro
suite of programs.
Advanced Sequence File Format Conversion:
Open sequences from multiple source files simultaneously.
View sequences.
Select a subset of sequences for conversion.
Merge sequences from different source files into one multiple sequence file.
Split sequences from multiple sequence files into individual (single) sequence files.
Trim ends of automatic sequencer-generated files.
Set your favorite default output format.
Enter file headers required by Sequin, the GenBank sequence submission and update tool.

Molecular File Formats

The 3D structures of proteins obtained from X-ray crystallography and NMR methods are
represented by their atomic or molecular coordinates.

Some File Formats are:

Protein Data Bank.

Tripos Alchemy and Sybyl Mol2 Format.
Macromolecular Crystallographic Information File (mmCIF).

Protein Data Bank:

The PDB is a structure database that contains experimentally determined three-dimensional
structures of macromolecules. The experimental methods are X-ray crystallography and NMR
spectroscopy, and nowadays cryo-electron microscopy is also used. The PDB is a key
resource in areas of structural biology such as structural genomics. Most major scientific
journals and some funding agencies now require scientists to submit their structure data
to the PDB. Many other databases use protein structures deposited in the PDB. For example,
SCOP and CATH classify protein structures, while PDBsum provides a graphic overview of PDB
entries using information from other sources, such as Gene Ontology. The PDB provides
access to 3D structure data for large biological molecules (proteins, DNA, and RNA), the
molecules of life found in all organisms on the planet.
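In the legacy PDB file format, atomic coordinates are stored in fixed-column ATOM and HETATM records (for example, x, y, and z occupy columns 31-38, 39-46, and 47-54). The Python sketch below reads a few of those fields by column position; it is a minimal illustration that ignores alternate locations, insertion codes, multiple models, and the extended serial numbering used by very large structures. The `Atom` type and function name are hypothetical.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atoms(pdb_text: str) -> list[Atom]:
    """Extract atoms from ATOM/HETATM records using fixed PDB column positions."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            # Python slices are 0-based, so columns 31-38 become line[30:38].
            atoms.append(Atom(
                serial=int(line[6:11]),
                name=line[12:16].strip(),
                res_name=line[17:20].strip(),
                chain=line[21],
                res_seq=int(line[22:26]),
                x=float(line[30:38]),
                y=float(line[38:46]),
                z=float(line[46:54]),
            ))
    return atoms
```

Because the format is column-based rather than whitespace-delimited, splitting on spaces can fail when adjacent fields run together; slicing by column avoids that problem.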


Molecular File Format Conversion

Pdb2cif: Converts a PDB file to an mmCIF file.
Cif2pdb: A program to convert mmCIF to pseudo-PDB format.
Babel: A popular program designed to interconvert a number of file formats used in
molecular modeling.
Mol2Mol: A popular molecular file conversion tool that supports reading and writing a
number of molecular file formats.

Questions

2 pages
Cell Biology Lecture 1 & 2 Sem I 2011-2012 Introduction To Cell Biology, Cell As The Basic Unit of Life SPIN
100% (1)
Cell Biology Lecture 1 & 2 Sem I 2011-2012 Introduction To Cell Biology, Cell As The Basic Unit of Life SPIN
121 pages
Regents Protein Synthesis
No ratings yet
Regents Protein Synthesis
5 pages
Restriction Enzymes Activity
No ratings yet
Restriction Enzymes Activity
7 pages
Structure and Function of DNA and RNA
No ratings yet
Structure and Function of DNA and RNA
6 pages
DNA Purification Kit
No ratings yet
DNA Purification Kit
8 pages
My COT
No ratings yet
My COT
3 pages
Bioinformatics Unveiled
From Everand
Bioinformatics Unveiled
Joan Melody
No ratings yet
Biological Sequence Analysis: A Comprehensive Guide to Unraveling Genetic Code
From Everand
Biological Sequence Analysis: A Comprehensive Guide to Unraveling Genetic Code
Pasquale De Marco
No ratings yet
ChatGPT talks on science for young people: Molecular Biology!: Discover the secrets of life with the help of artificial intelligence
From Everand
ChatGPT talks on science for young people: Molecular Biology!: Discover the secrets of life with the help of artificial intelligence
Paulo Dario
No ratings yet
Molecular Methods in the Biological Sciences
From Everand
Molecular Methods in the Biological Sciences
Pasquale De Marco
No ratings yet
Bioinformatics: Merging Biology and Technology
From Everand
Bioinformatics: Merging Biology and Technology
Mani Devar
No ratings yet
Introduction to Bioinformatics Using Action Labs
From Everand
Introduction to Bioinformatics Using Action Labs
Jean-Louis Lassez
5/5 (1)
Biology Unleashed: A Comprehensive Guide to Mastering the Science of Life
From Everand
Biology Unleashed: A Comprehensive Guide to Mastering the Science of Life
Dominic Front
No ratings yet
Systems Biology: A Textbook
From Everand
Systems Biology: A Textbook
Edda Klipp
No ratings yet
