Sec1 Introduction to Bioinformatics

The document provides an introduction to bioinformatics, defining it as the development of methods for managing and analyzing biological information from genomics and high-throughput experiments. It outlines the types of genomic data, the importance of studying bioinformatics, and the role of databases in organizing biological information. Additionally, it discusses various sequence data formats, particularly the FASTA format, used for representing nucleotide and peptide sequences.


Introduction to Bioinformatics

Helwan National University


Faculty of Science
Biotechnology and Genetic
Engineering Program

Second-year students 2024-2025

TA / Asmaa Ashraf 04/07/20
What is Bioinformatics?

Definition: Bioinformatics is the development of methods for the management and analysis of biological information arising from genomics and high-throughput experiments.

o For molecular biologists, bioinformatics can be regarded as computational molecular biology, which uses computational techniques to study the structure, function, regulation, and interaction networks of genes and proteins.

The ultimate goal is to analyze and predict the structure, organization, function, regulation, and dynamics of the entire genome of an organism.
It is an interdisciplinary field, which harnesses computer science, mathematics, physics, and biology.

Fig 1 Interaction of disciplines that have contributed to the formation of bioinformatics.
Data

o Genomic data encompasses a wide range of information, with sequence data at its core.

o The suffix "-ome" in genomics refers to the entire collection of an entity, such as the transcriptome, proteome, or interactome.

Fig 2 The central dogma of molecular biology and correspondence with 'omics'
TYPES OF GENOMIC DATA INCLUDE:

o DNA sequence data, which includes gene and mRNA sequences in the form of complementary DNA (cDNA).

o Gene- and protein-expression data, facilitated by techniques like microarrays for studying global gene expression, also known as transcriptomics.

o Proteome data.

o Metabolome data.

o Protein-protein interaction data.

o Protein structural data.

o Protein-DNA interaction data.

o Gene and protein network data.

o Small noncoding RNA (ncRNA) data.


Why study Bioinformatics?

o Understanding Biological Processes: Develop a deep understanding of biological processes through analysis and integration of gene and protein information.

o Determines the biological role of genes and proteins.

o Identifies conserved genes and mutations.

o Identifies genes, proteins, and functional elements in DNA/RNA sequences.

o Compares genomes of different species to study evolution.

o Enables sequence alignment, comparison, and annotation.

o Integrates multi-omics data (genomics, proteomics, metabolomics).

o Predicts 3D structures of proteins and nucleic acids.

o Aids in drug discovery and molecular docking.

o Tool Development: Create new bioinformatics tools and improve existing ones for various analyses.
DATABASES

are digital repositories, based on computerized software, for storing information in a system and retrieving it from the system using search tools.

The primary objective of a database is to organize the data in a structured and searchable form, allowing easy retrieval of useful data.

BIOLOGICAL DATABASES

are libraries of biological information, collected from scientific experiments, published literature, high-throughput experiment technology, and computational analysis.
PRIMARY DATABASES

o Are archival in nature.

o Contain raw sequence data (experimental results) with some interpretation and explanation.

o The data are not curated.

o There are redundancies: the same sequence might be submitted by different laboratories, sometimes under different names.

o There are three primary databases that contain all the sequence data so far generated:
  - GenBank,
  - the EMBL database, also called EMBL-Bank,
  - and DDBJ (DNA Data Bank of Japan).

SECONDARY DATABASES

o Are curated, non-redundant databases.

o Are derived from the primary (archival) databases.

o Multiple entries of the same sequence in primary databases are merged to create a single sequence in the secondary database, with extensive annotation derived from all available information on the sequence.

o For example:
  - the NCBI RefSeq database: sequences including genomic DNA, transcripts, and proteins.
  - UniProtKB/Swiss-Prot: secondary database of proteins.
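The merging of redundant primary entries into one secondary entry can be pictured with a toy sketch. It only groups submissions that share an identical sequence; real curation (as in RefSeq or Swiss-Prot) also involves annotation and expert review that are not modelled here. The function name and the submission names and sequences are made up for illustration.

```python
def merge_redundant(entries):
    """Group (name, sequence) submissions by sequence, keeping every submitted name."""
    merged = {}
    for name, sequence in entries:
        key = sequence.upper()  # treat upper- and lower-case letters as the same sequence
        merged.setdefault(key, []).append(name)
    # One record per distinct sequence, listing all names it was submitted under
    return [{"sequence": seq, "names": names} for seq, names in merged.items()]


# Hypothetical submissions: two laboratories deposited the same sequence
submissions = [
    ("labA_geneX", "ATGGCGTACG"),
    ("labB_clone7", "atggcgtacg"),
    ("labC_geneY", "ATGTTTAAAC"),
]

non_redundant = merge_redundant(submissions)
for record in non_redundant:
    print(record["sequence"], record["names"])
```

Three primary-style entries become two non-redundant records, and the first record keeps both names under which the sequence was submitted.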
o All published DNA and RNA sequences are usually deposited in three parallel public databases.

o The three collaborating databases form the INSDC (International Nucleotide Sequence Database Collaboration):

1. GenBank

2. DNA Data Bank of Japan (DDBJ)

3. European Molecular Biology Laboratory (EMBL) Database
Some popular databases are:

o NCBI (National Center for Biotechnology Information)

o GenBank (Genetic Sequence Databank) at NCBI

o EMBL (European Molecular Biology Laboratory)

o DDBJ (DNA Data Bank of Japan)

Protein databases

o SwissProt

o UniProt
NCBI

National Center for Biotechnology Information (part of the U.S. National Institutes of Health)

o Very comprehensive biological database.

o GenBank: the nucleotide sequence database.

o Provides 42 different resources.

o Provides a simple and easy-to-use web interface.

o Search engine for data retrieval: Entrez.

o Entrez retrieves information across all the resources under NCBI. Examples: PubMed, Taxonomy, SNP, PubChem, etc.
Entrez databases

For example, the most common Entrez databases include PubMed, Nucleotide, Protein, and Structure.
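Besides the web interface, Entrez can also be queried programmatically through NCBI's E-utilities, which are not covered on the slides. The sketch below only builds a request URL for the `efetch` utility and does not contact the server. The helper name `efetch_url` is hypothetical, and the accession `U49845` is used purely as a sample identifier.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities (programmatic access to Entrez)
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def efetch_url(db, record_id, rettype="fasta", retmode="text"):
    """Build an efetch URL asking one Entrez database for one record."""
    params = {"db": db, "id": record_id, "rettype": rettype, "retmode": retmode}
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params)


# Request a record from the Nucleotide database in FASTA format
url = efetch_url("nucleotide", "U49845")
print(url)
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return the record as plain text, subject to NCBI's usage guidelines.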
SEQUENCE DATA FORMATS

o A sequence data format is a specific layout or arrangement of text characters, symbols, keywords, and descriptions that identify a sequence and contain information about its various attributes.

o A variety of different file formats have been developed to store and analyse DNA and protein sequence information.

o A widely used input sequence format for the purpose of analysis is the FASTA format.
FASTA File Format

o FASTA (pronounced fast "A") stands for "fast all".

o FASTA format is a text-based format for representing either nucleotide sequences or peptide sequences, in which nucleotides or amino acids are represented using single-letter codes.

o A sequence in FASTA format begins with a single-line description, followed by lines of sequence data.

o The description line is distinguished from the sequence data by a greater-than (">") symbol in the first column.

o It is recommended that all lines of text be shorter than 80 characters in length.
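The rules above translate into a short reader and writer. This is a minimal sketch, not a replacement for an established library: it assumes well-formed input, the function names are illustrative, the two records are made up, and the line width of 70 is simply one choice that stays below the recommended 80 characters.

```python
def parse_fasta(text):
    """Return a list of (description, sequence) tuples from FASTA-formatted text."""
    records = []
    description = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A ">" in the first column starts a new record
            if description is not None:
                records.append((description, "".join(chunks)))
            description = line[1:].strip()
            chunks = []
        elif description is not None:
            chunks.append(line)  # sequence data may span several lines
    if description is not None:
        records.append((description, "".join(chunks)))
    return records


def format_fasta(description, sequence, width=70):
    """Write one record, wrapping sequence lines to stay under 80 characters."""
    lines = [">" + description]
    lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


# Made-up records: one nucleotide sequence and one peptide sequence
example = """>seq1 hypothetical example record
ATGGCGTACG
TTAGC
>seq2 another made-up record
MKTAYIAK
"""

records = parse_fasta(example)
for description, sequence in records:
    print(description, len(sequence))
```

The sequence lines of each record are joined into a single string, so the original line breaks inside a sequence carry no meaning.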
Accession Number: the unique identifier for a sequence record. An accession number applies to the complete record, is usually a combination of one or more letters and numbers, and does not change.
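As a rough illustration of the "letters plus numbers" shape, the sketch below checks an identifier against a deliberately loose pattern. The pattern is an assumption made for this example, not an official specification: it accepts upper-case letters, an optional underscore (as in RefSeq-style identifiers), digits, and an optional `.version` suffix, which is an NCBI convention not described on the slide. The function name is hypothetical.

```python
import re

# Loose, illustrative pattern: letters, optional underscore, digits, optional ".version"
ACCESSION_PATTERN = re.compile(r"^([A-Z]+_?\d+)(?:\.(\d+))?$")


def split_accession(identifier):
    """Return (accession, version) if the identifier looks like an accession, else None."""
    match = ACCESSION_PATTERN.match(identifier.strip())
    if match is None:
        return None
    accession, version = match.groups()
    return (accession, int(version) if version else None)


for candidate in ["U49845", "NM_000546.6", "not an accession"]:
    print(candidate, "->", split_accession(candidate))
```

Under this convention the accession part stays fixed for the record, while the number after the dot can increase when the sequence is revised.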
THANK YOU
