2024.HF BioInformatics Lec3p
Bioinformatics
Lecture 3: Databases
Dr. Hossein Fallahi
Dept. of Biology,
School of Sciences,
Razi University
Kermanshah
Iran
Databases
q A very simple form of (non-electronic) database is a filing cabinet. In the filing cabinet,
you can store many different records (sheets of paper), each containing multiple data
elements.
ü For a cabinet of invoices, for example, the fields are the data items on each invoice (customer, product, price, quantity)
q The biggest problem with a filing cabinet is that you can only store your data one way
(e.g., in alphabetical order of the customer’s last name), and there’s no good way of
searching your files based on any other criteria (say, by product ordered).
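An electronic database removes this limitation. Below is a minimal sketch using Python's built-in `sqlite3` module. The invoice records are hypothetical. They are stored once, yet they can be retrieved in customer order or searched by product without re-filing anything.

```python
import sqlite3

# In-memory database holding a few hypothetical invoices.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoice (customer TEXT, product TEXT, price REAL, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO invoice VALUES (?, ?, ?, ?)",
    [
        ("Ahmadi", "pipette tips", 12.5, 10),
        ("Karimi", "agarose", 40.0, 2),
        ("Ahmadi", "agarose", 40.0, 1),
    ],
)

# The filing-cabinet view: records ordered by customer name.
by_customer = conn.execute(
    "SELECT customer, product FROM invoice ORDER BY customer, product"
).fetchall()

# A criterion the filing cabinet cannot serve easily: search by product.
agarose_buyers = conn.execute(
    "SELECT customer, quantity FROM invoice WHERE product = ? ORDER BY customer",
    ("agarose",),
).fetchall()

print(by_customer)
print(agarose_buyers)  # [('Ahmadi', 1), ('Karimi', 2)]
```

The same stored rows answer both questions. Only the query changes.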
2/11/24
Tables (entities)
• Basic elements of information to track, e.g., gene, organism, sequence, citation
Columns (fields)
• Attributes of a table, e.g., for a citation table: title, journal, volume, author
Rows (records)
• The actual data
• Whereas fields describe what data is stored, the rows of a table are where the actual data is stored
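To make the three terms concrete, here is a sketch of a citation table built with Python's `sqlite3` module. The schema declares the table (entity) and its columns (fields). The inserted rows (records) hold the data. The citation values are placeholders, not real papers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Table (entity) and columns (fields): these describe WHAT is stored.
conn.execute(
    """CREATE TABLE citation (
           title   TEXT,
           journal TEXT,
           volume  INTEGER,
           author  TEXT
       )"""
)

# Rows (records): the actual data. Placeholder values for illustration only.
conn.executemany(
    "INSERT INTO citation VALUES (?, ?, ?, ?)",
    [
        ("Example title A", "Journal X", 12, "Author One"),
        ("Example title B", "Journal Y", 7, "Author Two"),
    ],
)

cur = conn.execute("SELECT * FROM citation")
fields = [col[0] for col in cur.description]  # column names reported by the cursor
records = cur.fetchall()

print(fields)        # ['title', 'journal', 'volume', 'author']
print(len(records))  # 2
```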
• The flat file formats from the sequence databases are still used to access
and display sequence and annotation.
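A flat file is plain text organized by keywords. The sketch below reads a simplified, hypothetical GenBank-style record. It makes several assumptions: one record per text, keywords starting in column 1, indented lines continuing the previous keyword, and the sequence sitting between `ORIGIN` and the `//` terminator. Real records carry a much richer feature table than this toy.

```python
# A toy, hypothetical record in a simplified GenBank-style flat file layout.
TOY_RECORD = """\
LOCUS       TOY0001                   24 bp    DNA     linear   SYN 01-JAN-2024
DEFINITION  Hypothetical toy sequence for
            illustration only.
ACCESSION   TOY0001
ORIGIN
        1 atggcgtacg ttagcgatcc
       21 gtaa
//
"""


def parse_flat_file(text):
    """Parse one simplified GenBank-style record into a dict of fields."""
    fields = {}
    seq_parts = []
    last_key = None
    in_origin = False
    for line in text.splitlines():
        if line.startswith("//"):
            break  # end of record
        if in_origin:
            # Drop the leading position number, keep the residue blocks.
            seq_parts.extend(line.split()[1:])
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line[:1].strip():
            # A keyword line: the keyword, then its value.
            key, _, value = line.partition(" ")
            fields[key] = value.strip()
            last_key = key
        elif last_key is not None and line.strip():
            # Continuation of a wrapped value, e.g. a long DEFINITION.
            fields[last_key] += " " + line.strip()
    fields["SEQUENCE"] = "".join(seq_parts).upper()
    return fields


rec = parse_flat_file(TOY_RECORD)
print(rec["ACCESSION"], len(rec["SEQUENCE"]))  # TOY0001 24
```

For real GenBank files, Biopython's `SeqIO.parse(handle, "genbank")` is the usual choice over a hand-written parser like this one.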
XML format
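In XML, each value sits inside a named element, so a program looks fields up by tag name instead of by keyword spacing. The sketch below uses Python's standard `xml.etree.ElementTree` on a toy record. The element names imitate the style of NCBI's GBSeq XML, but the record and its values are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A toy record in a GBSeq-like layout; values are hypothetical.
TOY_XML = """<GBSet>
  <GBSeq>
    <GBSeq_locus>TOY0001</GBSeq_locus>
    <GBSeq_length>24</GBSeq_length>
    <GBSeq_definition>Hypothetical toy sequence for illustration only.</GBSeq_definition>
    <GBSeq_sequence>atggcgtacgttagcgatccgtaa</GBSeq_sequence>
  </GBSeq>
</GBSet>"""


def parse_gbseq(xml_text):
    """Return one dict per <GBSeq> element, reading fields by tag name."""
    root = ET.fromstring(xml_text)
    records = []
    for seq in root.findall("GBSeq"):
        records.append(
            {
                "locus": seq.findtext("GBSeq_locus"),
                "length": int(seq.findtext("GBSeq_length")),
                "sequence": seq.findtext("GBSeq_sequence").upper(),
            }
        )
    return records


print(parse_gbseq(TOY_XML)[0]["locus"])  # TOY0001
```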
Biological Databases
Features of a good biological database:
Ø A simple, easy-to-understand structure
Ø A good interface with easy retrieval of data
Ø Cross-referenced
Ø Accurate
Ø Comprehensive, but easy to search
Ø Up to date
Ø Annotated, but not "too annotated"
Ø Batch search/download
Ø Minimum redundancy
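One simple way to reduce redundancy is to merge entries whose sequences are identical. The sketch below groups hypothetical (identifier, sequence) pairs by sequence, ignoring letter case. It keeps the first identifier seen as the representative of each group. Real databases apply more elaborate rules, which are not modeled here.

```python
def collapse_identical(records):
    """Merge entries whose sequences are identical, ignoring letter case.

    `records` is a list of (identifier, sequence) pairs. Returns a dict mapping
    a representative identifier (the first one seen) to every identifier that
    shares its sequence.
    """
    groups = {}  # normalized sequence -> identifiers, in input order
    for ident, seq in records:
        groups.setdefault(seq.upper(), []).append(ident)
    return {ids[0]: ids for ids in groups.values()}


entries = [("seqA", "ATGGCC"), ("seqB", "atggcc"), ("seqC", "ATGGCT")]
print(collapse_identical(entries))
# {'seqA': ['seqA', 'seqB'], 'seqC': ['seqC']}
```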
• Primary Databases
• Original submissions by experimentalists
• Content controlled by the submitter
• Derivative Databases
• Derived from primary data
• Content controlled by a third party (e.g., NCBI)
Primary Databases
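The three primary nucleotide sequence databases, GenBank, ENA (EMBL) and DDBJ, together form the International Nucleotide Sequence Database Collaboration (INSDC). All three:
1. are repositories for nucleotide sequence data from all organisms,
2. accept nucleotide sequence submissions, and
3. exchange new and updated data on a daily basis to achieve optimal synchronization between them.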
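The sketch below shows the general idea of synchronizing new and updated entries. It assumes each source sends a daily batch keyed by an accession with a version suffix (`ACCESSION.VERSION`), and that the highest version of each accession wins. The batch names and accessions are made up. The actual INSDC exchange mechanism is not described here and is not what this code implements.

```python
def merge_updates(*update_sets):
    """Merge record updates from several sources, keeping the newest version.

    Each update set maps an 'ACCESSION.VERSION' string to a record. For every
    accession, only the entry with the highest version number is retained.
    """
    latest = {}  # accession -> (version, record)
    for updates in update_sets:
        for acc_ver, record in updates.items():
            accession, _, version = acc_ver.partition(".")
            version = int(version)
            if accession not in latest or version > latest[accession][0]:
                latest[accession] = (version, record)
    return {f"{acc}.{ver}": rec for acc, (ver, rec) in latest.items()}


# Hypothetical daily batches from two sources.
batch_one = {"TOY00001.1": "original entry", "TOY00002.1": "new submission"}
batch_two = {"TOY00001.2": "revised entry"}
print(merge_updates(batch_one, batch_two))
# {'TOY00001.2': 'revised entry', 'TOY00002.1': 'new submission'}
```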
Secondary Databases
1. RefSeq
2. SNP / Disease Databases
3. OMIM (Online Mendelian Inheritance in Man): inherited diseases
4. HapMap
5. 23andMe's database
• Genomic information: chromosomal location, introns, UTRs, regulatory regions, shared domains, etc.
• Expression information: expression specific to particular tissues, developmental stages, phenotypes, diseases, etc.
• Functional information: enzymatic/molecular function, pathway/cellular role, localization, role in diseases
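A gene entry can bring these three kinds of information together in one record. The sketch below groups them in a Python dataclass. The class, its field names and the example values are hypothetical, chosen for illustration, and are not taken from any particular database's schema.

```python
from dataclasses import dataclass, field


@dataclass
class GeneAnnotation:
    """Hypothetical record grouping genomic, expression and functional data."""

    symbol: str
    # Genomic information
    chromosome: str
    intron_count: int
    # Expression information: tissue name -> relative expression level
    expression: dict = field(default_factory=dict)
    # Functional information: free-text terms
    functions: list = field(default_factory=list)

    def tissues_expressed(self, threshold=1.0):
        """Return, sorted by name, the tissues at or above `threshold`."""
        return sorted(t for t, level in self.expression.items() if level >= threshold)


gene = GeneAnnotation(
    symbol="GENE1",
    chromosome="1",
    intron_count=4,
    expression={"liver": 12.0, "brain": 0.2, "kidney": 3.5},
    functions=["kinase activity"],
)
print(gene.tissues_expressed())  # ['kidney', 'liver']
```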
1. Meta Databases
2. Nucleic Acid Databases
3. Amino Acid / Protein Databases
4. Additional Databases (carbohydrate, systems)
5. Specialized Databases (antibodies, barcode of life)
6. Wiki-Style Databases
Meta Databases
Genome Databases
Structure Databases
Pathway Databases