
2. Databases

A database is a structured collection of data that is searchable, periodically updated, and cross-referenced to transform meaningless data into useful information. Database Management Systems (DBMS) facilitate the storage, extraction, and modification of data, with various types including flat file, hierarchical, relational, and object-oriented databases. Biological databases, which organize diverse types of biological data for optimal analysis, face challenges such as lack of standard formats and nomenclature, data optimization issues, and data quality errors.

Uploaded by

Gohar Iqbal

WHAT is a database?

• A collection of data that needs to be:
– Structured
– Searchable
– Updated (periodically)
– Cross-referenced

• Challenge:
– To change “meaningless” data into useful information that can be accessed and analysed in the best way possible.

For example:
HOW would YOU organise all biological sequences so that the biological information is optimally accessible?

You need an appropriate database management system (DBMS)


DBMS

• Internal organization
– Controls speed and flexibility

• A unity of programs that
– Store
– Extract
– Modify

[Diagram labels: Database, DBMS (Store, Extract, Modify), USER(S)]
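The three DBMS operations can be illustrated with a minimal sketch using SQLite through Python's standard `sqlite3` module. The table name `sequence`, its columns, and the sample values are assumptions made up for this illustration; they do not come from the slides.

```python
import sqlite3

# In-memory database; table, columns and values are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequence (id INTEGER PRIMARY KEY, name TEXT, residues TEXT)"
)

# Store: insert a record.
conn.execute(
    "INSERT INTO sequence (name, residues) VALUES (?, ?)", ("seq1", "ATGC")
)

# Extract: read the record back.
stored = conn.execute(
    "SELECT name, residues FROM sequence WHERE name = ?", ("seq1",)
).fetchone()

# Modify: update the record in place, then read the new value.
conn.execute(
    "UPDATE sequence SET residues = ? WHERE name = ?", ("ATGCGT", "seq1")
)
modified = conn.execute(
    "SELECT residues FROM sequence WHERE name = ?", ("seq1",)
).fetchone()[0]

print(stored, modified)
```

The user only issues requests; how the rows are laid out internally is left to the DBMS.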
DBMS organisation types

• Flat file databases (flat DBMS)
– Simple, restrictive, table
• Hierarchical databases (hierarchical DBMS)
– Simple, restrictive, tables
• Relational databases (RDBMS)
– Complex, versatile, tables
• Object-oriented databases (ODBMS)
– Complex, versatile, objects
Relational databases
• Data is stored in multiple related tables

• Data relationships across tables can be either many-to-one or many-to-many

• A few rules allow the database to be viewed in many ways

• Let's convert the “course details” to a relational database
Our flat file database
FLAT DATABASE 2 Course details

Name       Depart.    Course   E1  E2  E3  P1  P2
Student 1  Chemistry  Biology  A   B   B   A   C   …..
Student 1  Chemistry  Maths    C   C   B   A   A   …..
Student 1  Chemistry  English  A   A   A   A   A   …..
…
Student 2  Ecology    Biology  A   B   A   A   A   …..
Student 2  Ecology    Maths    A   D   A   A   A   …..
…
Normalize (1NF) …
• We remove repeating records (rows)

sID  Name      dID
1    Student1  1
2    Student2  2

cID  Course
1    Biology
2    Maths
3    English

dID  Department
1    Chemistry
2    Ecology

sID  cID  E1  E2  E3  P1  P2
1    1    A   B   B   A   C   …..
1    2    C   C   B   A   A   …..
1    3    A   A   A   A   A   …..
…
2    1    A   B   A   A   A   …..
2    2    A   D   A   A   A   …..
…

Primary keys: sID, cID and dID in the tables that define them
Foreign keys: dID in the student table; sID and cID in the grades table
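The tables on this slide could be declared as follows, in a sketch that uses SQLite through Python's `sqlite3` module. The slide does not name its tables, so `department`, `student`, `course` and `result` are assumed names, and the row for a third student exists only to show a foreign key being enforced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Table names are assumptions; column names follow the slide.
conn.executescript("""
CREATE TABLE department (dID INTEGER PRIMARY KEY, Department TEXT NOT NULL);
CREATE TABLE student (
    sID  INTEGER PRIMARY KEY,
    Name TEXT NOT NULL,
    dID  INTEGER NOT NULL REFERENCES department(dID)
);
CREATE TABLE course (cID INTEGER PRIMARY KEY, Course TEXT NOT NULL);
CREATE TABLE result (
    sID INTEGER NOT NULL REFERENCES student(sID),
    cID INTEGER NOT NULL REFERENCES course(cID),
    E1 TEXT, E2 TEXT, E3 TEXT, P1 TEXT, P2 TEXT,
    PRIMARY KEY (sID, cID)
);
""")

conn.executemany("INSERT INTO department VALUES (?, ?)",
                 [(1, "Chemistry"), (2, "Ecology")])
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "Student1", 1), (2, "Student2", 2)])
conn.executemany("INSERT INTO course VALUES (?, ?)",
                 [(1, "Biology"), (2, "Maths"), (3, "English")])
conn.executemany("INSERT INTO result VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (1, 1, "A", "B", "B", "A", "C"),
    (1, 2, "C", "C", "B", "A", "A"),
    (1, 3, "A", "A", "A", "A", "A"),
    (2, 1, "A", "B", "A", "A", "A"),
    (2, 2, "A", "D", "A", "A", "A"),
])
result_count = conn.execute("SELECT COUNT(*) FROM result").fetchone()[0]

# Hypothetical student pointing at a department that does not exist:
# the foreign key on dID should make the DBMS reject the row.
try:
    conn.execute("INSERT INTO student VALUES (3, 'Student3', 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

print(result_count, fk_rejected)
```

Each student and department name is now stored once, and the keys carry the relationships between tables.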
Normalize (2NF) …
• We remove redundant fields (columns)

sID  Name      dID
1    Student1  1
2    Student2  2

cID  Course
1    Biology
2    Maths
3    English

dID  Department
1    Chemistry
2    Ecology

gID  Grade
1    A
2    B
3    C

wID  Project
1    E1
2    E2
3    E3
4    P1
5    P2

sID  cID  gID  wID
1    1    1    1
1    1    2    2
1    1    2    3
1    1    1    4
1    1    3    5
2    1    1    1
2    1    1    2
2    1    2    3
2    1    1    4
2    1    1    5
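To check that nothing was lost in the conversion, the flat "course details" rows can be rebuilt from the normalized tables with joins. The sketch below uses SQLite through Python's `sqlite3` module; the table names (`student`, `course`, `grade`, `work`, `result`, `department`) are assumptions, and only Student 1's Biology rows from the slide are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table names are assumptions; column names and values follow the slide.
conn.executescript("""
CREATE TABLE department (dID INTEGER PRIMARY KEY, Department TEXT);
CREATE TABLE student (sID INTEGER PRIMARY KEY, Name TEXT,
                      dID INTEGER REFERENCES department(dID));
CREATE TABLE course (cID INTEGER PRIMARY KEY, Course TEXT);
CREATE TABLE grade (gID INTEGER PRIMARY KEY, Grade TEXT);
CREATE TABLE work (wID INTEGER PRIMARY KEY, Project TEXT);
CREATE TABLE result (
    sID INTEGER REFERENCES student(sID),
    cID INTEGER REFERENCES course(cID),
    gID INTEGER REFERENCES grade(gID),
    wID INTEGER REFERENCES work(wID),
    PRIMARY KEY (sID, cID, wID)
);
""")
conn.executemany("INSERT INTO department VALUES (?, ?)",
                 [(1, "Chemistry"), (2, "Ecology")])
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "Student1", 1), (2, "Student2", 2)])
conn.executemany("INSERT INTO course VALUES (?, ?)",
                 [(1, "Biology"), (2, "Maths"), (3, "English")])
conn.executemany("INSERT INTO grade VALUES (?, ?)",
                 [(1, "A"), (2, "B"), (3, "C")])
conn.executemany("INSERT INTO work VALUES (?, ?)",
                 [(1, "E1"), (2, "E2"), (3, "E3"), (4, "P1"), (5, "P2")])
# Student 1, Biology: (sID, cID, gID, wID) rows as listed on the slide.
conn.executemany("INSERT INTO result VALUES (?, ?, ?, ?)",
                 [(1, 1, 1, 1), (1, 1, 2, 2), (1, 1, 2, 3),
                  (1, 1, 1, 4), (1, 1, 3, 5)])

# Join the ID columns back to their human-readable values.
flat_rows = conn.execute("""
    SELECT s.Name, d.Department, c.Course, w.Project, g.Grade
    FROM result r
    JOIN student s    ON s.sID = r.sID
    JOIN department d ON d.dID = s.dID
    JOIN course c     ON c.cID = r.cID
    JOIN work w       ON w.wID = r.wID
    JOIN grade g      ON g.gID = r.gID
    WHERE r.sID = 1 AND r.cID = 1
    ORDER BY r.wID
""").fetchall()

for row in flat_rows:
    print(row)
```

The grades come back in the order E1, E2, E3, P1, P2 as A, B, B, A, C, which matches the first row of the flat file.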
Relational Databases
• What have we achieved?
– No repeating information
– Less storage space
– Better reality representation
– Easy modification/management
– Easy usage of any combination of records

Remember: the DBMS has programs to access and edit this information, so ignore the human reading limitation of the primary keys.
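The "easy modification" point can be seen in a small sketch: because a department name is stored once, renaming it is a single-row update that every related student picks up. The example uses SQLite through Python's `sqlite3` module; the table names, the third student and the new name "Biochemistry" are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table names, Student3 and 'Biochemistry' are illustrative assumptions.
conn.executescript("""
CREATE TABLE department (dID INTEGER PRIMARY KEY, Department TEXT);
CREATE TABLE student (sID INTEGER PRIMARY KEY, Name TEXT,
                      dID INTEGER REFERENCES department(dID));
INSERT INTO department VALUES (1, 'Chemistry'), (2, 'Ecology');
INSERT INTO student VALUES (1, 'Student1', 1), (2, 'Student2', 2),
                           (3, 'Student3', 1);
""")

# One row changes, however many students belong to the department.
cursor = conn.execute(
    "UPDATE department SET Department = 'Biochemistry' WHERE dID = 1"
)
rows_changed = cursor.rowcount

# Every student linked through dID now shows the new name.
renamed = [name for (name,) in conn.execute("""
    SELECT s.Name
    FROM student s JOIN department d ON d.dID = s.dID
    WHERE d.Department = 'Biochemistry'
    ORDER BY s.sID
""")]

print(rows_changed, renamed)
```

In the flat file the same change would have to be repeated on every row that mentions the department.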
Accessing database information
• A request for data from a database is called a query

• Queries can be of three forms:
– Choose from a list of parameters
– Query by example (QBE)
– Query language
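Two of these query forms can be contrasted in a short sketch. The query-language form writes the request out in SQL, while the query-by-example form lets the user fill in a partial record that a helper turns into the same kind of request. The `student` table layout and the `query_by_example` helper are assumptions for illustration, not part of any particular DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Illustrative single table; layout and contents are assumptions.
conn.executescript("""
CREATE TABLE student (sID INTEGER PRIMARY KEY, Name TEXT, Department TEXT);
INSERT INTO student VALUES (1, 'Student1', 'Chemistry'),
                           (2, 'Student2', 'Ecology');
""")

COLUMNS = ("sID", "Name", "Department")


def query_by_example(example):
    """Return rows matching every field filled in on the example record."""
    unknown = set(example) - set(COLUMNS)
    if unknown:
        # Column names are checked because they are spliced into the SQL text.
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    sql = "SELECT sID, Name, Department FROM student"
    if example:
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col in example)
    sql += " ORDER BY sID"
    # Values are passed as parameters, never pasted into the SQL string.
    return conn.execute(sql, tuple(example.values())).fetchall()


# Query language: the request is written out directly.
by_language = conn.execute(
    "SELECT Name FROM student WHERE Department = 'Ecology'"
).fetchall()

# Query by example: the user supplies a partially filled record.
by_example = query_by_example({"Department": "Ecology"})

print(by_language, by_example)
```

A "choose from a list of parameters" interface would work the same way underneath, with the example record built from the user's menu selections.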
Distributed databases
• From local to global attitude
• Data appears to be in one location but is most definitely not

• A definition: Two or more data files in different locations, periodically synchronized by the DBMS to keep data in all locations consistent (A, B, C)

• An intricate network for combining and sharing information
• Administrators praise fast network technologies!!!
• Users praise the internet!!!
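Periodic synchronization across locations A, B and C can be sketched as below. This is a deliberately simple assumption: each record carries a version number and the highest version wins everywhere ("last writer wins"). Real distributed DBMSs use more elaborate protocols, and the record names and values here are made up.

```python
def synchronize(replicas):
    """Bring every replica to the newest version of each record."""
    merged = {}
    # Pass 1: collect the highest-versioned copy of every record.
    for replica in replicas:
        for key, (version, value) in replica.items():
            if key not in merged or version > merged[key][0]:
                merged[key] = (version, value)
    # Pass 2: overwrite each location with the merged view.
    for replica in replicas:
        replica.clear()
        replica.update(merged)
    return merged


# Three locations, each mapping record name -> (version, value).
site_a = {"seq1": (1, "ATGC"), "seq2": (2, "GGCA")}
site_b = {"seq1": (3, "ATGCGT")}
site_c = {"seq3": (1, "TTAA")}

synchronize([site_a, site_b, site_c])
print(site_a == site_b == site_c, site_a["seq1"])
```

After one synchronization pass all three locations hold the same three records, including the newer copy of `seq1` that originally existed only at location B.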
Three main points
• Database proliferation
– Dozens to hundreds at the moment
• More and more scientific discoveries result from inter-database analysis and mining
• Rising complexity of required data combinations
– E.g. translational medicine: “from bench to bedside” (genomic data vs. clinical data)
Biological databases
• Like any other database
– Data organization for optimal analysis

• Data is of different types
– Raw data (DNA, RNA, protein sequences)
– Curated data (DNA, RNA and protein annotated sequences and structures, expression data)
A few biological databases
• Nucleotide Databases
Alternative Splicing, EMBL-Bank, Ensembl, Genomes Server, Genome,
MOT, EMBL-Align, Simple Queries, dbSTS Queries, Parasites, Mutations,
IMGT
• Genome Databases
Human, Mouse, Yeast, C. elegans, FlyBase, Parasites
• Protein Databases
Swiss-Prot, TrEMBL, InterPro, CluSTr, IPI, GOA, GO, Proteome Analysis,
HPI, IntEnz, TrEMBLnew, SP_ML, NEWT, PANDIT
• Structure Databases
PDB, MSD, FSSP, DALI
• Microarray Database
ArrayExpress
• Literature Databases
MEDLINE, Software Biocatalog, Flybase Archives
• Alignment Databases
BAliBASE, Homstrad, FSSP
A short word on problems
• Even today we face some key limitations
– There is no standard format
• Every database or program has its own format
– There is no standard nomenclature
• Every database has its own names
– Data is not fully optimized
• Some datasets have missing information without indications of it
– Data errors
• Data is sometimes of poor quality, erroneous, misspelled
• Error propagation resulting from computer annotation
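One modest defence against silently missing information is to check records against a list of required fields before they are loaded or analysed. The sketch below assumes a made-up record layout (`accession`, `organism`, `sequence`) and made-up accessions; real databases each define their own fields.

```python
# Hypothetical set of fields every record is expected to carry.
REQUIRED_FIELDS = ("accession", "organism", "sequence")


def missing_fields(record):
    """List required fields that are absent, None, or blank in a record."""
    return [
        field for field in REQUIRED_FIELDS
        if not str(record.get(field) or "").strip()
    ]


# Made-up records: one complete, one with a blank field, one with gaps.
records = [
    {"accession": "EX0001", "organism": "Homo sapiens", "sequence": "ATGC"},
    {"accession": "EX0002", "organism": "", "sequence": "GGCA"},
    {"accession": "EX0003", "sequence": None},
]

# Report the gaps explicitly instead of leaving them unmarked.
report = {record["accession"]: missing_fields(record) for record in records}
print(report)
```

A check like this only flags gaps; it cannot tell whether a value that is present is actually correct.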
