0% found this document useful (0 votes)
3 views59 pages

2 DB Overview

The document provides an overview of bioinformatics database development, detailing the components of a Database Management System (DBMS) environment, roles within the DB environment, and the history of database systems. It discusses the advantages and disadvantages of DBMSs, the ANSI-SPARC three-level architecture, and various data models and languages. Additionally, it covers the functions of a DBMS and different client-server architectures.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
3 views59 pages

2 DB Overview

The document provides an overview of bioinformatics database development, detailing the components of a Database Management System (DBMS) environment, roles within the DB environment, and the history of database systems. It discusses the advantages and disadvantages of DBMSs, the ANSI-SPARC three-level architecture, and various data models and languages. Additionally, it covers the functions of a DBMS and different client-server architectures.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 59

BIO-350

Bioinformatics Database Development


by
Waseem haider, PhD
Associate Professor,

Computational Biology and Bioinformatics Group (CBBG)


Dept. of Biosciences CUI, Islamabad-Pakistan
CEO Next Gen. Solutions (NGS)
Components of the DBMS Environment

Hardware
Can range from a PC to a network of computers.
Software
DBMS, operating system, network software (if
necessary) and also the application programs.
Data
Used by the organization and a description of this
data called the schema.
Procedures
Instructions and rules that should be applied to the
design and use of the database and DBMS.
People
Discussed in the next section
Roles in the DB Environment
• Data Administrator (DA)
– Database planning
– Development and maintenance of standards, policies and procedures
• Database Designers (Logical/Physical)
– Logical and Physical database design
• Application Programmers
• Develop Applications Database Administrator (DBA)
– Physical realization of the database
– Physical database design and implementation
– Security and integrity control
– Maintenance of the operational system
– Ensuring satisfactory performance of the applications for users
• End Users
– Naive
– Sophisticated
History of Database Systems
• Roots of the DBMS
– Apollo moon-landing project, 1960s
– NAA (North American Aviation), prime contractor
for the project
– Developed a software GUAM (Generalized Update
Access Method), hierarchical
– In mid–1960s IBM joined NAA, result was IMS
(Information Management System)
History of Database Systems..
• IDS (Integrated Data Store)
– By General Electric, network, mid-1960
• CODASYL
– Conference on Data Systems Languages
• DBTG
– Data Base Task Group
History of Database Systems..
• DBTG proposal (1971) included following
components for DB system architecture:
– The schema
– The subschema
– A data management language
• Schema DDL
• Subschema DDL
• DML
• Proposal was not formally adopted by ANSI
History of Database Systems..
• E. F. Codd, 1970
– IBM Research Laboratory
– Relational model
– System R project by IBM’S San Jose Research
Laboratory California
– Result of this project
• Development of SQL
• Commercial relational DBMS products e.g. DB2, SQL/DS
from IBM, Oracle from Oracle Corporation
DBMS Generations
• First-generation
– Hierarchical and Network
• Second generation
– Relational
• Third generation
– Object-Relational
– Object-Oriented
Advantages of DBMSs
• Control of data redundancy
– Minimized/controlled duplication
• Data consistency
– Less duplication means increased data consistency
• More information from the same amount of
data
– More information shared by relevant users
• Sharing of data
– Data is shared by all authorized users
Advantages of DBMSs..
• Improved data integrity
– Integrity in terms of constraints
• Improved security
– Authentication, access rights
• Enforcement of standards
– Data formats, naming conventions, documentation
etc.
• Economy of scale
– Cost savings due to database approach
Advantages of DBMSs..
• Balance conflicting requirements
– DBA resolves conflicts between different user’s
groups
• Improved data accessibility/ responsiveness
– Ad hoc queries on integrated data
• Increased productivity
– Developer need to focus on application
• Improved maintenance
– Through program data independence
Advantages of DBMSs..
• Increased concurrency
– Multiple users are allowed to access same data
• Improved backup and recovery services
– Backup routines, recovery procedures by skilled
staff
Disadvantages of DBMSs
• Complexity
• Size
• Cost of DBMS
• Additional hardware costs
• Cost of conversion
• Performance
• Higher impact of a failure
Objectives of Three-Level Architecture
• All users should be able to access same data
but have a different customized view
• A user’s view is immune to changes made in
other views
• Users should not need to know physical
database storage details
Objectives of Three-Level Architecture..

• DBA should be able to change database


storage structures without affecting the
users’ views
• Internal structure of database should be
unaffected by changes to physical aspects of
storage
• DBA should be able to change conceptual
structure of database without affecting all
users
ANSI-SPARC Three-Level Architecture
ANSI-SPARC Three-Level Architecture..

• External Level
– Users’ view of the database
– Describes that part of database that is relevant to a
particular user
– Different views may have different representation
of same data (e.g. different date formats, age
derived from DOB etc.)
ANSI-SPARC Three-Level Architecture..

• Conceptual Level
– Community view of the database
– Describes what data is stored in database and
relationships among the data
– Along with any constraints on data
– Independent of any storage considerations
ANSI-SPARC Three-Level Architecture..

• Internal Level
– Physical representation of the database on the computer
– Describes how the data is stored in the database
– physical implementation of the database to achieve
optimal runtime performance and storage space
utilization
– Data structures and file organizations used to store data
on storage devices
– Interfaces with the operating system access methods to
place the data on the storage devices, build the indexes,
retrieve the data, and so on
Differences between Three Levels of ANSI-SPARC
Architecture
Schemas
• External Schemas
– Also called subschemas
– Multiple schemas per database
– Corresponds to different views of data
• Conceptual Schema
– Describes all the entities, attributes, and
relationships together with integrity constraints
– Only one schema per database
Schemas..
• Internal Schema
– A complete description of the internal model,
containing the definitions of stored records, the
methods of representation, the data fields, and the
indexes and storage structures used
– Only one schema per database
Mappings
• The DBMS is responsible for mapping
between these three types of schema:
– The DBMS must check that each external schema
is derivable from the conceptual schema, and it
must use the information in the conceptual schema
to map between each external schema and the
internal schema
• Types of mappings
– Conceptual/Internal mapping
– External/Conceptual mapping
Conceptual/Internal Mapping
• Enables the DBMS to
– Find the actual record or combination of records in
physical storage that constitute a logical record in
the conceptual schema,
– Together with any constraints to be enforced on the
operations for that logical record
– It also allows any differences in entity names,
attribute names, attribute order, data types, and so
on, to be resolved
External/Conceptual Mapping
• Enables the DBMS to
– Map names in the user’s view on to the relevant
part of the conceptual schema
Instances
• Database Schema
– Description of database (also called intension)
– Specified during design phase
– Remain almost static
• Database Instance
– Data in the database at any particular point in time
– Dynamic (changes with the time)
– Also called an extension (or state) of database
Data Independence
• Logical Data Independence
– Refers to immunity of external schemas to changes
in conceptual schema
– Conceptual schema changes (e.g. addition/removal
of entities)
– Should not require changes to external schema or
rewrites of application programs
Data Independence
• Physical Data Independence
– Refers to immunity of conceptual schema to
changes in the internal schema
– Internal schema changes (e.g. using different file
organizations, storage structures, storage devices
etc.)
– Should not require change to conceptual or
external schemas
Data Independence and the ANSI-SPARC
Three-Level Architecture
Summary
• Components of the DBMS environment
• Roles in the DB environment
• History of DBMS
• Advantages/Disadvantages of DBMSs
• ANSI-SPARC three-level architecture
• Schemas, mappings, and instances
• Data independence
Database Languages
• Data sublanguage consist of two parts:
– DDL (Data Definition Language)
– DML (Data Manipulation Language)
• Data sublanguage
– Does not include constructs for all computing needs
such as iterations or conditional statements
– Many DBMSs provide embedding the sublanguage
in a high level programming language e.g. C, C++,
Java etc.
– In this case , these high level languages are called
host languages
Data Definition Language (DDL)
• Allows the DBA or user to describe and
name entities, attributes, and relationships
required for the application
• Plus any associated integrity and security
constraints
• System catalog (data dictionary, data
directory)
• Metadata (data about data, data
description, data definitions)
Data Manipulation Language (DML)
• Provides basic data manipulation
operations on data held in the database
– Procedural DML
– Non-Procedural DML
Procedural DML
• Allows user to tell system exactly how to
manipulate data
– Operate on records individually
– Typically, embedded in a high level language
– Network or hierarchical DMLs
– More work is done by user (programmer)
Non-Procedural DML
• Allows user to state what data is needed
rather than how it is to be retrieved
– Operate on set of records
– Relational DBMS include e.g. SQL, QBE etc.
– Easy to understand and learn than procedural
DML
– More work is done by DBMS than user
– Provides considerable degree of data independence
– Also called declarative languages
Fourth Generation Languages (4GLs)
• No clear consensus
– Forms generators
– Report generators
– Graphics generators
– Application generators
– Examples : SQL and QBE
Functions of a DBMS

• Data storage, retrieval, and update


• A user-accessible catalog
• Transaction support
• Concurrency control services
• Recovery services
Functions of a DBMS..

• Authorization services
• Support for data communication
• Integrity service
• Services to promote data independence
• Utility services
DBMS Environment

• Single user
• Multi-user
– Teleprocessing
– File-Server Architecture
– Client-Server Architecture
Teleprocessing
Teleprocessing

• Traditional architecture
• Single mainframe with a number of
terminals attached
• Trend is now towards downsizing
File-Server Architecture
File-Server Architecture
• DBMS and applications run on each
workstation
• Disadvantages include:
– Significant network traffic
– Copy of DBMS on each workstation
– Concurrency, recovery and integrity control more
complex because multiple DBMSs accessing same
files
Client-Server Architecture
Client-Server Architecture
• Client (tier 1) manages user interface and
runs applications
• Server (tier 2) holds database and DBMS
• Advantages include:
– Wider access to existing databases
– Increased performance
– Possible reduction in hardware costs
– Reduction in communication costs
– Increased consistency
Two-Tier Client-Server
Three-Tier Client-Server
• Client side issues in two-tier client/server
model preventing true scalability:
– ‘Fat’ client, requiring considerable resources on
client’s computer to run effectively
– Significant client side administration overhead
• By 1995, three layers proposed, each
potentially running on a different platform
Three-Tier Client-Server
Three-Tier Client-Server
• Advantages:
– ‘Thin’ client, requiring less expensive hardware
– Application maintenance centralized
– Easier to modify or replace one tier without
affecting others
– Separating business logic from database functions
makes it easier to implement load balancing
– Maps quite naturally to Web environment
Data Model
• Integrated collection of concepts for
describing data, relationships between data,
and constraints on the data in an
organization
Purpose of Data Model
• To represent data in an understandable way
– Represents the organization itself
– Helps in unambiguous and accurate
communication between between database
designers and end-users about their understanding
of the organizational data
Components of a Data Model
• A data model comprises:
– A structural part
– A manipulative part
– Possibly a set of integrity rules
– ANSI-SPARC architecture related models
• External data model (Universe of Discourse)
• Conceptual data model (DBMS independent)
• Internal data model
Categories of Data Models
• Categories of data models include:
– Object-based
• Entity-Relationship
• Semantic
• Functional
• Object-Oriented
– Record-based
• Relational Data Model
• Network Data Model
• Hierarchical Data Model
– Physical
Relational Data Model
Network Data Model
Hierarchical Data Model
Conceptual Modeling
• Conceptual modeling is process of
developing a model of information use in an
enterprise that is independent of
implementation details
– Should be complete and accurate representation of
an organization’s data requirements
– Conceptual schema is the core of a system
supporting all user views
• Conceptual vs. logical data model
Summary
• Database languages
• Functions of a DBMS
• DBMS environment
• Data models and their categories
References

• All the material (slides, diagrams etc.) presented in this


lecture is taken (with modifications) from the Pearson
Education website given below

– http://www.booksites.net/connbegg

You might also like