1 Introduction
• Data: Known facts that can be recorded and have an implicit meaning; raw data, unprocessed data
• Information: Processed data
• Database: a highly organized, interrelated, and structured set of data about a particular enterprise
• Controlled by a database management system (DBMS)
• DBMS
• Set of programs to access the data
• An environment that is both convenient and efficient to use
• Database systems are used to manage collections of data that are:
• Highly valuable
• Relatively large
• Accessed by multiple users and applications, often at the same time.
• A modern database system is a complex software system whose task is to manage a large, complex
collection of data.
• Databases touch all aspects of our lives
Database Examples
• Enterprise Information
• Sales: customers, products, purchases
• Accounting: payments, receipts, assets
• Human Resources: Information about employees, salaries, payroll taxes.
• Manufacturing: management of production, inventory, orders, supply chain.
• Banking and finance
• customer information, accounts, loans, and banking transactions.
• Credit card transactions
• Finance: sales and purchases of financial instruments (e.g., stocks and bonds); storing real-time market data
• Universities: registration, grades
Databases
• Traditional applications:
• Numeric and textual databases
• More recent applications:
• Multimedia databases
• Geographic Information Systems (GIS)
• Biological and genome databases
• Data warehouses
• Mobile databases
• Real-time and active databases
• Social networks capture a lot of information about people
and about communications among people (posts, tweets, photos,
videos) in systems such as:
- Facebook
- Twitter
- LinkedIn
• All of the above constitutes data
• Search engines (Google, Bing, Yahoo) collect their own repository of
web pages for searching purposes
DBMS Functions
• Define a particular database in terms of its data types, structures etc.
• Construct or load the initial database contents on a secondary storage medium
• Manipulating the database:
• Retrieval: Querying, generating reports
• Modification: Insertions, deletions and updates to its content
• Accessing the database through Web applications
• Processing and sharing by a set of concurrent users and application programs –
yet, keeping all data valid and consistent
• DBMS may additionally provide:
• Protection or security measures to prevent unauthorized access
• “Active” processing to take internal actions on data
• Presentation and visualization of data
• Maintenance of the database and associated programs over the lifetime of
the database application
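The define, construct, and manipulate functions listed above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the product table and its rows are made up for the example, and the database lives in memory rather than on secondary storage so that the sketch leaves no files behind.

```python
import sqlite3

# In-memory database (an assumption for the sketch; a real DBMS stores
# the data on secondary storage).
conn = sqlite3.connect(":memory:")

# Define: declare the structure and the data types.
conn.execute(
    "CREATE TABLE product ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " price REAL NOT NULL)"
)

# Construct: load the initial contents.
conn.executemany(
    "INSERT INTO product (id, name, price) VALUES (?, ?, ?)",
    [(1, "pen", 1.5), (2, "notebook", 4.0), (3, "stapler", 9.0)],
)

# Manipulate (modification): update one row and delete another.
conn.execute("UPDATE product SET price = 2.0 WHERE id = 1")
conn.execute("DELETE FROM product WHERE id = 3")

# Manipulate (retrieval): query the current contents.
rows = conn.execute("SELECT name, price FROM product ORDER BY id").fetchall()
print(rows)
```

After the update and the delete, the query returns the two remaining products with the new price for the pen.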
Purpose of Database Systems
A file-processing system is supported by a conventional operating system. The system stores permanent records in
various files, and it needs different application programs to extract records from, and add records to, the appropriate
files. Before database management systems (DBMSs) were introduced, organizations usually stored information in
such systems. Keeping information in a file-processing system has several major disadvantages:
Data redundancy and inconsistency:
• Since different programmers create the files and application programs over a long period, the various files are likely
to have different structures and the programs may be written in several programming languages. Moreover, the same
information may be duplicated in several places (files). This redundancy leads to higher storage and access cost. In
addition, it may lead to data inconsistency; that is, the various copies of the same data may no longer agree.
Difficulty in accessing data
• Need to write a new program to carry out each new task
Data isolation
• Multiple files and formats. Because data are scattered in various files, and files may be in different formats,
writing new application programs to retrieve the appropriate data is difficult.
Integrity problems:
• The data values stored in the database must satisfy certain types of consistency constraints. For example, suppose
the university requires that the account balance of a department never fall below zero. Developers enforce
these constraints in the system by adding appropriate code in the various application programs. However, when
new constraints are added, it is difficult to change the programs to enforce them. The problem is compounded
when constraints involve several data items from different files.
• Atomicity of updates
• Failures may leave database in an inconsistent state with partial updates carried out
• Example: Transfer of funds from one account to another should either complete or not happen at all
• Concurrent access by multiple users
• Concurrent access needed for performance
• Uncontrolled concurrent accesses can lead to inconsistencies
• Example: two people read the same balance (say 100) at the same time and each withdraws 50;
if both then write back 50, one of the withdrawals is lost
• Security problems
• Hard to provide user access to some, but not all, data
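The integrity and atomicity points above can be illustrated with a small sketch using Python's built-in sqlite3 module. The account table, the account names, and the transfer helper are assumptions made for the example. The rule that a balance may never fall below zero is declared once as a CHECK constraint instead of being coded into every program, and the transfer runs inside a transaction so that it either completes or leaves no trace.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The constraint is declared in the schema, so the DBMS enforces it.
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 20)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; return False if the transfer was rolled back."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            # The credit is applied first on purpose, so that a failing debit
            # leaves a partial update that the rollback has to undo.
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a dict of name -> balance."""
    return dict(conn.execute("SELECT name, balance FROM account"))


print(transfer(conn, "A", "B", 50), balances(conn))  # succeeds
print(transfer(conn, "A", "B", 80), balances(conn))  # violates the CHECK, rolled back
```

The second transfer would drive account A below zero, so the whole transaction is undone and both balances stay as they were after the first transfer.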
Simplified database system environment
Types of Databases
1. Relational Database
• A relational database management system (RDBMS) is a system where data is
organized in two-dimensional tables made up of rows and columns.
• The relational model is one of the most popular data models in industry. Most
relational systems use SQL as their query language.
• Each table normally has a key field (primary key) that uniquely identifies each record.
• This type of system is the most widely used DBMS.
• Relational database management system software is available for personal
computers, workstations, and large mainframe systems.
• For example − Oracle Database, MySQL, Microsoft SQL Server etc.
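As a minimal sketch of how a key field identifies each record, the example below uses Python's built-in sqlite3 module. The customer table and its contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# customer_id is the key field: no two rows may share a value.
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute("INSERT INTO customer VALUES (1, 'Asha')")
conn.execute("INSERT INTO customer VALUES (2, 'Ravi')")

# A second row with an existing key is rejected by the DBMS.
try:
    conn.execute("INSERT INTO customer VALUES (1, 'Duplicate')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The key value is enough to retrieve exactly one record.
row = conn.execute(
    "SELECT name FROM customer WHERE customer_id = ?", (2,)
).fetchone()
print(duplicate_rejected, row)
```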
2. Object Oriented Database
• It is a system where information or data is represented in the form of
objects, as used in object-oriented programming.
• It is a combination of relational database concepts and object-oriented
principles.
• Relational database concepts include concurrency control, transactions, etc.
• Object-oriented principles include data encapsulation, inheritance, and polymorphism.
• It can require less application code and be easier to maintain when the application is itself object-oriented.
• For example − ObjectDB.
3. Hierarchical Database
• It is a system where the data elements have a one-to-many
relationship (1:N). Here data is organized like a tree, similar to
a folder structure on your computer.
• The hierarchy starts from the root node, and each child
node is connected to exactly one parent node.
• It is used in industry on mainframe platforms.
• For example − IMS (IBM), Windows Registry (Microsoft).
4. Network database
• A network database management system is a system where the data elements
maintain one-to-one (1:1) or many-to-many (M:N) relationships.
• It also has a hierarchical structure, but the data is organized like a graph,
and a child record is allowed to have more than one parent.
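The difference between the hierarchical and network models can be sketched with plain Python dictionaries. The folder names, course names, and helper functions are made up for the illustration and are not taken from any particular product.

```python
# Hierarchical model: every record has at most one parent, so a plain
# child -> parent mapping is enough to describe the tree.
hier_parent = {
    "Documents": "root",
    "Pictures": "root",
    "report.txt": "Documents",
}


def path_to_root(node, parent):
    """Follow the single parent link from a node up to the root."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


# Network model: a record may have several parents, so each child maps
# to a list of parents and the structure becomes a graph.
net_parents = {
    "Database Systems": ["Computer Science", "Information Systems"],
    "Algorithms": ["Computer Science"],
}


def children_of(parent_name, parents):
    """Return all child records linked to the given parent, sorted by name."""
    return sorted(c for c, ps in parents.items() if parent_name in ps)


print(path_to_root("report.txt", hier_parent))
print(children_of("Computer Science", net_parents))
print(net_parents["Database Systems"])  # one child, two parents
```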
5. NoSQL databases
• NoSQL is a broad category that includes any database that doesn’t use SQL as its
primary data access language.
• These types of databases are also sometimes referred to as non-relational databases.
• Unlike in relational databases, data in a NoSQL database doesn’t have to conform to a
pre-defined schema, so these types of databases are great for organizations seeking to
store unstructured or semi-structured data.
• One advantage of NoSQL databases is that developers can often make changes to the
database on the fly, with little or no disruption to applications that are using the database.
• Examples: Apache Cassandra, MongoDB, CouchDB, and Couchbase
6. Cloud databases
• A cloud database refers to any database that’s designed to run in the cloud. Like other cloud-
based applications, cloud databases offer flexibility and scalability, along with high availability.
Cloud databases are also often low-maintenance, since many are offered via a SaaS model.
• Examples: Microsoft Azure SQL Database, Amazon Relational Database Service, Oracle
Autonomous Database.
7. Columnar databases
• Columnar databases, also referred to as column data stores, store data in columns rather than rows. These types of
databases are often used in data warehouses because they handle analytical
queries well. A query against a columnar database reads only the columns it
needs and can skip the rest of the data.
• Examples: Google BigQuery, Cassandra, HBase, MariaDB, Azure SQL Data Warehouse
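To see why a column layout suits analytical queries, the sketch below stores the same made-up orders both row by row and column by column in plain Python. It is a toy model of the idea, not how any of the listed products is implemented.

```python
# Row-oriented layout: one record per order.
rows = [
    {"order_id": 1, "region": "north", "amount": 120.0},
    {"order_id": 2, "region": "south", "amount": 80.0},
    {"order_id": 3, "region": "north", "amount": 50.0},
]

# Column-oriented layout: one list per column, built from the same data.
columns = {key: [r[key] for r in rows] for key in rows[0]}

# Analytical query: total order amount.
# The column layout touches only the "amount" list ...
total_from_columns = sum(columns["amount"])
# ... while the row layout has to visit every whole record.
total_from_rows = sum(r["amount"] for r in rows)

print(total_from_columns, total_from_rows)
```

Both layouts give the same answer; the difference is how much data has to be read to compute it.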
8. Document databases
• Document databases, also known as document stores, use JSON-like documents to model data instead of
rows and columns. Sometimes referred to as document-oriented databases, document databases are
designed to store and manage document-oriented information, also referred to as semi-structured data.
Document databases are simple and scalable, making them useful for mobile apps that need fast iterations.
• Examples: MongoDB, Amazon DocumentDB, Apache CouchDB
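A minimal sketch of the document model, using only Python's json module: each document is a JSON-like object, and documents in the same collection do not have to share the same fields. The sample documents and the find helper are hypothetical and do not reproduce the API of any of the products listed.

```python
import json

# Three documents in one collection, each with a different shape.
documents = [
    {"_id": 1, "name": "Asha", "email": "asha@example.com"},
    {"_id": 2, "name": "Ravi", "phones": ["555-0100", "555-0101"]},
    {"_id": 3, "name": "Mei", "address": {"city": "Pune"}},
]


def find(docs, **criteria):
    """Return the documents whose top-level fields match all criteria."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


# Documents survive a round trip through JSON text unchanged.
restored = json.loads(json.dumps(documents))

# A field that only some documents have can still be queried.
with_phones = [d["name"] for d in restored if "phones" in d]

print(find(restored, name="Mei"))
print(with_phones)
```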
9. Graph databases
• Graph databases are a type of NoSQL database based on graph theory. Graph-oriented database
management systems are designed to identify and work with the connections between data
points. Graph databases are therefore often used to analyze the relationships between heterogeneous data
points, such as in fraud prevention or for mining data about customers from social media.
• Examples: DataStax Enterprise Graph, Neo4j
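The kind of connection analysis described above can be sketched in plain Python with an adjacency map and a breadth-first search. The people, cards, and phone numbers are invented for the example; a real graph database would provide its own query language for this.

```python
from collections import deque

# Edges link heterogeneous data points: people to the cards and phones they use.
edges = [
    ("alice", "card_1"),
    ("bob", "card_1"),
    ("bob", "phone_7"),
    ("carol", "phone_7"),
    ("dave", "card_9"),
]

# Build an undirected adjacency map.
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def connected(start, graph):
    """Return every node reachable from start (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


ring = connected("alice", graph)
print(sorted(ring))
```

Alice and Carol never share a card or a phone directly, yet the traversal links them through Bob, which is the sort of indirect relationship a fraud check looks for. Dave stays outside the group.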
4. Manages Information
• A database manages the information that its users need for their work and keeps it available
whenever it is required.
Database users can be divided into two groups:
▻Those who actually use and control the database content, and those who
design, develop and maintain database applications (called “Actors on the
Scene”), and
▻Those who design and develop the DBMS software and related tools, and the
computer systems operators (called “Workers Behind the Scene”).
Actors on the scene
▻Database administrators:
▻Responsible for authorizing access to the database, coordinating and
monitoring its use, acquiring software and hardware resources, controlling
its use, and monitoring the efficiency of operations.
▻Database designers:
▻Responsible for defining the content, the structure, the constraints, and the
functions or transactions against the database. They must communicate
with the end-users and understand their needs.
▻End-users: They use the data for queries and reports, and some of them update the database
content. End-users can be categorized into:
Sophisticated:
These include business analysts, scientists, engineers, and others thoroughly familiar with the system capabilities.
Many use tools in the form of software packages that work closely with the stored database.
Stand-alone:
Mostly maintain personal databases using ready-to-use packaged applications.
One example is the user of a tax program that creates its own internal database.
Another example is a user who maintains an address book.
Workers Behind the scene
▰ DBMS system designers and implementers:
▻Design and implement the DBMS modules and interfaces including modules for
implementing the catalog, query language processing, interface processing, accessing and
buffering data, controlling concurrency, and handling data recovery and security.
▰ Tool developers
▻ Design and implement tools which are optional packages for database design, performance
monitoring, natural language or graphical interfaces, prototyping, simulation, and test data
generation
▰ Operators and maintenance personnel (system administration personnel) are responsible for the
actual running and maintenance of the hardware and software environment for the database
system.
Schemas, Instances and Database State
Database Schema (meta-data): The design of a database is called the
schema. It includes descriptions of the database structure and the
constraints that should hold on the database. The database schema
changes very infrequently.
Database Instance: The actual data stored in a database at a particular
moment in time. Also called the database state (or occurrence, or snapshot).
The database state changes every time the database is updated.
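The distinction between schema and state can be observed directly in a small sketch with Python's built-in sqlite3 module. The student table is hypothetical; sqlite_master is the catalog table in which SQLite keeps the schema definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")


def schema(conn):
    """Return the stored table definitions (the meta-data)."""
    query = "SELECT sql FROM sqlite_master WHERE type = 'table'"
    return [r[0] for r in conn.execute(query)]


def state(conn):
    """Return the data currently stored in the student table."""
    return conn.execute("SELECT id, name FROM student ORDER BY id").fetchall()


schema_before = schema(conn)
state_before = state(conn)  # the empty state, right after the schema is defined

conn.execute("INSERT INTO student VALUES (1, 'Asha')")

schema_after = schema(conn)
state_after = state(conn)

print(schema_before == schema_after)  # the update did not touch the schema
print(state_before, state_after)      # but it produced a new database state
```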
Data Independence: The capacity to change the schema at one level without having to change the schema
at the next higher level.
Types:
Logical Data Independence: The capacity to change the conceptual schema without
having to change the external schemas and their application programs.
Physical Data Independence: The capacity to change the internal schema without
having to change the conceptual schema.
When a schema at a lower level is changed, only the mappings between that schema and the higher-level
schemas need to change; the higher-level schemas themselves stay the same.
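Both kinds of data independence can be approximated in a small sqlite3 sketch. Here a view stands in for an external schema, an index stands in for a change to the internal schema, and an added column stands in for a change to the conceptual schema. The table, view, and index names are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Asha", "CS"), (2, "Ravi", "EE")],
)

# External schema: the only thing the application program queries.
conn.execute("CREATE VIEW student_names AS SELECT id, name FROM student")


def external_view(conn):
    """The application's query, written against the view only."""
    return conn.execute("SELECT id, name FROM student_names ORDER BY id").fetchall()


before = external_view(conn)

# Physical change: a new access path; the table definition is untouched.
conn.execute("CREATE INDEX idx_student_dept ON student (dept)")
after_index = external_view(conn)

# Logical change: the conceptual schema gains a column; the view is untouched.
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")
after_column = external_view(conn)

print(before == after_index == after_column)
```

The application query and its result are the same before and after both changes, which is what the two definitions above require.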
Three-Schema Architecture: Advantages
• Database abstraction
• Easier for a user to use.
• Allows each user to access a customized view of the data.
• Enables a database administrator to change the storage structure without
affecting the user’s view
3-tier Client Server DBMS Architecture
• The 3-tier architecture consists of three layers, as follows −
• Presentation layer − This layer is also called the client layer. This front-
end layer consists of a user interface. Its main purpose is to
communicate with the application layer.
• Application layer − This layer is also called the business logic layer. It
acts as a middle layer between the client and the database server, which
use it to exchange partially processed data.
• Database layer − In this layer the data or information is stored. This
layer performs operations such as insert, update, and delete on the
stored data.
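The separation of the three layers can be sketched in a single Python program. In a real deployment the layers usually run on different machines and talk over a network; here they are ordinary classes and functions in one process, and all class, table, and function names are hypothetical.

```python
import sqlite3


class DatabaseLayer:
    """Database layer: stores the data and performs inserts and queries."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE grade (student TEXT, course TEXT, score INTEGER)"
        )

    def insert_grade(self, student, course, score):
        with self.conn:  # commit the insert as one transaction
            self.conn.execute(
                "INSERT INTO grade VALUES (?, ?, ?)", (student, course, score)
            )

    def scores_for(self, student):
        query = "SELECT score FROM grade WHERE student = ? ORDER BY course"
        return [r[0] for r in self.conn.execute(query, (student,))]


class ApplicationLayer:
    """Application layer: business rules between the client and the database."""

    def __init__(self, db):
        self.db = db

    def record_grade(self, student, course, score):
        if not 0 <= score <= 100:
            raise ValueError("score must be between 0 and 100")
        self.db.insert_grade(student, course, score)

    def average(self, student):
        scores = self.db.scores_for(student)
        return sum(scores) / len(scores) if scores else None


def render_average(app, student):
    """Presentation layer: formats results for the user, with no SQL in it."""
    avg = app.average(student)
    if avg is None:
        return f"{student}: no grades"
    return f"{student}: average {avg:.1f}"


app = ApplicationLayer(DatabaseLayer())
app.record_grade("Asha", "DB", 90)
app.record_grade("Asha", "OS", 80)
print(render_average(app, "Asha"))
```

The presentation function never touches SQL and the database class contains no business rules, so either side could be replaced without changing the other.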