Module1 ADBMS
Parallel Database
Distributed Database
Architecture for Parallel Databases, Types of Distributed Databases,
Distributed DBMS Architecture, Storing Data in a Distributed DBMS.
• A parallel database system is one that seeks to improve
performance through parallel implementation of various operations
such as loading data, building indexes, and evaluating queries.
• Parallel processing divides a large task into many smaller tasks, and
executes the smaller tasks concurrently on several nodes. As a result,
the larger task completes more quickly.
• Some tasks can be effectively divided, and thus are good candidates
for parallel processing.
• For example, in a bank with only one teller, all customers must form a single
queue to be served. With two tellers, the task can be effectively split so that
customers form two queues and are served twice as fast-or they can form a
single queue to provide fairness. This is an instance in which parallel
processing is an effective solution.
• By contrast, if the bank manager must approve all loan requests, parallel
processing will not necessarily speed up the flow of loans. No matter how
many tellers are available to process loans, all the requests must form a single
queue for bank manager approval. No amount of parallel processing can
overcome this built-in bottleneck to the system.
• Problems of Parallel Processing
• Effective implementation of parallel processing involves two challenges:
• structuring tasks so that certain tasks can execute at the same time (in parallel)
• preserving the sequencing of tasks which must be executed serially
What Is a Parallel Database?
• A variety of hardware architectures allow multiple computers to share
access to data, software, or peripheral devices.
• Higher Performance
• With more CPUs available to an application, higher speedup and scaleup
can be attained.
• Higher Availability
• Nodes are isolated from each other, so a failure at one node does not bring
the whole system down.
• The remaining nodes can recover the failed node and continue to provide
data access to users.
• This means that data is much more available than it would be with a single
node upon node failure, and amounts to significantly higher availability of
the database.
• Greater Flexibility
• An Oracle Parallel Server environment is extremely flexible. Instances
can be allocated or deallocated as necessary. When there is high demand
for the database, more instances can be temporarily allocated. The
instances can be deallocated and used for other purposes once they are
no longer necessary.
• More Users
• Parallel database technology can make it possible to overcome memory
limits, enabling a single system to serve thousands of users.
Architecture for parallel database
• Three main architectures have been proposed for building parallel
DBMSs.
• In a shared-memory system, multiple CPUs are attached to an
interconnection network and can access a common region of main
memory.
• In a shared-disk system, each CPU has a private memory and direct
access to all disks through an interconnection network.
• In a shared-nothing system, each CPU has local main memory and
disk space, but no two CPUs can access the same storage area; all
communication between CPUs is through an interconnection network.
• Shared memory: communication between processors is extremely efficient.
Data in shared memory can be accessed by any processor without being
moved by software.
• Shared disk: since each processor has its own memory, the memory bus is
not a bottleneck. It offers a cheap way to provide a degree of fault tolerance: if
a processor fails, another processor can take over its tasks, since the database is
resident on disk.
• Shared nothing: a processor at one node communicates with a
processor at another node through a high-speed interconnection network.
• The problem with shared-memory and shared-disk architectures is
interference: as more CPUs are added, existing CPUs are
slowed down because of the increased contention for memory
accesses and network bandwidth.
Speedup
• Speedup is the extent to which more hardware can perform the
same task in less time than the original system.
• With added hardware, speedup holds the task constant and
measures time savings.
• For example, with two systems working in parallel, each performs half of the
original task in half the time required to perform it on a single
system.
• With good speedup, additional processors reduce system response
time.
You can measure speedup using this formula:
Speedup = Time_Original / Time_Parallel
where
• Time_Original is the elapsed time spent by the original, smaller system on the given task
• Time_Parallel is the elapsed time spent by a larger, parallel system on the given task
Scaleup
• Scaleup is the factor m that expresses how much more work can be
done in the same time period by a system n times larger.
• With added hardware, a formula for scaleup holds the time constant,
and measures the increased size of the job which can be done.
With good scaleup, if transaction volumes grow, you can keep response time
constant by adding hardware resources such as CPUs.
You can measure scaleup using this formula:
Scaleup = Volume_Parallel / Volume_Original
where Volume_Original is the transaction volume processed in a given amount of time
on the original system, and Volume_Parallel is the volume processed in the same time on the parallel system.
For example, if the original system can process 100 transactions in a given amount of time,
and the parallel system can process 200 transactions in this amount of time, then the value of scaleup
is 200/100 = 2. A value of 2 indicates the ideal of linear scaleup: twice as much
hardware processes twice the data volume in the same amount of time.
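The two measures can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, scaleup is taken as the ratio of work volumes (which is what the 200/100 example implies), and the 60 s and 30 s timings are hypothetical.

```python
def speedup(time_original: float, time_parallel: float) -> float:
    """Same task, more hardware: ratio of elapsed times."""
    return time_original / time_parallel


def scaleup(volume_original: float, volume_parallel: float) -> float:
    """Same time, more hardware: ratio of work volumes."""
    return volume_parallel / volume_original


# Example from the slides: 100 transactions originally, 200 on the parallel system.
print(scaleup(100, 200))  # 2.0

# Hypothetical timing: a task that took 60 s takes 30 s on twice the hardware.
print(speedup(60, 30))  # 2.0
```

A value equal to the hardware growth factor (here 2) corresponds to the linear case; real systems usually fall somewhat short of it.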
• The shared-nothing architecture has been shown to provide:
• a) Linear speedup: the time taken to execute operations
decreases in proportion to the increase in the number of CPUs and
disks.
• b) Linear scaleup: performance is sustained if the number
of CPUs and disks is increased in proportion to the amount of
data.
PARALLEL QUERY EVALUATION
• There are two techniques for parallel query evaluation:
⮚pipelined parallelism
⮚data partitioning
• A relational query execution plan is a graph of relational algebra operators and the
operators in a graph can be executed in parallel.
• If an operator consumes the output of a second operator, we have pipelined
parallelism. (A pipeline is a set of data processing elements connected in series, so that
the output of one element is the input of the next one.)
• If not, the two operators can proceed essentially independently.
• An operator is said to block if it produces no output until it has consumed all its
inputs. Pipelined parallelism is limited by the presence of operators (e.g., sorting or
aggregation) that block.
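The difference between a pipelined operator and a blocking operator can be sketched with Python generators. This is a single-threaded illustration of the dataflow only, not a parallel executor; the operator names and the three-row input are assumptions made for the example.

```python
from typing import Iterable, Iterator, List

log: List[str] = []  # records the order in which operators do their work


def scan(rows: Iterable[int]) -> Iterator[int]:
    for r in rows:
        log.append(f"scan {r}")
        yield r


def select_even(rows: Iterable[int]) -> Iterator[int]:
    # Non-blocking: each qualifying row is emitted as soon as it arrives.
    for r in rows:
        if r % 2 == 0:
            log.append(f"select emits {r}")
            yield r


def sort_rows(rows: Iterable[int]) -> Iterator[int]:
    # Blocking: sorted() consumes the whole input before the first output.
    for r in sorted(rows):
        log.append(f"sort emits {r}")
        yield r


pipelined = list(select_even(scan([4, 1, 2])))
pipelined_log = list(log)

log.clear()
blocked = list(sort_rows(scan([4, 1, 2])))
blocked_log = list(log)

print(pipelined_log)
# ['scan 4', 'select emits 4', 'scan 1', 'scan 2', 'select emits 2']
print(blocked_log)
# ['scan 4', 'scan 1', 'scan 2', 'sort emits 1', 'sort emits 2', 'sort emits 4']
```

In the first log the selection produces output while the scan is still running, so a downstream operator could start early; in the second log nothing is emitted until the scan has finished.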
• The key to evaluating an operator in parallel is to partition the input data; we
can then work on each partition in parallel and combine the results.
• This approach is called data-partitioned parallel evaluation.
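A minimal sketch of data-partitioned evaluation, assuming a simple SUM aggregate: the input is split into partitions, each partition is summed by its own worker, and the partial results are combined. The worker threads stand in for nodes; in CPython they illustrate the structure rather than deliver real CPU parallelism. All names here are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def local_sum(part: List[int]) -> int:
    """Work done independently on one partition."""
    return sum(part)


def parallel_sum(rows: List[int], n_workers: int) -> int:
    # Partition the input: worker k gets rows k, k + n, k + 2n, ...
    parts = [rows[k::n_workers] for k in range(n_workers)]
    # Evaluate each partition in its own worker.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(local_sum, parts))
    # Combine the partial results.
    return sum(partials)


total = parallel_sum(list(range(1, 101)), 4)
print(total)  # 5050
```

The same partition, evaluate, combine pattern applies to other operators, although the combine step differs (for example, a merge for sorting).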
• Basics of Partitioning
• Partitioning allows a table, index, or index-organized table to be subdivided into
smaller pieces, where each piece of such a database object is called a partition.
Each partition has its own name, and may optionally have its own storage
characteristics.
Data Partitioning
⮚Round-robin partitioning: if there are n processors, the ith tuple is assigned
to processor i mod n.
⮚Hash partitioning: a hash function is applied to (selected fields of) a tuple to
determine its processor. Hash partitioning has the additional virtue that it
keeps data evenly distributed even if the data grows and shrinks over time.
⮚Range partitioning: tuples are sorted (conceptually), and n ranges are chosen
for the sort key values so that each range contains roughly the same number of
tuples; tuples in range i are assigned to processor i.
⮚Range partitioning can lead to data skew, that is, widely
varying numbers of tuples across partitions or disks. Skew causes
processors dealing with large partitions to become performance bottlenecks.
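The three schemes can be sketched as follows. This is a toy illustration: tuples are (key, payload) pairs, CRC32 stands in for the hash function, and the range boundaries are chosen by hand; none of these choices come from a particular DBMS.

```python
import zlib
from bisect import bisect_right
from typing import List, Tuple

Row = Tuple[int, str]  # (key, payload)


def round_robin(rows: List[Row], n: int) -> List[List[Row]]:
    parts: List[List[Row]] = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)  # ith tuple goes to processor i mod n
    return parts


def hash_partition(rows: List[Row], n: int) -> List[List[Row]]:
    parts: List[List[Row]] = [[] for _ in range(n)]
    for row in rows:
        h = zlib.crc32(str(row[0]).encode("utf-8"))  # deterministic hash of the key
        parts[h % n].append(row)
    return parts


def range_partition(rows: List[Row], boundaries: List[int]) -> List[List[Row]]:
    # boundaries [10, 20] give ranges: key < 10, 10 <= key < 20, key >= 20
    parts: List[List[Row]] = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        parts[bisect_right(boundaries, row[0])].append(row)
    return parts


rows = [(k, f"t{k}") for k in (1, 2, 3, 4, 5, 50)]

print([len(p) for p in round_robin(rows, 3)])             # [2, 2, 2]
print([len(p) for p in range_partition(rows, [10, 20])])  # [5, 0, 1]
```

The second printout shows skew: with poorly chosen boundaries, one processor receives five of the six tuples while another receives none.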
• Composite Partitioning
• Composite partitioning is a combination of the basic data distribution methods;
a table is partitioned by one data distribution method and then each partition is
further subdivided into subpartitions using a second data distribution method.
All subpartitions for a given partition together represent a logical subset of the
data.
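As a sketch of composite partitioning, assume rows of (order_year, customer_id), range partitioning on the year, and hash subpartitioning on the customer id (a plain modulo stands in for the hash function). The column names and boundaries are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

Row = Tuple[int, int]  # (order_year, customer_id), hypothetical columns


def composite_partition(
    rows: List[Row], year_bounds: List[int], n_sub: int
) -> Dict[Tuple[int, int], List[Row]]:
    layout: Dict[Tuple[int, int], List[Row]] = {}
    for row in rows:
        year, customer_id = row
        partition = bisect_right(year_bounds, year)  # first method: range on year
        subpartition = customer_id % n_sub           # second method: hash on customer
        layout.setdefault((partition, subpartition), []).append(row)
    return layout


orders = [(2019, 7), (2019, 8), (2021, 7), (2023, 10)]
layout = composite_partition(orders, year_bounds=[2020, 2022], n_sub=2)

for key in sorted(layout):
    print(key, layout[key])
# (0, 0) [(2019, 8)]
# (0, 1) [(2019, 7)]
# (1, 1) [(2021, 7)]
# (2, 0) [(2023, 10)]
```

All subpartitions whose first index is 0 together hold exactly the orders before 2020, which is the logical subset that range partition represents.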
Distributed Database System
Introduction
• Data in a distributed database system is stored across several
sites, and each site is typically managed by a DBMS that can
run independent of the other sites.
• The classical view of a distributed database system is that
the system should make the impact of data distribution
transparent.
What are distributed databases?
In a typical distributed database system, a
communication channel connects the different locations, and
every system has its own memory and database.
Features of DDBMS
• Sharing data:- Users at one site may be able to access the data
residing at other sites.
• Autonomy:- Each site is able to retain a degree of control over data
that are stored locally.
• Availability:- If one site fails in a distributed system, the remaining
sites may be able to continue operating.
Goals of Distributed Database system.
• The concept of distributed database was built with a goal to
improve:
Properties of DDBMS
• Distributed data independence: Users should be able
to ask queries without specifying where the referenced
relations, or copies or fragments of the relations, are
located
• Distributed transaction atomicity: Users should be
able to write transactions that access and update data
at several sites just as they would write transactions
over purely local data. In particular, the effects of a
transaction across sites should continue to be atomic;
that is, all changes persist if the transaction commits,
and none persist if it aborts.
Types of distributed databases.
• The two types of distributed database systems are as follows:
1. Homogeneous distributed database system:
• A homogeneous distributed database system is a network of two or more
databases (with the same type of DBMS software), which can be stored on one
or more machines.
• In this system, data can be accessed and modified simultaneously on
several databases in the network. Homogeneous distributed systems are easy
to handle.
• Example: Consider three departments that all use Oracle-9i as their
DBMS. If changes are made in one department, they are propagated
to the other departments as well.
• The data is distributed, but all servers run the same DBMS software.
2. Heterogeneous distributed database system:
• A heterogeneous distributed database system is a network of two or
more databases with different types of DBMS software, which can be
stored on one or more machines.
• In this system, data can be made accessible to several databases in the
network with the help of generic connectivity (ODBC and JDBC).
• Example: databases running different DBMS software are made
accessible to each other using ODBC and JDBC.
Architecture of DDBMS
• The three DDBMS architectures are:
★Client-server
★Collaborating server
★Middleware
Client-Server Architecture
• A client-server system has one or more client
processes and one or more server processes.
• A client process can send a query to any one server
process.
• The earliest available server solves it and replies.
• Clients are responsible for user-interface issues, and
servers manage data and execute transactions.
• A client-server architecture is simple to implement
and execute because the server is centralized.
Advantages of Client Server Architecture
• It is relatively simple to implement due to its clean
separation of functionality and because the server is
centralized.
• Users can run a graphical user interface that they are
familiar with, rather than the (possibly unfamiliar and
unfriendly) user interface on the server.
Disadvantages of Client Server Architecture
• It does not allow a single query to span multiple
servers because the client process would have to be
capable of breaking such a query into appropriate
subqueries to be executed at different sites and then
piecing together the answers to the subqueries.
• The client process would therefore be quite complex,
and its capabilities would begin to overlap with the
server; distinguishing between clients and servers
becomes harder.
Collaborating Server
🞂 Collaborating server architecture is designed
to run a single query on multiple servers.
🞂 The servers break a single query into multiple smaller
queries, and the combined result is sent to the client.
🞂 Collaborating server architecture has a
collection of database servers. Each server is
capable of executing the current transactions
across the databases.
🞂 Decomposition of the query should be done
taking into account the cost of network
communication as well as local processing
cost.
Middleware
🞂 The middleware architecture is designed to allow a single
query to span multiple servers, without requiring all
database servers to be capable of managing such multi-site
execution strategies.
🞂 One database server capable of managing queries and
transactions spanning multiple servers is needed; the
remaining servers need to handle only local queries and
transactions.
🞂 This special server is a layer of software that coordinates the
execution of queries and transactions across one or more
independent database servers.
🞂 Such software is known as middleware.
🞂 The middleware layer is capable of executing joins and
other relational operations on data obtained from the other
servers, but does not itself maintain any data.
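A toy sketch of the middleware idea, under stated assumptions: two local servers answer only queries over their own rows, and a middleware object fetches rows from both and performs the join itself without storing anything. The class names, method names, and sample rows are all hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


class LocalServer:
    """A database server that only answers queries over its own rows."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = rows

    def select(self, predicate: Callable[[Row], bool]) -> List[Row]:
        return [dict(r) for r in self._rows if predicate(r)]


class Middleware:
    """Coordinates a multi-site query; keeps no data of its own."""

    def __init__(self, emp_site: LocalServer, dept_site: LocalServer) -> None:
        self.emp_site = emp_site
        self.dept_site = dept_site

    def employees_with_department(self) -> List[Row]:
        # Local subqueries are sent to each site.
        emps = self.emp_site.select(lambda r: True)
        depts = self.dept_site.select(lambda r: True)
        # The join runs in the middleware layer on the fetched rows.
        dept_name = {d["dept_id"]: d["dname"] for d in depts}
        return [
            {"ename": e["ename"], "dname": dept_name[e["dept_id"]]}
            for e in emps
            if e["dept_id"] in dept_name
        ]


emp_site = LocalServer([
    {"ename": "Asha", "dept_id": 10},
    {"ename": "Ravi", "dept_id": 20},
    {"ename": "Meera", "dept_id": 30},
])
dept_site = LocalServer([
    {"dept_id": 10, "dname": "Sales"},
    {"dept_id": 20, "dname": "Research"},
])

result = Middleware(emp_site, dept_site).employees_with_department()
print(result)
# [{'ename': 'Asha', 'dname': 'Sales'}, {'ename': 'Ravi', 'dname': 'Research'}]
```

Neither local server needs to know that the other exists; only the middleware understands the multi-site plan.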
Storing Data in a Distributed DBMS
• Consider a relation r that is to be stored in the
database. There are two approaches to store this
relation in the distributed database:
• Fragmentation
• Replication
• Fragmentation and replication can be combined: a
relation can be partitioned into several fragments and
there may be several replicas of each fragment.
What is fragmentation?
• The process of dividing the database into multiple smaller parts is
called fragmentation.
• These fragments may be stored at different locations.
• The data fragmentation process should be carried out in such a way
that the original database can be reconstructed from the fragments.
Types of data Fragmentation
• There are three types of data fragmentation:
• Horizontal data fragmentation
• Vertical Fragmentation
• Hybrid Fragmentation
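Horizontal and vertical fragmentation, and the reconstruction requirement, can be sketched on a small in-memory relation. The employee rows, column names, and helper functions below are assumptions made for illustration; horizontal fragments are rebuilt by union and vertical fragments by a join on the key. Hybrid fragmentation would apply one method to the fragments produced by the other.

```python
from typing import Dict, List

Row = Dict[str, object]

employees: List[Row] = [
    {"eid": 1, "name": "Asha", "city": "Pune", "salary": 50000},
    {"eid": 2, "name": "Ravi", "city": "Delhi", "salary": 62000},
    {"eid": 3, "name": "Meera", "city": "Pune", "salary": 58000},
]


def horizontal_fragments(rows: List[Row], column: str) -> Dict[object, List[Row]]:
    """Split by rows: one fragment per distinct value of `column`."""
    frags: Dict[object, List[Row]] = {}
    for row in rows:
        frags.setdefault(row[column], []).append(row)
    return frags


def vertical_fragment(rows: List[Row], key: str, columns: List[str]) -> List[Row]:
    """Split by columns: every fragment keeps the key so it can be rejoined."""
    return [{c: row[c] for c in [key, *columns]} for row in rows]


def rebuild_horizontal(frags: Dict[object, List[Row]], key: str) -> List[Row]:
    rows = [row for frag in frags.values() for row in frag]  # union of fragments
    return sorted(rows, key=lambda r: r[key])


def rebuild_vertical(left: List[Row], right: List[Row], key: str) -> List[Row]:
    right_by_key = {r[key]: r for r in right}
    return [{**l, **right_by_key[l[key]]} for l in left]     # join on the key


by_city = horizontal_fragments(employees, "city")
public = vertical_fragment(employees, "eid", ["name", "city"])
payroll = vertical_fragment(employees, "eid", ["salary"])

print(rebuild_horizontal(by_city, "eid") == employees)         # True
print(rebuild_vertical(public, payroll, "eid") == employees)   # True
```

Keeping the key in every vertical fragment is what makes the join-based reconstruction possible.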
[Figures: Fragmentation, showing a horizontal fragment and a vertical fragment]
Replication
Motivation for Replication
• Relational database systems support a small, fixed
collection of data types (e.g., integers, dates, strings),
which has proven adequate for traditional application
domains such as administrative data processing.
• In many application domains, however, much more
complex kinds of data must be handled.
• Typically this complex data has been stored in OS
file systems or specialized data structures, rather than
in a DBMS. Examples of domains with complex data
include computer-aided design and modeling
(CAD/CAM), multimedia repositories, and document
management.
• As the amount of data grows, the many features offered by
a DBMS (for example, reduced application development
time, concurrency control and recovery, indexing support,
and query capabilities) become increasingly attractive and,
ultimately, necessary.
• In order to support such applications, a DBMS must
support complex data types.
• Object-oriented concepts have strongly influenced efforts
to enhance database support for complex data and have led
to the development of object-database systems.
• Object-database systems have developed along two distinct
paths:
• RDBMS
– Relational Database Management Systems
• OODBMS
– Object-Oriented Database Management Systems
• ORDBMS
– Object-Relational Database Management Systems
• ODBMS
– Object-Database Management Systems
Object-database vendors
• IBM (db2)
• Oracle
• Informix
• Sybase
• Microsoft SQL
Manipulating the New Kinds of Data
• Our first challenge comes from the Clog breakfast
cereal company. Clog produces a cereal called
Delirious, and it wants to lease an image of Herbert
the Worm in front of a sunrise, to incorporate in the
Delirious box design.
• The thumbnail method in the Select clause
produces a small version of its full-size input
image.
• The is_sunrise method is a Boolean function that
analyzes an image and returns true if the image
contains a sunrise.
• The is_Herbert method returns true if the image
contains a picture of Herbert. The query produces the
frame code number, image thumbnail, and price for
all frames that contain Herbert and a sunrise.
SQL2 Extended SQL to find pictures of
Herbert at sunrise
• The second challenge comes from Dinky's
executives.
• They know that Delirious is exceedingly popular in the
tiny country of Andorra, so they want to make sure
that a number of Herbert films are playing at
theaters near Andorra when the cereal hits the
shelves.
• To check on the current state of
affairs, the executives want to find the names of all
theaters showing Herbert films within 100
kilometers of Andorra.
SELECT N.theatername, N.theateraddress, F.title
FROM Nowshowing N, Films F, Countries C
WHERE N.film = F.filmno
AND overlaps(C.boundry, radius(N.theateraddress, 100))
AND C.name = 'Andorra'
AND 'Herbert the Worm' = F.stars[1]
• The theater attribute of the Nowshowing table is a reference to
an object in another table, which has attributes name,
address, and location.
• This object referencing allows for the notation N.theatername
and N.theateraddress, each of which refers to attributes of the
theater_t object referenced in the Nowshowing row N.
• The stars attribute of the films table is a set of names of each
film's stars.
• The radius method returns a circle centered at its first argument
with radius equal to its second argument.
• The overlaps method tests for spatial overlap. Thus,
Nowshowing and Films are joined by the equijoin clause, while
Nowshowing and Countries are joined by the spatial overlap
clause.
• This query illustrates some unusual features:
• User-defined methods: user-defined abstract types
are manipulated via their methods.
• Operators for structured types: along with the
structured types available in the data model,
ORDBMSs provide the natural methods for those
types. For example, array types support the
standard array operation of accessing an array
element by specifying the index; F.stars[1] returns the
first element of the array in the stars column of film F.
• Operators for reference types: reference types are
dereferenced via an arrow (→) notation.
• User-defined abstract data types, structured
types, and reference types are collectively referred
to as complex types.
STRUCTURED DATA TYPES
• Atomic types and user-defined types can be combined
to describe more complex structures using type
constructors.
• Types defined using type constructors are called
structured types.
• ROW(n1 t1, ..., nn tn): a type representing a row, or
tuple, of n fields named n1, ..., nn with types
t1, ..., tn respectively.
• base ARRAY[i]: a type representing an array of (up
to) i base-type items.
• The example also uses the new ROW data
type.
• The ROW type has a special role because every table is a
collection of rows.
• The stars field of table Films illustrates the new ARRAY
type.
• It is an array of up to 10 elements, each of which is of
type VARCHAR(25); 10 is the maximum number of
elements in the array.
• An array can contain fewer elements.
• Other common type constructors include:
• listof(base): A type representing a sequence of base-type
items.
• ARRAY(base): A type representing an array of base-type
items.
• setof(base): A type representing a set of base-type items.
Sets cannot contain duplicate elements.
• bagof(base): A type representing a bag or multiset of
base-type items.
Operations on row
• Given an item i whose type is ROW(n1 t1, ..., nn tn),
the field extraction method allows us to access an
individual field nk using the traditional dot notation
i.nk.
• If row constructors are nested in a type definition,
dots may be nested to access the fields of the nested
row; for example: i.nk.m1
• select c.c_add.street from customer17 c where
custid=2;
• This nested-dot notation is often called a path
expression because it describes a path through the
nested structure
Path Expressions
• Row types can be nested (Emp.spouse.name).
• Ref types and row types can be combined, giving
nested dots and arrows (Emp->Dept->Mgr.name).
• These are generally called path expressions
– They describe a “path” to the data
• Path-expression queries can often be
rewritten as joins.
select E->Dept->Mgr.name
from emp E;
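The remark that path-expression queries can often be rewritten as joins can be illustrated outside SQL. In this Python sketch, object references play the role of ref types, and the same answer is then computed from flat tables linked by keys. The sample employees, departments, and managers are invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Mgr:
    name: str


@dataclass
class Dept:
    dname: str
    mgr: Mgr  # reference to a manager object


@dataclass
class Emp:
    ename: str
    dept: Dept  # reference to a department object


research = Dept("Research", Mgr("Nina"))
sales = Dept("Sales", Mgr("Omar"))
emps = [Emp("Asha", research), Emp("Ravi", sales), Emp("Meera", research)]

# Path expression: follow the references, like E->Dept->Mgr.name.
via_path: List[str] = [e.dept.mgr.name for e in emps]

# Join rewrite: flat tables linked by keys instead of references.
emp_table: List[Tuple[str, int]] = [("Asha", 1), ("Ravi", 2), ("Meera", 1)]
dept_table: Dict[int, Tuple[str, int]] = {1: ("Research", 100), 2: ("Sales", 200)}
mgr_table: Dict[int, str] = {100: "Nina", 200: "Omar"}
via_join: List[str] = [mgr_table[dept_table[dept_id][1]] for _, dept_id in emp_table]

print(via_path)  # ['Nina', 'Omar', 'Nina']
print(via_join)  # ['Nina', 'Omar', 'Nina']
```

Each arrow in the path corresponds to one key lookup (one join) in the flat version.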
Operations on arrays
• Array types support an 'array index' method that allows
users to access the array item at a particular offset.
• A postfix 'square bracket' syntax is usually used.
• There is an operator (cardinality) that returns the
number of elements in the array.
• The variable number of elements also motivates an
operator to concatenate two arrays.
SELECT F.filmno, (F.stars || ['brando', 'pacino'])
FROM films F
WHERE cardinality(F.stars) < 3 AND F.stars[1] = 'redford';
• For each film with 'redford' as the first star and fewer
than three stars, the result contains the film's array of stars concatenated
with the array containing the two elements 'brando'
and 'pacino'.
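For readers more familiar with general-purpose languages, the three array operations (index, cardinality, concatenation) map onto Python lists roughly as shown here. The film rows are made up, and note that the SQL index F.stars[1] is 1-based while Python lists are 0-based.

```python
from typing import Dict, List, Tuple

films: List[Dict[str, object]] = [
    {"filmno": 1, "stars": ["redford", "newman"]},
    {"filmno": 2, "stars": ["redford", "hoffman", "streep"]},
    {"filmno": 3, "stars": ["pacino"]},
]

result: List[Tuple[object, List[str]]] = [
    (f["filmno"], f["stars"] + ["brando", "pacino"])  # concatenation, like ||
    for f in films
    if len(f["stars"]) < 3                            # cardinality(F.stars) < 3
    and f["stars"][:1] == ["redford"]                 # F.stars[1] = 'redford'
]

print(result)
# [(1, ['redford', 'newman', 'brando', 'pacino'])]
```

Film 2 is excluded because it already has three stars, and film 3 because its first star is not 'redford'.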
Defining method
• To register a new method for a user-defined data type,
the user must write the code for the method and then
inform the database about the method.
• The language in which the code is written depends on the languages supported
by the DBMS.
INHERITANCE
• In object-database systems, inheritance can be used in
two ways:
• For reusing and refining types and
• For creating hierarchies of collections of similar
but not identical objects.
• CREATE TYPE theater_t AS ROW( tno integer,
name text, address text, phone text)
• A table Theaters is created with type theater_t.
• Theater-café is just like Theaters but contains
additional attributes.
• Inheritance lets us express this kind of specialization.
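The specialization idea can be sketched with Python dataclasses. The Theater fields mirror theater_t from the slide; the TheaterCafe subtype and its menu attribute are hypothetical stand-ins for the "additional attributes", and contact_line is an invented helper.

```python
from dataclasses import dataclass


@dataclass
class Theater:
    tno: int
    name: str
    address: str
    phone: str


@dataclass
class TheaterCafe(Theater):
    menu: str  # hypothetical additional attribute of the subtype


def contact_line(t: Theater) -> str:
    """Written for Theater, but works unchanged for any subtype."""
    return f"{t.name}, {t.address} ({t.phone})"


cafe = TheaterCafe(1, "Majestic", "12 Main Street", "555-0100", "coffee, samosas")

print(isinstance(cafe, Theater))  # True
print(contact_line(cafe))         # Majestic, 12 Main Street (555-0100)
```

The subtype reuses every attribute of the supertype and adds its own, so code written against the supertype keeps working.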
Object identity
• URLs include network addresses and often file-system names as well, meaning that if
the resource identified by the URL has to move to another file or network address,
then all links to that resource will either be incorrect or require a 'forwarding'
mechanism.
• Oids are simply identifiers and carry no physical information about the objects
they identify; this makes it possible to change the storage location of an object
without modifying pointers to the object.
• For URLs, deletion can be troublesome: following a link to a deleted resource gives a
'404 Page Not Found' error.
• For oids, SQL allows us to say REFERENCES ARE CHECKED as part
of the SCOPE clause and choose one of several
actions when a referenced object is deleted.
OODBMS:
• OODBMSs support collection types, which makes it possible
to provide a query language over collections. Indeed,
a standard has been developed by the Object
Database Management Group and is called Object
Query Language (OQL).
• OQL is similar to SQL, with a SELECT-FROM-WHERE-style
syntax.
• OQL supports structured types, including sets, bags,
arrays, and lists.
• OQL also supports reference types, path expressions,
ADTs and inheritance, type extents, and SQL-style
nested queries.
• Object Data Language(ODL) is similar to the DDL
subset of SQL but supports the additional features
found in OODBMSs, such as ADT definitions.
Similarities between OODBMS and ORDBMS
• Both support
• user-defined ADTs,
• structured types,
• object identity and reference types,
• inheritance, and
• a query language for manipulating collection types.
• Both provide functionality such as concurrency
control and recovery.
• Note: ORDBMSs support an extended form of SQL, whereas
OODBMSs support ODL/OQL.