DWHM 1

This document discusses database software features that are useful for data warehousing. It describes how database management systems (DBMSs) have been enhanced to support large databases and data warehousing functions such as data loading, transformation, and replication. It also covers indexing techniques and the parallel processing options for DBMSs, including horizontal, vertical, and hybrid parallelism. Finally, it discusses key considerations for selecting a DBMS for a data warehouse, such as load balancing, query optimization, and high-performance data loading.

Uploaded by

Catherine Muhi
DATABASE SOFTWARE

Prepared by:
CATHERINE M. MUHI
BSAIS 3
Database software
• Database software that started out for use in operational OLTP
systems has been enhanced to cater to decision support systems.
DBMSs have also been scaled up to support very large databases.
• Some RDBMS products now include support for the data acquisition
area of the data warehouse. Mass loading and retrieval of data from
other database systems have become easier. Some vendors have paid
special attention to the data transformation function. Replication
features have been reinforced to assist in bulk refreshes and
incremental loading of the data warehouse.
• Bit-mapped indexes can be very effective in a data warehouse
environment for indexing fields that have a small number of distinct
values.
For example, a database table containing geographic regions has only a
few distinct region codes, yet queries frequently select by region. In
this case, retrieval by a bit-mapped index on the region code values can
be very fast. Vendors have strengthened this type of indexing.
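The region example can be illustrated with a minimal sketch of a bit-mapped index. This is an illustration only, not how any particular DBMS stores its indexes: the sample region codes and the helper names `build_bitmap_index` and `rows_matching` are hypothetical, and each bitmap is held as a plain Python integer.

```python
from collections import defaultdict

# Hypothetical sample column: the region code of each row, by row position.
regions = ["N", "S", "E", "N", "W", "S", "N", "E"]

def build_bitmap_index(values):
    """Map each distinct value to an integer whose bit i is set when row i holds it."""
    index = defaultdict(int)
    for row_id, value in enumerate(values):
        index[value] |= 1 << row_id
    return dict(index)

def rows_matching(bitmap):
    """Decode a bitmap back into the list of row positions it marks."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

index = build_bitmap_index(regions)

# Selection by one region reads a single bitmap.
north_rows = rows_matching(index["N"])

# Selection by several regions is a bitwise OR of their bitmaps.
north_or_south_rows = rows_matching(index["N"] | index["S"])
```

With only four distinct region codes, the whole index is four bitmaps, and combining conditions is a bitwise operation rather than a scan of the table.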
Parallel Processing Options
• Parallel processing options are intended only for machines with multiple processors.
• Most of the current database software can parallelize a large number of
operations. These operations include the following: mass loading of data,
full table scans, queries with exclusion conditions, queries with grouping,
selection with distinct values, aggregation, sorting, creation of tables using
subqueries, creating and rebuilding indexes, and so on. Notice that this is an
impressive list of operations that the RDBMS can process in parallel.
Interquery Parallelization
• In this method, several server processes handle multiple requests simultaneously.
• Multiple queries may be serviced based on your server configuration and the
number of available processors. You may successfully take advantage of this
feature of the DBMS on SMP systems, thereby increasing the throughput and
supporting more concurrent users.
• However, interquery parallelism is limited. Multiple queries are processed
concurrently, but each query is still being processed serially by a single server
process. Suppose a query consists of index read, data read, join, and sort
operations; these operations are carried out in this order. Each operation must
finish before the next one can begin. Parts of the same query do not execute in
parallel. To overcome this limitation, many DBMS vendors have come up with
versions of their products to provide intraquery parallelization.
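The behavior described above can be sketched as follows. This is a structural illustration under stated assumptions, not a DBMS implementation: `run_query` and the sample data are hypothetical, the list operations merely stand in for the index read, data read, join, and sort steps, and Python threads model the server processes without providing true multiprocessor execution for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor

def run_query(query_id, rows):
    """Process one query serially: each step finishes before the next begins."""
    selected = [r for r in rows if r % 2 == 0]      # stands in for index read + data read
    joined = [(r, r * 10) for r in selected]        # stands in for the join
    return query_id, sorted(joined, reverse=True)   # stands in for the sort

# Hypothetical requests from three concurrent users.
queries = {"q1": [5, 2, 8], "q2": [4, 1, 6, 3], "q3": [7, 10]}

# Interquery parallelism: several workers, one whole query per worker.
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(run_query, qid, rows) for qid, rows in queries.items()]
    results = dict(f.result() for f in futures)
```

Throughput improves because three queries are in flight at once, but no single query finishes faster, since its steps still run one after another inside `run_query`.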
Intraquery Parallelization
Let us say a query from one of your users consists of an index
read, a data read, a data join, and a data sort from the data
warehouse database. A serial processing DBMS will process
this query in the sequence of these base operations and produce
the result set. However, while this query is executing on one
processor in the SMP system, other queries can execute in
parallel. This method is the interquery parallelization discussed
above. The first group of operations in Figure 8-15 illustrates
this method of execution.
Using the intraquery parallelization technique, the DBMS splits
the query into the lower level operations of index read, data
read, data join, and data sort. Each of these basic operations is
then executed in parallel with the others, each on its own
processor. The final result set is the consolidation of the
intermediary results.
Let us review three ways a DBMS can provide intraquery
parallelization, that is, parallelization of parts of the operations
within the same query itself.
Horizontal Parallelism
• The data is partitioned across multiple
disks. Parallel processing occurs
within each single task in the query.
For example, the data read is
performed on multiple processors
concurrently, each reading a different
set of data from a different disk. After
the first task has completed on all of
the relevant parts of the partitioned
data, the next task of that query is
carried out, then the one after that,
and so on.
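A minimal sketch of horizontal parallelism, assuming three hypothetical partitions and two hypothetical tasks (`read_partition` and `sort_partition`). Worker threads stand in for the processors; a real DBMS would use separate processes or processors and real disk reads.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitions of one table, one per disk.
partitions = [[3, 9, 1], [7, 2], [8, 4, 6]]

def read_partition(rows):
    """Data read task on one partition: keep the rows that satisfy the selection."""
    return [r for r in rows if r > 2]

def sort_partition(rows):
    """Sort task on one partition."""
    return sorted(rows)

with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
    # Task 1 runs on every partition concurrently and completes on all of them...
    read_results = list(pool.map(read_partition, partitions))
    # ...before task 2 starts, again on every partition concurrently.
    sorted_runs = list(pool.map(sort_partition, read_results))

# Consolidate the intermediary results into the final result set.
result = list(heapq.merge(*sorted_runs))
```

The parallelism here is inside each task. The tasks themselves still run one after another, which is the point that distinguishes this method from vertical parallelism.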
Vertical Parallelism
• This kind of parallelism occurs among different tasks, not just a single task
in a query as in the case of horizontal parallelism.
• All component query operations are executed in parallel, but in a pipelined
manner.
• This assumes that the RDBMS has the capability to decompose the query
into subtasks; each subtask has all the operations of index read, data read,
join, and sort. Then each subtask executes on the data in serial fashion. In
this approach, the database records are ideally processed by one step and
immediately given to the next step for processing, thus avoiding wait
times. Of course, in this method, the DBMS must possess a very high level
of sophistication in decomposing tasks.
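The pipelined hand-off can be sketched with one worker per step, connected by queues. This is an assumption-laden illustration: the `stage` helper, the `DONE` sentinel, and the sample keys are hypothetical, and the final sort is applied after the pipeline drains because a sort needs all of its input before it can emit anything.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of the record stream

def stage(func, inbox, outbox):
    """Apply func to each record as soon as it arrives and pass it on immediately."""
    while True:
        record = inbox.get()
        if record is DONE:
            outbox.put(DONE)
            return
        out = func(record)
        if out is not None:
            outbox.put(out)

def data_read(key):
    """Stands in for the data read: keep even keys, drop the rest."""
    return key if key % 2 == 0 else None

def join(key):
    """Stands in for the join: attach a looked-up value to the key."""
    return (key, f"region-{key}")

q_read, q_join, q_out = queue.Queue(), queue.Queue(), queue.Queue()
workers = [
    threading.Thread(target=stage, args=(data_read, q_read, q_join)),
    threading.Thread(target=stage, args=(join, q_join, q_out)),
]
for w in workers:
    w.start()

# Feed records in; the join step can start on the first record
# while the read step is still working on later ones.
for key in [6, 3, 2, 5, 4]:
    q_read.put(key)
q_read.put(DONE)

joined = []
while (item := q_out.get()) is not DONE:
    joined.append(item)
for w in workers:
    w.join()

result = sorted(joined)
```

No step waits for the previous one to finish the whole data set; each record moves on as soon as it is ready.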
Hybrid Method
• In this method, the query decomposer partitions the query both
horizontally and vertically. Naturally, this approach produces the best
results. You will realize the greatest utilization of resources, optimal
performance, and high scalability.
Selection of the DBMS
• Our discussions of the server hardware and the DBMS parallel
processing options must have convinced you that selection of the
DBMS is crucial. You must choose the server hardware with the
appropriate parallel architecture, and your choice of DBMS must
match the selected server hardware. These are critical decisions
for your data warehouse.
Apart from the requirement that the selected DBMS have load balancing and parallel processing
options, the key features listed below must also be considered when selecting the DBMS for your
data warehouse.

• Query governor - to anticipate and abort runaway queries
• Query optimizer - to parse and optimize user queries
• Query management - to balance the execution of different types of queries
• Load utility - for high-performance data loading, recovery, and restart
• Metadata management - with an active data catalog or dictionary
• Scalability - in terms of both number of users and data volumes
• Extensibility - having hybrid extensions to OLAP databases
• Portability - across platforms
• Query tool Application Program Interfaces (APIs) - for tools from leading vendors
• Administration - providing support for all DBA functions
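As one illustration of the query governor item in the list above, the sketch below aborts a query that exceeds a time budget. It uses SQLite through Python's standard `sqlite3` module only because it is readily available; it is not a data warehouse DBMS. The function `run_with_governor` and the time limits are hypothetical, and the sketch only aborts a runaway query; it does not try to anticipate one.

```python
import sqlite3
import time

def run_with_governor(conn, sql, max_seconds):
    """Run sql, aborting it once it exceeds max_seconds of wall-clock time."""
    deadline = time.monotonic() + max_seconds
    # SQLite invokes the handler every 1000 virtual machine instructions;
    # a non-zero return value makes it abort the running query.
    conn.set_progress_handler(
        lambda: 1 if time.monotonic() > deadline else 0, 1000
    )
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError:
        # Simplification: any operational error is treated as "aborted".
        return None
    finally:
        conn.set_progress_handler(None, 0)  # remove the handler again

conn = sqlite3.connect(":memory:")

# A well-behaved query finishes within its budget.
quick = run_with_governor(conn, "SELECT 1 + 1", max_seconds=1.0)

# A runaway query: a recursive CTE with no termination condition.
runaway_sql = """
WITH RECURSIVE n(i) AS (SELECT 1 UNION ALL SELECT i + 1 FROM n)
SELECT count(*) FROM n
"""
runaway = run_with_governor(conn, runaway_sql, max_seconds=0.2)
```

In this sketch `quick` holds the query result and `runaway` is `None`, signalling that the governor stopped the query.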
END.
