Adbms Part2
Data warehouse: it is a single, complete, and consistent store of data obtained from a variety
of different sources, made available to end users in a form they can understand and
use in a business context.
Write short note on data warehousing:
Every business generates a large amount of data.
Example: a bank may have data about all customer loans, fixed deposits, other securities, etc.
This data can be used to find depositing patterns; withdrawal patterns can be tracked, and
best-suited schemes proposed in a customized way to every customer. Other useful
information includes which services a customer expects, the highest and lowest margin per
customer, which products customers like and dislike, the impact of new services on existing
customers, etc.
What are the advantages of using data warehouse?
1. Data warehouse delivers business intelligence:
A data warehouse provides data from various sources and helps managers and
executives take optimum decisions about business strategies.
2. Cost-effective decision making:
The cost of decision making is reduced, because the data needed is already
consolidated in the warehouse.
3. Enhanced customer service:
Because of customized data it is possible to direct advertisements at targeted
customers. Due to business intelligence, the customer is known to the business more clearly.
4. Saves time:
Since data from many sources is already collected in one place, users need not
gather it from each source separately.
5. Better data quality:
Data conversion from various sources into a common format is done in data
warehousing. Because historical data is available, missing values in current data can
be easily replaced by good-quality data.
6. Prediction:
Intelligent systems can predict which product will be bought by which customer,
and in roughly which duration.
7. Return on investment is increased:
Due to targeted marketing and common data availability, the return on investment
for the business is increased.
Disadvantages of data warehousing:
1. Data compatibility:
Due to the various sources of data, formats may differ, and therefore data
compatibility issues may arise.
2. Data inconsistency (repeated data):
Due to the various formats, data may sometimes be inconsistent for processing purposes.
3. Required data may not be captured:
Sometimes the source is new or not known, due to which required data, even if it is
available, may not be captured.
4. Homogenization of data:
When we convert data from all sources into a single format, some characteristics of the
data are lost.
5. Responsibility of data ownership:
In business communication and processes, some person/community/department
may be unhappy about sharing data. In such cases it may affect ownership of the data.
6. High maintenance:
Data warehouses are generally high-maintenance systems. The maintenance cost
increases further in case of reorganization of business processes.
Properties/features of data warehouse? (5 M; DEC-16)
1. Subject oriented:
A data warehouse provides subject-oriented data to businesses; all available data
need not be provided. A subject may be a customer, employee, product,
salary amount, geographical area, etc.
2. Integrated:
Data from various sources is collected and then integrated to enable effective
analysis of the data.
3. Time-variant:
Every record in the data warehouse carries a time element, so historical,
time-specific data is available.
4. Non-volatile:
Once loaded, data in the data warehouse is not updated or deleted; new data is
only added, so earlier data remains stable.
Draw and explain data warehousing architecture with a neat labelled diagram.
TOP-DOWN (Bill Inmon):
1. The data warehouse is designed first; data marts are built on top of the data warehouse.
2. Data is extracted from the source systems, and ETL tools are used to push the data
into the data warehouse; the data marts are then loaded from the warehouse after
validation.
Advantages:
It provides consistent dimensional views.
This method is robust to business changes; for example, adding a new department
to the organization simply means adding a new data mart.
Disadvantages:
It is not flexible to changing business needs during the implementation phase.
It is applicable only to very large projects, where the project budget is
significant.
BOTTOM-UP (Ralph Kimball):
1. Data marts are created first, to provide reporting and analytical capability. As data
marts are specific to business processes, reporting and analysis are also done with
respect to those processes.
2. ETL is performed before data is loaded into the data marts.
3. From the data marts, ETL again allows us to store data into the data warehouse.
Advantages:
This model contains consistent data marts.
As data marts are created first, reports can be generated quickly.
Extending the data warehouse becomes simple: for a new business unit a new data
mart is created and integrated with the other data marts.
Disadvantages:
Maintaining an enterprise-wide view is harder, and transitioning between the
bottom-up and top-down models is complicated.
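The ETL step mentioned in both architectures can be sketched in plain Python. This is a minimal, hypothetical example: the two "sources", their field names, and the data mart are all made up for illustration.

```python
# Hypothetical minimal ETL sketch: extract rows from two "sources" with
# different formats, transform them into one common format, and load
# them into a data mart. All names and fields are illustrative.

def extract():
    # Two sources holding the same kind of information differently.
    src_a = [{"cust": "Asha", "amt": "1200.50"}]          # amounts as strings
    src_b = [{"customer_name": "Ravi", "amount": 800.0}]  # different field names
    return src_a, src_b

def transform(src_a, src_b):
    # Convert both sources into one common record format.
    rows = []
    for r in src_a:
        rows.append({"customer": r["cust"], "amount": float(r["amt"])})
    for r in src_b:
        rows.append({"customer": r["customer_name"], "amount": r["amount"]})
    return rows

def load(rows, mart):
    mart.extend(rows)

sales_mart = []  # the "data mart", here just a list
load(transform(*extract()), sales_mart)
print(sales_mart)
```

The same pattern applies whether the target is a data mart (bottom-up) or the central warehouse (top-down); only the load destination changes.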
Q. Dimensional model
Elements of dimensional modelling
1. Facts: facts are measurements or metrics from your business,
e.g. a quarterly sales value.
2. Dimension: it provides the context surrounding a business event; in simple words,
when questions like who, what, and where are asked about a fact, we get dimensions.
3. Attribute: attributes are the various characteristics of a dimension.
Example: for the location dimension, attributes are city, state, country, etc.
4. Fact table: the primary table in a dimensional model. It contains
measurements (facts), and
foreign keys to the dimension tables.
5. Dimension table: contains the dimensions of facts; dimension tables are joined to
the fact table through foreign keys, and they are denormalized tables.
Process of dimensional modelling
1. Select the business process, as precisely as possible, for which the dimensional
model is to be built; this can be the sales situation in a particular store.
2. Decide the grain: after describing the business process, the next step is to decide
upon the grain of the model.
The grain of the model is the exact description of what the dimensional model
should be focusing on; it is generally the simplest (lowest-level) description of the
process. The fact tables and dimension tables will be based on the grain.
3. Identify the dimensions: dimensions should be defined within the scope of the grain
as described in step 2.
4. Identify the facts: after the dimensions are defined, the fact table is populated with
the measurements.
5. Build the schema: in this step we implement the dimensional model; a schema is
nothing but the database structure (arrangement of tables). There are two popular
schemas: the star schema and the snowflake schema.
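The result of these steps can be sketched in plain Python. The tables, keys, and figures below are made up purely for illustration: one fact table holding a numeric measure and foreign keys, and two denormalized dimension tables.

```python
# Minimal sketch of a star schema: the fact table's rows hold a numeric
# measure plus foreign keys into denormalized dimension tables.
# All table contents are illustrative, not real data.

product_dim = {1: {"name": "Pen", "category": "Stationery"}}
store_dim   = {10: {"city": "Mumbai", "state": "MH"}}

# Fact table: (product_id, store_id) are foreign keys; sales_amt is the measure.
sales_fact = [
    {"product_id": 1, "store_id": 10, "sales_amt": 500.0},
    {"product_id": 1, "store_id": 10, "sales_amt": 300.0},
]

# A typical star-schema query joins the fact table with a dimension:
# here, total sales per product category.
totals = {}
for row in sales_fact:
    category = product_dim[row["product_id"]]["category"]
    totals[category] = totals.get(category, 0.0) + row["sales_amt"]
print(totals)  # {'Stationery': 800.0}
```

Note how the grain decision fixes what one fact row means (here: one sale of one product at one store), and the dimensions and measures follow from it.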
Q. Difference between ER model and Dimensional model?
STAR SCHEMA
1. Fact table: fact tables generally contain measurements and metrics. A fact table
consists of numeric values and foreign keys, and is designed so that low-level data
can be recorded for a particular event. Fact tables can be of three types, based on
requirement:
Transaction fact table: records facts about a particular event or process.
Example: a sales event.
Snapshot fact table: records facts at a given point, periodically. Example:
account details at month end.
Accumulating snapshot fact table: accumulates facts over the steps of a process
up to a given time.
2. Dimension table: usually has a relatively small number of records, but every record
may have a large number of attributes to describe the fact data. A dimension table
generally captures the most commonly used attributes. Common examples:
Time dimension table: describes time at the lowest level.
Geographical dimension table: describes data like state, city, location, etc.
Product dimension table: describes details of products.
Employee dimension table: describes details of employees.
Range dimension table: describes ranges of time, currency, or any other
measurable value.
Advantages:
Simple queries: star-schema join logic is simpler than the join logic required
in SQL when a normalized transactional schema is used.
Simple business reporting logic: highly normalized tables may not be convenient
for business reporting, whereas the star schema simplifies common business
reporting logic.
Fast aggregation: the simple queries result in high-performance aggregation
operations.
Data cubes: star schemas are used by OLAP systems to create OLAP cubes.
Disadvantages:
Data integrity is not strictly enforced, as the data is denormalized.
In real life some systems are complex and very large. In such cases a star schema
cannot represent the system properly, and we need a more elaborate schema, i.e.
the snowflake schema.
SNOWFLAKE SCHEMA
For example: a fact table may contain the attributes date, cust, prod, and
city_name, which act as foreign keys to four dimension tables.
A particular dimension table, here city, may itself have an attribute such as region
that works as a foreign key to a further table.
A dimension table is thus further normalized and split into multiple tables to manage
increased and more detailed data.
A snowflake schema is a logical arrangement of tables in a multidimensional database.
An ER diagram is a two-dimensional arrangement, whereas a snowflake schema is a
multidimensional arrangement.
Snowflaking is a method of normalizing the dimension tables of a star schema.
When the schema is completely normalized along all dimension tables, the resultant
structure looks like a snowflake.
The principle behind the snowflake schema is normalization: low-cardinality
attributes are removed from the dimension tables and placed in separate tables.
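The normalization step can be illustrated with a small sketch. The city and region tables, keys, and names below are all hypothetical.

```python
# Sketch of snowflaking: the denormalized city dimension of a star schema
# is normalized by moving the low-cardinality region attributes into a
# separate table referenced by a foreign key. All data is illustrative.

# Star-schema (denormalized) dimension: region data repeated per city.
city_star = {
    1: {"city": "Mumbai", "region": "West", "region_head": "A"},
    2: {"city": "Pune",   "region": "West", "region_head": "A"},
}

# Snowflaked (normalized) version: the region is stored once, and each
# city row references it via a foreign key.
region_dim = {100: {"region": "West", "region_head": "A"}}
city_snow  = {1: {"city": "Mumbai", "region_id": 100},
              2: {"city": "Pune",   "region_id": 100}}

# Resolving a city's region now needs one extra lookup (an extra join):
def region_of(city_id):
    return region_dim[city_snow[city_id]["region_id"]]["region"]

print(region_of(2))  # West
```

The trade-off is visible even at this scale: the redundancy in `city_star` is gone, but every query touching region attributes now pays for an extra join.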
Advantages:
The star schema is a special case of the snowflake schema, so designing either
remains simple.
Easier to maintain, as there is no redundant data in the dimension tables.
Provides improved flexibility.
Provides improved query performance for queries that touch only the smaller,
normalized dimension tables.
Minimizes disk storage requirements.
Lookups on the smaller dimension tables are cheaper.
Disadvantages:
Needs additional maintenance effort due to the larger number of tables.
Query execution plans are more complex because of the additional joins.
Q. Operations in OLAP
1. Slice:
It is an operation whereby the user takes a specific subset of data from the OLAP cube.
Slice performs selection on one dimension of the given cube, resulting in a
reduction of dimensionality.
From a data cube with dimensions month, product, and region, if data is selected
for only one region, the result is a slice.
2. Dice:
When a sub-cube is selected from the original cube by choosing values on two or
more dimensions, the operation is called dice.
3. Roll up:
It involves aggregation of data along one or more dimensions, e.g. by climbing up
a concept hierarchy (cities rolled up to countries) or by removing a dimension
entirely.
4. Drill down:
In contrast to roll up, drill down is the process of navigating to more detail by
stepping down the concept hierarchy of a dimension, e.g. quarters drilled down
to months.
5. Pivot:
Also known as the rotate operation; it rotates the cube to view the same data
from a different perspective.
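The first three operations can be demonstrated on a tiny cube held as a Python dict keyed by (month, product, region). The sales figures are made up for illustration.

```python
# Sketch of slice, dice, and roll-up on a tiny 3-D cube stored as a dict
# keyed by (month, product, region). Values are made-up sales figures.

cube = {
    ("Jan", "Pen", "East"): 10, ("Jan", "Pen", "West"): 20,
    ("Feb", "Pen", "East"): 15, ("Feb", "Ink", "West"): 5,
}

# Slice: fix one dimension (region = "East"); the result has one
# dimension fewer.
slice_east = {(m, p): v for (m, p, r), v in cube.items() if r == "East"}

# Dice: select a sub-cube by restricting values on several dimensions.
dice = {k: v for k, v in cube.items()
        if k[0] in ("Jan",) and k[2] in ("East", "West")}

# Roll-up: aggregate away the region dimension (the cities-to-countries
# idea, done here by summing over regions).
rollup = {}
for (m, p, r), v in cube.items():
    rollup[(m, p)] = rollup.get((m, p), 0) + v

print(slice_east)              # {('Jan', 'Pen'): 10, ('Feb', 'Pen'): 15}
print(rollup[("Jan", "Pen")])  # 30
```

Drill down is simply the inverse of the roll-up shown (recovering per-region values from the stored cube), and pivot only changes which dimension is presented along which axis.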
Models in OLAP:
a. ROLAP (relational online analytical processing)
b. MOLAP (multidimensional online analytical processing)
c. HOLAP (hybrid online analytical processing)
MODULE 3: ADMT
ACCESS CONTROL:
Access control is a sometimes ambiguous term; it refers to the method whereby
different levels of control over system resources are granted after the user's
account credentials have been verified.
Discretionary privileges:
The standard way of implementing discretionary access control is by granting and
revoking privileges. Privileges can be assigned at two levels:
a. Account level: at this level the DBA specifies which privileges each account
holds, independently of the relations in the database.
b. Relation level: at this level the DBA can control privileges to access individual
relations.
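Relation-level granting and revoking can be sketched as a privilege table mapping (account, relation) pairs to sets of privileges. The account and relation names below are hypothetical; real DBMSs do this with SQL GRANT/REVOKE statements.

```python
# Sketch of discretionary access control: a privilege table of
# (account, relation) -> set of privileges, with grant/revoke/check
# operations. Account and relation names are illustrative.

privileges = {}

def grant(account, relation, priv):
    privileges.setdefault((account, relation), set()).add(priv)

def revoke(account, relation, priv):
    privileges.get((account, relation), set()).discard(priv)

def allowed(account, relation, priv):
    return priv in privileges.get((account, relation), set())

grant("alice", "EMPLOYEE", "SELECT")
grant("alice", "EMPLOYEE", "UPDATE")
revoke("alice", "EMPLOYEE", "UPDATE")
print(allowed("alice", "EMPLOYEE", "SELECT"))  # True
print(allowed("alice", "EMPLOYEE", "UPDATE"))  # False
```

The key property of the discretionary model is visible here: whoever holds the right to grant can pass privileges on or take them back at their discretion.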
Mandatory Access Control:
In certain cases we have to introduce a security policy that classifies data; this approach
is called mandatory access control. Data is classified into:
Top secret
Secret
Confidential
Unclassified
Users as well as data are classified into the above 4 categories.
In this approach there are restrictions on reading as well as writing data.
The first restriction is that no subject is allowed to read an object whose security
classification is higher than the subject's clearance (no read up).
The other restriction prevents a subject from writing an object at a security
classification lower than its own (no write down).
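These two restrictions (the Bell-LaPadula "no read up" / "no write down" rules) can be checked mechanically from the classification levels. The following sketch uses the four levels listed above; the functions are illustrative, not a real DBMS API.

```python
# Sketch of the two mandatory-access-control restrictions using the
# four classification levels from the notes.

LEVELS = {"Unclassified": 0, "Confidential": 1, "Secret": 2, "Top secret": 3}

def can_read(subject_clearance, object_class):
    # No read up: a subject may read only objects at or below its clearance.
    return LEVELS[subject_clearance] >= LEVELS[object_class]

def can_write(subject_clearance, object_class):
    # No write down: a subject may write only at or above its clearance,
    # so information cannot leak to a lower level.
    return LEVELS[subject_clearance] <= LEVELS[object_class]

print(can_read("Secret", "Confidential"))   # True
print(can_read("Secret", "Top secret"))     # False (read up forbidden)
print(can_write("Secret", "Confidential"))  # False (write down forbidden)
```

Together the two rules guarantee that information only ever flows upward in the classification lattice.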
Types of databases:
1. Mobile databases: mobile devices like smartphones or tablets use data over a mobile
network. A mobile database is a database that can be connected to by a mobile
device over any network.
Properties:
a. It provides full database system capability.
b. It provides communication capability in the form of a wired or wireless network.
Architecture:
2. Temporal database: a database in which a time period is attached to the data. By
attaching a time period, it is possible to store different states of the database. To
implement such databases, data can be inserted into tables based on timestamps.
Types of temporal database:
a. Historical database: stores data with respect to valid time.
b. Rollback database: stores data with respect to transaction time.
c. Bi-temporal database: stores data with respect to both valid time and transaction time.
d. Snapshot database: in commercial databases real-world data is stored; for such
databases the most recent copy is known as the snapshot database.
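A bi-temporal table can be sketched with rows that carry both a valid-time interval (when the fact was true in the real world) and a transaction-time interval (when the database believed it). The employee, salaries, and years below are invented for illustration.

```python
# Sketch of a bi-temporal table: each row has a valid-time interval and
# a transaction-time interval. All data is illustrative.

rows = [
    # Salary recorded as 100 for 2020-2022; entered 2020, superseded 2023.
    {"emp": "Asha", "salary": 100,
     "valid_from": 2020, "valid_to": 2022,
     "tx_from": 2020, "tx_to": 2023},
    # Correction entered in 2023: the salary was actually 110 then.
    {"emp": "Asha", "salary": 110,
     "valid_from": 2020, "valid_to": 2022,
     "tx_from": 2023, "tx_to": None},   # None = current belief
]

def salary_as_of(valid_year, tx_year):
    # What did the database, as of tx_year, say the salary was in valid_year?
    for r in rows:
        if (r["valid_from"] <= valid_year < r["valid_to"] and
            r["tx_from"] <= tx_year and
            (r["tx_to"] is None or tx_year < r["tx_to"])):
            return r["salary"]

print(salary_as_of(2021, 2021))  # 100 (what the DB believed in 2021)
print(salary_as_of(2021, 2024))  # 110 (current belief, after correction)
```

Fixing only the transaction-time condition gives a rollback database; fixing only the valid-time condition gives a historical database.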
3. Spatial database: these databases offer spatial data types and models, along with a
dedicated query language and supporting tools.
Spatial databases underlie applications such as GIS and manage geometric,
geographic, and spatial data.
Characteristics of spatial database:
a. It should be positionally accurate.
b. Accessible to whoever needs it.
c. Readily updatable on a regular schedule.
d. Internally accurate, and it should convey the correct nature of the geography.
e. Exactly compatible with other information on the layers.
f. Flexible and extensible, so that additional datasets can be added as and when
required.
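A typical spatial query is a range (window) query: find all objects inside a rectangle. The sketch below uses a linear scan over made-up points purely to illustrate the operation; real spatial databases answer such queries with spatial indexes such as R-trees.

```python
# Sketch of a spatial range query: find all points inside a rectangular
# query window. Point names and coordinates are illustrative.

points = {"shop_a": (2.0, 3.0), "shop_b": (7.5, 1.0), "shop_c": (4.0, 4.5)}

def range_query(xmin, ymin, xmax, ymax):
    # Return the names of all points falling inside the window.
    return sorted(name for name, (x, y) in points.items()
                  if xmin <= x <= xmax and ymin <= y <= ymax)

print(range_query(1.0, 2.0, 5.0, 5.0))  # ['shop_a', 'shop_c']
```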
ARIES ALGORITHM:
Algorithm for Recovery and Isolation Exploiting Semantics.
It works on 3 important concepts:
1. WAL (write-ahead logging): any change to the database is logged first, and the log
record is forced to stable storage before the change itself is written to the database.
2. Repeating history during REDO: ARIES retraces the actions performed before the
crash and brings the system back to its exact state at the time of the crash.
3. Logging changes during UNDO: changes made to the database during UNDO
are also logged (as compensation log records), so that a crash during recovery does
not repeat the undo work.
Logging: the ARIES algorithm creates many log records; log entries are sequentially
ordered in the log file, each identified by a log sequence number (LSN).
Recovery: it works in 3 phases, as follows:
1. Analysis: in this phase we restore the DPT (dirty page table) and TT (transaction
table); we go through the log file and decide which transactions are to be
undone, redone, or removed from the system.
2. REDO: starting from the earliest log record for a dirty page (identified from the
DPT), all logged actions whose effects may have been lost in the crash are
repeated.
3. UNDO: the system traces backward through the log for transactions that have a
start point but no commit point. For such transactions, UNDO actions are taken.
Checkpoint: to avoid rescanning the whole log file during the analysis phase, it is better
to save the DPT and TT at checkpoints.
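The three concepts can be illustrated with a toy sketch. Everything here is simplified and hypothetical: real ARIES tracks pageLSNs, the DPT/TT, and undoNextLSN pointers in its compensation records, none of which are modelled below.

```python
# Toy sketch of ARIES-style recovery: every update is appended to the
# log (WAL) before touching the "database"; after a crash, REDO repeats
# history, then UNDO rolls back uncommitted transactions while logging
# compensation records (CLRs). Heavily simplified for illustration.

log = []   # list of (lsn, txn, kind, key, old_value, new_value)
db = {}

def write(txn, key, new):
    old = db.get(key)
    log.append((len(log), txn, "UPDATE", key, old, new))  # WAL: log first
    db[key] = new

def commit(txn):
    log.append((len(log), txn, "COMMIT", None, None, None))

# Simulate: T1 commits, T2 does not, then the system crashes.
write("T1", "x", 1); commit("T1")
write("T2", "y", 2)
db = {}                              # crash: in-memory state lost

# REDO: repeat history from the log, restoring the pre-crash state.
for lsn, txn, kind, key, old, new in log:
    if kind == "UPDATE":
        db[key] = new

# UNDO: roll back updates of uncommitted transactions, newest first,
# appending compensation log records as we go.
committed = {t for _, t, k, *_ in log if k == "COMMIT"}
for lsn, txn, kind, key, old, new in reversed(list(log)):
    if kind == "UPDATE" and txn not in committed:
        log.append((len(log), txn, "CLR", key, None, old))
        if old is None:
            db.pop(key, None)
        else:
            db[key] = old

print(db)  # {'x': 1} -- T1's committed update survives, T2's is undone
```

Because the UNDO actions themselves are logged, a second crash during recovery would find the CLRs in the log and not undo the same work twice, which is the point of concept 3 above.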