DWMM Notes

dwmm

Uploaded by

mayurachibb
Copyright
© © All Rights Reserved
INDEX

S. No. Topic Name

1. Data Warehouse Introduction

2. Dimensional Modelling

Data Warehouse Introduction

Ques1: Define Data Warehouse.


According to Bill Inmon, the father of the data warehousing concept, a data warehouse is a
subject-oriented, integrated, time-variant and non-volatile collection of data in support of
management's decision-making process.
Ques2: What is a Data Warehouse?
A Data Warehouse is a relational database designed for querying and analysis rather than for
transaction processing. It contains current and historical data derived from transactions,
drawn from one or more sources. It holds data for the entire organization and not only for a
particular group of users. It is not used for daily operations or transaction processing; it is
used for decision-making.
It is a centralised data repository which can be queried for business benefits. It is a database
which has been designed to help make decisions based on stored information. It includes
tools and technologies to help executives, managers and analysts make better decisions.
Ques3: What are the characteristics of a Data Warehouse?
A Data Warehouse is subject-oriented, integrated, time-variant and non-volatile.
Ques4: What do we mean when we say that a Data Warehouse is subject-oriented?
It means that the data warehouse focuses on specific subjects or topics and not the daily
operations of an organization. It organizes and presents information around a particular
subject such as sales, inventory, or customer data. This is done by excluding data which is not
useful regarding a subject and including all data needed by the users to understand the
subject. This approach allows users to easily access and analyze data related to their area of
interest, facilitating better decision-making.

Ques5: What do we mean when we say that a Data Warehouse is integrated?
It means that data from different sources, such as RDBMSs, flat files and online transaction
records, is combined and stored in a central repository. The integrated data includes both
current and historical information. Integration ensures that data is consistent and coherent
across the warehouse. Users can access the integrated data in a unified way, regardless of its
source, for reporting, analysis and decision-making. E.g. Account is a subject. Data from
various sources (savings accounts, current accounts and loan accounts) can be stored under the
subject Account.

Ques5: What do we mean when we say that a Data Warehouse is time-variant?
It means that the Data Warehouse stores historical data alongside current information for
comparison and trend analysis. Users can access data from various time intervals, such as
daily, weekly, monthly, or annually. Time-variant data allows for tracking changes over time
and making informed decisions based on historical patterns.
Ques5: What do we mean when we say that a Data Warehouse is non-volatile?
It means that once the data has entered the Data Warehouse, it should not change. It should
not get updated with every transaction or operation like in operational databases. Non-
volatility ensures that historical data remains accessible for analysis and decision-making.
Users can rely on the consistency and stability of the data stored in the warehouse. Non-
volatility distinguishes data warehouses from operational databases, which are subject to
frequent updates and changes.

Ques6: What is an Operational Database?

The Operational Database is the source of information for the data warehouse. It includes
detailed information which is used to run the day-to-day operations of the business. The data
in an Operational Database is frequently updated and reflects the state after the most recent
transaction. Operational database systems, also called OLTP (Online Transaction Processing)
systems, are used to manage dynamic data in real time. Data Warehouse systems are called
OLAP (Online Analytical Processing) systems.

Data warehouses and OLTP databases are often both relational databases. However, their goals
are different.

Ques7: Differentiate between Operational Database and Data Warehouse.


Operational Database | Data Warehouse
1) Designed to support high-volume transaction processing (OLTP). | 1) Designed to support high-volume analytical processing (OLAP).
2) Usually concerned with current data. | 2) Usually concerned with historical data.
3) The data is regularly updated according to need. | 3) Non-volatile: once data has entered the data warehouse, it rarely changes.
4) Supports thousands of concurrent clients. | 4) Supports fewer concurrent clients.
5) Operational systems are largely process-oriented. | 5) Data warehousing systems are largely subject-oriented.
6) Usually optimized for fast inserts and updates of relatively small volumes of data. | 6) Usually optimized for fast retrieval of relatively large volumes of data.

Ques7: Differentiate between OLAP and OLTP.


Aspect | OLTP | OLAP
Purpose | 1) OLTP (Online Transaction Processing) manages real-time transactional data, focusing on recording, processing, and managing day-to-day transactions in databases. | 1) OLAP (Online Analytical Processing) analyzes large volumes of historical data to provide insights and support decision-making processes.
Functionality | 2) OLTP systems prioritize quick and efficient data manipulation, ensuring data integrity and concurrency control for frequent transactions. | 2) OLAP systems perform complex queries and aggregations on historical data to generate reports, forecasts, and analysis for strategic decision-making.
Data Model | 3) OLTP systems typically utilize normalized data models to minimize redundancy and ensure transactional consistency. | 3) OLAP systems often employ denormalized or star/snowflake schema models to optimize query performance for analytical operations.
User Interaction | 4) OLTP systems are used by front-end applications for real-time data entry, retrieval, and updates by end-users. | 4) OLAP systems are accessed by analysts, managers, and decision-makers to generate reports, perform data analysis, and make strategic decisions based on historical trends.
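
The contrast in the comparison above can be seen in a small Python sketch using the built-in sqlite3 module. The table name account_txn, its columns and the sample amounts are assumptions made up for illustration; a real OLTP system and a real warehouse would normally be separate databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account_txn ("
    "txn_id INTEGER PRIMARY KEY, account TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO account_txn (account, year, amount) VALUES (?, ?, ?)",
    [("A1", 2022, 100.0), ("A1", 2023, 150.0), ("A2", 2022, 200.0), ("A2", 2023, 50.0)],
)

# OLTP-style work: a short transaction that touches a single row.
with conn:
    conn.execute(
        "INSERT INTO account_txn (account, year, amount) VALUES (?, ?, ?)",
        ("A1", 2023, 25.0),
    )

# OLAP-style work: a read-only aggregation over many historical rows.
yearly_totals = conn.execute(
    "SELECT year, SUM(amount) FROM account_txn GROUP BY year ORDER BY year"
).fetchall()
print(yearly_totals)
```

The first statement is typical of many small, concurrent writes; the second scans the whole history to produce a summary.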

Ques8: What is data granularity?


Data granularity in DBMS refers to the level of detail or precision at which data is stored,
represented or measured within a database system. Granularity can vary from fine to coarse,
depending on the specific requirements of the application or system. Fine granularity implies
storing data at a more detailed level, such as storing individual transactions or records. Coarse
granularity involves aggregating or summarizing data into larger units, such as storing data at
the level of days, weeks, or months. Choosing the appropriate level of granularity is crucial
for optimizing storage efficiency, query performance, and data analysis capabilities within the
database system.
In a bank's transaction database, data granularity can be observed in the level of detail
recorded for each transaction. E.g.
1) High (fine) Granularity: Recording each transaction individually with specific details such
as transaction amount, date, time, account numbers involved, transaction type (withdrawal,
deposit, transfer), and location. Here, high granularity allows detailed analysis of individual
transactions, facilitating fraud detection, customer behaviour analysis, and personalised
services.
2) Low (coarse) Granularity: Aggregating transactions into daily summaries without capturing
individual transaction details, providing only total amounts for deposits and withdrawals.
Here, low granularity may be sufficient for overall trend analysis but cannot provide insights
into specific transactions.
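
The two levels of granularity in the bank example can be illustrated with a short Python sketch. The records, field names and amounts are hypothetical; the point is only that the daily summary is derived from the detailed rows and cannot be turned back into them.

```python
from collections import defaultdict

# Fine granularity: one record per transaction (hypothetical sample data).
transactions = [
    {"date": "2024-01-01", "account": "A1", "type": "deposit", "amount": 500},
    {"date": "2024-01-01", "account": "A2", "type": "withdrawal", "amount": 200},
    {"date": "2024-01-01", "account": "A3", "type": "deposit", "amount": 250},
    {"date": "2024-01-02", "account": "A1", "type": "withdrawal", "amount": 100},
    {"date": "2024-01-02", "account": "A3", "type": "deposit", "amount": 300},
]


def daily_summary(rows):
    """Coarse granularity: total amount per (date, transaction type)."""
    totals = defaultdict(int)
    for row in rows:
        totals[(row["date"], row["type"])] += row["amount"]
    return dict(totals)


summary = daily_summary(transactions)
for (date, kind), total in sorted(summary.items()):
    print(date, kind, total)
```

Five detailed rows collapse into four summary rows here; the account numbers are no longer available at the coarse level.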

Ques9: Explain the architecture of the data warehouse with an appropriate diagram.
Refer (for the diagram): https://www.youtube.com/watch?v=Eh_8ZATRauQ

Ques10: What are the components of Source Data in a Data Warehouse?
The source data comprises Production Data, Internal Data, Archived Data and External
Data.
1) Production Data: This is the day-to-day data which comes from the different operational
systems in a company. Based on the data requirements of the data warehouse, parts of this
data are chosen.
2) Internal Data: Every organization has its private files like spreadsheets, reports, and
customer records. These are called internal data, some of which can be useful in a data
warehouse.
3) Archived Data: It is the historical data that operational systems have archived.
4) External Data: It is data relevant to the industry which is provided by sources outside the
organization.
The source data from the various operational systems and external sources is extracted and fed
into the data staging area of the data warehouse, where it is cleaned, converted and made
ready in a format which can be used for querying and analysis.

Ques11: Explain the ETL process in a data warehouse.


The ETL (Extract, Transform, Load) process plays a crucial role in collecting, refining and
consolidating data from various sources into a centralized repository for efficient analysis and
decision-making. It involves three main steps:
1) Extract: Data is extracted from various sources, such as databases, files, or applications.
2) Transform: Extracted data is transformed into a format that is suitable for analysis and
storage in the data warehouse. This may involve cleaning, filtering, aggregating, or applying
business rules to the data.
3) Load: Transformed data is loaded into the data warehouse, where it is organized and
stored for further analysis and reporting. This step ensures that the data is available for
querying and decision-making processes.
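
The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production pipeline: the CSV text, the orders table and the cleaning rule (drop rows with no amount) are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; in practice this would come from files or operational systems.
SOURCE_CSV = """order_id,customer,amount
1, alice ,120.50
2,BOB,80
3,carol,
"""


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with no amount, tidy names, convert types."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # cleaning rule assumed for this sketch
        cleaned.append((int(row["order_id"]), row["customer"].strip().title(), float(amount)))
    return cleaned


def load(rows, conn):
    """Load: write the transformed rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), conn)
loaded = conn.execute(
    "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```

Keeping the three steps as separate functions mirrors the way the process is described: each step can be changed or tested without touching the others.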

Ques12: Explain the concept of metadata in a data warehouse.
Metadata is data about data, which provides information about the other data stored in the
data warehouse. It helps users understand the data content, its origin, and how it can be
utilized for analytical purposes. Metadata includes attributes like data types, constraints,
integrity rules, and storage details, offering comprehensive insights into the data. Proper
management of metadata ensures the accuracy, consistency, and reliability of data stored in
the data warehouse.

Ques13: What are the different types of metadata in a data warehouse?


1) Descriptive Metadata: Provides information about the content and structure of the data. It
includes details such as data types, field names, and relationships between tables.
2) Structural Metadata: Defines the structure of the data warehouse itself, including
database schemas, table definitions, and data organization.
3) Administrative Metadata: Manages the operations and administration of the data
warehouse. It includes details like user permissions, access controls, and data lineage.
4) Technical Metadata: Contains technical details about data storage, formats, and
processing mechanisms. It includes information on data extraction, transformation, and
loading (ETL) processes.
5) Rights Metadata: Specifies the intellectual property rights associated with the data,
including copyrights, licenses, and usage restrictions.
These types of metadata collectively provide comprehensive information about the data
stored in the data warehouse, enabling effective management, analysis, and utilization.
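
As a small illustration of metadata being "data about data", most databases let you query the description of a table separately from its contents. The sketch below uses SQLite's PRAGMA table_info through Python's sqlite3 module; the sales table and its columns are made up for the example, and a real warehouse would usually keep far richer metadata in a dedicated repository.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, city TEXT NOT NULL, amount REAL)"
)

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [
    {"name": name, "type": col_type, "not_null": bool(notnull), "primary_key": bool(pk)}
    for _cid, name, col_type, notnull, _default, pk in conn.execute("PRAGMA table_info(sales)")
]
for column in columns:
    print(column)
```

The output describes field names, data types and constraints without reading a single row of sales data.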

Ques14: What are the different steps for creating a data warehouse?
The different steps for creating a data warehouse are:
1. Determine the Business Objectives: Understand the organization's goals and objectives
for building the data warehouse, including the specific business questions it aims to answer.
2. Collect and Analyze Information: Gather information about data sources, formats, and
business processes to assess the data needs and requirements.
3. Identify Core Business Processes: Identify the key business processes and data elements
required to support the organization's objectives.
4. Create a Source Data Model: Develop a model that defines the structure and
relationships of the source data to be used in the data warehouse.

5. Conceptualize and Select the Platform: Decide on the technology platform and
architecture for the data warehouse, considering factors like scalability, performance, and
budget.
6. Develop a Project Roadmap: Outline a roadmap with timelines, milestones, and resource
requirements for building the data warehouse.
7. Design and Implement ETL Processes: Develop Extract, Transform, and Load (ETL)
processes to extract data from source systems, transform it into the desired format, and load it
into the data warehouse.
8. Implement Data Governance and Security: Establish data governance policies and
security measures to ensure the integrity, confidentiality, and availability of data.

Ques15: What is a data mart?


Data marts are specialized subsets of data warehouses which focus on specific business
functions or departments within an organization. Unlike a comprehensive data warehouse,
which encompasses the entire organisation, data marts are application-oriented and provide
easier access to targeted information for analysis and decision-making purposes. By
organizing data into smaller, focused repositories, data marts facilitate quicker insights and
more efficient data retrieval, enhancing the effectiveness of business operations and enabling
users to derive actionable insights from their data.

Ques16: What are the different types of data marts?


There are two types of data marts:
1) Dependent Data Mart: Here, the data marts are treated as subsets of a data
warehouse. In this technique, a data warehouse is created first, and various data marts are
then created from it. These data marts depend on the data warehouse and extract the
records they need from it. Since the data marts are derived from the data warehouse, there is
no need for data mart integration. This is also called the top-down approach.
2) Independent Data Mart: Here, independent data marts are created first and a data
warehouse is then built from them. Since all the data marts are created independently, they
have to be integrated to form the warehouse. This is also called the bottom-up approach.
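
A dependent data mart, as described in point 1 above, can be sketched as a table derived from a warehouse table. The table names warehouse_sales and electronics_mart and the sample rows are assumptions for illustration; in practice the mart would often live in its own database and be refreshed on a schedule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE warehouse_sales (department TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [
        ("electronics", "phone", 300.0),
        ("electronics", "laptop", 900.0),
        ("clothing", "jacket", 120.0),
    ],
)

# The data mart holds only the rows relevant to one department.
conn.execute(
    """
    CREATE TABLE electronics_mart AS
    SELECT product, amount FROM warehouse_sales WHERE department = 'electronics'
    """
)
mart_rows = conn.execute(
    "SELECT product, amount FROM electronics_mart ORDER BY product"
).fetchall()
print(mart_rows)
```

Users of the mart query a smaller table that contains only their department's data.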

Ques17: Why are data marts created?
Data Marts are created for various reasons:
1) Data marts provide easier access to specific subsets of data compared to dealing with the
entire data warehouse. This facilitates quicker retrieval of relevant information for
decision-making.
2) They allow organizations to focus on particular business areas or departments, enabling
more targeted analysis and reporting.
3) Data marts can be tailored to meet the specific needs of different user groups or business
units within an organization. This customization ensures that users have access to the most
relevant and useful data for their purposes.
4) By concentrating on specific subject areas, data marts simplify data maintenance tasks
compared to managing a large, comprehensive data warehouse.
5) Data marts can be scaled independently, allowing organizations to expand their analytical
capabilities incrementally as needed.

Ques18: What are the advantages of Data Marts?


1) Improved Performance: Data marts can provide faster access to data compared to a full
data warehouse, as they store a subset of data tailored to specific user needs.

2) Cost-Effectiveness: Data marts are cost-effective because they store a particular subset of
data, which lowers data storage costs compared to maintaining a comprehensive data
warehouse.
3) Ease of Implementation: Implementing a data mart requires less time compared to a data
warehouse, as data marts are designed for specific purposes and subsets of data.
4) Improved Access: Data marts enable easier and faster access to data for specific user
groups or departments, enhancing decision-making and operational efficiency.

Ques19: What are the disadvantages of Data Marts?


1) Limited Scope: Data marts focus on specific subject areas or departments, which can lead
to data silos and hinder cross-functional analysis.
2) Data Consistency Challenges: Maintaining data consistency across multiple data marts
and the central data warehouse can be challenging and may result in discrepancies.

Ques20: Difference between Data Warehouse and Data Mart.

Data Warehouse | Data Mart
1) Data warehouse is a centralised system. | 1) Data Mart is a decentralised system.
2) It is a top-down approach. | 2) It is a bottom-up model.
3) Building a data warehouse is difficult. | 3) Building a data mart is easy.
4) Here, fact constellation schema is used. | 4) Here, star schema and snowflake schema are used.
5) A data warehouse receives data from the staging area. | 5) A data mart receives data from a data warehouse.
6) Long processing time because of large data. | 6) Less processing time because of less amount of data.
7) Data Warehouse is vast in size. | 7) Data Mart is smaller than data warehouse.
8) Data Warehouse is subject-oriented in nature. | 8) Data Mart is application-oriented in nature.

Ques21: Explain the different steps in implementing a Data Mart.
1) Designing: This is the first step in implementing a data mart. It covers all the functions
from initiating the request for a data mart to gathering data about the requirements and
developing the logical and physical design of the data mart. It involves the following tasks:
a) Gathering the business and technical requirements
b) Identifying the data sources
c) Selecting the appropriate subset of data
d) Designing the logical and physical architecture of the data mart
2) Constructing: This step involves creating the physical database and the logical structures
associated with the data mart to provide fast and efficient access to the data.
It involves the following tasks:
a) Creating the physical database and logical structures such as tablespaces associated with
the data mart.
b) Creating the schema objects such as tables and indexes described in the design step.
c) Determining how to best set up the tables and access structures.
3) Populating: This step includes all of the tasks related to getting data from the source,
cleaning it up, modifying it to the right format and level of detail and moving it into the data
mart. It involves the following tasks:
a) Mapping data sources to target data structures
b) Extracting data
c) Cleansing and transforming the information
d) Loading data into the data mart
e) Creating and storing metadata

4) Accessing: This step involves putting the data to use which involves querying the data,
analyzing it, creating reports, charts and graphs and publishing them. It involves the
following tasks:
a) Setting up an intermediate layer for the front-end tool to use.
b) Setting up and managing database architectures like summarized tables
5) Managing: This step involves managing the data mart throughout its lifetime. It involves
the following tasks:
a) Providing secure access to the data.
b) Managing the growth of data.
c) Optimizing the system for better performance
d) Ensuring the availability of data even in the event of system failures

Dimensional Modelling

Ques1: Within a data warehouse, how is the data organised and represented?
Data in a data warehouse is usually multidimensional, having measure attributes (facts) and
dimension attributes (dimensions). Multidimensional data refers to data being organized and
represented in multiple dimensions. Multidimensional data is typically represented as a data
cube. This cube consists of multiple dimensions such as time, geography, product, and
customer. A data cube enables data to be modelled and viewed in multiple dimensions. This
structure allows for more complex analysis and querying compared to traditional relational
databases.
Ques2: What is a fact, fact table, dimension, dimension table?

Fact: A fact represents the numerical data or metrics that are of interest for analysis. These
are typically quantitative measures, such as sales revenue, quantity sold, or profit margin.
Facts provide the core data that analysts analyze and report on.
Fact Table: A fact table is the central table in a dimensional model. It contains the facts or
measures along with foreign keys referencing the dimension tables. Fact tables store the
quantitative information for analysis and are surrounded by dimension tables. Examples of
fact tables include sales transactions, order details, or inventory levels.
Dimension: Dimensions are the descriptive attributes that provide context to the facts. They
represent the various ways the data can be analyzed or categorized. Dimensions are typically
hierarchical and include attributes such as time, geography, product, or customer.
Dimension Table: Dimension tables contain the attributes of dimensions. Each dimension
table represents a specific dimension and includes descriptive information about that
dimension. For example, a time dimension table may contain attributes like year, month,
quarter, and day. Dimension tables provide the context and background information necessary
for analyzing the facts stored in the fact table.

Ques3: Differentiate between fact table and dimension table.


Fact Table | Dimension Table
1) Contains quantitative data or measures that represent business facts or events, such as sales revenue or quantity sold. | 1) Stores descriptive attributes that provide context to the facts, such as time, product, or location.
2) Holds numerical data and facts about a business process, often referred to as measurements or metrics. | 2) Contains textual or descriptive attributes that describe the characteristics of the data in the fact table.
3) The foreign keys of the fact table are the primary keys of the dimension tables. | 3) The dimension table contains the primary keys which are referenced by the fact table.
4) Contains numerical data types, often aggregated through various functions like sum or average. | 4) Contains descriptive data types, such as text, dates, or categorical variables.
5) Contains fewer attributes than a dimension table. | 5) Contains more attributes than a fact table.
6) Contains more records than a dimension table. | 6) Contains fewer records than a fact table.
7) The fact table forms a vertical table. | 7) The dimension table forms a horizontal table.
8) In a schema, the number of fact tables is less than the number of dimension tables. | 8) In a schema, the number of dimension tables is more than the number of fact tables.
9) Used for analysis, reporting, and aggregations in data analysis and business intelligence. | 9) Provides context and background information for analyzing the data in the fact table.
10) Pure fact table is a collection of foreign keys. | 10) Pure dimension table is a collection of primary keys.

Ques4: What is a data cube?


A data cube is a multidimensional structure used in data warehousing and OLAP (Online
Analytical Processing) systems to represent and analyze data in multiple dimensions. Unlike
traditional two-dimensional tables or spreadsheets, a data cube organizes data along multiple
dimensions, allowing for analysis across various attributes simultaneously. A data cube is
typically organized around a central theme, such as sales, finance, or customer demographics.
It is designed for easy retrieval and manipulation of data, making it suitable for decision-
making processes and business intelligence applications. Data cubes support OLAP
operations like slice, dice, drill-down, and pivot, enhancing the ability to analyze data from
different perspectives.
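
The OLAP operations named above can be sketched on a tiny cube held in a Python dictionary. The cell values, dimension names and helper functions are hypothetical. Roll-up, which is not listed above, is included as the inverse of drill-down: it aggregates detail away, while drill-down returns to the finer cells.

```python
from collections import defaultdict

# Hypothetical cube cells: (quarter, item, city) -> sales in thousands.
cube = {
    ("Q1", "phone", "Vancouver"): 10,
    ("Q1", "laptop", "Vancouver"): 20,
    ("Q2", "phone", "Vancouver"): 15,
    ("Q1", "phone", "Toronto"): 8,
    ("Q2", "laptop", "Toronto"): 12,
}
DIMS = ("time", "item", "location")


def slice_cube(cells, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return {k: v for k, v in cells.items() if k[i] == value}


def dice_cube(cells, allowed):
    """Dice: keep cells whose values fall in the allowed set for each listed dimension."""
    return {
        k: v
        for k, v in cells.items()
        if all(k[DIMS.index(d)] in vals for d, vals in allowed.items())
    }


def roll_up(cells, keep):
    """Roll-up: aggregate away every dimension not listed in keep."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for k, v in cells.items():
        totals[tuple(k[i] for i in idx)] += v
    return dict(totals)


vancouver = slice_cube(cube, "location", "Vancouver")
q1_phones = dice_cube(cube, {"time": {"Q1"}, "item": {"phone"}})
by_city = roll_up(cube, ["location"])
print(by_city)
```

A pivot would only change which dimensions are shown as rows and columns of a report; the cell values stay the same.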

Reference: https://www.javatpoint.com/data-warehouse-what-is-data-cube

2D view of Sales Data


Here we are looking at the sales data of all electronics (in thousands) per quarter in the city of
Vancouver.

3D view of Sales Data
Suppose we would like to view the sales data in dollars (in thousands) according to time, item
and location for the cities of Chicago, New York, Toronto, and Vancouver. Here, time, item and
location are the three dimensions. The 3-D data of the table are represented as a series of 2-D
tables.

Conceptually, we can represent the same data in the form of 3-D data cubes, as shown in fig:

In data warehousing, the data cubes are n-dimensional. The cuboid which holds the lowest
level of summarization is called a base cuboid. The topmost 0-D cuboid, which holds the
highest level of summarization, is known as the apex cuboid.
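
To make the base and apex cuboids concrete, the sketch below builds every cuboid of a small three-dimensional cube by grouping on each subset of the dimensions. The cell values and the helper name cuboid are assumptions for illustration. With n dimensions there are 2^n cuboids, so 8 in this case.

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("time", "item", "location")

# Hypothetical base cells: (quarter, item, city) -> sales in thousands.
base = {
    ("Q1", "phone", "Vancouver"): 10,
    ("Q1", "laptop", "Vancouver"): 20,
    ("Q2", "phone", "Toronto"): 5,
}


def cuboid(cells, keep):
    """Aggregate the base cells, keeping only the dimensions listed in keep."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for key, value in cells.items():
        totals[tuple(key[i] for i in idx)] += value
    return dict(totals)


# One cuboid per subset of dimensions, from all three down to none.
lattice = {
    keep: cuboid(base, keep)
    for n in range(len(DIMS), -1, -1)
    for keep in combinations(DIMS, n)
}

base_cuboid = lattice[DIMS]   # lowest level of summarization (3-D)
apex_cuboid = lattice[()]     # highest level of summarization (0-D): one grand total
print(len(lattice), apex_cuboid)
```

The base cuboid reproduces the detailed cells unchanged, and the apex cuboid holds a single total over all of them.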

Ques5: What is a data warehouse schema?
A data warehouse schema is a logical design that represents the structure of a data warehouse.
It defines how data is organized, stored, and accessed within the data warehouse
environment.

Ques6: Explain the different types of data warehouse schema. Also, list their advantages
and disadvantages.
a) Star Schema: It represents a multidimensional data model. It is known as star
schema because the entity-relationship diagram of this schema resembles a star with the fact
table at the center and dimension tables surrounding it.
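
A minimal star schema can be sketched with Python's sqlite3 module. The table and column names (fact_sales, dim_product, dim_location) and the sample rows are assumptions for illustration only. The fact table holds the measures plus one foreign key per dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        location_id INTEGER REFERENCES dim_location(location_id),
        units_sold INTEGER,
        revenue REAL
    );
    INSERT INTO dim_product VALUES (1, 'phone', 'electronics'), (2, 'jacket', 'clothing');
    INSERT INTO dim_location VALUES (1, 'Vancouver'), (2, 'Toronto');
    INSERT INTO fact_sales VALUES (1, 1, 3, 900.0), (1, 2, 1, 300.0), (2, 1, 2, 240.0);
    """
)

# One join per dimension is enough, because each dimension is a single table.
revenue_by_category = conn.execute(
    """
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
    """
).fetchall()
print(revenue_by_category)
```

Note that category is stored directly in dim_product; this denormalization is what keeps the query to a single join.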

Advantages
1) Star schema simplifies the queries by reducing the number of joins required, resulting in
faster query performance.
2) It enables fast aggregations and calculations, such as total items sold or revenue, making it
suitable for analytical queries.
3) Due to its simplicity, star schema is easier to maintain and modify.

Disadvantages
1) Star schema can be rigid, making it challenging to accommodate changes or additions to
the data model.
2) It may lead to data quality issues, particularly when handling denormalized data,
potentially impacting data integrity.
3) Managing a star schema with numerous dimension tables can introduce complexity,
affecting maintenance and performance.

b) Snowflake Schema: The snowflake schema is an expansion of the star schema where each
point of the star explodes into more points. It is called snowflake schema because the diagram
of snowflake schema resembles a snowflake.
Snowflaking is a method of normalizing the dimension tables in star schemas.
When we normalize all the dimension tables entirely, the resultant structure resembles a
snowflake with the fact table in the middle. The snowflake schema consists of one fact table
which is linked to many dimension tables, which can in turn be linked to other dimension tables
through many-to-one relationships. Tables in a snowflake schema are generally normalized
to the third normal form. Each dimension table represents exactly one level in a hierarchy.
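
The effect of snowflaking can be sketched by moving the product category out of the product dimension into its own table. The table names, columns and rows below are hypothetical. The same question, revenue per category, now needs one more join than it would in a star schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY,
        name TEXT,
        category_id INTEGER REFERENCES dim_category(category_id)
    );
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        revenue REAL
    );
    INSERT INTO dim_category VALUES (1, 'electronics'), (2, 'clothing');
    INSERT INTO dim_product VALUES (1, 'phone', 1), (2, 'laptop', 1), (3, 'jacket', 2);
    INSERT INTO fact_sales VALUES (1, 300.0), (2, 900.0), (3, 120.0);
    """
)

# The category name is stored once, but reaching it takes two joins.
revenue_by_category = conn.execute(
    """
    SELECT c.category_name, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_category AS c ON c.category_id = p.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
    """
).fetchall()
print(revenue_by_category)
```

This is the trade-off between the two designs: less repeated text in the dimension tables, at the cost of longer join chains.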
Advantages
1) Snowflake schema reduces data redundancy by normalizing dimension tables, leading to
efficient storage.
2) Reduced redundancy protects against inconsistencies and helps ensure accurate data
analysis.
3) The reduced disk storage requirements and smaller lookup tables can improve the
performance of some queries.
4) Snowflake schema provides structured data, enhancing data integrity and organization.

Disadvantages
1) Snowflake schema needs more maintenance and is complex to manage because it involves
additional lookup tables.
2) Snowflake schema may lead to complicated queries because of large number of joins
between tables. This can slow down performance.
3) Maintaining Snowflake schema can be expensive.

c) Fact Constellation Schema: In this schema, multiple fact tables share the same
dimension tables. It is also called a galaxy schema.
This schema describes the logical structure of a data warehouse.
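
A fact constellation can be sketched as two fact tables that reference one shared dimension table. The names fact_sales, fact_shipping and dim_product, and the sample rows, are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        units_sold INTEGER
    );
    CREATE TABLE fact_shipping (
        product_id INTEGER REFERENCES dim_product(product_id),
        units_shipped INTEGER
    );
    INSERT INTO dim_product VALUES (1, 'phone'), (2, 'laptop');
    INSERT INTO fact_sales VALUES (1, 5), (1, 2), (2, 4);
    INSERT INTO fact_shipping VALUES (1, 6), (2, 3);
    """
)

# Both fact tables are summarized against the same shared dimension.
sold_vs_shipped = conn.execute(
    """
    SELECT p.name,
           (SELECT SUM(s.units_sold) FROM fact_sales AS s
             WHERE s.product_id = p.product_id),
           (SELECT SUM(h.units_shipped) FROM fact_shipping AS h
             WHERE h.product_id = p.product_id)
    FROM dim_product AS p
    ORDER BY p.name
    """
).fetchall()
print(sold_vs_shipped)
```

Each fact table is aggregated separately before being lined up by product; joining both fact tables to the dimension in one flat join would multiply rows and inflate the totals.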

Advantages
1) Fact constellation schema allows flexible data modeling.
2) It facilitates rich analysis by providing multiple paths for analyzing data.

Disadvantages
1) Designing, implementing, and maintaining a fact constellation schema can be more
challenging compared to simpler schemas like star schema due to its intricate structure.
2) It may lead to data redundancy because of repeated dimension table which can impact
storage efficiency.
3) Multiple fact tables make galaxy schema complex.

Ques7: What is the difference between star schema and snowflake schema.
Star Schema | Snowflake Schema
1) Resembles a star, with the fact table at the center and dimension tables around it. | 1) Looks like a snowflake, with the fact table at the center, connected to dimension tables that further branch out into sub-dimensions.
2) It follows a top-down design approach. | 2) It's more complex to design in comparison to star schema.
3) Uses more space due to denormalized dimension tables. | 3) Generally uses less space due to normalized structure.
4) More data dependency and redundancy. | 4) Less data dependency and redundancy.
5) Complicated joins are not required. | 5) Complicated joins are required.

Ques8: What is the difference between snowflake schema and fact constellation schema.
Snowflake Schema | Fact Constellation Schema
1) It contains a large central fact table, dimension tables and sub-dimension tables. | 1) In this schema, multiple fact tables share the same dimension tables.
2) It consists of one star schema at a time. | 2) It consists of more than one star schema at a time.
3) It is a normalized form of star schema. | 3) It is a collection of star schemas; its dimension tables may be normalized (as in a snowflake schema) or not.
4) It is comparatively easy to operate because it has fewer joins between the tables. | 4) It is difficult to operate because of multiple joins between the tables.
5) In snowflake schema, a comparatively simple query can be used to access the data from the database. | 5) In fact constellation schema, a complex query has to be used to access the data from the database.

Ques9: What is the difference between star schema and fact constellation schema.
Star Schema | Fact Constellation Schema
1) Each dimension is represented with only one dimension table. | 1) In this schema, multiple fact tables share the same dimension tables.
2) It is easy to maintain the tables. | 2) It is difficult to maintain the tables.
3) Its dimension tables are not normalized. | 3) It is a collection of star schemas; its dimension tables may be normalized (as in a snowflake schema) or not.
4) In star schema, a simple query can be used to access the data from the database. | 4) In fact constellation schema, a complex query has to be used to access the data from the database.
5) It is easy to operate because it has fewer joins between the tables. | 5) It is difficult to operate because of multiple joins between the tables.

Ques10: What are the different types of keys used in a schema?


Refer: https://www.geeksforgeeks.org/types-of-keys-in-relational-model-candidate-super-primary-alternate-and-foreign/
https://www.geeksforgeeks.org/surrogate-key-in-dbms/
See these references for examples.

The different types of keys are:

1) Super Key: A set of one or more attributes that uniquely identifies a tuple in a relation
is called a super key. Adding zero or more attributes to a candidate key generates a
super key. A super key may include attributes that contain NULL values.
2) Candidate Key: A minimal super key, i.e. a super key from which no attribute can be
removed without losing the ability to uniquely identify a tuple or record in a table, is
called a candidate key.
3) Primary Key: A primary key is chosen from the candidate keys. It is used to uniquely
identify a record in the table. Its values are unique and it cannot contain NULL values.

4) Alternate Key: All candidate keys other than the primary key are called alternate
keys. They are also called secondary keys.
5) Foreign Key: A foreign key is an attribute, or a set of attributes, in one table that
refers to the primary key of another table. It enforces the referential integrity
constraint, which ensures that a value which appears in one relation for a given set
of attributes also appears for a certain set of attributes in another relation.
6) Composite Key: Sometimes, a table might not have a single attribute that uniquely
identifies all of its records. To uniquely identify the rows of such a table, a
combination of two or more attributes can be used; this combination is called a
composite key. A composite key can act as the primary key when no single attribute
qualifies. A badly chosen combination of attributes can still give duplicate values,
so we need to find the minimal set of attributes that uniquely identifies the rows
in the table.
7) Surrogate Key: It is also called a synthetic primary key. It is a sequential number
which is automatically generated by the database. The number is not derived from the
application data, but it is made available to the user or the application. The value
of a surrogate key cannot be modified by the user or the application. If we do not
have a natural primary key in the table, we need to artificially create a primary key
(a surrogate key) to uniquely identify a record in the table. The surrogate key is
also called a factless key: it is added just for ease of identifying unique rows and
contains no fact or information that is otherwise relevant to the table.
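
A minimal sketch of these key types, using Python's built-in sqlite3 module, may help. The student and enrolment tables, their column names and the sample rows are all invented for illustration. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE student (
    student_sk INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, chosen as primary key
    roll_no    TEXT NOT NULL UNIQUE,               -- candidate key kept as an alternate key
    name       TEXT
);
CREATE TABLE enrolment (
    student_sk INTEGER NOT NULL REFERENCES student(student_sk),  -- foreign key
    course     TEXT NOT NULL,
    PRIMARY KEY (student_sk, course)               -- composite key
);
""")

cur = conn.execute("INSERT INTO student (roll_no, name) VALUES ('R-101', 'Asha')")
asha_sk = cur.lastrowid  # the surrogate key value was generated by the database
conn.execute("INSERT INTO student (roll_no, name) VALUES ('R-102', 'Ravi')")
conn.execute("INSERT INTO enrolment VALUES (?, 'DWMM')", (asha_sk,))


def rejected(sql, params=()):
    """Return True if the database rejects the statement with an integrity error."""
    try:
        conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True


# A repeated roll number violates the alternate (UNIQUE) key.
duplicate_alternate = rejected(
    "INSERT INTO student (roll_no, name) VALUES ('R-101', 'Copy')")
# The same (student, course) pair violates the composite primary key.
duplicate_composite = rejected("INSERT INTO enrolment VALUES (?, 'DWMM')", (asha_sk,))
# A student_sk with no matching student row violates referential integrity.
dangling_foreign = rejected("INSERT INTO enrolment VALUES (999, 'DWMM')")

print(asha_sk, duplicate_alternate, duplicate_composite, dangling_foreign)
```

Every super key of student contains either student_sk or roll_no; for example, (roll_no, name) is a super key but not a candidate key, because name can be removed without losing uniqueness.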

Ques10: What is the difference between multidimensional data model and relational data model?
Multidimensional data model | Relational data model
1) Here, the data is organised in a cube-like structure. | 1) Here, the data is organised in tables with rows and columns.
2) It is suitable for analyzing large amounts of data efficiently. | 2) It is ideal for managing structured data.
3) Here, the data is in denormalized form. | 3) Here, the data is in normalized form.
4) It is mainly used for OLAP. | 4) It is mainly used for OLTP.
5) MDX language is used for querying multidimensional databases. | 5) SQL is used for querying relational databases.
6) It is designed to handle data with multiple perspectives and dimensions, allowing quick analysis from different angles. | 6) It requires defining relationships between tables using keys and foreign keys to ensure data accuracy and integrity.
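
To make the two organisations concrete, here is a small sketch in plain Python. The sales figures, the dimension names (product, region, quarter) and the helper functions slice_cube and roll_up are assumptions made for this illustration; a real multidimensional database would be queried with a language such as MDX instead.

```python
from collections import defaultdict

# Relational view: one row per fact, as it would sit in a table.
rows = [
    ("Pen", "North", "Q1", 100),
    ("Pen", "South", "Q1", 80),
    ("Pen", "North", "Q2", 120),
    ("Notebook", "North", "Q1", 50),
    ("Notebook", "South", "Q2", 70),
]

# Multidimensional view: every cell is addressed by its coordinates
# along the (product, region, quarter) dimensions.
cube = defaultdict(int)
for product, region, quarter, amount in rows:
    cube[(product, region, quarter)] += amount


def slice_cube(cube, axis, value):
    """Fix one dimension to a single value and keep the matching cells."""
    return {key: v for key, v in cube.items() if key[axis] == value}


def roll_up(cube, keep_axes):
    """Aggregate away every dimension that is not listed in keep_axes."""
    result = defaultdict(int)
    for key, v in cube.items():
        result[tuple(key[a] for a in keep_axes)] += v
    return dict(result)


q1_slice = slice_cube(cube, axis=2, value="Q1")          # only quarter Q1
sales_by_product = roll_up(cube, keep_axes=[0])          # view by product
sales_by_region_quarter = roll_up(cube, keep_axes=[1, 2])  # view by region and quarter

print(sales_by_product)
print(sales_by_region_quarter)
```

The same cube answers questions from several angles (by product, by region and quarter) without redefining joins, which is the point of row 6 in the table.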

Ques10: What is the difference between Data Warehousing and OLTP?


Data Warehouse | OLTP
1) A data warehouse serves as a centralized repository for storing structured, organized, and historical data from multiple sources for reporting and analysis. | 1) It stands for Online Transaction Processing. It allows real-time execution of many database transactions which are occurring concurrently.
2) It stores a large amount of data, including historical data. | 2) It holds current data.
3) It generally uses denormalized data. | 3) It generally uses normalized data.
4) It is regularly updated using ETL tools. | 4) It is always up to date.
5) It uses query processing. | 5) It uses transaction processing.
6) It is subject-oriented. | 6) It is application-oriented.

Ques11: What is the difference between OLAP and OLTP?


OLAP | OLTP
1) It is an online database query management system. | 1) It is an online database modifying system.
2) It consists of historical data from various databases. | 2) It consists of only current operational data.
3) It makes use of a data warehouse. | 3) It makes use of a standard DBMS.
4) It is subject-oriented. | 4) It is application-oriented.
5) In an OLAP database, the tables are not normalized. | 5) In an OLTP database, the tables are normalized.
6) It serves the purpose of extracting information for analysis and decision-making. | 6) It serves the purpose of insertion, updating and deletion of information in the database.
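
The difference in workload can be sketched with Python's built-in sqlite3 module. The account and txn_history tables and their contents are invented for this illustration, and a single in-memory database stands in for what would normally be two separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (account_id INTEGER PRIMARY KEY, balance INTEGER NOT NULL);
CREATE TABLE txn_history (account_id INTEGER, year INTEGER, amount INTEGER);
INSERT INTO account VALUES (1, 500), (2, 300);
INSERT INTO txn_history VALUES (1, 2022, 200), (1, 2023, 300), (2, 2023, 300);
""")

# OLTP-style work: a short transaction that modifies a few current rows.
with conn:  # commits on success, rolls back if an exception is raised
    conn.execute("UPDATE account SET balance = balance - 100 WHERE account_id = 1")
    conn.execute("UPDATE account SET balance = balance + 100 WHERE account_id = 2")

balances = conn.execute(
    "SELECT account_id, balance FROM account ORDER BY account_id").fetchall()

# OLAP-style work: a read-only query that summarises historical data.
yearly_totals = conn.execute(
    "SELECT year, SUM(amount) FROM txn_history GROUP BY year ORDER BY year"
).fetchall()

print(balances)
print(yearly_totals)
```

The OLTP part inserts, updates or deletes individual rows, while the OLAP part only reads and aggregates, which matches row 6 of the table.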

Ques12: What are the advantages of data warehouse?


1) It serves as a centralized repository for storing structured, organized, and historical
data from multiple sources for reporting and analysis.
2) It involves the extraction, transformation, and loading (ETL) of data into a structured
format for efficient querying and analysis.
3) It is used by organizations to support decision-making processes and strategic
planning.

4) It produces a structured and organized view of data through reports, dashboards, and
analytics tools for effective decision-making.
5) A data warehouse allows users to access critical data from a number of sources in a
single place. In this way, it saves users the time of retrieving data from multiple sources.
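
Points 2 and 5 can be pictured with a toy ETL run in Python. Everything here is assumed for illustration: the two source systems (store_sales and web_orders), their field names and formats, and the target sales table. Real ETL tools are far more elaborate.

```python
import sqlite3

# Two hypothetical source systems that describe the same kind of sale
# with different field names and formats.
store_sales = [{"sku": "P1", "date": "2024-01-05", "amount": "120.50"}]
web_orders = [{"item": "p1", "ordered_on": "05/01/2024", "total_paise": 9900}]


def extract():
    """Extract: pull the raw records from each source."""
    return store_sales, web_orders


def transform(store_rows, web_rows):
    """Transform: bring both sources to one schema (sku, ISO date, rupees, source)."""
    unified = []
    for r in store_rows:
        unified.append((r["sku"].upper(), r["date"], float(r["amount"]), "store"))
    for r in web_rows:
        day, month, year = r["ordered_on"].split("/")
        unified.append((r["item"].upper(), f"{year}-{month}-{day}",
                        r["total_paise"] / 100, "web"))
    return unified


def load(conn, rows):
    """Load: append the cleaned rows to the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales "
                 "(sku TEXT, sale_date TEXT, amount REAL, source TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(*extract()))

# One query now answers a question that spans both sources.
total_by_sku = warehouse.execute(
    "SELECT sku, SUM(amount) FROM sales GROUP BY sku").fetchall()
print(total_by_sku)
```

After loading, the user queries one table instead of visiting each source separately.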
Ques13: What are the different views of data warehouse?
The different views of a data warehouse are:-
1) Top-down view: This view allows the selection of relevant information necessary for
the data warehouse.
2) Data source view: This view presents the way in which the data is being captured,
stored and managed by the data warehouse system.
3) Data warehouse view: This view includes the fact tables and dimension tables. It
represents not only the information which is stored within the data warehouse but also
information about the source, date and time of origin of the data, which provides
historical context.
4) Business query view: This view gives the perspective of data in the data warehouse
from the viewpoint of the end user.

Ques14: Explain the three-tier architecture of a data warehouse.


Data warehouses usually have a three-tier architecture which includes:-
1) Bottom Tier: It consists of the data warehouse server, which is generally a Relational
Database Management System. It may include specialized data marts and a metadata
repository. Backend tools and utilities are used to feed data into the bottom tier. These
backend tools and utilities perform ETL and refresh operations.
2) Middle Tier: It consists of an OLAP server for fast querying of the data warehouse. The
OLAP server can be implemented in one of the following ways:-
a) Through a Relational OLAP (ROLAP) model: It is an extended relational database
management system. In ROLAP, the data is stored in a relational database. Here,
each action of slicing and dicing is equivalent to adding a “WHERE” clause to
the SQL statement. ROLAP can handle large amounts of data. The ROLAP model is
based on the star schema.
b) Through a Multidimensional OLAP (MOLAP) model: It is an array-based model
which directly implements multidimensional data and operations. Here, the storage
utilization is low if the dataset is sparse. The MOLAP model is based on a cube-based
schema. MOLAP cubes enable fast data retrieval, are optimal for slicing and dicing
and can perform complex calculations.
3) Top Tier: It contains front-end tools for displaying results provided by OLAP, as well
as additional tools for data mining of the OLAP-generated data.
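
The two middle-tier options can be sketched side by side in Python. This is only an illustration under assumed data: the sales table, the dimension values and the nested-list cube are invented, and real OLAP servers use much more sophisticated storage. The ROLAP half shows slice and dice as WHERE clauses; the MOLAP half shows why a dense array has low storage utilization when the data is sparse.

```python
import sqlite3

products, regions, quarters = ["Pen", "Notebook"], ["North", "South"], ["Q1", "Q2"]
facts = [("Pen", "North", "Q1", 100), ("Pen", "South", "Q2", 80),
         ("Notebook", "North", "Q1", 50)]

# ROLAP: facts live in a relational table; slice and dice become WHERE clauses.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, region TEXT, quarter TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", facts)

# Slice: fix a single dimension (quarter = Q1).
slice_q1 = conn.execute(
    "SELECT product, region, amount FROM sales WHERE quarter = 'Q1' ORDER BY amount"
).fetchall()
# Dice: restrict several dimensions at once.
dice = conn.execute(
    "SELECT amount FROM sales WHERE product = 'Pen' "
    "AND region IN ('North', 'South') AND quarter = 'Q1'"
).fetchall()

# MOLAP: a dense array with one cell per combination of dimension values.
cube = [[[None for _ in quarters] for _ in regions] for _ in products]
for p, r, q, amount in facts:
    cube[products.index(p)][regions.index(r)][quarters.index(q)] = amount

# A cell is reached directly by its coordinates, without scanning rows.
pen_north_q1 = cube[products.index("Pen")][regions.index("North")][quarters.index("Q1")]

total_cells = len(products) * len(regions) * len(quarters)
filled_cells = sum(cell is not None for plane in cube for row in plane for cell in row)
utilization = filled_cells / total_cells  # low when the dataset is sparse

print(slice_q1, dice, pen_north_q1, utilization)
```

Here only 3 of the 8 cells hold a value, so most of the array is empty space, while the relational table stores exactly three rows.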

Ques15: What is HOLAP?

Hybrid OLAP (HOLAP) is a data processing approach that combines the benefits of
both relational OLAP (ROLAP) and multidimensional OLAP (MOLAP) systems. In
HOLAP, data is stored using a combination of multidimensional data structures and
relational database tables. Typically, large volumes of detailed data are kept in
relational tables while aggregations are kept in a multidimensional store. This gives
the scalability of ROLAP together with the fast query performance of MOLAP.

Ques16: What are the different types of data warehouse models?


The different types of data warehouse models are:-
1) Virtual Data Warehouses: It is a view over
