A data warehouse is a relational database designed for query and analysis rather than transaction processing. It contains historical data from multiple sources and focuses on supporting decision making through data modeling and analysis. Key characteristics of a data warehouse include being subject-oriented, integrated, time-variant, and non-volatile. There are different architectures for data warehouses including single-tier, two-tier, and three-tier architectures.

Uploaded by

Nimitha

3 Marks

1.What is Data Warehouse?

A Data Warehouse (DW) is a relational database that is designed for query and
analysis rather than transaction processing. It includes historical data derived from transaction
data from one or more sources.

A Data Warehouse provides integrated, enterprise-wide, historical data and focuses on
supporting decision-makers with data modeling and analysis.

A Data Warehouse is a collection of data for the entire organization, not only for a particular
group of users.

It is not used for daily operations and transaction processing; it is used for making decisions.

A Data Warehouse can be viewed as a data system with the following attributes:

o It is a database designed for investigative tasks, using data from various applications.
o It supports a relatively small number of clients with relatively long interactions.
o It includes current and historical data to provide a historical perspective of information.
o Its usage is read-intensive.
o It contains a few large tables.

"Data Warehouse is a subject-oriented, integrated, and time-variant store of information in
support of management's decisions."

2. What are the Characteristics of a Data Warehouse?


Subject-Oriented

Data warehouses typically provide a concise and straightforward view around a particular
subject, such as customer, product, or sales, rather than the organization's ongoing
operations as a whole.

This is done by excluding data that are not useful concerning the subject and including all data
needed by the users to understand the subject.

Integrated

A data warehouse integrates various heterogeneous data sources such as RDBMSs, flat files, and
online transaction records. Data cleaning and integration must be performed during data
warehousing to ensure consistency in naming conventions, attribute types, etc., among
different data sources.
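
The integration step described above can be sketched as follows. This is a minimal illustration, not a prescribed method: the two sources (a CRM export and a billing flat file), their field names, and the gender codes are all hypothetical.

```python
# Hypothetical source records: a CRM export and a billing flat file that
# describe the same kind of entity with different names and encodings.
crm_rows = [{"cust_id": 1, "sex": "M", "revenue_usd": 120.0}]
billing_rows = [{"CustomerNo": "2", "gender": "female", "revenue_cents": 4550}]

# Assumed mapping of the gender spellings seen in the sources to one code.
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}

def from_crm(row):
    """Map a CRM record onto the warehouse's common schema."""
    return {
        "customer_id": int(row["cust_id"]),
        "gender": GENDER_CODES[row["sex"].lower()],
        "revenue_usd": float(row["revenue_usd"]),
    }

def from_billing(row):
    """Map a billing record onto the same schema, converting cents to dollars."""
    return {
        "customer_id": int(row["CustomerNo"]),
        "gender": GENDER_CODES[row["gender"].lower()],
        "revenue_usd": row["revenue_cents"] / 100,
    }

integrated = [from_crm(r) for r in crm_rows] + [from_billing(r) for r in billing_rows]
```

After this step every record uses the same field names, the same gender codes, and the same unit of currency, whichever source it came from.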

Time-Variant

Historical information is kept in a data warehouse. For example, one can retrieve data from 3
months, 6 months, 12 months, or even further back from a data warehouse. This contrasts
with a transaction system, where often only the most current data is kept.

Non-Volatile

The data warehouse is a physically separate data store, which is transformed from the source
operational RDBMS. The operational updates of data do not occur in the data warehouse, i.e.,
update, insert, and delete operations are not performed. It usually requires only two procedures
in data accessing: initial loading of data and access to data. Therefore, the DW does not require
transaction processing, recovery, and concurrency capabilities, which allows for substantial
speedup of data retrieval. Non-volatile means that once data has entered the warehouse, it
should not change.
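
The time-variant and non-volatile properties can be illustrated with a small sketch using Python's built-in sqlite3 module. The table name, columns, and sample values are assumptions made for the example: each periodic load appends a dated snapshot, and earlier rows are never updated or deleted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_history ("
    " customer_id INTEGER, city TEXT, snapshot_date TEXT)"
)

def load_snapshot(conn, snapshot_date, rows):
    """Append a dated snapshot; existing rows are never updated or deleted."""
    conn.executemany(
        "INSERT INTO customer_history VALUES (?, ?, ?)",
        [(cid, city, snapshot_date) for cid, city in rows],
    )
    conn.commit()

def city_as_of(conn, customer_id, date):
    """Return the city recorded in the latest snapshot on or before `date`."""
    row = conn.execute(
        "SELECT city FROM customer_history"
        " WHERE customer_id = ? AND snapshot_date <= ?"
        " ORDER BY snapshot_date DESC LIMIT 1",
        (customer_id, date),
    ).fetchone()
    return row[0] if row else None

# Hypothetical data: the same customer moves between two monthly loads.
load_snapshot(conn, "2024-01-31", [(1, "Pune")])
load_snapshot(conn, "2024-02-29", [(1, "Mumbai")])
```

Because old snapshots stay in place, a query such as city_as_of(conn, 1, "2024-02-15") can still answer questions about the past, which a system that keeps only the current row cannot do.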

3. Explain the needs and benefits of a Data Warehouse?


Needs for Data warehouse:

1. Business users: Business users require a data warehouse to view summarized data
from the past. Since these users are often non-technical, the data may be presented to them
in a simple form.
2. Store historical data: A data warehouse is required to store time-variant data
from the past, which can then be used for various purposes.
3. Make strategic decisions: Some strategies depend on the data in the
data warehouse, so the data warehouse contributes to making strategic decisions.
4. Data consistency and quality: By bringing data from different sources into a
common place, the organization can enforce uniformity and
consistency in the data.
5. Quick response time: A data warehouse has to be ready for somewhat unexpected
loads and types of queries, which demands a significant degree of flexibility and a quick
response time.

Benefits of Data Warehouse:

1. Understand business trends and make better forecasting decisions.
2. Data warehouses are designed to perform well with enormous amounts of data.
3. The structure of data warehouses is easier for end-users to navigate,
understand, and query.
4. Queries that would be complex in many normalized databases can be easier to build
and maintain in data warehouses.
5. Data warehousing is an efficient method to manage demand for lots of information from
lots of users.
6. Data warehousing provides the capability to analyze a large amount of historical data.

4. Explain the Data Warehouse Architecture?

A data warehouse architecture is a method of defining the overall architecture of data
communication, processing, and presentation that exists for end-client computing within the
enterprise. Each data warehouse is different, but all are characterized by standard vital
components.

As the warehouse is populated, it must be restructured: tables are de-normalized, data is cleansed
of errors and redundancies, and new fields and keys are added to reflect the needs of the user for
sorting, combining, and summarizing data.

Data warehouses and their architectures vary depending upon the elements of an organization's
situation.

Three common architectures are:

o Data Warehouse Architecture: Basic
o Data Warehouse Architecture: With Staging Area
o Data Warehouse Architecture: With Staging Area and Data Marts

5. Explain the Data Warehouse Architecture: Basic?

Operational System

In data warehousing, an operational system is a system that is used to process the
day-to-day transactions of an organization.

Flat Files

A Flat file system is a system of files in which transactional data is stored, and every
file in the system must have a different name.

Meta Data

A set of data that defines and gives information about other data.

Meta Data is used in a Data Warehouse for a variety of purposes, including:

Meta Data summarizes necessary information about data, which can make finding and
working with particular instances of data easier. For example, author, date created,
date modified, and file size are examples of very basic document metadata.

Metadata is used to direct a query to the most appropriate data source.

Lightly and highly summarized data

This area of the data warehouse stores all the predefined lightly and highly summarized
(aggregated) data generated by the warehouse manager.
The goal of the summarized information is to speed up query performance. The
summarized data is updated continuously as new information is loaded into the
warehouse.
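
As a rough sketch of the idea, summarized data can be thought of as totals computed once at load time so that queries do not have to scan the detailed rows. The sales rows below are hypothetical, and treating daily totals as "lightly" summarized and monthly totals as "highly" summarized is an assumption made for illustration.

```python
from collections import defaultdict

# Hypothetical detailed sales facts: (date, product, amount).
sales = [
    ("2024-01-05", "pen", 10.0),
    ("2024-01-05", "book", 25.0),
    ("2024-01-20", "pen", 5.0),
    ("2024-02-02", "pen", 8.0),
]

def summarize(rows, key):
    """Total the amount column for each value produced by `key`."""
    totals = defaultdict(float)
    for row in rows:
        totals[key(row)] += row[2]
    return dict(totals)

# Lightly summarized: one total per day.
daily = summarize(sales, key=lambda r: r[0])
# Highly summarized: one total per month (the first 7 characters, "YYYY-MM").
monthly = summarize(sales, key=lambda r: r[0][:7])
```

A report that needs monthly revenue can then read the small monthly table directly instead of aggregating every detailed row at query time.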

End-User access Tools

The principal purpose of a data warehouse is to provide information to business
managers for strategic decision-making. These users interact with the warehouse
using end-user access tools.

Examples of end-user access tools include:

o Reporting and Query Tools
o Application Development Tools
o Executive Information Systems Tools
o Online Analytical Processing Tools
o Data Mining Tools

7. Explain the Data Warehouse Architecture: With Staging Area?

We must clean and process operational information before putting it into the warehouse.

We can do this programmatically, although most data warehouses use a staging area (a place
where data is processed before entering the warehouse).

A staging area simplifies data cleansing and consolidation for operational data
coming from multiple source systems, especially for enterprise data warehouses
where all relevant data of an enterprise is consolidated.
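
A staging area can be sketched with Python's built-in sqlite3 module. The table names, columns, and sample rows are hypothetical; the point is only that raw data lands in a staging table first and is cleansed there before anything reaches the warehouse table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging_orders (order_id TEXT, amount TEXT, source TEXT);
    CREATE TABLE warehouse_orders (order_id INTEGER PRIMARY KEY, amount REAL, source TEXT);
""")

# Raw rows land in staging exactly as the (hypothetical) sources delivered them:
# one has a stray space, one has an unusable amount, one is a duplicate.
raw = [("101", " 19.90", "web"), ("102", "n/a", "store"), ("101", "19.90", "web")]
conn.executemany("INSERT INTO staging_orders VALUES (?, ?, ?)", raw)

def is_number(text):
    """Return True when the text can be parsed as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False

# Cleanse in staging: skip unparseable amounts and keep one row per order id.
clean = {
    int(order_id): (float(amount), source)
    for order_id, amount, source in conn.execute("SELECT * FROM staging_orders")
    if is_number(amount)
}
conn.executemany(
    "INSERT INTO warehouse_orders VALUES (?, ?, ?)",
    [(oid, amt, src) for oid, (amt, src) in clean.items()],
)
conn.execute("DELETE FROM staging_orders")  # staging is emptied after each load
conn.commit()
```

Only the single valid, de-duplicated order reaches warehouse_orders; the malformed row never leaves the staging area.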

8. Explain the Data Warehouse Architecture: With Staging Area and Data Marts?

We may want to customize our warehouse's architecture for multiple groups within our
organization.

We can do this by adding data marts. A data mart is a segment of a data warehouse that
provides information for reporting and analysis on a section, unit, department, or operation in
the company, e.g., sales, payroll, production, etc.

The figure illustrates an example where purchasing, sales, and stocks are separated. In this
example, a financial analyst wants to analyze historical data for purchases and sales or mine
historical information to make predictions about customer behavior.

9. What are the properties of Data Warehouse Architecture?

1. Separation: Analytical and transactional processing should be kept apart as
much as possible.

2. Scalability: Hardware and software architectures should be simple to upgrade as the
data volume, which has to be managed and processed, and the number of user
requirements, which have to be met, progressively increase.

3. Extensibility: The architecture should be able to accommodate new operations and
technologies without redesigning the whole system.

4. Security: Monitoring access is necessary because of the strategic data stored in
the data warehouse.

5. Administerability: Data warehouse management should not be complicated.

10. Explain the types of Data Warehouse Architecture?

Single-Tier Architecture

Single-tier architecture is not frequently used in practice. Its purpose is to minimize the
amount of data stored; to reach this goal, it removes data redundancies.

The figure shows that the only layer physically available is the source layer. In this method, data
warehouses are virtual. This means that the data warehouse is implemented as a
multidimensional view of operational data created by specific middleware, or an intermediate
processing layer.
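
A virtual warehouse of this kind can be approximated with a database view. The sketch below uses Python's built-in sqlite3 module with a hypothetical orders table; a SQL view stands in for the middleware layer, which is a simplification rather than how any particular product implements it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL);
    INSERT INTO orders VALUES (1, 'north', 10.0), (2, 'south', 7.5), (3, 'north', 2.5);
    CREATE VIEW sales_by_region AS
        SELECT region, SUM(amount) AS total FROM orders GROUP BY region;
""")

def totals(conn):
    """Read the analytical view; no separate copy of the data is stored."""
    return dict(conn.execute("SELECT region, total FROM sales_by_region"))

before = totals(conn)
# A new operational transaction is visible to the analysis immediately,
# because the view is computed from the source table on every query.
conn.execute("INSERT INTO orders VALUES (4, 'south', 2.5)")
after = totals(conn)
```

The sketch also shows the cost of this design: every analytical query runs against the operational table, so analysis and transaction processing are not separated.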

Two-Tier Architecture

The requirement for separation plays an essential role in defining the two-tier architecture for
a data warehouse system, as shown in the figure.

Three-Tier Architecture

The three-tier architecture consists of the source layer (containing multiple source
systems), the reconciled layer, and the data warehouse layer (containing both data
warehouses and data marts). The reconciled layer sits between the source data and
the data warehouse.

The main advantage of the reconciled layer is that it creates a standard reference data
model for a whole enterprise. At the same time, it separates the problems of source
data extraction and integration from those of data warehouse population. In some
cases, the reconciled layer is also used directly to better accomplish some operational
tasks, such as producing daily reports that cannot be satisfactorily prepared using the
corporate applications, or periodically generating data flows that feed external processes
so that they benefit from cleaning and integration.

This architecture is especially useful for extensive, enterprise-wide systems. A
disadvantage of this structure is the extra storage space used by the
redundant reconciled layer. It also moves the analytical tools a little further away from
being real-time.

11. What is ETL?

The mechanism of extracting information from source systems and bringing it into the data
warehouse is commonly called ETL, which stands for Extraction, Transformation and
Loading.

The ETL process requires active input from various stakeholders, including developers,
analysts, testers, and top executives, and is technically challenging.

ETL is a recurring activity (daily, weekly, monthly) of a data warehouse system and needs to
be agile, automated, and well documented.

12. How does ETL work?

Extraction:

o Extraction is the operation of extracting information from a source system for further
use in a data warehouse environment. This is the first stage of the ETL process.
o It is one of the most time-consuming tasks in ETL.
o The source systems might be complicated and poorly documented, and thus
determining which data needs to be extracted can be difficult.
o The data has to be extracted periodically to supply all changed
data to the warehouse and keep it up to date.
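
One common way to supply only changed data on each periodic run is to remember a timestamp "watermark" from the previous run. The sketch below assumes that every source row carries a last-modified timestamp, which is not guaranteed in real source systems; the row contents are hypothetical.

```python
from datetime import datetime

# Hypothetical source rows carrying a last-modified timestamp.
source_rows = [
    {"id": 1, "name": "Asha", "modified": datetime(2024, 1, 10)},
    {"id": 2, "name": "Ravi", "modified": datetime(2024, 2, 3)},
    {"id": 3, "name": "Meena", "modified": datetime(2024, 2, 20)},
]

def extract_changed(rows, last_run):
    """Return rows modified after the previous run, plus the new watermark."""
    changed = [r for r in rows if r["modified"] > last_run]
    # If nothing changed, the watermark stays where it was.
    watermark = max((r["modified"] for r in changed), default=last_run)
    return changed, watermark

changed, watermark = extract_changed(source_rows, datetime(2024, 2, 1))
```

The returned watermark is stored and passed in as last_run on the next execution, so each run picks up only what changed since the previous one.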

CLEANSING:

The cleansing stage is crucial in a data warehouse because it is supposed to
improve data quality. The primary data cleansing features found in ETL tools are rectification
and homogenization. They use specific dictionaries to rectify typing mistakes and to recognize
synonyms, as well as rule-based cleansing to enforce domain-specific rules and define
appropriate associations between values.

The following examples show why data cleansing is essential:

If an enterprise wishes to contact its users or its suppliers, a complete, accurate, and up-to-date
list of contact addresses, email addresses, and telephone numbers must be available.

If a client or supplier calls, the staff responding should be able to find the person quickly in the
enterprise database, but this requires that the caller's name or company name is listed in
the database.

If a user appears in the database with two or more slightly different names or different account
numbers, it becomes difficult to update the customer's information.
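
The three cleansing ideas above (dictionary-based rectification, rule-based checks, and detecting near-duplicate names) can be sketched as follows. The dictionary entries, the ten-digit phone rule, and the sample customers are assumptions made for illustration only.

```python
# Hypothetical dictionary mapping known misspellings and synonyms to one form.
CITY_DICTIONARY = {"bombay": "Mumbai", "mumbay": "Mumbai", "mumbai": "Mumbai"}

def rectify_city(value):
    """Dictionary-based rectification; unknown values are only trimmed."""
    cleaned = value.strip()
    return CITY_DICTIONARY.get(cleaned.lower(), cleaned)

def valid_phone(value):
    """Rule-based check (assumed rule): exactly ten digits, ignoring separators."""
    digits = [ch for ch in value if ch.isdigit()]
    return len(digits) == 10

def normalize_name(name):
    """Collapse case and whitespace so near-duplicate names compare equal."""
    return " ".join(name.lower().split())

def deduplicate(customers):
    """Keep the first record seen for each normalized name."""
    seen = {}
    for customer in customers:
        seen.setdefault(normalize_name(customer["name"]), customer)
    return list(seen.values())

customers = [
    {"name": "Anita  Rao", "city": " bombay "},
    {"name": "anita rao", "city": "Mumbai"},
]
unique = deduplicate(customers)
```

Real ETL tools use far richer matching than exact comparison of normalized names; this sketch only shows where each kind of rule fits.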

TRANSFORMATION:

Transformation is the core of the reconciliation phase. It converts records from their
operational source format into a particular data warehouse format. If we implement a
three-layer architecture, this phase outputs our reconciled data layer.

The following points must be rectified in this phase:

o Loose text may hide valuable information. For example, XYZ PVT Ltd does not explicitly
show that this is a private limited company.
o Different formats can be used for individual data. For example, a date can be saved as a
string or as three integers.

Following are the main transformation processes aimed at populating the reconciled
data layer:

o Conversion and normalization that operate on both storage formats and units of
measure to make data uniform.
o Matching that associates equivalent fields in different sources.
o Selection that reduces the number of source fields and records.

Cleansing and Transformation processes are often closely linked in ETL tools.
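
Conversion, matching, and selection can be sketched in a few lines. The field names, the DD/MM/YYYY text format, and the sample records are hypothetical; the example shows a date arriving either as a string or as three integers and being converted to one uniform format.

```python
from datetime import date

def to_iso_date(value):
    """Conversion: accept 'DD/MM/YYYY' text or a (day, month, year) tuple."""
    if isinstance(value, str):
        day, month, year = (int(part) for part in value.split("/"))
    else:
        day, month, year = value
    return date(year, month, day).isoformat()

# Matching: equivalent fields in two hypothetical sources share one target name.
FIELD_MAP = {
    "cust_name": "customer", "client": "customer",
    "dob": "birth_date", "birthday": "birth_date",
}

def transform(record):
    """Selection drops unmapped fields; matching renames the rest."""
    out = {}
    for field, value in record.items():
        target = FIELD_MAP.get(field)
        if target is None:
            continue  # field is not needed in the warehouse
        out[target] = to_iso_date(value) if target == "birth_date" else value
    return out

a = transform({"cust_name": "Asha", "dob": "05/03/1990", "fax": "n/a"})
b = transform({"client": "Asha", "birthday": (5, 3, 1990)})
```

Both source records end up in the same shape, so downstream loading does not need to know which system a record came from.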

LOADING:

Loading is the process of writing the data into the target database. During the load
step, it is necessary to ensure that the load is performed correctly and with as few
resources as possible.

Loading can be carried out in two ways:

1. Refresh: Data warehouse data is completely rewritten. This means that older data is
replaced. Refresh is usually used in combination with static extraction to populate a
data warehouse initially.
2. Update: Only those changes applied to source information are added to the data
warehouse. An update is typically carried out without deleting or modifying preexisting
data. This method is used in combination with incremental extraction to update data
warehouses regularly.
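
The difference between the two loading modes can be shown with a minimal sketch in which a plain Python list stands in for the warehouse table; the rows are hypothetical.

```python
def refresh(warehouse, extracted):
    """Refresh: discard the old contents and rewrite everything."""
    warehouse.clear()
    warehouse.extend(extracted)

def update(warehouse, changes):
    """Update: append the changes; existing rows are left untouched."""
    warehouse.extend(changes)

warehouse = []
refresh(warehouse, [{"id": 1, "qty": 5}, {"id": 2, "qty": 3}])  # initial load
update(warehouse, [{"id": 3, "qty": 9}])                        # incremental load
```

After these two calls the warehouse holds all three rows. Calling refresh again would replace them entirely with whatever the next static extraction delivers.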

13. Explain briefly about OLAP?

Online Analytical Processing (OLAP) is a type of software tool that is used for
data analysis for business decisions. OLAP provides an environment to get insights from
data retrieved from multiple database systems at one time.
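
Two typical OLAP operations, roll-up (aggregating along chosen dimensions) and slice (fixing one dimension to a single value), can be sketched in plain Python. The fact rows and dimension names are hypothetical, and real OLAP servers use dedicated storage and query engines rather than lists of tuples.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, region, product, units sold).
facts = [
    (2023, "north", "pen", 4),
    (2023, "south", "pen", 6),
    (2024, "north", "pen", 5),
    (2024, "north", "book", 2),
]
DIMENSIONS = {"year": 0, "region": 1, "product": 2}

def roll_up(rows, *dims):
    """Total the units for each combination of the listed dimensions."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[DIMENSIONS[d]] for d in dims)] += row[3]
    return dict(totals)

def slice_(rows, dim, value):
    """Keep only the rows where one dimension has a fixed value."""
    return [row for row in rows if row[DIMENSIONS[dim]] == value]

by_year = roll_up(facts, "year")
north_by_product = roll_up(slice_(facts, "region", "north"), "product")
```

The same facts answer different questions depending on which dimensions are kept, which is what a multidimensional view of data refers to.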

OLAP Examples

Any type of Data Warehouse System is an OLAP system. The uses of the OLAP System are
described below.

o Spotify analyzes the songs its users play to build a personalized homepage of songs
and playlists.
o Netflix's movie recommendation system.

Benefits of OLAP Services

o OLAP services help in keeping data and calculations consistent.
o We can store planning, analysis, and budgeting for business analytics within one platform.
o OLAP services help in handling large volumes of data, which helps in enterprise-level
business applications.
o OLAP services help in applying security restrictions for data protection.
o OLAP services provide a multidimensional view of data, which helps in applying
operations on data in various ways.

Drawbacks of OLAP Services

o OLAP services require professionals to handle the data because of their complex modeling
procedure.
o OLAP services are expensive to implement and maintain when datasets are large.
o Data can be analyzed only after it has been extracted and transformed, which
introduces a delay.
o OLAP services are not efficient for real-time decision-making, as the data is updated only
periodically.

14. Explain briefly about OLTP?

Online transaction processing (OLTP) provides transaction-oriented applications in a 3-tier
architecture. OLTP administers the day-to-day transactions of an organization.

OLTP Examples

An example of an OLTP system is an ATM center: the person who authenticates first
receives the amount first, on the condition that the amount to be withdrawn is
available in the ATM. The uses of the OLTP System are described below.

o An ATM center is an OLTP application.
o OLTP maintains the ACID properties during data transactions via the application.
o It is also used for online banking, online airline ticket booking, sending a text message,
and adding a book to a shopping cart.
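
The ATM example can be sketched as a single atomic transaction using Python's built-in sqlite3 module. The table layout and balances are hypothetical; the sketch only illustrates atomicity, where the account and the ATM cash are either both debited or both left unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER);
    CREATE TABLE atm (id INTEGER PRIMARY KEY, cash INTEGER);
    INSERT INTO accounts VALUES (1, 500);
    INSERT INTO atm VALUES (1, 300);
""")

def withdraw(conn, account_id, amount):
    """Debit the account and the ATM together, or change nothing at all."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, account_id),
            )
            cash = conn.execute("SELECT cash FROM atm WHERE id = 1").fetchone()[0]
            if cash < amount:
                # The account debit above is undone by the rollback.
                raise ValueError("ATM does not hold enough cash")
            conn.execute("UPDATE atm SET cash = cash - ? WHERE id = 1", (amount,))
        return True
    except ValueError:
        return False
```

A call such as withdraw(conn, 1, 400) fails because the ATM holds only 300, and the account balance is left at 500. A real banking system would also check the account balance and handle concurrent sessions, which this sketch omits.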

OLTP vs OLAP

Benefits of OLTP Services

o OLTP services allow users to perform read, write, and delete operations quickly.
o OLTP services support a growing number of users and transactions while providing
real-time access to data.
o OLTP services help to provide better security by applying multiple security features.
o OLTP services support better decision-making by providing accurate,
current data.
o OLTP services provide data integrity, consistency, and high availability.

Drawbacks of OLTP Services

o OLTP has limited analysis capability, as OLTP systems are not designed for complex
analysis or reporting.
o OLTP has high maintenance costs because of frequent maintenance, backups, and
recovery.
o OLTP services are disrupted whenever there is a hardware failure, which leads
to the failure of online transactions.
o OLTP services often experience issues such as duplicate or inconsistent data.
