Advanced Database Presentation
- A data warehouse is the electronic storage of large volumes of information by a business which
is designed for query and analysis instead of transaction processing.
- Data warehousing is the process of transforming data into information and making it available
to users in a timely manner so that it can inform their decisions.
- The concept dates back to the late 1980s, when IBM researchers Barry Devlin and Paul Murphy
developed the Business Data Warehouse. However, the concept as it is known today was defined by
Bill Inmon, who is widely regarded as the father of the data warehouse. He has written on a
variety of topics covering the building, usage, and maintenance of the warehouse and the
Corporate Information Factory.
- It merges information coming from different sources into one comprehensive database.
- By merging all of this information in one place, an organization can analyze its customers more
holistically. This helps to ensure that it has considered all the information available.
The basic features that define the data in a data warehouse are subject orientation, data
integration, time variance, non-volatility, and data granularity. Together, these features help
analysts carry out data mining, which assists in making informed decisions in an organization.
1. Subject-oriented
Unlike operational systems, the data in the data warehouse is organized around the major subjects
of the enterprise rather than around individual applications or transactions. Gathering the data
relevant to a given subject in this way is what makes the warehouse subject-oriented, and it is
particularly useful for decision making.
2. Integration
The data found within the data warehouse is integrated. Since it comes from several operational
systems, all inconsistencies must be removed. Areas that must be made consistent include naming
conventions, measurement of variables, encoding structures, physical attributes of data, and so
forth.
3. Time-variant
While operational systems reflect current values because they support day-to-day operations, data
warehouse data covers a long time horizon (up to 10 years), which means it stores historical
data. It is mainly meant for data mining and forecasting. If a user is searching for the buying
pattern of a specific customer, for example, the user needs to look at data on both current and
past purchases.
4. Nonvolatile
The data in the data warehouse is read-only: once loaded, it is not updated or deleted by users.
New data is added only through periodic loads.
5. Data granularity
- A data warehouse works as a central repository where information arrives from one or more
data sources. Data flows into a data warehouse from the transactional system and other relational
databases.
- A data warehouse works by organizing data into a schema that describes the layout and type of
data, such as integer, data field, or string. When data is ingested, it is stored in various tables
described by the schema. Query tools use the schema to determine which data tables to access
and analyze.
- This is done to provide greater insight into the performance of a company by comparing the
data consolidated from multiple heterogeneous sources.
- A data warehouse is designed to run queries and analysis on historical data derived from
transactional sources.
- Once the data has been incorporated into the warehouse, it does not change and cannot be
altered, since a data warehouse runs analytics on events that have already occurred and focuses
on changes in data over time. Warehoused data must be stored in a manner that is secure,
reliable, easy to retrieve, and easy to manage.
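
As a rough illustration of how a schema describes the layout and type of the stored data, the
sketch below uses Python's built-in sqlite3 module as a stand-in for a warehouse. The table name
sales_fact, its columns, and the sample rows are hypothetical and chosen only for illustration; a
real warehouse would normally run on a dedicated database server.

```python
import sqlite3

# In-memory SQLite database standing in for a warehouse (an assumption for illustration).
conn = sqlite3.connect(":memory:")

# The schema declares the layout and the type of each column.
conn.execute(
    """
    CREATE TABLE sales_fact (
        sale_id   INTEGER PRIMARY KEY,
        customer  TEXT NOT NULL,
        sale_date TEXT NOT NULL,  -- ISO date string, e.g. 2023-01-15
        amount    REAL NOT NULL
    )
    """
)

# Ingested data is stored in the table described by the schema.
rows = [
    (1, "alice", "2023-01-15", 120.0),
    (2, "bob", "2023-01-20", 80.0),
    (3, "alice", "2023-02-03", 50.0),
]
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", rows)

# A query tool can read the schema to learn which columns and types exist.
schema = [
    (name, col_type)
    for _, name, col_type, *_ in conn.execute("PRAGMA table_info(sales_fact)")
]

# An analytical query over the stored data: total sales per customer.
totals = dict(
    conn.execute("SELECT customer, SUM(amount) FROM sales_fact GROUP BY customer")
)
```

Here schema holds the column names with their declared types, and totals maps each customer to
the sum of that customer's sales.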
The following are the functions of data warehouse tools and utilities:
The first step is data extraction, which involves gathering large amounts of data from multiple
source points. After the data has been compiled, it goes through data cleaning, the process of
combing through the data for errors and correcting or excluding any that are found. The
cleaned-up data is then converted (data transformation) from a database format to a warehouse
format. Next, the data is sorted, consolidated, summarized, and stored (data loading) for
coordination and presentation purposes. Over time, more data is added to the warehouse as the
multiple data sources are updated (refreshing).
Note − Data cleaning and data transformation are important steps in improving the
quality of data and data mining results.
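
The functions described above can be sketched in a few lines of Python. This is a minimal
illustration only, not how any particular warehouse tool works: the two source systems, their
field names, and the sample records are all hypothetical, and the "warehouse" is just an
in-memory dictionary of total spend per customer.

```python
from collections import defaultdict

# Hypothetical records from two source systems with inconsistent conventions.
source_a = [
    {"cust": "Alice", "gender": "F", "amount_usd": "120.50"},
    {"cust": "Bob", "gender": "M", "amount_usd": "n/a"},  # unparseable amount
]
source_b = [
    {"customer_name": "alice", "sex": "female", "amount_cents": 5000},
]


def extract():
    """Extraction: gather raw records from every source, tagged by origin."""
    return [("a", rec) for rec in source_a] + [("b", rec) for rec in source_b]


def clean(tagged):
    """Cleaning: exclude records whose amount cannot be parsed as a number."""
    kept = []
    for origin, rec in tagged:
        raw = rec["amount_usd"] if origin == "a" else rec["amount_cents"]
        try:
            float(raw)
        except (TypeError, ValueError):
            continue  # drop the bad record
        kept.append((origin, rec))
    return kept


def transform(tagged):
    """Transformation: map each source layout onto one warehouse format."""
    gender_codes = {"f": "F", "female": "F", "m": "M", "male": "M"}
    rows = []
    for origin, rec in tagged:
        if origin == "a":
            name, gender = rec["cust"], rec["gender"]
            amount = float(rec["amount_usd"])
        else:
            name, gender = rec["customer_name"], rec["sex"]
            amount = rec["amount_cents"] / 100  # unify the unit of measurement
        rows.append(
            {
                "customer": name.lower(),  # unify the naming convention
                "gender": gender_codes[gender.lower()],  # unify the encoding
                "amount": amount,
            }
        )
    return rows


def load(rows, warehouse):
    """Loading: sort the rows, then consolidate them into per-customer totals."""
    for row in sorted(rows, key=lambda r: r["customer"]):
        warehouse[row["customer"]] += row["amount"]
    return warehouse


warehouse = defaultdict(float)
load(transform(clean(extract())), warehouse)

# Refreshing: a later source update passes through the same steps and is added.
new_record = {"customer_name": "Bob", "sex": "male", "amount_cents": 2500}
load(transform(clean([("b", new_record)])), warehouse)
```

The cleaning step drops Bob's first record because its amount is not a number, and the
transformation step is where the naming, encoding, and unit inconsistencies between the two
sources are resolved.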
To design an effective and efficient data warehouse, one needs to understand and analyse the
business needs and construct a business analysis framework. Different people hold different
views of the design of a data warehouse. These views are as follows:
The top-down view - allows the selection of relevant information needed for a data
warehouse.
The data source view - presents the information being captured, stored, and managed by
the operational system.
The data warehouse view - includes the fact tables and dimension tables. It represents the
information stored inside the data warehouse.
The business query view - the view of the data from the viewpoint of the end-user.
Generally, a data warehouse adopts a three-tier architecture. The three tiers are as follows.
Bottom Tier - this is the data warehouse database server. It is usually a relational database
system. Back-end tools and utilities feed data into the bottom tier; they perform the
extraction, cleaning, transformation, loading, and refreshing functions described earlier.
Middle Tier - consists of the OLAP server.
Top-Tier - the front-end client layer. This layer holds the query, reporting, analysis and
data mining tools.
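
The separation between the three tiers can be illustrated with a small Python sketch. It is a
toy under stated assumptions: an in-memory SQLite database stands in for the database server, a
single roll-up function stands in for the OLAP server, and the sales table, its columns, and the
function names are hypothetical.

```python
import sqlite3


def build_bottom_tier():
    """Bottom tier: the warehouse database (in-memory SQLite as a stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, year INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("north", 2022, 100.0), ("north", 2023, 150.0), ("south", 2023, 90.0)],
    )
    return conn


def olap_rollup(conn, dimension):
    """Middle tier: a toy OLAP-style roll-up along one dimension."""
    if dimension not in {"region", "year"}:  # only known column names are allowed
        raise ValueError(f"unknown dimension: {dimension}")
    query = (
        f"SELECT {dimension}, SUM(amount) FROM sales "
        f"GROUP BY {dimension} ORDER BY {dimension}"
    )
    return conn.execute(query).fetchall()


def report(conn, dimension):
    """Top tier: a front-end report built only on the middle tier's output."""
    return [f"{key}: {total:.2f}" for key, total in olap_rollup(conn, dimension)]


conn = build_bottom_tier()
by_region = report(conn, "region")
by_year = report(conn, "year")
```

The report function never queries the sales table directly; it only consumes what the roll-up
function returns, which mirrors how front-end tools sit on top of the OLAP server.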
TYPES OF DATA WAREHOUSES
1. Enterprise Data Warehouse
- This is a centralized warehouse where all business information from different sources and
applications is made available. It brings together the varied functional areas of an
organisation and offers a unified approach for organizing and representing data.
- Once data is stored, it can be used for analytic purposes across the organization, thereby
providing decision support across the enterprise. It also allows data to be classified by
subject and access to be granted according to those divisions.
2. Operational Data Store
- An operational data store (ODS) is a central database that provides a snapshot of the latest data
from multiple transactional systems for operational reporting. It enables organizations to
combine data in its original format from various sources into a single destination to make it
available for business reporting.
- An ODS helps integrate disparate data from multiple sources so that business operations,
analysis, and reporting can be carried out easily while the business processes are still
running.
3. Data Mart
- A data mart is a subset of the data warehouse. It focuses on storing data for a particular
functional area.
- Because a data mart is only a subset, it reduces the volume of data to be analyzed and is
easier to implement.
- It is cost effective when compared to a complete data warehouse and it is more open to change.
- It is specially designed for a particular line of business, such as sales or finance.
In an independent data mart, data can be collected directly from the sources.
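
One simple way to picture a data mart as a subset of the warehouse is a database view that
exposes only one department's rows. The sketch below assumes an in-memory SQLite database and
hypothetical table, view, and column names; in practice a data mart is often a separate physical
store rather than a view.

```python
import sqlite3

# In-memory SQLite database standing in for the full warehouse (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE warehouse_facts (department TEXT, item TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO warehouse_facts VALUES (?, ?, ?)",
    [
        ("sales", "laptop", 1200.0),
        ("finance", "audit fee", 300.0),
        ("sales", "monitor", 250.0),
    ],
)

# The sales data mart: only the subset of warehouse data for one functional area.
conn.execute(
    """
    CREATE VIEW sales_mart AS
    SELECT item, amount FROM warehouse_facts WHERE department = 'sales'
    """
)

# Analysts working on sales query the smaller mart, not the whole warehouse.
mart_rows = conn.execute("SELECT item, amount FROM sales_mart ORDER BY item").fetchall()
```

The mart contains two of the three warehouse rows, which is the sense in which it reduces the
volume of data an analyst has to work with.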
ADVANTAGES OF A DATA WAREHOUSE
A data warehouse maintains a copy of information from the source transaction systems. This
architectural complexity enables the following:
- Data integration
• Integrates data from multiple sources into a single database and data model. This
consolidates more data in a single database, so a single query engine can be used to
present data in an ODS.
• Integrate data from multiple source systems, enabling a central view across the
enterprise. This benefit is always valuable, but particularly so when the
organization has grown by merger.
• Assists the executive when doing data gathering and analysis. There are decision
support technologies that help utilize the data available in a data warehouse.
These technologies help executives to use the warehouse quickly and effectively
since the information will be readily available in the warehouse.
- Mitigate the problem of isolation-level lock contention in transaction processing systems
caused by attempts to run large, long-running analysis queries in transaction processing
databases.
- Maintain data history, even if the source transaction systems do not.
- Improve data quality, by providing consistent codes and descriptions, flagging or even
fixing bad data.
- Present the organization's information consistently.
- Provide a single common data model for all data of interest regardless of the data's
source.
- Restructure the data so that it makes sense to the business users.
- Restructure the data so that it delivers excellent query performance, even for complex
analytic queries, without impacting the operational systems.
- Add value to operational business applications, notably customer relationship
management (CRM) systems.
- Make decision-support queries easier to write.
- Organize and disambiguate repetitive data.
DISADVANTAGES
Data warehouses are relational databases that act as data analysis tools, aggregating data from
multiple departments of a business into one data store. Data warehouses are typically updated as
an end-of-day batch job, rather than being churned by real-time transactional data. Their primary
benefit is giving managers better and timelier data to make strategic decisions for the
company. However, they have some drawbacks as well.
- Cost/Benefit Ratio
A commonly cited disadvantage of data warehousing is its cost/benefit ratio. A data warehouse is
a big IT project, and like many big IT projects, it can consume a great many IT man-hours and a
large budget to produce a tool that does not get used often enough to justify the
implementation expense. That is before counting the expense of maintaining the data warehouse
and updating it as the business grows and adapts to the market.
****Point to note****
A data warehouse is not necessarily the same concept as a standard database. A database is a
transactional system that is set to monitor and update real-time data in order to have only the
most recent data available. A data warehouse is programmed to aggregate structured data over a
period of time. For example, a database might only have the most recent address of a customer,
while a data warehouse might have all the addresses that the customer has lived in for the past 10
years.
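
The address example can be sketched as follows. This is an illustration under simple
assumptions: a dictionary stands in for the transactional database, a list of dated records
stands in for the warehouse, and the customer id, addresses, and dates are made up.

```python
from datetime import date

# Transactional database: keeps only the most recent address per customer.
operational = {}
# Data warehouse: keeps every address together with the date it took effect.
warehouse_history = []


def record_address(customer_id, address, effective):
    """Overwrite the current value in the database; append to the warehouse."""
    operational[customer_id] = address
    warehouse_history.append((customer_id, address, effective))


def address_on(customer_id, when):
    """Return the address in effect on a given date, using the warehouse history."""
    candidates = [
        (effective, address)
        for cid, address, effective in warehouse_history
        if cid == customer_id and effective <= when
    ]
    return max(candidates)[1] if candidates else None


record_address(42, "1 Oak St", date(2015, 3, 1))
record_address(42, "9 Elm Ave", date(2019, 7, 15))
record_address(42, "5 Pine Rd", date(2023, 1, 10))

current = operational[42]                   # only the latest address survives here
past = address_on(42, date(2020, 1, 1))     # answerable only from the history
```

The database can say where the customer lives now, while only the warehouse can answer where
the customer lived at some earlier date.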
Businesses might warehouse data for use in exploration and data mining, looking for patterns of
information that will help them improve their business processes. A good data warehousing
system can also make it easier for different departments within a company to access each other's
data.
For example, a data warehouse might allow a company to easily assess the sales team's data and
help to make decisions about how to improve sales or streamline the department. The business
might choose to focus on its customers’ spending habits to better position its products and
increase sales.
With data warehousing, the company can gather historical data of its customers’ spending over
the past, say, 20 years and run analytics on this data. The resulting information could provide
insight into the preferences of its consumers; the time of day, month, or year with the greatest
sales; or the highest-spending customer for the year.
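
A minimal sketch of this kind of analysis is shown below. The purchase records are invented
sample data, and the two questions asked (which month has the greatest sales, and who spent the
most in a given year) are simplified versions of the analytics described above.

```python
from collections import Counter
from datetime import date

# Hypothetical historical purchases: (customer, purchase date, amount spent).
purchases = [
    ("alice", date(2022, 11, 25), 100.0),
    ("bob", date(2022, 12, 20), 150.0),
    ("alice", date(2023, 12, 22), 300.0),
    ("carol", date(2023, 6, 5), 120.0),
    ("bob", date(2023, 12, 24), 90.0),
]

# Which calendar month has the greatest total sales across all years?
sales_by_month = Counter()
for _, day, amount in purchases:
    sales_by_month[day.month] += amount
best_month = max(sales_by_month, key=sales_by_month.get)


def top_customer(year):
    """Return the highest-spending customer for a year, or None if there were no sales."""
    spend = Counter()
    for customer, day, amount in purchases:
        if day.year == year:
            spend[customer] += amount
    return max(spend, key=spend.get) if spend else None
```

Both questions depend on keeping several years of history; with only the most recent
transactions available, neither could be answered.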
Effective data storage and management are also what make processes such as initiating travel
reservations and using automated teller machines possible.