Big Data Analytics - 7th Sem VTU 2018 Scheme - Class 3
(15CS82)
Venugopala Rao A S
Dept. of CSE, SMVITM, Bantakal
Module 3
• Data Warehousing
• A data warehouse (DW) is an organized collection of
integrated, subject-oriented databases designed to support
decision making.
• A DW is organized in such a way as to provide clean,
enterprise-wide data in a standardized format for reports,
queries, and analysis.
• DW is physically and functionally separate from an operational
and transactional database.
• Creating a DW for analysis and queries demands a
significant investment of time and effort.
• It has to be constantly kept up-to-date for it to be useful.
• DW offers many business and technical benefits.
BDA-15CS82
• DW supports business reporting and data mining activities.
• It can facilitate distributed access to up-to-date business
knowledge for departments and functions, thus improving
business efficiency and customer service.
• DW can present a competitive advantage by facilitating
decision making and helping reform business processes.
• DW enables a consolidated view of corporate data, all cleaned
and organized.
• DW thus provides better and timely information.
• It simplifies data access and allows end users to perform
extensive analysis.
• It enhances overall IT performance by not burdening the
operational databases used by Enterprise Resource Planning
(ERP) and other systems.
• Case study:
• Indian University of Health
• Some requirements for a good DW:
• Subject oriented: To be effective, a DW should be designed
around a subject domain, i.e., to help solve a certain
category of problems.
• Integrated: The DW should include data from many functions
that can shed light on a particular subject area.
• Thus the organization can benefit from a comprehensive view
of the subject area.
• Time-variant (time series): The data in a DW should grow at
daily or other chosen intervals.
• That allows up-to-date comparisons over time.
• Nonvolatile: DW should be persistent, that is, it should not be
created on the fly from the operations databases.
• Thus, DW is consistently available for analysis, across the
organization and over time.
• Summarized: DW contains rolled-up data at the right level for
queries and analysis.
• The process of rolling up the data helps create consistent
granularity for effective comparisons.
• It also helps reduce the number of variables or dimensions of
the data to make them more meaningful for the decision
makers.
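The roll-up described above can be illustrated with a minimal sketch. The transaction rows, the store names, and the `roll_up_daily` helper are hypothetical, chosen only to show how detailed records are aggregated to a consistent daily granularity.

```python
from collections import defaultdict

# Hypothetical transaction-level rows: (date, store, amount)
transactions = [
    ("2024-01-01", "Store A", 120.0),
    ("2024-01-01", "Store A", 80.0),
    ("2024-01-01", "Store B", 50.0),
    ("2024-01-02", "Store A", 200.0),
]

def roll_up_daily(rows):
    """Aggregate transaction rows to one total per (date, store)."""
    totals = defaultdict(float)
    for date, store, amount in rows:
        totals[(date, store)] += amount
    return dict(totals)

# One row per day and store, instead of one row per transaction
daily_sales = roll_up_daily(transactions)
```

After the roll-up, every entry has the same granularity (one day, one store), so totals can be compared directly across stores and dates.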
• Not normalized: DW often uses a star schema, which is a
rectangular central table, surrounded by some look-up tables.
• The single table view significantly enhances speed of queries.
• Metadata: Many of the variables in the DW are computed from
other variables in the operational database.
• E.g.: total daily sales may be a computed field.
• The method of calculation for each such variable should be clearly
documented.
• Every element in the DW should be sufficiently well-defined.
• Near Real-time and/or right-time (active): DWs should be updated
in near real-time in many high transaction volume industries, such as
airlines.
• The cost of implementing and updating DW in real time could be
discouraging though.
• Another downside of real-time DW is the possibilities of
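
As an illustration of the star schema mentioned under "Not normalized", here is a minimal sketch using Python's built-in `sqlite3` module. The table names (`fact_sales`, `dim_product`, `dim_store`), their columns, and the sample rows are assumptions made for this example, not part of the course material.

```python
import sqlite3

# In-memory database holding a tiny, hypothetical star schema
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Look-up (dimension) tables
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, city TEXT);

    -- Central fact table referencing the look-up tables
    CREATE TABLE fact_sales (
        sale_date TEXT,
        product_id INTEGER REFERENCES dim_product(product_id),
        store_id INTEGER REFERENCES dim_store(store_id),
        amount REAL
    );

    INSERT INTO dim_product VALUES
        (1, 'Pen', 'Stationery'), (2, 'Notebook', 'Stationery'), (3, 'Lamp', 'Home');
    INSERT INTO dim_store VALUES (1, 'Udupi'), (2, 'Mangalore');
    INSERT INTO fact_sales VALUES
        ('2024-01-01', 1, 1, 10.0),
        ('2024-01-01', 2, 1, 40.0),
        ('2024-01-01', 3, 2, 300.0),
        ('2024-01-02', 1, 2, 20.0);
""")

# Typical analytical query: total sales per product category
rows = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
```

Each analytical query needs at most one join from the central table to a look-up table, which is one reason this layout tends to keep queries simple.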
• DW Development Approaches
• There are two approaches to developing DW: top down and
bottom up.
• The top-down approach is to make a comprehensive DW that
covers all the reporting needs of the enterprise.
• The bottom-up approach is to produce small data marts, for the
reporting needs of different departments or functions, as
needed.
• The smaller data marts will eventually align to deliver
comprehensive EDW capabilities.
• The top-down approach provides consistency but takes more
time and resources.
• The bottom-up approach leads to healthy local ownership and
maintainability of data.
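
The bottom-up idea, where separate data marts later align into an enterprise-wide view, can be sketched as follows. The two marts, the shared customer ID used as the key, and the `combine_marts` helper are all hypothetical; a real alignment would also need agreed definitions for shared fields.

```python
# Hypothetical departmental data marts, keyed by a shared customer ID
sales_mart = {"C1": {"total_orders": 5}, "C2": {"total_orders": 2}}
support_mart = {"C1": {"open_tickets": 1}, "C3": {"open_tickets": 4}}

def combine_marts(*marts):
    """Merge per-department attributes into one record per shared key."""
    combined = {}
    for mart in marts:
        for key, attrs in mart.items():
            combined.setdefault(key, {}).update(attrs)
    return combined

# Enterprise-wide view built from the departmental marts
enterprise_view = combine_marts(sales_mart, support_mart)
```

The merge only works because both marts use the same customer key, which hints at why consistency is easier to guarantee in the top-down approach.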
• Difference between Data Mart and Data Warehouse