Unit 2 - Data Mining and Warehousing - WWW - Rgpvnotes.in
-------------------------------------------------------------------------------------------------
Unit-II
OLAP Systems: Basic concepts, OLAP queries, Types of OLAP servers, OLAP operations etc.
Data Warehouse Hardware and Operational Design: Security, Backup and Recovery
-------------------------------------------------------------------
OLAP System:
Basic Concepts-
OLAP (Online Analytical Processing) is the technology that supports the multidimensional view of data
for many Business Intelligence (BI) applications. OLAP provides fast, consistent and efficient
access to data and is a powerful technology for data discovery, including capabilities to handle complex queries,
analytical calculations, and predictive “what if” scenario planning.
OLAP is a category of software technology that enables analysts, managers and executives to gain
insight into data through fast, consistent, interactive access in a wide variety of possible views of
information that has been transformed from raw data to reflect the real dimensionality of the
enterprise as understood by the user. OLAP enables end-users to perform ad hoc analysis of data
in multiple dimensions, thereby providing the insight and understanding they need for better
decision making.
➢ Access to aggregated data warehouse data as well as to the detail data found in operational
databases.
➢ Advanced data navigation features such as drill-down and roll-up.
➢ Rapid and consistent query response times.
➢ The ability to map end-user requests, expressed in either business or model terms, to the
appropriate data source and then to the proper data access language (usually SQL).
➢ Support for very large databases. As already explained, the data warehouse can easily and
quickly grow to multiple gigabytes and even terabytes.
3. Easy-to-Use End-User Interface:
Advanced OLAP features become more useful when access to them is kept simple. OLAP tools
have equipped their sophisticated data extraction and analysis tools with easy-to-use graphical
interfaces. Many of the interface features are “borrowed” from previous generations of data
analysis tools that are already familiar to end users. This familiarity makes OLAP easily accepted
and readily used.
4. Client/Server Architecture:
Conforming the system to the principles of client/server architecture provides a framework within
which new systems can be designed, developed, and implemented. The client/server environment
enables an OLAP system to be divided into several components that define its architecture. Those
components can then be placed on the same computer, or they can be distributed among several
computers. Thus, OLAP is designed to meet ease-of-use requirements while keeping the system
flexible.
Motivation for using OLAP
I) Understanding and improving sales: For an enterprise that has many products and uses a number
of channels for selling the products, OLAP can assist in finding the most popular products and the
most popular channels. In some cases it may be possible to find the most profitable customers.
II) Understanding and reducing costs of doing business: Improving sales is one aspect of
improving a business; the other is to analyze costs and to control them as much as possible
without affecting sales. OLAP can assist in analyzing the costs associated with sales.
Guidelines for OLAP Implementation
The following are a number of guidelines for the successful implementation of OLAP. The guidelines are
somewhat similar to those presented for data warehouse implementation.
1. Vision: The OLAP team must, in consultation with the users, develop a clear vision for the
OLAP system. This vision, including the business objectives, should be clearly defined, understood,
and shared by the stakeholders.
2. Senior management support: The OLAP project should be fully supported by the senior
managers. Since a data warehouse may have been developed
already, this should not be difficult.
3. Selecting an OLAP tool: The OLAP team should familiarize themselves with the ROLAP and
MOLAP tools available in the market. Since tools are quite different, careful planning may be
required in selecting a tool that is appropriate for the enterprise. In some situations, a combination
of ROLAP and MOLAP may be most effective.
4. Corporate strategy: The OLAP strategy should fit in with the enterprise strategy and business
objectives. A good fit will result in the OLAP tools being used more widely.
5. Focus on the users: The OLAP project should be focused on the users. Users should, in
consultation with the technical professionals, decide what tasks will be done first and what will be
done later. Attempts should be made to provide each user with a tool suitable for that person’s skill
level and information needs. A good graphical user interface should be provided to non-technical users.
The project can only be successful with the full support of the users.
6. Joint management: The OLAP project must be managed by both the IT and business
professionals. Many other people should be involved in supplying ideas. An appropriate committee
structure may be necessary to channel these ideas.
7. Review and adapt: As noted in the last chapter, organizations evolve and so must the OLAP systems.
Regular reviews of the project may be required to ensure that the project is meeting the current
needs of the enterprise.
OLTP vs. OLAP
Table 2.1: Difference between OLAP and OLTP
Example-I:
Example-II:
SELECT Time, Location, product, SUM(revenue) AS Profit
FROM sales
GROUP BY ROLLUP (Time, Location, product);
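The query first calculates the standard aggregate values specified in the GROUP BY clause. Then, it
creates progressively higher-level subtotals, moving from right to left through the list of grouping
columns. Finally, it creates a grand total. With three grouping columns this gives four levels of
aggregation: (Time, Location, product), (Time, Location), (Time), and the grand total.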
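The right-to-left subtotal behaviour described above can be sketched in plain Python. This is only an illustration of the idea, not how a database implements ROLLUP; the sample rows, the revenue figures and the function name rollup are assumptions made for the example. None plays the role that NULL plays in the SQL result, marking a column that has been rolled up.

```python
from collections import defaultdict

# Hypothetical sales rows: (time, location, product, revenue).
SALES = [
    ("Q1", "Delhi", "Phone", 100),
    ("Q1", "Delhi", "Laptop", 200),
    ("Q1", "Mumbai", "Phone", 50),
    ("Q2", "Delhi", "Phone", 150),
]


def rollup(rows, n_keys):
    """Mimic GROUP BY ROLLUP: total on all keys, then drop keys right to left."""
    totals = defaultdict(int)
    for row in rows:
        keys, value = row[:n_keys], row[n_keys]
        # depth == n_keys is the full grouping, depth == 0 is the grand total.
        for depth in range(n_keys, -1, -1):
            group = keys[:depth] + (None,) * (n_keys - depth)
            totals[group] += value
    return dict(totals)


result = rollup(SALES, 3)
for group, total in result.items():
    print(group, total)
```

Each input row contributes to one group per level, so the detail rows, the (Time, Location) subtotals, the (Time) subtotals and the grand total all come out of a single pass.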
OLAP Servers
An Online Analytical Processing (OLAP) server is based on the multidimensional data model. It
allows managers and analysts to gain insight into information through fast, consistent, and
interactive access.
Types of OLAP Servers
We have four types of OLAP servers −
➢ Relational OLAP (ROLAP)
➢ Multidimensional OLAP (MOLAP)
• In this example, the cities New Jersey and Los Angeles are rolled up into the country USA.
• The sales figures of New Jersey and Los Angeles are 440 and 1560 respectively. They
become 2000 after roll-up.
• In this aggregation process, the location hierarchy moves up from city to country.
• In the roll-up process, one or more dimensions are removed. In this example, the
Quarter dimension is removed.
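The arithmetic of this roll-up can be checked with a short Python sketch. The two sales figures come from the example above; the dictionary layout and the function name roll_up are assumptions made for illustration.

```python
from collections import defaultdict

# City-level sales from the example above.
CITY_SALES = {"New Jersey": 440, "Los Angeles": 1560}
# One step of the location concept hierarchy: city -> country.
CITY_TO_COUNTRY = {"New Jersey": "USA", "Los Angeles": "USA"}


def roll_up(sales, hierarchy):
    """Aggregate each member's value into its parent in the hierarchy."""
    totals = defaultdict(int)
    for member, amount in sales.items():
        totals[hierarchy[member]] += amount
    return dict(totals)


country_sales = roll_up(CITY_SALES, CITY_TO_COUNTRY)
print(country_sales)  # {'USA': 2000}
```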
2) Drill-down:
In drill-down, data is broken down into smaller, more detailed parts. It is the opposite of the roll-up process. It can
be done by
• Moving down the concept hierarchy
• Adding a dimension
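As a rough sketch of moving down a concept hierarchy, the Python below keeps data at month level and shows the quarter totals alongside the monthly figures behind one quarter. The sales values and the function names quarter_totals and drill_down are hypothetical and chosen only for this illustration.

```python
# Hypothetical detail data at the finest level: (quarter, month) -> sales.
MONTHLY_SALES = {
    ("Q1", "Jan"): 100,
    ("Q1", "Feb"): 120,
    ("Q1", "Mar"): 80,
    ("Q2", "Apr"): 90,
}


def quarter_totals(detail):
    """The rolled-up view: one figure per quarter."""
    totals = {}
    for (quarter, _month), amount in detail.items():
        totals[quarter] = totals.get(quarter, 0) + amount
    return totals


def drill_down(detail, quarter):
    """Return the month-level figures that make up one quarter."""
    return {month: amount for (q, month), amount in detail.items() if q == quarter}


summary = quarter_totals(MONTHLY_SALES)
q1_detail = drill_down(MONTHLY_SALES, "Q1")
print(summary)
print(q1_detail)
```

Drill-down is only possible because the finer-grained data is still stored; the summary alone could not be split back into months.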
3) Slice:
Here, one dimension is fixed to a single value, and a new sub-cube is created.
The following diagram explains how the slice operation is performed:
4) Dice:
This operation is similar to a slice. The difference is that dice selects values on two or more
dimensions, which results in the creation of a sub-cube.
5) Pivot:
Pivot rotates the data axes to provide an alternative presentation of the data.
In the following example, the pivot is based on item types.
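Slice, dice and pivot can be sketched on a tiny cube held in a Python dictionary. The cube contents, the dimension names and the function names are assumptions made for illustration; a real OLAP server stores and indexes cubes very differently.

```python
# Hypothetical cube cells: (time, location, item) -> units sold.
CUBE = {
    ("Q1", "Delhi", "Phone"): 10,
    ("Q1", "Delhi", "Laptop"): 5,
    ("Q1", "Mumbai", "Phone"): 7,
    ("Q2", "Delhi", "Phone"): 12,
    ("Q2", "Mumbai", "Laptop"): 3,
}
DIMS = ("time", "location", "item")


def slice_cube(cube, dim, value):
    """Slice: fix one dimension to a single value and drop it from the key."""
    i = DIMS.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == value}


def dice_cube(cube, allowed):
    """Dice: keep cells whose members lie in the allowed sets of several dimensions."""
    return {
        k: v
        for k, v in cube.items()
        if all(k[DIMS.index(d)] in members for d, members in allowed.items())
    }


def pivot(table):
    """Pivot: swap the two axes of a two-dimensional view."""
    return {(b, a): v for (a, b), v in table.items()}


q1 = slice_cube(CUBE, "time", "Q1")  # keyed by (location, item)
sub = dice_cube(CUBE, {"time": {"Q1", "Q2"}, "location": {"Delhi"}})
by_item = pivot(q1)  # keyed by (item, location)
```

The slice has one dimension fewer than the cube, the dice keeps all three dimensions but fewer cells, and the pivot changes only the orientation of the view, not its values.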
Network Hardware
Network Architecture
➢ Sufficient bandwidth to supply the data feed and user requirements
Impact on design
➢ User access via WAN – impacts the design of Query Manager
➢ Source system data transfer
➢ Data extractions
Example: The problem of getting the data from the source systems
The network may not deliver the data to the warehouse system early enough to allow it to be loaded, transformed,
processed and backed up within the overnight time window.
Guideline
➢ Ensure that the network architecture and bandwidth are capable of supporting the data
transfer and any data extractions in an acceptable time.
➢ The transfer of data to be loaded must be complete quickly enough to allow the rest of the
overnight processing to complete.
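A back-of-the-envelope check of this guideline can be written in a few lines of Python. The data volume, link speed, window length and the 70% efficiency factor are all assumed figures for illustration, and the function names are hypothetical; real throughput should be measured on the actual network.

```python
def transfer_hours(data_gb, link_mbps, efficiency=0.7):
    """Estimate the hours needed to move data_gb gigabytes over a link_mbps link.

    efficiency is an assumed fraction of the nominal bandwidth that is
    actually achieved once protocol overhead and contention are included.
    """
    megabits = data_gb * 8 * 1000  # decimal units: 1 GB = 8000 megabits
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


def fits_window(data_gb, link_mbps, window_hours, other_hours):
    """True if the transfer plus load, transform and backup time fits the window."""
    return transfer_hours(data_gb, link_mbps) + other_hours <= window_hours


# Assumed scenario: 100 GB nightly feed over a 100 Mbit/s link, 8-hour window.
hours = transfer_hours(100, 100)
print(round(hours, 2), "hours to transfer")
print(fits_window(100, 100, window_hours=8, other_hours=4))
print(fits_window(100, 100, window_hours=8, other_hours=5))
```

Under these assumptions the transfer alone takes a little over three hours, so the window is met only if the remaining overnight processing stays under about five hours.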
Client Hardware
Client Management
➢ Those responsible for client machine management will need to know the requirements for
that machine to access the data warehouse system.
➢ Details such as the network protocols supported on the server, and the server's Internet
address, will need to be supplied.
➢ If multiple access paths to the server system exist this information needs to be relayed to
those responsible for the client systems.
- During node failover, users may need to access a different machine address.
Client Tools
➢ The tool should not be allowed to affect the basic design of the warehouse itself.
➢ Multiple tools will be used against the data warehouse.
➢ Tools should be thoroughly tested and trialed to ensure that they are suitable for the users.
➢ Testing of the tools should ideally be performed in parallel with the data warehouse design, in order to:
✓ Expose usability issues
✓ Drive out any requirements that the tool will place on the data warehouse
Disk Technology:
RAID Technology
Redundant Array of Inexpensive Disks
➢ The purpose of RAID technology is to provide resilience against disk failure, so that the
loss of an individual disk does not mean loss of data.
➢ Striping is a technique in which data is spread across multiple disks.
➢ RAID levels 0, 1 and 5 are commercially viable and thus widely available
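The two ideas behind these levels, striping and redundancy, can be sketched in Python. This is a simplified illustration, not a real RAID layout: the chunk size, the sample data and the function names are assumptions. The first function deals chunks round-robin across disks as RAID 0 does; the second computes the XOR parity that RAID 5 uses to rebuild a lost block.

```python
from functools import reduce


def stripe(data: bytes, n_disks: int, chunk: int = 4):
    """RAID 0 style striping: deal fixed-size chunks round-robin across disks."""
    disks = [bytearray() for _ in range(n_disks)]
    for i in range(0, len(data), chunk):
        disks[(i // chunk) % n_disks] += data[i:i + chunk]
    return [bytes(d) for d in disks]


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (the RAID 5 parity calculation)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


striped = stripe(b"ABCDEFGHIJKLMNOP", n_disks=2)

# Parity across three data blocks, one per disk.
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data_blocks)

# Simulate losing the second disk and rebuilding it from the survivors plus parity.
rebuilt = xor_blocks([data_blocks[0], data_blocks[2], parity])
print(striped, rebuilt)
```

Striping alone gives no protection, since losing one disk loses part of every file; the parity block is what lets any single missing block be recomputed from the others.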
Table 2.2: RAID Level with Descriptions
a watchdog that avidly and quite tenaciously keeps vigil on your data access points. It can detect
an unwarranted and suspicious attempt at accessing data and generate an alert immediately. This
allows the people responsible for the data warehouse security to stop the intruders dead in their
tracks.
• The user can only back up the database while it is completely closed after a clean shutdown.
• Typically, the only media recovery option is to restore the whole database, which causes
the loss of all transactions since the last backup.
Best Practice-B: Use RMAN
There are many reasons to adopt RMAN (Oracle Recovery Manager). Some of the reasons to integrate RMAN into your backup
and recovery strategy are that it offers:
• Extensive reporting
• Incremental backups
• Downtime-free backups
• Backup and restore validation
• Backup and restore optimization
• Easy integration with media managers
• Block media recovery
• Archive log validation and management
• Corrupt block detection
[Under Case study]
Best Practice C: Use Read-Only Tablespaces
Best Practice D: Plan for NOLOGGING Operation
Best Practice E: Not All Tablespaces are Equally Important
-----------------------------***-----------------------------