Business Intelligence Notes
DIRECTORATE OF DISTANCE EDUCATION
BUSINESS INTELLIGENCE
CONTENTS
S. No.  Description
1. Business Intelligence: Introduction, Meaning, Purpose and Structure of Business Intelligence
Systems. Understanding Multidimensional Analysis Concepts: Attributes, Hierarchies and
Dimensions in data Analysis. Understanding Dimensional Data Warehouse: Fact Table,
Dimension Tables, Surrogate Keys and alternative Table Structure. What is multi-dimension
OLAP?
2. Understanding OLAP: Fast response, Meta-data based queries, Spread sheet formulas.
Understanding Analysis Services speed and meta-data. Microsoft’s Business intelligence
Platform. Analysis Services Tools. Data Extraction, Transformation and Load. Meaning and Tools
for the same.
3. Creating your First Business Intelligence Project: Creating Data source, Creating Data
view. Modifying the Data view. Creating Dimensions, Time, and Modifying dimensions. Parent-
Child Dimension.
4. Creating Cube: Wizard to Create Cube. Preview of Cube. Adding measure and measure groups
to a cube. Calculated members. Deploying and Browsing a Cube.
5. Advanced Measures and Calculations: Aggregate Functions. Using MDX to retrieve values
from cube. Calculation Scripting. Creation of KPI’s.
6. Advanced Dimensional Design: Creating reference, fact and many to many dimensions. Using
Financial Analysis Cubes. Interacting with a cube. Creating Standard and Drill Down Actions.
7. Retrieving Data from Analysis Services: Creating Perspectives, MDX Queries, Excel with
Analysis Services.
8. Data Mining: Meaning and purpose. Creating data for data mining. Data mining model creation.
Selecting data mining algorithm. Understanding data mining tools. Mapping Mining Structure to
Source Data columns. Using Cube Sources. Configuring Algorithm parameters.
9. Creating Data mining queries and reports: Creation of Prediction queries. Understanding
DMX language.
10. Reporting Tools: Using SQL Server Reporting Services to develop reports for analysis services.
Unit 1: Introduction to Business Intelligence
CONTENTS
Objectives
Introduction
1.1 Meaning of Business Intelligence
1.5 Summary
1.6 Keywords
Objectives
Introduction
Business Intelligence (BI) is a set of ideas, methodologies, processes, architectures, and
technologies that transform raw data into meaningful and useful information for business purposes.
Business Intelligence can handle large amounts of data to help identify and develop new
opportunities for the business. Making use of these new opportunities and applying an effective
strategy to them can provide a competitive market advantage and long-term stability.
Business Intelligence (BI) technologies provide historical, current and predictive views of
business operations. Common functions of business intelligence technologies are reporting,
online analytical processing, analytics, data mining, process mining, business performance
management, benchmarking, text mining, predictive analytics and prescriptive analytics.
Example: Suppose we have historical data for a shopping mart covering 3-6 months. In this
data we have different products with their respective specifications. Let us select one of
the products, say candles. We have three kinds of candles in this class: Candle A, Candle B
and Candle C. On studying the data, we come to know that the sale of Candle C was at its peak
among the three. A fresh and deeper study of the data shows that the sale of Candle C was
highest between 9 am and 11 am. On still deeper analysis, we conclude that this specific
candle is the one used in places of worship.
Now, let us apply Business Intelligence to this analysis. What the firm or organization can do
is stock other material used in places of worship and place it near those candles. Customers
approaching the shopping mart to purchase candles for a place of worship can then also look at
the other material and may be tempted to purchase it as well. This will surely enhance the
sales, and hence the income, of the shopping mart.
Self Assessment
1. ........................................ can handle large amounts of data to help identify and develop new
opportunities for the business.
2. BI (Business Intelligence) refers to a set of techniques which assist in ...................., digging
out and ......................... the best data from a large amount of data to improve decision
making.
Did u know? Mergers and acquisitions compounded the difficulty, since the merged companies
ran completely different systems, many of which were doing the same job.
However, businesses soon recognized the analytical value of the data that they had access to.
In fact, as enterprises automated more systems, more data became accessible. However,
collecting this data for analysis was a challenge because of the incompatibilities among
systems.
Caution: There was no simple way (and often no way) for these systems to interact.
An infrastructure was required for data exchange, collection, and analysis that could supply
a unified view of an enterprise's data. The data warehouse evolved to fulfil this need.
The concept of the data warehouse (Figure 1.1) is a single system that serves as the repository
of all of the organization's data in a form that can be efficiently analysed, so that
meaningful reports can be prepared for management and other information workers.
Figure 1.1: Data Warehouse Concept. (The figure shows source systems such as Point of Sale, Gift Registry, Inventory and Sales Promotions feeding the data warehouse through ETL, which in turn serves ad hoc queries, batch reports and dashboards.)
Source: http://www.gravic.com/shadowbase/images/uses/datawarehouse.png
• The same piece of data might reside in the databases of different systems in different
forms. A specific data item might not only be represented in different formats, but its
values might also differ from database to database. Which value is the correct one?
• Data is continually changing. How often should the data warehouse be refreshed to
reflect a reasonably current view?
• The amount of data is massive. How is it analysed and presented simply so that it is useful?
To meet these needs, a broad range of powerful tools was developed over the years and
became productized. They included:
• Extract, Transform, and Load (ETL) utilities for moving data from the diverse
data sources to the common data warehouse.
• Data-mining engines for complex predetermined analysis and ad hoc queries of the
data retained in the data warehouse.
• Reporting tools to provide management with the outcomes of the analysis in
easy-to-absorb formats.
Early on, the one common interface that was provided between the disparate systems in an
organization was magnetic tape. Tape formats were standardized, and any system could
write tapes that could be read by other systems. Thus, the first data warehouses were fed
by magnetic tapes prepared by the various systems inside the organization. However, that left
the difficulty of data disparity: the data written by the different systems reflected their own formats and conventions.
Notes: The data written to tape by one system often had little relation to the similar data written by another system.
ETL – Extract/Transform/Load
Source: http://www.gravic.com/shadowbase/images/uses/etl.png
The transform function is the key to the success of this approach. Its job is to apply a
series of rules to the extracted data so that it is properly formatted for loading into the data
warehouse. Examples of transformation rules include:
• The selection of data to load.
• The translation of encoded items (for example, 1 for male, 2 for female to M, F); a SQL sketch of such rules follows this list.
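As a minimal sketch of how these two rules might be expressed, assuming hypothetical staging and warehouse tables named stg_customer and dim_customer:

-- Rule 1: select only the rows to be loaded (here, active customers).
-- Rule 2: translate the encoded gender column (1/2) into readable codes (M/F).
INSERT INTO dim_customer (customer_id, customer_name, gender)
SELECT customer_id,
       customer_name,
       CASE gender_code WHEN 1 THEN 'M' WHEN 2 THEN 'F' ELSE 'U' END
FROM   stg_customer
WHERE  status = 'ACTIVE';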
Source: http://3.bp.blogspot.com/_tutW43y628U/TL2I-JTIFAI/AAAAAAAAAEI/mir1v2EMiTg/s1600/ETL_Global.jpg
The ETL function permits the consolidation of multiple data sources into a well-structured
database for use in complex analysis. The ETL process is performed periodically, such as daily,
weekly, or monthly, depending upon the enterprise's needs. This method is called offline ETL
because the target database is not continuously updated; it is refreshed on a periodic batch basis.
Though offline ETL serves its purpose well, it has some drawbacks:
• The data in the data warehouse is not fresh; it could be weeks old. It is useful for
strategic functions but is not especially suited to tactical use.
• The source database typically must be temporarily inactive during the extract;
otherwise, the target database is left in an inconsistent state following the load. As a
result, the applications must be shut down, often for hours.
In order to support real-time business intelligence, the ETL function must be continuous and
non-invasive; this is called online ETL and is described later. In contrast to offline ETL,
which supplies data that is not fresh but gives consistent answers to queries, online ETL
supplies current but changing answers to successive queries, since the data that it uses is
constantly being updated to reflect the current state of the business.
The ETL utilities make data collection from numerous diverse systems practical. Then, the data
needs to be converted into useful information. Some key points to remember:
• Data are simply facts, figures, and text that can be processed by a computer.
Powerful data-mining engines evolved to support complex analysis and ad hoc queries on a
data warehouse's database. Data mining looks for patterns among hundreds of seemingly
unrelated fields in a large database, patterns that reveal previously unknown trends. These
trends play a key role in strategic decision making because they disclose areas for process
improvement.
Example: Data-mining engines include those from SPSS and Oracle, which are the
foundation for OLAP (Online Analytical Processing) systems.
The knowledge created by a data-mining engine is not very useful unless it is presented easily
and clearly to those who need it. There are many formats for reporting information and
knowledge results. One of the common techniques for displaying information is the digital
dashboard (shown in Figure 1.4).
Figure 1.4: Digital dashboard
Source: http://www.powerhealthsolutions.com/images/PBR_DigitalDashboard_KPIs.png
It provides a business manager with the input necessary to drive the business towards
success. It presents the user with a graphical view of business processes. The user can then
drill down into the data at will to get more detail about a specific process. Today, many
versions of digital dashboards are available from a variety of software vendors.
As corporate-wide data warehouses came into use, it was discovered that in many situations a
full-blown data warehouse was overkill for applications. Data marts evolved to solve this
problem. A data mart is a special type of data warehouse focused on a single subject (or
functional area), such as Sales, Finance, or Marketing. Whereas data warehouses have an
enterprise-wide depth, the information in data marts pertains to a single department. The
primary use for a data mart is Business Intelligence (BI) applications. Implementing a data
mart can be less expensive than implementing a data warehouse, thus making it more practical
for the small business.
Notes A data mart can also be set up in much less time than a data warehouse.
Figure 1.5 shows the relationship between data warehouse and data mart.
Figure 1.5: Relationship between Data Warehouse and Data Mart
Source: http://www.dataprix.net/files/uploads/250image/HEFESTO%20v2_0/data%20mart%20-
%20top%20down.png
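As an illustration of how a subject-focused data mart can be derived from the warehouse, here is a minimal sketch assuming hypothetical warehouse tables named fact_sales and dim_date:

-- A Sales data mart exposed as a view over the enterprise warehouse,
-- restricted to a single subject area and summarized by month.
CREATE VIEW sales_mart AS
SELECT d.calendar_year,
       d.calendar_month,
       f.product_key,
       SUM(f.sales_amount) AS total_sales
FROM   fact_sales f
JOIN   dim_date d ON d.date_key = f.date_key
GROUP  BY d.calendar_year, d.calendar_month, f.product_key;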
Self Assessment
3. An.................................was required for data exchange, collection, and analysis that could
supply a unified view of an enterprise’s data.
4............................................utilities for the moving of data from the diverse data sources to the
common data warehouse.
5. The first data warehouses were fed by ........................... prepared by the various systems
inside the organization.
6. The data fed to the data warehouse from the ........................... was converted to a format
meaningful to the data warehouse.
7. The job of ........................................... is to apply a series of rules to extracted data so that
it is properly formatted for loading into the data warehouse.
8. The ................................ function permits the consolidation of multiple data sources into a
well-structured database for use in complex analysis.
9. In contrast to offline ETL, ................................. supplies current but varying answers to
successive queries.
10. ................................. represents a pattern that connects information and usually provides a
high degree of predictability as to what is described or what will happen next.
11. A..............................is a special type of a data warehouse focused on a single subject (or
functional area), such as Sales, Finance, or Marketing.
12. The primary use for a data mart is.................................applications.
Source: www.redbooks.ibm.com/redbooks/pdfs/sg245415.pdf
The two main query and reporting products (in IBM) are the Query Management Facility
(QMF) and Lotus Approach. QMF has been used for many years as a host-based query and
reporting tool for DB2, whereas Lotus Approach is a desktop relational DBMS that has gained
popularity due to its easy-to-use query and reporting capabilities.
If we talk about the IBM structure, its key product in the OLAP marketplace is the DB2 OLAP
Server, which implements a three-tier client/server architecture for performing complex data
analysis. The value of the DB2 OLAP Server lies in its ability to generate and manage relational
tables that contain multidimensional data.
Information Mining
Intelligent Miner by IBM is one of the few products in the market to support an external API,
allowing resultant data to be collected by other products (for example an OLAP product) for
further analysis.
Client access to warehouse and operational data from business intelligence tools requires a
client database API.
Data management offers intelligent data partitioning and parallel query and utility
processing of the data.
Example: DB2 for OS/390, DB2 for VM, DB2 for VSE and DB2 Universal Database.
Using Visual Warehouse, a data warehouse can be designed and constructed. Tools for
developing a data warehouse include components for defining the relationships between the
source data and warehouse information, transforming source data and managing warehouse
maintenance.
Task: Find out the procedure to extract data from each individual database.
Self Assessment
13. The first task BI has to do is to gather the necessary data about the business.
14. Business intelligence systems do not support the latest information technologies.
15. The value of the DB2 OLAP server lies in its ability to generate and manage relational
tables that contain multidimensional data.
Case Study
Business Intelligence Management
You are working for a sporting goods retail company. Overall sales have been
declining for the last three quarters and management is very much concerned.
Each retail store has its own individual databases that keep track of sales for
specific retail items. However, at the overall management level, only sales figures
for each store for a
Contd....
and be able to frame them into technical requirements for the BI project. You also need to educate management regarding possibilities and constraints
Questions:
1. What are the basic technical requirements for the BI project?
2. Explain the procedure to extract data from each individual database.
3. What role does a BI specialist play in the management of the company?
4. What factors will lead to a reduction of the inventory cost?
1.5 Summary
• Business Intelligence can handle large amounts of data to help identify and develop
new opportunities for the business.
• BI (Business Intelligence) refers to a set of techniques which assist in spotting, digging out
and investigating the best data from a large amount of data to improve decision making.
• An infrastructure was required for data exchange, collection, and analysis that could
supply a unified view of an enterprise’s data.
• Early on, the one common interface that was provided between the disparate systems in
an organization was magnetic tape.
• Databases configured for OLAP allowed complex analytical and ad hoc queries with
rapid execution time.
• The ETL function permits the consolidation of multiple data sources into a well-
structured database for use in complex analysis.
• In order to support real-time business intelligence, the ETL function must be
continuous and non-invasive; this is called online ETL.
• The ETL utilities make data collection from numerous diverse systems practical.
• Powerful data-mining engines evolved to support complex analysis and ad hoc
queries on a data warehouse's database.
• There are many formats for reporting information and knowledge results. One of the
common techniques for displaying information is the digital dashboard.
• Business intelligence applications provide integrated business applications, hardware,
software, and consulting services.
15. True
Books Carlo Vercellis (2011). “Business Intelligence: Data Mining and Optimization for Decision
Making”. John Wiley & Sons.
David Loshin (2012). “Business Intelligence: The Savvy Manager’s Guide”. Newnes.
http://www.hcltech.com/enterprise-transformation-services/data-warehousing-and-business-intelligence
http://www.techopedia.com/definition/24170/extract-transform-load-etl
Unit 2: Multidimensional Analysis
CONTENTS
Objectives
Introduction
2.1 Dimension Attributes
2.3 Summary
2.4 Keywords
Objectives
Introduction
In statistics and related fields, multidimensional analysis is a data analysis process that groups
data into two or more categories: data dimensions and measurements. To illustrate this, let us
take the case of cricket. A data set which comprises the number of wins for one cricket team
every year for many years could be categorized as a single-dimensional (longitudinal) data set.
Another data set which comprises the number of wins for many different cricket teams within
a single year would also be a single-dimensional (cross-sectional) data set. A single data set
that comprises the number of wins for diverse cricket teams over numerous years would be a
two-dimensional data set.
Multidimensional analysis is an informational analysis of data which takes into account
numerous distinct relationships, each of which constitutes a dimension. For example, a retail
analyst may want to understand the relationships among sales by district, by quarter, by
demographic distribution or by product. Multidimensional analysis will yield results for
these complex relationships.
Example: Some possible attributes for a product dimension could be the product
code, colour, and size.
If the dimension is defined as a hierarchy, the lower levels of the hierarchy must also have an
attribute that identifies the parent of each member. Information about each dimension is stored
in one or more dimension tables.
Each dimension contains a key attribute. Each attribute is bound to one or more columns
in a dimension table. The key attribute is the attribute in a dimension that identifies the columns
in the dimension main table that are used in foreign key relationships to the fact table.
Caution: Typically, the key attribute represents the primary key column or columns in the dimension table.
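A minimal sketch of this relationship, assuming hypothetical dim_product and fact_sales tables:

-- The key attribute of the Product dimension is bound to product_key,
-- the primary key of the dimension table. The fact table references it
-- through a foreign key relationship.
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_code VARCHAR(20),
    colour       VARCHAR(20),
    product_size VARCHAR(10)
);

CREATE TABLE fact_sales (
    product_key  INTEGER REFERENCES dim_product (product_key),
    date_key     INTEGER,
    sales_amount DECIMAL(12,2)
);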
An attribute can also be bound to one or more additional columns for a specific task.
Example: An attribute’s Name property determines the name that appears to the
user for each attribute member and this property can be bound to a calculated column in the
data source view.
Table 2.1 shows dimension attribute properties.

Table 2.1: Dimension Attribute Properties

Custom Rollup Column: Specifies the column that defines a custom rollup formula.

Custom Rollup Properties Column: Specifies the column that contains the properties of a custom rollup formula.

Default Member: Specifies a Multidimensional Expressions (MDX) expression that defines the default measure for the attribute.

Description: Contains the description of the attribute.

Discretization Bucket Count: Contains the number of buckets into which to discretize.

Discretization Method: Defines the method to use for discretization.

Estimated Count: Specifies the estimated number of members in the attribute. Until you run the Aggregation Design Wizard, the default value is zero. Either you can allow the wizard to count the number of records or you can enter an estimated value. Enter a value manually if you know the number of members and want to save the time that is required to query the database for the count. If you are working with a test subset of your production data, you can use the counts of your production data so that the aggregation design will be optimized for the production data instead of the test data.

Grouping Behaviour: A user-defined value that provides a hint to client applications on how to group attributes.

ID: Contains the unique identifier (ID) of the dimension.

Instance Selection: Provides a hint to client applications about how a list of items should be displayed, based on the expected number of items in the list. The available options are as follows:
  None - No hint is provided to the client application. This is the default value.
  Drop Down - The number of items is small enough to display in a drop-down list.
  List - The number of items is too large for a drop-down list, but does not require filtering.
  Filtered List - The number of items is large enough to require users to filter the items to be displayed.
  Mandatory Filter - The number of items is so large that the display must always be filtered.
Usage: Describes how an attribute is used. The available options are as follows:
  Regular - The attribute is a regular attribute. This is the default value.
  Key - The attribute is a key attribute.
  Parent - The attribute is a parent attribute.

Value Column: Identifies the column that provides the value of the attribute. If the Name Column element of the attribute is specified, the same Data Item values are used as default values for the Value Column element. If the Name Column element of the attribute is not specified and the Key Columns collection of the attribute contains a single Key Column element representing a key column with a string data type, the same Data Item values are used as default values for the Value Column element.
Source: http://msdn.microsoft.com/en-us/library/ms174919.aspx
Self Assessment
Source: http://oracle-bi.siebelunleashed.com/wp-content/uploads/2011/11/Time-Dim-Hierarchy.jpg
Level-based: This type of hierarchy consists of an ordered set of two or more levels.
Example: A time hierarchy might have three levels for Year, Quarter, and Month.
Level-based hierarchies can also contain parent-child relationships. These dimension
hierarchy levels allow you to perform aggregate navigation and configure level-based measure
calculations.
Figure 2.2: Level based Hierarchy Example
Source: http://gerardnico.com/wiki/_media/dat/obiee/bi_server/design/dimension/obiee_multiple_dimension_hierarchie.jpg
Notes: It also supports special types of level-based dimensions for unbalanced and skip-level
hierarchies, as well as a time dimension that provides special functionality for modelling
time series data.
Parent-child: A parent-child hierarchy is a hierarchy in a standard dimension that
contains a parent attribute. A parent attribute describes a self-referencing
relationship, or self-join, within a dimension main table. It is actually a value-based
hierarchy: it consists of values that define the hierarchy in a parent-child
relationship (Figure 2.3; a table sketch follows the figure source).
Source: http://3d.recoil.org/nojavascript/Images/Parent-Child.gif
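As a minimal sketch of such a self-referencing dimension table (a hypothetical dim_employee table is assumed):

-- parent_employee_key refers back to the same table, so the parent-child
-- hierarchy is defined by the data itself rather than by separate level columns.
CREATE TABLE dim_employee (
    employee_key        INTEGER PRIMARY KEY,
    employee_name       VARCHAR(50),
    parent_employee_key INTEGER REFERENCES dim_employee (employee_key)
);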
Example: A time hierarchy might have current-month data at the day
level, the previous month's data at the month level, and the previous 10 years'
data at the quarter level.
Example: In India, Delhi city does not belong to another state (it belongs to
Delhi as a state itself).
What matters is that users can still navigate from the country level (India) to Delhi
(city level) and below without the need for a state level.
User-defined: These are user-defined hierarchies of attributes that are used in
Microsoft SQL Server Analysis Services to arrange the members of a dimension into
hierarchical structures and provide navigation paths in a cube.
For example, Table 2.2 defines a dimension table for a time dimension.
Source: http://msdn.microsoft.com/en-us/library/ms174935.aspx
Did u know? The Year, Quarter, and Month attributes are used to construct a user-defined
hierarchy, named Calendar, in the time dimension.
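As an illustrative sketch (a hypothetical dim_time table is assumed), the Year, Quarter and Month attribute columns that feed the Calendar hierarchy are simply stored alongside each calendar date:

-- One row per calendar date; the Year, Quarter and Month columns supply
-- the levels of the user-defined Calendar hierarchy.
INSERT INTO dim_time (date_key, calendar_date, calendar_year, calendar_quarter, calendar_month)
VALUES (20240215, DATE '2024-02-15', 2024, 'Q1', 'February');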
The relationship between the levels and members of the Calendar dimension is shown in
Figure 2.4.
Figure 2.4: Relationship between the Levels and Members of the Calendar Dimension
Source: http://i.msdn.microsoft.com/dynimg/IC165224.gif
Notes: Any hierarchy other than the default two-level attribute hierarchy is called a user-defined hierarchy.
9. Skip-level is hierarchies of attributes that are used in service of Microsoft SQL Server to
arrange the members of a dimension into hierarchical structures.
10. User-defined is a hierarchy in which certain members do not have values for certain
higher levels are known as skip-level hierarchy.
11. Ragged Hierarchy consists of values that define the hierarchy in a parent-child relationship.
One of the strengths of Oracle Data Mining is the ability to mine star schemas with
minimal effort. Star schemas are commonly used in relational databases, and they often
contain rich data with interesting patterns. While dimension tables may contain interesting
demographics, fact tables will often contain user behaviour, such as phone usage or purchase
patterns. Both of these aspects - demographics and usage patterns - can provide insight into
behaviour.
Churn is a critical problem in the telecommunications industry, and companies go to great
lengths to reduce the churn of their customer base. One case study describes a
telecommunications scenario involving understanding, and identification of, churn, where the
underlying data is present in a star schema. That case study is a good example for
demonstrating just how natural it is for Oracle Data Mining to analyse a star schema, so it
will be used as the basis for this series of posts.
The case study schema includes four tables: CUSTOMERS, SERVICES,
REVENUES, and CDR_T. The CUSTOMERS table contains one row per customer,
as does the SERVICES table, and both contain a customer id that can be used
to join the tables together. Most data mining tools are capable of handling this
type of data, where one row of input corresponds to one case for mining. The
other two tables have multiple rows for each customer. The CDR_T (call data
records) table contains multiple records for each customer which captures
calling behaviour. In the case study, this information is already pre-aggregated
by type of call (peak, international, etc.) per month, but the information may
also be available at a finer level of granularity. The REVENUES table contains
the revenue per customer on a monthly basis for a five month history, so there
are up to five rows per customer. Capturing the information in the CDR_T and
REVENUES table to help predict churn for a single customer requires collapsing
all of this fact table information into a single “case” per customer. Most tools
will require pivoting the data into columns, which has the drawbacks of
densifying data as well as pivoting data beyond column count limitations. The
data in a fact table is often stored in sparse form (this case study aggregates it
to a denser form, but it need not be this way for other mining activities), and
keeping it in sparse form is highly desirable.
For fact table data that has a much larger number of interesting groups (such as per-product
sales information of a large retailer), retaining the sparse format becomes
critical to avoid densification of such high cardinality information. Oracle Data
Mining algorithms are designed to interpret missing entries in a sparse fact table
appropriately, enabling increased performance and simpler transformation
processing.
Some steps in the referenced case study are not completely defined (in my opinion),
and in those situations I will take my best guess as to the intended objective. This
approximation is sufficient since the intent of this series of posts is to show the power
and flexibility of Oracle Data Mining on a real-world scenario rather than to match
the case study letter-for-letter.
The following files support reproduction of the results in this series of posts:
telcoloadproc.plb - Obfuscated SQL which creates the procedure that can generate data
and populate the tables - all data is generated, and patterns are injected to make it
interesting and “real-world” like
telcoprep.sql - A SQL create view statement corresponding to the data preparation
steps from part 2 of this series
telcomodel.sql - A SQL script corresponding to the steps from part 3 of this series
In order to prepare a schema that can run the above SQL, a user must be created with the
following privileges: create table, create view, create mining model, and create procedure
(for telcoloadproc), as well as any other privs as needed for the database user (e.g., create
session). Once the schema is prepared, telcoddl.sql and telcoloadproc.plb can be run to
create the empty tables and the procedure for loading data. The procedure that is created
is named telco_load, and it takes one optional argument - the number of customers
(default 10000). The results from parts 2 and 3 of this series correspond to loading 10,000
customers.
The sample code in these posts has been tested against an 11gR2 database. Many new
features have been added in each release, so some of the referenced routines and syntax
are not available in older releases; however, similar functionality can be achieved with
10g. The following modified scripts can be used with 10g (tested with 10gR2):
telcoprep_10g.sql - A SQL create view statement corresponding to the data preparation
steps from part 2 of this series, including substitution for the 11g PIVOT syntax and
inclusion of manual data preparation for nested columns.
telcomodel_10g.sql - A SQL script corresponding to the steps from part 3 of this series,
including substitution of the Generalized Linear Model algorithm for 10g Support Vector
Machine, manual data preparation leveraging the transformation package, use of
dbms_data_mining.apply instead of 10gR2 built-in data mining scoring functions, explicit
commit of settings prior to build, and removal of the EXPLAIN routine from the script
flow.
In addition, the create mining model privilege is not available in 10g.
1. Handling missing values for call data records: The CDR_T table records the
number of phone minutes used by a customer per month and per call type (tariff).
For example, the table may contain one record corresponding to the number of peak
(call type) minutes in January for a specific customer, and another record associated
with international calls in March for the same customer. This table is likely to be
fairly dense (most type-month combinations for a given customer will be present)
due to the coarse level of aggregation, but there may be some missing values.
Missing entries may occur for a number of reasons: the customer made no calls
of a particular type in a particular month, the customer switched providers
during the timeframe, or perhaps there is a data entry problem. In the first
situation, the correct interpretation of a missing entry would be to assume that
the number of minutes for the type-month combination is zero. In the other
situations, it is not appropriate to assume zero, but rather derive some
representative value to replace the missing entries. The referenced case study
takes the latter approach. The data is segmented by customer and call type, and
within a given customer-call type combination, an average number of minutes is
computed and used as a replacement value.
In SQL, we need to generate additional rows for the missing entries and
populate those rows with appropriate values. To generate the missing rows,
Oracle’s partition outer join feature is a perfect fit.
select cust_id, cdre.tariff, cdre.month, mins
from cdr_t cdr partition by (cust_id) right outer join
(select distinct tariff, month from cdr_t) cdre
on (cdr.month = cdre.month and cdr.tariff = cdre.tariff);
I have chosen to use a distinct on the CDR_T table to generate the set of values, but
a more rigorous and performant (but less compact) approach would be to explicitly
list the tariff-month combinations in the cdre inlined subquery rather than go
directly against the CDR_T table itself.
Now that the missing rows are generated, we need to replace the missing value
entries with representative values as computed on a per-customer-call type basis.
Oracle’s analytic functions are a great match for this step.
select cust_id, tariff, month,
nvl(mins, round(avg(mins) over (partition by cust_id, tariff))) mins
from (<prev query>);
We can use the avg function, and specify the partition by feature of the over clause
to generate an average within each customer-call type group. The nvl function will
replace the missing values with the tailored, computed averages.
2. Transposing Call Data Records: The next transformation step in the case study
involves transposing the data in CDR_T from a multiple row per customer format to
a single row per customer by generating new columns for all of the tariff-month
combinations. While this is feasible with a small set of combinations, it will be
problematic when addressing items with higher cardinality. Oracle Data Mining
does not need to transpose the data. Instead, the data is combined using Oracle’s
object-relational technology so that it can remain in its natural, multi-row format.
Oracle Data Mining has introduced two data types to capture such data -
DM_NESTED_NUMERICALS and DM_NESTED_CATEGORICALS.
In addition, the case study suggests adding an attribute which contains the total
number of minutes per call type for a customer (summed across all months).
Oracle’s rollup syntax is useful for generating aggregates at different levels of
granularity.
select cust_id,
cast(collect(dm_nested_numerical(tariff||'-'||nvl(month,'ALL'),mins)) as
dm_nested_numericals) mins_per_tariff_mon from
(select cust_id, tariff, month, sum(mins) mins
Contd....
Once the data is generated by the inner query, there is an outer group by on cust_id
with the COLLECT operation. The purpose of this step is to generate an output of
one row per customer, but each row contains an entry of type
DM_NESTED_NUMERICALS. This entry is a collection of pairs that capture the
number of minutes per tariff-month combination.
4. Creating Derived Attributes: The final transformation step in the case study is to
generate some additional derived attributes, and connect everything together so
that each customer is composed of a single entity that includes all of the attributes
that have been identified to this point.
The PIVOT operation is used to generate named columns that can be easily
combined with arithmetic operations. Binning and filtering steps, as identified in the
case study, are included in the above SQL.
The query can execute in parallel on SMPs, as well as MPPs using Oracle’s RAC
technology. The data can be directly fed to Oracle Data Mining without having to
extract it from the database, materialize copies of any parts of the underlying tables,
or pivot data that is in a naturally multi-row format.
Questions:
1. How were the missing values handled for call data records?
2.4 Keywords
Hierarchy: A hierarchy is a set of parent-child relationships between attributes within a
dimension.
Key attribute: The key attribute is the attribute in a dimension that identifies the columns in the
dimension main table that are used in foreign key relationships to the fact table.
Level-based: This type of hierarchy consists of an ordered set of two or more levels.
8. Explain the relationship between the levels and members of the calendar dimension.
5. True 6. True
7. False 8. False
Books Carlo Vercellis (2011). “Business Intelligence: Data Mining and Optimization for Decision
Making”. John Wiley & Sons.
David Loshin (2012). “Business Intelligence: The Savvy Manager’s Guide”. Newnes.
Unit 3: Dimensional Data Warehouse
CONTENTS
Objectives
Introduction
3.1 Dimensional Model
3.5.1 MOLAP
3.5.2 ROLAP
3.5.3 HOLAP
3.6 Summary
3.7 Keywords
Objectives
Introduction
Dimensions are a common way of analysing data. A dimensional model comprises a fact
table and numerous dimension tables and is used for assessing summarized data.
Dimensional data modelling is the preferred modelling technique in a BI environment.
Knowing the basics of data warehousing and dimensions helps you design a better data
warehouse that fits your reporting
needs. This unit on data warehousing dimensions explains the importance of dimensions and
dimension granularity and stresses the importance of flattening hierarchies, with the goal
being to make data more accessible and useful to users. It also focuses on fact and dimension
tables.
Example: Product, Region and Time are the axes of enquiry of the Sales detail.
One such enquiry could be a scenario where the user needs to see the Sales (in dollars)
for a specific item in a market over a specific span of time. In this case, we are calculating
the fact (Sales) over three dimensions (Product, Region and Time). Thus we can say that
dimensions give different views of the facts. They give structure to the otherwise unstructured
facts.
It typically contains the attributes for the SQL answer set. Figure 3.1 shows an example of a
dimensional model.
Source: www.oedewaldt.com/movies/dimensional%20modeling.pptý
Caution: Facts are the measurements associated with fact table records at fact table granularity.
Figure 3.2 displays how the Sales detail table is connected in one-to-many relationships with
other dimension tables.
Figure 3.2: Sales Details Table One-to-many Relationship
Source: http://2.bp.blogspot.com/-JR3HxkgK6w0/Ti_PS_Egw3I/AAAAAAAAAMQ/tOdQBrMG3FU/s1600/Star_Model.JPG
• Additive - Measures that can be added across any dimension are additive measures (a query sketch follows this list).
• Semi Additive - Measures that can be added across only some dimensions are semi-additive.
• Non Additive - Measures that cannot be added across any dimension are non-additive.
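As a minimal sketch, assuming hypothetical fact_sales, dim_date and dim_product tables, an additive measure such as a sales amount can be summed across any combination of dimensions:

-- sales_amount is additive: it can be rolled up by product, by year,
-- or by both, and the totals remain meaningful.
SELECT d.calendar_year,
       p.product_code,
       SUM(f.sales_amount) AS total_sales
FROM   fact_sales f
JOIN   dim_date    d ON d.date_key    = f.date_key
JOIN   dim_product p ON p.product_key = f.product_key
GROUP  BY d.calendar_year, p.product_code;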
• Transactional: A transactional table is the most basic and fundamental type of fact
table. The grain associated with a transactional fact table is usually specified as one row
per line in a transaction, e.g., every line on a receipt represents a transaction.
• Periodic Snapshots: It takes a picture of the moment, where the moment could be
anything, such as the performance summary of a salesman over the previous 3 months. A periodic
snapshot table is dependent on the transactional table.
• Accumulating Snapshots: This type of fact table shows the activity of a process that has a
well-defined beginning and end, as in the example below.
Example: The processing of an order where an order moves through specific steps until
it is completed.
As steps towards fulfilling the order are completed, the row associated with
it is updated in the fact table. This type of table often has multiple date columns,
each representing a completed step in the process. Therefore, it is important to have an
entry in the date dimension that represents an unknown date, as many of the milestone
completion times are unknown at the time the row is created.
Self Assessment
4. Measures that can be added across only some dimensions are ..............................
Example: The Client dimension can contain attributes like C_No., Area, State,
Country etc.
Did u know? In a dimensional table, columns can be used to categorize the information
into hierarchical levels.
For example, a dimension table for stores in the StandardMart sample database includes the
following columns:
Table 3.1: Sample Dimension Table
Column Description
store_country Specifies the country or region in which the store is located. This is the
country level of the hierarchy.
store_state Specifies the state in which the store is located. This is the state level
of the hierarchy.
store_city Specifies the city or province in which the store is located. This is the
city level of the hierarchy.
store_id Specifies the individual store. This is the lowest level of the hierarchy.
This field contains the primary key of the store dimension table and is
used to join the dimension table to the fact table.
store_name Specifies the name of the store. The values in this column are used to
identify the store to users in a readable form.
Source: http://msdn.microsoft.com/en-us/library/aa905979(v=sql.80).aspx
Self Assessment
Example: Say for the employee 'Emp1' the business unit changes from B1 to B2.
Now, if you use the natural primary key 'Emp1' for your employees within your data
warehouse, then everything would be allocated to business unit 'B2', even what actually belongs
to 'B1'.
If you use surrogate keys, you could instead create a new record for the employee
'Emp1' in your Employee dimension, with a new surrogate key, on the day the change takes effect.
Figure 3.3: Surrogate Key Example
Source: http://mahaveersingh.files.wordpress.com/2012/05/surrogate_key_blog_banner1.jpg
This way, in your fact table, your old data (i.e. rows loaded before the day of the change)
carries the SID of the employee 'Emp1' that points to 'B1', while all new data (i.e. rows
loaded after the change) takes the new SID of the employee 'Emp1' that points to 'B2'.
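A minimal sketch of this idea, assuming a hypothetical dim_employee_scd table in which emp_sid is the surrogate key and emp_id the natural key:

-- Two rows for the same natural key 'Emp1': the old row preserves history
-- under business unit B1, the new row (new surrogate key) points to B2.
INSERT INTO dim_employee_scd (emp_sid, emp_id, business_unit, valid_from)
VALUES (1001, 'Emp1', 'B1', DATE '2023-01-01');

INSERT INTO dim_employee_scd (emp_sid, emp_id, business_unit, valid_from)
VALUES (2417, 'Emp1', 'B2', DATE '2024-06-01');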
• Immutability: Surrogate keys do not change while the row exists, so applications
cannot lose their reference in the database.
In these cases, usually a new attribute should be added to the natural key (for example, an
old_company column). In the case of a surrogate key, only the table that defines the
surrogate key must be altered, but in the case of natural keys, all tables that use the
natural key will have to change.
• Uniformity: When every table has a uniform surrogate key, some tasks can be
easily automated by composing the code in a table-independent way.
Example: The keys that are intended to be used in some column of some table might
be designed to “look differently from” those that are intended to be used in another column or
table, thereby simplifying the detection of application errors in which the keys have been
misplaced.
But surrogate keys also come with some disadvantages. The values of surrogate keys have
no relationship with the real-world meaning of the data held in a row. Therefore, over-use
of surrogate keys leads to the problem of disassociation and creates unnecessary ETL burden
and performance degradation.
Query optimization also becomes difficult when one disassociates the surrogate key from the
natural key. This is because when the surrogate key takes the place of the primary key, a unique
index is applied on that column, and any query based on the natural key identifier leads to a
full table scan as that query cannot take advantage of the unique index on the surrogate key.
Caution: Every fact record must have a related record in every dimension table
used with that particular fact table.
Shared Dimensions: To maintain consistency, shared dimension tables are created.
These tables are used by all components and data marts in the data warehouse.
Auxiliary Table
This table is created with the SQL statements CREATE AUXILIARY TABLE and is used
to hold the data for a column that is defined in a base table.
Base Table
The most common type of table is the base table. You can create a base table with the
SQL CREATE TABLE statement. All programs and users that refer to this type of
table refer to the same description of the table and to the same instance of the
table.
Clone Table
A table that is structurally identical to a base table is known as clone table. You can
create a clone table by using an ALTER TABLE statement for the base table that
includes an ADD CLONE clause.
Empty Table
A table with zero rows is an empty table.
History Table
A history table is used by Database to store historical versions of rows from the
associated system period temporal table.
Materialized Query Table
Materialized query tables are useful for complex queries that run on large amounts of data.
Notes: They are commonly used in data warehousing and business intelligence applications.
Result Table
A table that contains a set of rows that a database selects or generates, directly or
indirectly, from one or more base tables in response to an SQL statement is known
as result table. A result table is not an object that you can define using a CREATE
statement.
Temporal Table
A temporal table is a table that records the period of time when a row is valid.
Temporary Table
A table that is defined by the SQL statement CREATE GLOBAL TEMPORARY TABLE or
DECLARE GLOBAL TEMPORARY TABLE is a temporary table. It is used to hold data
temporarily.
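As an illustrative sketch (DB2-style syntax is assumed here), a session-scoped temporary table for intermediate results could be declared as follows:

-- Holds intermediate rows for the current session only; the rows are
-- discarded when the session ends.
DECLARE GLOBAL TEMPORARY TABLE SESSION.TEMP_SALES
(
    STORE_ID INTEGER,
    AMOUNT   DECIMAL(12,2)
) ON COMMIT PRESERVE ROWS;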
XML Table
It is a special table that holds only XML data. When you create a table with an XML
column, database implicitly creates an XML table space and an XML table to store
the XML data.
Self Assessment
Did u know? In the OLAP world, there are mainly two different types:
Multidimensional OLAP (MOLAP) and Relational OLAP (ROLAP). Hybrid OLAP
(HOLAP) is a combination of MOLAP and ROLAP.
3.5.1 MOLAP
Source: http://www.executionmih.com/dipm_images/ZCA-MOLAP.GIF
This method stores the data in multi-dimensional arrays which is different from the two
dimensional relational structure.
Advantages:
• MOLAP cubes are built for fast data retrieval and are thus optimal for slicing operations.
Disadvantages:
• MOLAP is limited in the amount of data it can handle because all the calculations are
performed when the cube is built.
• Cube technology generally does not already exist in the organization; therefore, to adopt
MOLAP technology, additional investment in people and capital is likely to be needed.
3.5.2 ROLAP
This methodology depends on manipulating the data stored in the relational database. The
detail-level values are kept in the relational data warehouse.
Advantages:
• ROLAP can leverage functionalities inherent in the relational database as they sit on top
of the relational database.
Disadvantages:
• In ROLAP the performance can be slow. Since a ROLAP report is essentially a SQL query
on the relational database, the query time can be long if the underlying data size is large.
• ROLAP can be limited by SQL functionality. Because ROLAP technology mainly relies on
SQL statements, and SQL statements do not fit all needs (for example, it is not easy to express
complex calculations in SQL), what ROLAP can do is traditionally limited by what SQL can do.
The sketch below shows what a typical ROLAP-style query looks like.
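As a minimal sketch (hypothetical table names assumed), a typical ROLAP report is simply an aggregation query run directly against the relational star schema:

-- The "slice and dice" is expressed as grouping and filtering in SQL,
-- computed on the fly against the relational tables.
SELECT s.store_state,
       d.calendar_quarter,
       SUM(f.sales_amount) AS total_sales
FROM   fact_sales f
JOIN   dim_store s ON s.store_key = f.store_key
JOIN   dim_date  d ON d.date_key  = f.date_key
GROUP  BY s.store_state, d.calendar_quarter;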
3.5.3 HOLAP
HOLAP technologies combine the advantages of MOLAP and ROLAP. The first product to
provide HOLAP storage was Holos but with time the technology also became available in
other commercial products such as Microsoft Analysis Services (MAS), Oracle Database OLAP
Option, MicroStrategy etc.
Self Assessment
21. can leverage functionalities inherent in the relational database as they sit
on top of the relational database.
Case Study
Lolopop: Automated Data Warehouse
The essential concept of a data warehouse is to provide the ability to gather data into
optimized databases without regard for the generating applications or platforms.
Data warehousing can be formally defined as "the coordinated, architected, and
periodic copying of data from various sources into an environment optimized for analytical
and informational processing".
The Challenge
Meaningful analysis of data requires us to unite information from many sources in many forms, including images, text, audio/video recordings and databases.
New sources of information may be needed periodically, and some elements of information may be one-time-only artefacts.
A data warehouse system designed for analysis must be capable of assimilating these data elements from many disparate sources into a common form
Contd....
the accuracy of the data against its original source of authority is imperative. Any such
system must also be able to: apply policy and procedure for comparing information from
multiple sources to select the most accurate source for a data element; correct data
elements as needed; and check inconsistencies amongst the data. It must accomplish this
while maintaining a complete data history of every element before and after every change
with attribution of the change to person, time and place. It must be possible to apply
policy or procedure within specific periods of time by processing date or event data to
assure comparability of data within a calendar or a processing time horizon. When data
originates from a source where different policies and procedures are applied, it must be
possible to reapply new policies and procedures. Where quality of transcription is low
qualifying the data through verification or sampling against original source documents
and media is required. Finally, it must be possible to recreate the exact state of all data at
any date by processing time horizon or by event horizon.
The analytical system applied to a data warehouse must be applicable to all data and
combinations of data. It must take into account whether sufficient data exists at the
necessary quality level to make conclusions at the desired significance level. Where
possible it must facilitate remediation of data from original primary source(s) of
authority.
When new data is acquired from new sources, it must be possible to input and register the
data automatically. Processing must be flexible enough to process these new sources
according to their own unique requirements and yet consistently apply policy and
procedure so that data from new sources is comparable to existing data.
When decisions are made to change the way data is processed, edited, or how policy and
procedure is applied, it must be possible to exactly determine the point in time that this
change was made. It must be possible to apply old policies and procedures for comparison
to old analyses, and new policy and procedure for new analyses.
The Lolopop partners served as principals in a data warehouse effort with objectives
that are shared by most users of data warehouses. During business analysis and
requirements gathering phase, we found that high quality was cited as the number one
objective. Many other objectives were actually quality objectives, as well. Based on our
experiences, Lolopop defines the generalized objectives in order of importance as:
Quality information to Create data and/or combine with other data sources
In this case, only about one in eight events could be used for analysis across databases.
Stakeholders said that reporting of the same data from the same incoming information
varied wildly when re-reported at a later date or when it came from another organization’s
analysis of the same data. Frequently the data in computer databases was demonstrably
not contained in the original documents from which they were transcribed. Conflicting
applications of policy and procedure by departments with different objectives, prejudices
and perspectives were applied inconsistently without recording the changes or their
sources, leaving the data for any given event a slave to who last interpreted it.
Here, the data was processed in time period batches. In some instances, it could take up to
four years to finalize a data period. Organizations requiring data for analysis simply went
to the reporting source and got their own copies for analysis, entirely bypassing the
official data warehouse and analytical sources.
Consistent relating of information
We found that reporting of data was not reproducible and the reasons for
differences in reporting were not retrievable, undermining confidence in the
data, analysis and reporting. One may essentially summarize these objectives
as quality challenges that require a basic systems engineering approach for
resolution.
Questions:
Source: http://www.lolopop.net/Lolopop.DWStudy.pdf
3.6 Summary
• Various types of measure in a fact table are: Additive, Semi Additive, Non-Additive.
• There are basically three types of fact tables: Transactional, Periodic snapshots and
accumulating snapshots.
• Dimension tables consist of attributes that describe fact records in the fact table.
• A surrogate key in a database is a unique identifier for either an entity in the modelled
world or an object in the database.
• Attributes that uniquely identify an entity might change over time, which might
invalidate the suitability of compound keys.
• But surrogate keys also come with some disadvantages. The values of surrogate keys
have no relationship with the real world meaning of the data held in a row.
• Referential integrity must be maintained between all dimension tables and the fact table.
• The most common type of table is base table. You can create a base table with the SQL
CREATE TABLE statement.
• A table that contains a set of rows that a database selects or generates, directly or
indirectly, from one or more base tables in response to an SQL statement is known as
result table.
• OLAP stands for On-Line Analytical Processing. In computing, OLAP is an approach to
answering multi-dimensional analytical (MDA) queries swiftly.
• In MOLAP, data is stored in a multidimensional cube. It fulfils the requirements of an
analytic application where you need to access only summarized levels of data.
• HOLAP technologies combine the advantages of MOLAP and ROLAP.
3.7 Keywords
Accumulating Snapshots: In this type of fact table the activity of a process is shown such
that it has a well-defined beginning and end.
Auxiliary Table: This table is created with the SQL statements CREATE AUXILIARY
TABLE and is used to hold the data for a column that is defined in a base table.
Dimension Tables: Dimension tables consist of attributes that describe fact records in the
fact table.
Dimensional Model: Dimensional Modelling (DM) is the name of a set of techniques and
concepts used in data warehouse design. It is considered to be different from entity-relationship
modelling (ER).
Empty Table: A table with zero rows is an empty table.
E-R Model: In software engineering, an Entity-relationship model (ER model) is a data model
for describing a database in an abstract way.
Fact Table: A fact table generally represents a process or reporting environment that is of value
to the organization.
HOLAP: HOLAP (Hybrid Online Analytical Processing) is a combination of ROLAP
(Relational OLAP) and MOLAP (Multidimensional OLAP) which are other possible
implementations of OLAP.
Multidimensional Online Analytical Processing (MOLAP): This is the more
traditional way of OLAP analysis. In MOLAP, data is stored in a multidimensional cube. The
storage is not in the relational database, but in proprietary formats.
Result Table: A table that contains a set of rows that a database selects or generates, directly
or indirectly, from one or more base tables in response to an SQL statement is known as result
table.
ROLAP: This methodology relies on manipulating the data stored in the relational database to
give the appearance of traditional OLAP's slicing and dicing functionality.
Surrogate Key: A surrogate key in a database is a unique identifier for either an entity in the
modelled world or an object in the database.
Temporal Table: A temporal table is a table that records the period of time when a row
is valid.
Transactional Table: The grain associated with a transactional fact table is usually
specified as one row per line in a transaction.
XML Table: It is a special table that holds only XML data.
7. “Dimension tables consist of attributes that describe fact records in the fact table”. Discuss.
8. Define the concept of surrogate key. Also write down the advantages and disadvantages.
Books: Carlo Vercellis (2011). “Business Intelligence: Data Mining and Optimization for Decision Making”. John Wiley & Sons.
David Loshin (2012). “Business Intelligence: The Savvy Manager’s Guide”. Newnes.
Unit 4: Understanding OLAP
CONTENTS
Objectives
Introduction
4.1 Basic Concepts of OLAP
4.7 Metadata
4.8 Summary
4.9 Keywords
Objectives
Introduction
Online Analytical Processing (OLAP) is a technology that is used to create decision support
software. OLAP enables application users to quickly analyse information that has been
summarized into multidimensional views and hierarchies. By summarizing predicted queries
into multidimensional views prior to run time, OLAP tools provide the benefit of increased
performance over traditional database access tools. Most of the resource-intensive calculation
that is required to summarize the data is done before a query is submitted. This unit on OLAP
explains the concepts and advantages of OLAP, spreadsheet formulas. It also covers study of
metadata.
Source: http://www.esri.com/news/arcuser/0206/graphics/olap_1.jpg
• Hierarchy: A logical tree structure that organizes the members of a dimension such
that each member has one parent member and zero or more child members. A child
member is a member in the next lower level in a hierarchy that is directly related to the
current member.
Example: In a time hierarchy containing the levels Quarter, Month, and Day,
June is a child member of Quarter2. A parent is a member in the next higher level in a hierarchy
that is directly related to the current member. For example, in a time hierarchy that contains the
levels Quarter, Month, and Day, Quarter1 is the parent of January.
• Level: Within a hierarchy, data can be organized into lower and higher levels of detail,
such as the Year, Quarter, Month, Week and Day levels of a time hierarchy (a small sketch follows).
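To make the idea of levels concrete, the following T-SQL sketch shows a hypothetical DimDate dimension table whose columns correspond to the levels of a time hierarchy; an OLAP hierarchy would be built over these columns from Year down to Day. The table name and column layout are illustrative only.
-- Hypothetical date dimension: each column corresponds to a level of the
-- time hierarchy (Year -> Quarter -> Month -> Day).
CREATE TABLE DimDate (
    DateKey         INT PRIMARY KEY,      -- surrogate key, e.g. 20140630
    CalendarYear    INT NOT NULL,         -- Year level
    CalendarQuarter TINYINT NOT NULL,     -- Quarter level (1-4)
    MonthNumber     TINYINT NOT NULL,     -- Month level (1-12)
    MonthName       NVARCHAR(20) NOT NULL,
    DayOfMonth      TINYINT NOT NULL,     -- Day level
    FullDate        DATE NOT NULL
);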
Figure 4.2 shows relationship among OLAP components.
Figure 4.2: Relationship among OLAP Components
Source: http://t3.gstatic.com/images?q=tbn:ANd9GcQmlV7tN_ufWESigi-6Txks5pftS4HO-
C7o7iVrHyv4UbrgPF_PJA
(a) One major advantage of OLAP is the consistency of information and computed results.
No matter how much or how quickly data is processed through OLAP programs or servers,
the results are reported in a consistent presentation, so analysts and executives
always know what to look for and where.
Did u know? This is especially helpful when comparing data from previous reports to data
present in new and projected future ones. It avoids the long discussion about who
has the correct data.
(b) “What if” scenarios are some of the most popular uses of OLAP programs and are made
far more feasible by multidimensional processing.
(c) Another advantage of multidimensional data presentation is that it allows a manager to
pull data from an OLAP database in very broad or very fine terms. In other words, reporting
can be as simple as comparing a couple of lines of data in one column of a spreadsheet or as
complex as viewing all aspects of a mountain of data.
(d) OLAP is a technology that can be distributed to many users using a variety of platforms.
(e) Also, multidimensional presentation can create an understanding of connections not
previously realized.
(f) OLAP creates a single platform for all information and business requirements: budgeting,
forecasting, reporting and analysis.
(g) Last but not least, the learning curve to use OLAP is negligible, since the most commonly
used interface for analysing data held in OLAP technology is the familiar spreadsheet.
Self Assessment
Hyperion Essbase consistently delivers very quick query response times that make an
iterative environment for analytic queries possible. OLAP users’ queries are neither
predictable nor repeatable, and the results of one query often frame the requirements of the
next. In this environment, answers must be forthcoming in seconds, not minutes or
hours, or analysts will cut short the analysis process to meet management deadlines.
!
Caution To be productive, an analysis session should be interactive and keep pace with the
analyst’s speed of thought.
With Hyperion Essbase, most users obtain responses to their queries in a fraction of a second.
Even the most complex queries take only a couple of seconds. In audited OLAP benchmark
results, Hyperion Essbase processed more than 6,800 complex queries per minute on a four-
processor server — an average answer time of just 0.00876 seconds per query.
While query tools are becoming increasingly sophisticated, their performance is still limited
by the response time of the data source. Hyperion Essbase consistently provides very quick
responses by permitting designers to optimize performance based on an application’s unique
requirements for query performance, calculation complexity, calculation window (the amount
of time available to load and calculate the application), user concurrency and disk utilization.
Hyperion Essbase accomplishes this flexibility through three calculation choices: precalculate,
calculate on the fly, and calculate on the fly and store. Together, these three calculation schemes
let designers maximize flexibility, capacity and performance.
Self Assessment
8. OLAP users’ queries are neither predictable nor repeatable and the results of one query
often frame the requirements of the next.
9. To be productive, an analysis session should not be interactive and keep pace with the
analyst’s speed of consideration.
Source: http://i.msdn.microsoft.com/dynimg/IC534518.png
A metadata piece is a name/value pair that describes characteristics like author, name, and
rating. A path specification contains one or more metadata block names; it can also identify a
metadata piece within a metadata block. The following path expression specifies an App1 block
which contains an IFD block which contains the metadata piece:
/app1/ifd/{ushort=18249}
Figure 4.3 illustrates the makeup of an example JPEG image with four root metadata
blocks: App0, App1, XMP, and an unknown block. Each highlighted item notes the type
of metadata (block or piece) and the query expression used to retrieve the data.
To access metadata, a fully qualified query expression must be used in most cases.
So, what is a fully qualified query expression? A fully qualified expression is a
string that begins with the path character slash (/), followed by a navigation path to a metadata
block or a specific metadata piece. Each step within the navigation path is separated by a slash,
forming an expression for accessing a metadata block or a metadata piece.
Example: The following is a fully qualified query expression that accesses the
Microsoft Photo rating in an IFD block that is nested in an App1 block:
/app1/ifd/{ushort=18249}
When this expression is parsed, the parser first searches for the App1 metadata block within the
image’s metadata. If the App1 block is found, it continues the search, looking for the nested IFD
metadata block. If the IFD block is found, it then searches for the specific metadata piece.
Notes If at any time a metadata block or piece is not found, it aborts the query.
The simplest metadata query expression is an expression to get a query reader/writer for a
specific metadata block. Getting a query reader/writer enables you to direct subsequent queries
directly to a nested metadata block without regard to its parent block. A block selection
query expression is a navigation path to the desired metadata block. For example, in the
preceding example there are five metadata blocks, two of which are nested in other metadata
blocks. The following are the path expressions to each metadata block in the JPEG example:
/app0
/app1
/app1/ifd
/app1/ifd/exif
/xmp
When you use a query reader/writer to execute a query, it returns a new query reader/
writer that services queries within the scope of the particular metadata block. For instance, if
you execute the query “/app1”, a new query reader is obtained and queries to the new
reader are relative to the App1 block. This means that the query “/ifd” is valid for the
new reader because the App1 block contains an IFD block. However, “/xmp” would not
work because this App1 block does not contain an XMP metadata block.
For the JPEG example, the following indexed path expression can be used:
/[0]app1/[0]ifd
In the query language, all indexes start at zero. In the previous expression, the first zero queries
for the first App1 block and the second zero queries for the first nested IFD block. Index notation
can still be used even when multiple blocks of the same type do not exist. If the example
JPEG included a second App1 block with an embedded IFD block, the expression
“/[1]app1/ifd” would be used to access the second App1 block.
The following expression accesses the Microsoft Photo rating in the XMP block:
/xmp/xmp:Rating
The “xmp:” part of the expression is a schema identifier. XMP is an extensible standard and
allows third-party entities to publish their own schemas, which specify how to store certain
metadata items.
The following data types are supported by the query language:
char
uchar
short
ushort
long
ulong
int
uint
longlong
float
double
str
wstr
guid
bool
The query language is not case sensitive and treats all characters as lowercase. However,
some metadata formats (such as XMP) are case sensitive. When working with a case-sensitive
metadata format, use the backslash (\) character when you want to specify an uppercase character.
The following table provides some example expressions and descriptions of how the query
language parser interprets them.
Table 4.1: Examples of Expressions and Descriptions
Expression    Description
ifd/xmp/exif:Author Corresponds to the following navigation path: IFD block -> XMP
block -> "Author" property in the "Exif" schema.
/[1]ifd/[0]xmp/exif:Author Same as the first item in this table except that the [#] prefix describes
which item to navigate in event of a name collision.
/ifd/{ushort=700}/Author Same as the first item in this table except that it uses a data
expression to reference the XMP block instead of the block name
"xmp" (XMP block is embedded under the unsigned short tag
identifier 700). Also, the "Author" property does not specify a
schema. The query parser will try to match the property across all
schemas and return the first match.
/ifd/xmp Provides a navigation path to a metadata block. If the block is
found, a new metadata reader/writer is returned.
/[*]tEXt/Keyword Gets or sets the Keyword property for a PNG chunk. Because the
PNG metadata specification allows for multiple chunks of a
particular type, the [*] notation gets/sets the data PNG chunk with
the appropriate property. Per the PNG specification, no two chunks
can have the same properties.
Source: http://msdn.microsoft.com/en-us/library/windows/desktop/ee719796(v=vs.85).aspx
14. The query language is not case sensitive and treats all characters as ........................
OLAP servers typically support several types of calculations:
1. Aggregations
2. Matrix Calculations
3. Cross Dimensional Calculations
4. OLAP Aware Functions
5. Procedural Calculations
Aggregations are simply addition: adding days into weeks, weeks into months, and so
on, or rolling individual customers up into customer groups, families, etc.
Matrix calculations are like what you would do in a spreadsheet, with arbitrarily complex
relationships (+, -, *, /, sum, count, and more) both across the rows and down the columns.
Examples include variance, variance %, total, count and inventory balances.
Cross dimensional calculations are like what you would do in linked spreadsheets, or in
a multidimensional spreadsheet. In these, computed results can refer to numbers in another
sheet of the cube, over different dimensions or different hierarchies, not just on the same
spreadsheet you are on. Calculation examples include product share, market share, etc.
OLAP Aware Functions are like spreadsheet functions that have been extended to
understand OLAP. These include statistical functions, forecasting functions, financial functions
and time calculations. Just like a spreadsheet, most OLAP servers provide some hundreds of
OLAP-aware functions.
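As a rough illustration of the first category, the following T-SQL query shows the kind of aggregation an OLAP server would typically precompute and store in the cube: rolling daily facts up to the month level. The FactSales and DimDate tables are hypothetical.
-- Roll daily sales facts up to months; an OLAP server would normally
-- precalculate aggregations like this so queries return almost instantly.
SELECT d.CalendarYear, d.MonthNumber, SUM(f.SalesAmount) AS MonthlySales
FROM FactSales AS f
JOIN DimDate AS d ON d.DateKey = f.DateKey
GROUP BY d.CalendarYear, d.MonthNumber
ORDER BY d.CalendarYear, d.MonthNumber;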
16. ........................ are like spreadsheet functions that have been extended to understand OLAP.
Notes OLAP analysis offers users basic access to their data warehouses, in contrast to the more sophisticated analysis functionality needed by power users.
Most OLAP vendors supply Multidimensional OLAP (MOLAP) solutions to perform this kind
of analysis, but restricted cube capacity has burdened numerous IT managers with building
and maintaining hundreds of overlapping cube databases to keep pace with growing
organizational demands. To get the most out of BI applications, however, OLAP analysis
needs to move beyond the standard MOLAP cube and provide full speed-of-thought
interactivity against the whole data warehouse.
Analysis Services supports OLAP by letting you design, create, and manage multidimensional
structures that contain data aggregated from other data sources, such as relational databases.
For data mining applications, Analysis Services lets you design, create, and visualize data
mining models that are built from other data sources by using a wide variety of industry-
standard data mining algorithms. Figure 4.4 shows Analysis Services concepts and objects.
Figure 4.4: Analysis Services Concepts and Objects
Source: http://i.msdn.microsoft.com/dynimg/IC6000.gif
The term metadata refers to “data about data”. The term is ambiguous, as it is used for two
fundamentally different concepts. Structural metadata is about the design and specification of
data structures and is more correctly called “data about the containers of data”; descriptive
metadata, on the other hand, is about individual instances of application data, the data content.
In this case, a useful description would be “data about data content” or “content about content”,
thus metacontent.
Metadata (metacontent) are traditionally found in the card catalogues of libraries. As data
has become increasingly digital, metadata are also used to describe digital data using
metadata standards specific to a particular discipline. By describing the contents and
context of data files, the value of the original data/files is greatly increased.
Metadata (metacontent) are defined as data providing information about one or more
aspects of data, such as:
Example: A digital image may include metadata that describe how large the picture is,
the colour depth, the image resolution, when the image was created, and other data.
A text document’s metadata may contain information about how long the document is, who
the author is, when the document was written, and a short abstract of the document.
As such, metadata can be stored and organised in a database, often called a Metadata registry or
Metadata repository. However, without context and a reference, it might be impossible to
identify metadata just by looking at them.
Example: By itself, a database consisting of several numbers, all 10 digits long, could be
the result of calculations or a list of numbers to plug into an equation; without any other
context, the numbers themselves can only be perceived as data.
Did u know? The term “metadata” was coined in 1968 by Philip Bagley, in his book
“Extension of Programming Language Concepts”.
The following Figure 4.5 shows an example of metadata.
Source: http://t3.gstatic.com/images?q=tbn:ANd9GcR1zN_FO3Vk1jHWkuCKQAjxB3O0O41
SbnLZLnNdYmhmg4HWyLM_ZQ
4.7.1 Types of Metadata
• Structural metadata indicates how compound objects are put together. Sample elements
are structuring tags such as title page, table of contents, chapters and parts.
• Administrative metadata provides information to help manage a resource, such as
when and how it was created, file type and other technical information, and who can
access it.
Task Give examples for the use of SAS metadata DATA step functions to identify and track metadata that describes data libraries and users.
1. Resource discovery
❖ Identifying assets;
2. Organizing e-resources
❖ Organizing links to resources based on audience or theme.
3. Facilitating interoperability
❖ Using defined metadata schemes, shared protocols, and crosswalks between
schemes, resources across the network can be searched more seamlessly.
4. Digital identification
❖ Elements for standard numbers such as ISBN.
One major advantage of metadata is that redundancy and inconsistencies can be identified more
easily as the metadata is centralized.
Example: The system catalogue and data dictionary can help or guide developers at the
conceptual or structural phase or for further maintenance.
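As a small, hedged illustration of metadata held in a system catalogue, the following T-SQL query reads the standard INFORMATION_SCHEMA views of a SQL Server database to list the columns and data types of a hypothetical FactSales table; this is exactly the kind of "data about data" a data dictionary centralizes.
-- Query the system catalogue (metadata) rather than the data itself.
SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE, IS_NULLABLE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'FactSales'
ORDER BY ORDINAL_POSITION;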
Self Assessment
17. By describing the contents and context of data documents, the value of the original data/
files is smaller.
18. Metadata (metacontent) are generally found in the business card catalogues of libraries.
19. Metadata can be stored and organised in a database, often called a Metadata registry or
Metadata repository.
20. Administrative metadata provides information to help manage a resource.
How a Financial Services Company Developed a Performance Report for Clients, Saved $200K and Sold $167,000,000 of Equity in Just 9 Months
This story is based on our success at ABC, Inc., the leading outsource
collection agency for government debts in the US. We are using the
pseudonym “ABC” to protect their confidentiality. Please allow us to recount
how we applied OLAP technology to develop a flexible performance report for
ABC clients, saved $200K in accounting software expenses and helped ABC
sell some equity for $167,000,000 in just 9 months.
Like any other financial services company, ABC must provide regular reports
that measure its performance to its clients. ABC measures its performance
with what they call their CARE report. The recovery percentages that appear in
the CARE report are the primary measure of performance for ABC clients. ABC
first deployed their CARE report via a 40-page C program. But the CARE
report for all clients was taking over 24 hours to process and the resulting
500-page report was inflexible. There was no way to quickly focus in on a single
client or client contract and there was no way to change the level of detail.
There was also no way to further analyse the results, e.g. by loan type, so they
could discern what portions of their business are most lucrative. And there was
no convenient way to validate or understand a sum by examining the detail
records that it represents. What they needed was CARE information delivered
in the form of an Excel pivot table.
Merrill Eastman, ex-CEO of Bestfoods and then acting CEO of ABC, suggested
that we give Online Analytical Processing (OLAP) a try. Our first assignment
was to transform the old CARE report into an OLAP cube. OLAP looked like the
answer because it pre-computes numeric aggregations for the cross-product
of all relevant dimensions so that summary information for any combination of
dimensions can be displayed on demand. If you are familiar with Excel, it
suffices to say that OLAP transforms a relational database into a pivot table.
There are a number of OLAP software alternatives out there, but we quickly
settled on SQL Server Analysis Services because:
1. ABC already owned Microsoft SQL Server licenses and appreciated its
ease of use and administration.
2. Microsoft has bundled Analysis Services with every copy of SQL Server
since 1998. So, ABC didn’t have to buy anything to give it a try.
3. SQL Server Analysis Services became the OLAP market leader in 2003.
4. SQL Server OLAP Services is tightly coupled with MS Excel. Like most
other companies, ABC uses Excel exclusively for all financial reports and
analysis.
Developing the OLAP CARE report proceeded slowly at first because it was
difficult to reach consensus on CARE Report specifications. Analysis Services is
easy to use, but it was still very difficult to figure out how to get the content of
the old CARE report out of an OLAP cube. The major challenges we learned to
overcome included:
• How to export 80M facts and dimension rows from Informix to SQL Server in less
than 4 hours?
• How to transform exported information into a SQL Server data mart with no
referential integrity errors?
• How to compute distinct counts within the cube that have a different granularity
than the basic revenue facts?
• How to map the same facts to multiple members within the same dimension?
• What ragged hierarchies should be used as dimensions of the cube?
• How to support drillthrough to facts so that cube aggregates can be validated
and understood?
• How to tie CARE cube aggregates to the General Ledger so that data integrity could
be validated?
It took about 8 weeks to deliver the first CARE cube. A few weeks later, we delivered a
sister cube that provided more comprehensive recovery analysis. By then, ABC was a
believer in SQL Server OLAP Services and the rush was on to expand its use. We trained
three ABC software engineers to build cubes and they set about developing General
Ledger, General Ledger Budget, Payroll, Collector Performance and Revenue Forecasting
cubes in parallel.
The General Ledger cubes delivered immediate benefits. ABC was using OSAS
accounting software. They were not satisfied with the reports that OSAS produced, but
were reluctant to invest an estimated $200K to acquire a new package and train accounting
personnel to use it. Instead, they purchased an ODBC driver to export OSAS data and we
built a cube to generate their reports. Today, their Balance Sheets and Profit and Loss
Statements are implemented in an account rollup dimension. They can drill down from a
few lines at the top to any level of detail. The drill-down feature is particularly useful in
the GL Budget cube. If budget variances are detected at the highest levels, they just
double-click on their OLAP pivot table to drill down until they discover the roots of the
variance. The OLAP accounting reports reduced the time required to close ABC’s books
by 5 days. As a result, they can make critical business decisions that much faster.
Meanwhile, ABC’s impressive performance attracted outside investors. A venture capital
firm became the primary suitor and a team of business analysts set out to understand
ABC’s business. After exhaustive due diligence, the VCs decided to invest $167,000,000.
They did so because ABC has a rock solid business. But, the deal might not have
happened without the OLAP cubes. The OLAP cubes answered due diligence questions
more quickly and in much more detail than the VC had seen in previous deals. The Billing
cube that we developed at the VC’s request was fundamental to their belief that future
revenues would grow fast enough to support the necessary ROI.
Question:
4.8 Summary
• OLAP is a database technology that has been optimized for querying and reporting, rather
than for processing transactions.
• OLAP users’ queries are neither predictable nor repeatable, and the results of one query
often frame the requirements of the next.
• A metadata block can contain individual metadata pieces, such as an author or creation
time, and additional metadata blocks.
• To access metadata, a fully qualified query expression must be used in most cases.
• The simplest metadata query expression is an expression to get a query reader/writer for
a specific metadata block.
• OLAP delivers the simplest form of analysis, permitting any person to slice and dice
interrelated subsets of data, or “cubes”, with the click of a mouse.
• Metadata (metacontent) are traditionally found in the card catalogues of libraries.
• Sample elements include technical data such as scanner type and model, resolution, bit
depth, colour space, file format, compression, light source, owner, copyright date etc.
• One major advantage of metadata is that redundancy and inconsistencies can be identified
more easily as the metadata is centralized.
4.9 Keywords
Calculated Member: A member of a dimension whose value is calculated at run time by using
an expression.
Cube: It is a data structure that aggregates the measures by the levels and hierarchies of each of
the dimensions that you want to analyse.
Descriptive metadata: Descriptive metadata describes a resource for purposes such as
discovery and identification.
Dimension: A set of one or more organized hierarchies of levels in a cube that a user
understands and uses as the basis for data analysis.
Hierarchy: A logical tree structure that organizes the members of a dimension.
Level: Within a hierarchy, data can be organized into smaller and higher levels of detail.
Measure: A set of values in a cube that are based on a column in the cube's fact table
and that are usually numeric.
Member: An item in a hierarchy comprising one or more occurrences of data.
4. “Hyperion Essbase consistently delivers very quick query response times that make an
iterative environment for analytic queries possible”. Elaborate.
5. What are the metadata based queries?
1. Averages 2. Cube
3. Measure 4. Calculated member
5. Hierarchy 6. OLAP
Books: Carlo Vercellis (2011). “Business Intelligence: Data Mining and Optimization for Decision Making”. John Wiley & Sons.
David Loshin (2012). “Business Intelligence: The Savvy Manager’s Guide”. Newnes.
Unit 5: Microsoft Business Intelligence
CONTENTS
Objectives
Introduction
5.1.1 Power
5.1.2 Usability
5.4.3 Partnerships
5.7.1 Tools
5.8 Summary
5.9 Keywords
Objectives
After studying this unit, you will be able to:
Introduction
Microsoft Business Intelligence solutions leverage your existing technology
investments in
.NET, SQL Server and Office to develop rich integrated reporting and analytics
experiences that empower users to gain access to accurate, up-to-date information
for better, more relevant decision making. If you are seeking comprehensive,
server-based reporting service designed to help you author, manage, and deliver
both paper-based and interactive Web-based reports, the Microsoft Business
Intelligence platform is an ideal service to provide fast, reliable reporting services.
This unit provides an introduction to Business Intelligence and to Microsoft BI
after discussing factors affecting BI and its key benefit areas.
5.1.1 Power
5.1.2 Usability
Though some powerful tools are on the market, lack of user friendliness is a barrier
to adoption that has often frustrated business users and kept them from realizing
the full value of analytic applications. If the users of an analytical application are
business staff, picking a solution that is easy to learn and use is key to achieving a
positive return. Office XP tools such as Excel are high on the usability
scale because they enable business users to perform BI functions through the use
of desktop tools they are already familiar with.
Companies can use the SQL Server 2000 for the Oracle Customer kit to link the two databases
and ease the task of managing the two databases.
Improved information organization and access from BI can influence the bottom
line by improving a number of business activities. Companies using Microsoft’s BI
tools can make it cheaper for users to access data, simplify the task of data
mining and analysis, and enable employees to make business decisions that
reduce costs or improve profitability.
Some components of the Microsoft BI platform can lower the costs of accessing
information by allowing users to efficiently generate queries and reports on their
own.
Self Assessment
!
Caution In addition, storage models should support the distribution of data
across both relational and multidimensional storage, and data models should support
transparent or near-transparent access to data, however it is stored.
!
Caution Business intelligence platforms should provide OLAP support inside
their databases, OLAP functionality, interfaces to OLAP functionality, and
OLAP build and manage capabilities.
Notes The platform should include data mining functionality that offers a range of algorithms able to operate on data warehouse data.
• Your company can decrease the risk of non-compliance and financial disasters
• You can effectively change your business into a proactive, decision-making business
• You can achieve greater compliance with government and regulatory guidelines
• You, whether an external or internal user, will achieve much quicker problem
solving and decision making ability at all levels: strategic, operational
and tactical
• You get the right information at the right time to facilitate and expedite key
decisions
5.4.3 Partnerships
firm feels that this platform is core to its database strategy and an integral
component of .NET. Control of the whole platform from planning, design,
development, and product marketing perspectives is absolutely vital in order
to provide consistency, integration, and timely technology delivery for both
Microsoft’s customers and its partners. However, partnerships are critical to
Microsoft and to its business
intelligence platform. The firm actively values partnerships to create a large
base of focused business intelligence tools and applications that support
its platform. These partnerships simplify and accelerate adoption of the platform
and make the platform’s assets more easily accessible. This is a perfect
partnering approach.
Notes If we compare Microsoft’s approach with IBM’s, in Microsoft it owns its business intelligence platform whereas in
Packaging and pricing distinguish Microsoft’s business intelligence platform from the
platforms of Oracle, IBM, and Hyperion. For the processor-based licence fee of
$19,999 per processor for SQL Server Enterprise Edition, you get the whole
business intelligence platform. OLAP, data mining, and build and manage
capabilities are included as database features.
Source: http://www.microsoftbiconsultant.com/images/MS-SQL-2008-R2-BI.jpg
Oracle charges $40,000 per processor just for the Enterprise Edition of its
relational database. OLAP, data mining, and build and manage capabilities are all
individually priced and packaged features
of this Enterprise Edition of the firm’s database. IBM charges $25,000 per
processor just for the Enterprise Server Edition of its relational database, with
basic build and manage capabilities included. OLAP, data mining, and
sophisticated build and manage capabilities are all additional. Hyperion charges $28,000
per processor just for OLAP with packaged build and manage capabilities.
Figure 5.1 depicts the BI model in Microsoft SQL Server 2008 R2.
Oracle is a software business with two foremost lines of business: databases and
applications. The present flagship offering of the firm’s database business is
Oracle9i. This is an object/relational database management system designed
and positioned to support all types of Internet-based applications. Oracle9i
incorporates what Oracle terms a “complete and integrated infrastructure for
building business intelligence applications.” So, Oracle’s business intelligence
platform strategy is to supply a comprehensive business intelligence platform built
on and integrated within its flagship database system.
Partnerships
From a packaging perspective, Oracle offers little bundling. All the components of
its business intelligence platform are individually packaged and priced, and the
build and manage components have separately priced and packaged sub-components. Add
them up, and Oracle’s business intelligence platform is at least five times higher
in price than Microsoft’s business intelligence platform.
Figure 5.2: Oracle Business Intelligence Discoverer Dashboard Example
Source: http://docs.oracle.com/cd/B14099_19/core.1012/b13994/img/dashboard.gif
From a business perspective, IBM has three businesses: hardware, software, and
consulting services. The software business has four components: WebSphere
software, DB2 data management software, Lotus (collaboration) software, and
Tivoli (systems management) software.
Notes Business intelligence is one of two IBM-provided solutions of DB2 data management software. (The other solution
IBM’s strategy for business intelligence is to help companies know their customers
and to use that information to gain competitive advantage, maximize revenue,
and minimize cost. Business intelligence is implicitly targeted at all of IBM’s
markets. The firm makes no explicit distinction in the positioning of business
intelligence for the types or sizes of businesses, or for the kinds of users
inside those businesses that can use its business intelligence platform. It is a
one-size-fits-all approach.
In mid-2001, Hyperion changed its business strategy, shifting its focus from
business intelligence software infrastructure and applications to business
performance management software solutions. The firm states its goal “is to be
the premier global provider of business performance management solutions.”
These solutions are created to automate the business performance management
cycle of strategy setting, modelling, planning, performance monitoring,
reporting and analysis. Their goal is to improve profitability.
The technology platform for Hyperion’s performance management solutions is
Essbase, its venerable OLAP server. Within the new strategy, Hyperion states that
Essbase technology will be enhanced in the areas of ease of use, ease of
application development, interoperability of business performance management
applications, scalability, and tighter integration with relational data sources.
Missing from Hyperion’s enhancement strategy for Essbase are areas such as
analytic technology and business intelligence platform technologies. Essbase is
evolving away from a general-purpose OLAP facility and toward a
platform for supporting a very specific type of business intelligence application.
Hyperion’s new strategy and focus can play quite well in today’s business
intelligence market, where companies’ top concerns are to do business
more effectively and efficiently. Business performance management is a classic
business intelligence application. It requires a comprehensive business
intelligence platform as its base in order to gather the data that constitutes
business performance, organize that data, analyse it, present it, and use the
analysis results to improve business performance.
Hyperion Essbase has a pricing model with two elements: a per-server charge
and a per-user charge. Currently, the per-server charge is $28,000 per
processor and the per-user charge is $1,500. Essbase packaging includes the
OLAP server, administrative tools, and build and manage tools.
!
Caution Essbase installations most commonly use relational data warehouses as
the data sources for Essbase cubes.
Interfaces (Microsoft / Oracle / IBM / Hyperion)
Relational interfaces. Microsoft: SQL and Transact-SQL, ODBC, OLE DB, ADO, ADO.NET. Oracle: SQL and PL/SQL, ODBC and JDBC. IBM: SQL and DB2 SQL, ODBC and JDBC. Hyperion: Not applicable.
OLAP interfaces. Microsoft: MDX, DSO, Pivot Table Service. Oracle: OLAP DML, Java OLAP API, SQL. IBM: Essbase API. Hyperion: Essbase API.
Data mining interfaces. Microsoft: DSO, Pivot Table Service, Wizards, Visual tools. Oracle: Oracle9i Data Mining API (Java). IBM: Intelligent Miner (C++, SQL), DB2 OLAP Miner, Essbase API. Hyperion: Not applicable.
Source: http://www.element61.be/assets/microsoft-business-intelligence-&-performance-management-platform_small.jpg
Self Assessment
• Start Page
When you first open Business Intelligence Development Studio, the Start page
appears in the centre of the Business Intelligence Development Studio user
interface. This page displays a list of recently modified projects; help topics,
websites and other resources; links to product and update information
from Microsoft; and, by default, a list of items from the RSS feed of the
specified news source.
To display a page other than the Start Page at startup, click Options on the Tools
menu, expand the Environment node, and in the At Startup menu, select the item
to display.
Notes To discover more about the Start Page, click within the Start Page and press F1.
Business Intelligence Development Studio includes a set of windows for all stages
of solution development and project administration.
Figure 5.4 displays the windows in Business Intelligence Development Studio
with the default configuration.
Figure 5.4: Business Intelligence Development Studio with Default Configuration
Source: http://i.msdn.microsoft.com/dynimg/IC146944.gif
• Solution Explorer
• Properties Window
• Designer Window
• Toolbox Window
Solution Explorer
You can manage all the different projects in a solution from a single window, Solution
Explorer. The Solution Explorer view presents the active solution as a logical
container for one or more projects, and includes all the items associated with the
projects. You can open project items for modification and perform other
management tasks directly from this view.
In Solution Explorer, you can create empty solutions and then add new or existing
projects to the solution. If you create a new project without first creating a solution,
Business Intelligence Development Studio automatically creates the solution too.
When the solution includes projects, the tree view includes nodes for project-specific
objects. For example, the Analysis Services project includes a Dimensions node,
the Integration Services project includes a Packages node, and the Report Model
Notes To access Solution Explorer, click Solution Explorer on the View menu.
Properties Window
The Properties window lists the properties of an object. You use this window
to view and change the properties of objects, such as packages, that are open in
editors and designers. You can also use the Properties window to edit and
view file, project, and solution properties.
Notes To access the Properties window, click Properties Window on the view menu.
Designer Window
The Designer window is the tool window in which you create or modify business
intelligence objects. The designer provides both a code view and a design view of
an object. When you open an object in a project, the object opens in a specialized
designer within this window.
Notes The Designer window is not available until you add a project to a solution and open an object in it.
Toolbox Window
The Toolbox window displays a variety of items for use in business intelligence
projects. The tabs and items available in the Toolbox change depending on the
designer or editor currently in use. The Toolbox window always displays the
General tab, and may also display tabs such as Control Flow Items.
The default menus that appear in Business Intelligence Development Studio are
the same as those in Visual Studio. When you first open Business Intelligence
Development Studio, before you change the environment, open a solution, or open
any projects, Business Intelligence Development Studio includes the following
menus:
• File
• Edit
• View
• Tools
• Window
• Community
• Help
File Menu
The choices on the File menu support file management. When you first open
Business Intelligence Development Studio, but before you have created a new
project or opened an existing project, some choices are unavailable. These choices
become available only when you start to work in the context of a solution, or open
a project within a solution.
Edit Menu
The choices on the Edit menu support editing of text and code in documents. This
menu supplies commands such as undo and redo; find and replace; enable and
manage bookmarks. When you first open Business Intelligence Development
Studio, before you have created a new project or opened an existing project, some
choices are unavailable. Depending on the project type, some menu options may
not be available.
Example: The Undo and Redo options are not supported in Integration
Services projects.
View Menu
The choices on the View menu help you manage the user interface of Business
Intelligence Development Studio. This menu and its submenus provide the options
to open the various windows, the toolbox, explorers, and browsers.
Tools Menu
• Choose the toolbars to display in the user interface and arrange the order of
commands
• Set the options that apply to the overall development environment, solutions
and projects, source control, debugging, and designers and editors
Window Menu
The choices on the Window menu organise the behaviour of windows, explorers,
and browsers in Business Intelligence Development Studio.
Community Menu
The choices on the Community menu let you ask questions of other users and of
technical support, send feedback to Microsoft, access community groups and
connect to the developer centre.
When you first open Business Intelligence Development Studio the Toolbar
includes only the Menu Bar toolbar and only a couple of icons that are accessible
on the Menu Bar toolbar. To customize the Toolbar, click Customize on the Tools
menu, and then select additional toolbars to display, or change options for the
toolbar appearance.
In Business Intelligence Development Studio you can add projects of the following
types:
• Analysis Services projects, for creating analytic objects
• Integration Services projects, for creating ETL packages
• Report Model projects, for creating report models
• Report Server projects, for creating reports
You can configure the Business Intelligence Development Studio environment with
a collection of settings customized for SQL Server business intelligence
development by choosing the Business Intelligence settings collection. Use
Import and Export Settings on the Tools menu to reset all your settings based
on the Business Intelligence Settings collection, or to export only the categories of
Business Intelligence settings that you choose.
To configure individual options for the environment and tools, click Options on
the Tools menu to open the Options dialog box. To discover more about the different
options in the dialog box, click a node in the left pane, and then press F1.
LOVELY PROFESSIONAL 8
Business Intelligence
Using Source Control Services
Like Visual Studio, Business Intelligence Development Studio is integrated with
source control programs. If source control software is installed on the
computer, you can add solutions and projects to source control, and then open the
solutions and projects in Business Intelligence Development Studio from the source
control application.
Business Intelligence Development Studio includes the Report Model and Report
Server projects for developing reporting solutions. The Report Model project type
includes the templates for report models, data sources and so on, and provides the
tools for working with these objects.
Self Assessment
15. The choices on the ................... menu help you manage the user interface of
Business Intelligence Development Studio.
• Transforming it to fit operational needs, which can include quality levels
• Loading it into the end target (database, more specifically, operational data
store, data mart or data warehouse)
Source: http://upload.wikimedia.org/wikipedia/commons/d/d8/ETL_Architecture_Pattern.jpg
Extract
The first stage of an ETL process involves extracting the data from the source
systems. An intrinsic part of the extraction is parsing the extracted data,
resulting in a check of whether the data meets an expected pattern or structure. If not, the
data may be rejected entirely or in part.
Transform
The transform stage applies a series of rules or functions to the extracted data to derive the
data for loading into the end target.
Load
The loading phase loads the data into the end target, usually the data warehouse
(DW). Some data warehouses may overwrite existing information frequently;
updated data is loaded on a daily, weekly, or monthly basis.
5.7.1 Tools
A good ETL tool must be able to communicate with different relational databases
and read the various file formats used throughout an organization. Many ETL
vendors now have data profiling, data quality, and metadata capabilities. A
common use case for ETL tools includes converting CSV files to formats readable
by relational databases.
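As a rough, hedged sketch of the extract-transform-load flow described above, the following T-SQL loads a CSV file into a staging table and then transforms and loads it into a target fact table. The file path, table names and column layout are hypothetical (they reuse the FactSales and DimProduct tables sketched earlier), and a production solution would more likely use an SSIS package.
-- Extract: bulk-load the raw CSV into a staging table.
CREATE TABLE StagingSales (
    ProductCode NVARCHAR(20),
    SaleDate    NVARCHAR(20),
    Amount      NVARCHAR(20)
);
BULK INSERT StagingSales
FROM 'C:\etl\sales.csv'   -- hypothetical file path
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);
-- Transform and Load: convert types, look up the surrogate key, reject rows
-- that fail the checks, and insert the rest into the target fact table.
INSERT INTO FactSales (ProductKey, DateKey, SalesAmount, UnitPrice)
SELECT d.ProductKey,
       CONVERT(INT, CONVERT(CHAR(8), TRY_CONVERT(DATE, s.SaleDate), 112)),
       TRY_CONVERT(MONEY, s.Amount),
       0   -- placeholder where the source file carries no unit price
FROM StagingSales AS s
JOIN DimProduct AS d ON d.ProductCode = s.ProductCode
WHERE TRY_CONVERT(MONEY, s.Amount) IS NOT NULL
  AND TRY_CONVERT(DATE, s.SaleDate) IS NOT NULL;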
Source: https://sheet.zoho.com/publicgraphs/983047000000039763.png
Notes ETL Tools are typically used by a broad range of professionals - from students to
database architects.
Task “Microsoft enables business users to look no further than Excel for self-service BI”. Comm
Self Assessment
18. The first stage of an ETL process is transformation stage which involves
extracting the data from the source systems.
19. The loading stage applies a series of rules or functions to the extracted data
to derive the data for loading into the end target.
20. A common use case for ETL tools include converting CSV files to formats
readable by relational databases.
Background
manually cleaning and analysing massive “data dumps” from the system. Due
to the sheer amount of data, only a small subset of the
markets could be realistically analysed, and reports were often out of date by
completion. Further problems were introduced by discrepancies in the data
definitions. To make faster and more accurate decisions, managers needed a
way to quickly obtain current and comprehensive reports with maximum
accuracy.
Looking forward, the company anticipated expanding their business
intelligence capabilities beyond network deployment to optimize business
decisions in other units. These new capabilities were expected to yield
increasing amounts of data that would require an expansion of storage
capacity. A framework was sought for designing and defining future data
warehousing and business intelligence implementations.
Creating a Platform to Support Growth and Control Costs
Originally selected to perform an assessment of the organization’s overall
business intelligence capabilities, Hitachi Consulting was subsequently
engaged to provide a comprehensive solution to this pressing issue. By
utilizing expertise in BI and in collaboration with the IT, finance, marketing,
and network development teams, Hitachi Consulting designed a platform to
bring visibility to the company’s network deployment efforts. The
implementation of a Cognos dashboard, built on an Oracle relational database
system, would be linked to a data feed from the existing network monitoring
system. Organized by phase and by market, the new dashboards would offer
decision makers the option to obtain a high level view or to drill down into the
specifics of each. Without the constraints of limited data or the prohibitive
costs of manual analysis, reports could be infinitely configured and generated
nightly. Key metrics, such as population counts, sites on air, sites leased, and
duration calculations, would be integrated while data definitions would be
consolidated for increased accuracy.
The future needs of the organization were also considered as the system was
designed to serve as the foundation for the development of a more robust and
scalable data warehousing platform. Hitachi Consulting helped define the
data warehouse framework and design standards by assisting the client in
developing an internal “Centre of Excellence” for future implementations.
The system is expected to pay for itself through the cost savings of only a
single market. Data that was formerly available to only a few is now
empowering those making the decisions to manage a network build-out that
is on time and on budget.
Questions:
2. What solution does the wireless broadband services provider get from Hitachi Consulting?
Source: www.hitachiconsulting.com/files/.../CS_WirelessProviderBI.pdf
• Office XP tools such as Excel are high on the usability scale because they
enable business users to perform BI functions through the use of desktop
tools they are already familiar with.
• Companies can use the SQL Server 2000 for the Oracle Customer kit to link
the two databases and alleviate the task of managing the two databases.
• Data extraction is the act or process of retrieving data from data sources for
further data processing.
5.9 Keywords
Business Intelligence (BI): Business Intelligence (BI) is a set of theories,
methodologies, processes, architectures, and technologies that transform raw data
into meaningful and useful information for business purposes.
Data Extraction: Data extraction is the act or process of retrieving data out of
(usually unstructured or poorly structured) data sources for further data
processing or data storage (data migration).
Data Mining (DM): Data mining (the analysis step of the “Knowledge Discovery in
Databases” process, or KDD), an interdisciplinary subfield of computer science, is
the computational process of discovering patterns in large data sets involving
methods at the intersection of artificial intelligence, machine learning, statistics,
and database systems.
Books: Carlo Vercellis (2011). “Business Intelligence: Data Mining and Optimization for Decision Making”. John Wiley & Sons.
nfotrove.com/files/GreenHill.pdf?
www.kpmg.com/GR/en/IssuesAndInsights/.../The-MS-Platform-for-BI.pdf
www.microsoft.com/en-in/bi/?
Unit 6: Business
CONTENTS
Objectives
Introduction
6.1 Creating Data Source
6.2 Creating a Data Source View
6.3 Modifying the Data View
6.4 Creating Dimensions, Time and Modifying Dimensions
6.4.1 Creating Dimensions
6.4.2 Time Dimension
6.4.3 Modifying the Date Dimension
6.5 Parent-Child Dimensions
6.6 Summary
6.7 Keywords
6.8 Review Questions
6.9 Further Readings
Objectives
Introduction
Too many times, Business Intelligence (BI) and Data Warehousing project managers are not
equipped well to handle their role in guiding a project to success. Often, the person allotted to
lead a project is either: (1) a technician who doesn’t know the first thing about managing a
project, or (2) a project manager who doesn’t know the first thing about Business Intelligence.
Purpose of BI is making the most of an organization’s data assets. By making better data-driven
decisions through BI, companies can gain advantages like increasing revenue, reducing costs, or
reducing risks. This Unit focus on discussion of data source and data source view. Also,
working with dimension is explained in the unit. Finally, you will learn about parent-child
dimension.
You can connect using a managed Microsoft .NET Framework provider or a native OLE DB
provider. For Oracle and other third-party data sources, check whether the third party provides
a native OLE DB provider and, if it does, try that first. If you get errors, try one of the other
.NET providers or native OLE DB providers listed in Connection Manager. Be certain that any
data provider you use is installed on all computers used to develop and run the Analysis
Services solution.
!
Caution The account you specify should have a login on the remote database server and
read permissions on the external database.
Windows Authentication
Connections that use Windows authentication are specified on the Impersonation Information
tab of the Data Source Designer. Use this tab to choose the impersonation option that identifies
the account under which Analysis Services runs when connecting to the external data
source. Not all options can be used in all scenarios.
Database Authentication
Notes By default, SQL Server Data Tools (SSDT) does not save passwords with the connection string. If the passwor
Create a Data Source Using the Data Source Wizard
To create data source using the data source wizard follow these steps:
1. In SQL Server Data Tools, open the Analysis Services project or connect to the Analysis
Services database in which you want to define the data source.
8 LOVELY PROFESSIONAL
Unit 6: Business
2. In Solution Explorer, right-click the Data Sources folder, and then click New Data Source
to start the Data Source Wizard.
3. On the Select how to define the connection page, choose Create a data source based on an
existing or new connection and then click New to open Connection Manager. New
connections are created in Connection Manager.
4. Select the Microsoft .NET Framework or native OLE DB provider to use for the
connection.
5. Enter the information requested by the selected provider to connect to the data source. If
the Native OLE DB\SQL Server Native Client provider is selected for example, then
enter the following information:
(a) Server Name is the network name of the Database Engine instance.
(b) Log on to the server specifies how the connection will be authenticated. Use
Windows Authentication uses Windows authentication. Use SQL Server
Authentication specifies a database user login for a Windows Azure SQL database.
(c) Select or enter a database name or Attach a database file are used to specify the
database.
(d) In the left side of the dialog box, click All to view additional settings for this
connection, including all default settings for this provider.
(e) Change settings as appropriate for your environment and then click OK. The
new connection appears in the Data Connection pane of the Select how to define
the connection page of the Data Source Wizard.
!
Caution Regardless of whether you clear or select Save my password, Analysis Services
will always encrypt and save the password. The password is encrypted and stored in both
.abf and data files. This behaviour exists because Analysis Services does not support
session-based password storage on the server.
6. Click Next. In Impersonation Information, specify the Windows credentials or user
identity that Analysis Services will use when connecting to the external data source.
7. Click Next. In Completing the Wizard, enter a data source name or use the default name.
The default name is the name of the database specified in the connection. The Preview
pane displays the connection string for this new data source.
8. Click Finish. The new data source appears in the Data Sources folder in Solution Explorer.
The connection string is formulated based on the properties you choose in the Data Source
Designer or the New Data Source Wizard. You can view the connection string and other
properties in SQL Server Data Tools.
1. In SQL Server Data Tools, double-click the data source object in Solution Explorer.
2. Click Edit, and then click All on the left navigation pane.
3. The property grid appears, showing available properties of the data provider you are
using.
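For reference, a connection string generated for the SQL Server Native Client provider typically resembles the following; the server and database names shown here are placeholders and will differ in your environment:
Provider=SQLNCLI11.1;Data Source=localhost;Integrated Security=SSPI;Initial Catalog=AdventureWorksDW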
You can create more than one data source object to support connections to additional data
sources. To combine data from multiple data sources:
1. Create a data source object for each source that you want to use.
2. Create a data source view, using a SQL Server relational database as the data source.
3. In Data Source View Designer, using the data source view just created right-click
anywhere in the work area and select Add/Remove Tables.
4. Choose the second data source and then select the tables you want to add.
5. Find and select the table you added. Right-click the table and select New
Relationship. Choose the source and destination columns that contain matching data.
Types of data sources that you can use in a multidimensional model are shown in Table 6.1.
Table 6.1: Types of Data Sources
• Access databases: Microsoft Access 2003, 2007, 2010 (.accdb or .mdb files). Provider: Microsoft Jet 4.0 OLE DB provider.
• SQL Server relational databases: Microsoft SQL Server 2005, 2008, 2008 R2, 2012, Windows Azure SQL Database. Providers: OLE DB Provider for SQL Server; SQL Server Native Client OLE DB Provider; SQL Server Native Client 11.0 OLE DB Provider; .NET Framework Data Provider for SQL Client.
• SQL Server Parallel Data Warehouse (PDW): SQL Server 2008 R2; 2012. Provider: OLE DB provider for SQL Server PDW.
• Oracle relational databases: Oracle 9i, 10g, 11g. Providers: Oracle OLE DB Provider; .NET Framework Data Provider for Oracle Client; .NET Framework Data Provider for SQL Server; MSDAORA OLE DB provider; OraOLEDB; MSDASQL.
Source: http://msdn.microsoft.com/en-us/library/ms175608.aspx
Self Assessment
3. When retrieving data, the client library making the connection formulates a connection
request that includes the credentials in the connection string.
4. In completing the Wizard, the new data source appears in the Data Sources folder.
5. In Data Source View Designer, using the data source view just created right-click
anywhere in the work area and select Add/Remove Tables.
1. In Solution Explorer, right-click Data Source Views, and then click New Data Source View.
2. On the Data Source View Wizard page, click Next. Then a page appears to select a
Data Source.
3. Under Relational data sources, select one of the data sources. Click Next.
4. On the Select Tables and Views page, you have to select tables and views from the list of
objects that are available from the selected data source. Click > to add the selected tables
to the Included objects list. Finally, Click Next and Finish.
Source: http://i.msdn.microsoft.com/dynimg/IC50754.gif
5. To maximize the Microsoft Visual Studio development environment, click the Maximize
button.
6. To view the tables in the Diagram pane at 50 per cent, click the Zoom icon on the Data
Source View Designer toolbar.
Source: http://i.technet.microsoft.com/dynimg/IC102093.gif
7. To hide Solution Explorer, click the Auto Hide button on the title bar. To unhide Solution
Explorer, click the Auto Hide button again.
Self Assessment
7. To view the tables in the Diagram pane at 50 per cent, click the.........................icon on the
Data Source View Designer toolbar.
6.4.1 Creating Dimensions
A database dimension is a collection of associated objects, called attributes, which can be used
to supply information about fact data in one or more cubes.
Attributes can be organized into user-defined hierarchies that supply navigational routes
to assist users when browsing the data in a cube.
Use the Dimension Wizard in SSDT to create a database dimension in a Microsoft SQL Server
Analysis Services project. After a database dimension is created, you can use Dimension
Designer to modify its properties. To understand the concept better we will create a DateTime
Dimension.
Source:http://www.blrf.net/blog/wp-content/uploads/2011/06/visual_studio_2008_dimension_
wizard_create_datetime_dimension.png
Click Next and, on this window, un-check all related tables offered to you and click Next again.
On Select Dimension Attributes, only the attributes T, Year, Month and Day should be
enabled.
Source: http://www.blrf.net/blog/wp-content/uploads/2011/06/visual_studio_2008_dimension_
wizard_select_dimension_attributes.png
Click Next and, on the next window, name this dimension DateTime and click Finish.
Every dimension has its own Attributes and those attributes can be assigned into Hierarchies,
which define how different attributes are related to each other.
In Microsoft SQL Server Analysis Services, you can use the Dimension Wizard in SSDT to
create a time dimension when no time table is available in the source database. This can be done
by selecting one of the following options on the Select Creation Method page:
• Generate a time table in the data source: Select this option when you have permission to
create objects in the underlying data source.
• Generate a time table on the server: Select this option when you do not have permission
to create objects in the underlying data source.
In this section, we will create a user-defined hierarchy and change the member names that are
displayed for the Date, Month, Calendar Quarter, and Calendar Semester attributes.
You can add a named calculation to a table in a data source view. The expression appears and
behaves as a column in the table.
1. Open the data source view by double-clicking it in the Data Source Views folder in
Solution Explorer.
2. In the Tables pane, right-click Date, and then click New Named Calculation.
3. In the Create Named Calculation dialog box, type SimpleDate in the Column name box,
and then type the following DATENAME statement in the Expression box:
DATENAME(mm, FullDateAlternateKey) + ‘ ‘ +
DATENAME(dd, FullDateAlternateKey) + ‘, ‘ +
DATENAME(yy, FullDateAlternateKey)
4. Click OK, and then expand Date in the Tables pane. On the File menu, click Save All.
5. In the Tables pane, right-click Date, and then click Explore Data.
6. Review the last column in the Explore Date Table view. Close the Explore Date Table
view.
Self Assessment
9. We can use the...............................to add, delete, or modify rows of data in the table.
12. In Microsoft SQL Server Analysis Services, you can use the..............................in SSDT to
create a time dimension when no time table is available in the source database.
Example: In the following Employee table, the column that identifies each member
is Employee_Number. The column that identifies the parent of each member
is Manager_Employee_Number.
Table 6.2: Sample Employee Table
Source: http://i.msdn.microsoft.com/dynimg/IC553.gif
These columns can be used to define a parent-child dimension that contains the following
member hierarchy.
Figure 6.5: Sample Member Hierarchy
Source: http://i.msdn.microsoft.com/dynimg/IC44843.gif
Both columns must have the same data type. Both columns must be in the same table.
When you define a parent-child dimension, you can also select a third column to provide
member names, which are displayed to end users as they browse cubes. The depth of a
parent-child dimension can vary among its hierarchy's branches.
You can use the Dimension Wizard to create parent-child dimensions. After you create a parent-
child dimension, you can edit it in Dimension Editor (if the dimension is shared) or Cube Editor
(if the dimension is private).
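As an illustration only, the Employee table described above could be defined in the relational source roughly as follows; the data types and the name column are assumptions:
CREATE TABLE Employee (
    Employee_Number         int NOT NULL PRIMARY KEY,  -- key attribute of the dimension
    Manager_Employee_Number int NULL,                  -- parent attribute; NULL for the top member
    Employee_Name           nvarchar(50) NOT NULL      -- optional third column used for member names
);
Note that both key columns share the same data type and live in the same table, as required above.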
Task Make a report on the application of business intelligence and planning in midsize companies.
Self Assessment
13. A parent-child dimension is based on two dimension table columns that together define
the lineage relationships among the members of the dimension.
14. When you define a parent-child dimension, you can also select a third column to
provide member names, which are displayed to end users as they browse cubes.
15. You can use the Dimension Wizard to create parent-child dimensions.
Case Study
Managing Data Sources for Input to Data
Warehousing and Business Intelligence
Data warehousing and business intelligence effort is only as good as the data that
is put into it. The saying "Garbage In, Garbage Out" is all too true. A leading
cause of data warehousing and business intelligence project failures is obtaining the wrong or poor-quality data.
Managing data warehouse input sources includes a number of steps organized into
two phases. In the first phase the following activities are undertaken:
• Manage the Data Source Identification Process
When the major data sources have been identified it is time to quickly gain detailed
understanding of each one:
• Obtain Existing Documentation
The source identification process is critical to the success of data warehousing and
business intelligence projects. It is important to move through this effort quickly,
obtaining enough information about the data sources without being bogged down in
excess detail while still obtaining the needed information.
Start out with a list of the entities planned for the data warehouse / data mart. This
can be managed with a spreadsheet containing these columns:
• Entity name
• Subject Area
• Data Source(s)
• Analyst Name(s)
• Status
Complete the entity name, data mart role and subject area entries. Assign an analyst to
each entity who will find data sources and subject matter experts for each entity.
Consider the following questions when determining the sources and costs of data for
the Data Warehouse:
• Where does the data come from?
Dimensions enable business intelligence users to put information in context. They focus
on questions of: who, when, where and what. Typical dimensions include:
• Time period/calendar
• Product
• Customer
• Household
• Market Segment
• Geographic Area
Master data is a complementary concept and may provide the best source of dimensional
data for the data warehouse. Master data is data shared between systems that describe
entities like: product, customer and household. Master data is managed using a Master
Data Management (MDM) system and stored in an MDM-Hub. Benefits of this approach
include:
• It is less expensive to access data from a single source (MDM-Hub) than extracting
from multiple sources.
• MDM data is rationalized.
If an MDM-Hub does not exist consider creating one. It will have many uses beyond
supporting the data warehouse and business intelligence.
If no MDM-Hub is available, you will need to examine source systems and determine
which system contains the data most suitable for dimensions. If the data is not stored
in a managed database, you may need to define the data locally, in a spreadsheet or
desktop database, and then provide it to the data warehousing system.
Identify Fact Data Sources for the Data Mart
Fact tables contain quantitative measurements, while dimension tables contain classification
information. The data sources for facts tend to be transactional software systems. For
example:
System               Example Fact Data
Sales Order Entry    Sales Transaction; Return Transaction
Larger enterprises may have multiple systems for the same kind of data. In that
case, you will need to determine the best source of data, the System of
Record (SOR), as the source of data warehousing data.
Detailed Data Source Understanding for Data Warehousing
When the major data sources have been identified it is time to quickly gain
detailed understanding of each one. Consolidate the spreadsheet developed in
the identification phase by data source, then create a new spreadsheet to
track and control detailed understanding:
• Data Subject Name
• Obtain Doc Date
• Define Input Date
• Profile Input Date
• Map Date
• Data Quality Date
• Save Results
• Analyst Name
• SME Name(s)
• Status
Obtain Existing Documentation
When seeking to understand a data source, the first thing to do is look at
existing documentation. This avoids “re-inventing the wheel”. If a data source
is fully documented, data profiled and of high quality most of the job of data
source discovery is complete.
Existing documentation may include:
• Data models
• Data dictionary
• Internal/technical documentation
• Business user guides
• Data profiles and data quality assessments
Check through the documentation to assess its completeness and usefulness.
The data source analyst should study the existing documentation before any
in-depth discussions with the SMEs. This improves the credibility of the data
analyst and saves time for the SMEs.
Model and Define the Input
The data model is a graphic representation of data structures that improves
understanding and provides automation linking database design to physical
implementation. This section assumes that the data source is stored in a
relational database that is modelled using typical relational data modelling tools.
If there is an existing data model, start with that; otherwise use the reverse
engineering capability of the data modelling tool to build a physical data model.
Next, group the tables that are of interest into a subject area for analysis.
Unless a large percentage of the data source is needed for the data
warehouse, avoid studying the entire data source. Stay focused on the current
project.
For each selected data source table define:
• Physical Name
• Logical Name
• Definition
• Notes
For each selected data source column define:
• Physical Name
• Logical Name
• Order in Table
• Datatype
• Length
• Decimal Positions
• Nullable/Required
• Default Value
• Edit Rules
• Definition
• Notes
Profile the Data Source
The actual use and behaviour of data sources often tend not to match the
name or definition of the data. This is sometimes called "dirty data" or
"unrefined data", which may have problems such as:
• Invalid code values
• Missing data values
• Multiple uses of a single data item
• Inconsistent code values
• Incorrect values such as sales revenue amounts
Data profiling is an organized approach to examining data in order to better understand
and later use it. This can be accomplished by querying the data using tools
like:
• SQL queries
• Reporting tools
• Data quality tools
• Data exploration tools
For code values such as gender code and account status code, produce a listing
showing each distinct value and its count.
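A simple profiling query of this kind can be written in SQL; the table and column names below are hypothetical and would be replaced by those of the actual source system:
SELECT Gender_Code, COUNT(*) AS Row_Count   -- one row per distinct code value
FROM dbo.Customer                           -- hypothetical source table
GROUP BY Gender_Code
ORDER BY Gender_Code;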
Other systems may represent female and male as 1 and 2 rather than F and M,
and so may require standardization when stored in the data warehouse. When
data from multiple sources is integrated in the data warehouse, it is expected
that it will be standardized and integrated.
Statistical measures are a good way to better understand numeric information
such as revenue amounts; a query like the one sketched after this list can compute several of them. Helpful statistics are:
• Mean (average)
• Median
• Mode
• Maximum
• Minimum
• Quartile Averages
• Variance
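A sketch of such a profiling query in SQL, using hypothetical table and column names, is shown below; median, mode and quartiles need additional logic (for example PERCENTILE_CONT in SQL Server 2012 and later):
SELECT
    AVG(Sales_Revenue) AS Mean_Revenue,     -- mean (average)
    MIN(Sales_Revenue) AS Minimum_Revenue,
    MAX(Sales_Revenue) AS Maximum_Revenue,
    VAR(Sales_Revenue) AS Revenue_Variance  -- sample variance
FROM dbo.Sales_Fact;                        -- hypothetical source table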
Data profiling may reveal problems in data quality. For example, it might show
that invalid values are being entered for a particular column, such as entering 'Z' for gender
when 'F' and 'M' are the valid values. Some steps that could be taken to improve data
quality include:
• Work with data owners to define the appropriate level of data quality. Build this
into a data governance program.
• Determine why there are data quality problems — do a root cause analysis.
• Correct the data in the source system through manual or automated efforts.
• Make data quality visible to the business through scorecards, dashboards and
reports.
The information gathered during the data source discovery process is valuable
metadata that can be useful for future data warehousing or other projects. Be sure to
save the results and make them available for future efforts. This work can be a great step
toward building an improved data resource.
Question:
Discuss the case study in contrast with efficient and effective workflow of obtaining
the right source data and using it in the data warehousing and business intelligence
project.
Source: http://infogoal.com/datawarehousing/data_sources_2.htm
6.6 Summary
• A multidimensional model must contain at least one data source object, but you can
add more to combine data from several data warehouses.
• A data source connection can use either Windows authentication or an
authentication service provided by the database management system, such as SQL
Server authentication when connecting to SQL Azure databases.
• The connection string is formulated based on the properties you choose in the Data
Source Designer or the New Data Source Wizard.
• After you have defined the data sources that you will use in an Analysis Services project,
the next step is generally to define a data source view for the project.
• The ability to use the DataView to modify data in the underlying table is controlled by
setting one of three Boolean properties of the DataView: AllowNew, AllowEdit, and
AllowDelete.
6.7 Keywords
Data Source Reference: A data source reference is an association to another Analysis Services
project or data source in the same solution.
OLTP (Online Transaction Processing): Online transaction processing, or OLTP, is a class of
information systems that facilitate and manage transaction-oriented applications, typically for
data entry and retrieval transaction processing.
Oracle: The Oracle Database (commonly referred to as Oracle RDBMS or simply as Oracle) is
an object-relational database management system produced and marketed by Oracle
Corporation.
Parent-child Dimension: A parent-child hierarchy is a hierarchy in a standard dimension
that contains a parent attribute.
RDBMS: RDBMS stands for Relational Database Management System. RDBMS data is
structured in database tables, fields and records.
SQL: SQL (Structured Query Language) is a special-purpose programming language designed
for managing data held in a relational database management system (RDBMS).
1. True 2. True
3. True 4. False
5. True 6. Maximize
7. Zoom 8. Auto Hide
9. DataView 10. Attributes
11. SSDT 12. Dimension Wizard
13. True 14. True
15. True
Books Carlo Vercellis (2011). “Business Intelligence: Data Mining and Optimization for Decision
Making”. John Wiley & Sons.
David Loshin (2012). “Business Intelligence: The Savvy Manager’s Guide”. Newnes.
CONTENTS
Objectives
Introduction
7.1.1 Explanation
7.4 Calculated Members
7.6 Summary
7.7 Keywords
Introduction
A data cube is a three (or more) dimensional array of values, commonly used to
describe a time series of image data. An OLAP cube is an array of data understood
in terms of its 0 or more dimensions. A benefit of building a cube to store your data
is that you can centralize the business rules for calculations that you can't easily
store in a relational data mart. The structure of the cube makes it much easier to
write queries to compare data year over year, and you also gain the ability to
transparently manage aggregated data in the cube. In this unit, you will learn
about creating a cube using the Cube Wizard. It also discusses adding measures and measure
groups to a cube. Finally, calculated measures and deploying and browsing a cube
are discussed.
Use the Cube Wizard to create a cube rapidly and effortlessly. When you create the
cube, you can add living dimensions or create new dimensions that structure the
cube. You can also create dimensions separately, using the Dimension Wizard, and
then add them to a cube.
A cube acts as an OLAP database to the subscribers who need to query data from
an OLAP data store. A cube is the main object of an SSAS solution, where the
majority of fine tuning, calculations, aggregation design and so on takes place. Now, we will create a
cube using our dimension and fact tables. We will use SQL Server 2008 here.
7.1.1 Explanation
Right-click the Cube folder and select "New Cube", and it will invoke the Cube
Wizard. In the first screen select one of the methods of creating a Cube. We
assume our dimensions are ready, and schema is already designed to contain
dimension and fact tables. So we will select the option of “Use existing tables”.
Figure 7.1: Select Creation Method
Source: http://www.mssqltips.com/tutorialimages/2008_Cube_Wizard_Step_1.jpg
In the next screen, we need to select the tables which will be used to create
measure groups. We again assume we have a DSV which has fact tables in the
schema. So we will use this as shown in the Figure 7.2.
Source: http://www.mssqltips.com/tutorialimages/2008_Cube_Wizard_Step_2.jpg
In the next screen, we need to select the measures that we want to create from the
fact tables we just selected in the previous screen. For now, select all the fields as
shown below and move to the next screen.
Figure 7.3: Select Measures
Source: http://www.mssqltips.com/tutorialimages/2008_Cube_Wizard_Step_3.jpg
In this screen you need to select any existing dimensions. We have created three
dimensions and we will include all of these dimensions as shown below:
Figure 7.4: Select Existing Dimensions
Source: http://www.mssqltips.com/tutorialimages/2008_Cube_Wizard_Step_4.jpg
In the next screen, we can select if we want to create any additional new
dimensions from the tables available in the DSV. We do not want to create any
more dimensions, so unselect any selected tables as shown below and move to the
next screen.
Figure 7.5: Select New Dimensions
Source: http://www.mssqltips.com/tutorialimages/2008_Cube_Wizard_Step_5.jpg
Finally you need to name your cube, which is the last step of the wizard before
your cube is created. Name it something appropriate like "Sales Cube" as shown
below.
Figure 7.6: Completing the Wizard
Source: http://www.mssqltips.com/tutorialimages/2008_Cube_Wizard_Step_6.jpg
Now your cube should have been created, and if your cube editor is open you
should find different tabs to configure and design various features and aspects of
the cube.
Figure 7.7: Final Cube Structure
Source: http://www.mssqltips.com/tutorialimages/2008_Cube_Wizard_Step_7.jpg
Task Find out the defaults used by the Cube Wizard.
Self Assessment
2. Right-click the Cube folder and select “New Cube”, and it will invoke the .........................
Did u know? A cube provides a single place where all related data, for analysis, is stored.
• Dimensions
• Partitions
• Perspectives
• Hierarchies
• Actions
• Calculations
• Translations
You can use the Define New Measures page to create new measures for a cube
that is being created without using a data source.
To get familiar with all related options of a cube, read the following:
• The Select measures from template option displays the measures
from the cube template to include in the cube.
• To include a specific measure from the template, select the check box for that measure.
• To include all measures from the template in the cube, select the check box in the header.
• Use Measure Name to list the measures that are available in the template.
• To rename a measure, click on that measure and type a new name.
• Use Measure Group to list the measure group for the measure.
• To change the measure group, click on the measure group, and then either
enter a new measure group or select an existing measure group from the
list.
1. In Solution Explorer, right-click the cube, and then click View Designer.
3. Either click the New Measure Group button or right-click anywhere in the
Measures pane and then click New Measure Group.
4. In New Measure Group, click the table from the data source view that you
want to use as the new measure group, and then click OK.
1. In Solution Explorer, right-click the cube, and then click View Designer.
3. In the Measures pane, click the measure group that you want to remove.
4. Either click the Delete button or right-click the measure group and then click
Delete.
5. In the Delete Objects dialog box, review the object to be deleted, and then
click OK.
Self Assessment
4. A cube provides a single place where all related data, for analysis, is
..........................
!
Caution Only the definitions for calculated members are retained; values are
calculated in memory when required to answer a query.
Calculated members enable you to add members and measures to a cube without
expanding its size. Although calculated members must be based on data (such as members)
that currently lives in the cube, you can create complex expressions
by combining this data with arithmetic operators, numbers, and functions.
Calculated members have a Format String property that controls the format of cell
values displayed to end users. This property is accessed in the properties pane of
Cube Editor. The Format String property accepts the same values as the Display
Format property of measures.
Source: http://i.msdn.microsoft.com/dynimg/IC574394.gif
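In the cube's MDX calculation script, a calculated member with its Format String property set might look like the following sketch; the measure names are assumptions and would need to match your own cube:
CREATE MEMBER CURRENTCUBE.[Measures].[Profit Margin]
    AS [Measures].[Profit] / [Measures].[Sales],
    FORMAT_STRING = 'Percent',    -- controls how cell values are displayed to end users
    VISIBLE = 1;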
Self Assessment
7. Only the definitions for calculated members are retained; values are
calculated in memory when required to answer a query.
8. Calculated members have a Format String property that controls the format
of cell values displayed to end users.
Example: You may have to define dimension member sort orders, delete
unnecessary dimension attributes, define new user hierarchies, modify existing
user hierarchies, or configure measure properties.
After you deploy a cube, cube data is viewable on the Browser tab in Cube
Designer, and dimension data is viewable on the Browser tab in Dimension
Designer.
2. Select the Browser tab, and then click Reconnect on the toolbar of
the designer. The following image highlights the individual panes in Cube Designer.
Source: http://i.technet.microsoft.com/dynimg/IC29955.gif
As you can see, the left pane of the designer shows the metadata for the Analysis
Services Tutorial cube. Perspective and Language options are available on the toolbar.
Notes The Browser tab includes two panes to the right of the metadata pane: the upper pane is the filter pane, and the lower pane is the data pane.
4. In the metadata pane, expand Product. Drag the Product Model Lines user
hierarchy to the Drop Column Fields Here area of the data pane, and then
expand the Road member of the Product Line level of this user hierarchy.
5. In the metadata pane, expand Customer, expand Location, and then drag the
Customer Geography hierarchy from the Location display folder in the
Customer dimension to the Drop Row Fields Here area of the data pane.
6. In a similar way, expand United States, Order Date, Customer and other
headings as required. Figure 7.10 shows Internet sales by region and product line
for the month of February, 2002.
Figure 7.10: Internet Sales by Region and Product Line for the Month of February, 2002
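The same view of the data can also be retrieved with an MDX query; the cube, hierarchy, and measure names below follow the tutorial cube used in this walkthrough and are assumptions that may need adjusting for your own project:
SELECT
    [Product].[Product Model Lines].[Product Line].MEMBERS ON COLUMNS,
    [Customer].[Customer Geography].[Country].MEMBERS ON ROWS
FROM [Analysis Services Tutorial]
WHERE ( [Measures].[Internet Sales-Sales Amount] )
-- add an Order Date member to the WHERE tuple to restrict the result to February 2002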
Self Assessment
9. After you deploy a cube,..................is viewable on the Browser tab in Cube Designer,
and.........................is viewable on the Browser tab in Dimension Designer.
10. The Browser tab includes two panes to the right of the metadata pane: the
upper pane is the ............................., and the lower pane is
the ...................................
Case Study
Building a Data Cube
This example uses sales figures from XYZ Co., which makes many kinds of widgets.
For each sales transaction, we know four pieces of data:
• Which types of widget were involved (style, colour, size and so on)
• Store or sales agent
• Sales amount
• Geographic region or territory
In a real-world situation, we would also know many other data items, including:
• Quantity
• Customer
• Cost to XYZ for each widget
• Order date
• Shipment date
• Method and cost of shipping
Any of these pieces of data can function as a dimension in a data cube. We can take any two
dimensions and produce a 2-D table. Thus we can correlate or track sales against individual stores or sales agents.
Add in a third dimension, such as geographic region, and the table becomes a cube.
That allows us to see how much each store or sales agent is selling in addition to which types of widget are selling in each territory.
We can now see who is selling where.
Questions:
What are the different pieces of data available for the company here?
What can be the effect of a price change (an increase of 25 per cent) in the United States on the sales in this case?
Source: http://www.computerworld.com/s/article/91640/Data_Cubes
7.6 Summary
• A cube is the main object of an SSAS solution; it is where the majority of fine
tuning, calculations, aggregation design and so on takes place. In this unit, a cube
was created from dimension and fact tables using SQL Server 2008.
• Browsing a deployed cube helps you understand the modifications that you
should make to improve the functionality of the cube.
• The Browser tab includes two panes to the right of the metadata pane: the
upper pane is the filter pane, and the lower pane is the data pane.
7.7 Keywords
Cube: A cube is the main object of an SSAS solution, where the majority of fine
tuning, calculations, aggregation design and so on takes place.
Cube Wizard: Use this wizard to create a cube. The wizard helps you select the
data source, fact table, measures, and dimensions for a new cube.
6. What are the steps for removing a measure group from a cube?
3. Cube 4. Stored
7. True 8. True
“Business Intelligence”. O’Reilly Media, Inc.
Rajiv Sabhrwal, Irma Becerra-Fernandez (2010). “Business Intelligence”. John Wiley & Sons.
Swain Scheps (2013). “Business Intelligence for Dummies”. Wiley.
CONTENTS
Objectives
Introduction
8.6 Summary
8.7 Keywords
Objectives
Introduction
In this unit, you will learn advanced measures and calculations. The aggregate
functions like SUM, MIN, MAX and COUNT are discussed. It also explains how to
use MDX to conditionally apply formatting to a measure or calculated member and
retrieve data from a cube. As the unit progresses, you will learn about calculation
scripts. Finally, you will learn to create Key Performance Indicators (KPIs) that
combine expressions and graphics for actual, target, status, and trend values.
Aggregate tables store pre-computed results, which are measures that have been
aggregated (typically summed) over a set of dimensional attributes. Using
aggregate tables is a very popular technique for speeding up query response times
in decision support systems. This eliminates the need for run-time calculations
and delivers faster results to users. The calculations are done ahead of time and
the results are stored in the tables. Aggregate tables should have many fewer rows
than the non-aggregate tables, and therefore, processing should be quicker.
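As an illustration only, an aggregate table that pre-computes sales by product category and calendar year could be built with a query such as the following; every table and column name here is hypothetical:
SELECT
    p.Product_Category,
    d.Calendar_Year,
    SUM(f.Sales_Amount) AS Sales_Amount,
    COUNT(*)            AS Transaction_Count
INTO dbo.Agg_Sales_Category_Year            -- the pre-computed aggregate table
FROM dbo.Sales_Fact AS f
JOIN dbo.Product    AS p ON p.Product_ID = f.Product_ID
JOIN dbo.Date_Dim   AS d ON d.Date_ID    = f.Date_ID
GROUP BY p.Product_Category, d.Calendar_Year;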
Once the groundwork has been laid, MDX queries and the use of several MDX and
SAS functions within those queries will be demonstrated.
Notes The examples provided will allow you to customize the OLAP cube report data and leverage the power of MDX.
Self Assessment
Example: If you want to calculate average sales per customer, you divide
total sales by number of customers. You can sum sales amount to get total sales,
but to get the number of customers, you need to count customers, making sure to
count each customer only once, regardless of how many purchases each customer
has made.
Suppose you want to analyse the overall gross margin for every product in your
data source. One way to do this is to create a new calculated field called Margin
that is equal to the profit divided by the sales. Then you could place this measure
on a shelf and use the predefined summation aggregation. Here, Margin is defined as:
Margin = SUM( [Profit] / [Sales] )
This formula calculates the ratio of profit and sales for every row in the data
source, and then sums the numbers.
!
Caution However, this is almost certainly not what you would have intended.
Instead, you probably want to know the sum of all profits divided by the sum of all
sales. That formula is shown below:
Margin = SUM( [Profit]) / SUM([Sales])
Source: http://www.anzmall.com/node/89
Let us understand the aggregate functions using an example. The cube that these
examples use has a single measure, Sales, based on the Sales_Amount column in the fact table. The cube has three dimensions:
• Customers, based on the table Customers and containing these levels from
highest to lowest:
❖ (All)
❖ Customer with Customer_Name as the member name column and Customer_ID as the member key column
• Retail Stores, based on the table Retail_Stores and containing these levels
from highest to lowest:
❖ (All)
❖ Retail Store with Retail_Store_Name as the member name
column and Retail_Store_ID as the member key column
• Products, based on the table Products and containing these levels from highest
to lowest:
❖ (All)
❖ Product Category with Product_Category as the member name column
and the member key column
Source: http://msdn.microsoft.com/en-us/library/ms365396.aspx
Sum
If a measure’s Aggregate Function property value is Sum, the measure value for a
cube cell is calculated by adding the values in the measure’s source column from
only the rows for the combination of members that defines the cell and the
descendants of those members.
The following examples return values that represent accumulated Sales:
• A query on the Sales measure for customer A, retail store A, and product A returns 800.
• A query on the Sales measures for customer A, retail store A, and product
category AB returns 900.
• A query on the Sales measure places each retail store on the x-axis, nests
products under product categories on the y-axis, and slices by All Customers.
It returns the result as shown in Table 8.6.
Table 8.6: Result of Query
Source: http://msdn.microsoft.com/en-us/library/ms365396.aspx
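Expressed as an MDX query, the first of these requests might look like the sketch below; the cube name and the member names are assumptions based on the example, not objects that necessarily exist in your database:
SELECT
    { [Measures].[Sales] } ON COLUMNS
FROM [Sales Cube]
WHERE ( [Customers].[Customer].[Customer A],
        [Retail Stores].[Retail Store].[Retail Store A],
        [Products].[Product].[Product A] )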
Min
If a measure’s Aggregate Function property value is Min, the measure value for a
cube cell is calculated by taking the lowest value in the measure’s source column
from only the rows for the combination of members that defines the cell and the
descendants of those members.
The following examples return values that represent the lowest Sales price:
• A query on the Sales measure for customer A, retail store A, and product A
returns 250.
• A query on the Sales measures for customer A, retail store A, and product
category AB returns 100.
• A query on the Sales measure places each retail store on the x-axis, nests
products under product categories on the y-axis, and slices by All Customers.
It returns the result as shown in Table 8.7.
Table 8.7: Result of Query
Source: http://msdn.microsoft.com/en-us/library/ms365396.aspx
Max
If a measure’s Aggregate Function property value is Max, the measure value for a
cube cell is calculated by taking the highest value in the measure’s source column
from only the rows for the combination of members that defines the cell and the
descendants of those members.
Table 8.8: Result of Query
Source: http://msdn.microsoft.com/en-us/library/ms365396.aspx
The examples return values that represent the highest Sales price.
• A query on the Sales measure for customer A, retail store A, and product A
returns 300.
• A query on the Sales measures for customer A, retail store A, and product
category AB returns 300.
• A query on the Sales measure places each retail store on the x-axis, nests
products under product categories on the y-axis, and slices by All Customers.
It returns the result as shown in Table 8.8.
Count
If a measure’s Aggregate Function property value is Count, the measure value for
a cube cell is calculated by adding the number of values in the measure’s source
column from only the rows for the combination of members that defines the cell
and the descendants of those members.
The following examples return values that represent the number of Sales
transactions.
• A query on the Sales measure for customer A, retail store A, and product A
returns 3.
• A query on the Sales measures for customer A, retail store A, and product
category AB returns 4.
• A query on the Sales measure places each retail store on the x-axis, nests
products under product categories on the y-axis, and slices by All Customers.
It returns the result as shown in Table 8.9.
Table 8.9: Result of Query
Source: http://msdn.microsoft.com/en-us/library/ms365396.aspx
Self Assessment
To use data from an Analysis Services cube in your report, you should define an Analysis Services data source and create one or more report datasets.
When you define the data
source definition, you should specify a connection string and credentials so that
you can get access to the data source from your client computer.
Did u know? You can create embedded data source definition for use by a
single report or a shared data source definition that can be utilised by
multiple reports.
The methods in this topic describe how to create an embedded data source. To
create an embedded Microsoft SQL Server Analysis Services data source:
1. On the toolbar in the Report Data pane, click New, and then click Data
Source.
2. In the Data Source Properties dialog box, type a name in the Name text box,
or accept the default name.
3. Verify that Embedded connection is selected.
4. From the Type drop-down list, select Microsoft Sql Server Analysis Services.
5. Specify a connection string that works with your Analysis Services data
source. Contact your database administrator for connection information and
for the credentials to use to connect to the data source. The following
connection string example specifies the AdventureWorksDW database on
the local client:
Data Source=localhost;Initial Catalog=AdventureWorksDW
6. Click Credentials and then Click OK. The data source appears in the Report
Data pane.
The default scope is the whole cube, but you can define a more limited scope,
known as a subcube, and then apply an MDX script to only that particular cube
space. The SCOPE statement defines the scope of all subsequent MDX expressions
and statements in the calculation script until the scope is terminated or redefined.
The THIS statement is then used to apply an MDX expression to the current scope.
You can use the BACK_COLOR statement to specify a background cell colour for
the cells in the current scope, to help you during debugging.
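A minimal calculation script sketch using these statements is shown below; the measure names are assumptions and the colour value is simply a numeric literal used while debugging:
SCOPE ( [Measures].[Sales Amount Budget] );        -- restrict the calculation to a subcube
    THIS = [Measures].[Sales Amount] * 1.10;       -- assign a value to every cell in the current scope
    BACK_COLOR ( THIS ) = 65535;                   -- yellow background to make the affected cells visible
END SCOPE;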
1. The default MDX script: At the time that you create a cube, Analysis Services
creates a default MDX script for that cube. This script defines a calculation
pass for the whole cube.
2. User-defined MDX script: After you have created a cube, you can add user-defined
MDX scripts that extend the calculation capabilities of the cube.
Task “Script commands let you perform almost any action that is supported by MDX on a cube.” Discuss.
9. The default scope is the whole cube, but you can define a more limited scope, known as a ................................. and then apply an MDX script to only that particular cube space.
10. There are two types of MDX scripts:
1. Open a KPI for editing or create a new KPI. To create a new KPI, do one of the following:
(a) In the global header, hover the mouse pointer over the New menu,
select KPI, and from the Select Subject Area dialog, select a subject
area for the KPI. The ”KPI editor” is displayed.
2. On the ”KPI editor: General Properties page”, specify the business owner, actual
value, and target value, and indicate if you want to enable trending to
determine performance patterns.
3. On the ”KPI editor: Dimensionality page”, select the dimensions (for example,
Sales by Region and by Financial Quarter) that you want to use to aggregate
the KPI’s actual and target values.
Notes Note that you should include a time dimension for most KPIs. Exceptions include constants or metrics that are defined without a time component.
4. On the ”KPI editor: Thresholds page”, indicate the desired goal based on KPI values
(for example, “High Values are Desirable”), define the ranges that evaluate
KPI values to determine performance status, and associate performance
levels with actions.
5. On the ”KPI editor: Related Documents page”, add any external links or
business intelligence objects to the KPI.
(a) If you are creating a stand-alone KPI, then click Finish to save the KPI.
(b) If you are creating a new KPI, then the ”Save As dialog” is displayed,
where you specify the KPI’s name and where you want to save the KPI.
(c) If you are creating a KPI from a scorecard, then click Save from the
”Scorecard editor”.
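In Analysis Services itself, a KPI defined on the KPIs tab of the Cube Designer is built from MDX expressions for the value, goal, status, and trend. A minimal sketch, with assumed measure names and an assumed KPI name of "Revenue", is:
-- Value expression
[Measures].[Sales Amount]
-- Goal expression
[Measures].[Sales Amount Quota]
-- Status expression (1 = on target, 0 = at risk, -1 = off target)
Case
    When KpiValue( "Revenue" ) / KpiGoal( "Revenue" ) >= 1.00 Then 1
    When KpiValue( "Revenue" ) / KpiGoal( "Revenue" ) >= 0.85 Then 0
    Else -1
End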
Self Assessment
Most of us have heard stories of business intelligence failures. I
assure you that it is rare for technology to cause the failure.
Unfortunately, it is usually the “softer” issues that bring down the
project. Here is a list of definite project
pitfalls. By understanding these pitfalls, hopefully you will avoid them
altogether or at least decrease their effects when confronted with them.
A recent Nucleus Research report lists their top five IT mistakes in generic IT
projects. These reasons include:
• Customization Overkill
• Lack of Training
We would all like to think that our business intelligence applications are so
intuitive and easy to use that no training is needed. If that is the case, then
why hasn’t everyone in your organization embraced your business
intelligence environment? There are many other related processes or cultural
changes that are needed for full utilization and adoption of your business
intelligence applications. The report relates Salesforce.com stories in which
the technology was deployed with no user training. The companies ended up
either abandoning the tool (which was unfair to the vendor) or worse,
investing significantly in changing bad behaviour later because the users
made up their own procedures for using the tool. Please listen to your vendor
when they offer estimates for the type and length of training needed to use
their tool. Then make sure you only train those employees with a “need to
know,” rather than making the entire company go through the training.
Make sure that you and your consultants understand who is doing what.
Agree on specific roles, responsibilities, costs and estimated time frames
before you initiate the project. If things get off track, do not wait to call a
meeting to determine the problems and potential solutions. A scope
document for the project and another for the consultants’ roles and
responsibilities will eliminate a large heartache later. Specific deliverables
assigned to each person on the project, as well as their time frames, will
benefit you greatly.
The report successfully points out that a project is not over simply because
the application has been deployed. It should end when it is being effectively
used by the business. This is especially true of business intelligence
applications. It may take a while before the business community actually uses
the new application. It may be because other processes must be implemented
before the analytics can be fully utilized. It is important to understand how
the application fits into the business user’s workflow before declaring that
the project is completed. Your project may not truly end for several months,
perhaps even years, after it has been organized.
Question:
Add your own comments about these in terms of specific business intelligence
projects.
Source: http://www.b-eye-network.com/view/1519
8.6 Summary
• The focus of this unit is the utilization of advanced methods of exploring and
surfacing OLAP cube data using Multidimensional Expression Language
(MDX), both in the Enterprise Guide viewer and via the PROC SQL interface
to OLAP.
• Once the groundwork has been laid, MDX queries and the use of several MDX
and SAS functions within those queries will be demonstrated.
• If a measure's Aggregate Function property value is Sum, the measure value for a cube cell is calculated by adding the values in the measure's source
column from only the rows for the combination of members that defines the
cell and the descendants of those members.
• To use data from an Analysis Services cube in your report, you
should define an Analysis Services data source and create one or more
report datasets.
• You can create an embedded data source definition for use by a single report
or a shared data source definition that can be utilised by multiple
reports.
• The default scope is the whole cube, but you can define a more limited scope,
known as a subcube, and then apply an MDX script to only that particular
cube space.
• KPIs are measurements that define and track specific business goals and
objectives that often roll up into larger organizational strategies that require
monitoring, improvement, and evaluation.
8.7 Keywords
Aggregate Functions: In computer science, an aggregate function is a function where
the values of multiple rows are grouped together as input on certain criteria to
form a single value of more significant meaning or measurement such as a set, a
bag or a list.
Key Performance Indicators (KPIs): Key Performance Indicators are valuable for teams,
managers, and businesses to evaluate quickly the progress made against
measurable goals.
Script: A script command is an MDX script, included as part of the definition of the
cube.
7. Highest 8. Script
9. Subcube
8.9 Further Readings
Books Carlo Vercellis (2011). “Business Intelligence: Data Mining and Optimization for Decision
Making”. John Wiley & Sons.
David Loshin (2012). “Business Intelligence: The Savvy Manager’s Guide”. Newnes.
Online links msdn.microsoft.com/en-us/library/dd239327(v=sql.100).aspx
quartetfs.com/en/mdx-query-basics-and-usage-example
www.bidn.com › Home › Blogs › DustinRyan
CONTENTS
Objectives
Introduction
9.1 Creating Dimensions
9.4 Summary
9.5 Keywords
Objectives
After studying this unit, you will be able to:
Introduction
In this unit, you will learn about advanced dimensional designs. Multiple hierarchies can be
created for a dimension to provide alternate views of dimension members. The account
dimension and its associated rules enable you to create and maintain a chart of accounts for
various financial models. Here, you will learn about creating dimensions. You will also be able
to build an account dimension to support financial analysis. Each base schema design technique
brings limitations and implications for aggregate design. Finally, you will learn about
interacting with cubes. Here, the key focus will be on implementing actions, creating
standard actions and creating a drillthrough action.
Multiple hierarchies can be created for a dimension to provide alternate views of dimension
members.
Example: A time dimension that has two hierarchies can comprise of a normal calendar
view and a fiscal calendar view.
In Microsoft® SQL Server™ 2000 Analysis Services, a dimension with multiple hierarchies is
actually two or more distinct dimensions that can share dimension tables and may share the
same aggregations.
Unlike a dimension with a single hierarchy, a dimension with multiple hierarchies uses a period (.) in its name, in the form dimension.hierarchy, to indicate the presence of more than one hierarchy.
Notes When creating dimensions with multiple hierarchies, the hierarchy part of the name should not be the same as any current or future level name or member name.
1. In the Analysis Manager Tree pane, expand the database in which you want to create a
dimension with multiple hierarchies.
2. Right-click the Shared Dimensions folder, point to New Dimension, and then click Wizard.
3. In the second step of the wizard select either Star Schema: A single dimension
table or Snowflake Schema: Multiple, related dimension tables.
4. Follow the remaining wizard steps to define levels and various options for the dimension.
5. In the Finish step of the wizard, enter a name in the Dimension name box.
6. Select the option to create a hierarchy of the dimension.
7. Enter a name in the Hierarchy name box.
8. Click Finish to complete the wizard. After you complete the wizard, Dimension Editor
appears so that you can further refine the dimension.
9. (Optional.) To create another hierarchy of the dimension, from the File menu in
Dimension Editor, point to New Dimension, and then click Wizard. Follow the steps in
the next procedure, “To create a dimension with additional defined hierarchies using the
Dimension Wizard,” beginning with Step 3.
To create a dimension with additional defined hierarchies using the Dimension Wizard:
1. In the Analysis Manager Tree pane, expand the database in which you want to define
additional hierarchies for a dimension with at least one named hierarchy.
2. Right-click the Shared Dimensions folder, point to New Dimension, and then click Wizard.
3. In the second step of the Dimension Wizard select either Star Schema: A single
dimension table or Snowflake Schema: Multiple, related dimension tables.
4. Follow the remaining wizard steps to define levels and various options for the dimension.
6. Select a dimension name having a defined hierarchy from the Dimension name box.
7. Enter a new name in the Hierarchy name box.
8. Click Finish to complete the wizard. After you complete the wizard, Dimension Editor
appears so that you can further refine the dimension.
9. (Optional.) To create another hierarchy of the dimension, from the File menu in
Dimension Editor, point to New Dimension, and then click Wizard. Repeat Steps 3
through 8.
Self Assessment
In Microsoft SQL Server Analysis Services, an account type dimension is a dimension whose
attributes represent a journal of accounts for financial reporting reasons. An account dimension
permits you to selectively manage aggregation across various accounts over time. An account
dimension also lets you use a standard mechanism to resolve most of the non-standard
aggregation issues typically encountered in business intelligence solutions that handle
financial data. If you did not have such a standard mechanism, settling these non-standard
aggregation issues would need Multidimensional Expression (MDX) scripts.
The following table describes the pre-defined properties of Account dimension members:
Table 9.1: Properties of Account Dimension Members
• Account Type Member ID: A selectable option that groups the account member into a type, such as Tax Expense or Liability, for use with business rules. Planning Business Modeler uses this property to determine the aggregation behaviour for accounts. When using this property in a business rule calculation, you must explicitly reference the property as Account Type Member ID. Do not use a substitute such as "Account type".
• Debit/Credit: Indicates whether, for calculation purposes such as aggregation, Planning Business Modeler treats this account type as a debit entry or a credit entry. When used in calculation, an account type that has a debit entry has a negative sign, and an account type that has a credit entry has a positive sign.
• Time Balance: Indicates how Planning Business Modeler handles values in this account type for aggregation. The following values are possible: Sum (aggregation value is the sum of all child members); End (aggregation value is the last non-empty child along the Time dimension, also called last-child aggregation); Avg (aggregation value is the average value of member children).
• Consolidated: Boolean (TRUE-FALSE) value that indicates whether this account type should be included in the Planning Business Modeler consolidation calculations.
• Converted: Boolean (TRUE-FALSE) value that indicates whether this account type should be included in the Planning Business Modeler currency conversion calculations.
• Inter-company: Boolean (TRUE-FALSE) value that indicates whether this account should be included in intercompany calculations.
Source: http://msdn.microsoft.com/en-us/library/bb795367(v=office.12).aspx
You can update the Account Type Member ID property of a member of the Account dimension.
You can also add a new dimension member to the Account dimension and create member
hierarchies in the dimension. In addition, you can also add member sets, member views, and
member properties. The following table shows the ways you can modify user-defined objects in
the Account dimension.
Table 9.2: User-defined Objects
Source: http://msdn.microsoft.com/en-us/library/bb839292(v=office.12).aspx
Self Assessment
9.3.1 Implementing Actions
The purpose of an Online Analytical Processing (OLAP) application is to supply users with
valuable information to drive business decisions. Actions supply another means by which
users can gather information and take steps based on the data they find in cubes. You can
add actions to a cube that users can subsequently execute. An action is typically initiated by a
user or client application and relates to an object in a cube. That object might be a dimension
member or a specific cell, which is then used as a parameter for the action. Not all client
applications are able to execute actions, so make certain you understand the capabilities of the client
application before creating actions in your cube.
You can add several kinds of actions to a cube. A URL action is helpful for navigating to
a particular web location based on cube data.
Example: You might want to visit a customer's website after examining that
customer's data in a cube, or you might want to access information from an internal
reporting web server to get more data about a specific product you're analysing.
The Cube Designer in Business Intelligence Development Studio (BIDS) includes an Actions
tab. You can define an action on this tab by specifying the action name, the action target, the
action type, and the action expression that generates a string used to run the action. An action target
is the portion of the cube to which the action relates and is the object that the client clicks to
launch the action. The action expression is a Multidimensional Expression (MDX) expression that evaluates
to a string applicable to the action type. Each action type has its own syntax requirements,
but usually you include the MDX CURRENTMEMBER function in the action expression to
link the object to the current cube context.
Did u know? In this procedure, we will add a new URL action that opens a web page and executes a search for a product category or subcategory.
3. After the database has been successfully deployed and processed, expand the Cubes
folder in Solution Explorer, right-click the AdventureWorks.cube, and select View
Designer.
4. In the Cube Designer, click the Actions tab. On the Actions tab toolbar, click New Action.
5. In the Action Editor, change the name of the action to Internet Search.
6. An action target is the location in the cube where the action can be executed. An action
target has a target type and a target object. You can choose from several target types.
Example: If you select Cube, the action is available for all cube objects—every
dimension, hierarchy, level, and member.
7. Expand the Target Object list, expand the Product dimension, choose Product by Category, and click OK. You can enter an MDX expression in the Condition text box to further limit the scope of the target.
8. In the Condition text box, enter the following MDX expression:
[Product].[Product by Category].Level IS [Product].[Product by Category].[Category] OR
[Product].[Product by Category].Level IS [Product].[Product by Category].[Subcategory]
You need to select the type of action that you want to create.
Figure 9.1: Actions Tab Toolbar
Source: http://i.msdn.microsoft.com/dynimg/IC574400.gif
9. In the Action Content section of the editor, verify that URL is selected from the Type list.
The Action Expression text box contains the string that will be passed to the application that is launched by the action.
10. In the Action Expression text box, enter the following MDX expression:
“http://search.live.com/results.aspx?q=”
+ [Product].[Product by Category].CurrentMember.Name
+ “&form=QBLH”
Now before you can execute the action, you need to deploy your project. After the project is
successfully deployed, you can browse the cube and execute the URL action that you just
created.
Drillthrough actions supply fast access to the lowest level of detail stored in a cube. When you create a drillthrough action, you choose the dimension attributes and measures that are returned as columns of data when the action is performed. When a user examining a summary value executes the action, the client application runs the drillthrough query supplied by Analysis Services to return and display a set of rows containing the detailed data behind the summary value. Contrary to what the action name suggests, a drillthrough action does not access data stored in the source relational database.
!
Caution Any data that you want to be available for drillthrough must be included in the cube's dimensions and measures.
When Reporting Services is part of your Business Intelligence (BI) infrastructure, you can easily create actions that execute reports. After a report is deployed, you can create an action that executes it.
The enhanced security environment of Windows 7 requires that you modify the default Internet Explorer security configuration if you want to deploy a report to the local instance of Reporting Services.
Notes You must also add yourself to the Reporting Services Content Manager security role.
1. On the Microsoft Windows task bar, click Start, select All Programs, right-click Internet
Explorer, and select Run as Administrator.
2. In the User Account Control dialog box, select Allow.
3. On the Tools menu, click Internet Options.
4. In the Internet Options dialog box, click the Security tab, select Trusted Sites, and then click Sites.
5. In the Trusted Sites dialog box, clear the Require Server Verification (https:)
For All Sites in This Zone check box.
6. In the Add This Website to the Zone text box, type http://localhost.
7. Click Add and then click Close. In the Internet Options dialog box, click OK.
6. A URL action is helpful for navigating to a particular World Wide Web location based on
.................................. and figures.
9. Contrary to the action name, a drillthrough action does not get access to data stored in the
source ..................................
10. After the report is established, you can create an action that executes the
..................................
Case Study  Preferred Medical Plan, Inc. (PMP)
Company Background
Due to the nature of their industry, PMP, Inc. has many internal and external
information reporting needs. Since healthcare insurance is so highly regulated, they
must comply with external state reporting requirements such as enrolment data,
HEIDIS, Child Health Check Up and HIV/AIDS statistics, as well as fraud detection
and prevention efforts. Internally,
there is a need to utilize information to predict and manage expenses, making it feasible
to provide the most comprehensive benefits packages possible to members at a reasonable
cost. PMP Inc’s Information Services Project Manager, John J. Burns, PMP describes
how the company relies on efficient access to data: “From a production point of view,
we’re in the claim adjudication business. We need to determine which type of claim or
which types of providers are submitting claims requiring the most effort to adjudicate.”
To access their data, PMP, Inc. relied on traditional reporting mechanisms and practices
which are heavily dependent on Information Services (IS) resources. John J. Burns
provided insight into the common maintenance and development of reports. “Reports that
were developed internally by IS staff or canned system reports not customized to our
specific needs were run using the same transactional database that was being used in
production. There were times when this would severely affect system performance.” This
impact on the production system was most evident at the beginning of the month, when
most reports are run, and resulted in the production system being brought to a crawl. The
drain on the IS resources was also significant due to the amount of time needed to create
custom reports or modify existing reports, decreasing the amount of time IS staff had
available to devote to other projects.
The Solution
When looking into reporting solutions, PMP, Inc. was concerned about some of the issues
associated with maintaining large sets of data in a data warehouse environment. Through
his experience with the MultiValue community, Burns was aware of MITS products and
the advanced reporting and analytics MITS Discover offered. MITS Discover presented a
way to access near real time data without need for a dedicated data warehouse. During
further review of MITS Discover Burns realized “[the] Discover hypercube concept
eliminated these concerns. We [also] anticipated that the browser-based interface would
reduce training time and combined with the ability to export directly into a spreadsheet
or Adobe PDF documents would empower the end user and not burden the limited
resources of our IS Department.”
PMP, Inc. wanted to find out what types of claims were costing the most to process and
who was filing them. Additionally, they needed to perform comparative analysis, such as
differences between geographical regions, average cost per claim type and cost per claim
by each of their submitting providers. Burns worked with a MITS Product Specialist to
design and implement their first set of Hypercubes to provide this analysis.
As a result of past experience, Burns mentioned, “often a solution based on how a product
functions is offered or even forced on a client.” He did not find this to be the case with
MITS and was impressed with the amount of time the MITS staff took to understand
PMP’s business processes and unique needs. He went on to say, “MITS took the time to
thoroughly comprehend what we hoped to accomplish and how MITS Discover could
meet most of those needs. [The MITS Product Specialist] also clearly stated what the tool
was not designed to do, and knowing that actually helped us get exactly what we required
out of the design of our Hypercubes.”
The Results
While the rollout of MITS Discover to the majority of PMP’s users is still in process,
they are currently using MITS within their Claims, Fraud and Abuse, Utilization,
and IS departments. One example of improved access to data is in their Turnaround
Hypercube. Claims have a 30 day turnaround for payment and with this Hypercube
they are able to identify where outstanding claims stand in their queue. Most claims
are adjudicated
between day 14 and day 17, resulting in a bell curve distribution of the data. This metric has proved to be very useful.
Questions:
1. Analyse the case and provide a solution to Preferred Medical Plan, Inc. (PMP).
2. What is Turnaround Hypercube?
Source: http://www.mits.com/solutions/success-stories/healthcare.html
9.4 Summary
• In Microsoft® SQL Server™ 2000 Analysis Services, a dimension with multiple
hierarchies is actually two or more distinct dimensions that can share dimension tables
and may share the same aggregations.
• Dimensions that have multiple hierarchies can be created in the Dimension Wizard or
Dimension Editor.
• The Account dimension and its associated rules enable you to create and maintain a chart
of accounts for various financial models.
• You can update the AccountTypeMemberID property of a member of the Account
dimension. You can also add a new dimension member to the Account dimension and
create member hierarchies in the dimension.
• The purpose of an Online Analytical Processing (OLAP) application is to supply users with valuable information to drive business decisions.
• The Cube Designer in Business Intelligence Development Studio (BIDS) includes an Actions tab.
• Drillthrough actions supply fast access to the lowest level of detail stored in a cube. When you create a drillthrough action, you choose the dimension attributes and measures that are returned as columns of data when the action is performed.
9.5 Keywords
AccountTypeMemberID: A selectable option that groups the account member into a type.
Cube Designer: The Cube Designer in Business Intelligence Development Studio (BIDS) includes an Actions tab.
Multidimensional Expressions (MDX): Multidimensional Expressions (MDX) is a query language for OLAP databases, much like SQL is a query language for relational databases. It is also a calculation language, with syntax similar to spreadsheet formulas.
URL actions: A URL action is a hyperlink that points to a Web page, file, or other web-based resource outside of the cube.
3. Discuss the process for creating a dimension with a single defined hierarchy using the
dimension wizard.
4. Explain the steps for creating a dimension with additional defined hierarchies using
the dimension wizard.
5. Describe the pre-defined properties of account dimension members.
10. Discuss how to configure internet explorer security to allow reporting services
administration?
3. True 4. True
Books Carlo Vercellis (2011). “Business Intelligence: Data Mining and Optimization for Decision
Making”. John Wiley & Sons.
David Loshin (2012). “Business Intelligence: The Savvy Manager’s Guide”. Newnes.
Unit 10: Retrieving Data from Analysis Services
CONTENTS
Objectives
Introduction
10.4 Summary
10.5 Keywords
Objectives
Introduction
Your cube comprises numerous dimensions and measures from several subject areas: sales, finance, production, and inventory. It can serve as a source of information for multiple workgroups or departments across your organization. Although this centralization is beneficial for users, application developers, and IT administrators,
it might be difficult for users to find their way to the data they need. To help make
your cube simpler to comprehend and navigate, you can create perspectives that
limit the number of dimensions, calculations, actions, and KPIs that users see
when they are browsing or querying a cube. A perspective allows you to create a
view of a subset of a cube that can be more easily comprehended by users. This
unit will provide you an insight into creating perspectives. You will learn about
Multidimensional Expression (MDX) queries. Finally, you will learn about
connecting Excel client to Analysis Services environment.
just like a cube. However, it is important to note that you cannot apply security to a perspective.
Notes If a client has access to a cube, the client has access to all of the perspectives in that cube.
For models that contain many subject areas, for example, Sales, Manufacturing,
and Supply data, it might be helpful to Report Builder users if you create
perspectives of the model. A perspective is a sub-set of a model.
Did u know? Creating perspectives can make navigating through the contents of the
model easier for your model users.
To create a perspective:
1. In the Tree view, right-click Model, point to New, and then click Perspective.
2. In the Edit Perspective dialog box, click Clear All.
3. Locate the Purchase Order Detail entity, and then select its check box.
4. To add all the attributes of the Purchase Order Header entity to the
perspective, clear the check box, and then select the check box again.
5. Locate the Product entity, clear the check box, and then select the check box
again.
6. Click OK.
To rename the perspective
1. To see the new perspective, scroll down to the bottom of the List view. The
last item listed is called New Perspective.
2. Right-click New Perspective, and then click Rename.
3. Type Products and Purchases and on the File menu, click Save All.
Self Assessment
The most widespread use of an MDX query is to extract values from an OLAP cube
to populate a report. A cube has dimensions, but a report does not. Reports have
axes. An axis can include members from more than one dimension.
!
Caution A report generally doesn't show all of the data contained in a cube.
The following syntax shows a basic SELECT statement that includes the use of the
SELECT, FROM, and WHERE clauses:
[ WITH <SELECT WITH clause> [ , <SELECT WITH clause> ... ] ]
SELECT [ * | ( <SELECT query axis clause>
[ , <SELECT query axis clause> ... ] ) ]
FROM <SELECT subcube clause>
[ <SELECT slicer axis clause> ]
[ <SELECT cell property list clause> ]
The MDX SELECT statement supports optional syntax, such as the WITH keyword,
the use of MDX functions and the ability to return the values of specific cell
properties as part of the query.
The following example shows a basic MDX query that uses the SELECT statement.
This query returns a result set that contains the 2010 and 2011 sales and tax
amounts for the North sales territories.
SELECT
  { [Measures].[Sales], [Measures].[Tax] } ON COLUMNS,
  { [Date].[Fiscal].[Fiscal Year].&[2010],
    [Date].[Fiscal].[Fiscal Year].&[2011] } ON ROWS
FROM [Adventure Works]
WHERE ( [Sales Territory].[North] )
In this example, the query defines the following result set information:
• The SELECT clause sets the query axes as the Sales and Tax members of the
Measures dimension, and the 2010 and 2011 members of the Date
dimension.
• The FROM clause indicates that the data source is the Adventure Works cube.
• The WHERE clause defines the slicer axis as the North member of the Sales
Territory dimension.
Notice that the query example also uses the COLUMNS and ROWS axis aliases.
The ordinal positions for these axes could also have been used.
Example: The following example shows how the MDX query could have been
written to use the ordinal position of each axis:
SELECT
  { [Measures].[Sales], [Measures].[Tax] } ON 0,
  { [Date].[Fiscal].[Fiscal Year].&[2010],
    [Date].[Fiscal].[Fiscal Year].&[2011] } ON 1
FROM [Adventure Works]
WHERE ( [Sales Territory].[North] )
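The optional WITH clause mentioned above can be used to define a calculated member that exists only for the duration of the query. The following is a minimal sketch that reuses the illustrative cube and member names from the queries above; the calculated member [Tax Ratio] is hypothetical and is shown only to indicate the general pattern:
WITH MEMBER [Measures].[Tax Ratio] AS
    [Measures].[Tax] / [Measures].[Sales]   // calculated member defined for this query only
SELECT
  { [Measures].[Sales], [Measures].[Tax], [Measures].[Tax Ratio] } ON COLUMNS,
  { [Date].[Fiscal].[Fiscal Year].&[2010],
    [Date].[Fiscal].[Fiscal Year].&[2011] } ON ROWS
FROM [Adventure Works]
WHERE ( [Sales Territory].[North] )
The calculated member behaves like any other measure on the COLUMNS axis, but it is not stored in the cube; it is evaluated when the query runs.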
Self Assessment
4. Learning MDX will permit you to take advantage of some of the more
advanced features of Analysis Services to create precisely the dataset you
need.
• Improve the appearance of your report by hiding field headers and using the expand/collapse buttons.
• Display KPIs.
Analysis Services provides dimensional data that is well-suited for data exploration
in PivotTables and Power View reports. You can get Analysis Services data from:
From within Excel, select the Analysis Services drop down from the Data tab ->
From Other Sources drop down, and then walk through the data connection wizard
to identify location, cube, and credentials.
Source: http://blogs.msdn.com/blogfiles/excel/WindowsLiveWriter/UsingExcelExcelServiceswithSQLServerAnal_B11A/image_thumb.png
You should now have a connection to a cube within Excel and a pivot table ready.
Figure 10.2: Excel Connection Property
Source: http://blogs.msdn.com/blogfiles/excel/WindowsLiveWriter/UsingExcelExcelServiceswithSQLServerAnal_B11A/image_thumb_1.png
You'll notice that connecting to Analysis Services 2008 uses MSOLAP.4. To add the provider to the list of approved providers in Excel Services, go to Central Administration -> Shared Services Administration for Excel Services. Select Trusted Data Providers from the Excel Services Settings of the Shared Services Administration page. Add MSOLAP.4 to the trusted list in order for the connection to work in Excel Services.
Figure 10.3: Excel Services Data Provider
Click Add Trusted Data Provider at the top of the list, and enter
Provider ID = MSOLAP.4
Data Provider Type = OLE DB
Description = Microsoft OLE DB Provider for OLAP Services 10.0.
You are now set and can publish your workbook to Excel Services, and you will be able to view, interact with, and refresh data from Analysis Services 2008 in Excel Services.
Self Assessment
7. Analysis Services provides dimensional data that is well-suited for data exploration in PivotTables and ......................................
8. Connecting to Analysis Services 2008 uses ......................................
Case Study  Data Services Firm Uses Microsoft BI and Hadoop to Boost Insight into Big Data
Klout wanted to give consumers, brands, and partners faster, more detailed insight into hundreds of terabytes of social-network data. It
also wanted to boost efficiency. To do so, Klout deployed a business
intelligence solution based on Microsoft SQL
Server 2012 Enterprise and Apache Hadoop. As a result, Klout processes data
queries in near real time, minimizes costs, boosts efficiency, increases insight,
and facilitates innovation.
Solution
At the time that Klout was initially deploying its solution, SQL Server 2012
and Hive could not communicate directly. To work around this issue,
engineers set up a temporary relational database that runs MySQL 5.5
software. It includes data from the previous 30 days and serves as a
staging area for data exchange and analysis. Klout engineers are currently
working to implement the new open database connectivity driver in SQL
Server 2012 to directly join Hive with SQL Server 2012 Analysis Services. In
addition, to enhance insight Klout plans to work with Microsoft to incorporate
other Microsoft BI tools into its solution, such as Microsoft SQL Server
Power Pivot for Microsoft Excel.
Questions:
1. Analyse the case and provide any other solution to the problem.
Source: based on Microsoft SQL Server 2012 by Klout, http://www.microsoft.com/en-us/sqlserver/product-info/case-studies/klout.aspx
10.4 Summary
• For models that contain many subject areas, for example, Sales,
Manufacturing, and Supply data, it might be helpful to Report Builder users
if you create perspectives of the model.
• You can use Analysis Services as a data source for the Office Excel 2007 PivotTable and PivotChart features.
• From within Excel, select the Analysis Services drop down from the Data tab -
> From Other Sources drop down, and then walk through the data
connection wizard to identify location, cube, and credentials.
• Select Trusted Data Providers from the Excel Services Settings of the Shared
Services Administration page. Add MSOLAP.4 to the trusted list in order for
the connection to work in Excel Services.
10.5 Keywords
MDX SELECT: The MDX SELECT statement supports optional syntax, such as the
WITH keyword, the use of MDX functions and the ability to return the values of
specific cell properties as part of the query.
5. Write down the various features you can use to format and analyse data with
PivotTable in Excel 2007.
3. True 4. True
5. True 6. Office Excel 2007
7. Power View reports 8. MSOLAP.4
Books Carlo Vercellis (2011). “Business Intelligence: Data Mining and Optimization for Decision
Making”. John Wiley & Sons.
Unit 11: Data Mining
CONTENTS
Objectives
Introduction
11.9 Summary
11.10 Keywords
Objectives
Introduction
Data mining refers to the extraction of hidden predictive information from large
databases. Data mining techniques can yield the benefits of automation on
existing software and hardware platforms. Data mining tools can answer
business questions that traditionally were too time
consuming to resolve. In this unit, you will learn about data mining approaches,
uses and its related issues. Also, applications of data mining will be discussed. As
the unit progress, you will learn about data mining models – predictive, summary,
network and association. Finally, data mining algorithms basics will be introduced.
Data
Data are any facts, numbers, or text that can be processed by a computer. Today,
organizations are accumulating vast and growing amounts of data in different
formats and different databases.
Did u know? This includes operational or transactional data (such as, sales,
cost, inventory, payroll, and accounting), non-operational data (such as
industry sales, forecast data etc.) and metadata i.e. data about data.
Information
The patterns, associations, or relationships among all types of data can provide information.
Example: Analysis of retail point of sale transaction data can yield information
on which products are selling and when.
Knowledge
3. Data selection: In this step, data relevant to the analysis task are retrieved
from the database.
6. Pattern evaluation: This step is used to identify the truly interesting patterns
representing knowledge based on some interestingness measures.
Source: http://www.emeraldinsight.com/content_images/fig/0670330804027.png
Data mining enables people to discover information that they can act on to better
understand, selectively market to and retain their best customers, or sharply cut
consumer fraud.
!
Caution In this we use pattern recognition logic to identify trends within a sample data set.
Self Assessment
This approach of segmenting the database via clustering analysis is often used as an exploratory technique.
Source: http://www.ibm.com/developerworks/data/library/techarticle/dm-0811wurst/outlier_by_clustering.jpg
Records inside a cluster are more similar to each other, and more different from records in other clusters. Depending on the specific implementation, some measure of similarity is used, but the general aim is for the approach to converge on groups of related records.
Example: Classes can be defined to represent the likelihood that a customer
defaults on a loan (Yes/No).
Source: http://www.siggraph.org/education/materials/HyperVis/applicat/data_mining/images/tree.gif
Regression
Data mining is utilised for a variety of reasons in both the private and public sectors. Industries such as banking, insurance, medicine, and retailing commonly use data mining to decrease costs, enhance research, and increase sales. Using customer data assembled over several years, businesses can develop models that predict whether a customer is a good credit risk, or whether an accident claim may be fraudulent and should be investigated more closely. The medical community sometimes utilises data mining to help forecast the effectiveness of a procedure or surgery.
Retailers can use data mining to assess the effectiveness of product selection and placement decisions, coupon offers and the like. Companies such as phone service providers and music clubs can use data mining to create a "churn analysis," to assess which customers are expected to stay as subscribers and which ones are likely to switch to a competitor.
In the public sector, data mining applications were initially used as a means to
detect fraud and waste, but now they are used for purposes such as measuring and
improving program performance.
Self Assessment
Privacy
One of the key issues raised by data mining technology is not a business or technological one, but a social one: the issue of individual privacy. Data mining makes it possible to analyse routine business transactions and glean a significant amount of information about individuals' buying habits and preferences.
Data Integrity
Clearly, data analysis can only be as good as the data that is being analysed. A key implementation challenge is integrating inconsistent or redundant data from different sources.
Confusion
Cost
Finally, there is the issue of cost. While system hardware costs have fallen dramatically within the past five years, data mining and data warehousing tend to be self-reinforcing. The more powerful the data mining queries, the greater the utility of the information gleaned from the data, and the greater the pressure to increase the amount of data being collected and maintained, which in turn increases the pressure for faster, more powerful data mining queries. This raises the pressure for bigger, faster systems, which are more expensive.
Self Assessment
7. Data analysis can only be as good as the data that is being analysed.
Many of these companies are using data mining for statistics, pattern recognition, and other significant tasks. Data mining can be utilised to find patterns and associations that would otherwise be difficult to find. This concept is popular with many businesses because it permits them to discover more about their customers and make intelligent marketing decisions.
There are a number of applications that data mining has. The first is called market
segmentation. With market segmentation, you will be able to find behaviours that
are common among your customers. You can look for patterns among customers
that appear to buy the same products at the same time.
Another application of data mining is called customer churn. It will permit you to estimate which customers are most likely to stop purchasing your products or services and move to one of your competitors. In addition to this, a company can use data mining to find out which purchases are the most likely to be fraudulent.
Example: By using data mining a retail shop may be able to determine which
goods are stolen the most.
By finding out which products are stolen the most, steps can be taken to protect those goods and catch those who are stealing them. You can also use data mining to determine the effectiveness of interactive marketing. Some of your customers will be more interested in buying your products online than offline, and you should identify them.
While many use data mining to boost their profits, many of them don’t realize that
it can be used to create new businesses.
Example: Assume that you are the owner of a latest gadgets manufacturing
company, and you are able to accurately predict the next large-scale latest
tendency based on the buying patterns of your customers.
It is safe to say that you would become very wealthy in a short span of time, and you would have an advantage over your competitors. For long-term thinking, rather than simply guessing what the next large-scale trend will be, you will be able to work it out based on statistics, patterns, and reasoning.
Another example of automatic prediction is to use data mining to look at your past marketing campaigns. Which one worked the best? Why did it work the best? Who were the customers that responded most favourably to it? Data mining will allow you to answer these questions, and once you have the answers, you will be able to avoid repeating the errors you made in previous campaigns.
Data mining can allow you to become better at what you do. A financial organisation such as a bank can predict the number of defaults that will happen among its customers within a given time, and it can also forecast the amount of fraud that will occur based on past records.
Example: Suppose you have a tool that can automatically search your database for the patterns that are created. If you have access to this technology, you will be able to find relationships that could allow you to make strategic decisions. This can lead to growth of the organization based on reasoning.
Notes While data mining is a very important tool, it is important to note that it is not foolproof. It cannot guarantee results on its own.
Self Assessment
10.........................can be utilised to find patterns and associations that would else be difficult
to find.
11. With......................, you will be able to find behaviours that are common among your
customers.
Privacy Issues
The concerns about individual privacy have been increasing enormously in recent years, particularly as the internet booms with social networks, e-commerce, forums, blogs and so on. Because of privacy issues, people are afraid that their personal information will be collected and used in unethical ways that could cause them a lot of problems. Businesses collect data about their customers in numerous ways to understand their buying behaviour trends. A stage may come when the organization is acquired by another organization or ceases to exist. At that time, the personal data it owns is likely to be sold to another party or may be leaked.
Security Matters
Security is a large-scale issue. Companies own data about their employees and customers, including social security numbers (as in the US), birth dates, payroll details and so on, yet how well this information is protected is still doubtful. There have been situations where hackers accessed and stole large customer databases from major companies.
Data assembled through data mining for ethical purposes can be misused. This information may be exploited by unethical people or companies to take advantage in various ways, ranging from creating fake identity proofs to passing confidential data to competitors. Also, if incorrect data is used for decision-making, it will affect the results of the company.
3. Network models: This type of model represents data by nodes and links.
Example: Purchases of certain items, such as soft drink and pizza together
will be represented by association models.
Self Assessment
• Segmentation algorithms: This type of algorithm divides data into groups, or clusters, of items that have similar properties.
Self Assessment
Case Study
Logic-ITA Student Data
We have performed a number of queries on datasets collected by the Logic-ITA to assist teaching and learning. The Logic-ITA is a web-based tool used at Sydney University since 2001. Its purpose is to help students practice formal proofs in logic and to inform the teacher of the class progress.
Context of Use
Over the four years, around 860 students attended the course and used the tool, in which an exercise consists of a s
Data Stored
The tool’s teacher module collates all the student models into a database that the teacher can query and mine. Two
Table 1: Common Variables in Table’s Mistake and Correct_step
Data Exploration
Simple SQL queries and histograms can really allow the teacher get a first
overview of the class: what were the most common mistakes, the logic rules
causing the most problems? What was the average number of exercises per
student? Are there any students not finishing any exercise? The list goes on.
To understand better how students use the tool, how they practice and how
they come to master both the tool and logical proofs, we also analysed data,
focussing on the number of attempted exercises per student. In SODAS, the
population is partitioned into sets called symbolic objects. Our symbolic
objects were defined by the number of attempted exercises and were
characterized by the values taken for these newly calculated variables: the
number of successfully completed exercises, the average number of correct
steps per attempted exercise, the average number of mistakes per attempted
exercise. We obtained a number of tables to compare all these objects.
Association Rules
both using (i) k-means in TADA-Ed, and (ii) a combination of k-means and
hierarchical clustering of Clementine. Because there is neither a fixed
number nor a fixed set of exercises to compare students, determining
a distance between individuals was not obvious. We calculated and used a new variable: the total number of mistakes made per student in an exercise. As a result, students with similar frequency of mistakes were put in the same group. Histograms showing the different clusters revealed interesting patterns. There are three clusters: 0 (red, on the left), 1 (green, in the middle) and 4 (purple, on the right). From other windows (not shown), we know that students in cluster 0 made many mistakes per exercise not finished, students in cluster 1 made few mistakes and students in cluster 4 made an intermediate number of mistakes. Students making many mistakes also use many different logic rules while solving exercises; this is shown with the vertical, almost solid lines.
Classification
We built decision trees to try and predict exam marks (for the question related to formal proofs). The Decision Tree algorithm produces a tree-like representation of the model it produces. From the tree it is then easy to generate rules in the form IF condition THEN outcome. Using as a training set the previous year of student data (mistakes, number of exercises, difficulty of the exercises, number of concepts used in one exercise, level reached), as well as the final mark obtained in the logic question, we can build and use a decision tree that predicts the exam mark according to these attributes.
Supporting Teachers and Learners
Pedagogical Information Extracted
The information extracted greatly assisted us as teachers to better understand the cohort of learners. Whilst SQL queries and various histograms were used during the course of the teaching semester to focus the following lecture on problem areas, the more complex mining was left for reflection between semesters. Symbolic data analysis revealed that if students attempt at least two exercises, they are more likely to do more (probably overcoming the initial barrier of use) and complete their exercises.
In subsequent years we required students to do at least 2 exercises as part of
their assessment. Mistakes that were associated together indicated to us that
the very concept of formal proofs (i.e. the structure of each element of the
proof, as opposed to the use of rules for instance) was a problem. In 2003,
that portion of the course was redesigned to take this problem into account
and the role of each part of the proof was emphasized. After the end of the
semester, mining for mistakes associations was conducted again.
Surprisingly, results did not change much (a slight decrease in support and
confidence levels in 2003 followed by a slight increase in 2004). However,
marks in the final exam continued increasing. This leads us to think that
making mistakes, especially while using a training tool, is simply part of the
learning process and was supported by the fact that the number of completed
exercises per student increased in 2003 and 2004. The level of prediction
seems to be much better when the prediction is based on exercises (number,
length, variety of rules) rather than on mistakes made. This also supports the
idea that mistakes are part of the learning process, especially in a practice
tool where mistakes are not penalized.
Using data exploration and results from decision tree, one can infer that if
students do successfully 2 to 3 exercises for the topic, then they seem to have
grasped the concept of formal proof and are likely to perform well in the
exam question related to that topic. This finding is coherent with correlations
calculated between marks in the final exam and activity with the Logic Tutor, and with the general, human perception of tutors in this course. Therefore, a sensible warning system could look as follows: report to the lecturer-in-charge students who have successfully completed fewer than 3 exercises. For those students, display the histogram of rules used. Be proactive towards these students, distinguishing those who use the pop-up menu for logic rules from the others.
ITS with Proactive Feedback
Data mining findings can also be used to improve the tutoring system. We
implemented a function in Tada-Ed allowing the teacher to extract patterns
with a view to integrate them in the ITS from which the data was recorded.
Presently this functionality is available for the Association Rule module. That is, the teacher can extract any association rule. Rules are then saved in an XML file and fed into the pedagogical module of the ITS. Along with the pattern, the teacher can specify a URL that will be added to the feedback window, where the teacher can design his/her own proactive feedback for that particular sequence of mistakes (which the student has not yet made).
The structure of the XML file is fairly simple and is shown in (a). For instance,
using our logic data, we extracted the rule saying that if a student makes the
mistakes “Invalid justification” followed by “Premise set incorrect” then
she/he is likely to make the mistake “Wrong number of references lines
given” in a later step (presently there is no restriction on the time window).
This rule has a support of 47% and a confidence of 74%. The teacher, when
saving the pattern, also entered an URL to be prompted to the student. The
pedagogical module of the Logic Tutor then reads the file and adds the rule to
its knowledge base. Then, when the student makes these two initial mistakes,
she/he will receive, in addition to the relevant feedback on that mistake, an
additional message in the same window (in a different colour) advising
him/her to consult the web page created by the teacher for this particular
sequence of mistakes.
Questions:
2. Also discuss the behaviour of clustering and cluster visualisation.
Source: books.google.co.in/books?isbn=1586035304
11.9 Summary
• Data are any facts, numbers, or text that can be processed by a computer.
Today, organizations are accumulating vast and growing amounts of data in
different formats and different databases.
• Two widespread data mining methods for finding concealed patterns in data
are clustering and classification analysis.
• Data mining is utilised for a variety of reasons in both the private and public sectors.
• Data Mining is a relatively new concept that has not completely matured. Regardless of this, there are a number of industries that are already using it on a regular basis.
• The concerns about the individual privacy have been increasing enormously
recently particularly when internet is booming with social networks, e-
commerce, forums, blogs etc.
11.10 Keywords
Association algorithms: This type of algorithm finds correlations between
different attributes in a dataset.
Association models: Association models are used to find and characterize co-occurrences.
Data: Data are any facts, numbers, or text that can be processed by a computer.
Data mining: Data mining is the practice of automatically searching large stores of
data to discover patterns and trends that go beyond simple analysis.
Network models: This type of model represents data by nodes and links.
Predictive models: These types of models predict how likely an event is to occur.
3. Information 4. Knowledge
5. Data cleaning 6. Prediction
7. True 8. True
9. True 10. Data mining
11. Market segmentation 12. Network models
13. Association models 14. Data mining algorithm
15. Association algorithms
“Business Intelligence”. O’Reilly Media, Inc.
Rajiv Sabherwal, Irma Becerra-Fernandez (2010). “Business Intelligence”. John Wiley & Sons.
Swain Scheps (2013). “Business Intelligence for Dummies”. Wiley.
Unit 12: Understanding Data Mining Tools
CONTENTS
Objectives
Introduction
12.4 Summary
12.5 Keywords
Objectives
After studying this unit, you will be able to:
• Identify the data mining tools in SQL server
Introduction
Data mining is the process of analysing data from different perspectives and summarizing it
into useful information. This information can be used to increase revenue, cut costs, or both.
Data mining software analyses relationships and patterns in stored transaction data depending
on the open-ended user queries. In this Unit, you will learn about data mining tools. Data
mining tools in SQL Server will be discussed. Also, the unit covers mining structures and mining models. Finally, configuring algorithm parameters will be explained.
• The Data Mining Wizard in SQL Server Data Tools (SSDT) makes it very simple to create mining structures and mining models, using either relational data or multidimensional data in cubes.
Notes In the wizard, you select the data to use, and then apply specific data mining techniques, such as clustering, neural networks, or time series modelling.
• Model viewers are supplied in both SQL Server Management Studio and SQL Server Data Tools (SSDT), for exploring your mining models after they are created.
Did u know? You can browse models using viewers tailored to each algorithm, or go
deeper into analysis by using the model content viewer.
• The Prediction Query Builder is provided in both SQL Server Management Studio and SQL Server Data Tools (SSDT) to help you create prediction queries.
Notes You can also test the accuracy of models against a holdout data set or external data, or use cross-validation to assess the value of your data set.
• SQL Server Management Studio is the interface where you manage existing data mining solutions that have been deployed to an instance of Analysis Services. You can reprocess structures and models to update the data in them.
• SQL Server Integration Services includes tools that you can use to clean data, to automate
jobs such as creating predictions and updating models, and to create text mining solutions.
Let us discuss more about the data mining tools in SQL Server.
The Data Mining Wizard in Microsoft SQL Server Analysis Services starts every time
when you add a new mining structure to a data mining project. It helps you choose a data
source and set up a data source view that defines the data to be used for analysis, and
then helps you create an initial model.
To use the Data Mining Wizard, you must have opened a solution in SQL Server Data Tools
(SSDT) that contains at least one data mining or OLAP project. Follow these steps:
• If your solution is ready for data mining, you can right-click the Mining Structures node
in Solution Explorer and select New Mining Structure to start the wizard.
• If your solution does not contain any existing projects, you can add a new data mining
project. To add a new project, from the File menu, select New, and then select Project.
Then choose the template, Analysis Services Multidimensional and Data Mining Project.
The next decision to make is whether to use a relational data source, or to use an OLAP mining
model.
Notes The Data Mining Wizard branches into two paths at this point, depending on whether your data source is relational or multidimensional (OLAP).
Choosing an Algorithm
!
Caution Each algorithm provided in Analysis Services has different features and produces
different results, so this decision can be difficult to make.
You can experiment and try several different models before determining which is most
appropriate for your business problem.
• Auto-detection of data types: The wizard will examine the uniqueness and distribution of column values and then suggest the best data type, and suggest a usage type for the data. You can override these proposals by choosing values from a list.
• Suggestions for variables: You can open a dialog box to start an analyser that calculates correlations across the columns included in the model and determines whether any columns are likely predictors of the outcome attribute, given the configuration of the model so far.
• Feature selection: Most algorithms will automatically detect columns that are good predictors and use those preferentially. In columns that contain too many values, feature selection will be applied to reduce the cardinality of the data and improve the chances of finding a meaningful pattern.
• Automatic cube slicing: If your mining model is based on an OLAP data source, the
ability to slice the model by using cube attributes is automatically provided.
After you have created a data mining structure and mining model by using the Data Mining Wizard, you can use the Data Mining Designer from either SQL Server Data Tools (SSDT) or SQL Server Management Studio to work with existing models and structures. The designer includes tools for these tasks:
• Change the properties of mining structures, add columns and create column aliases,
change the binning procedure or expected distribution of values.
• Add new models to an existing structure; replicate models, change model properties
or metadata, or define filters on a mining model.
• Browse the patterns and rules inside the model; discover associations or decision trees.
• Validate models by creating lift charts, or analyse the profit curve for models. Compare models using classification matrices, or validate a data set and its models by using cross-validation.
• Create prediction and content queries against existing mining models. Build one-off queries, or set up queries to generate predictions for whole tables of external data (a sketch follows this list).
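As a rough sketch of what such queries can look like in Data Mining Extensions (DMX), the following uses a hypothetical model name, input values, and columns; it only illustrates the general pattern of a content query and a singleton prediction query, not queries taken from a specific project:
// Content query: browse the patterns stored in a mining model
// (run the two statements below separately)
SELECT * FROM [Customer Classifier].CONTENT

// Singleton prediction query: classify one new case supplied inline
SELECT
    Predict([Region]) AS [Predicted Region],
    PredictProbability([Region]) AS [Probability]
FROM [Customer Classifier]
NATURAL PREDICTION JOIN
    (SELECT 35 AS [Age], 60000 AS [Income]) AS t
Prediction Query Builder generates statements of this form for you; writing them by hand is mainly useful when you want to automate predictions for whole tables of external data.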
After you create and deploy mining models to a server, you can use SQL Server Management Studio to manage the Analysis Services database that hosts the data mining objects. You can also continue to perform tasks that use the model, such as exploring the models, processing new data, and creating predictions.
Notes Management Studio also comprises query editors that you can use to design and execute Data Mining Extensions (DMX) queries.
SQL Server Integration Services provides many components that support data mining.
Some tools in Integration Services are designed to help automate common data mining tasks
such as prediction, model building, and processing.
Example:
• Create an Integration Services package that automatically updates the model every time new customers are added to the dataset.
• Perform custom sampling of case records.
You can also use data mining in a package workflow, as an input to other processes.
Example:
• Use probability values generated by the model to weight scores for text mining.
• Automatically generate predictions based on prior data and use those values to assess the validity of new data.
• Use logistic regression to segment incoming customers by risk.
Self Assessment
1. Model viewers are supplied in both SQL Server administrations Studio and
..................................................
3. The...............................in Microsoft SQL Server Analysis Services starts every time when
you add a new mining structure to a data mining project.
Task Find out the difference between the Mining Model Viewer and the Mining Accuracy Chart tab.
Source: http://i.technet.microsoft.com/dynimg/IC13488.gif
The mining structure in Figure 12.1 is based on a data source that comprises multiple tables or views, connected on the CustomerID field. One table contains data about customers, such as region, age, income and gender, while the associated nested table contains multiple rows of additional data about each customer, such as the goods the customer has bought. Figure 12.1 shows that multiple models can be constructed on one mining structure, and that the models can use different columns from the structure.
Here:
Model 1 uses CustomerID, Income, Age, Region, and filters the data on Region.
Model 2 uses CustomerID, Income, Age, Region and filters the data on Age.
Model 3 uses CustomerID, Age, Gender, and the nested table, with no filter.
As the models use different columns for input, and because two of the models also restrict the
data that is used in the model by applying a filter, the models might have very different results
even though they are based on the same data.
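In DMX terms, adding models like these to a shared structure might be sketched as follows. The structure name, algorithm choice, and filter expression are illustrative assumptions rather than definitions taken from Figure 12.1, and the wizard or designer would normally generate the equivalent definition for you:
// Add one model to an existing structure, using a subset of its columns
// and a filter on Region; a second model would be added the same way,
// with a different column list, filter, or algorithm
ALTER MINING STRUCTURE [Customer Structure]
ADD MINING MODEL [Model 1]
(
    CustomerID,
    Income,
    Age,
    Region PREDICT
)
USING Microsoft_Decision_Trees
WITH FILTER (Region = 'North')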
!
Caution Here, the CustomerID column is required in all models because it is the only
available column that can be used as the case key.
• Select the columns of data to include in the structure and define a case key.
• Define a key for the structure (including the key for the nested table, if applicable).
• Specify whether the source data should be separated into a training set and a testing set (optional).
• Process the structure.
When you define a mining structure, you use columns that are available in an existing data source view. A data source view is a shared object that lets you combine multiple data sources and use them as a single source. The original data sources are not visible to client applications, and you can use the properties of the data source view to modify data types, create aggregations, or alias columns.
If you develop multiple mining models from the same mining structure, the models can use
distinct columns from the structure.
Example: You can create a single structure and then build separate decision tree and clustering models from it, with each model using different columns and predicting different attributes.
Also, each model can use the columns from the structure in different ways.
Example: Your data source view might comprise an Income column, which you can
bin in different ways for different models.
The building blocks of the mining structure are the mining structure columns, which describe the data that the data source contains. These columns carry information such as data type, content type, and how the data is distributed. The mining structure does not contain information about how columns are used for a specific mining model, or about the type of algorithm that is used to build a model; this information is defined in the mining model itself.
A mining structure can also contain nested tables. A nested table represents a one-to-many relationship between the entity of a case and its related attributes.
Example: If the information that describes the customer resides in one table, and the customer's purchases reside in another table, you can use nested tables to combine the information into a single case. The customer identifier is the entity, and the purchases are the related attributes.
When you define the data for the mining structure, you can also specify that some of the data be
used for training, and some for testing. Therefore, it is no longer necessary to separate your data
in advance of creating a data mining structure. Instead, while you create your model, you can
specify that a certain percentage of the data be held out for testing, and the rest used for
training.
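If you prefer to define such a structure in DMX rather than through the wizard, a minimal sketch might look like the following. The structure, column, and nested-table names are illustrative, and the HOLDOUT clause corresponds to the training/testing split described above:
CREATE MINING STRUCTURE [Customer Structure]
(
    CustomerID LONG KEY,
    Age        LONG CONTINUOUS,
    Income     DOUBLE CONTINUOUS,
    Region     TEXT DISCRETE,
    Gender     TEXT DISCRETE,
    Purchases TABLE            // nested table: one-to-many purchases per customer
    (
        ProductName TEXT KEY   // key of the nested table
    )
)
WITH HOLDOUT (30 PERCENT)      // hold out 30% of cases for testing
Columns defined here become available to every model added to the structure, whether or not a particular model uses them.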
Enabling Drillthrough
You can add columns to the mining structure even if you do not plan to use them in a specific mining model. This is helpful if, for example, you want to retrieve the e-mail addresses of customers in a clustering model without using the e-mail address during the analysis.
When you process a mining structure, Analysis Services creates a cache that stores statistics
about the data, information about how any continuous attributes are discretized and other
information that is later used by mining models.
In SQL Server Data Tools (SSDT), you can use the Mining Structure tab of Data Mining
Designer to view the structure columns and their definitions.
Self Assessment
5. A single mining structure can support multiple mining models that share the identical
domain.
6. When you define a mining structure, you use columns that are available in an existing
data source view.
7. The building blocks of the mining structure are the mining structure columns, which
recount the data that the data source comprises.
8. A mining structure cannot comprise nested tables.
1. On the Mining Models tab of Data Mining Designer in SQL Server Data Tools (SSDT),
right-click the algorithm type and select Set Algorithm Parameters. The Algorithm
Parameters dialog box will open.
Figure 12.2: An algorithm Parameters dialog box
Source: http://media.techtarget.com/digitalguide/images/Misc/dm_6.gif
2. In the Value column, set a new value for the algorithm that you want to change. If you do
not enter a value in the Value column, Analysis Services uses the default parameter value.
The Range column describes the possible values that you can enter.
3. Click OK. The algorithm parameter is set with the new value. The parameter change will
not be reflected in the mining model until you reprocess the model.
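The same parameters can also be supplied in DMX when a model is added to a structure. The following is a hedged sketch with hypothetical structure and model names; CLUSTER_COUNT is a documented parameter of the Microsoft Clustering algorithm:
// Equivalent of setting CLUSTER_COUNT in the Algorithm Parameters dialog box
ALTER MINING STRUCTURE [Customer Structure]
ADD MINING MODEL [Customer Clusters]
(
    CustomerID,
    Age,
    Income,
    Region
)
USING Microsoft_Clustering (CLUSTER_COUNT = 8)
As with the designer, the new parameter value is not reflected in the results until the model is processed.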
10. On the Mining Models tab of Data Mining Designer in SQL Server Data Tools (SSDT),
............................... the algorithm type and select Set Algorithm Parameters.
Journyx Timesheet (tm) is a commercial application that provides time, expense, and project tracking. In 1996, Curt Finch, Journyx CEO and founder, was working in the staffing industry when he saw an opportunity to use the web to accurately collect and store employee timesheet information.
Figure 1: Journyx Time Entry Screen
Journyx Timesheet has been using Python from the beginning. Curt Finch chose Python
initially on the recommendation of a friend, Steve Madere, who had founded
Dejanews.com (now a part of Google). Describing the rationale for his choice, Curt said,
“I looked at Java and C and came to the conclusion that 1 line of Python is 10 lines of
Java or 100 lines of C.
Developers write code at basically a constant rate so we chose Python which was (and is)
the highest level language I’ve ever seen that is also flexible enough to be generally
useful.”
Architecture
From the beginning, Timesheet was designed and implemented as a web application. It
uses three-tiered web application architecture with separate layers for web presentation,
business logic, and data storage. As time has progressed, the application’s functionality
has advanced considerably, and Curt’s decision to use Python as the implementation
language has proven to be a good choice. Python is currently used for all application logic
in the Timesheet application. This includes all code between the initial Apache dispatch,
where mod_python is employed to expedite interpreter instantiation, through the
application logic, and down to the point of call out to the database transport layer.
Timesheet uses not only the Python standard library but also several independently
developed open source Python subsystems, such as PyXML and ActZero’s SOAP
support. PyXML is used to implement certain business rules and to develop jxAPI, which
is a SOAP-based API into the application logic. Work is in progress to extend this API to
define Web Services Description Language templates for the jxAPI functions. The
application currently builds and ships with Python 2.1.1. Timesheet also incorporates
several non-Python technologies. The Unix and Linux distributions are packaged with
the Apache HTTP server and PostgreSQL database. The Timesheet distribution for
Windows ships with an optional Microsoft Desktop Engine (MSDE) database and
integrates with Microsoft IIS. Timesheet can be configured to use a variety of third-party
databases.
Results
The Timesheet project has succeeded spectacularly, generating millions in revenue and
allowing Journyx to grow every year, even under the current economic conditions.
Journyx, like many of our customers, uses Timesheet internally as a mission critical part
of the company infrastructure. It is used extensively for project tracking, billing, and
payroll. To date, approximately 11 person-years have gone into the Journyx Timesheet
product, resulting in over one hundred fifty thousand lines of Python code. In developing
Journyx, the two greatest benefits of Python were the speed with which features could be
written and deployed, and its true write-once-run-anywhere cross-platform capabilities.
Journyx developers have found that the simplicity and clarity of Python combine with its
object- oriented properties to make it a very powerful and productive language. Python’s
rich standard library, which includes modules for things like string manipulation and
HTML generation, further supports programmers in meeting aggressive development
schedules.
Because of these properties of the language, Python has enabled Journyx to add features
more quickly than our competitors. We’ve been able to implement SOAP/XML and
WSDL support and extended other aspects of the application’s functionality well ahead
of competitive products. One of the key enablers of this efficiency in maintenance and
improvement is the inherent clarity and readability of the Python language. Other
important factors are the vibrant and responsive Python development community, and
the high degree of backwards compatibility and stability we have seen as the language
design evolves over time. Python’s cross-platform standard library and platform-
independent byte code file format allow the deployment of Python modules to any
platform, regardless of which platform the module was prepared on. This helped not only
in avoiding per-platform development overhead but also facilitates customer support for
the Timesheet software product. For example, a patch module built on a Redhat 6.2
system can be sent to a customer for installation on Windows XP or any other operation
system without the need for cross-compilation or translation of any kind.
Questions:
How did Python make it possible for Journyx to produce a flexible, feature-rich product for multiple platforms in less time?
Explain how Python became an important competitive advantage for Journyx Timesheet (tm).
Source: http://www.python.org/about/success/journyx/
12.4 Summary
• The Data Mining Wizard in SQL Server Data Tools (SSDT) makes it very simple to
create mining structures and mining models, using either relational data or
multidimensional data in cubes.
• The Data Mining Wizard in Microsoft SQL Server Analysis Services starts every time
when you add a new mining structure to a data mining project.
• After you create and deploy mining forms to a server, you can use SQL Server
administration Studio to organise the Analysis Services database that hosts the data
mining items.
• The mining structure defines the data from which mining models are constructed: it
specifies the source data outlook, the number and kind of columns, and an optional
partition into training and testing groups.
• You can change the parameters supplied with the algorithms that you use to construct
data mining models to customize the results of the model.
• On the Mining Models tab of Data Mining Designer in SQL Server Data Tools (SSDT),
right-click the algorithm type and select Set Algorithm Parameters.
12.5 Keywords
Data Mining Extensions (DMX): Data Mining Extensions (DMX) is a language that you
can use to create and work with data mining models in Microsoft SQL Server Analysis
Services.
Data Mining Wizard: The Data Mining Wizard in Microsoft SQL Server 2005 Analysis
Services (SSAS) starts every time that you add a new mining structure to a data mining project.
Mining structure: It defines the data from which mining models are constructed; it
specifies the source data outlook, the number and kind of columns, and an optional partition
into training and testing groups.
SQL Server Data Tools: SQL Server Data Tools (SSDT) transforms database
development by introducing a ubiquitous, declarative model that spans all the phases of
database development and maintenance/update inside Visual Studio.
8. Discuss the connection of the data mining structure to the data source.
5. True 6. True
7. True 8. False
9. Algorithm parameters 10. Right-click
Books Carlo Vercellis (2011). “Business Intelligence: Data Mining and Optimization for Decision
Making”. John Wiley & Sons.
David Loshin (2012). “Business Intelligence: The Savvy Manager’s Guide”. Newnes.
Unit 13: Creating Data Mining Queries and Reports
CONTENTS
Objectives
Introduction
13.1 Prediction Queries
13.3 Summary
13.4 Keywords
Objectives
Introduction
Data Mining Extensions (DMX) is a query language used for Data Mining Models
supported by Microsoft’s SQL Server Analysis Services product. Like SQL, it supports a data
definition language, data manipulation language and a data query language, all three with
SQL-like syntax. The difference is that SQL statements operate on relational tables while DMX
statements operate on data mining models. In this unit, you will learn about prediction
queries, including singleton and batch queries. Later on in this unit you will learn about
data mining extensions and statements- data definition statements, data manipulation
statements and data query statements.
13.1 Prediction Queries
The aim of a usual data mining task is to use the mining model to make predictions.
Example: You might want to forecast the amount of expected downtime for a
certain cluster of servers, or develop a score that shows if segments of customers are
expected to reply to an advertising campaign or not. To do all these things, you need to
create a prediction query.
Functionally, there are distinct types of prediction queries supported in SQL Server, depending
on the type of inputs to the query. They are shown in Table 13.1.
Table 13.1: Types of Prediction Queries
Singleton prediction queries: Use a singleton query when you want to predict outcomes for a
single new case, or multiple new cases. You provide the input values directly in the query, and
the query is executed as a single session.
Batch predictions: Use batch predictions when you have external data that you want to feed
into the model, to use as the basis for predictions. To make predictions for an entire set of data,
you map the data in the external source to the columns in the model, and then specify the type
of predictive data you want to output. The query for the entire dataset is executed in a single
session, making this option much more efficient than sending multiple repeated queries.
Time series predictions: Use a time series query when you want to predict a value over some
number of future steps. SQL Server Data Mining also provides the following functionality in
time series queries:
• You can extend an existing model by adding new data as part of the query, and make
predictions based on the composite series.
• You can apply an existing model to a new data series by using the
REPLACE_MODEL_CASES option.
• You can perform cross-prediction.
Source: http://technet.microsoft.com/en-us/library/hh213169.aspx
When you create a prediction, you normally supply some piece of new information and ask the
model to develop a prediction based on the new data.
• In a batch prediction query, you map the model to an external source of data by using
a prediction join.
• In a singleton prediction query, you type one or more values to use as inputs.
Did u know? You can create multiple predictions using a singleton prediction query.
However, if you need to create many predictions, performance is better when you use a
batch query.
Both singleton and batch prediction queries use the PREDICTION JOIN syntax to define the
new data. The difference is in how the input side of the prediction join is specified.
• In a batch prediction query, the data comes from an external data source that is specified
by using the OPENQUERY syntax.
• In a singleton prediction query, the data is supplied inline as part of the query.
In addition to predicting a value, you can customize a prediction query to return various
types of information that are related to the prediction.
Example: Clustering models support special prediction functions that supply extra
details about the clusters created by the model, while time series models have functions that
assess difference over time.
The first step is to use the SELECT FROM <model> PREDICTION JOIN (DMX) in a singleton
prediction query.
Here, the first line of the code defines the columns from the mining model that the query should
return, and specifies the mining model that is used to generate the prediction:
SELECT <select list> FROM [<mining model name>]
The next lines of the code define the characteristics of the customer that you use to create a
prediction:
NATURAL PREDICTION JOIN
(SELECT ‘<value>’ AS [<column>], ...)
AS [<input alias>]
ORDER BY <expression>
If you specify NATURAL PREDICTION JOIN, the server matches each column from the
model to a column from the input, based on column names. If column names do not match, the
columns are ignored.
1. In Object Explorer, right-click the instance of Analysis Services, point to New Query, and
then click DMX. Query Editor opens and contains a new, blank query.
2. Copy the example of the singleton statement into the blank query.
3. In the query, replace the following:
<select list>
with:
[Car Buyer] AS Buyer, PredictHistogram([Car Buyer]) AS Statistics
The AS statement is used to alias columns returned by the query. The PredictHistogram
function returns statistics about the prediction, including the probability and the support.
4. Replace the following:
[<mining model>]
with:
[Decision Tree]
5. Replace the following:
(SELECT '<value>' AS [<column>], ...) AS [<input alias>]
with:
(SELECT 35 AS [Age],
'5-10 Miles' AS [Commute Distance],
'1' AS [House Owner],
2 AS [Number Bikes Owned],
2 AS [Total Children]) AS t
The complete statement should now be as follows:
SELECT
[Car Buyer] AS Buyer,
PredictHistogram([Car Buyer]) AS Statistics
FROM
[Decision Tree]
NATURAL PREDICTION JOIN
(SELECT 35 AS [Age],
'5-10 Miles' AS [Commute Distance],
'1' AS [House Owner],
2 AS [Number Bikes Owned],
2 AS [Total Children]) AS t
6. On the File menu, click Save As.
7. In the Save As dialog box, browse to the appropriate folder, and name the file
Singleton_Query.dmx.
8. On the toolbar, click the Execute button.
Caution: The query returns a prediction about whether a customer with the specified
characteristics will purchase a car, as well as statistics about that prediction.
The next step is to use the SELECT FROM <model> PREDICTION JOIN (DMX) in a batch
prediction query.
Example: The following is an example of a batch statement:
SELECT TOP <number> <select list>
FROM [<mining model name>]
PREDICTION JOIN
OPENQUERY([<datasource>],’<SELECT statement>’) AS
[<input alias>]
ON <on clause, mapping,>
WHERE <where clause, boolean expression,>
ORDER BY <expression>
Let us see which line of code is useful for what purpose. Here, the first two lines of the code
define the columns from the mining model that the query returns, as well as the name of the
mining model that is used to generate the prediction, in the same way as in the singleton query.
Notes The TOP <number> statement specifies that the query will return only the number of results specified by <number>.
The next lines of the code define the source data that the predictions are based on:
OPENQUERY([<datasource>],’<SELECT statement>’) AS
[<input alias>]
The next line defines the mapping between the source columns in the mining model and the
columns in the source data:
ON <column mappings>
The WHERE clause filters the results returned by the prediction query:
WHERE <where clause, boolean expression,>
The last (and optional) line of the code specifies the column that the results will be ordered by:
ORDER BY <expression> [DESC|ASC]
Use ORDER BY in combination with the TOP <number> statement, to filter the results that are
returned.
Example: In this prediction you will return the top ten car buyers, ordered by the
probability of the prediction being correct. You can use [DESC|ASC] to control the order in
which the results are displayed.
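Putting the pieces together, a filled-in batch query might look like the sketch below. It reuses the [Decision Tree] model and its columns from the singleton example, while the data source name [Adventure Works DW] and the ProspectiveBuyer table and its columns are hypothetical stand-ins for your own relational source:
SELECT TOP 10
    t.[FirstName],
    t.[LastName],
    Predict([Decision Tree].[Car Buyer]) AS [Predicted Buyer],
    PredictProbability([Decision Tree].[Car Buyer]) AS [Probability]
FROM [Decision Tree]
PREDICTION JOIN
    OPENQUERY([Adventure Works DW],
        'SELECT FirstName, LastName, Age, CommuteDistance,
                HouseOwnerFlag, NumberBikesOwned, TotalChildren
         FROM dbo.ProspectiveBuyer') AS t
ON
    [Decision Tree].[Age] = t.[Age] AND
    [Decision Tree].[Commute Distance] = t.[CommuteDistance] AND
    [Decision Tree].[House Owner] = t.[HouseOwnerFlag] AND
    [Decision Tree].[Number Bikes Owned] = t.[NumberBikesOwned] AND
    [Decision Tree].[Total Children] = t.[TotalChildren]
ORDER BY PredictProbability([Decision Tree].[Car Buyer]) DESC
The ON clause maps each model column to the corresponding source column, and the TOP/ORDER BY combination returns only the ten prospects most likely to buy.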
Self Assessment
1. The aim of a usual data mining task is to use the mining model to make ....................... .
2. When you create a prediction, you normally supply some piece of new information and
ask the model to develop a prediction based on the ....................... .
3. Both ....................... and....................prediction queries use the PREDICTION JOIN syntax
to define the new data.
4. In addition to predicting a value, you can customize a prediction query to return various
types of information that are related to the ....................... .
5. In ......................., right-click the instance of Analysis Services, point to New Query, and
then click DMX.
13.2 Data Mining Extensions (DMX)
You can use DMX statements to create, process, delete, copy and predict against data
mining models. There are three types of statements in DMX: data definition statements,
data manipulation statements and data query statements.
Task Compare and contrast the data manipulation statements and data query statements.
Data definition statements are used in DMX to create and define new mining structures and
models, to import and export mining models and mining structures, and to drop existing
models from a database. Data definition statements in DMX are part of the data definition
language (DDL). You can perform the following tasks with the data definition statements in
DMX:
• Add a mining model to the mining structure by using the ALTER MINING
STRUCTURE statement.
• Create a mining model and associated mining structure simultaneously by using the
CREATE MINING MODEL statement (see the sketch after this list).
• Export a mining model and associated mining structure to a file by using the EXPORT
statement.
• Import a mining model and associated mining structure from a file that is created by the
EXPORT statement by using the IMPORT statement.
• Copy the structure of an existing mining model into a new model, and train it with the
same data, by using the SELECT INTO statement.
• Remove a mining model from a database by using the DROP MINING MODEL statement.
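As an illustration of the CREATE MINING MODEL and EXPORT statements, the sketch below creates a model (and, implicitly, its structure) and then backs it up to a file. The model name, columns and file path are hypothetical, and the two statements are run separately:
CREATE MINING MODEL [Car Buyer Bayes]
(
    [Customer Key]      LONG KEY,
    [Age]               LONG DISCRETIZED,
    [Commute Distance]  TEXT DISCRETE,
    [Car Buyer]         TEXT DISCRETE PREDICT
)
USING Microsoft_Naive_Bayes
// Back up the model and its associated structure to a file:
EXPORT MINING MODEL [Car Buyer Bayes] TO 'C:\Backups\CarBuyerBayes.abf'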
Data manipulation statements are used in DMX to work with existing mining models, to
browse the models and to create predictions against them. Data manipulation statements in
DMX are part of the data manipulation language (DML). You can perform the following
tasks with the data manipulation statements in DMX:
• Train a mining model by using the INSERT INTO statement (see the sketch after this list).
• Extend the SELECT statement to browse the information that is calculated during
model training and stored in the data mining model, such as statistics of the source
data.
• Create predictions that are based on an existing mining model by using the PREDICTION
JOIN clause of the SELECT statement.
• Remove all the trained data from a model or a structure by using the DELETE (DMX)
statement.
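A minimal sketch of the first two tasks follows, assuming the hypothetical [Car Buyer Bayes] model from the previous sketch and a hypothetical relational data source named [My Data Source]:
// Train the model with cases from a relational table:
INSERT INTO MINING MODEL [Car Buyer Bayes]
    ([Customer Key], [Age], [Commute Distance], [Car Buyer])
OPENQUERY([My Data Source],
    'SELECT CustomerKey, Age, CommuteDistance, CarBuyer
     FROM dbo.CustomerTrainingData')
// Browse the patterns discovered during training:
SELECT * FROM [Car Buyer Bayes].CONTENT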
DMX functions can be used to obtain information that is discovered during the training of your
models, and to calculate new information. One can also use these functions for many purposes,
including to return statistics that describe the underlying data or the accuracy of a prediction, or
to return an expanded explanation of a prediction.
Self Assessment
6. Data manipulation language (DML) is a language that you can use to create and work
with data mining models in Microsoft SQL Server Analysis Services.
7. You can use DMX statements to create, process, delete, copy and predict against data
mining models.
8. Data definition statements are used in DMX to create and define, to import and
export, and to drop existing models from a database.
9. Data manipulation statements in DMX are not a part of the Data Manipulation
Language (DML).
10. DMX functions can be used to obtain information that is discovered during the training of
your models, and to calculate new information.
Case Study
Federal Agency Data Mining Reporting
Short title
This section may be cited as the “Federal Agency Data Mining Reporting Act of 2007”.
Definitions
In this section:
Data mining: The term “data mining” means a program involving pattern-based queries, searches, or other analyses of one or more electronic databases, where—
(A) a department or agency of the Federal Government, or a non-Federal entity acting on behalf of the Federal Government, is conducting the queries, searches, or other analyses to discover or locate a predictive pattern or anomaly indicative of terrorist or criminal activity on the part of any individual or individuals;
(B) The queries, searches, or other analyses are not subject-based and do not use
personal identifiers of a specific individual, or inputs associated with a
specific individual or group of individuals, to retrieve information from the
database or databases; and
(C) The purpose of the queries, searches, or other analyses is not solely—
(i) the detection of fraud, waste, or abuse in a Government agency or program; or
(ii) the security of a Government computer system.
(2) Database: The term “database” does not include telephone directories, news
reporting, and information publicly available to any member of the public
without payment of a fee, or databases of judicial and administrative opinions or
other legal research sources.
(1) Requirement for report: The head of each department or agency of the Federal
Government that is engaged in any activity to use or develop data mining shall
submit a report to Congress on all such activities of the department or agency under
the jurisdiction of that official. The report shall be produced in coordination with
the privacy officer of that department or agency, if applicable, and shall be made
available to the public, except for an annex described in subparagraph (C).
(2) Content of report: Each report submitted under subparagraph (A) shall include, for
each activity to use or develop data mining, the following information:
(A) A thorough description of the data mining activity, its goals, and, where
appropriate, the target dates for the deployment of the data mining activity.
(B) A thorough description of the data mining technology that is being used or
will be used, including the basis for determining whether a particular
pattern or anomaly is indicative of terrorist or criminal activity.
(C) A thorough description of the data sources that are being or will be used.
(D) An assessment of the efficacy or likely efficacy of the data mining activity in
providing accurate information consistent with and valuable to the stated
goals and plans for the use or development of the data mining activity.
(F) A list and analysis of the laws and regulations that govern the information
being or to be collected, reviewed, gathered, analysed, or used in conjunction
with the data mining activity, to the extent applicable in the context of the
data mining activity.
(G) A thorough discussion of the policies, procedures, and guidelines that are
in place or that are to be developed and applied in the use of such data
mining activity in order to—
(i) protect the privacy and due process rights of individuals, such as
redress procedures; and
(ii) ensure that only accurate and complete information is collected,
reviewed, gathered, analysed, or used, and guard against any harmful
consequences of potential inaccuracies.
(3) Annex
(iv) trade secrets (as that term is defined in section 1839 of title 18).
(4) Time for report: Each report required under sub-paragraph (A) shall be—
(A) submitted not later than 180 days after August 3, 2007; and
(B) updated not less frequently than annually thereafter, to include any activity
to use or develop data mining engaged in after the date of the prior report
submitted under sub-paragraph (A).
Questions:
13.3 Summary
• The aim of a usual data mining task is to use the mining model to make predictions.
• Functionally, there are distinct types of prediction queries supported in SQL Server,
depending on the type of inputs to the query.
• When you create a prediction, you normally supply some piece of new information and
ask the model to develop a prediction based on the new data.
• Both singleton and batch prediction queries use the PREDICTION JOIN syntax to define
the new data.
• In addition to predicting a value, you can customize a prediction query to
return various types of information that are related to the prediction.
• Data Mining Extensions (DMX) is a language that you can use to create and
work with data mining models in Microsoft SQL Server Analysis Services.
• There are three types of statements in DMX: data definition statements, data
manipulation statements and data query statements.
• Data definition statements in DMX are part of the Data Definition Language
(DDL).
• Data manipulation statements are used in DMX to work with existing mining
models, to browse the models and to create predictions against them.
13.4 Keywords
Data Definition Language (DDL): Data Definition Language (DDL) describes the
portion of SQL that allows you to create, alter, and destroy database objects.
1. What is prediction?
1. Predictions 2. New data
3. Singleton, batch 4. Prediction
5. Object Explorer 6. False
7. True 8. True
9. False 10. True
Books Carlo Vercellis (2011). “Business Intelligence: Data Mining and Optimization for
Decision Making”. John Wiley & Sons.
David Loshin (2012). “Business Intelligence: The Savvy Manager’s Guide”. Newnes.
Unit 14: Reporting
CONTENTS
Objectives
Introduction
14.2.1 Architecture
14.4 Summary
14.5 Keywords
Objectives
Introduction
Reporting software is used to generate human-readable reports from various data sources.
Business operations reporting and dashboard are the most common applications for a reporting
tool. The market-share-leading reporting solution is from Microsoft: Microsoft SQL Server
Reporting Services is the solution of choice for many businesses that require enterprise
reporting capabilities. In this unit, you will learn about the functionalities of reporting tools. Later
on in this unit, reporting services using SQL server will be discussed with appropriate
screenshots. Finally, analysis services using SQL server will be introduced.
14.1 Reporting Tool Functionalities
The following are the functionalities of a reporting tool:
1. Front End: Data is ineffective if all it does is sit in the data warehouse. As a result, the
presentation layer is of very high significance.
2. Data source connection capabilities: There are two types of data sources: the relational
database and the OLAP multidimensional data source.
3. Scheduling and distribution capabilities: The reporting tool must have scheduling and
distribution capabilities.
Notes Weekly reports are scheduled to run on Monday morning, and the resulting reports are distributed to the senior executives.
4. Security Features: Because reporting tools are aimed towards a number of users,
making sure people see only what they are supposed to see is important.
Did u know? Security can reside at the report level, folder level, column level or row level.
Generally all established reporting tools have these capabilities.
5. Customization: Provide easy way to pre-set the reports to look exactly the way that
adheres to the corporate standard.
6. Export capabilities: The most common export needs are to Excel, to a flat file, and to PDF.
Caution: For Excel, if the situation warrants, you will want to verify that the reporting
format, not just the data itself, will be exported out to Excel.
Example: The BIRT reporting features like report layout, data access, and scripting
support are used to create reports that use the custom reporting URLs from Rational Asset
Manager.
Self Assessment
1. There are two types of data sources: the.......................and the OLAP multidimensional
data source.
14.2 SQL Server Reporting Services (SSRS)
Microsoft has come up with its own reporting service, in conjunction with the SQL Server
database: Microsoft SQL Server Reporting Services (SSRS). SSRS supplies some extensions for
the data rendering, delivery and security of reports, thereby giving it greater programmability.
This approach enables reports to be created with less development effort (compared to other
reporting tools), along with customized security choices.
14.2.1 Architecture
After using SSRS, the architecture is just like a small operating system. The Report Manager is
the central component that decides when the reports will be scheduled to run, along with
maintaining the user profiles on the report server.
All the reports reside on the Report Server, and all other activities pertaining to SSRS are
carried out at the report server.
Did u know? Report Designer is a graphical tool that is hosted within the Microsoft
Visual Studio IDE. It provides tabbed windows for Data, Layout, and Preview that allow
you to design a report interactively.
After installing SQL server reporting services on your system, start the Visual studio IDE. Go
to File -> New Project, and you will be shown a prompt with ‘New project’. Select Business
Intelligence Projects from the Project Types. As this is our first project, use Report Project
Wizard.
Figure 14.1: New Project Window
Source: http://www.codeproject.com/KB/books/Start_SSRS/1.jpg
Specify the name of the project as well as the location where the project will be placed. On click
of OK, you will be prompted with a report wizard screen as shown in Figure 14.2. Click on
Next to follow up to the next screen.
Figure 14.2: Report Wizard Screen
Source: http://www.codeproject.com/KB/books/Start_SSRS/2.jpg
On the next screen, you will need to create a data source for the report. Just click on Edit to
specify the server name and the database then will be used from that server. The connection
string is automatically created.
Figure 14.3: Select Data Source
Source: http://www.codeproject.com/KB/books/Start_SSRS/3.jpg
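For illustration, the connection string that gets generated for a local SQL Server database typically looks something like the line below; the server and database names are hypothetical:
Data Source=localhost;Initial Catalog=AdventureWorksDW;Integrated Security=True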
On click of Next, you will be prompted with the Query Builder screen. Here you can add tables,
select columns as well as execute the SQL statements.
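For example, a simple dataset query built in the Query Builder might look like the following; the table and column names are hypothetical, in the style of the AdventureWorksDW sample database:
SELECT p.EnglishProductName AS Product,
       SUM(f.SalesAmount)   AS TotalSales
FROM dbo.FactInternetSales AS f
INNER JOIN dbo.DimProduct AS p ON p.ProductKey = f.ProductKey
GROUP BY p.EnglishProductName
ORDER BY TotalSales DESC
Whatever query you build here becomes the dataset that the report table or matrix is bound to in the later steps.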
Source: http://www.codeproject.com/KB/books/Start_SSRS/5.jpg
Source: http://www.codeproject.com/KB/books/Start_SSRS/6.jpg
Source: http://www.codeproject.com/KB/books/Start_SSRS/7.jpg
Based on what query suits your report, create the SQL statement and proceed forward. On
next click, you will be prompted with the report type screen. You can choose either Tabular or
Matrix. To make things simpler, use the Tabular format.
Source: http://www.codeproject.com/KB/books/Start_SSRS/8.jpg
On next click, you will come to the table designing screen, wherein you will be prompted to
display the fields as Page, Group or Details.
Figure 14.8: Design the table
Source: http://www.codeproject.com/KB/books/Start_SSRS/9.jpg
On next click, you will be prompted with the Table Style prompt, which contains a list to choose.
Select any one from them.
Source: http://www.codeproject.com/KB/books/Start_SSRS/10.jpg
On next, you will be prompted with the deployment details screen. Specify the report server
name (normally it is http://localhost/ReportServer). If you are using another server then you
can specify the location as http://servername/ReportServer. Also give the deployment folder.
Figure 14.10: Choose Deployment Location
Source: http://www.codeproject.com/KB/books/Start_SSRS/11.jpg
Finally, the report name needs to be entered, and you have your first report in place.
Figure 14.11: Complete the report
Source: http://www.codeproject.com/KB/books/Start_SSRS/12.jpg
You can preview the report to change the data specifications if required. Once done, press
Ctrl+F5 and the deployment of the report will occur.
Task Prepare a presentation on the steps of generating report using SQL server 2008.
Self Assessment
3. SSRS supplies some extensions towards the data rendering, consignment and security
of reports thereby allowing it to have a higher programmable ability.
4. SSRS is a comprehensive reporting platform whereby accounts are retained on a
centralized web server (or set of servers).
5. The Report Manager is the central person who acts as a manager to decide when the
reports will be scheduled to run along with maintaining the user profiles on the report
server.
Figure: The SQL Server components – Database Engine, Integration Services, Analysis Services and Reporting Services
Source: http://i.msdn.microsoft.com/dynimg/IC6000.gif
Notes You cannot use SQL Server Management Studio to develop, manage, or query multidimensional data sets that w
Self Assessment
Case Study
Global aviation fuel companies supplying airlines in the current fragile
economic climate are highly sensitised to potential airline collapses.
The liability of unpaid invoices or even un-invoiced deliveries can
The liability of unpaid invoices or even un-invoiced deliveries can
significantly harm the profit margins
within an extremely competitive market. Getting this information into the
hands of the people who can make some sound commercial decisions was
paramount for one such company.
The principal project objective was to enable the creation, management, and delivery of
business intelligence reports for the analysis of customer account open debt and priced
un-invoiced deliveries from all account receivable systems.
This required the consolidation of data feeds from multiple regional accounting systems
into a single data warehouse against which data cubes and their dependent reports could
be developed.
A SQL Server 2005 Business Intelligence solution was implemented to meet the customer
requirements of accuracy, flexibility and performance.
Features:
Reports run daily and are easily available to users who are travelling, for them to download and view
Timing of the extracts takes into account when updates are posted to ensure the data is as up to date as possible
Reports can be exported in multiple file formats to provide accessibility for further analysis offline
Optimized data extract procedures and schedules to ensure reporting does not impact on other accounting system functions
Deliverables:
SQL Server Integration Service components to manage the data extracts and scheduling from multiple disparate systems
SQL Server Analysis Services (SSAS) components providing a foundation for Online Analytical Processing (OLAP) analysis and data mining, used to create the data cubes against which the dependent reports were developed.
SQL Server Reporting Services components for report generation based upon data gathered from the data-cubes. This is the means by which users select, view and export the reports.
Question:
Analyse the case and provide any other solution to the problem.
Source: http://www.bmn.ltd.uk/CaseStudy_SQL_Reporting.aspx
14.4 Summary
• Following are the reporting tool functionalities: front end, data source connection
capabilities, scheduling and distribution capabilities, security features, customization and
export capabilities.
• There are two types of data sources: the relational database and the OLAP
multidimensional data source.
• Microsoft has come up with its own reporting service, in conjunction with the SQL Server
database: Microsoft SQL Server Reporting Services (SSRS).
• After using SSRS, the architecture is just like a small operating system. The Report
Manager is the central component that decides when the reports will be
scheduled to run, along with maintaining the user profiles on the report server.
• Report Designer is a graphical tool that is hosted within the Microsoft Visual Studio IDE.
It provides tabbed windows for Data, Layout, and Preview that allow you to design a
report interactively.
• After installing SQL server reporting services on your system, start the Visual studio IDE.
Go to File -> New Project, and you will be shown a prompt with ‘New project’.
14.5 Keywords
Interactive Sorting: Applying sort capabilities to a report enables users to sort the data by
any of the columns the report contains, in ascending or descending order.
Matrix: A format that supports row and column groups, and which can display aggregated
summary data in the cells where row groups and column groups intersect one another,
similarly to a pivot table or crosstab.
Report Designer: Report Designer is a graphical tool that is hosted within the Microsoft Visual
Studio IDE.
Table: A tabular format in which data is displayed in rows and columns. You can create a
hierarchy of rows to reflect groupings in your data and display group totals.
UDM: It provides an intermediate logical layer between the physical relational database
used as the data source and the proprietary cube and dimension structures that are used to
resolve user queries.
2. “After using SSRS, the architecture is just like a small operating system”. Elaborate.
Books Carlo Vercellis (2011). “Business Intelligence: Data Mining and Optimization for
Decision Making”. John Wiley & Sons.
David Loshin (2012). “Business Intelligence: The Savvy Manager’s Guide”. Newnes.
LOVELY PROFESSIONAL UNIVERSITY
Jalandhar-Delhi G.T. Road (NH-1)
Phagwara, Punjab (India) - 144411
For Enquiry: +91-1824-300360
Fax: +91-1824-506111
Email: [email protected]