Business Analytics New
Unit 4
1. Define Financial Analytics.
2. Discuss the importance of financial analytics.
3. Explain the types of financial analytics.
4. List out the software programs used in financial analytics.
5. What is HR analytics?
6. Discuss the Pros and Cons of HR analytics.
7. How can HR Analytics be used by organizations?
8. Explain the process of HR analytics.
9. Discuss Marketing analytics.
10. How do organizations use Marketing Analytics (process / steps)?
11. Discuss the challenges of Marketing Analytics.
12. Explain the applications of business analytics in the industry.
13. Briefly discuss the Analytics for Government and Nonprofit.
BUSINESS ANALYTICS
WHAT MAKES BUSINESS DECISIONS COMPLICATED TODAY?
• Pricing
setting prices for consumer and industrial goods, government contracts, and maintenance contracts
• Customer segmentation
identifying and targeting key customer groups in retail, insurance, and credit card industries
• Merchandising
determining brands to buy, quantities, and allocations
• Location
finding the best location for bank branches and ATMs, or where to service industrial equipment
• Social Media
understand trends and customer perceptions; assist marketing managers and product designers
• OTT Platforms
• Amazon
COMPONENTS OF BUSINESS ANALYTICS
• Data aggregation
• Data mining
• Association and sequence identification
• Text mining
• Forecasting
• Predictive analytics
• Optimization
• Data Visualisation
EVOLUTION OF BUSINESS ANALYTICS
• Statistics
A richer understanding of data beyond Business Intelligence
Summarising data and finding unknown and interesting relationships
Includes the basic tools of description, exploration, estimation and
inference
(and advanced tools like regression, forecasting and data mining)
• Operations research/Management science
Using mathematical or computer “models” to analyze and solve
complex decision problems
Modeling and optimization techniques for translating real
problems into mathematics, spreadsheets or computer
languages, and using them to find the best or optimal
solutions and decisions
DECISION SUPPORT SYSTEMS
• Decision support systems (BI + ORMS): analytics-based computer systems created to
support decision making
• Data management
Databases for storing data
Input, retrieve, update and manipulate the data
• Model management
Statistical tools and management science models for building, manipulating, analysing and solving models
• Communication management
Provides an interface for the user to interact with data
DSS APPLICATIONS
• Visualizing data and the results of analyses provides a way of easily communicating data at all
levels of a business
• Visualization can also reveal surprising patterns and relationships.
APPLICATION OF VISUALIZATION TOOLS
• Software such as IBM's Cognos system exploits data visualization for query and reporting,
data analysis, dashboard presentations, and scorecards linking strategy to operations.
• The Cincinnati Zoo, for example, has used this on an iPad to display hourly, daily, and monthly
reports of attendance, food and retail location revenues and sales, and other metrics for
prediction and marketing strategies.
• ARAMARK corporation developed visual "interactive simulators" to display the results of
multivariate regression models on dials similar to those on an automobile dashboard, while
allowing users to manipulate independent variables using simple sliders.
• UPS uses telematics to capture vehicle data and display them to help make decisions to
improve efficiency and performance
Applications of Business Analytics
Starting with the basic definition of Business Analytics, “it is the study and exploration of statistical data, the
formation of predictive models, deployment of the optimized technique, and communication of obtained
output to business partners, customers, and other executives for the different business issues”.
Big data is exploited as fully as possible, through qualitative and quantitative techniques, to support
business modeling and decision-making.
TELECOM
The telecommunication industry provides television, telephone, Internet and cable access to people all over the world. Telecom jobs include engineers,
sales people, customer service representatives, and installers. ... Some telecom jobs involve satellite communication.
The application of data analytics in the telecom industry must meet the specific needs of a telecom company. A one-size-fits-all approach to data analytics
cannot work when an industry is so specialized. A data analytics system for a telecom company must be capable of being personalized for each user's
unique tasks.
Retail analytics is the process of using big data to optimize pricing and supply chain movement and to improve customer
loyalty. Big data describes a large volume of data that is used to reveal patterns, trends, and associations, especially
relating to human behavior and interactions.
Historically, big data has been defined by three key factors: volume, velocity, and variety. For the retail industry, big data
means a greater understanding of consumer shopping habits and how to attract new customers. Big data analytics in
retail enables companies to create customer recommendations based on their purchase history, resulting in
personalized shopping experiences and improved customer service. These super-sized data sets also help with
forecasting trends and making strategic decisions based on market analysis.
Predicting Spending
Amazon uses customer data to recommend items based on a customer's past searches and purchases. It reportedly
generated 29 percent of its sales through its recommendation engine, which analyzes more than 150 million
accounts. This has led to big profits for the ecommerce giant.
In the medical or healthcare sector, the business analyst makes predictions about the stock of medicine
available in the hospital or medical store, the shipment of medicines in the local market, disease trends,
the impact of different medicines on the same disease, the appointments and availability of doctors, the
arrangement of slots for patients, and the medicines available for a cure.
For example, free slots can be allotted to patients considering the doctor’s working hours, the duties of working
staff in the hospital, etc.
Production of medicines can also be optimized by the business analyst, who proposes strategies regarding
the production costs of medicines, areas of production and stock available, and low-cost, high-yield preparation
methods.
Disaster planning: Both natural and manmade disasters put tremendous pressure on the health care systems in
the affected area. During a disaster, the demand for a particular service can increase far beyond its capacity. For example,
during a flu outbreak, demand for ventilators will increase. Knowing the real-time location and availability of such
facilities is very helpful for the authorities in managing such disasters. Using data analytics, it is also possible to
predict the outbreaks of some diseases, which puts the authorities in a better position to manage them.
Patient Flow: Healthcare is a time-critical service, and data analytics plays a crucial role in ensuring smooth patient
flow and reducing waiting periods. Predicting a patient surge helps the authorities take the necessary steps to reduce
patient waiting time and so give timely treatment.
Cost and Effectiveness: Data analytics can be used to compare the cost and effectiveness of treatments, public
policies etc. Organizations can use cost and outcome data to check the effectiveness of medicines and stop
prescribing medicines that are not effective.
Effective resource management: Location tracking technologies like RFID are used to provide real-time
management, identification, and tracking of instruments within an organization. Along with tracking instruments,
such technologies are now increasingly being used to track and manage patients and staff. Data from such services
can be used to improve patient care, resource utilization, and staff management.
Link between Strategy and Business Analytics
Starting a new business requires careful planning to maximize the chances of success. Many
small businesses are unable to make profit and fail within the first few years of operation.
The terms "business strategy" and "business model" describe related concepts that are key to
Business Strategy
The term "business strategy" describes the methods a business uses to achieve its mission and
objectives. A business' mission encompasses its overall purpose, core values and long-term
goals. A grocery store might have the mission of making profit while providing the best food
to customers, minimizing its impact on the environment and promoting strength in the local
economy. The company's strategy might involve buying products from local food producers,
encouraging customers to bring their own grocery bags, advertising in local newspapers and
buying recycled product packaging materials. A business’ strategy includes how it deals with
Business Model
A company's business model describes the basic means by which it creates value, delivers
value to consumers and collects revenue from customers to make a profit. Business models
can vary greatly from one company to another. A local grocery store's business model might
involve buying food at wholesale prices and selling it to end consumers at a higher price to
make profit. A website might have a business model based on providing video content to
behind how the company plans to achieve its goals, such as making a profit. A company can
change its business model over time as a part of its profit-making strategy. For example, if
a website does not make enough revenue from advertisements to make profit, managers might
decide to implement a new business model, such as selling T-shirts and other goods through an
Business Plan
Determining a company's mission, objectives, strategy and business model are all important
steps in the process of creating a new business and can help managers form a business plan. A
business plan is a document that acts as a blueprint for how the business plans to operate and
achieve profitability.
Every successful business starts with a concept, a plan and a product or service that customers
are willing to pay money to obtain. Business strategies are never conducted in a vacuum,
however, and for a business to be successful, there must be a business plan and a business
model generated. These two terms are unfortunately used interchangeably, but in reality, they
are two very different documents that cannot exist without one another. It is essential that a
At its simplest, a business plan is a written description of the future of a business. It's a
document that not only gets a business concept on paper but also outlines the people and
steps that will be involved to lead the business to success. The business plan is where you
discuss the industry and the need for a particular product or service, the business structure
The business plan also talks about the market in which the business will operate, lays out the
competition and what the plans are to position the business as a leader. Lastly, the business
plan lays out the ever-important financial plan, discussing things such as income and cash
flow, loans and obligations and when and how investors can expect to see a return.
A business model, on the other hand, is a business's rationale and plan for making a profit. If
the business plan is a road map that describes how much profit the business intends to make
in a given period of time, the business model is the skeleton that explains how that money
will be made. A model covers everything from how a company is valued within an industry
to how it will interact with suppliers, clients and partners to generate profits.
There are several different kinds of business models. A software company, for instance,
might be based on a subscription model, which generates revenue from customers that renew
subscriptions annually for a license to use the software. An example of an accessories model
would be a razor company or computer printer company that guarantees future income
Interdependence
While it's true that a business plan and business model are two separate documents, the reality
is that the business plan cannot live without the business model. While a business plan can
describe the structure of a business's financial goals, the business model explains how the
money will flow - from customer generation to marketing to sales, and finally, to customer
retention. The business model must have room to grow and adapt. Consequently, if the
One of the most prominent examples of a business model changing is currently occurring in
the computer software industry. About 10 years ago, the way to purchase software programs
was to go to the store and buy a CD-ROM to install the application and license on your
computer. Today, the advent of cloud-based subscription services makes it possible for
customers to download software and renew licenses remotely over the internet.
This transition to the Software as a Service (SaaS) subscription model has caused many
businesses to change their plans. Companies affected by this shift include computer
companies that no longer need to build machines with CD-ROM drives in them and software
As a result, software companies have had to change their business plans to account for
infrastructure costs for cloud storage and bandwidth, as well as the cost of maintaining a cloud operations
team 24 hours a day, seven days a week. These ongoing efforts can increase costs and reduce
margins, but they're a necessary adaptation to changing customer needs and market
The main purpose of business analysis models is to provide the best business solutions
that will boost the growth of business organizations across the world.
Now, let’s have a look at a brief explanation of each of these business models.
Business Process Modelling is depicted with a diagram that shows two notions;
Business Process Modelling is one of the best BA techniques used in the industry.
2. User Stories:
User Stories is a modern-day business analysis model that deals with designing, data
gathering, requirements solutions, and project development. This is a major method
used in agile modeling for collecting business requirements from product end-users to
provide the best solution for business growth.
A major advantage of this business analysis model is the fact that it helps BA
professionals with in-depth business analyst training to analyze requirements from user
perspectives. This ensures that the outcome of the analysis is highly effective and user-
focused. The need for iteration in business analysis is a major reason why this model is
applied for business solutions.
3. SWOT Analysis:
SWOT Analysis is one of the best business analysis models in the BA field. It is a
fundamental tool that evaluates the strengths and weaknesses of a business,
identifies the opportunities that can make the business grow, and identifies
possible threats too. The word “SWOT” is an acronym for Strengths, Weaknesses, Opportunities, and Threats.
Business stakeholders make use of this business analysis model to make strategic
decisions that will push an organization towards achieving its business targets. It is a
model that helps them maximize business opportunities and capitalize on the strengths
of a business while limiting negative impacts caused by threats and business weakness.
SWOT Analysis is a four-grid strategic technique that can be used at any level of an enterprise
for developing business strategies.
4. Brainstorming:
This business analysis technique helps to generate business ideas, analyze business
challenges while designing possible solutions to solve those challenges.
This business analysis framework is a highly structured technique that is strictly followed
by professionals at all levels of an organization. This is simply because it ensures an
organization does not lose focus on its ability and mission statement.
6. Use Case Modelling:
Use Case Modelling is one of the best business analysis modeling techniques used by
business analysts across the world. It involves the pictorial illustration of how business
functions are meant to work within a system through user interactions.
This system is known as the “TimeSheet Management System”. This technique is mainly
applied during the design phase of a software development project. This is simply
because it helps professionals in their business analyst training career to transform
business requirements into highly functional specifications.
The primary components of Use Case Modelling are;
So, the CATWOE business analysis model focuses on key areas of dilemma and how
invented solutions can impact an organization as a whole. This helps BAs to prioritize
important aspects of business growth that will please stakeholders.
This model allows for a better understanding of what a business is all about and what
needs to be done to make it successful. For instance, it ensures a company is offering
good value to its customers and clients. It identifies the cost needed to provide value
offering and also ensures an organization’s revenue model is stable.
Furthermore, the product design cost, cost of production, marketing strategy, and
company revenue management is better handled with business model analysis.
9. PESTLE Analysis:
PESTLE Analysis is a business analysis model that deals with the environmental factors
affecting business growth (Political, Economic, Social, Technological, Legal, and Environmental).
These factors mainly influence business decisions during strategic planning, and applying
the PESTLE Analysis technique helps ensure that a sound final business decision is reached.
Enrolling in business analyst training will give you more knowledge about this business
analysis model. A business analyst can use PESTLE Analysis to analyze the environmental
factors where an organization operates and to identify how those factors will affect
business performance in the near future.
It is crucial to carry out proper requirement analysis to achieve effective and
efficient software project development.
Conclusion
In conclusion, all the business analysis models listed in this guide are excellent business
analysis techniques that should be in every business analyst's toolbox.
The best way to make this happen is by making use of business analysis models.
These BA models will make the practice of business analysis an enjoyable and successful
one. A certified business analyst must be able to apply each of these models to business
solutions in whatever organization they work for.
Chapter 3
Introduction to “Fundamentals of Business Analytics”
OLTP and OLAP
RN Prasad and Seema Acharya
Copyright 2011 Wiley India Pvt. Ltd. All rights reserved.
Content of this presentation has been taken from the book “Fundamentals of Business Analytics” by RN Prasad and Seema Acharya, published by Wiley India Pvt. Ltd.
User gets an instant update on the account balance after withdrawing the money
TRANSACTIONS
• Single event that changes something
• Different types of transactions
– Customer orders
– Receipts
– Invoices
– Payments
[Figure: INSERT, UPDATE, and RETRIEVE operations]
TRANSACTIONS
• Cash at register gone up
• Inventory of video game gone down
• Ordering of new video game for the store
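The video game sale above can be sketched as one OLTP transaction. This is only a minimal illustration using Python's built-in sqlite3 module: the table names (inventory, register, reorders), the reorder threshold and the reorder quantity of 10 are all assumptions made for the example, not part of the original slides.

```python
import sqlite3

def record_sale(conn, item, qty, unit_price, reorder_level=2):
    """Apply one sale as a single atomic transaction (hypothetical schema)."""
    with conn:  # commits on success, rolls back if any statement raises
        # Cash at the register goes up.
        conn.execute("UPDATE register SET cash = cash + ? WHERE id = 1",
                     (qty * unit_price,))
        # Inventory of the item goes down (the CHECK constraint forbids negatives).
        conn.execute("UPDATE inventory SET on_hand = on_hand - ? WHERE item = ?",
                     (qty, item))
        (on_hand,) = conn.execute(
            "SELECT on_hand FROM inventory WHERE item = ?", (item,)).fetchone()
        # Order new stock once the shelf runs low (assumed reorder quantity: 10).
        if on_hand <= reorder_level:
            conn.execute("INSERT INTO reorders (item, qty) VALUES (?, ?)",
                         (item, 10))

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inventory (item TEXT PRIMARY KEY,
                            on_hand INTEGER CHECK (on_hand >= 0));
    CREATE TABLE register  (id INTEGER PRIMARY KEY, cash REAL);
    CREATE TABLE reorders  (item TEXT, qty INTEGER);
    INSERT INTO inventory VALUES ('video game', 3);
    INSERT INTO register  VALUES (1, 100.0);
""")
record_sale(conn, "video game", 1, 50.0)
```

If any step fails (for example, selling more copies than are in stock), the whole transaction is rolled back, so the register and the inventory never disagree.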
OLTP Segmentation
• They can be segmented into:
– Real-time Transaction Processing
– Batch Processing
Real-time Transaction processing
• Multiple users can fetch the information
• Very fast response rate
• Transactions processed immediately
• Everything is processed in real time
Batch Processing
• Where information is required in batch
• Offline access to information
• Presorting (sequence) is applied
• Takes time to process information
[Figure: purchases from Day 1 to Day 30 collected into the monthly purchase of a retail store]
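The difference between the two OLTP segments can be sketched in a few lines. This is a simplified illustration, not a real transaction processor: the account names and amounts are made up, and "processing" is reduced to updating a balance.

```python
from collections import defaultdict

def process_realtime(balances, txn):
    """Apply one transaction immediately and return the up-to-date balance."""
    account, amount = txn
    balances[account] += amount
    return balances[account]

def process_batch(balances, txns):
    """Apply a collected batch in one pass, presorted by account."""
    for account, amount in sorted(txns):  # presorting (sequencing) step
        balances[account] += amount

balances = defaultdict(float)

# Real-time: the user sees the new balance right after the withdrawal.
after_withdrawal = process_realtime(balances, ("acct-1", -40.0))

# Batch: purchases are collected during the month and posted together.
month_of_purchases = [("store-7", 120.0), ("store-2", 75.5), ("store-7", 30.0)]
process_batch(balances, month_of_purchases)
```

In the real-time path every transaction returns a fresh result; in the batch path nothing is visible until the whole batch has been processed.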
Characteristics of OLTP Model
• Online connectivity
– LAN, WAN
• Availability
– Available 24 hours a day
• Response rate
– Rapid response rate
– Load balancing by prioritizing the transactions
• Cost
– Cost of transactions is less
• Update facility
– Shorter lock periods
– Instant updates
– Use the full potential of hardware and software
Limitations of Relational Models
• A large number of tables must be created and maintained
for voluminous data
• For new functionalities, new tables are added
• Unstructured data does not fit naturally into
relational tables
• It is very difficult to manage the data through
common denominators (keys)
Answer a Quick Question
• The supermarket is deciding on introducing a new product. The key
questions being debated are: “Which product should it introduce?”
and “Should it be specific to a few customer segments?”
• The supermarket is looking at offering a discount at its year-end sale.
The questions here are: “How much discount should it offer?”
and “Should there be different discounts for different customer segments?”
• The supermarket is looking at rewarding its most consistent salesperson.
The question here is: “How to zero in on its most consistent salesperson
(consistent on several parameters)?”
• All the queries stated above have more to do with analysis than simple reporting.
Ideally, these queries are not meant to be solved by an OLTP system.
OLAP - Online Analytical Processing
OLAP differs from traditional databases in the way data is conceptualized
and stored.
In OLAP, data is held in dimensional form rather than relational form.
OLAP’s lifeblood is multi-dimensional data.
OLAP tools are based on the multi-dimensional data model, which views data
in the form of a data cube.
Online Analytical Processing (OLAP) is a technology that is used to
organize large business databases and support business intelligence.
OLAP databases are divided into one or more cubes. Each cube is
organized and designed by a cube administrator to fit the way that you
retrieve and analyze data, so that it is easier to create and use the PivotTable
reports and PivotChart reports that you need.
OLAP (Online Analytical Processing)
• OLAP is a category of software that allows users to analyze
information from multiple database systems at the same time. It
is a technology that enables analysts to extract and view business
data from different points of view
• Analysts frequently need to group, aggregate and join data.
These operations in relational databases are resource intensive.
With OLAP, data can be pre-calculated and pre-aggregated,
making analysis faster.
• Provides multidimensional view of data
• Used for analysis of data
• Data can be viewed from different perspectives
• Determine why data appears the way it does
• A drill-down approach is used to dig deeper into the data
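The point above about pre-calculated and pre-aggregated data can be illustrated with a small sketch. It assumes three dimensions (section, category, quarter) and a handful of made-up rows, and it simply precomputes the total for every combination of dimensions; real OLAP servers choose more carefully which aggregates to materialize.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("section", "category", "quarter")  # assumed dimension names

def precompute(rows):
    """Pre-aggregate the measure for every subset of dimensions."""
    n = len(DIMENSIONS)
    cube = defaultdict(float)
    for *dims, amount in rows:
        for r in range(n + 1):
            for kept in combinations(range(n), r):
                # Dimensions that are not kept are collapsed to the marker "ALL".
                key = tuple(d if i in kept else "ALL" for i, d in enumerate(dims))
                cube[key] += amount
    return dict(cube)

# Hypothetical rows: (section, category, quarter, sales amount)
rows = [
    ("Men", "Clothing", "Q1", 100.0),
    ("Men", "Clothing", "Q2", 150.0),
    ("Kid", "Accessories", "Q1", 40.0),
]
cube = precompute(rows)

# Any total is now a dictionary lookup instead of a scan, group and join.
men_total = cube[("Men", "ALL", "ALL")]
q1_total = cube[("ALL", "ALL", "Q1")]
```

The trade-off is storage and refresh cost: each input row contributes to 2^3 = 8 aggregate cells here, which is why deciding what to precompute matters on large data sets.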
OLAP - Example
Let us consider the data of a supermarket store, the “AllGoods” store, for the
year “2001”.
This data, as captured by the OLTP system, is under the following column
headings: Section, ProductCategoryName, YearQuarter, and SalesAmount.
We have a total of 32 records/rows.
The Section column can have one value from amongst “Men”, “Women”,
“Kid”, and “Infant”.
The ProductCategoryName column can have either the value
“Accessories” or the value “Clothing”.
The YearQuarter column can have one value from amongst “Q1”, “Q2”,
“Q3”, and “Q4”.
The SalesAmount column records the sales figures for each Section,
ProductCategoryName, and YearQuarter.
Characteristics of OLAP
• Multidimensional analysis
In Table 3.7, data has been plotted along two dimensions, so we can now look at the
SalesAmount from two perspectives, i.e. by YearQuarter and ProductCategoryName. The
calendar quarters have been listed along the vertical axis and the product categories
across the horizontal axis. Each unique pair of values of these two dimensions
corresponds to a single point of SalesAmount data. For example, the Accessories sales for Q2
add up to $9680.00 whereas the Clothing sales for the same quarter total $12366.00.
Added together, these two cells give total Q2 sales of $22046.00.
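The two-dimensional view can be sketched as a small pivot over the OLTP rows. Only the two Q2 totals ($9680.00 and $12366.00) come from the text; the per-section split in the rows below is invented so that the rows add up to those totals, and only the Q2 rows are shown.

```python
from collections import defaultdict

# (Section, ProductCategoryName, YearQuarter, SalesAmount)
# Hypothetical Q2 rows: the per-section amounts are assumptions.
rows = [
    ("Men",    "Accessories", "Q2", 2000.0), ("Men",    "Clothing", "Q2", 3000.0),
    ("Women",  "Accessories", "Q2", 2500.0), ("Women",  "Clothing", "Q2", 4000.0),
    ("Kid",    "Accessories", "Q2", 3180.0), ("Kid",    "Clothing", "Q2", 3366.0),
    ("Infant", "Accessories", "Q2", 2000.0), ("Infant", "Clothing", "Q2", 2000.0),
]

def pivot(rows):
    """Sum SalesAmount for each (YearQuarter, ProductCategoryName) cell."""
    table = defaultdict(float)
    for _section, category, quarter, amount in rows:
        table[(quarter, category)] += amount
    return dict(table)

table = pivot(rows)
q2_total = table[("Q2", "Accessories")] + table[("Q2", "Clothing")]
```

Each key of `table` is one (row, column) cell of the two-dimensional view; the Section dimension has been summed away.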
Three Dimensional
What if the company’s analyst wishes to view all of the data along all three
dimensions (YearQuarter, ProductCategoryName, and Section) in the same table
at the same time? For this, the analyst needs a three-dimensional view of the data, as arranged
in Table 3.8. In this table, one can now look at the data by all three dimensions/
perspectives, i.e. Section, ProductCategoryName, and YearQuarter. If the analyst wants to
find the section which recorded the maximum Accessories sales in Q2, then a
quick glance at Table 3.8 shows that it is the Kid section.
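The "quick glance" at the three-dimensional view amounts to fixing two dimensions and scanning the third. A minimal sketch follows; the cell values are hypothetical (Table 3.8 is not reproduced here) and were chosen only so that the Kid section comes out on top, as the text states.

```python
# Cells keyed by (Section, ProductCategoryName, YearQuarter); values are assumed.
cube = {
    ("Men",    "Accessories", "Q2"): 2000.0,
    ("Women",  "Accessories", "Q2"): 2500.0,
    ("Kid",    "Accessories", "Q2"): 3180.0,
    ("Infant", "Accessories", "Q2"): 2000.0,
    ("Kid",    "Clothing",    "Q2"): 3366.0,
}

def top_section(cube, category, quarter):
    """Return the Section with the largest SalesAmount for one category and quarter."""
    candidates = {
        section: amount
        for (section, cat, qtr), amount in cube.items()
        if cat == category and qtr == quarter
    }
    return max(candidates, key=candidates.get)

best = top_section(cube, "Accessories", "Q2")
```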
Can we go beyond Three Dimensional?
Well, if the question is “Can you go beyond the third dimension?” the answer is
YES!
If there is any constraint at all, it comes from the limits of your software. But if
the question is “Should you go beyond the third dimension?”, the answer depends
entirely on what data has been captured by your operational transactional systems
and what kind of queries you wish your OLAP system to respond to.
Now that we understand multi-dimensional data, it is time to look at the
functionalities and characteristics of an OLAP system. OLAP systems are
characterized by a low volume of transactions that involve very complex queries.
Some typical applications of OLAP are: budgeting, sales forecasting, sales
reporting, business process management
Example: Assume a financial analyst reports that the sales by the company have
gone up. The next question is “Which Section is most responsible for this
increase?” The answer to this question is usually followed by a barrage of
questions such as “Which store in this Section is most responsible for the
increase?” or “Which particular product category or categories registered the
maximum increase?” The answers to these are provided by multidimensional
analysis or OLAP.
Can we go beyond Three Dimensional?
Let us go back to our example of a company’s
(“AllGoods”) sales data viewed along three dimensions:
Section, ProductCategoryName, and YearQuarter.
Given below is a set of queries, related to this example,
that a typical OLAP system is capable of responding to:
• What will be the future sales trend for “Accessories” in the “Kid” Section?
• Given the customers’ buying pattern, will it be profitable to launch product
“XYZ” in the “Kid” Section?
• What impact will a 5% increase in the price of products have on the
customers?
Advantages of an OLAP System
• Consistency of information.
• “What if ” analysis.
Feature | OLTP (Online Transaction Processing) | OLAP (Online Analytical Processing)
Database Design | Typically normalized tables. OLTP system adopts ER (Entity Relationship) model | Typically de-normalized tables; uses star or snowflake schema
Operations | Read/Write | Mostly read
Backup and Recovery | Regular backups of operational data are mandatory. Requires concurrency control (locking) and recovery mechanisms (logging) | Instead of regular backups, data warehouse is refreshed periodically using data from operational data sources
Joins | Many | Few

No. | OLAP | OLTP
2 | OLAP systems are used by knowledge workers such as executives, managers and analysts. | OLTP systems are used by clerks, DBAs, or database professionals.
5 | Based on Star Schema, Snowflake Schema and Fact Constellation Schema. | Based on Entity Relationship Model.
7 | Provides summarized and consolidated data. | Provides primitive and highly detailed data.
8 | Provides summarized and multidimensional view of data. | Provides detailed and flat relational view of data.
CS 336 40
Decision Support
• Information technology to help the
knowledge worker (executive, manager,
analyst) make faster and better decisions
– “What were the sales volumes by region and product category for
the last year?”
– “How did the share price of computer manufacturers correlate with
quarterly profits over the past 10 years?”
– “Which orders should we fill to maximize revenues?”
Three-Tier Decision Support Systems
• Warehouse database server
– Almost always a relational DBMS, rarely flat files
• OLAP servers
– Relational OLAP (ROLAP): extended relational DBMS that maps
operations on multidimensional data to standard relational
operators
– Multidimensional OLAP (MOLAP): special-purpose server that
directly implements multidimensional data and operations
• Clients
– Query and reporting tools
– Analysis tools
– Data mining tools
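The ROLAP idea of mapping multidimensional operations to standard relational operators can be sketched with ordinary SQL run through Python's sqlite3 module. The sales table, its columns and its rows are assumptions made for illustration; a real ROLAP server would run such queries against warehouse fact and dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, category TEXT, quarter TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('North', 'Clothing',    'Q1', 100.0),
        ('North', 'Accessories', 'Q1',  40.0),
        ('South', 'Clothing',    'Q1',  80.0),
        ('South', 'Clothing',    'Q2',  90.0);
""")

# Roll-up over category and quarter: a GROUP BY on the remaining dimension.
by_region = dict(conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"))

# Slice on quarter = 'Q1': a WHERE clause fixes one dimension value.
q1_by_category = dict(conn.execute(
    "SELECT category, SUM(amount) FROM sales "
    "WHERE quarter = 'Q1' GROUP BY category"))
```

A MOLAP server would answer the same questions from array storage instead of issuing GROUP BY queries.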
The Complete Decision Support System
[Figure: information sources (operational DBs, semistructured sources) are extracted, transformed, loaded, and refreshed into the data warehouse server (Tier 1) and data marts; these serve the OLAP servers (Tier 2, e.g., MOLAP and ROLAP), which serve the clients (Tier 3: analysis, query/reporting, data mining)]
Data Warehouse vs. Data Marts
• Enterprise warehouse: collects all information about
subjects (customers, products, sales, assets,
personnel) that span the entire organization
– Requires extensive business modeling (may take years to design
and build)
• Data Marts: Departmental subsets that focus on selected
subjects
– Marketing data mart: customer, product, sales
– Faster roll out, but complex integration in the long run
• Virtual warehouse: views over operational databases
– Materialize selected summary views for efficient query processing
– Easy to build but requires excess capacity on operational database servers
Approaches to OLAP Servers
• Relational DBMS as Warehouse Servers
• Two possibilities for OLAP servers
• (1) Relational OLAP (ROLAP)
– Relational and specialized relational DBMS to
store and manage warehouse data
– OLAP middleware to support missing pieces
• (2) Multidimensional OLAP (MOLAP)
– Array-based storage structures
– Direct access to array data structures
OLAP Server: Query Engine Requirements
• Aggregates (maintenance and querying)
– Decide what to precompute and when
• Query language to support multidimensional
operations
– Standard SQL falls short
• Scalable query processing
– Data intensive and data selective queries
OLAP for Decision Support
• OLAP = Online Analytical Processing
• Support (almost) ad-hoc querying for business analyst
• Think in terms of spreadsheets
– View sales data by geography, time, or product
• Extend spreadsheet analysis model to work with
warehouse data
– Large data sets
– Semantically enriched to understand business terms
– Combine interactive queries with reporting functions
• Multidimensional view of data is the foundation of
OLAP
– Data model, operations, etc.
Warehouse Models & Operators
• Data Models
– relations
– stars & snowflakes
– cubes
• Operators
– slice & dice
– roll-up, drill down
– pivoting
– other
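The operators listed above can be sketched on a tiny cube held as a Python dictionary. The cell values and the axis order (geography, time, product) are assumptions for illustration; the function names are this sketch's own, not a standard API.

```python
from collections import defaultdict

# Cells keyed by (geography, time, product); values are the sales measure.
cube = {
    ("North", "Q1", "Clothing"): 100.0,
    ("North", "Q2", "Clothing"): 120.0,
    ("South", "Q1", "Clothing"): 80.0,
    ("South", "Q1", "Accessories"): 30.0,
}

def slice_(cube, axis, value):
    """Slice: fix one dimension to a single value and drop that axis."""
    return {k[:axis] + k[axis + 1:]: v
            for k, v in cube.items() if k[axis] == value}

def dice(cube, allowed):
    """Dice: keep cells whose coordinates fall in the allowed sets per axis."""
    return {k: v for k, v in cube.items()
            if all(k[axis] in values for axis, values in allowed.items())}

def roll_up(cube, axis):
    """Roll-up: aggregate away one dimension by summing over it."""
    out = defaultdict(float)
    for k, v in cube.items():
        out[k[:axis] + k[axis + 1:]] += v
    return dict(out)

def pivot(cube, order):
    """Pivot: reorder the axes, e.g. to swap rows and columns in a report."""
    return {tuple(k[i] for i in order): v for k, v in cube.items()}

q1 = slice_(cube, 1, "Q1")
north_clothing = dice(cube, {0: {"North"}, 2: {"Clothing"}})
by_geo_product = roll_up(cube, 1)
product_first = pivot(cube, (2, 1, 0))
```

Drill-down is the reverse of roll-up: moving from `by_geo_product` back to the finer-grained `cube`, which is only possible because the detailed cells were kept.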
Multi-Dimensional Data
• Measures - numerical data being tracked
• Dimensions - business parameters that define a
transaction
• Example: Analyst may want to view sales data
(measure) by geography, by time, and by product
(dimensions)
• Dimensional modeling is a technique for
structuring data around the business concepts
• ER models describe “entities” and “relationships”
• Dimensional models describe “measures” and
“dimensions”
The Multi-Dimensional Model
“Sales by product line over the past six months”
“Sales by store between 1990 and 1995”
[Figure: fact table whose key columns join it to the dimension tables (e.g., Store Info), alongside its numerical measures]
Dimensional Modeling
Dimension Hierarchies
Store Dimension: Total → Region → District → Stores
Product Dimension: Total → Manufacturer → Brand → Products
ROLAP: Dimensional Modeling
Using Relational DBMS
• Special schema design: star, snowflake
• Special indexes: bitmap, multi-table join
• Special tuning: maximize query throughput
• Proven technology (relational model,
DBMS), tend to outperform specialized
MDDB especially on large data sets
• Products
– IBM DB2, Oracle, Sybase IQ, RedBrick, Informix
MOLAP: Dimensional Modeling
Using the Multi Dimensional Model
• MDDB: a special-purpose data model
• Facts stored in multi-dimensional arrays
• Dimensions used to index array
• Sometimes on top of relational DB
• Products
– Pilot, Arbor Essbase, Gentia
Star Schema (in RDBMS)
Star Schema Example
Star Schema with Sample Data
The “Classic” Star Schema
Store Dimension: STORE KEY, Store Description, City, State, District ID, District Desc., Region_ID, Region Desc., Regional Mgr., Level
Time Dimension: PERIOD KEY, Period Desc, Year, Quarter, Month, Day, Current Flag, Resolution, Sequence
Product Dimension: PRODUCT KEY, Product Desc., Brand, Color, Size, Manufacturer, Level
Fact Table: STORE KEY, PRODUCT KEY, PERIOD KEY, Dollars, Units, Price
• A single fact table, with detail and summary data
• Fact table primary key has only one key column per dimension
• Each key is generated
• Each dimension is a single table, highly denormalized
Example:
Select A.STORE_KEY, A.PERIOD_KEY, A.dollars
from Fact_Table A
where A.STORE_KEY in (select STORE_KEY
from Store_Dimension B
where region = “North” and Level = 2)
and etc...
Level is needed whenever aggregates are stored with detail facts.
The “Level” Problem
• Level is a problem because it creates
potential for error. If the query builder, human
or program, forgets about it, perfectly
reasonable-looking WRONG answers can occur.
• One alternative: the FACT CONSTELLATION
model...
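The "Level" pitfall described above can be sketched with a tiny in-memory database. This is only an illustration under stated assumptions: the table and column names (store_dimension, fact_table, level) and all the sample rows are hypothetical, with Level 1 standing for individual stores and Level 2 for region-level summary rows stored in the same fact table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store_dimension (
    store_key INTEGER PRIMARY KEY, description TEXT, region TEXT, level INTEGER);
CREATE TABLE fact_table (store_key INTEGER, period_key INTEGER, dollars REAL);
-- Hypothetical data: level 1 = individual stores, level 2 = region summary row.
INSERT INTO store_dimension VALUES
    (1, 'Store A', 'North', 1),
    (2, 'Store B', 'North', 1),
    (100, 'North region total', 'North', 2);
-- Detail facts and the pre-computed summary live in the same fact table.
INSERT INTO fact_table VALUES (1, 1, 10.0), (2, 1, 20.0), (100, 1, 30.0);
""")

def north_total(level_filter):
    """Total dollars for region North, with an optional extra Level predicate."""
    sql = """
        SELECT SUM(a.dollars) FROM fact_table a
        WHERE a.store_key IN (SELECT store_key FROM store_dimension
                              WHERE region = 'North' {extra})
    """.format(extra=level_filter)
    return conn.execute(sql).fetchone()[0]

with_level = north_total("AND level = 2")   # uses only the summary row
without_level = north_total("")             # forgets Level: detail + summary
print(with_level, without_level)
```

Under these assumptions, the query that forgets the Level predicate adds the summary row on top of the detail rows and reports double the true total, which looks reasonable but is wrong.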
The “Fact Constellation” Schema
Store Dimension: STORE KEY, Store Description, City, State, District ID, District Desc., Region_ID, Region Desc., Regional Mgr.
Time Dimension: PERIOD KEY, Period Desc, Year, Quarter, Month, Day, Current Flag, Sequence
Product Dimension: PRODUCT KEY, Product Desc., Brand, Color, Size, Manufacturer
Fact Table: STORE KEY, PRODUCT KEY, PERIOD KEY, Dollars, Units, Price
District Fact Table: District_ID, PRODUCT_KEY, PERIOD_KEY, Dollars, Units, Price
Region Fact Table: Region_ID, PRODUCT_KEY, PERIOD_KEY, Dollars, Units, Price
The “Fact Constellation” Schema
In the Fact Constellation, aggregate tables are created separately from the detail; therefore it is impossible to pick up, for example, Store detail when querying the District Fact Table.
Major Advantage: No need for the “Level” indicator in the dimension tables,
since no aggregated data is stored with lower-level detail
Disadvantage: Dimension tables are still very large in some cases, which can
slow performance; front-end must be able to detect existence of aggregate
facts, which requires more extensive metadata
Another Alternative to “Level”
• Fact Constellation is a good alternative to the
Star, but when dimensions have very high
cardinality, the sub-selects in the dimension
tables can be a source of delay.
• An alternative is to normalize the dimension
tables by attribute level, with each smaller
dimension table pointing to an appropriate
aggregated fact table, the “Snowflake Schema”
...
The “Snowflake” Schema
Store Dimension: STORE KEY, Store Description, City, State, District ID, District Desc., Region_ID, Region Desc., Regional Mgr.
District attribute table: District_ID, District Desc., Region_ID
Region attribute table: Region_ID, Region Desc., Regional Mgr.
Store Fact Table: STORE KEY, PRODUCT KEY, PERIOD KEY, Dollars, Units, Price
District Fact Table: District_ID, PRODUCT_KEY, PERIOD_KEY, Dollars, Units, Price
Region Fact Table: Region_ID, PRODUCT_KEY, PERIOD_KEY, Dollars, Units, Price
The “Snowflake” Schema
• No LEVEL in dimension tables
• Dimension tables are normalized by decomposing at the attribute level
• Each dimension table has one key for each level of the dimension's hierarchy
• The lowest level key joins the dimension table to both the fact table and the lower level attribute table
How does it work? The best way is for the query to be built by understanding which
summary levels exist, and finding the proper snowflaked attribute tables,
constraining there for keys, then selecting from the fact table.
The “Snowflake” Schema
• Additional features: The original Store Dimension table, completely de-normalized, is kept intact, since certain queries can benefit by its all-encompassing content.
• In practice, start with a Star Schema and create the “snowflakes” with queries. This eliminates the need to create separate extracts for each table, and referential integrity is inherited from the dimension table.
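Creating the "snowflakes" with queries, as suggested above, might look like the following sketch. The column names, the attribute table names (district_attr, region_attr) and the sample rows are hypothetical; the point is only that the normalized attribute tables can be derived from the denormalized star dimension with SELECT DISTINCT, so no separate extract is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Denormalized star-schema dimension (hypothetical columns and rows).
CREATE TABLE store_dimension (
    store_key INTEGER PRIMARY KEY, store_desc TEXT,
    district_id INTEGER, district_desc TEXT,
    region_id INTEGER, region_desc TEXT);
INSERT INTO store_dimension VALUES
    (1, 'Store A', 10, 'District X', 100, 'North'),
    (2, 'Store B', 10, 'District X', 100, 'North'),
    (3, 'Store C', 20, 'District Y', 100, 'North');

-- Snowflaked attribute tables derived from the dimension by query.
CREATE TABLE district_attr AS
    SELECT DISTINCT district_id, district_desc, region_id FROM store_dimension;
CREATE TABLE region_attr AS
    SELECT DISTINCT region_id, region_desc FROM store_dimension;
""")

districts = conn.execute(
    "SELECT district_id, district_desc, region_id "
    "FROM district_attr ORDER BY district_id").fetchall()
regions = conn.execute(
    "SELECT region_id, region_desc FROM region_attr").fetchall()
print(districts)
print(regions)
```

Because every key in the derived tables comes from the original dimension table, the attribute tables cannot reference a district or region that the dimension does not contain.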
Aggregates
Add up amounts by day
In SQL: SELECT date, sum(amt) FROM SALE
GROUP BY date
Another Example
Add up amounts by day, product
In SQL: SELECT prodId, date, sum(amt) FROM SALE
GROUP BY date, prodId
sale:
prodId  storeId  date  amt
p1      s1       1     12
p2      s1       1     11
p1      s3       1     50
p2      s2       1     8
p1      s1       2     44
p1      s2       2     4
Result:
prodId  date  amt
p1      1     62
p2      1     19
p1      2     48
rollup: sale → result; drill-down: result → sale
Aggregates
• Operators: sum, count, max, min,
median, avg
• “Having” clause
• Using dimension hierarchy
– average by region (within store)
– maximum by month (within date)
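The GROUP BY aggregates from the preceding slides can be tried out with a short sketch. It loads the six sale rows shown earlier into an in-memory SQLite table; the HAVING threshold of 50 is an arbitrary value chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
conn.executemany("INSERT INTO sale VALUES (?, ?, ?, ?)", [
    ("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4),
])

# Add up amounts by day.
by_day = conn.execute(
    "SELECT date, SUM(amt) FROM sale GROUP BY date ORDER BY date").fetchall()

# Add up amounts by day and product.
by_day_product = conn.execute(
    "SELECT prodId, date, SUM(amt) FROM sale "
    "GROUP BY date, prodId ORDER BY date, prodId").fetchall()

# HAVING filters groups after aggregation (threshold is illustrative).
big_days = conn.execute(
    "SELECT date, SUM(amt) FROM sale GROUP BY date "
    "HAVING SUM(amt) > 50 ORDER BY date").fetchall()

print(by_day)
print(by_day_product)
print(big_days)
```

Going from by_day_product to by_day is a roll-up (the product column is summed away); going the other way is a drill-down.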
ROLAP vs. MOLAP
• ROLAP:
Relational On-Line Analytical Processing
• MOLAP:
Multi-Dimensional On-Line Analytical
Processing
The MOLAP Cube
dimensions = 2
3-D Cube
Fact table view: Multi-dimensional cube:
dimensions = 3
Example
[Figure: 3-D data cube with axes Product (Juice, Milk, Coke, Cream, Soap, Bread), Store (NY, SF, LA) and Time (M T W Th F S S); sample cell: 56 units of bread sold in LA on M]
Dimensions: Time, Product, Store
Attributes: Product (upc, price, …), Store …
Hierarchies: Product → Brand → …; Day → Week → Quarter; Store → Region → Country
Roll-ups shown: roll-up to region, roll-up to brand, roll-up to week
Cube Aggregation: Roll-up
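Example: computing sums
day 1      s1   s2   s3
p1         12        50
p2         11    8
day 2      s1   s2   s3
p1         44    4
p2
Summed over days:
           s1   s2   s3
p1         56    4   50
p2         11    8
Summed over days and products:
           s1   s2   s3
sum        67   12   50
Summed over days and stores:
           sum
p1         110
p2          19
Grand total: 129
rollup: detail → sums; drill-down: sums → detail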
Cube Operators for Roll-up
[Figure: the same cube and sums as in the roll-up example, with aggregate cells addressed by writing * for "all values" of a dimension]
sale(s1,*,*) = 67
sale(s2,p2,*) = 8
sale(*,*,*) = 129
Extended Cube
day 1        s1   s2   s3    *
p1           12        50   62
p2           11    8        19
*            23    8   50   81
day 2        s1   s2   s3    *
p1           44    4        48
p2
*            44    4        48
* (all days) s1   s2   s3    *
p1           56    4   50  110
p2           11    8        19
*            67   12   50  129
sale(*,p2,*) = 19
Aggregation Using Hierarchies
Store hierarchy: store → region → country
day 1      s1   s2   s3
p1         12        50
p2         11    8
day 2      s1   s2   s3
p1         44    4
p2
Rolled up to region (summed over both days):
           region A   region B
p1         56         54
p2         11          8
(store s1 in Region A;
stores s2, s3 in Region B)
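The cube operators and the region roll-up from the last few slides can be reproduced with a small sketch. The cell values are the ones from the slides; the function names sale and roll_up_to_region and the dictionary layout are just one possible way to write it, not a prescribed implementation.

```python
from collections import defaultdict

# Cube cells keyed by (store, product, day); empty cells are simply absent.
facts = {
    ("s1", "p1", 1): 12, ("s3", "p1", 1): 50,
    ("s1", "p2", 1): 11, ("s2", "p2", 1): 8,
    ("s1", "p1", 2): 44, ("s2", "p1", 2): 4,
}

def sale(store="*", product="*", day="*"):
    """Sum every cell matching the pattern; '*' means 'all values'."""
    pattern = (store, product, day)
    return sum(amt for key, amt in facts.items()
               if all(p == "*" or p == k for p, k in zip(pattern, key)))

# Store hierarchy from the slide: s1 in Region A; s2 and s3 in Region B.
REGION_OF = {"s1": "A", "s2": "B", "s3": "B"}

def roll_up_to_region():
    """Aggregate over days and replace each store by its region."""
    totals = defaultdict(int)
    for (store, product, _day), amt in facts.items():
        totals[(REGION_OF[store], product)] += amt
    return dict(totals)

print(sale("s1"), sale("s2", "p2"), sale())
print(roll_up_to_region())
```

A slice is the same pattern idea without the summing: fixing day to 1 and listing the matching cells gives the TIME = day 1 subcube.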
Slicing
day 1      s1   s2   s3
p1         12        50
p2         11    8
day 2      s1   s2   s3
p1         44    4
p2
Slice with TIME = day 1:
           s1   s2   s3
p1         12        50
p2         11    8
Slicing & Pivoting
Sales ($ millions), Products by Time
                         d1      d2
Store s1  Electronics    $5.2
          Toys           $1.9
          Clothing       $2.3
          Cosmetics      $1.1
Store s2  Electronics    $8.9
          Toys           $0.75
          Clothing       $4.6
          Cosmetics      $1.5
Sales ($ millions) for d1, Products by Store
                         Store s1   Store s2
Electronics              $5.2       $8.9
Toys                     $1.9       $0.75
Clothing                 $2.3       $4.6
Cosmetics                $1.1       $1.5
Summary of Operations
• Aggregation (roll-up)
– aggregate (summarize) data to the next higher dimension element
– e.g., total sales by city, year → total sales by region, year
• Navigation to detailed data (drill-down)
• Selection (slice) defines a subcube
– e.g., sales where city =‘Gainesville’ and date = ‘1/15/90’
• Calculation and ranking
– e.g., top 3% of cities by average income
• Visualization operations (e.g., Pivot)
• Time functions
– e.g., time average
Query & Analysis Tools
• Query Building
• Report Writers (comparisons, growth, graphs,…)
• Spreadsheet Systems
• Web Interfaces
• Data Mining
MOLAP, ROLAP, HOLAP
• MOLAP
– Multidimensional OLAP
• ROLAP
– Relational OLAP
• HOLAP
– Hybrid OLAP
MOLAP
• Uses multidimensional approach to solve a
problem
• Directly stores the information in cubes
• Used in SSAS (SQL Server Analysis Services)
ROLAP
• Relational databases are used to store the
data
• Translates OLAP queries to appropriate SQL
statements
• Data created by OLTP is directly used
Do it Exercise
Attributes such as num_sold are called measure attributes, since they can be
used to measure some value, and can be aggregated.
Attributes like make, color, size are called dimension attributes, since they
define the dimensions on which measure attributes are viewed.
Data that can be modeled as dimension attributes and measure attributes are
called multi-dimensional data.
Dimension Hierarchies
Cross Tabs and Data Cubes
OLAP systems allow analysts to view different summaries of the data.
The following table can be derived from
sales(make, color, size, num_sold)
Cross-tab or pivot table
          WHITE  RED  SILVER  TOTAL
TOYOTA        8   35      10     53
NISSAN       20   10       5     35
HOLDEN       14    7      28     49
FORD         20    2       5     27
TOTAL        62   54      48    164
Relational representation
make     color    num_sold
Toyota   white    8
Toyota   red      35
Toyota   silver   10
Toyota   all      53
Nissan   white    20
Nissan   red      10
Nissan   silver   5
Nissan   all      35
Holden   white    14
Holden   red      7
Holden   silver   28
Holden   all      49
Ford     white    20
Ford     red      2
Ford     silver   5
Ford     all      27
all      white    62
all      red      54
all      silver   48
all      all      164
Data Cubes
The generalization of a cross tab, which is 2-dimensional, to n
dimensions can be visualized as a n-dimensional cube, called
the data cube.
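As a rough sketch of how the cross-tab and its "all" entries relate to the base data, the following builds every (make, color) cell, including the totals, from the twelve detail rows of the example. The function name cross_tab and the dictionary representation are assumptions made for illustration.

```python
# Detail rows (make, color, num_sold) from the example, summed over size.
rows = [
    ("Toyota", "white", 8), ("Toyota", "red", 35), ("Toyota", "silver", 10),
    ("Nissan", "white", 20), ("Nissan", "red", 10), ("Nissan", "silver", 5),
    ("Holden", "white", 14), ("Holden", "red", 7), ("Holden", "silver", 28),
    ("Ford", "white", 20), ("Ford", "red", 2), ("Ford", "silver", 5),
]

def cross_tab(rows):
    """Return {(make, color): total}, where 'all' stands for a summed-out dimension."""
    table = {}
    for make, color, n in rows:
        # Each detail row contributes to four cells: itself, its row total,
        # its column total, and the grand total.
        for m in (make, "all"):
            for c in (color, "all"):
                table[(m, c)] = table.get((m, c), 0) + n
    return table

tab = cross_tab(rows)
print(tab[("Toyota", "all")], tab[("all", "white")], tab[("all", "all")])
```

With n dimensions instead of two, the same loop would visit 2^n cells per detail row, which is the data cube generalization described above.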
[Figure: data cube; the color axis shows white, red, silver, all]
MOLAP vs ROLAP
OLAP systems can use multi-dimensional array to store data cubes, called
multidimensional OLAP systems (MOLAP) .
Alternatively, they can store data as relations in relational databases, called
relational OLAP systems (ROLAP).
ROLAP
The main relation, which relates dimensions to measures, is called the fact table.
e.g., sales(prod_id, date, shop_id, num_sold)
Very large, accumulation of facts such as sales
Each dimension can have additional attributes and an associated dimensional
table.
E.g., product(prod_id, price, color)
prod_id is a foreign key of sales
shops(shop_id, location, manager)
[Figure: fact table sales(prod_id, date, shop_id, num_sold) linked to dimension tables product(prod_id, price, color) and shops(shop_id, location, manager)]
The Star Schema
Dimension tables are not in 3NF
The snowflake schema
A variation of the star schema where the
dimension tables are normalized.
Fact constellation
A set of fact tables that share some dimension
tables
OLAP Queries
A common operation is to aggregate a measure over one or more dimensions, e.g.,
find total/average sales for a product.
find total sales in each city/state/month etc
find top 2 products by total sales
Roll-up: moving from finer granularity to coarser granularity by means of
aggregation.
E.g., given total sales for each city, find total sales for each state.
Drill-down: The inverse of roll-up
Pivoting: aggregate on selected dimensions
Slicing and dicing:
E.g., from the data cube find the cross-tab on Model and Color for medium
cars . The cross-tab can be viewed as a slice of the data cube.
Query Processing Issues
Expensive aggregations are common
Pre-compute all aggregates? Maybe infeasible!
Materialized views can help.
Which views to materialize?
given a query and some materialized views, can we use the views to answer
the query? How?
How frequently should we refresh the views to make them consistent with the
underlying tables?
What indexes should one use?
SQL:1999 Extended Aggregations*
Example 1
Select make, color, size, sum(num_sold) from sales
group by cube(make, color, size)
Calculates 8 groupings:
(make, color, size), (make, color), (make, size), …, ().
Example 2
Select make, color, size, sum(num_sold) from sales
Group by rollup(make, color, size)
Calculates 4 groupings:
(make, color, size), (make, color), (make), ().
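The grouping counts quoted above (8 for cube, 4 for rollup) follow from simple combinatorics: cube produces every subset of the attribute list, while rollup produces only its prefixes. A minimal sketch, with hypothetical helper names cube_groupings and rollup_groupings, enumerates them:

```python
from itertools import combinations

def cube_groupings(attrs):
    """All 2^n subsets of the attributes, largest first."""
    return [combo
            for r in range(len(attrs), -1, -1)
            for combo in combinations(attrs, r)]

def rollup_groupings(attrs):
    """The n + 1 prefixes of the attribute list, longest first."""
    return [tuple(attrs[:i]) for i in range(len(attrs), -1, -1)]

attrs = ["make", "color", "size"]
cube = cube_groupings(attrs)
rollup = rollup_groupings(attrs)
print(len(cube), cube)
print(len(rollup), rollup)
```

This is why rollup suits hierarchies (each level only makes sense within the one above it), whereas cube suits independent dimensions where any combination may be queried.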
Examples in Oracle: Rollup
Oracle Rollup Example
OLTP and OLAP
Should OLAP be Performed Directly
on Operational Databases?
• An OLAP system on the other hand requires mostly a read only access to
data records for summarization and aggregation. If concurrency control and
recovery mechanisms are applied for such OLAP operations, it will
severely impact the throughput of an OLAP system.
OLAP Operations on Multi-dimensional Data
• Slice
• Dice
• Roll-up
• Drill down
• Drill through
• Drill across
• Pivot/Rotate
Do It Exercise
Hint: Provide the participants with a sample data sheet (Excel sheet) and
ask them to demonstrate their understanding of the various OLAP
operations on multi-dimensional data.
Data Warehouse
A repository of information gathered from multiple sources, stored under a unified
schema, usually at a single site .
Data may be augmented with additional attributes, such as timestamp, and
summary information.
Data are stored for a long time, permitting access to historical data.
Interactive response times expected for complex queries; ad-hoc updates
uncommon.
Building Data Warehouse
Issues:
– Semantic integration: When getting data from
multiple sources, must eliminate mismatches, e.g.,
different currencies.
– Heterogeneous sources: must access data from a
variety of source formats.
– Load, refresh, purge: Must load data, periodically
refresh it, and purge too old or useless data
– Metadata management: Must keep track of
source, loading time, etc.
Elements of data warehouse
[Figure: Operational Data, Data Replication & Cleansing, Informational Database, Metadata / Information Directory, and EIS/DSS Apps]
Elements of data warehouse
Data Replication Manager
copying & distribution of data across databases
• data that needs to be copied, source/destination, frequency, data
transforms
• refresh copy entire source, propagate changes only
all external data is transformed & cleansed before adding to warehouse
Informational Database
database that stores data copied from multiple sources by data replication
manager
Information Directory
metadata manager - collects metadata from databases on network
EIS/DSS tools
SQL based query tools
some vendors use extended SQL
Query/Reporting tools
Formulate queries without (extended) SQL or other languages
Result displayed as table, graph, report,
Spreadsheet systems
Web interfaces
Vendor-specific tools
Oracle Discoverer:
• http://www.oracle.com/tools/disc/index.html
Column stores
A recently proposed data storage method that
allows more efficient aggregation queries in
data warehouses
stores data as columns rather than as rows.
See http://en.wikipedia.org/wiki/Column-oriented_DBMS.
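The idea behind column stores can be pictured with a toy sketch. It is only a conceptual illustration with made-up rows: real column-oriented systems work on disk pages and compressed columns, not Python lists. The same data is held once as a list of rows and once as one list per column, and an aggregate over num_sold needs only a single column in the second layout.

```python
# Row layout: each record keeps all of its fields together (hypothetical data).
rows = [
    {"prod_id": "p1", "shop_id": "s1", "num_sold": 12},
    {"prod_id": "p2", "shop_id": "s1", "num_sold": 11},
    {"prod_id": "p1", "shop_id": "s3", "num_sold": 50},
    {"prod_id": "p2", "shop_id": "s2", "num_sold": 8},
]

# Column layout: one list per attribute, values in the same order as the rows.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Row layout: the aggregate has to visit every record to pick out one field.
total_from_rows = sum(row["num_sold"] for row in rows)

# Column layout: the aggregate reads just the one column it needs.
total_from_columns = sum(columns["num_sold"])

print(total_from_rows, total_from_columns)
```

Both totals agree; the difference lies in how much unrelated data has to be read to compute them, which is what makes the column layout attractive for warehouse-style aggregation.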
OLAP in BI
Answer a Quick Question
ERP provides several business benefits, here we enumerate the top three:
In short ERP systems are adept at capturing, storing and moving the data
across the various units smoothly.
• Detailed explanation
• when developing their strategy, companies often focus on the most visible aspects
• Involvement of Primary functions and supportive functions
• Support functions interpret business strategy in their daily activities
• No uncoordinated entities, but rather a case of a filter existing between them.
• A filter may exist because it is primarily the individual processes’ owners, on an ad hoc
basis, and not the strategy that defines which information is to be generated by the BA
function.
• BA function prioritizes its tasks according to what best serves the daily target achievement of the
company
• the BA function tasks are performed based on the driving force of different users requesting
information
• Reporting - Development of more or less authorized reports with inconsistent presentations of the
business that they are describing.
• Quality of BA will typically be an assessment of how quickly a question is answered and how
well founded the answer is.
• Other reasons-right conditions simply do not exist.
SCENARIO 2-BA SUPPORTS STRATEGY AT A
FUNCTIONAL LEVEL - DETAILED VIEW
• Adapted information strategy
• BA function is a reactive element, solely employed in connection with the monitoring of whether the defined targets of
the strategy are achieved. (diagram given in next slide)
• Recipients are individual departments, no feedback to the strategic level provided by the BA function.
• The BA function supports company performance reporting and processes proactively, but only reactively in terms of how
it supports the strategy creation processes.
• Formalized dialogue between individual functions and BA, but the relation to the strategy function is formalized as a
monologue, from strategy to BA function.
• In terms of the quality of BA in such an organization, it’s important to be good at defining targets based on strategy.
• The BA function is technically competent when it comes to operationalizing these targets via reports and making those
reports both accessible to users and full of the most updated information possible.
• Based on a strategy development process, individual departments define a number of specific
requirements, or targets, they are to achieve.
• It will then be up to the individual functions—with various degrees of autonomy—to decide how
they are going to achieve the given targets.
TARGET REQUIREMENTS-SMART APPROACH
• Specific
• Measurable
• Agreed
• Realistic
• Time-bound
Business Analytics - Types
Unit 2
Descriptive Analytics
• Meaning
• Descriptive analytics is the analysis of historical data using two key methods –
data aggregation and data mining - which are used to uncover trends and
patterns.
• Descriptive analytics is the process of using current and historical data to
identify trends and relationships. It's sometimes called the simplest form of
data analysis because it describes trends and relationships but doesn't dig
deeper.
Descriptive Analytics
• Meaning
• Descriptive analytics is a commonly used form of data analysis whereby
historical data is collected, organised and then presented in a way that is
easily understood. Descriptive analytics is focused only on what has already
happened in a business and, unlike other methods of analysis, it is not used to
draw inferences or predictions from its findings.
Descriptive Analytics
• How it works?
• Descriptive analytics uses two key methods, data aggregation and data
mining (also known as data discovery), to discover historical data.
• Data aggregation is the process of collecting and organising data to
create manageable data sets. These data sets are then used in the data mining
phase where patterns, trends and meaning are identified and then presented in
an understandable way.
Descriptive Analytics
• How it works? - Conti….
• Descriptive analytics process – steps
• Business metrics are decided.
• First, metrics are created that will effectively evaluate performance against
business goals, such as improving operational efficiency or increasing revenue.
• The success of descriptive analytics heavily relies on KPI (key performance
indicator) governance.
• ‘Without governance,’ ‘there may not be consensus regarding what the data means,
thus guaranteeing analytics a marginal role in decision making.’
Descriptive Analytics
•How it works? - Conti….
prepare the correct data sources to extract the needed data and calculate
– takes place before the analysis stage and is a critical step to ensure accuracy;
• Finally, charts and graphs are used to present findings in a way that non-
Unit 2
Descriptive Analytics
• Examples of Descriptive Analytics
• 1. Traffic and Engagement Reports
•One example of descriptive analytics is reporting.
•If one organization tracks engagement in the form of social media analytics or web
traffic, then they are already using descriptive analytics.
•These reports are created by taking raw data—generated when users interact with
your website, advertisements, or social media content—and using it to compare
current metrics to historical metrics and visualize trends.
Descriptive Analytics
• Examples of Descriptive Analytics
• 1. Traffic and Engagement Reports – Conti…
•For example, Mr. Paul is responsible for reporting on which media channels drive the
most traffic to the product page of his company’s website.
•Using descriptive analytics, he can analyze the page’s traffic data to determine the
number of users from each source.
• He may decide to take it one step further and compare traffic source data to
historical data from the same sources.
• This can enable him to update his team on movement; for instance, highlighting
that traffic from paid advertisements increased 20 percent year over year.
Descriptive Analytics
•Examples of Descriptive Analytics
•2. Financial Statement Analysis
•Financial statements are periodic reports that detail financial information about a
business and, together, give a holistic view of a company’s financial health.
•There are several types of financial statements, including the balance sheet, income
statement, cash flow statement, and statement of shareholders’ equity. Each caters to
a specific audience and conveys different information about a company’s finances.
•Financial statement analysis can be done in three primary ways: vertical,
horizontal, and ratio.
Descriptive Analytics
•Examples of Descriptive Analytics
•2. Financial Statement Analysis – Conti…
•Vertical analysis involves reading a statement from top to bottom and
comparing each item to those above and below it. This helps determine
relationships between variables. For instance, if each line item is a percentage
of the total, comparing them can provide insight into which are taking up
larger and smaller percentages of the whole.
Descriptive Analytics
•Examples of Descriptive Analytics
•2. Financial Statement Analysis – Conti…
•Horizontal analysis involves reading a statement from left to right and
comparing each item to itself from a previous period. This type of analysis
determines change over time.
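Vertical and horizontal analysis both reduce to simple percentage calculations, which the following sketch illustrates. The expense figures and the two function names are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def vertical_analysis(statement):
    """Express each line item as a percentage of the statement total."""
    total = sum(statement.values())
    return {item: round(100 * value / total, 1)
            for item, value in statement.items()}

def horizontal_analysis(previous, current):
    """Percentage change of each line item versus the previous period."""
    return {item: round(100 * (current[item] - previous[item]) / previous[item], 1)
            for item in current}

# Hypothetical expense lines for two consecutive years.
expenses_prev = {"salaries": 500, "rent": 300, "marketing": 200}
expenses_curr = {"salaries": 550, "rent": 300, "marketing": 240}

shares = vertical_analysis(expenses_prev)
changes = horizontal_analysis(expenses_prev, expenses_curr)
print(shares)
print(changes)
```

The same horizontal calculation underlies statements such as "traffic from paid advertisements increased 20 percent year over year" in the earlier reporting example.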
Descriptive Analytics
•Examples of Descriptive Analytics
Unit 2
Diagnostic Analytics
• Meaning
• Diagnostic analytics is a form of advanced analytics that examines data
or content to answer the question, “Why did it happen?”
• It is characterized by techniques such as drill-down, data discovery, data
mining and correlations.
•Diagnostic analytics is the process of using data to determine the causes of
trends and correlations between variables.
•It can be viewed as a logical next step after using descriptive analytics to
identify trends.
Diagnostic Analytics
• Importance of Diagnostic Analytics
• Every company can benefit from gaining a better understanding of its business
performance in order to replicate its success and fix any problems.
• Diagnostic analytics helps companies better understand the internal and external
factors that affect their outcomes.
• It paints a more comprehensive picture of each situation, which helps businesses
make better decisions. For example, if the company determines that a specific online
marketing campaign is responsible for higher sales of a key product, it can direct
more resources to that campaign and create similar campaigns for other products.
Diagnostic Analytics
• How Does Diagnostic Analytics Work?
•Diagnostic analytics uses a variety of techniques to provide insights into the
causes of trends. These include:
•Data drilling: Drilling down into a dataset can reveal more detailed
information about which aspects of the data are driving the observed trends.
For example, analysts may drill down into national sales data to determine
whether specific regions, customers or retail channels are responsible for
increased sales growth.
Diagnostic Analytics
• How Does Diagnostic Analytics Work? – Conti…
•Data mining hunts through large volumes of data to find patterns and
associations within the data. For example, data mining might reveal the most
common factors associated with a rise in insurance claims. Data mining can
be conducted manually or automatically with machine learning technology.
Diagnostic Analytics
• How Does Diagnostic Analytics Work? – Conti…
•Correlation analysis examines how strongly different variables are linked to
each other. For example, sales of ice cream and refrigerated soda may soar on
hot days.
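How strongly two variables are linked is commonly summarized with the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below computes it by hand for the ice cream example; the temperature and sales figures are invented for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical daily observations.
temperature_c = [20, 25, 30, 35]
ice_cream_sales = [110, 140, 210, 240]

r = pearson(temperature_c, ice_cream_sales)
print(round(r, 3))
```

A value close to 1 says the two series rise together; it does not by itself say that one causes the other.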
Diagnostic Analytics
•Three Diagnostic Analytics Categories
•The diagnostic analytics process of determining the root cause of a problem
or trend typically comprises three primary stages.
•Identify anomalies: Trends or anomalies highlighted by descriptive analysis
may require diagnostic analytics if the cause isn’t immediately obvious. In
addition, it can sometimes be difficult to determine whether the results of
descriptive analysis really show a new trend, especially if there’s a lot of
natural variability in the data. In those cases, statistical analysis can help to
determine whether the results actually represent a departure from the norm.
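One simple statistical check for "a departure from the norm" is a z-score: how many standard deviations a new value lies from the historical mean. This is only one possible approach, sketched here with invented weekly sales figures and a conventional (but arbitrary) threshold of 3.

```python
from statistics import mean, stdev

def is_anomaly(history, new_value, threshold=3.0):
    """Return (flag, z) where z is the new value's distance from the
    historical mean in sample standard deviations."""
    mu = mean(history)
    sigma = stdev(history)
    z = (new_value - mu) / sigma
    return abs(z) > threshold, z

# Hypothetical weekly sales with natural variability.
weekly_sales = [100, 104, 98, 101, 97, 103, 99, 102]

normal_flag, normal_z = is_anomaly(weekly_sales, 103)
spike_flag, spike_z = is_anomaly(weekly_sales, 120)
print(normal_flag, round(normal_z, 2))
print(spike_flag, round(spike_z, 2))
```

A value of 103 stays within the usual week-to-week noise, while 120 stands far enough outside it to justify a diagnostic investigation.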
Diagnostic Analytics
•Three Diagnostic Analytics Categories – Conti…
•Discovery: The next step is to look for data that explains the anomalies: data
discovery.
• That may involve gathering external data as well as drilling into internal
data. For example, searching external data might reveal changes in supply
chains, new regulatory requirements, a shifting competitive landscape or
weather patterns that are associated with the anomalous data.
Diagnostic Analytics
• Three Diagnostic Analytics Categories – Conti…
• Causal relationships: Further investigation can provide insights into
whether the associations in the data point to the true cause of the anomaly.
• The fact that two events correlate doesn’t necessarily mean one causes the
other.
• Deeper examination of the data associated with the sales increase can
indicate which factor or factors were the most likely cause.
Business Analytics - Types
Unit 2
Descriptive Analytics - Advantages
• Presents otherwise complex data in an easily digestible format.
• Provides a direct measure of the incidence of key data points.
• Is inexpensive and only requires basic mathematical skills to carry out.
• Is faster to carry out, especially with help from tools like Python or MS
Excel.
• Relies on data that organizations already have access to, meaning there’s no
need to source additional data.
• Looks at a complete population (rather than data sampling), making it
considerably more accurate than inferential statistics.
Descriptive Analytics - Disadvantages
• We can summarize data sets we have access to, but these may not tell a
complete story.
• We cannot use descriptive analytics to test a hypothesis or understand why
data present the way they do.
• We cannot use descriptive analytics to predict what may happen in the
future.
• We cannot generalize our findings to a broader population.
• Descriptive analytics tells us nothing about the data collection methodology,
meaning the data set may include errors.
Diagnostic Analytics - Advantages
• Understanding the causes of business outcomes is critical to a company’s
ability to grow and learn from mistakes.
• Diagnostic analytics lets companies zero in on the factors that drive success
or cause failure, including contributing factors that may not be obvious at first
glance.
• Diagnostic analytics can help to instill a data-driven analytical culture
throughout the business.
Diagnostic Analytics – Advantages – Conti..
• When business leaders understand that the company has the tools to
investigate the cause of problems, they’re more likely to use diagnostic
analytics in their decision-making.
• For example, if a problem in on-time deliveries is identified and
further supply chain analysis reveals disruptions and unpredictable lead times,
managers may choose to increase the inventory on hand to meet customer
demand.
Diagnostic Analytics - Disadvantages
• A drawback of diagnostic analytics is that it focuses on historical data; it can
only help businesses understand why events happened in the past.
• In addition, further investigation may be needed to determine whether the
correlations revealed by diagnostic analytics really show cause and effect.
• To look into the future, businesses need to use other analytic techniques,
such as predictive analytics, which examines the potential future impact of
trends and events, and prescriptive analytics, which suggests actions
businesses can take to influence the outcome of those future trends.
Diagnostic Analytics - Comparison
UNIT 2
• Weather forecasts
• Creating video games
• Translating voice to text for mobile phone messaging
• Customer service
• Investment portfolio development
Uses of Predictive Analytics
• Forecasting
• Credit
• Underwriting
• Marketing
Predictive Analytics vs. Machine Learning
A common misconception is that predictive analytics and machine learning
are the same things.
• Predictive Analytics help us understand possible future occurrences by
analyzing the past. At its core, predictive analytics includes a series of
statistical techniques (including machine learning, predictive modeling,
and data mining) and uses statistics (both historical and current) to
estimate, or predict, future outcomes.
Predictive Analytics vs. Machine Learning
• Machine Learning, on the other hand, is a subfield of computer science
that, as per the 1959 definition by Arthur Samuel (an American pioneer
in the field of computer gaming and artificial intelligence) means "the
programming of a digital computer to behave in a way which, if done by
human beings or animals, would be described as involving the process of
learning."
Types of Predictive Analytical Models
1. Decision Trees
2. Regression
3. Neural Networks
2. Regression
This is the model that is used the most in statistical analysis. Use it
when you want to determine patterns in large sets of data and when
there's a linear relationship between the inputs. This method works by
figuring out a formula, which represents the relationship between all the
inputs found in the dataset. For example, you can use regression to
figure out how price and other key factors can shape the performance of
a security.
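"Figuring out a formula" for a linear relationship can be illustrated with ordinary least squares on a single input. The sketch below fits a straight line by hand; the function name fit_line and the data points are hypothetical, and a real analysis of a security would involve several factors and far more observations.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations: one input factor and the outcome it explains.
factor = [1, 2, 3, 4]
outcome = [9.0, 7.0, 5.0, 3.0]

slope, intercept = fit_line(factor, outcome)
predicted = slope * 5 + intercept   # estimate for an unseen factor value
print(slope, intercept, predicted)
```

Once the slope and intercept are estimated from historical data, the formula can be applied to new input values, which is what makes regression a predictive model.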
3. Neural Networks
• The use of predictive analytics has been criticized and, in some cases,
legally restricted due to perceived inequities in its outcomes. Most
commonly, this involves predictive models that result in statistical
discrimination against racial or ethnic groups in areas such as credit
scoring, home lending, employment, or risk of criminal behavior.
• A famous example of this is the (now illegal) practice of redlining in
home lending by banks. Regardless of whether the predictions drawn
from the use of such analytics are accurate, their use is generally
frowned upon, and data that explicitly include information such as a
person's race are now often excluded from predictive analytics.
IV. Prescriptive Analytics
1. Information created and stored in a computer mediated environment that can potentially be transmitted as
discrete information signals over the internet, and may be subsequently processed and/or stored for a range of
known and unforeseen purposes. Learn more in: Ethical Benefits and Drawbacks of Digitally Informed Consent
2. Digital data is information held in a binary format, that is, converted into a machine-readable
digital form. Learn more in: Threats and Security Issues in Smart City Devices
3. Any data that is produced, collected, stored, and transmitted as a natural result of digitization. Learn more in: An
Evaluation on Turkey Within the Scope of Digital Transformation
4. The information generated by any digital device is called digital data. Learn more in: A New Framework for
Politics, Law, and Government in the Digital Era: A Judge-Based Political System
5. Digital data is any information created by a digital device. Learn more in: Right to Correct Information in the
Cyber World
What Does Digital Data Mean?
• Digital data is data that represents other forms of data using specific
machine language systems that can be interpreted by various technologies.
The most fundamental of these systems is a binary system, which simply
stores complex audio, video or text information in a series of binary
characters, traditionally ones and zeros, or "on" and "off" values.
• One of the biggest strengths of digital data is that all sorts of very complex
analog input can be represented with the binary system. Along with
smaller microprocessors and larger data storage centers, this model of
information capture has helped parties like businesses and government
agencies to explore new frontiers of data collection and to represent more
impressive simulations through a digital interface.
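To make the binary idea concrete, here is a small Python sketch that turns text into ones and zeros and back again. It assumes UTF-8 as the text encoding; the function names to_bits and from_bits are made up for this illustration.

```python
def to_bits(text):
    """Encode text as UTF-8 and show each byte as eight binary digits."""
    return " ".join(format(byte, "08b") for byte in text.encode("utf-8"))


def from_bits(bits):
    """Reverse to_bits: turn space-separated 8-bit groups back into text."""
    return bytes(int(group, 2) for group in bits.split()).decode("utf-8")


bits = to_bits("Hi")
print(bits)             # 01001000 01101001
print(from_bits(bits))  # Hi
```

Audio and video are stored on the same principle, only with far more bytes and a format that describes how to interpret them.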
Digital data sources
Digital data comes from many different sources.
You can look at external sources, where the data is usually aggregated and
talks about broad customer segments. The insights are usually about broad
and general online behaviours.
Your internal sources of data are usually more specific. You can often drill
down to an individual customer level, and specific interactions you have with
them. You can aggregate these up to create broad insights and use these to
create more personalised customer experiences.
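The step of aggregating individual records up into broad insights can be sketched as follows. The records, field names and the function average_spend_by_segment are hypothetical; a real internal source would have its own schema.

```python
from collections import defaultdict

# Hypothetical customer-level records from an internal source.
orders = [
    {"customer": "c1", "segment": "new", "spend": 40.0},
    {"customer": "c2", "segment": "returning", "spend": 90.0},
    {"customer": "c3", "segment": "new", "spend": 60.0},
    {"customer": "c4", "segment": "returning", "spend": 110.0},
]


def average_spend_by_segment(records):
    """Roll individual orders up into an average spend per segment."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for record in records:
        totals[record["segment"]] += record["spend"]
        counts[record["segment"]] += 1
    return {segment: totals[segment] / counts[segment] for segment in totals}


summary = average_spend_by_segment(orders)
print(summary)
```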
Digital data from external sources
This guide will focus on the free sources you can use to capture digital
data. Note, however, that there are also paid sources such as Similarweb that you can
use if the free sources don’t meet your needs.
Most of the big online players like Google, Facebook and Amazon make some
of their digital data available for free. Sharing data encourages you to use their
platform more often, especially for online advertising.
In general Google is the most generous and helpful when it comes to sharing
digital data. They’re a good place to start as you try to look for online trends
and business opportunities.
Google Trends
• Google Ads Keyword Research tool is another very useful digital data
source from Google. It lets you dig deeper into specific paid search
volumes on keywords than you can with Google Trends.
• You use these insights to drive your SEO and paid search in
your digital media plan.
Facebook Audience Insights
• Google’s not the only source of external digital data though. Facebook
gives you access to its data when you book advertising with it.
• You have to set up a Business Account to access their Audience
Insights tool. This gives you access to lots of digital data about what is
happening on Facebook.
• You can filter by country, age, gender and interests to learn more about
your potential target audience.
• So in this example, we’ve taken the total population of Facebook users in
Australia.
• And we can see which brands in different types of categories are the most
popular by likes.
Amazon Best Sellers
• SQL Database
• Spreadsheets
• Sensors
• Medical Devices
• Online Forms
• Point of Sales System
• Web and Server Logs
What is Unstructured Data?
Unstructured data is data that does not conform to a data model
and has no easily identifiable structure, so a computer program cannot
use it easily. It is not organised in a predefined manner and has no
predefined data model, so it is not a good fit for a mainstream
relational database.
Characteristics of Unstructured Data:
• Data neither conforms to a data model nor has any structure.
• Data cannot be stored in the form of rows and columns as in
databases
• Data does not follow any semantics or rules
• Data lacks any particular format or sequence
• Data has no easily identifiable structure
• Due to the lack of identifiable structure, it cannot be used by computer
programs easily
Sources of Unstructured Data:
• Web pages
• Images (JPEG, GIF, PNG, etc.)
• Videos
• Memos
• Reports
• Word documents and PowerPoint presentations
• Surveys
Advantages of Unstructured Data:
• It supports data that lacks a proper format or sequence
• The data is not constrained by a fixed schema
• Very Flexible due to absence of schema.
• Data is portable
• It is very scalable
• It can deal easily with the heterogeneity of sources.
• This type of data has a variety of business intelligence and
analytics applications.
Disadvantages of Unstructured Data:
• Email
• HTML
• Online images and videos
• Electronic data interchange
Benefits of semi-structured data
• Health care
• Banking
• Retail
Key Characteristics of Data Warehouse
• Subject-Oriented
• Integrated
• Non-Volatile
• Time-Variant
Database vs. Data Warehouse
In simple terms, data quality tells us how reliable a particular set of data
is and whether or not it will be good enough for a user to employ in
decision-making. This quality is often measured by degrees.
Data Quality Dimensions
• Accuracy
• Completeness
• Consistency
• Timeliness
• Uniqueness
• Validity
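Measuring quality "by degrees" can be illustrated with a short Python sketch that scores three of the dimensions above (completeness, uniqueness and validity) as fractions between 0 and 1. The records, the simplified email pattern and the function names are assumptions made for this example, not a standard scoring method.

```python
import re

# Hypothetical customer records with deliberate quality problems.
records = [
    {"id": 1, "email": "ana@example.com"},
    {"id": 2, "email": ""},                 # missing value
    {"id": 3, "email": "not-an-email"},     # invalid format
    {"id": 3, "email": "raj@example.com"},  # duplicate id
]

# Deliberately simple pattern: something@something.something
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def completeness(rows, field):
    """Share of rows where the field is filled in."""
    return sum(1 for row in rows if row[field]) / len(rows)


def uniqueness(rows, field):
    """Share of distinct values among all values of the field."""
    values = [row[field] for row in rows]
    return len(set(values)) / len(values)


def validity(rows, field, pattern):
    """Share of filled-in values that match the expected format."""
    filled = [row[field] for row in rows if row[field]]
    return sum(1 for value in filled if pattern.match(value)) / len(filled)


print(completeness(records, "email"))
print(uniqueness(records, "id"))
print(validity(records, "email", EMAIL_PATTERN))
```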
How Do You Improve Data Quality?
• Data Governance
• Data Profiling
• Data Matching
• Data Quality Reporting
• Master Data Management (MDM)
• Customer Data Integration (CDI)
• Product Information Management (PIM)
• Digital Asset Management (DAM)
What are the characteristics of data quality?
• Accuracy
• Availability
• Completeness
• Relevance
• Reliability
• Timeliness
ETL Process in Data Warehouse
• Scalability
• Simplicity
• Out-of-the-box
• Compliance
• Long-term costs
Introduction to Multidimensional Data Model
External data helps you build a picture of the category and your target
audience. But, it can only take you so far. You can only access what
those sources give you. And the data is shared. Anyone can access it.
If you want data that’s unique to your brand and customers, you need to
look at internal sources. You set this up, so you’ve a lot more control
over what data you capture.
Google Analytics
• One of the first jobs you do when setting up a website is to attach Google
Analytics to it.
• Google Analytics is an online tool that measures and tracks what people do when
they visit websites. It’s a vital tool to understand what works and what doesn’t on
your website.
• This understanding helps you set and track key digital objectives and measures.
• Google Analytics is an aggregated data source. It doesn’t identify individual
visitors. But it does show what visitors as a whole do on your site. So, you’re able
to look at common patterns of behaviours.
• For example, Google Analytics can tell you how many visitors come to your site.
But not just that, it can tell you what time of day they came, down to the last
minute. It can identify which cities or countries they were in and what type of
device and browser they used.
Acquisition
• Acquisition tells you how customers found their way to your website. Did they
click through from your social or SEO activities for example? Or was it a link from
another site?
• You initiate this by placing “tags” into your digital media campaigns.
• These are small pieces of code, which send a notification to Google Analytics
when a specific action happens. For example, someone clicking on the advert.
• You use this data to evaluate the impact of your advertising and your media
choices. You understand what works and what doesn’t based on how customers
interact (or don’t).
• In simple terms, the data tells you to do more of what works. And less of what
doesn’t.
• You should check your acquisition data regularly. In some cases, that could be
daily. In others, monthly can be OK. But the insights it gives you help you optimise
your digital media. You get more bang for your buck when you use this data.
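One common way to see which campaign a visit came from is to add UTM query parameters (such as utm_source) to the links used in adverts. The sketch below assumes that approach and simply counts visits per source from a list of landing URLs; the URLs and the function name source_of are made up, and this is not how Google Analytics itself processes tags internally.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

# Hypothetical landing URLs recorded for four visits.
landing_urls = [
    "https://example.com/?utm_source=facebook&utm_medium=paid_social",
    "https://example.com/?utm_source=google&utm_medium=cpc",
    "https://example.com/?utm_source=facebook&utm_medium=paid_social",
    "https://example.com/",
]


def source_of(url):
    """Return the utm_source value of a URL, or a label if it has none."""
    params = parse_qs(urlparse(url).query)
    return params.get("utm_source", ["(untagged)"])[0]


visits_by_source = Counter(source_of(url) for url in landing_urls)
print(visits_by_source.most_common())
```

Comparing these counts against what each channel cost is one simple way to do more of what works and less of what doesn't.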
Behaviour
• Bounce rate is the percentage of people who land on a page but then have
no interaction with the page. They ‘bounce’ off the page and off the site.
• You generally want customers to interact with the page. For example, you
use calls to action like click a link, view a video, download a file and so
on. If lots of customers don’t interact with a page (i.e. it has a high bounce
rate), it suggests something’s wrong in the set-up of the page.
• Bounce rates vary from site to site and from category to category. But as a
rule of thumb, a bounce rate over 60% is usually cause for concern.
Anything under 30% is usually good. And anywhere between 30% and 60% is
the ‘norm’. You can make it better, but it’s not a disaster.
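The bounce rate calculation and the rule of thumb above can be sketched in a few lines of Python. The session data and the function names are hypothetical; the thresholds are the 30% and 60% figures quoted above.

```python
def bounce_rate(sessions):
    """Percentage of sessions with no interaction on the landing page."""
    bounces = sum(1 for session in sessions if session["interactions"] == 0)
    return 100.0 * bounces / len(sessions)


def classify(rate):
    """Apply the rule of thumb: over 60% is a concern, under 30% is good."""
    if rate > 60:
        return "cause for concern"
    if rate < 30:
        return "good"
    return "norm"


# Hypothetical data: 10 sessions, 4 of which had no interaction.
sessions = [{"interactions": n} for n in [0, 2, 0, 1, 5, 0, 3, 0, 1, 2]]

rate = bounce_rate(sessions)
print(rate, classify(rate))  # 40.0 norm
```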
Pages / session
• The final section in Google Analytics is Conversion. Here you can set
up specific events or goals to track. These are specific actions
customers do on your website, which you want to measure.
• These can be as simple as recording a sale on your e-commerce site.
But they could also be more complex. For example, people who
viewed a page or added the product to their cart, but then didn’t buy.
• They don’t always have to be sales conversions. They could be any call
to action. Reading a specific page. Spending a set amount of time on
the site. Downloading a specific tool. Registering for email updates.
These would all be examples of conversion goals.
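As a rough sketch of conversion goals, the Python below computes the share of sessions that reached a goal and lists sessions that added a product to the cart but did not buy. The event names, session records and function names are assumptions for illustration and do not reflect Google Analytics' own data format.

```python
# Hypothetical sessions, each with the set of events recorded in it.
sessions = [
    {"id": "s1", "events": {"view_product", "add_to_cart", "purchase"}},
    {"id": "s2", "events": {"view_product", "add_to_cart"}},
    {"id": "s3", "events": {"view_product"}},
    {"id": "s4", "events": {"newsletter_signup"}},
]


def goal_rate(rows, goal):
    """Share of sessions in which the goal event happened at least once."""
    return sum(1 for row in rows if goal in row["events"]) / len(rows)


def abandoned_carts(rows):
    """Sessions that added to the cart but never completed a purchase."""
    return [
        row["id"]
        for row in rows
        if "add_to_cart" in row["events"] and "purchase" not in row["events"]
    ]


print(goal_rate(sessions, "purchase"))           # 0.25
print(goal_rate(sessions, "newsletter_signup"))  # 0.25
print(abandoned_carts(sessions))                 # ['s2']
```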
One-to-one internal data
• Something like an email newsletter, for example, might only require the
capture of an email address. But you may also want to
capture when the registration happened, so that you can
tie it back to a campaign that tried to drive registrations.
• And while an email address is helpful, if you can start to add more
digital data to that email address, you can start to pull out much
richer insights.
CRM and e-Commerce
• For example, where you have your own e-Commerce store and details of a
consumer who buys from you, you will by nature of the transaction capture a
higher level of detail about the customer. Their name, their shipping and billing
address and what and when they shopped with you.
• As we cover in our set-up your own store guide, you’d most likely use a third
party payment gateway to manage their credit card details. But nonetheless,
there’ll still be a large amount of digital data you’ll have about the online
shopper.
• This CRM and e-Commerce data is extremely valuable for building up your level of
insight about customers, and helps you build more loyalty with them.
• It helps you offer them better, more personalised and relevant experiences. But
it also comes with challenges from a legal and IT point of view, in terms of how
you set up and manage this data.
Unit 4
• Meaning
• Financial analysis refers to an assessment of the viability, stability, and
profitability of a business, sub-business or project.
• It is performed by professionals who prepare reports using ratios and other
techniques, that make use of information taken from financial statements and
other reports.
Financial Analytics
• Meaning
• Financial analysis refers to the process of evaluating businesses, projects,
budgets and other finance-related entities to determine the stability, solvency,
liquidity or profitability of an organization.
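Three widely used ratios give a feel for how liquidity, solvency and profitability are assessed from financial statements. The sketch below uses made-up figures, and the choice of these three ratios is an assumption for illustration; analysts typically look at many more.

```python
def current_ratio(current_assets, current_liabilities):
    """Liquidity: ability to cover short-term obligations."""
    return current_assets / current_liabilities


def debt_to_equity(total_liabilities, shareholders_equity):
    """Solvency: how much of the business is financed by debt."""
    return total_liabilities / shareholders_equity


def net_profit_margin(net_income, revenue):
    """Profitability: profit earned per unit of revenue."""
    return net_income / revenue


# Hypothetical figures taken from a balance sheet and income statement.
print(current_ratio(500.0, 250.0))       # 2.0
print(debt_to_equity(600.0, 400.0))      # 1.5
print(net_profit_margin(120.0, 1000.0))  # 0.12
```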
Financial Analytics - Importance
• Financial analytics can help companies determine the risks they face, how to
enhance and extend the business processes that make them run more
effectively, and whether organizations' investments are focused on the right
areas.
• Advanced analytics and its ability to leverage big data will enable
organizations to rethink their strategies for solving problems and supporting
business decisions.
Financial Analytics – Importance (contd.)
• Analytics can also help companies examine the profitability of products
across various sales channels and customers, which market segments will add
more profit to the business and what could have an impact on the business in
the future.
• Continuous visibility into financial and operational performance will help
with more than just decision-making; it will also increase visibility regarding
the processes that support those decisions.
Financial Analytics – Importance (contd.)
• Another plus is the potential for improved electronic linkage of records
across the supply chain so that data will only need to be entered once.
• Despite the promise of financial analytics, business experts from the
academic and corporate worlds warn against automating bad processes. They
note that the processes that provide financial insights based on historical data
are often disconnected and leave serious data gaps. Poor-quality data can hurt
business performance and lead to incomplete or inaccurate customer or
prospect data, ineffective marketing and communications efforts, increased
spending and bad decisions.
Business analytics involves sorting, collating, processing, and studying data through
iterative methodologies and statistical models to generate meaningful,
business-relevant insights. These insights help organisations solve business problems and
increase their revenue, efficiency, and productivity.