Business Analytics 705 v1 468
Developed by
Prof. Abasaheb Chavan
On behalf of
Prin. L.N. Welingkar Institute of Management Development & Research
Advisory Board
Chairman
Prof. Dr. V.S. Prasad
Former Director (NAAC)
Former Vice-Chancellor
(Dr. B.R. Ambedkar Open University)
Board Members
1. Prof. Dr. Uday Salunkhe, Group Director, Welingkar Institute of Management
2. Dr. B.P. Sabale, Chancellor, D.Y. Patil University, Navi Mumbai; Ex Vice-Chancellor (YCMOU)
3. Prof. Dr. Vijay Khole, Former Vice-Chancellor (Mumbai University)
4. Prof. Anuradha Deshmukh, Former Director (YCMOU)
ALL RIGHTS RESERVED. No part of this work covered by the copyright hereon may be reproduced or used in any form or by any means – graphic,
electronic or mechanical, including photocopying, recording, taping, web distribution or information storage and retrieval systems – without the written
permission of the publisher.
BIO SKETCH
CONTENTS
BUSINESS ANALYTICS: OVERVIEW
Chapter 1
BUSINESS ANALYTICS: OVERVIEW
Objectives:
Structure:
1.1 Introduction
1.13 Summary
1.1 INTRODUCTION
Analytics is the science of analyzing data to support decision making.
What, then, is business analytics? Business analytics is the process of
collating, sorting, processing, and studying business data, and using
statistical models and methodologies to transform that data into business
insights. The goal of business analytics is to determine which datasets are
useful and how they can be leveraged to solve problems and increase
efficiency, productivity, and revenue.
In other words, whatever the form of business analytics may be, it would
help us answer the following fundamental questions which are critical for
decision making:
1. What happened?
• What did the data tell us?
However, data in its raw form is usually of little use. The driving force
behind any data-driven organization is insights: conclusions drawn from data
that can suggest a new course of action. To reach these insights, an
organization must use business analytics tools and techniques to connect
data from multiple sources, analyze the data and communicate the results in
a way that decision makers can understand. Typically, commercial
organizations use business analytics in order to:
Prediction uses what you currently know about your business to discover
what will happen next. Imagine the value of knowing your customers' likely
future value and their receptiveness to future sales from the moment they
sign on. You could then place them in the appropriate channel from the
beginning and improve marketing "bang for the buck". Or imagine forecasting
KPIs based on all the data collected by your organization rather than just
"extrapolating the trend".
2. Raising the correct line of reasoning for what you read, learned,
and wrote in the past: Unless we verify the validity of our information
sources (e.g., books, published articles, digital media), we can be taken
in by logical fallacies and end up making wrong decisions. To avoid
such mistakes, we should be aware of common logical fallacies that rest
on faulty reasoning. For example, a premise based on inverse
reasoning, such as “If you do not reduce the product price, it will not
affect product value and thus will not hurt the potential sales of that
product,” can lead to no business action even when the product is not
selling well at its current price. Instead, we should have
developed the proper reasoning: “If you reduce the product price, it
will improve product value for potential customers and thus increase its
sales.” Other potential sources of logical fallacies include hasty
generalization from a one-time instance or a few anecdotal incidents,
unconditional belief in the opinions of high authority, and false
association (e.g., the belief that a turtle brings good luck to the
individual who owns it as a pet, although it is more likely to carry
dangerous salmonella bacteria than other pets).
• Data Aggregation
Data aggregation can be used for a wide range of purposes in the travel
industry. These include competitive price monitoring, competitor research,
gaining market intelligence, customer sentiment analysis, and capturing
images and descriptions for the services on their online travel sites.
Competition in the online travel industry is fierce, so data aggregation or
the lack thereof can make or break a travel company.
Travel companies need to keep up with the ever-changing travel costs and
property availability. They also need to know which destinations are
trending and which audiences they should target with their travel offers.
The data needed to gain these insights is spread across many places on the
internet, making it difficult to gather manually.
Data Mining
❖ Text Mining: Companies can also collect textual information from social
media sites, blog comments, and call center scripts to extract meaningful
relationship indicators. This data can be used to:
- Energy demands for a city with a static population in any given month
or quarter
- Retail sales for holiday merchandise, including biggest sales days for
both physical and digital stores
- Spikes in internet searches related to a specific recurring event, such
as the Super Bowl or the Olympics
❖ Optimization
- Peak sales pricing and using demand spikes to scale production and
maintain a steady revenue flow
- Inventory stocking and shipping options that optimize delivery
schedules and customer satisfaction without sacrificing warehouse
space
- Prime opportunity windows for sales, promotions, new products, and
spin-offs to maximize profits and pave the way for future opportunities
❖ Data Visualization
Information and insights drawn from data can be presented with highly
interactive graphics to show:
There are four types of business analytics, each increasingly complex and
closer to achieving real-time and future situation insight application. These
analytics types are usually implemented in stages, starting with the
simplest, though one type is not more important than another as all are
interrelated.
The following brief descriptions provide insight into the role of each type
in the analytics process. By leveraging these four types of analytics, big
data can be dissected, absorbed, and used to create solutions for many of
the biggest challenges facing businesses today.
1. Descriptive Analytics
The findings from descriptive analytics can quickly identify areas that
require improvement - whether that be improving learner engagement or
the effectiveness of course delivery.
2. Diagnostic Analytics
Diagnostic analytics shifts from the “what” of past and current events to
“how” and “why,” focusing on past performance to determine which factors
influence trends. This type of business analytics employs techniques such
as drill-down, data discovery, data mining, and correlations to uncover the
root causes of events.
An ideal example of diagnostic analytics is health care data analytics with
real-time alerting.
Another example is that of Asthmapolis, which has started to use inhalers
with GPS-enabled trackers in order to identify asthma trends both at the
individual level and across larger populations. This data is being used in
conjunction with data from the CDC in order to develop better treatment
plans for asthmatics.
3. Predictive Analytics
• Health: One early attempt at this was Google Flu Trends (GFT). By
monitoring millions of users’ health tracking behaviors online and
comparing them to a historic baseline level of influenza activity for a
corresponding region, Google hoped to predict flu patterns. However, its
numbers proved to be considerably overstated, owing to less than ideal
information from users. There are other uses, such as predicting
epidemics or public health issues based on the probability of a person
suffering the same ailment again, or predicting the chances that a person
with a known illness ends up in intensive care due to changes in
environmental conditions. Predictive analytics can also predict when and
why patients are readmitted and when a patient needs behavioral health
care as well.
Utilities can also predict when customers might get a high bill and send
out customer alerts to warn customers they are running up a large bill
that month. Smart meters allowed utilities to warn customers of spikes
at certain times of the day, helping them to know when to cut back on
power use.
Predictive analytics are needed to help sort what’s coming in to weed out
useless data and find what you need to take intelligent actions. In one
example, Cisco and Rockwell Automation helped a Japanese automation
equipment maker reduce down time of its manufacturing robots to near
zero by applying predictive analytics to operational data.
4. Prescriptive Analytics
Analytical models are mathematical models that have a closed-form
solution, i.e. the solution to the equations used to describe changes in
the system can be expressed as a mathematical analytic function.
• Classification Models:
❖ Trusting Models: Some models are more opaque than others, i.e. it is
hard to understand the logic and model used to identify relevant patterns
and relationships in the data. The problem with these “black box” models
is that business people often have a hard time trusting them until they
see quantitative results, such as reduced cost or higher revenue. Getting
business users to understand and trust the output of analytical models is
perhaps the biggest challenge in data mining.
d. Data pollution: Data entry errors, misused fields and bogus data.
A data warehouse with well-documented data can greatly accelerate the
data exploration phase because it also maintains much of this
information.
iii. Data Preparation: Once analytical modellers document and select the
data sets, they must standardise and enrich the data. This means
correcting any errors that exist in the data and standardising it in a
machine-readable format, followed by merging and flattening that data into
a single wide table, which may consist of hundreds of variables. After
this, analytical modellers transform the data
Business analytics and business intelligence tools are being integrated with
the ERP (Enterprise Resource Planning) system to facilitate better, more
accurate and quicker decision making. Companies have realised that to
maximise the value of the information stored in their ERP systems, it is
necessary to extend the ERP architecture to include more advanced
reporting, analytical and decision support capabilities. This is best
accomplished through the application of data warehousing, data mining and
other analysis, reporting and business intelligence tools and techniques,
including:
The intersection between data science and consulting is growing due to two
major large-scale transformations happening in today’s enterprise
information environment.
With the increase in the volume, velocity and variety of data, more tools
are available to deal with the “Big Data” problem of capturing and
analysing large data sets to create value. Larger firms, specifically, are
now faced with the problem of integrating these new tools into their
already complex information architecture environments. Traditional
relational databases, such as Oracle databases, are no longer enough to
keep up with the new types of data that companies are looking to analyse,
such as Twitter feeds or call centre recordings. Data storage capabilities
in the form of the Hadoop Distributed File System (HDFS) are becoming a
staple of companies’ information architecture platforms. Other competitors,
like Teradata, are also becoming essential in firms’ data toolboxes.
On top of data storage tools, new analytics tools are being deployed to draw
analytic insight, with capabilities to extract and source data from these
new data storage environments. Some tools, such as Python and R, focus on
performing advanced analytics techniques, while other tools that provide
quick and easy data manipulation in a more intuitive way are also gaining
traction. Tools like Tableau and Spotfire give business analysts ways to
gain visual insights through a wide variety of graphs and infographics in
the form of visual dashboards. They can source data from Big Data storage
environments, traditional data warehouses or even simple text and Excel
files, and are introducing more integration with languages such as R to
increase analytic capabilities. According to Tableau, “A 2013 report by
Aberdeen Group found that in organisations that used visual discovery
tools, 48 per cent of BI users can find the information they need without
the help of IT staff.”
Without visual discovery, the rate drops to a mere 23 per cent. These are
incredibly telling statistics because, while data science is needed in the
market, there are still many business analysts who do not necessarily have
the development skills to analyse the data straight from the source.
to see a fluid relationship between different variables as they see fit. They
simply point and click on the visualisation they wish to use. Data scientists
can use analytic techniques to present the data in the format needed by a
tool such as Tableau, which can then render the output in a clean visual
format for the end user. This becomes part of an interwoven information
environment where the downstream deliverables are greatly affected by the
data science work stream.
In many cases, companies invest in the tools above to keep up with the
pace of data growth and to invest in the future of their data capacity and
analysis. These tools are new, and subject matter experts are needed not
only to develop relevant apps with these tools but also to build out
strategic plans on how to deploy the tools across the information
environment. Consultants are brought in for both types of information
architecture projects for their knowledge of Big Data architecture and
their development capabilities.
Consultant data scientists are hired not for their knowledge of writing the
code for these tools, but for their understanding of the process flow of the
system, its use cases, and plans for deployment. The data scientists doing
the grunt work (coders) are also usually the best people to help create
timelines for project plans and strategic proposals on why and how a firm
would use big data tools to get the most value out of them.
• Customer analytics
• Marketing analytics
• Web analytics
• Text and speech to text analytics
• Pricing and sales analytics
• Workforce (Human resources) analytics
Because of the way the data used for these types of analytics is
aggregated and stored, nuanced ways to track, measure and draw value
from the data are needed. As a result, they are being introduced rapidly
into today’s market.
Consultants are in an ever-evolving business where new tools and projects
come to the forefront. Data science is no longer just a role for internal
analysts and coders; companies are investing more time and resources in
bringing these skill sets into their consulting practices to help their
clients.
One would be hard pressed to find a real-world, large-scale consulting
project that does not require some analytics work to propose tangible
solutions based on factual data insight. While analytics is indeed a very
broad term as it is currently used in the consulting market, with the way
the consulting market is evolving, there will be much larger demand for the
data science variety of analytics and a smaller focus on solely strategic
projects. New products will come to the market to meet clients’ particular
needs ever more exactly, and data science will be interwoven tightly with
consulting. Consulting firms will compete against each other to capture the
influx of analytics projects that are going to be in the market and to hire
the data science talent that can fulfil this demand. The tools and
techniques for consultant data scientists are ever growing, and today is
the best time to get into the data science market as a consultant and grow
with it.
The systems can anticipate when and where more than 3,000 different oil
drilling machine parts might fail, keep Shell informed about the location of
parts at their worldwide facilities, and plan when to make purchases of
machine parts. These systems also determine where to place inventory
items and how long to keep parts before putting them into rotation or
replacing/returning them. Shell has since reduced inventory analysis from
over 48 hours to less than 45 minutes, saving millions of dollars each year
thanks to reduced costs of moving and reallocating inventory.
accuracy rate. The company estimates that repeat orders increased its
revenue by $50,000 per year, and customer churn reduction equaled
retained revenues of $60,000 per year.
1.13 SUMMARY
The word analytics has come into the foreground in the last decade or so.
The proliferation of the internet and information technology has made
analytics very relevant in the current age. Analytics is a field which
combines data, information technology, statistical analysis, quantitative
methods and computer-based models into one. All of these are combined to
provide decision makers with all the possible scenarios so that they can
make a well-thought-out and well-researched decision. The computer-based
model ensures that decision makers are able to see the performance of a
decision under various scenarios.
Business analytics uses data from three sources for construction of the
business model. It uses business data such as annual reports, financial
ratios, marketing research, etc. It uses the database which contains
various computer files and information coming from data analysis.
There are four types of business analytics, each increasingly complex and
closer to achieving real-time and future situation insight application. These
analytics types are usually implemented in stages, starting with the
simplest, though one type is not more important than another as all are
interrelated.
With the increase in the volume, velocity and variety of data, more tools
are available to deal with the “Big Data” problem of capturing and
analysing large data sets to create value. Larger firms, specifically, are
now faced with the problem of integrating these new tools into their
already complex information architecture environments. Traditional
relational databases, such as Oracle databases, are no longer enough to
keep up with the new types of data that companies are looking to analyse,
such as Twitter feeds or call centre recordings. Data storage capabilities
in the form of the Hadoop Distributed File System (HDFS) are becoming a
staple of companies’ information architecture platforms. Other competitors,
like Teradata, are also becoming essential in firms’ data toolboxes.
The following chapters provide more insight into various topics of
business analytics.
5. What types of tools are being integrated with the ERP (Enterprise
Resource Planning) system to facilitate better, more accurate and quicker
decision making?
a. Business intelligence
b. Business analytics
c. Both business analytics and business intelligence
d. Enhanced data extraction functionality
COMPONENTS OF BUSINESS ANALYTICS
Chapter 2
Components of Business Analytics
Objectives:
Structure:
2.1 Introduction
2.6 Forecasting
2.8 Optimisation
2.9 Visualisation
2.10 Summary
2.1 INTRODUCTION
In recent years, business analytics has expanded rapidly with the
introduction of computers. This change has brought analytics to a whole new
level and has opened up endless possibilities.
• Find patterns in your data for further analysis e.g. product association
• Find outliers among huge numbers of data points e.g. fraud detection
• Identify relationships within the key data variables for further prediction
e.g. next likely purchase from the Customer
• Provide insights as to what will happen next e.g. which of the Customers
are leaving us
• Gain the competitive advantage.
Now that you know the difference between BI & BA, let us discuss the
typical components in Analytics. Following are major components or
categories in any analytics solution.
The retail and ecommerce industries have many possible uses for data
aggregation. One is competitive price monitoring. Competitive research is
necessary to be successful in the ecommerce and retail space. Companies
have to know what they are up against, so they must always be gathering
new information about their competitors’ product offerings, promotions,
and prices. This data can be pulled from competitors’ websites or from
other sites on which their products are listed. In order to get accurate
information, the data needs to be aggregated from every single relevant
source. That is a tall order for manual web data analysis.
Data aggregation can be used for a wide range of purposes in the travel
industry. These include competitive price monitoring, competitor research,
gaining market intelligence, customer sentiment analysis, and capturing
images and descriptions for the services on their online travel sites.
Competition in the online travel industry is fierce, so data aggregation or
the lack thereof can make or break a travel company.
Travel companies need to keep up with the ever-changing travel costs and
property availability. They also need to know which destinations are
trending and which audiences they should target with their travel offers.
The data needed to gain these insights is spread across many places on the
internet, making it difficult to gather manually. That is where a data
extraction and aggregation service such as Web Data Integration (WDI)
comes in. WDI not only extracts and aggregates the data a company needs, it
also prepares and cleans the data and delivers it in a consumable format
for integration, discovery and analysis. So, if a company needs accurate,
up-to-date data from the web, Web Data Integration can be a suitable
choice.
Data mining is the search for hidden, valid, and potentially useful
patterns in huge data sets. Data mining is all about discovering
unsuspected or previously unknown relationships amongst the data. It is a
multi-disciplinary skill that uses machine learning, statistics, and
database technology. The insights derived via data mining can be used for
marketing, fraud detection, scientific discovery, etc.
Types of Data
I. Business understanding:
II. Data understanding:
III. Data preparation:
For example, in a customer demographics profile, age data may be missing;
the data is then incomplete and the gaps should be filled. In some cases,
there could be data outliers; for instance, age has a value of 300. Data
could also be inconsistent; for instance, the name of the same customer is
different in different tables.
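The three kinds of problem mentioned above (missing values, outliers and inconsistencies) can be checked for with a few lines of code. The following is a minimal sketch in Python; the records, the field names and the 0 to 120 age range are assumptions made up for illustration and do not come from any particular system.

```python
from collections import defaultdict

# Hypothetical customer records drawn from two tables (all values invented).
records = [
    {"customer_id": 1, "name": "A. Rao", "age": 34, "source": "crm"},
    {"customer_id": 1, "name": "Anil Rao", "age": 34, "source": "billing"},
    {"customer_id": 2, "name": "S. Mehta", "age": None, "source": "crm"},
    {"customer_id": 3, "name": "P. Shah", "age": 300, "source": "crm"},
]


def find_quality_issues(rows, min_age=0, max_age=120):
    """Return (customer_id, issue) pairs for missing, outlying and inconsistent data."""
    issues = []
    names_by_id = defaultdict(set)
    for row in rows:
        names_by_id[row["customer_id"]].add(row["name"])
        if row["age"] is None:
            issues.append((row["customer_id"], "missing age"))
        elif not min_age <= row["age"] <= max_age:
            issues.append((row["customer_id"], "age out of range"))
    # A customer whose name differs between tables is flagged as inconsistent.
    for customer_id, names in names_by_id.items():
        if len(names) > 1:
            issues.append((customer_id, "inconsistent name"))
    return issues


issues = find_quality_issues(records)
for customer_id, issue in issues:
    print(customer_id, issue)
```

How each flagged record is then repaired (filled, corrected or dropped) is a business decision; the sketch only locates the problems.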
IV. Data transformation:
The result of this process is a final data set that can be used in modelling.
V. Modelling
VI. Evaluation:
VII. Deployment:
1. Classification:
2. Clustering:
Clustering analysis is a data mining technique used to identify data
items that are similar to each other. This process helps to understand the
differences and similarities between the data.
3. Regression:
4. Association Rules:
This data mining technique helps to find the association between two or
more Items. It discovers a hidden pattern in the data set.
5. Outlier detection:
6. Sequential Patterns:
7. Prediction:
Prediction uses a combination of the other data mining techniques, such as
trends, sequential patterns, clustering, classification, etc. It analyses
past events or instances in the right sequence to predict a future event.
Example 1:
For example, he might learn that his best customers are married females
between the age of 45 and 54 who make more than $80,000 per year.
Marketing efforts can be targeted to such demographic.
Example 2:
A bank wants to find new ways to increase revenues from its credit card
operations. It wants to check whether usage would double if fees were
halved.
• Television and radio. There are networks that apply real time data
mining to measure their online television (IPTV) and
radio audiences. These systems collect and analyse, on the
fly, anonymous information from channel views, broadcasts and
programming. Data mining allows networks to make personalised
1) R-language:
• It is a speedy process which makes it easy for users to analyse
huge amounts of data in less time.
Application: Uses
Retail: Data mining techniques help retail malls and grocery stores
identify and arrange their best-selling items in the most prominent
positions. They help store owners come up with offers that encourage
customers to increase their spending.
Service Providers: Service providers such as mobile phone and utility
companies use data mining to predict why and when a customer is likely to
leave the company. They analyse billing details, customer service
interactions and complaints made to the company to assign each customer a
probability score, and offer incentives accordingly.
E-Commerce: E-commerce websites use data mining to offer cross-sells and
up-sells through their websites. One of the most famous names is Amazon,
which uses data mining techniques to get more customers into its
eCommerce store.
Super Markets: Data mining allows supermarkets to develop rules to predict
whether their shoppers are likely to be expecting. By evaluating buying
patterns, they can find women customers who are most likely pregnant and
start targeting products like baby powder, baby shop, diapers and so on.
Crime Investigation: Data mining helps crime investigation agencies to
deploy the police workforce (where is a crime most likely to happen and
when?), decide who to search at a border crossing, etc.
Bioinformatics: Data mining helps to mine biological data from massive
datasets gathered in biology and medicine.
The rule says that 10% of customers buy cheese and beer together, and that
those who buy cheese also buy beer 80% of the time.
Association rule mining, however, does not consider the sequence in which
the items are purchased. Sequential pattern mining takes care of that. An
example of a sequential pattern is “5% of customers buy a bed first, then a
mattress and then ...”. Sequential rule mining is a data mining technique
which consists of discovering rules in sequences. It has many
applications, for example analysing the behaviour of customers in
supermarkets, users on a website or passengers at an airport.
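The difference between an unordered association and an ordered sequential pattern can be illustrated with a short Python sketch. The purchase histories and the helper names `contains_in_order` and `sequence_support` are hypothetical, introduced only for this illustration; real sequential pattern mining algorithms are considerably more elaborate.

```python
# Hypothetical purchase histories, each listed in the order items were bought.
histories = [
    ["bed", "mattress", "lamp"],
    ["mattress", "bed"],
    ["bed", "lamp", "mattress"],
    ["lamp"],
]


def contains_in_order(history, pattern):
    """True if the items of `pattern` appear in `history` in the same order (gaps allowed)."""
    remaining = iter(history)
    # `item in remaining` advances the iterator, so each match must come
    # after the previous one.
    return all(item in remaining for item in pattern)


def sequence_support(all_histories, pattern):
    """Fraction of histories that contain `pattern` as an ordered subsequence."""
    hits = sum(contains_in_order(h, pattern) for h in all_histories)
    return hits / len(all_histories)


bed_then_mattress = sequence_support(histories, ["bed", "mattress"])
mattress_then_bed = sequence_support(histories, ["mattress", "bed"])
print(bed_then_mattress, mattress_then_bed)
```

Three of the four customers bought both a bed and a mattress, but the two orderings have different support, which is exactly the information that association rule mining ignores.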
Association Analysis
There are a few terms used in association analysis that are important
to understand. Association rules are normally written like this: {Diapers}
-> {Beer}, which means that there is a strong relationship between
customers who purchased diapers and also purchased beer in the same
transaction. In this example, {Diapers} is the antecedent and {Beer} is
the consequent. Both antecedents and consequents can have multiple items.
In other words, {Diapers, Gum} -> {Beer, Chips} is a valid rule.
Support is the relative frequency with which the items in a rule appear
together in the transactions. In many instances, you may want to look for
high support in order to make sure it is a useful relationship. However,
there may be instances where a low support is useful if you are trying to
find “hidden” relationships.
Lift is the ratio of the observed support to that expected if the
antecedent and the consequent were independent. The basic rule of thumb is
that a lift value close to 1 means the two are independent. Lift values > 1
are generally more “interesting” and could be indicative of a useful rule
pattern.
So, there are three important parameters: support, confidence and lift.
Suppose there is a set of transactions and a rule item1 --> item2. The
support for item1 is defined as n(item1) / n(total transactions).
Confidence, on the other hand, is defined as n(item1 & item2) / n(item1).
So, confidence tells us the strength of the association and support tells
us the relevance of the rule, because we do not want to include rules about
items that are seldom bought or, in other words, have low support. Lift is
Confidence / Support(item2), that is, the confidence of the rule divided by
the support of the consequent. The higher the lift, the more significant
the rule determined by applying the Apriori algorithm.
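A small worked example may help to make the three parameters concrete. The sketch below is written in Python with five invented market baskets; the function names are illustrative, and lift is computed in its usual form, as the confidence of the rule divided by the support of the consequent. It only evaluates one given rule and is not an implementation of the Apriori algorithm.

```python
# Hypothetical market baskets (invented for illustration).
transactions = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "gum"},
    {"beer", "chips"},
    {"milk", "bread"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """n(antecedent & consequent) / n(antecedent), expressed through supports."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


def lift(antecedent, consequent, baskets):
    """Confidence of the rule divided by the support of the consequent."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)


rule_support = support({"diapers", "beer"}, transactions)        # 2 of 5 baskets
rule_confidence = confidence({"diapers"}, {"beer"}, transactions)
rule_lift = lift({"diapers"}, {"beer"}, transactions)
print(rule_support, rule_confidence, rule_lift)
```

In this toy data, diapers and beer appear together in 2 of 5 baskets (support 0.4), beer is bought in 2 of the 3 diaper baskets (confidence of about 0.67), and the lift is slightly above 1, which suggests a weak positive association.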
Text mining is one of the most critical ways of analysing and processing
unstructured data, which forms nearly 80% of the world’s data. Today, a
majority of organizations and institutions gather and store massive
amounts of data in data warehouses and cloud platforms, and this data
continues to grow by the minute as new data comes pouring in from multiple
sources. As a result, it becomes a challenge for companies and
organizations to store, process, and analyse vast amounts of textual data
with traditional tools. This is where text mining applications, tools, and
techniques come in.
• Gather unstructured data from multiple data sources such as plain text,
web pages, PDF files, emails, and blogs, to name a few.
• Detect and remove anomalies from the data by conducting pre-processing
and cleansing operations. Data cleansing allows you to extract and retain
the valuable information hidden within the data and helps identify the
roots of specific words. A number of text mining tools and applications
are available for this.
• Convert all the relevant information extracted from unstructured data
into structured formats.
• Analyze the patterns within the data via the Management Information
System (MIS).
• Store all the valuable information in a secure database to drive trend
analysis and enhance the decision-making process of the organization.
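The first steps of this process (gathering text, cleansing it, reducing words to rough roots and converting the result into a structured form) can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the two sample documents, the tiny stop-word list and the deliberately crude `crude_stem` helper, which stands in for a proper stemmer.

```python
import re
from collections import Counter

# Hypothetical raw documents and a tiny stop-word list, for illustration only.
documents = {
    "email_1": "The delivery was late, and the delivery box was damaged!",
    "blog_7": "Great prices. Pricing is the best I have seen.",
}
STOP_WORDS = {"the", "was", "and", "is", "i", "have", "a"}


def crude_stem(word):
    """Very rough suffix stripping; a real project would use a proper stemmer."""
    for suffix in ("ing", "es", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def to_structured(doc_id, text):
    """Clean one document and return a structured record of term counts."""
    tokens = re.findall(r"[a-z]+", text.lower())  # drop punctuation and case
    terms = [crude_stem(t) for t in tokens if t not in STOP_WORDS]
    return {"doc_id": doc_id, "term_counts": Counter(terms)}


table = [to_structured(doc_id, text) for doc_id, text in documents.items()]
for row in table:
    print(row["doc_id"], row["term_counts"].most_common(2))
```

Each row of `table` is now a structured record that could be loaded into a database for the analysis and storage steps.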
Let us now look at the most widely used text mining techniques:
1. Information Extraction
This is the best-known text mining technique. Information extraction
refers to the process of extracting meaningful information from vast chunks
of textual data. This technique focuses on identifying and extracting
entities, attributes, and their relationships from semi-structured or
unstructured texts. The extracted information is then stored in a
database for future access and retrieval. The efficacy and relevancy of
the outcomes are checked and evaluated using precision and recall
measures.
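As a rough illustration of extraction followed by evaluation, the sketch below pulls email addresses out of a sentence with a deliberately loose regular expression and then scores the result with precision and recall. The sample text, the hand-labelled expected set, and the helper name precision_recall are assumptions made for this example.

```python
import re

text = ("Contact Asha at asha@example.com or Ravi at ravi@example.org. "
        "Old address: sales@example, do not use.")

# Hypothetical hand-labelled ground truth for this text.
expected = {"asha@example.com", "ravi@example.org"}

# A deliberately loose pattern, so that it makes a mistake worth measuring.
EMAIL_PATTERN = re.compile(r"[\w.]+@\w+(?:\.[a-z]+)?")
extracted = set(EMAIL_PATTERN.findall(text))


def precision_recall(extracted, expected):
    """Precision = correct / extracted; recall = correct / expected."""
    correct = len(extracted & expected)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(expected) if expected else 0.0
    return precision, recall


precision, recall = precision_recall(extracted, expected)
print(sorted(extracted))
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision falls when the extractor returns items that are not genuine entities, while recall falls when it misses genuine ones.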
2. Information Retrieval
Information Retrieval (IR) refers to the process of extracting relevant and
associated patterns based on a specific set of words or phrases. In this text
mining technique, IR systems make use of different algorithms to track and
monitor user behaviours and discover relevant data accordingly. Google
and Yahoo search engines are the two most renowned IR systems.
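A very small retrieval system can be sketched with an inverted index, which maps each word to the documents that contain it. This is an illustrative assumption about how such a system might be built, not a description of how any commercial search engine works; the documents and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical document collection keyed by document id.
DOCUMENTS = {
    1: "quarterly sales report for retail stores",
    2: "customer complaints about late delivery",
    3: "retail customer survey and sales feedback",
}


def build_index(documents):
    """Map every word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents containing every query word."""
    words = query.lower().split()
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)


index = build_index(DOCUMENTS)
print(search(index, "retail sales"))
```

Real IR systems add ranking on top of this lookup, so that the most relevant documents appear first.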
3. Categorization
Categorization is a form of “supervised” learning in which natural
language texts are assigned to a predefined set of topics depending upon
their content. It relies on Natural Language Processing (NLP) to gather
text documents and to process and analyze them to uncover the right
topics or indexes for each document. The co-referencing method is
commonly used as a part of NLP to extract relevant synonyms and
abbreviations from textual data. Today, NLP has become an automated
process used in a host of contexts ranging from personalized commercials
delivery to spam filtering and categorizing web pages under hierarchical
definitions, and much more.
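To make the idea of supervised categorization concrete, here is a minimal sketch of a multinomial naive Bayes classifier with Laplace smoothing, applied to a spam filtering example. Naive Bayes is chosen here only as one simple, well-known method; the four training sentences and the labels "spam" and "ham" are invented for the illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labelled training texts (the predefined topics are the labels).
TRAINING = [
    ("win a free prize now", "spam"),
    ("free offer click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project report for review", "ham"),
]


def train(examples):
    """Count words per label, documents per label, and the overall vocabulary."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocabulary = set()
    for text, label in examples:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return word_counts, label_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)  # prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            count = word_counts[label][word] + 1  # Laplace smoothing
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAINING)
print(classify("free prize", model))
print(classify("monday project review", model))
```

With more training texts per topic, the same approach extends to any predefined set of categories.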
4. Clustering
Clustering is one of the most crucial text mining techniques. It seeks to
identify intrinsic structures in textual information and organize them into
relevant subgroups or ‘clusters’ for further analysis. A significant challenge
in the clustering process is to form meaningful clusters from the unlabelled
textual data without having any prior information on them. Cluster
analysis is a standard text mining tool that assists in data distribution or
acts as a pre-processing step for other text mining algorithms running on
detected clusters.
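The sketch below groups unlabelled texts without any predefined topics. It uses a simple single-pass ("leader") clustering based on Jaccard word overlap, which is only one of many possible clustering methods; the threshold value and the sample documents are assumptions made for the example.

```python
def jaccard(a, b):
    """Share of words two token sets have in common (0.0 to 1.0)."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


def cluster(documents, threshold=0.2):
    """Assign each document to the first cluster whose leader is similar enough."""
    clusters = []  # each cluster is a list of (text, token_set); item 0 is the leader
    for text in documents:
        tokens = set(text.lower().split())
        for members in clusters:
            if jaccard(tokens, members[0][1]) >= threshold:
                members.append((text, tokens))
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append([(text, tokens)])
    return [[text for text, _ in members] for members in clusters]


# Hypothetical unlabelled customer messages.
DOCS = [
    "late delivery of order",
    "order delivery was late",
    "refund for damaged product",
    "damaged product refund request",
]

for group in cluster(DOCS):
    print(group)
```

The clusters that come out can then be inspected, labelled by hand, or passed on to other text mining steps.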
5. Summarisation
Text summarisation refers to the process of automatically generating a
compressed version of a specific text that holds valuable information for
the end-user. The aim of this text mining technique is to browse through
multiple text sources to craft summaries of texts containing a considerable
proportion of information in a concise format, keeping the overall meaning
and intent of the original documents essentially the same. Text
summarisation integrates and combines the various methods that employ
text categorization like decision trees, neural networks, regression models,
and swarm intelligence.
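A simple way to see summarisation at work is an extractive summariser that keeps the sentences containing the most frequent words. This frequency-based approach is an assumption chosen for brevity; it is far simpler than the decision tree or neural network methods mentioned above, and the function name summarise and the sample text are hypothetical.

```python
import re
from collections import Counter


def summarise(text, max_sentences=1):
    """Keep the highest-scoring sentences, in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = re.findall(r"[a-z]+", text.lower())
    # Crude stop-word filter: ignore words of three letters or fewer.
    freq = Counter(w for w in words if len(w) > 3)

    def score(sentence):
        tokens = re.findall(r"[a-z]+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    ranked = sorted(sentences, key=score, reverse=True)[:max_sentences]
    return " ".join(s for s in sentences if s in ranked)


REPORT = ("Sales rose in the north region. The weather was mild. "
          "Sales in the south region also rose sharply.")

print(summarise(REPORT, max_sentences=2))
```

The summary is built only from sentences already present in the source, so the meaning of the original is preserved.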
Text mining techniques and text mining tools are rapidly penetrating the
industry, right from academia and healthcare to businesses and social
media platforms. This is giving rise to a number of text mining
applications. Here are a few text mining applications used across the globe
today:
1. Risk Management
One of the primary causes of failure in the business sector is the lack of
proper and sufficient risk analysis. Adopting and integrating risk
management software powered by text mining technologies such as SAS
Text Miner can help businesses to stay updated with all the current trends
in the business market and boost their abilities to mitigate potential risks.
Since text mining tools and technologies can gather relevant information
from across thousands of text data sources and create links between the
extracted insights, they allow companies to access the right information at
the right moment, thereby enhancing the entire risk management process.
3. Fraud Detection
Text analytics backed by text mining techniques provides a tremendous
opportunity for domains that gather a majority of data in the text format.
Insurance and finance companies are harnessing this opportunity. By
combining the outcomes of text analyses with relevant structured data
these companies are now able to process claims swiftly as well as to detect
and prevent frauds.
4. Business Intelligence
Organizations and business firms have started to leverage text mining
techniques as part of their business intelligence. Apart from providing
profound insights into customer behaviour and trends, text mining
techniques also help companies to analyse the strengths and weaknesses
of their rivals, thus, giving them a competitive advantage in the market.
Text mining tools such as Cogito Intelligence Platform and IBM text
analytics provide insights on the performance of marketing strategies,
latest customer and market trends, and so on.
Here are two case examples where text mining has transformed
real world data to real world evidence.
Understanding the potential for market access is essential for all pharma
companies, and information to characterize the burden of disease and local
standard of care in different countries across the globe is critical for any
new drug launch. Companies need an assessment of the landscape of
epidemiological data, health economics and outcomes information to
inform the optimal commercial strategy.
I2E can provide the starting point for efficiently performing evidence based
systematic reviews over very large sets of scientific literature, enabling
researchers to answer questions around commercial business decisions.
Top pharma company Novo Nordisk uses text mining to gain clinical
insights from the interactions of its Medical Science Liaisons (MSLs) with
healthcare professionals (HCPs). These interactions may be wide-ranging,
covering topics such as safety and efficacy, dosing, cost, special
populations, indication, comparisons, competitor products, etc. MSLs may
use approved slide decks, package inserts (PIs), factsheets, studies or
publications to answer HCP questions. Linguamatics’ text mining
platform I2E is used to structure these source files with custom ontologies
(e.g. for material types, product, disease terminology variation, topics).
This analysis enables Novo Nordisk to better address what support HCPs
may need in their interactions with patients, insurance providers, and other
clinicians and to invest in resource development appropriately.
2.6 FORECASTING
a) Qualitative Models
• Delphi Method: Asking field experts for general opinions and then
compiling them into a forecast.
b) Quantitative Models
Quantitative models discount the expert factor and try to remove the
human element from the analysis. These approaches are concerned solely
with data and avoid the fickleness of the people underlying the numbers.
They also try to predict where variables like sales, gross domestic product,
housing prices, and so on, will be in the long term, measured in months or
years. Quantitative models include:
2. Theoretical variables and an ideal data set are chosen. This is where
the forecaster identifies the relevant variables that need to be
considered and decides how to collect the data.
3. Assumption time. To cut down the time and data needed to make a
forecast, the forecaster makes some explicit assumptions to simplify the
process.
4. A model is chosen. The forecaster picks the model that fits the dataset,
selected variables, and assumptions.
5. Analysis. Using the model, the data is analysed and a forecast is made
from the analysis.
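The steps above can be illustrated with a deliberately simple quantitative forecast. In this sketch the chosen variable is monthly sales, the explicit assumption is that sales follow a straight-line trend, the model is an ordinary least squares line, and the analysis extrapolates that line. The sales figures and function names are invented for the example.

```python
def fit_linear_trend(values):
    """Fit y = intercept + slope * t by least squares, with t = 0, 1, 2, ..."""
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(values, periods_ahead):
    """Extend the fitted trend line beyond the last observed period."""
    slope, intercept = fit_linear_trend(values)
    n = len(values)
    return [intercept + slope * (n + k) for k in range(periods_ahead)]


# Hypothetical monthly sales figures (the selected variable).
monthly_sales = [100, 110, 120, 130, 140]

print(forecast(monthly_sales, periods_ahead=2))
```

If the straight-line assumption does not hold, for example with strongly seasonal sales, a different model would have to be chosen at step 4.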
• Prior to that, relevant data is collected and cleaned. Data from multiple
sources may be combined into a common source. Data relevant to the
analysis is selected, retrieved, and transformed into forms that will work
with data mining procedures.
Decision Trees
• Decision tree techniques, also based on ML, use classification algorithms
from data mining to determine the possible risks and rewards of pursuing
several different courses of action. Potential outcomes are then
presented as a flowchart which helps humans to visualize the data
through a tree-like structure.
• A decision tree has three major parts: a root node, which is the starting
point, along with leaf nodes and branches. The root node and any
intermediate nodes ask questions, while the leaf nodes hold the final
outcomes.
• The branches connect the nodes, depicting the flow from
questions to answers. Generally, each question node has multiple additional
nodes extending from it, representing possible answers. The answers can be as
simple as "yes" and "no."
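A decision tree of this kind can be represented directly as a nested structure and walked from the root to an outcome. The loan approval tree below is purely hypothetical, as are the question names; it only shows how yes/no branches lead from questions to a final answer.

```python
# Hypothetical tree: dictionaries are question nodes, strings are final outcomes.
TREE = {
    "question": "income_above_50k",
    "yes": {
        "question": "has_existing_default",
        "yes": "reject",
        "no": "approve",
    },
    "no": "reject",
}


def decide(node, answers):
    """Follow the yes/no branches from the root until an outcome is reached."""
    while isinstance(node, dict):
        branch = "yes" if answers[node["question"]] else "no"
        node = node[branch]
    return node


applicant = {"income_above_50k": True, "has_existing_default": False}
print(decide(TREE, applicant))
```

In practice the questions and their order are learned from historical data by a classification algorithm rather than written by hand.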
Text Analytics
To find answers in this text data, organizations are now experimenting with
new advanced analytics techniques such as topic modeling and sentiment
analysis. Text analytics uses ML, statistical, and linguistics techniques.
would result in "healthcare." A law firm might use topic modeling, for
instance, to find case law pertaining to a specific subject.
Sentiment analysis.
Statistical techniques in predictive analytics modeling can range all the way
from simple traditional mathematical equations to complex deep machine
learning processes running on sophisticated neural networks. Multiple
linear regression is the most commonly used simple statistical method.
Neural Networks
• For its part, non-traditional data extends way beyond text data such as
social media tweets and emails. For data inputs such as maps, audio,
video, and medical images, deep learning techniques are also required.
These techniques create layer upon layer of neural networks to analyze
complex data shapes and patterns, improving their accuracy rates by
being trained on representative data sets.
• Other barriers that come into play include the levels of complexity and
computational powers of today's neural networks. Neural networks need
to obtain either enough parameters or a more sophisticated architecture
to train on, learn from, and be aware of lessons learned in autonomous
vehicle applications. Additional engineering challenges are posed by
scaling the data set to a massive size.
2.8 OPTIMIZATION
1. Constructing a Model
• The variables or the unknowns are the components of the system for
which we want to find values. In manufacturing, the variables may be the
amount of each resource consumed or the time spent on each activity,
whereas in data fitting, the variables would be the parameters of the
model.
The constraints are the functions that describe the relationships among the
variables and that define the allowable values for the variables. In
manufacturing, the amount of a resource consumed cannot exceed the
available amount.
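To make variables and constraints concrete, here is a small production planning sketch. The variables are the numbers of chairs and tables to build, the constraints are the labour hours and wood available, and the quantity being maximised is profit. All figures are invented, and the exhaustive search over whole-number plans is an assumption chosen for clarity; real problems are normally handed to dedicated optimization software.

```python
from itertools import product

# Hypothetical data for a small furniture workshop.
PROFIT = {"chairs": 20, "tables": 35}        # profit per unit made
HOURS_NEEDED = {"chairs": 2, "tables": 4}    # labour hours per unit
WOOD_NEEDED = {"chairs": 3, "tables": 5}     # wood units per unit
HOURS_AVAILABLE = 40
WOOD_AVAILABLE = 55


def feasible(chairs, tables):
    """Constraints: resources consumed cannot exceed the available amounts."""
    hours = HOURS_NEEDED["chairs"] * chairs + HOURS_NEEDED["tables"] * tables
    wood = WOOD_NEEDED["chairs"] * chairs + WOOD_NEEDED["tables"] * tables
    return hours <= HOURS_AVAILABLE and wood <= WOOD_AVAILABLE


def best_plan(max_units=20):
    """Try every whole-number plan and keep the most profitable feasible one."""
    best = None
    for chairs, tables in product(range(max_units + 1), repeat=2):
        if not feasible(chairs, tables):
            continue
        profit = PROFIT["chairs"] * chairs + PROFIT["tables"] * tables
        if best is None or profit > best[0]:
            best = (profit, chairs, tables)
    return best


print(best_plan())  # (profit, chairs, tables)
```

Changing an available amount and rerunning the search shows how sensitive the best plan is to each constraint.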
3. Selecting Software
Ask yourself a simple question: when did you last use car navigation?
You will have no problem answering. Your car navigation system is an
example of optimization.
With car navigation it's easy to see how optimization may help you get to
the office on time, but now let's explore how you can use it when you get
there.
2.9 VISUALIZATION
Enhanced exploratory data analysis and output of modeling results with
highly interactive statistical graphics.
Elite athletes use it. The super-rich use it. And peak performers in all fields
now use it. That power is called visualization.
3) It activates the law of attraction, thereby drawing into your life the
people, resources, and circumstances you will need to achieve your goals.
All you have to do is set aside a few minutes a day. The best times are
when you first wake up, after meditation or prayer, and right before you go
to bed. These are the times you are most relaxed.
• STEP 1. Imagine sitting in a movie theatre, the lights dim, and then the
movie starts. It is a movie of you doing perfectly whatever it is that you
want to do better. See as much detail as you can create, including your
clothing, the expression on your face, small body movements, the
environment and any other people that might be around. Add in any
sounds you would be hearing — traffic, music, other people talking,
cheering. And finally, recreate in your body any feelings you think you
would be experiencing as you engage in this activity.
• STEP 2. Get out of your chair, walk up to the screen, open a door in the
screen and enter into the movie. Now experience the whole thing again
from inside of yourself, looking out through your eyes. This is called an
“embodied image” rather than a “distant image.” It will deepen the
impact of the experience. Again, see everything in vivid detail, hear the
sounds you would hear, and feel the feelings you would feel.
• STEP 3. Finally, walk back out of the screen that is still showing the
picture of you performing perfectly, return to your seat in the theatre,
reach out and grab the screen and shrink it down to the size of a cracker.
Then, bring this miniature screen up to your mouth, chew it up and
swallow it. Imagine that each tiny piece — just like a hologram —
contains the full picture of you performing well. Imagine all these little
screens traveling down into your stomach and out through the
bloodstream into every cell of your body. Then imagine that every cell of
your body is lit up with a movie of you performing perfectly. It’s like one
of those appliance store windows where 50 televisions are all tuned to
the same channel.
When you have finished this process — it should take less than five
minutes — you can open your eyes and go about your business. If you
make this part of your daily routine, you will be amazed at how much
improvement you will see in your life.
Example: When we were writing the very first Chicken Soup for the
Soul book, we took a copy of the New York Times best seller list, scanned it
into our computer, and using the same font as the newspaper, typed
Chicken Soup for the Soul into the number one position in the “Paperback
Advice, How-To and Miscellaneous” category. We printed several copies and
hung them up around the office. Less than two years later, our book was
the number one book in that category and stayed there for over a year.
Now that’s a pretty solid example of a successful visualization technique!
2.10 SUMMARY
• Data Aggregation
Volunteered data: Data supplied via a paper or digital form that is shared
by the consumer directly or by an authorized third party (usually personal
information).
• Data Mining
• Text Mining
Companies can also collect textual information from social media sites,
blog comments, and call center scripts to extract meaningful relationship
indicators. This data can be used to:
• Forecasting
Energy demands for a city with a static population in any given month or
quarter
Retail sales for holiday merchandise, including biggest sales days for both
physical and digital stores
• Predictive Analytics
• Optimization
Peak sales pricing and using demand spikes to scale production and
maintain a steady revenue flow
Inventory stocking and shipping options that optimize delivery schedules
and customer satisfaction without sacrificing warehouse space
Prime opportunity windows for sales, promotions, new products, and spin-
offs to maximize profits and pave the way for future opportunities
• Data Visualization
Information and insights drawn from data can be presented with highly
interactive graphics to show:
5. Explain: Forecasting
DIGITAL DATA AND ITS TYPES
Chapter 3
Digital Data and its Types
Objectives:
On completion of this chapter, you will understand what digital data is,
its types, sources and storage, the characteristics of structured,
unstructured and semi-structured data, OLAP versus OLTP, and the related
data models.
Structure:
3.1 Introduction
3.9 Summary
3.1 INTRODUCTION:
Digital data is data that represents other forms of data using a specific
machine language system that can be interpreted by various
technologies. The most fundamental of these systems is the binary system,
which simply stores complex audio, video or text information in a series of
binary characters, traditionally ones and zeros, or “on” and “off” values.
One of the biggest strengths of digital data is that all sorts of very
complex analog input can be represented with the binary system. Along with
smaller microprocessors and larger data storage centres, this model of
information capture has helped parties like businesses and government
agencies to explore new frontiers of data collection and to present
more impressive simulations through a digital interface.
From the earliest primitive digital data designs to new, highly sophisticated
and massive volumes of binary data, digital data seeks to capture elements
of the physical world and simulate them for technological use. This is done in
many different ways, but always with specific techniques for capturing
real-world events and converting them into digital form. One simple example is
the conversion of a physical scene into a digital image. In this, digital data is
somewhat similar to the older systems that converted a physical view or
scene to chemical film. One of the major differences is that digital data
records visual information in a bitmap or pixelated map that stores the
particular colour property for each bit on a precise and sophisticated grid.
Using this straightforward system of data transfer, the digital image
is created. Similar techniques are used to record audio streams in
digital form.
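The idea that text and images both reduce to ones and zeros can be shown in a few lines. The sketch below encodes a short string as 8-bit binary values and prints a tiny one-bit-per-pixel bitmap as a grid. The function names and the 3 x 3 image are assumptions made for the illustration.

```python
def text_to_bits(text):
    """Encode text as UTF-8 bytes and show each byte as eight binary digits."""
    return " ".join(format(byte, "08b") for byte in text.encode("utf-8"))


def bits_to_text(bits):
    """Reverse the encoding: binary digits back to bytes, then to text."""
    return bytes(int(b, 2) for b in bits.split()).decode("utf-8")


# A hypothetical 3 x 3 black-and-white bitmap: 1 = pixel on, 0 = pixel off.
BITMAP = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
]


def render(bitmap):
    """Draw the bitmap grid with '#' for on pixels and '.' for off pixels."""
    return "\n".join("".join("#" if pixel else "." for pixel in row) for row in bitmap)


print(text_to_bits("Hi"))
print(render(BITMAP))
```

A colour photograph works the same way, except that each grid position stores several bits describing the colour rather than a single on or off value.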
In short, digital data is data that represents
other forms of data using specific machine language systems that can be
interpreted by various technologies. The most fundamental of these
systems is a binary system, which simply stores complex audio, video or
text information in a series of binary characters, traditionally ones and
zeros, or "on" and "off" values.
Digital media is digitised content that can be transmitted over the internet
or a computer network. This can include text, audio, video and graphics.
Following are forms of digital media:
i. E-Music – audio accessed from the internet. It has been made
possible by compressed file formats such as MP3. E-music has allowed
people to easily download music from the internet and copy
music from a CD to a magnetic disc. However, it has also allowed people
around the world to make illegal copies of music. E-music can be played
back using a media player on the computer or using an MP3 playback
device.
Thus, audio, text, video and graphics in digital format can be stored
on a server, hard drive or mobile device. One of the main advantages of
this is that it reduces cost. This includes the cost to produce,
deliver and store the physical formats that contain movies, TV shows and
music. The production cost is reduced by eliminating the factories that
manufacture the discs that our media is stored on today. These costs will
be replaced by the cost to host downloads of the content or stream it from
the cloud. While there is a cost involved in hosting the content, it is far less
than the cost to build the factories, train workers, and ship in the raw
material to make the discs. There is also a cost involved in shipping the discs
to relatives and friends. With digital media, a corrupted file can simply be
redistributed at no extra cost. Another advantage is that digital media is
compatible with different pieces of hardware, while physical media are
limited to just a few that are compatible. This means that there is more
freedom of choice for customers in how they view media content, whether
it is from a computer, TV or mobile device. Digital formats offer much more
flexibility than physical ones.
Following are the data types that are used for digital media products:
• Text and hypertext – of all the data types, text requires the least
amount of storage and processing power in a computer. Common text
formats include Word documents, PDF documents, HTML, and Text
documents. Hypertext is text that contains a link to other information or
files (such as a web link).
• Audio – can be stored in many different formats that each have their
own advantages and disadvantages. Digitised sound takes small pieces
or ‘samples’ of a sound and stores them all digitally. The quality of the
sound depends on how often samples are taken (sample rate) and the
bits available for storing each one (sample size). A music CD is sampled
at 44.1 kHz, which means it has 44,100 samples per second. A music CD also
has a sample size of 16 bits. MIDI files are smaller than digitised sound
files as they do not actually record human speech or other sounds but simply
store information about an instrument as well as the pitch, timing, and
duration of notes.
Common file formats for digitised sound include MP3, WAV, and WMA.
Some file formats such as MP3 can compress sound to make files smaller
in size but can reduce the quality of the sound. Audio file formats that
lose quality when compressed are called lossy audio files (such as MP3
and AAC). Audio file formats that retain full quality are
called lossless audio files (such as FLAC, or WAV, which is usually stored
uncompressed).
• Graphics – there are two main types of graphics: bitmap and vector. For
each type of graphics there are also different formats. For example,
digital cameras use bitmapped images and store photos in the JPEG
format. Drawing programs such as Illustrator can store vector images in
formats such as SVG. Other programs such as Photoshop, Paint and
SketchUp also have their own graphics formats.
• Video – there are many different video file formats available today and
each one has its own advantages and disadvantages for different
applications. Some video formats may be more suitable for web
streaming while others may be more suitable for televisions or mobile
devices. Common video formats include MP4, AVI, WMV, FLV and
QuickTime (MOV).
• Animation – there are several animation file formats that are used by
different animation authoring programs such as the SWF format used by
Adobe Flash. However, animations can be exported to file formats that
are also used for video such as MP4 and FLV.
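The sample rate and sample size quoted in the Audio item above determine how much storage uncompressed sound needs. The short calculation below works this out for CD audio, assuming two channels (stereo), which the text above does not state; the function name is hypothetical.

```python
def uncompressed_audio_bytes(seconds, sample_rate=44_100, bits_per_sample=16, channels=2):
    """Bytes needed = samples per second x bytes per sample x channels x duration."""
    bytes_per_sample = bits_per_sample // 8
    return seconds * sample_rate * bytes_per_sample * channels


one_second = uncompressed_audio_bytes(1)
one_minute = uncompressed_audio_bytes(60)

print(f"1 second : {one_second:,} bytes")
print(f"1 minute : {one_minute / 1_000_000:.2f} MB")
```

Figures of this size are the reason lossy formats such as MP3 are attractive for downloading and streaming.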
Audio plays a big role in many different applications, from movies and
videos to games. Adding audio to a computer game makes it more enjoyable;
even background music alone can make the game more
appealing to play.
Video software allows you to cut, copy, and paste video and audio
sequences. It can add many effects such as titles, fades and wipes
between scenes. Most digital video media is intended for storage and
distribution on CDs or DVDs, or even for broadcasting on TV. Distributing
video media over the internet is very difficult as only users who have high-
speed broadband internet connections can hope to receive high quality
images.
What would happen to your digital business if the data that feeds it
were suddenly unavailable? Let’s look at where your data comes from,
and consider which concrete actions you can take to secure its supply.
Internal data, and especially data whose primary purpose differs from
the use your digital business makes of it, is both the easiest and the trickiest
to secure. It's easy because you don't have to negotiate a formal contract
with a third party, and if there is executive buy-in for what you do, then
getting the data owner to provide access should not be a problem. But it's
also tricky precisely because of this lack of a formal contract, because people
change, and because priorities shift. Whether by accident or not, you may find
your access cut off overnight, with the restoration of this access not a
top priority for the data owner. Or data schemas may change and require
that you rebuild your entire collection processes.
Action: make sure the proper processes and SLAs are in place, and follow
organizational and staff changes very closely so that you can inform new
stakeholders of why your access to data must remain safe.
If you process data from the Internet of Things, and especially consumer
connected devices, your challenge in securing access is primarily legal.
There are two questions you need to consider:
• Who owns the data? Does it belong to the owner of the device, the
account holder, or to your organization?
• What can you do with the data? Surely, you can use it to render a
service to your subscriber, but can you aggregate it with data from other
subscribers? Can you resell this data (anonymized or not)? Can you
derive insights, and resell this insight?
Action: review your terms of use and ensure these questions are being
addressed. Also consider whether privacy laws and customs in various
countries or regions may have an impact.
Syndicated data is usually the easiest to control. Because you are paying a
service provider to deliver data to you, you have a contract with this
provider. This contract will cover service level agreements, licensing and
usage limitations, and should ensure continued access.
However, you still need to consider what will happen if the service provider
goes out of business, or changes its business model (like Twitter's recent
announcement that they are shutting down their firehose to better control
their supply chain).
Action: review if alternate sources are available, and keep these options at
hand in case you need them.
The case of trading partners' data is very similar to that of syndicated
data, except that the data is usually not provided as a standalone service
but as part of a broader relationship, for example between a retailer and
a manufacturer. Enforcing service level agreements can become tricky if it
puts an otherwise profitable relationship at risk.
Action: as with syndicated data, always keep alternate sources in mind,
if applicable.
v. Open data
The good news with open data is that it's free, but that is also the bad news.
Assuming you carefully study the terms of use and licensing agreement for
the data, you should be safe legally. But there is no guarantee that this
service will be provided in the long run, or that it will be provided
consistently. The risk of changes in the data structures and the access
methods provided is very high. And if the service is not responding, you
have no recourse.
Action: find multiple sources, and do not build your business on the
assumption that open data feeds will remain available in the long run.
Harvesting data from web sites (screen scraping) or public APIs is common
practice, but it is also the least secure source of data you can consider.
From the legal standpoint, this practice is often borderline since there is no
licensing agreement that permits you to use the data harvested in such
ways.
From the data availability standpoint, web sites change all the time, and
your scraping routines will become obsolete in no time.
Action: stay away from data harvesting! And if data harvesting is your
only option, be prepared to suffer outages, and to have to redevelop your
routines all the time. And maybe get a lawyer.
Digital Data Storage (DDS) is a format for storing and backing up computer
data on tape that evolved from the Digital Audio Tape (DAT) technology.
DAT was created for CD-quality audio recording. In 1989, Sony and
Hewlett Packard defined the DDS format for data storage using DAT tape
cartridges. Tapes conforming to the DDS format can be played by either
DAT or DDS tape drives. However, DDS tape drives cannot play DAT tapes
since they can't pick up the audio on the DAT tape.
DDS uses a 4-mm tape. A DDS tape drive uses helical scanning for
recording, the same process used by a video recorder (VCR). There are two
read heads and two write heads. The read heads verify the data that has
been written (recorded). If errors are present, the write heads rewrite the
data. When restoring a backed-up file, the restoring software reads the
directory of files located at the beginning of the tape, winds the tape to the
location of the file, verifies the file, and writes the file onto the hard drive.
DDS cannot update a backed-up file in the same place it was originally
recorded. In general, DDS requires special software for managing the
storage and retrieval of data from DDS tape drives.
A DDS cartridge needs to be retired after 2,000 passes or 100 full backups.
You should clean your DDS tape drive every 24 hours with a cleaning
cartridge and discard the cleaning cartridge after 30 cleanings. DDS tapes
have an expected life of at least 10 years.
Big Data involves huge volume, high velocity, and an extensible variety of
data. It comes in three types: structured data, semi-structured data, and
unstructured data. In computer science, a data structure is a particular
way of organising and storing data in a computer such that it can be
accessed and modified efficiently. More precisely, a data structure is a
collection of data values, the relationships among them, and the functions
or operations that can be applied to the data.
For the analysis of data, it is important to understand that there are three
common types of data structures:
1. Structured data –
Structured data is data whose elements are addressable for effective
analysis. It has been organized into a formatted repository that is typically
a database. It covers all data that can be stored in an SQL database in
tables with rows and columns. Such data has relational keys and can easily be
mapped into pre-designed fields. Today, structured data is the most
commonly processed kind of data and the simplest to manage.
Example: Relational data.
2. Semi-structured data –
This third category exists (between structured and
unstructured data) because semi-structured data is considerably easier
to analyse than unstructured data. Many Big Data solutions and tools have
the ability to ‘read’ and process either JSON or XML. This reduces the
complexity of analysing semi-structured data, compared to unstructured data.
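The following sketch shows why JSON is comparatively easy to process. Each record carries its own field names, but not every record has the same fields, so the code flattens them into fixed columns and fills the gaps with empty values. The customer records and column names are invented for the example.

```python
import json

# Hypothetical semi-structured input: records share some fields but not all.
RAW = """
[
  {"id": 1, "name": "Asha", "email": "asha@example.com"},
  {"id": 2, "name": "Ravi", "phones": ["555-0101", "555-0102"]},
  {"id": 3, "name": "Meera", "address": {"city": "Mumbai"}}
]
"""

customers = json.loads(RAW)

# Flatten into fixed columns; fields a record lacks become None or 0.
rows = [
    {
        "id": c["id"],
        "name": c["name"],
        "email": c.get("email"),
        "city": c.get("address", {}).get("city"),
        "n_phones": len(c.get("phones", [])),
    }
    for c in customers
]

for row in rows:
    print(row)
```

Once flattened in this way, the rows can be loaded into an ordinary table with rows and columns.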
3. Unstructured data –
The ability to store and process unstructured data has grown greatly in
recent years, with many new technologies and tools coming to the market
that are able to handle specialised types of unstructured data. MongoDB, for
example, is optimised to store documents. Apache Giraph, as a contrasting
example, is optimised for processing graphs, that is, relationships
between nodes.
Data manifests itself in many different shapes. Each shape of data may
hold much value to the business. In some shapes, this value is easier to extract
than in others. Different shapes of data require different storage solutions
and should therefore be dealt with in different ways. We can distinguish
between three shapes of data, as follows:
1. Unstructured data:
Unstructured data is the rawest form of data. It can be any type of file,
e.g. texts, pictures, sounds or videos. This data is often stored in a
repository of files. Think of this as a very well organised directory on your
computer hard drive. Extracting value out of this shape of data is often the
hardest, since you first need to extract structured features that
describe or abstract the data. For example, to use a text you
might want to extract its topics, and whether the text is positive or
negative about them.
2. Structured data:
Structured data is tabular data (rows and columns) that is very well
defined, meaning that we know which columns are there and what kind of
data each contains. Such data is often stored in a database. In
databases, we can use the power of the SQL language to answer queries about
the data and easily create the data sets to use in data science solutions.
You can find these three shapes of data within the organisation, but you
can also find them in external data sources like the internet. You may also
find them in combine shapes of data from different sources in to single
source.
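To illustrate how SQL answers questions about structured data, here is a minimal sketch using Python's built-in sqlite3 module. The sales table, its columns and its rows are hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical, well-defined table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("North", "Pen", 120.0), ("North", "Ink", 80.0), ("South", "Pen", 200.0)],
)

# Because the columns are known in advance, one query gives the answer.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)
conn.close()
```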
The first place to look for data is within the organisation. Most
organisations have ERP, CRM and workflow management systems. These
systems often use a database to store the data in a structured way. These
databases contain a huge amount of data from which you can extract
value. For example, from the workflow management system you can easily
get insight into the bottlenecks of business processes, or by using data
from the ERP system you can make sales predictions.
The following pictures show some external data sources from which you
can extract data.
The real fun starts when we enrich the organisation's data with external
data sources. We distinguish four kinds of external sources. The most
obvious are publicly available data sets. Government organisations often
release demographic and economic data sets every year. An example of
such data is population per km² per region.
There are companies that have made it their core business to collect,
estimate and sell data. We have worked with data sets from such
companies; they contain information such as the net income per address,
the size of the house and even the probability that a person has a dog.
We can use this data to enrich the organisation's data and improve its
customer profiles. Can we use this data to predict the credit risk of our
customers?
Many websites these days provide APIs which allow programmers to build
interactive apps on their platform, e.g. Twitter, Facebook, LinkedIn
etc. However, such APIs can also be used to collect data. In the case of
Twitter, you can request all tweets which contain a certain hashtag.
Customer support software is often able to extract social media feeds
using these APIs and perform sentiment analysis. Sentiment analysis is a
method to determine whether a text is positive or negative about a topic.
Using this method, the customer support division can efficiently focus on
unsatisfied customers.
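A very simple form of sentiment analysis can be sketched with word lists. This is an illustrative assumption, not how commercial support software works: the two word lists and the sample posts are invented, and real tools use trained language models instead of fixed lists.

```python
import re

# Hypothetical word lists used only for this illustration.
POSITIVE = {"good", "great", "fast", "helpful", "love"}
NEGATIVE = {"bad", "slow", "broken", "rude", "late"}

def sentiment_score(text):
    """Positive words add one point, negative words subtract one point."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def label(text):
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

posts = [
    "Great service and fast delivery",
    "My order arrived late and the box was broken",
    "Where is the nearest store",
]
labels = [label(p) for p in posts]

# The support team would look at the negative posts first.
unhappy = [p for p in posts if label(p) == "negative"]
print(labels)
print(unhappy)
```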
Last but not least is scraping. With scraping you extract relevant data
from an unstructured data source. With scraping you are able to extract
anything you see on a website.
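Scraping can be sketched with the HTML parser from Python's standard library. The page content below is a hypothetical string standing in for a downloaded web page, and the class name "price" is an assumption about how that page marks up its prices.

```python
from html.parser import HTMLParser

class PriceScraper(HTMLParser):
    """Collect the text of every element whose class attribute is 'price'."""

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # Remember when we enter an element marked as a price.
        if dict(attrs).get("class") == "price":
            self._in_price = True

    def handle_endtag(self, tag):
        self._in_price = False

    def handle_data(self, data):
        if self._in_price and data.strip():
            self.prices.append(data.strip())

# Hypothetical page; in practice this text would be downloaded from a website.
page = """
<html><body>
  <div class="item"><span class="name">Pen</span><span class="price">10.50</span></div>
  <div class="item"><span class="name">Ink</span><span class="price">4.25</span></div>
</body></html>
"""

scraper = PriceScraper()
scraper.feed(page)
print(scraper.prices)
```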
5. Pivot. Analysts can gain a new view of data by rotating the data axes of
the cube.
• OLAP Server:
• Advantages of ROLAP
• Disadvantages of ROLAP
• Advantages of MOLAP
• Disadvantages of MOLAP
• Advantages of HOLAP
• Disadvantages of HOLAP
1. HOLAP architecture is very complex because it supports both MOLAP and
ROLAP servers.
OLAP begins with data accumulated from multiple sources and stored in a
data warehouse. The data is then cleansed and stored in OLAP cubes,
which users run queries against.
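As a rough sketch of what a query against a cube does, the example below keeps a few fact rows in memory and applies a slice followed by a roll-up. The rows and the helper names roll_up and slice_ are hypothetical; a real OLAP server pre-computes and stores such aggregates.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, country, product, sales).
facts = [
    (2015, "India", "Pen", 100),
    (2015, "India", "Ink", 40),
    (2015, "Japan", "Pen", 70),
    (2016, "India", "Pen", 120),
]

def roll_up(rows, *dims):
    """Sum sales over the chosen dimensions (0=year, 1=country, 2=product)."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dims)
        totals[key] += row[3]
    return dict(totals)

def slice_(rows, dim, value):
    """Keep only the rows where one dimension has a fixed value."""
    return [r for r in rows if r[dim] == value]

# Slice on year 2015, then roll up to country level.
sales_2015_by_country = roll_up(slice_(facts, 0, 2015), 1)
sales_by_year = roll_up(facts, 0)
print(sales_2015_by_country)
print(sales_by_year)
```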
With OLAP, a report is just a starting view, say sales for 2015 by
country: a summarised starting point. As the user clicks, drills and
pivots, the end result might be sales, unit price and volume for one
quarter, for 2 products, in
• OLAP viewers:
Microsoft Excel is one of the most popular interfaces to OLAP data. In
fact, for three of the leading OLAP products (Oracle's Hyperion Essbase,
Microsoft Analysis Services, SAP Business Explorer), the spreadsheet was
initially the only interface. Users would open a spreadsheet and could
immediately begin drilling within cells and Excel pivot tables to retrieve
and explore their data.
What is OLTP:-
Definition: –
• Requirements:-
• Benefits:-
ADVANTAGES OF OLTP:-
• It provides faster and more accurate forecasts for revenues and expenses.
• It provides a concrete foundation for a stable organization because of
timely modification of all transactions.
• It makes the transactions much easier on behalf of the customers by
allowing them to make the payments according to their choice.
• It broadens the customer base for an organization by simplifying and
speeding up individual processes.
DISADVANTAGES OF OLTP:-
The following table summarizes the major differences between OLTP (Online
Transaction Processing, operational) systems and OLAP systems:

OLTP                                      OLAP
1. Current data.                          1. Current and historical data.
2. Short database transactions.           2. Long database transactions.
3. Online update/insert/delete.           3. Batch update/insert/delete.
4. Normalization is promoted.             4. Denormalization is promoted.
5. High volume transactions.              5. Low volume transactions.
6. Transaction recovery is necessary.     6. Transaction recovery is not
                                             necessary.
TYPES:-
Features
• Rapid response
Fast performance with a rapid response time is critical. Businesses cannot
afford to have customers waiting for a TPS to respond; the turnaround time
from the input of the transaction to the production of the output must be
a few seconds or less.
• Reliability
Many organizations rely heavily on their TPS; a breakdown will disrupt
operations or even stop the business. For a TPS to be effective its failure
rate must be very low. If a TPS does fail, then quick and accurate recovery
must be possible. This makes well–designed backup and recovery
procedures essential.
• Inflexibility
A TPS wants every transaction to be processed in the same way regardless
of the user, the customer or the time of day. If a TPS were flexible, there
would be too many opportunities for non-standard operations. For example,
a commercial airline needs to consistently accept airline reservations from
a range of travel agents; accepting different transaction data from
different travel agents would be a problem.
• Controlled processing
The processing in a TPS must support an organization's operations. For
example if an organization allocates roles and responsibilities to particular
employees, then the TPS should enforce and maintain this requirement.
Example : ATM Transaction
• Consistency
A transaction is a correct transformation of the state. The actions taken as
a group do not violate any of the integrity constraints associated with the
state. This requires that the transaction be a correct program!
• Isolation
Even though transactions execute concurrently, it appears to each
transaction T, that others executed either before T or after T, but not both.
• Durability
Once a transaction completes successfully (commits), its changes to the
state survive failures.
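The consistency property above can be illustrated with a small money transfer, similar to an ATM transaction. This is a minimal sketch using Python's built-in sqlite3 module; the accounts table, the account names and the transfer helper are hypothetical. The CHECK constraint is the integrity rule, and a transaction that would break it is rolled back as a whole.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"  # integrity constraint
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between accounts; both updates happen or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst))
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts"))

ok_small = transfer(conn, "alice", "bob", 30)   # allowed
ok_large = transfer(conn, "alice", "bob", 500)  # would make alice negative
final = balances(conn)
print(ok_small, ok_large, final)
```

In the second call the credit to bob is executed first, yet it does not survive, because the debit that follows violates the constraint and the whole transaction is undone.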
• Concurrency
Ensures that two users cannot change the same data at the same time.
That is, one user cannot change a piece of data before another user has
finished with it. For example, if an airline ticket agent starts to reserve the
last seat on a flight, then another agent cannot tell another passenger that
a seat is available.
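The airline example can be sketched with a lock that lets only one agent at a time check and change the seat count. The Flight class and the agent names are hypothetical, and a real TPS would rely on database locking instead of an in-process lock.

```python
import threading

class Flight:
    def __init__(self, seats):
        self.seats = seats
        self._lock = threading.Lock()

    def reserve(self):
        # Only one agent at a time may check and update the seat count.
        with self._lock:
            if self.seats > 0:
                self.seats -= 1
                return True
            return False

flight = Flight(seats=1)  # one seat left
results = {}

def book(agent):
    results[agent] = flight.reserve()

threads = [threading.Thread(target=book, args=(name,))
           for name in ("agent_a", "agent_b")]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Exactly one agent gets the last seat, whichever thread runs first.
print(results, flight.seats)
```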
A relational structure.
• Master file: Contains information about an organization’s business
situation. Most transactions and databases are stored in the master file.
• Transaction file: It is the collection of transaction records. It helps to
update the master file and also serves as audit trails and transaction
history.
• Report file: Contains data that has been formatted for presentation to a
user.
• Work file: Temporary files in the system used during the processing.
• Program file: Contains the instructions for the processing of data.
Data warehouse
Backup procedures
Recovery process
A TPS may fail for many reasons. These could include system failure,
human error, hardware failure, incorrect or invalid data, computer
viruses, software application errors, or natural or man-made disasters.
As it is not possible to prevent all TPS failures, a TPS must be able to
cope with failures. The TPS must be able to detect and correct errors when
they occur. To cope with a system failure, a TPS goes through a recovery
of the database, which involves the backup, journal, checkpoint and
recovery manager:
• Journal: A journal maintains an audit trail of transactions and database
changes. Transaction logs and database change logs are used. A
transaction log records all the essential data for each transaction,
including data values, time of transaction and terminal number. A
database change log contains before and after copies of records that
have been modified by transactions.
• Checkpoint: A checkpoint record contains the information necessary to
restart the system. Checkpoints should be taken frequently, such as
several times an hour. When a failure occurs, processing can resume from
the most recent checkpoint, with only a few minutes of processing work
that needs to be repeated.
• Recovery Manager: A recovery manager is a program which restores the
database to a correct condition from which transaction processing can
restart.
Depending on how the system failed, one of two different recovery
procedures can be used. Generally, the procedure involves restoring data
from a backup device and then running the transaction processing again.
The two types of recovery are backward recovery and forward recovery.
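The interplay of backup, journal and checkpoint can be sketched as follows. This is an illustration under simple assumptions: the database is a dictionary, the journal is a list, and recovery restores the checkpoint copy and replays the journal entries written after it, which corresponds to restoring a backup and running the transactions again. All names are hypothetical.

```python
import copy

def apply_entry(db, entry):
    """Apply one journal entry (account, amount) to the in-memory database."""
    account, amount = entry
    db[account] = db.get(account, 0) + amount

database = {"alice": 100, "bob": 50}
journal = []

# Checkpoint: a copy of the data plus the journal position at that moment.
checkpoint = {"snapshot": copy.deepcopy(database), "journal_pos": len(journal)}

for entry in [("alice", -30), ("bob", 30), ("bob", -10)]:
    journal.append(entry)         # record the transaction in the journal first
    apply_entry(database, entry)  # then change the database

state_before_failure = copy.deepcopy(database)
database = None                   # simulated failure: working data is lost

def recover(checkpoint, journal):
    """Restore the checkpoint copy, then replay the later journal entries."""
    db = copy.deepcopy(checkpoint["snapshot"])
    for entry in journal[checkpoint["journal_pos"]:]:
        apply_entry(db, entry)
    return db

database = recover(checkpoint, journal)
print(database)
```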
A. Grandfather-father-son
B. Partial backups
A partial backup occurs when only parts of the master file are backed up.
The master file is usually backed up to magnetic tape at regular
intervals, which could be daily, weekly or monthly. Transactions
completed since the last backup are stored separately and are called
journals, or journal files. The master file can be recreated from the
backup tape and the journal files if the system fails.
Updating in a batch
This is used when transactions are recorded on paper (such as bills and
invoices) or when they are stored on magnetic tape. Transactions are
collected and updated as a batch when it is convenient or economical to
process them. Historically, this was the most common method, as the
information technology did not exist to allow real-time processing.
• Collecting and storing the transaction data in a transaction file. This
involves sorting the data into sequential order.
• Processing the data by updating the master file. This can be difficult,
as it may involve data additions, updates and deletions that need to
happen in a certain order. If an error occurs, then the entire batch
fails.
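The two steps above can be sketched as a small batch run. The record layout (seq, account, kind, amount) and the run_batch helper are hypothetical. The batch is sorted, applied to a working copy of the master file, and the copy replaces the master only if every transaction succeeds.

```python
def run_batch(master, transactions):
    """Apply a sorted batch to a copy of the master file.

    Returns (new_master, True) on success, or (master, False) with the
    original master untouched if any transaction in the batch is invalid.
    """
    working = dict(master)
    for txn in sorted(transactions, key=lambda t: t["seq"]):
        key, kind = txn["account"], txn["kind"]
        try:
            if kind == "add":
                if key in working:
                    raise ValueError("record already exists")
                working[key] = txn["amount"]
            elif kind == "update":
                working[key] += txn["amount"]  # KeyError if record is missing
            elif kind == "delete":
                del working[key]
            else:
                raise ValueError("unknown transaction kind")
        except (KeyError, ValueError):
            return master, False               # the entire batch fails
    return working, True

master_file = {"A1": 100}

good_batch = [
    {"seq": 2, "account": "A2", "kind": "update", "amount": 5},
    {"seq": 1, "account": "A2", "kind": "add", "amount": 20},
    {"seq": 3, "account": "A1", "kind": "delete", "amount": 0},
]
bad_batch = [
    {"seq": 1, "account": "A3", "kind": "add", "amount": 10},
    {"seq": 2, "account": "Z9", "kind": "update", "amount": 1},
]

after_good, good_ok = run_batch(master_file, good_batch)
after_bad, bad_ok = run_batch(master_file, bad_batch)
print(after_good, good_ok)
print(after_bad, bad_ok)
```

Sorting matters here: the update to A2 only works because the add with the lower sequence number is processed first.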
Updating in real-time
This is the immediate processing of data. It provides instant confirmation
of a transaction. It involves a large number of users who are
simultaneously performing transactions to change data. Because of
advances in technology (such as the increase in the speed of data
transmission and larger bandwidth), real-time updating is now possible.
Updating in real-time uses direct access of data. This occurs when data are
accessed without accessing previous data items. The storage device stores
data in a particular location based on a mathematical procedure. The same
procedure is then calculated again to find the approximate location of the
data. If the data are not found at this location, the system searches
through successive locations until they are found.
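The direct access method described above behaves like a hash table with linear probing. The sketch below assumes a very simple "mathematical procedure" (the sum of the character codes modulo the number of slots); the class name and the keys are hypothetical.

```python
class DirectAccessFile:
    def __init__(self, size=8):
        self.slots = [None] * size

    def _home(self, key):
        # The "mathematical procedure": character codes summed, modulo size.
        return sum(ord(c) for c in key) % len(self.slots)

    def put(self, key, value):
        start = self._home(key)
        for step in range(len(self.slots)):
            i = (start + step) % len(self.slots)
            if self.slots[i] is None or self.slots[i][0] == key:
                self.slots[i] = (key, value)
                return i  # slot actually used
        raise RuntimeError("file is full")

    def get(self, key):
        start = self._home(key)
        for step in range(len(self.slots)):
            i = (start + step) % len(self.slots)
            if self.slots[i] is None:
                return None  # empty slot reached: the key is not stored
            if self.slots[i][0] == key:
                return self.slots[i][1]
        return None

store = DirectAccessFile()
slot_ab = store.put("ab", "record for ab")
slot_ba = store.put("ba", "record for ba")  # same home slot as "ab"
found = store.get("ba")
missing = store.get("zz")
print(slot_ab, slot_ba, found, missing)
```

The keys "ab" and "ba" compute to the same location, so the second record is placed in the next free slot and is found again by searching successive locations.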
The term 'electronic payment' is a collective phrase for the many different
kinds of electronic payment methods available (also meaning online
payment), and the processing of transactions and their application within
online merchants and ecommerce websites.
Electronic payments systems can also increase your cash flow, reduce
administrative costs and labour and provide yet another way for your
customers to pay. Care must be taken when choosing an electronic
payment solution as it will need to fit within the constraints of your
particular online business and integrate seamlessly within your website.
3.9 SUMMARY:
Audio, text, video and graphics in digital format can be stored on
servers, hard drives and mobile devices. One of the main advantages of
this is that it reduces cost. This includes the cost to produce, deliver
and store the physical formats that contain movies, TV shows and music.
The production cost is reduced by eliminating the factories that
manufacture the discs on which our media is stored today. These costs are
replaced by the cost of hosting downloads of the content or streaming it
from the cloud. While there is a cost involved in hosting the content, it
is far less than the cost of building the factories, training workers and
shipping in the raw material to make the discs. There is also a cost
involved in shipping the discs to relatives and friends. With digital
media, a corrupted file can simply be redistributed at no extra cost.
Another advantage is that digital media is compatible with many different
pieces of hardware, while physical media are limited to the few devices
that are compatible. This means that customers have more freedom of
choice in how they view media content, whether on a computer, TV or
mobile device. Digital formats are much more flexible than physical ones.
Following are the data types that are used for digital media products:
• Text and hypertext
• Audio
• Graphics
• Video
• Animation
• Video productions
Audio plays a big role for many different applications from movies/videos
to games. Adding audio to computer games makes it more enjoyable, even
if it is only adding background music, it can make the game more
appealing to play.
Video software allows you to cut, copy, and paste video and audio
sequences. It can add many effects such as titles, fades and wipes
between scenes. Most digital video media is intended for storage and
distribution on CDs or DVDs, or even for broadcasting on TV. Distributing
video media over the internet is very difficult as only users who have high-
speed broadband internet connections can hope to receive high quality
images.
Digital Data Storage (DDS) is a format for storing and backing up computer
data on tape that evolved from the Digital Audio Tape (DAT) technology.
DAT was created for CD-quality audio recording. In 1989, Sony and
Hewlett Packard defined the DDS format for data storage using DAT tape
cartridges. Tapes conforming to the DDS format can be played by either
DAT or DDS tape drives. However, DDS tape drives cannot play DAT tapes
since they can't pick up the audio on the DAT tape.
DDS uses a 4-mm tape. A DDS tape drive uses helical scanning for
recording, the same process used by a video recorder (VCR). There are two
read heads and two write heads. The read heads verify the data that has
been written (recorded). If errors are present, the write heads rewrite the
data. When restoring a backed-up file, the restoring software reads the
directory of files located at the beginning of the tape, winds the tape to the
location of the file, verifies the file, and writes the file onto the hard drive.
DDS cannot update a backed-up file in the same place it was originally
recorded. In general, DDS requires special software for managing the
storage and retrieval of data from DDS tape drives.
Big Data includes huge volume, high velocity and an extensible variety of
data. The data is of three types: structured data, semi-structured data
and unstructured data. In computer science, a data structure is a
particular way of organising and storing data in a computer so that it can
be accessed and modified efficiently. More precisely, a data structure is
a collection of data values, the relationships among them, and the
functions or operations that can be applied to the data.
3. Analysts perform five types of ------------- analytical operations
(roll-up, drill-down, slice, dice and pivot) against a
multidimensional database.
a. OLTP
b. OLAP
BUSINESS INTELLIGENCE
Chapter 4
Business Intelligence
Objectives:
Structure:
4.1 Introduction
4.6 A Case study: Business intelligence for sales analysis and reporting
4.10 Summary
4.1 INTRODUCTION:
operational data (internal data). When combined, external and internal
data can provide a complete picture which, in effect, creates intelligence
that cannot be derived from any single set of data. BI tools empower
organisations to gain insight into new markets, to assess the demand for
and suitability of products and services for different market segments,
and to gauge the impact of marketing efforts.
Why is BI important?
• Step 1) Raw data is extracted from corporate databases. The data could
be spread across multiple heterogeneous systems.
• Step 2) The data is cleaned and transformed into the data warehouse.
Tables can be linked, and data cubes are formed.
• Step 3) Using the BI system, the user can ask queries, request ad-hoc
reports or conduct any other analysis.
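The three steps can be sketched end to end in a few lines. Everything here is a hypothetical illustration: two small in-memory lists stand in for the source systems, an in-memory SQLite database stands in for the data warehouse, and the table and column names are invented.

```python
import sqlite3

# Step 1: raw data from two hypothetical source systems with different formats.
crm_rows = [
    {"customer": " Asha ", "city": "pune"},
    {"customer": "Ravi", "city": "MUMBAI"},
]
billing_rows = [("asha", "1200.50"), ("ravi", "800"), ("ravi", "150.25")]

def transform(crm, billing):
    """Step 2: clean names and cities, convert amounts, link the two sources."""
    cities = {r["customer"].strip().lower(): r["city"].strip().title()
              for r in crm}
    cleaned = []
    for name, amount in billing:
        key = name.strip().lower()
        cleaned.append((key, cities.get(key), float(amount)))
    return cleaned

# Load the cleaned rows into the warehouse table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE fact_sales (customer TEXT, city TEXT, amount REAL)")
warehouse.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                      transform(crm_rows, billing_rows))

# Step 3: the user asks a query against the warehouse.
report = warehouse.execute(
    "SELECT city, SUM(amount) FROM fact_sales GROUP BY city ORDER BY city"
).fetchall()
print(report)
warehouse.close()
```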
There are also four types of BI users. They are the four key players
who use a Business Intelligence system:
2. The IT users:
The IT user also plays a dominant role in maintaining the BI infrastructure.
The difference between a power user and a casual user is that a power user
is capable of working with complex data sets, while a casual user's needs
lead them to use dashboards to evaluate predefined sets of data.
• Example 4: Bank
Business Intelligence is widely used in banks. A bank gives branch
managers access to BI applications. This helps a branch manager to
determine who the most profitable customers are and which customers to
work on. The use of BI tools frees information technology staff from the
task of generating analytical reports for the departments. It also gives
department personnel access to a richer data source.
Business Intelligence applications are also used to check a customer's
credit score. The credit score app compares the customer's data and checks
whether he/she has paid loan EMIs and credit card bills on time. These
apps help the loan manager decide whether to give the customer a credit
card or a loan. Such apps are also useful to customers, who can check
their credit score and take steps to improve it.
will go to Mumbai on Friday. By analysing the data using a business
intelligence tool, buses can be arranged according to need.
In many cases, the results obtained have made possible a much more
efficient and profitable redesign of the entire logistical and productive
warehousing process.
Business Intelligence tools often source the data from data warehouses.
The reason is straightforward: a data warehouse already has data from
various production systems within an enterprise; the data is cleansed,
consolidated, conformed and stored in one location. Because of this BI
tools are able to concentrate on analyzing the data.
• Dashboards:
Software that provides real-time digital visual indicators of how well
predetermined aspects of an organisation are working. Think of the gauges
on the dashboard of a car.
• Data Mart:
This is a subset of a data warehouse that focuses on a particular aspect
of an organisation's activities.
• Data warehouse:
This is a comprehensive database containing information that has been
extracted, cleaned up, filtered, organised and integrated from several
electronic sources of data. At Berkeley, the natural history museums
have contributed extracted data, formatted according to the Darwin Core
data standard, to an online data warehouse that lets users query the
holdings of all the museums through a portal.
• ETL:
Extract, Transform and Load: the software and processes needed to find,
cleanse and process the data into a data warehouse, data mart or other
integrated database or system.
• Portal:
A website that provides access to a structured set of online resources,
such as a search engine, a news service, a company home page or other
online services that a user wants to have access to on a daily basis.
• Scorecard:
Software that provides visual digital measurement of the factors identified
by an organisation as critical to its success.
The items above are the places from which data is actually picked up for
processing and decision making. Some of the techniques and tools that
are used in business intelligence are as under:
• Data Visualization
When data is stored as a set or matrix of numbers, it is precise but difficult
to interpret. For example, are sales going up, down or holding steady?
When looking at more than one dimension of the data, this becomes even
harder. Hence the visualization of data in charts is a convenient way to
immediately understand how to interpret the data.
• Data Mining
Data mining is a computer supported method to reveal previously unknown
or unnoticed relations among data entities. Data mining techniques are
used in a myriad of ways: shopping basket analysis, measurement of
products consumers buy together in order to promote other products; in
the banking sector, client risk assessment is used to evaluate whether the
client is likely to pay back the loan based on historical data; in the
insurance sector, fraud detection based on behavioral and historical data;
in medicine and health, analysis of complications and/or common diseases
may help to reduce the risk of cross infections.
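Shopping basket analysis can be sketched by counting which products appear together. The baskets below are hypothetical, and the two measures shown (support and confidence) are the usual starting point; real data mining tools apply the same idea to millions of baskets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "tea"},
    {"bread", "butter", "tea"},
]

# Count how often each pair of products is bought together.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

def support(a, b):
    """Share of all baskets that contain both a and b."""
    return pair_counts[tuple(sorted((a, b)))] / len(baskets)

def confidence(a, b):
    """Share of the baskets containing a that also contain b."""
    with_a = sum(1 for basket in baskets if a in basket)
    return pair_counts[tuple(sorted((a, b)))] / with_a

print(pair_counts.most_common(1))
print(support("bread", "butter"), confidence("bread", "butter"))
```

A high confidence for (bread, butter) suggests promoting butter to customers who buy bread.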
• Reporting
The design, scheduling and generation of performance, sales,
reconciliation and savings reports is an area where BI tools help business
users. Reports output by BI tools efficiently gather and present
information to support the management, planning and decision making
process. Once a report is designed, it can be sent automatically to a
predefined distribution list in the required form, presenting
daily/weekly/monthly statistics.
• Statistical Analysis
Statistical analysis uses mathematical foundations to quantify the
significance and reliability of the observed relations. The most
interesting features are distribution analysis and confidence intervals
(for example, for changes in user behaviour). Statistical analysis is used
for devising and analyzing the results of data mining.
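A confidence interval for an average can be sketched as below. The daily visit figures are hypothetical, and the calculation assumes the normal approximation with z = 1.96 for a 95% interval; for a sample this small, a t-value would be the more careful choice.

```python
import math
import statistics

# Hypothetical sample: website visits on ten days.
daily_visits = [102, 98, 110, 95, 105, 99, 101, 107, 96, 103]

def mean_confidence_interval(sample, z=1.96):
    """Approximate 95% confidence interval for the mean of the sample."""
    mean = statistics.mean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - z * std_err, mean + z * std_err

low, high = mean_confidence_interval(daily_visits)
print(round(low, 1), round(high, 1))
```

If the interval measured after a website change does not overlap the interval measured before it, the change in user behaviour is unlikely to be due to chance alone.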
• Client background:
The client develops and manufactures performance materials for industry.
These materials are available in 100 countries and are used to produce
high-performing, environmentally friendly products. The client's
customers include the leading manufacturers in their respective
industries.
• Business requirement:
2. Perform cleansing and mining of the data captured with different data
base resources at different locations
Solution:
The data were fetched from various sources such as FoxPro, MySQL,
Oracle and MS SQL Server, then cleansed and integrated using SQL Server
Integration Services (SSIS) to create the relational data warehouse. The
data cleaning included record matching, deduplication and column
segmentation using the horizontal platform of SSIS, and these processes:
schema extraction and translation, schema matching and integration, and
schema implementation.
The solution was built to support large volumes of data and dynamic
business rules, and included batch-window processing for daily
incremental processing and monthly restatement.
In an OLTP system dealing with customer demographic databases, the data
that could be fed in would be:
• an increase in a customer's credit limit
• a change in a customer's salary level
Example 2:
A hotel owner uses BI analytical applications to gather statistical
information regarding average occupancy and room rate. This helps to find
the aggregate revenue generated per room.
The owner also collects statistics on market share and data from customer
surveys from each hotel to decide its competitive position in various
markets.
Analyzing these trends year by year, month by month and day by day
helps management to offer discounts on room rentals.
Example 3:
A bank gives branch managers access to BI applications. This helps a
branch manager to determine who the most profitable customers are and
which customers to work on.
The use of BI tools frees information technology staff from the task of
generating analytical reports for the departments. It also gives department
personnel access to a richer data source.
The data mining process is the discovery, through large data sets, of
patterns, relationships and insights that guide enterprises in measuring
and managing where they are and predicting where they will be in the
future.
Large amounts of data can come from various data sources and may be
stored in different data warehouses. Data mining techniques such as
machine learning, artificial intelligence (AI) and predictive modeling
can be involved.
The data mining process requires commitment. But experts agree that,
across all industries, the data mining process is the same and should
follow a prescribed path. Here are the 6 essential steps of the data
mining process.
1. Business understanding
Then, from the business objectives and current situations, create data
mining goals to achieve the business objectives within the current
situation.
2. Data understanding
The data understanding phase starts with initial data collection from the
available data sources, to help get familiar with the data. Some important
activities, including data load and data integration, must be performed
to make the data collection successful.
Then, the data needs to be explored by tackling the data mining questions,
which can be addressed using querying, reporting and visualization.
3. Data preparation
The data preparation typically consumes about 90% of the time of the
project. The outcome of the data preparation phase is the final data set.
Once available data sources are identified, they need to be selected,
cleaned, constructed and formatted into the desired form. The data
4. Modeling
5. Evaluation
6. Deployment
A. Advantages:
Here are some of the advantages of using Business Intelligence System:
1. Boost productivity
With a BI program, businesses can create reports with a single click,
which saves a lot of time and resources. It also allows employees to be
more productive in their tasks.
2. Improve visibility
BI also helps to improve the visibility of business processes and makes
it possible to identify any areas which need attention.
3. Fix accountability
A BI system assigns accountability in the organization, as there must be
someone who owns accountability for the organization's performance
against its set goals.
B. Disadvantages
1. Cost:
Business intelligence can prove costly for small as well as for medium-sized
enterprises. The use of such type of system may be expensive for routine
business transactions.
2. Complexity:
Another drawback of BI is the complexity of implementing a data
warehouse. It can be so complex that it makes business techniques rigid
to deal with.
3. Limited use
Like all improved technologies, BI was first established with the buying
power of rich firms in mind. Therefore, BI systems are not yet affordable
for many small and medium-sized companies.
The following are some business intelligence and analytics trends that you
should be aware of
4.10 SUMMARY
There are also four types of BI users. They are the four key players who
use a Business Intelligence system:
2. The IT users:
The IT user also plays a dominant role in maintaining the BI infrastructure.
The difference between a power user and a casual user is that a power user
is capable of working with complex data sets, while a casual user's needs
lead them to use dashboards to evaluate predefined sets of data.
The data mining process is the discovery, through large data sets, of
patterns, relationships and insights that guide enterprises in measuring
and managing where they are and predicting where they will be in the
future.
Large amounts of data can come from various data sources and may be
stored in different data warehouses. Data mining techniques such as
machine learning, artificial intelligence (AI) and predictive modeling
can be involved.
2. The data analyst is a statistician who always needs to drill deep down
into data. A BI system helps them to get fresh insights to develop unique
business strategies. This user is called -------------------
a. The IT users
b. The Professional Data Analyst
c. The head of the company
d. The Business Users
BIG DATA
Chapter 5
Big Data
Objectives:
On completion of this chapter, you will understand big data in data
analytics, considering the following:
Structure:
5.1 Introduction
5.2 Definition
5.13 Summary
5.1 INTRODUCTION:
When we handle big data, we may not sample but simply observe and
track what happens. Therefore, big data often includes data with sizes that
exceed the capacity of traditional software to process within an acceptable
time and value.
Current usage of the term big data tends to refer to the use of predictive
analytics, user behavioural analytics or certain other advanced data
analytics methods that extract value from data, and seldom to a particular
size of data set. "There is little doubt that the quantities of data now
available are indeed large, but that's not the most relevant characteristic of
this new data ecosystem." Analysis of data sets can find new correlations
to "spot business trends, prevent diseases, combat crime and so on."
Scientists, business executives, practitioners of medicine, advertising
and governments alike regularly meet difficulties with large data-sets in
areas including Internet searches, fintech, urban informatics, and business
informatics. Scientists encounter limitations in e-Science work,
including meteorology, genomics, connectomics, complex physics
simulations, biology and environmental research.
Data sets grow rapidly, to a certain extent because they are increasingly
gathered by cheap and numerous information-sensing Internet of
things devices such as mobile devices, aerial (remote sensing), software
logs, cameras, microphones, radio-frequency identification (RFID) readers
and wireless sensor networks. One question for large enterprises is
determining who should own big-data initiatives that affect the entire
organization.
5.2 DEFINITION:
These data sets are so voluminous that traditional data processing software
just can’t manage them. But these massive volumes of data can be used to
address business problems that you would not have been able to tackle
before.
The other definitions are also worth considering and they are as under:
1. Big data is a term used to describe data that is high volume, high
velocity and/or high variety, that requires new technologies and
techniques to capture, store and analyse it, and that is used to
enhance decision making, provide insight and discovery, and support
and optimise processes. Therefore, big data is high-volume,
high-velocity and high-variety information assets that demand
cost-effective, innovative forms of information processing for
enhanced insight and decision making.
2. Big data is a term that describes large volumes of high-velocity,
complex and variable data that require advanced techniques and
technologies to enable the capture, storage, distribution, management
and analysis of the information.
The emergence of new data sources and the need to analyse everything from
live data streams in real time to huge amounts of unstructured content
have made many businesses realise that they are now in an era where the
spectrum of analytical workloads is so broad that it cannot all be dealt with
using a single enterprise data warehouse. It goes well beyond this. While
data warehouses are very much part of the analytical landscape, business
requirements now dictate that a new, more complex analytical
environment is needed to support a range of analytical workloads that
cannot be easily supported in a traditional environment.
Big data is therefore a term associated with the new type of workload and
underlying technologies needed to solve the business problems that we
could not previously support due to technology limitations, prohibitive cost
or both.
Big data analytics is about analytical workloads that are associated with
some combination of data volume, data velocity and data variety that
may include complex analytics and complex data types.
For this reason, big data analytics can include the traditional data
warehouse environment, because some analytical workloads may need both
traditional and workload-optimised platforms to solve the business
problem. The new enterprise analytical environment encompasses
traditional data warehousing and other analytical platforms best suited to
certain analytical workloads. Big data does not replace a data warehouse.
Although the concept of big data itself is relatively new, the origins of large
data sets go back to the 1960s and '70s when the world of data was just
getting started with the first data centres and the development of the
relational database.
Around 2005, people began to realize just how much data users generated
through Facebook, YouTube, and other online services. Hadoop (an open-
source framework created specifically to store and analyze big data sets)
was developed that same year. NoSQL also began to gain popularity during
this time.
With the advent of the Internet of Things (IoT), more objects and devices
are connected to the internet, gathering data on customer usage patterns
and product performance. The emergence of machine learning has
produced still more data.
While big data has come far, its usefulness is only just beginning. Cloud
computing has expanded big data possibilities even further. The cloud
offers truly elastic scalability, where developers can simply spin up ad hoc
clusters to test a subset of data.
• Big data makes it possible for you to gain more complete answers
because you have more information.
9. Linked data: data that is built upon standard Web technologies such as
HTTP, RDF, SPARQL and URIs to share information that can be
semantically queried by computers (rather than serving human needs).
This allows data from different sources to be connected and read. The
term was coined by Tim Berners-Lee, director of the World Wide Web
Consortium, in a design note about the Semantic Web project. This
project allowed the Web to connect related data that wasn’t linked in
the past by providing the mechanisms and lowering the barriers to
linking data currently linked. Examples of repositories for linked data
include (i) DBpedia, a dataset containing extracted data from Wikipedia,
(ii) GeoNames, RDF descriptions of more than 7,500,000 geographical
features worldwide, (iii) UMBEL, a lightweight reference structure of
20,000 subject concept classes and their relationships derived from
OpenCyc, and (iv) FOAF, friend of a friend, a dataset describing persons,
their properties and relationships. Linked open data is another project
that targets linked data with open content.
Finally, each data type has different requirements for analysis and poses
different challenges. In principle, the interpretation of data is known but in
practice, nobody has the full picture.
Web data, sensor data and text data have emerged as popular data sources
for big data analytical projects.
Big data is often characterised by three “V”s: the extreme volume of data,
the wide variety of data types and the velocity at which the data must be
processed. Although big data does not equate to any specific volume of
data, the term is often used to describe the terabytes, petabytes and even
exabytes of data captured over time.
In this section, we look at how Big Data can be defined using the
famous 3 Vs: Volume, Velocity and Variety.
1. Volume: The amount of data matters. With big data, you’ll have to
process high volumes of low-density, unstructured data. This can be
data of unknown value, such as Twitter data feeds, clickstreams on a
webpage or a mobile app, or sensor-enabled equipment. For some
organizations, this might be tens of terabytes of data. For others, it may
be hundreds of petabytes. These are the volumes of data being captured
by enterprises.
Within the Social Media space for example, Volume refers to the amount
of data generated through websites, portals and online applications.
Especially for B2C companies, Volume encompasses the available data
that are out there and need to be assessed for relevance. Consider the
following: Facebook has 2 billion users, YouTube 1 billion users, Twitter
350 million users and Instagram 700 million users. Every day, these
users contribute billions of images, posts, videos, tweets, etc. You can
now imagine the insanely large amount, or Volume, of data that is
generated every minute and every hour.
2. Velocity: With Velocity we refer to the speed with which data are being
generated. Staying with our social media example, every day 900 million
photos are uploaded on Facebook, 500 million tweets are posted on Twitter,
0.4 million hours of video are uploaded on YouTube and 3.5 billion searches
are performed in Google. This is like a nuclear data explosion. Big data
technology helps a company absorb this explosion, accepting the incoming
flow of data and at the same time processing it fast enough that it does not
create bottlenecks.
3. Variety: Variety refers to the many types of data that are available.
Traditional data types were structured and fit neatly in a relational
database. With the rise of big data, data comes in new unstructured
data types. Unstructured and semi structured data types, such as text,
audio, and video, require additional pre-processing to derive meaning
and support metadata. This data is captured by enterprises.
Variety in Big Data refers to all the structured and unstructured data that
can be generated either by humans or by
machines. The most commonly added data are unstructured: texts, tweets,
pictures and videos. Other unstructured data, such as emails, voicemails,
hand-written text, ECG readings and audio recordings, are also important
elements under Variety. Variety is all about the ability to classify the
incoming data into various categories.
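As a rough illustration of classifying incoming data by variety, the sketch
below sorts files into structured, semi-structured and unstructured
categories using only the file extension. The extension-to-category mapping
and the function names are assumptions made for this example; a real
pipeline would usually inspect the content as well.

```python
import os

# Hypothetical mapping from file extension to data category (an assumption
# for illustration; real systems usually inspect content, not just names).
CATEGORY_BY_EXTENSION = {
    ".csv": "structured",
    ".json": "semi-structured",
    ".xml": "semi-structured",
    ".txt": "unstructured",
    ".eml": "unstructured",
    ".jpg": "unstructured",
    ".mp3": "unstructured",
    ".mp4": "unstructured",
}

def classify_by_extension(filename):
    """Return the assumed category for a file, or 'unknown' if unmapped."""
    _, extension = os.path.splitext(filename.lower())
    return CATEGORY_BY_EXTENSION.get(extension, "unknown")

def group_by_category(filenames):
    """Group incoming file names into a dict of category -> list of names."""
    groups = {}
    for name in filenames:
        groups.setdefault(classify_by_extension(name), []).append(name)
    return groups

# Invented file names standing in for an incoming data feed.
incoming = ["sales.csv", "tweet.json", "voicemail.mp3", "ecg_scan.jpg", "notes.doc"]
groups = group_by_category(incoming)
```

Anything the mapping does not recognise lands in an "unknown" group, so that
new data types are surfaced for review rather than silently discarded.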
Data has intrinsic value. But it’s of no use until that value is discovered.
Equally important: How truthful is your data—and how much can you rely
on it?
Today, big data has become capital. Think of some of the world’s biggest
tech companies. A large part of the value they offer comes from their data,
which they’re constantly analyzing to produce more efficiency and develop
new products.
Finding value in big data isn’t only about analyzing it (which is a whole
other benefit). It’s an entire discovery process that requires insightful
analysts, business users, and executives who ask the right questions,
recognize patterns, make informed assumptions, and predict behaviour.
Such voluminous data can come from myriad different sources, such as
business sales records, the collected results of scientific experiments or
real-time sensors used in the Internet of Things. Data may be raw or
pre-processed using separate software tools before analytics are applied.
Data may also exist in a wide variety of file types, including structured
data, such as SQL database stores; unstructured data, such as document
files; or streaming data from sensors. Further, big data may involve
multiple, simultaneous data sources which may not otherwise be
integrated. For example, a big data analytics project may attempt to gauge
a product’s success and future sales by correlating past sales data, return
data and online buyer review data for the product.
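The product example above can be sketched in a few lines. The figures below
are invented for illustration, and the choice of the Pearson correlation
coefficient is an assumption; the text does not prescribe a particular
correlation method.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented monthly figures for one product, one entry per month.
units_sold = [120, 160, 210, 260]
units_returned = [20, 10, 10, 10]
avg_review_score = [3.0, 3.5, 4.0, 4.5]

# Combine two sources: net sales = units sold minus units returned.
net_sales = [sold - returned for sold, returned in zip(units_sold, units_returned)]

# Correlate the third source (review scores) with net sales.
r = pearson(avg_review_score, net_sales)
```

The sample data is constructed so that review scores and net sales move
together exactly; real data would give a coefficient somewhere between -1
and 1, and a high value would still not prove that reviews cause sales.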
Finally, velocity refers to the speed at which big data must be analysed.
Every big data analytics project will ingest, correlate and analyse the data
sources and then render an answer or result based on an overarching
query. This means human analysts must have a detailed understanding of
available data and possess some sense of what answer they are looking
for.
Big data can help you address a range of business activities, from customer
experience to analytics. Here are just a few.
1. Product Development
Companies like Netflix and Procter & Gamble use big data to anticipate
customer demand. They build predictive models for new products and
services by classifying key attributes of past and current products or
services and modeling the relationship between those attributes and the
commercial success of the offerings.
In addition, P&G uses data and analytics from focus groups, social media,
test markets, and early store rollouts to plan, produce, and launch new
products.
2. Predictive Maintenance
Factors that can predict mechanical failures may be deeply buried in
structured data, such as the year, make, and model of equipment, as well
as in unstructured data that covers millions of log entries, sensor data,
error messages, and engine temperature. By analyzing these indications of
potential issues before the problems happen, organizations can deploy
maintenance more cost effectively and maximize parts and equipment
uptime.
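One simple way to spot such indications is to compare each new sensor
reading with the equipment's own history. The sketch below is a minimal
example under stated assumptions: the temperature values are invented, and
the rule of flagging readings more than three standard deviations above the
historical mean is one common heuristic, not a method taken from the text.

```python
from statistics import mean, pstdev

def is_anomalous(history, reading, k=3.0):
    """Flag a reading more than k standard deviations above the historical mean."""
    threshold = mean(history) + k * pstdev(history)
    return reading > threshold

# Invented engine temperature history for one machine.
temperature_history = [70, 71, 69, 70, 72, 70]

# A normal reading and a suspiciously hot one.
normal_flag = is_anomalous(temperature_history, 71)
hot_flag = is_anomalous(temperature_history, 85)
```

A flagged reading would typically trigger an inspection before the part
fails, which is the cost saving that predictive maintenance aims for.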
3. Customer Experience
The race for customers is on. A clearer view of customer experience is
more possible now than ever before. Big data enables you to gather data
from social media, web visits, call logs, and other sources to improve the
interaction experience and maximize the value delivered. Start delivering
personalized offers, reduce customer churn, and handle issues proactively.
5. Machine Learning
Machine learning is a hot topic right now. And data—specifically big data—
is one of the reasons why. We are now able to teach machines instead of
program them. The availability of big data to train machine learning models
makes that possible.
6. Operational efficiency
Operational efficiency may not always make the news, but it’s an area in
which big data is having the most impact. With big data, you can analyze
and assess production, customer feedback and returns, and other factors
to reduce outages and anticipate future demands. Big data can also be
used to improve decision-making in line with current market demand.
7. Drive Innovation
Big data can help you innovate by studying interdependencies among
humans, institutions, entities, and processes and then determining new ways
to use those insights. Use data insights to improve decisions about financial
and planning considerations. Examine trends and what customers want to
deliver new products and services. Implement dynamic pricing. There are
endless possibilities.
While big data holds a lot of promise, it is not without its challenges.
• First, big data is…big. Although new technologies have been developed
for data storage, data volumes are doubling in size about every two
years. Organizations still struggle to keep pace with their data and find
ways to effectively store it.
• But it’s not enough to just store the data. Data must be used to be
valuable and that depends on curation. Clean data, or data that’s
relevant to the client and organized in a way that enables meaningful
analysis, requires a lot of work. Data scientists spend 50 to 80 percent of
their time curating and preparing data before it can actually be used.
• Finally, big data technology is changing at a rapid pace. A few years ago,
Apache Hadoop was the popular technology used to handle big data.
Then Apache Spark was introduced in 2014. Today, a combination of the
two frameworks appears to be the best approach. Keeping up with big
data technology is an ongoing challenge.
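To make the storage challenge concrete, the doubling claim in the first
bullet can be turned into a small projection. This is a sketch only: the
function name and the starting figure of 10 petabytes are assumptions, and
it presumes the two-year doubling period holds steadily.

```python
def projected_volume(current_volume, years, doubling_period_years=2.0):
    """Volume after `years` if it doubles every `doubling_period_years`."""
    return current_volume * 2 ** (years / doubling_period_years)

# A hypothetical organization storing 10 petabytes today.
for years in (2, 4, 6, 10):
    print(years, "years:", projected_volume(10, years), "PB")
```

Under these assumptions, storage needs grow 32-fold in ten years, which is
why capacity planning cannot simply extrapolate from last year's purchases.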
Some of the most common of those big data challenges include the
following:
The most obvious challenge associated with big data is simply storing and
analyzing all that information. In its Digital Universe report, IDC estimates
that the amount of information stored in the world's IT systems is doubling
about every two years. By 2020, the total amount will be enough to fill a
stack of tablets that reaches from the earth to the moon 6.6 times. And
enterprises have responsibility or liability for about 85 percent of that
information.
On the management and analysis side, enterprises are using tools like
NoSQL databases, Hadoop, Spark, big data analytics software, business
intelligence applications, artificial intelligence and machine learning to help
them comb through their big data stores to find the insights their
companies need.
Of course, organizations don't just want to store their big data — they want
to use that big data to achieve business goals. According to the
NewVantage Partners survey, the most common goals associated with big
data projects included the following:
4. Accelerating the speed with which new capabilities and services are
deployed
All of those goals can help organizations become more competitive — but
only if they can extract insights from their big data and then act on those
insights quickly. PwC's Global Data and Analytics Survey 2016 found,
"Everyone wants decision-making to be faster, especially in banking,
insurance, and healthcare."
But in order to develop, manage and run those applications that generate
insights, organizations need professionals with big data skills. That has
driven up demand for big data experts.
The variety associated with big data leads to challenges in data integration.
Big data comes from a lot of different places — enterprise applications,
social media streams, email systems, employee-created documents, etc.
Combining all that data and reconciling it so that it can be used to create
reports can be incredibly difficult. Vendors offer a variety of ETL and data
integration tools designed to make the process easier, but many
enterprises say that they have not solved the data integration problem yet.
5. Validating data
Closely related to the idea of data integration is the idea of data validation.
Often organizations are getting similar pieces of data from different
systems, and the data in those different systems doesn't always agree. For
example, the ecommerce system may show daily sales at a certain level
while the enterprise resource planning (ERP) system has a slightly different
number. Or a hospital's electronic health record (EHR) system may have
one address for a patient, while a partner pharmacy has a different address
on record.
The process of getting those records to agree, as well as making sure the
records are accurate, usable and secure, is called data governance. And in
the AtScale 2016 Big Data Maturity Survey, the fastest-growing area of
concern cited by respondents was data governance.
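The daily sales example above can be sketched as a simple reconciliation
check. The figures, the one percent tolerance and the function name
`find_mismatches` are assumptions for illustration; they are not drawn from
any particular ecommerce or ERP product.

```python
def find_mismatches(ecommerce, erp, tolerance=0.01):
    """Return days on which two systems disagree.

    A day is reported when it is missing from one system, or when the two
    figures differ by more than `tolerance` relative to the larger one.
    """
    mismatches = {}
    for day in sorted(set(ecommerce) | set(erp)):
        a, b = ecommerce.get(day), erp.get(day)
        if a is None or b is None:
            mismatches[day] = (a, b)
        elif abs(a - b) > tolerance * max(abs(a), abs(b)):
            mismatches[day] = (a, b)
    return mismatches

# Invented daily sales totals reported by two systems.
ecommerce_sales = {"2016-03-01": 1000.0, "2016-03-02": 1500.0, "2016-03-03": 900.0}
erp_sales = {"2016-03-01": 1000.0, "2016-03-02": 1450.0}

mismatches = find_mismatches(ecommerce_sales, erp_sales)
```

The check only reports disagreements. Deciding which system holds the
correct figure is a governance question that code alone cannot answer.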
Security is also a big concern for organizations with big data stores. After
all, some big data stores can be attractive targets for hackers or advanced
persistent threats (APTs).
7. Organizational resistance
It is not only the technological aspects of big data that can be challenging
— people can be an issue too.
Big data gives you new insights that open up new opportunities and
business models. Getting started involves three key actions:
1. Integrate
Big data brings together data from many disparate sources and
applications. Traditional data integration mechanisms, such as ETL (extract,
transform, and load) generally aren’t up to the task. It requires new
strategies and technologies to analyze big data sets at terabyte, or even
petabyte, scale.
During integration, you need to bring in the data, process it, and make
sure it’s formatted and available in a form that your business analysts can
get started with.
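At a very small scale, bringing in, processing and formatting data from two
sources might look like the sketch below. The two sources, their field
names and the common schema are all invented for this example; at terabyte
or petabyte scale the same steps would run on a distributed platform rather
than in a single script.

```python
# Two hypothetical sources describing the same kind of event with
# different field names and formats (invented for this sketch).
web_orders = [{"order_id": "W-1", "amount_usd": "19.99", "ts": "2019-01-05"}]
store_orders = [{"id": 7, "total_cents": 4550, "date": "05/01/2019"}]

def from_web(row):
    """Map a web order onto the common schema."""
    return {"order_id": row["order_id"],
            "amount": float(row["amount_usd"]),
            "date": row["ts"],
            "channel": "web"}

def from_store(row):
    """Map a store order onto the common schema (DD/MM/YYYY -> ISO date)."""
    day, month, year = row["date"].split("/")
    return {"order_id": f"S-{row['id']}",
            "amount": row["total_cents"] / 100,
            "date": f"{year}-{month}-{day}",
            "channel": "store"}

def integrate(web_rows, store_rows):
    """Bring both sources into one list, sorted by date then order id."""
    rows = [from_web(r) for r in web_rows] + [from_store(r) for r in store_rows]
    return sorted(rows, key=lambda r: (r["date"], r["order_id"]))

orders = integrate(web_orders, store_orders)
```

Once every record shares the same fields, units and date format, analysts
can query the combined list without knowing which system each row came from.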
2. Manage
Big data requires storage. Your storage solution can be in the cloud, on
premises, or both. You can store your data in any form you want and bring
your desired processing requirements and necessary process engines to
those data sets on an on-demand basis. Many people choose their storage
solution according to where their data is currently residing. The cloud is
gradually gaining popularity because it supports your current compute
requirements and enables you to spin up resources as needed.
3. Analyse
Your investment in big data pays off when you analyze and act on your
data. Get new clarity with a visual analysis of your varied data sets.
Explore the data further to make new discoveries. Share your findings with
others. Build data models with machine learning and artificial intelligence.
Put your data to work.
Best Practices
To help you on your big data journey, we’ve put together some key best
practices for you to keep in mind. Here are our guidelines for building a
successful big data foundation.
More extensive data sets enable you to make new discoveries. To that end,
it is important to base new investments in skills, organization, or
infrastructure with a strong business-driven context to guarantee ongoing
project investments and funding. To determine if you are on the right
track, ask how big data supports and enables your top business and IT
priorities. Examples include understanding how to filter web logs to
understand ecommerce behaviour, deriving sentiment from social media
and customer support interactions, and understanding statistical
correlation methods and their relevance for customer, product,
manufacturing, and engineering data.
One of the biggest obstacles to benefiting from your investment in big data
is a skills shortage. You can mitigate this risk by ensuring that big data
technologies, considerations, and decisions are added to your IT
governance program. Standardizing your approach will allow you to
manage costs and leverage resources. Organizations implementing big
data solutions and strategies should assess their skill requirements early
and often and should proactively identify any potential skill gaps. These
can be addressed by training/cross-training existing resources, hiring new
resources, and leveraging consulting firms.
It is certainly valuable to analyze big data on its own. But you can bring
even greater business insights by connecting and integrating low density
big data with the structured data you are already using today.
Keep in mind that the big data analytical processes and models can be both
human- and machine-based. Big data analytical capabilities include
statistics, spatial analysis, semantics, interactive discovery, and
visualization. Using analytical models, you can correlate different types and
sources of data to make associations and meaningful discoveries.
At the same time, it’s important for analysts and data scientists to work
closely with the business to understand key business knowledge gaps and
requirements. To accommodate the interactive exploration of data and the
experimentation of statistical algorithms, you need high-performance work
areas. Be sure that sandbox environments have the support they need—
and are properly governed.
Big data processes and users require access to a broad array of resources
for both iterative experimentation and running production jobs. A big data
solution includes all data realms including transactions, master data,
reference data, and summarized data. Analytical sandboxes should be
The need for big data velocity imposes unique demands on the underlying
compute infrastructure. The computing power required to quickly process
huge volumes and varieties of data can overwhelm a single server or server
cluster. Organisations must apply adequate compute power to big data tasks
to achieve the desired velocity. This can potentially demand hundreds or
thousands of servers that can distribute the work and operate
collaboratively.
To improve the service level further, some public cloud providers offer big
data capabilities such as highly distributed Hadoop compute instances,
data warehouses, databases and other related cloud services. Amazon Web
Services Elastic MapReduce is one example of a big data service in a
public cloud.
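MapReduce, the programming model that Hadoop implements and that gives
Elastic MapReduce its name, splits a job into a map phase and a reduce
phase so that the work can be spread across many servers. The sketch below
imitates the idea in a single process with a word count. It uses no Hadoop
or Amazon API, and the function names and sample text are assumptions for
illustration.

```python
from collections import defaultdict

def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]

def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Sum the values for each key."""
    return {key: sum(values) for key, values in grouped.items()}

# Each chunk stands in for the share of data held by one server.
chunks = [["big data is big"], ["data is everywhere"]]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
word_counts = reduce_phase(shuffle(mapped))
```

Because each chunk is mapped independently, the map calls could run on
different machines at the same time; only the grouped results need to be
brought together for the reduce step.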
compute infrastructure to tackle big data projects while minimising the need
for hardware and distributed compute software know-how.
But these tools address only limited use cases. Many other big data
tasks, such as determining the effectiveness of a new drug, can require
substantial scientific and computational expertise from analytical staff.
There is currently a shortage of data scientists and other analysts who have
experience working with big data in a distributed, open source
environment.
Big data can be contrasted with small data, another evolving term that is
often used to describe data whose volume and format can be easily used
for self-service analytics. A commonly quoted axiom is that “Big data is for
machines, small data is for people.”
Background:
Prior to the 2008 financial crisis, RBS was at one point the largest bank in
the world. When its exposure to the subprime mortgage market
threatened to collapse the business, the UK government stepped in, at
one time holding 84% of the company's shares.
Big data analysis has a key part to play in this plan. The bank has
recently announced a £100 million investment in data analytics
technology, and has named one of its first initiatives “Personology”,
emphasizing a focus on customers rather than financial products.
“In the seventies,” says Nelissen, “banks, through the agency of their
branch staff and managers, knew their customers individually. They knew
who they were and how they fitted in: who their family were and what they
were trying to do.”
Nelissen says: “If you look at someone like Amazon, they know relatively
little about their customers compared to us. But they make very good use
of the data they do have.”
A very simple and straightforward example, which makes a nice starting
point, is congratulating customers personally when they contact the
branch on their birthday. That’s not exactly big data analytics, but it’s in
line with the concept of Personology.
Systems have also been developed to let customers know individually how
they would benefit from deals and promotions being offered. While in the
past, logging in to an online account or telephoning customer service
would have been an opportunity for the bank to offer whichever services it
could most profitably offload, now customers receive personalised
recommendations showing exactly how much they would save by taking up
a particular offer.
Even though it is early days, Nelissen is able to report some initial results.
For example, every single customer contacted regarding duplicate financial
products they were paying for opted to cancel the third-party product
rather than the RBS product.
Nelissen says, “We are very excited about the stuff we’re doing. We are
seeing significantly improved response rates and more engagement.”
They are at the point where they understand what the data is trying to do
and feel it helps them have good conversation – and that’s the big shift
from where we were before.
Staff engagement is critical: the ideas that work best, and that have the
best resonance with customers, are the ones that we either got from the
front line or worked really closely with the front line to develop.
Engaging with staff and other stakeholders is essential. They must fully
understand the reason that data analytics is being used in customer-facing
situations if they are going to make the most effective use of the insights
being uncovered.
5.13 SUMMARY
In order to understand 'Big Data', you first need to know what data is.
Data is the quantities, characters, or symbols on which operations are
performed by a computer, which may be stored and transmitted in the form
of electrical signals and recorded on magnetic, optical, or mechanical
recording media.
Big Data is also data, but of a huge size. Big Data is a term used to
describe a collection of data that is huge in size and yet growing
exponentially with time. In short, such data is so large and complex that
none of the traditional data management tools are able to store it or
process it efficiently.
• The New York Stock Exchange generates about one terabyte of new
trade data per day.
• Social Media
Statistics show that 500+ terabytes of new data are ingested into the
databases of the social media site Facebook every day. This data is mainly
generated through photo and video uploads, message exchanges,
comments, etc.
1. Structured
2. Unstructured
3. Semi-structured
Structured
Any data that can be stored, accessed and processed in a fixed
format is termed 'structured' data. Over time, computer science has
achieved considerable success in developing techniques
for working with this kind of data (where the format is well known in
advance) and in deriving value from it. However, nowadays we are
foreseeing issues as the size of such data grows to a huge extent, with
typical sizes in the range of multiple zettabytes.
Unstructured
Any data with unknown form or structure is classified as unstructured
data. In addition to its huge size, unstructured data poses multiple
challenges in terms of processing it to derive value. A typical
example of unstructured data is a heterogeneous data source containing a
combination of simple text files, images, videos, etc. Nowadays organizations
have a wealth of data available to them but, unfortunately, they don't know
how to derive value from it since this data is in its raw form or
unstructured format.
Semi-structured
Semi-structured data can contain both forms of data. Semi-structured data
appears structured in form, but it is not actually defined with, for example,
a table definition as in a relational DBMS. An example of semi-structured
data is data represented in an XML file.
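The short sketch below shows why XML counts as semi-structured: every
record is tagged, yet records need not share the same fields, so no fixed
table definition applies. The customer records are invented for
illustration; parsing uses Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# Invented records: both are customers, but their fields differ.
XML_DATA = """
<customers>
  <customer id="1"><name>Asha</name><city>Mumbai</city></customer>
  <customer id="2"><name>Ravi</name><email>ravi@example.com</email></customer>
</customers>
"""

def parse_customers(xml_text):
    """Turn each <customer> element into a dict of whatever fields it has."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for node in root.findall("customer"):
        record = {"id": node.get("id")}
        for child in node:
            record[child.tag] = child.text
        records.append(record)
    return records

records = parse_customers(XML_DATA)
```

The first record has a city but no email and the second has the reverse,
which a relational table would have to represent with empty columns.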
Big Data Velocity deals with the speed at which data flows in from sources
such as business processes, application logs, networks, social media sites,
sensors and mobile devices. The flow of data is massive and continuous.
(iv) Variability – This refers to the inconsistency which can be shown by the
data at times, thus hampering the process of being able to handle and
manage the data effectively.
Big Data technologies can be used for creating a staging area or landing
zone for new data before identifying what data should be moved to the
data warehouse. In addition, such integration of Big Data technologies and
data warehouse helps an organization to offload infrequently accessed
data.
Thus,
• Big data is defined as data that is huge in size. Big data is a term used to
describe a collection of data that is huge in size and yet growing
exponentially with time.
• Examples of big data generation include stock exchanges, social media
sites, jet engines, etc.
• Big data can be 1) structured, 2) unstructured or 3) semi-structured.
• Volume, variety, velocity and variability are a few characteristics of
big data.
• Improved customer service, better operational efficiency and better
decision making are a few advantages of big data.
1. Big data analytics is about analytical workloads that are associated with
some combination of -------------that may include complex analytics and
complex data types.
a. Data volume, data velocity and data variety
b. Data velocity and data variety
c. Varieties of data
d. Speed of the data with varieties
3. Finding value in big data isn’t only about analyzing it but it’s an entire
discovery process that requires --------------
a. insightful analysts, business users, and executives who ask the right
questions, recognize patterns, make informed assumptions, and
predict behaviour
b. insightful analysts, business users, and executives
c. The people who ask the right questions,
d. Staff who recognize patterns, make informed assumptions, and
predict behaviour
4. For what purpose do companies like Netflix and Procter & Gamble use
big data to anticipate customer demand?
a. Operational efficiency
b. Product developments
c. Drive innovation
d. Understand customer experience
DATA MINING
Chapter 6
Data Mining
Objectives:
Structure:
6.1 Introduction
6.2 Definition
6.13 Summary
6.1 INTRODUCTION:
Over the last decade, advances in processing power and speed have
enabled us to move beyond manual, tedious and time-consuming practices
to quick, easy and automated data analysis. The more complex the data
sets collected, the more potential there is to uncover relevant insights.
Retailers, banks, manufacturers, telecommunications providers and
insurers, among others, are using data mining to discover relationships
among everything from pricing, promotions and demographics to how the
economy, risk, competition and social media are affecting their business
models, revenues, operations and customer relationships.
6.2 DEFINITION:
How to do Data Mining: The accepted data mining process involves six
steps:
1. Business understanding
The first step is establishing what the goals of the project are and how data
mining can help you reach them. A plan should be developed at this
stage to include timelines, actions, and role assignments.
2. Data understanding
Data is collected from all applicable data sources in this step. Data
visualization tools are often used in this stage to explore the properties of
the data to ensure it will help achieve the business goals.
3. Data preparation
Data is then cleansed, and missing values are filled in, to ensure it is ready to
be mined. Data processing can take enormous amounts of time depending
on the amount of data analyzed and the number of data sources.
Therefore, distributed systems are used in modern database management
systems (DBMS) to improve the speed of the data mining process rather
than burden a single system. They’re also more secure than having all an
organization’s data in a single data warehouse. It’s important to include
failsafe measures in the data manipulation stage so data is not
permanently lost.
4. Data Modeling
Mathematical models are then used to find patterns in the data using
sophisticated data tools.
5. Evaluation
The findings are evaluated and compared to business objectives to
determine if they should be deployed across the organization.
6. Deployment
In the final stage, the data mining findings are shared across everyday
business operations. An enterprise business intelligence platform can be
used to provide a single source of the truth for self-service data discovery.
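As a small illustration of step 3 (data preparation), the sketch below
fills missing values with the mean of the observed ones. Mean imputation is
only one of several possible strategies and is an assumption here, as are
the sample figures. The function returns a new list and leaves the raw data
untouched, in the spirit of the failsafe measures mentioned in that step.

```python
from statistics import mean

def prepare(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = mean(observed)
    return [fill if v is None else v for v in values]

# Invented monthly spend figures with two missing months.
raw_spend = [10, None, 20, None, 30]
clean_spend = prepare(raw_spend)
```

Keeping `raw_spend` unchanged means a different cleansing rule can be tried
later without going back to the original data source.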
• Big Data
The challenges of big data are prolific and penetrate every field that
collects, stores, and analyzes data. Big data is characterized by four major
challenges: volume, variety, veracity, and velocity. The goal of data mining
is to mediate these challenges and unlock the data’s value.
Variety encompasses the many different types of data collected and stored.
Data mining tools must be equipped to simultaneously process a wide
array of data formats. Failing to focus an analysis on both structured and
unstructured data inhibits the value added by data mining.
Finally, veracity acknowledges that not all data is equally accurate. Data
can be messy, incomplete, improperly collected, and even biased. As with
anything, the more quickly data is collected, the more errors will appear
in it. The challenge of veracity is to balance the quantity of data with
its quality.
• Over-Fitting Models
Over-fitting occurs when a model explains the natural errors within the
sample instead of the underlying trends of the population. Over-fitted
models are often overly complex and use an excess of independent
variables to generate a prediction. The risk of over-fitting is therefore
heightened by the increase in volume and variety of data. Too few variables
make the model irrelevant, whereas too many variables restrict the model
to the known sample data. The challenge is to moderate the number of
variables used in data mining models and to balance how closely a model
fits the sample with how well it predicts new data.
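The effect can be seen on a small scale. The sketch below is only an illustration under stated assumptions: the data are invented, and the over-fitted model is a hypothetical "memorizing" model (it returns the value of the closest training point) rather than a model with too many variables. It shows a model that fits the sample perfectly and yet predicts new data worse than a simple straight line.

```python
# Hypothetical toy data: the true relationship is y = 2x, and the training
# sample carries alternating noise of +1 / -1.
train_x = [1, 2, 3, 4, 5, 6]
train_y = [3, 3, 7, 7, 11, 11]
test_x = [1.4, 2.4, 3.4, 4.4, 5.4]
test_y = [2 * x for x in test_x]  # noise-free hold-out values


def fit_line(xs, ys):
    """Ordinary least squares for a single input: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(predict, xs, ys):
    """Mean squared error of a prediction function on (xs, ys)."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


slope, intercept = fit_line(train_x, train_y)


def simple_model(x):
    """Straight line fitted to the training sample."""
    return slope * x + intercept


def memorizing_model(x):
    """Over-fitted model: returns the y of the closest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


for name, model in [("simple", simple_model), ("memorizing", memorizing_model)]:
    print(name,
          "train MSE:", round(mse(model, train_x, train_y), 3),
          "test MSE:", round(mse(model, test_x, test_y), 3))
```

The memorizing model has zero error on the sample because it reproduces the noise, which is exactly why it does worse on the hold-out values.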
• Cost of Scale
As data velocity continues to increase data’s volume and variety, firms
must scale these models and apply them across the entire organization.
Unlocking the full benefits of data mining with these models requires
significant investment in computing infrastructure and processing power. To
reach scale, organizations must purchase and maintain powerful
computers, servers, and software designed to handle the firm’s large
quantity and variety of data.
With data privacy comes the need for organizations to develop internal
rules and constraints on the use and implementation of a customer’s data.
Data mining is a powerful tool that provides businesses with compelling
insights into their consumers. However, at what point do these insights
infringe on an individual’s privacy? Organizations must weigh this
relationship with their customers, develop policies to benefit consumers,
and communicate these policies to the consumers to maintain a
trustworthy relationship.
• Supervised Learning
• Linear Regressions
Linear regressions predict the value of a continuous variable using one or
more independent inputs. Realtors use linear regressions to predict the
value of a house based on square footage, bed-to-bath ratio, year built,
and zip code.
• Logistic Regressions
Logistic regressions predict the probability of a categorical variable using
one or more independent inputs. Banks use logistic regressions to predict
the probability that a loan applicant will default based on credit score,
household income, age, and other personal factors.
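The following sketch illustrates the technique on invented data with one input, the credit score. The applicants, the scaling constants, the learning rate and the number of iterations are all assumptions chosen for the example; the model is fitted with plain gradient descent rather than any particular bank's method.

```python
import math

# Hypothetical applicants: credit score and whether the loan defaulted (1 = yes).
scores = [500, 550, 600, 700, 750, 800]
defaulted = [1, 1, 1, 0, 0, 0]


def scale(score):
    """Centre and shrink the score so gradient descent behaves well."""
    return (score - 650) / 100


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1 / (1 + math.exp(-z))


weight, bias = 0.0, 0.0
learning_rate = 0.5
for _ in range(1000):
    grad_w = grad_b = 0.0
    for score, y in zip(scores, defaulted):
        x = scale(score)
        error = sigmoid(weight * x + bias) - y  # predicted probability minus label
        grad_w += error * x
        grad_b += error
    weight -= learning_rate * grad_w / len(scores)
    bias -= learning_rate * grad_b / len(scores)


def default_probability(score):
    """Estimated probability that an applicant with this score defaults."""
    return sigmoid(weight * scale(score) + bias)


for applicant_score in (520, 780):
    print(applicant_score, round(default_probability(applicant_score), 3))
```

The output is a probability, not a yes/no label; the bank still has to choose the cut-off at which it declines an application.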
• Time Series
Time series models are forecasting tools which use time as the primary
independent variable. Retailers, such as Macy’s, deploy time series models
to predict the demand for products as a function of time and use the
forecast to accurately plan and stock stores with the required level of
inventory.
rules, the group that a new observation falls into will become its predicted
value.
• Neural Networks
A neural network is an analytical model inspired by the structure of the
brain, its neurons, and their connections. These models were originally
created in the 1940s but have only recently gained popularity with
statisticians and data scientists. Each node in a neural network takes its
inputs and, depending on whether their combined magnitude meets the node’s
threshold, either “fires” or does “not fire”. This signal, or lack thereof,
is then combined with the other “fired” signals in the hidden layers of the
network, where the process repeats itself until an output is created. Since
one of the benefits of neural networks is a near-instant output,
self-driving cars deploy these models to process data accurately and
efficiently and to make critical decisions autonomously.
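The "fire" or "not fire" behaviour can be sketched with a few lines of code. This is a toy illustration only: the weights and thresholds below are picked by hand, whereas a real neural network learns them from data, and the function names are hypothetical.

```python
def neuron(inputs, weights, threshold):
    """Fire (return 1) when the weighted sum of the inputs reaches the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0


def tiny_network(a, b):
    """Two hidden nodes feed one output node."""
    # Hidden layer: both nodes look at the same pair of inputs.
    either_on = neuron([a, b], weights=[1, 1], threshold=0.5)  # fires if a or b is 1
    both_on = neuron([a, b], weights=[1, 1], threshold=1.5)    # fires only if both are 1
    # Output node: fires when exactly one of the inputs was on.
    return neuron([either_on, both_on], weights=[1, -1], threshold=0.5)


for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", tiny_network(a, b))
```

No single threshold node can produce this "exactly one input on" output by itself; it only emerges when the hidden-layer signals are combined, which is the point of having layers.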
• K-Nearest Neighbor
The k-nearest neighbour method is used to categorize a new observation
based on past observations. Unlike the previous methods, k-nearest
neighbour is data-driven, not model-driven. It makes no underlying
assumptions about the data, nor does it employ complex processes to
interpret its inputs. The basic idea is that the model classifies a new
observation by identifying its k closest neighbours and assigning it the
value held by the majority of them. Many recommender systems embed this
method to identify and classify similar content, which is later pulled by
the larger algorithm.
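A minimal sketch of the method, assuming two numeric features per observation, straight-line (Euclidean) distance and k = 3. The observations and labels are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical past observations: (feature pair, known category).
past_observations = [
    ((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
    ((6, 6), "B"), ((7, 6), "B"), ((6, 7), "B"),
]


def knn_classify(training, new_point, k=3):
    """Label new_point with the majority category among its k nearest neighbours."""
    by_distance = sorted(training, key=lambda item: math.dist(item[0], new_point))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


print(knn_classify(past_observations, (2, 2)))
print(knn_classify(past_observations, (6, 5)))
```

There is no training step: all the work happens when a new observation arrives, which is why the method is described as data-driven rather than model-driven.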
• Unsupervised Learning
Unsupervised tasks focus on understanding and describing data to reveal
underlying patterns within it. Recommendation systems employ
unsupervised learning to track user patterns and provide them with
personalized recommendations to enhance their customer experience.
Common analytical models used in unsupervised data mining approaches
are:
• Clustering
Clustering models group similar data together. They are best employed
with complex data sets describing a single entity. One example is lookalike
modeling, which groups similar segments into clusters and then targets new
groups that look like an existing group.
• Association Analysis
Association analysis is also known as market basket analysis and is used to
identify items that frequently occur together. Supermarkets commonly use
this tool to identify paired products and spread them out in the store to
encourage customers to pass by more merchandise and increase their
purchases.
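A small sketch of the counting behind market basket analysis, using invented baskets. "Support" and "confidence" are the usual measures for a pair of items; a real store would run this over far more transactions and with a dedicated algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, one set of items per transaction.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

# How often each item, and each pair of items, appears across baskets.
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)

top_pair, together = pair_counts.most_common(1)[0]
support = together / len(baskets)                 # share of baskets with both items
confidence = together / item_counts[top_pair[0]]  # share of first-item baskets that also hold the second

print(top_pair, "support:", support, "confidence:", confidence)
```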
• Language Standardization
Similar to the way that SQL evolved to become the preeminent language
for databases, users are beginning to demand a standard language for
data mining. Such a standard would let users work conveniently with many
different mining platforms while learning only one language.
While developers are hesitant to make this change, as more users continue
to support it, we can expect a standard language to be developed within
the next few years.
• Scientific Mining
With its proven success in the business world, data mining is being
implemented in scientific and academic research. Psychologists now use
association analysis to track and identify broader patterns in human
behavior to support their research. Economists similarly employ forecasting
algorithms to predict future market changes based on present-day
variables.
• Web mining
With the expansion of the internet, uncovering patterns and trends in
usage is of great value to organizations. Web mining uses the same
techniques as data mining and applies them directly to internet data. The
three major types of web mining are content mining, structure mining, and
usage mining. Online retailers, such as Amazon, use web mining to
understand how customers navigate their websites. These insights allow
Amazon to restructure its platform to improve customer experience and
increase purchases.
The proliferation of web content was the catalyst for the World Wide Web
Consortium (W3C) to introduce standards for the Semantic Web. This
provides a standardized method to use common data formats and
exchange protocols on the web. This makes data more easily shared,
reused, and applied across regions and systems. This standardization
makes it easier to mine large quantities of data for analysis.
• RapidMiner
• Orange
• Mahout
• Microstrategy
• WEKA:
This is a free, Java-based tool that can be customized. It includes
visualization, predictive analysis and modeling techniques, clustering,
association, regression and classification.
• R-Programming Tool:
R is implemented in C and Fortran and is itself a programming language,
so data miners can write scripts in it. Hence, it is used to build
statistical and analytical software for data mining. It supports
graphical analysis, both linear and nonlinear modeling, classification,
clustering and time-based data analysis.
• Knime:
Primarily used for data pre-processing, i.e. data extraction,
transformation and loading, Knime is a powerful tool with a GUI that shows
the network of data nodes. Popular amongst financial data analysts, it
offers modular data pipelining and draws liberally on machine learning and
data mining concepts for building business intelligence reports.
Data mining tools and techniques are now more important than ever for all
businesses, big or small, if they would like to leverage their existing data
stores to make business decisions that will give them a competitive edge.
Such actions based on data evidence and advanced analytics have better
chances of increasing sales and facilitating growth. Adopting
well-established techniques and tools, and drawing on the help of data
mining experts, helps companies use relevant and powerful data mining
concepts to their fullest potential.
The art of data mining has been constantly evolving. There are a number
of innovative and intuitive techniques that have emerged that fine-tune
data mining concepts in a bid to give companies more comprehensive
insight into their own data with useful future trends. Many techniques are
employed by the data mining experts, some of which are listed below:
3. Database Analysis:
Databases hold key data in a structured format, so algorithms written in
the database’s own language (such as SQL macros) are the most useful way to
find hidden patterns within organized data. These algorithms are sometimes
built into the data flows, e.g. tightly coupled with user-defined
functions, and the findings are presented in a ready-to-refer-to report
with meaningful analysis.
4. Text Analysis:
This concept is very helpful for automatically finding patterns within the
text embedded in hordes of text files, word-processed files, PDFs, and
presentation files. Text-processing algorithms can, for instance, find
repeated extracts of data, which is quite useful in the publishing business
or in universities for tracing plagiarism.
1. Marketing/Retail
Marketing companies use data mining to build models, based on historical
data, that forecast who is going to respond to new marketing campaigns
such as direct mail and online marketing. This means that marketers can
sell profitable products to targeted customers.
2. Finance/Banking
Because data mining gives financial institutions information on loans
and credit reports, a model built on historical customers can distinguish
good credit risks from bad ones. It also helps banks to detect fraudulent
credit card transactions and so protect the card’s owner.
3. Researchers
Data mining can speed up researchers’ work by automating the analysis of
the data, leaving them more time for other projects.
Shopping behaviours can also be detected. New problems often arise when
shopping patterns are being modelled, and data mining is used to solve
them. Mining methods can surface all the information on these shopping
patterns, including patterns that were not expected, and this extracted
data is beneficial once the shopping patterns have been identified.
• Automated Decision-Making
Data Mining allows organizations to continually analyze data and automate
both routine and critical decisions without the delay of human judgment.
Banks can instantly detect fraudulent transactions, request verification, and
even secure personal information to protect customers against identity
theft. Deployed within a firm’s operational algorithms, these models can
collect, analyze, and act on data independently to streamline decision
making and enhance the daily processes of an organization.
• Cost Reduction
Data mining allows for more efficient use and allocation of resources.
Organizations can plan and make automated decisions with accurate
forecasts that will result in maximum cost reduction. Delta embedded RFID
chips in passengers’ checked baggage and deployed data mining models to
identify holes in its process and reduce the number of bags mishandled.
This process improvement increases passenger satisfaction and decreases
the cost of searching for and re-routing lost baggage.
• Customer Insights
Firms deploy data mining models from customer data to uncover key
characteristics and differences among their customers. Data mining can be
used to create personas and personalize each touchpoint to improve overall
customer experience. In 2017, Disney invested over one billion dollars to
create and implement “Magic Bands.” These bands have a symbiotic
relationship with consumers, working to increase their overall experience at
the resort while simultaneously collecting data on their activities for Disney
to analyze to further enhance their customer experience.
With data mining, a retailer can use point-of-sale records of customer
purchases to send targeted promotions based on an individual’s purchase
history. By mining demographic data from comment or warranty cards, the
retailer can develop products and promotions that appeal to specific
customer segments.
How is data mining able to tell you important things that you didn't know
or what is going to happen next? The technique that is used to perform
these feats is called modeling. Modeling is simply the act of building a
model (a set of examples or a mathematical relationship) based on data
from situations where the answer is known and then applying the model to
other situations where the answers aren't known. Modeling techniques
have been around for centuries, of course, but it is only recently that data
storage and communication capabilities required to collect and store huge
amounts of data, and the computational power to automate modeling
techniques to work directly on the data, have been available.
Level of Analysis:
• Genetic Algorithms:
Optimisation techniques that use processes such as genetic combination,
mutation and natural selection in a design based on the concepts of natural
evolution.
• Decision Tree:
A tree-shaped structure that represents a set of decisions. These decisions
generate rules for the classification of a data set.
• Rule induction:
The extraction of useful if-then rules from data based on statistical
significance.
• Data visualisation:
The visual interpretation of complex relationships in multidimensional data.
Graphic tools are used to illustrate data relationships.
2. Query complexity:
The more complex the queries and the greater the number of queries being
processed, the more powerful the system required.
• Knowledge base:
This is the domain knowledge. This knowledge is used to guide the search
or evaluate interestingness of the resulting pattern.
• Knowledge Discovery:
The steps involved in the knowledge discovery process are: data cleaning,
data integration, data selection, data transformation, data mining,
pattern evaluation and knowledge presentation.
• User Interface:
It is the module of a data mining system that handles communication
between the users and the data mining system. The user interface allows
the following functionalities:
• Interact with the system by specifying a data mining query task
• Provide information to help focus the search
• Mine based on intermediate data mining results
• Browse database and data warehouse schemas or data structures
• Evaluate mined patterns
• Visualise the patterns in different forms
• Data cleaning
Data cleaning is the technique applied to remove noisy data and correct
inconsistencies in the data. It involves transformations that correct
wrong data. Data cleaning is performed as a data processing step while
preparing the data for a data warehouse.
• Data Integration:
Data integration is the processing technique that merges data from
multiple heterogeneous data sources into a coherent data store. The merged
data may be inconsistent, so data integration often needs data cleaning as
well.
• Data selection:
Data selection is the process in which data relevant to the analysis task
are retrieved from the database. Sometimes, data transformation and
consolidation are performed before the data selection process.
• Clusters:
A cluster is a group of similar objects. Cluster analysis means forming
groups of objects that are very similar to each other but highly different
from the objects in other clusters.
• Data transformation
The data is transformed or consolidated into forms appropriate for mining
by performing summary or aggregation operations.
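The preparation steps described above (cleaning, integration, selection and transformation) can be sketched on a handful of records. Everything in this example is an assumption made for illustration: the two sources, the field names and the rule of dropping rows with a missing amount.

```python
from collections import defaultdict

# Hypothetical records from two sources (store tills and a web shop).
store_sales = [
    {"customer": "C1", "region": "north", "amount": "120.0"},
    {"customer": "C2", "region": "North ", "amount": "80.5"},
    {"customer": "C3", "region": "south", "amount": None},  # missing value
]
web_sales = [
    {"customer": "C1", "region": "NORTH", "amount": "40.0"},
    {"customer": "C4", "region": "south", "amount": "60.0"},
]


def clean(records):
    """Data cleaning: drop rows with missing amounts, normalise text, cast types."""
    cleaned = []
    for row in records:
        if row["amount"] is None:
            continue
        cleaned.append({
            "customer": row["customer"],
            "region": row["region"].strip().lower(),
            "amount": float(row["amount"]),
        })
    return cleaned


# Data integration: merge the cleaned sources into one coherent store.
integrated = clean(store_sales) + clean(web_sales)

# Data selection: keep only the records relevant to the analysis task.
north_only = [row for row in integrated if row["region"] == "north"]

# Data transformation: aggregate to one total per customer.
total_by_customer = defaultdict(float)
for row in north_only:
    total_by_customer[row["customer"]] += row["amount"]

print(dict(total_by_customer))
```

Only after these steps is the data in a shape that a mining algorithm can use.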
Pacific Mall in West Delhi figured out through algorithms that 65% of the
customers at its food-court preferred vegetarian food. That prompted the
mall to add a Haldiram outlet and sales at the food-court went up by ₹50
lakh a month.
In Bengaluru, Orion Mall found that most of its customers are “trendier”
young crowds who mostly purchased fashion and electronics, prompting it
to ramp up those verticals.
Taking a leaf out of the ecommerce textbook, malls have started, albeit in
a small way, mining customer data and using algorithms to drive sales.
Prominent malls in India for years had revenue-sharing agreements with
retailers and the shopping centres would receive daily or real-time sales
data from brands through a common technological platform. Now such
platforms are evolving to capture various other information on buying
patterns and preferences of consumers to help malls drive sales and
footfalls.
“We have built a platform which gives them insights into what are the
areas they need to concentrate. We have an AI (artificial intelligence)
platform and through data science we forecast their revenue and trends,
and we tell them the buying habits of the consumers,” said AM Navail, an
assistant vice president at tech firm Pathfinder.
Malls these days have a host of tech at their disposal to help them not only
drive sales but also enhance the overall consumer experiences.
For example, high-definition CCTV cameras not only capture pictures but
also generate heatmaps of visitors around the mall that help mall owners
to assign facilities and manpower. Such cameras are also used to analyse
gender and age brackets of customers and the stores they are entering. “If
the customers are thronging to the sports area, we can figure out with the
heatmap technology and tally with the conversion rates with those retailers
and realise we need more brands in that category,” said Deepak Zutshi, the
centre head at New Delhi’s Select Citywalk Mall.
West Delhi’s Pacific Mall has installed a technology that can track the
duration of cars parked in the parking lot. “That way we are getting
average three hours of dwell time of cars that are coming into the mall,”
said Abhishek Bansal, its executive director. Rajneesh Mahajan, the CEO of
InOrbit Malls, which operates shopping centres in several cities, said
churning through consumer data was in its infancy in India because the
data available from retailers to mall owners is limited and non-uniform.
“We are still at an initial stage and this will evolve and people will get
unified platforms to get the data,” he said. “Unless everybody comes on
board and data in a certain manner and the KPIs (key performance
indicators) are defined, it won’t be that meaningful.”
6.13 SUMMARY:
c. Fraud Detection
Apart from these, data mining can also be used in the areas of production
control, customer retention, science exploration, sports, astrology, and
Internet Web Surf-Aid
Fraud Detection
Data mining is also used in the fields of credit card services and
telecommunication to detect fraud. For fraudulent telephone calls, it helps
to find the destination of the call, the duration of the call, the time of
day or week, etc. It also analyzes patterns that deviate from expected
norms.
DESCRIPTIVE ANALYTICS
Chapter 7
Descriptive Analytics
Objectives:
Structure:
7.1 Introduction
7.2 Definition
7.10 Summary
7.1 INTRODUCTION
The four types of analytics are usually implemented in stages, and no one
type of analytics is said to be better than the others. They are
interrelated, and each offers a different insight. With data being
important to so many diverse sectors, from manufacturing to energy grids,
most companies rely on one or all of these types of analytics. With the
right choice of analytical techniques, big data can deliver richer insights
for companies. Before diving deeper into each of these, let’s define the
four types of analytics:
7.2 DEFINITION
Data mining describes the next step of the analysis and involves a search
of the data to identify patterns and meaning. Identified patterns are
analysed to discover the specific ways that learners interacted with the
learning content and within the learning environment.
Performance data provides analysts with insight into how well learners
succeeded on the course; this information could come from data taken
from assessments or assignments. It’s important to note that insights
learned from descriptive analysis are not used for making inferences or
predictions about a learner’s future performance.
1. Mean
The mean, or average, is probably the most commonly used method of
describing central tendency. The mean represents the centre of gravity of
a distribution. Each score in a distribution contributes to the
determination of the mean. It is also known as the arithmetic average.
To compute the mean, all the values are added and the sum is divided by the
total number of values; it is the ratio of the summation of all scores to
the total number of scores. Using the mean, one can compare different
groups. It also helps in computing further statistics. Since this method
involves handling large numbers and entails tedious calculations, the
researcher used the data analysis tools available in Microsoft® Excel 2007
to calculate the mean.
2. Median
The median is the positional average that divides a distribution into two
equal parts so that one half of items falls above it and the other half below
it.
The researcher used the data analysis tools available in Microsoft®
Excel 2007 to calculate the median. The function is found under
Formulas/More Functions/Statistical/MEDIAN.
3. Mode
The mode is the most frequently occurring value in the set of scores. The
mode can also be estimated indirectly from the mean and median. It is a
quick and appropriate measure of central tendency.
The researcher used the data analysis tools available in Microsoft®
Excel 2007 to calculate the mode. The function is found under
Formulas/More Functions/Statistical/MODE.
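The three measures of central tendency can also be computed in a few lines of code. The sketch below uses Python's standard `statistics` module in place of Excel, with invented scores.

```python
import statistics

scores = [2, 4, 4, 5, 7, 8, 12]  # hypothetical scores

mean = statistics.mean(scores)      # sum of all scores / number of scores
median = statistics.median(scores)  # middle value of the sorted scores
mode = statistics.mode(scores)      # most frequently occurring score

print("mean:", mean, "median:", median, "mode:", mode)
```

Here the mean (6) is pulled above the median (5) by the single large score of 12, which shows why the three measures need not coincide.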
• Measures of variability
The measures of central tendency indicate the central value of the
distribution. However, the central value alone is not sufficient to fully
describe the distribution.
Variability describes how the values are spread out and how far they
differ from one another. Examples are the range and the standard
deviation. The technique employed in the present study is the standard
deviation. The range is simply the highest value minus the lowest value.
The standard deviation is a more accurate and detailed measure of
dispersion.
Standard Deviation
The standard deviation shows the relation that a set of scores has with
the mean of the sample. It is expressed as the positive square root of the
sum of the squared deviations from the mean divided by the number of
scores minus one. Informally, it is the typical distance between the
observed values and the mean. The standard deviation is used when
dispersion is to be expressed in the same unit as the original
measurement. It is designated as (σ).
The researcher used the data analysis tools available in Microsoft®
Excel 2007 to calculate the S.D. The function is found under
Formulas/More Functions/Statistical/STDEV.
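As a worked illustration of the range and of the standard deviation as defined above (dividing by the number of scores minus one), the sketch below computes both by hand on invented scores and checks the result against `statistics.stdev` from Python's standard library, which uses the same divisor.

```python
import math
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores

value_range = max(scores) - min(scores)

# Sample standard deviation written out: squared deviations from the mean,
# summed, divided by (n - 1), then square-rooted.
mean = sum(scores) / len(scores)
squared_deviations = sum((x - mean) ** 2 for x in scores)
manual_sd = math.sqrt(squared_deviations / (len(scores) - 1))

print("range:", value_range)
print("sample SD (manual):", round(manual_sd, 3))
print("sample SD (statistics.stdev):", round(statistics.stdev(scores), 3))
```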
Skewness: Many times it is seen that the mean, median and mode of the
distribution don’t fall at the same place, i.e. the scores may extend much
farther in one direction than the other. Such a distribution is called a
skewed distribution.
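Skewness can be given a number. The sketch below uses the simple moment coefficient of skewness (third central moment divided by the second central moment raised to the power 1.5); this is an assumption made for illustration, and spreadsheet functions may apply a sample-size adjustment and so return slightly different values. The data are invented.

```python
import statistics


def skewness(values):
    """Moment coefficient of skewness: third central moment / (second ** 1.5)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 3, 4, 10]  # one large value stretches the right tail

for name, data in [("symmetric", symmetric), ("right-skewed", right_skewed)]:
    print(name,
          "mean:", statistics.mean(data),
          "median:", statistics.median(data),
          "skewness:", round(skewness(data), 3))
```

For the symmetric data the mean and median coincide and the coefficient is zero; for the right-skewed data the mean lies above the median and the coefficient is positive.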
The researcher used the data analysis tools available in Microsoft®
Excel 2007 to calculate the kurtosis. The function is found under
Formulas/More Functions/Statistical/KURT.
In order to choose the right descriptive statistics tool, one needs to
consider the types and the number of variables available as well as the
objective of the analysis. Based on these three criteria, a grid can be
generated that helps to decide which tool to use in each situation.
The second column indicates the number of variables. The proposed tools
can handle either the description of one (univariate analysis) or the
description of the relationships between two (bivariate analysis) or several
variables. The grid also includes a column with an example for each
situation.
Grid
Please note that the list below is not exhaustive. However, it contains the
most commonly used descriptive statistics, all available in XLSTAT.
The findings from descriptive analytics can quickly identify areas that
require improvement - whether that be improving learner engagement or
the effectiveness of course delivery.
Here are some examples of how descriptive analytics is being used in the
field of learning analytics:
• Tracking course enrolments, course compliance rates,
• Recording which learning resources are accessed and how often
• Summarizing the number of times a learner posts in a discussion board
• Tracking assignment and assessment grades
• Comparing pre-test and post-test assessments
• Analyzing course completion rates by learner or by course
• Collating course survey results
• Identifying length of time that learners took to complete a course
When learners engage in online learning, they leave a digital trace behind
with every interaction they have in the learning environment. This means
that descriptive analytics in online learning can gain insight into behaviours
and performance indicators that would otherwise not be known.
• Analyze the value and impact of course design and learning resources.
The HR analytics journey within Coca-Cola Enterprises (CCE) really began
in 2010. Given the complexity of the CCE operation, its global footprint and
various business units, a team was needed which was able to provide a
centralised HR reporting and analytics service to the business. This led to
the formation of an HR analytics team serving 8 countries. As a new team,
they had the opportunity to work closely with the HR function to
understand its needs and build a team capable not only of delivering
those requirements but also of challenging the status quo.
The first step was to establish strong foundations for the new data
analytics programme.
It was imperative to get the basics right, enhance credibility, and automate
as many of the basic descriptive reports as possible. The sheer number of
requests the team received was preventing them from adding value and
providing more sophisticated reports and scorecards.
“In the early stages requests were very basic. For example, how many
people am I supporting? How many people have started or left? How many
promotions have there been in my part of the organisation? The majority
of requests were therefore very descriptive in their nature. There was an
obvious need to automate as much as we could, because if we could not
free ourselves of that kind of transactional reporting, there was no way we
were going to add any value with analytics.”
The team soon found that the more they provided reports, the more
internal recognition they received. This ultimately created a thirst within
HR for more data and metrics for measuring the performance of the
organisation from an HR perspective. The HR analytics function knew this
was an important next step but it wasn’t where they wanted the journey to
end. They looked for technology that would allow them to automate as
many of these metrics as possible whilst having the capability to combine
multiple HR systems and data sources.
A breakthrough, and the next key milestone in the journey for CCE, was
when they invested in an "out of the box" system which provided them
with standard metrics and measures, and enabled quick and simple
descriptive analytics.
Instead of building a new set of standards from scratch, CCE piloted pre-
existing measures within the application and applied these to their data.
The result was that the capability to deliver more sophisticated descriptive
analytics was realised quicker and began delivering results sooner than
CCE business customers had expected.
“We were able to segment tasks based on the skill set of the team. This
created a natural talent development pipeline and ensured the right skill
set was dedicated to the appropriate task. This freed up time for some of
the team to focus on workforce analytics. We implemented a solution that
combines data from various sources, whether it is our HR system, the case
management system for the service centre, or our on-boarding/recruitment
tools. We brought all that data in to one central area and
developed a lot of ratios and measures. That really took it to the next
level.”
You can really help by extracting the right questions. If you have the right
question, then the analysis you are going to complete will be meaningful
and insightful.”
There are numerous examples where the HR reporting and analytics team
have partnered with the HR function and provided insights that have
helped to develop more impactful HR processes and deliver greater
outcomes for the business. As with many organisations it is the
engagement data with which the majority of HR insight is created.
Developing further insight beyond standard survey outputs has meant that
CCE has begun to increase the level of insight developed through the
method, and by using longitudinal data they have started to track
sentiment in the organisation. Tracking sentiment alongside other
measures provides leaders with a good indicator for sense-checking the
power of HR initiatives and general business processes. The question is
whether the relationship between engagement and business results is
causal or correlative. For CCE this point is important when explaining the
implications of HR data insights to the rest of the business.
For CCE's analytics team one of the most important next steps is to share
the experience and knowledge gained from developing the analytics
function with their colleagues, and build capability across HR.
“We are also reviewing the learning and development curriculum for HR to
see what skills and competencies we need to build. One of the
competencies that we have introduced is HR professionals being data
analysers.”
• Barriers
As with any long journey, the analytics team at CCE have faced numerous
barriers. The challenges they list are common to most HR professionals
attempting to establish a significant new process, but the greatest
difficulty at CCE has been establishing new capability and embedding
fit-for-purpose technologies.
‘’In terms of barriers, technology is one. For example having the right data
warehouse in place that allows you to extract the data very quickly. From a
HR perspective we are well placed, however extracting data from the rest
of the business, is a challenge. At CCE HR is trying to branch out and get
the data from other parts of the business, which is probably quite unusual.
People probably do not expect HR to be that kind of driving force.’’
‘’At conferences I have listened to major firms who have PhD students in
their business intelligence teams, who appear to be very good at not only
analytics but also presenting information. They are few and far between and I
believe that people who have that skill set would not naturally go into HR.
If I reference the recent big data conference I went to, and the projects
that some of these companies were doing outside of HR with customer
data, Twitter data, really what I would call ‘big data,’ it may seem a lot
more appetising and appealing than HR analytics. If I was a PhD student, I
am not sure I would consider HR as a place to go to develop my career and
also, whether I would see any longevity in it. As a function we need to
change that.’’
If you think about the 2020 workplace, the issues that we have around
leadership development, multi-generational workforces, people not staying
with companies for as long as they have done in the past, there are a lot of
challenges out there for HR. These are all areas where the use of HR
analytics can provide the business with valuable insights.’’
For CCE it appears that analytics and HR insight are gaining significant
traction within the organisation. Leaders are engaging at all levels and the
HR function is increasingly sharing insights across business boundaries.
This hasn't been without its challenges: CCE face HR's perennial issues of
technology and the perceived lack of analytics capability. However, their
approach of creating quality data sets and automated reporting processes
has provided them with the foundations and opportunity to begin to
develop real centres of expertise capable of providing high quality insight
to the organisation. It is clear CCE remains focused on continuing its HR
analytical journey.
The future of Data Analytics lies in not only describing what has happened,
but in accurately predicting what might happen in the future. This claim is
explained in the article titled The Future of Analytics Is Prescriptive, Not
Predictive. This article cites a GPS navigation system, where Descriptive
Analytics is used to provide directional cues. However, such analysis is
reinforced by “Predictive Analytics” offering important details about the
journey like the time duration. Now, if the GPS system is further powered
by Prescriptive Analytics, then the navigation system will not only provide
directions and time, but also the quickest way to reach the destination. The
best part of such a super-charged navigation system is that it can even
compare several traveling routes and recommend the best solution.
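The GPS analogy can be made concrete with a small sketch. The route names, distances and speeds below are hypothetical values chosen only for illustration, and the three functions are just one possible way of separating the descriptive, predictive and prescriptive steps.

```python
# Hypothetical routes: (name, distance in km, expected average speed in km/h).
ROUTES = [
    ("Highway", 30.0, 60.0),
    ("City centre", 18.0, 24.0),
    ("Ring road", 36.0, 90.0),
]


def describe(routes):
    """Descriptive step: report a known fact about each route (its distance)."""
    return {name: distance for name, distance, _ in routes}


def predict_minutes(routes):
    """Predictive step: estimate the travel time of each route in minutes."""
    return {name: distance / speed * 60 for name, distance, speed in routes}


def prescribe(routes):
    """Prescriptive step: compare the routes and recommend the quickest one."""
    estimates = predict_minutes(routes)
    return min(estimates, key=estimates.get)


print(describe(ROUTES))
print(predict_minutes(ROUTES))
print("Recommended route:", prescribe(ROUTES))
```

The prescriptive step reuses the output of the predictive step, which mirrors the way each level of analytics builds on the one before it.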
7.10 SUMMARY
The future of Data Analytics lies in not only describing what has happened,
but in accurately predicting what might happen in the future. This claim is
explained in the article titled The Future of Analytics Is Prescriptive, Not
Predictive. This article cites a GPS navigation system, where Descriptive
Analytics is used to provide directional cues. However, such analysis is
reinforced by “Predictive Analytics” offering important details about the
journey like the time duration. Now, if the GPS system is further powered
by Prescriptive Analytics, then the navigation system will not only provide
directions and time, but also the quickest way to reach the destination. The
best part of such a super-charged navigation system is that it can even
compare several traveling routes and recommend the best solution.
DIAGNOSTIC ANALYTICS
Chapter 8
Diagnostic Analytics
Objectives:
Structure:
8.1 Introduction
8.2 Definition
8.7 Summary
8.1 INTRODUCTION
Diagnostic analytics sits within a spectrum of analysis that runs from
basic to more complex. In this chapter we discuss diagnostic analytics,
which is the most abstract of the phases of analysis. It answers the
question of “why”. Why are things happening? What is driving things to go
up, or down, or anything along those lines?
All questions of this kind, such as why things are happening and what is
causing them, are answered by diagnostic analytics.
8.2 DEFINITION
Indirect tax analytics is a wide term used to describe the identification and
analysis of indirect tax issues through data interrogation. Approaches
range from one-off reports for specific issues, to custom-made tools
developed for in-house use, to continuous monitoring by third-party
providers.
The complexity of indirect tax reporting means that many — and possibly
most — multinational companies have significant indirect tax exposures. At
the same time, many are becoming aware of the cost of indirect taxes —
including high duty rates, unclaimed input VAT/GST, penalties and the costs
of financing indirect tax payments.
Tax and customs administrations are also becoming more active and more
sophisticated in their methods of auditing large companies’ indirect tax
affairs. But, until recently, most companies have not been able to identify
the major indirect tax risks they carry, nor have they been in a position to
optimize their working capital and cash flow on a global basis.
• As companies begin to outsource tax compliance and run their own data
warehousing and dashboarding solutions, their analysis of tax and trade
data is becoming much more proactive. And as companies use data
analysis tools more effectively, and their understanding improves,
processes become more streamlined, response times fall, opportunities
increase and the number of unpleasant tax surprises drops considerably.
Diagnostic analytics lets you understand your data faster to answer critical
workforce questions. Cornerstone View provides the fastest and simplest
way for organizations to gain more meaningful insight into their employees
and solve complex workforce issues. Interactive data visualization tools
allow managers to easily search, filter and compare people by centralizing
information from across the Cornerstone unified talent management suite.
For example, users can find the right candidate to fill a position, select high
potential employees for succession, and quickly compare succession
metrics and performance reviews across select employees to reveal
meaningful insights about talent pools. Filters also allow for a snapshot of
employees across multiple categories such as location, division,
performance and tenure.
2. Drill into the analytics (discovery): Analysts must identify the data
sources that will help them explain these anomalies. Often, this step
requires analysts to look for patterns outside the existing data sets, and
it might require pulling in data from external sources to identify
correlations and determine if any of them are causal in nature.
In the past, all of these functions would be completely manual; they would
rely on the abilities of an analyst to identify anomalies, detect patterns, and
determine relationships. In that setting, a few of the most experienced
analysts would outperform their peers. However, even those top analysts
wouldn’t be able to guarantee consistency of results. As data volume,
variety, and velocity have increased, such purely manual efforts for
diagnostic analytics are no longer feasible.
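As a minimal sketch of how the anomaly-identification step might be automated, the example below flags values whose z-score exceeds a threshold. The data, the function name and the threshold of 2.0 are assumptions made for illustration; real tools typically use more robust methods.

```python
from statistics import mean, stdev


def flag_anomalies(values, threshold=2.0):
    """Return the indices whose z-score magnitude exceeds the threshold."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical daily order counts; the value at index 6 is far from the rest.
daily_orders = [102, 98, 101, 99, 103, 100, 160, 97, 102, 101]
print(flag_anomalies(daily_orders))
```

A rule like this applies the same test to every data point, which is the kind of consistency a purely manual review cannot guarantee. The analyst still has to explain why the flagged value occurred.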
people. Just as machines can be used to help reduce the bias in human
decision making, so should people be used to contextualize the outputs of
machine decision making.
The world revolves around data, and every industry uses analytics to make
informed decisions. However, lack of understanding of what advanced
analytics is, and is not, dissuades organizations from examining their
potential to improve supply chain processes.
The analytics value chain typically starts with gathering data from all
possible relevant sources. These are analysed in real time to answer the
“What” question. For each “What” reply, the related historical data is then
dissected further to understand the reasons why it was happening. This is
what we call “Diagnostic Analytics”.
The results of this stage are then investigated to predict which
(desired) outcomes can potentially be created. All this distilled
information is used to arrive at actionable insights that create business
value.
The process starts at the descriptive analytics phase and moves into the
predictive analytics stage.
The analyst also needs to make it clear what data is relevant to the
analysis so that the relationship between the two data sets is clear.
First, you look for a root cause. Perhaps there was a change in ad spend, a
rise in cart abandonments, or even a change in Google’s algorithm which
has affected your web traffic.
Finding nothing, you then look at one of the data sets which contribute to
revenue: impressions, clicks, conversions, and new customer sign-ups.
You discover from the data that changes in revenue closely track changes
in new customer sign-ups, and so you isolate these two data series in a
graph showing the relationship. This then leaves you, or one of your
colleagues, to conduct diagnostic analysis on user registrations to find out
why they are down.
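The drill-down described above can be sketched as a comparison of each candidate series against revenue. The weekly figures below are invented for illustration, and the Pearson correlation coefficient is used here as one simple, assumed measure of how closely two series track each other.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical weekly figures, invented for illustration only.
revenue = [120, 118, 115, 104, 96, 90]
drivers = {
    "impressions": [5000, 5100, 4950, 5050, 5000, 5080],
    "clicks": [400, 390, 410, 405, 395, 400],
    "conversions": [40, 41, 39, 40, 41, 40],
    "new_signups": [60, 59, 57, 51, 47, 44],
}

# Score every candidate series by how closely it tracks revenue.
scores = {name: pearson(series, revenue) for name, series in drivers.items()}
closest = max(scores, key=lambda name: abs(scores[name]))

for name, r in sorted(scores.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:12s} r = {r:+.2f}")
print("Closest match:", closest)
```

The series with the strongest correlation is only a candidate for further diagnosis; the score by itself does not show that it causes the change in revenue.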
2. Do the analysis
More complex analyses, however, may require multiple data sets and the
search for a correlation using regression analysis.
How to carry out regression analysis is beyond the scope of this chapter.
What you are trying to accomplish in this step is to find a statistically valid
relationship between two data sets, where the rise (or fall) in one causes a
rise (or fall) in another.
More advanced techniques in this area include data mining and principal
component analysis, but straightforward regression analysis is a great
place to get started.
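Although a full treatment of regression is outside this chapter, a minimal sketch may help to show what a straightforward regression involves. The example fits a straight line by ordinary least squares; the ad-spend and sign-up figures, and the function names, are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys that the fitted line explains."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data: weekly ad spend (in thousands) and new customer sign-ups.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
signups = [12.0, 19.0, 31.0, 42.0, 48.0]

slope, intercept = fit_line(ad_spend, signups)
fit_quality = r_squared(ad_spend, signups, slope, intercept)
print(f"sign-ups = {intercept:.1f} + {slope:.1f} * ad_spend")
print(f"R^2 = {fit_quality:.3f}")
```

A high R-squared value indicates that the line fits the observed data well. It does not, on its own, establish that one variable causes the other.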
It does not have to include all of the background work, but you should:
Here are a few more things to keep in mind when doing diagnostic
analytics.
Correlation does not prove causation. Correlation will tell you when
two variables (say clicks and conversions) move in sync with one
another.
Analysts can do better, though. They can provide further insights into the
data by using diagnostic analytics to try and explain why certain things
happen.
Not only will this help the user to understand why some decisions have
been made, but it also provides evidence that the report writer
understands the data and the point of collecting it. That is, we collect data
so that we can make better-informed decisions through analytics.
8.7 SUMMARY
The world revolves around data, and every industry uses analytics to make
informed decisions. However, lack of understanding of what advanced
analytics is, and is not, dissuades organizations from examining their
potential to improve supply chain processes.
The analytics value chain typically starts with gathering data from all
possible relevant sources. These are analysed in real time to answer the
“What” question. For each “What” reply, the related historical data is then
dissected further to understand the reasons why it was happening. This is
what we call “Diagnostic Analytics”.
The results of this stage are then investigated to predict which
(desired) outcomes can potentially be created. All this distilled
information is used to arrive at actionable insights that create business
value.
The process starts at the descriptive analytics phase and moves into the
predictive analytics stage.
Analysts can do better, though. They can provide further insights into the
data by using diagnostic analytics to try and explain why certain things
happen.
PREDICTIVE ANALYTICS
Chapter 9
Predictive Analytics
Objectives:
Structure:
9.1 Introduction
9.2 Definition
9.10 Summary
9.1 INTRODUCTION
Predictive analytics starts with a business goal to use data to reduce waste,
save time, or cut costs. The process harnesses heterogeneous, often
massive, data sets into models that can generate clear, actionable
outcomes to support achieving that goal, such as less material waste, less
stocked inventory, and manufactured product that meets specifications.
Predictive analytics draws its power from a wide range of
methods and technologies, including big data, data mining, statistical
modeling, machine learning and assorted mathematical processes.
Organizations use predictive analytics to sift through current and historical
data to detect trends and forecast events and conditions that should occur
at a specific time, based on supplied parameters.
Increasing Competition
With increased competition, businesses seek an edge in bringing products
and services to crowded markets. Data-driven predictive models can help
companies solve long-standing problems in new ways.
To extract value from big data, businesses apply algorithms to large data
sets using tools such as Hadoop and Spark. The data sources might consist
of transactional databases, equipment log files, images, video, audio,
sensor, or other types of data. Innovation often comes from combining
data from several sources.
With all this data, tools are necessary to extract insights and
trends. Machine learning techniques are used to find patterns in data and
to build models that predict future outcomes. A variety of machine learning
algorithms are available, including linear and nonlinear regression, neural
networks, support vector machines, decision trees, and other algorithms.
9.2 DEFINITION
1. Why It Matters
2. How It Works
Let us presume that we are all familiar with predictive models for weather
forecasting. A vital industry application of predictive models relates to
energy load forecasting to predict energy demand.
In this case, energy producers, grid operators, and traders need accurate
forecasts of energy load to make decisions for managing loads in the
electric grid. Vast amounts of data are available, and using predictive
analytics, grid operators can turn this information into actionable insights.
Now let us look at a step-by-step workflow for predicting energy loads.
Typically, the workflow for a predictive analytics application follows
these basic steps:
Predictive analytics makes looking into the future more accurate and
reliable than previous tools. As such it can help adopters find ways to save
and earn money. Retailers often use predictive models to forecast
inventory requirements, manage shipping schedules and configure store
layouts to maximize sales. Airlines frequently use predictive analytics to set
ticket prices reflecting past travel trends. Hotels, restaurants and other
hospitality industry players can use the technology to forecast the number
of guests on any given night in order to maximize occupancy and revenue.
Predictive analytics can also be used to detect and halt various types of
criminal behaviour before any serious damage is inflicted. By using
predictive analytics to study user behaviours and actions, an organization
can detect activities that are out of the ordinary, ranging from credit card
fraud to corporate spying to cyberattacks.
The financial industry, with huge amounts of data and money at stake, has
long embraced predictive analytics to detect and reduce fraud, measure
credit risk, maximize cross-sell/up-sell opportunities and retain valuable
customers. Commonwealth Bank uses analytics to predict the likelihood of
fraud activity for any given transaction before it is authorized – within 40
milliseconds of the transaction initiation.
Develop credit risk models. Forecast financial market trends. Predict the
impact of new policies, laws and regulations on businesses and markets.
• Retail
Since the now infamous study that showed men who buy diapers often buy
beer at the same time, retailers everywhere are using predictive analytics
for merchandise planning and price optimization, to analyze the
effectiveness of promotional events and to determine which offers are most
appropriate for consumers. Staples gained customer insight by analyzing
behaviour, providing a complete picture of their customers, and realizing a
137 percent ROI.
• Health Insurance
In addition to detecting claims fraud, the health insurance industry is
taking steps to identify patients most at risk of chronic disease and find
what interventions are best. Express Scripts, a large pharmacy benefits
company, uses analytics to identify those not adhering to prescribed
treatments, resulting in a savings of $1,500 to $9,000 per patient.
• Manufacturing
For manufacturers it's very important to identify factors leading to reduced
quality and production failures, as well as to optimize parts, service
resources and distribution. Lenovo is just one manufacturer that has used
predictive analytics to better understand warranty claims – an initiative
that led to a 10 to 15 percent reduction in warranty costs.
• Automotive
Breaking new ground with autonomous vehicles: companies developing
driver assistance technology and new autonomous vehicles use predictive
analytics to analyze sensor data from connected vehicles and to build
driver assistance algorithms.
• Aerospace
Monitoring aircraft engine health: to improve aircraft up-time and reduce
maintenance costs, an engine manufacturer created a real-time analytics
application to predict subsystem performance for oil, fuel, lift-off,
mechanical health, and controls.
Predictive analytics tools give users deep, real-time insights into an almost
endless array of business activities. Tools can be used to predict various
types of behaviour and patterns, such as how to allocate resources at
particular times, when to replenish stock or the best moment to launch a
marketing campaign, basing predictions on an analysis of data collected
over a period of time.
Virtually all predictive analytics adopters use tools provided by one or more
external developers. Many such tools are tailored to meet the needs of
specific enterprises and departments.
Predictive models use known results to develop (or train) a model that can
be used to predict values for different or new data. Modeling provides
results in the form of predictions that represent a probability of the target
variable (for example, revenue) based on estimated significance from a set
of input variables.
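The idea of training on known results and then predicting a probability for new data can be sketched with a very small logistic regression fitted by gradient descent. The visit counts and purchase outcomes are hypothetical, and the learning rate and number of epochs are assumed values chosen only so that this toy example converges.

```python
from math import exp


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + exp(-z))


def train(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit p(y = 1 | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def predict(model, x):
    """Predicted probability of the target outcome for a new input value."""
    w, b = model
    return sigmoid(w * x + b)


# Known results (hypothetical): number of site visits per customer and
# whether that customer went on to buy (1) or not (0).
visits = [1, 2, 3, 4, 5, 6, 7, 8]
bought = [0, 0, 0, 1, 0, 1, 1, 1]

model = train(visits, bought)
for new_visits in (2, 5, 9):
    print(new_visits, round(predict(model, new_visits), 2))
```

The output is a probability rather than a certainty, which is why such predictions are best read as estimates of likelihood for the new cases.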
This is different from descriptive models that help you understand what
happened, or diagnostic models that help you understand key relationships
and determine why something happened. Entire books are devoted to
analytical methods and techniques. Complete college curriculums delve
deeply into this subject. But for starters, here are a few basics.
Three of the most widely used predictive modeling techniques are decision
trees, regression and neural networks.
II. Decision trees are classification models that partition data into
subsets based on categories of input variables. This helps you
understand someone's path of decisions. A decision tree looks like a tree
with each branch representing a choice between a number of
alternatives, and each leaf representing a classification or decision. This
model looks at the data and tries to find the one variable that splits the
data into logical groups that are the most different. Decision trees are
popular because they are easy to understand and interpret. They also
handle missing values well and are useful for preliminary variable
selection. So, if you have a lot of missing values or want a quick and
easily interpretable answer, you can start with a tree.
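To illustrate how a tree looks for the one variable that splits the data into the most different groups, the sketch below evaluates every possible single test and keeps the one with the lowest weighted Gini impurity. The customer records are invented, and Gini impurity is an assumed choice of splitting criterion; other criteria exist.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(rows, target):
    """Return (impurity, feature, value) for the best 'feature == value' test."""
    best = None
    features = [key for key in rows[0] if key != target]
    for feature in features:
        for value in sorted({row[feature] for row in rows}):
            left = [row[target] for row in rows if row[feature] == value]
            right = [row[target] for row in rows if row[feature] != value]
            # Weighted impurity of the two groups produced by this test.
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, feature, value)
    return best


# Hypothetical customer records, invented for illustration.
customers = [
    {"segment": "new", "channel": "web", "churned": "yes"},
    {"segment": "new", "channel": "store", "churned": "yes"},
    {"segment": "new", "channel": "web", "churned": "yes"},
    {"segment": "loyal", "channel": "web", "churned": "no"},
    {"segment": "loyal", "channel": "store", "churned": "no"},
    {"segment": "loyal", "channel": "store", "churned": "no"},
]

impurity, feature, value = best_split(customers, "churned")
print(f"Split on {feature} == {value!r} (weighted Gini = {impurity:.2f})")
```

A full decision tree repeats this search within each resulting group until the groups are pure enough or a stopping rule is reached.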
While getting started in predictive analytics isn't exactly a snap, it's a task
that virtually any business can handle as long as one remains committed to
the approach and is willing to invest the time and funds necessary to get
the project moving. Beginning with a limited-scale pilot project in a critical
business area is an excellent way to cap start-up costs while minimizing
the time before financial rewards begin rolling in. Once a model is put into
action, it generally requires little upkeep as it continues to grind out
actionable insights for many years.
Detecting Fraud
Optimising Marketing Campaigns
Improving Operations
Reducing Risk
Baker Hughes trucks are equipped with positive displacement pumps that
inject a mixture of water and sand deep into drilled wells. With pumps
accounting for about $100,000 of the $1.5 million total cost of the truck,
Baker Hughes needed to determine when a pump was about to fail. They
processed and analysed up to a terabyte of data collected at 50,000
samples per second from sensors installed on 10 trucks operating in the
field, and trained a neural network to use sensor data to predict pump
failures. The software reduced maintenance costs by 30–40%—or more
than $10 million.
Optimization is used to determine the best schedule for heating and cooling
each building throughout the day. The Building IQ platform reduces HVAC
energy consumption in large-scale commercial buildings by 10–25% during
normal operation.
Using MATLAB tools and functions, one can perform predictive analytics
with engineering, scientific, and field data, as well as business and
transactional data. With MATLAB, you can deploy predictive applications to
large-scale production systems, and embedded systems.
9.10 SUMMARY
Though predictive analytics has been around for decades, it's a technology
whose time has come. More and more organizations are turning to
predictive analytics to increase their bottom line and competitive
advantage. This is driven by growing volumes and types of data, and by
more interest in using data to produce valuable insights.
Predictive analytics tools give users deep, real-time insights into an almost
endless array of business activities. Tools can be used to predict various
types of behaviour and patterns, such as how to allocate resources at
particular times, when to replenish stock or the best moment to launch a
marketing campaign, basing predictions on an analysis of data collected
over a period of time.
Virtually all predictive analytics adopters use tools provided by one or more
external developers. Many such tools are tailored to meet the needs of
specific enterprises and departments.
The financial industry, with huge amounts of data and money at stake, has
long embraced predictive analytics to detect and reduce fraud, measure
credit risk, maximize cross-sell/up-sell opportunities and retain valuable
customers. Commonwealth Bank uses analytics to predict the likelihood of
fraud activity for any given transaction before it is authorized, within 40
milliseconds of the transaction initiation.
Predictive models use known results to develop (or train) a model that can
be used to predict values for different or new data. Modeling provides
results in the form of predictions that represent a probability of the target
variable (for example, revenue) based on estimated significance from a set
of input variables.
This is different from descriptive models that help you understand what
happened, or diagnostic models that help you understand key relationships
and determine why something happened.
PRESCRIPTIVE ANALYTICS
Chapter 10
Prescriptive Analytics
Objectives:
Structure:
10.1 Introduction
Prescriptive Analytics
R)
10.8 Summary
10.1 INTRODUCTION
The field borrows heavily from mathematics and computer science, using a
variety of statistical methods to create and re-create possible decision
patterns that could affect an organization in different ways.
With the flood of data available to businesses regarding their supply chain
these days, companies are turning to analytics solutions to extract
meaning from the huge volumes of data to help improve decision making.
Looking at all the analytic options can be a daunting task. Luckily,
these analytic options can be categorized at a high level into three distinct
types. No one type of analytic is better than another, and in fact they
co-exist with, and complement, each other.
For a business to have a holistic view of the market and of how a
company competes efficiently within that market, it requires a robust
analytic environment which includes:
Descriptive analysis or statistics do exactly what the name implies: they
“describe”, or summarize, raw data and make it something that is
interpretable by humans. They are analytics that describe the past. The
past refers to any point of time that an event has occurred, whether it is
one minute ago, or one year ago. Descriptive analytics are useful because
they allow us to learn from past behaviours, and understand how they
might influence future outcomes.
The vast majority of the statistics we use fall into this category. (Think
basic arithmetic like sums, averages, percent changes.) Usually, the
underlying data is a count, or aggregate of a filtered column of data to
which basic math is applied.
For all practical purposes, there are an infinite number of these statistics.
Descriptive statistics are useful to show things like total stock in inventory,
average dollars spent per customer and year-over-year change in sales.
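The three descriptive statistics just mentioned involve nothing more than basic arithmetic, as the short sketch below shows. All figures and variable names are hypothetical.

```python
# Hypothetical records, invented for illustration.
stock_by_item = {"widgets": 120, "gadgets": 75, "gizmos": 30}
spend_by_customer = {"C001": 250.0, "C002": 100.0, "C003": 175.0, "C004": 75.0}
sales_last_year = 400_000.0
sales_this_year = 460_000.0

# Sum: total stock in inventory.
total_stock = sum(stock_by_item.values())

# Average: dollars spent per customer.
average_spend = sum(spend_by_customer.values()) / len(spend_by_customer)

# Percent change: year-over-year change in sales.
yoy_change_pct = (sales_this_year - sales_last_year) / sales_last_year * 100

print(f"Total stock in inventory: {total_stock}")
print(f"Average spend per customer: {average_spend:.2f}")
print(f"Year-over-year change in sales: {yoy_change_pct:+.1f}%")
```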
Predictive analytics has its roots in the ability to “predict” what might
happen. These analytics are about understanding the future. Predictive
analytics provides companies with actionable insights based on data.
Predictive analytics provides estimates about the likelihood of a future
outcome. It is important to remember that no statistical algorithm can
“predict” the future with 100% certainty, because the foundation of
predictive analytics is based on probabilities. Companies use these
statistics to forecast what might happen in the future.
These statistics try to take the data that you have, and fill in the missing
data with best guesses. They combine historical data found in ERP, CRM,
HR and POS systems to identify patterns in the data and apply statistical
models and algorithms to capture relationships between various data sets.
Companies use predictive statistics and analytics anytime they want to look
into the future. Predictive analytics can be used throughout the
organization, from forecasting customer behaviour and purchasing patterns
to identifying trends in sales activities. They also help forecast demand
for inputs from the supply chain, operations and inventory.
One common application most people are familiar with is the use of
predictive analytics to produce a credit score. These scores are used by
financial services to determine the probability of customers making future
credit payments on time. Typical business uses include understanding how
sales might close at the end of the year, predicting what items customers
will purchase together, or forecasting inventory levels based upon a myriad
of variables.
Therefore, use predictive analytics any time you need to know something
about the future, or to fill in the information that you do not have.
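One of the business uses mentioned above, predicting what items customers will purchase together, can be sketched by counting how often pairs of items share a basket. The baskets are invented for illustration, and the simple confidence ratio used here is an assumed, basic measure rather than a full market-basket method.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented for illustration.
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"diapers", "beer", "milk"},
]

# How many baskets contain each item, and each (alphabetically ordered) pair.
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)


def confidence(antecedent, consequent):
    """Estimated chance that a basket containing `antecedent` also has `consequent`."""
    pair = tuple(sorted((antecedent, consequent)))
    return pair_counts[pair] / item_counts[antecedent]


print(pair_counts.most_common(3))
print(f"P(beer | diapers) = {confidence('diapers', 'beer'):.2f}")
```

The ratio is an estimate drawn from past baskets, so it describes a likelihood for future purchases and not a guarantee.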
including historical and transactional data, real-time data feeds, and big
data.
Most modern BI tools have prescriptive analytics built in and provide users
with actionable results that empower them to make better decisions. One
of the more interesting applications of prescriptive analytics is in oil and
gas management, where prices fluctuate almost by the second based on
ever-changing political, environmental, and demand conditions.
Big Data gets a lot of buzz in the business world. It's true that data
analytics can give you deep, useful insights into your business and its
customers, but only if you use those insights to their full potential.
Predictive and prescriptive analytics are the next steps that help you turn
descriptive metrics into insights and decisions. But you shouldn't rely on
just one or the other; when used in conjunction, both types of analytics
can help you create the strongest, most effective business strategy
possible.
Analytics in action
Prescriptive analytics not only anticipates what will happen and when it will
happen, but also why it will happen. Further, prescriptive analytics
suggests decision options on how to take advantage of a future opportunity
or mitigate a future risk and shows the implication of each decision option.
Prescriptive analytics can continually take in new data to re-predict and re-
prescribe, thus automatically improving prediction accuracy and prescribing
better decision options. Prescriptive analytics ingests hybrid data, a
combination of structured (numbers, categories) and unstructured data
(videos, images, sounds, texts), and business rules to predict what lies
ahead and to prescribe how to take advantage of this predicted future
without compromising other priorities.
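A minimal sketch of this idea: given several decision options, each with predicted implications, a prescriptive step filters out the options that break a business rule and recommends the best of the remainder. The options, the predicted figures and the 15% risk limit are all hypothetical assumptions.

```python
# Hypothetical decision options with predicted outcomes, invented for illustration.
options = [
    {"name": "Hold price", "predicted_profit": 100_000, "predicted_stockout_risk": 0.05},
    {"name": "Cut price 10%", "predicted_profit": 130_000, "predicted_stockout_risk": 0.30},
    {"name": "Cut price 5%", "predicted_profit": 118_000, "predicted_stockout_risk": 0.12},
]

# Business rule (an assumption): stock-out risk must not exceed 15%.
MAX_STOCKOUT_RISK = 0.15


def prescribe(candidates, max_risk):
    """Recommend the most profitable option that satisfies the business rule."""
    feasible = [o for o in candidates if o["predicted_stockout_risk"] <= max_risk]
    if not feasible:
        return None  # no option respects the rule
    return max(feasible, key=lambda o: o["predicted_profit"])


# Show the implication of each option before making the recommendation.
for option in options:
    allowed = option["predicted_stockout_risk"] <= MAX_STOCKOUT_RISK
    status = "allowed" if allowed else "ruled out"
    print(option["name"], option["predicted_profit"], status)

best = prescribe(options, MAX_STOCKOUT_RISK)
print("Recommended:", best["name"])
```

The most profitable option overall is ruled out by the business rule, which shows how a prescription balances a predicted opportunity against other priorities.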
In addition to this variety of data types and growing data volume, incoming
data can also evolve with respect to velocity, that is, more data being
generated at a faster or a variable pace. Business rules define the business
process and include objectives, constraints, preferences, policies, best
practices, and boundaries. Mathematical models and computational models
are techniques derived from mathematical sciences, computer science and
related disciplines such as applied statistics, machine learning, operations
research, natural language processing, computer vision, pattern
recognition, image processing, speech recognition, and signal processing.
The correct application of all these methods and the verification of their
results implies the need for resources on a massive scale including human,
computational and temporal for every Prescriptive Analytic project. In
order to spare the expense of dozens of people, high performance
machines and weeks of work one must consider the reduction of resources
and therefore a reduction in the accuracy or reliability of the outcome. The
preferable route is a reduction that produces a probabilistic result within
acceptable limits.
A. Example-1
Energy is the largest industry in the world ($6 trillion in size). The
processes and decisions related to oil and natural gas exploration,
development and production generate large amounts of data. Many types
of captured data are used to create models and images of the Earth’s
structure and layers 5,000 - 35,000 feet below the surface and to describe
activities around the wells themselves, such as depositional characteristics,
machinery performance, oil flow rates, reservoir temperatures and
pressures. Prescriptive analytics software can help with both locating and
producing hydrocarbons by taking in seismic data, well log data, production
data, and other related data sets to prescribe specific recipes for how and
where to drill, complete, and produce wells in order to optimize recovery,
minimize cost, and reduce environmental footprint.
• Pricing
Example-2
“What are the different branches of analytics?” Most of us, when we’re
starting out on our analytics journey, are taught that there are two types –
descriptive analytics and predictive analytics. There’s actually a third
branch which is often overlooked – prescriptive analytics.
Prescriptive analytics is the most powerful branch among the three. Let us
understand with an example.
Recently, a deadly cyclone hit Odisha, India, but thankfully most people
had already been evacuated. The Odisha meteorological department had
predicted the arrival of the monstrous cyclone and made the life-saving
decision to evacuate the regions likely to be affected.
Contrast that with 1999, when more than 10,000 people died because of a
similar cyclone. They were caught unaware since there was no prediction
about the coming storm. So what changed?
The effort to retain customers has so far been very reactive. We take
action only when the customer calls to close their account. The
management team is keen to take more proactive measures on this front.
Data scientists are tasked with analyzing their data, deriving insights,
predicting the potential behaviour of customers, and then recommending
steps to improve performance.
• Hypothesis Generation
Suppose you have to test the same for a telecom provider. Typically, you
have to encourage the company to come up with an exhaustive set of
hypotheses so as not to leave out any variables or major points.
Now that you have the data set, the problem statement, and the hypotheses
to test, it is time to see what insights can be drawn.
The approach is to go through similar steps. Note that you may remove
variables with more than 30% missing values, or you can take your own
call on this threshold.
Here’s the code to find the variables with more than 30% missing values:
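As a rough illustration of this check, the sketch below computes the share of missing values for each variable and drops every variable above the 30% threshold. It is written in plain Python (the walkthrough itself appears to use R), and the records, the variable names (mou_mean, income, drop_calls, churn) and the helper missing_share are made up for the example; they are not the dataset analysed in this chapter.

```python
# Hypothetical customer records; None marks a missing value.
rows = [
    {"mou_mean": 120.5, "income": None, "drop_calls": 2,    "churn": 0},
    {"mou_mean": 80.0,  "income": None, "drop_calls": None, "churn": 1},
    {"mou_mean": None,  "income": 5,    "drop_calls": 1,    "churn": 0},
    {"mou_mean": 200.0, "income": None, "drop_calls": 0,    "churn": 1},
    {"mou_mean": 95.0,  "income": 3,    "drop_calls": 4,    "churn": 0},
]

THRESHOLD = 0.30  # the 30% cut-off mentioned in the text


def missing_share(rows):
    """Return the fraction of missing (None) values for each variable."""
    columns = rows[0].keys()
    return {
        col: sum(1 for r in rows if r[col] is None) / len(rows)
        for col in columns
    }


shares = missing_share(rows)

# Variables whose share of missing values exceeds the threshold
to_drop = [col for col, share in shares.items() if share > THRESHOLD]

# Keep only the remaining variables in every record
cleaned = [{k: v for k, v in r.items() if k not in to_drop} for r in rows]

print(shares)
print("dropped:", to_drop)
```

The threshold is a judgment call, as noted above; changing THRESHOLD is all that is needed to try a stricter or looser rule.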
As the above illustration shows, all variables with more than 30% missing
values are removed. Here is the summary of our dataset:
First, we will analyze the mean minutes of usage, revenue range, mean
total monthly recurring charge and the mean number of dropped or
blocked calls against the target variable – churn:
Similarly, we shall analyze the mean number of dropped (failed) voice calls,
the total number of calls over the life of the customer, the range of the
number of outbound wireless to wireless voice calls and the mean number
of call waiting against the churn variable:
Let us change things up a bit. You will use the faceting functionality in the
awesome “ggplot2” package to plot the months of usage, credit class code,
call drops and the number of days of current equipment against the churn
variable:
You will then analyze the numeric variables separately to see if there are
any features that have high degrees of collinearity. This matters because
collinear variables can make a model less reliable: they inflate the
variance of the coefficient estimates and make individual effects hard to
interpret.
You can then handle the collinearity problem. There are many ways of
dealing with it, such as variable transformation and reduction using
principal component analysis (PCA). Here, let us simply remove the highly
correlated variables:
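One simple way to do this can be sketched as follows: compute the Pearson correlation for every pair of numeric variables and, whenever a pair is correlated above a chosen cut-off, keep the first variable and drop the second. The variable names, the values and the 0.9 cut-off are assumptions made for the illustration, not figures from the telecom dataset.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical numeric variables for five customers
data = {
    "total_minutes": [100, 200, 300, 400, 500],
    "total_calls":   [11, 19, 32, 41, 49],   # moves almost in step with minutes
    "months":        [24, 6, 18, 3, 12],
}

CUTOFF = 0.9  # assumed cut-off for "highly correlated"


def drop_collinear(data, cutoff):
    """Return the variable names kept after removing one of each collinear pair."""
    dropped = set()
    for a, b in combinations(data, 2):
        if a in dropped or b in dropped:
            continue
        if abs(pearson(data[a], data[b])) > cutoff:
            dropped.add(b)  # keep the first variable of the pair
    return [name for name in data if name not in dropped]


kept = drop_collinear(data, CUTOFF)
print(kept)
```

Which variable of a pair to keep is a design choice; in practice you would usually keep the one that is easier to interpret or more complete.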
This is the part most of you may be familiar with – building models on
the training data. You can build a number of models so that you can
compare their performance across the spectrum.
And now comes the part we’ve been waiting for – prescriptive analytics!
Let’s see what recommendations we can come up with to improve the
performance of our model.
The summary statistics from the logistic model, shown below, bear this out:
10.8 SUMMARY:
Prescriptive analytics is the third and final phase of business analytics,
which also includes descriptive and predictive analytics.
and inventory in the supply chain to make sure that they are delivering the
right products at the right time and optimizing the customer experience.
Most modern BI tools have prescriptive analytics built in and provide users
with actionable results that empower them to make better decisions. One
of the more interesting applications of prescriptive analytics is in oil and
gas management, where prices fluctuate almost by the second based on
ever-changing political, environmental, and demand conditions.
significant effort at data transformation. More than 80% of the world's data
today is unstructured, according to IBM.
What the self-driving car will deliver is a (fundamental) change in the car
driving experience. Likewise, the impact of prescriptive analytics, AI, and
ML in the workplace will change the work experience and redefine the jobs
and roles. In organizations and business, we will see the growing presence
of Augmented Decision Making through more informed, prescriptive
analytics that helps and guides decision-makers to examine and determine
the best course of action. Focusing prescriptive analytics, AI, and ML on
use cases that add value to people’s capabilities and performance, as well
as to process value, is essential to successful organizational adoption.
2. In the financial sector, the type of analytics that can be used throughout
the organization, from forecasting customer behaviour and purchasing
patterns to identifying trends in sales activities, is ______, which
also helps forecast demand for inputs from the supply chain, operations
and inventory.
a. Prescriptive Analytics
b. Descriptive Analytics
c. Predictive Analytics
d. Diagnostic Analytics
BUSINESS ANALYTICS PROCESS
Chapter 11
Business Analytics Process
Objectives:
Structure:
11.1 Introduction
11.5 Summary
11.1 INTRODUCTION:
There are many types of analytics, and there is a need to organize these
types to understand their uses. These types of analytics can be viewed
independently. For example, some firms may only use descriptive analytics
to provide information on decisions they face. Others may use a
combination of analytic types to glean insightful information needed to plan
and make decisions. This is summarised as under:
The purposes and methodologies used for each of the three types of
analytics differ, as can be seen in the table below. It is these differences that
distinguish analytics from business analytics. Whereas analytics is focused
on generating insightful information from data sources, business analytics
goes the extra step to leverage analytics to create an improvement in
measurable business performance. Whereas the process of analytics can
involve any one of the three types of analytics, the major components of
business analytics include all three used in combination to generate new,
unique, and valuable information that can aid business organization
decision-making. In addition, the three types of analytics are applied
sequentially (descriptive, then predictive, then prescriptive).
Therefore, business analytics (BA) can be defined as a process beginning
with business-related data collection and consisting of sequential
application of descriptive, predictive, and prescriptive major analytic
components, the outcome of which supports and demonstrates business
decision-making and organizational performance.
questions like why something is happening, what new trends may exist,
what will happen next, and what is the best course for the future.
Thus, BA includes the same procedures as in plain analytics but has the
additional requirement that the outcome of the analytic analysis must
make a measurable impact on business performance. BA includes reporting
results like BI but seeks to explain why the results occur based on the
analysis rather than just reporting and storing the results, as is the case
with BI. The table below summarises the characteristics of analytics,
business analytics, and business intelligence so that they can be compared
with each other.
The size of some data sources can be unmanageable, overly complex, and
generally confusing. Sorting out data and trying to make sense of its
informational value requires the application of descriptive analytics as a
first step in the BA process. One might begin simply by sorting the data
into groups using the four possible classifications presented in the table below.
Also, incorporating some of the data into spreadsheets like Excel and
preparing cross tabulations and contingency tables are means of restricting
the data into a more manageable data structure. Simple measures of
central tendency and dispersion might be computed to try to capture
possible opportunities for business improvement. Other descriptive analytic
summarization methods, including charting, plotting, and graphing, can
help decision makers visualize the data to better understand content
opportunities. The types of data measurement classification scales are as
follows:
Type of Data Measurement Scale: Description

Categorical Data: Data that is grouped by one or more characteristics.
Categorical data usually involves cardinal numbers counted or expressed as
percentages. Example 1: Product markets that can be characterized by
categories of “high-end” products or “low-income” products, based on dollar
sales. It is common to use this term to apply to data sets that contain
items identified by categories as well as observations summarized in
cross-tabulations or contingency tables.

Ordinal Data: Data that is ranked or ordered to show relational preference.
Example 1: Football team rankings not based on points scored but on wins.
Example 2: Ranking of business firms based on product quality.

Interval Data: Data that is arranged along a scale where each value is
equally distant from others. It is ordinal data. Example 1: A temperature
gauge. Example 2: A survey instrument using a Likert scale (that is, 1, 2,
3, 4, 5, 6, 7), where 1 to 2 is perceived as equidistant to the interval
from 2 to 3, and so on. Note: In ordinal data, the ranking of firms might
vary greatly from first place to second, but in interval data, they would
have to be relationally proportional.

Ratio Data: Data expressed as a ratio on a continuous scale. Example 1: The
ratio of firms with green manufacturing programs is twice that of firms
without such a program.
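The simple descriptive measures mentioned before the table can be tied to these scales with a small sketch. It uses a common rule of thumb that is an assumption here rather than something stated in the table: counts and percentages for categorical data, the median for ordinal data, and the mean, standard deviation and range for interval or ratio data. The function name summarise and all sample values are hypothetical.

```python
import statistics
from collections import Counter


def summarise(values, scale):
    """Pick a simple descriptive summary that suits the measurement scale."""
    if scale == "categorical":
        counts = Counter(values)
        total = len(values)
        # Counts expressed as percentages of all observations
        return {label: 100 * n / total for label, n in counts.items()}
    if scale == "ordinal":
        # Ranks have an order but no fixed spacing, so report the median
        return {"median": statistics.median(values)}
    if scale in ("interval", "ratio"):
        return {
            "mean": statistics.mean(values),
            "stdev": statistics.stdev(values),  # sample standard deviation
            "range": max(values) - min(values),
        }
    raise ValueError(f"unknown scale: {scale}")


# Hypothetical observations for three of the scales
segments = ["high-end", "low-income", "high-end", "high-end"]
quality_ranks = [1, 3, 2, 5, 4]
temperatures = [20.0, 22.0, 24.0, 26.0]

print(summarise(segments, "categorical"))
print(summarise(quality_ranks, "ordinal"))
print(summarise(temperatures, "interval"))
```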
A single or multiple regression model can often forecast a trend line into
the future. When regression is not practical, other forecasting methods
(exponential smoothing, smoothing averages) can be applied as predictive
analytics to develop needed forecasts of business trends. The identification
of future trends is the main output of Step 2 and the predictive analytics
used to find them. This helps answer the question of what will happen.
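Both ideas can be sketched in a few lines: a least-squares trend line extended one period into the future, and simple exponential smoothing as an alternative when a straight line is not a good fit. The sales figures, the function names and the smoothing constant of 0.5 are illustrative assumptions only.

```python
def fit_trend(y):
    """Least-squares line y = a + b*t for periods t = 0, 1, 2, ..."""
    n = len(y)
    t = range(n)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    b = (sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
         / sum((ti - t_mean) ** 2 for ti in t))
    a = y_mean - b * t_mean
    return a, b


def exponential_smoothing(y, alpha):
    """Simple exponential smoothing; the final level is the next forecast."""
    level = y[0]
    for observation in y[1:]:
        level = alpha * observation + (1 - alpha) * level
    return level


# Hypothetical sales for five past periods
sales = [100, 110, 120, 130, 140]

a, b = fit_trend(sales)
trend_forecast = a + b * len(sales)          # extend the line one period ahead
smoothed_forecast = exponential_smoothing(sales, alpha=0.5)

print(a, b, trend_forecast)
print(smoothed_forecast)
```

On steadily rising data like this, the smoothed forecast lags behind the trend line, which is one reason the choice of method should follow the pattern in the data.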
If a firm knows where the future lies by forecasting trends as they would in
Step 2 of the BA process, it can then take advantage of any possible
opportunities predicted in that future state. In Step 3, Prescriptive
Analytics analysis, operations research methodologies can be used to
optimally allocate a firm’s limited resources to take best advantage of the
opportunities it found in the predicted future trends. Limits on human,
technology, and financial resources prevent any firm from going after all
opportunities they may have available at any one time. Using prescriptive
analytics allows the firm to allocate limited resources to optimally achieve
objectives as fully as possible. For example, linear programming (a
constrained optimization methodology) has been used to maximize the
profit in the design of supply chains. This third step in the BA process
answers the question of how best to allocate and manage decision-making
in the future.
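To make the idea of constrained optimization concrete, here is a minimal sketch of a two-variable linear program. It relies on the fact that, when the feasible region is bounded, an optimum lies at a corner point, so it simply checks every intersection of two constraint lines. The profit figures, the two resource limits and the function names are invented for the example; a real supply chain model would have many more variables and would use a dedicated solver.

```python
from itertools import combinations

# Each constraint is (a, b, c), meaning a*x + b*y <= c.
constraints = [
    (1, 1, 40),    # hypothetical capacity limit: x + y <= 40
    (2, 1, 60),    # hypothetical budget limit: 2x + y <= 60
    (-1, 0, 0),    # x >= 0
    (0, -1, 0),    # y >= 0
]
profit = (40, 30)  # hypothetical profit per unit of x and of y


def feasible(x, y, eps=1e-9):
    """True if the point satisfies every constraint (with a small tolerance)."""
    return all(a * x + b * y <= c + eps for a, b, c in constraints)


def solve():
    """Return (best profit, x, y) by checking every corner of the region."""
    best = None
    for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel lines never intersect
        # Intersection of the two boundary lines (Cramer's rule)
        x = (c1 * b2 - c2 * b1) / det
        y = (a1 * c2 - a2 * c1) / det
        if feasible(x, y):
            value = profit[0] * x + profit[1] * y
            if best is None or value > best[0]:
                best = (value, x, y)
    return best


print(solve())
```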
The decision-making foundation that has served ODMP for many decades
parallels the BA process. The same logic serves both processes and
supports organization decision-making skills and capacities.
Following are the 8 steps of the business analysis process, which you can
apply whether you are in an agile environment or a traditional one,
whether you are purchasing off-the-shelf software or building custom code,
and whether you are responsible for a multi-million dollar project or a
one-week project.
Depending on the size and complexity of your project, you can go through
these steps quickly or slowly, but to get to a successful outcome you must
go through them.
First, take a look at this process flow below which shows how the 8 steps
fit together and how you might iterate through them on a typical business
analyst project.
Often as business analysts, we are expected to dive into a project and start
contributing as quickly as possible to make a positive impact. Sometimes
the project is already underway. Other times there are vague notions about
what the project is or why it exists. We face a lot of ambiguity as business
analysts and it’s our job to clarify the scope, requirements, and business
objectives as quickly as possible.
But that doesn’t mean that it makes sense to get ourselves knee-deep into
the detailed requirements right away. Doing so very likely means a quick
start in the wrong direction.
Taking some time, whether that’s a few hours, few days, or at the very
most a few weeks, to get oriented will ensure you are not only moving
• Clarifying your role as the business analyst so that you are sure to create
deliverables that meet stakeholder needs.
• Determining the primary stakeholders to engage in defining the project’s
business objectives and scope, as well as any subject matter experts, to
be consulted early in the project.
• Understanding the project history so that you don’t inadvertently repeat
work that’s already been done or rehash previously made decisions.
• Understanding the existing systems and business processes so you have
a reasonably clear picture of the current state that needs to change.
This is where you learn how to learn what you don’t know, so to speak.
This step gets you the information you need to be successful and effective
in the context of this particular project.
It’s very common for business analysts and project managers to jump right
in to defining the scope of the project. However, this can lead to
unnecessary headaches. Uncovering and getting agreement on the
business needs early in a project and before scope is defined is the
quickest path forward to a successful project.
Discovering the primary business objectives sets the stage for defining
scope, ensuring that you don’t end up with a solution that solves the wrong
problem or, even worse, with a solution whose success no one can
determine.
A clear and complete statement of scope provides your project team the
go-forward concept to realize the business needs. Scope makes the
business needs tangible in such a way that multiple project team
participants can envision their contribution to the project and the
implementation.
Your business analysis plan will bring clarity to the business analysis
process that will be used to successfully define the detailed requirements
for this project. Your business analysis plan is going to answer many
questions for you and your project team.
All of these efforts help the implementation team fulfil the intended
benefits of the project and ensure the investment made realizes a
positive return.
Your technology team can deliver a beautiful shiny new solution that
theoretically meets the business objectives, but if your business users
don’t use it as intended and go back to business-as-usual, your
project won’t have delivered on the original objectives. Business
analysts are increasingly getting involved in this final phase of the project
to support the business.
This step is all about ensuring all members of the business community are
prepared to embrace the changes that have been specified as part of the
project.
In this flurry of activity and a focus on delivery, it’s easy to lose track of the
big picture. Why are we making all these changes and what value do they
deliver for the organization? And even more importantly, are we still on
track? Meaning, is the solution we’re delivering actually delivering the value
we originally anticipated?
After completing this step, it’s likely you’ll uncover more opportunities to
improve the business which will lead you to additional projects. And so the
cycle begins again!
11.5 SUMMARY
One of the reasons for the flourishing of business analytics as a tool is that
it can be applied in any industry where data is captured and accessible.
This data can be used for a variety of purposes, ranging from improving
customer service and strengthening the organisation’s capability to predict
fraud to offering valuable insights on online and digital information.
However business analytics is applied, the key outcome is the same: The
solving of business problems using the relevant data and turning it into
insights, providing the enterprise with the knowledge it needs to
proactively make decisions. In this way the enterprise will gain a
competitive advantage in the marketplace. Essentially, business analytics is
a 7-8 step process, outlined below.
Once the data has been cleaned, the analyst will try to make better sense
of the data. The analyst will plot the data using scatter plots (to identify
possible correlation or non-linearity). The analyst will visually check all possible
slices of data and summarise the data using appropriate visualisation and
descriptive statistics (such as mean, standard deviation, range, mode,
median) that will help provide a basic understanding of the data. At this
stage, the analyst is already looking for general patterns and actionable
insights that can be derived to achieve the business goal.
1. At what stage must key questions such as “what data is available”, “how
can we use it”, and “do we have sufficient data” be answered?
a. Defining the business need
b. Exploring the data
c. Analysing the data
d. Optimise the data
BUSINESS ANALYTICS APPLICATIONS
Chapter 12
Business Analytics Applications
Objectives:
Structure:
12.1 Introduction
12.11 Summary
12.1 INTRODUCTION
Analysing the data is an important skill for any professional to possess. The
existence of data in its raw collected state has very little use without some
sort of processing.
There are a number of features that are available in Excel to make your
task easier. Some of the main features are:
6. Drag and Drop - lets you reposition data and text by simply dragging
them with the mouse.
9. Shortcut Menus - clicking the right mouse button shows the commands
that are appropriate to the task you are doing.
Before using the sort function or pivot tables, the data must be cleaned.
This means that the first step in data analysis is to go through the data
and ensure that the style of data entry is consistent within the columns. In
this case, for diagnosis, it is important to make sure that only one word,
phrase or abbreviation is used to describe each diagnosis. If multiple words
are used to describe the same thing, the analysis will be more difficult, so
it is best to choose one term and use it consistently. It may be necessary to
change the terminology used in the data set in order to be consistent
throughout; such changes ought to be made at a preliminary stage. If
there are multiple diagnoses for a single subject, it is important to list
the diagnoses separately. It may be necessary to create additional columns
labelled “Diagnosis 2”, “Diagnosis 3” and so on, listing one diagnosis in
each column. It is then possible to create multiple pivot tables and manually
add the results together.
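The same cleaning rules can be expressed outside Excel. The sketch below maps the different spellings of a diagnosis to one agreed term and splits a cell holding several diagnoses into separate columns. The spellings in the mapping, the semicolon separator, the name of the first column and the function names are all assumptions made for the illustration.

```python
# Hypothetical mapping from spellings found in the data to one agreed term
CANONICAL = {
    "cataract": "Cataract",
    "cataracts": "Cataract",
    "cat.": "Cataract",
    "glaucoma": "Glaucoma",
    "glauc": "Glaucoma",
}

COLUMNS = ("Diagnosis", "Diagnosis 2", "Diagnosis 3")


def clean_diagnosis(raw):
    """Trim spaces, ignore case, and replace known variants with one term."""
    text = raw.strip()
    return CANONICAL.get(text.lower(), text)


def split_diagnoses(cell, columns=COLUMNS):
    """Split 'A; B' into one cleaned diagnosis per column, padding with ''."""
    parts = [clean_diagnosis(p) for p in cell.split(";") if p.strip()]
    if len(parts) > len(columns):
        raise ValueError("more diagnoses than available columns")
    parts += [""] * (len(columns) - len(parts))
    return dict(zip(columns, parts))


print(clean_diagnosis("  CATARACTS "))
print(split_diagnoses(" cataracts ; Glauc"))
```

Unknown terms are passed through unchanged rather than guessed, so they can be reviewed by hand and added to the mapping.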
Using Excel 2016 for Windows, first select the data (Ctrl+A selects all).
On the Excel toolbar at the top, choose the “Data” tab. Then click on the
Sort function. In the window that pops up, choose to sort by “Diagnosis”. To
sort by another column as well, click the “Add Level” button in the upper
left corner of the window, choose that column, and then click the OK button.
Sorting is a great tool for identifying trends and analysing a small amount
of data. Once the data is sorted by diagnosis and then by the other
column, simply count the number of records with each diagnosis and record
the breakdown either manually or by using the Excel “COUNTIFS”
formula.
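For readers who prefer to see the logic spelled out, the two-level sort and the conditional count can be sketched as follows. The records, the second column (clinic) and the helper countifs are hypothetical; the helper only mimics the idea of counting rows that meet several conditions and is not Excel's own function.

```python
# Hypothetical patient records
records = [
    {"diagnosis": "Glaucoma", "clinic": "North"},
    {"diagnosis": "Cataract", "clinic": "South"},
    {"diagnosis": "Cataract", "clinic": "North"},
    {"diagnosis": "Glaucoma", "clinic": "North"},
    {"diagnosis": "Cataract", "clinic": "North"},
]

# Two-level sort: first by diagnosis, then by clinic
ordered = sorted(records, key=lambda r: (r["diagnosis"], r["clinic"]))


def countifs(rows, **criteria):
    """Count the rows that match every given column=value condition."""
    return sum(all(r[k] == v for k, v in criteria.items()) for r in rows)


for row in ordered:
    print(row["diagnosis"], row["clinic"])

print(countifs(records, diagnosis="Cataract"))
print(countifs(records, diagnosis="Cataract", clinic="North"))
```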
With a large data set, manually counting or using a formula to count
can be tedious and create opportunities for error. A pivot table will
automatically sort the data and list the values, producing efficient and
accurate information. To create a pivot table, select the data, click on the
“Insert” tab, then select “Pivot Table” (on a Mac, click on the “Data” tab,
followed by “Pivot Table”).
Drag the column to be analysed to the box headed “Rows”, and
Excel provides an automatic breakdown which can be used to calculate
percentages and to create graphs. Alternatively, to sort first by another
column and then by diagnosis, switch the order in the “Rows” box. Fields can
be added or removed as necessary. It may be helpful to practice dragging
different fields to different categories in order to develop an understanding
of how a pivot table works.
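What the pivot table does with the “Rows” box can be approximated in a short sketch: count the records for each combination of the chosen fields, in the chosen order, and turn the counts into percentages. The records, the field names (diagnosis, age_group) and the function names are made up for the example.

```python
from collections import Counter

# Hypothetical patient records
rows = [
    {"diagnosis": "Cataract", "age_group": "60+"},
    {"diagnosis": "Cataract", "age_group": "60+"},
    {"diagnosis": "Cataract", "age_group": "40-59"},
    {"diagnosis": "Glaucoma", "age_group": "60+"},
    {"diagnosis": "Glaucoma", "age_group": "40-59"},
]


def pivot_counts(rows, *fields):
    """Count rows for each combination of the given row fields, in order."""
    return Counter(tuple(r[f] for f in fields) for r in rows)


def percentages(counts):
    """Express each count as a percentage of the grand total."""
    total = sum(counts.values())
    return {key: round(100 * n / total, 1) for key, n in counts.items()}


by_diagnosis = pivot_counts(rows, "diagnosis")
print(percentages(by_diagnosis))

# Switching the order of the fields changes how the breakdown is nested
by_age_then_diagnosis = pivot_counts(rows, "age_group", "diagnosis")
for key, n in sorted(by_age_then_diagnosis.items()):
    print(key, n)
```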
Once the data is analysed, it is often used to create a display so that others
can quickly and easily understand the results. One way to do this is to
create a chart using Excel. First create another table to show more easily
the breakdown of the numbers with a certain diagnosis.
Next, click Pivot Chart under the “ANALYZE” tab and select the option
“Stacked Column”. This shows the desired chart. Then add axis titles and
labels, format the colour scheme, and hide the field settings.
When analysing the data, it is critical to report all results even if they
seem insignificant. It is also essential not to lump data analyses together
and make generalisations. For example, a researcher studying the
effectiveness of a visual aid in increasing knowledge of cataract
administers a 10-question survey to patients before and after showing them
the visual aid. The researcher finds that the visual aid increases the overall
number of questions answered correctly. This is a good start, but it is not
enough. It is critical that the researcher analyses the result of each
individual question. Just knowing that the intervention increases overall
knowledge provides little information about the strengths and weaknesses of
the intervention. Perhaps the intervention caused a significant increase in the
number of people understanding what a cataract is, but not in the number of
people understanding proper post-operative procedures. This is important
to know because the intervention can then be modified to better convey
the necessary information.
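The difference between the overall result and the per-question result can be seen in a small sketch. To keep it short it uses three questions and four patients instead of a 10-question survey, and every answer is invented; True means the question was answered correctly.

```python
# Hypothetical answers: one list per patient, one entry per question.
before = [
    [True,  False, False],
    [False, False, True],
    [True,  False, False],
    [False, False, False],
]
after = [
    [True, False, True],
    [True, False, True],
    [True, True,  False],
    [True, False, True],
]


def overall_rate(responses):
    """Share of all answers that are correct, across every question."""
    total = len(responses) * len(responses[0])
    return sum(map(sum, responses)) / total


def rate_per_question(responses):
    """Share of patients answering each individual question correctly."""
    n = len(responses)
    return [sum(column) / n for column in zip(*responses)]


gain = [a - b for b, a in zip(rate_per_question(before),
                              rate_per_question(after))]

print(overall_rate(before), overall_rate(after))
print(rate_per_question(before))
print(rate_per_question(after))
print(gain)
```

In this made-up data the overall rate rises clearly, yet the second question improves far less than the others, which is exactly the kind of detail an overall figure hides.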
Every enterprise has strategic and secured information about its human
capital. The internal data sources may range from employee profiles,
compensation and benefits, employee performance, employee
productivity and so on, stored in a variety of technologies such as ERP
systems, OLTP RDBMS, the Hadoop ecosystem, spread marts, data marts and
data warehouses. Some of the external data sources may include compensation
benchmarks, employee sentiment, thought leadership contributions, etc.
Enterprises have started gaining benefits in the following areas by
deploying analytics.
• Workforce planning analytics to acquire talent at the right time for the
right positions. Human capital analytics will lead to identification of
positions that drive the business results and the critical competencies
needed for those positions. Gujarat chemicals has developed a custom
modelling tool that predicts the future hiring needs of each business unit
and can adjust its predictions based on industry trends (Acquisition).
• Workforce talent development analytics aligned to business goals
(Development).
• Workforce sentiment analytics for enhancing employee engagement
(ability to simulate the business impact of employee attrition)
(Engagement).
• Workforce utilisation analytics to ensure the optimised deployment of the
right talent in the right functions. This helps to connect employee
performance to business results (Optimisation). Retail companies use
analytics to predict incoming call centre volume and release hourly
employees early if it is expected to drop.
• Workforce compensation analytics helps to optimise benefits using big
data sources including performance and benchmarks (Pay).
• Compliance analytics helps to detect any anomalies relating to enterprise
compliance policies and initiate proactive corrective actions
(Compliance).
2. IT Analytics
Managers are aware of what has happened and what is happening in the
business. Analytics enables the enterprise to move further.
• Customer segmentation:
Customer segmentation allows an enterprise to define newer and sizeable
groups of target prospects using analytics. This enables enterprises to
customise their products and services for new segments and position them
for competitive advantage. Segmentation can also be more strategic, such as
behaviour-based profiling, predictive modelling or customer event-based
segmentation.
• Recommendation system
Next-best-offer models can leverage the behaviour of similar buyers to
identify the next best product or service, or to proactively recommend the
perfect solution.
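A very small sketch of the idea: treat customers who share purchases as similar, and recommend the item that those similar customers bought most often and that the target customer does not yet own. The customer names, the baskets and the overlap-based similarity score are assumptions chosen for the illustration; production recommendation systems use far richer data and models.

```python
from collections import Counter

# Hypothetical purchase histories
baskets = {
    "asha":  {"phone", "case", "charger"},
    "ravi":  {"phone", "case"},
    "meera": {"phone", "charger", "earbuds"},
    "john":  {"laptop", "mouse"},
}


def next_best_offer(customer, baskets):
    """Suggest one item bought by similar customers, or None if there is none."""
    owned = baskets[customer]
    scores = Counter()
    for other, items in baskets.items():
        if other == customer:
            continue
        overlap = len(owned & items)      # similarity = number of shared items
        for item in items - owned:
            scores[item] += overlap       # similar buyers count for more
    ranked = [item for item, score in scores.most_common() if score > 0]
    return ranked[0] if ranked else None


print(next_best_offer("ravi", baskets))
print(next_best_offer("john", baskets))
```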
1. Analytics in Telecom
People and devices generate data 24x7 globally in the telecom industry.
Whether you are speaking to a friend, browsing a website, streaming a video,
playing games with friends or making an in-app purchase, user activity
generates data about your needs, preferences, spending, complaints and
so on. Communication service providers (CSPs) traditionally have
leveraged the data they generate to make decisions in the areas of
improving financial performance, increasing operational efficiency or
managing the subscriber relationship. They have adopted advanced
reporting and BI tools to bring facts and trends to decision makers.
Strategic focus areas for deploying analytics in CSP business are:
• Network Optimisation:
It is crucial for telecom operators to ensure that all their customers are
able to avail of their products at all times. The firms also need to be frugal
when it comes to allocating resources to the network, because any unused
capacity is a waste of resources. Analytics helps in better monitoring of
traffic and in facilitating capacity planning decisions. Analytical tools
leverage data collected through day-to-day transactions and help in both
short-term optimisation decisions and long-term strategic decision making.
• Predictive Analytics:
With the use of predictive analytics telecom operators can predict the
approximate success rate of new schemes based on the past preferences of
customers. This provides telecom operators with a great strategic
advantage. Predictive analytics helps in targeting the right customer at
the right time based on their past behaviour and choices. It helps boost
revenues through proper planning and reduces operational cost in the long
term.
• Social analytics
The branding of telecom operators on social media plays a crucial part
in customer acquisition and retention. Data generated through social media
can be turned into meaningful insights using social analytics tools.
Customer sentiment, customer experience and the positioning
of the company can be analysed to make the customer experience richer
and smoother. The data generated through such platforms is also
diverse both geographically and demographically, and hence helps in
gaining customer information that is closer to reality.
• Subscriber Acquisition:
CSPs study customer behaviour to identify the most suitable channels and
sales strategy for each product.
• Churn Analytics:
Helps CSPs not only to model loyalty programs but also to predict churn
and the CSPs that departing customers move to.
• Financial Analytics:
a. Infrastructure analytics:
c. Channel Analytics:
d. Cost reduction:
2. Analytics in Retail:
The retail industry is a lucky vertical, having greater access to data about
the consumer, the products they buy and use, and the different channels that
sell and service products. Data coupled with insights is at the heart of what
drives the retail business.
Technologies like POS, CRM, SCM, Big Data, mobility and social media
offer a means of understanding shoppers via numerous digital touch points,
ranging from their online purchases to their presence on social networks, to
their visits to brick-and-mortar stores, as well as tweets, images and more.
Even today retailers are grappling with how to meaningfully leverage and
ultimately monetise the hidden insight in the huge amount of structured
and unstructured data about the consumer. The value of analytics comes
from 3 sources:
b. Pricing analytics:
It helps retailers to optimise product pricing, special offers,
merchandising, loyalty programs and campaigns that attract the maximum
number of consumers, from both the physical store and the online store
perspective.
a. Inventory analytics:
Retailers aim to fulfil consumer demand by optimising stock and ability to
replenish when consumer demand increases due to seasonal effects or as
a result of powerful campaign. This area of analytics will alert store
managers about the potential need of stocking high moving items and
reduce slow moving items.
b. Consumer analytics:
Every region has people with different tests for goods and service levels.
The purpose of consumer analytics is to equip store managers with insight
to customise their product and services to the local consumer profile.
c. Campaign Analytics:
All retailers have digital marketing programmes to entice consumers with
value offers. Retailers invest in this area of analytics to design the most
effective campaigns, converting the maximum number of consumers into buyers.
d. Fraud detection:
All retailers strive to eliminate fraud relating to payments, shipping and
changed price tags. Analytics can study transactions in real time to detect
fraud and alert store personnel or the online commerce team.
a. Web analytics:
Here the different perspectives of each consumer's online behaviour, such
as surfing traffic, visitor and conversion trends, location of smart
devices and access to kiosks, are analysed to recommend the best sales
approach in response to each consumer's real-time actions.
3. Analytics in Healthcare:
b. Compliance analytics:
Provides healthcare compliance metrics to regulatory authorities and
benchmarks against world-class hospitals using the Baldrige criteria.
Wider use of digital data will support audits and analytics, and will
improve the hospital processes needed for regulatory compliance.
c. Financial analytics:
This area of analytics leads to enhanced ROI (return on investment),
improved utilisation of hospital infrastructure and human resources,
optimised capital management, an optimised supply chain and reduced fraud.
d. Predictive models:
Help healthcare professionals go beyond traditional search and analysis of
unstructured data by applying predictive root cause analysis, natural
language and built-in medical terminology support to identify trends and
patterns and achieve clinical and operational insight. Healthcare
predictive analytics can help healthcare organisations get to know their
patients better, so that they can better understand each patient's needs
while delivering quality, cost-effective, life-saving services.
e. Social analytics:
Helps hospitals listen to patients' sentiments, requirements, affordability
and insurance in order to model care and wellness programmes, customising
services to local needs.
f. Clinical Analytics:
A number of other critical clinical situations can be detected by analytics
applied to EHR (electronic health records), such as:
❖ Detecting post-operative complications
❖ Predicting 30-day risk of re-admission
❖ Risk-adjusting hospital mortality rates.
What business questions are you trying to answer? Once you understand
this, you need to think about what data is available to you to answer
these questions.
3. What measure of accuracy and granularity are you going to use? Is that
level of summary good enough for business users?
1. Processing social media data for business benefits: Telecom CSPs stand
to understand the voice of the subscribers, HR will understand the
sentiments of employees and partners, hospitals will discover unmet
needs of patients, and IT functions will understand the needs, challenges
and service level expectations of business users. Social media analytics,
web analytics or digital s enterprises.
Have you ever wondered what algorithm Google uses to maximise its
targeted ads revenue? What about e-commerce websites which prompt you with
options such as “people who bought this also bought this”, or how
Facebook automatically suggests friends to tag in pictures?
Recommendation systems are not totally new. They take the results of
market basket analysis of business data in an advanced analytics system
and suggest the next best offer or next best action for a specific
customer. They are also very popular for making suggestions or
recommendations.
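The link between market basket analysis and a “people who bought this also
bought this” suggestion can be illustrated with a minimal Python sketch.
It simply counts how often two items share a basket; the function names
and the sample baskets are hypothetical, and a production system would use
far richer models than this.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(baskets):
    """Count how often each pair of items appears in the same basket."""
    co = defaultdict(Counter)
    for basket in baskets:
        # set() ignores repeated items; sorted() makes pair order stable.
        for a, b in combinations(sorted(set(basket)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def recommend(co, item, top_n=3):
    """Return up to top_n items most often bought together with `item`."""
    partners = co.get(item, Counter())
    # Highest count first; ties are broken alphabetically.
    ranked = sorted(partners.items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:top_n]]


# Hypothetical transaction data, for illustration only.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["milk", "cereal"],
]
co = build_cooccurrence(baskets)
print(recommend(co, "bread"))  # ['butter', 'jam', 'milk']
```

Here butter is suggested first for a bread buyer because the two items
share two baskets, while jam and milk each share only one.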
Both recommendations and pricing are classic topics for advanced analytics
modelling, and both offer possibilities for real-time scoring. As we build
more accurate models and train them with real-life data, the
recommendations and the prices the company can offer become more accurate.
It is a great advantage for retailers to change prices dynamically to
acquire more customers: when an item is highly desired, customers are more
willing to pay a high price, and when it is less desired they will pay
less. This plays an important role in the decision-making process.
Discounts are a way of helping customers not only to choose a particular
supplier, but also to move towards a purchase commitment. At the same
time, discounts are expensive: they eat into the profit the company makes.
In an ideal world, we would base each discount decision on how strongly
the customer desires to close the deal immediately.
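The idea of moving the price with customer desire can be sketched in a few
lines of Python. This is only an illustration under stated assumptions:
the demand score, the 20% markup ceiling and the 30% discount ceiling are
hypothetical values, not figures from any real pricing model.

```python
def dynamic_price(base_price, demand_score, max_markup=0.20, max_discount=0.30):
    """Scale a base price by a demand score.

    demand_score is a hypothetical model output in [0, 1]:
    0.5 means neutral demand, values above 0.5 raise the price,
    values below 0.5 apply a discount.
    """
    if not 0.0 <= demand_score <= 1.0:
        raise ValueError("demand_score must be between 0 and 1")
    if demand_score >= 0.5:
        # Map 0.5..1.0 linearly onto 0..max_markup.
        adjustment = (demand_score - 0.5) / 0.5 * max_markup
    else:
        # Map 0.0..0.5 linearly onto -max_discount..0.
        adjustment = -(0.5 - demand_score) / 0.5 * max_discount
    return round(base_price * (1 + adjustment), 2)


print(dynamic_price(100.0, 0.75))  # moderately desired item: 110.0
print(dynamic_price(100.0, 0.0))   # least desired item: 70.0
```

Capping the discount separately from the markup reflects the point made
above that discounts eat into profit and should be granted sparingly.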
The data store typically also includes one or more indexes created from a
full-text search of the items, their descriptions and other metadata. The
indexes are usually used to improve the real-time performance of the
recommendation system. Basic anomalies such as undue dominance of certain
parameters are compensated for by applying techniques such as term
frequency-inverse document frequency (TF-IDF). The process of index
creation may also involve pruning of frequently appearing terms and
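To see how term frequency-inverse document frequency damps terms that
dominate every description, consider the following minimal Python sketch.
It uses the plain textbook weighting (term share in the document times the
logarithm of the inverse document share); the item descriptions are
hypothetical, and real search indexes usually apply smoothed variants.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf  = term count / number of terms in the document
    idf = log(number of documents / documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical item descriptions.
items = [
    "red cotton shirt",
    "blue cotton shirt",
    "green cotton scarf",
]
scores = tf_idf(items)
print(scores[0])
```

Because "cotton" appears in every description, its weight is zero, while
the rarer term "red" outweighs "shirt". Terms with near-zero weight are
natural candidates for pruning from an index.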
12.11 SUMMARY
There are a number of features available in Excel to make your task
easier. A spreadsheet is a large sheet with data and information arranged
in rows and columns. As you know, Excel is one of the most widely used
spreadsheet applications and is part of the Microsoft Office suite. A
spreadsheet is quite useful for entering, editing, analysing and storing
data. Arithmetic operations on numerical data, such as addition,
subtraction, multiplication and division, can be done using Excel. You can
also sort numbers or characters according to given criteria (such as
ascending or descending order).
Before using the sort function or pivot tables, the data must be cleaned.
This means that the first step in data analysis is to go through the data
and ensure that the style of data entry is consistent within each column.
In this case, for diagnosis data, it is important to make sure that only
one word, phrase or abbreviation is used to describe each diagnosis. If
multiple words are used to describe the same thing, the analysis will be
more difficult, so it is best to choose one term and use it consistently.
It may be necessary to change the terminology used in the data set in
order to be consistent throughout; such changes ought to be made at a
preliminary stage.
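The same cleaning step can be scripted when the data set is too large to
fix by hand. The Python sketch below is an illustration only: the synonym
table and the sample column are hypothetical, and the agreed term for each
diagnosis must come from whoever owns the data.

```python
def clean_diagnosis(value, synonyms):
    """Trim, lower-case and map a raw entry to one agreed term."""
    # Collapse repeated spaces and ignore letter case before the lookup.
    key = " ".join(value.strip().lower().split())
    return synonyms.get(key, key)


# Hypothetical mapping from variants to the single term chosen for analysis.
SYNONYMS = {
    "htn": "hypertension",
    "high blood pressure": "hypertension",
    "dm": "diabetes",
    "diabetes mellitus": "diabetes",
}

raw_column = ["HTN", " Hypertension", "high  blood pressure", "DM", "Diabetes"]
cleaned = [clean_diagnosis(v, SYNONYMS) for v in raw_column]
print(cleaned)
# ['hypertension', 'hypertension', 'hypertension', 'diabetes', 'diabetes']
```

Entries that are not in the synonym table pass through in trimmed,
lower-case form, so unexpected spellings remain visible for review rather
than being silently changed.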
Drag the column to be sorted to the box with the heading “Rows”, and Excel
provides an automatic breakdown which can be used to calculate percentages
and to create graphs. Alternatively, to sort first by the named column and
then by diagnosis, switch the order in the “Rows” box. Fields can be added
or removed as necessary. It may be helpful to practise dragging different
fields to different categories in order to develop an understanding of how
a pivot table works.
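What a pivot table computes can also be reproduced in a few lines of
Python, which may help in understanding the breakdown Excel produces. This
is a minimal sketch, not a replacement for Excel; the field names "ward"
and "diagnosis" and the sample records are hypothetical.

```python
from collections import Counter


def breakdown(rows, field):
    """Count rows per value of `field` and express each count as a percentage."""
    counts = Counter(row[field] for row in rows)
    total = sum(counts.values())
    return {
        value: {"count": n, "percent": round(100 * n / total, 1)}
        for value, n in sorted(counts.items())
    }


def breakdown_by_two(rows, first, second):
    """Nested counts: group by `first`, then by `second` within each group."""
    nested = {}
    for row in rows:
        nested.setdefault(row[first], Counter())[row[second]] += 1
    return nested


# Hypothetical cleaned records.
records = [
    {"ward": "A", "diagnosis": "hypertension"},
    {"ward": "A", "diagnosis": "diabetes"},
    {"ward": "B", "diagnosis": "hypertension"},
    {"ward": "B", "diagnosis": "hypertension"},
]

print(breakdown(records, "diagnosis"))
print(breakdown_by_two(records, "ward", "diagnosis"))
```

Swapping the last two arguments of `breakdown_by_two` corresponds to
switching the order of the fields in the “Rows” box.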
Once the data is analysed, it is often used to create a display so that
others can quickly and easily understand the result. One way to do this is
to create a chart using Excel. First create another table to show more
easily the breakdown of the numbers with each diagnosis.
When analysing the data, it is critical to report all results, even if
they seem insignificant. It is also essential not to lump data analyses
together and make generalisations.
Every enterprise has strategic and secured information about its human
capital. The internal data sources may range from employee profiles,
compensation and benefits, employee performance, employee productivity
and so on, stored in a variety of technologies such as ERP systems, OLTP
RDBMS, the Hadoop ecosystem, spreadmarts, data marts and data warehouses.
Some of the external data sources may include compensation benchmarks,
employee sentiments, thought leadership contributions etc. Enterprises
have started gaining benefits
• Processing social media data for business benefits: Telecom CSPs stand
to understand the voice of the subscribers, HR will understand the
sentiments of employees and partners, hospitals will discover unmet
needs of patients, and IT functions will understand the needs, challenges
and service level expectations of business users. Social media analytics,
web analytics or digital s enterprises.
1. Before using the sort function or pivot tables, the data must be
-------------------
a. Cleaned
b. Sorted
c. Analysed
d. Arranged
4. While developing the analytical application, if the results are not in
line with what you were expecting, what will you do?
a. Try using a different model/algorithm
b. Consider collecting more or different data
c. Consider redefining or reframing the problem, changing the question
and the means to an answer as you better understand your data and
your environment
d. Use all of the above 3 steps in a sequential manner.
PROGRAMMING LANGUAGES & SOFTWARES USED IN DATA ANALYTICS
Chapter 13
Programming Languages & Softwares Used
in Data Analytics
Objectives:
Structure:
13.1 Introduction
13.8 Summary
13.1 INTRODUCTION
Similarly, data scientist have developed some software’s that are used in
the business analytics by the various entrepreneurs depending up on type
of their activities and need. Gone are the days when a Data Analyst knew
or worked on just one tool. Anyone who works with data these days is well
versed will multiple software tools. But are there any tools that are
essential for any data analyst? Of course there are! There are some tools
that a data analyst has to know to make work and life that much easier
and efficient. These software’s are also discussed in in brief in this chapter.
13.4.1 Python
In a recent worldwide survey, it was found that 83% of the almost 24,000
data professionals surveyed used Python. Data scientists and programmers
like Python because it is a general-purpose, dynamic programming language.
Python seems to be preferred over R for data science because it ends up
being faster than R for fewer than 1,000 iterations. It is also said to be
better than R for data manipulation. The language also has good packages
for natural language processing and data learning, and is inherently
object-oriented.
13.4.2 R
13.4.3 Java
13.4.4 Hadoop
13.4.5 SQL
13.4.6 Julia
13.4.7 Scala
The best data analytics software for 2020 is Sisense because of its simple
yet powerful functionalities that let you aggregate, visualize, and analyze
data quickly. Moreover, this platform has a scalable architecture that allows
it to handle a wide range of data volumes, making it great for small and
large businesses alike.
The digital age has made it easier for professionals to access data that
allows them to optimize business performance. However, to leverage this
information, you will need data analytics software that can provide you
with tools for data mining, organization, analysis, and visualization.
However, there are plenty of factors involved in finding the right analytics
tool for a particular business. From checking its performance to figuring out
how well it plays with other systems, the research process can be
overwhelming. So, to help you, we have compiled the leading products on
the market and assessed their functionalities and usability. This way, it will
be easier for you to determine the best possible data analytics platform for
your operations.
Results showed that the top challenge for data scientists is dirty
data (36%). Next comes the lack of data science talent (30%), company
politics (27%), not having clear questions (22%), inaccessible data (22%),
and results not used by decision-makers (18%).
There are also problems with the difficulty of explaining data to others with
16% and privacy issues with 14%. Meanwhile, 13% of data professionals
revealed their small organization couldn’t afford to have a data science
team.
For instance, content companies can use a data analytics tool to keep
their audiences clicking and watching their content. Another example is
gaming companies using relevant data to keep players active in the game
by providing rewards.
1. Sisense:
Sisense offers a robust data analytics system that brings analytics not just
to data scientists, but to all business users as well. It simplifies business
data analytics even to non-technical users through its set of tools and
features. Insights are extracted instantly by any user using self-service
analytics without hard coding and aggregating modeling. Some of its top
features that enable you to do so include its personalized dashboards,
interactive visualizations, and analytical capabilities.
Its dashboard is one of its top features that enable you to filter, explore,
and mine data in just a few clicks to get instant answers to your questions.
With its in-chip technology, data analytics can be performed faster with
richer insights. Furthermore, it provides you with advanced analytics
through an improved, advanced BI reporting and predictive analytics by
integrating R functions in your formulas.
It is best to test the features and functionalities of the tool first so
that you will know if it matches your requirements. To do so, you may sign
up for a free Sisense demo.
2. Looker
Looker is a data analytics platform that allows anyone to ask sophisticated
questions using familiar business terms. It delivers data directly to the
tools and applications used by your team, including custom ones.
In simple terms, the platform gathers and extracts data from various
sources and then loads it into an SQL database. From there, it undergoes
the platform’s agile modeling layer for custom business logic and, finally,
makes it available for all users through dashboards, shared insights, and
explorations.
3. Zoho Analytics
What’s good about the system is that businesses can embed just about any
report or dashboard in their blogs, websites, and apps. The system even
has state-of-the-art security practices that include connection encryption.
ISVs and developers can also use the solution to build and integrate
analytical and reporting functionalities into their systems. Zoho
Analytics offers a free trial so you can test drive its features at no
cost.
4. Yellowfin
Accessible via desktop and mobile devices, Yellowfin also comes with a web
API that lets it integrate with a wide variety of business systems, add-ons,
and widgets. This means you can easily extend its functionalities depending
on the changing needs of your business. Alternatively, you can merge it
with your existing software solutions to streamline your workflow. The
vendor has an appealing free trial where you can tinker with the features
at no cost.
5. Periscope Data
It aggregates distinct data sources into one single source of truth and then
utilizes advanced analytics and BI reporting to make the most out of them.
The platform offers a wide variety of visualizations and charts to choose
from, and you can even create your very own if that’s the more beneficial
course of action to take for your organization.
data is also made simpler and more efficient thanks to the integrated
Python, R and SQL environments.
6. Domo
Using a holistic view of your system, you can take more informed actions
with the tool’s 7 platform components working together. Predictive alerts
bring crucial matters and issues to your attention with enough time to act
before they make an impact on your organization.
a. Connected data. Bring your data directly together with over 500 data
connectors from any third-party source such as on the cloud, on-
premise, and proprietary systems.
b. Instant data-driven chat. It has more than 300 interactive charts and
dashboards both for desktop and mobile use.
7. Qlik Sense
8. GoodData
This smart business application integrates insights directly into your point
of work to expedite the decision-making process. Improvements are also
automated over time as it learns from user actions and is capable of
making data-driven predictions. On top of that, the tool ensures
enterprise-grade security in HIPAA, GDPR, SOC II, and ISO 27001, among
others.
9. Birst
Its specialty lies in its 2-tier approach for end-user data visualization,
querying, and production-oriented business intelligence. You can extract
data and maximize connectivity options in various databases and cloud or
The tool also lets you build a secure foundation for your analytics and
organize your data in a business-aligned, single source of truth.
Furthermore, it enables you to scale your insights by incorporating
evidence-based insights into your decisions that were previously
unobtainable. This can help you analyze data more smartly.
One of its top features is its self-service functionality that enables users to
interact and access reports on mobile devices, both online and offline.
When it comes to analytics, the tool also offers a wide selection of
analysis methods, including trend analysis, analytical reporting, and
what-if analysis.
13. MATLAB
data. You can then directly forecast and predict outcomes by building
predictive models and prototypes. Furthermore, the system lets you
integrate the tool with production IT environments even without recoding
or building a custom infrastructure.
Furthermore, this tool provides you with data that you can transform into
actionable insights for businesses of all sizes to garner a stronger result
across their websites, applications, and offline channels. Specializing in one
of the most important aspects of data analysis, this tool is essential for
building a tight data analysis framework for your organization.
Consistent levels of response and service are expected with its Hadoop
YARN-based architecture which makes the tool one of the data access
engines that work in YARN in HDP. This means the solution, along with
other applications, can share a common dataset and cluster with ease.
The platform offers various tools, including SAP Analytics Cloud and SAP
BusinessObjects BI Suite. They are used for solving specific business needs
and leveraging decision-making. By supporting the collective IQ of your
business, this tool is reliable in providing a high standard for enterprise
data analytics and BI.
18. Minitab
19. Stata
The tool is known for being fast, easy, and secure. It has an intuitive
command syntax and a point-and-click interface that streamlines how
analyses are reproduced and documented for review and publication.
Regardless of when they were written, version control ensures that
analysis scripts stay accurate and up to date and show the same results.
With Visitor Analytics, you have access to detailed dashboards that have all
the key information you need at a glance. Basically, the software provides
you with all the key data you need on a silver platter. You can also view all
information on your mobile devices, which means you can whip out your
phone to check on traffic stats, page performance, conversion rates, and
other metrics.
5. SigmaPlot. A data analysis tool to help you create graphs fast, even for
non-technical users. Besides, this software can also integrate with Excel
for data organization and PowerPoint for presenting outputs.
way, you not only generate more targeted reports but also prevent your
database from being cluttered with information that you won’t even use.
• Regularly Assess Your Data Models
• You wouldn’t use outdated information to perform analysis, so why
should you utilize outdated data models? To prevent old data models
from having a negative effect on your data analytics efforts, you need to
make it a point to assess these models now and then. Check if you are
ignoring certain data sources or if you have overlooked how certain fields
could affect your model. Perhaps certain data sources follow poor naming
standards that are affecting your data analytics model. By
taking this extra step, you can ensure that you are generating accurate
reports that can drive your business forward.
• By taking advantage of these tips as you implement your data analytics
software, you are only a few steps away from reaping all the benefits
that this technology has to offer. Hopefully, our list of 20 best data
analytics tools was able to guide you in finding the right platform for your
operations.
• To sum it up, we highly recommend choosing Sisense. This is because it
offers a code-free self-service analytics system that is great for both
tech-averse and tech-savvy users. Furthermore, it offers highly
customizable dashboards, allowing it to easily adapt to your business’
data analytics and visualization needs.
13.8 SUMMARY
Data Science is a dynamic field with ever-growing technologies and tools.
Since Data Science is a vast field, you must select a specific problem to
tackle and then select the programming language best suited to it. The
programming languages mentioned above focus on several key areas of Data
Science, and one must always be willing to experiment with new languages
based on the requirements.
Python, R, Java, Hadoop etc. are some of the most used languages and
frameworks, along with other languages, in the development of software.
Each has specific characteristics, and data scientists use them
accordingly.
The digital age has made it easier for professionals to access data that
would allow you to optimize your business performance. However, to
leverage this information, you will need data analytics software that can
provide you with tools for data mining, organization, analysis, and
visualization. Moreover, it should be equipped with AI and advanced
algorithms to transform your raw data into valuable insights instantly. This
way, you can keep up with business trends, and even find ways to further
improve your overall operations.
However, there are plenty of factors involved in finding the right analytics
tool for a particular business. From checking its performance to figuring out
how well it plays with other systems, the research process can be
overwhelming. So, to help you, we have compiled the leading products on
the market and assessed their functionalities and usability. This way, it will
be easier for you to determine the best possible data analytics platform for
your operations.
The best data analytics software for 2020 is Sisense because of its simple
yet powerful functionalities that let you aggregate, visualize, and analyze
data quickly. Moreover, this platform has a scalable architecture that allows
it to handle a wide range of data volumes, making it great for small and
large businesses alike.
The above information shows that the top challenge for data scientists is
dirty data (36%). Next comes the lack of data science talent (30%),
company politics (27%), not having clear questions (22%), inaccessible
data (22%), and results not used by decision-makers (18%).
By taking advantage of the tips provided at the end of the chapter as you
implement your data analytics software, you may be only a few steps away
from reaping all the benefits that this technology has to offer. The list
of 20 best data analytics tools can guide you in finding the right
platform for your operations.
1. What are the programming languages that are used in data analytics?
Describe.
2. Name any 3 programming languages which, according to you, are best to
use in data analytics, and explain in short.
2. Which of the languages are much closer to human language and are
converted into machine language behind the scenes by either an
interpreter or a compiler?
a. Low-level programming language
b. High-level programming language
c. Assembly language
d. Program language
4. Sisense offers a robust data analytics system that brings analytics not
just to data scientists, but to all business users as well. Why?
a. NLG technology, data visualization and anomaly detection
b. Accessible data, data scheduling and web integration
c. Insightful reports, highly secure system and collaboration
d. Data storytelling, highly secure system and collaboration
BUSINESS ANALYTICS AND DIGITAL TRANSFORMATION
Chapter 14
Business Analytics and Digital
Transformation
Objectives:
Structure:
14.1 Introduction
14.2 Definition
transformation
14.9 Summary
14.1 INTRODUCTION
Across the board, enterprises in all industries —and of different sizes —can
greatly benefit from data analytics. Data analytics can also aid enterprise
automation processes for numerous applications, such as providing insight
about when a machine or a system will fail. Overall, enterprises that
embrace data analytics will see improved productivity, which will enhance
important business decisions.
14.2 DEFINITION:
For small businesses just getting started, there’s no need to set up your
business processes and transform them later. You can future-proof your
organization from the word go. Building a 21st-century business on stickies
and handwritten ledgers just isn’t sustainable. Thinking, planning, and
building digitally sets you up to be agile, flexible, and ready to grow.
Finding and sharing information became much easier once it had been
digitized, but the ways in which businesses used their new digital records
largely mimicked the old analog methods. Computer operating systems
were even designed around icons of file folders to feel familiar and less
intimidating to new users. Digital data was exponentially more efficient for
businesses than analog had been, but business systems and processes
were still largely designed around analog-era ideas about how to find,
share, and use information.
Digital transformation is changing the way business gets done and, in some
cases, creating entirely new classes of businesses. With digital
transformation, companies are taking a step back and revisiting everything
they do, from internal systems to customer interactions both online and in
person. They’re asking big questions like “Can we change our processes in
a way that will enable better decision-making, game-changing efficiencies,
or a better customer experience with more personalization?”
Now we’re firmly entrenched in the digital age, and businesses of all sorts
are creating clever, effective, and disruptive ways of leveraging technology.
Netflix is a great example. It started out as a mail order service and
disrupted the brick-and-mortar video rental business. Then digital
innovations made wide-scale streaming video possible. Today, Netflix takes
on traditional broadcast and cable television networks and production
studios all at once by offering a growing library of on-demand content at
ultracompetitive prices.
Digitization gave Netflix the ability not only to stream video content directly
to customers, but also to gain unprecedented insight into viewing habits
and preferences. It uses that data to inform everything from the design of
its user experience to the development of first-run shows and movies at in-
house studios. That’s digital transformation in action: taking advantage of
available technologies to inform how a business runs.
Before Netflix, people chose movies to rent by going to stores and combing
through shelves of tapes and discs in search of something that looked
good. Now, libraries of digital content are served up on personal devices,
complete with recommendations and reviews based on user preferences.
The root of any change in business starts with customers. It has to.
Customer happiness is how you win in business.
The iPhone disrupted the status quo for technology adoption in the
workplace. Instead of IT leaders telling employees which approved devices
to use, enough workers asked for iPhones that IT departments eventually
acquiesced. This trend continues today, with more “consumer-grade”
technologies making their way into the workplace. Maybe even more
noteworthy is the flip side of the trend: Enterprise software has started
taking design and functionality cues from the consumer world. Long live
ease of use!
Bonus: When you build digitally from the beginning, it’s much easier to
scale systems as your business grows.
The examples are never ending. Digital innovations like AI and the IoT are
driving all manner of advancements in the production of everything from
consumer goods to cars and trucks. Optimized manufacturing processes
adapt to changing consumer demand. Cloud-based software affords real-
time visibility into supply chain logistics. Customer experience mapping
powered by machine learning surfaces key insights to help product
planners, marketers, and budget makers alike do their jobs better.
Together, these and many more innovations like them are changing the
way we do business, from every conceivable angle.
But how are these changes taking shape? What does digital transformation
look like in practice, across different parts of an organization? Let’s take a
look at some examples.
What does digital transformation look like in practice, and how has it
already changed the way we do business? Let’s take a look at examples of
digital innovations in marketing, sales, and service that build closer
customer relationships and empower employees across all industries.
The shift from analog to digital marketing materials helps these efforts in
two key ways. First, digital materials are generally cheaper to produce and
distribute than analog media. Email, in particular, is far less expensive than
print-and-mail campaigns. Second, digital marketing opens the door
to marketing automation, analytics tracking, and dialogue with customers
in ways that analog never could.
Let’s look at some examples from that article that detail how digitally
transforming your messaging strategy can increase customer engagement
and reduce your costs.
There’s a good reason that the traditional roles of marketing and sales are
being redefined in the digital age. It’s all about the data.
Salespeople particularly benefit from access to more and better data. When
marketing and sales teams share information across a CRM, and individual
sales reps enter sales activity and keep their pipelines up to date on the
platform, information flows freely throughout an entire organization.
From there, two big things happen. First, more eyes on the same
information means more opportunities to share intelligence across your
entire business. Maybe someone from marketing ops sees a sales rep’s
note about a prospect in the CRM, and shares marketing campaign
activities related to the prospect that help move the deal along.
Second, as information flows and gathers within your company, you set
yourself up to leverage cutting-edge digital innovations like artificial
intelligence.
With more and more datasets available from external sources, AI systems
can mine marketplace information as well as your own sales history. From
there, the systems look for correlations, patterns, and even anomalies to
give your teams a competitive edge when going after accounts. Combining
AI-driven insights with the tribal knowledge of your teams is perhaps the
ultimate realization of digital transformation for sales.
Customer service, and our ideas around where service begins and ends,
are being upended by the digital era as much as any other part of
business. Maybe more so.
The “on-demand economy” has quickly grown from a few upstart apps that
hire errand runners and hail cars for busy urbanites to a global movement
to, as Forbes put it, “Uberize the entire economy.” A combination of
smartphone ubiquity, electronic payment systems, and apps designed to
match demand (consumers) to supply (gig workers) in real time has
created a world in which nearly anything you might want is just a swipe
and tap away, around the clock.
Meeting your customers where they already are is a big part of winning
business in our digital world. Approaching social service with a digital
transformation mindset can really spell the difference between struggling
to keep up with customer needs and turning service calls into opportunities
to grow your brand.
More recently, PCs and mobile devices have given rise to online and mobile
banking, and cashless payment systems. Consumers now conduct more
and more bank business via the web, including paying bills and sending
funds directly to friends and family. Mobile banking apps let users take
snapshots of paper checks to make remote deposits, and a new wave of
payment systems, including PayPal and Apple Pay, let consumers pay for
everyday purchases with accounts linked directly to their phones, no cash
or plastic card required.
Retail has also been radically transformed in the digital era. Digital
transformation has both impacted the in-store retail experience and
ushered in the age of ecommerce.
Digital technologies have improved the retail experience for consumers and
proprietors alike, enabling everything from loyalty cards and e-coupons to
automated inventory and retail analytics systems. Shoppers who used to
clip coupons from newspapers and magazines now just show their phones
at checkout to access in-store discounts and deals. When they do this,
their purchases are tallied by digital systems that track consumer
behaviour trends, tie into inventory and purchasing systems, and trigger
individualized customer journey events like email and SMS messaging.
Additional personalization of the in-store experience can be enabled by
digital beacons that link to mobile apps to sense when particular shoppers
enter the store. From there, anything from a phone alert to a personal
concierge can be deployed to enhance the retail experience.
As discussed, small businesses — even those just getting off the ground —
can leverage a digital transformation mindset to build digital first into their
company culture. What better way to imagine how digital innovation can
benefit customers than by being a digital native yourself in all aspects of
growing and running a business?
If one or more of the items on our checklist rings true, it might be time to
think seriously about developing a digital transformation strategy.
• You’re not getting the referrals that you used to get. More and more
referrals are now shared online, via social media, apps, email, and
messaging. If your business doesn’t have a strong, easy-to-share digital
presence, you could be missing out on referrals.
• Repeat business isn’t repeating like it used to. Customers not coming
back to do business with you again isn’t necessarily a sign that your
products and services aren’t measuring up. Losing repeat business could
be due to competitors’ promotions, lack of follow-up communication on
your part, or any number of other reasons. A digital transformation of
your messaging strategy could shed light on why your repeats have been
dwindling.
Digging past the surface to understand the root causes of these problems
often leads to the realization that you don’t have the proper visibility into
business data necessary to make good decisions. Many SMBs are built on a
patchwork of applications that don’t talk to each other. Fixing your
technology infrastructure to facilitate sharing and analyzing data across
your business is a key step toward better, more informed decision-making.
Remember that just as digital transformations are about business first, and
digital second, problems with your business data may be signals to look
more closely at how your company is doing business generally. Laurie
McCabe, Co-Founder and Partner at SMB Group, said it well: “In fact, it's
usually situations like these that make you realize you don't have great
visibility into your own business data or, even worse, have lost touch with
what your customers want and need.”
If you’re seeing red flags and realizing that your business data isn’t
centralized, accessible, and working for you, what’s next? It’s time to craft
a digital transformation strategy.
Even if your company is small and new, and the path to digital
transformation seems clear now, remember that you’re building for the
future. And future you will be bigger. Whether that means more
employees, more revenue, or both, your business will grow. Flexibility and
the ability to stay nimble as your business evolves should be built right into
your digital transformation strategy. Connecting with a Salesforce
MVP online or in person can be a great — and free — resource as you start
thinking about your small business digital transformation strategy.
Working with consultants, partners, and tech vendors can be great for
SMBs because they have the depth of experience and knowledge to help
you figure out the best paths to success. Experienced partners have likely
helped other companies in similar situations, and so can help you find the
most direct paths to meaningful transformation.
Many small business leaders hear the word “consultant” and instinctively
flinch while reaching a hand to guard their wallets. Don’t assume that
getting help is always too expensive — that’s simply not true. Many large
companies offer free advice or trainings for SMBs. Beyond free offerings,
there are all sorts of ways to get advice without spending a lot.
Technology integration is key. It’s perhaps the number one area SMBs
should be investing in.
SMBs need to stay focused on getting the capabilities they need now in a
way that will scale as their businesses grow. Today’s business ecosystems
and platforms make it easy for vendors and developers to build apps
tailored to helping SMBs grow. Adopting a scalable platform will help
ensure that the processes and information in your company can flow as
easily as possible. That’s the foundation upon which everything else can be
built.
You don’t need to scrap everything and start over when beginning a digital
transformation, even if you’re transitioning from a snarl of apps that don’t
talk to each other. In fact, the most effective solution is to bridge data
silos, and pull all information into a central space — rather than completely
starting over.
The second part of the process is to unify your data, with the aim of
creating a single, unified view of the customer. Once you’ve built bridges
between fragmented information, you’ll be able to surface useful insights
into customer behaviour and maximize the potential of new technologies
like AI. Looking at your business anew with the benefit of new insights and
tools is what digital transformations are all about.
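The idea of bridging data silos into a single customer view can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two source systems ("crm" and "billing"), the field names, and the choice of a normalised email address as the matching key are all hypothetical, and real integrations usually need more careful identity matching.

```python
def unify_customers(*sources):
    """Merge records from several systems into one view per customer.

    Each source is a (source_name, list_of_records) pair. Records are
    matched on a normalised email address (an assumption of this sketch).
    """
    unified = {}
    for source_name, records in sources:
        for record in records:
            key = record["email"].strip().lower()
            view = unified.setdefault(key, {"email": key, "sources": []})
            view["sources"].append(source_name)
            for field, value in record.items():
                if field != "email" and value is not None:
                    # Keep the first non-empty value seen for each field.
                    view.setdefault(field, value)
    return unified

# Made-up records from two systems that do not talk to each other.
crm = [{"email": "Asha@example.com", "name": "Asha Rao", "segment": "retail"}]
billing = [
    {"email": "asha@example.com ", "name": None, "last_invoice": 120.0},
    {"email": "li@example.com", "name": "Li Wei", "last_invoice": 75.5},
]

customers = unify_customers(("crm", crm), ("billing", billing))
for email, view in customers.items():
    print(email, view)
```

Nothing is scrapped here: both systems keep running, and the unified view is built on top of them, which is the point made above.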
14.9 SUMMARY
Information is a critical enterprise asset but it’s still in the early adoption
phase. As businesses focus on digital transformation, this makes data and
analytics strategic priorities.
Businesses do realize that. But they are struggling to make the cultural
shift or commit to the necessary information management and advanced
analytics skills and technology investments.
Finding and sharing information became much easier once it had been
digitized, but the ways in which businesses used their new digital records
largely mimicked the old analog methods. Computer operating systems
were even designed around icons of file folders to feel familiar and less
intimidating to new users. Digital data was exponentially more efficient for
businesses than analog had been, but business systems and processes
were still largely designed around analog-era ideas about how to find,
share, and use information.
All digital transformations start with the move from analog to digital — that
is, taking information off of paper and putting it into the digital realm.
From there, these basic ideas apply to all businesses and industries:
CASE – STUDIES IN BUSINESS ANALYTICS
Chapter 15
Case – Studies in Business Analytics
Objectives:
On completion of this chapter, you will know the major success stories of
companies that used big data analytics to deliver extraordinary results.
These success stories are presented as case studies to help you understand
the subject matter more clearly. The case studies are as under:
Structure:
15.1 Introduction
15.2 Case study-1: Walmart: How Big Data is used to drive supermarket
performance
15.3 Case study-2: US Olympic women’s cycling team: How big data
15.4 Case study-3: US Immigration and Customs: How Big Data is used to
15.6 Case study-5: Uber: How big data is at the centre of Uber’s
transportation business
15.7 Case study-6: Amazon: How predictive analytics are used to get a
15.8 Summary
15.1 INTRODUCTION:
In this chapter we showcase the current state of the art in big data and
give an overview of how companies and organisations across different
industries are using big data to deliver value in diverse areas. The case
studies cover areas including how retailers use big data to predict trends
and consumer behaviour, how governments are using big data to foil
terrorist plots, and even how a tiny family butcher or a zoo uses big data
to improve performance, as well as the use of big data in cities, telecoms,
sports, gambling, fashion, manufacturing, motor racing, video gaming, etc.
From the areas named above, we have selected a few success stories to help
you better understand big data and business analytics.
Walmart are the largest retailer in the world and the world’s largest
company by revenue, with over 2 million employees and 20,000 stores in
28 countries. With operations on this scale, it is no surprise that they have
long seen the value in data analytics. In 2004, as Hurricane Frances
approached the US, they found that unexpected insights could come to light
when data was studied as a whole, rather than as isolated individual sets.
Attempting to forecast demand for emergency supplies ahead of the storm,
CIO Linda Dillman turned up some surprising statistics. As well as
flashlights and emergency equipment, expected bad weather had led to an
upsurge in sales of strawberry Pop-Tarts in several other locations. Extra
supplies of these were dispatched to stores in Hurricane Frances’s path and
sold extremely well.
Walmart have grown their Big Data and analytics department considerably
since then, continuously staying on the cutting edge. In 2015, the company
announced they were in the process of creating the world’s largest private
data cloud, to enable the processing of 2.5 petabytes of information every
hour.
The culmination of this strategy was referred to as the Data Café, a
state-of-the-art analytics hub at their Bentonville, Arkansas headquarters.
At the Café, the analytics team can monitor 200 streams of internal and
external data in real time, including a 40-petabyte database of all the
sales transactions in the previous week.
Teams from any part of the business are invited to visit the Café with
their data problems and work with the analysts to devise a solution. There
is also a system which monitors performance indicators across the company
and triggers automated alerts when they hit a certain level, inviting the
teams responsible for them to talk to the data team about possible
solutions.
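The alerting idea described above can be sketched in Python. This is a minimal illustration and not Walmart's actual system: the indicator names, the threshold values, the `Threshold` class and the `check_alerts` function are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Threshold:
    """Alert rule for one performance indicator (hypothetical structure)."""
    limit: float
    direction: str  # "above" fires when value > limit, "below" when value < limit

def check_alerts(readings, thresholds):
    """Return (indicator, value) pairs whose latest reading breaches its rule."""
    alerts = []
    for name, value in readings.items():
        rule = thresholds.get(name)
        if rule is None:
            continue  # no rule defined for this indicator
        if rule.direction == "above":
            breached = value > rule.limit
        else:
            breached = value < rule.limit
        if breached:
            alerts.append((name, value))
    return alerts

# Illustrative values only; none of these figures come from the case study.
thresholds = {
    "out_of_stock_rate": Threshold(limit=0.05, direction="above"),
    "daily_sales_index": Threshold(limit=0.90, direction="below"),
}
readings = {"out_of_stock_rate": 0.08, "daily_sales_index": 0.97, "footfall": 1200}

for name, value in check_alerts(readings, thresholds):
    print(f"ALERT: {name} = {value}; responsible team should talk to the data team")
```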
Once a new analyst starts at Walmart, they are put through the company’s
Analytics Rotation Program. This sees them moved through each team with
responsibility for analytical work, to allow them to gain a broad overview
of how analytics is used across the business.
Bricks-and-mortar retail may be seen as “low tech”, almost stone age in
fact, compared to its flashy online rivals, but Walmart have shown that
cutting-edge Big Data is just as relevant to them as it is to Amazon or
Alibaba. Despite the seemingly more convenient options on offer, it
appears that customers, whether through habit or preference, are still
willing to get in their cars and travel to shops to buy things in person.
This means there is still a huge market out there for the taking, and
businesses that make the best use of analytics to drive efficiency and
improve their customers’ experience are set to prosper.
Background:
Sports and data analytics are fast becoming friends. This is the story of
how the US women’s cycling team went from underdogs to silver medallists
at the 2012 London Olympics, thanks to the power of data analytics.
The team were struggling when they turned to their friends, family and
community for help. A diverse group of volunteers was formed, made up of
individuals in the sports and digital health communities and led by Sky
Christopherson. Christopherson was an Olympic cyclist and the world
record holder for the 200-metre velodrome sprint in the 35+ category. He
had achieved this using a training regime he designed himself, based on
data analytics and originally inspired by the work of cardiologist Dr Eric
Topol.
In this case, the depth of analytics meant that Christopherson was able to
drill right down to what he calls “individual optimal zones”. With this
information, tailored programs could be tweaked for each athlete to get the
best out of every team member. For example, one insight that came up was
that the cyclist Jenny Reed performed much better in training if she had
slept at a lower temperature the night before. So she was provided with a
water-cooled mattress to keep her body at an exact temperature throughout
the night. “This had the effect of giving her better deep sleep, which is
when the body releases human growth hormone and testosterone naturally”,
says Christopherson. In the case of Sarah Hammer, the data revealed a
vitamin D deficiency, so they made changes to her diet and daily routine
(including getting more sunshine). This resulted in a measurable difference
in her performance.
It also highlights the importance of spotting patterns in data. So it’s not
just about the amount of data you collect or how you analyse it; it’s about
looking for patterns across different datasets and combining that
knowledge to improve performance. This applies to sports teams and
businesses alike.
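Looking for a pattern across two different datasets can be sketched as follows. This is an illustration only: the sleep-temperature and power figures are invented, the day labels are placeholders, and the Pearson correlation coefficient is used here simply as one common way of measuring how strongly two series move together. It is not a description of the method the team actually used.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Two separate, made-up datasets keyed by day: a sleep log and a training log.
sleep_temp_c = {"d1": 20.5, "d2": 18.0, "d3": 19.0, "d4": 17.5, "d5": 21.0}
avg_power_w = {"d1": 301, "d2": 322, "d3": 315, "d4": 327, "d5": 298, "d6": 310}

# Combine the datasets on the days that appear in both.
shared_days = sorted(sleep_temp_c.keys() & avg_power_w.keys())
temps = [sleep_temp_c[d] for d in shared_days]
power = [avg_power_w[d] for d in shared_days]

r = pearson(temps, power)
print(f"days matched: {len(shared_days)}, correlation: {r:.2f}")
```

A strongly negative value on data like this would suggest that cooler nights go together with better training sessions. Correlation alone does not prove cause, so a finding of this kind is a prompt for a controlled change, such as the cooled mattress described above.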
Background:
People move back and forth across US borders at a rate of nearly 100
million crossings a year. The Department of Homeland Security (DHS) have
the unenviable task of screening each one of those crossings to make sure
that they are not being made with ill intention and pose no threat to
national security.
Research has shown that there is no foolproof way for a human to tell
whether another human is lying simply by speaking to and watching them,
despite what many believe about “giveaway signs”. Compounding this
problem, humans inevitably get tired, bored and distracted, meaning
their level of vigilance can drop.
The AVATAR system uses sensors that scan the person’s face and body
language, picking up the slightest variations of movement or cues which
could suggest something suspicious is going on. In addition, a
computerised agent with a virtual human face and voice asks several
questions in spoken English. The subject of inspection answers, and their
response is monitored to detect fluctuations in tone of voice as well as
the content of what exactly was said.
This data is then compared against the ever growing and constantly
updating big database collected by AVATAR, and matched against
suspicious profiles which experience has shown can indicate that someone
has something to hide or is not being honest about their intentions in
travelling.
The data is fed back to the human agents via tablets and smartphones,
which gives them a probabilistic assessment of whether a particular subject
is likely to be honest. Each aspect of their profile is coded red, amber or
green, depending on how likely AVATAR believes it is that they are being
truthful. If too many reds or ambers flash up, that subject will be
investigated in more depth.
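The red, amber and green coding described above can be illustrated with a short Python sketch. The cut-off probabilities, the number of flags that triggers a deeper check and the aspect names are assumptions made for this example. The source does not give AVATAR's real scoring rules.

```python
def colour_code(p_truthful, red_below=0.4, green_from=0.7):
    """Map a probability of truthfulness to red, amber or green.

    The cut-off values are illustrative assumptions.
    """
    if p_truthful < red_below:
        return "red"
    if p_truthful < green_from:
        return "amber"
    return "green"

def needs_further_check(profile, max_flags=2):
    """Colour-code every aspect and flag the subject if too many are red or amber."""
    codes = {aspect: colour_code(p) for aspect, p in profile.items()}
    flags = sum(1 for code in codes.values() if code in ("red", "amber"))
    return codes, flags > max_flags

# Made-up scores for one subject; the aspect names are hypothetical.
profile = {
    "voice_tone": 0.35,
    "eye_movement": 0.55,
    "posture": 0.80,
    "answer_content": 0.62,
}
codes, refer = needs_further_check(profile)
print(codes)
print("refer for investigation:", refer)
```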
As well as on the US-Mexico border, the AVATAR system has been trialled
on European borders, including at Bucharest’s main airport.
Machines have the capability to detect whether humans are lying or acting
deceptively far more accurately than people themselves can, if they are
given the right data and algorithms.
Humans respect authority: lab tests on the AVATAR system found that
interviewees were more likely to answer truthfully when AVATAR was given
a serious, authoritative tone and face than when programmed to speak and
appear friendly and informal.
Background:
Big data is big in the gaming industry. Take Zynga, the company behind
FarmVille, Words with Friends and Zynga Poker. Zynga position themselves
as makers of social games, which are generally played on social media
platforms and take advantage of the connectivity with other users that
those platforms offer. Their games are also built to take advantage of the
big data those platforms enable them to collect. At the company’s peak, as
many as 2 million players were playing their games at any point during the
day, and every second their servers processed 650 hands of Zynga Poker.
Big Data also plays a part in designing the games. Zynga’s smartest big
data insight was to realise the importance of giving their users what they
wanted, and to this end, they monitored and recorded how their games
were being played, using the data gained to tweak gameplay according to
what was working well. For example, animals, which played mostly a
background role in early versions, were made a more prominent part of
later games when the data revealed how popular they were with gamers.
In short, Zynga use data to understand what gamers like and don’t like
about their games.
Game developers are more aware than ever of the huge amount of data that
can be gained when every joystick twitch can be analysed to provide
feedback on how gamers play games and what they enjoy. Once a game
has been released, this feedback can be analysed to find out if, for
example, players are getting frustrated at a certain point, and a live update
can be deployed to make it slightly easier. The idea is to provide the player
with a challenge that remains entertaining without becoming annoying.
The ultimate aim is always to keep players gaming for as long as possible,
either so that they feel they are getting value for money if it is a game they
paid for, or so that they can be served plenty of ads if it is a free game.
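One simple way to find the point at which players are getting frustrated is to measure, for each level, the share of attempts that end with the player quitting. The sketch below assumes a hypothetical event log of (level, outcome) pairs and an arbitrary 50% cut-off. It is an illustration of the idea rather than any studio's actual telemetry pipeline.

```python
from collections import Counter

def quit_rates(events):
    """Share of attempts at each level that ended with the player quitting."""
    attempts, quits = Counter(), Counter()
    for level, outcome in events:
        attempts[level] += 1
        if outcome == "quit":
            quits[level] += 1
    return {level: quits[level] / attempts[level] for level in attempts}

def frustrating_levels(events, max_quit_rate=0.5):
    """Levels whose quit rate exceeds the cut-off (an assumed threshold)."""
    rates = quit_rates(events)
    return sorted(level for level, rate in rates.items() if rate > max_quit_rate)

# Made-up gameplay events: (level number, how the attempt ended).
events = [
    (1, "cleared"), (1, "cleared"),
    (2, "cleared"), (2, "quit"),
    (3, "quit"), (3, "quit"), (3, "cleared"), (3, "quit"),
]
print(quit_rates(events))
print("candidates for an easier live update:", frustrating_levels(events))
```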
Zynga make their data available to all employees, so they can see what
has proved popular in games. So even a FarmVille product manager can see
the Poker data and see how many people have done a particular game
action, for example. This transparency helps foster a data-driven culture
and encourages data experimentation across the company. Indeed, Yuko
Yamazaki, Head of Analytics at Zynga, says that the company are
currently running over 1,000 experiments on live products at the time of
writing, continually testing features and personalising game behaviours for
their players. Zynga’s analytics team also run “data hackathons”, using
their data and use cases, and they host many analytics and data meet-ups
on site. All this helps encourage innovation and strengthen the
data-driven culture.
Elsewhere in the gaming industry, it has even been suggested that
Microsoft’s $2.5 billion acquisition of Minecraft last year was because of
the game’s integrated data mining capabilities, which Microsoft could use in
other products. Minecraft, the extremely popular world-building game, is
based around a huge database containing thousands of individual items
and objects that make up each world. By playing the game, the player is
essentially manipulating that data to create their desired outcome in the
game. Minecraft, in Microsoft’s opinion, provides an ideal introduction for
children to the principles of structuring and manipulating digital data to
build models that relate in some way to the real world.
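An experiment on a live product is often evaluated by comparing the share of players who take some action under the existing feature with the share under a new variant. The sketch below uses a two-proportion z-test, a standard statistical technique chosen here for illustration. The source does not say which tests Zynga use, and the player counts are invented.

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    Returns the z statistic and the p-value under the normal approximation.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Made-up experiment: control group A versus a new feature shown to group B.
z, p_value = two_proportion_z(conv_a=200, n_a=2000, conv_b=260, n_b=2000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("difference unlikely to be chance alone; consider rolling out B")
else:
    print("no clear difference; keep testing")
```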
“Compared to web gaming,” Yamazaki explains, “mobile gaming has its own
challenges, such as anonymous play activities, more genres of game and
more concentrated session activities.” Particularly in mobile games, session
length can be more important than the number of users, and longer
sessions mean greater opportunities for Zynga. This is because in mobile
sessions, players are usually paying attention the whole time (whereas in
a browser-based session, they may just have the page open on an inactive
tab). So though the number of daily active users is down, a stronger focus
on mobile games will provide Zynga with the potential for greater reach
and higher revenue.
Background:
Uber is a smartphone app-based taxi booking service which connects users
who need to get somewhere with drivers willing to give them a ride. The
service has been hugely popular. Since being launched to serve San
Francisco in 2009, the service has been expanded to many major cities on
every continent except Antarctica, and the company are now valued at
$41 billion. The business is rooted firmly in Big Data, and leveraging this
data in a more effective way than traditional taxi firms has played a huge
part in their success.
Uber hold a huge database of drivers in all the cities they cover, so when
a passenger asks for a ride, they can instantly match them with the most
suitable drivers. The company have developed algorithms to monitor traffic
conditions and journey times in real time, meaning prices can be
adjusted as demand for rides changes and as traffic conditions mean
journeys are likely to take longer. This encourages more drivers to get
behind the wheel when they are needed, and to stay at home when demand
is low. The company have applied for a patent on this method of big
data-informed pricing, which they call “surge pricing”. This is an
implementation of “dynamic pricing”, similar to that used by hotel chains
and airlines to adjust prices to meet demand, although rather than simply
increasing prices at weekends or during public holidays, it uses predictive
modelling to estimate demand in real time.
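The principle of dynamic pricing can be shown with a deliberately simplified Python sketch. It is not Uber's patented method: the multiplier rule, the cap of 3.0 and the fare parameters are all assumptions, and the demand figure is passed in directly where a real system would use a predictive model's estimate.

```python
def surge_multiplier(ride_requests, available_drivers, cap=3.0):
    """Illustrative price multiplier.

    Stays at 1.0 while supply covers demand and rises with the
    demand/supply ratio, up to a cap. All values are assumptions.
    """
    if available_drivers <= 0:
        return cap
    ratio = ride_requests / available_drivers
    return round(min(cap, max(1.0, ratio)), 2)

def quote(base_fare, per_minute, est_minutes, ride_requests, available_drivers):
    """Fare estimate: time-based fare scaled by the current multiplier."""
    multiplier = surge_multiplier(ride_requests, available_drivers)
    return round((base_fare + per_minute * est_minutes) * multiplier, 2)

# Made-up scenarios for a 20-minute journey.
print(quote(2.0, 0.5, 20, ride_requests=50, available_drivers=100))   # quiet period
print(quote(2.0, 0.5, 20, ride_requests=150, available_drivers=100))  # busy period
```

Because the estimated journey time is part of the fare, slower traffic raises the quote as well, in line with the description above.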
Data also drives the company’s Uber Pool service, which allows users to find
others near them who, according to Uber’s data, often make similar
journeys at similar times, so that they can share a ride. According to
Uber’s blog, introducing this service became a no-brainer when their data
told them the “vast majority of [trips] have a look-a-like, a trip that starts
near, ends near and is happening around the same time as another trip”.
Other initiatives either trialled or due to launch in future include Uber
Chopper, offering helicopter rides to the wealthy, Uber Fresh for grocery
deliveries and Uber Rush, a package courier service.
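The notion of a look-alike trip (starts near, ends near, happens around the same time) can be sketched as a simple pairwise check. The example below is hypothetical throughout: it places trips on a flat grid measured in kilometres, and the 0.5 km and 10 minute tolerances are invented for illustration.

```python
import math
from dataclasses import dataclass

@dataclass
class Trip:
    rider: str
    start: tuple     # (x_km, y_km) on a flat local grid, a simplifying assumption
    end: tuple
    depart_min: int  # departure time in minutes after midnight

def is_lookalike(a, b, max_km=0.5, max_minutes=10):
    """True when two trips start near, end near and depart around the same time."""
    return (
        math.dist(a.start, b.start) <= max_km
        and math.dist(a.end, b.end) <= max_km
        and abs(a.depart_min - b.depart_min) <= max_minutes
    )

def lookalike_pairs(trips):
    """All pairs of riders whose trips could be shared."""
    pairs = []
    for i, a in enumerate(trips):
        for b in trips[i + 1:]:
            if is_lookalike(a, b):
                pairs.append((a.rider, b.rider))
    return pairs

# Made-up trips; rider names are placeholders.
trips = [
    Trip("ana", (0.0, 0.0), (5.0, 5.0), 480),
    Trip("ben", (0.2, 0.1), (5.1, 4.8), 485),
    Trip("cy", (3.0, 3.0), (5.0, 5.0), 482),
]
print(lookalike_pairs(trips))
```

Comparing every pair is fine for a handful of trips. At city scale, a spatial index would be needed to avoid checking all combinations.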
Uber rely on a detailed rating system, in which users can rate drivers and
vice versa, to build up trust and allow both parties to make informed
decisions about who they want to share a car with. Drivers in particular
have to be very conscious of keeping their standards high, as falling below
a certain threshold could result in their not being offered any more work.
They have another metric to worry about, too: their “acceptance rate”. This
is the number of jobs they accept versus those they decline. Drivers
apparently have been told that they should aim to keep this above 80%, in
order to provide a consistently available service to passengers.
There is a bigger-picture benefit to all this data that goes way beyond
changing the way we book taxis or get ourselves to the office each day.
Uber CEO Travis Kalanick has claimed that the service will also cut the
number of private, owner-operated automobiles on the roads of the
world’s most congested cities. For instance, he hopes Uber Pool alone could
help cut traffic on the streets of London by a third. Services like Uber could
revolutionise the way we travel around our crowded cities. There are
certainly environmental as well as economic reasons why this could be a
good thing.
It is fair to say there are still some legal hurdles to overcome: the service
is currently banned in a handful of jurisdictions, including Brussels and
parts of India, and is receiving intense scrutiny in many other parts of the
world. There have been several court cases in the US regarding the
company’s compliance with regulatory procedures, some of which have
been dismissed and some of which are still ongoing. But given their
popularity, there is a huge financial incentive for the company to press
ahead with plans to transform private travel.
Background:
Amazon long ago outgrew their original business model of an online
bookshop. They are now one of the world’s largest retailers of physical
goods, virtual goods such as e-books and streaming video and, more
recently, web services.
With this ethos in mind, Amazon have also moved into being a producer
of goods and services, rather than just a retailer. As well as commissioning
films and TV shows, they build and market electronics, including
tablets, TV boxes and streaming hardware.
Even more recently, they have moved to take on food supermarkets head-on
by offering fresh produce and far quicker delivery through their Amazon
Now service.
The problem here is that a customer can often feel overwhelmed when
presented with a huge range of possible options. Psychologically, worries
about suffering from “buyer’s remorse” (wasting money by making
ill-informed purchasing decisions) can lead to our putting off spending
money until we are certain we have done sufficient research. The confusing
number of options may even cause us to change our minds entirely about the
fact that we need a $2,000 ultra-HD television set and decide to go on
vacation instead.
It is the same problem that plagues many projects involving large
amounts of information. Customers can become data rich, with a great
many options, but insight poor, with little idea about what would be the
best purchasing decision to meet their needs and desires.
Amazon probably did not invent the recommendation engine, but they
introduced it to widespread public use. The theory is that the more they
know about you, the more likely they are to be able to predict what you
want to buy. Once they have done that, they can streamline the process of
persuading you to buy it by cutting out the need for you to search through
their catalogue.
Unlike with content-based filtering (as seen, for example, in Netflix’s
recommendation engine), this means the system does not actually have to
know anything about the unstructured data within the products it sells. All
it needs is metadata: the name of the product, how much it costs, who else
has bought it and similar information.
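A recommender that relies only on who else has bought a product, and not on what the product contains, can be sketched by counting how often items appear in the same basket. This is a toy illustration of the general idea, not Amazon's engine: the baskets and product names are invented, and real systems weight and normalise these counts in far more sophisticated ways.

```python
from collections import defaultdict
from itertools import combinations

def co_purchase_counts(baskets):
    """Count how often each pair of products was bought together."""
    counts = defaultdict(lambda: defaultdict(int))
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts

def recommend(item, baskets, top_n=2):
    """Products most often bought with `item`, ties broken alphabetically."""
    related = co_purchase_counts(baskets)[item]
    ranked = sorted(related.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]

# Made-up purchase histories; only product names are used, no product content.
baskets = [
    ["tent", "sleeping bag", "torch"],
    ["tent", "sleeping bag"],
    ["tent", "torch", "stove"],
    ["novel", "bookmark"],
]
print(recommend("tent", baskets))
```

The function never inspects what a tent or a torch is. Purchase metadata alone drives the suggestion, which is the contrast with content-based filtering drawn above.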
Amazon gather data on every one of their over a quarter of a billion
customers while they use their services. As well as what you buy, they
monitor what you look at, your shipping address to determine
demographic data (they can take a good stab at guessing your income
level by knowing what neighbourhood you lives in) and whether you leave
the customer reviews and feedback.
They also look at the time of day you are browsing, to determine your
habitual behaviour and match your data with others who follow similar
patterns.
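Matching customers by browsing habits can be sketched as a similarity
comparison between hour-of-day profiles. The session data and function names
below are hypothetical, and cosine similarity is an assumed choice of
measure; the source does not say how Amazon performs this matching.

```python
import math

# Hypothetical browsing logs: customer id -> hours of the day (0-23) of each visit.
sessions = {
    "night_owl_1": [22, 23, 23, 0],
    "night_owl_2": [23, 0, 0, 1],
    "lunch_shopper": [12, 12, 13],
}


def hourly_profile(browsing_hours):
    """Turn a list of visit hours into a 24-slot count vector."""
    profile = [0] * 24
    for hour in browsing_hours:
        profile[hour] += 1
    return profile


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(customer, sessions):
    """Return (name, similarity) of the other customer with the closest profile."""
    target = hourly_profile(sessions[customer])
    others = (
        (name, cosine_similarity(target, hourly_profile(hours)))
        for name, hours in sessions.items()
        if name != customer
    )
    return max(others, key=lambda pair: pair[1])


name, score = most_similar("night_owl_1", sessions)
print(name, round(score, 3))
```

In this toy data the two late-night browsers are matched with each other,
while the lunchtime shopper shares no browsing hours with either of them.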
Revenue from their cloud-based web services businesses, such as Amazon
Web Services, has grown by 81 percent in the last year to $1.8 billion.
your location data and information about other apps and services you use
on your phone. Using Amazon's streaming content services, such as
Amazon Prime and Audible, provides them with more detailed information
on where, when and how you watch and listen to TV, film and audio.
The more a business knows about a customer, the better it can sell to
them. Developing a 360-degree view of each customer as an individual is a
foundation of big-data-driven marketing and customer service.
15.8 SUMMARY
The next few years will see companies that ignore big data overtaken by
those that don't. Any organisation without a big data strategy and without
plans in place to start using big data to improve performance will be left
behind.
It is impossible to predict the future of big data, but one can see the
term disappearing, both because it will no longer be needed to emphasize a
new phenomenon and because it overemphasizes the size of the data rather
than its variety and what we do with it.
Smart application of big data starts with your strategy, in order to
identify the areas in which data can make the biggest difference to
performance and decision making. Only once you are clear about the
strategic questions that big data could help to answer should you start to
collect and analyse data to help answer those questions and transform the
organisation. The six case studies in this chapter show clearly how these
principles are applied well. However, in practice there are a lot of
companies that get lost in the big data opportunities and end up hoarding
data in the mistaken belief that it will, some day, become useful.
So it will always be better to start with the right strategy and identify
the big challenges and areas in which data will make the biggest
difference. Only then collect and analyse the data that will help you to
meet those challenges. Don't fall into the trap of collecting and
analysing everything you can.
the developments of big data and make the phenomenon become even more
important.
If you were to look into a crystal ball, you would see an increasing move
to real-time analytics, where large volumes of data (structured and
unstructured) are analysed in almost real time to inform decision making
and to feed machine-learning algorithms.
There is no doubt that big data will give us many innovations and
improvements, but it will also challenge us in areas such as data
privacy and data protection. The ability to analyse everything can, in the
wrong hands, cause unthinkable harm. It will be up to all of us to ensure
the right legal frameworks are in place to protect against the misuse of
big data.
1. How is big data used in practice by Walmart in the case study
given?
5. In the Amazon case study discussed in this chapter, what was the
problem that big data helped to solve?
4. In the case study of Uber, it proved tricky to get any great detail on
Uber's big data infrastructure, but it appears all their data is collected
into a Hadoop data lake and they use ------------ to process the data.
a. Apache Spark
b. Hadoop
c. Meta data
d. Both Apache Spark and Hadoop