Fundamentals of Big Data & Business Analytics
SVKM'S NMIMS UNIVERSITY
CENTRE FOR DISTANCE AND ONLINE EDUCATION
FUNDAMENTALS OF BIG DATA & BUSINESS ANALYTICS
Edited by:
Dr. Brinda Sampat
NMIMS Centre for Distance and Online Education
ISBN: 978-93-86052-15-5
7
8 Prescriptive Analytics
9 Social Media Analytics and Mobile Analytics
10 Data Visualisation
11 Business Analytics in Practice
Case Studies
FUNDAMENTALS OF BIG DATA & BUSINESS ANALYTICS
CURRICULUM
Business Transformation with Big Data: What is Big Data; Structured v/s Unstructured data; Big Data Skills and Sources of Big Data; Big Data Adoption; Characteristics of Big Data - The Seven V's; Understanding Big Data with Examples; Key aspects of a Big Data Platform; Governance for Big Data; Text Analytics and Streams; Business applications of Big Data; Technology infrastructure required to store, handle, and manage Big Data
Technologies for handling Big Data: Distributed and Parallel Computing for Big Data; Introduction to Big Data Technologies (Hadoop, Python, R, etc.); Cloud Computing and Big Data; In-Memory Technology for Big Data; Big Data Techniques (Massive Parallelism; Data Distribution; High-Performance Computing; Task and Thread Management; Data Mining and Analytics; Data Retrieval; Machine Learning; Data Visualization)
Introduction to Business Analytics: What is Business Analytics (BA)?; Types of BA; Business Analytics Model; Importance of business analytics now; What is Business Intelligence (BI)?; Relation between BI and BA; Emerging Trends in BI and BA
Descriptive Analytics: What is Descriptive Analytics; Visualizing and Exploring Data; Descriptive Statistics; Sampling and Estimation; Introduction to Probability Distributions
CONTENTS
1.1 Introduction
1.2 Evolution of Big Data
Self Assessment Questions
Activity
1.3 Structured v/s Unstructured Data
Self Assessment Questions
Activity
1.4 Big Data Skills and Sources
1.4.1 The Sources of Big Data
Self Assessment Questions
Activity
1.5 Big Data Adoption
1.5.1 Use of Big Data in Social Networking
1.5.2 Use of Big Data in Preventing Fraudulent Activities
1.5.3 Use of Big Data in Retail Industry
Self Assessment Questions
Activity
1.6 Characteristics of Big Data - The Seven Vs
Self Assessment Questions
Activity
1.7 Big Data Analytics
1.7.1 Advantages of Big Data Analytics
Self Assessment Questions
Activity
1.8 Key Aspects of a Big Data Platform
Self Assessment Questions
Activity
1.9 Governance for Big Data
Self Assessment Questions
Activity
1.10 Text Analytics
Self Assessment Questions
Activity
1.11 Business Applications of Big Data
Self Assessment Questions
Activity
1.12 Technology Infrastructure Requirement
1.12.1 Storing of Big Data
1.12.2 Handling of Big Data
1.12.3 Managing Big Data
Self Assessment Questions
Activity
1.13 Summary
1.14 Descriptive Questions
1.15 Answers and Hints
1.16 Suggested Readings & References
INTRODUCTORY CASELET
LEARNING OBJECTIVES
1.1 INTRODUCTION
The 21st century is characterised by rapid advancement in the field of information technology. IT has become an integral part of daily life as well as of various industries, be it health, education, entertainment, science and technology, genetics or business operations. In today's competitive global economy, organisations must possess a number of skills to establish and sustain their place in the market. One of the most crucial of these skills is an understanding of the immense potential of information technology and the ability to harness it.
This chapter first discusses the evolution of Big Data. Next, it describes the differences between structured and unstructured data. Further, it explains Big Data skills and sources, and then discusses the adoption of Big Data. The chapter also discusses the characteristics of Big Data and Big Data analytics. Next, it covers the key aspects of a Big Data platform and text analytics. Towards the end, the chapter discusses the business applications of Big Data and the technology infrastructure requirement.
Some years later, in 1919, IBM took up the agricultural census with over 5,000 federal employees deployed across Washington and over 90,000 enumerators, using more than 100 million IBM punch cards and other processing equipment. After that successful program, Big Data took another leap forward with the Manhattan Project (the atomic bomb developed by the US in World War II) and, furthermore, with the US space programs from 1950. Later, a synoptic data collection model was adopted, which relied heavily on the allocation of large data sets. This shift in data-collection techniques, analysis and subsequent collaboration helped to redefine how bigger scientific projects were planned and accomplished. One such ambitious project was the International Biological Program, which studied the effect of environmental changes on the species and flora-fauna of a particular place. This program led to an exponential increase in the amount of data gathered and combined it with the latest analysis technologies. Although it was met with difficulties related to research structures and methodologies, and ultimately ended in 1974, it transformed the ways in which data was collected, organised and shared, and redefined how existing technology could use data science more efficiently.
The lessons learned from the arrival of Big Data science paved the way for contemporary Big Data projects, like weather prediction, supercollider data analytics and other physics-based research, astronomical sciences and data collection such as planetary image detection, medical research and many others. Big Data has become such a dynamic force that it no longer applies only to the sciences; many businesses have hooked their critical data-based services onto its methodologies, techniques and objectives too, which has allowed them to unleash data value that might have gone unnoticed earlier.
1. The path towards modern Big Data was actually laid during
ACTIVITY
Apart from existing industries and domains, where else do you think Big Data can play a crucial role in improving overall operational and organisational efficiency? Make a list of the domains, with reasons to back them up.
No. Actually, a Word file may not fit in a database where only text files are supposed to be kept. A Word file may have an internal structure with all sorts of indentation, grammar, alignment and margins thoroughly worked out, but in a database with different definitions for the data, the database designer expects a text or Excel file, because a Word file is considered unstructured.
Structured data has many advantages: it can be seamlessly added to a relational database and is easily searchable by the simplest search-engine operations or algorithms. Unstructured data is essentially the reverse. It is a nightmare for designers to connect random strands of such data with the existing meaningful data and present the result as a structure. Structured data is closer to machine language than unstructured data is. So, the struggle to find a fine balance between keeping the machine happy and the user happier is what drives the ever-refining Big Data sciences and their affiliated technologies.
ACTIVITY
Observe your day-to-day life for a week and write down all the structured data patterns that you see. Compare them with the unstructured patterns around you. Now, think of ways in which a connection between them can be established, if required. Please note: relate only logically cohesive things, i.e. things that can co-exist.
EXHIBIT
Semi-structured data
Semi-structured data, also described as having a schema-less or self-describing structure, refers to a form of structured data that contains tags or markup elements to separate elements and generate hierarchies of records and fields within the data. Such data does not follow the rigid structure of data models found in relational databases. In other words, data is not stored consistently in the rows and columns of a database. Some sources of semi-structured data include:
□ File systems, such as Web data in the form of cookies
□ Data exchange formats, such as JavaScript Object Notation (JSON) data
SEMI-STRUCTURED DATA
S.No.  Name                                  E-mail
1.     Sam Jacobs                            [email protected]
2.     First Name: David; Last Name: Brown   [email protected]
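The exhibit's two rows can be written as JSON to show why such data is called self-describing. A minimal Python sketch using only the standard `json` module; the field names and the `example.com` e-mail addresses are hypothetical placeholders, since the original addresses are not recoverable. Because the second record nests the name into tagged parts, the code reads the tags rather than assuming a fixed column layout.

```python
import json

# Hypothetical JSON for the exhibit: the records do not share one fixed schema,
# because the second one splits the name into tagged first/last parts.
raw = """
[
  {"s_no": 1, "name": "Sam Jacobs", "email": "sam@example.com"},
  {"s_no": 2, "name": {"first": "David", "last": "Brown"}, "email": "david@example.com"}
]
"""

def full_name(record):
    """Return a flat 'First Last' string, whatever shape the name field has."""
    name = record.get("name")
    if isinstance(name, dict):
        # The tags themselves say which part is which.
        return f"{name.get('first', '')} {name.get('last', '')}".strip()
    return name or ""

records = json.loads(raw)
rows = [(r["s_no"], full_name(r), r.get("email")) for r in records]
for row in rows:
    print(row)
```

Flattening like this is a typical first step before semi-structured data can be loaded into a relational table.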
Over the next five years, by comparison, demand for Big Data staff is forecast to increase at an average rate of between 13% (low growth) and 23% (high growth) p.a. The mid-point of these two rates gives an expected growth rate of 18% p.a. This would be a favourable situation and should equate to the creation of approximately 28,000 job opportunities p.a. by 2017.
That was an overview of Big Data technologies and methodologies, with a brief look at job prospects for a potential Big Data candidate. Let us now take a brief look at the sources of the datasets that define Big Data as a science and complete it as a method.
The philosophy around Big Data science and collection has often been defined in terms of the 3 Vs: the volume, velocity and variety of data flowing into a system. For many years this was enough, but as companies moved towards online processes, the description was stretched to include variability, which denotes the increase in the range of values in a large data set, and value, which addresses the evaluation of enterprise data.
ACTIVITY
Can some data types be more reliable and authentic while others are more prone to errors? Consider metrics such as references, quotes and sources while creating the visualisation.
Big Data frameworks are not push-button answers. For data analysis and analytics to offer value, corporations need data management and governance frameworks for Big Data. Well-defined processes and adequate skill sets for those who will be responsible for customising, implementing, populating and using Big Data solutions are also necessary. Additionally, the quality of the data intended for Big Data-powered processing needs to be evaluated.
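Evaluating data quality can begin with a few simple measurements before data enters a Big Data pipeline. A minimal Python sketch, assuming hypothetical customer records and three illustrative rules (e-mail completeness, duplicate IDs and an age range of 0 to 120); an actual governance framework would define its own rules and acceptance thresholds.

```python
from collections import Counter

# Hypothetical incoming records.
records = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": None, "age": 41},              # missing e-mail
    {"id": 2, "email": "b@example.com", "age": 41},   # duplicate id
    {"id": 3, "email": "c@example.com", "age": -5},   # invalid age
]

def quality_report(rows, key="id"):
    """Return simple completeness, uniqueness and validity metrics."""
    total = len(rows)
    missing_email = sum(1 for r in rows if not r.get("email"))
    key_counts = Counter(r[key] for r in rows)
    duplicates = sum(c - 1 for c in key_counts.values() if c > 1)
    invalid_age = sum(1 for r in rows if not 0 <= r.get("age", -1) <= 120)
    return {
        "completeness_email": (total - missing_email) / total,
        "duplicate_keys": duplicates,
        "invalid_age": invalid_age,
    }

report = quality_report(records)
print(report)
```

In practice such a report would be compared against agreed thresholds, and failing batches held back for correction.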
The clerk of a U.S. council received an e-mail from her senior, who was abroad on vacation, requesting a funds transfer for a time-bound acquisition that had to be closed by the end of the day. The senior said that a lawyer would contact her to provide further details.
"It was not uncommon for me to get official e-mails seeking funds transfer," the clerk said. When the lawyer later contacted her via e-mail with the appropriate authorisation, including her senior's signature with the company's seal, she simply followed the directions and transferred more than $880,000 to a bank in China.
Clearly, handling such attacks requires a different defensive outlook. Big Data offers a potential answer, as it allows institutions and corporations to tackle fraud differently and get results accordingly.
Here is how Big Data helps in preventing fraud:
□ Recognising suspicious activities in advance: Banks constantly monitor real-time data for suspicious behaviour. For example, if a credit card owner transacts for the first time from a particular device, the bank gets notified. If multiple transactions occur from different devices in a day, the resulting data is enough to raise the alarm and red-flag the transactions. Some banks also inform the actual card holders instantly and can block the transaction. Big Data simplifies the detection of unusual transactions: for instance, if two transactions take place on a single credit card in different cities within a short period, the bank is alerted.
□ Leveraging data to detect suspicious activities: Banks access large amounts of customer data from various sources, such as social media, logs and call-centre conversations, and that data can be very helpful in detecting abnormal activities. For example, suppose a credit card holder is currently travelling on a flight and has posted this status on Facebook. Any transaction on the user's credit card during that period is then considered suspicious and can be blocked at the bank's discretion.
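The rules above can be sketched as checks over a card's transaction stream. This is a minimal, rule-based Python sketch; the field names, the two-hour threshold and the in-flight windows (standing in for the Facebook status in the example) are hypothetical, and real bank systems combine many more signals, typically with machine learning models.

```python
from datetime import datetime, timedelta

# Hypothetical threshold for "different cities within a short period".
MIN_GAP_BETWEEN_CITIES = timedelta(hours=2)

def flag_transactions(txns, known_devices, in_flight_windows):
    """Return (transaction id, reason) pairs for suspicious transactions.

    txns: dicts with id, card, time, city and device, sorted by time.
    known_devices: card -> set of devices already seen for that card.
    in_flight_windows: card -> list of (start, end) times when the card
        holder is known (e.g. from a social media post) to be on a flight.
    """
    flags = []
    last_seen = {}  # card -> (time, city) of that card's previous transaction
    for t in txns:
        card = t["card"]
        # Rule 1: transaction from a device never seen for this card.
        if t["device"] not in known_devices.get(card, set()):
            flags.append((t["id"], "new device"))
        # Rule 2: same card used in two cities within a short period.
        prev = last_seen.get(card)
        if prev and prev[1] != t["city"] and t["time"] - prev[0] < MIN_GAP_BETWEEN_CITIES:
            flags.append((t["id"], "different cities within a short period"))
        # Rule 3: card used while the holder is reported to be in the air.
        if any(s <= t["time"] <= e for s, e in in_flight_windows.get(card, [])):
            flags.append((t["id"], "card holder reported in flight"))
        last_seen[card] = (t["time"], t["city"])
    return flags

start = datetime(2017, 4, 22, 9, 0)
txns = [
    {"id": "T1", "card": "C1", "time": start, "city": "Mumbai", "device": "phone-A"},
    {"id": "T2", "card": "C1", "time": start + timedelta(minutes=40), "city": "Delhi", "device": "phone-A"},
    {"id": "T3", "card": "C1", "time": start + timedelta(hours=5), "city": "Delhi", "device": "laptop-X"},
]
known = {"C1": {"phone-A"}}
flying = {"C1": [(start + timedelta(hours=4), start + timedelta(hours=6))]}

flags = flag_transactions(txns, known, flying)
for txn_id, reason in flags:
    print(txn_id, reason)
```

Each rule is independent, so new signals (such as call-centre notes) can be added as further checks without changing the others.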
Let us now consider the insurance industry, which receives a large number of deceitful claims, accepts some of them and disburses substantial claim amounts. How does Big Data assist in such a case? The industry can access data gathered from a variety of sources, such as past claim records, social media, phone records and criminal records. Upon receipt of a claim, the scrutiniser should verify the claimant information. If any suspicious activity is found in the claimant's record, the claim should be forwarded for additional investigation.
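The screening step described above can be sketched as a set of red-flag rules applied to each incoming claim. A minimal Python sketch; the rule names, thresholds and record fields are hypothetical, and, as the text recommends, any triggered rule forwards the claim for investigation.

```python
# Hypothetical red-flag rules; each takes the claim and the claimant's history
# (gathered from past claims, social media, phone and criminal records).
RULES = {
    "three or more prior claims": lambda claim, hist: hist["prior_claims"] >= 3,
    "claim within 30 days of policy start": lambda claim, hist: claim["days_since_policy_start"] < 30,
    "prior fraud conviction": lambda claim, hist: hist["fraud_conviction"],
    "social media contradicts claim": lambda claim, hist: hist["social_media_conflict"],
}

def screen_claim(claim, history):
    """Return ('investigate' or 'process', list of triggered rules)."""
    reasons = [name for name, rule in RULES.items() if rule(claim, history)]
    # Any suspicious finding forwards the claim for additional investigation.
    decision = "investigate" if reasons else "process"
    return decision, reasons

claim_a = {"id": "CL-101", "days_since_policy_start": 12}
history_a = {"prior_claims": 4, "fraud_conviction": False, "social_media_conflict": False}
claim_b = {"id": "CL-102", "days_since_policy_start": 400}
history_b = {"prior_claims": 0, "fraud_conviction": False, "social_media_conflict": False}

result_a = screen_claim(claim_a, history_a)
result_b = screen_claim(claim_b, history_b)
print(claim_a["id"], result_a)
print(claim_b["id"], result_b)
```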
Big Data has brought remarkable results for retailers across industries, as is evident from their testimonials.
such unpredictable weather are not good. However, Cafe Inn turned this adversity to its advantage. It observed that travellers on a cancelled flight end up in an urgent situation and need an overnight stay. The company used weather and flight-cancellation information that was readily and freely available, coupled with hotel and airport information, and developed an algorithm that took into account factors like travel conditions, weather severity, time of day and rates of cancellation by airlines, among other variables. With insights from Big Data and pattern recognition of travellers using mobile devices, the company effectively used Pay Per Click (PPC) and mobile search campaigns to send targeted mobile ads to stranded travellers. This made it easier for them to book a nearby hotel and increased overall hotel revenue manifold, even at the most unexpected times.
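The text does not describe Cafe Inn's algorithm in detail, so the following is only a hypothetical Python sketch of how the listed factors (weather severity, cancellation rate, travel conditions and time of day) might be combined into a score that triggers a hotel ad. The weights, the evening bonus and the threshold are invented for illustration.

```python
# Hypothetical weights for factors already scaled to the range 0..1.
WEIGHTS = {"weather_severity": 0.4, "cancellation_rate": 0.4, "poor_travel_conditions": 0.2}
LATE_HOURS = range(18, 24)  # assumption: evening cancellations strand more travellers
AD_THRESHOLD = 0.6          # hypothetical cut-off for sending a mobile ad

def stranded_score(weather_severity, cancellation_rate, poor_travel_conditions, hour):
    """Weighted 0..1 score of how likely travellers near an airport are stranded."""
    score = (WEIGHTS["weather_severity"] * weather_severity
             + WEIGHTS["cancellation_rate"] * cancellation_rate
             + WEIGHTS["poor_travel_conditions"] * poor_travel_conditions)
    if hour in LATE_HOURS:
        score = min(1.0, score + 0.2)
    return score

def should_show_hotel_ad(**factors):
    return stranded_score(**factors) >= AD_THRESHOLD

evening_storm = should_show_hotel_ad(weather_severity=0.9, cancellation_rate=0.8,
                                     poor_travel_conditions=0.7, hour=20)
calm_midday = should_show_hotel_ad(weather_severity=0.2, cancellation_rate=0.1,
                                   poor_travel_conditions=0.3, hour=13)
print(evening_storm, calm_midday)  # True False
```

A real campaign would learn such weights from historical booking data rather than fix them by hand.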
There are several such case studies and stories in which the effective utilisation of Big Data resulted in a major turnaround for corporations.
ACTIVITY
Big Data in the retail industry can be a hit-and-miss affair. Discuss with your friends.
ACTIVITY
Are seven Vs enough, or too many, for classifying Big Data? Critically explain both cases with examples.
ACTIVITY
Analyse a real-life situation around you that could use Big Data analytics to increase overall operational and functional efficiency.
erated within the organisation. Being able to analyse all this data in a meaningful way can be an intimidating task without the proper infrastructure and ways to process data from diverse sources effectively. And once you have managed that, it is another fight to make the data meaningful to the people who need to understand it. So, to build the correct Big Data policy, organisations should consider the following five crucial components:
□ A universal data model: Ensure that all your data is centralised and unified in a common data model to provide a single, accurate view of the business. The data model itself establishes conventions such as naming, field relationships and attributes, so that everything is aligned across transactional and other related systems.
□ Exploit the power of external data: Capturing the true meaning of the data means successfully integrating initial data from internal sources with external data from diverse environments (like social media, vendor data and demographics). The platform should be flexible enough to accommodate information in multiple ways from multiple structured or unstructured distributed databases.
□ Focus on open standards and scalability: Organisations can utilise existing systems efficiently by using a platform with scalable standards, simultaneously gaining flexibility and reducing IT-related costs. Systems compliant with open industry standards are readily available and are preferred to existing systems for many reasons, one being their effortless integration with existing systems from multiple other vendors, legacy systems and future add-on solutions.
□ Platform independent model: Today, information is readily accessible across various platforms, so organisations must ensure a universal infrastructure for delivering and producing scorecards, dashboards, enterprise reports and ad-hoc analysis. At the same time, end-users should get real-time, round-the-clock access to mobile BI and self-service BI, and the capacity to tailor their own BI content and customised dashboards through a simple point-and-click interface.
□ Provide users with insights: Users need a single point from which to act on information, rather than switching between tasks or multiple applications. This type of cross-domain, closed-loop analytics ensures that Big Data will have an instant, beneficial and informative impact on daily operations.
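The first component, a universal data model, amounts to mapping every source's field names and types onto one agreed schema and merging on a shared key. A minimal Python sketch, assuming two hypothetical sources (a CRM export and a web-shop feed) that share a customer ID; real platforms also handle conflicting values, history and far larger volumes.

```python
# Hypothetical sources that name the same concepts differently.
crm_rows = [{"CustID": "17", "FullName": "Asha Rao", "Town": "Mumbai"}]
shop_rows = [{"customer_id": 17, "order_total": "1250.00", "ship_city": "mumbai"}]

# Per-source mapping: source field -> (common-model field, type conversion).
MAPPINGS = {
    "crm": {"CustID": ("customer_id", int), "FullName": ("name", str), "Town": ("city", str.title)},
    "shop": {"customer_id": ("customer_id", int), "order_total": ("order_value", float),
             "ship_city": ("city", str.title)},
}

def to_common_model(row, source):
    """Rename and convert one source record into the common data model."""
    mapping = MAPPINGS[source]
    return {target: convert(row[field])
            for field, (target, convert) in mapping.items() if field in row}

def unify(sources):
    """Merge records from all sources into a single view per customer_id."""
    view = {}
    for source, rows in sources.items():
        for row in rows:
            record = to_common_model(row, source)
            view.setdefault(record["customer_id"], {}).update(record)
    return view

unified = unify({"crm": crm_rows, "shop": shop_rows})
print(unified)
```

Keeping the mappings as data rather than code means a new source only needs a new mapping entry.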
ACTIVITY
How would you unify the different data sets if you were given an opportunity to design and develop an architecture?
ACTIVITY
Apart from Big Data governance, which other governance model can you think of for managing low-traffic data centres?
1.10 TEXT ANALYTICS
Text analytics is the conversion of unstructured textual data into comprehensible analytical data. It often includes processes such as checking product reviews, measuring consumer opinion, analysing buyer sentiment and feedback, providing search facilities and object modelling to support fact-based decision-making. Text analysis draws on multiple statistical, linguistic and machine learning techniques. It involves retrieving information from unstructured data and restructuring the input text to reveal patterns and trends, and then evaluating and interpreting the output. It also involves categorisation, lexical analysis, tagging, recognition of recurring or singular patterns, clustering, extraction of vital information, visualisation, link and association analysis, and predictive analytics. Text analytics determines topics, keywords, categories, tags and semantics from the huge volumes of text data stored in different files and formats in a typical organisation. The term 'text analytics' is also used to refer to 'text mining'.
[Figure: Text Analytics and its components: Text Identification, Text Mining, Text Categorisation, Text Clustering, Entity/Relation Modelling, Search Access, Summarisation and Visualisation]
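Two of the simplest text analytics steps, keyword extraction and sentiment scoring, can be sketched in a few lines of Python. This minimal example assumes tiny hand-written stop-word and sentiment word lists and three hypothetical product reviews; it counts non-stop-words across the reviews and scores each review as positive words minus negative words. Production systems use trained linguistic and machine learning models instead.

```python
import re
from collections import Counter

# Tiny hypothetical lexicons; real systems use far larger, trained resources.
POSITIVE = {"good", "great", "fast", "love", "excellent"}
NEGATIVE = {"bad", "slow", "broken", "poor", "late"}
STOPWORDS = {"the", "a", "is", "was", "and", "but", "it", "i", "my", "very", "of", "to"}

reviews = [
    "The delivery was fast and the phone is great",
    "Battery life is poor and the charger was broken",
    "I love the camera but delivery was late",
]

def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def sentiment(tokens):
    """Lexicon score: number of positive words minus number of negative words."""
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)

keyword_counts = Counter()
scores = []
for review in reviews:
    tokens = tokenize(review)
    keyword_counts.update(t for t in tokens if t not in STOPWORDS)
    scores.append(sentiment(tokens))

print(scores)                         # [2, -2, 0]
print(keyword_counts.most_common(1))  # [('delivery', 2)]
```

Even this crude version surfaces a recurring topic (delivery) and separates positive from negative reviews, which is the kind of structure text analytics extracts at scale.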
Big Data is now seeping into areas that were earlier prone to miscalculations and mispredictions, such as stock inventory models, where a retailer could not decide, based on the surrounding factors, whether or not to stock up for the upcoming seasonal sales. Now, the same retailer can optimise its stock using Web search trends, social media data and weather forecasts.
Collider nuclear physics lab, the world's largest and most powerful particle accelerator, is currently experimenting on the genesis of the universe in search of the elusive God particle. The data centre responsible for managing CERN's datasets has 66,000 processors to analyse around 30 petabytes of data produced. It uses the distributed computing power of thousands of systems located across 140 data centres around the world. Such computing power can be used to change the way many other areas of science and research function and deliver results.
ACTIVITY
Pattern-based recognition and fingerprint recognition systems store their data and keep it unique based on patterns and fingerprints. What do you think facial recognition systems keep as a unique identifier, and how does Big Data help in it?
1.12 TECHNOLOGY INFRASTRUCTURE REQUIREMENT
Big Data is simply a large data repository with the following characteristics:
□ Has distributed, redundant data storage
□ Handles large amounts (a petabyte or more) of data
□ Provides data processing (MapReduce or equivalent) capabilities
□ Processes tasks in parallel
□ Is relatively inexpensive
□ Is centrally managed and orchestrated
□ Is extensible: basic capabilities can be augmented and altered
□ Is accessible: easy to use and available
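The data processing (MapReduce or equivalent) and parallel task capabilities listed above can be illustrated with the classic word-count example. This is a single-machine Python sketch of the map, shuffle and reduce steps, not Hadoop's API; a thread pool from `concurrent.futures` merely stands in for the parallel workers of a cluster, and the input splits are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]

def shuffle(mapped_outputs):
    """Shuffle: group every emitted value under its key."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_phase(item):
    """Reduce: combine all values for one key into a single result."""
    key, values = item
    return key, sum(values)

def word_count(splits, workers=3):
    # Threads stand in for cluster nodes; in Hadoop, map and reduce tasks
    # would run as separate processes on different machines.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, splits))
        grouped = shuffle(mapped)
        reduced = list(pool.map(reduce_phase, grouped.items()))
    return dict(reduced)

# Hypothetical input already divided into three splits.
splits = ["big data needs big storage", "data moves to compute", "big clusters"]
counts = word_count(splits)
print(counts["big"], counts["data"])  # 3 2
```

Because each map call touches only its own split and each reduce call only its own key, both phases can be spread across as many workers as are available.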
So, the infrastructure that hosts Big Data as the prime driver of an organisation must be robust, scalable, flexible and fail-safe in unplanned situations. But how do we arrive at such a robust infrastructure? Is merely having super-expensive, high-spec systems and networking gear enough, or does Big Data require something more than these usual factors?
A lot has been discussed and written about Big Data's functioning, its associated workflows, the technologies used and the traits they need to share in order to perform efficiently. The following key points should be considered in the context of Big Data management:
□ Cluster design: Application requirements are evaluated in terms of volume, workload and other associated factors that form the basis of cluster design, which is not a repetitive process. The initial set-up is validated and verified with an application and a data sample before being actuated. Although the cluster design of a typical Big Data structure allows scalability through tuning of configuration parameters, the large number of other parameters and their impacts on each other add further complexity.
□ Hardware architecture: The key factor that works in favour of Hadoop clusters is that they run on commodity equipment. Most Hadoop users are concerned about cost, which rises significantly as clusters grow. In the current scenario, the hardware requirements for the NameNode are more RAM and lower- or mid-range HDD capacity. If the JobTracker runs on a separate server, it needs a faster CPU and more RAM. DataNodes are typically standard, lower-end server machines.
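Evaluating volume requirements for cluster design often starts with a back-of-the-envelope storage estimate. A minimal Python sketch, assuming HDFS's default replication factor of 3 and a hypothetical 25% of disk reserved for intermediate job output; the 100 TB workload and 48 TB of disk per DataNode are also hypothetical.

```python
import math

# Assumptions for illustration: replication factor 3 (the HDFS default) and
# 25% of raw disk kept free for intermediate (temporary) job output.
def raw_storage_tb(data_tb, replication=3, temp_fraction=0.25):
    """Raw disk (TB) needed to hold data_tb of data with replication and headroom."""
    return data_tb * replication / (1 - temp_fraction)

def datanodes_needed(data_tb, disk_per_node_tb, replication=3, temp_fraction=0.25):
    """Smallest whole number of DataNodes whose combined disks cover the raw need."""
    raw = raw_storage_tb(data_tb, replication, temp_fraction)
    return math.ceil(raw / disk_per_node_tb)

# Hypothetical workload: 100 TB of data on DataNodes with 48 TB of disk each.
print(raw_storage_tb(100))        # 400.0
print(datanodes_needed(100, 48))  # 9
```

Such an estimate is only a starting point; as the cluster-design point notes, the set-up still has to be validated against a sample of the real application and data.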
ACTIVITY
Can HDFS be replaced with a much more efficient system? Make a list of technologies that have the potential to do so.
1.13 SUMMARY
□ Big Data science makes extensive use of concepts from statistics and relational database programming.
□ Normally, while dealing with an enormous number of datasets, you need a good sense for observing patterns, the frequency of data occurrences and other features that help in narrowing data down further to its correct place.
□ The bulk of the Big Data created comes from three primary sources: machine data, social data and transactional data.
□ The adoption of a contemporary technology like Big Data can enable transformative innovation that changes the structure of a business, whether in its services, its products or its organisation.
KEYWORDS
E-REFERENCES
□ What is Big Data and why it matters. (n.d.). Retrieved April 22, 2017, from https://www.sas.com/en_us/insights/big-data/what-is-big-data.html
□ Big Data. (2017, March 17). Retrieved April 22, 2017, from https://www.ibm.com/big-data/us/en/
CONTENTS
2.1 Introduction
2.2 Distributed and Parallel Computing for Big Data
Self Assessment Questions
Activity
2.3 Introduction to Big Data Technologies
2.3.1 Hadoop
2.3.2 Python
2.3.3 R
Self Assessment Questions
Activity
2.4 Cloud Computing and Big Data
Self Assessment Questions
Activity
2.5 In-Memory Technology for Big Data
Self Assessment Questions
Activity
2.6 Big Data Techniques
2.6.1 Massive Parallelism
2.6.2 Data Distribution
2.6.3 High-Performance Computing
2.6.4 Task and Thread Management
2.6.5 Data Mining and Analytics
2.6.6 Data Retrieval
2.6.7 Machine Learning
2.6.8 Data Visualisation
Self Assessment Questions
Activity
2.7 Summary
2.8 Descriptive Questions
2.9 Answers and Hints
2.10 Suggested Readings & References
INTRODUCTORY CASELET
LEARNING OBJECTIVES
2.1 INTRODUCTION
The market is flooded with corporations offering custom-made tools and frameworks for implementing Big Data and analytics. However, behind the branding and beneath the platform, the basic features are common to all. Given below is a list of methods and practices that are usually followed in a typical Big Data implementation:
□ NoSQL database: It provides for the storage and retrieval of data modelled in ways other than the tabular relations used in typical relational databases, to cater efficiently to real-time situations.
□ Data incorporation: Data management tools are available as solutions, such as Amazon Elastic MapReduce (EMR), which runs customised versions of Apache Hive, Pig, Spark, MapReduce and Hadoop, along with other tools such as Couchbase and MongoDB.
□ Data virtualisation: Virtualising multiple data sources into one supports real-time extraction, fetching and storage operations across multiple sources, such as Hadoop and distributed data stores, from a single point.
□ Search and knowledge finding: These tools and applications aid self-service processes that extract information and new findings from huge storage spaces consisting of structured and unstructured data residing in numerous sources, such as databases, file systems, APIs, streams, other platforms and applications.
□ Stream analysis: These tools and applications can enrich, aggregate, filter and analyse a high influx of data from multiple disparate real-time data sources, in any format.
□ Data memory composition: These tools provide faster access to and processing of huge amounts of data by spreading it across the dynamic RAM, SSD or Flash storage of a distributed computer system.
□ Big Data predictive analytics: Predictive analysis is simply the analysis of expected events and pre-planning to manage events that might have an impact on the overall structural, operational and functional aspects of an organisation. It usually comprises hardware or tool-based solutions to let the organisation discover,
In this chapter, you will first learn about distributed and parallel computing for Big Data. Next, you will learn the basics of Big Data technologies. Further, you will study cloud computing with reference to Big Data. Next, you will learn about in-memory technology for Big Data. Towards the end, you will learn about various Big Data techniques.
Data, such parallel systems are the ones that execute from multiple dataset throughput points and run in parallel while connected to a master system. Parallel computing is a closely coupled system and is used to solve the following:
♦ Compute-intensive problems
♦ Bigger problems in the same time
♦ Similar-sized problems in the same time with higher precision
[Figure: Distributed Computing (a Control Server coordinating Grid Nodes) and Parallel Computing]
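The divide-and-combine pattern of parallel computing described above, with workers reporting to a coordinating master, can be sketched with Python's `concurrent.futures`. The task (a sum of squares) is hypothetical. Threads in one process share memory, which mirrors the closely coupled nature of parallel computing; note, however, that CPython threads do not speed up pure-Python arithmetic, so `ProcessPoolExecutor` would be the choice for real CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor

def split_range(n, parts):
    """Divide range(0, n) into at most `parts` contiguous (start, stop) chunks."""
    if n <= 0:
        return []
    step = -(-n // parts)  # ceiling division
    return [(lo, min(lo + step, n)) for lo in range(0, n, step)]

def partial_sum_of_squares(bounds):
    """Work done by one worker on its own chunk."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))

def parallel_sum_of_squares(n, workers=4):
    chunks = split_range(n, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum_of_squares, chunks))
    # The coordinating ("master") step combines the workers' partial results.
    return sum(partials)

n = 100_000
result = parallel_sum_of_squares(n)
print(result == sum(i * i for i in range(n)))  # True
```

In a distributed setting the same chunks would be shipped to separate machines, and only the small partial results would travel back to the coordinator.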
ACTIVITY
2.3.1 HADOOP
[Figure: HDFS Architecture, showing a Client, the HDFS NameNode (FS/namespace/meta ops), a Secondary NameNode (namespace backup), and DataNodes on Rack 1 and Rack 2 with block replication between them]
2.3.2 PYTHON
2.3.3 R
Table 2.2 lists the pros and cons of using R and Python for Big Data:
ACTIVITY
Try to find alternatives to R in Python that work equally well with the Hadoop HDFS architecture.
[Figure: Cloud computing, in which a cloud provider delivers IaaS, PaaS and SaaS to devices such as laptops, mobiles or PDAs]
[Figure: Cloud deployment models: a public cloud providing cloud services (IaaS/PaaS/SaaS) to Company X, Company Y and Company Z; cloud services (IaaS/PaaS/SaaS) shared by organisations having a common tie to share resources; and a private cloud providing cloud services (IaaS/PaaS/SaaS) within an organisation, to which applications are migrated]
EXHIBIT
Difference between SaaS, PaaS and IaaS
The cloud is a broad concept that covers just about every possible sort of online service, but when businesses refer to cloud procurement, there are usually three models of cloud service under consideration: Software as a Service (SaaS), Platform as a Service (PaaS) and Infrastructure as a Service (IaaS). Each has its own intricacies and hybrid cloud models, but this exhibit focuses on the high-level differences between SaaS, PaaS and IaaS.
SOFTWARE AS A SERVICE
PLATFORM AS A SERVICE
INFRASTRUCTURE AS A SERVICE
IaaS is the most flexible cloud computing model and allows for automated deployment of servers, processing power, storage and networking. IaaS clients have more control over their infrastructure than users of PaaS or SaaS services. The main uses of IaaS include the actual development and deployment of PaaS, SaaS and Web-scale applications.
Source: https://www.computenext.com/blog/when-to-use-saas-paas-and-iaas/
Big Data cloud providers have been gearing up to bring the most advanced technologies to the market at competitive prices. Some providers are established, whereas some of them are relatively new to the
ACTIVITY
Search the Internet for the names of companies that make different cloud computing services (IaaS, PaaS or SaaS) available to their users.
Now, cost variations for such setups have narrowed. Figure 2.9 shows the cost of various storage technologies for a sample 1GB of memory, along with their respective read/write performance:
1GB of RAM costs $9, compared with $0.40 for SSDs and $1 for PCI-compatible memory cards. The choice of a specific memory technology depends on its raw performance figures in a real-time scenario for the given use case, rather than on benchmarking figures. As memory technology evolves, new dynamic memory substitutes are narrowing the performance gaps by and large. Database-related technologies are adapting to this evolution, which has struck gold for corporations by giving them the capability to run newer and older setups in tandem while delivering radical performance-to-cost ratios.
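The per-GB prices quoted above make it easy to compare what holding a working set in each technology would cost. A minimal Python sketch using those quoted figures and a hypothetical 500 GB working set:

```python
# Per-GB prices as quoted in the text; real prices change over time.
PRICE_PER_GB = {"RAM": 9.00, "PCI memory card": 1.00, "SSD": 0.40}

def storage_cost(working_set_gb):
    """Cost in dollars of holding the whole working set in each technology."""
    return {tech: round(price * working_set_gb, 2) for tech, price in PRICE_PER_GB.items()}

# Hypothetical 500 GB working set.
costs = storage_cost(500)
ram_to_ssd = PRICE_PER_GB["RAM"] / PRICE_PER_GB["SSD"]
print(costs)                 # {'RAM': 4500.0, 'PCI memory card': 500.0, 'SSD': 200.0}
print(round(ram_to_ssd, 1))  # 22.5
```

A cost-only view like this favours SSDs by a wide margin; as the text stresses, the actual choice should also weigh measured read/write performance for the specific workload.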
ACTIVITY
tions, they also assist Big Data in the bioinformatics domain as they
do for sequencing and alignment.
It delivers on the potential of mining value from huge and diverse data sources with less dependence on human instructions. It is data-driven, runs at machine scale and is well suited to the complexity of dealing with different data sources and the enormous range of variables and quantities of data involved. In contrast to conventional analysis, machine learning thrives on expanding datasets: the more data a machine learning system gets, the more it learns, and it applies the results to yield higher-quality insights.
ACTIVITY
Research different machine learning methods and find out which methods and algorithms are vital for solving Big Data problems.
2.7 SUMMARY
□ Distributed computing works on the divide-and-conquer approach, performing modules of the parent task on multiple machines and then combining the results.
□ Parallel computing refers to the utilisation of a single CPU present in a system, or a group of internally coupled systems, by means of efficient and clever multi-threading operations.
□ Concurrency in a system is simply the operation of multiple threads that execute on single or multiple processors.
□ Distributed computing is considered to be a subset of parallel computing, which in turn is a subset of concurrent computing.
□ A Big Data system is vastly different from other solution-providing systems and is based on the seven Vs described in the previous chapter, namely: Volume, Velocity, Variety, Veracity, Variability, Value and Visualisation.
□ Hadoop is an open-source platform that provides the analytical technologies and computational power required to work with such large volumes of data.
□ MapReduce is a framework that helps developers write programs to process large volumes of unstructured data in parallel over a distributed or standalone architecture, producing results in a useful aggregated form.
□ Hive is a data warehouse tool created by Facebook, based on Hadoop, that converts its query language into MapReduce jobs.
□ HBase is a Hadoop application running atop HDFS.
KEYWORDS
□ Hadoop distributed file system (HDFS): It is a fault-tolerant storage system in Hadoop.
□ Hive: A data warehouse tool created by Facebook, based on Hadoop, that converts a query language into MapReduce jobs.
□ MapReduce: It is a framework that helps developers to write programs that process large volumes of unstructured data over a distributed or standalone architecture and produce results in a useful aggregated form.
□ Object Oriented Programming (OOP): A paradigm where data is encapsulated within an object and carries several properties.
□ Pig: Pig is a high-level modular programming tool developed by Yahoo for streamlining huge data sets with the use of Hadoop and MapReduce.
□ Python: It is a popular interpreted, general-purpose, high-level dynamic programming language that aims to improve code readability and overall ease of use, allowing ideas to be expressed in fewer statements than competing languages such as C++ or Java.
□ R: It is an open-source interpreted programming language and an application environment for statistical computing with graphics, supported by the R Foundation for Statistical Computing.
□ Solid State Drives (SSD): Such storage drives have no mechanical components and higher read/write rates, which result in less wear and tear and robust performance.
DESCRIPTIVE QUESTIONS
1. Differentiate between parallel and distributed computing.
2. Explain the concept of Hadoop in Big Data.
3. What do you understand by cloud computing? Also, discuss its three basic types of services.
4. Describe the concept of in-memory technology for Big Data.
5. List and explain different types of Big Data techniques.
E-REFERENCES
□ Welcome to Apache™ Hadoop®! (n.d.). Retrieved April 22, 2017, from http://hadoop.apache.org/
□ What is Hadoop? (n.d.). Retrieved April 22, 2017, from https://www.sas.com/en_us/insights/big-data/hadoop.html
□ Hadoop & Big Data. (n.d.). Retrieved April 22, 2017, from https://mapr.com/products/apache-hadoop/
CONTENTS
3.1 Introduction
3.2 Introduction to Business Analytics
Self Assessment Questions
Activity
3.3 Types of BA
Self Assessment Questions
Activity
3.4 Business Analytics Model
3.4.1 SWOT Analytical Model
3.4.2 PESTLE or PEST Analytical Model
Self Assessment Questions
Activity
3.5 Importance of Business Analytics
Self Assessment Questions
Activity
3.6 What is Business Intelligence (BI)?
Self Assessment Questions
Activity
3.7 Relation between BI and BA
Self Assessment Questions
Activity
3.8 Emerging Trends in BI and BA
Self Assessment Questions
Activity
3.9 Summary
3.10 Descriptive Questions
3.11 Answers and Hints
3.12 Suggested Readings & References
INTRODUCTORY CASELET
AMNESTY INTERNATIONAL
THE CHALLENGE
Around four years back, with the help of its in-house fund-raising consultants, Amnesty International started seeking analytics software to work in parallel with its existing CRM systems. The fund-raising consultants are responsible for gathering funds and managing various kinds of donors. They are also required to measure the donors' sentiments and interests based on multiple inputs, such as various parameters and participatory ratios. For such measurements, they were dependent on programmers for analysing customers and directing specific campaigns at them based on their interactions and contributions to the campaign and the organisation. It was a tedious exercise and not always accurate. There were regular gaps between the requirements the consultants asked for and what was delivered to them.
THE SOLUTION
Based on the inputs gained from the consultants, Amnesty International finalised an analytics tool with an easy drag-and-drop interface to carry out the analytics processes as envisaged by the consultants.
The analytical tool was integrated with the CRM. Thus, using the contemporary analytics software with the CRM database became easier, and the reporting features became much more robust. Of course, as a human rights organisation, Amnesty International performs all data analytics in compliance with privacy rules and with due regard for data protection.
LEARNING OBJECTIVES
INTRODUCTION
The word 'Analytics' has multiple meanings and is open to interpretation for business and marketing professionals. Experts and consultants use the term in slightly different, though broadly similar, ways. Analytics, as per the definition of the business dictionary, is anything that involves measurement: a quantifiable amount of data that signifies a cause and warrants an analysis that culminates in a resolution.
This chapter discusses Business Analytics and its types. Next, the chapter discusses the Business Analytics (BA) model. The chapter further discusses the importance of Business Analytics. Further, this chapter discusses the concept of Business Intelligence (BI) and its relation with business analytics. In the end, this chapter discusses emerging trends in BI and BA.
INTRODUCTION TO BUSINESS
ANALYTICS
Business Analytics is a group of techniques and applications for storing, analysing and making data accessible to help users make better strategic decisions. Business Analytics is a subset of Business Intelligence, which creates capabilities for companies to compete in the market efficiently and is likely to become one of the main functional areas in most companies (more on BI later in this chapter).
ACTIVITY
How can business analytics bring a change for a newspaper hawker? Think it out.
TYPES OF BA
Going purely by the linguistic definition, there may be multiple elucidations of the term BA. However, in practical terms, there are four types of BA that help an organisation gauge customer sentiment and then take the respective actions:
□ Descriptive analysis: It refers to "What is happening?" or "What happened?" type analytics based on incoming data. Such analytics is best studied through dashboards and reports. For example, a coffee shop may experience a heavy rush on a day it least expected and be ill-prepared to do anything about it.
□ Diagnostic analysis: It refers to analysis of past figures and facts to derive the scenarios about what happened and why it happened. The result of this analysis is often a pre-defined reporting structure, such as a root cause analysis (RCA) report. For example, a root cause analysis may help in finding out the factors which the above coffee shop owners failed to read and comprehend.
□ Predictive analysis: It refers to analysis of probabilities. Predictive analysis tries to forecast on the basis of previous data and scenarios. For example, a hotel chain owner might ramp down promotional offers during the rainy season in a coastal area, based on the prediction that there are going to be fewer footfalls due to heavy rain.
□ Prescriptive analysis: This analysis type tells you about the actions you should take. This is the most essential analysis type and typically forms the standards and recommendations for the next phase. For example, a doctor prescribes medicines to a patient after researching, studying, evaluating and diagnosing the cause of the patient's pain or irritation. Similarly, organisations, after drawing out the statements, resultants, conclusions and other factors, take steps to ensure that the factors positively affecting growth continue to exist, whereas the damaging factors stay out of their future prospects.
ACTIVITY
Is there any other analysis type you can think of other than the above four models? What would it be?
businesses that may end up exploiting its weaknesses and may turn its strengths into weaknesses. Figure 3.1 shows the SWOT diagram:
Strengths
• What does your organisation do better than others?
• What are your unique selling points?
• What do your competitors and customers in your market perceive as your strengths?
• What is your organisation's competitive edge?
Opportunities
• What political, economic, social-cultural or technology (PEST) changes are taking place that could be favourable to you?
• Where are there currently gaps in the market or unfulfilled demand?
• What new innovation could your organisation bring to the market?
Weaknesses
• What do other organisations do better than you?
• What elements of your business add little or no value?
• What do competitors and customers in your market perceive as your weaknesses?
Threats
• What political, economic, social-cultural or technology (PEST) changes are taking place that could be unfavourable to you?
• What restraints do you face?
• What is your competition doing that could negatively impact you?
On the other hand, new starters should include SWOT in their planning process. SWOT is not necessarily a pan-organisation process; rather, each of the organisation's departments can have its own dedicated SWOT, such as Marketing SWOT, Operational SWOT, Sales SWOT, etc.
PEST Analysis
ACTIVITY
Significance of BA:
□ To gain insights into customer behaviour: The prime advantage of investing in BI software and experts is that it increases your ability to examine current customer purchasing trends. Once you know what your customers are ordering, this information can be used to create products matching the present consumption trends and thus improve your cost-effectiveness, since you can now attract more valuable consumers.
ACTIVITY
BI-based solutions are most apt for industries with a huge customer base, higher competition levels and massive data volumes. Some of the key BI functions include the following:
□ Examining sales trends
ACTIVITY
How can an election campaign benefit from BI? Make a case study
on it.
With the help of BA, you get to know the pain points of your business, your product's standing in the market, the strengths that put your business ahead of the competition and the opportunities you are yet to explore. BA helps you in knowing your business thoroughly. BI helps in bridging the gap between ground reality and management perspective on a pan-organisational basis.
ACTIVITY
Create a case study on an election campaign for a new party using a BA system and compare the outcomes with those of a BI system.
initiate outside the data from multiple sensor devices, and servers,
e.g. a spatial satellite or an oil rig in the sea.
□ Artificial Intelligence (AI): This is a top trend as per multiple studies, with scientists aiming to build machines that can do what complex human reflexes and intelligence achieve. Analytical work on such programmes is growing rapidly, with AI and machine learning transforming the way we relate to analytics and data management.
□ BI Centre of Excellence (CoE): Moving to a simpler, more secure and more effective BI strategy isn't entirely the onus of IT. The difficulty of data management in huge companies is astounding, and the need to strengthen it is becoming important. A growing number of organisations are opting for a BI and Analytics CoE to drive the implementation of self-service analytics. These CoEs will have a great role in building an information-driven culture and getting the maximum advantage from a BI solution. Through mediums such as virtual forums and training, CoEs will enable even laypeople to include data in their decision-making. It is quite an efficient way of getting skilled people, processes and technology aligned in a structured manner in one place.
□ Predictive analytics and impact on data discovery: By gathering more information, organisations will have the capacity to build more detailed visual models that will help them to act more accurately. For instance, better information models show organisations more about what clients are purchasing, and even what they are likely to purchase in future. From CRM to sales or marketing deals, predictive analytics and cutting-edge BI are set to bring disruption.
□ Cloud computing: Cloud computing is being absorbed into many systems and will continue to grow. We've witnessed the division of the Cloud into multiple vendor systems, and many companies are utilising Cloud services to host powerful data analytics tools. A lot of customers are already using Microsoft Azure and Amazon Redshift along with Cloud resources that provide flexible handling and scalability for the data.
□ Digitisation: It is a process of turning any analogue image, sound or video into a digital format understandable by electronic devices and computers. This data is usually easier to store, fetch and share than the raw original format (e.g. converting a tape recording into a digital song). The gains from digitising data-intensive processes can be great, with cost cuts of up to 90% and much faster turnaround times than before. Creating and utilising software instead of manual processes allows businesses to gather and screen data in real time, which helps managers to tackle issues before they turn critical.
ACTIVITY
What do you think will be the next emerging trend in the BI and BA field? Discuss.
SUMMARY
□ Business Analytics is a group of techniques and applications for storing, analysing and making data accessible to help users make better strategic decisions.
□ Analytics influences the business by providing knowledge that can be helpful to make enhancements or bring changes.
□ In diagnostic analysis, past figures and facts are analysed to derive the scenarios about what happened and why it happened.
□ Business analytics frequently utilises numerous quantitative tools to convert big data into meaningful contexts valuable for making sound business moves.
□ PESTLE stands for Political, Economic, Social, Technological, Legal and Environmental: a method for figuring out numerous external impacts on a business.
□ Business Intelligence (BI) is the set of applications, technologies and best practices for the integration, collection and presentation of business information and analysis.
KEYWORDS
□ Business analytics: It is the subset of Business Intelligence which creates capabilities for companies to compete in the market efficiently.
□ PEST analysis: It is an examination of the external environment in which an organisation currently exists or which it is going to enter or start in.
DESCRIPTIVE QUESTIONS
1. Discuss the concept of BA.
2. List and explain different types of BA.
3. Explain the different analytical models with the help of real-life examples.
4. Discuss the importance of BA with suitable examples.
5. Describe the importance of BI.
6. Discuss the evolution and relation between BA and BI.
E-REFERENCES
□ What is big data analytics? - Definition from WhatIs.com. (n.d.). Retrieved April 25, 2017, from http://searchbusinessanalytics.techtarget.com/definition/big-data-analytics
□ What is business analytics (BA)? - Definition from WhatIs.com. (n.d.). Retrieved April 25, 2017, from http://searchbusinessanalytics.techtarget.com/definition/business-analytics-BA
□ Monnappa, A. (2017, March 24). Data Science vs. Big Data vs. Data Analytics. Retrieved April 25, 2017, from https://www.simplilearn.com/data-science-vs-big-data-vs-data-analytics-article
CONTENTS
4.1 Introduction
4.2 What is Data, Information and Knowledge?
Self Assessment Questions
Activity
4.3 Business Analytics Personnel and their Roles
Self Assessment Questions
Activity
4.4 Required Competencies for an Analyst
Self Assessment Questions
Activity
4.5 Business Analytics Data
Self Assessment Questions
Activity
4.6 Ensuring Data Quality
Self Assessment Questions
Activity
4.7 Technology for Business Analytics
Self Assessment Questions
Activity
4.8 Managing Change
Self Assessment Questions
Activity
4.9 Summary
4.10 Descriptive Questions
4.11 Answers and Hints
4.12 Suggested Readings & References
INTRODUCTORY CASELET
XYZ Inc. provides its consumers a private, tailor-made cloud infrastructure to execute important applications with the help of the latest cutting-edge tools, which help the company look after customer needs while reducing management and system complications.
LEARNING OBJECTIVES
INTRODUCTION
Business analytics is a process to filter and analyse sets of data, which might be small bits of data, a file containing the data or a large collection of data generally known as a database. With the growth in data, a need arises to store it at some appropriate location from where it can be easily accessed and modified irrespective of geographical location. Unlike small datasets, which are useful only for individual organisations, Big Data is useful for various organisations. To store Big Data, companies use cloud technology, data warehousing, etc. This data is then retrieved from its storage, and analytics is applied to it to derive useful information. The analytics involves the use of various statistical methods, such as measures of central tendency, graphs, etc., to derive significant information from data. This useful information is further used in businesses for decision making, growth, planning, creating action plans and increasing overall profitability. The way of sorting the data to derive useful information has given a new purpose to business analytics.
In this chapter, you will first study data, information and knowledge. Next, the chapter discusses business analytics personnel and their roles. Further, the chapter discusses the required competencies for an analyst. Next, the chapter details business analytics data and the importance of ensuring data quality. Towards the end, the chapter discusses technology for business analytics and change management.
Examples of data
2, 4, 6, 8
Mercury, Jupiter, Pluto
The above data alone does not represent the true picture. Maybe the sequence above is simply the multiplication table of two, or a sequence denoting a difference of two between numbers. The names may just be the names of conference rooms in an organisation rather than planet names. Unless you give the data a logic and define the reasoning for its existence, it has no standalone existence by itself.
Information is the result that we achieve after the raw data is processed. This is where the data takes shape as per the need and starts making sense. Standalone data has no meaning; it only assumes meaning and transitions into information upon being interpreted. In IT terms, characters, symbols, numbers or images are data. These are the inputs which a system running in a technical environment needs to process in order to produce a meaningful interpretation.
Information = Data + Meaning
Examples of Information
2, 4, 6, 8 are the first four multiples of 2.
Mercury, Jupiter and Pluto are the names of bodies in our solar system (Pluto is now classified as a dwarf planet).
ACTIVITY
ACTIVITY
As a business analyst, prepare a report on your analytical study of Sony Corporation, currently undergoing turmoil for serving too many areas in business fields.
ACTIVITY
You are a veteran business analyst, responsible for coaching a new batch of management trainees in an organisation. Lay out the course plans and methods you will utilise to train them in the required standards and knowledge.
ACTIVITY
CONTROL
MONITOR
data to ensure that the data quality matches the desired levels. Additionally, when information is transferred from one system to another, the company needs to monitor the data frequently to confirm consistency across multiple systems. Data quality monitoring enables the organisation to proactively discover issues before they affect the decision-making process.
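A minimal sketch of such monitoring is shown below, assuming two hypothetical systems (for example a CRM and a billing system) that hold the same customer records. The field names, the 95% completeness threshold and the helper functions completeness, inconsistencies and monitor are all illustrative assumptions; in practice organisations typically use dedicated data-quality tools for this.

```python
# Hypothetical customer records held in two systems, e.g. a CRM and a billing system.
crm = {
    101: {"email": "asha@example.com", "city": "Mumbai"},
    102: {"email": None, "city": "Pune"},
    103: {"email": "ravi@example.com", "city": "Delhi"},
}
billing = {
    101: {"email": "asha@example.com", "city": "Mumbai"},
    102: {"email": "neha@example.com", "city": "Pune"},
    103: {"email": "ravi@example.com", "city": "Noida"},
}


def completeness(records, field):
    """Share of records in which `field` has a non-empty value."""
    filled = sum(1 for record in records.values() if record.get(field))
    return filled / len(records)


def inconsistencies(a, b, field):
    """IDs present in both systems whose non-empty `field` values disagree."""
    return sorted(
        key for key in a.keys() & b.keys()
        if a[key].get(field) and b[key].get(field) and a[key][field] != b[key][field]
    )


def monitor(a, b, threshold=0.95):
    """Return alerts when completeness or cross-system consistency falls short."""
    alerts = []
    score = completeness(a, "email")
    if score < threshold:
        alerts.append(f"email completeness {score:.0%} is below {threshold:.0%}")
    for field in ("email", "city"):
        for key in inconsistencies(a, b, field):
            alerts.append(f"record {key}: {field} differs between systems")
    return alerts


for alert in monitor(crm, billing):
    print(alert)
```

Run regularly (for example after every data transfer), checks like these surface the missing e-mail for record 102 and the conflicting city for record 103 before either reaches a report.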
IMPROVE
Prepare a report on popular tools used for measuring data quality.
Keeping the human factor in mind, the shift from reactive to proactive decision making is defined by the level of complexity, which ranges from BI to advanced analytics. Summary reports, statistics and queries, and low-latency dashboards are built on historical information. There is a middle ground for simple analytics, e.g., algebraic or trending predictions that give estimated answers about expectations in terms of sales, production, etc. Advanced analytics is much more refined and supports techniques such as statistical analysis, forecasting, prediction and correlation, whereas trend analysis simply extrapolates the existing data to project the next quarter. A refined predictive model takes seasonality, correlations between strong and weak quarters and historical sales patterns into account.
Let's take a look at decision making from another point of view. Say we want to examine our brain while taking a decision. From a logical viewpoint, when our brain encounters a task it has no idea about, it attempts to create rational assumptions guessing the input, likely outcomes vs. actions to be taken, and attempts to find the best answer. When the brain encounters the same level of problem again, it re-imagines the outcomes and methods deployed in the old task and, before trying to figure out the right answer to the current problem, assesses what worked earlier and what did not. After being subjected to a certain number of similar or varying tasks, the brain becomes familiar with cracking a specific type of task. Consequently, the time taken to re-examine the older solutions and find the right solution for the new task reduces significantly.
MANAGING CHANGE
There are numerous reasons why change is fraught with resistance. Our inherent need for a sense of security around existing processes, and our comfort zones, are often tough to break, which further decreases the probability that a contemporary change will succeed. For example, many Windows XP users, most of them elderly bank employees in India, were intimidated on hearing that Microsoft was discontinuing support for XP, since they had to learn a new OS from scratch, which could have taken them considerable time. Instead, they found ways of doing existing work efficiently with available resources and with the help of consultants hired to dispel the fear factor that had them on their toes. Change management in field
Change Management:
• Register and study corporate data
• Implement a system
• Follow the budget
• Compare planned and actual indicators
• Evaluate the efficiency of achieved targets
• Make exact decisions
There should be multiple phase auditors to ensure that the roles and
responsibilities of one phase assigned to a business analyst do not
seep into the other phases, affecting the overall outcomes and
messing up the overall project execution.
When most people consider helping others adjust to a change, the two most commonly used methods are training and communication. Both are important tools that are expected to help individuals work through the change procedure and address the awareness and ability/knowledge areas. Nonetheless, they are not adequate on their own to fully support the implementation of a change.
ACTIVITY
Your existing medical project requires some sudden changes due to a large influx of disorganised sample data. Not only that, it also requires a change in the system dynamics used so far to manage the existing volumes of data. How will you proceed to ensure that change management is carried out effectively without affecting operations?
SUMMARY
□ Data, simply put, is the raw material that does not make any definite sense unless you process it to some meaningful end.
□ Information is the result which we achieve after the raw data is processed.
□ Standalone data has no meaning; it only assumes meaning and transitions into information upon being interpreted.
□ Knowledge is something that is inferred from the data and information.
□ A business analyst is anyone who has the key domain experience and knowledge related to the paradigms being followed.
□ Business analysts need not necessarily be from an IT background, although it certainly helps to have a basic understanding of IT systems and how they work.
□ When data quality checks report a decline in quality, a few corrective measures can be deployed.
□ Change management is a different field from business analysis; however, the two are closely related.
KEYWORDS
□ Business analyst: Anyone who has the key domain experience and knowledge related to the paradigms being followed.
□ Explicit knowledge: A type of knowledge that can be easily transferred to others.
□ Information: It is the result that we achieve after the raw data is processed.
□ Stakeholder management: It is a process of dealing with stakeholders and understanding how much power and impact they have on your project.
□ Tacit knowledge: A type of knowledge that is complex and intricate, cannot be gained simply by having it passed on by others, and requires elevated and advanced skills in order to be comprehended.
2. True
Business Analytics Personnel and their Roles: 3. Failure; 4. Segregate
Required Competencies for an Analyst: 5. True; 6. False
Business Analytics Data: 7. Levels; 8. Pattern
Ensuring Data Quality: 9. Monitoring, improvement; 10. False
Technology for Business Analytics: 11. a. Associative Query Logic
Managing Change: 12. Change
E-REFERENCES
□ Risk, S. (n.d.). Business Analytics less Data Quality equals Bad Decisions. Retrieved April 26, 2017, from https://www.blue-granite.com/blog/business-analytics-less-data-quality-equals-bad-decisions
□ Data Quality for Business Analytics by David Loshin - BeyeNETWORK. (n.d.). Retrieved April 26, 2017, from http://www.b-eye-network.com/view/15539
CONTENTS
5.1 Introduction
5.2 Visualising and Exploring Data
5.2.1 Dashboards
5.2.2 Column and Bar Charts
5.2.3 Data Labels and Data Tables Chart Options
5.2.4 Line Charts
5.2.5 Pie Charts
5.2.6 Scatter Chart
5.2.7 Bubble Charts
5.2.8 Miscellaneous Excel Charts
5.2.9 Pareto Analysis
Self-Assessment Questions
Activity
5.3 Descriptive Statistics
5.3.1 Central Tendency (Mean, Median and Mode)
5.3.2 Variability
5.3.3 Standard Deviation
Self-Assessment Questions
Activity
5.4 Sampling and Estimation
5.4.1 Sampling Methods
5.4.2 Estimation Methods
Self-Assessment Questions
Activity
5.5 Introduction to Probability Distributions
Self-Assessment Questions
Activity
5.6 Summary
5.7 Descriptive Questions
5.8 Answers and Hints
5.9 Suggested Readings & References
INTRODUCTORY CASELET
LEARNING OBJECTIVES
After studying this chapter, you will be able to:
➢ Explain how to visualise and explore data
➢ Describe descriptive statistics
➢ Define sampling and estimation
➢ Elucidate probability distributions
INTRODUCTION
Descriptive analytics is the most essential type of analytics and establishes the framework for more advanced types of analytics. This sort of analysis addresses questions such as "What has occurred in the corporation?" and "What is going on now?" Let us consider the case of Facebook. Facebook users produce content through comments, posts and picture uploads. This information is unstructured and is produced at an extensive rate. Facebook statistics reveal that around 2.4 million posts are produced every minute and that around 500 TB of data is generated every day. These jaw-dropping figures have popularised another term, which we know as Big Data.
There are three crucial approaches to summarise and describe the raw data:
□ Dashboards and MIS reporting: This technique gives condensed data, providing information on "What has happened?", "What's been going on?" and "How does it compare with the plan?"
□ Ad hoc reporting: This technique supplements the previous one by helping the administration to extract information as required.
□ Drill-down reporting: This is the most complex part of descriptive analysis and gives the capacity to delve further into any report to comprehend the information better.
5.2.1 DASHBOARDS
Figure: Example dashboard charts showing values for 2012, 2013 and 2014, and a monthly (January to December) chart of People Count and % Labor Cost
Line charts are a useful way of displaying data for a given period. You may enter multiple series of data in line charts; however, they can become difficult to interpret if the sizes of the data values differ exponentially.
Figure: Line chart of monthly website visits (January to December)
Figure: Anatomy of a scatter chart, labelling the chart title, value boxes, tooltips, scatter markers, plot area, legend, scale (scale-x or scale-y), tick marks, scale labels and scale title
Figure: Chart comparing films released between 1996 and 2009, including Harry Potter, The Lord of the Rings and Pirates of the Caribbean titles
ACTIVITY
Prepare a report on data visualisation tools available on the Web
other than the tools discussed in the chapter.
MEAN
The mean of a sample of n observations, x_1, x_2, ..., x_n, is calculated as:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Note that the calculations for the mean are the same whether we are
dealing with a population or a sample; only the notation differs. We
may also calculate the mean in Excel using the function AVERAGE
(data range).
One property of the mean is that the sum of the deviations of each observation from the mean is zero:
$$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$$
This simply means that the sum of the deviations above the mean is
the same as the sum of the deviations below the mean. Thus, the
mean "balances" the values on either side of it. However, it does not
suggest that half the data lie above or below the mean.
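Both properties can be checked numerically with a short Python sketch. The observations are hypothetical and include one deliberately large value; statistics.mean is Python's standard-library counterpart of Excel's AVERAGE.

```python
import statistics

# Hypothetical observations; 50 is deliberately far from the rest.
data = [1, 3, 5, 7, 9, 50]

# Sample mean: the sum of the observations divided by their count.
mean = sum(data) / len(data)
print(mean, statistics.mean(data))  # 12.5 12.5

# The deviations from the mean balance out to zero.
deviations = [x - mean for x in data]
print(sum(deviations))  # 0.0

# ...but the mean does not split the data in half: only one value lies above it.
above = sum(1 for x in data if x > mean)
print(f"{above} of {len(data)} observations lie above the mean")
```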
MEDIAN
The median is the measure of location that specifies the middle value when the data are arranged from least to greatest. If the number of observations is odd, say 7, the median is the exact middle of the sorted numbers, i.e. the 4th observation. If the number of observations is even, say 8, the median is the mean of the two middle numbers, i.e. the mean of the 4th and 5th observations. We can use the Sort option of MS Excel to order the data by rank and then find the median. The Excel function MEDIAN(data range) can also be used. The median is meaningful for ratio, interval and ordinal data. As opposed to the mean, the median is largely unaffected by outliers.
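The odd and even cases, and the median's resistance to outliers, can be illustrated with the Python sketch below. The median function is a hand-written helper for illustration (statistics.median gives the same results), and the data values are hypothetical.

```python
import statistics


def median(values):
    """Middle value of the sorted data; mean of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


odd = [7, 1, 5, 3, 9, 11, 13]        # 7 observations: the 4th sorted value
even = [8, 2, 6, 4, 10, 12, 14, 16]  # 8 observations: mean of the 4th and 5th
with_outlier = even[:-1] + [1600]    # replace 16 with an extreme value

print(median(odd), median(even), median(with_outlier))  # 7 9.0 9.0
# The mean, by contrast, jumps sharply when the outlier is introduced.
print(statistics.mean(even), statistics.mean(with_outlier))
```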
MODE
MIDRANGE
5.3.2 VARIABILITY
The larger the variance, the more the observations are spread out from the mean. This indicates more variability in the observations. The formula used for calculating the variance is different for populations and samples. For a population, the variance is:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
where x_i is the value of the ith item, N is the number of items in the population, and μ is the population mean.
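For a sample, the variance is:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
where n is the number of items in the sample and x̄ is the sample mean.
For example, for a population consisting of the values 1, 3, 5 and 7, first calculate the population mean:
$$\mu = (1 + 3 + 5 + 7) / 4 = 4$$
Insert all known values into the formula for the variance, as shown below:
$$\sigma^2 = \frac{(-3)^2 + (-1)^2 + 1^2 + 3^2}{4} = \frac{9 + 1 + 1 + 9}{4} = \frac{20}{4} = 5$$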
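The square root of the variance is the standard deviation. For a population, the standard deviation is computed as:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$
For a sample, it is computed as:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
For example, treating the values 1, 3, 5 and 7 as a sample, first calculate the sample mean:
$$\bar{x} = (1 + 3 + 5 + 7) / 4 = 4$$
Then, we insert all the known values into the formula for calculating the SD of a sample, as shown below:
$$s = \sqrt{\frac{(-3)^2 + (-1)^2 + 1^2 + 3^2}{3}} = \sqrt{\frac{9 + 1 + 1 + 9}{3}} = \sqrt{\frac{20}{3}} \approx \sqrt{6.67} \approx 2.58$$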
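The worked examples can be reproduced in Python. The variance helper below is written for illustration and simply switches the divisor between N (population) and n - 1 (sample); the standard-library functions statistics.pvariance and statistics.stdev are used as a cross-check, and they correspond to Excel's VAR.P and STDEV.S respectively.

```python
import math
import statistics


def variance(data, sample=True):
    """Sum of squared deviations from the mean, divided by n - 1 (sample) or N (population)."""
    m = sum(data) / len(data)
    squared_deviations = sum((x - m) ** 2 for x in data)
    return squared_deviations / (len(data) - 1 if sample else len(data))


values = [1, 3, 5, 7]

pop_var = variance(values, sample=False)    # 20 / 4
sample_var = variance(values, sample=True)  # 20 / 3
pop_sd = math.sqrt(pop_var)
sample_sd = math.sqrt(sample_var)

# Cross-check against the standard library.
assert math.isclose(pop_var, statistics.pvariance(values))
assert math.isclose(sample_sd, statistics.stdev(values))

print(pop_var, round(pop_sd, 2), round(sample_var, 2), round(sample_sd, 2))
# 5.0 2.24 6.67 2.58
```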
STANDARDISED VALUES
The standardised value (z-score) of an observation is computed as:

$$z_i = \frac{x_i - \bar{x}}{s}$$

We subtract the sample mean from the $i$th observation, $x_i$, and divide the result by the sample standard deviation. The numerator denotes the distance of $x_i$ from the sample mean; a negative value indicates that $x_i$ lies to the left of the mean, and a positive value indicates that it lies to the right. Dividing by the standard deviation, $s$, scales the distance from the mean so that it is expressed in units of standard deviations.
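As a sketch of the standardisation step, the Python below computes z-scores for the sample 1, 3, 5, 7 used earlier, with the sample standard deviation (n - 1 divisor). Standardised values always sum to zero because the deviations do.

```python
import statistics

def z_scores(values):
    """Distance of each value from the sample mean, in units of sample standard deviations."""
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)        # sample standard deviation (n - 1)
    return [(x - x_bar) / s for x in values]

z = z_scores([1, 3, 5, 7])              # about [-1.16, -0.39, 0.39, 1.16]
```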
COEFFICIENT OF VARIATION
CV = Standard Deviation/Mean
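The coefficient of variation expresses spread relative to the mean, which makes data sets with different levels comparable. In the hypothetical example below, both data sets have the same standard deviation (about 2.58), but the second has a much larger mean and therefore a much smaller CV. The sketch uses the sample standard deviation; using the population version would change the values slightly.

```python
import statistics

def coefficient_of_variation(values):
    """CV = standard deviation / mean (sample standard deviation assumed)."""
    m = statistics.mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / m

cv_a = coefficient_of_variation([1, 3, 5, 7])          # about 2.58 / 4 = 0.65
cv_b = coefficient_of_variation([101, 103, 105, 107])  # about 2.58 / 104 = 0.025
```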
ACTIVITY
Prepare a report, in the simplest manner possible, on the relationship between statistical analysis concepts and their use in the analytical sciences.
and then every 2000th name can be selected. This approach can be used for telephone sampling supported by an automated dialler that dials numbers in an orderly manner. However, systematic sampling is not the same as simple random sampling, because not every possible sample of a given size from the population has an equal chance of being selected. In some situations, this method can introduce significant bias if the population has some underlying pattern. For example, sampling the orders received every Sunday may not produce a representative sample if consumers tend to order more or less on other days.
□ Stratified sampling: It applies to populations divided into natural subsets (strata). For example, a large city may be divided into political districts called wards. Each ward has a different number of citizens. A stratified sample would choose a sample of individuals in each ward proportionate to its size. This approach ensures that each stratum is weighted by its size relative to the population and can provide better results than simple random sampling if the items in each stratum are not homogeneous. However, issues of cost or the significance of certain strata might make a disproportionate sample more useful. For example, the ethnic or racial mix of each ward might be significantly different, making it difficult for a stratified sample to obtain the desired information.
□ Cluster sampling: It refers to dividing a population into clusters (subgroups), sampling a set of clusters, and conducting a complete survey within the sampled clusters. For instance, a company might segment its customers into small geographical regions. A cluster sample would consist of a random sample of the geographical regions, and all customers within these regions would be surveyed (which might be easier because regional lists might be easier to produce and mail).
□ Sampling from a continuous process: Selecting a sample from
a continuous manufacturing process can be accomplished in two
main ways. First, select a time at random; then select the next n
items produced after that time. Second, randomly select n times;
select the next item created after each of these times. The first
approach generally ensures that the observations will come from
a homogeneous population; however, the second approach might
include items from different populations if the characteristics of
the process should change over time, so caution should be used.
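The three selection schemes described above can be sketched with Python's `random` module. The name list, ward sizes and region contents are hypothetical, and the stratified function uses simple proportional allocation with rounding, so for other sizes the per-stratum counts may not add up exactly to the requested total.

```python
import random

def systematic_sample(population, k, rng):
    """Random start among the first k items, then every k-th item after it."""
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(strata, total_n, rng):
    """Proportional allocation: each stratum contributes in proportion to its size."""
    population_size = sum(len(members) for members in strata.values())
    return {name: rng.sample(members, round(total_n * len(members) / population_size))
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters, then survey every member of each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

rng = random.Random(2017)                      # fixed seed so the sketch is reproducible

names = [f"name_{i}" for i in range(10_000)]   # hypothetical list of 10,000 names
every_2000th = systematic_sample(names, 2000, rng)

wards = {"Ward A": list(range(600)), "Ward B": list(range(300)), "Ward C": list(range(100))}
ward_sample = stratified_sample(wards, 50, rng)    # 30, 15 and 5 citizens

regions = {"North": ["c1", "c2"], "South": ["c3"], "East": ["c4", "c5", "c6"], "West": ["c7"]}
surveyed = cluster_sample(regions, 2, rng)         # 2 whole regions
```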
UNBIASED ESTIMATORS
It seems quite intuitive that the sample mean should provide a good point estimate for the population mean. However, it may not be clear why the formula for the sample variance that we saw previously has a denominator of n - 1, particularly because it is different from the formula for the population variance. In these formulas, the population variance is computed by

$$\sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}$$

whereas the sample variance is computed by

$$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$$
Why is this so? Statisticians develop many types of estimators, and from both a theoretical and a practical perspective, it is important that they truly estimate the population parameters they are supposed to estimate. Suppose we perform an experiment in which we repeatedly sample from a population and calculate a point estimate for a population parameter. Each individual point estimate will vary from the population parameter; however, we would hope that the long-term average (expected value) of all possible point estimates would equal the population parameter. If the expected value of an estimator equals the population parameter it is intended to estimate, the estimator is said to be unbiased; otherwise, it is called biased and will systematically overestimate or underestimate the parameter. Dividing by n - 1 rather than n makes the sample variance an unbiased estimator of the population variance.
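The "repeated sampling" experiment described above can be simulated. This sketch assumes a hypothetical normal population with variance 4, draws many samples of size 5, and averages the two variance estimates. The n divisor settles near (n - 1)/n × 4 = 3.2, while the n - 1 divisor settles near the true value 4.

```python
import random

def average_estimates(n=5, reps=20_000, sigma=2.0, seed=1):
    """Average the divide-by-n and divide-by-(n - 1) variance estimates over many samples."""
    rng = random.Random(seed)
    biased_total = unbiased_total = 0.0
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        m = sum(sample) / n
        ss = sum((x - m) ** 2 for x in sample)
        biased_total += ss / n            # population-style formula applied to a sample
        unbiased_total += ss / (n - 1)    # sample variance formula
    return biased_total / reps, unbiased_total / reps

biased, unbiased = average_estimates()    # true variance is sigma ** 2 = 4
```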
SAMPLING DISTRIBUTIONS
We can quantify the sampling error in estimating the mean for any unknown population. To do this, we need to characterise the sampling distribution of the mean.
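One way to see the sampling distribution of the mean is to simulate it. The sketch below assumes a hypothetical normal population (mean 100, standard deviation 15) and samples of size 25. It relies on the standard result that the standard deviation of the sample means (the standard error) is σ/√n, here 15/5 = 3.

```python
import math
import random
import statistics

def simulate_sample_means(mu, sigma, n, reps, seed=0):
    """Draw `reps` samples of size n and return the mean of each one."""
    rng = random.Random(seed)
    return [statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)]) for _ in range(reps)]

means = simulate_sample_means(mu=100, sigma=15, n=25, reps=5000)
observed_se = statistics.stdev(means)   # spread of the simulated sample means
theoretical_se = 15 / math.sqrt(25)     # sigma / sqrt(n) = 3.0
```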
CONFIDENCE INTERVALS
PREDICTION INTERVALS
Note that this interval is wider than the confidence interval because of the additional value of 1 under the square root. This is because, in addition to estimating the population mean, we must also account for the variability of the new observation around the mean.
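A sketch comparing the two intervals for the sample 1, 3, 5, 7, assuming the usual t-based forms $\bar{x} \pm t_{\alpha/2,\,n-1}\, s\sqrt{1/n}$ (confidence interval for the mean) and $\bar{x} \pm t_{\alpha/2,\,n-1}\, s\sqrt{1 + 1/n}$ (prediction interval for a new observation). The critical value 3.182 (95%, 3 degrees of freedom) is taken from a standard t table rather than computed, because Python's standard library has no t distribution.

```python
import math
import statistics

T_CRIT_95_DF3 = 3.182   # two-sided 95% t value for 3 degrees of freedom (t table)

data = [1, 3, 5, 7]
n = len(data)
x_bar = statistics.mean(data)
s = statistics.stdev(data)

ci_half = T_CRIT_95_DF3 * s * math.sqrt(1 / n)       # half-width, confidence interval
pi_half = T_CRIT_95_DF3 * s * math.sqrt(1 + 1 / n)   # half-width, prediction interval
confidence_interval = (x_bar - ci_half, x_bar + ci_half)
prediction_interval = (x_bar - pi_half, x_bar + pi_half)
```

The ratio of the two half-widths is √(n + 1), so the gap between them narrows in relative terms only slowly as the sample grows.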
ACTIVITY
INTRODUCTION TO PROBABILITY
DISTRIBUTIONS
The concept of probability is prevalent everywhere, from stock market predictions and market research to weather forecasts. In business, managers need to know the likelihood that a new product will be profitable or the chances that a project will be completed on time. Probability quantifies the uncertainty that we encounter all around us and is an important building block for business analytics applications.
Probability is the likelihood that an outcome occurs. Probabilities are expressed as values between 0 and 1, although many people convert them to percentages. The statement that there is a 10% chance that oil prices will rise next quarter is another way of stating that the probability of a rise in oil prices is 0.1.
Let P(O_i) be the probability associated with outcome O_i.
The union of A and B is the event {2, 3, 7, 11, 12}. The probability that some outcome in either A or B (i.e., the union of A and B) occurs is denoted as P(A or B). Finding this probability depends on whether the events are mutually exclusive or not. Two events are mutually exclusive if they have no outcomes in common. The events A and B in this example are mutually exclusive. The following rules apply:
□ If events A and B are mutually exclusive, then P(A or B) = P(A) + P(B).
□ If two events A and B are not mutually exclusive, then P(A or B) = P(A) + P(B) - P(A and B). Here, (A and B) represents the intersection of events A and B, that is, all outcomes belonging to both A and B.
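Both addition rules can be checked by enumerating a sample space. This sketch assumes, as the union {2, 3, 7, 11, 12} suggests, that the outcomes are sums of two fair dice with A = {7, 11} and B = {2, 3, 12}. The events C and D are hypothetical overlapping events added to exercise the general rule.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely rolls of two fair dice.
rolls = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event), where the event is a set of possible sums of the two dice."""
    hits = sum(1 for a, b in rolls if a + b in event)
    return Fraction(hits, len(rolls))

A = {7, 11}
B = {2, 3, 12}
p_union = prob(A | B)                        # 12/36 = 1/3
assert p_union == prob(A) + prob(B)          # mutually exclusive: simply add

C = {2, 4, 6, 8, 10, 12}                     # hypothetical: the sum is even
D = {2, 3, 4}                                # hypothetical: the sum is at most 4
p_c_or_d = prob(C | D)                       # 20/36 = 5/9
assert p_c_or_d == prob(C) + prob(D) - prob(C & D)   # overlap {2, 4} counted once
```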
CONDITIONAL PROBABILITY
BERNOULLI DISTRIBUTION
BINOMIAL DISTRIBUTION
POISSON DISTRIBUTION
UNIFORM DISTRIBUTION
NORMAL DISTRIBUTION
Using sample data alone may limit our ability to predict uncertain events that may occur, because potential values outside the range of the sample data are not included. A better method is to identify the probability distribution of the sample data by fitting a theoretical distribution to the data and verifying the fit.
Summary statistics can also provide clues about the nature of a distribution. The mean, median, standard deviation and coefficient of variation often provide information about the nature of the distribution.
For instance, normally distributed data tend to have a fairly low coefficient of variation (however, this may not be true if the mean is small).
For normally distributed data, we would also expect the median and mean to be approximately the same. For exponentially distributed data, however, the median will be less than the mean. For exponential data, we would also expect the mean to be about equal to the standard deviation or, equivalently, the coefficient of variation to be close to 1. We could also look at the skewness index. Normal data are not skewed, whereas lognormal and exponential data are positively skewed. The following example of analysing airline passenger data will help in better understanding the distribution of normal data.
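The clues listed above can be tried on simulated data. The sketch below draws hypothetical normal (mean 100, standard deviation 10) and exponential (mean 50) samples and summarises each one. The skewness is a simple moment-based version; spreadsheet functions such as Excel's SKEW use a small-sample adjustment, so values may differ slightly.

```python
import random
import statistics

def summarise(values):
    """Mean, median, standard deviation, coefficient of variation and skewness."""
    m = statistics.fmean(values)
    sd = statistics.stdev(values)
    skew = sum(((x - m) / sd) ** 3 for x in values) / len(values)
    return {"mean": m, "median": statistics.median(values), "sd": sd,
            "cv": sd / m, "skew": skew}

rng = random.Random(7)
normal_data = [rng.gauss(100, 10) for _ in range(5000)]     # hypothetical normal data
expo_data = [rng.expovariate(1 / 50) for _ in range(5000)]  # hypothetical exponential data

normal_summary = summarise(normal_data)   # median close to mean, low CV, skew near 0
expo_summary = summarise(expo_data)       # median below mean, CV near 1, positive skew
```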
ACTIVITY
Outline your plan if you were given the opportunity to study, evaluate and prepare an execution plan for a newly launched store chain that plans to maximise its sales.
SUMMARY
□ Descriptive analytics is the most essential type of analytics and establishes the framework for more advanced types of analytics.
□ Data visualisation is the method of showing data graphically to provide insights that help in making better decisions.
KEYWORDS
3. True
Descriptive Statistics
4. Statistics
5. True
6. Median
Sampling and Estimation
7. d. All of these
8. Random
9. Stratified
Introduction to Probability Distributions
10. False
11. Conditional
12. Random
E-REFERENCES
□ Descriptive, Predictive, and Prescriptive Analytics Explained. (2016, August 05). Retrieved May 01, 2017, from https://halobi.com/2016/07/descriptive-predictive-and-prescriptive-analytics-explained/
□ Big Data Analytics: Descriptive Vs. Predictive Vs. Prescriptive. (n.d.). Retrieved May 01, 2017, from http://www.informationweek.com/big-data/big-data-analytics/big-data-analytics-descriptive-vs-predictive-vs-prescriptive/d/d-id/1113279
□ What is descriptive analytics? - Definition from WhatIs.com. (n.d.). Retrieved May 01, 2017, from http://whatis.techtarget.com/definition/descriptive-analytics
CONTENTS
6.1 Introduction
6.2 Predictive Modelling
6.2.1 Logic Driven Models
6.2.2 Data Driven Models
Self Assessment Questions
Activity
6.3 Introduction to Data Mining
Self Assessment Questions
Activity
6.4 Data Mining Methodologies
6.4.1 Classification
6.4.2 Regression
6.4.3 Clustering (K-means)
6.4.4 Artificial Neural Networks
Self Assessment Questions
Activity
6.5 Summary
6.6 Descriptive Questions
6.7 Answers and Hints
6.8 Suggested Readings & References
INTRODUCTORY CASELET
LEARNING OBJECTIVES
INTRODUCTION
In the previous chapter, you learned that descriptive analytics analyses a database to provide information on the trends of past or current business events that can help managers, planners, leaders, etc., to develop a road map for future actions. Descriptive analytics performs an in-depth analysis of data to reveal details such as the frequency of events, operating costs, and the underlying reasons for failures. It helps in identifying the root cause of a problem. Predictive analytics, on the other hand, is about understanding and predicting the future; it answers the question 'What could happen?' by using statistical models and different forecasting techniques. It predicts near-future probabilities and trends and helps in what-if analysis. In predictive analytics, we use statistics, data mining techniques and machine learning to analyse the future. Figure 6.1 shows the steps involved in predictive analytics:
[Figure 6.1: Level of Insight versus BI Maturity]
In this chapter, you will first learn about predictive modelling. Further, the chapter discusses the concept of data mining. Towards the end, the chapter discusses different data mining methodologies such as classification, regression, clustering (K-means) and artificial neural networks.
nection between instances of "issues with the item", for example, and an increase in customer service calls.
Logic driven models are created on the basis of inferences and postulations which the sample space and existing conditions provide. Creating logical models requires a solid understanding of business functional
30% of the customers do not return each year, while 70% do return to
provide more business to the restaurant.
Armed with all the above details, we can logically arrive at a conclusion and derive the following model for the above problem statement:
where,
So, as you can see, logic driven predictive models can be derived for a number of situations, conditions, problem statements and many other scenarios, where predictive analytical models provide a forward-looking view, on the basis of validation, testing and evaluation, to estimate the likelihood of an outcome for a given set of input data.
comes based on the data. Refer to the caselet in this chapter, Samsung's case with their product and their ensuing actions, as a good example of data driven predictive modelling.
ACTIVITY
Create a data driven model using MS Excel to denote the
variation in a product's sales for last 3 years.
ACTIVITY
Create a PowerPoint presentation on techniques used in data mining and show it in your class.
6.4.1 CLASSIFICATION
Classification is the process of analysing data to predict how to classify a new data element. An example of classification is spam filtering in an e-mail client. By examining textual characteristics of a message (subject header, key words, and so on), the message is classified as junk or not. Classification methods can help predict whether a credit-card charge may be fraudulent, how risky a loan applicant is, or whether a consumer is likely to respond to an advertisement.
Classification is about predicting an outcome based on a given input. The algorithm processes a training set containing a set of attributes along with the respective outcome, and attempts to determine the relationships between the attributes that make it possible to forecast the outcome. Next, the algorithm is given an unseen data set, called the prediction set, containing the same set of attributes except for the prediction attribute. The algorithm examines the input and yields a prediction. The accuracy of the predictions indicates how effective the algorithm is. For example, the training set in a medical database would contain relevant patient information recorded earlier, in which the prediction attribute is whether the patient has a heart problem.
Training set
Age   Heart rate   Blood pressure   Heart problem
65    78           150/70           Yes
37    83           112/76           No
71    67           108/65           No

Prediction set
Age   Heart rate   Blood pressure   Heart problem
43    98           147/89           ?
65    58           106/63           ?
84    77           150/65           ?
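The text does not name a particular classification algorithm, so the sketch below uses one simple choice for illustration: 1-nearest-neighbour, which labels each new patient with the outcome of the most similar training patient (plain Euclidean distance). Splitting blood pressure into systolic and diastolic numbers is an assumption, as is leaving the features unscaled; with only three training rows this is a toy, not a usable medical model.

```python
import math

def parse_row(age, heart_rate, blood_pressure):
    """Turn '150/70' into separate systolic and diastolic numbers."""
    systolic, diastolic = (int(v) for v in blood_pressure.split("/"))
    return (age, heart_rate, systolic, diastolic)

training_set = [
    (parse_row(65, 78, "150/70"), "Yes"),
    (parse_row(37, 83, "112/76"), "No"),
    (parse_row(71, 67, "108/65"), "No"),
]
prediction_set = [parse_row(43, 98, "147/89"),
                  parse_row(65, 58, "106/63"),
                  parse_row(84, 77, "150/65")]

def predict(features):
    """Return the 'Heart problem' label of the closest training patient."""
    _, label = min(training_set, key=lambda row: math.dist(row[0], features))
    return label

predictions = [predict(f) for f in prediction_set]
```

In practice one would rescale the attributes (age and blood pressure are on different scales) and use far more training records before trusting the accuracy of the predictions.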
6.4.2 REGRESSION
Agglomerative (AGNES)
The goal of this algorithm is to find groups in the data, with the number of groups represented by the variable K. The algorithm works iteratively to assign each data point to one of K groups based on the features that are provided. Data points are clustered based on feature similarity. The results of the K-means clustering algorithm are:
1. The centroids of the K clusters, which can be used to label new data
2. Labels for the training data (each data point is assigned to a single cluster)
BUSINESS USES
Subject   A     B
1         1.0   1.0
2         1.5   2.0
3         3.0   4.0
4         5.0   7.0
5         3.5   5.0
6         4.5   5.0
7         3.5   4.5
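This data set is to be clustered into two groups. As a first step, let the A and B values of the two individuals farthest apart (using the Euclidean distance measure) define the initial cluster means. The remaining individuals are then examined in sequence and allocated to the cluster to which they are closest, in terms of Euclidean distance to the cluster mean; the mean vector is recalculated each time a new member is added: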
Step   Cluster 1     Cluster 1 mean       Cluster 2      Cluster 2 mean
       individuals   vector (centroid)    individuals    vector (centroid)
1      1             (1.0, 1.0)           4              (5.0, 7.0)
2      1, 2          (1.2, 1.5)           4              (5.0, 7.0)
3      1, 2, 3       (1.8, 2.3)           4              (5.0, 7.0)
4      1, 2, 3       (1.8, 2.3)           4, 5           (4.2, 6.0)
5      1, 2, 3       (1.8, 2.3)           4, 5, 6        (4.3, 5.7)
6      1, 2, 3       (1.8, 2.3)           4, 5, 6, 7     (4.1, 5.4)
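Now the initial partition has changed, and the two clusters currently have the following characteristics:
Cluster 1: individuals 1, 2, 3; mean vector (centroid) (1.8, 2.3)
Cluster 2: individuals 4, 5, 6, 7; mean vector (centroid) (4.1, 5.4)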
However, we cannot yet be sure that each individual has been assigned to the right cluster. So, we compare each individual's distance to its own cluster mean and to the mean of the opposite cluster:
Individual   Distance to mean (centroid)   Distance to mean (centroid)
             of Cluster 1                 of Cluster 2
1            1.5                          5.4
2            0.4                          4.3
3            2.1                          1.8
4            5.7                          1.8
5            3.2                          0.7
6            3.8                          0.6
7            2.8                          1.1
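Only individual 3 is nearer to the mean of the opposite cluster (Cluster 2, distance 1.8) than to the mean of its own cluster (Cluster 1, distance 2.1). Individual 3 is therefore relocated to Cluster 2, giving a new partition: Cluster 1 = {1, 2} with mean vector (1.25, 1.5) and Cluster 2 = {3, 4, 5, 6, 7} with mean vector (3.9, 5.1). The iterative relocation would continue from this new partition until no more relocations occur. However, in this example, each individual is now nearer to its own cluster mean than to the other cluster's mean, so the iteration stops and the latest partitioning is chosen as the final cluster solution.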
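For comparison, here is a minimal sketch of standard batch K-means (Lloyd's algorithm) on the same seven subjects, seeded with individuals 1 and 4 as in the worked example. Unlike the sequential table above, it reassigns every point at once and then recomputes both means; from this start it reaches the same final partition. Individual 3 is exactly equidistant from the two initial centroids, and `min` breaks that tie toward the first cluster.

```python
import math

points = {1: (1.0, 1.0), 2: (1.5, 2.0), 3: (3.0, 4.0), 4: (5.0, 7.0),
          5: (3.5, 5.0), 6: (4.5, 5.0), 7: (3.5, 4.5)}

def kmeans(points, initial_ids, max_iter=100):
    """Batch K-means: assign all points to the nearest centroid, recompute means, repeat."""
    centroids = [points[i] for i in initial_ids]
    labels = {}
    for _ in range(max_iter):
        # Assignment step (ties go to the lower-numbered cluster).
        new_labels = {pid: min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
                      for pid, p in points.items()}
        if new_labels == labels:       # no relocations: converged
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(len(centroids)):
            members = [points[pid] for pid, lab in labels.items() if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

labels, centroids = kmeans(points, initial_ids=(1, 4))
# labels: individuals 1, 2 -> cluster 0; individuals 3 to 7 -> cluster 1
# centroids: about (1.25, 1.5) and (3.9, 5.1)
```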
[Figure: Neural network with an input layer (training data, past values), hidden layers (prediction functions) and an output layer (predictions), arranged along a time axis]
The input layer feeds past data values into the next (hidden) layer. The black circles denote the nodes of the neural network. The hidden layer holds complex functions that create predictors; these functions are hidden from the user. The nodes (black circles) in the hidden layer represent mathematical functions, called neurons, that transform the input data. The output layer gathers the predictions from the hidden layer and delivers the result.
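A minimal sketch of the layer-by-layer flow just described: two inputs feed two hidden neurons, each a weighted sum passed through a sigmoid function, and a single output node combines the hidden values into a prediction. All weights and biases are hypothetical; a real network would learn them from training data, and other activation functions are also common.

```python
import math

def sigmoid(z):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical parameters; a trained network would learn these from past data.
HIDDEN_WEIGHTS = [[0.5, -0.2], [0.3, 0.8]]   # one row of input weights per hidden neuron
HIDDEN_BIAS = [0.0, -0.5]
OUTPUT_WEIGHTS = [1.0, 2.0]
OUTPUT_BIAS = 0.1

def forward(inputs):
    """Input layer -> hidden layer (sigmoid neurons) -> output layer (weighted sum)."""
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
              for row, b in zip(HIDDEN_WEIGHTS, HIDDEN_BIAS)]
    return sum(w * h for w, h in zip(OUTPUT_WEIGHTS, hidden)) + OUTPUT_BIAS

prediction = forward([1.0, 2.0])
```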
ACTIVITY
A consumer products company has collected some data relating to
the advertising expenditure and sales of one of its products:
Advertising cost   Sales
$300               $7000
$350               $9000
$400               $10000
$450               $10600
Figure out the model that would best depict the above data in
the least number of steps.
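One possible starting point for this activity is a straight-line least-squares model, sketched below in Python. The linear form is an assumption: sales grow more slowly at higher spending levels, so a curvilinear model may describe the data better, and comparing R² values is one way to judge.

```python
def least_squares_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares; also return R-squared."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = (sxy * sxy) / (sxx * syy)
    return intercept, slope, r_squared

advertising = [300, 350, 400, 450]
sales = [7000, 9000, 10000, 10600]
intercept, slope, r_squared = least_squares_line(advertising, sales)
# Sales = 300 + 23.6 * Advertising cost, R-squared about 0.93
```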
SUMMARY
□ Predictive modelling is the method of creating, testing and validating a model to best predict the likelihood of an outcome.
□ Predictive analysis and models are typically used to predict future probabilities.
□ Predictive models are representations of the relationship between how a member of a sample performs and some of the known characteristics of the sample.
□ Predictive analytics methods depend on quantifiable variables and metrics to forecast future performance or outputs.
□ Logic driven models are created on the basis of inferences and postulations provided by the sample space and existing conditions.
□ A data-driven model is based on the data analysis of a specific system.
□ Regression analysis and forecasting models help us to predict relationships or future values of variables of interest.
□ Association rules are intended to discover broad association patterns among data in large databases.
□ Regression analysis is a tool for creating statistical and mathematical models that define relationships between a dependent variable (which should be a ratio variable, not categorical) and one or more explanatory or independent variables (numerical or categorical).
KEYWORDS
□ Association rules: These rules are used to discover broad association patterns in large databases.
□ Cause-and-effect modelling: It is the process of developing analytic models to describe the relationship between metrics that drive business performance.
Predictive modelling
7. True
8. (R × F × M)/D
Introduction to Data Mining
9. True
10. d. All of these
7. True
8. modeling
9. False
Data Mining Methodologies
10. False
11. Cause-and-effect
12. True
13. simple
14. False
E-REFERENCES
□ Predictive analytics. (2017, May 09). Retrieved May 16, 2017, from https://en.wikipedia.org/wiki/Predictive_analytics
□ What is predictive analytics? - Definition from WhatIs.com. (n.d.). Retrieved May 16, 2017, from http://searchbusinessanalytics.techtarget.com/definition/predictive-analytics
□ Impact, I. P., & World, P. A. (n.d.). Predictive Analytics World. Retrieved May 16, 2017, from http://www.predictiveanalyticsworld.com/predictive_analytics.php
CONTENTS
7.1 Introduction
7.2 Overview of Prescriptive Analytics
7.2.1 Prescriptive Analytics brings a lot of Input into the Mix
7.2.2 Prescriptive Analytics Comes of Age
7.2.3 How Prescriptive Analytics Functions
7.2.4 Commercial Operations and Viability
7.2.5 Research and Innovation
7.2.6 Business Development
7.2.7 Consumer Excellence
7.2.8 Corporate Accounts
7.2.9 Supply Chain
7.2.10 Governance, Risk and Compliance
Self Assessment Questions
Activity
7.3 Introduction to Prescriptive Modeling
7.3.1 The Waterfall Model
7.3.2 Incremental Process Model
7.3.3 Rapid Application Development (RAD) Model
Self Assessment Questions
Activity
7.4 Non-linear Optimisation
Self Assessment Questions
Activity
7.5 Summary
7.6 Descriptive Questions
7.7 Answers and Hints
7.8 Suggested Readings & References
INTRODUCTORY CASELET
After many weeks had passed, one day Bill's cell phone received a notification while he was driving. Upon opening the notification, Bill was surprised to get an alert message from the fast food vendor, which notified him that the place where he was currently travelling had a restaurant where he could use his coupon. Though initially shocked, Bill had heard about the existence of this cutting-edge technology, but had not known that one day he might benefit from it. This technology is a clear example of what retailers can do in the future by combining the geo-location ability of the phone with other information that they have acquired from their customers.
In this caselet, you can see how the credit card uses prescriptive analytics to link customers with their requirements. Once an individual's information is shared with the company, the company can use many mathematical modeling and statistical methods to find actionable insights, which in turn can be used to help the customer get better results.
LEARNING OBJECTIVES
INTRODUCTION
After studying the descriptive and predictive analytics steps of the business analytics process in the previous chapters, one should be in a good position to take the final step, i.e., prescriptive analytics. The earlier analyses provide a prediction or forecast of what future trends in the business may look like; prescriptive analytics builds on them to recommend what should be done.
By the end of this chapter, readers will understand how various classes of analytics, predictive and descriptive, can lead to prescriptive analysis. This chapter will first discuss the meaning of prescriptive analytics. Next, the chapter discusses prescriptive modeling. In the end, the chapter discusses non-linear optimisation.
■ OVERVIEW OF PRESCRIPTIVE
ANALYTICS
Prescriptive analysis answers 'What should we do?' on the basis of complex data obtained from descriptive and predictive analyses. By using optimisation techniques, prescriptive analytics determines the best alternative for minimising or maximising some objective in finance, marketing and many other areas. For example, if we have to find the best way of shipping goods from a factory to a destination to minimise costs, we will use prescriptive analytics. Figure 7.1 shows a diagrammatic representation of the stages involved in prescriptive analytics:
[Figure 7.1: Stages involved in prescriptive analytics]
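The shipping example can be made concrete with a tiny, hypothetical transportation problem: two factories, two stores, and a per-unit shipping cost on each route. The sketch finds the cheapest plan by brute-force enumeration, which is only feasible for toy sizes; realistic problems are usually solved with linear programming tools such as Excel Solver.

```python
from itertools import product

SUPPLY = {"F1": 30, "F2": 20}          # units available at each factory (hypothetical)
DEMAND = {"S1": 25, "S2": 25}          # units required at each store (hypothetical)
COST = {("F1", "S1"): 4, ("F1", "S2"): 6,
        ("F2", "S1"): 5, ("F2", "S2"): 3}   # cost per unit on each route (hypothetical)

def cheapest_plan():
    """Try every feasible shipment plan and keep the one with the lowest total cost."""
    best_cost, best_plan = None, None
    # Choose F1's shipments; F2 must then cover the rest of each store's demand.
    for x11, x12 in product(range(SUPPLY["F1"] + 1), repeat=2):
        if x11 + x12 > SUPPLY["F1"]:
            continue
        x21, x22 = DEMAND["S1"] - x11, DEMAND["S2"] - x12
        if x21 < 0 or x22 < 0 or x21 + x22 > SUPPLY["F2"]:
            continue
        plan = {("F1", "S1"): x11, ("F1", "S2"): x12,
                ("F2", "S1"): x21, ("F2", "S2"): x22}
        total = sum(COST[route] * qty for route, qty in plan.items())
        if best_cost is None or total < best_cost:
            best_cost, best_plan = total, plan
    return best_cost, best_plan

cost, plan = cheapest_plan()   # F1 sends 25 to S1 and 5 to S2; F2 sends 20 to S2
```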
Since a lot has been written on Big Data, we will focus on analytics, which can help companies transform the finance function by offering forward-looking insights, help them identify the optimal course of action, and improve their ability to communicate and collaborate with other companies at a lower cost of ownership.
markets are demanding, which are key areas for prescriptive analytics, including:
□ Identifying and making decisions about opportunities in emerging areas of unmet need
□ Predicting the potential advantage
ACTIVITY
INTRODUCTION TO PRESCRIPTIVE
MODELING
Prescriptive analytics methods do not concentrate only on why, how, when and what; they also prescribe the behavior needed to take advantage of the situation. Prescriptive analytics has often proved itself a benchmark for an organisation's analytics development. The components of prescriptive analytics are:
a. Evaluate and choose better ways to deal with work
b. Target business goals and conform all restrictions
ACTIVITY
Create a datasheet related to the allocation of budgets for constructing a building in your locality. Use prescriptive analytics to work out the annual budget allocation for maintaining the building over the next 5 years.
NON-LINEAR OPTIMISATION
You already know that there are numerous mathematical programming techniques and methodologies, including nonlinear ones, intended to produce optimal business performance solutions. Most of them require careful estimation of parameters that may or may not be accurate, especially given the precision required of a solution that can depend so precariously on parameter accuracy. This precision is further complicated in business analytics by the large data files that should be factored into the model-building effort. To overcome these limitations and make fuller use of large data sets, regression software can be used. Curve-fitting software can be used to create predictive analytical models that can also help in making prescriptive analytical decisions.
The curve that best fits the plotted revenue and cost of TV promotion is cubic and is plotted in Figure 7.2. The R-square achieved by the cubic equation is a very high 98.7%. The first and second order derivatives of the cubic equation are computed as follows:
$$y = -3\times10^{-6}x^{3} + 0.016x^{2} + 1.601x + 177.32 \qquad \text{(Equation 1)}$$

$$\frac{dy}{dx} = -9.00\times10^{-6}x^{2} + 0.032x + 1.601 \qquad \text{(Equation 2)}$$

$$\frac{d^{2}y}{dx^{2}} = -1.80\times10^{-5}x + 0.032 \qquad \text{(Equation 3)}$$
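The derivatives can be used directly to locate the key points of the fitted curve. This sketch assumes, as the text suggests, that x is the TV promotion cost and y is revenue. Setting Equation 3 to zero gives the inflection point, and the positive root of Equation 2 is the spending level where revenue peaks (the second derivative is negative there). The constant term of Equation 1 does not affect either derivative, so it is omitted.

```python
import math

A3, A2, A1 = -3e-06, 0.016, 1.601       # coefficients of x^3, x^2 and x in Equation 1

def first_derivative(x):
    """Equation 2: dy/dx."""
    return 3 * A3 * x ** 2 + 2 * A2 * x + A1

def second_derivative(x):
    """Equation 3: d2y/dx2."""
    return 6 * A3 * x + 2 * A2

# Inflection point: second derivative equals zero.
inflection_x = -2 * A2 / (6 * A3)        # about 1777.8

# Critical points: roots of the quadratic first derivative.
a, b, c = 3 * A3, 2 * A2, A1
disc = b * b - 4 * a * c
roots = sorted([(-b - math.sqrt(disc)) / (2 * a), (-b + math.sqrt(disc)) / (2 * a)])
x_max = roots[-1]                        # about 3605; a maximum because d2y/dx2 < 0 there
```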
ACTIVITY
Form teams of four students each in your class and visit the nearest truck dealer. Use the non-linear optimisation method to calculate how to minimise the cost of transport when many of the dealer's trucks ship goods to a large network of markets or stores.
SUMMARY
□ By using optimisation techniques, prescriptive analytics determines the best alternative for minimising or maximising some objective in finance, marketing and many other areas.
□ Data, which is available in abundance, can be streamlined for growth and expansion in technology as well as business.
□ In real life, prescriptive analytics can automatically and continuously process new data to improve forecast accuracy and offer better decision options.
□ Prescriptive analytics is an absolute necessity for any company to execute key marketing strategies.
□ Corporate account functions can make immense use of prescriptive analytics to improve their capacity to make decisions that help drive internal excellence and external strategy.
□ Prescriptive analytics can also give supply chain functions a competitive edge through the ability to predict and make decisions in several key areas.
□ RAD is the Rapid Application Development model. Using the RAD model, a software product is developed in a short time frame.
□ The inflection point is identified where the second derivative changes from positive to negative.
KEYWORDS
□ Analytics: It refers to the discovery, interpretation and communication of meaningful patterns in data.
□ Descriptive analytics: It is a preliminary stage of data processing that creates a summary of historical data to yield useful information and possibly prepare the data for further analysis.
□ Prescriptive analytics: It is the area of business analytics (BA) dedicated to finding the best course of action for a given situation.
□ Predictive modelling: It is a process that uses data mining and probability to forecast outcomes.
□ Waterfall model: The model in which each stage is completely finished before the following stage starts.
DESCRIPTIVE QUESTIONS
1. Explain the concept of prescriptive analytics along with its functions.
2. What do you understand by prescriptive modeling? Discuss three kinds of prescriptive process models in business.
3. Describe non-linear optimisation in analytics.
4. Discuss the importance of prescriptive analytics in commercial operations, research and innovation, business development and consumer excellence.
2. subjective, decision-making
3. prescriptive
4. True
Introduction to Prescriptive Modeling
5. b. Target business goals and conform all restrictions
6. d. It is a poor model for long activities
7. Application
11. inflection
12. Mixed Integer Non-Linear Programming
E-REFERENCES
□ Tuhin Chattopadhyay (August 17, 2016), in Big Data Analytics. (2016, August 23). Application of Derivatives to Nonlinear Programming for Prescriptive Analytics. Retrieved May 02, 2017, from https://www.blueoceanmi.com/blueblog/application-derivatives-nonlinear-programming-prescriptive-analytics/
□ Beginning Prescriptive Analytics with Optimization Modeling by Jen Underwood - BeyeNETWORK. (n.d.). Retrieved May 02, 2017, from http://www.b-eye-network.com/view/17152
□ Prescriptive Analytics. (n.d.). Retrieved May 02, 2017, from https://www.mathworks.com/discovery/prescriptive-analytics.html
CONTENTS
8.1 Introduction
8.2 Social Media Analytics
Self Assessment Questions
Activity
8.3 Key Elements of Social Media
Self Assessment Questions
Activity
8.4 Overview of Text Mining
8.4.1 Understanding Text Mining Process
8.4.2 Sentiment Analysis
Self Assessment Questions
Activity
8.5 Performing Social Media Analytics and Opinion Mining on Tweets
Self Assessment Questions
Activity
8.6 Online Social Media Analysis
Self Assessment Questions
Activity
8.7 Mobile Analytics
8.7.1 Define Mobile Analytics
8.7.2 Mobile Analytics and Web Analytics
8.7.3 Types of Results from Mobile Analytics
8.7.4 Types of Applications for Mobile Analytics
Self Assessment Questions
Activity
8.8 Mobile Analytics Tools
8.8.1 Location-based Tracking Tools
8.8.2 Real-Time Analytics Tools
8.8.3 User Behavior Tracking Tools
INTRODUCTORY CASELET
LEARNING OBJECTIVES
INTRODUCTION
In a world where information is readily available via the Internet at the click of a button, organisations need to remain abreast of ongoing events and the latest happenings in order to gain a competitive edge in business markets. Apart from that, organisations also need to interact with their consumers more effectively in order to gain insight into ongoing business trends and the market position of particular products. Social media provides an opportunity for business organisations and individuals to connect and interact with each other worldwide. With the evolution of social media as a tool to connect with existing and potential customers, business organisations have begun to recognise the need for social media analytics to gain crucial business insights and take timely decisions.
This chapter discusses the role of social media and the importance of conducting social media analytics for business organisations. These analyses help organisations to evaluate feedback from consumers and gauge their current and future position in the market. Further, you will learn about text mining and sentiment analysis. The chapter ends with a demonstration of how to perform social media analytics and opinion mining on tweets.
Jesse Farmer, cofounder of Dev Bootcamp, describes a social network as a collection of people bound together through a specific set of social relations. Social media, in turn, denotes a group of Internet-based applications built on the foundations of Web 2.0 that support the creation and exchange of user-generated content. In other words, social media relies on Web-based technologies to generate interactive platforms where people and organisations can create, co-create, recreate, share, discuss, and modify user-generated content.
Prior to the advent of social media as an open-system approach to exchanging content effectively, business organisations and public relations practitioners rarely focused on business dynamics to manage brand images. With business environments changing due to the evolution of social media, business organisations also adopted the open-system approach based on reciprocal feedback. This, in turn, has completely transformed the way information is communicated and the manner in which public relations are developed. The new approach encourages active participation in the development and distribution of information by merging innovative technologies and sociology. Social media provides a collaborative environment which can be employed for:
□ Building relationships
□ Distributing content
Apart from the listed ones, social media may include websites that showcase reviews and ratings, such as Yelp; forums and discussion boards, such as Yahoo!; and websites that showcase virtual social worlds, creating a virtual environment where people can interact, such as Second Life. Figure 8.1 depicts the forms of conversation possible via social media:
[Figure 8.1: Forms of conversation via social media; recoverable labels include Collaboration, Blog, and Platforms]
ACTIVITY
List and discuss the elements of a social media marketing strategy
in your class.
NOTE
Lexicometry or lexical statistics refers to the study of identifying
the frequency of occurrence of words in textual data.
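To make lexicometry concrete, here is a minimal base R sketch that counts word frequencies in a few short reviews. The function name word_frequencies() and the sample reviews are hypothetical; real text-mining work would usually also remove stop words, for example with the tm package used later in this chapter.

```r
# Minimal lexicometry sketch: count how often each word occurs (base R only).
word_frequencies <- function(texts) {
  # Lower-case the text and replace anything that is not a letter or space
  cleaned <- gsub("[^a-z ]", " ", tolower(texts))
  # Split on runs of whitespace and drop empty strings
  words <- unlist(strsplit(cleaned, "\\s+"))
  words <- words[nzchar(words)]
  # Tabulate and sort from most to least frequent
  sort(table(words), decreasing = TRUE)
}

# Hypothetical product reviews
reviews <- c("Great phone, great battery.",
             "Battery life is poor.",
             "Great camera!")
freq <- word_frequencies(reviews)
print(head(freq, 3))
```

The most frequent words give a quick first view of what reviewers talk about before any sentiment scoring is applied.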
Statistical analysis tools, such as R, and word-count techniques aid
in assessing the overall review. Further, positive and negative
relationships can be explored using various plotting techniques, such
as scatter plots. Apart from the application areas listed, text mining
techniques can also be applied to analyse the demographics, financial
status, and buying tendencies of customers.
• Text indexing
• Measure of accuracy
• Document similarity
• Document categorization
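Document similarity, one of the tasks listed above, is often measured as the cosine similarity between term-count vectors. The sketch below assumes raw word counts and a tiny hypothetical set of documents; production systems usually weight terms (for example, with TF-IDF) before comparing them.

```r
# Build a term-count vector for one document over a shared vocabulary
term_vector <- function(text, vocab) {
  words <- unlist(strsplit(gsub("[^a-z ]", " ", tolower(text)), "\\s+"))
  words <- words[nzchar(words)]
  # Count each vocabulary word (0 when absent), in vocabulary order
  as.numeric(table(factor(words, levels = vocab)))
}

# Cosine similarity: dot product divided by the product of vector lengths
cosine_similarity <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Hypothetical documents
docs <- c(d1 = "the battery is good",
          d2 = "the battery is bad",
          d3 = "excellent camera")
vocab <- sort(unique(unlist(strsplit(tolower(docs), "\\s+"))))
vecs <- lapply(docs, term_vector, vocab = vocab)

sim_12 <- cosine_similarity(vecs$d1, vecs$d2)  # three of four words shared
sim_13 <- cosine_similarity(vecs$d1, vecs$d3)  # no words shared
```

A value close to 1 means two documents use nearly the same words; 0 means they share none.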
ACTIVITY
Search and prepare a report on the various applications of text
mining.
install.packages("twitteR")
install.packages("bitops")
install.packages("digest")
install.packages("RCurl")
# If RCurl fails to install on Linux, install the libcurl development
# headers from a terminal first and then retry:
# sudo apt-get install libcurl4-openssl-dev
install.packages("ROAuth")
install.packages("tm")
install.packages("stringr")
install.packages("plyr")
# Load the packages
library(twitteR)
library(ROAuth)
library(RCurl)
library(plyr)
library(stringr)
library(tm)
# Load the saved OAuth credentials object (cred) and register it with twitteR.
# registerTwitterOAuth() belongs to older twitteR releases; newer releases
# use setup_twitter_oauth() instead.
load("/Datasets/twitter_cred.RData")
registerTwitterOAuth(cred)
Figure 8.4 shows the use of Twitter credentials for Twitter
authentication in R:
[Figure 8.4: R Console showing the installation and loading of the required packages]
# Download up to 1,000 English-language tweets that mention "nokia"
input_tweets <- searchTwitter("nokia", n = 1000, lang = "en")
# View the first three tweets
input_tweets[1:3]
[Screenshot: R Console warning that 1000 tweets were requested but the API could return only 91]
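The scoring function referred to in the next paragraph is not reproduced in this text. As an assumption, a minimal lexicon-based score.sentiment() could look like the sketch below: each tweet scores +1 for every word found in the positive list and -1 for every word in the negative list. It uses base R only and expects a character vector of tweet texts; tweets returned by searchTwitter() would first need converting to text, for example with sapply(input_tweets, function(t) t$getText()).

```r
# Sketch of a lexicon-based sentiment scorer (base R only).
score.sentiment <- function(sentences, pos.words, neg.words, .progress = "none") {
  # .progress is accepted only so plyr-style calls still work; it is ignored here
  scores <- vapply(sentences, function(sentence) {
    # Remove punctuation, control characters, and digits, then lower-case
    sentence <- gsub("[[:punct:]]", "", sentence)
    sentence <- gsub("[[:cntrl:]]", "", sentence)
    sentence <- gsub("\\d+", "", sentence)
    sentence <- tolower(sentence)
    words <- unlist(strsplit(sentence, "\\s+"))
    # Score = positive matches minus negative matches
    sum(words %in% pos.words) - sum(words %in% neg.words)
  }, numeric(1), USE.NAMES = FALSE)
  data.frame(score = scores, text = sentences, stringsAsFactors = FALSE)
}

# Hypothetical word lists and tweets
pos <- c("good", "great", "love")
neg <- c("bad", "poor", "hate")
example <- score.sentiment(c("I love this phone, great screen!",
                             "Battery is poor.",
                             "It arrived today."), pos, neg)
print(example$score)
```

A positive score marks a tweet as positive, a negative score as negative, and zero as neutral, which is the rule the categorisation step applies.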
After writing the preceding function, load the files containing
positive and negative words to run the sentiment function. Enter the
following commands to load the data files containing positive and
negative words, respectively, and to score the tweets:
# Adjust the paths to where the word lists are stored on your computer
pos <- readLines("positive-words.txt")
neg <- readLines("negative-words.txt")
scores <- score.sentiment(tweet, pos, neg, .progress = "text")
# Classify each tweet as positive, negative, or neutral from its score
scores$very.pos <- as.numeric(scores$score > 0)
scores$very.neg <- as.numeric(scores$score < 0)
scores$very.neu <- as.numeric(scores$score == 0)
# Number of positive, neutral, and negative tweets
numpos <- sum(scores$very.pos)
numneg <- sum(scores$very.neg)
numneu <- sum(scores$very.neu)
After the sentiments are categorised and the numbers of positive,
negative, and neutral tweets are found, plot the results by using the
following commands:
s <- c(numpos, numneg, numneu)
lbls <- c("POSITIVE", "NEGATIVE", "NEUTRAL")
pct <- round(s / sum(s) * 100)
lbls <- paste(lbls, pct)
lbls <- paste(lbls, "%", sep = "")
pie(s, labels = lbls, col = rainbow(length(lbls)), main = "OPINION")
The pie chart for the analysed sentiment scores is shown in Figure 8.7:
[Figure 8.7: Pie chart titled OPINION showing the shares of positive, negative, and neutral tweets]
SELF ASSESSMENT
Figure 8.9: Searching a Product for Online Text Analysis
A Web page appears with the sentiment score for the product
(Sony, in our case) being displayed on the left-hand side, as
shown in Figure 8.10:
[Figure 8.10: Social Mention results for Sony, with panels for sentiment, Top Keywords, Top Users, Top Hashtags, and Sources]
Figure 8.13: Showing the Online Analysis for Toshiba Products Using
the Sentiment140 Tool
ACTIVITY
Search for and find some tools that organisations use to analyse
their social media competitors.
[Figure: Evolution of mobile technologies, from 1G analog (1980) through 2G GSM and CDMA (1991, 14.4 kbps) and 3G (2001, 2 Mbps) to 4G (2010)]
NOTE
"Forget what we have taken for granted on how consumers use the
Internet," said Karsten Weide, research vice president, Media and
Entertainment. "Soon, more users will access the Web using mobile
devices than using PCs, and it's going to make the Internet a very
different place."
Mobile analytics can easily and effectively collect data from various
data sources and turn it into useful information. Mobile analytics
keeps track of the following information:
□ Total time spent: The total time a user spends with an application.
□ Visitors' location: The location of the user of a particular
application.
□ Number of total visitors: The total number of users of a
particular application, which is useful for gauging the
application's popularity.
□ Click paths of the visitors: The sequence of pages a user visits
within an application.
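The four measures above can be computed from a raw event log. The sketch below uses a small hypothetical log (one row per screen view, with columns user, location, screen, and time_s) and base R's tapply(); real mobile analytics SDKs also split activity into sessions using timeouts, which this sketch ignores.

```r
# Hypothetical in-app event log: one row per screen view
events <- data.frame(
  user     = c("u1", "u1", "u1", "u2", "u2", "u3"),
  location = c("Mumbai", "Mumbai", "Mumbai", "Pune", "Pune", "Mumbai"),
  screen   = c("home", "search", "checkout", "home", "offers", "home"),
  time_s   = c(0, 40, 130, 10, 70, 5),  # seconds since the app was opened
  stringsAsFactors = FALSE
)

# Number of total visitors: distinct users
n_visitors <- length(unique(events$user))

# Visitors' location: distinct users per location
visitors_by_location <- tapply(events$user, events$location,
                               function(u) length(unique(u)))

# Total time spent: last event time minus first event time, per user
# (a user with a single event counts as 0 seconds in this simplification)
time_spent <- tapply(events$time_s, events$user, function(t) max(t) - min(t))

# Click path: screens in time order, joined into one string per user
ordered <- events[order(events$user, events$time_s), ]
click_paths <- tapply(ordered$screen, ordered$user, paste, collapse = " > ")
print(click_paths)
```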
Mobile Web refers to the use of mobile phones or other devices, such
as tablets, to view online content via a lightweight browser. A
mobile-specific site often takes the form m.example.com. How well the
mobile Web works depends partly on the screen size of the device. For
example, if you design an application for a small screen, its images
may appear blurred on a big screen; similarly, if you build your site
for a big screen, it can be too heavy for a small-screen device. Some
organisations are starting to build sites specifically for tablets
because they have found that neither their mobile-specific site nor
their main website serves the tablet segment well. To solve this
problem, the mobile Web should use responsive design; in other words,
a site should be able to adapt its content to the screen size of the
user's device.
[Figure 8.15: A website (example.com), a mobile site (m.example.com), and a responsive-design site (example.com)]
In Figure 8.15, you can see that a website can be opened on both
computers and mobile phones, while a mobile site can be opened only
on mobile phones; responsive-design sites, on the other hand, can be
opened on any device, such as a computer, tablet, or mobile phone.
The term mobile app is short for mobile application software. It is
an application program designed to run on smartphones and other
mobile devices.
Table 8.1 lists the main differences between mobile app analytics and
mobile Web analytics:
EXHIBIT
Enhancement in Search Query on Mobile Phones
Figure 8.16 shows the growth in the number of mobile phone users
(in billions) over the years:
[Figure 8.16: Mobile-cellular subscriptions compared with population, in billions, 2005 to 2013]
ACTIVITY
Search for and list at least 10 mobile analytics tools used by
organisations today.
Figure 8.23: GUI of StatViz
The following are some popular applications that collect data on a
server:
□ DataWinners is a data collection service designed for experts.
This application converts paper forms into digital
questionnaires. Team members can submit their data through any
channel, such as SMS or the Web. DataWinners provides an efficient
data collection facility that can reduce users' decision-making
time. The home page of the DataWinners application is shown in
Figure 8.24:
Perform the following steps to download and install the Graph Trial
app, and then create a graph to present the results of data analysis:
1. Open the Google Play Store by tapping the Play Store icon
on the screen of any Android phone or tablet. A window
appears, showing the contents of the Play Store.
2. Type Graph Trial in the search box, and tap the search button to
start the search, as shown in Figure 8.26:
Figure 8.27: Selecting the Graph Trial App from the List
10. Enter the details in the Y axis title, Min, and Max fields, as
shown in Figure 8.35:
[Screenshot: Y axis range 0 to 10000; list items entered include Acer 1500, Sony 2000, Lenovo 5500, Apple 10000, and HP 3000]
12. Tap the Save button to get the Barchart of list items icon, as
shown in Figure 8.37:
Figure 8.37: Showing the Window with the Barchart of list items Icon
13. Long-tap the Barchart of list items icon to see the graph, as
shown in Figure 8.38:
[Figure 8.38: Bar chart of the list items, with brand on the X axis]
14. Select the Pie tab by tapping it to get a pie chart, as shown in
Figure 8.39:
17. Tap the CSV import option to open the window listing the CSV
files in the selected location, as shown in Figure 8.42:
18. Select the particular file name to get a graph for the dataset.
19. Select the size of the graph by tapping an option in the Select
image size window, as shown in Figure 8.43:
20. Tap the OK button to save the graph in the form of an image,
as shown in Figure 8.44:
[Figure 8.44: Message reading "Saved graph image at /mnt/sdcard/graph/Barchartoflist items_1388991763854.png"]
After saving the graph as an image, you can share it via e-mail.
To do this, proceed to the next step.
21. Tap the Share option to open the Share graph image window,
as shown in Figure 8.45:
22. Tap the option through which you want to share the image.
In our case, we have selected Gmail, as shown in Figure
8.46:
23. Enter the details in the required fields, as shown in Figure 8.47:
24. Tap the Share button and exit the application by pressing
the OK button on the pop-up box that appears, as shown in
Figure 8.48:
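For comparison, the same workflow (importing a CSV file, drawing a bar chart and a pie chart, and saving the result) can be sketched in base R. The file name, column names, and values are hypothetical. The charts are written to a PDF file because the pdf() device is available in every R installation; png() could be used instead where a PNG image is preferred.

```r
# Hypothetical dataset standing in for the CSV file picked in the app
csv_path <- file.path(tempdir(), "brand_sales.csv")
write.csv(data.frame(brand = c("A", "B", "C"), units = c(1500, 3000, 5500)),
          csv_path, row.names = FALSE)

# "CSV import": read the dataset back in
sales <- read.csv(csv_path, stringsAsFactors = FALSE)

# Percentage labels for the pie slices, e.g. "A 15%"
pct <- round(sales$units / sum(sales$units) * 100)
pie_labels <- paste0(sales$brand, " ", pct, "%")

# "Save graph as image": draw both charts into one PDF file
out_file <- file.path(tempdir(), "brand_charts.pdf")
pdf(out_file)
barplot(sales$units, names.arg = sales$brand,
        xlab = "brand", ylab = "units", main = "Bar chart of list items")
pie(sales$units, labels = pie_labels, main = "Pie chart of list items")
invisible(dev.off())
```

The saved file can then be attached to an e-mail, just as the app shares its image through Gmail.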
EXHIBIT
Premier Inn
How was the hotel able to achieve such big revenues? The magic
behind the success of the Premier Inn mobile app was mobile data
analytics provided by Grapple, a mobile-innovation agency. Grapple
collected data from 300 branded applications belonging to its
clients. Branded applications are those that either offer a utility
or make life easier for customers when they are on the move. Grapple
analysed this data to enable companies such as Premier Inn to better
understand customer behavior and make the changes required to
improve sales, customer retention, and loyalty. Premier Inn used
Grapple's analysis to improve the features and functionality of its
mobile application, increasing sales conversion rates from 3% to 5.9%
and generating revenues of £1m in a short period of three months.
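The reported change in conversion rate can be expressed both as an absolute gain in percentage points and as a relative increase. The sketch below does the arithmetic in R; the number of sessions is a hypothetical figure used only to show what the change means in bookings.

```r
before_rate <- 3.0  # conversion rate before the changes, in percent
after_rate  <- 5.9  # conversion rate after the changes, in percent

point_gain    <- after_rate - before_rate                        # percentage points
relative_gain <- (after_rate - before_rate) / before_rate * 100  # percent change

sessions <- 10000L  # hypothetical number of app sessions
extra_bookings <- sessions * (after_rate - before_rate) / 100

cat(sprintf("Gain: %.1f percentage points (%.1f%% relative); %.0f extra bookings per %d sessions\n",
            point_gain, relative_gain, extra_bookings, sessions))
```

In other words, the conversion rate almost doubled: a 2.9-point gain is roughly a 97% relative increase.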
ACTIVITY
Determine ways to overcome the challenges in the fields of mobile
marketing and mobile advertising.
SUMMARY
□ Social media refers to computer-mediated, interactive,
Internet-based platforms that allow people to create, distribute,
and share a wide range of content and information, such as text
and images.
□ Social media analytics is the practice of collecting data from
social media websites and blogs and analysing the data to make
crucial business decisions.
□ Text mining, or text analytics, is a handy tool for quantitatively
examining the text generated on social media and filtering it into
different clusters, patterns, and trends.
□ Sentiment analysis involves careful analysis of people's opinions,
sentiments, attitudes, appraisals, and evaluations.
□ Automated sentiment analysis is still evolving, as it is difficult
to interpret the conditional phrases people use to express their
sentiments on social media.
□ 4G provides wide-range access, multiservice capacity, integration
of all older mobile technologies, and low bit cost to the user.
□ Mobile analytics has several similarities with Web and social
analytics; for example, each can analyse the behavior of the user
with regard to an application and send this information to the
service provider.
□ Mobile Web refers to the use of mobile phones or other devices,
such as tablets, to view online content via a lightweight browser.
□ Mobile apps are usually available through application distribution
platforms such as Apple's App Store and Google Play.
KEYWORDS
□ Blog: An online journal that presents content in reverse
chronological order.
□ Microblogs: Blogs that allow people to share and showcase short
posts; they suit quick sharing of a few lines of text or an
individual photo or video.
□ Wiki: A collaborative website on which members can create and
modify content in a community-based database.
□ Social networks: Networks that generally support the exchange of
information and data in various formats, such as text, videos,
and photos.
□ Text mining tools: Tools used to identify themes, patterns, and
insights hidden in structured as well as unstructured data.
8. Algorithms
Online social media analysis
9. False
10. Sentiment140
Mobile analytics
11. a. 30
12. Division
Mobile analytics tools
13. Packet sniffing
14. Real-time dashboard
Performing mobile analytics
15. Server
16. True
17. d. Temporary Mobile Subscriber Identity
Challenges of mobile analytics
18. Redirect
CONTENTS
9.1 Introduction
9.2 What is Visualisation?
9.2.1 Ways of Representing Visual Data
9.2.2 Techniques Used for Visual Data Representation
9.2.3 Types of Data Visualisation
9.2.4 Applications of Data Visualisation
Self Assessment Questions
Activity
9.3 Importance of Big Data Visualisation
9.3.1 Deriving Business Solutions
9.3.2 Turning Data into Information
Self Assessment Questions
Activity
9.4 Tools Used in Data Visualisation
9.4.1 Open-Source Data Visualisation Tools
9.4.2 Analytical Techniques Used in Big Data Visualisation
Self Assessment Questions
Activity
9.5 Summary
9.6 Descriptive Questions
9.7 Answers and Hints
9.8 Suggested Readings & References
INTRODUCTORY CASELET
LEARNING OBJECTIVES
INTRODUCTION
In the previous chapter, you learned about prescriptive analytics. It
is the final phase of business analytics, which uses the fundamentals
of mathematical and computational sciences to provide different
decision options that take advantage of the results of descriptive
and predictive analytics.
Data visualisation is a pictorial or visual representation of data
with the help of visual aids such as graphs, bar charts, histograms,
tables, pie charts, mind maps, etc. Depending on the complexity of the
data and the aspects from which it is analysed, visuals can vary in
their dimensions (one-, two-, or multi-dimensional) or types, such as
temporal, hierarchical, network, etc. All these visuals are used for
presenting different types of datasets. Many tools are available in
the market for visualising data. But what is the use of data
visualisation in Big Data? Is it necessary? To answer these
questions, we need to understand the real meaning of visualisation in
the context of Big Data analytics.
This chapter familiarises you with the concept of data visualisation
and the need to visualise data in Big Data analytics. You will also
learn about different types of data visualisation. Next, you will
learn about various tools that can be used to present data or
information in a visual format.
Visual representations of data can be classified on the basis of the
following three criteria, irrespective of whether they are presented
as data visualisations or infographics:
□ Method of creation: The type of content used while creating a
graphical representation.
□ Quantity of data displayed: The amount of data that is
represented.
□ Degree of creativity applied: The extent to which the data is
designed graphically, for example, whether it is presented in a
colorful way or in black-and-white diagrams.
On the basis of the above evaluation, we can understand which form of
representation is correct for a given data type. Let's discuss the
various content types:
□ Graph: A representation in which X and Y axes are used to
depict the meaning of the information
□ Diagram: A two-dimensional representation of information to
show how something works
□ Timeline: A representation of important events in a sequence
with the help of self-explanatory visual material
□ Template: A layout design for presenting information
□ Checklist: A list of items for comparison and verification
□ Flowchart: A representation of instructions which shows how
something works or a step-by-step procedure to perform a task
□ Mind Map: A type of diagram which is used to visually
organise information
Figure 9.5: Streamlines
□ Map: A visual representation of locations within a specific
area, depicted on a planar surface. Figure 9.6 shows an instance
of Google Maps:
You already know that data can be visualised in many ways, such as
in the form of 1D, 2D, or 3D structures. Table 9.1 briefly describes
the different types of data visualisation:
TABLE 9.1: DATA VISUALISATION TYPES
Name | Description | Tool
1D/Linear | A list of items organised in a predefined manner | Generally, no tool is used for 1D visualisation
2D/Planar | Choropleth, cartogram, dot distribution map, and proportional symbol map | GeoCommons, Google Fusion Tables, Google Maps API, Polymaps, Many Eyes, Google Charts, and Tableau Public
3D/Volumetric | 3D computer models, surface rendering, volume rendering, and computer simulations | AC3D, AutoQ3D, and TrueSpace
Temporal | Timeline, time series, Gantt chart, Sankey diagram, alluvial diagram, and connected scatter plot | TimeFlow, Timeline JS, Excel, Timeplot, TimeSearcher, Google Charts, Tableau Public, and Google Fusion Tables
Multidimensional | Pie chart, histogram, tag cloud, bubble cloud, bar chart, scatter plot, heat map, etc. | Many Eyes, Google Charts, Tableau Public, and Google Fusion Tables
Tree/Hierarchical | Dendrogram, radial tree, hyperbolic tree, and wedge stack graph | d3, Google Charts, and Network Workbench/Sci2
Network | Matrix, node link diagram, hive plot, and tube map | Pajek, Gephi, NodeXL, VOSviewer, UCINET, GUESS, Network Workbench/Sci2, sigma.js, d3/Protovis, Many Eyes, and Google Fusion Tables
ACTIVITY
Search for and list the symbols used in a flowchart. Also, using these
symbols, create a flowchart that represents a sequence of
instructions for resolving a problem.
The most exciting part of any analytical study is finding useful
information in a plethora of data. Visualisation facilitates the
identification of patterns in the form of graphs or charts, which in
turn helps to derive useful information. Data reduction and
abstraction are generally applied during data mining to obtain
valuable information.
Visual data mining works on the same principle as simple data mining;
however, it integrates information visualisation and human-computer
interaction. Visualising large datasets often produces cluttered
images, which are filtered with the help of clutter-reduction
techniques. Uniform sampling and dimension reduction are two commonly
used clutter-reduction techniques.
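The two clutter-reduction techniques named above can be sketched in base R: uniform sampling keeps a random subset of rows in which every row is equally likely to be chosen, and dimension reduction (here principal component analysis via prcomp()) projects the sample onto two axes that can be plotted. The dataset is simulated and hypothetical, and PCA is only one of several possible dimension-reduction methods.

```r
set.seed(42)
# Hypothetical dataset: 10,000 rows, three strongly correlated features plus two noise features
n <- 10000
base <- rnorm(n)
points <- data.frame(
  x1 = base + rnorm(n, sd = 0.1),
  x2 = 2 * base + rnorm(n, sd = 0.1),
  x3 = -base + rnorm(n, sd = 0.1),
  x4 = rnorm(n),
  x5 = rnorm(n)
)

# Uniform sampling: every row has the same chance of being kept (10% here)
sampled <- points[sample(nrow(points), size = n / 10), ]

# Dimension reduction: principal components of the standardised sample
pca <- prcomp(sampled, center = TRUE, scale. = TRUE)
reduced <- pca$x[, 1:2]  # coordinates on the first two components

# Share of total variance that the two plotted components retain
retained <- sum(pca$sdev[1:2]^2) / sum(pca$sdev^2)
cat(sprintf("Rows: %d -> %d; variance retained by PC1 and PC2: %.0f%%\n",
            nrow(points), nrow(sampled), 100 * retained))
# plot(reduced, pch = ".") would draw the decluttered view
```

Sampling cuts the number of marks drawn, while the projection cuts the number of axes a viewer must compare.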
The visual data reduction process involves automated data analysis to
measure density, outliers, and their differences. These measures are
then used as quality metrics to evaluate the data-reduction activity.
Visual quality metrics can be categorised as:
□ Size metrics (e.g. number of data points)
□ Visual effectiveness metrics (e.g. data density, collisions)
□ Feature preservation metrics (e.g. discovering and preserving
data density differences)
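As an illustration only, a size metric and two simple visual effectiveness metrics can be computed by binning points into a grid of screen cells. Here a collision is assumed to be any point that falls into a cell another point already occupies, and density is the average number of points per occupied cell; these definitions are simplifying assumptions, not a standard formula.

```r
collision_metrics <- function(x, y, grid = 100, xlim = range(x), ylim = range(y)) {
  # Split the plotting area into grid x grid equal cells ("pixels")
  xb <- seq(xlim[1], xlim[2], length.out = grid + 1)
  yb <- seq(ylim[1], ylim[2], length.out = grid + 1)
  cx <- cut(x, breaks = xb, include.lowest = TRUE, labels = FALSE)
  cy <- cut(y, breaks = yb, include.lowest = TRUE, labels = FALSE)
  occupied <- length(unique(paste(cx, cy)))
  list(
    points     = length(x),             # size metric
    occupied   = occupied,              # cells holding at least one point
    collisions = length(x) - occupied,  # points drawn over an occupied cell
    density    = length(x) / occupied   # average points per occupied cell
  )
}

set.seed(1)
x <- rnorm(20000)
y <- rnorm(20000)
full <- collision_metrics(x, y)
# Evaluate a 10% uniform sample on the same grid as the full data
keep <- sample(length(x), 2000)
sampled_metrics <- collision_metrics(x[keep], y[keep], xlim = range(x), ylim = range(y))
str(full)
str(sampled_metrics)
```

Comparing the two lists shows how much overplotting the sampling step removes; a feature preservation metric would additionally check that dense and sparse regions keep their relative differences.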
ACTIVITY
Prepare a report on the Big Data visualisation tools that
organisations widely use today.
Figure 9.17: Charts Obtained from Google Charts API
ACTIVITY
Collect information about the pivot table used in Excel for
representing data.
SUMMARY
□ Visualisation is a pictorial or visual representation technique.
□ Anything represented in pictorial or graphical form, with the help
of diagrams, charts, pictures, flowcharts, etc., is known as
visualisation.
□ Data presented in the form of graphics can be analysed better
than data presented in words.
□ Infographics are visual representations of information or data.
□ The data visualisation approach is different from infographics.
It is the study of representing data or information in a visual form.
□ Data can be presented in various visual forms, which include simple
line diagrams, bar graphs, tables, matrices, etc.
□ The multimedia and entertainment industry uses visuals to
communicate ideas and information.
□ The data generated by social media interaction is interpreted
using visual analytics techniques.
□ Apart from the type of data, the volume and speed with which
data is generated pose a great challenge.
□ Because of heterogeneous data sources, data streaming, and
real-time data, Big Data is difficult to handle with traditional
tools.
□ The visual data reduction process involves automated data analysis
to measure density, outliers, and their differences.
KEYWORDS
□ Graph: It is a representation in which X and Y axes are used
to depict the meaning of the information.
□ Diagram: It is a two-dimensional representation of
information to show how something works.
□ Timeline: It is a representation of important events in a
sequence with the help of self-explanatory visual material.
□ Flowchart: It is a representation of instructions which
shows how something works or a step-by-step procedure to
perform a task.
□ Isosurfaces: These are designed to represent points that are
bound by a constant value in a volume of space.
2. True
3. digitally
4. False
5. c. Flowchart
6. Direct Volume Rendering
7. Venn
16. True
17. Flickr
CONTENTS
10.1 Introduction
10.2 Financial and Fraud Analytics
Self Assessment Questions
Activity
10.3 HR Analytics
Self Assessment Questions
Activity
10.4 Marketing Analytics
Self Assessment Questions
Activity
10.5 Healthcare Analytics
Self Assessment Questions
Activity
10.6 Supply Chain Analytics
Self Assessment Questions
Activity
10.7 Web Analytics
Self Assessment Questions
Activity
10.8 Sports Analytics
Self Assessment Questions
Activity
10.9 Analytics for Government and NGOs
Self Assessment Questions
Activity
10.10 Summary
10.11 Descriptive Questions
10.12 Answers and Hints
10.13 Suggested Readings & References
INTRODUCTORY CASELET
LEARNING OBJECTIVES
INTRODUCTION
Business analytics has emerged as a growth driver for most new-era
organisations. Gone are the days when managers made decisions based
on gut feeling or on large-scale financial indicators and their
likely effect on individual organisations. Decisions made without
data and information have proved costly for many organisations. With
advances in data technology and the increased data-handling capacity
of computers, managers now use numerous methods to forecast business
outcomes and improve the profitability of the enterprise. Applying
descriptive and predictive analytics, customer relationship
management tools, and other process-improvement tools benefits the
organisation. The entire business world now sees Big Data as an
opportunity and a source of competitive advantage.
This chapter first discusses financial and fraud analytics. Next, the
chapter explains HR analytics, marketing analytics, and healthcare
analytics. The chapter also explains supply chain analytics and Web
analytics. Towards the end, the chapter discusses sports analytics and
how analytics is used by the government and NGOs to provide
various beneficial services to people.
ACTIVITY
HR ANALYTICS
Human Resource (HR) analytics, also called talent analytics, is the
application of sophisticated data mining and business analytics (BA)
techniques to HR data. HR analytics is an area of analytics that
refers to applying analytic processes to the human resource
department of a company in the expectation of
enhancing
ACTIVITY
MARKETING ANALYTICS
Every organisation strives to gain an edge over its competitors. This
is possible if an organisation develops an effective industry-level
strategy. For this, an organisation needs to analyse various forces,
such as the level of competition in the market, the entry of new
organisations, the availability of substitute products, etc.
Organisations use marketing analytics for this purpose.
You need to follow the three steps below to get the benefits of
marketing analytics:
1. Practise a balanced collection of analytic methods
To get the best results from marketing analytics, you need a
balanced analytic evaluation, that is, one that combines
methods for:
♦ Covering the past: Using marketing analytics to research
the past. You can answer questions such as: Which campaign
component generated the most revenue last quarter?
♦ Exploring the present: Marketing analytics enables you to
determine how your marketing activities are performing right
now by asking questions such as: How are customers engaging
with us? Which channels do our most profitable customers use?
How are people on different social networks reacting to the
company's image?
♦ Predicting and influencing the future: Marketing analytics
can be used to deliver data-driven predictions that shape the
future by asking questions such as: How can we turn short-term
wins into loyalty and continuous engagement? How many more
sales representatives should we add to meet expectations?
Which cities should we focus on next, given our present
situation?
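A question about the past, such as which campaign component generated the most revenue last quarter, reduces to a grouped sum. The campaign data below is hypothetical; in practice it would come from the organisation's campaign and sales records.

```r
# Hypothetical campaign results for last quarter
campaign <- data.frame(
  component = c("email", "search ads", "social", "email", "social", "search ads"),
  month     = c("Jan", "Jan", "Feb", "Feb", "Mar", "Mar"),
  revenue   = c(12000, 18000, 9000, 15000, 11000, 21000),
  stringsAsFactors = FALSE
)

# Covering the past: total revenue per campaign component
revenue_by_component <- aggregate(revenue ~ component, data = campaign, FUN = sum)

# Sort so the top-earning component comes first
revenue_by_component <- revenue_by_component[order(revenue_by_component$revenue,
                                                   decreasing = TRUE), ]
top_component <- revenue_by_component$component[1]
print(revenue_by_component)
```

The present and future questions build on the same data but add current activity feeds and predictive models.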
2. Evaluate your analytical capabilities and fill in the gaps
Marketing organisations have access to many analytic capabilities
that support different marketing goals. Assessing your present
analytic capabilities is necessary to attain these goals. It is
important to know where you currently stand on the analytic
spectrum, so that you can identify gaps and create a strategy for
filling them.
Consider an example in which a marketing organisation already
gathers data from sources such as the Internet and point-of-sale
(POS) transactions but gives little importance to the unstructured
information coming from social media platforms. Such unstructured
sources are very useful, and the technology for transforming
unstructured data into actionable insights is available to marketers
today. The organisation can plan and allocate a budget for adding
these analytic capabilities to fill that particular gap.
3. Take action based on analytical findings
The information produced by marketing analytics is not useful until you act on it. In the continuous process of testing and learning, marketing analytics
ume-to value-based healthcare. Now more than ever, analytics is crucial for clinicians and health service providers, so that they can identify and address gaps in care, quality and risk, and use it to support improvements in clinical and quality outcomes and in financial performance.
ACTIVITY
Study how healthcare analytics has helped improve the care delivered at a hospital near you.
for their products as soon as the products are announced in the market. Most Apple products are manufactured in China; therefore, Apple needs a highly efficient supply chain to ship items from China to different countries around the world.
ACTIVITY
Prepare a report on how the use of business analytics tools in the supply chain has helped improve production in the manufacturing industry.
■ WEB ANALYTICS
Web analytics refers to the measurement, collection, analysis and reporting of Web data to understand and optimise Web usage. However, Web analytics is not restricted to measuring Web traffic; it can also be used as a method of performing business and market research.
There are two main technical methods of gathering the data. The first method, server log file analysis, reads the log files in which the Web server records the file requests sent by browsers. The second method, known as page tagging, uses JavaScript embedded in the Web page to track each page view. Both methods gather data that can be processed to generate Web traffic reports. The second method generally provides more accurate results than the first.
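To make the first method concrete, the Python sketch below counts successful page requests in server log lines. It assumes the widely used Common Log Format (host, identity, user, timestamp, request, status, size); real servers may be configured with other formats, and the sample lines are invented.

```python
import re
from collections import Counter

# Common Log Format: host ident authuser [date] "METHOD path protocol" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\S+)'
)

def page_views(log_lines):
    """Count successful (2xx) GET requests per path."""
    counts = Counter()
    for line in log_lines:
        match = CLF_PATTERN.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        if match["method"] == "GET" and match["status"].startswith("2"):
            counts[match["path"]] += 1
    return counts

# Invented sample log lines
sample = [
    '10.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.2 - - [10/Oct/2023:13:56:01 +0000] "GET /products.html HTTP/1.1" 200 512',
    '10.0.0.1 - - [10/Oct/2023:13:57:12 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.3 - - [10/Oct/2023:13:58:40 +0000] "GET /missing.html HTTP/1.1" 404 0',
]
print(page_views(sample).most_common())
```

The failed request for `/missing.html` is excluded because its status code is 404, not 2xx.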
could mean that visitors were unable to find what they were searching for on the site.
□ Identify exit pages: An exit occurs when a visitor, after viewing one or more pages, leaves the site from a particular page. Some pages naturally have a high exit rate, such as the thank-you page that an e-commerce website shows after a successful purchase. On other pages, however, a high exit rate suggests that the page has an issue that should be investigated promptly. Such pages should be examined to determine whether visitors are failing to find the information for which they visited the website. Web analytics tools help in finding such pages quickly so that their problems can be fixed.
□ Identify target market: It is essential for advertisers to understand their visitors and deliver information according to their requirements. Analytics findings reveal current market demand, which often varies by geographic area. By using Web analytics, marketers can track the volume and geographical distribution of visitors and offer products according to visitors' interests.
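Exit rate is commonly calculated as the number of exits from a page divided by the number of times that page was viewed. The Python sketch below applies that definition to hypothetical visitor sessions, treating the last page of each session as its exit page; commercial Web analytics tools may define sessions and exits differently.

```python
from collections import Counter

def exit_rates(sessions):
    """Exit rate per page: exits from the page / views of the page.

    sessions: lists of pages in the order one visitor viewed them;
    the last page of each session is counted as that session's exit.
    """
    views = Counter()
    exits = Counter()
    for pages in sessions:
        if not pages:
            continue  # ignore empty sessions
        views.update(pages)
        exits[pages[-1]] += 1
    return {page: exits[page] / views[page] for page in views}

# Hypothetical sessions
sessions = [
    ["/home", "/product", "/cart", "/thank-you"],
    ["/home", "/product"],
    ["/home", "/product", "/cart", "/thank-you"],
    ["/home", "/search"],
]
for page, rate in sorted(exit_rates(sessions).items()):
    print(f"{page}: {rate:.0%}")
```

In this sample, the thank-you page and the search page both have a 100% exit rate, but only the search page is a warning sign, because the thank-you page is expected to end a visit.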
ACTIVITY
Visit a Web hosting company and try to learn how Web analytics can help the company monitor the activity on the websites hosted on its servers.
■ SPORTS ANALYTICS
Sports analytics is the technique of analysing relevant and historical information in the field of sports, mainly to gain an edge over other teams or individuals. The information gathered is analysed by coaches, players and other staff members for decision making both before and during sporting events. With rapid advances in technology over the past few years, data collection has become more precise and relatively easier than before. These advances in data collection have also contributed to the growth of sports analytics, which relies entirely on the collected pool of data. The growth in analytics has in turn led to the building of technologies such as fitness trackers, game simulators, etc. Fitness trackers are smart devices that provide data about the fitness of players, on the basis of which coaches can decide whether to include particular players in the team. Game simulators help players practise the game before the actual sporting event takes place.
Sports analytics not only changes the way a game is played but also changes the way players' performance is recorded. National Basketball Association (NBA) teams now use player tracking technology that can evaluate the efficiency of a team by analysing the movement of its players. According to the SportVU software website, NBA teams have installed six cameras that track the movements of each player on the court and of the basketball 25 times per second. The data collected by the cameras provides a significant number of innovative statistics based on speed, player separation and ball possession: for example, how fast a player moved, how much distance he covered during the game, how many times he passed the ball, and much more. On the basis of the data collected, strategies are created to win the game or to improve performance in the game.
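Statistics such as distance covered and speed can be derived from the raw positions captured by a tracking system. The Python sketch below assumes, for illustration, that a player's position is available as (x, y) court coordinates in metres sampled 25 times per second, as described above; the sample track is hypothetical, and SportVU's actual data format is not shown here.

```python
import math

FRAME_RATE_HZ = 25  # position samples per second, as described for the camera system

def distance_and_top_speed(positions):
    """Return (total distance, top speed) for one player's track.

    positions: list of (x, y) court coordinates in metres, one per frame
    (an assumed format). Speed between frames = step length * frame rate.
    """
    total = 0.0
    top_speed = 0.0
    for (x1, y1), (x2, y2) in zip(positions, positions[1:]):
        step = math.hypot(x2 - x1, y2 - y1)  # straight-line distance between frames
        total += step
        top_speed = max(top_speed, step * FRAME_RATE_HZ)
    return total, top_speed

# Hypothetical track: a player moving 0.2 m per frame along the x-axis for one second
track = [(0.2 * i, 0.0) for i in range(26)]
distance, speed = distance_and_top_speed(track)
print(f"distance = {distance:.1f} m, top speed = {speed:.1f} m/s")
```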
Sports analytics has also found application in the field of sports gambling. The availability of more accurate information about teams and players on websites has taken sports gambling to new levels. Analytics information helps gamblers make better decisions and predict the outcomes of games or the performance of a particular player more accurately. In addition to websites, a number of companies provide minute details about players or teams to gamblers to fulfil their betting requirements. Sports gambling contributed 13% of the global gambling industry, which was valued at somewhere between $700 billion and $1,000 billion. Some of the popular websites that provide betting services to users are bet365, bwin, Paddy Power, betfair and Unibet.
ACTIVITY
Discuss with your friends how analytics can be used in sports to improve players' fitness and energy levels while protecting them from injuries.
Big data analytics is used in almost every part of the world to derive useful information from huge data sets. Not only private organisations and industries but also many government enterprises are adopting data analytics to make smart decisions for the benefit of their citizens. A lot of data is generated in the government sector, and processing and analysing this data helps the government improve its policies and services for citizens. Some benefits of data analytics in the government sector are as follows:
□ With the rise of national threats and criminal activities, it is important for any government to ensure the safety and security of its citizens. With the help of data analytics, intelligence organisations can detect crime-prone areas and be prepared to prevent or stop criminal activity.
□ Analytics also helps in detecting possible cyber attacks, identifying the attackers and recognising their patterns of attack. The government can therefore take appropriate action in advance to protect people from financial loss.
□ Government can use analytics to track and monitor the health of its citizens and to track disease patterns. The government can then set up proper healthcare facilities in advance in disease-prone areas. Analytics also helps in arranging and managing free medicines, vaccinations, etc., in order to save lives.
□ Real-time analysis and sensors help government departments manage water in the city. Officials can detect issues in the flow of water and pollution levels in water, predict water scarcity on the basis of usage, detect areas of leakage, etc. Government departments can then take proper action to resolve these issues and ensure a supply of clean water in the city.
□ Government organisations also use analytics to detect tax fraud and predict revenue. The government can then take the necessary steps to prevent tax fraud and increase revenue.
□ Government can also use analytics in agriculture to determine the appropriate time for cultivating crops, the fertilisers required for crops, etc. Moreover, the government can take prior action to prevent crop damage in case of various environmental challenges.
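One simple way to detect unusual sensor readings, such as a possible leak in a pipe segment, is to flag values that lie far from the average. The Python sketch below uses a z-score rule (a reading is flagged if it is more than two standard deviations from the mean); the flow readings and the threshold are hypothetical, and real water management systems typically use more robust methods.

```python
from statistics import mean, stdev

def flag_anomalies(readings, threshold=2.0):
    """Return indices of readings more than `threshold` standard deviations
    from the mean of the series (a simple z-score rule)."""
    mu = mean(readings)
    sigma = stdev(readings)
    if sigma == 0:
        return []  # all readings identical: nothing stands out
    return [i for i, value in enumerate(readings) if abs(value - mu) / sigma > threshold]

# Hypothetical hourly flow readings (litres per second) for one pipe segment
flow = [12.1, 11.8, 12.4, 12.0, 11.9, 12.2, 19.5, 12.1, 12.0, 11.7]
print(flag_anomalies(flow))  # indices of hours worth investigating
```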
Besides Akshaya Patra, several other large NGOs, such as the Bill and Melinda Gates Foundation India, Save the Children India, and Child Rights and You (CRY), are also using data to improve their efficiency in raising and allocating funds, predicting trends and planning campaigns.
These NGOs often face difficulties with data collection because they use traditional methods of collecting data. To overcome these challenges, NGOs have provided mobile phones equipped with apps so that data can be collected and recorded in real time. Data recorded in this manner is more accurate and gives more precise information on the basis of which further decisions or action plans can be made.
ACTIVITY
Visit a nearby NGO and try to find out how analytics has helped it improve its services and focus more on the overall development of the people or area.
■ SUMMARY
□ Business analytics has expanded consistently over the past decade, as confirmed by the steadily growing business analytics software market.
□ Fraud impacts organisations in several ways, which may be related to financial, operational and psychological processes.
□ Many organisations remain vulnerable to fraud and financial crime because they are not exploiting new capabilities to combat today's threats.
□ Organisations generally turn to HR analytics and data-led solutions when there are problems that cannot be resolved with current management practices.
□ Marketing analytics helps in providing deeper insights into customer preferences and trends. Despite its various benefits, a majority of organisations have failed to realise the benefits of marketing analytics.
□ Healthcare organisations are also implementing approaches such as Lean and Six Sigma to take a more patient-centred focus, reduce errors and waste, and improve the flow of patients, with the objective of enhancing quality.
□ Organisations that operate in a highly competitive global environment need to have a highly effective supply chain management system in place.
■ KEYWORDS
□ Capacity analytics: It helps in tracking the number of people
who are operationally efficient and currently in business.
□ Employee churn analytics: It refers to the process of estimating
your staff turnover rates for predicting the future and reducing
employee churn.
□ Employee performance analytics: It is used in assessing the
performance of an individual employee.
□ Fraud analytics: It is used to detect whether a financial activity is fraudulent, in order to prevent financial loss.
□ Marketing analytics: It helps in providing deep insight into customer preferences and trends.
■ DESCRIPTIVE QUESTIONS
1. Discuss the importance of financial and fraud analytics for an
organisation.
2. Describe the role of HR analytics in an organisation.
3. What do you understand by marketing analytics? Discuss the steps in getting the best results from marketing analytics.
4. How is healthcare analytics useful in the medical field? Explain with suitable examples.
5. Why is analytics required in the supply chain? Discuss with suitable reasons.
6. What is Web analytics? List the steps involved in the Web analytics process.
7. Describe the importance of analytics in the field of sports.
8. Discuss the need for analytics for government and NGOs.
Topic                    Q. No.   Answer
                         2.       Advanced analytics
                         3.       True
HR Analytics             4.       Talent
                         5.       True
                         6.       Capacity
Marketing Analytics      7.       a. Search Engine Optimisation
                         8.       True
                         9.       False
Healthcare Analytics     10.      b. Electronic Medical Records
                         11.      True
                         12.      Real-time
Supply Chain Analytics   13.      True
                         14.      Advanced
Case Study 1 How CISCO IT uses Big Data Platform to Transform Data Management
Case Study 2 USDA used Data Mining to know the Patterns of Loan Defaulters
Case Study 3 Cincinnati Zoo used Business Analytics for Improving Performance
Case Study 4 Application of Business Analytics in Resource Management
Case Study 5 Role of Descriptive Analytics in the Healthcare Sector
Case Study 6 An Application of Predictive Analytics in Underwriting
Case Study 7 Unicredit Bank Applies Prescriptive Analytics for Risk Management
Case Study 8 Campaign Success of Mediacom
Case Study 9 Dundas BI Solution Helped Medidata and its Clients in Getting Better Data Visualisation
Case Study 10 Sports Analytics Helped in the Enrichment of Performance of Players
Case Study 11 Fraud Analytics Solution Helped in Saving the Wealth of Companies
Case Study 12 Big Data Analytics Allowing Users to Visualise the Future of Free Online Classifieds
CASE STUDY 1
BACKGROUND
Cisco is one of the world's leading networking organisations and has transformed the way people connect, communicate and collaborate. Cisco IT has 38 global data centres that together comprise 334,000 square feet of space.
CHALLENGE
The company had to manage large datasets of information about customers, products and network activities, which make up the company's business intelligence. In addition, there was a large quantity of unstructured data, amounting to terabytes, in the form of Web logs, videos, emails, documents and images. To handle such a huge amount of data, the company decided to adopt Hadoop, an open-source software framework that supports distributed storage and processing of big datasets.
According to Piyush Bhargava, a distinguished engineer at Cisco IT who handles big data programs, "Hadoop behaves like an affordable supercomputing platform." He also says, "It moves compute to where the data is stored, which mitigates the disk I/O bottleneck and provides almost linear scalability. Hadoop would enable us to consolidate the islands of data scattered throughout the enterprise."
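The idea of moving compute to where the data is stored is usually explained through the MapReduce model that Hadoop implements. The sketch below is a minimal, single-process Python simulation of that model (a word count over log lines), not Hadoop's Java API; the function names and sample splits are hypothetical. In a real cluster, each mapper would run on the node holding its data block, and the framework would perform the shuffle across the network.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one line of text."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)

# Each string stands in for a data split stored on a different node
splits = ["error disk full", "login ok", "error timeout error"]

intermediate = chain.from_iterable(map_phase(split) for split in splits)
result = dict(reduce_phase(key, values) for key, values in shuffle(intermediate).items())
print(result)
```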
To implement the Hadoop platform for providing big data analytics services to Cisco business teams, Cisco IT first had to design and implement an enterprise platform that could support appropriate service level agreements (SLAs) for availability and performance. Piyush Bhargava says, "Our challenge was adapting the open source Hadoop platform for the enterprise."
The company's technical requirements for implementing the big data architecture were to:
□ use open source components to establish the architecture
□ uncover the hidden business value of large datasets, whether the data is structured or unstructured
SOLUTION
Cisco IT developed a Hadoop platform using Cisco® UCS Common
Platform Architecture (CPA) for Big Data.
According to Jag Kahlon, a Cisco IT architect, "Cisco UCS CPA for Big Data provides the capabilities we need to use big data analytics for business advantage, including high performance, scalability, and ease of management."
For computation, the building block of the Cisco IT Hadoop Platform is the Cisco UCS C240 M3 Rack Server, powered by Intel Xeon E5-2600 series processors, with 256 GB of RAM and 24 TB of local storage.
Virendra Singh, a Cisco IT architect, says, "Cisco UCS C-Series Servers provide high-performance access to local storage, the biggest factor in Hadoop performance."
The present architecture contains four racks of servers, with 16 server nodes per rack providing 384 TB of raw storage per rack. Kahlon says, "This configuration can scale to 160 servers in a single management domain supporting 3.8 petabytes of raw storage capacity."
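The storage figures quoted above follow from simple multiplication: 16 servers × 24 TB = 384 TB per rack, and 160 servers × 24 TB = 3,840 TB, or roughly 3.8 PB. The short Python sketch below reproduces this arithmetic. The usable-capacity function is an added assumption for illustration: it supposes that every block is stored three times (a common Hadoop default), which the case study itself does not state.

```python
TB_PER_SERVER = 24      # local storage per server, from the case study
SERVERS_PER_RACK = 16   # server nodes per rack, from the case study

def raw_capacity_tb(servers, tb_per_server=TB_PER_SERVER):
    """Raw storage in terabytes before any replication."""
    return servers * tb_per_server

def usable_capacity_tb(raw_tb, replication=3):
    """Approximate space for unique data if every block is stored `replication` times.

    The factor of 3 is an assumption for illustration, not a figure from the case study.
    """
    return raw_tb / replication

per_rack = raw_capacity_tb(SERVERS_PER_RACK)      # one rack
current = raw_capacity_tb(4 * SERVERS_PER_RACK)   # the present four racks
max_domain = raw_capacity_tb(160)                 # largest single management domain
print(per_rack, current, max_domain, usable_capacity_tb(max_domain))
```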
Cisco IT server administrators can manage all elements of Cisco UCS, including servers, storage access, networking and virtualisation, from a single Cisco UCS Manager interface. Kahlon declares, "Cisco UCS Manager significantly simplifies management of our Hadoop platform. UCS Manager will help us manage larger clusters as our platform grows without increasing staffing."
Cisco IT uses the MapR Distribution for Apache Hadoop, which uses code written in C++ rather than Java. Virendra Singh says, "Hadoop complements rather than replaces Cisco IT's traditional data-processing tools, such as Oracle and Teradata. Its unique value is to process unstructured data and very large data sets far more quickly and at far less cost."
The Hadoop Distributed File System (HDFS) pools the storage on all Cisco UCS C240 M3 servers in the cluster into one large logical unit. HDFS then splits the data into smaller chunks for further processing and for performing ETL (Extract, Transform and Load) operations.
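A rough picture of this chunking, and of the multiple copies mentioned in the quotation that follows, is given by the Python sketch below. It is a simplified model built on assumptions, not HDFS code: it uses a 128 MB block size and a replication factor of 3, which are common HDFS defaults rather than figures from the case study, and it places replicas round-robin across hypothetical nodes, whereas real HDFS placement is rack-aware.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB: a common HDFS default (assumption)
REPLICATION = 3                 # copies per block: a common HDFS default (assumption)

def plan_blocks(file_size, nodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Split a file into fixed-size blocks and choose nodes for each block's copies.

    Replicas go to distinct nodes in round-robin order; real HDFS placement
    is rack-aware, so this is only a simplified model.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    num_blocks = -(-file_size // block_size)  # ceiling division
    plan = []
    for block in range(num_blocks):
        replicas = [nodes[(block + r) % len(nodes)] for r in range(replication)]
        plan.append((block, replicas))
    return plan

nodes = ["node1", "node2", "node3", "node4"]  # hypothetical cluster nodes
plan = plan_blocks(300 * 1024 * 1024, nodes)   # a 300 MB file
for block, replicas in plan:
    print(block, replicas)
```

A 300 MB file yields three blocks (two full 128 MB blocks and one partial block), each stored on three different nodes, so the loss of any single node still leaves at least two copies of every block.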
Hari Shankar, a Cisco IT architect, says, "Processing can continue
even if a node fails because Hadoop makes multiple copies of every
RESULTS
The main result of Cisco IT's big data transformation is that the company has introduced multiple big data analytics programs based on the Cisco® UCS Common Platform Architecture (CPA) for Big Data.
Revenue from partner sales has increased. The company has started the Cisco Partner Annuity Initiative program, which is now in production. Piyush says, "With our Hadoop architecture, analysis of partner sales opportunities completes in approximately one-tenth the time it did on our traditional data analysis architecture, and at one-tenth the cost."
The company's productivity has increased because intellectual capital is now easier to find. Earlier, many knowledge workers at Cisco spent a lot of time each day searching for content on websites, as most of the content was not tagged with relevant keywords. Now, Cisco IT has replaced the static, manual tagging process with dynamic tagging based on user feedback. This process uses machine-learning techniques to examine users' usage patterns and also acts on users' suggestions for new search tags.
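The case study does not describe Cisco's tagging algorithm. As a hedged illustration of the general idea of feedback-driven tagging, the Python sketch below uses a simple frequency rule rather than machine learning: a search term becomes a tag for a document once it has led users to click that document a minimum number of times. The click log, document IDs and threshold are hypothetical.

```python
from collections import Counter, defaultdict

def suggest_tags(click_log, min_count=2):
    """Propose tags for each document from the search terms that led users to it.

    click_log: iterable of (search_query, clicked_document_id) pairs.
    A term becomes a tag once it has led to at least `min_count` clicks
    on the same document (a frequency rule, not a learned model).
    """
    term_counts = defaultdict(Counter)
    for query, doc_id in click_log:
        for term in set(query.lower().split()):  # count each term once per query
            term_counts[doc_id][term] += 1
    return {
        doc_id: sorted(term for term, n in counts.items() if n >= min_count)
        for doc_id, counts in term_counts.items()
    }

# Hypothetical search-and-click log
clicks = [
    ("webex recording", "doc-17"),
    ("download webex recording", "doc-17"),
    ("telepresence setup", "doc-42"),
    ("webex", "doc-17"),
]
print(suggest_tags(clicks))
```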
Moreover, the Hadoop platform analyses log data from collaboration tools, such as Cisco Unified Communications, email, Cisco TelePresence®, Cisco WebEx®, Cisco WebEx Social, and Cisco
LESSONS LEARNED
Cisco IT has shared the following observations with other organisations:
□ Hive is good for structured data processing, but provides limited SQL support.
□ Sqoop easily moves large amounts of data into Hadoop.
□ Network File System (NFS) saves time and effort in managing a large amount of data.
□ Cisco TES simplifies the job-scheduling and orchestration process.
□ A library of user-defined functions (UDFs) provided by Hive and Pig increases developer productivity.
□ Internal users gain more knowledge because they can now analyse unstructured data from email, webpages, documents, etc., in addition to data stored in databases.
QUESTIONS
1. What were the challenges faced by Cisco?
(Hint: Open source components, service-level
agreements (SLAs) to the internal customers, etc.)
2. What are the lessons learned by Cisco?
(Hint: Hive is good for structured data processing, Cisco TES simplifies the job-scheduling and orchestration process, Network File System (NFS) saves time and effort in managing a large amount of data, etc.)
CASE STUDY 2
QUESTIONS
1. What were the motives behind setting up the USDA
Rural Development?
(Hint: Welfare of rural areas of America, etc.)
2. How could the data mining technique help the USDA?
(Hint: To determine problems in the already granted
loans, etc.)
CASE STUDY 3
BACKGROUND
Opened in 1875, the Cincinnati Zoo & Botanical Garden is a world-famous zoo located in Cincinnati, Ohio, US. It has more than 1.3 million visitors every year.
CHALLENGE
In late 2007, the zoo's management began a strategic planning process to increase the number of visitors by enhancing their experience, with the aim of generating more revenue. To this end, the management decided to increase sales at the zoo's food and retail outlets by improving their marketing and promotional strategies.
According to John Lucas, the Director of Operations at Cincinnati Zoo & Botanical Garden, "Almost immediately, we realised we had a story being told to us in the form of internal and customer data, but we didn't have a lens through which to view it in a way that would allow us to make meaningful changes."
Lucas and his team members were interested in finding business analytics solutions to meet the zoo's needs. He said, "At the start, we had never heard the terms 'business intelligence' or 'business analytics'; it was just an abstract idea. We more or less stumbled onto it."
They looked at various providers but did not initially include IBM, under the false assumption that they could not afford IBM. Then someone pointed out that it was completely free to talk to IBM. They found that IBM not only suggested a solution that fit their budget, but that it was the most appropriate solution for what they were looking for.
SOLUTION
IBM provided a business analytics solution to the zoo's executive committee that makes it possible to analyse data related to customer memberships, admissions, food sales, etc., in order to gain a better understanding of visitors' behaviour. The solution can also analyse geographic and demographic information that helps in customer segmentation and marketing.
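The kind of geographic analysis described here can be illustrated with a small Python sketch that groups admission records by ZIP code and reports how many visits used a promotion. The record layout, the ZIP codes and the promotion flag are hypothetical; the case study does not describe the data model of the IBM solution.

```python
from collections import defaultdict

def promo_share_by_zip(visits):
    """For each ZIP code, return (number of visits, share of visits that used the promotion).

    visits: iterable of (zip_code, used_promotion) tuples.
    """
    totals = defaultdict(int)
    promo = defaultdict(int)
    for zip_code, used_promotion in visits:
        totals[zip_code] += 1
        if used_promotion:
            promo[zip_code] += 1
    return {z: (totals[z], promo[z] / totals[z]) for z in totals}

# Hypothetical admission records
visits = [("45220", True), ("45220", False), ("45202", False),
          ("45202", False), ("41011", True), ("41011", True)]
for zip_code, (count, share) in sorted(promo_share_by_zip(visits).items()):
    print(zip_code, count, f"{share:.0%}")
```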
The zoo's executive committee wanted a platform capable of delivering the desired goals by combining and
OUTPUT
The result of implementing IBM's business analytics solution is that the zoo's return on investment (ROI) has increased. Lucas admits, "Over the 10 years we'd been running that promotion, we lost just under $1 million in revenue because we had no visibility into where the visitors using it were coming from."
The new business analytics solution has produced savings and revenue gains for the zoo: for example, a saving of $40,000 in marketing in the first year, an increase of 50,000 visitors in 2011, an increase in food sales of at least 25%, and an increase in retail sales of at least 7.5%.
By adopting the new operational management strategies enabled by the business analytics solution, the zoo achieved a remarkable increase in attendance and revenues, resulting in an annual ROI of 411%. Lucas admits, "Prior to this engagement, I never would have believed that an organisation of the size of the Cincinnati Zoo could reach the level of granularity its business analytics solution provides. These are Fortune 200 capabilities in my eyes."
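ROI compares the net benefit of an investment with its cost: ROI = (total benefit - cost) / cost × 100%. Under this common definition, an annual ROI of 411% means that the yearly benefit is about 5.11 times the cost. The Python sketch below applies the formula; the cost and benefit figures are hypothetical, since the case study reports only the percentage.

```python
def roi_percent(total_benefit, cost):
    """Return on investment as a percentage: (benefit - cost) / cost * 100."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (total_benefit - cost) / cost * 100

# Hypothetical figures: a yearly benefit of 511,000 on a cost of 100,000
print(round(roi_percent(total_benefit=511_000, cost=100_000)))
```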
QUESTIONS
CASE STUDY 4
METHOD
The company conducted surveys and questionnaires among the group and came up with a solution to streamline the teams within the law firm and manage them on lean principles. For the office space, the real estate company used the firm's resources to map out where employees spent most of their time. It assisted the law firm by using different location-aware mechanisms to keep track of the whereabouts of the firm's personnel, accumulating data on employee preferences and activities. As a result, the law firm decided to relocate from its high-rise office into a more affordable space chosen on the basis of its personnel's location habits. The new location was so convenient for employees that it improved employee retention, thereby saving costs for the firm. Apart from the above actions, the following questionnaires were circulated across various departments:
Questions for Management:
□ What evaluation methods should be employed to assess the
yearly performance of employees?
QUESTIONS
1. What were the initial challenges faced by the law firm?
(Hint: Office space relocation indecisiveness, employee
retention impact and resourcing issues)
2. What are the lessons learned from this case study?
(Hint: You can cite examples of the cross-functional approach deployed by the real estate team to illustrate the excellent all-round services provided by the real estate company.)
CASE STUDY 5
QUESTIONS
1. What were the challenges faced by hospitals in
emergency services?
(Hint: Overcrowding of patients, delay in providing health
services, etc.)
2. What are the advantages of descriptive analytics?
(Hint: To determine hidden patterns, better visualisation
of information, etc.)
CASE STUDY 6
QUESTIONS
1. What were the initial challenges faced by Freedom
Specialty Insurance?
(Hint: Interface bottlenecks, manual processes, various
policies developed in silos, etc.)
2. What changes did the implementation of an advanced
predictive model bring in for the company?
(Hint: Integrated processes, easier claim tracking, etc.)
CASE STUDY 7
QUESTIONS
CASE STUDY 8
This Case Study discusses how MediaCom has used Sysomos for planning and measuring data related to advertising campaigns for its clients. It relates to Chapter 8.
MediaCom is one of the world's leading media agencies and helps its clients plan and measure their advertising strategies across all media channels. The company depends greatly on Sysomos in planning and measuring the performance of its clients' campaigns.
MediaCom's main aim was to improve its business while gaining insight into the audience's response to its clients' brands and issues.
Alejandro De Luna, Social Strategy Manager at MediaCom, says, "The value Sysomos provides for us is very clear. We need to have a bedrock of insights to justify how to approach content solutions for different audiences and different platforms, and Sysomos helps us to sell in our strategies by giving us a much clearer understanding of how audiences feel about specific brands and issues."
Sysomos has enabled MediaCom to analyse online conversations, without limits on keywords or results, across a database of over 550 billion social media posts. MediaCom can now use social intelligence for planning and reporting. For example, it can analyse social media discussions about a campaign on Twitter and discussion forums to learn about consumer opinions.
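Sysomos's own methods are not described here. As a hedged illustration of how consumer opinion in posts might be summarised, the Python sketch below applies a tiny lexicon-based rule: a post counts as positive or negative depending on which hypothetical word list it matches more. Real social intelligence tools use far richer models; the word lists and sample posts are assumptions.

```python
import re

# Tiny, hypothetical opinion word lists; real tools use far larger lexicons or models
POSITIVE = {"love", "great", "amazing", "good"}
NEGATIVE = {"hate", "bad", "boring", "awful"}

def opinion_summary(posts):
    """Classify each post as positive, negative or neutral and count the results."""
    summary = {"positive": 0, "negative": 0, "neutral": 0}
    for post in posts:
        words = set(re.findall(r"[a-z']+", post.lower()))  # crude tokenisation
        score = len(words & POSITIVE) - len(words & NEGATIVE)
        if score > 0:
            summary["positive"] += 1
        elif score < 0:
            summary["negative"] += 1
        else:
            summary["neutral"] += 1
    return summary

# Hypothetical posts about a campaign
posts = [
    "Love the new ad!",
    "This campaign is boring.",
    "Saw the ad on TV last night",
    "Great idea, amazing execution",
]
print(opinion_summary(posts))
```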
Sysomos has provided MediaCom with a tool, Buzzgraph, that helps in identifying the key concepts in online conversations, while the Tweet Life tool helps in analysing how a tweet goes viral on the Internet. With the help of Sysomos, MediaCom can now easily convince its clients that its plans are based on solid facts and figures. Sysomos has enabled MediaCom to carry out complex analysis of a wide range of topics, without restrictions on the number of search terms or results, in order to gain insight and apply it to its campaign strategies.
QUESTIONS
CASE STUDY 9
QUESTIONS
1. Why is data visualisation important for companies?
(Hint: Timely action, resource allocation, etc.)
2. What should be the features of a good data visualisation solution?
(Hint: Highly interactive, data-driven alerts, etc.)
CASE STUDY 10
SPORTS ANALYTICS HELPED IN THE ENRICHMENT OF PERFORMANCE OF PLAYERS
This Case Study discusses how real-time analytics from IBM has been used by Team USA to measure and improve its athletes' performance. It relates to Chapter 10 of the book.
A US-based cycling organisation, dedicated to improving top-level US cycling teams for the Olympics and other international events, was looking for ways to gain an edge over well-funded competitor organisations in events such as the Women's Team Pursuit. In the team pursuit event, there are four cyclists, with one in the lead and the other three riding behind. The challenge arises when riders change places, which causes disruption and slows down the group. A delay of a fraction of a second can cost the race in this extremely competitive sport.
Unlike national teams that are fully supported by government bodies, USA Cycling depends entirely on private donations. Coaches at USA Cycling felt the need for analytics to analyse riders' performance while managing the organisation's budget efficiently. The challenge for USA Cycling was to quantify performance in real time during Team Pursuit track cycling events, which are held indoors in velodromes. Outdoors, riders' performance could be monitored and tracked reliably only if there were no variations in wind or in the condition of the track.
"The single most important factor in winning a race is the power that the riders are able to exert on the pedals. The bikes we use have a power meter on the crank that measures the power generated in watts," according to Andy Sparks, Director of Track Programs for USA Cycling. Collecting and analysing data from the cyclists' sensors was a slow process: collecting the data alone usually took about an hour per cyclist.
"At the end of a training session, the coach had to plug the head unit of each bike into his PC, download the data, manually slice it into half-second intervals, match those intervals to the events that took place during the session (for example, when each rider was pulling, versus when they were exchanging or pursuing) and then calculate a variety of key metrics," according to Andy Sparks. This meant that performance data and cycling analytics were not ready until the next training day.
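The manual steps Sparks describes (slicing power data into half-second intervals and matching the intervals to session events) can be sketched in a few lines of Python. This is an illustrative sketch under assumed inputs, not IBM's or USA Cycling's software: it assumes the head unit yields (timestamp in seconds, power in watts) samples and that session events are available as a time-ordered list of (start time, label) pairs.

```python
from statistics import mean

def half_second_averages(samples, interval=0.5):
    """Average power (watts) per fixed time interval.

    samples: iterable of (timestamp_seconds, watts) pairs.
    Returns a dict mapping each interval's start time to its mean power.
    """
    buckets = {}
    for t, watts in samples:
        start = int(t // interval) * interval  # start of the interval containing t
        buckets.setdefault(start, []).append(watts)
    return {start: mean(values) for start, values in sorted(buckets.items())}

def label_intervals(averages, events):
    """Attach the session event in progress to each interval.

    events: (start_time, label) pairs sorted by start_time; each label
    applies until the next event begins.
    """
    labelled = []
    for start, watts in averages.items():
        label = None
        for event_start, event_label in events:
            if event_start <= start:
                label = event_label
        labelled.append((start, label, watts))
    return labelled

# Hypothetical power-meter samples taken four times per second
samples = [(0.0, 400), (0.25, 420), (0.5, 450), (0.75, 470), (1.0, 300), (1.25, 320)]
events = [(0.0, "pulling"), (1.0, "exchanging")]

averages = half_second_averages(samples)
for row in label_intervals(averages, events):
    print(row)
```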
USA Cycling was looking for a solution to overcome these challenges. It decided to use real-time analytics to analyse the players' performance and achieve its goal, and started working with IBM jStart to configure the
QUESTIONS
CASE STUDY 11
[Figure: fraud detection results, including a 60% increase in total fraud detection savings]
CASE STUDY 12
BACKGROUND
OLX is a popular, fast-growing online classified advertising website. It is active in around 105 countries and supports over 40 languages. The website has more than 125 million unique visitors per month across the world and generates approximately one billion page hits per month. OLX allows its users to design and personalise their advertisements and add them to their social networking profiles, so the data it handles requires big data analytics.
CHALLENGES
The main challenge for OLX was to find new ways to use business analytics to handle the vast amount of customer data. OLX's business users required numerous metrics to track their customer data, and to achieve this, OLX needed better control over its data warehouse. OLX took the help of Datalytics, Pentaho's partner vendor, in finding solutions for extracting, transforming and loading data from around the world and then creating an improved data warehouse. After creating such a warehouse, OLX wanted to allow its customers to visualise the stored data in real time without facing any technical error or barrier. OLX knew that this would be difficult for people without previous Business Intelligence (BI) knowledge, so a visualisation tool was essential for this purpose. According to Francisco Achaval, Business Intelligence Manager at OLX, "While it may be easy for a BI analyst to understand what's happening in the numbers, to explain this to business users who are not versed in BI or OLAP (On-line Analytical Processing), you need visualisations."
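The extract, transform and load steps mentioned above can be illustrated with a minimal Python sketch using only the standard library: rows are read from a CSV export, cleaned, and loaded into an in-memory SQLite table that stands in for the data warehouse. The column names and sample rows are hypothetical, and this is not how Pentaho Data Integration is configured; it only shows the shape of an ETL pipeline.

```python
import csv
import io
import sqlite3

# Hypothetical raw export with inconsistent country codes and category spellings
RAW = """country,listing_id,category,posted
in,1001,Cars,2013-05-02
IN,1002, cars ,2013-05-03
br,2001,Jobs,2013-05-03
"""

def extract(text):
    """Extract: read CSV rows as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalise country codes and category names."""
    for row in rows:
        yield (
            int(row["listing_id"]),
            row["country"].strip().upper(),
            row["category"].strip().title(),
            row["posted"],
        )

def load(records, conn):
    """Load: insert cleaned records into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS listings "
        "(listing_id INTEGER PRIMARY KEY, country TEXT, category TEXT, posted TEXT)"
    )
    conn.executemany("INSERT INTO listings VALUES (?, ?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW)), conn)
rows = conn.execute(
    "SELECT country, category, COUNT(*) FROM listings "
    "GROUP BY country, category ORDER BY country, category"
).fetchall()
print(rows)
```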
SOLUTIONS
OLX approached Pentaho, a business intelligence software company that provides open source products and services to its customers, such as data integration, OLAP services, reporting and information dashboards. Pentaho has a partnership with Datalytics, a consulting firm based in Argentina that provides data integration, business intelligence and data mining solutions to Pentaho's clients worldwide.
RESULTS
OLX found that Datalytics' expertise and Pentaho's platform enabled it to deploy its new analytics solution in less than a month. It observed the following benefits of the new solution:
□ Pentaho Business Analytics enables OLX users to create easy and creative reports about key business metrics.
□ Instead of buying an expensive enterprise solution or investing time in building a new data warehouse internally, OLX was able to save time by focussing on data integration with analytics capabilities.
□ Pentaho Business Analytics provides end-user satisfaction.
□ Pentaho Business Analytics provides OLX with a scalable solution, as it can integrate any type of data from any data source and can grow with the business. In addition, Datalytics' assistance gives OLX an opportunity to experiment with big data.