
Module III: Visualization and Data Issues

✧ Learning Outcome:

At the end of this module, you should be able to discuss the following:

■ Organization/sources of data
■ Importance of data quality

■ Dealing with missing or incomplete data

■ Data Classification

■ Business Modeling, Metrics and Measurement

■ Davenport and Harris article - “The Dark Side of Customer Analytics”

✧ Reading Materials:

Data Visualization

❖ What it is and why it matters

Data visualization is the presentation of data in a pictorial or graphical format. It enables decision makers to see analytics presented visually, so they can grasp difficult concepts or identify new patterns. With interactive visualization, you can take the concept a step further by using technology to drill down into charts and graphs for more details, interactively changing what data you see and how it’s processed.

History of Data Visualization

The concept of using pictures to understand data has been around for
centuries, from maps and graphs in the 17th century to the invention of the pie
chart in the early 1800s. Several decades later, one of the most cited
examples of statistical graphics occurred when Charles Minard mapped
Napoleon’s invasion of Russia. The map depicted the size of the army as well
as the path of Napoleon’s retreat from Moscow – and tied that information to
temperature and time scales for a more in-depth understanding of the event.
It’s technology, however, that truly lit the fire under data visualization.
Computers made it possible to process large amounts of data at lightning-fast
speeds. Today, data visualization has become a rapidly evolving blend of
science and art that is certain to change the corporate landscape over the
next few years.

Data visualization: A wise investment in your big data future

With big data there’s potential for great opportunity, but many retail
banks are challenged when it comes to finding value in their big data
investment. For example, how can they use big data to improve
customer relationships? How – and to what extent – should they invest
in big data?

Why is data visualization important?

Because of the way the human brain processes information, using charts or
graphs to visualize large amounts of complex data is easier than poring over
spreadsheets or reports. Data visualization is a quick, easy way to convey
concepts in a universal manner – and you can experiment with different
scenarios by making slight adjustments.

Data visualization can also:

● Identify areas that need attention or improvement.


● Clarify which factors influence customer behavior.
● Help you understand which products to place where.
● Predict sales volumes.

Data visualization is going to change the way our analysts work with
data. They’re going to be expected to respond to issues more rapidly.
And they’ll need to be able to dig for more insights – look at data
differently, more imaginatively. Data visualization will promote that
creative data exploration.

How Is It Being Used?

Regardless of industry or size, all types of businesses are using data visualization to help make sense of their data. Here’s how.

● Comprehend information quickly


By using graphical representations of business information, businesses
are able to see large amounts of data in clear, cohesive ways – and
draw conclusions from that information. And since it’s significantly
faster to analyze information in graphical format (as opposed to
analyzing information in spreadsheets), businesses can address
problems or answer questions in a more timely manner.

● Identify relationships and patterns

Even extensive amounts of complicated data start to make sense when presented graphically; businesses can recognize parameters that are highly correlated. Some of the correlations will be obvious, but others won’t. Identifying those relationships helps organizations focus on areas most likely to influence their most important goals.

● Pinpoint emerging trends

Using data visualization to discover trends – both in the business and in the market – can give businesses an edge over the competition, and ultimately affect the bottom line. It’s easy to spot outliers that affect product quality or customer churn, and address issues before they become bigger problems.

● Communicate the story to others

Once a business has uncovered new insights from visual analytics, the next
step is to communicate those insights to others. Using charts, graphs or other
visually impactful representations of data is important in this step because it’s
engaging and gets the message across quickly.

Laying the groundwork for data visualization

Before implementing new technology, there are some steps you need
to take. Not only do you need to have a solid grasp on your data, you
also need to understand your goals, needs and audience. Preparing
your organization for data visualization technology requires that you
first:
● Understand the data you’re trying to visualize, including its size and
cardinality (the uniqueness of data values in a column).

● Determine what you’re trying to visualize and what kind of information you want to communicate.

● Know your audience and understand how it processes visual information.

● Use a visual that conveys the information in the best and simplest form
for your audience.

Once you've answered those initial questions about the type of data you have
and the audience who'll be consuming the information, you need to prepare
for the amount of data you'll be working with. Big data brings new challenges
to visualization because large volumes, different varieties and varying
velocities must be taken into account. Plus, data is often generated faster than it can be managed and analyzed.

There are factors you should consider, such as the cardinality of columns
you’re trying to visualize. High cardinality means there’s a large percentage of
unique values (e.g., bank account numbers, because each item should be
unique). Low cardinality means a column of data contains a large percentage
of repeat values (as might be seen in a “gender” column).
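
As a quick, hedged illustration (the file and column names below are hypothetical), column cardinality can be checked in pandas before deciding how to visualize a data set:

```python
import pandas as pd

# Hypothetical data set to be visualized
df = pd.read_csv("accounts.csv")

# Number of unique values per column
cardinality = df.nunique()

# Cardinality as a share of total rows: values near 1.0 suggest high
# cardinality (e.g., account numbers), values near 0.0 suggest low
# cardinality (e.g., a gender column)
print((cardinality / len(df)).sort_values(ascending=False))
```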

Deciding which visual is best

One of the biggest challenges for business users is deciding which visual
should be used to best represent the information. SAS Visual Analytics uses
intelligent autocharting to create the best possible visual based on the data
that is selected.

When you’re first exploring a new data set, autocharts are especially useful
because they provide a quick view of large amounts of data. This data
exploration capability is helpful even to experienced statisticians as they seek
to speed up the analytics lifecycle process because it eliminates the need for
repeated sampling to determine which data is appropriate for each model.

Source: https://www.sas.com/en_in/insights/big-data/data-visualization.html
Potential Problems With Data Visualization

Big data has been a big topic for a few years now, and it’s only going to grow
bigger as we get our hands on more sophisticated forms of technology and
new applications in which to use them. The problem now is beginning to shift;
originally, tech developers and researchers were all about gathering greater
quantities of data. Now, with all this data in tow, consumers and developers
are both eager for new ways to condense, interpret, and take action on this
data.
One of the newest and most talked-about methods for this is data
visualization, a system of reducing or illustrating data in simplified, visual
ways. The buzz around data visualization is strong and growing, but is the
trend all it’s cracked up to be?

The Need for Data Visualization


There’s no question that data visualization can be a good thing, and it’s
already helped thousands of marketers and analysts do their jobs more
efficiently. Human abilities for pattern recognition tend to revolve around
sensory inputs—for obvious reasons. We’re hard-wired to recognize visual
patterns at a glance, but not to crunch complex numbers and associate those
numbers with abstract concepts. Accordingly, representing complex numbers
as integrated visual patterns would allow us to tap into our natural analytic
abilities.

The Problems With Visualization


Unfortunately, there are a few current and forthcoming problems with
the concept of data visualization:
1. The oversimplification of data. One of the biggest draws of visualization
is its ability to take big swaths of data and simplify them to more basic,
understandable terms. However, it’s easy to go too far with this; trying to
take millions of data points and confine their conclusions to a handful of
pictorial representations could lead to unfounded conclusions, or
completely neglect certain significant modifiers that could completely
change the assumptions you walk away with. As an example not
relegated to the world of data, consider basic real-world tests, such as
alcohol intoxication tests, which try to reduce complex systems to simple
“yes” or “no” results—as Monder Law Group points out, these tests can be
unreliable and flat-out inaccurate.

2. The human limitations of algorithms. This is the biggest potential problem, and also the most complicated. Any algorithm used to reduce
data to visual illustrations is based on human inputs, and human inputs
can be fundamentally flawed. For example, a human developing an
algorithm may highlight different pieces of data that are “most” important
to consider, and throw out other pieces entirely; this doesn’t account for all
companies or all situations, especially if there are data outliers or unique
situations that demand an alternative approach. The problem is
compounded by the fact that most data visualization systems are rolled
out on a national scale; they evolve to become one-size-fits-all algorithms,
and fail to address the specific needs of individuals.

3. Overreliance on visuals. This is more of a problem with consumers than it is with developers, but it undermines the potential impact of visualization
in general. When users start relying on visuals to interpret data, which
they can use at-a-glance, they could easily start over-relying on this mode
of input. For example, they may take their conclusions as absolute truth,
never digging deeper into the data sets responsible for producing those
visuals. The general conclusions you draw from this may be generally
applicable, but they won’t tell you everything about your audiences or
campaigns.

4. The inevitability of visualization. Already, there are dozens of tools available to help us understand complex data sets with visual diagrams,
charts, and illustrations, and data visualization is too popular to ever go
away. We’re on a fast course to visualization taking over in multiple areas,
and there’s no real going back at this point. To some, this may not seem
like a problem, but consider some of the effects—companies racing to
develop visualization products, and consumers only seeking products that
offer visualization. These effects may feed into user overreliance on
visuals, and compound the limitations of human errors in algorithm
development (since companies will want to go to market as soon as
possible).

There’s no stopping the development of data visualization, and we’re not arguing that it should be stopped. If it’s developed in the right ways, it can be
an extraordinary tool for development in countless different areas—but
collectively, we need to be aware of the potential problems and biggest
obstacles data visualization will need to overcome.

Source: https://www.datasciencecentral.com/profiles/blogs/4-potential-problems-with-data-visualization
I. ORGANIZATION/SOURCES OF DATA
Types of Data Sources

Various types of data are useful for business reports, and in them you will quickly come across figures like revenue (money earned in a given period, usually a year), turnover (people who left the organization in a given period), and many others.

There are a variety of data available when one is constructing a business report. We may categorize data in the following manner:

● Internal
o Employee headcount
o Employee demographics (e.g., sex, ethnicity, marital status)
o Financials (e.g., revenue, profit, cost of goods sold, margin,
operating ratio)
● External
o Number of vendors used
o Number of clients in a company’s book of business
o Size of the industry (e.g. number of companies, total capital)

Internal and external business or organizational data come in two main categories: qualitative and quantitative.

● Qualitative data are generally non-numeric and require context, time, or variance to have meaning or utility.
o Examples: taste, energy, sentiments, emotions
● Quantitative data are numeric and therefore largely easier to understand.
o Example: temperature, dimensions (e.g., length), prices, headcount,
stock on hand

Both types of data are useful for business report writing. Usually a report will
feature as much “hard” quantitative data as possible, typically in the form of
earnings or revenue, headcount, and other numerical data available. Most
organizations keep a variety of internal quantitative data. Qualitative data,
such as stories, case studies, or narratives about processes or events, are
also very useful, and provide context. We may consider that a good report will
have both types of data, and a good report writer will use both types of data to
build a picture of information for their readers.

Primary Research

Primary research is usually defined as research you collect yourself.

This type of research is done to fill in gaps found during secondary research
review. That is, you do not conduct primary research if you can address your research question with already existing secondary sources.

Think back to Martha’s case we discussed earlier in this module; her interviews of homeless people in downtown Chicago are primary research.
She is doing these interviews only because her existing secondary sources
lack something she feels she needs now to properly answer her research
question (about the current experience of homeless families in downtown
Chicago). Primary research is used to supplement gaps in more accessible
secondary research.

Purdue University’s Online Writing Lab describes the following as typical primary research:

● Interviews: Interviews are conversations, typically in small groups, where one party asks questions of another. Interviews are usually conducted in-person, between two people (the person asking questions and the person answering them); however, these can also take place over the phone, and may involve multiple parties.
● Surveys: Surveys are typically written documents that are sent out to individuals to fill out. Surveys are more rigid than interviews: an interviewer can change their planned questions based on the subject’s responses, whereas surveys are pre-written, and respondents can only answer in limited, anticipated ways.
● Observations: Observations are just what they sound like: the
researcher watches something and records what they see. It is important
to avoid influencing whatever you’re watching. However, if it’s impossible
to not influence your subject, make sure to include the fact that your
presence may have influenced your observations.
● Analysis: In analysis, gathered data is examined and organized so
those who are less familiar with technical details can be guided through
the data. Analysis can also help uncover patterns and trends in data.

Secondary Research

Secondary research is gathering information from other people’s primary research.

Common forms are books, journals, newspaper articles, media reports, and
other polished accounts of data. Most report writers will use secondary
sources for their business reports in order to gather, curate, and present the
material in a new, updated, and helpful manner. Using secondary research is far less costly and more efficient, and it requires less time because it draws on data from already developed sources.

In business, where everything has a cost, we may argue for maximizing secondary sources alone because primary research is expensive and time
consuming. That said, primary and secondary data should interact, and as
discussed, we gather primary data when we find gaps in the already available
secondary sources.

Source: https://courses.lumenlearning.com/wm-businesscommunicationmgrs/chapter/types-of-data-sources/

What is Big Data and its Importance to Businesses as a Game Changer


❖ What is Big Data?

If marketers had all the data about consumers that they could use to predict consumer behavior, it would be a marketer’s dream come true. Until now, marketers had enough data about consumers to model and arrive at probable consumer behavior decisions. This data, culled from marketing research, was adequate as long as extrapolating trends could translate into predictions of consumer behavior. In recent years, however, marketers have gone a step further: instead of extrapolating data to predict consumer behavior, they are turning to Big Data, or data about virtually all aspects of consumers, to support predictive analytics, the art and science of accurately mapping consumer behavior. In other words, Big Data is about how marketers collect everything possible about consumer behavior and predict not only what consumers are doing now but how they will behave in the future. For instance, Big Data provides marketers with the ability to identify the state of the consumers, as can be seen in the recent prediction by the retail giant Target that a woman was pregnant based on her buying data.

Big Data can be a Game Changer for Marketers

This is the promise of Big Data: it goes beyond merely extrapolating trends and instead identifies and predicts the next move of the consumer based on his or her current state. This would be like getting inside the minds of consumers; instead of merely knowing what they would probably purchase, marketers would know with accuracy what consumers are likely to do in the future.

The term Big Data has been coined because it gives marketers the bigger picture and at the same time lets them model consumer behavior at the micro level. The integration of the macro data and the micro trends gives marketers unparalleled access to data, which can then be used to accurately predict
consumer behavior. The collection of Big Data is done not only from the
consumer buying behavior but also from mining all the available data in the
public and private domains to arrive at a comprehensive picture of what the
consumers think and how they act. The promise of Big Data is boundless for
marketers who can now think ahead of the consumers instead of the other
way around as well as preempt possible consumer behavior by targeting
products aimed at the future actions of the consumers.

Big Data can be misused as well

Of course, the promise of Big Data also comes with its perils as the tendency
to be the master of consumer behavior can lead to serious issues with privacy
and security of the data available with the marketers. The example of Target
predicting whether the woman was pregnant or not based on her shopping
habits was received with both enthusiasm as well as alarm. The enthusiasm
was from the marketers whereas the alarm was from the activists and experts
who deal with privacy and security of data. The point here is that Big Data
places enormous responsibilities on marketers and hence, they have to be
very careful about the data that they hold and the prediction models and
simulation that they run. If they choose to predict whether someone is going to
do something next based on the results from the model, this prediction can
also be used for unwelcome purposes and as can be seen in the recent
revelations about tracking and surveillance, the data can be compromised or
used to target innocent consumers. This is the reason why many experts are
guarded as far as Big Data is concerned and they are waiting for the
marketers and the regulators to frame rules and policies on how Big Data can
be used in practice.

Concluding Remarks

Finally, it must be mentioned that whatever stance one might take on Big Data, its potential uses for predicting disease outbreaks and controlling crime are indeed boons to regulators and law enforcement agencies. It would therefore be better for all stakeholders to decide on the kinds of purposes for which Big Data can be used.

Source: https://www.managementstudyguide.com/big-data-and-its-importance.htm

II. IMPORTANCE OF DATA QUALITY

Data quality is a measure of the condition of data based on factors such as accuracy, completeness, consistency, reliability and whether it's up to date.
Measuring data quality levels can help organizations identify data errors that
need to be resolved and assess whether the data in their IT systems is fit to
serve its intended purpose.

The emphasis on data quality in enterprise systems has increased as data processing has become more intricately linked with business operations and
organizations increasingly use data analytics to help drive business decisions.
Data quality management is a core component of the overall data
management process, and data quality improvement efforts are often closely
tied to data governance programs that aim to ensure data is formatted and
used consistently throughout an organization.

Why data quality is important


Bad data can have significant business consequences for companies.
Poor-quality data is often pegged as the source of operational snafus,
inaccurate analytics and ill-conceived business strategies. Examples of the
economic damage that data quality problems can cause include added
expenses when products are shipped to the wrong customer addresses, lost
sales opportunities because of erroneous or incomplete customer records,
and fines for improper financial or regulatory compliance reporting.

An oft-cited estimate by IBM calculated that the annual cost of data quality
issues in the U.S. amounted to $3.1 trillion in 2016. In an article he wrote for
the MIT Sloan Management Review in 2017, data quality consultant Thomas
Redman estimated that correcting data errors and dealing with the business
problems caused by bad data costs companies 15% to 25% of their annual
revenue on average.

In addition, a lack of trust in data on the part of corporate executives and business managers is commonly cited among the chief impediments to using
business intelligence (BI) and analytics tools to improve decision-making in
organizations.
What is good data quality?
Data accuracy is a key attribute of high-quality data. To avoid transaction
processing problems in operational systems and faulty results in analytics
applications, the data that's used must be correct. Inaccurate data needs to
be identified, documented and fixed to ensure that executives, data analysts
and other end users are working with good information.

Other aspects, or dimensions, that are important elements of good data quality include data completeness, with data sets containing all of the data
elements they should; data consistency, where there are no conflicts between
the same data values in different systems or data sets; a lack of duplicate
data records in databases; data currency, meaning that data has been
updated as needed to keep it current; and conformity to the standard data
formats created by an organization. Meeting all of these factors helps produce
data sets that are reliable and trustworthy.

How to determine data quality


As a first step toward determining their data quality levels, organizations
typically perform data asset inventories in which the relative accuracy,
uniqueness and validity of data are measured in baseline studies. The
established baseline ratings for data sets can then be compared against the
data in systems on an ongoing basis to help identify new data quality issues
so they can be resolved.

Another common step is to create a set of data quality rules based on business requirements for both operational and analytics data. Such rules
specify required quality levels in data sets and detail what different data
elements need to include so they can be checked for accuracy, consistency
and other data quality attributes. After the rules are in place, a data
management team typically conducts a data quality assessment to measure
the quality of data sets and document data errors and other problems -- a
procedure that can be repeated at regular intervals to maintain the highest
data quality levels possible.
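
As a rough sketch of what such rule checks might look like in code (the file, column names and the 5-digit postal rule below are illustrative assumptions, not part of the article), basic completeness, uniqueness and validity rules can be expressed in pandas:

```python
import pandas as pd

# Hypothetical customer data; column names are assumptions for illustration
df = pd.read_csv("customers.csv")

# Rule 1: completeness -- key fields should not be null
completeness = df[["customer_id", "email", "postal_code"]].notna().mean()

# Rule 2: uniqueness -- customer_id should be unique
duplicate_ids = df["customer_id"].duplicated().sum()

# Rule 3: validity -- postal codes should match an expected pattern (here, 5 digits)
invalid_postal = (~df["postal_code"].astype(str).str.fullmatch(r"\d{5}")).sum()

print("Share of non-null values per key field:")
print(completeness)
print("Duplicate customer IDs:", duplicate_ids)
print("Postal codes failing the format rule:", invalid_postal)
```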

Various methodologies for such assessments have been developed. For example, data managers at UnitedHealth Group's Optum healthcare services
subsidiary created the Data Quality Assessment Framework (DQAF) to
formalize a method for assessing its data quality. The DQAF provides
guidelines for measuring data quality dimensions that include completeness,
timeliness, validity, consistency and integrity. Optum has publicized details
about the framework as a possible model for other organizations.

The International Monetary Fund (IMF), which oversees the global monetary
system and lends money to economically troubled nations, has also specified
an assessment methodology, similarly known as the Data Quality Assessment
Framework. Its framework focuses on accuracy, reliability, consistency and
other data quality attributes in the statistical data that member countries need
to submit to the IMF.

Data quality management tools and techniques


Data quality projects typically also involve several other steps. For example, a
data quality management cycle outlined by data management consultant
David Loshin begins with identifying and measuring the effect that bad data
has on business operations. Next, data quality rules are defined, performance
targets for improving relevant data quality metrics are set, and specific data
quality improvement processes are designed and put in place.

Those processes include data cleansing, or data scrubbing, to fix data errors,
plus work to enhance data sets by adding missing values, more up-to-date
information or additional records. The results are then monitored and
measured against the performance targets, and any remaining deficiencies in
data quality provide a starting point for the next round of planned
improvements. Such a cycle is intended to ensure that efforts to improve
overall data quality continue after individual projects are completed.

Software tools specialized for data quality management can match records,
delete duplicates, validate new data, establish remediation policies and
identify personal data in data sets; they also do data profiling to collect
information about data sets and identify possible outlier values. Management
consoles for data quality initiatives support creation of data handling rules,
discovery of data relationships and automated data transformations that may
be part of data quality maintenance efforts.
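
A minimal sketch of two of these capabilities, basic profiling and naive duplicate matching, using pandas on a hypothetical contact list (dedicated data quality tools do far more, including fuzzy matching and remediation workflows):

```python
import pandas as pd

# Hypothetical contact records; column names are assumptions for illustration
df = pd.read_csv("contacts.csv")

# Basic profiling: non-null, missing and distinct counts per column
profile = pd.DataFrame({
    "non_null": df.notna().sum(),
    "missing": df.isna().sum(),
    "distinct": df.nunique(),
})
print(profile)

# Naive duplicate matching on normalized name and email
normalized = df.assign(
    name=df["name"].str.strip().str.lower(),
    email=df["email"].str.strip().str.lower(),
)
deduplicated = normalized.drop_duplicates(subset=["name", "email"], keep="first")
print("Duplicate records removed:", len(df) - len(deduplicated))
```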

Collaboration and workflow enablement tools have also become more common, providing shared views of corporate data repositories to data quality
managers and data stewards, who are charged with overseeing particular
data sets. Those tools and data quality improvement processes are often
incorporated into data governance programs, which typically use data quality
metrics to help demonstrate their business value to companies, and master
data management (MDM) initiatives that aim to create central registries of
master data on customers, products and supply chains.

Benefits of good data quality


From a financial standpoint, maintaining high levels of data quality enables
organizations to reduce the cost of identifying and fixing bad data in their
systems. Companies are also able to avoid operational errors and business
process breakdowns that can increase operating expenses and reduce
revenues.

In addition, good data quality increases the accuracy of analytics applications, which can lead to better business decision-making that boosts sales,
improves internal processes and gives organizations a competitive edge over
rivals. High-quality data can help expand the use of BI dashboards and
analytics tools, as well -- if analytics data is seen as trustworthy, business
users are more likely to rely on it instead of basing decisions on gut feelings
or their own spreadsheets.

Effective data quality management also frees up data management teams to focus on more productive tasks than cleaning up data sets. For example, they
can spend more time helping business users and data analysts take
advantage of the available data in systems and promoting data quality best
practices in business operations to minimize data errors.

Emerging data quality challenges


For many years, the burden of data quality efforts centered on structured data
stored in relational databases since they were the dominant technology for
managing data. But the nature of data quality problems expanded as big
data systems and cloud computing became more prominent. Increasingly,
data managers also need to focus on the quality of unstructured and
semistructured data, such as text, internet clickstream records, sensor data
and network, system and application logs.

The growing use of artificial intelligence (AI) and machine learning applications further complicates the data quality process in
organizations, as does the adoption of real-time data streaming platforms that
funnel large volumes of data into corporate systems on a continuous basis. In
addition, data quality now often needs to be managed in a combination of
on-premises and cloud systems.

Data quality demands are also expanding due to the implementation of new
data privacy and protection laws, most notably the European Union's General
Data Protection Regulation (GDPR) and the California Consumer Privacy Act
(CCPA). Both measures give people the right to access the personal data that
companies collect about them, which means organizations must be able to
find all of the records on an individual in their systems without missing any
because of inaccurate or inconsistent data.
Fixing data quality issues
Data quality managers, analysts and engineers are primarily responsible for
fixing data errors and other data quality problems in organizations. They're
collectively tasked with finding and cleansing bad data in databases and other
data repositories, often with assistance and support from other data
management professionals, particularly data stewards and data governance
program managers.

However, it's also a common practice to involve business users, data scientists and other analysts in the data quality process to help reduce the
number of data quality issues created in systems. Business participation can
be achieved partly through data governance programs and interactions with
data stewards, who frequently come from business units. In addition, though,
many companies run training programs on data quality best practices for end
users. A common mantra among data managers is that everyone in an
organization is responsible for data quality.

Data quality vs. data integrity


Data quality and data integrity are sometimes referred to interchangeably;
alternatively, some people treat data integrity as a facet of data accuracy in
the data quality process. More generally, though, data integrity is seen as a
broader concept that combines data quality, data governance and data
protection mechanisms to address data accuracy, consistency and security as
a whole.

In that broader view, data integrity focuses on integrity from both logical and
physical standpoints. Logical integrity includes data quality measures and
database attributes such as referential integrity, which ensures that related
data elements in different database tables are valid. Physical integrity involves
access controls and other security measures designed to prevent data from
being modified or corrupted by unauthorized users, as well as backup and
disaster recovery protections.

Source: https://searchdatamanagement.techtarget.com/definition/data-quality

III. DEALING WITH MISSING OR INCOMPLETE DATA

Missing data is an everyday problem that a data professional needs to deal with. Though there are many articles, blogs, and videos already available, I found it difficult to find concise, consolidated information in a single place. That’s why I am putting my effort here, hoping it will be useful to any data practitioner or enthusiast.
What is missing data? Missing data is defined as values that are not available but that would be meaningful if they were observed. Missing data can be anything from a missing sequence, an incomplete feature, missing files, incomplete information, a data entry error, and so on. Most datasets in the real world
contain missing data. Before you can use data with missing data fields, you
need to transform those fields so they can be used for analysis and modelling.
Like many other aspects of data science, this too may actually be more art
than science. Understanding the data and the domain from which it comes is
very important.

Having missing values in your data is not necessarily a setback; it is an opportunity to perform the right feature engineering to guide the model to interpret the missing information in the right way. There are machine learning algorithms and packages that can automatically detect and deal with missing data, but it's still recommended to transform the missing data manually through analysis and a coding strategy. First, we need to understand the types of missing data. Missingness is broadly categorized into 3 categories:

Missing Completely at Random (MCAR)

When we say data are missing completely at random, we mean that the
missingness has nothing to do with the observation being studied (Completely
Observed Variable (X) and Partly Missing Variable (Y)). For example, a weighing scale might run out of batteries, a questionnaire might be lost in the post, or a blood sample might be damaged in the lab. MCAR is an ideal but often unrealistic assumption. Generally, data are regarded as being MCAR
when data are missing by design, because of an equipment failure or because
the samples are lost in transit or technically unsatisfactory. The statistical
advantage of data that are MCAR is that the analysis remains unbiased. A pictorial view of MCAR would show missingness having no relation to the dataset variables X or Y; it is driven by some other reason, Z. For example, in a dataset of mobile data, one sample might have a missing value not because of the dataset variables but because of another, unrelated reason.
Missing at Random (MAR)

When we say data are missing at random, we mean that missing data on a
partly missing variable (Y) is related to some other completely observed
variables (X) in the analysis model but not to the values of Y itself.

It is not specifically related to the missing information. For example, if a child does not attend an examination because the child is ill, this might be predictable from other data we have about the child’s health, but it would not be related to what we would have examined had the child not been ill. Some may think that MAR does not present a problem. However, MAR does not mean that the missing data can be ignored. A pictorial view of MAR would show that missingness is related to the dataset variable X but not to Y; it can be related to other factors (Z) as well.
Missing not at Random (MNAR)

If the characteristics of the data do not meet those of MCAR or MAR, then they fall into the category of missing not at random (MNAR). When data are missing not at random, the missingness is specifically related to what is missing; e.g., a person does not attend a drug test because the person took drugs the night before, or a person does not take an English proficiency test due to his poor English language skills. The cases of MNAR data are problematic. The only way to obtain an unbiased estimate of the parameters in such a case is to model the missing data, but that requires proper understanding and domain knowledge of the missing variable. The model may then be incorporated into a more complex one for estimating the missing values. In a pictorial view of MNAR, missingness has a direct relation to the variable Y itself; it can be related to other factors (X and Z) as well.
There are several strategies which can be applied to handle missing data when building a machine learning or statistical model.

Try to obtain the missing data

This may be possible in the data collection phase, in survey-like situations where one can check that the survey data is captured in its entirety before the respondent leaves the room. Sometimes it may be possible to reach out to the source to get the data, for example by asking the missing question again. In a real-world scenario, however, this is an unlikely way to resolve the missing data problem.

Educated Guessing

It sounds arbitrary and is never the preferred course of action, but one can sometimes infer a missing value based on other responses. For related questions, for example those often presented in a matrix, if the participant responds with all “2s”, assume that the missing value is also a 2.

Discard Data

1) list-wise (Complete-case analysis — CCA) deletion

By far the most common approach to the missing data is to simply omit those
cases with the missing data and analyse the remaining data. This approach is
known as complete-case analysis or list-wise deletion.

If there is a large enough sample, where power is not an issue, and the
assumption of MCAR is satisfied, the listwise deletion may be a reasonable
strategy. However, when there is not a large sample, or the assumption of
MCAR is not satisfied, the listwise deletion is not the optimal strategy. It also
introduces bias if the data do not satisfy MCAR.

Refer to the sketch below for a sample of the observations that remain after deletion.
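
A minimal pandas sketch of list-wise deletion on a small hypothetical dataset (column names and values invented for illustration):

```python
import pandas as pd
import numpy as np

# Small hypothetical dataset with missing values
df = pd.DataFrame({
    "age":    [25, 32, np.nan, 41, 29],
    "income": [50000, np.nan, 61000, 72000, 58000],
    "score":  [0.7, 0.4, 0.9, np.nan, 0.6],
})

# Complete-case analysis: drop every row that contains any missing value
complete_cases = df.dropna()
print(complete_cases)  # only the fully observed rows remain
```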


2) Pairwise (available case analysis — ACA) Deletion

In this case, only the missing observations are ignored and analysis is done
on variables present. If there is missing data elsewhere in the data set, the
existing values are used. Since a pairwise deletion uses all information
observed, it preserves more information than the listwise deletion.

Pairwise deletion is known to be less biased for the MCAR or MAR data.
However, if there are many missing observations, the analysis will be deficient. The problem with pairwise deletion is that even though it uses all available cases, one can't compare analyses because the sample is different every time.
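
Pandas illustrates pairwise deletion implicitly: DataFrame.corr() computes each pairwise correlation from whichever rows have both columns observed. A short sketch, repeating the hypothetical dataset from the previous example:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "age":    [25, 32, np.nan, 41, 29],
    "income": [50000, np.nan, 61000, 72000, 58000],
    "score":  [0.7, 0.4, 0.9, np.nan, 0.6],
})

# Pairwise deletion: each correlation is computed from the rows where both
# columns are observed, so different cells may use different subsets of rows
print(df.corr())

# Contrast with list-wise deletion, where every statistic comes from the
# same, smaller complete-case sample
print(df.dropna().corr())
```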
3) Dropping Variables

If too much data is missing for a variable, it may be an option to delete that variable (column) from the dataset. There is no rule of thumb for this; it depends on the situation, and a proper analysis of the data is needed before the variable is dropped altogether. This should be the last option, and you need to check whether model performance improves after deleting the variable.
Retain All Data

The goal of any imputation technique is to produce a complete dataset that can then be used for machine learning. There are a few ways we can do imputation to retain all data for analysis and model building.

1) Mean, Median and Mode

In this imputation technique, the goal is to replace missing data with statistical estimates of the missing values. The mean, median or mode can be used as the imputation value.

In a mean substitution, the mean value of a variable is used in place of the missing data value for that same variable. This has the benefit of not changing
the sample mean for that variable. The theoretical background of the mean
substitution is that the mean is a reasonable estimate for a randomly selected
observation from a normal distribution. However, with missing values that are
not strictly random, especially in the presence of a great inequality in the
number of missing values for the different variables, the mean substitution
method may lead to inconsistent bias. Distortion of the original variance and distortion of the covariance with the remaining variables within the dataset are two major drawbacks of this method.

The median can be used when the variable has a skewed distribution.

The rationale for the mode is to replace the missing values with the most frequent value, since this is the most likely occurrence.
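
A hedged sketch of these three substitutions using pandas on a small hypothetical dataset (scikit-learn's SimpleImputer offers equivalent mean, median and most-frequent strategies):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "age":    [25.0, 32.0, np.nan, 41.0, 29.0],              # roughly symmetric
    "income": [50000.0, np.nan, 61000.0, 72000.0, 250000.0], # skewed
    "city":   ["NY", "LA", np.nan, "NY", "NY"],              # categorical
})

df["age"] = df["age"].fillna(df["age"].mean())               # mean substitution
df["income"] = df["income"].fillna(df["income"].median())    # median for skewed data
df["city"] = df["city"].fillna(df["city"].mode()[0])         # mode: most frequent value
print(df)
```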
2) Last Observation Carried Forward (LOCF)

If data is time-series data, one of the most widely used imputation methods is
the last observation carried forward (LOCF). Whenever a value is missing, it is
replaced with the last observed value. This method is advantageous as it is
easy to understand and communicate. Although simple, this method strongly
assumes that the value of the outcome remains unchanged by the missing
data, which seems unlikely in many settings.
3) Next Observation Carried Backward (NOCB)

A similar approach to LOCF works in the opposite direction, taking the first observation after the missing value and carrying it backward (“next observation carried backward”, or NOCB).
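
Both directions map directly onto pandas' forward-fill and backward-fill operations. A small sketch on a hypothetical daily series:

```python
import pandas as pd
import numpy as np

# Hypothetical daily time series with gaps
ts = pd.Series(
    [10.0, np.nan, np.nan, 13.0, np.nan, 15.0],
    index=pd.date_range("2021-01-01", periods=6, freq="D"),
)

locf = ts.ffill()   # Last Observation Carried Forward
nocb = ts.bfill()   # Next Observation Carried Backward
print(pd.DataFrame({"original": ts, "LOCF": locf, "NOCB": nocb}))
```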
4) Linear Interpolation

Interpolation is a mathematical method that fits a function to the data and uses this function to estimate the missing data. The simplest type is linear interpolation, which takes the average of the values before and after the missing data. Of course, the data could have a fairly complex pattern, and linear interpolation may not be enough. There are several different types of interpolation; in Pandas alone we have options such as ‘linear’, ‘time’, ‘index’, ‘values’, ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘polynomial’, ‘spline’, ‘piecewise polynomial’ and many more.
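
A brief sketch of interpolation on the same kind of hypothetical time series, showing the linear and time-aware options:

```python
import pandas as pd
import numpy as np

ts = pd.Series(
    [10.0, np.nan, np.nan, 13.0, np.nan, 15.0],
    index=pd.date_range("2021-01-01", periods=6, freq="D"),
)

# Linear interpolation: fill each gap along a straight line
# between the surrounding observed values
linear = ts.interpolate(method="linear")

# A time-aware method uses the actual spacing of the timestamps
by_time = ts.interpolate(method="time")
print(pd.DataFrame({"original": ts, "linear": linear, "time": by_time}))
```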
5) Common-Point Imputation

For a rating scale, use the middle point or the most commonly chosen value. For example, on a five-point scale, substitute a 3 (the midpoint) or a 4 (the most common value in many cases). It is similar to mean imputation but more suitable for ordinal values.

6) Adding a category to capture NA

This is perhaps the most widely used method of missing data imputation for categorical variables. The method consists of treating missing data as if they were an additional label or category of the variable. All the missing observations are grouped in the newly created label ‘Missing’. It does not assume anything about the missingness of the values. It is very well suited when the number of missing values is high.
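
A minimal sketch with a hypothetical categorical feature:

```python
import pandas as pd
import numpy as np

# Hypothetical categorical feature with many missing entries
payment_method = pd.Series(["card", np.nan, "cash", np.nan, "card", np.nan])

# Treat missingness itself as an additional category
with_label = payment_method.fillna("Missing")
print(with_label.value_counts())
```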
7) Frequent category imputation

Replacement of missing values by the most frequent category is the equivalent of mean/median imputation. It consists of replacing all occurrences of missing values within a variable by the most frequent label or category of the variable.
8) Arbitrary Value Imputation

Arbitrary value imputation consists of replacing all occurrences of missing values within a variable by an arbitrary value. Ideally, the arbitrary value should be different from the median/mean/mode and not within the normal range of values of the variable. Typically used arbitrary values are 0, 999, -999 (or other combinations of 9’s) or -1 (if the distribution is positive). Sometimes the data already contain an arbitrary value from the originator for the missing values. This works reasonably well for numerical features that are predominantly positive in value, and for tree-based models in general. This used to be a more common method in the past, when out-of-the-box machine learning libraries and algorithms were not very adept at working with missing data.
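
A minimal sketch with a hypothetical, predominantly positive feature and -999 as the sentinel:

```python
import pandas as pd
import numpy as np

# Hypothetical, predominantly positive numeric feature
amount = pd.Series([120.0, 85.5, np.nan, 230.0, np.nan])

# Replace missing entries with an arbitrary sentinel well outside
# the normal range of the variable
amount_imputed = amount.fillna(-999)
print(amount_imputed)
```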
9) Adding a variable to capture NA

When data are not missing completely at random, we can capture the
importance of missingness by creating an additional variable indicating
whether the data was missing for that observation (1) or not (0). The
additional variable is a binary variable: it takes only the values 0 and 1, 0
indicating that a value was present for that observation, and 1 indicating that
the value was missing for that observation. Typically, mean/median imputation
is done together with adding a variable to capture those observations where
the data was missing.
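
A short sketch combining the missingness indicator with median imputation (hypothetical column name):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({"income": [50000.0, np.nan, 61000.0, np.nan, 58000.0]})

# Binary flag: 1 where the value was missing, 0 where it was observed
df["income_missing"] = df["income"].isna().astype(int)

# Typically combined with mean/median imputation of the original column
df["income"] = df["income"].fillna(df["income"].median())
print(df)
```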
10) Random Sampling Imputation

Random sampling imputation is in principle similar to mean/median imputation, in the sense that it aims to preserve the statistical parameters of the original variable for which data is missing. Random sampling consists of taking a random observation from the pool of available observations of the variable and using that randomly extracted value to fill the NA. In random sampling, one takes as many random observations as there are missing values in the variable. Random sample imputation assumes that the data are missing completely at random (MCAR). If this is the case, it makes sense to substitute the missing values with values extracted from the original variable distribution.
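
A hedged pandas sketch of random sampling imputation (hypothetical values; a random_state is fixed only to make the example reproducible):

```python
import pandas as pd
import numpy as np

s = pd.Series([12.0, np.nan, 15.0, 18.0, np.nan, 14.0, 16.0])

# Draw as many random observed values as there are missing entries
missing_idx = s[s.isna()].index
draws = s.dropna().sample(n=len(missing_idx), replace=True, random_state=0)
draws.index = missing_idx  # align the draws with the missing positions

# fillna with a Series fills by index alignment
print(s.fillna(draws))
```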

Multiple Imputation
Multiple Imputation (MI) is a statistical technique for handling missing data.
The key concept of MI is to use the distribution of the observed data to
estimate a set of plausible values for the missing data. Random components
are incorporated into these estimated values to show their uncertainty.
Multiple datasets are created and then analysed individually but identically; the resulting estimates are then combined (pooled) into a single set of parameter estimates. As a flexible way of handling more than one missing variable, a Multiple Imputation by Chained Equations (MICE) approach can be applied. The benefit of multiple imputation is that, in addition to restoring the natural variability of the missing values, it incorporates the uncertainty due to the missing data, which results in valid statistical inference.
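
One accessible way to experiment with the chained-equations idea is scikit-learn's IterativeImputer, which models each feature with missing values from the other features in a round-robin fashion. Note that this sketch performs a single imputation inspired by MICE rather than full multiple imputation with pooling; the data are hypothetical:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Hypothetical numeric data with missing entries
X = np.array([
    [25.0, 50000.0, 0.7],
    [32.0, np.nan, 0.4],
    [np.nan, 61000.0, 0.9],
    [41.0, 72000.0, np.nan],
    [29.0, 58000.0, 0.6],
])

# Each feature with missing values is modelled from the other features,
# iterating until the imputations stabilize
imputer = IterativeImputer(max_iter=10, random_state=0)
X_imputed = imputer.fit_transform(X)
print(X_imputed)
```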

Predictive/Statistical models that impute the missing data

This should be done in conjunction with some kind of cross-validation scheme in order to avoid leakage. This can be very effective and can help with the final model. There are many options for such predictive models, including neural networks. Here I am listing a few which are very popular.

Linear Regression

In regression imputation, the existing variables are used to make a prediction, and then the predicted value is substituted as if it were an actually obtained value. This approach has a number of advantages: the imputation retains a great deal of data compared with list-wise or pair-wise deletion and avoids significantly altering the standard deviation or the shape of the distribution. However, as with mean substitution, although a regression imputation substitutes a value that is predicted from other variables, no novel information is added, while the sample size is increased and the standard error is reduced.

Random Forest
Random forest is a non-parametric imputation method applicable to various
variable types that works well with both data missing at random and not
missing at random. Random forest uses multiple decision trees to estimate
missing values and outputs OOB (out of bag) imputation error estimates. One
caveat is that random forest works best with large datasets and using random
forest on small datasets runs the risk of overfitting.

k-NN (k Nearest Neighbour)

k-NN imputes the missing attribute values on the basis of the K nearest neighbours. Neighbours are determined on the basis of a distance measure. Once the K neighbours are determined, missing values are imputed by taking the mean, median or mode of the known values of that attribute among the neighbours.
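
A brief sketch using scikit-learn's KNNImputer on hypothetical data (in practice, features are usually scaled first so that no single column dominates the distance measure):

```python
import numpy as np
from sklearn.impute import KNNImputer

X = np.array([
    [25.0, 50000.0, 0.7],
    [32.0, np.nan, 0.4],
    [np.nan, 61000.0, 0.9],
    [41.0, 72000.0, np.nan],
    [29.0, 58000.0, 0.6],
])

# Each missing value is replaced by the mean of that feature
# over the K nearest neighbours (distance computed on observed features)
imputer = KNNImputer(n_neighbors=2)
X_imputed = imputer.fit_transform(X)
print(X_imputed)
```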

Maximum likelihood

There are a number of strategies using the maximum likelihood method to handle the missing data. In these, the assumption that the observed data are
a sample drawn from a multivariate normal distribution is relatively easy to
understand. After the parameters are estimated using the available data, the
missing data are estimated based on the parameters which have just been
estimated.

Expectation-Maximization

Expectation-Maximization (EM) is a type of maximum likelihood method that can be used to create a new data set, in which all missing values are
imputed with values estimated by the maximum likelihood methods. This
approach begins with the expectation step, during which the parameters (e.g.,
variances, co-variances, and means) are estimated, perhaps using the list
wise deletion. Those estimates are then used to create a regression equation
to predict the missing data. The maximization step uses those equations to fill
in the missing data. The expectation step is then repeated with the new
parameters, where the new regression equations are determined to “fill in” the
missing data. The expectation and maximization steps are repeated until the
system stabilizes.

Sensitivity analysis

Sensitivity analysis is the study of how the uncertainty in the output of a model can be apportioned to the different sources of uncertainty in its inputs. When analysing missing data, additional assumptions about the reasons for the missing data are made, and these assumptions are often applicable to the primary analysis. However, the assumptions cannot be definitively validated for correctness. Therefore, the National Research Council has proposed that sensitivity analysis be conducted to evaluate the robustness of the results to deviations from the MAR assumption.
Algorithms that Support Missing Values

Not all algorithms fail when there is missing data. There are algorithms that can be made robust to missing data, such as k-Nearest Neighbours, which can ignore a column from a distance measure when a value is missing. There are also algorithms that can use the missing value as a unique and different value when building the predictive model, such as classification and regression trees. An algorithm like XGBoost takes missing data into consideration natively. If your imputation does not work well, try a model that is robust to missing data.
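
As a hedged illustration of native missing-value support, XGBoost learns a default split direction for missing entries, so NaN values can be passed in directly (hypothetical data; the xgboost package is assumed to be available):

```python
import numpy as np
from xgboost import XGBClassifier

# Hypothetical feature matrix containing NaN values, with binary labels
X = np.array([
    [25.0, np.nan, 0.7],
    [32.0, 61000.0, np.nan],
    [41.0, 72000.0, 0.9],
    [29.0, np.nan, 0.6],
])
y = np.array([0, 1, 1, 0])

# No explicit imputation: at each tree split, XGBoost routes missing values
# down a learned default branch
model = XGBClassifier(n_estimators=10, max_depth=2)
model.fit(X, y)
print(model.predict(X))
```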

Recommendations

Missing data reduces the power of a model. Some amount of missing data is expected, and the target sample size is usually increased to allow for it. However, that alone cannot eliminate the potential bias. More attention should be paid to missing data in the design and conduct of studies and in the analysis of the resulting data. Machine learning and imputation techniques should only be applied after maximal efforts have been made to reduce missing data through design and prevention.

A statistically valid analysis, with appropriate mechanisms and assumptions for the missing data, is strongly recommended. Most imputation techniques can cause bias. It is difficult to know whether multiple imputation or full maximum likelihood estimation is best, but both are superior to the traditional approaches. Both techniques are best used with large samples. In general, multiple imputation is a good approach when analysing data sets with missing data.

Source: https://towardsdatascience.com/all-about-missing-data-handling-b94b8b5d2184
IV. DATA CLASSIFICATION

A DEFINITION OF DATA CLASSIFICATION


Data classification is broadly defined as the process of organizing data by
relevant categories so that it may be used and protected more efficiently. On a
basic level, the classification process makes data easier to locate and
retrieve. Data classification is of particular importance when it comes to risk
management, compliance, and data security.

Data classification involves tagging data to make it easily searchable and trackable. It also eliminates multiple duplications of data, which can reduce
storage and backup costs while speeding up the search process. Though the
classification process may sound highly technical, it is a topic that should
be understood by your organization’s leadership.

REASONS FOR DATA CLASSIFICATION


Data classification has improved significantly over time. Today, the technology
is used for a variety of purposes, often in support of data security initiatives.
But data may be classified for a number of reasons, including ease of
access, maintaining regulatory compliance, and to meet various other
business or personal objectives. In some cases, data classification is a
regulatory requirement, as data must be searchable and retrievable within
specified timeframes. For the purposes of data security, data classification is a
useful tactic that facilitates proper security responses based on the type of
data being retrieved, transmitted, or copied.

TYPES OF DATA CLASSIFICATION


Data classification often involves a multitude of tags and labels that define the
type of data, its confidentiality, and its integrity. Availability may also be taken
into consideration in data classification processes. Data’s level of sensitivity is
often classified based on varying levels of importance or confidentiality, which
then correlates to the security measures put in place to protect each
classification level.

There are three main types of data classification that are considered
industry standards:

● Content-based classification inspects and interprets files looking for sensitive information

● Context-based classification looks at application, location, or creator among other variables as indirect indicators of sensitive information
● User-based classification depends on a manual, end-user selection of each
document. User-based classification relies on user knowledge and discretion
at creation, edit, review, or dissemination to flag sensitive documents.

Content-, context-, and user-based approaches can each be right or wrong depending on the business need and data type.

AN EXAMPLE OF DATA CLASSIFICATION


An organization may classify data as Restricted, Private or Public. In this
instance, public data represents the least-sensitive data with the lowest
security requirements, while restricted data is in the highest security
classification and represents the most sensitive data. This type of data
classification is often the starting point for many enterprises, followed by
additional identification and tagging procedures that label data based on its
relevance to the enterprise, quality, and other classifications. The most
successful data classification processes employ follow-up processes and
frameworks to keep sensitive data where it belongs.
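
As a rough, purely illustrative sketch of how a content-based rule might assign the Restricted/Private/Public labels described above (the patterns are simplified assumptions; real classifiers use far richer detection logic):

```python
import re

# Simplified detection patterns for illustration only
PATTERNS = {
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "email":       re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "ssn_like":    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def classify_content(text: str) -> str:
    """Return a sensitivity label based on the patterns found in the text."""
    hits = [name for name, pattern in PATTERNS.items() if pattern.search(text)]
    if "credit_card" in hits or "ssn_like" in hits:
        return "Restricted"
    if hits:
        return "Private"
    return "Public"

print(classify_content("Contact us at info@example.com"))  # Private
print(classify_content("Card: 4111 1111 1111 1111"))       # Restricted
```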

THE DATA CLASSIFICATION PROCESS


Data classification can be a complex and cumbersome process. Automated
systems can help streamline the process, but an enterprise must determine
the categories and criteria that will be used to classify data, understand and
define its objectives, outline the roles and responsibilities of employees in
maintaining proper data classification protocols, and implement security
standards that correspond with data categories and tags. When done
correctly, this process will provide employees and third parties involved in the
storage, transmission, or retrieval of data with an operational framework.

GDPR DATA CLASSIFICATION


With the General Data Protection Regulation (GDPR) in effect, data
classification is more imperative than ever for companies that store, transfer,
or process data pertaining to EU citizens. It is crucial for these companies to
classify data so that anything covered by the GDPR is easily identifiable and
the appropriate security precautions can be taken.

Additionally, GDPR provides elevated protection for certain categories of personal data. For instance, GDPR explicitly prohibits the processing of data
related to racial or ethnic origin, political opinions, and religious or
philosophical beliefs. Classifying such data accordingly can significantly
reduce the risk of compliance issues.

STEPS FOR EFFECTIVE DATA CLASSIFICATION


● Understand the Current Setup: Taking a detailed look at the location of
current data and all regulations that pertain to your organization is perhaps
the best starting point for effectively classifying data. You must know what
data you have before you can classify it.
● Create a Data Classification Policy: Staying compliant with data
protection principles in an organization is nearly impossible without proper
policy. Creating a policy should be your top priority.

● Prioritize and Organize Data: Now that you have a policy and a picture of
your current data, it’s time to properly classify the data. Decide on the best
way to tag your data based on its sensitivity and privacy.

There are more benefits to data classification than simply making data easier
to find. Data classification is necessary to enable modern enterprises to make
sense of the vast amounts of data available at any given moment.

Data classification provides a clear picture of all data within an organization’s control and an understanding of where data is stored, how to easily access it, and the best way to protect it from potential security risks. Once implemented, data classification provides an organized framework that facilitates more effective data protection measures and promotes employee compliance with security policies.

Source: https://digitalguardian.com/blog/what-data-classification-data-classification-definition

What is Data Classification

Data classification tags data according to its type, sensitivity, and value to the
organization if altered, stolen, or destroyed. It helps an organization
understand the value of its data, determine whether the data is at risk, and
implement controls to mitigate risks. Data classification also helps an
organization comply with relevant industry-specific regulatory mandates such
as SOX, HIPAA, PCI DSS, and GDPR.

Data Sensitivity Levels

Data is classified according to its sensitivity level—high, medium, or low.

● High sensitivity data—if compromised or destroyed in an unauthorized transaction, would have a catastrophic impact on the organization or individuals. For example, financial records, intellectual property, authentication data.

● Medium sensitivity data—intended for internal use only, but if compromised or destroyed, would not have a catastrophic impact on the organization or individuals. For example, emails and documents with no confidential data.

● Low sensitivity data—intended for public use. For example, public website content.
Data Sensitivity Best Practices

Since the high, medium, and low labels are somewhat generic, a best practice
is to use labels for each sensitivity level that make sense for your
organization. Two widely-used models are shown below.

SENSITIVITY    MODEL 1              MODEL 2
High           Confidential         Restricted
Medium         Internal Use Only    Sensitive
Low            Public               Unrestricted

If a database, file, or other data resource includes data that can be classified
at two different levels, it’s best to classify all the data at the higher level.
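
A minimal sketch of that rule, assuming a simple ordered list of labels (Public < Internal Use Only < Confidential); the label names and function are illustrative only, not tied to any particular product:

# Illustrative only: pick the stricter of two classification labels.
LEVELS = ["Public", "Internal Use Only", "Confidential"]  # ordered low -> high

def combined_label(label_a: str, label_b: str) -> str:
    """Return the higher (more restrictive) of the two labels."""
    return max(label_a, label_b, key=LEVELS.index)

# A resource mixing Public and Confidential records is treated as Confidential.
print(combined_label("Public", "Confidential"))  # -> Confidential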

Types of Data Classification

Data classification can be performed based on content, context, or user selections:

● Content-based classification—involves reviewing files and documents and classifying them based on the sensitive information they contain (a minimal sketch follows this list).
● Context-based classification—involves classifying files based on metadata like the application that created the file (for example, accounting software), the person who created the document (for example, finance staff), or the location in which files were authored or modified (for example, finance or legal department buildings).
● User-based classification—involves classifying files according to the manual judgement of a knowledgeable user. Individuals who work with documents can specify how sensitive they are—they can do so when they create the document, after a significant edit or review, or before the document is released.
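
As a rough illustration of the content-based approach above, the following sketch scans text for patterns that often indicate sensitive information (a card-like number, an email address). The patterns, labels, and function names are simplified assumptions for teaching purposes, not a production-grade scanner:

import re

# Illustrative patterns only; real classifiers use far more robust rules.
PATTERNS = {
    "High":   re.compile(r"\b(?:\d[ -]?){13,16}\b"),       # card-like number
    "Medium": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # email address
}

def classify_content(text: str) -> str:
    """Return the highest sensitivity label whose pattern matches the text."""
    for label in ("High", "Medium"):  # check the most sensitive pattern first
        if PATTERNS[label].search(text):
            return label
    return "Low"

print(classify_content("Paid with card 4111 1111 1111 1111"))  # High
print(classify_content("Contact us at support@example.com"))   # Medium
print(classify_content("Our store opens at 9 am"))             # Low

A context-based classifier would instead look at metadata fields (creating application, author, storage location), and a user-based approach would simply store the label a knowledgeable user picks.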

Data States and Data Format

Two additional dimensions of data classifications are:

● Data states—data exists in one of three states—at rest, in process, or in transit. Regardless of state, data classified as confidential must remain confidential.

● Data format—data can be either structured or unstructured. Structured data are usually human readable and can be indexed. Examples of structured data are database objects and spreadsheets. Unstructured data are usually not human readable or indexable. Examples of unstructured data are source code, documents, and binaries. Classifying structured data is less complex and time-consuming than classifying unstructured data.

Data Discovery

Classifying data requires knowing the location, volume, and context of data.
Most modern businesses store large volumes of data, which may be spread
across multiple repositories:

● Databases deployed on-premises or in the cloud

● Big data platforms

● Collaboration systems such as Microsoft SharePoint

● Cloud storage services such as Dropbox and Google Docs

● Files such as spreadsheets, PDFs, or emails

Before you can perform data classification, you must perform accurate and
comprehensive data discovery. Automated tools can help discover sensitive
data at large scale. See our article on Data Discovery for more information.

The Relation Between Data Classification and Compliance

Data classification must comply with relevant regulatory and industry-specific mandates, which may require classification of different data attributes. For example, the Cloud Security Alliance (CSA) requires that data and data objects include data type, jurisdiction of origin and domicile, context, legal constraints, sensitivity, etc. PCI DSS does not require origin or domicile tags.

Creating Your Data Classification Policy

A data classification policy defines who is responsible for data classification—typically by defining Program Area Designees (PAD) who are responsible for classifying data for different programs or organizational units.

The data classification policy should consider the following questions:

● Which person, organization or program created and/or owns the information?

● Which organizational unit has the most information about the content and context of the information?

● Who is responsible for the integrity and accuracy of the data?

● Where is the information stored?

● Is the information subject to any regulations or compliance standards, and what are the penalties associated with non-compliance?

Data classification can be the responsibility of the information creators, subject matter experts, or those responsible for the correctness of the data.

The policy also determines the data classification process: how often data
classification should take place, for which data, which type of data
classification is suitable for different types of data, and what technical means
should be used to classify data. The data classification policy is part of the
overall information security policy, which specifies how to protect sensitive
data.

Data Classification Examples

Following are common examples of data that may be classified into each
sensitivity level.

Sensitivity Level   Examples

High                Credit card numbers (PCI) or other financial account numbers, customer personal data, FISMA protected information, privileged credentials for IT systems, protected health information (HIPAA), Social Security numbers, intellectual property, employee records.

Medium              Supplier contracts, IT service management information, student education records (FERPA), telecommunication systems information, internal correspondence not including confidential data.

Low                 Content of public websites, press releases, marketing materials, employee directory.

Source: https://www.imperva.com/learn/data-security/data-classification/
V. BUSINESS MODELING, METRICS AND
MEASUREMENT

Leveraging the Business Model Canvas for Analytics


If you have worked with the business model canvas, you may be wishing to
leverage it more fully in your day-to-day operations. You might be wondering,
"How do I measure the execution of the different elements on the canvas?
How do I identify and communicate the most critical aspects of the elements
arranged on the canvas?"
This article takes a hypothetical business model for Toro, the leading
worldwide provider of innovative solutions for the outdoor environment. From
lawnmowers in your garage to the golf course, Toro provides a wide array of
tools for residential and professional use. (The business model below was
built from details of quarterly analyst calls and available online research. No
relationship with Toro is implied.)
A Hypothetical Canvas for Toro
While the completion of the Strategyzer canvas for Toro is outside the scope
of this article, you may note critical business model elements broken into key
sections: the value proposition, the customer, revenue generation, partners
and so forth.
This work is generally completed in group work sessions as strategies are
conceived and revised.
The Business Model Canvas as Data Dashboard
After the canvas is complete it becomes primarily a tool of reference. What we
propose here is to extend the canvas into a day-to-day tool to assess the
results from the canvas.
We want to first measure the execution of the plan through a small set of key
performance indicators (KPIs). To arrive at these KPIs, we suggest distilling
the critical performance questions of the model. Bernard Marr, the innovator of the key performance question (KPQ), has described this distillation process in detail.
These key performance questions (KPQs) provide a focused set of subject areas for the development of the KPIs themselves. Rather than proceed with an open-ended selection of individual KPIs, the use of key performance questions and the business model canvas creates clear boundaries around "what is important."

❖ Keep the KPI Count Manageable

Focus is not the only concern when developing KPIs for the canvas. There is also the question of quantity. We strongly advise keeping the count down to a limited set that can be committed to memory. Ideally, this would be fewer than 10; Marr's personal recommendation is 7.
It won't come as a surprise that the rigor of identifying the KPQs and KPIs can
be challenging. We want the executives, managers and employees of an
organization to align around them, to be familiar with them and to ground their
initiatives with an eye to these metrics. It's a tall order, but well worth it.
Integrating KPIs Directly on the Canvas
Now let's marry the KPIs to the canvas. We'll continue with the Toro example.
As you've seen above, we've mapped the KPIs directly onto the surface of the
canvas and associated them with the appropriate part of the canvas. While
there is a wide range of aesthetic approaches to this, the key outcome must
be that we can assess - quantitatively - how the business model's vision is
unfolding. Simple KPI colors draw the eye to status - and to the relative difficulty of resolving each issue.

❖ The Indicators Will Change Over Time

Of course, as critical performance questions are resolved, new "bottleneck" considerations will come to the fore. New KPIs can then be swapped in for the old, the organization can be brought along on the journey, and the shared vision for the enterprise maintained.
Most importantly, the business model canvas can take its place as a living tool
and top-of-mind for everyone in the organization. Consistently distributing the
canvas directly to employees, texting links to it and making it visible in
strategic locations will all help highlight its importance - and ensure everyone
has access to the company strategy.
Alignment with a Traditional Dashboard
While the canvas shown above can be a useful enterprise visualization, it is equally desirable to flesh out these same KPIs into a more actionable format.
This can be done through a coordinated KPI dashboard. The metric selection,
measurement, definitions and color-coding should be consistent.
Of course, the metrics on the KPI dashboard can provide further interactive
features as well.

❖ What Gets Measured Gets Done

The creation of Strategyzer's business model canvas doesn't need to be a "one-and-done." We see it as the beginning of the journey rather than a culmination. Developed correctly, it can become the foundation of a powerful approach to organizational communication, alignment and status.
With the addition of overlaid measurement, employees can see both what they're doing - and how well they're doing it.

Source: http://www.bartlettsystem.com/blog/business-model-canvas-integrating-kpis

An Overview of Business Metrics for Data-Driven Companies

Essential Business Skills for Success:

● Learn best practices for using data analytics to help a business achieve its objectives.

● Learn to recognize the most critical business metrics and distinguish them from mere data.

● Clearly understand the roles played by Business Analysts, Business Data Analysts, and Data Scientists.

● Know exactly the skills required to be hired and to succeed in each of these roles.

What about Business Metrics?

● Metrics help us ask the right questions, measure success, and give feedback to improve our strategies.

● There are three broad categories of metrics: revenue, profitability, and risk metrics.

● Revenue metrics relate to sales and marketing; profitability metrics relate to efficiency and operations; risk metrics relate to sustainability given present cash-flow conditions.

● There are also traditional versus dynamic metrics. Change is not easy to see in a traditional metric, like quarterly revenue. It is easy to measure in a dynamic metric, like website visits getting converted to clicks and eventually purchases.

● Consider a hypothetical business model: the business has to pay cash on delivery for raw goods from its vendor, it takes one month to produce and sell its product, and its payment terms are “Net 60”. This means a three-month delay in getting a return on investment, which is technically called “negative float”. As the business receives more and more orders month on month, even if it had cash reserves to start with, it would eventually face cash-flow problems, because it has to pay its vendors immediately for the increased raw goods needed. This can be prevented with a loan against accounts receivable. More orders mean more profit on the books, as cost per unit of production goes down, but the business would go bankrupt if the negative float is not handled. The important point to note is that a profitability metric alone is not sufficient; handling risk, not just encouraging rapid growth, is also an important part of the game. (A worked example follows this list.)
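
To make the cash-flow timing concrete, here is a minimal worked sketch of the hypothetical model above (pay the vendor immediately, collect three months later, orders growing every month). All figures are invented for illustration:

# Hypothetical figures: raw-goods cost grows 20% a month, each order sells
# for 1.5x its raw-goods cost, and cash comes in 3 months after the goods
# are paid for (one month of production plus "Net 60" payment terms).
months = 12
cash = 50_000  # starting cash reserve
cost = [10_000 * 1.2 ** m for m in range(months)]                       # cash out in month m
receipts = [1.5 * cost[m - 3] if m >= 3 else 0 for m in range(months)]  # cash in, 3 months later

for m in range(months):
    cash += receipts[m] - cost[m]
    note = "  <-- cash crunch despite profitable orders" if cash < 0 else ""
    print(f"month {m:2d}: cash on hand = {cash:10,.0f}{note}")

Even though every order is sold at a 50% markup, the growing gap between paying vendors and collecting from customers eventually pushes the cash balance negative, which is the "negative float" risk described above.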

Traditional Metrics: Personal Sales

This is an example of a metric that is not data-driven.

● There is something called the “Traditional Enterprise Sales Funnel”. It historically refers to big sales that need the full-time involvement of salespeople in your company and can involve multiple meetings, visits to prospective customers, etc.

● The amount of revenue that a traditional enterprise sale brings in should justify the cost, or else the firm would go out of business. The rule of thumb is that a $250k one-time sale, or a recurring yearly sale of $100k, is the revenue it should bring in to sustain the business.

● People are normally happy to talk about how interested they are in your product and to introduce you to many people who may not actually have any budget to buy, and they may even make you travel and present at your own expense. Businesses need to be cautious of such cases.

● Great salespeople in an organization have a knack for identifying decision makers, weeding out the rest, and quickly getting past the gatekeepers.

● But this still does not mean the sale has happened. A lot of things can go wrong, like the decision maker quitting, the company getting acquired, or the project getting scrapped.

● Some key metrics are sales leads, qualified sales leads, the time taken to get to the right person, and the time taken to get them to say yes.

Dynamic Metrics:

Computer Sales — Data Driven — This section covers Revenue Metrics

● These are explained by taking the example of Amazon. This can be extended to any business, retail or otherwise, that makes most of its sales through a web interface.

● Examples of dynamic metrics are click rates, most sought-after items, “people who viewed this item also viewed these” lists, etc.

● They allow customization based on user activity in real time, that is, “right now”!

● When a search string is typed in Amazon, there is a complex mechanism that retrieves best sellers related to the user query in a way that maximizes the probability of buying. Key metrics: top subject-area categories related to the search terms, and top best-selling books within each subcategory closely matching the query.

● Why, despite showing frequently-bought-together items, does Amazon not offer a discount? Presumably their data has shown that doing so has not resulted in an increase in sales. I found this an interesting, counter-intuitive point.
● Amazon is not telling us what percentage of people who bought “this”
book bought “that” book too, but clearly it is tracking that data, which is
formally known as co-occurrence data. Co-occurrence sales is an
important Dynamic Metric for Amazon.

● Also note that it does not say “most frequently bought together”, only “frequently bought together”. There is a possibility that, out of the top 100 co-occurrences, the choice to display one co-occurrence over another is made through A/B testing that results in better sales and revenue. Remember, these numbers depend on book prices and volumes.

● Amazon also maintains co-occurrence data for clicks, so that it can tempt the user into buying just by showing a “people also viewed this item” list. (A minimal co-occurrence counting sketch follows this list.)
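
The co-occurrence idea can be sketched very simply: count how often pairs of items appear in the same basket. This is an illustrative toy with made-up baskets, not Amazon's actual mechanism:

from collections import Counter
from itertools import combinations

# Toy purchase baskets; each inner list is one customer's order.
baskets = [
    ["book_a", "book_b"],
    ["book_a", "book_b", "book_c"],
    ["book_a", "book_c"],
    ["book_b", "book_c"],
]

# Count how often each unordered pair of items is bought together.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(set(basket)), 2):
        pair_counts[pair] += 1

# "Frequently bought together" candidates for book_a, most frequent first.
for (x, y), n in pair_counts.most_common():
    if "book_a" in (x, y):
        other = y if x == "book_a" else x
        print(f"book_a is bought together with {other} in {n} basket(s)")

The same counting can be run on click data instead of purchases, which is the basis for the "people also viewed this item" lists; which pair actually gets displayed could then be decided by A/B testing, as speculated above.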

Profitability/Efficiency Metrics:

● Inventory Management: Take any retailer that sells goods, where inventory and stock keeping units (SKUs) come into the picture. The sales price being the same, the cost to manage inventory is the deciding factor. The more days an item spends in inventory (that is, on the shelf or in storage), the lower the profit, because of many factors. The average number of days in inventory, called “days inventory”, is one profitability metric. The company's inventory on hand at the end of the year, divided by the total annual cost of goods sold and then multiplied by 365 days of the year, is a very good estimate of average days inventory. (A worked example follows this list.)

● Too little inventory can cause a customer to go away empty-handed and never visit the store again, which is hard to know about. A good metric to capture this lost customer is the number of times the inventory of any particular SKU reached zero.

● Good practices: If a customer walks out empty-handed, follow up with a question (plus a gift voucher) about whether there was anything they wanted and could not find. Another practice is to ask them at the checkout anyway and record the answer. This kind of data could be used effectively by a data analyst.

● Hotel Room and Airline Examples: If one seat on a flight is not occupied, that is a lost opportunity; the same applies to a hotel room that is not rented. The variable cost per unit is negligible compared to the huge fixed cost, which is called “sunk cost”. So there is the concept of variable pricing: analyze occupancy rates for weekly or seasonal patterns, predict the expected occupancy rate, and, based on these factors, offer three different types of rates: the rack rate, an intermediate or promotional rate, and a floor rate.
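
A worked example of the days-inventory estimate described in the first bullet above, using invented year-end figures:

# Hypothetical year-end figures for a small retailer.
inventory_on_hand = 150_000   # value of inventory at year end
annual_cogs = 1_200_000       # total annual cost of goods sold

# Average days in inventory ~= (inventory on hand / annual COGS) * 365.
days_inventory = inventory_on_hand / annual_cogs * 365
print(f"Estimated days inventory: {days_inventory:.1f} days")  # about 45.6 days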

Risk Metric:
● When a company owes more than it is worth (“excessive leverage”), it is unlikely to survive.

● Reputation risk: Preventing a brand from getting a bad reputation is often done through a “level one product recall”. Time to product recall is a good risk metric. Costco is an example: recalls are easy to track because of the membership requirement. In Jan-Feb 2010, 272 people in 44 states got sick with a genetically identical strain of salmonella. Analysts went through Costco's purchase records and traced down the product that caused it.

Source: https://towardsdatascience.com/an-overview-of-business-metrics-for-data-driven-companies-b0f698710da1

What Are Metrics?

Metrics are measures of quantitative assessment commonly used for assessing, comparing, and tracking performance or production. Generally, a group of metrics is used to build a dashboard that management or analysts review on a regular basis to maintain performance assessments, opinions, and business strategies.

Understanding Metrics

Metrics have been used in accounting, operations, and performance analysis throughout history.

Metrics come in a wide range of varieties, with industry standards and proprietary models often governing their use.

Executives use them to analyze corporate finance and operational strategies. Analysts use them to form opinions and investment recommendations. Portfolio managers use metrics to guide their investing portfolios. Furthermore, project managers also find them essential in leading and managing strategic projects of all kinds.

Overall, metrics refer to a wide variety of data points generated from a multitude of methods. Best practices across industries have created a common set of comprehensive metrics used in ongoing evaluations. However, individual cases and scenarios typically guide the choice of metrics used.

Choosing Metrics

Every business executive, analyst, portfolio manager, and project manager has a range of data sources available to them for building and structuring their own metric analysis. This can potentially make it difficult to choose the best metrics needed for important assessments and evaluations. Generally, managers seek to build a dashboard of what has come to be known as key performance indicators (KPIs).

In order to establish a useful metric, a manager must first assess the business's goals. From there, it is important to find the best outputs that measure the activities related to those goals. A final step is setting goals and targets for KPI metrics that are integrated with business decisions.

Academics and corporate researchers have defined many industry metrics
and methods that can help shape the building of KPIs and other metric
dashboards. An entire decision analysis method called applied information
economics was developed by Douglas Hubbard for analyzing metrics in a
variety of business applications. Other popular decision analysis methods
include cost-benefit analysis, forecasting, and Monte Carlo simulation.

Several businesses have also popularized certain methods that have become
industry standards in many sectors. DuPont began using metrics to better
their own business and in the process came up with the popular DuPont
analysis which closely isolates variables involved in the return on equity
(ROE) metric. GE also helped popularize a set of metrics known as Six Sigma that are commonly used today, with metrics tracked in six key areas: critical to quality; defects; process capability; variation; stable operations; and design for Six Sigma.

Examples of Metrics
While there are a wide range of metrics, below are some commonly used
tools:

Economic Metrics

● Gross domestic product (GDP)


● Inflation
● Unemployment rate

Operational Company Metrics

From a comprehensive perspective, executives, industry analysts, and individual investors often look at key operational performance measures of a
company, all from different perspectives. Some top-level operational metrics
include measures derived from the analysis of a company’s financial
statements. Key financial statement metrics include sales, earnings before
interest and tax (EBIT), net income, earnings per share, margins, efficiency
ratios, liquidity ratios, leverage ratios, and rates of return. Each of these
metrics provides a different insight into the operational efficiency of a
company.

Executives use these operational metrics to make corporate decisions
involving costs, labor, financing, and investing. Executives and analysts also
build complex financial models to identify future growth and value prospects,
integrating both economic and operational metric forecasts.

There are several metrics that are key to comparing the financial position of
companies against their competitors or the market overall. Two of these key
comparable metrics, which are based on market value,
include price-to-earnings ratio and price-to-book ratio.

Portfolio Management

Portfolio managers use metrics to identify investing allocations in a portfolio.


All types of metrics are also used for analyzing and investing in securities that
fit a specific portfolio strategy. For example, environmental, social and
governance (ESG) criteria are a set of standards for a company's operations
that socially conscious investors use to screen potential investments.

Project Management Metrics

In project management, metrics are essential in measuring project
progression, output targets, and overall project success. Some of the areas
where metric analysis is often needed include resources, cost, time, scope,
quality, safety, and actions. Project managers have the responsibility to
choose metrics that provide the best analysis and directional insight for a
project. Metrics are followed in order to measure the overall progression,
production, and performance.

KEY TAKEAWAYS

● Metrics are measures of quantitative assessment commonly used for comparing and tracking performance or production.
● Metrics can be used in a variety of scenarios.
● Metrics are heavily relied on in the financial analysis of companies by
both internal managers and external stakeholders.

Source: https://www.investopedia.com/terms/m/metrics.asp

Business Metric or Key Performance Indicator? What's the Difference?

To be effective, business metrics should be compared to established benchmarks or business objectives. This provides valuable context for the values used in the metric and allows business users to better act on the information they are viewing. For instance, $20M in sales in Q4 sounds like an impressive figure; however, if you're Boeing Aircraft, this figure would have you contemplating filing for bankruptcy.
Context allows business metrics to make an impact. In fact, this is where the line between key performance indicators and performance metrics becomes blurry. The difference between the two ultimately comes down to this:

● Business metrics are used to track all areas of business.

● KPIs target critical areas of performance.

For example, a metric may monitor website traffic compared to a goal, whereas a
KPI would monitor how website traffic contributed to incremental sales.

Examples of Business Metrics

Depending on your company and the areas you’re aiming to monitor, you may
want to focus on certain business metrics in particular. Here, you can see a
number of performance metrics examples for industry verticals and departments
that are available to you:

Marketing Metrics

Marketing Metrics are measurable values used by marketing teams to display the overall performance of social platform accounts, campaigns, lead nurturing, etc. Monitoring digital marketing KPIs can help your team stay on target from month to month. With the vast range of different marketing channels used by teams, it is crucial for marketing teams to actively track their progress using the right and most effective metrics. Depending on the channels your team is monitoring, metrics and KPIs will vary. Below are a few examples of key marketing metrics:
Web Traffic Sources

Incremental Sales

Social Sentiment
End Action Rate

SEO Keyword Ranking

SEO Traffic
Sales Metrics

In order to thrive in a highly competitive business environment, organizations must be in control of their sales. The best way to gain control over your sales is to provide your sales team with the right performance indicators and metrics. Here you can find the sales KPIs and metrics that are crucial to monitor:

Sales Growth

Average Profit Margin


Average Purchase Value

Product Performance

Financial Metrics

Company success rides on generating revenue and properly managing your finances. It’s not only customers that will be scouring your financial data, but also potential investors and stockholders; not having control of your financials can turn key people away from your organization. Use these metrics to monitor and prove the fiscal health of your business (standard definitions are sketched after the list):
Quick Ratio / Acid Test

Debt to Equity Ratio

Current Ratio
Working Capital
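
As a quick reference, the sketch below computes these four figures from a toy balance sheet using their standard textbook definitions; all numbers are invented:

# Toy balance-sheet figures (invented for illustration).
current_assets = 500_000
inventory = 120_000
current_liabilities = 250_000
total_liabilities = 600_000
shareholders_equity = 400_000

quick_ratio = (current_assets - inventory) / current_liabilities  # acid test
current_ratio = current_assets / current_liabilities
debt_to_equity = total_liabilities / shareholders_equity
working_capital = current_assets - current_liabilities

print(f"Quick ratio (acid test): {quick_ratio:.2f}")    # 1.52
print(f"Current ratio:           {current_ratio:.2f}")  # 2.00
print(f"Debt to equity:          {debt_to_equity:.2f}") # 1.50
print(f"Working capital:         {working_capital:,}")  # 250,000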


SaaS Metrics

SaaS (software as a service) companies need to pay close attention to metrics that display their ability to retain customers, generate recurring revenue, and attract customers. These examples are some of the top metrics for SaaS companies:

Customer Lifetime Value


Customer Churn Rate

Monthly Recurring Revenue

Customer Retention Rate


Social Media Metrics

Social Media Metrics are values used by marketing teams to track the performance of social media campaigns. Social media marketing is a fundamental part of any business, bringing in website visits and eventually converting web users into leads. Since marketing teams often use multiple social media platforms to increase impressions, it can be difficult to monitor performance on all of them. These social media metrics combine the most important data and allow your team to track their progress:

Social Followers vs Target

Facebook Page Stats


Twitter Followers Metric

Key Social Metrics

And More Business Metrics

Every business has data that is critical to monitor in real-time, or over time.
These metric examples are applicable (and important) to a multitude of
departments and fields:
Time to Healthcare Service

Service Level

Call Abandonment
Project Burndown Metric

Monitor Your Performance With The Right Business Metrics Dashboard

Business performance metrics are crucial in keeping teams, executives, investors,
and customers informed and aware of how a company is performing. The easiest
and most effective way to stay on top of your company’s performance is by having
your key metrics on a comprehensive business dashboard. Different departments
need to keep an eye on different metrics, so what the right business dashboard is
will vary from department to department, and from company to company.

Source: https://www.klipfolio.com/resources/articles/what-are-business-metrics

Take Advantage Of Operational Metrics & KPI Examples – A Comprehensive Guide
By Sandra Durcevic in Data Analysis, Nov 21st 2018
“What gets measured gets done.” – Peter Drucker

Using data in today’s businesses is crucial for evaluating success and gathering the insights needed for a sustainable company. Identifying what is working and what is not is one of the invaluable management practices that can decrease costs, determine the progress a business is making, and compare it to organizational goals. By establishing clear operational metrics and evaluating performance, companies gain the advantage of using what is crucial to staying competitive in the market, and that’s data.

We have written about management reporting methods that can be utilized in the modern practice of creating powerful analyses, bringing complex data into simple visuals, and employing them to make actionable decisions. Now we will focus on operational KPIs and metrics that can ultimately bring indispensable value to the overall business performance, by concentrating on the most important business question: what can I do to perform even better?

But let’s start with the basics of business operations, and provide foundations
for analyzing your own metrics and KPIs while focusing on industry and
company department-specific examples which a business can use for its own
development. We will discuss Marketing, Human Resources, Sales, Logistics
and IT Project Management examples that can grow the operational efficiency
and decrease costs.

What Is An Operational KPI?

An operational KPI is a quantifiable value expressing business performance over a shorter time frame. Operational KPIs are used in different industries to track organizational processes, improve efficiency, and help businesses to understand and reflect on the outcomes.
By analyzing KPI examples for a specific industry or function, a business can reduce the amount of time needed to evaluate its overall performance. An additional important thing to consider is which ones a business should implement in order to gain sustainable success and maintain its competitiveness in the market.

How To Select Operational Metrics And KPIs?

Since every business is different, it is essential to establish the specific metrics and KPIs to measure, follow, calculate, and evaluate. As mentioned earlier, both are used to measure business performance, so we will discuss which should be used in which scenarios and what to be careful about when selecting the right ones for your business.

Key performance indicators in a business management environment should be built around four primary parameters that need to be taken into consideration:

1. What exactly needs to be measured?
2. Who will measure it?
3. What is the time interval between measurements?
4. How frequently is the information sent to the management level?

Turning these datasets into a business dashboard can effectively track the
right values and offer a comprehensive application to the entire business
system.

The analysis of operational KPIs and metrics with the right KPI software can be developed easily by turning raw data into a neat and interactive online dashboard, providing insights that are easily overlooked in traditional means of reporting and analysis, like spreadsheets or simple written reports. Operational KPIs and metrics can be immense and boundless if not defined and used properly, so taking care of the basics we have outlined should be one of the top priorities when deciding which ones to use. Later we will discuss examples, so that there is a clear overview of which ones to identify and utilize, at an industry and function level.

When a business is measuring the effectiveness of a process, metrics and KPIs are often established to perform the evaluation and analysis. A key factor to consider is also to take a holistic view of the operational metrics that are being identified and used. A business cannot track only one metric and expect to achieve sustainable development. By using multiple types of metrics, companies can leverage more data and acquire the insights needed for success.

Top 10 Operational Metrics Examples

While there are any number of operational metrics to choose from, a company needs to be careful about which ones will be of the utmost importance and value. That being said, we will discuss operational metrics examples that can be used in business processes and outline the most prominent ones, using business analytics tools as our invaluable assistance.

1. Marketing: CPC (Cost-per-Click)

The need to establish specific operational metrics and track their efficiency
creates invaluable results for any marketing campaign. Let's see this through
an example.
The CPC (Cost-per-Click) overview of campaigns is an operational metric that reflects the standard pricing model in online advertising. By comparing different campaigns in the CPC section of the overall strategy, you can easily spot which one had the lowest price and dig deeper into the details. While this marketing KPI is priceless when it comes to advertising, it should be viewed in relation to other important operational metrics. Below in the article you can find a holistic overview of the different kinds of KPIs used in standard marketing practice.

2. Marketing: CPA (Cost-per-Acquisition)


Another example we could analyze is the CPA (Cost-per-Acquisition) in
correlation with the specific marketing channel, as presented in the visual
above. The CPA metric is even more performance-based since it's
concentrated on the price of acquiring a customer, not clicks made to a
website. Using these indicators to reflect on the outcomes of a campaign and
establish future processes can be of invaluable significance.
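
A minimal sketch of how these two metrics are typically computed per campaign; the campaign figures are invented:

# Hypothetical campaign results.
campaigns = {
    "search_ads": {"spend": 5_000, "clicks": 2_500, "acquisitions": 125},
    "social_ads": {"spend": 3_000, "clicks": 4_000, "acquisitions": 60},
}

for name, c in campaigns.items():
    cpc = c["spend"] / c["clicks"]        # cost per click
    cpa = c["spend"] / c["acquisitions"]  # cost per acquired customer
    print(f"{name}: CPC = ${cpc:.2f}, CPA = ${cpa:.2f}")

Note that in this made-up data the campaign with the lower CPC has the higher CPA, which is exactly why the article recommends viewing CPC in relation to other operational metrics.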

3. Human Resources: Absenteeism Rate


Another example comes from the HR field and considers the engagement of employees. This is an extremely important HR KPI since it concentrates on the main workforce actions needed to establish a successful HR strategy: the number of employees calling in sick, missing work, or skipping shifts can tell the organization what kind of impact this will have in the long run.

4. Human Resources: Overtime Hours


The workload of employees is an operational KPI that can impact the
Absenteeism Rate, if the workforce is understaffed and deals with higher
amounts of pressure. This performance indicator should be monitored in detail
since it can be interpreted differently, according to the context (for example, is
the economic growth or high volume of orders causing overtime hours, or
something completely different?).

5. Sales: Lead-to-Opportunity Ratio


In this sales example above, the Lead-to-Opportunity ratio provides insights
into the number of leads a sales professional or manager needs to stay on
target with revenue goals. Since this is the first part of the sales funnel, you
can easily spot which leads have turned into qualified ones and easily
calculate the ratio. It would make sense to dig deeper into the exact source of
qualified leads so that you can guide the marketing and sales team even
better.

6. Sales: Lead Conversion Ratio


One of the most important sales KPIs is the Lead Conversion Ratio - it defines
the number of interested people that turned into actual paying customers - a
magic sales number indeed. After finding your baseline, you will understand
how many leads you need to obtain for a healthy sales pipeline. If the
conversion rate is low, you can be sure that the pipeline needs additional
adjusting.

7. Logistics: Delivery Time


A standard logistics KPI, Delivery Time, measures the time from the moment an order is placed for shipping until the moment it is delivered to the customer or the post office. The average will then show you where you need to decrease these values and provides a basis for specifying the exact time your customers can expect their package.

8. Logistics: Transportation Costs


All the costs related to the transportation process can be seen in this example
above: the order processing, administrative costs, inventory carrying,
warehousing and, finally, the actual transportation costs. This will help
determine the average numbers and the distribution expressed in percentage.
The final goal is to decrease the costs while maintaining a high-quality
delivery process.

9. IT: Total Tickets vs Open Tickets


The overall progress of the project is one of the top IT KPIs to measure. When visualizing the overall progress in correlation with the launch date, management can easily spot whether there are issues across the system. That's why it is also important to monitor the workload of staff and their deadlines, as displayed in the example above. Measuring open tickets versus completed ones can set benchmarks for project management and help in the optimization of the overall ticketing system.

10. IT: Average Handle Time


Another example from the IT project management function is the Average
Handle Time of tasks. It helps in the process of monitoring planned projects,
tasks and/or Sprints. By evaluating each member of the team, alongside the
overall average handle time of tasks, you can easily spot if any deficiency is
occurring in the system, and, therefore, adjust accordingly.

Interconnected Operational Metrics And KPIs

After we have provided specific KPIs from industries and functions, now we
will focus on a holistic overview, and how they are interconnected into an
overall operational process. Let's analyze this through examples.

Marketing: Is my budget on track?


The operational metrics and KPIs example presented on the dashboard above focuses on the marketing performance of specific campaigns at an operational level. Its significance lies in the fact that this clear overview can help marketing managers and professionals develop a comprehensive data-driven marketing strategy. Changes will alert the marketing team, which can then optimize the campaign and make sure the budget stays on track.

Human Resources: Is our productivity on track?

The second of our operational metrics examples we will focus on is employee performance, shown through the HR dashboard presented below. This interactive dashboard example shows relevant metrics to keep under closer consideration while analyzing employees’ performance and behavior. The absenteeism rate metric should be monitored since it can affect the financial state of a business (holistic view, remember?), but, most importantly, it can provide insights into the potential causes and reasons for absence. This can then be used to improve business operations and productivity, and subsequently reduce costs.

Another interesting metric to take into account is overtime hours. That way it is easy to spot whether employees are understaffed or lack training, which can also affect productivity. The main focus is not to put workers under pressure, which can lead to demotivation. A comprehensive HR report can provide all the effectiveness needed to develop and maintain a sustainable workforce in a company.

Sales: What details should I keep an eye on?

The next of our operational metrics examples is sales. When considering the sales cycle process, it is of the utmost importance to compile a succinct operations-monitoring process to ensure that all the sales stages leading to conversions are covered.
Metrics shown in the example above provide operational details needed to
compile a holistic overview of the sales conversion rate cycle. Leads don’t
always turn into opportunities, and proposals don’t always yield wins, but the
monitoring process of your metrics can easily identify if the overall
performance is on track and developing as planned. The magic is in the
details, and this dashboard presentation can effectively round up the
data-story you need.

Compiling information into a visual narrative can help organizations decipher all the raw waves of data, since roughly 90% of the information transmitted to the brain is visual. This makes it possible to connect the multidimensional relationships between operational metrics and KPIs, and to make sense of them across departments and organizational levels.

IT Project Management: Is my project on target?


By providing insights into the project management side of IT performance, the IT dashboard example above gives a holistic view of the KPIs and metrics needed to maintain a sustainable level of efficiency. The overview of the project management can deliver fast and accurate data to establish smooth operational performance. Consequently, it will reduce costs, since any variation will be clearly visualized in this simple interactive dashboard.

Logistics: How efficient is my transportation process?

Our final example concentrates on the logistics level of operating transportation. Monitoring fleet efficiency at a detailed performance level (how much of the fleet is on the move and how much is in maintenance) will help you collect the data needed to create a sustainable strategy, or to monitor whether the KPIs are on track.
The loading time and weight viewed over a set time frame will provide you with insights on the average amounts and controlling points of the transportation process and on the efficiency with which you are running your operations.

To conclude, operational metrics and KPIs, viewed across different industries, levels of operation, and the specific processes needed to establish sustainable development, can be effectively managed if you set valuable indicators to track the performance of a company. It is not just about collecting data; it is also about interpreting the data in the right context and organizing it to complement the company's business intelligence.

To put things into perspective, here are the Top 10 operational metrics from
different functions and industries:

1. CPC (Cost-per-Click)

2. CPA (Cost-per-Acquisition)

3. Absenteeism Rate
4. Overtime Hours

5. Lead-to-Opportunity Ratio

6. Lead Conversion Ratio

7. Delivery Time

8. Transportation Costs

9. Total Tickets vs Open Tickets

10. Average Handle Time

Source: https://www.datapine.com/blog/operational-metrics-and-kpi-examples/

8 Steps for Measurement and Analytics Success

By: Gordon Plutsky | 03/10/2016

Measurement is a hot topic with marketers who are focused on technology buyers and business decision-makers. The buying process for IT products and related services is complex, long, and involves many players. Clients often ask what the most important metric for success should be, or what the best practices are for measuring their campaigns. There is no one-size-fits-all or simple answer because of the complexity of the purchase. However, there are some logical steps you can take to accurately gauge your digital marketing efforts to this lucrative audience.

1. Align Your Objectives


The most important first step is to make sure you have alignment
between your business objectives and marketing campaign tactics and
metrics. To get the desired outcome, you need to make sure you have
the right message going to the right audience on the proper platform.
And, the timing of the messages should correspond to where the
prospect is in the buying process. Your campaign key performance
indicators (KPIs) should spring from that concept.

2. Create Buyer Personas


A smart technique is to map your digital marketing metrics to the sales
funnel or buying process. Before tackling the buying process, it is a
good idea to create buying personas for the target audience which
should include all members of the decision-making committee. These
personas should include media and content format preferences.
3. Map the Buyers Journey
The next step is mapping the buyer’s journey for each of the key
personas that constitute the buying committee. Ideally, your marketing
campaign and tactics would be focused on the information needs of
each persona along the buying process.

4. Review KPIs
As you move your prospects along the journey from awareness to
consideration to purchase (and repurchase/retention), your marketing
tactics, messaging, and content will deepen the relationship. At each
phase you should be looking at different KPIs that will signal success
or a need to optimize and improve performance.

5. Measurements for Top of Funnel


If you are doing a top of the funnel ad campaign to build awareness,
brand equity, and knowledge of your product, you should measure
items such as reach and frequency, web site traffic, and impressions.
Also, you could measure a lift in brand perception by a quantitative
study.

6. Measuring throughout the Buyer’s Journey.


As you move your prospects down the journey to “building preference
and consideration,” your marketing goals are to inspire and inform and
then persuade. This is where content engagement becomes critical as
buyers do their active evaluation and build short lists. Here you want to
look at content consumption and engagement metrics such as
downloads, time spent, search performance and how many content
touches prospects have on your web site.

7. Metrics for Bottom of the Funnel


When it comes time for purchase, the key metrics can be things such
as: conversion to sales from a lead, average order size, repurchase
rate, and conversion rate per marketing channel/tactic.

8. Calculate Lifetime Value


A sometimes overlooked key step is calculating the lifetime value of a
new customer by marketing channel (e.g. display, search, lead gen,
content download etc.). This will give you great intelligence from your
marketing campaign. For example, a certain search term could be very
expensive, but if it yields a customer who spends a lot over a long
period of time, then it can be return on investment (ROI) positive.
Conversely, a customer who comes in through a discount promotion
who only buys that one time and never again could be a money losing
proposition. One caveat with this approach: it often depends on “last-click attribution”, giving credit to the channel from which the prospect directly arrives. In reality, channels are additive and build synergy with each other.
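
A simplified sketch of the channel-level lifetime value comparison described in this step; acquisition costs and revenue figures are invented, and a real model would discount future revenue and handle attribution more carefully than last-click:

# Invented per-channel figures: what it cost to acquire customers from each
# channel and how much those customers have spent over their lifetime so far.
channels = {
    "paid_search":    {"acquisition_cost": 12_000, "customers": 100, "lifetime_revenue": 90_000},
    "discount_promo": {"acquisition_cost": 4_000,  "customers": 200, "lifetime_revenue": 5_000},
}

for name, ch in channels.items():
    ltv = ch["lifetime_revenue"] / ch["customers"]  # lifetime revenue per customer
    cac = ch["acquisition_cost"] / ch["customers"]  # acquisition cost per customer
    roi = (ch["lifetime_revenue"] - ch["acquisition_cost"]) / ch["acquisition_cost"]
    print(f"{name}: LTV = ${ltv:.0f}, CAC = ${cac:.0f}, ROI = {roi:.0%}")

In this toy data the expensive channel still comes out far ahead once lifetime revenue is considered, which mirrors the article's point that a costly search term can be ROI positive while a one-time discount buyer can be a money-losing proposition.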

In summary, the best course of action to maximize your marketing ROI is to
take a holistic look at the buying process and the lifetime value of customers
generated by your campaigns to accurately evaluate your success.

Source: https://www.idg.com/8-steps-for-measurement-and-analytics-success/
Davenport and Harris article - “The Dark Side of Customer Analytics”
by Thomas H. Davenport and Jeanne Harris
From the May 2007 Issue


Laura Brickman was glad she was almost done grocery shopping.
The lines at the local ShopSense supermarket were especially long for a Tuesday evening.
Her cart was nearly overflowing in preparation for several days away from her family, and
she still had packing to do at home. Just a few more items to go: “A dozen eggs, a half gallon
of orange juice, and—a box of Dip & Dunk cereal?” Her six-year-old daughter, Maryellen,
had obviously used the step stool to get at the list on the counter and had scrawled her
high-fructose demand at the bottom of the paper in bright-orange marker.

Laura made a mental note to speak with Miss Maryellen about what sugary
cereals do to kids’ teeth (and to their parents’ wallets). Taking care not to
crack any of the eggs, she squeezed the remaining items into the cart. She
wheeled past the ShopSense Summer Fun displays. “Do we need more
sunscreen?” Laura wondered for a moment, before deciding to go without.
She got to the checkout area and waited.

As regional manager for West Coast operations of IFA, one of the largest
sellers of life and health insurance in the United States, Laura normally might
not have paid much attention to ShopSense’s checkout procedures—except
maybe to monitor how accurately her purchases were being rung up. But now
that her company’s fate was intertwined with that of the Dallas-based national
grocery chain, she had less motivation to peruse the magazine racks and
more incentive to evaluate the scanning and tallying going on ahead of her.

Some 14 months earlier, IFA and ShopSense had joined forces in an
intriguing venture. Laura for years had been interested in the idea of looking
beyond the traditional sources of customer data that insurers typically used to
set their premiums and develop their products. She’d read every article, book,
and Web site she could find on customer analytics, seeking to learn more
about how organizations in other industries were wringing every last drop of
value from their products and processes. Casinos, credit card companies,
even staid old insurance firms were joining airlines, hotels, and other
service-oriented businesses in gathering and analyzing specific details about
their customers. And, according to recent studies, more and more of those
organizations were sharing their data with business partners.

Laura had read a profile of ShopSense in a business publication and learned
that it was one of only a handful of retailers to conduct its analytics in-house.
As a result, the grocery chain possessed sophisticated data-analysis methods
and a particularly deep trove of information about its customers. In the article,
analytics chief Steve Worthington described how the organization employed a
pattern-based approach to issuing coupons. The marketing department
understood, for instance, that after three months of purchasing nothing but
Way-Less bars and shakes, a shopper wasn’t susceptible to discounts on a
rival brand of diet aids. Instead, she’d probably respond to an offer of a free
doughnut or pastry with the purchase of a coffee. The company had even
been experimenting in a few markets with what it called Good-Sense
messages—bits of useful health information printed on the backs of receipts,
based partly on customers’ current and previous buying patterns. Nutritional
analyses of some customers’ most recent purchases were being printed on
receipts in a few of the test markets as well.

Shortly after reading that article, Laura had invited Steve to her office in San
Francisco. The two met several times, and, after some fevered discussions
with her bosses in Ohio, Laura made the ShopSense executive an offer. The
insurer wanted to buy a small sample of the grocer’s customer loyalty card
data to determine its quality and reliability; IFA wanted to find out if the
ShopSense information would be meaningful when stacked up against its own
claims information.

With top management’s blessing, Steve and his team had agreed to provide
IFA with ten years’ worth of loyalty card data for customers in southern
Michigan, where ShopSense had a high share of wallet—that is, the
supermarkets weren’t located within five miles of a “club” store or other major
rival. Several months after receiving the tapes, analysts at IFA ended up
finding some fairly strong correlations between purchases of unhealthy
products (high-sodium, high-cholesterol foods) and medical claims. In
response, Laura and her actuarial and sales teams conceived an offering
called Smart Choice, a low-premium insurance plan aimed at IFA customers
who didn’t indulge.

Laura was flying the next day to IFA’s headquarters in Cincinnati to meet with
members of the senior team. She would be seeking their approval to buy
more of the ShopSense data; she wanted to continue mining the information
and refining IFA’s pricing and marketing efforts. Laura understood it might be
a tough sell. After all, her industry wasn’t exactly known for embracing radical
change—even with proof in hand that change could work. The make-or-break
issue, she thought, would be the reliability and richness of the data.

“Your CEO needs to hear only one thing,” Steve had told her several days
earlier, while they were comparing notes. “Exclusive rights to our data will give
you information that your competitors won’t be able to match. No one else has
the historical data we have or as many customers nationwide.” He was right,
of course. Laura also knew that if IFA decided not to buy the grocer’s data,
some other insurer would.

“Exclusive rights to our data will give you information that your
competitors won’t be able to match. No one else has the historical data
we have.”
“Paper or plastic?” a young boy was asking. Laura had finally made it to the front
of the line. “Oh, paper, please,” she replied. The cashier scanned in the
groceries and waited while Laura swiped her card and signed the touch
screen. Once the register printer had stopped chattering, the cashier curled
the long strip of paper into a thick wad and handed it to Laura. “Have a nice
night,” she said mechanically.

Before wheeling her cart out of the store into the slightly cool evening, Laura
briefly checked the total on the receipt and the information on the back:
coupons for sunblock and a reminder about the importance of UVA and UVB
protection.

Tell It to Your Analyst

“No data set is perfect, but based on what we’ve seen already, the
ShopSense info could be a pretty rich source of insight for us,” Archie Stetter
told the handful of executives seated around a table in one of IFA’s recently
renovated conference rooms. Laura nodded in agreement, silently cheering
on the insurance company’s uberanalyst. Archie had been invaluable in
guiding the pilot project. Laura had flown in two days ahead of the meeting
and had sat down with the chatty statistics expert and some members of his
team, going over results and gauging their support for continuing the
relationship with ShopSense.

“Trans fats and heart disease—no surprise there, I guess,” Archie said, using
a laser pointer to direct the managers’ attention to a PowerPoint slide
projected on the wall. “How about this, though: Households that purchase
both bananas and cashews at least quarterly seem to show only a negligible
risk of developing Parkinson’s and MS.” Archie had at first been skeptical
about the quality of the grocery chain’s data, but ShopSense’s well of
information was deeper than he’d imagined. Frankly, he’d been having a blast
slicing and dicing. Enjoying his moment in the spotlight, Archie went on a bit
longer than he’d intended, talking about typical patterns in the purchase of
certain over-the-counter medications, potential leading indicators for diabetes,
and other statistical curiosities. Laura noted that as Archie’s presentation wore
on, CEO Jason Walter was jotting down notes. O.Z. Cooper, IFA’s general
counsel, began to clear his throat over the speakerphone.

Laura was about to rein in her stats guy when Rusty Ware, IFA’s chief actuary,
addressed the group. “You know, this deal isn’t really as much of a stretch as
you might think.” He pointed out that the company had for years been buying
from information brokers lists of customers who purchased specific drugs and
products. And IFA was among the best in the industry at evaluating external
sources of data (credit histories, demographic studies, analyses of
socioeconomic status, and so on) to predict depression, back pain, and other
expensive chronic conditions. Prospective IFA customers were required to
disclose existing medical conditions and information about their personal
habits—drinking, smoking, and other high-risk activities—the actuary
reminded the group.
The CEO, meanwhile, felt that Rusty was overlooking an important point. “But
if we’re finding patterns where our rivals aren’t even looking, if we’re coming
up with proprietary health indicators—well, that would be a huge hurdle for
everyone else to get over,” Jason noted.
Laura was keeping an eye on the clock; there were several themes she still
wanted to hammer on. Before she could follow up on Jason’s comments,
though, Geneva Hendrickson, IFA’s senior vice president for ethics and
corporate responsibility, posed a blue-sky question to the group: “Take the
fruit-and-nut stat Archie cited. Wouldn’t we have to share that kind of
information? As a benefit to society?”

Several managers at the table began talking over one another in an attempt to
respond. “Correlations, no matter how interesting, aren’t conclusive evidence
of causality,” someone said. “Even if a correlation doesn’t hold up in the
medical community, that doesn’t mean it’s not useful to us,” someone else
suggested.
Laura saw her opening; she wanted to get back to Jason’s point about
competitive advantage. “Look at Progressive Insurance,” she began. It was
able to steal a march on its rivals simply by recognizing that not all motorcycle
owners are created equal. Some ride hard (young bikers), and some hardly
ride (older, middle-class, midlife crisis riders). “By putting these guys into
different risk pools, Progressive has gotten the rates right,” she said. “It wins
all the business with the safe set by offering low premiums, and it doesn’t lose
its shirt on the more dangerous set.”

Then O.Z. Cooper broke in over the speakerphone. Maybe the company
should formally position Smart Choice and other products and marketing
programs developed using the ShopSense data as opt in, he wondered. A lot
of people signed up when Progressive gave discounts to customers who
agreed to put devices in their cars that would monitor their driving habits. “Of
course, those customers realized later they might pay a higher premium when
the company found out they routinely exceeded the speed limit—but that’s not
a legal problem,” O.Z. noted. None of the states that IFA did business in had
laws prohibiting the sort of data exchange ShopSense and the insurer were
proposing. It would be a different story, however, if the company wanted to do
more business overseas.

At that point, Archie begged to show the group one more slide: sales of
prophylactics versus HIV-related claims. The executives continued taking
notes. Laura glanced again at the clock. No one seemed to care that they
were going a little over.

Data Decorum

Rain was in the forecast that afternoon for Dallas, so Steve Worthington
decided to drive rather than ride his bike the nine and a half miles from his
home to ShopSense’s corporate offices in the Hightower Complex. Of course,
the gridlock made him a few minutes late for the early morning meeting with
ShopSense’s executive team. Lucky for him, others had been held up by the
traffic as well.
The group gradually came together in a slightly cluttered room off the main
hallway on the 18th floor. One corner of the space was being used to store
prototypes of regional in-store displays featuring several members of the
Houston Astros’ pitching staff. “I don’t know whether to grab a cup of coffee or
a bat,” Steve joked to the others, gesturing at the life-size cardboard cutouts
and settling into his seat.

Steve was hoping to persuade CEO Donna Greer and other members of the
senior team to approve the terms of the data sale to IFA. He was pretty
confident he had majority support; he had already spoken individually with
many of the top executives. In those one-on-one conversations, only Alan
Atkins, the grocery chain’s chief operations officer, had raised any significant
issues, and Steve had dealt patiently with each of them. Or so he thought.
At the start of the meeting, Alan admitted he still had some concerns about
selling data to IFA at all. Mainly, he was worried that all the hard work the
organization had done building up its loyalty program, honing its analytical
chops, and maintaining deep customer relationships could be undone in one
fell swoop. “Customers find out, they stop using their cards, and we stop
getting the information that drives this whole train,” he said.
“Customers find out, they stop using their cards, and we stop getting
the information that drives this whole train.”

Steve reminded Alan that IFA had no interest in revealing its relationship with
the grocer to customers. There was always the chance an employee would let
something slip, but even if that happened, Steve doubted anyone would be
shocked. “I haven’t heard of anybody canceling based on any of our other
card-driven marketing programs,” he said.
“That’s because what we’re doing isn’t visible to our customers—or at least it
wasn’t until your recent comments in the press,” Alan grumbled. There had
been some tension within the group about Steve’s contribution to several
widely disseminated articles about ShopSense’s embrace of customer
analytics.
“Point taken,” Steve replied, although he knew that Alan was aware of how
much positive attention those articles had garnered for the company. Many of
its card-driven marketing programs had since been deemed cutting-edge by
others in and outside the industry.

Steve had hoped to move on to the financial benefits of the arrangement, but
Denise Baldwin, ShopSense’s head of human resources, still seemed
concerned about how IFA would use the data. Specifically, she wondered,
would it identify individual consumers as employees of particular companies?
She reminded the group that some big insurers had gotten into serious trouble
because of their profiling practices.

IFA had been looking at this relationship only in the context of individual
insurance customers, Steve explained, not of group plans. “Besides, it’s not
like we’d be directly drawing the risk pools,” he said. Then Steve began
distributing copies of the spreadsheets outlining the five-year returns
ShopSense could realize from the deal.

“‘Directly’ being the operative word here,” Denise noted wryly, as she took her
copy and passed the rest around.

Parsing the Information

It was 6:50 pm, and Jason Walter had canceled his session with his personal
trainer—again—to stay late at the office. Sammy will understand, the CEO
told himself as he sank deeper into the love seat in his office, a yellow legal
pad on his lap and a pen and cup of espresso balanced on the arm of the
couch. It was several days after the review of the ShopSense pilot, and Jason
was still weighing the risks and benefits of taking this business relationship to
the next stage.

He hated to admit how giddy he was—almost as gleeful as Archie Stetter had
been—about the number of meaningful correlations the analysts had turned
up. “Imagine what that guy could do with an even larger data set,” O.Z.
Cooper had commented to Jason after the meeting. Exclusive access to
ShopSense’s data would give IFA a leg up on competitors, Jason knew. It
could also provide the insurer with proprietary insights into the food-related
drivers of disease. The deal was certainly legal. And even in the court of
public opinion, people understood that insurers had to perform risk analyses.
It wasn’t the same as when that online bookseller got into trouble for charging
customers differently based on their shopping histories.

What if IFA took the pilot to the next level and found out something that
maybe it was better off not knowing?

But Jason also saw dark clouds on the horizon: What if IFA took the pilot to
the next level and found out something that maybe it was better off not
knowing? As he watched the minute hand sweep on his wall clock, Jason
wondered what risks he might be taking without even realizing it.

• • •

Donna Greer gently swirled the wine in her glass and clinked the stemware
against her husband’s. The two were attending a wine tasting hosted by a
friend. The focus was on varieties from Chile and other Latin American
countries, and Donna and Peter had yet to find a sample they didn’t like. But
despite the lively patter of the event and the plentiful food, Donna couldn’t
keep her mind off the IFA deal. “The big question is, Should we be charging
more?” she mused to her husband. ShopSense was already selling its
scanner data to syndicators, and, as her CFO had reminded her, the company
currently made more money from selling information than from selling meat.
Going forward, all ShopSense would have to do was send IFA some tapes
each month and collect a million dollars annually of pure profit. Still, the deal
wasn’t without risks: By selling the information to IFA, it might end up diluting
or destroying valuable and hard-won customer relationships. Donna could see
the headline now: “Big Brother in Aisle Four.” All the more reason to make it
worth our while, she thought to herself.

Peter urged Donna to drop the issue for a bit, as he scribbled his comments
about the wine they’d just sampled on a rating sheet. “But I’ll go on record as
being against the whole thing,” he said. “Some poor soul puts potato chips in
the cart instead of celery, and look what happens.”

“But what about the poor soul who buys the celery and still has to pay a
fortune for medical coverage,” Donna argued, “because the premiums are set
based on the people who can’t eat just one?”
“Isn’t that the whole point of insurance?” Peter teased. The CEO shot her
husband a playfully peeved look—and reminded herself to send an e-mail to
Steve when they got home.

How can these companies leverage the customer data responsibly?

George L. Jones is the president and chief executive officer of Borders
Group, a global retailer of books, music, and movies based in Ann Arbor,
Michigan.

Sure, a customer database has value, and a company can maximize that
value in any number of ways—growing the database, mining it, monetizing it.
Marketers can be tempted, despite pledges about privacy, to use collected
information in ways that seem attractive but may ultimately damage
relationships with customers.

The arrangement proposed in this case study seems shortsighted to me.
Neither company seems to particularly care about its customers. Instead, the
message coming from the senior teams at both IFA and ShopSense is that
any marketing opportunity is valid—as long as they can get away with it
legally and customers don’t figure out what they’re doing.

The message coming from both IFA and ShopSense is that any
marketing opportunity is valid—as long as they can get away with it.

In my company, this pilot would never have gotten off the ground. The culture
at Borders is such that the managers involved would have just assumed we
wouldn’t do something like that. Like most successful retail companies, our
organization is customer focused; we’re always trying to see a store or an
offer or a transaction through the customer’s eyes. It was the same way at
both Saks and Target when I was with those companies.

At Borders, we’ve built up a significant database through our Borders
Rewards program, which in the past year and a half has grown to 17 million
members. The data we’re getting are hugely important as a basis for serving
customers more effectively (based on their purchase patterns) and as a
source of competitive advantage. For instance, we know that if somebody
buys a travel guide to France, that person might also be interested in reading
Peter Mayle’s A Year in Provence. But we assure our customers up front that
their information will be handled with the utmost respect. We carefully control
the content and frequency of even our own communications with Rewards
members. We don’t want any offers we present to have negative
connotations—for instance, we avoid bombarding people with e-mails about a
product they may have absolutely no interest in.

I honestly don’t think these companies have hit upon a responsible formula for
mining and sharing customer data. If ShopSense retained control of its data to
some degree—that is, if the grocer and IFA marketed the Smart Choice
program jointly, and if any offers came from ShopSense (the partner the
customer has built up trust with) rather than the insurance company (a
stranger, so to speak)—the relationship could work. Instead of ceding
complete control to IFA, ShopSense could be somewhat selective and send
offers to all, some, or none of its loyalty card members, depending on how
relevant the grocer believed the insurance offer would be to a particular set of
customers.

A big hole in these data, though, is that people buy food for others besides
themselves. I rarely eat at home, but I still buy tons of groceries—some
healthy, some not so healthy—for my kids and their friends. If you looked at a
breakdown of purchases for my household, you’d say “Wow, they’re
consuming a lot.” But the truth is, I hardly ever eat a bite. That may be an
extreme example, but it suggests that IFA’s correlations may be flawed.

Both CEOs are subjecting their organizations to a possible public relations
backlash, and not just from the ShopSense customers whose data have been
dealt away to IFA. Every ShopSense customer who hears about the deal,
loyalty card member or not, is going to lose trust in the company. IFA’s
customers might also think twice about their relationship with the insurer. And
what about the employees in each company who may be uncomfortable with
what the companies are trying to pull off? The corporate cultures suffer.

What the companies are proposing here is very dangerous—especially in the
world of retail, where loyalty is so hard to win. Customers’ information needs
to be protected.

Katherine N. Lemon is an associate professor of marketing at Boston
College’s Carroll School of Management. Her expertise is
in the areas of customer equity, customer management, and customer-based
marketing strategy.

As the case study illustrates, companies will soon be able to create fairly
exhaustive, highly accurate profiles of customers without having had any
direct interaction with them. They’ll be able to get to know you intimately
without your knowledge.

From the consumer’s perspective, this trend raises several big concerns. In
this fictional account, for instance, a shopper’s grocery purchases may directly
influence the availability or price of her life or health insurance products—and
not necessarily in a good way. Although the customer, at least tacitly,
consented to the collection, use, and transfer of her purchase data, the real
issue here is the unintended and uncontemplated use of the information (from
the customer’s point of view). Most customers would probably be quite
surprised to learn that their personal information could be used by companies
in a wholly unrelated industry and in other ways that aren’t readily
foreseeable.

If consumers lose trust in firms that collect, analyze, and utilize their
information, they will opt out of loyalty and other data-driven marketing
programs, and we may see more regulations and limitations on data
collection. Customer analytics are effective precisely because firms
do not violate customer trust. People believe that retail and other
organizations will use their data wisely to enhance their experiences, not to
harm them. Angry customers will certainly speak with their wallets if that trust
is violated.

Customer analytics are effective precisely because firms do not violate
customer trust.

Decisions that might be made on the basis of the shared data represent
another hazard for consumers—and for organizations. Take the insurance
company’s use of the grocer’s loyalty card data. This is limited information at
best and inaccurate at worst. The ShopSense data reflect food bought but not
necessarily consumed, and individuals buy food at many stores, not just one.
IFA might end up drawing erroneous conclusions—and exacting unfair rate
increases. The insurer’s general counsel should investigate this deal.

Another concern for consumers is what I call “battered customer syndrome.”
Market analytics allow companies to identify their best and worst customers
and, consequently, to pay special attention to those deemed to be the most
valuable. Looked at another way, analytics enable firms to understand how
poorly they can treat individual or groups of customers before those people
stop doing business with them. Unless you are in the top echelon of
customers—those with the highest lifetime value, say—you may pay higher
prices, get fewer special offers, or receive less service than other consumers.
Despite the fact that alienating 75% to 90% of customers may not be the best
idea in the long run, many retailers have adopted this “top tier” approach to
managing customer relationships. And many customers seem to be willing to
live with it—perhaps with the unrealistic hope that they may reach the upper
echelon and reap the ensuing benefits.
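
To make the mechanics of this “top tier” approach concrete, here is a minimal Python sketch. Everything in it is hypothetical: the customer IDs and spend figures are invented, and total historical spend is used only as a rough stand-in for lifetime value.

```python
# Minimal, hypothetical sketch of tiering customers by estimated lifetime value.
# All IDs, amounts, and the CLV proxy are invented for illustration.
import pandas as pd

# Hypothetical transaction history: one row per purchase.
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2, 3, 4, 4],
    "amount":      [120.0, 80.0, 25.0, 30.0, 40.0, 500.0, 60.0, 75.0],
})

# Crude proxy for lifetime value: total historical spend per customer.
est_clv = transactions.groupby("customer_id")["amount"].sum()

# Quartile-based tiers; under a "top tier" strategy, only the top bucket
# would receive the richest offers and service levels.
tiers = pd.qcut(est_clv, q=4, labels=["bottom", "mid-low", "mid-high", "top"])
print(pd.DataFrame({"est_clv": est_clv, "tier": tiers}))
```

A few lines like these are all it takes to bucket customers, which is exactly why everything outside the top bucket can be so easily deprioritized.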

Little research has been done on the negative consequences of using
marketing approaches that discriminate against customer segments.
Inevitably, however, customers will become savvier about analytics. They may
become less tolerant and take their business (and information) elsewhere.

If access to and use of customer data are to remain viable, organizations
must come up with ways to address customers’ concerns about privacy.
What, then, should IFA and ShopSense do? First and foremost, they need to
let customers opt in to their data-sharing arrangement. This would address
the “unintended use of data” problem; customers would understand exactly
what was being done with their information. Even better, both firms would be
engaging in trust-building—versus trust-eroding—activities with customers.
The result: improvement in the bottom line and in the customer experience.

David Norton is the senior vice president of relationship marketing at
Harrah’s Entertainment, based in Las Vegas.

Transparency is a critical component of any loyalty card program. The value
proposition must be clear; customers must know what they’ll get for allowing
their purchase behavior to be monitored. So the question for the CEOs of
ShopSense and IFA is, Would customers feel comfortable with the
data-sharing arrangement if they knew about it?

Would customers feel comfortable with the data-sharing arrangement if
they knew about it?

ShopSense’s loyalty card data are at the center of this venture, but the
grocer’s goal here is not to increase customer loyalty. The value of its
relationship with IFA is solely financial. The company should explore whether
there are some customer data it should exclude from the
transfer—information that could be perceived as exceedingly sensitive, such
as pharmacy and alcohol purchases. It should also consider doing market
research and risk modeling to evaluate customers’ potential reaction to the
data sharing and the possible downstream effect of the deal.

The risk of consumer backlash is lower for IFA than for ShopSense, given the
information the insurance company already purchases. IFA could even put a
positive spin on the creation of new insurance products based on the
ShopSense data. For instance, so-called healthy purchases might earn
customers a discount on their standard insurance policies. The challenge for
the insurer, however, is that there is no proven correlation between the
purchase of certain foods and fewer health problems. IFA should continue
experimenting with the data to determine their richness and predictive value.
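
A minimal sketch of what such experimentation might look like, using entirely synthetic data (the “healthy share” feature, the claims outcome, and the effect size are assumptions for illustration, not findings from the case): fit a simple classifier and check whether its cross-validated discrimination rises above chance.

```python
# Hypothetical sketch: does a purchase-pattern feature predict a claims outcome?
# All data here are synthetic; nothing is drawn from the case.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 5000

# Invented feature: share of grocery spend classified as "healthy".
healthy_share = rng.uniform(0.0, 1.0, n)

# Invented outcome: 1 = costly claim filed; weakly related to the feature
# by construction, just so the example has something to detect.
claim = (rng.uniform(0.0, 1.0, n) < 0.30 - 0.10 * healthy_share).astype(int)

X = healthy_share.reshape(-1, 1)
auc = cross_val_score(LogisticRegression(), X, claim,
                      cv=5, scoring="roc_auc").mean()

# AUC near 0.5 => the feature adds little; materially above 0.5 would be
# the kind of evidence that justifies further, carefully governed tests.
print(f"cross-validated AUC: {auc:.2f}")
```

An AUC stuck near 0.5 would argue against building products on the feature; anything materially higher would only mark the start of the governance questions the commentators raise.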

Some companies have more leeway than others to sell or trade customer
lists. At Harrah’s, we have less than most because our customers may not
want others to know about their gaming and leisure activities. We don’t sell
information, and we don’t buy a lot of external data. Occasionally, we’ll buy
demographic data to fine-tune our marketing messages (to some customers,
an offer of tickets to a live performance might be more interesting than a
dining discount, for example). But we think the internal transactional data are
much more important.

We do rely on analytics and models to help us understand existing customers
and to encourage them to stick with us. About ten years ago, we created our
Total Rewards program. Guests at our hotels and casinos register for a loyalty
card by sharing the information on their driver’s license, such as their name,
address, and date of birth. Each time they visit one of our 39 properties and
use their card, they earn credits that can be used for food and merchandise.
They also earn Tier Credits that give them higher status in the program and
make them eligible for differentiated service.
With every visit, we get a read on our customers’ preferences—the types of
games they play, the hotels and amenities they favor, and so on. Those
details are stored in a central database. The company sets rules for what can
be done with the information. For instance, managers at any one of our
properties can execute their own marketing lists and programs, but they can
target only customers who have visited their properties. If they want to dip into
the overall customer base, they have to go through the central
relationship-marketing group. Some of the information captured in our online
joint promotions is accessible to both Harrah’s and its business partners, but
the promotions are clearly positioned as opt in.

We tell customers the value proposition up front: Let us track your play at our
properties, and we can help you enjoy the experience better with richer
rewards and improved service. They understand exactly what we’re capturing,
the rewards they’ll get, and what the company will do with the information. It’s
a win-win for the company and for the customer.

Companies engaging in customer analytics and related marketing initiatives
need to keep “win-win” in mind when collecting and handling customer data.
It’s not just about what the information can do for you; it’s about what you can
do for the customer with the information.

Michael B. McCallister is the president and CEO of Humana, a health
benefits company based in Louisville, Kentucky.

Companies that can capitalize on the information they get from their
customers hold an advantage over rivals. But as the firms in the case study
are realizing, there are also plenty of risks involved with using these data.
Instead of pulling back the reins, organizations should be nudging customer
analytics forward, keeping in mind one critical point: Any collection, analysis,
and sharing of data must be conducted in a protected, permission-based
environment.

Humana provides health benefit plans and related health services to more
than 11 million members nationwide. We use proprietary data-mining and
analytical capabilities to help guide consumers through the health maze. Like
IFA, we ask our customers to share their personal and medical histories with
us (the risky behaviors as well as the good habits) so we can acquaint them
with programs and preventive services geared to their health status.

Customer data come to us in many different ways. For instance, we offer
complimentary health assessments in which plan members can take an
interactive online survey designed to measure how well they’re taking care of
themselves. We then suggest ways they can reduce their health risks or treat
their existing conditions more effectively. We closely monitor our claims
information and use it to reach out to people. In our Personal Nurse program,
for example, we’ll have a registered nurse follow up with a member who has
filed, say, a diabetes-related claim. Through phone conversations and e-mails,
the RN can help the plan member institute changes to improve his or her
quality of life. All our programs require members to opt in if the data are going
to be used in any way that would single a person out. Regardless of your
industry, you have to start with that.

One of the biggest problems in U.S. health care today is obesity. So would it
be useful for our company to look at grocery-purchasing patterns, as the
insurance company in the case study does? It might be. I could see the
upside of using a grocer’s loyalty card data to develop a wellness-based
incentive program for insurance customers. (We would try to find a way to
build positives into it, however, so customers would look at the interchange
and say “That’s in my best interest; thank you.”) But Humana certainly
wouldn’t enter into any kind of data-transfer arrangement without ensuring
that our customers’ personal information and the integrity of our relationship
with them would be properly protected. In health care, especially, this has to
be the chief concern—above and beyond any patterns that might be revealed
and the sort of competitive edge they might provide. We use a range of
industry standard security measures, including encryption and firewalls, to
protect our members’ privacy and medical information.

Ethical behavior starts with the CEO, but it clearly can’t be managed by just
one person. It’s important that everyone be reminded often about the
principles and values that guide the organization. When business
opportunities come along, they’ll be screened according to those
standards—and the decisions will land right side up every time. I can’t tell
people how to run their meetings or who should be at the table when the
tougher, gray-area decisions need to be made, but whoever is there has to
have those core principles and values in mind.

When the tougher, gray-area decisions need to be made, each person
has to have the company’s core principles and values in mind.

The CEOs in the case study need to take the “front page” test: If the headline
on the front page of the newspaper were reporting abuse of customer data
(yours included), how would you react? If you wouldn’t want your personal
data used in a certain way, chances are your customers wouldn’t, either.

Source: https://hbr.org/2007/05/the-dark-side-of-customer-analytics
✧ Learning Activity:
● Analyze the case study on “The Dark Side of Customer Analytics.” How
can the companies leverage the data responsibly? Share your insights.

✧ Questions for Discussion:


1. What is data visualization and why is it important?
2. What is the difference between business metrics and key performance
indicators?
