IBA Module 3
✧ Learning Outcome:
At the end of this module, you should be able to talk about the
following:
■ Organization/sources of data
■ Importance of data quality
■ Data Classification
✧ Reading Materials:
Data Visualization
The concept of using pictures to understand data has been around for
centuries, from maps and graphs in the 17th century to the invention of the pie
chart in the early 1800s. Several decades later, one of the most cited
examples of statistical graphics occurred when Charles Minard mapped
Napoleon’s invasion of Russia. The map depicted the size of the army as well
as the path of Napoleon’s retreat from Moscow – and tied that information to
temperature and time scales for a more in-depth understanding of the event.
It’s technology, however, that truly lit the fire under data visualization.
Computers made it possible to process large amounts of data at lightning-fast
speeds. Today, data visualization has become a rapidly evolving blend of
science and art that is certain to change the corporate landscape over the
next few years.
With big data there’s potential for great opportunity, but many retail
banks are challenged when it comes to finding value in their big data
investment. For example, how can they use big data to improve
customer relationships? How – and to what extent – should they invest
in big data?
Because of the way the human brain processes information, using charts or
graphs to visualize large amounts of complex data is easier than poring over
spreadsheets or reports. Data visualization is a quick, easy way to convey
concepts in a universal manner – and you can experiment with different
scenarios by making slight adjustments.
Data visualization is going to change the way our analysts work with
data. They’re going to be expected to respond to issues more rapidly.
And they’ll need to be able to dig for more insights – look at data
differently, more imaginatively. Data visualization will promote that
creative data exploration.
Once a business has uncovered new insights from visual analytics, the next
step is to communicate those insights to others. Using charts, graphs or other
visually impactful representations of data is important in this step because it’s
engaging and gets the message across quickly.
Before implementing new technology, there are some steps you need
to take. Not only do you need to have a solid grasp on your data, you
also need to understand your goals, needs and audience. Preparing
your organization for data visualization technology requires that you
first:
● Understand the data you’re trying to visualize, including its size and
cardinality (the uniqueness of data values in a column).
● Use a visual that conveys the information in the best and simplest form
for your audience.
Once you've answered those initial questions about the type of data you have
and the audience who'll be consuming the information, you need to prepare
for the amount of data you'll be working with. Big data brings new challenges
to visualization because large volumes, different varieties and varying
velocities must be taken into account. Plus, data is often generated faster than
it can be managed and analyzed.
There are factors you should consider, such as the cardinality of columns
you’re trying to visualize. High cardinality means there’s a large percentage of
unique values (e.g., bank account numbers, because each item should be
unique). Low cardinality means a column of data contains a large percentage
of repeat values (as might be seen in a “gender” column).
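As a rough illustration of how one might check cardinality before picking a visual, the sketch below uses pandas on a tiny made-up table (the column names and values are illustrative assumptions, not from the source):

```python
import pandas as pd

# Hypothetical customer table; names and values are illustrative only.
df = pd.DataFrame({
    "account_number": ["A001", "A002", "A003", "A004"],  # unique per row: high cardinality
    "gender": ["F", "M", "F", "F"],                       # mostly repeats: low cardinality
})

# Distinct values per column, and as a share of all rows.
distinct_counts = df.nunique()
cardinality_ratio = distinct_counts / len(df)
print(cardinality_ratio)  # account_number -> 1.0 (high), gender -> 0.5 (low)
```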
One of the biggest challenges for business users is deciding which visual
should be used to best represent the information. SAS Visual Analytics uses
intelligent autocharting to create the best possible visual based on the data
that is selected.
When you’re first exploring a new data set, autocharts are especially useful
because they provide a quick view of large amounts of data. This data
exploration capability is helpful even to experienced statisticians as they seek
to speed up the analytics lifecycle process because it eliminates the need for
repeated sampling to determine which data is appropriate for each model.
Source: https://www.sas.com/en_in/insights/big-data/data-visualization.html
Potential Problems With Data Visualization
Big data has been a big topic for a few years now, and it’s only going to grow
bigger as we get our hands on more sophisticated forms of technology and
new applications in which to use them. The problem now is beginning to shift;
originally, tech developers and researchers were all about gathering greater
quantities of data. Now, with all this data in tow, consumers and developers
are both eager for new ways to condense, interpret, and take action on this
data.
One of the newest and most talked-about methods for this is data
visualization, a system of reducing or illustrating data in simplified, visual
ways. The buzz around data visualization is strong and growing, but is the
trend all it’s cracked up to be?
Source: https://www.datasciencecentral.com/profiles/blogs/4-potential-problems-with-data-visualization
I. ORGANIZATION/SOURCES OF DATA
Types of Data Sources
Various types of data are useful for business reports; in such reports, you will
quickly come across things like revenue (money earned in a given period,
usually a year), turnover (people who left the organization in a given period),
and many others.
● Internal
o Employee headcount
o Employee demographics (e.g., sex, ethnicity, marital status)
o Financials (e.g., revenue, profit, cost of goods sold, margin,
operating ratio)
● External
o Number of vendors used
o Number of clients in a company’s book of business
o Size of the industry (e.g. number of companies, total capital)
Both types of data are useful for business report writing. Usually a report will
feature as much “hard” quantitative data as possible, typically in the form of
earnings or revenue, headcount, and other numerical data available. Most
organizations keep a variety of internal quantitative data. Qualitative data,
such as stories, case studies, or narratives about processes or events, are
also very useful and provide context. A good report will have both types of
data, and a good report writer will use both to build a full picture of the
information for their readers.
Primary Research
This type of research is done to fill in gaps found during a review of secondary
research. That is, you do not conduct primary research if you can address
your research question with already existing secondary sources.
Secondary Research
Common forms are books, journals, newspaper articles, media reports, and
other polished accounts of data. Most report writers will use secondary
sources for their business reports in order to gather, curate, and present the
material in a new, updated and helpful manner. Using secondary research is
far less costly and more efficient, since it takes less time to gather data from
already developed sources.
Source: https://courses.lumenlearning.com/wm-businesscommunicationmgrs/chapter/types-of-data-sources/
If marketers had all the data about consumers that they could use to predict
consumer behavior, it would be a dream come true. Until now, marketers had
enough data about consumers that they then modeled to arrive at probable
consumer behavior decisions. This data, culled from marketing research, was
adequate as long as extrapolating the trends could translate into predictions of
consumer behavior. In recent years, however, marketers have gone one step
further: instead of extrapolating data to predict consumer behavior, they are
turning to Big Data, or data about virtually all aspects of consumers, to support
predictive analytics, the art and science of accurately mapping consumer
behavior. In other words, Big Data is about marketers collecting everything
possible about consumer behavior and predicting not only what consumers
are doing now but how they will behave in the future. For instance, Big Data
provides marketers with the ability to identify the state of the consumers,
as can be seen in the recent prediction by the retail giant Target that a
woman was pregnant based on her buying data.
This is the promise of Big Data: it goes beyond merely extrapolating trends and
instead identifies and predicts the next move of the consumer based on his or
her current state. This would be like getting inside the minds of consumers;
instead of merely knowing what they would probably purchase, marketers
would know with accuracy what consumers are likely to do in the future.
The term Big Data has been coined because it gives marketers the bigger
picture and at the same time lets them model consumer behavior at the micro
level. The integration of macro data and micro trends gives marketers
unparalleled access to data, which can then be used to accurately predict
consumer behavior. The collection of Big Data is done not only from the
consumer buying behavior but also from mining all the available data in the
public and private domains to arrive at a comprehensive picture of what the
consumers think and how they act. The promise of Big Data is boundless for
marketers who can now think ahead of the consumers instead of the other
way around as well as preempt possible consumer behavior by targeting
products aimed at the future actions of the consumers.
Of course, the promise of Big Data also comes with its perils as the tendency
to be the master of consumer behavior can lead to serious issues with privacy
and security of the data available with the marketers. The example of Target
predicting whether the woman was pregnant or not based on her shopping
habits was received with both enthusiasm as well as alarm. The enthusiasm
was from the marketers whereas the alarm was from the activists and experts
who deal with privacy and security of data. The point here is that Big Data
places enormous responsibilities on marketers and hence, they have to be
very careful about the data that they hold and the prediction models and
simulation that they run. If they choose to predict whether someone is going to
do something next based on the results from the model, this prediction can
also be used for unwelcome purposes; as seen in recent revelations about
tracking and surveillance, the data can be compromised or used to target
innocent consumers. This is the reason why many experts are
guarded as far as Big Data is concerned and they are waiting for the
marketers and the regulators to frame rules and policies on how Big Data can
be used in practice.
Concluding Remarks
Finally, it must be mentioned that whichever stance one might have about Big
Data, the potential uses of it for predicting the outbreaks of diseases and
controlling crime are indeed boons to the regulators and the law enforcement
agencies and therefore, it would be better for all stakeholders to decide on the
kind of purposes for which Big Data can be used.
Source: https://www.managementstudyguide.com/big-data-and-its-importance.htm
An oft-cited estimate by IBM calculated that the annual cost of data quality
issues in the U.S. amounted to $3.1 trillion in 2016. In an article he wrote for
the MIT Sloan Management Review in 2017, data quality consultant Thomas
Redman estimated that correcting data errors and dealing with the business
problems caused by bad data costs companies 15% to 25% of their annual
revenue on average.
The International Monetary Fund (IMF), which oversees the global monetary
system and lends money to economically troubled nations, has also specified
an assessment methodology, similarly known as the Data Quality Assessment
Framework. Its framework focuses on accuracy, reliability, consistency and
other data quality attributes in the statistical data that member countries need
to submit to the IMF.
Those processes include data cleansing, or data scrubbing, to fix data errors,
plus work to enhance data sets by adding missing values, more up-to-date
information or additional records. The results are then monitored and
measured against the performance targets, and any remaining deficiencies in
data quality provide a starting point for the next round of planned
improvements. Such a cycle is intended to ensure that efforts to improve
overall data quality continue after individual projects are completed.
Software tools specialized for data quality management can match records,
delete duplicates, validate new data, establish remediation policies and
identify personal data in data sets; they also do data profiling to collect
information about data sets and identify possible outlier values. Management
consoles for data quality initiatives support creation of data handling rules,
discovery of data relationships and automated data transformations that may
be part of data quality maintenance efforts.
Data quality demands are also expanding due to the implementation of new
data privacy and protection laws, most notably the European Union's General
Data Protection Regulation (GDPR) and the California Consumer Privacy Act
(CCPA). Both measures give people the right to access the personal data that
companies collect about them, which means organizations must be able to
find all of the records on an individual in their systems without missing any
because of inaccurate or inconsistent data.
Fixing data quality issues
Data quality managers, analysts and engineers are primarily responsible for
fixing data errors and other data quality problems in organizations. They're
collectively tasked with finding and cleansing bad data in databases and other
data repositories, often with assistance and support from other data
management professionals, particularly data stewards and data governance
program managers.
In that broader view, data integrity focuses on integrity from both logical and
physical standpoints. Logical integrity includes data quality measures and
database attributes such as referential integrity, which ensures that related
data elements in different database tables are valid. Physical integrity involves
access controls and other security measures designed to prevent data from
being modified or corrupted by unauthorized users, as well as backup and
disaster recovery protections.
Source: https://searchdatamanagement.techtarget.com/definition/data-quality
Missing Completely at Random (MCAR)
When we say data are missing completely at random, we mean that the
missingness has nothing to do with the observation being studied (the
completely observed variable X or the partly missing variable Y). For example,
a weighing scale may have run out of batteries, a questionnaire might be lost
in the post, or a blood sample might be damaged in the lab. MCAR is an ideal
but often unrealistic assumption. Generally, data are regarded as MCAR when
they are missing by design, because of an equipment failure, or because the
samples are lost in transit or technically unsatisfactory. The statistical
advantage of MCAR data is that the analysis remains unbiased. In a pictorial
view of MCAR, missingness has no relation to the dataset variables X or Y; it
is driven by some other factor Z.
Consider an example from mobile data: one sample has a missing value, not
because of the dataset variables but because of some other external reason.
Missing at Random (MAR)
When we say data are missing at random, we mean that missingness on a
partly missing variable (Y) is related to some other completely observed
variable (X) in the analysis model, but not to the values of Y itself.
Missing Not at Random (MNAR)
If the characteristics of the data do not meet those of MCAR or MAR, then they
fall into the category of missing not at random (MNAR). When data are
missing not at random, the missingness is specifically related to what is
missing; for example, a person does not attend a drug test because the person
took drugs the night before, or a person does not take an English proficiency
test because of poor English language skills. Cases of MNAR data are
problematic. The only way to obtain an unbiased estimate of the parameters in
such a case is to model the missing data, which requires a proper
understanding and domain knowledge of the missing variable. The model may
then be incorporated into a more complex one for estimating the missing
values. In a pictorial view of MNAR, missingness has a direct relation to the
variable Y; it can have other relations as well (to X and Z).
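As a minimal sketch of how the three mechanisms differ, the toy simulation below generates a fully observed X and a partly missing Y; the variable names and missingness probabilities are assumptions made for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)              # completely observed variable X
y = 2 * x + rng.normal(size=n)      # variable Y that will receive missing values

# MCAR: every Y value has the same chance of being missing, unrelated to X or Y.
y_mcar = pd.Series(np.where(rng.random(n) < 0.2, np.nan, y))

# MAR: the chance of Y being missing depends only on the observed X (here, X > 0).
y_mar = pd.Series(np.where((x > 0) & (rng.random(n) < 0.4), np.nan, y))

# MNAR: the chance of Y being missing depends on the unobserved value of Y itself.
y_mnar = pd.Series(np.where((y > 1) & (rng.random(n) < 0.4), np.nan, y))

print(y_mcar.isna().mean(), y_mar.isna().mean(), y_mnar.isna().mean())
```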
There are several strategies that can be applied to handle missing data when
building a machine learning or statistical model.
This may be possible in the data collection phase, in survey-like situations
where one can check whether the survey data has been captured in its
entirety before the respondent leaves the room. Sometimes it may be possible
to go back to the source to get the data, for example by asking the missing
question again. In real-world scenarios, this is a very unlikely way to resolve
the missing data problem.
Educated Guessing
It sounds arbitrary and is never the preferred course of action, but one can
sometimes infer a missing value based on other responses. For related
questions, for example those often presented in a matrix, if the participant
responds with all “2s”, assume that the missing value is also a 2.
Discard Data
By far the most common approach to the missing data is to simply omit those
cases with the missing data and analyse the remaining data. This approach is
known as the complete case (or available case) analysis or list-wise deletion.
If there is a large enough sample, so that power is not an issue, and the
assumption of MCAR is satisfied, listwise deletion may be a reasonable
strategy. However, when the sample is not large, or the assumption of MCAR
is not satisfied, listwise deletion is not optimal, because it introduces bias
whenever MCAR does not hold.
In this case, only the missing observations are ignored and the analysis is
done on the variables that are present. If there is missing data elsewhere in
the data set, the existing values are still used. Since pairwise deletion uses all
the information observed, it preserves more information than listwise deletion.
Pairwise deletion is known to be less biased for MCAR or MAR data.
However, if there are many missing observations, the analysis will be
deficient. The problem with pairwise deletion is that even though it takes the
available cases, one cannot compare analyses because the sample is different
every time.
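A small, assumed example of the difference between listwise and pairwise deletion using pandas (the data frame below is hypothetical):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age":    [25.0, 30.0, np.nan, 45.0, 50.0],
    "income": [40.0, np.nan, 55.0, 60.0, 65.0],
    "score":  [3.0, 4.0, 2.0, np.nan, 5.0],
})

# Listwise (complete case) deletion: drop any row with at least one missing value.
listwise = df.dropna()
print(len(listwise), "complete cases remain out of", len(df))

# Pairwise deletion: each pairwise statistic uses all rows available for that pair,
# so different entries of the correlation matrix can be based on different rows.
print(df.corr())  # pandas computes correlations on pairwise-complete observations
```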
3) Dropping Variables
If too much data is missing for a variable, it may be an option to delete that
variable or column from the dataset. There is no rule of thumb for this; it
depends on the situation, and a proper analysis of the data is needed before
the variable is dropped altogether. This should be the last option, and one
should check whether model performance improves after the variable is
deleted.
Retain All Data
If data is time-series data, one of the most widely used imputation methods is
the last observation carried forward (LOCF). Whenever a value is missing, it is
replaced with the last observed value. This method is advantageous as it is
easy to understand and communicate. Although simple, this method strongly
assumes that the value of the outcome remains unchanged by the missing
data, which seems unlikely in many settings.
3) Next Observation Carried Backward (NOCB)
A similar approach to LOCF that works in the opposite direction, taking the
first observation after the missing value and carrying it backward (“next
observation carried backward”, or NOCB).
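A brief sketch of LOCF and NOCB on a hypothetical daily time series, using pandas:

```python
import numpy as np
import pandas as pd

# Hypothetical daily series with gaps.
ts = pd.Series(
    [10.0, np.nan, np.nan, 13.0, np.nan, 15.0],
    index=pd.date_range("2021-01-01", periods=6, freq="D"),
)

locf = ts.ffill()  # last observation carried forward
nocb = ts.bfill()  # next observation carried backward
print(pd.DataFrame({"raw": ts, "LOCF": locf, "NOCB": nocb}))
```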
4) Linear Interpolation
For a rating scale, another option is to use the middle point or the most
commonly chosen value. For example, on a five-point scale, substitute a 3
(the midpoint) or a 4 (the most common value in many cases). This is similar
to mean imputation but more suitable for ordinal values.
This is perhaps the most widely used method of missing data imputation for
categorical variables. This method consists of treating missing data as if they
were an additional label or category of the variable. All the missing
observations are grouped under the newly created label “Missing”. It does not
assume anything about the missingness of the values, and it is very well
suited when the number of missing data points is high.
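A short sketch of this “Missing” label approach on a hypothetical categorical column:

```python
import numpy as np
import pandas as pd

# Hypothetical categorical column with gaps.
df = pd.DataFrame({"payment_method": ["card", np.nan, "cash", np.nan, "card"]})

# Treat missingness as its own category instead of guessing a value.
df["payment_method"] = df["payment_method"].fillna("Missing")
print(df["payment_method"].value_counts())
```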
7) Frequent category imputation
When data are not missing completely at random, we can capture the
importance of missingness by creating an additional variable indicating
whether the data was missing for that observation (1) or not (0). The
additional variable is a binary variable: it takes only the values 0 and 1, 0
indicating that a value was present for that observation, and 1 indicating that
the value was missing for that observation. Typically, mean/median imputation
is done together with adding a variable to capture those observations where
the data was missing.
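A minimal sketch of adding a missing-indicator variable alongside median imputation, on a hypothetical numeric column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"income": [40.0, np.nan, 55.0, np.nan, 65.0]})

# 1 where the value was missing, 0 where it was present.
df["income_missing"] = df["income"].isna().astype(int)

# Median imputation alongside the indicator, so a downstream model can still
# "see" which observations originally had no value.
df["income"] = df["income"].fillna(df["income"].median())
print(df)
```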
10) Random Sampling Imputation
Multiple Imputation
Multiple Imputation (MI) is a statistical technique for handling missing data.
The key concept of MI is to use the distribution of the observed data to
estimate a set of plausible values for the missing data. Random components
are incorporated into these estimated values to show their uncertainty.
Multiple datasets are created and then analysed individually but identically,
and the resulting parameter estimates are combined into a single set of
estimates. As a flexible way of handling more than one missing
variable, apply a Multiple Imputation by Chained Equations (MICE) approach.
The benefit of the multiple imputation is that in addition to restoring the natural
variability of the missing values, it incorporates the uncertainty due to the
missing data, which results in valid statistical inference. Refer to the
references for more information on MI and MICE.
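As an illustrative sketch (not the exact procedure described in the source), scikit-learn's IterativeImputer performs a chained-equations style of imputation; note that by default it returns a single imputed dataset, and a full multiple-imputation workflow would repeat the step with sample_posterior=True and different random seeds, then pool the results:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401, enables the estimator
from sklearn.impute import IterativeImputer

X = np.array([
    [1.0, 2.0, np.nan],
    [3.0, np.nan, 6.0],
    [np.nan, 5.0, 9.0],
    [4.0, 8.0, 12.0],
])

# Each feature with missing values is modelled from the other features,
# and the round-robin regressions are repeated for several iterations.
imputer = IterativeImputer(max_iter=10, random_state=0)
X_filled = imputer.fit_transform(X)
print(X_filled)
```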
Linear Regression
Random Forest
Random forest is a non-parametric imputation method applicable to various
variable types that works well with both data missing at random and not
missing at random. Random forest uses multiple decision trees to estimate
missing values and outputs OOB (out of bag) imputation error estimates. One
caveat is that random forest works best with large datasets and using random
forest on small datasets runs the risk of overfitting.
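One hedged way to approximate this missForest-style idea is to plug a random forest into scikit-learn's IterativeImputer; the tiny array below is illustrative, and the out-of-bag error estimate mentioned above comes from dedicated implementations such as missForest rather than from this sketch:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.experimental import enable_iterative_imputer  # noqa: F401, enables the estimator
from sklearn.impute import IterativeImputer

X = np.array([
    [7.0, 2.0, np.nan],
    [8.0, np.nan, 6.0],
    [np.nan, 5.0, 9.0],
    [9.0, 8.0, 12.0],
    [6.0, 4.0, 7.0],
])

# Each feature is predicted from the others with a random forest,
# repeated until the imputed values stabilise.
rf_imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=100, random_state=0),
    max_iter=5,
    random_state=0,
)
X_filled = rf_imputer.fit_transform(X)
print(X_filled)
```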
Maximum likelihood
Expectation-Maximization
Sensitivity analysis
Sensitivity analysis is the study of how the uncertainty in the output of a model
can be allocated to the different sources of uncertainty in its inputs. When
analysing missing data, additional assumptions about the reasons for the
missing data are made, and these assumptions are often applied to the
primary analysis. However, the assumptions cannot be definitively validated
for correctness. Therefore,
the National Research Council has proposed that the sensitivity analysis be
conducted to evaluate the robustness of the results to the deviations from the
MAR assumption.
Algorithms that Support Missing Values
Not all algorithms fail when there is missing data. There are algorithms that
can be made robust to missing data, such as k-Nearest Neighbours that can
ignore a column from a distance measure when a value is missing. There are
also algorithms that can use the missing value as a unique and different value
when building the predictive model, such as classification and regression
trees. Algorithms like XGBoost take any missing data into consideration. If
your imputation does not work well, try a model that is robust to missing data.
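As a small illustration, scikit-learn's histogram-based gradient boosting (similar in this respect to XGBoost) accepts NaN values directly and learns a default branch for them; the toy data below is made up:

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier

# Tiny illustrative dataset containing NaNs; the model routes missing values
# down a learned default branch instead of failing or requiring imputation.
X = np.array([
    [1.0, np.nan],
    [2.0, 3.0],
    [np.nan, 4.0],
    [5.0, 6.0],
    [6.0, np.nan],
    [7.0, 8.0],
])
y = np.array([0, 0, 0, 1, 1, 1])

model = HistGradientBoostingClassifier(max_iter=50).fit(X, y)
print(model.predict(X))
```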
Recommendations
Missing data reduces the power of a model. Some amount of missing data is
expected, and the target sample size can be increased to allow for it.
However, increasing the sample size cannot eliminate the potential bias. More
attention should be paid to missing data in the design and conduct of studies
and in the analysis of the resulting data. Machine learning imputation
techniques should only be applied after maximal efforts have been made to
reduce missing data through study design and prevention.
Source: https://towardsdatascience.com/all-about-missing-data-handling-b94b8b5d2184
IV. DATA CLASSIFICATION
There are three main types of data classification that are considered
industry standards:
● Content-based classification inspects and interprets files to look for
sensitive information.
● Context-based classification looks at indirect indicators such as the
application, location, or creator of the data.
● User-based classification relies on a knowledgeable user’s manual
judgment of each document.
One practical step in putting classification into place is to prioritize and
organize the data: now that you have a policy and a picture of your current
data, it’s time to properly classify it. Decide on the best way to tag your data
based on its sensitivity and privacy.
There are more benefits to data classification than simply making data easier
to find. Data classification is necessary to enable modern enterprises to make
sense of the vast amounts of data available at any given moment.
Source: https://digitalguardian.com/blog/what-data-classification-data-classification-definition
Data classification tags data according to its type, sensitivity, and value to the
organization if altered, stolen, or destroyed. It helps an organization
understand the value of its data, determine whether the data is at risk, and
implement controls to mitigate risks. Data classification also helps an
organization comply with relevant industry-specific regulatory mandates such
as SOX, HIPAA, PCI DSS, and GDPR.
Typically, there are three levels of data sensitivity:
● High sensitivity data: data whose compromise or destruction would have a
severe impact on the organization or on individuals, such as financial records,
intellectual property, and authentication data.
● Medium sensitivity data: data intended for internal use only, whose
compromise would have a less severe impact, such as internal emails and
documents with no confidential information.
● Low sensitivity data: data intended for public use. For example, public
website content.
Data Sensitivity Best Practices
Since the high, medium, and low labels are somewhat generic, a best practice
is to use labels for each sensitivity level that make sense for your
organization.
If a database, file, or other data resource includes data that can be classified
at two different levels, it’s best to classify all the data at the higher level.
Data Discovery
Classifying data requires knowing the location, volume, and context of data.
Most modern businesses store large volumes of data, which may be spread
across multiple repositories.
Before you can perform data classification, you must perform accurate and
comprehensive data discovery. Automated tools can help discover sensitive
data at large scale. See our article on Data Discovery for more information.
● Which organizational unit has the most information about the content and
context of the information?
The policy also determines the data classification process: how often data
classification should take place, for which data, which type of data
classification is suitable for different types of data, and what technical means
should be used to classify data. The data classification policy is part of the
overall information security policy, which specifies how to protect sensitive
data.
Following are common examples of data that may be classified into each
sensitivity level.
Sensitivity level: High
Examples: Credit card numbers (PCI) or other financial account numbers,
customer personal data, FISMA-protected information, privileged credentials
for IT systems, protected health information (HIPAA), Social Security
numbers, intellectual property, employee records.
Source: https://www.imperva.com/learn/data-security/data-classification/
V. BUSINESS MODELING, METRICS AND
MEASUREMENT
Focus is not the only question when developing KPIs for the canvas. There is
also the question of quantity. We strongly advise keeping the count down to a
limited set that can be committed to memory. Ideally, this would be fewer than
10; Marr's personal recommendation is 7.
It won't come as a surprise that the rigor of identifying the KPQs and KPIs can
be challenging. We want the executives, managers and employees of an
organization to align around them, to be familiar with them and to ground their
initiatives with an eye to these metrics. It's a tall order, but well worth it.
Integrating KPIs Directly on the Canvas
Now let's marry the KPIs to the canvas. We'll continue with the Toro example.
As you've seen above, we've mapped the KPIs directly onto the surface of the
canvas and associated them with the appropriate part of the canvas. While
there is a wide range of aesthetic approaches to this, the key outcome must
be that we can assess - quantitatively - how the business model's vision is
unfolding. Simple KPI colors draw the eye to status - and the relative
challenge of resolving.
Source: http://www.bartlettsystem.com/blog/business-model-canvas-integrating-kpis
● Metrics help us ask the right questions, measure success, and provide
feedback to improve our strategies.
● There is a distinction between traditional and dynamic metrics. Change is
not easy to see in traditional metrics, like quarterly revenue; it is easier to
measure in dynamic metrics, like website visits getting converted into clicks
and eventually purchases.
● People are normally happy to talk about how interested they are in your
product and to introduce you to many people who may not actually have any
budget to buy, and may even make you travel and present at your own
expense. Businesses need to be cautious of such cases.
● But this still does not mean a sale has happened. A lot of things can go
wrong, like the decision maker quitting, the company getting acquired, or the
project getting scrapped.
● Some key metrics are sales leads, qualified sales leads, time taken to get
to the right person, and time taken to get them to say yes.
Dynamic Metrics:
● Examples of dynamic metrics are click rates, most sought-after items,
“people who viewed this item also viewed these”, and so on.
● Also note that it does not say “most frequently bought together”, only
“frequently bought together”. There is a possibility that out of top 100
co-occurrence data, the choice to display this co-occurrence over the
other is made due to an A/B testing that results in better sales
and revenue. These numbers are dependent on book price and volumes,
remember!
● Amazon also maintains a co-occurrence for click data, so that it can tempt
the user into buying just by showing “people also viewed this item” list.
Profitability/Efficiency Metrics:
● Inventory Management: Take any retailer that sells goods, where inventory
and stock keeping units (SKUs) come into the picture. The sales price being
the same, the cost to manage inventory is the deciding factor: the more days
an item sits in inventory, on the shelf or in storage, the lower the profit, for the
reasons explained. The average number of days in inventory, called “days
inventory”, is one such profitability metric. The company's inventory on hand
at the end of the year, divided by the total annual cost of goods sold and then
multiplied by 365 days of the year, is a very good estimate of average days
inventory (see the small worked example after this list).
● Too little inventory can cause customers to go away empty-handed and
possibly never visit the store again, which is hard to know. A good metric to
capture this lost customer is the number of times the inventory of any
particular SKU reached zero.
● Hotel Room and Airline Examples: If one seat on a flight is not occupied,
that is a lost opportunity, and similarly with a hotel room that is not rented. The
variable cost per unit is negligible compared to the huge fixed cost, which is
effectively a “sunk cost”. So there is the concept of variable pricing: analyze
occupancy rates for weekly or seasonal patterns, predict the expected
occupancy rate, and based on these factors the hotel can offer three different
types of rates: the rack rate, an intermediate or promotional rate, and the floor
rate.
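A small worked example of the “days inventory” estimate described in the inventory bullet above, using purely illustrative numbers:

```python
# Average days in inventory ("days inventory"), illustrative numbers only.
ending_inventory = 500_000   # inventory on hand at year end
annual_cogs = 4_000_000      # total annual cost of goods sold

days_inventory = ending_inventory / annual_cogs * 365
print(round(days_inventory, 1))  # about 45.6 days
```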
Risk Metric:
● When a company owes more than it is worth (“excessive leverage”), it is
unlikely to survive.
Source: https://towardsdatascience.com/an-overview-of-business-metrics-for-data-driven-companies-b0f698710da1
Understanding Metrics
Choosing Metrics
In order to establish a useful metric, a manager must first assess the
business's goals. From there, it is important to find the best outputs that
measure the activities related to those goals. A final step is setting goals and
targets for KPI metrics that are integrated with business decisions.
Several businesses have also popularized certain methods that have become
industry standards in many sectors. DuPont began using metrics to better
their own business and in the process came up with the popular DuPont
analysis which closely isolates variables involved in the return on equity
(ROE) metric. GE has also commissioned a set of metrics known as Six
Sigma that are commonly used today, with metrics tracked in six key areas:
critical to quality; defects; process capability; variation; stable operations; and,
design for Six Sigma.
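For reference, the widely used three-factor DuPont decomposition expresses ROE as net profit margin times asset turnover times the equity multiplier; the figures below are illustrative only:

```python
# Three-factor DuPont decomposition of return on equity (illustrative numbers).
net_income = 120
revenue = 1_000
total_assets = 800
shareholders_equity = 400

net_margin = net_income / revenue                        # 0.12
asset_turnover = revenue / total_assets                  # 1.25
equity_multiplier = total_assets / shareholders_equity   # 2.0

roe = net_margin * asset_turnover * equity_multiplier
print(roe)  # 0.30, the same as net_income / shareholders_equity
```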
Examples of Metrics
While there are a wide range of metrics, below are some commonly used
tools:
Economic Metrics
There are several metrics that are key to comparing the financial position of
companies against their competitors or the market overall. Two of these key
comparable metrics, which are based on market value,
include price-to-earnings ratio and price-to-book ratio.
Portfolio Management
Source: https://www.investopedia.com/terms/m/metrics.asp
For example, a metric may monitor website traffic compared to a goal, whereas a
KPI would monitor how website traffic contributed to incremental sales.
Depending on your company and the areas you’re aiming to monitor, you may
want to focus on certain business metrics in particular. Here, you can see a
number of performance metrics examples for industry verticals and departments
that are available to you:
Marketing Metrics
Incremental Sales
Social Sentiment
End Action Rate
SEO Traffic
Sales Metrics
Sales Growth
Product Performance
Financial Metrics
Current Ratio
Working Capital
SaaS Metrics
Social Media Metrics are values used by marketing teams to track the
performance of social media campaigns. Social media marketing is a
fundamental part of any business, bringing in website visits and eventually
converting web users into leads. Since marketing teams often use multiple
social media platforms to increase impressions, it can be difficult to monitor
performance on all of them. These social media metrics combine the most
important data and allow your team to track their progress:
Every business has data that is critical to monitor in real-time, or over time.
These metric examples are applicable (and important) to a multitude of
departments and fields:
Time to Healthcare Service
Service Level
Call Abandonment
Project Burndown Metric
Source: https://www.klipfolio.com/resources/articles/what-are-business-metrics
But let’s start with the basics of business operations, and provide foundations
for analyzing your own metrics and KPIs while focusing on industry and
company department-specific examples which a business can use for its own
development. We will discuss Marketing, Human Resources, Sales, Logistics
and IT Project Management examples that can improve operational efficiency
and decrease costs.
Turning these datasets into a business dashboard can effectively track the
right values and offer a comprehensive application to the entire business
system.
The analysis of operational KPIs and metrics with the right KPI software can
be easily developed by turning raw data into a neat and interactive online
dashboard, providing insights that can be easily overlooked when creating
traditional means of reporting and analysis, like spreadsheets or simple
written reports. Operational KPIs and metrics can become immense and
boundless if not defined and used properly, so taking care of the basics we
have outlined should be one of the top priorities when deciding which ones to
use. Later we will discuss examples, so that a clear overview is provided of
which ones to identify and utilize, at an industry and function level.
The need to establish specific operational metrics and track their efficiency
creates invaluable results for any marketing campaign. Let's see this through
an example.
The CPC (Cost-per-Click) overview of campaigns is an operational metric that
expounds on the standard pricing model in online advertising. When
comparing different campaigns in the CPC section of the overall strategy, you
can easily spot which one had the lowest price and dig deeper into the details.
While this marketing KPI is invaluable when it comes to advertising, it should
be viewed in relation to other important operational metrics. Below in
the article you can find a holistic overview of different kinds of KPIs that are
used in a standard marketing practice.
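As a minimal sketch, CPC is simply advertising spend divided by the number of clicks, so comparing campaigns can look like the following (campaign names and figures are made up):

```python
# Cost-per-click per campaign: total spend / total clicks (hypothetical figures).
campaigns = {
    "spring_promo": {"spend": 1200.0, "clicks": 4800},
    "summer_sale": {"spend": 900.0, "clicks": 3000},
}

for name, c in campaigns.items():
    cpc = c["spend"] / c["clicks"]
    print(f"{name}: CPC = {cpc:.2f}")  # 0.25 and 0.30 respectively
```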
After we have provided specific KPIs from industries and functions, now we
will focus on a holistic overview, and how they are interconnected into an
overall operational process. Let's analyze this through examples.
Another interesting metric to take into account is overtime hours. That way it
is easy to spot whether employees are understaffed or lack training, which
can also affect productivity. The main focus is not to put workers under pressure, which
can lead to demotivation. A comprehensive HR report can utilize all the
effectiveness needed to develop and maintain a sustainable workforce in a
company.
One of our operational metrics examples we will focus on next is sales. When
considering the sales cycle, it is of utmost importance to compile a succinct
operations monitoring process to ensure that all the sales stages leading to
conversions are covered.
Metrics shown in the example above provide operational details needed to
compile a holistic overview of the sales conversion rate cycle. Leads don’t
always turn into opportunities, and proposals don’t always yield wins, but the
monitoring process of your metrics can easily identify if the overall
performance is on track and developing as planned. The magic is in the
details, and this dashboard presentation can effectively round up the
data-story you need.
To put things into perspective, here are the Top 10 operational metrics from
different functions and industries:
1. CPC (Cost-per-Click)
2. CPA (Cost-per-Acquisition)
3. Absenteeism Rate
4. Overtime Hours
5. Lead-to-Opportunity Ratio
7. Delivery Time
8. Transportation Costs
Source: https://www.datapine.com/blog/operational-metrics-and-kpi-examples/
4. Review KPIs
As you move your prospects along the journey from awareness to
consideration to purchase (and repurchase/retention), your marketing
tactics, messaging, and content will deepen the relationship. At each
phase you should be looking at different KPIs that will signal success
or a need to optimize and improve performance.
Source: https://www.idg.com/8-steps-for-measurement-and-analytics-success/
Davenport and Harris article: “The Dark Side of Customer Analytics”
by Thomas H. Davenport and Jeanne Harris
From the May 2007 Issue
Laura Brickman was glad she was almost done grocery shopping.
The lines at the local ShopSense supermarket were especially long for a Tuesday evening.
Her cart was nearly overflowing in preparation for several days away from her family, and
she still had packing to do at home. Just a few more items to go: “A dozen eggs, a half gallon
of orange juice, and—a box of Dip & Dunk cereal?” Her six-year-old daughter, Maryellen,
had obviously used the step stool to get at the list on the counter and had scrawled her
high-fructose demand at the bottom of the paper in bright-orange marker.
Laura made a mental note to speak with Miss Maryellen about what sugary
cereals do to kids’ teeth (and to their parents’ wallets). Taking care not to
crack any of the eggs, she squeezed the remaining items into the cart. She
wheeled past the ShopSense Summer Fun displays. “Do we need more
sunscreen?” Laura wondered for a moment, before deciding to go without.
She got to the checkout area and waited.
As regional manager for West Coast operations of IFA, one of the largest
sellers of life and health insurance in the United States, Laura normally might
not have paid much attention to ShopSense’s checkout procedures—except
maybe to monitor how accurately her purchases were being rung up. But now
that her company’s fate was intertwined with that of the Dallas-based national
grocery chain, she had less motivation to peruse the magazine racks and
more incentive to evaluate the scanning and tallying going on ahead of her.
Shortly after reading that article, Laura had invited Steve to her office in San
Francisco. The two met several times, and, after some fevered discussions
with her bosses in Ohio, Laura made the ShopSense executive an offer. The
insurer wanted to buy a small sample of the grocer’s customer loyalty card
data to determine its quality and reliability; IFA wanted to find out if the
ShopSense information would be meaningful when stacked up against its own
claims information.
With top management’s blessing, Steve and his team had agreed to provide
IFA with ten years’ worth of loyalty card data for customers in southern
Michigan, where ShopSense had a high share of wallet—that is, the
supermarkets weren’t located within five miles of a “club” store or other major
rival. Several months after receiving the tapes, analysts at IFA ended up
finding some fairly strong correlations between purchases of unhealthy
products (high-sodium, high-cholesterol foods) and medical claims. In
response, Laura and her actuarial and sales teams conceived an offering
called Smart Choice, a low-premium insurance plan aimed at IFA customers
who didn’t indulge.
Laura was flying the next day to IFA’s headquarters in Cincinnati to meet with
members of the senior team. She would be seeking their approval to buy
more of the ShopSense data; she wanted to continue mining the information
and refining IFA’s pricing and marketing efforts. Laura understood it might be
a tough sell. After all, her industry wasn’t exactly known for embracing radical
change—even with proof in hand that change could work. The make-or-break
issue, she thought, would be the reliability and richness of the data.
“Your CEO needs to hear only one thing,” Steve had told her several days
earlier, while they were comparing notes. “Exclusive rights to our data will give
you information that your competitors won’t be able to match. No one else has
the historical data we have or as many customers nationwide.” He was right,
of course. Laura also knew that if IFA decided not to buy the grocer’s data,
some other insurer would.
“Exclusive rights to our data will give you information that your
competitors won’t be able to match. No one else has the historical data
we have.”
“Paper or plastic?” a young boy was asking. Laura had finally made it to the
front of the line. “Oh, paper, please,” she replied. The cashier scanned in the
groceries and waited while Laura swiped her card and signed the touch
screen. Once the register printer had stopped chattering, the cashier curled
the long strip of paper into a thick wad and handed it to Laura. “Have a nice
night,” she said mechanically.
Before wheeling her cart out of the store into the slightly cool evening, Laura
briefly checked the total on the receipt and the information on the back:
coupons for sunblock and a reminder about the importance of UVA and UVB
protection.
“No data set is perfect, but based on what we’ve seen already, the
ShopSense info could be a pretty rich source of insight for us,” Archie Stetter
told the handful of executives seated around a table in one of IFA’s recently
renovated conference rooms. Laura nodded in agreement, silently cheering
on the insurance company’s uberanalyst. Archie had been invaluable in
guiding the pilot project. Laura had flown in two days ahead of the meeting
and had sat down with the chatty statistics expert and some members of his
team, going over results and gauging their support for continuing the
relationship with ShopSense.
“Trans fats and heart disease—no surprise there, I guess,” Archie said, using
a laser pointer to direct the managers’ attention to a PowerPoint slide
projected on the wall. “How about this, though: Households that purchase
both bananas and cashews at least quarterly seem to show only a negligible
risk of developing Parkinson’s and MS.” Archie had at first been skeptical
about the quality of the grocery chain’s data, but ShopSense’s well of
information was deeper than he’d imagined. Frankly, he’d been having a blast
slicing and dicing. Enjoying his moment in the spotlight, Archie went on a bit
longer than he’d intended, talking about typical patterns in the purchase of
certain over-the-counter medications, potential leading indicators for diabetes,
and other statistical curiosities. Laura noted that as Archie’s presentation wore
on, CEO Jason Walters was jotting down notes. O.Z. Cooper, IFA’s general
counsel, began to clear his throat over the speakerphone.
Laura was about to rein in her stats guy when Rusty Ware, IFA’s chief actuary,
addressed the group. “You know, this deal isn’t really as much of a stretch as
you might think.” He pointed out that the company had for years been buying
from information brokers lists of customers who purchased specific drugs and
products. And IFA was among the best in the industry at evaluating external
sources of data (credit histories, demographic studies, analyses of
socioeconomic status, and so on) to predict depression, back pain, and other
expensive chronic conditions. Prospective IFA customers were required to
disclose existing medical conditions and information about their personal
habits—drinking, smoking, and other high-risk activities—the actuary
reminded the group.
The CEO, meanwhile, felt that Rusty was overlooking an important point. “But
if we’re finding patterns where our rivals aren’t even looking, if we’re coming
up with proprietary health indicators—well, that would be a huge hurdle for
everyone else to get over,” Jason noted.
Laura was keeping an eye on the clock; there were several themes she still
wanted to hammer on. Before she could follow up on Jason’s comments,
though, Geneva Hendrickson, IFA’s senior vice president for ethics and
corporate responsibility, posed a blue-sky question to the group: “Take the
fruit-and-nut stat Archie cited. Wouldn’t we have to share that kind of
information? As a benefit to society?”
Several managers at the table began talking over one another in an attempt to
respond. “Correlations, no matter how interesting, aren’t conclusive evidence
of causality,” someone said. “Even if a correlation doesn’t hold up in the
medical community, that doesn’t mean it’s not useful to us,” someone else
suggested.
Laura saw her opening; she wanted to get back to Jason’s point about
competitive advantage. “Look at Progressive Insurance,” she began. It was
able to steal a march on its rivals simply by recognizing that not all motorcycle
owners are created equal. Some ride hard (young bikers), and some hardly
ride (older, middle-class, midlife crisis riders). “By putting these guys into
different risk pools, Progressive has gotten the rates right,” she said. “It wins
all the business with the safe set by offering low premiums, and it doesn’t lose
its shirt on the more dangerous set.”
Then O.Z. Cooper broke in over the speakerphone. Maybe the company
should formally position Smart Choice and other products and marketing
programs developed using the ShopSense data as opt in, he wondered. A lot
of people signed up when Progressive gave discounts to customers who
agreed to put devices in their cars that would monitor their driving habits. “Of
course, those customers realized later they might pay a higher premium when
the company found out they routinely exceeded the speed limit—but that’s not
a legal problem,” O.Z. noted. None of the states that IFA did business in had
laws prohibiting the sort of data exchange ShopSense and the insurer were
proposing. It would be a different story, however, if the company wanted to do
more business overseas.
At that point, Archie begged to show the group one more slide: sales of
prophylactics versus HIV-related claims. The executives continued taking
notes. Laura glanced again at the clock. No one seemed to care that they
were going a little over.
Data Decorum
Rain was in the forecast that afternoon for Dallas, so Steve Worthington
decided to drive rather than ride his bike the nine and a half miles from his
home to ShopSense’s corporate offices in the Hightower Complex. Of course,
the gridlock made him a few minutes late for the early morning meeting with
ShopSense’s executive team. Lucky for him, others had been held up by the
traffic as well.
The group gradually came together in a slightly cluttered room off the main
hallway on the 18th floor. One corner of the space was being used to store
prototypes of regional in-store displays featuring several members of the
Houston Astros’ pitching staff. “I don’t know whether to grab a cup of coffee or
a bat,” Steve joked to the others, gesturing at the life-size cardboard cutouts
and settling into his seat.
Steve was hoping to persuade CEO Donna Greer and other members of the
senior team to approve the terms of the data sale to IFA. He was pretty
confident he had majority support; he had already spoken individually with
many of the top executives. In those one-on-one conversations, only Alan
Atkins, the grocery chain’s chief operations officer, had raised any significant
issues, and Steve had dealt patiently with each of them. Or so he thought.
At the start of the meeting, Alan admitted he still had some concerns about
selling data to IFA at all. Mainly, he was worried that all the hard work the
organization had done building up its loyalty program, honing its analytical
chops, and maintaining deep customer relationships could be undone in one
fell swoop. “Customers find out, they stop using their cards, and we stop
getting the information that drives this whole train,” he said.
Steve reminded Alan that IFA had no interest in revealing its relationship with
the grocer to customers. There was always the chance an employee would let
something slip, but even if that happened, Steve doubted anyone would be
shocked. “I haven’t heard of anybody canceling based on any of our other
card-driven marketing programs,” he said.
“That’s because what we’re doing isn’t visible to our customers—or at least it
wasn’t until your recent comments in the press,” Alan grumbled. There had
been some tension within the group about Steve’s contribution to several
widely disseminated articles about ShopSense’s embrace of customer
analytics.
“Point taken,” Steve replied, although he knew that Alan was aware of how
much positive attention those articles had garnered for the company. Many of
its card-driven marketing programs had since been deemed cutting-edge by
others in and outside the industry.
Steve had hoped to move on to the financial benefits of the arrangement, but
Denise Baldwin, ShopSense’s head of human resources, still seemed
concerned about how IFA would use the data. Specifically, she wondered,
would it identify individual consumers as employees of particular companies?
She reminded the group that some big insurers had gotten into serious trouble
because of their profiling practices.
IFA had been looking at this relationship only in the context of individual
insurance customers, Steve explained, not of group plans. “Besides, it’s not
like we’d be directly drawing the risk pools,” he said. Then Steve began
distributing copies of the spreadsheets outlining the five-year returns
ShopSense could realize from the deal.
“‘Directly’ being the operative word here,” Denise noted wryly, as she took her
copy and passed the rest around.
It was 6:50 pm, and Jason Walters had canceled his session with his personal
trainer—again—to stay late at the office. Sammy will understand, the CEO
told himself as he sank deeper into the love seat in his office, a yellow legal
pad on his lap and a pen and cup of espresso balanced on the arm of the
couch. It was several days after the review of the ShopSense pilot, and Jason
was still weighing the risks and benefits of taking this business relationship to
the next stage.
But Jason also saw dark clouds on the horizon: What if IFA took the pilot to
the next level and found out something that maybe it was better off not
knowing? As he watched the minute hand sweep on his wall clock, Jason
wondered what risks he might be taking without even realizing it. • • •
Donna Greer gently swirled the wine in her glass and clinked the stemware
against her husband’s. The two were attending a wine tasting hosted by a
friend. The focus was on varieties from Chile and other Latin American
countries, and Donna and Peter had yet to find a sample they didn’t like. But
despite the lively patter of the event and the plentiful food, Donna couldn’t
keep her mind off the IFA deal. “The big question is, Should we be charging
more?” she mused to her husband. ShopSense was already selling its
scanner data to syndicators, and, as her CFO had reminded her, the company
currently made more money from selling information than from selling meat.
Going forward, all ShopSense would have to do was send IFA some tapes
each month and collect a million dollars annually of pure profit. Still, the deal
wasn’t without risks: By selling the information to IFA, it might end up diluting
or destroying valuable and hard-won customer relationships. Donna could see
the headline now: “Big Brother in Aisle Four.” All the more reason to make it
worth our while, she thought to herself.
Peter urged Donna to drop the issue for a bit, as he scribbled his comments
about the wine they’d just sampled on a rating sheet. “But I’ll go on record as
being against the whole thing,” he said. “Some poor soul puts potato chips in
the cart instead of celery, and look what happens.”
“But what about the poor soul who buys the celery and still has to pay a
fortune for medical coverage,” Donna argued, “because the premiums are set
based on the people who can’t eat just one?”
“Isn’t that the whole point of insurance?” Peter teased. The CEO shot her
husband a playfully peeved look—and reminded herself to send an e-mail to
Steve when they got home.
The message coming from both IFA and ShopSense is that any
marketing opportunity is valid—as long as they can get away with it.
In my company, this pilot would never have gotten off the ground. The culture
at Borders is such that the managers involved would have just assumed we
wouldn’t do something like that. Like most successful retail companies, our
organization is customer focused; we’re always trying to see a store or an
offer or a transaction through the customer’s eyes. It was the same way at
both Saks and Target when I was with those companies.
I honestly don’t think these companies have hit upon a responsible formula for
mining and sharing customer data. If ShopSense retained control of its data to
some degree—that is, if the grocer and IFA marketed the Smart Choice
program jointly, and if any offers came from ShopSense (the partner the
customer has built up trust with) rather than the insurance company (a
stranger, so to speak)—the relationship could work. Instead of ceding
complete control to IFA, ShopSense could be somewhat selective and send
offers to all, some, or none of its loyalty card members, depending on how
relevant the grocer believed the insurance offer would be to a particular set of
customers.
A big hole in these data, though, is that people buy food for others besides
themselves. I rarely eat at home, but I still buy tons of groceries—some
healthy, some not so healthy—for my kids and their friends. If you looked at a
breakdown of purchases for my household, you’d say “Wow, they’re
consuming a lot.” But the truth is, I hardly ever eat a bite. That may be an
extreme example, but it suggests that IFA’s correlations may be flawed.
As the case study illustrates, companies will soon be able to create fairly
exhaustive, highly accurate profiles of customers without having had any
direct interaction with them. They’ll be able to get to know you intimately
without your knowledge.
From the consumer’s perspective, this trend raises several big concerns. In
this fictional account, for instance, a shopper’s grocery purchases may directly
influence the availability or price of her life or health insurance products—and
not necessarily in a good way. Although the customer, at least tacitly,
consented to the collection, use, and transfer of her purchase data, the real
issue here is the unintended and uncontemplated use of the information (from
the customer’s point of view). Most customers would probably be quite
surprised to learn that their personal information could be used by companies
in a wholly unrelated industry and in other ways that aren’t readily
foreseeable.
If consumers lose trust in firms that collect, analyze, and utilize their
information, they will opt out of loyalty and other data-driven marketing
programs, and we may see more regulations and limitations on data
collection. Customer analytics are effective precisely because firms
do not violate customer trust. People believe that retail and other
organizations will use their data wisely to enhance their experiences, not to
harm them. Angry customers will certainly speak with their wallets if that trust
is violated.
Decisions that might be made on the basis of the shared data represent
another hazard for consumers—and for organizations. Take the insurance
company’s use of the grocer’s loyalty card data. This is limited information at
best and inaccurate at worst. The ShopSense data reflect food bought but not
necessarily consumed, and individuals buy food at many stores, not just one.
IFA might end up drawing erroneous conclusions—and exacting unfair rate
increases. The insurer’s general counsel should investigate this deal.
ShopSense’s loyalty card data are at the center of this venture, but the
grocer’s goal here is not to increase customer loyalty. The value of its
relationship with IFA is solely financial. The company should explore whether
there are some customer data it should exclude from the
transfer—information that could be perceived as exceedingly sensitive, such
as pharmacy and alcohol purchases. It should also consider doing market
research and risk modeling to evaluate customers’ potential reaction to the
data sharing and the possible downstream effect of the deal.
The risk of consumer backlash is lower for IFA than for ShopSense, given the
information the insurance company already purchases. IFA could even put a
positive spin on the creation of new insurance products based on the
ShopSense data. For instance, so-called healthy purchases might earn
customers a discount on their standard insurance policies. The challenge for
the insurer, however, is that there is no proven correlation between the
purchase of certain foods and fewer health problems. IFA should continue
experimenting with the data to determine their richness and predictive value.
Some companies have more leeway than others to sell or trade customer
lists. At Harrah’s, we have less than most because our customers may not
want others to know about their gaming and leisure activities. We don’t sell
information, and we don’t buy a lot of external data. Occasionally, we’ll buy
demographic data to fine-tune our marketing messages (to some customers,
an offer of tickets to a live performance might be more interesting than a
dining discount, for example). But we think the internal transactional data are
much more important.
We tell customers the value proposition up front: Let us track your play at our
properties, and we can help you enjoy the experience better with richer
rewards and improved service. They understand exactly what we’re capturing,
the rewards they’ll get, and what the company will do with the information. It’s
a win-win for the company and for the customer.
Humana provides health benefit plans and related health services to more
than 11 million members nationwide. We use proprietary data-mining and
analytical capabilities to help guide consumers through the health maze. Like
IFA, we ask our customers to share their personal and medical histories with
us (the risky behaviors as well as the good habits) so we can acquaint them
with programs and preventive services geared to their health status.
One of the biggest problems in U.S. health care today is obesity. So would it
be useful for our company to look at grocery-purchasing patterns, as the
insurance company in the case study does? It might be. I could see the
upside of using a grocer’s loyalty card data to develop a wellness-based
incentive program for insurance customers. (We would try to find a way to
build positives into it, however, so customers would look at the interchange
and say “That’s in my best interest; thank you.”) But Humana certainly
wouldn’t enter into any kind of data-transfer arrangement without ensuring
that our customers’ personal information and the integrity of our relationship
with them would be properly protected. In health care, especially, this has to
be the chief concern—above and beyond any patterns that might be revealed
and the sort of competitive edge they might provide. We use a range of
industry standard security measures, including encryption and firewalls, to
protect our members’ privacy and medical information.
Ethical behavior starts with the CEO, but it clearly can’t be managed by just
one person. It’s important that everyone be reminded often about the
principles and values that guide the organization. When business
opportunities come along, they’ll be screened according to those
standards—and the decisions will land right side up every time. I can’t tell
people how to run their meetings or who should be at the table when the
tougher, gray-area decisions need to be made, but whoever is there has to
have those core principles and values in mind.
The CEOs in the case study need to take the “front page” test: If the headline
on the front page of the newspaper were reporting abuse of customer data
(yours included), how would you react? If you wouldn’t want your personal
data used in a certain way, chances are your customers wouldn’t, either.
Source: https://hbr.org/2007/05/the-dark-side-of-customer-analytics
✧ Learning Activity:
● Analyze the case study on “The Dark Side of Customer Analytics.” How
can the companies leverage the data responsibly? Share your insights.