Good Information Governance
WHITE PAPER
Executive Summary
Most organizations struggle with how to manage the enormous volumes of
information they have today, but the problem is going to become much more difficult
in the future as both the number of new data types and the volume of data increase.
To get a handle on these problems, decision makers should implement an information
governance program that will help to properly manage their data; enable their
organizations to satisfy their legal, regulatory and best practice obligations; enable
improved employee productivity; and reduce the overall corporate risk associated
with improper information management.
KEY TAKEAWAYS
• Information volumes are increasing
The sheer volume of information is increasing, and managing it will become harder
in the future because the data types that organizations retain today keep
growing in volume, because organizations will need to retain new data types, and
because good information governance practices are lacking in many
organizations.
The Challenges
Information is the lifeblood of most organizations and it is becoming more important
over time. This section addresses some of the many challenges in dealing with
information.
In short, we create, store and transfer lots of information, and the volume of data
continues to grow, the vast majority of it unstructured. Moreover, in addition to
traditional sources like emails and files, organizations are creating and storing a
variety of new information types, such as social media posts, text messages, videos,
voicemails, and data from Operational Technology (OT) devices and sensors.
that the typical employee spends 1.8 hours per day searching for and gathering
information [v]. If we assume a fully burdened salary for the typical information worker
of $60,000 annually, then $12,981 of that salary is paid simply for that worker to find
information. Moreover, additional productivity can be squandered from inefficient
search practices and the re-creation of data when retained information cannot be
found.
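The $12,981 figure can be reproduced with a short calculation. The sketch below is one set of assumptions that arrives at that number, not necessarily the method used for the white paper: the 2,080 paid hours per year (52 weeks x 40 hours) and the 250 workdays per year are assumptions introduced here for illustration.

```python
# Inputs quoted in the text.
ANNUAL_SALARY = 60_000        # fully burdened salary, in dollars
SEARCH_HOURS_PER_DAY = 1.8    # hours spent searching per workday

# Assumptions NOT stated in the text (chosen because they reproduce $12,981).
PAID_HOURS_PER_YEAR = 2_080   # 52 weeks x 40 hours
WORKDAYS_PER_YEAR = 250


def annual_search_cost(salary, paid_hours, workdays, search_hours_per_day):
    """Return the part of an annual salary that pays for time spent searching."""
    hourly_rate = salary / paid_hours
    return hourly_rate * search_hours_per_day * workdays


search_cost = annual_search_cost(
    ANNUAL_SALARY, PAID_HOURS_PER_YEAR, WORKDAYS_PER_YEAR, SEARCH_HOURS_PER_DAY
)
print(f"Annual search cost per worker: ${search_cost:,.0f}")  # $12,981
```

Under these assumptions, roughly 22 percent of the salary pays for search time. A different work-hour basis would shift the result, so the inputs are worth adjusting to match a given organization.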
Storage bloat is another issue that good information governance can address.
The research we conducted for this white paper found that 77 percent of
organizations regularly dispose of information from their file shares and 74 percent
do so from their email archives, but defensible deletion is much less common for
other types of data.
These results have an important implication: most of the information
retained by organizations is unnecessary, but many organizations aren’t addressing
the problem. Retaining this data adds needlessly to storage costs, and it makes
searching for the remaining information that actually is valuable that much more
difficult.
• Storage costs
Storing information that is not necessary obviously drives up storage costs, but it
also makes it more difficult and more costly to find the minority of information
that is necessary to produce for legal, regulatory, employee productivity, and
other considerations. A good information governance program can significantly
reduce the cost of storage in the short term and can result in slower growth in
storage costs over the longer term.
• Defensible deletion
Key to the ability to reduce data volumes is a defensible deletion program that
will allow decision makers to safely delete unnecessary information. It will not
only reduce storage costs, but also reduce costs through fewer documents being
identified as potentially responsive during eDiscovery and litigation review.
• End-user productivity
As noted earlier, employees spend a significant amount of their day searching for
old information for reuse and reference. When they cannot find the data they
need, they often will recreate the data they couldn’t find, wasting their time on
duplicating information that is “somewhere” in the organization. A good
information governance program can ensure that data can be found quickly,
eliminating the need to recreate lost information.
• Litigation support and eDiscovery
The growing volume of electronically stored information makes it very difficult for
end users to find and properly categorize all of this information, and so poor
information governance practices will drive up the cost of data collection in
response to an eDiscovery request. As noted above, the problem is exacerbated
by the tendency to over-collect information for fear of spoliation and the
significant consequences that can result. That drives up the cost of the data
review process and makes it more difficult to meet production demands in a
timely manner.
The process of eDiscovery review involves reviewing all of the potentially
responsive documents to determine if they are actually responsive to the case, or
are privileged or confidential and, therefore, not subject to production. Because
data preparation for document review costs roughly $150 per gigabyte,
document review hosting costs $20 per gigabyte per month, and document
review is about $1.00 per document [vii], a good information governance program
that culls out unnecessary data can dramatically reduce the cost of the
eDiscovery process.
In the next section, we will examine the costs for each of these areas without good
information governance in place, and in the section after that focus on how
information governance can reduce these costs significantly.
• Redundancy and high availability will require at least 30 percent overhead, and
so 50 terabytes of usable storage will require the purchase of at least 65
terabytes of total storage.
• Add in the cost of labor to evaluate, purchase, deploy, configure and maintain
these systems.
• Add in the cost of space to house these systems and their power and cooling
requirements. For example, a Dell EMC Isilon X210 chassis consumes 400 watts.
At 12 cents per kilowatt-hour, that translates to about $420 per year in electricity
costs.
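The overhead and electricity figures in the list above follow from simple arithmetic. The sketch below reproduces them. The function names are illustrative, and continuous operation (24 hours a day, 365 days a year) is an assumption made here to arrive at the quoted electricity cost.

```python
HOURS_PER_YEAR = 24 * 365  # assumes the chassis runs continuously


def raw_storage_needed_tb(usable_tb, overhead=0.30):
    """Total storage to purchase for a given usable capacity plus overhead."""
    return usable_tb * (1 + overhead)


def annual_electricity_cost(watts, dollars_per_kwh, hours=HOURS_PER_YEAR):
    """Yearly electricity cost for a device drawing a constant load."""
    kilowatt_hours = watts / 1000 * hours
    return kilowatt_hours * dollars_per_kwh


total_tb = raw_storage_needed_tb(50)                  # 50 TB usable
power_cost = annual_electricity_cost(400, 0.12)       # 400 W at 12 cents/kWh

print(f"Storage to purchase: {total_tb:.0f} TB")      # 65 TB
print(f"Electricity per year: ${power_cost:,.2f}")    # $420.48
```

Cooling load is not included in this sketch; the text treats it as a further cost on top of the power drawn by the chassis itself.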
The result is that the actual cost of storage is many multiples higher than its initial
procurement cost. Figure 1 lists the assumptions used to estimate storage usage
and savings over a five-year period.
Storage is cheap, but storage management is not.
Figure 1
Assumptions for Calculating Email, File System and SharePoint Storage
Description                                                                               Value
Total employees in year 1                                                                 2,500
Average size of an email                                                                  50 KB
Average number of emails sent and received daily per employee                             100
Expected rate of increase or decrease in the number of employees per year                 5%
Annual growth rate in the average size of email messages (including attachments)          15%
Annual growth rate in the average number of emails sent and received daily per employee   5%
Estimated annual change in the cost of storage                                            -20%
Expected annual growth rate of file system storage requirements                           15%
Expected annual growth rate of total SharePoint storage requirements                      10%
Fully burdened Tier 1 storage cost per gigabyte                                           $15.00
Fully burdened Tier 2 storage cost per gigabyte                                           $12.50
Average number of workdays per year                                                       250
Based on these assumptions, the five-year cost of email storage is shown in Figure 2.
Figure 2
Email Storage Calculations Over Five Years
In order to determine the total cost of resources used for file system storage, we
multiply the amount of storage consumed by the file system by the fully loaded cost
per gigabyte of the storage tier used. It’s important to note that file system storage
includes a range of solutions, including traditional file shares, secure file transfer
systems, cloud-based file storage and the like.
In this example, we are assuming 30 terabytes of Tier 1 storage, and so the cost
calculation for Year 1 would be:
30 terabytes x $15.00/GB x 1,024 = $460,800
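The Year 1 calculation can be extended to later years. The sketch below is one plausible way to do so, not the white paper's own model: it assumes that storage grows 15 percent per year and that the cost per gigabyte falls 20 percent per year (both from the Figure 1 assumptions), and that both changes compound annually from Year 1. The compounding approach and the function names are assumptions.

```python
GB_PER_TB = 1_024  # the text converts terabytes to gigabytes with 1,024


def file_storage_cost(terabytes, cost_per_gb):
    """Cost of a given storage volume at a fully burdened per-gigabyte rate."""
    return terabytes * GB_PER_TB * cost_per_gb


def five_year_projection(start_tb=30, start_cost_per_gb=15.00,
                         storage_growth=0.15, cost_change=-0.20, years=5):
    """Return (year, terabytes, cost per GB, total cost) for each year."""
    rows = []
    terabytes, cost_per_gb = start_tb, start_cost_per_gb
    for year in range(1, years + 1):
        rows.append((year, terabytes, cost_per_gb,
                     file_storage_cost(terabytes, cost_per_gb)))
        terabytes *= 1 + storage_growth   # assumed compounding growth
        cost_per_gb *= 1 + cost_change    # assumed compounding price change
    return rows


projection = five_year_projection()
for year, terabytes, cost_per_gb, cost in projection:
    print(f"Year {year}: {terabytes:6.2f} TB at ${cost_per_gb:5.2f}/GB = ${cost:,.0f}")
```

The first row matches the $460,800 in the text. Later rows depend entirely on the compounding assumption and may not match the published figure.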
Figure 3 shows file storage costs over a five-year period.
Figure 3
File System Storage Calculations Over Five Years
As a next step, we estimate the amount and cost of storage consumed by the various
SharePoint repositories. First, we determine the total number of SharePoint
installations, the average volume of storage used for each SharePoint instance, and
the storage tier used. In this example, there are 13 SharePoint repositories with an
average of 400 gigabytes in each Tier 2 storage repository. The SharePoint storage
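calculations are:
13 repositories x 400 gigabytes = 5,200 gigabytes, or 5.2 terabytes
Consequently, the cost of that storage would be 5.2 terabytes multiplied by 1,024
gigabytes per terabyte and by $12.50 per gigabyte, or $66,560 in Year 1. Figure 4
below shows the calculations for both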
the storage in use and its cost over a five-year period.
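The SharePoint arithmetic mixes two unit conventions, and a short sketch makes that visible. The variable names below are illustrative. Treating 5,200 gigabytes as 5.2 terabytes and then converting back at 1,024 gigabytes per terabyte is inferred from the $66,560 result, not stated outright in the text.

```python
REPOSITORIES = 13
AVG_GB_PER_REPOSITORY = 400
TIER2_COST_PER_GB = 12.50

total_gb = REPOSITORIES * AVG_GB_PER_REPOSITORY   # 5,200 GB
total_tb = total_gb / 1000                        # 5.2 TB, as quoted in the text

# Converting terabytes back to gigabytes with 1,024 reproduces the quoted cost.
cost_binary = total_tb * 1_024 * TIER2_COST_PER_GB

# Pricing the 5,200 GB directly gives a slightly lower figure.
cost_direct = total_gb * TIER2_COST_PER_GB

print(f"Cost using 1,024 GB per TB: ${cost_binary:,.0f}")   # $66,560
print(f"Cost priced per GB directly: ${cost_direct:,.0f}")  # $65,000
```

The gap between the two results is about 2.4 percent, which is small next to the other estimates in the model but worth keeping consistent across email, file system and SharePoint calculations.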
Figure 4
SharePoint Storage Calculations Over Five Years
Please note that the assumption we have made in the analysis above is that no data
is deleted over the five-year period, and there are no limits on the consumption of
storage.
As noted earlier, employees also spend significant amounts of time searching for
information. Since most organizations don’t actively manage their employees’ data,
individual employees are typically left to decide how best to store the information
they decide to keep. The survey conducted for this white paper found that 87 percent
of organizations rely on employees to categorize and file their own digital
information, but only 53 percent of organizations provide guidelines to their
employees on how to do this. Over time, many employees forget where they stored a
particular file and so will conduct a hit-and-miss keyword search. These searches do
not normally produce the desired content right away because of the use of weak
search applications available to the employee, the use of incorrect search terms, and
forgotten data repositories. These searches can negatively impact employee
productivity, particularly if the average employee searches for old information on a
regular basis.
We have made the assumptions shown in Figure 6 for a pre-information governance
environment, showing the amount of time that employees search for information and
their success in doing so.
Figure 6
End User Productivity Assumptions
Description                                                                                             Variables
Average number of hours per week spent managing email/files/SharePoint records                          2.0
Number of times per year the average employee searches for old email/files/records                      16
Average minutes spent searching for old email/files/records, per search                                 30
Average percentage of success in finding old email/files/records                                        60%
Average time spent per email/file/record (in hours) recreating the content the search did not turn up   1.0
Average annual fully burdened employee salary                                                           $60,000
Average annual salary growth                                                                            3.5%
Work weeks per year per employee                                                                        50
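The Figure 6 assumptions can be combined into a per-user estimate of lost time and its cost. The sketch below is one plausible combination and should not be read as the white paper's own calculation. In particular, the 40-hour work week used to derive an hourly rate, and the decision to count management time, search time and re-creation time together, are assumptions made here.

```python
def annual_productivity_loss(
    manage_hours_per_week=2.0,
    searches_per_year=16,
    minutes_per_search=30,
    success_rate=0.60,
    recreate_hours=1.0,
    salary=60_000,
    work_weeks=50,
    hours_per_week=40,   # assumption: not given in Figure 6
):
    """Return (hours lost per year, dollar cost) for one employee in year 1."""
    hourly_rate = salary / (work_weeks * hours_per_week)
    managing_hours = manage_hours_per_week * work_weeks
    searching_hours = searches_per_year * minutes_per_search / 60
    failed_searches = searches_per_year * (1 - success_rate)
    recreating_hours = failed_searches * recreate_hours
    total_hours = managing_hours + searching_hours + recreating_hours
    return total_hours, total_hours * hourly_rate


hours_lost, dollars_lost = annual_productivity_loss()
print(f"Hours per user per year: {hours_lost:.1f}")
print(f"Cost per user per year: ${dollars_lost:,.0f}")
```

Applying the 3.5 percent salary growth from Figure 6 to later years would raise the dollar figure each year while leaving the hours unchanged.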
Figure 7
Per-User Productivity Loss Calculations by Year Without Information
Governance
However, because of poor indexing and management of data, those charged with
eDiscovery typically will collect too much information and then later cull out the
irrelevant data or that which cannot be produced as part of the eDiscovery order.
A typical eDiscovery effort includes conducting a keyword search of the various data
repositories for relevant content within a target date range. The average for initial
data collection is roughly three to five gigabytes of data per custodian.
The cost of eDiscovery collection and review is relatively high in most cases because
of the large volume of data that must be culled, processed and reviewed. To
determine if an information governance program would reduce its eDiscovery costs,
an organization must first understand the details of its current eDiscovery
processes. Figure 9 details some assumptions and costs for eDiscovery.
Figure 9
eDiscovery Cost Assumptions
Assumptions
Number of discovery requests per year                                           6
Number of custodians per discovery request                                      20
Gigabytes of data per custodian                                                 3.5
Average number of document pages per gigabyte                                   12,000
Average culling percentage (cull rate)                                          45%
Number of documents that a reviewer can process per hour                        50
Hourly billing rate for a legal reviewer (average of attorney and paralegal)    $65.00
Annual salary increase for legal reviewers                                      5%
Calculations
Total gigabytes of data per eDiscovery event (pre-culling)                      70.0
Total gigabytes of data per eDiscovery event (post-culling)                     38.5
Total documents per eDiscovery event (post-culling)                             462,000
Hours spent on document review                                                  9,240
Costs
eDiscovery review per event                                                     $600,600
Total annual eDiscovery review                                                  $3,603,600
Source: Osterman Research, Inc.
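The Figure 9 calculations follow directly from its assumptions, as the sketch below shows. The function and key names are illustrative. The figure derives its document count from the pages-per-gigabyte value, so the sketch does the same.

```python
def ediscovery_review_cost(
    requests_per_year=6,
    custodians_per_request=20,
    gb_per_custodian=3.5,
    pages_per_gb=12_000,
    cull_rate=0.45,
    documents_per_hour=50,
    hourly_rate=65.00,
):
    """Reproduce the Figure 9 calculations for one year."""
    gb_pre_cull = custodians_per_request * gb_per_custodian
    gb_post_cull = gb_pre_cull * (1 - cull_rate)
    documents = gb_post_cull * pages_per_gb   # pages counted as documents
    review_hours = documents / documents_per_hour
    cost_per_event = review_hours * hourly_rate
    return {
        "gb_pre_cull": gb_pre_cull,
        "gb_post_cull": gb_post_cull,
        "documents": documents,
        "review_hours": review_hours,
        "cost_per_event": cost_per_event,
        "annual_cost": cost_per_event * requests_per_year,
    }


result = ediscovery_review_cost()
print(f"Per event: ${result['cost_per_event']:,.0f}")   # $600,600
print(f"Per year:  ${result['annual_cost']:,.0f}")      # $3,603,600
```

Because every step is a multiplication, review cost scales linearly with the volume of data collected: halving the gigabytes per custodian halves the annual review cost.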
The Cost Savings and ROI of Good
Information Governance
Much of the savings that result from a robust information governance program will
come from two areas:
• Storage savings
Deduplicating content and defensibly disposing of information are the
two primary ways that information governance results in storage savings. A good
information governance program will enable an organization to identify expired,
unnecessary and useless data, and to delete this content safely. This will free up
storage resources that then can be redeployed, delaying the purchase of new
storage systems.
For the following analysis, we will conservatively assume that only 40 percent of
data can safely be deleted without negatively impacting the organization, and
that 10 percent of data is duplicated and can be disposed of without
Shown in Figure 10 is the anticipated cost savings for email, file system and
SharePoint storage based on the assumptions for defensible deletion and
deduplication noted in the paragraph above.
Figure 10
Storage Savings Arising from Good Information Governance
Millions of Dollars
• Litigation support and eDiscovery savings
There are two rules of thumb in eDiscovery response:
o A general lack of information management across the enterprise translates
to more time spent searching for and reviewing potentially relevant content.
o The more electronic content you have, the higher the cost of collection and
review.
eDiscovery savings will come from two areas: data collection and data review.
Both of these are influenced by the volume of potentially discoverable data
floating around an organization. The more unnecessary data that can be
removed from the organization before a discovery request is received, the less
data that will have to be collected, culled and reviewed. Studies have
demonstrated that much of the data collected and reviewed during discovery
should not have been available to discover and would have been removed and
not included in the search and collection process if effective information
governance had been available.
Figure 11 shows the cost savings that result from an information governance
program’s reduction in storage and resulting eDiscovery effort.
Figure 11
Total Estimated eDiscovery Savings
Figure 12
Per-User Productivity Loss Calculations by Year With Information
Governance
The total costs of eDiscovery review, storage and employee productivity with
information governance are shown in Figure 13, and a comparison of costs without
and with information governance is shown in Figure 14.
Figure 13
Annual Costs With Good Information Governance in a 2,500-User
Organization
Millions of Dollars
Figure 14
Cumulative Costs Without and With Good Information Governance
Millions of Dollars
CALCULATING ROI
Return-on-investment (ROI) is a measurement of investment performance that is
used to evaluate the efficiency of an investment. ROI is based on good faith
estimates of costs before and after the investment, and it goes beyond the simple
cost savings calculations that many label as ROI. The difference between cost
savings and a true ROI measurement is the inclusion of the actual cost of the
investment into the calculations. To determine ROI, the cost of the investment is
subtracted from the estimated cost savings of an investment and is then divided by
the cost of the investment, the result being expressed as a percentage. This is the
standard ROI formula:
ROI = [(the cost before the investment minus the cost after the investment)
       minus the cost of the investment] / the cost of the investment
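The formula translates directly into a small function. In the sketch below, the function name and the dollar amounts in the example are hypothetical and are not taken from the white paper's figures.

```python
def roi(cost_before, cost_after, investment_cost):
    """Return ROI as a fraction: (savings - investment) / investment."""
    if investment_cost <= 0:
        raise ValueError("investment_cost must be positive")
    savings = cost_before - cost_after
    return (savings - investment_cost) / investment_cost


# Hypothetical example: costs fall from $10M to $4M after a $2M investment.
example_roi = roi(10_000_000, 4_000_000, 2_000_000)
print(f"ROI: {example_roi:.0%}")  # ROI: 200%
```

A result of zero means the savings exactly repay the investment, and a negative result means the investment cost more than it saved.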
Let’s assume that the cumulative cost of an information governance program in a
2,500-user organization will be $6 million over five years ($2,400 per user over five
years, or an average of $480 per user per year), which will include the cost of the
information governance platform(s), the various technologies that will be deployed,
the labor required to manage the program, and so forth. Using the data presented
above, we can determine an ROI for an information governance investment by
populating the above formula with the already calculated costs and cost savings
(using the five-year estimates), plus an estimated cost of the investment:
losses arising from poor information governance – are “soft” costs, or costs that the
company is not paying directly. Unlike the costs of paralegals, outside counsel,
additional storage systems and the like, soft costs are not ones for which finance will
cut a check, and so many decision makers balk at considering them a
true cost of the business. The mindset for some is that if employees need to spend
extra time searching for information or re-creating it, they can work longer hours,
work weekends, etc. to make up for these inefficiencies.
What this tells us is that for every hour of employee productivity recovered, the
average contribution to revenue for the companies shown above will range from $112
to $1,965 per hour. For those decision makers who are skeptical that recovery of
employee productivity will result in additional corporate revenue, a simple question:
why are you hiring employees if they aren’t contributing to revenue generation?
happen? One way to do this is to consider the life insurance model: those who want
to protect their families or companies will typically spend significant sums on a
product that will mitigate the risk from an event that they have never experienced.
It is also important to note that there are some additional soft costs to consider:
Our research found that the leading drivers for an information governance
program are avoiding risk (77 percent), regulatory risks other than the GDPR (60
percent), and improving end user productivity (48 percent).
No part of this document may be reproduced in any form by any means, nor may it be
distributed without the permission of Osterman Research, Inc., nor may it be resold or
distributed by any entity other than Osterman Research, Inc., without prior written authorization
of Osterman Research, Inc.
Osterman Research, Inc. does not provide legal advice. Nothing in this document constitutes
legal advice, nor shall this document or any software product or other offering referenced herein
serve as a substitute for the reader’s compliance with any laws (including but not limited to any
act, statute, regulation, rule, directive, administrative order, executive order, etc. (collectively,
“Laws”)) referenced in this document. If necessary, the reader should consult with competent
legal counsel regarding any Laws referenced herein. Osterman Research, Inc. makes no
representation or warranty regarding the completeness or accuracy of the information contained
in this document.
THIS DOCUMENT IS PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND. ALL EXPRESS OR
IMPLIED REPRESENTATIONS, CONDITIONS AND WARRANTIES, INCLUDING ANY IMPLIED
WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, ARE
DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE DETERMINED TO BE
ILLEGAL.
REFERENCES
[i] https://www.cgoc.com/wp-content/uploads/2018/11/CGOC_Infographic_2018_.png
[ii] https://www.fiaks.com/data-never-sleeps-6-0/
[iii] https://www.statista.com/statistics/456500/daily-number-of-e-mails-worldwide/
[iv] https://www.forbes.com/sites/bernardmarr/2018/05/21/how-much-data-do-we-create-every-day-the-mind-blowing-stats-everyone-should-read/#130eda0960ba
[v] https://www.cottrillresearch.com/various-survey-statistics-workers-spend-too-much-time-searching-for-information/
[vi] https://www.cgoc.com/wp-content/uploads/2018/11/CGOC_Infographic_2018_.png
[vii] https://www.mindseyesolutions.com/2017/03/30/want-to-reduce-the-cost-of-ediscovery-re-think-the-approach/
[viii] http://fortune.com/fortune500/