Public & Social Sector Practice

Accelerating AI impact by taming the data beast
Government agencies seeking to deploy artificial intelligence face
hurdles in data awareness, availability, and quality. A five-step,
mission-based data strategy can help sidestep these challenges.

by Anusha Dhasarathy, Ankur Ghia, Sian Griffiths, and Rob Wavra

March 2020

Artificial intelligence (AI) has the power to dramatically enhance the way public-sector agencies serve their constituents, tackle their most vexing issues, and get the most out of their budgets. Several converging factors are pressuring governments to embrace AI's potential. As citizens become more familiar with the power of AI through digital banking, virtual assistants, and smart e-commerce, they are demanding better outcomes from their governments. Similarly, public servants are pushing for private sector–like solutions to boost on-the-job effectiveness. At the same time, AI technology is maturing rapidly and being incorporated into many offerings, making it increasingly accessible to all organizations.

Most government agencies around the world do not yet have all of the building blocks of successful AI programs—clear vision and strategy, budget, high-quality available data, and talent—in place. Even as AI strategy is formulated, budget secured, and talent attracted, data remains a significant stumbling block. For governments, getting all of an organization’s data “AI ready” is difficult, expensive, and time-consuming (see sidebar, “AI-ready data defined”), limiting the impact of AI to pilots and projects within existing silos.

How can governments get past pilots and proofs-of-concept to achieve broader results? To raise the return on AI spending, leading organizations are prioritizing use cases and narrowing their aperture to focus only on improving the data necessary to create an impact with AI. A five-step, mission-driven process can ensure data meets all AI requirements and that every dollar invested generates tangible improvements.

Navigating the data labyrinth
As governments seek to harness the power of AI, one of the first questions that AI programs may need to answer concerns analytical adequacy: Is there data, and is it of sufficient quality to address the specific business need?¹ On the whole, the public sector has more data than private-sector organizations, but it’s often in unusable, inconsistent formats. On average, only 3 percent of an organization’s data meet the quality standards needed for analytics.² And unlike tools, infrastructure, or talent, a complete set of AI-ready data cannot typically be purchased because an agency’s unique use cases and mission demand bespoke data inputs.

The most powerful AI solutions often require a cocktail of internal data about constituents, programs, and services as well as external data from other agencies and third parties for enrichment. The core—existing internal agency data—is often in a format and a quality that make it incompatible with AI approaches. A Socrata survey highlighted these challenges:³

— Only 45 percent of developers agreed that government data was clean and accurate; the same percent agreed that it was in a usable format for their work
— Less than 35 percent thought it was well documented

In addition, sharing data between agencies often requires an intergovernmental agreement (IGA)—which can take years to secure, even with the most willing counterparties. Within a single national agency, policy restrictions require signed data-sharing agreements and adherence to multiple security standards. State agencies face similar problems with inconsistent confidentiality, privacy requirements, and legal frameworks for sharing data. The result is a hodgepodge of conflicting memorandums of understanding and IGAs.

Locating data and determining ownership can also pose challenges. In many organizations, data have accumulated uncontrollably for years. It’s not uncommon for agencies to be unaware of where the data reside, who owns them, and where they came from. As a result, little AI-relevant data is accessible to any given office or “problem owner” in

¹ Oliver Fleming, Tim Fountaine, Nicolaus Henke, and Tamim Saleh, “Ten red flags signaling your analytics program will fail,” May 2018, McKinsey.com.
² Tadhg Nagle et al., “Only 3% of companies’ data meets basic quality standards,” Harvard Business Review, September 11, 2017, hbr.org.
³ Developers Rate the Current State of Gov Data Accessibility, Socrata, updated August 23, 2016, benchmarkstudy.socrata.com.

Sidebar

AI-ready data defined

Data that can support artificial intelligence (AI) solutions must meet five criteria:

1. Known
The agency is aware of its available enterprise and local data sources.

2. Understood
Users and leaders are aware of what’s in the data set (and what isn’t), where it came from (its provenance and lineage), as well as its format, size, and potential to link to other data sets.

3. Available
Data must “live” somewhere that makes it available to users and analysts doing AI work.

4. Fit for purpose
The data are right for the AI goal and of sufficient quality, variety, and scale.

5. Secure
The data are being handled appropriately and are compliant with information security guidelines, confidentiality, relevant civil rights and civil liberties rules, and data privacy regimes (for example, the General Data Protection Regulation).
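
As one way to make the five criteria operational, an agency could track them as a simple per-data-set checklist. The Python sketch below is only an illustration: the class name `AIReadiness`, its methods, and the example values are hypothetical and do not come from the article.

```python
from dataclasses import dataclass, fields


@dataclass
class AIReadiness:
    """One flag per sidebar criterion, recorded for a single data set."""
    known: bool
    understood: bool
    available: bool
    fit_for_purpose: bool
    secure: bool

    def gaps(self) -> list:
        """Return the names of the criteria that are not yet met."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def is_ai_ready(self) -> bool:
        """A data set counts as AI ready only when all five criteria hold."""
        return not self.gaps()


# Hypothetical example: a data set that is not yet hosted anywhere
# analysts can reach and has not passed a security review.
permits = AIReadiness(
    known=True,
    understood=True,
    available=False,
    fit_for_purpose=True,
    secure=False,
)
print(permits.gaps())
```

Listing the unmet criteria, rather than returning a single yes or no, points directly at the work that remains for that data set.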

the organization. According to a McKinsey Global Survey about AI capabilities, only 8 percent of respondents across industries said their AI-relevant data are accessible by systems across the organization.⁴ Data-quality issues are compounded by the fact that governments have a multitude of different systems, some of which are obsolete, so aggregating data can be exceedingly difficult. Both state and federal agencies grapple with aging infrastructure: in some instances, the whole stack of hardware, data storage, and applications is still in use—decades after reaching end of life. And annual budget cycles make it difficult to implement long-term fixes.

The scale of the challenge can lead government officials to take a slower, more comprehensive approach to data management. Realizing the importance of data to AI, agencies often focus their initial efforts on integrating and cleaning data, with the goal of creating an AI-ready data pool over hundreds or even thousands of legacy systems. A more effective approach focuses on improving data quality and underlying systems through surgical fixes.

All of these factors make getting data AI ready expensive and time-consuming; the undertaking also demands talent that is not always available in the public sector. It also puts years of IT projects and data cleansing between current citizen needs and the impact of AI-enabled solutions. The number of records needed for effective analytics can range from hundreds to millions (see box, “Number of records needed for effective analytics”).

Five steps to AI-ready data
The best way for public-sector agencies to start their AI journey is by defining a mission-based data strategy that focuses resources on feasible use cases with the highest impact, naturally narrowing the number of data sets that need to be made AI ready. In other words, governments can often accelerate their AI efforts by emphasizing impact over perfection.
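
The narrowing effect of a mission-based strategy can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the 1-to-5 scores, the impact-times-feasibility ranking, the use-case names, and the data-set names are hypothetical, and the article does not prescribe a scoring formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCase:
    name: str
    impact: int           # hypothetical 1-5 mission-impact score
    feasibility: int      # hypothetical 1-5 feasibility score
    data_sets: frozenset  # data sets this use case depends on


def prioritize(use_cases, top_n):
    """Rank use cases by impact x feasibility and keep the top_n."""
    ranked = sorted(
        use_cases, key=lambda u: u.impact * u.feasibility, reverse=True
    )
    return ranked[:top_n]


def data_sets_in_scope(selected):
    """Union of the data sets that the selected use cases need."""
    scope = set()
    for use_case in selected:
        scope |= use_case.data_sets
    return scope


# Hypothetical portfolio of candidate use cases.
portfolio = [
    UseCase("fraud detection", 5, 4, frozenset({"claims", "payments"})),
    UseCase("call-center triage", 3, 5, frozenset({"call_logs"})),
    UseCase("fleet maintenance", 4, 2, frozenset({"sensor_feeds", "work_orders"})),
    UseCase("benefits eligibility", 4, 3, frozenset({"claims", "case_files"})),
]

selected = prioritize(portfolio, top_n=3)
scope = data_sets_in_scope(selected)
print([u.name for u in selected])
print(sorted(scope))
```

In this made-up portfolio, the three selected use cases need four of the six data sets, and two of them share the `claims` data set, so only that subset has to be made AI ready first.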

⁴ Michael Chui and Sankalp Malhotra, “AI adoption advances, but foundational barriers remain,” November 2018, McKinsey.com.

Box

Number of records needed for effective analytics

Deep learning on images: millions
Artificial neural networks: hundreds of thousands to millions
Classifiers: hundreds to thousands

In addition, while prioritizing use cases, governments should ensure that data sources are available and that the organization is building familiarity and expertise with the most important sources over time.

Proper planning can allow bundling of related use cases—that is, exploiting similar tools and data sets, reducing the time required to implement use cases. By expending resources only on use cases prioritized by mission impact and feasibility, governments can ensure investments are closely tied to direct, tangible mission results and outcomes. These early wins can build support and excitement within agencies for further AI efforts.

Governments can select the appropriate data sets and ensure they meet the AI-ready criteria by following five steps.

1. Build a use case–specific data catalog
The chief data officer, chief information officer, or data domain owner should work with business leaders to identify existing data sets that are related to prioritized use cases, who owns them, in which systems they live, and how one gains access. Data-discovery approaches must be tailored to specific agency realities and architectures. Many successful efforts to build data catalogs for AI include direct collaboration with line- and supervisor-level system users, interviews with technical experts and tenured business staff, and the use of smart or automated data discovery tools to quickly map and categorize agency data.

One federal agency, for example, led a digital assessment of its enterprise data to highlight the most important factors for achieving enhanced operational effectiveness and cost savings. It built a data catalog that allowed data practitioners throughout the agency to find and access available data sets.

2. Evaluate the quality and completeness of data sets
Since the prioritized use cases will require a limited number of data sets, agencies should assess the state of these sources to determine whether they meet a baseline for quality and completeness. At a national customs agency, business leaders and analytics specialists selected priority use cases and then audited the relevant data sets. Moving forward on the first tranche of use cases tapped less than 10 percent of the estimated available data.

In many instances, agencies have a significant opportunity to tailor AI efforts to create impact with available data and then refine this approach over time. A state-level government agency was able to use data that already existed and predictive
analytics to generate a performance improvement of 1.5 to 1.8 times. They then used that momentum to pursue cross-agency IGAs, focusing their investments on the data with the highest impact.

3. Aggregate prioritized data sources
Agencies should then consolidate the selected data sources into an existing data lake or a microdata lake (a “puddle”)—either on existing infrastructure or a new cloud-based platform put together for this purpose. The data lake should be available to the business, client, analytics staff, and contractors. One large civil engineering organization quickly collected and centralized relevant procurement data from 23 enterprise resource planning systems on a single cloud instance available to all relevant stakeholders.

4. Gauge the data’s fit
Next, government agencies must perform a use case–specific assessment about the quantity, content, quality, and joinability of available data. Since such assessments depend on a specific use case’s context or problem to be solved, data can’t objectively be fit for purpose. For example, data that are highly aggregated or missing certain observations may be insufficiently granular or low quality to inform person-level decision support. However, they may be perfectly suited for community-level predictions. To assess fit, analytics teams must do the following:

— Select available data related to prioritized use cases.
— Develop a reusable data model for the analytic, identifying specific fields and tables needed to inform the model. Notably, approaches that depend on working directly with raw data, exploiting materialized views, or developing custom queries for each feature often do not scale and may result in data inconsistency.
— Systematically assess the quality and completeness of prioritized data (such as error rate and missing fields) to understand gaps and potential opportunities for improvement.
— Bring the best of agile approaches to data development, iteratively enriching the reusable data model and its contents. Where quality is lacking, analytics teams can engineer new features or parameters, incorporating third-party data sets or collecting new data in critical domains.

A state agency decided to build a machine-learning model to help inform the care decisions of a vulnerable population. The model required a wide range of inputs—from demographics to health. Much of this data was of poor quality and in a suboptimal format. The agency conducted a systematic assessment of the required data by digesting paper-based data and made targeted investments to improve data quality and enrich existing data sets. It also generated the analytics model to improve outcomes.

5. Govern and execute
The last step is for agencies to establish a governance framework covering stewardship, security, quality, and metadata. This need not immediately be an exhaustive list of rules, controls, and aspirations for data maturation. Still, it is crucial to define how data sets in different environments will be stewarded by business owners, how their quality will be increased, and how they will be made accessible and usable by other agencies.

Many security governance issues may already be met by keeping data in a compliant environment or accredited container, but agencies still need to pinpoint any rules that remain unaddressed. Finally, they should determine required controls based on standard frameworks—for example, the National Institute of Standards and Technology—and best practices from leading security organizations. One large governmental agency was struggling with the security and sharing requirements for its more than 150 data sources and specialized applications. It did not have agency-level security procedures tailored for such a complex, role-based environment where dozens of combinations of roles and restrictions could exist. To resolve this issue, leaders developed a comprehensive enterprise data strategy with use case–level security requirements, dramatically simplifying the target architecture and application stack. The agency is currently executing a multiyear implementation road map.
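
Returning to step 4, the systematic check of quality and completeness (error rate and missing fields) could be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function `assess_quality`, the sample records, and the validation rules are hypothetical, and a real assessment would use rules agreed with the data's business owners.

```python
def assess_quality(records, required_fields, validators):
    """Compute the missing-field rate and error rate for each required field.

    records: list of dicts, one per record.
    validators: maps a field name to a function that returns True
                when a value is valid (hypothetical rules).
    """
    total = len(records)
    report = {}
    for field in required_fields:
        # A value counts as present unless it is absent, None, or empty.
        present = [r[field] for r in records if r.get(field) not in (None, "")]
        missing = total - len(present)
        check = validators.get(field, lambda value: True)
        errors = sum(1 for value in present if not check(value))
        report[field] = {
            "missing_rate": missing / total if total else 0.0,
            # Error rate is measured over present values only.
            "error_rate": errors / len(present) if present else 0.0,
        }
    return report


# Hypothetical person-level records with typical defects.
records = [
    {"case_id": "A1", "age": 34, "zip": "60601"},
    {"case_id": "A2", "age": -5, "zip": ""},
    {"case_id": "A3", "age": None, "zip": "6060"},
    {"case_id": "A4", "age": 61, "zip": "20001"},
]
validators = {
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
    "zip": lambda v: isinstance(v, str) and len(v) == 5 and v.isdigit(),
}

report = assess_quality(records, ["case_id", "age", "zip"], validators)
for field, stats in report.items():
    print(field, stats)
```

Reporting the two rates per field, rather than one score per data set, shows which fields need targeted fixes before the data can support a given use case.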

These important governance and security responsibilities must be paired with a strong bias toward impact. Most public-sector agencies have found that legacy waterfall-development life cycles and certification and accreditation processes are incompatible with AI projects. Agile approaches to development—from scrum-based methods of leading development efforts to fully mature DevSecOps approaches to continuous delivery—are central to ensuring that process and culture are also AI ready. While this change is often slow, decelerated by risk-averse cultures and long-established policies, it is a critical element in AI success stories.

By adopting a mission-based data strategy, governments can avoid many common roadblocks and immediately focus their technical talent, knowledge, and limited budgets on the subset of data needed for prioritized use cases. This strategy avoids creating data and tool capabilities without a plan. The iterative process—translating mission priorities into requirements and data engineering tasks, generating AI-ready data, and translating data into insights—keeps investments focused and maximizes their impact.

Anusha Dhasarathy is a partner in McKinsey's Chicago office and Ankur Ghia is a senior partner in the Washington, DC,
office, where Sian Griffiths and Rob Wavra are associate partners.