Accelerating AI impact by taming the data beast
Government agencies seeking to deploy artificial intelligence face hurdles in data awareness, availability, and quality. A five-step, mission-based data strategy can help sidestep these challenges.
March 2020
Artificial intelligence (AI) has the power to dramatically enhance the way public-sector agencies serve their constituents, tackle their most vexing issues, and get the most out of their budgets. Several converging factors are pressuring governments to embrace AI’s potential. As citizens become more familiar with the power of AI through digital banking, virtual assistants, and smart e-commerce, they are demanding better outcomes from their governments. Similarly, public servants are pushing for private sector–like solutions to boost on-the-job effectiveness. At the same time, AI technology is maturing rapidly and being incorporated into many offerings, making it increasingly accessible to all organizations.

Most government agencies around the world do not yet have all of the building blocks of successful AI programs—clear vision and strategy, budget, high-quality available data, and talent—in place. Even as AI strategy is formulated, budget secured, and talent attracted, data remains a significant stumbling block. For governments, getting all of an organization’s data “AI ready” is difficult, expensive, and time-consuming (see sidebar, “AI-ready data defined”), limiting the impact of AI to pilots and projects within existing silos.

How can governments get past pilots and proofs of concept to achieve broader results? To raise the return on AI spending, leading organizations are prioritizing use cases and narrowing their aperture to focus only on improving the data necessary to create an impact with AI. A five-step, mission-driven process can ensure that data meets all AI requirements and that every dollar invested generates tangible improvements.

Data can be purchased from private-sector organizations, but it’s often in unusable, inconsistent formats. On average, only 3 percent of an organization’s data meet the quality standards needed for analytics.² And unlike tools, infrastructure, or talent, a complete set of AI-ready data cannot typically be purchased because an agency’s unique use cases and mission demand bespoke data inputs.

The most powerful AI solutions often require a cocktail of internal data about constituents, programs, and services as well as external data from other agencies and third parties for enrichment. The core—existing internal agency data—is often in a format and a quality that make it incompatible with AI approaches. A Socrata survey highlighted these challenges:³

— Only 45 percent of developers agreed that government data was clean and accurate; the same percentage agreed that it was in a usable format for their work

— Less than 35 percent thought it was well documented

In addition, sharing data between agencies often requires an intergovernmental agreement (IGA)—which can take years to secure, even with the most willing counterparties. Within a single national agency, policy restrictions require signed data-sharing agreements and adherence to multiple security standards. State agencies face similar problems with inconsistent confidentiality, privacy requirements, and legal frameworks for sharing data. The result is a hodgepodge of conflicting memorandums of understanding and IGAs.
¹ Oliver Fleming, Tim Fountaine, Nicolaus Henke, and Tamim Saleh, “Ten red flags signaling your analytics program will fail,” May 2018, McKinsey.com.
² Tadhg Nagle et al., “Only 3% of companies’ data meets basic quality standards,” Harvard Business Review, September 11, 2017, hbr.org.
³ Developers Rate the Current State of Gov Data Accessibility, Socrata, updated August 23, 2016, benchmarkstudy.socrata.com.
AI-ready data defined

Data that can support artificial intelligence (AI) solutions must meet five criteria:

1. Known
The agency is aware of its available enterprise and local data sources.

2. Understood
Users and leaders are aware of what’s in the data set (and what isn’t), where it came from (its provenance and lineage), as well as its format, size, and potential to link to other data sets.

3. Available
Data must “live” somewhere that makes it available to users and analysts doing AI work.

5. Secure
The data are being handled appropriately and are compliant with information security guidelines, confidentiality, relevant civil rights and civil liberties rules, and data privacy regimes (for example, the General Data Protection Regulation).
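To make criteria like these operational, some agencies track them as fields in the data catalog itself. The sketch below is a hypothetical Python illustration, not a standard schema; the class and field names are assumptions.

```python
# Hypothetical sketch: record the AI-readiness criteria above for each data set.
from dataclasses import dataclass

@dataclass
class DataSetReadiness:
    name: str
    known: bool = False       # the agency has inventoried the source
    understood: bool = False  # contents, provenance, format, and linkages are documented
    available: bool = False   # the data "live" where analysts doing AI work can reach them
    secure: bool = False      # handled per security, confidentiality, and privacy rules
    notes: str = ""

    def ai_ready(self) -> bool:
        # A data set supports AI work only when every tracked criterion is met.
        return all([self.known, self.understood, self.available, self.secure])

# Example: DataSetReadiness("Benefit payment history", known=True).ai_ready() returns False
```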
Data are also often locked in silos across the organization. According to a McKinsey Global Survey about AI capabilities, only 8 percent of respondents across industries said their AI-relevant data are accessible by systems across the organization.⁴ Data-quality issues are compounded by the fact that governments have a multitude of different systems, some of which are obsolete, so aggregating data can be exceedingly difficult. Both state and federal agencies grapple with aging infrastructure: in some instances, the whole stack of hardware, data storage, and applications is still in use—decades after reaching end of life. And annual budget cycles make it difficult to implement long-term fixes.

The scale of the challenge can lead government officials to take a slower, more comprehensive approach to data management. Realizing the importance of data to AI, agencies often focus their initial efforts on integrating and cleaning data, with the goal of creating an AI-ready data pool over hundreds or even thousands of legacy systems. A more effective approach focuses on improving data quality and underlying systems through surgical fixes.

All of these factors make getting data AI ready expensive and time-consuming; the undertaking also demands talent that is not always available in the public sector. It also puts years of IT projects and data cleansing between current citizen needs and the impact of AI-enabled solutions. The number of records needed for effective analytics can range from hundreds to millions (see box, “Number of records needed for effective analytics”).

Five steps to AI-ready data

The best way for public-sector agencies to start their AI journey is by defining a mission-based data strategy that focuses resources on feasible use cases with the highest impact, naturally narrowing the number of data sets that need to be made AI ready. In other words, governments can often accelerate their AI efforts by emphasizing impact over perfection.
⁴ Michael Chui and Sankalp Malhotra, “AI adoption advances, but foundational barriers remain,” November 2018, McKinsey.com.
In addition, while prioritizing use cases, governments should ensure that data sources are available and that the organization is building familiarity and expertise with the most important sources over time.

Proper planning can allow bundling of related use cases—that is, exploiting similar tools and data sets, reducing the time required to implement use cases. By expending resources only on use cases prioritized by mission impact and feasibility, governments can ensure investments are closely tied to direct, tangible mission results and outcomes. These early wins can build support and excitement within agencies for further AI efforts.
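As a rough illustration of how such a prioritization can be run, the Python sketch below scores candidate use cases on mission impact and feasibility; the use cases, scores, and weights are hypothetical assumptions, not agency data.

```python
# Illustrative only: rank candidate AI use cases by mission impact and feasibility.
# Scores (1-5) and weights are hypothetical assumptions.
use_cases = [
    {"name": "Benefits fraud detection", "impact": 5, "feasibility": 3},
    {"name": "Call-center triage", "impact": 3, "feasibility": 5},
    {"name": "Predictive fleet maintenance", "impact": 4, "feasibility": 2},
]

IMPACT_WEIGHT, FEASIBILITY_WEIGHT = 0.6, 0.4  # assumed weighting

def priority_score(use_case):
    """Weighted score combining mission impact and feasibility."""
    return IMPACT_WEIGHT * use_case["impact"] + FEASIBILITY_WEIGHT * use_case["feasibility"]

for use_case in sorted(use_cases, key=priority_score, reverse=True):
    print(f"{use_case['name']}: {priority_score(use_case):.1f}")
```

Use cases that rank highly and draw on the same data sets are natural candidates for bundling into a single delivery effort.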
Governments can select the appropriate data sets and ensure they meet the AI-ready criteria by following five steps.

1. Build a use case–specific data catalog
The chief data officer, chief information officer, or data domain owner should work with business leaders to identify existing data sets that are related to prioritized use cases, who owns them, in which systems they live, and how one gains access. Data-discovery approaches must be tailored to specific agency realities and architectures. Many successful efforts to build data catalogs for AI include direct collaboration with line- and supervisor-level system users, interviews with technical experts and tenured business staff, and the use of smart or automated data discovery tools to quickly map and categorize agency data.

One federal agency, for example, led a digital assessment of its enterprise data to highlight the most important factors for achieving enhanced operational effectiveness and cost savings. It built a data catalog that allowed data practitioners throughout the agency to find and access available data sets.
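Automated data discovery can be as simple as a script that introspects each source system and emits draft catalog entries for data stewards to review. The sketch below uses SQLite purely as a stand-in for an agency system; the entry fields and example values are assumptions.

```python
# Minimal sketch of automated data discovery: inventory the tables and columns
# in one source system and record them as draft catalog entries.
import sqlite3

def catalog_entries(db_path, system_name, owner):
    con = sqlite3.connect(db_path)
    entries = []
    tables = [row[0] for row in con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        columns = [row[1] for row in con.execute(f"PRAGMA table_info({table})")]
        row_count = con.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        entries.append({
            "data_set": table,       # what the data set is
            "columns": columns,      # what is in it
            "rows": row_count,       # rough size
            "system": system_name,   # where it lives
            "owner": owner,          # who grants access
        })
    con.close()
    return entries

# Hypothetical usage:
# catalog_entries("permits.db", "Legacy permitting system", "Licensing division")
```

Draft entries like these give the data owner a first view of what exists and where it lives, which line staff and system experts can then correct and enrich.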
2. Evaluate the quality and completeness of data sets
Since the prioritized use cases will require a limited number of data sets, agencies should assess the state of these sources to determine whether they meet a baseline for quality and completeness. At a national customs agency, business leaders and analytics specialists selected priority use cases and then audited the relevant data sets. Moving forward on the first tranche of use cases tapped less than 10 percent of the estimated available data.
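A first-pass audit of quality and completeness can likewise start small: profile how often each field in a prioritized data set is actually populated and flag fields that fall below a baseline. The sketch below is a generic illustration; the 90 percent threshold and the customs-style sample records are assumptions.

```python
# Minimal sketch of a completeness check: measure how often each field is
# populated and flag fields that fall below an assumed 90 percent baseline.
BASELINE = 0.90  # assumed minimum share of populated values

def completeness_report(records):
    """records: list of dicts representing rows from a prioritized data set."""
    fields = {key for row in records for key in row}
    report = {}
    for field in sorted(fields):
        filled = sum(1 for row in records if row.get(field) not in (None, ""))
        share = filled / len(records)
        report[field] = (share, share >= BASELINE)
    return report

# Hypothetical sample rows standing in for an agency extract
sample = [
    {"declaration_id": "A1", "hs_code": "8471.30", "declared_value": 1200},
    {"declaration_id": "A2", "hs_code": "", "declared_value": 950},
    {"declaration_id": "A3", "hs_code": "8517.62", "declared_value": None},
]

for field, (share, ok) in completeness_report(sample).items():
    print(f"{field}: {share:.0%} populated {'(meets baseline)' if ok else '(below baseline)'}")
```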
In many instances, agencies have a significant opportunity to tailor AI efforts to create impact with available data and then refine this approach over time. A state-level government agency was able to use data that already existed and predictive
Anusha Dhasarathy is a partner in McKinsey’s Chicago office and Ankur Ghia is a senior partner in the Washington, DC, office, where Sian Griffiths and Rob Wavra are associate partners.