Optimizing data controls in banking
Banks need to do more in four important areas of data culture to
build the risk-related data-control capabilities they will need in the
coming decade.
by Tony Ho, Jorge Machado, Satya Parekh, Kayvaun Rowshankish, and John Walsh
July 2020
Over the past decade, banks across the globe have made considerable progress in building risk-related data-control capabilities, prompted in large part by regulatory demands. The starting point was the Basel Committee's BCBS 239 principles, issued in 2013 to strengthen banks' risk-related data-aggregation and reporting capabilities. Progress, however, has not been uniform, and most institutions are not fully compliant. In fact, many banks are still struggling with major deficiencies, particularly when it comes to data architecture and technology.

In response, the industry has adopted some common, workable solutions in a few key areas. These include data-aggregation capabilities to support regulatory reporting requirements, such as automating some of the reporting required by the Federal Reserve in the US and the European Banking Authority (EBA) in Europe,² preparing to collect evidence for regulatory examinations, and deploying a federated data operating model with central capabilities under a chief data officer. Industry leaders are clear, however, that they struggle in four areas: the scope of data programs, data lineage, data quality, and transaction testing.³

There is considerable variation within the industry on how to address these four challenging areas, in investment, degree of risk mitigation, sustainability, and automation. A few institutions, however, are leading the way in improving their data programs and management and have made great strides toward regulatory compliance.
¹ BSA/AML refers to the US Bank Secrecy Act (anti–money laundering law) of 1970; CECL is the Current Expected Credit Losses standard issued by the US Financial Accounting Standards Board in 2016; GDPR is the EU's General Data Protection Regulation, which came into force in 2018; CCPA is the California Consumer Privacy Act of 2018; and CCAR is a regulatory framework for comprehensive capital analysis and review introduced by the US Federal Reserve in 2011.
² For example, Federal Reserve form FR Y-14M reports monthly data on the loan portfolios of bank holding companies, savings and loan holding companies, and intermediate holding companies; FR Y-14Q reports quarterly data for the same kinds of institutions on various asset classes, capital components, and categories of preprovision net revenue. The EBA issued the Common Reporting (COREP) framework as the standard for capital-requirements reporting; the EBA's standard for financial reporting is known as FINREP.
³ McKinsey benchmarking survey on data programs with 60 banks, 2020.
Exhibit 1
[Chart: average number of reports, metrics, and critical data elements in the scope of banks' data programs, 2015 to 2018–19.]
Scope of data programs

As a result, many banks are trying to identify industry best practices for the number of reports and types of data to include in their data programs. Our industry benchmarking indicates that the average bank's data program includes 50 reports, 90 metrics, and 1,100 data elements. Interestingly, over time we have seen the number of reports in data programs increase while the number of metrics and data elements has decreased (Exhibit 1). We believe the increase in reports reflects the inclusion of different nonfinancial risk types, such as operational or compliance risk. The reduction in metrics and data elements is the result of banks' attempts to reduce management costs and effort and to focus only on the most critical metrics and data.

More important than the number of reports, metrics, and data elements is a bank's ability to demonstrate to regulators and other stakeholders that the scope of its data program covers the major risks it faces. With this in mind, leading banks have established principles to define the scope of their programs and to demonstrate its suitability to regulators. Leading institutions usually define the scope of their data programs broadly (Exhibit 2).

For all banks, the application of the principles illustrated in Exhibit 2 ranges from narrow to broad. However, supervisors are increasingly advocating for a broader scope, and many banks are complying. Best-in-class institutions periodically expand the scope of their data programs as their needs shift: from purely meeting regulatory objectives, these banks move to meeting business objectives as well. After all, the same data support business decisions and client interactions as well as regulatory processes.
Exhibit 2
Banks define the scope of their data programs along dimensions that range from narrow to broad.

Risk types
● Narrow: main risk types (market, credit, operational)
● Broader: all quantitative risk types (market, credit, operational, liquidity)
● Broadest: all risk types (market, credit, operational, liquidity, compliance, audit, reputational, strategic)

Legal-entity data sources¹
● Narrow: material legal entities
● Broader: material legal entities
● Broadest: all legal entities

Legal-entity alignment²
● Narrow: bank holding company
● Broader: bank holding company and material legal entities
● Broadest: bank holding company and all legal entities

Functions and business units
● Narrow: risk
● Broader: risk and all material business units (but not support functions)
● Broadest: risk, all business units, and all support functions

Audience for reports
● Narrow: board and/or senior management
● Broader: board and senior management, heads of business units and functions, regulators
● Broadest: board, senior management, heads of business units and functions, managers of functions and businesses, regulators

¹Refers to legal entities whose data feed into reports within the scope of BCBS 239.
²Refers to legal entities that are independently aligned with BCBS 239 principles, including controls, governance, and reporting.
Data lineage

Of all data-management capabilities in banking, data lineage often generates the most debate. Data lineage documents how data flow throughout the organization—from the point of capture or origination to consumption by an end user or application, often including the transformations performed along the way. Little guidance has been provided on how far upstream banks should go when documenting lineage, nor on how detailed the documentation should be for each "hop" or step in the data flow. As a result of this lack of regulatory clarity, banks have taken almost every feasible approach to data-lineage documentation.
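To make the idea concrete, the sketch below shows one simple way a bank might represent element-level lineage in code: a chain of "hops," each recording a source system, a target system, and any transformation applied between them. The schema, the system names, and the sample flow are illustrative assumptions only, not a standard or any particular bank's model.

# Illustrative sketch of element-level data lineage as a chain of hops.
# Field names, system names, and the sample flow are assumptions for illustration only.
from dataclasses import dataclass, field
from typing import List

@dataclass
class LineageHop:
    source_system: str        # where the data element comes from in this hop
    target_system: str        # where it lands
    transformation: str = ""  # logic applied along the way, if any

@dataclass
class DataElementLineage:
    element_name: str                                  # e.g., a critical data element in a risk report
    hops: List[LineageHop] = field(default_factory=list)

    def upstream_systems(self) -> List[str]:
        """Trace the element back toward its system of origin."""
        return [hop.source_system for hop in self.hops]

# Example: a balance field traced from a loan-origination application to a risk report.
balance = DataElementLineage(
    element_name="outstanding_balance",
    hops=[
        LineageHop("loan_origination_app", "enterprise_data_lake", "currency conversion to USD"),
        LineageHop("enterprise_data_lake", "risk_data_mart", "aggregation by counterparty"),
        LineageHop("risk_data_mart", "credit_risk_report"),
    ],
)
print(balance.upstream_systems())  # ['loan_origination_app', 'enterprise_data_lake', 'risk_data_mart']

Captured this way, each hop can be reviewed or certified on its own, and the documentation can stop at the provisioning point or continue to the system of origin depending on how critical the element is.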
In some organizations, data-lineage standards are overengineered, making them costly and time-consuming to document and maintain. For instance, one global bank spent about $100 million in just a few months to document the data lineage for a handful of models. But increasingly, overspending is more the exception than the rule. Most banks are working hard to extract some business value from data lineage; for example, by using it as a basis to simplify their data architecture, to spot unauthorized data-access points, or even to identify inconsistencies among data in different reports.

Our benchmarking revealed that more than half of banks are opting for the strictest data-lineage standards possible, tracing back to the system of record at the data-element level (Exhibit 3). We also found that leading institutions do not take a one-size-fits-all approach to data. The data-lineage standards they apply are more or less rigorous depending on the data elements involved. For example, they capture the full end-to-end data lineage (including depth and granularity) for critical data elements, while data lineage for less critical data elements extends only as far as systems of record or provisioning points.

Most institutions are looking to reduce the expense and effort required to document data lineage by using increasingly sophisticated technology. Data-lineage tools have traditionally been platform specific, obliging banks to use a tool from the same vendor that provided their data warehouse or their ETL (extract, transform, and load) tools. However, newer tools are becoming available that can partly automate the data-lineage effort and operate across several platforms. They also offer autodiscovery and integration capabilities based on machine-learning techniques for creating and updating metadata and building interactive data-lineage flows. These tools are not yet widely available and have no proven market leaders, so some banks are experimenting with more than one solution or are developing proprietary solutions.
Exhibit 3
Nearly half of a responding sample of banks produce data lineage at the data-element level back to the system of record.

Data lineage, % share of respondents (n = 37)

Granularity (more to less)                          Don't know   Provisioning point¹   System of record²   System of origin³
Data-element level, including transformations
(including interim steps within systems)                0               5.5                  8.1                13.5
Data-element level (hops or transformations
for each data element between systems)                  2.7            16.2                  8.1                16.2
File level (hops or transformations for all
data elements in a file)                                 0               2.7                  0                   5.5

¹Provisioning point: data warehouse for transactional data and data marts for risk.
²System of record: data repository where data is first recorded.
³System of origin: origination applications or paper files.
Other ways to reduce the data-lineage effort include simplifying the data architecture. For example, by establishing an enterprise data lake, a global bank reduced the number of data hops for a specific report from more than a hundred to just three. Some institutions also use random sampling to determine when full lineage is needed, especially for upstream flows that are largely manual in nature and costly to trace. Another possibility is to adjust the operating model. For instance, banking systems change quickly, so element-level lineages go out of date just as fast. To tackle this issue, some banks are embedding tollgates in change processes to ensure that the documented lineage is maintained and usable through IT upgrades. Report owners are expected to periodically review and certify the lineage documentation to identify necessary updates.

Data quality
Improving data quality is often considered one of the primary objectives of data management. Most banks have programs for measuring data quality and for analyzing, prioritizing, and remediating issues that are detected. They face two common challenges. First, thresholds and rules are specific to each bank, with little or no consistency across the industry. Although some jurisdictions have attempted to define standards for data-quality rules, these have failed to gain traction. Second, remediation efforts often consume significant time and resources, creating massive backlogs at some banks. Some institutions have resorted to establishing vast data-remediation programs with hundreds of dedicated staff involved in mostly manual data-scrubbing activities.

Banks are starting to implement better processes for prioritizing and remediating issues at scale. To this end, some are setting up dedicated funds to remediate data-quality issues more rapidly, rather than relying on the standard, much slower IT prioritization processes. This approach is especially helpful for low- or medium-priority issues that might not otherwise receive enough attention or funding.
As data-quality programs mature, three levels of sophistication in data-quality controls are emerging among banks. The first and most common level uses standard reconciliations to measure data quality in terms of completeness, consistency, and validity. At the second level, banks apply statistical analysis to detect anomalies that might indicate accuracy issues; these could be values that sit more than three standard deviations from the mean, or values that change by more than 50 percent in a month. At the third and most sophisticated level, programs use artificial intelligence and machine learning–based techniques to identify existing and emerging data-quality issues and to accelerate remediation efforts (Exhibit 4).
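As a rough sketch of a second-level check built on those two rules, the hypothetical code below flags values that are more than three standard deviations from the mean of a series or that move by more than 50 percent month over month. The sample series and thresholds are assumptions for illustration only.

# Illustrative sketch of second-level (statistical) data-quality checks:
# flag values beyond three standard deviations from the mean, or month-over-month
# changes above 50 percent. The data and thresholds are illustrative assumptions.
import pandas as pd

def flag_outliers(values: pd.Series, z_threshold: float = 3.0) -> pd.Series:
    """Flag values more than z_threshold standard deviations from the mean."""
    z_scores = (values - values.mean()) / values.std(ddof=0)
    return z_scores.abs() > z_threshold

def flag_large_monthly_changes(values: pd.Series, max_change: float = 0.5) -> pd.Series:
    """Flag month-over-month changes larger than max_change (50 percent by default)."""
    return values.pct_change().abs() > max_change

# A made-up series of month-end values with one suspicious spike.
values = pd.Series([100 + (i % 5) for i in range(24)],
                   index=pd.period_range("2019-01", periods=24, freq="M"))
values.iloc[16] = 250  # inject an anomaly

print("Three-sigma outliers:", list(values.index[flag_outliers(values)]))
print("Large monthly moves:", list(values.index[flag_large_monthly_changes(values)]))

In practice, checks like these would run per data element against thresholds agreed with the element's owner; the point here is only to show how simple the second level can be relative to the machine-learning techniques of the third.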
One institution identified accuracy issues by using machine-learning clustering algorithms to analyze a population of loans and spot contextual anomalies, such as when the value of one attribute is incongruent with that of other attributes. Another bank applied artificial intelligence and natural-language processing to hundreds of thousands of records to accurately predict a customer's missing occupation. To do this, the program used information captured in free-form text during onboarding and integrated it with third-party data sources.
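A minimal sketch of the clustering approach might look like the following, where DBSCAN is used as one possible clustering algorithm and loans that fit no cluster are labeled as potential contextual anomalies. The features, parameter values, and toy data are assumptions for illustration, not the cited bank's actual model.

# Illustrative sketch: spotting contextual anomalies in a loan population with clustering.
# Features, parameters, and data are assumptions for illustration only.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# Toy loan records: [loan_amount, interest_rate, term_months]
loans = np.array([
    [250_000, 3.5, 360],
    [240_000, 3.6, 360],
    [260_000, 3.4, 360],
    [255_000, 3.5, 360],
    [250_000, 0.1, 360],  # rate incongruent with the other attributes
])

# Scale the features, then cluster; DBSCAN labels records that fit no cluster as -1.
scaled = StandardScaler().fit_transform(loans)
labels = DBSCAN(eps=2.0, min_samples=3).fit_predict(scaled)

print("Cluster labels:", labels)                     # the last loan is labeled -1 (noise)
print("Rows to review:", np.where(labels == -1)[0])  # candidate data-quality issues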
Leading institutions are revising and enhancing their entire data-control framework. They are developing holistic risk taxonomies that identify all types of data risks, including accuracy, timeliness, and completeness. They are choosing which control types to use, such as rules, reconciliations, or data-capture drop-downs, and they are also setting the minimum standards for each control type—when the control should be applied and who should define the threshold, for example. Banks are furthermore pushing for more sophisticated controls, such as those involving machine learning, as well as greater levels of automation throughout the end-to-end data life cycle.
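One way to picture such a framework is as a small catalog that maps each data-risk type to permitted control types and their minimum standards: when each control applies and who owns its threshold. The structure and entries below are a hypothetical sketch, not a prescribed taxonomy.

# Hypothetical sketch of a data-control catalog: data-risk types mapped to control
# types and minimum standards. Names, owners, and frequencies are illustrative assumptions.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class ControlStandard:
    control_type: str     # e.g., "rule", "reconciliation", "data-capture drop-down"
    applies_when: str     # minimum standard: when the control should be applied
    threshold_owner: str  # minimum standard: who defines the threshold

DATA_CONTROL_CATALOG: Dict[str, List[ControlStandard]] = {
    "accuracy": [
        ControlStandard("reconciliation", "monthly, for all critical data elements", "report owner"),
        ControlStandard("statistical anomaly check", "daily, for high-volume feeds", "data-quality team"),
    ],
    "completeness": [
        ControlStandard("rule", "at the point of capture", "chief data office"),
    ],
    "timeliness": [
        ControlStandard("rule", "at each hand-off between systems", "data-quality team"),
    ],
}

# Example: list the minimum control standards defined for accuracy risk.
for standard in DATA_CONTROL_CATALOG["accuracy"]:
    print(standard.control_type, "-", standard.applies_when, "-", standard.threshold_owner)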
Tony Ho is an associate partner in McKinsey’s New York office, where Jorge Machado and Kayvaun Rowshankish are
partners; Satyajit Parekh is a knowledge expert in the Waltham office, and John Walsh is a senior adviser in the Washington,
DC, office.