
Risk Practice

Optimizing data controls in banking
Banks need to do more in four important areas of data culture to
build the risk-related data-control capabilities they will need in the
coming decade.

by Tony Ho, Jorge Machado, Satya Parekh, Kayvaun Rowshankish, and John Walsh


July 2020
Over the past decade, banks across the globe have made considerable progress in building risk-related data-control capabilities, prompted in large part by regulatory demands. The starting point was the Basel Committee's BCBS 239 principles, issued in 2013 to strengthen banks' risk-related data-aggregation and reporting capabilities. Progress, however, has not been uniform, and most institutions are not fully compliant. In fact, many banks are still struggling with major deficiencies, particularly when it comes to data architecture and technology.

One major reason for this limited progress is that the Basel Committee called for effective implementation of BCBS 239 principles without clearly explaining what that means or how to implement them. This ambiguity has led to a wide range of interpretations, which vary from institution to institution, country to country, and even regulator to regulator. At the same time, a host of other regulations with substantial data implications have emerged, particularly those involving stress testing (CCAR in the United States), data privacy (CCPA in the US, GDPR in Europe), BSA/AML, and CECL.¹ As might be expected, banks have a monumental task in analyzing the layers of data requirements across all these regulations and building common and reusable capabilities that meet regulatory expectations.

In response, the industry has adopted some common, workable solutions in a few key areas. These include data-aggregation capabilities to support regulatory reporting requirements, such as automating some of the reporting required by the Federal Reserve in the US and the European Banking Authority (EBA) in Europe,² preparing to collect evidence for regulatory examinations, and deploying a federated data operating model with central capabilities under a chief data officer. Industry leaders are clear, however, that they struggle in four areas: the scope of data programs, data lineage, data quality, and transaction testing.³

There is considerable variation within the industry on how to address these four challenging areas, in investment, degree of risk mitigation, sustainability, and automation. A few institutions, however, are leading the way in improving their data programs and management and have made great strides toward regulatory compliance.

Scope of data programs
Banks need to define the scope of their data programs clearly enough to create a basis for easily conversing with regulators and identifying additional actions necessary for regulatory compliance. Most banks have defined the scope of their data programs to include pertinent reports, the metrics used in them, and their corresponding input-data elements. Thus a credit-risk report or a report on strategic decision making might be covered, as well as risk-weighted assets as a metric and the principal loan amounts as an input. Unfortunately, the industry has no set rules for how broadly or narrowly to define the scope of a data program or what standard metrics or data elements to include.

As a result, many banks are trying to identify industry best practices for the number of reports and types of data to include in their data programs. Our industry benchmarking indicates that the average bank's data program includes 50 reports, 90 metrics, and 1,100 data elements. Interestingly, over time, we have seen the number of reports in data programs increase while the number of metrics and data elements decreased (Exhibit 1).
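
To make the report, metric, and data-element relationship concrete, here is a minimal sketch of how a data program's scope might be represented in code. The class and field names are illustrative assumptions rather than any standard schema; the credit-risk report, risk-weighted-assets metric, and principal-loan-amount element simply mirror the example above.

```python
from dataclasses import dataclass, field

@dataclass
class DataElement:
    name: str                 # e.g. "principal_loan_amount"
    system_of_record: str     # where the element is first captured
    critical: bool = False    # flagged for stricter lineage and quality controls

@dataclass
class Metric:
    name: str                                           # e.g. "risk_weighted_assets"
    input_elements: list[DataElement] = field(default_factory=list)

@dataclass
class Report:
    name: str                 # e.g. "credit_risk_report"
    risk_type: str            # credit, market, operational, ...
    metrics: list[Metric] = field(default_factory=list)

# Illustrative scope entry: one report, one metric, one input data element.
principal = DataElement("principal_loan_amount", system_of_record="loan_servicing_core", critical=True)
rwa = Metric("risk_weighted_assets", input_elements=[principal])
scope = [Report("credit_risk_report", risk_type="credit", metrics=[rwa])]

# Roll-up of the kind the benchmarking counts above refer to: reports, metrics, data elements in scope.
n_reports = len(scope)
n_metrics = sum(len(r.metrics) for r in scope)
n_elements = len({e.name for r in scope for m in r.metrics for e in m.input_elements})
print(n_reports, n_metrics, n_elements)
```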

¹ BSA/AML refers to the US Bank Secrecy Act (anti–money laundering law) of 1970; CECL is the Current Expected Credit Losses standard issued by the US Financial Accounting Standards Board in 2016; GDPR is the EU's General Data Protection Regulation, which came into force in 2018; CCPA is the California Consumer Privacy Act of 2018; and CCAR is a regulatory framework for comprehensive capital analysis and review introduced by the US Federal Reserve in 2011.
² For example, Federal Reserve form FR Y-14M reports monthly data on the loan portfolios of bank holding companies, savings and loan holding companies, and intermediate holding companies; FR Y-14Q reports quarterly data for the same kinds of institutions on various asset classes, capital components, and categories of preprovision net revenue. The EBA issued the Common Reporting (COREP) framework as the standard for capital-requirements reporting; the EBA's standard for financial reporting is known as FINREP.
³ McKinsey benchmarking survey on data programs with 60 banks, 2020.



Exhibit 1
The scope of banks' data programs has varied considerably over time.
[Chart: reports, metrics, and data elements in scope, 2015 through 2018–19, shown for North and South America, rest of world, and overall. Panels: average reports in scope (0–80), average metrics in scope (0–240), and average critical data elements in scope (0–4,000).]

We believe the increase in reports reflects the inclusion of different nonfinancial risk types, such as operational or compliance risk. The reduction in metrics and data elements is the result of banks' attempts to reduce management costs and efforts and focus only on the most critical metrics and data.

More important than the number of reports, metrics, and data elements is a bank's ability to demonstrate to regulators and other stakeholders that the scope of its data program covers the major risks it faces. With this in mind, leading banks have established principles to define the scope and demonstrate its suitability to regulators. Leading institutions usually define the scope of their data programs broadly (Exhibit 2).

For all banks, the application of the principles illustrated in Exhibit 2 ranges from narrow to broad. However, supervisors are increasingly advocating for a broader scope, and many banks are complying. Best-in-class institutions periodically expand the scope of their data programs as their needs shift. From purely meeting regulatory objectives, these banks seek to meet business objectives as well. After all, the same data support business decisions and client interactions as well as regulatory processes.

Data lineage
Of all data-management capabilities in banking, data lineage often generates the most debate. Data lineage documents how data flow throughout the organization—from the point of capture or origination to consumption by an end user or application, often including the transformations performed along the way. Little guidance has been provided on how far upstream banks should go when providing documentation, nor how detailed the documentation should be for each "hop" or step in the data flow. As a result of the lack of regulatory clarity, banks have taken almost every feasible approach to data-lineage documentation.

In some organizations, data-lineage standards are overengineered, making them costly and time consuming to document and maintain. For instance, one global bank spent about $100 million in just a few months to document the data lineage for a handful of models. But increasingly, overspending is more the exception than the rule.
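
One way to picture what hop-level lineage documentation contains is as a directed graph whose nodes are system and data-element pairs and whose edges are hops annotated with transformations. The sketch below is illustrative only: it assumes the networkx package is available, and the system names, element names, and transformation labels are hypothetical; a real lineage repository would also track owners, certification status, and the depth of documentation applied to each data element.

```python
import networkx as nx  # assumes the networkx package is installed

# Each node is a (system, data_element) pair; each edge is one hop,
# annotated with the transformation applied along the way.
lineage = nx.DiGraph()
lineage.add_edge(("loan_origination", "principal_amount"),
                 ("risk_datamart", "principal_amount"),
                 transformation="currency conversion to USD")
lineage.add_edge(("risk_datamart", "principal_amount"),
                 ("regulatory_report", "total_exposure"),
                 transformation="aggregation by counterparty")

def upstream_lineage(graph, node):
    """Return every upstream node feeding the given report field."""
    return nx.ancestors(graph, node)

print(upstream_lineage(lineage, ("regulatory_report", "total_exposure")))
# {('loan_origination', 'principal_amount'), ('risk_datamart', 'principal_amount')}
```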



Exhibit 2
Best-in-class institutions usually define the scope of their data programs broadly.
Dimensions and characterizations in data program scope

Risk types
  Narrow: main risk types (market, credit, operational)
  Targeted: all quantitative risk types (market, credit, operational, liquidity)
  Broad: all risk types (market, credit, operational, liquidity, compliance, audit, reputational, strategic)

Legal-entity data sources¹
  Narrow: material legal entities
  Targeted: material legal entities
  Broad: all legal entities

Legal-entity alignment²
  Narrow: bank holding company
  Targeted: bank holding company and material legal entities
  Broad: bank holding company and all legal entities

Functions and business units
  Narrow: risk
  Targeted: risk and all material business units (but not support functions)
  Broad: risk, all business units, and all support functions

Audience for reports
  Narrow: board and/or senior management
  Targeted: board and senior management, heads of business units and functions, regulators
  Broad: board, senior management, heads of business units and functions, managers of functions and businesses, regulators

¹Refers to legal entities whose data feeds into reports within the scope of BCBS 239.
²Refers to legal entities that are independently aligned with BCBS 239 principles, including controls, governance, and reporting.

Most banks are working hard to extract some business value from data lineage; for example, by using it as a basis to simplify their data architecture or to spot unauthorized data-access points, or even to identify inconsistencies among data in different reports.

Our benchmarking revealed that more than half of banks are opting for the strictest data-lineage standards possible, tracing back to the system of record at the data-element level (Exhibit 3). We also found that leading institutions do not take a one-size-fits-all approach to data. The data-lineage standards they apply are more or less rigorous depending on the data elements involved. For example, they capture the full end-to-end data lineage (including depth and granularity) for critical data elements, while data lineage for less critical data elements extends only as far as systems of record or provisioning points.

Most institutions are looking to reduce the expense and effort required to document data lineage by utilizing increasingly sophisticated technology. Data-lineage tools have traditionally been platform specific, obliging banks to use a tool from the same vendor that provided their data warehouse or their ETL tools (extract, transform, and load). However, newer tools are becoming available that can partly automate the data-lineage effort and operate across several platforms. They also offer autodiscovery and integration capabilities based on machine-learning techniques for creating and updating metadata and building interactive data-lineage flows. These tools are not yet widely available and have no proven market leaders, so some banks are experimenting with more than one solution or are developing proprietary solutions.
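
The autodiscovery idea can be illustrated, in deliberately simplified form, by scanning SQL load scripts for INSERT ... SELECT statements and recording the implied table-to-table hops. Commercial lineage tools go far beyond this (stored procedures, ETL jobs, column-level mappings, scheduler metadata); the regular expression and table names below are purely illustrative.

```python
import re

# Toy ETL scripts; real autodiscovery would crawl code repositories and job schedulers.
sql_scripts = [
    "INSERT INTO risk_datamart.loans SELECT * FROM staging.loan_feed",
    "INSERT INTO reporting.credit_exposure SELECT loan_id, balance FROM risk_datamart.loans",
]

# Capture the target table after INSERT INTO and the source table after FROM.
hop_pattern = re.compile(r"INSERT\s+INTO\s+([\w.]+).*?FROM\s+([\w.]+)", re.IGNORECASE | re.DOTALL)

hops = []
for script in sql_scripts:
    match = hop_pattern.search(script)
    if match:
        target, source = match.group(1), match.group(2)
        hops.append((source, target))

print(hops)
# [('staging.loan_feed', 'risk_datamart.loans'), ('risk_datamart.loans', 'reporting.credit_exposure')]
```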



Exhibit 3
Nearly half of a responding sample of banks produce data lineage at the data-element level back to the system of record.
Data lineage, % share of respondents (n = 37). Columns run from least to most upstream: don't know; provisioning point (data warehouse for transactional data and data marts for risk); system of record (data repository where data is first recorded); system of origin (origination applications or paper files). Rows run from most to least granular.

Data-element level, including transformations (interim steps within systems): don't know, 0; provisioning point, 5.5; system of record, 8.1; system of origin, 13.5
Data-element level (hops or transformations for each data element between systems): don't know, 2.7; provisioning point, 16.2; system of record, 8.1; system of origin, 16.2
File level (hops or transformations for all data elements in a file): don't know, 0; provisioning point, 2.7; system of record, 0; system of origin, 5.5
System level (feeds between systems): don't know, 2.7; provisioning point, 5.5; system of record, 5.5; system of origin, 5.5

Other ways to reduce the data-lineage effort include simplifying the data architecture. For example, by establishing an enterprise data lake, a global bank reduced the number of data hops for a specific report from more than a hundred to just three. Some institutions also use random sampling to determine when full lineage is needed, especially for upstream flows that are especially manual in nature and costly to trace. Another possibility is to adjust the operating model. For instance, banking systems change quickly, so element-level lineages go out of date just as fast. To tackle this issue, some banks are embedding tollgates on change processes to ensure that the documented lineage is maintained and usable through IT upgrades. Report owners are expected to periodically review and certify the lineage documentation to identify necessary updates.

Data quality
Improving data quality is often considered one of the primary objectives of data management. Most banks have programs for measuring data quality and for analyzing, prioritizing, and remediating issues that are detected. They face two common challenges. First, thresholds and rules are specific to each bank, with little or no consistency across the industry.



Banks are pushing for more
sophisticated controls, such as those
involving machine learning, as well as
greater levels of automation throughout
the end-to-end data life cycle.

Although some jurisdictions have attempted to define standards for data-quality rules, these failed to gain traction. Second, remediation efforts often consume significant time and resources, creating massive backlogs at some banks. Some institutions have resorted to establishing vast data-remediation programs with hundreds of dedicated staff involved in mostly manual data-scrubbing activities.

Banks are starting to implement better processes for prioritizing and remediating issues at scale. To this end, some are setting up dedicated funds to remediate data-quality issues more rapidly, rather than relying on the standard, much slower IT prioritization processes. This approach is especially helpful for low- or medium-priority issues that might not otherwise receive enough attention or funding.

As data-quality programs mature, three levels of sophistication in data-quality controls are emerging among banks. The first and most common uses standard reconciliations to measure data quality in completeness, consistency, and validity. At the second level, banks apply statistical analysis to detect anomalies that might indicate accuracy issues. These could be values beyond three standard deviations, or values that change by more than 50 percent in a month. At the third and most sophisticated level, programs use artificial intelligence and machine learning–based techniques to identify existing and emerging data-quality issues and accelerate remediation efforts (Exhibit 4).

One institution identified accuracy issues by using machine-learning clustering algorithms to analyze a population of loans and spot contextual anomalies, such as when the value of one attribute is incongruent with that of other attributes. Another bank applied artificial intelligence and natural-language processing to hundreds of thousands of records to predict accurately a customer's missing occupation. To do this the program used information captured in free-form text during onboarding and integrated this with third-party data sources.

Leading institutions are revising and enhancing their entire data-control framework. They are developing holistic risk taxonomies that identify all types of data risks, including for accuracy, timeliness, or completeness. They are choosing what control types to use, such as rules, reconciliation, or data-capture drop-downs, and they are also setting the minimum standards for each control type—when the control should be applied and who shall define the threshold, for example. Banks are furthermore pushing for more sophisticated controls, such as those involving machine learning, as well as greater levels of automation throughout the end-to-end data life cycle.
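
A minimal sketch of the first two levels, assuming monthly values for a single data element held in a pandas DataFrame: level one applies rule-based completeness and validity checks, and level two flags values more than three standard deviations from the mean or moving by more than 50 percent in a month. The column names and sample data are assumptions for illustration; the thresholds echo the examples above but are not an industry standard.

```python
import pandas as pd

# Illustrative monthly balances for one data element; in practice this would
# come from the bank's data warehouse.
df = pd.DataFrame({
    "month": pd.period_range("2019-01", periods=6, freq="M"),
    "balance": [100.0, 102.0, 98.0, 250.0, 101.0, None],
})

# Level 1: rule-based completeness and validity checks.
completeness_issues = df[df["balance"].isna()]
validity_issues = df[df["balance"] < 0]

# Level 2: statistical anomaly detection.
mean, std = df["balance"].mean(), df["balance"].std()
beyond_3_sigma = df[(df["balance"] - mean).abs() > 3 * std]
monthly_change = df["balance"].pct_change(fill_method=None).abs()
large_swings = df[monthly_change > 0.5]   # more than 50 percent change in a month

print(len(completeness_issues), len(validity_issues), len(beyond_3_sigma), len(large_swings))
```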



Exhibit 4
Banks are starting to use artificial intelligence to manage data quality.
Finding errors, improving quality, and validating models, organized by time horizon (past or future) and strategy (identify or remediate)

Identify emerging issues (future): identify anomalies that are likely to be errors, using algorithms such as isolation forests.
Remediate emerging issues (future): fix emerging data-quality issues before they flow into downstream reporting, analytics, and decision-making systems.
Find existing issues (past): pinpoint unidentified data-quality issues that are similar to known issues by using supervised machine learning, such as regression trees.
Fix past issues (past): use recommendation systems to propose changes in data elements (using NLP), then have them reviewed by humans to fix data-quality issues quickly.

Examples
Identifying errors in residential-mortgage data: a bank deployed algorithms (such as k-NN global anomaly detection) to detect outliers and identify data-quality issues in residential mortgages.
Validating machine-learning models: a bank applied machine learning to review input data for protected-class bias before using it to develop models and analytics to drive decision making.
Improving data quality for anti–money laundering: a bank employed natural-language processing (Word2vec) to fix over 5 million occupation codes that were contributing to false positives in enhanced due diligence.
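
As a sketch of the "identify emerging issues" quadrant in Exhibit 4, the example below runs scikit-learn's IsolationForest over synthetic loan attributes to flag records whose attribute combinations look anomalous, such as an interest rate that is incongruent with the rest of the portfolio. The features, contamination rate, and data are assumptions made for the illustration, not parameters drawn from the banks described above.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic loan attributes: [principal, term_months, interest_rate].
loans = rng.normal(loc=[200_000, 240, 0.04], scale=[50_000, 60, 0.01], size=(1_000, 3))
loans[0] = [200_000, 240, 0.40]   # a mis-keyed interest rate, incongruent with the rest

model = IsolationForest(contamination=0.01, random_state=0)
labels = model.fit_predict(loans)          # -1 marks records scored as anomalous

suspect_rows = np.where(labels == -1)[0]
print(suspect_rows[:10])                   # candidates for data-quality review
```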

Transaction testing
Transaction testing, also referred to as data tracing or account testing, involves checking whether the reported value of data at the end of the journey matches the value at the start of the journey (the source). Banks use transaction testing to assess the validity and accuracy of data used in key reports and to determine if "black box" rules have been implemented correctly. Banks utilize a spectrum of different transaction-testing approaches, with single testing cycles taking between a few weeks and nine months to complete.

Regulators are putting pressure on banks to strengthen their transaction-testing capabilities through direct regulatory feedback and by conducting their own transaction tests at several large banks. At the same time, many banks are inclined to focus more on transaction testing because they increasingly recognize that maintaining high-quality data can lead to better strategic decision making, permit more accurate modeling, and improve confidence among customers and shareholders.

Banks with distinctive transaction-testing capabilities shine in three areas. First, they have well-defined operating models that conduct transaction testing as an ongoing exercise (rather than a one-off effort), with clearly assigned roles, procedures, and governance oversight. The findings from transaction tests are funneled into existing data-governance processes that assess the impact of identified issues and remediate them.
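
At its core, a transaction test is a reconciliation between the value reported at the end of the data journey and the value in the source system, performed account by account on a sample. The sketch below assumes two pandas DataFrames keyed by account, one extracted from the report and one from the source; the column names and tolerance are illustrative, and a production process would feed the resulting breaks into the data-governance workflow described above.

```python
import pandas as pd

# Values as they appear in the final report versus in the source system of record.
reported = pd.DataFrame({"account_id": [1, 2, 3], "reported_balance": [100.0, 250.0, 75.0]})
source = pd.DataFrame({"account_id": [1, 2, 3], "source_balance": [100.0, 245.0, 75.0]})

TOLERANCE = 0.01  # absolute tolerance for rounding differences

merged = reported.merge(source, on="account_id", how="outer")
merged["break"] = (merged["reported_balance"] - merged["source_balance"]).abs() > TOLERANCE

# Accounts with mismatches or missing on either side are routed to remediation.
breaks = merged[merged["break"] | merged["reported_balance"].isna() | merged["source_balance"].isna()]
print(breaks)
```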



Second, they strategically automate and expedite transaction testing, utilizing modern technology and tools. While no tools exist that span the end-to-end process, leading banks are using a combination of best-in-class solutions for critical capabilities (such as document management and retrieval), while building wraparound workflows for integration.

Finally, they apply a risk-based approach to define their transaction-testing methodology. For example, leading banks often select the population for testing by combining data criticality and materiality with other considerations. These could include the persistence or resolution of issues identified in previous tests. Similarly, the size and selection of samples from that population will be related to the population's risk characteristics. While most leading banks opt for a minimum sample size and random sampling, some also use data profiling to inform their sampling, pulling in more samples from potentially problematic accounts. The review or testing of these samples is often done at an account level (rather than a report level) to allow for cross-report integrity checks, which examine the consistency of data across similar report disclosures.

Although banks have in general made fair progress with data programs, their approaches to building data-management capabilities vary greatly in cost, risk, and value delivered. In the absence of more coordinated guidance from regulators, it is incumbent upon the banking industry to pursue a broader and more harmonized data-control framework based on the risks that need to be managed and the pace of automation to ensure data efforts are sustainable.

Tony Ho is an associate partner in McKinsey’s New York office, where Jorge Machado and Kayvaun Rowshankish are
partners; Satyajit Parekh is a knowledge expert in the Waltham office, and John Walsh is a senior adviser in the Washington,
DC, office.

Copyright © 2020 McKinsey & Company. All rights reserved.
