Ifcb50 06
This presentation was prepared for the meeting. The views expressed are those of the author and do not necessarily reflect the views of the BIS, the IFC or the central banks
and other institutions represented at the meeting.
Big Data for Central Banks
Bruno TISSOT
Head of Statistics and Research Support, BIS
Head of Secretariat, Irving Fisher Committee on Central Bank Statistics (IFC)
International Workshop on Big Data for Central Bank Policies – Bali, 23-25 July 2018
Session 1
The views expressed are those of the author and do not necessarily reflect those of the BIS or the IFC.
Overview
Introduction
Financial Big Data
Three key developments
Challenges in handling and using big data
Analysing CBs’ experiences
Annexes: Selected references/ BD projects by CBs
Introduction – Big Data…
• General & increasing policy interest in “Big Data” (BD)
→ the world’s most valuable resource is no longer oil, but data (The Economist)
• Term usually describes
Extremely large data-sets
Often a by-product of commercial or social activities
Huge amount of granular information, typically transaction-level
Data available in, or close to, real time
Used to identify behavioural patterns / economic trends
Introduction – … with significant opportunities…
• Focus on sources that can effectively support micro- and macro-economic
as well as monetary and financial stability analyses
Other big data – eg geospatial information – of lower interest
• Big data provide new “business opportunities” for CBs, such as:
Qualitative statements to decipher central banks’ communication
Large number of big data pools generated by financial regulations
In turn, big data can strengthen supervisors’ capacity
Introduction – … but also challenges…
• Specific challenges faced in handling and using big data
Public nature of financial authorities and public trust
Central banks concerned about ethical & reputational consequences
Risk of misusing big data for policy actions?
Introduction – … not least due to security concerns…
• Increasing security concerns linked to internet / big data, such as:
Risk that large private records of individual information could be accessed
and potentially misused by unauthorised third parties
Resilience of financial market infrastructures
Introduction – … with the risk of being behind…
• CBs’ constraints compared to private firms
Basic resource needs (IT budget, staff)
Concerns about the lack of transparency in methodologies
Poor quality of some data sources hampering public use
Introduction – … and the need to be proactive
• Key objective for central banks is to better understand
The new data-sets and related methodologies for their analysis
The value added in comparison with “traditional” statistics
• Yet:
Being large is not sufficient to qualify as “big data” (cf census)
Unstructured data require new tools to be processed
Structured data-sets handled with “traditional” techniques?
I – Financial Big Data: … some judgment…
• Room for judgment, depends on features such as the “Vs”
Volume (number of records and attributes)
Velocity (speed of data production, eg tick data)
Variety (for instance structure and format)
Veracity (accuracy / uncertainty of large individual records)
Valence (interconnectedness of the data)
Value (often a by-product of an activity, can trigger a monetary reward)
I – Financial Big Data: …with overlaps…
• An increasing share of the information collected on the web
results from financial, commercial or administrative activities
• Cf recent expansion of “Fintech”
“Technology-enabled innovation in financial services that could result in
new business models, applications, processes or products with an associated
material effect on the provision of financial services” (FSB, 2017)
Parallel innovations: big data, mobile phone, internet, artificial intelligence
• Multiple applications that blur traditional boundaries
Digital currencies (Bitcoin)
Various applications in payments, crowdfunding, smart contracts, robot
advice, credit risk assessments & contract pricing
I – Financial Big Data: … practical issues…
• In practice CBs deal with various & heterogeneous “big data”
Usually not directly produced for a specific statistical purpose, as in the
cases of traditional census or survey exercises
Indirectly, data sources can be exploited for addressing statistical
information needs that may independently exist
“Smart data”: treatment of the raw, “organic” data is key
I – Financial Big Data: … complexity…
• Micro-level BD universe is complex and evolves over time
Interaction between data available and specific policy needs
I – Financial Big Data: … example
• Example: BIS International Debt Security issuance statistics
Micro aggregation derived from large security-by-security data-sets
Data collection based on a “traditional” residency concept…
… and a “nationality basis” (including debt issued by foreign affiliates)
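The difference between the two aggregation bases can be illustrated with a minimal sketch. The records, field names and amounts below are hypothetical and are not BIS data; the sketch only assumes that each security carries the issuer's country of residence and the country of its parent.

```python
from collections import defaultdict

# Hypothetical security-by-security records (illustrative values only).
securities = [
    {"isin": "XX0000000001", "issuer_residence": "US", "parent_nationality": "US", "amount": 100.0},
    {"isin": "XX0000000002", "issuer_residence": "KY", "parent_nationality": "BR", "amount": 50.0},
    {"isin": "XX0000000003", "issuer_residence": "BR", "parent_nationality": "BR", "amount": 30.0},
    {"isin": "XX0000000004", "issuer_residence": "NL", "parent_nationality": "US", "amount": 20.0},
]

def aggregate(records, key):
    """Sum issuance amounts by the chosen country attribute."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[key]] += rec["amount"]
    return dict(totals)

# Residency basis: debt is allocated to the country where the issuer is located.
by_residence = aggregate(securities, "issuer_residence")
# Nationality basis: debt issued by foreign affiliates is allocated to the parent's country.
by_nationality = aggregate(securities, "parent_nationality")
```

In this toy example the debt issued through the offshore affiliate is attributed to its place of issuance on a residency basis and to the parent's country on a nationality basis, so the two aggregates differ even though the underlying micro data are identical.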
II – 3 Developments: Internet of things (1: new data)
II – 3 Developments: Internet of things (2: inflation)
• Example: “scraping” prices posted online by retailers
Exercises typically limited to specific inflation components (eg volatile
fresh vegetables’ prices)
Process appears robust, scalable and can be automated
Important challenges: capturing unit-level prices, product characteristics,
quantities, adequate weights
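Once online prices have been collected and matched to products, an elementary price index can be computed. The sketch below is one possible illustration, assuming an unweighted Jevons index (geometric mean of price relatives) over products observed in both periods; the product names and prices are hypothetical, and the sketch leaves aside the weighting challenge noted above.

```python
import math

def jevons_index(base_prices, current_prices):
    """Unweighted geometric mean of price relatives (base period = 100).

    Only products observed in both periods are used.
    """
    matched = [p for p in base_prices if p in current_prices]
    if not matched:
        raise ValueError("no matched products between the two periods")
    log_sum = sum(math.log(current_prices[p] / base_prices[p]) for p in matched)
    return 100.0 * math.exp(log_sum / len(matched))

# Hypothetical scraped unit prices, keyed by product identifier.
base = {"tomato_1kg": 2.00, "lettuce_unit": 1.00, "carrot_1kg": 1.50}
current = {"tomato_1kg": 2.20, "lettuce_unit": 1.10, "onion_1kg": 0.90}

index = jevons_index(base, current)  # uses tomato and lettuce only
```

Products that appear or disappear between periods are simply dropped here, which is a simplification; handling product turnover is part of the unit-level matching challenge.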
II – 3 Developments: Internet of things (3: house prices)
• Example: collection of housing prices on the web
Scraping prices displayed by real estate agencies
Capturing the various housing characteristics posted in advertisements
facilitates the calculation of quality effects (hedonic prices)
• Challenges
Collecting the information in a comprehensive & structured way
Weighting schemes
• But this use has been relatively incremental and limited, even
for national statistical agencies in advanced economies, and often
targeted at:
Methodological improvements (eg quality adjustment)
Reducing reporting lags and data revisions
Alternative to the organisation of large surveys (eg India)
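The quality adjustment mentioned for scraped housing advertisements can be sketched with a time-dummy hedonic regression. This is only one of several hedonic approaches and is an assumption here, as are the listings, which are constructed so that price is proportional to size with a 5% increase between the two periods.

```python
import math

def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Hypothetical listings: (period, size in square metres, asking price).
listings = [
    (0, 50, 150000.0), (0, 80, 240000.0), (0, 120, 360000.0),
    (1, 60, 189000.0), (1, 100, 315000.0),
]

# Model: log(price) = a + b * log(size) + d * period_dummy.
X = [[1.0, math.log(size), float(period)] for period, size, _ in listings]
y = [math.log(price) for _, _, price in listings]
a, b, d = ols(X, y)

# Quality-adjusted index for period 1 (period 0 = 100).
hedonic_index = 100.0 * math.exp(d)

# Unadjusted comparison of average asking prices, for contrast.
avg0 = sum(p for t, _, p in listings if t == 0) / 3
avg1 = sum(p for t, _, p in listings if t == 1) / 2
raw_index = 100.0 * avg1 / avg0
```

In this constructed example the raw average price barely moves because the dwellings advertised in the second period are smaller, while the hedonic index recovers the underlying 5% increase.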
II – 3 Developments: Internet of things (5: new insights)
• Possibility of capturing unsuspected data patterns
“Traditional” statistical modelling to infer economic relationships
BD algorithms to incorporate various effects without ex ante assumptions
Techniques can be implemented easily and in an automated way
II – 3 Developments: Internet of things (6: drawbacks)
• Data quality issues
Errors, typos and self-fulfilling expectations
Need to collect consistent information but goods are not kept identical
Announcement prices can differ from actual transaction prices
Advertisements remain posted after economic transactions are settled
Accuracy of the information that individuals (or robots!) input to the web
• Key limitation is that the data are not well structured
Details on the location of a transaction / job offer difficult to get
Underlying information can be collected several times
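The problem of the same underlying information being collected several times can be illustrated with a simple de-duplication step. The field names and the matching rule (normalised address, size and price) are hypothetical choices for this sketch; real advertisements would typically need fuzzier matching.

```python
def normalise(text):
    """Lower-case and collapse whitespace so trivially different copies match."""
    return " ".join(text.lower().split())

def deduplicate(listings):
    """Keep the first occurrence of each (address, size, price) combination."""
    seen = set()
    unique = []
    for item in listings:
        key = (normalise(item["address"]), item["size_m2"], item["price"])
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique

# Hypothetical advertisements scraped from two websites.
listings = [
    {"source": "site_a", "address": "12 Main Street", "size_m2": 80, "price": 240000},
    {"source": "site_b", "address": "12  main street ", "size_m2": 80, "price": 240000},
    {"source": "site_a", "address": "5 Park Lane", "size_m2": 60, "price": 189000},
]

unique = deduplicate(listings)
```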
II – 3 Developments: Internet of things (7: challenges)
• Technical challenges
Use of new techniques (eg web-scraping) and methodologies
II – 3 Developments: Digitalisation (1: new information)
• Expanded access to digitalised information
Rise in textual information moving to the web (while not produced by
internet activities strictly speaking)
Reference documents can be digitalised, accessed and analysed like
“web-based” indicators
• Can be more easily and automatically exploited through ad hoc
BD techniques: eg text semantic analysis
Extraction of textual information of interest
Characterising text attributes and similarities
Classifying information content (eg tone of central banks’ messages)
Assessing the impact of external factors (eg circumstances, policy actions)
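Classifying the tone of a message can be sketched with a simple dictionary-based score. This is a deliberately basic stand-in for text semantic analysis, and the word lists below are hypothetical, not a validated lexicon.

```python
import re

# Hypothetical word lists, for illustration only.
HAWKISH = {"tighten", "tightening", "inflationary", "overheating", "raise"}
DOVISH = {"accommodative", "easing", "slack", "subdued", "lower"}

def tone_score(text):
    """Return (hawkish - dovish) / (hawkish + dovish), or 0.0 if no term matches."""
    words = re.findall(r"[a-z]+", text.lower())
    hawkish = sum(w in HAWKISH for w in words)
    dovish = sum(w in DOVISH for w in words)
    total = hawkish + dovish
    return (hawkish - dovish) / total if total else 0.0

statement = (
    "Inflationary pressures are building and the committee may raise rates; "
    "slack remains limited."
)
score = tone_score(statement)  # two hawkish terms, one dovish term
```

A score near +1 would suggest a predominantly hawkish vocabulary and a score near -1 a dovish one; word counts ignore negation and context, which is one reason more elaborate techniques are used in practice.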
II – 3 Developments: Digitalisation (2: new opportunities)
• Techniques can also be used to measure impact on economic
agents’ expectations
• Structured way to assess policy communication
Perceived stance of public authorities’ communication
Impact of this communication / action in view of the messages expressed
in reaction by stakeholders
Formation of public expectations
II – 3 Developments: New financial statistics (2: CBs’ interest)
II – 3 Developments: New financial statistics (3: CCRs)
• Example: rising demand for detailed loan-by-loan / security-by-security
information
Central credit registries (CCRs) have become the largest data-sets
maintained by some central banks
Europe’s AnaCredit: “analytical credit dataset”
US FRBNY Consumer Credit Panel: detailed information on consumer
debt and credit derived from individuals’ reports
• Data are well structured, but reporting is highly granular
Multiple attributes: 200 attributes per data point on a monthly basis (and
on a daily basis for a subset) for AnaCredit
Often complex to aggregate / analyse
II – 3 Developments: New financial statistics (4: specificities)
• Information often derived from confidential operations (tax
registers, banks’ books)
Richness across the population of interest (eg capturing very small
enterprises)
Usually collected regularly over a long period of time
But need for anonymization / confidentiality protection
• CBs learning from private sector
Increased experience in dealing with large data-sets (eg production of
“stress tests”)
Supervisors of financial firms to develop their expertise in these areas too
III – Challenges
• Handling big datasets requires significant resources and proper
arrangements for managing the information
• Using big data in policy-making creates opportunities but is not
without risks
• Key implications
Explains why public authorities’ actual use of big data is still limited, at
least in comparison to the private industry
Significant time and effort needed before any regular production of big
data-based information for supporting CBs’ statistical and analytical work
on a large scale
III – Challenges in handling big data (1)
• Resources and proper arrangements for managing BD
Sheer size of the data-sets
Lack of structure
Often limited quality of raw data
III – Challenges in handling big data (2)
• Need to set up a clear and comprehensive information
management process
Data acquisition
Data preparation
Data processing
Data validation
• A major area is IT
Large processing costs, difficult & expensive technology choices
Sophisticated statistical techniques: “BD algorithms”, “ML techniques”, “AI”
Public authorities have smaller budgets compared with the private sector
III – Challenges in handling big data (3)
• New issues in terms of confidentiality protection and security
Large amount of data provided by users through their web-based activities
Large financial datasets require the handling of transaction-level,
potentially highly confidential, information
Data privacy issues may increase with the development of big data and
Fintech firms
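One common building block for protecting transaction-level records is to replace direct identifiers with keyed pseudonyms before analysis. The sketch below assumes a keyed hash (HMAC-SHA256) with a secret held by the data custodian; the key, field names and records are hypothetical. Pseudonymisation of this kind does not by itself guarantee anonymity, since individuals may still be re-identified from other attributes.

```python
import hashlib
import hmac

def pseudonymise(identifier, secret_key):
    """Replace an identifier with a keyed SHA-256 digest (hex string)."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

# Hypothetical key; in practice it would be stored securely, not in code.
SECRET_KEY = b"replace-with-a-key-held-by-the-data-custodian"

# Hypothetical loan-level records.
loans = [
    {"borrower_id": "BORROWER-001", "amount": 1200.0},
    {"borrower_id": "BORROWER-002", "amount": 500.0},
    {"borrower_id": "BORROWER-001", "amount": 300.0},
]

# The same borrower maps to the same pseudonym, so records can still be linked.
protected = [
    {"borrower_key": pseudonymise(loan["borrower_id"], SECRET_KEY), "amount": loan["amount"]}
    for loan in loans
]
```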
III – Challenges in handling big data (6)
• How to enhance existing information management processes?
Goal: flexible production of relevant information out of data points
“Traditional”, template-driven data collections to be replaced by accessing
granular data from various sources
• Requirements
Greater harmonisation of data-sets, statistical standards, identifiers and
dictionaries
International efforts, eg to develop global Legal Entity Identifiers &
automated data exchange standards (XBRL, SDMX, ISO 20022)
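Harmonised identifiers make automated checks possible when combining data-sets. As an illustration, the Legal Entity Identifier is a 20-character code whose last two characters are check digits computed with the ISO 7064 MOD 97-10 scheme. The sketch below validates that checksum only; the 18-character prefix is hypothetical and does not correspond to a registered entity.

```python
def _to_number(code):
    """Map digits to themselves and letters A-Z to 10-35, then read as one integer."""
    return int("".join(str(int(ch, 36)) for ch in code))

def lei_check_digits(base18):
    """Compute the two check digits for an 18-character LEI prefix."""
    return f"{98 - _to_number(base18.upper() + '00') % 97:02d}"

def is_valid_lei(lei):
    """Check length, character set and the MOD 97-10 checksum."""
    lei = lei.upper()
    if len(lei) != 20 or not lei.isascii() or not lei.isalnum():
        return False
    return _to_number(lei) % 97 == 1

# Hypothetical prefix used only to demonstrate the checksum round trip.
base = "HYPOTHETICALBASE01"
lei = base + lei_check_digits(base)

# Changing a single character should break the checksum.
tampered = lei[:-1] + ("0" if lei[-1] != "0" else "1")
```

A checksum test of this kind catches typing errors but says nothing about whether the code is actually registered; that would require a lookup against the official register.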
III – Challenges in handling big data (7)
• Better integration of various IT systems among both authorities
and reporting entities
• Using big data for policy purposes creates opportunities but is not without risks
Immediate benefits: lower production costs, new insights, production
speed
To be balanced against potential large economic and social costs of
misguided policy decisions
III – Challenges in using big data (2)
IV – Analysing CBs’ experiences (1)
• Proper information management frameworks needed to make
the most of big data so as to:
Address challenges faced when handling and using big data
Avoid the risk of focussing on cumbersome management tasks – cleaning,
documenting, organising data – instead of using the information
• Various ingredients:
Proper IT infrastructure
Adequate statistical applications (including big data analytics)
Legal and HR support in terms of skill-sets
Good co-ordination to have a consistent and holistic information
production chain
IV – Analysing CBs’ experiences (4)
• Central banks have already started to rethink their information
management processes to:
Be able to access internet data-sets and big data techniques
Handle the new data collections initiated after the GFC
Annex (1): Selected references
Bank for International Settlements (BIS) (2018). Cryptocurrencies: looking beyond the hype, BIS Annual Economic Report, chapter V.
Bean, C. (2016). Independent review of UK economic statistics, March.
Bholat, D. (2015). Big data and central banks, Bank of England, Quarterly Bulletin, March 2015.
Borio, C. (2013). The Great Financial Crisis: setting priorities for new statistics, Journal of Banking Regulation.
Caruana, J. (2017). International financial crises: new understandings, new data, Speech at the National Bank of Belgium, Brussels, February.
Cavallo, A., & Rigobon, R. (2016). The Billion Prices Project: Using Online Prices for Measurement and Research, Journal of Economic Perspectives, Spring 2016, Vol 30(2): 151-78.
Cœuré, B. (2017). Policy analysis with big data, speech at the conference on “Economic and Financial Regulation in the Era of Big Data”, Banque de France, Paris, November.
Financial Stability Board (FSB) (2017). Financial Stability Implications from FinTech.
Glass, E. (2016). Survey analysis – Big data in central banks, Central Banking Focus Report, 2016.
Haldane, A. G. (2018). Will Big Data Keep Its Promise? Speech at the Bank of England Data Analytics for Finance and Macro Research Centre, King’s Business School, 19 April.
Hammer, C., Kostroch, D., Quiros, G., & Staff of the IMF Statistics Department (STA) Internal Group (2017). Big data: potential, challenges, and statistical implications, IMF Staff Discussion Note, SDN/17/06, September.
Hill, S. (2018). The Big Data Revolution in Economic Statistics: Waiting for Godot... and Government Funding, Goldman Sachs US Economics Analyst, 6 May.
Irving Fisher Committee on Central Bank Statistics (IFC) (2015). Central banks’ use of and interest in ‘big data’, October.
Irving Fisher Committee on Central Bank Statistics (IFC) (2017). Proceedings of the IFC Satellite Seminar on “Big Data” at the ISI Regional Statistics Conference 2017, IFC Bulletin, no 44, September.
Meng, X. (2014). A trio of inference problems that could win you a Nobel Prize in statistics (if you help fund it), in Lin, X., Genest, C., Banks, D., Molenberghs, G., Scott D., & Wang, J.-L. (eds), Past, present, and future of statistical science, Chapman and Hall, 2014, pp 537–62.
Nymand-Andersen, P. (2015). Big data – the hunt for timely insights and decision certainty: Central banking reflections on the use of big data for policy purposes, IFC Working Paper, no 14.
The Economist (2017). The world’s most valuable resource is no longer oil, but data, 6th May edition.
Annex (2): Selected BD projects by CBs
Big data area: Administrative records
Types of data-sets | Examples of projects
Foreign trade operations / investment transactions | Balance of payments statistics, eg tourism, exports
Taxation / payroll / unemployment insurance | Employment, wages, business formation (SMEs)
Central balance sheet offices | Performance vulnerabilities assessment
Loans registers | Measurement of credit risk, FX exposures
Thank you!!
Questions?
[email protected]
[email protected]