
Chapter 11

Big Data in Finance

Bin Fang and Peng Zhang

Abstract Quantitative finance is an area in which data is the vital actionable asset in every aspect of the business. Leading financial institutions and firms are adopting advanced Big Data technologies to gain actionable insights from massive market data, standardize financial data from a variety of sources, reduce the response time to real-time data streams, and improve the scalability of algorithms and software stacks on novel architectures. Today, these benefits are driving pioneering financial practitioners to develop and deploy big data solutions in financial products, ranging from front-office algorithmic trading to back-office data management and analytics.
Beyond the collection and cleansing of multi-source data, effective visualization of high-throughput data streams and rapid programmability on massively parallel processing architectures are widely used to facilitate algorithmic trading and research. Big data analytics can help reveal hidden market opportunities by analyzing high-volume structured data and social news, in contrast to underperformers that are incapable of adopting novel techniques. Being able to process massive complex events at ultra-fast speed removes the roadblock to promptly capturing market trends and managing risks in a timely manner.
These key trends in capital markets and extensive examples in quantitative finance are systematically highlighted in this chapter. The insufficiency of technological adaptation and the gap between research and practice are also presented. To clarify matters, the three defining characteristics of Big Data (volume, velocity and variety) are used as a prism through which to understand the pitfalls and opportunities of existing and emerging technologies for financial services.

B. Fang, Ph.D.
QuantCloud Brothers Inc., Setauket, NY 11733, USA
e-mail: [email protected]
P. Zhang, Ph.D.
Stony Brook University, Stony Brook, NY 11794, USA
e-mail: [email protected]; [email protected]

© Springer International Publishing Switzerland 2016
S. Yu, S. Guo (eds.), Big Data Concepts, Theories, and Applications,
DOI 10.1007/978-3-319-27763-9_11

11.1 Overview

Just a decade ago, finance was a small-data discipline, mainly because of data scarcity. Most exchanges provided only four prices per instrument per day: Open, High, Low, Close (OHLC). Intraday data beyond what was required by regulations was not kept, even by the biggest market makers. For example, commodity trading floors kept no more than 21 days of intraday history until 6 years ago [1].
Today, the proliferation of data has changed the financial industry dramatically, not only in portfolio analysis and risk management, but also in retail banking and credit scoring. Along with the ever-increasing volume, velocity and variety (3V's) of financial data, capital firms have been investing in ways to make Big Data more manageable and to condense enormous amounts of information into actionable insights, in order to keep their competitive edge in the business.

11.1.1 Quick View of the Financial Industry

The financial industry encompasses a broad range of businesses that manage money, including commercial banks, investment banks, credit card companies, insurance companies, consumer finance companies, stock brokerages, investment funds and some government-run institutions. These businesses range from as big as JPMorgan Chase, which has more than 250,000 employees globally, to as small as a proprietary trading shop consisting of a couple of individuals. The essence, however, is the same: to maximize profit, minimize risk and position themselves for ongoing success by gaining insights into market opportunities, customers and operations.
The context of the insights varies depending on the individual business. For an investment company, it can be a multi-factor relation which determines how a specific stock moves within a certain period of time. For instance, a major factor in the long-term stock price movement of XOM (ExxonMobil) would be its earnings and cash flow, which in turn are determined by the global crude oil price and inventory. For a high-frequency proprietary trading shop, however, the insights into the short-term price movement of the same XOM would be the liquidity in the markets, short-term buy/sell pressure, the sector momentum, and crude oil futures movements.
Besides the context, another characteristic of these insights is that they are ever-changing. Unlike gravitational force, which depends solely on mass and distance in physics, or water molecules, which consist of two hydrogen atoms and one oxygen atom in chemistry, finance is an area that essentially deals with people rather than nature. People change day by day, and so do the insights they try to gain. Returning to the XOM long-term stock price example mentioned earlier, the technology revolution of shale oil extraction has become a very important factor to be added in, one which significantly affected OPEC policy, global oil inventory and price. Sometimes this change can be abrupt. A good example is that on January 15th 2015, without any pre-notification, the Swiss National Bank (SNB) unexpectedly abandoned the euro cap at 1.20, introduced in September 2011. This made the Swiss franc soar as much as 30 % in chaotic trading. Any strategy based on the 1.20 franc-per-euro cap assumption became invalid immediately.
To some extent, it is quite like the role information and technology play on modern battlefields. In order to win in financial markets, institutions need to examine large pools of data and extract value from complicated analysis in a timely manner. Take trading MSFT (Microsoft) for example. Because MSFT is traded in different markets, data from all of these markets is needed in order to get a global view of the stock. MSFT also has very tight relations with, say, AAPL (Apple), IBM, INTC (Intel) and DJI (the Dow Jones Indices), so we need that data as well, even though we are interested only in trading MSFT. The more data we have, the more complicated the analysis that can be practiced, which usually means more time needs to be devoted. However, transient market opportunities do not give us this leisure. The speed of transforming big data into actionable insights distinguishes the profitable from the losing. This problem is exactly what modern big data techniques are designed to handle.

11.1.2 3V’s in Financial Markets

The principal characteristics of Big Data, including the volume, variety and velocity
(3V’s), have been embodied in all aspects of financial data and markets.

11.1.2.1 Volume

Data volume in financial markets has been growing at a tremendous rate. As algorithmic trading became mainstream on Wall Street during the past decade, capital markets stepped into the Big Data era as well. For example, High Frequency Trading (HFT), a primary form of quantitative trading that uses proprietary strategies carried out by computers to move in and out of positions in seconds or fractions of a second, represents only 2 % of the approximately 20,000 firms operating today, yet accounted for 60–73 % of all US equity trading volume as of 2009, with that share falling to approximately 50 % in 2012 [2, 3].
For the big picture, although the total number of shares changing hands is only ten times that of 20 years ago, the total number of transactions has increased 50-fold, and was more than 120 times higher during the financial crisis (Fig. 11.1). If Level 1 (L1) quotes are counted in, the number is on average an order of magnitude larger than the trades; and if Level 2 (L2) quotes are included as well, prepare to double or quadruple the amount one more time.
Not only structured data from dozens of exchanges, banks and data vendors, but also unstructured data from news, tweets and other social media has been used by the industry in daily practice for various purposes, for instance to supplement investment decision making, to tailor products on a per-customer basis in retail banking, to form a holistic view of an individual's creditworthiness, and so on. This part of the data accounts for another big portion of the volume.

Fig. 11.1 The combined number of transactions and number of shares traded on NYSE and NASDAQ over the past two decades, along with the average shares per trade (left axis: number of trades in millions and number of shares in billions; right axis: shares per trade). Source: World Federation of Exchanges (WFE)

11.1.2.2 Velocity

One decade ago, stocks' OHLC prices were reported the following day (on a T+1 basis). In the current financial market, a stock can experience about 500 quote changes and about 150 trades in 1 ms, 1800 quotes and 600 trades in 100 ms, and 5500 quotes and 700 trades in 1 s [4]. To capture high frequency data consolidated from dozens of markets and venues and to submit orders nationally or even globally with ultra-low latency, various infrastructures and hardware and software techniques have been designed and deployed by different vendors, such as microwave/optic fiber data transmission, FPGAs and ASICs. The reason that firms, especially the ones practicing HFT, have been willing to spend tens of millions of dollars on these infrastructures or technologies to gain tiny increments of speed is that 'milliseconds mean millions'.
The increase in speed is happening not only in professional investment, but also in everyone's daily life. The use of web and mobile devices has dramatically increased the speed and frequency of transactions for everybody. People order Starbucks or Taco Bell with a few taps; they check in, check out and ask for room service on their smartphones; they also make deposits and pay bills through mobile apps in just seconds. All of this represents challenges and opportunities for financial institutions. Analytics, and the ability to efficiently and effectively exploit the big data technology stack, advanced statistical modeling and predictive analytics in support of real-time decision making across business channels and operations, will distinguish those companies that flourish in uncertain markets from those that misstep.

11.1.2.3 Variety

Financial data mainly consists of structured and unstructured data.
Structured data is information with a fixed structure and length. In the financial industry, most structured data is in the form of time series. There are various kinds of structured data in the markets: based on the type of instrument, there are equities, futures, options, ETFs and OTC products. Different markets and venues usually have different formats, even for the same instrument. Various data vendors provide consolidated data with various highlights, some focusing on low-latency interconnection among several data sources by providing state-of-the-art network infrastructure (e.g., Pico Quantitative Trading), and some marketed for global feed coverage by offering a uniform format (e.g., Reuters).
Unstructured data is information that is unorganized and does not fall into a pre-determined model. This includes data gathered from social media sources, such as news articles, weather reports, tweets, emails or even audio and video, which help institutions to deliver greater insights into customers' needs, fraudulent transactions and market sentiment. Although complicated, strategies based on unstructured data have been widely utilized by professional traders for years. A good example is the Hash Crash on April 23, 2013 [5], in which event-driven trading algorithms responded to the hijacked Associated Press @AP Twitter feed, briefly wiping $121 billion off the value of companies in the S&P 500 index before recovering minutes later [6].
To process the unparalleled amount of both structured and unstructured data feeds, big data technologies are definitely needed, since traditional relational database and data warehousing technologies can no longer handle them efficiently.

11.1.3 Big Data in Context

The concept of "big" in the financial industry context is different from what it is in scientific or retail contexts. In retail businesses, for example, the profiling of customers mainly involves analysis of unstructured data from social media sources. Financial markets, however, primarily deal with structured data collected from a limited set of sources, such as exchanges and data vendors. Although unstructured data sets have been used by firms for sentiment analysis and trading, these have not traditionally been the data sets of primary importance to the business.
In financial markets, big data problems are not considered to be represented by any of the three V's alone. Regarding volume, technologies that are good at handling the high volume of tick data, which has always been the biggest data set, have already been deployed in a structured manner for years. Although not perfect, these technologies have been able to scale up to meet the increased electronic flows of data resulting from increased market activity. In terms of velocity, HFT has adequately dealt with much higher velocities of data, squeezing feed/order latency from microseconds to nanoseconds and to near the theoretical hardware limits; but this is not traditionally considered big data technology. Complicated analyses using various sets of data have been applied to instruments like OTC derivatives for quite some time, even before the big data concept existed, so it is not suitable to say that variety or complexity of data alone can be tagged as a big data problem.
Big data challenges in the financial context usually refer to projects that involve multiple factors, such as high volumes of complex data that must be cross-referenced within a specific timeframe. Although not necessarily required to be performed in real time, current tasks tend to consolidate different data sets from various sources, structured and unstructured, covering heterogeneous asset classes and risk information, and to deploy complex data aggregations for ad hoc regulatory reports, credit analysis, trading signal generation or risk management, for instance, while reducing the latency of data aggregation and increasing the effectiveness of data management.
Today, real-time streaming data is widely available. The proliferation of data is significantly changing business models in financial firms, whether in market making or long-term portfolio management. Even long-only portfolio managers nowadays add screens of data-driven signals to their portfolio selection models in order to abstract away volatility and noise and realize pure returns for their investors. On the other hand, portfolio managers who ignore or under-study the multitude of available data are adding considerable risk to their investment portfolios.

11.2 Applications of Big Data Technologies

By consolidating data management out of traditional silos, financial firms are able to manage portfolios, analyze risk exposure, perform enterprise-level analytics and comply with regulations from a more holistic point of view. The ever-growing volume of data and the requirement to quickly access, aggregate, analyze and act on it within a limited time frame make it impossible, most of the time, for traditional technologies such as Relational Database Management Systems (RDBMS) to accomplish these advanced analytics. The emerging big data technologies, however, are invaluable in their ability to provide the elasticity demanded by rapidly changing requirements. They can ultimately help firms discover innovative and strategic directions that they could not pursue before.
Based on recent research, SunGard identified ten trends shaping big data initiatives across all segments of the financial industry [7]:
1. Larger market data sets containing historical data over longer time periods
and increased granularity are required to feed predictive models, forecasts and
trading impacts throughout the day.
2. New regulatory and compliance requirements are placing greater emphasis on
governance and risk reporting, driving the need for deeper and more transparent
analyses across global organizations.
3. Financial institutions are ramping up their enterprise risk management frameworks, which rely on master data management strategies to help improve enterprise transparency, auditability and executive oversight of risk.
4. Financial services companies are looking to leverage large amounts of con-
sumer data across multiple service delivery channels (branch, web, mobile)
to support new predictive analysis models in discovering consumer behavior
patterns and increase conversion rates.
5. In post-emergent markets like Brazil, China and India, economic and business
growth opportunities are outpacing Europe and America as significant invest-
ments are made in local and cloud-based data infrastructures.
6. Advances in big data storage and processing frameworks will help financial
services firms unlock the value of data in their operations departments in
order to help reduce the cost of doing business and discover new arbitrage
opportunities.
7. Population of centralized data warehouse systems will require traditional ETL
processes to be re-engineered with big data frameworks to handle growing
volumes of information.
8. Predictive credit risk models that tap into large amounts of data consisting of
historical payment behavior are being adopted in consumer and commercial
collections practices to help prioritize collections activities by determining the
propensity for delinquency or payment.
9. Mobile applications and internet-connected devices such as tablets and smartphones are creating greater pressure on the ability of technology infrastructures and networks to consume, index and integrate structured and unstructured data from a variety of sources.
10. Big data initiatives are driving increased demand for algorithms to process data,
as well as emphasizing challenges around data security and access control, and
minimizing impact on existing systems.
Big data has been emerging as the driver of enterprise-level business analytics, helping with innovation and decision-making in today's financial industry. These analytics include, but are not limited to, portfolio management, trading opportunity hunting, execution analysis, risk management, credit scoring, regulatory compliance, and security and fraud management. The ability to efficiently and effectively deploy big data technologies to support real-time decision making across the whole business will widen the gap between successful companies and those that misstep.

11.2.1 Retail Banking

Online and mobile banking has reshaped today's banking institutions, making them different from a decade ago. Over the years, channel growth has had an enormous impact on retail banking, as customers began using alternate channels more frequently. The use of web and mobile channels has led to a decrease in face-to-face interactions between customers and banks, and in the meantime to an increase in virtual interactions and an increasing volume of customer data. The data that banks hold about their customers is much bigger in volume and much more diverse in variety than ever before. However, only a small portion of it gets utilized for driving successful business outcomes. Big data technologies can make effective use of customer data, helping develop personalized products and services, as most e-commerce companies already do.
Customers expect experiences from retail banking similar to those they have at popular e-commerce destinations such as Amazon and eBay. However, banks are often unable to deliver effective personalized service, mainly because of a low level of customer intelligence. Without deep knowledge about their customers, banks may not be able to meet these expectations. Big data analytics help banks mine and maximize the value of their customer data, predict potential customer attrition, maximize lead generation and unlock opportunities to drive top-line growth before their competitors can [8].
There are certain things that retail banks can do to advance the level of customer
intelligence [7]:
• Leverage big data to get a 360° view of each customer.
• Drive revenues with one-to-one targeting and personalized offers in real-time.
• Reduce business risk by leveraging predictive analytics for detecting fraud.
• Achieve greater customer loyalty with personalized retention offers.
• Employ the power of big data without worrying about complexities and steep
learning curves.
As an example, before big data was tamed by technology, Bank of America took the usual approach to understanding customers: it relied on samples. Now, it can increasingly process and analyze data from its full customer set. It has been using big data to understand multi-channel customer relationships, monitoring customer 'journeys' through the tangle of websites, call centers, tellers and other branch personnel to build a holistic view of the paths that customers follow through the bank, and of how those paths affect attrition or the purchase of particular financial services. The bank also uses transaction and propensity models to determine which customers have a credit card or mortgage that could benefit from refinancing at a competitor, and then makes an offer when the customer contacts the bank through online, call center or branch channels [9].
US Bank, the fifth largest commercial bank in the United States, provides another good example of achieving more effective customer acquisition with the help of big data solutions. The bank wanted to focus on multi-channel data to drive strategic decision-making and maximize lead conversions. It deployed an analytics solution that integrates data from online and offline channels and provides a unified view of the customer. This integrated data feeds into the bank's CRM solution, supplying the call center with more relevant leads. It also provides recommendations to the bank's web team on improving customer engagement on the bank's website. As an outcome, the bank's lead conversion rate has improved by over 100 % and customers receive a personalized and enhanced experience [10].

A mid-sized European bank used data sets covering over 2 million customers, with over 200 variables each, to create a model that predicts the probability of churn for each customer. An automated scorecard combining multiple logistic regression models and decision trees calculated this churn probability. Through early identification of churn risks, the bank saved itself millions of dollars in outflows it otherwise could not have avoided [8].
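To make the scorecard idea concrete, here is a minimal Python sketch of the kind of churn model described above: a logistic regression that assigns each customer a churn probability and flags the riskiest decile for retention offers. The file name, column names and feature layout are illustrative assumptions, not details from the case.

# Minimal churn-scorecard sketch: logistic regression over customer variables.
# File name and columns are hypothetical; features are assumed to be numeric.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

df = pd.read_csv("customers.csv")                 # assumed customer data set
features = [c for c in df.columns if c != "churned"]

X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["churned"], test_size=0.3, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Score every customer with a churn probability and flag the riskiest decile
# for personalized retention offers.
df["churn_probability"] = model.predict_proba(df[features])[:, 1]
at_risk = df.sort_values("churn_probability", ascending=False).head(len(df) // 10)
print("Hold-out AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))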

11.2.2 Credit Scoring

The conventional methodology for loan and credit scoring that financial institutions have been using is based on a five-component composite score, including (1) past loan and credit applications, (2) on-time payments, (3) types of loan and credit used, (4) length of loan and credit history and (5) credit capacity used [7]. Until big data scoring services became available, this approach had seen little innovation, and scoring had become a commodity.
With big data technologies, for instance machine learning algorithms, loan and credit decisions are determined in seconds by automated processes. In some cases, the technology can use million-scale data points to assess customers' credit scores in real time.
The variety of data that can be used for credit scoring has expanded considerably. With this invaluable data, the new technologies give financial companies capabilities that make the mere observation of shopping habits look downright primitive. Information gathered from social media, e-commerce data, micro-geographical statistics, digital data brokers and online trails is used to mathematically determine the creditworthiness of individuals or groups, or to market products specifically targeted at them.
Such technologies give a 360-degree, comprehensive view of any prospective customer, based on his or her relatives, colleagues and even web browsing habits. This ultimately helps to expand the availability of credit to those who struggle to obtain fair loans. Research has shown that everything from users' political inclination to sexual orientation can now be accurately predicted by parsing publicly available information on social networks such as Facebook and Twitter, as shown in Fig. 11.2.
The biggest barrier to adopting Big Data in credit scoring, however, is the fear of regulatory scrutiny. When it comes to big data, there is no clear prohibition on using data for underwriting, yet with these technologies financial companies are capable of predicting many things that are illegal to use for lending and are regarded as discrimination. "Because big data scores use undisclosed algorithms, it is impossible to analyze the algorithm for potential racial discriminatory impact," the National Consumer Law Center wrote in a recent paper on big data [11]. It can become a fair lending issue if the use of that data results in disproportionate negative outcomes for members of a protected class.
Fig. 11.2 Success rate of using Facebook 'likes' to predict personal characteristics (gender, sexual orientation, political and religious affiliation, ethnicity, substance use, relationship status), in %. Source: Kosinski, Stillwell and Graepel for the National Academy of Sciences of the USA

It is this fear of regulatory scrutiny that has left many big banks and credit card companies reluctant to dive completely into the new world of non-traditional credit information. For many lenders, non-traditional data is regarded as an augmentation of or supplement to traditional scoring methods that still rely largely on historical information. Instead, a number of start-ups have been actively using non-traditional information, with the goal of using the variety of information now available to extend credit more efficiently, or to extend it to those lacking traditional credit information.
One of the successful start-ups leveraging new techniques is Big Data Scoring, a European provider of credit scoring solutions based on social media. It was founded in 2013 and aims to provide services to banks and consumer lending companies. Its credit scoring model is based purely on information from Facebook, with a Gini coefficient of 0.340. In order to build the model, Facebook data about individuals was collected in various European countries with prior permission from the individuals. This data was then combined with the actual loan payment information for the same people, and the scoring models were built using the same tools used in building traditional credit scoring models. According to the company's website, this new underwriting model provides on average a 25 % improvement over the current best-in-class scoring models. For a lender, this translates directly into better credit quality and more clients. Use of big data can save money on credit losses while at the same time increasing revenue by expanding the potential client base.
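The Gini coefficient quoted above is related to the area under the ROC curve by Gini = 2 × AUC − 1, so a Gini of 0.340 corresponds to an AUC of about 0.67. The following Python sketch, on synthetic data standing in for social-media-derived features, shows how such a scoring model's Gini would be computed out of sample; the model choice and features are illustrative assumptions rather than Big Data Scoring's actual method.

# Fit a scoring model on hypothetical features and report its Gini coefficient,
# defined as Gini = 2 * AUC - 1. Data here is synthetic and purely illustrative.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 10_000
X = rng.normal(size=(n, 5))                       # stand-ins for social-media signals
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=2.0, size=n) > 1.5).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = GradientBoostingClassifier().fit(X_tr, y_tr)

auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
gini = 2 * auc - 1                                # a Gini of 0.34 corresponds to AUC ~ 0.67
print(f"AUC = {auc:.3f}, Gini = {gini:.3f}")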

11.2.3 Algorithmic Trading

In the early 1990s, the largest exchanges adopted electronic "matching engines" to bring together buyers and sellers. In 2000, decimalization changed the minimum tick size from 1/16 of a dollar to US$0.01 per share. Both facts encouraged algorithmic trading over the past two decades; at its peak in 2009, more than 70 % of US equity trading volume was contributed by quantitative trading [12]. Even conventional traders nowadays increasingly rely on algorithms to analyze market conditions and supplement their trading and execution decisions.
Algorithmic trading mainly uses huge amounts of historical data to calculate the success ratio of the algorithms being developed. Algorithms evaluate thousands of securities with complex mathematical tools, far beyond human capacity. They also combine and analyze data to reveal insights that are not readily apparent to the human eye. This is where true innovation happens: there is a seemingly endless amount of data available to us today, and with the right tools, financial modeling becomes limited only by the brain power and imagination of the quant at work. Big data techniques are the right tools to facilitate the alpha-generation cycle of development, deployment and management, which is the process every serious investor follows. Based on functionality, the process can be categorized into Data Management, Strategy Development and Product Deployment, and various big data techniques can be deployed in each.
• Data management: markets are more complex and interconnected, and information traverses the connections more rapidly than a decade ago. One cannot get a comprehensive view of a portfolio with just one source of data any more. Capital firms need to store and stream various types and enormous amounts of data, and effectively link disparate data together to get an actionable insight. Big data technologies provide solutions for effective data management, such as column-oriented NoSQL databases and in-memory databases.
• Strategy development: when identifying and tuning trading strategies, different algorithms, various combinations of parameters, disparate sets of symbols and various market conditions need to be experimented with to find the most profitable strategies with the least drawdown. This is an extremely computation-intensive and data-intensive task. Big data techniques such as MapReduce, which has been widely used in other industries, are not quite suitable for algorithmic trading, because real-time streaming and analytics, rather than batch processing, are what is needed. Complex Event Processing (CEP) has been widely adopted for real-time analysis.
• Product deployment: a comprehensive series of risk checks, verifications, market exposure checks and control actions in accordance with regulations is required before any order is sent to execution gateways (a minimal illustration follows this list). These measures protect both the markets and the funds themselves. Ideally these checks do not introduce much unwanted latency into the live trading system. An accurate, concise, real-time trading monitoring system is a necessary tool for traders and portfolio managers to have a comprehensive view of portfolios and accounts, and to provide human intervention capabilities.
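As a minimal illustration of the product deployment step, the Python sketch below runs a few representative pre-trade checks (per-order notional, gross exposure, price band) before an order would be forwarded to an execution gateway. The limits, order fields and prices are hypothetical; a production system would implement such checks in the low-latency path.

# Pre-trade risk-check sketch. Limits, order fields and prices are hypothetical.
from dataclasses import dataclass

@dataclass
class Order:
    symbol: str
    side: str            # "BUY" or "SELL"
    quantity: int
    limit_price: float

MAX_ORDER_NOTIONAL = 1_000_000       # per-order notional cap (USD), assumed
MAX_GROSS_EXPOSURE = 50_000_000      # portfolio-level gross exposure cap (USD), assumed
PRICE_BAND = 0.05                    # reject orders priced >5 % away from the last trade

def pre_trade_checks(order, last_price, gross_exposure):
    """Return the list of violated checks; an empty list means the order may pass."""
    violations = []
    notional = order.quantity * order.limit_price
    if notional > MAX_ORDER_NOTIONAL:
        violations.append("order notional exceeds per-order limit")
    if gross_exposure + notional > MAX_GROSS_EXPOSURE:
        violations.append("order would breach gross exposure limit")
    if abs(order.limit_price - last_price) / last_price > PRICE_BAND:
        violations.append("limit price outside allowed price band")
    return violations

# Usage: block the order if any check fails, otherwise forward it downstream.
order = Order("MSFT", "BUY", 10_000, 41.20)
problems = pre_trade_checks(order, last_price=41.00, gross_exposure=12_000_000)
print("REJECT" if problems else "PASS", problems)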
Big Data is an invaluable tool for better visibility, faster alpha-generation cycles and improved control over risks in algorithmic trading. Big Data strategies can also be used to quickly gather and process information to create a clear understanding of the market in order to drive front-office trading strategies, as well as to determine the valuation of individual securities. Traders are able to determine whether various market participants, including those on Twitter or blogs, are bullish or bearish, and formulate investment strategies accordingly.

11.2.4 Risk Management

Risk management has been a high-priority focus area for most financial institutions [13, 14]. Post-crisis, financial institutions face new demands and challenges. More detailed, transparent and increasingly sophisticated reports are required by the regulators. Comprehensive and regular stress tests across all asset classes are required of banks. Improved risk modeling and real-time risk monitoring are expected by the industry because of the recent money laundering scandals and 'rogue trader' incidents.
As institutions become more concentrated, markets become more interconnected and information traverses the connections more rapidly, complexity has grown across every aspect of the industry. In the meantime, risks increase with complexity. The demands for improved risk monitoring, broader risk coverage and more predictive risk models have never been so high. Big Data technologies, accompanied by thousands of risk variables, can allow banks, asset managers and insurance institutions to proactively detect potential risks, react more efficiently and effectively, and make robust decisions. Big Data can be targeted to an organization's particular needs and applied to enhance different risk domains.
• Credit risk: big data can aggregate information not only from conventional structured databases, but also from mobile devices, social media, website visits, etc., to gain greater visibility into customer behavior and to monitor borrowers more closely for real-time events that may increase the chance of default.
• Liquidity risk: banks finance the longer-term instruments sold to their customers by borrowing via short-term instruments. That leverage can be lost quickly as funding is withdrawn. Modeling and forecasting liquidity crises has been a known difficulty. Big data has the capability of linking superficially unrelated events in real time, such as widening credit spreads, which could presumably precede a liquidity crisis.
• Counterparty credit risk: to calculate Credit Valuation Adjustment (CVA) at the portfolio level, or to fully simulate potential exposure for all path-dependent derivatives such as structured products, banks need to run 10–100 thousand Monte Carlo scenarios (see the sketch after this list). In-memory and GPU technologies allow the enormous amounts of data involved to be processed at very high speeds, letting derivatives be traded at better levels than the competition.
• Operational risk: new technologies have the capability of collecting data from anywhere. This includes not only trading systems, social media and emails, but also computer access log files and door swipe card activity. A fully comprehensive, integrated data analysis can detect fraud before the damage reaches disastrous levels.
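As a minimal illustration of the counterparty credit risk bullet, the sketch below runs a Monte Carlo simulation of potential future exposure for a single forward-style position under geometric Brownian motion. The dynamics, parameters and 10,000-scenario count are assumptions; a real CVA engine would price whole netting sets across many risk factors, which is exactly where in-memory and GPU acceleration pay off.

# Monte Carlo potential-future-exposure sketch for one forward-style position.
# Model (geometric Brownian motion), parameters and scenario count are illustrative.
import numpy as np

rng = np.random.default_rng(7)
n_scenarios, n_steps = 10_000, 250               # e.g. daily steps over one year
s0, mu, sigma, dt = 100.0, 0.02, 0.25, 1.0 / 250
strike = 100.0                                   # forward price agreed with the counterparty

# Simulate price paths: S_{t+1} = S_t * exp((mu - 0.5*sigma^2) dt + sigma*sqrt(dt)*Z)
z = rng.standard_normal((n_scenarios, n_steps))
log_paths = np.cumsum((mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z, axis=1)
paths = s0 * np.exp(log_paths)

# Exposure at each date is the positive part of the mark-to-market value.
exposure = np.maximum(paths - strike, 0.0)
expected_exposure = exposure.mean(axis=0)        # EE profile over time
pfe_95 = np.percentile(exposure, 95, axis=0)     # 95 % potential future exposure
print("Peak EE:", expected_exposure.max(), "Peak 95% PFE:", pfe_95.max())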

Numerous institutions have already begun to implement big data projects for risk management. One example is UOB bank of Singapore. It successfully tested a risk system based on big data, made feasible with the help of in-memory technology, which reduces the calculation time of its VaR (value at risk) from about 18 h to only a few minutes. This will make it possible in the future to carry out stress tests in real time and to react more quickly to new risks. Another successful example is Morgan Stanley. The bank developed its capacity for processing big data and thus optimized its portfolio analysis in terms of size and result quality. It is expected that these processes will lead to a significant improvement in financial risk management, thanks to automated pattern recognition and increased comprehensibility [15].
“Whether it’s guarding against fraud or selling something new, being able to pull
data from 80 different businesses enables us to get ahead of problems before they’re
problems,” says Wells Fargo Chief Data Officer Charles Thomas [13].

11.2.5 Regulatory Compliance

After the financial crisis of 2008, stringent regulatory compliance laws have been passed to improve operational transparency, increasing visibility into consumer actions and into groups with certain risk profiles. Today's financial firms are required to be able to access years of various types of historical data in response to requests from regulators at any given time.
The requirements and purposes vary from law to law. For instance,
• The Dodd-Frank Act, which gives authorities the mandate to monitor the financial stability of major firms whose failure could have a major negative impact on the economy, requires firms to hold historical data for at least 5 years;
• Basel III, the third Basel Accord, which lets authorities take a closer look at banks' capital cushions and leverage levels, requires retention of transaction data and risk information for 3–5 years;
• The FINRA/Tradeworx Project is a comprehensive and consolidated audit trail used by FINRA to monitor real-time transactions, in order to detect potentially disruptive market activity caused by HFT. It includes tick data sets (quotes, updates, cancellations, transactions, etc.) and a real-time system by Tradeworx.
Not only is the amount of data required to be held much larger, but some ad hoc reports are also required to be more comprehensive and time sensitive. For example, information is required to be extracted from unstructured data such as emails, tweets and voice mails; and in some cases, this information is again required to be cross-referenced against key sets of structured transaction data in order to facilitate trade reconstruction and reporting. Linking data sets across a firm can be particularly challenging, especially for a top-tier firm with dozens of data warehouses storing data sets in a siloed manner. The speed of report generation is also critical. A good example is that trade reconstruction reports under Dodd-Frank must be produced within a 72-hour period and need to cover data including audio records and text records, tagged with the relevant legal items.
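As a hedged illustration of the cross-referencing requirement, the sketch below links already-extracted communication records to structured trade records by trader and time window using pandas merge_asof. The field names and the 30-minute matching window are assumptions chosen for the example.

# Cross-reference extracted communications with trades for trade reconstruction.
# Field names, values and the 30-minute window are illustrative assumptions.
import pandas as pd

trades = pd.DataFrame({
    "trade_time": pd.to_datetime(["2015-03-02 09:31:05", "2015-03-02 14:02:44"]),
    "trader_id": ["T17", "T42"],
    "symbol": ["MSFT", "XOM"],
    "quantity": [10_000, 5_000],
}).sort_values("trade_time")

comms = pd.DataFrame({
    "msg_time": pd.to_datetime(["2015-03-02 09:15:00", "2015-03-02 13:55:10"]),
    "trader_id": ["T17", "T42"],
    "channel": ["email", "voice"],
    "excerpt": ["client asks to buy MSFT", "confirm XOM order size"],
}).sort_values("msg_time")

# For each trade, attach the most recent communication from the same trader
# within the preceding 30 minutes.
linked = pd.merge_asof(
    trades, comms,
    left_on="trade_time", right_on="msg_time",
    by="trader_id", direction="backward",
    tolerance=pd.Timedelta("30min"))
print(linked[["trade_time", "trader_id", "symbol", "channel", "excerpt"]])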
To assist firms in resolving this matter, IBM and Deloitte have developed a system that can parse complex government regulations related to financial matters and compare them to a company's own plans for meeting those requirements. The work is aimed at helping financial firms and other organizations use advanced big data analysis techniques to improve their practices around risk management and regulatory compliance. The service draws on Deloitte's considerable experience in regulatory intelligence, and uses IBM's cloud capabilities and big data-style analysis techniques. Basically, it uses IBM's Watson-branded cognitive computing services to parse written regulations paragraph by paragraph, allowing organizations to see if their own frameworks meet the mandates described in the regulatory language. This analysis could help cut the costs of meeting new regulatory guidelines [16].

11.3 Hurdles to Adopting Big Data

In big data deployments, actionable information and insight are rated as equally important as scalability for future data volume increases [17]. Over 60 % of financial institutions in North America believe that big data analytics provides a significant competitive advantage, and over 90 % believe that successful big data initiatives will determine the winners of the future [18]. However, the majority of firms active in the capital markets do not have a big data strategy in place at an enterprise level. For instance, according to one study, less than half of banks analyze customers' external data, such as social media activities and online behavior. Only 29 % analyze customers' share of wallet, one of the key measures of a bank's relationship with its customers [19]. Moreover, only 37 % of capital firms have hands-on experience with live big data deployments, while the majority are still focusing on pilots and experiments [20]. The reasons for this gap between willingness and reality are summarized in this section.

11.3.1 Technological Incompatibilities

Big data in the financial industry pays attention to data flows as opposed to static stocks of data. Many failures of big data projects in the past were due to a lack of compatibility between financial industry needs and the capabilities of big data technologies. Big data originally came from the practices of scientific research and online search. Hadoop, implemented via MapReduce, has been one of the most successful big data strategies for parallel batch processing, offering good flexibility and easy migration. However, these technologies have sometimes been unsuccessful in capital markets, because they rely on offline batch processing, which is not suitable for real-time analytics. Moreover, resource management and data processing are tightly coupled in Hadoop, so it is not possible to prioritize tasks when running multiple applications simultaneously.

11.3.2 Siloed Storage

Traditional data management typically distributes data across systems focusing on specific functions such as portfolio management, mortgage lending, etc. Thus firms either lack a seamless, holistic view of customers and markets or have a big overlap of data across dozens of legacy data warehouses. For example, from the front office to the back office, Deutsche Bank has been collecting petabytes of data, stored across 46 data warehouses with a 90 % overlap of data [21]. The data storage strategy needs to change by adding a focus on tiered storage, placing the most important data sets on faster devices and leaving other sets less readily accessible but more cheaply stored. In the meantime, data retention needs a more proactive approach to retire and delete data after the retention timeframe ends. Transitioning data storage can be extremely challenging given decades of traditional siloed storage.

11.3.3 Inadequate Knowledge

New skill sets are needed to benefit from big data analytics. These include programming, mathematical and statistical skills and financial knowledge, which go beyond what traditional analytics tasks require. The individuals with all of this knowledge are what people usually call "data scientists", who need to be not only well versed in analytics and IT, but also able to communicate effectively with decision makers. The biggest issue in this regard, however, is finding employees or consultants who understand both the business and the technology [22]. Some firms have chosen to hire a team with the combined skills of a data scientist, due to the lack of a single available individual with all of the required capabilities [17].

11.3.4 Security and Privacy Concern

By distributing business-sensitive data across systems, and especially by storing it in the cloud, security risk becomes an unavoidable concern for financial firms. Protecting a vast and growing volume of critical information, and being able to search and analyze it to detect potential threats, is more essential than ever. Research indicates that 62 % of bankers are cautious in their use of big data due to privacy and security issues [23]. When executive decisions are made about deploying big data strategies, senior management may decide against handing over sensitive information to the cloud, especially to public cloud providers, because if anything goes wrong, the reputational damage to the brand or the loss of cutting-edge intellectual property would far outweigh any possible benefits. With regard to this concern, private clouds tend to be the norm for top-tier capital firms.

11.3.5 Culture Shift

A culture shift from 'Data as an IT Asset' to 'Data as a Key Asset for Decision-Making' is a must. The traditional role of IT adheres to standards and controls on changes and views data from a static, historic perspective, with analytics more of an afterthought. Big data analytics is largely intended to be used on a near real-time basis, to reflect and mine constantly changing data and to react quickly and intelligently [22]. Traditional networks, storage and relational databases can be swamped by big data flows. Consequently, attempts to replicate and scale the existing technologies will not keep up with big data demands. The technologies, skills and traditional IT culture are being changed by big data.
Data managers, once considered to belong primarily in the back office and IT, are now increasingly considered a vital source of value for the business. Data scientists likewise need to be organized differently than analytical staff were in the past, closer to products and processes within firms. Unlike traditional analytical staff, data scientists focus on analyzing information from numerous disparate sources with the objective of unlocking insights that will either create value or provide a solution to a business problem. The job of a data scientist goes beyond analytics to include consultancy services, research, enterprise-wide taxonomy, automating processes, ensuring the firm keeps pace with technology development, and managing analytics vendors [17].

11.4 Technology and Architecture

Different from traditional data management technologies, a number of big data technologies have been developed specifically for handling enormous volumes of data, large variation in feed sources, and high-speed, real-time data processing. To this end, technologies such as Hadoop, column-oriented databases, in-memory databases and complex event processing (CEP) are most often cited as examples of big data in action.

11.4.1 Hadoop

Apache's Hadoop is open-source software originally designed for online search engines to grab and process information from all over the internet. It has two well-known components, MapReduce and the Hadoop Distributed File System (HDFS). It was designed to distribute blocks of subdivided data across different file systems and run data processing in parallel in a first stage called "Map", and then consolidate the processed output on a single server in a second stage called "Reduce".
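The following Python sketch illustrates the two stages in miniature, in the style of a Hadoop Streaming job that sums traded volume per symbol. The tick-file layout (symbol, price, size per line) is an assumption; in a real job the mapper and reducer would run as separate scripts reading standard input, with Hadoop performing the shuffle and sort between them.

# MapReduce-style aggregation sketch: total traded volume per symbol.
# Input layout "symbol,price,size" is assumed; Hadoop would normally shuffle
# and sort the mapper output before the reducer sees it.
from itertools import groupby

def mapper(lines):
    """Map stage: emit a (symbol, size) pair for every tick record."""
    for line in lines:
        symbol, price, size = line.strip().split(",")
        yield symbol, int(size)

def reducer(pairs):
    """Reduce stage: pairs arrive grouped by key; sum the sizes per symbol."""
    for symbol, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield symbol, sum(size for _, size in group)

if __name__ == "__main__":
    ticks = ["MSFT,41.20,300", "XOM,85.10,200", "MSFT,41.21,500"]
    for symbol, volume in reducer(mapper(ticks)):
        print(symbol, volume)        # MSFT 800, XOM 200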
One successful use case in financial services comes from BNY Mellon, which at the Hadoop Summit in June 2014 credited Hadoop with allowing it to provide clients real-time visibility into when their trades are executed [24]. BNY Mellon gets trade instructions from its clients (portfolio advisors and investment managers) and handles those trades and the after-trade processing. Some number of days later, depending on the type of financial instrument, clients receive a confirmation saying that the trade was executed and the paper is being held in a specific repository. The bank used to offer little visibility into what had happened in the financial system. With Hadoop, however, it found a much cheaper way to process and store data, and it can now start to give clients real-time visibility into what is happening with the business the bank handles for them.

11.4.2 Column-Oriented and In-Memory Databases

A traditional Relational Database Management System (RDBMS) is a database management system (DBMS) based on the relational model. It has been used by the industry for decades. Being row-oriented, RDBMSs have the properties of Atomicity, Consistency, Isolation and Durability (ACID), which are ideal for transaction processing. In the current financial industry, however, enormous amounts of structured and unstructured data are needed for market sentiment analysis, real-time portfolio analysis, credit scoring analysis, etc. The data stored is not frequently modified and is mostly read-only, such as market tick data sets; however, the amount is large and is queried frequently and repeatedly, so scalability and distributed processing capabilities are required instead.
Column-oriented databases, by contrast, store mostly time series and focus on supporting data compression, aggregation and fast queries. The downside of these columnar databases is that they generally only allow batch updates and therefore have much slower update times than traditional models. In most financial services practices, however, the time series are read-only. There are a couple of commercial column-oriented database products on the market designed for high-speed access to market and tick data for analysis and trading, with kdb+ being the most prevalent example. In addition to column-oriented databases, in-memory databases have started to be utilized in the industry for high-speed applications, scaling linearly up or down on the fly based on memory requirements.
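A minimal sketch of why column orientation suits read-mostly tick data: when each field is stored as its own contiguous array, an aggregation such as VWAP only has to scan the columns it needs. The layout and values below are illustrative and not tied to any particular product.

# Column-oriented layout for read-mostly tick data: one array per field, so an
# aggregation such as VWAP touches only the price and size columns.
import numpy as np

ticks = {
    "symbol": np.array(["MSFT", "MSFT", "XOM", "MSFT"]),
    "price":  np.array([41.20, 41.22, 85.10, 41.19]),
    "size":   np.array([300, 500, 200, 400]),
}

def vwap(ticks, symbol):
    """Volume-weighted average price over the selected symbol's ticks."""
    mask = ticks["symbol"] == symbol          # scan a single column
    prices, sizes = ticks["price"][mask], ticks["size"][mask]
    return float(np.sum(prices * sizes) / np.sum(sizes))

print("MSFT VWAP:", round(vwap(ticks, "MSFT"), 4))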

11.4.3 Complex Event Processing

The need for fast actions and timely responses is of paramount importance in finan-
cial industry, and traditional databases apparently don’t provide these capabilities.
Thus, complex event processing (CEP) emerged. Complex event processing is a
general category of technology designed to analyze streams of data flowing from
live sources to identify patterns and significant business indicators. CEP enables
firms to analyze and act upon rapidly changing data in real time; to capture, analyze
and act on insight before opportunities are lost forever; and to move from batching
process to real-time analytics and decisions.
Imagine a business decision that combines all information sources to render a real-time action. The information could include: the current event, static information about the entities involved in the event, information about past events correlated to the current event and entity, other information relating to the entity and current event, and trends about likely futures derived from predictive models. This complex analysis is possible with CEP, which can address the following requirements:
• Low latency: typically less than a few milliseconds, and sometimes less than 1 millisecond, between the time an event arrives and the time it is processed;
• High throughput: typically hundreds or a few thousand events processed per second, but bursts may reach millions of events per second;
• Complex patterns and strategies: such as patterns based on temporal or spatial relationships.
The financial services industry was an early adopter of CEP technology, using complex event processing to structure and contextualize available data so that it could inform trading behavior, specifically algorithmic trading, by identifying opportunities or threats that prompt traders or automatic trading systems to buy or sell. For example, if a trader wants to track MSFT moving more than 2 % away from its 10-minute VWAP, followed by the S&P moving by 0.5 %, both within any 2-minute interval, CEP technology can track such an event. Moreover, it can trigger an action upon the occurrence of the event, for example to buy MSFT. Today, a wide variety of financial applications use CEP, including risk management systems, order and liquidity analysis, trading cost analysis, and quantitative trading and signal generation systems.
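The pattern in the example above can be sketched directly. The Python fragment below keeps a rolling 10-minute VWAP for MSFT and a 2-minute window of S&P levels, and fires when a >2 % VWAP deviation is followed within 2 minutes by a 0.5 % index move; the event fields are assumptions, and a production system would express this as a query in a CEP engine rather than a hand-written loop.

# Sketch of the CEP pattern described above. Event fields are assumptions.
from collections import deque

VWAP_WINDOW = 10 * 60     # seconds of MSFT ticks used for the rolling VWAP
MATCH_WINDOW = 2 * 60     # both conditions must occur within this many seconds

msft_ticks = deque()      # (time, price, size)
sp_levels = deque()       # (time, level)
msft_triggers = deque()   # times when MSFT moved >2 % from its 10-minute VWAP

def on_event(event):
    """Process one event: {'time', 'source': 'MSFT', 'price', 'size'} for ticks,
    or {'time', 'source': 'SPX', 'level'} for index updates."""
    t = event["time"]
    if event["source"] == "MSFT":
        msft_ticks.append((t, event["price"], event["size"]))
        while msft_ticks[0][0] < t - VWAP_WINDOW:
            msft_ticks.popleft()
        volume = sum(s for _, _, s in msft_ticks)
        vwap = sum(p * s for _, p, s in msft_ticks) / volume
        if abs(event["price"] - vwap) / vwap > 0.02:
            msft_triggers.append(t)
    else:  # S&P index level update
        sp_levels.append((t, event["level"]))
        while sp_levels[0][0] < t - MATCH_WINDOW:
            sp_levels.popleft()
        sp_move = abs(event["level"] - sp_levels[0][1]) / sp_levels[0][1]
        while msft_triggers and msft_triggers[0] < t - MATCH_WINDOW:
            msft_triggers.popleft()
        if sp_move > 0.005 and msft_triggers:
            print(f"pattern matched at t={t}: e.g. submit a buy order for MSFT")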

11.5 Cloud Environment

In other industries, big data is often closely connected with cloud computing. The obvious advantage of using the cloud is saving the up-front cost of IT investment, but the majority of capital firms are very cautious about the public cloud in commercially sensitive areas. Small companies have been enjoying the 'pay-as-you-go' model for cloud services, but the giant firms have not, especially when data control, data protection and risk management are major concerns. Providing cloud services to major financial institutions is no longer so much about the arguments for or against any particular cloud model; instead, it is about changing culture [25]. In this section, we briefly present the benefits, challenges and practical solutions of cloud usage in the financial industry.

11.5.1 Benefits

Ever since the financial crisis, the rise of cloud services in the financial industry, especially among global Tier 1 financial institutions, has been mainly and undoubtedly driven by increasing demands from customers and regulators, as well as the pressure of cutting expenses and shrinking margins. Cloud providers can make business processes more efficient, enabling banks to do more with less and reducing the immense cost of in-house IT. By using the cloud, businesses are able to scale up or down on a 'pay-as-you-go' basis, rather than being reliant on internal IT resources. For example, Commonwealth Bank of Australia reduced expenditure on infrastructure and maintenance from 75 % of total outgoings to just 25 % by partnering with Amazon Web Services (AWS). Another example is that, by utilizing the cloud, BankInter in Spain was able to reduce the time needed for risk analysis from 23 h to less than 1 h [25].

11.5.2 Challenges

Inevitably there are headwinds for cloud deployment in the financial industry. These include concerns over the possible reputational damage banks might suffer, the loss of a hedge fund's competitive edge from proprietary technology and strategies if its security is breached, government intrusion into data privacy, and loss of direct control over IT. As the cost of storage has gone down, cloud storage of data actually seems not particularly useful or beneficial, because the cost saving may not offset the risk of a security breach and the damage it could do to the firm. If something goes wrong, the reputational damage or the theft of proprietary technology would far outweigh any possible benefits.
Another big problem with the cloud is one of the hardest to resolve: extraterritoriality. Whose rules should apply to a cloud service that serves one country but is hosted in another? What if the cloud is being used by an international organization, for instance a large, global Tier 1 bank such as J.P. Morgan? With differing rules between North America, Europe and Asia, the only way around the problem is to understand exactly where the data is at all times. This way, a bank can work out how to deal with rules that have cross-border implications. For instance, the US FATCA legislation applies to any bank that interacts with a US taxpayer. But providing the IRS with details of US customers may inadvertently contravene local rules in other countries which demand that customer data be protected and not shared with third parties.

11.5.3 Hybrid Cloud

To resolve the security concern, a good workaround is to use a hybrid cloud: more innovative and sensitive workloads can run on the private cloud, while less sensitive ones can go public. The scale, power and flexibility of the hybrid cloud provide financial companies with significant benefits, particularly the ability to extend existing infrastructure without incurring a large capital outlay for capacity, while retaining sensitive data and code on-premises as appropriate or as mandated by regulations. While in general terms most businesses expect a private cloud to be more expensive than a public cloud, the private cloud is actually cheaper for a big institution above a certain scale, because to use a public cloud the firm would have to implement such stringent security that any cost saving would be eaten away in any case.
“We are very slow as an industry to understand big data,” said Alastair Brown, head of e-channels, Global Transaction Banking at RBS. But when the industry has worked out the best way to use it, “it will almost certainly be unaffordable to run the algorithms without using cloud capabilities.” Cloud is part of the future: it provides a competitive advantage, and it is moving from a buzzword to real implementation [25].

11.6 The Future

Finance is no longer a small-data discipline. The ability to process enormous amounts of information on the fly separates winners from losers in today's financial markets. Being aware of the latest big data finance tools and technology is a necessity for every prudent financial services professional.
Big data in the financial industry is still at the start of its journey; it has not yet been adopted across the industry as a whole. Some top-tier financial firms have acted as early adopters, but they usually do not have comprehensive big data strategies in place, instead focusing on specific areas such as risk management, trade analytics, etc. The frontrunners that are already aware of the benefits of big data are certainly going to extend their usage of these strategies. However, these implementations will likely remain piecemeal for the near future.
The focus areas of future big data investment will extend toward client analytics. Current investment in big data has largely concentrated on revenue generation in the front office, such as trading opportunity mining and portfolio management, but the future is likely to be more about client acquisition and retention, to enhance and personalize the customer experience. Client analytics have been proven to benefit both acquisition and retention. Research showed that banks that apply analytics to customer data have a four-percentage-point lead in market share over banks that do not; the difference for banks that use analytics to understand customer attrition is even starker, at 12 percentage points [26].
The future growth of big data as a strategy in the industry relies on the continued education of internal staff about its uses and advantages. Most of the financial firms using big data tend to hire experts in order to grow their internal knowledge base [17]. However, this opens up key-person risk if big data knowledge and skills are not disseminated more widely among internal staff. In addition, as with other technologies, after the initial deployment big data needs constant refinement and evolution to adapt to dynamic market conditions. Firms also need to invest continually in training their analytics staff on new techniques, and their business personnel to enhance decision-making. Continued in-house education will be key to the successful deployment and maintenance of big data strategies and technologies over time.

References

1. Aldridge I (2015) Trends: all finance will soon be big data finance
2. Iati R (2009) The real story of trading software espionage. WallStreet and Technology.
Available: AdvancedTrading.com
3. (2012) Times Topics: high-frequency trading. The New York Times
4. Lewis M (2014) An adaptation from 'Flash Boys: A Wall Street Revolt', by Michael Lewis, The New York Times
5. Egan M (2013) Survey: ‘Hash Crash’ didn’t seriously erode market structure confidence,
FoxBusiness
6. Kilburn F (2013) 2013 review: social media, 'Hash Crash' are 2013's trending topics
7. Gutierrez DD (2015) InsideBIGDATA guide to big data for finance
8. (2014) Big data: profitability, potential and problems in banking, Capgemini Consulting
9. Groenfeldt T (2013) Banks use big data to understand customers across channels, Forbes
10. Zagorsky V (2014) Unlocking the potential of Big Data in banking sector
11. Yu P, McLaughlin J, Levy M (2014) Big Data, a big disappointment for scoring consumer
creditworthiness. National Consumer Law Center, Boston
12. Algorithmic trading
13. (2014) Retail banks and big data: big data as the key to better risk management, A report from
the Economist Intelligence Unit
14. Arnold Veldhoen SDP (2014) Applying Big Data To Risk Management: transforming risk
management practices within the financial services industry
15. Andreas Huber HH, Nagode F (2014) BIG DATA: potentials from a risk management
perspective
16. Jackson J (2015) IBM and Deloitte bring big data to risk management, Computerworld
17. O’Shea V (2014) Big Data in capital markets: at the start of the journey, Aite Group Report
(commissioned by Thomson Reuters)
18. Celent (2013) How Big is Big Data: big data usage and attitudes among North American financial services firms
19. (2013) BBRS 2013 banking customer centricity study
20. Jean Coumaros JB, Auliard O (2014) Big Data alchemy: how can banks maximize the value of
their customer data? Capgemini Consulting
21. (2013) Deutsche bank: big data plans held back by legacy systems, Computerworld UK
22. (2012) How ‘Big Data’ is different, MIT Sloan Management Review and SAS

23. C. P. Finextra research, NGDATA (2013) Monetizing payments: exploiting mobile wallets and
big data
24. King R (2014) BNY Mellon finds promise and integration challenges with Hadoop. Wall
Street J
25. Holley E (2014) Cloud in financial services – what is it not good for?
26. Aberdeen (2013) Analytics in banking
