Big Data in Finance
Bin Fang and Peng Zhang
B. Fang, Ph.D.
QuantCloud Brothers Inc., Setauket, NY 11733, USA
e-mail: [email protected]
P. Zhang, Ph.D.
Stony Brook University, Stony Brook, NY 11794, USA
e-mail: [email protected]; [email protected]
11.1 Overview
Just a decade ago, finance was a small-data discipline, mainly because data was scarce. Most exchanges provided only four prices per instrument per day: Open, High, Low, Close (OHLC). Intraday data beyond what regulations required was not kept, even by the biggest market makers. For example, until about six years ago, commodity trading floors kept no more than 21 days of intraday history [1].
Today, the proliferation of data has changed the financial industry dramatically, not only in portfolio analysis and risk management, but also in retail banking and credit scoring. Along with the ever-increasing volume, velocity and variety (3V's) of financial data, capital firms have been investing in ways to make Big Data more manageable and to condense enormous amounts of information into actionable insights, in order to keep their competitive edge in the business.
In January 2015, for instance, the Swiss National Bank (SNB) unexpectedly abandoned its cap on the franc at 1.20 francs per euro, introduced in September 2011. The Swiss franc soared as much as 30 % in chaotic trading, and any strategy built on the assumption of the 1.20 franc/euro cap became invalid immediately.
To some extent, this is similar to the role information and technology play on modern battlefields. To win in financial markets, institutions need to examine large pools of data and extract value from complicated analysis in a timely manner. Take trading MSFT (Microsoft) as an example. Because MSFT trades in multiple markets, data from all of those markets are needed to get a global view of the stock. MSFT also has very tight relationships with, say, AAPL (Apple), IBM, INTC (Intel) and DJI (Dow Jones Indices), so we need those data as well, even though we are interested only in trading MSFT. The more data we have, the more complicated the analysis that can be performed, which usually means more time must be devoted to it. However, transient market opportunities do not give us this luxury. The speed at which big data is transformed into actionable insights distinguishes the profitable from the losing. This is exactly the problem that modern big data techniques are designed to handle.
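As a rough illustration of this kind of consolidation, the following minimal Python sketch merges quote updates from several venues into one time-ordered stream and tracks the best bid and offer across them. The venue names, timestamps and record layout are simplified assumptions for illustration, not an actual exchange feed format.

```python
import heapq
from collections import namedtuple

# A simplified quote record; real feeds carry far more fields.
Quote = namedtuple("Quote", "ts venue symbol bid ask")

def consolidate(feeds):
    """Merge per-venue quote streams (each sorted by timestamp) into one
    time-ordered stream and yield the best bid/offer across venues."""
    best = {}  # venue -> latest Quote seen from that venue
    for q in heapq.merge(*feeds, key=lambda q: q.ts):
        best[q.venue] = q
        bid = max(v.bid for v in best.values())
        ask = min(v.ask for v in best.values())
        yield q.ts, bid, ask

# Hypothetical snapshots from three venues quoting MSFT.
nasdaq = [Quote(1.000, "NASDAQ", "MSFT", 41.10, 41.12),
          Quote(1.002, "NASDAQ", "MSFT", 41.11, 41.13)]
bats   = [Quote(1.001, "BATS",   "MSFT", 41.09, 41.12)]
edgx   = [Quote(1.003, "EDGX",   "MSFT", 41.11, 41.12)]

for ts, bid, ask in consolidate([nasdaq, bats, edgx]):
    print(f"t={ts:.3f}  best bid={bid:.2f}  best ask={ask:.2f}")
```

In practice the merge would run over live feed handlers rather than in-memory lists, but the core idea of time-ordering events and maintaining per-venue state is the same.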
The principal characteristics of Big Data, namely volume, variety and velocity (the 3V's), are embodied in all aspects of financial data and markets.
11.1.2.1 Volume
Fig. 11.1 The combined number of transactions (in millions) and number of shares traded (in billions) on NYSE and NASDAQ over the past two decades, along with the average shares per trade (left axis: trades and shares; right axis: shares per trade). Source: World Federation of Exchanges (WFE)
to have a holistic view of an individual's creditworthiness, and so on. This part of the data accounts for another big portion of the volume.
11.1.2.2 Velocity
One decade ago, stock OHLC prices were reported the following day (on a T + 1 basis). In current financial markets, a stock can experience about 500 quote changes and about 150 trades in 1 ms, 1800 quotes and 600 trades in 100 ms, and 5500 quotes and 700 trades in 1 s [4]. To capture high-frequency data consolidated from dozens of markets and venues, and to submit orders nationally or even globally with ultra-low latency, various infrastructures, hardware and software techniques have been designed and deployed by different vendors, such as microwave and optical-fiber data transmission, FPGAs, and ASICs. The reason that firms, especially those practicing HFT, have been willing to spend tens of millions of dollars on these infrastructures and technologies to gain tiny increments of speed is that 'milliseconds mean millions'.
The increase in speed is happening not only in professional investment but also in everyone's daily life. The use of web and mobile devices has dramatically increased the speed and frequency of transactions for everybody. People order from Starbucks or Taco Bell with a few taps; they check in, check out and ask for room service on their smartphones; they also make deposits and pay bills through mobile apps in just seconds. All of this presents challenges and opportunities to financial institutions. Analytics, and the ability to efficiently and effectively exploit the big data technology stack, advanced statistical modeling, and predictive analytics in support of real-time decision making across business channels and operations, will distinguish the companies that flourish in uncertain markets from those that misstep.
11.1.2.3 Variety
limits. But this is not traditionally considered a big data technology. Complicated analyses, for example of OTC derivatives using various sets of data, were already in use for quite some time before the big data concept existed. So it is not accurate to say that variety or complexity of data alone makes something a big data problem.
Big data challenges in a financial context usually refer to projects that involve multiple factors, such as high volumes of complex data that must be cross-referenced within a specific timeframe. Although not necessarily required to run in real time, current tasks tend to consolidate different data sets from various sources, structured and unstructured, across heterogeneous asset classes and risk information, and to deploy complex data aggregations for ad hoc regulatory reports, credit analysis, trading-signal generation or risk management, for instance, while reducing the latency of data aggregation and increasing the effectiveness of data management.
Today, real-time streaming data is widely available. The proliferation of data is significantly changing business models in financial firms, whether in market making or long-term portfolio management. Even long-only portfolio managers nowadays add screens of data-driven signals to their portfolio selection models in order to filter out volatility and noise and realize pure returns for their investors. On the other hand, portfolio managers who ignore or under-study the multitude of available data add considerable risk to their investment portfolios.
Online and mobile banking have reshaped today's banking institutions, making them very different from a decade ago. Over the years, channel growth has had an enormous impact on retail banking, as customers began using alternate channels more frequently. The use of web and mobile channels has led to a decrease in face-to-face interactions between customers and banks, and at the same time to an increase in virtual interactions and a growing volume of customer data. The data that banks hold about their customers is much bigger in volume and much more diverse in variety than ever before. However, only a small portion of it gets utilized to drive successful business outcomes. Big data technologies can make effective use of customer data, helping develop personalized products and services, as most e-commerce companies already do.
Customers expect experiences from retail banking similar to those they have at popular e-commerce destinations such as Amazon and eBay. However, banks are often unable to deliver effective personalized service. The main reason is a low level of customer intelligence: without deep knowledge of their customers, banks cannot meet these expectations. Big data analytics helps banks mine and maximize the value of their customer data, to predict potential customer attrition, maximize lead generation and unlock opportunities to drive top-line growth before their competitors can [8].
There are certain things that retail banks can do to advance the level of customer
intelligence [7]:
• Leverage big data to get a 360° view of each customer.
• Drive revenues with one-to-one targeting and personalized offers in real-time.
• Reduce business risk by leveraging predictive analytics for detecting fraud.
• Achieve greater customer loyalty with personalized retention offers.
• Employ the power of big data without worrying about complexities and steep
learning curves.
As an example, before big data was tamed by technology, Bank of America took the usual approach to understanding customers: it relied on samples. Now it can increasingly process and analyze data from its full customer set. It has been using big data to understand multi-channel customer relationships, monitoring customer 'journeys' through the tangle of websites, call centers, tellers, and other branch personnel to gain a holistic view of the paths that customers follow through the bank, and of how those paths affect attrition or the purchase of particular financial services. The bank also uses transaction and propensity models to determine which customers have a credit card or mortgage that could benefit from refinancing at a competitor, and then makes an offer when the customer contacts the bank through online, call center or branch channels [9].
U.S. Bank, the fifth largest commercial bank in the United States, offers another good example of achieving more effective customer acquisition with the help of big data solutions. The bank wanted to focus on multi-channel data to drive strategic decision-making and maximize lead conversions. It deployed an analytics solution that integrates data from online and offline channels and provides a unified view of the customer. This integrated data feeds into the bank's CRM solution, supplying the call center with more relevant leads. It also provides recommendations to the bank's web team on improving customer engagement on the bank's website. As an outcome, the bank's lead conversion rate improved by over 100 % and customers receive a personalized, enhanced experience [10].
A mid-sized European bank used data sets covering over 2 million customers and over 200 variables to create a model that predicts the probability of churn for each customer. An automated scorecard combining multiple logistic regression models and decision trees calculated each customer's churn probability. Through early identification of churn risks, the bank saved itself millions of dollars in outflows that it otherwise could not have prevented [8].
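The snippet below is a minimal sketch of such a churn scorecard, assuming scikit-learn is available; the features and the synthetic data are purely illustrative and do not reflect the bank's actual model, which combined multiple logistic regression models and decision trees over 200-plus variables.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000  # synthetic customers; the real model used 2M+ customers

# Illustrative features: months inactive, product count, balance trend
X = np.column_stack([
    rng.integers(0, 24, n),   # months since last branch visit
    rng.integers(1, 6, n),    # number of products held
    rng.normal(0, 1, n),      # standardized balance trend
])
# Synthetic label: churn more likely when inactive and balances falling
logits = 0.15 * X[:, 0] - 0.6 * X[:, 1] - 0.8 * X[:, 2] - 1.0
y = rng.random(n) < 1 / (1 + np.exp(-logits))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_tr, y_tr)

# Probability of churn for each customer in the hold-out set
churn_prob = model.predict_proba(X_te)[:, 1]
print("mean predicted churn probability:", churn_prob.mean().round(3))
```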
The conventional methodology that financial institutions have been using for loan and credit scoring is based on a five-component composite score, comprising (1) past loan and credit applications, (2) on-time payments, (3) types of loan and credit used, (4) length of loan and credit history, and (5) credit capacity used [7]. Until big data scoring services became available, this approach had seen little innovation, making scoring a commodity.
With big data technologies, for instance machine learning algorithms, loan and credit decisions can be made in seconds by automated processes. In some cases, the technology can use millions of data points to assess customers' credit scores in real time.
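The following sketch illustrates, under simplified assumptions, how such automated scoring might look: a gradient-boosted classifier is trained offline on synthetic application data and then scores a single new applicant in a few milliseconds. The features, parameters and thresholds are hypothetical and not any lender's actual model.

```python
import time
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(1)
n = 20_000  # synthetic historical applications

# Illustrative features: income, utilization, on-time rate, history length
X = np.column_stack([
    rng.normal(60, 20, n),     # income (k$)
    rng.uniform(0, 1, n),      # credit utilization
    rng.uniform(0.5, 1, n),    # on-time payment rate
    rng.integers(0, 30, n),    # years of credit history
])
default_logit = -3 + 2.5 * X[:, 1] - 3 * X[:, 2] - 0.02 * X[:, 3] - 0.01 * X[:, 0]
y = rng.random(n) < 1 / (1 + np.exp(-default_logit))

model = GradientBoostingClassifier().fit(X, y)   # trained offline

# Near-real-time decision for one new applicant
applicant = np.array([[45.0, 0.72, 0.91, 6]])
t0 = time.perf_counter()
p_default = model.predict_proba(applicant)[0, 1]
elapsed_ms = (time.perf_counter() - t0) * 1000
print(f"estimated default probability: {p_default:.3f} ({elapsed_ms:.2f} ms)")
```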
The variety of data that can be used for credit scoring has expanded considerably. With this invaluable data, the new technologies give financial companies capabilities that make simply observing shopping habits look downright primitive. Information gathered from social media, e-commerce data, micro-geographical statistics, digital data brokers and online trails is used to mathematically determine the creditworthiness of individuals or groups, or to market products specifically targeted to them.
Such technologies give a 360-degree view of any prospective customer, based on his or her relatives, colleagues and even web browsing habits. This ultimately helps expand the availability of credit to those who struggle to obtain fair loans. Research has shown that everything from users' political inclination to sexual orientation can now be accurately predicted by parsing publicly available information on social networks such as Facebook and Twitter, as shown in Fig. 11.2.
The biggest barrier to adopting big data in credit scoring, however, is the fear of regulatory scrutiny. When it comes to big data, there is no clear prohibition on using such data for underwriting. With these technologies, financial companies are capable of predicting many things that are illegal to use for lending decisions and would be regarded as discrimination. "Because big data scores use undisclosed algorithms, it is impossible to analyze the algorithm for potential racial discriminatory impact," the National Consumer Law Center wrote in a recent paper on big data [11]. It can become a fair-lending issue if the use of that data results in disproportionately negative outcomes for members of a protected class.
It is this fear of regulatory scrutiny that has left many big banks and credit card companies reluctant to dive completely into the new world of non-traditional
Fig. 11.2 Success rate of using Facebook 'likes' to predict personal characteristics (%), including gender, sexual orientation (lesbian, gay), Democrat vs. Republican, Christianity vs. Islam, Caucasian vs. African-American, drug use, alcohol use, cigarette smoking, whether parents were together at age 21, and single vs. in a relationship. Source: Kosinski, Stillwell and Graepel for the National Academy of Sciences of the USA
In the early 1990s, the largest exchanges adopted electronic 'matching engines' to bring together buyers and sellers. In 2000, decimalization changed the minimum tick size from 1/16 of a dollar to US$0.01 per share. Both developments encouraged
algorithmic trading over the past two decades; at its peak in 2009, more than 70 % of US equity trading volume was contributed by quantitative trading [12]. Even conventional traders nowadays increasingly rely on algorithms to analyze market conditions and supplement their trading and execution decisions.
Algorithmic trading mainly uses huge amounts of historical data to evaluate the success ratio of the algorithms being written. Algorithms evaluate thousands of securities with complex mathematical tools, far beyond human capacity. They also combine and analyze data to reveal insights that are not readily apparent to the human eye. This is where true innovation happens: there is a seemingly endless amount of data available to us today, and with the right tools financial modeling becomes limited only by the brain power and imagination of the quant at work. Big data techniques are the right tools to facilitate the alpha-generation cycle of development, deployment and management, which is the process every serious investor follows. Based on functionality, the process can be categorized into data management, strategy development and product deployment, in each of which various big data techniques can be deployed.
• Data management: markets are more complex and interconnected, and information traverses the connections more rapidly than a decade ago. One can no longer get a comprehensive view of a portfolio from just one source of data. Capital firms need to store and stream various types and enormous amounts of data, and effectively link disparate data together to obtain actionable insights. Big data technologies provide solutions for effective data management, such as column-based and NoSQL databases, and in-memory databases.
• Strategy development: when identifying and tuning trading strategies, different algorithms, various combinations of parameters, disparate sets of symbols and various market conditions need to be experimented with in order to find the most profitable strategies with the least drawdown (a minimal parameter-sweep sketch follows this list). This process is an extremely computation-intensive and data-intensive task. Big data techniques such as MapReduce, which has been widely used in other industries, are not well suited to algorithmic trading, because they rely on batch processing, whereas real-time streaming and analytics are what is needed. Complex Event Processing (CEP) has instead been widely adopted for real-time analysis.
• Product deployment: a comprehensive series of risk checks, verifications, market-exposure checks and control actions in accordance with regulations must be completed before any order is sent to execution gateways. These measures protect both the markets and the funds themselves. Ideally, these checks introduce little unwanted latency into the live trading system. An accurate, concise, real-time trade-monitoring system is a necessary tool for traders and portfolio managers to get a comprehensive view of portfolios and accounts, and to provide human intervention capabilities.
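To make the strategy-development step above concrete, here is a minimal sketch of a parameter sweep over a simple moving-average crossover strategy, run in parallel across parameter combinations. The strategy, the synthetic price series and the process pool are illustrative assumptions, not a production backtester.

```python
import itertools
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def backtest(args):
    """Backtest a simple moving-average crossover for one parameter pair."""
    prices, fast, slow = args
    fast_ma = np.convolve(prices, np.ones(fast) / fast, mode="valid")
    slow_ma = np.convolve(prices, np.ones(slow) / slow, mode="valid")
    k = min(len(fast_ma), len(slow_ma))
    signal = np.where(fast_ma[-k:] > slow_ma[-k:], 1, -1)  # long/short
    returns = np.diff(np.log(prices[-k:]))
    pnl = (signal[:-1] * returns).sum()                    # signal applied to next return
    return fast, slow, pnl

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 5000)))  # synthetic series
    grid = [(prices, f, s) for f, s in itertools.product([5, 10, 20], [50, 100, 200])]
    with ProcessPoolExecutor() as pool:                    # sweep runs in parallel
        results = list(pool.map(backtest, grid))
    best = max(results, key=lambda r: r[2])
    print("best (fast, slow) =", best[:2], "pnl =", round(best[2], 4))
```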
Big Data is an invaluable tool for better visibility, faster alpha-generation cycles and improved control over risks in algorithmic trading. Big Data strategies can also be used to quickly gather and process information to create a clear understanding of the market in order to drive front-office trading strategies, as well as to determine the valuation of individual securities. Traders are able to determine whether various market participants, including those on Twitter or blogs, are bullish or bearish, and formulate investment strategies accordingly.
Risk management has been a high-priority focus area for most financial institutions [13, 14]. Post-crisis, financial institutions face new demands and challenges. More detailed, transparent and increasingly sophisticated reports are required by regulators. Comprehensive and regular stress tests across all asset classes are required of banks. Improved risk modeling and real-time risk monitoring are expected by the industry in the wake of recent money-laundering scandals and 'rogue trader' incidents.
As institutions become more concentrated, markets become more interconnected and information traverses the connections more rapidly, complexity has grown across every aspect of the industry. In the meantime, risk increases with complexity. The demands for improved risk monitoring, broader risk coverage and more predictive risk models have never been so high. Big Data technologies, accompanied by thousands of risk variables, can allow banks, asset managers and insurance institutions to proactively detect potential risks, react more efficiently and effectively, and make robust decisions. Big Data can be targeted to an organization's particular needs and applied to enhance different risk domains.
• Credit risk: big data can aggregate information not only from conventional structured databases, but also from mobile devices, social media, website visits, etc., to gain greater visibility into customer behavior and to monitor borrowers more closely for real-time events that may increase the chances of default.
• Liquidity risk: banks finance the longer-term instruments they sell to their customers by borrowing via short-term instruments. That leverage can be lost quickly as funding is withdrawn, and modeling and forecasting liquidity crises is a known difficulty. Big data has the capability of linking superficially unrelated events in real time, events which could presumably precede a liquidity crisis, such as widening credit spreads.
• Counterparty credit risk: to calculate Credit Valuation Adjustment (CVA) at the portfolio level, or to fully simulate potential exposure for all path-dependent derivatives such as structured products, banks need to run 10,000–100,000 Monte Carlo scenarios (a minimal sketch follows this list). In-memory and GPU technologies allow the enormous amount of data involved to be processed at very high speed, letting derivatives be traded at better levels than the competition.
• Operational risk: new technologies have the capability of collecting data from anywhere, including not only trading systems, social media and emails, but also computer access log files and door swipe-card activity. A fully comprehensive, integrated data analysis can detect fraud before the damage reaches disastrous levels.
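As a back-of-the-envelope illustration of the Monte Carlo workload mentioned in the counterparty-credit-risk item above, the sketch below simulates exposure paths for a single position and computes a simplified CVA. The driftless Brownian mark-to-market dynamics, the flat hazard and discount rates, and the fixed LGD are hypothetical assumptions; a production system would run this across entire netting sets and path-dependent payoffs.

```python
import numpy as np

rng = np.random.default_rng(7)
n_paths, n_steps, T = 100_000, 50, 5.0          # scenarios, time steps, years
dt = T / n_steps
times = np.linspace(dt, T, n_steps)

# Simplified driver: mark-to-market follows a driftless Brownian motion
sigma = 0.8                                      # annualized MtM volatility (illustrative)
shocks = rng.normal(0, sigma * np.sqrt(dt), (n_paths, n_steps))
mtm = shocks.cumsum(axis=1)                      # simulated mark-to-market paths

# Expected positive exposure profile across all paths
epe = np.maximum(mtm, 0).mean(axis=0)

# Simplified CVA: flat hazard rate, flat discount curve, fixed LGD
hazard, r, lgd = 0.02, 0.01, 0.6
default_prob = np.exp(-hazard * (times - dt)) - np.exp(-hazard * times)
discount = np.exp(-r * times)
cva = lgd * np.sum(discount * epe * default_prob)
print(f"simplified CVA estimate: {cva:.4f}")
```

The point of the sketch is the shape of the workload: many scenarios times many time steps, which is exactly the kind of computation that in-memory and GPU technologies accelerate.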
Numerous institutions have already begun to implement big data projects for risk management. One example is UOB bank of Singapore, which successfully tested a big-data-based risk system: with the help of in-memory technology, it reduced the calculation time of its value at risk (VaR) from about 18 h to only a few minutes. This will make it possible in the future to carry out stress tests in real time and to react more quickly to new risks. Another successful example is Morgan Stanley, which developed its capacity for processing big data and thus optimized its portfolio analysis in terms of size and result quality. It is expected that these processes will lead to a significant improvement in financial risk management, thanks to automated pattern recognition and increased comprehensibility [15].
“Whether it’s guarding against fraud or selling something new, being able to pull
data from 80 different businesses enables us to get ahead of problems before they’re
problems,” says Wells Fargo Chief Data Officer Charles Thomas [13].
After the financial crisis of 2008, stringent regulatory compliance laws were passed to improve operational transparency, increasing the visibility into consumer actions and into groups with certain risk profiles. Today's financial firms are required to be able to access years of various types of historical data, in response to requests from regulators, at any given time.
The requirements and purposes vary from law to law. For instance:
• The Dodd-Frank Act, which gives authorities the power to monitor the financial stability of major firms whose failure could have a major negative impact on the economy, requires firms to hold historical data for at least 5 years;
• Basel III, the third Basel Accord, which lets authorities take a closer look at banks' capital cushions and leverage levels, requires retention of transaction data and risk information for 3–5 years;
• The FINRA/Tradeworx project is a comprehensive, consolidated audit trail run by FINRA to monitor real-time transactions, in order to detect potentially disruptive market activity caused by HFT. It includes tick data sets (quotes, updates, cancellations, transactions, etc.) and a real-time system by Tradeworx.
Not only is the amount of data required to be held much larger, but some ad hoc reports are also required to be more comprehensive and time sensitive. For example, information must be extracted from unstructured data such as emails, tweets and voice mails; in some cases, this information must in turn be cross-referenced against key sets of structured transaction data in order to facilitate trade reconstruction and reporting. Linking data sets across a firm can be particularly challenging, especially for a top-tier firm with dozens of data warehouses storing data sets in a siloed manner. The speed of report generation is also critical: trade reconstruction reports under Dodd-Frank, for example, must be delivered within a 72-hour period and must cover audio records and text records, tagged with the relevant legal items.
To assist firms in resolving this matter, IBM and Deloitte have developed a system that can parse complex government regulations related to financial matters and compare them with a company's own plans for meeting those requirements. The work is aimed at helping financial firms and other organizations use advanced big data analysis techniques to improve their practices around risk management and regulatory compliance. The service draws on Deloitte's considerable experience in regulatory intelligence, and uses IBM's cloud capabilities and big-data-style analysis techniques. Basically, it uses IBM's Watson-branded cognitive computing services to parse written regulations paragraph by paragraph, allowing organizations to see whether their own frameworks meet the mandates described in the regulatory language. This analysis could help cut the costs of meeting new regulatory guidelines [16].
In big data deployments, actionable information and insight are just as important as scalability for future increases in data volume [17]. Over 60 % of financial institutions in North America believe that big data analytics provides a significant competitive advantage, and over 90 % believe that successful big data initiatives will determine the winners of the future [18]. However, the majority of firms active in the capital markets do not have a big data strategy in place at an enterprise level. For instance, according to one study, less than half of banks analyze customers' external data, such as social media activities and online behavior. Only 29 % analyze customers' share of wallet, one of the key measures of a bank's relationship with its customers [19]. Moreover, only 37 % of capital firms have hands-on experience with live big data deployments, while the majority are still focusing on pilots and experiments [20]. The reasons for this gap between willingness and reality are summarized in this section.
Big data in the financial industry pays attention to data flows as opposed to static stocks of data. Many past failures of big data projects were due to a lack of compatibility between the needs of the financial industry and the capabilities of big data technologies. Big data originally grew out of the practices of scientific research and online search. Hadoop's MapReduce implementation has been one of the successful big data strategies for parallel batch processing, with good flexibility and easy migration. However, these technologies have sometimes been unsuccessful in capital markets, because they rely on offline batch processing, which is not suitable for real-time analytics. Moreover, Hadoop couples resource management and data processing tightly together, so it is not possible to prioritize tasks when running multiple applications simultaneously.
New skill sets are needed to benefit from big data analytics. These include programming, mathematical and statistical skills and financial knowledge, which go beyond what traditional analytics tasks require. Individuals with all of this knowledge are what people usually call "data scientists", who need to be not only well versed in analytics and IT, but also able to communicate effectively with decision makers. The biggest issue in this regard, however, is finding employees or consultants who understand both the business and the technology [22]. Some firms have chosen to hire a team with the combined skills of a data scientist, due to the lack of a single available individual with all of the required capabilities [17].
[23]. When executive decisions are to be made about deploying big data strategies, senior management may decide against handing sensitive information over to the cloud, especially to public cloud providers, because if anything goes wrong, the reputational damage to the brand or the loss of cutting-edge intellectual property would far outweigh any possible benefits. With regard to this concern, private clouds tend to be the norm for top-tier capital firms.
A culture shift from 'data as an IT asset' to 'data as a key asset for decision-making' is a must. The traditional role of IT adheres to standards and controls on changes, views data from a static, historical perspective, and treats analytics as more of an afterthought. Big data analytics is largely meant to be used on a near-real-time basis, to reflect and mine constantly changing data and to react quickly and intelligently [22]. Traditional networks, storage and relational databases can be swamped by big data flows. Consequently, attempts to replicate and scale the existing technologies will not keep up with big data demands. The technologies, the skills and the traditional IT culture have all been changed by big data.
Data managers, once considered to belong primarily in the back office and IT, are now increasingly seen as a vital source of value for the business. Data scientists likewise need to be organized differently from the analytical staff of the past, closer to products and processes within firms. Unlike traditional analytical staff, data scientists focus on analyzing information from numerous disparate sources with the objective of unlocking insights that will either create value or solve a business problem. The job of a data scientist goes beyond analytics to include consultancy services, research, enterprise-wide taxonomy, automating processes, ensuring the firm keeps pace with technology development, and managing analytics vendors [17].
11.4.1 Hadoop
The need for fast action and timely responses is of paramount importance in the financial industry, and traditional databases simply do not provide these capabilities. Thus, complex event processing (CEP) emerged. Complex event processing is a general category of technology designed to analyze streams of data flowing from live sources in order to identify patterns and significant business indicators. CEP enables firms to analyze and act upon rapidly changing data in real time; to capture, analyze and act on insight before opportunities are lost forever; and to move from batch processing to real-time analytics and decisions.
Imagine a business decision that combines all information sources to render a real-time action. The information could include: the current event, static information about the entities involved in the event, information about past events correlated with the current event and entity, other information relating to the entity and current event, and trends about likely futures derived from predictive models. This complex analysis is possible with CEP, which can address the following requirements:
• Low latency: typically less than a few milliseconds, and sometimes less than 1 millisecond, between the time an event arrives and the time it is processed;
• High throughput: typically hundreds or a few thousand events processed per second, but bursts may reach millions of events per second;
• Complex patterns and strategies: such as patterns based on temporal or spatial relationships.
The financial services industry was an early adopter of CEP technology, using complex event processing to structure and contextualize available data so that it could inform trading behavior, specifically algorithmic trading, by identifying opportunities or threats that signal traders or automated trading systems to buy or sell. For example, if a trader wants to track a MSFT price move of more than 2 % away from its 10-minute VWAP, followed by the S&P moving by 0.5 %, with both occurring within any 2-minute interval, CEP technology can track such an event. Moreover, it can trigger an action when the event occurs, for example to buy MSFT. Today, a wide variety of financial applications use CEP, including risk management systems, order and liquidity analysis, trading cost analysis, and quantitative trading and signal generation systems.
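A toy version of such a rule, written in plain Python rather than in a CEP engine's own event language, is sketched below. The event layout and the thresholds mirror the example above, but everything else is an assumption for illustration.

```python
from collections import deque

WINDOW = 120.0  # seconds (2 minutes)

class CrossAssetRule:
    """Fire when MSFT trades >2% away from its 10-min VWAP and the S&P
    moves >0.5%, both conditions occurring within any 2-minute window."""
    def __init__(self):
        self.msft_events = deque()   # timestamps of MSFT-vs-VWAP breaches
        self.spx_events = deque()    # timestamps of S&P 0.5% moves

    def _expire(self, q, now):
        while q and now - q[0] > WINDOW:
            q.popleft()

    def on_msft(self, ts, price, vwap_10min):
        if abs(price / vwap_10min - 1) > 0.02:
            self.msft_events.append(ts)
        return self._check(ts)

    def on_spx(self, ts, pct_move):
        if abs(pct_move) > 0.005:
            self.spx_events.append(ts)
        return self._check(ts)

    def _check(self, now):
        self._expire(self.msft_events, now)
        self._expire(self.spx_events, now)
        if self.msft_events and self.spx_events:
            return "BUY MSFT"        # trigger the follow-up action
        return None

rule = CrossAssetRule()
rule.on_msft(ts=10.0, price=41.95, vwap_10min=41.00)   # 2.3% above VWAP
print(rule.on_spx(ts=70.0, pct_move=0.006))            # within 2 min -> "BUY MSFT"
```

A real CEP engine expresses the same windowed, cross-stream pattern declaratively and handles ordering, expiry and throughput for many such rules at once.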
In other industries, big data is often closely connected with cloud computing. The obvious advantage of using the cloud is to save the up-front cost of IT investment, but the majority of capital firms are very cautious about the public cloud in commercially sensitive areas. Small companies have been enjoying the 'pay-as-you-go' model for cloud services, but the giant firms have not, especially when data control, data protection, and risk management are major concerns. Providing cloud services to major financial institutions is no longer so much about the arguments for or against any particular cloud model; instead, it is about changing culture [25]. In this section, we briefly present the benefits, challenges and practical solutions of cloud usage in the financial industry.
11.5.1 Benefits
Ever since the financial crisis, the rise of cloud services in the financial industry, especially among global tier-one financial institutions, has been driven mainly by increasing demands from customers and regulators, as well as by the pressure of cutting expenses and shrinking margins. Cloud providers can make business processes more efficient, enabling banks to do more with less and reducing the immense cost of in-house IT. By using the cloud, businesses are able to scale up or down on a 'pay-as-you-go' basis, rather than relying on internal IT resources. For example, Commonwealth Bank of Australia reduced expenditure on infrastructure and maintenance from 75 % of total outgoings to just 25 % by partnering with Amazon Web Services (AWS). As another example, by utilizing the cloud, Bankinter in Spain was able to reduce the time needed for risk analysis from 23 h to less than 1 h [25].
11.5.2 Challenges
Inevitably, there are headwinds for cloud deployment in the financial industry. These include concerns over the possible reputational damage banks might suffer, the loss of competitive edge from proprietary technology and strategies for hedge funds if security is breached, government intrusion into data privacy, and the loss of direct control over IT. As the cost of storage has come down, cloud storage of data actually seems not particularly useful or beneficial, because the cost saving may not offset the risk of a security breach and the damage it could do to the firm. If something goes wrong, the reputational damage or theft of proprietary technology would far outweigh any possible benefits.
Another big problem with the cloud is one of the hardest to resolve: extraterritoriality. Whose rules should apply to a cloud service that serves one country but is hosted in another? What if the cloud is being used by an international organization, such as a large, global tier-one bank like J.P. Morgan? With differing rules between North America, Europe and Asia, the only way around the problem is to know exactly where the data is at all times. This way, a bank can work out how to deal with rules that have cross-border implications. For instance, the US FATCA legislation applies to any bank that interacts with a US taxpayer. But providing the IRS with
To resolve the security concern, a good workaround is to use a hybrid cloud: more innovative workloads can run on the private cloud, while less sensitive workloads can go public. The scale, power and flexibility of the hybrid cloud provide financial companies with significant benefits, particularly the ability to extend existing infrastructure without incurring a large capital outlay for capacity, while retaining sensitive data and code on premises where appropriate or where mandated by regulations. While in general terms most businesses expect a private cloud to be more expensive than a public cloud, above a certain scale a private cloud is actually cheaper for a big institution, because to use a public cloud the firm would have to implement such stringent security that any cost saving would be eaten away in any case.
“We are very slow as an industry to understand big data,” said Alastair Brown, head of e-channels, Global Transaction Banking at RBS. But when the industry has worked out the best way to use it, “it will almost certainly be unaffordable to run the algorithms without using cloud capabilities.” Cloud is part of the future: it provides a competitive advantage, and it is moving from a buzzword to real implementation [25].
Finance is no longer a small-data discipline. The ability to process enormous amounts of information on the fly separates winners from losers in today's financial markets. Being aware of the latest big data tools and technologies in finance is a necessity for every prudent financial services professional.
Big data in the financial industry is still at the start of its journey. It has yet to be adopted across the industry as a whole. Some top-tier financial firms have acted as early adopters, but they usually do not have comprehensive big data strategies in place, focusing instead on specific areas such as risk management or trade analytics. The frontrunners that are already aware of the benefits of big data are certainly going to extend their usage of these strategies. However, these implementations will likely remain piecemeal for the near future.
The focus areas of future big data investment will extend toward client analytics. Current investments in big data have largely focused on revenue generation in the front office, such as mining trading opportunities and portfolio management, but the future is likely to be more about client acquisition and retention, to enhance and personalize the customer experience. Client analytics has been proven to benefit both acquisition and retention. Research shows that banks that apply analytics to customer data have a four-percentage-point lead in market share over banks that do not. The difference for banks that use analytics to understand customer attrition is even starker, at 12 percentage points [26].
The future growth of big data as a strategy in the industry relies on the continued education of internal staff about its uses and advantages. Most financial firms using big data tend to hire experts in order to grow their internal knowledge base [17]. However, this opens up key-person risk if the big data knowledge and skills are not disseminated more widely among internal staff. As with other technologies, after the initial deployment, big data needs constant refinement and evolution to adapt to dynamic market conditions. Firms also need to invest continually in training their analytics staff in new techniques, and their business personnel to enhance decision-making. Continued in-house education will be a key to the successful deployment and maintenance of big data strategies and technologies over time.
References
1. Aldridge I (2015) Trends: all finance will soon be big data finance
2. Iati R (2009) The real story of trading software espionage. WallStreet and Technology.
Available: AdvancedTrading.com
3. (2012) Times Topics: high-frequency trading. The New York Times
4. Lewis M (2014) An adaptation from 'Flash Boys: A Wall Street Revolt', by Michael Lewis. The New York Times
5. Egan M (2013) Survey: ‘Hash Crash’ didn’t seriously erode market structure confidence,
FoxBusiness
6. Kilburn F (2013) 2013 review: social media, 'Hash Crash' are 2013's trending topics
7. Gutierrez DD (2015) InsideBIGDATA guide to big data for finance
8. (2014) Big data: profitability, potential and problems in banking, Capgemini Consulting
9. Groenfeldt T (2013) Banks use big data to understand customers across channels, Forbes
10. Zagorsky V (2014) Unlocking the potential of Big Data in banking sector
11. Yu P, McLaughlin J, Levy M (2014) Big Data, a big disappointment for scoring consumer
creditworthiness. National Consumer Law Center, Boston
12. Algorithmic trading
13. (2014) Retail banks and big data: big data as the key to better risk management, A report from
the Economist Intelligence Unit
14. Arnold Veldhoen SDP (2014) Applying Big Data To Risk Management: transforming risk
management practices within the financial services industry
15. Andreas Huber HH, Nagode F (2014) BIG DATA: potentials from a risk management
perspective
16. Jackson J (2015) IBM and Deloitte bring big data to risk management, Computerworld
17. O’Shea V (2014) Big Data in capital markets: at the start of the journey, Aite Group Report
(commissioned by Thomson Reuters)
18. Celent (2013) How big is big data: big data usage and attitudes among North American financial services firms
19. (2013) BBRS 2013 banking customer centricity study
20. Jean Coumaros JB, Auliard O (2014) Big Data alchemy: how can banks maximize the value of
their customer data? Capgemini Consulting
21. (2013) Deutsche bank: big data plans held back by legacy systems, Computerworld UK
22. (2012) How ‘Big Data’ is different, MIT Sloan Management Review and SAS
23. Finextra Research, NGDATA (2013) Monetizing payments: exploiting mobile wallets and big data
24. King R (2014) BNY Mellon finds promise and integration challenges with Hadoop. Wall
Street J
25. Holley E (2014) Cloud in financial services – what is it not good for?
26. Aberdeen (2013) Analytics in banking