Deloitte Solutions Network: Introduction To Big Data
Introduction
Big Data is one of the latest phenomena to hit the tech world. Industry experts are constantly abuzz about the benefits of Big Data for marketers and businesses, but what exactly is this concept of data collection? This document explains the concept of Big Data and the sources from which we acquire the data.
Big Data is the term for a collection of data sets so large and complex that they become difficult to process using on-hand database management tools or traditional data processing applications. Big Data analytics is the process of examining this huge amount of data to uncover hidden patterns, unknown correlations and other useful information. An organization can use this information to gain competitive advantages over its rivals and to create business opportunities and benefits. What is radical is not the quantity of data flowing in; the Big Data revolution is that we can now do something with that data.
According to research from the McKinsey Global Institute (MGI) and McKinsey & Company’s Business Technology Office, the sheer volume of data generated, stored, and mined for insights has become economically relevant to businesses, government, and consumers [2]. In fact, the need for Big Data is no longer in question. The only question is how an organization can best take advantage of it.
But volume alone doesn’t define Big Data. It is also about how we identify and use the data to extract information.
In 2012, LinuxNews.com estimated that 12 terabytes of tweet data are generated daily on Twitter [4]. The average lifespan of a tweet is two hours. A Facebook post has a lifespan of three hours, and an estimated three billion likes and comments are posted daily. The NYSE captures one terabyte of trade information during each session [3], or roughly two terabytes of data in a day.
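To put these daily volumes in terms of sustained throughput, the short Python sketch below converts them into an average ingest rate in bytes per second. It assumes decimal units (1 TB = 10^12 bytes) and a perfectly even flow across 24 hours; real traffic is bursty, so peak rates are considerably higher than these averages.

```python
# Convert a daily data volume into an average per-second ingest rate.
# Assumptions: decimal units (1 TB = 10**12 bytes) and an even flow over 24 hours.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
TERABYTE = 10**12               # bytes (decimal, not 2**40)


def average_rate_bytes_per_sec(terabytes_per_day: float) -> float:
    """Average ingest rate, in bytes per second, for a given daily volume."""
    return terabytes_per_day * TERABYTE / SECONDS_PER_DAY


if __name__ == "__main__":
    # Twitter, 2012 estimate quoted in the text: 12 TB of tweet data per day.
    twitter = average_rate_bytes_per_sec(12)
    # NYSE figure quoted in the text: roughly 2 TB per day.
    nyse = average_rate_bytes_per_sec(2)
    print(f"Twitter: {twitter / 10**6:.1f} MB/s")
    print(f"NYSE:    {nyse / 10**6:.1f} MB/s")
```

Even as a flat average, 12 TB a day works out to about 139 MB of new tweet data every second, around the clock.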
To handle high-velocity, short-lifespan data, we need to minimize data movement and storage and increase the speed of analysis. Data must be analysed in real time, and each touch point costs valuable time. Processing the data efficiently requires massively parallel systems and a new BI architecture. Some of the technologies dealing with high-velocity data are SAP HANA, Oracle Exalytics, HP Vertica, Pentaho and MicroStrategy, to name a few.
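One common way to keep high-velocity, short-lifespan data in memory rather than in storage is a sliding time window: each event is processed as it arrives, held only while it is still relevant, and then discarded. The Python sketch below is a minimal, single-process illustration of that idea under stated assumptions; the `SlidingWindowCounter` class, the two-hour window (borrowed from the tweet lifespan quoted above) and the hashtag events are all hypothetical. It says nothing about how the commercial products named above work internally, and production stream processors do this in a distributed, fault-tolerant way.

```python
from __future__ import annotations

from collections import Counter, deque


class SlidingWindowCounter:
    """Hypothetical helper: counts keys seen in the last `window_seconds`.

    Events are kept in memory only while they are inside the window and are
    evicted as soon as they expire, so nothing is written to disk.
    """

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._events: deque[tuple[float, str]] = deque()  # (timestamp, key), oldest first
        self._counts: Counter[str] = Counter()

    def add(self, timestamp: float, key: str) -> None:
        """Record one event. Assumes events arrive in timestamp order."""
        self._events.append((timestamp, key))
        self._counts[key] += 1
        self._evict(now=timestamp)

    def _evict(self, now: float) -> None:
        # Drop every event whose age has reached the window length.
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, old_key = self._events.popleft()
            self._counts[old_key] -= 1
            if self._counts[old_key] == 0:
                del self._counts[old_key]

    def top(self, n: int = 3) -> list[tuple[str, int]]:
        """Most frequent keys currently inside the window."""
        return self._counts.most_common(n)


if __name__ == "__main__":
    window = SlidingWindowCounter(window_seconds=2 * 60 * 60)  # two-hour lifespan
    window.add(0, "#bigdata")
    window.add(60, "#hadoop")
    window.add(3600, "#bigdata")
    window.add(7300, "#hana")  # events at t=0 and t=60 have now expired
    print(window.top())
```

After the last event, only the `#bigdata` event from t=3600 and the new `#hana` event remain in the window. A deque keeps eviction cheap because, with in-order timestamps, expired events always sit at the front; out-of-order arrivals would need a different structure.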
Another example is a project called Mobile Millennium [6], which combines data from smartphone apps and traditional traffic control sensors to provide accurate, real-time monitoring of traffic conditions in the San Francisco Bay Area.
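As a much-simplified illustration of combining two data feeds, the sketch below merges speed readings from smartphones and from fixed road sensors into one estimate per road segment. The segment names, readings and `sensor_weight` value are hypothetical, and the plain weighted average is a stand-in chosen for clarity; it is not the estimation method Mobile Millennium actually uses.

```python
from __future__ import annotations

from collections import defaultdict
from statistics import mean


def fuse_speeds(
    phone_readings: list[tuple[str, float]],
    sensor_readings: list[tuple[str, float]],
    sensor_weight: float = 0.5,
) -> dict[str, float]:
    """Combine per-segment speed readings from two hypothetical feeds.

    Each reading is a (segment_id, speed) pair in the same unit.
    Where both feeds cover a segment, their means are blended using
    `sensor_weight` (an assumed tuning value between 0 and 1); otherwise
    the single available feed is used as-is.
    """
    phones: defaultdict[str, list[float]] = defaultdict(list)
    sensors: defaultdict[str, list[float]] = defaultdict(list)
    for segment, speed in phone_readings:
        phones[segment].append(speed)
    for segment, speed in sensor_readings:
        sensors[segment].append(speed)

    fused: dict[str, float] = {}
    for segment in phones.keys() | sensors.keys():
        if segment in phones and segment in sensors:
            fused[segment] = (
                sensor_weight * mean(sensors[segment])
                + (1 - sensor_weight) * mean(phones[segment])
            )
        elif segment in sensors:
            fused[segment] = mean(sensors[segment])
        else:
            fused[segment] = mean(phones[segment])
    return fused


if __name__ == "__main__":
    phones = [("seg-A", 42.0), ("seg-A", 38.0), ("seg-B", 60.0)]
    sensors = [("seg-A", 30.0), ("seg-C", 55.0)]
    print(fuse_speeds(phones, sensors))
```

Here `seg-A` is covered by both feeds, so its phone mean (40) and sensor reading (30) blend to 35, while `seg-B` and `seg-C` each rely on their only source. The benefit of fusion is coverage: neither feed alone reports all three segments.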
The variety of data spans “people to people”, “people to machine” and “machine to machine” interactions.
It is imperative to organize data and identify what should be used, so that it can help us spot business trends, prevent diseases and drive innovation, among other things.
Sources of Big Data
Until the internet connected the world as never before, most traditional data was structured, neatly organized in databases. Since then there has been a proliferation of unstructured data, generated by all our online interactions, from online shopping to email and from YouTube activity to Facebook posts. The number of gadgets recording and transmitting data, from smartphones to car sensors and from traffic signals to CCTV cameras, has multiplied globally, leading to an explosion in the volume of data.
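The practical difference between structured and unstructured data shows up as soon as you try to query it. In the hypothetical Python sketch below, an order stored as a structured record can be read field by field, while the same fact buried in a free-text email has to be extracted with a pattern that is brittle and misses other phrasings. The record layout, the email text and the regular expression are illustrative assumptions only.

```python
from __future__ import annotations

import re

# Structured: the facts live in named fields with a fixed schema.
order_row = {"order_id": 1042, "customer": "A. Smith", "amount_usd": 59.99}

# Unstructured: the same facts buried in free text (hypothetical email).
email_body = (
    "Hi, this is A. Smith. I placed order #1042 last week and was charged "
    "$59.99, but the parcel never arrived."
)


def amount_from_row(row: dict) -> float:
    """Structured lookup: read the value straight from its column."""
    return row["amount_usd"]


def amount_from_text(text: str) -> float | None:
    """Unstructured extraction: guess the amount with a regular expression.

    Brittle by design: it only recognises the "$12.34" form and would miss
    "59.99 dollars" or "USD 59.99".
    """
    match = re.search(r"\$(\d+(?:\.\d{2})?)", text)
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    print(amount_from_row(order_row))    # structured
    print(amount_from_text(email_body))  # unstructured
```

Both calls recover the same amount here, but only the structured lookup is guaranteed to keep working when the wording changes.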
About 75% of this data is unstructured, coming from sources such as text, audio or video. And as mobile penetration continues to grow, the volume of data can only increase.
Below, we break down the main sources of Big Data according to IBM’s research:
Social media generates huge volumes of data every day. Facebook ingests approximately 500 times more data each day than the NYSE [7]. At its peak on August 3, 2013, Twitter was processing 143,199 tweets per second across the globe [8]. That’s a lot of data!
This volume of data moves at exceptional speed. It is a challenge for organizations to process
this data in a timely manner.
This data comes in all sorts of formats: unstructured text, emails, audio and video, as well as structured numeric data in the form of tables, stock ticker data, etc.
Data from social media can be highly unpredictable. Load surges driven by whatever is trending, mixed with unstructured content, make this data especially interesting to explore.
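Because load driven by trending topics is hard to predict, one simple approach is to compare each interval's volume against a recent baseline and flag sudden jumps. The sketch below uses a rolling mean of the previous few intervals and a fixed multiplier; both the window size and the threshold are arbitrary assumptions, and the per-minute counts are made up for illustration.

```python
from __future__ import annotations

from collections import deque
from statistics import mean


def flag_spikes(counts: list[int], window: int = 5, factor: float = 3.0) -> list[int]:
    """Return the indices of intervals whose count exceeds `factor` times the
    mean of the preceding `window` intervals.

    The first `window` intervals only build the baseline and are never flagged.
    `window` and `factor` are arbitrary assumptions, not tuned values.
    """
    history: deque[int] = deque(maxlen=window)  # rolling baseline
    spikes: list[int] = []
    for i, count in enumerate(counts):
        if len(history) == window and count > factor * mean(history):
            spikes.append(i)
        history.append(count)
    return spikes


if __name__ == "__main__":
    # Made-up per-minute message counts with one sudden trending burst.
    per_minute = [100, 110, 95, 105, 90, 400, 120, 115]
    print(flag_spikes(per_minute))
```

Only the burst at index 5 is flagged, since it exceeds three times the preceding baseline of 100. Note that the burst then enters the rolling window and inflates the baseline for the next few intervals; a median, or a baseline that excludes flagged intervals, would be more robust.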
As we can see, the amount of data and the number of sources from which we procure it are virtually unlimited. The test is not just how well we capture Big Data, but also how well we organize, visualize and operationalize it, deriving value from Big Data. Choosing the right technology to process information from this variety of data sources is as important as capturing the data itself.
References
[1] http://harvardmagazine.com/2014/03/why-big-data-is-a-big-deal
[2] http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation
[3] http://www.ibmbigdatahub.com/infographic/four-vs-big-data
[4] http://www.linuxnews.co/2012/06/psychsoftpc-offers-hadoop-cluster-solution/
[5] http://www.informationweek.com/healthcare/clinical-information-systems/big-data-use-in-healthcare-needs-governance-education/d/d-id/1109191?
[6] http://traffic.berkeley.edu/
[7] http://www.businessinsider.in/Social-Medias-Big-Data-Future--From-Deep-Learning-To-Predictive-Marketing/articleshow/30020382.cms
[8] https://blog.twitter.com/2013/new-tweets-per-second-record-and-how
Version 1.0
As used in this document, “Deloitte” means Deloitte LLP and its subsidiaries. Please see www.deloitte.com/us/about for a detailed
description of the legal structure of Deloitte LLP and its subsidiaries.
This publication contains general information only, and none of Deloitte Touche Tohmatsu Limited, its member firms, or their related entities (collectively, the “Deloitte Network”) is, by means of this publication, rendering professional advice or services. Before making any decision or taking any action that may affect your finances or your business, you should consult a qualified professional adviser. No entity in the Deloitte Network shall be responsible for any loss whatsoever sustained by any person who relies on this publication.
About Deloitte
Deloitte refers to one or more of Deloitte Touche Tohmatsu Limited, a UK private company limited by guarantee, and its network of member firms, each of which is a legally separate and independent entity. Please see www.deloitte.com/about for a detailed description of the legal structure of Deloitte Touche Tohmatsu Limited and its member firms. Please see www.deloitte.com/us/about for a detailed description of the legal structure of Deloitte LLP and its subsidiaries.