Module 1 - Introduction
©lnthanh
What is Big Data?
It’s not big. It’s just bigger…
Definitions of Big data
o A variety of Big data definitions are available worldwide.
Big data is a term used to refer to the study and applications of data sets
that are so big and complex that traditional data-processing application
software are inadequate to deal with them. – Wikipedia.
Big data refers to the dynamic, large and disparate volumes of data being
created by people, tools and machines; it requires new, innovative and
scalable technology to collect, host and analytically process the vast
amount of data gathered in order to derive real-time business insights
that relate to consumers, risk, profit, performance, productivity
management and enhanced shareholder value. – Ernst & Young, 2014.
Small data vs. Big data
o “Big data” is similar to “small data” but bigger.
o Handling bigger data requires different approaches
(i.e., techniques, tools and architecture, etc.).
o Big data lets us solve new problems, or solve existing
problems in a better way.
Figure: bigger data. A bigger computer, or more small computers?
Technologies in Big data
o Not a single technology but a combination of old
and new technologies that helps companies gain
actionable insight
Characteristics of Big data
o Big data is characterized by the 4V’s.
The 4V’s: Velocity
o Description: The speed at which data is generated, in
a process that never stops, and the speed at which
data is transformed into insight
o Attributes: Batch; near/real-time; streams
o Drivers: Improved connectivity; competitive
advantage; precomputed information
Real-time and/or fast data
o Mobile devices (tracking all objects all the time)
o Scientific instruments (collecting all sorts of data)
o Real-time decisions that influence customer behavior:
• Product recommendations that are relevant and compelling
• Learning why customers switch to competitors and their
offers, in time to counter
• Friend invitations to join a game or activity that expands
business
• Improving the marketing effectiveness of a promotion while
it is still in play
• Preventing fraud as it is occurring, and preventing more
proactively
The 4V’s: Volume
o Description: The amount of data generated is vast
compared to traditional data sources
o Attributes: Exabytes, zettabytes, yottabytes, etc.
o Drivers: Increase in data sources, higher resolution
sensors, scalable infrastructure
The growth of data
o The data volume is increasing exponentially.
• 44x increase from 2009 to 2020
• From 0.8 zettabytes (ZB) to 35 ZB
Exponential increase in
collected/generated data
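As a quick check of the figures above, the following sketch computes the overall growth factor and the constant yearly growth rate it would imply. The variable names are illustrative, and treating 2009 to 2020 as 11 compounding years at a uniform rate is an assumption made only for this calculation.

```python
# Figures quoted on the slide.
START_ZB = 0.8        # data volume in 2009, in zettabytes
END_ZB = 35.0         # projected data volume in 2020, in zettabytes
YEARS = 2020 - 2009   # assumption: 11 compounding periods

# Overall growth between the two end points.
growth_factor = END_ZB / START_ZB

# Assumption: growth is spread evenly, i.e. one constant yearly multiplier.
annual_multiplier = growth_factor ** (1 / YEARS)

print(f"Overall growth: {growth_factor:.2f}x")
print(f"Implied annual growth: {(annual_multiplier - 1) * 100:.0f}% per year")
```

Under these assumptions the ratio comes out just under 44x, which matches the rounded figure on the slide, and corresponds to data volume growing by roughly two fifths every year.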
The growth of data
Source: http://www.vcloudnews.com/every-day-big-data-statistics-2-5-quintillion-bytes-of-data-created-daily/, updated 05/04/2015.
https://www.raconteur.net/infographics/a-day-in-data/ (2019)
Examples of Data Volume
The growth of data: Other statistics
o The New York Stock Exchange generated ~4−5 TBs of
data per day.
o Facebook hosts more than 240 billion photos,
growing by 7 PBs of data per month.
o The genealogy site Ancestry.com stores ~10 PBs of
data.
o The Internet Archive stores around 18.5 PBs of data.
Availability of data
A single view to the customer
Figure: the customer at the center, linked to data from social media, banking and finance, gaming, entertainment, purchases, and our known history.
The 4V’s: Veracity
o Description: Quality and origin of data
o Attributes: Consistency; completeness; integrity;
ambiguity
o Drivers: Cost; need of traceability and justification
The emerging V - Value
o The ability and need to turn data into value
o Value is not only profit but also medical or social
benefits, or personal satisfaction (customer,
employee, etc.).
Outline
o What is Big Data?
• Definitions of Big Data
• The V’s characteristics of Big Data
• Common Issues in Big Data
o Big Data Case Studies
• Applications of Big Data
• Big Data Projects in practice
o Motivations and Opportunities
Common issues related to the 4V’s
o As the data volume increases, the value of different
data records will decrease in proportion to age, type,
richness, and quantity among other factors.
o It is hard to handle complex data with existing
traditional analytic systems.
• Relational databases and statistics/visualization packages
have difficulty handling big data.
• The work may require massively parallel software running
on tens, hundreds, or even thousands of servers.
• Analytics must cope with data that is constantly in motion.
o There is a considerable gap between business
leaders and IT professionals.
• Business leaders are concerned with adding value to their
business and increasing profit, while IT leaders care only
about the technicalities of storage and processing.
Issues of storage and transport
o Current technologies limit the disk size to about 4
TBs (1 TB = 10^12 bytes), so 1 exabyte (10^18 bytes)
would require 250,000 disks.
• A single computer system would be unable to directly
attach the requisite number of disks
o Access to that data overwhelms current
communication networks
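The disk count above follows from a single division, sketched below. The function name `disks_needed` is illustrative, and the calculation assumes decimal units and ignores replication, parity, and file-system overhead, all of which would raise the real number.

```python
DISK_BYTES = 4 * 10**12   # one disk: about 4 TB (decimal terabytes)
EXABYTE = 10**18          # 1 exabyte in bytes (decimal units)


def disks_needed(total_bytes: int, disk_bytes: int = DISK_BYTES) -> int:
    """Smallest number of disks whose combined raw capacity covers total_bytes."""
    # Ceiling division on integers, so a partly filled disk still counts.
    return -(-total_bytes // disk_bytes)


print(disks_needed(EXABYTE))
```

Using integers rather than floats keeps the result exact at this scale.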
Issues of data management
o Data management is possibly the most difficult problem.
o Issues of access, utilization, updating, governance,
and reference (in publication) are major stumbling
blocks.
• Data sources vary in size, format, and method of
collection.
• Metadata is needed on what was collected, and when, where,
by whom, why, and how.
• Given the volume, it is impractical to validate every data
item.
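One common response when every item cannot be validated is to validate a random sample instead. The sketch below is an illustration of that idea, not a method taken from these slides: it uses reservoir sampling to draw a uniform sample from a stream of unknown length and estimates the share of invalid records. The function names, the validity rule, and the synthetic records are all hypothetical.

```python
import random
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Keep a uniform random sample of up to k items from a stream."""
    sample: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Replace a stored item with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample


def estimated_invalid_rate(
    items: Iterable[T], k: int, is_valid: Callable[[T], bool], rng: random.Random
) -> float:
    """Estimate the fraction of invalid items by checking only a sample."""
    sample = reservoir_sample(items, k, rng)
    if not sample:
        return 0.0
    invalid = sum(1 for item in sample if not is_valid(item))
    return invalid / len(sample)


# Hypothetical stream: every tenth reading is negative, which we treat as invalid.
records = (-1.0 if i % 10 == 0 else float(i) for i in range(100_000))
rate = estimated_invalid_rate(records, 1_000, lambda x: x >= 0, random.Random(0))
print(f"Estimated invalid rate: {rate:.1%}")
```

Only 1,000 of the 100,000 records are checked, and memory use stays fixed at the sample size no matter how long the stream runs. The estimate carries sampling error, so it suits monitoring overall data quality rather than catching individual bad records.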
Issues of processing power
o Extensive parallel processing and new analytics
algorithms are required.
Assume that an exabyte of data needs to be processed and that it is chunked into blocks of 8 words
(1 exabyte = 1K petabytes).
Assuming a processor expends 100 instructions on one block at 5 gigahertz, each block takes 20 nanoseconds,
so processing 1K petabytes would require a processing time of roughly 635 years.
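The estimate can be reproduced with the short sketch below. The slide does not state the word size, so the block count is an assumption: a figure of about 635 years results when roughly 10^18 blocks are processed serially on a single processor that retires one instruction per clock cycle. The function and constant names are illustrative.

```python
INSTRUCTIONS_PER_BLOCK = 100
CLOCK_HZ = 5e9                      # 5 GHz
SECONDS_PER_YEAR = 365 * 24 * 3600  # ignoring leap years


def processing_years(num_blocks: float) -> float:
    """Serial processing time in years for num_blocks blocks."""
    # Assumption: one instruction per clock cycle, so 100 instructions take 20 ns.
    seconds_per_block = INSTRUCTIONS_PER_BLOCK / CLOCK_HZ
    return num_blocks * seconds_per_block / SECONDS_PER_YEAR


# Assumption: about 10^18 blocks in one exabyte of data.
print(f"{processing_years(1e18):.0f} years")
```

The result scales linearly with the block count and inversely with the number of processors working in parallel, which is why the slide calls for extensive parallel processing.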
Big Data Case Studies
The more data, the better decisions, and then the better outcomes…
Big Data use case categories
Big data analytics
o Big data is more real-time in nature than traditional
data warehouse applications.
• Traditional architectures (e.g. Exadata, Teradata) are not
well-suited for big data apps.
Examples of Big Data Analytics
Practical cases of Big data analytics
Challenges in handling Big data
Big data in Healthcare
o 80% of medical data is unstructured and clinically
relevant.
o Data resides in multiple places
• Individual EMRs, labs and imaging systems, physician
notes, medical correspondence, etc.
o Leveraging big data may help us to
• Build sustainable healthcare systems.
• Collaborate to improve care and outcomes.
• Increase access to healthcare.
Vestas: optimizes turbine placement
Source: https://www.slideshare.net/SwissHUG/ibm-big-data-platform-nov-2012, updated 11/2012
KTH: Reducing traffic congestion
IBM MobileFirst Connected Car
Source: http://m2m.demos.ibm.com/
Sentiment analysis on Twitter data
o Real-time sentiment analysis on Twitter data to
predict debate winners and changes in candidate
popularity.
o Tweets related to the topic are collected through
the Twitter firehose and processed by a Twitter-specific
NLP tool.
Source: http://www.socialmediatoday.com/technology-data/using-social-media-data-predict-result-2016-us-presidential-election
Motivations and opportunities
A new horizon that changes our lives…
New insight into data
o Why deal with more data? To gain new insight.
o This new insight is not only for top-level executives.
• It will be used to get people throughout the enterprise to
run the business better and to provide better service to
customers.
Purpose of Big data analytics
Applications of Big data analytics
(Copyright 2018 – Dresner Advisory Services)
Adoption of Big Data by industry
Big Data market forecast
Big Data job opportunities
Statistics for Big Data Analytics skills in IT jobs advertised across the UK (June, 2016)