IBM Introduction To Big Data
This material is meant for IBM Academic Initiative use only. NOT FOR RESALE
Introduction to Big Data
Unit objectives
• Understand when and why you would use big data
• Explain the perception gap
• Explain the difference between data-at-rest and data-in-motion
• Describe the 3 Vs
The scale
• 2.5 petabytes
Memory capacity of the human brain
• 13 petabytes
Amount that could be downloaded from the internet in two minutes if every
American (300M) were on a computer at the same time
• 4.75 exabytes
Total genome sequences of all people on the Earth
• 422 exabytes
Total digital data created in 2008
• 1 zettabyte
World’s current digital storage capacity
• 1.8 zettabytes
Total digital data expected to be created in 2011
Most people find it hard to grasp how large a petabyte or an exabyte is. For a long
time, a billion was considered a large number. But given how quickly most governments
spend a billion dollars or euros, it evidently is not that large. Extremely large
numbers are easier to understand when compared with something familiar. The capacity
of the human brain is about 2.5 petabytes. (This is also the estimated size of the
Walmart databases that handle 1 million customer transactions a day.) The total genome
sequences of all people on the Earth amount to 4.75 exabytes. The total amount of
digital data created in 2008 was 422 exabytes, and the total expected to be created in
2011 was 1.8 zettabytes.
In 2000 the Sloan Digital Sky Survey (SDSS) began collecting astronomical data. In its
first few weeks it amassed more data than had been collected in the entire history of
astronomy. The total amount of data collected by the SDSS is what its successor, the
Large Synoptic Survey Telescope, is expected to collect every 5 days when it comes
online in 2016.
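The comparisons above are easier to check with a little arithmetic. The following is a minimal Python sketch, assuming decimal (SI) units in which each step from TB to PB to EB to ZB is a factor of 1,000; this matches the "ZB = 1 billion TB" note in the sources, but the slides do not say whether decimal or binary units are intended. The names `UNITS`, `to_bytes` and `convert` are illustrative only.

```python
# Decimal (SI) byte units as powers of ten (an assumption; binary units
# based on powers of 1024 would give slightly different results).
UNITS = {"TB": 10**12, "PB": 10**15, "EB": 10**18, "ZB": 10**21}


def to_bytes(value, unit):
    """Convert a quantity such as (2.5, "PB") to a number of bytes."""
    return value * UNITS[unit]


def convert(value, from_unit, to_unit):
    """Express a quantity given in from_unit in terms of to_unit."""
    return to_bytes(value, from_unit) / UNITS[to_unit]


# One zettabyte expressed in terabytes.
zb_in_tb = convert(1, "ZB", "TB")

# Data created in 2008 (422 EB) measured in "human brains" of 2.5 PB each.
brains_2008 = convert(422, "EB", "PB") / 2.5

# Share of the 13 PB download per person, in megabytes, for 300M people.
per_person_mb = to_bytes(13, "PB") / 300_000_000 / 10**6

# Growth factor from the 2008 total (422 EB) to the 2011 estimate (1.8 ZB).
growth = to_bytes(1.8, "ZB") / to_bytes(422, "EB")

print(f"1 ZB = {zb_in_tb:,.0f} TB")
print(f"2008 data = {brains_2008:,.0f} human-brain capacities")
print(f"13 PB over 300M people = {per_person_mb:.1f} MB each")
print(f"2008 to 2011 growth factor = {growth:.2f}")
```

Under these assumptions, the data created in 2008 equals roughly 168,800 human-brain capacities, the 13 PB download works out to about 43 MB per person, and the 2011 estimate is a little more than four times the 2008 total.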
• 2 billion Internet users by 2011
• 1.3 billion RFID tags in 2005; 30 billion RFID tags today
• 4.6 billion mobile phones worldwide
Capital market
• 1 in 3 business leaders frequently make decisions based on information they
don’t trust, or don’t have
• 300,000 tweets per minute
• > 1 PB per day (gas …)
• 80% of the world’s data is unstructured
• 83% of CIOs cited "Business intelligence and analytics" as part of their
visionary plans to enhance competitiveness
• 60% of CEOs need to do a better job capturing and understanding information
rapidly in order to make swift business decisions
• 2012: 2.8 zettabytes
Sources:
• The Guardian, May 2010
• IDC Digital Universe, 2010
• IBM Institute for Business Value, 2009
• IBM CIO Study 2010
• TDWI: Next Generation Data Warehouse Platforms Q4 2009
• https://blog.kissmetrics.com/facebook-statistics/
• http://www.webopedia.com/quick_ref/just-how-much-data-is-out-there.html
• http://www.computerworlduk.com/news/infrastructure/3433595/boeing-787s-to-create-half-a-terabyte-of-data-per-flight-says-virgin-atlantic/
• http://www.forbes.com/sites/maribellopez/2013/05/10/ge-speaks-on-the-business-value-of-the-internet-of-things/
• http://www.idc.com/prodserv/4Pillars/bigdata;jsessionid=94A407E4522FB407627ECEBBAAA90A24
• http://www.digitalbuzzblog.com/infographic-24-hours-on-the-internet/
• 1 ZB = 1 billion TB
• IDC reference:
o http://idcdocserv.com/925
o http://www.computer.org/portal/web/news/home/-/blogs/2613266;jsessionid=abbfded1402383e107abfa2641d6
Chart (survey responses): Disagree 7%, Neutral 23%, Agree 70%
Sources:
• "Capitalizing on complexity: Insights from the Global Chief Executive Officer
Study," IBM Institute for Business Value, 2010
• "What Customers Want," first in a two-part series, IBM Institute for Business
Value, published March 2011
• Business users determine what question to ask; IT structures the data to answer
that question.
Examples: monthly sales reports, profitability analysis, customer surveys
• IT delivers a platform to enable creative discovery; the business explores what
questions could be asked.
Examples: brand sentiment, product strategy, maximum asset utilization
• Multi-channel customer sentiment and experience analysis
• Detect life-threatening conditions at hospitals in time to intervene
• Imagine if you could make risk decisions, such as whether or not someone
qualifies for a mortgage, in minutes, by analyzing many sources of data, including
real-time transactional data, while the client is still on the phone or in the office.
• Imagine if law enforcement agencies could analyze audio and video feeds in real
time, without human intervention, to identify suspicious activity.
As these new sources of data continue to grow in volume, variety and velocity, so too
does the potential of this data to revolutionize the decision-making processes in every
industry.
Gartner Sept. 2014 report: 13% of surveyed organizations have deployed big data solutions, while 73%
have invested in big data or plan to do so.
In the Educate stage, the primary focus is on awareness and knowledge development.
Almost 25 percent of respondents indicated that they are not yet using big data within
their organizations. While some remain relatively unaware of the topic of big data, our
interviews suggest that most organizations in this stage are studying the potential
benefits of big data technologies and analytics, and trying to better understand how big
data can help address important business opportunities in their own industries or
markets.
The focus of the Explore stage is to develop an organization's roadmap for big data
development. Almost half of respondents reported formal, ongoing discussions within
their organizations about how to use big data to solve important business challenges.
Key objectives of these organizations include developing a quantifiable business case
and creating a big data blueprint.
In the Engage stage, organizations begin to prove the business value of big data, as
well as perform an assessment of their technologies and skills. More than one in five
respondent organizations is currently developing proofs-of-concept (POCs) to validate
the requirements associated with implementing big data initiatives, as well as to
articulate the expected returns.
In the Execute stage, big data and analytics capabilities are more widely
operationalized and implemented within the organization. However, only 6 percent of
respondents reported that their organizations have implemented two or more big data
solutions at scale, the threshold for advancing to this stage.
Unit summary
• Understand when and why you would use big data
• Explain the perception gap
• Explain the difference between data-at-rest and data-in-motion
• Describe the 3 Vs