IM08
Learning Objectives
• Big data definition
• The key drivers for big data solutions
• Building blocks for big data solutions
• Critical success factors for big data
• Tools for big data
Big Data
• Our world is becoming more interconnected, with vast
amounts of data being generated from various sources.
• Unstructured data, like social media content and sensor
readings, makes up a significant portion, around 80%, of
global data.
• Figures such as 30 billion pieces of content shared on Facebook
each month and 2.9 million emails sent every second highlight the
scale of data creation.
• Enterprises are shifting focus from structured data to
analyzing diverse data types for valuable insights, leading to
the rise of big data solutions.
• Big data challenges existed in several industries long before
big data technologies emerged.
• Industries like oil and gas exploration and stock exchanges
have dealt with large volumes of data requiring rapid
processing.
• Oil and gas companies process sensor data, seismic images,
and well log data for real-time insights during rig operations.
• Stock exchanges compute stock indexes in real time, which
requires processing large data volumes.
Big Data Definition
• The boundary between a traditional data warehouse and a big
data environment isn't always clear-cut, because definitions
differ across organizations and industries.
• A practical definition of big data includes processing, storing,
and analyzing large and diverse data sets, including
structured, semi-structured, and unstructured data.
• Big data solutions are necessary when traditional information
management technologies struggle with the scale and
complexity of these data sets.
• The definition of big data emphasizes its ability to handle
heterogeneous data types that traditional systems cannot
manage effectively.
• Enterprises adopt big data technologies when they encounter
data sets beyond the capabilities of traditional information
management tools, signaling a shift in their data processing
and analytics strategies.
• Gartner's definition of big data is based on the three Vs:
volume, velocity, and variety.
– Volume refers to the enormous amounts of data generated from
various sources such as the Internet and sensors in mechanical
devices.
– Velocity relates to the high speed at which streaming data from
sensors, RFID tags, and smart meters is generated, requiring near
real-time processing.
– Variety encompasses the diverse forms of data, including structured
data from enterprise applications, semi-structured data from weblogs,
and unstructured data from text documents, emails, and social media.
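As a rough illustration of variety, the sketch below reads one record of each kind using only Python's standard library. The sample records, field names, and function names are invented for illustration and do not come from any particular system.

```python
import csv
import io
import json

# Hypothetical sample records, one per data type named above.
structured = "order_id,customer,amount\n1001,Acme,250.00\n"  # fixed schema (CSV)
semi_structured = '{"ts": "2024-01-01T10:00:00", "path": "/cart", "status": 200}'  # weblog entry as JSON
unstructured = "Loved the fast delivery, but the packaging was damaged."  # free text


def parse_structured(text):
    """Structured data: every row follows the same known columns."""
    return list(csv.DictReader(io.StringIO(text)))


def parse_semi_structured(line):
    """Semi-structured data: self-describing keys, no enforced schema."""
    return json.loads(line)


def parse_unstructured(text):
    """Unstructured data: no schema at all; here we only split into lowercase words."""
    return [word.strip(".,!?").lower() for word in text.split()]


rows = parse_structured(structured)
event = parse_semi_structured(semi_structured)
tokens = parse_unstructured(unstructured)
```

The point of the sketch is that each kind of data needs a different amount of work before it can be analyzed: the CSV row maps directly to columns, the JSON entry carries its own keys, and the free text must be tokenized before any analysis can begin.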
Common data types of big data
Determining if an enterprise is ready
for big data
• Involves assessing specific characteristics within its
information landscape:
– Large Data Volumes: The enterprise deals with substantial data
volumes from various sources like enterprise applications, social
media, machine data, weblogs, and weather data.
– Diverse Data Types: The data includes a mix of structured,
unstructured, and semi-structured types, reflecting a broad range of
information formats.
– Longer Data Retention: Data is stored for extended periods due to
regulatory and compliance requirements, leading to a search for cost-
effective storage solutions like Hadoop.
– Wide Application Usage: Data serves multiple applications such as
customer retention, loyalty analysis, weather impact on sales, etc.,
necessitating integration of structured and unstructured data for
comprehensive insights.
– Time Sensitivity and Decision Making: There are pressures to reduce
time to market and enable faster decision-making. This requires
technologies that can handle diverse data types efficiently, produce
actionable insights, and support decision-making processes across
product design and customer relationship management.
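The characteristics above can be turned into a simple checklist. The sketch below is one possible way to do that; the class name, the field names, and the equal weighting of the five characteristics are assumptions made for illustration, not part of any standard assessment method.

```python
from dataclasses import dataclass, fields


@dataclass
class ReadinessProfile:
    """One flag per characteristic from the list above (hypothetical structure)."""
    large_data_volumes: bool
    diverse_data_types: bool
    longer_data_retention: bool
    wide_application_usage: bool
    time_sensitive_decisions: bool


def readiness_score(profile):
    """Return the fraction of characteristics that apply (0.0 to 1.0).

    Equal weighting is an assumption; an enterprise may weight these differently.
    """
    flags = [getattr(profile, f.name) for f in fields(profile)]
    return sum(flags) / len(flags)


# Example enterprise: three of the five characteristics apply.
example = ReadinessProfile(True, True, False, True, False)
score = readiness_score(example)
```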
Real-world examples showcase how
enterprises have embraced the realm
of big data:
• Wind Power Companies: By analyzing petabytes of weather
data, wind power companies can swiftly determine optimal
location sites, reducing analysis time from weeks to minutes,
thanks to big data technologies.
• Logistics Companies like UPS: Utilize big data analytics from
truck sensors to improve route planning, leading to cost
savings through reduced driver time and fuel consumption.
• Social Media Platforms: Twitter and Facebook process
terabytes of data daily, while over 200 million smart meters
contribute to the data explosion, highlighting the widespread
adoption of big data technologies.
• Utility Companies: Use big data for forecasting energy
production, leveraging insights from weather data analysis to
meet potential demand efficiently.
Key Drivers for Big Data Solutions
Data Monetization Opportunities
New Product Innovations
Deeper Customer Insights
Operational Process Efficiencies
Fraud Detection and Reduction of
Risk
• Big data solutions play a crucial role in proactively
managing fraud detection and reducing compliance risks in
businesses.
• By combining structured and unstructured data sets and
integrating historical data with fraud modeling, these
solutions can detect patterns and enhance fraud detection
capabilities.
• Additionally, integrating identity data with surveillance
further strengthens fraud prevention measures, helping
businesses mitigate risks associated with fraudulent
activities and regulatory reporting.
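A minimal sketch of combining structured and unstructured signals is shown below. It flags a transaction when its amount is a statistical outlier (structured data) or when its free-text note contains a suspicious term (unstructured data). The sample transactions, the keyword list, and the threshold are all assumptions; a production fraud model would be trained on historical cases instead.

```python
from statistics import mean, pstdev

# Hypothetical historical transactions (structured) with free-text notes (unstructured).
transactions = [
    {"id": 1, "amount": 120.0, "note": "regular monthly payment"},
    {"id": 2, "amount": 95.0, "note": "grocery purchase"},
    {"id": 3, "amount": 110.0, "note": "customer disputes charge, card reported stolen"},
    {"id": 4, "amount": 105.0, "note": "utility bill"},
    {"id": 5, "amount": 2400.0, "note": "urgent wire transfer to new payee"},
]

# Assumed keyword list, for illustration only.
SUSPICIOUS_TERMS = {"stolen", "disputes", "urgent"}


def flag_suspicious(records, z_threshold=1.5):
    """Flag records whose amount is an outlier or whose note has a suspicious term."""
    amounts = [r["amount"] for r in records]
    mu, sigma = mean(amounts), pstdev(amounts)
    flagged = []
    for r in records:
        # z-score of the amount against the historical distribution
        z = (r["amount"] - mu) / sigma if sigma else 0.0
        words = {w.strip(".,").lower() for w in r["note"].split()}
        if abs(z) > z_threshold or words & SUSPICIOUS_TERMS:
            flagged.append(r["id"])
    return flagged


flagged_ids = flag_suspicious(transactions)
```

Transaction 3 is caught only by its note, which shows why the unstructured text adds signal that the amount alone would miss.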
Cost Optimization
The seven phases of a big data
strategy
Building Blocks and Enablers for
Big Data Solutions
The three key factors in the
enablement of big data solutions
• Big Data Vision and Strategy encompass the initial
assessment of a company's capabilities and future roadmap
for big data adoption. The vision aligns with the enterprise's
mission, focusing on how big data can support its goals. The
strategy outlines specific use cases and initiatives to realize
this vision, with a roadmap and business case detailing the
sequence of actions and potential benefits. Determining ROI
for big data initiatives, especially those involving data
monetization, can be complex, often relying on industry or
process-specific benchmarks for estimation.
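One simple way to express a benchmark-based estimate is as a range rather than a single number. The sketch below assumes ROI is computed as (benefit - cost) / cost and that the benefit is a revenue uplift taken from a low and a high benchmark; the function name and all figures are hypothetical.

```python
def roi_range(baseline_revenue, uplift_low, uplift_high, total_cost):
    """Return (low, high) ROI as fractions, using ROI = (benefit - cost) / cost.

    uplift_low and uplift_high are assumed benchmark uplifts, e.g. 0.02 for 2%.
    """
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")

    def roi(uplift):
        benefit = baseline_revenue * uplift
        return (benefit - total_cost) / total_cost

    return roi(uplift_low), roi(uplift_high)


# Hypothetical figures: 50M baseline revenue, 2% to 4% benchmark uplift, 1M initiative cost.
low, high = roi_range(
    baseline_revenue=50_000_000,
    uplift_low=0.02,
    uplift_high=0.04,
    total_cost=1_000_000,
)
```

With these made-up inputs the initiative breaks even at the low benchmark and doubles its cost at the high one, which is the kind of spread a business case would need to state explicitly.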
• Big Data Pilot and the Next Steps
• Big Data Pilot projects are crucial for enterprises to test and
understand the benefits of big data solutions. These pilots
help in refining use cases that can drive revenue, cost
optimization, enhanced customer service, and product
innovations.
• For instance, integrating social media data into customer
management can provide insights into customer sentiments,
improving marketing campaigns and customer service
levels.
• The steps involved in a Proof of Concept (PoC) for such
projects include identifying potential use cases, importing
relevant data, analyzing data using visualization tools, and
deriving actionable insights.
• This approach ensures that the technology is validated
before full implementation, leading to a deeper
understanding of business benefits like improved campaign
effectiveness and customer retention. This understanding
helps in defining the next steps in the roadmap for big data
adoption.
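The PoC steps above can be sketched end to end on a toy data set. Everything here is an assumption made for illustration: the use case, the sample posts, and the tiny word lists that stand in for a real sentiment model. A text summary stands in for a visualization tool.

```python
from collections import Counter

# Step 1: hypothetical use case, gauging customer sentiment about a campaign.
# Step 2: import relevant data (inlined here; a real pilot would load it from a source system).
posts = [
    "Great service and fast delivery",
    "Terrible support, very slow response",
    "Love the new app, great design",
    "Slow checkout but great prices",
]

# Assumed toy word lists; a real pilot would use a proper sentiment model.
POSITIVE = {"great", "love", "fast"}
NEGATIVE = {"terrible", "slow"}


def score_post(text):
    """Return positive-word count minus negative-word count for one post."""
    words = [w.strip(".,!").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def label(score):
    """Map a numeric score to a sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Step 3: analyze the data (a count per label instead of a chart).
summary = Counter(label(score_post(p)) for p in posts)

# Step 4: derive an insight, here the share of positive posts.
positive_share = summary["positive"] / len(posts)
```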
• Big Data Solution Architecture and Tools
• Two key steps in any big data solution are defining the
solution architecture and selecting the supporting tools and
technologies. This involves identifying the necessary solution
components and the tools needed to build them.
• Hadoop-Based Repository: A central repository is crucial for
storing various types of data (structured, semi-structured,
unstructured) such as transaction data, enterprise data, machine
data, and social media data. The repository is typically based on
Hadoop Distributed File System (HDFS), with multiple
distributions available like Cloudera, Hortonworks, and
BigInsights.
• Hadoop Components: Hadoop comprises four modules: storage
based on HDFS; resource management and job scheduling (YARN);
a distributed processing model, MapReduce (now with alternatives
such as Apache Spark for interactive processing and Apache Storm
for real-time processing); and the common utilities and software
libraries that support the other modules.
• Wider Applications: While initially adopted by e-commerce
and internet companies like Yahoo and Google, Hadoop and big
data solutions are now used across various industries, with
common use cases including data storage, processing, analytics,
and real-time data processing.
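To make the MapReduce processing model named above more concrete, the sketch below runs the classic word-count example in plain Python on a single machine. It only imitates the three phases (map, shuffle, reduce); it does not use Hadoop, and the function names and input strings are chosen for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values for each key, here by summing."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical input splits; on a cluster each would be processed on a different node.
splits = ["big data big insights", "data lakes store big data"]
pairs = [pair for split in splits for pair in map_phase(split)]
word_counts = reduce_phase(shuffle_phase(pairs))
```

Because each split is mapped independently and each key is reduced independently, both phases can be spread across many machines, which is what lets the model scale with data volume.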
Common Use Cases for Hadoop
Key Architecture Principles
Concerning Big Data Solutions
• The big data solution should provide the organization with a
trusted, unified, and consistent view of diverse data types
integrated from a variety of structured and unstructured data
sources.
• The big data solution should provide batch, interactive, and
near real-time data integration and analytics capabilities. It
must have the capability to handle mixed workloads and
query patterns.
• The big data solution needs to scale well (to petabytes) to
handle large volumes and to keep storage costs low through
HDFS-based storage.
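When sizing HDFS-based storage, note that HDFS keeps several copies of every block, so raw disk capacity is a multiple of the logical data volume. The sketch below is a back-of-the-envelope estimate: the replication factor of 3 is the HDFS default, while the function name and the 25% headroom for temporary and intermediate data are assumptions.

```python
def raw_storage_needed(logical_tb, replication_factor=3, headroom=0.25):
    """Estimate raw disk capacity (TB) for a given volume of logical data.

    replication_factor=3 matches the HDFS default (dfs.replication).
    headroom is an assumed fraction of extra space for temporary and intermediate data.
    """
    if logical_tb < 0 or replication_factor < 1 or headroom < 0:
        raise ValueError("invalid sizing inputs")
    return logical_tb * replication_factor * (1 + headroom)


# 1 PB (1000 TB) of logical data under the assumptions above.
raw_tb = raw_storage_needed(1000)
```

Under these assumptions, one petabyte of data needs roughly 3.75 PB of raw disk, so cost comparisons should use the raw figure.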
• In a big-data-enabled world where enterprises work with
both structured and unstructured data, the big data solution
requires that metadata be associated with both kinds of
data.
• The big data solution must include governance and security
controls to protect the data elements stored in its data lakes,
together with suitable audit mechanisms. Compliance with
local and global laws concerning data privacy and customer
data should be maintained.
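The metadata and audit principles above can be illustrated with a small sketch. The catalog entry, its fields, the role names, and the in-memory audit log are all hypothetical; a real deployment would rely on a dedicated metadata catalog and a durable audit trail.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DatasetMetadata:
    """Hypothetical catalog entry attached to any data set in the lake."""
    name: str
    data_type: str  # "structured", "semi-structured", or "unstructured"
    source: str
    owner: str
    contains_personal_data: bool
    allowed_roles: set = field(default_factory=set)


audit_log = []  # in-memory stand-in for a durable audit trail


def request_access(dataset, user, role):
    """Check role-based access and record every attempt, granted or not."""
    granted = role in dataset.allowed_roles
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "dataset": dataset.name,
        "user": user,
        "granted": granted,
    })
    return granted


# The same metadata shape describes unstructured data as well as structured data.
emails = DatasetMetadata(
    name="support_emails",
    data_type="unstructured",
    source="mail archive",
    owner="customer-service",
    contains_personal_data=True,
    allowed_roles={"compliance", "support-analyst"},
)
ok = request_access(emails, "alice", "support-analyst")
denied = request_access(emails, "bob", "marketing")
```

Logging denied attempts as well as granted ones is a deliberate choice here, since audits usually need to see both.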
Critical Success Factors in Big Data
• Start Small with Pilots: Begin with one or two pilot projects
to test the technology and gauge its impact. This approach
helps validate the technology, quantify actual benefits, and
learn valuable lessons for future initiatives.
• Enhance Skills: Invest in training and hiring to build
expertise in big data technologies and data science.
Enhancing skills is crucial for executing big data initiatives
successfully.
Tools for Big Data
Assignment