
Chapter 09 - in Class

Chapter 9 discusses Big Data, cloud computing, and location analytics, focusing on their impact on decision support systems. It covers the definition of Big Data, enabling technologies like Hadoop and Spark, and the challenges faced in Big Data analytics. The chapter also explores applications of stream analytics and geospatial analytics, as well as the role of cloud computing in enhancing business analytics capabilities.

Uploaded by

sanasyed806

Analytics, Data Science and AI:

Systems for Decision Support


Eleventh Edition

Chapter 9
Big Data, Cloud Computing, and
Location Analytics: Concepts and
Tools
Slides in this Presentation Contain Hyperlinks.
JAWS users should be able to get a list of links by
using INSERT+F7

Copyright © 2020, 2015, 2011 Pearson Education, Inc. All Rights Reserved
Learning Objectives (1 of 2)
9.1 Learn what Big Data is and how it is changing the
world of analytics
9.2 Understand the motivation for and business drivers of
Big Data analytics
9.3 Become familiar with the wide range of enabling
technologies for Big Data analytics
9.4 Learn about Hadoop, MapReduce, and NoSQL as
they relate to Big Data analytics
9.5 Compare and contrast the complementary uses of
data warehousing and Big Data technologies

Learning Objectives (2 of 2)
9.6 Become familiar with in-memory analytics and Spark
applications
9.7 Become familiar with select Big Data platforms and
services
9.8 Understand the need for and appreciate the
capabilities of stream analytics
9.9 Learn about the applications of stream analytics
9.10 Describe the current and future use of cloud
computing in business analytics
9.11 Describe how geospatial and location-based analytics
are assisting organizations

Opening Vignette (1 of 4)
Analyzing Customer Churn in a Telecom
Company Using Big Data Methods

• Telecom – a highly competitive market segment
• Customer churn rate is higher than in most other markets
• A good example of Big Data analytics
• Challenges
– Data from multiple sources
   · Web log of customers (website)
   · Physical service center activities
   · Customer service phone call logs
– Data volume is higher than usual
Opening Vignette (2 of 4)
Analyzing Customer Churn in a Telecom
Company Using Big Data Methods
Figure 9.1 Multiple Data Sources Integrated into Teradata Vantage.

Source: Teradata Corp.


Opening Vignette (3 of 4)
Analyzing Customer Churn in a Telecom
Company Using Big Data Methods
• Big data architecture to combine all three data sources
• Common data captured:
– CustomerID
– Channel
– Date/time stamp
– Action taken (one or more of 1 options)
• Goal: find the most common path leading to cancellation
• Sessionized data by customer for each 5-day time period
• Identified top 20 paths of cancellation
• Offer incentives to customers on those paths to avoid
cancellation
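
The path analysis outlined above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the vignette's actual implementation: the event tuples, the `CANCEL` action label, and the function names are hypothetical, and the session rule (a new session starts when more than five days separate a customer's consecutive events) is only one possible reading of "sessionized by customer for each 5-day time period."

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

# Hypothetical event records: (customer_id, channel, timestamp, action).
EVENTS = [
    ("c1", "web", datetime(2020, 1, 1, 9), "VIEW_BILL"),
    ("c1", "phone", datetime(2020, 1, 2, 10), "BILL_DISPUTE"),
    ("c1", "phone", datetime(2020, 1, 3, 11), "CANCEL"),
    ("c2", "web", datetime(2020, 1, 1, 9), "VIEW_BILL"),
    ("c2", "phone", datetime(2020, 1, 4, 15), "BILL_DISPUTE"),
    ("c2", "store", datetime(2020, 1, 5, 12), "CANCEL"),
    ("c3", "web", datetime(2020, 1, 1, 8), "VIEW_BILL"),
    ("c3", "web", datetime(2020, 1, 20, 8), "CANCEL"),
]


def sessionize(events, gap=timedelta(days=5)):
    """Group each customer's actions into sessions, split at gaps longer than `gap`."""
    by_customer = defaultdict(list)
    for customer, _channel, ts, action in sorted(events, key=lambda e: (e[0], e[2])):
        by_customer[customer].append((ts, action))
    sessions = []
    for rows in by_customer.values():
        current = [rows[0]]
        for prev, row in zip(rows, rows[1:]):
            if row[0] - prev[0] > gap:  # assumed session boundary
                sessions.append(current)
                current = []
            current.append(row)
        sessions.append(current)
    return [[action for _, action in session] for session in sessions]


def top_cancellation_paths(events, n=20, target="CANCEL"):
    """Count action sequences that end in `target`; return the n most common."""
    paths = Counter()
    for actions in sessionize(events):
        if target in actions:
            end = actions.index(target) + 1
            paths[" > ".join(actions[:end])] += 1
    return paths.most_common(n)


print(top_cancellation_paths(EVENTS))
```

In this toy data, two customers follow the same three-step path to cancellation, so that path ranks first. A retention team would target customers who are partway along such a path.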
Opening Vignette (4 of 4)
Analyzing Customer Churn in a Telecom
Company Using Big Data Methods
Figure 9.2 Top 20 Paths Visualization (Sankey Diagram)

Source: Teradata Corp.


Technology Insights 9.1
The Data Size Is Getting Bigger and Bigger
• Boeing jet sensors – 20 TB/hr
• Facebook – 600 TB/day
• YouTube – 1 TB/min
• The proposed Square Kilometer Array telescope
(the world’s proposed biggest telescope) – 1 EB/day

Name          Symbol   Value
Kilobyte      kB       10^3
Megabyte      MB       10^6
Gigabyte      GB       10^9
Terabyte      TB       10^12
Petabyte      PB       10^15
Exabyte       EB       10^18
Zettabyte     ZB       10^21
Yottabyte     YB       10^24
Brontobyte*   B        10^27
Gegobyte*     GeB      10^30

*Not an official SI (International System of Units) name/symbol, yet.
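
To compare the data rates quoted above on one scale, the arithmetic can be sketched as follows. The helper name `bytes_per_day` and the dictionaries are illustrative. The conversion also assumes each source sustains its quoted rate for a full 24 hours, which is a simplification (a jet, for example, is not in the air all day).

```python
# Decimal (SI) byte units, stored as powers of ten.
EXPONENT = {"kB": 3, "MB": 6, "GB": 9, "TB": 12, "PB": 15, "EB": 18, "ZB": 21, "YB": 24}
SECONDS = {"min": 60, "hr": 3600, "day": 86400}


def bytes_per_day(amount, unit, per):
    """Convert `amount` `unit` per `per` into bytes per day (sustained-rate assumption)."""
    return amount * 10 ** EXPONENT[unit] * SECONDS["day"] // SECONDS[per]


# Rates as quoted on the slide.
RATES = {
    "Boeing jet sensors": (20, "TB", "hr"),
    "Facebook": (600, "TB", "day"),
    "YouTube": (1, "TB", "min"),
    "Square Kilometer Array": (1, "EB", "day"),
}

for source, (amount, unit, per) in RATES.items():
    terabytes = bytes_per_day(amount, unit, per) / 10 ** 12
    print(f"{source}: {terabytes:,.0f} TB/day")
```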


Big Data - Definition and Concepts
• Big Data is a misnomer!
• Big Data is more than just “big”
• The Vs that define Big Data
– Volume
– Variety: all types of format
– Velocity: speed of generating data
– Veracity: accuracy, truthfulness, trustworthiness
– Variability: data flows can be inconsistent, with periodic peaks
– Value: business value
–…

Fundamentals of Big Data Analytics
• Big Data by itself, regardless of the size, type, or speed, is
worthless
• Big Data + “big” analytics = value
• With the value proposition, Big Data also brought about
big challenges
– Effectively and efficiently capturing, storing, and
analyzing Big Data
– New breed of technologies needed

Business Problems Addressed by
Big Data Analytics
• Process efficiency and cost reduction
• Brand management
• Revenue maximization, cross-selling/up-selling
• Enhanced customer experience
• Churn identification, customer recruiting
• Improved customer service
• Identifying new products and market opportunities
• Risk management
• Regulatory compliance
Enablers of Big Data Analytics
High performance computing
• In-memory analytics
– Storing and processing the complete data set in RAM
• In-database analytics
– Placing analytic procedures close to where data is
stored
• Grid computing & MPP
– Use of many machines and processors in parallel
(MPP - massively parallel processing)
• Appliances
– Combining hardware, software, and storage in a single
unit for performance and scalability
Challenges of Big Data Analytics
• Data volume
– The ability to capture, store, and process the huge volume
of data
• Data integration
– The ability to combine data quickly and at reasonable cost
• Processing capabilities
– The ability to process data quickly, as it is captured (i.e.,
stream analytics)
• Data governance
– Keep up with security, privacy, ownership, and quality
• Skill availability
• Solution cost (ROI)

Big Data Technologies
-- MapReduce
• MapReduce distributes the processing of very large multi-
structured data files across a large cluster of ordinary
machines/processors
• Goal - achieving high performance with “simple”
computers
• Developed and popularized by Google
• Good at processing and analyzing large volumes of multi-
structured data in a timely manner
• Example tasks: indexing the Web for search, graph
analysis, text analysis, machine learning, …

Big Data Technologies
-- MapReduce
• How does MapReduce work?
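
One common way to illustrate the answer is word counting. The sketch below simulates the map, shuffle, and reduce phases in a single Python process. It does not use the Hadoop API, and the function names and sample splits are made up for illustration. In a real cluster, each split would be mapped on a different machine and the shuffle would move data across the network.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


# Two input splits standing in for chunks of a very large file.
splits = ["big data big analytics", "big value"]
mapped = chain.from_iterable(map_phase(s) for s in splits)
counts = reduce_phase(shuffle_phase(mapped))
print(counts)
```

Because each call to `map_phase` depends only on its own split, the map work can run in parallel on ordinary machines, which is the property the preceding slide highlights.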

Big Data Technologies
-- Hadoop (1 of 3)
• Hadoop is an open source framework for storing and
analyzing massive amounts of distributed, unstructured
data
• Hadoop clusters run on inexpensive commodity hardware
so projects can scale-out inexpensively
– Hadoop is now part of Apache Software Foundation
– Open source - hundreds of contributors continuously
improve the core technology

Big Data Technologies
-- Hadoop (2 of 3)
• How Does Hadoop Work?
– Access unstructured and semi-structured data (e.g., log
files, social media feeds, other data sources)
– Break the data up into “parts,” which are then loaded into a
file system made up of multiple nodes running on
commodity hardware using HDFS
– Each “part” is replicated multiple times and loaded into the
file system for replication and failsafe processing
– A node acts as the Facilitator and another as Job Tracker
– Jobs are distributed to the clients, and once completed the
results are collected and aggregated using MapReduce
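
The "break into parts and replicate" steps above can be sketched as follows. This is a toy model, not HDFS code: the function names, node names, tiny part size, and round-robin placement rule are all assumptions chosen for readability. Real HDFS uses much larger blocks and a rack-aware placement policy.

```python
def split_into_parts(data, part_size):
    """Break a byte string into fixed-size parts (the last part may be shorter)."""
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]


def place_replicas(num_parts, nodes, replication=3):
    """Assign each part to `replication` distinct nodes, round-robin (assumed policy)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        part: [nodes[(part + r) % len(nodes)] for r in range(replication)]
        for part in range(num_parts)
    }


parts = split_into_parts(b"0123456789", part_size=4)
placement = place_replicas(len(parts), ["node-a", "node-b", "node-c", "node-d"])

# Failsafe check: if one node is lost, every part still has a surviving copy.
failed = "node-b"
survives = all(any(node != failed for node in holders) for holders in placement.values())
print(placement, survives)
```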

Big Data Technologies
-- Hadoop (3 of 3)
• Hadoop Technical Components
– Hadoop Distributed File System (HDFS)
– Name Node (primary facilitator knowing data location)
– Secondary Node (backup to Name Node)
– Job Tracker (coordinator of processing or MapReduce)
– Slave Nodes (store data and process data following
directions)

Technology Insights 9.2
A Few Demystifying Facts about Hadoop
• Hadoop consists of multiple products
• Hadoop is open source but available from vendors, too
• Hadoop is an ecosystem, not a single product
• HDFS is a file system, not a DBMS
• Hive resembles SQL but is not standard SQL
• Hadoop and MapReduce are related but not the same
• MapReduce provides control for analytics, not analytics
• Hadoop is about data diversity, not just data volume

A High-Level Conceptual
Architecture for Big Data Solutions
Figure 9.3 A High-Level Conceptual Architecture for Big Data Solutions.

Source: Teradata Company.


Big Data and Data Warehousing

• What is the impact of Big Data on DW?
– Will Hadoop replace data warehousing/RDBMS?
• Use Cases for Hadoop
– Hadoop as the repository and refinery
– Hadoop as a powerful, economical, and active archive
– Unstructured or semi-structured data
• Use Cases for Data Warehousing
– Data warehouse performance
– Integrating data that provides business value
– Interactive BI tools

Hadoop versus Data Warehouse
When to Use Which Platform

Coexistence of Hadoop and DW
Figure 9.7 Coexistence of Hadoop and Data Warehouses.

Source: “Hadoop and the Data Warehouse: When to Use Which,” Teradata, 2012. Used
with permission from Teradata Corporation.
Coexistence of Hadoop and DW
1. Use Hadoop for storing and archiving multi-structured
data
2. Use Hadoop for filtering, transforming, and/or
consolidating multi-structured data
3. Use Hadoop to analyze large volumes of multi-structured
data and publish the analytical results
4. Use a relational DBMS that provides MapReduce
capabilities as an investigative computing platform
5. Use a front-end query tool to access and analyze data

In-Memory Analytics and Spark
• In-memory analytics
– Faster processing than batch processing
– Real-time dashboards for decision making

• Apache Spark
– Developed at University of California, Berkeley in 2009
– Unified analytics engine for batch and streaming data

• Use cases
– Uber uses Spark to detect fraudulent trips
– Pinterest measures user engagement in real-time
– Netflix uses Spark to run the recommendation engine
– Yahoo uses Spark to create business intelligence apps
– eBay uses Spark for data management and stream processing

Big Data And Stream Analytics
• Data-in-motion analytics and real-time data analytics
– One of the Vs in Big Data = Velocity
– Analytic process of extracting actionable information from
continuously flowing data

• Why Stream Analytics?
– It may not be feasible to store the data
– Data elements are often called tuples
– A window of data is a finite number/sequence of tuples

• Stream Analytics Versus Perpetual Analytics
– Stream analytics (SA): uses a prescribed window of data observations
– Perpetual analytics (PA): uses all prior data or observations

• Critical Event Processing – Detect critical events
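
The contrast between a prescribed window and all prior observations can be sketched with two running means. The class names, window size, and sample stream are illustrative assumptions. A mean is used only because it is the simplest statistic to update one tuple at a time.

```python
from collections import deque


class WindowedMean:
    """Stream-analytics style: mean over only the most recent `size` tuples."""

    def __init__(self, size):
        self.window = deque(maxlen=size)  # older tuples fall out automatically

    def update(self, value):
        self.window.append(value)
        return sum(self.window) / len(self.window)


class PerpetualMean:
    """Perpetual-analytics style: mean over every tuple seen so far."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def update(self, value):
        self.count += 1
        self.total += value
        return self.total / self.count


stream = [10, 10, 10, 40, 40, 40]  # hypothetical sensor readings
sa, pa = WindowedMean(size=3), PerpetualMean()
results = [(sa.update(x), pa.update(x)) for x in stream]
print(results[-1])
```

After the readings jump from 10 to 40, the windowed mean reaches 40 within three tuples, while the perpetual mean is still held down by the older values. Neither approach needs to store the full stream: the window keeps three values and the perpetual mean keeps only a count and a total.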

Stream Analytics Applications
• Data stream Mining
– Enabling technology for stream analytics
– Data stream is a continuous flow of an ordered sequence of
instances
– Sensor data, network traffic, phone logs, web searches, financial
data

• Applications
– e-Commerce
– Telecommunication
– Law Enforcement and Cyber Security
– Power Industry
– Financial Services
– Health Services
– Government

Stream Analytics
A Use Case in the Energy Industry

Location-Based Analytics (1 of 2)
• Geospatial analytics
– Location data available from geographic information
systems (GIS), e.g. ESRI
– Other Applications: Agricultural, crime analysis,
disease spread applications

Location-Based Analytics (2 of 2)
• A Multimedia Exercise in Analytics Employing Geospatial
Analytics
– BSI case of the Dropped Mobile Calls
• Real-Time Location Intelligence
• Analytics Applications for Consumers
– Waze
– Yelp
– ParkPGH

Cloud Computing and Business
Analytics
• Data as a Service (DaaS)
• Software as a Service (SaaS)
• Platform as a Service (PaaS)
• Infrastructure as a Service (IaaS)
• Essential Technology Stack
– Networking, Storage, Servers, Virtualization (IaaS)
– Above + OS, Middleware, Runtime (PaaS)
– Above + Data, Application (SaaS)
– Figure 9.13 (p. 560)
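
The cumulative "Above + ..." structure of the technology stack can be written out as a small lookup. This sketch assumes that each line above lists the layers delivered by the provider under that service model, so whatever remains is the customer's responsibility. The function and variable names are illustrative.

```python
# Layers each service model adds on top of the previous one, as listed above.
STACK_ADDITIONS = [
    ("IaaS", ["Networking", "Storage", "Servers", "Virtualization"]),
    ("PaaS", ["OS", "Middleware", "Runtime"]),
    ("SaaS", ["Data", "Application"]),
]


def provider_managed_layers():
    """Expand the cumulative 'Above + ...' lists into one full list per model."""
    managed, layers = {}, []
    for model, added in STACK_ADDITIONS:
        layers = layers + added  # new list each time, so entries are not shared
        managed[model] = layers
    return managed


ALL_LAYERS = provider_managed_layers()["SaaS"]


def customer_managed(model):
    """Layers left to the customer under `model` (assumed interpretation)."""
    provided = provider_managed_layers()[model]
    return [layer for layer in ALL_LAYERS if layer not in provided]


for model in ("IaaS", "PaaS", "SaaS"):
    print(model, "customer manages:", customer_managed(model))
```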

Cloud Computing and Analytics
• Cloud deployment models
– Private cloud: internal or corporate cloud
– Public cloud
– Hybrid cloud
• Major Cloud Platform Providers in Analytics
– Amazon Elastic Beanstalk
– IBM Cloud
– Microsoft Azure
– Google App Engine
– OpenShift

Analytics Applications on the Cloud

• Analytics as a Service (AaaS)
• Representative Analytics-as-a-Service providers
– MineMyText, SAS Viya, Tableau, Snowflake …
• Illustrative Applications
– Using Azure IoT, Stream Analytics, and Machine
Learning to Improve Mobile Health Care Services
– Gulf Air Uses Cloudera’s Hadoop Big Data to Get
Deeper Customer Insight
– Chime Enhances Customer Experience Using
Snowflake to consolidate 14 data sources

Copyright

This work is protected by United States copyright laws and is
provided solely for the use of instructors in teaching their
courses and assessing student learning. Dissemination or sale of
any part of this work (including on the World Wide Web) will
destroy the integrity of the work and is not permitted. The work
and materials from it should never be made available to students
except by instructors using the accompanying text in their
classes. All recipients of this work are expected to abide by these
restrictions and to honor the intended pedagogical purposes and
the needs of other instructors who rely on these materials.
