Chapter 09 - in Class
Chapter 9
Big Data, Cloud Computing, and
Location Analytics: Concepts and
Tools
Slides in this Presentation Contain Hyperlinks.
JAWS users should be able to get a list of links by
using INSERT+F7
Copyright © 2020, 2015, 2011 Pearson Education, Inc. All Rights Reserved
Learning Objectives (1 of 2)
9.1 Learn what Big Data is and how it is changing the
world of analytics
9.2 Understand the motivation for and business drivers of
Big Data analytics
9.3 Become familiar with the wide range of enabling
technologies for Big Data analytics
9.4 Learn about Hadoop, MapReduce, and NoSQL as
they relate to Big Data analytics
9.5 Compare and contrast the complementary uses of
data warehousing and Big Data technologies
Learning Objectives (2 of 2)
9.6 Become familiar with in-memory analytics and Spark
applications
9.7 Become familiar with select Big Data platforms and
services
9.8 Understand the need for and appreciate the
capabilities of stream analytics
9.9 Learn about the applications of stream analytics
9.10 Describe the current and future use of cloud
computing in business analytics
9.11 Describe how geospatial and location-based analytics
are assisting organizations
Opening Vignette (1 of 4)
Analyzing Customer Churn in a Telecom
Company Using Big Data Methods
Fundamentals of Big Data Analytics
• Big Data by itself, regardless of the size, type, or speed, is
worthless
• Big Data + “big” analytics = value
• With the value proposition, Big Data also brought about
big challenges
– Effectively and efficiently capturing, storing, and
analyzing Big Data
– New breed of technologies needed
Business Problems Addressed by
Big Data Analytics
• Process efficiency and cost reduction
• Brand management
• Revenue maximization, cross-selling/up-selling
• Enhanced customer experience
• Churn identification, customer recruiting
• Improved customer service
• Identifying new products and market opportunities
• Risk management
• Regulatory compliance
Enablers of Big Data Analytics
High performance computing
• In-memory analytics
– Storing and processing the complete data set in RAM
• In-database analytics
– Placing analytic procedures close to where data is
stored
• Grid computing & MPP
– Use of many machines and processors in parallel
(MPP: massively parallel processing)
• Appliances
– Combining hardware, software, and storage in a single
unit for performance and scalability
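The idea behind in-database analytics can be sketched with a small example. This is a minimal illustration, assuming Python's built-in sqlite3 module as a stand-in for a real analytic database; the `calls` table and its rows are hypothetical. It contrasts pulling every row out to the client with letting the database compute the aggregate where the data is stored.

```python
import sqlite3

# An in-memory SQLite database stands in for a real analytic database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (customer TEXT, minutes REAL)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?)",
    [("a", 10.0), ("a", 20.0), ("b", 5.0)],  # hypothetical sample rows
)

# Approach 1: move the data to the analytics.
# Every row crosses the database boundary before any work is done.
totals_client = {}
for customer, minutes in conn.execute("SELECT customer, minutes FROM calls"):
    totals_client[customer] = totals_client.get(customer, 0.0) + minutes

# Approach 2: move the analytics to the data (in-database).
# Only one summary row per customer leaves the database.
totals_db = dict(
    conn.execute("SELECT customer, SUM(minutes) FROM calls GROUP BY customer")
)

print(totals_client)
print(totals_db)
```

Both approaches give the same totals. The difference is how much data has to move, which matters more as the table grows.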
Challenges of Big Data Analytics
• Data volume
– The ability to capture, store, and process the huge volume
of data
• Data integration
– The ability to combine data quickly and at reasonable cost
• Processing capabilities
– The ability to process data quickly, as it is captured (i.e.,
stream analytics)
• Data governance
– Keep up with security, privacy, ownership, and quality
• Skill availability
• Solution cost (ROI)
Big Data Technologies
-- MapReduce
• MapReduce distributes the processing of very large multi-
structured data files across a large cluster of ordinary
machines/processors
• Goal - achieving high performance with “simple”
computers
• Developed and popularized by Google
• Good at processing and analyzing large volumes of multi-
structured data in a timely manner
• Example tasks: indexing the Web for search, graph
analysis, text analysis, machine learning, …
Big Data Technologies
-- MapReduce
• How does MapReduce work?
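One way to picture the answer is a word count, the usual introductory MapReduce task. The sketch below is a single-process illustration of the map, shuffle, and reduce steps, not Hadoop's API; the function names and the input splits are hypothetical. On a real cluster each split would be processed on a different machine.

```python
from collections import defaultdict

def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]

def shuffle_phase(mapped_pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine the values of each key into a single result."""
    return {key: sum(values) for key, values in groups.items()}

# Hypothetical input splits; each one could be mapped independently and in parallel.
splits = ["big data big analytics", "data in motion", "big value"]

mapped = [pair for split in splits for pair in map_phase(split)]
counts = reduce_phase(shuffle_phase(mapped))
print(counts)
```

Because each map call looks only at its own split and each reduce call looks only at one key, both phases can be spread across many ordinary machines.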
Big Data Technologies
-- Hadoop (1 of 3)
• Hadoop is an open source framework for storing and
analyzing massive amounts of distributed, unstructured
data
• Hadoop clusters run on inexpensive commodity hardware
so projects can scale out inexpensively
– Hadoop is now part of Apache Software Foundation
– Open source - hundreds of contributors continuously
improve the core technology
Big Data Technologies
-- Hadoop (2 of 3)
• How Does Hadoop Work?
– Access unstructured and semi-structured data (e.g., log
files, social media feeds, other data sources)
– Break the data up into “parts,” which are then loaded into a
file system made up of multiple nodes running on
commodity hardware using HDFS
– Each “part” is replicated multiple times and loaded into the
file system for replication and failsafe processing
– A node acts as the Facilitator and another as Job Tracker
– Jobs are distributed to the clients, and once completed the
results are collected and aggregated using MapReduce
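The splitting and replication steps above can be sketched as follows. This is a simplified, assumption-laden illustration: the round-robin placement, the tiny block size, and the node names are hypothetical, and real HDFS uses far larger blocks and a more sophisticated placement policy.

```python
def split_into_blocks(data, block_size):
    """Break the data into fixed-size parts (the last part may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin (assumed policy)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }

data = b"x" * 1000                              # hypothetical file contents
nodes = ["node1", "node2", "node3", "node4"]    # hypothetical cluster nodes

blocks = split_into_blocks(data, block_size=256)
placement = place_replicas(len(blocks), nodes)

# Simulate a node failure: every block should still have a surviving copy.
failed = "node2"
survivors = {
    block_id: [n for n in holders if n != failed]
    for block_id, holders in placement.items()
}
print(placement)
print(all(survivors.values()))
```

Keeping several copies of each part is what lets processing continue when a commodity machine fails.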
Big Data Technologies
-- Hadoop (3 of 3)
• Hadoop Technical Components
– Hadoop Distributed File System (HDFS)
– Name Node (primary facilitator knowing data location)
– Secondary Node (backup to Name Node)
– Job Tracker (coordinator of processing or MapReduce)
– Slave Nodes (store data and process data following
directions)
Technology Insights 9.2
A Few Demystifying Facts about Hadoop
• Hadoop consists of multiple products
• Hadoop is open source but available from vendors, too
• Hadoop is an ecosystem, not a single product
• HDFS is a file system, not a DBMS
• Hive resembles SQL but is not standard SQL
• Hadoop and MapReduce are related but not the same
• MapReduce provides control for analytics, not analytics
• Hadoop is about data diversity, not just data volume
A High-Level Conceptual
Architecture for Big Data Solutions
Figure 9.3 A High-Level Conceptual Architecture for Big Data Solutions.
Hadoop versus Data Warehouse
When to Use Which Platform
Coexistence of Hadoop and DW
Figure 9.7 Coexistence of Hadoop and Data Warehouses.
Source: “Hadoop and the Data Warehouse: When to Use Which,” Teradata, 2012. Used
with permission from Teradata Corporation.
Coexistence of Hadoop and DW
1. Use Hadoop for storing and archiving multi-structured
data
2. Use Hadoop for filtering, transforming, and/or
consolidating multi-structured data
3. Use Hadoop to analyze large volumes of multi-structured
data and publish the analytical results
4. Use a relational DBMS that provides MapReduce
capabilities as an investigative computing platform
5. Use a front-end query tool to access and analyze data
In-Memory Analytics and Spark
• In-memory analytics
– Faster processing than batch processing
– Real-time dashboards for decision making
• Apache Spark
– Developed at University of California, Berkeley in 2009
– Unified analytics engine for batch and streaming data
• Use cases
– Uber uses Spark to detect fraudulent trips
– Pinterest measures user engagement in real-time
– Netflix uses Spark to run the recommendation engine
– Yahoo uses Spark to create business intelligence apps
– eBay uses Spark for data management and stream processing
Big Data And Stream Analytics
• Data-in-motion analytics and real-time data analytics
– One of the Vs in Big Data = Velocity
– Analytic process of extracting actionable information from
continuously flowing data
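A minimal sketch of analyzing data in motion, assuming a simple sliding-window average as the analytic; the sensor values and function names are hypothetical. Each reading is processed as it arrives, and only the most recent few readings are kept in memory.

```python
from collections import deque

def moving_average(stream, window_size):
    """Yield the average of the last `window_size` readings after each arrival."""
    window = deque(maxlen=window_size)  # older readings drop out automatically
    for reading in stream:
        window.append(reading)
        yield sum(window) / len(window)

def sensor_readings():
    """Hypothetical stand-in for a continuous sensor feed."""
    for value in [10, 12, 11, 50, 13]:
        yield value

averages = list(moving_average(sensor_readings(), window_size=3))
print(averages)
```

The spike to 50 shows up in the running average as soon as it arrives, which is the point of stream analytics: the result is available while the data is still flowing, without first storing the full data set.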
Stream Analytics Applications
• Data stream mining
– Enabling technology for stream analytics
– Data stream is a continuous flow of an ordered sequence of
instances
– Sensor data, network traffic, phone logs, web searches, financial
data
• Applications
– e-Commerce
– Telecommunication
– Law Enforcement and Cyber Security
– Power Industry
– Financial Services
– Health Services
– Government
Stream Analytics
A Use Case in Energy Industry
Location-Based Analytics (1 of 2)
• Geospatial analytics
– Location data available from geographic information
systems (GIS), e.g., ESRI
– Other Applications: Agricultural, crime analysis,
disease spread applications
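A basic building block of many geospatial analyses is the distance between two latitude/longitude points. The sketch below uses the haversine formula on a spherical Earth, which is an approximation; the store and incident coordinates are hypothetical, and a real GIS product would offer far richer operations.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within_radius(center, points, radius_km):
    """Return the points that lie within `radius_km` of `center`."""
    return [p for p in points if haversine_km(*center, *p) <= radius_km]

# Hypothetical locations as (latitude, longitude) pairs.
store = (40.0, -80.0)
incidents = [(40.01, -80.01), (41.0, -80.0), (40.0, -79.99)]

nearby = within_radius(store, incidents, radius_km=5.0)
print(nearby)
```

Filtering events by distance from a location is the kind of query that underlies applications such as crime analysis or tracking the spread of a disease.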
Location-Based Analytics (2 of 2)
• A Multimedia Exercise in Analytics Employing Geospatial
Analytics
– BSI: The Case of the Dropped Mobile Calls
• Real-Time Location Intelligence
• Analytics Applications for Consumers
– Waze
– Yelp
– ParkPGH
Cloud Computing and Business
Analytics
• Data as a Service (DaaS)
• Software as a Service (SaaS)
• Platform as a Service (PaaS)
• Infrastructure as a Service (IaaS)
• Essential Technology Stack
– Networking, Storage, Servers, Virtualization (IaaS)
– Above + OS, Middleware, Runtime (PaaS)
– Above + Data, Application (SaaS)
– Figure 9.13 (p. 560)
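The layering in the technology stack above can be written down as a small lookup. This is only an illustration of the slide's own breakdown; the variable and function names are hypothetical, and the split between provider and customer is simplified.

```python
# The stack from bottom to top, in the order the slide lists the layers.
STACK = [
    "Networking", "Storage", "Servers", "Virtualization",  # covered by IaaS
    "OS", "Middleware", "Runtime",                          # added by PaaS
    "Data", "Application",                                  # added by SaaS
]

# How many layers, counted from the bottom, the provider manages in each model.
PROVIDER_MANAGED = {"IaaS": 4, "PaaS": 7, "SaaS": 9}

def responsibilities(model):
    """Split the stack into provider-managed and customer-managed layers."""
    cut = PROVIDER_MANAGED[model]
    return {"provider": STACK[:cut], "customer": STACK[cut:]}

for model in PROVIDER_MANAGED:
    print(model, responsibilities(model))
```

Reading the output from IaaS to SaaS shows the customer's share of the stack shrinking as the provider takes on more layers.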
Cloud Computing and Analytics
• Cloud deployment models
– Private cloud: internal or corporate cloud
– Public cloud
– Hybrid cloud
• Major Cloud Platform Providers in Analytics
– Amazon Elastic Beanstalk
– IBM Cloud
– Microsoft Azure
– Google App Engine
– OpenShift
Analytics Applications on the Cloud