Chapter 6 (P1)

Uploaded by

Nawaf Ali

Business Intelligence and Analytics:

Systems for Decision Support


(10th Edition)

Chapter 13:
Big Data Analytics
Big Data -
Definition and Concepts
 Big [volume] Data is not new!
 Big Data means different things to people with
different backgrounds and interests
 Traditionally, “Big Data” = massive volumes of data
 E.g., volume of data at CERN, NASA, Google, …
 Where does the Big Data come from?
 Everywhere! Web logs, RFID, GPS systems, sensor
networks, social networks, Internet-based text documents,
Internet search indexes, call detail records, astronomy,
atmospheric science, biology, genomics, nuclear physics,
biochemical experiments, medical records, scientific
research, military surveillance, multimedia archives, …
13-2 Copyright © 2014 Pearson Education, Inc.
Technology Insights 6.1
The Data Size Is Getting Big, Bigger…
 Hadron Collider - 1 PB/sec
 Boeing jet - 20 TB/hr
 Facebook - 500 TB/day
 YouTube - 1 TB/4 min
 The proposed Square Kilometer Array telescope
(planned as the world’s biggest telescope) - 1 EB/day
[Table: Names for Big Data Sizes]
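The rates on this slide use different time units, which makes them hard to compare. The sketch below normalizes them to terabytes per day. It assumes decimal (SI) prefixes (1 TB = 10^12 bytes) and that each source produces data around the clock; both are assumptions for illustration, not statements from the slide.

```python
# Decimal (SI) prefixes are an assumption: 1 TB = 10**12 bytes, and so on.
TB = 10**12
PB = 10**15
EB = 10**18
SECONDS_PER_DAY = 24 * 60 * 60
HOURS_PER_DAY = 24
MINUTES_PER_DAY = 24 * 60

# Figures from the slide, converted to bytes per day.
# Continuous, round-the-clock production is assumed for every source.
rates_bytes_per_day = {
    "Hadron Collider": 1 * PB * SECONDS_PER_DAY,       # 1 PB/sec
    "Boeing jet": 20 * TB * HOURS_PER_DAY,             # 20 TB/hr
    "Facebook": 500 * TB,                              # 500 TB/day
    "YouTube": 1 * TB * (MINUTES_PER_DAY // 4),        # 1 TB every 4 minutes
    "Square Kilometer Array": 1 * EB,                  # 1 EB/day
}


def to_tb_per_day(bytes_per_day):
    """Express a bytes-per-day rate in terabytes per day."""
    return bytes_per_day / TB


# Print the sources from the smallest daily volume to the largest.
for source, rate in sorted(rates_bytes_per_day.items(), key=lambda kv: kv[1]):
    print(f"{source:<24} {to_tb_per_day(rate):>14,.0f} TB/day")
```

On these assumptions the per-second collider figure dominates once everything is put on a daily basis, which is why mixed units can hide the real ordering.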



Big Data -
Definition and Concepts
 Big Data is a misnomer!
 Big Data is more than just “big”
 The Vs that define Big Data
 Volume
 Variety
 Velocity



A High-level Conceptual Architecture
for Big Data Solutions (by AsterData / Teradata)
[Figure: Unified Data Architecture, System Conceptual View, with stages Move, Manage, Access. Labels recoverable from the figure:
Big Data Sources: ERP, SCM, CRM, Images, Audio and Video, Machine Logs, Text, Web and Social
Platforms: Data Platform, Integrated Data Warehouse, Discovery Platform, Event Processing
Analytic Tools & Apps: Marketing, Applications, Business Intelligence, Data Mining, Math and Stats, Languages
Users: Marketing, Executives, Operational Systems, Customers, Partners, Frontline Workers, Business Analysts, Data Scientists, Engineers]


Fundamentals of
Big Data Analytics
 Big Data by itself, regardless of the size,
type, or speed, is worthless
 Big Data + “big” analytics = value
 With the value proposition, Big Data also
brought about big challenges
 Effectively and efficiently capturing, storing,
and analyzing Big Data
 New breed of technologies needed
(developed, purchased, hired, or
outsourced …)
Big Data Considerations
 You can’t process the amount of data that you want to
because of the limitations of your current platform.
 You can’t include new/contemporary data sources (e.g.,
social media, RFID, sensor, Web, GPS, textual data)
because they do not comply with the data schema.
 You need to (or want to) integrate data as quickly as
possible to be current on your analysis.
 You want to work with a schema-on-demand data
storage paradigm because of the variety of data types.
 The data is arriving so fast at your organization’s
doorstep that your analytics platform cannot handle it.
 …
Critical Success Factors for
Big Data Analytics
 A clear business need (alignment with the
vision and the strategy)
 Strong, committed sponsorship (executive
champion)
 Alignment between the business and IT
strategy
 A fact-based decision-making culture
 A strong data infrastructure
 The right analytics tools
 The right people with the right skills
Critical Success Factors for
Big Data Analytics
[Figure: Keys to Success with Big Data Analytics: a clear business need; strong, committed sponsorship; alignment between the business and IT strategy; a fact-based decision-making culture; a strong data infrastructure; the right analytics tools; personnel with advanced analytical skills]


Enablers of Big Data Analytics
high-performance computing

 In-memory analytics
 Storing and processing the complete data set in
RAM
 In-database analytics
 Placing analytic procedures close to where data is
stored
 Grid computing & MPP
 Use of many machines and processors in parallel
(MPP: massively parallel processing)
 Appliances
 Combining hardware, software and storage in a
single unit for performance and scalability
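To make the in-database idea concrete, here is a minimal sketch that contrasts running an aggregation inside the database engine with pulling every row out and aggregating in application code. SQLite stands in for an analytic database purely for illustration, and the `sales` table and its values are made up.

```python
import sqlite3

# SQLite is only a stand-in here; the table and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100.0), ("east", 50.0), ("west", 70.0)],
)

# In-database: the aggregation runs where the data is stored,
# so only one small result row per region leaves the engine.
in_db_totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)

# Client-side: every row is moved out first, then aggregated in application code.
client_totals = {}
for region, amount in conn.execute("SELECT region, amount FROM sales"):
    client_totals[region] = client_totals.get(region, 0.0) + amount

print(in_db_totals)
conn.close()
```

Both paths give the same totals. The difference is how much data has to move, which is what matters once the table is very large.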
Challenges of Big Data Analytics
 Data volume
 The ability to capture, store, and process the huge
volume of data in a timely manner
 Data integration
 The ability to combine data quickly/cost effectively
 Processing capabilities
 The ability to process the data quickly, as it is
captured (i.e., stream analytics)
 Data governance (… security, privacy, access)
 Skill availability (… data scientist)
 Solution cost
Business Problems Addressed by
Big Data Analytics
 Process efficiency and cost reduction
 Brand management
 Revenue maximization, cross-selling/up-selling
 Enhanced customer experience
 Churn identification, customer recruiting
 Improved customer service
 Identifying new products and market opportunities
 Risk management
 Regulatory compliance
 Enhanced security capabilities
 …
Big Data Technologies
 MapReduce …
 Hadoop …
 Hive
 Pig
 HBase
 Flume
 Oozie
 Ambari
 Avro
 Mahout, Sqoop, HCatalog, …
Big Data Technologies
- MapReduce
 MapReduce distributes the processing of very
large multi-structured data files across a large
cluster of ordinary machines/processors
 Goal - achieving high performance with “simple”
computers
 Developed and popularized by Google
 Good at processing and analyzing large volumes
of multi-structured data in a timely manner
 Example tasks: indexing the Web for search,
graph analysis, text analysis, machine learning, …
Big Data Technologies
- MapReduce
How does MapReduce work?
[Figure: Raw Data, Map Function, Reduce Function]
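The raw data, map, and reduce stages can be sketched with the classic word-count task. This is a single-process illustration of the programming model, not Hadoop's or Google's implementation; the function names and the explicit `shuffle` step (which a real framework performs between the two phases) are choices made here for clarity.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as a framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map over every document, group by word, then reduce each group."""
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


# Each string plays the role of one split of the raw data.
counts = word_count(["big data is big", "data is everywhere"])
print(counts)
```

Because each call to `map_phase` looks at one split only, and each call to `reduce_phase` looks at one key only, both phases could run on many machines at once. That independence is what lets the model spread work across a cluster.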



Big Data Technologies
- Hadoop
 Hadoop is an open source framework for storing
and analyzing massive amounts of distributed,
unstructured data
 Originally created by Doug Cutting (with Mike Cafarella) and developed further at Yahoo!
 Hadoop clusters run on inexpensive commodity
hardware so projects can scale out inexpensively
 Hadoop is now a project of the Apache Software Foundation
 Open source - hundreds of contributors
continuously improve the core technology
 MapReduce + Hadoop = Big Data core technology
Big Data Technologies
- Hadoop
 How Does Hadoop Work?
 Access unstructured and semi-structured data (e.g., log
files, social media feeds, other data sources)
 Break the data up into “parts,” which are then loaded
into a file system made up of multiple nodes running on
commodity hardware using HDFS
 Each “part” is replicated multiple times and loaded into
the file system for replication and failsafe processing
 A node acts as the Facilitator and another as Job Tracker
 Jobs are distributed to the clients, and once completed
the results are collected and aggregated using
MapReduce
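The "break into parts, then replicate" steps above can be sketched as a small simulation. It is a simplification under stated assumptions: the round-robin placement is a stand-in chosen for illustration and is not HDFS's actual placement policy, the node names are hypothetical, and a replication factor of 3 is assumed.

```python
def split_into_parts(data, part_size):
    """Break the input into fixed-size parts (the last part may be shorter)."""
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]


def place_replicas(num_parts, nodes, replication=3):
    """Assign each part to `replication` distinct nodes.

    Round-robin placement is an illustrative assumption, not HDFS's real policy.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for part_id in range(num_parts):
        placement[part_id] = [
            nodes[(part_id + offset) % len(nodes)] for offset in range(replication)
        ]
    return placement


def surviving_parts(placement, failed_node):
    """Return the parts that still have at least one copy after a node fails."""
    return {
        part_id
        for part_id, holders in placement.items()
        if any(node != failed_node for node in holders)
    }


# Hypothetical cluster of four commodity nodes and 1,000 bytes of input.
nodes = ["node1", "node2", "node3", "node4"]
parts = split_into_parts(b"x" * 1000, part_size=256)
placement = place_replicas(len(parts), nodes, replication=3)

for part_id, holders in placement.items():
    print(part_id, holders)
print("parts still readable if node1 fails:", surviving_parts(placement, "node1"))
```

Since every part lives on several nodes, losing one machine does not lose any data, which is the failsafe behavior the slide refers to.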
Big Data Technologies
- Hadoop
 Hadoop Technical Components
 Hadoop is an open source framework for processing,
storing, and analyzing massive amounts of distributed,
unstructured data.
 Hadoop Distributed File System (HDFS)
 Name Node (primary facilitator)
 Secondary Node (backup to Name Node)
 Job Tracker: The node in a Hadoop cluster that
initiates and coordinates MapReduce jobs, or the
processing of the data
 Slave Nodes (the grunts of any Hadoop cluster)



How Does Hadoop Work?
 A client accesses unstructured and semistructured data
from sources including log files, social media feeds, and
internal data stores. It breaks the data up into "parts,"
which are then loaded into a file system made up of
multiple nodes running on commodity hardware. (Commodity
computing involves the use of large numbers of
already-available computing components for parallel computing.)
 The default file store in Hadoop is the Hadoop Distributed
File System (HDFS). File systems such as HDFS are adept
at storing large volumes of unstructured and
semistructured data as they do not require data to be
organized into relational rows and columns.
Big Data Technologies
Hadoop - Demystifying Facts
 Hadoop consists of multiple products
 Hadoop is open source but available from vendors, too
 HDFS is a file system, not a DBMS
 Hadoop and MapReduce are related but not the same
 Hadoop complements a DW; it’s rarely a replacement.
 Hadoop enables many types of analytics, not just Web
analytics.



Data Scientist
“The Sexiest Job of the 21st Century”
Thomas H. Davenport and D. J. Patil
Harvard Business Review, October 2012

 Data Scientist = Big Data guru


 One with skills to investigate Big Data
 Very high salaries, very high expectations
 Where do Data Scientists come from?
 M.S./Ph.D. in MIS, CS, IE,… and/or Analytics
 There is not a specific degree program for DS!
 PE, PML, … DSP (Data Science Professional)
Skills That Define a Data Scientist
[Figure: DATA SCIENTIST at the center, surrounded by six skill areas: Domain Expertise, Problem Definition and Decision Modeling; Data Access and Management (both traditional and new data systems); Programming, Scripting and Hacking; Internet and Social Media/Social Networking Technologies; Curiosity and Creativity; Communication and Interpersonal]


A Typical Job Post for a Data Scientist
[Figure: example job posting]


Big Data Vendors
 Big Data vendor landscape is developing
very rapidly
 A representative list would include
 Cloudera - cloudera.com
 MapR - mapr.com
 Hortonworks - hortonworks.com
[Callout: Software, Hardware, Service, …]
 Also, IBM (Netezza, InfoSphere), Oracle
(Exadata, Exalogic), Microsoft, Amazon,
Google, …
Top 10 Big Data Vendors
with Primary Focus on Hadoop
[Figure: bar chart with a vertical axis from $0 to $70]


How to Succeed with Big Data
1. Simplify
2. Coexist
3. Visualize
4. Empower
5. Integrate
6. Govern
7. Evangelize
Big Data And Stream Analytics
 Stream analytics (also called data-in-motion
analytics and real-time data analytics, among
others)
 One of the Vs in Big Data = Velocity
 Analytic process of extracting actionable information
from continuously flowing/streaming data
 Why Stream Analytics?
 It may not be feasible to store the data
 It may lose its value if not processed immediately
 Stream Analytics Versus Perpetual Analytics
 Critical Event Processing?
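The idea of analyzing data as it flows, without storing all of it, can be sketched with a sliding-window average. Only the most recent readings are kept in memory; older ones are discarded as new ones arrive. The sensor feed, its values, and the window size are hypothetical, and a windowed mean is just one simple choice of streaming computation.

```python
from collections import deque


def sliding_average(stream, window_size):
    """Yield the mean of the most recent `window_size` readings after each arrival."""
    window = deque(maxlen=window_size)
    total = 0.0
    for reading in stream:
        if len(window) == window_size:
            # The oldest reading is about to be evicted, so remove it from the sum.
            total -= window[0]
        window.append(reading)
        total += reading
        yield total / len(window)


def sensor_readings():
    """Hypothetical sensor feed; a real stream would be unbounded."""
    yield from [10.0, 12.0, 11.0, 30.0, 12.0]


# Results are produced one by one as readings arrive; the full history is never stored.
averages = list(sliding_average(sensor_readings(), window_size=3))
print(averages)
```

Memory use depends on the window size, not on how long the stream runs, which is what makes this style of processing workable when storing the data is not feasible.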
Stream Analytics
A Use Case in Energy Industry
Some of the most impactful applications of stream analytics
were developed in the energy industry, specifically for smart
grid (electric power supply chain) systems.
[Figure: components shown are the Energy Production System (Traditional and Renewable); Sensor Data (Energy Production System Status); Meteorological Data (Wind, Light, Temperature, etc.); the Energy Consumption System (Residential and Commercial); Usage Data (Smart Meters, Smart Grid Devices); Data Integration and Temporary Staging; Streaming Analytics (Predicting Usage, Production and Anomalies); Permanent Storage Area; Capacity Decisions; Pricing Decisions]


Stream Analytics Applications
 e-Commerce
 Telecommunication
 Law Enforcement and Cyber Security
 Power Industry
 Financial Services
 Health Services
 Government

