OC - Module 1 - Intro To BDA 021312
Module 1: Introduction to Big Data Analytics
Introduction to Big Data Analytics
Your Thoughts?
Big Data Defined
• “Big Data” is data whose scale, distribution, diversity,
and/or timeliness require the use of new technical
architectures and analytics to enable insights that unlock
new sources of business value.
  • Requires new data architectures, analytic sandboxes
  • New tools
  • New analytical methods
  • Integrating multiple skills into new role of data scientist
2. Processing Complexity
  • Changing data structures
  • Use cases warranting additional transformations and analytical techniques
3. Data Structure
  • Greater variety of data structures to mine and analyze
Big Data Characteristics: Data Structures
Data Growth is Increasingly Unstructured
Semi-Structured
• Textual data files with a discernible pattern, enabling parsing
• Example: XML data files that are self-describing and defined by an XML schema
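The point about self-describing XML can be illustrated with a minimal sketch. The document below, its element names, and its values are hypothetical and not taken from the course material; the standard library parser used here reads the tags but does not validate against an XML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document; element and attribute names are invented for illustration.
XML_TEXT = """<customers>
  <customer id="101"><name>Ada</name><balance currency="USD">2500.00</balance></customer>
  <customer id="102"><name>Lin</name><balance currency="USD">310.50</balance></customer>
</customers>"""


def parse_customers(xml_text):
    """Turn the self-describing XML into a list of plain dictionaries."""
    root = ET.fromstring(xml_text)
    rows = []
    for node in root.findall("customer"):
        rows.append({
            "id": int(node.get("id")),                    # attribute on <customer>
            "name": node.findtext("name"),                # text of the <name> child
            "balance": float(node.findtext("balance")),   # text of the <balance> child
        })
    return rows


customers = parse_customers(XML_TEXT)
```

Because the tags name each field, the parsing code needs no positional guesswork, which is what makes this kind of file easier to handle than free text.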
Four Main Types of Data Structures
• Structured Data
• Quasi-Structured Data
• Semi-Structured Data
View Source
http://www.google.com/#hl=en&sugexp=kjrmc&cp=8&gs_id=2m&xhr=t&q=data+scientist&pq=big+data&pf=p&sclient=psyb&source=hp&pbx=1&oq=data+sci&aq=0&aqi=g4&aql=f&gs_sm=&gs_upl=&bav=on.2,or.r_gc.r_pw.,cf.osb&fp=d566e0fbd09c8604&biw=1382&bih=651
• Unstructured Data
The Red Wheelbarrow, by William Carlos Williams
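The Google search URL on this slide is text with an irregular but recoverable format. As a rough sketch, the standard library can pull the search parameters out of it. The URL below is a shortened copy of the one on the slide, and `search_terms` is a hypothetical helper name; what each parameter means to the search engine is not stated in the slides, so the code only reports the raw values.

```python
from urllib.parse import urlsplit, parse_qs

# Shortened copy of the slide's URL; the parameters sit after '#', i.e. in the fragment.
URL = ("http://www.google.com/#hl=en&sugexp=kjrmc&cp=8&gs_id=2m&xhr=t"
       "&q=data+scientist&pq=big+data&pf=p&sclient=psyb&source=hp&pbx=1&oq=data+sci")


def search_terms(url):
    """Extract the host and a few raw parameter values from a search URL."""
    parts = urlsplit(url)
    params = parse_qs(parts.fragment)  # decodes '+' as a space
    return {
        "host": parts.netloc,
        "q": params.get("q", [""])[0],
        "pq": params.get("pq", [""])[0],
        "oq": params.get("oq", [""])[0],
    }


result = search_terms(URL)
```

The extra step of looking in the fragment rather than the query string is typical of this kind of data: it can be parsed, but only after inspecting how each source happens to format it.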
Data Repositories, An Analyst Perspective
Data Islands (“Spreadmarts”): isolated data marts
• Spreadsheets and low-volume DBs for recordkeeping
• Analyst dependent on data extracts
Data Warehouses: centralized data containers in a purpose-built space
• Supports BI and reporting, but restricts robust analyses
• Analyst dependent on IT & DBAs for data access and schema changes
• Analysts must spend significant time to get extracts from multiple sources
Analytic Sandbox: data assets gathered from multiple sources and technologies for analysis
• Enables high performance analytics using in-db processing
• Reduces costs associated with data replication into "shadow" file systems
• “Analyst-owned” rather than “DBA owned”
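To make "in-db processing" concrete, here is a small sketch that contrasts pulling an extract and aggregating it in application code with pushing the aggregation into the database. SQLite stands in for the sandbox database purely for illustration; the table, column names, and values are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for an analytic sandbox (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [(1, 120.0), (1, 80.0), (2, 15.5), (3, 300.0), (3, 45.0)],
)


def totals_via_extract(connection):
    """Extract every row, then aggregate outside the database."""
    totals = {}
    query = "SELECT customer_id, amount FROM transactions"
    for customer_id, amount in connection.execute(query):
        totals[customer_id] = totals.get(customer_id, 0.0) + amount
    return totals


def totals_in_db(connection):
    """Let the database aggregate; only one row per customer leaves it."""
    query = "SELECT customer_id, SUM(amount) FROM transactions GROUP BY customer_id"
    return dict(connection.execute(query))


extract_totals = totals_via_extract(conn)
in_db_totals = totals_in_db(conn)
```

Both functions return the same totals, but the second moves far less data out of the database, which is the cost the sandbox bullets refer to when they mention replication into "shadow" file systems.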
Introduction to Big Data Analytics: Mini-Case Study
Yoyodyne Bank Scenario
• Evolving from small community bank to a global bank
• Needs to move away from its legacy mainframes to an environment that
supports more robust analytics
• Growing through mergers and acquisitions
• Subject to many new regulatory requirements
• Increasing customer base and increased product offerings
Your Thoughts?
Discussion Questions
1. Discuss how the bank’s data would change under these circumstances.
2. How are their needs changing with these business changes?
3. What do you need to consider from an analyst point of view? What are
some things to consider implementing as the bank grows?
Module 1: Introduction to Big Data Analytics
Lesson 1: Summary
During this lesson the following topics were covered:
• Definition of big data
• Big data characteristics and considerations
• Unstructured data fueling big data analytics
• Analyst perspective on Data Repositories
Business Drivers for Analytics
Current Business Problems Provide Opportunities for Organizations to Become More Analytical & Data Driven
Driver: Examples
1. Desire to optimize business operations: Sales, pricing, profitability, efficiency
2. Desire to identify business risk: Customer churn, fraud, default
3. Predict new business opportunities: Upsell, cross-sell, best new customer prospects
4. Comply with laws or regulatory requirements: Anti-Money Laundering, Fair Lending, Basel II
Analytical Approaches for Meeting Business Drivers
Business Intelligence vs. Data Science
Predictive Analytics & Data Mining (Data Science)
Business value: High
Typical Techniques & Data Types
• Optimization, predictive modeling, forecasting, statistical analysis
• Structured/unstructured data, many types of sources, very large data sets
Common Questions
• What if...?
• What’s the optimal scenario for our business?
• What will happen next? What if these trends continue? Why is this happening?
Business Intelligence
Business value: Low
Typical Techniques & Data Types
• Standard and ad hoc reporting, dashboards, alerts, queries, details on demand
• Structured data, traditional sources, manageable data sets
Common Questions
• What happened last quarter?
• How many did we sell?
• Where is the problem? In which situations?
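The contrast between the two sets of questions can be sketched in a few lines. The sales figures below are hypothetical, and the straight-line least-squares fit is only one simple stand-in for the forecasting techniques the slide lists; it is not presented here as the course's method.

```python
# Hypothetical quarterly unit sales, oldest first.
SALES = [("2011-Q1", 100), ("2011-Q2", 110), ("2011-Q3", 125), ("2011-Q4", 135)]


def units_last_quarter(sales):
    """BI-style question: how many did we sell last quarter?"""
    return sales[-1][1]


def forecast_next_quarter(sales):
    """Data-science-style question: what happens next if the trend continues?

    Fits a least-squares line through (quarter index, units) and
    evaluates it one step past the last observed quarter.
    """
    ys = [units for _, units in sales]
    xs = list(range(len(ys)))
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept + slope * n


reported = units_last_quarter(SALES)
predicted = forecast_next_quarter(SALES)
```

The first function only reads back what already happened; the second builds a model from the same history and extrapolates, which is where the assumptions (and the risk of being wrong) enter.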
A Typical Analytical Architecture
[Figure: a typical analytical architecture with four numbered callouts. Labels: Data Sources; Departmental Warehouse; “Spread Marts”; Enterprise Applications; Reporting; Non-Agile Models; Prioritized Operational Processes; Static schemas accrete over time; Siloed Analytics]
Implications of Typical Architecture for Data Science
Opportunities for a New Approach to Analytics
Big Data Ecosystem
1. Data Devices
2. Data Collectors
3. Data Aggregators
4. Data Users/Buyers
[Figure: other labels shown in the ecosystem diagram: Individual; Websites; Catalog Co-Ops; Phone/TV; Retail; Media; Media Archives; Credit Bureaus; List Brokers; Financial; Banks; Delivery Service; Private Investigators/Lawyers; Government]
Considerations for Big Data Analytics
Criteria for Big Data Projects
1. Speed of decision making
2. Throughput
3. Analysis flexibility
New Analytic Architecture: Analytic Sandbox
Data assets gathered from multiple sources and technologies for analysis
• Enables high performance analytics using in-db processing
• Reduces costs associated with data replication into "shadow" file systems
• “Analyst-owned” rather than “DBA owned”
State of the Practice in Analytics: Mini-Case Study
Big Data Enabled Loan Processing at Yoyodyne
[Figure: underwriting risk level by underwriting step, Traditional Underwriting vs. Big Data Enabled Underwriting]
Your Thoughts?
Module 1: Introduction to Big Data Analytics
Lesson 2: Summary
During this lesson the following topics were covered:
• Business drivers for analytics
• Current analytical architecture
• Business intelligence vs. data science
• Drivers of big data and new big data ecosystem
Module 1: Introduction to Big Data Analytics
Lesson 3: The Data Scientist
During this lesson the following topics are covered:
• Key Roles of the New Big Data Ecosystem
• Profile of a Data Scientist
Skills Needed In the New Data Ecosystem
Your Thoughts?
Three Key Roles of the New Data Ecosystem
Role: Data Scientists
Roles Needed for Analytical Projects
Data Scientist Key Activities
• Reframe business challenges as analytics challenges
Roles on the Analytic Productivity Platform: Data Scientists, Data Engineers, Data Analyst, BI Analyst, LOB User
Profile of a Data Scientist
• Quantitative
• Technical
• Curious & Creative
• Skeptical
• Communicative & Collaborative
Module 1: Introduction to Big Data Analytics
Lesson 3: Summary
During this lesson the following topics were covered:
• Key Roles of the New Big Data Ecosystem
• Profile of a Data Scientist
Module 1: Introduction to Big Data Analytics
Lesson 4: Big Data Analytics in Industry Verticals
During this lesson we cover the following representative examples:
• Health Care
• Public Services
• Life Sciences
• IT Infrastructure
• Online Services
Big Data Analytics: Industry Examples
1 Health Care
• Reducing Cost of Care
2 Public Services
• Preventing Pandemics
3 Life Sciences
• Genomic Mapping
4 IT Infrastructure
• Unstructured Data Analysis
5 Online Services
• Social Media for Professionals
1. Big Data Analytics: Healthcare
Use of Big Data
• Dr. Jeffrey Brenner generated his own crime maps from medical billing records of 3 hospitals
Key Outcomes
• City hospitals & ERs provided expensive, low-quality care
• Reduced hospital costs by 56% by realizing that 80% of the city’s medical costs came from 13% of its residents, mainly low-income or elderly
• Now offers preventative care over the phone or through home visits
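The "80% of costs from 13% of residents" finding is a cost-concentration measure, and the arithmetic behind such a figure can be sketched as follows. The per-patient totals below are hypothetical and are not the billing data from the case; `share_of_cost_from_top` is an illustrative helper name.

```python
def share_of_cost_from_top(costs, top_fraction):
    """Fraction of total cost that comes from the most expensive `top_fraction` of patients."""
    if not costs:
        raise ValueError("costs must not be empty")
    ranked = sorted(costs, reverse=True)              # most expensive patients first
    k = max(1, round(len(ranked) * top_fraction))     # size of the top group
    return sum(ranked[:k]) / sum(ranked)


# Hypothetical annual billing totals for 20 patients: two very costly, eighteen modest.
COSTS = [50000, 30000] + [1000] * 18

share = share_of_cost_from_top(COSTS, 0.10)  # top 10% of patients = 2 people
```

With these made-up numbers the top two patients account for 80,000 of 98,000 in total cost, roughly 82%. A skew of this kind is what makes targeted preventative care for a small group worth considering.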
2. Big Data Analytics: Public Services
3. Big Data Analytics: Life Sciences
Situation
• Broad Institute (MIT & Harvard) mapping the Human Genome
4. Big Data Analytics: IT Infrastructure
5. Big Data Analytics: Online Services
Module 1: Introduction to Big Data Analytics
Lesson 4: Summary
During this lesson the following representative examples were
covered:
• Health Care
• Public Services
• Life Sciences
• IT Infrastructure
• Online Services
Check Your Knowledge
Your Thoughts?
1. What are the 3 characteristics of Big Data, and the main considerations in processing Big Data?
2. What is an analytic sandbox?
3. Explain the difference between Business Intelligence
and Data Science.
4. Describe the challenges of the current analytical
architecture for Data Scientists.
5. What are the key skill sets and behavioral characteristics
of a Data Scientist?
Module 1: Summary