Big Data and Data Analytics

Uploaded by

Hanin Fitria
Copyright © All Rights Reserved

Big Data and Data Analytics

What is Big Data?

• Massive sets of unstructured/semi-structured data from Web traffic, social media, sensors, etc.
• Petabytes, exabytes of data
• Volumes too great for typical DBMS
• Information from multiple internal and external sources:
• Transactions
• Social media
• Enterprise content
• Sensors
• Mobile devices
What is Big Data? continued
• Companies leverage data to adapt products and
services to:
• Meet customer needs
• Optimize operations
• Optimize infrastructure
• Find new sources of revenue
• Can reveal more patterns and anomalies

• IBM estimates that by 2015, 4.4 million jobs will be created globally to support big data
• 1.9 million of these jobs will be in the United States
Where does Big Data come from?
[Figure: sources of big data. Labels in the diagram: Enterprise “Dark Data”, Email, Contracts, Transactions, Partner, Employee, Customer, Supplier, Monitoring, Sensor, Public, Commercial, Credit, Weather, Industry, Population, Economic, Social Media, Sentiment, Network]

Types of Data

• When collecting or gathering data, we collect data from individual cases on particular variables.
• A variable is a unit of data collection whose value can vary.
• Variables can be divided into types according to the level of mathematical scaling that can be carried out on the data.
• There are four types of data or levels of measurement:
1. Categorical (Nominal)
2. Ordinal
3. Interval
4. Ratio
Categorical (Nominal) data
• Nominal or categorical data is data that comprises categories that cannot be rank ordered; each category is just different.
• The categories available cannot be placed in any order, and no judgement can be made about the relative size or distance from one category to another.
• Categories bear no quantitative relationship to one another
• Examples:
- customer’s location (America, Europe, Asia)
- employee classification (manager, supervisor, associate)
• What does this mean? No mathematical operations can be performed on the categories relative to each other.
• Therefore, nominal data reflect qualitative differences rather than quantitative ones.
Nominal data
Examples:
• What is your gender? (please tick): Male / Female
• Did you enjoy the film? (please tick): Yes / No

• Systems for measuring nominal data must ensure that each category is mutually exclusive, and the system of measurement needs to be exhaustive.
• Exhaustive: the system of categories should have enough categories for all the observations
• Variables that have only two responses, i.e. Yes or No, are known as dichotomies.
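
The exhaustiveness requirement can be checked mechanically. The following is a minimal sketch, assuming a hypothetical category scheme and toy responses that are not from the slides. Storing one label per observation keeps the categories mutually exclusive, and counting (with the mode) is the only summary that nominal data supports.

```python
from collections import Counter

# Hypothetical category scheme and toy responses (illustrative only).
CATEGORIES = {"America", "Europe", "Asia", "Other"}
responses = ["Europe", "Asia", "Europe", "America", "Other", "Europe"]


def uncovered(values, categories):
    """Return observed values that the category scheme does not cover.

    An empty result means the scheme is exhaustive for these observations.
    """
    return sorted(set(values) - set(categories))


def mode(values):
    """Most frequent category; frequency counts are valid for nominal data."""
    return Counter(values).most_common(1)[0][0]


print(uncovered(responses, CATEGORIES))  # categories missing from the scheme
print(Counter(responses))                # frequency of each category
print(mode(responses))                   # most common category
```

Averaging or ranking these labels would be meaningless, which is why the sketch only counts them.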
Ordinal data
Example: How satisfied are you with the level of service you have received? (please tick)
• Very satisfied
• Somewhat satisfied
• Neutral
• Somewhat dissatisfied
• Very dissatisfied

• Ordinal data is data that comprises categories that can be rank ordered.
• As with nominal data, the distance between each category cannot be calculated, but the categories can be ranked above or below each other.
• No fixed units of measurement
• Examples:
- college football rankings
- survey responses (poor, average, good, very good, excellent)
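
Because ordinal categories can be ranked but not measured, sorting and the median are meaningful while the mean is not. A minimal sketch, assuming toy responses on the survey scale listed above; the helper name `ordinal_median` is illustrative.

```python
# Survey scale from the slide, ordered from lowest to highest.
SCALE = ["poor", "average", "good", "very good", "excellent"]
RANK = {label: position for position, label in enumerate(SCALE)}


def ordinal_median(responses):
    """Middle response by rank (the lower middle one for an even count)."""
    ordered = sorted(responses, key=RANK.__getitem__)
    return ordered[(len(ordered) - 1) // 2]


# Toy responses (illustrative only).
responses = ["good", "poor", "excellent", "good", "average"]

print(sorted(responses, key=RANK.__getitem__))  # responses in rank order
print(ordinal_median(responses))                # middle response
print(RANK["good"] > RANK["average"])           # comparisons are valid
```

The integer ranks are only used for ordering; the gap between "good" and "very good" is not assumed to equal the gap between "poor" and "average".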
Interval and ratio data
• Both interval and ratio data are examples of scale data.
• Scale data:
• data is in numeric format ($50, $100, $150)
• data that can be measured on a continuous scale
• the distance between values can be observed and, as a result, measured
• the data can be placed in rank order.
Interval data

• Ordinal data but with constant differences between observations
• Ratios are not meaningful
• Examples:
• Time – moves along a continuous measure of seconds, minutes and so on, and is without a zero point of time.
• Temperature – moves along a continuous measure of degrees and (in °C or °F) is without a true zero.
• SAT scores
Ratio data

• Ratio data is measured on a continuous scale and does have a natural zero point.
• Ratios are meaningful
• Examples:
• monthly sales
• delivery times
• Weight
• Height
• Age
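
One way to see why ratios are meaningful for ratio data but not for interval data is to change the unit of measurement. The sketch below uses the standard Celsius-to-Fahrenheit and kilogram-to-pound conversions; the sample values are illustrative assumptions.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def kg_to_lb(kilograms):
    """Convert kilograms to pounds (approximate factor)."""
    return kilograms * 2.20462


# Interval data: temperature has no true zero in Celsius or Fahrenheit,
# so the ratio depends on the unit chosen.
ratio_celsius = 20 / 10
ratio_fahrenheit = c_to_f(20) / c_to_f(10)

# Differences stay comparable: equal Celsius steps are equal Fahrenheit steps.
step_low = c_to_f(20) - c_to_f(10)
step_high = c_to_f(30) - c_to_f(20)

# Ratio data: weight has a natural zero, so the ratio survives a unit change.
ratio_kg = 20 / 10
ratio_lb = kg_to_lb(20) / kg_to_lb(10)

print(ratio_celsius, ratio_fahrenheit)  # the two ratios disagree
print(step_low, step_high)              # the two steps agree
print(ratio_kg, ratio_lb)               # the two ratios agree
```

"Twice as hot" therefore has no fixed meaning on these temperature scales, while "twice as heavy" does.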
Data for Business Analytics (continued)
Classifying Data Elements in a Purchasing Database

[Figure 1.2: columns of a purchasing database, each labeled with its measurement level (Categorical, Interval or Ratio)]

If there were a field (column) for Supplier Rating (Excellent, Good, Acceptable, Bad), that data would be classified as Ordinal.
Big Data Characteristics

VOLUME
Growing quantity of data
e.g. social media, behavioral, video

VELOCITY
Quickening speed of data
e.g. smart meters, process monitoring

VARIETY
Increase in types of data
e.g. app data, unstructured data

Gartner, Feb 2001
Which Big Data characteristic is the
biggest issue for your organization?

• Variety of data: 48%
• Volume of data: 35%
• Velocity of data: 16%

Source: Getting Value from Big Data, Gartner Webinar, May 2012
Volume

• Petabytes, exabytes of data
• Volumes too great for typical DBMS
Volume - Bytes Defined

• eBay data warehouse (2010) = 10 PB
• eBay will increase this 2.5 times by 2011
• Teradata > 10 PB

Megabyte: 2^20 bytes or, loosely, one million bytes
Gigabyte: 2^30 bytes or, loosely, one billion bytes
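
The byte definitions can be worked through numerically. This is a minimal sketch: the petabyte value of 2^50 bytes is an assumption that extends the slide's binary convention for megabytes and gigabytes, and the eBay figures are the ones quoted above.

```python
MEGABYTE = 2**20  # from the slide
GIGABYTE = 2**30  # from the slide
PETABYTE = 2**50  # assumption: same binary convention extended to PB


def to_petabytes(n_bytes):
    """Express a byte count in (binary) petabytes."""
    return n_bytes / PETABYTE


ebay_2010_bytes = 10 * PETABYTE            # 10 PB warehouse in 2010
ebay_2011_bytes = 2.5 * ebay_2010_bytes    # planned 2.5x increase

print(MEGABYTE)                        # exact bytes in a megabyte
print(MEGABYTE / 10**6)                # how far "one million" is off
print(ebay_2010_bytes // GIGABYTE)     # the 2010 warehouse in gigabytes
print(to_petabytes(ebay_2011_bytes))   # projected size in petabytes
```

The "loosely one million" shorthand is about 4.9% low for a megabyte, and the gap widens with each larger unit.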
Velocity
• Massive amount of streaming data
Variety
• Massive sets of unstructured/semi-structured data from Web traffic, social media, sensors, and so on
Which source of data represents
the most immediate opportunity?

Source: Getting Value from Big Data, Gartner Webinar, May 2012
Big Data Opportunities
• Making better informed decisions
e.g. strategies, recommendations
• Discovering hidden insights
e.g. anomalies forensics, patterns, trends
• Automating business processes
e.g. complex events, translation
Which is the biggest opportunity for Big Data in your organization?
Through 2017:

• 85% of Fortune 500 organizations will be unable to exploit big data for competitive advantage.
• Business analytics needs will drive 70% of investments in the expansion and modernization of information infrastructure.

Source: Getting Value from Big Data, Gartner Webinar, May 2012
Identifying Insurance Fraud
Auto Insurance
• Opportunity
• Save and make money by reducing fraudulent auto insurance claims
• Data & Analytics
• Predictive analytics against years of historical claims and coverage data
• Text mining adjuster reports for hidden clues, e.g. missing facts, inconsistencies, changed stories
• Results
• Improved success rate in pursuing fraudulent claims from 50% to 88%; reduced fraudulent claim investigation time by 95%
• Marketing to individuals with low propensity for fraud

What “dark data”** is just lying around that can transform business processes?

**Dark data: operational data that is not being used. Consulting and market research company Gartner Inc. describes dark data as "information assets that organizations collect, process and store in the course of their regular business activity, but generally fail to use for other purposes."
Quality Improvement
• Opportunity
• Move from manual to automated inspection of burger bun production to ensure and improve quality
• Data & Analytics
• Photo-analyze over 1000 buns per minute for color, shape and seed distribution
• Continually adjust ovens and process automatically
• Result
• Eliminate 1000s of pounds of wasted product per year; speed production; save energy; reduce manual labor costs

Is the company using all of its “senses” to observe, measure and optimize business processes?

Improving Corporate Image
• Opportunity
• Improve reputation, brand and buzz by tapping social media
• Data & Analytics
• Continually scanning the Twitterverse for mentions of their business
• Integrating tweeters with their robust customer management system
• Results
• Saw a tweet from a top customer lamenting a late flight that left no time to dine at Morton’s
• Tuxedo-clad waiter waiting for him when he landed with a bag containing his favorite steak, prepared the way he normally likes it with all the fixin’s

How can the company listen, analyze and respond in real time?


BIG DATA BIG REWARDS

Big Data Big Rewards 1.pdf

Business Analytics
Business Analytics/Business Intelligence
• Business analytics/business intelligence (BI) is a broad category of applications, technologies, and processes for:
• gathering,
• storing,
• accessing, and
• analyzing data
to help business users make better decisions.
Things Are Getting More Complex
• Many companies are performing new kinds of analytics (sentiment analysis**, etc.) to better and more quickly understand and respond to what customers are saying about them and their products.
• The cloud and appliances are being used as data stores
• Advanced analytics are growing in popularity and importance

**Sentiment analysis (also known as opinion mining) refers to the use of natural language processing, text analysis and computational linguistics to identify and extract subjective information in source materials.
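
To make the idea of sentiment analysis concrete, here is a deliberately simplified sketch that scores text against a tiny word list. The lexicon, the sample messages and the function names are all hypothetical; production systems rely on much richer natural language processing than word counting.

```python
import re

# Tiny hypothetical lexicon (illustrative only, not a real resource).
POSITIVE = {"great", "love", "excellent", "good", "fast"}
NEGATIVE = {"late", "bad", "terrible", "slow", "broken"}


def sentiment_score(text):
    """Count positive words minus negative words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


def sentiment_label(text):
    """Map the score to a coarse label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for message in [
    "I love the fast delivery, great service",
    "Terrible, my order arrived late and broken",
    "The package arrived on Tuesday",
]:
    print(sentiment_label(message), "<-", message)
```

A word-count approach like this misses negation and sarcasm ("not good", "great, another delay"), which is part of why the slide lists computational linguistics alongside text analysis.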
Analytics Models
[Figure: analytics types plotted by VALUE (vertical axis) against DIFFICULTY (horizontal axis), moving from information to optimization]
• Descriptive Analytics: What happened? (Hindsight)
• Diagnostic Analytics: Why did it happen? (Insight)
• Predictive Analytics: What will happen? (Foresight)
• Prescriptive Analytics: How can we make it happen? (Optimization)
Descriptive Analytics
• Descriptive analytics, such as reporting/OLAP,
dashboards, and data visualization, have been widely
used for some time.
• They are the core of traditional BI.

What has occurred?

Descriptive analytics, such as data visualization, is important in helping users interpret the output from predictive and prescriptive analytics.
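
A small example of the "what has occurred?" question: the sketch below summarizes toy monthly sales figures (invented for illustration) and prints a text bar chart as a stand-in for a dashboard. It uses only Python's standard `statistics` module.

```python
import statistics

# Toy monthly sales figures (illustrative only).
monthly_sales = {"Jan": 120, "Feb": 135, "Mar": 150, "Apr": 110}


def describe(values):
    """Summarize what has already happened in a list of numbers."""
    values = list(values)
    return {
        "count": len(values),
        "total": sum(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


summary = describe(monthly_sales.values())
print(summary)

# A crude text "chart": one '#' per 10 units sold.
for month, amount in monthly_sales.items():
    print(f"{month} {'#' * (amount // 10)} {amount}")
```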
Predictive Analytics
• Algorithms for predictive analytics, such as regression analysis, machine
learning, and neural networks, have also been around for some time.

What will occur?

• Marketing is the target for many predictive analytics applications.


• Descriptive analytics, such as data visualization, is important in helping
users interpret the output from predictive and prescriptive analytics.
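
Regression analysis, the first algorithm named on this slide, can be sketched in a few lines. This is a minimal ordinary least squares fit of a straight line, assuming toy advertising and sales figures invented for illustration; the function names are not from any library.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predicted y for a new x under the fitted line."""
    return slope * x + intercept


# Toy data (illustrative only): advertising spend vs. units sold.
ad_spend = [1, 2, 3, 4, 5]
units_sold = [12, 14, 16, 18, 20]

slope, intercept = fit_line(ad_spend, units_sold)
print(slope, intercept)
print(predict(6, slope, intercept))  # forecast for an unseen spend level
```

The fitted line answers "what will occur?" only to the extent that the past relationship continues to hold.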
Organizational Transformation
• Analytics are a competitive requirement
• For BI-based organizations, the use of BI/analytics is a requirement for successfully competing in the marketplace.
• TDWI report on Big Data Analytics found that 85% of respondents indicated that their firms would be using advanced analytics within three years
• IBM/MIT Sloan Management Review research study found that top performing companies in their industry are much more likely to use analytics rather than intuition across the widest range of possible decisions.
Complex Systems Require Analytics
• Tackle complex problems and provide individualized solutions
• Products and services are organized around the needs of individual customers
• Dollar value of interactions with each customer is high
• There is a high level of interaction with each customer
• Examples: IBM, World Bank, Halliburton
Volume Operations Require Analytics
• Serve high-volume markets through standardized products and services
• Each customer interaction has a low dollar value
• Customer interactions are generally conducted through technology rather than person-to-person
• Are likely to be analytics-based
• Examples: Amazon.com, eBay, Hertz
The Nature of the Industry
• Online retailers like Amazon.com and Overstock.com are high-volume operations that rely on analytics to compete.
• When you enter their sites a cookie is placed on your PC and all
clicks are recorded.
• Based on your clicks and any search terms, recommendation
engines decide what products to display.
• After you purchase an item, they have additional information
that is used in marketing campaigns.
• Customer segmentation analysis is used in deciding what
promotions to send you.
• How profitable you are influences how the customer care center
treats you.
• A pricing team helps set prices and decides what prices are
needed to clear out merchandise.
• Forecasting models are used to decide how many items to
order for inventory.
• Dashboards monitor all aspects of organizational performance
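
As one illustration of the forecasting step in the list above, the following sketch predicts next week's demand as the average of the most recent weeks. The moving-average method, the window size and the sales history are assumptions chosen for simplicity; the slides do not say which forecasting models retailers actually use.

```python
def moving_average_forecast(history, window=3):
    """Forecast the next value as the mean of the last `window` values."""
    if window <= 0 or len(history) < window:
        raise ValueError("need at least `window` observations")
    recent = history[-window:]
    return sum(recent) / window


# Toy weekly unit sales for one item (illustrative only).
weekly_units = [40, 44, 38, 42, 46]

forecast = moving_average_forecast(weekly_units)
print(forecast)  # suggested order quantity basis for next week
```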
Knowledge Requirements for Advanced Analytics
Business Domain

Data Modeling

• Choosing the right data to include in models is important.
• It is important to have some thoughts as to what variables might be related.
• Domain knowledge is necessary to understand how they can be used. The role of the Business Analyst is crucial.
• Consider the story of the relationship between beer and diapers in the market basket of young males in convenience stores.
• You still have to decide (or experiment to discover) whether it is better to put them together or spread them across the store (in the hope that other things will be bought while walking the aisles).
The findings were that men between 30 and 40 years of age, shopping between 5pm and 7pm on Fridays, who purchased diapers were most likely to also have beer in their carts. This motivated the grocery store to move the beer aisle closer to the diaper aisle and instantaneously, a 35%
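
The beer-and-diapers story is an example of market basket analysis, where the strength of a pairing is usually expressed as support, confidence and lift. The sketch below computes these three measures on a handful of invented transactions; the baskets are illustrative and do not reproduce the study described above.

```python
# Toy transactions (illustrative only).
transactions = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets with the antecedent, the share that also hold the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


def lift(antecedent, consequent, baskets):
    """Confidence relative to how common the consequent is overall."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)


print(support({"diapers", "beer"}, transactions))
print(confidence({"diapers"}, {"beer"}, transactions))
print(lift({"diapers"}, {"beer"}, transactions))
```

A lift above 1 suggests the items appear together more often than chance would predict, but deciding what to do with that finding (shelve them together or apart) still takes the domain knowledge the slide emphasizes.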
Visualization
Visualization: Acquisition of Insight
• Many people and institutions possess data that may ‘hide’ fundamental relations
• Realtors
• Bankers
• Air traffic controllers
• Fraud investigators
• Engineers
• They want to be able to view some graphical representation of that data, maybe interact with it, and then be able to say "aha!"

Example: Fraud Detection

• The Serious Fraud Office (SFO) suspected mortgage fraud
• The SFO provided 12 filing cabinets of data
• After 12 person-years a suspect was identified
• The suspect was arrested, tried and convicted
Example: Fraud Detection continued

• The data was supplied in electronic form
• A visualization tool (Netmap) was used to examine the data
• After 4 person-weeks the same suspect was identified
• A master criminal behind the fraud was also identified
Is Information Visualization Useful?
Drugs and Chips
Texas Instruments

Manufactures microprocessors on silicon wafers that are routed through 400 steps over many weeks. This process is monitored, gathering 140,000 pieces of information about each wafer. Somewhere in that heap of data there can be warnings about things going wrong, so a bug can be detected early, before bad chips are made. TI uses visualization tools to make the detection process easier.

Eli Lilly

Has 1500 scientists using an advanced information visualization tool (Spotfire) for decision
making. “With its ability to represent multiple sources of information and interactively change your
view, it’s helpful for homing in on specific molecules and deciding whether we should be doing
further testing on them”
Sheldon Ort of Eli Lilly, speaking to Fortune
The Cholera Epidemic, London 1854

Dr. John Snow, medical officer for London, investigated the cholera epidemic of 1854 in Soho. He mapped the deaths and noted that the deaths, indicated by points, tended to occur near the Broad Street pump. Closure of the pump coincided with a reduction in cholera.
Challenger Disaster
• On 28th January 1986 the space shuttle Challenger exploded, and
seven astronauts died, because two rubber O-Rings leaked.

• The previous day, engineers who designed the rocket opposed the launch, concerned that the O-Rings would not seal at the forecast temperature (25 to 29°F).

• After much discussion, the decision was taken to go ahead.

• Cause of the accident:
• An inability to assess the link between cool temperature and O-Ring damage on earlier flights.
• Many charts poorly presented
Visualization

• Refers to the innovative use of images and interactive technology to explore large, high-density datasets
• Helps users see patterns and relationships that would be difficult to see in text lists
• Rich graphs, charts
• Dashboards
• Maps
• Is increasingly being used to identify insights into both structured and unstructured data for such areas as
• operational efficiencies
• profitability
• strategic planning

Video Tableau
Examples
• Geo data mapping
• Treemap
• Population “Trendalyzer”

Introduction to Information Visualization - Fall 2012

Data Analytics- Machine Learning
