
Insights Into Big Data: An Industrial Perspective

The document discusses big data analytics from an industrial perspective. It covers topics like the evolution of analytics, characteristics of big data, challenges of big data, transforming data into insights, and provides examples of analytics use cases in marketing and customer behavior. The document is intended to provide an overview of big data analytics concepts and applications.

Uploaded by

venkymit

Insights into Big Data

- An Industrial Perspective
Monika R
Data Analyst, Customer Success
Analytics is what?!

• "Any sufficiently advanced technology is indistinguishable from magic." (Arthur C. Clarke)
• We shouldn’t stop at the excitement of enjoying the magic
• We should be able to decode the details of the technology behind it
• We should go beyond presentation bias and look past the presentation layer
Table of Contents:

• Analytics in Solving Today’s Business problems.

• Characteristics of Big Data

• Big Data Challenges.

• "Dataset to Insight" inside a Business.

• Sample Case Studies.


• Marketing Analytics
• Customer Behavior Analytics
Analytics in Solving Today’s Business Problems

• Before Data Analytics
• Evolution of Analytics
• Big Data definition
• Data across every industry
Before Data Analytics
How did we start a business before the advent of computers?
Before Data Analytics
• The idea behind analytics can most likely be traced back to the days
before the age of computers, when unstructured data was the norm and
analytics was in its infancy.
• The use of data to make decisions is, of course, not a new idea; it is as old
as decision making itself.
What is Data Analytics (DA)?
• Analytics is the iterative, methodical exploration of an organization’s
data, with emphasis on statistical analysis, to enable data-driven
decision making.
• Not knowing is like not competing.
• Incorporating advanced analytics lets us extract, cleanse, validate,
analyze and report on data to improve decision-making and business
outcomes.
• NOW WE KNOW, NOW WE CAN COMPETE
Why has it grown so big now?
Today it isn’t just online and information firms that can create
products and services from analyses of data.
It’s every firm in every industry.

• Although analytics has been around for a long while, it wasn’t until
the last 10 to 15 years that its importance in business was realized.
• With analytics, organizations can now base their decisions and
strategies on data rather than on gut feelings.
• Everything in this world needs proof, and analytics helps provide it.
What is Big Data?
• 'Big Data' is a term used to describe a collection of data that is huge in size
and still growing exponentially with time.
• The reasons why companies are inclined towards adopting big data are:

Reason: Big Data benefit
• Timely: Gain instant insights from diverse data sources
• Better analytics: Improve business performance through real-time analytics
• Vast amount of data: Big data technologies manage huge amounts of data
• Insights: Provide better insights with the help of unstructured and semi-structured data
• Decision-making: Help mitigate risk and make smart decisions through proper risk analysis
Everywhere in Every Domain
• Web
• Retail
• E-Commerce
• Medical
• Financial
• Insurance
• Telecom  
• Banking
• Travel & Hospitality
Types of Data across Industries
• Medical, Healthcare and Life Sciences
• Automobile and Manufacturing
• Travel and Hospitality
• Retail and Ecommerce
• Web, Social Media and Digital
• Media
• Telecommunication
• Banking, Finance and Insurance
• Energy
• Sports, Media and Entertainment
• Niche areas like autonomous driving, image, video, etc.
Big Data and its Characteristics

• Data availability
• Characteristics
Format of Handy Data
Big Data is about these 4 Vs

Velocity is the game changer: it's NOT just how fast data is produced or changed,
BUT the speed at which it must be received, understood and processed.
Big Data Challenges

• Risks in handling Big Data
• Types of Analytics
Risk of Big Data
• There are commensurate risks that go along with using big data.
• The specific properties of big data create new types of risk that
necessitate a comprehensive strategy, enabling a company to use big
data while avoiding the pitfalls.
Principles of Analysis
• Goals of an analysis:
* To explain cause-and-effect phenomena
* To relate research to real-world events
* To predict/forecast real-world phenomena based on research
* To find answers to a particular problem
* To draw conclusions about real-world events based on the problem
* To learn a lesson from the problem
Types of Analytics
• Each type of analytics offers a different insight.
"Dataset to Insight" inside a
Business.

• Steps followed in transforming a dataset (structured/unstructured) into
useful insight.
Insights Defined:
• An insight is a novel, interesting, plausible, and understandable relation, or
set of associated relations, that is selected from a larger set of relations
derived from a data set.
• An insight must have the following key properties:
• Actionable
• Measurable
• Stable
• Reproducible
• Robust
• Enduring
Transformation Process
• Data Collection
• Data Integration
• Data Storage
• Data Pre-Processing
• Data Mining
• Data Analysis
• Reporting.
Data Collection
• The Task of Data collection begins after a research problem has been
defined and research design/plan chalked out.
• Two Types of Data:
• Primary Data
• Secondary Data
Data Collection
• Data collection methods for impact evaluation vary along a continuum:
• Quantitative data - any data that is numeric in form
• Qualitative data - any data that is descriptive
• Data collection improves the quality of expected results.
• The decision-making process is smoother, and decisions are better, when
there is data driving them.
• There are no specific data collection tools; they vary according to your
business problem.
Data Collection
Data collected in today’s business:
• Market research
• Social media data and crawling
• Sensor data
• Logs (audit)
• Reviews and feedback
• Sampling
Data Integration
• Data integration (DI) platforms are the glue between programs.
• A complete data integration solution delivers trusted data from
various sources.
• For instance, CAD and GIS data can be integrated to enhance a CAD
drawing with specialized GIS information and attributes.
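
As a rough illustration of the CAD/GIS example above, the sketch below enriches records from one source with attributes from another by joining on a shared key. The record layouts, the `id` key, and the `integrate` helper are all hypothetical; real DI platforms handle schema mapping, validation, and scale that this does not.

```python
# Hypothetical records from two sources that share an "id" key.
cad_entities = [
    {"id": "P1", "shape": "polygon", "layer": "buildings"},
    {"id": "P2", "shape": "polyline", "layer": "roads"},
]
gis_attributes = [
    {"id": "P1", "land_use": "residential", "area_m2": 420.0},
    {"id": "P3", "land_use": "park", "area_m2": 1500.0},
]


def integrate(primary, secondary, key="id"):
    """Left join: keep every primary record, enrich it when a match exists."""
    lookup = {row[key]: row for row in secondary}
    merged = []
    for row in primary:
        extra = lookup.get(row[key], {})
        merged.append({**row, **extra})
    return merged


integrated = integrate(cad_entities, gis_attributes)
for row in integrated:
    print(row)
```

A left join is used here so that no CAD entity is lost when GIS data is missing; unmatched GIS rows (such as `P3`) are simply ignored.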
Tools Used:
• Blockspring:
• It is a unique program in that it harnesses the power of services inside
familiar platforms.
• Free to use.
• Pentaho:
• It offers big data integration with zero coding, using a simple drag-and-drop UI.
• It is an enterprise solution.
Data Storage and Management

• Part of how big data got the distinction of “BIG” is that it became too much
for traditional systems to handle.
• Need: an infrastructure on which to run all the other analytics tools, as well
as a place to store and query data.
• Evolution of NoSQL
• A typical big data storage architecture:
• Direct-attached storage pools (scalable and redundant)
• Clustered network-attached storage
CAP Theorem
• Consistency: All nodes see the same data at the same time.
• Availability: Every request gets a response indicating success or failure.
• Partition tolerance: The system continues to work despite message loss
or partial failure.
Data Storage Tools:
Data Pre-Processing
• The three steps carried out in data pre-processing are:
• Data Cleaning
• Data Transformation
• Data Reduction
Why Data Cleaning?
Data in the real world is dirty:
• Incomplete: lacking attribute values, lacking certain attributes of interest, or containing
only aggregate data
• e.g., occupation=“”
• Noisy: containing errors or outliers (spelling, phonetic and typing errors, word
transpositions, multiple values in a single free-form field)
• e.g., Salary=“-10”
• Inconsistent: containing discrepancies in codes or names (synonyms and nicknames,
prefix and suffix variations, abbreviations, truncation and initials)
• e.g., Age=“42” Birthday=“03/07/1997”
• e.g., Was rating “1,2,3”, now rating “A, B, C”
• e.g., discrepancy between duplicate records
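
The three kinds of dirtiness listed above can be checked mechanically. The following is a minimal sketch, assuming hypothetical field names and a fixed reference date; it reads the slide's "03/07/1997" as 3 July 1997, which is itself an assumption since the format is ambiguous.

```python
from datetime import date

# Hypothetical records mirroring the slide's examples.
records = [
    {"name": "A", "occupation": "", "salary": 52000, "age": 42,
     "birthday": date(1997, 7, 3)},
    {"name": "B", "occupation": "clerk", "salary": -10, "age": 30,
     "birthday": date(1995, 1, 15)},
]


def age_on(birthday, today):
    """Completed years between birthday and today."""
    years = today.year - birthday.year
    if (today.month, today.day) < (birthday.month, birthday.day):
        years -= 1
    return years


def audit(record, today):
    """Return a list of data-quality issues found in one record."""
    issues = []
    if not record["occupation"]:
        issues.append("incomplete: occupation missing")
    if record["salary"] < 0:
        issues.append("noisy: negative salary")
    if record["age"] != age_on(record["birthday"], today):
        issues.append("inconsistent: age does not match birthday")
    return issues


reference_day = date(2025, 6, 1)  # assumed "today" so results are repeatable
report = {r["name"]: audit(r, reference_day) for r in records}
print(report)
```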
Why is Data Dirty?
• Incomplete data comes from:
• data values not being available when collected
• different criteria between the time the data was collected and the time it is analyzed
• human/hardware/software problems
• Noisy data comes from:
• data collection: faulty instruments
• data entry: human or computer errors
• data transmission
• Inconsistent (and redundant) data comes from:
• different data sources, hence non-uniform naming conventions/data codes
• functional dependency and/or referential integrity violations
Steps in Data Cleaning

• The following steps are followed in cleaning big data:
• Fill in missing values
• Unify date formats
• Convert nominal data to numeric
• Identify and remove outliers and noisy data
• Meta Data
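
The first three cleaning steps can be sketched in a few lines. This is an illustrative example only: the rows, the accepted date formats, the grade-to-number mapping, and the choice of mean imputation are all assumptions made for the sketch, not something the slides prescribe.

```python
from datetime import datetime
from statistics import mean

# Hypothetical raw rows with mixed date formats and a missing income.
rows = [
    {"joined": "03/07/2021", "grade": "A", "income": 52000.0},
    {"joined": "2021-08-15", "grade": "C", "income": None},
    {"joined": "15.09.2021", "grade": "B", "income": 48000.0},
]

DATE_FORMATS = ("%d/%m/%Y", "%Y-%m-%d", "%d.%m.%Y")  # assumed input formats
GRADE_CODES = {"A": 1, "B": 2, "C": 3}               # nominal -> numeric


def unify_date(text):
    """Try each known format and return an ISO 8601 date string."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def clean(rows):
    """Fill missing incomes with the mean, unify dates, encode grades."""
    known = [r["income"] for r in rows if r["income"] is not None]
    fill = mean(known)
    cleaned = []
    for r in rows:
        cleaned.append({
            "joined": unify_date(r["joined"]),
            "grade": GRADE_CODES[r["grade"]],
            "income": r["income"] if r["income"] is not None else fill,
        })
    return cleaned


cleaned = clean(rows)
for row in cleaned:
    print(row)
```

Mean imputation is only one option; a median or a model-based estimate may suit skewed data better.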
Data Transformation
Data transformation routines convert the data into forms appropriate for
mining. They are as follows:
• Smoothing: Uses binning, regression, and clustering to remove noise from
the data
• Attribute construction: New attributes are constructed from the given set
of attributes and added to it
• Aggregation: Summary or aggregation operations are performed on the
data
• Normalization: The attribute data is scaled so as to fall within a smaller
range
• Discretization: The raw values of a numeric attribute are replaced
by interval labels or conceptual labels
• Concept hierarchy generation for nominal data: Attributes are
generalized to higher-level concepts
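
Three of these routines are easy to show on small lists. The sketch below assumes min-max scaling for normalization, equal-width bins for discretization, and equal-frequency bin means for smoothing; other variants exist, and the function names and sample values are illustrative.

```python
def min_max(values, new_min=0.0, new_max=1.0):
    """Normalization: rescale values linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [new_min for _ in values]
    return [new_min + (v - lo) / span * (new_max - new_min) for v in values]


def discretize(values, labels):
    """Discretization: equal-width bins, each value replaced by a label."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / len(labels) or 1.0  # avoid zero width
    out = []
    for v in values:
        index = min(int((v - lo) / width), len(labels) - 1)
        out.append(labels[index])
    return out


def smooth_by_bin_means(values, bin_size):
    """Smoothing: sort, split into bins, replace each value by its bin mean."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        chunk = ordered[start:start + bin_size]
        smoothed.extend([sum(chunk) / len(chunk)] * len(chunk))
    return smoothed


ages = [20, 30, 40, 60]
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]

scaled = min_max(ages)
age_groups = discretize(ages, ["young", "middle", "senior"])
smoothed = smooth_by_bin_means(prices, bin_size=3)
print(scaled, age_groups, smoothed)
```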
Data Reduction
• Reduction of multitudinous amounts of data down to the meaningful parts.
• Data De-Duplication
• Sampling
• Feature Selection
• Dimensionality Reduction
• Data compression reduces the size of a file by removing redundant
information from files so that less disk space is required.
• Archiving data also reduces data on storage systems, but the approach is
quite different. 
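
A small sketch of three of the reduction techniques above, using only the Python standard library: exact-match de-duplication, simple random sampling, and lossless compression with `zlib`. The sample rows and the log-like payload are made up for illustration.

```python
import random
import zlib


def deduplicate(rows):
    """Drop rows whose fields exactly match an earlier row."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def sample(rows, k, seed=0):
    """Simple random sample without replacement (seeded for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(rows, k)


rows = [
    {"id": 1, "city": "Pune"},
    {"id": 2, "city": "Delhi"},
    {"id": 1, "city": "Pune"},  # exact duplicate of the first row
]
unique = deduplicate(rows)
subset = sample(unique, k=1)

# Highly repetitive data compresses well because redundancy is removed.
raw = ("timestamp,status\n" + "2024-01-01,OK\n" * 1000).encode("utf-8")
packed = zlib.compress(raw)
print(len(unique), len(raw), len(packed))
```

Compression is lossless here: `zlib.decompress(packed)` returns the original bytes, whereas sampling and de-duplication discard rows.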
What is Data Mining?
• Data mining (knowledge discovery from data)
• Extraction of interesting (non-trivial, implicit, previously unknown and potentially
useful) patterns or knowledge from huge amounts of data
• Data mining: a misnomer?

• Alternative names
• Knowledge discovery (mining) in databases (KDD), knowledge
extraction, data/pattern analysis, data archeology, data dredging, information
harvesting, business intelligence, etc.
Data Mining Functionalities
-What kind of patterns can be mined?
• Concept/Class Description: Characterization and Discrimination
• Data can be associated with classes or concepts.
• E.g. classes of items – computers, printers, …
concepts of customers – bigSpenders, budgetSpenders, …
• How to describe these items or concepts?
• Descriptions can be derived via
• Data characterization – summarizing the general characteristics of a
target class of data.
• E.g. summarizing the characteristics of customers who spend more than $1,000 a year
at AllElectronics. The result can be a general profile of the customers, such as 40 – 50 years old,
employed, with excellent credit ratings.
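
Data characterization of this kind amounts to filtering the target class and summarizing it. The sketch below does so for a made-up customer list; the field names and the choice of summary statistics are assumptions for illustration.

```python
from collections import Counter
from statistics import median

# Hypothetical customer records.
customers = [
    {"age": 44, "employed": True, "credit": "excellent", "spend": 1500},
    {"age": 47, "employed": True, "credit": "excellent", "spend": 2200},
    {"age": 29, "employed": False, "credit": "fair", "spend": 300},
    {"age": 41, "employed": True, "credit": "good", "spend": 1250},
]


def characterize(rows, min_spend=1000):
    """Summarize the target class: customers spending more than min_spend."""
    target = [r for r in rows if r["spend"] > min_spend]
    credit_counts = Counter(r["credit"] for r in target)
    return {
        "count": len(target),
        "median_age": median(r["age"] for r in target),
        "share_employed": sum(r["employed"] for r in target) / len(target),
        "most_common_credit": credit_counts.most_common(1)[0][0],
    }


profile = characterize(customers)
print(profile)
```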
Data Mining Functionalities
-What kind of patterns can be mined?
• Data discrimination – comparing the target class with one or a set of
comparative classes
• E.g. Compare the general features of software products whose sales increased by 10% in the last year with those
whose sales decreased by 30% during the same period
• Or both of the above

• Mining Frequent Patterns, Associations and Correlations
• Frequent itemset: a set of items that frequently appear
together in a transactional data set (e.g. milk and bread)
• Frequent subsequence: a pattern in which customers tend to purchase product A, followed by
a purchase of product B
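
To make "frequent itemset" concrete, the sketch below counts every item pair across a handful of made-up baskets and keeps those that appear in at least half of them. It is a brute-force count for illustration, not an efficient algorithm such as Apriori; the transactions and the 0.5 threshold are assumptions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets.
transactions = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
]


def frequent_itemsets(transactions, size, min_support):
    """Return itemsets of the given size with their support (fraction of baskets)."""
    counts = Counter()
    for basket in transactions:
        # sorted() gives each itemset one canonical tuple form
        for itemset in combinations(sorted(basket), size):
            counts[itemset] += 1
    total = len(transactions)
    return {
        items: n / total
        for items, n in counts.items()
        if n / total >= min_support
    }


pairs = frequent_itemsets(transactions, size=2, min_support=0.5)
print(pairs)
```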
Data Mining Functionalities
- What kinds of patterns can be mined?
• Association Analysis: find frequent patterns
• E.g. a sample analysis result – an association rule:
buys(X, “computer”) => buys(X, “software”) [support = 1%, confidence = 50%]
(If a customer buys a computer, there is a 50% chance that she will buy software;
1% of all the transactions under analysis showed that computer and software
were purchased together.)
• Association rules are discarded as uninteresting if they do not satisfy both a
minimum support threshold and a minimum confidence threshold.
• Correlation Analysis: additional analysis to find statistical correlations
between associated pairs
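
Support and confidence for a rule A => B follow directly from transaction counts: support is the fraction of transactions containing both A and B, and confidence is that fraction divided by the support of A alone. The sketch below computes both on a tiny made-up data set, so the numbers differ from the 1% / 50% in the slide; the thresholds are likewise assumed.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) = support(A and B) / support(A)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # rule never applies in this data
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base


# Hypothetical transactions.
transactions = [
    {"computer", "software"},
    {"computer"},
    {"printer"},
    {"software", "printer"},
]

rule_support = support(transactions, {"computer", "software"})
rule_confidence = confidence(transactions, {"computer"}, {"software"})

# A rule is kept only if it meets both thresholds (values assumed here).
MIN_SUPPORT, MIN_CONFIDENCE = 0.2, 0.4
is_interesting = rule_support >= MIN_SUPPORT and rule_confidence >= MIN_CONFIDENCE
print(rule_support, rule_confidence, is_interesting)
```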

Data Mining: Concepts and Techniques 40


Data Mining Functionalities
- What kinds of patterns can be mined?
• Classification and Prediction
• Classification
• The process of finding a model that describes and distinguishes the data classes
or concepts, for the purpose of being able to use the model to predict the class of
objects whose class label is unknown.
• The derived model is based on the analysis of a set of training data (data objects
whose class label is known).
• The model can be represented in classification (IF-THEN) rules, decision trees,
neural networks, etc.

• Prediction
• Predict missing or unavailable numerical data values
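
As a minimal illustration of classification, the sketch below learns a single IF-THEN rule (a one-level decision tree, often called a decision stump) from labeled training data and then predicts labels for unseen values. The training data, the `yes`/`no` labels, and the single numeric feature are assumptions; practical classifiers use many features and more robust learning procedures.

```python
def train_stump(samples):
    """Learn a threshold for: IF value >= threshold THEN 'yes' ELSE 'no'.

    samples is a list of (value, label) pairs with known class labels.
    The threshold with the fewest training errors is returned.
    """
    candidates = sorted({value for value, _ in samples})
    best_threshold, best_errors = None, None
    for t in candidates:
        errors = sum(
            ("yes" if value >= t else "no") != label
            for value, label in samples
        )
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold


def predict(threshold, value):
    """Apply the learned rule to an object whose class label is unknown."""
    return "yes" if value >= threshold else "no"


# Hypothetical training data: (age, bought the product?)
training = [(25, "no"), (32, "no"), (41, "yes"), (48, "yes"), (55, "yes")]
threshold = train_stump(training)
print(threshold, predict(threshold, 38), predict(threshold, 60))
```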

Data Mining Functionalities
• Cluster Analysis
• Class label is unknown: group data to form new classes
• Clusters of objects are formed based on the principle of maximizing
intra-class similarity & minimizing interclass similarity
• E.g. Identify homogeneous subpopulations of customers. These clusters may
represent individual target groups for marketing.
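
The idea of grouping unlabeled data can be sketched with a one-dimensional k-means loop: assign each value to its nearest center, recompute the centers, and repeat until nothing changes. The spending figures and the starting centers are made up, and k-means is only one of many clustering methods.

```python
def kmeans_1d(values, centers, iterations=10):
    """Cluster numbers around the given starting centers (basic k-means)."""
    centers = list(centers)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers, clusters


# Hypothetical yearly spend per customer.
spend = [120, 150, 130, 900, 950, 1000]
centers, clusters = kmeans_1d(spend, centers=[100, 1000])
print(centers, clusters)
```

Here the two resulting groups could be read as low-spend and high-spend customer segments.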



Case Studies

• Marketing Analytics
• Customer Behavior Analytics
Data Mining Functionalities
• Outlier Analysis
• Data that do not comply with the general behavior or model.
• Outliers are usually discarded as noise or exceptions.
• Useful for fraud detection.
• E.g. Detect purchases of extremely large amounts
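
One simple way to flag extremely large purchases is a robust score based on the median and the median absolute deviation (MAD). The sketch below uses the commonly cited 0.6745 scaling factor and 3.5 cutoff; both, along with the purchase amounts, are assumptions for illustration rather than a rule from the slides.

```python
from statistics import median


def mad_outliers(values, cutoff=3.5):
    """Return values whose modified z-score exceeds the cutoff."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return []  # no spread to measure against
    return [v for v in values if 0.6745 * abs(v - center) / mad > cutoff]


# Hypothetical purchase amounts; one is far larger than the rest.
purchases = [40, 55, 38, 61, 47, 52, 5000]
suspicious = mad_outliers(purchases)
print(suspicious)
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value barely shifts them.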

• Evolution Analysis
• Describes and models regularities or trends for objects whose
behavior changes over time.
• E.g. Identify stock evolution regularities for overall stocks and for the stocks of
particular companies.
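
A moving average is one basic way to expose a trend in data that changes over time. The sketch below smooths a made-up price series with a 3-point window; the series and the window size are assumptions, and real evolution analysis typically involves richer time-series models.

```python
def moving_average(series, window):
    """Average of each run of `window` consecutive points."""
    return [
        sum(series[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(series))
    ]


# Hypothetical daily closing prices.
prices = [10, 12, 11, 13, 15, 14, 16]
trend = moving_average(prices, window=3)
is_rising = all(later > earlier for earlier, later in zip(trend, trend[1:]))
print(trend, is_rising)
```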



Are All of the Patterns Interesting?
• Data mining may generate thousands of patterns: Not all of them
are interesting
• A pattern is interesting if it is
• easily understood by humans
• valid on new or test data with some degree of certainty
• potentially useful
• novel
• validates some hypothesis that a user seeks to confirm
• An interesting pattern represents knowledge!



Data Mining Tools
Reporting
The two main steps in reporting insights are:
• Visualization
• Storytelling

Data Visualization:
• It makes the data come to life.
• Visualization tools are a bright and easy way to convey complex data insights, and most of them
require no coding.
• The goal is to communicate information clearly and efficiently to users.
Reporting: Storytelling
• Data scientists need to be able to influence.
• Data and insights that can shape the direction of a business should be
presented clearly and in an interesting manner.
Visualization Tools Used:
The Magical Words
• Data Analytics, Data Analysis, Data
Mining, Data Science
• Let’s think about the data available to the
farmer, here’s a simplified breakdown:
1. Historic weather patterns
2. Plant breeding data and productivity for each strain
3. Fertilizer specifications
4. Pesticide specifications
5. Soil productivity data
6. Pest cycle data
7. Machinery cost, reliability and fault data
8. Water supply data
9. Historic supply and demand data
10. Market spot price and futures data
Specialized areas

●  Financial analytics
●  Retail analytics
●  Market analytics
●  Social media
●  HR Analytics
●  Customer analytics
●  Pricing analytics
