BUSINESS ANALYTICS
Nagendra B.V
AGENDA
BUSINESS ANALYTICS
"In God we trust; all others must bring data."
- W. Edwards Deming
2. Technology
3. Data Science
BUSINESS
TECHNOLOGY
CONTEXT
ANALYTICS MATURITY MODEL
[Figure: value addition plotted against analytical capability; levels listed from lowest to highest]
1. Descriptive Analytics: What happened in the past?
2. Predictive Analytics: What will happen in the future?
3. Prescriptive Analytics: What is the best action?
DATA TYPES
By measurement scale: Nominal, Ordinal, Interval, Ratio
By source: Primary, Secondary
By structure: Structured, Unstructured
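The four measurement scales differ in which summaries make sense for them. The sketch below is only an illustration: the survey values and variable names are made up, and it uses Python's standard `statistics` module to apply one summary that is commonly considered valid at each scale.

```python
from statistics import mean, median, mode

# Hypothetical survey records, invented for this illustration.
city = ["Pune", "Delhi", "Pune", "Mumbai"]  # nominal: labels with no order
satisfaction = [1, 3, 2, 3, 3]              # ordinal: ranked 1 (low) to 3 (high)
temperature_c = [20.0, 25.0, 30.0]          # interval: differences matter, zero is arbitrary
income = [100.0, 200.0, 300.0]              # ratio: true zero, so ratios are meaningful

most_common_city = mode(city)                # counting is all a nominal scale supports
typical_satisfaction = median(satisfaction)  # ordering allows a median
average_temperature = mean(temperature_c)    # equal intervals allow a mean
income_ratio = income[2] / income[0]         # "three times as much" needs a ratio scale

print(most_common_city, typical_satisfaction, average_temperature, income_ratio)
```

Each higher scale also supports the summaries of the scales below it, so a ratio variable can be summarized by mode, median, or mean.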
ACTIONABLE INSIGHTS
Data analysis is crucial to the success of most companies in today’s data-driven business
world. New methods and accompanying software have been developed under the name data
mining. Data mining attempts to discover patterns, trends and relationships among data,
especially non-obvious and unexpected patterns.
Examples:
1. People who purchased milk also tend to purchase whole wheat bread
2. Cars built on Mondays at 10 AM on production line No. 5 using parts from supplier ABC
have significantly more defects than average
DATA WAREHOUSE
A data warehouse is a large database that is designed and structured specifically to study
patterns in data, that is, to enable data mining. It is not the same as the databases
companies use for their day-to-day operations.
DATAMART
A data mart is essentially a scaled down data warehouse, or part of an overall data warehouse, that is
structured specifically for one part of an organization. Virtually all large organizations, and many smaller
ones, have developed data warehouses or data marts in the past decade or two to enable them to better
understand their business, customers, suppliers and processes.
Once a data warehouse is in place, analysts can begin to mine the data with a collection of
methodologies and accompanying software. Some of the primary methodologies are:
1. Classification
2. Prediction
3. Cluster Analysis
4. Market Basket Analysis
5. Forecasting
CRISP-DM METHODOLOGY
WHAT IS MACHINE LEARNING
Machine learning algorithms are described as
learning a target function (f) that best maps
input variables (X) to an output variable (Y):
Y = f(X)
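As one concrete reading of "learning f", the sketch below fits a straight line to a handful of (X, Y) pairs by ordinary least squares and then uses the fitted function on a new input. The data values and the helper name `fit_line` are assumptions made for illustration; real problems usually need richer models than a line.

```python
def fit_line(xs, ys):
    """Estimate slope and intercept of y = slope * x + intercept by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data generated from Y = 2X + 1.
X = [1.0, 2.0, 3.0, 4.0]
Y = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(X, Y)


def f(x):
    """The learned approximation of the target function."""
    return slope * x + intercept


prediction = f(5.0)  # predict Y for an input that was not in the training data
print(slope, intercept, prediction)
```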
MACHINE LEARNING
Supervised Learning
Unsupervised Learning
Reinforcement Learning
SUPERVISED LEARNING
Supervised Machine Learning
The majority of practical machine learning uses supervised learning. Supervised learning is where you have input
variables (X) and an output variable (Y) and you use an algorithm to learn the mapping function from the input to the
output.
Y = f(X)
The goal is to approximate the mapping function so well that when you have new input data (X) you can predict
the output variable (Y) for that data. It is called supervised learning because the process of an algorithm learning from
the training dataset can be thought of as a teacher supervising the learning process. We know the correct answers; the
algorithm iteratively makes predictions on the training data and is corrected by the teacher. Learning stops when the
algorithm achieves an acceptable level of performance. Supervised learning problems can be further grouped into
regression and classification problems.
Classification: A classification problem is when the output variable is a category, such as red or blue, or disease and no
disease.
Regression: A regression problem is when the output variable is a real value, such as dollars or weight.
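A classification problem can be sketched with a nearest-neighbour rule: a new case receives the label of the most similar labelled training case. The two numeric features, the training records, and the function name `classify` below are hypothetical, and nearest neighbour is just one of many possible classifiers.

```python
import math

# Hypothetical training set: (feature vector, known label).
training = [
    ((1.0, 1.0), "no disease"),
    ((1.5, 2.0), "no disease"),
    ((5.0, 5.0), "disease"),
    ((6.0, 5.5), "disease"),
]


def classify(point):
    """Return the label of the training example closest to `point`."""
    _, label = min(training, key=lambda example: math.dist(example[0], point))
    return label


print(classify((1.2, 1.4)))
print(classify((5.5, 5.0)))
```

The "teacher" here is the set of known labels: the rule can only be as good as the labelled examples it is given.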
UNSUPERVISED LEARNING
Unsupervised learning is where you only have input data (X) and no corresponding output
variables. The goal for unsupervised learning is to model the underlying structure or distribution
in the data in order to learn more about the data. This is called unsupervised learning
because, unlike supervised learning above, there are no correct answers and there is no teacher.
Algorithms are left to their own devices to discover and present the interesting structure in the
data. Unsupervised learning problems can be further grouped into clustering and association
problems.
Clustering, association mining, principal component analysis, factor analysis, multidimensional
scaling (MDS) and recency-frequency-monetary (RFM) analysis are some of the well-known
unsupervised analytical models.
SEMI SUPERVISED LEARNING
Problems where you have a large amount of input data (X) and only some of
the data is labeled (Y) are called semi-supervised learning problems. These
problems sit between supervised and unsupervised learning. A good
example is a photo archive where only some of the images are labeled (e.g. dog,
cat, person) and the majority are unlabeled. Many real-world machine learning
problems fall into this area. This is because it can be expensive or time
consuming to label data, as it may require access to domain experts, whereas
unlabeled data is cheap and easy to collect and store.
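One simple way to use a few labels on many unlabeled items is to give each unlabeled item the label of its nearest labeled neighbour, a basic form of pseudo-labeling. The sketch below assumes each photo has already been reduced to a two-number feature vector; the vectors, labels, and the function name `propagate_labels` are invented for illustration, and practical semi-supervised methods are considerably more careful.

```python
import math

# Hypothetical photo archive: a few labeled feature vectors, many unlabeled ones.
labeled = {(0.0, 0.0): "cat", (10.0, 10.0): "dog"}
unlabeled = [(1.0, 0.5), (9.0, 9.5), (0.5, 1.5)]


def propagate_labels(labeled, unlabeled):
    """Give each unlabeled point the label of its nearest labeled point."""
    result = dict(labeled)
    for point in unlabeled:
        nearest = min(labeled, key=lambda known: math.dist(known, point))
        result[point] = labeled[nearest]
    return result


all_labels = propagate_labels(labeled, unlabeled)
for point, label in all_labels.items():
    print(point, label)
```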
CLUSTERING
In cluster analysis, the goal is to group the observations in the data
into meaningful clusters such that every observation in a cluster is more
similar to other observations in the same cluster than it is to observations in
other clusters. Cluster analysis can be related to the problem of
density estimation.
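The grouping idea can be sketched with k-means, one common clustering algorithm: assign each observation to its nearest centroid, move each centroid to the mean of its cluster, and repeat. The points, the starting centroids, and the fixed iteration count below are assumptions chosen to keep the example small and deterministic.

```python
import math


def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to centroids and re-centering them."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical observations forming two visible groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]

centroids, clusters = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(clusters)
```

No labels are supplied anywhere; the two groups emerge only from the distances between observations.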
UNSUPERVISED LEARNING APPLICATIONS
Customer Segmentation
Anomaly detection
Image processing and image segmentation
Dimensionality reduction
Document clustering
Relationship identification
CRISP-DM PHASES
Unsupervised Learning:
Association Mining
Market Basket Analysis (product bundling)
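Market basket analysis is usually summarized with two measures: support (how often a set of items appears together) and confidence (how often the consequent appears given the antecedent). The sketch below computes both for a rule such as "milk implies bread"; the transactions are made up, and the helper names `support` and `confidence` are illustrative rather than taken from any particular library.

```python
# Hypothetical shopping baskets, one set of items per transaction.
transactions = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"milk", "butter"},
    {"bread", "butter"},
]


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction that also contain `consequent`."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)


milk_bread_support = support({"milk", "bread"})
milk_to_bread_confidence = confidence({"milk"}, {"bread"})
print(milk_bread_support, milk_to_bread_confidence)
```

Rules with both high support and high confidence are the usual candidates for product bundling.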