
FUNDAMENTALS OF

BUSINESS ANALYTICS
Nagendra B.V
AGENDA
BUSINESS ANALYTICS
“In God we trust; all others must bring data.”
-W. Edwards Deming

“It is a capital mistake to theorize before one has data.”
-Sherlock Holmes
LET’S QUIZ
Number of mobile messages sent through WhatsApp in a day?

Number of available applications in the Google Play Store?

Number of monthly active Twitter users worldwide?

Number of monthly active Facebook users worldwide?

Number of Tweets per day?


THIS IS THE REALITY
WhatsApp : 65 billion messages per day

Google Play Store : 3.5 million applications as of December 2017

Twitter Users : 300 million users

Facebook Users : over 2 billion users

Tweets : 500 million tweets per day


BUSINESS ANALYTICS
Business analytics is a set of statistical and operations research techniques,
artificial intelligence, information technology and management strategies used
for framing a business problem, collecting data and analysing the data to create
value for organizations.
WHY ANALYTICS?
With the advent of technology and robust workflow systems, organizations
today collect voluminous data, both structured and unstructured, through
various data sources. The process of extracting actionable insights from such
voluminous data is often termed big data analytics in the world of data
science. An organization must be able to extract actionable insights from
this large volume of collected data, which in turn can become part of its
metrics and measurement system. Every actionable insight must be mapped to the
organization’s profitability to gauge the effectiveness of the metric being
measured.
SCENARIOS
1. Direct marketers analyze enormous customer databases to see which
customers are likely to respond to various products and promotions
2. Hotels and airlines adapt pricing strategies based on historical
reservation data
3. Financial services need to detect anomalies in accounting
4. Customer sentiments impact stock performance
5. Telecom companies need to adapt customer retention strategies
6. Banks analyze transactions to identify fraudulent ones
7. Network teams analyze past records to separate spam from legitimate messages (ham)
8. Banks analyze historical lending data to identify defaulters
9. Manufacturers analyze historical sales data to sense future demand
10. Hotel, tourism and hospitality sectors analyze customer data to identify
seasonality and customer sentiment
11. Human resources teams analyze employee data to identify employees who are
likely to leave (attrite)
12. Social media platforms analyze customer data to make recommendations
13. Healthcare industries/hospitals analyze patient records to detect diseases
14. Weather prediction
15. Market segmentation/customer segmentation
16. Commodity price prediction and advising farmers on the kind of crops to
be grown
17. Ecommerce companies optimize revenue by identifying customers who are
likely to cancel orders
18. Predicting delivery time for improved customer satisfaction
19. Predicting crimes and their nature
20. Predicting cancer stages and remediation
21. Revenue and profit optimization
22. Process optimization
23. Productivity improvement
COMPONENTS OF BUSINESS ANALYTICS

Business analytics comprises the following components:

1. Business Context
2. Technology
3. Data Science
ANALYTICS MATURITY MODEL

Levels, in order of increasing analytical capability and value addition:

1. Descriptive Analytics: What happened in the past?
2. Predictive Analytics: What will happen in future?
3. Prescriptive Analytics: What is the best action?
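
The three questions of the maturity model can be illustrated on a toy example. This is only a sketch: the demand figures, the naive linear-trend forecast, and the `profit` function with its `price` and `cost` parameters are all hypothetical assumptions, not part of the original material.

```python
from statistics import mean

# Hypothetical monthly unit sales (illustrative data only).
past_demand = [100, 110, 120, 130]

# Descriptive analytics: what happened in the past?
average_demand = mean(past_demand)

# Predictive analytics: what will happen in future?
# Assumption: a naive linear trend from the first to the last observation.
trend = (past_demand[-1] - past_demand[0]) / (len(past_demand) - 1)
forecast = past_demand[-1] + trend

# Prescriptive analytics: what is the best action?
# Assumption: a simple profit model with a hypothetical price and unit cost.
def profit(stock, demand, price=5.0, cost=3.0):
    sold = min(stock, demand)          # cannot sell more than is stocked or demanded
    return price * sold - cost * stock

candidate_stock_levels = range(100, 181, 10)
best_stock = max(candidate_stock_levels, key=lambda s: profit(s, forecast))

print(average_demand, forecast, best_stock)
```

Each step builds on the previous one: the summary describes history, the forecast extrapolates it, and the stock decision uses the forecast to pick an action.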
DATA TYPES

By measurement scale: Nominal Data, Ordinal Data, Interval Data, Ratio Scales
By source: Primary, Secondary
By structure: Structured, Unstructured
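
The measurement scale determines which summaries are meaningful. The sketch below picks a commonly used measure of central tendency for each scale (mode for nominal, median for ordinal, mean for interval and ratio). The sample values, the `ORDER` list, and the `summarize` helper are hypothetical and for illustration only.

```python
from statistics import mean, median_low, mode

# Hypothetical ordering of the ordinal levels used below.
ORDER = ["low", "medium", "high"]

def summarize(values, scale):
    """Return a central-tendency summary suited to the measurement scale."""
    if scale == "nominal":
        # Categories have no order: only the most frequent value is meaningful.
        return mode(values)
    if scale == "ordinal":
        # Categories are ordered: take the median of their ranks.
        ranks = [ORDER.index(v) for v in values]
        return ORDER[median_low(ranks)]
    if scale in ("interval", "ratio"):
        # Differences are meaningful: the arithmetic mean applies.
        return mean(values)
    raise ValueError(f"unknown scale: {scale}")

city = ["Pune", "Delhi", "Pune", "Chennai"]                  # nominal
satisfaction = ["low", "high", "medium", "high", "medium"]   # ordinal
temperature_c = [20, 25, 30]                                 # interval
revenue = [100.0, 200.0, 300.0]                              # ratio

print(summarize(city, "nominal"))
print(summarize(satisfaction, "ordinal"))
print(summarize(temperature_c, "interval"))
print(summarize(revenue, "ratio"))
```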
ACTIONABLE INSIGHTS

Data analysis is crucial to the success of most companies in today’s data-driven business
world. New methods and accompanying software have been developed under the name data
mining. Data mining attempts to discover patterns, trends and relationships among data,
especially non-obvious and unexpected patterns.

Examples:

1. People who purchased milk also tend to purchase whole wheat bread
2. Cars built on Mondays at 10 AM on production line No. 5 using parts from supplier ABC
have significantly more defects than average
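
A pattern such as the milk-and-bread example is usually quantified with support, confidence and lift. The following is a minimal sketch on made-up shopping baskets; the `rule_metrics` helper and the data are hypothetical, and the code assumes the antecedent and consequent each appear in at least one basket.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift for the rule antecedent -> consequent."""
    n = len(transactions)
    with_antecedent = sum(1 for t in transactions if antecedent in t)
    with_consequent = sum(1 for t in transactions if consequent in t)
    with_both = sum(1 for t in transactions
                    if antecedent in t and consequent in t)

    support = with_both / n                      # share of baskets containing both items
    confidence = with_both / with_antecedent     # P(consequent | antecedent)
    lift = confidence / (with_consequent / n)    # > 1 suggests a positive association
    return support, confidence, lift

# Hypothetical baskets (illustrative data only).
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"milk"},
    {"bread"},
    {"eggs"},
]

support, confidence, lift = rule_metrics(baskets, "milk", "bread")
print(support, confidence, lift)
```

A lift above 1 indicates that the two items occur together more often than would be expected if they were bought independently.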
DATA WAREHOUSE

A data warehouse is a large database that is designed specifically to study patterns in data. A
data warehouse is not the same as the databases companies use for their day-to-day
operations. A data warehouse should:

1. Combine data from multiple sources to discover as many relationships as possible
2. Contain accurate and consistent data
3. Be structured to enable quick and accurate responses to a variety of queries
4. Allow follow-up responses to specific relevant questions

A data warehouse represents a type of database that is specifically structured to enable data
mining.
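
As a toy illustration of combining data from multiple sources and answering an analytical query, the sketch below uses an in-memory SQLite database as a stand-in for a real warehouse. The table names, columns and rows are hypothetical; a production warehouse would be far larger and would typically run on a dedicated platform.

```python
import sqlite3

# In-memory database standing in for a warehouse (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);

-- Rows as they might arrive from two separate operational systems.
INSERT INTO customers VALUES (1, 'North'), (2, 'South'), (3, 'North');
INSERT INTO sales VALUES (1, 1, 100.0), (2, 2, 50.0), (3, 3, 25.0), (4, 1, 75.0);
""")

# Analytical query: total sales per region, combining both sources.
rows = conn.execute("""
SELECT c.region, SUM(s.amount) AS total
FROM sales AS s
JOIN customers AS c ON c.customer_id = s.customer_id
GROUP BY c.region
ORDER BY c.region
""").fetchall()
conn.close()

print(rows)
```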
DATAMART

A data mart is essentially a scaled-down data warehouse, or part of an overall data warehouse, that is
structured specifically for one part of an organization. Virtually all large organizations, and many smaller
ones, have developed data warehouses or data marts in the past decade or two to enable them to better
understand their business, customers, suppliers and processes.

Once a data warehouse is in place, analysts can begin to mine the data with a collection of
methodologies and accompanying software. Some of the primary methodologies are:

1. Classification
2. Prediction
3. Cluster Analysis
4. Market Basket Analysis
5. Forecasting
CRISP-DM METHODOLOGY
WHAT IS MACHINE LEARNING
Machine learning algorithms are described as
learning a target function (f) that best maps
input variables (X) to an output variable (Y):

Y=f(X)
MACHINE LEARNING

Supervised Learning

Unsupervised Learning

Reinforcement Learning
SUPERVISED LEARNING
Supervised Machine Learning
The majority of practical machine learning uses supervised learning. Supervised learning is where you have input
variables (X) and an output variable (Y) and you use an algorithm to learn the mapping function from the input to the
output.
Y = f(X)
The goal is to approximate the mapping function so well that, when you have new input data (X), you can predict
the output variable (Y) for that data. It is called supervised learning because the process of an algorithm learning from
the training dataset can be thought of as a teacher supervising the learning process. We know the correct answers; the
algorithm iteratively makes predictions on the training data and is corrected by the teacher. Learning stops when the
algorithm achieves an acceptable level of performance. Supervised learning problems can be further grouped into
regression and classification problems.
Classification: A classification problem is when the output variable is a category, such as red or blue, or disease and no
disease.
Regression: A regression problem is when the output variable is a real value, such as dollars or weight.
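
The two kinds of supervised problem can be sketched on tiny, made-up datasets. The regression part fits a straight line by ordinary least squares, and the classification part uses a nearest-neighbour rule; both are just convenient illustrative choices, and the helper names `fit_line` and `nearest_neighbor` are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (regression)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def nearest_neighbor(training_data, x_new):
    """Predict the label of the closest training example (classification)."""
    closest = min(training_data, key=lambda pair: abs(pair[0] - x_new))
    return closest[1]

# Regression: the output is a real value. Hypothetical data following y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
predicted_y = slope * 5 + intercept      # prediction for a new input X = 5

# Classification: the output is a category. Hypothetical labeled measurements.
training_data = [(1.0, "no disease"), (2.0, "no disease"),
                 (8.0, "disease"), (9.0, "disease")]
predicted_label = nearest_neighbor(training_data, 7.5)

print(slope, intercept, predicted_y, predicted_label)
```

In both cases the known answers in the training data play the role of the teacher: the fitted function is judged by how well it reproduces them.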
UNSUPERVISED LEARNING
Unsupervised learning is where you only have input data (X) and no corresponding output
variables. The goal of unsupervised learning is to model the underlying structure or distribution
in the data in order to learn more about the data. It is called unsupervised learning
because, unlike supervised learning above, there are no correct answers and there is no teacher.
Algorithms are left to their own devices to discover and present the interesting structure in the
data. Unsupervised learning problems can be further grouped into clustering and association
problems.

Clustering, association mining, principal component analysis, factor analysis, multidimensional
scaling (MDS) and RFM (recency, frequency, monetary) analysis are some of the well-known
unsupervised analytical models.
SEMI SUPERVISED LEARNING
Problems where you have a large amount of input data (X) and only some of
the data is labeled (Y) are called semi-supervised learning problems. These
problems sit between supervised and unsupervised learning. A good
example is a photo archive where only some of the images are labeled (e.g. dog,
cat, person) and the majority are unlabeled. Many real-world machine learning
problems fall into this area. This is because it can be expensive or time
consuming to label data, as it may require access to domain experts, whereas
unlabeled data is cheap and easy to collect and store.
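
One simple way to use unlabeled data is self-training: repeatedly give the unlabeled point that lies closest to an already labeled point the label of that neighbour. The text above does not name a specific method, so this is only one possible sketch; the `self_train` helper and the one-dimensional data are hypothetical, and all points are assumed to be distinct.

```python
def self_train(labeled_points, unlabeled_points):
    """Spread labels from a few labeled points to unlabeled ones (self-training)."""
    labeled = dict(labeled_points)        # maps value -> label
    remaining = list(unlabeled_points)
    while remaining:
        # Pick the unlabeled point that is closest to any labeled point.
        candidate = min(remaining,
                        key=lambda u: min(abs(u - x) for x in labeled))
        # Copy the label of its nearest labeled neighbour.
        nearest = min(labeled, key=lambda x: abs(candidate - x))
        labeled[candidate] = labeled[nearest]
        remaining.remove(candidate)
    return labeled

# Hypothetical data: only two points carry a label.
labeled_points = {1.0: "cat", 10.0: "dog"}
unlabeled_points = [2.0, 3.0, 4.0, 9.0, 8.0, 7.0]

result = self_train(labeled_points, unlabeled_points)
print(result)
```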
CLUSTERING
In cluster analysis, the goal is to group the observations in the data
into meaningful clusters such that every observation in a cluster is more
similar to the other observations in the same cluster than it is to
observations in other clusters. Cluster analysis can be related to the
problem of density estimation.
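
The grouping idea can be sketched with k-means, one common clustering algorithm. This is a minimal version under stated assumptions: the points are made up, the initial centroids are supplied by hand rather than chosen at random, and similarity is measured by squared Euclidean distance. The names `kmeans` and `squared_distance` are hypothetical helpers.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, initial_centroids, max_iterations=100):
    """Alternate between assigning points and updating centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its members.
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else c
            for c, members in zip(centroids, clusters)
        ]
        if new_centroids == centroids:    # assignments can no longer change
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical two-dimensional observations forming two visible groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, initial_centroids=[points[0], points[3]])
print(centroids)
print(clusters)
```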
UNSUPERVISED LEARNING APPLICATIONS

Customer Segmentation
Anomaly detection
Image processing and image segmentation
Dimensionality reduction
Document clustering
Relationship identification
CRISP-DM PHASES
Supervised Learning:

Classification only

Logistic Regression: prediction of a category/class (two outcomes, Bernoulli trials)
Naïve Bayes Classifier

Regression only

Multiple Linear Regression: prediction of a numeric quantity
Ridge
Lasso
Loess

Both classification and regression

Decision tree
Random forest
kNN
SVM (Support Vector Machine)
Neural Network
Boosting algorithms (Adaptive Boosting, i.e. AdaBoost, and Extreme Gradient Boosting, i.e. XGBoost)
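
As one concrete instance from the list above, logistic regression for a two-outcome problem can be sketched with plain gradient descent on the log loss. The data, the learning rate `lr`, the number of `epochs`, and the helper names are hypothetical choices for illustration; in practice a library implementation would normally be used.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit P(y = 1 | x) = sigmoid(w * x + b) by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the average log loss with respect to w and b.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict_proba(x, w, b):
    return sigmoid(w * x + b)

# Hypothetical data: the outcome switches from 0 to 1 as x grows.
xs = [0, 1, 2, 3, 4, 5]
ys = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)

print(predict_proba(0, w, b), predict_proba(5, w, b))
```

The predicted probability is turned into a class by thresholding, commonly at 0.5.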
Unsupervised Learning:

Clustering: grouping of objects based on similarity

K-means
Hierarchical clustering
Density-based clustering

Dimension reduction techniques

Principal Component Analysis/Factor Analysis
Multidimensional scaling

Association Mining

Market Basket Analysis (product bundling)
