Introduction to Business Analytics: Alka Vaidya, NIBM

The document provides an introduction to business analytics and key trends in analytics for Indian banks, including the disparity in sophistication across banks and focus areas like data quality. It discusses why analytics is important now due to increases in data generation and computing power, and covers machine learning, different types of analytics including descriptive, predictive, and prescriptive, as well as techniques like supervised learning and classification.

Uploaded by Prabhat Singh

Introduction to Business Analytics
Alka Vaidya
NIBM

Key Trends in Analytics in Indian Banks post-CBS (Core Banking Solution)
• Disparity in the sophistication of analytics across banks
• For many banks, BI and analytics are typically an IT function.
• However, the banks that have made significant progress in this area have analytics as a specialized, dedicated function
• Team sizes range from 4 or 5 to 100 or more members
• Data quality and data governance may still be problems
• Unstructured data is still not part of the BI and analytics roadmap of most banks

Data Analytics: Why Now?
• Humans are estimated to generate 1.7 MB of data per second
• [company logo] processes 3.7 billion searches per day
• [company logo] has $258 in sales per minute
• 4.14 million videos are watched per minute on [company logo]
• 15.2 million texts are sent per minute on [company logo]
• Powerful and enhanced computing
• Uninterrupted mobile connectivity, cheaper internet
• Improvements in machine learning algorithms

Analytics as a Sub-Domain of AI
• Analytics and Machine Learning
• Deep Learning/Reinforcement Learning
• Robotic Process Automation
• Natural Language Processing

Analytics
• Data analytics uses statistical methods, performs pattern recognition, and builds models from existing data that can then be applied to find unknowns
• The term itself rose to prominence around 2005, mostly due to Google Analytics, though the ideas behind analytics are not new
• The words 'likelihood', 'probability' and 'confidence' come up very often in analytics, indicating whether a pattern exists in the unit of analysis
• How is it different from traditional Business Intelligence systems?
Information Detection using the Standard and Analytics Approaches
• Standard approach: starts with a hypothesis
 – Mostly verification of a hypothesis
 – SQL, OLAP, visualization tools*
 – Known correlations
• Analytics approach: starts with an open question (?)
 – Mostly about finding unknowns, potentially useful information
 – Predictive, prescriptive, forward looking
 – Suggesting actions

*Visualization tools take advantage of humans' perceptual ability to discern patterns

Machine Learning
• It can be called a modern-day extension of predictive analytics
• Simply put, machine learning gives machines the ability to learn automatically and improve from experience without being explicitly programmed to do so.
• Machine learning is a method to automate analytical model building
• Nowadays, it has further advanced to deep learning
• Deep learning imitates the workings of the human brain in processing data and creating patterns, using neural networks
(Figure: instant visual translation)
Types of Data Analytics
• Descriptive: "What has happened"
 – Purpose: use of data to understand past and current performance; to carry out trend analysis
 – Methods/Techniques: reporting, OLAP, descriptive statistics (mean, mode, median, frequency, standard deviation, etc.)
• Predictive: "What is likely to happen"
 – Purpose: use of past data to predict the future
 – Methods/Techniques: classification, regression, neural networks, etc.
• Prescriptive
 – Purpose: to suggest actions based on optimal allocation of resources, considering the business rules/constraints
 – Methods/Techniques: operations research, optimization, simulation

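The descriptive statistics named above (mean, median, mode, standard deviation, frequency) can be computed with Python's standard `statistics` module. This is a minimal sketch; the monthly account-opening figures are hypothetical, not taken from the slides.

```python
import statistics
from collections import Counter

# Hypothetical data: new savings accounts opened per month at one branch
accounts_opened = [12, 15, 15, 18, 20, 22, 15, 19]

def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "std_dev": statistics.stdev(values),  # sample standard deviation
        "frequency": Counter(values),         # value -> number of occurrences
    }

summary = describe(accounts_opened)
print(summary["mean"], summary["median"], summary["mode"], round(summary["std_dev"], 2))
```

Descriptive output like this answers "what has happened"; it makes no prediction about next month.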
What is Predictive Analytics?
“Predictive models exploit patterns found in historical and transactional data to identify risks and opportunities.”

Any prediction from the following diagram?
Honest: Tridas, Vickie, Mike
Crooked: Wally, Waldo, Barney

Prediction
Honest (Tridas, Vickie, Mike) = has round eyes and a smile
Predictive analytics does not do anything that any analyst couldn’t accomplish with pencil and paper or a spreadsheet, if given enough time
Predictive Analytics Process: CRISP-DM
• Cross-Industry Standard Process for Data Mining
• CRISP-DM Sequence
– Business Understanding
– Data Understanding
– Data Preparation
– Modelling
– Evaluation
– Deployment
Techniques for Analytics
• Unsupervised
 – Clustering
 – Association
 – Sequential Analysis
• Supervised
 – Classification
 – Decision Tree
 – Rule Induction
 – Neural Networks
 – Regression

Supervised Learning
• What is it?
 – It’s done under supervision, or in a specific direction: an algorithm uses training data from humans to learn the relationship of given inputs to a given output (e.g., how do certain inputs determine whether a customer will buy my offer?)
• When to use it?
 – You know how to classify the input data and the type of behaviour you want to predict, but you need the algorithm to calculate it for you on new data
• How does it work?
 – A human labels every element of the input data and defines the label of the target variable
 – The target variable could be an event, a decision or a value (e.g., Yes/No, Good/Bad)
 – Records without a value in the target variable cannot be used in building models
 – The algorithm is trained on the data to find the connection between the input variables and the output
 – Once training is complete (typically when the algorithm is sufficiently accurate), the algorithm is applied to the new dataset
(Diagram: input → output)
Classification
• Classification is the technique most commonly supported by commercial data mining (DM) tools
• It can work with categorical data
• Given a collection of records (training set):
 – Each record contains a set of attributes; one of the attributes is the class.
• Find a model for the class attribute as a function of the values of the other attributes.
• Goal: previously unseen records should be assigned a class as accurately as possible.
 – A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
• Techniques include decision trees, rule-based induction, neural networks, etc.

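The train/test procedure described above can be sketched as follows. It is a minimal illustration, not a DM tool's API: `train_test_split`, `fit_majority_class` and `accuracy` are hypothetical helpers, the 70/30 split and the income-based records are assumptions, and a majority-class baseline stands in for a real classifier so the evaluation step stays visible.

```python
import random
from collections import Counter

def train_test_split(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and split it into (training_set, test_set)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_majority_class(training_set):
    """Baseline 'model': always predicts the most frequent class in the training set."""
    majority = Counter(label for _, label in training_set).most_common(1)[0][0]
    return lambda attributes: majority

def accuracy(model, test_set):
    """Share of test records whose predicted class equals the known class."""
    correct = sum(model(attrs) == label for attrs, label in test_set)
    return correct / len(test_set)

# Hypothetical labelled records: (attributes, class)
records = [({"income": i}, "Good" if i >= 50 else "Bad") for i in range(20, 120, 10)]
train, test = train_test_split(records)
baseline = fit_majority_class(train)
print(len(train), len(test), accuracy(baseline, test))
```

Any real classifier (decision tree, neural network, etc.) would replace `fit_majority_class`; the baseline's accuracy is a useful floor that a real model should beat.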
A General Approach to Classification
Training set → learning algorithm (induction) → learn model → model

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Test set → apply model → prediction

Refund  Marital Status  Taxable Income  Cheat
No      Single          75K             ?
Yes     Married         50K             ?
No      Married         150K            ?
Yes     Divorced        90K             ?
No      Single          40K             ?
No      Married         80K             ?

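The induce-then-apply flow above can be sketched in a few lines. The tree inside `model` is hypothetical: it was written by hand so that it agrees with all ten training records (the 80K income threshold is an assumption), and it is not the output of any particular learning algorithm.

```python
# Training records from the slide: (refund, marital_status, taxable_income_k, cheat)
TRAINING = [
    ("Yes", "Single", 125, "No"), ("No", "Married", 100, "No"),
    ("No", "Single", 70, "No"), ("Yes", "Married", 120, "No"),
    ("No", "Divorced", 95, "Yes"), ("No", "Married", 60, "No"),
    ("Yes", "Divorced", 220, "No"), ("No", "Single", 85, "Yes"),
    ("No", "Married", 75, "No"), ("No", "Single", 90, "Yes"),
]
# Unlabelled records from the slide's test set
UNSEEN = [
    ("No", "Single", 75), ("Yes", "Married", 50), ("No", "Married", 150),
    ("Yes", "Divorced", 90), ("No", "Single", 40), ("No", "Married", 80),
]

def model(refund, marital_status, income_k):
    """Hypothetical hand-written tree; the 80K threshold is an assumption."""
    if refund == "Yes":
        return "No"
    if marital_status == "Married":
        return "No"
    return "Yes" if income_k > 80 else "No"

# Induction check: the model should reproduce every training label
training_accuracy = sum(model(*r[:3]) == r[3] for r in TRAINING) / len(TRAINING)
# Apply the model to the unseen records
predictions = [model(*r) for r in UNSEEN]
print(training_accuracy, predictions)
```

A perfect fit on the training set says little by itself; only the held-out test set indicates how well such a model generalizes.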
Decision Trees
• As illustrated in the figure, a decision tree is composed of the following parts:
 1. Node – contains a test of an attribute
 2. Branch – corresponds to one response (value) of the node's test
 3. Leaf – each leaf is associated with a class
 4. Rule – each route from the root to a leaf corresponds to a classification rule.

height  hair   eyes   class
short   blond  blue   A
tall    blond  brown  B
tall    red    blue   A
short   dark   blue   B
tall    dark   blue   B
tall    blond  blue   A
tall    dark   brown  B
short   blond  brown  B
Decision Trees (cont.)
Root: hair
• hair = dark: {short, blue = B}, {tall, blue = B}, {tall, brown = B}
• hair = red: {tall, blue = A}
• hair = blond: {short, blue = A}, {tall, brown = B}, {tall, blue = A}, {short, brown = B}

Completely classifies dark-haired and red-haired people. Does not completely classify blonde-haired people; more work is required.
Decision Trees (cont.)
Root: hair
• hair = dark: {short, blue = B}, {tall, blue = B}, {tall, brown = B}
• hair = red: {tall, blue = A}
• hair = blond: split on eyes
 – eyes = blue: {short = A}, {tall = A}
 – eyes = brown: {tall = B}, {short = B}

The decision tree is complete because:
1. All 8 cases appear at leaf nodes
2. At each leaf node, all cases are in the same class (A or B)
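The tree above can also be grown automatically. Below is a minimal sketch of ID3-style induction, which picks at each node the attribute with the highest information gain (entropy reduction). The slides do not say which algorithm produced their tree, so ID3 is an assumption here; on these eight records it chooses hair at the root and eyes for the blond branch, which matches the tree shown.

```python
import math
from collections import Counter

# The eight training records from the slide
ROWS = [
    {"height": "short", "hair": "blond", "eyes": "blue", "class": "A"},
    {"height": "tall", "hair": "blond", "eyes": "brown", "class": "B"},
    {"height": "tall", "hair": "red", "eyes": "blue", "class": "A"},
    {"height": "short", "hair": "dark", "eyes": "blue", "class": "B"},
    {"height": "tall", "hair": "dark", "eyes": "blue", "class": "B"},
    {"height": "tall", "hair": "blond", "eyes": "blue", "class": "A"},
    {"height": "tall", "hair": "dark", "eyes": "brown", "class": "B"},
    {"height": "short", "hair": "blond", "eyes": "brown", "class": "B"},
]

def entropy(rows):
    """Shannon entropy (in bits) of the class labels in rows."""
    counts = Counter(r["class"] for r in rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(rows, attribute):
    """Entropy reduction obtained by splitting rows on attribute."""
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(rows) - remainder

def build_tree(rows, attributes):
    """Return a class label (leaf) or a tuple (attribute, {value: subtree})."""
    classes = {r["class"] for r in rows}
    if len(classes) == 1:      # pure node: make a leaf
        return classes.pop()
    if not attributes:         # nothing left to split on: majority class
        return Counter(r["class"] for r in rows).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(rows, a))
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in sorted({r[best] for r in rows}):
        subset = [r for r in rows if r[best] == value]
        branches[value] = build_tree(subset, remaining)
    return (best, branches)

def classify(tree, row):
    """Follow the branches that match the row's attribute values down to a leaf."""
    while isinstance(tree, tuple):
        attribute, branches = tree
        tree = branches[row[attribute]]
    return tree

tree = build_tree(ROWS, ["height", "hair", "eyes"])
print(tree)  # ('hair', {'blond': ('eyes', {'blue': 'A', 'brown': 'B'}), 'dark': 'B', 'red': 'A'})
```

On this data hair gives the largest gain (about 0.45 bits) versus eyes (about 0.35) and height (close to 0), which is why it ends up at the root.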
All records: Credit Risk = Good: 17, Credit Risk = Bad: 16
• Debt = Low: Good 17, Bad 6
 – Employment = SelfEmp: Good 6, Bad 6
  · Income Level = Low: Good 0, Bad 6
  · Income Level = High: Good 6, Bad 0
 – Employment = Salaried: Good 11, Bad 0
• Debt = High: Good 0, Bad 10

Why Are Decision Trees Popular?
• Decision trees are very intuitive and easy to interpret and explain to top management
• All trees can be read as “if-then-else” rules
that ultimately generate a predicted value
• Decision trees implicitly perform variable
screening or feature selection
• Decision trees require relatively little effort
from users for data preparation
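As the slide says, a tree reads as "if-then-else" rules. A sketch of the credit-risk tree written that way; the string values ("High", "Salaried", "SelfEmp") simply mirror the labels in the tree, and the leaf counts are copied from it to confirm they add up to the root's totals.

```python
def predict_credit_risk(debt, employment, income_level):
    """If-then-else reading of the credit-risk tree; each return is one leaf."""
    if debt == "High":
        return "Bad"    # Debt = High leaf: Good 0, Bad 10
    if employment == "Salaried":
        return "Good"   # Debt = Low, Salaried leaf: Good 11, Bad 0
    if income_level == "High":
        return "Good"   # Debt = Low, SelfEmp, High income: Good 6, Bad 0
    return "Bad"        # Debt = Low, SelfEmp, Low income: Good 0, Bad 6

# (Good, Bad) counts at each leaf, copied from the tree
LEAF_COUNTS = {
    "Debt=High": (0, 10),
    "Debt=Low, Salaried": (11, 0),
    "Debt=Low, SelfEmp, Income=High": (6, 0),
    "Debt=Low, SelfEmp, Income=Low": (0, 6),
}
total_good = sum(good for good, _ in LEAF_COUNTS.values())
total_bad = sum(bad for _, bad in LEAF_COUNTS.values())
print(total_good, total_bad)  # 17 16
```

Every leaf here is pure, so each rule predicts its class with no training errors.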
Regression
• It is one of the oldest statistical techniques that the data mining community utilizes
• It takes a numerical dataset and develops a mathematical formula that fits the data
• Using the formula, you can predict future behaviour
• The technique is suited to continuous quantitative data such as stock values, income, etc.
 – e.g., time series prediction of stock market indices

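The "mathematical formula that fits the data" is, in the simplest case, a straight line y = a + b·x fitted by ordinary least squares. A minimal sketch with hypothetical quarterly deposit figures (arbitrary units); real work would use a statistics library and check residuals.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (intercept a, slope b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x), both unnormalized
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: quarter number vs. deposits
quarters = [1, 2, 3, 4]
deposits = [2, 4, 5, 7]
a, b = fit_simple_linear_regression(quarters, deposits)
forecast = a + b * 5  # extrapolate to quarter 5
print(round(a, 2), round(b, 2), round(forecast, 2))  # 0.5 1.6 8.5
```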
Unsupervised Learning
• What is it?
 – An algorithm explores input data without being given an explicit output variable (e.g., explores customer demographic data to identify patterns)
• When to use it?
 – You do not know how to classify the data, and you want the algorithm to find patterns and classify or group the data for you
• How does it work?
 – The algorithm receives data where all columns are treated equally (e.g., a set of data describing customer journeys on a website)
 – It infers a structure from the data
 – The algorithm identifies groups of data that exhibit similar behaviour (e.g., forms clusters of customers that exhibit similar buying behaviours)
Clustering
• Clustering means forming groups
• It is an unsupervised DM technique
• There is no distinction between independent and dependent variables; all variables participate equally in cluster detection
• Cluster models find groups of data points that are relatively close to each other
• Closeness is most commonly measured by Euclidean distance, i.e., the length of the straight line between two points
• The most commonly used clustering techniques are
 – The K-means algorithm
 – Kohonen’s Self-Organising Maps

Clustering: Points to Note
• There is no dependent variable
• There are only input variables
• There is no objective function here, hence it is sometimes called subjective segmentation
• The segments develop on their own, based on the values of the input variables, hence the name unsupervised learning

Clustering Illustration
Student ID  Physics  Maths
P           15       20
Q           20       15
R           26       21
X           44       52
Y           50       45
Z           57       38
A           80       85
B           90       88
C           98       98
(Figure: scatter plot titled "Students" showing the Physics and Maths marks above)

• The groups are homogeneous within and heterogeneous across, based on their characteristics
• If a new subject, ‘Chemistry’, is added, the cluster formation may change completely

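K-means applied to the students' Physics and Maths marks can be sketched as below. The choice of k = 3 and the starting centroids (students P, X and A) are assumptions made for illustration; k-means results depend on both. Closeness is Euclidean distance (`math.dist`, Python 3.8+).

```python
import math

# Physics and Maths marks from the table
STUDENTS = {
    "P": (15, 20), "Q": (20, 15), "R": (26, 21),
    "X": (44, 52), "Y": (50, 45), "Z": (57, 38),
    "A": (80, 85), "B": (90, 88), "C": (98, 98),
}

def kmeans(points, centroids, max_iter=100):
    """Lloyd's k-means: assign points to the nearest centroid, then move centroids to cluster means."""
    centroids = [tuple(c) for c in centroids]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance
        clusters = [[] for _ in centroids]
        for name, p in points.items():
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(name)
        # Update step: mean of each cluster (an empty cluster keeps its old centroid)
        new_centroids = []
        for members, old in zip(clusters, centroids):
            if members:
                xs = [points[n][0] for n in members]
                ys = [points[n][1] for n in members]
                new_centroids.append((sum(xs) / len(xs), sum(ys) / len(ys)))
            else:
                new_centroids.append(old)
        if new_centroids == centroids:  # centroids stopped moving, so assignments are stable
            break
        centroids = new_centroids
    return clusters, centroids

# Assumption: k = 3, seeded with students P, X and A
clusters, centres = kmeans(STUDENTS, [STUDENTS["P"], STUDENTS["X"], STUDENTS["A"]])
print(clusters)  # [['P', 'Q', 'R'], ['X', 'Y', 'Z'], ['A', 'B', 'C']]
```

Adding a third subject would simply make each point a 3-tuple; `math.dist` handles any dimension, and the resulting groups may differ, as the slide notes.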
Association rule mining
• Also called market basket analysis
• Given a set of records, each of which contains some number of items from a given collection:
 – Produce dependency rules that predict the occurrence of an item based on the occurrences of other items.
• An association rule has two parts, an antecedent (if) and a consequent (then). An antecedent is an item found in the data. A consequent is an item that is found in combination with the antecedent.

Association rule mining (Contd.)
Transactions:
1: (Beer)
2: (Cola, Beer)
3: (Cola, Beer)
4: (Nuts, Beer)
5: (Nuts, Cola, Beer)
6: (Nuts, Cola, Beer)
7: (Crisps, Nuts, Cola)
8: (Crisps, Nuts, Cola, Beer)
9: (Crisps, Nuts, Cola, Beer)
10: (Crisps, Nuts, Cola, Beer)

Single items:
Beer: Support = 9/10
Cola: Support = 8/10
Nuts: Support = 7/10
Crisps: Support = 4/10 (Drop Crisps)

Pairs:
Beer, Cola: Support = 7/10
Beer, Nuts: Support = 6/10
Cola, Nuts: Support = 6/10

Rules:
Beer->Cola (Support = 70%, Confidence = 7/9 = 77.8%)
Cola->Beer (Support = 70%, Confidence = 7/8 = 87.5%)
Beer->Nuts (Support = 60%, Confidence = 6/9 = 66.7%)
Nuts->Beer (Support = 60%, Confidence = 6/7 = 85.7%)
Cola->Nuts (Support = 60%, Confidence = 6/8 = 75%)
Nuts->Cola (Support = 60%, Confidence = 6/7 = 85.7%)
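The support and confidence figures above can be recomputed directly from the ten transactions. A minimal sketch: support is the fraction of transactions containing an itemset, and confidence(X->Y) = support(X and Y) / support(X). The 50% minimum support is an assumption; the slide drops Crisps but does not state a threshold.

```python
from itertools import combinations

# The ten transactions from the slide
TRANSACTIONS = [
    {"Beer"},
    {"Cola", "Beer"},
    {"Cola", "Beer"},
    {"Nuts", "Beer"},
    {"Nuts", "Cola", "Beer"},
    {"Nuts", "Cola", "Beer"},
    {"Crisps", "Nuts", "Cola"},
    {"Crisps", "Nuts", "Cola", "Beer"},
    {"Crisps", "Nuts", "Cola", "Beer"},
    {"Crisps", "Nuts", "Cola", "Beer"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in TRANSACTIONS) / len(TRANSACTIONS)

def confidence(antecedent, consequent):
    """support(antecedent together with consequent) / support(antecedent)."""
    return support(antecedent | consequent) / support(antecedent)

MIN_SUPPORT = 0.5  # assumed cut-off
all_items = set().union(*TRANSACTIONS)
frequent = sorted(item for item in all_items if support({item}) >= MIN_SUPPORT)

# Rules in both directions for every pair of frequent items
for a, b in combinations(frequent, 2):
    for lhs, rhs in ((a, b), (b, a)):
        print(f"{lhs}->{rhs}: support={support({lhs, rhs}):.0%}, "
              f"confidence={confidence({lhs}, {rhs}):.1%}")
```

It should print the six rules listed on the slide, with confidence rounded to one decimal place. Note that a rule and its reverse share the same support but usually differ in confidence.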
Association vs. Classification
• Association
 – Attributes are neither input nor output
 – Rules can have multiple attributes in the consequent part of the rule (the THEN part)
• Classification
 – One output attribute, multiple input attributes
 – Rules have one attribute, the output attribute, in the consequent part of the rule

• Association Rules: Caution
 – Many discovered relationships may be trivial
 – Many discovered relationships are useless

Tapping into Location-Based Marketing
Cross-Brand Partnership Opportunities for Merchants

Sequential Pattern Discovery
• Given a set of objects, each associated with its own timeline of events, find rules that predict strong sequential dependencies among different events.
• In point-of-sale transaction sequences:
 – Computer bookstore:
 (Intro_To_Visual_C) (C++_Primer) -> (Perl_for_dummies, Tcl_Tk)
• In the stock market:
 (stock x increases by 3%) (stock y decreases by 2%) -> (stock z decreases by 1%)

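A sequential rule can be scored like an association rule, except that order matters: a customer's history supports a pattern only if the pattern's itemsets appear in it in the same time order. A minimal sketch; the four purchase histories (and the `Java_Basics` title) are hypothetical, and `contains` uses greedy earliest matching, which is sufficient for this containment test.

```python
# Hypothetical purchase histories: one list of baskets per customer, in time order
HISTORIES = {
    "c1": [{"Intro_To_Visual_C"}, {"C++_Primer"}, {"Perl_for_dummies", "Tcl_Tk"}],
    "c2": [{"Intro_To_Visual_C", "Java_Basics"}, {"C++_Primer"}, {"Tcl_Tk", "Perl_for_dummies"}],
    "c3": [{"Intro_To_Visual_C"}, {"C++_Primer"}],
    "c4": [{"C++_Primer"}, {"Intro_To_Visual_C"}],  # same books, wrong order
}

def contains(history, pattern):
    """True if each itemset in pattern is a subset of a later basket in history, in order."""
    position = 0
    for basket in history:
        if position < len(pattern) and pattern[position] <= basket:
            position += 1
    return position == len(pattern)

def sequence_support(pattern):
    """Fraction of customers whose history contains the pattern."""
    return sum(contains(h, pattern) for h in HISTORIES.values()) / len(HISTORIES)

antecedent = [{"Intro_To_Visual_C"}, {"C++_Primer"}]
consequent = [{"Perl_for_dummies", "Tcl_Tk"}]
rule_support = sequence_support(antecedent + consequent)
rule_confidence = rule_support / sequence_support(antecedent)
print(rule_support, round(rule_confidence, 3))  # 0.5 0.667
```

Customer c4 bought both introductory books but in the reverse order, so it does not count towards the antecedent.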
Natural Language Processing (NLP)
• Natural language processing (NLP) is a branch of AI that helps computers understand, interpret and manipulate human language.
• NLP draws from many disciplines, including computer science and computational linguistics, in its pursuit to fill the gap between human communication and computer understanding.
• It involves:
 – Syntactic analysis, to assess how the language aligns with the grammar
 – Semantic analysis, the more difficult part, to assess the meaning and interpretation of words based on context. Without it, “The spirit is willing, but the flesh is weak.” translated into Russian may come out as “The vodka is good, but the meat is rotten”
Areas where NLP is used
• Sentiment analysis: identifying the mood or subjective opinions within large amounts of text, including average sentiment and opinion mining.
• Speech-to-text and text-to-speech conversion.
• Document summarization: automatically generating synopses of large bodies of text.
• Machine translation: automatic translation of text or speech from one language to another
Opinion Mining and Sentiment Analysis
• Opinion mining (sentiment analysis) is a type of natural language processing for tracking the mood of the public about a particular product
• It can help you judge the success of an ad campaign or new product launch, determine which versions of a product or service are popular, and even identify which demographics like or dislike particular features.
• By viewing customer posts, blogs and news feeds, companies can track customer complaints effectively and respond with corrections efficiently.
Can we afford to neglect customers’ comments? (‘United Breaks Guitars’ – David Carroll case)
• BofA – Ann Minch case: hike in credit card rate. Her YouTube video said that “the Bank jacked up my interest rate to a whopping 30%.” After her video went viral, the bank slashed her rate back down to 12.99%.
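A very simple form of sentiment analysis scores text against lists of positive and negative words and averages the scores over many posts. This is a minimal lexicon-based sketch, a deliberately basic technique swapped in for illustration: the word lists and posts are hypothetical, and it ignores negation ("not good") and sarcasm, which production tools handle with curated lexicons or trained models.

```python
import re

# Hypothetical mini-lexicons
POSITIVE = {"good", "great", "helpful", "fast", "love", "easy"}
NEGATIVE = {"bad", "slow", "cumbersome", "rude", "hate", "broken"}

def sentiment_score(text):
    """(positive hits - negative hits) / total hits, in [-1, 1]; 0.0 if no lexicon words occur."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos + neg == 0:
        return 0.0
    return (pos - neg) / (pos + neg)

def average_sentiment(posts):
    """Mean score across a collection of posts."""
    return sum(sentiment_score(p) for p in posts) / len(posts)

# Hypothetical customer posts
POSTS = [
    "The IVR is slow and cumbersome",
    "Great app, very easy to use",
    "Branch staff were helpful but the queue was slow",
]
print([sentiment_score(p) for p in POSTS], average_sentiment(POSTS))
```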
Where Can Banks Use It?
• Customer Awareness/Marketing
 – A customer regularly visits a local restaurant. His bank can use this information to offer dining- or meal-related deals such as discounts or rebates, cash back offers, and loyalty points
• Lead Generation
 – A customer posts information about an upcoming trip abroad. His bank can use this information to offer foreign-exchange-related products, travel insurance, and cash cards.
• Customer Feedback/Grievance Redressal
 – A customer posts on social media about the cumbersome nature of the bank’s contact center system. The bank sees that many customers are making the same comment and uses this input to streamline the IVR process

Influence of Social Media
• FB/Twitter/G+/Pinterest… when will it end?
• Social media can be an effective and cost-efficient marketing, sales, service, insight and retention tool
• It’s an opportunity to listen to your customers and form useful strategies based on real-time feedback received from them
Prescriptive Analytics
• Using descriptive data accumulated over time, predictive
analytics builds models for predicting events. However, it
does not recommend actions
• Prescriptive analytics is the area of business analytics (BA)
dedicated to finding the best course of action for a given
situation
• It recommends actions based on desired outcomes, taking
into account specific scenarios, resources and knowledge
of past and current events.
• Though the final decision is up to the managers, prescriptive analytics can provide a reliable path to an optimal solution for the business
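A toy prescriptive question: given a budget, which campaigns should a bank run to maximize expected value? The sketch below answers it by exhaustive search over all campaign subsets. The campaign names, costs, values and budget are hypothetical, and brute force only works for a handful of options; realistic problems are usually formulated as linear or integer programs and handed to an optimization solver.

```python
from itertools import combinations

# Hypothetical campaigns: name -> (cost, expected_value), arbitrary units
CAMPAIGNS = {
    "email": (2, 3),
    "sms": (3, 4),
    "branch_event": (4, 5),
    "social_ads": (5, 8),
}

def best_plan(campaigns, budget):
    """Try every subset of campaigns; return the highest-value subset whose cost fits the budget."""
    best_set, best_value = (), 0
    names = list(campaigns)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            cost = sum(campaigns[n][0] for n in subset)
            value = sum(campaigns[n][1] for n in subset)
            if cost <= budget and value > best_value:  # respect the budget constraint
                best_set, best_value = subset, value
    return best_set, best_value

plan, value = best_plan(CAMPAIGNS, budget=9)
print(plan, value)  # ('branch_event', 'social_ads') 13
```

The output is a recommended action (which campaigns to fund), not just a prediction; the manager still makes the final call.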
Limitations to Analytics
• While products are powerful, they are not self-sufficient. Skilled technical and analytical specialists who can structure the problem and interpret the results are needed; hence the limitations are personnel-related rather than technology-related

• Researchers/Model builders may resort to
explanations that do not make sense.
• Hence, it’s extremely important to involve
domain experts in the process
• IT will facilitate and make the data available