Introduction To Business Analytics: Alka Vaidya Nibm
Analytics
Alka Vaidya
NIBM
Key Trends in Analytics in
Indian Banks post CBS
• Disparity in Sophistication of Analytics across the banks
• For many banks, BI and Analytics are typically an IT function.
• However, the banks that have made significant progress in
this area run Analytics as a specialized, dedicated function
• Team size ranges from 4-5 to more than 100 members
• Data quality and data governance may still be a problem
• Unstructured data is still not part of the BI and analytics roadmap of
most banks
Data Analytics: Why Now?
• Humans are estimated to generate 1.7 MB of data per second
• [logo not preserved] processes 3.7 billion searches / day
• [logo not preserved] has $258 sales / min
• 4.14 million videos are watched / min on [logo not preserved]
• 15.2 million texts / min are sent on [logo not preserved]
• Powerful and enhanced computing
• Uninterrupted mobile connectivity, cheaper internet
• Improvements in Machine Learning algorithms
Analytics as a Sub-Domain of AI
• Analytics and Machine Learning
• Deep Learning/Reinforcement Learning
• Robotic Process Automation
• Natural Language Processing
Analytics
• Data Analytics uses statistical methods, performs pattern
recognition, and builds models from existing data that can be
applied to find unknowns
• The term itself rose to prominence around 2005, mostly due to
Google Analytics, though the ideas behind analytics are not
new
• The words ‘likelihood’, ‘probability’ and ‘confidence’ come up very
often in analytics, indicating whether a pattern exists in the unit of
analysis
• How is it different from traditional Business Intelligence
systems?
Information Detection using Standard and
Analytics Approach
Standard Approach
• Starts from a hypothesis
• Mostly verification of hypothesis
• SQL, OLAP, visualization tools*
• Known correlations
Analytics Approach
• Starts from an open question (?)
• Mostly about finding unknowns, potentially useful information
• Predictive, prescriptive, forward looking
• Suggesting actions
Machine Learning
• It can be called a modern-day extension of Predictive Analytics
• Simply put, Machine learning provides machines the ability to learn
automatically & improve from experience without being explicitly
programmed to do so.
• Machine learning is a method to automate analytical model building
• Nowadays, it has further advanced to Deep Learning
• Deep learning imitates the workings of the human brain in processing
data and creating patterns using neural networks
Instant Visual
translation
Types of Data Analytics
Type Purpose Methods/Techniques
What is Predictive Analytics?
[Figure: Tridas, Vickie, Mike; labels Honest and Crooked]
Prediction
• Unsupervised: Clustering
• Supervised: Classification, Neural Networks, Regression
Supervised Learning
• What it is?
It’s done under supervision or a specific direction
An algorithm uses training data from humans to learn the relationship of given inputs to a
given output (e.g. how certain inputs will determine whether a customer will buy my offer)
• When to use it?
You know how to classify the input data and the type of behaviour you want to predict,
but you need the algorithm to calculate it for you on new data
• How it works?
A human labels every element of the input data and defines the label of the target variable
The target variable could be an event, decision or a value (e.g. Yes/No, Good/Bad)
Records without a value in the target variable cannot be used in building models
The algorithm is trained on the data to find the connection between the input variables
and the output
Once training is complete, typically when the algorithm is sufficiently accurate, the
algorithm is applied to the new dataset
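The workflow described on this slide can be sketched in a few lines. This is a minimal illustration only: the customer records, the two input variables (age and income) and the choice of a 1-nearest-neighbour rule are all assumptions made for the example, not something the slide prescribes.

```python
import math

# Hypothetical labelled records: (age, monthly income in thousands) -> bought the offer?
# None marks a record whose target variable was never labelled.
records = [
    ((25, 30), "No"),
    ((32, 45), "No"),
    ((41, 80), "Yes"),
    ((48, 95), "Yes"),
    ((37, 60), None),   # no target value: cannot be used for building the model
]

# Step 1: keep only records that have a value in the target variable.
training = [(x, y) for x, y in records if y is not None]

def predict(x_new):
    """Return the label of the closest training record (1-nearest neighbour)."""
    _, nearest_label = min(training, key=lambda rec: math.dist(rec[0], x_new))
    return nearest_label

# Step 2: apply the trained model to new, unlabelled data.
new_customers = [(27, 35), (45, 90)]
predictions = [predict(x) for x in new_customers]
print(predictions)
```

Here "training" simply means storing the labelled records; most real algorithms instead fit an explicit model, but the split between labelled training data and new data is the same.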
Classification
• Most commonly supported by commercial data mining (DM) tools
• It can work with categorical data
• Given a collection of records (training set)
– Each record contains a set of attributes; one of the attributes is the
class.
• Find a model for the class attribute as a function of the values
of other attributes.
• Goal: previously unknown records should be assigned a
class as accurately as possible.
– A test set is used to determine the accuracy of the model. Usually,
the given data set is divided into training and test sets, with
training set used to build the model and test set used to validate it.
• Techniques include Decision Trees, Rule-based induction,
Neural Networks, etc.
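The train/test idea above can be illustrated as follows. This is a sketch under stated assumptions: the records and attribute names are hypothetical, and a deliberately simple majority-class lookup table stands in for a real technique such as a decision tree.

```python
from collections import Counter, defaultdict

# Hypothetical records: (employment, owns_home) -> credit risk class.
data = [
    (("Salaried", "Yes"), "Good"),
    (("Salaried", "No"), "Good"),
    (("SelfEmp", "No"), "Bad"),
    (("Salaried", "Yes"), "Good"),
    (("SelfEmp", "Yes"), "Good"),
    (("SelfEmp", "No"), "Bad"),
    (("Salaried", "No"), "Good"),
    (("SelfEmp", "No"), "Bad"),
    (("Salaried", "Yes"), "Good"),
    (("SelfEmp", "Yes"), "Bad"),
]

# Fixed split: first 70% for training, the rest for testing.
# Real projects usually shuffle the records before splitting.
cut = len(data) * 7 // 10
train, test = data[:cut], data[cut:]

def fit(rows):
    """Learn the majority class for each combination of attribute values."""
    counts = defaultdict(Counter)
    for attributes, label in rows:
        counts[attributes][label] += 1
    default = Counter(label for _, label in rows).most_common(1)[0][0]
    table = {attrs: c.most_common(1)[0][0] for attrs, c in counts.items()}
    return table, default

def predict(model, attributes):
    table, default = model
    return table.get(attributes, default)  # unseen combination: overall majority class

model = fit(train)
correct = sum(predict(model, attrs) == label for attrs, label in test)
accuracy = correct / len(test)
print(f"test accuracy: {accuracy:.2f}")
```

The accuracy is measured only on the test records, which the model never saw while it was being built.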
A General Approach to Classification
Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes
[Diagram labels: Learning Algorithm, Induction, Learn Model, Model]
Employment-SelfEmp Employment-Salaried
Credit Risk=Good:6 Credit Risk=Good:11
Credit Risk=Bad: 6 Credit Risk=Bad: 0
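One way to quantify how useful the Employment split above is: compare class impurity before and after the split. The slide does not name an impurity measure, so the use of Gini impurity here is an assumption; the class counts are the ones shown above.

```python
def gini(counts):
    """Gini impurity of a node given its class counts, e.g. [good, bad]."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

# Class counts [Good, Bad] taken from the split shown above.
self_emp = [6, 6]
salaried = [11, 0]
parent = [a + b for a, b in zip(self_emp, salaried)]  # [17, 6] before splitting

n = sum(parent)
# Impurity after the split: children weighted by their share of the records.
weighted_children = (sum(self_emp) / n) * gini(self_emp) + (sum(salaried) / n) * gini(salaried)
gain = gini(parent) - weighted_children

print(f"parent impurity:    {gini(parent):.3f}")
print(f"after the split:    {weighted_children:.3f}")
print(f"impurity reduction: {gain:.3f}")
```

The Salaried branch is pure (all Good), so all remaining impurity comes from the Self-Employed branch, which is an even 6/6 mix.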
Why are Decision Trees Popular?
• Decision trees are very intuitive and easy to
interpret and explain to top management
• All trees can be read as “if-then-else” rules
that ultimately generate a predicted value
• Decision trees implicitly perform variable
screening or feature selection
• Decision trees require relatively little effort
from users for data preparation
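To show what "reading a tree as if-then-else rules" looks like, the sketch below applies a small hand-written tree to the ten Refund / Marital Status / Taxable Income records from the classification table earlier in the deck. The tree itself, including the 80K income threshold, is an illustrative assumption and was not produced by any learning algorithm here.

```python
# Records from the classification table:
# (Tid, Refund, Marital Status, Taxable Income in K, Cheat)
records = [
    (1, "Yes", "Single", 125, "No"),
    (2, "No", "Married", 100, "No"),
    (3, "No", "Single", 70, "No"),
    (4, "Yes", "Married", 120, "No"),
    (5, "No", "Divorced", 95, "Yes"),
    (6, "No", "Married", 60, "No"),
    (7, "Yes", "Divorced", 220, "No"),
    (8, "No", "Single", 85, "Yes"),
    (9, "No", "Married", 75, "No"),
    (10, "No", "Single", 90, "Yes"),
]

def predict_cheat(refund, marital_status, income_k):
    """A hand-written tree, read top-down as if-then-else rules."""
    if refund == "Yes":
        return "No"
    elif marital_status == "Married":
        return "No"
    elif income_k < 80:      # 80K threshold is an illustrative assumption
        return "No"
    else:
        return "Yes"

# Compare the rule-based prediction with the recorded class for every record.
mismatches = [tid for tid, refund, status, income, cheat in records
              if predict_cheat(refund, status, income) != cheat]
print("misclassified Tids:", mismatches)
```

Each path from the root to a leaf is one rule, for example "IF Refund = No AND Marital Status is not Married AND Income >= 80K THEN Cheat = Yes".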
Regression
• It is the oldest statistical technique that the data
mining community utilizes
• It takes a numerical dataset and develops a
mathematical formula that fits the data
• Using the formula you can predict future
behaviour
• The technique works only with continuous
quantitative data like stock values, income, etc.
– Example: time series prediction of stock market indices.
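A minimal sketch of "developing a formula that fits the data", assuming the simplest case of a straight line fitted by ordinary least squares. The data points (years of experience against income) are hypothetical.

```python
# Hypothetical data: years of experience (x) and annual income in lakh rupees (y).
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Ordinary least squares for the line y = intercept + slope * x
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

def predict(x):
    """Use the fitted formula to predict y for a new x."""
    return intercept + slope * x

print(f"y = {intercept:.2f} + {slope:.2f} * x")
print(f"prediction for x = 6: {predict(6):.2f}")
```

Predicting for x = 6 extrapolates beyond the observed range, which is exactly where a fitted formula should be treated with the most caution.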
Unsupervised Learning
• What it is?
An algorithm explores input data without being given an explicit output variable
(e.g., explores customer demographic data to identify patterns)
• When to use it?
You do not know how to classify the data, and you want the algorithm to
find patterns and classify or group the data for you
• How it works?
The algorithm receives data where all columns are treated equally (e.g. a set
of data describing customer journeys on a website)
It infers a structure from the data
The algorithm identifies groups of data that exhibit similar behaviour
(e.g., forms clusters of customers that exhibit similar buying behaviours)
Clustering
• Clustering means forming groups
• It is an unsupervised data mining technique
• There is no distinction between independent and dependent variables;
all variables participate equally in the cluster detection
• Cluster models find groups of data points that are relatively close to
each other
• Closeness is most commonly measured by the Euclidean
distance, i.e. the straight-line distance between two points
• The most commonly used clustering techniques are
– The K-means algorithm
– Kohonen’s Self Organising Maps
Clustering: Points to Note
• There is no dependent variable
• What you have are only input parameters
• There is no objective function here, hence it is sometimes
called subjective segmentation
• The segments emerge on their own based on the
values of input variables, hence it is called unsupervised
learning
Clustering Illustration
Student ID  Physics  Maths
P           15       20
Q           20       15
R           26       21
X           44       52
Y           50       45
Z           57       38
A           80       85
B           90       88
C           98       98
[Chart: "Students", plotting the marks above]
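The student marks above can be grouped with a plain K-means loop, shown here as a minimal sketch. Two assumptions: k = 3, and the starting centroids are taken as the first, middle and last student so that the run is repeatable. Real implementations usually pick starting points at random, and a poor start can lead to a different grouping.

```python
import math

# Marks from the illustration above: student -> (Physics, Maths)
marks = {
    "P": (15, 20), "Q": (20, 15), "R": (26, 21),
    "X": (44, 52), "Y": (50, 45), "Z": (57, 38),
    "A": (80, 85), "B": (90, 88), "C": (98, 98),
}

def k_means(points, centroids, max_iter=100):
    """Plain K-means: assign each point to its nearest centroid, then recompute centroids."""
    assignment = {}
    for _ in range(max_iter):
        new_assignment = {
            name: min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for name, p in points.items()
        }
        if new_assignment == assignment:   # no point changed cluster: converged
            break
        assignment = new_assignment
        for i in range(len(centroids)):
            members = [points[n] for n, c in assignment.items() if c == i]
            if members:                    # leave an empty cluster's centroid where it is
                centroids[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assignment, centroids

names = list(marks)
# Assumption: start from the first, middle and last student (k = 3).
initial = [marks[names[0]], marks[names[len(names) // 2]], marks[names[-1]]]
assignment, centroids = k_means(marks, initial)

clusters = [[n for n in names if assignment[n] == i] for i in range(3)]
print(clusters)
```

Closeness is measured with the Euclidean distance (`math.dist`), and no column is treated as a target: Physics and Maths participate equally.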
Association rule mining
• Also called market basket analysis
• Given a set of records, each of which contains some
number of items from a given collection:
– Produce dependency rules that predict the
occurrence of an item based on occurrences of other
items.
• An association rule has two parts, an antecedent
(if) and a consequent (then). An antecedent is an
item found in the data. A consequent is an item
that is found in combination with the antecedent.
Association rule mining (Contd.)
Transactions
1: (Beer)
2: (Cola, Beer)
3: (Cola, Beer)
4: (Nuts, Beer)
5: (Nuts, Cola, Beer)
6: (Nuts, Cola, Beer)
7: (Crisps, Nuts, Cola)
8: (Crisps, Nuts, Cola, Beer)
9: (Crisps, Nuts, Cola, Beer)
10: (Crisps, Nuts, Cola, Beer)
Support of single items
Beer: Support = 9/10
Cola: Support = 8/10
Nuts: Support = 7/10
Crisps: Support = 4/10 (Drop Crisps)
Support of item pairs
Beer, Cola: Support = 7/10
Beer, Nuts: Support = 6/10
Cola, Nuts: Support = 6/10
Rules
Beer->Cola (Support=70%, Confidence=7/9=77.8%)
Cola->Beer (Support=70%, Confidence=7/8=87.5%)
Beer->Nuts (Support=60%, Confidence=6/9=66.7%)
Nuts->Beer (Support=60%, Confidence=6/7=85.7%)
Cola->Nuts (Support=60%, Confidence=6/8=75%)
Nuts->Cola (Support=60%, Confidence=6/7=85.7%)
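The support and confidence figures on this slide can be recomputed from the ten transactions. In this sketch the 50% minimum support is an assumption, chosen because it is consistent with Crisps (4/10) being dropped; the slide does not state the threshold.

```python
from itertools import combinations

# The ten transactions listed on the slide.
transactions = [
    {"Beer"},
    {"Cola", "Beer"},
    {"Cola", "Beer"},
    {"Nuts", "Beer"},
    {"Nuts", "Cola", "Beer"},
    {"Nuts", "Cola", "Beer"},
    {"Crisps", "Nuts", "Cola"},
    {"Crisps", "Nuts", "Cola", "Beer"},
    {"Crisps", "Nuts", "Cola", "Beer"},
    {"Crisps", "Nuts", "Cola", "Beer"},
]

def support(items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)

# Assumed minimum support of 50%, which is what drops Crisps (4/10).
MIN_SUPPORT = 0.5
frequent_items = sorted(i for i in {"Beer", "Cola", "Nuts", "Crisps"}
                        if support([i]) >= MIN_SUPPORT)

# Print both directions of every rule between two frequent items.
for a, b in combinations(frequent_items, 2):
    for lhs, rhs in ((a, b), (b, a)):
        print(f"{lhs}->{rhs}: support={support([lhs, rhs]):.0%}, "
              f"confidence={confidence([lhs], [rhs]):.1%}")
```

Support is symmetric (Beer->Cola and Cola->Beer share the same 70%), while confidence depends on which item is the antecedent.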
Association vs. Classification
• Association
– Attributes are neither input nor output
– Rules can have multiple attributes in the consequent
part of the rule (the THEN part)
• Classification
– One output attribute, multiple input attributes
– Rules have one attribute, the output attribute, in the
consequent part of the rule
Tapping into Location Based Marketing
Sequential Pattern Discovery
• Given is a set of objects, with each object associated with
its own timeline of events, find rules that predict strong
sequential dependencies among different events.
• In point-of-sale transaction sequences,
– Computer Bookstore:
(Intro_To_Visual_C) (C++_Primer) ->
(Perl_for_dummies,Tcl_Tk)
• In Stock Market,
(stock x increases by 3%) (stock y decreases by 2%)
(stock z decreases by 1%)
Natural Language Processing (NLP)
• Natural language processing (NLP) is a branch of AI that helps
computers understand, interpret and manipulate human language.
• NLP draws from many disciplines, including computer science
and computational linguistics, in its pursuit to fill the gap
between human communication and computer understanding.
• It involves
– Syntactic analysis, to assess how the language aligns with the grammar
– Semantic analysis, the more difficult part, to assess the meaning and
interpretation of words based on the context. Otherwise,
“The spirit is willing, but the flesh is weak.” may come out in Russian as “The
vodka is good, but the meat is rotten”
Areas where NLP is used
• Sentiment analysis- Identifying the mood or
subjective opinions within large amounts of text,
including average sentiment and opinion mining.
• Speech-to-text and text-to-speech conversion.
• Document summarization- Automatically
generating synopses of large bodies of text.
• Machine translation- Automatic translation of text
or speech from one language to another
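As a rough illustration of the first item, sentiment analysis, the sketch below scores text by counting words from two small hand-made lists. Both word lists and the customer posts are hypothetical, and this word-counting approach is only the crudest form of the technique: it ignores negation, sarcasm and context, which is where semantic analysis comes in.

```python
import re

# Tiny hand-made word lists; purely illustrative, not a real sentiment lexicon.
POSITIVE = {"good", "great", "love", "helpful", "quick", "easy"}
NEGATIVE = {"bad", "slow", "hate", "cumbersome", "rude", "broken"}

def sentiment_score(text):
    """Positive word count minus negative word count (> 0 positive, < 0 negative)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

# Hypothetical customer posts.
posts = [
    "Love the new app, transfers are quick and easy",
    "The contact center is slow and cumbersome",
    "Visited the branch today",
]
scores = [sentiment_score(p) for p in posts]
average = sum(scores) / len(scores)   # average sentiment across all posts
print(scores, round(average, 2))
```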
Opinion Mining and Sentiment Analysis
• Opinion mining (Sentiment Analysis) is a type of natural language
processing for tracking the mood of the public about a particular
product
• It can help you judge the success of an ad campaign or new product
launch, determine which versions of a product or service are popular
and even identify which demographics like or dislike particular
features.
• By viewing customer posts, blogs, and news feeds, companies
can effectively track customer complaints and offer efficient corrections.
Can we afford to neglect customers’ comments? (‘United Breaks
Guitars’ – David Carroll case)
• BofA – Ann Minch case – Hike in credit card rate. Her YouTube video
said that “the Bank jacked up my interest rate to a whopping 30%.”
After her video went viral, the bank slashed her rate back down to
12.99%.
Where Can Banks Use It?
• Customer Awareness/Marketing
– A customer regularly visits a local restaurant. His bank can use this
information to offer dining- or meal-related deals such as discounts or
rebates, cash back offers, and loyalty points
• Lead Generation
– A customer posts information about an upcoming trip abroad. His bank can
use this information to offer foreign-exchange-related products, travel
insurance, and cash cards.
• Customer Feedback/Grievance Redressal
– Customer posts on social media about the cumbersome nature of the bank’s
Contact Center system. The bank sees that many customers are making the
same comment and uses this input to streamline the IVR process
Influence of Social Media
• FB/Twitter/G+/Pinterest… when will it
end?
• Social media can be an effective and cost-efficient
marketing, sales, service, insight and retention tool
• It’s an opportunity to listen to your
customer and form useful strategies based
on real-time feedback received from the
customers
Prescriptive Analytics
• Using descriptive data accumulated over time, predictive
analytics builds models for predicting events. However, it
does not recommend actions
• Prescriptive analytics is the area of business analytics (BA)
dedicated to finding the best course of action for a given
situation
• It recommends actions based on desired outcomes, taking
into account specific scenarios, resources and knowledge
of past and current events.
• Though the final decision is up to the managers,
prescriptive analytics can provide a reliable path to an
optimal solution for business
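The step from prediction to recommendation can be sketched as follows. Everything numeric here is hypothetical: the acceptance probabilities stand in for the output of a predictive model, and the profit and cost figures are assumed. Ranking by expected value is one simple decision rule, not the only way to prescribe an action.

```python
# Hypothetical output of a predictive model: probability that the customer accepts
# each offer, plus an assumed profit if accepted and cost of making the offer.
offers = {
    "travel insurance": {"p_accept": 0.10, "profit": 2000, "cost": 50},
    "forex card":       {"p_accept": 0.25, "profit": 1200, "cost": 50},
    "dining cashback":  {"p_accept": 0.40, "profit": 300,  "cost": 20},
}

def expected_value(offer):
    """Expected net gain of making the offer."""
    return offer["p_accept"] * offer["profit"] - offer["cost"]

# Rank the candidate actions by expected value and recommend the best one.
ranked = sorted(offers, key=lambda name: expected_value(offers[name]), reverse=True)
recommended = ranked[0]

for name in ranked:
    print(f"{name}: expected value {expected_value(offers[name]):.0f}")
print("recommended action:", recommended)
```

The most likely acceptance (dining cashback) is not the recommended action; the recommendation weighs likelihood against payoff, and the manager still makes the final call.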
Limitations to Analytics
• While products are powerful, they are not
self-sufficient. Skilled technical and
analytical specialists who can structure the
problem and interpret the results are
needed; hence the limitations are personnel-related
rather than technology-related
• Researchers/Model builders may resort to
explanations that do not make sense.
• Hence, it’s extremely important to involve
domain experts in the process
• IT will facilitate and make the data available