
Bi 2

Uploaded by

sifovec135
Copyright
© © All Rights Reserved

Business Intelligence and Data Analytics: Classification and Clustering

In the holdout method the observations are split into two disjoint sets T and V. T is used for training and V for testing.

Repeated random sampling

Repeated random sampling replicates the holdout method r times. For each repetition k a random sample T_k is extracted as the training set, and the corresponding accuracy is calculated on the test set V_k, where V_k = D - T_k. The overall accuracy is the average over the r repetitions:

\[ acc = \frac{1}{r} \sum_{k=1}^{r} acc(V_k), \qquad V_k = D - T_k \]

3.2.3 Cross-Validation

Cross-validation avoids overlapping test sets. It ensures that each observation of the dataset D appears the same number of times in the training sets and exactly once in a test set.

Cross-validation is based on a partition of the dataset D into r disjoint subsets L_1, L_2, ..., L_r and requires r iterations. At iteration k, subset L_k is selected as the test set and the union of all other subsets in the partition as the training set.

The standard method for evaluation is ten-fold cross-validation. Extensive experiments have shown that 10 is the best choice to get an accurate estimate. Repeated stratified cross-validation is even better: ten-fold cross-validation is repeated 10 times and the results are averaged, which reduces the variance.

Leave-one-out is a particular form of cross-validation. In this case the number of subsets equals the number of observations, so each test set includes only one observation and each example is used in turn to measure accuracy.

3.2.4 Confusion Matrices

A binary classifier produces output with two class values or labels, such as Yes/No and 1/0, for given input data. The class of interest is usually denoted as "positive" and the other as "negative".

A test dataset is used for performance evaluation. It should hold the correct labels (observed labels) for all data instances. These labels are compared with the predicted labels for performance evaluation after classification.

The predicted labels will be exactly the same as the observed labels if the performance of a binary classifier is perfect, but this is not common in practical situations.

A binary classifier predicts all data instances of a test dataset as either positive or negative. This classification (or prediction) produces four outcomes: true positive (TRP), true negative (TAN), false positive (FAP) and false negative (FAN). In the formulas below, P and N are the total numbers of positive and negative instances.

Basic measures from the confusion matrix

Error rate (ERR) and accuracy (ACC) are the most common and intuitive measures derived from the confusion matrix.

Error rate

Error rate is calculated as the total number of incorrect predictions (FAN + FAP) divided by the total number of instances in the dataset (P + N). The best error rate is 0.0, whereas the worst is 1.0.

\[ ERR = \frac{FAP + FAN}{TRP + TAN + FAN + FAP} = \frac{FAP + FAN}{P + N} \]

Accuracy

Accuracy is calculated as the number of all correct predictions divided by the total number of instances in the dataset. The best accuracy is 1.0, whereas the worst is 0.0. It can also be calculated as 1 - ERR.

\[ ACC = \frac{TRP + TAN}{TRP + TAN + FAN + FAP} = \frac{TRP + TAN}{P + N} \]

True positive rate

True positive rate or sensitivity is calculated as the number of correct positive predictions divided by the total number of positives. The best true positive rate is 1.0 and the worst is 0.0.

\[ \text{True positive rate} = \frac{TRP}{TRP + FAN} \]

True negative rate or specificity

It is the number of correct negative predictions divided by the total number of negatives.

\[ SP = \frac{TAN}{TAN + FAP} \]

Precision

It is calculated as the total number of correct positive predictions divided by the total number of positive predictions. The best precision is 1.0, whereas the worst is 0.0.

\[ PREC = \frac{TRP}{TRP + FAP} \]

False positive rate

It is calculated as the number of incorrect positive predictions divided by the total number of negatives. False positive rate = 1 - Specificity.

\[ FPR = \frac{FAP}{TAN + FAP} = 1 - SP \]

F-score

F-score is the harmonic mean of precision (PREC) and recall (REC), where recall is another name for the true positive rate.

\[ F_{\beta} = \frac{(1 + \beta^{2}) (PREC \cdot REC)}{\beta^{2} \cdot PREC + REC} \]

\beta is commonly 0.5, 1 or 2.

3.2.5 ROC Curve Charts

Receiver Operating Characteristic (ROC) curve charts allow the user to visually evaluate the accuracy of a classifier. The ROC plot is based on two basic measures, specificity and sensitivity. Specificity is a performance measure of the whole negative part of a dataset. Sensitivity is a performance measure of the whole positive part.

ROC charts are used to compare different classification models. They visually express the information content of a sequence of confusion matrices. They allow the ideal trade-off between the number of correctly classified positive observations and the number of incorrectly classified negative observations to be assessed. In this respect, they are an alternative to the assignment of misclassification costs.
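To make the idea of "a sequence of confusion matrices" concrete, the following minimal sketch sweeps a decision threshold over classifier scores and records one (false positive rate, true positive rate) point per threshold. The function names, the toy labels and the toy scores are made up for illustration and do not come from the text; the sketch also assumes that both classes are present, so no denominator is zero.

```python
def confusion_counts(y_true, y_pred):
    """Return (TRP, TAN, FAP, FAN) for binary labels, where 1 = positive."""
    trp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tan = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fap = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fan = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return trp, tan, fap, fan


def roc_points(y_true, scores):
    """One (FPR, TPR) point per distinct threshold, highest threshold first."""
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        y_pred = [1 if s >= threshold else 0 for s in scores]
        trp, tan, fap, fan = confusion_counts(y_true, y_pred)
        tpr = trp / (trp + fan)   # sensitivity
        fpr = fap / (fap + tan)   # 1 - specificity
        points.append((fpr, tpr))
    return points


# Hypothetical toy data: observed labels and classifier scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(y_true, scores)

for fpr, tpr in points:
    print(f"FPR = {fpr:.2f}  TPR = {tpr:.2f}")
```

Plotting these points with FPR on the x-axis and TPR on the y-axis gives the ROC curve; a curve that climbs quickly towards the top left corner corresponds to a better classifier.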
TechKnowledge Publications

Four outcomes of a classifier

                        Observed positive (P)    Observed negative (N)
Predicted positive      TRP                      FAP
Predicted negative      FAN                      TAN

Fig. 3.2.2

A dataset has two labels (P and N), and a classifier separates the dataset into four outcomes: TRP, TAN, FAP and FAN. The ROC plot is based on two basic measures, specificity and sensitivity, that are calculated from the four outcomes.

x-axis: 1 - Specificity (false positive rate)
y-axis: Sensitivity (true positive rate)

ROC curves in the top left corner area, near (0.0, 1.0), show good performance levels. ROC curves in the bottom right corner area, near (1.0, 0.0), indicate poor performance levels.

[Figure: ROC curve of a random classifier. x-axis: 1 - Specificity, y-axis: Sensitivity, both from 0.00 to 1.00; the regions are labelled Good, Random and Poor.]

Fig. 3.2.3: A ROC curve represents a classifier with the random performance level. The curve separates the space into two areas for good and poor performance levels.

3.2.6 Cumulative Gain and Lift Charts

Gain or lift is a measure of the effectiveness of a classification model. It is calculated as the ratio between the results obtained with and without the model. It is a visual aid for evaluating the performance of a classification model. Both charts consist of a lift curve and a baseline.

For example, an educational institute wants to run a mail marketing drive for a new course. It costs the institute Rs 1 for each item mailed. The institute has information on 1,00,000 students, of whom 20,000 showed a positive response. Suppose we use a response model to assign a score to each student.

Prediction of response model

Cost        Total number of students contacted    Positive response
10000       10000                                  6000
20000       20000                                  10000
30000       30000                                  13000
40000       40000                                  15800
50000       50000                                  17000
60000       60000                                  18000
70000       70000                                  18800
80000       80000                                  19400
90000       90000                                  19800
1,00,000    1,00,000                               20,000

Cumulative gain chart

The y-axis shows the percentage of positive responses and the x-axis shows the percentage of students contacted.

Baseline: the overall response rate. If the institute contacts n% of the students without a model, it reaches n% of the positive responders.

Lift curve: using the predictions of the response model, calculate the percentage of positive responses for each percentage of students contacted, e.g. (6000/20000) x 100 = 30%.

[Figure: Cumulative Gains Chart. x-axis: % customers contacted (0 to 100), y-axis: % responses (0 to 100); curves: lift curve and baseline.]

Fig. 3.2.4

Lift chart

It shows the actual lift. Contacting 10% of the students using no model we should get 10% of the responders, and using the model we get 30% of the responders, so the value of the lift curve is 30/10 = 3. Similarly, contacting 20% of the students yields 50% of the responders, so the lift is 50/20 = 2.5. The cumulative gain and lift charts give an idea of which customers to contact.
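The gain and lift values quoted for the mailing example can be reproduced directly from the response table. The sketch below is only an illustration: the function and variable names are hypothetical, while the numbers are the ones given in the example (1,00,000 students, 20,000 responders).

```python
# Cumulative numbers of students contacted and positive responses (from the table).
contacted = [10000, 20000, 30000, 40000, 50000,
             60000, 70000, 80000, 90000, 100000]
responses = [6000, 10000, 13000, 15800, 17000,
             18000, 18800, 19400, 19800, 20000]

TOTAL_STUDENTS = 100000
TOTAL_RESPONDERS = 20000


def gain_and_lift(contacted, responses, total_students, total_responders):
    """Return a list of (% contacted, % of responders reached, lift) rows."""
    rows = []
    for n, r in zip(contacted, responses):
        pct_contacted = 100.0 * n / total_students
        pct_gain = 100.0 * r / total_responders   # point on the cumulative gain curve
        lift = pct_gain / pct_contacted            # the baseline has lift 1.0
        rows.append((pct_contacted, pct_gain, lift))
    return rows


rows = gain_and_lift(contacted, responses, TOTAL_STUDENTS, TOTAL_RESPONDERS)
for pct_contacted, pct_gain, lift in rows:
    print(f"{pct_contacted:5.0f}% contacted -> {pct_gain:5.1f}% of responders, lift {lift:.2f}")
```

The lift falls towards 1.0 as more students are contacted, which is why the model is most useful for choosing the first few deciles to contact.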
[Figure: Lift Chart. x-axis: % customers contacted (10 to 100), y-axis: lift; curves: lift curve and baseline.]

Fig. 3.2.5

3.3 Bayesian Methods

Bayes' theorem is one of the earliest probabilistic inference algorithms, developed by Reverend Bayes. The naive Bayes classifier is a classification technique based on Bayes' theorem. It assumes that there is independence among predictors. In simple terms, a naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.

For example, a fruit may be considered to be an apple if it is red, round, and about 3 inches in diameter. Even if these features depend on each other or upon the existence of the other features, all of these properties independently contribute to the probability that this fruit is an apple, and that is why the method is known as 'naive'.

\[ p(class \mid data) = \frac{p(data \mid class) \cdot p(class)}{p(data)} \]

3.3.1 Bayes' Theorem Implementation

Let us implement Bayes' theorem using a simple example. Suppose we want to find the odds of an individual having high blood pressure, given that he or she was tested for it and got a positive result. In the medical field probabilities play a very important role, as it usually deals with life and death situations.

We assume the following:

P(Bp) is the probability of a person having blood pressure. Assume 1% of the general population has blood pressure, so P(Bp) = 0.01.

P(Pos) is the probability of getting a positive test result.

P(Neg) is the probability of getting a negative test result.

P(Pos|Bp) is the probability of getting a positive result on a test done for detecting blood pressure, given that you have blood pressure. This has a value of 0.9. In other words the test is correct 90% of the time. This is also called the Sensitivity or True Positive Rate.

P(Neg|~Bp) is the probability of getting a negative result on a test done for detecting blood pressure, given that you do not have blood pressure. This also has a value of 0.9 and is therefore correct 90% of the time. This is also called the Specificity or True Negative Rate.

The Bayes formula is as follows:

\[ P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)} \]

P(A) is the prior probability of A occurring independently. In our example this is P(Bp). This value is given to us.

P(B) is the prior probability of B occurring independently. In our example this is P(Pos).

P(A|B) is the posterior probability that A occurs given B. In our example this is P(Bp|Pos), that is, the probability of an individual having blood pressure, given that the individual got a positive test result. This is the value that we are looking to calculate.

P(B|A) is the likelihood of B occurring, given A. In our example this is P(Pos|Bp). This value is given to us.

Putting our values into the formula for Bayes' theorem we get:

\[ P(Bp \mid Pos) = \frac{P(Bp) \cdot P(Pos \mid Bp)}{P(Pos)} \]

The probability of getting a positive test result, P(Pos), can be calculated using the sensitivity and specificity as follows:

\[ P(Pos) = [P(Bp) \cdot Sensitivity] + [P(\sim Bp) \cdot (1 - Specificity)] \]

P(Bp) = probability of having blood pressure = 0.01
P(~Bp) = probability of not having blood pressure = 0.99
Sensitivity = P(Pos|Bp) = probability of getting a positive result given blood pressure = 0.9
Specificity = P(Neg|~Bp) = probability of getting a negative result given no blood pressure = 0.9

3.3.2 Naive Bayes Classifier (Simplification)

The naive Bayes algorithm reduces the complexity of Bayes' theorem by assuming conditional independence over the training dataset. This assumption makes the Bayes algorithm naive.

Given n different attribute values, the likelihood can now be written as

\[ P(X_1, \ldots, X_n \mid Y) = \prod_{i=1}^{n} P(X_i \mid Y) \]

The naive Bayes algorithm considers the presence of a particular feature in a class to be independent of, that is not related to, the presence of any other feature. In the apple example above, all properties contribute independently to the probability that the fruit is an apple.

In the blood pressure example we considered only one feature, the test result. Suppose we add another feature, 'exercise'. Let's say this feature has a binary value of 0 or 1, where the former signifies that the individual exercises less than or equal to 2 days a week and the latter signifies that the individual exercises greater than or equal to 3 days a week.

If we had to use both of these features, namely the test result and the value of the 'exercise' feature, to compute our final probabilities, applying Bayes' theorem directly becomes impractical, because the joint likelihood of all features given the class would be needed. Naive Bayes is an extension of Bayes' theorem that assumes that all the features are independent of each other.
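The blood pressure example of Section 3.3.1 can be evaluated numerically. The sketch below simply plugs the stated values (prior 0.01, sensitivity 0.9, specificity 0.9) into the two formulas; the function name and parameter names are illustrative and not part of the text.

```python
def posterior_given_positive(prior, sensitivity, specificity):
    """P(condition | positive test) computed with Bayes' theorem."""
    # P(Pos) = P(Bp) * Sensitivity + P(~Bp) * (1 - Specificity)
    p_pos = prior * sensitivity + (1.0 - prior) * (1.0 - specificity)
    # P(Bp | Pos) = P(Bp) * P(Pos | Bp) / P(Pos)
    return prior * sensitivity / p_pos


p_bp_given_pos = posterior_given_positive(prior=0.01, sensitivity=0.9, specificity=0.9)
print(f"P(Bp | Pos) = {p_bp_given_pos:.4f}")
```

With these inputs P(Pos) = 0.009 + 0.099 = 0.108, so the posterior is 0.009 / 0.108, roughly 0.083. Even with a test that is correct 90% of the time, a positive result leaves only about an 8% chance of having the condition, because the condition is rare in the population.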
Advantages

It is easy and fast to predict the class of a test data set. It performs well in multi-class prediction.

When the assumption of independence holds, a naive Bayes classifier performs better compared to other models like logistic regression, and you need less training data.

It performs well in the case of categorical input variables compared to numerical variable(s). For numerical variables, a normal distribution is assumed (bell curve, which is a strong assumption).

Disadvantages

If a categorical variable has a category in the test data set which was not observed in the training data set, then the model will assign a 0 (zero) probability and will be unable to make a prediction. This is often known as "Zero Frequency". To solve this, one of the simplest techniques is called Laplace estimation.

The limitation of naive Bayes is the assumption of independent predictors. In real life situations, it is almost impossible to get a set of predictors which are completely independent.

Applications of Naive Bayes Algorithms

Naive Bayes is used for making predictions in real time. It is very fast.

It is used for multi-class prediction. It predicts the probability of multiple classes of the target variable.

Naive Bayes classifiers are mostly used in text classification, where (due to better results in multi-class problems and the independence rule) they have a higher success rate compared to other algorithms. As a result, they are widely used in spam filtering (identifying spam e-mail) and sentiment analysis (in social media analysis, to identify positive and negative customer sentiments).

A naive Bayes classifier and collaborative filtering together build a recommendation system. It uses machine learning and data mining techniques to predict whether a user would like a given resource or not.

Example of Naive Bayes Classifier

Sr. No    Age         Income    Student    Credit rating    Class: Buys computer
1         <30         High      No         Fair             No
2         <30         High      No         Excellent        No
3         30 to 59    High      No         Fair             Yes
4         >60         Medium    No         Fair             Yes
5         >60         Low       Yes        Fair             Yes
6         >60         Low       Yes        Excellent        No
7         30 to 59    Low       Yes        Excellent        Yes
8         <30         Medium    No         Fair             No
9         <30         Low       Yes        Fair             Yes
10        >60         Medium    Yes        Fair             Yes
11        <30         Medium    Yes        Excellent        Yes
12        30 to 59    Medium    No         Excellent        Yes
13        30 to 59    High      Yes        Fair             Yes
14        >60         Medium    No         Excellent        No

Prediction for X = (age <= 30, income = medium, student = yes, credit_rating = fair), where age <= 30 corresponds to the "<30" group in the table:

P(c1) = P(buys_computer = yes) = 9/14 = 0.643
P(c2) = P(buys_computer = no) = 5/14 = 0.357

Each conditional probability is a ratio of row counts, for example

P(age <= 30 | buys_computer = yes) = (number of rows where age <= 30 and buys_computer = yes) / (number of rows where buys_computer = yes)

P(age <= 30 | buys_computer = yes) = 2/9 = 0.222
P(age <= 30 | buys_computer = no) = 3/5 = 0.600
P(income = medium | buys_computer = yes) = 4/9 = 0.444
P(income = medium | buys_computer = no) = 2/5 = 0.400
P(student = yes | buys_computer = yes) = 6/9 = 0.667
P(student = yes | buys_computer = no) = 1/5 = 0.200
P(credit_rating = fair | buys_computer = yes) = 6/9 = 0.667
P(credit_rating = fair | buys_computer = no) = 2/5 = 0.400

To find P(X | buys_computer = yes):

P(X | buys_computer = yes) = P(age <= 30 | yes) * P(income = medium | yes) * P(student = yes | yes) * P(credit_rating = fair | yes)
= 0.222 * 0.444 * 0.667 * 0.667 = 0.044

In the same way, P(X | buys_computer = no) = 0.600 * 0.400 * 0.200 * 0.400 = 0.019. Weighting by the class priors gives 0.044 * 0.643 = 0.028 for yes and 0.019 * 0.357 = 0.007 for no, so the naive Bayes classifier predicts buys_computer = yes for X.

3.3.3 Bayesian Networks

Bayesian networks are a type of probabilistic graphical model. They are used to build models from data and/or expert opinion.

They can be used for a wide range of tasks including time series prediction, decision making under uncertainty, diagnostics, automated insight, anomaly detection and reasoning.

A Bayesian network consists of two main components. The first is an acyclic directed graph where the nodes correspond to the predictive variables and the arcs indicate relationships of stochastic dependence. The variable X_j associated with node a_j in the network is dependent on the variables associated with the predecessor nodes of a_j.

The second component consists of a table, associated with each variable X_j, that gives the conditional distribution P(X_j | C_j), where C_j represents the set of explanatory variables associated with the predecessor nodes of node a_j in the network. It is estimated based on the relative frequencies in the dataset.

3.4 Logistic Regression

Logistic regression is used to:

Estimate the probability that an event occurs for a randomly selected observation versus the probability that the event does not occur.

Predict the effect of variables on a binary response variable.

Classify observations by estimating the probability that an observation is in a particular category.

Model the probability of an event occurring depending on the values of the independent variables, which can be categorical or numerical.
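Logistic regression keeps the modelled probability between 0 and 1 by passing a linear score through the logistic (sigmoid) function 1/(1 + e^-x); the inverse mapping is the logit, or log odds. The sketch below illustrates both. The coefficients `b0` and `b1` and the entrance-test scores are hypothetical values chosen only for illustration; they are not fitted to any data from the text.

```python
import math


def sigmoid(x):
    """Logistic function: maps any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """Log odds: the inverse of the sigmoid for 0 < p < 1."""
    return math.log(p / (1.0 - p))


# Hypothetical model: probability of admission as a function of test score.
b0, b1 = -10.0, 0.02   # assumed coefficients, not estimated from data


def admission_probability(score):
    return sigmoid(b0 + b1 * score)


for score in (300, 500, 700):
    print(score, round(admission_probability(score), 3))
```

The printed probabilities rise along an S-shaped curve as the score increases, while the logit of each probability is a straight-line function of the score.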
Logistic regression is generally used where the dependent variable is binary. That means the dependent variable can take only two possible values, such as "Yes or No", "Default or No Default", "Living or Dead", "Responder or Non Responder" etc.

Independent factors or variables can be categorical or numerical variables.

Example of logistic regression

Example 1

If a credit card company is going to build a model to decide whether to issue a credit card to a customer, it will model whether the customer is going to "Default" or "Not Default" on this credit card. This is called "Default Propensity Modeling".

The probability of any event lies between 0 and 1 (or 0% to 100%). When we plot the probability of the dependent variable by independent factors, it will demonstrate an 'S' shape curve.

Example 2

Suppose we have to predict the probability of a given candidate getting admission to a college of his or her choice from the score the candidate receives in the admission test. The dependent variable is binary: "Admission" or "No Admission".

Since the relationship between the score and the probability of selection is not linear (it shows an 'S' shape), we cannot use a linear model to predict the probability of selection from a score. We need to do a logit transformation of the dependent variable to make the relationship between the predictor and the dependent variable linear, and then use a logistic regression model to predict the probability of getting the "Admission".

[Figure: probability of selection (0.0% to 100.0%) plotted against score in entrance test (200 to 800).]

Fig. 3.4.1: Graph for selection of college

The above graph is called the sigmoid function and it gives an S-shaped curve. It gives a value 0 < p < 1.

The logistic function is defined as:

\[ Transformed = \frac{1}{1 + e^{-x}} \]

where e is the numerical constant Euler's number and x is the input we plug into the function. The logit expression can be written as

\[ \log \frac{p(x)}{1 - p(x)} \]

which is called the logit or log odds function. The odds signify the ratio of the probability of success to the probability of failure.

3.5 Neural Networks

A neural network comprises units (neurons) which are arranged in layers. It converts an input vector into some output. Each unit takes an input, applies a (often nonlinear) function to it and then passes the output on to the next layer. Generally the networks are defined to be feed-forward: a unit feeds its output to all the units on the next layer, but there is no feedback to the previous layer.

Weightings are applied to the signals which pass from one unit to another, and it is these weightings which are tuned in the training phase to adapt a neural network to the particular problem at hand.

3.5.1 The Rosenblatt Perceptron

Perceptrons were popularised by Frank Rosenblatt in the 1960s. They appeared to have a very powerful learning algorithm. A perceptron is a neural network unit (an artificial neuron) which does certain computations to detect features or business intelligence in the input data.

It consists of a single neuron with adjustable synaptic weights and bias. It can be used to classify linearly separable patterns. A simple perceptron can be used to classify into two classes. A perceptron is a supervised learning algorithm for binary classifiers. This algorithm enables neurons to learn and processes elements in the training set one at a time.

[Figure: inputs, weights, net input function, activation function, output.]

Fig. 3.5.1

There are two types of perceptrons: single layer and multilayer.

Single layer perceptrons can learn only linearly separable patterns.

Multilayer perceptrons, or feed-forward neural networks with two or more layers, have greater processing power.

Perceptron Function

The perceptron is a function that maps its input x, which is multiplied with the learned weight coefficients, to an output value f(x):

\[ f(x) = \begin{cases} 1 & \text{if } w \cdot x + b > 0 \\ 0 & \text{otherwise} \end{cases} \]

Where,
w = vector of real-valued weights.
b = bias (an element that adjusts the boundary away from the origin without any dependence on the input value).
x = vector of input values.

\[ w \cdot x = \sum_{i=1}^{m} w_i x_i \]

Where m = number of inputs to the perceptron.

The output can be represented as "1" or "0". It can also be represented as "1" or "-1" depending on which activation function is used.
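The perceptron function can be sketched in a few lines. The prediction step below follows the step rule f(x) = 1 if w.x + b > 0, else 0. The training loop uses the classic perceptron update (add the error times the input to the weights), which is an assumption here because the text describes weight adjustment only informally; the AND-gate data, learning rate and epoch count are likewise illustrative choices.

```python
def predict(weights, bias, x):
    """Step activation: 1 if the weighted sum plus bias is positive, else 0."""
    net_input = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if net_input > 0 else 0


def train_perceptron(samples, labels, learning_rate=1.0, epochs=20):
    """Process the training set one element at a time, adjusting weights on errors."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            error = target - predict(weights, bias, x)   # 0 when the prediction is right
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias


# Logical AND is linearly separable, so a single perceptron can learn it.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]

weights, bias = train_perceptron(samples, labels)
predictions = [predict(weights, bias, x) for x in samples]
print(weights, bias, predictions)
```

Replacing the labels with those of XOR, which is not linearly separable, would leave a single perceptron unable to classify all four points, which is the motivation for multilayer networks.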
Inputs of a Perceptron

A perceptron accepts inputs, moderates them with certain weight values, then applies the transformation function to output the final result.

A Boolean output is based on inputs such as salaried, married, age, past credit profile, etc. It has only two values: Yes and No, or True and False. The summation function "Σ" multiplies all inputs x by the weights w and then adds them up as follows:

\[ \sum_{i=1}^{m} w_i x_i \]

For example: if \sum w_i x_i > 0, then the final output "o" = 1 (issue bank loan). Else, the final output "o" = -1 (deny bank loan).

In the perceptron learning rule, the predicted output is compared with the known output. If it does not match, the error is propagated backward to allow weight adjustment to happen.

The perceptron has the following characteristics:

The perceptron is an algorithm for supervised learning of a single layer binary linear classifier.
Optimal weight coefficients are automatically learned.
Weights are multiplied with the input features and a decision is made on whether the neuron is fired or not.
The activation function applies a step rule to check if the output of the weighting function is greater than zero.
A linear decision boundary is drawn, enabling the distinction between the two linearly separable classes +1 and -1.
If the sum of the input signals exceeds a certain threshold, it outputs a signal; otherwise, there is no output.

3.5.2 Multi-Level Feed-Forward Networks

A multilayer perceptron (MLP) includes at least one hidden layer (in addition to one input layer and one output layer).

A multi-level feed-forward neural network is a more complex structure than the perceptron, since it includes input nodes, hidden nodes and output nodes.

Here is the basic structure: a neural network with two input nodes i1 and i2, two hidden neurons h1 and h2, and two output neurons o1 and o2.

[Figure: network of input, hidden and output nodes connected by weighted arcs.]

Fig. 3.5.2

The goal of back propagation is to optimize the weights so that the neural network can learn how to correctly map arbitrary inputs to outputs.

Input nodes: Input nodes receive as input the values of the explanatory attributes for each observation. Usually, the number of input nodes equals the number of explanatory variables.

Hidden nodes: Hidden nodes receive the information from input nodes and transform the input values inside the network. Each node is connected with outgoing arcs to output nodes or to other hidden nodes.

Output nodes: Output nodes receive connections from hidden nodes or from input nodes and return an output value that corresponds to the prediction of the response variable.

Each node of the network has given weights which are associated with the input arcs. Each node is associated with a distortion or bias coefficient and an activation function.

The back propagation algorithm is used in multi-level feed-forward networks.

3.6 Support Vector Machine

The simplest way to describe an SVM is as a binary classifier. It attempts to find a hyperplane that can separate two classes of data by the largest margin. Quazi Marufur Rahman gives a very good example of what the margin is, and Janice Gates points out the kernel trick. The kernel trick is arguably the most important part of SVM; it distinguishes SVM from other classifiers.

3.6.1 Structural Risk Minimization

Structural Risk Minimization (SRM) (Vapnik and Chervonenkis, 1974) is an inductive principle for model selection used for learning from finite training data sets.

It describes a general model of capacity control and provides a trade-off between hypothesis space complexity (the VC dimension of approximating functions) and the quality of fitting the training data.

Suppose we have two-dimensional data with features x1 and x2.

[Figure: scatter of data points in the plane of the two features.]

Fig. 3.6.1

The above data can be divided into two classes, class 1 and class 2. The above data is linearly separable.

[Figure: training data; the line f(x) = f(x1, x2) = 0 separates class +1, where f(x) > 0, from class -1, where f(x) < 0.]

Fig. 3.6.2

A straight line will classify the data into two classes. The equation is f(x) = f(x1, x2) = 0. The classifier is called a linear classifier. The data is called the training set.

[Figure: unseen pattern (test data).]

Fig. 3.6.3
If f(x) < 0 then the unseen data point is classified as class -1.

Suppose we have two quantities, training error and test error. Since the classifier learns the pattern during the training phase, the value of training error should be low. Test error should also be low, because it reflects the distribution of unseen data.

Now we are in a position to state the following:

1. We have to always look for test error along with training error.
2. Improving on training error does not always improve test error.
3. An increase in machine capacity may result in poor test performance.
4. It is difficult to estimate the true test error of a classifier.

To ensure a low test error of a classifier, the VC dimension is used:

\[ \text{Test error} \le \text{training error} + \sqrt{\frac{a \left( \log \frac{2m}{a} + 1 \right) - \log \frac{\eta}{4}}{m}} \]

It gives an upper bound on the test error with probability 1 - \eta.

Where,
m = number of training samples.
a = VC dimension, related to the capacity (complexity) of the machine.
VC = Vapnik-Chervonenkis.

The graph of VC dimension with fixed sample size:

[Figure: x-axis: VC dimension; curves: training error, complexity, and the upper bound on test error, where test error <= training error + complexity.]

Fig. 3.6.4

As we increase the VC dimension, the training error will be reduced. Complexity increases with VC dimension. The upper bound first decreases and later on increases. For an efficient classifier, the value of test error should be minimum. To achieve this, the sum of the penalty (complexity) error and the training error should be minimum.

Points in general position

In n-dimensional feature space a set of m points (m > n) is in general position iff no subset of (n + 1) points lies on an (n - 1)-dimensional hyperplane.

[Figure: n = 2, m = 4, where m > n and n + 1 = 3.]

Fig. 3.6.5

Suppose we add one more point, so that three points are lying on a straight line.

[Figure: three points on a straight line.]

Fig. 3.6.6

So we can say the above 3 points are not in general position.

Shattering

A hypothesis (H) shatters m points in n-dimensional space if all possible labelings of the m points in n-dimensional space are correctly classified by H.

[Figure: n = 2, m = 3.]

Fig. 3.6.7

There are 2^3 = 8 possible arrangements, as each point can take two values, 0 or 1.

So the VC dimension is the cardinality of the largest set of points that the hypothesis can shatter. The VC dimension of a linear classifier is (n + 1) (points should be in general position).

For a nonlinear classifier the VC dimension is difficult to compute. The VC dimension is directly related to machine/hypothesis capacity. The VC dimension gives a probabilistic upper bound on the test error.

3.6.2 Maximal Margin Hyperplane for Linear Separation

The following is an example of a hyperplane that separates training instances with no errors.

[Figure: a hyperplane separating two classes of points.]

Fig. 3.6.8

On reflection, there are multiple hyperplanes which can be chosen for separating the two classes of data points.

[Figure: several candidate separating hyperplanes.]

Fig. 3.6.9

For the maximum margin hyperplane only the examples on the margin matter (only these affect the distances). These are called support vectors.
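To illustrate why only the points closest to the hyperplane matter, the sketch below measures the distance of each training point from a given separating hyperplane w.x + b = 0, finds the shortest distance to the closest positive point and to the closest negative point, and adds them to obtain the margin. The weight vector, bias and data points are hypothetical values chosen for illustration, and the hyperplane is assumed to separate the two classes already; no optimisation is performed.

```python
import math


def distance_to_hyperplane(w, b, x):
    """Perpendicular distance of point x from the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def margin(w, b, points, labels):
    """Return (closest positive point, closest negative point, margin)."""
    positives = [x for x, y in zip(points, labels) if y == +1]
    negatives = [x for x, y in zip(points, labels) if y == -1]
    closest_pos = min(positives, key=lambda x: distance_to_hyperplane(w, b, x))
    closest_neg = min(negatives, key=lambda x: distance_to_hyperplane(w, b, x))
    d_plus = distance_to_hyperplane(w, b, closest_pos)
    d_minus = distance_to_hyperplane(w, b, closest_neg)
    return closest_pos, closest_neg, d_plus + d_minus


# Hypothetical separating hyperplane x1 + x2 - 3 = 0 and toy training data.
w, b = (1.0, 1.0), -3.0
points = [(2.0, 2.0), (3.0, 3.0), (1.0, 1.0), (0.0, 0.0)]
labels = [+1, +1, -1, -1]

support_pos, support_neg, total_margin = margin(w, b, points, labels)
print(support_pos, support_neg, round(total_margin, 4))
```

Moving the points (3, 3) or (0, 0) further away does not change the result, since only the closest point on each side determines the margin.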
3-18

and Data Analytics Buslness Intellgence and Data Analytlcs 3-19 Classificatlon and Clustering
Business Intelllgence
Definition
3.7 Clustering
planes Hsuch that:
Define the hyper Cluster analysis or clustering is the task of grouplng a set of objects in such a way that objects in the same group
Wx,+b2+1, wheny,=+1.
(called a cluster) are more simllar (ln some sense) to each other than to those In other groups (clusters).
wx,+b2-1, when y,=-1.
theplanes:
3.7.1 Clustering Methods
H, andH, are
H,:Wx+b2+1 Clustering methods must satlsfy a few general necessities, as indicated below.
vectors.
H, Wx,+ b2-1 the tips ofthe
support Clustering Methods
planes H, and H, are Necessltles
Thepoints on the where wx +b= 0.
median in between,
The planes H, is the positivepoint
1. Flexibility
distance to the closest
d*= the shortest negative point.
distanceto the closest 2. Robustness
d= the shortest
The margin (gutter) of a separating hyperplane is $d^{+} + d^{-}$.

$W \cdot X + b = +1$
$W \cdot X + b = 0$
$W \cdot X + b = -1$

Fig. 3.6.10

3.6.3 Nonlinear Separation

Nonlinear Classification: Classes may not be separable by a linear boundary.

Kernels: Make linear models work in nonlinear settings by mapping data to higher dimensions where it exhibits linear patterns.

The simplest way to separate two groups of data is with a straight line, a flat plane or an N-dimensional hyperplane. However, there are situations where a nonlinear region can separate the groups more efficiently.

SVM handles this by using a (nonlinear) kernel function to map the data into a different space where a (linear) hyperplane can be used to do the separation.

This means that a nonlinear function is learned by a linear learning machine in a high-dimensional feature space, where the capacity of the system is controlled by a parameter that does not depend on the dimensionality of the space. This is called the kernel trick: the kernel function transforms the data into a higher-dimensional feature space to make it possible to perform the linear separation.

A kernel function maps the data into a new space and takes the inner product of the new vectors. The image of the inner product of the data is the inner product of the images of the data. Two kernel functions are shown below:

Polynomial kernel

Gaussian kernel

$k(x, x') = \exp\left(-\frac{\lVert x - x' \rVert^{2}}{2\sigma^{2}}\right)$

3. Efficiency

Fig. 3.7.1: Clustering Methods Necessities

1. Flexibility

Some clustering methods can be used on numerical attributes only. In such cases the Euclidean metric is used most of the time to determine the distances between observations. A flexible clustering algorithm can also analyse datasets containing categorical attributes.

2. Robustness

The robustness of an algorithm is the stability of the clusters generated with respect to small changes in the values of the attributes of each observation.

3. Efficiency

In some applications there is a large number of observations. In such cases clustering algorithms must generate clusters efficiently in order to guarantee reasonable computing times for large problems.

3.7.2 Taxonomy of Clustering Methods

The different types of clustering, based on the logic used, are partition methods, hierarchical methods, density-based methods and grid methods.

Types of Clustering

1. Partition methods
2. Hierarchical methods
3. Density-based methods
4. Grid methods

Fig. 3.7.2: Types of Clustering

1. Partition methods

Partition methods divide the given dataset into a predetermined number K of non-empty subsets. They generate groupings of spherical or at most convex shape.

2. Hierarchical methods

In hierarchical methods, the dataset is divided into a tree structure of clusters, characterized by different homogeneity thresholds. A predetermined number of clusters is not required.

3. Density-based methods

Hierarchical and partition methods are founded on the distance between observations. Density-based methods determine clusters from the number of observations locally falling in a neighbourhood of each observation. For each member which belongs to a specific cluster, a neighbourhood with a specified diameter should contain a number of observations which is not less than a minimum threshold value. Density-based methods identify clusters of non-convex shape, which helps them to isolate any outliers.

4. Grid methods

Grid methods obtain a grid structure consisting of cells. The grid structure is used to reduce computing times, despite a lower accuracy in the clusters generated.

3.7.3 Affinity Measures

In hierarchical clustering, clusters are repeatedly linked into pairs of clusters so that every data object is included in the hierarchy. To determine the similarity between the clusters, distance functions such as the Manhattan and Euclidian distance functions are used.

3.7.3(A) Distance Functions

Given two p-dimensional data objects $i = (x_{i1}, x_{i2}, \ldots, x_{ip})$ and $j = (x_{j1}, x_{j2}, \ldots, x_{jp})$, the following common distance functions can be defined:

1. Euclidian Distance Function

$d(i, j) = \sqrt{\sum_{k=1}^{p} (x_{ik} - x_{jk})^{2}}$

2. Manhattan Distance Function

$d(i, j) = \sum_{k=1}^{p} |x_{ik} - x_{jk}|$

Distances are always non-negative numbers. In the Euclidian distance function, attributes with larger scales of measurement may overwhelm attributes measured on a smaller scale. To prevent this problem, the attribute values are often normalized to lie between 0 and 1.

A third option, which generalizes both the Euclidean and Manhattan metrics, is the Minkowski distance, defined as

$\mathrm{dist}(i, j) = \left( \sum_{k=1}^{p} |x_{ik} - x_{jk}|^{q} \right)^{1/q}$

Example

To calculate the distance between two points $p(x_1, y_1)$ and $q(x_2, y_2)$ in the xy-plane:

Euclidean Distance

$d(p, q) = \sqrt{(x_1 - x_2)^{2} + (y_1 - y_2)^{2}}$

Fig. 3.7.3

Manhattan Distance

The distance between two points is the sum of the (absolute) differences of their coordinates. For example, a straight move of one unit counts as 1, while a crossed (diagonal) move counts as 2.

$d(p, q) = |x_1 - x_2| + |y_1 - y_2|$

Fig. 3.7.4

In chess, the distance between squares on the chessboard for rooks is measured in Manhattan distance.

3.7.4 Attribute

An attribute is a data field which represents a characteristic or feature of a data object. The nouns attribute, dimension, feature and variable are used interchangeably in the literature. In data warehousing, attributes are referred to as dimensions. In machine learning literature the term feature is used, while statisticians prefer the term variable.

Data mining and database professionals commonly use the term attribute. Attributes describing a customer object can include, for example, customer ID, name and address.

A univariate distribution involves only one attribute. The distribution of data having two attributes is known as bivariate.

The type of an attribute is determined by the set of possible values the attribute can have. Attributes can be nominal, binary, ordinal or numeric. In the following subsections, we introduce each type.
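The distance functions of Section 3.7.3(A) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names (`minkowski`, `manhattan`, `euclidean`, `min_max_normalize`) and the sample points are assumptions made for this example and do not come from the text. The Manhattan and Euclidean distances are obtained as the Minkowski distance with q = 1 and q = 2.

```python
def minkowski(a, b, q):
    """Minkowski distance of order q between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("both objects must have the same number of attributes")
    return sum(abs(x - y) ** q for x, y in zip(a, b)) ** (1.0 / q)


def manhattan(a, b):
    """Manhattan distance: Minkowski distance with q = 1."""
    return minkowski(a, b, 1)


def euclidean(a, b):
    """Euclidean distance: Minkowski distance with q = 2."""
    return minkowski(a, b, 2)


def min_max_normalize(column):
    """Rescale the values of one attribute so that they lie between 0 and 1."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant attribute carries no distance information.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Hypothetical sample points in the xy-plane.
point_p = (1.0, 1.0)
point_q = (4.0, 5.0)
print(manhattan(point_p, point_q))  # 7.0
print(euclidean(point_p, point_q))  # 5.0
print(min_max_normalize([2, 4, 6]))  # [0.0, 0.5, 1.0]
```

Normalizing each attribute before computing distances is one simple way to keep an attribute with a large scale of measurement from dominating the result.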

Types of Attribute

1. Binary
2. Nominal
3. Ordinal
4. Mixed Composition Attribute

Fig. 3.7.5

1. Binary Attributes

A binary attribute is a nominal attribute with only two categories or states, 0 or 1. 0 means that the attribute is absent and 1 means that it is present. Binary attributes are also referred to as Boolean, as the two states correspond to true and false. For example, for the attribute Smoker describing a patient object, 1 indicates that the patient smokes, while 0 indicates that the patient does not.

In our application, a similarity measure for two objects, i and j, will typically return the value 0 if the objects are unalike. The higher the similarity value, the greater the similarity between objects. (Typically, a value of 1 indicates complete similarity, that is, that the objects are identical.)

A dissimilarity measure works the opposite way. It returns a value of 0 if the objects are the same (and therefore, far from being dissimilar). The higher the dissimilarity value, the more dissimilar the two objects are.

A nominal attribute can take on two or more states. For example, flower color is a nominal attribute that may have, say, five states: red, yellow, green, pink and blue. Let the number of states of a nominal attribute be M. The states can be denoted by letters, symbols or a set of integers, such as 1, 2, ..., M. The dissimilarity between two objects i and j can be computed based on the ratio of mismatches:

$d(i, j) = \frac{p - m}{p}$

where m is the number of matches (i.e., the number of attributes for which i and j are in the same state), and p is the total number of attributes describing the objects. Weights can be assigned to increase the effect of m or to assign greater weight to the matches in attributes having a larger number of states.

There is another approach, which involves computing a dissimilarity matrix from the given binary data.

Table 3.7.1: A contingency table for binary attributes

                  Object j
                  1        0        sum
Object i    1     q        r        q + r
            0     s        t        s + t
            sum   q + s    r + t    p

Here q is the number of attributes that equal 1 for both objects i and j, r is the number of attributes that equal 1 for object i but 0 for object j, s is the number of attributes that equal 0 for object i but 1 for object j, and t is the number of attributes that equal 0 for both objects i and j. The total number of attributes is p, where p = q + r + s + t. Recall that for symmetric binary attributes, each state is equally valuable.

If objects i and j are described by symmetric binary attributes, then the dissimilarity between i and j is

$d(i, j) = \frac{r + s}{q + r + s + t}$

The above equation expresses the degree of similarity between pairs (i, j) of observations through a coefficient of similarity. Assume now that all n attributes are binary and asymmetric. In that case, for a pair of observations it is more interesting to match positives, that is, records possessing the property relative to each attribute. For asymmetric binary variables, the dissimilarity based on the Jaccard coefficient is therefore used:

$d(i, j) = \frac{r + s}{q + r + s}$

2. Nominal Attribute

Nominal means "relating to names." The values of a nominal attribute are symbols or names of things. Each value denotes some kind of category, code or state. Nominal attributes are also referred to as categorical. In computer science, the values are also known as enumerations.

Example of nominal attributes: suppose that Hair color and Marital status are two attributes describing person objects. In our application, possible values for Hair color are black, brown, blond, red, auburn, grey and white.

A nominal attribute is a symmetric attribute with more than two possible values. We use the similarity coefficient in extended form,

$\mathrm{dist}(i, j) = \frac{n - f}{n}$

where f is the number of attributes in which observations i and j take the same value.

3. Ordinal Attribute

An ordinal attribute has possible values with a meaningful order or ranking among them, but the magnitude between consecutive values is not known.

Suppose that Drink size corresponds to the size of drinks available at a restaurant. This ordinal attribute has three possible values: small, medium and large. However, we cannot tell from the values how much bigger, say, a large is than a medium. An ordinal variable can be discrete or continuous. Order is important, and the variable can be treated like an interval-scaled one:

Replace the value of the ordinal variable by its rank $r \in \{1, \ldots, M\}$.
Map the range of the variable onto [0, 1].

4. Mixed Composition Attribute

A dataset may contain all attribute types: nominal, ordinal, symmetric binary, asymmetric binary, etc. To define an overall affinity measure between observations i and j, one can use a weighted formula as follows,

$d(i, j) = \frac{\sum_{f=1}^{p} \delta_{ij}^{(f)} d_{ij}^{(f)}}{\sum_{f=1}^{p} \delta_{ij}^{(f)}}$

where $\delta_{ij}^{(f)}$ is the weight given to attribute f and $d_{ij}^{(f)}$ is the contribution of attribute f to the dissimilarity between i and j.
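The coefficients for binary and nominal attributes discussed above can be checked with a small Python sketch. The function names and the two sample patient records are hypothetical and are introduced only for illustration. The code counts q, r, s and t as in the contingency table, then evaluates the symmetric coefficient, the asymmetric (Jaccard-based) coefficient and the mismatch ratio for nominal attributes.

```python
def binary_counts(i, j):
    """Return (q, r, s, t) of the contingency table for two 0/1 vectors."""
    q = sum(1 for a, b in zip(i, j) if a == 1 and b == 1)
    r = sum(1 for a, b in zip(i, j) if a == 1 and b == 0)
    s = sum(1 for a, b in zip(i, j) if a == 0 and b == 1)
    t = sum(1 for a, b in zip(i, j) if a == 0 and b == 0)
    return q, r, s, t


def symmetric_binary_dissimilarity(i, j):
    """(r + s) / (q + r + s + t): both states count equally."""
    q, r, s, t = binary_counts(i, j)
    return (r + s) / (q + r + s + t)


def asymmetric_binary_dissimilarity(i, j):
    """(r + s) / (q + r + s): joint absences (t) are ignored."""
    q, r, s, _ = binary_counts(i, j)
    if q + r + s == 0:
        # Assumption for this sketch: two all-zero records are treated as identical.
        return 0.0
    return (r + s) / (q + r + s)


def nominal_dissimilarity(i, j):
    """(p - m) / p, where m is the number of attributes with matching states."""
    p = len(i)
    m = sum(1 for a, b in zip(i, j) if a == b)
    return (p - m) / p


# Hypothetical binary records (1 = property present, 0 = absent).
patient_a = [1, 0, 1, 0, 0, 0]
patient_b = [1, 0, 1, 0, 1, 0]
print(binary_counts(patient_a, patient_b))  # (2, 0, 1, 3)
print(symmetric_binary_dissimilarity(patient_a, patient_b))
print(asymmetric_binary_dissimilarity(patient_a, patient_b))

# Hypothetical nominal records: (hair color, marital status).
print(nominal_dissimilarity(("black", "single"), ("black", "married")))  # 0.5
```

For the two sample records the asymmetric coefficient (1/3) is larger than the symmetric one (1/6), because the three attributes that are absent in both records no longer count as agreement.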

If attribute f is numeric, the normalized distance is used.
If f is binary or nominal, $d_{ij}^{(f)} = 0$ if $x_{if} = x_{jf}$; otherwise $d_{ij}^{(f)} = 1$.
If f is ordinal, the ranks are computed and the normalized value $z_{if}$ is treated as numeric.

3.8 Partition Methods

Partition methods are greedy methods of heuristic nature. They proceed iteratively, and at each step they make the choice that locally appears the most advantageous. Although this heuristic nature rules out a guarantee of optimality, a good subdivision is obtained for the majority of datasets. The K-means method and the K-medoids method are two of the best-known partition algorithms.

3.8.1 K-means Algorithm

K-means clustering is an algorithm used to classify or group objects, based on their attributes or features, into k groups, where k is a positive integer. The grouping is done by minimizing the sum of squares of distances between the data and the corresponding cluster centroid.

In the example considered here, the algorithm assumes two clusters, and each individual's scores include two variables. In non-hierarchical clustering such as the k-means algorithm, the relationship between clusters is undetermined. Distance functions, such as the Manhattan and Euclidian distance functions, are used to determine similarity.

Distance Functions

Given two p-dimensional data objects $i = (x_{i1}, x_{i2}, \ldots, x_{ip})$ and $j = (x_{j1}, x_{j2}, \ldots, x_{jp})$, the following common distance functions can be defined:

1. Euclidian Distance Function

$d(i, j) = \sqrt{\sum_{k=1}^{p} (x_{ik} - x_{jk})^{2}}$

2. Manhattan Distance Function

$d(i, j) = \sum_{k=1}^{p} |x_{ik} - x_{jk}|$

Steps of k-means Algorithm

1. Choose k clusters arbitrarily.
2. Initialize the cluster centres with those k clusters.
3. Loop:
   (a) Partition by assigning or reassigning all data objects to their closest cluster center.
   (b) Compute new cluster centers as the mean value of the objects in each cluster.
   (c) Repeat until there is no change in the cluster center calculation.

Example of implementation of the k-means algorithm using k = 2 (partitions)

Individual   Variable 1   Variable 2
1            1.0          1.0
2            1.5          2.0
3            3.0          4.0
4            5.0          7.0
5            3.5          5.0
6            4.5          5.0
7            3.5          4.5

Step 1: We randomly choose two centroids for k = 2. In this case the two centroids are $c_1$ and $c_2$, where $c_1 = (1.0, 1.0)$ and $c_2 = (5.0, 7.0)$.

           Individual   Mean vector
Group 1    1            (1.0, 1.0)
Group 2    4            (5.0, 7.0)

For example, for individual 2 = (1.5, 2.0):

$d(c_1, 2) = \sqrt{|1.0 - 1.5|^{2} + |1.0 - 2.0|^{2}} = 1.12$
$d(c_2, 2) = \sqrt{|5.0 - 1.5|^{2} + |7.0 - 2.0|^{2}} = 6.10$

Step 2: We obtain two clusters containing {1, 2, 3} and {4, 5, 6, 7}. (Individual 3 is equidistant from both centroids and is kept in the first cluster.)

Individual   Distance to centroid 1   Distance to centroid 2
1            0.00                     7.21
2            1.12                     6.10
3            3.61                     3.61
4            7.21                     0.00
5            4.72                     2.50
6            5.31                     2.06
7            4.30                     2.92

The new centroids are

$L_1 = \left(\tfrac{1}{3}(1.0 + 1.5 + 3.0),\ \tfrac{1}{3}(1.0 + 2.0 + 4.0)\right) = (1.83, 2.33)$ for cluster 1
$L_2 = \left(\tfrac{1}{4}(5.0 + 3.5 + 4.5 + 3.5),\ \tfrac{1}{4}(7.0 + 5.0 + 5.0 + 4.5)\right) = (4.12, 5.38)$ for cluster 2
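The k-means loop can be tried on the seven individuals of the worked example with a short Python sketch. The function name `kmeans`, the tie-breaking rule (a point that is equally far from two centroids goes to the lower-numbered one) and the decision to keep the old centroid for an empty cluster are assumptions of this sketch, not statements from the text. The sketch reassigns all points in one pass before recomputing the centroids, which is one common variant of the procedure.

```python
import math


def kmeans(points, initial_centroids, max_iter=100):
    """Basic k-means; returns (labels, centroids).

    labels[n] is the index of the cluster assigned to points[n].
    """
    centroids = [tuple(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # (a) Assign every point to its closest centroid (Euclidean distance).
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # (c) Stop when the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
        # (b) Recompute each centroid as the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


points = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0),
          (3.5, 5.0), (4.5, 5.0), (3.5, 4.5)]
labels, centroids = kmeans(points, [(1.0, 1.0), (5.0, 7.0)])
print(labels)  # [0, 0, 1, 1, 1, 1, 1]
print(centroids)
```

Starting from the centroids (1.0, 1.0) and (5.0, 7.0), the sketch first groups individuals {1, 2, 3} and {4, 5, 6, 7}, then moves individual 3 to the second cluster and stops with individuals 1 and 2 in one cluster and 3 to 7 in the other.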

We are still not sure that each individual has been assigned to the right cluster. So, we compare each individual's distance to its own cluster mean and to that of the opposite cluster. We find:

Individual   Distance to mean (centroid) of Cluster 1   Distance to mean (centroid) of Cluster 2
1            1.5                                        5.4
2            0.4                                        4.3
3            2.1                                        1.8
4            5.7                                        1.8
5            3.2                                        0.7
6            3.8                                        0.6
7            2.8                                        1.1

Individual 3 is closer to the mean of the opposite cluster (Cluster 2) than to the mean of its own (Cluster 1). Each individual's distance to its own cluster mean should be smaller than the distance to the other cluster's mean, which is not the case for individual 3. Thus, individual 3 is relocated to Cluster 2, resulting in the new partition:

            Individual       Mean vector (centroid)
Cluster 1   1, 2             (1.3, 1.5)
Cluster 2   3, 4, 5, 6, 7    (3.9, 5.1)

3.8.2 K-medoids Algorithm

K-means tries to minimize the total squared error, while k-medoids minimizes the sum of dissimilarities between the points labelled to be in a cluster and a point designated as the center of that cluster. In contrast to the k-means algorithm, k-medoids chooses data points as centers.

Instead of taking the mean value of the objects in a cluster as a reference point, a medoid can be used, which is the most centrally located object in the cluster.

K-medoids is also called the Partitioning Around Medoids (PAM) algorithm.

All the items of the input dataset are examined one by one to see whether they are medoids or not.

1. Initialize: arbitrarily select k out of the n data points as the medoids.
2. Associate each data point to the nearest medoid.
3. For each medoid m and each data point h associated to m, swap m and h and compute the total cost of the configuration (that is, the average dissimilarity of h to all the data points associated to m). Select the medoid h with the lowest cost of the configuration.

Repeat steps 2 and 3 alternately until there is no change in the assignments. In simpler terms, for each pair of a medoid m and a non-medoid object h, measure whether h is better than m as a medoid.

Use the squared-error criterion

$E = \sum_{i=1}^{k} \sum_{p \in C_i} d(p, m_i)^{2}$

Compute $E_h - E_m$.
Choose the minimum swapping cost.

Four Swapping Cases

When a medoid m is to be swapped with a non-medoid object h, check each of the other non-medoid objects j.

If j is in the cluster of m, reassign j:

Case 1: j is closer to some k than to h; after swapping m and h, j relocates to the cluster represented by k.
$C_{jmh} = d(j, k) - d(j, m) \ge 0$

Case 2: j is closer to h than to k; after swapping m and h, j is in the cluster represented by h.
$C_{jmh} = d(j, h) - d(j, m)$

If j is in the cluster of some k, not m, compare k with h:

Case 3: j is closer to k than to h; after swapping m and h, j remains in the cluster represented by k.
$C_{jmh} = d(j, k) - d(j, k) = 0$

Case 4: j is closer to h than to k; after swapping m and h, j is in the cluster represented by h.
$C_{jmh} = d(j, h) - d(j, k) < 0$

The K-medoids algorithm requires a large number of iterations and is not suited to deriving clusters for large datasets.

3.9 Hierarchical Methods

Hierarchical clustering generates a hierarchy of clusters. There is no need to specify k, and the method is more deterministic. The graphical representation of the resulting hierarchy is a tree-structured graph called a dendrogram.

In order to calculate the distance between two clusters, hierarchical algorithms resort to one of five alternative measures: minimum distance, maximum distance, mean distance, distance between centroids, and Ward distance.

Types of Hierarchical Clustering

1. Single Linkage
2. Complete Linkage
3. Average Linkage

Fig. 3.9.1: Types of hierarchical clustering

1. Single linkage

In single linkage hierarchical clustering, the distance between two clusters is defined as the shortest distance between two points, one in each cluster. For example, the distance between clusters "r" and "s" is equal to the length of the arrow between their two closest points.

$L(r, s) = \min\big(D(x_{ri}, x_{sj})\big)$

Fig. 3.9.2
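As a rough illustration of the minimum, maximum and mean distance measures listed in Section 3.9, the sketch below computes the distance between two small clusters in Python. The function names and the two sample clusters `r` and `s` are hypothetical, and Euclidean distance between points is assumed.

```python
import math
from itertools import product


def single_linkage(r, s):
    """Minimum distance: the two closest points, one from each cluster."""
    return min(math.dist(a, b) for a, b in product(r, s))


def complete_linkage(r, s):
    """Maximum distance: the two furthest points, one from each cluster."""
    return max(math.dist(a, b) for a, b in product(r, s))


def average_linkage(r, s):
    """Mean distance over all pairs formed by one point of r and one of s."""
    return sum(math.dist(a, b) for a, b in product(r, s)) / (len(r) * len(s))


# Hypothetical clusters in the plane.
r = [(0.0, 0.0), (0.0, 1.0)]
s = [(3.0, 0.0), (4.0, 0.0)]
print(single_linkage(r, s))  # 3.0
print(complete_linkage(r, s))
print(average_linkage(r, s))
```

For any pair of clusters the average distance lies between the minimum and the maximum distance, which gives a quick sanity check on such an implementation.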
2. Complete linkage

In complete linkage hierarchical clustering, the distance between two clusters is defined as the longest distance between two points, one in each cluster. For example, the distance between clusters "r" and "s" is equal to the length of the arrow between their two furthest points.

$L(r, s) = \max\big(D(x_{ri}, x_{sj})\big)$

Fig. 3.9.3

3. Average Linkage

In average linkage hierarchical clustering, the distance between two clusters is defined as the average distance between each point in one cluster and every point in the other cluster. For example, the distance between clusters "r" and "s" is equal to the average length of the arrows connecting the points of one cluster to the other.

$L(r, s) = \frac{1}{n_r n_s} \sum_{i=1}^{n_r} \sum_{j=1}^{n_s} D(x_{ri}, x_{sj})$

Fig. 3.9.4

Ward distance

The Ward distance is based on the analysis of the variance of the Euclidean distances between the observations. Methods based on the Ward distance tend to generate a large number of clusters, each containing a few observations.

Centroid Method

In the centroid method, the distance between the two mean vectors of the clusters is considered as the distance between the two clusters. At each stage of the process we combine the two clusters that have the smallest centroid distance.

Hierarchical methods can be subdivided into two main groups: agglomerative and divisive methods.

3.9.1 Agglomerative and Divisive Hierarchical Methods

3.9.1(A) Agglomerative Method

The agglomerative method is bottom-up clustering. Suppose there is a set of N observations to be clustered. Initially, the distances (similarities) between the clusters equal the distances (similarities) between the items they contain. Join the two most similar clusters.

In the agglomerative or bottom-up clustering method we assign each observation to its own cluster. Then:

Step 1: Calculate the similarity (e.g., distance) between each of the clusters and join the two most similar clusters.
Step 2: Find the nearest (most similar) pair of clusters and merge them into a single cluster, so that now you have one less cluster.
Step 3: Compute the distances (similarities) between the new cluster and each of the old clusters.
Step 4: Repeat steps 2 and 3 until all items are clustered into a single cluster of size N.

3.9.1(B) Divisive Hierarchical Methods

In the divisive or top-down clustering method we allocate all of the observations to a single cluster. We then partition the cluster into the two least similar clusters. Finally, we proceed repetitively on each cluster until there is one cluster for each observation. There is evidence that divisive algorithms produce more accurate hierarchies than agglomerative algorithms in some circumstances, but they are conceptually more complex.

In divisive hierarchical clustering a top-down approach is used. It starts with all objects in one cluster. Clusters are subdivided into smaller and smaller clusters until each object forms a cluster on its own or a certain termination condition is satisfied.

A cluster is split according to some principle, e.g., the maximum Euclidian distance between the closest neighbouring objects in the cluster. Start with a single cluster at the top of the tree and continue splitting it into smaller and smaller clusters till the bottom is reached, where there are n clusters with one member each.

A dendrogram is a tree data structure which illustrates hierarchical clustering techniques. Each level shows the clusters for that level. A leaf is an individual cluster and the root is one cluster. A cluster at level i is the union of its children clusters at level i + 1.

Fig. 3.9.5

3.10 Evaluation of Clustering Models

To measure the performance of a clustering method, one needs to verify that the clusters generated correspond to an actual regular pattern in the data. It is appropriate to apply other clustering algorithms and to compare the results obtained by the different methods.

In this way it is also possible to evaluate whether the number of identified clusters is robust with respect to the different techniques applied.
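One numerical way to compare the clusterings produced by different methods is the silhouette value, which ranges from -1 to +1 and is high when an observation is well matched to its own cluster and poorly matched to neighbouring clusters. The Python sketch below is a hedged illustration: it uses the commonly quoted definition s = (b - a) / max(a, b), where a is the mean distance from an observation to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster. That formula, the function name `silhouette_values`, the value 0 for single-member clusters and the sample data are assumptions of this sketch rather than material from the text.

```python
import math


def silhouette_values(points, labels):
    """Return one silhouette value per point (needs at least two clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    values = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            values.append(0.0)  # convention assumed for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the point itself contributes a distance of 0 to the sum).
        a = sum(math.dist(p, o) for o in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster.
        b = min(
            sum(math.dist(p, o) for o in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values


# Hypothetical data: two well-separated groups of two points each.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
labels = [0, 0, 1, 1]
scores = silhouette_values(points, labels)
print(scores)
print(sum(scores) / len(scores))  # mean silhouette of the partition
```

Computing the mean silhouette for the partitions returned by several algorithms, or for several values of k, gives a simple basis for comparing them, with Euclidean distance used here and Manhattan distance as an equally valid choice.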
Cluster cohesion: Measures how closely related the objects in a cluster are.

Cluster separation: Measures how distinct or well separated a cluster is from the other clusters.

Let $X = \{X_1, X_2, \ldots, X_K\}$ be the set of K clusters generated.

Cohesion is defined as

$\mathrm{coh}(X_h) = \sum_{x_i \in X_h,\ x_k \in X_h} \mathrm{dist}(x_i, x_k)$

Separation between a pair of clusters is defined as

$\mathrm{sep}(X_h, X_f) = \sum_{x_i \in X_h,\ x_k \in X_f} \mathrm{dist}(x_i, x_k)$

Silhouette refers to a method of interpretation and validation of the consistency of clusters of data. The silhouette value is a measure of how similar an object is to its own cluster (cohesion) compared to other clusters (separation). The coefficient value ranges from -1 to +1. A high value indicates that the object is well matched with its own cluster and poorly matched with neighbouring clusters. The silhouette can be calculated with a distance metric such as the Euclidean or Manhattan distance.

Review Questions

Q. 1 What is classification? What are the components of a classification problem? (Refer Section 3.1) (5 Marks)
Q. 2 What are the three phases of a classification model? (Refer Section 3.1.1) (5 Marks)
Q. 3 What are the main components of a classification model? (Refer Section 3.1.2) (5 Marks)
Q. 4 How do you evaluate a classification method? (Refer Section 3.2) (5 Marks)
Q. 5 Explain the holdout method. (Refer Section 3.2.1) (4 Marks)
Q. 6 Explain repeated random sampling. (Refer Section 3.2.2) (4 Marks)
Q. 7 Explain cross-validation. (Refer Section 3.2.3) (4 Marks)
Q. 8 Explain confusion matrices. (Refer Section 3.2.4) (5 Marks)
Q. 9 Explain the ROC curve chart. (Refer Section 3.2.5) (5 Marks)
Q. 10 Explain the cumulative gain and lift chart. (Refer Section 3.2.6) (5 Marks)
Q. 11 Write a short note on Bayesian methods. (Refer Section 3.3) (4 Marks)
Q. 12 Explain the naive Bayes classifier with an example. (Refer Section 3.3.2) (5 Marks)
Q. 13 What are Bayesian networks? (Refer Section 3.3.3) (4 Marks)
Q. 14 Write a short note on logistic regression. (Refer Section 3.4) (5 Marks)
Q. 15 Write a short note on neural networks. (Refer Section 3.5) (5 Marks)
Q. 16 Write a short note on the support vector machine. (Refer Section 3.6) (5 Marks)
Q. 17 What are the characteristics of a clustering method? (Refer Section 3.7.1) (4 Marks)
Q. 18 What is the taxonomy of clustering methods? (Refer Section 3.7.2) (4 Marks)
Q. 19 Write a short note on the binary attribute. (Refer Section 3.7.5(1)) (8 Marks)
Q. 20 Write a short note on the nominal attribute. (Refer Section 3.7.5(2)) (4 Marks)
Q. 21 Write a short note on the ordinal attribute. (Refer Section 3.7.5(3)) (4 Marks)
Q. 22 Explain the K-means method. (Refer Section 3.8.1) (4 Marks)
Q. 23 Explain the K-medoids algorithm. (Refer Section 3.8.2) (5 Marks)
Q. 24 Explain single linkage, complete linkage, average linkage and Ward distance. (Refer Section 3.9) (5 Marks)
Q. 25 How does one evaluate a clustering model? (Refer Section 3.10) (5 Marks)

4 Management Information System

Syllabus
Management Information System (MIS): Classification and Quality of Information. Marketing models: Relational marketing, Sales force management. Logistic and production models: Supply chain optimization, Optimization models for logistics planning, Revenue management systems. Data envelopment analysis, The CCR model, Identification of good operating practices.

4.1 Management Information System (MIS)

A Management Information System (MIS) serves as a foundation for Business Intelligence (BI) by providing structured data and information to support decision-making, analysis and reporting. BI enhances MIS with advanced tools and technologies for data analysis, visualization and predictive insights.

4.1.1 Classification of Information

In the context of Business Intelligence, the classification of information can be tailored to suit specific business needs. Here is how information in BI can be classified:

1. Based on Decision-Making Level

Strategic Information: Supports high-level decision-making by analyzing market trends, competitor performance and long-term forecasts.
Tactical Information: Aids middle-level managers in optimizing business processes, managing resources and improving performance.
Operational Information: Provides granular, real-time data to support daily tasks and immediate decisions, such as inventory levels or customer support metrics.

2. Based on Data Source

Internal Information: Derived from within the organization, such as sales reports, financial statements and employee performance data.
External Information: Collected from external sources like market research, social media, government reports and industry benchmarks.

3. Based on Data Characteristics

Structured Data: Organized into rows, columns and predefined formats, such as data in relational databases.
Unstructured Data: Includes emails, social media posts, images, videos and other formats that require advanced analytics.
Semi-Structured Data: Data with some level of organization, like JSON or XML files.

4. Based on Analytical Purpose

Descriptive Information: Explains what has happened using dashboards, reports and basic analysis.
Diagnostic Information: Analyzes why something happened by identifying patterns and correlations.
Predictive Information: Provides forecasts based on historical data using predictive modeling and machine learning.
Prescriptive Information: Suggests actionable steps to optimize outcomes through advanced simulations and algorithms.

5. Based on Frequency

Static Information: Data that does not change frequently, like archived financial reports.
Dynamic Information: Continuously updated data, such as real-time stock prices or website traffic analytics.

4.1.2 Quality of Information

The quality of information in BI is critical for making informed decisions. BI tools focus on delivering high-quality data with the following attributes:

1. Accuracy: Ensures that the data reflects real-world facts without errors or inconsistencies.
2. Timeliness: Provides up-to-date information for real-time or near-real-time decision-making.
3. Relevance: Delivers information tailored to the specific business questions and user roles.
4. Completeness: Ensures no critical data is missing, providing a full picture of the situation.
5. Consistency: Harmonizes data across multiple sources to avoid contradictions in reports and analysis.
6. Clarity: Presents information in an intuitive and understandable format, such as through visualizations, dashboards or summary reports.
7. Accessibility: Makes data readily available to authorized users through BI platforms or self-service tools.
8. Actionability: Provides insights that can directly influence decision-making and strategy formulation.
9. Scalability: Ensures that the data infrastructure can handle increasing amounts of information as the business grows.

4.1.3 Role of MIS in Business Intelligence

MIS lays the groundwork for BI by ensuring that:

Data is systematically collected, stored and processed.
Information flows efficiently across organizational levels.
A single version of truth is maintained, reducing data silos.
Decision-makers can rely on consistent, accurate and actionable insights.

By integrating MIS with BI tools, organizations can leverage their data to gain a competitive edge, improve performance and drive innovation. The synergy between MIS and BI transforms raw data into strategic assets.
4.2 Marketing Models: Relational Marketing

Let us understand relational marketing with an example. Most of us have noticed that whenever a mobile company is about to launch a new device into the market, a survey is done by the company so that it gets feedback and different opinions from its customers, which helps it to enhance the functionality provided by that device.

And it is not only about mobile phones. When you visit a restaurant, waiters hand out feedback forms along with the bills, in which the customers rate the restaurant on different aspects so that it can improve.

Almost all companies study the behaviour of the customers and the feedback given by them, and try to incorporate the features required by the customers into their product at a reasonable and effective cost, so that the customers are attracted towards the product and the sales of the company increase.

Most e-commerce companies store huge databases which hold collective information about their customers and data regarding their previous purchases. This helps the company to offer its customers options that they are more likely to like, again resulting in growth in sales.

The strategy followed in relational marketing is to start, strengthen, objectify and maintain the relationship between the customers, the stakeholders and the company. Accordingly, analysis is done, and plans are made, executed and evaluated to achieve the objectives.

Relational marketing evolved and became popular in the late 1990s as a way to increase customer satisfaction so that a competitive advantage is achieved.

Initially this approach was adopted by companies providing financial and telecommunication services. Later on it was implemented by almost all companies, which became more concerned about what the customer needs and implemented it accordingly in their respective products so as to sustain themselves in the commercial market.

4.2.1 Motivations and Objectives

The reasons for the spread of relational marketing are complex but interconnected. They are listed below:

With the evolution of companies in their respective fields, the number of customers has also increased comparably.
The innovation-production-obsolescence cycle has been progressively compressed since the 1980s, which boosted customized options for customers.
Increased transparency and flow of data, and also the addition of e-commerce sites, have led to global comparisons between different features and prices, and to reviews from customers who have used a particular product.
Due to the increased number of competitors in the market, it is very uncertain whether a customer will renew an existing service or opt for a new one, because changing services has become much easier and more convenient.
Most companies maintain different levels or versions of the products and services they provide, so that the customer has the flexibility of choosing the services according to his or her requirements and also of switching between the services as and when required.
Data is gathered on the transactions, products and services used by the customers, so that the company has a huge range of data to analyze what customers expect next. Advanced automation techniques are used to analyze this data so that accurate observations are achieved.

Strategies of relational marketing revolve around the following choices:

[Figure labels: Products, Services, Distribution channels, Segments, Sales processes, Prices, Promotion channels, arranged around Relational marketing]

Fig. 4.2.1: Decision-making options for a relational marketing strategy

The above are the choices through which the strategies for relational marketing can be constructed and implemented.

Product services are the services that can be provided by the company for the maintenance of the product after purchase.

Various distribution channels can be constructed to make the product available to the customers. Nowadays companies no longer depend solely on the traditional approach, where the product is distributed to various shops from which the customers purchase it. Instead, products are also distributed to e-commerce sites and sold with attractive offers, due to which customers get a wide range of options to purchase the product.

[Figure labels: Strategy, Organization, Processes, Business intelligence and data mining, Technology, arranged around Relational marketing]

Fig. 4.2.2: Components of a relational marketing strategy

Segments and prices of the product are also maintained to compete in the market. Different creative promotions are run to attract the customers and make them aware of the specification of the product.

The above are the different components used in a relational marketing strategy: the organization, its technologies, its business strategies, its data mining and the processes implemented to construct and promote the product together help in achieving an efficient and strong relationship between the customers and the company.

Fig. 4.2.3 represents the different people involved in a relational marketing strategy, where all the nodes are interconnected with each other.

This decision-making process can be managed and formally expressed with the help of optimization models. The final phase of the marketing activity cycle is the execution of the planned campaign, together with appropriate gathering of results.
The data collected from these results is then put into the marketing data mart for future data mining analysis.
Whenever a campaign is executed, it is important to set procedures that help to control the campaign and to analyze the data obtained as results.
To test how effective the campaign has been, it is important to set aside a selected group of people who have the same features as the people targeted by the campaign, and to take no action towards them, so that the two groups can be compared.

[Figure: the company at the center, connected to employees, sales force, customers, dealers, suppliers and competitors]
Fig. 4.2.3 : Network of relationships involved in a relational marketing strategy
4.2.2 An Environment for Relational Marketing Analysis

[Figure: information systems (operational systems, external data, ETL tools, data warehouse, metadata, marketing data mart) feed business intelligence and data mining analyses (query and OLAP, time series analysis, regression, classification, segmentation, clustering, association rules, optimization), which support the decision-making process (campaign planning, target selection, channel selection, content selection, campaign optimization, implementation)]
Fig. 4.2.4 : Components of an environment for relational marketing analysis

Fig. 4.2.4 shows the main elements used to create an environment for relational marketing analysis.
The information infrastructure consists of the company's data warehouse, which is built by collecting data from different internal and external data sources, and of the marketing data mart, which feeds business intelligence and data mining analyses for understanding the potential of the company and profiling its actual customers.
With different machine learning and pattern recognition models it is easy to identify various segments of the customer base, which can later be used to define and design policies for marketing actions.
Classification models can also be generated to serve different objectives of the company. For example, a classification model can be built to check what the customer frequently buys from the offers provided by the company, and to propose a similar kind of offer only to those customers whose probability of acceptance is higher.
Managing a marketing campaign is a difficult task which needs strong planning for every type of customer: what actions would be taken, through which communication channels the customer can communicate with the company, and how the available resources, both human and financial, are used.

[Figure: the marketing data mart is fed by the data warehouse (customers, products, services, payments, profitability), the marketing database (segments, promotions, campaigns, customer value), the sales force database (contacts, presentations, sales, price list, payments) and the contact center database (call-center, trouble-ticket, survey, clickstream, emails)]
Fig. 4.2.5 : Types of data feeding data mart for relational marketing analysis

[Figure: a cycle of four steps: collect information on customers; identify segments and needs; plan actions based on knowledge; perform optimized and targeted actions]
Fig. 4.2.6 : Cycle of relational marketing analysis

4.2.3 Lifetime Value

Fig. 4.2.7 shows the main stages of the customer lifetime and the cumulative value of a customer throughout that time. It also shows the different actions that a company can take for a customer.
In the starting phase the individual is a prospect, also known as a potential customer, who has not yet started purchasing the products or using the services provided by the company.
For these prospects, acquisition actions are carried out in both direct and indirect fashion. In direct acquisition the prospect is given information about the product or service via calls, emails, talks with the agents of the company and so on.
In indirect acquisition, information about the product is displayed through advertising and on the company's website, highlighting new products or services. These actions involve a cost that is assigned to the prospects approached, and part of it must be counted as a loss, since not all the prospects approached will buy the product or service.
The acquisition event can take different forms in different situations: the service may require a subscription, or the customer may be able to buy the product only after opening an account with the company, and so on.
Before the prospect becomes a customer, he or she receives constant reminders from the company in the form of messages, calls and emails, aimed at winning his or her custom.
This generates a progressively growing cost, and if the prospect is not convinced to buy the product, the company ends up with a loss, which is a negative outcome.

[Figure: cumulative value of a customer over time, moving from losses during acquisition to profits during retention and cross/up-selling; a prospect who never buys is a lost proposal, and a customer who leaves is a churner]
Fig. 4.2.7 : Lifetime of a customer

The next phase, also known as the maturity phase, is meant to make the relationship between the customer and the company strong; it may also involve retention, cross-selling and up-selling actions to sustain revenue from the investment made on the customer.
The last phase is the interruption of the relationship, where the customer calls off the service of the company and moves on to a competitor, due to inconvenience in terms of payments or various other problems such as a change in office or residence address.

[Figure: business intelligence and data mining at the center, supporting selection of prospects, customer acquisition, retention, cross-selling and up-selling]
Fig. 4.2.8 : Main relational marketing tasks

4.2.4 The Effect of Latency in Predictive Models

Fig. 4.2.9 illustrates the logic for developing a classification model for relational marketing analysis, taking into consideration the temporal dimension. Let t be the current time period, at the beginning of which the model is to be derived as an inductive learning model for a classification problem.
For example, say that at the beginning of January a mobile provider wants to develop a classification model to find the probability that its customers will leave. The data mart contains data from past periods, updated up to period t-1; in our case it has data up to December.
Suppose the provider wants this probability h months in advance, say for the next 2 months, February and March. In that case the probability is estimated from the data available up to December. Note that data for period t cannot be used for the prediction, because the data for period t is not yet available at the start of period t.
To develop the classification model, the values of the target variable are taken from the last known period t-1, which identifies the customers that left in December. For testing the model, the data from t-2 should not be used, because that is the training period of the model.

[Figure: past data from the marketing data mart, up to period t-1, feed the learning model; the model is applied to the most recent data to produce the prediction for period t+h]
Fig. 4.2.9 : Development and application flow chart for a predictive model
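The timing rules of Section 4.2.4 can be made concrete with a small sketch. It only encodes what the text states: data are available up to period t-1, the target values come from period t-1, and predictions are requested for the next h periods. The function names and the convention of numbering months consecutively (December = 12, January = 13) are assumptions made for illustration.

```python
def latency_windows(t: int, h: int) -> dict:
    """Periods involved when a model is built at the start of period t.

    t and h are integers; periods are numbered consecutively
    (an illustrative convention, e.g. December = 12, January = 13).
    """
    if h < 1:
        raise ValueError("h must be at least 1")
    return {
        # Period t has just started, so its data are not yet available.
        "last_available_period": t - 1,
        # Target values for training come from the last known period.
        "training_target_period": t - 1,
        # Periods for which a prediction is requested (next h periods).
        "prediction_periods": list(range(t + 1, t + h + 1)),
    }


def usable_for_prediction(period: int, t: int) -> bool:
    """True if data of `period` can be used when predicting at the start of t."""
    return period <= t - 1


if __name__ == "__main__":
    # Model built at the beginning of January (13), looking 2 months ahead.
    print(latency_windows(13, 2))
    print(usable_for_prediction(13, 13))  # January data are not usable yet
```

With t = 13 and h = 2 the sketch reports December (12) as the last available period and February and March (14, 15) as the prediction periods, matching the mobile provider example.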
4.2.5 Acquisition

Even if retention is the main focus for some companies, acquisition is also an important aspect of relational marketing strategies.
It is a process which requires the identification of new prospects, that is, potential customers who may be partially or completely unaware of the products or services provided by the company, or who did not require these products or services in the past and now need them. Another case is that of customers of competitors who are hunting for better products or services, or customers who have switched from your company to a competitor.
Once the company has identified the prospects, it is important to design an acquisition campaign with marketing strategies at various levels, taking into account the profitability for the company and the marketing resources available to it.
Traditional marketing campaigns were based on advertising and on earlier polls of the public. To enhance the quality of acquisition, data on the products and services provided is fed into the data mart to derive classification rules, which provide the characteristics of the profiles to target.

4.2.6 Retention

Most products and services have reached the maturity stage, and the resulting saturation of the market has led to stronger competition among companies.
The negative side effect is that a company can expand its customer base mostly by acquiring customers at the cost of other companies, which is common in industries such as telecommunications.
Because of this, many companies invest considerable resources in analyzing and characterizing the attributes of customers who switch from their company to another.
Another reason for switching can be attractive offers made by competitors to grab the attention of prospects, which weakens the company's market strategies.
Customers may also find that the charges are not justified by the services provided by the company, and therefore look for an alternative and switch to it.
Various other aspects affect retention for the products and services provided by the company, so the company has to keep a close watch on them.

4.2.7 Cross-selling and Up-selling

Data mining models can also contribute to relational marketing analysis by identifying the market segments with the highest probability of purchasing additional services or products from the company.
For example, assume a mobile shop offers that a customer who buys a smartphone can pay an extra Rs. 100 to get an annual Netflix subscription along with the phone. Not every customer purchasing a smartphone will be interested in the subscription, so the mobile provider can classify customers into those who are interested in the offer and those who are not. If the number of interested customers is high, the shop owner will have to obtain more subscriptions from Netflix.
Demographic information about the customers can be fed into the data mart and used as explanatory attributes to develop a classification model, which helps to design offers for the forthcoming period and to predict how customers would react to them.
Cross-selling means trying to sell an additional product or service to a customer who is already active and in a relationship with the company.
Through a classification model the company can understand which customers are likely to be interested in cross-selling offers and approach only those customers.
For example, we often get calls from our banks asking us to upgrade our debit cards to credit ones. These calls are made only to customers holding a debit card and not to those holding a credit card, so this defines the set of customers to be called.
This can also be described as up-selling, where the customer is invited to move to a product or service one level higher than the existing one, with more features and availability.

4.2.8 Market Basket Analysis

The main objective of market basket analysis is to get an exact view of which products customers purchase, so that the company gains the knowledge required to organize and plan its marketing strategies.
It is usually used to analyze which kinds of products sell more on e-commerce sites or in the retail industry.
It can also be applied to purchases made with credit cards, to landline services or complementary ones, or to check whether policies are taken out by the same households.
The data used here are purchase transactions, which can be associated with a time dimension to track purchases.

4.2.9 Web Mining

It is a well-known fact that the web is the most common and easiest way of communicating with the widest audience. Most companies use social media platforms to promote their products, and e-commerce sites are considered important sales channels.
Web mining is used to analyze data from the activities carried out on those sites by visitors. Web mining methods are mostly used for three purposes: content mining, structure mining and usage mining.

[Figure: web mining divides into content mining (text mining, HTML mining, XML mining, image mining, multimedia mining), structure mining (static links, dynamic links) and usage mining (user profile, clickstream analysis, purchasing behavior)]
Fig. 4.2.10 : Taxonomy of web mining analyses
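Market basket analysis (Section 4.2.8) and the analysis of pages visited in a web session both look for items that tend to occur together. As a minimal sketch, assuming that a transaction is simply a list of item names, the function below counts item pairs and reports their support and confidence, the two usual measures for association rules. The function name, the pair-only restriction and the `min_support` threshold are illustrative choices, not something prescribed by the text.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.3):
    """Return rules (antecedent, consequent, support, confidence) for item pairs.

    support    = share of transactions containing both items
    confidence = share of transactions with the antecedent that also
                 contain the consequent
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))          # ignore duplicates within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support >= min_support:
            rules.append((a, b, support, count / item_counts[a]))
            rules.append((b, a, support, count / item_counts[b]))
    return rules


if __name__ == "__main__":
    baskets = [
        ["phone", "case"],
        ["phone", "case", "charger"],
        ["phone"],
        ["case", "charger"],
    ]
    for antecedent, consequent, support, confidence in pair_rules(baskets, 0.5):
        print(f"{antecedent} -> {consequent}: "
              f"support={support:.2f}, confidence={confidence:.2f}")
```

The same function can be applied to sessions instead of baskets by passing, for each session, the list of pages visited.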
1. Content mining

It involves the analysis of the content present on web pages in order to extract the data required by the customer. Search engines like Google also perform content mining to provide links to relevant pages. It can also be traced back to data mining problems for the analysis of texts, images and multimedia content present on web pages in HTML and XML format.

2. Structure mining

This type of mining is used to understand the structure of the web using the different links on web pages. Graphs can be created where nodes correspond to web pages and arcs go to the different nodes that a page links to. Results and algorithms from graph theory are used to characterize the web structure, identifying areas of high intensity.

3. Usage mining

It is the most relevant from the standpoint of relational marketing: it explores the paths followed by navigators and their behaviour during visits to the company's website. Methods for the extraction of association rules are used to obtain correlations between the different pages visited during a session.

4.3 Sales Force Management

Nowadays almost all companies have a sales department in their organization and rely on the employees of that department for the sales of the products or services offered by the company. Every employee is given a target, and depending on how the targets are achieved these employees play an important role in the profit gained by the company.
Various marketing strategies are implemented by the sales department for selling products or services. The term sales force covers all the people and roles, along with the different tasks and responsibilities, associated with sales as a process.
The basic types of sales force, based on the activities carried out, are stated below:
Residential: Sales activities take place at one or more sites managed by the company supplying the products and services, where customers can purchase; this includes sales at retail shops and wholesale dealers.
Mobile: The agents of the company go to the customer's house or office to give information about the product or service and to collect orders. This type of sale occurs mostly within B2B (Business to Business) relationships, but it can also be encountered in B2C (Business to Customer) settings.
Telephone: Sales happen through telephone conversations in which the company's agents call customers, promote the product and collect orders.
For a mobile salesforce there are various problems, which can be subdivided into a few main categories listed below:
designing the sales network.
planning the agents' activities.
contact management.
sales opportunity management.
customer management.
activity management.
order management.
area and territory management.
support for the configuration of products and services.
knowledge management with regard to products and services.
Designing the sales network and planning the agents' activities involve decision-making tasks that can take advantage of optimization models.
The rest can be managed with the help of automation tools, known as Sales Force Automation (SFA), which nowadays are implemented by almost all companies.

4.3.1 Decision Processes in SalesForce Management

Designing and managing a salesforce raises various decision-making problems, as shown in Fig. 4.3.1. If these problems are successfully overcome, they yield maximum profit, increase the efficiency of sales actions and ensure efficient use of resources, along with professional rewards for the sales agents.
The decision process shown in Fig. 4.3.1 indicates how the strategic objectives of the company should be taken into consideration along with the other components of marketing, so that the role assigned to the salesforce fits into the broader framework of relational marketing.

[Figure: sales force management (roles and objectives) connected by two-way arrows to sales force design (organizational structure, sizing, sales territories), activity planning (resource allocation to products, customers and segments; planning of visits) and assessment and control (evaluation criteria, rewards, motivation)]
Fig. 4.3.1 : Decision processes in salesforce management

The two-way arrows mean that all the components interact with each other and with marketing decisions. The decision-making processes related to salesforce management can be grouped into three categories: design, planning and assessment.
4.3.1(A) Design

Salesforce design deals with the start-up phase of a commercial activity or with a subsequent restructuring phase, for example during the acquisition of a group of companies. It relies on the creation of market segments. Salesforce design includes three types of decisions.

[Figure: types of decisions: 1. organizational structure, 2. sizing, 3. sales territories]
Fig. 4.3.2 : Types of decisions

1. Organizational structure
The organizational structure can take different forms, corresponding to a hierarchical clustering of agents grouped by products, brands or geographical areas; in some cases markets are also considered to form the clusters.
To understand the organizational structure it is necessary to analyze the complexity of the customers, of the products and of the sales activity, in order to decide whether agents should be specialized and to what extent.

2. Sizing
Sizing determines the number of agents that should work within the selected structure. It depends on different factors such as the number of customers and prospects, how much sales area coverage is required, the time limit for every call and the travelling time of every agent.

3. Sales territories
Designing sales territories means creating clusters of geographical areas in a region and assigning each cluster to a particular agent or group of agents.
Factors to be considered while designing and assigning these territories to agents are the sales potential of every area, the time required to travel from one area to another and the time available to each agent.

[Figure: segmentation of products and services, sales activity, and sales and communication channels feed three design steps: sales force organization, sales force sizing, and sales territory allocation to agents]
Fig. 4.3.3 : Salesforce design process

4.3.1(B) Planning

The decision-making tasks associated with planning concern the assignment of the sales resources, structured and sized during the design phase, to market entities. Resources can be measured as the work time of the agents and the budget, whereas market entities comprise products, market segments, distribution channels and customers.
Allocation can take into account the time spent on every customer to promote the product or service, the time and cost required to travel, and how effective the action was in convincing the customer. Further aspects can also be considered, such as explaining the technical and functional features of the product or service, and suggestions coming from the customers.

4.3.1(C) Assessment

Assessment is important to control the activities and to check the effectiveness and efficiency of the agents in the sales network, so that proper remuneration and incentives can be designed for every individual. To measure the effectiveness and efficiency of the agents, it is very important to announce the criteria on which they will be judged, so that the agents give their full contribution towards the sales of the products and services, increasing the profit of the company as well as their own and improving their performance.

4.3.2 Models for Sales Force Management

Following are some classes of optimization models for designing and planning a salesforce. Before starting, here are some notions used in the following sections.
Assume that a particular region is divided into M geographical areas of sales, also known as sales coverage units, so let M = {1, 2, ..., M}. The areas should be grouped into disjoint clusters known as territories, such that each area belongs to only one territory and is connected to all the other areas of the same territory.
The connection property means that from each area it is possible to reach any other area of the same territory. The time span is divided into T intervals of equal length, usually weeks or months, indicated as t ∈ T = {1, 2, ..., T}.
Each territory has a sales agent associated with it, who is located in one area of the territory, considered to be the agent's residence. The time and cost of travelling to each area depend on the area of residence of the agent. Let N be the number of territories, so N = {1, 2, ..., N}.
In the territories there are customers and prospects, H in total, who are visited by the agent to promote the product; in some models they are grouped into segments and counted in the same way. So h ∈ {1, 2, ..., H}. Finally, assume every agent sells K products and services during the calls, so let k ∈ {1, 2, ..., K}.

4.3.3 Response Functions

Response functions play an important role in formulating models to design and plan a sales network. In general, a response function describes the elasticity of sales with respect to a sales action, and is a formal way to describe the complex relationships between sales actions and market reactions. The sales to which response functions refer are expressed in product units or in monetary units, as revenues or margins.
Here they are formally expressed as sales revenues. The intensity of a sales action can be related to different variables: the number of calls made to the customer in a given period of time, how many times the product was mentioned in a given period of time, and how much time was given to the customer in person during a given period of time.
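The text does not give a formula for the response functions, so the following is only a sketch of the two shapes that Section 4.3.3 refers to, a concave response and a sigmoidal response to sales action effort. The exponential-saturation and logistic forms, and the parameter names `r_max`, `rate` and `midpoint`, are assumptions chosen for illustration.

```python
import math


def concave_response(x, r_max=100.0, rate=0.5):
    """Concave response: every extra unit of effort adds less than the previous one.

    Assumed form: r(x) = r_max * (1 - exp(-rate * x)).
    """
    return r_max * (1.0 - math.exp(-rate * x))


def sigmoidal_response(x, r_max=100.0, rate=1.0, midpoint=5.0):
    """Sigmoidal response: slow start, steep middle, saturation at high effort.

    Assumed form: logistic curve centered at `midpoint`.
    """
    return r_max / (1.0 + math.exp(-rate * (x - midpoint)))


if __name__ == "__main__":
    # Effort could be, for instance, the number of calls in a period.
    for calls in range(0, 11, 2):
        print(calls,
              round(concave_response(calls), 1),
              round(sigmoidal_response(calls), 1))
```

In the concave case the first calls are the most productive, whereas in the sigmoidal case a minimum level of effort is needed before sales react noticeably.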
[Figure: response r(x) plotted against sales action effort x, increasing with diminishing returns]
Fig. 4.3.4(a) : A concave response function

[Figure: response plotted against sales action effort x, S-shaped]
Fig. 4.3.4(b) : A sigmoidal response function

4.3.4 Sales Territory Design

Sales territory design involves the allocation of sales coverage units to agents so as to minimize a weighted sum of two terms, which represent the total distance between the areas of the same territory and the inequality between the opportunities given to the agents.
The region is divided into J areas, which are combined into I territories, whose number is decided in advance. Every territory has an agent, who is associated with a sales coverage unit considered to be the residence of that agent.
It is assumed that travel times within each area can be neglected compared with the travel time between a pair of distinct areas. Every area j is identified by the coordinates $(e_j, f_j)$ of one of its points, chosen as the point whose coordinates are the average of the coordinates of all points belonging to that area. For every territory i, let $(e_i, f_i)$ denote the coordinates of the area where the agent associated with the territory resides.
This area is called the centroid of territory i. The parameters in the model are as follows: $d_{ij}$ is the distance between centroid i and area j, given by

$$d_{ij} = \sqrt{(e_j - e_i)^2 + (f_j - f_i)^2};$$

$a_j$ is the opportunity for sales in area j; and $\beta$ is a relative weight factor between total distance and sales imbalance. Consider a set of binary decision variables $Y_{ij}$ defined as:

$$Y_{ij} = \begin{cases} 1 & \text{if area } j \text{ is assigned to territory } i, \\ 0 & \text{otherwise.} \end{cases}$$

Define I additional continuous variables that express the deviations from the average sales opportunity value for each territory:

$S_i$ = deviation from the average opportunity value $\bar{a}$ for territory i.

Hence, with $\mathcal{I} = \{1, 2, \ldots, I\}$ and $\mathcal{J} = \{1, 2, \ldots, J\}$, the corresponding optimization problem can be formulated as:

$$\begin{aligned}
\min\ & \sum_{i \in \mathcal{I}} \sum_{j \in \mathcal{J}} d_{ij} Y_{ij} + \beta \sum_{i \in \mathcal{I}} S_i \\
\text{s.to}\ & \sum_{j \in \mathcal{J}} a_j Y_{ij} + S_i \ge \bar{a}, && i \in \mathcal{I}, \\
& \sum_{j \in \mathcal{J}} a_j Y_{ij} - S_i \le \bar{a}, && i \in \mathcal{I}, \\
& \sum_{i \in \mathcal{I}} Y_{ij} = 1, && j \in \mathcal{J}, \\
& S_i \ge 0,\ Y_{ij} \in \{0, 1\}, && i \in \mathcal{I},\ j \in \mathcal{J}.
\end{aligned}$$

4.4 Supply Chain Optimization

A supply chain can be described as a network of linked and interdependent organizational units which coordinate with each other to manage and improve the materials, and the information related to them, that pass from the vendors to the customers through all the processes required to deliver the product.
The aim and benefit of integrated planning and operations across the units of the supply chain is to make decisions and take actions systematically and objectively, so as to maintain the standard of the subsystems that make up the company's logistic system.
Most manufacturing companies are adopting this kind of integrated logistic supply chain approach, covering both the upstream and the downstream parts of the supply chain, so that problems in the cooperation between subsystems can also be tracked.
Another advantage of an integrated logistic supply chain is that it reduces costs, including the costs of processing, transportation and distribution. Inventory and equipment costs are also included and reduced in an integrated supply chain.
It is equally important to upgrade the logistic supply chain by adding models and automated tools that help in planning and in analyzing capacity in critical situations, where the complexity of the logistic supply chain is high.
In most dynamic settings competition is strong, since competitors also put all their efforts into making their supply chains more effective. Companies that produce a wide range of products require a multi-centric logistic supply chain that effectively handles the distribution of products according to customer demand.
Such multi-centric logistic supply chains need to be widely spread and largely automated, which makes the work simpler; they also involve large financial investments to automate the chains and make them more effective. The effectiveness and features of a logistic supply chain are directly proportional to the profile that the company maintains in communicating with its customers.
[Figure: example of a global supply chain linking US, European, offshore and kit suppliers, US and Asia plants, and the US, European and Asia/Pacific markets, with purchase, production, transportation and inventory costs along the chain]
Fig. 4.4.1 : An example of global supply chain

4.5 Optimization Models for Logistics Planning

Following are some of the optimization models associated with the features of logistic supply chains and logistic production systems.
While learning about these models one should understand that real-world logistic production systems include more than one of the elements considered here, so they are more complex and combine the features of different elements. Before starting with a detailed study of the models, the notation used by these models should be known.
In logistic systems there are I products, denoted by the index $i \in \mathcal{I} = \{1, 2, \ldots, I\}$. The planning horizon is divided into T time intervals, denoted by $t \in \mathcal{T} = \{1, 2, \ldots, T\}$, usually of equal length, such as weeks or months.
The manufacturing company has a set of critical resources that are shared among the products during the manufacturing process and are available in limited quantity. These resources may include manpower, tools, assembly lines, specific fixtures and so on. The critical resources are denoted by the index $r \in \mathcal{R} = \{1, 2, \ldots, R\}$. When only a single critical resource is relevant to the manufacturing process, the index r is omitted for simplicity.

4.5.1 Tactical Planning

This is the basic form, where the main objective of planning is to determine the amount of production for every product over the T periods of a mid-term planning horizon, so as to satisfy the given demand and the capacity limits of every resource used in the manufacturing process, while keeping to a minimum the total cost, which sums manufacturing and inventory costs.
The decision variables are:
$P_{it}$ = units of product i manufactured in period t.
$I_{it}$ = units of product i in inventory at the end of period t.
The parameters are:
$d_{it}$ = demand for product i in period t.
$c_{it}$ = unit manufacturing cost for product i in period t.
$h_{it}$ = unit inventory cost for product i in period t.
$e_i$ = capacity absorbed to manufacture one unit of product i.
$b_t$ = capacity available in period t.
So the problem is formulated as follows:

$$\begin{aligned}
\min\ & \sum_{t \in \mathcal{T}} \sum_{i \in \mathcal{I}} (c_{it} P_{it} + h_{it} I_{it}) \\
\text{s.to}\ & P_{it} + I_{i,t-1} - I_{it} = d_{it}, && i \in \mathcal{I},\ t \in \mathcal{T}, \\
& \sum_{i \in \mathcal{I}} e_i P_{it} \le b_t, && t \in \mathcal{T}, \\
& P_{it},\ I_{it} \ge 0, && i \in \mathcal{I},\ t \in \mathcal{T}.
\end{aligned}$$

4.5.2 Extra Capacity

The first extension of the model deals with resorting to extra capacity, in the form of overtime, part-time or third-party capacity.
The decision variables of the first model are kept, with the addition of the variable $O_t$, the extra capacity used in period t, and of the parameter $q_t$, the unit cost of extra capacity in period t.
So the formula now becomes:

$$\begin{aligned}
\min\ & \sum_{t \in \mathcal{T}} \sum_{i \in \mathcal{I}} (c_{it} P_{it} + h_{it} I_{it}) + \sum_{t \in \mathcal{T}} q_t O_t \\
\text{s.to}\ & P_{it} + I_{i,t-1} - I_{it} = d_{it}, && i \in \mathcal{I},\ t \in \mathcal{T}, \\
& \sum_{i \in \mathcal{I}} e_i P_{it} \le b_t + O_t, && t \in \mathcal{T}, \\
& P_{it},\ I_{it},\ O_t \ge 0, && i \in \mathcal{I},\ t \in \mathcal{T}.
\end{aligned}$$

4.5.3 Multiple Resources

If several critical resources are to be included in the manufacturing process, the formula has a few more parameters, while the required decision variables are those already introduced.
Additional parameters are listed below:
$b_{rt}$ = quantity of resource r available in period t.
$e_{ir}$ = quantity of resource r absorbed to manufacture one unit of product i.
So the formula is given as:

$$\begin{aligned}
\min\ & \sum_{t \in \mathcal{T}} \sum_{i \in \mathcal{I}} (c_{it} P_{it} + h_{it} I_{it}) \\
\text{s.to}\ & P_{it} + I_{i,t-1} - I_{it} = d_{it}, && i \in \mathcal{I},\ t \in \mathcal{T}, \\
& \sum_{i \in \mathcal{I}} e_{ir} P_{it} \le b_{rt}, && r \in \mathcal{R},\ t \in \mathcal{T}, \\
& P_{it},\ I_{it} \ge 0, && i \in \mathcal{I},\ t \in \mathcal{T}.
\end{aligned}$$
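To see how the constraints and the cost of the multiple-resource model fit together, the sketch below checks a given production plan rather than solving the optimization problem. It derives the inventory from the balance constraint, verifies the capacity and non-negativity constraints, and returns the total manufacturing and inventory cost. The function name, the nested-list layout of the data and the assumption that the initial inventory is zero unless supplied are illustrative choices.

```python
def evaluate_plan(P, d, c, h, e, b, initial_inventory=None, tol=1e-9):
    """Return the total cost of plan P, or None if the plan is infeasible.

    P[i][t], d[i][t], c[i][t], h[i][t]: production, demand, unit manufacturing
    cost and unit inventory cost of product i in period t.
    e[i][r]: quantity of resource r absorbed by one unit of product i.
    b[r][t]: quantity of resource r available in period t.
    """
    n_products, n_periods, n_resources = len(P), len(P[0]), len(b)
    # Assumption: inventory before the first period is zero unless given.
    inv_prev = list(initial_inventory) if initial_inventory else [0.0] * n_products
    total_cost = 0.0

    for t in range(n_periods):
        # Capacity constraints: sum_i e[i][r] * P[i][t] <= b[r][t].
        for r in range(n_resources):
            used = sum(e[i][r] * P[i][t] for i in range(n_products))
            if used > b[r][t] + tol:
                return None
        for i in range(n_products):
            # Balance constraint: I[i][t] = P[i][t] + I[i][t-1] - d[i][t].
            inv = P[i][t] + inv_prev[i] - d[i][t]
            # Non-negativity of production and inventory (no backlog allowed).
            if P[i][t] < -tol or inv < -tol:
                return None
            total_cost += c[i][t] * P[i][t] + h[i][t] * inv
            inv_prev[i] = inv
    return total_cost


if __name__ == "__main__":
    # One product, one resource, two periods.
    cost = evaluate_plan(P=[[10, 5]], d=[[8, 7]], c=[[2, 2]], h=[[1, 1]],
                         e=[[1]], b=[[10, 10]])
    print(cost)
```

A checker of this kind is useful for validating plans produced by hand or by a solver; finding the cost-minimizing plan itself requires a linear programming solver.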
4.5.4 Backlogging

This is an additional feature to be considered in logistic systems. The term backlog refers to the possibility that a portion of the demand due in a certain period of time could not be completed in that period and is fulfilled in a subsequent period, with a penalty; this portion of demand is said to be backlogged.
Backlog is a feature that usually occurs in B2B settings. Industries which produce mass consumer goods are more likely to develop a different variant of backlog, referred to as lost sales, in which demand that cannot be fulfilled is lost.
To model backlogging it is necessary to add new decision variables $B_{it}$, the units of demand for product i that are delayed in period t, and parameters $g_{it}$, the unit cost of delaying the demand for product i in period t.
So the formula becomes:

$$\begin{aligned}
\min\ & \sum_{t \in \mathcal{T}} \sum_{i \in \mathcal{I}} (c_{it} P_{it} + h_{it} I_{it} + g_{it} B_{it}) \\
\text{s.to}\ & P_{it} + I_{i,t-1} - I_{it} + B_{it} - B_{i,t-1} = d_{it}, && i \in \mathcal{I},\ t \in \mathcal{T}, \\
& \sum_{i \in \mathcal{I}} e_i P_{it} \le b_t, && t \in \mathcal{T}, \\
& P_{it},\ I_{it},\ B_{it} \ge 0, && i \in \mathcal{I},\ t \in \mathcal{T}.
\end{aligned}$$

4.5.6 Bill of Materials

One more feature that can be added to the planning model is the bill of materials, which is associated with end products that have a complex structure: the end product is made of various components that are used to build it up.
The parameters that define the format of the bill of materials are $a_{ij}$, the units of product i directly required by one unit of product j, where the term product refers both to end products and to the associated components, which define the different levels of the bill of materials.
So the formula becomes:

$$\begin{aligned}
\min\ & \sum_{t \in \mathcal{T}} \sum_{i \in \mathcal{I}} (c_{it} P_{it} + h_{it} I_{it}) \\
\text{s.to}\ & P_{it} + I_{i,t-1} - I_{it} = d_{it} + \sum_{j \in \mathcal{I}} a_{ij} P_{jt}, && i \in \mathcal{I},\ t \in \mathcal{T}, \\
& \sum_{i \in \mathcal{I}} e_i P_{it} \le b_t, && t \in \mathcal{T}, \\
& P_{it},\ I_{it} \ge 0, && i \in \mathcal{I},\ t \in \mathcal{T}.
\end{aligned}$$

4.5.7 Multiple Plants


4.5.5 Minimum Lots and Fixed Costs
in mln:
systems whichare to be presented
to be added in manufacturing For this model it is been assumed that the company has network of Mproduction plants which are situated at
More additional features needs conditions are like the Drod..
sometimes the different locations which are manufacturing a single product.
and economy reasons only,
conditions which would be for technical
or less than the threshold value that is been in minimum
values should be equal to 0 for one or
more products It is the responsibility of logistic system to supply Nnumber of peripheral depots to every manufacturing plant
decision variables listed below need to be included. turn by turn. Every manufacturing plant me M={ 1,.M) is been featured by maximum product that are
Include these conditions ín model binary
available there which is given by s, when that particular plant has demand of d, products.
ifP>0
Also the transportation cost cmn Ís included which include sending a production plant mo depot n and for every
otherwise,
pair of (m,n) which is origin and destination we have logistic network. The main aim of company is to have
Also the parameters liked.
optimistic logistic plan which satisfies the demands of depots in minimum cost without exploiting the availability
1, which is minimum lot for product i.
of production plants.
y is constant value larger than any producible volume for i. Decision variables included in thís model which represent the quantity that needs to be transported for every
So the formula becomes: plant and depot pair is given byxm which is unit of product to be transported from mto n.
So the formula for the product becomes:
min
ieT iel
min
s.to P+-1-=e iel,te T, me MnEN mnmn

te T, S.to nEN mnSm me M,

ie l,te T,
ne N,
me M
ie L,te T,
me M, ne N.
Xm20,
Pie 20,Ye (0,1), ie I,te T.
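The multiple-plant model can be illustrated with a minimal sketch that evaluates a candidate shipping plan: it checks that no plant ships more than its availability, that every depot receives at least its demand, and computes the transportation cost. The function name `evaluate_transport_plan` and the data are hypothetical, and the sketch compares plans rather than solving the optimization problem.

```python
def evaluate_transport_plan(x, cost, supply, demand):
    """Return (feasible, total_cost) for shipments x[m][n] from plant m to depot n."""
    n_plants, n_depots = len(supply), len(demand)
    feasible = True
    # Non-negativity: x_mn >= 0
    if any(x[m][n] < 0 for m in range(n_plants) for n in range(n_depots)):
        feasible = False
    # Availability: sum_n x_mn <= s_m for every plant m
    for m in range(n_plants):
        if sum(x[m]) > supply[m]:
            feasible = False
    # Demand: sum_m x_mn >= d_n for every depot n
    for n in range(n_depots):
        if sum(x[m][n] for m in range(n_plants)) < demand[n]:
            feasible = False
    total_cost = sum(cost[m][n] * x[m][n]
                     for m in range(n_plants) for n in range(n_depots))
    return feasible, total_cost


# Hypothetical network: two plants, three depots.
supply = [30, 20]
demand = [10, 25, 15]
cost = [[4, 6, 9],
        [5, 3, 7]]

plan_a = [[10, 5, 15], [0, 20, 0]]
plan_b = [[10, 20, 0], [0, 5, 15]]

for name, plan in (("plan_a", plan_a), ("plan_b", plan_b)):
    print(name, evaluate_transport_plan(plan, cost, supply, demand))
```

Both plans satisfy the constraints but have different costs, which is exactly the trade-off the optimization model resolves over all feasible plans.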

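Returning to the minimum lot conditions of Section 4.5.5, the role of the binary variable can be seen in a small sketch: with Y = 0 the two linking conditions force the production volume to 0, and with Y = 1 they force it to lie between the minimum lot and the large constant. The function names and the numbers are assumptions chosen for illustration only.

```python
def setup_indicator(p):
    """Binary variable Y: 1 if something is produced, 0 otherwise."""
    return 1 if p > 0 else 0


def lot_link_holds(p, y, min_lot, gamma):
    """Check the linking conditions  min_lot * Y <= P <= gamma * Y  for one product and period."""
    return y in (0, 1) and min_lot * y <= p <= gamma * y


# Hypothetical values: minimum lot of 50 units, gamma larger than any producible volume.
MIN_LOT = 50
GAMMA = 1000

for volume in (0, 30, 80):
    y = setup_indicator(volume)
    print(volume, y, lot_link_holds(volume, y, MIN_LOT, GAMMA))
```

A volume of 30 is rejected because it is positive but below the minimum lot, while 0 and 80 both satisfy the conditions.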
4.6 Revenue Management Systems

Revenue management is a managerial policy whose main objective is to maximize the profits of the company by maintaining the balance between demand and supply. It is usually created for marketing and logistic criteria and has gained interest in service industries such as transport, tourism and hotels. Eventually it was also accepted by companies responsible for manufacturing and distribution.

The basic idea is related to revenue, as every company thinks about maximizing its profit. But revenue management needs to be planned according to the decision making strategies, the patterns and the models of the company, and so it becomes complex when data is fed to it.

4.6.1 Decision Processes in Revenue Management

When it comes to revenue management, the models that are involved are mathematical models which are used to determine the actions of the customers at every level, so that the availability of the product and its price can be optimized to obtain the maximum profit out of the sales.

The aim of revenue management is not only maximizing profit but also managing the various offers of products and services to increase the demand, using different marketing strategies for promotion, while fulfilling the requirements with minimum expenditure on the cost of logistics.

Since it is a managerial policy, most of the companies that have taken it up are working with it successfully, and the fields that are implementing this policy are growing. Among these fields are automotive rental companies, entertainment companies, hotel chains, airlines, public services and so on. The common features of these fields are that they have a low marginal sales cost and the possibility of imposing dynamic policies across various sales channels.

4.7 Data Envelopment Analysis: Efficiency Measures

In data envelopment analysis the units which are being compared are known as decision making units, or DMUs, as they take decisions that are self governed.

To calculate the efficiency of $n$ units, let $N = \{1, 2, \ldots, n\}$ be the set of units being compared. If these units produce one single output from one single input only, the efficiency of the $j$-th decision making unit DMU$_j$, $j \in N$, is given as:

\[
\theta_j = \frac{y_j}{x_j},
\]

in which $y_j$ is the output value generated by DMU$_j$ and $x_j$ is the input that is used. If the output is generated using different input factors, the efficiency of DMU$_j$ is defined as the ratio between the weighted sum of outputs and the weighted sum of inputs.

Let $H = \{1, 2, \ldots, s\}$ be the set of production factors and $K = \{1, 2, \ldots, m\}$ the set of outputs. With $x_{ij}$, $i \in H$, the quantity of input $i$ used by DMU$_j$ and $y_{rj}$, $r \in K$, the quantity of output $r$ that is gained, the efficiency of DMU$_j$ is given as:

\[
\theta_j = \frac{u_1 y_{1j} + u_2 y_{2j} + \cdots + u_m y_{mj}}{v_1 x_{1j} + v_2 x_{2j} + \cdots + v_s x_{sj}}
= \frac{\sum_{r \in K} u_r y_{rj}}{\sum_{i \in H} v_i x_{ij}},
\]

where the weights $u_1, u_2, \ldots, u_m$ are associated with the outputs and $v_1, v_2, \ldots, v_s$ are assigned to the inputs.

In the second case, the efficiency value may vary considerably, since it becomes difficult to fix a single structure of weights which can be shared by the different units.

To avoid the objection, which can be raised by the units, that a single set of weights gives an advantage to a few DMUs instead of benefiting all, data envelopment analysis calculates the efficiency of every unit on the basis of the weight system which is best for that DMU, that is, the one for which its efficiency is maximized. Additional analysis then determines whether the units are efficient or not.

4.7.1 Efficient Frontier

The efficient frontier is also known as the production function, which shows the relation between the inputs that are used and the outputs that are produced using those inputs. It shows the maximum amount of outputs that can be generated by a given combination of inputs, as well as the minimum quantity of inputs that would be required to obtain a required output level.

Hence the efficient frontier corresponds to the technically efficient operating methods. The efficient frontier can be estimated from a set of observations which show the output level obtained for a given combination of input production factors.

When it comes to data envelopment analysis, the observations that are obtained correspond to the units that are being evaluated. Statistical methods which use the instances to calculate a regression curve require predefined hypotheses on the shape of the production function.

Data envelopment analysis makes no assumptions on the functional form of the efficient frontier and is non-parametric in nature. The only condition is that the units which are being compared should not be placed above the production possibility function, depending on their efficiency value.

4.8 The CCR Model

When the data envelopment analysis model is used, choosing the optimal weights for a generic DMU$_j$ involves solving a mathematical optimization model whose decision variables are given by the weights $u_r$, $r \in K$, and $v_i$, $i \in H$, that are associated with every output and input.

There are various formulas to get the efficiency score; the best known is the Charnes-Cooper-Rhodes (CCR) model, which is given by the formula:

\[
\begin{aligned}
\max\ & \theta = \frac{\sum_{r \in K} u_r y_{rj}}{\sum_{i \in H} v_i x_{ij}} \\
\text{s.to}\ & \frac{\sum_{r \in K} u_r y_{rj'}}{\sum_{i \in H} v_i x_{ij'}} \le 1, && j' \in N, \\
& u_r, v_i \ge 0, && r \in K,\ i \in H.
\end{aligned}
\]

The aim is to maximize the efficiency measure for DMU$_j$:

\[
\begin{aligned}
\max\ & \sum_{r \in K} u_r y_{rj} \\
\text{s.to}\ & \sum_{i \in H} v_i x_{ij} = 1,
\end{aligned}
\]
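The efficiency ratio used in data envelopment analysis can be sketched in a few lines. The first function computes the ratio between the weighted sum of outputs and the weighted sum of inputs for fixed weights. The second covers only the special case of one input and one output, where the CCR problem reduces to dividing each unit's output-to-input ratio by the best ratio among all units; the general case needs a linear programming solver, which is not shown here. Function names and data are assumptions for illustration.

```python
def ratio_efficiency(outputs, inputs, u, v):
    """Weighted sum of outputs divided by weighted sum of inputs for one DMU."""
    weighted_outputs = sum(u_r * y_r for u_r, y_r in zip(u, outputs))
    weighted_inputs = sum(v_i * x_i for v_i, x_i in zip(v, inputs))
    return weighted_outputs / weighted_inputs


def ccr_single_input_output(x, y):
    """CCR efficiency when every DMU has one positive input x[j] and one output y[j].

    With a single input and a single output, the best weights for every unit
    scale the ratios so that the most productive unit reaches exactly 1.
    """
    ratios = [y_j / x_j for x_j, y_j in zip(x, y)]
    best = max(ratios)
    return [ratio / best for ratio in ratios]


# Hypothetical data for three DMUs.
inputs_used = [2, 4, 5]
outputs_made = [4, 4, 10]
print(ccr_single_input_output(inputs_used, outputs_made))

# One DMU with two outputs and two inputs, under fixed illustrative weights.
print(ratio_efficiency(outputs=[4, 6], inputs=[2, 3], u=[0.5, 0.5], v=[1, 1]))
```

Units with a score of 1 lie on the efficient frontier in this single-ratio setting, and the remaining scores show how far each unit falls short of the best observed ratio.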
