Estimating the On-Time Probability for Vendor Selection Problem

B. Ashish Kumar¹, Parthasarthy Ramachandran¹, Girish Modgil²

¹ Department of Management Studies, Indian Institute of Science, Bangalore, India
² GE Power Services, Data Science, Atlanta, GA, USA

Proceedings of the 2016 IEEE IEEM
978-1-5090-3665-3/16/$31.00 ©2016 IEEE

Abstract – Customers expect fast delivery of products and services. Businesses understand this requirement and focus on efficient supply chains. The vendor selection process, which is complicated by a host of internal and external factors affecting the decision making, is fundamental to an efficient and responsive supply chain. The on-time probability for a vendor to supply a part can be used as a selection criterion. In this paper, we apply three quantitative methods, namely logistic regression, discrete time survival analysis and the naïve Bayes classifier, to evaluate a vendor. The mathematical models to estimate the on-time probability were built and tested on a data set provided by a case company and evaluated with the help of key metrics.

Keywords - vendor selection, supply chain, probabilistic models

I. INTRODUCTION

In an ever more competitive business environment with shrinking timelines and strict budget caps, the need to complete a project within the contractual time period is paramount. Tardiness in delivery has huge repercussions, usually in the form of financial penalties and budget overruns. Additionally, there may be an adverse indirect effect on the customer experience, which in many cases is not quantifiable. Hence managing the project deliverables is fundamental to efficient project execution and can benefit immensely from a quantitative approach that provides realistic estimates. Traditional industries are continually evolving and becoming more digitized, and in the process are gathering large volumes of data about their supply chains at a very high rate. This has generated petabytes of information surrounding products and services which, if used appropriately, can be leveraged by project managers to adhere to specified timelines and make rapid and more informed decisions. These actions can have a positive effect on the overall operations strategy.

In this paper, we try a novel approach to analyze a business problem faced by a market-leading industrial gas-turbine manufacturer, and explore solutions within the domain of data science.

Gas and steam turbines are widely used for power generation. Broadly speaking, a turbine consists of four major modules, namely the compressor, combustion system, turbine and exhaust system. The reader is encouraged to review [1] for a detailed description and further information on this subject. These units operate together to extract thermal energy and convert it into mechanical energy to produce electricity. It is a complex machine consisting of the four modules mentioned previously. Besides these critical units, there are a large number of components (such as air inlet ducts, foundation bolts, pipes, etc.) that are essential to assemble and run a turbine-generator set. The analysis discussed herein refers to the on-time probabilities associated with delivery of these supplementary parts. The manufacturer forges strategic partnerships with the vendors to supply these supplementary parts per an agreement in terms of item, quantity and price. Whenever a customer places an order for a power generating unit such as a gas turbine with the original equipment manufacturer (OEM), the OEM initiates a procurement process for critical and supplementary parts. For the supplementary parts, there is a vendor selection process based on certain traditional criteria, leading to the placement of an order for the parts with a selected vendor. To give a legal scope so that the business is insulated from loss, the manufacturer enters into a multilevel contractual agreement with the customer, a part of which mentions the due date for the installation of the turbine at the project site. Tardiness will invoke the “Fair Condition” clause, which states that any delay of the project results in fair compensation to the customer if the manufacturer fails to meet the contractual condition. Thus the onus to install the turbine at the project site, on time, is with the manufacturer.

The manufacturer has direct control over the workflow of the turbine, right from the beginning of manufacturing to its delivery at the project site. However, the supplementary parts, which are critical in the installation phase, are not under the direct purview of the manufacturer for all stages of delivery. Thus the manufacturer-vendor relationship is the key to getting all the components to the project site by a scheduled date. As a result, the manufacturer is expected to exercise care in vendor selection across the supplementary parts. Ceteris paribus, a vendor which has a proven track record of supplying these parts on time should be favored over a non-performing vendor. Once a vendor is selected, the manufacturer should carefully monitor the progress of the outsourced parts across the various stages to ensure their arrival at the project site on time. The objective of this paper is to design and develop a quantitative approach to estimate the on-time probability for a vendor supplying a selected part. The goal would be to use this quantitative approach to supplement the traditional means that rely on historical
vendor relationships and geography to help make a more informed decision on vendor selection.

II. BUSINESS PROCESS

Once a part and the corresponding vendor have been identified, the business has to track the part's performance across the following stages, which serve as the links in the supply chain network.

Stage 1 - ENGINEERING (ENG)
OEM activity to prepare the technical specification of the parts before placing an order with the vendor
Stage 2 - PURCHASE ORDER (POR)
OEM activity to place the purchase order (quantity and cost), which is the OEM’s commitment to buy the part from the vendor
Stage 3 - PROCUREMENT (PRQ)
Vendor activity to perform procurement, assembly, fabrication, etc. once the order has been placed
Stage 4 - PORT OF EXPORT (POE)
Vendor activity to move the parts from the vendor’s warehouse to the port of export
Stage 5 - PORT OF IMPORT (POI)
Vendor activity to ship the parts from the port of export to the port of import
Stage 6 - ON SITE (SIT)
Customer activity to move the parts from the port of import to the warehouse situated at the project site

These six stages, indexed by j = 1, 2, …, 6 respectively, are assumed to be discrete, observable and sequential, and align to form a process. As a part of this process, a part that was tardy yesterday can become early today due to a corrective sequence of actions, and vice versa. Thus continuous adaptation is intrinsic to the process, which makes the analysis complex and challenging to model.

III. DATA DESCRIPTION

The data, provided to us by the project sponsor, correspond to the period from 1st January, 1995 to 4th February, 2016. In total there were 275k data points. Each record corresponds to one order placed by the manufacturer with a vendor for a specific part. Additionally, for a specific part, status dates, sd, and expected dates, ed, are provided for each of the six stages described in the business process above. A key observation here is that the manufacturer provides the expected date and the vendor provides the status date for the completion of each stage.

Empirically, the variation in the delivery date of a part, whether on, before or after the expected date of arrival, depends upon a host of internal and external factors. A few internal factors are the order quantity, type of material, weight, dimensions, etc. of the part. Factors like the mode of shipment, weather, terrain, etc. comprise the external source of this variation. These factors, partially and/or jointly, influence the variability in the date of actual arrival of a part at the project site. Unfortunately, information relating to internal and external factors was not available for this study. In the absence of such information, mathematical modelling becomes challenging, since the model establishes empirical relationships between the dependent variable and the covariates. However, this provides an opportunity to engineer the covariates required for the analysis. We depend upon two intuitions which are supposed to govern the dynamics.

a) The part arrives at the project site on, before or after the expected date of arrival depending upon the extent of being early/tardy in the preceding stages
b) The part arrives at the project site on, before or after the expected date of arrival depending upon the time allocated to complete that particular stage and all other stages preceding it

The logic behind the two intuitions is straightforward. The greater the extent of tardiness (or earliness) in the preceding stages, the tardier (or earlier) the part is expected to be in its current stage. Thus, if an instance of the part is late in the jth stage (say) and corrective actions are taken in the (j+1)th stage, it has a high probability of becoming not late, and vice versa. This serves as a guideline for the first intuition. Each task of a particular stage inherently takes a certain time to complete, so the sum of these individual times is the minimum time required to complete a stage. Let us call this time $t_{o,j}$. Any time allocated to complete a stage that is less than $t_{o,j}$ will therefore add to the tardiness of the part instance. Though $t_{o,j}$ is unknown to us, if the time allocated by the manufacturer to complete a stage is large, we expect the tardiness to be small, leading to a higher probability of that instance of the part being on time in that particular stage. The excess of the allocated time can also be carried forward to the subsequent stage.

With respect to the first intuition, the measure of tardiness/earliness (in days) is captured by deriving a new random variable called the “variance of the jth stage”, $v_{i,j}$, which is defined as the difference between the status date and the expected date. With respect to the second intuition, we define another random variable called the “time allocated by the manufacturer to complete the jth stage”, $u_{i,j}$, which is defined as the difference between the expected date of the jth stage and the expected date of the (j-1)th stage. Mathematically,

$$v_{i,j} = sd_{i,j} - ed_{i,j} \tag{1}$$

$$u_{i,j} = ed_{i,j} - ed_{i,j-1} \tag{2}$$

where $i \in \{1, 2, \ldots, n\}$ represents a specific order for the part and n is the total number of transactions.

Thus, we can expect the value of $v_{i,j}$ to be both positive and negative. When $v_{i,j}$ takes a value greater than zero, the ith instance of the part is early in the jth stage. Similarly, when $v_{i,j}$ takes a value less than zero, the ith instance of the part is tardy in the jth stage.
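The two engineered covariates of Eqs. (1) and (2) can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `stage_covariates`, the dictionary layout and every date below are hypothetical, and the sketch assumes each order carries exactly one status date and one expected date per stage.

```python
from datetime import date

# Stage order j = 1..6 as listed in the business process.
STAGES = ["ENG", "POR", "PRQ", "POE", "POI", "SIT"]

def stage_covariates(status_dates, expected_dates):
    """Return (v, u), two dicts keyed by stage name, both in days.

    v[stage] = status date - expected date                       (Eq. 1)
    u[stage] = expected date - expected date of previous stage   (Eq. 2)
    u has no entry for the first stage, which has no predecessor.
    """
    v, u = {}, {}
    for j, stage in enumerate(STAGES):
        v[stage] = (status_dates[stage] - expected_dates[stage]).days
        if j > 0:
            previous = STAGES[j - 1]
            u[stage] = (expected_dates[stage] - expected_dates[previous]).days
    return v, u

# Hypothetical order: all dates are invented for illustration.
expected = {
    "ENG": date(2015, 1, 1), "POR": date(2015, 1, 15), "PRQ": date(2015, 3, 1),
    "POE": date(2015, 3, 11), "POI": date(2015, 3, 31), "SIT": date(2015, 4, 10),
}
status = {
    "ENG": date(2015, 1, 3), "POR": date(2015, 1, 15), "PRQ": date(2015, 2, 27),
    "POE": date(2015, 3, 11), "POI": date(2015, 4, 2), "SIT": date(2015, 4, 10),
}
v, u = stage_covariates(status, expected)
```

Because Eq. (2) needs the expected date of the preceding stage, the sketch yields five u values (POR through SIT) and six v values per order.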


IV. METHODOLOGY

In this paper, we explore and evaluate three techniques (logistic regression, discrete time survival analysis and the naïve Bayes classifier) to determine whether a specific part is expected to be on time or tardy, where the end of the process is defined as the conclusion of the SIT stage. The following sections detail the process followed in the analysis.

1) Logistic Regression:
As a first approach, we introduce the logistic regression model, which applies when the response variable is qualitative with two possible outcomes, such as blood pressure (high or low), match outcome for a team (win or loss), etc. In our analysis, the outcome is whether an instance of the specific part is on time or not at the end of the SIT stage. Hence, the response variable $v_{i,j=6}$ is dichotomized to have two possible outcomes: an instance of the part having $v_{i,j=6}$ within an allowable band is coded 1, and 0 otherwise. Let $y_{i,j=6}$ denote the random variable indicating whether the ith instance of a part is on time at the SIT stage.

$$y_{i,j=6} = \begin{cases} 1, & l \le v_{i,j=6} \le h \\ 0, & \text{otherwise} \end{cases}$$

The values of l and h provide the lower and upper limits of the allowable deviation of the actual dates from the expected dates. These values were provided by the project leaders in the business. Let $Y_{i,j=6}$ be independent Bernoulli random variables [2], where

$$Y_{i,j=6} = E\{Y_{i,j=6}\} + \varepsilon_{i,j=6} \tag{3}$$

Now, we attempt to define a probability model linking the response variable $y_{i,j=6}$ to a set of covariates, i.e. the amount of being tardy/early in the previous stages and the time allocated by the manufacturer for the completion of these stages. The values of these covariates are provided by the $v_{i,j}$'s and $u_{i,j}$'s respectively.

$$p_{i,j=6} = E\{Y_{i,j=6}\} \tag{4}$$

$$p_{i,j=6} = P\big(y_{i,j=6} = 1 \mid X^{(1)}_{i,j=6} = v_{i,j=1},\ X^{(2)}_{i,j=6} = v_{i,j=2},\ \ldots,\ X^{(5)}_{i,j=6} = v_{i,j=5},\ X^{(6)}_{i,j=6} = u_{i,j=2},\ X^{(7)}_{i,j=6} = u_{i,j=3},\ \ldots,\ X^{(10)}_{i,j=6} = u_{i,j=6}\big) \tag{5}$$

We specify the popular logistic regression as the probability model.

$$p_{i,j=6} = \frac{e^{\beta' X_{i,j=6}}}{1 + e^{\beta' X_{i,j=6}}} \tag{6}$$

By the method of maximum likelihood estimation, the coefficients $\hat{\beta}$ are estimated from the training data. We perform the standard logistic regression residual analysis to check the adequacy of the logistic regression model [3].

2) Discrete Time Survival Analysis:
The discrete time survival model is used to study not only whether a specific event has occurred, but also the time to the event. The lifetime of an instance of the specific part, from its birth at the ENG stage to its culmination at the SIT stage, has been traced by the business with the help of the status dates, sd, across all stages.

As a part of this analysis, the event of interest is defined as “completion of Stage SIT by the instance of a part”. Here, we try to model the random variable T, which denotes the time to the event. There is no censoring, because the period of study is long enough to capture the lifetime of the part instance from the POI stage to the SIT stage, i.e. no event occurs after the period of study. Also, no instance of the specific part leaves during the period of study (to be defined later), i.e. once it arrives at the port of import, it necessarily reaches the project site (on time or not). A few assumptions which serve as guidelines for our discrete time model building approach are mentioned below.

a) The instances of the part in the sample space are independent
b) The time intervals are discrete (non-overlapping), independent and identically distributed (iid)
c) Each instance of the part experiences the event only once, i.e. it is a case of a non-repeatable event

To start the clock as a part of this approach, we traced when an instance of the specific part reached the project site after it arrived at the port of import. So we calculated the time $t_{i,j=6}$ (in days) it took for an instance of the specific part to travel from the port of import to the project site, which is the time it takes to complete the SIT stage.

$$t_{i,j=6} = sd_{i,j=6} - sd_{i,j=5} \tag{7}$$

To make the event analysis discrete, we study the event in terms of weeks. Hence any $t_{i,j=6}$ in the interval [0, 7) denotes the occurrence of the event at the end of week one, the interval [7, 14) indicates the end of week two, and so on. The period of study is denoted by N weeks; N represents the last time interval in the sample.

[Timeline: weekly intervals starting with [0, 7), marked from the end of the 1st week to the end of the Nth week]

Discrete time survival analysis is a modification of logistic regression. However, to implement the former, we alter the data structure, transforming the standard person data to person-period data [4]. We use this person-period data to model the occurrence of the event of interest. A recap of basic terminology sets the landscape for further analysis [5].

Survival Function:
The survival function denotes the probability that a part does not complete SIT by the end of the week denoted by time interval t. It can be expressed as

$$S_t = P(T > t) \tag{8}$$
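The weekly discretization and the person-period layout described for the discrete time survival analysis can be illustrated with a short sketch. The helper names `week_of` and `to_person_period`, the row layout and the two example orders are hypothetical; the sketch assumes no censoring, as stated in the text, so every order ends with an event.

```python
def week_of(days):
    """Map a duration in days to a 1-based week index: [0, 7) -> 1, [7, 14) -> 2, ..."""
    return days // 7 + 1

def to_person_period(orders):
    """Expand one row per order into one row per order per week at risk.

    `orders` is a list of (order_id, days_to_complete_SIT) pairs.
    The event flag is 1 only in the week in which SIT is completed.
    """
    rows = []
    for order_id, days in orders:
        last_week = week_of(days)
        for week in range(1, last_week + 1):
            rows.append({"order": order_id, "week": week, "event": int(week == last_week)})
    return rows

# Hypothetical orders: "A" completes SIT after 5 days, "B" after 16 days.
orders = [("A", 5), ("B", 16)]
rows = to_person_period(orders)
```

Order "A" contributes a single row (week 1, event 1), while order "B" contributes three rows with the event flagged only in week 3. A binary regression of the event flag on week indicators and covariates over such rows is the usual way a discrete time hazard is fitted.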


Hazard Function:
The hazard function denotes the probability that a part will complete SIT at the end of the week denoted by time interval t, given that it has not completed SIT prior to time interval t. It can be expressed as

$$h_t = P(T = t \mid T \ge t) \tag{9}$$

Hence, we can relate the survival function and the hazard probability as

$$S_t = (1 - h_1)(1 - h_2) \cdots (1 - h_{t-1})(1 - h_t) \tag{10}$$

The hazard function is fundamental to survival analysis and needs to be estimated. There are two underlying sources of variability in the hazard probabilities:
a) Baseline variability due to time, i.e. completion of SIT varies with time
b) Variability due to individual characteristics, which is captured by introducing time-invariant covariates

On formalizing the discrete time survival analysis, we have

$$h_{i,j=6,t} = h_{i,t} = P\big(T = t \mid T \ge t,\ X^{(1)}_{i,j=6} = v_{i,j=1},\ X^{(2)}_{i,j=6} = v_{i,j=2},\ \ldots,\ X^{(5)}_{i,j=6} = v_{i,j=5},\ X^{(6)}_{i,j=6} = u_{i,j=2},\ X^{(7)}_{i,j=6} = u_{i,j=3},\ \ldots,\ X^{(10)}_{i,j=6} = u_{i,j=6}\big) \tag{11}$$

The hazard probabilities mentioned in the model above need to be parameterized to be conditional on the covariates. Cox (1972) proposed the “Discrete Time Hazard Model” using the logit function. This model represents the log-odds of occurrence of the event (i.e. completion of the SIT stage) as a function of the covariates and also provides a way to capture the dependency of the hazard probabilities on time.

$$\log\left(\frac{h_{i,t}}{1 - h_{i,t}}\right) = \alpha_t + \beta' X_{i,t} \tag{12}$$

The model parameters can be estimated by the method of maximum likelihood estimation [6]. The component $\alpha_t$ captures the variation in hazard probabilities due to time and $\beta$ accommodates the same due to individual characteristics. The most general way to account for time dependency is by inclusion of temporal dummy variables. So, our complete Discrete Time Hazard model is

$$\log\left(\frac{h_{i,t}}{1 - h_{i,t}}\right) = \alpha_1 D^{(1)}_{i,t} + \alpha_2 D^{(2)}_{i,t} + \cdots + \alpha_N D^{(N)}_{i,t} + \beta' X_{i,t} \tag{13}$$

where the D's are dummy variables for each time period.

$$D^{(1)}_{i,t} = \begin{cases} 1, & \text{if } t = 1 \\ 0, & \text{otherwise} \end{cases} \qquad D^{(2)}_{i,t} = \begin{cases} 1, & \text{if } t = 2 \\ 0, & \text{otherwise} \end{cases} \qquad \ldots \qquad D^{(N)}_{i,t} = \begin{cases} 1, & \text{if } t = N \\ 0, & \text{otherwise} \end{cases}$$

Plugging in the estimated coefficients and the known values of the covariates, we get our estimated hazard probability $\hat{h}_{i,t}$. However, the task at hand is to estimate the pmf of T, which will help us to estimate the on-time probability. As soon as a new order comes in, denoted by the index i+1, the expected dates, ed, for all stages of this order are provided by the business. To predict the on-time probability, we design the following algorithm.

Step 1: To estimate the hazard probability, the model takes the historic median values for the covariates v1, v2, …, v5 (since these are currently unknown to us) and the values of the covariates u1, u2, …, u5.

Step 2: Let a new order for the part arrive and be indexed as i+1. To calculate the on-time probability (shown in Step 3) of this instance, we first have to calculate the probability mass function of T, $\hat{p}_{i,t}$. As an example, we show how to calculate $\hat{p}_{i,t=t_i}$ for the ith instance at any t using the chain rule.

$$\hat{p}_{i,t=t_i} = \begin{cases} \hat{h}_{i,t=t_i}, & \text{if } t_i = 1 \\ \hat{h}_{i,t_i} \prod_{t=1}^{t_i - 1} \big(1 - \hat{h}_{i,t}\big), & \text{if } t_i \ne 1 \end{cases}$$

On plotting the cdf of $\hat{p}_{i,t}$, we have $\hat{F}$.
Note: The un-conditioning is only on the time component; $\hat{F}$ is still conditional on the time-invariant covariates.

Step 3: For the new order, let the time allocated by the business to complete SIT be $u_{i+1,j=6}$. The on-time probability is calculated as

$$\hat{p}_{\text{on-time}_{i+1}} = \hat{F}(u_{i+1,j=6} + h) - \hat{F}(u_{i+1,j=6} + l) \tag{14}$$

where $u_{i+1,j=6}$ and the limits h and l are converted into weeks.

After the three steps, the algorithm gives us a rough estimate of $\hat{p}_{\text{on-time}_{i+1}}$ for the (i+1)th instance of the specific part. However, as soon as it completes ENG, the actual value of $v_{i+1,j=1}$, together with the historic median values of v2, v3, …, v5 (since these are currently unknown to us) and the values of the covariates u1, u2, …, u5, are fed into the model. The three steps, as envisaged, are repeated to provide a refined estimate of $\hat{p}_{\text{on-time}_{i+1}}$. This process continues until the (i+1)th instance reaches the port of import and gives us a realistic $\hat{p}_{\text{on-time}_{i+1}}$. Hence, the algorithm developed can be termed evolutionary. The advantage it offers the business is that it can indicate exactly the stage at which $\hat{p}_{\text{on-time}_{i+1}}$ is dropping, so that appropriate remedial measures can be taken.

3) Naïve Bayes Classifier:
Just as in logistic regression, where $v_{i,j=6}$ was dichotomized into 0 or 1, the Bayes classifier follows the same approach for the response variable [7]. Hence each instance of the part is given a class label Y.

$$Y = \begin{cases} 1, & l \le v_{i,j=6} \le h \\ 0, & \text{otherwise} \end{cases}$$

The Bayes classifier uses the Bayes theorem to express the posterior probability $P(Y \mid \vec{X})$ in terms of the prior probability $P(Y)$, the class conditional probability $P(\vec{X} \mid Y)$ and the evidence $P(\vec{X})$.

$$P(Y \mid \vec{X}) = \frac{P(\vec{X} \mid Y) \times P(Y)}{P(\vec{X})} \tag{15}$$

$$P(Y \mid \vec{X}) = \frac{P(\vec{X} \mid Y) \times P(Y)}{P(\vec{X} \mid Y = 1) \times P(Y = 1) + P(\vec{X} \mid Y = 0) \times P(Y = 0)} \tag{16}$$
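As a small worked illustration of Eq. (16), the posterior for the on-time class can be computed from a prior and two class conditional likelihoods. The function name `posterior` and all numbers below are hypothetical and are not taken from the case data.

```python
def posterior(prior_on_time, lik_on_time, lik_late):
    """P(Y = 1 | X) for a two-class problem, following Eq. (16).

    prior_on_time : P(Y = 1); P(Y = 0) is taken as its complement
    lik_on_time   : P(X | Y = 1)
    lik_late      : P(X | Y = 0)
    """
    evidence = lik_on_time * prior_on_time + lik_late * (1.0 - prior_on_time)
    return lik_on_time * prior_on_time / evidence

# Hypothetical values: equal priors, and the observed covariates are
# twice as likely under the on-time class as under the late class.
p = posterior(prior_on_time=0.5, lik_on_time=0.2, lik_late=0.1)
```

With these numbers the posterior is 0.1 / (0.1 + 0.05) = 2/3. When the two likelihoods are equal, the evidence cancels them and the posterior reduces to the prior.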

The prior probability is estimated from the training set by computing the fraction of training instances belonging to each class. Estimating the class conditional probability, however, poses a challenge. Traditionally, three approaches are extensively used for implementation:
a) Assume that the covariates are conditionally independent, given a class label y
b) Bayesian belief networks
c) Assume that the joint distribution of the covariates, given a class label y, is univariate normal or multivariate normal

As a part of our analysis, we explore the naïve Bayes classifier to determine the on-time probability of an instance of the specific part. It follows the first traditional approach as

$$P(\vec{X} \mid Y = y) = P(X^{(1)}_{i,j=6} = v_1 \mid Y = y) \times P(X^{(2)}_{i,j=6} = v_2 \mid Y = y) \times \cdots \times P(X^{(10)}_{i,j=6} = u_{j=6} \mid Y = y) \tag{17}$$

To classify a new instance, indexed as i+1, the naïve Bayes classifier computes the posterior probability for each class Y and assigns the instance to the class where it is maximum. However, correlated covariates can degrade the performance of the naïve Bayes classifier.

V. RESULTS & DISCUSSION

The entire data set was split into training and testing sets in the ratio 70:30. For methods one and three, out of the n instances for the part, 45% were on time and 55% were not on time. We exercised extreme care to make the test data as representative as the training data. The models discussed above were built using the training data set and a confusion matrix was constructed on the testing data. Even though the discrete time survival analysis underperformed as compared to the logistic regression model and the naïve Bayes classifier, it includes the time component, which seems logical given the type of problem we are trying to address. Further efforts are being made to refine the algorithm developed by the research team and compare its evolution. The results documented below are for a particular part “ABCD”.

From the plot in Fig. 1, as a part of the residual analysis of the logistic regression model, it is apparent that there is no significant model inadequacy.

Fig. 1. $\hat{p}_{i,j=6}$ versus the studentized Pearson residuals

The plot in Fig. 2 portrays that the more weeks are permitted, the higher is the probability to complete SIT. Similar curves, if plotted across all the stages, give a scientific approach to estimating the eds.

Fig. 2. Baseline survival probabilities

The plot in Fig. 3 portrays that the hazard probabilities vary with time and that the heterogeneity is due to the influence of individual characteristics.

Fig. 3. Baseline hazard probabilities

Let the new order for the part “ABCD” be indexed as i+1. The hypothetical expected days to complete this order across all the stages are provided in TABLE 1.

TABLE 1
Stage    Expected Days to Complete
SIT      81 (approx. 12 weeks)
POE      82
POI      20
PRQ      217
POR      14

The plot in Fig. 4 shows the cdfs. The bold magenta line shows $\hat{F}$ with median values of v1, v2, …, v5 (since these are currently unknown to us) and u1, u2, …, u5. The cyan dotted line shows $\hat{F}$ at the end of POI. Similarly, the updated cdfs at the end of the other stages are shown in dotted lines with varying colors.
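The path from hazard probabilities to the cdf and then to the on-time probability of Eq. (14) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the coefficients and the hazards in the usage lines are hypothetical, and the sketch assumes that the cdf is 0 before week 1 and stays at its last value beyond week N.

```python
import math

def hazards(alphas, beta, x):
    """Eq. (13): one hazard per week from week-specific intercepts and covariates."""
    linear = sum(b * xi for b, xi in zip(beta, x))
    return [1.0 / (1.0 + math.exp(-(a + linear))) for a in alphas]

def pmf_from_hazards(h):
    """Chain rule: p_t = h_t * product over s < t of (1 - h_s)."""
    pmf, surviving = [], 1.0
    for h_t in h:
        pmf.append(h_t * surviving)
        surviving *= 1.0 - h_t
    return pmf

def cdf_from_pmf(pmf):
    """Running sum of the pmf, one value per week."""
    total, cdf = 0.0, []
    for p in pmf:
        total += p
        cdf.append(total)
    return cdf

def on_time_probability(cdf, u_weeks, l_weeks, h_weeks):
    """Eq. (14): F(u + h) - F(u + l), all arguments in whole weeks."""
    def F(t):
        if t < 1:
            return 0.0  # assumption: no mass before week 1
        return cdf[min(t, len(cdf)) - 1]  # assumption: flat beyond week N
    return F(u_weeks + h_weeks) - F(u_weeks + l_weeks)

# Hypothetical weekly hazards over a three-week study period.
pmf = pmf_from_hazards([0.5, 0.5, 1.0])
cdf = cdf_from_pmf(pmf)
p_on_time = on_time_probability(cdf, u_weeks=2, l_weeks=-1, h_weeks=1)
```

With these hazards the pmf is [0.5, 0.25, 0.25], the cdf is [0.5, 0.75, 1.0], and the on-time probability is F(3) - F(1) = 0.5. Recomputing the hazards with the actual v value of each completed stage, and rerunning the same three functions, gives the stage-by-stage revision described for the algorithm.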


Fig. 4. $\hat{F}$ versus t in weeks

From the cdf of the medians, and using (14), the estimated $\hat{p}_{\text{on-time}_{i+1}}$ for the (i+1)th instance of part “ABCD” is 41.48 %. As the instance completes each stage, a revised $\hat{p}_{\text{on-time}_{i+1}}$ is calculated by reusing (14). The process continues to give a realistic estimate of $\hat{p}_{\text{on-time}_{i+1}}$ at the end of each stage. A key observation is the fall in the on-time probability after completion of the procurement stage for this data. This is shown in TABLE 2.

TABLE 2
Status        Early/Late (in Days)    Revised On Time Probability
End of ENG    -265                    32.73 %
End of POR    -93                     34.45 %
End of PRQ    41                      26.33 %
End of POE    26                      34.92 %
End of POI    30                      35.57 %

The confusion matrix of each model is shown in Fig. 5.

(a) Discrete Time Survival Analysis
                      Actual Value 0    Actual Value 1
Predicted Value 0     20                29
Predicted Value 1     19                3

(b) Logistic Regression
                      Actual Value 0    Actual Value 1
Predicted Value 0     28                16
Predicted Value 1     11                16

(c) Naïve Bayes Classifier
                      Actual Value 0    Actual Value 1
Predicted Value 0     26                17
Predicted Value 1     13                15

Fig. 5. Confusion Matrix for (a) Discrete Time Survival Analysis (b) Logistic Regression (c) Naïve Bayes Classifier

The model key metrics are shown in TABLE 3.

TABLE 3
Model Name                         Accuracy (in %)    Sensitivity (in %)    Specificity (in %)
Logistic Regression                62                 50                    72
Discrete Time Survival Analysis    32                 10                    51
Naïve Bayes Classifier             58                 47                    66

VI. CONCLUSION

On comparing the three models, we conclude that the logistic regression model and the naïve Bayes classifier performed better than the discrete time survival analysis. One reason for the underperformance of the discrete time survival model may be that the deviation of the actual date from the expected date is not modeled. This is a vivid example of the underperformance of a mathematical model when it is trained on a data set lacking the outcome (in the case of supervised learning). We also believe that, had the discreteness in time been modeled in days instead of weeks, we could have had a better estimate $\hat{p}_{\text{on-time}_{i+1}}$. In the demonstrated example, the models were built at the part level instead of the vendor level. The models discussed above can be further improved by incorporating additional information about the vendor, like assembly line capacity, technology used, etc. The methodologies discussed herein serve as a directional tool, and the business is encouraged to practice caution while applying them in an absolute sense for vendor selection.

ACKNOWLEDGEMENTS

We are grateful to Brian Costigan, John Damalas and PL Vadivel from GE Power for kindly making the data set available for the present work and for their continued support of the project.

REFERENCES

[1] H. P. Bloch, C. Soares, Process Plant Machinery. Butterworth-Heinemann, 1998, pp. 45-101.
[2] M. H. Kutner, C. J. Nachtsheim, J. Neter, W. Li, Applied Linear Statistical Models. McGraw Hill/Irwin, 2004, pp. 563-565.
[3] T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning. Springer, 2008, pp. 265-267.
[4] J. D. Singer, J. B. Willett, “Modeling the days of our lives: using survival analysis when designing and analyzing longitudinal studies of duration and the timing of events,” Psychological Bulletin, vol. 110, no. 2, pp. 282-286, Sep. 1991.
[5] P. D. Allison, “Discrete-time methods for the analysis of event histories,” Sociological Methodology, vol. 13, pp. 61-98, 1982.
[6] J. D. Singer, J. B. Willett, “It’s all about time: using discrete time survival analysis to study the duration and the timing of events,” Journal of Educational Statistics, vol. 18, no. 2, pp. 169-171, 1993.
[7] A. Y. Ng, M. I. Jordan, “On discriminative vs. generative classifiers: a comparison of logistic regression and naïve Bayes,” in Proc. 15th Annual Neural Information Processing Systems, NIPS, British Columbia, Canada, pp. 841-842, 2001.
