Estimating The On-Time Probability For Vendor Selection Problem 1
Proceedings of the 2016 IEEE IEEM
978-1-5090-3665-3/16/$31.00 ©2016 IEEE
Authorized licensed use limited to: J.R.D. Tata Memorial Library Indian Institute of Science Bengaluru. Downloaded on December 02,2023 at 11:58:35 UTC from IEEE Xplore. Restrictions apply.

Abstract – Customers expect fast delivery of products and services. Businesses understand this requirement and focus on efficient supply chains. The vendor selection process, which is complicated by a host of internal and external factors affecting decision making, is fundamental to an efficient and responsive supply chain. The on-time probability of a vendor supplying a part can be used as a selection criterion. In this paper, we apply three quantitative methods, namely logistic regression, discrete time survival analysis and a naïve Bayes classifier, to evaluate a vendor. The mathematical models for estimating the on-time probability were built and tested on a data set provided by a case company and evaluated with the help of key metrics.

Keywords - vendor selection, supply chain, probabilistic models

I. INTRODUCTION
In an ever more competitive business environment with shrinking timelines and strict budget caps, completing a project within the contractual time period is paramount. Tardy delivery has large repercussions, usually in the form of financial penalties and budget overruns. It may also have an adverse indirect effect on the customer experience, which in many cases cannot be quantified. Hence managing project deliverables is fundamental to efficient project execution and can benefit greatly from a quantitative approach that provides realistic estimates. Traditional industries are continually evolving and becoming more digitized, and in the process they are gathering large volumes of data about their supply chains at a very high rate. This has generated petabytes of information about products and services which, if used appropriately, can help project managers adhere to specified timelines and make faster, better-informed decisions. These actions can have a positive effect on the overall operations strategy.

In this paper, we take a novel approach to a business problem faced by a market-leading industrial gas turbine manufacturer and explore solutions within the domain of data science.

Gas and steam turbines are widely used for power generation. Broadly speaking, a turbine consists of four major modules, namely the compressor, combustion system, turbine and exhaust system. The reader is encouraged to review [1] for a detailed description of this subject. These units operate together to extract thermal energy and convert it into mechanical energy to produce electricity. A turbine is thus a complex machine built from the four modules mentioned above. Besides these critical units, a large number of components (such as air inlet ducts, foundation bolts, pipes, etc.) are essential to assemble and run a turbine-generator set. The analysis discussed herein refers to the on-time probabilities associated with the delivery of these supplementary parts. The manufacturer forges strategic partnerships with vendors to supply these supplementary parts under an agreement covering item, quantity and price. Whenever a customer places an order for a power generating unit, such as a gas turbine, with the original equipment manufacturer (OEM), the OEM initiates a procurement process for critical and supplementary parts. For the supplementary parts, a vendor selection process based on certain traditional criteria leads to the placement of an order with a selected vendor. To protect the business legally against loss, the manufacturer enters into a multilevel contractual agreement with the customer, part of which specifies the due date for installing the turbine at the project site. Tardiness invokes the "Fair Condition" clause, which states that if the manufacturer fails to meet the contractual condition, any delay of the project results in fair compensation to the customer. Thus the onus of installing the turbine at the project site on time lies with the manufacturer.

The manufacturer has direct control over the workflow of the turbine, from the beginning of manufacturing to its delivery at the project site. However, the supplementary parts, which are critical in the installation phase, are not under the manufacturer's direct purview for all stages of delivery. The manufacturer-vendor relationship is therefore key to getting all components to the project site by the scheduled date. As a result, the manufacturer is expected to exercise care when selecting vendors for the supplementary parts. Ceteris paribus, a vendor with a proven track record of supplying these parts on time should be favored over a non-performing vendor. Once a vendor is selected, the manufacturer should carefully monitor the progress of the outsourced parts across the various stages to ensure that they arrive at the project site on time. The objective of this paper is to design and develop a quantitative approach to estimate the on-time probability of a vendor supplying a selected part. The goal is to use this quantitative approach to supplement the traditional means, which rely on historical
vendor relationships and geography, to help make a more informed decision on vendor selection.

II. BUSINESS PROCESS
Once a part and the corresponding vendor have been identified, the business has to track its performance across the following stages, which serve as the links in the supply chain network.
Stage 1- ENGINEERING (ENG)
OEM activity to prepare the technical specification of the parts before placing an order with the vendor
Stage 2- PURCHASE ORDER (POR)
OEM activity to place the purchase order (quantity and cost), which is the OEM's commitment to buy the part from the vendor
Stage 3- PROCUREMENT (PRQ)
Vendor activity to perform procurement, assembly, fabrication, etc. once the order has been placed
Stage 4- PORT OF EXPORT (POE)
Vendor activity to move the parts from the vendor's warehouse to the port of export
Stage 5- PORT OF IMPORT (POI)
Vendor activity to ship the parts from the port of export to the port of import
Stage 6- ON SITE (SIT)
Customer activity to move the parts from the port of import to the warehouse situated at the project site

These six stages, indexed by j = 1, 2, …, 6 respectively, are assumed to be discrete, observable and sequential, and together they form a process. Within this process, a part that was tardy yesterday can become early today through a corrective sequence of actions, and vice versa. Continuous adaptation is thus intrinsic to the process, which makes it complex and challenging to model.

III. DATA DESCRIPTION
The data provided by the project sponsor covered the period from 1st January, 1995 to 4th February, 2016, with 275k data points in total. Each record corresponds to one order placed by the manufacturer with a vendor for a specific part. For each order, status dates, sd, and expected dates, ed, are provided for each of the six stages described in the business process above. A key observation is that the manufacturer provides the expected date, whereas the vendor provides the status date for the completion of each stage.

Empirically, the variation in the delivery date of a part (on, before or after the expected date of arrival) depends on a host of internal and external factors. Internal factors include the order quantity, type of material, weight and dimensions of the part. Factors such as the mode of shipment, weather and terrain are external sources of this variation. These factors, individually and/or jointly, influence the variability in the actual arrival date of a part at the project site. Unfortunately, information on internal and external factors was not available for this study. In the absence of such information, mathematical modelling becomes challenging, since the model establishes empirical relationships between the dependent variable and the covariates. However, this provides an opportunity to engineer the covariates required for the analysis. We rely on two intuitions which are expected to govern the dynamics:

a) The part arrives at the project site on, before or after the expected date of arrival depending on the extent of being early/tardy in the preceding stages
b) The part arrives at the project site on, before or after the expected date of arrival depending on the time allocated to complete that particular stage and all stages preceding it

The logic behind the two intuitions is straightforward. The greater the tardiness (or earliness) in the preceding stages, the tardier (or earlier) the part is expected to be in its current stage. At the same time, if an instance of a part is late in the jth stage (say) and corrective actions are taken in the (j+1)th stage, it has a high probability of no longer being late, and vice versa. This serves as the guideline for the first intuition. Each task of a particular stage inherently takes a certain time to complete, so the sum of these individual times is the minimum time required to complete a stage; call it $t_{o,j}$. Any time allocated to complete a stage that is less than $t_{o,j}$ will add to the tardiness of the part instance. Though $t_{o,j}$ is unknown to us, if the time allocated by the manufacturer to complete a stage is large, we expect the tardiness to be small, leading to a higher probability of that part instance being on time in that stage. Any excess of the allocated time can also be carried forward to the subsequent stage.

With respect to the first intuition, the tardiness/earliness (in days) is captured by a new random variable called the "variance of the jth stage", $v_{i,j}$, defined as the difference between the status date and the expected date. With respect to the second intuition, we define another random variable called the "time allocated by the manufacturer to complete the jth stage", $u_{i,j}$, defined as the difference between the expected date of the jth stage and the expected date of the (j-1)th stage. Mathematically,

$$v_{i,j} = sd_{i,j} - ed_{i,j} \tag{1}$$

$$u_{i,j} = ed_{i,j} - ed_{i,j-1} \tag{2}$$

where $i \in \{1, 2, \ldots, n\}$ represents a specific order for the part and n is the total number of transactions.

$v_{i,j}$ can therefore be both positive and negative. When $v_{i,j}$ is greater than zero, the status date falls after the expected date, so the ith instance of the part is tardy in the jth stage. Similarly, when $v_{i,j}$ is less than zero, the ith instance of the part is early in the jth stage.
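Equations (1) and (2) are plain date differences. The following Python sketch computes both for a single order; the stage abbreviations follow Section II, while the function name `stage_variables` and the example dates are hypothetical illustrations, not data from the case company.

```python
from datetime import date

# Stage order from Section II; position k in this list corresponds to j = k + 1.
STAGES = ["ENG", "POR", "PRQ", "POE", "POI", "SIT"]


def stage_variables(sd, ed):
    """Compute v_{i,j} (eq. 1) and u_{i,j} (eq. 2) in days for one order.

    sd, ed: dicts mapping stage name -> datetime.date (status / expected dates).
    v[j] = sd_j - ed_j; a positive value means the status date is later than expected.
    u[j] = ed_j - ed_{j-1}; defined only from the second stage (POR) onwards.
    """
    v = {s: (sd[s] - ed[s]).days for s in STAGES}
    u = {
        STAGES[k]: (ed[STAGES[k]] - ed[STAGES[k - 1]]).days
        for k in range(1, len(STAGES))
    }
    return v, u


# Hypothetical dates for one order (illustration only).
ed = {"ENG": date(2015, 1, 1), "POR": date(2015, 1, 15), "PRQ": date(2015, 3, 1),
      "POE": date(2015, 3, 20), "POI": date(2015, 4, 20), "SIT": date(2015, 5, 4)}
sd = {"ENG": date(2015, 1, 3), "POR": date(2015, 1, 14), "PRQ": date(2015, 3, 10),
      "POE": date(2015, 3, 25), "POI": date(2015, 4, 22), "SIT": date(2015, 5, 1)}

v, u = stage_variables(sd, ed)
print(v)  # {'ENG': 2, 'POR': -1, 'PRQ': 9, 'POE': 5, 'POI': 2, 'SIT': -3}
print(u)  # {'POR': 14, 'PRQ': 45, 'POE': 19, 'POI': 31, 'SIT': 14}
```

Keying the results by stage name rather than by j keeps the sketch readable; the mapping back to j = 1, …, 6 is simply the list position plus one.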
IV. METHODOLOGY
In this paper, we explore and evaluate three techniques, logistic regression, discrete time survival analysis and a naïve Bayes classifier, to determine whether a specific part is expected to be on time or tardy, where the end of the process is defined as the conclusion of the SIT stage. The following sections detail the procedure followed in the analysis.

1) Logistic Regression:
As a first approach, we use the logistic regression model, which applies when the response variable is qualitative with two possible outcomes, such as blood pressure (high or low) or a match outcome for a team (win or loss). In our analysis, the outcome is whether an instance of the specific part is on time at the end of the SIT stage. Hence the response variable $v_{i,j=6}$ is dichotomized into two possible outcomes: an instance of the part with $v_{i,j=6}$ within an allowable band is coded 1, and 0 otherwise. Let $y_{i,j=6}$ denote the random variable indicating whether the ith instance of a part is on time at the SIT stage:

$$y_{i,j=6} = \begin{cases} 1, & l \le v_{i,j=6} \le h \\ 0, & \text{otherwise} \end{cases}$$

The values l and h are the lower and upper limits of the allowable deviation of the actual dates from the expected dates; they were provided by the project leaders in the business. Let $Y_{i,j=6}$ be independent Bernoulli random variables [2], so that

$$Y_{i,j=6} = E\{Y_{i,j=6}\} + \varepsilon_{i,j=6} \tag{3}$$

Next, we define a probability model linking the response variable $y_{i,j=6}$ to a set of covariates, namely the amount of being tardy/early in the previous stages and the time allocated by the manufacturer to complete these stages. The values of these covariates are given by the $v_{i,j}$'s and $u_{i,j}$'s, respectively.

$$p_{i,j=6} = E\{Y_{i,j=6}\} \tag{4}$$

$$p_{i,j=6} = P\left(y_{i,j=6} = 1 \mid X^{(1)}_{i,j=6} = v_{i,j=1}, X^{(2)}_{i,j=6} = v_{i,j=2}, \ldots, X^{(5)}_{i,j=6} = v_{i,j=5}, X^{(6)}_{i,j=6} = u_{i,j=2}, X^{(7)}_{i,j=6} = u_{i,j=3}, \ldots, X^{(10)}_{i,j=6} = u_{i,j=6}\right) \tag{5}$$

We specify the widely used logistic regression as the probability model:

$$p_{i,j=6} = \frac{e^{\beta' X_{i,j=6}}}{1 + e^{\beta' X_{i,j=6}}} \tag{6}$$

The coefficients $\hat{\beta}$ are estimated from the training data by maximum likelihood. We perform the standard logistic regression residual analysis to check the adequacy of the model [3].

2) Discrete Time Survival Analysis:
The discrete time survival model is used to study not only whether a specific event has occurred, but also the time to the event. The lifetime of an instance of the specific part, from its birth at the ENG stage to its culmination at the SIT stage, has been traced by the business with the help of the status dates, sd's, across all stages.

In this analysis, the event of interest is defined as "completion of Stage SIT by the instance of a part". We model the random variable T, which denotes the time to the event. There is no censoring, because the period of study is long enough to capture the lifetime of each part instance from the POI stage to the SIT stage, i.e. no event occurs after the period of study. Also, no instance of the specific part leaves during the period of study (defined below), i.e. once it arrives at the port of import, it necessarily reaches the project site (on time or not). The following assumptions serve as guidelines for our discrete time model building approach:

a) The instances of the part in the sample space are independent
b) The time intervals are discrete (non-overlapping) and independent and identically distributed (iid)
c) Each instance of the part experiences the event only once, i.e. it is a non-repeatable event

To start the clock in this approach, we traced when an instance of the specific part reached the project site after arriving at the port of import. We calculated the time $t_{i,j=6}$ (in days) it took for an instance of the specific part to travel from the port of import to the project site, which is the time it takes to complete the SIT stage:

$$t_{i,j=6} = sd_{i,j=6} - sd_{i,j=5} \tag{7}$$

To make the event analysis discrete, we study the event in terms of weeks. Hence any $t_{i,j=6}$ in the interval [0, 7) denotes the occurrence of the event at the end of week one, the interval [7, 14) at the end of week two, and so on. The period of study is denoted by N weeks; N represents the last time interval in the sample.

Discrete time survival analysis is a modification of logistic regression. To implement it, however, we alter the data structure, transforming the standard person data into person-period data [4]. We use these person-period data to model the occurrence of the event of interest. A recap of basic terminology sets the landscape for further analysis [5].

Survival Function:
The survival function denotes the probability that a part has not completed SIT by the end of the week denoted by time interval t. It can be expressed as

$$S_t = P(T > t) \tag{8}$$
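Returning to the logistic regression model of Section IV.1, the dichotomization and equations (3) to (6) can be sketched in Python with NumPy. This is a minimal illustration, assuming the ten covariates of equation (5) are already assembled into a matrix. The Newton-Raphson (iteratively reweighted least squares) solver is one standard way to obtain the maximum-likelihood $\hat{\beta}$, not necessarily the software used in the study, and the function names, band limits and synthetic data below are hypothetical.

```python
import numpy as np


def on_time_label(v6, l, h):
    """Dichotomize v_{i,j=6}: 1 if l <= v <= h, else 0."""
    v6 = np.asarray(v6, dtype=float)
    return ((v6 >= l) & (v6 <= h)).astype(int)


def add_intercept(X):
    return np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression (eq. 6) via Newton-Raphson / IRLS.

    Returns beta with the intercept first. Assumes the classes are not
    perfectly separable; otherwise the MLE does not exist.
    """
    X1 = add_intercept(X)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        w = p * (1.0 - p)                      # Bernoulli variances
        grad = X1.T @ (y - p)                  # score vector
        hess = X1.T @ (X1 * w[:, None])        # Fisher information X'WX
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def predict_proba(beta, X):
    """Estimated p_{i,j=6} for each row of X."""
    return 1.0 / (1.0 + np.exp(-add_intercept(X) @ beta))


def studentized_pearson_residuals(beta, X, y):
    """Pearson residuals divided by sqrt(1 - leverage), for residual analysis."""
    X1 = add_intercept(X)
    p = predict_proba(beta, X)
    w = p * (1.0 - p)
    Xw = X1 * np.sqrt(w)[:, None]
    # Diagonal of the hat matrix W^(1/2) X (X'WX)^(-1) X' W^(1/2).
    leverage = np.einsum("ij,ji->i", Xw, np.linalg.solve(Xw.T @ Xw, Xw.T))
    return (np.asarray(y, dtype=float) - p) / np.sqrt(w * (1.0 - leverage))


# Usage with synthetic data (hypothetical, not the case-company data).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
true_logit = 0.5 - 1.0 * X[:, 0] + 0.8 * X[:, 1]
y = (rng.random(400) < 1.0 / (1.0 + np.exp(-true_logit))).astype(int)
beta_hat = fit_logistic(X, y)
r = studentized_pearson_residuals(beta_hat, X, y)
```

At convergence the score vector $X'(y - \hat{p})$ is numerically zero, which is a convenient sanity check on any maximum-likelihood fit of equation (6).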
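The weekly discretization of equation (7) and the person-period restructuring of the discrete time survival analysis can be sketched as below. This is a minimal NumPy illustration under the paper's no-censoring assumption; the function names and example durations are hypothetical. The `life_table_survival` helper is the standard nonparametric (life-table) estimate of $S_t$ in equation (8), included as a baseline check rather than as the covariate model used in the paper.

```python
import numpy as np


def event_week(days):
    """Map t_{i,j=6} in days (eq. 7) to week indices: [0,7) -> 1, [7,14) -> 2, ..."""
    return np.asarray(days, dtype=int) // 7 + 1


def person_period(weeks, covariates):
    """Expand person-level rows into person-period rows.

    Instance i contributes one row for every week t = 1..T_i it is at risk;
    the event indicator is 1 only in week T_i (no censoring is assumed).
    """
    ids, periods, rows, event = [], [], [], []
    for i, (T, x) in enumerate(zip(weeks, covariates)):
        for t in range(1, int(T) + 1):
            ids.append(i)
            periods.append(t)
            rows.append(x)
            event.append(int(t == T))
    return np.array(ids), np.array(periods), np.array(rows), np.array(event)


def life_table_survival(weeks, N):
    """S_t = prod_{k<=t} (1 - h_k), with h_k = events in week k / instances at risk."""
    weeks = np.asarray(weeks)
    surv, s = [], 1.0
    for t in range(1, N + 1):
        at_risk = np.sum(weeks >= t)
        events = np.sum(weeks == t)
        if at_risk > 0:
            s *= 1.0 - events / at_risk
        surv.append(s)
    return np.array(surv)


# Usage with hypothetical SIT durations (days) and one covariate per instance.
days = [3, 9, 12, 16]
weeks = event_week(days)                      # array([1, 2, 2, 3])
ids, periods, X_pp, d = person_period(weeks, [[0.1], [0.4], [-0.2], [1.0]])
S = life_table_survival(weeks, N=3)           # approximately [0.75, 0.25, 0.0]
```

Without censoring, the life-table estimate reduces to the fraction of instances with T > t, which matches the definition in equation (8).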
covariates. Cox (1972) proposed the "Discrete Time Hazard Model" using the logit function. This model represents the log-odds of occurrence of the event (i.e. completion of the SIT stage) as a function of covariates and also provides a way to capture the dependency of hazard probabilities on time.

$$\log\left(\frac{h_{i,t}}{1 - h_{i,t}}\right) = \alpha_t + \beta' X_{i,t} \tag{12}$$

The model parameters can be estimated by maximum likelihood [6]. The component $\alpha_t$ captures the variation in hazard probabilities due to time, and $\beta' X_{i,t}$ accommodates the variation due to individual characteristics. The most general way to account for time dependency is to include temporal dummy variables. Our complete Discrete Time Hazard Model is therefore

$$\log\left(\frac{h_{i,t}}{1 - h_{i,t}}\right) = \alpha_1 D^{(1)}_{i,t} + \alpha_2 D^{(2)}_{i,t} + \cdots + \alpha_N D^{(N)}_{i,t} + \beta' X_{i,t} \tag{13}$$

where the D's are dummy variables, one for each time period:

$$D^{(k)}_{i,t} = \begin{cases} 1, & t = k \\ 0, & \text{otherwise} \end{cases}, \qquad k = 1, 2, \ldots, N$$

Plugging in the estimated coefficients and the known values of the covariates gives the estimated hazard probability $\hat{h}_{i,t}$. However, the task at hand is to estimate the

$$\hat{p}^{\,\text{ontime}}_{i+1} = F\left(u_{i+1,j=6} + h\right) - F\left(u_{i+1,j=6} + l\right) \tag{14}$$

where $u_{i+1,j=6}$ and the limits h and l are converted into weeks.

After the three steps, the algorithm gives a rough estimate of $\hat{p}^{\,\text{ontime}}_{i+1}$ for the (i+1)th instance of the specific part. As soon as the instance completes ENG, however, the actual value of $v_{i+1,j=1}$, together with the historical median values of $v_2, v_3, \ldots, v_5$ (since these are not yet known) and the values of the covariates $u_1, u_2, \ldots, u_5$, are fed into the model. The three steps are repeated to provide a refined estimate of $\hat{p}^{\,\text{ontime}}_{i+1}$. This process continues until the (i+1)th instance reaches the port of import, yielding a realistic $\hat{p}^{\,\text{ontime}}_{i+1}$. Hence the algorithm can be termed evolutionary. Its advantage for the business is that it indicates exactly the stage at which $\hat{p}^{\,\text{ontime}}_{i+1}$ drops, so that appropriate remedial measures can be taken.

3) Naïve Bayes Classifier:
Just as in logistic regression, where $v_{i,j=6}$ was dichotomized into 0 or 1, the Bayes classifier follows the same approach for the response variable [7]. Hence each instance of the part is given a class label Y:

$$Y = \begin{cases} 1, & l \le v_{i,j=6} \le h \\ 0, & \text{otherwise} \end{cases}$$

The Bayes classifier uses Bayes' theorem to express the posterior probability $P(Y \mid \vec{X})$ in terms of the prior
probability $P(Y)$, the class-conditional probability $P(\vec{X} \mid Y)$ and the evidence $P(\vec{X})$:

$$P(Y \mid \vec{X}) = \frac{P(\vec{X} \mid Y) \times P(Y)}{P(\vec{X})} \tag{15}$$

$$P(Y \mid \vec{X}) = \frac{P(\vec{X} \mid Y) \times P(Y)}{P(\vec{X} \mid Y = 1) \times P(Y = 1) + P(\vec{X} \mid Y = 0) \times P(Y = 0)} \tag{16}$$

The prior probability is estimated from the training set as the fraction of training instances belonging to each class. Estimating the class-conditional probability, however, poses a challenge. Three approaches are traditionally used:
a) Assume that the covariates are conditionally independent, given a class label y
b) Bayesian belief networks
c) Assume that the joint distribution of the covariates, given a class label y, is univariate normal or multivariate normal

In our analysis, we use the naïve Bayes classifier to determine the on-time probability of an instance of the specific part. It follows the first approach:

$$P(\vec{X} \mid Y = y) = P\left(X^{(1)}_{i,j=6} = v_1 \mid Y = y\right) \times P\left(X^{(2)}_{i,j=6} = v_2 \mid Y = y\right) \times \cdots \times P\left(X^{(10)}_{i,j=6} = u_6 \mid Y = y\right) \tag{17}$$

To classify a new instance, indexed i+1, the naïve Bayes classifier computes the posterior probability for each class Y and assigns the instance to the class for which it is maximum. However, correlated covariates can degrade the performance of the naïve Bayes classifier.

Fig. 1. $\hat{p}_{i,j=6}$ versus the studentized Pearson residuals

The plot in Fig. 2 shows that the more weeks are permitted, the higher the probability of completing SIT. Similar curves, if plotted across all stages, would give a systematic, data-driven way to estimate the eds.

Fig. 2. Baseline survival probabilities

The plot in Fig. 3 shows that the hazard probabilities vary with time, and that the heterogeneity is due to the influence of individual characteristics.
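For the discrete time hazard model, the step from estimated hazards to the on-time probability of equation (14), together with the evolutionary covariate update described with it, can be sketched as follows. This is a minimal illustration, assuming that F in equation (14) is the cumulative distribution function of T, F(t) = 1 - S_t, evaluated on whole weeks. The coefficient values, covariates and helper names below are hypothetical placeholders for quantities that would come from fitting equation (13) on person-period data, and for brevity the covariate vector holds only $v_1, \ldots, v_5$.

```python
import numpy as np


def hazards(alpha, beta, x):
    """h_t = logistic(alpha_t + beta'x) for t = 1..N (eq. 13, one dummy per week)."""
    eta = np.asarray(alpha, dtype=float) + float(np.dot(beta, x))
    return 1.0 / (1.0 + np.exp(-eta))


def cdf_from_hazards(h):
    """F(t) = 1 - S_t with S_t = prod_{k<=t} (1 - h_k); F(0) = 0 is prepended."""
    S = np.cumprod(1.0 - np.asarray(h))
    return np.concatenate([[0.0], 1.0 - S])


def on_time_probability(F, u_weeks, l_weeks, h_weeks):
    """Eq. (14): F(u + h) - F(u + l), with arguments clipped to the window [0, N]."""
    N = len(F) - 1
    upper = int(np.clip(u_weeks + h_weeks, 0, N))
    lower = int(np.clip(u_weeks + l_weeks, 0, N))
    return F[upper] - F[lower]


def current_covariates(v_observed, v_median):
    """Evolutionary update: observed stage variances where available,
    historical medians for stages not yet completed (None = not yet observed)."""
    return np.array([m if v is None else v for v, m in zip(v_observed, v_median)],
                    dtype=float)


# Hypothetical example: N = 6 weeks and made-up coefficients.
alpha = np.array([-1.5, -0.5, 0.0, 0.5, 1.0, 1.5])
beta = np.array([-0.02, -0.01, -0.01, -0.02, -0.03])
v_median = [0.0, 1.0, 3.0, 2.0, 1.0]
x = current_covariates([4.0, None, None, None, None], v_median)  # only ENG done
F = cdf_from_hazards(hazards(alpha, beta, x))
p_ontime = on_time_probability(F, u_weeks=2, l_weeks=-1, h_weeks=1)
```

Re-running the last three lines each time a stage completes, with more entries of `v_observed` filled in, mirrors the refinement loop described for the evolutionary algorithm. Clipping to [0, N] is a design choice of this sketch so that limits outside the study window do not index past the estimated curve.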
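For the naïve Bayes classifier of Section IV.3, equations (15) to (17) can be sketched as a small two-class classifier. The Gaussian form of each per-covariate class-conditional density is our assumption, since the text specifies only the conditional-independence approach (a). Priors are class fractions, as described, and the evidence in equation (16) is obtained by normalizing over the two classes. The class name and toy data are hypothetical.

```python
import numpy as np


class GaussianNaiveBayes:
    """Two-class naive Bayes with per-covariate normal densities (an assumption)."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.array([0, 1])
        # Prior P(Y = c): fraction of training instances in each class.
        self.prior_ = np.array([np.mean(y == c) for c in self.classes_])
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # Small floor keeps densities finite if a covariate is constant within a class.
        self.var_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + 1e-9
        return self

    def _log_joint(self, X):
        """log P(X | Y = c) + log P(Y = c), using the product form of eq. (17)."""
        X = np.asarray(X, dtype=float)[:, None, :]             # shape (n, 1, p)
        log_dens = -0.5 * (np.log(2.0 * np.pi * self.var_)
                           + (X - self.mean_) ** 2 / self.var_)  # shape (n, 2, p)
        return log_dens.sum(axis=2) + np.log(self.prior_)

    def predict_proba(self, X):
        """Posterior P(Y | X) of eq. (16): joint probabilities divided by the evidence."""
        log_joint = self._log_joint(X)
        log_joint -= log_joint.max(axis=1, keepdims=True)      # numerical stability
        joint = np.exp(log_joint)
        return joint / joint.sum(axis=1, keepdims=True)

    def predict(self, X):
        """Assign each instance to the class with the larger posterior."""
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]


# Hypothetical toy data: two covariates, e.g. v_5 and u_6 in days.
X_train = np.array([[0.0, 14.0], [1.0, 15.0], [-1.0, 13.0],
                    [9.0, 7.0], [10.0, 6.0], [11.0, 8.0]])
y_train = np.array([1, 1, 1, 0, 0, 0])
model = GaussianNaiveBayes().fit(X_train, y_train)
print(model.predict([[0.5, 14.0], [10.0, 7.0]]))  # [1 0]
```

Working in log space and subtracting the row maximum before exponentiating avoids underflow when ten small densities are multiplied, while leaving the normalized posterior of equation (16) unchanged.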
The confusion matrix of each model is shown in Fig. 5 (rows: predicted values; columns: actual values).

(a) Discrete Time Survival Analysis
                 Actual 0   Actual 1
  Predicted 0       20         29
  Predicted 1       19          3

(b) Logistic Regression
                 Actual 0   Actual 1
  Predicted 0       28         16
  Predicted 1       11         16

(c) Naïve Bayes Classifier
                 Actual 0   Actual 1
  Predicted 0       26         17
  Predicted 1        …          …

VI. CONCLUSION
We are grateful to Brian Costigan, John Damalas and PL Vadivel from GE Power for kindly making the data set available for the present work and for their continued support of the project.

REFERENCES
[1] H. P. Bloch, C. Soares, Process Plant Machinery. Butterworth-Heinemann, 1998, pp. 45-101.
[2] M. H. Kutner, C. J. Nachtsheim, J. Neter, W. Li, Applied Linear Statistical Models. McGraw-Hill/Irwin, 2004, pp. 563-565.
[3] T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning. Springer, 2008, pp. 265-267.
[4] J. D. Singer, J. B. Willett, "Modeling the days of our lives: using survival analysis when designing and analyzing