Applied Advanced Analytics
6th IIMA International Conference on Advanced Data Analysis, Business Analytics and Intelligence
Springer Proceedings in Business and Economics
Springer Proceedings in Business and Economics brings the most current research
presented at conferences and workshops to a global readership. The series features
volumes (in electronic and print formats) of selected contributions from conferences
in all areas of economics, business, management, and finance. In addition to an
overall evaluation by the publisher of the topical interest, scientific quality, and
timeliness of each volume, each contribution is refereed to standards comparable
to those of leading journals, resulting in authoritative contributions to the respective
fields. Springer’s production and distribution infrastructure ensures rapid publication
and wide circulation of the latest developments in the most compelling and promising
areas of research today.
The editorial development of volumes may be managed using Springer’s inno-
vative Online Conference Service (OCS), a proven online manuscript management
and review system. This system is designed to ensure an efficient timeline for your
publication, making Springer Proceedings in Business and Economics the premier
series to publish your workshop or conference volume.
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore
Pte Ltd. 2021
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Contributors
Sandip Kumar Pal Cognitive Business and Decision Support, IBM India,
Bengaluru, India
Shreya Piplani Experian Credit Information Company of India, Mumbai, India
Rudra P. Pradhan Indian Institute of Technology Kharagpur, Kharagpur, India
Sagar Raichandani Unbxd, Bengaluru, India
Aditi Singh Data Modeler, Experian Credit Information Company of India,
Mumbai, India
Abir Sinha Deloitte Consulting, Hyderabad, India
Vibhu Srivastava Analytics Centre of Excellence, Max Life Insurance, New Delhi,
India
R. Sujatha PSG Institute of Management, Coimbatore, India
R. P. Suresh Supply Chain and Operations Analytics, Applied Intelligence, Accen-
ture Digital, Gurugram, India
Sanjay Thawakar Analytics Centre of Excellence, Max Life Insurance, New
Delhi, India
B. Uma Maheswari PSG Institute of Management, Coimbatore, India
Shikha Verma Indian Institute of Management Ahmedabad, Ahmedabad, India
Machine Learning for Streaming Data:
Overview, Applications and Challenges
Shikha Verma
This chapter gives a brief overview of machine learning for streaming data. It establishes the need for special algorithms suited to prediction tasks on data streams, explains why conventional batch learning methods are not adequate, and then surveys applications in various business domains.
Section 1 gives a brief overview of machine learning definitions and termi-
nologies with an emphasis on the challenges faced while mining streaming data.
Section 2 gives a panoramic view of the literature on classification and regression for
data streams. Section 3 reviews existing research on drift detection algorithms, in both supervised and unsupervised settings. Section 4 discusses interesting application areas of the algorithms covered in Sects. 2 and 3.
The past decade has witnessed a rapid decline in cost of capturing, storing and
analysing data which has facilitated a surge in interest in ‘machine learning’ by
academics and practitioners alike. Static programming relies on manual effort in
building an exhaustive set of conditions expected to be encountered by the programme
and their associated actions. In contrast, machine learning algorithms minimise human effort by letting the models learn directly from the data. The goal of a machine learning model is to learn from historic data and
make predictions on unseen data. The learning phase is called the training phase,
and the predicting phase is called the testing phase. Generalisability is an important
model attribute that signifies that the model performance on training and test data is
comparable; i.e., the learnings from training data are useful for predictions on unseen
data.
Common machine learning algorithms include support vector machines, neural
networks, and decision trees. Machine learning has been adopted widely in fields like
marketing and sales, logistics, particle research, and autonomous mobility to name
a few, and the business value created by machine learning and artificial intelligence
is projected to reach $3.9 T in 2022 (Columbus 2019).
Overfitting and bias–variance tradeoff, especially in low sample size scenarios, are
concerns in conventional machine learning. Today, however, data sizes are huge; rather, the time and memory requirements for processing these huge volumes of data are the bottleneck (Domingos and Hulten 2000). The prevalence of information
and communication technologies like the Internet of Things, GPS, and wearable
sensors and mobile Internet devices in our everyday lives has heralded the age of
big data—characterised by the 3 V’s: volume, variety, and velocity (Laney 2001). Consequently, the
development of efficient algorithms to mine big data has attracted significant attention
in the literature. Streaming data is a variant of big data with a salient velocity aspect.
Aggarwal (2007) defines data streams as a ‘sequence of data arriving from a source at very high velocity’. Data streams are potentially infinite in nature, placing high demands on disk storage.
Developing algorithms for streaming data is particularly challenging as only a single pass over each incoming observation is allowed and the model has to be ready to predict at any time (Bifet et al. 2009).
Domingos and Hulten (2000) propose an incremental learning algorithm for decision
trees that learns splits on attributes using an information-theoretic criterion (information gain, Gini index, etc.) from small samples only, as it is infeasible to process the entire volume of the stream. They use Hoeffding bounds to ascertain that the splitting choices of trees grown on incremental data and on batch data are asymptotically similar.
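To make the role of the Hoeffding bound concrete, the following minimal Python sketch shows the kind of split decision used in VFDT-style incremental trees. The function names, the default value range R = 1 and the confidence parameter delta are illustrative assumptions, not the authors' implementation.

```python
import math

def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """With probability 1 - delta, the observed mean of n i.i.d. observations
    lies within this epsilon of the true mean (Hoeffding's inequality)."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))

def should_split(gain_best: float, gain_second: float, n: int,
                 value_range: float = 1.0, delta: float = 1e-6) -> bool:
    """Split a leaf as soon as the best attribute's advantage over the
    runner-up exceeds the Hoeffding bound computed from n examples."""
    eps = hoeffding_bound(value_range, delta, n)
    return (gain_best - gain_second) > eps

# Example: after 2000 examples the best attribute's gain is 0.22, the
# runner-up's 0.15; the margin of 0.07 exceeds the bound, so we split.
print(should_split(0.22, 0.15, n=2000))   # -> True
```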
Ensemble machine learning models have often shown superior performance
to monolithic models in predictive tasks as the overall learning process is more
diverse and adaptable to multiple drift properties evolving over time (Khamassi et al.
2018). Extending this result to streaming data, Street and Kim (2001) introduced
an ensemble learning algorithm (SEA) using decision trees. It involves continuous training of new classifiers on incoming data chunks; a new classifier is added to the predictive ensemble only if it improves the overall prediction, and in exchange a poorly performing classifier is dropped from the ensemble. Developing this idea further, Kolter and Maloof (2003) created a weighted ensemble where classifiers are dynamically given weights, or dropped from the ensemble, based on their deviation from the global ensemble prediction.
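The weight-update idea behind such dynamically weighted ensembles can be sketched in a few lines of Python. This is a simplified illustration in the spirit of dynamic weighted majority; the class name, the penalty factor beta and the drop threshold are assumptions for the sketch, not Kolter and Maloof's exact algorithm.

```python
import random

class WeightedEnsemble:
    """Sketch of a dynamically weighted ensemble: experts that disagree with
    the true label are down-weighted and eventually dropped."""

    def __init__(self, experts, beta=0.5, drop_below=0.01):
        self.experts = list(experts)        # each expert: callable x -> label
        self.weights = [1.0] * len(experts)
        self.beta = beta
        self.drop_below = drop_below

    def predict(self, x):
        votes = {}
        for expert, w in zip(self.experts, self.weights):
            label = expert(x)
            votes[label] = votes.get(label, 0.0) + w
        return max(votes, key=votes.get)    # weighted majority vote

    def update(self, x, y_true):
        # Penalise experts whose individual prediction was wrong.
        for i, expert in enumerate(self.experts):
            if expert(x) != y_true:
                self.weights[i] *= self.beta
        # Drop experts whose weight has decayed below the threshold.
        keep = [i for i, w in enumerate(self.weights) if w >= self.drop_below]
        self.experts = [self.experts[i] for i in keep]
        self.weights = [self.weights[i] for i in keep]

# Toy usage with three stub "classifiers" on a binary stream.
experts = [lambda x: 1, lambda x: 0, lambda x: int(x > 0.5)]
ens = WeightedEnsemble(experts)
random.seed(1)
for _ in range(50):
    x = random.random()
    y = int(x > 0.5)
    ens.predict(x)
    ens.update(x, y)
print(len(ens.experts), ens.weights)   # only the useful expert survives
```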
Kourtellis et al. (2016) propose a distributed algorithm for learning decision trees
from massive amounts of data arriving at high velocity which enables parallel compu-
tation and improves scalability on real-life datasets. Interested readers are referred
to Parthasarathy et al. (2007) for a detailed review of distributed mining methods for
data streams.
Many times, in classification problems encountered in real life, we witness ‘class
evolution’, i.e. emergence of a new class and change in prior distributions of older
classes. Examples include the emergence of new topics in social media chatter in the natural language processing task of topic classification, and the introduction of a new transport mode in the task of transport mode detection. For detecting concept drift in
such cases, Sun et al. (2016) propose an ensemble approach where the base learner
for each class is updated with new data to adapt to changing class distributions.
The Hoeffding bounds on classifier performance of very fast decision trees (VFDT)
proposed by Domingos and Hulten (2000) can also be extended to regression prob-
lems. Ikonomovska and Gama (2008) propose fast and incremental regression trees
(FIRT) using the Chernoff bound to determine the necessary sample size for a partic-
ular splitting variable. After obtaining a split, the authors propose training neural networks incrementally in the leaves of the tree. The method is demonstrated to have low time and space complexity, works well with numerical attributes, and is robust to noise. FIRT with explicit drift detection capability based on sequential statistical tests was introduced by Ikonomovska et al. (2009), which only allows for model adaptation when change is detected, saving on global model adaptation costs when incoming data is sufficiently stable. Ikonomovska et al. (2015) propose methods
to form ensembles of Hoeffding trees for regression in evolving data streams.
As discussed in the previous sections, due to the changing nature of data, models trained on an initial data chunk need to attune themselves to the most recent data concepts. Broadly, there are two ways to do that: active and passive. Active drift detection methods explicitly detect drift based on incoming data characteristics, discard old data, and retrain the classifier if and only if a drift is detected, which is a computationally parsimonious way to adapt to concept drift.
Passive methods update model irrespective of drift by retraining on new data as
it arrives or in chunks/windows. Sliding window models, exponential weighting of
older observations, and dropping the weakest learner in constantly evolving ensemble
are passive methods of adapting the model to recent changes in data distribution. As is evident, they are computationally expensive, require constant access to new labels, and are only suitable for small-scale problems. In the following subsections, we will limit the discussion to active drift detection methods only.
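Before moving on, the sketch below illustrates the sliding-window flavour of passive adaptation just described, and why it is costly: the model is fully refit on the most recent window after every arrival. The class and parameter names are illustrative, and scikit-learn's LogisticRegression is used only as a convenient stand-in for any base learner.

```python
from collections import deque
import numpy as np
from sklearn.linear_model import LogisticRegression

class SlidingWindowModel:
    """Passive adaptation sketch: keep only the most recent `window` labelled
    observations and refit a fresh model on them as new data arrives."""

    def __init__(self, window=200):
        self.buffer = deque(maxlen=window)
        self.model = None

    def update(self, x, y):
        self.buffer.append((x, y))
        X = np.array([xi for xi, _ in self.buffer])
        Y = np.array([yi for _, yi in self.buffer])
        if len(set(Y)) > 1:                 # need both classes before fitting
            self.model = LogisticRegression().fit(X, Y)   # full refit each time
    
    def predict(self, x):
        return self.model.predict(np.array([x]))[0] if self.model else 0

# Toy stream: the model is refit after every observation, which is expensive.
rng = np.random.default_rng(0)
swm = SlidingWindowModel(window=100)
for _ in range(300):
    x = rng.normal(size=2)
    y = int(x[0] + x[1] > 0)
    swm.predict(x)
    swm.update(x, y)
```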
Concept drift can occur in multiple ways as real-life data streams are generated
via complex, dynamic processes. Virtual drift refers to a scenario where the predictor distribution p(x) changes but p(y|x) remains unchanged. Real drift refers to a scenario where the conditional posterior p(y|x) changes.
In addition to its nature, concept drift can also be characterised in several other
ways: speed (abrupt/gradual), severity (global/local), and recurrency (cyclic/acyclic).
In real-life scenarios, drift detection problem is particularly challenging as multiple
drift characteristics are evolving simultaneously (Khamassi et al. 2018). Sections 3.1
and 3.2 give an overview of supervised and unsupervised drift detection algorithms.
Supervised drift detection refers to the task of drift detection when labels of all
successive observations in the stream are known and instantaneously available. In
one of the initial research efforts in this area, Page (1954) proposes the cumulative sum (CUSUM) scheme, which uses deviation from the original mean as a measure for drift detection and performs well for univariate sequence data.
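A minimal Python sketch of the one-sided CUSUM idea is given below; the slack and threshold parameters (often denoted k and h) are illustrative values and would need tuning for a real stream.

```python
import random

def cusum_detector(stream, target_mean, slack=0.5, threshold=5.0):
    """One-sided CUSUM: flag indices where the cumulative positive deviation
    from target_mean (minus a slack term) exceeds a decision threshold."""
    g, alarms = 0.0, []
    for i, x in enumerate(stream):
        g = max(0.0, g + (x - target_mean - slack))
        if g > threshold:
            alarms.append(i)
            g = 0.0                      # restart monitoring after an alarm
    return alarms

# Toy stream: mean 0 for the first 50 points, then the mean shifts up to 2.
random.seed(0)
data = [random.gauss(0, 1) for _ in range(50)] + [random.gauss(2, 1) for _ in range(50)]
print(cusum_detector(data, target_mean=0.0))   # typically alarms soon after index 50
```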
Gama et al. (2004) propose a drift detection method (DDM) that monitors the error rate of the learning algorithm over new training data. If the error rate increases on a temporal scale, it indicates a likely change in the underlying distribution of the data. As error computation requires continuous availability of predicted and actual labels, this method is supervised in nature. While DDM exhibits good performance in detecting sudden drifts, its performance in detecting gradual drift was superseded by the early drift detection method (EDDM) proposed by Baena-Garcia et al. (2006).
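The error-rate monitoring idea behind DDM can be sketched as follows. The 2- and 3-standard-deviation warning/drift levels and the 30-observation warm-up follow the common description of the method, but this is a simplified illustration rather than the original implementation.

```python
import math
import random

class DDMSketch:
    """Simplified error-rate drift monitor in the spirit of Gama et al. (2004):
    track the running error rate p and its standard deviation s, remember the
    minimum of p + s, and flag a warning/drift when the current level exceeds
    that minimum by 2 or 3 times s_min."""

    def __init__(self):
        self.n = 0
        self.errors = 0
        self.p_s_min = float("inf")
        self.p_min = self.s_min = float("inf")

    def add_result(self, prediction_was_wrong: bool) -> str:
        self.n += 1
        self.errors += int(prediction_was_wrong)
        p = self.errors / self.n
        s = math.sqrt(p * (1 - p) / self.n)
        if p + s < self.p_s_min:
            self.p_s_min, self.p_min, self.s_min = p + s, p, s
        if self.n > 30 and p + s > self.p_min + 3 * self.s_min:
            return "drift"
        if self.n > 30 and p + s > self.p_min + 2 * self.s_min:
            return "warning"
        return "stable"

# Toy usage: 10% errors for 300 predictions, then 40% errors afterwards.
random.seed(2)
ddm = DDMSketch()
stream = [random.random() < 0.1 for _ in range(300)] + [random.random() < 0.4 for _ in range(300)]
states = [ddm.add_result(err) for err in stream]
print(states.index("drift") if "drift" in states else "no drift signalled")
```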
Windowing has been used extensively in streaming data literature to update the
model as per the latest data concept. However, deciding on the window size a priori is
as challenging as it is important as window size represents a tradeoff between sensi-
tivity and stability. Bifet and Gavalda (2007) propose adaptive windowing (ADWIN)
algorithm which computes summary statistics on data windows and uses them as
inputs to contract the window size when a ‘sufficiently big’ change is detected and
expand the window size when no change is detected.
Bifet et al. (2009) introduce the adaptive size Hoeffding tree (ASHT), a method
that entails building an ensemble of trees of varying sizes to adapt to gradual and
abrupt drifts, with each tree having a weight inversely proportional to its error rate.
The authors extended the idea of incremental learning in Hoeffding trees with math-
ematical guarantees on performance to an ensemble setup which has more diverse
learners and better scalability.
As true labels are available, the supervised drift detection problem closely resembles a typical classification task. Therefore, the evaluation metrics used for supervised drift detection are accuracy, recall, precision, specificity, and sensitivity.
When true labels are not readily available, drift has to be detected from the characteristics of the incoming data itself; readers may refer to Masud et al. (2010), Faria et al. (2013) and Sethi et al. (2016) for such approaches to novelty and concept drift detection.
Clustering methods have been used to detect novelties/drifts not only in sequence
data but also in graph data. Aggarwal and Yu (2005) use clustering techniques on
graph data to detect expanding and contracting communities in graphs based on
changing interaction patterns. The use cases of this algorithm include identifying evolving user subgroups based on phone call records between individuals,
changing lending patterns in P2P microfinance groups, and information diffusion in
social media communities.
The evaluation metrics for an unsupervised drift detection task are false alarms, sensitivity to drift and, additionally for real-time applications, computation time.
4 Applications
This section gives a brief overview of the motivation for, and the present state of research on, machine learning for data streams in various application areas such as recommender systems, transport and mobility, and fraud detection.
(a) Recommender systems
Recommender systems have been used extensively by e-commerce firms to
reduce the information overload on customers due to the presence of millions
of products on their platforms by analysing the user’s past behaviour on the plat-
form and suggesting products they are most likely to find valuable (Bobadilla
et al. 2013). However, user preferences change over time due to evolving market
dynamics and the introduction of new products. Traditional approaches to recommender systems do not incorporate these temporal dynamics and model the item–user relationship in a static way, akin to batch learning. Nasraoui et al. (2007) study the behaviour of collaborative filtering-based recommender systems in a streaming setup with concept evolution and find their performance to be inferior to that in static scenarios. Chang et al. (2017) propose a streaming recommender
system which models user ratings over item space as time dependent and uses
recursive mean-field approximation to estimate the updated user preference.
(b) Mobility and Transportation
The pervasive use of GPS sensors across transport modes has enabled accu-
rate records of vehicle trajectories. This is leveraged by transport authorities
and researchers for short- and long-term demand estimation, real-time traffic
management, vehicle routing, and mode split estimation (Bajwa et al. 2011;
Huang et al. 2019). However, mobility patterns change dynamically due to
multiple factors such as pricing, service quality and quantity, demographic and
socio-economic factors, and land use patterns. Models built in streaming setup
are more suited for prediction and classification tasks in such a context. Laha
and Putatunda (2018) propose regression models and their ensembles in a sliding window setup with an exponential fading strategy to predict the drop-off location based on the location data stream of an ongoing GPS taxi trip. The models are demonstrated to have superior predictive performance with respect to their counterparts in a batch setup, considering the accuracy and computation time tradeoff.
Moreira-Matias et al. (2013) propose streaming data models to predict taxi demand over spatial grids in a city over short horizons. Accurate forecasts of short-term demand help in efficient vehicle routing and better fleet management in order to meet customer demand faster.
Boukhechba et al. (2015) propose association rule mining on GPS trajectories
of individuals to learn their mobility habits and use this knowledge to predict
the next location likely to be visited by them.
(c) Anomaly detection
Anomaly detection involves uncovering unusual and exceptional patterns that deviate from recognised patterns. Email spam filtering, fraud detection in banking and finance, and intrusion detection in networks are examples of anomaly detection tasks (Chandola et al. 2009; Mazhelis and Puuronen 2007).
Anomaly detection models are trained on historical data, but spam and intrusion patterns evolve over time, creating a need for adaptive models. Hayat et al. (2010) propose an adaptive language model for spam filtering with concept drift detection abilities using KL divergence. Parveen et al. (2011) use ensembles of multiple adaptive classification models to identify malicious insider users trying to gain unauthorised access to information, based on historical activity log data.
5 Conclusion
In this chapter, we discussed the challenges faced while extracting intelligence from data streams and gave an overview of key predictive algorithms and drift detection methods present in the literature.
However, there is a need to develop adaptive algorithms suited to the application type, keeping real-life challenges in mind, such as limited and costly access to ground truth labels and the computation time and accuracy tradeoff, especially for real-time decision-making tasks. Also, as hardware becomes increasingly inexpensive and
deployment of machine learning models becomes pervasive in multiple domains, the
main concerns are about using the computing resources judiciously. This has opened
up an exciting new area of research called ‘green machine learning’ which aims to
improve algorithmic efficiency by reducing the computing resources required and
consequently minimise the carbon footprint of data centres and personal machines
running machine learning models (Strubell et al. 2019). To sum it up, as the digital
world generates ever increasing amounts of data, machine learning for streaming
data comes with its own set of challenges but has immense potential for delivering business value.
References
Aggarwal, C. C. (Ed.). (2007). Data streams: Models and algorithms (Vol. 31). Springer Science
& Business Media.
Aggarwal, C. C., & Yu, P. S. (2005, April). Online analysis of community evolution in data streams.
In Proceedings of the 2005 SIAM International Conference on Data Mining (pp. 56–67). Society
for Industrial and Applied Mathematics.
Baena-Garcia, M., del Campo-Ávila, J., Fidalgo, R., Bifet, A., Gavalda, R., & Morales-Bueno, R.
(2006, September). Early drift detection method. In Fourth international workshop on knowledge
discovery from data streams (Vol. 6, pp. 77–86).
Bajwa, R., Rajagopal, R., Varaiya, P., & Kavaler, R. (2011, April). In-pavement wireless
sensor network for vehicle classification. In Proceedings of the 10th ACM/IEEE International
Conference on Information Processing in Sensor Networks (pp. 85–96). IEEE.
Bifet, A., & Gavalda, R. (2007, April). Learning from time-changing data with adaptive windowing.
In Proceedings of the 2007 SIAM international conference on data mining (pp. 443–448). Society
for Industrial and Applied Mathematics.
Bifet, A., Holmes, G., Pfahringer, B., Kirkby, R., & Gavaldà, R. (2009, June). New ensemble
methods for evolving data streams. In Proceedings of the 15th ACM SIGKDD international
conference on Knowledge discovery and data mining (pp. 139–148). ACM.
Bobadilla, J., Ortega, F., Hernando, A., & Gutiérrez, A. (2013). Recommender systems survey.
Knowledge-based systems, 46, 109–132.
Boukhechba, M., Bouzouane, A., Bouchard, B., Gouin-Vallerand, C., & Giroux, S. (2015).
Online prediction of people’s next Point-of-Interest: Concept drift support. In Human Behavior
Understanding (pp. 97–116). Springer, Cham.
Chandola, V., Banerjee, A., & Kumar, V. (2009). Anomaly detection: A survey. ACM Computing
Surveys (CSUR), 41(3), 15.
Chang, S., Zhang, Y., Tang, J., Yin, D., Chang, Y., Hasegawa-Johnson, M. A., & Huang, T. S. (2017,
April). Streaming recommender systems. In Proceedings of the 26th International Conference on
World Wide Web (pp. 381–389). International World Wide Web Conferences Steering Committee.
Columbus, L. (2019, March). Roundup of Machine Learning Forecasts and Market Estimates for
2019. Forbes. Retrieved from https://www.forbes.com/sites/louiscolumbus/2019/03/27/roundup-
of-machine-learning-forecasts-and-market-estimates-2019/#206e54247695
Domingos, P. M. (2012). A few useful things to know about machine learning. Communications of
the ACM, 55(10), 78–87.
Domingos, P., & Hulten, G. (2000, August). Mining high-speed data streams. In Kdd (Vol. 2, p. 4).
Faria, E. R., Gama, J., & Carvalho, A. C. (2013, March). Novelty detection algorithm for data
streams multi-class problems. In Proceedings of the 28th annual ACM symposium on applied
computing (pp. 795–800). ACM.
Gama, J., Medas, P., Castillo, G., & Rodrigues, P. (2004, September). Learning with drift detection.
In Brazilian symposium on artificial intelligence (pp. 286–295). Springer, Berlin, Heidelberg.
Hastie, T., Tibshirani, R., & Friedman, J. H. (2003). The elements of statistical learning: Data mining, inference, and prediction.
Hayat, M. Z., Basiri, J., Seyedhossein, L., & Shakery, A. (2010, December). Content-based
concept drift detection for email spam filtering. In 2010 5th International Symposium on
Telecommunications (pp. 531–536). IEEE.
Huang, H., Cheng, Y., & Weibel, R. (2019). Transport mode detection based on mobile phone
network data: A systematic review. Transportation Research Part C: Emerging Technologies.
Ikonomovska, E., & Gama, J. (2008, October). Learning model trees from data streams. In
International Conference on Discovery Science (pp. 52–63). Springer, Berlin, Heidelberg.
Ikonomovska, E., Gama, J., & Džeroski, S. (2015). Online tree-based ensembles and option trees
for regression on evolving data streams. Neurocomputing, 150, 458–470.
Ikonomovska, E., Gama, J., Sebastião, R., & Gjorgjevik, D. (2009, October). Regression trees from
data streams with drift detection. In International Conference on Discovery Science (pp. 121–
135). Springer, Berlin, Heidelberg.
Khamassi, I., Sayed-Mouchaweh, M., Hammami, M., & Ghédira, K. (2018). Discussion and review
on evolving data streams and concept drift adapting. Evolving Systems, 9(1), 1–23.
Kolter, J. Z., & Maloof, M. A. (2003, November). Dynamic weighted majority: A new ensemble
method for tracking concept drift. In Third IEEE international conference on data mining
(pp. 123–130). IEEE.
Kourtellis, N., Morales, G. D. F., Bifet, A., & Murdopo, A. (2016, December). Vht: Vertical
hoeffding tree. In 2016 IEEE International Conference on Big Data (Big Data) (pp. 915–922).
IEEE.
Laha, A. K., & Putatunda, S. (2018). Real time location prediction with taxi-GPS data streams.
Transportation Research Part C: Emerging Technologies, 92, 298–322.
Laney, D. (2001). 3D data management: Controlling data volume, velocity and variety. META Group
Research Note, 6(70), 1.
Masud, M., Gao, J., Khan, L., Han, J., & Thuraisingham, B. M. (2010). Classification and novel
class detection in concept-drifting data streams under time constraints. IEEE Transactions on
Knowledge and Data Engineering, 23(6), 859–874.
Mazhelis, O., & Puuronen, S. (2007, April). Comparing classifier combining techniques for mobile-
masquerader detection. In The Second International Conference on Availability, Reliability and
Security (ARES’07) (pp. 465–472). IEEE.
Moreira-Matias, L., Gama, J., Ferreira, M., Mendes-Moreira, J., & Damas, L. (2013). Predicting
taxi–passenger demand using streaming data. IEEE Transactions on Intelligent Transportation
Systems, 14(3), 1393–1402.
Nasraoui, O., Cerwinske, J., Rojas, C., & Gonzalez, F. (2007, April). Performance of recommenda-
tion systems in dynamic streaming environments. In Proceedings of the 2007 SIAM International
Conference on Data Mining (pp. 569–574). Society for Industrial and Applied Mathematics.
Page, E. S. (1954). Continuous inspection schemes. Biometrika, 41(1/2), 100–115.
Parthasarathy, S., Ghoting, A., & Otey, M. E. (2007). A survey of distributed mining of data streams.
In Data Streams (pp. 289–307). Springer, Boston, MA.
Parveen, P., Evans, J., Thuraisingham, B., Hamlen, K. W., & Khan, L. (2011, October). Insider threat
detection using stream mining and graph mining. In 2011 IEEE Third International Conference
on Privacy, Security, Risk and Trust and 2011 IEEE Third International Conference on Social
Computing (pp. 1102–1110). IEEE.
Sethi, T. S., Kantardzic, M., & Hu, H. (2016). A grid density based framework for classifying
streaming data in the presence of concept drift. Journal of Intelligent Information Systems, 46(1),
179–211.
Spinosa, E. J., de Leon F de Carvalho, A. P., & Gama, J. (2007, March). Olindda: A cluster-based
approach for detecting novelty and concept drift in data streams. In Proceedings of the 2007 ACM
symposium on Applied computing (pp. 448–452). ACM.
Street, W. N., & Kim, Y. (2001, August). A streaming ensemble algorithm (SEA) for large-scale clas-
sification. In Proceedings of the seventh ACM SIGKDD international conference on Knowledge
discovery and data mining (pp. 377–382). ACM.
Strubell, E., Ganesh, A., & McCallum, A. (2019). Energy and Policy Considerations for Deep
Learning in NLP. arXiv preprint arXiv:1906.02243.
Sun, Y., Tang, K., Minku, L. L., Wang, S., & Yao, X. (2016). Online ensemble learning of data
streams with gradually evolved classes. IEEE Transactions on Knowledge and Data Engineering,
28(6), 1532–1545.
Binary Prediction
A. K. Laha
1 Introduction
Binary prediction is one of the most widely used analytical techniques having far-
reaching applications in multiple domains. In the business context, it is used to predict
which loans are likely to default, which policyholders are likely to discontinue an
insurance policy, which customers are likely to change their service provider, which
customers are likely to buy a newly released book, which transactions are likely to
be fraud, etc. Apart from business applications, the binary prediction problem arises
routinely in medicine, e.g., to determine whether a person has a certain disease or
not (Shilaskar and Ghatol 2013), chemistry (Banerjee and Preissner 2018) and many
other fields. Because of the huge importance of the binary prediction problem, a
number of methods have been developed over the years. The more well-known and
widely used methods are linear discriminant analysis, logistic regression, random
forest, support vector machines and k-nearest neighbors (see James et al. 2013 for
an introduction to these methods).
In this article, we concentrate on the binary prediction task. We discuss the well-known logistic regression predictor and compare its performance with that of a relatively less widely used predictor—the maximum score predictor—using two real-life datasets. The two datasets considered in this paper are both unbalanced, with one class having a significantly larger number of observations than the other class. The maximum
score predictor discussed in this article is based on a modification of the maximum
score estimator introduced in Manski (1975). It is observed that the maximum score
predictor performs better than the logistic regression predictor for these two real-life
datasets.
The article is structured as follows: In Sect. 2, we briefly discuss the logistic
regression from a prediction perspective; in Sect. 3, we discuss the use of the logistic
2 Logistic Regression
The logistic regression model relates a binary response variable Y to predictor variables x_1, . . . , x_k through
$$P(Y = 1 \mid x_1, \ldots, x_k) = \frac{e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k}}{1 + e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k}},$$
where β0, . . . , βk are unknown constants that have to be estimated from the given data.
Let (y_i, x_{1i}, . . . , x_{ki}), i = 1, . . . , n be a random sample of size n from the target population. Then,
$$P(Y_i = y_i) = \left(\frac{e^{\beta_0 + \beta_1 x_{1i} + \cdots + \beta_k x_{ki}}}{1 + e^{\beta_0 + \beta_1 x_{1i} + \cdots + \beta_k x_{ki}}}\right)^{y_i}\left(1 - \frac{e^{\beta_0 + \beta_1 x_{1i} + \cdots + \beta_k x_{ki}}}{1 + e^{\beta_0 + \beta_1 x_{1i} + \cdots + \beta_k x_{ki}}}\right)^{1 - y_i}$$
where y_i = 0 or 1.
The parameters of the logistic regression model are estimated using the maximum likelihood estimation (MLE) method. The likelihood is
$$L(\beta_0, \ldots, \beta_k) = \prod_{i=1}^{n}\left(\frac{e^{\beta_0 + \beta_1 x_{1i} + \cdots + \beta_k x_{ki}}}{1 + e^{\beta_0 + \beta_1 x_{1i} + \cdots + \beta_k x_{ki}}}\right)^{y_i}\left(1 - \frac{e^{\beta_0 + \beta_1 x_{1i} + \cdots + \beta_k x_{ki}}}{1 + e^{\beta_0 + \beta_1 x_{1i} + \cdots + \beta_k x_{ki}}}\right)^{1 - y_i}.$$
The values of β0, . . . , βk for which L(β0, . . . , βk) is maximized are the MLEs, and these are denoted as β̂0, . . . , β̂k. Given a new observation for which the values of the predictor variables are known, say (x*_1, . . . , x*_k), but the value of the response variable Y* is unknown, we can estimate
$$P(Y^* = 1) = \hat p = \frac{e^{\hat\beta_0 + \hat\beta_1 x_1^* + \cdots + \hat\beta_k x_k^*}}{1 + e^{\hat\beta_0 + \hat\beta_1 x_1^* + \cdots + \hat\beta_k x_k^*}}.$$
The delta method (Small 2010) can be used to obtain the approximate standard error of the estimated probability when the sample size n is large. Let β = (β0, β1, . . . , βk)′ and x* = (1, x*_1, . . . , x*_k)′; then in matrix notation we have $\hat p = \frac{e^{\mathbf{x}^{*\prime}\boldsymbol\beta}}{1 + e^{\mathbf{x}^{*\prime}\boldsymbol\beta}}$. An application of the delta method yields the estimated asymptotic standard error of p̂ as $se(\hat p) = \hat p(1 - \hat p)\sqrt{\mathbf{x}^{*\prime}\hat{\mathbf V}(\hat{\boldsymbol\beta})\,\mathbf{x}^{*}}$, where V̂(β̂) is the estimated variance–covariance matrix of the estimated coefficients (β̂0, β̂1, . . . , β̂k). An approximate 95% confidence interval for p̂ can then be obtained as (p̂ − 2 se(p̂), p̂ + 2 se(p̂)).
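As an illustration, the following Python sketch computes p̂, its delta-method standard error and the approximate 95% confidence interval from a hypothetical fitted coefficient vector and covariance matrix; the numerical values are made up for the example.

```python
import numpy as np

def predicted_prob_ci(beta_hat, cov_hat, x_star):
    """Point estimate, delta-method standard error and approximate 95% CI
    for P(Y* = 1) at a new covariate vector x_star = (1, x1*, ..., xk*)."""
    beta_hat, x_star = np.asarray(beta_hat), np.asarray(x_star)
    eta = x_star @ beta_hat                       # linear predictor x*'beta
    p = np.exp(eta) / (1.0 + np.exp(eta))
    se = p * (1 - p) * np.sqrt(x_star @ np.asarray(cov_hat) @ x_star)
    return p, se, (p - 2 * se, p + 2 * se)

# Hypothetical fitted model with one predictor: beta_hat = (-1.0, 0.8).
beta_hat = [-1.0, 0.8]
cov_hat = [[0.04, -0.01], [-0.01, 0.02]]          # illustrative covariance matrix
print(predicted_prob_ci(beta_hat, cov_hat, x_star=[1.0, 2.0]))
```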
The logistic regression model provides an estimate of the probability P(Y* = 1). When a prediction of Y* is desired, this information is converted to an estimate of Y* by use of a threshold c on the magnitude of p̂, i.e., Ŷ* = 1 if p̂ > c and Ŷ* = 0 otherwise. In other words, $\hat Y^* = 1_{\hat p > c}$.
Let m_ij denote the number of observations for which the actual response is i and the predicted response is j, i, j ∈ {0, 1}, and let m = m00 + m01 + m10 + m11 be the total number of observations. The accuracy of the binary predictor is
$$\text{Accuracy} = \frac{m_{00} + m_{11}}{m}$$
and is often expressed as a percentage. While being a good measure in situations where the response is a balanced mix of 0s and 1s, accuracy can be a misleading measure when one class is much more prevalent than the other. The sensitivity is the accuracy of the predictor in predicting response 1, i.e.,
$$\text{Sensitivity} = \frac{m_{11}}{m_{10} + m_{11}},$$
and the specificity is the accuracy of the predictor in predicting response 0, i.e.,
$$\text{Specificity} = \frac{m_{00}}{m_{00} + m_{01}}.$$
These are often expressed as percentages. An effective binary predictor should have both sensitivity and specificity high, desirably close to 100%. However, it is generally not possible to have both specificity and sensitivity close to 100% when dealing with real-life datasets, and therefore, based on the application context, a trade-off between sensitivity and specificity is carried out while choosing the threshold value c. Note that all the performance measures discussed until now are dependent on
the choice of the threshold value c. As the threshold value c is varied in the range
0 ≤ c ≤ 1, we obtain a set of points (Specificity (c), Sensitivity (c)). The receiver
operating characteristic (ROC) curve is a plot of (1 − Specificity(c), Sensitivity(c)), 0 ≤ c ≤ 1. The area under the ROC curve (AUC) is often used as a summary measure
of binary predictor performance with its ideal value being 1.
In practical applications, it is advisable to determine c using a “validation” dataset
that is separate from the training dataset to reduce the chance of overfitting. For this
purpose at the initial stage itself, the given data is divided randomly into three parts,
training, validation and test datasets containing 100α%, 100β% and 100(1 − α −
β)% of the data where 0 < α, β < 1 and 0 < α + β < 1. A popular choice for (α, β)
is (0.7, 0.2). The test data is used to get an idea about the performance of the binary
predictor with new data.
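The following Python sketch illustrates this workflow on synthetic data: a logistic regression model is fit on the training part, the threshold c is chosen on the validation part (here by maximizing the geometric mean of sensitivity and specificity, the G-mean discussed below), and performance is then checked on the test part. The data-generating process, split sizes and grid of candidate thresholds are illustrative assumptions, not the datasets analysed in this article.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
# Synthetic unbalanced data: roughly a quarter of the responses are 1.
X = rng.normal(size=(1000, 4))
y = (X @ np.array([1.0, -0.5, 0.8, 0.0]) + rng.normal(scale=1.5, size=1000) > 1.2).astype(int)

# 70/20/10 split into training, validation and test sets.
X_tr, X_tmp, y_tr, y_tmp = train_test_split(X, y, train_size=0.7, random_state=0)
X_val, X_te, y_val, y_te = train_test_split(X_tmp, y_tmp, train_size=2/3, random_state=0)

model = LogisticRegression().fit(X_tr, y_tr)
p_val = model.predict_proba(X_val)[:, 1]

def g_mean(y_true, y_pred):
    sens = np.mean(y_pred[y_true == 1] == 1)
    spec = np.mean(y_pred[y_true == 0] == 0)
    return np.sqrt(sens * spec)

# Choose the threshold c on the validation set by maximising the G-mean.
grid = np.linspace(0.01, 0.99, 99)
c_best = max(grid, key=lambda c: g_mean(y_val, (p_val > c).astype(int)))

p_te = model.predict_proba(X_te)[:, 1]
print("chosen threshold:", round(float(c_best), 2),
      "test G-mean:", round(float(g_mean(y_te, (p_te > c_best).astype(int))), 3))
```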
It is easy to check that $\hat p > \hat c \iff \mathbf{x}'\hat{\boldsymbol\beta} > \ln\!\left(\frac{\hat c}{1 - \hat c}\right)$. Writing $\tilde{\hat\beta}_0 = \hat\beta_0 - \ln\!\left(\frac{\hat c}{1 - \hat c}\right)$ and $\tilde{\hat{\boldsymbol\beta}} = (\tilde{\hat\beta}_0, \hat\beta_1, \ldots, \hat\beta_k)'$, we can rewrite $\mathbf{x}'\hat{\boldsymbol\beta} > \ln\!\left(\frac{\hat c}{1 - \hat c}\right)$ as $\mathbf{x}'\tilde{\hat{\boldsymbol\beta}} > 0$. This suggests an alternative approach to the binary prediction problem, wherein we consider binary predictors of the form $\hat Y^* = 1_{\mathbf{x}'\boldsymbol\beta > 0}$ and estimate the unknown parameters β by maximizing a “score function.” The score function can be accuracy or can be a function of specificity and sensitivity as discussed below. Manski (1985) suggests maximizing the accuracy on the training data for estimating the parameter β. Since $\mathbf{x}'\boldsymbol\beta > 0 \iff k\mathbf{x}'\boldsymbol\beta > 0$ for any constant k > 0, to ensure the identifiability of β it is restricted to have unit Euclidean norm, i.e., ||β|| = 1. Other “score functions” that may be considered are Youden’s index, which is Sensitivity − (1 − Specificity), and the G-mean, which is the geometric mean of the specificity and sensitivity. Note that a good binary predictor would have Youden’s index and G-mean as high as possible. For both of these measures, the maximum possible value is 1.
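A naive way to compute such a maximum score predictor is to search over unit-norm coefficient vectors directly; the Python sketch below does this by random search with the G-mean as the score function. This is only an illustrative approximation on synthetic data—practical implementations of Manski's estimator rely on more careful optimisation.

```python
import numpy as np

def g_mean(y, yhat):
    sens = np.mean(yhat[y == 1] == 1)
    spec = np.mean(yhat[y == 0] == 0)
    return np.sqrt(sens * spec)

def max_score_predictor(X, y, n_candidates=20000, seed=0):
    """Crude random search for beta with ||beta|| = 1 maximising the G-mean of
    the rule yhat = 1{x'beta > 0}, where x includes a leading constant 1."""
    rng = np.random.default_rng(seed)
    X1 = np.column_stack([np.ones(len(X)), X])     # add intercept column
    best_beta, best_score = None, -1.0
    for _ in range(n_candidates):
        beta = rng.normal(size=X1.shape[1])
        beta /= np.linalg.norm(beta)               # restrict to the unit sphere
        yhat = (X1 @ beta > 0).astype(int)
        score = g_mean(y, yhat)
        if score > best_score:
            best_beta, best_score = beta, score
    return best_beta, best_score

# Toy usage on synthetic data (illustrative only).
rng = np.random.default_rng(3)
X = rng.normal(size=(400, 2))
y = (X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=400) > 0.8).astype(int)
beta, score = max_score_predictor(X, y)
print("training G-mean of the maximum score predictor:", round(float(score), 3))
```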
5 Examples
In this section, we provide two examples based on real-life publicly available datasets.
For Example 1, we analyze the Amazon books dataset from DASL
(https://dasl.datadescription.com/datafile/amazon-books/). We aim to predict
whether a book is paperback (P) or hardcover (H) based on the information about
their list price, height, width and thickness given in the dataset. In Example 2, we
analyze a telecom customer churn dataset
(https://www.kaggle.com/mnassrib/telecom-churn-datasets) provided by Orange.
Here, we aim to predict churn using the predictors: total day minutes, total evening
minutes, total night minutes, total international minutes and number of customer
service calls.
5.1 Example 1
The numbers of “P” and “H” in the given dataset are not balanced, and the ratio
P : H is roughly 3:1 (of the 318 observations in the dataset, 84 books were H and the rest were P). We split the given dataset into three parts: training (70%), validation (20%) and test (10%), containing 223, 64 and 31 observations, respectively. For the purpose of comparison, we use the same
training dataset and test dataset for the logistic regression predictor and maximum
score predictor. The validation dataset is not used when working with the maximum
score predictor. For the logistic regression predictor, the validation data is used to
determine the threshold in two different ways: (i) minimizing the misclassification
error (where misclassification error = m01 + m10) and (ii) maximizing the G-mean.
The sensitivity, specificity and G-mean value on the test data are noted for all the three methods, i.e., logistic regression with threshold chosen by minimizing the misclassification error (LR-Misclass), logistic regression with threshold chosen by maximizing the G-mean (LR-G-mean) and the maximum-score method with the G-mean score (Max-Score). This whole exercise is repeated 100 times, and then the median sensitivity, median specificity and median G-mean of the three methods on the test data are reported (see Table 1).
It can be seen that the Max-Score predictor performs better than both the logistic
regression-based predictors, LR-Misclass and LR-G-mean, in terms of the median
G-mean.
5.2 Example 2
The telecom customer churn data consisted of 3333 observations of which 14.5%
were churners and the rest were not churners. Of the several variables available in
the dataset, we chose only the five variables mentioned earlier for this example. The
same steps as those followed in Example 1 above were followed, and Table 2 gives
the results.
It may be noted that the accuracy of the LR-Misclass predictor is the highest but is
of no use as it always predicts every observation in the test dataset as belonging to the
non-churner class. This is a typical problem when dealing with datasets in which one
class has many more observations compared to the other class. In these situations,
the trivial predictor which assigns every new observation to the majority class has
high accuracy but no business relevance. The use of G-mean alleviates this problem
to some extent. We find that the Max-Score predictor performs slightly better than
the LR-G-mean predictor in terms of the median G-mean.
6 Conclusion
References
Banerjee, P., & Preissner, R. (2018). BitterSweetForest: A random forest based binary classifier to predict bitterness and sweetness of chemical compounds. Frontiers in Chemistry, 6, 93.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning: With applications in R. Springer.
Manski, C. (1975). Maximum score estimation of the stochastic utility model of choice. Journal of
Econometrics, 3(3), 205–228.
Manski, C. F. (1985). Semiparametric analysis of discrete response: Asymptotic properties of the
maximum score estimator. Journal of Econometrics, 27(3), 313–333.
Shilaskar, S., & Ghatol, A. (2013). Feature selection for medical diagnosis: Evaluation for cardiovascular diseases. Expert Systems with Applications, 40(10), 4146–4153.
Small, C. G. (2010). Expansions and asymptotics for statistics. Chapman & Hall/CRC.
Reliability Shock Models: A Brief Excursion
M. Mitra and R. A. Khan
1 Introduction
and that the expected number of shocks required to cause failure is finite, i.e. $\sum_{j=0}^{\infty} \bar{P}_j = B < \infty$. Note that the probability that failure is caused by the k-th shock is given by $p_k = \bar{P}_{k-1} - \bar{P}_k$, $k = 1, 2, \ldots$, and we define $p_0 = 1 - \bar{P}_0$. Then the probability that the device survives beyond time t is given by the survival function
$$\bar{H}(t) = \sum_{k=0}^{\infty} P(N(t) = k)\,\bar{P}_k, \qquad t \ge 0, \tag{1}$$
where N(t) denotes the number of shocks to which the system has been exposed up to
time t. This is the prototype reliability shock model. During the last few decades,
many authors have studied this model extensively in several scenarios where shocks
arrive according to various counting processes. Some of these are listed below:
1. Homogeneous Poisson process (HPP), i.e. a counting process null at the origin with independent and stationary increments where the probability of a shock occurring in (t, t + Δt] is λΔt + o(Δt), while the probability of more than one shock occurring in (t, t + Δt] is o(Δt). Here shocks arrive with a constant intensity λ and the interarrival times V_k are independent, identically distributed exponential random variables.
2. Nonhomogeneous Poisson process (NHPP), i.e., a counting process null at the origin with independent increments where the probability of a shock occurring in (t, t + Δt] is λ(t)Δt + o(Δt), while the probability of more than one shock occurring in (t, t + Δt] is o(Δt). Here λ(t) is the (integrable) intensity function of the process.
3. Stationary pure birth process, i.e., shocks occur according to a Markov process; given that k shocks have arrived in (0, t], the probability of a shock occurring in (t, t + Δt] is λ_k Δt + o(Δt), while the probability of more than one shock occurring in (t, t + Δt] is o(Δt). Here the shocks are independent and are governed by a birth process with intensities λ_k, k = 0, 1, 2, . . .. Note that V_{k+1} is exponentially distributed with mean 1/λ_k for k = 0, 1, 2, . . ..
4. Nonstationary pure birth process, i.e. shocks occur according to a Markov process; given that k shocks have arrived in (0, t], the probability of a shock occurring in (t, t + Δt] is λ_k λ(t)Δt + o(Δt), while the probability of more than one shock occurring in (t, t + Δt] is o(Δt).
5. Renewal process, i.e. the interarrival times between two consecutive shocks are independent and identically distributed random variables.
Before proceeding further, a motivating example would be appropriate. Consider
an insurance company where claims arrive randomly in time. Claims can be inter-
preted as ‘shocks’ and the magnitude Xi of the i-th claim can be interpreted as the
damage caused by the i-th shock. When the total claim Z exceeds a threshold Y
(insurance company’s capital), the company goes bankrupt. This cumulative damage
model is illustrated in Fig. 1. This idea is applicable in various scenarios such as
risk analysis, inventory control, biometry, etc., by simply considering appropriate
analogues. The following table (Table 1) shows certain application areas with cor-
responding interpretations of the concepts of ‘shock’, ‘device failure’ and ‘survival
until time t’.
One may refer to Nakagawa (2007) and the references therein for a more extensive
discussion of the application of shock model theory in diverse areas.
Using prior knowledge of the fact that the survival function H̄ (t) belongs to a
certain class of life distributions, it is feasible to obtain sharp bounds on the sur-
vival function and other parameters of the distribution. Such knowledge produces
proper maintenance strategies in reliability contexts, higher accuracy in estimation
results and functional financial policies in risk analysis. In the huge body of litera-
ture concerning shock model theory, the works of Shanthikumar and Sumita (1983),
Anderson (1987), Yamada (1989), Kochar (1990), Pérez-Ocón and Gámiz-Pérez
(1995), Pellerey (1993), Ebrahimi (1999) deserve special mention.
To the best of our knowledge, there is no review article available in the litera-
ture regarding nonparametric ageing classes arising from shock models. Reliability
shock models are primarily concerned with the following fundamental question: if the sequence $(\bar P_k)_{k=0}^{\infty}$ possesses a certain discrete ageing property, does H̄(t) inherit
the corresponding continuous version of the said ageing property under the trans-
formation (1) in various scenarios (i.e. when N (t) is a HPP, NHPP, stationary or
nonstationary pure birth process, etc.)? Before proceeding further, we introduce the
concept of ageing and notions of ageing classes as these concepts are germane to the
material that follows.
In the theory of reliability, survival or risk analysis, the concept of ageing plays a
prominent role. The term ‘ageing’ in the context of biological or mechanical devices
defines how the residual life of the device is affected by its age in some probabilistic
sense. ‘No ageing’ means that the age of a component has no effect on the distribution
of its residual lifetime. Positive ageing describes the situation where the residual
lifetime tends to decrease, in some probabilistic sense, with increasing age; that is,
the age has an adverse effect on the residual lifetime. Negative ageing describes the
opposite beneficial effect. If the same type of ageing pattern persists throughout the
entire lifespan of a device, it is called monotonic ageing. However, in many practical
situations, the effect of age does not remain the same throughout the entire lifespan.
Typically, the effect of age is initially beneficial where negative ageing takes place (the
so-called burn-in phase). Next comes the ‘useful life phase’ where the ageing profile
remains more or less the same and finally, the effect of age is adverse and the ageing
is positive (the ‘wear-out’ phase). This kind of ageing is called nonmonotonic ageing
and arises naturally in situations like infant mortality, work hardening of mechanical
or electronic machines and lengths of political coalitions or divorce rates, etc.
To model this ageing phenomenon, several nonparametric ageing families of dis-
tributions have been defined in the literature. These classes have been categorized
based on the behaviour of the survival function, failure rate function and the mean
residual life (MRL) functions. The approach is to model lifetime as a nonnegative random variable X. For t ≥ 0, let F(t) := P[X ≤ t] denote the distribution function of X, $\bar F(t) := 1 - F(t)$ the corresponding survival function and f the density of X (when it exists). The failure rate function is then defined as
$$r(t) := \frac{f(t)}{\bar F(t)}, \qquad \text{for } t \ge 0 \text{ satisfying } \bar F(t) > 0.$$
It is easy to see that r(t)Δt gives the approximate probability that a unit will fail in the interval (t, t + Δt], given that the unit is functioning at time t. The
MRL function answers the fundamental question ‘what is the expected remaining
life of a unit of age t?’
Definition 1.1 If F is a life distribution function with finite mean, then the MRL
function of F at age t > 0 is defined as
$$e_F(t) := E(X - t \mid X > t) = \begin{cases} \dfrac{1}{\bar F(t)} \displaystyle\int_t^{\infty} \bar F(u)\, du, & \text{if } \bar F(t) > 0, \\[4pt] 0, & \text{otherwise.} \end{cases} \tag{2}$$
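As a quick numerical illustration of definition (2), the short Python snippet below evaluates the MRL function of a survival function supplied as a callable, using numerical integration from SciPy; the exponential example is included only as a sanity check, since its MRL is constant and equal to the mean.

```python
import numpy as np
from scipy.integrate import quad

def mean_residual_life(survival, t):
    """e_F(t) = (1 / F_bar(t)) * integral_t^infinity F_bar(u) du,
    computed numerically for a survival function given as a callable."""
    sbar_t = survival(t)
    if sbar_t <= 0:
        return 0.0
    integral, _ = quad(survival, t, np.inf)
    return integral / sbar_t

# Sanity check with the exponential distribution (rate 0.5): the MRL equals
# the mean 1/0.5 = 2 at every age t.
exp_surv = lambda u: np.exp(-0.5 * u)
print([round(mean_residual_life(exp_surv, t), 3) for t in (0.0, 1.0, 5.0)])
```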
For modelling of nonmonotonic ageing phenomena, one typically uses the failure
rate and MRL functions. If the failure rate function is used to model nonmonotonic ageing, then one gets the bathtub failure rate (BFR) class (see Glaser 1980), and when the MRL function is employed, we get the increasing then decreasing
mean residual life (IDMRL) and new worse then better than used in expectation
(NWBUE) ageing classes introduced by Guess et al. (1986) and Mitra and Basu
(1994), respectively. It is worth noting that the NWBUE family is a rich class of life
distributions encompassing both the IDMRL class as well as all BFR distributions.
We now recapitulate the definitions of the above-mentioned nonmonotonic ageing
classes.
Definition 1.3 A life distribution F is called a bathtub failure rate (BFR) (upside-
down bathtub failure rate (UBFR)) distribution if there exists a point t0 ≥ 0 such
that − ln F̄(t) is concave (convex) on [0, t0 ) and convex (concave) on [t0 , ∞). The
point t0 is referred to as a change point (or turning point) of the distribution in the
BFR (UBFR) sense.
Definition 1.4 A life distribution F with a finite first moment is called an increas-
ing then decreasing mean residual life (IDMRL) (decreasing then increasing mean
residual life (DIMRL)) distribution if there exists a point τ ≥ 0 such that eF (t) is
increasing (decreasing) on [0, τ ) and decreasing (increasing) on [τ , ∞) where eF (t)
is defined as in (2). The point τ is referred to as a change point (or turning point) of
the distribution in the IDMRL (DIMRL) sense.
Definition 1.5 A life distribution F with a finite first moment is called a new worse then better than used in expectation (NWBUE) (new better then worse than used in expectation (NBWUE)) distribution if there exists a point x0 ≥ 0 such that eF(t) ≥ eF(0) for t ∈ [0, x0) and eF(t) ≤ eF(0) for t ∈ [x0, ∞), where eF(t) is defined in (2).
The point x0 is referred to as a change point (or turning point) of the distribution in
the NWBUE (NBWUE) sense.
The results of Mitra and Basu (1994) and Khan et al. (2021) establish the fol-
lowing interrelationships between the BFR, IDMRL and NWBUE classes of life
distributions:
Several probabilistic and inferential aspects of the above mentioned ageing classes
have received considerable attention in reliability literature. The properties of inter-
est concerning ageing classes mainly involve preservation of class property under
reliability operations like formation of coherent systems, convolutions and mixtures
as well as issues regarding reliability bounds, moment bounds and moment inequal-
ities, maintenance and replacement policies, closure under weak convergence, etc.
For an extensive discussion on the aforementioned properties and their applications,
one may refer to the outstanding books by Barlow and Proschan (1965, 1975), Zacks
(1991), Lai and Xie (2006), Marshall and Olkin (2007) and others. In this article,
we discuss how the above-mentioned nonparametric ageing classes arise from shock
models.
An important aspect of reliability analysis is to find a life distribution that can ade-
quately describe the ageing behaviour of the concerned device. Most of the lifetimes
are continuous in nature and hence several continuous ageing classes have been pro-
posed in the literature. On the other hand, discrete failure data where life is measured
in terms of cycles, completed hours or completed number of shifts, etc., may arise in
several common situations; for example, report on field failure is collected weekly,
monthly and the observations are the number of failures, without specification of the
failure time. The following definitions represent discrete ageing classes.
Definition 2.1 A survival probability P̄_k with support on {0, 1, 2, . . .} and with frequency function p_k = P̄_{k−1} − P̄_k for k = 1, 2, . . ., and p_0 = 1 − P̄_0, is said to have the:
(i) discrete IFR (DFR) property if $\bar P_k/\bar P_{k-1}$ is decreasing (increasing) in k = 1, 2, . . ..
(ii) discrete DMRL (IMRL) property if $\sum_{j=k}^{\infty} \bar P_j\big/\bar P_k$ is decreasing (increasing) in k = 0, 1, 2, . . ..
(iii) discrete IFRA (DFRA) property if $\bar P_k^{1/k}$ is decreasing (increasing) in k = 1, 2, . . ..
(iv) discrete NBU (NWU) property if $\bar P_j \bar P_k \ge (\le)\, \bar P_{j+k}$ for j, k = 0, 1, 2, . . ..
(v) discrete NBUE (NWUE) property if $\bar P_k \sum_{j=0}^{\infty} \bar P_j \ge (\le) \sum_{j=k}^{\infty} \bar P_j$ for k = 0, 1, 2, . . ..
(vi) discrete HNBUE (HNWUE) property if $\sum_{k=j}^{\infty} \bar P_k \le (\ge) \sum_{k=0}^{\infty} \bar P_k \left(1 - \frac{1}{\sum_{k=0}^{\infty} \bar P_k}\right)^{j}$ for j = 0, 1, 2, . . ..
(vii) discrete NBUFR (NWUFR) property if $\bar P_{k+1} \le (\ge)\, \bar P_1 \bar P_k$ for k = 0, 1, 2, . . ..
(viii) discrete NBAFR (NWAFR) property if $\bar P_k \le (\ge)\, \bar P_1^{\,k}$ for k = 0, 1, 2, . . ..
(ix) discrete L (L̄) property if $\sum_{k=0}^{\infty} \bar P_k\, p^k \ge (\le) \dfrac{\sum_{k=0}^{\infty} \bar P_k}{p + (1 - p)\sum_{k=0}^{\infty} \bar P_k}$ for 0 ≤ p ≤ 1.
These definitions can be found in Esary et al. (1973), Klefsjö (1981, 1983),
Abouammoh and Ahmed (1988).
In the context of nonmonotonic ageing, the first results in this direction were
obtained by Mitra and Basu (1996). They introduced the notions of discrete BFR and
NWBUE distributions. In a similar vein, Anis (2012) provided the discrete version
of the IDMRL ageing class. The discrete versions of these classes can formally be
defined as follows:
Definition 2.2 A sequence $(\bar P_k)_{k=0}^{\infty}$ is said to possess the discrete BFR property if there exists a positive integer k0 such that the following holds: We say that k0 is the change point of the sequence $(\bar P_k)$ (in the BFR sense) and write $(\bar P_k)_{k=0}^{\infty}$ is BFR(k0).
Definition 2.3 A sequence $(\bar P_k)_{k=0}^{\infty}$ is said to possess the discrete IDMRL (DIMRL) property if there exists an integer k0 ≥ 0 such that
$$\frac{1}{\bar P_k}\sum_{j=k}^{\infty}\bar P_j \quad \begin{cases} \text{is increasing (decreasing) in } k, & k < k_0,\\ \text{is decreasing (increasing) in } k, & k \ge k_0.\end{cases}$$
The point k0 will be referred to as the change point of the sequence $(\bar P_k)_{k=0}^{\infty}$ (in the IDMRL (DIMRL) sense) and we shall write $(\bar P_k)_{k=0}^{\infty}$ is IDMRL(k0) (DIMRL(k0)).
Definition 2.4 A sequence $(\bar P_k)_{k=0}^{\infty}$ with $\sum_{j=0}^{\infty}\bar P_j < \infty$ is said to be a discrete NWBUE (NBWUE) sequence if there exists an integer k0 ≥ 0 such that
$$\bar P_k \sum_{j=0}^{\infty}\bar P_j - \sum_{j=k}^{\infty}\bar P_j \quad \begin{cases}\le (\ge)\; 0 & \forall\, k < k_0,\\ \ge (\le)\; 0 & \forall\, k \ge k_0.\end{cases}$$
The point k0 will be referred to as the change point of the sequence $(\bar P_k)_{k=0}^{\infty}$ (in the NWBUE (NBWUE) sense) and we shall write $(\bar P_k)_{k=0}^{\infty}$ is NWBUE(k0) (NBWUE(k0)).
Reliability shock models are primarily concerned with the following fundamental
question: if the sequence $(\bar P_k)_{k=0}^{\infty}$ possesses a certain discrete ageing property (such
as IFR, IFRA, NBU, NBUE, etc.), does H̄ (t), defined via the transformation (1),
inherit the corresponding continuous version of the said ageing property in various
scenarios (i.e. when N (t) is a HPP, NHPP, stationary, or nonstationary pure birth
process, etc.)?
One of the simplest models is the homogeneous Poisson shock model studied by
Esary et al. (1973) in which the shocks occur according to a homogeneous Poisson
process with constant intensity λ > 0, i.e.,
$$P(N(t) = k) = e^{-\lambda t}\frac{(\lambda t)^k}{k!}, \qquad k = 0, 1, \ldots.$$
The Poisson shock model has several applications in risk, survival, or reliability
analysis, according as shocks arriving at the system are interpreted as the claims,
the cause of deterioration of health, or the technical reasons for failure in a car or
other machines during functioning. Therefore, it is interesting to study conditions
on the discrete survival probabilities which give rise to the ageing classes under
consideration.
Suppose that the device has probability P¯k of surviving the first k shocks, k =
0, 1, . . .. Then, the survival function of the device is given by
$$\bar H(t) = \sum_{k=0}^{\infty} e^{-\lambda t}\frac{(\lambda t)^k}{k!}\,\bar P_k. \tag{3}$$
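For a concrete feel of the transformation in (3), the short Python sketch below evaluates H̄(t) by truncating the Poisson series. The choice P̄_k = 0.9^k (independent survival of each shock with probability 0.9) and the rate λ = 2 are illustrative assumptions, chosen because the resulting survival function has the closed form e^{−0.1λt} against which the truncation can be checked.

```python
import math

def shock_survival(t, lam, p_bar):
    """H_bar(t) = sum_k P(N(t) = k) * P_bar[k] for a homogeneous Poisson N(t);
    the series is truncated after len(p_bar) terms."""
    total, poisson_pk = 0.0, math.exp(-lam * t)   # starts at P(N(t) = 0)
    for k, pbar_k in enumerate(p_bar):
        total += poisson_pk * pbar_k
        poisson_pk *= lam * t / (k + 1)           # P(N = k + 1) from P(N = k)
    return total

# Example: P_bar[k] = 0.9**k and lam = 2, so H_bar(t) = exp(-0.1 * lam * t);
# the value at t = 1 should therefore be close to exp(-0.2) ~= 0.819.
p_bar = [0.9 ** k for k in range(200)]
print(shock_survival(t=1.0, lam=2.0, p_bar=p_bar))
```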
Esary et al. (1973) established that H̄ (t) inherits the IFR, IFRA, NBU, NBUE
and DMRL properties if the sequence $(\bar P_k)_{k=0}^{\infty}$ possesses the corresponding discrete
properties. Analogous results for the HNBUE and L classes were proved in Klefsjö
(1981, 1983), respectively. Later, Abouammoh et al. (1988) established such results
for the NBUFR and NBAFR classes of life distributions.
The key tools used to derive most of the above results are the notion of total positivity and, in particular, the variation diminishing property (VDP) of totally positive (TP) functions. In this context, we first recapitulate the definition of a totally positive (TP) function and an important theorem of Karlin (1968, p. 21).
Definition 3.1 A function K(x, y) defined on A × B, where A, B ⊆ ℝ, is said to be totally positive of order n (TP$_n$) if, for every r with 1 ≤ r ≤ n, $\det\bigl(K(x_i, y_j)\bigr)_{i,j=1}^{r} \ge 0$ whenever x1 < x2 < · · · < xr (xi ∈ A, i = 1, 2, . . . , r) and y1 < y2 < · · · < yr (yj ∈ B, j = 1, 2, . . . , r).
Definition 3.2 A function which is TPn for every n ≥ 1 is called totally positive.
Theorem 3.2 (Variation diminishing property) Let K(x, y) be TP$_r$ on A × B, let f be a function on A with at most r − 1 changes of sign, and let $g(y) = \int_A K(x, y) f(x)\, d\mu(x)$ for a sigma-finite measure μ, where S(·) denotes the number of changes of sign of a function. Then,
(i) S(g) ≤ S(f) provided S(f) ≤ r − 1.
(ii) if S(g) = S(f), then f and g exhibit the same sequence of signs in the same direction.
To prove Theorem 3.1 in the context of IFR, IFRA, DMRL and their duals, we use the VDP of the totally positive kernel $K(r, t) = e^{-\lambda t}\frac{(\lambda t)^r}{r!}$, $r \in \{0, 1, \ldots\}$, $t \in (0, \infty)$.
Here, we will present the proof of Theorem 3.1 when C represents the IFRA class
of life distributions. Analogously, using suitable modifications, one can prove this
theorem for IFR and DMRL classes and their respective duals.
To prove the main result, we first need the following proposition.
Proposition 3.1 If for each λ > 0, F̄(t) − e−λt has at most one change of sign from
+ to −, then F is IFRA.
Proof Suppose 0 < t1 < t2. We want to show that $-\frac{\ln \bar F(t_1)}{t_1} \le -\frac{\ln \bar F(t_2)}{t_2}$. Define $g(x) = -\ln \bar F(x)$ and $\lambda_i = g(t_i)/t_i$, i = 1, 2. Now, $\bar F(t) - e^{-\lambda t}$ has at most one change of sign from + to − for each λ > 0. So this happens for λ1 and λ2 in particular. Thus, $g(x) - \lambda_i x$ has at most one change of sign from − to +. By choice of λi, i = 1, 2, we have
$$g(x)\ \begin{cases}\le \lambda_i x & \text{for } x \le t_i,\\ \ge \lambda_i x & \text{for } x > t_i.\end{cases}$$
If possible, let $\lambda_1 = g(t_1)/t_1 > g(t_2)/t_2 = \lambda_2$. Take t1 < x < t2. Then
$$g(x)\ \ge\ \lambda_1 x \ \ (\text{as } x > t_1)\ \ >\ \lambda_2 x \ \ (\text{as } \lambda_1 > \lambda_2)\ \ \ge\ g(x) \ \ (\text{as } x < t_2),$$
which is a contradiction. Hence λ1 ≤ λ2 and F is IFRA.
Then, by the variation diminishing property (Theorem 3.2), H̄ (t) − e−(1−η)λt has at
most one change of sign from + to −. Take any θ > 0. Clearly, there exists η ∈ [0, 1]
and some λ > 0 such that θ = (1 − η)λ. Since θ is an arbitrary positive number, we
can conclude that H is IFRA by virtue of Proposition 3.1.
Now we will present the proof of Theorem 3.1 when C represents the NBU
class of life distributions. One can prove (with some modifications) the theorem for
the remaining above-mentioned monotonic ageing classes (see Esary et al. (1973),
Klefsjö (1981, 1983), Abouammoh et al. (1988) among others).
$$e^{-\lambda(t+x)}\sum_{k=0}^{\infty}\frac{\bar P_k}{k!}\sum_{j=0}^{k}\binom{k}{j}(\lambda t)^{k-j}(\lambda x)^{j}\ =\ e^{-\lambda(t+x)}\sum_{k=0}^{\infty}\frac{\bar P_k}{k!}(\lambda x+\lambda t)^{k}\ =\ e^{-\lambda(t+x)}\sum_{k=0}^{\infty}\bar P_k\,\frac{(\lambda(t+x))^{k}}{k!}\ =\ \bar H(t+x)$$
holds. If
(i) k_0 ≤ 3, then H̄ is BFR;
(ii) k_0 > 3 and $b_r:=\sum_{j=0}^{r}(\bar P_{j+2}\bar P_{r-j}-\bar P_{j+1}\bar P_{r-j+1})/\{j!\,(r-j)!\}$ has the same sign
for all r, k_0 − 1 ≤ r ≤ 2k_0 − 4, then H̄ is BFR.
For details of proofs, one can see the first paper in this direction by Mitra and
Basu (1996). To prove the corresponding result for IDMRL (DIMRL) distributions,
the following lemma due to Belzunce et al. (2007) is crucial.
$$\int_{t}^{\infty}\bar H(u)\,du=\int_{t}^{\infty}e^{-\lambda u}\sum_{k=0}^{\infty}\frac{(\lambda u)^{k}}{k!}\bar P_k\,du=\sum_{k=0}^{\infty}\frac{\bar P_k\,\lambda^{k+1}}{\lambda\,\Gamma(k+1)}\int_{t}^{\infty}e^{-\lambda u}u^{k}\,du=\sum_{k=0}^{\infty}\frac{\bar P_k}{\lambda}\sum_{i=0}^{k}\frac{(\lambda t)^{i}}{i!}e^{-\lambda t}=\frac{e^{-\lambda t}}{\lambda}\sum_{i=0}^{\infty}\frac{(\lambda t)^{i}}{i!}\sum_{k=i}^{\infty}\bar P_k.$$
Now
$$e_H(t)=\frac{\dfrac{e^{-\lambda t}}{\lambda}\sum_{i=0}^{\infty}\dfrac{(\lambda t)^{i}}{i!}\sum_{k=i}^{\infty}\bar P_k}{\sum_{k=0}^{\infty}e^{-\lambda t}\dfrac{(\lambda t)^{k}}{k!}\bar P_k}=\frac{1}{\lambda}\,\frac{\sum_{k=0}^{\infty}\dfrac{(\lambda t)^{k}}{k!}\sum_{j=k}^{\infty}\bar P_j}{\sum_{k=0}^{\infty}\dfrac{(\lambda t)^{k}}{k!}\bar P_k}=\frac{1}{\lambda}\,\delta(t)$$
where
$$\delta(t)=\frac{\sum_{k=0}^{\infty}\dfrac{(\lambda t)^{k}}{k!}\sum_{j=k}^{\infty}\bar P_j}{\sum_{k=0}^{\infty}\dfrac{(\lambda t)^{k}}{k!}\bar P_k}.$$
In order to prove the theorem, we simply need to show that there exists a t_0 ≥ 0 for which
$$\delta(t)\ \begin{cases}\text{is increasing in } t & \text{for } t<t_0,\\ \text{is decreasing in } t & \text{for } t\ge t_0.\end{cases}\qquad(4)$$
Now define $\alpha_k=\sum_{j=k}^{\infty}\bar P_j$ and $\beta_k=\bar P_k$ in δ(t). Then,
$$\frac{\alpha_k}{\beta_k}=\frac{1}{\bar P_k}\sum_{j=k}^{\infty}\bar P_j\ \begin{cases}\text{is increasing in } k & \text{for } k<k_0,\\ \text{is decreasing in } k & \text{for } k\ge k_0,\end{cases}\qquad(5)$$
as $(\bar P_k)_{k=0}^{\infty}$ has the discrete IDMRL property. Putting s = λt, an application of Lemma 3.5 yields that
$$\delta(t)=\frac{\sum_{k=0}^{\infty}\dfrac{(\lambda t)^{k}}{k!}\sum_{j=k}^{\infty}\bar P_j}{\sum_{k=0}^{\infty}\dfrac{(\lambda t)^{k}}{k!}\bar P_k}=\frac{\varphi(s)}{\psi(s)}$$
is increasing (decreasing) in s for s < s_0 and decreasing (increasing) in s for s ≥ s_0.
Thus H is IDMRL.
This shock model was first studied by A-Hameed and Proschan (1973). They
extended the work of Esary et al. (1973) to the context of nonhomogeneous Poisson
processes. In this model, shocks arrive according to a nonhomogeneous Poisson pro-
cess with intensity function Λ(t) and event rate λ(t) = dΛ(t)/dt, both defined on
the domain [0, ∞); we take λ(0) as the right-hand derivative of Λ(t) at t = 0. In this
case, the survival function H̄*(t) can be expressed as
$$\bar H^{*}(t)=\sum_{k=0}^{\infty}e^{-\Lambda(t)}\,\frac{[\Lambda(t)]^{k}}{k!}\,\bar P_k,\qquad(6)$$
with density
$$h^{*}(t)=\sum_{k=0}^{\infty}e^{-\Lambda(t)}\,\lambda(t)\,\frac{[\Lambda(t)]^{k}}{k!}\,p_{k+1}.$$
With suitable assumptions on Λ(t), A-Hameed and Proschan (1973) established that
if $(\bar P_k)_{k=0}^{\infty}$ has the discrete IFR, IFRA, NBU, NBUE or DMRL property, then the sur-
vival function H̄*(t) has the corresponding continuous property. Klefsjö (1981, 1983)
established the same result for the HNBUE and L classes of life distributions. Later,
analogous theorems for the NBUFR and NBAFR families were proved by Abouammoh
et al. (1988). The following theorem collects all the above-mentioned results.
Theorem 3.7 Suppose that
$$\bar H^{*}(t)=\sum_{k=0}^{\infty}e^{-\Lambda(t)}\,\frac{[\Lambda(t)]^{k}}{k!}\,\bar P_k.$$
(viii) H̄*(t) is NBAFR (NWUFR) if $(\bar P_k)_{k=0}^{\infty}$ belongs to the discrete NBUFR (NWUFR)
class and Λ(0) = 0 and Λ(t) > tλ(0) (Λ(t) < tλ(0)).
(ix) H̄*(t) is L (L̄) if $(\bar P_k)_{k=0}^{\infty}$ belongs to the discrete L (L̄) class and Λ(t) is
starshaped (anti-starshaped).
The proofs of the results contained in the above theorem utilize the following lemmas
concerning the composition of functions. Note that a nonnegative function f defined
on [0, ∞) is superadditive (subadditive) if f(x + y) ≥ (≤) f(x) + f(y) for all x ≥
0, y ≥ 0. A nonnegative function g defined on [0, ∞) with g(0) = 0 is said to be
starshaped (anti-starshaped) if g(x)/x is increasing (decreasing) on (0, ∞).
Lemma 3.8 Let u(t) = u1 (u2 (t)) and let u1 be increasing. Then,
(a) ui convex (concave), i = 1, 2, =⇒ u convex (concave).
(b) ui starshaped, i = 1, 2, =⇒ u starshaped.
(c) ui (αx) ≥ αui (x) for all 0 ≤ α ≤ 1 and x ≥ 0, i = 1, 2, =⇒ u(αx) ≥ αu(x) for
all 0 ≤ α ≤ 1 and x ≥ 0.
(d) ui superadditive (subadditive), i = 1, 2, =⇒ u superadditive (subadditive).
Lemma 3.9 (a) Let F be DMRL (IMRL) and g be an increasing and convex (con-
cave) function. Then K(t) = F(g(t)) is DMRL (IMRL).
(b) Let F be NBUE (NWUE) and g be an increasing and starshaped function (g
be an increasing function such that g(αx) ≥ αg(x), 0 ≤ α ≤ 1). Then K(t) =
F(g(t)) is NBUE (NWUE).
Note that H̄*(t) = H̄(Λ(t)), where H̄(t) is defined as in (3). Now, applying Theorem
3.1 and the above lemmas, one can prove Theorem 3.7 after imposing a suitable
condition on Λ(t).
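Computationally, the relation H̄*(t) = H̄(Λ(t)) means the homogeneous-model evaluator can be reused directly once Λ(t) is available. The short Python sketch below is our illustration, not part of the paper; it assumes the shock_survival function from the earlier sketch, and the power-law mean function Λ(t) = (t/θ)^β is an arbitrary choice.

def nhpp_shock_survival(t, p_bar, Lambda, tol=1e-12):
    """H*(t) for the nonhomogeneous Poisson shock model via the time
    transformation t -> Lambda(t): evaluate the homogeneous series at time
    Lambda(t) with unit intensity, since H*(t) = H(Lambda(t))."""
    return shock_survival(Lambda(t), p_bar, lam=1.0, tol=tol)

# Illustrative power-law mean function Lambda(t) = (t / theta) ** beta
Lambda = lambda t, theta=1.5, beta=2.0: (t / theta) ** beta
print(nhpp_shock_survival(1.0, lambda k: 0.9 ** k, Lambda))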
Now consider a pure birth shock model where a system is subjected to shocks gov-
erned by a birth process with intensities λk , k = 0, 1, 2, . . .. Let Vk+1 denote the inter-
arrival time between the k-th and (k + 1)-st shocks. Assume that Vk ’s are independent
with Vk+1 being exponentially distributed with mean 1/λk for k = 0, 1, 2, . . .. Then,
the survival function H̄(t) of the system can be written as
$$\bar H(t)=\sum_{k=0}^{\infty}Z_k(t)\,\bar P_k\qquad(7)$$
with
Zk (t) := P (N (t) = k) ,
where N(t) is the pure birth process defined above. We assume that the intensities
$\{\lambda_k\}_{k=0}^{\infty}$ are such that the probability of infinitely many shocks in (0, t] is 0 (i.e.
$\sum_{k=0}^{\infty}Z_k(t)=1$), which is equivalent to the condition $\sum_{k=0}^{\infty}\lambda_k^{-1}=\infty$; see Feller
(1968, p. 452).
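Since Z_k(t) rarely has a convenient closed form, H̄(t) in (7) is easy to approximate by simulation: generate the exponential inter-arrival times V_{k+1} with means 1/λ_k, count the shocks in (0, t], and average P̄_{N(t)}. The Python sketch below is our own illustration; the linearly increasing intensities λ_k = 2(1 + k) and the geometric P̄_k are arbitrary choices (they satisfy the divergence condition above).

import random

def pure_birth_survival_mc(t, p_bar, lam, n_rep=100_000, seed=0):
    """Monte Carlo estimate of H(t) = sum_k Z_k(t) P_k for the pure birth
    shock model, where Z_k(t) = P(N(t) = k) and V_{k+1} ~ Exp(lam(k))."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rep):
        elapsed, k = 0.0, 0
        while True:
            elapsed += rng.expovariate(lam(k))   # inter-arrival time V_{k+1}
            if elapsed > t:
                break                            # exactly k shocks occurred in (0, t]
            k += 1
        total += p_bar(k)                        # device survives iff it survives k shocks
    return total / n_rep

print(pure_birth_survival_mc(1.0, lambda k: 0.9 ** k, lambda k: 2.0 * (1 + k)))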
Note that Zk (t) is TP since the intervals between successive shocks in a stationary
pure birth process are independent (nonidentical) exponential random variables (see
Karlin and Proschan (1960), Theorem 3). To develop sufficient conditions for the
preservation of ageing properties, we will use the following lemmas.
Lemma 3.10 (A-Hameed and Proschan 1975) Let Z_k(t) be as defined in (7). Then
$$\text{(a)}\quad\int_{0}^{t}Z_k(u)\,du=\frac{1}{\lambda_k}\sum_{j=k+1}^{\infty}Z_j(t),\qquad\text{(b)}\quad\int_{t}^{\infty}Z_k(u)\,du=\frac{1}{\lambda_k}\sum_{j=0}^{k}Z_j(t).$$
With suitable assumptions on λ_k, A-Hameed and Proschan (1975) established that
if $(\bar P_k)_{k=0}^{\infty}$ has the discrete IFR, IFRA, NBU, NBUE or DMRL property, then the
survival function H̄(t) has the corresponding continuous property. Klefsjö (1981, 1983)
established analogous results for the HNBUE and L classes of life distributions,
respectively. Later, similar results for the NBUFR and NBAFR families were proved
by Abouammoh et al. (1988). The following theorem collects all the above-mentioned
results.
(ix) H̄(t) is L if
$$\sum_{k=0}^{\infty}\bar P_k\,\pi_k(s)\ \ge\ \frac{\alpha_0}{1+s\,\alpha_0}\quad\text{for } s\ge 0,$$
where $\alpha_0=\sum_{j=0}^{\infty}(\bar P_j/\lambda_j)$ and $\pi_k(s)$ is defined by
$$\pi_0(s)=\frac{1}{\lambda_0+s}\quad\text{and}\quad\pi_k(s)=\Bigl(\prod_{j=0}^{k-1}\frac{\lambda_j}{\lambda_j+s}\Bigr)\frac{1}{\lambda_k+s}\quad\text{for } k=1,2,\ldots.$$
Using the variation diminishing property of totally positive (TP) functions and
Lemma 3.10, whenever required, one can prove the above theorem applying argu-
ments analogous to those in the proof of Theorem 3.1.
In this subsection, we treat the more general case in which shocks occur accord-
ing to the following nonstationary pure birth process: shocks occur according to a
Markov process; given that k shocks have occurred in (0, t], the probability of a
shock occurring in (t, t + Δt] is λ_k λ(t)Δt + o(Δt), while the probability of more
than one shock occurring in (t, t + Δt] is o(Δt).
Remark 3.12 Note that in the stationary pure birth process, given that k shocks have
occurred in [0, Λ(t)], the probability of a shock occurring in [Λ(t), Λ(t) + λ(t)Δt]
(where $\Lambda(t)=\int_0^{t}\lambda(x)\,dx$) is of the same form: λ_k λ(t)Δt + o(Δt). It follows
immediately that the pure birth shock model may be obtained from the stationary
pure birth process by the transformation t → Λ(t).
This model was first studied by A-Hameed and Proschan (1975). By making
appropriate assumptions on λk and λ(t), they proved that if the survival probabilities
$(\bar P_k)_{k=0}^{\infty}$ possess the discrete IFR, IFRA, NBU, NBUE properties (or their duals), then
the continuous time survival probability H̄ (t) belongs respectively to IFR, IFRA,
NBU, NBUE (or their dual) classes. Klefsjö (1981, 1983) proved that the same
results remain valid for the HNBUE and L -families as well. Later, Abouammoh
et al. (1988) showed this for NBUFR and NBAFR classes of life distributions.
The survival function H̄*(t) of the system for this shock model can be written as
$$\bar H^{*}(t)=\sum_{k=0}^{\infty}Z_k(\Lambda(t))\,\bar P_k,\qquad(8)$$
where Z_k(t) is as defined in (7) and $\Lambda(t)=\int_{0}^{t}\lambda(u)\,du$.
Theorem 3.13 Suppose that
$$\bar H^{*}(t)=\sum_{k=0}^{\infty}Z_k(\Lambda(t))\,\bar P_k,$$
where $\alpha_0=\sum_{j=0}^{\infty}(\bar P_j/\lambda_j)$ and $\pi_k(s)$ is defined by
$$\pi_0(s)=\frac{1}{\lambda_0+s}\quad\text{and}\quad\pi_k(s)=\Bigl(\prod_{j=0}^{k-1}\frac{\lambda_j}{\lambda_j+s}\Bigr)\frac{1}{\lambda_k+s}\quad\text{for } k=1,2,\ldots.$$
Note that H̄*(t) = H̄(Λ(t)), where H̄(t) is defined as in (7). Now applying The-
orem 3.11 together with Lemmas 3.8 and 3.9, one can prove Theorem 3.13 after
imposing a suitable condition on Λ(t).
In this model, each arriving shock causes a damage of random magnitude to the
device. When the accumulated damage exceeds a certain threshold x, the device fails.
Let Xi be the damage caused to the equipment by the i-th shock, i = 1, 2, . . .. We
assume that X1 , X2 , . . . are independent and identically distributed (iid) with some
common distribution function (df) F. We also assume that shocks arrive according
to a homogeneous Poisson process with intensity λ > 0. In this case, we shall denote
the survival function of the system by H̄_F to indicate its dependence on F.
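To make the cumulative damage mechanism concrete, H̄_F(t) can be approximated by simulating the Poisson shock arrivals in (0, t] together with the iid damages and recording whether their running total stays below the threshold x. This Python sketch is ours; the exponential damage distribution and the numerical values are purely illustrative assumptions.

import random

def cumulative_damage_survival_mc(t, x, lam, damage_sampler, n_rep=100_000, seed=0):
    """Monte Carlo estimate of H_F(t): the probability that the damage
    accumulated from Poisson(lam) shocks in (0, t] does not exceed x."""
    rng = random.Random(seed)
    survived = 0
    for _ in range(n_rep):
        elapsed, damage = 0.0, 0.0
        while True:
            elapsed += rng.expovariate(lam)      # next shock arrival time
            if elapsed > t:
                break                            # no further shocks before t
            damage += damage_sampler(rng)        # X_i, the damage of the i-th shock
            if damage > x:
                break                            # threshold crossed: device fails
        survived += damage <= x
    return survived / n_rep

# Illustration: exponential damages with mean 1, threshold x = 5, lam = 2.
print(cumulative_damage_survival_mc(t=1.0, x=5.0, lam=2.0,
                                    damage_sampler=lambda r: r.expovariate(1.0)))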
Now,
$$\bigl[F^{[k]}(x)\bigr]^{k+1}=F^{[k]}(x)\,\bigl[F^{[k]}(x)\bigr]^{k}=F^{[k]}(x)\Bigl(\int_{0}^{x}F^{[k-1]}(x-y)\,dF(y)\Bigr)^{k}\ \ge\ \Bigl(\int_{0}^{x}\bigl[F^{[k]}(x)\bigr]^{\frac{1}{k}}\bigl[F^{[k]}(x-y)\bigr]^{\frac{k-1}{k}}\,dF(y)\Bigr)^{k},\ \text{by (10)},$$
$$\ge\ \Bigl(\int_{0}^{x}F^{[k]}(x-y)\,dF(y)\Bigr)^{k},\quad\text{as } F^{[k]}(x)\ge F^{[k]}(x-y)\ \text{since } F^{[k]}\ \text{is a df},$$
$$=\ \bigl[F^{[k+1]}(x)\bigr]^{k}.$$
Raising both sides to the power $\frac{1}{k(k+1)}$, we get
$$\bigl[F^{[k]}(x)\bigr]^{\frac{1}{k}}\ \ge\ \bigl[F^{[k+1]}(x)\bigr]^{\frac{1}{k+1}}.$$
As $\bar P_k = F^{[k]}(x)$ in the cumulative damage model, the lemma shows that the condition
"$\bar P_k^{1/k}$ is decreasing in k" is satisfied here irrespective of the choice of F. We thus have
The assumption that Fk (x) is decreasing in k for each x has the interpretation that
shocks are increasingly more effective in causing damage/wear to the device. Here
$$\bar H(t)=\sum_{k=0}^{\infty}e^{-\lambda t}\,\frac{(\lambda t)^{k}}{k!}\,(F_1*F_2*\cdots*F_k)(x).\qquad(13)$$
$$(F_1*F_2)(x)=\int_{0}^{x}F_2(x-y)\,dF_1(y)\quad(\text{as } F_1*F_2=F_2*F_1)\ \le\ F_2(x)\int_{0}^{x}dF_1(y)\quad(\text{as } F_2(x)\ge F_2(x-y))\ \le\ (F_1(x))^{2},\ \text{by (12)}.$$
$$\bigl[(F_1*F_2*\cdots*F_{k-1})(x)\bigr]^{\frac{1}{k-1}}\ \ge\ \bigl[(F_1*F_2*\cdots*F_k)(x)\bigr]^{\frac{1}{k}},\qquad k\ge 2.\qquad(15)$$
Now, arguing as in the proof of the preceding lemma and using (15), the assumption
that F_k(x) is decreasing in k, and the fact that $(F_1*F_2*\cdots*F_k)(x)\ge (F_1*F_2*\cdots*F_k)(x-y)$,
we obtain
$$\bigl[(F_1*F_2*\cdots*F_k)(x)\bigr]^{k+1}\ \ge\ \Bigl(\int_{0}^{x}(F_1*F_2*\cdots*F_k)(x-y)\,dF_{k+1}(y)\Bigr)^{k}\ =\ \bigl[(F_1*F_2*\cdots*F_{k+1})(x)\bigr]^{k}.$$
Raising both sides to the power $\frac{1}{k(k+1)}$, we get
$$\bigl[(F_1*F_2*\cdots*F_k)(x)\bigr]^{\frac{1}{k}}\ \ge\ \bigl[(F_1*F_2*\cdots*F_{k+1})(x)\bigr]^{\frac{1}{k+1}}.\qquad(16)$$
Proof The $\bar P_k$ defined in (14) satisfy the condition "$\bar P_k^{1/k}$ is decreasing in k" by virtue
of the above lemma. The result now follows directly from Theorem 3.1.
This seems to be an appropriate juncture to bring down the curtain on our brief
overview of reliability shock models. We have tried, as far as practicable, to introduce
the notion of shock models and to explain how they can give rise to various ageing
classes (both monotonic and nonmonotonic). If this expository article has interested
the reader sufficiently to pursue the subject of reliability shock models further, the
authors would consider this venture a successful one.
Acknowledgements The authors are indebted to Professor Arnab K. Laha for his constant encour-
agement in making this article a reality and to Mr. Dhrubasish Bhattacharyya for some helpful
discussions during the preparation of this manuscript.
References
A-Hameed, M., & Proschan, F. (1975). Shock models with underlying birth process. Journal of
Applied Probability, 12(1), 18–28.
A-Hameed, M. S., & Proschan, F. (1973). Nonstationary shock models. Stochastic Processes and
their Applications, 1, 383–404.
Abouammoh, A. M., & Ahmed, A. N. (1988). The new better than used failure rate class of life
distributions. Advances in Applied Probability, 20, 237–240.
Abouammoh, A. M., Hendi, M. I., & Ahmed, A. N. (1988). Shock models with NBUFR and NBAFR
survivals. Trabajos De Estadistica, 3(1), 97–113.
Akama, M., & Ishizuka, H. (1995). Reliability analysis of Shinkansen vehicle axle using probabilis-
tic fracture mechanics. JSME international journal. Ser. A, Mechanics and Material Engineering,
38(3), 378–383.
Anderson, K. K. (1987). Limit theorems for general shock models with infinite mean intershock
times. Journal of Applied Probability, 24(2), 449–456.
Anis, M. (2012). On some properties of the IDMRL class of life distributions. Journal of Statistical
Planning and Inference, 142(11), 3047–3055.
Barlow, R. E., & Proschan, F. (1965). Mathematical theory of reliability. New York: Wiley.
Barlow, R. E., & Proschan, F. (1975). Statistical theory of reliability and life testing. New York:
Holt, Rinehart and Winston.
Belzunce, F., Ortega, E.-M., & Ruiz, J. M. (2007). On non-monotonic ageing properties from the
Laplace transform, with actuarial applications. Insurance: Mathematics and Economics, 40(1),
1–14.
Bogdanoff, J. L., & Kozin, F. (1985). Probabilistic models of cumulative damage. New York: Wiley.
Bryson, M. C., & Siddiqui, M. M. (1969). Some criteria for aging. Journal of the American Statistical
Association, 64, 1472–1483.
Campean, I. F., Rosala, G. F., Grove, D. M., & Henshall, E. (2005). Life modelling of a plastic
automotive component. In Proceedings of Annual Reliability and Maintainability Symposium,
2005 (pages 319–325). IEEE.
Dasgupta, A., & Pecht, M. (1991). Material failure mechanisms and damage models. IEEE Trans-
actions on Reliability, 40(5), 531–536.
Durham, S., & Padgett, W. (1997). Cumulative damage models for system failure with application
to carbon fibers and composites. Technometrics, 39(1), 34–44.
Ebrahimi, N. (1999). Stochastic properties of a cumulative damage threshold crossing model. Jour-
nal of Applied Probability, 36(3), 720–732.
Esary, J. D., Marshall, A. W., & Proschan, F. (1973). Shock models and wear processes. Annals of
Probability, 1, 627–649.
Feller, W. (1968). An introduction to probability theory and its applications (Vol. 1). New York:
Wiley.
Garbatov, Y., & Soares, C. G. (2001). Cost and reliability based strategies for fatigue maintenance
planning of floating structures. Reliability Engineering & System Safety, 73(3), 293–301.
Gertsbakh, I., & Kordonskiy, K. B. (2012). Models of failure. Springer Science & Business Media.
Glaser, R. E. (1980). Bathtub and related failure rate characterizations. Journal of the American
Statistical Association, 75(371), 667–672.
Guess, F., Hollander, M., & Proschan, F. (1986). Testing exponentiality versus a trend change in
mean residual life. Annals of Statistics, 14(4), 1388–1398.
Hollander, M., & Proschan, F. (1984). Nonparametric concepts and methods in reliability. In P. R.
Krishnaiah & P. K. Sen (Eds.), Handbook of statistics (Vol. 4, pp. 613–655). Amsterdam: Elsevier
Sciences.
Karlin, S. (1968). Total positivity, (Vol. 1). Stanford University Press.
Karlin, S., & Proschan, F. (1960). Pólya type distributions of convolutions. The Annals of Mathe-
matical Statistics, 721–736.
Khan, R. A., Bhattacharyya, D., & Mitra, M. (2021). On classes of life distributions based on the
mean time to failure function. Journal of Applied Probability, 58(2), in-press.
Klefsjö, B. (1981). HNBUE survival under some shock models. Scandinavian Journal of Statistics,
8, 39–47.
Klefsjö, B. (1983). A useful ageing property based on the Laplace transformation. Journal of Applied
Probability, 20, 615–626.
Kochar, S. C. (1990). On preservation of some partial orderings under shock models. Advances in
Applied Probability, 22(2), 508–509.
Lai, C. D., & Xie, M. (2006). Stochastic ageing and dependence for reliability. New York: Springer.
Loh, W. Y. (1984). A new generalization of the class of NBU distributions. IEEE Transactions on
Reliability, R-33, 419–422.
Lukić, M., & Cremona, C. (2001). Probabilistic optimization of welded joints maintenance versus
fatigue and fracture. Reliability Engineering & System Safety, 72(3), 253–264.
Marshall, A. W., & Olkin, I. (2007). Life distributions. New York: Springer.
Mitra, M., & Basu, S. (1994). On a nonparametric family of life distributions and its dual. Journal
of Statistical Planning and Inference, 39, 385–397.
Mitra, M., & Basu, S. K. (1996). Shock models leading to non-monotonic ageing classes of life
distributions. Journal of Statistical Planning and Inference, 55, 131–138.
Nakagawa, T. (2007). Shock and damage models in reliability theory. Springer Science & Business
Media.
Padgett, W. (1998). A multiplicative damage model for strength of fibrous composite materials.
IEEE Transactions on Reliability, 47(1), 46–52.
Pellerey, F. (1993). Partial orderings under cumulative damage shock models. Advances in Applied
Probability, 25(4), 939–946.
Pérez-Ocón, R., & Gámiz-Pérez, M. L. (1995). On the HNBUE property in a class of correlated
cumulative shock models. Advances in Applied Probability, 27(4), 1186–1188.
Petryna, Y. S., Pfanner, D., Stangenberg, F., & Krätzig, W. B. (2002). Reliability of reinforced
concrete structures under fatigue. Reliability Engineering & System Safety, 77(3), 253–261.
Rolski, T. (1975). Mean residual life. Bulletin of the International Statistical Institute, 46, 266–270.
Satow, T., Teramoto, K., & Nakagawa, T. (2000). Optimal replacement policy for a cumulative
damage model with time deterioration. Mathematical and Computer Modelling, 31(10–12), 313–
319.
Shanthikumar, J. G., & Sumita, U. (1983). General shock models associated with correlated renewal
sequences. Journal of Applied Probability, 20(3), 600–614.
Sobczyk, K., & Spencer, B, Jr. (2012). Random fatigue: From data to theory. Academic Press.
Sobczyk, K., & Trebicki, J. (1989). Modelling of random fatigue by cumulative jump processes.
Engineering Fracture Mechanics, 34(2), 477–493.
Yamada, K. (1989). Limit theorems for jump shock models. Journal of Applied Probability, 26(4),
793–806.
Zacks, S. (1991). Introduction to reliability analysis. New York: Springer.
Explainable Artificial Intelligence Model:
Analysis of Neural Network Parameters
1 Introduction
In recent years, there has been growing interest in extracting patterns from data using
artificial neural network (ANN)-based modelling techniques. The use of these models
in real-life scenarios is becoming a primary focus area across different industries
and data analytics practitioners. It is already established that ANN-based models
provide a flexible framework to build models with increased predictive performance
for large and complex data. Unfortunately, due to the high degree of complexity of
ANN models, the interpretability of the results can be significantly reduced, which is
why they are often called "black boxes" in this community. For example, when a
banking system detects fraud, a robo-advisor recommends securities, or a new account
is opened in compliance with the KYC process, there are no mechanisms in place
which make the results understandable. The risk with this type of complex computing
machine is that customers or bank employees are left with a series of questions after
a consultation or decision which the banks themselves cannot answer: "Why did you
recommend this share?", "Why was this person rejected as a customer?", "How does
the machine classify this transaction as terror financing or money laundering?".
Naturally, industries are focusing more and more on the transparency and understanding
of AI when deploying artificial intelligence and complex learning systems.
Probably, this has opened a new direction of research to develop various approaches
to understand model behaviour and the explainability of the model structure. Recently,
Joel et al. (2018) developed an explainable neural network model based on additive
index models to learn interpretable network connectivity. But this is still not enough
to understand the significance of the features used in the model or whether the model
is well specified.
In this article, we will express the neural network (NN) model as a nonlinear regres-
sion model and use statistical measures to interpret the model parameters and the
model specification based on certain assumptions. We will consider only multilayer
perceptron (MLP) networks, which are a very flexible class of statistical procedures.
We have arranged this article as follows: (a) explain the structure of the MLP as a
feed-forward neural network in terms of a nonlinear regression model, (b) the estimation
of the parameters, (c) properties of the parameters and their asymptotic distribution,
(d) simulation study and conclusion.
In this article, we have considered the MLP structure given in Fig. 1. Each neural
network can be expressed as a function of the explanatory variables X = (x_1, x_2, . . . , x_p)
and the network weights ω = (γ, β, b), where γ are the weights between the input and
hidden layers, β are the weights between the hidden and output layers and b are the
biases of the network. This network has the following functional form
Fig. 1 A multilayer perceptron neural network: MLP network with three layers
$$F(X,\omega)=\sum_{h=1}^{H}\beta_h\,g\Bigl(\sum_{i=1}^{I}\gamma_{hi}x_i+b_h\Bigr)+b_{00}\qquad(1)$$
where the scalars I and H denote the number of input and hidden units of the net-
work and g is a nonlinear transfer function. The transfer function g can be taken to be
either the logistic function or the hyperbolic tangent function; in this paper, we have
considered the logistic transfer function for all calculations. Let us assume that Y is
the dependent variable; we can then write Y in nonlinear regression form as
$$Y=F(X,\omega)+\varepsilon\qquad(2)$$
where ε is i.i.d. normal with E[ε] = 0 and E[εε′] = σ²I. Now, Eq. (2) can
be interpreted as a parametric nonlinear regression of Y on X. So, based on the given
data, we will be able to estimate all the network parameters.
Now the most important questions are what the right architecture of the network would
be, how to identify the number of hidden units in the network and how to measure the
importance of the parameters. The aim is always to identify an optimum network with
a small number of hidden units which approximates the unknown function well (Sarle 1995).
Therefore, it is important to derive a methodology not only to select an appropriate
network but also to explain the network well for a given problem.
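For concreteness, the single-hidden-layer map in Eq. (1) with a logistic transfer function can be written in a few lines of NumPy. This is our own minimal sketch of the functional form, not the authors' implementation; the weight shapes and the random values are illustrative assumptions.

import numpy as np

def logistic(z):
    """Logistic transfer function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(X, gamma, b_hidden, beta, b_out):
    """Eq. (1): F(X, w) = sum_h beta_h * g(sum_i gamma_hi * x_i + b_h) + b_00.

    X        : (n, I) matrix of inputs
    gamma    : (H, I) input-to-hidden weights
    b_hidden : (H,)   hidden-unit biases
    beta     : (H,)   hidden-to-output weights
    b_out    : scalar output bias b_00
    """
    hidden = logistic(X @ gamma.T + b_hidden)   # (n, H) hidden activations
    return hidden @ beta + b_out                # (n,) network outputs

# Tiny illustration with I = 3 inputs and H = 2 hidden units.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
gamma, b_hidden = rng.normal(size=(2, 3)), rng.normal(size=2)
beta, b_out = rng.normal(size=2), 0.1
print(mlp_forward(X, gamma, b_hidden, beta, b_out))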
In the network literature, the available and commonly pursued approaches are
regularization, stopped training and pruning (Reed 1993). In regularization methods,
the network weights are chosen by minimizing the network error (e.g. the sum of
squared errors) together with a penalty term. In stopped training, the training data
set is split into a training and a validation set; the training algorithm is stopped when
the model errors on the validation set begin to grow during training, essentially
stopping the estimation when the model is becoming overparameterized or overfitted.
The resulting estimates may not be sensible, as the growing validation error would
rather be an indication to reduce the network complexity. In the pruning method, the
network parameters are chosen based on their "significant" contribution to the overall
network performance. However, the "significance" is not judged on the basis of any
theoretical construct but is more a measure of a factor of importance.
The main issue with regularization, stopped training and pruning is that they
are highly judgemental in nature, which makes the model building process difficult
to reconstruct. In the transparent neural network (TRANN), we explain the statistical
construction of the parameter estimates and their properties, through which we
establish the statistical importance of the network weights and address the model
misspecification problem. In the next section, we describe the statistical concepts
used to estimate the network parameters and their properties. We have carried out a
simulation study to justify our claim.
where ωi j is the weight from neuron j to neuron i, si is the output, and neti is the
weighted sum of the inputs of neuron i. Once the partial derivatives of each weight
are known, then minimizing the error function can be achieved by performing
$$\check\omega_{t+1}=\check\omega_t-\eta_t\,[-\nabla F(X_t,\check\omega_t)]\,[Y_t-F(X_t,\check\omega_t)],\qquad t=1,2,\ldots,T\qquad(4)$$
Based on the assumptions of the nonlinear regression model (2) and under some
regularity conditions for F, it can be proven (White 1989) that the parameter estimator
ω̂ is consistent and asymptotically normally distributed. White (1989) showed that an
estimator asymptotically equivalent to the backpropagation estimator of Eq. (4), when
η_t is proportional to t^{−1}, is given by
$$\hat\omega_{t+1}=\check\omega_t+\Bigl[\sum_{t=1}^{T}\nabla F(X_t,\check\omega_t)\,\nabla F(X_t,\check\omega_t)'\Bigr]^{-1}\sum_{t=1}^{T}\nabla F(X_t,\check\omega_t)\,[Y_t-F(X_t,\check\omega_t)],\qquad t=1,2,\ldots,T.\qquad(5)$$
In that case, the usual hypothesis tests such as the Wald test or the LM test for nonlinear
models can be applied. Neural networks belong to the class of misspecified models, as
they do not map the unknown function exactly but only approximate it. The application
of standard asymptotic tests is still valid, as the misspecification can be taken care of
through the covariance matrix calculation of the parameters (White 1994). The estimated
parameters ω̂ are normally distributed with mean ω* and covariance matrix (1/T)C. The
parameter vector ω* can be considered as the best projection of the misspecified model
onto the true model, which leads to
$$\sqrt{T}\,(\hat\omega-\omega^{*})\sim N(0,C)\qquad(6)$$
where T denotes the number of observations. As per the theory of misspecified
models (Anders 2002), the covariance matrix can be calculated as
$$\frac{1}{T}C=\frac{1}{T}A^{-1}BA^{-1}.\qquad(7)$$
Hypothesis tests for the significance of the parameters are an essential instrument for
any statistical model. In TRANN, we find and eliminate redundant inputs from the
feed-forward single-layer network through statistical tests of significance. This helps
to understand the network well and to explain the network connections with mathematical
evidence, and thereby provides transparency to the model. The case of irrelevant hidden
units occurs when identical optimal network performance can be achieved with fewer
hidden units. For any regression method, the value of the t-statistic plays an important
role in hypothesis testing, whereas it is overlooked in neural networks. The non-significant
parameters can be removed from the network, and the network can be uniquely defined
(White 1989). This is valid for linear regression as well as for neural networks. Here, we
estimate the t-statistic as
$$t_k=\frac{\hat\omega_k-\omega_{H_0}(k)}{\hat\sigma_k}\qquad(8)$$
where ω_{H_0}(k) denotes the value or the restriction to be tested under the null hypothesis
H_0, and σ̂_k is the estimated standard deviation of the estimated parameter ω̂_k. Further,
we estimate the variance–covariance matrix Ĉ, whose diagonal elements give the
estimated variances σ̂²_k of the ω̂_k, as
$$\hat C=\frac{1}{T}\hat A^{-1}\hat B\hat A^{-1}\qquad(9)$$
with
$$\hat A=\frac{1}{T}\sum_{t=1}^{T}\frac{\partial^{2}SE_t}{\partial\hat\omega\,\partial\hat\omega'}\qquad\text{and}\qquad\hat B=\frac{1}{T}\sum_{t=1}^{T}\hat\varepsilon_t^{\,2}\Bigl(\frac{\partial F(t,\hat\omega)}{\partial\hat\omega}\Bigr)\Bigl(\frac{\partial F(t,\hat\omega)}{\partial\hat\omega}\Bigr)'.\qquad(10)$$
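The quantities in Eqs. (8)-(10) can be computed numerically for a fitted network. The Python sketch below is our own illustration, not the authors' code: it uses finite differences for the derivatives, takes SE_t = 0.5 (y_t − F(x_t, ω))² so that Â and B̂ are on the same scale, and tests the simple null hypothesis ω_{H_0}(k) = 0. The linear "network" in the usage example is a deliberately trivial stand-in for F.

import numpy as np

def num_grad(f, w, eps=1e-5):
    """Central-difference gradient of a scalar function f at the vector w."""
    g = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w); e[i] = eps
        g[i] = (f(w + e) - f(w - e)) / (2 * eps)
    return g

def num_hess(f, w, eps=1e-4):
    """Finite-difference Hessian of a scalar function f at the vector w."""
    p = w.size
    H = np.zeros((p, p))
    for i in range(p):
        e = np.zeros_like(w); e[i] = eps
        H[:, i] = (num_grad(f, w + e, eps) - num_grad(f, w - e, eps)) / (2 * eps)
    return (H + H.T) / 2   # symmetrize to reduce numerical noise

def sandwich_tstats(F, w_hat, X, y):
    """t-statistics for the fitted weights of a scalar-output network F(x, w):
    A_hat = mean Hessian of SE_t = 0.5*(y_t - F)^2,
    B_hat = mean of eps_t^2 * grad F grad F',
    C_hat = (1/T) A_hat^{-1} B_hat A_hat^{-1},  t_k = w_k / sqrt(C_kk)."""
    T = len(y)
    resid = y - np.array([F(x, w_hat) for x in X])
    mean_se = lambda w: 0.5 * np.mean((y - np.array([F(x, w) for x in X])) ** 2)
    A = num_hess(mean_se, w_hat)
    grads = np.array([num_grad(lambda w, x=x: F(x, w), w_hat) for x in X])
    B = (grads * (resid ** 2)[:, None]).T @ grads / T
    A_inv = np.linalg.inv(A)
    C = A_inv @ B @ A_inv / T
    return w_hat / np.sqrt(np.diag(C))

# Usage example with a linear F(x, w) = w'x and simulated data, evaluated at
# the data-generating weights for simplicity.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, 0.0, -2.0])
y = X @ w_true + rng.normal(scale=0.5, size=200)
print(sandwich_tstats(lambda x, w: float(w @ x), w_true, X, y))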
5 Simulation Study
6 Conclusion
Neural networks constitute a very flexible class of models that make only weak assumptions
about the structural form of the unknown function F. In this paper, we have used nonlinear
regression techniques to explain the network through statistical analysis. The statistical
procedures usable for model building in neural networks include significance tests of the
parameters, through which an optimal network architecture can be established. In our
opinion, the transparent neural network is a major requirement for performing a diagnosis
of the neural network architecture which not only approximates the unknown function but
also explains the network features well through the assumptions of statistical nonlinear
modelling.
As a next step, we would like to investigate more on the deep neural networks based
on the similar concepts.
Acknowledgements We use this opportunity to express our gratitude to everyone who supported
us in this work. We are thankful for their intellectual guidance, invaluable constructive criticism
and friendly advice during this project work. We are sincerely grateful to them for sharing their
truthful and illuminating views on a number of issues related to the project. We express our warm
thanks to our colleagues Koushik Khan and Sachin Verma for their support to write code in Python
and R. We would also like to thank Prof. Debasis Kundu from IIT Kanpur who provided the valuable
references and suggestions for this work.
References
Anders, U. (2002). Statistical model building for neural networks.
Joel, V., et al. (2018). Explainable neural networks based on additive index model. arXiv.
Reed, R. (1993). Pruning algorithms—A survey. IEEE Transactions on Neural Networks, 4, 740–
747.
Rumelhart, D. E., et al. (1986). A direct adaptive method for faster backpropagation learning-the
rprop algorithm. Parallel distributed Processing.
Sarle, W. S. (1995). Stopped training and other remedies for overfitting. In Proceedings of the 27th
Symposium on the Interface.
White, H. (1994). Estimation, inference and specification analysis. Cambridge University Press.
White, H. (1989). Learning in neural networks: A statistical perspective. Neural Computation, 1,
425–464.
Style Scanner—Personalized Visual
Search and Recommendations
1 Introduction
Fashion is driven more by visual appeal than by any other factor. We all buy clothes
that we think ‘looks good’ on us. Today, a large portion of e-commerce is driven by
fashion and home décor which includes apparels, bags, footwear, curtains, sofas, etc.
Any search engine that does not have the capability to ‘see’ will not work in a way
that humans do. The need for visual search becomes the next phase of evolution of
e-commerce search engines.
Present e-commerce search engines fall short in this regard. They use metadata to look
for similarity between clothes, which does not capture the finer details of the apparel
and its design. We see several recommendation engines that use collaborative filtering
to recommend apparel and other products. But collaborative filtering cannot
understand the nuances of visual design as perceived by humans.
To understand the importance of visual search, let us understand the typical stages
of buying fashion products. Whether one buys offline or online, human behaviour
tends to remain the same. On a digital platform, it gets restricted by the features of
the platform. When one decides to buy clothes, they have a category such as shirt,
tee or dress in mind. But details of design and features are rarely pre-decided.
For shopping offline or online, the first stage is exploration. Several designs and
patterns are explored before narrowing down the search to few designs. The second
stage begins at this point when one investigates their chosen designs in greater detail.
At this second stage, given the exploratory nature of people, there is a tendency to look
for clothes that are similar to the chosen one. In offline stores, this is evinced from
customers asking the sales person for variations of the chosen design—clothes with a
different colour, a slight variation in design, same pattern with different borders, etc.
The transition from first to second stage is bidirectional, meaning, after looking for
similar clothes the person may again go back to exploration stage. And the process
continues.
On e-commerce platforms, for stage one, retrieval algorithm should focus on
keeping high variance. This means the first search should show a variety of clothes,
not similar ones. One can also add taste, body size, etc., to this algorithm. For second
stage, the search algorithm should retrieve choices that look similar to the ones chosen
in the first stage. This helps in faster decision-making. Through this, the customer
already has a selection of similar items to choose from and therefore can focus on
the final selection.
2 Related Work
Many models have been developed in the past to capture the notion of similar-
ity. Image features are extracted using traditional computer vision algorithms like
SIFT Lowe (1999) and HOG Dalal and Triggs (2005), and on top of them image
similarity models are learned. These have been studied in Boureau et al. (2010),
Chechik et al. (2010), Taylor et al. (2011). Traditional computer vision (CV) tech-
niques have limited expressive power and hence have not worked very well. With
recent developments in neural network, deep CNNs have been used with great suc-
cess for object recognition Krizhevsky et al. (2012), Simonyan et al. (2014), Szegedy
et al. (2015), detection, etc. They have been able to take CV to the next level where
they have become effective enough to be applied to real-life problems with great
results. In CNNs, successive layers learn to represent the image with increasing level
of abstraction. The final layer contains abstract descriptor vector. CNN is robust to
illumination variation, in-image location variation, occlusion, etc. Deep CNNs have
been made successful with residual networks He et al. (2016) and have been able to
truly learn image features.
In real-life applications, we may not be interested in the similarity between cats
and dogs but we are interested in similarity between two dogs. Similarly, for fashion
search, we are interested in similarities in various aspects of products such as print
designs. To be able to recognize an object, the final layer must understand the image
at the highest level of abstraction (cat, dog, horse). But the notion of similarity lies
at the lower levels.
Therefore, similarity models require lower-level CNN feature abstractions as output
too. Learning finer details along with abstract features has been studied by Wang
et al. (2014). We will make use of this architecture for improved learning.
Siamese network has also been used with contrastive loss by Chopra et al. (2005),
for similarity assessment. In this architecture, there are two CNNs with shared
weights with binary output. This helps to find similarity between two clothes but
does not capture fine-grained similarity. These networks have proved to be good for
certain face verification tasks but not for design similarity.
Wang et al. (2016) propose a deep Siamese network with a modified contrastive
loss and a multitask network fine-tuning scheme. Their model being a Siamese net-
work also suffers from the same limitations discussed above.
Image similarity using triplet networks has been studied in Wang et al. (2014),
Lai et al. (2015). The pairwise ranking paradigm used here is essential for learning
fine-grained similarity. Further, the ground truth is more reliable as it is easier to
label relative similarity than absolute similarity.
3 Our Approach
We built this tool as an enterprise solution. We soon realized the same solution would
not work with all enterprises. Each enterprise will have different data. For example,
one may have metatag level info for each apparel but others might not. It is a difficult
task to generate such a huge amount of data manually. Hence, building different
models became necessary.
Our core approach has been to learn embeddings which capture the notion of
similarity. We achieved this using CNNs that learn visual features, with various types of
output architecture and loss function depending upon the type and level of data available.
We built three models. The first one was built with the least amount of data: we had only
category-level information about the clothes (shirt, shorts, tees, tops, etc.). Using this, we
designed a classification model with a softmax loss and, after training, used the second-last
layer to obtain embeddings. The idea was that the softmax loss would cluster clothes of the
same category and, within each cluster, the intra-class distribution would be in accordance
with similarity.
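One way to realize this first model is to train a standard CNN classifier with a softmax (cross-entropy) loss and then read off the penultimate-layer activations as the embedding. The PyTorch sketch below is only an illustration of that pattern; the ResNet-50 backbone, the number of categories and the input size are our assumptions, not details given in the paper.

import torch
import torch.nn as nn
from torchvision import models

num_categories = 12                              # hypothetical number of apparel categories
backbone = models.resnet50(weights=None)
embed_dim = backbone.fc.in_features              # 2048 for ResNet-50
backbone.fc = nn.Linear(embed_dim, num_categories)

# ... train backbone with nn.CrossEntropyLoss() on (image, category) pairs ...

# After training, drop the classification head and use the pooled
# penultimate activations as the similarity embedding.
embedder = nn.Sequential(*list(backbone.children())[:-1])

def embed(images: torch.Tensor) -> torch.Tensor:
    """images: (n, 3, 224, 224) batch -> (n, embed_dim) embedding vectors."""
    with torch.no_grad():
        return embedder(images).flatten(start_dim=1)

print(embed(torch.randn(4, 3, 224, 224)).shape)   # torch.Size([4, 2048])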
The second model we built used metadata tags for each item. We modelled
this as a tag detection problem (multi-class and multi-label) with a sigmoid loss and
used the embeddings from the second-last layer to find similarity. This model performed
much better than the previous one, not because of the loss function, but because it
had more data for supervised learning.
The third one was modelled on Wang et al. (2014). In this deep CNN
architecture, two more shallow branches are added to capture fine-grained simi-
larity. For the deep branch we tried Inception and VGG16 (Fig. 1) and found that
VGG16 performed better than Inception. The training is based on the triplet approach
with a ranking loss. This significantly improved the model performance, as the ranking
loss objective is the same as ranking clothes based on similarity. Creating the dataset for
the triplet loss is complicated because of the subjective nature of similarity. However, it
is very simple in the case of learning embeddings for a face recognition system (as two
photographs are either of the same person or of different persons, i.e. there is no subjectivity).
Fig. 1 Multi-scale network architecture. Each image goes through one deep and two shallow
branches. The number shown on the top of an arrow is the size of the output image or feature. The
number shown on the bottom of a box is the size of the kernels for the corresponding layer
For the triplet loss dataset, each training data element consists of 3 images
<a,p,n>, an anchor image <a>, a positive image <p>, a negative image <n>.
The pair <a,p> is expected to be more visually similar than <a,n>. So, the
dataset needs to be labelled for relative similarity rather than absolute similarity. Although
this is not simple, it is doable.
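The pairwise ranking objective on such triplets is typically a hinge on embedding distances: the anchor-positive distance should be smaller than the anchor-negative distance by a margin. The PyTorch sketch below shows one common form of this triplet ranking loss; the margin value and the squared Euclidean distance are our assumptions rather than the exact choices of Wang et al. (2014).

import torch
import torch.nn.functional as F

def triplet_ranking_loss(f_a, f_p, f_n, margin=0.2):
    """Hinge ranking loss on a batch of triplets <a, p, n>.

    f_a, f_p, f_n: (batch, dim) embeddings of anchor, positive and negative
    images. The loss pushes d(a, p) + margin <= d(a, n)."""
    f_a, f_p, f_n = (F.normalize(x, dim=1) for x in (f_a, f_p, f_n))
    d_ap = (f_a - f_p).pow(2).sum(dim=1)
    d_an = (f_a - f_n).pow(2).sum(dim=1)
    return torch.clamp(d_ap - d_an + margin, min=0).mean()

# Illustration with random 128-dimensional embeddings.
print(triplet_ranking_loss(torch.randn(8, 128), torch.randn(8, 128), torch.randn(8, 128)))

PyTorch also provides a built-in nn.TripletMarginLoss that could be used in place of the hand-written version above.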
We use two types of triplets, in-class and out-of-class triplets. The out-of-class
triplet has easy negatives which helps to learn coarse-grained differences. This dataset
is easy to make. The in-class triplet has hard negatives, meaning visually on broad
levels they look like the anchor image but on finer level are not that similar (different
stripe widths, etc.). This helps in learning fine-grained distinctions, and the model
becomes sensitive to difference in pattern and colour. In-class triplet is harder to
create, and its quality affects the performance of the final model.
For our first model, the classification model, dataset is comprised of catalogue images
taken in professional studio lighting conditions. The labels corresponded to their
category. For the second model, the multi-class, multi-label model, dataset consisted
of catalogue images with metatags (like stripe, blue, short, etc.). Each image had
several tags defining its visual characteristics. This has been manually labelled. We
chose this model as we had data available. But if this metadata is not available, this
model cannot be used.
For the triplet model, as described earlier, data creation is not straightforward,
and one cannot do it manually for millions of images (metadata creation has been the
work of many years). We adopted an algorithmic way of labelling data for relative
similarity. For this, we created multiple basic similarity models (BMSs) that focus on
various aspects of similarity. Our first model served as a BMS that focusses on coarser
details. Others were ColorHist, an LAB colour histogram of the image foreground, and
PatternNet, a model trained to recognize patterns (stripes, checks, etc.), for which we
used metadata. We also used our second model as one of the BMSs. These BMSs were
built using metadata information, and the type of BMS used depended on the
information available.
To form a triplet, an anchor image a is selected, and a positive image is randomly
selected from a set of 200 positive images. This set is formed by the following
process: each BMS identifies the 500 nearest neighbours of the anchor image a, and the
top 200 from the union of all BMSs are taken as the set of positive images. For the in-class
negative image, the union of all BMS images ranked between 500 and 1000 is taken as the
sample set, and from this an image is randomly selected as the in-class negative. The
out-of-class negative sample space is the rest of the universe within the category group.
The final set contained 20% in-class negatives and 80% out-of-class negatives. For this
exercise, these numbers were chosen arbitrarily, but further fine-tuning can be done to
find the optimal ratio.
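The sampling procedure just described can be expressed directly in code. The Python sketch below is our paraphrase of that procedure with hypothetical inputs: bms_rankings maps each BMS to an ordered list of neighbour IDs for the anchor (most similar first), and catalogue is the set of all item IDs in the anchor's category group. In practice the union of positives would be re-ranked before truncating to 200; here it is truncated naively for brevity.

import random

def sample_triplet(anchor, bms_rankings, catalogue, p_in_class=0.2, rng=random):
    """Build one <anchor, positive, negative> triplet from BMS neighbour lists."""
    # Positives: top 200 of the union of each BMS's 500 nearest neighbours.
    positive_pool = list({i for r in bms_rankings.values() for i in r[:500]})[:200]
    # In-class (hard) negatives: items some BMS ranks between 500 and 1000.
    in_class_pool = list({i for r in bms_rankings.values() for i in r[500:1000]})
    # Out-of-class (easy) negatives: the rest of the category group.
    out_class_pool = list(catalogue - set(positive_pool) - set(in_class_pool) - {anchor})

    positive = rng.choice(positive_pool)
    # 20% in-class negatives, 80% out-of-class negatives.
    if rng.random() < p_in_class and in_class_pool:
        negative = rng.choice(in_class_pool)
    else:
        negative = rng.choice(out_class_pool)
    return anchor, positive, negative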
5 Implementation Details
7 Production Pipeline
We productionized the whole system to cater to end user needs. There were trade-offs
involved between cost and quality. Our present components include:
Embedding calculation service: We created a CPU-based service with an elastic load
balancer to calculate the embedding vector, taken from the second-last layer of our CNN,
for newly added items. A GPU is not used for this as it is not cost effective; the GPU is used
only for training. These embeddings were saved in a distributed file system (Azure).
Nearest neighbour search: We used the open-source Annoy library for finding nearest
neighbours. We did not apply the locality-sensitive hashing (LSH) approximate nearest
neighbour technique, as we found it significantly decreases model performance. We
kept our embedding vector dimension at 128. Although higher-dimensional embeddings
were giving better results, they drastically increased the execution time of the nearest
neighbour search.
We calculated nearest neighbour for new items added and updated that for the
past ones. We also needed to update it whenever an item was deleted.
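For reference, the usual Annoy pattern behind such a component is to build an index over the embedding vectors once and then query it per item. The snippet below is a generic illustration with made-up data; the tree count, the angular metric and the file name are our choices, not necessarily those used in production.

import numpy as np
from annoy import AnnoyIndex

DIM = 128                               # embedding dimension, as used above
index = AnnoyIndex(DIM, "angular")      # angular distance ~ cosine similarity

# Assume embeddings is an (n_items, 128) array produced by the embedding service.
embeddings = np.random.rand(1000, DIM).astype("float32")
for item_id, vec in enumerate(embeddings):
    index.add_item(item_id, vec)

index.build(50)                         # 50 trees: more trees -> better recall, larger index
index.save("style_index.ann")           # persisted alongside the embeddings

# Retrieve the 20 most visually similar items for item 42.
print(index.get_nns_by_item(42, 20))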
Reduction in search space: Since we had metadata (shirt, shorts), (male, female),
etc., for each image, we effectively used this to reduce our search space for faster
execution.
8 Future Developments
In one of our latest experiments, we have observed that training the model with
removed background enhances model performance. This is one of the areas we are
working on for further improvement. The background removal must be automatic and
accurate. GrabCut Rother et al. (2004), an image segmentation method, can be used
for background removal semiautomatically. By using data created by this technique,
we can train one more CNN model to automatically remove the background.
9 Conclusion
References
Boureau, Y.-L., Bach, F., LeCun, Y., & Ponce, J. (2010). Learning mid-level features for recognition.
In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp.
2559–2566). IEEE.
Chechik, G., Sharma, V., Shalit, U., & Bengio, S. (2010). Large scale online learning of image
similarity through ranking. Journal of Machine Learning Research, 11(3).
Chopra, S., Hadsell, R., & LeCun, Y. (2005). Learning a similarity metric discriminatively, with appli-
cation to face verification. In 2005 IEEE Computer Society Conference on Computer Vision and
Pattern Recognition (CVPR'05) (Vol. 1, pp. 539–546). IEEE.
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In Computer
Vision and Pattern Recognition. CVPR 2005 (pp. 886–893). IEEE Computer Society Conference
on 1.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770–
778).
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional
neural networks. In Advances in neural information processing systems (pp. 1097–1105).
Lai, H., Pan, Y., Liu, Y., & Yan, S. (2015). Simultaneous feature learning and hash coding with
deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition (pp. 3270–3278).
Lowe, D. G. (1999). Object recognition from local scale-invariant features. In Proceedings of the
International Conference on Computer Vision (ICCV), Corfu.
Rother, C., Kolmogorov, V., & Blake, A. (2004). "GrabCut": Interactive foreground extraction
using iterated graph cuts. ACM Transactions on Graphics (TOG), 23(3), 309–314.
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image
recognition. arXiv preprint arXiv:1409.1556.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., et al. (2015). Going
deeper with convolutions. In IEEE Conference on Computer Vision and Pattern Recognition
(CVPR) (pp. 1–9). IEEE, New Jersey.
Taylor, G. W., Spiro, I., Bregler, C., & Fergus, R. (2011). Learning invariance through imitation. In
CVPR 2011 (pp. 2729–2736). IEEE.
Wang, X., Sun, Z., Zhang, W., Zhou, Y., & Jiang, Y.-G. (2016). Matching user photos to online
products with robust deep features. In Proceedings of the 2016 ACM on International Conference
on Multimedia Retrieval (pp. 7–14).
Wang, J., Song, Y., Leung, T., Rosenberg, C., Wang, J., Philbin, J., et al. (2014). Learning fine-grained
image similarity with deep ranking. In Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition (pp. 1386–1393).
Artificial Intelligence-Based Cost
Reduction for Customer Retention
Management in the Indian Life
Insurance Industry
1 Introduction
Max Life Insurance (Max Life) is the largest non-bank private life insurer in India
with total revenue of Rs. 12,501 crore and assets of Rs. 52,237 crore in FY 2017–18.
Max Life Insurance is a joint venture between Max India Ltd. and Mitsui Sumitomo
Insurance Co., Ltd. It offers comprehensive life insurance and retirement solutions
for long-term savings and protection to 3.5 million customers. It has a countrywide
diversified distribution model including the agent advisors, exclusive arrangement
with Axis Bank and several other partners.
1.2 Background
Unlike many other industries, customer retention plays a very critical role for life insurance
companies. On average, life insurers become profitable only after 6–7 renewal
premiums have been paid by the customer. Customer retention efficiency is defined by the
persistency ratio, which is one of the most important metrics for the life insurance
companies. Persistency is measured by the percentage of policies renewed year on
year.
According to the Insurance Regulatory and Development Authority of India
(IRDAI, 2018), in 2015–16 the average persistency rate for life insurance policies in the
13th month after issuance was 61%, indicating that only 61% of policies paid their first
renewal premium as on the
13th month. Globally, persistency is around 90% in the 13th month and over 65%
after 5 years of policy issuance, while the acceptable level of persistency in
life insurance is 80% for policies after 3 years and 60% after 10 years.
With the stringent persistency guidelines issued by IRDAI to protect policy
holder’s interests, retention management has become a key focus area for the life
insurance companies.
According to a survey conducted by data aggregation firm LexisNexis Risk Solu-
tions (Jones 2017), factors including lack of need-based selling, lack of identification
of changing needs of the customer, limited payment reminders and inefficiencies
in retention management contribute to lower than benchmark persistency for life
insurers in India. Life insurance companies are spending top dollars on customer
retention management, and improving persistency along with reduced retention costs
has become a key focus area.
Max Life has a base of nearly 3.5 million active policy holders, comprising freshly
acquired policy holders and existing policy holders. Income from renewal premiums
accounts for 70% of the total revenue, representing the future cash flows, and plays
an important role in company’s profitability and valuation.
For Max Life, the renewal income collection and persistency are managed by the
company’s customer retention operations. Max Life’s customer retention operations
primarily include reaching out to customers through telephonic renewal calls, or other
mediums such as mobile text messages, to remind them to pay renewal premiums
on time. This retention effort is made to maximize both persistency and renewal
premium collection. Figure 1 shows the overview of Max Life business operations
along with key activities of customer retention operations.
As the policy tenure increases, the retention rates keep decreasing and thus signifi-
cant effort is required to retain customers over time. There are many different reasons
contributing to the discontinuation of policies. Within Max Life, these factors are
(Figure: persistency measured at the 13th, 25th, 37th, 49th and 61st months.)
The retention operations focus primarily on the premium payment reminders and
renewal income collections. The retention customer contact team reaches out to
customers, due for renewal payment through calls, SMS, emails and physical chan-
nels. The reminder calls typically start 15 days before the renewal payment due date
and go on till 180 days after the due date. On an average, each policy holder is
given 9–10 reminder calls during the period of 180 days, starting the policy due date.
Efforts are stopped for the cycle, when the customer pays the renewal premium. If
the customer does not pay till 180 days after the due date, the policy is considered to
be lapsed. Figure 3 provides an overview of activities performed for policy renewal
payment collections.
In the absence of any analytical model, every policy holder was reminded for
renewal payments.
2 Source of Data
All the data used in the study is of Max Life Insurance except some macroeco-
nomic data published on Indian government Web sites. Various workshops were
held between the analytics team, customer retention team and the IT team to iden-
tify the relevant variables. A map of customer journey with Max Life was prepared
to identify customer touch points and map the relevant data available with logical
considerations for the data points impacting lapsation rates. Figure 4 lists the different
types of data used for the research.
(Fig. 4, excerpt: Policy—data related to policy type, premiums and issuance TAT details.)
The policy-related variables were considered to understand the journey of the policy
from issuance, nature of preference from the customer in terms of type of policy, its
alignment from the customer demographics and needs. Policy premium-related vari-
ables indicate amount of investment done as compared with the income impacting
commitment for policy continuation. Past due payments and policy-related trans-
actions were considered to understand the commitment toward policy continuation
(Table 1).
Agents and distribution partners play an important role in the quality of sales done,
and quality of sales is an important contributor to lapsation. Max Life has different
types of sales channels including a network of Max Life agents and distribution
partner banks. The seller- and agent-related variables captured information such as
agent or seller demographics, their experience in sales and quality of sales done,
volume of business done for Max Life (Table 3).
Sales branches play a significantly important role in both getting new customers
and maintaining relationships with existing customers. Branch performance metrics
such as branch efficiency and past persistency trends for policies sold from branches
become important indicators in understanding lapse rates. Geography as an important
indicator of quality of life and well-being of customers is a determinant to customer
interest and awareness about the life insurance products. Geography of branch, hence,
was hypothesized to play an important role in determining lapsation (Table 4).
Data related to product consisted of variables such as type of product, for example
if the product was Traditional Life Insurance plan or Unit Linked Insurance Plan
(ULIP) or Online Term plan. Type of plan selected by a customer is an important
indicator of alignment of customer need with the chosen plan. Apart from that,
some plans are very popular and have better market performance. Such differential
performance across plans was expected to be an important driver of lapsation (Table
5).
The service request data consisted of dispositions from past touch points with
the customers. These variables play a very important role in understanding lapse
rates. Intuitively, dissatisfied customers have negative dispositions and are likely to
discontinue the policy and maybe difficult to retain (Table 6).
3 Methodology
The overall methodology involved applying the research to build a solution which
can be used to solve the business problem at hand. A 7-step process was followed
to build a deployable solution. The solution was used to classify renewal policies
on the basis of propensity to lapse. Figure 5 shows the seven steps followed for the
research methodology.
(Fig. 5: define model objective, data model and pipeline, data exploration, data preparation, model building, model validation, solution deployment.)
Fig. 6 Overview of how the solution is expected to solve the business problem
Following the business objective, the research aimed at optimizing the customer
retention costs and improving the renewal collections, through executing a differ-
ential retention calling strategy for renewal policy book. This differential strategy
would be based on classification of policies on propensity to lapse. The research
objective was to build an advanced analytics model to determine the probability of
a policy to lapse within 180 days from the due date of renewal payment. Once the
lapsation probability scores could be obtained, policies would be clubbed to define
3 key segments, high risk, medium risk and low risk of lapsation. From the strategy
execution point of view, high-risk customers would be given more and frequent
reminders to maximize collection. The medium-risk customers can be reached out
by digital means such as SMS and emails, which are less expensive than telephonic
calls. The number of renewal calls for the Low Risk customers can be minimized as
they represent customers who require less intervention to pay the renewal premiums.
Figure 6 provides an illustrative view of end state as envisioned from the output of
the research.
Max Life has a wide landscape of technology systems used for different business
processes across the value chain. The data required for the research resided in 8–10
different systems, and as a first step, a data model and pipeline were required to
be built by collating all the required data at one place. The required datasets were
combined, and first level of processing was done to create a centralized database.
This was done using Informatica and Base SAS (Statistical Analysis System). The
processed dataset was then used for performing the next steps.
Exploratory data analysis was carried out as a first step to prepare the data. Descriptive
analysis was done to identify distributions and isolate missing values and outliers.
A few numerical variables were found to be skewed, because of which variable
transformation was done to reduce the level of skewness.
A number of data processing steps were required, before the data can be used for
building the analytical models. One of the most important considerations was the
form of the data. Since 3.5 years of past renewal data was used, there were multiple
due dates for same policies present in the data. For example, an annual renewal
mode policy will have due dates every year and hence the policy will appear 3 or 4
times in the entire dataset. Hence, policy number, which is a unique identifier for the
policy, could not be used directly as unique identifier. Policy number along with the
due date for renewal payment was used as the unique identifier for each row of the
dataset. Data processing was a 5-step process, as illustrated in Fig. 7.
A few variables, such as client income, were found to have some outliers, and the
outlier treatment was done using flooring and capping techniques.
Since the data was coming directly from the source systems, some variables had
missing values which were required to be cleaned. Variables such as agent education
and client industry, which had more than 20% missing values, were excluded from
the dataset. In some cases, the absence of values could not be considered as a missing
value. One example is nominee demographics. If the policy has no nominee, then
the nominee details were not present for that policy. As such variables cannot be
directly used in the model, these variables were marked using a flag indicating if
a nominee is present for a policy or not, and these flags were used instead. Other
important variables such as client income were found to be missing for 8% of policies, and these were imputed using proc MI with appropriate parameters, which applies different algorithms to impute missing values.
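A hedged pandas sketch of these missing-value steps; the original pipeline used Informatica/Base SAS with proc MI for imputation, so the simple median fill below is only a stand-in, and column names such as nominee_dob are hypothetical.

```python
# Sketch: drop sparse columns, flag structurally missing fields, impute the rest.
import pandas as pd

def handle_missing(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # 1. Exclude variables with more than 20% missing values
    out = out.loc[:, out.isna().mean() <= 0.20]
    # 2. Replace "no nominee" fields with a presence flag instead of a raw value
    if "nominee_dob" in out.columns:
        out["nominee_present_flag"] = out["nominee_dob"].notna().astype(int)
        out = out.drop(columns=["nominee_dob"])
    # 3. Impute remaining numeric gaps (stand-in for multiple imputation)
    num_cols = out.select_dtypes("number").columns
    out[num_cols] = out[num_cols].fillna(out[num_cols].median())
    return out
```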
Apart from the raw variables, a number of derived variables were also calculated
to increase the explainability of lapsation behavior. Although, theoretically, derived variables and variable reduction techniques are not required for deep learning models, some derived variables were created since we also built supervised machine learning models such as logistic regression. Largely, these derived variables consisted of ratios of direct variables. For example, instead of considering the number of policies issued by a branch and the number of policies active from that branch, we considered the ratio of these two variables to arrive at a relative measure that is comparable across branches. There were 20 derived variables prepared to be used in the models.
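An illustrative sketch of such ratio-type derived variables; the column names are assumptions rather than the study's actual field names.

```python
# Sketch: express counts as ratios so that branches of different sizes are comparable.
import pandas as pd

def add_derived_ratios(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    out["branch_active_ratio"] = out["branch_policies_active"] / out["branch_policies_issued"]
    out["premium_to_income_ratio"] = out["annual_premium"] / out["client_income"]
    return out
```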
3.4.5 Normalization
3.4.6 Encoding
The categorical independent variables were converted to binary values before they could be used in the model. This process is called one-hot encoding. While it converts the categorical variables to numeric, it also ensures that the individual effect of each specific category is captured independently, rather than the combined effect of the categorical variable as a whole.
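A minimal sketch of one-hot encoding with pandas; the payment_mode column is a hypothetical example.

```python
# Sketch: expand a categorical column into binary indicator columns.
# drop_first removes the redundant, perfectly collinear reference category.
import pandas as pd

df = pd.DataFrame({"payment_mode": ["annual", "monthly", "quarterly", "annual"]})
encoded = pd.get_dummies(df, columns=["payment_mode"], drop_first=True)
print(encoded.head())
```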
The dependent variable considered was a binary variable indicating whether a policy
lapsed (1) or not (0). A policy is considered lapsed if the customer does not pay
the renewal premium amount till 180 days from the payment due date. There are
different cases of lapse based on the reason for discontinuation of the policy. Figure 9 defines the two cases of lapsation considered for the purpose of building the models:
Pure Lapse: the customer has not paid the renewal premium till 180 days from the due date.
Extended Term Insurance (ETI): the customer did not pay the renewal premium and the policy coverage is extended for a limited period; the policy can be reinstated after payment of all past due premiums.
Together, these two cases constitute the overall lapse.
Effectively for the company, both these types of lapses are considered policy discontinuation because no future renewal income is expected from these policies. Pure lapse and ETI each contribute roughly 50% of overall lapse events.
The customer retention operations focus on both the pure lapse and ETI categories. The key reason for this is that the policy can be reactivated from these two states (pure lapse and ETI), and proactive retention efforts through calls, SMS and other digital channels help control such events.
In order to test the predictive power of the models, only pure lapse events were considered. Pure lapsation is the larger problem, and proactive efforts are expected to make a higher impact (than for ETI) on retaining customers; hence, being able to predict pure lapse policies adds the highest business value.
An iterative three-stage model development process was followed. The first stage
comprised developing models using different methodologies on a representative
sample and dropping the models showing lower predictive capability. In the second stage, the selected models were created on the full dataset. The third stage involved testing the developed models for stability and robustness.
Before the model building exercise, the final dataset was split into 3 parts.
1. Model Building Data: All policies due from January 2015 to December 2017 were studied and further split into two parts.
a. Training Data: A sample comprising 70% of policies is selected at random
to create and train the model.
b. Validation Data (Out of Sample): The remaining 30% of policies is used for
validation.
2. Model Validation (Out of Time): All policies due from January 2018–June 2018*
were used to test the model for robustness and stability.
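A sketch of this split in Python, assuming a pandas data frame with hypothetical due_date and pure_lapse_flag columns (the study itself used SAS-based tooling).

```python
# Sketch: model-building window (2015-2017) split 70/30, plus an out-of-time holdout.
import pandas as pd
from sklearn.model_selection import train_test_split

def split_by_due_date(df: pd.DataFrame):
    df = df.copy()
    df["due_date"] = pd.to_datetime(df["due_date"])
    build = df[(df["due_date"] >= "2015-01-01") & (df["due_date"] <= "2017-12-31")]
    oot = df[(df["due_date"] >= "2018-01-01") & (df["due_date"] <= "2018-06-30")]
    train, valid = train_test_split(build, test_size=0.30, random_state=42,
                                    stratify=build["pure_lapse_flag"])
    return train, valid, oot
```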
At the first stage, a random sample of policies was selected from the model building
dataset and 3 different machine learning classification algorithms were tried along
with various combinations of deep learning neural network models.
3.5.1 Logistic Regression
A binary classification model was built using the classic logistic regression technique to
classify pure lapse events across the renewal policy book sample.
3.5.2 XG Boost
Extreme gradient boosting, commonly known as XG boost, has become a widely used and popular tool among data scientists in industry. A gradient boosting model was built on the same sample dataset with pure lapse as the dependent event.
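A hedged sketch of these two stage-one baselines using scikit-learn and the xgboost package, reusing the train and valid frames from the split sketch above; the features are assumed to be already numeric (one-hot encoded), and the hyperparameters are illustrative rather than the study's.

```python
# Sketch: fit a logistic regression and a gradient boosting classifier on the
# same sample, with the pure-lapse flag as the binary target.
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier

X_train = train.drop(columns=["pure_lapse_flag"])
y_train = train["pure_lapse_flag"]
X_valid = valid.drop(columns=["pure_lapse_flag"])
y_valid = valid["pure_lapse_flag"]

logit = LogisticRegression(max_iter=1000).fit(X_train, y_train)
gbm = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1).fit(X_train, y_train)

lapse_prob_logit = logit.predict_proba(X_valid)[:, 1]
lapse_prob_gbm = gbm.predict_proba(X_valid)[:, 1]
```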
For building a deep learning model to predict pure lapse events, various combinations of feedforward neural network (FNN) models were tried, and the final deep learning model was iteratively crafted using a combination of FNN and long short-term memory (LSTM) neural network architectures.
The LSTM layer was built specifically to use the past interactions of the customers,
such as payment transactions and service request data. The final architecture of the
DL model is shown in Fig. 10.
For the purpose of assessing the predictive power of the models, the cumulative lift or
capture rate was considered to be the most important metric. Cumulative lift shows
the ability of the model to separate out the events and non-events. It is measured by
cumulative sum of percentage of events captured in each decile. It is also referred to
as the capture rate (indicating the events captured in each decile and cumulatively
for 10 deciles).
Capture Rate (or Lift) = (Total events in a decile) / (Total number of events in the dataset)
Cumulative Capture Rate = cumulative sum of events captured (as a percentage), across deciles
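A small pandas sketch of this decile-wise computation, assuming a vector of predicted lapse probabilities and the corresponding actual event flags.

```python
# Sketch: rank policies by predicted risk, cut into deciles and accumulate the
# share of actual lapse events captured from the riskiest decile downwards.
import pandas as pd

def capture_rate_table(y_true, y_score) -> pd.DataFrame:
    scored = pd.DataFrame({"event": list(y_true), "score": list(y_score)})
    # Decile 10 = highest predicted lapse risk, decile 1 = lowest
    scored["decile"] = pd.qcut(scored["score"].rank(method="first"), 10,
                               labels=range(1, 11))
    table = (scored.groupby("decile", observed=True)["event"].sum()
                   .sort_index(ascending=False)
                   .to_frame("events_captured"))
    table["capture_rate_pct"] = 100 * table["events_captured"] / scored["event"].sum()
    table["cumulative_capture_pct"] = table["capture_rate_pct"].cumsum()
    return table
```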
Fig. 10 Final deep learning architecture: policy, customer, agent, branch and macroeconomic data feed a fully connected (FNN) layer with 50 nodes; transaction and service request data feed a basic single-directional, one-to-one LSTM (long short-term memory) layer with 10 cells; the outputs are combined through fully connected (FNN) layers of 30 and 20 nodes, followed by a 2-node output layer
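A hedged tf.keras sketch of an architecture along the lines of Fig. 10: the layer sizes are read from the figure, while the input dimensions, sequence length and training settings are assumptions rather than the authors' implementation.

```python
# Sketch: dense branch for static features, LSTM branch for interaction history,
# merged through fully connected layers into a 2-node softmax output.
from tensorflow.keras import layers, Model

n_static, timesteps, n_seq_features = 61, 12, 8   # assumed shapes

static_in = layers.Input(shape=(n_static,), name="static_features")
static_branch = layers.Dense(50, activation="relu")(static_in)

seq_in = layers.Input(shape=(timesteps, n_seq_features), name="interaction_sequence")
seq_branch = layers.LSTM(10)(seq_in)              # single-directional LSTM, 10 cells

merged = layers.concatenate([static_branch, seq_branch])
merged = layers.Dense(30, activation="relu")(merged)
merged = layers.Dense(20, activation="relu")(merged)
output = layers.Dense(2, activation="softmax")(merged)

model = Model(inputs=[static_in, seq_in], outputs=output)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```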
In the context of the business objective, a higher capture rate in top deciles indi-
cates higher concentration of pure lapsers in the high-risk bucket and higher confi-
dence to concentrate retention efforts on top deciles; at the same time, a low capture
rate in the lower deciles indicates a lower risk of any potential loss of renewal income
while optimizing the retention efforts in the low-risk bucket.
4 Results
For assessing the predictive power of the models, three classification techniques
were tried, with a dataset containing 61 finalized variables, and the capture rate for
each of these model outputs was compared. Table 7 provides a comparison of model
performance across the 3 techniques.
The logistic regression model sharply distinguished the pure lapse events, capturing 84% of events in the top 3 deciles (top 30% of the population). However, for the case at hand, the XG boost algorithm performed poorly as compared with the logistic regression model, probably because this technique is more suited for ensemble model outputs rather than a stand-alone classifier. An overall cumulative capture rate of 75% was achieved in the top 3 deciles, after multiple iterations.
The neural network-based deep learning model turned out to have the highest predictive power. Not only did the model capture 89% of pure lapsers in the top 3 deciles, it also reduced the capture rate to 0% in the bottom 3 deciles. The long short-term memory (LSTM) network effectively captured the interaction behavior of customers based on past premium payments and service call sentiments.
Following the superior performance of the deep learning model, it was built on
the complete dataset with 3 lakh policies (some policies had to be removed from
the dataset during data cleaning and processing). The architecture of the DL model
was maintained as is for the larger dataset, with a train and test split of 70% and
30%, respectively. The train and test model performance indicated a more than 99%
capture rate in the top 3 deciles. Although this seemed to be an overfitted model, the out-of-time sample performance of the model turned out to be 90%, which was seen to be stable, with a 3 percentage point variance in capture rates over the months from January’18 to June’18. Tables 8 and 9 provide the decile-wise capture rates for the final deep learning model.
Table 8 Decile-wise capture rates for train and test dataset (January’15–December’17 due policies)
Decile | Number of policies | Number of events captured | Percentage of events captured (%) | Cumulative capture rate (%) | Risk tag
10 305,765 192,634 84 84 High
9 305,765 34,229 15 99
8 305,765 2378 1 100
7 305,765 11 0 100 Medium
6 305,765 1 0 100
5 305,765 1 0 100
4 305,765 2 0 100
3 305,765 0 0 100 Low
2 305,765 0 0 100
1 305,765 0 0 100
Total 3,057,650 229,256
Table 9 Decile-wise capture rates for out-of-time dataset (January’18–June’18 due policies)
Decile | Number of policies | Number of events captured | Percentage of events captured (%) | Cumulative capture rate (%) | Risk tag
10 202,047 48,587 45 45 High
9 202,047 31,314 30 75
8 202,049 15,911 15 90
7 202,048 10,191 10 99 Medium
6 202,050 983 1 100
5 202,046 16 0 100
4 202,049 6 0 100
3 202,048 0 0 100 Low
2 202,047 4 0 100
1 202,048 2 0 100
Total 2,020,479 107,014
Table 10 Capture rates for deep learning models with different types of lapse events
Model | Model event (1) | Train: top 3 deciles, high risk (%) | Train: bottom 3 deciles, low risk (%) | Test: top 3 deciles, high risk (%) | Test: bottom 3 deciles, low risk (%) | Out of time: top 3 deciles, high risk (%) | Out of time: bottom 3 deciles, low risk (%)
Model 1 | Pure lapse | 99 | 0 | 99 | 0 | 90 | 0
Model 2 | ETI | 88 | 0.15 | 88 | 0 | 84 | 0
Model 3 | Combination of pure lapse and ETI | 81 | 2 | 81 | 2 | 72 | 3
Since proactive efforts for retention play an effective role in the case of both pure lapse and ETI policy discontinuation, the chosen deep learning architecture was also applied to iteratively create models for ETI events and for a combination of both types of lapse events (pure lapse and ETI).
It can be observed that the deep learning model performs quite well for modeling ETI events as well, with capture rates of 87% and 84% in the top 3 deciles for the train/test and out-of-time samples, respectively; however, a combined model with both ETI and pure lapse (the event was defined as either ETI or pure lapse = 1) drops significantly, indicating that there might be some difference between the nature of policies which move to pure lapse as compared with those which move to ETI (Table 10).
5 Conclusion
The deep learning model, with the designed architecture, turned out to be superior in terms of capturing both pure lapsers and ETI events. Both in-time and out-of-time validations were done; the model is robust and stable over time and does not show any significant drop in its ability to predict lapsers.
Although predictor explainability is difficult to derive in a deep learning model, both an information value exercise and a variable dropping exercise were conducted to get a sense of the important predictors of lapsation. A list of important predictor variables was compiled, and Table 11 presents the description of these important variables.
The final results showed that a total of 17 variables were important to predict
lapsers at the time of policy due date. The final model capturing pure lapsers was
selected to be deployed, and appropriate decile-wise risk tags were created to use the
model outputs for the customer retention strategy. As further research, the combined model for pure lapse and ETI is being refined to see if both these types of events can be modeled together using ensembling techniques.
6 Implications
Following the purpose and objective of the research, the outcomes were used to create
an AI/deep learning-based predictive intelligence solution for the customer retention
strategy.
An end-to-end integrated solution was built to classify customers based on risk of
lapsation, using the deep learning model. The risk classification is used to devise a
differential retention calling strategy. The solution also tracks retention efforts with
overall cost and renewal income monitoring.
The solution enables execution of a differential retention strategy based on risk
classification by
• Proactively reaching out to the high-risk (high chance of lapsation) customer segment to educate them on policy benefits, thereby increasing customer retention.
• Reducing the number of renewal calls made to low-risk customers and leveraging other communication modes like SMS and digital channels.
We estimate that, using this solution, Max Life will be able to save 12–15% of customer retention cost without any drop in renewal income. In the long term, we will be able to use the solution for proactive retention efforts in the high-risk segment to increase retention rates and thereby renewal income (with a goal of a 1% improvement in retention rates, adding Rs 100 crore to renewal income).
Apart from the quantitative benefits, the research-based solution has already
achieved a number of qualitative benefits including
• Business integration with cutting-edge data science algorithms.
• New business quality improvement driven by the customer segment characteristics which define risky customers.
• Process and metrics standardization achieved during data preparation from various source systems.
Optimization of Initial Credit Limit
Using Comprehensive Customer Features
With a total of 36.24 million credit cards in operation and a spend of Rs. 41,437 crores in January’18, up from 28.85 million credit cards and a spend of Rs. 32,691 crores in January’17, the credit card market has shown incredible growth in India. During the initial introduction of credit cards in the Indian market, the word credit did not sit well with the Indian mindset, as consumers believed that credit cards would increase their liability and might lead to payment of huge interest if not cleared on time. The tremendous growth of this market in recent times can be attributed to the acceptance of the ‘spend now and pay later’ strategy, which was supported by the ease of digital payments, acceptance in almost every monetary transaction and the e-commerce boom, along with the option to repay in easy instalments.
Traditionally, credit card issuers have relied upon the income and the score of
an applicant to calculate his credit limit. In the current scenario, the average utiliza-
tion on credit cards is only 30%. This highlights that a majority of customers have
a huge unutilized limit, resulting in capital blockage for the institution. Addition-
ally, post-issuance, 23.3% of the applicants do not activate the card. Hence, it is
necessary to devise a dynamic and reliable method to determine credit limit which
focuses on crucial factors like expected spending, odds of activation given the limit,
behaviour on similar accounts, etc. This paper proposes a more granular methodology to determine the credit limit of a customer based on his expenditure potential and his credibility to pay back, assessed from his credit history, payment history and various
demographic variables. The idea is to understand the role of the above factors and
the methodologies adopted to address the problem of limit assignment.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 81
A. K. Laha (ed.), Applied Advanced Analytics, Springer Proceedings in Business
and Economics, https://doi.org/10.1007/978-981-33-6656-5_7
2 Literature Review
Businesses granting credit through credit cards face numerous challenges with the growing demand and varied consumer behaviour. Researchers over time have focused on credit limit increases and decreases after observing payment patterns on the account over a specified time period. Questions on usage of the credit line, payment patterns in terms of revolving, profitability of the customer and how likely the customer is to default in future have been raised and discussed. Various segmen-
tation and prediction techniques have been devised and tested. Bierman and Hausman
(1970) formulated a dynamic programming model in which the decision process
focused on whether to grant credit or not and for what amount. In their formulation,
the amount of credit offered was linked to the probability of non-payment or default.
Haimowitz and Schwarz (1997) developed a framework of optimization based on
clustering and prediction. Expected net present value is calculated at multiple credit
lines combined with the probability of cluster memberships. This paper also high-
lights the future scope in terms of using multiple independent variables for opti-
mization and studying their effects using other techniques such as neural networks.
Research by Hand and Blunt (2001) highlighted the use of data mining techniques
to predict spending patterns in a database of UK credit card transactions, specifically
on the petrol stations. Dey (2010) highlighted the importance of understanding the
action effect models and addressed the prominence of each component of revenue
and risk. The research by Terblanche and Rey (2014) focused on the problem of determining the optimal price to be quoted to the customer, such that the income of the lender is maximized while taking price sensitivity into account. Using probability of default,
loss given default and other factors, an equation describing the net present income
is developed. Budd and Taylor (2015) presented a model to derive profitability from
a credit card assuming that the card holder pays the full outstanding balance. Most of the research has either focused on one constituent of optimal allocation or followed a single approach, that is, either from a pure risk or a pure revenue perspective. Even though the elements described in this paper are discussed in multiple studies, one or more of the approaches, methodologies and techniques are missing from each.
3 Methodology
The methodology to obtain an optimal initial credit limit for each customer is
based on the information submitted during the application and bureau history of
the customer (in case he/she has a previous line of credit). In this context, allocation
and maintenance of pertinent credit limit will maximize revenue and minimize the
risk associated.
Understanding the revenue and risk components, theoretically, a high credit limit
has an advantageous effect of increased expected revenue, but at the same time both
the probability of default and expected exposure at default also increase. Similarly, a
low credit limit decreases expected loss but leads to a decrease in expected revenue.
The overall effect and the appropriate action depend on which of these effects is
stronger (Dey 2010).
Revenue generated from any credit card portfolio is influenced by complex inter-
actions between several factors like probability of activation, probability of attrition,
propensity to revolve and credit limit utilization, while the risk component comprises
the probability of default at the time of acquisition, behavioural probability of default
and exposure at default (Dey 2010).
The behaviour of each effect variable is modelled separately, and their combined
effect defined as risk-adjusted return is studied.
(Figure: Risk-adjusted return framework. Expected return is driven by the probability of activation, propensity to revolve and forecasted utilization; expected risk is driven by the probability of default, exposure at default and loss given default. The two combine to give the risk-adjusted return.)
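As a purely illustrative sketch, one plausible way to combine these modelled components into a risk-adjusted return for a candidate limit is shown below; the paper does not spell out the exact functional form, so the formula and every parameter name here are assumptions.

```python
# Sketch only: risk-adjusted return = expected revenue - expected loss.
def risk_adjusted_return(limit, p_activation, p_revolve, utilization,
                         apr, p_default, ead_factor, lgd):
    expected_balance = limit * utilization
    # Revenue: interest earned on the revolving share of the expected balance
    expected_revenue = p_activation * p_revolve * expected_balance * apr
    # Risk: probability of default x exposure at default x loss given default
    expected_loss = p_default * (limit * ead_factor) * lgd
    return expected_revenue - expected_loss

# Compare two candidate limits for the same (made-up) applicant profile
for limit in (50_000, 150_000):
    print(limit, round(risk_adjusted_return(limit, p_activation=0.8, p_revolve=0.4,
                                            utilization=0.3, apr=0.18,
                                            p_default=0.03, ead_factor=0.7, lgd=0.6)))
```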
– Transactor: A customer who pays the exact balance due each month and hence
does not incur any interest charges
– Accidental Revolver: A customer with deferred payment for less than or equal
to two months
A method of modelling and determining the initial credit limit consistent with the
objective of maximizing revenue and minimizing risk has been discussed. Each
component being predicted can be applied separately to most customer decisions
across the customer life cycle: customer acquisition and customer account manage-
ment. Businesses, based on their requirements, can understand the trade-off between different scenarios, thereby enabling them to determine the best action for each customer. The approach enables an improved customer decision-making process in
terms of:
• Moving away from the traditional methodologies to taking a holistic view of the
customer actions and decisions in terms of his spend and repayment behaviour
• Helping the business to understand optimal capital allocation through credit card
• Helping the business identify homogeneous customer segments and design targeting strategies accordingly
• Using individual component models to understand the characteristics of customer segments, devise varied strategies and schemes to enhance customer experience, and help the business reduce churn and ensure adequate customer loyalty.
References
Bierman, H., & Hausman, W. (1970). The Credit Granting Decision. Management Science, 16(8),
B-519–B-532.
Budd, J., & Taylor, P. (2015). Calculating optimal limits for transacting credit.
Dey, S. (2010). Credit limit management using action-effect models.
Mitigating Agricultural Lending Risk: An Advanced Analytical Approach
1 Introduction
As per the Situation Assessment Survey (SAS) for Agricultural Households by NSSO
70th round, in 2012–13, almost 40% of the agricultural households still relied on
non-institutional sources for their credit needs, an increase of almost 11% over 1990–
91. Moneylenders form a major part, around 26%, of that non-institutional credit.
Even with the rising credit disbursements and loan waivers, we have not been able
to improve the situation of our farmers. Empirical and situational evidence suggests that generalized loan waivers have made a less than marginal contribution toward improving the credit situation of farmers (FE Online 2018). Rather, they create a situation of moral hazard which affects the loan repayment behavior of all farmers. Since 2011–12, the percentage of bad loans from the agriculture sector has climbed every year and the growth rate of loans disbursed to this sector has become close to stagnant. In FY 2018, banks disbursed only an additional 6.37% to this sector, which is the lowest in a decade (Iyer 2019).
Agriculture sector poses risks for the banks in multiple forms. Lending to the agri-
culture sector has been adversely affected in recent times, and it could be indicative
of the deteriorating asset quality (Trends and Progress Report, RBI 2017–18).
Reserve Bank of India (RBI), which is India’s central bank and regulating body,
oversees the functioning of all the banks that operate in the country and has identified
certain priority sectors, of which agriculture is also a part, to ensure necessary credit
flow to these sectors. However, banks, especially private and foreign banks, are not familiar with India’s agricultural landscape and are reluctant to lend in this sector (Jayakumar 2018). Because of inadequate knowledge of the risks pertaining
to these sectors, they refrain from direct lending and instead end up investing in
Rural Infrastructure Development Fund (RIDF) of NABARD or buying Priority
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 87
A. K. Laha (ed.), Applied Advanced Analytics, Springer Proceedings in Business
and Economics, https://doi.org/10.1007/978-981-33-6656-5_8
Sector Lending Certificates (PSLCs) to meet their priority sector lending targets.
This paper aims to shed some light on the financial landscape of agricultural sector
in India to help banks understand the market and model the associated credit risk in
an improved manner and hence bridge the gap between the borrower and the lender.
Credit risk associated with an individual can be classified into two broad
categories:
1. Capacity to pay and
2. Intention to pay.
Capacity to pay is governed by the principle that the individual should have the
ability to generate a steady flow of income which depends largely on his demographic
features such as age, qualification and profession. These features along with income
and existing assets of the individual determine whether the individual has the capacity
to pay back the loan. Following Maurer (2014), risks in agriculture finance can
be broadly classified into the following categories which influence an agricultural
household’s capacity to pay:
i. Production Risks: Agriculture production in India is fraught with the risk of
poor monsoon, disease and pests due to which farmer’s income suffers. Lack
of proper irrigation facilities, immense dependency on monsoon, lack of good
quality seeds and chemical fertilizers can lead to suboptimal output and therefore
insufficient generation of income to pay back the loans.
ii. Market Risks: There is price uncertainty and volatility associated with farming, where farmers do not know at the time of planting what prices their produce
would fetch. The interplay of demand and supply factors in determining market
prices causes agriculture income to be volatile. Minimum support price (MSP)
plays a crucial role here in defining a floor price at which government would
procure crops from farmers. The Cabinet Committee on Economic Affairs
(CCEA), Government of India, determines the MSP based on the recommenda-
tions of the Commission for Agricultural Cost and Prices (CACP). The objective
of MSP is to protect farmers from the price shocks and to ensure food security
through buffer stocks and Public Distribution System (PDS). MSP, however, is
replete with problems. The 2016 Evaluation Report on Minimum Support Prices
released by NITI Aayog underlined that the lack of procurement centers, closed
storage facilities and delay in payments were some of the shortcomings of MSP.
Lack of knowledge about MSP also contributed to farmers not being able to plan
crop growing pattern ahead of sowing season and reap additional benefits from
it. According to the report, despite its shortcomings farmers find MSP to be very
useful and want it to continue as it provides a floor price for their produce and
protects them against price fluctuations.
Despite having the capacity to pay, an individual may not have the intention or
discipline to pay back the loans on a timely basis which is costly for banks. This is the
behavioral aspect of credit risk and is reflected in his/her credit history. Recent delin-
quency, on-time payment history, leverage, default and non-default credit accounts
are some metrics which give us insights into the behavior of the customer through
which we can evaluate whether he/she has the intention and required discipline to
pay back the loans.
In agricultural loan market, loan waivers announced by government severely
impact the behavior of the agriculture households and create a moral hazard problem
where farmers default on loans in expectation of loan waivers in the future. Such loan waivers undoubtedly relieve distressed farmers of their credit burden, but they negatively impact the credit culture. Such political risks associated with the agriculture
sector make banks reluctant to lend to this sector. Post the 2008 comprehensive loan
waiver scheme, a survey showed that one out of every 4 respondents wanted to wait
for another loan waiver (Maurer 2014).
Instead of giving out generalized loan waivers to farmers, the need of the hour
is to focus on creating robust mechanism to assess credit risk in the agriculture
sector which can help banks increase their reach and help bring the farmers into the
formal sector. In this paper, we highlight an approach that can make this possible. We
show how using farm and household characteristics we can risk rank the agriculture
households by assessing their “capacity to pay.” Considering the difficulties faced
by farmers and banks, our model would help in bridging the gap between them. By
reducing the risk associated with farm lending, it would create a potentially profitable
market for banks and would make cheaper credit available to the farmers along with
reducing their dependency on moneylenders.
2 Literature Review
The economic survey of 2017–18 reveals that India’s agricultural sector, which employs more than 50% of the population, contributes only 17–18% of its total output (Economic Division 2018). Therefore, enhancement of farm mechanization is important to mitigate hidden unemployment in the sector and free up useful labor. Agricultural credit plays a pivotal role in achieving technical innovation, and therefore measures need to be taken to expand the reach of low-cost formal credit to all farmers. Abhiman Das (2009) shows that direct agricultural lending has a positive and significant impact on agricultural output whereas indirect credit has an effect after a lag of one year. Therefore, despite its shortcomings like less penetration to
small and marginal farmers, and paucity of medium- to long-term lending, agri-
cultural credit plays a critical role in supporting agricultural production and hence
farm incomes and livelihood. In order to lend efficiently and minimize defaults on
loans, it is imperative to have a sound analytical system in place to assess credit-
worthiness of borrowers. There have been several studies on credit scoring models
for agricultural lending which use bank or credit history data as well as farm’s and
borrower’s characteristics to assess debt repaying capacity of farmers. Identifying
low-risk customers using credit risk assessment models is important not only for
reducing cost for banks but also to increase the penetration of credit to small and
marginal farmers who would have otherwise been left out due to misclassification as
bad customers. Bandyopadhyay (2007), using sample data of a public sector bank,
developed a credit risk model for agricultural loan portfolio of the bank. With the
help of bank’s credit history and borrower’s loan characteristics such as loan to value,
interest cost on the loan, value of land and crops grown, he arrives at a logistic regression model that predicts the probability of default, defined as per the then NPA norm of the RBI, i.e., if the interest and/or installment of principal remains overdue for two harvest seasons but for a period not exceeding two and a half years in the case of an advance granted for agricultural purposes. However, the low sample size of the
study serves as a major limitation of the model as it renders the model vulnerable to
sample biases. Seda Durguner (2006), in their paper, showed that net worth does not
play a significant role in predicting the probability of default for livestock farms while it does matter significantly for crop farms. They develop separate models for crop and livestock farms in order to prevent misclassification errors that could arise by not
differentiating between the farm types. Durguner (2007) showed using a panel data of
264 unique Illinois farmers for a five-year period, 2000–2004, that both debt-to-asset
ratio and soil productivity are highly correlated with coverage ratio (cash inflow/cash
outflow). Using a binomial logit regression model on 756 agricultural loan applica-
tions of French banks, Amelie Jouault (2006) show that leverage, profitability and
liquidity at loan origination are good indicators of probability of default.
The studies mentioned above suffer from some severe limitations which need to
be addressed for obtaining a robust credit risk model:
(1) No differentiation on geographical location and farm type: The ability of a
farmer to repay depends on the income that he generates which is highly depen-
dent on where he lives, rainfall pattern in that location, the soil type, crop grown,
etc. Therefore, considering such agro-climatic factors is necessary in the model
building process.
(2) Limited data sources: Bank’s data would not be helpful for assessing risk
of the farmers who are new to formal credit or if banks expand their direct
lending to agriculture to new locations. Alternative methods to score farmers
for their riskiness need to be identified as opposed to relying just on their past
performance.
(3) Narrow focus of study: Credit risk from farm sector, as mentioned in the above
section, can result from inability to pay that can be influenced by price risk and
market risk or it could be due to indiscipline and fraudulent behavior which
could result from political risk. Focusing only on the behavior of farmers on
their credit account will not take into account a complete picture of the situation
of the farmers, and most importantly, it would leave out those who are new to
credit.
(4) Small sample size: Given the nature of diversity in India’s agricultural landscape,
a single bank’s data cannot capture all the dimensionalities of the sector and a
small sample size can lead to sample biases and cannot be applied universally.
Current mandate for Priority Sector Lending (PSL) by RBI requires all scheduled
commercial banks and foreign banks to lend 18% of their Adjusted Net Bank Credit
(ANBC) or Off Balance Sheet Exposure, whichever is higher, to the agriculture
sector. Out of this, a sub-target requires them to lend 8% to the small and marginal
farmers. As per the RBI guidelines, a small farmer is one who holds less than or
equal to 1 ha of land whereas any farmer with more than 1 ha but up to 2.5 ha of
land is considered to be a marginal farmer. These guidelines hold for all Scheduled
Commercial Banks (SCBs) including foreign banks (RBI 2016).
Additional measures taken by the government to improve the farm credit situation
include Kisan Credit Card (KCC) and Agricultural Debt Waiver and Debt Relief
Scheme though their effectiveness can be debated and most of the experts consider
them to be an unnecessary fiscal burden.
Despite taking the policy measures mentioned above, year-on-year growth of farm
loans has gone down in the past few years. After seeing a close to 40% growth rate in 2014, the increase in farm credit went down to below 10%, which is the lowest since 2012 (Trends and Progress Report, RBI 2017–18).
As per the All India Rural Financial Inclusion Survey (NAFIS), 2016–17, by
NABARD, 52.5% agricultural households had an outstanding debt at the time of
the survey and out of these almost 40% households still went to non-institutional
sources for their credit needs. Similar results are shown by the Situation Assessment
Survey (SAS), 2013, by NSSO which shows a dependence of 44% households on non-
institutional sources (please refer to Table 1). Even though the two surveys have different samples, this indicates that the share of non-institutional sources has remained almost the same from 2013 to 2016–17 and additionally corroborates the fact that growth in
institutional credit has remained stagnant.
With flexible lending terms and often no collateral required, agricultural house-
holds continue to borrow from informal sources (moneylenders, friends and
family).
Despite the exorbitant interest rates, which can go as high as 4 times the interest
rates charged by the formal sources (refer to Table 2), moneylenders continue to cater
to the credit needs of close to 11% of the farm borrowers (NAFIS 2016–17). This, in addition to the reasons mentioned above, could be due to various factors such as the availability of credit for personal needs such as marriage. Another reason for this could be the unavailability of formal sources of credit. As per SAS 2013, for the
agricultural households which owned less than 0.01 ha of land, only about 15% of
Table 1 Distribution of rural credit across institutional and non-institutional sources (in %)
Type of credit 1951 1961 1971 1981 1991 2002 2012
Institutional credit 7.2 14.8 29.2 61.2 64 57.1 56
Non-institutional credit 92.8 85.2 70.8 38.8 36 42.9 44
Table 2 Distribution of outstanding cash debt as per the rate of interest charged
Rate of interest (%) | Rural: Institutional | Rural: Non-institutional | Urban: Institutional | Urban: Non-institutional
Nil | 0.8 | 18.3 | 0.4 | 27
<6 | 7.1 | 2.3 | 1.5 | 1.1
6–10 | 26 | 0.4 | 14.5 | 0.9
10–12 | 12.9 | 0.7 | 41.6 | 1.2
12–15 | 42.6 | 4.1 | 34.1 | 7.7
15–20 | 7.3 | 5.6 | 6.2 | 4.3
20–25 | 2.1 | 33.9 | 1.2 | 27.3
25–30 | 0.1 | 0.6 | 0.2 | 0.3
>30 | 1 | 34.1 | 0.4 | 30.2
loans were sourced from institutional lenders. On the other hand, this number was
as high as 79% for farmers with more than 10 ha of land.
Most of this farm lending continues to be done by public sector banks. As of
December 2016, private sector lent out 9.5% of the total loans whereas public sector
lent out 85% of the total loans (Credit Bureau Database 2018). Private players,
including foreign banks, have been reluctant to lend to farmers. For the year 2017–
18, private and foreign banks met their PSL targets but did not meet their sub-targets
of 8% lending to the small and marginal farmers (Trends and Progress Report, RBI
2017–18).
One major reason for this reluctance is the rise in bad loans coming from this
sector. Between 2012 and 2017, bad loans in agriculture sector have jumped by
142.74% (Financial Express Online 2018). One reason behind this jump is the farm
loan waiver announced by the central government. Subvention schemes, a subsidy
provided by the government on interest rate, are another reason why private banks find
PSL challenging. Banks are mandated to charge 7% interest on loans up to 3 lakhs.
A further 3% subvention is provided in case of timely payments. So effectively these
loans become available to farmers at 4% interest rate (PIB, DSM/SBS/KA, release
ID 169414). This scheme has been made available to private sector banks only since 2013–14, prior to which it was available only to public sector banks.
Another reason for the meager farm lending by the private and foreign banks is
the lack of understanding of the agriculture sector as a whole. This also leads to
the inability to effectively assess risk in this sector. Without a proper understanding
of the sector and the understanding of risk, operating in rural and semi-urban areas
can be very expensive for banks. Entering a new market requires opening of new
branches, launching market-specific products and incurring huge operating costs. Due to all these reasons, these banks stay away from the agriculture sector or do a marginal amount of lending in urban areas.
4 Research Methodology
Given the challenges faced by banks in lending to the agriculture sector, we propose
in this paper a holistic approach to assess credit risk of farmers using alternative data
and advanced analytical techniques. The focus of this study is the farmers who are not
a part of the formal credit and still rely on non-institutional sources. These farmers
would not have a credit footprint available to assess their riskiness, and therefore we
focus on “capacity to pay” of the farmers rather than their default behavior on their
credit accounts. In this study, we have used NSSO 70th round data—Key Indicators of
Situation of Agricultural Households in India to identify complete characteristics of
farmers in India. This is a comprehensive dataset of agricultural households in India
which are defined as households receiving at least Rs. 3000 of value from agricultural
activities (e.g., cultivation of field crops, horticultural crops, fodder crops, plantation,
animal husbandry, poultry, fishery, piggery, beekeeping, vermiculture, sericulture,
etc.) during last 365 days, and it encompasses all the factors that reflect the then
situation of farmers. The survey was conducted in two visits. Visit 1 comprises data
collected in the period January 2013 to July 2013 with information collected with
reference to period July 2012 to December 2012, and Visit-2 comprises data collected
between August 2013 and December 2013 with information with reference to period
January 2013 to July 2013. This way it covers both kharif and rabi cropping seasons.
However, for our modeling purpose we use only Visit-1 data as the information on
outstanding loans is captured only in the Visit-1 Survey. The NSSO data captures
variables such as the kind of dwelling unit of farmers, status of ownership of land,
primary and subsidiary activity of farmers, whether the household has MGNREG
job cards, no. of dependents and their employment status, the kind of crop grown on
the farm, size of land under irrigation, the value of sale of crop, the agency the crops
are sold through (dealers, mandi, cooperative agency and government), details of
expenses in inputs and whether the farmer avails MSP or not. Such a detailed dataset
of farmer characteristics is very helpful in assessing whether the farmer will be able
to “afford” the loan or not. Following Seda Durguner (2006), we used debt to income
as a proxy to judge creditworthiness. The mean debt to income in the population is
14, while the median is 1.5. Below table shows the distribution of debt to income in
the data (Refer to Table 3).
We use debt-to-income ratio of 4 as the threshold; i.e., farmers whose ratio of
outstanding debt is more than 4 times the income of one cropping season are clas-
sified as bad (farmers who would default), and logistic regression technique is used
to predict the probability of default for these farmers. The overall bad rate of the
population with the given threshold is 26%.
We build two models in our analysis. First, we use the variables that are captured
by banks (Model 1) in their agricultural loan application form. For this purpose,
we use standardized loan application form for agricultural credit devised by Indian
Bank’s Association (IBA). This form contains the required details that need to be
collected from agri-credit loan applicants. This helps banks and customers maintain
uniformity in the loan applications for agricultural needs. The second model (Model
Table 3 Distribution of debt-to-income ratio in our data
Quantile | Estimate
100% (max) | 21,300
99% | 186.96
95% | 51.07
90% | 18.18
75% (Q3) | 4.8
50% (median) | 1.55
25% (Q1) | 0.53
10% | 0.19
5% | 0.1
1% | 0.03
0% (min) | 0
2) that we built considered both, the details already captured by the bank along with
the additional features created from the NSS 70th round data. We use information value (IV), which tells how well a variable is able to distinguish between good and bad customers, to select important or predictive variables in the model. The variables whose IV was between 0.02 and 0.5 were then binned using weight of evidence (WOE). Values with similar WOE were combined into a bin because they have a similar distribution of events and non-events. In this way, we transformed each continuous independent variable into a set of groups/bins. We then built a logistic regression model
to obtain probability of default using WOE of independent variables.
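A hedged Python sketch of this IV/WOE-based flow; the quantile binning, the synthetic data and the variable name land_irrigated_ha are illustrative stand-ins, not the NSSO fields or the authors' exact binning scheme.

```python
# Sketch: bin a predictor, compute WOE and IV, replace raw values with WOE and
# fit a logistic regression for the probability of default.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def add_woe_column(df: pd.DataFrame, feature: str, target: str, bins: int = 5):
    out = df.copy()
    out["bin"] = pd.qcut(out[feature], bins, labels=False, duplicates="drop")
    grp = out.groupby("bin")[target].agg(["sum", "count"])
    bad = grp["sum"]                              # target = 1 marks a "bad" farmer
    good = grp["count"] - grp["sum"]
    dist_bad = (bad + 0.5) / (bad.sum() + 0.5)    # smoothing avoids log(0)
    dist_good = (good + 0.5) / (good.sum() + 0.5)
    woe = np.log(dist_good / dist_bad)
    iv = float(((dist_good - dist_bad) * woe).sum())
    out[feature + "_woe"] = out["bin"].map(woe)
    return out.drop(columns=["bin"]), iv

rng = np.random.default_rng(0)                    # synthetic example, ~26% bad rate
data = pd.DataFrame({"land_irrigated_ha": rng.gamma(2.0, 1.0, 5000)})
data["bad"] = rng.binomial(1, 0.26, 5000)
data, iv = add_woe_column(data, "land_irrigated_ha", "bad")
print("IV:", round(iv, 3))                        # keep variables with IV in [0.02, 0.5]
model = LogisticRegression().fit(data[["land_irrigated_ha_woe"]], data["bad"])
```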
We find that Model 2 performs better than Model 1 in terms of Gini, KS and rank
ordering. The results of the model are discussed in the next section.
5 Results
Our model gives a comprehensive set of variables which includes farmer’s demo-
graphic features, agro-climatic factors and cropping patterns that describe his/her
ability to pay. Variables like the highest value crop grown and whether the farmer faced crop loss during the last one year capture the farming pattern and explain the recent farming trend for the farmer. Whether the farmer has taken technical advice or not shows if the farmer has access and the willingness to incorporate new techniques in his farming. Our model covers both the endowment- and behavior-related variables of the farmers.
The following tables give the resultant significant variables in both the models:
1. Model 1:
2. Model 2:
Capturing these additional variables in our model for assessing the risk of farmers
gives us a lift of almost 7% in the Gini coefficient (from 41.7 to 48%); i.e., it improves
the model accuracy from 70 to 75%. Bad rate distribution for our model goes from
58.87% in the lowest decile (highest risk decile) to 5.48% in the highest decile (lowest
risk decile). On the other hand, using the variables already captured by banks, the bad rate ranged from 55.27 to 8.02%. The graph below shows the risk ranking across deciles for both the models. We observe a break in the risk ranking of Model 1 at decile 7, whereas Model 2 holds perfectly across all the deciles. Refer to Appendix 1 for detailed tables and to Appendix 2 for a note on the Gini and KS summary statistics.
(Figure: Risk ranking comparison, showing bad rate by decile (0–9) for Model 1 and Model 2.)
6 Policy Implications
A policy aspect that comes out from this analysis is that this model would allow
the government to figure out the population they need to focus on for their policy
measures. Farmers who get identified with a lower capability to pay using Model 2
become the target population for the government policies. Also, the model variables
on which they did not do well define the areas where government needs to focus to
bring those farmers to the formal financial sector. For example, if a large number of
farmers in a district are identified to have a lower capability to repay due to having
not taken technical advice or because of having suffered crop loss in the last farming
season, this defines the focus area for the government to work upon. Here, they need
to improve the availability of technical advice to the farmers and work on reasons
of crop loss. Hence, above model would serve the dual purpose of helping both the
banks and the government. Even though banks need to keep lending at the reduced
interest rates as per the government policies, using this model they can identify the
population with a lower risk of default. At the same time, government can form
specific policies based on the needs of the farmers and help bring them to the formal
credit market.
Even though our model brings out results which can help both the banking sector
and government, we do not claim that our model is free of any limitations or has
no scope for improvement. Considering the type of data that has been used for this
model, there is an inherent risk of endogeneity in the analysis and it needs
to be accounted for in the model building process. Also, the variables in Model 2 are
not easily verifiable and it would require banks to invest in proper due diligence of
their agricultural loan applicants.
On the basis of above information, it is understandable that farming sector needs
special attention when it comes to credit facilities. Existing schemes and facilities
have been unable to fulfill the credit needs of this sector. Generalized loan waivers
announced time and again have put a financial burden on the economy and are not a
solution in the long run. Our model shows that if banks capture specific information
about farmer characteristics and consider agro-climatic conditions like rainfall in
their lending decisions, they can reduce the delinquencies from this sector. In this
way, agricultural lending can be made much more efficient and the level of financial
inclusion of farmers can be improved.
Appendix 1
Approval Strategy:
Decile Good Bad % population Default rate (%) Approved population (%)
0 16,971 20,974 9.48 26 100.00
1 20,799 20,403 10.30 23 90.52
2 26,315 15,600 10.47 20 80.22
3 25,543 8251 8.45 17 69.75
4 42,643 11,776 13.60 16 61.30
5 23,761 6272 7.51 15 47.70
6 32,621 5360 9.49 14 40.20
7 36,168 8574 11.18 13 30.70
8 32,855 4569 9.35 10 19.52
9 37,436 3262 10.17 8 10.17
(Figure: GINI curve for the first approval strategy, plotting cumulative percentages from 0 to 100% on both axes.)
Approval Strategy:
Decile Good Bad % population Default rate (%) Approved population (%)
0 16,596 23,349 9.98 26 100.00
1 20,542 20,808 10.33 23 90.02
2 21,034 14,688 8.93 19 79.68
3 34,241 13,398 11.91 16 70.76
4 28,493 6698 8.79 14 58.85
5 30,892 7029 9.48 13 50.06
6 35,153 7484 10.66 12 40.58
7 36,267 5212 10.37 10 29.93
8 34,434 4302 9.68 8 19.56
9 37,460 2073 9.88 5 9.68
(Figure: GINI curve for the second approval strategy, plotting cumulative percentages from 0 to 100% on both axes.)
Appendix 2
Kolmogorov–Smirnov Test
where
M is the total number in the development sample.
N is the total number in the validation sample.
K is the Kolmogorov–Smirnov statistic.
Gini Coefficient
The Gini coefficient is a measure of the power of a scorecard. The higher the Gini,
the stronger the scorecard. A scorecard with no discrimination would have a Gini
of zero; a perfect scorecard would have a Gini of 100%. A Gini is calculated by
comparing the cumulative number of goods and bads by score. Graphically, it is the
area between the two lines on the curve below (XYZ) expressed as a percentage of the
maximum possible (XYW ). The two axes on the graph are cumulative percentage of
goods (y-axis) and cumulative percentage of bads (x-axis). Graphical representation
of the Gini coefficient:
(Figure: the Gini curve, with cumulative % goods on the y-axis, cumulative % bads on the x-axis, and points X, Y, Z and W as referenced in the formulas below.)
The area under the curve (the unshaded area, not enclosed within Z) for a given
score is defined as:
A_i = \frac{1}{2} (b_i - b_{i-1}) (g_i + g_{i-1})

The total area not defined by the curve is equal to:

A_g = \sum_{i=S_1}^{S_n} A_i

The maximum possible area is:

A_{XYW} = \frac{1}{2} (100 \times 100) = 5000

The Gini coefficient is then calculated as:

g = \frac{A_{XYW} - A_g}{A_{XYW}}
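A small sketch implementing the trapezium formulas above, with made-up decile-level cumulative percentages of bads (b_i) and goods (g_i).

```python
# Sketch: Gini from cumulative % bads (x-axis) and cumulative % goods (y-axis).
import numpy as np

def gini_coefficient(cum_bads_pct, cum_goods_pct) -> float:
    b = np.concatenate(([0.0], np.asarray(cum_bads_pct, dtype=float)))
    g = np.concatenate(([0.0], np.asarray(cum_goods_pct, dtype=float)))
    # A_i = 0.5 * (b_i - b_{i-1}) * (g_i + g_{i-1}); A_g = sum of the A_i
    a_g = 0.5 * np.sum((b[1:] - b[:-1]) * (g[1:] + g[:-1]))
    a_xyw = 0.5 * 100 * 100          # maximum possible area
    return (a_xyw - a_g) / a_xyw

cum_bads = [26, 49, 69, 80, 87, 92, 95, 97, 99, 100]    # illustrative values only
cum_goods = [10, 22, 35, 46, 57, 67, 76, 85, 93, 100]
print(round(gini_coefficient(cum_bads, cum_goods), 3))
```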
References
Application of Association Rule Mining in a Clothing Retail Store
1 Introduction
A. Jain · S. Jain
SVKM’s Narsee Monjee Institute of Management Studies (NMIMS), Indore, India
e-mail: [email protected]
S. Jain
e-mail: [email protected]
N. Merh (B)
Jaipuria Institute of Management Indore, Indore, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 103
A. K. Laha (ed.), Applied Advanced Analytics, Springer Proceedings in Business
and Economics, https://doi.org/10.1007/978-981-33-6656-5_9
buying habits with the help of associations between the products that customers place in their shopping baskets. This will help retailers to develop marketing strategies by gaining insights into the items that are frequently purchased together.
Business analytics is a scientific process of transforming data into insight for
making better decisions. Business analytics is used for data-driven or fact-based
decision making which is often seen as more objective than other alternatives. It is
of three types: descriptive, predictive and prescriptive; out of these, this research is
focusing on predictive analytics (Cochran et al. 2015). Predictive analytics includes a variety of statistical techniques from modeling, data mining and machine learning that analyze current and historical data to make predictions about future outcomes. Prediction helps an organization in making the right decision at the right time through the right person, as there is always a time lag between planning and the actual implementation of the event.
“Try Us,” a new retail clothing store located in Indore, Madhya Pradesh, is selected
as the store under study. It sells multiple brands for men. The organization wants to
expand its business and is planning to open another store on a larger scale.
The Apriori algorithm is used for mining datasets for association rules. The name Apriori is used because the algorithm uses prior knowledge of frequent itemset properties. The Apriori algorithm uses a bottom-up approach where frequent subsets are extended one item at a time. In the Apriori algorithm, an iterative approach is used where frequent k-item sets are used to find (k + 1)-item sets. It is generally used in market basket analysis as it is useful in finding the relationship between two products. Apriori makes some assumptions like:
• All subsets of a frequent item set must be frequent.
• If an item set is infrequent, all its supersets will be infrequent.
While applying Apriori algorithm, the standard measures are used to assess asso-
ciation rules. These rules are the support and confidence value. Both are computed
from the support of certain item sets. For association rules like A → B, two criteria are jointly used for rule evaluation. The support is the percentage of transactions that contain A ∪ B (Agrawal et al. 1993; Avcilar et al. 2014). It takes the form support(A → B) = P(A ∪ B). The confidence is the ratio of the percentage of transactions that contain A ∪ B to the percentage of transactions that contain A. It takes the form confidence(A → B) = P(B|A) = support(A ∪ B)/support(A). Rules that satisfy both a minimum support threshold (min_sup) and a minimum confidence threshold (min_conf) are called strong (Avcilar et al. 2014).
Primary objective of the study is to understand the buying pattern of the customer
and to study and analyze proper basket (combos) of products for cross-selling and
upselling. Another objective is to explore the relativity between the products for
applying the optimal design layout for the clothing retail store.
Research work done by Kurniawal et al. (2018) suggests that market basket analysis yields better results than association rule mining using the Apriori algorithm.
The research done by Tatiana et al. (2018) on a study of integrating heteroge-
neous data sources from a grocery store based on market basket analysis, for
improving the quality of grocery supermarkets, shows positive results for increasing
the performance of the store.
Szymkowiak et al. (2018) propose theoretical aspects of market basket analysis
with an illustrative application based on data from the national census of population
and housing with respect to marital status, through which it was made possible to
identify relationships between legal marital status and actual marital status taking
into account other basic socio-demographic variables available in large datasets.
Study (Roodpishi et al. 2015) conducted on various demographic variables for an
insurance company in the city of Anzali, Iran, provides various associations with
clients of an insurance company. The study used association rules and practice of
insurance policy to find hidden patterns in the insurance industry.
In the study done by Sagin et al. (2018), market basket analysis was conducted on
a data of a large hardware company operating in the retail sector. Both the Apriori and
FP growth algorithms (Sagin et al. 2018) were run separately and their usefulness in
such a set of data was compared. When both the algorithms were compared in terms of
performance, it was seen that FP growth algorithm yielded 781 times faster results but
resulting rules showed that FP growth algorithm failed to find the first 14 rules with
high confidence value. In the study done by Srinivasa Kumar et al. (2018), product
positioning of a retail supermarket based at Trichy, Tamil Nadu, was examined using
data mining to identify the items sets that were bought frequently and association
rules were generated. The study done by Santarcangelo et al. (2018) focused on
visual market basket analysis with the goal to infer the behavior of the customers
of a store with the help of dataset of egocentric videos collected during customer’s
real shopping sessions. They proposed a multimodal method which exploited visual,
motion and audio descriptors and concluded that the multimodal method achieved
an accuracy of more than 87%. In the study done by Avcilar et al. (2014), association
rules were estimated using market basket analysis and taking support, confidence
and lift measures. These rules helped in understanding the purchase behavior of the
customers from their visit to a store while purchasing similar and different product
categories. The objective of the research study done by Surjandari and Seruni (2005) was to
identify the associated product, which then were grouped in mix merchandise with
the help of market basket analysis. The association between the products was then
used in the design layout of the product in the supermarket.
2 Methodology
In the current study, an attempt is made to find the relationship between the different
products using Apriori algorithm.
At the first stage, the data is preprocessed and transformed, missing values are handled and the data is cleaned before selecting the components. After transforming the data, the Apriori algorithm is used to find the relationships between different apparel items using association rules. The data is analyzed on the basis of results obtained from Frontline Analytic Solver® Data Mining (XLMiner).
The parameters used for evaluation of the model are antecedent support (the "if" part), which is the number of transactions in which the antecedent item(s) are present; consequent support (the "then" part), which is the number of transactions in which the consequent item(s) are present; and support, which is the number of transactions that include all items in both the antecedent and the consequent. For a rule with an antecedent (the "if" part) and a consequent (the "then" part), two numbers express the degree of uncertainty about the rule. The first number is the support, which is simply the number of transactions that include all items in the antecedent and the consequent. The second number is the confidence, which is the ratio of the number of transactions that include all items in the consequent as well as the antecedent (namely, the support) to the number of transactions that include all items in the antecedent.
Lift is another important parameter of interest in the association analysis. It is the ratio of confidence to expected confidence. A lift ratio larger than 1.0 implies that the relationship between the antecedent and the consequent is significant; the larger the lift ratio, the more significant the association. The following are the parameters used to evaluate the model:
• Support—the number of transactions that include all items in the antecedent and the consequent.
• Confidence = (no. of transactions with antecedent and consequent item sets)/(no. of transactions with antecedent item sets).
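The same support, confidence and lift computations can be reproduced outside XLMiner. The following is a minimal sketch in Python using the mlxtend library on a toy one-hot basket table; the item columns are illustrative and are not the study's actual data.

```python
# Hedged sketch of Apriori-based rule mining with mlxtend (not the tool used in the study).
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# One transaction per row, one column per item, 1 = purchased, 0 = not purchased
baskets = pd.DataFrame(
    {
        "shirt_B_Ecohawk": [1, 1, 0, 1, 0, 1],
        "jeans_C_Nostrum": [1, 0, 0, 1, 0, 1],
        "jeans_N_Mufti":   [0, 1, 0, 0, 1, 1],
        "shirt_N_Mufti":   [0, 1, 0, 0, 1, 0],
    }
).astype(bool)

# Frequent item sets above a minimum support threshold (here 20% of transactions)
frequent = apriori(baskets, min_support=0.2, use_colnames=True)

# Association rules filtered on minimum confidence; lift > 1 indicates a
# positive association between antecedent and consequent
rules = association_rules(frequent, metric="confidence", min_threshold=0.5)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])
```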
3 Data
The data used for this paper is collected from “Try Us” a multi-brand retail outlet
in Indore, Madhya Pradesh, for a period during November 26, 2017, to September
19, 2018. The collected data is used for the study of association between different
products, and inferences generated can then be used to arrange shelves in a better
way when planned for a bigger retail store. The data collected includes bill number,
date, brand name, size, amount, GST, item type which was refined according to our
purpose to bill number, brand name and item type.
For applying Apriori algorithm on binary data format, the data was first converted
to binary format where if a product was purchased it was recorded as 1 and if no
purchase was made then it was recorded as 0. In total, there are 29 columns and
13,065 rows which were refined to 29 columns and 6008 rows since multiple items
purchased by a customer were recorded in multiple rows. The brands that were not
present in the store from November 26, 2017, were not taken into consideration.
Therefore, 185 rows were deleted out of 6193 rows. Multiple purchases made by a
single customer were merged in a single row.
Main objective of the study is to study the buying pattern of the customer and to
analyze proper basket (combos) of products for cross-selling and upselling. Another
objective is to study the association between the products for applying the optimal
design layout for the retail store. In the paper, association rule mining through Apriori
algorithm is used to find baskets of products which are purchased together. A total
of 5223 transactions are included in the analysis. Using combinations of various minimum support counts and minimum confidence percentages, the following results are derived:
Case I
Rules:
Lift ratio—The lift value of an association rule is the ratio of the confidence of the rule to its expected confidence.
Case II
Rules:
In case II, the rules having lift ratio more than 1 are rule 4, rule 5, rule 6, rule 7.
A brief description of these rules is given below.
Rule 4
A customer who purchases jeans C (Nostrum) purchases a shirt B (Ecohawk).
Rule 5
A customer who purchases jeans O (Revit) purchases a shirt B (Ecohawk).
Rule 6
A customer who purchases trouser D (Sixth Element) purchases a shirt B (Ecohawk).
Rule 7
A customer who purchases jeans N (Mufti) purchases a shirt N (Mufti).
Thus, jeans C (Nostrum) and shirt B (Ecohawk), jeans O (Revit) and shirt B
(Ecohawk), trouser D (Sixth Element) and shirt B (Ecohawk), jeans N (Mufti) and
shirt N (Mufti) should be clubbed and kept together.
Case III
Rules:
In case III, the rules having lift ratio more than 1 are rule 2, rule 3, rule 4. A brief
description of these rules is given below.
Rule 2
A customer who purchases jeans C (Nostrum) purchases a shirt B (Ecohawk).
Rule 3
A customer who purchases trouser D (Sixth Element) purchases a shirt B (Ecohawk).
Rule 4
A customer who purchases jeans N (Mufti) purchases a shirt N (Mufti).
Thus, jeans C (Nostrum) and shirt B (Ecohawk), trouser D (Sixth Element) and
shirt B (Ecohawk), jeans N (Mufti) and shirt N (Mufti) should be clubbed and kept
together.
From the data, it was observed that shirt B (Ecohawk) and shirt N (Mufti) had a very strong association, as they were sold together 259 times. Similarly, shirt B (Ecohawk) and jeans C (Nostrum) were sold together 240 times.
Figure 1 gives a radar chart of what products are purchased together:
Different colors represent different types of products that were available in the
store. Each concentric circle represents 50 transactions. The following suggestions
can be given to the entrepreneur after analyzing the data.
• On the basis of market basket analysis, products which are sold together with high frequency, like shirt B (Ecohawk) and jeans N (Mufti), should be kept near each other so as to reduce the time the salesperson spends handling each customer. Furthermore, product baskets can be developed using the analysis done above.
• Products which have a low sales frequency should be placed near the products that are preferred more by customers, with some dynamic discount pattern, so as to increase the sales of the low-frequency products.
4 Conclusion
By applying market basket analysis, some meaningful patterns were obtained that would help the entrepreneur in planning the discount pattern of the products and in creating various combos according to the rules generated, in order to increase sales. It will also help in preparing the shelf design for the new store so as to minimize the product search time for the customer.
Appendix
(continued)
Serial No. Code Brand
3 C Nostrum
4 D Sixth Element
5 E Status Que
6 F US Polo
7 G Stride
8 H Killer
9 I Ed Hardy
10 J Yankee
11 K Vogue Raw
12 L Delmont
13 M Rookies
14 N Mufti
15 O Revit
16 P UCB
17 Q Beevee
18 R Silver Surfer
19 T Status Quo
20 U M Square
21 W Got It
22 X Borgoforte
23 Y Fort Collins
24 Z Okane
References
Agrawal, R., Imieliński, T., & Swami, A. (1993). Mining association rules between sets of items in large databases. Association of Computing Machinery (ACM) SIGMOD Record, 22(2). New York, USA. https://doi.org/10.1145/170036.170072. ISSN 0163-5808.
Avcilar, M. S., & Yakut, E. (2014). Association rules in data mining: An application on a clothing
and accessory specialty store. Canadian Social Science, 10(3), 75–83.
Ballard, A. (2018, August 06). Pricing intelligence: What it is and why it matters. Retrieved from https://www.mytotalreatil.com/article/pricing-intelligence-what-it-is-and-why-it-matters/. Date of downloading January 31, 2019.
Cochran, C., Ohlmann, F., Williams, A. S. (2015). Essentials of business analytics, pp. 323–324.
ISBN-13: 978-81-315-2765-8.
Kurniawan, F., Umayah, B., Hammad, J., Mardi, S., Nugroho, S., & Hariadi, M. (2018). Market
basket analysis to identify customer behaviors by way of transaction data. Knowledge Engineering
and Data Science KEDS, 1(1), 20–25.
Roodpishi, M. V., & Nashtaei, R. (2015). Market basket analysis in insurance industry. Management
Science Letters, 5, 393–400.
Sagin, A. N., & Ayvaz, B. (2018). Determination of association rule with market basket analysis:
An application of the retail store. Southeast Europe Journal of Soft Computing, 7(1), 10–19.
Santarcangelo, V., Farinella, G. M., Furnari, A., & Battiato, S. (2018). Market basket analysis from
egocentric videos. Pattern Recognition Letters, 112, 83–90.
Srinivasa Kumar, V., Renganathan, R., VijayBanu, C., & Ramya, I. (2018). Consumer buying pattern
analysis using apriori association rule. International Journal of Pure and Applied Mathematics,
119(7).
Surjandari, I., & Seruni, A. C. (2005). Design of product layout in retail shop using market basket
analysis. Makara Teknologi, 9(2), 43–47.
Szymkowiak, M., Klimanek, T., & Jozefowski, T. (2018). Applying market basket analysis to official
statistical data. Econometrics Ekonometria Advances in Applied Data Science, 22(1), 39–57.
Tan, P.-N., Steinbach, M., & Kumar, V. (2016). Introduction to data mining, pp. 2–3. ISBN 978-
93-3257-140-2.
Tatiana, K., & Mikhail, M. (2018). Market basket analysis of heterogeneous data sources for
recommendation system improvement. Procedia Computer Science, 246–254. ISSN 1877-0509.
Improving Blast Furnace Operations
Through Advanced Analytics
1 Introduction
The blast furnace is a huge stack where the iron oxides are chemically reduced
and physically converted to liquid iron called hot metal. The furnace is lined with
refractory brick where iron ore, limestone and coke are fed from the top and hot
blast (preheated air) is blown up from the bottom. The high-temperature air blowing up reacts with the coke in a combustion reaction and releases heat, which reduces the iron oxides from the ore to molten iron. The molten iron is drained from the furnace bottom into a torpedo ladle, which carries the hot metal to a basic oxygen converter where it is converted to steel (see Ricketts).
The full blast furnace assembly constitutes not only the blast furnace but also a
heat recovery section which is commonly referred as hot blast stoves. The hot blast
stove supplies hot air to the blast furnace. The air from atmosphere is heated inside
the stoves using blast furnace top gas in cycles. The preheated air coming out of
stoves is called hot blast which is delivered to the blast furnace through tuyeres. In
this way, the energy coming out from the top of blast furnace as top gas is recovered
back to the blast furnace in the form of hot blast. The higher the temperature of the hot blast, the lower the fuel (coke/coal) requirement in the blast furnace. Thus, good recovery in this heat recovery section directly reduces the cost of hot metal and hence of steel (see Satyendra 2015).
Wang (2016) presented some work on the oxygen enrichment to hot stoves. The
stoves were heated by the blast furnace top gas coming out of blast furnace top. The
paper presented that oxygen enrichment in the stoves can improve the performance
of the stove. This can be done by increasing the dome temperature, etc. Wang says "CFD (computational fluid dynamics) modeling work indicated that the oxygen lance position is one important factor to achieve a uniform mixture of oxygen and combustion air for the combustion process."
Zetterholm (2015) developed a dynamic model where thermophysical properties
of both gas and solid were calculated with respect to time and position in the stove.
He says that the model could be used to represent the stove. Oxygen enrichment
was studied in the paper as a major factor to improve the stove operation and in
turn produce hot blast with higher temperature to improve the efficiency of the blast
furnace.
Simkin (2015) developed a Mamdani fuzzy control model to reduce the consumption of natural gas in heating the stoves. The paper says that "One of the main advantages of fuzzy knowledge base is the ability to use minimum information about the modeled object." The model is built on information about the input and output parameters of the stove. It takes into account the heat of combustion going into the
stoves or the total calories of the fuel used to heat the stoves. This takes care of the
problems that arise due to control systems or nonlinear functions related to the blast
air.
Butkarev (2015) presented that the hot blast temperature can be increased by
30–40 °C by improving the automatic control system designed by OAO VNIIMT.
Lin (2007) studied the effect of fuel gas preheating on the thermal efficiency of the hot blast generating system; experimental results revealed that the efficiency of the hot blast generating process increased from 75.6 to 78.7%. According to the test results, "the advantage of operating the heat recovery system was evident in the reduction of fuel gas depletion rate by 822 L/h on an oil equivalent basis. The annual energy saving over 1.9 million US dollars and the CO2 reduction 15,000 ton/year can be achieved."
The current state of stove operations is not standardized; operators often take critical decisions based on their experience. To standardize the decision-making process in an optimal way, an analytics research project was kick-started with one of Asia's largest iron and steel manufacturing plants. After examining the available data, it was observed that the hot blast temperature varied between 1175 °C (10th percentile) and 1204 °C (90th percentile) between January 1, 2018, and September 30, 2018. The median hot blast temperature was 1192 °C.
The purpose of the analytics research is to enable plant operations to operate the blast furnace stoves in a way that maximizes the energy recovery in the stoves, thereby increasing the hot blast temperature entering the blast furnace and hence reducing the fuel requirement inside the blast furnace. The end result of the research would be to reduce the variability and increase the median hot blast temperature, eventually reducing the cost of hot metal.
2 Data Availability
We used the data collected from a steel manufacturing process. The data is captured
through the SCADA system (Supervisory Control and Data Acquisition) which is
often called the Level-1 data with almost zero process lag. The Level-2 data with
<1 min lag is available through the process historians. The dataset identified for the project is at 1-minute granularity, and 8.5 months of plant operation data was available, which included controllable and non-controllable variables (see Fig. 1). The variables are as follows: hydrogen%, CO%, CO2% in blast furnace top gas, blast furnace top gas flow, blast furnace top gas temperature, combustion air flow to stoves, combustion air temperature to stoves, wind volume, cold blast temperature, hot blast temperature, stove dome temperature, stove refractory temperature, stove exhaust gas temperature and cold blast flow (see Fig. 2).
There are four stoves operating in parallel to heat up the blast entering the blast furnace. The stoves are operated in cycles. First, a stove is heated up to a certain temperature, and then cold blast at ~200 °C is heated to ~1200 °C by passing it through the heated stove. In the process of heating the cold blast, the stove's temperature comes down, and it is heated again; hence the cycle continues. The cycle is therefore divided into two parts: the heating period and the cooling period (Fig. 3).
The control system during the heating and soaking periods involves the dome temperature, the exhaust temperature and the air-to-gas ratio. While the refractory temperature rises steadily as soon as heating starts, the dome temperature rises very steeply and reaches 1300 °C within 5–10 min. The upper limit of the dome temperature is set at 1350 °C, after which the cycle stops. The exhaust temperature, measured near the exit of the hot flue gases at the stove bottom, also rises steadily and has a high limit of 350 °C, after which the cycle stops.
Once the dome temperature hits 1300 °C, the heating cycle is over and the soaking cycle starts. In the heating cycle, only one air-to-gas ratio (A1) is operational, controlling the flow of combustion air to the stove. Once the soaking cycle starts, the second air-to-gas ratio (A2) also becomes active, and the control system varies the air flow to the stove between the A1 and A2 ratios to keep the dome temperature below 1350 °C so that the cycle does not stop prematurely and the maximum possible refractory temperature is reached. As the air-to-gas ratio increases, the exhaust temperature increases, and as the air-to-gas ratio is reduced, the dome temperature increases. The A1 and A2 ratios are set by the operators at the start of the heating cycle, and thus determining the optimum A1 and A2 for different stoves is important to maximize the hot blast temperature.
[Figure: refractory temperature (primary axis) and dome temperature (secondary axis) plotted minute-by-minute on January 15, 2018, with the heating and soaking periods marked]
Fig. 5 Dashboard
After studying the plant control system, it was understood that there is no control during the cooling period. It is the correct handling of the control system during the heating cycle that yields a longer cooling cycle (longer implies more heat transfer to the cold blast). Each individual heating cycle and cooling cycle is typically 30 min long.
During the heating period of the stoves, there are two controlling parameters
which the operators could handle and manipulate for effective heating of the stoves.
After rounds of interviews with the operations team and understanding the process
and control system in detail, the next step is to understand the data and finalize the
model structure.
The process is cyclic in nature. Therefore, the analysis would be done on the cycle
level rather than time stamp level. Data was transformed from 1 min level to cycle
level using Python. 8.5 months of data was finally converted to around 6000 cycles
for one stove. The periods in which the plant was under transition state/unsteady
were removed (see Fig. 4).
The red zone above indicates the heating cycle. The heating cycle is demarcated
by the continuous increase in the dome temperature—when the temperature stops
rising, the end of the heating cycle is indicated. During the soaking cycle, the dome
temperature is seen to be sinusoidal in nature. The end of the soaking cycle is indicated
when the refractory temperature starts decreasing. In this way, the minute-level data
was converted to cycle-level data.
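The chapter does not reproduce the transformation code; purely as an illustration of the idea, a minute-to-cycle aggregation with pandas could look roughly like the sketch below, with hypothetical column names and a deliberately simplified cycle-boundary rule.

```python
# Simplified sketch (not the project code) of converting minute-level stove data
# to cycle-level records with pandas; column names are hypothetical and the
# cycle-boundary rule is naive (real data would need smoothing and validation).
import pandas as pd

df = (pd.read_csv("stove_minute_data.csv", parse_dates=["timestamp"])
        .sort_values("timestamp")
        .set_index("timestamp"))

# A new cycle is assumed to start whenever the dome temperature switches from
# falling (or flat) to rising, approximating the start of a heating period.
rising = df["dome_temp"].diff() > 0
cycle_id = (rising & ~rising.shift(fill_value=False)).cumsum()

# Collapse each cycle into one record
cycles = df.groupby(cycle_id).agg(
    max_dome_temp=("dome_temp", "max"),
    max_refractory_temp=("refractory_temp", "max"),
    mean_A1_ratio=("air_to_gas_A1", "mean"),
    minutes=("dome_temp", "size"),
)
print(cycles.head())
```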
3.2 Modeling
The model was developed in Python using the scikit-learn library. The cluster validation
was performed based on multiple variables. The first validation was to see if mean-
ingful clustering has taken place based on refractory temperature. It was observed
that refractory temperature in different clusters followed distribution tabulated below
(see Table 1):
As the stoves are heated from blast furnace/top gas, the composition of which is
not controllable, it is imperative to check if the clusters formed in the above table
have similar distribution of the calorific value of the top gas. This is to validate that
the clusters are not modeled on the basis of calorific values as that cannot be the
basis for recommendation.
The best cluster must give best refractory temperature for all the calorific value
ranges. The calorific value was divided into deciles, and average refractory temper-
ature for best cluster was compared with average refractory temperature of the other
clusters. It was found that for all calorific ranges the best cluster had maximum
refractory temperature.
The best cluster’s air-to-gas ratio and blast furnace gas flow for different
calorific values were shared with the client to make the operators follow the model
recommendations.
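The exact clustering algorithm and feature set are not detailed in the text. As an illustrative sketch only, a cluster-and-validate step of this kind could be written with scikit-learn as follows, assuming a cycle-level table `cycles` with hypothetical columns for the set-point ratios, BFG flow, top-gas calorific value and peak refractory temperature.

```python
# Illustrative cluster-and-validate sketch with scikit-learn (hypothetical columns).
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

features = ["A1_ratio", "A2_ratio", "bfg_flow"]          # controllable settings
X = StandardScaler().fit_transform(cycles[features])

cycles["cluster"] = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# Validation idea from the text: within every calorific-value decile of the top
# gas, the best cluster should show the highest average refractory temperature.
cycles["cv_decile"] = pd.qcut(cycles["calorific_value"], 10, labels=False)
summary = (cycles.groupby(["cv_decile", "cluster"])["max_refractory_temp"]
                 .mean()
                 .unstack())
print(summary)
```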
A web-based live dashboard was designed on the .NET framework, which takes live data from the data server and updates the recommendations on a real-time basis. Operator control room training was organized to train the operators to use the dashboard. Please refer to the sample dashboard (see Fig. 5).
To improve the usage of the model and to evaluate the model performance, it is
very important to calculate the extent of compliance of the operators to the model
recommendations. The compliance was calculated using Python script where the
recommended BFG flow and A1 and A2 ratios are compared with the actual BFG
flows and A1 and A2 ratios. The margin of error has been kept at 2.5%. If the actual numbers fall within that window of the recommended numbers, the operators are considered compliant.
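A compliance check of this kind reduces to a tolerance test per cycle. The snippet below is a hedged sketch of that logic with hypothetical column names, not the production Python script.

```python
# Hedged sketch of the ±2.5% compliance check described above.
import pandas as pd

def within_tolerance(actual: float, recommended: float, tol: float = 0.025) -> bool:
    """True if the actual value lies within ±2.5% of the recommended value."""
    return abs(actual - recommended) <= tol * abs(recommended)

def cycle_is_compliant(row: pd.Series) -> bool:
    return (within_tolerance(row["bfg_flow_actual"], row["bfg_flow_reco"])
            and within_tolerance(row["A1_actual"], row["A1_reco"])
            and within_tolerance(row["A2_actual"], row["A2_reco"]))

# Example: share of cycles in which the operators followed the recommendations
# compliance_rate = cycle_table.apply(cycle_is_compliant, axis=1).mean()
```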
The model is set to be retuned every month as the efficiency of the process equipment keeps changing with any change in the process. The model tuning is done by replacing the oldest data in the 8.5-month window with the latest one month of data to include any recent change in the process. However, if any major turnaround/shutdown maintenance activity takes place, involving a change in any equipment, then a complete overhaul of the model data needs to be done.
Hot trials were carried out for a week. The median hot blast temperature increased
to 1201 °C, 10th percentile temperature increased to 1188 °C and 90th percentile
References
Butkarev, A.A. (2015). Boosting the hot-blast temperature in blast furnaces by means of an optimal
control system. Stal No. 3.
Lin, P.-H. (2007). Efficiency improvement of the hot blast generating system by waste heat recovery.
Kaohsiung, Taiwan: Energy and Air Pollution Control Section, New Materials Research and
Development Department, China Steel Corporation.
Ricketts, J. A. How it works: The blast furnace. https://www.thepotteries.org/shelton/blast_furnace.
htm.
Satyendra. (2015). Generation of hot air blast and hot blast stoves. https://ispatguru.com/genera
tion-of-hot-air-blast-and-hot-blast-stoves/.
Simkin, A. (2015). Control model of the heating hot blast stove regenerative chamber based on
fuzzy knowledge with training set. Metallurgical and Mining Industry.
Wang, C. (2016). Modelling and analysis of oxygen enrichment to hot stoves. In The 8th
international conference on applied energy—ICAE2016.
Zetterholm, J. (2015). Model development of a blast furnace stove. In The 7th international
conference on applied energy—ICAE2015.
Food Index Forecasting
1 Business Problem
Price index forecasting was used to forecast monthly inflation numbers for 18 product categories (Table 1).
2 Data Gathering
All the information used for this analysis has been downloaded using publicly
available, free, external websites.
Data gathering was primarily carried out for the dependent variable (Consumer
Price index). BLS website stores monthly price index/inflation numbers for over 100
food items, and updates are carried out each month. Extensive research was carried
out to find out what can drive price for each product category. Instead of looking into
the individual sub-categories (like ground beef, beef roast and beef steak)—research
was focused on the overall categories (like what affects beef prices in this example).
We can broadly categorize key drivers across major product categories into five
different levels (Fig. 1):
A few insights from the research on factors affecting price trends:
1. It was observed that feed expenses were high in 2012, and a huge number of cows were slaughtered (Frohlich 2015) as an immediate impact. In the next two years, the number of meat-producing cows was considerably lower, which significantly affected beef prices in 2014. Hence, feed prices (hay, corn, soybean) were identified as one of the key drivers
2. Butterfat percentage: milk butterfat content is lowest at peak production and highest toward the end of lactation (Bailey 2017; Cushnahan 2003)
3. Biofuels (Rosillo-Calle et al. 2009; Parcell et al. 2018),1 soaps, washing powders and personal care products are closely interdependent with the oilseed and biodiesel markets. Any change in either US biodiesel policy or global biodiesel policy could shock oilseed prices
4. El Niño2 is an abnormal weather pattern caused by the warming of the Pacific
Ocean near the equator, off the coast of South America. In South America, there
is a drastic increase in the risk of flooding on the western coast, while there is an
increase in the risk of droughts on parts of the eastern coast. In eastern countries,
like India and Indonesia, there is an increase in droughts. These affect the fish
and seafood (OECD/FAO 2016) price index a lot.
5. The main drivers for the decline in prices of these commodities will be the competitive prices of substitutes (like eggs, chicken, etc.), the slowdown in demand from key markets due to sluggish economic growth, and reduced production and marketing costs of aquaculture products due to lower transport and feed costs
6. Political situations in Brazil and Indonesia affect the coffee (van den Brom 2020)3 price the most, as they are among the leading coffee producers.
1 "An overview of the Edible Oil Markets: Crude Palm Oil vs Soybean Oil" (July 2010).
2 Rinkesh, "What is El Niño?".
3 Jack Purr, "What affects the price of coffee."
Data for independent variables was downloaded from various data sources for
conducting multivariate time series analysis. Bureau of Labor Statistics (BLS) was
one of the main data sources. Other sources include Federal Reserve Economic Data
(FRED), United States Department of Agriculture (USDA), National Oceanic and
Atmospheric Administration (NOAA), Data World, National Agricultural Statistics
Service (NASS).
o CPI index and unemployment rate were mainly sourced from BLS,
o Import/export from USDA,
o Temperature and rainfall data from NOAA.
3 Model Development
After collecting the data for dependent and independent variables, the first task was to
collate the data in a data frame so that it can be further processed for modeling. Data
from different sources was collated, and time series dataset starting from January
2009 was created. The area of interest in this study was to model the Year-over-
Year (Y–o-Y) inflation numbers for all the dependent variable categories. Due to the
volatile nature of Y–o-Y variable, modeling was done on the actual index data for
all the food categories, and then finally, the forecasted numbers were converted into
Y–o-Y for further analysis.
3.1 Pre-modeling
Given that this is a time series dataset, it is highly likely that lagged versions of the independent variables influence the dependent variables the most. Since we are trying to forecast future months, lagged independent variables can be used directly (when the data is available); otherwise, the independent variables themselves are forecast first. After the creation of lags for each independent variable, the correlation with the dependent variables was calculated for all 13 versions of each independent variable, i.e., the original variable (Lag 0) and its 12 lagged versions. For example,
• For the Ground Beef Price Index, petrol price (six months' lag) correlated the most.
• The beef slaughter count at lag two was the most highly correlated variable with the Ground Beef Index, with correlation −0.78.
• Lag 1 of shell egg imports (1000 dozen) was the most highly correlated variable with the Egg Index, with correlation 0.34.
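As an illustration of the lag-and-correlate step (not the study's code), the 13 lagged versions of each driver and their correlations with a dependent index can be generated with pandas as follows; the variable names are hypothetical.

```python
# Sketch of building lag 0-12 driver variables and ranking their correlations.
import pandas as pd

df = pd.read_csv("food_index_data.csv", parse_dates=["month"], index_col="month")

dependent = "ground_beef_index"
drivers = ["petrol_price", "beef_slaughter_count", "shell_egg_imports"]

# Lag 0 through lag 12 for every driver, correlated with the dependent index
corr = {}
for col in drivers:
    for lag in range(13):
        corr[(col, lag)] = df[dependent].corr(df[col].shift(lag))

corr = pd.Series(corr).sort_values(key=abs, ascending=False)
print(corr.head(10))   # strongest driver/lag combinations
```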
Data considered for further analysis was in the period Jan 2010–May 2018, and
there were no missing values within this time frame. For providing price index
forecasts, five modeling techniques (univariate and multivariate) were applied. Best
model was chosen from the five techniques based on the applicability, accuracy and
blind validation. Forward forecast inflation (average and interval range) for 12 and
18 months is provided as results.
After identifying all the highly correlated variables for each of the 18 dependent
variables separately, next task was to run models to identify the significant variables
for each of the dependent variable category.
Train, test and forecast periods were created as given in Fig. 2 and used throughout the modeling process.
To define the train and test periods of the study, several test periods were taken into consideration: 12 months, 9 months and 6 months. While observing the patterns over these test periods, it was seen that for most categories the distribution of the test set was completely different from that of the train period, and as a result the forecasts went in a completely different direction. To avoid this situation, the test period was reduced to six months to better train the time series models. To compare the performance of the various models, MAAPE was used. Below is the definition and the reasoning behind using this metric.
In time series analysis, mean absolute percentage error (MAPE) is widely used and
is calculated as below.
MAPE = (100%/n) Σ |Actual − Forecast| / |Actual|
In this case study, Y–o-Y is the dependent variable, and this can take both positive
and negative numbers, and using MAPE as a model evaluation criterion was found
to be not applicable as errors were huge. Example is shown below:
Table 2 Comparison of MAAPE across test periods
Model #   Test period (months)   YoY Train MAAPE (%)   YoY Test MAAPE (%)
1         12                     18                    83
2         9                      21                    40
3         6                      20                    22
• Assume for the forecast month June 2018, for the best model, the estimate is
2.50, whereas the actual number is 0.50. In this case, the MAPE would be |0.50
− 2.50|*100/|0.50| = 400% which is not reflecting the actual scenario.
• Hence, alternative model evaluation criteria MAAPE (Kim and Kim 2016) was
used as it is applicable to deal with positive or negative numbers where as MAPE
was not. MAAPE is calculated using the below formula:
MAAPE = (100%/n) Σ arctan(|Actual − Forecast| / |Actual|)
• For the above example, MAAPE is 132%. Another advantage is MAAPE is well
defined even if Y–o-Y is zero though MAPE is not.
• Table 2 describes different test periods and the corresponding MAAPE for one of
the food categories index.
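The MAAPE formula above translates directly into a few lines of code; the following is a small sketch in Python that also reproduces the single-point example from the text.

```python
# Small, direct implementation of the MAAPE formula above (sketch).
import numpy as np

def maape(actual, forecast):
    """Mean Arctangent Absolute Percentage Error, expressed in percent."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    # arctan maps the (possibly infinite) absolute percentage error into [0, pi/2],
    # so the metric stays finite even when the actual value is zero.
    return 100.0 * np.mean(np.arctan(np.abs((actual - forecast) / actual)))

# Single-point example from the text: actual 0.50, forecast 2.50 -> about 132.6%
print(round(maape([0.50], [2.50]), 1))
```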
Table 3 Modeling techniques used
Method         Technique
Univariate     ARIMA (Box and Jenkins 1970)
Univariate     Holt-Winters (Chatfield and Yar 1988)
Univariate     Exponential smoothing (Broze and Mélard 1990)
Multivariate   Regression (Ramcharan 2006)
Multivariate   ARIMAX (YuanZheng and Yajing 2007)
• For multivariate time series models, independent variables were forecasted first
to get the final forecast numbers for the dependent variables. For each of the
dependent categories, regression models were run using independent variables
in the training period. Once the model variables were finalized, next step was to
forecast for the test period. This was done in a two-step process. The first step was to
forecast for the independent variable, and once this was complete, next step was to
get the forecast numbers for the dependent variable from the regression equation.
Forecast of the independent variables was done using the three univariate time
series methods (listed in Table 3), and selected method was the one for which
the test MAAPE was the least. Once all the forecast numbers were handy for all
the independent variables, regression equation was used to get the final forecast
numbers for the dependent variables.
• For ARIMAX model, same set of independent variables and their corresponding
forecasted numbers were used. To select the optimal parameters of the ARIMAX
model, grid search was carried out.
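As a hedged sketch of the two-step multivariate procedure described above (forecast the drivers first, then feed them into an ARIMAX model), the statsmodels library could be used roughly as follows; the series and column names are hypothetical and the orders are placeholders rather than the grid-searched ones.

```python
# Hedged two-step sketch with statsmodels (hypothetical series, placeholder orders).
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.statespace.sarimax import SARIMAX

y_train = train["cheese_index"]          # dependent price index (hypothetical)
X_train = train[["milk_price_lag3"]]     # selected drivers (hypothetical)
horizon = 6                              # six-month test period

# Step 1: univariate forecasts of each exogenous driver over the horizon
X_future = pd.DataFrame({
    col: ARIMA(X_train[col], order=(1, 1, 1)).fit().forecast(steps=horizon)
    for col in X_train.columns
})

# Step 2: ARIMAX on the dependent index, using the forecast drivers as exog
fit = SARIMAX(y_train, exog=X_train, order=(1, 1, 1)).fit(disp=False)
y_forecast = fit.forecast(steps=horizon, exog=X_future)
print(y_forecast)
```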
4 Results
Out of the 18 models built for the study, we achieved good accuracy (<35% MAAPE) for 50% of the models, and the error range for the rest of the models was between 40 and 60%.
Among the 18 food categories, the majority were stable (Fig. 3), with an overall Y-o-Y range within ±5%. A few categories, like eggs (Fig. 4), were volatile, with a Y-o-Y range of ±40%; however, our models were robust, and we achieved good accuracy in such scenarios too.
Fig. 3 Y-o-Y plot comparison between Actual (historical cheese values shown as blue dotted line) and ARIMAX (forecast values based on the model selected on performance, shown as red line)
Fig. 4 Price index comparison between Actual (historical cheese values shown as blue dotted line) versus ARIMAX (forecast values based on the ARIMAX model as red line) versus Regression (forecast values based on the regression model as green line)
Below is the summary report of the 18 models built, with details on the final model selected (based on performance); MAAPE metrics for the train and test periods are provided in Table 4. ARIMAX predicted better than the rest of the models in most categories. Beef category predictions were 70–80% accurate, and these were the main categories. Of all categories, dairy and poultry have the greatest accuracy. Except for the pork and potatoes categories, all other category forecasts were in the range of 60–80% accuracy. The pork and potatoes categories were highly volatile, and the greatest accuracy achieved in these cases is about 50%.
5 Conclusion
In this study, we have tried to address food price forecasting using various univariate and multivariate techniques at the monthly level. More than 60 explanatory variables were tested for each category, based on extensive research, for forecasting the consumer price indexes of 18 food categories. The forecasting performance of the models is measured using MAAPE, and the MAAPE achieved for most of the models is <15%. This price-forecasting model is useful in capturing economic demand-pull factors such as food use, substitute prices, feed prices, weather, macro-economic factors and income in food price changes. All the data used for the analysis is publicly available, external data. The approach used here is robust, as we were able to capture trends for highly volatile and stable categories alike and obtain satisfactory performance.
References
Bailey, H. (2017, September). Dairy risk-management education: Factors that affect U.S farm-gate
milk prices.
Box, G. E. P., & Jenkins, G. M. (1970). Time series analysis: Forecasting and control. San Francisco:
Holden-Day Inc.
Broze, L., & Mélard, G. (1990). Exponential smoothing: Estimation by maximum likelihood.
Journal of Forecasting, 9, 445–455.
Chatfield, C., & Yar, M. (1988). Holt-winters forecasting: Some practical issues. Journal of the
Royal Statistical Society. Series D (The Statistician), 37, 129–140.
Cushnahan, A. (2003, April). Factors influencing milk butterfat concentration.
Frohlich, T. C. (2015, April). States killing the most animals for food. https://www.usatoday.com/
story/money/business/2015/04/15/247-wall-st-states-killing-animals/25807125/.
Kim, S., & Kim, H. (2016, September). A new metric of absolute percentage error for intermittent
demand forecasts.
OECD/FAO. (2016). Fish and seafood. In OECD-FAO Agricultural Outlook 2016–2025.
Parcell, J., Kojima, Y., Roach, A., & Cain W. (2018, January). Global edible vegetable oil market
trends.
Ramcharan, R. (2006). Regressions: Why are economists obsessed with them? Accessed 2011-12-03.
Rosillo-Calle, F., Pelkmans, L., & Walter, A. (2009, June). A global overview of vegetable oils, with
reference to biodiesel.
van den Brom, J. (2020). Coffee.
YuanZheng, W., & Yajing, X. (2007). Application of multi-variate stable time series model
ARIMAX. Statistics and Operation, 9B, 132–134
Implementing Learning Analytic Tools
in Predicting Students’ Performance
in a Business School
1 Introduction
Developments in the field of information technology with respect to big data have
resulted in disruptive implications across all sectors (Baradwaj and Pal 2011). Data
is now available in abundance, and therefore, there is a need to employ tools and
techniques to mine such data. Data mining tools and techniques have found applica-
tions across various disciplines including customer profiling, fraud detection, DNA
sequencing, etc. (Lauria and Baron 2011). Educational data mining (EDM) and
learning analytics (LA) are two communities evincing a keen interest in how big
data could be exploited for the larger benefit of the education sector (Baker and
Inventado 2014). EDM deals with “developing methods that discover knowledge
from the data originating from educational environments” (Han and Kamber 2006).
Such mining techniques result in pattern recognition which forms the basis for deci-
sion making and support interventions (Siemens et al. 2011) ultimately leading to
optimizing the learning process (Baker and Siemens 2014).
LA on the other hand emphasizes more on the data visualization and human
intervention and has evolved into a critical domain in the education space (Gasevic
et al. 2016). LA is defined as “the measurement, collection, analysis and reporting
of data about learners and their context, for purposes of understanding and optimizing learning and the environments in which it occurs" (Ferguson 2012; Elbadrawy
et al. 2016). Research in the field of EDM and LA has clearly demonstrated the
need for understanding the teaching–learning process and using this information for
improving the same (Baker and Inventado 2014; Gasevic et al. 2016).
2 Purpose of Research
A solid theoretical framework in this domain has not evolved yet. The vast differences
in the kind of learning tools and educational systems in different institutions across
developed versus developing nations clearly delineate the impracticality of a model
which could be generalized across geographies as well as across disciplines (Agudo-
Peregrina et al. 2014). Studies have clearly shown that self-regulation of learning (Black and Deci 2000), self-efficacy (Chung et al. 2002) and information-seeking behavior (Whitmire 2002) vary with course and discipline. Early identification of students
who are likely to fail in a particular subject gives scope for early interventions and
therefore helps faculty to provide more attention to a specific set of students.
The review of literature showed that existing studies in this domain have been based on data extracted from learning management systems (LMS) (Romero et al. 2013).
One such LA model was developed at Purdue University called Course Signals
(Arnold and Pistilli 2012) which successfully transformed a research agenda into
a practical application. LMS data was used in pattern recognition of user behavior
(Talavera and Gaudioso 2004). Studies have also been undertaken with data obtained
from Moodle (an open-source course management system) (Romero et al. 2013).
Demographic characteristics and course management system usage data was used
to develop predictive machine learning models (Campbell 2007). Unstructured data
obtained from online discussion forums was used to perform sentiment analysis in a
study conducted by Laurie and Timothy (2005). Lykourentzou et al. (2009) used the
scores of a quiz activity to cluster the students using neural networks.
Applying analytics in education is the need of the hour, especially in the context
of a developing economy like India. The inferences drawn from prior studies have
been eagerly accepted by the academic community (Gasevic et al. 2016). Hence, it is
time for educational institutions to use machine learning tools to enhance teaching–
learning experience. This study deploys learning analytics technique using the data
of students undergoing a post-graduate management program and attempts to create
a system of preventive feedback mechanism for faculty and students.
3 Research Objectives
The objective of the study was to create an early intervention mechanism to enhance
students’ performance. This study aims to develop a predictive model using machine
learning algorithms to predict the academic risk of a student passing or failing a
course (binary response) and the marks of the student (continuous data) in a course
based on past data. The research objectives of the study are
o RO1: To develop a model to predict the academic status of a student in a course.
o RO2: To develop a model to predict the grade of a student in the course.
4 Methodology
The learning analytics model adopted in this study is based on supervised learning
algorithm. The training data set had both predictive features including demographic
characteristics of the students, as well as the response feature which is the students’
academic status. The methodological framework developed by Lauria and Baron
(2011) has been adopted in this study. The framework includes five steps such as
data collection, data preparation, data partition, building the models and evaluating
the models.
The study was conducted among post-graduate management students undergoing the
Master of Business Administration (MBA) program. The data was collected during
the admission and during the progression of the course. Demographic data of the
students was collected from the admissions portal and the academic performance of
the students from the examination portal. The academic performance of the students
relating to six foundation courses undertaken by the students in the first semester and one capstone course, strategic management, was considered for the purpose of this study. Data from the past three years was considered (n = 522).
The data was pre-processed for missing values, outliers and incomplete data. Data of a few students who left the college was identified and removed from the database. In the next step, the identity of the students, represented by names and roll numbers, was removed to ensure anonymity. A few features had to be derived for the purpose
of this study. The data relating to the ‘date of birth’ of the students was extracted
from the portal, and the age of the student at the time of joining the course was
derived. The other derived feature was ‘break in study.’ The data relating to the
undergraduate degree completion year was extracted. This data showed that most of
the students did not have a break in study; therefore, this feature was dichotomized
into ‘break in study,’ Yes/No. Data transformation was done in the case of four
features (community, tenth board, higher secondary board, undergraduate degree).
The feature ‘community’ had seven categories such as OC, BC, MBC, DNC, SC,
ST and Others. A summary of the feature showed a small percentage of the students
belonged to MBC, DNC, SC and ST. Therefore, these categories were combined
with ‘Others’ category resulting in only three categories of ‘community,’ namely
OC, BC and Others. The tenth and the higher secondary board also went through a
similar process to result in four categories, namely Tamil Nadu Board, Kerala Board, CBSE and Others. As the MBA program did not have any restrictions in terms of undergraduate degree, the program attracts students from diverse disciplines. This feature also had to be transformed into three major groups, namely Arts, Engineering and Science. The features used for model building are shown in Table 1.
The data was partitioned into two sets, the training dataset and the testing dataset. Eighty percent of the dataset was used for training the model, and the model was tested using the remaining twenty percent of the dataset.
Logistic regression was used to predict the response variable. Logistic regression is
a generalized linear model, where the response variable is a function of the linear
combination of all the predictor variables (Lauria and Baron 2011). The categorical
predictor variables used in this study include age, gender, community, tenth board,
higher secondary board, undergraduate degree and break in study. The continuous
predictor variables include tenth percentage, higher secondary percentage, under-
graduate percentage, work experience and entrance exam score. The response vari-
able in this study is the academic status which is denoted by pass (50% or more
marks) or fail (less than 50% marks) (Palmer 2013; Barker and Sharkey 2012). The
academic status of all the six foundation courses was predicted using this model.
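A model of this kind can be sketched with scikit-learn as below; the `students` table and feature names are illustrative stand-ins, not the exact variables or software used in the study.

```python
# Hedged sketch of the pass/fail model with scikit-learn (illustrative names).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

data = pd.get_dummies(
    students,
    columns=["gender", "community", "tenth_board", "hsc_board",
             "ug_degree", "break_in_study"],
    drop_first=True,
)
X = data.drop(columns=["course_status"])   # predictors
y = data["course_status"]                  # 1 = pass (>= 50% marks), 0 = fail

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)   # 80/20 split, as in the study
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = clf.predict(X_test)
```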
The prediction of students’ marks in capstone course is done using stepwise multiple
linear regression. The coefficients derived for each of the predictor variables helped
in identifying which of these variables influences the response variable.
The performance of the logistic regression model can be evaluated based on three metrics, accuracy, specificity and sensitivity, which are derived from the confusion
matrix. In this study the overall accuracy [(TP + TN)/(TP + TN + FP + FN)] is
not considered as a metric for model evaluation. Instead the focus is on sensitivity
[TP/(TP + FN)] and specificity [(TN/(TN + FP)], where TP stands for True Positive,
TN for True Negative, FP for False Positive and FN for False Negative. The multiple
linear regression model developed using stepwise regression was validated through the tenfold cross-validation technique. This technique uses ten rounds of cross-validation with different training and testing splits, and its result estimates the validity of the machine learning model.
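Continuing the illustrative sketch above, the evaluation metrics and the tenfold cross-validation can be computed as follows; X_marks and y_marks are a hypothetical design matrix and marks vector for the regression model.

```python
# Confusion-matrix metrics named above, plus tenfold cross-validation of the
# marks-prediction model (sketch, not the study's code).
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression

tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
sensitivity = tp / (tp + fn)   # TP / (TP + FN)
specificity = tn / (tn + fp)   # TN / (TN + FP)

rmse_folds = -cross_val_score(LinearRegression(), X_marks, y_marks,
                              cv=10, scoring="neg_root_mean_squared_error")
print(sensitivity, specificity, rmse_folds.mean())
```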
The results of frequency analysis depicted in Table 2 showed almost equal repre-
sentation of male students (55%) and female students (45%). Descriptive analysis
related to the socio-economic status of the students showed that 52% of the students
belonged to BC category. Students who have undergone tenth and higher secondary
from the Tamil Nadu Board represented 70% and 79%, respectively. Students with Engineering as their undergraduate study accounted for 62%, which is the highest compared to Arts (31%) and Science (7%). Thirty-nine percent of the students had a break in their study, indicating that they would have taken up a job after their undergraduate degree.
Before deciding on the modeling strategy, it was essential to understand the corre-
lation between response and predictor variables. A scatter plot was used to identify
the correlation. The scatter plot between the response variable (marks obtained in
strategic management course) and other predictor variables like tenth percentage,
higher secondary percentage, undergraduate percentage, work experience, entrance
exam scores is shown in Fig. 1. Figure 1 shows a positive relationship between marks
obtained in strategic management course and the other predictor variables.
The scatter plot between the response variable (marks obtained in strategic
management course) and predictor variables (marks obtained in foundation courses)
is shown in Fig. 2. Figure 2 shows a positive relationship between marks obtained in
strategic management courses and marks of foundation courses like organizational
behavior, business environment, managerial economics, accounting for managers,
business communication and quantitative techniques.
Logistic regression was used to build the model. The sensitivity and specificity scores of this model were used in evaluating the prediction of the academic status for the foundation courses. In this study, it is important to note that the model so developed should be capable of predicting a student who has failed in the course as "FAIL," and, more importantly, that the model should not predict a student who actually failed as "PASS." Therefore, specificity as a metric gains more prominence than sensitivity. The specificity of the logistic regression models developed for the foundation courses was 82.54% for organizational behavior, 59.57% for business environment, 92.75% for managerial economics, 60.42% for accounting for managers, 92.47% for business communication and 87.01% for quantitative techniques.
Multiple linear regression model was deployed to capture the unique contribution
of the predictor variables in explaining the variation in the response variable. Since
this study had six categorical variables which could not be included directly into the
model, the categorical variables were re-coded using dummy variables. For example,
since the variable undergraduate degree had three categories, i.e., Arts, Engineering
and Science, two (n − 1) dummy variables were included. The same process was
applied for all the other categorical variables.
In stepwise regression, the entry criterion for a new variable is based on the smallest p value of the partial F test, and the removal criterion for a variable is based on the β value. In this study, α = 0.05 was considered: if the p value < α, the variable was entered into the model, and if the p value > β = 0.10, the variable was excluded from the model. At each stage, a variable was either entered into the model or removed from the model.
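Stepwise selection with the entry criterion described above can be sketched as a simple forward procedure with statsmodels; this is an illustration under the stated α = 0.05 entry rule only (the removal step at β = 0.10 is omitted for brevity), not the exact procedure or software used in the study.

```python
# Simplified forward-stepwise sketch with statsmodels; X is assumed to be a
# numeric (dummy-coded) DataFrame and y the marks vector.
import statsmodels.api as sm

def forward_stepwise(X, y, alpha=0.05):
    selected, remaining = [], list(X.columns)
    while remaining:
        pvals = {}
        for cand in remaining:
            model = sm.OLS(y, sm.add_constant(X[selected + [cand]])).fit()
            pvals[cand] = model.pvalues[cand]
        best = min(pvals, key=pvals.get)
        if pvals[best] < alpha:               # enter the most significant candidate
            selected.append(best)
            remaining.remove(best)
        else:
            break
    return sm.OLS(y, sm.add_constant(X[selected])).fit()

# final_model = forward_stepwise(X_marks, y_marks)
# print(final_model.summary())
```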
The stepwise regression model excluded the variables such as community, tenth
board, higher secondary board, break in study, work experience, entrance exam
scores. The final model had retained the variables gender, tenth percentage, higher
secondary percentage, undergraduate degree percentage, and the marks of all the six
foundation courses. The p values of these variables are less than 0.05 indicating the
significance of these variables in the model. The results of the final regression model are given in Table 3. The model was further validated using the tenfold cross-validation technique. The root mean square error is 5.6.
The results of this study indicate that learning analytics could be effectively imple-
mented in enhancing the quality of teaching–learning experience (Macfadyen and
Dawson 2012). In this paper, two predictive models were used to predict academic
risk of students who were not performing well in the course as an early intervention
mechanism. In the first part, logistic regression was used to identify the academic
status of foundation courses in the first semester. Data obtained during the admission
process is used as input for model building. Since an MBA program is open for all
streams of undergraduate studies, it is essential to have an early intervention in order
to ensure a smooth progression of the students into the second semester where they
are introduced to advanced management courses.
In the second part of the study, the stepwise regression model was used to predict
the marks of the students in the capstone course. The results showed that as the students progress into second semester courses, the tenth and higher secondary board become
irrelevant. Performance in the first semester courses greatly influences the results of
the second semester. The student who scored well in the first semester also scored
well in the second semester. Therefore, this early intervention would help enhance student performance, thereby preparing students to face forthcoming semesters more confidently. This understanding further helps students to select courses in which
they can perform better.
Model deployment would help build a transparent system by which both the stakeholders, faculty and students, would get insights about the students' progress. This study could be further extended to all courses in the forthcoming semesters. This would gradually evolve into a learning analytics system which can be built into the curriculum. Further, this model could be extended to predict the probability
of the students succeeding in placement. Deployment of the models developed in
this study would go a long way in not only enhancing students’ performance but
also more fruitful faculty engagement. Embedding analytics in the education system
would transform the education landscape to greater heights.
References
1 Introduction
Consider a clinical trial with t (>2) competing treatments, where the patient outcome
is circular in nature. Unlike linear responses, circular responses cannot be compared
directly and hence identifying a “better” patient response requires further considera-
tion. In fact, circular responses are periodic in nature and hence in circular set-up, the
responses 20◦ and 340◦ are identical in effect. Consequently, fallacious conclusions may be reached if such responses are analysed using existing methods. To avoid such an impediment, the comparison among circular treatment responses is made with
respect to a preferred direction, which is treated as a reference point. In general, a
preferred direction should be chosen by practitioners as per the requirement of the
study. A preferred direction can be chosen in multiple ways. For example, in medical
studies related to shoulder movement, it is usually seen that a perfect shoulder allows
90◦ of internal rotation (Jain et al. 2013), and the preferred direction should be taken
as 90◦ in that context. However, preferred direction can also be data driven.
P(δk,i+1 = 1 | Fi) = ρk(θ̂(i)),
where ρk(θ̂(i)) is a strongly consistent estimator of ρk based on the available data up to and including the ith subject. In practice, we use sequentially updated maximum likelihood estimators and plug them into the allocation function at every stage to calculate the allocation probabilities.
Since for any allocation design the primary concern is ethics, we study the behaviour of the observed proportion of allocation to different treatments. If we denote the number of allocations made by the proposed design to treatment k out of n assignments by Nkn = δk,1 + δk,2 + · · · + δk,n, the observed allocation proportion to treatment k is simply Nkn/n. Then, under certain widely satisfied restrictions on the response distribution (Hu et al. 2004) and continuity of ρk(θ1, θ2, . . . , θt) in each of its arguments for every k = 1, 2, . . . , t, we have the following result.
Result: As n → ∞,
Nkn/n → ρk(θ)
almost surely for each k = 1, 2, . . . , t.
4 Performance Evaluation
Performance of any allocation design needs to be assessed in the light of both ethics
and optimality. An allocation function exhibits strong ethical perspective if it allo-
cates a higher number of patients to the better performing treatment arm. In this context, the expected allocation proportions (EAPs), defined by E(Nkn/n), k = 1, 2, . . . , t, can be regarded as a measure of ethics, where a higher value of EAP for the better performing treatment arm indicates the ethical impact of the allocation design.
Again to measure efficiency, we use the power of a relevant test of equality of treat-
ment effects. But the concerned test is not a simple adaptation of the usual test of
homogeneity for linear responses. In fact, for circular responses if μk is the mean
direction associated with the kth treatment, then treatments j and k are equally effec-
tive if d(μk , 0) = d(μ j , 0) or equivalently if μk = μ j ( mod 2π) or μk = 2π − μ j (
mod 2π). However, the distance functions are linear in nature, and hence as an alter-
native, we consider testing the null
H0 : d(μ1 , 0) = d(μ2 , 0) = · · · d(μt , 0) against the alternative H1 : at least one
inequality in H0 .
Assuming treatment 1 as experimental and others as existing, we define the
contrast-based homogeneity test statistic
$$T_n = (H\hat{d})^{T}\left(H\,\hat{\Sigma}_{\hat{d}}\,H^{T}\right)^{-1}(H\hat{d}),$$
where
$$\hat{d}_{t\times 1} = \begin{pmatrix} d(\hat{\mu}_1, 0)\\ d(\hat{\mu}_2, 0)\\ \vdots\\ d(\hat{\mu}_t, 0) \end{pmatrix}, \qquad H_{(t-1)\times t} = \begin{bmatrix} 1 & -1 & 0 & \cdots & 0\\ 1 & 0 & -1 & \cdots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 1 & 0 & 0 & \cdots & -1 \end{bmatrix},$$
$\hat{\Sigma}_{\hat{d}}$ is the estimated dispersion matrix of $\hat{d}_{t\times 1}$, and $\hat{\mu}_k$ is a strongly consistent estimator of $\mu_k$ based on $n$ observations generated through the proposed adaptive allocation design. Naturally, a larger value of $T_n$ indicates departure from the null hypothesis, and hence a right-tailed test based on $T_n$ is appropriate for testing $H_0$ against $H_1$.
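For readers who want to compute the statistic numerically, a small sketch is given below. The contrast matrix H follows the definition above, while the estimated distances and the dispersion matrix are hypothetical placeholders; in practice both would be estimated from the trial data, and large values of T_n lead to rejection in a right-tailed test.

```python
import numpy as np

def contrast_test_statistic(d_hat, sigma_hat):
    """T_n = (H d_hat)' (H Sigma_hat H')^{-1} (H d_hat) for t treatments."""
    t = len(d_hat)
    H = np.hstack([np.ones((t - 1, 1)), -np.eye(t - 1)])   # contrasts against treatment 1
    Hd = H @ d_hat
    return float(Hd @ np.linalg.solve(H @ sigma_hat @ H.T, Hd))

# Hypothetical estimated distances d(mu_hat_k, 0) and dispersion matrix.
d_hat = np.array([0.04, 0.36, 0.92])
sigma_hat = np.diag([0.01, 0.02, 0.05])
print("T_n =", contrast_test_statistic(d_hat, sigma_hat))
```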
Now to evaluate the proposed procedure from a real clinical perspective, we consider
a real trial on small incision cataract surgery (Bakshi 2010). We take into account
three competing treatments, namely snare technique (see Basti 1993), irrigating vec-
tis technique (see Masket 2004) and torsional phacoemulsification (see Mackool and
Brint 2004) based on 19, 18 and 16 observations, respectively. Responses corre-
sponding to each treatment are circular in nature, and hence the trial is appropriate to
judge the performance of the proposed allocation.

Fig. 1 Limiting allocation proportions under von Mises responses for four treatments: (a) κ1 = 2, κ2 = 2, κ3 = 2, κ4 = 2; (b) κ1 = 1, κ2 = 1, κ3 = 1, κ4 = 1; (c) κ1 = 1, κ2 = 1, κ3 = 2, κ4 = 2; (d) κ1 = 2, κ2 = 2, κ3 = 1, κ4 = 1

The responses obtained from these
three types of surgical interventions, namely the snare, irrigating vectis and torsional phacoemulsification techniques, are assumed to follow von Mises distributions with parameters (μs, κs), (μv, κv) and (μt, κt), respectively, and the rationale behind this assumption is verified by Watson's goodness-of-fit test (Mardia and Jupp 2004). In the light of these three independent competing treatments, the proposed allocation design is redesigned with the following parameter choices, estimated from the available data points.
For the snare technique, the parameters are estimated as μ̂s = 20.67°, κ̂s = 1.59; for irrigating vectis, the estimates are μ̂v = 52.71°, κ̂v = 1.27; and for torsional phacoemulsification, the parameter estimates are μ̂t = 2.29°, κ̂t = 4.99. As far as distance from the preferred direction is concerned, torsional phacoemulsification appears to be much better than its competitors, followed by the snare technique. In addition, torsional phacoemulsification has a significantly higher concentration than the others. Thus, this treatment clearly emerges as the best one. From Tables 3 and 4, we find that the proposed optimal allocation design produces about 23% higher EAP for the superior treatment, torsional phacoemulsification, and reduces the EAP for the other treatments as compared to the original allocation. This naturally shows the ethical impact of the proposed optimal response-adaptive allocation and hence makes the proposed allocation desirable in real clinical trials.
6 Concluding Remarks
The current work develops an optimal treatment allocation design for multiple arms by minimizing the total number of treatment failures subject to a fixed precision. Although the essence of the proposed design is ethical, the optimality of inference for detecting treatment effects is not compromised; in fact, the design competes well with the equal allocation design. However, no covariate effect is studied here, and this is left for future consideration.
Acknowledgements The authors of this paper would like to thank the anonymous referees for
their valuable comments towards the betterment of the current work.
References
Atkinson, A. C., & Biswas, A. (2014). Randomised response-adaptive designs in clinical trials.
Boca Raton: CRC Press.
Bakshi, P. (2010). Evaluation of various surgical techniques in Brunescent cataracts (Unpublished
thesis). Disha Eye Hospital, India.
Basti, S., Vasavada, A. R., Thomas, R., & Padhmanabhan, P. (1993). Extracapsular cataract extrac-
tion: Surgical techniques. Indian Journal Ophthalmology, 41, 195–210.
Bazaraa, M., Sherali, H., & Shetty, C. M. (2006). Nonlinear programming: Theory and algorithms
(3rd edn.). Chichester, London: Wiley.
Biswas, A., Bhattacharya, R., Mukherjee, T. (2017). An adaptive allocation design for circular
treatment outcome. Journal of Statistical Theory and Practice. https://doi.org/10.1080/15598608.
2017.1307147.
Biswas, A., & Coad, D. S. (2005). A general multi-treatment adaptive design for multivariate
responses. Sequential Analysis, 24, 139–158.
Biswas, A., Dutta, S., Laha, A. K., & Bakshi, P. K. (2015). Response-adaptive allocation for circular
data. Journal of Biopharmaceutical Statistics, 25, 830–842.
Fisher, N. I. (1993). Statistical analysis of circular data. Cambridge: Cambridge University Press.
Hu, F., & Zhang, L. X. (2004). Asymptotic properties of doubly adaptive biased coin design for
multi-treatment clinical trials. Annals of Statistics, 32, 268–301.
Jain, N. B., Wilcox, R. B., III, Katz, J. N., & Higgins, L. D. (2013). Clinical examination of the rotator cuff. PM&R, 5, 45–56.
Jammalamadaka, S. R., & SenGupta, A. (2001). Topics in circular statistics. Singapore: World
Scientific.
Mackool, R. J., & Brint, S. F. (2004). AquaLase: A new technology for cataract extraction. Current
Opinion Ophthalmology, 15, 40–43.
Mardia, K. V., & Jupp, P. E. (2004). Directional statistics. Chichester, London: Wiley.
Masket, S. (2004). The beginning of modern cataract surgery. The evaluation of small incision
cataract surgery—A short history of ophthalmologists in progress. Cataract and Refractory Surg-
eries Today, 77–80.
Rosenberger, W. F., & Lachin, J. L. (2002). Randomisation in clinical trials: Theory and practice.
New York: Wiley.
Silvey, S. (1980). Optimal designs: An introduction to the theory for parameter estimation. Springer
Texts.
Stochastic Comparisons of Systems with
Heterogeneous Log-Logistic Components
1 Introduction
S. Ghosh (B)
Department of Mathematical Statistics and Actuarial Science, University of the Free State,
Bloemfontein, South Africa
e-mail: [email protected]
P. Majumder
Department of Mathematics, Indian Institute of Technology Bombay, Mumbai, India
e-mail: [email protected]
M. Mitra
Department of Mathematics, Indian Institute of Engineering Science and Technology,
Howrah, India
e-mail: [email protected]
tribution function and its quite flexible hazard rate function. It is a good alternative
to the Weibull, whose hazard rate function is either increasing or decreasing, i.e.,
monotonic, depending on the value of its shape parameter. As such, the use of the
Weibull distribution may be inappropriate where the course of the disease is such that
mortality reaches a peak after some finite period and then slowly declines. Addition-
ally, the LLD is also connected to extreme value distributions. As shown by Lawless
(1986), the Weibull distribution has paramount importance in reliability theory as
it is the only distribution that belongs to two families of extreme value distribu-
tions, each of which has essential qualities for the study of proportional hazard and
accelerated failure times. Thus, the LLD possesses the nice characteristic of being a
representative of both these families.
A random variable (r.v.) X is said to have the LLD with shape parameter α and
scale parameter γ, written as LLD(α, γ), if its probability density function (pdf) is
given by
$$f(x; \alpha, \gamma) = \frac{\alpha\gamma(\gamma x)^{\alpha-1}}{\left(1 + (\gamma x)^{\alpha}\right)^{2}}, \quad x \geq 0,\ (\alpha > 0,\ \gamma > 0). \tag{1.1}$$
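As a quick numerical illustration of the hazard-rate flexibility discussed above, the following sketch evaluates the LLD density of (1.1) together with the implied survival and hazard functions; the helper names and parameter values are ours.

```python
import numpy as np

def lld_pdf(x, alpha, gamma):
    """LLD(alpha, gamma) density as in (1.1)."""
    u = (gamma * x) ** alpha
    return alpha * gamma * (gamma * x) ** (alpha - 1) / (1.0 + u) ** 2

def lld_sf(x, alpha, gamma):
    """Survival function 1 / (1 + (gamma x)^alpha)."""
    return 1.0 / (1.0 + (gamma * x) ** alpha)

def lld_hazard(x, alpha, gamma):
    return lld_pdf(x, alpha, gamma) / lld_sf(x, alpha, gamma)

x = np.linspace(0.05, 5.0, 200)
h = lld_hazard(x, alpha=2.0, gamma=1.0)
# For alpha > 1 the hazard rises to a peak and then declines,
# unlike the monotone Weibull hazard.
print("hazard peaks near x =", x[np.argmax(h)])
```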
Just as one gets the log-normal and log-Pearson distributions from the normal and Pearson distributions, the LLD is obtained by taking the logarithmic transformation of the
logistic distribution. The LLD is also a special case of the ‘kappa distributions’
introduced by Mielke and Johnson (1973). Another interesting fact is that LLD can
also be obtained from the ratio of two independent Stacy’s generalized gamma vari-
ables (see Malik 1967; Block and Rao 1973). Even though different properties of
this distribution have been explored intensely by many researchers, the stochastic
comparisons of their extreme order statistics have not been studied so far. This is the
primary motivation behind the present work.
But first, a few words about order statistics, which occupy a place of remarkable importance in both theory and practice. They play a vital role in many areas including reliability theory, economics, management science, operations research, insurance, hydrology, etc., and have received a lot of attention in the literature during the last several decades (see, e.g., the two encyclopedic volumes by Balakrishnan and Rao 1998a, b). Let X1:n ≤ · · · ≤ Xn:n represent the order statistics corresponding to the
n independent random variables (r.v.’s) X1 , . . . , Xn .
It is a well-known fact that the kth order statistic Xk:n represents the lifetime of a
(n − k + 1)-out-of-n system which happens to be a suitable structure for redundancy
that has been studied by many researchers. Series and parallel systems, which are the
building blocks of many complex coherent systems, are particular cases of a k-out-
of-n system. A series system can be regarded as an n-out-of-n system, while a parallel
system is a 1-out-of-n system. In the past two decades, a large volume of work has
been carried out to compare the lifetimes of the series and parallel systems formed
with components from various parametric models; see Fang and Zhang (2015), Zhao
and Balakrishnan (2011), Fang and Balakrishnan (2016), Li and Li (2015), Torrado
(2015), Torrado and Kochar (2015), Kundu and Chowdhury (2016), Nadarajah et al.
(2017), Majumder et al. (2020) and the references therein.
Here, we investigate comparison results between the lifetimes of series and par-
allel systems formed with LLD samples in terms of different ordering notions such
as stochastic order, hazard rate order, reversed hazard rate order, and likelihood ratio
order. These orders are widely used in the literature for fair and reasonable com-
parison (see Shaked and Shanthikumar 2007). The rest of the paper is presented as
follows. Preliminary definitions and useful lemmas can be found in Sect. 2. In Sect. 3,
we discuss the comparison of lifetimes of parallel systems with heterogeneous LLD
components. We also study the comparison in the case of the multiple-outlier LLD
model. In Sect. 4, ordering properties are discussed for the lifetimes of series systems
with heterogeneous LLD components.
Throughout this article, 'increasing' and 'decreasing' mean 'nondecreasing' and 'nonincreasing,' respectively, and the notation $f(x) \stackrel{\text{sign}}{=} g(x)$ implies that $f(x)$ and $g(x)$ are equal in sign.
Here, we review some definitions and various notions of stochastic orders and
majorization concepts.
Definition 1 (Shaked and Shanthikumar 2007) Let X and Y be two absolutely con-
tinuous r.v.’s with cumulative distribution functions (cdfs) F(·) and G(·), survival
functions F̄(·) and Ḡ(·), pdfs f (·) and g(·), hazard rates hF (·) and hG (·), and reverse
hazard rate functions rF (·) and rG (·), respectively.
(i) If F̄(x) ≤ Ḡ(x) for all x ≥ 0, then X is smaller than Y in the usual stochastic
order, denoted by X ≤st Y .
(ii) If Ḡ(x)/F̄(x) is increasing in x ≥ 0, then X is smaller than Y in the hazard rate
order, denoted by X ≤hr Y .
(iii) If G(x)/F(x) is increasing in x ≥ 0, then X is smaller than Y in the reversed
hazard rate order, denoted by X ≤rh Y .
(iv) If g(x)/f (x) is increasing in x ≥ 0, then X is smaller than Y in the likelihood
ratio order, denoted by X ≤lr Y .
It is well known that
$$X \leq_{lr} Y \Longrightarrow X \leq_{hr} Y \Longrightarrow X \leq_{st} Y \quad \text{and} \quad X \leq_{lr} Y \Longrightarrow X \leq_{rh} Y \Longrightarrow X \leq_{st} Y,$$
but the opposite implications do not hold in general. Also, $X \leq_{hr} Y$ and $X \leq_{rh} Y$ do not imply each other.
The notion of majorization is a key concept in the theory of stochastic inequalities. Let $(x_{(1)}, x_{(2)}, \ldots, x_{(n)})$ denote the components of the vector $x = (x_1, x_2, \ldots, x_n)$ arranged in increasing order. The vector $x$ is said to be majorized by the vector $y$, written $x \stackrel{m}{\preceq} y$, if
$$\sum_{i=1}^{j} x_{(i)} \geq \sum_{i=1}^{j} y_{(i)} \quad \text{for } j = 1, \ldots, n-1, \qquad \text{and} \qquad \sum_{i=1}^{n} x_{(i)} = \sum_{i=1}^{n} y_{(i)}.$$
Similarly, $x$ is said to be weakly majorized by $y$, written $x \stackrel{w}{\preceq} y$, if
$$\sum_{i=1}^{j} x_{(i)} \geq \sum_{i=1}^{j} y_{(i)} \quad \text{for } j = 1, \ldots, n.$$
Clearly,
$$x \stackrel{m}{\preceq} y \Longrightarrow x \stackrel{w}{\preceq} y. \tag{2.1}$$
This section considers stochastic comparisons between the lifetimes of parallel sys-
tems whose components arise from two sets of heterogeneous LLD samples with a
common shape parameter but different scale parameters and vice versa.
Let $X_{\gamma_i}$ for $i = 1, \ldots, n$ be $n$ independent nonnegative r.v.'s following LLD$(\alpha, \gamma_i)$ with density function given by (1.1). Let the lifetime of the parallel system formed from $X_{\gamma_1}, X_{\gamma_2}, \ldots, X_{\gamma_n}$ be $X^{\gamma}_{n:n}$. Then, its distribution and density functions are given by
$$F^{\gamma}_{n:n}(x) = \prod_{i=1}^{n} F_{\gamma_i}(x), \qquad f^{\gamma}_{n:n}(x) = \left[\prod_{i=1}^{n} F_{\gamma_i}(x)\right] \sum_{i=1}^{n} r_{F_{\gamma_i}}(x),$$
and the reversed hazard rate function is
$$r^{\gamma}_{n:n}(x) = \frac{f^{\gamma}_{n:n}(x)}{F^{\gamma}_{n:n}(x)} = \sum_{i=1}^{n} r_{F_{\gamma_i}}(x).$$
At first, we compare two different parallel systems with common shape parameter
under reversed hazard rate ordering.
Theorem 1 For i = 1, 2, . . . , n, let Xγi and Xβi be two sets of independent r.v.’s such
that Xγi ∼ LLD(α, γi ) and Xβi ∼ LLD(α, βi ) where γi , βi > 0. Then for 0 < α ≤ 1,
$$(\gamma_1, \ldots, \gamma_n) \stackrel{w}{\preceq} (\beta_1, \ldots, \beta_n) \Longrightarrow X^{\gamma}_{n:n} \leq_{rh} X^{\beta}_{n:n}.$$
Proof Fix $x \geq 0$. The reversed hazard rate function of $X^{\gamma}_{n:n}$ is
$$r^{\gamma}_{n:n}(x) = \alpha x^{-1} \sum_{i=1}^{n} \left(1 + (\gamma_i x)^{\alpha}\right)^{-1} = \alpha x^{-1} \sum_{i=1}^{n} \kappa(\gamma_i x).$$
One can have the following corollary which is an easy consequence of the relation
(2.1).
Corollary 1 For i = 1, 2, . . . , n, let Xγi and Xβi be two sets of independent r.v.’s such
that Xγi ∼ LLD(α, γi ) and Xβi ∼ LLD(α, βi ) where γi , βi > 0. Then for 0 < α ≤ 1,
$$(\gamma_1, \ldots, \gamma_n) \stackrel{m}{\preceq} (\beta_1, \ldots, \beta_n) \Longrightarrow X^{\gamma}_{n:n} \leq_{rh} X^{\beta}_{n:n}.$$
The above results ensure that, for two parallel systems having independent LLD components with a common shape parameter, the system whose scale parameter vector is majorized has the smaller lifetime in the sense of the reversed hazard rate ordering. In the following theorem, we investigate whether the systems
are ordered under likelihood ratio ordering for the case n = 2.
Theorem 2 For i = 1, 2, let Xγi and Xβi be two sets of independent r.v.’s such that
Xγi ∼ LLD(α, γi ) and Xβi ∼ LLD(α, βi ) where γi , βi > 0. Then for 0 < α ≤ 1,
$$(\gamma_1, \gamma_2) \stackrel{m}{\preceq} (\beta_1, \beta_2) \Longrightarrow X^{\gamma}_{2:2} \leq_{lr} X^{\beta}_{2:2}.$$
Proof It suffices to show that
$$\frac{f^{\beta}_{2:2}(x)}{f^{\gamma}_{2:2}(x)} = \frac{F^{\beta}_{2:2}(x)}{F^{\gamma}_{2:2}(x)} \cdot \frac{r^{\beta}_{2:2}(x)}{r^{\gamma}_{2:2}(x)} \quad \text{is increasing in } x. \tag{3.1}$$
From Corollary 1, we already have that $F^{\beta}_{2:2}(x)/F^{\gamma}_{2:2}(x)$ is increasing in $x$ for $0 < \alpha \leq 1$. So, (3.1) implies that it only remains to show that $\psi(x) = r^{\beta}_{2:2}(x)/r^{\gamma}_{2:2}(x)$ is increasing in $x$. Now the reversed hazard rate function of $X^{\beta}_{2:2}$ is given by
$$r^{\beta}_{2:2}(x) = \alpha x^{-1}\left[(1 + (\beta_1 x)^{\alpha})^{-1} + (1 + (\beta_2 x)^{\alpha})^{-1}\right].$$
Then, $\psi(x) = \dfrac{\kappa(\beta_1 x) + \kappa(\beta_2 x)}{\kappa(\gamma_1 x) + \kappa(\gamma_2 x)}$, where $\kappa(x)$ is defined as in Lemma 3. Observe that
$$\begin{aligned}
\psi'(x) &\stackrel{\text{sign}}{=} \left[\kappa'(\beta_1 x) + \kappa'(\beta_2 x)\right]\left[\kappa(\gamma_1 x) + \kappa(\gamma_2 x)\right] - \left[\kappa(\beta_1 x) + \kappa(\beta_2 x)\right]\left[\kappa'(\gamma_1 x) + \kappa'(\gamma_2 x)\right]\\
&\stackrel{\text{sign}}{=} \left[\kappa(\beta_1 x)\eta(\beta_1 x) + \kappa(\beta_2 x)\eta(\beta_2 x)\right]\left[\kappa(\gamma_1 x) + \kappa(\gamma_2 x)\right]\\
&\quad - \left[\kappa(\beta_1 x) + \kappa(\beta_2 x)\right]\left[\kappa(\gamma_1 x)\eta(\gamma_1 x) + \kappa(\gamma_2 x)\eta(\gamma_2 x)\right].
\end{aligned}$$
Thus, showing that $\psi(x)$ is increasing in $x$, i.e., $\psi'(x) \geq 0$ for all $x \geq 0$, is equivalent to proving that
$$\phi(\beta_1, \beta_2) = \frac{\kappa(\beta_1 x)\eta(\beta_1 x) + \kappa(\beta_2 x)\eta(\beta_2 x)}{\kappa(\beta_1 x) + \kappa(\beta_2 x)}$$
is Schur-convex in $(\beta_1, \beta_2)$. Now, the function $\varphi(x)$ defined in Lemma 4 turns out to be $\kappa(x)\eta'(x)$, where $\kappa(x)$ and $\eta'(x)$ are defined as before. We thus have
$$\begin{aligned}
\frac{\partial \phi}{\partial \beta_1} &\stackrel{\text{sign}}{=} \left[\kappa'(\beta_1 x)\eta(\beta_1 x) + \kappa(\beta_1 x)\eta'(\beta_1 x)\right]\left[\kappa(\beta_1 x) + \kappa(\beta_2 x)\right] - \left[\kappa(\beta_1 x)\eta(\beta_1 x) + \kappa(\beta_2 x)\eta(\beta_2 x)\right]\kappa'(\beta_1 x)\\
&= \kappa'(\beta_1 x)\kappa(\beta_2 x)\left[\eta(\beta_1 x) - \eta(\beta_2 x)\right] + \varphi(\beta_1 x)\left[\kappa(\beta_1 x) + \kappa(\beta_2 x)\right]
\end{aligned}$$
and
$$\frac{\partial \phi}{\partial \beta_2} \stackrel{\text{sign}}{=} \kappa(\beta_1 x)\kappa'(\beta_2 x)\left[\eta(\beta_2 x) - \eta(\beta_1 x)\right] + \varphi(\beta_2 x)\left[\kappa(\beta_1 x) + \kappa(\beta_2 x)\right].$$
Thus,
$$\begin{aligned}
\frac{\partial \phi}{\partial \beta_1} - \frac{\partial \phi}{\partial \beta_2} &\stackrel{\text{sign}}{=} \left[\eta(\beta_1 x) - \eta(\beta_2 x)\right]\left[\kappa'(\beta_1 x)\kappa(\beta_2 x) + \kappa'(\beta_2 x)\kappa(\beta_1 x)\right]\\
&\quad + \left[\kappa(\beta_1 x) + \kappa(\beta_2 x)\right]\left[\varphi(\beta_1 x) - \varphi(\beta_2 x)\right].
\end{aligned}$$
From Lemma 4, $\varphi(x)$ is increasing in $x$ for $0 < \alpha \leq 1$. This, together with the observation $\beta_1 \leq \beta_2$ and the facts that $\kappa(x)$ and $\eta(x)$ are decreasing functions of $x$, yields
$$(\beta_1 - \beta_2)\left(\frac{\partial \phi}{\partial \beta_1} - \frac{\partial \phi}{\partial \beta_2}\right) \geq 0.$$
It is worth mentioning here that for α > 1 the above result may not hold, as the
next example shows.
Example 1 Let $(X_{\gamma_1}, X_{\gamma_2})$ and $(X_{\beta_1}, X_{\beta_2})$ be two sets of vectors of heterogeneous LLD r.v.'s with shape parameter $\alpha = 1.5$ and scale parameters $(\gamma_1, \gamma_2) = (0.5, 1.5)$ and $(\beta_1, \beta_2) = (0.3, 1.7)$. Then obviously $(\gamma_1, \gamma_2) \stackrel{m}{\preceq} (\beta_1, \beta_2)$, but $f^{\beta}_{2:2}(x)/f^{\gamma}_{2:2}(x)$ is not monotonic, as is evident from Fig. 1. Hence, in Theorem 2, the restriction over $\alpha$ is necessary to get the $\leq_{lr}$ order comparison.
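A small numerical check of Example 1 is sketched below, using our own helper functions for the LLD distribution and the density of the parallel (2:2) system; the example reports the ratio to be non-monotone for α = 1.5.

```python
import numpy as np

def lld_cdf(x, alpha, gamma):
    return (gamma * x) ** alpha / (1.0 + (gamma * x) ** alpha)

def lld_pdf(x, alpha, gamma):
    u = (gamma * x) ** alpha
    return alpha * gamma * (gamma * x) ** (alpha - 1) / (1.0 + u) ** 2

def parallel_density(x, alpha, scales):
    """Density of X_{2:2}, the maximum of two independent LLD(alpha, gamma_i) lifetimes."""
    g1, g2 = scales
    return (lld_pdf(x, alpha, g1) * lld_cdf(x, alpha, g2)
            + lld_pdf(x, alpha, g2) * lld_cdf(x, alpha, g1))

x = np.linspace(0.05, 2.0, 400)
ratio = parallel_density(x, 1.5, (0.3, 1.7)) / parallel_density(x, 1.5, (0.5, 1.5))
diffs = np.diff(ratio)
monotone = bool(np.all(diffs >= 0) or np.all(diffs <= 0))
print("ratio monotone on the grid?", monotone)   # Example 1 indicates this is False
```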
The next theorem shows that the likelihood ratio order holds between two parallel systems formed with heterogeneous LLD components where the heterogeneity occurs in the scale parameters.
Fig. 1 Plot of $f^{\beta}_{2:2}(x)/f^{\gamma}_{2:2}(x)$ when α = 1.5, γ = (0.5, 1.5), and β = (0.3, 1.7)
Proof The reversed hazard rate function of $X_{2:2}$ has the form
$$r_{2:2}(x) = \alpha x^{-1}\left[(1 + (\gamma_1 x)^{\alpha})^{-1} + (1 + (\gamma x)^{\alpha})^{-1}\right].$$
Let
$$\psi(x) = \frac{r^{*}_{2:2}(x)}{r_{2:2}(x)} = \frac{(1 + (\gamma x)^{\alpha})^{-1} + (1 + (\gamma^{*} x)^{\alpha})^{-1}}{(1 + (\gamma x)^{\alpha})^{-1} + (1 + (\gamma_1 x)^{\alpha})^{-1}}.$$
Now, utilizing Eq. (3.1) and Theorem 1, it only remains to show that $\psi(x)$ is increasing in $x$, i.e., $\psi'(x) \geq 0$ for all $x \geq 0$. Differentiating $\psi(x)$ with respect to $x$ and using the functions $\kappa(x)$ and $\eta(x)$ defined earlier, we get
$$\begin{aligned}
\psi'(x) &\stackrel{\text{sign}}{=} \left[(1 + (\gamma_1 x)^{\alpha})^{-1} + (1 + (\gamma x)^{\alpha})^{-1}\right]\left[-(\gamma x)^{\alpha}(1 + (\gamma x)^{\alpha})^{-2} - (\gamma^{*} x)^{\alpha}(1 + (\gamma^{*} x)^{\alpha})^{-2}\right]\\
&\quad - \left[(1 + (\gamma x)^{\alpha})^{-1} + (1 + (\gamma^{*} x)^{\alpha})^{-1}\right]\left[-(\gamma x)^{\alpha}(1 + (\gamma x)^{\alpha})^{-2} - (\gamma_1 x)^{\alpha}(1 + (\gamma_1 x)^{\alpha})^{-2}\right]\\
&= \left[\kappa(\gamma x)\eta(\gamma x) + \kappa(\gamma^{*} x)\eta(\gamma^{*} x)\right]\left[\kappa(\gamma_1 x) + \kappa(\gamma x)\right] - \left[\kappa(\gamma x) + \kappa(\gamma^{*} x)\right]\left[\kappa(\gamma_1 x)\eta(\gamma_1 x) + \kappa(\gamma x)\eta(\gamma x)\right]\\
&= \kappa(\gamma_1 x)\kappa(\gamma^{*} x)\left[\eta(\gamma^{*} x) - \eta(\gamma_1 x)\right] + \kappa(\gamma_1 x)\kappa(\gamma x)\left[\eta(\gamma x) - \eta(\gamma_1 x)\right] + \kappa(\gamma x)\kappa(\gamma^{*} x)\left[\eta(\gamma^{*} x) - \eta(\gamma x)\right].
\end{aligned}$$
Thus in both the cases, $\psi(x)$ is increasing in $x$. Hence, the theorem follows.
Now we establish a comparison between parallel systems based on two sets of hetero-
geneous LLD r.v.’s with common scale parameter and majorized shape parameters
according to stochastic ordering.
Theorem 4 For i = 1, 2, . . . , n, let Xαi and Xβi be two sets of independent r.v.’s with
Xαi ∼ LLD(αi , γ) and Xβi ∼ LLD(βi , γ) where αi , βi > 0. Then for any γ > 0,
$$(\alpha_1, \ldots, \alpha_n) \stackrel{m}{\preceq} (\beta_1, \ldots, \beta_n) \Longrightarrow X^{\alpha}_{n:n} \leq_{st} X^{\beta}_{n:n}.$$
Proof The distribution function of $X^{\alpha}_{n:n}$ is
$$F^{\alpha}_{n:n}(x) = \prod_{i=1}^{n} F_{\alpha_i}(x) = \prod_{i=1}^{n} (\gamma x)^{\alpha_i}\left(1 + (\gamma x)^{\alpha_i}\right)^{-1} = \prod_{i=1}^{n} \zeta_{\gamma x}(\alpha_i).$$
Fig. 2 Plot of the reversed hazard rate functions of $X^{\alpha}_{2:2}$ (continuous line) and $X^{\beta}_{2:2}$ (dashed line) when γ = 2, (α1, α2) = (2.5, 1.5) and (β1, β2) = (1, 3)
Proof In view of Theorem 2, an equivalent form of (3.1) for this model enables us to complete the proof by simply showing that $r^{\beta}_{n:n}(x)/r^{\gamma}_{n:n}(x)$ is increasing in $x$. Here, the reversed hazard rate of $Y^{\beta}_{n:n}$ is
$$r^{\beta}_{n:n}(x) = \alpha x^{-1}\left[p(1 + (\beta_1 x)^{\alpha})^{-1} + q(1 + (\beta_2 x)^{\alpha})^{-1}\right],$$
where $p + q = n$. Then,
$$\psi(x) = \frac{r^{\beta}_{n:n}(x)}{r^{\gamma}_{n:n}(x)} = \frac{p\kappa(\beta_1 x) + q\kappa(\beta_2 x)}{p\kappa(\gamma_1 x) + q\kappa(\gamma_2 x)}.$$
$$\begin{aligned}
\psi'(x) &\stackrel{\text{sign}}{=} \left[p\kappa(\gamma_1 x) + q\kappa(\gamma_2 x)\right]\left[p\kappa'(\beta_1 x) + q\kappa'(\beta_2 x)\right] - \left[p\kappa(\beta_1 x) + q\kappa(\beta_2 x)\right]\left[p\kappa'(\gamma_1 x) + q\kappa'(\gamma_2 x)\right]\\
&\stackrel{\text{sign}}{=} \left[p\kappa(\beta_1 x)\eta(\beta_1 x) + q\kappa(\beta_2 x)\eta(\beta_2 x)\right]\left[p\kappa(\gamma_1 x) + q\kappa(\gamma_2 x)\right]\\
&\quad - \left[p\kappa(\beta_1 x) + q\kappa(\beta_2 x)\right]\left[p\kappa(\gamma_1 x)\eta(\gamma_1 x) + q\kappa(\gamma_2 x)\eta(\gamma_2 x)\right].
\end{aligned}$$
Further,
$$\begin{aligned}
\frac{\partial \phi}{\partial \beta_1} &\stackrel{\text{sign}}{=} \left[\kappa'(\beta_1 x)\eta(\beta_1 x) + \kappa(\beta_1 x)\eta'(\beta_1 x)\right]\left[p\kappa(\beta_1 x) + q\kappa(\beta_2 x)\right] - \left[p\kappa(\beta_1 x)\eta(\beta_1 x) + q\kappa(\beta_2 x)\eta(\beta_2 x)\right]\kappa'(\beta_1 x)\\
&= q\kappa'(\beta_1 x)\kappa(\beta_2 x)\left[\eta(\beta_1 x) - \eta(\beta_2 x)\right] + \varphi(\beta_1 x)\left[p\kappa(\beta_1 x) + q\kappa(\beta_2 x)\right]
\end{aligned}$$
and
$$\frac{\partial \phi}{\partial \beta_2} \stackrel{\text{sign}}{=} p\kappa(\beta_1 x)\kappa'(\beta_2 x)\left[\eta(\beta_2 x) - \eta(\beta_1 x)\right] + \varphi(\beta_2 x)\left[p\kappa(\beta_1 x) + q\kappa(\beta_2 x)\right].$$
Now,
$$\begin{aligned}
\frac{\partial \phi}{\partial \beta_1} - \frac{\partial \phi}{\partial \beta_2} &\stackrel{\text{sign}}{=} \left[\eta(\beta_1 x) - \eta(\beta_2 x)\right]\left[q\kappa'(\beta_1 x)\kappa(\beta_2 x) + p\kappa'(\beta_2 x)\kappa(\beta_1 x)\right]\\
&\quad + \left[p\kappa(\beta_1 x) + q\kappa(\beta_2 x)\right]\left[\varphi(\beta_1 x) - \varphi(\beta_2 x)\right].
\end{aligned}$$
Since $\beta_1 \leq \beta_2$, and $\kappa(x)$ and $\eta(x)$ are decreasing in $x$ while $\varphi(x)$ is increasing in $x$, we have, for $0 < \alpha \leq 1$,
$$(\beta_1 - \beta_2)\left(\frac{\partial \phi}{\partial \beta_1} - \frac{\partial \phi}{\partial \beta_2}\right) \geq 0.$$
pendent r.v.’s following the multiple-outlier LLD model such that Yi ∼ LLD(α, γ ∗ )
for i = 1, 2, . . . , p and Yj ∼ LLD(α, γ) for j = p + 1, p + 2, . . . , n with γ ∗ , γ > 0.
Suppose that γ ∗ = min(γ, γ1 , γ ∗ ) then for any α > 0,
$$(\underbrace{\gamma_1, \ldots, \gamma_1}_{p}, \underbrace{\gamma, \ldots, \gamma}_{q}) \stackrel{w}{\preceq} (\underbrace{\gamma^{*}, \ldots, \gamma^{*}}_{p}, \underbrace{\gamma, \ldots, \gamma}_{q}) \Longrightarrow \frac{r^{*}_{n:n}(x)}{r_{n:n}(x)} \text{ is increasing in } x,$$
where $p + q = n$.
Let
$$\psi(x) = \frac{r^{*}_{n:n}(x)}{r_{n:n}(x)} = \frac{q(1 + (\gamma x)^{\alpha})^{-1} + p(1 + (\gamma^{*} x)^{\alpha})^{-1}}{q(1 + (\gamma x)^{\alpha})^{-1} + p(1 + (\gamma_1 x)^{\alpha})^{-1}}.$$
To show that $\psi(x)$ is increasing in $x$, we consider
$$\begin{aligned}
\psi'(x) &\stackrel{\text{sign}}{=} \left[\frac{p}{1 + (\gamma_1 x)^{\alpha}} + \frac{q}{1 + (\gamma x)^{\alpha}}\right]\left[\frac{-q(\gamma x)^{\alpha}}{(1 + (\gamma x)^{\alpha})^{2}} + \frac{-p(\gamma^{*} x)^{\alpha}}{(1 + (\gamma^{*} x)^{\alpha})^{2}}\right]\\
&\quad - \left[\frac{q}{1 + (\gamma x)^{\alpha}} + \frac{p}{1 + (\gamma^{*} x)^{\alpha}}\right]\left[\frac{-q(\gamma x)^{\alpha}}{(1 + (\gamma x)^{\alpha})^{2}} + \frac{-p(\gamma_1 x)^{\alpha}}{(1 + (\gamma_1 x)^{\alpha})^{2}}\right]\\
&= \left[q\kappa(\gamma x)\eta(\gamma x) + p\kappa(\gamma^{*} x)\eta(\gamma^{*} x)\right]\left[p\kappa(\gamma_1 x) + q\kappa(\gamma x)\right] - \left[q\kappa(\gamma x) + p\kappa(\gamma^{*} x)\right]\left[p\kappa(\gamma_1 x)\eta(\gamma_1 x) + q\kappa(\gamma x)\eta(\gamma x)\right]\\
&= p^{2}\kappa(\gamma_1 x)\kappa(\gamma^{*} x)\left[\eta(\gamma^{*} x) - \eta(\gamma_1 x)\right] + pq\kappa(\gamma_1 x)\kappa(\gamma x)\left[\eta(\gamma x) - \eta(\gamma_1 x)\right] + pq\kappa(\gamma x)\kappa(\gamma^{*} x)\left[\eta(\gamma^{*} x) - \eta(\gamma x)\right].
\end{aligned}$$
Now, using the facts that $\eta(x)$ is decreasing in $x$, $\kappa(x) \geq 0$ for all $x > 0$, and $\gamma^{*} = \min(\gamma, \gamma_1, \gamma^{*})$, it is easy to show the following. If $\gamma^{*} \leq \gamma \leq \gamma_1$, then $\psi'(x) \geq 0$ for all $x > 0$. Also, if $\gamma^{*} \leq \gamma_1 \leq \gamma$, we have
$$\begin{aligned}
\psi'(x) &\geq p^{2}\kappa(\gamma_1 x)\kappa(\gamma x)\left[\eta(\gamma^{*} x) - \eta(\gamma_1 x)\right] + pq\kappa(\gamma_1 x)\kappa(\gamma x)\left[\eta(\gamma x) - \eta(\gamma_1 x)\right] + pq\kappa(\gamma_1 x)\kappa(\gamma x)\left[\eta(\gamma^{*} x) - \eta(\gamma x)\right]\\
&= np\,\kappa(\gamma x)\kappa(\gamma_1 x)\left[\eta(\gamma^{*} x) - \eta(\gamma_1 x)\right] \geq 0.
\end{aligned}$$
Thus in both the cases, $\psi'(x) \geq 0$ for all $x > 0$ and the theorem follows.
Observe that if $(\gamma_1, \gamma) \stackrel{w}{\preceq} (\gamma^{*}, \gamma)$ where $\gamma^{*} = \min(\gamma, \gamma_1, \gamma^{*})$, then the parallel system formed by LLD$(\alpha, \gamma_1)$ and LLD$(\alpha, \gamma)$ has a smaller lifetime than the system formed with LLD$(\alpha, \gamma^{*})$ and LLD$(\alpha, \gamma)$ in the reversed hazard rate sense for any shape parameter $\alpha > 0$. Using this fact together with the result in Theorem 1.C.4 of Shaked and Shanthikumar (2007), one can get the following result.
In this section, our main aim is to compare two series systems formed with inde-
pendent heterogeneous LLD samples either having common shape parameter but
different scale parameters or conversely.
Let $X^{\gamma}_{1:n}$ denote the lifetime of the series system formed with $n$ independent nonnegative r.v.'s $X_{\gamma_1}, X_{\gamma_2}, \ldots, X_{\gamma_n}$, where each $X_{\gamma_i} \sim LLD(\alpha, \gamma_i)$. Then, its survival and density functions are given by
$$\bar{F}^{\gamma}_{1:n}(x) = \prod_{i=1}^{n} \bar{F}_{\gamma_i}(x), \qquad f^{\gamma}_{1:n}(x) = \left[\prod_{i=1}^{n} \bar{F}_{\gamma_i}(x)\right] \sum_{i=1}^{n} h_{F_{\gamma_i}}(x),$$
and the hazard rate function is
$$h^{\gamma}_{1:n}(x) = \frac{f^{\gamma}_{1:n}(x)}{\bar{F}^{\gamma}_{1:n}(x)} = \sum_{i=1}^{n} h_{F_{\gamma_i}}(x).$$
The following theorem shows that under a certain condition on the shape param-
eter, one can compare the lifetimes of two series systems with independent LLD
components according to hazard rate ordering.
Theorem 8 For $i = 1, 2, \ldots, n$, let $X_{\gamma_i}$ and $X_{\beta_i}$ be two sets of independent r.v.'s with $X_{\gamma_i} \sim LLD(\alpha, \gamma_i)$ and $X_{\beta_i} \sim LLD(\alpha, \beta_i)$, where $\gamma_i, \beta_i > 0$. Then for $0 < \alpha \leq 1$,
$$(\gamma_1, \ldots, \gamma_n) \stackrel{m}{\preceq} (\beta_1, \ldots, \beta_n) \Longrightarrow X^{\gamma}_{1:n} \leq_{hr} X^{\beta}_{1:n}.$$
Proof Fix $x \geq 0$. The hazard rate function of $X^{\gamma}_{1:n}$ is
$$h^{\gamma}_{1:n}(x) = \alpha x^{-1} \sum_{i=1}^{n} (\gamma_i x)^{\alpha}\left(1 + (\gamma_i x)^{\alpha}\right)^{-1} = \alpha x^{-1} \sum_{i=1}^{n} \tau(\gamma_i x).$$
From the theory of stochastic ordering, we have $\leq_{hr} \Longrightarrow \leq_{st}$. Thus, from the above theorem, it is clear that the result is also valid in the sense of stochastic ordering. The next question that arises naturally is whether the comparison can be extended to likelihood ratio ordering, i.e., whether a version of Theorem 3 for comparison in the sense of likelihood ratio ordering is valid in the context of series systems. The following example gives the answer in the negative.
Example 3 Let $X_{\gamma_i} \sim LLD(\alpha, \gamma_i)$ and $X_{\beta_i} \sim LLD(\alpha, \beta_i)$ for $i = 1, 2$, where the scale parameters are $(\gamma_1, \gamma_2) = (0.5, 1.5)$ and $(\beta_1, \beta_2) = (0.3, 1.7)$, respectively. Now the plot of $f^{\beta}_{1:2}(x)/f^{\gamma}_{1:2}(x)$ for the common shape parameters $\alpha = 0.5$ and $\alpha = 1.5$ is given in Fig. 3a, b, respectively. Obviously, in both the cases $(\gamma_1, \gamma_2) \stackrel{m}{\preceq} (\beta_1, \beta_2)$ holds, but $X^{\gamma}_{1:2} \nleq_{lr} X^{\beta}_{1:2}$ since in both the cases $f^{\beta}_{1:2}(x)/f^{\gamma}_{1:2}(x)$ is not a monotonic function.
Now we consider series systems having heterogeneous LLD components with com-
mon scale parameter and different shape parameters (which are also majorized) and
investigate similar results.
Theorem 9 For i = 1, 2, . . . , n, let Xαi and Xβi be two sets of independent r.v.’s with
Xαi ∼ LLD(αi , γ) and Xβi ∼ LLD(βi , γ) where αi , βi > 0. Then for any γ > 0,
$$(\alpha_1, \ldots, \alpha_n) \stackrel{m}{\preceq} (\beta_1, \ldots, \beta_n) \Longrightarrow X^{\alpha}_{1:n} \geq_{st} X^{\beta}_{1:n}.$$
Proof The survival function of $X^{\alpha}_{1:n}$ is
$$\bar{F}^{\alpha}_{1:n}(x) = \prod_{i=1}^{n} \bar{F}_{\alpha_i}(x) = \prod_{i=1}^{n} \left(1 + (\gamma x)^{\alpha_i}\right)^{-1} = \prod_{i=1}^{n} \upsilon_{\gamma x}(\alpha_i),$$
where $\upsilon_{\gamma x}(\alpha_i) = (1 + (\gamma x)^{\alpha_i})^{-1}$. To establish the result, it is enough to show that $\bar{F}^{\alpha}_{1:n}(x)$ is Schur-concave in $(\alpha_1, \ldots, \alpha_n)$. Observe that the function $\log_e \upsilon_{\gamma x}(\alpha)$ is concave in $\alpha$ for all $\gamma > 0$. Then, an argument similar to that of Theorem 4 yields the result.
Fig. 3 Plot of $f^{\beta}_{1:2}(x)/f^{\gamma}_{1:2}(x)$ when α = 0.5 (panel a) and α = 1.5 (panel b), for γ = (0.5, 1.5) and β = (0.3, 1.7)
In this type of series system model when we compare further, the following
example illustrates that no such comparison can be made in the sense of hazard rate
ordering.
Example 4 Figure 4 illustrates that the lifetimes of the two series systems $X^{\alpha}_{1:2}$ and $X^{\beta}_{1:2}$ with LLD components having common scale parameter $\gamma = 1$ and majorized shape parameters $(\alpha_1, \alpha_2) = (2.5, 1.5)$ and $(\beta_1, \beta_2) = (1, 3)$ are not ordered in the sense of hazard rate ordering.
Fig. 4 Plot of the hazard rate functions of $X^{\alpha}_{1:2}$ (continuous line) and $X^{\beta}_{1:2}$ (dashed line) when γ = 1, (α1, α2) = (2.5, 1.5) and (β1, β2) = (1, 3)

Acknowledgements We thank the anonymous reviewer for his/her helpful comments which have substantially improved the presentation of the paper. The authors are also grateful to Prof. Arnab K. Laha, IIMA, for his constant encouragement and words of advice.
References
Balakrishnan, N. (2007). Permanents, order statistics, outlier, and robustness. Revista matemática
complutense, 20(1), 7–107.
Balakrishnan, N., & Rao, C. R. (1998a). Handbook of statistics, in: Order statistics: Applications
(Vol. 17). Elsevier, Amsterdam.
Balakrishnan, N., & Rao, C. R. (1998b). Handbook of statistics, in: Order statistics: Theory and
methods (Vol. 16). Elsevier, Amsterdam.
Bennet, S. (1983). Log-logistic regression models for survival data. Applied Statistics, 32, 165–171.
Block, H. W., & Rao, B. R. (1973). A beta warning-time distribution and a distended beta distribu-
tion. Sankhya B, 35, 79–84.
Fahim, A., & Smail, M. (2006). Fitting the log-logistic distribution by generalized moments. Journal
of Hydrology, 328, 694–703.
Fang, L., & Balakrishnan, N. (2016). Ordering results for the smallest and largest order statistics
from independent heterogeneous exponential-Weibull random variables. Statistics, 50(6), 1195–
1205.
Fang, L., & Zhang, X. (2015). Stochastic comparisons of parallel systems with exponentiated
Weibull components. Statistics and Probability Letters, 97, 25–31.
Fisk, P. (1961). The graduation of income distributions. Econometrica, 29, 171–185.
Gago-Benítez, A., Fernández-Madrigal, J.-A., & Cruz-Martín, A. (2013). Log-logistic modelling
of sensory flow delays in networked telerobots. IEEE Sensors, 13(8), 2944–2953.
Kundu, A., & Chowdhury, S. (2016). Ordering properties of order statistics from heterogeneous
exponentiated Weibull models. Statistics and Probability Letters, 114, 119–127.
Lawless, J. F. (1986). A note on lifetime regression models. Biometrika, 73(2), 509–512.
Li, C., & Li, X. (2015). Likelihood ratio order of sample minimum from heterogeneous Weibull
random variables. Statistics and Probability Letters, 97, 46–53.
Majumder, P., Ghosh, S., & Mitra, M. (2020). Ordering results of extreme order statistics from
heterogeneous Gompertz-Makeham random variables. Statistics, 54(3), 595–617.
Malik, H. (1967). Exact distribution of the quotient of independent generalized Gamma variables.
Canadian Mathematical Bulletin, 10, 463–466.
Marshall, A., Olkin, I., & Arnold, B. C. (2011). Inequalities: Theory of majorization and its appli-
cations. New York: Springer series in Statistics.
Mielke, P. W., & Johnson, E. (1973). Three-parameter Kappa distribution maximum likelihood
estimates and likelihood ratio tests. Monthly Weather Review, 101, 701–709.
Nadarajah, S., Jiang, X., & Chu, J. (2017). Comparisons of smallest order statistics from Pareto
distributions with different scale and shape parameters. Annals of Operations Research, 254,
191–209.
Shaked, M., & Shanthikumar, J. (2007). Stochastic orders. New York: Springer.
Shoukri, M. M., Mian, I. U. M., & Tracy, D. S. (1988). Sampling properties of estimators of the
log-logistic distribution with application to Canadian precipitation data. The Canadian Journal
of Statistics, 16(3), 223–236.
Torrado, N. (2015). Comparisons of smallest order statistics from Weibull distributions with different
scale and shape parameters. Journal of the Korean Statistical Society, 44, 68–76.
Torrado, N., & Kochar, S. C. (2015). Stochastic order relations among parallel systems from Weibull
distributions. Journal of Applied Probability, 52, 102–116.
Zhao, P., & Balakrishnan, N. (2011). New results on comparison of parallel systems with hetero-
geneous Gamma components. Statistics and Probability Letters, 81, 36–44.
Stacking with Dynamic Weights on Base
Models
1 Introduction
2 Literature Review
Stacking appears in the papers by Wolpert [1] and Breiman [2]. It is widely used by machine learning practitioners to get better classification by creating an ensemble of multiple models based on different classification techniques.
In the conventional way, stacking is done by running a classification technique with the outputs of the base (or primary) learners as independent variables. The target variable is kept the same. Hence, one can think of the overall structure as a function of functions. Classification techniques are run on the dataset, and predictions of the classes are obtained. Such predicted classes go as independent variables into another model, while the target variable remains the same. Usually, the second-level modelling
is done using logistic regression. Finally, classification happens by the whole nested
structure.
Stacking does not always do well because of its rigid structure of applying the same set of weights on the base learners in all parts of the data. The weights are obtained from the second level of the model. We have not found any stacking procedure where the weights given to the base learners vary across different parts of the data according to the performance of the base learners in those parts. Hence, we have developed methods which are narrated in Sects. 4 and 5.
In this conventional way, different models are prepared using different techniques.
Then, the predicted classes by different models are used as predictors along with the
observed class as target to run the upper-level model.
The R programming language functions that are used are 'glm' for logistic regression, 'lda' for linear discriminant analysis, 'rpart' for decision tree, and 'trainControl' and 'train' for k-nearest neighbours. Stacking in the conventional way is done by logistic regression.
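The chapter's implementation is in R ('glm', 'lda', 'rpart', and caret's 'trainControl'/'train'). Purely as an illustrative, hedged equivalent, a conventional stacking pipeline can be sketched in Python/scikit-learn as follows, on synthetic data; note that StackingClassifier feeds the base learners' predicted probabilities, rather than hard class labels, to the logistic-regression meta-model by default.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)

base_learners = [
    ("logit", LogisticRegression(max_iter=1000)),
    ("lda", LinearDiscriminantAnalysis()),
    ("tree", DecisionTreeClassifier(max_depth=4)),
    ("knn", KNeighborsClassifier(n_neighbors=15)),
]
# Conventional stacking: a logistic regression meta-model on top of the base learners.
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(max_iter=1000))
stack.fit(X, y)
print("training accuracy:", stack.score(X, y))
```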
We propose a new way of stacking where the base learners do not get weights from a second-level model whose weights are applied uniformly to all points of the dataset. Our method is dynamic, as it uses a different set of weights in different parts of the data. The method does not need a second level of model. It defines a neighbourhood and checks the number of correct classifications done by the different base learners. The weights provided to the different base learners come from the number of correct classifications done by those base learners in the neighbourhood of the point to be classified. The detailed calculations can be understood in Sect. 4.1.
$$\sum_{i=1}^{k} w_{1i}\, p_{1i}$$
Step 8 The number obtained at Step 7 can be considered as the probability of the event for O1. Thereafter, classification is done based on a cut-off.
Step 9 Repeat the same procedure for the other observations from the new dataset, like O2, O3, etc. The neighbourhood for different observations can be different, and hence the weights they get for the different base learners would be different.
Step 10 Hyperparameter tuning: Run the whole procedure for different values of n to get the optimal value of n. Please note that n is the number of nearest neighbours as mentioned in Step 4.
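Since Steps 1–7 are only partly reproduced above, the following is a minimal sketch of how we read the neighbourhood-based weighting (Steps 4–8) for a single new observation; the variable names, the Euclidean distance and the weight normalization are our own illustrative choices.

```python
import numpy as np

def dynamic_weight_predict(x_new, X_train, y_train, base_probs_train,
                           base_probs_new, n_neighbors=5, cutoff=0.5):
    """Neighbourhood-based dynamic-weight stacking for one new observation.

    base_probs_train : (n_train, k) event probabilities of the k base learners on training data
    base_probs_new   : length-k event probabilities of the base learners for x_new
    """
    # Find the n nearest training points to the new observation.
    dists = np.linalg.norm(X_train - x_new, axis=1)
    nbrs = np.argsort(dists)[:n_neighbors]

    # Count correct classifications of each base learner inside the neighbourhood.
    preds = (base_probs_train[nbrs] >= cutoff).astype(int)              # (n, k)
    correct = (preds == y_train[nbrs, None]).sum(axis=0).astype(float)  # (k,)

    # Weights proportional to the correct-classification counts (uniform if all are zero).
    k = len(correct)
    w = correct / correct.sum() if correct.sum() > 0 else np.full(k, 1.0 / k)

    # Weighted combination of the base learners' probabilities, then the cut-off.
    p_event = float(np.dot(w, base_probs_new))
    return int(p_event >= cutoff), p_event
```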
This method is a variant of the method described in proposed method I. Here, all points in the neighbourhood do not get the same importance. Within the neighbourhood, the distance of a point from the new observation is considered while assigning importance to that point in calculating the weights of the base learners. Adjustment factors are calculated which are higher for a point closer to the new observation in comparison with a distant point. These adjustment factors are used to get an adjusted count of correct classifications by the different base learners. The weights provided to the different base learners come from the number of adjusted correct classifications done by those base learners in the neighbourhood of the new observation to be classified. The detailed calculations can be understood in Sect. 5.1.
Step 12. Repeat the same procedure for the other observations from the new dataset, like O2, O3, etc. The neighbourhood for different observations can be different, and hence the weights they get for the different base learners would be different.
Step 13. Hyperparameter tuning: Run the whole procedure for different values of n to get the optimal value of n.
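The exact adjustment-factor formula of Sect. 5.1 is not reproduced in this excerpt; the sketch below assumes an inverse-distance factor, normalized within the neighbourhood, as one plausible choice. Everything else mirrors the neighbourhood-based sketch given earlier.

```python
import numpy as np

def distance_weighted_predict(x_new, X_train, y_train, base_probs_train,
                              base_probs_new, n_neighbors=6, cutoff=0.5, eps=1e-8):
    """Distance-based dynamic-weight stacking for one new observation."""
    dists = np.linalg.norm(X_train - x_new, axis=1)
    nbrs = np.argsort(dists)[:n_neighbors]

    # Adjustment factors: closer neighbours count more (assumed inverse-distance form).
    adj = 1.0 / (dists[nbrs] + eps)
    adj /= adj.sum()

    # Adjusted count of correct classifications for each base learner.
    preds = (base_probs_train[nbrs] >= cutoff).astype(int)       # (n, k)
    correct = (preds == y_train[nbrs, None]).astype(float)       # (n, k)
    adj_correct = (adj[:, None] * correct).sum(axis=0)           # (k,)

    k = len(adj_correct)
    w = adj_correct / adj_correct.sum() if adj_correct.sum() > 0 else np.full(k, 1.0 / k)
    p_event = float(np.dot(w, base_probs_new))
    return int(p_event >= cutoff), p_event
```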
6 Findings
Table 1 Evaluation of the results on the training dataset of wholesale customer data
Techniques Accuracy (%) Precision (%) Recall (%)
Logistic regression 90 82 90
Linear discriminant analysis 85 94 58
Decision tree 94 93 88
K-nearest neighbor (K = 15) 92 87 89
Stacking by conventional way 94 93 89
The chosen value of K is the one giving the highest accuracy among the different values of K tried
Table 2 Evaluation of the results on the test dataset of wholesale customer data
Techniques AUC Accuracy (%) Precision (%) Recall (%)
Logistic regression 0.97 92 84 93
Linear discriminant analysis 0.97 88 95 64
Decision tree 0.94 92 84 93
K-nearest neighbor (K = 15) 0.97 92 84 93
Stacking by conventional way 0.96 92 84 93
The data consists of female patients at least 21 years old of Pima Indian heritage [4]. It has 768 observations. The data has information about patients' number of pregnancies, plasma glucose concentration at 2 h in an oral glucose tolerance test,
Table 3 Evaluation of the results on the training dataset of Pima Indians diabetes data
Techniques Accuracy (%) Precision (%) Recall (%)
Logistic regression 74 60 80
Linear discriminant analysis 77 73 56
Decision tree 85 84 71
K-nearest neighbor (K = 17) 79 76 58
Stacking by conventional way 85 80 75
The chosen value of K is the one giving the highest accuracy among the different values of K tried
Table 4 Evaluation of the results on the test dataset of Pima Indians diabetes data
Techniques AUC Accuracy (%) Precision (%) Recall (%)
Logistic regression 0.87 77 61 91
Linear discriminant analysis 0.87 77 69 65
Decision tree 0.77 75 65 59
K-nearest neighbor (K = 17) 0.80 72 63 50
Stacking by conventional way 0.81 75 65 65
diastolic blood pressure, triceps skinfold thickness (mm), 2 h serum insulin (mu
U/ml), body mass index (weight in kg/(height in m)ˆ2), diabetes pedigree function
and age (years). Diabetes (Class) is the target variable, which is a binary nominal
variable having categories ‘diabetic’ and ‘non-diabetic’.
We have used different techniques aimed at classifying the categories of the target variable 'diabetes (class)'. The models provide the probability of being 'diabetic' (Tables 3 and 4).
Here, logistic regression performs much better than the other techniques. Surprisingly, the performance of logistic regression is much better on the test dataset than on the training dataset. In the case of linear discriminant analysis, its precision is 8% higher than the precision of logistic regression, but its recall is 26% lower than the recall of logistic regression. The performance of the other techniques is not satisfactory, and thus conventional stacking does not add any benefit to the results when compared with logistic regression. However, both of the new methods of stacking do much better than the conventional way of stacking in recall, though some compromise is there in precision. Accuracy, precision and recall on the test dataset in the case of stacking using
neighbourhood-based dynamic weights with size of neighbourhood n = 4 (found
optimum among the values of n = 2, 3, 4, …, 25) are 74%, 60%, 81%, respec-
tively, while such measures on the test dataset when stacking using distance-based
dynamic weights with size of neighbourhood n = 6 (found optimum among the
values of n = 2, 3, 4, …, 25) is used are 76%, 61%, 87%, respectively. So, we see
that stacking using distance-based dynamic weights performed better than stacking
using neighbourhood-based dynamic weights where accuracy, precision and recall
got increased by 2%, 1% and 6%, respectively.
Table 5 Evaluation of the results on the training dataset of bank note authentication data
Techniques Accuracy (%) Precision (%) Recall (%)
Logistic regression 99 98 99
Linear discriminant analysis 97 95 100
Decision tree 98 97 99
K-nearest neighbor (K = 9) 100 100 100
Stacking by conventional way 100 100 100
The chosen value of K is the one giving the highest accuracy among the different values of K tried
Table 6 Evaluation of the results on the test dataset of bank note authentication data
Techniques AUC Accuracy (%) Precision (%) Recall (%)
Logistic regression 0.99 99 98 99
Linear discriminant analysis 0.99 99 97 100
Decision tree 0.98 99 98 99
K-nearest neighbor (K = 9) 1.00 100 100 100
Stacking by conventional way 1.00 100 100 100
The data is of different types of flowers [6]. It has 150 observations. The data has information about sepal length, sepal width, petal length, petal width and species type. Species is the categorical variable, having categories 'setosa', 'versicolor' and 'virginica'; the other variables are continuous in nature and are used as explanatory variables. Spec_1 is our derived variable used as the target variable, which is binary in nature, stating whether the species is 'versicolor' or not.
We have used different techniques aimed at classifying the categories of the target variable 'Spec_1'. The models provide the probability of being 'versicolor' (Tables 7 and 8).
Here, all of the methods of stacking provide 100% in all three measures, namely accuracy, precision and recall. The size of neighbourhood n used for both of the proposed methods is 3.
7 Conclusion
Both of the proposed methods of stacking with dynamic weights work better than the conventional way of stacking since the proposed methods are flexible. The performance of stacking by the conventional way and by the proposed methods is shown below (Table 9).
Though 'accuracy' and 'precision' are not seen to improve with the proposed methods of stacking, an improvement in 'recall' is noticed. In the 'wholesale customer data', recall is as high as 93% by conventional stacking; still, by applying both of the proposed methods, we are able to increase recall by a further 3%. In the Pima Indian diabetes data, stacking by neighbourhood-based dynamic weights has performed poorer in precision by 5% but has increased recall by 16% over stacking by the conventional way. The results of stacking using distance-based dynamic weights are even more encouraging, as we get a 22% increase in recall over stacking by the conventional way, with a compromise of 4% in precision. The performances of the proposed methods of stacking are the same as that of the conventional way on the other two datasets, as the conventional way did not leave any scope for improvement there by hitting 100% in all of the measures.
Hence, both of the proposed methods, 'stacking by neighbourhood-based dynamic weights' and 'stacking by distance-based dynamic weights', are seen to outperform the conventional way of stacking in recall.
Rudra P. Pradhan
1 Introduction
The connection between infrastructure and economic growth has been the subject of considerable academic research over the past couple of decades (Holmgren and Merkel 2017; Pradhan et al. 2018a, b; WDR 1994). Various studies have concentrated on diverse countries, time periods, statistical techniques and alternative proxy variables for reviewing the infrastructure-growth relationship (Pradhan et al. 2015, 2016; Canning 1999; Duggal et al. 1999; Holtz-Eakin and Schwartz 1995). From the available studies, we observe that the nexus between infrastructure and economic growth is rather inconclusive, and there is consensus neither on the existence nor on the direction of causality. A foremost reason for the absence of
consensus is that the Granger causality test in a bivariate framework is likely to be
biased due to the omission of relevant variables affecting infrastructure and economic
growth nexus (Pradhan et al. 2019; Stoilova 2017; Besley and Persson 2013). This
calls for studying the relationship between the two using a multivariate framework.
No doubt there are a couple of existing studies that examine the relationship between
the two using a multivariate framework by incorporating different macroeconomic
variables that affect the infrastructure-growth linkage (Pradhan et al. 2017, 2018c;
Barro 1991). In this paper, we intend to examine the relationship between infras-
tructure and economic growth by incorporating taxation into the system. There are
many different ways we can justify the inclusion and importance of taxation to the
infrastructure-growth nexus (see, inter alia, Pradhan et al. 2020; Chauvet and Ferry
2016; Yoshino and Abidhadjaev 2016; Besley and Persson 2013). The remainder of
this paper is organized as follows. Section 2 describes our model and data. Section 3
describes the results. Section 4 offers conclusion.
R. P. Pradhan (B)
Indian Institute of Technology Kharagpur, Kharagpur, India
e-mail: [email protected]
The paper uses the following panel data modelling to investigate the effect of
infrastructure and taxation on economic growth.
The below regression model is set for this investigation.
3 Empirical Results
The dynamic panel data model specified in Eq. (2) is used to estimate the impact of infrastructure and taxation on per capita economic growth. The estimated results are shown in Tables 1, 2 and 3, depending upon the use of three subsets.
First subset is for upper middle-income countries (UMICs), second subset is for
lower middle-income countries (LMICs), and third subset is for total middle-income
countries (MICs), combined UMICs and LMICs. Each subset has six different cases,
depending upon the use of six ICT infrastructure indicators, namely TLL, MOB, INU,
INS, FIB, and ICI. These are as follows:
Case 1 [C1] deals with the relationship between PEGt , PEGt−1 , TAXt , TRAt ,
TLLt , GEXt , GCFt , FDIt , OPEt , INFt , POGt , and HCIt .
1 It includes both ICT infrastructure and transportation infrastructure. ICT infrastructure includes
the use of telephone land lines (TLL), mobile phones (MOB), Internet users (INU), Internet servers
(INS), fixed broadband (FIB), and a composite index (ICI), while transportation infrastructure
includes the use of road length, and railways length. Principal component analysis (PCA) is deployed
to derive the ICI from the other five ICT infrastructure indicators. Detailed derivation is available
from the authors upon request.
2 It is used as tax revenue as a percentage of gross domestic product.
Case 2 [C2] deals with the relationship between PEGt , PEGt −1 , TAXt , TRAt ,
MOBt , GEXt , GCFt , FDIt , OPEt , INFt , POGt , and HCIt .
Case 3 [C3] deals with the relationship between PEGt , PEGt −1 , TAXt , TRAt ,
INUt , GEXt , GCFt , FDIt , OPEt , INFt , POGt , and HCIt .
Case 4 [C4] deals with the relationship between PEGt , PEGt −1 , TAXt , TRAt ,
INSt , GEXt , GCFt , FDIt , OPEt , INFt , POGt , and HCIt .
Case 5 [C5] deals with the relationship between PEGt , PEGt −1 , TAXt , TRAt ,
FIBt , GEXt , GCFt , FDIt , OPEt , INFt , POGt , and HCIt .
Case 6 [C6] deals with the relationship between PEGt , PEGt −1 , TAXt , TRAt , ICIt ,
GEXt , GCFt , FDIt , OPEt , INFt , POGt , and HCIt .
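Footnote 1 above notes that the composite ICT index (ICI) is derived from the other five indicators via principal component analysis. One common way of doing this is to take the first principal component of the standardized indicators; the sketch below is only a hedged illustration with random placeholder data, since the authors' exact construction is not reproduced in this excerpt.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical block: rows are country-year observations, columns are the five ICT
# indicators (TLL, MOB, INU, INS, FIB); values are random placeholders.
rng = np.random.default_rng(0)
ict = rng.random((200, 5))

pca = PCA(n_components=1)
ici = pca.fit_transform(StandardScaler().fit_transform(ict))[:, 0]   # composite ICI scores
print("variance explained by the first component:", pca.explained_variance_ratio_[0])
```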
The estimated coefficients of the ICT indicators indicate that they are positively associated with per capita economic growth in all three subsets. However, the coefficients of the ICT indicators are statistically significant only on a few occasions, and this varies from case to case (see Tables 1, 2, 3 and cases C1–C6 in each subset). The estimated coefficients range from 0.002 to 0.022. The point estimates imply that a 100% increase in ICT infrastructure (for TLL, MOB, INU, INS, FIB, and ICI) is associated with a 2–22 basis point increase in per capita economic growth.
Table 4 (continued)
Country name Region Income group
Egypt Arab Republic Middle East and North LMICs
Africa
El Salvador Latin America and LMICs
Caribbean
Equilateral Guinea Central America UMICs
Fiji Pacific Ocean UMICs
Gabon Central Africa UMICs
Georgia Western Asia and LMICs
Europe
Ghana Sub-Saharan Africa LMICs
Grenada Latin America and UMICs
Caribbean
Guatemala Central America LMICs
Guyana Latin America and LMICs
Caribbean
Honduras Latin America and LMICs
Caribbean
India South Asia LMICs
Indonesia South East Asia LMICs
Iran Islamic republic West Asia UMICs
Iraq West Asia UMICs
Jamaica Latin America and UMICs
Caribbean
Jordan South West Asia UMICs
Kazakhstan Central Asia UMICs
Kenya Sub-Saharan Africa LMICs
Kiribati East Asia and Pacific LMICs
Kosovo Europe LMICs
Kyrgyz Republic Europe and Central LMICs
Asia
Lao PDR East Asia and Pacific LMICs
Lebanon Middle East UMICs
Lesotho Sub-Saharan Africa LMICs
Libya North Africa UMICs
Macedonia South East Europe UMICs
Malaysia South East Asia UMICs
Maldives South West Asia and UMICs
the Middle East
Marshall Islands Central Pacific Ocean UMICs
Mauritius South East Africa UMICs
Mexico North America UMICs
Montenegro South East Europe UMICs
Micronesia Pacific Ocean LMICs
Mauritania Sub-Saharan Africa LMICs
Micronesia East Asia and Pacific LMICs
Moldova European and Central LMICs
Asia
Mongolia East Asia and Pacific LMICs
Morocco Middle East and North LMICs
Africa
Myanmar East Asia and Pacific LMICs
Namibia South East Asia UMICs
Nauru Pacific Ocean LMICs
Nicaragua Latin America and LMICs
Caribbean
Nigeria Sub-Saharan Africa LMICs
Pakistan South Asia LMICs
Papua New Guinea East Asia and Pacific LMICs
Paraguay South America UMICs
Peru South America UMICs
Philippines East Asia and Pacific LMICs
Romania South Eastern Europe UMICs
Russian Federation Eastern Europe and UMICs
Northern Asia
Samoa Africa LMICs
Sao Tome and Sub-Saharan Africa LMICs
Principe
Serbia South East Europe UMICs
Solomon Islands Sub-Saharan Africa LMICs
Sudan Central Africa LMICs
South Africa Africa UMICs
St’ Lucia Latin America and UMICs
Caribbean
St. Vincent and the Eastern Caribbean UMICs
Grenadines
Suriname South America UMICs
Sri Lanka South Asia LMICs
Swaziland Southern Africa LMICs
Thailand South East Asia UMICs
Timor-Leste East Asia and Pacific LMICs
Tonga Pacific Ocean UMICs
Tunisia Middle East and North LMICs
Africa
Turkey Western Asia UMICs
Turkmenistan Central Asia UMICs
Tuvalu Pacific Ocean UMICs
Ukraine European and Central LMICs
Asia
Uzbekistan European and Central LMICs
Asia
Vanuatu East Asia and Pacific LMICs
Venezuela RB South America UMICs
Vietnam East Asia and Pacific LMICs
West Bank and Gaza Middle East and North LMICs
Africa
Zambia Sub-Saharan Africa LMICs
Note: UMICs denotes upper middle-income countries; LMICs denotes lower middle-income countries
some cases, the impact is positive on economic growth, while the impact is negative
in other occasions. That means the findings are consistent with theoretical arguments
and quite robust to different measures of ICT infrastructure including the country
and year fixed effects.
The paper started with two standard questions: "how does infrastructure affect economic growth?" and "how does tax propensity affect economic growth?" Our answer is quite definite in the case of infrastructure, and this holds for both ICT infrastructure and transport infrastructure. The answer is equally definite in the case of taxation, but taxation shows a negative impact on economic growth.
Acknowledgements This paper has benefited from the helpful comments of the anonymous
reviewers and Prof. Arnab K. Laha, the convenor of ICADABAI 2019, and the Editor of this
volume, to whom we are grateful.
See Table 4.
References
Pradhan, R. P., Arvin, M. B., Mittal, J., & Bahmani, S. (2016). Relationships between telecom-
munications infrastructure, capital formation, and economic growth. International Journal of
Technology Management, 70(2–3), 157–176.
Pradhan, R. P., Arvin, M. B., Nair, M., Mittal, J., & Norman, N. R. (2017). Telecommunica-
tions infrastructure and usage and the FDI–growth nexus: Evidence from Asian-21 countries.
Information Technology for Development, 23(2), 235–260.
Pradhan, R. P., Arvin, M. B., Bahmani, S., Hall, J. H., & Bennett, S. E. (2018). Mobile telephony,
economic growth, financial development, foreign direct investment, and imports of ICT goods:
The case of the G-20 countries. Journal of Industrial and Business Economics, 45(2), 279–310.
Pradhan, R. P., Mallik, G., Bagchi, T. P., & Sharma, M. (2018). Information communication
technology penetration and stock markets-growth nexus: From cross country panel evidence?
International Journal of Services Technology and Management, 24(4), 307–337.
Pradhan, R. P., Mallik, G., & Bagchi, T. P. (2018). Information communication technology (ICT)
infrastructure and economic growth: A causality evinced by cross-country panel data. IIMB
Review, 30, 91–103.
Pradhan, R. P., Arvin, M. B., Nair, M., Bennett, S. E., & Hall, J. H. (2019). The information
revolution, innovation diffusion and economic growth: An examination of causal links in european
countries. Quality and Quantity, 53, 1529–1563.
Pradhan, R. P., Arvin, M. B., Nair, M., Bennett, S. E., & Bahmani, S. (2020). Some determinants
and mechanics of economic growth in middle-income countries: The role of ICT infrastruc-
ture development, taxation and other macroeconomic variables. Singapore Economic Review
(forthcoming).
Stoilova, D. (2017). Tax structure and economic growth: Evidence from the European Union.
Contaduria Administracion, 62, 1041–1057.
WDR. (1994). Infrastructure for development, world development report (WDR). Washington DC:
World Bank.
Yoshino, N., & Abidhadjaev, U. (2016). Impact of infrastructure investment on tax: Estimating
spillover effects of the kyushu high-speed rail line in Japan on regional tax revenue. ADBI
Working Paper Series, No. 574, Asian Development Bank Institute (ADBI), Tokyo.
Response Prediction and Ranking Models
for Large-Scale Ecommerce Search
1 Problem Statement
User response prediction is the bread and butter of an ecommerce site. Every popular ecommerce site runs a response prediction engine behind the scenes to improve user engagement and to minimize the number of hops or queries that a user must fire in order to reach the destination item page which best matches
the user’s query. With the dawn of artificial intelligence (AI) and machine learning
(ML), the whole merchandising process, web-commerce carousel product arrange-
ment, personalized search results and user interactions can be driven by the click
of a button. Modern-day ML platforms enable a dynamic cascade of models with
different optimization functions which can be tuned towards a user’s preference, taste
and query trajectory.
In this paper, we talk about how Unbxd search services power user engagement and response prediction behind the scenes using a plethora of optimized features across
multiple channels and multiple domains. Elaborate feature engineering is deployed
to understand the user’s propensity to click. The search funnel lifecycle starts with a
personalized search impression, captures a user click, progresses towards a cart and
finally materializes into an order or sale. In this scenario, click through rate (CTR)
modelling is the binary classification task of predicting whether a user would click
given a ranked ordered set of products, and conversion rate (CVR) modelling entails the binary task of predicting whether a user would purchase an item given that he has shown click interest.
Figure 1 demonstrates how the search engine views the user and how the user views the merchandizing website. A user might be present
in any device linked through the browser cookie or the device ID and might choose
to initiate a search or browse session for the day; however, the click through and
the conversion might potentially happen at a different device and at a different place
(work or home) during a different time of day. Hence, an intelligent search engine
must be able to stitch the user’s trajectory seamlessly and understand the feature
combinations leading up to an event for better engagement.
2 Literature Survey
drive its business metrics other than just optimizing precision and recall per search. In
(Zhou et al., 2018), researchers at Alibaba attempted to understand the deep interest
graph of a user and the context of an ad, thereby using deep learning to model higher
order feature interactions which drive a click. They have closely modelled the user’s
historical data as a sequence model and built a network which given the current
sequence can closely predict the future interactions of the user in terms of product
affinity and personalized ads. In (Guo et al., 2017), authors mention that during
their study in a mainstream apps market, they found that people often download apps
for food delivery at meal time, suggesting a second-order interaction between app
category and time stamp.
In this paper, we talk about the search business insights that Unbxd has gathered as one of the largest ecommerce search service providers across domains like electronics, furniture, fashion and grocery. These insights indicate strong correlations
between user, context, category, time of day features and the performance metrics of
a query. Starting with the business problem, we have implemented distributed models
at scale which now define our AI or ML framework. Together with our inhouse A/B
testing framework, we have demonstrated the capability of our ML models to our
clients, and the overall journey has been summarized in this paper.
3 Algorithm
where r is the rank of the product, a and b are constants, and m is the lookback window, which is the number of days in the past we want to consider. This approach is a relatively static
a ranking by popularity score. The composite score acts like an overall boost factor
to be overlaid on indigenous search ranking implemented in Apache Solr Search
Platform (Solr) in order to bubble up the trending products. However, this score is
not nimble enough to adapt to dynamic ranking depending on the device or browser
or query context or location or time of day.
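Since the exact functional form and constants of this composite score are not reproduced here, the snippet below is only a minimal sketch of one plausible recency- and rank-discounted popularity boost; the weights a, b, w_cart, w_order and the lookback m are illustrative assumptions, not the production values.

```python
from dataclasses import dataclass

@dataclass
class DailyStats:
    clicks: int
    carts: int
    orders: int

def popularity_boost(daily_stats, rank, a=1.0, b=0.1, m=7,
                     w_cart=3.0, w_order=10.0):
    """Illustrative composite popularity score.

    daily_stats: list of DailyStats, index 0 = today, 1 = yesterday, ...
    rank: current rank r of the product in the Solr result set.
    Recent days weigh more, and the signal is discounted by 1 / (a + b * r).
    """
    m = min(m, len(daily_stats))
    engagement = sum(
        (d.clicks + w_cart * d.carts + w_order * d.orders) / (day + 1)
        for day, d in enumerate(daily_stats[:m])
    )
    return engagement / (a + b * rank)

# Example: boost factor for a product currently ranked 3rd
history = [DailyStats(120, 15, 4), DailyStats(90, 10, 2), DailyStats(60, 5, 1)]
print(popularity_boost(history, rank=3))
```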
Each impression consists of various attributes extracted from the request-side parameters such as site, query, device, user, time of day, day of week, query category, and location. This impression must now be matched with the Solr-retrieved document attributes like product category, price, keywords, reviews, related products, tokens, etc. Hence, the problem morphs into a bipartite graph matching problem with
certain constraints, measured by a ranking loss function. Such features are called unigram features since they depend on only one attribute. We can also use advanced features like query-dwell-time, time-to-first-click, time-to-first-cart, time-to-first-order, was-autosuggest-used, and Wi-Fi-connection-type-of-user, as mentioned in (Cheng & Cantu-Paz, 2010), depending on the data collection exposed through the search API.
Once we have collected such unigram features, we can fit the impression data, complete with the outcome, to a logistic regression model as a binary classification task, which then estimates the probability of a click for a new impression based on its features.
Logistic regression model: Given any input event, we assume that its outcome is
a binary variable; i.e., it is either positive or negative. The logistic regression model
calculates the probability of a positive outcome with the following function:
$$\operatorname{logit}(p) = \log \frac{p}{1-p}, \quad \text{where } p = \frac{1}{1 + e^{-w^{T} x}} \qquad (2)$$
Here, x denotes the vector of features extracted from the input event and w is the vector of corresponding feature weights that we need to learn. The LR model is trained with gradient descent. Assume that we have observed a set of events X = {x_i} and their outcomes Y = {y_i}. Each event x_i can be represented with a set of features {x_ij}. We want to find the set of parameters w by maximizing the data likelihood P(Y|X, w). This is equivalent to minimizing the following loss function:
$$L = -\log P(Y \mid X, w) = -\sum_{i=0}^{n} \log P(y_i \mid x_i, w) \qquad (3)$$
The beta coefficients of the model and the ROC curve help us understand the model's ability to discriminate between the positive and negative samples and its ability to explain CTR through features.
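As a minimal sketch of this modelling step (not the production pipeline), the snippet below one-hot encodes a few hypothetical unigram impression features with scikit-learn's DictVectorizer, fits a logistic regression classifier, and inspects the coefficients and ROC AUC; the feature names and data are illustrative assumptions.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Hypothetical impressions: unigram (single-attribute) features and click outcome.
impressions = [
    {"device": "mobile",  "hour_of_day": "21", "query_category": "fashion"},
    {"device": "desktop", "hour_of_day": "10", "query_category": "electronics"},
    {"device": "mobile",  "hour_of_day": "22", "query_category": "fashion"},
    {"device": "desktop", "hour_of_day": "11", "query_category": "grocery"},
]
clicks = [1, 0, 1, 0]

vec = DictVectorizer()                               # one-hot encodes the unigrams
X = vec.fit_transform(impressions)
model = LogisticRegression(max_iter=1000).fit(X, clicks)

# Beta coefficients explain CTR through features; AUC measures discrimination.
for name, beta in zip(vec.get_feature_names_out(), model.coef_[0]):
    print(f"{name:30s} {beta:+.3f}")
print("train AUC:", roc_auc_score(clicks, model.predict_proba(X)[:, 1]))
```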
4 Feature Selection
In Fig. 2, we show the factor map of the search session that is available to a third-party
search engine. Some interesting features have been described below:
Data fields of a search session
• Outcome—click: 0/1 for non-click/click (can be cart or order depending on the
model)
• Time Series Features
– hour_of_day: int from 0 to 23 (parse format is YY-MM-DD-HH from session_time, so 14091123 means 23:00 on Sept. 11, 2014 UTC)
5 Business Insights
We present in this section some of the business insights our analysts have come up with, which provide the intuition behind feature-based response prediction.
Figure 3 suggests that country, region, and zip code are differentiating signals for deciding the user's propensity to purchase.
In Figs. 4 and 5, we show that our search session volumes and conversions vary by channel (mobile, desktop) and by day of week (weekday vs. weekend). We see that the weekday traffic post 9 am comes mostly from desktop, which indicates a user browsing or searching from the workplace, leading to a lower average order value (AOV) compared to a user logging in on the weekend over mobile, when the AOV and engagement both peak. This opens an opportunity window for response prediction models to promote bigger-ticket items for a query during this time and thereby maximize conversions and AOV.
In the fourth and fifth graphs (Figs. 6 and 7), we compare new versus existing users and their search volumes and conversions over various channels—social media, email, display ads, private apps, organic search, etc. By tracking the user type and channel, response prediction models can effectively maximize CTR and CVR.
In the sixth and seventh graphs (Figs. 8 and 9), we show how location signals and query category can be correlated. This opens up the opportunity for response prediction models to utilize the user's location (work or home) and region to optimize the search results for certain query categories. However, for staple products like laundry, the business metrics remain fairly uniform, as shown in Fig. 10.
In the last graph (Fig. 11), we note that behaviour in cities is markedly different from behaviour in non-urban areas.
6 ML Architecture
Here, in Fig. 12 we present the details of the ML relevancy platform we have built
at Unbxd and how we use the platform to power our scalable distributed logistic
regression-based modelling workflow for response prediction.
Distributed LR Training Details in Spark
• The algorithm takes the following inputs:
Fig. 3 Differences in per session value (i.e. revenue/num sessions), conversion, Avg. order value (i.e. revenue/transactions) across Regional Divisions in USA.
Max differences observed are: Per Session Value– $5.16 versus $3.96, Conversion–6.26% versus 4.81%, Avg. order value–$86.56 versus $82.48
Fig. 4 Percentage of desktop users on weekdays versus weekend. The dots indicate conversion value
(check y-axis to the right), whereas the text is the Avg. order value. What is evident is the decrease in
desktop usage from morning to evening (work hours) on weekends compared to weekdays. Further,
the jump in conversion from weekdays to weekends is accompanied by a significant jump in Avg.
order value as well
1. Calculate the feature weight updating factor with a map-reduce job using only
the impressions which are part of the ith batch.
2. Apply the feature weight updates to the current model.
3. Check for model convergence and continue to next batch if converged.
4. Feature pruning based on the criterion mentioned above.
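A minimal sketch of what one such per-batch, map-reduce weight update could look like in PySpark is given below; the gradient form follows the logistic loss above, but the learning rate, convergence test, and pruning criterion are illustrative assumptions rather than the production implementation.

```python
import numpy as np
from pyspark import SparkContext

sc = SparkContext(appName="lr-batch-update")   # assumes a local or cluster Spark setup

def grad_contrib(record, w):
    """Per-impression gradient of the negative log-likelihood."""
    x, y = record                      # x: np.ndarray of features, y: 0/1 click label
    p = 1.0 / (1.0 + np.exp(-np.dot(w, x)))
    return (p - y) * x

def update_on_batch(batch_rdd, w, lr=0.1, prune_eps=1e-4):
    n = batch_rdd.count()
    # 1. Map-reduce job: sum per-impression gradients over this batch only.
    grad = batch_rdd.map(lambda rec: grad_contrib(rec, w)).reduce(lambda a, b: a + b)
    # 2. Apply the feature weight update to the current model.
    w_new = w - lr * grad / n
    # 3. Convergence check on the weight change for this batch.
    converged = bool(np.linalg.norm(w_new - w) < 1e-6)
    # 4. Feature pruning: drop weights below an (assumed) magnitude criterion.
    w_new[np.abs(w_new) < prune_eps] = 0.0
    return w_new, converged
```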
Fig. 5 Percentage of mobile users on weekdays versus weekend. The dots indicate conversion value
(check y-axis to the right), whereas the text is the Avg. order value. What is evident is the increase in
mobile usage from morning to evening (work hours) on weekends compared to weekdays. Further,
the jump in conversion (or Avg. order value) from weekdays to weekends is not as strong as in
desktop
Fig. 6 Sessions and bounce rate by channel (filtered for top 10 across site) categorized by visitor type. Do note that channels referral (e.g., Facebook, YouTube)
and shopping engine (e.g., Google, Bing) bring in more sessions from new visitors than repeat
Fig. 7 Conversion and Avg. order value by channel (filtered for top 10 across site) categorized by visitor type. Do note that new visitors across all channels
have higher Avg. order value
Fig. 8 AOV and conversion for jewellery (specifically necklaces, rings, earrings, beads and jewellery, etc.) products. Max differences observed: AOV– $71.57
versus $62.02, Conversion–2.55% versus 2.06%
Fig. 9 AOV and conversion for electronics (specifically laptops, computers, computer accessories, TV accessories, cameras, etc.) products. Max differences
observed: AOV–$327.93 versus $232.47, Conversion–2.02% versus 1.66%
Fig. 10 AOV and conversion for soaps and laundry care products. Max differences observed: AOV–$71.57 versus $62.02, Conversion– 2.55% versus 2.06%
Fig. 12 ML architecture
7 A/B Framework
When any response prediction or ranking model tries to change some of the system parameters, we need to measure its impact. How do we measure the impact of the model? To this end, we have designed an in-house A/B experimentation framework which works on the following principles. Working within the space of incoming traffic and the system parameters, we have three key concepts:
• A domain is a segmentation of traffic.
[Chart: CTR for the Control versus Test cohorts over Days 1–45 of the A/B experiment; y-axis shows CTR ranging from 0 to 40]
8 Future Work
From the above feature-based response prediction model, we have been able both to personalize the search results for a user given a query and to improve the CTR and CVR, which are the revenue-tracking metrics of the search business. Through feature-based ranking models, we have mostly captured the general trends of clickability of a product and improved the business performance metrics by driving a higher average order value. However, we have not explored the option of serendipity or cross-learning when it comes to surprising the user or providing related product recommendations in the same search session. Window shopping and serendipity shopping form another paradigm which is also known to improve the engagement of a shopper with a site. In the literature, cross-selling products, “bought also bought”, and “viewed also viewed” are common bases for recommendations. In search, however, since the user's context is set through a query, we cannot drift too far; but using an epsilon-greedy approach or multi-armed bandits, we can exploit our feature-based predictions and explore with a subtle mix of randomized predictions. This would be the next set of ranking algorithms that we look forward to working on in the future.
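As a hedged illustration of the epsilon-greedy idea sketched above (planned future work, not an implemented system), a re-ranker could mix the model-driven ordering with occasional random exploration roughly as follows; the epsilon value and interfaces are illustrative.

```python
import random

def epsilon_greedy_rerank(products, ctr_scores, epsilon=0.1, rng=random):
    """Mostly exploit predicted CTR, but explore occasionally by shuffling
    so that serendipitous items can surface within the query's context."""
    if rng.random() < epsilon:
        explored = list(products)
        rng.shuffle(explored)          # explore: random order for this impression
        return explored
    # exploit: sort by predicted click-through probability, highest first
    return [p for p, _ in sorted(zip(products, ctr_scores),
                                 key=lambda t: t[1], reverse=True)]

# Example usage with illustrative scores
print(epsilon_greedy_rerank(["sku1", "sku2", "sku3"], [0.02, 0.10, 0.05]))
```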
References
Cheng, H., & Cantu-Paz, E. (2010). Personalized click prediction in sponsored search. https://www.
wsdm-conference.org/2010/proceedings/docs/p351.pdf.
Guo, H., Tang, R., Ye, Y., Li, Z., & He, X. (2017). DeepFM: A factorization-machine based neural
network for CTR prediction. https://arxiv.org/pdf/1703.04247.pdf.
Kumar, R., Kumar, M., Shah, N., & Faloutsos, C. (2018). Did we get it right? Predicting query
performance in e-commerce search. https://sigir-ecom.github.io/ecom18Papers/paper23.pdf.
Cheng, H.-T., Koc, L., Harmsen, J., Shaked, T., Chandra, T., Aradhya, H., et al. (2016). Wide and deep learning for recommender systems. https://arxiv.org/pdf/1606.07792.pdf.
Zhou, G., Song, C., Fan, Y., Zhu, X., Zhu, H., Ma, X., et al. (2018). Deep interest network for
click-through rate prediction. https://arxiv.org/pdf/1706.06978.pdf.
Connectedness of Markets with
Heterogeneous Agents
and the Information Cascades
1 Introduction
Fig. 1 Sample representations of the network of two financial systems that are symmetric. a A
financial system with no economic agent depending on other agents. b A financial system with each
agent relying equally on all other economic agents (adapted from Acemoglu et al., Econometrica, 2012)
every stage, some firm may touch a threshold and lose value discontinuously. This
may, at macro-level, even amplify the risks in the network substantially.
In our work, we consider forty-three major world countries covering about 85%
of world gross domestic product (GDP) as heterogeneous economic agents that hold
economic assets (e.g., any factor of production or other investment) as well as liabil-
ities. The primitive holdings of these agents are represented in terms of input–output
of these countries obtained from the World Input-Output Database (WIOD; Timmer et al., Review of International Economics, 2015). Our network model shows that cascades of failures spread rapidly in the network and that large markets with values above a certain threshold are able to sustain cascades of failure much longer than smaller markets, even when those smaller markets have values only slightly below the threshold. Information about each country's threshold of economic value, below which the cascade of failure starts in the network, can help us avoid failure of the network
as a whole. We also show that the connectedness of markets is well represented in the
WIOD datasets over time. Our backbone network model narrows down the connect-
edness of the markets to show the dynamic impact of industry-specific input–output
on the network formation in our sample.
The remainder of the paper is organized as follows. Section 2 discusses the relevant and recent literature on the applications of networks in finance and macroeconomic research. Section 3 presents the details pertaining to the data, their characteristics, sources, and the mathematical and empirical methods used to implement the network approach to the macro-finance issues in context. Preliminary results obtained through the analyses of the data are provided in Sect. 4, followed by detailed discussions and the implications of the results in Sect. 5. The paper ends with a summary and concluding remarks, along with the issues that are left unaddressed in this work, in Sect. 6.
2 Related Work
Financial markets in general and stock markets in particular act as complex systems
consisting of several agents that interact with each other in a stochastic manner. The
level of complexity is attributed to many factors including the structure of micro-
and macro-environments, the heterogeneity of the participating agents, and their
interactions with each other.
Economic agents such as firms at micro-level and countries and/or markets at
macro-level are interconnected by way of international trade relations, eco-political
integration, and information spillovers. They share risks and shocks and are, in turn, exposed to shocks from every other entity in the system. Recent literature dealing with issues such as contagion and cascades of failures across multiple agents in a system highlights contexts in which agents are ex ante heterogeneous. In such cases, shocks with different risk characteristics hit the projects of the various agents. Because of the increase in interconnections among firms and/or markets (Diebold and Yilmaz 2014), we argue that it is inadequate to rely solely on the regulation of capital requirements and macro-prudential investigation. Some researchers have suggested that a
policy that allows modulating exposures within a network can act as an effective tool (Stiglitz 2010), an effect that is evident in concrete empirical frameworks as well (Degryse and Nguyen 2007).
One customary way of addressing the problem of macro-prudential regulation
is to rely on stress tests. Local macro-prudential policy in a core country tends to
affect the cross-border transmission of local macroeconomic policies by way of
lending abroad, by restricting the increase in lending by less strongly capitalized
banks in the country. This essentially requires the researchers and policy makers
to carry out empirical studies that examine the outcomes for a system in which
institutions that are interconnected are subjected to large shocks (see, e.g., ESMA
(2006); Kara et al. (2015)). This, however, does not consider that a well-connected network will likely collapse after being subjected to a sufficiently large shock. It is argued
that such connectedness at times offers the benefit of forestalling problems when
the network experiences smaller yet frequent shocks. It is, therefore, not feasible to
achieve a certain level of the optimality of a particular connection structure unless
we incorporate the impact of the whole distribution of shocks in the system. An
alternative approach for assessing the risks in banking systems is to hypothetically simulate the impact of shocks drawn from the empirical distribution of historical returns, under the connectedness seen in the actual network (Elsinger et al. 2006).
Moreover, examining the information spillovers across markets/firms can high-
light similar insights. Preliminary research within the domain of risk spillover and
market contagion in the debt markets emphasizes an examination of the primary determinants of debt markets. These studies use structural, financial, and institutional indicators, such as micro-level factors and macroeconomic characteristics, to explain the dynamic movements of sovereign bond yields. In this context, seminal works by Eichengreen and Luengnaruemitchai (2004), Claeys and Vašíček (2014), and Burger and Warnock (2006) for the Asian and European markets; the empirical work of Eichengreen et al. (2004) in the Latin American markets; and Adelegan and Radzewicz-Bak (2009) and Mu et al. (2013) for the African markets emphasize the significance of macroeconomic factors in determining the spillovers across markets. These studies
argue that the volatilities in the exchange rate and the fiscal characteristics typically
hinder the development of both sovereign and corporate bond markets across most of
the emerging economies. On the contrary, institutional, firm-specific, and structural
characteristics, such as trade openness and bureaucratic qualities, provide a positive
nudge for the growth of both sovereign and corporate bond markets. In this context,
the application of network approach to understand the contagion and risk sharing
attributes of different markets is studied by Ahmed et al. (2018).
The literature on financial contagion, cascades of failures, and systemic risk spans
across markets and economies and appears to be emerging steadily in the recent past.
It captures the imagination of researchers across disciplines including those primarily
working in financial economics, mathematics and other computational domains, and
engineering disciplines. Hence, we present only a brief summary of some of the more
closely related and recent research works.
The research on interconnectedness was pioneered by Allen and Gale (2000)
who studied the stability in interconnected financial systems in a developed market.
They propose a model in line with Diamond and Dybvig (1983), in which a network structure with a single, completely connected component is always optimal, as such a network minimizes the extent of default. Our model shows contrasting results. We find that a richer shock structure generates a genuine trade-off
between the risk shared within the network and the contagion effect and that both
segmentation and lower dispersion of connections may be optimal for the network.
Some more recent, related research on the issue is by Elliott et al. (2014), Diebold
and Yilmaz (2014), Glasserman and Young (2015), Acemoglu et al. (2015), and
Ahmed et al. (2018). The nature and form of the financial linkages among firms (and
in some contexts, markets) examined in these works are seemingly different, as they
consider both the asset side and the liability side of the firms’ (country’s) balance
sheet. Such a framework, in turn, entails the presence of a mechanism that amplifies
the shocks hitting a firm (a country) and subsequently spreading across the network
in unique manners.
In a multi-firm context, literature suggests the presence of the trade-off between the
risk distribution enabled by stronger interconnection and the increased exposure to
cascades as an outcome of larger components in the financial network (Cabrales et al.
2013). Cabrales et al. (2013) study selected benchmark networks that are minimally
interconnected and complete, to identify the best for different distributions of shocks.
Another unique approach that our work focuses on is related to the implementation
of network formation in growth model framework and the related analysis. Elliott et
al. (2014) characterize conditions for the macroeconomic structure of the network
under which the default cascades might occur. We, however, aim to characterize the
optimal and dynamic structures of financial and economic networks in diverse sce-
narios and also investigate if the industry context matter for the network sustains the
shocks. In an earlier examination, Shaffer (1994) too suggests a moderated relation-
ship between risk spillovers and systemic failures of economies. Although entities
hold diversified portfolios to reduce risk, they also face the risk of owning similar
portfolios in the market and being in a system that might be susceptible to contempo-
raneous failures. Acemoglu et al. (2015) highlight the optimal structure of financial
networks, but they focus on examining the shock distributions that are concentrated
within a system for a given shock magnitude. More recent works, on the contrary,
present the properties of the curvature of the function of the risk exposure and the
cumulative distribution of shocks (Cabrales et al. 2017). This more recent evidence with regard to network formation among heterogeneous economic agents allows
us to incorporate in our analysis a rich set of possible shock distributions. We can
then show different ways of variations in an optimal financial structure, in response
to the characteristics of those shock distributions. This also enables us to examine
the dynamic properties of the networks over time and other inputs such as industry
category. This uniqueness of our work is a significant contribution to the theoretical
and empirical work on the issue.
Another line of the literature highlights how financial contagion and cascades of
failures are affected by imperfect information about the shocks hitting the system.
This is studied widely, to a certain extent, in the econometrics and financial economics literature on information spillover and market co-integration. Some recent work, for
example, Allen et al. (2012), studies the effects of the arrival of a signal on segmented
and unsegmented structures of the network. These signals indicate that a firm in the
system will have to default. This argument can further be extended to the networks
of markets from different countries.
Finally, it is important to discuss the empirical and policy-oriented work whose main objective has been to bring in summary measures of network connectedness. These measures are derived from the network of relationships among
business entities (mostly financial firms) with the aim of predicting the probabilities of
systemic failures. Some studies, for example, propose different measures of centrality
in networks (Battiston et al. 2012; Denbee et al. 2011). A significant contribution in
this respect is the work by Elsinger et al. (2006), who use data from the Austrian
market and show that a correlation in banks’ asset portfolios can be considered as
the main source of systemic risk.
3 Research Objectives
The above-mentioned review of relevant research examining twin issues of the inter-
connectedness of heterogeneous economic agents and the contagion and cascades
of failures across firms and markets, information and risk spillover, and macro-
prudential regulations in the context of financial and economic entities such as firms
and markets has brought out the following research issues to be examined:
• Since the theories emphasize that economic entities such as firms and/or markets, whether homogeneous or heterogeneous, are interconnected to a certain extent, how do these entities form networks? It would be of empirical interest to investigate how these entities connect to each other in terms of the directional spillover of risks and cascades of failures.
• Within a growth model framework, the network formation tends to evolve with changes in underlying attribute(s), such as time, values, and so on. We propose to examine how industry-specific inputs affect the formation of networks of economic entities. This would suggest the interconnectedness of economic entities at the micro-level, where firm/market characteristic(s) become an important input for identifying the potential risk in the network.
4 Methodology
Our aim is to study the cross-holdings of entities in terms of input–output and look
at a time-varying feature to examine the changes in the network. We also hope to
study the ripple effects caused due to the failure of entities inside the model. It is
hypothesized that the ripple effects in the network should be caused once an identified
entity(-ies) touches or crosses the threshold indicating the cascade of failures.
Our framework proposes that there are n organizations (economic agents such as countries, financial firms, or other business entities) making up a set $N = \{1, \ldots, n\}$, with n = 43. The values assigned to these sample organizations are ultimately based on the economic value of asset holdings or factors of production—henceforth, simply assets—making up a set $M = \{1, \ldots, m\}$. For consistency, an asset holding may be taken as a project that is expected to generate a series of cash flows over time. The present value (or the current market price) of asset k is denoted $p_k$. Further, let $D_{ik} \ge 0$ be the share of
the value of asset k owned by an organization i that receives the cash flows, and let D denote the matrix whose (i, k)th entry is equal to $D_{ik}$. An organization can also hold shares (here, an amount of debt held by one country from another country) of other organizations in the sample. For any $i, j \in N$, the number $C_{ij} \ge 0$ is the fraction of organization j owned by organization i, where $C_{ii} = 0$ for each i. The matrix C can be viewed as a network with a directed link from i to j if i holds a share of j with a positive value, so that $C_{ij} > 0$.
After we account for all these cross-holding shares across sample organizations, we are left with a share $\widehat{C}_{ii} := 1 - \sum_{j \in N} C_{ji}$ of organization i that is not owned by any organization in the system. This component of the share is assumed to be of positive value. Theoretically, this is the part that is held by outside shareholders of the organization i, and is external to the system of cross-holdings. The off-diagonal entries of the matrix $\widehat{C}$ are defined to be 0.
The equity or book value $V_i$ of an organization i is the total value of its shares. This value is obtained by adding the value of the shares owned by other organizations and the shares owned by outside shareholders. It equals the value of organization i's asset holdings plus the value of its claims on other organizations in the system:

$$V_i = \sum_{k} D_{ik}\, p_k + \sum_{j} V_j\, C_{ij} \qquad (1)$$

$$V = Dp + CV \quad \text{or} \quad V = (I - C)^{-1} Dp \qquad (2)$$
As shown in both Brioschi et al. (1989) and Fedenia et al. (1994), the market
value reflects the external asset holdings. The final non-inflated economic value of an
organization to the economy is well captured by the equity value of that organization
that is held by its outside investors. This economic value captures the flow of real
assets that is expected to accrue to the ultimate investors of that organization. The
market value is denoted by $v_i$ and equals $\widehat{C}_{ii} V_i$, and therefore:

$$v = \widehat{C} V = \widehat{C}(I - C)^{-1} D p = A D p \qquad (3)$$

where

$$A = \widehat{C}(I - C)^{-1} \qquad (4)$$
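For concreteness, here is a minimal NumPy sketch of Eqs. (1)–(4) on a tiny hypothetical three-organization, two-asset system; the matrices below are made-up illustrations, not WIOD data.

```python
import numpy as np

# Hypothetical holdings: D[i, k] = share of asset k owned by organization i
D = np.array([[0.6, 0.1],
              [0.3, 0.5],
              [0.1, 0.4]])
p = np.array([100.0, 80.0])             # asset values / prices

# Cross-holdings: C[i, j] = fraction of organization j owned by organization i
C = np.array([[0.0, 0.2, 0.1],
              [0.1, 0.0, 0.2],
              [0.2, 0.1, 0.0]])
C_hat = np.diag(1.0 - C.sum(axis=0))    # share of each organization not owned by others

V = np.linalg.solve(np.eye(3) - C, D @ p)   # equity values, Eq. (2)
A = C_hat @ np.linalg.inv(np.eye(3) - C)    # Eq. (4)
v = A @ D @ p                               # market values, Eq. (3)
print(V, v)
```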
If the market value $v_i$ of an organization i in the system, for any reason, drops below a threshold $\mu$, then i is said to fail, in an economic sense, and incurs failure costs $\beta_i(p)$. These failure costs are then subtracted from the cash flows received by the failing organization. In such a scenario, the failure may take the form of a diversion of cash flow away from the organization, or of a decrease in the returns that the organization generates from the unique assets that it holds. Either way, the proposed approach introduces critical nonlinearities, or rather discontinuities, into the system of organizations.
We have calculated a fractional ownership matrix which represents the fraction
of GDP produced by the country using its own resources and debt taken from other
countries. Here, GDP is the total output from all industries, where each industry
might or might not have borrowed money from other countries. We have taken the
base year as 2000. We have normalized the GDP values with the GDP of India.
So, if we set all the diagonal elements of the fractional matrix to zero, we obtain the C matrix, and by forming a matrix from only the diagonal elements of the fractional matrix (with the off-diagonal elements filled with zeros) we obtain the $\widehat{C}$ matrix. Using Eq. (4), we can then calculate the matrix A.
We define a parameter $\theta \in [0, 1]$, used to set the fraction of the base-year value below which a country is considered to fail; numerically, the threshold is defined as:

$$\Upsilon = \theta \, (A \cdot p_t) \qquad (6)$$
Here, $p_t$ is the normalized GDP of the base year (2000). If the normalized GDP value of any country goes below this threshold, we consider that country to have failed. If any country fails, we subtract 50% of the threshold value from its normalized GDP of the present year and repeat the process until no country fails.
Mathematically, the steps can be summarized as:
1. $A = \widehat{C}(I - C)^{-1}$ (an $n \times n$ matrix)
2. $p = GDP$ (normalized $n \times 1$ vector for the current year)
3. $p_t = GDP$ (normalized $n \times 1$ vector for the base year, 2000)
4. $\Upsilon = \theta \, (A \cdot p_t)$
5. If $A \cdot p < \Upsilon$ (compared elementwise) for any country, follow step 6; else all countries are safe at that $\theta$.
6. $p = p - \Upsilon/2$ (elementwise, subtracting $\Upsilon/2$ only from the countries which have failed)
7. Repeat step 5 until no country fails.
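A minimal sketch of this cascade loop is shown below, assuming the fractional ownership matrix has already been split into C and $\widehat{C}$ and that the GDP vectors are NumPy arrays normalized as described above; the variable names and the exact bookkeeping of failure waves are illustrative.

```python
import numpy as np

def cascade(C, C_hat, p, p_base, theta):
    """Run the failure cascade for a given theta.

    C      : n x n cross-holdings matrix (zero diagonal)
    C_hat  : n x n diagonal matrix of self-held shares
    p      : normalized GDP vector for the current year
    p_base : normalized GDP vector for the base year (2000)
    Returns the successive waves of newly failed country indices.
    """
    n = len(p)
    A = C_hat @ np.linalg.inv(np.eye(n) - C)      # step 1
    threshold = theta * (A @ p_base)              # step 4, Eq. (6)
    p = p.copy()
    failed = np.zeros(n, dtype=bool)
    waves = []
    while True:
        newly_failed = (A @ p < threshold) & ~failed   # step 5
        if not newly_failed.any():
            break                                       # remaining countries are safe
        p[newly_failed] -= threshold[newly_failed] / 2  # step 6
        failed |= newly_failed
        waves.append(np.where(newly_failed)[0])
    return waves
```

The nested lists reported in Tables 1–6 can be read as such successive waves of failures at each threshold value.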
To get a qualitative idea of the accuracy of our network, we run the cascading model on the WIOD dataset from 2000 to 2014 and check whether there are economic explanations for the failures of the countries reported by the algorithm. Once the veracity of the algorithm is ascertained, we will look into simulating a large number of trading scenarios and then repeatedly applying our failure model to find the trading scenarios that yield the fewest failures.
5 Preliminary Results
The WIOD dataset provides the input–output in current prices and is denoted in
millions of dollars. The WIOD database covers twenty-eight countries from the
European Union and fifteen other major countries (43 countries in total) from
across the world for the sample period from 2000 to 2014. The data is sorted industry-
wise for 56 industries in total. For a preliminary analysis, we have condensed the
data to only the inter-country input–output table. The trade interlinkages between the countries are shown in Fig. 2.
6 Discussion
Fig. 2 A representative network diagram of the sample countries based on the WIOD dataset for
the year 2016
carry out controlled and non-controlled experiments to observe and analyze how economic agents react to shifts in the behavior of other players. This
mechanism explains the nature of interconnectedness of agents in a system where
the agents are believed to be part of the network and hypothesized to be homoge-
neous. Recent research focuses on heterogeneity characteristic of these economic
agents to understand the nature of interconnectedness in the network.
In finance, the interconnectedness becomes more significant as it helps understand
the contagion of risk, the flow of information (that eventually explains the price
discovery process and existence of arbitrage opportunities), and overall the ability
of the system to absorb the shocks caused by one or more agents (Diebold and
Yilmaz 2014). During the 2008 global financial crisis, several firms in the finance industry across the globe were so deeply interconnected that a failure in one part of the world resulted in tremors across several players in the industry. For example, the fall of Lehman Brothers made a few big finance firms susceptible to bankruptcy, which led the government to swing into action and bail out many other financial institutions to save the entire economy (or, probably, several economies around the world). In this context, examining the interconnectedness of the markets should explain the vulnerability of the system and pinpoint the weak nodes in networks so that regulators and governments, among others, can take corrective actions well in time.
7 Concluding Remarks
Our research presents evidence on the nature of interconnectedness that global markets exhibit in terms of the cross-holdings represented by their input–output data. It shows that the interdependence of some markets in a global network is strongly correlated not only with the size of the markets, but also with the direction of trades/cross-holdings and the type of industries that dominate their input–output data. With growth model estimation, we are able to project the cascades of failures in the network. Our results, as exhibited in the graphs, corroborate the empirical research on the failures of the markets. It is shown that markets having more connections with other markets in the network are likely to sustain shocks and cascades of failures for a longer time. This evidence is aligned with the argument for diversification as a strategy to mitigate risk. Our findings employ innovative approaches, such as a network formation approach and graph theory, to explain the interconnectedness of markets across the world, and contribute significantly to the theoretical issues related to market integration and risk spillover (Diebold and Yilmaz 2014; Cabrales et al. 2017; Ahmed et al. 2018) (Tables 1, 2, 3, 4, 5 and 6).
Acknowledgements The authors are grateful to the reviewers and the editor for their comments and
suggestions. The financial support of Indian Institute of Technology Kharagpur through the Chal-
lenge Grant (CFH ICG 2017 SGSIS/2018-19/090) toward this research is hereby acknowledged.
Usual disclaimers apply.
8 Appendix
Table 1 Name of countries failed at that threshold and the cascade impact in 2001
2001
0.1 [[]]
0.775 [[]]
0.78 [[’TUR’] []]
0.785 [[’TUR’] []]
0.88 [[’TUR’] []]
0.885 [[’BRA’ ’TUR’] [’TWN’] [’JPN’] []]
0.89 [[’BRA’ ’JPN’ ’TUR’ ’TWN’] []]
0.91 [[’BRA’ ’JPN’ ’TUR’ ’TWN’] []]
0.915 [[’BRA’ ’JPN’ ’TUR’ ’TWN’] [’MLT’] []]
0.92 [[’BRA’ ’JPN’ ’TUR’ ’TWN’] [’KOR’ ’MLT’] []]
0.925 [[’BRA’ ’JPN’ ’TUR’ ’TWN’] [’KOR’ ’MLT’] []]
0.93 [[’BRA’ ’JPN’ ’TUR’ ’TWN’] [’KOR’ ’MLT’] []]
0.935 [[’BRA’ ’JPN’ ’KOR’ ’TUR’ ’TWN’] [’AUS’ ’MLT’] [’IDN’] []]
0.94 [[’BRA’ ’JPN’ ’KOR’ ’MLT’ ’TUR’ ’TWN’] [’AUS’ ’IDN’ ’SWE’] []]
0.945 [[’BRA’ ’JPN’ ’KOR’ ’MLT’ ’SWE’ ’TUR’ ’TWN’] [’AUS’ ’IDN’] []]
0.95 [[’BRA’ ’JPN’ ’KOR’ ’MLT’ ’SWE’ ’TUR’ ’TWN’] [’AUS’ ’IDN’] []]
0.955 [[’BRA’ ’JPN’ ’KOR’ ’MLT’ ’SWE’ ’TUR’ ’TWN’] [’AUS’ ’IDN’] [’ROW’] [’FIN’ ’LUX’] []]
0.96 [[’AUS’ ’BRA’ ’JPN’ ’KOR’ ’MLT’ ’SWE’ ’TUR’ ’TWN’] [’FIN’ ’IDN’ ’ROW’] [’CYP’ ’LUX’] []]
0.965 [[’AUS’ ’BRA’ ’JPN’ ’KOR’ ’MLT’ ’SWE’ ’TUR’ ’TWN’] [’FIN’ ’IDN’ ’ROW’] [’CYP’ ’LUX’] []]
Table 2 Name of countries failed at that threshold and the cascade impact in 2002
2002
0.1 [[]]
0.95 [[]]
0.955 [[’JPN’] [’BRA’] []]
0.96 [[’BRA’ ’JPN’] []]
0.975 [[’BRA’ ’JPN’] []]
0.98 [[’BRA’ ’JPN’] [’ROW’] []]
0.985 [[’BRA’ ’JPN’] [’ROW’] [’TWN’] [’USA’] [’CAN’ ’MEX’] []]
0.99 [[’BRA’ ’JPN’] [’ROW’] [’TWN’ ’USA’] [’CAN’ ’MEX’] []]
0.995 [[’BRA’ ’JPN’ ’ROW’] [’CAN’ ’TWN’ ’USA’] [’MEX’] []]
Table 3 Name of countries failed at that threshold and the cascade impact in 2009
2009
0.1 [[]]
0.725 [[]]
0.73 [[]]
0.735 [[’LTU’] []]
0.74 [[’LTU’] []]
0.745 [[’LTU’] []]
0.75 [[’LTU’ ’LVA’] []]
0.755 [[’LTU’ ’LVA’] []]
0.76 [[’LTU’ ’LVA’] [’EST’] []]
0.765 [[’EST’ ’LTU’ ’LVA’] []]
0.77 [[’EST’ ’LTU’ ’LVA’] []]
0.775 [[’EST’ ’LTU’ ’LVA’] []]
0.78 [[’EST’ ’LTU’ ’LVA’] []]
0.785 [[’EST’ ’LTU’ ’LVA’] []]
0.79 [[’EST’ ’LTU’ ’LVA’ ’POL’ ’RUS’] []]
0.795 [[’EST’ ’LTU’ ’LVA’ ’POL’ ’RUS’] [’HUN’] []]
0.8 [[’EST’ ’HUN’ ’LTU’ ’LVA’ ’MEX’ ’POL’ ’RUS’] []]
0.805 [[’EST’ ’HUN’ ’LTU’ ’LVA’ ’MEX’ ’POL’ ’RUS’] [’CZE’ ’SWE’] []]
0.81 [[’EST’ ’HUN’ ’LTU’ ’LVA’ ’MEX’ ’POL’ ’RUS’ ’SWE’] [’CZE’] [’SVK’] []]
0.815 [[’CZE’ ’EST’ ’HUN’ ’LTU’ ’LVA’ ’MEX’ ’POL’ ’RUS’ ’SWE’] [’FIN’ ’SVK’] []]
0.82 [[’CZE’ ’EST’ ’HUN’ ’LTU’ ’LVA’ ’MEX’ ’POL’ ’RUS’ ’SWE’] [’FIN’ ’SVK’ ’TUR’] [’ROU’ ’SVN’] []]
0.825 [[’CZE’ ’EST’ ’HUN’ ’LTU’ ’LVA’ ’MEX’ ’POL’ ’RUS’ ’SWE’] [’FIN’ ’ROU’ ’SVK’ ’SVN’ ’TUR’] [’ESP’] [’DNK’] [’NOR’] []]
0.83 [[’CZE’ ’ESP’ ’EST’ ’HUN’ ’LTU’ ’LVA’ ’MEX’ ’POL’ ’ROU’ ’RUS’ ’SWE’ ’TUR’] [’DNK’ ’FIN’ ’ITA’ ’NOR’ ’SVK’ ’SVN’] [’HRV’ ’IRL’ ’LUX’] [’GBR’] [’BEL’ ’DEU’] [’AUT’ ’BGR’ ’MLT’ ’PRT’] []]
0.835 [[’CZE’ ’ESP’ ’EST’ ’FIN’ ’HUN’ ’LTU’ ’LVA’ ’MEX’ ’POL’ ’ROU’ ’RUS’ ’SVN’ ’SWE’ ’TUR’] [’DNK’ ’HRV’ ’ITA’ ’NOR’ ’SVK’] [’BGR’ ’DEU’ ’GBR’ ’IRL’ ’LUX’] [’AUT’ ’BEL’ ’MLT’ ’PRT’] []]
0.84 [[’CZE’ ’ESP’ ’EST’ ’FIN’ ’HUN’ ’ITA’ ’LTU’ ’LVA’ ’MEX’ ’POL’ ’ROU’ ’RUS’ ’SVN’ ’SWE’ ’TUR’] [’BGR’ ’DEU’ ’DNK’ ’GBR’ ’HRV’ ’IRL’ ’LUX’ ’NOR’ ’PRT’ ’SVK’] [’AUT’ ’BEL’ ’MLT’ ’TWN’ ’USA’] [’CAN’ ’FRA’ ’NLD’] []]
0.845 [[’CZE’ ’ESP’ ’EST’ ’FIN’ ’HRV’ ’HUN’ ’ITA’ ’LTU’ ’LVA’ ’MEX’ ’POL’ ’ROU’ ’RUS’ ’SVK’ ’SVN’ ’SWE’ ’TUR’] [’BEL’ ’BGR’ ’DEU’ ’DNK’ ’GBR’ ’IRL’ ’LUX’ ’NOR’ ’PRT’ ’TWN’] [’AUT’ ’FRA’ ’MLT’ ’NLD’ ’USA’] [’CAN’] []]
0.85 [[’CZE’ ’DNK’ ’ESP’ ’EST’ ’FIN’ ’GBR’ ’HRV’ ’HUN’ ’IRL’ ’ITA’ ’LTU’ ’LUX’ ’LVA’ ’MEX’ ’NOR’ ’POL’ ’ROU’ ’RUS’ ’SVK’ ’SVN’ ’SWE’ ’TUR’] [’AUT’ ’BEL’ ’BGR’ ’DEU’ ’FRA’ ’MLT’ ’PRT’ ’TWN’ ’USA’] [’CAN’ ’NLD’] []]
0.855 [[’CZE’ ’DNK’ ’ESP’ ’EST’ ’FIN’ ’GBR’ ’HRV’ ’HUN’ ’IRL’ ’ITA’ ’LTU’ ’LUX’ ’LVA’ ’MEX’ ’NOR’ ’POL’ ’ROU’ ’RUS’ ’SVK’ ’SVN’ ’SWE’ ’TUR’ ’TWN’ ’USA’] [’AUT’ ’BEL’ ’BGR’ ’CAN’ ’DEU’ ’FRA’ ’MLT’ ’PRT’] [’KOR’ ’NLD’] []]
0.86 [[’BGR’ ’CZE’ ’DNK’ ’ESP’ ’EST’ ’FIN’ ’GBR’ ’HRV’ ’HUN’ ’IRL’ ’ITA’ ’LTU’ ’LUX’ ’LVA’ ’MEX’ ’NOR’ ’POL’ ’PRT’ ’ROU’ ’RUS’ ’SVK’ ’SVN’ ’SWE’ ’TUR’ ’TWN’ ’USA’] [’AUT’ ’BEL’ ’CAN’ ’DEU’ ’FRA’ ’KOR’ ’MLT’] [’CYP’ ’NLD’] []]
0.865 [[’BEL’ ’BGR’ ’CZE’ ’DEU’ ’DNK’ ’ESP’ ’EST’ ’FIN’ ’GBR’ ’HRV’ ’HUN’ ’IRL’ ’ITA’ ’LTU’ ’LUX’ ’LVA’ ’MEX’ ’NOR’ ’POL’ ’PRT’ ’ROU’ ’RUS’ ’SVK’ ’SVN’ ’SWE’ ’TUR’ ’TWN’ ’USA’] [’AUT’ ’CAN’ ’FRA’ ’KOR’ ’MLT’ ’NLD’] [’CHE’ ’CYP’] []]
Table 4 Name of countries failed at that threshold and the cascade impact in 2010
2010
0.1 [[]]
0.93 [[]]
0.935 [[’GRC’ ’IRL’] []]
0.94 [[’GRC’ ’IRL’] []]
0.945 [[’GRC’ ’IRL’] []]
0.95 [[’GRC’ ’HRV’ ’IRL’] []]
0.955 [[’GRC’ ’HRV’ ’IRL’] [’CYP’] []]
0.96 [[’GRC’ ’HRV’ ’IRL’] [’CYP’] []]
0.965 [[’GRC’ ’HRV’ ’IRL’] [’CYP’] []]
0.97 [[’GRC’ ’HRV’ ’IRL’] [’CYP’ ’ESP’] []]
0.975 [[’CYP’ ’ESP’ ’GRC’ ’HRV’ ’IRL’] [’BGR’] []]
0.98 [[’CYP’ ’ESP’ ’GRC’ ’HRV’ ’IRL’] [’BGR’] []]
0.985 [[’BGR’ ’CYP’ ’ESP’ ’GRC’ ’HRV’ ’IRL’] []]
0.99 [[’BGR’ ’CYP’ ’ESP’ ’GRC’ ’HRV’ ’IRL’] [’PRT’] []]
0.995 [[’BGR’ ’CYP’ ’ESP’ ’GRC’ ’HRV’ ’IRL’] [’PRT’] []]
1 [[’BGR’ ’CYP’ ’ESP’ ’GRC’ ’HRV’ ’IRL’] [’PRT’] []]
Table 5 Name of countries failed at that threshold and the cascade impact in 2013
2013
0.1 [[]]
0.85 [[]]
0.855 [[’JPN’] []]
0.86 [[’JPN’] []]
0.965 [[’JPN’] []]
0.97 [[’JPN’] [’AUS’] []]
0.975 [[’JPN’] [’AUS’] []]
0.98 [[’GRC’ ’JPN’] [’AUS’] []]
0.985 [[’AUS’ ’GRC’ ’JPN’] [’IDN’] []]
0.99 [[’AUS’ ’GRC’ ’JPN’] [’IDN’] []]
0.995 [[’AUS’ ’GRC’ ’JPN’] [’IDN’] [’TWN’] []]
1 [[’AUS’ ’GRC’ ’IDN’ ’JPN’] [’IND’ ’TWN’] [’BRA’] []]
Table 6 Name of countries failed at that threshold and the cascade impact in 2015
2014
0.1 [[]]
0.935 [[]]
0.94 [[]]
0.945 [[’JPN’] []]
0.95 [[’JPN’] [’AUS’] []]
0.955 [[’JPN’] [’AUS’] []]
0.96 [[’JPN’] [’AUS’] []]
0.965 [[’AUS’ ’CYP’ ’JPN’] [’RUS’] []]
0.97 [[’AUS’ ’BRA’ ’CYP’ ’JPN’] [’RUS’] []]
0.975 [[’AUS’ ’BRA’ ’CYP’ ’JPN’ ’RUS’] []]
0.98 [[’AUS’ ’BRA’ ’CYP’ ’JPN’ ’RUS’] [’CAN’ ’IDN’] [’TUR’] [’GRC’] []]
0.985 [[’AUS’ ’BRA’ ’CAN’ ’CYP’ ’JPN’ ’RUS’] [’GRC’ ’IDN’ ’TUR’] [’HRV’ ’ITA’ ’SWE’] [’AUT’ ’EST’ ’FIN’ ’NLD’ ’NOR’ ’SVK’] [’BEL’ ’CZE’ ’DEU’ ’DNK’ ’FRA’ ’LTU’ ’LVA’ ’MLT’ ’SVN’] [’CHE’ ’ESP’ ’HUN’ ’POL’ ’PRT’ ’ROU’] [’BGR’ ’IRL’] [’LUX’] []]
0.99 [[’AUS’ ’BRA’ ’CAN’ ’CYP’ ’GRC’ ’JPN’ ’RUS’ ’TUR’] [’FIN’ ’HRV’ ’IDN’ ’ITA’ ’NOR’ ’SWE’] [’AUT’ ’CZE’ ’DEU’ ’DNK’ ’EST’ ’FRA’ ’LTU’ ’LVA’ ’NLD’ ’SVK’ ’SVN’] [’BEL’ ’CHE’ ’ESP’ ’HUN’ ’MLT’ ’POL’ ’PRT’ ’ROU’ ’TWN’] [’BGR’ ’IRL’ ’LUX’ ’ROW’] [’KOR’ ’MEX’] []]
0.995 [[’AUS’ ’BRA’ ’CAN’ ’CYP’ ’GRC’ ’HRV’ ’IDN’ ’ITA’ ’JPN’ ’RUS’ ’TUR’] [’AUT’ ’CZE’ ’DEU’ ’FIN’ ’FRA’ ’NLD’ ’NOR’ ’SVK’ ’SVN’ ’SWE’] [’BEL’ ’CHE’ ’DNK’ ’ESP’ ’EST’ ’HUN’ ’LTU’ ’LVA’ ’MLT’ ’POL’ ’PRT’ ’ROU’ ’TWN’] [’BGR’ ’IRL’ ’LUX’ ’ROW’] [’KOR’ ’MEX’] [’GBR’] []]
References
Ahmed, W., Mishra, A. V., & Daly, K. J. (2018). Financial connectedness of BRICS and global
sovereign bond markets. Emerging Markets Review, 37, 1–16.
Cabrales, A., Gottardi, P., & Vega-Redondo, F. (2017). Risk sharing and contagion in networks.
The Review of Financial Studies, 30(9), 3086–3127.
Degryse, H. A., & Nguyen, G. (2007). Interbank exposures: An empirical examination of contagion
risk in the Belgian banking system. International Journal of Central Banking, 3(3), 123–172.
Diebold, F., & Yilmaz, K. (2014). On the network topology of variance decompositions: Measuring
the connectedness of financial firms. Journal of Econometrics, 182(1), 119–134.
Elliott, M., Golub, B., & Jackson, M. O. (2014). Financial networks and contagion. American
Economic Review, 104(10), 3115–3153.
Elsinger, H., Lehar, A., & Summer, M. (2006). Risk assessment for banking systems. Management Science, 52(9), 1301–1314.
Mulally, A. R. (2008). Examining the state of the domestic automobile industry, hearing. United
States Senate Committee on Banking, Housing, and Urban Affairs.
Stiglitz, J. E. (2010). Risk and global economic architecture: Why full financial integration may be undesirable. American Economic Review, 100(2), 388–392.
Schweitzer, F., Fagiolo, G., Sornette, D., Vega-Redondo, F., Vespignani, A., & White, D. R. (2009).
Economic Networks: The new challenges. Science, 422–425.
Esma, N. C. (2006). Solving stochastic PERT networks exactly using hybrid Bayesian networks. In Proceedings of the 7th Workshop on Uncertainty Processing (pp. 183–197). Oeconomica Publishers.
Kara, F., & Yucel, I. (2015). Climate change effects on extreme flows of water supply area in
Istanbul: Utility of regional climate models and downscaling method. Environmental Monitoring
and Assessment, 187, 580–596.
Burger, J. D., & Warnock, F. E. (2006). Foreign participation in local currency bond markets. Review
of Financial Economics, 16(3), 291–304.
Eichengreen, B., & Luengnaruemitchai, P. (2004). Why doesn’t Asia have bigger bond markets?
NBER Working Paper 10576.
Adelegan, O.J. & Radzewicz-Bak, B. (2009). What Determines Bond Market Development in
Sub-Saharan Africa? IMF Working Paper No. 09/213, Available at SSRN: https://ssrn.com/
abstract=1486531
Mu, Y., Phelps, P., & Stotsky, J. G. (2013). Bond markets in Africa. Review of Development Finance, 3(3), 121–135.
Timmer, M. P., Dietzenbacher, E., Los, B., Stehrer, R., & de Vries, G. J. (2015). An illustrated user
guide to the World Input-Output database: The case of global automotive production. Review of
International Economics, 23(3), 575–605.
Dietzenbacher, E., Los, B., Stehrer, R., Timmer, M., & de Vries, G. (2013). The construction of
world input–output tables in the wiod project. Economic Systems Research, 25(1), 71–98.