A99 SCM Statistical vs Neural
A PROJECT REPORT
Submitted by
HEEROK BANERJEE [Reg No: RA1511008010064]
DIKSHIKA K. ASOLIA [Reg No: RA1511008010214]
PRIYANSHI GARG [Reg No: RA1511008010224]
NIKHIL SHAW [Reg No: RA1511008010233]
GRISHMA SAPARIA [Reg No: RA1511008010251]
INFORMATION TECHNOLOGY
SRM INSTITUTE OF SCIENCE AND TECHNOLOGY
KATTANKULATHUR-603203
BONAFIDE CERTIFICATE
Certified further, that to the best of my knowledge the work reported herein
does not form part of any other thesis or dissertation on basis of which a degree
or award was conferred on an earlier occasion for this or any other candidate.
ABSTRACT
The advent of AI tools in industrial management and business operations has broadly reinforced the interplay among business entities in the digital realm. These autonomous tools are undoubtedly powerful in employing self-learning and robust paradigms to facilitate predictive analytics for business intelligence, but they still remain insufficient to overcome the impact of inevitable risks involved in businesses. In the context of Supply Chain Management (SCM), it therefore remains an open problem to eliminate, and more significantly to optimize, the impact of such risky operations on high-priority objectives such as total cost, lead-time and inventory costs, which have long been targets for supply chain managers. In this study, we compare widely employed statistical and neural approaches to perform forecasting and conduct numerical analyses on multi-echelon supply chain networks. We start with experimental hypothesis testing on datasets acquired from multiple sources and hypothesize over different batches of data to evaluate their measures of centrality and correlation attributes. This essentially reduces the input batches that are forwarded to the forecasting models, hence relieving the system of computational overhead. We then evaluate machine learning models such as Decision Trees, Random Forests and eXtreme Gradient Boosting (XGB) trees with pipelined architectures in order to observe their model and runtime performances on empirical and streaming data, and we compare the obtained results to draw conclusions on their run-time performances. Next, we construct and train neural network models for use-cases such as function estimation and time-series analysis on forecasting problems related to risk-averse logistics and lead-time optimization. We obtained the best results with a Bi-directional Long Short-Term Memory (LSTM) model and a Non-linear Auto-Regressive (NAR) model for sequence-to-sequence prediction. We compare these models based on their performance, their loss functions and their ability to generalize trends from the datasets. Finally, we conclude our study by commenting on the observed performances of the selected models and by providing future directions to extend the contributions of this comparative study.
ACKNOWLEDGEMENTS
We would like to express our deepest gratitude to our guide, Dr. V. Ganapathy (Professor, Dept. of Information Technology), for his valuable guidance, consistent encouragement, personal care, timely help and for providing us with an excellent atmosphere for conducting research. Throughout the work, in spite of his busy schedule, he extended cheerful and cordial support to our group for completing this research work. His suggestions and expertise played a key role in redirecting our central focus to the major outcomes presented in this study.
We would also like to thank Dr. G. Vadivu (HoD, Dept. of Information Technology) for providing extramural support and state-of-the-art infrastructure to facilitate this research work. We feel extremely privileged to have acquired academic licenses for the MATLAB™, JASP™ and QtiPlot™ software, without which this work could not have been completed.
Heerok Banerjee
Dikshika Asolia
Priyanshi Garg
Nikhil Shaw
Grishma Saparia
TABLE OF CONTENTS

ABSTRACT
ACKNOWLEDGEMENTS
LIST OF FIGURES
ABBREVIATIONS
LIST OF SYMBOLS
2 Review of Literature
  2.1 Multi-objective Optimization
  2.2 Bayesian Modelling for Risk Analysis
  2.3 Reverse Supply Chain Problems
4 Statistical Approaches
  4.1 Hypothesis Testing
    4.1.1 Student's T Test
    4.1.2 Mean Normalization
  4.2 Machine Learning Models
    4.2.1 Regression Model
    4.2.2 Decision Tree Model
    4.2.3 Random Forest Regression
    4.2.4 XGB Tree Model
  4.3 Performance Analysis
5 Neural Approaches
  5.1 Multi-layered Perceptron Networks
    5.1.1 Training the Model
  5.2 Long-Short Term Memory (LSTM) Networks
    5.2.1 Architecture
    5.2.2 Model
  5.3 Performance Analyses
7 Conclusion
8 Code Analysis
  8.1 Source Code
  8.2 Dynamic Code Analysis
  8.3 Test Cases
9 Publication
A Dataset Description
  A.1 Walmart Dataset
  A.2 UAE Distributor Dataset
  A.3 Time Series Dataset
B Backpropagation Algorithm
LIST OF TABLES
LIST OF FIGURES

3.1 Simple Supply chain architecture with multiple suppliers, single manufacturer and single retailer
3.2 Non-negative integral solutions of Diophantine equation with d=100
8.2 Execution of Code snippet for Random Forest
8.3 Execution of Code snippet for XGB Trees
8.4 Execution of Code snippet for NAR live forecasting
8.5 Dynamic code analysis using MATLAB™ Code Analyzer
B.1 Simple neural network architecture with one input layer of 'n' neurons, one hidden layer of 'l' neurons and one output layer of 'm' neurons
ABBREVIATIONS
RI Risk Index
GA Genetic Algorithm
IG Information Gain
LIST OF SYMBOLS
CHAPTER 1
INTRODUCTION
SCM is the study of methods to run collaborative business operations in a way that maximizes profit from investments in the business, pertaining to risk-averse strategies and optimizing resources, expenditures and effort to sustain healthy growth. Topological illustrations are commonly used to represent business entities and their relationships: in network theory, the supply chain network studied in Supply Chain Risk Management (SCRM) is understood as a Directed Acyclic Graph (DAG) (1). For example, shoppers, suppliers, manufacturers and distributors are represented by vertices, whereas the edges represent business activities like packaging, delivery, assembling, etc.
Figure 1.1: Multi-echelon supply chain network with suppliers (S1-S6), manufacturers (M1-M3), distributors (D1-D3) and a retailer (R)
In figure 1.1, the circular nodes denote elements in the supply chain network, and the directed arrows denote specific operations performed between two nodes. The complexity of the network depends on the number of nodes and the bi-lateral processes involved during a supply chain cycle (1). We introduce the formulation of non-linear attributes such as risks in later chapters. As deducible from the inherent architectures of supply chain networks, there are numerous issues attached to SCM related to social and economic factors (1).
1.1 Issues in Supply Chain Management
Psychological, social, cultural and personal factors influence consumer behaviour, which is rapidly altered by globalization and technology. As social media users comply with new norms for interaction in the online virtual world, companies need to use this gigantic new data source to promote or make relevant products. Trends have short cycles, and the products that thrive on them go extinct with them. Firms are under compulsion to keep producing new products and shipping new features while keeping costs minimal. Furthermore, enhancing existing product features requires revamping the supply chain to aid product enhancement.
The emergence of social media has made the internet the biggest market, where anyone can advertise and sell products to anyone across the globe. This has lifted consumers' expectations for top-standard products with innovation and consistency. Prominent trends like blockchain, big data, IoT and smart packaging are uplifting not only the agreed standards but also the methods by which they are administered and gauged. New advanced applications are required for handling, processing and making sense of the gigantic data being generated. The immediate issue an enterprise faces is whether investing in platforms established on micro-services and big data will meet its data-handling essentials.
Hence, it is clear that the overgrowth of democratized technologies across the globe is a harbinger of severe competition in the contemporary market. The impact of these inevitable components is only one of the attributes surrounding the problems related to SCM. In this research study, we examine the quantitative factors involved in business operations and provide a context for transforming operations toward fault-tolerant and risk-aware decision making. This essentially brings new trade-offs to managers, but it is necessary to outline the tools applicable for dealing with multi-objective decision making. The central focus of our study is therefore on reversing the effects of risks and understanding the non-linearity of this quantity from an optimization approach. The goal of this study is to provide substantial contributions to modelling risks as non-dominant quantities, and to provide a model-to-model comparison on quantitative measures for evaluating a risk-aware logistics scenario in a business setting.
3. Operational risk - Operational risk is the cumulative factor that describes the probability of enduring loss due to a potential failure of a business operation (5). Operational risks result from unprecedented failures of operations (5), regressive manufacturing endeavours and incapacitated processing, a high degree of deviation, and the advent of incoming new trends and technology (5).

5. Social risk - Social risks are described as the arrival of challenges that business practices endure due to impacts propagated through the social participants of the business hierarchy (3).
Figure 1.2: Venn Diagram for representation of risks involved in supply chain systems
CHAPTER 2
REVIEW OF LITERATURE
2.1 Multi-objective Optimization

Optimization problems in supply chain systems have attracted researchers and supply chain managers since the early industrialization and democratization of products at the expense of outsourcing services like manufacturing, dispensing, transportation, packaging etc. Optimum resource configuration is one of the most cited problems in SCM systems (12; 13). For supply chain systems that are immune to pervasive risks, supply chain managers emphasize determining singleton optimization problems such as total cost, return on investment, transportation cost etc. However, framing a single-objective-based supply chain needs prior exposure and additional supportive functions such as lead-time optimization, stock optimization, inventory-level optimization etc. (14). Sometimes conflicting objectives can cause disruptions in the supply chain; therefore, a thorough study outlining the trade-offs between objective functions is necessary prior to mathematical modelling. This shifts the attention from supply chain modelling towards satisfying conflicting objectives (15).
Many novel supply chain evaluation models have been proposed with limited constraints and succinct objectives. For example, a location-inventory problem was investigated using a multi-objective self-learning algorithm based on the non-dominated sorting genetic algorithm II (NSGA II), in order to minimize the total cost of the supply chain and maximize the volume fill rate and the responsiveness level (16). In another study, a multi-objective programming model using fuzzy logic was derived for forward and reverse supply chain systems to minimize the total cost and the environmental impact. The fuzzy-based model consisted of two phases: the first phase converts the probability distribution of multi-objective scenarios into a mixed integer linear programming model and then into an auxiliary crisp model (17); the second phase generates a fuzzy decomposition methodology based on the subjected linear inequalities and variable constraints to search for the non-dominant solution (18). Several models have also been proposed for multi-echelon supply chains with closed loops. For example, a multi-product, multi-stage and multi-period random distribution model was proposed with predetermined goals for a multi-echelon supply chain network exhibiting high fluctuations in market demands and product prices. Similarly, a two-phased multi-criteria fuzzy-based decision approach was proposed whose central focus is to maximize the participants' expected revenues and average inventory levels, while providing robustness of the selected objectives under demand uncertainties (20). Considering travelling distance and travelling time, another multi-objective self-learning algorithm was proposed, namely the fuzzy logic non-dominated sorting genetic algorithm II (FL-NSGA II), which solves a multi-objective Mixed-Integer Linear Programming (MILP) problem of allocating vehicles with multiple depots based on average traffic counts per unit time, average duration of parking and a primitive cost function (20). Alternatively, a bi-objective mathematical model was formulated which searches for local minima of total cost and transportation cost, and which attempts a Pareto search to plot the Pareto-optimal fronts. A new hybrid methodology was also proposed which combines robust optimization, queuing theory and fuzzy inferential logic for multi-objective programming models based on hierarchical if-then rules (22). Another variant of the above soft-computing approaches, a swarm-based optimization technique namely "The Bees Algorithm", was discussed and implemented to deal with multi-objective supply chain problems; it finds Pareto-optimal configurations of a given SCM network and attempts to minimize the total cost and lead-time. For an optimization model integrating cost and time criteria, a modified particle-swarm optimization method (MEDPSO) was adopted for generating solutions of a multi-echelon unbalanced supplier selection problem (24; 25; 26).
2.2 Bayesian Modelling for Risk Analysis

Bayesian logic is one of the most fundamental methods borrowed today for optimizing and reinforcing probabilistic models (10). Quantitative measures found dispersed in any objective-based problem are often modelled using Bayesian analysis (12). For example, modelling a supply chain network as a Directed Acyclic Graph (DAG) welcomes the expansion of conditional probability and measures of similarity to examine the extent to which such graphs can be extrapolated and utilized to infer vague quantities. In contrast to fuzzy logic, where no deterministic model is presumably evaluated, Bayesian modelling assumes the underlying principles of probability distributions to derive conclusions (9). Such methods are extraordinarily employed in modelling risks in supply chain networks. Capturing the effects of risk propagation and assessing the total fragility of a supply chain are considered among the most significant problems in SCRM. The amount of uncertainty and inevitability in supply chain operations transforms this problem into a multi-criteria analytical problem. However, several models were summarized and reinforced to create a well-acknowledged and succinctly derived model for formulating disruption probability and the occurrence of risks. Based on the inventory optimization literature, it is evident that supply risk originates in multiple locations and propagates autonomously within the network. This effect is termed the "ripple effect".
2.3 Reverse Supply Chain Problems

As per the investigation conducted within a limited scope, design strategies and managerial architectures for reverse supply chain systems remain unexplored and lack confirmed literature (8). Optimization problems pertaining to lead-time optimization, inventory-level optimization and cost optimization are traditionally dealt with Genetic Algorithm (GA) based approaches, by spawning a subset of ideal solutions and evaluating fitness values for each solution (18; 19; 22). Heuristic algorithms tend to search for local optimal values and have better dropout ratios as compared to GA-based approaches. On the other hand, evolutionary schemes tend to combine optimization methods with GA approaches (8). For example, a customer-allocation optimization was attempted using MILP based on a multi-tier retailer location model. The model examined the locations and capacities of manufacturing-cum-retailing facilities as well as transportation links, to better facilitate customers with criteria-based requirements. In (8), a Reverse Supply Chain Problem (RSCP) model addressing the design of a reverse manufacturing supply network is proposed. It is a hybrid model consisting of GA-based approaches and underlying 0-1 MILP optimization constraints.
CHAPTER 3
In this chapter, a deterministic mathematical model for each attribute in the supply chain
network is discussed. We have used MILP for formulating risk indices.
For a given supply chain with 4 echelons (Supplier, Distributor, Manufacturer, Retailer) and selective activities such as sourcing or supplying raw material to each component, assembling final products and delivering to destination markets, each component has its own cost, lead-time and associated risk.
1. The Risk Index (RI) is derived from the model proposed in (9), and can be mathematically formulated as:

$RI_{supplier} = \sum_{i=1}^{n} \alpha_{s_i} \cdot \beta_{s_i} \cdot \Big(1 - \prod_{j=1}^{m} \big(1 - P(\tilde{S}_{ij})\big)\Big)$ (3.1)

where $\alpha_{s_i}$ is the consequence to the supply chain if the $i$th supplier fails, $\beta_{s_i}$ is the percentage of value added to the product by the $i$th supplier, and $P(\tilde{S}_{ij})$ denotes the marginal probability that the $i$th supplier fails for the $j$th demand.
Similarly, the risk indices for the rest of the components can be calculated analogously.

2. For each set of demands, the cumulative risk index of the supply chain network can be calculated as:

$TRI = w_1 \cdot RI_{supplier} + w_2 \cdot RI_{distributor} + w_3 \cdot RI_{manufacturer} + w_4 \cdot RI_{retailer}$ (3.5)
4. For each couple of normalized risk index and total cost of the supply chain, the main objective function $Z$ is given as:

$Z = w_1 \cdot TC_n + w_2 \cdot TRI_n$ (3.7)
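To make equations (3.1), (3.5) and (3.7) concrete, the following is a minimal Python sketch that evaluates them on small illustrative numbers; the weights, consequence factors and failure probabilities here are assumptions for the example, not values estimated in this study.

import numpy as np

def risk_index(alpha, beta, P):
    # Eq. (3.1): RI = sum_i alpha_i * beta_i * (1 - prod_j (1 - P_ij))
    P = np.asarray(P)                       # P[i, j]: failure prob. of supplier i for demand j
    fail_any = 1.0 - np.prod(1.0 - P, axis=1)
    return float(np.sum(np.asarray(alpha) * np.asarray(beta) * fail_any))

# Illustrative inputs (assumed)
alpha = [0.6, 0.3]                          # consequence if supplier i fails
beta = [0.5, 0.5]                           # share of value added by supplier i
P = [[0.05, 0.10], [0.02, 0.08]]            # marginal failure probabilities

RI_supplier = risk_index(alpha, beta, P)

# Eq. (3.5): weighted combination over the four echelons
# (the other three risk indices are assumed numbers here)
w = [0.4, 0.2, 0.2, 0.2]
TRI = w[0]*RI_supplier + w[1]*0.04 + w[2]*0.06 + w[3]*0.03

# Eq. (3.7): objective over normalized total cost and risk index
w1, w2 = 0.5, 0.5
TC_n, TRI_n = 0.7, TRI                      # normalization assumed already applied
Z = w1*TC_n + w2*TRI_n
print(RI_supplier, TRI, Z)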
Figure 3.1: Simple supply chain architecture with multiple suppliers, single manufacturer and single retailer
Table 3.1: Example of supply chain logistics
$x_1 + x_2 + x_3 = d$ (3.9)
For example, let us consider that the demand subjected to the supply chain model is
100 units. In that case we form a system of two linear equations, from which we can
obtain a set of possible solutions. For a simplistic case study, we have assumed that the
linear inequality is a linear equality subject to linear constraints, given that the supply
chain model is at a supply-demand equilibrium. This implies that supplementary costs
outside the scope of the suppliers are neglected and the overall expenditure is equated
to the total revenue generated in one echelon.
$x_1 + x_2 + x_3 = 100$ (3.11)
$x_3 = \dfrac{100 + x_1}{4}$ (3.13)

Assume $x_1 = a$; then

$x_3 = \dfrac{100 + a}{4}$ (3.14)

$a + \dfrac{100 + a}{4} + x_2 = 100$ (3.15)

$x_2 = 100 - a - \dfrac{100 + a}{4}$ (3.16)
The linear Diophantine equation can be solved employing the SymPy library in Python. A pseudo code is given below:
from sympy.solvers.diophantine import diophantine
from sympy import symbols

a, b, c = symbols("x, y, z", integer=True)

# Collect per-supplier cost coefficients; suppliers is a dict of
# supplier name -> unit cost prepared earlier from the dataset
coeff = list()
for key in suppliers:
    coeff.append(suppliers[key])

### Base solution
demand = 500
man_cost = 38

base_expr = 2*a + 3*b + 7*c - 4*demand
print(base_expr)
[base_sols] = diophantine(base_expr)
print("Base Soln: " + str(base_sols))
t_0, t_1 = base_sols[2].free_symbols
Figure 3.2: Non-negative integral solutions of the Diophantine equation with d = 100 (units allocated to Suppliers 1-3, plotted against the solution index)
CHAPTER 4
STATISTICAL APPROACHES
In this chapter, we introduce descriptive hypothesis testing and statistical machine learning models for problems related to demand forecasting and curve fitting. In order to observe the nature of the models, we test the selected models with large datasets acquired from multiple sources. Consequently, we tabulate the achieved evaluation metrics and attempt to tune the model parameters in order to mitigate their limitations and boost their individual performances. Finally, we illustrate a well-structured nomenclature to present a generalized conclusion drawn from the observed scenarios.
Table 4.1: Student's T test for "UAE Distributor" dataset: Column E (Sales)

Sample Size (%) | Population Mean | Sample Mean | t | p | Decision
Table 4.2: Student's T test for "UAE Distributor" dataset: Column F (Cost)

Sample Size (%) | Population Mean | Sample Mean | t | p | Decision
The two statistical tests conducted on the same dataset conclude that a percentage of samples extracted from the population can be assumed to be an appropriate representative of the population, and can later be utilized for descriptive analytical tasks such as input-output curve fitting. This promises to reduce redundancy and computational complexity.
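As a hedged illustration of how such a test can be run, the sketch below performs a one-sample Student's t test on a 10% random sample of one column; the file and column names are placeholders standing in for the "UAE Distributor" dataset.

import numpy as np
import pandas as pd
from scipy import stats

df = pd.read_csv("uae_distributor.csv")          # placeholder file name
population = df["Sales"].dropna().to_numpy()     # placeholder column name

rng = np.random.default_rng(0)
sample = rng.choice(population, size=int(0.10 * len(population)), replace=False)

# H0: the sample mean equals the population mean
t_stat, p_value = stats.ttest_1samp(sample, popmean=population.mean())
decision = "Accept H0" if p_value > 0.05 else "Reject H0"
print(t_stat, p_value, decision)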
Table 4.3: Student's T test for "Walmart" dataset: Column K (Weight KG)

Sample Size (%) | Population Mean | Sample Mean | t | p | Decision
$\chi = \dfrac{1}{n}\sum_{i=1}^{n} x_i$ (4.3)
During the course of this study, we reviewed many datasets obtained from multiple sources. Generally, empirical datasets extracted from physical experimental setups are more likely to contain marginal errors, instrumentation errors, human errors etc., and therefore it is crucial to eliminate the redundant error percentage inherent in such datasets. We have transformed the obtained datasets into normalized datasets and redesigned some of the machine learning models as well as the ANN models to observe their learning behaviour on such tightly encapsulated data. The normalized data series, however, retains the original information provided that the normalization process is not too stringent. We must carefully observe the standard deviation of the original series and reasonably select a batch size. Consequently, it is concluded within the scope of this study that the error percentage is drastically reduced when the samples are preprocessed with appropriate measures.
Figure 4.1: Original series vs normalized series, f(x) plotted against x
The pseudo code given below illustrates the algorithmic steps of normalizing a dataset using MATLAB™. As per the pseudo code, samples are grouped in batches of 100 and their respective means are computed. Each mean is substituted as an informative datapoint to yield a dataset axiomatically similar to the original one. This essentially decomposes the dataset by reducing the number of samples to be processed while preserving the integrity of the data. Hence, it is a fruitful approach when considering a high volume of noisy data along with a huge sample size.
% Batched Mean Normalization - Batch_size=100
sample_size = 170000;
batch_size = 100;
batch_input100 = [];
for i = 0:(sample_size/batch_size - 1)   % -1 avoids indexing past the end
    temp = data_input(1 + i*batch_size : batch_size*(i+1), 1);
    batch_input100 = [batch_input100; mean(temp)];   % append batch mean
end
Figure 4.2: Actual vs Predicted plot for normalized data with different batch sizes ((a) over-generalization)
4.2 Machine Learning Models
There are broadly two categories of problems in machine learning, namely regression and classification problems. A regression problem is defined as an estimation problem where the target variable is continuous, whereas in a classification problem the target variable is categorical. For the sake of a comparative study, we will discuss each model extensively and compare the results in section 4.3. Regression and classification algorithms allow one to draw inferences from data and build predictive models.
$y = \theta_1 x_1 + \theta_2 x_2$ (4.5)
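Equation (4.5) can be fitted by ordinary least squares; a minimal scikit-learn sketch on assumed toy data reads:

import numpy as np
from sklearn.linear_model import LinearRegression

# Two illustrative features x1, x2 and a continuous target y
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([5.0, 4.0, 11.0, 10.0])

model = LinearRegression()
model.fit(X, y)
print(model.coef_)                  # estimates of theta_1 and theta_2
print(model.predict([[5.0, 5.0]]))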
4.2.2 Decision Tree Model
Decision Trees are predictors that interpret decisions based on path traversals in a structured tree, beginning from the root node and ending at a leaf node; the leaf node essentially remarks a decision. Every parent node in the decision tree denotes an arbitrary binary test that concludes in either 'True' or 'False' for a classification problem, or results in a real value for a regression problem. Regression trees measure these real values to give numeric responses. The algorithm to grow decision trees is given below:
1. Compute the entropy for the dataset.
2. Select quantitative attributes.
3. For each attribute:
   a. Calculate the entropy of the response variable.
   b. Calculate the mean entropy for the attribute.
   c. Calculate the gain for the attribute.
4. Select the attribute with the highest gain.
5. Repeat for subsequent splits.
$S = -\sum_{i=1}^{N} p_i \log_2(p_i)$ (4.6)

$IG(Q) = S_0 - \sum_{i=1}^{q} \dfrac{N_i}{N} S_i$ (4.7)
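A short Python sketch of equations (4.6) and (4.7), on an assumed toy split, is:

import numpy as np

def entropy(labels):
    # Eq. (4.6): S = -sum_i p_i log2 p_i over class frequencies
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(parent, splits):
    # Eq. (4.7): IG(Q) = S_0 - sum_i (N_i / N) * S_i
    N = len(parent)
    return entropy(parent) - sum(len(s) / N * entropy(s) for s in splits)

# Assumed toy example: a binary response split by one attribute
parent = ["yes", "yes", "yes", "no", "no", "no"]
left, right = ["yes", "yes", "yes"], ["no", "no", "no"]
print(information_gain(parent, [left, right]))   # 1.0: a perfect split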
However, in this study we have considered a regression problem. Hence our objective is to split the tree such that the Residual Sum of Squares (RSS) is minimal. So, we calculate the mean to bifurcate the samples into two splits, each linked to a child node. The child nodes iteratively calculate the mean for the rest of the samples, and the tree is grown until a desired model is achieved; a sketch is given below. Performance measures such as the depth of the tree, the number of splits and the pruning ratio can be controlled by hyper-parameter tuning. This topic is excluded from this study as it requires advanced hands-on experience with the MATLAB™ toolbox and will be discussed in future contributions to the literature.
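For reference, a minimal regression-tree sketch with scikit-learn on synthetic data (our actual runs used the pipelined setups listed in Chapter 8):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))            # single feature
y = 3.0 * X.ravel() + rng.normal(0, 1.0, 200)    # noisy linear target

# Splits are chosen to minimize the RSS; max_depth limits growth
tree = DecisionTreeRegressor(max_depth=4)
tree.fit(X, y)
print(tree.predict([[5.0]]))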
Figure 4.3: Decision Tree Model
4.2.3 Random Forest Regression

Decision trees are prone to variability since the selected features are arbitrarily drawn. Conducting a series of tests on the same sample may produce different results, since an individual decision tree may tend to overfit. In order to overcome this anomaly, decision trees are combined to produce a collective result by combination or pooling, which reduces over-fitting and improves generalization.
Random Forests grow many additive decision trees and harvest a weighted prediction from all the trees. For a regression problem, the individual predictors produce their corresponding predictions and the mean prediction is taken as the final value. The pseudo code for implementing Random Forest regressors is given below:
# Train a Random Forest model.
RFmodel = RandomForestRegressor(labelCol="Order_pure", featuresCol="InpVec",
                                numTrees=3, maxBins=5000)
model = RFmodel.fit(trainingData)

# Make predictions.
predictions = model.transform(testData)

# Evaluate performance.
evaluator = RegressionEvaluator(
    labelCol="Order_pure", predictionCol="prediction", metricName="rmse")
rmse = evaluator.evaluate(predictions)
print("Root Mean Squared Error (RMSE) = %g" % rmse)
Figure 4.5: Actual vs Predicted plot for Random Forest predictors
4.2.4 XGB Tree Model

XGB trees are an ensemble learning approach combining two or more hybrid learning algorithms. As discussed in the previous section, decision tree models exhibit high variance in generalization. Ensemble-based learning approaches overcome this variance in a non-dominant fashion.
Boosting: trees are generated sequentially such that each successive tree is emphasized to minimize the error generated by the preceding tree. The overall error is reduced since every generation of trees reduces the error of its predecessors. In contrast to bagging techniques, in which trees are grown to their maximum extent, boosting uses trees with fewer splits. Several learning parameters, such as the number of trees or iterations, the learning rate or gradient boosting rate, and the depth of each tree, can be optimally selected to reduce the computing overhead.
Boosting Algorithm:
1. Train initial model F_0 to predict target Y.
2. Compute residual error, delta_y = Y - F_0.
3. Create new model H_1 and fit it to the residuals.
4. Combine (F_0 + H_1) to yield F_1.
5. Repeat for F_1.
$F_0(x) = \operatorname{argmin}_{\gamma} \sum_{i=1}^{n} L(y_i, \gamma)$ (4.8)

$\operatorname{argmin}_{\gamma} \sum_{i=1}^{n} L(y_i, \gamma) = \operatorname{argmin}_{\gamma} \sum_{i=1}^{n} (y_i - \gamma)^2$ (4.9)

$F_0(x) = \dfrac{1}{n}\sum_{i=1}^{n} y_i$ (4.10)
The additive model $H_1(x)$ computes the mean of the residuals at each leaf node of the tree. The boosted function $F_1(x)$ is obtained by summing $F_0(x)$ with $H_1(x)$.
Numerical Example
Consider a regression problem with input feature Sales and target variable Quantity.
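As a small assumed numerical sketch of one boosting step for such a problem, following equations (4.8) to (4.10):

import numpy as np

# Assumed toy data: Sales (feature) and Quantity (target)
sales = np.array([10.0, 20.0, 30.0, 40.0])
quantity = np.array([12.0, 18.0, 33.0, 37.0])

# Eq. (4.10): under squared loss, F_0 is the mean of the target
F0 = quantity.mean()                       # 25.0

# Residuals delta_y = Y - F_0
residuals = quantity - F0                  # [-13, -7, 8, 12]

# H_1: a stump splitting at Sales < 25; each leaf predicts the mean
# residual of its samples (-10 on the left, +10 on the right)
H1 = np.where(sales < 25, residuals[:2].mean(), residuals[2:].mean())

# F_1 = F_0 + H_1
F1 = F0 + H1
print(F1)                                  # [15. 15. 35. 35.]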
Pseudo code

### XGBRegressor model parameters
params = {'max_depth': 5,
          'silent': 0,
          'colsample_bytree': 0.3,
          'alpha': 10,
          'learning_rate': 0.38,
          'objective': 'reg:linear',
          'n_estimators': 10}
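A hedged sketch of how such parameters feed an XGBoost regressor follows; the synthetic arrays stand in for our feature pipeline, and the modern objective name 'reg:squarederror' replaces the deprecated 'reg:linear':

import numpy as np
import xgboost as xgb

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(500, 4))             # stand-in feature matrix
y = X @ np.array([2.0, -1.0, 0.5, 3.0]) + rng.normal(0, 0.1, 500)

dtrain = xgb.DMatrix(X[:400], label=y[:400])
dtest = xgb.DMatrix(X[400:], label=y[400:])

params = {'max_depth': 5, 'colsample_bytree': 0.3, 'alpha': 10,
          'learning_rate': 0.38, 'objective': 'reg:squarederror'}
booster = xgb.train(params, dtrain, num_boost_round=10)

pred = booster.predict(dtest)
rmse = float(np.sqrt(np.mean((pred - y[400:]) ** 2)))
print("RMSE:", rmse)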
Figure 4.7: Response Plot for XGBoosted Trees
4.3 Performance Analysis

In this section, we discuss the performance of each model based on empirical observations. We attempted hyper-parameter tuning for the discussed models and examined optimization parameters to better understand the run-time performance of these models. The source code was compiled and executed with varying model parameters, and the respective loss function was evaluated as a MILP optimization problem, the loss function being the sole objective function.
Table 4.5: Tabulated performances of ML Models

Model | Test loss | Tuning loss | RMSE | Trees Pruned | Max Depth | Training Time (s) | Memory Utilized
Decision Tree (maxBins=100) | 0.59 | 0.60 | 0.76 | 7 | 22 | 67 | 66B
Random Forest (n_estimators=50, maxBins=100) | 2.36 | 1.82 | 1.53 | 23 | 3 | 82 | 246B
Random Forest (n_estimators=100, maxBins=100) | 1.20 | 2.66 | 1.09 | 18 | 3 | 87 | 107B
XGB Trees (colsample_bytree=0.6, alpha=10, learning_rate=0.25, max_depth=100, n_estimators=100) | 7.90 | 7.28 | 2.81 | 38 | 7 | 1020 | 2081B
XGB Trees (colsample_bytree=0.3, alpha=8, learning_rate=0.38, max_depth=50, n_estimators=50) | 6.69 | 6.28 | 2.58 | 29 | 2 | 340 | ~1040B
As observed in Table 4.5, the respective performance metrics for each model are noted and tabulated. We achieve a low test error for the Random Forest and Decision Tree models as compared to the XGB models. This comparison is drawn based on two hypothetical assumptions.
Hyper-parameter tuning for the XGB tree models is performed extensively after carefully examining the dataset and its statistical parameters. We employ data visualization techniques to reduce internal dependency on stochastic constraints and estimate the number of predictors/estimators based on the average rate of change of the response variable. We achieve a decent score; however, since the number of boosting rounds is only 5, the optimization process is terminated after obtaining a satisfactory measurement. In the context of learning parameters, we have noticed that an incremental change in the learning rate brings a linear change in the performance and in the topology of the model, which played a central role in optimizing the objective function.
CHAPTER 5
NEURAL APPROACHES
We carefully examine the neural network architecture, ranging from the number of hidden layers used to the number of hidden neurons inside every layer. It is imperative to first select a distinctive problem and a neural network model in order to determine its model and run-time performance. For example, in our experiment we have considered the dataset "UAE_distributor.xlsx" in order to evaluate a Multi-Layered Perceptron (MLP) model and perform a function approximation to predict the invoiced sales quantity.
The dataset contains 6011 samples with the above-mentioned attributes. We split the population into 80% training samples and 20% testing samples. We use the generalized back-propagation algorithm, as described in Appendix B. We then tabulate the observations after performing multiple experiments, altering the chronology of training and randomizing the samples.
Table 5.1: Recorded Performance for MLP Network with 10 Hidden Layers

Network Architecture | Performance | Gradient | Mu | Epochs | Time | General Remarks
MLP Network (1 input layer + 10 hidden layers + 1 output layer) | 54.0 | 96.1 | 1.00 | 25 | 0.03 | Accepted
 | 45.8 | 1.13e+03 | 1.00 | 10 | 0.01 | Best Fit
 | 62.6 | 1.60e+03 | 10 | 35 | 0.05 | Acceptable
 | 63.6 | 50.9 | 10 | 6 | 0.01 | Acceptable
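For reproducibility, the following is a minimal Python analogue of this MLP experiment, with scikit-learn's MLPRegressor standing in for the MATLAB network; the file and column names are assumptions.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

df = pd.read_excel("UAE_distributor.xlsx")       # placeholder column names below
X = df[["Sales", "Cost"]].to_numpy()
y = df["Quantity"].to_numpy()

# 80/20 split, as in the experiment above
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2)

# 10 hidden layers of 10 neurons each, trained by backpropagation
mlp = MLPRegressor(hidden_layer_sizes=(10,) * 10, max_iter=500)
mlp.fit(X_tr, y_tr)
print("R^2 on the test split:", mlp.score(X_te, y_te))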
Figure 5.2: Actual vs Predicted Plot for MLP network with 10 Hidden Layers
5.2 Long-Short Term Memory (LSTM) Networks
LSTM is a type of artificial neural network designed especially for time-series prediction problems. LSTMs have an input gate, an output gate, a forget gate and a memory cell, which are connected by loops adding feedback over time. The memory cell stores states, which allows LSTMs to generalize patterns across large spans of data points rather than succumbing to immediate patterns. As more layers of LSTM are added to the model, it has been found successful across a diverse range of problems such as image description, grammar learning, music composition, language translation, etc. A unidirectional LSTM stores information that has appeared in the past, whereas a bidirectional LSTM stores information from the past as well as the future in time-series problems, maintaining two different hidden states for this purpose. A bidirectional LSTM is thus found more effective than the unidirectional variant, as it understands the context better.
5.2.1 Architecture
[Figure: LSTM memory cell with input gate $\iota$, forget gate $\phi$, output gate $\omega$, cell input $a_c^t$ and recurrent state $s_c$]

Notations
• $a_j^t$ is the network input to node $j$ at a particular time $t$
• $\iota$, $\phi$ and $\omega$ represent the input gate, forget gate and output gate respectively
Forward pass:

Input gates

$a_\iota^t = \sum_{i=1}^{I} w_{i\iota} x_i^t + \sum_{h=1}^{H} w_{h\iota} b_h^{t-1} + \sum_{c=1}^{C} w_{c\iota} s_c^{t-1}$ (5.1)

Forget gates

$a_\phi^t = \sum_{i=1}^{I} w_{i\phi} x_i^t + \sum_{h=1}^{H} w_{h\phi} b_h^{t-1} + \sum_{c=1}^{C} w_{c\phi} s_c^{t-1}$ (5.3)

Cells

$a_c^t = \sum_{i=1}^{I} w_{ic} x_i^t + \sum_{h=1}^{H} w_{hc} b_h^{t-1}$ (5.5)

Output gates

$a_\omega^t = \sum_{i=1}^{I} w_{i\omega} x_i^t + \sum_{h=1}^{H} w_{h\omega} b_h^{t-1} + \sum_{c=1}^{C} w_{c\omega} s_c^{t-1}$ (5.7)

Cell outputs

$b_\omega^t = f(a_\omega^t)$ (5.9)
Backward pass:

$\epsilon_c^t = \dfrac{\partial O}{\partial b_c^t}$ (5.10)

$\epsilon_s^t = \dfrac{\partial O}{\partial s_c^t}$ (5.11)

Input gates

$\delta_\iota^t = f'(a_\iota^t) \sum_{c=1}^{C} g(a_c^t)\,\epsilon_s^t$ (5.12)

Forget gates

$\delta_\phi^t = f'(a_\phi^t) \sum_{c=1}^{C} s_c^{t-1}\,\epsilon_s^t$ (5.13)

Cells

$\delta_c^t = b_\iota^t\, g'(a_c^t)\,\epsilon_s^t$ (5.14)

States

$\epsilon_s^t = b_\omega^t\, h'(s_c^t)\,\epsilon_c^t + b_\phi^{t+1}\,\epsilon_s^{t+1} + w_{c\iota}\,\delta_\iota^{t+1} + w_{c\omega}\,\delta_\omega^{t+1}$ (5.15)

Output gates

$\delta_\omega^t = f'(a_\omega^t) \sum_{c=1}^{C} h(s_c^t)\,\epsilon_c^t$ (5.16)

Cell outputs

$\epsilon_c^t = \sum_{k=1}^{K} w_{ck}\,\delta_k^t + \sum_{h=1}^{H} w_{ch}\,\delta_h^{t+1}$ (5.17)

where $f(x) = \frac{1}{1+e^{-x}}$, $g(x) = \frac{4}{1+e^{-x}} - 2$ and $h(x) = \frac{2}{1+e^{-x}} - 1$, so that $g(x) \in [-2, 2]$ and $h(x) \in [-1, 1]$.
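To make the forward pass (5.1)-(5.9) concrete, here is a NumPy sketch of a single time-step for one memory cell; the weights are random stand-ins and the peephole connections follow the equations above.

import numpy as np

rng = np.random.default_rng(0)

def f(x): return 1 / (1 + np.exp(-x))        # gate activation, range (0, 1)
def g(x): return 4 / (1 + np.exp(-x)) - 2    # cell input activation, range (-2, 2)
def h(x): return 2 / (1 + np.exp(-x)) - 1    # cell output activation, range (-1, 1)

I = 3                                        # input dimension; one cell (C = H = 1)
x = rng.normal(size=I)                       # current input x^t
b_prev, s_prev = 0.0, 0.0                    # previous output b^{t-1} and state s^{t-1}

# stand-in input, recurrent and peephole weights per gate/cell
W = {k: (rng.normal(size=I), rng.normal(), rng.normal())
     for k in ("iota", "phi", "c", "omega")}

def net(k, s):                               # w_i.x^t + w_h.b^{t-1} + w_c.s
    wi, wh, wc = W[k]
    return wi @ x + wh * b_prev + wc * s

a_iota = net("iota", s_prev)                 # eq. (5.1), input gate
a_phi = net("phi", s_prev)                   # eq. (5.3), forget gate
a_c = W["c"][0] @ x + W["c"][1] * b_prev     # eq. (5.5), cell input (no peephole)

s_t = f(a_phi) * s_prev + f(a_iota) * g(a_c) # state update
a_omega = net("omega", s_t)                  # eq. (5.7), output gate peeps at s^t
b_t = f(a_omega) * h(s_t)                    # cell output
print(s_t, b_t)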
5.2.2 Model
We used a bi-directional LSTM to predict the selling price of items. Our feature vector consisted of the selling prices quoted by the supplier, distributor, manufacturer and retailer. We used raw data of prices fluctuating every two minutes across 42 months, from 9 Feb 2014 to 28 August 2017. The total number of data points was above 800,000. The model was trained on 70% of the data, while 20% and 10% were used for testing and validation respectively. The following diagram represents the model summary.
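In outline, the stacked architecture summarized there corresponds to the Keras model whose full listing appears in Chapter 8 (layer sizes taken from that listing):

from keras.models import Sequential
from keras.layers import Bidirectional, LSTM, Dropout, Dense

# Two stacked bidirectional LSTM layers of 32 units with dropout,
# over input windows of 100 time-steps with 5 features each
model = Sequential()
model.add(Bidirectional(LSTM(32, return_sequences=True), input_shape=(100, 5)))
model.add(Dropout(0.2))
model.add(Bidirectional(LSTM(32)))
model.add(Dropout(0.2))
model.add(Dense(1))                          # single predicted price
model.compile(loss='mean_squared_error', optimizer='adam')
model.summary()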
Figure 5.5: Error Surface plot for Bi-LSTM Model
5.3 Performance Analyses

This section outlines the performance of each model extensively, based on observations as well as empirical results. All the models were carefully optimized and tested for evaluation against benchmarks. We started by projecting the performance of each model in two-dimensional plots and attempted to deduce their inherent model performance by simple extrapolation. The principal objectives accounted for in every model are the model parameters, such as the sample size, the number of hidden layers, the number of neurons in each layer, and their effect on the learning parameters. We analysed the obtained results and extrapolated the model expectations further under the same learning parameters.
Figure 5.6: Observed training performance of MLP networks after 3 training cycles ((a) training cycle 1, (b) training cycle 2)
CHAPTER 6
The NAR model is a predictive model that considers an input sequence y(t) of the preceding 'd' time-steps to predict the next sequence of time-steps. This type of forecasting problem is termed sequence-to-sequence prediction, as the input sequence is coherently mapped to the target sequence.
$y(t) = f\big(y(t-1), y(t-2), y(t-3), \ldots, y(t-d)\big)$ (6.1)
4. Predict for series y(t) and evaluate performance.
% Sketch assuming narnet from the Deep Learning Toolbox; y is the
% target series (cell array of values)
NAR_model = narnet(1:4, 10);                 % 4 delays, 10 hidden neurons
[Xs, Xi, Ai, Ts] = preparets(NAR_model, {}, {}, y);
NAR_model = train(NAR_model, Xs, Ts, Xi, Ai);
[Y, Xf, Af] = NAR_model(Xs, Xi, Ai);
perf = perform(NAR_model, Ts, Y)
Network Architecture | Performance | Gradient | Mu | Epochs | Time
NAR Model (delay 4 time-steps) | 7.92e-05 | 1.67e-04 | 1.00e-06 | 12 | 0.04
 | 8.10e-05 | 5.18e-05 | 1.00e-06 | 18 | 0.04
 | 7.99e-05 | 1.48e-04 | 1.00e-07 | 6 | 0.01
 | 7.49e-05 | 2.88e-04 | 1.00e-07 | 14 | 0.04
 | 8.64e-05 | 1.44e-04 | 1.00e-07 | 20 | 0.05
A performance of 7.92e-05 denotes a very desirable model for sequence-to-sequence prediction.
6.1.2 Results
Figure 6.3: Error Surface Plot for NAR Model
Fig. 6.3 suggests that the error surface for the NAR model is curvi-linear. The plot suggests a monotonous increase in the absolute error; however, we need to calculate the rate of change of error to draw quantitative conclusions in comparison with the results obtained from other models. The rate of change of error can be derived as follows:
$y = b\,e^x$

$\dfrac{dy}{dx} = b\,e^x$

$\tan\theta = b\,e^x$

$\theta = \tan^{-1}(b\,e^x)$

$\dfrac{d\theta}{dx} = \dfrac{d}{dx}\big(\tan^{-1}(b\,e^x)\big) = \dfrac{b\,e^x}{1 + b^2 e^{2x}}$
CHAPTER 7
CONCLUSION
The statistical hypothesis tests conducted across several datasets denote that arbitrary decisions on random sampling and data optimization cannot, in general, be considered affirmative. As a contradictory remark, our hypothesis was rejected for the Walmart dataset. The tabulated results show that even at a 0.05 level of significance, the difference between the sample mean and the population mean can vary and is indeterminate until calculated carefully. This study helped us refrain from making hypothetical assumptions regarding the distribution of a dataset. In conclusion, it is clear that the hypothesis validation depends entirely on the distribution of the dataset and its degree of skewness.
CHAPTER 8
CODE ANALYSIS
In this chapter, the program code related to our work using MATLAB™ and Python is presented. Additionally, we have also provided a dynamic code analysis generated using the MATLAB™ Code Analyzer.

8.1 Source Code
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Thu Feb 28 13:18:54 2019
Random Forest Regression with Pyspark
@author: heerokbanerjee
"""

import pandas as pd
from pyspark.ml import Pipeline
from pyspark.ml.regression import RandomForestRegressor
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.feature import Imputer
from pyspark.ml.feature import StringIndexer
from pyspark.sql.session import SparkSession
from pyspark.context import SparkContext

sc = SparkContext('local')
spark = SparkSession(sc)

# Importing dataset
dataset = spark.read.format("csv").option("header", "true").load(
    "/home/heerokbanerjee/Documents/hpd.csv")
dataset = dataset.withColumn("Order_Demand",
                             dataset["Order_Demand"].cast('double'))
dataset = dataset.select("Product_Code", "Warehouse", "Product_Category",
                         "Date", "Order_Demand")

# Index labels, adding metadata to the label column.
# Fit on whole dataset to include all labels in index.
CodeIndexer = StringIndexer(inputCol="Product_Code", outputCol="CodeIndex",
                            handleInvalid="skip")
WarehouseIndexer = StringIndexer(inputCol="Warehouse",
                                 outputCol="WarehouseIndex",
                                 handleInvalid="skip")
CategoryIndexer = StringIndexer(inputCol="Product_Category",
                                outputCol="CategoryIndex",
                                handleInvalid="skip")
DateIndexer = StringIndexer(inputCol="Date", outputCol="DateIndex",
                            handleInvalid="skip")

assembler = VectorAssembler(
    inputCols=["CodeIndex", "WarehouseIndex", "CategoryIndex", "DateIndex"],
    outputCol="Ghoda", handleInvalid="skip")

# ... (lines 40-47 of the original listing, defining DemandImputer, the
# RandomForestRegressor rf and the trainingData/testData split, were not
# recovered)

# Chain indexers and forest in a Pipeline
pipeline = Pipeline(stages=[CodeIndexer, WarehouseIndexer, CategoryIndexer,
                            DateIndexer, assembler, DemandImputer, rf])

# Train model. This also runs the indexers.
model = pipeline.fit(trainingData)

# Make predictions.
predictions = model.transform(testData)

predictions.distinct().show()

evaluator = RegressionEvaluator(
    labelCol="Order_pure", predictionCol="prediction", metricName="rmse")
rmse = evaluator.evaluate(predictions)
print("Root Mean Squared Error (RMSE) = %g" % rmse)

### plotting graph
eg = predictions.select("prediction", "Order_pure", "Ghoda").limit(1000)
panda_eg = eg.toPandas()
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Wed Jan 23 02:12:41 2019
XGBoosted Classification with Pyspark and xgboost lib
@author: heerokbanerjee
"""

import numpy as np

from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.feature import StringIndexer
# from pyspark.ml.classification import DecisionTreeClassifier
from pyspark.sql.session import SparkSession
from pyspark.context import SparkContext

from sklearn import model_selection
from sklearn.metrics import accuracy_score

import xgboost as xgb

sc = SparkContext('local')
spark = SparkSession(sc)

# ... (lines 25-37 of the original listing, defining fname_train and the
# helper functions spark_read and convert_to_numeric, were not recovered)

### Import Training dataset
data = spark_read(fname_train)
data = data.select("ARRIVAL DATE", "WEIGHT (KG)", "MEASUREMENT", "QUANTITY",
                   "CARRIER CITY")
(train_data, test_data) = data.randomSplit([0.8, 0.2])
train_data = convert_to_numeric(train_data)

### Pipeline Component1
### StringIndexer for Column "Timestamp"
dateIndexer = StringIndexer(
    inputCol="ARRIVAL DATE",
    outputCol="dateIndex", handleInvalid="skip")

### Pipeline Component2
### StringIndexer for Column "Label"
carrierIndexer = StringIndexer(
    inputCol="CARRIER CITY",
    outputCol="carrierIndex", handleInvalid="skip")

### Pipeline Component2
### VectorAssembler
vecAssembler = VectorAssembler(
    inputCols=["WEIGHT (KG)", "MEASUREMENT", "QUANTITY", "dateIndex"],
    outputCol="vecFea", handleInvalid="skip")

### Pipeline Component3
### GBT Classifier
# dt_class = DecisionTreeClassifier(labelCol="IndexLabel", featuresCol="vecFea")

### Training - Pipeline Model
pipe = Pipeline(stages=[dateIndexer, carrierIndexer, vecAssembler])
pipe_model = pipe.fit(train_data)

output = pipe_model.transform(train_data)
out_vec = output.select("dateIndex", "vecFea").show(10)

# ... (lines 84-88: construction of the xgb.DMatrix xgb_train from the
# assembled features was not recovered)

params = {'max_depth': 2,
          'silent': 0,
          'learning_rate': 0.38,
          'objective': 'multi:softprob',
          'num_class': 284}

# ... (lines 95-104: construction of xgb_test and Y_test was not recovered)

# xgbmodel = XGBClassifier()
xgbmodel = xgb.train(params, xgb_train, 10)
print(xgbmodel)

### Testing Pipeline + XGBoost Classifier
test_output = pipe_model.transform(test_data)

xgb_output = xgbmodel.predict(xgb_test)
print(xgb_output)

predictions = np.asarray([np.argmax(line) for line in xgb_output])
print(predictions)

### Determining Accuracy Score
accuracy = accuracy_score(Y_test, predictions)
print("Accuracy: %.2f%%" % (accuracy * 100.0))
import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import Dropout
from keras.layers import LSTM
from keras.layers import Dense
from keras.layers import Bidirectional
from keras.callbacks import ModelCheckpoint
from sklearn.metrics import mean_squared_error as mse

# ... (lines 11-13 of the original listing, loading the raw price
# dataframe df, were not recovered)

def remove_duplicates(df):
    ColoumnArr = np.array(df.loc[:, 'Timestamp'])   # .loc replaces the deprecated .ix
    i = 0
    ArrLen = len(ColoumnArr)
    index_duplicate = []
    # identify duplicates by index
    while (i < ArrLen - 1):
        if ColoumnArr[i] == ColoumnArr[i + 1]:
            index_duplicate.append(i + 1)
        i += 1
    # remove duplicates
    df = df.drop(index_duplicate)
    return df


def avg_over_time(df, indexCol=0):
    avg = {}
    colLen = df.shape[0]
    for x in range(colLen):
        time = str(df[x][indexCol])[11:]
        if time not in avg:
            avg[time] = [[0.0, 0.0, 0.0, 0.0, 0.0], 0]
        for colNo in range(1, 6):
            avg[time][0][colNo - 1] += float(df[x][colNo])
        avg[time][1] += 1
    for key, val in avg.items():
        avg[key] = [x * 1.0 / val[1] for x in val[0]]
    return avg


def replace_noise(df, indexCol=0):
    avg = avg_over_time(df, indexCol)
    colLen = df.shape[0]
    for x in range(colLen):
        time = str(df[x][indexCol])[11:]
        for col in range(1, 6):
            if df[x][col] == 0:
                try:
                    df[x][col] = avg[time][col - 1]
                except KeyError:
                    print(x)
                    print(col)
    return df

# ... (lines 58-63: the cleaning calls applied to df were not recovered)

# data preparation
seq_length = 100
DataX = []
DataY = []

for x in range(len(df) - seq_length):
    SeqX = df[x:x + seq_length, 1:6]
    SeqY = df[x + seq_length, 5]
    DataX.append(SeqX)
    DataY.append(SeqY)

DataX = np.array(DataX)
DataY = np.array(DataY)

# ... (lines 78-85: the split into TrainDataX, TrainDataY, TestDataX and
# TestDataY was not recovered)

# developing the model
model = Sequential()
model.add(Bidirectional(LSTM(32, return_sequences=True),
                        input_shape=(TrainDataX.shape[1], TrainDataX.shape[2])))
model.add(Dropout(0.2))
model.add(Bidirectional(LSTM(32)))
model.add(Dropout(0.2))
model.add(Dense(TrainDataY.shape[1]))
filename = "best_weight.hdf5"
model.load_weights(filename)
model.compile(loss='mean_squared_error', optimizer='adam')
# checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1,
#                              save_best_only=True, mode='min')
# callbacks_list = [checkpoint]
# fit the model
# model.fit(TrainDataX, TrainDataY, nb_epoch=50, batch_size=64,
#           callbacks=callbacks_list)
print("loaded weights")
y = model.predict(TestDataX)
print("predicted")
print(mse(TestDataY, y))
8.2 Dynamic Code Analysis
We executed the MATLAB programs on the MATLAB™ online cloud platform and the machine learning models with standalone Python libraries. Additionally, a series of benchmark tests was conducted to assess whether they can be applied to extrapolated samples from the original datasets. The time complexity for each successive function call is given in order to trace the call stack.
Figure 8.3: Execution of Code snippet for XGB Trees
Along with the execution of each model, we also attempted to understand the inherent time complexity and functional dependencies of our algorithmic steps. In order to perform a line-by-line dynamic code analysis and obtain a tabulated result, we chose the MATLAB™ Code Analyzer to generate an automated report. The report depicts the total time elapsed for each function call and allows a given source code to be optimized in terms of modular programming standards.
(a) Breakdown of Total Elapsed Time for Function calls
8.3 Test Cases
In this section, we have tabulated the test cases and their respective dimensions as sub-
jected to the prepared models.
CHAPTER 9
PUBLICATION
Grishma, Saparia: Time Series Dataset for Risk Assessment in Supply Chain Networks, ResearchGate, DOI: 10.17632/gystn6d3r4.2
Finally, the results from this study are cross-validated with pre-existing models, and we have finished drafting our research paper. We plan to submit the paper for a double-blind peer review at the Operations Research - PubsOnline library by April 2019.
APPENDIX A
DATASET DESCRIPTION
The dataset "wallmart.csv" contains an extensive amount of data with specific geo-
graphical data like address, port description, destination details etc pertaining to the
inherent logistics involved in a typical Walmart supply chain.
The dataset contains features such as: SHIPPER, SHIPPER ADDRESS, CON-
SIGNEE, CONSIGNEE ADDRESS, ZIPCODE, NOTIFY, NOTIFY ADDRESS, BILL
OF LADING, ARRIVAL DATE, WEIGHT (LB), WEIGHT (KG), FOREIGN PORT,
US PORT QUANTITY, Q.UNIT, MEASUREMENT, M.UNIT, SHIP REGISTERED
IN, VESSEL NAME, CONTAINER NUMBER, CONTAINER COUNT, PRODUCT
DETAILS, MARKS AND NUMBERS, COUNTRY OF ORIGIN, DISTRIBUTION
PORT, HOUSE vs MASTER, MASTER B/L CARRIER CODE, CARRIER NAME,
CARRIER ADDRESS, CARRIER CITY, CARRIER STATE, CARRIER ZIP, PLACE
OF RECEIPT
APPENDIX B
BACKPROPAGATION ALGORITHM
Figure B.1: Simple neural network architecture with one input layer of ’n’ neurons,
one hidden layer of ’l’ neurons and one output layer of ’m’ neurons
$E_p(m) = \dfrac{1}{2}\sum_{k=1}^{m} \delta_{O_k}^2$ (B.2)
Using the estimate of gradient descent along the error surface to determine the weight update $\Delta W_{ij}$, we get

$\Delta W_{ij}(m) = -\eta\, \dfrac{\partial E_p(m)}{\partial W_{ij}}$ (B.3)
$net_{pj}^{h} = \sum_{i=1}^{n} (W_{ij}\cdot X_{pi}) + b_j^{h}$ (B.5)
$net_{pk}^{o} = \sum_{j=1}^{l} (W_{jk}^{o}\cdot y_{pj}^{h}) + b_k^{o}$ (B.7)
Using eq. (B.1) to eq. (B.8), we derive the equation for updating the output-layer weights (between the hidden layer and the output layer):

$\Delta W_{jk}(m) = -\eta\, \dfrac{\partial E_p(m)}{\partial W_{jk}}$
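A compact NumPy sketch of this algorithm for the n-l-m network of Figure B.1, assuming sigmoid activations and a squared-error loss, is given below.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n, l, m = 4, 3, 2                                # layer sizes as in Figure B.1
W1, b1 = rng.normal(size=(n, l)), np.zeros(l)    # input -> hidden weights
W2, b2 = rng.normal(size=(l, m)), np.zeros(m)    # hidden -> output weights
eta = 0.1                                        # learning rate

X = rng.normal(size=(8, n))                      # toy input batch
T = rng.uniform(size=(8, m))                     # toy targets

for epoch in range(100):
    # forward pass, eqs. (B.5) and (B.7)
    y_h = sigmoid(X @ W1 + b1)
    O = sigmoid(y_h @ W2 + b2)

    # output-layer delta from E_p = 1/2 * sum (T - O)^2
    delta_o = (O - T) * O * (1 - O)
    # hidden-layer delta, propagating delta_o back through W2
    delta_h = (delta_o @ W2.T) * y_h * (1 - y_h)

    # gradient-descent updates, eq. (B.3)
    W2 -= eta * y_h.T @ delta_o
    b2 -= eta * delta_o.sum(axis=0)
    W1 -= eta * X.T @ delta_h
    b1 -= eta * delta_h.sum(axis=0)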
REFERENCES
[1] Ivanov, Dmitry, Alexander Tsipoulanidis, and Jörn Schönberger. "Basics of Supply Chain and Operations Management." Global Supply Chain and Operations Management. Springer, Cham, 2017. 1-14.
[2] Van der Vorst, J. G. A. J. "Supply Chain Management: theory and practices." Bridg-
ing Theory and Practice. Reed Business, 2004. 105-128.
[3] Guedes, Edson Júnior Gomes, et al. "Risk Management in the Supply Chain of the
Brazilian automotive industry." Journal of Operations and Supply Chain Manage-
ment 8.1 (2015): 72-87.
[4] Flynn, Barbara, Mark Pagell, and Brian Fugate. "Survey research design in supply
chain management: the need for evolution in our expectations." Journal of Supply
Chain Management 54.1 (2018): 1-15.
[5] Scheibe, Kevin P., and Jennifer Blackhurst. "Supply chain disruption propagation:
a systemic risk and normal accident theory perspective." International Journal of
Production Research 56.1-2 (2018): 43-59.
[6] Ghadge, Abhijeet, et al. "A systems approach for modelling supply chain risks."
Supply chain management: an international journal 18.5 (2013): 523-538.
[7] Ojha, Ritesh, et al. "Bayesian network modelling for supply chain risk propaga-
tion." International Journal of Production Research 56.17 (2018): 5795-5819.
[8] Santibanez-Gonzalez, Ernesto Del R., and Henrique Pacca Luna. "An Evolutionary
Scheme for Solving a Reverse Supply Chain Design Problem." Proceedings on the
International Conference on Artificial Intelligence (ICAI). The Steering Committee
of The World Congress in Computer Science, Computer Engineering and Applied
Computing (WorldComp), 2012.
[9] Neureuther, Brian D., and George Kenyon. "Mitigating supply chain vulnerability."
Journal of marketing channels 16.3 (2009): 245-263.
[10] Käki, Anssi, Ahti Salo, and Srinivas Talluri. "Disruptions in supply networks: A
probabilistic risk assessment approach." Journal of Business Logistics 36.3 (2015):
273-287.
[11] Scheibe, Kevin P., and Jennifer Blackhurst. "Supply chain disruption propagation:
a systemic risk and normal accident theory perspective." International Journal of
Production Research 56.1-2 (2018): 43-59.
[12] Mastrocinque, Ernesto, et al. "A multi-objective optimization for supply chain network using the bees algorithm." International Journal of Engineering Business Management 5 (2013): 5-38.
[13] Cao, Cejun, et al. "A novel multi-objective programming model of relief distribu-
tion for sustainable disaster supply chain in large-scale natural disasters." Journal
of Cleaner Production 174 (2018): 1422-1435.
[14] Hajikhani, Alborz, Mohammad Khalilzadeh, and Seyed Jafar Sadjadi. "A fuzzy
multi-objective multi-product supplier selection and order allocation problem in
supply chain under coverage and price considerations: An urban agricultural case
study." Scientia Iranica 25.1 (2018): 431-449.
[15] Loni, Parvaneh, Alireza Arshadi Khamseh, and Seyyed Hamid Reza Pasandideh.
"A new multi-objective/product green supply chain considering quality level repro-
cessing cost." International Journal of Services and Operations Management 30.1
(2018): 1-22.
[16] Cao, Cejun, et al. "A novel multi-objective programming model of relief distribu-
tion for sustainable disaster supply chain in large-scale natural disasters." Journal
of Cleaner Production 174 (2018): 1422-1435.
[17] Cao, Cejun, et al. "A novel multi-objective programming model of relief distribu-
tion for sustainable disaster supply chain in large-scale natural disasters." Journal
of Cleaner Production 174 (2018): 1422-1435.
[18] Singh, Sujeet Kumar, and Mark Goh. "Multi-objective mixed integer program-
ming and an application in a pharmaceutical supply chain." International Journal of
Production Research 57.4 (2019): 1214-1237.
[19] Metiaf, Ali, et al. "Multi-objective Optimization of Supply Chain Problem Based
NSGA-II-Cuckoo Search Algorithm." IOP Conference Series: Materials Science
and Engineering. Vol. 435. No. 1. IOP Publishing, 2018.
[20] Hendalianpour, Ayad, et al. "A linguistic multi-objective mixed integer program-
ming model for multi-echelon supply chain network at bio-refinery." EuroMed
Journal of Management 2.4 (2018): 329-355.
[21] Park, Kijung, Gül E. Okudan Kremer, and Junfeng Ma. "A regional information-
based multi-attribute and multi-objective decision-making approach for sustainable
supplier selection and order allocation." Journal of Cleaner Production 187 (2018):
590-604.
[23] Cao, Cejun, et al. "A novel multi-objective programming model of relief distribu-
tion for sustainable disaster supply chain in large-scale natural disasters." Journal
of Cleaner Production 174 (2018): 1422-1435.
[25] Çalık, Ahmet, et al. "A Novel Interactive Fuzzy Programming Approach for Optimization of Allied Closed-Loop Supply Chains." International Journal of Computational Intelligence Systems 11.1 (2018): 672-691.
[26] Park, Kijung, Gül E. Okudan Kremer, and Junfeng Ma. "A regional information-
based multi-attribute and multi-objective decision-making approach for sustainable
supplier selection and order allocation." Journal of Cleaner Production 187 (2018):
590-604.