
A COMPARATIVE STUDY ON STATISTICAL AND
NEURAL APPROACHES FOR OPTIMIZING SUPPLY
CHAIN MANAGEMENT (SCM) SYSTEMS

A PROJECT REPORT

Submitted by
HEEROK BANERJEE [Reg No: RA1511008010064]
DIKSHIKA K. ASOLIA [Reg No: RA1511008010214]
PRIYANSHI GARG [Reg No: RA1511008010224]
NIKHIL SHAW [Reg No: RA1511008010233]
GRISHMA SAPARIA [Reg No: RA1511008010251]

Under the Guidance of


Dr. V. GANAPATHY
(Professor, Department of Information Technology)
In partial fulfillment of the Requirements for the Degree
of
BACHELOR OF TECHNOLOGY
in

INFORMATION TECHNOLOGY

DEPARTMENT OF INFORMATION TECHNOLOGY


FACULTY OF ENGINEERING AND TECHNOLOGY
SRM INSTITUTE OF SCIENCE AND TECHNOLOGY
KATTANKULATHUR-603203
MAY 2019

SRM INSTITUTE OF SCIENCE AND TECHNOLOGY
KATTANKULATHUR-603203

BONAFIDE CERTIFICATE

Certified that this project report titled “A COMPARATIVE STUDY ON STATISTICAL AND NEURAL APPROACHES FOR OPTIMIZING SUPPLY CHAIN MANAGEMENT (SCM) SYSTEMS” is the bonafide work of “HEEROK BANERJEE [Reg No: RA1511008010064], DIKSHIKA K. ASOLIA [Reg No: RA1511008010214], PRIYANSHI GARG [Reg No: RA1511008010224], NIKHIL SHAW [Reg No: RA1511008010233], GRISHMA SAPARIA [Reg No: RA1511008010251]”, who carried out the project work under my supervision.

Certified further that, to the best of my knowledge, the work reported herein does not form part of any other thesis or dissertation on the basis of which a degree or award was conferred on an earlier occasion for this or any other candidate.

Dr. V. GANAPATHY                                Dr. G. VADIVU
GUIDE                                           HEAD OF THE DEPARTMENT
Professor                                       Dept. of Information Technology
Dept. of Information Technology

Signature of Internal Examiner Signature of External Examiner

ABSTRACT

The advent of AI tools in industrial management and business operations has broadly reinforced the interplay among business entities in the digital realm. These autonomous tools are undoubtedly powerful in employing self-learning and robust paradigms to facilitate predictive analytics for business intelligence, but such tools still remain insufficient to overcome the impact of the inevitable risks involved in businesses. In the context of Supply Chain Management (SCM), it is therefore an open problem to eliminate, and more significantly to optimize, the impact of such risky operations on high-priority objectives such as total cost, lead-time and inventory costs, which have long been targets for supply chain managers. In this study, we compare widely employed statistical and neural approaches to perform forecasting and conduct numerical analyses on multi-echelon supply chain networks. We start with experimental hypothesis testing on datasets acquired from multiple sources and hypothesize over different batches of data to evaluate their measures of centrality and correlation attributes. This essentially reduces the input batches that are forwarded to the forecasting models, hence relieving the system from computational overhead. We then proceed to evaluate machine learning models such as Decision Trees, Random Forests and Extended Gradient Boosting (XGB) Trees with pipelined architectures in order to observe their model and runtime performances on empirical and streaming data. We compare the obtained results to draw conclusions on their run-time performances. Next, we construct and train neural network models for use-cases such as function estimation and time-series analysis on forecasting problems related to risk-averse logistics and lead-time optimization. We achieved the best results with a Bi-directional Long Short-term Memory (LSTM) model and a Non-linear Auto-Regressive (NAR) model for sequence-to-sequence prediction. We compare these models based on their performance, their loss functions and their ability to generalize trends from the datasets. Finally, we conclude our study by commenting on the observed performances of the selected models and providing future directions to extend the contributions of this comparative study.
ACKNOWLEDGEMENTS

We would like to express our deepest gratitude to our guide, Dr. V. Ganapathy (Professor, Dept. of Information Technology), for his valuable guidance, consistent encouragement, personal care, timely help and for providing us with an excellent atmosphere for conducting research. Throughout the work, in spite of his busy schedule, he extended cheerful and cordial support to our group for completing this research work. His suggestions and expertise played a key role in redirecting our central focus to the major outcomes presented in this study.

We would also like to thank Dr. G. Vadivu (HoD, Dept. of Information Technology) for providing extramural support and state-of-the-art infrastructure to facilitate this research work. We feel extremely privileged to have acquired academic licenses for the MATLAB™, JASP™ and QtiPlot™ software, without which this work could not have been completed.

We are also thankful to Dr. V. M. Shenbagaraman (HoD, Faculty of Management) for his insightful brainstorming discussions. We would like to thank Dr. D. Malathi (Professor, Dept. of Computer Science Engineering) for her anecdotal suggestions on performance tuning and simulation modelling in MATLAB. We also thank Dr. R. Subburaj for reviewing this work and for sharing his remarks.

Heerok Banerjee
Dikshika Asolia
Priyanshi Garg
Nikhil Shaw
Grishma Saparia

TABLE OF CONTENTS

ABSTRACT iii

ACKNOWLEDGEMENTS iv

LIST OF TABLES vii

LIST OF FIGURES ix

ABBREVIATIONS x

LIST OF SYMBOLS xi

1 Introduction to Supply Chain Management 1


1.1 Issues in Supply Chain Management . . . . . . . . . . . . . . . . . 2
1.2 Risks involved in Supply Chains . . . . . . . . . . . . . . . . . . . 3

2 Review of Literature 4
2.1 Multi-objective Optimization . . . . . . . . . . . . . . . . . . . . . 4
2.2 Bayesian Modelling for Risk Analysis . . . . . . . . . . . . . . . . 5
2.3 Reverse Supply Chain Problems . . . . . . . . . . . . . . . . . . . 6

3 Risk Modelling in Supply Chain Networks 7


3.1 Mixed Integer Linear Programming (MILP) Model . . . . . . . . . 7
3.2 Numerical Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 8

4 Statistical Approaches 12
4.1 Hypothesis Testing . . . . . . . . . . . . . . . . . . . . . . . . . . 12
4.1.1 Student’s T Test . . . . . . . . . . . . . . . . . . . . . . . . 12
4.1.2 Mean Normalization . . . . . . . . . . . . . . . . . . . . . 14
4.2 Machine Learning Models . . . . . . . . . . . . . . . . . . . . . . 17

4.2.1 Regression Model . . . . . . . . . . . . . . . . . . . . . . 17
4.2.2 Decision Tree Model . . . . . . . . . . . . . . . . . . . . . 18
4.2.3 Random Forest Regression . . . . . . . . . . . . . . . . . . 19
4.2.4 XGB Tree Model . . . . . . . . . . . . . . . . . . . . . . . 21
4.3 Performance Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 24

5 Neural Approaches 26
5.1 Multi-layered Perceptron Networks . . . . . . . . . . . . . . . . . 26
5.1.1 Training the Model . . . . . . . . . . . . . . . . . . . . . . 26
5.2 Long-Short Term Memory (LSTM) Networks . . . . . . . . . . . . 28
5.2.1 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . 28
5.2.2 Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
5.3 Performance Analyses . . . . . . . . . . . . . . . . . . . . . . . . 32

6 Time Series Analyses with MATLAB™ 34


6.1 Non-Linear Auto-Regressive Model (NAR) . . . . . . . . . . . . . 35
6.1.1 Pseudo Code . . . . . . . . . . . . . . . . . . . . . . . . . 35
6.1.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

7 Conclusion 38

8 Code Analysis 39
8.1 Source Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
8.2 Dynamic Code Analysis . . . . . . . . . . . . . . . . . . . . . . . 48
8.3 Test Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

9 Publication 52

A Dataset Description 53
A.1 Walmart Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
A.2 UAE Distributor Dataset . . . . . . . . . . . . . . . . . . . . . . . 53
A.3 Time Series Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . 54

B Backpropagation Algorithm 55
LIST OF TABLES

3.1 Example of supply chain logistics . . . . . . . . . . . . . . . . . . 9


3.2 Non-negative integral solutions . . . . . . . . . . . . . . . . . . . . 10

4.1 Student’s T test for "UAE Distributor" dataset: Column E (Sales) . . 13


4.2 Student’s T test for "UAE Distributor" dataset: Column F (Cost) . . 13
4.3 Student’s T test for "Walmart" dataset : Column K (Weight KG) . . 14
4.4 Tabulated performance of Additive Boosting Models . . . . . . . . 22
4.5 Tabulated performances of ML Models . . . . . . . . . . . . . . . . 25

5.1 Recorded Performance for MLP Network with 10 Hidden Layers . . 26


5.2 Training performance of MLP Network . . . . . . . . . . . . . . . 27

6.1 Training performance of NAR model . . . . . . . . . . . . . . . . . 36

8.1 Test cases for ML models . . . . . . . . . . . . . . . . . . . . . . . 51


8.2 Test cases for ANN Models . . . . . . . . . . . . . . . . . . . . . . 51

LIST OF FIGURES

1.1 Supply chain as a complex network . . . . . . . . . . . . . . . . . 1


1.2 Venn Diagram for representation of risks involved in supply chain sys-
tems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

3.1 Simple Supply chain architecture with multiple suppliers, single manu-
facturer and single retailer . . . . . . . . . . . . . . . . . . . . . . 8
3.2 Non-negative integral solutions of Diophantine equation with d=100 11

4.1 Representation of original data vs Normalized data series . . . . . . 15


4.2 Actual vs Predicted plot for Normalized data with different batch sizes 16
4.3 Decision Tree Model . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.4 Plot for Minimum objective Function . . . . . . . . . . . . . . . . 19
4.5 Actual vs Predicted plot for Random Forest predictors . . . . . . . . 21
4.6 Actual vs Predicted Plot for XGBoosted Tree Model . . . . . . . . 23
4.7 Response Plot for XGBoosted Trees . . . . . . . . . . . . . . . . . 24

5.1 Error Surface Plot for Multi-Layered Perceptron Model . . . . . . . 27


5.2 Actual vs Predicted Plot for MLP network with 10 Hidden Layers . 27
5.3 LSTM Architecture with one cell . . . . . . . . . . . . . . . . . . . 28
5.4 Model summary for Bi-LSTM Model . . . . . . . . . . . . . . . . 31
5.5 Error Surface plot for Bi-LSTM Model . . . . . . . . . . . . . . . . 32
5.6 Observed training performance of MLP networks after 3 training cycles 33
5.7 Actual vs Predicted plot for LSTM network . . . . . . . . . . . . . 33

6.1 NAR Model Architecture . . . . . . . . . . . . . . . . . . . . . . . 35


6.2 Actual vs Predicted series for NAR model . . . . . . . . . . . . . . 36
6.3 Error Surface Plot for NAR Model . . . . . . . . . . . . . . . . . . 37
6.4 Forecasted Series for NAR Model . . . . . . . . . . . . . . . . . . 37

8.1 Execution of Code snippet for Decision Trees . . . . . . . . . . . . 48

8.2 Execution of Code snippet for Random Forest . . . . . . . . . . . . 48
8.3 Execution of Code snippet for XGB Trees . . . . . . . . . . . . . . 49
8.4 Execution of Code snippet for NAR live forecasting . . . . . . . . . 49
8.5 Dynamic code analysis using Matlab™Code Analyzer . . . . . . . 50

B.1 Simple neural network architecture with one input layer of ’n’ neurons,
one hidden layer of ’l’ neurons and one output layer of ’m’ neurons 55

ABBREVIATIONS

SCM Supply Chain Management

SCRM Supply Chain Risk Management

RSCP Reverse Supply Chain Problem

DAG Directed Acyclic Graph

RI Risk Index

LSTM Long Short-term Memory

NAR Non-linear Auto-Regressive

GA Genetic Algorithm

MILP Mixed-Integer Linear Programming

RSS Residual Sum of Squares

MSE Mean Squared Error

MLP Multi-Layered Perceptron

XGB Extended Gradient Boosting

IG Information Gain

LIST OF SYMBOLS

θ — Angle made by the error surface to the plane of reference

$\sum_{i=1}^{n} A_i$ — Summation of all elements in the set A from i = 1 to n

$\prod_{i=1}^{n} B_i$ — Product of all elements in the set B from i = 1 to n

CHAPTER 1

INTRODUCTION TO SUPPLY CHAIN MANAGEMENT

SCM is the study of methods to run collaborative business operations in a way that maximizes profit from the investments in the business, pertaining to risk-averse strategies and optimizing resources, expenditures and effort to sustain healthy growth. Topological illustrations are commonly used to represent business entities and their relationships. In network theory, a supply chain network is understood as a Directed Acyclic Graph (DAG) (1). For example, shoppers, suppliers, makers and distributors are represented by vertices, whereas the edges can be business activities like packaging, delivery, assembling, etc.

Figure 1.1: Supply chain as a complex network

In Figure 1.1, the circular nodes denote elements of the supply chain network. Each directed arrow denotes a specific operation performed between two nodes. The complexity of the network is not determined solely by the number of nodes but also by the bilateral processes involved during a supply chain cycle (1). However, we introduce the formulation of non-linear attributes such as risks in later chapters. As deducible from the inherent architectures of supply chain networks, there are numerous issues attached to SCM relatable to social and economic factors (1).
1.1 Issues in Supply Chain Management

Psychological, social, cultural and personal factors influence consumer behaviour, which is rapidly altered by globalization and technology. As social media users comply with new norms for interaction in the online virtual world, there is a need for companies to use this gigantic new source of data to promote or make relevant products. Trends have short cycles, and the products which thrive on them go extinct with them. Firms are under compulsion to keep producing new products as well as shipping new features while keeping costs minimal. Furthermore, enhancing existing product features requires revamping the supply chain to aid product enhancement.

The emergence of social media has made the internet the biggest market, where anyone can advertise and sell products to anyone across the earth. This has lifted the expectations of consumers for top-standard products with innovation and consistency. Prominent trends like blockchain, big data, IoT and smart packaging are uplifting not only the agreed standards but also the methods in which they are administered and gauged. New advanced applications are required for handling, processing and making sense of the gigantic data being generated. The immediate issue an enterprise faces is whether investing in platforms built on micro-services and big data will meet these data-handling essentials.

Hence, it is clear that the overgrowth of democratized technologies across the globe is a harbinger of severe competition in the contemporary market. The impact of these inevitable components covers only some of the attributes which surround the problems related to SCM. In this research study, we review the quantitative factors involved in business operations and provide a context to transform operations inclined toward fault-tolerant and risk-aware decision making. These essentially bring new trade-offs to managers, but it is necessary to outline the tools applicable for dealing with multi-objective decision making. So, the central focus of our study is on reversing the effects of risks and understanding the non-linearity of this quantity from an optimization approach. However, the goal of this study is to provide substantial contributions to modelling risks as non-dominant quantities and to provide a model-to-model comparison on quantitative measures for evaluating a risk-aware logistics scenario in a business setting.


1.2 Risks involved in Supply Chains


1. Supply risk - It is the pervasive factor that determines the extent of variations in supply trends and supply chain outcomes. Broadly speaking, it can be defined as a conflict between supply and demand which can disrupt consumer life and welfare. Supply risks occur due to affairs associated with individual insufficiency and severe competition (4).

2. Demand risk - A risk of encountering unusual consumer demand is inevitable due to socio-economic changes. A high forecast but low actual demand bears additional costs for a business firm in terms of discarding or depositing its abundant assets (4). Conversely, a low forecast but high actual demand incurs opportunity costs in terms of foregone sales. Marketing, sales, capital investment and supply chain decisions are routinely constructed on top of demand forecasts (5).

3. Operational risk - Operational risk is the cumulative factor that describes the probability of enduring loss due to a potential failure of a business operation (5). Operational risks result from unprecedented failure of operations (5): regressive manufacturing endeavours and incapacitated processing, a high degree of deviation, and the advent of incoming new trends and technology (5).

4. Security risk - Something or someone likely to cause danger or difficulty. Security risks arise due to infrastructural security, systems security, vandalism and villainy (7). A breach in technical infrastructure or a conflict of interests is also accounted as a security risk, as both promote an extent of unnatural behaviour.

5. Social risk - Social risks are described as the arrival of challenges that business practices endure due to impacts propagated through the social participants of the business hierarchy (3).

Figure 1.2: Venn Diagram for representation of risks involved in supply chain systems

CHAPTER 2

REVIEW OF LITERATURE

Optimization problems in supply chain systems have attracted researchers and supply chain managers since the early industrialization and democratization of products at the expense of outsourcing services like manufacturing, dispensing, transportation, packaging etc. Optimum resource configuration is one of the most cited problems in SCM systems (12; 13). For supply chain systems that are immune to pervasive risks, supply chain managers emphasize determining singleton optimization problems such as total cost, return on investment, transportation cost etc. However, framing a single-objective supply chain needs prior exposure and additional supportive functions such as lead-time optimization, stock optimization, inventory-level optimization etc. (14). Sometimes conflicting objectives can cause disruptions in the supply chain; therefore, a thorough study outlining the trade-offs between objective functions is necessary prior to mathematical modelling. This shifts the attention from supply chain modelling towards satisfying conflicting objectives (15).

2.1 Multi-objective Optimization

Many novel supply chain evaluation models have been proposed with limited constraints and succinct objectives. For example, a location-inventory problem was investigated using a multi-objective self-learning algorithm based on the non-dominated sorting genetic algorithm II (NSGA-II) in order to minimize the total cost of the supply chain and maximize the volume fill rate and the responsiveness level (16). In another study, a multi-objective programming model using fuzzy logic was derived for forward and reverse supply chain systems to minimize total cost and environmental impact. The fuzzy-based model consisted of two phases. The first phase converts the probability distribution of multi-objective scenarios into a mixed integer linear programming model and then into an auxiliary crisp model (17). The second phase applies a fuzzy decomposition methodology based on the subjected linear inequalities and variable constraints to search for the non-dominant solution (18). Several models have also been proposed for multi-echelon supply chains with closed loops. For example, a multi-product, multi-stage and multi-period random distribution model is proposed to compute with predetermined goals for a multi-echelon supply chain network with high fluctuations in market demands and product prices. Similarly, a two-phased multi-criteria fuzzy-based decision approach is proposed where the central focus is to maximize the participants' expected revenues and average inventory levels along with providing robustness of the selected objectives under demand uncertainties (20). Considering travelling distance and travelling time, another multi-objective self-learning algorithm is proposed, namely the fuzzy logic non-dominated sorting genetic algorithm II (FL-NSGA II), which solves a multi-objective Mixed-Integer Linear Programming (MILP) problem of allocating vehicles with multiple depots based on average traffic counts per unit time, average duration of parking and a primitive cost function (20). Alternatively, a bi-objective mathematical model is formulated which searches for local minima of total cost and transportation cost, attempting a pareto-optimal search to plot the pareto-optimal fronts. A new hybrid methodology is also proposed which combines robust optimization, queuing theory and fuzzy inferential logic for multi-objective programming models based on hierarchical if-then rules (22). Another variant of the above soft-computing approaches, a swarm-based optimization technique namely "The Bees Algorithm", is discussed and implemented to deal with multi-objective supply chain problems; it searches for pareto-optimal configurations of a given SCM network and attempts to minimize the total cost and lead-time. For an optimization model integrating cost and time criteria, a modified particle-swarm optimization method (MEDPSO) is adopted for generating solutions of a multi-echelon unbalanced supplier selection problem (24; 25; 26).

2.2 Bayesian Modelling for Risk Analysis

Bayesian logic is one of the most fundamental methods borrowed today for optimizing and reinforcing probabilistic models (10). Quantitative measures that are found dispersed in any objective-based problem are often modelled using Bayesian analysis (12). For example, representing a supply chain network as a Directed Acyclic Graph (DAG) welcomes the expansion of conditional probability and measures of similarity to examine the extent to which such graphs can be extrapolated and utilized to infer vague quantities. In contrast to fuzzy logic, where no deterministic model is presumably evaluated, Bayesian modelling assumes the underlying principles of probability distributions to derive conclusions (9). Such methods are extensively employed in modelling risks in supply chain networks. Capturing the effects of risk propagation and assessing the total fragility of a supply chain are considered among the most significant problems in SCRM. The amount of uncertainty and inevitability in supply chain operations transforms this problem into a multi-criteria analytical problem. However, several models have been summarized and reinforced to create a well-acknowledged and succinctly derived model for formulating disruption probability and the occurrence of risks. Based on the inventory optimization literature, it is evident that supply risk originates in multiple locations and propagates autonomously within the network. This effect is termed the "ripple effect".

2.3 Reverse Supply Chain Problems

As per the investigation conducted within a limited scope, design strategies and managerial architectures for reverse supply chain systems are unexplored and lack confirmed literature (8). Optimization problems pertaining to lead-time optimization, inventory-level optimization and cost optimization are traditionally dealt with using Genetic Algorithm (GA) based approaches, by spawning a subset of ideal solutions and evaluating fitness values for each solution (18; 19; 22). Heuristic algorithms tend to search for local optimal values and have better dropout ratios as compared to GA-based approaches. On the other hand, evolutionary schemes tend to combine optimization methods with GA approaches (8). For example, a customer-allocation optimization is attempted using MILP based on a multi-tier retailer location model. The model examined the locations and capacities of manufacturing-cum-retailing facilities as well as transportation links to better facilitate customers with criteria-based requirements. In (8), a Reverse Supply Chain Problem (RSCP) model addressing the design of a reverse manufacturing supply network is proposed. The model is a hybrid model consisting of GA-based approaches and underlying 0-1 MILP optimization constraints.

CHAPTER 3

RISK MODELLING IN SUPPLY CHAIN NETWORKS

In this chapter, a deterministic mathematical model for each attribute in the supply chain
network is discussed. We have used MILP for formulating risk indices.

3.1 Mixed Integer Linear Programming (MILP) Model

For a given supply chain with 4 echelons (Supplier, Distributor, Manufacturer, Retailer) and selective activities such as sourcing or supplying raw material to each component, assembling final products and delivering to destination markets, each component has its own cost, lead-time and associated risk.

1. The Risk Index (RI) is derived from the model proposed in (9), and can be math-
ematically formulated as:

\[ RI_{supplier} = \sum_{i=1}^{n} \sum_{j=1}^{m} \alpha_{s_{ij}} \cdot \beta_{s_{ij}} \cdot \big( 1 - (1 - P(\tilde{S}_{ij})) \big) \tag{3.1} \]

where α_{s_ij} is the consequence to the supply chain if the i-th supplier fails, β_{s_ij} is the percentage of value added to the product by the i-th supplier, and P(S̃_ij) denotes the marginal probability that the i-th supplier fails for the j-th demand.

Similarly, the risk indices for the rest of the components can be calculated as:

\[ RI_{distributor} = \alpha_{d_{risk_i}} \cdot \beta_{m_i} \cdot \big( 1 - (1 - P(\tilde{M}_{j})) \big) \tag{3.2} \]

\[ RI_{manufacturer} = \alpha_{m_{risk_i}} \cdot \beta_{m_i} \cdot \big( 1 - (1 - P(\tilde{M}_{j})) \big) \tag{3.3} \]

\[ RI_{retailer} = \alpha_{r_{risk_i}} \cdot \beta_{r_i} \cdot \big( 1 - (1 - P(\tilde{R}_{j})) \big) \tag{3.4} \]

2. For each set of demand, the cumulative risk index of the supply chain network
can be calculated as:
\[ TRI = w_1 \cdot RI_{supplier} + w_2 \cdot RI_{distributor} + w_3 \cdot RI_{manufacturer} + w_4 \cdot RI_{retailer} \tag{3.5} \]

where w_1, w_2, w_3, w_4 are arbitrary weights such that w_1 + w_2 + w_3 + w_4 = 1.


3. The total supply chain cost can be calculated as (?):

\[ TC = \xi \times \Big( \sum_{i=1}^{N} \sum_{j=1}^{N_i} \mu_i \cdot C_{ij} \, y_{ij} \Big) \tag{3.6} \]

where N is the number of components, ξ denotes the period of interest, μ_i is the average demand per unit time, C_{ij} is the cost of the j-th resource option for the i-th component, and y_{ij} is a variable denoting the amount that the i-th component supplies as a participant for the j-th resource option.

4. For each couple of normalized risk index and total cost of the supply chain, the main
objective function Z is given as:

\[ Z = w_1 \cdot TC_n + w_2 \cdot TRI_n \tag{3.7} \]

where TC_n is the normalized total cost, TRI_n is the normalized total risk index, and w_1, w_2 are weights with w_1 + w_2 = 1.
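As a toy illustration of Eqs. (3.5) and (3.7), the following sketch combines hypothetical, made-up risk indices, costs and weights; the values are placeholders and not results from this study.

# Toy sketch of Eqs. (3.5) and (3.7); all numbers below are illustrative assumptions.
RI = {"supplier": 0.42, "distributor": 0.18, "manufacturer": 0.27, "retailer": 0.10}
w = {"supplier": 0.4, "distributor": 0.2, "manufacturer": 0.3, "retailer": 0.1}   # sums to 1

TRI = sum(w[k] * RI[k] for k in RI)          # Eq. (3.5): cumulative risk index

TC_n, TRI_n = 0.63, TRI                      # normalized total cost and risk index (assumed)
w1, w2 = 0.5, 0.5                            # w1 + w2 = 1
Z = w1 * TC_n + w2 * TRI_n                   # Eq. (3.7): main objective function
print(TRI, Z)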

3.2 Numerical Analysis

Figure 3.1: Simple Supply chain architecture with multiple suppliers, single manufac-
turer and single retailer

Consider a simple supply chain network as a directed graph G comprising suppliers S = {S_1, S_2, S_3}, a manufacturer M and a retailer R, as shown in Fig. 3.1. For a realistic supply chain scenario, one must assume that the supply chain actors are independent, and consequently the logistics and operational costs can be arbitrarily chosen. Table 3.1 represents the configuration of the pseudo supply chain model given in Fig. 3.1:

Table 3.1: Example of supply chain logistics

Component Name Stage Cost Stage time Avg Demand StdDev


S1 $2 8 25 2
S2 $3 15 50 7
S3 $7 32 120 14
M $0 10 300 30
R $4 10 300 25

In the above architecture, the demand subjected to the supply chain is propagated directly to the manufacturer. Consequently, the demand is arbitrarily distributed among the suppliers {S_1, S_2, S_3}. For example, if the demand is 'd' and the demand distribution is {x_1, x_2, x_3}, then the cost function F_c can be represented as a linear Diophantine equation:
Fc (G, S) : 2x1 + 3x2 + 7x3 ≤ 4d (3.8)

x 1 + x2 + x 3 = d (3.9)

For example, let us consider that the demand subjected to the supply chain model is
100 units. In that case we form a system of two linear equations, from which we can
obtain a set of possible solutions. For a simplistic case study, we have assumed that the
linear inequality is a linear equality subject to linear constraints, given that the supply
chain model is at a supply-demand equilibrium. This implies that supplementary costs
outside the scope of the suppliers are neglected and the overall expenditure is equated
to the total revenue generated in one echelon.

2x1 + 3x2 + 7x3 = 400 (3.10)

x1 + x2 + x3 = 100 (3.11)

Multiplying equ.(3.11) by 3 and subtracting from equ. (3.10),

4x3 − x1 = 100 (3.12)

\[ x_3 = \frac{100 + x_1}{4} \tag{3.13} \]

Assume x1 = a, then

\[ x_3 = \frac{100 + a}{4} \tag{3.14} \]

Clearly, a must be a multiple of 4 since x_3 should be a positive integer.

Substituting x3 and x1 in terms of a, we get

\[ a + \frac{100 + a}{4} + x_2 = 100 \tag{3.15} \]

\[ x_2 = 100 - a - \frac{100 + a}{4} \tag{3.16} \]

Table 3.2: Non-negative integral solutions

Supplier 1 (S1 ) Supplier 2 (S2 ) Supplier 3 (S3 ) Cost Function


0 25 75 600
4 26 70 576
8 27 65 552
12 28 60 528
16 29 55 504
20 30 50 480
24 31 45 456
28 32 40 432
32 33 35 408
36 34 30 384
40 35 25 360
44 36 20 336
48 37 15 312
52 38 10 288
56 39 5 264
60 40 0 240

The linear Diophantine equation can be solved employing the SymPy library in Python. A pseudo code is given below:
from sympy import symbols
from sympy.solvers.diophantine import diophantine

a, b, c = symbols("x, y, z", integer=True)

# Stage costs per supplier, as in Table 3.1 (the dictionary is assumed here).
suppliers = {"S1": 2, "S2": 3, "S3": 7}
coeff = []
for key in suppliers:
    coeff.append(suppliers[key])

# Base solution
demand = 500
man_cost = 38

base_expr = 2*a + 3*b + 7*c - 4*demand
print(base_expr)
[base_sols] = diophantine(base_expr)
print("Base Soln: " + str(base_sols))
# Free parameters of the general (parametric) solution, typically t_0 and t_1.
params = set().union(*(sol.free_symbols for sol in base_sols))

Figure 3.2: Non-negative integral solutions of Diophantine equation with d=100

The monotonic change in the supplier distribution as the solutions increase along the x-axis suggests that the minimum likelihood for an appropriate distribution is concentrated where the demand is distributed equally among the suppliers. The encircled region in Fig. 3.2 denotes the minimum likelihood for risk propagation.

CHAPTER 4

STATISTICAL APPROACHES

In this chapter, we introduce descriptive hypothesis testing and statistical machine learning models for problems related to demand forecasting and curve-fitting. In order to observe the nature of the models, we test the selected models with large datasets acquired from multiple sources. Consequently, we tabulate the achieved evaluation metrics and attempt to tune the model parameters in order to overcome their limitations and boost their individual performances. Finally, we illustrate a well-structured nomenclature to present a generalized conclusion drawn from the observed scenarios.

4.1 Hypothesis Testing

4.1.1 Student’s T Test

Descriptive hypothesis testing, sometimes referred to as confirmatory data analysis, is a method of statistical inference on the basis of observing events and statistical measures such as measures of central tendency and measures of the shape of a distribution. A statistical hypothesis is a hypothetical statement about a sample or a population of samples which describes the relationship between two distinct variables, where each variable leads to specific effects on the other. In order to validate a statistical hypothesis, we conducted a series of parametric tests such as Student's T test. A Student's T test is used to compare the measures of central tendency of two samples. Essentially, one sample is taken from the original population and the respective means are calculated.

First we hypothesize a statement as follows:

H0 (null hypothesis) : µ1 − µ2 ≤ 0; (4.1)


And, another statement such as:

H1 (alternative hypothesis) : µ1 − µ2 > 0; (4.2)
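As an illustration, the following sketch shows how such a one-sample t-test could be run with SciPy (assuming SciPy 1.6 or later for the one-sided alternative); the data here is a synthetic placeholder standing in for a sales column, not the actual dataset.

import numpy as np
from scipy import stats

# Placeholder population with a mean close to the one reported in Table 4.1.
population = np.random.default_rng(0).normal(loc=391.27, scale=80.0, size=6011)
population_mean = population.mean()

# Draw a 20% sample and test H0: mu1 - mu2 <= 0 against H1: mu1 - mu2 > 0.
sample = np.random.default_rng(1).choice(population, size=int(0.2 * len(population)), replace=False)
t_stat, p_value = stats.ttest_1samp(sample, popmean=population_mean, alternative='greater')

decision = "Reject H0" if p_value < 0.05 else "Accept H0"
print(f"t = {t_stat:.2f}, p = {p_value:.2f} -> {decision}")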

Table 4.1: Student's T test for "UAE Distributor" dataset: Column E (Sales). Population mean = 391.27.

Sample Size (%)   Sample Mean   t       p      Decision
20                409.97        -1.95   0.97   Accept
25                404.91        -1.56   0.94   Accept
30                408.18        -2.07   0.98   Accept
33                406.83        -1.97   0.97   Accept
38                404.64        -1.79   0.96   Accept
40                404.99        -1.86   0.96   Accept
48                401.79        -1.53   0.93   Accept
50                399.36        -1.19   0.88   Accept
70                395.59        -0.71   0.76   Accept
99                391.57        -0.05   0.52   Accept

a. There is no significant difference in the population mean and sample mean across variable sample sizes.
b. A conclusion can be drawn that the dataset is uniformly distributed across all 6011 samples, and a sample can be extracted and treated as a representation of the entire population.

Table 4.2: Student's T test for "UAE Distributor" dataset: Column F (Cost). Population mean = 726.70.

Sample Size (%)   Sample Mean   t       p      Decision
20                743.19        -0.44   0.67   Accept
30                781.56        -1.70   0.95   Accept
35                771.65        -1.49   0.93   Accept
40                755.87        -1.02   0.84   Accept
45                752.71        -0.96   0.83   Accept
50                742.63        -0.61   0.72   Accept
60                737.74        -0.45   0.67   Accept
65                734.38        -0.32   0.62   Accept
70                727.92        -0.05   0.52   Accept
80                752.71        -0.96   0.83   Accept

a. There is no significant difference in the population mean and sample mean across variable sample sizes.

The two statistical tests conducted on the same dataset conclude that a percentage of samples extracted from the population can be assumed to be an appropriate representative of the population and later utilized for descriptive analytical tasks such as input-output curve fitting. This promises to reduce redundancy and computational complexity.

Table 4.3: Student's T test for "Walmart" dataset: Column K (Weight KG). Population mean = 48,818.57.

Sample Size (%)   Sample Mean   t      p      Decision
30                10,405.42     0.69   0.24   Reject
15                12,405.21     0.46   0.32   Reject
44                9,447.93      0.86   0.19   Reject
65                9,243.03      1.05   0.14   Reject
72                8,860.44      1.11   0.13   Reject
82                10,320.07     1.14   0.12   Reject
10                11,025.16     0.39   0.34   Reject
7                 7,759.20      0.35   0.36   Reject

a. The null hypothesis is rejected.
b. The data is heavily skewed and, assuming that the distribution is normal, we observe that the difference in means is significantly greater for most of the sample sizes; hence we cannot arbitrarily choose any of these sample sizes for our purpose.

4.1.2 Mean Normalization

Mean normalization is a technique for reducing the marginal variance in a set of data points by representing a series by measures of central tendency such as the mean. Mean normalization helps in diluting errors in the data by interpolating a statistical representation of the data points from which the original set can be derived. Mathematically, the mean normalization is performed as follows:

\[ \chi = \frac{1}{n} \sum_{i=1}^{n} x_i \tag{4.3} \]

\[ \hat{x}_i = |x_i - \chi| \tag{4.4} \]

During the course of this study, we reviewed many datasets obtained from multiple sources. Generally, empirical datasets extracted from physical experimental setups are more likely to contain marginal errors, instrumentation errors, human errors etc., and therefore it is crucial to eliminate the redundant error percentage indigenous to such datasets. We have transformed the obtained datasets into normalized datasets and re-designed some of the machine learning models as well as the ANN models to observe their learning behaviour on such tightly encapsulated data. The normalized data series, however, retains the original information provided that the normalization process is not stringent. We must carefully observe the standard deviation of the original series and reasonably select a batch size. Consequently, it is concluded within the scope of this study that the error percentage is drastically reduced when the samples are preprocessed with appropriate measures.

Figure 4.1: Representation of original data vs Normalized data series

The pseudo code given below illustrates the algorithmic steps of normalizing a dataset using MATLAB™. As per the pseudo code, samples are grouped in batches of 100 and their respective means are computed. Each mean is substituted as an informative data point to yield a statistically similar dataset to the original one. This essentially decomposes the dataset by reducing the number of samples to be processed while preserving the integrity of the data. Hence, it is a fruitful approach when considering a high volume of noisy data along with a huge sample size.

% Batched Mean Normalization - Batch_size = 100
sample_size = 170000;
batch_size = 100;
batch_input100 = [];
for i = 0:(sample_size/batch_size) - 1
    temp = data_input(1 + i*batch_size : batch_size*(i+1), 1);
    batch_input100 = [batch_input100; mean(temp)];
end
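An equivalent batched mean normalization can be expressed in NumPy, mirroring the MATLAB pseudo code above; data_input here is a placeholder 1-D series.

import numpy as np

data_input = np.random.rand(170000)          # placeholder series
batch_size = 100
n_batches = len(data_input) // batch_size
# Reshape into batches of 100 samples and replace each batch by its mean.
batch_input100 = data_input[:n_batches * batch_size].reshape(n_batches, batch_size).mean(axis=1)
print(batch_input100.shape)                  # (1700,)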

Figure 4.2: Actual vs Predicted plot for normalized data with different batch sizes: (a) over-generalization; (b) batch_size = 100; (c) batch_size = 200; (d) batch_size = 500.
4.2 Machine Learning Models

A Machine Learning model is a composite processing system that attempts to perform statistical inference based on input-output patterns. MATLAB's Statistics and Machine Learning Toolbox provides functions and apps to describe, analyze and model data: descriptive statistics and plots for exploratory data analysis, fitting probability distributions to data, generating random numbers for Monte Carlo simulations, and performing hypothesis testing.

There are broadly two categories of problems in Machine Learning, namely regression and classification problems. A regression problem is defined as an estimation problem where the target variable is continuous, whereas in a classification problem the target variable is categorical. For the sake of a comparative study, we will discuss each model extensively and compare the results in section 5.3. Regression and classification algorithms allow us to draw inferences from data and build predictive models.

4.2.1 Regression Model

For a regression problem, we have considered the dataset "UAE_Distributor.xlsx" [see Appendix A.2] and selected 2 features, "Net Sales" and "Net Cost", as input features and "Average Sales Quantity" as the target feature. Hence the linear regression problem can be formulated as:

y = θ_1 x_1 + θ_2 x_2   (4.5)

where y denotes the average sales quantity, x_1 denotes the net sales and x_2 denotes the net cost.
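A minimal sketch of fitting Eq. (4.5) with scikit-learn is given below; the exact spreadsheet column headers are assumptions, and the file is read as-is from the working directory.

# Sketch of the linear regression in Eq. (4.5); column names are assumed.
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

df = pd.read_excel("UAE_Distributor.xlsx")
X = df[["Net Sales", "Net Cost"]].values        # x1, x2
y = df["Average Sales Quantity"].values         # target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression().fit(X_train, y_train)
print("theta_1, theta_2 =", model.coef_, "R^2 =", model.score(X_test, y_test))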

4.2.2 Decision Tree Model

Decision Trees are predictors that interpret decisions based on path traversals in a structured tree, beginning from the root node down to a leaf node. The leaf node essentially marks a decision. Every parent node in the decision tree denotes an arbitrary binary test that concludes in either 'True' or 'False' for a classification problem, or results in a real value. Regression trees aggregate these real values to give numeric responses. The algorithm to grow decision trees is given below:
Compute entropy for the dataset.
Select quantitative attributes.
For each attribute:
    Calculate entropy of all response variables.
    Calculate mean entropy for the attribute.
    Calculate gain for the attribute.
Select the attribute with highest gain.
Repeat randomly after several iterations.

The entropy of a given attribute is calculated as follows:

\[ S = - \sum_{i=1}^{N} p_i \log_2(p_i) \tag{4.6} \]

The Information Gain (IG) for an attribute Q with q splits is calculated as:

\[ IG(Q) = S_0 - \sum_{i=1}^{q} \frac{N_i}{N} S_i \tag{4.7} \]
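For illustration, a small Python helper implementing Eqs. (4.6) and (4.7) on a toy label column is sketched below; the labels and the split are made up.

import numpy as np

def entropy(labels):
    # Eq. (4.6): Shannon entropy of a label column.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def information_gain(parent, splits):
    # Eq. (4.7): parent entropy minus the size-weighted entropy of each split.
    n = len(parent)
    return entropy(parent) - sum(len(s) / n * entropy(s) for s in splits)

y = np.array(["buy", "buy", "hold", "hold", "buy", "hold"])   # toy labels
print(information_gain(y, [y[:3], y[3:]]))                    # ~0.082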

However, in this study we have considered a regression problem. Hence our objective is to split the tree in a way that the Residual Sum of Squares (RSS) is minimal. So, we calculate the absolute mean to bifurcate the samples into two splits, each linked to a child node. The child nodes iteratively calculate the mean for the rest of the samples, and the tree is grown until a desired model is achieved. Performance measures such as the depth of the tree, the number of splits and the pruning ratio can be controlled by hyper-parameter tuning. This topic is excluded from this study as it requires advanced hands-on experience with the Matlab toolbox™ and will be discussed in future contributions to the literature.
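A minimal sketch of a regression tree that splits to minimize squared error (equivalently the RSS criterion described above) is shown here; the data is a random placeholder standing in for the sales/cost features, and the depth is chosen purely for illustration.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.uniform(0, 100, size=(500, 2))            # placeholder features (e.g. net sales, net cost)
y = 0.8 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(scale=5, size=500)

tree = DecisionTreeRegressor(max_depth=7, random_state=0)   # depth is an illustrative choice
tree.fit(X, y)
print("Training R^2:", tree.score(X, y))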

Figure 4.3: Decision Tree Model

Figure 4.4: Plot for minimum objective function: (a) iteration 1; (b) iteration 2; (c) iteration 3; (d) optimized objective function (best model).

4.2.3 Random Forest Regression

Decision trees are prone to variability since the selected features are arbitrarily drawn. Conducting a series of tests on the same sample may yield different results since an individual decision tree may tend to overfit. In order to overcome this anomaly, decision trees are combined to produce a collective result by combination or pooling, which reduces over-fitting and improves generalization.

Random Forests build many additive decision trees and harvest a weighted prediction from all the trees. For a regression problem, the individual predictors produce their corresponding predictions and the mean prediction is considered as the final value.

Random Forests are built as follows:


1 Randomly s e l e c t k a t t r i b u t e s from d a t a s e t .
2 F o r k a t t r i b u t e s , s e l e c t b e s t a t t r i b u t e ( e n t r o p y s e l e c t i o n and IG ) .
3 S p l i t nodes i n t o c h i l d nodes .
4 R e p e a t u n t i l maximum number o f n o d e s ’ l ’ a r e c r e a t e d .
5 B u i l d f o r e s t f o r n number o f t i m e s .
6 P r e d i c t f or t r a i n i n g samples .
7 Compute w e i g h t e d a v e r a g e o f i n d i v i d u a l p r e d i c t i o n s f o r f i n a l
prediction .

The pseudo code for implementing Random Forest Regressors is also given below:
# PySpark Random Forest regression (cleaned-up pseudo code); 'dataset' is the
# prepared Spark DataFrame from the earlier pipeline.
from pyspark.ml.regression import RandomForestRegressor
from pyspark.ml.evaluation import RegressionEvaluator

(trainingData, testData) = dataset.randomSplit([0.7, 0.3])

# Train a RandomForest model.
RFmodel = RandomForestRegressor(labelCol="Order_pure", featuresCol="InpVec",
                                numTrees=3, maxBins=5000)
model = RFmodel.fit(trainingData)

# Make predictions.
predictions = model.transform(testData)

# Evaluate performance
evaluator = RegressionEvaluator(labelCol="Order_pure", predictionCol="prediction",
                                metricName="rmse")
rmse = evaluator.evaluate(predictions)
print("Root Mean Squared Error (RMSE) = %g" % rmse)
Figure 4.5: Actual vs Predicted plot for Random Forest predictors

4.2.4 XGB Tree Model

XGB Trees are ensemble learning approach with two or more hybrid learning algo-
rithms. As discussed in the previous section, Decision Tree models exhibit high vari-
ance in generalization. Ensemble based learning approaches over this variance in a
non-dominant fashion.

Boosting: Trees are generated sequentially such that each successive tree is empha-
sized to minimize the error generated by the preceding tree. The overall error is reduced
since every generation of tree reduces error for its predecessors. In contrast to bagging
techniques, in which trees are grown to their maximum extent, boosting uses trees with
fewer splits. Several learning parameters such as number of trees or iterations, the learn-
ing rate or gradient boosting rate, dept of each tree can be optimally selected to reduce
the computing overhead.

Boosting Algorithm:
Train initial model F_0 to predict target Y.
Compute residual error, delta_y = Y - F_0.
Create new model H_1 and fit it to the residuals.
Combine (F_0 + H_1) to yield F_1.
Repeat for F_1.

We create a model with a function F_0(x).

\[ F_0(x) = \underset{\gamma}{\arg\min} \sum_{i=1}^{n} L(y_i, \gamma) \tag{4.8} \]

\[ \underset{\gamma}{\arg\min} \sum_{i=1}^{n} L(y_i, \gamma) = \underset{\gamma}{\arg\min} \sum_{i=1}^{n} (y_i - \gamma)^2 \tag{4.9} \]

Taking the first derivative with respect to γ and setting it to zero,

\[ F_0(x) = \frac{1}{n} \sum_{i=1}^{n} y_i \tag{4.10} \]

The additive model H_1(x) computes the mean of the residuals at each leaf node of
the tree. The boosted function F_1(x) is obtained by summing F_0(x) with H_1(x).

Numerical Example

Consider a regression problem with input feature Sales and target variable Quantity.

Sales Quantity F_0 y - F_0 H_1 F_1 y - F_1

5 82 134 -52 -38.25 95.75 -13.75


7 80 134 -54 -38.25 95.75 -15.75
12 103 134 -31 -38.25 95.75 7.25
23 118 134 -16 -38.25 95.75 22.25
25 172 134 38 25.50 159.50 12.50
28 127 134 -7 25.50 159.50 -32.50
29 204 134 70 25.50 159.50 44.50
34 189 134 55 25.50 159.50 29.50
35 99 134 -35 25.50 159.50 -60.50
40 166 134 32 25.50 159.50 6.50

Table 4.4: Tabulated performance of Additive Boosting Models
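The additive step of Table 4.4 can be reproduced with a few lines of NumPy; the split at Sales < 25 is the one implied by the table and is used here purely to recompute its columns.

import numpy as np

sales = np.array([5, 7, 12, 23, 25, 28, 29, 34, 35, 40])
quantity = np.array([82, 80, 103, 118, 172, 127, 204, 189, 99, 166])

F0 = quantity.mean()                               # 134.0, the initial model
residual = quantity - F0                           # y - F_0
left = sales < 25                                  # split implied by Table 4.4
H1 = np.where(left, residual[left].mean(), residual[~left].mean())   # -38.25 / 25.50
F1 = F0 + H1                                       # boosted prediction: 95.75 / 159.50
print(F1, quantity - F1)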

Pseudo code

# XGBRegressor Model; 'output' is the Spark DataFrame prepared earlier in the pipeline.
import numpy as np
import xgboost as xgb
from sklearn import model_selection

params = {'colsample_bytree': 0.3,
          'alpha': 10,
          'max_depth': 5,
          'learning_rate': 0.38,
          'objective': 'reg:linear',
          'n_estimators': 10}

features_x = np.array(output.select("vecFea").collect())
labels_y = np.array(output.select("QUANTITY").collect())
features_x = np.squeeze(features_x, axis=1)

X_train, X_test, Y_train, Y_test = model_selection.train_test_split(
    features_x, labels_y, test_size=0.51, random_state=123)

xgbmodel = xgb.XGBRegressor(**params)
xgbmodel.fit(X_train, Y_train)
preds = xgbmodel.predict(X_test)

Figure 4.6: Actual vs Predicted Plot for XGBoosted Tree Model

Figure 4.7: Response Plot for XGBoosted Trees

4.3 Performance Analysis

In this section, we discuss the performance of each model based on empirical observations.

We attempted hyper-parameter tuning for the discussed models and examined optimization parameters to better understand the run-time performance of these models. The source code was compiled and executed with varying model parameters, and the respective loss function was evaluated as a MILP optimization problem, the loss function being the sole objective function.

Table 4.5: Tabulated performances of ML Models

Model                                  Test loss   Tuning loss   RMSE   Max Depth   Trees Pruned   Training Time (s)   Memory Utilized
Decision Tree
(maxBins=100)                          0.59        0.60          0.76   7           22             67                  66B

RandomForest
{n_estimators=50, MaxBins=100}         2.36        1.82          1.53   23          3              82                  246B

RandomForest
{n_estimators=100, MaxBins=100}        1.20        2.66          1.09   18          3              87                  107B

XGB Trees
{'colsample_bytree': 0.6, 'alpha': 10,
'learning_rate': 0.25, 'max_depth': 100,
'n_estimators': 100}                   7.90        7.28          2.81   38          7              1020                2081B

XGB Trees
{'colsample_bytree': 0.3, 'alpha': 8,
'learning_rate': 0.38, 'max_depth': 50,
'n_estimators': 50}                    6.69        6.28          2.58   29          2              340                 ~1040B

As observed in Table 4.5, the respective performance metrics for each model are noted and tabulated. We achieve a lower test error with the Random Forest and Decision Tree models as compared to the XGB models. This comparison is drawn based on two hypothetical assumptions:

1. The relative error in the sample dataset is uniformly distributed.

2. The percentage of outliers is negligible in the training phase.

Hyper-parameter tuning for the XGB Tree models is performed extensively after carefully examining the dataset and its statistical parameters. We employ data visualization techniques to reduce internal dependency on stochastic constraints and estimate the number of predictors/estimators based on the average rate of change of the response variable. We achieve a decent score; however, since only 5 boosting rounds are considered, the optimization process is terminated after obtaining a satisfactory measurement. In the context of learning parameters, we have noticed that an incremental change in the learning rate brings a linear change in the performance and a topological difference in the model, which played a central role in optimizing the objective function.

CHAPTER 5

NEURAL APPROACHES

5.1 Multi-layered Perceptron Networks

We carefully examine the neural network architecture, ranging from the number of hidden layers used to the number of hidden neurons inside every layer. It is imperative to first select a distinctive problem and a neural network model in order to determine its model and run-time performance. In our experiment, we have considered the dataset "UAE_distributor.xlsx" in order to evaluate a Multi-Layered Perceptron (MLP) model and perform a function approximation to predict the invoiced sales quantity.

5.1.1 Training the Model

The dataset contains 6011 samples with the above mentioned attributes. We split the population into 80% training samples and 20% testing samples. We use the generalized back-propagation algorithm, as described in Appendix B. We then tabulate the observations after performing multiple experiments, altering the chronology of training and randomizing the samples.
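A minimal scikit-learn sketch of a comparable MLP regressor is given below; the MATLAB network of Table 5.1 uses 10 hidden layers, but the data here is a random placeholder rather than the actual dataset, so the numbers are illustrative only.

import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(6011, 2))                               # placeholder features
y = 3.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.1, size=6011)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
mlp = MLPRegressor(hidden_layer_sizes=(10,) * 10,            # 10 hidden layers
                   max_iter=1000, random_state=0)
mlp.fit(X_train, y_train)
print("Test MSE:", ((mlp.predict(X_test) - y_test) ** 2).mean())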

Table 5.1: Recorded Performance for MLP Network with 10 Hidden Layers

                         Training measurement      Testing measurement
Sample (%)   Layers      MSE       R               MSE       R
70           10          54.03     0.99082         61.69     0.99079
60           10          75.81     0.95061         77.40     0.91088
50           10          118.06    0.92061         127.88    0.9002
Table 5.2: Training performance of MLP Network
Network Architecture: MLP Network (1 input layer + 10 hidden layers + 1 output layer)

Performance   Gradient    Mu     Epochs   Time    General Remarks
54.0          96.1        1.00   25       0.03    Accepted
45.8          1.13e+03    1.00   10       0.01    Best Fit
62.6          1.60e+03    10     35       0.05    Acceptable
63.6          50.9        10     6        0.01    Acceptable

Figure 5.1: Error Surface Plot for Multi-Layered Perceptron Model

Figure 5.2: Actual vs Predicted Plot for MLP network with 10 Hidden Layers

5.2 Long-Short Term Memory (LSTM) Networks

LSTM is a type of artificial neural network designed especially for time-series prediction problems. LSTMs have an input gate, an output gate, a forget gate and a memory cell, which are connected by loops adding feedback over time. The memory cell stores states which allow LSTMs to generalize patterns across large spans of data points rather than succumbing to immediate patterns. As more layers of LSTM are added to the model, it has been found successful across a diverse range of problems such as describing an image, grammar learning, music composition, language translation, etc. A unidirectional LSTM stores information that has appeared in the past, whereas a bidirectional LSTM stores information from the past as well as the future of a time series, maintaining two different hidden states for this purpose. A bidirectional LSTM is thus found to be more effective than the unidirectional variant as it understands the context better.

5.2.1 Architecture

Figure 5.3: LSTM Architecture with one cell

Notations

• w_{ij} denotes the connection weight from node i to node j

• a_j^t is the network input to node j at a particular time t

• b_j^t is the value of node j after applying the activation function

• ι, ϕ and ω represent the input gate, forget gate and output gate respectively

• C represents the set of memory cells

• s_c^t denotes the state of a cell c at a particular time t

• f represents the activation function of the gates, g denotes the cell input activation function, whereas h is the cell output activation function

• I, K and H denote the number of inputs, the number of outputs and the number of cells in the hidden layer respectively

Forward pass:

Input gates

\[ a_\iota^t = \sum_{i=1}^{I} w_{i\iota}\, x_i^t + \sum_{h=1}^{H} w_{h\iota}\, b_h^{t-1} + \sum_{c=1}^{C} w_{c\iota}\, s_c^{t-1} \tag{5.1} \]

\[ b_\iota^t = f(a_\iota^t) \tag{5.2} \]

Forget gates

\[ a_\phi^t = \sum_{i=1}^{I} w_{i\phi}\, x_i^t + \sum_{h=1}^{H} w_{h\phi}\, b_h^{t-1} + \sum_{c=1}^{C} w_{c\phi}\, s_c^{t-1} \tag{5.3} \]

\[ b_\phi^t = f(a_\phi^t) \tag{5.4} \]

Cells

\[ a_c^t = \sum_{i=1}^{I} w_{ic}\, x_i^t + \sum_{h=1}^{H} w_{hc}\, b_h^{t-1} \tag{5.5} \]

\[ s_c^t = b_\phi^t\, s_c^{t-1} + b_\iota^t\, g(a_c^t) \tag{5.6} \]

Output gates

\[ a_\omega^t = \sum_{i=1}^{I} w_{i\omega}\, x_i^t + \sum_{h=1}^{H} w_{h\omega}\, b_h^{t-1} + \sum_{c=1}^{C} w_{c\omega}\, s_c^{t-1} \tag{5.7} \]

\[ b_\omega^t = f(a_\omega^t) \tag{5.8} \]

Cell outputs

\[ b_c^t = b_\omega^t\, h(s_c^t) \tag{5.9} \]
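To make the forward pass concrete, the following NumPy sketch steps a single LSTM cell (as in Figure 5.3) through one time step using the logistic-style activations f, g, h used in this section; all weights and inputs are random placeholders, so the numbers are illustrative only.

import numpy as np

rng = np.random.default_rng(0)
I, H = 4, 1                                  # number of inputs, recurrent cell outputs

f = lambda x: 1 / (1 + np.exp(-x))           # gate activation
g = lambda x: 4 / (1 + np.exp(-x)) - 2       # cell input activation, range [-2, 2]
h = lambda x: 2 / (1 + np.exp(-x)) - 1       # cell output activation, range [-1, 1]

x_t, b_prev, s_prev = rng.normal(size=I), rng.normal(size=H), 0.0

# (input, recurrent, peephole) weights for each gate, plus cell-input weights.
w = {gate: (rng.normal(size=I), rng.normal(size=H), rng.normal())
     for gate in ("iota", "phi", "omega")}
w_ic, w_hc = rng.normal(size=I), rng.normal(size=H)

b_iota  = f(w["iota"][0]  @ x_t + w["iota"][1]  @ b_prev + w["iota"][2]  * s_prev)   # (5.1)-(5.2)
b_phi   = f(w["phi"][0]   @ x_t + w["phi"][1]   @ b_prev + w["phi"][2]   * s_prev)   # (5.3)-(5.4)
a_c     = w_ic @ x_t + w_hc @ b_prev                                                 # (5.5)
s_t     = b_phi * s_prev + b_iota * g(a_c)                                           # (5.6)
b_omega = f(w["omega"][0] @ x_t + w["omega"][1] @ b_prev + w["omega"][2] * s_prev)   # (5.7)-(5.8)
b_t     = b_omega * h(s_t)                                                           # (5.9)
print(b_t)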

Backward pass:

\[ \epsilon_c^t = \frac{\partial O}{\partial b_c^t} \tag{5.10} \]

\[ \epsilon_s^t = \frac{\partial O}{\partial s_c^t} \tag{5.11} \]

Input gates

\[ \delta_\iota^t = f'(a_\iota^t) \sum_{c=1}^{C} g(a_c^t)\, \epsilon_s^t \tag{5.12} \]

Forget gates

\[ \delta_\phi^t = f'(a_\phi^t) \sum_{c=1}^{C} s_c^{t-1}\, \epsilon_s^t \tag{5.13} \]

Cells

\[ \delta_c^t = b_\iota^t\, g'(a_c^t)\, \epsilon_s^t \tag{5.14} \]

States

\[ \epsilon_s^t = b_\omega^t\, h'(s_c^t)\, \epsilon_c^t + b_\phi^{t+1}\, \epsilon_s^{t+1} + w_{c\iota}\, \delta_\iota^{t+1} + w_{c\phi}\, \delta_\phi^{t+1} + w_{c\omega}\, \delta_\omega^{t+1} \tag{5.15} \]

Output gates

\[ \delta_\omega^t = f'(a_\omega^t) \sum_{c=1}^{C} h(s_c^t)\, \epsilon_c^t \tag{5.16} \]

Cell outputs

\[ \epsilon_c^t = \sum_{k=1}^{K} w_{ck}\, \delta_k^t + \sum_{h=1}^{H} w_{ch}\, \delta_h^{t+1} \tag{5.17} \]

where f(x) = 1/(1+e^{-x}), g(x) = 4/(1+e^{-x}) - 2 and h(x) = 2/(1+e^{-x}) - 1, so that g(x) ∈ [-2, 2] and h(x) ∈ [-1, 1].

5.2.2 Model

We used a bi-directional LSTM to predict the selling price of items. Our feature vector consisted of the selling prices quoted by the supplier, distributor, manufacturer and retailer. We used raw data of prices fluctuating every two minutes across 42 months, starting from 9 Feb 2014 till 28 August 2017. The total number of data points was above 800,000. The model was trained on 70% of the data, while 20% and 10% were used for testing and validation respectively. The following diagram represents the model summary.

Figure 5.4: Model summary for Bi-LSTM Model
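A minimal Keras sketch of a comparable bidirectional LSTM regressor is given below; the window length, layer size and data are illustrative assumptions, not the exact configuration summarized in Figure 5.4.

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Bidirectional, LSTM, Dense

timesteps, n_features = 30, 4                      # 30 past prices of the 4 actors per window
X = np.random.rand(1000, timesteps, n_features)    # placeholder input windows
y = np.random.rand(1000)                           # placeholder next selling price

model = Sequential([
    Bidirectional(LSTM(64), input_shape=(timesteps, n_features)),
    Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, validation_split=0.1, epochs=5, batch_size=64, verbose=0)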

The following is the error plot of the model after training.

Figure 5.5: Error Surface plot for Bi-LSTM Model

5.3 Performance Analyses

This section outlines the performance of each model extensively, based on observations as well as empirical results. All the models were carefully optimized and tested for evaluation against benchmarks. We started by projecting the performance of each model in two-dimensional plots and attempted to deduce their inherent model performance by simple extrapolation. The principal quantities accounted for in every model are the model parameters, such as sample size, number of hidden layers and number of neurons in each layer, and their effect on the learning parameters. We performed the analyses on the results obtained from the taken observations and extrapolated the model expectations further under the same learning parameters.

Figure 5.6: Observed training performance of MLP networks after 3 training cycles: (a) cycle 1; (b) cycle 2; (c) cycle 3.

Figure 5.7: Actual vs Predicted plot for LSTM network

CHAPTER 6

TIME SERIES ANALYSES WITH MATLAB™

In this chapter, we discuss problems related to time-series analysis, specifically trend approximation and sequence-to-sequence prediction using deep learning models. In the course of this study, we have generated a time-series sequence pertaining to risk-aware logistics relatable to a multi-echelon reverse supply chain network. The dataset is explicitly described in Appendix A.3 with mathematical notations and symbols. The objective in this case study is primarily to build a nomenclature to reproduce the evaluated model and to outline the noteworthy bottlenecks encountered during the training and testing phases.

We have carefully investigated the time series, and after an initial visualization of the series, regular patterns were observed. The patterns suggested that the non-linear component instigating the trends in the time series was coherent at every cycle with minimal variations. The patterns continued till the end of the time series and could easily be predicted by visual inspection. Enlarging the dataset, we encountered several regular patterns justifying that the time series was masked under a linearly separable function without any substantial non-linear component acting on it. Hence, our conclusion from the initial study was that ANN models were also implementable for forecasting the series beyond its domain. However, for the sake of a comparative study, we have selected two distinctive time-series models and evaluated their performance to draw conclusions. We discuss those models in the next section.

Broadly speaking, a time-series can be interpreted as a progression with two components: an independent (linear) component, and a non-linear component that is itself a function of time.
6.1 Non-Linear Auto-Regressive Model (NAR)

The NAR model is a predictive model that considers an input sequence y(t) of the preceding 'd' time-steps to predict the next sequence of d time-steps. This type of forecasting problem is termed sequence-to-sequence prediction, as the input sequence is coherently mapped to the target sequence.

Figure 6.1: NAR Model Architecture

Mathematically, a general sequence-to-sequence problem can be represented as:

y(t) = f(y(t − 1), y(t − 2), y(t − 3), ..., y(t − d))        (6.1)
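To make the mapping in equ. (6.1) concrete, the following Python sketch (function and variable names are ours, purely for illustration) shows how a univariate series is rearranged into input/target pairs with delay d before any model is trained.

import numpy as np

def make_lagged_pairs(y, d):
    """Builds (X, t) pairs for equ. (6.1): each row of X holds the d preceding
    values y(t-1)..y(t-d) and the corresponding entry of t holds y(t)."""
    X, t = [], []
    for i in range(d, len(y)):
        X.append(y[i - d:i][::-1])   # y(t-1), y(t-2), ..., y(t-d)
        t.append(y[i])
    return np.array(X), np.array(t)

# toy usage on a short series
series = np.arange(10, dtype=float)
X, t = make_lagged_pairs(series, d=4)
print(X.shape, t.shape)   # (6, 4) (6,)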

6.1.1 Pseudo Code

An example workflow for creating a NAR model is described below.

1. Import the dataset as a tabular data structure in MATLAB™.

   Tinp = xlsread('<filename.xlsx>', '<A2:A1000>');   % from an Excel sheet
   Tinp = csvread('<filename.csv>', '<range>');       % or, equivalently, from a CSV file

2. Time-series models do not operate on discrete data, so the data must be transformed into a time-series sequence. MATLAB™ provides the preparets() function to convert discrete data into a continuous time-series.

   NAR_model = narnet(1:3, 5);
   [Xs, Xi, Ai, Ts] = preparets(NAR_model, {}, {}, Tinp);

3. Every predictive model needs to be trained.

   NAR_model = train(NAR_model, Xs, Ts, Xi, Ai);

4. Predict the series y(t) and evaluate the performance.

   [Y, Xf, Af] = NAR_model(Xs, Xi, Ai);
   perf = perform(NAR_model, Ts, Y)

Table 6.1: Training performance of NAR model

Network Architecture             Performance   Gradient   Mu         Epochs   Time
NAR Model (delay 4 time-steps)   7.92e-05      1.67e-04   1.00e-06   12       0.04
                                 8.10e-05      5.18e-05   1.00e-06   18       0.04
                                 7.99e-05      1.48e-04   1.00e-07   6        0.01
                                 7.49e-05      2.88e-04   1.00e-07   14       0.04
                                 8.64e-05      1.44e-04   1.00e-07   20       0.05

A performance of 7.92e-05 denotes a very desirable model for sequence-to-sequence prediction.

6.1.2 Results

(a) NAR forecasted series 1 (b) NAR forecasted series 2

(c) NAR forecasted series 3 (d) NAR forecasted series 4

Figure 6.2: Actual vs Predicted series for NAR model

Figure 6.3: Error Surface Plot for NAR Model

Fig. 6.3 suggests that the error surface for the NAR model is curvilinear. The plot suggests a monotonic increase in the absolute error; however, we need to calculate the rate of change of error to draw quantitative conclusions in comparison with the results obtained from the other models. The rate of change of error can be derived as follows:

y = b e^x

dy/dx = b e^x

tan θ = b e^x

θ = tan⁻¹(b e^x)

dθ/dx = d/dx [tan⁻¹(b e^x)]

dθ/dx = b e^x / (1 + b² e^{2x})
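The last step can be verified symbolically, for instance with SymPy; this check is illustrative and not part of the MATLAB workflow.

import sympy as sp

# Symbolic check of the slope-angle derivative; b is treated as a positive constant.
x, b = sp.symbols('x b', positive=True)
theta = sp.atan(b * sp.exp(x))
print(sp.simplify(sp.diff(theta, x)))   # b*exp(x)/(b**2*exp(2*x) + 1)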

Figure 6.4: Forecasted Series for NAR Model

CHAPTER 7

CONCLUSION

The statistical hypothesis tests conducted across several datasets denote that arbitrary decisions on random sampling and data optimization cannot, in general, be considered affirmative. As a contradictory remark, our hypothesis was rejected for the Walmart dataset. The tabulated results show that, even at the 0.05 level of significance, the difference between the sample mean and the population mean can vary and remains indeterminate until calculated carefully. This study helped us refrain from making hypothetical assumptions regarding the distribution of a dataset. In conclusion, it is clear that the hypothesis validation depends entirely on the distribution of the dataset and its degree of skewness.

Next, as we attempted several performance analyses on statistical machine learning and curve-fitting models, we studied the inherent sparsity of the datasets. Even though some of the models reproduced series nearly identical to the actual series, the loss function could not be reduced appreciably beyond a threshold. This signifies that, even with multiple attempts at curve-fitting and hyper-parameter tuning, a stable model could not be achieved unless the underlying mathematical models are known. However, we also concluded that the XGB model outperformed the other models, although Random Forest ensembles achieved a reasonably good fit. We also attempted hyper-parameter tuning for a coarse and simple decision tree model with random pruning and an arbitrary number of splits. The objective function was optimized within 4 iterations; however, the Mean Squared Error (MSE) and gradient obtained after training were comparatively lower than those of the other models. We also compared Bi-directional LSTM networks with MLP networks. Bi-directional LSTM networks achieved outperforming results for sequence-to-sequence prediction, whereas the MLP networks failed to be robust against non-linear variations in the time-series dataset. Finally, we discussed time-series analyses with the NAR model and examined the results extensively. The NAR model achieved an outperforming MSE within the range of 1e-05 to 1e-06.
CHAPTER 8

CODE ANALYSIS

In this chapter, the program code related to our work using MATLAB™ and Python is presented. Additionally, we have also provided a dynamic code analysis generated using the MATLAB™ Code Analyzer.

8.1 Source Code

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Thu Feb 28 13:18:54 2019
Random Forest Regression with Pyspark
@author: heerokbanerjee
"""

import pandas as pd
from pyspark.ml import Pipeline
from pyspark.ml.regression import RandomForestRegressor
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.feature import Imputer
from pyspark.ml.feature import StringIndexer
from pyspark.sql.session import SparkSession
from pyspark.context import SparkContext

sc = SparkContext('local')
spark = SparkSession(sc)

# Importing Dataset
dataset = spark.read.format("csv").option("header", "true").load(
    "/home/heerokbanerjee/Documents/hpd.csv")
dataset = dataset.withColumn("Order_Demand", dataset["Order_Demand"].cast('double'))
dataset = dataset.select("Product_Code", "Warehouse", "Product_Category",
                         "Date", "Order_Demand")

# Index labels, adding metadata to the label column.
# Fit on whole dataset to include all labels in index
CodeIndexer = StringIndexer(inputCol="Product_Code", outputCol="CodeIndex",
                            handleInvalid="skip")
WarehouseIndexer = StringIndexer(inputCol="Warehouse", outputCol="WarehouseIndex",
                                 handleInvalid="skip")
CategoryIndexer = StringIndexer(inputCol="Product_Category", outputCol="CategoryIndex",
                                handleInvalid="skip")
DateIndexer = StringIndexer(inputCol="Date", outputCol="DateIndex",
                            handleInvalid="skip")

assembler = VectorAssembler(
    inputCols=["CodeIndex", "WarehouseIndex", "CategoryIndex", "DateIndex"],
    outputCol="Ghoda", handleInvalid="skip")

DemandImputer = Imputer(inputCols=["Order_Demand"], outputCols=["Order_pure"])

(trainingData, testData) = dataset.randomSplit([0.7, 0.3])

# trainingData.show()
# Train a RandomForest model.
rf = RandomForestRegressor(labelCol="Order_pure", featuresCol="Ghoda",
                           numTrees=3, maxBins=5000)

# Chain indexers and forest in a Pipeline
pipeline = Pipeline(stages=[CodeIndexer, WarehouseIndexer, CategoryIndexer,
                            DateIndexer, assembler, DemandImputer, rf])

# Train model. This also runs the indexers.
model = pipeline.fit(trainingData)

# Make predictions.
predictions = model.transform(testData)

predictions.select("prediction").distinct().show()

predictions.distinct().show()

evaluator = RegressionEvaluator(
    labelCol="Order_pure", predictionCol="prediction", metricName="rmse")
rmse = evaluator.evaluate(predictions)
print("Root Mean Squared Error (RMSE) = %g" % rmse)

### plotting graph
eg = predictions.select("prediction", "Order_pure", "Ghoda").limit(1000)
panda_eg = eg.toPandas()

panda_eg.plot(kind="bar", stacked="true")

Listing 8.1: Random Forest Regression

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Wed Jan 23 02:12:41 2019
XGBoosted Classification with Pyspark and xgboost lib
@author: heerokbanerjee
"""

import numpy as np

from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.feature import StringIndexer
# from pyspark.ml.classification import DecisionTreeClassifier
from pyspark.sql.session import SparkSession
from pyspark.context import SparkContext

from sklearn import model_selection
from sklearn.metrics import accuracy_score

import xgboost as xgb

sc = SparkContext('local')
spark = SparkSession(sc)

fname_train = "wallmart.csv"


def spark_read(filename):
    file = spark.read.format("csv").option("header", "true").load(filename)
    return file


def convert_to_numeric(data):
    for x in ["WEIGHT (KG)", "MEASUREMENT", "QUANTITY"]:
        data = data.withColumn(x, data[x].cast('double'))
    return data


### Import Training dataset
data = spark_read(fname_train)
data = data.select("ARRIVAL DATE", "WEIGHT (KG)", "MEASUREMENT", "QUANTITY",
                   "CARRIER CITY")
(train_data, test_data) = data.randomSplit([0.8, 0.2])
train_data = convert_to_numeric(train_data)

### Pipeline Component1
### StringIndexer for Column "Timestamp"
###
dateIndexer = StringIndexer(
    inputCol="ARRIVAL DATE",
    outputCol="dateIndex", handleInvalid="skip")
# print(strIndexer.getOutputCol())

# indexer_out.show()

### Pipeline Component2
### StringIndexer for Column "Label"
###
carrierIndexer = StringIndexer(
    inputCol="CARRIER CITY",
    outputCol="carrierIndex", handleInvalid="skip")
# print(strIndexer.getOutputCol())
# out2 = labelIndexer.fit(train_data).transform(train_data)

### Pipeline Component2
### VectorAssembler
###
vecAssembler = VectorAssembler(
    inputCols=["WEIGHT (KG)", "MEASUREMENT", "QUANTITY", "dateIndex"],
    outputCol="vecFea", handleInvalid="skip")
# assembler_out = vecAssembler.transform(indexer_out)
# assembler_out.select("vecFea").show(truncate=False)

### Pipeline Component3
### GBT Classifier
# dt_class = DecisionTreeClassifier(labelCol="IndexLabel", featuresCol="vecFea")

### Training - Pipeline Model
###
pipe = Pipeline(stages=[dateIndexer, carrierIndexer, vecAssembler])
pipe_model = pipe.fit(train_data)

output = pipe_model.transform(train_data)
out_vec = output.select("dateIndex", "vecFea").show(10)

num_classes = output.select("carrierIndex").distinct().count()
print(num_classes)

### XGBoost Classifier Model
###
params = {'max_depth': 2,
          'silent': 0,
          'learning_rate': 0.38,
          'objective': 'multi:softprob',
          'num_class': 284}

features_x = np.array(output.select("vecFea").collect())
labels_y = np.array(output.select("carrierIndex").collect())
print(max(labels_y))
features_x = np.squeeze(features_x, axis=1)
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(
    features_x, labels_y, test_size=0.51, random_state=123)

xgb_train = xgb.DMatrix(X_train, label=Y_train)
xgb_test = xgb.DMatrix(X_test, label=Y_test)

# xgbmodel = XGBClassifier()
xgbmodel = xgb.train(params, xgb_train, 10)
print(xgbmodel)

### Testing Pipeline + XGBoost Classifier
###
test_output = pipe_model.transform(test_data)

xgb_output = xgbmodel.predict(xgb_test)
print(xgb_output)

predictions = np.asarray([np.argmax(line) for line in xgb_output])
print(predictions)

### Determining Accuracy Score
###
accuracy = accuracy_score(Y_test, predictions)
print("Accuracy: %.2f%%" % (accuracy * 100.0))

Listing 8.2: XGBoosted Classification

import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import Dropout
from keras.layers import LSTM
from keras.layers import Dense
from keras.layers import Bidirectional
from keras.callbacks import ModelCheckpoint
from sklearn.metrics import mean_squared_error as mse

input_file = "train.csv"
df = pd.read_csv(input_file)


def remove_duplicates(df):
    ColoumnArr = np.array(df.ix[:, 'Timestamp'])
    i = 0
    ArrLen = len(ColoumnArr)
    index_duplicate = []
    # identify duplicates by index
    while (i < ArrLen - 1):
        if ColoumnArr[i] == ColoumnArr[i + 1]:
            index_duplicate.append(i + 1)
        i += 1
    # remove duplicates
    df = df.drop(index_duplicate)
    return df


def avg_over_time(df, indexCol=0):
    avg = {}
    colLen = df.shape[0]
    for x in range(colLen):
        time = str(df[x][indexCol])[11:]
        if time not in avg:
            avg[time] = [[0.0, 0.0, 0.0, 0.0, 0.0], 0]
        for colNo in range(1, 6):
            avg[time][0][colNo - 1] += float(df[x][colNo])
        avg[time][1] += 1
    for key, val in avg.items():
        avg[key] = [x * 1.0 / val[1] for x in val[0]]
    return avg


def replace_noise(df, indexCol=0):
    avg = avg_over_time(df, indexCol)
    colLen = df.shape[0]
    for x in range(colLen):
        time = str(df[x][indexCol])[11:]
        for col in range(1, 6):
            if df[x][col] == 0:
                try:
                    df[x][col] = avg[time][col - 1]
                except KeyError:
                    print(x)
                    print(col)
    return df


df = df.drop(['Label'], axis=1)
df = df.replace(np.nan, 0)
df = remove_duplicates(df).values
df = replace_noise(df)
print(df[0])

# data preparation
seq_length = 100
DataX = []
DataY = []

for x in range(len(df) - seq_length):
    SeqX = df[x:x + seq_length, 1:6]
    SeqY = df[x + seq_length, 5]
    DataX.append(SeqX)
    DataY.append(SeqY)

DataX = np.array(DataX)
DataY = np.array(DataY)

# transforming to (samples, seq_length, features)
DataX = np.reshape(DataX, (DataX.shape[0], DataX.shape[1], DataX.shape[2]))
DataY = np.reshape(DataY, (DataY.shape[0], 1))
TrainDataX = DataX[:int(0.7 * len(DataX))]
TrainDataY = DataY[:int(0.7 * len(DataY))]
TestDataX = DataX[int(0.7 * len(DataX)):]
TestDataY = DataY[int(0.7 * len(DataY)):]

# developing the model
model = Sequential()
model.add(Bidirectional(LSTM(32, return_sequences=True),
                        input_shape=(TrainDataX.shape[1], TrainDataX.shape[2])))
model.add(Dropout(0.2))
model.add(Bidirectional(LSTM(32)))
model.add(Dropout(0.2))
model.add(Dense(TrainDataY.shape[1]))
filename = "best_weight.hdf5"
model.load_weights(filename)
model.compile(loss='mean_squared_error', optimizer='adam')
# checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1,
#                              save_best_only=True, mode='min')
# callbacks_list = [checkpoint]
# fit the model
# model.fit(TrainDataX, TrainDataY, nb_epoch=50, batch_size=64,
#           callbacks=callbacks_list)
print("loaded weights")
y = model.predict(TestDataX)
print("predicted")
print(mse(TestDataY, y))

Listing 8.3: Bi-LSTM Forecasting

8.2 Dynamic Code Analysis

We executed the MATLAB programs on the MATLAB™ Online cloud platform and the machine learning models with standalone Python libraries. Additionally, a series of benchmark tests was conducted to assess whether they can be applied to extrapolated samples drawn from the original datasets. The time complexity of each successive function call is reported to trace the call stack.

Figure 8.1: Execution of Code snippet for Decision Trees

Figure 8.2: Execution of Code snippet for Random Forest

A static code analysis may be interpreted as a benchmark test to establish whether the source code is syntactically accurate and sufficiently well-structured for the interpreter to translate the program.

Figure 8.3: Execution of Code snippet for XGB Trees

Figure 8.4: Execution of Code snippet for NAR live forecasting

Along with the execution of each model, we also attempted to understand the inherent time complexity and the functional dependencies of our algorithmic steps. In order to carry out a line-by-line dynamic code analysis and obtain a tabulated result, we chose the MATLAB™ Code Analyzer to generate an automated report. The report depicts the total time elapsed for each function call and allows a given source code to be optimized in terms of modular programming standards.
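For the Python models, which fall outside the scope of the MATLAB™ Code Analyzer, a comparable per-function timing breakdown could be obtained with the standard-library cProfile module; the script name below is hypothetical and the snippet is only an illustration, not part of the reported analysis.

import cProfile
import pstats

# Profile a Python script (hypothetical file name) and report the functions that
# dominate cumulative execution time, analogous to the MATLAB per-function breakdown.
cProfile.run('exec(open("bilstm_forecast.py").read())', 'profile.out')
stats = pstats.Stats('profile.out')
stats.sort_stats('cumulative').print_stats(15)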

We have considered the individual elapsed time and the cyclomatic complexity of the different functions as parameters while concluding our tests. The results highlight the sections of the code with the dominant consumption of time due to recursive calls and higher-order operations.

(a) Breakdown of Total Elapsed Time for Function calls

(b) Time complexity and cyclomatic complexity

Figure 8.5: Dynamic code analysis using Matlab™Code Analyzer

8.3 Test Cases

In this section, we have tabulated the test cases and their respective dimensions as sub-
jected to the prepared models.

Table 8.1: Test cases for ML models

Model            Dataset                Dataset Dimension   Input Features   Output Feature
Decision Tree    Walmart.csv            [190962,33]         [50000,3]        [50000,1]
                 UAE_Distributor.xlsx   [6011,7]            [2500,2]         [2500,1]
Random Forest    UAE_Distributor.xlsx   [6011,7]            [2500,1]         [2500,1]
                 UAE_Distributor.xlsx   [6011,7]            [4500,3]         [4500,1]
                 UAE_Distributor.xlsx   [6011,7]            [2500,3]         [2500,1]
XGBoosted Tree   Walmart.csv            [190962,33]         [150000,3]       [150000,1]
                 Walmart.csv            [190962,33]         [100000,7]       [100000,1]

Table 8.2: Test cases for ANN models

Model      Dataset                Dataset Dimension   Input Features   Output Feature
MLP        Walmart.csv            [190962,33]         [150000,4]       [150000,1]
           UAE_Distributor.xlsx   [6011,7]            [5500,4]         [5500,1]
NAR        train.csv              [800001,7]          [80000,7]        [80000,1]
           train.csv              [800001,7]          [500001,7]       [500001,1]
Bi-LSTM    Walmart.csv            [190962,33]         [150000,4]       [150000,1]
           train.csv              [800001,7]          [150001,3]       [150001,1]

CHAPTER 9

PUBLICATION

We have simulated a multi-echelon closed-loop supply chain and generated a time-series sequence of 6 lakh (600,000) timesteps for the open research community. The dataset is published in the Mendeley data library under the affiliation of SRM Institute of Science & Technology, India (https://data.mendeley.com/datasets/gystn6d3r4/2).

"Banerjee, Heerok; Saparia, Grishma; Ganapathy, Velappa; Garg, Priyanshi; Shen-


bagaraman, V. M. (2019), Time Series Dataset for Risk Assessment in Supply Chain
Networks, Mendeley Data, v2 http://dx.doi.org/10.17632/gystn6d3r4.2"

Next, an article describing the mathematical modelling and a numerical example has been published.

"Grishma, Saparia: Time Series Dataset for Risk Assessment in Supply Chain Networks, ResearchGate, DOI: 10.17632/gystn6d3r4.2"

Finally, the results from this study have been cross-validated against pre-existing models, and we have finished drafting our research paper. We plan to submit the paper for double-blind peer review to the Operations Research (PUBSonline) library by April 2019.
APPENDIX A

DATASET DESCRIPTION

A.1 Walmart Dataset

The dataset "wallmart.csv" contains an extensive amount of data with specific geo-
graphical data like address, port description, destination details etc pertaining to the
inherent logistics involved in a typical Walmart supply chain.

The dataset contains features such as: SHIPPER, SHIPPER ADDRESS, CON-
SIGNEE, CONSIGNEE ADDRESS, ZIPCODE, NOTIFY, NOTIFY ADDRESS, BILL
OF LADING, ARRIVAL DATE, WEIGHT (LB), WEIGHT (KG), FOREIGN PORT,
US PORT QUANTITY, Q.UNIT, MEASUREMENT, M.UNIT, SHIP REGISTERED
IN, VESSEL NAME, CONTAINER NUMBER, CONTAINER COUNT, PRODUCT
DETAILS, MARKS AND NUMBERS, COUNTRY OF ORIGIN, DISTRIBUTION
PORT, HOUSE vs MASTER, MASTER B/L CARRIER CODE, CARRIER NAME,
CARRIER ADDRESS, CARRIER CITY, CARRIER STATE, CARRIER ZIP, PLACE
OF RECEIPT

A.2 UAE Distributor Dataset

The dataset "UAE_distributor.xlsx" stores an empirical record of a supply chain net-


work. The data-sheet "Sales" contains 6011 samples describing sales transactions made
between a single distributor and multiple customers across the UAE fish industry. An-
other data-sheet "Purchases" contains 94 samples representing purchases made by the
distributor in procuring different fish products from multiple suppliers. The data-sheet
"Items" describes the different items purchased and sold to different customers. The
data-sheet "Customers" contains a list of customers. This dataset can be extensively
utilized for regression problems.

The dataset contains the following 7 features (a minimal loading sketch is given after the list):


1. Date (Timestamp)

2. Item Number (double)

3. Item Description (String)

4. Customer Name (String)

5. Net Sales (double)

6. Net Cost (double)

7. Average Sales Quantity (double)
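A minimal loading sketch for this multi-sheet workbook is given below; the sheet names follow the description above, and the pandas-based reading is our illustration rather than part of the original pipeline.

import pandas as pd

# Sheet names as described above; the workbook is assumed to be in the working directory.
sales = pd.read_excel("UAE_distributor.xlsx", sheet_name="Sales")
purchases = pd.read_excel("UAE_distributor.xlsx", sheet_name="Purchases")
items = pd.read_excel("UAE_distributor.xlsx", sheet_name="Items")
customers = pd.read_excel("UAE_distributor.xlsx", sheet_name="Customers")
print(sales.shape)        # expected to be (6011, 7)
print(purchases.shape)    # expected to have 94 rows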

A.3 Time Series Dataset

The Time-series sequence consists of 6 Lakh timesteps of seven attributes namely,


Timestamp, RI_Supplier1, RI_Distributor1, RI_Manufacturer1, RI_Retailer1, Total_Cost,
SCMstability_category. The time-series sequence is generated by simulating a multi-
echelon supply chain network in MATLAB with three suppliers, one distributor, one
manufacturer and one retailer. An arbitrary demand is introduced to the supply chain
model and selective features such as total cost and risk index are calculated at each
time-step.

The dataset is available online in the Mendeley library.

To manually add the dataset to a bibliography, use the following:
"Banerjee, Heerok; Saparia, Grishma; Ganapathy, Velappa; Garg, Priyanshi; Shenbagaraman, V. M. (2019), Time Series Dataset for Risk Assessment in Supply Chain Networks, Mendeley Data, v2. http://dx.doi.org/10.17632/gystn6d3r4.2"
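A minimal sketch for loading the published sequence with pandas is given below; the CSV file name inside the archive is an assumption, and the column names follow the attribute list above.

import pandas as pd

# The CSV file name is assumed; the columns follow the description above.
cols = ["Timestamp", "RI_Supplier1", "RI_Distributor1", "RI_Manufacturer1",
        "RI_Retailer1", "Total_Cost", "SCMstability_category"]
df = pd.read_csv("train.csv", usecols=cols, parse_dates=["Timestamp"])
df = df.set_index("Timestamp").sort_index()
print(df.shape)           # one row per simulated time-step, seven attributes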

APPENDIX B

BACKPROPAGATION ALGORITHM

Figure B.1: Simple neural network architecture with one input layer of 'n' neurons, one hidden layer of 'l' neurons and one output layer of 'm' neurons

The error for the k-th neuron in the output layer is:

δ_Ok = (T_pk − O_k)        (B.1)

The error minimized by the Generalized Delta Rule (GDR) is:

E_p(m) = (1/2) Σ_{k=1}^{m} δ_Ok²        (B.2)

Using the estimate of gradient descent along the error surface to determine the weight update ∆W_ij, we get

∆W_ij(m) = −η ∂E_p(m)/∂W_ij        (B.3)

The weight update equation is given by

W_ij(m + 1) = W_ij(m) + ∆W_ij        (B.4)


The net input of the hidden layer is given by

net^h_pj = Σ_{i=1}^{n} (W_ij · X_pi) + b^h_j        (B.5)

The output of neuron 'j' in the hidden layer is

y_pj = f^h_j(net^h_pj)        (B.6)

The net input to the output layer is given by

net^o_pk = Σ_{j=1}^{l} (W^o_jk · y_pj) + b^o_k        (B.7)

The output of the k-th neuron is

O_pk = f^o_k(net^o_pk)        (B.8)

Using equ. (B.1) to equ. (B.8), we derive the equation for updating the output layer weights (between the hidden layer and the output layer):

∆W_jk(m) = −η ∂E_p(m)/∂W_jk
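A small numerical sketch of equations (B.1) to (B.8) is given below; the layer sizes, the learning rate and the choice of logistic activations for f^h_j and f^o_k are assumptions made purely for illustration, not values used in the thesis experiments.

import numpy as np

# Illustrative single-pattern forward pass and GDR update for the output-layer weights.
rng = np.random.default_rng(0)
n, l, m, eta = 4, 6, 3, 0.1             # input, hidden, output neurons; learning rate

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

x = rng.random(n)                        # one input pattern X_p
T = rng.random(m)                        # its target T_p
W_ih = rng.standard_normal((n, l)); b_h = np.zeros(l)
W_ho = rng.standard_normal((l, m)); b_o = np.zeros(m)

net_h = x @ W_ih + b_h                   # eq. (B.5)
y = sigmoid(net_h)                       # eq. (B.6)
net_o = y @ W_ho + b_o                   # eq. (B.7)
O = sigmoid(net_o)                       # eq. (B.8)

delta_O = T - O                          # eq. (B.1)
E_p = 0.5 * np.sum(delta_O ** 2)         # eq. (B.2)

# Gradient of E_p w.r.t. the hidden-to-output weights, then the GDR step.
grad_W_ho = -np.outer(y, delta_O * O * (1.0 - O))
W_ho += -eta * grad_W_ho                 # eqs. (B.3)-(B.4)
print(E_p)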

