
End-to-End Deep Learning for Automatic Inventory Management with Fixed Ordering Cost
Mo Liu
Industrial Engineering and Operations Research Department, University of California, Berkeley mo [email protected]

Meng Qi
SC Johnson College of Business, Cornell University [email protected]

Zuojun (Max) Shen
Industrial Engineering and Operations Research Department, University of California, Berkeley [email protected]

Abstract: We designed an automatic decision-making system for an e-commerce platform to help manage the inventory
of millions of stock keeping units (SKUs) sold on the platform. The platform has accumulated a significant amount of
contextual information, which may be useful for predicting both the demand and vendor lead time. Our proposed
decision-making system makes the following decisions automatically based on contextual information and historical data: (1)
order timing and (2) order quantity. We used an end-to-end (E2E) learning approach, where we developed a deep learning
framework that directly outputs the optimal order timing and quantity with given contextual information as the input. Our
numerical experiments, which use real-world data obtained from a leading online retailer, demonstrate the superiority
of our approach to benchmark methods: (r, Q) policy, predict-then-optimize (PTO) framework, and E2E method with
given order timing. Furthermore, we analyze the convergence of our E2E approach and provide the upper bounds for its
expected daily average cost under certain conditions. To compare our approach with the optimal policy, we also provide
the lower bounds for the cost of the optimal policy under certain assumptions. Our study elucidates the empirical success
of the E2E method in inventory management. The analysis and results of the numerical experiments also suggest that
considering the order timing and order quantity jointly in an E2E method can lower the lost sales cost more than the E2E
method with a prespecified order timing.

Key words: end-to-end learning, inventory management, fixed ordering cost

1 Introduction
As one of the most fundamental problems in management science, inventory management has been studied
for over a century (see, e.g., Snyder and Shen (2019) for a review). However, the growth of e-commerce
is currently introducing new challenges for inventory management. The emergence of massive, high-quality
data, including customer information, item (stock keeping unit, SKU) features, and historical sales, is allowing
leading e-commerce companies, such as Amazon, to provide a higher level of service with lower overall
operational cost. A high service level, which implies the satisfaction of the needs of various customers for a
wide range of products, is key to the success of retailers when servicing hundreds of millions of customers

Electronic copy available at: https://ssrn.com/abstract=3888897


Mo, Meng and Max: E2E Deep Learning for Inventory Management with Fixed Ordering Cost

(e.g., Amazon (2021), JD (2021)) in the present competitive globalized e-commerce market. To achieve
a high service level, effective, personalized, and timely inventory management using large-scale data is
critical.
The uncertainties of the demand and vendor lead time (VLT) are the main challenges in inventory
management. In recent decades, many studies on inventory management have focused on developing
optimal/near-optimal policies, frequently using distributional assumptions of uncertainty, such as a Poisson
distribution of demand or VLT. However, in practice, the demand and the VLT generally exhibit significant
nonstationarity and variability; thus, the aforementioned assumptions become unrealistic in many real-world
scenarios. Therefore, data-driven methods without restrictive assumptions are imperative for e-commerce
companies.
In traditional data-driven approaches, in the first stage, the demand and the VLT are estimated, and in
the second stage, certain policies are applied based on these estimations. However, this two-step predict-
then-optimize (PTO) framework is inefficient when there are numerous SKUs, because it is difficult to
accurately predict the demand and VLT of millions of SKUs. Leading online retailers, such as Amazon,
adopt advanced machine learning models to improve their prediction accuracy. However, a higher accuracy
of prediction does not necessarily result in a lower cost of decisions (e.g., see the SPO loss function in
Elmachtoub and Grigas (2021)). Moreover, even with the prediction of the demand and the VLT, solving the
inventory problem is intractable (e.g., Halman et al. (2012)), and companies still need to rely on heuristic
algorithms. In addition, in the PTO framework, the prediction may omit some information from the original
data, and therefore, may not lead to optimal decisions. Thus, compared to the PTO framework, end-to-end
(E2E) solutions that take data as input and directly output inventory decisions become more desirable for
online retailers, to achieve a higher service level.
Replenishment of an order typically involves two types of decisions: order quantity, i.e., the number of
units to be replenished, and order timing, i.e., the time the replenishment order should be placed. Because
of the nonstationarity of the demand and VLT in inventory management, the order timing varies and should
be jointly optimized with the order quantity. However, changing from a regular order schedule to a dynamic
and flexible order schedule is challenging for retailers since they must determine the factors that control their
order frequency. In practice, these factors may include the minimum order quantity required by the suppli-
ers, and the fixed ordering cost (such as the order setup cost, e.g., transportation cost). In addition, because
the fixed ordering cost varies across SKUs, both the order timing and order quantity should be SKU-specific.
Although considering the order timing and the order quantity simultaneously is critical, designing an E2E
framework to output both types of decisions is difficult. Thus, Qi et al. (2020b) (forthcoming in Management
Science) considered only the order quantity and proposed an E2E framework for replenishment while
assuming a given order timing (for simplicity, we refer to this setting as the End-to-End no fixed ordering cost
(E2E-NFC) framework). Their method has been implemented by JD.com, a leading e-commerce company
in China, and is currently responsible for inventory decisions for thousands of products with an inventory
cost reduction of up to 25%. However, although Qi et al. (2020b) demonstrated the empirical success of
E2E methods, no theoretical analysis of E2E methods in multi-period inventory management existed
before this study.
In this study, we propose an E2E deep learning framework to solve a multi-period inventory problem
considering the fixed ordering cost, lost sales cost, and holding cost, where both the demand and VLT are
uncertain. In addition, we have access to contextual information, also referred to as features or covariates,
on which the demand and VLT distributions depend. To decide the order timing and the order quantity
simultaneously, we incorporate the fixed ordering cost and propose an automatic E2E framework that, given
input features, directly outputs decisions of both the order timing and order quantity. Furthermore, since the
fixed ordering cost is not intuitive to evaluate or calculate in practice, we analyze the relations between the
fixed ordering cost and the minimum order quantity of the output of our framework, which may serve as a
guideline for the retailers to set the values of their fixed ordering cost in practice.
We also theoretically analyze the E2E approach. To examine its convergence, we define a term, condi-
tional averaged optimal decision (CAOD), which can be considered as a mapping from the features to the
weighted average values of the optimal decisions across different scenarios conditional on the observed fea-
tures. We show that under certain assumptions regarding the E2E network structure, the output policy of the
E2E approach asymptotically converges to the CAOD. By comparing the costs of the CAOD of the proposed
E2E and E2E-NFC frameworks, we show that theoretically our E2E approach has a much lower lost sales
cost than the E2E-NFC. To analyze the performance of the CAOD of the E2E framework, we further derive
upper bounds for their expected daily average costs under certain moderate assumptions. The daily demand
and VLT distributions are not assumed to be independent or identical. The main assumptions are that the
expectation of the demand, variances of the demand and squared demand, and VLT are bounded within
certain intervals (Assumptions 1.A, 1.B, 1.C, 1.D). We derive the bounds for fast- and slow-moving items
separately. In both cases, the bounds are tight under asymptotic conditions. To the best of our knowledge,
this is the first bound for the cost of the CAOD in an E2E approach. This method can be adapted to provide
the performance guarantees of the E2E approach in other problems. In addition to the upper bounds, we
provide two lower bounds for the expected cost of the optimal policy in certain special cases. The first lower
bound shows that the naive lower bound from the deterministic economic order quantity (EOQ) model is
tight under certain conditions. The second lower bound shows that the cost of the optimal policy increases at
least in the order of $O(\bar{\sigma}^2)$ under certain moderate assumptions, where $\bar{\sigma}^2$ is the variance of the demand. We
further show that under some smartly designed input feature $X_t$ that satisfies the defined strongly predictive
conditions with parameter $\Gamma$, the daily average cost of CAOD is at most $o(\Gamma^{1.5})$.
The contributions of this study are summarized below.

1. We establish an E2E framework for inventory management with random demand and VLT using deep
learning. This framework directly outputs joint decisions of the order quantity and the order timing,
given input features. To further illustrate how to control the order timing via fixed ordering cost in our
framework, we analyze the relations between the fixed ordering cost and the minimum order quantities
of outputs of our framework.
2. Through numerical experiments, we show the advantage of our approach over other benchmarks: the (r,
Q) policy, PTO framework with deep learning, and E2E-NFC approach. In particular, our approach
achieves a much lower lost sales cost than the benchmarks.
3. Theoretically, we show that under certain assumptions regarding the E2E network structure, the output
policy of the E2E approach converges to the CAOD, which is defined as a mapping from the feature to
the decisions. We also show that theoretically our E2E approach achieves a much lower lost sales cost
than the E2E-NFC framework.
4. We derive the upper bounds for the expected daily average costs for both fast- and slow-moving items
in our proposed E2E method without assuming that either demand or VLT is independently and iden-
tically distributed (i.i.d.), which are tight under asymptotic conditions. To the best of our knowledge,
this is the first performance guarantee for CAOD. We further show that when the feature vector is
smartly designed such that the defined strongly predictive condition is satisfied with parameter Γ, then
the output of CAOD achieves a small daily averaged cost in the order of $o(\Gamma^{1.5})$.
5. We characterize the relations between the fixed ordering cost and the order quantity in our E2E
approach, which provide some insights on how to set the value of fixed ordering cost in practice. We
also provide tight lower bounds for the expected daily average costs of the CAOD in special cases.
The remainder of this paper is organized as follows. In Section 2, we summarize the literature on stochas-
tic inventory management and joint estimation–optimization approaches. In Section 3, we describe our E2E
framework. In Section 4, we define the term, CAOD, and present the validation of our E2E model by show-
ing that it converges to the CAOD. In Section 5, we present the considered assumptions and derivation of the
bounds for the CAOD under certain additional assumptions. In particular, we provide the bounds for fast-
and slow-moving items separately. In Section 5.4, we provide smaller upper bounds when the feature has
some strong predictive power. Next, in Section 5.5, we analyze how to set the value of fixed ordering cost
in practice, based on the value of desired minimum order quantity. In Section 5.6, we derive a tighter upper
bound for the daily average cost, when the unit lost sales cost is small. Subsequently, we present two lower
bounds for the costs of the optimal policy under some similar assumptions. In Section 6, we describe the
data from JD.com, a leading e-commerce company, used to perform numerical experiments. Our experiments
show that our E2E model achieves lower cost than the (r, Q) policy and the E2E-NFC framework. In
the e-companion, we present proofs, additional formulations, additional discussions, and sensitivity analysis
results.

2 Literature Review
The multiperiod newsvendor problem is among the fundamental problems in supply chain management and
has a vast literature in classic inventory theory. Please refer to Snyder and Shen (2019) for a detailed
review. In this section, we discuss the following three classes of literature to appropriately position our
study.
1. Approximation algorithms for newsvendor and dynamic lot-sizing problems. For deterministic
lot-sizing problems, various algorithms have been developed, e.g., Wagner and Whitin (1958), Brahimi
et al. (2006), Sandbothe and Thompson (1990), and Aksen et al. (2003). As will be discussed in Section
3.2, these methods can be used in the labeling process in our E2E framework.
For a multiperiod newsvendor problem with no fixed ordering cost, when the distribution of the
demand is known, the base stock policy is one of the optimal policies. If the fixed ordering cost is
positive, the (s, S) policy is one of the optimal policies, as proved by Iglehart (1963) and Zheng (1991).
Dynamic programming is frequently used in studies to find the optimal parameters in the (s, S) pol-
icy, e.g., Guan and Miller (2008). With further consideration of a random lead time, revised dynamic
programming can still be adopted. For example, in Jiang and Guan (2011), Huang and Küçükyavuz
(2008), and Alp et al. (2003), different dynamic programming algorithms are proposed while assum-
ing that all the demands are satisfied. Because dynamic programming is computationally inefficient
when the planning horizon is large, different approximation algorithms have been developed, e.g.,
Levi et al. (2007a), Levi et al. (2008a) and Levi et al. (2008b). In Levi and Shi (2013), an efficient
algorithm assuming the VLT to be a known constant was proposed, and the cost of their algorithm is
guaranteed to be no more than 3 times the optimal expected cost. For multi-echelon systems
with multiple products, Levi et al. (2017) established different approximation algorithms based
on a new cost-accounting scheme. The aforementioned literature assumed that the demand distribu-
tion is known; however, for large-scale inventory systems, the prediction of demand distribution is a
challenging task.
2. Data-driven algorithms for inventory management problems. In data-driven algorithms for inven-
tory management problems, retailers do not assume the demand distribution; however, they have access
to demand samples. One well-known method, the sample average approximation (SAA), has been used
in inventory management problems with a nonparametric demand distribution. For example, Levi et al.
(2007b), Levi et al. (2015), and Cheung and Simchi-Levi (2019) used the SAA method to solve mul-
tiperiod newsvendor problems. Another class of these studies assumes that retailers learn the demand
on the fly while minimizing regret. For inventory management problems in online learning scenarios,
Huh et al. (2009), Huh and Rusmevichientong (2009), Agrawal and Jia (2019), Yuan et al. (2019),
and Zhang et al. (2020) achieved different results under different assumptions using different gradient
descent methods. For a detailed review of data-driven algorithms for inventory management problems,
please refer to Section 4 of Qi et al. (2020a). However, most studies in online settings assume the
demand to be i.i.d. or at least independent, which may not hold in practice. Wagner (2011) used
linear-fractional programming and duality theory to solve an online lot-sizing problem in which the demand
was not necessarily i.i.d., and analyzed the competitive ratio. Our E2E model deviates from
the above two classes by assuming that the demand and VLT distributions are unknown and that the
daily demand and VLT are not necessarily identically distributed or independent.
Another line of related literature uses deep reinforcement learning to solve multi-period inventory
management problems. We refer to Boute et al. (2021) and Gijsbrechts et al. (2021) as representatives.
Because of the dynamic nature of the multi-period problem, reinforcement learning is a natural alternative
to our proposed method. Our method adopts a supervised learning approach and provides
a clear and scalable labeling process that augments the original demand and VLT observations into a
training dataset. In contrast, deep reinforcement learning methods generally require a more complicated
re-sampling process to approximate the value functions.
3. Joint estimation–optimization methods. The third class related to our study comprises joint
estimation–optimization approaches. For example, Liyanage and Shanthikumar (2005) combined the
demand estimation and optimization of a newsvendor problem and proposed a biased policy called
operational statistics. Under several types of demand distributions, Liyanage and Shanthikumar (2005)
showed that this biased policy achieves a higher profit than the maximum likelihood estimation.
Subsequently, Chu et al. (2008) used a Bayesian analysis to estimate operational statistics. Klabjan
et al. (2013) further generalized a Bayesian model to a mini-max robust model using histograms,
and thereby integrated the estimation and prediction stages. Additionally, Beutel and Minner (2012)
assumed demand as a linear combination of some exogenous variables and a random shock and
designed a data-driven framework to minimize the in-sample cost by linear programming. Ban and
Rudin (2019) proposed an algorithm based on empirical risk minimization rules and used kernel opti-
mization to integrate the optimization and prediction stages. Considering contextual information, Beu-
tel and Minner (2012) and Meller and Taigel (2019) presented algorithms for minimizing the in-sample
cost. Bertsimas and Kallus (2020) and Bertsimas and McCord (2019) provided an asymptotically opti-
mal proof of the weighted SAA method and derived the sample complexity bound. Elmachtoub and
Grigas (2021) and Ho-Nguyen and Kılınç-Karzan (2020) used the decision cost to measure the pre-
diction loss in the training stage. Different from the above methods, Donti et al. (2017) proposed an
E2E approach to estimate the model parameters by minimizing the expected loss. They demonstrated
the advantage of the E2E method in a classical inventory stock problem, a real-world electrical grid
scheduling task, and a real-world energy storage arbitrage task. Cristian et al. (2022) proposed a neural
network architecture that can map the features to the feasible space and applied it to a multi-location
newsvendor problem with cross fulfillment. Another type of E2E approach predicts some uncertain
parameters or decisions in a problem that help to find the optimal decision, instead of directly predict-
ing random variables. For example, Duan et al. (2020) applied an E2E framework in a graph clustering
problem and Nazari et al. (2018) used an E2E method to solve a vehicle routing problem.
Unlike the above E2E learning methods, Qi et al. (2020b) proposed an E2E framework that first
labels contextual information with the optimal decisions and subsequently uses a deep neural network
to fit the optimal decision mapping. In our study, while training the neural network, we adopt a
labeling method and loss function similar to those of Qi et al. (2020b).
Our study is different from that of Qi et al. (2020b) as follows:
• Qi et al. (2020b) considered a prespecified order timing, and thus, did not include the fixed order-
ing cost; in comparison, we consider the order timing. By including the order timing, our method
enables more model flexibility and better performance. Moreover, as shown by the experiments,
our proposed method reduces the risk of stockout by simultaneously deciding the order timing
and quantity.
• We derive the theoretical performance bounds for our proposed model, whereas Qi et al. (2020b)
focused on the practicality and applicability of their proposed model. Moreover, we show that in
theory, our model leads to much lower lost sales costs than the E2E-NFC approach under certain
special cases. To the best of our knowledge, there is no performance guarantee for any existing
E2E method for solving multiperiod inventory management problems.

3 E2E Framework with Fixed Ordering Cost


In this section, we introduce an E2E framework that jointly decides the order timing and the order quantity.
Before we provide the details of the E2E model, we first introduce the mathematical formulation of a
multi-period inventory management problem with fixed ordering cost.

3.1 Multiperiod Newsvendor Model with Fixed Ordering Cost

Let the considered planning horizon be $T$. On Day 0, we need to decide the order quantity for that day. We
assume that the excess demand is lost and incurs an immediate lost sales cost. The excess inventory at
the end of the day is carried over to the next day. At the end of the planning horizon, we assume that the
salvage value is zero. The total cost for an order is composed of fixed ordering and linear purchase costs.
We use K, h, and p to represent the fixed ordering cost, holding cost per unit, and lost sales cost per unit,
respectively. We denote the uncertain demand and VLT on Day $t$ by $D_t$ and $L_t$, respectively. We assume
that the contextual information on Day $t$ is $x_t$, and the sample space of $x_t$ is $\mathcal{X}$, i.e., $x_t \in \mathcal{X}$. $x_t$ may include
the previous demand, VLT, vendor information and seasonality. $(\{D_t\}_{t=0}^{T}, \{L_t\}_{t=0}^{T}, x_t)$ is from a joint fixed
distribution that is unknown to the decision-maker. The realization of lead time $L_t$ on Day $t$ is unknown
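until we receive the order on Day $t + L_t$. We further define function $(\cdot)^+ = \max(\cdot, 0)$. Thus, our problem of
interest can be formulated as follows:

\begin{align*}
\min_{q_t, y_t, a_t, z_t} \quad & \mathbb{E}\Big[ \sum_{t=0}^{T} (K a_t + h y_t + p z_t) \,\Big|\, x_t \Big] \tag{P1} \\
\text{s.t.} \quad & y_t = \Big( y_{t-1} + \sum_{i \in \{j : j + L_j = t\}} q_i - D_t \Big)^+ && \forall\, t = 0, \dots, T \tag{1} \\
& z_t = \Big( D_t - \sum_{i \in \{j : j + L_j = t\}} q_i - y_{t-1} \Big)^+ && \forall\, t = 0, \dots, T \tag{2} \\
& q_t \le M a_t && \forall\, t = 0, \dots, T \tag{3} \\
& y_t, z_t, q_t \ge 0 && \forall\, t = 0, \dots, T \tag{4} \\
& a_t \in \{0, 1\} && \forall\, t = 0, \dots, T \tag{5} \\
& y_t, z_t, q_t \in \mathbb{Z}_+ && \forall\, t = 0, \dots, T \tag{6}
\end{align*}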

In (P1), $q_t$ is the order quantity placed on Day $t$. $y_t$ and $z_t$ are the inventory level and number of lost sales
on Day $t$, respectively. We assume the initial inventory level to be $y_{-1}$. $a_t$ is a binary variable indicating whether an order
is placed on Day $t$. $M$ is a sufficiently large number. The objective of problem (P1) is to minimize the total
fixed ordering cost, holding cost, and lost sales cost. The objective function does not include the linear part
of the purchase cost. We will present the corresponding reasons in Section 5 while stating the assumptions.
Constraints (1) are the inventory balance constraints, in which multiple orders may arrive on the same day.
Constraints (2) define the lost sales. Constraints (3) require that if no order is placed on Day $t$ (so that no fixed
ordering cost is incurred), the order quantity on Day $t$ must be zero. Constraints (5) are the binary constraints for $a_t$, and
constraints (6) require that the inventory levels, lost sales, and order quantities be nonnegative integers, which can be relaxed to nonnegative
real numbers in certain cases. The order quantities in the history, $(q_{-1}, q_{-2}, \dots)$, and the
VLTs observed in the history, $(l_{-1}, l_{-2}, \dots)$, are given as the input.
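
To make the inventory dynamics in constraints (1)-(3) concrete, the following minimal Python sketch evaluates the objective of (P1) for one realized scenario. The function name `simulate_cost` and its arguments are hypothetical and are not taken from the paper; the sketch also assumes that orders arriving after the end of the horizon are simply ignored.

```python
def simulate_cost(q, demand, lead, K, h, p, y_init=0):
    """Cost of order plan q on one realized (demand, lead time) scenario.

    q[t], demand[t], lead[t] are the order quantity, demand, and VLT on Day t.
    y_init plays the role of the initial inventory level y_{-1}.
    """
    T = len(demand)
    arrivals = [0] * T
    for j, qty in enumerate(q):
        # An order placed on Day j arrives on Day j + lead[j];
        # several orders may land on the same day.
        if qty > 0 and j + lead[j] < T:
            arrivals[j + lead[j]] += qty

    y_prev, total = y_init, 0
    for t in range(T):
        available = y_prev + arrivals[t]
        y_t = max(available - demand[t], 0)  # constraint (1): leftover inventory
        z_t = max(demand[t] - available, 0)  # constraint (2): lost sales
        a_t = 1 if q[t] > 0 else 0           # constraint (3): pay K only when ordering
        total += K * a_t + h * y_t + p * z_t
        y_prev = y_t
    return total
```

Under these assumptions, delaying the arrival of a single order by one day shifts cost from holding to lost sales, which is the trade-off that the order timing decision has to balance.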

3.2 Labeling Process

An appropriate training dataset is a critical ingredient to train a deep neural network. Each sample from
the training dataset should consist of an observation of the feature and the associated target/label. The
neural network aims to learn the mapping from the input features to the output labels. Unfortunately, as
the original dataset only contains demand and VLT observations, it cannot be directly used as the training
dataset because the labels are unknown. We would like to point out that it is pointless to use the historical
ordering decisions because we are seeking better solutions compared to current practice.
In the remainder of this section, we introduce a labeling process that calculates the ex-post optimal orders
in history. Although this method shares the same spirit as the labeling process proposed in Qi et al. (2020a),
introducing the fixed ordering cost requires distinct computational methods to calculate the labels. As both
the demand and VLT are observed in historical data, Problem (P1) reduces to a deterministic lot-sizing
problem with lost sales cost, which is presented in E-companion E as Problem (P2). In Problem (P2), the
historical demand and VLT both range from Day $-T_H$ to Day $-1$, and the random variables $D_t$ and
$L_t$ are replaced with their realized values. Thus, the daily optimal order decisions in history are obtained by
solving Problem (P2). This deterministic lot-sizing problem with lost sales cost can be solved by dynamic
programming, similar to Aksen et al. (2003), with complexity $O(T^2)$. It can also be solved efficiently using
commercial solvers.
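
As an illustration of what the labeling step computes, the sketch below finds an ex-post optimal order plan by exhaustive search over integer order quantities. This is not the dynamic program of Aksen et al. (2003) and is only practical for very short horizons; it assumes zero initial inventory, and the names `plan_cost` and `brute_force_labels` are hypothetical.

```python
from itertools import product


def plan_cost(q, demand, lead, K, h, p):
    """Realized cost of order plan q, starting from zero inventory."""
    T = len(demand)
    arrivals = [0] * T
    for j, qty in enumerate(q):
        if qty > 0 and j + lead[j] < T:
            arrivals[j + lead[j]] += qty
    y, total = 0, 0
    for t in range(T):
        available = y + arrivals[t]
        y = max(available - demand[t], 0)   # inventory carried to the next day
        z = max(demand[t] - available, 0)   # lost sales on Day t
        total += K * (q[t] > 0) + h * y + p * z
    return total


def brute_force_labels(demand, lead, K, h, p):
    """Enumerate all integer plans and return (best plan, its cost)."""
    # With zero initial inventory, no single order needs to exceed total demand.
    q_max = sum(demand)
    best = min(
        product(range(q_max + 1), repeat=len(demand)),
        key=lambda q: plan_cost(q, demand, lead, K, h, p),
    )
    return list(best), plan_cost(best, demand, lead, K, h, p)
```

The search space grows exponentially in the horizon length, which is why a polynomial-time dynamic program or a commercial solver is needed when $T_H$ covers several years of daily data.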
In practice, retailers collect historical demand and VLT data over a long period for the training dataset,
e.g., three years; therefore, $T_H$ is typically a large number. We would also like to clarify that, in this work,
we do not particularly address demand censoring. Instead, we refer to existing literature that focuses on
overcoming demand censoring by approximately recovering demand, e.g., Huh et al. (2011). Also, the
retailers may recover censored demand as a data pre-processing step in practice, especially when there are
enough observed data. The processed demand data thus can be used in various decision-making procedures.
In this work, we assume that the demand is fully observed or sales data has been pre-processed to recover
the uncensored demand.
Let $q^*(D, L)$ represent the multi-period solution of Problem (P2) with demand vector $D$ and VLT
vector $L$. Within the replenishment vector $q^*(D, L)$, the replenishment amount on the current day $t$, $q_t^*$,
is of special interest. This is because, as practiced by many e-commerce companies, such as JD.com,
the replenishment schedules are updated every day, and we only need to learn $q_t^*$, the order decision on
the current day. Furthermore, this decision can be decomposed into two parts: whether an order should be
made on Day $t$ and the order quantity. Therefore, we decompose the order sequence into two decisions in
reference to time $t$: next-order time $o_t^*$ and order quantity $w_t^*$. To be more specific, we define
$o_t^* := \inf\{i \in \mathbb{Z} : i \ge 0,\ q_{i+t}^* \neq 0\}$ and $w_t^* := q_{o_t^* + t}^*$. Intuitively, $o_t^*$ is the number of days left before the next-order time, and
$w_t^*$ is the next-order quantity. Therefore, the goal of the E2E model is to learn the mapping from the input
feature vector to both output quantities: $o_t^*$ and $w_t^*$. We denote the mapping from the replenishment vector
$q^*(D, L)$ to $(o_t^*, w_t^*)$ at time $t$ as $\pi_t : \mathbb{R}^{T_H} \to \mathbb{R}^2$.
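
The definitions of the next-order time and next-order quantity translate directly into code. The following is a minimal sketch with a hypothetical function name, `next_order_label`; returning `None` when no further order exists in the labeled horizon is an assumption of the sketch, since the definition above leaves that case to the infimum over an empty set.

```python
def next_order_label(q_star, t):
    """Return (o_t, w_t) for the ex-post optimal order sequence q_star.

    o_t is the number of days from Day t until the next nonzero order
    (0 if an order is placed on Day t itself); w_t is that order's quantity.
    """
    for i, qty in enumerate(q_star[t:]):
        if qty != 0:
            return i, qty
    return None  # no further order within the labeled horizon
```

For example, with `q_star = [0, 0, 5, 0, 3]`, the label on Day 0 is `(2, 5)`: the next order is two days away and has size 5.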

3.3 Training the E2E Model

In our proposed E2E Model, we use a neural network to fit the potentially complicated relationship between
the optimal decisions and the contextual information. The input of the neural network is the optimal
order sequence obtained using the previously stated labeling process, history data, and contextual information.
To be more specific, the input data can be represented by five sequences of contextual information
$(\{X_i\}_{i=-\tilde{T}+t}^{-1+t}, \{D_i\}_{i=-\tilde{T}+t}^{-1+t}, \{A_i\}_{i=-\tilde{T}+t}^{-1+t}, \{G_i\}_{i=-\tilde{T}+t}^{-1+t}, \{I_i\}_{i=-\tilde{T}+t}^{-1+t})$. $X_t$ is a feature obtained before time $t$, e.g., it can
be the product type, the distribution center and some seasonality factors such as weekends or holidays. $D_t$
is the demand observed before time $t$. $A_t$ and $G_t$ are the histories of the order timing and order arrival time
before time $t$, respectively. $I_t$ is the historical inventory before time $t$. $\tilde{T}$ is the length of the input contextual
data. Consequently, the contextual information on Day $t$ is from Day $-\tilde{T} + t$ to Day $-1 + t$. Similarly,
the output of the neural network can be represented by (Out 1, Out 2, Out 3, Out 4). Out 1 is the VLT
prediction, whose corresponding label is $l_t$. Out 2 is the estimated next-order arrival time, whose label is
denoted as $r_t^*$, where $r_t^* := o_t^* + l_t$. The last two outputs are the ordering decisions that are to be made, whose
labels are $o_t^*$ and $w_t^*$, respectively.
Consequently, the objective of the training is to fit a function $f$ such that
$f(\{X_i\}_{i=-\tilde{T}+t}^{-1+t}, \{D_i\}_{i=-\tilde{T}+t}^{-1+t}, \{A_i\}_{i=-\tilde{T}+t}^{-1+t}, \{G_i\}_{i=-\tilde{T}+t}^{-1+t}, \{I_i\}_{i=-\tilde{T}+t}^{-1+t})$ is close to $(o_t^*, w_t^*)$.
Figure (1) shows the structure of the proposed SKU-specific neural network model, i.e., the corresponding
structure of $f(\cdot)$. In Figure (1), we assume that the general feature, $X_t$, can be ignored because the network
is only trained for one specific SKU. In the figure, Optimal Order Quantities, Optimal Arrival Quantities,
and Stock denote the vectors of the optimal order quantities, optimal order arrival quantities, and optimal
inventory level in the history, respectively, which can be obtained in the labeling process, as discussed in
Section 3.2. The four sequences of the inputs are contextual information. Since the input is in the form of a
time series, we use long short-term memory (LSTM, Hochreiter and Schmidhuber (1997)) layers to encode
all input information. LSTM is a widely adopted type of recurrent neural network (RNN; Medsker and Jain
(2001)), especially for encoding time-series input. It is natural to use LSTM as an encoder in our E2E model
because the input is a sequence of time-series data. In the E2E model, we use the rectified linear unit (ReLU) as the
activation function. ReLU is a piecewise linear function that outputs its input directly if the input is
positive and outputs zero otherwise. For more details of terminologies about neural networks, we refer
the readers to Schmidhuber (2015). There are three independent LSTMs, which output Out 1, Out 2,
and Out 4, respectively. Out 3 is simply max{Out 2 − Out 1, 0}.
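As a minimal illustration of the last two points, the sketch below implements ReLU and derives Out 3 from Out 1 and Out 2. The function names and the numeric values are hypothetical and are not taken from the paper's implementation.

```python
def relu(x):
    """Rectified linear unit: return x if it is positive, otherwise zero."""
    return x if x > 0 else 0.0


def derive_out3(out1, out2):
    """Out 3 = max{Out 2 - Out 1, 0}.

    out1 is the predicted VLT and out2 is the predicted next-order
    arrival time, so their clipped difference is the implied order timing.
    """
    return relu(out2 - out1)


# Hypothetical predictions for a single day (illustrative values only).
vlt_pred, arrival_pred = 4.0, 6.5
timing = derive_out3(vlt_pred, arrival_pred)
```

Because the label of Out 2 is the sum of the order-timing label and the VLT label, subtracting Out 1 from Out 2 recovers an order-timing estimate, and the ReLU clipping keeps that estimate nonnegative.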

Figure 1 Network Structure

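The objective function of the training process is defined as:

$$\min_{\vartheta_1,\vartheta_2,\vartheta_4} \sum_{t\in T_I} \left\{ \lambda_1 \mathcal{L}_{\vartheta_1}(\text{Out1}, l_t) + \lambda_2 \mathcal{L}_{\vartheta_2}(\text{Out2}, r_t^*) + \lambda_3 \mathcal{L}_{\vartheta_1,\vartheta_2}(\text{Out3}, o_t^*) + \mathcal{L}_{\vartheta_4}(\text{Out4}, w_t^*) \right\} \qquad (7)$$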


TI is the total training period; ϑi denotes the weights of the network that need to be trained; and λ1 , λ2 , and
λ3 are positive constants. λ1 , λ2 , and λ3 can be used to adjust the length of the step at each iteration in the
gradient descent. λ1 and λ2 are typically smaller than λ3 .
Each of the four terms $\mathcal{L}(\cdot, \cdot)$ in the objective function in Equation (7) is a squared loss. The first two terms are
the loss of the predicted VLT and the predicted next-order arrival time, respectively. Because Out1, Out2,
and Out4 are the outputs of three independent networks, the weights of each network can be optimized
independently. The above neural network is designed with the knowledge that the prediction of the VLT
is independent of the demand and the inventory. As will be discussed in Section 5, these designs of the
prediction enable us to analyze the convergence of our framework.
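The weighted objective in Equation (7) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the dictionary keys, the helper names, and the example weights are hypothetical, and an actual implementation would compute the same quantity inside a deep learning framework.

```python
def squared_loss(prediction, label):
    """Squared error between one prediction and its label."""
    return (prediction - label) ** 2


def e2e_objective(samples, lam1, lam2, lam3):
    """Sum the four weighted squared-loss terms of Equation (7).

    Each sample is a dict holding the network outputs (out1, out2, out4)
    and the labels l (VLT), r (arrival time), o (order timing), and
    w (order quantity). The key names are illustrative.
    """
    total = 0.0
    for s in samples:
        out3 = max(s["out2"] - s["out1"], 0.0)  # Out 3 is derived, not trained
        total += (
            lam1 * squared_loss(s["out1"], s["l"])
            + lam2 * squared_loss(s["out2"], s["r"])
            + lam3 * squared_loss(out3, s["o"])
            + squared_loss(s["out4"], s["w"])
        )
    return total


# One hypothetical training sample.
sample = {"out1": 3.0, "out2": 5.0, "out4": 10.0,
          "l": 4.0, "r": 5.0, "o": 1.0, "w": 12.0}
loss = e2e_objective([sample], lam1=0.1, lam2=0.1, lam3=1.0)
```

In this sample the four squared errors are 1, 0, 1, and 4, so the weights 0.1, 0.1, and 1 give a total of about 5.1; the order-quantity term carries an implicit weight of one.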
The criterion for ending the training process can be either the convergence of the training loss or the
completion of a certain number of iterations. It should be noted that when applying the trained model to
a test set, the outputs need to be further rounded off to nonnegative integers. Subsequently, the following
rules are adopted to decide the order quantity at time t. When ot∗ = 0, we place an order immediately with
order size wt∗ ; otherwise, we do not place an order at time t.
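The post-processing and ordering rule described above can be sketched as follows. The function names are hypothetical, and Python's built-in `round` (which rounds exact halves to the nearest even integer) is used as one possible rounding convention; the paper does not specify how ties are broken.

```python
def to_nonnegative_int(value):
    """Round a raw network output and clip it at zero."""
    return max(0, int(round(value)))


def order_quantity_today(out3, out4):
    """Apply the ordering rule to the raw outputs Out 3 and Out 4.

    An order of the rounded size is placed only when the rounded order
    timing is zero; otherwise nothing is ordered today.
    """
    timing = to_nonnegative_int(out3)
    size = to_nonnegative_int(out4)
    return size if timing == 0 else 0
```

For example, raw outputs (0.3, 17.6) would trigger an order of 18 units, whereas (2.2, 17.6) would trigger no order on that day.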

4 Convergence Analysis
In this section, we define the mapping, conditional averaged optimal decisions (CAOD), for the E2E method
in Definition 1. CAOD is a key concept in analyzing the theoretical performance guarantee of the cost of the
E2E policy.
A general setting of unconstrained contextual stochastic programming. Here we detour
from the inventory management setting and introduce the definition of CAOD in the general setting of unconstrained contextual stochastic programming. In the standard setting of contextual stochastic
programming, there is a random parameter $\xi \in \mathbb{R}^{n_2}$, which is the uncertainty, and contextual information
$X \in \mathcal{X} \subseteq \mathbb{R}^{n_3}$, which serves as the predictor of the random parameter. The decision $\omega \in \mathbb{R}^{n_1}$ is supposed to
minimize the cost function $h(\cdot, \cdot) : \mathbb{R}^{n_1} \times \mathbb{R}^{n_2} \to \mathbb{R}$.
In our previously mentioned inventory management setting, the random parameters
are the demand $D_t$ and the VLT $L_t$, while the contextual information is the feature vector $X_t$.

Definition 1 (CAOD). Under the setting of the unconstrained stochastic programming problem, let us
use $\omega^*(\xi) : \mathbb{R}^{n_2} \to \mathbb{R}^{n_1}$ to denote the optimal solution mapping $\omega^*(\xi) := \arg\min_{\omega} h(\omega, \xi)$. The conditional
averaged optimal decisions, $\tilde{\omega}(X)$, are defined as $\tilde{\omega}(X) := E[\omega^*(\xi) \mid X]$.
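To make the definition concrete, the following toy sketch computes the CAOD empirically for a hypothetical one-dimensional problem. The cost function, the unit costs, the feature values, and the data are all illustrative assumptions and do not come from the paper: the cost is an overage/underage cost, so the scenario-wise optimal decision is simply the realized demand, and the CAOD is the average of those decisions within each feature group.

```python
from statistics import mean

OVERAGE, UNDERAGE = 1.0, 4.0  # hypothetical unit costs


def cost(order, demand):
    """Toy cost h(omega, xi): overage plus underage cost."""
    return OVERAGE * max(order - demand, 0) + UNDERAGE * max(demand - order, 0)


def scenario_optimal(demand):
    """omega*(xi): with the demand known, ordering exactly the demand is optimal."""
    return demand


def caod(history, x):
    """Empirical E[omega*(xi) | X = x]: average the scenario-wise optima."""
    return mean(scenario_optimal(d) for ctx, d in history if ctx == x)


def expected_cost(order, history, x):
    """Empirical conditional expected cost of a fixed decision."""
    return mean(cost(order, d) for ctx, d in history if ctx == x)


# Hypothetical (feature, demand) observations.
history = [("weekday", 2), ("weekday", 4), ("weekday", 6),
           ("weekend", 10), ("weekend", 14)]
weekday_caod = caod(history, "weekday")
```

With these numbers the weekday CAOD is 4, yet ordering 6 units has a lower empirical expected cost (2 versus roughly 3.33). The CAOD averages optimal decisions rather than minimizing the expected cost, so the two need not coincide.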

Subsequently, we specify the mapping, CAOD, in our E2E framework for the inventory management
problem. For simplicity, we denote vectors $\{l_i\}_{i=t}^{T+t}$ and $\{d_i\}_{i=t}^{T+t}$ in bold letters $\mathbf{l}_t$ and $\mathbf{d}_t$, respectively. Then,
we consider Problem (P1) in a rolling horizon from Day $t$ to Day $t+T$. The total cost (Objective (P1)) can
be written as a function of the demand, VLT, and order quantities, which we define as $C(\mathbf{d}_t, \mathbf{l}_t, \mathbf{q}_t)$, where $\mathbf{q}_t$


Mo, Meng and Max: E2E Deep Learning for Inventory Management with Fixed Ordering Cost
12

denotes the replenishment decisions from Day $t$ till Day $T+t$, i.e., $\mathbf{q}_t := \{q_i\}_{i=t}^{T+t}$. From the labeling process,
as described in Section 3.2, we obtain $\mathbf{q}_t^*(\mathbf{d}_t, \mathbf{l}_t) = \arg\min_{\mathbf{q}_t} C(\mathbf{d}_t, \mathbf{l}_t, \mathbf{q}_t)$, for $t = -T_H, \ldots, -1$. Lowercase letters
$\mathbf{d}_t$ and $\mathbf{l}_t$ denote the realizations of the demand and VLT vectors recorded in the historical data, respectively,
and capital letters $\mathbf{D}_t$ and $\mathbf{L}_t$ denote the corresponding vector-valued random variables.
Recall that $\pi_t(\cdot)$ is the mapping that converts $\mathbf{q}^*(\mathbf{D}, \mathbf{L})$, the optimal solution of Problem (P2), to $(o_t, w_t)$,
as stated in Section 3.2. Subsequently, the labels of Out 3 and Out 4, denoted by $o_t^*$ and $w_t^*$, can be obtained
by $\pi_t(\mathbf{q}^*(\mathbf{D}_t, \mathbf{L}_t))$. Thus, we can establish the CAOD in terms of the order quantity and the order timing, as
mentioned in Definition 2.

Definition 2. The output values $(\tilde{o}_t^*, \tilde{w}_t^*)$ of the CAOD are defined as $(\tilde{o}_t^*, \tilde{w}_t^*) = E[\pi_t(q_t^*(D_t, L_t)) \mid X_t]$.

Remark 1. In general, the CAOD may return values outside the feasible space. For example, when the
feasible set is not continuous or convex, the return values of CAOD are frequently infeasible. Hence, in
practice, certain post-processing is required for the output of the E2E model, such as the transformation
of the variables into integers or the projection of the decisions into the feasible space. For example, in an
inventory management setting, the output order quantity needs to be transformed to obtain integers. If we
further consider the perishable products or the inventory capacity, then the output should be projected
onto the feasible region. Different projection rules result in different practical performances. However, in this
study, when deriving the performance guarantees, we relax the integer requirements of the order quantity
and the order timing and assume that these projection rules have a limited influence, which is true when
the inventory is large. In the numerical experiments, we round off the output values, $(\tilde{o}_t^*, \tilde{w}_t^*)$, to the closest
integers, and the practical performance shows that this simple projection rule is effective.

Remark 2. In general, the convergence result depends on the loss function, $\mathcal{L}(\cdot)$, defined while training.
In our model, the MSE can be adopted as the loss function. In this case, the outputs converge to the CAOD.

To analyze the convergence of the E2E framework, we provide the minimizer of the loss function in
Lemma 1.

Lemma 1. The expectation of the E2E loss function in Equation (7) is minimized at the CAOD $(\tilde{o}_t^*, \tilde{w}_t^*)$.
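The intuition behind this statement is the standard fact that a squared loss is minimized by the conditional mean. The following decomposition is a generic textbook argument, written for an arbitrary label $Y$ and predictor $g(X)$ (both symbols are introduced here for illustration only); it is not a reproduction of the paper's own proof.

```latex
\begin{aligned}
E\big[(g(X) - Y)^2 \mid X\big]
  &= E\big[(Y - E[Y \mid X])^2 \mid X\big] + \big(g(X) - E[Y \mid X]\big)^2 \\
  &\ge E\big[(Y - E[Y \mid X])^2 \mid X\big],
\end{aligned}
```

with equality if and only if $g(X) = E[Y \mid X]$. The cross term vanishes because $E[Y - E[Y \mid X] \mid X] = 0$. Taking $Y$ to be the order-timing label or the order-quantity label gives the corresponding component of the CAOD as the minimizer.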

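We restore the order vectors by defining $\hat{q}_t(\cdot)$ as follows:

$$\hat{q}_t\big(\{\tilde{o}_j^*\}_{j=0}^{T}, \{\tilde{w}_j^*\}_{j=0}^{T}\big)[i] = \begin{cases} \tilde{w}_i^*, & \text{if } \tilde{o}_i^* = 0 \\ 0, & \text{if } \tilde{o}_i^* \neq 0 \end{cases} \qquad \text{for } i = 0, \ldots, T \qquad (8)$$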
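Equation (8) amounts to a simple element-wise rule, sketched below. The function name and the example sequences are hypothetical.

```python
def restore_order_vector(timings, quantities):
    """Rebuild the order vector of Equation (8).

    Entry i equals quantities[i] when timings[i] is zero (an order is
    placed on day i) and zero otherwise.
    """
    if len(timings) != len(quantities):
        raise ValueError("timings and quantities must have the same length")
    return [w if o == 0 else 0 for o, w in zip(timings, quantities)]


# Hypothetical sequences over a five-day horizon.
orders = restore_order_vector([2, 1, 0, 3, 0], [8, 8, 8, 6, 6])
```

Here orders are placed only on the third and fifth days, with sizes 8 and 6, respectively.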
 
Note that the order decision vector of the CAOD, $\hat{q}_t(\{\tilde{o}_j^*\}_{j=0}^{T}, \{\tilde{w}_j^*\}_{j=0}^{T})$, does not necessarily minimize
$E[C(D_t, L_t, q_t)]$. Because our neural network uses an LSTM to decode the sequence data and the flexibility
and generalization bound for an RNN network have been proven (e.g., Akpinar et al. (2019)), these results
can be used to prove the flexibility of the E2E model. Particularly, given the contextual information and
the sample size, n, let (ōt∗ , w̄t∗ ) represent the minimizers of the loss function in Equation (7). We consider
the case where the hypothesis class is well-specified. The well-specified hypothesis class is defined as the
hypothesis class where there exists a set of parameters for the neural network, such that ōt∗ = õt∗ , w̄t∗ = w̃t∗ . We
refer to Chapter 4 of Liang (2016) for a rigorous definition and more details of a well-specified hypothesis
class. Then, we can obtain Proposition 1, which yields asymptotic guarantees for convergence. We use X +
to denote the support of features, i.e., X + := {x|P(X = x) > 0}.

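Proposition 1. Suppose that the hypothesis class of the neural network adopted in the E2E framework
is well-specified. Suppose there exists $\nu_0 > 0$ such that $P(X = x) \ge \nu_0 > 0$ for any $x \in \mathcal{X}^+$. Given any $\varepsilon > 0$
and probability $\theta \in (0, 1)$, there exists a positive number $S_{\varepsilon,\nu_0,\theta}$. When the sample size is larger than $S_{\varepsilon,\nu_0,\theta}$,
for any $X_t \in \mathcal{X}$, we have

$$\Big| E\big[C\big(D_t, L_t, \hat{q}_t(\{\tilde{o}_j^*\}_{j=0}^{T}, \{\tilde{w}_j^*\}_{j=0}^{T})\big) \mid X_t\big] - E\big[C\big(D_t, L_t, \hat{q}_t(\{\bar{o}_j^*\}_{j=0}^{T}, \{\bar{w}_j^*\}_{j=0}^{T})\big) \mid X_t\big] \Big| \le \varepsilon.$$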

The detailed proof of Proposition 1 can be found in the e-companion. In E-companion D, we compare
the CAOD of the E2E and E2E-NFC approaches, and demonstrate that the E2E method has lower lost sales
costs in some special cases.

5 Performance Guarantees of the Converged Outputs


As presented in Section 4, the output policy of the E2E method converges to the CAOD, which
is not necessarily the optimal policy. In both simulation and practice, an E2E policy performs very well,
which motivates the derivation of a bound for the cost of the CAOD.
Although in general, it is intractable to derive the bound for the E2E approach to solve a stochastic
programming problem, in this section, by considering the order decisions in the form of (ōt∗ , w̄t∗ ), the cost
of CAOD can be bounded under the following assumptions:

Assumption 1. The distributions of the demand and VLT satisfy

A. The expectation for the demand on each day is positive and bounded, i.e., $\beta := \max_{i=0,\ldots,T}\{E[D_i \mid X_t]\}$,
$\alpha := \min_{i=0,\ldots,T}\{E[D_i \mid X_t]\}$, $\alpha, \beta > 0$.
B. The variance of the demand on each day is bounded by $\sigma^2$, i.e., $\mathrm{Var}(D_i \mid X_t) \le \sigma^2$, $\forall\, i = 0, \ldots, T$, $\sigma > 0$.
C. The variance of the squared demand on each day is bounded by $\sigma_s^2$, i.e., $\mathrm{Var}(D_i^2 \mid X_t) \le \sigma_s^2$, $\forall\, i = 0, \ldots, T$,
$\sigma_s > 0$.
D. The VLT on each day is bounded almost surely, i.e., $L_i \in [v_l, v_u]$, $\forall\, i = 0, \ldots, T$, $v_l, v_u \ge 0$.
E. The unit lost sales cost satisfies $p \ge 2Kh$.
F. The VLT and the demand are independent.
G. There is no crossover of the VLT, i.e., an order is placed after the previous orders arrive.


Assumptions 1.A and 1.D state that the expectations of both the demand and VLT for each day are
bounded in certain known intervals. The demands or the VLTs are not necessarily i.i.d. Assumptions 1.B
and 1.C state that the variances of the demand and the squared demand for each day are bounded by $\sigma^2$ and
$\sigma_s^2$, respectively. These assumptions are generally satisfied in the real world owing to the boundedness of
the demand within one day. Note that $\alpha$, $\beta$, $\sigma$, and $\sigma_s^2$ depend on the feature vector, $X_t$. In practice, given $X_t$,
the conditional expectation and the conditional variance of the demand can be small. Assumption 1.E is a
necessary condition to ensure that given the outcome of the demand, the labeling problem (P2) has no lost
sales. As all demands are satisfied during the labeling process, the total demand is always equal to the
total order quantity in the labeling stage. Therefore, including and excluding the total linear purchase cost
in the objective in Problem (P2) yields the same label result. For the second stage when analyzing the cost
with stochastic demand in Problem (P1), the analysis result of the bound is also the same, except for an
additional purchase cost, which is the maximum order quantity (whose upper bound derivation is presented
subsequently in the paper) multiplied by the unit purchase cost, p. Therefore, for simplicity, we exclude
this linear purchase cost throughout the study. Assumption 1.F enables the separate analysis of the worst
cases of the VLT and the demand. This assumption is appropriate in the real world because the VLT on the
supply side does not directly influence the demand on the demand side. Furthermore, Assumption 1.G is a
common assumption for the VLT (as stated in Qi et al. (2020b)).
To evaluate the cost of the CAOD, we calculate the long-run average cost per period. Without loss of
generality, it is assumed that at the beginning of the planning horizon, $T_0$, the demand is positive (otherwise,
we can postpone the planning horizon until the expected demand on that day is positive).
Without loss of generality, we assume the first-come-first-serve principle for the inventory. Specifically,
the inventory that arrives earlier satisfies the demand first. Subsequently, to derive the cost per day,
the total planning horizon is divided into multiple cover periods. The cover period for one order refers to
the time interval from the day when the inventory from this order begins to satisfy the demand to the day
when the inventory from this order reaches zero. By this definition, the cover period of one order may begin
later than the arrival time if there are some inventories remaining from the previous orders when this order
arrives. The lifetime of one order has two parts: the VLT and the cover period. We assign the total cost
to each order based on the following rules. The holding cost incurred by the inventory of one order, the fixed
ordering cost, and the lost sales cost between the end of the last cover period and the beginning of the current
cover period are assigned to this order. Following these assigning rules, we neglect the potential lost sales
cost after the last cover period ends in the planning horizon. This is reasonable because this expected lost
sales cost is bounded, and when the planning horizon is sufficiently long, the daily average cost from this
type of lost sales converges to zero. This type of lost sales cost is bounded because it is no more than the
product of the maximum cover period and the maximum demand on each day; both of them are shown to
be bounded in the proofs in Sections 5.2 and 5.3. Two examples of the cover period and the VLT are shown
in Figure 2. We present the mathematical expression for the cover period in Section 5.1 after introducing
more notations.

Figure 2 Examples of cover periods and VLT


In the example, the demand is one per day and the initial inventory is four. Two orders are placed between Days 0
and 20. The first order is placed on Day 2 with the VLT being four. The first order arrives on Day 6 and is depleted on
Day 11. Thus, the cover period for the first order is from Day 6 till Day 11. The second order is placed on Day 7 and
arrives on Day 9. As the consumption of the second order begins on Day 11, the cover period for the second order is
from Day 11 till Day 16. The lost demands on Days 4 and 5 are assigned to the cost of the first order.
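The example can be reproduced with a small first-come-first-serve simulation. The sketch below is illustrative only: the function name is hypothetical, and the order quantities (five units each) are an assumption inferred from the stated arrival and depletion days, since the text does not give them explicitly. A batch's cover period is recorded as the first day it serves demand and the first day on which it has nothing left.

```python
def simulate_cover_periods(num_days, daily_demand, initial_inventory, orders):
    """Simulate FIFO consumption and return (cover_periods, lost_days).

    orders is a list of (placement_day, vlt, quantity) tuples.
    cover_periods maps a batch label to [first_day_used, depletion_day].
    """
    arrivals = {}
    for idx, (placed, vlt, qty) in enumerate(orders):
        arrivals.setdefault(placed + vlt, []).append([f"order {idx + 1}", qty])

    queue = [["initial", initial_inventory]] if initial_inventory > 0 else []
    cover, lost_days = {}, []
    for day in range(num_days):
        queue.extend(arrivals.get(day, []))  # newly arrived batches join the back
        need = daily_demand
        while need > 0 and queue:
            label, remaining = queue[0]
            cover.setdefault(label, [day, None])  # first day this batch is used
            used = min(need, remaining)
            queue[0][1] -= used
            need -= used
            if queue[0][1] == 0:
                cover[label][1] = day + 1  # empty from the next day onward
                queue.pop(0)
        if need > 0:
            lost_days.append(day)  # unmet demand on this day is lost
    return cover, lost_days


# Unit daily demand, initial inventory of four, orders placed on Days 2 and 7
# with VLTs of four and two days; quantities of five are assumed.
cover, lost_days = simulate_cover_periods(16, 1, 4, [(2, 4, 5), (7, 2, 5)])
```

Under these assumptions the simulation gives a cover period of Day 6 to Day 11 for the first order, Day 11 to Day 16 for the second order, and lost demand on Days 4 and 5, matching the description above.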

By the ZIO (zero inventory order) property for a deterministic lot-sizing problem stated in Wagner and
Whitin (1958), in the optimal solutions, the demand within all cover periods should be satisfied by the
corresponding orders and the inventory level should decrease to zero at the end of each cover period to
eliminate the excess holding cost. As the total cover period may be shorter than the total planning horizon
(in this case, the demand for certain days is lost), the uniform upper bound for the daily average cost for
each cover period is also the upper bound for the daily average cost of the planning horizon.
Furthermore, let the length of one cover period be $\tau_0$ and denote the average cost per day within the
planning horizon as $\hat{C}(D_t, L_t, q_t)$. The sequences of the demand and VLT are $D_t$ and $L_t$, respectively, and
the order vector is $q_t$. For simplicity, we denote the order vectors of the CAOD, $\hat{q}_t(\{\tilde{o}_j^*\}_{j=t}^{t+T}, \{\tilde{w}_j^*\}_{j=t}^{t+T})$, as $\tilde{q}_t$,
which contains the order decisions following the E2E policy from Day $t$ till Day $t+T$. Thus, the cost of the
CAOD, $\tilde{q}_t$, is $E[\hat{C}(D_t, L_t, \tilde{q}_t) \mid X_t]$.
When analyzing the cost in each cover period, the CAOD can be represented by two decisions: (õt∗ , w̃t∗ ),
which are the order time and the order quantity for the first cover period, respectively. We partition the cost
for one order into two parts. The first part is the cost during the cover period, which includes the holding
cost, fixed ordering cost, and possible lost sales cost on the last day of that cover period. The other part is the
cost during the VLT, which includes the lost sales cost and extra holding cost during the VLT. Subsequently,
we divide the total cost by the length of one cover period to obtain the daily average cost for one order.


Organization. In the rest of Section 5, we first discuss the analysis of the order timing, $\tilde{o}_t^*$, in Section 5.1,
where we present the design of an equivalent dynamic (r, Q) policy for the CAOD. Next, in Sections 5.2 and 5.3,
we analyze the expected daily average inventory cost of the CAOD when the product is fast-moving and slow-moving,
respectively. These results depend on the demand and VLT distributions conditioned
on the given contextual information. In Section 5.2.1, we provide a sketch of the proof of the upper bound for the
expected daily average cost presented in Section 5.2. In Section 5.4, we consider the case when the contextual
information has strong predictive power for the uncertainty and provide the corresponding upper bounds on the
expected daily average inventory cost of the CAOD. In Section 5.5, we consider a similar setting that requires a
minimum order quantity in each order instead of paying a fixed ordering cost. Since the fixed ordering cost is the key to
controlling the order frequency in our framework, and its value is difficult to estimate in practice, we analyze
the relationship between the fixed ordering cost and the minimum order quantity. In Section 5.6, we
discuss how to derive tighter upper bounds for the cost of our E2E framework when Assumption 1.E is relaxed.
Finally, we provide some lower bounds for the cost of the optimal policy in some special cases.

5.1 Order Timing

In this section, we present the analysis of the equivalent conditions under which we place an order on Day
$t$ under the CAOD. Let the inventory level on Day $t$ be $I_t$, and let the VLT vector be $l_t = \{l_j\}_{j=t}^{T+t}$. We define
function $\gamma(D_t, I_t)$ as $\gamma(D_t, I_t) = \min_i \{i : \sum_{j=0}^{i} D_{t+j} \ge I_t\}$. For simplicity, we denote $E[l_t \mid X_t]$ as $\bar{l}_t$. $\gamma(D_t, I_t)$ is
the number of days, counted from the current day, until the inventory level reaches zero. Lemma 2 shows that we place an
order if and only if $\tilde{o}_t^*$ is zero.

Lemma 2. Given demand $D_t$, $\tilde{o}_t^* = 0 \iff E[\gamma(D_t, I_t) \mid X_t] \le \bar{l}_t$.

We define $\tilde{s}_t^* := \arg\min_{I_t} \{E[\gamma(D_t, I_t) \mid X_t] \ge \bar{l}_t\}$. Thus, when the inventory level reaches below $\tilde{s}_t^*$, we place
an order on Day t. Therefore, s̃t∗ and w̃t∗ can be considered as the re-order point and the re-order quantity
on Day t, respectively. Thus, the CAOD are equivalent to a dynamic (r, Q) policy with parameters (s̃t∗ , w̃t∗ ).
Note that it is difficult to determine how r and Q depend on the contextual information in the dynamic (r, Q)
policy using data-driven methods.
When $I_t = \tilde{s}_t$, the length of cover period $T$ can be written as

$$T(D_t, L_t, I_t, (\tilde{s}_t^*, \tilde{w}_t^*)) = \gamma(D_{t+T_s}, \tilde{w}_t^*)$$

where $T_s$ is the starting time of the cover period and $T_s := \max\{L_t, \gamma(D_t, I_t)\}$. By Lemma 2, we can define
the CAOD in terms of $\tilde{s}_t$ and $\tilde{w}_t^*$. Particularly, we define $C_T(D_t, L_t, I_t, (\tilde{s}_t^*, \tilde{w}_t^*))$ as the daily average cost (total
cost divided by the length of the cover period) within the first cover period after time $t$, given reorder point
$\tilde{s}_t^*$, order quantity $\tilde{w}_t^*$, and inventory level $I_t$. Mathematically, $C_T(D_t, L_t, I_t, (\tilde{s}_t^*, \tilde{w}_t^*))$ can be written as

$$C_T(D_t, L_t, I_t, (\tilde{s}_t^*, \tilde{w}_t^*)) = \frac{K + C_1 + C_2 + C_3 + C_4}{T(D_t, L_t, I_t, (\tilde{s}_t^*, \tilde{w}_t^*))}$$

where $C_1 := h \sum_{j=T_s}^{T_s + T(D_t, L_t, I_t, (\tilde{s}_t^*, \tilde{w}_t^*))} \big(\tilde{w}_t^* - \sum_{i=T_s}^{j} D_{t+i}\big)^+$ denotes the holding cost within the cover period, $C_2 := p\big(\sum_{i=T_s}^{T_s + T(D_t, L_t, I_t, (\tilde{s}_t^*, \tilde{w}_t^*))} D_{t+i} - \tilde{w}_t^*\big)^+$ denotes the lost sales cost within the cover period, and $C_3 := h\tilde{w}_t^*(\gamma(D_t, I_t) + 1 - L_t)^+$ denotes the holding cost within the VLT. $C_4$ denotes the lost sales cost within the VLT. In particular,
$C_4$ can be written as follows:

$$C_4 = \begin{cases} 0, & \text{if } L_t \le \gamma(D_t, I_t) \\ p \sum_{i=\gamma(D_t, I_t)+1}^{L_t} D_{t+i}, & \text{if } L_t > \gamma(D_t, I_t) \end{cases} \qquad (9)$$
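The cost components can be evaluated numerically as in the sketch below. This is a direct transcription under one reading of the index ranges in the formulas for $C_1$ to $C_4$ (inclusive summation limits, list index $i$ standing for the demand $i$ days after Day $t$), and it has not been checked against the paper's e-companion. The function names and all numeric values are hypothetical.

```python
def depletion_index(demands, level):
    """Smallest i such that demands[0] + ... + demands[i] >= level."""
    cumulative = 0
    for i, demand in enumerate(demands):
        cumulative += demand
        if cumulative >= level:
            return i
    raise ValueError("demand sequence too short to deplete the given level")


def positive_part(x):
    return max(x, 0)


def daily_average_cost(demands, vlt, inventory, quantity, K, h, p):
    """Daily average cost of the first cover period: (K + C1 + C2 + C3 + C4) / T."""
    g = depletion_index(demands, inventory)         # gamma(D_t, I_t)
    start = max(vlt, g)                              # T_s
    T = depletion_index(demands[start:], quantity)   # cover-period length
    if T == 0:
        raise ValueError("cover period has zero length under this convention")
    c1 = h * sum(positive_part(quantity - sum(demands[start:j + 1]))
                 for j in range(start, start + T + 1))
    c2 = p * positive_part(sum(demands[start:start + T + 1]) - quantity)
    c3 = h * quantity * positive_part(g + 1 - vlt)
    c4 = p * sum(demands[g + 1:vlt + 1]) if vlt > g else 0
    return (K + c1 + c2 + c3 + c4) / T


# Hypothetical instance: unit demand, three units on hand, order of four units.
early = daily_average_cost([1] * 10, vlt=2, inventory=3, quantity=4, K=10, h=1, p=5)
late = daily_average_cost([1] * 10, vlt=4, inventory=3, quantity=4, K=10, h=1, p=5)
```

In the first call the order arrives before the stock runs out, so the cost includes extra holding cost during the VLT and no lost sales (a total of 20 over a cover period of 3). In the second call the order arrives late, so two days of demand are lost during the VLT (a total of 26 over a cover period of 3).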

We notice that the lost sales cost within the cover period, $C_2$, can only occur on the last day of the cover
period. In the following sections, without loss of generality, we assume that the current inventory position is
no lower than $\tilde{s}_t^*$. This is true because following the CAOD, when the inventory position at the end of the
day is less than $\tilde{s}_t^*$, we place an order to raise the inventory position. Therefore, for simplicity, we ignore
the input, $I_t$, and rewrite the daily average cost function as $C_T(D_t, L_t, (\tilde{s}_t^*, \tilde{w}_t^*))$. Subsequently, if we obtain
the upper bound for $E[C_T(D_t, L_t, (\tilde{s}_t^*, \tilde{w}_t^*)) \mid X_t]$ for each cover period, it is also the upper bound for the
average cost of the CAOD, $E[\hat{C}(D_t, L_t, \tilde{q}_t) \mid X_t]$. In Sections 5.2 and 5.3, we present the procedure to bound $\tilde{s}_t^*$
and $\tilde{w}_t^*$ under certain additional assumptions.
To effectively obtain performance bounds for the CAOD output, we consider fast-moving and slow-moving
products separately in Sections 5.2 and 5.3. As the name suggests, fast-moving products are items that
are sold quickly, and slow-moving items are products that have a low turnover rate and stay longer in the
warehouse. It has been a tradition in the retail industry to design separate inventory management algorithms for the two types.
In the remainder of this section, we provide different upper bounds on the expected daily average inventory
cost of the output CAOD for fast- and slow-moving items, respectively.

5.2 Case 1: Fast-moving Items

For fast-moving items that sell quickly, we further assume that the daily demand is positive.

Assumption 2. The demand on each day is positive, i.e., $D_i \ge 1$, $\forall\, i = 0, \ldots, T$.

Note that without loss of generality, we assume that the demand in every time period is at least one.
Thus, we proceed and bound the expected daily average cost of CAOD in the following theorem.
Theorem 1. Suppose Assumptions 1 and 2 hold, and we set $\tau = \sqrt{\frac{2K}{h}} + v_u$. Then, the expected daily
average cost of CAOD is at most:
$$
\begin{aligned}
E[\hat{C}(D_t, L_t, \tilde{q}_t) \mid X_t] \le{} & \sqrt{\frac{Kh}{2}}\,(\beta^2 - \alpha^2 + \sigma^2 + 2\sqrt{\tau}\sigma_s + \alpha)(\beta - \alpha + 2\sqrt{\tau}\sigma + 1) \\
& + \frac{\big[K\sqrt{h}(\beta + \sqrt{\tau}\sigma) + p\sqrt{h}(\beta^2 + \sigma^2 + 2\sqrt{\tau}\sigma_s - \alpha^2)\big]\big[\sqrt{2\beta - 2\alpha + 4\sqrt{\tau}\sigma} + \frac{1}{\alpha} + \frac{1}{\alpha}\sqrt{\tau}\sigma\big]}{\sqrt{2K}} \\
& + h\big[\bar{l}_t(\beta + \sqrt{v_u}\sigma)(\beta - \alpha + 2\sqrt{\tau}\sigma + 1) - v_l\beta\big] + (\beta^2 + \sigma^2 + \sqrt{\tau}\sigma_s)v_u p - \frac{\alpha \bar{l}_t p \beta}{1 + \sqrt{v_u}\sigma},
\end{aligned}
$$

where $\alpha$, $\beta$, $\sigma$, and $\sigma_s$ are parameters from Assumption 1.

Remark 3. Solving for the optimal policy or evaluating the minimum cost of the optimal policy is
intractable in view of the uncertainty of both the correlated demand and VLT. Although it is difficult to
directly compare our upper bound with the cost of the optimal policy, we can show that our upper bound in Theorem
1 is close to it under certain asymptotic conditions. Particularly, under certain special conditions,
for example, when the demand has zero variance and when the VLT is known, the result presented in
Theorem 1 is tight. To observe this, note that when the demand variance $\sigma$ tends to zero, $\sigma_s$ also tends to zero. The first
two terms on the right-hand side of Theorem 1 comprise the maximum daily average cost of the EOQ model,
given that the daily demand is $\beta$. The other three terms tend to zero when the gap between the upper and lower bounds of the
expectation of the demand, $\beta - \alpha$, tends to zero.

5.2.1 Sketch of Proof for Theorem 1

In this section, we provide the sketch of proof for Theorem 1. The complete proof is in E-companion
C. The derivation of the bound in Theorem 1 has three steps. (1) First, we show that given the CAOD
(õt∗ , w̃t∗ ), the daily average cost has an upper bound. This upper bound is drawn from the assumptions of the
maximum variance of the demand and the boundedness of the VLT. (2) Second, from the assumptions of the
expectation and variance of the demand, we bound the values of the order quantity and the order timing in
the CAOD. (3) Third, we establish the upper bound for the cost by considering the worst cases of the CAOD.
Next, we mainly discuss the first step, where we derive an upper bound for the cost from Assumptions 1
and 2. To start, we consider cases when the demand is deterministic and bounded within a certain interval
and the VLT is zero.

Lemma 3. For a lot-sizing problem with no lost sales, for each period, let $d_t \in [d^l, d^u]$ be the demand and
let $K$ and $h$ be the fixed ordering cost and the unit holding cost, respectively. Let $d^l > 0$ and the length of the
planning horizon be sufficiently long. Then, the length of the cover period of one order, $\tau_0$, should satisfy

$$\left\lfloor \sqrt{\frac{2K}{(2d^u - d^l)h}} \right\rfloor \le \tau_0 \le \left\lceil \frac{1}{d^l}\sqrt{\frac{2Kd^u}{h}} \right\rceil \qquad (10)$$
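The two bounds in Inequality (10) are straightforward to evaluate. The sketch below uses a hypothetical function name and illustrative parameter values.

```python
import math


def cover_period_bounds(K, h, d_low, d_high):
    """Return the (lower, upper) bounds of Inequality (10) on the cover period.

    K is the fixed ordering cost, h the unit holding cost, and
    [d_low, d_high] the interval containing the daily demand.
    """
    if not 0 < d_low <= d_high:
        raise ValueError("require 0 < d_low <= d_high")
    lower = math.floor(math.sqrt(2 * K / ((2 * d_high - d_low) * h)))
    upper = math.ceil(math.sqrt(2 * K * d_high / h) / d_low)
    return lower, upper


# Constant demand: both bounds collapse to the classical EOQ cycle length.
constant = cover_period_bounds(K=100, h=1, d_low=2, d_high=2)
# Demand varying between 1 and 3 units per day.
varying = cover_period_bounds(K=50, h=1, d_low=1, d_high=3)
```

When the demand is constant at $d$, both expressions reduce to $\sqrt{2K/(dh)}$, the EOQ cycle length, which is a useful sanity check on the formula. A wider demand interval loosens the bounds.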

Lemma 3 states that if the demand on each day is within a certain interval, then the length of one cover
period can be bounded, which is critical for deriving the
bounds. In Lemma 3, the length of a cover period is required to be an integer. However, in the remaining
study, we neglect this requirement and assume $\sqrt{\frac{2K}{(2d^u - d^l)h}} \le \tau_0 \le \frac{1}{d^l}\sqrt{\frac{2Kd^u}{h}}$, which is consistent with the
bounds in a continuous time setting, as is shown in the proof of Lemma 3. As this approximation reduces
the length of a cover period by at most two days, it does not affect the major results of the bounds for the
cost.
Subsequently, let the VLT be nonzero and let Assumption 1.D hold. Given the order quantity and the
reorder point, we can establish the upper bounds in Proposition 2 for the daily average cost if the demand
is positive and bounded within a certain interval almost surely.

Proposition 2. Let Assumption 1 hold and let the demand on each day be within the interval $[d^l, d^u]$,
$d^l > 0$. Then, given the value of the reorder point, $s_t$, and the order quantity, $w_t$, the daily average cost of
the CAOD for one cover period is at most

$$C_T(d_t, l_t, s_t, w_t) \le \frac{w_t d^u h}{2d^l} + \frac{d^u(K + p(d^u - d^l))}{w_t} + d^u h\Big(\frac{s_t}{d^l} - v_l\Big) + \frac{d^u(d^u v_u - s_t)p}{w_t} \qquad (11)$$

The upper bound in Proposition 2 can be interpreted as comprising four parts. The first part, $\frac{w_t d^u h}{2d^l}$, is
the maximum daily average holding cost. The second part, $\frac{d^u(K + p(d^u - d^l))}{w_t}$, is the maximum daily average lost
sales cost and the fixed ordering cost within one cover period. The third part, $d^u h(\frac{s_t}{d^l} - v_l)$, is the maximal
holding cost during the VLT. The fourth part, $\frac{d^u(d^u v_u - s_t)p}{w_t}$, is the maximal lost sales cost during the VLT.
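The four parts of the bound can be evaluated term by term, as in the sketch below. It assumes the reading of Inequality (11) in which both the second and the fourth terms are divided by the order quantity; the function name and the numeric values are hypothetical.

```python
def proposition2_bound(w, s, d_low, d_high, K, h, p, v_low, v_high):
    """Evaluate the four parts of the upper bound in Inequality (11)."""
    holding_cover = w * d_high * h / (2 * d_low)              # first part
    order_and_lost = d_high * (K + p * (d_high - d_low)) / w  # second part
    holding_vlt = d_high * h * (s / d_low - v_low)            # third part
    lost_vlt = d_high * (d_high * v_high - s) * p / w         # fourth part
    return holding_cover + order_and_lost + holding_vlt + lost_vlt


# Hypothetical instance (illustrative values only).
bound = proposition2_bound(w=10, s=4, d_low=1, d_high=2,
                           K=20, h=1, p=5, v_low=1, v_high=3)
```

For these values the four parts are 10, 5, 6, and 2, giving a bound of 23 per day. Increasing the order quantity raises the first part and lowers the second and fourth parts, which is the usual EOQ-style trade-off.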
The expression of Proposition 2 leads to an approach for deriving the upper bound for the stochastic
version. The idea of the proof is as follows. We define two new random variables $D^u = \max_{i \in T_J}\{D_t[i]\}$
and $D^l = \min_{i \in T_J}\{D_t[i]\}$, for some time interval $T_J$. We can then rewrite the right-hand side
(RHS) of Proposition 2 in terms of $D^l$ and $D^u$, which is $\rho(D^u, D^u D^l, \frac{D^u}{D^l}, (D^u)^2)$, where $\rho$ is a linear function
that is consistent with the RHS of Proposition 2. Therefore, $E[C_T(D_t, l_t, \tilde{s}_t, w_t)] \le E[\rho(D^u, D^u D^l, \frac{D^u}{D^l}, (D^u)^2)]$
$= \rho(E[D^u], E[D^u D^l], E[\frac{D^u}{D^l}], E[(D^u)^2])$. Thus, to derive the performance bound for CAOD, we consider the
expectation on both sides of Proposition 2, and it suffices to bound $E[D^u]$, $E[D^u D^l]$, $E[\frac{D^u}{D^l}]$, and $E[(D^u)^2]$,
respectively, from the variance and expectation of the demand. When Assumption 2 holds, $D^l \ge 1$, and thus,
we can complete the first step for proving Theorem 1.

5.3 Case 2: Slow-moving Items

In this section, we focus on slow-moving items, for which zero demand is common. In order to control
the length of the cover period, we restrict the probability of demand remaining zero for an extremely long
duration, which is described in Assumption 3.

Assumption 3. For each time period $i$, if the demand from the previous period, $D_{i-1}$, is zero, the probability of
the demand at time $i$, $D_i$, being zero is bounded by a positive constant $\varepsilon$, i.e., $\exists\, 0 \le \varepsilon < 1$, s.t. $P(D_i = 0 \mid D_{i-1} = 0, X) \le \varepsilon$, $\forall X \in \mathcal{X}$, $\forall i = 1, \ldots, T$.


This assumption requires that the probability of demand being zero is less than a certain positive value ε,
conditional on the event that the demand on the previous day is zero. This assumption is intended to restrict
the probability of an infinitely long interval between two non-zero demands occurring, which is also rare
in a real-world setting. If for a specific SKU, the successive days of zero demand are too long, the online
retailers will remove this SKU from its category.
We denote the number of successive days with zero demand between two days with a positive demand as
$\bar{t}_{\sigma,\varepsilon}$. Specifically, $E[\bar{t}_{\sigma,\varepsilon}] := E[\gamma(D_t, I_t = 1)] - 1$. Then, Lemma 4 yields the upper bound for $E[\bar{t}_{\sigma,\varepsilon}]$.

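Lemma 4. Under Assumptions 1 and 3, the expectation of the number of successive days with zero demand
can be upper bounded by $\bar{\gamma}_{\sigma,\varepsilon}$, i.e., $E[\bar{t}_{\sigma,\varepsilon}] \le \bar{\gamma}_{\sigma,\varepsilon}$, where $\bar{\gamma}_{\sigma,\varepsilon} := \frac{\sigma^2}{\alpha^2(1-\varepsilon)}$.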

We further denote $\min_{i=0,\ldots,T}\{P(D_{t+i} \neq 0 \mid X_t)\}$ as $\varepsilon^+$. Because the expectation of the demand for each day is positive,
$\varepsilon^+ > 0$. Thus, the expected daily average cost of the CAOD can be upper bounded in Theorem 2.

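Theorem 2. Suppose Assumptions 1 and 3 hold. Then, the daily average cost of the CAOD for one cover
period is at most

$$
\begin{aligned}
E[\hat{C}(D_t, L_t, \tilde{q}_t) \mid X_t] \le{} & \frac{\tilde{w}_{t,2}\, h(\beta^+ - \alpha + 2\sqrt{\tau}\sigma + 1)}{2} + \frac{K(\beta + \sqrt{\tau}\sigma) + p(\beta^2 + \sigma^2 + 2\sqrt{\tau}\sigma_s - \alpha^2)}{\tilde{w}_{t,1}} \\
& + h\big[\tilde{s}_{t,2}(\beta^+ - \alpha + 2\sqrt{\tau}\sigma + 1) - v_l\beta\big] + (\beta^2 + \sigma^2 + \sqrt{\tau}\sigma_s)v_u p - \tilde{s}_{t,1} p\beta + \bar{\gamma}_{\sigma,\varepsilon} h(\tilde{w}_{t,2}^2 + \tilde{s}_{t,2}^2),
\end{aligned}
$$

where the expressions of $\bar{\gamma}_{\sigma,\varepsilon}$, $\tau$, $\beta^+$, $\tilde{s}_{t,j}$, and $\tilde{w}_{t,j}$ for $j = 1, 2$ are listed below and $\alpha$, $\beta$, $\sigma$, and $\sigma_s$ are coefficients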
from Assumption 1. q
σ2 2K
γ̄σ,ε = α2 (1−ε) , τ= h
+ vu + Kh , β+ = β
2 ,
max{1− σ2 ,ε+ }
¯ √ α
s̃t,1 = √ αlt , s̃t,2 = l¯t (β + vu σ),
 vu σ+γ̄σ,ε α
1+
p √ q
w̃t,1 = − 2β + 3 τσ − α + √ √
2
1 1√
2K
h
,
β−α+2 τσ+ α + α τσ
 2 2 √ √ q
w̃t,2 = β +σ +σ2 τσs − α2 + α 2K
h
.
max{1− ,ε+ }
α2

Remark 4. Similar to the case of fast-moving products, the upper bound on the daily average cost is tight
under the following special asymptotic conditions: when the demand has zero variance, when the gap between the upper
and lower bounds of the VLT is zero, and when the demand shares the same expected value for each day. When
the above three conditions are satisfied, the bound reduces to the minimum EOQ cost with the demand
being $\beta$. The bound in Theorem 2 has four parts. The first part, $\frac{\tilde{w}_{t,2}\, h(\beta^+ - \alpha + 2\sqrt{\tau}\sigma + 1)}{2}$ plus $\bar{\gamma}_{\sigma,\varepsilon} h \tilde{w}_{t,2}^2$, is the upper
bound for the daily average holding cost within the cover period. The second part, $\frac{K(\beta + \sqrt{\tau}\sigma) + p(\beta^2 + \sigma^2 + 2\sqrt{\tau}\sigma_s - \alpha^2)}{\tilde{w}_{t,1}}$,
is the upper bound of the daily average fixed ordering cost and lost sales cost within the cover period. The third
part, $h\big[\tilde{s}_{t,2}(\beta^+ - \alpha + 2\sqrt{\tau}\sigma + 1) - v_l\beta\big]$ plus $\bar{\gamma}_{\sigma,\varepsilon} h \tilde{s}_{t,2}^2$, is the upper bound for the daily average holding cost
within the VLT. The fourth part, $(\beta^2 + \sigma^2 + \sqrt{\tau}\sigma_s)v_u p - \tilde{s}_{t,1} p\beta$, is the upper bound for the lost sales cost within
the VLT.

Electronic copy available at: https://ssrn.com/abstract=3888897


Mo, Meng and Max: E2E Deep Learning for Inventory Management with Fixed Ordering Cost

Finally, combining the convergence of the deep learning model (Proposition 1) with the upper bounds
on the daily average inventory cost of the CAOD (Theorems 1 and 2) yields the following corollary, which states an
asymptotic performance guarantee on the daily average inventory cost of the E2E model outputs.

COROLLARY 1. For simplicity, we let $U(D_t, L_t, \tilde{q}_t, X_t)$ denote the upper bound from either Theorem 1 or
2. In each case, suppose the assumptions associated with the corresponding theorem are satisfied. Suppose
there exists $\nu_0 > 0$ such that $P(X = x) \ge \nu_0 > 0$ for any $x \in \mathcal{X}^+$. When the neural network is well-specified,
there exists a positive number $S^{\varepsilon}_{\nu_0,\theta}$. When the sample size is larger than $S^{\varepsilon}_{\nu_0,\theta}$, we have, for any $\varepsilon > 0$, with
probability $\theta \in (0, 1)$, for any feature $X_t \in \mathcal{X}$,
\[
C\Big(D_t, L_t, \hat{q}_t\big(\{\bar{o}^*_j\}_{j=0}^{T}, \{\bar{w}^*_j\}_{j=0}^{T}\big)\Big) - U(D_t, L_t, \tilde{q}_t, X_t) \le \varepsilon.
\]

Proof of Corollary 1. Combining Proposition 1 with Theorem 1 or 2 immediately yields the result. □
Corollary 1 provides the asymptotic bound for the cost of the output policy when the sample size is
sufficiently large.

5.4 Upper Bounds with Strongly Predictive Features

Theorems 1 and 2 provide upper bounds for the expected daily averaged cost of CAOD conditioned on
the value of Xt . The performance of CAOD can be further guaranteed when features have strong predictive
values of the underlying stochastic demand. We state the detailed conditions of strongly predictive features
in the following assumption.

ASSUMPTION 4 (Strongly predictive features). Suppose that there exists a finite constant $\Gamma \in [0, 1]$ such
that the conditional variance of demand satisfies $\mathrm{Var}(D_i \mid X) \le \Gamma\,\mathrm{Var}(D_i)$ and $\mathrm{Var}(D_i^2 \mid X) \le \Gamma^{1.5}\,\mathrm{Var}(D_i)$,
$\forall i = 0, \dots, T$, $\forall X \in \mathcal{X}$.

Assumption 4 requires that, for every feature $X$, the conditional variance of demand $\mathrm{Var}(D_i \mid X)$ given this feature
is no more than a constant $\Gamma$ times the total variance of demand. By the law of total variance, $\mathrm{Var}(D_i)$
can be decomposed into two parts: $\mathrm{Var}(D_i) = E[\mathrm{Var}(D_i \mid X)] + \mathrm{Var}(E[D_i \mid X])$. Therefore, when
$\Gamma$ is smaller, the first part $E[\mathrm{Var}(D_i \mid X)]$ gets smaller, and the variance of demand $\mathrm{Var}(D_i)$ is dominated by
the second part $\mathrm{Var}(E[D_i \mid X])$, which is the variance of the average demand given each feature.
Thus, a smaller $\Gamma$ means that more of the variance of demand is explained by the features,
and knowing the features reduces the variance of demand.
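The decomposition used above can be checked on a small example. The following is a minimal sketch with a made-up two-valued feature and made-up demand values (none of these numbers come from the paper); it computes both parts of the law of total variance exactly and the smallest Γ that satisfies the first inequality of Assumption 4 only.

```python
from fractions import Fraction

# Hypothetical joint distribution (illustrative numbers, not from the paper):
# feature value -> (P(X = x), {demand d: P(D = d | X = x)})
joint = {
    "weekday": (Fraction(1, 2), {2: Fraction(1, 2), 4: Fraction(1, 2)}),
    "weekend": (Fraction(1, 2), {8: Fraction(1, 2), 10: Fraction(1, 2)}),
}

def mean(dist):
    return sum(value * prob for value, prob in dist.items())

def variance(dist):
    m = mean(dist)
    return sum(prob * (value - m) ** 2 for value, prob in dist.items())

def merge(weighted_values):
    """Build a distribution from (value, probability) pairs, summing duplicates."""
    dist = {}
    for value, prob in weighted_values:
        dist[value] = dist.get(value, 0) + prob
    return dist

# Marginal distribution of demand and distribution of the conditional means.
marginal = merge((d, p_x * p) for p_x, cond in joint.values() for d, p in cond.items())
cond_means = merge((mean(cond), p_x) for p_x, cond in joint.values())

total_var = variance(marginal)                                      # Var(D)
within = sum(p_x * variance(cond) for p_x, cond in joint.values())  # E[Var(D | X)]
between = variance(cond_means)                                      # Var(E[D | X])

# Smallest Gamma with Var(D | X = x) <= Gamma * Var(D) for every x.
gamma = max(variance(cond) for _, cond in joint.values()) / total_var

print(total_var, within, between, gamma)
```

In this toy case most of the demand variance comes from the difference between the two conditional means, so the feature is strongly predictive and the resulting Γ is small.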
Intuitively, a group of features is strongly predictive when the features are effective. In practice, effective
features are usually designed through feature engineering during preprocessing. The results in this section
state that our proposed E2E model leads to a small out-of-sample inventory
cost if the features are strongly predictive.

PROPOSITION 3. Suppose Assumptions 1 and 4 hold, and let one of Assumptions 2 and 3 hold. We denote
$\max_i\{\mathrm{Var}(D_i)\}$ by $\sigma_D$. Then, the daily average cost of the output of CAOD is no more than $o(\Gamma^{1.5}\sigma_D^3)$.

Proposition 3 demonstrates that, although the averaged optimal decisions may deviate from the optimal
decisions, when the input feature $X_t$ is smartly designed, CAOD can achieve a small daily average cost. In
particular, when the feature $X_t$ satisfies the strongly predictive conditions in Assumption 4 with parameter
$\Gamma$, the daily average cost of CAOD is upper bounded by $o(\Gamma^{1.5})$. In fact, Proposition 3 shows that when
$\Gamma$ goes to zero, i.e., when the actual demand $D_t$ is a function of the smartly designed feature $X_t$,
the daily average cost of the output of CAOD tends to zero.

5.5 Relations between fixed ordering cost and minimum order quantity

Compared to E2E-NFC with a regular replenishment schedule, our E2E model dynamically decides the replenishment
timing. Therefore, it is important to investigate the order frequency and how it depends on the fixed ordering cost
in the E2E framework.
In the proposed E2E framework, the order frequency is controlled by the value of the fixed ordering
cost. When the fixed ordering cost is zero, the retailer tends to replenish frequently, e.g., daily,
to better satisfy customers' demand with a minimum inventory level. Thus, the existence of
the fixed ordering cost motivates the retailer to aggregate demands from multiple days, which reduces the order
frequency and thereby the ordering cost.
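The qualitative effect of the fixed ordering cost on order frequency can be illustrated with the classic EOQ model, in which the time between orders is √(2K/(hd)) for a constant daily demand d. This is only a sketch under the EOQ simplification (deterministic, constant demand, no lead time), not the E2E model itself; the values of h, d, and K below are hypothetical.

```python
import math

def eoq_cycle_days(K, h, d):
    """Days between orders in the classic EOQ model with constant daily demand d."""
    return math.sqrt(2 * K / (h * d))

h, d = 1.0, 4.0  # hypothetical unit holding cost per day and constant daily demand
for K in (2.0, 50.0, 200.0):
    cycle = eoq_cycle_days(K, h, d)
    # A larger fixed ordering cost K lengthens the cycle and raises the order quantity.
    print(f"K={K:6.1f}  days between orders={cycle:5.2f}  order quantity={d * cycle:6.2f}")
```

With a small K the cycle collapses toward daily replenishment, while a larger K pushes the retailer to batch several days of demand into one order.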
In practice, fixed ordering cost often consists of transportation cost, production set-up cost, and any other
expense that may occur in the order placing and fulfillment process. Due to its complex composition, it
might be challenging to estimate the exact value of the fixed ordering cost. Under such cases, retailers may
want to set a minimum order quantity to directly control the order frequency. Therefore, it would be useful
to study the correspondence between fixed order cost and the minimum order quantity.
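PROPOSITION 4. Set $\tau = \sqrt{\frac{2K}{h}} + v_u + Kh$. For the fast-moving items, given the fixed ordering cost as $K$, the
output order quantity of CAOD for one order can be bounded as follows.
\[
\frac{1}{\sqrt{2\beta - 2\alpha + 4\sqrt{\tau}\sigma + \frac{1}{\alpha} + \frac{1}{\alpha}\sqrt{\tau}\sigma}}\sqrt{\frac{2K}{h}} \le \tilde{w}_t^* \le \left(\beta^2 - \alpha^2 + \sigma^2 + 2\sqrt{\tau}\sigma_s + \alpha\right)\sqrt{\frac{2K}{h}} \qquad (12)
\]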

For the slow-moving items, given the fixed ordering cost as K, the order quantity for one order can be
bounded as follows.

 q √ 2 r 2K
− 2β + 3 τσ − α + q √ √ ≤ w̃t∗ (13)
1
β − α + 2 τσ + α + α τσ1 h
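\[
\tilde{w}_t^* \le \left(\frac{\beta^2 + \sigma^2 + \sqrt{\tau}\sigma_s}{\max\{1 - \frac{\sigma^2}{\alpha^2}, \varepsilon_+\}} - \alpha^2 + \alpha\right)\sqrt{\frac{2K}{h}} \qquad (14)
\]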

Proposition 4 provides lower and upper bounds on the order quantity of CAOD. In practice,
suppose that the minimum order quantity is required to be larger than some value $w$; then we can solve
Equation (12) or (13) to obtain the corresponding value of the fixed ordering cost.
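The bounds in Proposition 4 have the form c·√(2K/h), so a required minimum order quantity w can be translated into a fixed ordering cost by inverting that relation. The sketch below treats the coefficient c as a known constant, which is a simplifying assumption: in the paper the coefficient depends on τ and therefore on K itself, so an exact solution would need a numerical root search. The function name and the numbers are hypothetical.

```python
import math

def fixed_cost_for_min_quantity(w, h, c):
    """Solve c * sqrt(2K / h) = w for K, treating the coefficient c as fixed."""
    return h * (w / c) ** 2 / 2

w, h, c = 30.0, 1.0, 3.0  # hypothetical minimum order quantity, holding cost, coefficient
K = fixed_cost_for_min_quantity(w, h, c)

# Plugging K back into the bound recovers the required minimum order quantity.
recovered = c * math.sqrt(2 * K / h)
print(K, recovered)
```

Because the order quantity grows with the square root of K, doubling the required minimum order quantity corresponds to a fourfold increase in the implied fixed ordering cost under this simplification.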

5.6 Extension to the cases with general unit penalty cost



In Assumption 1.E, we assume the unit lost sales cost satisfies $p \ge 2Kh$. This assumption is a necessary
condition for all the demands to be satisfied in the labeling process. Accordingly, in Sections
5.2 through 5.5, the analysis is performed for the case where $p \ge 2Kh$. However, in reality, Assumption
1.E may not be satisfied, and some demand may be lost in the labeling process. Relaxing Assumption 1.E
means that we do not require all the demands to be satisfied during the labeling process. We emphasize that
the analysis in Sections 5.2 through 5.5 is still valid when some demand may be lost during the labeling
process, because when demand is not required to be satisfied in the labeling process, we obtain better
labels and thus better decisions. In this section, we relax Assumption 1.E and provide smaller upper
bounds for the daily average cost of the output of CAOD when no assumption is made on the unit lost sales cost
$p$.
We start by analyzing the bounds for the length of the cover period, which are stated in Proposition 5.

PROPOSITION 5. Consider the lot-sizing problem with possible lost sales ((P2) in the E-companion). Let $d_t \in
[d^l, d^u]$ be the demand, and let $K$ and $h$ be the fixed ordering cost and the unit holding cost, respectively. Let
$d^l > 0$ and the length of the planning horizon be sufficiently long. Then, the length of the cover period of
one order, $\tau_0$, should satisfy
\[
\left\lfloor \sqrt{\frac{2K}{(2d^u - d^l)h}} \right\rfloor \le \tau_0 \le \min\left\{ \left\lceil \frac{1}{d^l}\sqrt{\frac{2Kd^u}{h}} \right\rceil, \frac{p}{h} \right\} \qquad (15)
\]

Proposition 5 implies that, compared to the case with $p \ge 2Kh$, the cover period is cut off at the value
$\frac{p}{h}$. Intuitively, when the penalty cost is small, the retailer would sacrifice some demand at the end of one
possible cover period to save the holding cost. When analyzing the upper bounds of the order quantity in
Proposition 4, if we replace the upper bound of the scheduled cover period, $\left\lceil \frac{1}{d^l}\sqrt{\frac{2Kd^u}{h}} \right\rceil$, with the right-hand
side of (15), we obtain new upper bounds for the order quantity of CAOD with a general unit penalty cost.
Then, by replacing the upper bounds of the order quantity with the new ones, we obtain new upper bounds
for the expected daily average cost that are smaller than those in Theorems 1 and 2. These upper bounds incorporate
the possible lost sales during the labeling process and are provided in E-companion F.
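To make the bounds in (15) concrete, the following sketch evaluates both sides for given parameters. The function name and the demand range are hypothetical; K, h, and p are set to the values used later in the basic experimental setting (50, 1, and 100).

```python
import math

def cover_period_bounds(K, h, p, d_low, d_high):
    """Lower and upper bounds on the cover period length from Equation (15)."""
    lower = math.floor(math.sqrt(2 * K / ((2 * d_high - d_low) * h)))
    scheduled = math.ceil(math.sqrt(2 * K * d_high / h) / d_low)
    upper = min(scheduled, p / h)  # the penalty cost caps the cover period at p / h
    return lower, upper

# Hypothetical demand range [2, 5] with a large penalty cost: the cap p / h is inactive.
print(cover_period_bounds(K=50, h=1, p=100, d_low=2, d_high=5))

# Same setting with a small penalty cost: the cover period is cut off at p / h.
print(cover_period_bounds(K=50, h=1, p=8, d_low=2, d_high=5))
```

The second call shows the cut-off effect described above: once p/h falls below the scheduled cover period, the upper bound is determined by the penalty cost rather than by the fixed ordering cost.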

5.7 Lower bound for the cost of the optimal policy

Although in Sections 5.2 and 5.3 we have presented the upper bounds for the costs of the CAOD in special
cases, quantifying the exact difference between the cost of the CAOD and that of the optimal policy is difficult.
This is because the optimal policy, and hence the minimum cost, under correlated demand and stochastic
VLT remains an open question. Therefore, in this section, we again focus on special cases of this problem.
The optimal policy can be expressed as Equation (16).
\[
q_t^* = \arg\min_{q_t} E[\hat{C}(D_t, L_t, q_t) \mid X_t] \qquad (16)
\]

Here, we present two lower bounds for the daily average cost of the optimal policy under different assumptions.
These assumptions also incorporate the values of the mean and variance of the demand, which can
provide some insights when compared with the upper bounds presented in Sections 5.2 and 5.3.
First, let us assume that the mean and variance of the demand for each day are $\bar{\alpha}$ and $\bar{\sigma}^2$, respectively.
Because the daily average cost is convex with respect to the demand vector, by Jensen's inequality,
$E[\hat{C}(E[D_t], L_t, \tilde{q}_t) \mid X_t] = E[\hat{C}(\bar{\alpha}, L_t, \tilde{q}_t) \mid X_t]$ is a lower bound of $E[\hat{C}(D_t, L_t, \tilde{q}_t) \mid X_t]$. Theorem 3 further shows
that this bound is tight when the mean and variance of the demand on each day are fixed and the
VLT is zero.

THEOREM 3. Let the VLT be zero for all days and the mean and variance of the demand for each day be $\bar{\alpha}$
and $\bar{\sigma}^2$, respectively. In this case, there exists a distribution of the demand vector $D_t$ such that the daily average
cost of the optimal policy is $\sqrt{2\bar{\alpha}Kh}$.

Theorem 3 demonstrates that the lower bound given by the cost of the EOQ model with demand $\bar{\alpha}$ is tight
without further assumptions on the demand. However, if we assume the demand is symmetric and bounded, we can derive the
following lower bound:

CLAIM 1. Let the VLT be zero for all days and the mean and variance of the demand for each day be
denoted as $\bar{\alpha}$ and $\bar{\sigma}^2$, respectively. The marginal distribution of the demand on each day is symmetric and
lower bounded by zero. Assume that the unit lost sales cost, $p$, is larger than the unit holding cost, $h$. Then,
the expected daily average cost of the optimal policy is at least $\max\{\frac{\bar{\sigma}^2}{\bar{\alpha}}, \sqrt{2\bar{\alpha}Kh}\}$.

Claim 1 shows that the minimum daily cost increases at least in the order of $O(\bar{\sigma}^2)$. Furthermore, combined
with the results of Proposition 3, when $\Gamma$ is no more than $o(\bar{\sigma}^{-2/3})$, the daily average cost of the output
of CAOD is at most $o(\bar{\sigma}^2)$. This implies that under some strongly predictive conditions between demand and
features, the daily average cost of the output of CAOD is at most in the same order as the cost of the optimal
strategy without the feature information.
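The lower bound in Claim 1 is the larger of a variance-driven term and the EOQ cost, and which of the two dominates depends on how noisy the demand is. A minimal sketch, with hypothetical demand moments and the cost parameters K = 50 and h = 1 from the basic experimental setting:

```python
import math

def optimal_cost_lower_bound(mean_demand, var_demand, K, h):
    """Lower bound from Claim 1: max(variance / mean, sqrt(2 * mean * K * h))."""
    return max(var_demand / mean_demand, math.sqrt(2 * mean_demand * K * h))

# Low variance: the EOQ term sqrt(2 * mean * K * h) dominates.
low_noise = optimal_cost_lower_bound(mean_demand=4, var_demand=16, K=50, h=1)

# High variance: the term variance / mean dominates and grows linearly in the variance.
high_noise = optimal_cost_lower_bound(mean_demand=4, var_demand=100, K=50, h=1)

print(low_noise, high_noise)
```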

6 Numerical Experiments
In this part of the study, we use real-world data to test the performance of the proposed E2E model. We use
the same dataset mentioned in Section 4.1 of Qi et al. (2020b), obtained from the Food & Drinks category
of JD.com, one of the largest online retailing platforms in China. In particular, we randomly selected 2000
SKUs from the original dataset and trained the model using the data from July 1 to August 15, 2018.
We test the trained model using the demand from August 16 to August 30, 2018. The parameters for the
experimental test are as follows. The contextual data for each day are from the previous 28 days; that is, to be
consistent with the notation in Section 3.3, the value of $\tilde{T}$ is 28. For example, the contextual information
for the E2E model on July 1 is the labeled historical data from June 3 to June 30. In the basic setting, we
assume that the holding cost per SKU per day is 1, the fixed ordering cost per order is 50, and the unit lost
sales cost is 100. We adopt the ReLU function as the activation function and 50 as the number of hidden
layers, based on a few pilot trials. During the training process, we set the learning rate to 0.001. In the
loss function, the weight parameters are set as $\lambda_1 = \lambda_2 = 0.1$ and $\lambda_3 = \lambda_4 = 1$.
We use PyTorch to perform backpropagation for the neural network. We set the stopping
criterion as either the number of training epochs exceeding 500 or the total loss in the objective in Equation
(7) within one batch being less than 2.
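The date arithmetic behind the 28-day feature window and the train/test split can be reproduced with the standard library. This is a small sketch; the helper name `feature_window` is hypothetical, while the dates and the window length are those stated in the text.

```python
from datetime import date, timedelta

T_TILDE = 28  # length of the input feature window in days, as stated in the text

def feature_window(day, length=T_TILDE):
    """Return (first_day, last_day) of the `length` days immediately preceding `day`."""
    return day - timedelta(days=length), day - timedelta(days=1)

first, last = feature_window(date(2018, 7, 1))
print(first, last)

# Number of days in the training and test periods (both end points included).
train_days = (date(2018, 8, 15) - date(2018, 7, 1)).days + 1
test_days = (date(2018, 8, 30) - date(2018, 8, 16)).days + 1
print(train_days, test_days)
```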
We use three benchmark policies in the experiments. The first is the (r, Q) policy, for which we
calculate the empirical distributions of the demand and the VLT from the real data from June 1 to August
30 and assume that the demand and the VLT are stationary. Note that this is a stricter benchmark than the typical
(r, Q) policy because we use the demand from the test period to compute the reorder point and the order
quantity. Although this benchmark uses more information than the E2E model, the numerical experiments
show that it still has the worst performance among all compared policies. This is because the (r, Q) policy
makes the strong assumption that the demand is stationary, which is typically untrue in practice. The second
benchmark is the PTO framework, in which we use a deep neural network to predict the demand and
the VLT over the planning horizon (30 days). When testing the PTO framework, on each day,
we obtain point estimates of the demand and the VLT for the next 30 days and determine the
decisions (order timing and order quantity) for that day by solving a deterministic lot-sizing problem. The
third benchmark is the E2E framework proposed in Qi et al. (2020b) (E2E-NFC), where the order timing
is prespecified. We assume that orders can be placed only on Mondays and use the same neural
network as in our E2E model to predict the order quantity, which is slightly different from the network design in Qi
et al. (2020b). Because the E2E-NFC method does not consider the fixed cost in its model, when the review
periods are sufficiently short or the fixed ordering cost is sufficiently high, our E2E model always achieves
a lower total cost than the E2E-NFC method. To achieve a reasonable comparison between the E2E and
E2E-NFC methods, we conduct a sensitivity analysis of the fixed ordering cost. In addition, we report the optimal
(clairvoyant) policy, in which the realizations of the demand and the VLT during the test period are known and the
deterministic lot-sizing model is solved directly. We use the following metrics to compare the E2E method with the
different benchmarks; these metrics were also adopted in Qi et al. (2020b). The stockout rate is calculated
as the number of days with lost sales divided by the total number of days. The turnover rate is calculated as
the ratio of the average inventory level to the average demand. In practice, JD.com aims to minimize both
the stockout rate and the turnover rate.
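The two metrics can be computed directly from daily series. The sketch below follows the definitions given in the text; the function names and the five-day series are made up for illustration and are not taken from the dataset.

```python
def stockout_rate(lost_sales):
    """Number of days with lost sales divided by the total number of days."""
    return sum(1 for lost in lost_sales if lost > 0) / len(lost_sales)

def turnover_rate(inventory, demand):
    """Average inventory level divided by average demand."""
    return (sum(inventory) / len(inventory)) / (sum(demand) / len(demand))

# Hypothetical five-day series for one SKU.
inventory = [6, 4, 0, 8, 5]   # end-of-day inventory level
demand = [2, 4, 3, 2, 3]      # realized daily demand
lost_sales = [0, 0, 3, 0, 0]  # unmet demand per day

print(stockout_rate(lost_sales), turnover_rate(inventory, demand))
```

A policy can lower the stockout rate simply by holding more stock, which raises the turnover rate, so the two metrics are best read together.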
Training an E2E network for one SKU requires approximately 1 min, and training can be performed in parallel
for different SKUs. A typical inventory curve is shown in Figure 3. The test period begins on August 16,
2018, prior to which we assume that the retailer follows the optimal order quantities, i.e., the labels obtained
from the labeling process. During the first few days of the test period, the inventory curves are the same
for all policies because the VLT is positive and it takes a few days to receive an order. After the orders are
received, the inventory curves behave differently. In the clairvoyant policy, the realizations of all demands and
VLTs are known. Hence, the curve follows the ZIO property exactly, and there is no lost sales cost.
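When all demands are known, a deterministic lot-sizing problem can be solved by dynamic programming. The sketch below is a textbook Wagner-Whitin style recursion under simplifying assumptions that differ from the paper's setting: zero lead time, no lost sales, and holding cost charged per unit per day carried past the order day. The function name is hypothetical. Each order covers a consecutive block of days and arrives when inventory is zero, which is the ZIO property.

```python
def wagner_whitin(demand, K, h):
    """Minimum-cost order plan for known demand with zero lead time and no lost sales.

    Returns (total_cost, orders), where orders maps an order day to its quantity.
    """
    n = len(demand)
    best = [0.0] + [float("inf")] * n  # best[j]: minimum cost to cover days 0..j-1
    start = [0] * (n + 1)              # start[j]: first day of the last block in that plan
    for j in range(1, n + 1):
        for i in range(j):             # the last block covers days i..j-1 with one order on day i
            qty = sum(demand[i:j])
            holding = h * sum((k - i) * demand[k] for k in range(i, j))
            # A block with no demand needs no order and therefore costs nothing.
            cost = best[i] + (K + holding if qty > 0 else 0.0)
            if cost < best[j]:
                best[j], start[j] = cost, i
    orders, j = {}, n
    while j > 0:                       # walk back through the chosen blocks
        i = start[j]
        qty = sum(demand[i:j])
        if qty > 0:
            orders[i] = qty
        j = i
    return best[n], orders

print(wagner_whitin([10, 10, 10], K=50, h=1))  # high fixed cost: one order for all days
print(wagner_whitin([10, 10, 10], K=5, h=1))   # low fixed cost: an order every day
```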

Figure 3 Typical inventory curve for one item

Policy          Total cost   Holding cost   Lost sales cost   Fixed ordering cost   Turnover rate   Stockout rate
Clairvoyant         369.12         207.99                 0                161.13            1.75               0
E2E                4219.93         484.87           3572.56                162.50            2.75            0.11
(r, Q) policy      6973.23         315.19           6542.41                115.63            1.69            0.17
PTO                5458.42         245.55           5019.94                192.93            2.99            0.07
E2E-NFC            5746.55         434.26           5171.41                140.88            3.51            0.10
Table 1 Comparison of different policies

The performance metrics for the 2000 SKUs are provided in Table 1. Table 1 shows that our E2E model
achieves a lower total cost than all three benchmarks. In particular, our E2E model outperforms the PTO and
E2E-NFC methods by achieving a lower lost sales cost. This result is consistent with the analysis presented
in Section 5.3, which offers a possible explanation for why the E2E-NFC method shows a higher lost sales
cost than the E2E method in practice. Although the VLT is positive in the experiments, the analysis provides some
insight into why our E2E model shows a lower lost sales cost than the benchmarks.
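The figures in Table 1 can be cross-checked with a few lines of arithmetic: for each policy the holding, lost sales, and fixed ordering costs should add up to the total cost, and the relative saving of E2E over each benchmark follows from the totals. The numbers below are copied from Table 1; the variable names are ours.

```python
# Rows of Table 1: policy -> (total, holding, lost sales, fixed ordering)
table1 = {
    "Clairvoyant": (369.12, 207.99, 0.0, 161.13),
    "E2E": (4219.93, 484.87, 3572.56, 162.50),
    "(r, Q)": (6973.23, 315.19, 6542.41, 115.63),
    "PTO": (5458.42, 245.55, 5019.94, 192.93),
    "E2E-NFC": (5746.55, 434.26, 5171.41, 140.88),
}

for policy, (total, holding, lost, fixed) in table1.items():
    # The three cost components should add up to the reported total.
    assert abs(holding + lost + fixed - total) < 0.01, policy

e2e_total = table1["E2E"][0]
savings = {
    policy: 1 - e2e_total / row[0]
    for policy, row in table1.items()
    if policy not in ("E2E", "Clairvoyant")
}
for policy, saving in savings.items():
    print(f"E2E total cost is {saving:.1%} lower than {policy}")
```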

7 Conclusion
We considered a multiperiod inventory problem with stochastic demand and VLT. The decision-maker has
access to contextual information associated with both demand and VLT. The goal is to determine both the order
quantity and the order timing that minimize the total inventory cost, which includes a fixed ordering cost, lost sales
cost, and holding cost. We proposed an E2E deep learning approach, in which a deep neural network model
is trained to output the optimal order timing and order quantity given any input contextual information.
Our numerical experiments demonstrated the advantages of our model over state-of-the-art benchmarks. To
better understand the performance of the E2E policy, we demonstrated that it converges to the CAOD, which
is defined as the mapping from the features to the average weighted decisions conditional on the contextual
information. Furthermore, we derived upper and lower bounds for the cost of the CAOD under certain
additional assumptions. We demonstrated that when the feature vector satisfies the strongly predictive conditions
with parameter $\Gamma$, CAOD can achieve a small daily average cost of at most $o(\Gamma^{1.5})$.
We suggest several future directions. In this work, we showed that the output policy asymptotically
converges to the CAOD; a non-asymptotic analysis of the convergence is left for future work.
Moreover, the proposed E2E network can be used to solve other complex inventory models, such as
the joint replenishment problem, perishable products, and capacity restrictions, simply by revising the labeling
and projection steps presented in this paper. We believe that the proposed E2E approach can demonstrate
good performance in these cases because of the flexibility of our network.

References
Agrawal, Shipra, Randy Jia. 2019. Learning in structured MDPs with convex cost functions: Improved regret bounds
for inventory management. Proceedings of the 2019 ACM Conference on Economics and Computation. 743-744.

Akpinar, Nil-Jana, Bernhard Kratzwald, Stefan Feuerriegel. 2019. Sample complexity bounds for recurrent neural
networks with application to combinatorial graph problems. arXiv preprint arXiv:1901.10289.

Aksen, Deniz, Kemal Altınkemer, Suresh Chand. 2003. The single-item lot-sizing problem with immediate lost sales.
European Journal of Operational Research, 147 (3), 558-566.

Alp, Osman, Nesim K Erkip, Refik Güllü. 2003. Optimal lot-sizing/vehicle-dispatching policies under stochastic lead
times and stepwise fixed costs. Operations Research, 51 (1), 160-166.

Amazon. 2021. Amazon global selling | sell & ship products internationally - amazon. URL
https://sell.amazon.com/global-selling. Last accessed on 2021-10-15.

Anthony, Martin, Peter L Bartlett. 2009. Neural network learning: Theoretical foundations. Cambridge University
Press.

Aven, Terje. 1985. Upper (lower) bounds on the mean of the maximum (minimum) of a number of random variables.
Journal of applied probability, 723-728.

Ban, Gah-Yi, Cynthia Rudin. 2019. The big data newsvendor: Practical insights from machine learning. Operations
Research, 67 (1), 90-108.

Bartlett, Peter, Dylan J Foster, Matus Telgarsky. 2017. Spectrally-normalized margin bounds for neural networks.
arXiv preprint arXiv:1706.08498.

Bertsimas, Dimitris, Nathan Kallus. 2020. From predictive to prescriptive analytics. Management Science, 66 (3),
1025-1044.

Bertsimas, Dimitris, Christopher McCord. 2019. From predictions to prescriptions in multistage optimization
problems. arXiv preprint arXiv:1904.11637.

Bertsimas, Dimitris, Karthik Natarajan, Chung-Piaw Teo. 2006. Tight bounds on expected order statistics. Probability
in the Engineering and Informational Sciences, 20 (4), 667.

Beutel, Anna-Lena, Stefan Minner. 2012. Safety stock planning under causal demand forecasting. International
Journal of Production Economics, 140 (2), 637-645.

Boute, Robert N, Joren Gijsbrechts, Willem van Jaarsveld, Nathalie Vanvuchelen. 2021. Deep reinforcement learning
for inventory control: A roadmap. European Journal of Operational Research.

Brahimi, Nadjib, Stéphane Dauzère-Pérès, Najib M Najid, Atle Nordli. 2006. Single item lot sizing problems.
European Journal of Operational Research, 168 (1), 1-16.

Chen, Minshuo, Xingguo Li, Tuo Zhao. 2019. On generalization bounds of a family of recurrent neural networks.
arXiv preprint arXiv:1910.12947.

Cheung, Wang Chi, David Simchi-Levi. 2019. Sampling-based approximation schemes for capacitated stochastic
inventory control models. Mathematics of Operations Research, 44 (2), 668-692.

Chu, Leon Yang, J George Shanthikumar, Zuo-Jun Max Shen. 2008. Solving operational statistics via a Bayesian
analysis. Operations Research Letters, 36 (1), 110-116.

Cristian, Rares, Pavithra Harsha, Georgia Perakis, Brian L Quanz, Ioannis Spantidakis. 2022. End-to-end learning
via constraint-enforcing approximators for linear programs with applications to supply chains. AAAI 2022
Workshop.

Donti, Priya, Brandon Amos, J Zico Kolter. 2017. Task-based end-to-end model learning in stochastic optimization.
Advances in Neural Information Processing Systems. 5484-5494.

Duan, Lu, Haoyuan Hu, Zili Wu, Guozheng Li, Xinhang Zhang, Yu Gong, Yinghui Xu. 2020. Balanced order
batching with task-oriented graph clustering. Proceedings of the 26th ACM SIGKDD International Conference on
Knowledge Discovery & Data Mining. 3044-3053.

Elmachtoub, Adam N, Paul Grigas. 2021. Smart “predict, then optimize”. Management Science.

Gijsbrechts, Joren, Robert N Boute, Jan A Van Mieghem, Dennis Zhang. 2021. Can deep reinforcement learning
improve inventory management? Performance on dual sourcing, lost sales and multi-echelon problems.
Manufacturing & Service Operations Management.

Guan, Yongpei, Andrew J Miller. 2008. Polynomial-time algorithms for stochastic uncapacitated lot-sizing problems.
Operations Research, 56 (5), 1172-1183.

Halman, Nir, James B Orlin, David Simchi-Levi. 2012. Approximating the nonlinear newsvendor and single-item
stochastic lot-sizing problems when data is given by an oracle. Operations Research, 60 (2), 429-446.

Ho-Nguyen, Nam, Fatma Kılınç-Karzan. 2020. Risk guarantees for end-to-end prediction and optimization processes.
arXiv preprint arXiv:2012.15046.

Hochreiter, Sepp, Jürgen Schmidhuber. 1997. Long short-term memory. Neural computation, 9 (8), 1735-1780.

Huang, Kai, Simge Küçükyavuz. 2008. On stochastic lot-sizing problems with random lead times. Operations
Research Letters, 36 (3), 303-308.

Huh, Woonghee Tim, Ganesh Janakiraman, John A Muckstadt, Paat Rusmevichientong. 2009. An adaptive algorithm
for finding the optimal base-stock policy in lost sales inventory systems with censored demand. Mathematics of
Operations Research, 34 (2), 397-416.

Huh, Woonghee Tim, Retsef Levi, Paat Rusmevichientong, James B Orlin. 2011. Adaptive data-driven inventory
control with censored demand based on Kaplan-Meier estimator. Operations Research, 59 (4), 929-941.

Huh, Woonghee Tim, Paat Rusmevichientong. 2009. A nonparametric asymptotic analysis of inventory planning with
censored demand. Mathematics of Operations Research, 34 (1), 103-123.

Iglehart, Donald L. 1963. Optimality of (s, S) policies in the infinite horizon dynamic inventory problem. Management
science, 9 (2), 259-267.

Janakiraman, Ganesh, Seung Jae Park, Sridhar Seshadri, Qi Wu. 2013. New results on the newsvendor model and the
multi-period inventory model with backordering. Operations Research Letters, 41 (4), 373-376.

JD. 2021. JD.com announces fourth quarter and full year 2020 results, JD.com, Inc. URL
https://ir.jd.com/news-releases/news-release-details/jdcom-announces-fourth-quarter-and-full-year-2020-results.
Last accessed on 2021-10-15.

Jiang, Ruiwei, Yongpei Guan. 2011. An O(n^2)-time algorithm for the stochastic uncapacitated lot-sizing problem with
random lead times. Operations Research Letters, 39 (1), 74-77.

Klabjan, Diego, David Simchi-Levi, Miao Song. 2013. Robust stochastic lot-sizing by means of histograms.
Production and Operations Management, 22 (3), 691-710.

Levi, Retsef, Ganesh Janakiraman, Mahesh Nagarajan. 2008a. A 2-approximation algorithm for stochastic inventory
control models with lost sales. Mathematics of Operations Research, 33 (2), 351-374.

Levi, Retsef, Martin Pál, Robin O Roundy, David B Shmoys. 2007a. Approximation algorithms for stochastic
inventory control models. Mathematics of Operations Research, 32 (2), 284-302.

Levi, Retsef, Georgia Perakis, Joline Uichanco. 2015. The data-driven newsvendor problem: new bounds and insights.
Operations Research, 63 (6), 1294-1306.

Levi, Retsef, Robin Roundy, Van Anh Truong, Xinshang Wang. 2017. Provably near-optimal balancing policies for
multi-echelon stochastic inventory control models. Mathematics of Operations Research, 42 (1), 256-276.

Levi, Retsef, Robin O Roundy, David B Shmoys. 2007b. Provably near-optimal sampling-based policies for stochastic
inventory control models. Mathematics of Operations Research, 32 (4), 821-839.

Levi, Retsef, Robin O Roundy, David B Shmoys, Van Anh Truong. 2008b. Approximation algorithms for capacitated
stochastic inventory control models. Operations Research, 56 (5), 1184-1199.

Levi, Retsef, Cong Shi. 2013. Approximation algorithms for the stochastic lot-sizing problem with order lead times.
Operations Research, 61 (3), 593-602.

Liang, Percy. 2016. Statistical learning theory. URL https://web.stanford.edu/class/cs229t/notes.pdf.

Liyanage, Liwan H, J George Shanthikumar. 2005. A practical inventory control policy using operational statistics.
Operations Research Letters, 33 (4), 341-348.

Medsker, Larry R, LC Jain. 2001. Recurrent neural networks. Design and Applications, 5.

Meller, Jan, Fabian Taigel. 2019. Machine learning for inventory management: Analyzing two concepts to get from
data to decisions. Available at SSRN 3256643.

Nazari, Mohammadreza, Afshin Oroojlooy, Lawrence Snyder, Martin Takáč. 2018. Reinforcement learning for solving
the vehicle routing problem. Advances in Neural Information Processing Systems. 9839-9849.

Qi, Meng, Ho-Yin Mak, Zuo-Jun Max Shen. 2020a. Data-driven research in retail operations—a review. Naval
Research Logistics (NRL), 67 (8), 595-616.

Qi, Meng, Yuanyuan Shi, Yongzhi Qi, Chenxin Ma, Rong Yuan, Di Wu, Zuo-Jun Max Shen. 2020b. A practical
end-to-end inventory management model with deep learning. Forthcoming in Management Science.

Sandbothe, Richard A, Gerald L Thompson. 1990. A forward algorithm for the capacitated lot size model with
stockouts. Operations Research, 38 (3), 474-486.

Schmidhuber, Jürgen. 2015. Deep learning in neural networks: An overview. Neural networks, 61 85-117.

Snyder, Lawrence V, Zuo-Jun Max Shen. 2019. Fundamentals of supply chain theory. Wiley Online Library.

Wagner, Harvey M, Thomson M Whitin. 1958. Dynamic version of the economic lot size model. Management science,
5 (1), 89-96.

Wagner, Michael R. 2011. Online lot-sizing problems with ordering, holding and shortage costs. Operations Research
Letters, 39 (2), 144-149.

Wei, Colin, Tengyu Ma. 2019. Data-dependent sample complexity of deep neural networks via Lipschitz augmentation.
arXiv preprint arXiv:1905.03684.

Yuan, Hao, Qi Luo, Cong Shi. 2019. Marrying stochastic gradient descent with bandits: Learning algorithms for
inventory systems with fixed costs. Available at SSRN 3329611.

Zhang, Huanan, Xiuli Chao, Cong Shi. 2020. Closing the gap: A learning algorithm for lost-sales inventory systems
with lead times. Management Science, 66 (5), 1962-1980.

Zheng, Yu-Sheng. 1991. A simple proof for optimality of (s, S) policies in infinite-horizon inventory systems. Journal
of Applied Probability, 802-810.

Appendix. E-Companion

A Notations Summary

Notations                                                         Meaning
$K$, $p$, $h$                                                     Fixed ordering cost, lost sales cost per unit, and unit holding cost
$D_t$, $L_t$, $X_t$                                               The random demand, VLT, and feature information on Day $t$; bold letters
                                                                  denote the sequences of these random variables
$d_t$, $l_t$                                                      The realization of demand and VLT on Day $t$
$q_t^*$, $q_t$, $\tilde{q}_t$                                     The optimal order quantity, order quantity in history,
                                                                  and averaged optimal order vectors on Day $t$
$r_t^*$, $\lambda_i$                                              The optimal next-order arrival time, and the weight of the MSE loss of each
                                                                  output $i$
$\tilde{T}$, $T_H$, $T_I$, $T_r$                                  The length of the input features, history data, training period, and review
                                                                  period
$Out_i$, $Out_i^*$, $\widetilde{Out}_j$, $\overline{Out}_j$       The output value, optimal label, CAOD, and mean of optimal labels of Output $i$
$o_t^*$, $\tilde{o}_t^*$, $\bar{o}_t^*$                           The optimal label, averaged optimal value, and mean of optimal labels of the
                                                                  order timing
$w_t^*$, $\tilde{w}_t^*$, $\bar{w}_t^*$                           The optimal label, averaged optimal value, and mean of optimal labels of the
                                                                  order quantity
$\alpha$, $\beta$                                                 The lower bound and upper bound of demand on each day
$\sigma$, $\sigma_s$                                              The upper bound of variance of $D_t$ and $D_t^2$
$C(D, L, q)$                                                      The deterministic cost function
$\hat{C}(D, L, q)$                                                The expected cost function
$C_T(D, L, q, I, s, w)$                                           The daily average cost within the cover period of the order
Table 2 Key Notations

B Proofs for Section 4


Before providing the proof of Proposition 1, we first introduce several results of the generalization bound
for an RNN in the field of computer science.

B.1 Existing knowledge of the sample complexity of RNNs


The flexibility of an RNN has been examined in different studies in the field of neural networks. The basic concept is to show that the Rademacher complexity tends to zero as the sample size tends to infinity. For the classification problem (the output is binary), it suffices to show that the Vapnik–Chervonenkis dimension of a neural network is finite (Anthony and Bartlett (2009)). For a real-valued output, such as the MSE loss function in the E2E model, a similar notion, called the pseudo-dimension, can be used to show the generalization bound. In particular, Anthony and Bartlett (2009) showed that if a class of functions $F_1$ has finite pseudo-dimension $\mathrm{Pdim}(F_1)$, then given population prediction error $\varepsilon_p$ and confidence parameter $\delta$, the sample complexity is bounded by
\[
M_L(\delta, \varepsilon_p) \le \frac{128}{\varepsilon_p^2}\Bigl[\,2\,\mathrm{Pdim}(F_1)\ln\Bigl(\frac{34}{\varepsilon_p}\Bigr) + \ln\Bigl(\frac{16}{\delta}\Bigr)\Bigr].
\]
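To get a feel for how quickly this bound grows, the following minimal sketch evaluates the right-hand side of the inequality above. The function name `sample_complexity_bound` and the example inputs are illustrative assumptions and do not come from the paper; the pseudo-dimension of an actual LSTM would have to be supplied separately.

```python
import math


def sample_complexity_bound(pdim: int, eps: float, delta: float) -> float:
    """Right-hand side of the pseudo-dimension sample complexity bound.

    pdim  : pseudo-dimension Pdim(F_1) of the function class (assumed known)
    eps   : population prediction error epsilon_p, in (0, 1)
    delta : confidence parameter, in (0, 1)
    """
    if pdim < 1 or not (0.0 < eps < 1.0) or not (0.0 < delta < 1.0):
        raise ValueError("need pdim >= 1 and eps, delta in (0, 1)")
    inner = 2 * pdim * math.log(34.0 / eps) + math.log(16.0 / delta)
    return 128.0 / eps ** 2 * inner


# Halving the target error increases the required sample size
# by more than a factor of four (hypothetical inputs).
loose = sample_complexity_bound(pdim=10, eps=0.2, delta=0.05)
tight = sample_complexity_bound(pdim=10, eps=0.1, delta=0.05)
```

The bound scales as $1/\varepsilon_p^2$ up to a logarithmic factor, which is the same shape as the generic form $A_1/\varepsilon_p^2\,[A_2\ln(1/\varepsilon_p) + A_3\ln(1/\delta)]$ used in the proof of Proposition 1.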

Because an LSTM network can be considered as a combination of tanh(·), sigmoid functions, ReLU functions, and linear transformations, given the range of parameters and range of output, its pseudo-dimension is finite. Recent literature is focused on deriving a tighter bound for pseudo-dimensions. For example, Chen et al. (2019) assumed the spectral norms of weight matrices to be bounded, and derived a sample complexity bound for an LSTM for a classification problem. Although Bartlett et al. (2017) showed that the Rademacher complexity bounds depend exponentially on the depth of a neural network, Wei and Ma (2019) derived a data-dependent polynomial-scaled sample complexity bound for an LSTM for a classification problem. For a real-valued output, Akpinar et al. (2019) considered an RNN with ReLU as the activation function and provided an explicit form for the bound of the pseudo-dimension.

B.2 Proofs
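Proof of Proposition 1 When training the E2E model, our objective function (total loss function) is the sum of the weighted squared losses, i.e., the first, second, and fourth outputs are the outputs of three independent LSTM networks. We assume that the hypothesis class of the neural networks is well-specified, i.e., the hypothesis class contains the true model. Thus, for each output $j$ ($j = 1, 2, 4$), during the training process of the deep neural network, because its pseudo-dimension is finite, there exists a positive integer $M_j(\varepsilon_p, \delta)$ as the sample size for $\delta > 0$, such that after training the model, we have, with probability at least $1 - \delta$,
\[
\Bigl|\mathbb{E}_{\mathrm{Out}^{j*}}\bigl[L(\mathrm{Out}^{j}, \mathrm{Out}^{j*}) - L(\widetilde{\mathrm{Out}}^{j}, \mathrm{Out}^{j*})\bigr]\Bigr| \le \varepsilon_p .
\]
Recall that $\mathrm{Out}^{j*}$ is our label when training the E2E model for Output $j$, $\widetilde{\mathrm{Out}}^{j}$ is the expectation of $\mathrm{Out}^{j*}$, and $L(\cdot)$ is the mean squared loss function. $\mathrm{Out}^{j}$ is the output of the E2E policy. We define $\overline{\mathrm{Out}}^{j}$ as the average of $\mathrm{Out}_i^{j*}$, i.e., $\overline{\mathrm{Out}}^{j} = \frac{1}{n}\sum_{i=1}^{n}\mathrm{Out}_i^{j*}$. $M_j$ can be bounded in various forms. For example,
\[
M_j(\varepsilon_p, \delta) \le \frac{A_1}{\varepsilon_p^2}\bigl[A_2\ln(1/\varepsilon_p) + A_3\ln(1/\delta)\bigr],
\]
where $A_1$, $A_2$, and $A_3$ are constant real numbers and depend on the pseudo-dimension of the neural network. The pseudo-dimension further depends on various network structure parameters including the number of hidden layers and the width of the input.
Since $P(X = x) \ge \nu_0$ for any $x \in \mathcal{X}^+$, we have
\begin{align*}
\Bigl|\mathbb{E}_{\mathrm{Out}^{j*}}\bigl[L(\mathrm{Out}^{j}, \mathrm{Out}^{j*}) - L(\widetilde{\mathrm{Out}}^{j}, \mathrm{Out}^{j*})\bigr]\Bigr|
&\ge P(X = x)\,\Bigl|\mathbb{E}_{\mathrm{Out}^{j*}}\bigl[L(\mathrm{Out}^{j}, \mathrm{Out}^{j*}) - L(\widetilde{\mathrm{Out}}^{j}, \mathrm{Out}^{j*}) \,\big|\, X\bigr]\Bigr| \\
&\ge \nu_0\,\Bigl|\mathbb{E}_{\mathrm{Out}^{j*}}\bigl[L(\mathrm{Out}^{j}, \mathrm{Out}^{j*}) - L(\widetilde{\mathrm{Out}}^{j}, \mathrm{Out}^{j*}) \,\big|\, X\bigr]\Bigr| .
\end{align*}
Therefore, we have that
\[
\Bigl|\mathbb{E}_{\mathrm{Out}^{j*}}\bigl[L(\mathrm{Out}^{j}, \mathrm{Out}^{j*}) - L(\widetilde{\mathrm{Out}}^{j}, \mathrm{Out}^{j*}) \,\big|\, X\bigr]\Bigr| \le \varepsilon_p / \nu_0 .
\]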

Thus, we have the following lemma for each output given any feature $X$:

LEMMA 5. For Output $j$, $j = 1, 2, 4$, with probability $\theta \in (0, 1)$, for any $\varepsilon_p > 0$, there exists a positive integer $M_j$ such that when the sample size is greater than $M_j$, we have $P\bigl(|\mathrm{Out}^{j} - \widetilde{\mathrm{Out}}^{j}| < \varepsilon_p/\nu_0\bigr) \ge \theta$.

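Proof of Lemma 5 Consider output $j$ of Outputs 1, 2, and 4. Let there be $n$ labels of the output, from $i = 1$ to $n$. Given $\varepsilon_p > 0$ and $\delta$, there exists a number $M_1^j$ such that when the sample size is greater than $M_1^j$, we have
\[
P\Bigl(\Bigl|\mathbb{E}_{\mathrm{Out}^{j*}}\bigl[(\widetilde{\mathrm{Out}}^{j} - \mathrm{Out}^{j*})^2\bigr] - \frac{1}{n}\sum_{i=1}^{n}(\mathrm{Out}^{j} - \mathrm{Out}_i^{j*})^2\Bigr| \ge \frac{\varepsilon_p^2}{8\nu_0^2}\Bigr) \le \delta .
\]
Thus,
\begin{align}
&\Bigl|\mathbb{E}_{\mathrm{Out}^{j*}}\bigl[(\widetilde{\mathrm{Out}}^{j} - \mathrm{Out}^{j*})^2\bigr] - \frac{1}{n}\sum_{i=1}^{n}(\mathrm{Out}^{j} - \mathrm{Out}_i^{j*})^2\Bigr| \tag{17}\\
&= \Bigl|\mathbb{E}_{\mathrm{Out}^{j*}}\bigl[(\widetilde{\mathrm{Out}}^{j} - \mathrm{Out}^{j*})^2\bigr] - \frac{1}{n}\sum_{i=1}^{n}(\mathrm{Out}^{j} - \overline{\mathrm{Out}}^{j} + \overline{\mathrm{Out}}^{j} - \mathrm{Out}_i^{j*})^2\Bigr| \tag{18}\\
&= \Bigl|\mathbb{E}_{\mathrm{Out}^{j*}}\bigl[(\widetilde{\mathrm{Out}}^{j} - \mathrm{Out}^{j*})^2\bigr] - \frac{1}{n}\sum_{i=1}^{n}(\mathrm{Out}^{j} - \overline{\mathrm{Out}}^{j})^2 - \frac{1}{n}\sum_{i=1}^{n}(\overline{\mathrm{Out}}^{j} - \mathrm{Out}_i^{j*})^2 \notag\\
&\qquad - \frac{2}{n}(\mathrm{Out}^{j} - \overline{\mathrm{Out}}^{j})\sum_{i=1}^{n}(\overline{\mathrm{Out}}^{j} - \mathrm{Out}_i^{j*})\Bigr| \tag{19}\\
&= \Bigl|\mathbb{E}_{\mathrm{Out}^{j*}}\bigl[(\widetilde{\mathrm{Out}}^{j} - \mathrm{Out}^{j*})^2\bigr] - (\mathrm{Out}^{j} - \overline{\mathrm{Out}}^{j})^2 - \frac{1}{n}\sum_{i=1}^{n}(\overline{\mathrm{Out}}^{j} - \mathrm{Out}_i^{j*})^2 \notag\\
&\qquad - 2(\mathrm{Out}^{j} - \overline{\mathrm{Out}}^{j}) \times 0\Bigr| \tag{20}\\
&\ge \Bigl|\mathbb{E}_{\mathrm{Out}^{j*}}\bigl[(\widetilde{\mathrm{Out}}^{j} - \mathrm{Out}^{j*})^2\bigr] - \frac{1}{n}\sum_{i=1}^{n}(\overline{\mathrm{Out}}^{j} - \mathrm{Out}_i^{j*})^2\Bigr| - \bigl|(\mathrm{Out}^{j} - \overline{\mathrm{Out}}^{j})^2\bigr| \tag{21}
\end{align}
By the uniform law of large numbers, we know the first term $\bigl|\mathbb{E}_{\mathrm{Out}^{j*}}[(\widetilde{\mathrm{Out}}^{j} - \mathrm{Out}^{j*})^2] - \frac{1}{n}\sum_{i=1}^{n}(\overline{\mathrm{Out}}^{j} - \mathrm{Out}_i^{j*})^2\bigr|$ converges to zero in distribution. In particular, there exists $M_2^j$ such that when the sample size $n$ is larger than $M_2^j$, $\bigl|\mathbb{E}_{\mathrm{Out}^{j*}}[(\widetilde{\mathrm{Out}}^{j} - \mathrm{Out}^{j*})^2] - \frac{1}{n}\sum_{i=1}^{n}(\overline{\mathrm{Out}}^{j} - \mathrm{Out}_i^{j*})^2\bigr| \le \varepsilon_p^2/(8\nu_0^2)$.
When the sample size is larger than $\max\{M_1^j, M_2^j\}$, $|(\overline{\mathrm{Out}}^{j} - \mathrm{Out}^{j})^2| \le \frac{\varepsilon_p^2}{4\nu_0^2}$. Thus, $|\overline{\mathrm{Out}}^{j} - \mathrm{Out}^{j}| \le \frac{\varepsilon_p}{2\nu_0}$.
Again, by the law of large numbers, there exists a positive number $M_3^j$ such that when the sample size is larger than $M_3^j$, $|\overline{\mathrm{Out}}^{j} - \widetilde{\mathrm{Out}}^{j}| \le \frac{\varepsilon_p}{2\nu_0}$. Therefore, when the sample size is larger than $M_j = \max\{M_1^j, M_2^j, M_3^j\}$, $|\widetilde{\mathrm{Out}}^{j} - \mathrm{Out}^{j}| \le \varepsilon_p/\nu_0$. □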
Based on Lemma 5, we know that the output policy is close to the CAOD when the sample size is sufficiently large. For the second part of the proof, we show that the excess cost of the order quantity or the order timing is bounded by the decision errors. Let the length of the first cover period be $\tau$, the output policy be $(\hat{o}, \hat{w})$, and the CAOD be $(\tilde{o}, \tilde{w})$. Recall that the concept of cover period is introduced in Section 5, where we denote the daily average cost by $\hat{C}$. Here, for simplicity, we denote the daily average cost within the first cover period by $\hat{C}\bigl(D_t, L_t, (o, w)\bigr)$, given the demand and VLT as $D_t$ and $L_t$. Let $W$ be the maximum order quantity in Proposition 4 depending on the case. Particularly, we have the following:

LEMMA 6. Setting $R = (h + K + p)\tau + Wp + Wh$, we have
\[
\Bigl(\hat{C}\bigl(D_t, L_t, (\tilde{o}, \tilde{w})\bigr) - \hat{C}\bigl(D_t, L_t, (\hat{o}, \hat{w})\bigr)\Bigr)^2 \le \|(\hat{o}, \hat{w}) - (\tilde{o}, \tilde{w})\|_2^2\, R^2 \tag{22}
\]

Proof of Lemma 6 We consider the worst case for the expected daily average cost. Every extra item that we order incurs a holding cost of at most $\tau h$. If the order time is delayed by one day, the excess lost sales cost is at most $Wp$. If the order time is earlier by one day, the excess holding cost is at most $Wh$. If the order quantity reduces by one, the excess order cost is at most $p\tau$. □
Lemma 6 provides a Lipschitz property for the expected daily average cost. Combining all these results, we can arrive at Proposition 1: There exists a positive number $S^{\varepsilon}_{\nu_0,\theta}$ such that when the sample size is larger than $S^{\varepsilon}_{\nu_0,\theta}$, we have, for any $\varepsilon > 0$, with probability $\theta \in (0, 1)$,
\[
\Bigl|\mathbb{E}\bigl[C\bigl(D_t, L_t, \hat{q}_t(\{\tilde{o}_j^*\}_{j=0}^{T}, \{\tilde{w}_j^*\}_{j=0}^{T})\bigr) \,\big|\, X_t\bigr] - \mathbb{E}\bigl[C\bigl(D_t, L_t, \hat{q}_t(\{\bar{o}_j^*\}_{j=0}^{T}, \{\bar{w}_j^*\}_{j=0}^{T})\bigr) \,\big|\, X_t\bigr]\Bigr| \le \varepsilon .
\]


Proof of Lemma 1 We first consider the first term, $\sum_{t \in T_I} \lambda_1 L(\mathrm{Out1}, l_t^*)$, in the loss function in Equation (7). Because $L(\cdot)$ is the mean squared error, minimizing this term is equivalent to minimizing $\sum_{t \in T_I} (\mathrm{Out1} - l_t^*)^2$. As we assume the hypothesis space of the neural network is well-specified, the minimum value is achieved when $\mathrm{Out1}^* = \frac{1}{|T_I|}\sum_{t \in T_I} l_t^*$. Therefore, by the law of large numbers, when $|T_I| \to \infty$, $\mathrm{Out1}^* \to \mathbb{E}[l_t]$.
Similarly, because Out1, Out2, and Out4 are obtained from three independent LSTMs, we can show that when the sample size tends to infinity, $\mathrm{Out2}^* \to \mathbb{E}[r_t^*]$ and $\mathrm{Out4}^* \to \mathbb{E}[w_t^*] = \tilde{w}_t^*$. As $o_t^* = r_t^* - l_t^*$, $\forall t \in T_I$, $\mathrm{Out3}^* \to \mathbb{E}[r_t^* - l_t^*] = \tilde{o}_t^*$. Therefore, $(\tilde{o}_t^*, \tilde{w}_t^*)$ is the minimizer of the function in Equation (7) when the sample size tends to infinity. □
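The key step in this argument is that a constant output minimizing the summed squared error equals the sample mean of the labels, and that the order-timing output inherits its limit by linearity. The sketch below illustrates both facts on made-up labels; the lists `vlt_labels` and `arrival_labels` and the helper names are hypothetical and stand in for $l_t^*$ and $r_t^*$.

```python
import math
from statistics import fmean


def sum_squared_loss(c, labels):
    """Summed squared error of a constant prediction c against the labels."""
    return sum((c - y) ** 2 for y in labels)


def best_constant(labels):
    """Minimizer of the summed squared error: the sample mean of the labels."""
    return fmean(labels)


# Hypothetical labels for the VLT (l_t^*) and next-order arrival time (r_t^*).
vlt_labels = [2.0, 3.0, 7.0, 4.0]
arrival_labels = [5.0, 6.0, 9.0, 8.0]

out1 = best_constant(vlt_labels)      # plays the role of Out1^*
out2 = best_constant(arrival_labels)  # plays the role of Out2^*

# Order timing o_t^* = r_t^* - l_t^*: averaging the differences gives the
# same value as differencing the averages.
timing_from_diffs = fmean(r - l for r, l in zip(arrival_labels, vlt_labels))
timing_from_means = out2 - out1
```

Any perturbation of `out1` away from the mean strictly increases `sum_squared_loss`, which is the finite-sample counterpart of the limit statement in the proof.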

C Proofs for Section 5


For the remainder of the proofs, the randomness is conditional on the contextual information, Xt . For sim-
plicity of expression, we neglect the notations of the conditioning on Xt within the proofs.
Proof of Lemma 2 We first consider Out 1 in the network in Figure (1). Because the label value for Out 1 is the actual VLT, to minimize the MSE, we have $\tilde{l}_t^* = \bar{l}_t$. Second, we consider Out 2 in Figure (1). By the ZIO property, the label values for Out 2 are $\gamma(D_t, I_t)$. Therefore, to minimize the MSE, we have $\tilde{r}_t^* = \mathbb{E}[\gamma(D_t, I_t) \mid X_t]$. Thus, as $\tilde{o}_t^* = \max\{\tilde{r}_t^* - \tilde{l}_t^*, 0\}$, the lemma is obtained. □
Proof of Lemma 3 For one order, we denote the daily average cost by $C_m(\tau_0; d)$, given that the length of the cover period is $\tau_0$ and the demand vector within the planning horizon is $d$.
We first assume the time as continuous and the demand at time $t$ as $d(t)$, where $d(\cdot) : \mathbb{R} \to \mathbb{R}$ is a function that gives the demand at time $t$. Subsequently, we show that in this continuous setting, we have
\[
\sqrt{\frac{2K}{(2d^u - d^l)h}} \le \tau_0 \le \frac{1}{d^l}\sqrt{\frac{2Kd^u}{h}} .
\]
Without the loss of generality, we consider the first order. Thus, the daily average cost is
\[
C_m(\tau_0; d) = \frac{K}{\tau_0} + \frac{h}{\tau_0}\int_0^{\tau_0} t\,d(t)\,dt .
\]
Furthermore, the set of the demand functions $d(\cdot)$ of interest can be written as $S_D := \{d : d^l \le d(t) \le d^u, \forall t\}$. Thus, to prove $\sqrt{\frac{2K}{(2d^u - d^l)h}} \le \tau_0$, it suffices to prove that
\[
\frac{d\,C_m(t; d)}{dt}\Big|_{t=\tau_0} \le 0, \quad \forall \tau_0 \le \sqrt{\frac{2K}{(2d^u - d^l)h}}, \ \forall d \in S_D \tag{23}
\]

This is because when the condition in Equation (23) holds, if the length of the cover period is less than the left-hand side, the daily average cost decreases when the length increases. Hence, the optimal length of the cover period should be no shorter than the left-hand side. In the following, we present the proof of (23).
When $\tau_0 \le \sqrt{\frac{2K}{(2d^u - d^l)h}}$,
\begin{align*}
\frac{d\,C_m(t; d)}{dt}\Big|_{t=\tau_0} &= -\frac{K}{\tau_0^2} - \frac{h}{\tau_0^2}\int_0^{\tau_0} t\,d(t)\,dt + h\,d(\tau_0) \\
&\le -\frac{K}{\tau_0^2} - \frac{h}{\tau_0^2}\cdot\frac{d^l\tau_0^2}{2} + h\,d(\tau_0) \\
&\le -\frac{K}{\tau_0^2} - \frac{h d^l}{2} + h d^u \\
&\le 0 .
\end{align*}
The first and second inequalities are by $d(t) \in S_D$, and the third inequality is by $\tau_0 \le \sqrt{\frac{2K}{(2d^u - d^l)h}}$.
To prove $\tau_0 \le \frac{1}{d^l}\sqrt{\frac{2Kd^u}{h}}$, it suffices to prove that
\[
\frac{d\,C_m(t; d)}{dt}\Big|_{t=\tau_0} \ge 0, \quad \forall \tau_0 \ge \frac{1}{d^l}\sqrt{\frac{2Kd^u}{h}}, \ \forall d \in S_D \tag{24}
\]
We claim that for any optimal cover period, $C_m(t; d) \le \sqrt{2d^u hK}$, where $\sqrt{2d^u hK}$ is the optimal daily average cost given the demand as $d^u$, based on the EOQ model. This claim is true because if the daily average cost is larger than $\sqrt{2d^u hK}$, we can split this order into multiple orders, and the daily average cost for each order is no more than $\sqrt{2d^u hK}$, since $d(t) \le d^u$. Thus, by this claim, we have $h\int_0^{\tau_0} t\,d(t)\,dt + K \le \tau_0\sqrt{2d^u hK}$, and
\begin{align*}
\frac{d\,C_m(t; d)}{dt}\Big|_{t=\tau_0} &= -\frac{K}{\tau_0^2} - \frac{h}{\tau_0^2}\int_0^{\tau_0} t\,d(t)\,dt + h\,d(\tau_0) \\
&\ge -\frac{K}{\tau_0^2} - \frac{\tau_0\sqrt{2d^u hK} - K}{\tau_0^2} + h\,d(\tau_0) \\
&\ge -\frac{\sqrt{2d^u hK}}{\tau_0} + h d^l \\
&\ge 0 .
\end{align*}
The second inequality is by $d(t) \in S_D$ and the third inequality is by $\tau_0 \ge \frac{1}{d^l}\sqrt{\frac{2Kd^u}{h}}$. Thus, we obtain $\sqrt{\frac{2K}{(2d^u - d^l)h}} \le \tau_0 \le \frac{1}{d^l}\sqrt{\frac{2Kd^u}{h}}$. It is worth mentioning that both sides of this inequality are tight. To understand this, for the left-hand side, we can design $d(t) = d^l$, $\forall t < \sqrt{\frac{2K}{(2d^u - d^l)h}}$, and $d(t) = d^u$, $\forall t \ge \sqrt{\frac{2K}{(2d^u - d^l)h}}$. For the right-hand side, we can also design a demand vector such that $C_m(t; d) = \sqrt{2d^u hK}$ for one order.
In the following, we consider the setting with a discrete time, which is a special case of the above continuous time setting. First, we denote the daily average cost as $C_{dm}(\tau_0; d)$ in the discrete time. Thus, we have $C_{dm}(\tau_0; d) = \frac{K}{\tau_0} + \frac{h}{\tau_0}\sum_{t=1}^{\tau_0} d_t(t - 1)$. For the left-hand side, we have
\begin{align*}
C_{dm}(\tau_0; d) - C_{dm}(\tau_0 - 1; d) &= \frac{K}{\tau_0} - \frac{K}{\tau_0 - 1} + \frac{h}{\tau_0}\sum_{t=1}^{\tau_0} d_t(t - 1) - \frac{h}{\tau_0 - 1}\sum_{t=1}^{\tau_0 - 1} d_t(t - 1) \\
&= -\frac{K}{\tau_0(\tau_0 - 1)} - \frac{h}{\tau_0(\tau_0 - 1)}\sum_{t=1}^{\tau_0 - 1} d_t(t - 1) + \frac{h d_{\tau_0}(\tau_0 - 1)}{\tau_0} \\
&\le -\frac{K}{\tau_0(\tau_0 - 1)} - \frac{h d^l}{\tau_0(\tau_0 - 1)}\sum_{t=1}^{\tau_0 - 1}(t - 1) + \frac{h d^u(\tau_0 - 1)}{\tau_0} \\
&= -\frac{K}{\tau_0(\tau_0 - 1)} - \frac{h d^l(\tau_0 - 2)}{2\tau_0} + \frac{h d^u(\tau_0 - 1)}{\tau_0} \\
&= \frac{1}{\tau_0}\Bigl(-\frac{K}{\tau_0 - 1} + h\Bigl(d^u - \frac{d^l}{2}\Bigr)(\tau_0 - 1) + \frac{h d^l}{2}\Bigr) .
\end{align*}
The inequality is by $d_t \in S_D$. The term $-\frac{K}{\tau_0 - 1} + h(d^u - \frac{d^l}{2})(\tau_0 - 1) + \frac{h d^l}{2}$ is an increasing function of $\tau_0$, and the positive root of this function is
\[
\tau_0 = 1 - \frac{d^l}{2(2d^u - d^l)} + \sqrt{\frac{(d^l)^2}{4(2d^u - d^l)^2} + \frac{2K}{h(2d^u - d^l)}} \ \ge\ \frac{1}{2} + \sqrt{\frac{2K}{h(2d^u - d^l)}} \ \ge\ \Bigl\lfloor\sqrt{\frac{2K}{(2d^u - d^l)h}}\Bigr\rfloor .
\]
Thus, we obtain the left-hand side.
For the right-hand side, we have
\begin{align*}
C_{dm}(\tau_0; d) - C_{dm}(\tau_0 - 1; d) &= -\frac{K}{\tau_0(\tau_0 - 1)} - \frac{h}{\tau_0(\tau_0 - 1)}\sum_{t=1}^{\tau_0 - 1} d_t(t - 1) + \frac{h d_{\tau_0}(\tau_0 - 1)}{\tau_0} \\
&\ge -\frac{K}{\tau_0(\tau_0 - 1)} - \frac{(\tau_0 - 1)\sqrt{2d^u hK} - K}{\tau_0(\tau_0 - 1)} + \frac{h d_{\tau_0}(\tau_0 - 1)}{\tau_0} \\
&\ge -\frac{\sqrt{2d^u hK}}{\tau_0} + \frac{h d^l(\tau_0 - 1)}{\tau_0} .
\end{align*}
The first inequality is by $h\sum_{t=1}^{\tau_0 - 1} d_t(t - 1) + K \le (\tau_0 - 1)\sqrt{2d^u hK}$, which is from the conclusion of the continuous case. When $\tau_0 \ge \bigl\lceil\frac{1}{d^l}\sqrt{\frac{2Kd^u}{h}}\bigr\rceil + 1$, $-\frac{\sqrt{2d^u hK}}{\tau_0} + \frac{h d^l(\tau_0 - 1)}{\tau_0} \ge 0$. Thus, we have $C_{dm}(\tau_0 + 1; d) - C_{dm}(\tau_0; d) \ge 0$, $\forall \tau_0 \ge \bigl\lceil\frac{1}{d^l}\sqrt{\frac{2Kd^u}{h}}\bigr\rceil$, $\forall d_t \in S_D$. □
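As an illustration of the discrete-time bounds in Lemma 3, the sketch below brute-forces the cover period that minimizes $C_{dm}(\tau_0; d)$ for a randomly drawn demand path and compares it with the two bounds. All parameter values, the helper names, and the uniform demand model are assumptions made for this example only; the upper check allows the extra "+1" that appears in the last step of the proof.

```python
import math
import random


def daily_average_cost(tau, demand, K, h):
    """C_dm(tau; d) = K/tau + (h/tau) * sum_{t=1}^{tau} d_t * (t - 1)."""
    holding = sum(demand[t - 1] * (t - 1) for t in range(1, tau + 1))
    return K / tau + h * holding / tau


def cover_period_bounds(K, h, d_low, d_high):
    """Continuous-time lower and upper bounds on the optimal cover period."""
    lower = math.sqrt(2 * K / ((2 * d_high - d_low) * h))
    upper = math.sqrt(2 * K * d_high / h) / d_low
    return lower, upper


def best_cover_period(demand, K, h):
    """Brute-force minimizer of the daily average cost over 1..len(demand)."""
    taus = range(1, len(demand) + 1)
    return min(taus, key=lambda tau: daily_average_cost(tau, demand, K, h))


# Hypothetical parameters: demand on each day lies in [d_low, d_high].
K, h, d_low, d_high = 50.0, 1.0, 2.0, 5.0
lower, upper = cover_period_bounds(K, h, d_low, d_high)
horizon = math.ceil(upper) + 5  # long enough to contain the whole interval

rng = random.Random(0)
demand = [rng.uniform(d_low, d_high) for _ in range(horizon)]
tau_star = best_cover_period(demand, K, h)
```

With these inputs the bounds are roughly 3.5 and 11.2 days, so the brute-force `tau_star` should fall between $\lfloor 3.5 \rfloor$ and $\lceil 11.2 \rceil + 1$ regardless of the demand path drawn.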
Proof of Proposition 2 First, we begin with the case wherein the VLT is zero. Let the sequence of the demand, $\{d_t\}_{t=1}^{T}$, satisfy $d_t \in [d^l, d^u]$, $\forall t = 1, ..., T$; then the daily average cost of the CAOD for one cover period, $C_T(d_t, l_t = 0, s_t = 0, w_t)$, is at most
\[
C_T(d_t, l_t = 0, s_t = 0, w_t) \le \frac{w_t d^u h}{2d^l} + \frac{d^u\bigl(K + p(d^u - d^l)\bigr)}{w_t} \tag{25}
\]
The proof of Equation (25) is as follows. The length of one cover period is at least $T_{\min} = \frac{w_t}{d^u}$ and at most $T_{\max} = \frac{w_t}{d^l}$. The holding cost within the cover period is at most $\frac{w_t}{2}T_{\max}h = \frac{w_t}{2}\cdot\frac{w_t}{d^l}h$, and the lost sales cost is at most $p(d^u - d^l)$. Therefore, the total cost is at most $\frac{w_t^2 h}{2d^l} + K + p(d^u - d^l)$. Subsequently, we divide this total cost by $T_{\min}$ and obtain Equation (25).
In the following, we assume that the VLT is nonnegative and uncertain. Let Assumption 1 hold and the demand for each day be positive; then given the values of the reorder point, $s_t$, and the order quantity, $w_t$, we claim that the daily average cost during the VLT is at most $d^u h\bigl(\frac{s_t}{d^l} - v^l\bigr) + \frac{d^u(d^u v^u - s_t)p}{w_t}$. The proof is as follows: When an order arrives before the inventory level tends to zero, a holding cost is incurred. As the inventory level tends to zero no later than $\frac{s_t}{d^l}$, the holding cost is at most $w_t\bigl(\frac{s_t}{d^l} - v^l\bigr)h$. When an order arrives after the inventory level tends to zero, a lost sales cost is incurred. As the total demand within the VLT is at most $d^u v^u$ and the inventory level is at least $s_t$, the lost sales cost is at most $(d^u v^u - s_t)p$. We divide the total cost by the minimum length of the cover period, $\frac{w_t}{d^u}$, thus obtaining the aforementioned claim.
When the VLT is nonnegative, we add the daily average cost incurred by the VLT in the aforementioned claim and the cost for the cover period in Equation (25), and establish Proposition 2. □
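The two pieces of this bound are simple enough to evaluate directly. The sketch below codes the cover-period term of Equation (25) and the VLT term from the claim above; the function names and the numbers are hypothetical. As a sanity check under the assumption of constant demand ($d^l = d^u = d$), the cover-period term reduces to the classical EOQ cost $w h/2 + dK/w$, which equals $\sqrt{2dKh}$ at $w = \sqrt{2Kd/h}$.

```python
import math


def cover_period_cost_bound(w, d_low, d_high, K, p, h):
    """Right-hand side of Equation (25): zero-VLT daily average cost bound."""
    return w * d_high * h / (2 * d_low) + d_high * (K + p * (d_high - d_low)) / w


def vlt_cost_bound(w, s, d_low, d_high, v_low, v_high, p, h):
    """Daily average cost bound incurred during the VLT (the claim above)."""
    holding = d_high * h * (s / d_low - v_low)
    lost_sales = d_high * (d_high * v_high - s) * p / w
    return holding + lost_sales


# Constant-demand sanity check with hypothetical values d = 4, K = 50, h = 1.
d, K, h, p = 4.0, 50.0, 1.0, 10.0
w_eoq = math.sqrt(2 * K * d / h)                 # EOQ order size
eoq_cost = math.sqrt(2 * d * K * h)              # EOQ daily average cost
bound_at_eoq = cover_period_cost_bound(w_eoq, d, d, K, p, h)

# Total bound of Proposition 2 for a hypothetical uncertain-demand instance.
total_bound = (cover_period_cost_bound(20.0, 2.0, 5.0, K, p, h)
               + vlt_cost_bound(20.0, 8.0, 2.0, 5.0, 1.0, 3.0, p, h))
```

When $d^u > d^l$, the ratio $d^u/d^l$ inflates the holding term and $p(d^u - d^l)$ adds a lost-sales term, so the bound is strictly looser than the EOQ cost.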
Proof of Theorem 1 The remaining proof of Theorem 1 has two parts. The first part is the derivation of
the bound for the values of the averaged optimal solutions. The other part is the derivation of the bound for
the expected cost of one given order time and order quantity.
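First, we obtain the following bound for the values of the CAOD, $\tilde{s}_t^*$, in Lemma 7.

LEMMA 7. Let Assumptions 1 and 2 hold, then the reorder point in the CAOD satisfies
\[
\frac{\alpha\bar{l}_t}{1 + \sqrt{v^u}\sigma} \le \tilde{s}_t^* \le \bar{l}_t\bigl(\beta + \sqrt{v^u}\sigma\bigr) \tag{26}
\]
The proof of Lemma 7 is provided separately after the proof of Theorem 1.
In the first part, we claim that when Assumptions 1 and 2 hold and we set $\tau = \sqrt{\frac{2K}{h}}$, the averaged optimal order quantity is bounded by the inequalities in Equation (27).
\[
\frac{1}{\sqrt{2\beta - 2\alpha + 4\sqrt{\tau}\sigma + \frac{1}{\alpha} + \frac{1}{\alpha}\sqrt{\tau}\sigma}}\sqrt{\frac{2K}{h}} \le \tilde{w}_t^* \le \bigl(\beta^2 - \alpha^2 + \sigma^2 + 2\sqrt{\tau}\sigma_s + \sqrt{\alpha}\bigr)\sqrt{\frac{2K}{h}} \tag{27}
\]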

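To prove the inequalities in Equation (27), we first notice that by Lemma 3, the maximum and minimum lengths of one scheduled cover period are $\frac{1}{D^l}\sqrt{\frac{2KD^u}{h}}$ and $\sqrt{\frac{2K}{(2D^u - D^l)h}}$, respectively. As the order quantity equals the sum of the demands within the scheduled cover period, we can bound $w_t$ by
\[
D^l\sqrt{\frac{2K}{(2D^u - D^l)h}} \le w_t^* \le \frac{D^u}{D^l}\sqrt{\frac{2KD^u}{h}} .
\]
Therefore, to bound the expected order quantity, it suffices to bound $\mathbb{E}\bigl[\frac{D^l}{\sqrt{2D^u - D^l}}\bigr]$ and $\mathbb{E}\bigl[\frac{(D^u)^{3/2}}{D^l}\bigr]$.
Let the length of the interval of interest be $\tau$; then we have $\mathbb{E}\bigl[\frac{D^l}{\sqrt{2D^u - D^l}}\bigr] = \mathbb{E}\Bigl[\frac{1}{\sqrt{\frac{2D^u}{(D^l)^2} - \frac{1}{D^l}}}\Bigr]$. Because $\frac{1}{\sqrt{x}}$ is a convex function, by Jensen's inequality, we have
\[
\mathbb{E}\Biggl[\frac{1}{\sqrt{\frac{2D^u}{(D^l)^2} - \frac{1}{D^l}}}\Biggr] \ge \frac{1}{\sqrt{\mathbb{E}\bigl[\frac{2D^u}{(D^l)^2} - \frac{1}{D^l}\bigr]}} .
\]
Because $\mathbb{E}\bigl[\frac{2D^u}{(D^l)^2} - \frac{1}{D^l}\bigr] = \mathbb{E}\bigl[\frac{2D^u - 2D^l}{(D^l)^2} + \frac{2}{D^l}\bigr] - \mathbb{E}\bigl[\frac{1}{D^l}\bigr] = \mathbb{E}\bigl[\frac{2D^u - 2D^l}{(D^l)^2}\bigr] + \mathbb{E}\bigl[\frac{1}{D^l}\bigr]$, by $D^l \ge 1$, we have $\mathbb{E}\bigl[\frac{2D^u}{(D^l)^2} - \frac{1}{D^l}\bigr] \le 2\mathbb{E}[D^u - D^l] + \mathbb{E}\bigl[\frac{1}{D^l}\bigr]$. From the proof of Lemma 7, we have $\mathbb{E}[D^u - D^l] \le \beta - \alpha + 2\sqrt{\tau}\sigma$ and $\mathbb{E}\bigl[\frac{1}{D^l}\bigr] \le \frac{1}{\alpha} + \frac{1}{\alpha}\sqrt{\tau}\sigma$. Thus, combining the inequalities, we have
\[
\mathbb{E}\Bigl[\frac{D^l}{\sqrt{2D^u - D^l}}\Bigr] \ge \frac{1}{\sqrt{2\beta - 2\alpha + 4\sqrt{\tau}\sigma + \frac{1}{\alpha} + \frac{1}{\alpha}\sqrt{\tau}\sigma}} .
\]
Next, $\mathbb{E}\bigl[\frac{(D^u)^{3/2}}{D^l}\bigr] = \mathbb{E}\bigl[\frac{(D^u)^{3/2} - (D^l)^{3/2}}{D^l} + \sqrt{D^l}\bigr]$. Because $1 \le D^l$ almost surely, $\mathbb{E}\bigl[\frac{(D^u)^{3/2}}{D^l}\bigr] \le \mathbb{E}\bigl[(D^u)^{3/2} - (D^l)^{3/2}\bigr] + \mathbb{E}\bigl[\sqrt{D^l}\bigr]$. By Jensen's inequality, $\mathbb{E}\bigl[\sqrt{D^l}\bigr] \le \sqrt{\alpha}$. Because $1 \le D^l \le D^u$, we also have $\mathbb{E}\bigl[(D^u)^{3/2} - (D^l)^{3/2}\bigr] \le \mathbb{E}\bigl[(D^u)^2 - (D^l)^2\bigr]$. By Lemma 8, we also have $\mathbb{E}[(D^u)^2] \le \beta^2 + \sigma^2 + \sqrt{\tau}\sigma_s$, and $\mathbb{E}[(D^l)^2] \ge \alpha^2 - \sqrt{\tau}\sigma_s$. Thus, we obtain $\mathbb{E}\bigl[(D^u)^2 - (D^l)^2\bigr] \le \beta^2 - \alpha^2 + \sigma^2 + 2\sqrt{\tau}\sigma_s$. Therefore, $\mathbb{E}\bigl[\frac{(D^u)^{3/2}}{D^l}\bigr] \le \beta^2 - \alpha^2 + \sigma^2 + 2\sqrt{\tau}\sigma_s + \sqrt{\alpha}$. Consequently, the inequalities in Equation (27) follow.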
In the following, given reorder point $\tilde{s}_t$ and order quantity $\tilde{w}_t$ in the CAOD, when Assumptions (1–8) hold and $\tau = \sqrt{\frac{2K}{h}} + v^u$ is set, we claim that the daily average cost of the CAOD for one cover period is at most
\begin{align}
\mathbb{E}[\hat{C}(D_t, L_t, \tilde{q}_t) \mid X_t] \le{}& \frac{\tilde{w}_t h(\beta - \alpha + 2\sqrt{\tau}\sigma + 1)}{2} + \frac{K(\beta + \sqrt{\tau}\sigma) + p(\beta^2 + \sigma^2 + 2\sqrt{\tau}\sigma_s - \alpha^2)}{\tilde{w}_t} \notag\\
&+ h\bigl(\tilde{s}_t(\beta - \alpha + 2\sqrt{\tau}\sigma + 1) - v^l\beta\bigr) + \frac{(\beta^2 + \sigma^2 + \sqrt{\tau}\sigma_s)v^u p - \tilde{s}_t p\beta}{\tilde{w}_t} \tag{28}
\end{align}
To prove the inequality in Equation (28) via Proposition 2, it suffices to bound $\mathbb{E}[D^u]$, $\mathbb{E}[D^u D^l]$, $\mathbb{E}\bigl[\frac{D^u}{D^l}\bigr]$, and $\mathbb{E}[(D^u)^2]$.
By Lemma 8, let the length of the interval of interest be $\tau$; then we have $\mathbb{E}\bigl[\frac{D^u}{D^l}\bigr] = \mathbb{E}\bigl[\frac{\max_i\{D_i\}}{\min_i\{D_i\}}\bigr] = \mathbb{E}\bigl[\frac{\max_i\{D_i\} - \min_i\{D_i\}}{\min_i\{D_i\}}\bigr] + 1$. By Assumption 2, $\mathbb{E}\bigl[\frac{\max_i\{D_i\} - \min_i\{D_i\}}{\min_i\{D_i\}}\bigr] + 1 \le \mathbb{E}[\max_i\{D_i\} - \min_i\{D_i\}] + 1 = \mathbb{E}[\max_i\{D_i\}] - \mathbb{E}[\min_i\{D_i\}] + 1 \le \beta - \alpha + 2\sqrt{\tau}\sigma + 1$.
Thus, we have $\mathbb{E}[D^u] \le \beta + \sqrt{\tau}\sigma$, $\mathbb{E}[D^u D^l] \ge \mathbb{E}[(D^l)^2] \ge \alpha^2$, and $\mathbb{E}[(D^u)^2] = \mathbb{E}[\max_i\{D_i^2\}] \le \beta^2 + \sigma^2 + \sqrt{\tau}\sigma_s$. Subsequently, by the linearity in Proposition 2 in terms of $\mathbb{E}[D^u]$, $\mathbb{E}[D^u D^l]$, $\mathbb{E}\bigl[\frac{D^u}{D^l}\bigr]$, and $\mathbb{E}[(D^u)^2]$, we can establish the inequality in Equation (28).
Finally, to obtain the upper bound in Theorem 1, we consider the worst case and substitute $\tilde{s}_t$ and $\tilde{w}_t$ in the inequality in Equation (28) with their upper or lower bounds to obtain the upper bound for each term in that inequality. Particularly, for the terms pertaining to the holding cost, we substitute $\tilde{w}_t$ and $\tilde{s}_t$ in the inequality in Equation (28) with the upper bound in the inequalities in Equation (27) and Lemma 7. For the terms pertaining to the lost sales cost and the fixed ordering cost, we substitute the lower bounds of $\tilde{w}_t$ and $\tilde{s}_t$. For the last term pertaining to the lost sales cost within the VLT, we simply use the fact $\tilde{w}_t \ge 1$. □
Proof of Lemma 7 For the convenience of the proof, we assume that the number of days can be frac-
tional. Otherwise, we need to round off the days into integers, which results in a difference of no more than
one.
First, we state the lemma provided in Aven (1985).

LEMMA 8. (Aven (1985)) Let $X_i$, $i = 1, ..., \tau$, be random variables with mean $\mu_i$ and variance $\sigma_i^2$ that are not necessarily independent. We denote $\min\{\mu_i\}$ as $\mu_1$, $\max\{\mu_i\}$ as $\mu_2$, and $\max\{\sigma_i\}$ as $\sigma$. Thus, $\mathbb{E}[\max_i\{X_i\}] \le \mu_2 + \sqrt{\tau}\sigma$, and $\mathbb{E}[\min_i\{X_i\}] \ge \mu_1 - \sqrt{\tau}\sigma$.

This bound in Lemma 8 is not tight and was further improved in Bertsimas et al. (2006). However, we
use this bound because of the simplicity of the expression.
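Because Lemma 8 makes no independence assumption, it can be checked on any joint distribution, including the empirical distribution of a simulated sample. The sketch below does this for $\tau = 5$ correlated variables; the data-generating process (a common shock plus idiosyncratic Gaussian noise) and all names are assumptions chosen for illustration. Population (not sample) standard deviations are used so that the inequality applies exactly to the empirical distribution.

```python
import math
import random
from statistics import fmean, pstdev


def aven_bounds(samples):
    """Return (lower bound on E[min], upper bound on E[max]) from Lemma 8.

    samples: list of rows; each row holds tau possibly dependent values.
    """
    tau = len(samples[0])
    columns = list(zip(*samples))
    means = [fmean(col) for col in columns]
    sigma = max(pstdev(col) for col in columns)  # largest standard deviation
    slack = math.sqrt(tau) * sigma
    return min(means) - slack, max(means) + slack


rng = random.Random(1)
samples = []
for _ in range(2000):
    shock = rng.gauss(0.0, 1.0)  # common shock makes the columns dependent
    samples.append([10.0 + i + shock + rng.gauss(0.0, 2.0) for i in range(5)])

lower, upper = aven_bounds(samples)
mean_of_min = fmean(min(row) for row in samples)
mean_of_max = fmean(max(row) for row in samples)
```

On such data the gap between `mean_of_max` and `upper` is typically large, which is consistent with the remark above that the bound is not tight.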
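Let $T^{VLT}$ be the VLT time, $D^u = \max_{i \in T^{VLT}}\{D_t[i]\}$, and $D^l = \min_{i \in T^{VLT}}\{D_t[i]\}$. Because $T^{VLT} \le v^u$, by Lemma 8, we have $\mathbb{E}[D^l] \ge \alpha - \sqrt{v^u}\sigma$ and $\mathbb{E}[D^u] \le \beta + \sqrt{v^u}\sigma$.
Given the inventory level, $I_t$, it is easy to prove that $\frac{I_t}{D^u} \le \gamma(D_t, I_t) \le \frac{I_t}{D^l}$.
Because $\frac{1}{D^l} = \frac{1}{\alpha}\cdot\frac{\alpha}{D^l} = \frac{1}{\alpha}\bigl(\frac{\alpha - D^l}{D^l} + 1\bigr)$, by the demand being positive, we have $\frac{1}{\alpha}\bigl(\frac{\alpha - D^l}{D^l} + 1\bigr) \le \frac{1}{\alpha}(\alpha - D^l + 1)$. Therefore, $\mathbb{E}\bigl[\frac{1}{D^l}\bigr] \le \frac{1}{\alpha}\mathbb{E}[\alpha - D^l + 1] \le \frac{1}{\alpha} + \frac{1}{\alpha}\sqrt{v^u}\sigma$.
Thus, when $I_t \le \frac{\alpha\bar{l}_t}{1 + \sqrt{v^u}\sigma}$,
\[
\mathbb{E}[\gamma(D_t, I_t)] \le \mathbb{E}\Bigl[\frac{I_t}{D^l}\Bigr] = I_t\,\mathbb{E}\Bigl[\frac{1}{D^l}\Bigr] \le I_t\Bigl(\frac{1}{\alpha} + \frac{1}{\alpha}\sqrt{v^u}\sigma\Bigr) \le \frac{\alpha\bar{l}_t}{1 + \sqrt{v^u}\sigma}\Bigl(\frac{1}{\alpha} + \frac{1}{\alpha}\sqrt{v^u}\sigma\Bigr) = \bar{l}_t .
\]
Because $\tilde{s}_t = \arg\min_{I_t}\{\mathbb{E}[\gamma(D_t, I_t)] \ge \bar{l}_t\}$, we have $\tilde{s}_t \ge \frac{\alpha\bar{l}_t}{1 + \sqrt{v^u}\sigma}$.
For the right-hand side, by Jensen's inequality, we have $\mathbb{E}\bigl[\frac{1}{D^u}\bigr] \ge \frac{1}{\mathbb{E}[D^u]} \ge \frac{1}{\beta + \sqrt{v^u}\sigma}$. Therefore, when $I_t \ge \bar{l}_t(\beta + \sqrt{v^u}\sigma)$,
\[
\mathbb{E}[\gamma(D_t, I_t)] \ge \mathbb{E}\Bigl[\frac{I_t}{D^u}\Bigr] = I_t\,\mathbb{E}\Bigl[\frac{1}{D^u}\Bigr] \ge \frac{I_t}{\beta + \sqrt{v^u}\sigma} \ge \frac{\bar{l}_t(\beta + \sqrt{v^u}\sigma)}{\beta + \sqrt{v^u}\sigma} = \bar{l}_t .
\]
Therefore, $\tilde{s}_t \le \bar{l}_t(\beta + \sqrt{v^u}\sigma)$. □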

Proof of Lemma 4 $\mathbb{E}[\gamma(D_t, I_t = 1)]$ is the expectation of the days before one demand arises. By Chebyshev's inequality, $P(D_t = 0) \le P(|D_t - \mathbb{E}[D_t]| \ge \mathbb{E}[D_t]) \le \frac{\sigma^2}{\mathbb{E}[D]^2} \le \frac{\sigma^2}{\alpha^2}$.
If the first day has zero demand, then the number of successive days that also have zero demand follows a geometric distribution with parameter $1 - \varepsilon$. Therefore, by Assumption 3, the expectation of the days before one demand arises is no more than $P(d_t = 0)\bigl(\frac{1}{1 - \varepsilon}\bigr) + 1 \le 1 + \frac{\sigma^2}{\alpha^2(1 - \varepsilon)}$. □
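Proof of Theorem 2 The outline of the proof is similar to that of Theorem 1. First, we claim that if Assumptions 1 and 3 hold, then we have the following bounds for $\tilde{w}_t$ and $\tilde{s}_t$:
\begin{align}
\frac{\alpha\bar{l}_t}{1 + \sqrt{v^u}\sigma + \bar{\gamma}_{\sigma,\varepsilon}\alpha} \le \tilde{s}_t^* &\le \bar{l}_t\bigl(\beta + \sqrt{v^u}\sigma\bigr) \tag{29}\\
\Biggl(-\sqrt{2\beta + 3\sqrt{\tau}\sigma - \alpha} + \frac{2}{\sqrt{\beta - \alpha + 2\sqrt{\tau}\sigma + \frac{1}{\alpha} + \frac{1}{\alpha}\sqrt{\tau}\sigma}}\Biggr)\sqrt{\frac{2K}{h}} &\le \tilde{w}_t^* \tag{30}\\
&\le \Biggl(\frac{\beta^2 + \sigma^2 + \sqrt{\tau}\sigma_s}{\max\{1 - \frac{\sigma^2}{\alpha^2}, \varepsilon_+\}} - \alpha^2 + \sqrt{\alpha}\Biggr)\sqrt{\frac{2K}{h}} \tag{31}
\end{align}
where $\bar{\gamma}_{\sigma,\varepsilon} = \frac{\sigma^2}{\alpha^2(1 - \varepsilon)}$, and $\tau = \frac{K}{h}$.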

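The proofs for the inequalities in Equations (29) and (31) are as follows: When we assume $\mathbb{E}[D^l]$ to be zero, we can still bound $\mathbb{E}[D^l]$ and $\mathbb{E}[D^u]$ using Lemma 8 with a different $\tau$. Hence, the upper bound of $\tilde{s}_t$ is the same as that in Section 5.2 except for the value of $\tau$. Because the scheduled cover period is never larger than $\frac{K}{h}$ (otherwise, it is better to split the cover period into two periods), we can set $\tau = \frac{K}{h}$.
For the lower bound of $\tilde{s}_t$, by the definition of $\bar{\gamma}_{\sigma,\varepsilon}$, $\mathbb{E}[\gamma(D_t, 1)] \le \mathbb{E}[\gamma(D_t, 1) \mid D_t \ge 1] + \bar{\gamma}_{\sigma,\varepsilon}$. As shown in the proof of Lemma 7, $\mathbb{E}[\gamma(D_t, 1) \mid D_t \ge 1] \le \frac{1}{\alpha} + \frac{1}{\alpha}\sqrt{v^u}\sigma$. Furthermore, we have $\mathbb{E}[\gamma(D_t, I_t)] \le I_t\,\mathbb{E}[\gamma(D_t, 1)]$. Therefore, when $I_t \le \frac{\alpha\bar{l}_t}{1 + \sqrt{v^u}\sigma + \bar{\gamma}_{\sigma,\varepsilon}\alpha}$, $\mathbb{E}[\gamma(D_t, I_t)] \le \bar{l}_t$. Thus, $\frac{\alpha\bar{l}_t}{1 + \sqrt{v^u}\sigma + \bar{\gamma}_{\sigma,\varepsilon}\alpha} \le \tilde{s}_t$.
For the lower bound of $\tilde{w}_t$, it suffices to provide the lower bound for $\mathbb{E}\bigl[\frac{D^l}{\sqrt{2D^u - D^l}}\bigr]$. Without the loss of generality, there is at least one demand within the cover period; therefore, $D^u \ge 1$. Thus, we have
\[
\mathbb{E}\Bigl[\frac{D^l}{\sqrt{2D^u - D^l}}\Bigr] = \mathbb{E}\Bigl[-\sqrt{2D^u - D^l} + \frac{2D^u}{\sqrt{2D^u - D^l}}\Bigr] = \mathbb{E}\Biggl[-\sqrt{2D^u - D^l} + \frac{2}{\sqrt{\frac{2}{D^u} - \frac{D^l}{(D^u)^2}}}\Biggr] \ge -\sqrt{\mathbb{E}[2D^u - D^l]} + \frac{2}{\sqrt{\mathbb{E}\bigl[\frac{2}{D^u} - \frac{D^l}{(D^u)^2}\bigr]}} .
\]
The last inequality is because $-\sqrt{x}$ and $\frac{1}{\sqrt{x}}$ are convex functions and we use Jensen's inequality. $\mathbb{E}[2D^u - D^l] \le 2\beta + 3\sqrt{\tau}\sigma - \alpha$. We also have $\mathbb{E}\bigl[\frac{2}{D^u} - \frac{D^l}{(D^u)^2}\bigr] = \mathbb{E}\bigl[\frac{1}{D^u} + \frac{D^u - D^l}{(D^u)^2}\bigr] \le \mathbb{E}\bigl[\frac{1}{D^u}\bigr] + \mathbb{E}[D^u - D^l] \le \mathbb{E}\bigl[\frac{1}{D^u}\bigr] + \beta - \alpha + 2\sqrt{\tau}\sigma$. The first inequality is by $D^u \ge 1$. From the proof of Lemma 7, we have that $\mathbb{E}\bigl[\frac{1}{D^u}\bigr] \le \mathbb{E}\bigl[\frac{1}{D^l}\bigr] \le \frac{1}{\alpha} + \frac{1}{\alpha}\sqrt{\tau}\sigma$. Combining the above, we have
\[
\mathbb{E}\Bigl[\frac{D^l}{\sqrt{2D^u - D^l}}\Bigr] \ge -\sqrt{2\beta + 3\sqrt{\tau}\sigma - \alpha} + \frac{2}{\sqrt{\beta - \alpha + 2\sqrt{\tau}\sigma + \frac{1}{\alpha} + \frac{1}{\alpha}\sqrt{\tau}\sigma}} .
\]
Thus, we obtain the lower bound of $\tilde{w}_t^*$ in Equation (30).
In the following, we define $D_{+,t} = D_t \mid D_t \ge 1$. Consequently, we define $D_+^u = \max_{t=1,...,\tau}\{D_{+,t}\}$ and $D_+^l = \min_{t=1,...,\tau}\{D_{+,t}\}$. Thus, we have $\mathbb{E}[D_+^l] \ge \mathbb{E}[D^l] \ge \alpha - \sqrt{\tau}\sigma$. For $\mathbb{E}[D^u \mid D^l \ne 0]$, since $\mathbb{E}[D^u] = \mathbb{E}[D^u \mid D^l \ne 0]P(D^l \ne 0) + \mathbb{E}[D^u \mid D^l = 0]P(D^l = 0)$, we have
\[
\mathbb{E}[D^u \mid D^l \ne 0] = \frac{\mathbb{E}[D^u] - \mathbb{E}[D^u \mid D^l = 0]P(D^l = 0)}{P(D^l \ne 0)} \le \frac{\mathbb{E}[D^u]}{P(D^l \ne 0)} .
\]
Because $P(D^l \ne 0) \ge \max\{1 - \frac{\sigma^2}{\alpha^2}, \varepsilon_+\}$, $\mathbb{E}[D_+^u] = \mathbb{E}[D^u \mid D^l \ne 0] \le \frac{\beta + \sqrt{\tau}\sigma}{\max\{1 - \frac{\sigma^2}{\alpha^2}, \varepsilon_+\}}$. Similarly, $\mathbb{E}[(D_+^u)^2] \le \frac{\mathbb{E}[(D^u)^2]}{P(D^l \ne 0)} \le \frac{\beta^2 + \sigma^2 + \sqrt{\tau}\sigma_s}{\max\{1 - \frac{\sigma^2}{\alpha^2}, \varepsilon_+\}}$.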

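For the upper bound of $\tilde{w}_t$, we claim that $\mathbb{E}[\tilde{w}_t] \le \mathbb{E}\bigl[\frac{D^u}{D^l}\sqrt{\frac{2KD^u}{h}} \,\big|\, D^l \ne 0\bigr]$. The reasons are as follows: When the demand is zero on Day $t$, we can always increase the demand to one to increase the total demand within the scheduled cover period without extending the cover period. Therefore, when considering the maximum order quantity, we only need to consider the case where the demand is positive. Therefore, it suffices to bound $\mathbb{E}\bigl[\frac{D^u}{D^l}\sqrt{D^u} \,\big|\, D^l \ne 0\bigr]$. From the proof of Theorem 1, we have $\mathbb{E}\bigl[\frac{D^u}{D^l}\sqrt{D^u} \,\big|\, D^l \ne 0\bigr] \le \mathbb{E}[(D_+^u)^2 - (D_+^l)^2] + \mathbb{E}\bigl[\sqrt{D_+^l}\bigr]$. We have $\mathbb{E}\bigl[\sqrt{D_+^l}\bigr] \le \sqrt{\alpha}$. Thus, substituting the upper bound for $\mathbb{E}[D_+^u]$ and the lower bound for $\mathbb{E}[D_+^l]$, we have
\[
\mathbb{E}\Bigl[\frac{D^u}{D^l}\sqrt{D^u} \,\Big|\, D^l \ne 0\Bigr] \le \frac{\beta^2 + \sigma^2 + \sqrt{\tau}\sigma_s}{\max\{1 - \frac{\sigma^2}{\alpha^2}, \varepsilon_+\}} - \alpha^2 + \sqrt{\alpha} .
\]
Consequently, we obtain the upper bound of $\tilde{w}_t^*$ in Equation (31).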
Subsequently, to prove Theorem 2, we derive the following bound for the cost, given the reorder point
and the reorder quantity.
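Assuming Assumptions 1 and 3 hold, given reorder point $\tilde{s}_t$ and order quantity $\tilde{w}_t$, the daily average cost of the CAOD for one cover period is at most
\begin{align}
\mathbb{E}[\hat{C}(D_t, L_t, \tilde{q}_t) \mid X_t] \le{}& \frac{\tilde{w}_t h(\beta_+ - \alpha + 2\sqrt{\tau}\sigma + 1)}{2} + \frac{K(\beta + \sqrt{\tau}\sigma) + p(\beta^2 + \sigma^2 + 2\sqrt{\tau}\sigma_s - \alpha^2)}{\tilde{w}_t} \notag\\
&+ h\bigl(\tilde{s}_t(\beta_+ - \alpha + 2\sqrt{\tau}\sigma + 1) - v^l\beta\bigr) + \frac{(\beta^2 + \sigma^2 + \sqrt{\tau}\sigma_s)v^u p - \tilde{s}_t p\beta}{\tilde{w}_t} + \bar{\gamma}_{\sigma,\varepsilon}h(\tilde{w}_t^2 + \tilde{s}_t^2) \tag{32}
\end{align}
where $\bar{\gamma}_{\sigma,\varepsilon} = \frac{\sigma^2}{\alpha^2(1 - \varepsilon)}$, $\tau = \sqrt{\frac{2K}{h}}$, and $\beta_+ = \frac{\beta}{\max\{1 - \frac{\sigma^2}{\alpha^2}, \varepsilon_+\}}$.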
The proof of the inequality in Equation 32 is as follows: When compared with the case where all demands
are positive, some days with zero demand increase the holding cost, whereas the daily average lost sales
cost and the fixed ordering cost do not increase. Therefore, for the upper bound in the inequality in Equation
28, the second term for the daily average lost sales cost within the cover period and the fixed ordering cost
and the last term for the daily average lost within the VLT are still valid. Consequently, we only need to
modify the first and third terms pertaining to the holding cost.
For the first term, we divide the daily average holding cost within the cover period into two parts. The first part is the holding cost on the days with positive demand. The second part is the holding cost on the days with zero demand. For the former part, the daily average cost is at most $\mathbb{E}\bigl[\frac{\tilde{w}_t D_+^u h}{2D_+^l}\bigr]$. By the proof of the inequality in Equation (31), we have $\mathbb{E}[D_+] \le \frac{\mathbb{E}[D]}{P(D \ne 0)}$, and we know that $\beta \le \frac{\beta}{\max\{1 - \frac{\sigma^2}{\alpha^2}, \varepsilon_+\}}$. We also have $\mathbb{E}[D_{+,t}] \ge \alpha$. By the fact that $\mathrm{Var}(D_{+,t}) \le \mathrm{Var}(D_t)$ and $\mathrm{Var}(D_{+,t}^2) \le \mathrm{Var}(D_t^2)$, we have $\mathbb{E}\bigl[\frac{D_+^u}{D_+^l}\bigr] = \mathbb{E}\bigl[\frac{D_+^u - D_+^l}{D_+^l}\bigr] + 1 \le \mathbb{E}[D_+^u - D_+^l] + 1 \le \beta_+ - \alpha + 2\sqrt{\tau}\sigma + 1$. For the second part, the holding cost each day is at most $\tilde{w}_t h$ and the expectation of the length of days with zero demand is at most $\tilde{w}_t\bar{\gamma}_{\sigma,\varepsilon}$. Therefore, the expectation of the daily average holding cost on the days with zero demand is at most $\tilde{w}_t^2 h\bar{\gamma}_{\sigma,\varepsilon}$.
As above, we also divide the daily average holding cost within the VLT into two similar parts. Similarly, for the first part with positive demand, we can use the same concept to bound $\mathbb{E}\bigl[D^u h\frac{\tilde{s}_t}{D^l}\bigr]$. For the second part with zero demand, the expectation of the daily average cost is at most $\tilde{s}_t^2 h\bar{\gamma}_{\sigma,\varepsilon}$. Combining the aforementioned results yields the inequality in Equation (32).

Finally, to obtain the upper bound in Theorem 2, we consider the worst case and substitute $\tilde{s}_t$ and $\tilde{w}_t$ in Equation (32) with their upper or lower bounds to obtain the upper bound for each term in Equation (32). Particularly, for the terms pertaining to the holding cost, we substitute the upper bounds of $\tilde{w}_t$ and $\tilde{s}_t$. For the terms pertaining to the lost sales cost and the fixed ordering cost, we substitute the lower bounds of $\tilde{w}_t$ and $\tilde{s}_t$. For the fourth term in Equation (32), we use the fact that $\tilde{w}_t \ge 1$. Thus, we establish Theorem 2. □
Proof of Theorem 3 The concept of the proof is to first construct a distribution of the demand such that the daily average cost of the optimal policy is $\sqrt{2\bar{\alpha}Kh}$. It is inspired by the tight lower bound of a single-period newsvendor problem in Janakiraman et al. (2013).
Let the demand on each day be i.i.d. and follow a three-point distribution with masses on points $0$, $\bar{\alpha}$, and $B$ with probability $z_1$, $1 - z_1 - z_2$, and $z_2$, respectively. By the assumptions of the mean and the variance, we have
\begin{align*}
(1 - z_1 - z_2)\bar{\alpha} + Bz_2 &= \bar{\alpha} \\
(1 - z_1 - z_2)\bar{\alpha}^2 + B^2 z_2 &= \bar{\alpha}^2 + \bar{\sigma}^2
\end{align*}
By solving the above two equations, we have $z_1 = \frac{\bar{\sigma}^2}{B\bar{\alpha}}$, $1 - z_1 - z_2 = 1 - \frac{\bar{\sigma}^2}{\bar{\alpha}(B - \bar{\alpha})}$, and $z_2 = \frac{\bar{\sigma}^2}{B(B - \bar{\alpha})}$.
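The closed-form weights can be verified exactly with rational arithmetic. The sketch below, which uses hypothetical values $\bar{\alpha} = 4$, $\bar{\sigma}^2 = 9$, and $B = 100$ and an illustrative function name, confirms that the three probabilities sum to one and reproduce the required mean and second moment.

```python
from fractions import Fraction


def three_point_weights(mean, var, B):
    """Probabilities of the points 0, mean, and B in the three-point demand.

    Returns (z1, middle, z2) with middle = 1 - z1 - z2. Requires B > mean
    and B large enough that all three weights are nonnegative.
    """
    mean, var, B = Fraction(mean), Fraction(var), Fraction(B)
    z1 = var / (B * mean)
    z2 = var / (B * (B - mean))
    middle = 1 - var / (mean * (B - mean))
    return z1, middle, z2


mean, var, B = 4, 9, 100  # hypothetical alpha-bar, sigma-bar^2, and B
z1, middle, z2 = three_point_weights(mean, var, B)

first_moment = middle * mean + z2 * B            # should equal alpha-bar
second_moment = middle * mean ** 2 + z2 * B ** 2  # alpha-bar^2 + sigma-bar^2
```

As $B$ grows, both `z1` and `z2` shrink toward zero, so the distribution concentrates on $\bar{\alpha}$ while keeping its variance fixed, which is what the remainder of the proof exploits.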
Subsequently, we consider a policy that whenever the inventory level tends to zero, we place an order with size $Q_1 = \sqrt{\frac{2K\bar{\alpha}}{h}}$, which is the same size as in the EOQ model with demand $\bar{\alpha}$. In the following, we consider $B$ to be a sufficiently large number (larger than the order size); then the probability that the demand in one day is zero or $B$ is very small. Whenever the demand is $B$ on some day, the inventory approaches zero, and we place another order. Therefore, the expected daily average cost of this policy is at most
\[
\sqrt{2\bar{\alpha}Kh} + \frac{p\bar{\sigma}^2}{B - \bar{\alpha}} + \frac{2\bar{\sigma}^2 K}{B\bar{\alpha} - \bar{\sigma}^2} + \frac{\bar{\sigma}^2 K}{B(B - \bar{\alpha})}\sqrt{\frac{2K}{\bar{\alpha}h}} \tag{33}
\]
The first term of Equation (33) is the daily average cost of the EOQ model, and the second term is the upper bound for the extra lost sales cost, which is $z_2 Bp$. The third term is the upper bound for the extra holding cost, which can be derived as follows: Let $T_{extra}$ be the expected number of days between two consecutive days with a positive demand. Moreover, $T_{extra}$ follows a geometric distribution, and $T_{extra} = (1 - z_1)\sum_{j=1}^{\infty} jz_1^j = \frac{z_1}{1 - z_1}$. Therefore, the extra holding cost between two consecutive days with a positive demand is at most $QhT_{extra}$. By the EOQ model, the number of days with a positive demand is at most $\sqrt{\frac{2K}{\bar{\alpha}h}}$. Hence, the total extra holding cost is at most $QhT_{extra}\sqrt{\frac{2K}{\bar{\alpha}h}} = \frac{2\bar{\sigma}^2 K}{B\bar{\alpha} - \bar{\sigma}^2}$. The last term in Equation (33) is the upper bound for the expectation of the extra average fixed ordering cost, which can be derived as follows: The extra average fixed ordering cost can only occur when the demand is $B$ within the cover period, and the extra daily fixed ordering cost is at most $K$. The probability that the demand is $B$ within the cover period is at most $\sqrt{\frac{2K}{\bar{\alpha}h}}\,z_2$ by the union bound. Hence, the upper bound for the expectation of the extra fixed ordering cost is $\sqrt{\frac{2K}{\bar{\alpha}h}}\,z_2 K = \frac{\bar{\sigma}^2 K}{B(B - \bar{\alpha})}\sqrt{\frac{2K}{\bar{\alpha}h}}$.

In Equation (33), because $B$ can be arbitrarily large, when $B$ tends to infinity, the upper bound for the average cost of the policy reduces to the first term. Because this upper bound is also an upper bound for the cost of the optimal policy, we conclude that the cost of the optimal policy is at most $\sqrt{2\bar{\alpha}Kh}$ when $B$ tends to infinity. Thus, we prove that this bound is tight. □
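The limiting argument can be checked numerically. The following is a minimal sketch that evaluates the four terms of Equation (33) for increasing $B$; the function name and the parameter values are illustrative assumptions, not taken from the paper's experiments.

```python
import math

def policy_cost_bound(alpha_bar, sigma_bar, K, h, p, B):
    """Evaluate the upper bound in Equation (33).

    Assumes B > alpha_bar and B * alpha_bar > sigma_bar**2 so that every
    denominator is positive (B is meant to be very large).
    """
    cover_days = math.sqrt(2 * K / (alpha_bar * h))           # EOQ cover period
    eoq_cost = math.sqrt(2 * alpha_bar * K * h)                # first term
    extra_lost = p * sigma_bar**2 / (B - alpha_bar)            # second term
    extra_holding = 2 * sigma_bar**2 * K / (B * alpha_bar - sigma_bar**2)  # third term
    extra_ordering = sigma_bar**2 * K / (B * (B - alpha_bar)) * cover_days  # last term
    return eoq_cost + extra_lost + extra_holding + extra_ordering

# Hypothetical parameter values, chosen only for illustration.
params = dict(alpha_bar=10.0, sigma_bar=5.0, K=50.0, h=1.0, p=100.0)
eoq_only = math.sqrt(2 * params["alpha_bar"] * params["K"] * params["h"])
for B in (1e3, 1e6, 1e9):
    gap = policy_cost_bound(B=B, **params) - eoq_only
    print(f"B = {B:.0e}: bound exceeds the EOQ cost by {gap:.6f}")
```

Under these assumed values the gap to the EOQ cost shrinks as $B$ grows, which is the behavior the proof relies on.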
Proof of Claim 1 By Theorem 2 in Janakiraman et al. (2013), the holding cost and the lost sales cost on each day are at least $\frac{\bar{\sigma}^2}{\bar{\alpha}}$, for any inventory level. We ignore the average fixed ordering cost for the lower bound. □
Proof of Proposition 3 By one of Theorem 1 and Theorem 2, the daily average cost grows in the order of $o(\sigma^3 + \sigma_s)$. When Assumption 4 holds, this order becomes $o(\Gamma^{1.5}\sigma_D^3 + \Gamma^{1.5}\sigma_D^2) = o(\Gamma^{1.5}\sigma_D^3)$. In the setting of Claim 1, replacing $\Gamma$ with $\bar{\sigma}^{-2/3}$, we have that the daily average cost is at most $o(\bar{\sigma}^2)$. □
Proof of Proposition 4 See the proofs of Equations (27) in Theorem 1 and (31) in Theorem 2. 
Proof of Proposition 5 We prove Proposition 5 by contradiction. Suppose the cover period is larger than $\frac{p}{h}$; then consider Day $i$, where Day $i$ is more than $\frac{p}{h}$ days behind the beginning of the cover period. In this case, the holding cost for the demand on Day $i$ is more than $D_i h \cdot \frac{p}{h} = D_i p$. However, if the demand is lost on Day $i$, the lost sales cost is $D_i p$. Since the lost sales cost is smaller than the holding cost, it is cheaper to lose this demand than to cover it, which contradicts the optimality of the labeled decision. Hence, the cover period during the labeling process is no more than $\frac{p}{h}$. □

D Comparison of the Costs of CAOD Obtained by the E2E and E2E-NFC Approaches
By analyzing the CAOD, we compare our E2E framework with the E2E-NFC method, where the fixed
ordering cost is zero and the order timing is pre-specified. First, we note that it is unreasonable to compare
the performance of the E2E and E2E-NFC methods directly because in the latter, the order timing is fixed.
Thus, the total cost in the E2E-NFC framework eventually becomes larger than the total cost of the E2E method
as the fixed ordering cost increases. In addition, given the fixed ordering cost, when setting the length of
the review period in the E2E-NFC method, there is a trade-off between the holding cost and the sum of the
daily average fixed ordering cost and the lost sales cost. Thus, when comparing the E2E method with the
E2E-NFC method, we appropriately set the fixed ordering cost and the review period such that both policies
have approximately the same number of orders within the planning horizon. Claim 2 quantifies the averaged
optimal order quantity in the E2E-NFC framework in a special case. Claim 3 further compares the costs in
the E2E and E2E-NFC methods.

CLAIM 2. Let $w_t^{NFC}$ be the averaged optimal order quantity in the E2E-NFC method and $T_r$ be the length of the review period. Let the VLT be zero, and let the unit lost sales cost satisfy $\lfloor \frac{p}{h} \rfloor \ge T_r$. Then, $w_t^{NFC} = E[\sum_{i=t}^{t+T_r} D_i \mid X_t]$.

Proof of Claim 2 If the demand on Day $t$ is satisfied by the order that arrived on Day $t'$, it implies that the cumulative holding cost for the demand on Day $t$ is no larger than the lost sales cost of the demand on Day $t$. Furthermore, $hD_t(t-t') - pD_t \le 0$ is the sufficient and necessary condition that the demand on Day $t$ is satisfied by the order on Day $t'$ in the deterministic setting. When $D_t > 0$, $hD_t(t-t') - pD_t \le 0 \iff h(t-t') - p \le 0$, which is independent of $D_t$. Hence, the order quantity to satisfy the demand on Day $t$ in the CAOD is $E[D_t]$ if $t \le t' + \lfloor \frac{p}{h} \rfloor$, and 0 otherwise. Thus, we have $w_t^{NFC} = E[\sum_{i=t}^{\min\{t+T_r,\, t+\lfloor p/h \rfloor\}} D_i \mid X_t]$, and the upper limit equals $t+T_r$ because $\lfloor \frac{p}{h} \rfloor \ge T_r$. □
Claim 2 states that when the VLT is zero, in the CAOD, the order quantity is no more than the mean of the total demand within the review period, which is independent of the unit lost sales cost. Therefore, when the unit lost sales cost is large and the demand has a large variance, the E2E-NFC method has the risk of suffering a large lost sales cost. Note that the condition $\lfloor \frac{p}{h} \rfloor \ge T_r$ holds commonly in practice, because the review period is typically one to two weeks and $\lfloor \frac{p}{h} \rfloor$ is generally larger than 20. If the review period is set as one week, a nonnegligible portion of the demand within the week is lost in the E2E-NFC framework. Although the VLT is greater than zero in reality, this special case might explain the observation that although the E2E-NFC method shows excellent performance in production at JD.com, it occasionally has a risk of lost sales in practice. In contrast, our proposed E2E method leads to a much lower lost sales cost. In this special case, when the VLT is zero, because our E2E method sets the review period as one day and we have instantaneous replenishment, the lost sales can only occur at the end of the cover period. Hence, we obtain a much lower lost sales cost.
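As a small illustration of Claim 2, the sketch below sums the expected daily demand over the window $\min\{T_r, \lfloor p/h \rfloor\}$ used in the proof. The function name and the demand forecast are hypothetical; the VLT is assumed to be zero, as in the claim.

```python
import math

def nfc_caod_order_quantity(expected_daily_demand, review_period, p, h):
    """Order quantity E[sum_{i=t}^{t+min(T_r, floor(p/h))} D_i | X_t] with zero VLT.

    expected_daily_demand[k] is the (assumed) conditional mean of D_{t+k}.
    """
    horizon = min(review_period, math.floor(p / h))
    # The sum runs from i = t to i = t + horizon inclusive.
    return sum(expected_daily_demand[: horizon + 1])

# Hypothetical forecast: 2 units per day for 30 days.
forecast = [2.0] * 30
print(nfc_caod_order_quantity(forecast, review_period=7, p=100, h=1))
print(nfc_caod_order_quantity(forecast, review_period=7, p=1000, h=1))
print(nfc_caod_order_quantity(forecast, review_period=7, p=3, h=1))
```

The first two calls return the same quantity: once $\lfloor p/h \rfloor \ge T_r$, raising the unit lost sales cost does not change the order, which is the source of the lost sales risk discussed above.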
Moreover, we consider the following more general case, where the VLT is positive but fixed and the demand on each day is i.i.d. and follows a Poisson distribution. Claim 3 states that the difference between the lost sales costs per order achieved by the E2E and E2E-NFC methods increases in order $O(\sqrt{T_r})$, where $T_r$ is the length of the review period in the E2E-NFC method.

CLAIM 3. Let Assumption 1.G hold, the VLT be fixed, and the demand on each day be i.i.d. and follow a Poisson distribution. Let $T_r$ be the length of the review period in the E2E-NFC method, and $T_r \le \lfloor \frac{p}{h} \rfloor$. Let $C_{lost}^{E2E}$ and $C_{lost}^{E2E\text{-}NFC}$ denote the expected lost sales cost per order under the CAOD in the E2E and E2E-NFC methods, respectively. Then, $C_{lost}^{E2E\text{-}NFC} - C_{lost}^{E2E}$ increases in order $O(\sqrt{T_r})$.
E2E E2E−NFC
Proof of Claim 3 As $C_{lost}^{E2E}$ does not depend on the review period $T_r$ and $C_{lost}^{E2E\text{-}NFC}$ is an increasing function of $T_r$, when $T_r$ is larger than a certain value, the difference $C_{lost}^{E2E\text{-}NFC} - C_{lost}^{E2E}$ is positive. Then, to prove the claim, it suffices to prove that $C_{lost}^{E2E\text{-}NFC}$ increases in order $O(\sqrt{T_r})$. Let $l_t$ denote the VLT at time $t$; then the cover period for the order placed on Day $t$ starts on Day $t + l_t$ and ends on Day $t + T_r + l_{t+T_r}$. The total lost sales within one order is $\left(\sum_{i=t+l_t}^{t+T_r+l_{t+T_r}} D_i - o_t^{NFC} - w_t^{NFC}\right)^+$, where $o_t^{NFC}$ is the inventory level at the review point $t$ and $w_t^{NFC}$ is the order quantity in the CAOD in E2E-NFC. By the definition of the CAOD, we have $o_t^{NFC} + w_t^{NFC} = E\left[\sum_{i=t+l_t}^{t+T_r+l_{t+T_r}} D_i\right]$. Therefore, the lost sales per order are $E\left[\left(\sum_{i=t+l_t}^{t+T_r+l_{t+T_r}} D_i - E\left[\sum_{i=t+l_t}^{t+T_r+l_{t+T_r}} D_i\right]\right)^+\right]$.
Let $D_i$ be i.i.d. and follow a Poisson distribution; then $E[(\sum_{i=1}^n D_i - E[\sum_{i=1}^n D_i])^+]$ increases in order $O(\sqrt{n})$. To understand this, we first present the upper bound:

$$E\Big[\Big(\sum_{i=1}^n D_i - E\Big[\sum_{i=1}^n D_i\Big]\Big)^+\Big] \le E\Big[\Big|\sum_{i=1}^n D_i - E\Big[\sum_{i=1}^n D_i\Big]\Big|\Big] = E\Big[\sqrt{\Big(\sum_{i=1}^n D_i - E\Big[\sum_{i=1}^n D_i\Big]\Big)^2}\Big] \le \sqrt{E\Big[\Big(\sum_{i=1}^n D_i - E\Big[\sum_{i=1}^n D_i\Big]\Big)^2\Big]}.$$

Because
$E[(\sum_{i=1}^n D_i - E[\sum_{i=1}^n D_i])^2]$ is the variance of a sum of i.i.d. Poisson random variables, which increases in order $O(n)$, $E[(\sum_{i=1}^n D_i - E[\sum_{i=1}^n D_i])^+]$ increases at most in order $O(\sqrt{n})$. In the following, we provide the lower bound. Because $\sum_{i=1}^n D_i$ follows a Poisson distribution, whose skewness is positive, the right tail, $E[(\sum_{i=1}^n D_i - E[\sum_{i=1}^n D_i])^+]$, is at least as large as the left tail, $E[(\sum_{i=1}^n D_i - E[\sum_{i=1}^n D_i])^-]$. Thus, we have $E[(\sum_{i=1}^n D_i - E[\sum_{i=1}^n D_i])^+] \ge \frac{1}{2} E[|\sum_{i=1}^n D_i - E[\sum_{i=1}^n D_i]|]$. Concurrently, the ratio of the mean absolute deviation to the standard deviation of a Poisson distribution with parameter $\lambda_{Poi}$ is $\frac{2\lambda_{Poi}^{\lambda_{Poi}+0.5}}{e^{\lambda_{Poi}}\,\lambda_{Poi}!}$. When $\lambda_{Poi}$ is a positive integer, this ratio is minimized at $\lambda_{Poi} = 1$, where it is approximately 0.736. As the standard deviation increases in order $O(\sqrt{n})$ when $\lambda_{Poi}$ increases to $n\lambda_{Poi}$, $E[(\sum_{i=1}^n D_i - E[\sum_{i=1}^n D_i])^+]$ increases in order $O(\sqrt{n})$. Combining the above results, when $D_i$ is i.i.d. and follows a Poisson distribution, $C_{lost}^{E2E\text{-}NFC}$ increases in order $O(\sqrt{T_r})$. □
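The ratio used in the lower bound can be verified numerically. The following sketch compares the closed-form ratio with a direct summation of the Poisson mean absolute deviation; the helper names are assumptions introduced for illustration, and the direct sum truncates the (negligible) far tail.

```python
import math

def poisson_mad(lam, tail=200):
    """Mean absolute deviation E|X - lam| for X ~ Poisson(lam), by direct summation."""
    total = 0.0
    pmf = math.exp(-lam)  # P(X = 0)
    for k in range(int(lam) + tail):
        total += abs(k - lam) * pmf
        pmf *= lam / (k + 1)  # recurrence P(X = k+1) = P(X = k) * lam / (k+1)
    return total

def mad_to_sd_ratio(lam):
    """Closed form 2 * lam^(lam + 0.5) / (e^lam * lam!) for a positive integer lam."""
    return 2 * lam ** (lam + 0.5) / (math.exp(lam) * math.factorial(lam))

for lam in (1, 4, 16, 64):
    direct = poisson_mad(lam) / math.sqrt(lam)
    shortfall = poisson_mad(lam) / 2  # E[(X - lam)^+] equals half the MAD
    print(lam, round(mad_to_sd_ratio(lam), 4), round(direct, 4), round(shortfall, 4))
```

Because the ratio stays bounded away from zero, the expected shortfall $E[(X-\lambda)^+]$ grows proportionally to the standard deviation, that is, in order $\sqrt{n}$ when the mean is $n\lambda$.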

E Formulation of the Labeling Process


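To obtain the optimal order decisions in history, we solve the deterministic model in problem (P2), in which $d_t$ and $l_t$ are the realizations of the demand and VLT on Day $t$, respectively.

$$
\begin{aligned}
\min_{q_t,\, y_t,\, a_t,\, z_t} \quad & \sum_{t=-T_H}^{-1} (K a_t + h y_t + p z_t) && \text{(P2)} \\
\text{s.t.} \quad & y_t = \Big(y_{t-1} + \sum_{i \in \{j:\, j + l_j = t\}} q_i - d_t\Big)^+ && \forall\, t = 0, \dots, T \\
& z_t = \Big(d_t - \sum_{i \in \{j:\, j + l_j = t\}} q_i - y_{t-1}\Big)^+ && \forall\, t = 0, \dots, T \\
& q_t \le M a_t && \forall\, t = 0, \dots, T \\
& y_t, z_t, q_t \ge 0 && \forall\, t = 0, \dots, T \\
& a_t \in \{0, 1\} && \forall\, t = 0, \dots, T \\
& y_t, z_t, q_t \in \mathbb{Z}_+ && \forall\, t = 0, \dots, T
\end{aligned}
$$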
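To make the constraints of (P2) concrete, the sketch below evaluates the objective for a given order plan by rolling the inventory recursion forward. It is only a cost evaluator, not a solver for (P2); the function name, the zero initial inventory, and the toy data are assumptions for illustration.

```python
def plan_cost(q, d, l, K, h, p, y_init=0):
    """Total cost sum_t (K*a_t + h*y_t + p*z_t) of order plan q under realized d and l.

    q[t]: quantity ordered on day t, d[t]: realized demand, l[t]: realized VLT.
    Orders that would arrive after the last day are ignored in this sketch.
    """
    T = len(d)
    arrivals = [0] * T
    for j in range(T):
        if q[j] > 0 and j + l[j] < T:
            arrivals[j + l[j]] += q[j]        # orders i with i + l_i = t arrive on day t
    total, y_prev = 0, y_init
    for t in range(T):
        available = y_prev + arrivals[t]
        y = max(available - d[t], 0)          # end-of-day inventory y_t
        z = max(d[t] - available, 0)          # lost sales z_t
        a = 1 if q[t] > 0 else 0              # ordering indicator a_t
        total += K * a + h * y + p * z
        y_prev = y
    return total

# Toy instance: two days, zero lead time, K = 10, h = 1, p = 100.
print(plan_cost([5, 0], [2, 3], [0, 0], K=10, h=1, p=100))  # one order covering both days
print(plan_cost([2, 3], [2, 3], [0, 0], K=10, h=1, p=100))  # one order per day
print(plan_cost([0, 0], [2, 3], [0, 0], K=10, h=1, p=100))  # never order
```

In this toy instance a single order is cheaper than ordering every day, because the holding cost of carrying three units for one day is smaller than a second fixed ordering cost.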

F Supplementary Materials for Section 5.6


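Proposition 6 provides the bounds for the order quantity with the general unit penalty cost.

PROPOSITION 6. Set $E = \sqrt{\frac{2K}{h}} + v_u + \frac{K}{h}$. For the fast-moving items, given the fixed ordering cost as $K$, the output order quantity of CAOD for one order can be bounded as follows.

\begin{align}
\frac{1}{\sqrt{2\beta - 2\alpha + 4\sqrt{\tau}\sigma + \frac{1}{\alpha} + \frac{1}{\alpha}\sqrt{\tau}\sigma}} \sqrt{\frac{2K}{h}} \le \tilde{w}_t^* \le \min\Big\{ & \Big(\sqrt{\beta^2 - \alpha^2 + \sigma^2 + 2\sqrt{\tau}\sigma_s} + \alpha\Big)\sqrt{\frac{2K}{h}}, \tag{34} \\
& \frac{p(\beta + \sqrt{\tau}\sigma)}{h} \Big\}. \tag{35}
\end{align}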

For the slow-moving items, given the fixed ordering cost as K, the order quantity for one order can be
bounded as follows.


 q √ 2 r 2K
− 2β + 3 τσ − α+ q √ √ ≤ w̃t∗ (36)
1
β − α + 2 τσ + α + α τσ1 h
 β2 + σ2 + √τσ r
√  2K
s 2
≤ min{ 2 −α + α , (37)
max{1 − ασ2 , ε+ } h

p(β + τσ)
} (38)
h

Based on Proposition 6, we have the upper bounds of the daily average cost with general unit penalty
cost in Theorems 4 and 5.
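THEOREM 4. Let Assumptions 1 (except 1.E) and 2 hold, and set $E = \sqrt{\frac{2K}{h}} + v_u$. Then, the daily average cost of CAOD for one cover period is at most:

$$
\begin{aligned}
E[\hat{C}(D_t, L_t, \tilde{q}_t) \mid X_t] \le{} & \frac{\tilde{w}_{t,2}\, h(\beta - \alpha + 2\sqrt{\tau}\sigma + 1)}{2} + \frac{K(\beta + \sqrt{\tau}\sigma) + p(\beta^2 + \sigma^2 + 2\sqrt{\tau}\sigma_s - \alpha^2)}{\tilde{w}_{t,1}} \\
& + h\big(\tilde{s}_{t,2}(\beta - \alpha + 2\sqrt{\tau}\sigma + 1) - v_l \beta\big) + (\beta^2 + \sigma^2 + \sqrt{\tau}\sigma_s) v_u p - \tilde{s}_{t,1}\, p \beta,
\end{aligned}
$$

where $\alpha$, $\beta$, $\sigma$, and $\sigma_s$ depend on $X_t$ in Assumption 1, $\tilde{s}_{t,1} = \frac{\alpha \bar{l}_t}{1 + \sqrt{v_u}\sigma}$, $\tilde{s}_{t,2} = \bar{l}_t(\beta + v_u \sigma)$, $\tilde{w}_{t,1} = \frac{1}{\sqrt{2\beta - 2\alpha + 4\sqrt{\tau}\sigma + \frac{1}{\alpha} + \frac{1}{\alpha}\sqrt{\tau}\sigma}} \sqrt{\frac{2K}{h}}$, and $\tilde{w}_{t,2} = \min\Big\{\Big(\sqrt{\beta^2 - \alpha^2 + \sigma^2 + 2\sqrt{\tau}\sigma_s} + \alpha\Big)\sqrt{\frac{2K}{h}},\ \frac{p(\beta + \sqrt{\tau}\sigma)}{h}\Big\}$.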

T HEOREM 5. Let Assumptions 1 (except 1.E) and 3 hold, then the daily average cost of the CAOD for one
cover period is at most
√ √ √
w̃t,2 h(β+ − α + 2 τσ + 1) K(β + τσ) + p(β2 + σ2 + 2 τσs − α2 )
E[Ĉ (Dt , Lt , q̃t )|Xt ] ≤ + +
2 w̃t,1
√  √
h s̃t,2 (β+ − α + 2 τσ + 1) − vl β + (β2 + σ2 + τσs )vu p − s̃t,1 pβ + γ̄σ,ε h(w̃t,2
2 2
+ s̃t,2 )

2
q
¯ √ 
σ
, where γ̄σ,ε = α2 (1−ε) ,E= 2K
h
+ vu + Kh , β+ = β
2 , s̃t,1 = √ αlt , s̃
1+ vu σ+γ̄σ,ε α t,2
= l¯t (β + vu σ), w̃t,1 = −
max{1− σ2 ,ε+ }
α
p √ q  2 2 √
β +σ + τσs √ q 2K p(β+√τσ)
2β + 3 τσ − α + √ √
2

2K
h
, and w̃t,2 = min{ 2 − α2 + α h
, h
}.
β−α+2 τσ+ α1 + α1 τσ max{1− σ2 ,ε+ }
α
α, β, σ, and σs depend on Xt in Assumption 1.

The proofs of Proposition 6 and Theorems 4 and 5 follow the earlier proofs directly, replacing the upper bounds of the length of the cover period with the new upper bounds in Proposition 5.

G Challenges for Designing Asymptotically Optimal Loss Function in E2E Training


The difficulty in designing a loss function that results in the optimal decisions is caused by two main factors. First, the distribution information of the uncertainty is partially lost in the optimal solutions for each scenario. For example, in a multiperiod newsvendor problem with a random VLT, given the optimal order quantity and the order timing, we cannot recover the value of the VLT. Furthermore, given the total demand for one order, we cannot recover the demand on each day. Consequently, the optimal decisions for each scenario are insufficient statistics for the overall optimal decisions. Second, given the uncertainty, closed-form solutions are unavailable for the general case; for example, there is no closed-form solution for a multiperiod newsvendor problem with random VLT and fixed ordering cost. Thus, although in some cases the optimal decisions for each scenario are sufficient statistics for the overall optimal decisions, the exact form of the loss function is unavailable.
In contrast, the MSE loss function performs very well in practice. Therefore, using the MSE as the loss
function and evaluating the bounds for the CAOD are of interest for the proposed E2E framework.

H Sensitivity Analysis
In this section, we present the sensitivity analysis results of the E2E framework, which demonstrate its robustness. We randomly sample 200 out of the 2000 SKUs to reduce the computation time. The sensitivity analysis is performed on two types of parameters in the E2E model. The first type comprises the cost parameters: unit lost sales cost, unit holding cost, and fixed ordering cost. The second type comprises the hyper-parameters of the neural network: number of hidden layers, weight parameters in the objective function, activation functions, and learning rate during training.

H.1 Cost parameters


For this part, we fix the unit holding cost as one and change the fixed ordering cost and the unit lost sales cost
to perform the sensitivity analysis. Note that these parameters depend on the categories of the products in
practice. The unit lost sales cost can be interpreted as the unit sales price in practice, and the fixed ordering
cost controls the frequency of orders. Because in the E2E-NFC benchmark, the review period is set as one
week, the sensitivity analysis of the fixed ordering cost demonstrates the advantage of the E2E method over
the E2E-NFC framework.
The results in Figure 4 show that the performance of our E2E approach is robust when the cost parameters vary in a reasonable range, as listed in Table 3.

Parameter                  Default   Range
Lost sales cost per unit   100       {50, 75, 100, 125, 150}
Fixed ordering cost        50        {50, 75, 100, 125, 150}
Table 3 Sensitivity analysis of cost


Figure 4 Sensitivity analysis of the unit lost sales cost and fixed ordering cost (panels (a) and (b))

In Figure 4, the E2E method yields the lowest total cost, whereas the (r, Q) policy shows the highest total cost. The cost breakdown is shown in Figures 5 and 6, which are consistent with the conclusion that our E2E framework mainly reduces the lost sales cost. The third subplot in Figure 5 also shows that the total fixed ordering cost with the E2E method increases gradually compared to those with the other benchmarks. Figure 6 shows that the lost sales cost is the main cause of the increase in the total cost when the unit lost sales cost increases. This is because the setting of our experiments satisfies Assumption 1.E in Section 5, and thus, the labeled results remain the same when the unit lost sales cost increases. The variations between the fixed ordering costs and holding costs of the E2E and E2E-NFC methods are due to the randomness in the training process.

Figure 5 Detailed results for the sensitivity analysis of the fixed ordering cost (panels (a) to (c))


Figure 6 Detailed results for the sensitivity analysis of the unit lost sales cost (panels (a) to (c))

H.2 Hyper-parameters
In this section, we perform the sensitivity analysis on the following hyper-parameters in the model. The
ranges of parameters are shown in Table 4.

Hyper-parameter                         Default   Range
Learning rate                           0.001     {0.01, 0.001, 0.0001}
Hidden layer                            100       {10, 50, 100}
Activation                              Linear    {Linear, ReLU}
Weight (λ1, λ2) in objective function   0.001     {0.1, 0.01, 0.001}
Table 4 Ranges for the sensitivity analysis

The results for the test period are summarized in Tables 5–8. We also attach the costs of the unchanged (r, Q) policy and PTO policy as a reference in each table.

Policy     Total cost   Holding cost   Lost sales cost   Fixed ordering cost
ReLU       3096.47      386.47         2542.00           168.00
Linear     4091.21      343.46         3587.00           160.75
E2E-NFC    5852.59      452.03         5259.82           140.75
(r, Q)     5588.84      193.09         5280.50           115.25
PTO        5631.43      250.72         5187.55           193.17
Table 5 Sensitivity analysis on the activation function

Policy       Total cost   Holding cost   Lost sales cost   Fixed ordering cost
10 layers    3749.70      269.95         3327.50           152.25
50 layers    3096.47      386.47         2542.00           168.00
100 layers   3154.33      366.08         2618.50           169.75
E2E-NFC      5852.59      452.03         5259.82           140.75
(r, Q)       5588.84      193.09         5280.50           115.25
PTO          5631.43      250.72         5187.55           193.17
Table 6 Sensitivity analysis on the number of hidden layers


Learning rate   Total cost   Holding cost   Lost sales cost   Fixed ordering cost
0.01            2271.36      467.86         1612.00           191.50
0.001           3096.47      386.47         2542.00           168.00
0.0001          3438.61      317.61         2967.00           154.00
E2E-NFC         5852.59      452.03         5259.82           140.75
(r, Q)          5588.84      193.09         5280.50           115.25
PTO             5631.43      250.72         5187.55           193.17
Table 7 Sensitivity analysis on the learning rate

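The objective function of the training process is defined as:

$$\min_{\vartheta} \sum_{t \in T_I} \big\{ \lambda_1 \mathcal{L}(\text{Out1}, l_t) + \lambda_2 \mathcal{L}(\text{Out2}, r_t^*) + \lambda_3 \mathcal{L}(\text{Out3}, o_t^*) + \mathcal{L}(\text{Out4}, w_t^*) \big\} \tag{39}$$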
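The weighted objective in Equation (39) can be sketched in a few lines. The example below assumes a squared-error loss for $\mathcal{L}$ (in line with the MSE loss discussed in Appendix G) and represents each training sample as a pair of four outputs and four labels; the function names and the sample values are hypothetical, and no deep learning library is used.

```python
def squared_error(pred, target):
    """Per-sample loss L; the squared error is an assumption of this sketch."""
    return (pred - target) ** 2

def e2e_objective(samples, lam1, lam2, lam3, loss=squared_error):
    """Sum over samples of lam1*L(Out1, l) + lam2*L(Out2, r*) + lam3*L(Out3, o*) + L(Out4, w*).

    Each sample is (outputs, labels), both ordered as (Out1, Out2, Out3, Out4)
    and (l_t, r_t*, o_t*, w_t*).
    """
    total = 0.0
    for outputs, labels in samples:
        total += (lam1 * loss(outputs[0], labels[0])
                  + lam2 * loss(outputs[1], labels[1])
                  + lam3 * loss(outputs[2], labels[2])
                  + loss(outputs[3], labels[3]))   # final output has weight 1
    return total

# Hypothetical sample: network outputs versus labels from the labeling process.
samples = [((1.0, 2.0, 3.0, 4.0), (0.0, 0.0, 0.0, 0.0))]
print(e2e_objective(samples, lam1=0.1, lam2=0.1, lam3=0.1))
print(e2e_objective(samples, lam1=0.0, lam2=0.0, lam3=0.0))
```

Setting the weights to zero leaves only the loss on the final order quantity, which corresponds to the regime in which the intermediate outputs no longer guide training.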

Value of λ1, λ2   Total cost   Holding cost   Lost sales cost   Fixed ordering cost
0.1               3096.47      386.47         2542.00           168.00
0.01              2935.45      416.45         2357.50           161.50
0.001             3772.25      346.00         3272.50           153.75
E2E-NFC           5852.59      452.03         5259.82           140.75
(r, Q)            5588.84      193.09         5280.50           115.25
PTO               5631.43      250.72         5187.55           193.17
Table 8 Sensitivity analysis on the weights

Tables 5–7 show that when we change the number of layers, learning rate, and activation function, our
E2E framework significantly reduces the total cost (particularly the lost sales cost) compared to the E2E-
NFC method and other benchmarks. It is worth mentioning that the costs from the benchmarks are slightly
different from the values in Table 1 because we sample 200 SKUs out of 2000 SKUs for the sensitivity
analysis. Table 8 shows that when the weight parameters take their smallest value (0.001), i.e., when the weights of intermediate outputs Out1 and Out2 in Figure 1 are close to zero, the E2E method shows its worst performance among the tested weights (total cost 3772.25). This supports the usefulness of the intermediate outputs in our network structure, as shown in Figure 1.
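To express the comparison in relative terms, the sketch below computes the percentage reduction in total cost of an E2E configuration relative to each benchmark, using the total costs reported in Tables 5–8. The helper name is an assumption; the E2E value used is the row that recurs in all four tables (total cost 3096.47).

```python
def pct_reduction(baseline, candidate):
    """Percentage by which `candidate` total cost is below `baseline` total cost."""
    return 100.0 * (baseline - candidate) / baseline

# Total costs copied from Tables 5-8.
benchmarks = {"E2E-NFC": 5852.59, "(r, Q)": 5588.84, "PTO": 5631.43}
e2e_total = 3096.47     # E2E row shared by Tables 5-8
worst_e2e = 4091.21     # largest E2E total cost in Tables 5-8 (linear activation)

for name, cost in benchmarks.items():
    print(f"{name}: {pct_reduction(cost, e2e_total):.1f}% lower with E2E, "
          f"{pct_reduction(cost, worst_e2e):.1f}% lower with the worst E2E variant")
```

Even the weakest E2E configuration in these tables has a lower total cost than every benchmark, which is consistent with the robustness claim above.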
