
Available online at www.sciencedirect.com

ScienceDirect

IFAC PapersOnLine 56-2 (2023) 1797–1802

A first attempt to enhance Demand-Driven Material Requirements Planning through reinforcement learning ⋆

Youssef Lahrichi ∗, David Damand ∗, Marc Barth ∗, Stéphane Mornay ∗∗

∗ HuManis laboratory, EM Strasbourg Business School, 61 avenue de la Forêt Noire, F-67000, Strasbourg, France (e-mail: youssef.lahrichi.contact@gmail.com).
∗∗ Business Support Department, FM Logistic Corporate SAS, France
Abstract: Production planning methods, which are meant to schedule production orders efficiently to meet customer demands, offer the possibility to fix some parameters to better fit a specific industrial context. These parameters can be fixed either according to the user's prior knowledge, or according to decision support tools. Optimization algorithms, which require prior knowledge of customer demand, are most commonly used for decision-support. Fewer decision-support tools based on reinforcement learning and requiring no prior knowledge of customer demand have also been suggested in recent years to parameterize the most common production planning systems. This paper investigates such a reinforcement-learning approach to parameterize Demand-Driven Materials Requirements Planning (DDMRP), which is a recent production planning method proven to be successful in avoiding stockouts while minimizing the on-hand inventory. Although approximate and exact optimization methods have been suggested in the literature to parameterize DDMRP, a reinforcement learning approach remains to be investigated. We suggest a SARSA (State–action–reward–state–action) algorithm for the problem and test it on a randomly-generated instance. The first results are promising in regards to the use of a reinforcement learning approach for the dynamic parameterization of DDMRP.

Copyright © 2023 The Authors. This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/)

Keywords: Demand-Driven MRP, Reinforcement learning, SARSA, Inventory, On-Time Delivery.
1. INTRODUCTION

Today more than ever, enterprises are forced to maintain an on-hand inventory (stock) to cope with the uncertain nature of demand dictated by the VUCA environment Bourne (2021). To replenish the stock, a successful production planning method schedules replenishment orders in such a way that two contradictory goals are achieved. The first one is to avoid stockouts (service level KPI). The second goal is to minimize stock (cost KPI) (Karakutuk and Ornek (2022)).

Demand-Driven Material Requirements Planning is a recent production planning method suggested by Ptak and Smith (2011) about a decade ago. Within the BOM (Bill of Materials), which defines the product structure in terms of components, DDMRP defines components (or references) to be protected by a buffer stock. All remaining references are managed through a JIT philosophy (Just-In-Time). To minimize inventory costs, it is recommended to buffer only references which are critical in terms of lead time Ptak and Smith (2016). Positioning the buffers within the BOM constitutes the first step of DDMRP.

The second step of DDMRP is concerned with determining the buffer sizes. Within DDMRP, the buffers are constituted of 3 superposed zones (see Fig. 1).

Fig. 1. The three buffer stock zones in DDMRP

The red zone accounts for the safety zone. It is dimensioned as follows:

Red zone := ADU × DLT × FLT ("Red base") + ADU × DLT × FLT × FV ("Red safety")

where ADU stands for the Average Daily Usage and DLT for the Decoupled Lead Time, which is the lead time considering all sub-components are available without delay.

⋆ The authors acknowledge the support received from the industrial chair FM logistic.

2405-8963 Copyright © 2023 The Authors. This is an open access article under the CC BY-NC-ND license.
Peer review under responsibility of International Federation of Automatic Control.
10.1016/j.ifacol.2023.10.1892

FLT, the lead time factor, and FV, the variability factor, are two numerical factors to be fixed by the manager according to the DLT and the variability of demand (respectively). The red zone is the equivalent of the safety stock in classical production planning methods.

The yellow zone is dimensioned as follows:

Yellow zone := ADU × DLT

The yellow zone is somehow equivalent to the average demand that cannot be caught up by a replenishment order. Indeed, a replenishment order is executed within a delay equivalent to the DLT.

Finally, the green zone is dimensioned as follows:

Green zone := Max{ADU × DLT × FLT, MOQ, ADU × Ho}

MOQ stands for the Minimum Order Quantity and Ho for the order cycle, in other words, the duration separating two orders. For the sake of simplification, we do not consider these two parameters in the present study. We consider in the remainder of the paper that Green zone := ADU × DLT × FLT.
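To make these sizing rules concrete, the following sketch computes the three zones for one reference using the simplified formulas above. It is an illustration rather than the authors' implementation: the class and variable names are ours, and the thresholds TOR, TOY and TOG (tops of the red, yellow and green zones) are obtained by stacking the zones, which is the usual DDMRP convention assumed here.

```java
// Minimal sketch (not the authors' code): buffer sizing for one reference,
// following the simplified formulas of this paper (MOQ and order cycle ignored).
public class BufferSizing {

    /** Returns {red, yellow, green, TOR, TOY, TOG}. */
    public static double[] size(double adu, double dlt, double fLT, double fV) {
        double redBase   = adu * dlt * fLT;      // "Red base"
        double redSafety = redBase * fV;         // "Red safety" = Red base * F_V
        double red       = redBase + redSafety;  // Red zone (safety zone)
        double yellow    = adu * dlt;            // Yellow zone: average demand over one DLT
        double green     = adu * dlt * fLT;      // Green zone (simplified form used in this paper)

        // Usual DDMRP convention (assumed here): the zones are stacked on top of
        // each other, giving the thresholds Top Of Red / Yellow / Green.
        double tor = red;
        double toy = tor + yellow;
        double tog = toy + green;
        return new double[] { red, yellow, green, tor, toy, tog };
    }

    public static void main(String[] args) {
        // Illustrative values only, not taken from the paper.
        double[] b = size(20.0, 5.0, 0.6, 0.5);
        System.out.printf("Red=%.1f Yellow=%.1f Green=%.1f TOG=%.1f%n", b[0], b[1], b[2], b[5]);
    }
}
```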
The third step of DDMRP is concerned with buffer size adjustment in order to account for unforeseen changes.

The fourth step of DDMRP is concerned with the generation of replenishment orders. This is achieved by examining the level of the net flow = on-hand inventory + on-order inventory − qualified demand. The qualified demand is the demand of the day added to the demands exceeding some peak threshold. This threshold, denoted Tpeak, is to be determined by the manager according to the specific industrial context. An order of size TOG − Net flow is generated if Net flow ≤ TOY.
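A sketch of this order-generation rule is given below. It is not the authors' code: the names are ours, and the set of future demands scanned for spikes is assumed to be the demand visible to the planner, since the paper does not detail the spike horizon.

```java
// Minimal sketch (not the authors' code) of the DDMRP order-generation rule:
// net flow = on-hand + on-order - qualified demand, and an order of size
// TOG - net flow is launched when net flow <= TOY.
public class Replenishment {

    /** Qualified demand: today's demand plus the visible demands exceeding the peak threshold. */
    public static double qualifiedDemand(double[] visibleDemand, double tPeak) {
        double qualified = visibleDemand[0];          // demand of the day
        for (int d = 1; d < visibleDemand.length; d++) {
            if (visibleDemand[d] > tPeak) {           // demand spike beyond the threshold
                qualified += visibleDemand[d];
            }
        }
        return qualified;
    }

    /** Replenishment quantity to order today (0 if no order is needed). */
    public static double orderQuantity(double onHand, double onOrder,
                                       double[] visibleDemand, double tPeak,
                                       double toy, double tog) {
        double netFlow = onHand + onOrder - qualifiedDemand(visibleDemand, tPeak);
        return (netFlow <= toy) ? (tog - netFlow) : 0.0;
    }

    public static void main(String[] args) {
        double[] visibleDemand = { 25, 10, 60, 15, 20 };  // illustrative values only
        double qty = orderQuantity(80, 40, visibleDemand, 50, 200, 320);
        System.out.println("Order quantity: " + qty);     // 0 when the net flow is above TOY
    }
}
```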
The last step of DDMRP is concerned with the execution, including priority management between replenishment orders.

DDMRP relies on some parameters to be fixed by the user, including the lead time factor, the variability factor and the peak threshold. One natural question arises: how to fix these parameters automatically while optimising the KPIs of interest?
A few authors addressed this question by means of optimization, assuming the demand data is known in advance over the whole planning horizon. In this paper, we try to address the question by means of reinforcement learning, assuming the demand data is known only some days in advance.

DDMRP's related literature is briefly summarized in the following section. The problem statement is given in section 3. The proposed reinforcement learning-based resolution approach is described in section 4. Key results from the experiments are given in section 5. The contribution of the paper is summarized in section 6 together with future perspectives of research.

2. LITERATURE OVERVIEW

Demand-Driven Materials Requirements Planning has been first introduced in Ptak and Smith (2011) and then in Ptak and Smith (2016) as a viable alternative to MRP-2, which relies too much on forecasts, or Kanban, which can lead to major stockouts and important lead times. Many researchers conducted comparative studies to evaluate the relevance of DDMRP in comparison with other production planning methods or to evaluate the applicability of DDMRP in a VUCA industrial context (see Table 1: comparison with other systems and practical applications).

Some other researchers have considered the problem of determining where to position the buffer stocks within the bill of materials. This question has been addressed through optimization (see Table 1: buffer positioning studies).

Decision-support tools for parameterization have been studied only recently, exclusively by means of optimization. A comprehensive study is conducted in Damand et al. (2022), including all possible parameters of DDMRP, which are eight in number. The authors consider two objectives simultaneously, namely the minimization of the on-hand inventory and the maximisation of the On-Time Delivery (OTD), which is the percentage of demands delivered without delay to the customer. The authors assume that the demand is known (forecast or historic data) over a planning horizon of one year. Fronts of non-dominated solutions are obtained by means of a multi-objective algorithm (NSGA-II).

In Lahrichi et al. (2022), the problem is tackled by means of a mathematical model (MILP). The OTD is fixed to 100% by means of a constraint and the optimal solution minimizing the average stock is computed using a commercial solver. Data sets from Damand et al. (2022) are truncated to 100 days in order to test the approach. Optimal solutions were obtained within a few seconds.

All optimization methods assume prior knowledge of demand over the planning horizon through forecasting. There is no prior attempt to design an algorithm (agent) that parameterizes DDMRP dynamically, assuming demand is known only a few days in advance and not over the whole planning horizon. This paper tries to overcome this gap through reinforcement learning.
Table 1. Literature review

Paper | Seminal books & literature reviews | Comparison with other systems and practical applications | Buffer positioning studies | Decision-support for parameterization: Optimization | Decision-support for parameterization: Learning
Azzamouri et al. (2021) | ✓ | | | |
Bahu et al. (2019) | ✓ | | | |
El Marzougui et al. (2020) | ✓ | | | |
Pekarčíková et al. (2019) | ✓ | | | |
Ptak and Smith (2011) | ✓ | | | |
Ptak and Smith (2016) | ✓ | | | |
Bayard and Grimaud (2018) | | ✓ | | |
Dessevre et al. (2019) | | ✓ | | |
Favaretto and Marin (2018) | | ✓ | | |
Ihme (2015) | | ✓ | | |
Kortabarria et al. (2018) | | ✓ | | |
Martin (2020) | | ✓ | | |
Miclo et al. (2016) | | ✓ | | |
Miclo et al. (2019) | | ✓ | | |
Shofa and Widyarto (2017) | | ✓ | | |
Shofa et al. (2018) | | ✓ | | |
Thürer et al. (2020) | | ✓ | | |
Velasco Acosta et al. (2020) | | ✓ | | |
Abdelhalim et al. (2021) | | | ✓ | |
Jiang and Rim (2016) | | | ✓ | |
Jiang and Rim (2017) | | | ✓ | |
Rim et al. (2014) | | | ✓ | |
Damand et al. (2022) | | | | ✓ |
Lahrichi et al. (2022) | | | | ✓ |
Lee and Rim (2019) | | | | ✓ |
Miclo (2016) | | | | ✓ |
This paper | | | | | ✓

3. PROBLEM STATEMENT

In order to state the problem, the main issue is to determine the horizon over which the demand is known; we call it the time step in the remainder of the paper. The parameters are fixed over the time step and can be updated from one time step to the following time step. On the one hand, the time step must be as small as possible to be the least dependent on data and forecasting. On the other hand, the step must be large enough to take into account the inertia of the system Esteso et al. (2022), Lee and Lee (2022). Enough time should be given to the learning agent to observe the outcome of its parameterization action. A replenishment order needs at least DLT to be delivered. However, fixing the time step to the DLT may seem too short since it will account only for the order launched during the first day of the time step. We fix the time step to 2 × DLT, which gives just enough time to observe the outcome of the parameterization during one DLT.

The elements of the considered problem are summarized below:

• Data/Input:
  · The demand over a planning horizon of one year (however, on day t, the agent has access to the demand only up to day t + 2 × DLT)
  · The Average Daily Usage (ADU), which is the average demand
• Decision variables:
  · FV : the variability factor
  · FLT : the lead time factor
  · Tspike : the order spike threshold
• Constraints:
  · Ensure a minimum OTD
  · Follow a DDMRP replenishment policy
• Objective:
  · Minimizing the average stock
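To illustrate how these elements fit together, the sketch below shows only the rolling structure: the agent fixes (FV, FLT, Tspike) once per time step of 2 × DLT days, sees the demand up to day t + 2 × DLT, and receives a feedback signal afterwards (the negative on-hand inventory, as defined in the next section). The DdmrpSimulator and Agent types are hypothetical placeholders introduced for the sketch; they are not defined in the paper.

```java
// Minimal sketch (not the authors' code) of the rolling parameterization loop.
public class RollingParameterization {

    public static void runEpisode(double[] demand, int dlt,
                                  DdmrpSimulator sim, Agent agent) {
        int timeStep = 2 * dlt;                                   // length of one decision period
        for (int t = 0; t < demand.length; t += timeStep) {
            int visibleEnd = Math.min(t + timeStep, demand.length);
            double[] visibleDemand =                              // demand known on day t
                    java.util.Arrays.copyOfRange(demand, t, visibleEnd);

            double[] state  = agent.observeState(sim, visibleDemand);
            double[] action = agent.chooseAction(state);          // (F_V, F_LT, T_spike)

            sim.setParameters(action[0], action[1], action[2]);
            sim.simulateDays(visibleDemand);                      // DDMRP replenishment policy

            double reward = -sim.onHandInventory();               // feedback signal (see Section 4)
            agent.update(state, action, reward);
        }
    }

    /** Hypothetical interfaces, only to make the sketch self-contained. */
    public interface DdmrpSimulator {
        void setParameters(double fV, double fLT, double tSpike);
        void simulateDays(double[] demand);
        double onHandInventory();
    }

    public interface Agent {
        double[] observeState(DdmrpSimulator sim, double[] visibleDemand);
        double[] chooseAction(double[] state);
        void update(double[] state, double[] action, double reward);
    }
}
```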
4. RESOLUTION APPROACH

Reinforcement learning differs from classical machine learning mainly on two levels Sutton and Barto (2018):

• Within reinforcement learning, there is no clear distinction between the training phase and the test phase. The model can be exploited from the start to make decisions.
• Within reinforcement learning, the decision to be taken is sequential and evaluative feedback is given to the agent at each time step.

A Markov Decision Process (MDP) is a general tool which is often used to represent sequential decision making by means of reinforcement learning. An MDP is given by a set of states S defining the environment, a set of actions A to be taken by the agent, a reward set R within ℝ and the environment dynamics p. Within state St, the agent takes action At, which makes the environment move to the next state St+1. The agent also receives a numerical reward Rt+1, which is a feedback on the action taken by the agent (see Figure 2). The dynamics function defines the distribution of next states and rewards based on the action taken from a given state. The agent tries to maximize not only the reward received immediately but also the sum of rewards received over time (called the return).

Within state s, an action a is evaluated thanks to the optimal state-action value function q. q(s, a) gives the best average return obtained from state s if action a were taken. If the state-action value function were known, it would suffice to choose the action maximizing q (i.e., on state s choose a∗ = argmax_a q(s, a)). Solving a reinforcement learning problem comes down to approximating the state-action value function.

There are two types of reinforcement learning methods:

• Function approximation methods: these methods assume that the state-action value function can be approximated by some parameterized function. The parameter of the approximating function is usually computed by means of gradient descent methods (Lu et al. (2021)).
• Tabular methods: these methods work differently. They entirely depend on the experience of the agent.

Fig. 2. The interaction between the agent and the environment in a Markov decision process.

This experience is stored in a table Q, where Q(s, a) is the average return observed over the experience of the agent with action a from state s. When choosing an action a from s based on Q, a new reward is observed, which leads to updating Q.

Different tabular methods have different ways of updating the table Q. Among these methods, the SARSA algorithm stood out with competitive experimental results (Sutton and Barto (2018), Xu et al. (2018)). SARSA uses the following rule to update the state-action value function:

Q(St, At) := Q(St, At) + α · [Rt+1 + γ · Q(St+1, At+1) − Q(St, At)]
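Transcribed into code, the update rule and a simple action-selection scheme might look as follows. This is a sketch under our own naming: the Q-table indexing assumes discretized states and actions, and the ε-greedy exploration is our assumption, since the paper only states that the action maximizing Q is preferred. The α and γ fields correspond to the values reported later in Table 2.

```java
// Minimal sketch (not the authors' code) of the tabular SARSA update
//   Q(S_t, A_t) := Q(S_t, A_t) + alpha * [R_{t+1} + gamma * Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t)]
public class Sarsa {

    private final double[][] q;    // Q-table: q[state][action]
    private final double alpha;    // learning rate (0.5 in Table 2)
    private final double gamma;    // discount factor (1 in Table 2)
    private final java.util.Random rng = new java.util.Random();

    public Sarsa(int nStates, int nActions, double alpha, double gamma) {
        this.q = new double[nStates][nActions];
        this.alpha = alpha;
        this.gamma = gamma;
    }

    /** SARSA update: uses the action actually taken in the next state. */
    public void update(int s, int a, double reward, int sNext, int aNext) {
        q[s][a] += alpha * (reward + gamma * q[sNext][aNext] - q[s][a]);
    }

    /** Epsilon-greedy choice over the row q[s][.] (exploration scheme assumed, not from the paper). */
    public int chooseAction(int s, double epsilon) {
        if (rng.nextDouble() < epsilon) {
            return rng.nextInt(q[s].length);           // explore: random action
        }
        int best = 0;
        for (int a = 1; a < q[s].length; a++) {        // exploit: action maximizing Q
            if (q[s][a] > q[s][best]) best = a;
        }
        return best;
    }
}
```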
The designed algorithm learns three parameters simultaneously: FV, the variability factor; FLT, the lead time factor; and Tspike, the order spike threshold. A different table is used for each parameter: QV, QLT, Qspike.

QV(sV, FV) gives the average reward for fixing the variability factor to FV from state sV. State sV must efficiently encode the relevant information needed to fix FV. We fix sV as follows:

sV := ⌊γV × (std. dev. of demand over the time step)²⌋

QLT(sLT, FLT) gives the average reward for fixing the lead time factor to FLT from state sLT. State sLT must efficiently encode the relevant information needed to fix FLT. We fix sLT as follows:

sLT := ⌊γLT × (On-hand inventory + On-order inventory) / (Avg. demand over the time step × DLT)⌋

Qspike(sspike, Tspike) gives the average reward for fixing the order spike threshold to Tspike from state sspike. We fix sspike as follows:

sspike := ⌊γspike × (Average demand over the time step) / TOR⌋

The reward given to the agent is the negative of the on-hand inventory obtained at the beginning of the time step.
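A sketch of how these encodings could be computed from the simulated quantities is given below. The function names are ours, integer truncation stands in for the floor operation, and the squared standard deviation in sV follows our reading of the formula above.

```java
// Minimal sketch (not the authors' code) of the three state encodings used by the agent.
public class StateEncoding {

    /** s^V : depends on the dispersion of demand over the time step. */
    public static long stateV(double[] demandOverTimeStep, double gammaV) {
        double mean = mean(demandOverTimeStep);
        double var = 0.0;
        for (double d : demandOverTimeStep) {
            var += (d - mean) * (d - mean);
        }
        var /= demandOverTimeStep.length;
        return (long) (gammaV * var);                  // gamma_V * (std dev)^2
    }

    /** s^LT : inventory position relative to the average demand over one DLT. */
    public static long stateLT(double onHand, double onOrder,
                               double[] demandOverTimeStep, double dlt, double gammaLT) {
        double avgDemand = mean(demandOverTimeStep);
        return (long) (gammaLT * (onHand + onOrder) / (avgDemand * dlt));
    }

    /** s^spike : average demand over the time step relative to the top of the red zone (TOR). */
    public static long stateSpike(double[] demandOverTimeStep, double tor, double gammaSpike) {
        return (long) (gammaSpike * mean(demandOverTimeStep) / tor);
    }

    private static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) sum += v;
        return sum / values.length;
    }
}
```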
5. COMPUTATIONAL EXPERIMENTS

We give in this section some first insights on computational experiments. We implemented the reinforcement learning approach on a computer equipped with an M1 chip (16 GB of RAM). The JAVA 8 programming language was used.

For this preliminary experiment, we generated a data set representing demand data over a planning horizon of one year. The data instance is derived from Damand et al. (2022). The SARSA algorithm was tuned using a trial-and-error method (see Table 2).

Table 2. Algorithm tuning

Parameter | Value
α | 0.5
γ | 1
γV | 100
γLT | 2000
γspike | 10

To evaluate the ability of the agent to learn from experience, the algorithm was executed on the same instance for several episodes in a row. Figure 3 shows that as the episodes progress, the agent only finds itself in states that it has already visited, which allows it to use its experience to choose a good action. This validates that the encoding used can encompass all the states of the system.

Table 3 gives an example of a sequence followed by the agent (trajectory): State 1, Action 1, State 2, Action 2, ...

Table 3. Trajectory containing 16 states (FLT, FV and Tspike are the parameter values chosen by the agent; S^LT, S^V and S^spike describe the state)

Time step | FLT | FV | Tspike | S^LT | S^V | S^spike
10 | 37 | 89 | 75 | 4000 | 9343 | 23410
11 | 34 | 27 | 24 | 4142 | 8856 | 26631
12 | 47 | 6 | 27 | 5131 | 9346 | 22650
13 | 96 | 25 | 90 | 4126 | 8676 | 26194
14 | 41 | 48 | 31 | 3359 | 8950 | 43763
15 | 14 | 15 | 53 | 3315 | 10342 | 43832
16 | 15 | 14 | 70 | 4794 | 9713 | 17089
17 | 36 | 70 | 37 | 4072 | 9382 | 32646
18 | 25 | 30 | 70 | 2810 | 8771 | 115022
19 | 18 | 51 | 26 | 2645 | 9460 | 116806
20 | 29 | 69 | 60 | 3855 | 9690 | 33432
21 | 40 | 18 | 100 | 2797 | 9144 | 59407
22 | 52 | 45 | 31 | 2775 | 9909 | 76973
23 | 51 | 51 | 28 | 3511 | 9605 | 41381
24 | 27 | 0 | 89 | 3387 | 8949 | 40032
25 | 95 | 62 | 43 | 3657 | 9713 | 27198

Fig. 3. Evolution of experience usage for the agent

Fig. 4. Evolution of on-hand inventory

Figure 4 shows the evolution of the average on-hand inventory as the episodes progress (learning curve). The agent first yields a solution with 6634 in stock before lowering this value to 4336 through experience. A steady state is achieved. The OTD remains constant. This validates that the encoding of the states contains enough information to make a good parameterization decision.

6. CONCLUSION AND PERSPECTIVES

We study in this paper the automatic parameterization of Demand-Driven MRP through reinforcement learning. A novel encoding of states is suggested and used alongside the SARSA algorithm. A data set is generated to test the suggested approach. The first results are encouraging and invite us to further study the suggested method.

As a perspective research direction, more experiments could be carried out to evaluate the algorithm. For example, the algorithm could be trained on larger data sets and then tested and compared with the latest approaches from the literature, such as Lahrichi et al. (2022) and Damand et al. (2022).

REFERENCES
Abdelhalim, A., Hamid, A., and Tiente, H. (2021). Optimisation of the automated buffer positioning model under ddmrp logic. IFAC-PapersOnLine, 54(1), 582–588.
Azzamouri, A., Baptiste, P., Dessevre, G., and Pellerin, R. (2021). Demand driven material requirements planning (ddmrp): A systematic review and classification. Journal of Industrial Engineering and Management, 14(3), 439–456.
Bahu, B., Bironneau, L., and Hovelaque, V. (2019). Compréhension du ddmrp et de son adoption: premiers éléments empiriques. Logistique & Management, 27(1), 20–32.
Bayard, S. and Grimaud, F. (2018). Enjeux financiers de ddmrp: Une approche simulatoire. In 12e Conférence Internationale de Modélisation, Optimisation et SIMulation (MOSIM'18).
Bourne, M. (2021). Performance measurement and management in a vuca world.
Damand, D., Lahrichi, Y., and Barth, M. (2022). Parameterisation of demand-driven material requirements planning: a multi-objective genetic algorithm. International Journal of Production Research, 1–22.
Dessevre, G., Martin, G., Baptiste, P., Lamothe, J., Pellerin, R., and Lauras, M. (2019). Decoupled lead time in finite capacity flowshop: a feedback loop approach. In 2019 International Conference on Industrial Engineering and Systems Management (IESM), 1–6. IEEE.
El Marzougui, M., Messaoudi, N., Dachry, W., Sarir, H., and Bensassi, B. (2020). Demand driven mrp: literature review and research issues. In 13ème Conférence internationale de modélisation, optimisation et simulation (MOSIM2020), 12–14 Nov 2020, Agadir, Maroc.
Esteso, A., Peidro, D., Mula, J., and Díaz-Madroñero, M. (2022). Reinforcement learning applied to production planning and control. International Journal of Production Research, 1–18.
Favaretto, D. and Marin, A. (2018). An empirical comparison study between ddmrp and mrp in material management. Department of Management, Università Ca' Foscari Venezia Working Paper, (15).
Ihme, M. (2015). Interpreting and applying demand driven MRP: a case study. Ph.D. thesis, Nottingham Trent University.
Jiang, J. and Rim, S.C. (2016). Strategic inventory positioning in bom with multiple parents using asr lead time. Mathematical Problems in Engineering, 2016.
Jiang, J. and Rim, S.C. (2017). Strategic wip inventory positioning for make-to-order production with stochastic processing times. Mathematical Problems in Engineering, 2017.
Karakutuk, S.S. and Ornek, M.A. (2022). A goal programming approach to lean production system implementation. Journal of the Operational Research Society, 1–14.
Kortabarria, A., Apaolaza, U., Lizarralde, A., and Amorrortu, I. (2018). Material management without forecasting: From mrp to demand driven mrp. Journal of Industrial Engineering and Management, 11(4), 632–650.
Lahrichi, Y., Damand, D., and Barth, M. (2022). A first milp model for the parameterization of demand-driven mrp. Computers & Industrial Engineering, 108769.
Lee, C.J. and Rim, S.C. (2019). A mathematical safety stock model for ddmrp inventory replenishment. Mathematical Problems in Engineering, 2019.
Lee, Y.H. and Lee, S. (2022). Deep reinforcement learning based scheduling within production plan in semiconductor fabrication. Expert Systems with Applications, 191, 116222.
Lu, S., Zhang, K., Chen, T., Başar, T., and Horesh, L. (2021). Decentralized policy gradient descent ascent for safe multi-agent reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, 8767–8775.
Martin, G. (2020). Contrôle dynamique du Demand Driven Sales and Operations Planning. Ph.D. thesis, Ecole des Mines d'Albi-Carmaux.
Miclo, R. (2016). Challenging the "Demand Driven MRP" Promises: a Discrete Event Simulation Approach. Ph.D. thesis, Ecole des Mines d'Albi-Carmaux.
Miclo, R., Fontanili, F., Lauras, M., Lamothe, J., and Milian, B. (2016). An empirical comparison of mrpii and demand-driven mrp. IFAC-PapersOnLine, 49(12), 1725–1730.
Miclo, R., Lauras, M., Fontanili, F., Lamothe, J., and Melnyk, S.A. (2019). Demand driven mrp: assessment of a new approach to materials management. International Journal of Production Research, 57(1), 166–181.
Pekarčíková, M., Trebuňa, P., Kliment, M., and Trojan, J. (2019). Demand driven material requirements planning. Some methodical and practical comments. Management and Production Engineering Review, 10, 50–59.
Ptak, C. and Smith, C. (2011). Orlicky's Material Requirements Planning. McGraw-Hill Education.
Ptak, C. and Smith, C. (2016). Demand Driven Material Requirements Planning (DDMRP): Version 2. McGraw-Hill Education.
Rim, S.C., Jiang, J., and Lee, C.J. (2014). Strategic inventory positioning for mto manufacturing using asr lead time. In Logistics Operations, Supply Chain Management and Sustainability, 441–456. Springer.
Shofa, M.J., Moeis, A.O., and Restiana, N. (2018). Effective production planning for purchased part under long lead time and uncertain demand: Mrp vs demand-driven mrp. In IOP Conference Series: Materials Science and Engineering, volume 337, 012055. IOP Publishing.
Shofa, M.J. and Widyarto, W.O. (2017). Effective production control in an automotive industry: Mrp vs. demand-driven mrp. In AIP Conference Proceedings, volume 1855, 020004. AIP Publishing LLC.
Sutton, R.S. and Barto, A.G. (2018). Reinforcement Learning: An Introduction. The MIT Press, second edition. URL http://incompleteideas.net/book/the-book-2nd.html.
Thürer, M., Fernandes, N.O., and Stevenson, M. (2020). Production planning and control in multi-stage assembly systems: an assessment of kanban, mrp, opt (dbr) and ddmrp by simulation. International Journal of Production Research, 1–15.
Velasco Acosta, A.P., Mascle, C., and Baptiste, P. (2020). Applicability of demand-driven mrp in a complex manufacturing environment. International Journal of Production Research, 58(14), 4233–4245.
Xu, Z.x., Cao, L., Chen, X.l., Li, C.x., Zhang, Y.l., and Lai, J. (2018). Deep reinforcement learning with sarsa and q-learning: a hybrid approach. IEICE Transactions on Information and Systems, 101(9), 2315–2322.
