Neural Networks Reinforcement Learning

The document discusses reinforcement learning and the role of neural networks in reinforcement learning. It provides an introduction to reinforcement learning, describes Q-learning as a reinforcement learning algorithm, and discusses how neural networks can be used with Q-learning when there are large numbers of state-action pairs. Specifically, it explains that neural networks can be used to store and approximate Q-factors, which are updated incrementally during reinforcement learning. The document provides examples of how a simple neural network can be combined with Q-learning.


Missouri S & T gosavia@mst.edu

NEURAL NETWORKS AND REINFORCEMENT LEARNING

Abhijit Gosavi

Department of Engineering Management and Systems Engineering
Missouri University of Science and Technology
Rolla, MO 65409


Outline

• A Quick Introduction to Reinforcement Learning
• The Role of Neural Networks in Reinforcement Learning
• Some Algorithms
• The Success Stories and the Failures
• Some Online Demos
• Future of Neural Networks and Reinforcement Learning


What is Reinforcement Learning?

• Reinforcement Learning (RL) is a technique useful in solving
control optimization problems.
• By control optimization, we mean the problem of recognizing the
best action in every state visited by the system so as to optimize
some objective function, e.g., the average reward per unit time
or the total discounted reward over a given time horizon.
• Typically, RL is used when the system has a very large number
of states (>> 1000) and a complex stochastic structure that is
not amenable to closed-form analysis.
• When problems have a relatively small number of states and the
underlying random structure is relatively simple, one can use
dynamic programming.


Q-Learning

• The central idea in Q-Learning is to recognize or learn the
optimal action in every state visited by the system (also called
the optimal policy) via trial and error.
• The trial-and-error mechanism can be implemented within the
real-world system (commonly seen in robotics) or within a
simulator (commonly seen in management science / industrial
engineering).


Q-Learning: Working Mechanism

• The agent chooses an action, obtains feedback for that action,
and uses the feedback to update its database.
• In its database, the agent keeps a so-called Q-factor for every
state-action pair. When the feedback for selecting an action in a
state is positive, the associated Q-factor's value is increased,
while if the feedback is negative, the value is decreased.
• The feedback consists of the immediate revenue or reward plus
the value of the next state.


[Figure 1 diagram: the RL algorithm (agent) sends action a to the
simulator (environment), which returns feedback r(i, a, j) to the agent.]

Figure 1: Trial and error mechanism of RL. The action selected by the RL
agent is fed into the simulator. The simulator simulates the action, and the
resultant feedback obtained is fed back into the knowledge-base (Q-factors)
of the agent. The agent uses the RL algorithm to update its knowledge-base,
becomes smarter in the process, and then selects a better action.


Q-Learning: Feedback

• The immediate reward is denoted by r(i, a, j), where i is the
current state, a the action chosen in the current state, and j the
next state.
• The value of any state is given by the maximum Q-factor in that
state. Thus, if there are two actions in each state, the value of a
state is the maximum of the two Q-factors for that state.
• In mathematical terms:
$$\text{feedback} = r(i, a, j) + \lambda \max_{b} Q(j, b),$$
where λ is the discount factor, which discounts the values of
future states. Usually, λ = 1/(1 + R), where R is the rate of
discounting.
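
The feedback formula above can be sketched in a few lines of Python. This is only an illustration: the function names and the numbers in the usage lines are assumptions, not part of the original slides.

```python
def discount_factor(rate):
    """Return lambda = 1 / (1 + R) for a rate of discounting R."""
    return 1.0 / (1.0 + rate)


def feedback(reward, q_next, lam):
    """Immediate reward plus the discounted value of the next state.

    q_next holds the Q-factors Q(j, b) of the next state j, one per action b;
    the value of state j is the largest of them.
    """
    return reward + lam * max(q_next)


# Hypothetical numbers: R = 0.25 gives lambda = 0.8.
lam = discount_factor(0.25)
fb = feedback(reward=6.0, q_next=[10.0, 20.0], lam=lam)
```

Here the next state's value is max(10, 20) = 20, so the feedback is 6 + 0.8 × 20 = 22.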


Q-Learning: Algorithm

The core of the Q-Learning algorithm uses the following updating
equation:
$$Q(i, a) \leftarrow (1 - \alpha)\, Q(i, a) + \alpha\, [\text{feedback}],$$
i.e.,
$$Q(i, a) \leftarrow (1 - \alpha)\, Q(i, a) + \alpha \left[ r(i, a, j) + \lambda \max_{b} Q(j, b) \right],$$
where α is the learning rate (or step size).
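
As a minimal sketch of this update when the Q-factors are stored in a lookup table, the following Python function applies one step. The dictionary layout, the function name, and the sample numbers are assumptions made for illustration.

```python
def q_update(Q, i, a, r, j, alpha, lam):
    """Apply one Q-Learning update to the table Q in place.

    Q maps each state to a list of Q-factors, one per action.
    """
    target = r + lam * max(Q[j])                     # the feedback term
    Q[i][a] = (1.0 - alpha) * Q[i][a] + alpha * target
    return Q[i][a]


# Hypothetical table with two states (0, 1) and two actions (0, 1).
Q = {0: [0.0, 0.0], 1: [5.0, 10.0]}
updated = q_update(Q, i=0, a=1, r=2.0, j=1, alpha=0.5, lam=0.8)
```

With these numbers the feedback is 2 + 0.8 × 10 = 10, so Q(0, 1) moves halfway from 0 to 10 and becomes 5. Only the visited pair (i, a) changes; every other entry is left as it was.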


Q-Learning: Where Do We Need Neural Networks?

• When we have a very large number of state-action pairs, it is not
feasible to store every Q-factor separately.
• Then, it makes sense to store the Q-factors for a given action
within one neural network.
• When a Q-factor is needed, it is fetched from its neural network.
• When a Q-factor is to be updated, the new Q-factor is used to
update the neural network itself.
• For any given action, Q(i, a) is a function of i, the state. Hence,
we will call it a Q-function in what follows.


Incremental or Batch?

• Neural networks are generally of two types: batch updating or
incremental updating.
• Batch-updating neural networks require all the data at once,
while incremental neural networks take one piece of data at a
time.
• For reinforcement learning, we need incremental neural networks
since every time the agent receives feedback, we obtain a new
piece of data that must be used to update some neural network.


Neurons and Backpropagation

• Neurons are used for fitting linear forms, e.g., y = a + bi, where i
is the input (the state in our case). The associated update is also
called the adaline rule or Widrow-Hoff rule.
• Backprop is used when the Q-factor is non-linear in i, which is
usually the case. (The algorithm was invented by Paul Werbos in
1975.)
• A network trained with backprop is a universal function
approximator, and ideally should fit any Q-function!
• Neurons can also be used to fit the Q-function in a piecewise
manner, where a linear fit is introduced in every piece.


Algorithm for Incremental Neuron

Step 1: Initialize the weights of the neural network.

Step 2a: Compute the output using
$$\text{output} = \sum_{j=0}^{k} w(j)\, x(j),$$
where w(j) is the jth weight of the neuron and x(j) is the jth input.

Step 2b: Update each w(i) for i = 0, 1, . . . , k using:
$$w(i) \leftarrow w(i) + \mu\, [\text{target} - \text{output}]\, x(i),$$
where the target is the updated Q-factor.

Step 3: Increment iter by 1. If iter < itermax, return to Step 2a.
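
The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the weights start at zero, x(0) = 1 serves as the bias input, the samples are presented cyclically, and the data are hypothetical points on the line y = 2 + 3x rather than Q-factors.

```python
def train_neuron(samples, k, mu=0.05, iter_max=5000):
    """Fit output = sum_j w(j) x(j) with the incremental update rule.

    samples is a list of (x, target) pairs, where x has k + 1 entries and
    x[0] = 1 acts as the bias input. One sample is used per iteration.
    """
    w = [0.0] * (k + 1)                                   # Step 1
    for it in range(iter_max):                            # Step 3 loop
        x, target = samples[it % len(samples)]
        output = sum(w[j] * x[j] for j in range(k + 1))   # Step 2a
        for i in range(k + 1):                            # Step 2b
            w[i] += mu * (target - output) * x[i]
    return w


# Hypothetical data generated from y = 2 + 3x.
data = [([1.0, x], 2.0 + 3.0 * x) for x in (0.0, 0.5, 1.0, 1.5, 2.0)]
weights = train_neuron(data, k=1)
```

Because the data are noise-free and exactly linear, the weights should approach the intercept 2 and the slope 3. In the RL setting the target would instead be the updated Q-factor, and each new piece of feedback would supply one sample.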


Q-Learning combined with Neuron

We now discuss a simple example of Q-Learning coupled with a
neuron using incremental updating on an MDP with two states and
two actions.

Step 1. Initialize the weights of the neuron for action 1, i.e., w(1, 1)
and w(2, 1), to small random numbers, and set the corresponding
weights for action 2 to the same values. Set k, the number of
state transitions, to 0. Start system simulation at any arbitrary
state. Set $k_{max}$ to a large number.

Step 2. Let the current state be i. Select action a with probability
1/|A(i)| and simulate it. Let the next state be j.

Step 3. Evaluate the Q-factor for the state-action pair (i, a), which we
will call $Q_{old}$, using the following:
$$Q_{old} = w(1, a) + w(2, a)\, i.$$
Now evaluate the Q-factor for state j associated with each action,
i.e.,
$$Q_{next}(1) = w(1, 1) + w(2, 1)\, j; \qquad Q_{next}(2) = w(1, 2) + w(2, 2)\, j.$$
Now set $Q_{next} = \max\{Q_{next}(1), Q_{next}(2)\}$.

Step 3a. Update the relevant Q-factor as follows (via Q-Learning):
$$Q_{new} \leftarrow (1 - \alpha)\, Q_{old} + \alpha \left[ r(i, a, j) + \lambda Q_{next} \right]. \tag{1}$$

Step 3b. This step may itself contain a number of steps and
involves the neural network updating. Set m = 0, where m is
the number of iterations used within the neural network. Set
$m_{max}$, the maximum number of iterations for neuronal updating,
to a suitable value (we will discuss this value below).

Step 3b(i). Update the weights of the neuron associated with action a
as follows:
$$w(1, a) \leftarrow w(1, a) + \mu\, (Q_{new} - Q_{old}) \cdot 1; \qquad w(2, a) \leftarrow w(2, a) + \mu\, (Q_{new} - Q_{old})\, i. \tag{2}$$

Step 3b(ii). Increment m by 1. If $m < m_{max}$, return to Step 3b(i);
otherwise, go to Step 4.

Step 4. Increment k by 1. If $k < k_{max}$, set i ← j; then go to Step 2.
Otherwise, go to Step 5.

Step 5. The policy learned, $\hat{\mu}$, is virtually stored in the weights. To
determine the action prescribed in a state i, where i ∈ S, compute
the following:
$$\hat{\mu}(i) \in \arg\max_{a \in A(i)} \left[ w(1, a) + w(2, a)\, i \right].$$
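
A minimal Python sketch of Steps 1 to 5 follows. Everything numerical in it is an assumption made for illustration: the transition probabilities, the rewards, the step sizes, the discount factor, and the seed are hypothetical and do not come from the slides. The sketch also recomputes the neuron's output on every inner pass, which is one reading of Step 3b and makes no difference when m_max = 1.

```python
import random

# Hypothetical MDP data, assumed for illustration (not from the slides).
# P[a][i][j] is the transition probability and R[a][i][j] the reward
# r(i, a, j), for states i, j in {1, 2} and actions a in {1, 2}.
P = {1: {1: {1: 0.7, 2: 0.3}, 2: {1: 0.4, 2: 0.6}},
     2: {1: {1: 0.9, 2: 0.1}, 2: {1: 0.2, 2: 0.8}}}
R = {1: {1: {1: 0.0, 2: 0.0}, 2: {1: 10.0, 2: 10.0}},
     2: {1: {1: 10.0, 2: 10.0}, 2: {1: 0.0, 2: 0.0}}}
ACTIONS = (1, 2)


def q_value(w, a, i):
    """Neuron output for action a in state i: w(1, a) + w(2, a) * i."""
    return w[a][0] + w[a][1] * i


def q_learning_with_neuron(k_max=50000, alpha=0.5, mu=0.1, lam=0.5,
                           m_max=1, seed=0):
    rng = random.Random(seed)
    # Step 1: small random weights, initially shared by both actions.
    init = [rng.uniform(-0.01, 0.01), rng.uniform(-0.01, 0.01)]
    w = {a: list(init) for a in ACTIONS}
    i = 1
    for _ in range(k_max):
        # Step 2: pick each action with probability 1/|A(i)|, then simulate.
        a = rng.choice(ACTIONS)
        j = 1 if rng.random() < P[a][i][1] else 2
        # Steps 3 and 3a: Q_old, Q_next, and the Q-Learning value Q_new.
        q_old = q_value(w, a, i)
        q_next = max(q_value(w, b, j) for b in ACTIONS)
        q_new = (1 - alpha) * q_old + alpha * (R[a][i][j] + lam * q_next)
        # Step 3b: neuron updates toward Q_new (output recomputed per pass).
        for _ in range(m_max):
            err = q_new - q_value(w, a, i)
            w[a][0] += mu * err * 1
            w[a][1] += mu * err * i
        i = j                                             # Step 4
    return w


def greedy_policy(w):
    """Step 5: the action maximizing w(1, a) + w(2, a) * i in each state."""
    return {i: max(ACTIONS, key=lambda a: q_value(w, a, i)) for i in (1, 2)}


weights = q_learning_with_neuron()
policy = greedy_policy(weights)
```

In this assumed problem the better action earns 10 in either state, so with λ = 0.5 every state has value 10 / (1 − 0.5) = 20, and the greedy policy should pick action 2 in state 1 and action 1 in state 2. With only two states, a line in i can represent each action's Q-function exactly; larger problems would need the piecewise or non-linear fits discussed elsewhere in these slides.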


Some important remarks need to be made regarding the algorithm
above.

Remark 1. Note that the update in Equation (2) is the update used
by an incremental neuron that seeks to store the Q-factors for a given
action.

Remark 2. The step size µ is the step size of the neuron, and it can
also be decayed with every iteration m.


Backpropagation

• The algorithm is more complicated, since it involves multiple
layers and the threshold functions.
• The algorithm requires significant tuning.
• An incremental version of the algorithm must be used.


Integration of neural networks with Q-Learning

• In some sense, this resembles the integration of hardware (neural
network) and software (Q-Learning).
• Integration has to be tightly controlled; otherwise, results can be
disappointing.
• Success with backprop has been mixed: Crites and Barto (1996;
elevator case study), Das et al. (1999; preventive maintenance), and
Sui (supply chains) obtained great success with it, but some
failures have also been reported (Sutton and Barto's 1998 textbook).
• Piecewise fitting of the function using neurons has also shown
robust and stable behavior: Gosavi (2004; airline revenue
management).


Toy Problem: Two states and Two actions

Table 1: Q-factors obtained by each method.

Method                               Q(1, 1)   Q(1, 2)   Q(2, 1)   Q(2, 2)
Q-factor value iteration             44.84     53.02     51.87     49.28
Q-Learning with α = 150/(300 + n)    44.40     52.97     51.84     46.63
Q-Learning with Neuron               43.90     51.90     51.54     49.26
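
For reference, Q-factor value iteration (the first row of the table) can be sketched as below. The slides do not list the toy problem's transition probabilities, rewards, or discount factor, so the data here are an assumption; with these assumed values and λ = 0.8, the iteration should settle close to the first row of Table 1. Indices are 0-based in the code, so `Q[0][1]` corresponds to Q(1, 2).

```python
def q_value_iteration(P, R, lam, tol=1e-9, max_iter=10000):
    """Iterate Q(i,a) <- sum_j p(i,a,j) [r(i,a,j) + lam * max_b Q(j,b)].

    P[a][i][j] and R[a][i][j] are nested lists indexed from 0.
    """
    n_actions, n_states = len(P), len(P[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(max_iter):
        V = [max(row) for row in Q]                  # value of each state
        Q_new = [[sum(P[a][i][j] * (R[a][i][j] + lam * V[j])
                      for j in range(n_states))
                  for a in range(n_actions)]
                 for i in range(n_states)]
        change = max(abs(Q_new[i][a] - Q[i][a])
                     for i in range(n_states) for a in range(n_actions))
        Q = Q_new
        if change < tol:                             # stop once Q stabilizes
            break
    return Q


# Assumed problem data (not given in the slides).
P = [[[0.7, 0.3], [0.4, 0.6]],        # action 1
     [[0.9, 0.1], [0.2, 0.8]]]        # action 2
R = [[[6.0, -5.0], [7.0, 12.0]],      # action 1
     [[10.0, 17.0], [-14.0, 13.0]]]   # action 2
Q = q_value_iteration(P, R, lam=0.8)
```

Unlike Q-Learning, this method needs the transition probabilities explicitly, which is why it serves as the benchmark for the two simulation-based rows.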


Future Directions

• A great deal of current research in function approximation and
RL uses regression, e.g., algorithms such as LSTD (least
squares temporal differences).
• However, most exciting RL applications in robotics and
neuroscience studies still use neural networks.
• Support Vector Machines are yet another data mining tool that
has seen some recent applications in RL.


References

D. Bertsekas and J. Tsitsiklis. Neuro-Dynamic Programming. Athena
Scientific, 1996.

R. Crites and A. Barto. Improving elevator performance using
reinforcement learning. In Neural Information Processing Systems
(NIPS), 1996.

T. K. Das, A. Gosavi, S. Mahadevan, and N. Marchalleck. Solving
semi-Markov decision problems using average reward reinforcement
learning. Management Science, 45(4):560-574, 1999.

A. Gosavi. Simulation-Based Optimization: Parametric Optimization
Techniques and Reinforcement Learning. Kluwer Academic Publishers,
2009.

A. Gosavi, N. Bandla, and T. K. Das. A reinforcement learning
approach to a single leg airline revenue management problem with
multiple fare classes and overbooking. IIE Transactions (Special
Issue on Large-Scale Optimization, edited by Suvrajeet Sen),
34(9):729-742, 2002.

R. Sutton and A. Barto. Reinforcement Learning: An Introduction.
MIT Press, 1998.
