Lec-1-17 (Z - Merged)
• A more detailed title of the course could be “An Operations Research Perspective on Manufacturing
Industry.”
• The major focus of this course is on how to handle uncertainty in manufacturing and industrial systems.
• Venue and lecture timings: Online; Tues, Wed, Fri: 9 AM – 9.50 AM.
• Tentative grade calculation policy: Minor: 30%; Major: 40%; Assignments: 20%; Term Paper: 10%
• Textbook: Operations Research: An Introduction, 10th edition. Author: Hamdy A. Taha.
• Supplementary textbook: Factory Physics, 3rd edition. Authors: W.J. Hopp and M.L. Spearman.
• Planned Topics in Module 1
❖Inventory systems
❖Materials requirement planning
❖Decision analysis for industrial scenarios
❖Introduction to bandit model
❖Queueing systems
Inventory Systems
• Inventory is the goods and materials that a business holds for the ultimate goal of resale.
• Inventory management is a discipline primarily concerned with specifying the shape and placement of
stocked goods.
• Purchasing cost is the price per unit of an inventory item.
• Setup cost represents the fixed charge incurred when an order is placed.
• Setup cost is fixed regardless of the size of the order requested.
• Setup cost can also include the cost associated with receiving a shipment.
• Setup cost is also fixed regardless of the size of the shipment received.
• Holding cost represents the cost of maintaining inventory in stock.
• Holding cost includes
• interest on capital
• cost of storage
• cost of maintenance
• cost of handling
• cost of obsolescence (meaning item passed its expiry date or a more recent version is available in the
market, and therefore cannot be sold)
• cost of shrinkage due to fraud or theft
• Shortage cost is the penalty incurred when stock is out.
• Shortage cost includes
• potential loss of income
• disruption in production due to rescheduling nominal manufacturing schedules
• additional cost of ordering emergency shipments (usually overnight)
• subjective cost of loss in customer goodwill which is hard to estimate
• These four costs are conflicting because an increase in one may result in the reduction of another.
• For example, more frequent ordering results in higher setup cost but lower inventory holding cost.
• Therefore we seek to minimize the total inventory cost in order to balance these conflicting costs.
• The inventory problem reduces to devising an inventory policy that answers two questions
1. How much to order?
2. When to order?
• How much to order means we need to determine the size of the order at replenishment time.
• When to order, i.e., how to decide the replenishment time, is more complicated.
• An inventory system may be based on
• periodic reviews (e.g., ordering at the start of every week or month)
• or continuous reviews, placing a new order whenever the inventory level drops to a specific reorder point.
• The solution of the inventory problem also depends on the nature of demand faced by the corresponding
supply chain.
• Demand in inventory systems can be of four types
1. Deterministic and constant (static) with time.
2. Deterministic and variable (dynamic) with time, for example, seasonality effects.
3. Probabilistic and stationary over time: parameters of probability distribution are known and fixed
over time.
4. Probabilistic and nonstationary over time: the underlying probability distribution cannot be pinned down.
• The coefficient of variation (CV) is defined as the ratio of the standard deviation to the mean.
• CV measures the relative variation or spread of the data around the mean.
• In general, higher values of CV indicate higher uncertainty if you use the mean as an approximation of
monthly consumption.
• For deterministic demand, CV = 0 because the associated standard deviation is zero.
• CV can be used to classify demand into one of the four categories by using the following guidelines.
• If the average monthly demand (taken over a number of years) is “approximately” constant and CV is
reasonably small (< 20%), then the standard convention is to assume the demand to be deterministic and
constant.
• If the average monthly demand varies appreciably among the different months but CV remains reasonably
small for all months, then the demand may be considered deterministic but variable.
• Thus, low CV means deterministic and high CV means probabilistic.
• If CV is high (> 20%) but the average monthly demand (taken over a number of years) is “approximately”
constant, then the demand is probabilistic and stationary.
• The remaining case is the probabilistic nonstationary demand, which occurs when the average monthly
demands and coefficients of variation vary appreciably month to month.
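• The guidelines above can be expressed as a small routine. Below is a minimal Python sketch; the test for "approximately constant" monthly means (within ±20% of the overall mean) is an illustrative assumption, not from the lecture.

```python
from statistics import mean

def classify_demand(monthly_means, monthly_cvs, cv_limit=0.20, stability=0.20):
    """Classify demand per the CV guidelines above (sketch; the stability
    test for 'approximately constant' monthly means is an assumption)."""
    m = mean(monthly_means)
    means_stable = (max(monthly_means) - min(monthly_means)) / m <= stability
    cv_small = max(monthly_cvs) < cv_limit
    if cv_small and means_stable:
        return "deterministic, constant"
    if cv_small:
        return "deterministic, variable"
    if means_stable:
        return "probabilistic, stationary"
    return "probabilistic, nonstationary"

# Monthly means (averaged over several years) and the corresponding CVs:
print(classify_demand([100, 98, 103, 99], [0.05, 0.06, 0.04, 0.05]))
```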
Static Economic-Order-Quantity (EOQ) Models
• Classical EOQ model is the simplest of the inventory models.
• Production is instantaneous.
• There is no capacity constraint, and the entire lot is produced simultaneously. Thus, we assume no
shortage so that shortage cost does not have to be accounted for in the total inventory cost formula.
• Delivery is immediate.
• There is no time lag between production and availability to satisfy demand.
• Demand is deterministic.
• There is no uncertainty about the quantity or timing of demand.
• Demand is constant over time, hence called static EOQ.
• Demand can be represented as a straight line, so that if annual demand is 365 units, this translates to a
daily demand of one unit.
• A production run incurs a fixed setup cost.
• Regardless of the size of the lot or the status of the factory, the setup cost is the same.
• Products can be analyzed individually.
• Either there is only a single product to be analyzed, or there are no interactions (e.g., shared equipment) between products, so their analysis can be decoupled into independent single-product problems.
• The inventory stock is depleted uniformly at a constant demand rate, D.
• The important characteristic of the classical EOQ model is that when the inventory reaches zero level, an
order of size 𝑦 units is delivered to the production facility instantaneously.
• There are two cost parameters associated with the classical EOQ model.
• 𝐾 = Setup cost associated with the placement of an order (INR per order)
• ℎ = Holding cost (INR per inventory unit per unit time)
• The total cost per unit time is TCU(𝑦) = KD/𝑦 + h𝑦/2, which is minimized at the optimal order quantity 𝑦* = √(2KD/h).
• The corresponding optimal cycle length is 𝑡0 = 𝑦*/D.
• We can relax the second assumption of the classical EOQ model (immediate delivery) because a new order need not be received at the instant it is ordered.
• Introducing delivery delays is straightforward if delivery times are known and fixed.
• Let us say that a positive lead time L occurs between the placement and the receipt of an order.
• So if we assume that the lead time is less than the cycle length, then the reorder point is modified to
occur whenever the inventory level drops to LD units.
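• A minimal Python sketch of these computations (symbols as defined above):

```python
from math import sqrt

def classical_eoq(K, h, D, L=0.0):
    """Classical EOQ (symbols as defined above). Assumes the lead time L is
    less than the cycle length t0, so the reorder point is simply L*D."""
    y_star = sqrt(2 * K * D / h)            # optimal order quantity
    t0 = y_star / D                         # cycle length
    return y_star, t0, L * D                # (order size, cycle, reorder point)

# Example: K = 100 INR/order, h = 0.02 INR/unit/day, D = 100 units/day, L = 2 days
print(classical_eoq(K=100, h=0.02, D=100, L=2))   # (1000.0, 10.0, 200)
```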
EOQ with Price breaks
• This is also a type of Static EOQ model.
• The inventory item can be purchased at a discount if the size of the order 𝑦 exceeds a given limit 𝑞.
• Let 𝑐 denote the unit purchasing price.
• The total cost per unit time now includes the purchasing cost: TCU(𝑦) = cD + KD/𝑦 + h𝑦/2. This is because, just like in the classical EOQ model, every 𝑡0 units of time we order 𝑦 units, which costs us 𝑐𝑦 (and 𝑐𝑦/𝑡0 = 𝑐D).
• Because the two functions differ only by a constant amount, their minima will still coincide at 𝑦𝑚 just
like in the classical case.
• However, there is a clever trick to obtain the correct answer.
• To see this, we must first determine the value of 𝑄 > 𝑦𝑚 as follows (whose significance will be
explained later)
• There will be three possible cases depending on the actual value of 𝑞.
• If 𝑞 < 𝑦𝑚 , then we see that the dashed curves will be ignored.
• Considering only the solid curves, the minimum point is obvious.
• All the three possible cases are summarized below.
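• Below is a hedged Python sketch of this three-zone rule, following Taha's treatment; here c1 is the regular unit price, c2 < c1 the discounted price for orders of at least q, and Q is the quantity determined above by TCU2(Q) = TCU1(ym).

```python
from math import sqrt

def eoq_price_break(K, h, D, c1, c2, q):
    """EOQ with one price break: pay c2 (< c1) per unit when ordering y >= q.
    Sketch of the standard three-zone rule."""
    tcu = lambda y, c: c * D + K * D / y + h * y / 2   # total cost per unit time
    ym = sqrt(2 * K * D / h)                           # common minimizer
    if q < ym:                  # zone 1: ym already qualifies for the discount
        return ym
    # Q > ym solves TCU2(Q) = TCU1(ym), i.e. (h/2)Q^2 - (TCU1(ym) - c2*D)Q + K*D = 0
    b = tcu(ym, c1) - c2 * D
    Q = (b + sqrt(b * b - 2 * h * K * D)) / h
    return q if q < Q else ym   # zone 2: order q at discount; zone 3: order ym

print(eoq_price_break(K=10, h=1, D=1000, c1=2.0, c2=1.9, q=150))
```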
Multi-Item EOQ with Storage Limitation
• This is also a type of Static EOQ model.
• This model deals with multiple items whose individual inventory fluctuations are exactly the same as the
classical EOQ model.
• The only difference is that the items compete for a limited storage space.
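• The formulas are not reproduced in these notes, but under the standard Lagrangian treatment (as in Taha), 𝑦𝑖 = √(2𝐾𝑖𝐷𝑖/(ℎ𝑖 + 2𝜆𝑎𝑖)), where 𝑎𝑖 is the storage space per unit of item 𝑖 and the multiplier 𝜆 ≥ 0 is chosen so that total storage stays within the available area 𝐴. A minimal sketch using bisection on 𝜆:

```python
from math import sqrt

def constrained_eoq(K, D, h, a, A, tol=1e-9):
    """Multi-item EOQ under a storage limit sum(a_i * y_i) <= A.
    Lagrangian sketch: y_i = sqrt(2 K_i D_i / (h_i + 2*lam*a_i))."""
    y = lambda lam: [sqrt(2 * K[i] * D[i] / (h[i] + 2 * lam * a[i]))
                     for i in range(len(K))]
    used = lambda lam: sum(ai * yi for ai, yi in zip(a, y(lam)))
    if used(0.0) <= A:              # unconstrained EOQs already fit
        return y(0.0)
    lo, hi = 0.0, 1.0
    while used(hi) > A:             # grow hi until the constraint is satisfied
        hi *= 2
    while hi - lo > tol:            # bisect on the multiplier
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if used(mid) > A else (lo, mid)
    return y(hi)

print(constrained_eoq(K=[10, 5], D=[2, 4], h=[0.3, 0.1], a=[1, 1], A=25))
```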
Material Requirements Planning (MRP)
• Any demand that originates outside the system is called independent demand.
• This includes all demand for final products and possibly some demand for components (e.g., when they are sold
as replacement parts).
• Dependent demand is demand for components that make up the independent demand products.
• MRP operates at the interface between independent and dependent demand.
• MRP is therefore called a push system since it computes schedules of what should be started (or pushed) into
production based on demand.
• This is in contrast to pull systems such as Toyota’s kanban system, that authorize production as and when
inventory is consumed.
• Assume Part A has been assigned a fixed order period (FOP) of 2 weeks.
• FOP implies that the firm places an order with the supplier for the supply of Part A at fixed time intervals.
• FOP helps generate the planned order receipts by using net requirements data.
• Also assume Part A has a lead time of 2 weeks.
• Lead time is the amount of time that passes between the placement of an order and its delivery.
• Lead time helps generate the planned order releases by using planned order receipts data.
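• A minimal Python sketch of this netting logic (a deliberately simplified assumption: on-hand stock and scheduled receipts are ignored, and net requirements are taken as given):

```python
def mrp_plan(net_req, fop=2, lead_time=2):
    """Turn weekly net requirements into planned order receipts (grouped into
    fixed-order-period lots) and planned order releases (offset by lead time).
    Sketch only; past-due releases are simply dropped."""
    n = len(net_req)
    receipts, releases = [0] * n, [0] * n
    t = 0
    while t < n:
        if net_req[t] > 0:
            receipts[t] = sum(net_req[t:t + fop])   # one lot covers FOP weeks
            if t - lead_time >= 0:
                releases[t - lead_time] = receipts[t]
            t += fop
        else:
            t += 1
    return receipts, releases

print(mrp_plan([0, 10, 15, 0, 20, 5], fop=2, lead_time=2))
```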
• We review the previous concepts by solving a more involved problem shown above.
• Above is the master production schedule for part B.
• The master production schedule is the source of demand for the MRP system.
Dynamic Lot Sizing
• Dynamic Lot Sizing is different from the static EOQ models studied so far.
• The demand per period, though deterministic, is dynamic, in that it varies from one period to the next, thus we
violate the assumption of constant demand.
• The main historical approach to relaxing the constant-demand assumption is the Wagner-Whitin model.
• The Wagner-Whitin model considers the problem of determining production lot sizes when demand is
deterministic but time-varying.
• All the other assumptions for the EOQ model are valid for the Wagner-Whitin model.
• When demand varies over time, a continuous-time model like the EOQ model, which treats time as a continuous real line, is no longer appropriate.
• So we clump demand into discrete periods, which could correspond to days, weeks, or months, depending on the
system.
• This gives rise to tabular data.
• A daily production schedule might make more sense for a high-volume system with rapidly changing demand.
• A monthly schedule may be more adequate for a low-volume system with demand that changes more slowly.
• For simplicity, assume that setup costs 𝑨𝒕 , production costs 𝒄𝒕 , and holding costs 𝒉𝒕 (all in INR) are constant over time, although this is not necessary for the Wagner-Whitin model.
• Problem is to satisfy all demands at minimal cost, i.e., production plus setup plus holding cost.
• The only controls available to solve this problem are the production quantities 𝑄𝑡 .
• However, since all demands must be filled, only the timing of production is open to choice, not the total
production quantity.
• Hence if unit production cost is constant (that is, 𝑐𝑡 does not vary with 𝑡), then production cost will be the same
for all possible timings of production and therefore production cost can be ignored.
• Wagner–Whitin Property: Under an optimal lot-sizing policy, either the inventory carried to week 𝑡 + 1 from a
previous week will be zero or the production quantity in week 𝑡 + 1 will be zero.
• Why? Because either it is cheaper to produce all of week 𝑡 + 1’s demand in week 𝑡, or all of it in 𝑡 + 1; it is never
cheaper to produce some in each.
• If we produce items in week 𝑡 (and incur a setup cost) to satisfy demand in week 𝑡 + 1, then it cannot possibly be
economical to produce in week 𝑡 + 1 (and incur another setup cost).
• The Wagner-Whitin property implies that either 𝑄𝑡 = 0 or 𝑄𝑡 will be exactly enough to satisfy demand in the
current week plus some integer number of future weeks.
• The Wagner–Whitin algorithm starts with week 1 and finishes with week N.
• By the Wagner–Whitin property, we know that we will produce in a week only if the inventory carried to that
week is zero.
• For instance, in a 6-week problem, there are six possibilities for the amount we can produce in week 1, namely,
𝐷1 , 𝐷1 + 𝐷2 , 𝐷1 + 𝐷2 + 𝐷3 , … , 𝐷1 + 𝐷2 + 𝐷3 + 𝐷4 + 𝐷5 + 𝐷6 .
• If we choose to produce 𝐷1 + 𝐷2 , then inventory will run out in week 3 and so we will have to produce again in
week 3.
Planning Horizon Property
• Until step 4, we had to consider producing for week 4 in all weeks 1 through 4. But this is not always necessary.
• In the 4-week problem we saw that it is optimal to produce in week 4 for week 4.
• So let us now ask: Is it cheaper to produce for week 5 in week 3 than in week 4?
• If we produce in week 3 or 4, then the produced items in weeks 3 and 4 must be held in inventory up to week 5.
• In both cases, the carrying cost from week 4 to week 5 will be same.
• So we only need to ask: is it cheaper to set up in week 3 and carry inventory from week 3 to week 4 than it is to
set up in week 4?
• But we already know the answer to this question from step 4: It is cheaper to set up in week 4.
• Therefore, it is unnecessary to consider producing in weeks 1, 2, and 3 for the demand in week 5.
• We need to consider only weeks 4 and 5.
• The blank spaces in the upper right-hand corner of this table are due to the planning horizon property.
• Without using this property, the entire upper-triangular part of the table would get filled.
• How to interpret the above table to solve the original dynamic lot sizing problem?
• The minimum total setup plus inventory carrying cost is read from the last column of the penultimate row: 𝑍10 = 580.
• The optimal lot sizes are determined from the 𝑗𝑡∗ values.
• Since 𝑗𝑡∗ represents the last week of production in a 𝑡-week problem, it is optimal to produce enough to cover the
demand from week 𝑗𝑡∗ through week 𝑡.
• Therefore, since 𝑗10∗ = 8, it is optimal to produce for weeks 8, 9, and 10 in week 8.
• Doing this leaves us with a 7-week problem.
• Since 𝑗7∗ = 4, it is optimal to produce for weeks 4, 5, 6, and 7 in week 4.
• This leaves us with a 3-week problem.
• Since 𝑗3∗ = 1, we should produce for weeks 1, 2, and 3 in week 1.
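• A compact Python sketch of this forward dynamic program (without the planning-horizon pruning described above, so the full upper triangle is computed). The 10-week demand used here is the Factory Physics example, which is consistent with the 𝑍10 = 580 and 𝑗𝑡∗ values quoted above.

```python
def wagner_whitin(D, A, h):
    """Forward DP for dynamic lot sizing with constant setup cost A and
    holding cost h per unit per week. Z[t] is the minimal cost of a t-week
    problem; j_star[t] is the last week of production in that problem."""
    n = len(D)
    Z = [0.0] * (n + 1)
    j_star = [0] * (n + 1)
    for t in range(1, n + 1):
        best, best_j = float("inf"), t
        for j in range(1, t + 1):          # produce D[j..t] in week j
            hold = sum(h * (k - j) * D[k - 1] for k in range(j, t + 1))
            cost = Z[j - 1] + A + hold
            if cost < best:
                best, best_j = cost, j
        Z[t], j_star[t] = best, best_j
    lots, t = [], n                         # trace back the optimal lots
    while t > 0:
        j = j_star[t]
        lots.append((j, sum(D[j - 1:t])))   # (production week, lot size)
        t = j - 1
    return Z[n], lots[::-1]

print(wagner_whitin([20, 50, 10, 50, 50, 10, 20, 40, 20, 30], A=100, h=1))
```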
Continuous Review Probabilistic Inventory
Models
“Probabilitized” EOQ Model
• The critical period during the inventory cycle occurs between the placement and receipt of an order, because a shortage can occur then.
• The probabilitized EOQ model seeks to maintain a constant buffer stock that caps the probability of shortage.
• A larger buffer stock results in a lower shortage probability and vice versa.
• Assume that the demand per unit time is normally distributed with mean D and standard deviation 𝜎.
• Let 𝑥𝐿 denote the demand during lead time L.
• Then 𝑥𝐿 is also normally distributed with mean 𝜇𝐿 and standard deviation 𝜎𝐿 .
• The size of the buffer B is determined by requiring that the probability of shortage during lead time L be at most 𝛼.
• We can define a new random variable Z = (𝑥𝐿 − 𝜇𝐿)/𝜎𝐿, which is clearly normally distributed with mean 0 and standard deviation 1.
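• Requiring P(𝑥𝐿 ≥ 𝜇𝐿 + B) ≤ 𝛼 gives B = 𝜎𝐿 Φ⁻¹(1 − 𝛼). A minimal sketch, assuming lead-time demand aggregates in the usual way as 𝜇𝐿 = DL and 𝜎𝐿 = 𝜎√L:

```python
from statistics import NormalDist

def buffer_stock(D, sigma, L, alpha):
    """Buffer B with P(lead-time demand > mu_L + B) <= alpha. Assumes the
    usual aggregation mu_L = D*L and sigma_L = sigma*sqrt(L)."""
    mu_L, sigma_L = D * L, sigma * L ** 0.5
    B = NormalDist().inv_cdf(1 - alpha) * sigma_L   # standard normal quantile
    return B, mu_L + B                              # buffer and reorder level

print(buffer_stock(D=100, sigma=10, L=2, alpha=0.05))
```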
• The inventory policy is to order the quantity 𝑦 whenever the amount of inventory on hand drops to level R.
• Reorder level R is a function of the lead time between placing and receiving an order.
• Optimal values of 𝑦 and R are determined by minimizing the expected sum of setup, holding, and shortage costs per unit
time.
• The optimal values of 𝑦 ∗ and 𝑅∗ cannot be determined in closed form.
• We use an iterative algorithm developed by Hadley and Whitin which is given below.
We want 𝑦∗ to be as small as possible while incurring no shortage. Setting the expected shortage 𝑆 = 0, we get the initial solution 𝑦1 = √(2KD/h).
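• A sketch of this iteration for normally distributed lead-time demand, assuming the standard Hadley-Whitin conditions as given in Taha: 𝑦 = √(2D(K + pS)/h) and P(𝑥𝐿 ≥ R) = h𝑦/(pD).

```python
from math import sqrt
from statistics import NormalDist

def hadley_whitin(D, K, h, p, mu_L, sigma_L, iters=50):
    """Iterative computation of (y*, R*) for the continuous-review model,
    assuming the standard Hadley-Whitin conditions (as in Taha) and normally
    distributed lead-time demand."""
    std = NormalDist()
    def expected_shortage(R):               # S = E[(x_L - R)+], normal case
        z = (R - mu_L) / sigma_L
        return sigma_L * (std.pdf(z) - z * (1 - std.cdf(z)))
    y = sqrt(2 * K * D / h)                 # initial solution (S = 0)
    for _ in range(iters):
        R = NormalDist(mu_L, sigma_L).inv_cdf(1 - h * y / (p * D))
        y = sqrt(2 * D * (K + p * expected_shortage(R)) / h)
    return y, R

print(hadley_whitin(D=1000, K=100, h=2, p=10, mu_L=100, sigma_L=20))
```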
NEWSVENDOR PROBLEM
• A perishable item.
• Stochastic demand.
• Deterministic costs.
• Consider a planning horizon of a single period.
• The order of size 𝑦 must be placed before the demand is known.
• Let 𝑝 denote the penalty cost per unit of shortage.
• Let ℎ denote the holding cost per unit of leftover inventory.
• Let the random variable 𝑋 denote the demand.
• Let 𝑓 denote the pdf of 𝑋, so that P[𝑎 ≤ 𝑋 ≤ 𝑏] = ∫_𝑎^𝑏 𝑓(𝑥) d𝑥.
• Let 𝐹 denote the cdf of 𝑋: 𝐹(𝑎) = P[𝑋 ≤ 𝑎] = ∫_0^𝑎 𝑓(𝑥) d𝑥.
• Case 1: No setup cost.
• The leftover inventory at the end of the period is max(𝑦 − 𝑋, 0); the shortage is max(𝑋 − 𝑦, 0).
• E.H.C. (expected holding cost) = ℎ E[max(𝑦 − 𝑋, 0)].
• E.S.C. (expected shortage cost) = 𝑝 E[max(𝑋 − 𝑦, 0)].
• T.E.P.C. (total expected per-period cost) = E.S.C. + E.H.C.
• We seek to minimize E.S.C. + E.H.C. = 𝑝 ∫_𝑦^∞ (𝑥 − 𝑦)𝑓(𝑥) d𝑥 + ℎ ∫_0^𝑦 (𝑦 − 𝑥)𝑓(𝑥) d𝑥.
• Differentiating with respect to 𝑦 gives −𝑝 ∫_𝑦^∞ 𝑓(𝑥) d𝑥 + ℎ ∫_0^𝑦 𝑓(𝑥) d𝑥 = −𝑝(1 − 𝐹(𝑦)) + ℎ𝐹(𝑦).
• Setting this to zero yields the critical-fractile condition 𝐹(𝑦∗) = 𝑝/(𝑝 + ℎ).
• Since 𝑝 and ℎ are known, this equation is easy to solve.
• E.g., if 𝑋 has a Gaussian distribution with mean 𝜇 and standard deviation 𝜎, then 𝑦∗ = 𝜇 + 𝜎 Φ⁻¹(𝑝/(𝑝 + ℎ)).
• Case 2: With setup cost 𝐾, the cost of placing an order of size 𝑦 becomes 𝐾 + 𝑝 E[max(𝑋 − 𝑦, 0)] + ℎ E[max(𝑦 − 𝑋, 0)].
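• The critical-fractile condition can be evaluated directly; a minimal sketch for the Gaussian case above:

```python
from statistics import NormalDist

def newsvendor_order(p, h, mu, sigma):
    """Single-period newsvendor: y* satisfies F(y*) = p / (p + h).
    For X ~ Normal(mu, sigma), y* = mu + sigma * Phi^{-1}(p / (p + h))."""
    return NormalDist(mu, sigma).inv_cdf(p / (p + h))

# Penalty 4 per unit short, holding 1 per unit left over, X ~ N(100, 20):
print(newsvendor_order(p=4, h=1, mu=100, sigma=20))   # about 116.8
```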
Strategic Games: The Prisoner's Dilemma
• Two suspects, R and C, are interrogated separately; each can either stay Quiet or Confess.
• The payoffs to R and C under the four possible strategy profiles are shown below.
Strategy profile (R, C)   Payoff to R   Payoff to C
(Quiet, Quiet)            3             3
(Quiet, Confess)          0             5
(Confess, Quiet)          5             0
(Confess, Confess)        1             1
• Define 𝑠−𝑖 = (𝑠1 , … , 𝑠𝑖−1 , 𝑠𝑖+1 , … , 𝑠𝑛 ) to be the strategy profile 𝑠 with player 𝑖’s strategy removed.
• Define 𝑢𝑖 (𝑡𝑖 , 𝑠−𝑖 ) to be the value of the utility function to player 𝑖 of the strategy profile with
𝑠𝑖 removed and replaced by 𝑡𝑖 .
• Definition. Player 𝑖’s strategy 𝑠𝑖 is a best response to the profile 𝑠−𝑖 of other player strategies if
𝑢𝑖 (𝑠𝑖 , 𝑠−𝑖 ) ≥ 𝑢𝑖 (𝑡𝑖 , 𝑠−𝑖 ) for all other strategies 𝑡𝑖 ∈ 𝑆𝑖 where 𝑆𝑖 denotes all the strategies of player 𝑖.
• Confess is the best response strategy for R if C’s strategy is set to Quiet.
• Confess is also a best response for R if C’s strategy is set to Confess.
• Definition. Player 𝑖’s strategy 𝑠𝑖 strongly dominates player 𝑖’s strategy 𝑡𝑖 if 𝑢𝑖 (𝑠𝑖 , 𝑠−𝑖 ) > 𝑢𝑖 (𝑡𝑖 , 𝑠−𝑖 ) for all strategy profiles 𝑠−𝑖 ∈ 𝑆−𝑖 = 𝑆1 × 𝑆2 × ⋯ × 𝑆𝑖−1 × 𝑆𝑖+1 × ⋯ × 𝑆𝑛 available to the remaining players.
• Definition. Player 𝑖’s strategy 𝑠𝑖 dominates player 𝑖’s strategy 𝑡𝑖 if 𝑢𝑖 (𝑠𝑖 , 𝑠−𝑖 ) ≥ 𝑢𝑖 (𝑡𝑖 , 𝑠−𝑖 ) for all
strategy profiles 𝑠−𝑖 ∈ 𝑆−𝑖 , and strict inequality holds for at least one strategy profile 𝑠−𝑖 ∈ 𝑆−𝑖 .
• The strategy Confess strongly dominates the strategy Quiet for R because R’s best response is the
same regardless of C’s choice of strategy.
• Definition. A strategy is strongly dominant [resp., dominant] for player 𝑖 if it strongly dominates
[resp., dominates] all other strategies for player 𝑖.
• When strategy 𝑠𝑖 strongly dominates strategy 𝑡𝑖 , player 𝑖 should select strategy 𝑠𝑖 over strategy 𝑡𝑖
unless the strategy selections by the other players result in player 𝑖 obtaining the same utility for
either.
• Confess strongly dominates Quiet for R and by symmetry, Confess also strongly dominates Quiet for
C.
• Thus, both players select Confess resulting in payoffs of 1 for each player, which we denote with the
payoff pair (1,1).
• Knowing that this was the thinking of the other suspect, neither regrets nor second-guesses their own decision to confess.
• Such a regret-free strategy profile is known as a Nash equilibrium.
• Definition. A strategy profile 𝑠 is a Nash equilibrium if 𝑢𝑗 (𝑠) ≥ 𝑢𝑗 (𝑡𝑗 , 𝑠−𝑗 ) for all players j ∈ 𝑁 and
all strategies 𝑡𝑗 ∈ 𝑆𝑗 available to that player.
• That is, 𝑠 is a Nash equilibrium if, given what the other players have chosen to do, 𝑠−𝑗 , each player 𝑗
cannot unilaterally improve their payoff by replacing their current strategy, 𝑠𝑗 , with a new strategy,
𝑡𝑗 .
• Thus no player has regrets about their strategy selection in a Nash equilibrium.
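• The Nash condition can be checked mechanically. A minimal sketch that enumerates the pure strategy profiles of the Prisoner's Dilemma above:

```python
from itertools import product

# Payoffs (u_R, u_C) from the Prisoner's Dilemma table above.
U = {("Quiet", "Quiet"): (3, 3), ("Quiet", "Confess"): (0, 5),
     ("Confess", "Quiet"): (5, 0), ("Confess", "Confess"): (1, 1)}
S = ["Quiet", "Confess"]

def pure_nash_equilibria():
    """A profile is Nash if no player gains by a unilateral deviation."""
    eq = []
    for r, c in product(S, S):
        r_ok = all(U[(r, c)][0] >= U[(t, c)][0] for t in S)   # R cannot improve
        c_ok = all(U[(r, c)][1] >= U[(r, t)][1] for t in S)   # C cannot improve
        if r_ok and c_ok:
            eq.append((r, c))
    return eq

print(pure_nash_equilibria())   # [('Confess', 'Confess')]
```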
• A solution concept is simply a formal rule for predicting how a game will be played.
• We introduced two solution concepts above, viz. dominance and Nash equilibria.
• Now we look at a third solution concept.
• A player might decide to think prudentially and so chooses a strategy by looking at the worst thing
that can happen with each strategy choice, and then choosing the strategy that makes the worst case
as “least bad” as possible.
• So the player would choose a strategy that maximizes their minimum payoff with respect to the
strategy choices of other players.
• For example, if R chooses Quiet, the worst that can happen is a payoff of 0 when C chooses Confess.
• On the other hand, R’s worst payoff if she chooses Confess is 1.
• This suggests that R should choose Confess if she is strategically risk-averse (= prudential).
• Definition. Player 𝑖’s strategy 𝑠𝑖 is prudential if it maximizes player 𝑖’s minimum payoff, i.e., if min over 𝑠−𝑖 ∈ 𝑆−𝑖 of 𝑢𝑖 (𝑠𝑖 , 𝑠−𝑖 ) ≥ min over 𝑠−𝑖 ∈ 𝑆−𝑖 of 𝑢𝑖 (𝑡𝑖 , 𝑠−𝑖 ) for all 𝑡𝑖 ∈ 𝑆𝑖 .
• This minimum value of 𝑢𝑖 (𝑠𝑖 , 𝑠−𝑖 ) attained by a prudential strategy 𝑠𝑖 is called the security level for player 𝑖.
• Confess is the unique dominant and unique prudential strategy for each player. Verify!
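• The prudential part of the "Verify!" above can also be done mechanically; a minimal sketch:

```python
# Security levels in the Prisoner's Dilemma (payoffs (u_R, u_C) as in the table).
U = {("Quiet", "Quiet"): (3, 3), ("Quiet", "Confess"): (0, 5),
     ("Confess", "Quiet"): (5, 0), ("Confess", "Confess"): (1, 1)}
S = ["Quiet", "Confess"]
# A prudential strategy maximizes the player's minimum payoff.
best_R = max(S, key=lambda r: min(U[(r, c)][0] for c in S))
best_C = max(S, key=lambda c: min(U[(r, c)][1] for r in S))
print(best_R, best_C)   # Confess Confess (security level 1 for each)
```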
• Although (Confess, Confess) is the unique Nash equilibrium, the two players in the Prisoner’s
Dilemma strategy game would be better off if they both chose Quiet, resulting in the payoff pair (3,3)
instead of the payoff pair (1,1).
• Thus, the three solution concepts do not always yield the best overall payoff for each player.
• Therefore, we introduce the fourth solution concept.
• Definition. A strategy profile 𝑠, and its associated outcome 𝑜, are efficient if there does not exist a
strategy profile t ∈ 𝑆 such that 𝑢𝑗 (𝑡) ≥ 𝑢𝑗 (𝑠) for all players 𝑗, with at least one of the inequalities
being strict.
• So a strategy profile is efficient if we cannot find another strategy profile that at least maintains the
utility for all players, while strictly improving the utility for at least one player.
• For the Prisoner’s Dilemma strategic game, the strategy profile (Confess, Confess) is not efficient
because both players obtain larger utilities with (Quiet, Quiet).
• Each of the other three strategy profiles is efficient because it is impossible to make a change without reducing at least one player's payoff.
• As in the Prisoner’s Dilemma strategic game, players may not have an incentive to choose a strategy
that is part of an efficient strategy profile.
• However, there is no such dilemma when Nash equilibria are also efficient.
• In the Prisoner’s Dilemma strategic game, there is a tension between
1. choosing the dominant strategy (Confess), which will always yield a higher payoff regardless
of the other player’s choice, and
2. knowing that a better outcome might be possible if both players choose their dominated
strategy (Quiet).
• This tension is what puts the dilemma into the Prisoner’s Dilemma: each player selecting the logical,
rational strategy does not lead to an efficient outcome!
• One way to resolve this dilemma would be for the players to enter into a binding agreement to stay
quiet.
• But if we introduce binding agreements, then we no longer have a strategic game and Nash
equilibrium solution concept does not apply.
Office Scenario
• Suppose there is a shared coffee pot in an office and the employees voluntarily contribute to a pool of
money to replenish the supplies.
• Each employee who drinks the coffee must decide whether or not to contribute to the pool.
• Player strategies are Contribute or Not Contribute.
• Not Contribute is the strongly dominant strategy because, whatever the other players do, it saves the player money and hence yields a strictly higher payoff than Contribute.
• But if everyone selects Not Contribute then there are no funds to buy coffee.
• So this is another example of Prisoner’s dilemma scenario.
• A multi-agent decision-making scenario is said to be a prisoner’s dilemma scenario if
❖it can be modeled as a strategic game
❖there is a single strategy for each player that strongly dominates all of that player’s other strategies
❖but all players would receive a higher payoff if they choose a specific dominated, rather than the
dominant, strategy.
• Since the mutual benefit result requires all players to cooperate by choosing the dominated strategy, it
is often called the Cooperate strategy.
• Since there is always an incentive for any individual player to switch their choice to the dominant
strategy, the dominant strategy is often called the Defect strategy.
• In the original prisoner’s dilemma scenario, Quiet is the Cooperate strategy.
• In the Office Coffee scenario, Contribute is the Cooperate strategy.
• In the original prisoner’s dilemma scenario, Confess is the Defect strategy.
• In the Office Coffee scenario, Not Contribute is the Defect strategy.
• Assume a prisoner’s dilemma scenario involving exactly two players and each player has exactly two
strategies such that their payoffs are as described below
• The payoff to player 𝑖 for cooperating when the other player is defecting is 𝑆𝑖 .
• The payoff when both players defect is 𝑃𝑖 .
• The payoff when both cooperate is 𝑅𝑖 .
• The payoff to entice player 𝑖 to defect is 𝑇𝑖 .
• For the payoffs of each player we have the relationship 𝑆𝑖 < 𝑃𝑖 < 𝑅𝑖 < 𝑇𝑖 .
• However, there need not be any relationship between the two sequences of payoffs for the two players.
• Given this ordering of the payoffs, let us verify that Defect is the strongly dominant strategy for each
player.
• To do this it is easier to write the payoffs in terms of a matrix.
• Setting the partial derivative equal to zero, we find the critical value of 𝑄1 which will be a maximum point
as long as it is positive because the second derivative is negative.
• Subtracting the two best response functions shows us that 𝑄1 = 𝑄2 , and therefore the optimal production
values are as computed below.
• Thus, if 𝑎 − 𝑏𝑐 > 0, then each firm will produce (𝑎 − 𝑏𝑐)/3 units of the good. The profit of each firm
then will be (𝑎 − 𝑏𝑐)2 /9𝑏.
• If 𝑎 − 𝑏𝑐 ≤ 0, then each firm will produce nothing and obtain no profit.
• The condition for positive production can also be written as 𝑎/𝑏 > 𝑐, which says that for any reasonable market to sustain itself, the price at which consumers will demand nothing must be higher than the cost of producing a single unit of the product.
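• The equilibrium can be verified numerically. A minimal sketch, assuming the inverse demand P = (a − Q)/b that is consistent with the a/b > c condition above:

```python
def cournot(a, b, c, iters=100):
    """Cournot duopoly with inverse demand P = (a - Q)/b and unit cost c.
    Iterating the best responses Q_i = (a - b*c - Q_j)/2 converges to
    the equilibrium Q_i = (a - b*c)/3."""
    Q1 = Q2 = 0.0
    for _ in range(iters):
        Q1 = max(0.0, (a - b * c - Q2) / 2)
        Q2 = max(0.0, (a - b * c - Q1) / 2)
    price = (a - Q1 - Q2) / b
    return Q1, Q2, Q1 * (price - c)            # quantities and a firm's profit

a, b, c = 120, 2, 10
print(cournot(a, b, c))                        # about (33.33, 33.33, 555.6)
print((a - b * c) / 3, (a - b * c) ** 2 / (9 * b))   # analytic values
```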
• Sequential games can also be used to model scenarios in which the players have a continuum of strategy
options available to them. Cournot duopoly can be extended to allow continuous strategies.
• Suppose that Firm 1 moves first by selecting a production level and then Firm 2 moves second, with
knowledge of what Firm 1’s decision was.
• The shading between the “0” and “∞” branches for each firm indicates that there are an infinite number of
actions available to each player at each decision point: any nonnegative number indicating the production
level of the corresponding firm.
• A generic choice for Firm 1 is labeled 𝑄1 and the resulting choice for Firm 2 is labeled 𝑄2 .
• Firm 2 has a choice for each choice of Firm 1 and the resulting total production level is Q = 𝑄1 + 𝑄2 .
• Firm 2’s best response function had been computed previously
• Because Firm 2 is the last firm to move, this strategy is precisely the one indicated by the backward
induction method as it helps Firm 2 maximize its payoff.
• The profit function for Firm 1 then needs to be rewritten as above.
• Note that since Firm 1 knows what action Firm 2 will be taking, its utility function only depends on 𝑄1 .
• Thus, Firm 1’s best response is to maximize its profit by computing the optimal value of 𝑄1 by taking the
derivative of 𝑢1 with respect to 𝑄1 and setting it equal to zero.
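• A minimal sketch of this backward induction, under the same assumed demand form: substituting Firm 2's best response 𝑄2 = (a − bc − 𝑄1)/2 into Firm 1's profit gives 𝑢1(𝑄1) = 𝑄1(a − bc − 𝑄1)/(2b), which is maximized at 𝑄1 = (a − bc)/2.

```python
def stackelberg(a, b, c):
    """Sequential Cournot (leader moves first), solved by backward induction.
    Firm 2's best response: Q2 = (a - b*c - Q1)/2; Firm 1 then maximizes
    u1(Q1) = Q1*(a - b*c - Q1)/(2b), giving Q1 = (a - b*c)/2."""
    Q1 = (a - b * c) / 2
    Q2 = (a - b * c - Q1) / 2
    price = (a - Q1 - Q2) / b
    return Q1, Q2, Q1 * (price - c), Q2 * (price - c)

print(stackelberg(a=120, b=2, c=10))   # leader produces twice the follower
```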
SEQUENTIAL DECISION MAKING
• Game tree is one way to model sequential decision making (SDM) scenarios but Markov Decision Processes
(MDPs) form a more powerful modeling tool for more complex SDM scenarios.
• The backward induction algorithm (BIA) is used to solve game trees, whereas Reinforcement Learning (RL) algorithms are used to solve MDPs.
• The most general SDM scenarios are described as follows.
• There is a decision maker (DM) who makes successive observations of a process before making a final
decision, and this keeps repeating until the time horizon is reached or some termination condition is triggered.
• But there are costs associated with each of these observations.
• The DM must choose what to do and how much to do it at various points in time.
• The choices at one time influence the possibilities available at other points in time.
• These choices depend on the relative value the DM assigns to different payoffs at different points in time.
• These choices also require the DM to trade off costs and rewards at different points in time.
• The procedure to decide when to stop taking observations and when to continue is called the stopping rule.
• The objective is to find the optimal stopping rule and the optimal decision-making strategy with the goal of
minimizing some given loss function, and observation costs are included in this optimization process.
Government Subsidy
• People who are monetarily risk-averse often purchase insurance policies to protect themselves or their
properties against large losses.
• Even though the expected value of paying INR 10k is the same as that of a 1% chance of paying INR 10 lacs, for many people paying a guaranteed monthly premium is preferable to having even a small chance of paying a much larger sum.
• Can having security against paying a larger sum change people's behavior in any way?
• For example, insured drivers may not drive as cautiously or may leave their cars unlocked knowing they
won’t be responsible for large damages or losses.
• Insurance companies must take this into account when calculating premiums or must find ways to reduce
riskier behavior amongst their insured clients.
• The change in behavior produced by passing some of the risk or responsibility from an individual to
another player is known in game theory as moral hazard.
• The individual is called an agent and the other party assuming some of the risk is known as the principal.
• Insurance is one example with the insurance company acting as the principal for clients who are agents.
• Another commonly studied example is that of managers attempting to influence their employees' behavior through incentive programs related to sales or profits.
• Here we explore whether the scholarships given out by central governments can result in moral hazard:
will free college education encourage or discourage students in giving their best?
• A key characteristic of a situation that involves moral hazard is that the principal cannot influence the
agent’s action in any binding way.
• In many situations the principal is able to observe only the final results.
• Consider a large central government agency handing out scholarships to individual students spread out
across a country.
• Will the government, in the role of the principal, enable the student, in the role of the agent, to engage in
activities that do not support the intentions of the government?
• Specifically, rather than working hard to succeed, will a student choose to put minimal effort into their
studies and have a higher risk of failure since there is no or little cost to them for the classes?
• Instead of modeling the entire college education, let us simply consider a single course with two levels of
accomplishment: success (S) or failure (F).
• Cost of the course (tuition, books, and supplies) is a fixed amount K.
• Government (G) reimburses the student (A) for
• part of the cost of the course (an amount less than K)
• the entire cost of the course (the amount K)
• an amount greater than K as in a fellowship
• G may also make this amount depend on whether the student succeeds or fails.
• Let 𝑟𝑆 and 𝑟𝐹 be the amount that the government will reimburse in the two cases respectively.
• Completely free college occurs when 𝑟𝑆 = 𝑟𝐹 = 𝐾.
• The student can choose how much effort they are willing to put into succeeding in this course categorized
as high effort (H), low effort (L), or no effort (N) by not taking the course.
• If student has chosen to take the course, chance (C) then determines success or failure with the probability
of success being 𝑝𝐻 or 𝑝𝐿 depending on whether the student chose to put in high or low effort.
• The game tree for this sequential game is shown below.
• We next explain the rationale for assigning the payoffs to the government and student for each of the five
possible outcomes.
• Payoffs below are written as ordered pairs (𝐺, 𝐴).
• When the student chooses to not take the course, a payoff of zero is assigned.
• For the government, the monetary worth for student success and failure is 𝑣𝑆 and 𝑣𝐹 .
• For the student, the monetary worth for success and failure is 𝑤𝑆 and 𝑤𝐹 .
• The monetary cost for high and low effort is 𝑐𝐻 and 𝑐𝐿 .
• Thus, if the government chooses reimbursement plus the student chooses a high level of effort plus the
student is successful, then the government’s payoff is 𝑣𝑆 − 𝑟𝑆 and the student’s payoff is 𝑤𝑆 − 𝑐𝐻 − 𝐾 + 𝑟𝑆 .
• Assume the government prefers success (0 ≤ 𝑣𝐹 < 𝑣𝑆 ).
• Assume there may be intrinsic worth to the student to take and pass the course (0 ≤ 𝑤𝐹 ≤ 𝑤𝑆 ).
• Assume effort is costly for the student (0 < 𝑐𝐿 < 𝑐𝐻 ).
• Assume students can pass and hard work increases their chances of getting a passing grade (0 < 𝑝𝐿 < 𝑝𝐻 ).
• Given the government's action (𝑟𝑆 , 𝑟𝐹 ) and the student's action (𝐻, 𝐿, or 𝑁), the government's expected payoff is computed below.
• We can similarly compute the student’s expected payoff as shown below.
• ℎ and 𝑙 denote the student's intrinsic (without government reimbursement) net benefit for high and low effort, i.e., ℎ = 𝑝𝐻 𝑤𝑆 + (1 − 𝑝𝐻 )𝑤𝐹 − 𝑐𝐻 − 𝐾 and 𝑙 = 𝑝𝐿 𝑤𝑆 + (1 − 𝑝𝐿 )𝑤𝐹 − 𝑐𝐿 − 𝐾.
• 𝑤𝑆 = 𝑤𝐹 means that there is no intrinsic worth to the student passing instead of failing the course.
• If there is no intrinsic worth to the student passing instead of failing the course (𝑤𝑆 = 𝑤𝐹 ) and no
government assistance (𝑟𝑆 = 𝑟𝐹 = 0), then ℎ < 𝑙 because 𝑐𝐿 < 𝑐𝐻 . So the student has no incentive to put
in a high amount of effort.
• If passing the course is sufficiently valuable (𝑤𝑆 > 𝑤𝐹 ) and this is also sufficiently effective in comparison
with the cost of effort (𝑤𝑆 −𝑤𝐹 ≫ 𝑐𝐻 − 𝑐𝐿 ), then this implies that ℎ > 0 and ℎ > 𝑙.
• So the student will take the course and put in a high level of effort even without government assistance.
• Let us determine the government’s best course of action by applying the subgame perfect equilibrium
solution concept.
• We begin by considering the student’s maximizing strategy.
• For each government action (𝑟𝑆 , 𝑟𝐹 ), student chooses an action among (𝐻, 𝐿, 𝑁) denoted by a(𝑟𝑆 , 𝑟𝐹 )
which maximizes 𝑢𝐴 ( 𝑟𝑆 , 𝑟𝐹 , a(𝑟𝑆 , 𝑟𝐹 )).
• Government’s strategy will be to choose that action (𝑟𝑆 , 𝑟𝐹 ) which maximizes 𝑢𝐺 ( 𝑟𝑆 , 𝑟𝐹 , a(𝑟𝑆 , 𝑟𝐹 )).
• Finding this subgame perfect equilibrium is equivalent to solving the two linear programs below.
• The last inequality constraints in both LPs are simply the nonnegativity constraints.
• The only nontrivial constraint is the second inequality constraint.
• The 2nd constraint simply decides which strategy student will adopt, H or L.
• The 3rd constraint ensures that student will not consider N.
• For 1st LP , the 2nd constraint decides if H is adopted by student.
• For 2nd LP, the 2nd constraint decides if L is adopted by student.
• The objective function maximizes the government’s payoff based on whether student adopts H or L.
• If value of 1st LP is larger than value of 2nd LP and also than the value of student not taking the course (N)
which is zero, then government announces reimbursement as per 1st LP.
• If value of 2nd LP is larger than value of 1st LP and also than the value of student not taking the course (N)
which is zero, then government announces reimbursement as per 2nd LP.
• If value of both LPs ≤ 0, then government sees no utility in students putting effort (H or L) in their education
and so announces its subsidy based on student choosing N which will logically lead to (𝑟𝑆 = 𝑟𝐹 = 0).
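• The LPs themselves are on the slides and not reproduced here; below is a hedged sketch of one of them, assuming the student's expected payoffs take the form 𝑢𝐴(H) = ℎ + 𝑝𝐻 𝑟𝑆 + (1 − 𝑝𝐻 )𝑟𝐹 and 𝑢𝐴(L) = 𝑙 + 𝑝𝐿 𝑟𝑆 + (1 − 𝑝𝐿 )𝑟𝐹, and that the government's expected payoff nets out its expected reimbursement.

```python
from scipy.optimize import linprog

def subsidy_lp(pH, pL, vS, vF, h, l, induce_high=True):
    """Sketch of one of the two LPs (payoff formulas assumed, not from the
    slides): minimize the expected reimbursement p*rS + (1-p)*rF subject to
    the student preferring the targeted effort level over the other level
    and over N (whose payoff is zero). Variables x = [rS, rF]."""
    p, other = (pH, pL) if induce_high else (pL, pH)
    base, other_base = (h, l) if induce_high else (l, h)
    cost = [p, 1 - p]                            # expected reimbursement
    A_ub = [[other - p, p - other],              # u_A(other) <= u_A(target)
            [-p, -(1 - p)]]                      # u_A(target) >= 0
    b_ub = [base - other_base, base]
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
    if not res.success:
        return None                              # inducing this effort is infeasible
    return res.x, p * vS + (1 - p) * vF - res.fun   # (rS, rF), gov't payoff

print(subsidy_lp(pH=0.9, pL=0.5, vS=10, vF=0, h=-1.0, l=-0.5))
```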
EXTENSIVE GAMES
• Until now, we have assumed that players in a game know the rules, the possible outcomes, and each
other’s preferences over outcomes. Such games are said to be games with complete information.
• Sometimes we have also assumed that players choose their actions sequentially and know all actions taken
previously. Such games are said to be games with perfect information.
• Sequential games have both complete and perfect information and have no actions left to chance.
• Strategic games are imperfect information games with complete information.
• An extensive game is a way to model scenarios that have imperfect or incomplete information in some
way.
• We shall restrict ourselves to imperfect information variants of extensive games where players may not
always know where they are in a game.
• Thus we assume that the players do not know one or more actions taken by the other players.
• An extensive game consists of the following
Service Contract
• A local food processing plant, FoodPro, needs a small repair done.
• They normally call Ben’s contracting company.
• But the FoodPro manager decides to explore another contractor.
• So the FoodPro manager first asks Mark’s contracting company for a bid.
• Mark, who would like to have the work, can decide not to place a bid or to place a bid of some amount
• After receiving a response from Mark, the FoodPro manager tells Ben whether Mark has submitted a bid.
• But the amount of the bid, if one was submitted, is not told to Ben.
• The FoodPro manager then asks Ben for a bid.
• Since the project is small, Ben does not really want the work.
• However, he wants to keep FoodPro as a customer over the long term.
• The FoodPro manager plans to accept the lower of the two bids, but if the bids are of similar amounts, the
FoodPro manager will choose Ben over Mark.
• Assume that Mark and Ben are the only players in this game.
• The game is sequential in nature.
• Ben does not have full information about Mark’s action, thus this is an imperfect information game.
• To create a simple model, we assume Mark can choose one of three actions: not bid (No), bid low (Lo), or
bid high (Hi).
• Ben only knows whether Mark bids or does not bid.
• If Mark does bid, Ben does not know whether he bid high or low.
• Since Ben is not interested in doing the work for a low price, assume that he chooses between two actions:
not bid (No) or bid high (Hi).
• The game tree below summarizes this scenario by assigning appropriate utilities.
• The same game tree can also be represented using the table below.
• Since Ben is preferred over Mark, Ben will obtain the work with histories (No, Hi) and (Hi, Hi).
• Mark will obtain the work with histories (Lo, No), (Lo, Hi), and (Hi, No).
• Neither will obtain the work with history (No, No).
• Mark most prefers (Hi, No) because he gets the work at the most profit and probably gains a new customer.
• Mark prefers (Lo, Hi) over (Lo, No) because with the former outcome he is getting the highest possible profit given Ben's bid, and with the latter outcome he regrets losing out on a higher profit margin.
• Mark values the remaining histories in which he does not obtain the work in the order (Hi, Hi), (No, Hi),
and (No, No).
• Mark is happy to have at least tried in (Hi, Hi).
• We can also assume that Mark would rather want Ben to get the work than neither of them getting it in the first
round of bidding with (No, No).
• Next we discuss Ben’s payoffs.
• Ben most prefers (Hi, No) because he does not need to do a small job he’s not interested in.
• We can also assume that (Hi, No) does not take away future repair jobs of Ben at FoodPro.
• Ben next prefers histories (Hi, Hi), (No, Hi), and (Lo, Hi) because his concern about his long term relationship
with FoodPro is more important than his desire to not take on a small job.
• Also because (Hi, Hi) suggests to FoodPro that Ben’s charges are reasonable while (Lo, Hi) suggests that Ben
may be trying to cheat FoodPro.
• Ben’s lowest ranked option is (Lo, No) since it most likely leads to Ben losing future repair jobs to Mark.
• The history (No, No) is not ranked last because having FoodPro realize the difficulty of finding competent
repair contractors could be advantageous to Ben in the future.
• This completes utility assignment to players.
• The non-terminal histories are shown above to be partitioned into three groups which are referred to as the
information sets.
• Only Ben faces an information set, viz. Ben2, that contains more than one node.
• At Ben2, Ben does not know which sequence of actions has occurred.
• Ben has to make a decision whether to choose No or Hi without knowing whether Mark has bid Hi or Lo.
• In FoodPro, one pure strategy for Ben is to always choose No.
• Another pure strategy is to choose No when Mark bids (at information set Ben2) and Hi if Mark does not bid (at information set Ben1).
• Mark has three pure strategies, corresponding to the actions at his only non-terminal history: No, Lo, or
Hi.
• For strategic games, we know that players can adopt mixed strategies.
• A mixed strategy in strategic games is a probability distribution that assigns to each available action a
likelihood of being selected.
• We could also allow players in extensive games to adopt such mixed strategies.
• But it is more natural to allow players to randomize action choices at each information set based on
probability distributions.
• Such strategies are called behavior strategies.
• A pure strategy for player 𝑖 is a function 𝑠𝑖 which assigns to each of the player’s information sets a
possible action.
• A behavior strategy for player 𝑖 is a function 𝑠𝑖 which assigns to each of the player’s information sets a
probability distribution over possible actions.
• If 𝑠 is used to denote a pure or behavior strategy, let 𝑠𝑖 (𝐼) or simply 𝑠(𝐼) denote the action or probability
distribution over actions chosen by player 𝑖 at the assigned information set 𝐼.
• Let 𝑠𝑖 (𝑎|𝐼) or simply 𝑠(𝑎|𝐼) denote the probability that player 𝑖 will choose action 𝑎 at information set 𝐼.
• A belief system for an extensive game is a function, 𝛽, that assigns a probability distribution over histories in
each information set not assigned to chance.
• Thus, 𝛽(𝐼) denotes the probability distribution on the nodes in information set 𝐼.
• Let 𝛽(ℎ|𝐼) denote the probability of history ℎ in the information set 𝐼 with respect to the belief system 𝛽.
• A belief system models the players’ understanding of what has happened in the game up to 𝐼.
• When the player assigned to 𝐼 makes a choice of actions, that player uses the probabilities 𝛽(ℎ|𝐼) for all ℎ ∈ 𝐼.
• So these probabilities should reflect that player’s beliefs about how likely it is that each ℎ has occurred.
• The probability distribution 𝛽(𝐼) is also referred to as the “player 𝑖’s belief system at 𝐼.”
• For sequential games, we defined and used a stronger solution concept than Nash equilibrium, namely
subgame perfect equilibrium, that required player strategies to be best responses starting from each non-
terminal history of the game.
• In an extensive game, players may not know at which node they find themselves within an information
set, and so it would be impossible to determine the best response using the subgame perfect equilibrium
solution concept.
• However, if each player has a belief about how likely each history in the information set is to have
occurred, they can determine a best response based on these beliefs.
• An extensive game includes all aspects of a sequential game, but adds chance and information sets which
may contain more than one non-terminal history.
• Because of their sequential nature, we still use game trees as a visual representation for extensive games.
• Each node in the tree corresponds to a history: terminal nodes correspond to terminal histories and are
labeled with payoffs.
• The other nodes correspond to non-terminal histories and are grouped together within dashed boxes to
indicate information sets, which are labeled with chance or the player who chooses an action there.
• A player assigned to an information set only knows that they are somewhere within the dashed box, but
does not know specifically at which node, when choosing an action.
• When all of the information sets contain a single history and no information set is assigned to chance, that
extensive game becomes identical to a sequential game.
• An information set containing two or more histories incorporates imperfect information because the players
do not know some of the history of actions before making a choice and do not necessarily know the direct
effects of an action.
• A player in an extensive game has perfect recall if at every opportunity to act
i. he remembers his prior actions, and
ii. he remembers everything that he knew before.
• Intuitively, the perfect recall assumption is that players never forget information once it is acquired.
• An extensive game has perfect recall if every player in that game does.
• If the partners in a card game or the members of a soccer team are modeled as a single player, then the game
will not have perfect recall because each card player only knows their own cards and each team member can
only see certain parts of the soccer field.
• Even individual persons often forget certain choices they made in the past.
• There are two types of memory associated with perfect recall: memory of past knowledge and memory of
past actions.
• A player can forget what she knew and yet remember what she did in the past.
• A player can forget what actions she took in the past but remember what she knew at the time.
• In the FoodPro scenario, one pair of behavior strategies is for Mark to choose between No, Lo, and Hi with equal probability, and for Ben to choose No or Hi with equal probability at Ben1 but to choose Hi over No twice as often at Ben2.
• Thus, we get the strategy profile 𝑠 = (𝑠𝑀𝑎𝑟𝑘 , 𝑠𝐵𝑒𝑛 ) shown above. For this strategy profile, computing Mark's and Ben's expected payoffs is straightforward, as shown below for the two players respectively.
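• The slide's payoff table is not reproduced in these notes, so the sketch below uses hypothetical rank-based utilities (6 = best) that are consistent with the preference orderings described earlier:

```python
# Hypothetical rank-based utilities (6 = best) matching the orderings above.
u_mark = {("Hi", "No"): 6, ("Lo", "Hi"): 5, ("Lo", "No"): 4,
          ("Hi", "Hi"): 3, ("No", "Hi"): 2, ("No", "No"): 1}
u_ben  = {("Hi", "No"): 6, ("Hi", "Hi"): 5, ("No", "Hi"): 4,
          ("Lo", "Hi"): 3, ("No", "No"): 2, ("Lo", "No"): 1}
s_mark = {"No": 1/3, "Lo": 1/3, "Hi": 1/3}
s_ben1 = {"No": 1/2, "Hi": 1/2}          # used when Mark does not bid
s_ben2 = {"No": 1/3, "Hi": 2/3}          # used when Mark bids (Lo or Hi)

def expected(u):
    """Expected payoff under the behavior strategy profile above."""
    total = 0.0
    for m, pm in s_mark.items():
        ben = s_ben1 if m == "No" else s_ben2
        total += sum(pm * pb * u[(m, b)] for b, pb in ben.items())
    return total

print(expected(u_mark), expected(u_ben))
```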
• Also, if the consistency equation holds for all but one history in an information set, then the consistency equation clearly holds for the last remaining history, because 𝛽(ℎ|𝐼) and the normalized probabilities Pr𝐺′(ℎ) each sum to one over all ℎ ∈ 𝐼.
• So to verify that an assessment achieves consistency of beliefs, it is sufficient to check the consistency
equation at all except one history in each non-singleton information set where at least one history is
reached with positive probability (otherwise you will be dividing by zero).
• In the FoodPro game, since the subgame rooted at the node labeled Ben1 has no non-singleton information
sets, the consistency of beliefs condition holds trivially.
• For the subgame 𝐺 rooted at Mark1, there is one non-singleton information set that we have to consider,
namely Ben2 and it satisfies the consistency equation as shown below, therefore the original assessment
achieves consistency of beliefs.
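• A minimal numerical check of the consistency equation at Ben2 under Mark's uniform strategy:

```python
# Consistency at Ben2: beta(h|I) = Pr(h) / sum of Pr(h') over h' in I.
# Under Mark's behavior strategy (No, Lo, Hi each with probability 1/3),
# the two histories in Ben2 are reached with probability 1/3 each.
reach = {"Lo": 1 / 3, "Hi": 1 / 3}
total = sum(reach.values())
print({h: pr / total for h, pr in reach.items()})   # {'Lo': 0.5, 'Hi': 0.5}
```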
• However, the assessment with the changed belief strategy does not achieve consistency of beliefs as shown
above.
• But there is an easy recipe to define consistent beliefs.
• Trick 1: For any information set that is reached via player strategies with positive probability in some
subgame, you can construct the consistent beliefs by directly calculating the probability ratios from the
behavior strategies.
• Trick 2: If an information set is not reached via player strategies with a positive probability in any
subgame, then any probability distribution on that information set will be consistent.
• Definition. An assessment (𝑠, 𝛽) is a weak sequential equilibrium if it is both sequentially rational and
achieves consistency of beliefs.
• The assessment consisting of the original belief strategy in FoodPro game is not a weak sequential
equilibrium since it is not sequentially rational, even though it achieves consistency of beliefs.
• The assessment consisting of the changed belief strategy in FoodPro game is not a weak sequential
equilibrium since even though it is sequentially rational, it does not achieve consistency of beliefs.
• Using the tricks above, we can make the latter assessment achieve consistency of beliefs by simply
computing the beliefs according to the consistency equation.
• Consider the assessment as shown above. It is a weak sequential equilibrium.
• To show this, we must first prove it is sequentially rational which we have already done before.
• So we only need to show that it achieves consistency of beliefs by verifying the consistency equation.
• For the subgame 𝐺 rooted at Mark1, this is shown below.
• For strategic games, the primary solution concept has been Nash equilibrium.
• For sequential games, we have used subgame perfect equilibrium.
• For extensive games, we shall use weak sequential equilibrium which, similar to the subgame perfect
equilibrium solution concept, also eliminates certain unreasonable Nash equilibria.
• Strategic and sequential games are special cases of extensive games.
Existence Theorems
• The following theorems provide sufficient conditions for the existence of a weak sequential equilibrium.
• Weak Sequential Equilibrium Existence Theorem. If an extensive game has perfect recall and a finite
number of histories, then a weak sequential equilibrium exists.
• Theorem. Given a sequential game, the corresponding extensive game puts each non-terminal history in its
own information set. Each subgame perfect equilibrium of the sequential game becomes a weak sequential
equilibrium for the corresponding extensive game by adding a belief system achieving consistency of
beliefs, and the strategy profile of each weak sequential equilibrium of the corresponding extensive game
is a subgame perfect equilibrium of the sequential game.
• Theorem. Given a strategic game with strategy sets 𝑆1 , 𝑆2 , … , 𝑆𝑛 , the corresponding extensive game consists of the set of terminal histories 𝑆1 × 𝑆2 × ⋯ × 𝑆𝑛 and an information set 𝐼𝑖 = 𝑆1 × 𝑆2 × ⋯ × 𝑆𝑖−1 for each player 𝑖. Each Nash equilibrium of the strategic game becomes a weak sequential equilibrium for the corresponding extensive game by adding a belief system achieving consistency of beliefs, and the strategy profile of each weak sequential equilibrium of the corresponding extensive game is a Nash equilibrium of the strategic game.
• Nash’s Existence Theorem. Every game with a finite number of players in which each player can choose
from finitely many pure strategies has at least one Nash equilibrium, which might be a pure or mixed
strategy for each player.
Analysis of War Games
• Let RE and GT denote two parties at war with each other.
• RE prefers to conduct short campaigns raiding villages to capture or destroy tribal military supplies.
• RE can advance on the village either through a forest or across a small lake.
• The defending GT soldiers are sufficient to defend one of the approaches.
• If GT chooses correctly, GT will win the resulting battle.
• If GT chooses incorrectly, RE will capture the village and capture weapons and food.
• The GT soldiers are then faced with a second choice: attack RE in the village, or wait in ambush.
• The success of either option depends on a decision by RE on whether they will return to their base
immediately, or wait until nightfall.
• If GT is waiting in ambush and RE returns by day, then GT wins the resulting battle.
• If GT is waiting in ambush, but RE withdraws at night, then RE successfully escape the ambush.
• If GT attacks, but RE had withdrawn during the day, then RE are successful in their mission.
• But if GT attacks and RE has decided to wait to withdraw, then there is a vicious battle in which both
sides lose.
• RE can also decide not to attack at all, but doing so results in a loss of morale among soldiers and the removal of the commander.
• In this model, both players lack information.
• Their payoffs are assigned so that positive numbers correspond to an overall “win” for that player and
negative numbers correspond to an overall “loss” for that player.
• The 2 tables below sum this up for RE and then GT.
• Based on the payoffs given above, we can construct the following extensive game tree.
• Probability values have been labeled below edges and nodes.
• Just as boxing best response strategies helps you find all Nash equilibria and the backward induction algorithm helps you find all subgame perfect equilibria, we need a method for finding all possible weak sequential equilibria.
• This example scenario illustrates a straightforward way in which this can be done.
• The lack of information is indicated by the dashed line boxes around three pairs of the decision nodes.
• Within each of these boxes, the player who is making the decision does not know which history has
occurred.
• For example, within the GT1 box, when GT is making its decision whether to defend the forest or lake, it
does not know RE’s decision whether it advanced through the forest or over the lake.
• However, at the GT2 node, GT knows that they defended an advance across the lake while RE attacked
the village after advancing through the forest.
• This game has 3 subgames.
• First, the entire game has the empty history as its root.
• Second, the subgame with root (Forest, Lake) which corresponds to the node labeled GT2 and everything to the right of GT2.
• Third, the subgame with root (Lake, Forest) which corresponds to the node labeled GT3 and everything to the right of GT3.
• We first analyze the right-most action choices.
• At the information set RE3 by choosing Day, RE expects payoffs of (1 − r)(0) + r(2) = 2r.
• At the information set RE3 by choosing Night, RE expects payoffs of (1 − r)(2) + r(1) = 2 − r.
• Case 1. Let r > 2/3. Then Day is RE’s unique best response.
• So GT's unique best response at GT3 would be Ambush because a payoff of 1 is larger than a payoff of −2.
• So 𝑔 will be zero.
• Now to achieve consistency of beliefs, 𝑟 must be equal to 𝑔 as can be checked using the consistency equation.
• So r will also be zero.
• Since r > 2/3 and r = 0 are mutually contradictory, this case is rejected and we can conclude that there is no weak
sequential equilibrium for r > 2/3.
• Case 2. Let r < 2/3. Then Night is RE’s unique best response.
• So GT’s unique best response at GT3 would be Attack because a payoff of 0 is larger than a payoff of −2.
• So 𝑔 will be one.
• Again, to achieve consistency of beliefs, it follows that r = 1.
• Since r < 2/3 and r = 1 are mutually contradictory, this case is also rejected and we can conclude that there is no
weak sequential equilibrium for r < 2/3.
• Therefore, r = 2/3, and to achieve consistency of beliefs, 𝑔 = 2/3.
• The physical significance of r = 2/3 is that RE is indifferent at RE3 between choosing Day and Night.
• Similarly, 𝑔 = 2/3 being a best response at GT3 implies that GT must be indifferent there between choosing Ambush and Attack, because if the expected payoff of either were higher, then GT would select it with full probability.
• GT’s expected payoff from choosing Ambush at GT3 is (1 − d)(1) + d(−2) = 1 − 3d.
• GT’s expected payoff from choosing Attack at GT3 is (1 − d)(−2) + d(0) = 2d − 2.
• But both these expected payoffs must be equal as GT3 is indifferent between choosing Ambush and Attack.
• 1 − 3d = 2d − 2 implies d = 3/5.
• So the expected payoffs at GT3 for the 2 players are (1 − 2/3)((1 − 3/5)(0, 1) + (3/5)(2, −2)) + (2/3)((1 − 3/5)(2, −2) + (3/5)(1, 0)) = (4/3, −4/5).
• Since the structure and payoffs in the subgames rooted at GT2 and GT3 are identical, in any weak sequential equilibrium, 𝑓 = 𝑔 = 2/3, 𝑞 = 𝑟 = 2/3, 𝑐 = 𝑑 = 3/5, and the expected payoffs at GT2 are (4/3, −4/5).
• This results in the truncated game tree shown below.
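• The indifference computation above can be verified exactly with rational arithmetic; a minimal sketch:

```python
from fractions import Fraction as F

# GT's indifference at GT3: 1 - 3d = 2d - 2  =>  d = 3/5.
d = F(3, 5)
assert 1 - 3 * d == 2 * d - 2

# Expected payoffs (RE, GT) at GT3, reproducing the computation above:
g = F(2, 3)
mix = lambda w, p1, p2: tuple((1 - w) * a + w * b for a, b in zip(p1, p2))
branch1 = mix(d, (F(0), F(1)), (F(2), F(-2)))
branch2 = mix(d, (F(2), F(-2)), (F(1), F(0)))
print(mix(g, branch1, branch2))   # (Fraction(4, 3), Fraction(-4, 5))
```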
• Now analyze the GT1 information set.
• The expected payoff to GT when it chooses Lake (e = 0) is (1 − p)(−4/5) + p(2) = 2.8p − 0.8.
• The expected payoff to GT when it chooses Forest (e = 1) is (1 − p)(2) + p(−4/5) = 2 − 2.8p.
• Case 1. Let p > ½. Then Lake is GT's unique best response.
• Then RE’s best response at RE1 has to be Forest because the resulting payoff 4/3 is larger than the payoffs of −1
and −2 by choosing Don’t and Lake, respectively.
• So 𝑏 is equal to zero.
• To achieve consistency of beliefs, 𝑝 must be equal to 𝑏 which is zero.
• Since p > 1/2 and p = 0 are mutually contradictory, a weak sequential equilibrium cannot have p > 1/2.
• Case 2. Let p < ½.
• A similar argument shows that a weak sequential equilibrium cannot have p < 1/2.
• Thus, p = 1/2.
• To achieve consistency of beliefs, 𝑎 must be equal to 𝑏.
• If RE chooses 𝑎 = 𝑏 = 0 at weak sequential equilibrium, then RE’s expected payoff by choosing Don’t Attack
≥ RE’s expected payoffs by choosing Forest and Lake.
• RE’s expected payoff by choosing Don’t Attack is −1.
• RE’s expected payoffs by choosing Forest and Lake will be (1 − e)(4/3) + e(−2) and (1 − e)(−2) + e(4/3)
respectively.
• Summing the two inequalities −1 ≥ (1 − e)(4/3) + e(−2) and −1 ≥ (1 − e)(−2) + e(4/3) gives −2 ≥ −2/3.
• This is a contradiction, therefore a weak sequential equilibrium cannot have 𝑎 = 𝑏 = 0.
• Therefore, 𝑎 = 𝑏 > 0.
• So RE1 must be indifferent between choosing Forest and Lake.
• Now RE’s expected payoff from choosing Forest is (1−e)(4/3)+e(−2).
• Also RE’s expected payoff from choosing Lake is (1 − e)(−2) + e(4/3).
• So these two expected payoffs must be equal.
• Thus, e = ½.
• RE’s expected payoffs at RE1 for choosing Forest or Lake is −1/3.
• −1/3 is greater than −1 which is the expected payoff from choosing Don’t Attack.
• So RE discards Don’t Attack and chooses as its unique best response at RE1 to be 𝑎 = 𝑏 = 1/2.
• In conclusion, the assessment satisfying 𝑎 = 𝑏 = 1/2, 𝑐 = 𝑑 = 3/5, 𝑓 = 𝑔 = 2/3, 𝑝 = 1/2, 𝑞 = 𝑟 =
2/3 is the only possible weak sequential equilibrium for this game.