
Published as a conference paper at ICLR 2024

CHAIN-OF-EXPERTS: WHEN LLMS MEET COMPLEX OPERATIONS RESEARCH PROBLEMS

Ziyang Xiao1, Dongxiang Zhang1∗, Yangjun Wu1, Lilin Xu1, Yuan Wang3, Xiongwei Han2, Xiaojin Fu2, Tao Zhong2, Jia Zeng2, Mingli Song1, Gang Chen1
1 Zhejiang University   2 Huawei Noah's Ark Lab   3 School of Business, Singapore University of Social Sciences
{xiaoziyang, zhangdongxiang, brooksong, cg}@zju.edu.cn
[email protected], {hanxiongwei, Zeng.Jia}@huawei.com
[email protected], [email protected]
[email protected], [email protected]

ABSTRACT

Large language models (LLMs) have emerged as powerful techniques for various NLP tasks, such as mathematical reasoning and plan generation. In this paper, we study automatic modeling and programming for complex operations research (OR) problems, so as to alleviate the heavy dependence on domain experts and benefit a spectrum of industry sectors. We present the first LLM-based solution, namely Chain-of-Experts (CoE), a novel multi-agent cooperative framework to enhance reasoning capabilities. Specifically, each agent is assigned a specific role and endowed with domain knowledge related to OR. We also introduce a conductor that orchestrates these agents via a forward thought construction and backward reflection mechanism. Furthermore, we build a benchmark dataset (ComplexOR) of complex OR problems to facilitate OR research and community development. Experimental results show that CoE significantly outperforms state-of-the-art LLM-based approaches on both LPWP and ComplexOR.

1 INTRODUCTION

Operations research (OR) aims to mathematically model complex decision-making problems that arise from a wide spectrum of industry sectors. To automate the procedure and reduce the dependence on domain-specific modeling experts, NL4Opt (Natural Language for Optimization) (Ramamonjison et al., 2022a) has recently emerged as an attractive but challenging NLP task. Its objective is to translate the text description of an OR problem into math formulations for optimization solvers. To facilitate understanding of the task, an example from the current NL4Opt benchmark dataset is depicted in Figure 1. The prevailing NL4Opt models adopt a two-stage framework. Initially, they perform named entity recognition (NER) to identify variables, parameters, and constraints from the input text, which are subsequently converted into a mathematical optimization model. Despite their efficacy on elementary OR problems, these approaches fail to tackle complex real-world challenges.
In this paper, we study the automatic modeling and programming of complex OR problems derived from real-world industrial demands. As shown in Figure 1, their text descriptions often contain implicit constraints, posing a substantial interpretation challenge for existing NL4Opt solvers. For instance, the phrase "zero lead times", highlighted in green, conveys the absence of any time lag between production orders. Additionally, it is imperative to possess domain-specific knowledge to understand terminologies such as "backlogging", "carryover", and "lot-sizing". Finally, in contrast to the explicit input numbers in the simple example, complex OR problems exhibit an abundance of implicit variables that require specification from domain modeling experts. The magnitude of variables and constraints in these complex problems introduces formidable hurdles and results in a longer reasoning chain.


∗Corresponding author.


An example of the current NL4Opt dataset:
A theme park transports its visitors around the park either by scooter or rickshaw. A scooter can carry 2 people while a rickshaw can carry 3 people. To avoid excessive pollution, at most 40% of the vehicles used can be rickshaws. If the park needs to transport at least 300 visitors, minimize the total number of scooters used.
Modeling result: variables x (scooters) and y (rickshaws); constraints y ≤ 0.4(x + y) and 2x + 3y ≥ 300; objective: minimize x.
An example of our dataset

In the context of manufacturing planning, we tackle the Capacitated Multi-level Lot Sizing Problem with Backlogging.
We make the following assumptions in defining and formulating this problem. First, we assume that setup times and
costs are non-sequence dependent, setup carryover between periods is not permitted, and all initial inventories are
zero. Second, all production costs are assumed to be linear in production output and do not vary over time; hence,
they can be dropped from the model for simplicity. Setup and holding costs also are assumed not to vary over time.
Furthermore, end items are assumed to have no successors, and only end items have external demands and
backlogging costs. Finally, we assume zero lead times and no lost sales. It is important to note that all these
assumptions (except setup carryover) are made for ease of exposition only and without loss of generality, i.e., the
theoretical results remain valid even when they are removed. See Ozturk and Ornek (2010) for the lot-sizing problem
with setup carryover as well as with external demands for component items.

Figure 1: Comparison between elementary and complex NL4Opt problems. In the complex OR example, phrases in green indicate implicit constraints, and the domain-specific terminologies are highlighted in yellow. The model output is presented in Appendix A.1.

To resolve the above issues, we leverage the power of LLMs and present the first LLM-based solution. We propose a multi-agent reasoning framework, namely Chain-of-Experts (CoE), to orchestrate multiple LLM agents for complex OR problem solving. At the helm of this collaborative endeavor presides a central entity, designated the "Conductor", responsible for orchestrating the sequence of interactions among the agents. Each agent is assigned a specific role and is equipped with domain-specific expertise. We implement diversified agents with different skills, including but not limited to terminology interpretation, mathematical model construction, and programming. Furthermore, we incorporate a backward reflection mechanism. Through a systematic analysis of the output, the framework has the capacity to detect potential errors in the problem-solving process.
Comparison with other LLM-based reasoning. In recent years, extensive research efforts have been devoted to enhancing the reasoning capabilities of large language models (LLMs). Notable examples in this domain include Chain-of-Thought (Wei et al., 2022), Self-consistency (Wang et al., 2023a), Tree of Thoughts (Yao et al., 2023a), Graph of Thoughts (Besta et al., 2023), Progressive-Hint Prompting (Zheng et al., 2023), and ReAct (Yao et al., 2023b). These works have formulated distinct prompting schemes and approaches to thought transformation. Further elaboration on these methodologies is presented in the subsequent section. Unfortunately, these single-agent LLMs as well as multi-agent schemes like Solo-Performance Prompting (Wang et al., 2023b) exhibit conspicuous limitations when confronted with complex OR problems, because they cannot simultaneously tackle the challenges of implicit constraints, external knowledge prerequisites, and long reasoning chains. In our CoE, we address these challenges via multi-expert collaboration, and experimental results indicate that CoE significantly outperforms these LLM-based approaches.
Contributions. (1) We study NL4Opt at a more challenging level, which requires the model to discover implicit constraints, possess domain-specific knowledge, and perform complex reasoning. (2) We present the first LLM-based solution to complex OR problems. (3) We propose a novel multi-agent framework called Chain-of-Experts (CoE), enabling collaborative problem-solving and iterative modeling optimization based on a forward thought construction and backward reflection mechanism. (4) We also build a new dataset (ComplexOR), and the experimental results on it affirm the superior performance of CoE over 8 other LLM-based reasoning baselines.

2 RELATED WORK
NL4Opt Problems. NL4Opt aims to translate the descriptions of OR problems into mathematical
formulations. A benchmark dataset (https://github.com/nl4opt/nl4opt-competition) was curated by Ramamonjison et al. (2022a). To bridge the gap

between the natural language input p and context-free formulation r, they proposed a two-stage map-
ping p → r → f that first adopted the BART-base model (Lewis et al., 2020) with copy mechanism
to generate an intermediate representation r, which was then parsed into a canonical formulation.
Edit-based models (Malmi et al., 2022) can be applied as a post-processing step for error correction.
The two-stage framework was followed by subsequent studies. He et al. (2022) introduced an en-
semble text generator leveraging multitask learning techniques to enhance the quality of generated
formulations. In a similar vein, Ning et al. (2023) proposed a prompt-guided generation framework,
complemented by rule-based pre-processing and post-processing techniques, to enhance accuracy.
In a related research endeavor, Prasath & Karande (2023) investigated the synthesis of mathematical
programs. GPT-3 with back translation was utilized to synthesize the canonical forms as well as
Python code.

LLM-based Reasoning. Language models have shown substantial potential in solving complex reasoning tasks within specific domains, such as the TSP (Zhang et al., 2023), databases (Zhou et al., 2023), and knowledge systems (Zhu et al., 2023). Chain-of-Thought (CoT) (Wei et al., 2022) broke a complex reasoning task into a series of intermediate reasoning steps. Self-consistency (Wang et al., 2023a) replaced the greedy decoding in CoT by sampling a diverse set of reasoning paths and selecting the most consistent answer. Tree of Thoughts (ToT) (Yao et al., 2023a) and Graph of Thoughts (GoT) (Besta et al., 2023) further enhanced the reasoning capability by allowing LLMs to explore and combine thoughts in a structured manner. Progressive-Hint Prompting (PHP) (Zheng et al., 2023) progressively refined the answers by leveraging previously generated answers as hints. Subsequent works, such as ReAct (Yao et al., 2023b) and Reflexion (Shinn et al., 2023), allowed LLMs to interface with additional information or feedback from external sources. Recently, cooperation among multiple agents has also been explored. CAMEL (Li et al., 2023) introduced a novel communicative agent framework for autonomous cooperation. Solo Performance Prompting (SPP) (Wang et al., 2023b) transformed a single LLM into a cognitive synergist by simulating multiple personas and demonstrated the potential problem-solving abilities of multi-agent systems.

3 PROPOSED METHOD
3.1 EXPERT DESIGN

In our reasoning framework, an “expert” refers to a specialized agent based on a Large Language
Model (LLM) augmented with domain-specific knowledge and reasoning skills. Each expert is
assigned a specific role and undergoes four steps:
Step 1: In-context Learning. Each agent is allowed to access an external knowledge base and
perform top-k retrieval against the knowledge base. The retrieved information is then provided to
the LLM to facilitate in-context learning. For example, an expert responsible for generating Gurobi
programs can access the Gurobi official API documentation. This step is optional, depending on the
availability of the knowledge base.
Step 2: Reasoning. Each LLM-based expert utilizes existing prompting techniques, such as Chain-of-Thought or self-consistency, to perform its reasoning task according to its specific role. Our reasoning procedure consists of forward thinking and reflection modes, whose details will be presented in the subsequent section.
Step 3: Summarize. Due to the token limit of a single interaction with the LLM, an expert can choose to summarize its reasoning output. Since this step may result in significant information loss, it is optional for certain experts (e.g., modeling experts).
Step 4: Comment. This step is inspired by Solo Performance Prompting (Wang et al., 2023b), in
which the participants are allowed to give critical comments and detailed suggestions. The objective
is to make the communication between agents more constructive.
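To make this four-step design concrete, the following minimal Python sketch outlines one way such an expert could be wrapped around an LLM. It is an illustration only: the class structure and the llm_call and retrieve_top_k helpers are hypothetical placeholders, not the paper's released implementation.

# Minimal sketch of the four-step expert design (illustrative only).
# `llm_call` and `retrieve_top_k` are hypothetical helpers standing in for
# an LLM API call and a top-k retriever over an external knowledge base.

class Expert:
    def __init__(self, role, prompt_template, knowledge_base=None):
        self.role = role                      # e.g. "Programming Expert"
        self.prompt_template = prompt_template
        self.knowledge_base = knowledge_base  # optional (Step 1)

    def forward(self, problem, comments):
        # Step 1: in-context learning via top-k retrieval (optional).
        knowledge = ""
        if self.knowledge_base is not None:
            docs = retrieve_top_k(self.knowledge_base, problem, k=3)
            knowledge = "\n".join(docs)

        # Step 2: role-specific reasoning with an existing prompting scheme
        # (e.g. a Chain-of-Thought style instruction inside the template).
        prompt = self.prompt_template.format(
            problem=problem, comments="\n".join(comments), knowledge=knowledge)
        reasoning = llm_call(prompt)

        # Step 3: optional summarization to respect the token limit.
        summary = llm_call(f"Summarize the key conclusions:\n{reasoning}")

        # Step 4: turn the summary into a constructive comment for the others.
        comment = llm_call(
            f"As the {self.role}, give critical comments and concrete "
            f"suggestions based on:\n{summary}")
        return comment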

3.2 THE WORKFLOW OF CHAIN-OF-EXPERTS

The framework of our proposed Chain-of-Experts (CoE) is depicted in Figure 2. We initialize a collection of 11 experts, such as the terminology interpreter, modeling expert, programming expert, and code reviewer. Their detailed design specifications are available in Appendix A.2.1.


[Figure 2 (workflow diagram): the Conductor receives the problem input ("In the context of manufacturing planning, we tackle the Multi-level Lot Sizing Problem with Backlogging. We assume that...") and routes it through the Terminology Interpreter (which explains that "backlogging" refers to a situation where customer orders cannot be met on time), the Modeling Expert (variables, constraints, and objective), and the Programmer (gurobipy code) in the forward pass. The Evaluator's feedback ("Line 41: Variable 'Q' is not defined") triggers a backward pass in which the Programmer and the Modeling Expert reflect on their outputs, the Modeling Expert finds an error, the Programmer produces corrected code, and the final evaluation reports "Run successfully!".]

Figure 2: An example to illustrate the workflow of Chain-of-Experts. In this example, the Conductor receives the input problem and starts coordinating the experts. The exemplar workflow consists of ➀ terminology interpretation for the input problem; ➁ problem modeling; ➂ program generation; ➃ evaluation of correctness, identifying an issue; ➄ reflection on the program, confirming its correctness; ➅ reflection on the modeling, finding a mistake; ➆ program generation again; ➇ final evaluation, confirming correctness.

There is a Conductor to effectively coordinate these experts. It iteratively and dynamically selects
an expert to construct a forward thought chain. The candidate answer generated by the forward
reasoning chain will be passed to an external program execution environment, whose feedback signal
triggers a backward reflection procedure. The details of these two key components are elaborated in
the following:
Forward Thought Construction. During the forward thought construction phase, the experts are sequentially selected by the Conductor. We formulate forward thought construction as a sequential decision-making process, in which the set of experts is viewed as the action space. Let us define the input problem description as P and the set of pre-defined experts as E = {E_{φ_1}, E_{φ_2}, ..., E_{φ_n}}, where n is the total number of experts and φ_i represents the configuration of the i-th expert. Each expert is associated with an optional knowledge base and a prompt template. We denote the set of comments at the t-th reasoning step as C_t and define the state in Equation 1:

S_t = (P, C_t, t)    (1)
Unlike traditional reinforcement learning, which requires a large amount of training data, we utilize a training-free approach by leveraging the prompting techniques of large language models. This allows us to achieve the same functionality as a decision-making agent without any training. Consequently, we model the policy of the Conductor in Equation 2, where the Conductor acts as a policy function to select the experts, F represents the large language model, θ′ represents the parameters of the LLM, and PT_t denotes the prompt template at the t-th step:

Conductor_{F_{θ′}(PT_t)}(e | s) = Pr{E_{φ_t} = e | S_t = s}    (2)
Based on the above formulation, the expert selection policy can be translated into the design of a prompt template PT_t, which requires prompt engineering to achieve an optimal policy. The detailed design of the prompt template is presented in Appendix A.2.2. After designing the Conductor, we can update the comment set in each reasoning step as follows:

E_{φ_{i_t}} = Conductor(S_t)    (3)
c = E_{φ_{i_t}}(P, C_t)    (4)
C_{t+1} = C_t ∪ {c}    (5)
where E_{φ_{i_t}} represents the i_t-th expert selected at step t and c denotes the comment of the selected expert. We combine the previous comments C_t with c to obtain C_{t+1} as the new state. After a fixed number of steps T, the forward process of the Chain-of-Experts framework terminates. At this point, all the comments are summarized to form the final answer A.
Backward Reflection. The backward reflection mechanism in the Chain-of-Experts framework enables the system to leverage external feedback and adjust the collaboration among experts based on the evaluation of the problem-solving results.

Algorithm 1 Chain-of-Experts
Input: problem description p
Parameters: forward steps N, maximum forward-backward trials T
1: Initialize a set of comments C ← ∅
2: Initialize a stack of experts E ← ∅
3: for t = 1, ..., T do
4:     for i = 1, ..., N do
5:         expert_i ← Conductor(p, C)
6:         comment ← expert_i(p, C)
7:         C ← C ∪ {comment}
8:         E.push(expert_i)
9:     end for
10:    answer ← Reducer(p, C)
11:    feedback, passed ← Evaluator(answer)
12:    if passed then
13:        return answer
14:    end if
15:    stop_backward ← false
16:    while not stop_backward and not E.empty() do
17:        expert ← E.pop()
18:        feedback, stop_backward ← expert.reflect(p, C, feedback)
19:        C ← C ∪ {feedback}
20:    end while
21: end for
22: return answer

Let us define the trajectory of experts selected in order as τ = {E_{φ_{i_1}}, E_{φ_{i_2}}, ..., E_{φ_{i_T}}}, where i_t represents the index of the expert selected at step t. The backward reflection process starts with external feedback r_raw, which is typically provided by a program execution environment. This step can be denoted as r_raw ← execution(A). Then, the initial signals are derived from the evaluation of the raw external feedback: (r_0, sr_0) ← evaluate(r_raw), where r_0 is a boolean value indicating whether the backward process needs to continue and sr_0 represents the natural-language summary of the feedback, which is used to locate errors during the backward reflection process. If the answer A is deemed correct, r_0 is set to false and the whole reasoning procedure terminates. Otherwise, the Chain-of-Experts framework initiates a backward self-reflection process to update the answer. The process begins with the last expert E_{φ_{i_T}} and backpropagates in reverse order to iteratively update the feedback signal. At the t-th backward step, the update of the state is described by Equations 6 and 7, where reflect represents one of the reasoning abilities in the expert design. The reflect function also produces a tuple of r_t and sr_t, which aligns with the output of the evaluate function. In this case, r_t is set to true when the expert confirms the correctness of its previous comment.

(r_t, sr_t) ← reflect(E_{φ_{i_{T−t+1}}}, P, C_t, r_{t−1})    (6)
C_{t+1} = C_t ∪ {sr_t}    (7)

The backward reflection process continues until the feedback signal r_t indicates that the expert E_{φ_{i_{T−t+1}}} is the one who made the mistake, or until all experts have been reflected upon. Subsequently, a forward process is performed again.
It is worth noting that our reflection method differs from Reflexion (Shinn et al., 2023): in CoE, reflection is performed at the system level with interaction among multiple experts, and the feedback is recursively backpropagated, whereas Reflexion involves only a single LLM.

3.3 IMPLEMENTATION DETAILS

Algorithm 1 provides the implementation pseudo-code of the Chain-of-Experts framework, which consists of four main stages:
Initialization (lines 1-2): The process begins by initializing the set of comments C. Additionally, a stack E is used to store the selected experts, ensuring a first-in-last-out order for forward thought construction and backward reflection.


Forward Thought Construction (lines 4-9): Experts are selected sequentially by the Conductor, with each expert contributing its comments to the global comment set C. The forward construction process continues for a fixed number of steps N. As depicted in line 10, once the forward construction is completed, a Reducer is employed to summarize all the comments and generate a final answer. Since the comment set contains numerous comments after the forward construction, the Reducer plays a crucial role in summarizing and reconciling them. For the detailed prompt template design of the Reducer, please refer to Appendix A.2.3.
Backward Reflection (lines 11-20): In line 11, once a solution is obtained, an Evaluator gathers external feedback and converts it into natural language to assess the solution's correctness. If the solution is deemed incorrect, the system enters a reflection phase. In this phase, experts are consulted iteratively in reverse order by popping them from the stack. They are prompted to reflect on their solution and provide additional comments if necessary. As indicated in line 16, the backward process continues until a mistake is found by self-reflection or the first expert is reached.
Iterative Improvement (loop in line 3): The steps of forward thought construction and backward
reflection are repeated iteratively until a satisfactory solution is achieved or a maximum number of
trials T is reached.
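As a rough illustration, Algorithm 1 could be transcribed into Python along the following lines. This is a sketch under the assumption that the Conductor, Reducer, Evaluator, and expert objects expose the call signatures used in the pseudo-code; it is not the authors' released implementation.

# Illustrative transcription of Algorithm 1 (not the official implementation).
def chain_of_experts(problem, conductor, reducer, evaluator,
                     forward_steps=5, max_trials=3):
    comments = []                                # global comment set C
    answer = None
    for _ in range(max_trials):                  # iterative improvement (line 3)
        expert_stack = []                        # first-in-last-out order
        # Forward thought construction (lines 4-9)
        for _ in range(forward_steps):
            expert = conductor.select(problem, comments)
            comments.append(expert.forward(problem, comments))
            expert_stack.append(expert)
        # Summarize all comments into a candidate answer (line 10)
        answer = reducer.summarize(problem, comments)
        # External evaluation, e.g. running the generated program (line 11)
        feedback, passed = evaluator.evaluate(answer)
        if passed:
            return answer
        # Backward reflection in reverse order of selection (lines 15-20)
        stop_backward = False
        while not stop_backward and expert_stack:
            expert = expert_stack.pop()
            feedback, stop_backward = expert.reflect(problem, comments, feedback)
            comments.append(feedback)
    return answer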

4 EXPERIMENTS
4.1 DATASETS

LPWP. The LPWP dataset (Ramamonjison et al., 2022b) is collected from the NL4Opt competition at NeurIPS 2022. It comprises 1101 elementary-level linear programming (LP) problems. Each problem consists of a text description with IR annotations including parameters, variables, linear constraints, and the objective function. The dataset is partitioned into 713 training samples, 99 validation samples, and 289 test samples for performance evaluation.
ComplexOR. With the assistance of three specialists with expertise in operations research, we constructed and released the first dataset of complex OR problems. We selected 37 problems from diversified sources, including academic papers, textbooks, and real-world industry scenarios. These problems cover a wide range of subjects, spanning from supply chain optimization and scheduling problems to warehousing logistics. It took the experts nearly a month to annotate each problem with its model formulation and a minimum of five test cases to verify the correctness of generated code.

4.2 MODEL SETUP AND PERFORMANCE METRICS

In our experimental setup, we use GPT-3.5-turbo as the default large language model. We set the temperature parameter to 0.7 and conduct five runs to average the metrics. The number of iterations is set to 3, with each iteration consisting of 5 forward steps by default.
Since it is infeasible for domain experts to manually evaluate the output of the LLM-based solutions, we employ an automated code evaluation process. Specifically, we require each solution to generate the programming code for each OR problem. If the code passes the associated test cases annotated by the OR specialists, we consider the problem successfully solved, and we use Accuracy to indicate the success rate. Besides this rigorous metric, we also adopt the compile error rate (CE rate) to capture the percentage of generated programs that fail to compile, possibly caused by issues in the automatic modeling process. Alternatively, the runtime error rate (RE rate) measures the percentage of generated programs that encounter errors during execution, which are caused by internal logic errors such as unsolvable models or non-linear constraints. The experimental code is at https://github.com/xzymustbexzy/Chain-of-Experts.
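The automated evaluation described above could be realized roughly as follows. The sketch assumes each solver emits a Python script and that the annotated test cases are stored as input/expected-output pairs; this reflects our reading of the setup rather than the released evaluation code, and run_case is a hypothetical helper.

# Hedged sketch of the automated metric computation (Accuracy, CE rate, RE rate).
import subprocess

def evaluate_programs(programs, test_cases, timeout=60):
    solved = compile_errors = runtime_errors = 0
    for prog, cases in zip(programs, test_cases):
        try:
            compile(prog, "<generated>", "exec")     # syntax check -> CE rate
        except SyntaxError:
            compile_errors += 1
            continue
        try:
            ok = all(run_case(prog, case, timeout) for case in cases)
        except RuntimeError:                         # execution failure -> RE rate
            runtime_errors += 1
            continue
        solved += ok                                 # all test cases pass -> solved
    n = len(programs)
    return {"Accuracy": solved / n,
            "CE rate": compile_errors / n,
            "RE rate": runtime_errors / n}

def run_case(prog, case, timeout):
    # Hypothetical helper: executes the generated script on one annotated
    # test case and compares its output with the expected optimum.
    result = subprocess.run(["python", "-c", prog], input=case["input"],
                            capture_output=True, text=True, timeout=timeout)
    if result.returncode != 0:
        raise RuntimeError(result.stderr)
    return result.stdout.strip() == case["expected"].strip()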

4.3 BASELINES

We compare CoE with 9 baselines. As to traditional approaches for NL4Opt, we consider tag-BART (Gangwar & Kani, 2022) as the SOTA model, which won 1st place in the NeurIPS competition (Ramamonjison et al., 2022b). We also compare CoE with prevailing LLM-based methods, including Chain-of-Thought, Progressive Hint, Tree-of-Thought, Graph-of-Thought, ReAct, Reflexion, and Solo Performance Prompting. The default GPT without any optimization of the reasoning chain is named Standard, which is expected to achieve inferior performance.


Table 1: Comparison with baselines on LPWP and ComplexOR

                        LPWP                                  ComplexOR
Method                  Accuracy↑   CE rate↓   RE rate↓       Accuracy↑   CE rate↓   RE rate↓
tag-BART                47.9%       -          -              0%          -          -
Standard                42.4%       18.1%      13.2%          0.5%        36.8%      8.6%
Chain-of-Thought        45.8%       20.5%      9.4%           0.5%        35.3%      8.6%
Progressive Hint        42.1%       19.4%      10.3%          2.2%        35.1%      13.5%
Tree-of-Thought         47.3%       17.4%      9.7%           4.9%        31.4%      7.6%
Graph-of-Thought        48.0%       16.9%      9.1%           4.3%        32.4%      8.1%
ReAct                   48.5%       15.5%      11.2%          14.6%       31.9%      10.8%
Reflexion               50.7%       7.3%       9.0%           13.5%       12.9%      10.1%
Solo Performance        46.8%       17.9%      13.6%          7.0%        46.5%      13.5%
CoE without expert      55.1%       4.0%       11.9%          18.8%       7.9%       15.0%
Chain-of-Experts        58.9%       3.8%       7.7%           25.9%       7.6%       6.4%

We also implement a variant that uses the same model with a uniform system prompt, "You are a helpful assistant," across all roles and without any additional knowledge bases (CoE without expert in Table 1). The detailed implementation of these algorithms is described in Appendix A.3.

4.4 OVERALL PERFORMANCE ON LPWP AND COMPLEXOR

The results in terms of accuracy, CE rate, and RE rate on the two benchmark datasets are reported in Table 1. Since the traditional method tag-BART is not capable of generating code, we measure its accuracy by requiring the constraints and objective in its math model to be correct. Note that generating a valid linear programming model is a prerequisite step for correct code generation. Even under such a loose evaluation metric, tag-BART is still inferior to certain LLM-based baselines, verifying the feasibility of applying LLMs to solve OR problems. We also observe that tag-BART fails on all problem instances of ComplexOR.
Among the LLM-based baselines, Reflexion stands out as the most promising OR problem solver. Due to its self-reflection mechanism, it achieves the lowest CE rate and RE rate on both datasets. When confronted with complex OR problems, its overall accuracy is slightly inferior to ReAct. The reason is that in complex OR problems, the ability to access external knowledge bases becomes more crucial, which is a strength of ReAct. Even though Solo Performance also adopts a multi-agent reasoning framework, its performance is not satisfactory. Unlike our collaborative reasoning framework, its agents are simply initialized by a leader LLM and lack effective cooperation to solve the challenging OR problems.
Our proposed CoE establishes clear superiority across all performance metrics on both datasets. On LPWP, the accuracy is 58.9%, surpassing the state-of-the-art agent Reflexion by 8.2%. In its best run, CoE also manages to solve 10 out of 37 complex problem instances in ComplexOR. The outstanding performance stems from the effective design of the Chain-of-Experts reasoning framework, including the expert design methodology, the roles of the Conductor and Reducer, and the reflection mechanism. We also find that removing the experts' specialized features leads to a decrease in accuracy, which suggests that CoE benefits from using specialized experts. In the next experiment, we investigate the effect of these ingredients through an ablation study.

4.5 ABLATION STUDY

Regarding single-expert design, Table 2 highlights the positive roles played by both the knowledge base and the reasoning ability. Removing these components results in a slight drop in accuracy. Interestingly, we find that summarization is the most crucial design aspect. The reasons are two-fold. First, the length of the comments may exceed the token limit of GPT-3.5-turbo, and the overflowing tokens will be discarded. Second, a compact and meaningful summary is more friendly for decision making by the downstream experts.


Table 2: Ablation study of Chain-of-Experts

                                    LPWP                            ComplexOR
Method                              Accuracy  CE rate  RE rate     Accuracy  CE rate  RE rate
CoE (Full)                          58.9%     3.8%     7.7%        25.9%     7.6%     6.4%
inner-agent: w/o knowledge base     58.0%     4.0%     8.5%        23.3%     8.4%     7.9%
inner-agent: w/o CoT reasoning      58.2%     3.7%     7.9%        24.3%     8.1%     6.4%
inner-agent: w/o summarize          56.3%     3.8%     9.4%        20%       7.6%     10.3%
inter-agent: w/o Reflection         55.6%     4.2%     12.2%       22.7%     7.8%     10.6%
inter-agent: w/o Conductor          54.2%     6.5%     8.2%        21.1%     8.1%     8.6%
inter-agent: w/o Reducer            56.5%     5.5%     8.8%        23.0%     9.2%     8.1%

For inter-expert collaboration, we evaluate the effect of the backward reflection, the forward thought construction by the Conductor, and the Reducer. As shown in Table 2, after removing the backward reflection component, the accuracy drops significantly from 58.9% to 55.6%, and the RE rate increases noticeably from 7.7% to 12.2%. These results imply that without the reflection mechanism, the system is prone to mistakes in logical reasoning and lacks the ability of self-correction. To evaluate the effect of the Conductor, we replace it with random selection of subsequent experts during the construction of the forward chain of thought. The performance also degrades significantly because the experts are no longer well coordinated, and the random selection of experts may even be detrimental to the reasoning process. It is surprising to find that the Reducer component also contributes remarkably. This module summarizes the collective insights from multiple preceding experts. If we remove it, the answer is extracted from the concatenation of the experts' raw comments, which may lead to incorrect conclusions, as the comments can be fragmented and even conflict with each other.

4.6 PARAMETER SENSITIVITY ANALYSIS

We evaluate two critical parameters related to reasoning capabilities: the number of steps in the forward thought construction and the temperature of the large language model. From the results shown in Figure 3a, we can draw two conclusions. First, a lower value of the temperature parameter tends to lead to better performance. This suggests that, for knowledge-intensive problems, the experts benefit from providing more deterministic and consistent thoughts, rather than creative or diverse ones. Second, a longer reasoning chain that involves more experts in the forward thought construction generally improves accuracy. However, this comes at the cost of higher reasoning time and more API requests. That is why we select temperature = 0 and forward_steps = 5 as the default parameter configuration.

[Figure 3a plots CoE accuracy on LPWP against the number of forward steps (2 to 8) under temperature settings 0.0, 0.3, 0.6, and 0.9. Figure 3b plots the selection frequency of each individual expert (Terminology Interpreter, Parameter Extraction, Variable Extraction, Constraint Extraction, Objective Extraction, Modeling Knowledge, Modeling, LP File Generation, Feasibility Check, Code Example Provider, Programming, Code Reviewer).]

(a) CoE performance under different parameter settings. (b) Selection frequency of individual experts.

Figure 3: Parameter sensitivity analysis and selection frequency analysis on LPWP


Table 3: Robustness of Chain-of-Experts under different large language models

                        GPT-3.5-turbo              GPT-4                      Claude2
Method                  LPWP      ComplexOR        LPWP      ComplexOR        LPWP      ComplexOR
Standard                42.4%     0.5%             47.3%     4.9%             44.9%     0.0%
Reflexion               50.7%     13.5%            53.0%     16.8%            51.4%     12.4%
Chain-of-Experts        58.9%     25.9%            64.2%     31.4%            62.0%     27.0%

4.7 OTHER LLMS AS BASE REASONING MODELS

We also conduct an investigation into the impact of using different LLMs within Chain-of-Experts. We consider GPT-4 and Claude2 as two alternative LLMs and select Standard and Reflexion as two baselines. As shown in Table 3, all methods benefit from the upgrade to more advanced LLMs. However, our Chain-of-Experts approach exhibits the most substantial improvement. For instance, when GPT-4 is used, we observe an accuracy boost of 8.3% on LPWP and 5.5% on ComplexOR, the highest among the three methods.

4.8 EXPERIMENTAL ANALYSIS OF EXPERT SELECTION FREQUENCY

In the final experiment, we aim to gain a deeper understanding of the Conductor's behavior and examine the rationality behind its selection of experts.
First, we conduct experiments on the LPWP dataset and analyze the selection frequency of each expert. In Figure 3b, we observe that the Programming Expert and the Modeling Expert are the two most frequently selected experts. This finding is consistent with the expectation that modeling and programming are crucial to solving OR problems. Additionally, we notice that the extraction of parameters, variables, constraints, and objective functions is rarely selected. This can be attributed to advancements in the language comprehension capabilities of LLMs, which can now understand problem statements directly, without the need for the step-by-step NER used in traditional methods.
Moreover, we study the most frequently sampled collaboration paths involving multiple experts. Each problem-solving process yields a path that represents the order of the experts involved. In Table 4, we observe that when the parameter forward_steps is set to 2, so that only two experts collaborate to solve a problem, the most frequent path is from the Modeling Expert to the Programming Expert. This finding aligns with the importance of these two roles in problem-solving. Additionally, when the number of steps is set to 6, the collaboration path becomes more complex and resembles real-world workflows.

Table 4: The most frequent collaboration paths of experts under different forward-step settings.

Forward steps   Most frequent path
2               Modeling → Programming
3               Knowledge → Modeling → Programming
4               Terminology Interpreter → Knowledge → Modeling → Programming
5               Terminology Interpreter → Modeling → LP File Generator → Programming → Code Reviewer
6               Terminology Interpreter → Modeling → LP File Generator → Programming Example Provider → Programming → Code Reviewer

5 CONCLUSION
In this paper, we presented the first LLM-based solution to complex OR problems. To enhance reasoning capabilities, we devised Chain-of-Experts (CoE), a novel multi-agent cooperative framework. The core of CoE is a conductor orchestrating a group of LLM-based experts via a forward thought construction and backward reflection mechanism. We built a new dataset, ComplexOR, to facilitate OR research and community development. Experimental results indicate that our CoE significantly outperforms the state-of-the-art reasoning methods on both LPWP and ComplexOR.


6 ACKNOWLEDGEMENTS

The work is supported by the National Key Research and Development Project of China
(2022YFF0902000).

REFERENCES

Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, Lukas Gianinazzi, Joanna Gajda, Tomasz Lehmann, Michal Podstawski, Hubert Niewiadomski, Piotr Nyczyk, and Torsten Hoefler. Graph of thoughts: Solving elaborate problems with large language models. CoRR, abs/2308.09687, 2023. doi: 10.48550/arXiv.2308.09687. URL https://doi.org/10.48550/arXiv.2308.09687.

JiangLong He, Mamatha N, Shiv Vignesh, Deepak Kumar, and Akshay Uppal. Linear programming word problems formulation using EnsembleCRF NER labeler and T5 text generator with data augmentations, 2022.

Dave Hulbert. Tree of knowledge: ToK aka tree of knowledge dataset for large language models (LLM). https://github.com/dave1010/tree-of-thought-prompting, 2023.

Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 7871–7880, Online, July 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.703. URL https://aclanthology.org/2020.acl-main.703.

Guohao Li, Hasan Abed Al Kader Hammoud, Hani Itani, Dmitrii Khizbullin, and Bernard Ghanem. CAMEL: Communicative agents for "mind" exploration of large scale language model society. CoRR, abs/2303.17760, 2023. doi: 10.48550/arXiv.2303.17760. URL https://doi.org/10.48550/arXiv.2303.17760.

Eric Malmi, Yue Dong, Jonathan Mallinson, Aleksandr Chuklin, Jakub Adámek, Daniil Mirylenka, Felix Stahlberg, Sebastian Krause, Shankar Kumar, and Aliaksei Severyn. Text generation with text-editing models. CoRR, abs/2206.07043, 2022. doi: 10.48550/arXiv.2206.07043. URL https://doi.org/10.48550/arXiv.2206.07043.

Neeraj Gangwar and Nickvash Kani. Tagged input and decode all-at-once strategy. https://github.com/MLPgroup/nl4opt-generation, 2022.

Yuting Ning, Jiayu Liu, Longhu Qin, Tong Xiao, Shangzi Xue, Zhenya Huang, Qi Liu, Enhong Chen, and Jinze Wu. A novel approach for auto-formulation of optimization problems, 2023.

Ganesh Prasath and Shirish Karande. Synthesis of mathematical programs from natural language specifications. DL4C Workshop, ICLR, 2023.

Rindra Ramamonjison, Haley Li, Timothy Yu, Shiqi He, Vishnu Rengan, Amin Banitalebi-Dehkordi, Zirui Zhou, and Yong Zhang. Augmenting operations research with auto-formulation of optimization models from problem descriptions. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track, pp. 29–62, Abu Dhabi, UAE, December 2022a. Association for Computational Linguistics. doi: 10.18653/v1/2022.emnlp-industry.4. URL https://aclanthology.org/2022.emnlp-industry.4.

Rindra Ramamonjison, Haley Li, Timothy T. L. Yu, Shiqi He, Vishnu Rengan, Amin Banitalebi-Dehkordi, Zirui Zhou, and Yong Zhang. Augmenting operations research with auto-formulation of optimization models from problem descriptions. In Yunyao Li and Angeliki Lazaridou (eds.), Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: EMNLP 2022 - Industry Track, Abu Dhabi, UAE, December 7-11, 2022, pp. 29–62. Association for Computational Linguistics, 2022b. doi: 10.18653/v1/2022.emnlp-industry.4. URL https://doi.org/10.18653/v1/2022.emnlp-industry.4.

Noah Shinn, Beck Labash, and Ashwin Gopinath. Reflexion: An autonomous agent with dynamic memory and self-reflection. CoRR, abs/2303.11366, 2023. doi: 10.48550/arXiv.2303.11366. URL https://doi.org/10.48550/arXiv.2303.11366.

Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc V. Le, Ed H. Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. Self-consistency improves chain of thought reasoning in language models. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023a. URL https://openreview.net/pdf?id=1PL1NIMMrw.

Zhenhailong Wang, Shaoguang Mao, Wenshan Wu, Tao Ge, Furu Wei, and Heng Ji. Unleashing cognitive synergy in large language models: A task-solving agent through multi-persona self-collaboration. CoRR, abs/2307.05300, 2023b. doi: 10.48550/arXiv.2307.05300. URL https://doi.org/10.48550/arXiv.2307.05300.

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V. Le, and Denny Zhou. Chain-of-thought prompting elicits reasoning in large language models. In NeurIPS, 2022. URL http://papers.nips.cc/paper_files/paper/2022/hash/9d5609613524ecf4f15af0f7b31abca4-Abstract-Conference.html.

Xuanhe Zhou, Zhaoyan Sun, and Guoliang Li. DB-GPT: Large language model meets database, 2023.

Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, and Karthik Narasimhan. Tree of thoughts: Deliberate problem solving with large language models. CoRR, abs/2305.10601, 2023a. doi: 10.48550/arXiv.2305.10601. URL https://doi.org/10.48550/arXiv.2305.10601.

Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R. Narasimhan, and Yuan Cao. ReAct: Synergizing reasoning and acting in language models. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023b. URL https://openreview.net/pdf?id=WE_vluYUL-X.

Dongxiang Zhang, Ziyang Xiao, Yuan Wang, Mingli Song, and Gang Chen. Neural TSP solver with progressive distillation. In Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence, AAAI'23/IAAI'23/EAAI'23. AAAI Press, 2023. ISBN 978-1-57735-880-0. doi: 10.1609/aaai.v37i10.26432. URL https://doi.org/10.1609/aaai.v37i10.26432.

Chuanyang Zheng, Zhengying Liu, Enze Xie, Zhenguo Li, and Yu Li. Progressive-hint prompting improves reasoning in large language models. CoRR, abs/2304.09797, 2023. doi: 10.48550/arXiv.2304.09797. URL https://doi.org/10.48550/arXiv.2304.09797.

Di Zhu, Hailian Yin, Yidan Xu, Jiaqi Wu, Bowen Zhang, Yaqi Cheng, Zhanzuo Yin, Ziqiang Yu, Hao Wen, and Bohan Li. A survey of advanced information fusion system: From model-driven to knowledge-enabled. Data Science and Engineering, 8:1–13, 2023. doi: 10.1007/s41019-023-00209-8.

A APPENDIX

A.1 AN EXAMPLE OF THE COMPLEXOR DATASET

The example shown in Figure 4 is a typical instance of a complex operations research problem. Specifically, it illustrates a Capacitated Lot Sizing Problem, which is much more challenging. This problem involves a wide range of constraints, such as summations, equations, and inequalities. Unlike simpler problems, the objective function in this case is not a straightforward linear expression but rather a summation across multiple set variables. These combined characteristics categorize this problem as a complex optimization challenge.


An example of our dataset


In the context of manufacturing planning, we tackle the Capacitated Multi-level Lot Sizing Problem with Backlogging.
We make the following assumptions in defining and formulating this problem. First, we assume that setup times and
costs are non-sequence dependent, setup carryover between periods is not permitted, and all initial inventories are
zero. Second, all production costs are assumed to be linear in production output and do not vary over time; hence,
they can be dropped from the model for simplicity. Setup and holding costs also are assumed not to vary over time.
Furthermore, end items are assumed to have no successors, and only end items have external demands and
backlogging costs. Finally, we assume zero lead times and no lost sales. It is important to note that all these
assumptions (except setup carryover) are made for ease of exposition only and without loss of generality, i.e., the
theoretical results remain valid even when they are removed. See Ozturk and Ornek (2010) for the lot-sizing
problem with setup carryover as well as with external demands for component items.

Modeling Result

Sets: Periods, M, I, end, eta
Parameters: sc, bc, hc, st, a, gd, Mn, r, C
Variables: x, s, b, y
Constraints:
invBalance1: x_{i,t} + s_{i,t-1} + b_{i,t} - b_{i,t-1} = gd_{i,t} + s_{i,t}   for i ∈ end, t ∈ Periods
invBalance2: x_{i,t} + s_{i,t-1} = gd_{i,t} + Σ_{j ∈ eta_i} r_{i,j} · x_{j,t} + s_{i,t}   for i ∈ I \ end, t ∈ Periods
capConstraints: Σ_{i ∈ I} a_{i,m} · x_{i,t} + Σ_{i ∈ I} st_{i,m} · y_{i,t} ≤ C_{m,t}   for m ∈ M, t ∈ Periods
setupConstraints: x_{i,t} ≤ Mn · y_{i,t}   for i ∈ I, t ∈ Periods
initStatus: s_{i,0} = 0   for i ∈ I
endStatus: b_{i,T} = 0   for i ∈ I
Objective: minimize Σ_{i ∈ I, t ∈ Periods} (sc_i · y_{i,t} + hc_i · s_{i,t}) + Σ_{i ∈ end, t ∈ Periods} bc_i · b_{i,t}

Figure 4: An example from the ComplexOR dataset.
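For reference, a gurobipy program implementing the modeling result above might look roughly as follows. The data values (items, bill-of-materials coefficients, demands, costs, and capacities) are invented toy values used purely to make the sketch runnable; only the constraint and objective structure mirrors the formulation in Figure 4, and the sketch is not output produced by the system.

# Hedged sketch of a gurobipy implementation of the CMLSP with backlogging.
import gurobipy as gp
from gurobipy import GRB

# Toy data (hypothetical, for illustration only).
periods = list(range(1, 5))            # 4 planning periods
items = ["A", "B"]                     # "A" is an end item, "B" is its component
end_items = ["A"]
machines = ["M1"]
succ = {"A": [], "B": ["A"]}           # eta_i: successors of item i
r = {("B", "A"): 2}                    # units of B required per unit of A
gd = {(i, t): (50 if i == "A" else 0) for i in items for t in periods}
sc = {"A": 100, "B": 80}               # setup costs
hc = {"A": 2, "B": 1}                  # holding costs
bc = {"A": 10}                         # backlogging costs (end items only)
a = {(i, "M1"): 1 for i in items}      # unit production times
st = {(i, "M1"): 5 for i in items}     # setup times
C = {("M1", t): 300 for t in periods}  # machine capacities
Mn = 10000                             # big-M constant
T = max(periods)

m = gp.Model("CMLSP_backlogging")
x = m.addVars(items, periods, name="x")                    # production quantity
s = m.addVars(items, [0] + periods, name="s")              # inventory
b = m.addVars(end_items, [0] + periods, name="b")          # backlog
y = m.addVars(items, periods, vtype=GRB.BINARY, name="y")  # setup indicator

# Inventory balance for end items (with backlogging).
m.addConstrs((x[i, t] + s[i, t-1] + b[i, t] - b[i, t-1] == gd[i, t] + s[i, t]
              for i in end_items for t in periods), name="invBalance1")
# Inventory balance for component items (demand driven by successors).
m.addConstrs((x[i, t] + s[i, t-1] ==
              gd[i, t] + gp.quicksum(r[i, j] * x[j, t] for j in succ[i]) + s[i, t]
              for i in items if i not in end_items for t in periods),
             name="invBalance2")
# Machine capacity.
m.addConstrs((gp.quicksum(a[i, mc] * x[i, t] for i in items) +
              gp.quicksum(st[i, mc] * y[i, t] for i in items) <= C[mc, t]
              for mc in machines for t in periods), name="capConstraints")
# Setup forcing.
m.addConstrs((x[i, t] <= Mn * y[i, t] for i in items for t in periods),
             name="setupConstraints")
# Boundary conditions (zero initial inventories; initial backlog assumed zero).
m.addConstrs((s[i, 0] == 0 for i in items), name="initStatus")
m.addConstrs((b[i, 0] == 0 for i in end_items), name="initBacklog")
m.addConstrs((b[i, T] == 0 for i in end_items), name="endStatus")

m.setObjective(
    gp.quicksum(sc[i] * y[i, t] + hc[i] * s[i, t] for i in items for t in periods) +
    gp.quicksum(bc[i] * b[i, t] for i in end_items for t in periods),
    GRB.MINIMIZE)
m.optimize()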

A.2 MORE IMPLEMENTATION DETAILS OF CHAIN-OF-EXPERTS

A.2.1 EXPERTS DESIGN

In this section, we provide an in-depth overview of the individual experts participating in the Chain-
of-Experts framework. Table 5 offers a comprehensive list of these experts, each assigned a specific
role and domain knowledge relevant to OR problem-solving.

Table 5: All experts involved in Chain-of-Experts

Expert name                               Knowledge base
Terminology Interpreter                   Supply Chain Optimization & Scenario Modeling
Parameter Extraction Expert               -
Variable Extraction Expert                -
Constraint Extraction Expert              -
Objective Function Extraction Expert      -
Modeling Knowledge Supplement Expert      GAMS-Cutting Edge Modeling
Modeling Expert                           -
LP File Generation Expert                 LP format Documentation
Constraint Feasibility Check Expert       -
Programming Example Provider              Gurobi Example Tour
Programming Expert                        Gurobi Reference Manual
Code Reviewer                             -

Below, we present the detailed descriptions and prompt template implementations for each expert.
Please note that text enclosed within curly braces signifies placeholders that will be dynamically
populated during runtime based on the specific problem description, comments provided by experts,
and retrieved knowledge.

Terminology Interpreter:


Role description: Provides additional domain-specific knowledge to enhance problem understanding and formulation.
Prompt template: As a domain knowledge terminology interpreter, your role is to provide additional information and insights related to the problem domain. Here are some relevant background knowledge about this problem: {knowledge}. You can contribute by sharing your expertise, explaining relevant concepts, and offering suggestions to improve the problem understanding and formulation. Please provide your input based on the given problem description: {problem}.

Parameter Extraction Expert:
Role description: Proficient in identifying and extracting the relevant parameters from the problem statement.
Prompt template: As a parameter extraction expert, your role is to identify and extract the relevant parameters from the problem statement. Parameters represent the known quantities or input data of the optimization problem. Your expertise in the problem domain will help in accurately identifying and describing these parameters. Please review the problem description and provide the extracted parameters along with their definitions: {problem}.

Variable Extraction Expert:
Role description: Proficient in identifying and extracting relevant variables from the problem statement.
Prompt template: As a variable extraction expert, your role is to identify and extract the relevant variables from the problem statement. Variables represent the unknowns or decision variables in the optimization problem. Your expertise in the problem domain will help in accurately identifying and describing these variables. Please review the problem description and provide the extracted variables along with their definitions: {problem}.

Constraint Extraction Expert:
Role description: Skilled in extracting constraints from the problem description.
Prompt template: As a constraint extraction expert, your role is to identify and extract the constraints from the problem description. Constraints represent the limitations or conditions that need to be satisfied in the optimization problem. Your expertise in the problem domain will help in accurately identifying and formulating these constraints. Please review the problem description and provide the extracted constraints: {problem}. The comments given by your colleagues are as follows: {comments}, please refer to them carefully.

Objective Function Extraction Expert:
Role description: Capable of identifying and extracting the objective function from the problem statement.
Prompt template: You are an expert specialized in Operations Research and Optimization and responsible for objective function extraction. Your role is to identify and extract the objective function from the problem statement. The objective function represents the goal of the optimization problem. Now, the problem description is as follows: {problem}.

Modeling Knowledge Supplement Expert:
Role description: Offers supplementary knowledge related to modeling techniques and best practices.
Prompt template: As a modeling knowledge supplement expert, your role is to provide additional knowledge and insights related to modeling techniques and best practices in the field of Operations Research and Optimization. Here are some relevant background knowledge about modeling technique: {knowledge}. You can contribute by explaining different modeling approaches, suggesting improvements, or sharing relevant tips and tricks. Please provide your input based on the given problem description and the modeling efforts so far: {problem}.


Modeling Expert:
Role description: Proficient in constructing mathematical optimization models based on the extracted information.
Prompt template: You are a modeling expert specialized in the field of Operations Research and Optimization. Your expertise lies in Mixed-Integer Programming (MIP) models, and you possess an in-depth understanding of various modeling techniques within the realm of operations research. At present, you are given an Operations Research problem, alongside additional insights provided by other experts. The goal is to holistically incorporate these inputs and devise a comprehensive model that addresses the given production challenge. Now the original problem is as follows: {problem}. And the modeling so far is as follows: {comments}. Give your model of this problem.

LP File Generation Expert:
Role description: Expertise in generating LP (Linear Programming) files that can be used by optimization solvers.
Prompt template: As an LP file generation expert, your role is to generate LP (Linear Programming) files based on the formulated optimization problem. LP files are commonly used by optimization solvers to find the optimal solution. Here is the important part sourced from the LP file format documentation: {knowledge}. Your expertise in generating these files will help ensure compatibility and efficiency. Please review the problem description and the extracted information and provide the generated LP file: {problem}. The comments given by your colleagues are as follows: {comments}, please refer to them carefully.

Programming Example Provider:
Role description: Provides programming examples and templates to assist in implementing the optimization solution.
Prompt template: As a programming expert in the field of operations research and optimization, you offer programming examples and templates according to the background knowledge: {knowledge}. Now, consider the problem description: {problem}. Could you please comprehend the code snippets extracted in the background knowledge, understand their function, and then give your code example to assist with addressing this problem. The comments given by your colleagues are as follows: {comments}, please refer to them carefully.

Programming Expert:
Role description: Skilled in programming and coding, capable of implementing the optimization solution in a programming language.
Prompt template: You are a Python programmer in the field of operations research and optimization. Your proficiency in utilizing third-party libraries such as Gurobi is essential. In addition to your expertise in Gurobi, it would be great if you could also provide some background in related libraries or tools, like NumPy, SciPy, or PuLP. You are given a specific problem and comments by other experts. You aim to develop an efficient Python program that addresses the given problem. Now the original problem is as follows: {problem}. And the experts along with their comments are as follows: {comments}. Give your Python code directly.
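To illustrate the kind of output expected from this expert, here is a small gurobipy program for the elementary theme-park example in Figure 1. It is our own illustrative solution, not output produced by the system.

# Illustrative gurobipy solution for the elementary NL4Opt example in Figure 1.
import gurobipy as gp
from gurobipy import GRB

m = gp.Model("theme_park")
x = m.addVar(vtype=GRB.INTEGER, name="scooters")    # each carries 2 people
y = m.addVar(vtype=GRB.INTEGER, name="rickshaws")   # each carries 3 people

m.addConstr(y <= 0.4 * (x + y), name="rickshaw_share")  # at most 40% rickshaws
m.addConstr(2 * x + 3 * y >= 300, name="capacity")      # transport 300 visitors
m.setObjective(x, GRB.MINIMIZE)                          # minimize scooters used

m.optimize()
print(f"scooters = {x.X:.0f}, rickshaws = {y.X:.0f}")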

Code Reviewer:
Role description: Conducts thorough reviews of the implemented code to identify any errors, inefficiencies, or areas for improvement.
Prompt template: As a Code Reviewer, your responsibility is to conduct thorough reviews of implemented code related to optimization problems. You will identify possible errors, inefficiencies, or areas for improvement in the code, ensuring that it adheres to best practices and delivers optimal results. Now, here is the problem: {problem}. You are supposed to refer to the comments given by your colleagues from other aspects: {comments}.


A.2.2 THE CONDUCTOR
The role of the Conductor is highly specialized and significant, which necessitates a more intricate prompt design compared to the other experts. The following is the prompt template for the Conductor:

You are a leader of an expert team in the field of operations research. Now, you need to coordinate all the experts you manage so that they can work together to solve a problem.
Next, you will be given a specific OR problem, and your goal is to select the expert you think is the most suitable to ask for insights and suggestions.
Generally speaking, the solution of a complex OR problem requires analysis, information extraction, modeling, and programming to solve the problem. The description of the problem is presented as follows: {problem}
Remember, based on the capabilities of different experts and the current status of the problem-solving process, you need to decide which expert to consult next. The experts' capabilities are described as follows: {experts_info}
Experts that have already been commented include: {commented_experts}
REMEMBER, THE EXPERT MUST BE CHOSEN FROM THE EXISTING LIST ABOVE.
Note that you need to complete the entire workflow within the remaining {remaining_steps} steps.
Now, think carefully about your choice and give your reasons.
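In practice, this template only needs to be filled in and sent to the LLM at every forward step. The small sketch below shows one plausible way to do so and to map the reply back to an expert; the llm_call helper, the expert attributes, and the parsing heuristic are assumptions made for illustration rather than the paper's implementation.

# Illustrative Conductor step: fill the prompt template and parse the choice.
def conductor_select(problem, experts, commented, remaining_steps,
                     template, llm_call):
    experts_info = "\n".join(f"- {e.role}: {e.description}" for e in experts)
    prompt = template.format(
        problem=problem,
        experts_info=experts_info,
        commented_experts=", ".join(commented) or "none",
        remaining_steps=remaining_steps)
    reply = llm_call(prompt)
    # Naive parsing heuristic: pick the first expert whose name appears in the
    # reply; a real implementation would enforce a stricter output format.
    for expert in experts:
        if expert.role.lower() in reply.lower():
            return expert
    return experts[0]   # fallback if nothing matched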

A.2.3 THE REDUCER
The Reducer's role is to serve as a summarizer of all the comments provided by the selected experts. It must meticulously analyze the comments and generate the final answer, which can take various forms, such as a model or a program. The Reducer's prompt template may vary based on the specific type of final answer required. Here is an example of the Reducer's prompt template when the goal is to obtain a program:

You have been assigned the critical task of generating a program to solve the complex operations research problem presented. This program should incorporate the insights and suggestions provided by the selected experts. Your role is to synthesize the information effectively to create a functional program.
The problem is described as follows: {problem}
The comments from other experts are as follows: {comments}
Could you please write Python GUROBI code according to the comments.

A.3 BASELINES' IMPLEMENTATION

To ensure a fair comparison, we have implemented the baseline algorithms following the guidelines in their original papers.
The traditional model, tag-BART, typically requires a training process. If we were to directly use a tag-BART model pretrained on the LPWP dataset to test on the ComplexOR dataset, there would likely be a domain shift. To mitigate this issue, we adopt a two-step approach. First, we pretrain the tag-BART model on the LPWP dataset. This initial pretraining enables the model to acquire basic NER abilities. Next, we fine-tune the pretrained model on an additional set of 30 problems that are similar to the ComplexOR problems. These problems have the same annotation format as the LPWP dataset. By fine-tuning the model on this specific set of problems, we aim to maximize the performance and adapt the model to the requirements of the ComplexOR domain.
For the Standard prompting technique, we leverage the in-context learning ability of the language
model. Following the recommended approach outlined in the OpenAI official documentation, we
design the following prompt template:

You are a Python programmer in the field of operations research and optimization.
Your proficiency in utilizing third-party libraries such as Gurobi is essential. In
addition to your expertise in Gurobi, it would be great if you could also provide
some background in related libraries or tools, like NumPy, SciPy, or PuLP. You
are given a specific problem. You aim to develop an efficient Python program that
addresses the given problem. Now the original problem is as follows: {problem}.
Give your Python code directly.
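
For reference, a minimal sketch of how this Standard prompt could be issued through the OpenAI Python client (v1 API) is shown below. The model name, temperature, and surrounding glue code are assumptions for illustration, not the paper's exact setup; the template body is abbreviated with an ellipsis.

from openai import OpenAI

STANDARD_TEMPLATE = (
    "You are a Python programmer in the field of operations research and optimization. "
    "Your proficiency in utilizing third-party libraries such as Gurobi is essential. ... "
    "Now the original problem is as follows: {problem}. Give your Python code directly."
)

def standard_prompting(problem: str, model: str = "gpt-3.5-turbo") -> str:
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[{"role": "user", "content": STANDARD_TEMPLATE.format(problem=problem)}],
    )
    return response.choices[0].message.content  # the generated Python program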

Chain-of-Thought is a technique similar to standard prompting but with some additional steps. It
begins with the sentence "Let's think step by step" to guide the model's thought process. After that,
a further summarization step is added, because the output generated by Chain-of-Thought can be
lengthy and fragmented.
For Tree-of-Thoughts and Graph-of-Thoughts, we set the parameters based on the experiments
conducted in the respective papers. The width of exploration is set to 3, and the maximum explo-
ration step is set to 5. We adopt the prompt paradigm proposed by Tree-of-Thoughts
Prompting (Hulbert, 2023). The prompt is designed as follows, where {exploration_step_prompt}
represents the original prompt used in each exploration step:

Imagine three different experts in the field of operations research and optimization
are modeling and writing a program for a hard problem.
All experts will write down 1 step of their thinking, then share it with the group.
Then all experts will go on to the next step, etc.
If any expert realises they’re wrong at any point then they leave.
The problem description is: {problem}
{exploration_step_prompt}
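
To make the width-3, depth-5 setting concrete, the following is a minimal sketch of one way such a breadth-limited exploration loop could be organized. The `call_llm` stub and the scoring heuristic are assumptions for exposition only; they are not the implementation from the Tree-of-Thoughts or Graph-of-Thoughts papers.

WIDTH, MAX_STEPS = 3, 5

def call_llm(prompt: str) -> str:
    return "...next reasoning step..."  # stand-in for an actual LLM call

def score_state(state: str) -> float:
    # Stand-in heuristic; a real system would ask the LLM to rate each partial solution.
    return float(len(state))

def explore(problem: str, exploration_step_prompt: str) -> str:
    frontier = [""]  # each state accumulates the reasoning produced so far
    for _ in range(MAX_STEPS):
        candidates = []
        for state in frontier:
            prompt = (
                "Imagine three different experts in the field of operations research ...\n"
                f"The problem description is: {problem}\n{state}\n{exploration_step_prompt}"
            )
            for _ in range(WIDTH):  # branch each state into WIDTH continuations
                candidates.append(state + "\n" + call_llm(prompt))
        frontier = sorted(candidates, key=score_state, reverse=True)[:WIDTH]  # prune to the best WIDTH
    return frontier[0]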

For Progressive-Hint Prompting, the original implementation is not directly suitable for complex
OR problems. In the original paper, the answer is a single numeric value, which makes it easy
to check consistency across multiple interactions. However, in complex OR problems, the
answer can be a model or a program, neither of which is directly comparable. To address this,
we follow the underlying idea of Progressive-Hint Prompting and make some modifications. We
generate an initial answer and then use an additional interaction with the language model to ask
whether the current answer is the same as the previous one. In this way, the PHP algorithm is
implemented in a more appropriate manner.
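
A minimal sketch of this adapted loop is given below; the prompt wording, the retry budget, and the YES/NO consistency check are our assumptions for illustration.

def call_llm(prompt: str) -> str:
    return "YES"  # stand-in for an actual LLM call

def progressive_hint(problem: str, max_rounds: int = 5) -> str:
    previous = call_llm(f"Model and solve the following OR problem:\n{problem}")
    for _ in range(max_rounds):
        current = call_llm(
            f"Model and solve the following OR problem:\n{problem}\n"
            f"Hint: a previous attempt produced this answer:\n{previous}"
        )
        verdict = call_llm(
            "Do the following two answers describe essentially the same model or program? "
            f"Reply YES or NO.\nAnswer A:\n{previous}\nAnswer B:\n{current}"
        )
        if verdict.strip().upper().startswith("YES"):
            return current  # answers are consistent; stop hinting
        previous = current
    return previous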

In the ReAct approach, there are two main steps: reasoning and acting. In the reasoning step,
we use the same prompt as in CoT to guide the model’s thought process. In the acting step, we
limit the actions to retrieving knowledge from a knowledge base. This is because, in complex OR
problems, accessing and utilizing external knowledge is crucial for making informed decisions. To
ensure a fair comparison, we allow the ReAct agent to access all the knowledge bases mentioned in
Chain-of-Experts.
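
For illustration, a minimal sketch of this restricted ReAct loop might look as follows. The "Action:"/"Observation:" conventions, the retrieval function, and the stub LLM call are assumptions; a real run would query the knowledge bases described in Chain-of-Experts.

def call_llm(prompt: str) -> str:
    return "Thought: I know enough to write the model now."  # stand-in for an actual LLM call

def retrieve_knowledge(query: str, knowledge_base: dict) -> str:
    # Stand-in retrieval by exact key; a real system would use embedding-based search.
    return knowledge_base.get(query, "No relevant knowledge found.")

def react(problem: str, knowledge_base: dict, max_turns: int = 5) -> str:
    trace = f"Problem: {problem}\nLet's think step by step."
    for _ in range(max_turns):
        reply = call_llm(trace)        # the model emits a Thought and, optionally, "Action: <query>"
        trace += "\n" + reply
        if "Action:" in reply:
            query = reply.split("Action:", 1)[1].strip()
            trace += "\nObservation: " + retrieve_knowledge(query, knowledge_base)
        else:
            break                      # no further action requested; the final reply is the answer
    return trace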
The design of Reflexion aligns with the backward reflection process described in Chain-of-Experts.
In Reflexion, feedback is obtained from the compilation and runtime of the modeling program,
allowing for iterative refinement of the previous steps until the agent is satisfied with the answer. It is
worth noting that in our experimental setting, we do not generate unit tests.
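
A sketch of this feedback loop, using the interpreter's error message as the reflection signal, might look as follows; the file handling, retry budget, and stub LLM call are assumptions, not the exact Reflexion implementation.

import subprocess
import tempfile

def call_llm(prompt: str) -> str:
    return "print('placeholder model')"  # stand-in for an actual LLM call

def run_program(code: str) -> str:
    """Execute the candidate program; return an empty string on success, else the error text."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    result = subprocess.run(["python", path], capture_output=True, text=True, timeout=60)
    return "" if result.returncode == 0 else (result.stderr or "non-zero exit status")

def reflexion(problem: str, max_attempts: int = 3) -> str:
    code = call_llm(f"Write a Python Gurobi program for this problem:\n{problem}")
    for _ in range(max_attempts):
        error = run_program(code)
        if not error:
            return code  # the program compiled and ran without errors
        code = call_llm(
            f"Problem:\n{problem}\nPrevious program:\n{code}\n"
            f"The previous program failed with:\n{error}\nPlease reflect on the error and fix the program."
        )
    return code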

A.4 More Experimental Results

A.4.1 Detailed Experiment Results on ComplexOR


Table 6 presents a detailed overview of the performance of the baseline algorithms and the Chain-of-
Experts approach on the ComplexOR dataset. In the interest of brevity, we abbreviate the traditional
tag-BART algorithm as "BART" and use shorthand labels for the other algorithms. The results
highlight the difficulty of the dataset: traditional models such as BART and prompting techniques
such as CoT failed to solve any of the problems. Notably, GoT succeeded on the relatively
straightforward "Blending" problem. Among the methods employing LLM agents, such as ReAct
and Reflexion, several of the less complex problems were solved, but overall performance remained
suboptimal.

Table 6: Detailed experiment results for different methods on ComplexOR

problem BART Standard CoT PHP ToT GoT ReAct Reflexion SPP CoE
Blending ✗ ✗ ✗ ✗ ✗ ✓ ✓ ✓ ✗ ✓
Car Selection ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✓ ✗ ✓
Capacitated Warehouse Location ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗
Employee Assignment ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗
Aircraft Landing ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗
VRPTW Routing ✗ ✗ ✗ ✗ ✗ ✗ ✓ ✗ ✗ ✓
Flowshop ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗
Distribution Center Allocation ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗
Aircraft Assignment ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗
Traffic Equilibrium ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✓
Robot Arm ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗
Largest Small Polygon ✗ ✗ ✗ ✗ ✗ ✗ ✓ ✗ ✗ ✓
CFLP ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗
Cut Problem ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✓
Diet Problem ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗
Dietu Problem ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗
Knapsack ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗
Multi-commodity Transportation ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗
PROD ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗
Single Level Big Bucket ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✓
Overall 0/20 0/20 0/20 0/20 0/20 1/20 3/20 2/20 0/20 7/20

The Chain-of-Experts (CoE) approach demonstrated
the highest success rate, solving 7 out of 20 problems, making it the most effective algorithm.
We also found that certain algorithms, particularly PHP and SPP, struggled with token limitations
when confronted with complex problems. This token limitation issue not only hindered their perfor-
mance but also incurred increased computational costs and inefficiencies in their execution. In con-
trast, the Chain-of-Experts approach incorporates three key strategies to mitigate token limitation-
related errors. Firstly, the utilization of expert summaries significantly reduces memory stress by
nearly 50%. Secondly, the adoption of a Conductor, as opposed to a round-robin approach, further
reduces the maximum context, effectively halving the token usage. Lastly, the incorporation of a
visible map led to a substantial reduction of approximately 70% in token consumption.

A.4.2 Ablation Experiment on Individual Experts

[Figure 5 shows two bar charts. In each, the x-axis lists the expert removed from CoE (Terminology Interpreter, Parameter Extraction, Variable Extraction, Constraint Extraction, Objective Function Extraction, Modeling Knowledge Supplement, Modeling, Constraint Feasibility Check, LP File Generation, Programming Example Provider, Programming, Code Reviewer), and the y-axis shows the resulting accuracy.]

(a) Expert ablation experiment on LPWP dataset    (b) Expert ablation experiment on ComplexOR dataset

Figure 5: Impact on accuracy when removing individual experts from CoE

Figure 5 presents the results of the ablation experiment conducted on each expert in the Chain-of-
Experts framework. The x-axis labels represent the removal of specific experts from CoE, while
the y-axis represents the accuracy achieved after removing each expert. The blue horizontal line
represents the accuracy when all experts are integrated. Based on the results, we can observe the
following insights regarding the importance of each expert for both datasets.
In Subfigure 5a, which corresponds to the LPWP dataset consisting of easy problems, removing
a single expert does not lead to a significant performance drop. However, the most crucial expert is
the Programming Expert. This finding aligns with the nature of the LPWP dataset, where the final
evaluation is based on the correctness of the program. Therefore, having an expert who can provide
insights into programming is essential. The second most important expert is the Modeling Expert, as
mathematical modeling plays a crucial role in problem-solving.

Example Problem
In the context of manufacturing planning, we tackle the Capacitated Multi-level Lot Sizing Problem with
Backlogging. We make the following assumptions in defining and formulating this problem. First, we assume that
setup times and costs are non-sequence dependent, setup carryover between periods is not permitted, and all
initial inventories are zero. Second, all production costs are assumed to be linear in production output and do not
vary over time; hence, they can be dropped from the model for simplicity. Setup and holding costs also are
assumed not to vary over time. Furthermore, end items are assumed to have no successors, and only end items
have external demands and backlogging costs. Finally, we assume zero lead times and no lost sales. It is important
to note that all these assumptions (except setup carryover) are made for ease of exposition only and without loss
of generality, i.e., the theoretical results remain valid even when they are removed. See Ozturk and Ornek (2010)
for the lot-sizing problem with setup carryover as well as with external demands for component items.

Standard Prompting Result


Sets: $Periods$, $M$, $I$, $end$, $eta$
Parameters: $sc$, $bc$, $hc$, $st$, $a$, $gd$, $Mn$, $r$, $C$
Variables: $x$, $s$, $b$, $y$   [Missing variable]
Constraints:
invBalance1: $x_{i,t} + s_{i,t-1} + b_{i,t} - b_{i,t-1} = gd_{i,t} + s_{i,t}$ for $i \in end$, $t \in Periods$
invBalance2: $x_{i,t} + s_{i,t-1} = gd_{i,t} + \sum_{j \in succ(i)} r_{i,j} \cdot x_{j,t} + s_{i,t}$ for $i \in I \setminus end$, $t \in Periods$
capConstraints: $\sum_{i \in I} a_{i,m} \cdot x_{i,t} + \sum_{i \in I} st_{i,m} \cdot y_{i,t} \le C_{m,t}$ for $m \in M$, $t \in Periods$
setupConstraints: $x_{i,t} \le Mn \cdot y_{i,t}$ for $i \in I$, $t \in Periods$   [Missing constraints]
Objective: minimize $\sum_{i \in I,\, t \in Periods} (sc_i \cdot y_{i,t} + hc_i \cdot s_{i,t})$   [Missing item]

SPP Result
Personas identification: Manufacturing Planner, Demand Planner, Data Analysts, Communication Specialists

Manufacturing Planner: To address this problem effectively, it's crucial to carefully consider production
quantities and schedules while adhering to the defined assumptions, such as zero lead times and no lost sales.

Demand Planner: In this context, the role of the Demand Planner is critical for forecasting external…

Data Analysts: My recommendations are instrumental in helping the Manufacturing Planner make informed…

Communication Specialists: Effective communication is key to solving the Capacitated Multi-level Lot
Sizing Problem successfully. Communication specialists facilitate collaboration…

Result: Variables: $x$, $s$, $b$, $y$; Constraints: … $s_{i,0} = 0$, $b_{i,T} = 0$; Objective: minimize $\sum (sc_i \cdot y_{i,t} + hc_i \cdot s_{i,t})$

CoE Result (orchestrated by the Conductor)

Terminology Interpreter: In this context, … 1. No Setup Carryover: Setup carryover would mean that …;
2. Initial Inventories Zero means …; 3. Zero Lead Times and No Lost Sales: Lead time is the delay…

Variable Extractor: Sets: $Periods$, $M$, $I$, $end$, $eta$; Parameters: $sc$, $bc$, $hc$, $st$, $a$, …; Variables: $x$, $s$, $b$, $y$, $bc$

Modeling Expert: Variables: $x$, $s$, $b$, $y$, $bc$; Constraints: … $s_{i,0} = 0$, $b_{i,T} = 0$; Objective: minimize
$\sum (sc_i \cdot y_{i,t} + hc_i \cdot s_{i,t})$

Programmer: LP file of this modeling: Minimize: obj…; Subject to: invBalance1[i,t]: x[i,t] + s[i,t-1] + b[i,t]
- b[i,t-1] = gd[i,t] + s[i,t] for all i in end, t in Periods…
Run the file and get a model-unsolvable error. Start backward reflection to adjust the modeling…

Result: Variables: $x$, $s$, $b$, $y$, $bc$; Constraints: … $s_{i,0} = 0$, $b_{i,T} = 0$; Objective: minimize
$\sum (sc_i \cdot y_{i,t} + hc_i \cdot s_{i,t}) + \sum_{i \in end,\, t \in Periods} bc_i \cdot b_{i,t}$

Figure 6: Case study


Subfigure 5b shows that individual experts have a much more significant impact on more challenging
problems. Apart from the Programming Expert and Modeling Expert, the removal of the Terminol-
ogy Interpreter leads to a significant drop of approximately 20% in accuracy. This result highlights
the knowledge-intensive nature of the ComplexOR dataset, which heavily relies on the supplementa-
tion of external knowledge. Interestingly, the LP File Generator Expert also proves to be important.
This finding suggests that for harder problems, utilizing LP files as an efficient intermediate struc-
tured representation of the model is a good approach, as it yields better results than writing Python
Gurobi programs directly.

A.4.3 Case Study


In this experiment, we conducted a detailed case study to gain insight into the effectiveness of
our approach, as depicted in Figure 6. To reduce the uncertainty of sampling, we ran each
method five times and based our findings on the majority response. In this case, the standard
prompting approach failed to correctly identify the variable bc as the backlogging cost, due to
insufficient knowledge about the Multi-level Lot Sizing Problem. It also lacked the constraints on
initial and end statuses, which are essential in the context of a real manufacturing process.
Moreover, the objective function was missing critical terms, which made the resulting model
unsolvable. SPP provided some basic background knowledge through a Manufacturing Planner
created by the leader persona. However, this method had limitations: three out of four personas
offered negligible assistance in solving the problem, which is a critical issue in domain-specific
problems like OR. In CoE, the Conductor effectively selected appropriate experts for different
stages of the problem-solving process. Initially, a Terminology Interpreter was chosen to provide
essential background knowledge. Although the Modeling Expert initially repeated the same
mistake regarding the objective function, this error was rectified in the backward reflection process.
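
To make the corrected formulation in Figure 6 concrete, the following is a minimal gurobipy sketch of the lot-sizing model with the backlogging term restored in the objective. The toy data, the single-machine simplification, and all names are illustrative assumptions; this is not the exact program produced by CoE.

import gurobipy as gp
from gurobipy import GRB

items, end_items, periods = ["A", "B"], ["A"], [1, 2, 3]
sc = {"A": 50.0, "B": 30.0}                      # setup costs
hc = {"A": 2.0, "B": 1.0}                        # holding costs
bc = {"A": 10.0}                                 # backlogging costs (end items only)
gd = {(i, t): (20.0 if i == "A" else 0.0) for i in items for t in periods}  # external demand
r = {("B", "A"): 2.0}                            # units of component B per unit of end item A
a = {i: 1.0 for i in items}                      # capacity usage per unit produced
st = {i: 5.0 for i in items}                     # setup time
cap, Mbig = 100.0, 1000.0

m = gp.Model("clsp_backlogging")
x = m.addVars(items, periods, name="x")                      # production quantity
s = m.addVars(items, [0] + periods, name="s")                # inventory
b = m.addVars(end_items, [0] + periods, name="b")            # backlog (end items only)
y = m.addVars(items, periods, vtype=GRB.BINARY, name="y")    # setup indicator

m.addConstrs((s[i, 0] == 0 for i in items), name="initInv")
m.addConstrs((b[i, 0] == 0 for i in end_items), name="initBacklog")
m.addConstrs((b[i, periods[-1]] == 0 for i in end_items), name="endBacklog")

# Inventory balance: end items carry backlog; components face internal demand from their successors.
m.addConstrs((x[i, t] + s[i, t - 1] + b[i, t] - b[i, t - 1] == gd[i, t] + s[i, t]
              for i in end_items for t in periods), name="invBalanceEnd")
m.addConstrs((x[i, t] + s[i, t - 1] ==
              gd[i, t] + gp.quicksum(r[i, j] * x[j, t] for j in items if (i, j) in r) + s[i, t]
              for i in items if i not in end_items for t in periods), name="invBalanceComp")

# Single-machine capacity and setup-forcing constraints.
m.addConstrs((gp.quicksum(a[i] * x[i, t] + st[i] * y[i, t] for i in items) <= cap
              for t in periods), name="capacity")
m.addConstrs((x[i, t] <= Mbig * y[i, t] for i in items for t in periods), name="setup")

# Objective: setup and holding costs, plus the backlogging term that standard prompting missed.
m.setObjective(gp.quicksum(sc[i] * y[i, t] + hc[i] * s[i, t] for i in items for t in periods)
               + gp.quicksum(bc[i] * b[i, t] for i in end_items for t in periods), GRB.MINIMIZE)
m.optimize()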

