Multi-Modal and Multi-Agent Systems Meet Rationality: A Survey

Bowen Jiang^1, Yangxinyu Xie^1,2, Xiaomeng Wang^1, Weijie J. Su^1, Camillo J. Taylor^1, Tanwi Mallick^2

^1 University of Pennsylvania, Philadelphia, PA 19104, USA. ^2 Argonne National Laboratory, Lemont, IL 60439, USA. Correspondence to: Bowen Jiang <[email protected]>, Yangxinyu Xie <[email protected]>.

Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024. Copyright 2024 by the author(s).

Abstract

Rationality is characterized by logical thinking and decision-making that align with evidence and logical rules. This quality is essential for effective problem-solving, as it ensures that solutions are well-founded and systematically derived. Despite the advancements of large language models (LLMs) in generating human-like text with remarkable accuracy, they present biases inherited from the training data, inconsistency across different contexts, and difficulty understanding complex scenarios. Therefore, recent research attempts to leverage the strength of multiple agents working collaboratively with various types of data and tools for enhanced consistency and reliability. To that end, this survey aims to define some axioms of rationality, understand whether multi-modal and multi-agent systems are advancing toward rationality, identify their advancements over single-agent, language-only baselines, and discuss open problems and future directions.

1. Introduction

Large language models (LLMs) have demonstrated promising results across a broad spectrum of tasks, particularly in exhibiting capabilities that plausibly mimic human-like reasoning (Wei et al., 2022; Yao et al., 2024; Besta et al., 2024; Shinn et al., 2023; Bubeck et al., 2023; Valmeekam et al., 2023; Prasad et al., 2023). These models leverage the richness of human language to abstract concepts, elaborate thinking processes, comprehend complex user queries, and develop plans and solutions in decision-making scenarios. Despite these advances, recent research has revealed that even state-of-the-art LLMs exhibit various forms of irrational behaviors, such as the framing effect, certainty effect, overweighting bias, and conjunction fallacy (Binz & Schulz, 2023; Echterhoff et al., 2024; Mukherjee & Chang, 2024; Macmillan-Scott & Musolesi, 2024; Wang et al., 2024; Suri et al., 2024). These biases significantly challenge the utility of LLMs in natural language processing research. For example, LLM-based evaluators, a popular choice for automated assessments of text generation, display cognitive biases against certain responses irrespective of their actual quality or relevance (Stureborg et al., 2024; Koo et al., 2023). Irrationality and hallucinations (Bang et al., 2023; Guerreiro et al., 2023; Huang et al., 2023) also undermine the practical deployment of LLMs in critical sectors like healthcare, finance, and legal services (He et al., 2023; Li et al., 2023d; Kang & Liu, 2023; Cheong et al., 2024), where reliability and consistency are paramount. The emerging concern about the factual accuracy and trustworthiness of LLMs highlights an urgent need to develop better agents or agent systems (Nakajima, 2023; Gravitas, 2023) with rational reasoning processes.

One possible reason for the LLMs' irrational behaviors, as suggested by Bubeck et al. (2023) and Sun (2024), is the autoregressive nature of existing language models. This architecture does not allow for an "internal scratchpad" beyond these models' inner parametric representations of knowledge, causing them to fail to reason rationally when faced with problems that require more complex and iterative procedures. Thus, an important question emerges: How can we design an LLM-based agent capable of rational decision-making that can overcome these biases and inconsistencies? Recent advancements in multi-modal and multi-agent frameworks, which leverage the expertise of different agents acting together towards a collective goal, offer a promising direction to address this challenge. Multi-modal foundation models (Awadalla et al., 2023; Liu et al., 2023a; Wang et al., 2023c; OpenAI, 2023; Reid et al., 2024) enhance reasoning by grounding decisions in a broader sensory context, akin to how human brains integrate rich sensory inputs to form a more holistic base of knowledge. Meanwhile, multi-agent systems introduce mechanisms such as consensus, debate, and self-consistency (Du et al., 2023; Liang et al., 2023; Talebirad & Nadiri, 2023; Madaan et al., 2024; Cohen et al., 2023; Shinn et al., 2023; Mohtashami et al., 2023), which allow for more refined and reliable output through collaborative interaction among multiple instances. Each agent is specialized in different domains and offers its unique perspective, simulating the dynamics of discussion in human societies. Multi-agent systems can also incorporate multi-modal agents and agents specialized in querying external knowledge sources or tools (Lewis et al., 2020; Schick et al., 2024; Tang et al., 2023; Pan et al., 2024) to overcome hallucinations, ensuring that their results are more robust, deterministic, and trustworthy, thus significantly improving the quality of the generated responses towards rationality.

This survey provides a unique lens to interpret the underlying motivations behind current multi-modal and/or multi-agent systems. Drawing from cognitive science, we first delineate four fundamental requirements for rational thinking. We then discuss how research fields within the multi-modality and multi-agent literature are progressing towards rationality by inherently improving these criteria. We posit that such advancements are bridging the gap between the performance of these systems and the expectations for a rational thinker, in contrast to traditional single-agent, language-only models. We hope this survey can inspire further research at the intersection between agent systems and cognitive science.
Figure 1. The evolutionary tree of multi-agent and/or multi-modal systems related to the four axioms of rationality. Many proposed approaches strive to address multiple axioms simultaneously. Bold fonts are used to mark works that involve multi-modalities. This tree also includes foundational works to provide a clearer reference of time.

2. Defining Rationality

A rational agent should avoid reaching contradictory conclusions in decision-making processes, respecting the physical and factual reality of the world in which it operates. Therefore, drawing on foundational works in rational decision-making (Tversky & Kahneman, 1988; Hastie & Dawes, 2009; Eisenführ et al., 2010), this section adopts an axiomatic approach to define rationality, presenting four substantive axioms that we expect a rational agent or agent system to fulfill:

Grounding The decision of a rational agent is grounded in the physical and factual reality. In order to make a sound decision, the agent must be able to integrate sufficient and accurate information from different sources and modalities grounded in reality, without hallucination. While this requirement is generally not explicitly stated in the cognitive science literature when defining rationality, it is implicitly implied, as most humans have access to physical reality through multiple sensory signals.

Orderability of Preferences When comparing alternatives in a decision scenario, a rational agent can rank the options based on the current state and ultimately select the most preferred one based on the expected outcomes. This orderability consists of several key principles, including comparability, transitivity, closure, solvability, etc., with details in Appendix A. The orderability of preferences ensures that the agent can make consistent and logical choices when faced with multiple alternatives. LLM-based evaluations heavily rely on this property, as discussed in Appendix B.

Independence from Irrelevant Context The agent's preference should not be influenced by information irrelevant to the decision problem at hand. LLMs have been shown to exhibit irrational behavior when presented with irrelevant context (Shi et al., 2023; Wu et al., 2024; Liu et al., 2024c), leading to confusion and suboptimal decisions. To ensure rationality, an agent must be able to identify and disregard irrelevant information, focusing solely on the factors that directly impact the decision-making process.

Invariance The preference of a rational agent remains invariant across equivalent representations of the decision problem, regardless of specific wordings or modalities.
3. Scope

Unlike existing surveys (Han et al., 2024; Guo et al., 2024; Xie et al., 2024a; Durante et al., 2024; Cui et al., 2024; Xu et al., 2024; Zhang et al., 2024a; Cheng et al., 2024; Li et al., 2024a) that focus on the components, structures, agent profiling, planning, communications, memories, and applications of multi-modal and/or multi-agent systems, this survey is the first to specifically examine the increasingly important relationship between rationality and these multi-modal and multi-agent systems, exploring how they contribute to enhancing rationality in decision making. We emphasize that rationality, by definition, is not equivalent to reasoning or Theory of Mind, although they are deeply intertwined. We leave explanations to Appendix C.

4. Towards Rationality through Multi-Modal and Multi-Agent Systems

This section surveys recent advancements in multi-modal and multi-agent systems, categorized by their fields as depicted in Figure 1. Each category of research, such as knowledge retrieval or neuro-symbolic reasoning, addresses one or more fundamental requirements for rational thinking. These rationality requirements are typically intertwined; therefore, an approach that enhances one aspect of rationality often inherently improves others simultaneously. Meanwhile, the overall goal of current multi-agent systems in achieving rationality can usually be distilled into two key concepts: deliberation and abstraction. Deliberation encourages slower reasoning processes such as brainstorming and reflection, while abstraction refers to boiling the problem down to its logical essence, for example by calling tool APIs or incorporating neuro-symbolic reasoning agents.

Most existing studies do not explicitly base their frameworks on rationality in their original writings. Our analysis aims to reinterpret these works through the lens of our four axioms of rationality, offering a novel perspective that bridges existing methodologies with rational principles.

4.1. Towards Grounding through Multi-Modal Models

Multi-modal approaches aim to improve information grounding across various channels, such as language and vision. By incorporating multi-modal models (Radford et al., 2021; Alayrac et al., 2022; Awadalla et al., 2023; Liu et al., 2024a; 2023a; Wang et al., 2023c; Zhu et al., 2023a; OpenAI, 2023; 2024; Reid et al., 2024), multi-agent systems can greatly expand their capabilities, enabling a richer, more accurate, and contextually aware interpretation of the environment. For example, Chain-of-Action (Pan et al., 2024) advances the single-modal Search-in-the-Chain (Xu et al., 2023) by supporting multi-modal data retrieval for faithful question answering. We leave more discussions to Appendix D.

4.2. Towards Grounding through Knowledge Retrieval

The existing transformer architecture (Vaswani et al., 2017) fundamentally limits how much information LLMs can hold. As a result, in the face of uncertainty, LLMs often hallucinate (Bang et al., 2023; Guerreiro et al., 2023; Huang et al., 2023), generating outputs that are not supported by the factual reality of the environment. Retrieval-Augmented Generation (RAG) (Lewis et al., 2020) marks a significant milestone in addressing this inherent limitation of LLMs. A multi-agent system can include planning agents in its framework, which determine how and where to retrieve external knowledge, and what specific information to acquire. External knowledge sources could be a knowledge graph (Gardères et al., 2020; Hogan et al., 2021), a database (Lu et al., 2024; Xie et al., 2024b), and more. Additionally, the system can have summarizing agents that utilize retrieved knowledge to enrich the system's language outputs with better factuality. For example, thanks to the external knowledge base, ReAct (Yao et al., 2022b) reduces false positive rates from hallucination by 8.0% compared to CoT (Wei et al., 2022). We provide a detailed survey of how multi-agent systems surpass single-agent baselines in Appendix E.
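To make this division of labor concrete, below is a minimal sketch of such a retrieval-grounded pipeline, assuming a planning agent, a retrieval step, and a summarizing agent. The function names, the toy in-memory knowledge base, and the keyword-matching heuristic are hypothetical stand-ins for the LLM-backed components and external knowledge sources used by the surveyed systems.

```python
# A minimal sketch of a retrieval-grounded multi-agent pipeline.
# All components are illustrative stand-ins: a real system would back
# plan(), retrieve(), and summarize() with LLM calls and a knowledge
# graph or database rather than this toy in-memory corpus.

KNOWLEDGE_BASE = {
    "react": "ReAct interleaves reasoning traces with actions that query external sources.",
    "rag": "Retrieval-Augmented Generation conditions the generator on retrieved documents.",
}

def plan(query: str) -> list[str]:
    """Planning agent: decide which knowledge to retrieve for the query."""
    return [key for key in KNOWLEDGE_BASE if key in query.lower()]

def retrieve(keys: list[str]) -> list[str]:
    """Retrieval step: fetch the selected external knowledge."""
    return [KNOWLEDGE_BASE[k] for k in keys]

def summarize(query: str, evidence: list[str]) -> str:
    """Summarizing agent: ground the answer in retrieved evidence only."""
    if not evidence:
        return "Insufficient external evidence; abstaining rather than hallucinating."
    return f"Answer to {query!r}, grounded in: " + " ".join(evidence)

query = "How does RAG reduce hallucination?"
print(summarize(query, retrieve(plan(query))))
```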
4.3. Towards Grounding & Invariance & Independence from Irrelevant Context through Tool Utilization

Similar to knowledge retrieval, Toolformer (Schick et al., 2024) opens a new era that allows LLMs to use external tools via API calls following predefined syntax, effectively extending their capabilities beyond their intrinsic limitations and enforcing consistent and predictable outputs. A multi-agent system can understand when and which tool to use, which modality of information the tool should expect, how to call the corresponding API, and how to incorporate outputs from the API calls, which anchors subsequent reasoning processes with more accurate information beyond its parametric memory. For example, VisProg (Gupta & Kembhavi, 2023) generates Python programs to reliably execute subroutines. We provide more examples in Appendix F.

In most cases, utilizing tools requires translating natural language queries into API calls with predefined syntax. Once the planning agent has determined the APIs and their input arguments, the original queries that may contain irrelevant context become invisible to the tools, and the tools will ignore any variance in the original queries as long as they share the equivalent underlying logic. This improves invariance against noisy queries and independence from irrelevant context. Examples are shown in Appendix F.
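The sketch below illustrates this normalization effect under stated assumptions: weather_api and the regular-expression argument extraction are invented for the example, standing in for a real tool's predefined interface and for a planning agent's argument extraction. Two differently worded queries, one padded with irrelevant context, reduce to the same canonical call.

```python
import re

def weather_api(city: str, unit: str = "celsius") -> str:
    """Hypothetical external tool with a fixed, predefined signature."""
    return f"forecast(city={city}, unit={unit})"

def to_api_call(query: str) -> str:
    """Planning step: reduce a free-form query to canonical API arguments.
    Everything not captured by the extraction pattern (filler words,
    irrelevant context) never reaches the tool."""
    match = re.search(r"in ([A-Z][a-z]+)", query)
    city = match.group(1) if match else "Unknown"
    return weather_api(city)

# Equivalent underlying logic, different surface forms and distractors:
q1 = "What's the weather in Vienna?"
q2 = "My cousin owns three cats. Anyway, could you tell me the weather in Vienna today?"
assert to_api_call(q1) == to_api_call(q2)  # invariance + independence
print(to_api_call(q1))
```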
4.4. Towards Orderability of Preferences & Invariance & Independence from Irrelevant Context through Neuro-Symbolic Reasoning

A multi-agent system incorporating symbolic modules can not only understand language queries but also solve them with a level of consistency, providing a faithful and transparent reasoning process based on well-defined rules and logical principles, which is unachievable by LLMs alone. Logic-LM (Pan et al., 2023), for example, combines problem-formulating, symbolic-reasoning, and summarizing agents, where the symbolic reasoner empowers LLMs with deterministic symbolic solvers to perform inference, ensuring that a correct answer is consistently chosen. These modules typically expect standardized input formats, enhancing invariance and independence in a manner similar to the API calls of tool usage. More examples are included in Appendix G.
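A schematic sketch of this division of labor follows; the hand-written constraint list stands in for the output of an LLM problem-formulating agent, and the exhaustive search over orderings stands in for a deterministic symbolic solver of the kind Logic-LM delegates to.

```python
from itertools import permutations

# Stand-in for the LLM formulating agent: it has translated the natural
# language puzzle "Alice is older than Bob; Bob is older than Carol.
# Who is youngest?" into symbolic constraints over an age ordering.
people = ("Alice", "Bob", "Carol")
constraints = [
    lambda age: age["Alice"] > age["Bob"],
    lambda age: age["Bob"] > age["Carol"],
]

# Stand-in for the deterministic symbolic solver: exhaustively test
# every ordering, so the same constraints always yield the same answer.
solutions = []
for ranks in permutations(range(len(people))):
    age = dict(zip(people, ranks))
    if all(check(age) for check in constraints):
        solutions.append(min(age, key=age.get))  # the youngest person

# A unique, consistently chosen answer, independent of prompt wording.
assert set(solutions) == {"Carol"}
print(solutions[0])
```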
4.5. Towards Orderability of Preferences & Invariance through Reflection, Debate, and Prompt Strategies

Single agents with self-reflection prompting (Shinn et al., 2023) and multi-agent systems that promote debate and consensus can help align outputs more closely with deliberate and logical decision-making, thus enhancing rational reasoning. For instance, Corex (Sun et al., 2023) finds that orchestrating multiple agents to work together yields better complex reasoning results, exceeding strong single-agent baselines (Wang et al., 2022b) by an average of 1.1-10.6%. More similar results are discussed in Appendix H. These collaborative approaches, in summary, allow each agent in a system to compare and rank its preferences over choices from itself or from other agents through critical judgment. This enables the system to discern and output the most dominant decision as a consensus, thereby improving the orderability of preferences. At the same time, through such a slow and critical thinking process, errors in initial responses or input prompts are more likely to be detected and corrected. Accumulated experience from past planning errors contributes to a self-evolving process within the multi-agent system (Zhang et al., 2024b), resulting in a final response or consensus that is less sensitive to specific wording or token bias, moving the response towards better invariance.
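The aggregation step can be sketched as a simple majority vote over agent proposals, in the spirit of self-consistency (Wang et al., 2022b); the deterministic stub agent below is hypothetical and merely mimics a population of mostly-correct but individually noisy LLM instances.

```python
from collections import Counter

def agent_answer(agent_id: int, question: str) -> str:
    """Stand-in for one LLM agent: an imperfect solver whose answers
    vary across instances, mimicking sampling noise and token bias."""
    return "41" if agent_id % 4 == 0 else "42"  # one in four is wrong

def debate_consensus(question: str, n_agents: int = 7) -> str:
    """Self-consistency style aggregation: every agent proposes an
    answer, the proposals are tallied, and the dominant one becomes
    the consensus, damping any single agent's error."""
    votes = Counter(agent_answer(i, question) for i in range(n_agents))
    return votes.most_common(1)[0][0]

assert debate_consensus("What is 6 * 7?") == "42"  # 5 votes beat 2
```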
5. Open Problems and Future Directions

This survey builds connections between multi-modal and multi-agent systems and rationality, guided by the four axioms we expect a rational agent or agent system to satisfy: information grounding, orderability of preferences, independence from irrelevant context, and invariance across equivalent representations. Our findings suggest that grounding can usually be enhanced by multi-modalities, world models, knowledge retrieval, and tool utilization. The remaining three axioms are typically intertwined and could be improved by achievements in multi-modalities, tool utilization, neuro-symbolic reasoning, self-reflection, and multi-agent collaboration.

Inherent Rationality It is important to understand that integrating most of these agents or modules with LLMs still does not inherently make LLMs more rational. Current methods are neither sufficient nor necessary, but they serve as instrumental tools that bridge the gap between an LLM's response and rationality. These approaches enable multi-agent systems, which are black boxes from the user's perspective, to more closely mimic rational thinking in their output responses. However, despite these more rational responses elicited from multi-modal and multi-agent systems, the challenge of how to effectively close the loop and bake these enhanced outputs back into the LLMs (Zhao et al., 2024), beyond mere fine-tuning, remains an open topic. In other words, can we leverage these more rational outputs to inherently enhance a single foundation model's rationality in its initial responses in future applications?

Encouraging More Multi-Modal Agents in Multi-Agent Systems Research into the integration of multi-modality within multi-agent systems would be promising. Fields such as multi-agent debate and neuro-symbolic reasoning, as shown in Figure 1, currently under-utilize the potential of multi-modal sensory inputs. We believe that expanding the role of multi-modalities, including but not limited to vision, sound, and structured data, could significantly enhance the capabilities and rationality of multi-agent systems.

Evaluation on Rationality Benchmarks on rationality are scarce. Future research should prioritize the development of benchmarks specifically tailored to assess rationality, going beyond existing ones focused on accuracy. These new benchmarks should avoid data contamination and emphasize tasks that demand consistent reasoning across diverse representations.
6. Limitations

The fields of multi-modal and multi-agent systems are rapidly evolving. Despite our best efforts, it is inherently impossible to encompass all related works within the scope of this survey. Our discussion also gives only limited attention to reasoning capabilities, theory of mind in machine psychology, cognitive architectures, and evaluations of rationality, all of which lie beyond the scope of this survey but are crucial for a deeper understanding of LLMs and agent systems. Furthermore, the concept of rationality in human cognitive science may encompass more principles and axioms than those defined in our survey.

References

Aghajanyan, A., Huang, B., Ross, C., Karpukhin, V., Xu, H., Goyal, N., Okhonko, D., Joshi, M., Ghosh, G., Lewis, M., et al. Cm3: A causal masked multimodal model of the internet. arXiv preprint arXiv:2201.07520, 2022.

Alayrac, J.-B., Donahue, J., Luc, P., Miech, A., Barr, I., Hasson, Y., Lenc, K., Mensch, A., Millican, K., Reynolds, M., et al. Flamingo: a visual language model for few-shot learning. Advances in Neural Information Processing Systems, 35:23716–23736, 2022.

Apperly, I. A. and Butterfill, S. A. Do humans have two systems to track beliefs and belief-like states? Psychological Review, 116(4):953, 2009.

Awadalla, A., Gao, I., Gardner, J., Hessel, J., Hanafy, Y., Zhu, W., Marathe, K., Bitton, Y., Gadre, S., Sagawa, S., et al. Openflamingo: An open-source framework for training large autoregressive vision-language models. arXiv preprint arXiv:2308.01390, 2023.

Bai, Y., Ying, J., Cao, Y., Lv, X., He, Y., Wang, X., Yu, J., Zeng, K., Xiao, Y., Lyu, H., et al. Benchmarking foundation models with language-model-as-an-examiner. Advances in Neural Information Processing Systems, 36, 2024.

Bang, Y., Cahyawijaya, S., Lee, N., Dai, W., Su, D., Wilie, B., Lovenia, H., Ji, Z., Yu, T., Chung, W., et al. A multitask, multilingual, multimodal evaluation of chatgpt on reasoning, hallucination, and interactivity. arXiv preprint arXiv:2302.04023, 2023.

Bao, H., Wang, W., Dong, L., Liu, Q., Mohammed, O. K., Aggarwal, K., Som, S., Piao, S., and Wei, F. Vlmo: Unified vision-language pre-training with mixture-of-modality-experts. Advances in Neural Information Processing Systems, 35:32897–32912, 2022.

Besta, M., Blach, N., Kubicek, A., Gerstenberger, R., Podstawski, M., Gianinazzi, L., Gajda, J., Lehmann, T., Niewiadomski, H., Nyczyk, P., et al. Graph of thoughts: Solving elaborate problems with large language models. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pp. 17682–17690, 2024.

Binz, M. and Schulz, E. Using cognitive psychology to understand gpt-3. Proceedings of the National Academy of Sciences, 120(6):e2218523120, 2023.

Brooks, T., Peebles, B., Holmes, C., DePue, W., Guo, Y., Jing, L., Schnurr, D., Taylor, J., Luhman, T., Luhman, E., Ng, C., Wang, R., and Ramesh, A. Video generation models as world simulators. 2024. URL https://openai.com/research/video-generation-models-as-world-simulators.

Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., Lee, P., Lee, Y. T., Li, Y., Lundberg, S., et al. Sparks of artificial general intelligence: Early experiments with gpt-4. arXiv preprint arXiv:2303.12712, 2023.

Chan, C.-M., Chen, W., Su, Y., Yu, J., Xue, W., Zhang, S., Fu, J., and Liu, Z. Chateval: Towards better llm-based evaluators through multi-agent debate. arXiv preprint arXiv:2308.07201, 2023.

Chen, X., Wang, X., Changpinyo, S., Piergiovanni, A., Padlewski, P., Salz, D., Goodman, S., Grycner, A., Mustafa, B., Beyer, L., et al. Pali: A jointly-scaled multilingual language-image model. arXiv preprint arXiv:2209.06794, 2022.

Chen, Y., Wang, R., Jiang, H., Shi, S., and Xu, R. Exploring the use of large language models for reference-free text quality evaluation: A preliminary empirical study. arXiv preprint arXiv:2304.00723, 2023.

Cheng, Y., Zhang, C., Zhang, Z., Meng, X., Hong, S., Li, W., Wang, Z., Wang, Z., Yin, F., Zhao, J., et al. Exploring large language model based intelligent agents: Definitions, methods, and prospects. arXiv preprint arXiv:2401.03428, 2024.

Cheng, Z., Xie, T., Shi, P., Li, C., Nadkarni, R., Hu, Y., Xiong, C., Radev, D., Ostendorf, M., Zettlemoyer, L., et al. Binding language models in symbolic languages. arXiv preprint arXiv:2210.02875, 2022.

Cheong, I., Xia, K., Feng, K., Chen, Q. Z., and Zhang, A. X. (a) i am not a lawyer, but...: Engaging legal experts towards responsible llm policies for legal advice. arXiv preprint arXiv:2402.01864, 2024.

Chern, S., Fan, Z., and Liu, A. Combating adversarial attacks with multi-agent debate. arXiv preprint arXiv:2401.05998, 2024.
Chiang, C.-H. and Lee, H.-y. A closer look into automatic evaluation using large language models. arXiv preprint arXiv:2310.05657, 2023.

Cohen, R., Hamri, M., Geva, M., and Globerson, A. Lm vs lm: Detecting factual errors via cross examination. arXiv preprint arXiv:2305.13281, 2023.

Cui, C., Ma, Y., Cao, X., Ye, W., Zhou, Y., Liang, K., Chen, J., Lu, J., Yang, Z., Liao, K.-D., et al. A survey on multimodal large language models for autonomous driving. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 958–979, 2024.

Deng, X., Gu, Y., Zheng, B., Chen, S., Stevens, S., Wang, B., Sun, H., and Su, Y. Mind2web: Towards a generalist agent for the web. Advances in Neural Information Processing Systems, 36, 2024.

Du, Y., Li, S., Torralba, A., Tenenbaum, J. B., and Mordatch, I. Improving factuality and reasoning in language models through multiagent debate. arXiv preprint arXiv:2305.14325, 2023.

Durante, Z., Huang, Q., Wake, N., Gong, R., Park, J. S., Sarkar, B., Taori, R., Noda, Y., Terzopoulos, D., Choi, Y., et al. Agent ai: Surveying the horizons of multimodal interaction. arXiv preprint arXiv:2401.03568, 2024.

Echterhoff, J., Liu, Y., Alessa, A., McAuley, J., and He, Z. Cognitive bias in high-stakes decision-making with llms. arXiv preprint arXiv:2403.00811, 2024.

Eisenführ, F., Weber, M., and Langer, T. Rational decision making. Springer, 2010.

Fan, L., Wang, G., Jiang, Y., Mandlekar, A., Yang, Y., Zhu, H., Tang, A., Huang, D.-A., Zhu, Y., and Anandkumar, A. Minedojo: Building open-ended embodied agents with internet-scale knowledge. Advances in Neural Information Processing Systems, 35:18343–18362, 2022.

Fang, M., Deng, S., Zhang, Y., Shi, Z., Chen, L., Pechenizkiy, M., and Wang, J. Large language models are neurosymbolic reasoners. arXiv preprint arXiv:2401.09334, 2024.

Fu, J., Ng, S.-K., Jiang, Z., and Liu, P. Gptscore: Evaluate as you desire. arXiv preprint arXiv:2302.04166, 2023.

Furuta, H., Nachum, O., Lee, K.-H., Matsuo, Y., Gu, S. S., and Gur, I. Multimodal web navigation with instruction-finetuned foundation models. arXiv preprint arXiv:2305.11854, 2023.

Gao, C., Lan, X., Lu, Z., Mao, J., Piao, J., Wang, H., Jin, D., and Li, Y. S^3: Social-network simulation system with large language model-empowered agents. arXiv preprint arXiv:2307.14984, 2023a.

Gao, D., Ji, L., Zhou, L., Lin, K. Q., Chen, J., Fan, Z., and Shou, M. Z. Assistgpt: A general multi-modal assistant that can plan, execute, inspect, and learn. arXiv preprint arXiv:2306.08640, 2023b.

Gao, M., Ruan, J., Sun, R., Yin, X., Yang, S., and Wan, X. Human-like summarization evaluation with chatgpt. arXiv preprint arXiv:2304.02554, 2023c.

Gardères, F., Ziaeefard, M., Abeloos, B., and Lecue, F. Conceptbert: Concept-aware representation for visual question answering. In Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 489–498, 2020.

Gravitas, S. Autogpt. Python. https://github.com/Significant-Gravitas/Auto-GPT, 2023.

Guerreiro, N. M., Alves, D. M., Waldendorf, J., Haddow, B., Birch, A., Colombo, P., and Martins, A. F. Hallucinations in large multilingual translation models. Transactions of the Association for Computational Linguistics, 11:1500–1517, 2023.

Guo, T., Chen, X., Wang, Y., Chang, R., Pei, S., Chawla, N. V., Wiest, O., and Zhang, X. Large language model based multi-agents: A survey of progress and challenges. arXiv preprint arXiv:2402.01680, 2024.

Gupta, T. and Kembhavi, A. Visual programming: Compositional visual reasoning without training. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14953–14962, 2023.

Gur, I., Furuta, H., Huang, A., Safdari, M., Matsuo, Y., Eck, D., and Faust, A. A real-world webagent with planning, long context understanding, and program synthesis. arXiv preprint arXiv:2307.12856, 2023.

Hagendorff, T. Machine psychology: Investigating emergent capabilities and behavior in large language models using psychological methods. arXiv preprint arXiv:2303.13988, 2023.

Han, S., Zhang, Q., Yao, Y., Jin, W., Xu, Z., and He, C. Llm multi-agent systems: Challenges and open problems. arXiv preprint arXiv:2402.03578, 2024.

Hastie, R. and Dawes, R. M. Rational choice in an uncertain world: The psychology of judgment and decision making. Sage Publications, 2009.

He, K., Mao, R., Lin, Q., Ruan, Y., Lan, X., Feng, M., and Cambria, E. A survey of large language models for healthcare: from data, technology, and applications to accountability and ethics. arXiv preprint arXiv:2310.05694, 2023.
Hogan, A., Blomqvist, E., Cochez, M., d'Amato, C., Melo, G. D., Gutierrez, C., Kirrane, S., Gayo, J. E. L., Navigli, R., Neumaier, S., et al. Knowledge graphs. ACM Computing Surveys (CSUR), 54(4):1–37, 2021.

Hong, S., Zheng, X., Chen, J., Cheng, Y., Wang, J., Zhang, C., Wang, Z., Yau, S. K. S., Lin, Z., Zhou, L., et al. Metagpt: Meta programming for multi-agent collaborative framework. arXiv preprint arXiv:2308.00352, 2023a.

Hong, W., Wang, W., Lv, Q., Xu, J., Yu, W., Ji, J., Wang, Y., Wang, Z., Dong, Y., Ding, M., et al. Cogagent: A visual language model for gui agents. arXiv preprint arXiv:2312.08914, 2023b.

Hsu, J., Mao, J., Tenenbaum, J., and Wu, J. What's left? concept grounding with logic-enhanced foundation models. Advances in Neural Information Processing Systems, 36, 2024.

Hu, Z., Iscen, A., Sun, C., Chang, K.-W., Sun, Y., Ross, D., Schmid, C., and Fathi, A. Avis: Autonomous visual information seeking with large language model agent. Advances in Neural Information Processing Systems, 36, 2024.

Huang, J. and Chang, K. C.-C. Towards reasoning in large language models: A survey. arXiv preprint arXiv:2212.10403, 2022.

Huang, L., Yu, W., Ma, W., Zhong, W., Feng, Z., Wang, H., Chen, Q., Peng, W., Feng, X., Qin, B., et al. A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. arXiv preprint arXiv:2311.05232, 2023.

Jiang, B., Zhuang, Z., Shivakumar, S. S., Roth, D., and Taylor, C. J. Multi-agent vqa: Exploring multi-agent foundation models in zero-shot visual question answering. arXiv preprint arXiv:2403.14783, 2024.

Kang, H. and Liu, X.-Y. Deficiency of large language models in finance: An empirical examination of hallucination. arXiv preprint arXiv:2311.15548, 2023.

Ke, Y. H., Yang, R., Lie, S. A., Lim, T. X. Y., Abdullah, H. R., Ting, D. S. W., and Liu, N. Enhancing diagnostic accuracy through multi-agent conversations: Using large language models to mitigate cognitive bias. arXiv preprint arXiv:2401.14589, 2024.

Khan, A., Hughes, J., Valentine, D., Ruis, L., Sachan, K., Radhakrishnan, A., Grefenstette, E., Bowman, S. R., Rocktäschel, T., and Perez, E. Debating with more persuasive llms leads to more truthful answers. arXiv preprint arXiv:2402.06782, 2024.

Khardon, R. and Roth, D. Learning to reason. Journal of the ACM (JACM), 44(5):697–725, 1997.

Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A. C., Lo, W.-Y., Dollár, P., and Girshick, R. Segment anything. arXiv:2304.02643, 2023.

Koo, R., Lee, M., Raheja, V., Park, J. I., Kim, Z. M., and Kang, D. Benchmarking cognitive biases in large language models as evaluators. arXiv preprint arXiv:2309.17012, 2023.

Kosinski, M. Evaluating large language models in theory of mind tasks. arXiv e-prints, pp. arXiv–2302, 2023.

Langley, P. Crafting papers on machine learning. In Langley, P. (ed.), Proceedings of the 17th International Conference on Machine Learning (ICML 2000), pp. 1207–1216, Stanford, CA, 2000. Morgan Kaufmann.

LeCun, Y. A path towards autonomous machine intelligence version 0.9.2, 2022-06-27. Open Review, 62(1), 2022.

LeCun, Y. Objective-driven ai: Towards ai systems that can learn, remember, reason, plan, have common sense, yet are steerable and safe. University of Washington, Department of Electrical & Computer Engineering, January 2024. URL https://www.ece.uw.edu/wp-content/uploads/2024/01/lecun-20240124-uw-lyttle.pdf. Slide presentation retrieved from University of Washington.

Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.-t., Rocktäschel, T., et al. Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in Neural Information Processing Systems, 33:9459–9474, 2020.

Li, C., Gan, Z., Yang, Z., Yang, J., Li, L., Wang, L., Gao, J., et al. Multimodal foundation models: From specialists to general-purpose assistants. Foundations and Trends® in Computer Graphics and Vision, 16(1-2):1–214, 2024a.

Li, H., Chong, Y. Q., Stepputtis, S., Campbell, J., Hughes, D., Lewis, M., and Sycara, K. Theory of mind for multi-agent collaboration via large language models. arXiv preprint arXiv:2310.10701, 2023a.

Li, J., Li, D., Savarese, S., and Hoi, S. Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. In International Conference on Machine Learning, pp. 19730–19742. PMLR, 2023b.

Li, J., Wang, S., Zhang, M., Li, W., Lai, Y., Kang, X., Ma, W., and Liu, Y. Agent hospital: A simulacrum of hospital with evolvable medical agents. arXiv preprint arXiv:2405.02957, 2024b.
Li, X., Zhao, R., Chia, Y. K., Ding, B., Bing, L., Joty, S., and Poria, S. Chain of knowledge: A framework for grounding large language models with structured knowledge bases. arXiv preprint arXiv:2305.13269, 2023c.

Li, Y., Wang, S., Ding, H., and Chen, H. Large language models in finance: A survey. In Proceedings of the Fourth ACM International Conference on AI in Finance, pp. 374–382, 2023d.

Li, Y., Zhang, Y., and Sun, L. Metaagents: Simulating interactions of human behaviors for llm-based task-oriented coordination via collaborative generative agents. arXiv preprint arXiv:2310.06500, 2023e.

Liang, T., He, Z., Jiao, W., Wang, X., Wang, Y., Wang, R., Yang, Y., Tu, Z., and Shi, S. Encouraging divergent thinking in large language models through multi-agent debate. arXiv preprint arXiv:2305.19118, 2023.

Liu, E. Z., Guu, K., Pasupat, P., Shi, T., and Liang, P. Reinforcement learning on web interfaces using workflow-guided exploration. arXiv preprint arXiv:1802.08802, 2018.

Liu, H., Li, C., Li, Y., and Lee, Y. J. Improved baselines with visual instruction tuning. arXiv preprint arXiv:2310.03744, 2023a.

Liu, H., Li, C., Wu, Q., and Lee, Y. J. Visual instruction tuning. Advances in Neural Information Processing Systems, 36, 2024a.

Liu, H., Yan, W., Zaharia, M., and Abbeel, P. World model on million-length video and language with ringattention. arXiv preprint arXiv:2402.08268, 2024b.

Liu, N. F., Lin, K., Hewitt, J., Paranjape, A., Bevilacqua, M., Petroni, F., and Liang, P. Lost in the middle: How language models use long contexts. Transactions of the Association for Computational Linguistics, 12:157–173, 2024c.

Liu, Y., Iter, D., Xu, Y., Wang, S., Xu, R., and Zhu, C. Gpteval: Nlg evaluation using gpt-4 with better human alignment. arXiv preprint arXiv:2303.16634, 2023b.

Liu, Z., Zhang, Y., Li, P., Liu, Y., and Yang, D. Dynamic llm-agent network: An llm-agent collaboration framework with agent team optimization. arXiv preprint arXiv:2310.02170, 2023c.

Lu, J., Batra, D., Parikh, D., and Lee, S. Vilbert: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks. Advances in Neural Information Processing Systems, 32, 2019.

Lu, P., Peng, B., Cheng, H., Galley, M., Chang, K.-W., Wu, Y. N., Zhu, S.-C., and Gao, J. Chameleon: Plug-and-play compositional reasoning with large language models. Advances in Neural Information Processing Systems, 36, 2024.

Luo, Z., Xie, Q., and Ananiadou, S. Chatgpt as a factual inconsistency evaluator for abstractive text summarization. arXiv preprint arXiv:2303.15621, 2023.

Macmillan-Scott, O. and Musolesi, M. (ir)rationality and cognitive biases in large language models. arXiv preprint arXiv:2402.09193, 2024.

Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., et al. Self-refine: Iterative refinement with self-feedback. Advances in Neural Information Processing Systems, 36, 2024.

Marino, K., Chen, X., Parikh, D., Gupta, A., and Rohrbach, M. Krisp: Integrating implicit and symbolic knowledge for open-domain knowledge-based vqa. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14111–14121, 2021.

Mohtashami, A., Hartmann, F., Gooding, S., Zilka, L., Sharifi, M., et al. Social learning: Towards collaborative learning with large language models. arXiv preprint arXiv:2312.11441, 2023.

Mukherjee, A. and Chang, H. H. Heuristic reasoning in ai: Instrumental use and mimetic absorption. arXiv preprint arXiv:2403.09404, 2024.

Nakajima, Y. Babyagi. Python. https://github.com/yoheinakajima/babyagi, 2023.

Nye, M., Tessler, M., Tenenbaum, J., and Lake, B. M. Improving coherence and consistency in neural sequence models with dual-system, neuro-symbolic reasoning. Advances in Neural Information Processing Systems, 34:25192–25204, 2021.

Oguntola, I., Hughes, D., and Sycara, K. Deep interpretable models of theory of mind. In 2021 30th IEEE International Conference on Robot & Human Interactive Communication (RO-MAN), pp. 657–664. IEEE, 2021.

OpenAI. Gpt-4v(ision) system card. 2023. URL https://api.semanticscholar.org/CorpusID:263218031.

OpenAI. Gpt-4o. Software available from OpenAI, 2024. URL https://openai.com/index/hello-gpt-4o/. Accessed: 2024-05-20.
Pan, L., Albalak, A., Wang, X., and Wang, W. Y. Logic-lm: Empowering large language models with symbolic solvers for faithful logical reasoning. arXiv preprint arXiv:2305.12295, 2023.

Pan, Z., Luo, H., Li, M., and Liu, H. Chain-of-action: Faithful and multimodal question answering through large language models. arXiv preprint arXiv:2403.17359, 2024.

Prasad, A., Koller, A., Hartmann, M., Clark, P., Sabharwal, A., Bansal, M., and Khot, T. Adapt: As-needed decomposition and planning with language models. arXiv preprint arXiv:2311.05772, 2023.

Qian, C., Cong, X., Yang, C., Chen, W., Su, Y., Xu, J., Liu, Z., and Sun, M. Communicative agents for software development. arXiv preprint arXiv:2307.07924, 2023.

Qiao, S., Ou, Y., Zhang, N., Chen, X., Yao, Y., Deng, S., Tan, C., Huang, F., and Chen, H. Reasoning with language model prompting: A survey. arXiv preprint arXiv:2212.09597, 2022.

Qiao, S., Zhang, N., Fang, R., Luo, Y., Zhou, W., Jiang, Y. E., Lv, C., and Chen, H. Autoact: Automatic agent learning from scratch via self-planning. arXiv preprint arXiv:2401.05268, 2024.

Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pp. 8748–8763. PMLR, 2021.

Reid, M., Savinov, N., Teplyashin, D., Lepikhin, D., Lillicrap, T., Alayrac, J.-b., Soricut, R., Lazaridou, A., Firat, O., Schrittwieser, J., et al. Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context. arXiv preprint arXiv:2403.05530, 2024.

Ren, T., Liu, S., Zeng, A., Lin, J., Li, K., Cao, H., Chen, J., Huang, X., Chen, Y., Yan, F., et al. Grounded sam: Assembling open-world models for diverse visual tasks. arXiv preprint arXiv:2401.14159, 2024.

Rombach, R., Blattmann, A., Lorenz, D., Esser, P., and Ommer, B. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10684–10695, 2022.

Schick, T., Dwivedi-Yu, J., Dessì, R., Raileanu, R., Lomeli, M., Hambro, E., Zettlemoyer, L., Cancedda, N., and Scialom, T. Toolformer: Language models can teach themselves to use tools. Advances in Neural Information Processing Systems, 36, 2024.

Sclar, M., Kumar, S., West, P., Suhr, A., Choi, Y., and Tsvetkov, Y. Minding language models' (lack of) theory of mind: A plug-and-play multi-character belief tracker. arXiv preprint arXiv:2306.00924, 2023.

Shaw, P., Joshi, M., Cohan, J., Berant, J., Pasupat, P., Hu, H., Khandelwal, U., Lee, K., and Toutanova, K. N. From pixels to ui actions: Learning to follow instructions via graphical user interfaces. Advances in Neural Information Processing Systems, 36, 2024.

Shen, C., Cheng, L., Nguyen, X.-P., You, Y., and Bing, L. Large language models are not yet human-level evaluators for abstractive summarization. In Findings of the Association for Computational Linguistics: EMNLP 2023, pp. 4215–4233, 2023.

Shen, W., Li, C., Chen, H., Yan, M., Quan, X., Chen, H., Zhang, J., and Huang, F. Small llms are weak tool learners: A multi-llm agent. arXiv preprint arXiv:2401.07324, 2024a.

Shen, Y., Song, K., Tan, X., Li, D., Lu, W., and Zhuang, Y. Hugginggpt: Solving ai tasks with chatgpt and its friends in hugging face. Advances in Neural Information Processing Systems, 36, 2024b.

Shi, F., Chen, X., Misra, K., Scales, N., Dohan, D., Chi, E. H., Schärli, N., and Zhou, D. Large language models can be easily distracted by irrelevant context. In International Conference on Machine Learning, pp. 31210–31227. PMLR, 2023.

Shi, T., Karpathy, A., Fan, L., Hernandez, J., and Liang, P. World of bits: An open-domain platform for web-based agents. In International Conference on Machine Learning, pp. 3135–3144. PMLR, 2017.

Shi, Z., Gao, S., Chen, X., Yan, L., Shi, H., Yin, D., Chen, Z., Ren, P., Verberne, S., and Ren, Z. Learning to use tools via cooperative and interactive agents. arXiv preprint arXiv:2403.03031, 2024.

Shinn, N., Labash, B., and Gopinath, A. Reflexion: an autonomous agent with dynamic memory and self-reflection. arXiv preprint arXiv:2303.11366, 2023.

Speer, R., Chin, J., and Havasi, C. Conceptnet 5.5: An open multilingual graph of general knowledge. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 31, 2017.

Stureborg, R., Alikaniotis, D., and Suhara, Y. Large language models are inconsistent and biased evaluators. arXiv preprint arXiv:2405.01724, 2024.

Su, W., Zhu, X., Cao, Y., Li, B., Lu, L., Wei, F., and Dai, J. Vl-bert: Pre-training of generic visual-linguistic representations. arXiv preprint arXiv:1908.08530, 2019.
Sun, Q., Yin, Z., Li, X., Wu, Z., Qiu, X., and Kong, L. Corex: Pushing the boundaries of complex reasoning through multi-model collaboration. arXiv preprint arXiv:2310.00280, 2023.

Sun, R. Can a cognitive architecture fundamentally enhance llms? or vice versa? arXiv preprint arXiv:2401.10444, 2024.

Suri, G., Slater, L. R., Ziaee, A., and Nguyen, M. Do large language models show decision heuristics similar to humans? a case study using gpt-3.5. Journal of Experimental Psychology: General, 2024.

Surís, D., Menon, S., and Vondrick, C. Vipergpt: Visual inference via python execution for reasoning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 11888–11898, 2023.

Talebirad, Y. and Nadiri, A. Multi-agent collaboration: Harnessing the power of intelligent llm agents. arXiv preprint arXiv:2306.03314, 2023.

Tang, Q., Deng, Z., Lin, H., Han, X., Liang, Q., and Sun, L. Toolalpaca: Generalized tool learning for language models with 3000 simulated cases. arXiv preprint arXiv:2306.05301, 2023.

Tversky, A. and Kahneman, D. Rational choice and the framing of decisions. Decision Making: Descriptive, Normative, and Prescriptive Interactions, pp. 167–192, 1988.

Valmeekam, K., Marquez, M., Sreedharan, S., and Kambhampati, S. On the planning abilities of large language models - a critical investigation. Advances in Neural Information Processing Systems, 36:75993–76005, 2023.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017.

Wang, G., Xie, Y., Jiang, Y., Mandlekar, A., Xiao, C., Zhu, Y., Fan, L., and Anandkumar, A. Voyager: An open-ended embodied agent with large language models. arXiv preprint arXiv:2305.16291, 2023a.

Wang, J., Liang, Y., Meng, F., Sun, Z., Shi, H., Li, Z., Xu, J., Qu, J., and Zhou, J. Is chatgpt a good nlg evaluator? a preliminary study. arXiv preprint arXiv:2303.04048, 2023b.

Wang, P., Xiao, Z., Chen, H., and Oswald, F. L. Will the real linda please stand up... to large language models? examining the representativeness heuristic in llms. arXiv preprint arXiv:2404.01461, 2024.

Wang, W., Bao, H., Dong, L., Bjorck, J., Peng, Z., Liu, Q., Aggarwal, K., Mohammed, O. K., Singhal, S., Som, S., et al. Image as a foreign language: Beit pretraining for all vision and vision-language tasks. arXiv preprint arXiv:2208.10442, 2022a.

Wang, W., Lv, Q., Yu, W., Hong, W., Qi, J., Wang, Y., Ji, J., Yang, Z., Zhao, L., Song, X., et al. Cogvlm: Visual expert for pretrained language models. arXiv preprint arXiv:2311.03079, 2023c.

Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., and Zhou, D. Self-consistency improves chain of thought reasoning in language models. arXiv preprint arXiv:2203.11171, 2022b.

Wang, Z., Wan, W., Chen, R., Lao, Q., Lang, M., and Wang, K. Towards top-down reasoning: An explainable multi-agent approach for visual question answering. arXiv preprint arXiv:2311.17331, 2023d.

Wei, J., Wang, X., Schuurmans, D., Bosma, M., Xia, F., Chi, E., Le, Q. V., Zhou, D., et al. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35:24824–24837, 2022.

Wikipedia contributors. Plagiarism — Wikipedia, the free encyclopedia, 2004. URL https://en.wikipedia.org/w/index.php?title=Plagiarism&oldid=5139350. [Online; accessed 22-July-2004].

Wong, L., Mao, J., Sharma, P., Siegel, Z. S., Feng, J., Korneev, N., Tenenbaum, J. B., and Andreas, J. Learning adaptive planning representations with natural language guidance. arXiv preprint arXiv:2312.08566, 2023.

Wu, J., Lu, J., Sabharwal, A., and Mottaghi, R. Multi-modal answer validation for knowledge-based vqa. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pp. 2712–2721, 2022.

Wu, Q., Bansal, G., Zhang, J., Wu, Y., Zhang, S., Zhu, E., Li, B., Jiang, L., Zhang, X., and Wang, C. Autogen: Enabling next-gen llm applications via multi-agent conversation framework. arXiv preprint arXiv:2308.08155, 2023.

Wu, S., Xie, J., Chen, J., Zhu, T., Zhang, K., and Xiao, Y. How easily do irrelevant inputs skew the responses of large language models? arXiv preprint arXiv:2404.03302, 2024.

Xie, J., Chen, Z., Zhang, R., Wan, X., and Li, G. Large multimodal agents: A survey. arXiv preprint arXiv:2402.15116, 2024a.
Xie, Y., Mallick, T., Bergerson, J. D., Hutchison, J. K., Verner, D. R., Branham, J., Alexander, M. R., Ross, R. B., Feng, Y., Levy, L.-A., et al. Wildfiregpt: Tailored large language model for wildfire analysis. arXiv preprint arXiv:2402.07877, 2024b.

Xiong, K., Ding, X., Cao, Y., Liu, T., and Qin, B. Diving into the inter-consistency of large language models: An insightful analysis through debate. arXiv preprint arXiv:2305.11595, 2023.

Xu, S., Pang, L., Shen, H., Cheng, X., and Chua, T.-S. Search-in-the-chain: Towards the accurate, credible and traceable content generation for complex knowledge-intensive tasks. arXiv preprint arXiv:2304.14732, 2023.

Xu, X., Wang, Y., Xu, C., Ding, Z., Jiang, J., Ding, Z., and Karlsson, B. F. A survey on game playing agents and large models: Methods, applications, and challenges. arXiv preprint arXiv:2403.10249, 2024.

Yang, Z. and Zhu, Z. Curiousllm: Elevating multi-document qa with reasoning-infused knowledge graph prompting. arXiv preprint arXiv:2404.09077, 2024.

Yang, Z., Chen, G., Li, X., Wang, W., and Yang, Y. Doraemongpt: Toward understanding dynamic scenes with large language models. arXiv preprint arXiv:2401.08392, 2024.

Yao, S., Chen, H., Yang, J., and Narasimhan, K. Webshop: Towards scalable real-world web interaction with grounded language agents. Advances in Neural Information Processing Systems, 35:20744–20757, 2022a.

Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., and Cao, Y. React: Synergizing reasoning and acting in language models. arXiv preprint arXiv:2210.03629, 2022b.

Yao, S., Yu, D., Zhao, J., Shafran, I., Griffiths, T., Cao, Y., and Narasimhan, K. Tree of thoughts: Deliberate problem solving with large language models. Advances in Neural Information Processing Systems, 36, 2024.

Yao, W., Heinecke, S., Niebles, J. C., Liu, Z., Feng, Y., Xue, L., Murthy, R., Chen, Z., Zhang, J., Arpit, D., et al. Retroformer: Retrospective large language agents with policy gradient optimization. arXiv preprint arXiv:2308.02151, 2023.

Yasunaga, M., Aghajanyan, A., Shi, W., James, R., Leskovec, J., Liang, P., Lewis, M., Zettlemoyer, L., and Yih, W.-t. Retrieval-augmented multimodal language modeling. arXiv preprint arXiv:2211.12561, 2022.

Ye, H., Gui, H., Zhang, A., Liu, T., Hua, W., and Jia, W. Beyond isolation: Multi-agent synergy for improving knowledge graph construction. arXiv preprint arXiv:2312.03022, 2023.

Yi, K., Wu, J., Gan, C., Torralba, A., Kohli, P., and Tenenbaum, J. Neural-symbolic vqa: Disentangling reasoning from vision and language understanding. Advances in Neural Information Processing Systems, 31, 2018.

Yin, D., Brahman, F., Ravichander, A., Chandu, K., Chang, K.-W., Choi, Y., and Lin, B. Y. Lumos: Learning agents with unified data, modular design, and open-source llms. arXiv preprint arXiv:2311.05657, 2023.

Yoshikawa, H. and Okazaki, N. Selective-lama: Selective prediction for confidence-aware evaluation of language models. In Findings of the Association for Computational Linguistics: EACL 2023, pp. 2017–2028, 2023.

Zelikman, E., Huang, Q., Poesia, G., Goodman, N., and Haber, N. Parsel: Algorithmic reasoning with language models by composing decompositions. Advances in Neural Information Processing Systems, 36:31466–31523, 2023.

Zhang, J., Hou, Y., Xie, R., Sun, W., McAuley, J., Zhao, W. X., Lin, L., and Wen, J.-R. Agentcf: Collaborative learning with autonomous language agents for recommender systems. arXiv preprint arXiv:2310.09233, 2023.

Zhang, Y., Mao, S., Ge, T., Wang, X., de Wynter, A., Xia, Y., Wu, W., Song, T., Lan, M., and Wei, F. Llm as a mastermind: A survey of strategic reasoning with large language models. arXiv preprint arXiv:2404.01230, 2024a.

Zhang, Z., Bo, X., Ma, C., Li, R., Chen, X., Dai, Q., Zhu, J., Dong, Z., and Wen, J.-R. A survey on the memory mechanism of large language model based agents. arXiv preprint arXiv:2404.13501, 2024b.

Zhao, S. and Xu, H. Less is more: Toward zero-shot local scene graph generation via foundation models. arXiv preprint arXiv:2310.01356, 2023.

Zhao, Y., Lin, Z., Zhou, D., Huang, Z., Feng, J., and Kang, B. Bubogpt: Enabling visual grounding in multi-modal llms. arXiv preprint arXiv:2307.08581, 2023.

Zhao, Z., Ma, K., Chai, W., Wang, X., Chen, K., Guo, D., Zhang, Y., Wang, H., and Wang, G. Do we really need a complex agent system? distill embodied agent into a single model. arXiv preprint arXiv:2404.04619, 2024.

Zheng, B., Gou, B., Kil, J., Sun, H., and Su, Y. Gpt-4v(ision) is a generalist web agent, if grounded. arXiv preprint arXiv:2401.01614, 2024a.
Zheng, L., Chiang, W.-L., Sheng, Y., Zhuang, S., Wu, Z., Zhuang, Y., Lin, Z., Li, Z., Li, D., Xing, E., et al. Judging llm-as-a-judge with mt-bench and chatbot arena. Advances in Neural Information Processing Systems, 36, 2024b.

Zhong, M., Liu, Y., Yin, D., Mao, Y., Jiao, Y., Liu, P., Zhu, C., Ji, H., and Han, J. Towards a unified multi-dimensional evaluator for text generation. arXiv preprint arXiv:2210.07197, 2022.

Zhu, D., Chen, J., Shen, X., Li, X., and Elhoseiny, M. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592, 2023a.

Zhu, X., Chen, Y., Tian, H., Tao, C., Su, W., Yang, C., Huang, G., Li, B., Lu, L., Wang, X., et al. Ghost in the minecraft: Generally capable agents for open-world environments via large language models with text-based knowledge and memory. arXiv preprint arXiv:2305.17144, 2023b.


A. Orderability of Preferences
Comparability When faced with any two alternatives A and B, the agent should have at least a weak preference, i.e.,
A ⪰ B or B ⪰ A. This means that the agent can compare any pair of alternatives and determine which one is preferred or if
they are equally preferred.

Transitivity If the agent prefers A to B and B to C, then the agent must prefer A to C. This ensures that the agent’s
preferences are consistent and logical across multiple comparisons.

Closure If A and B are in the alternative set S, then any probabilistic combination of A and B (denoted as ApB) should
also be in S. This principle ensures that the set of alternatives is closed under probability mixtures.

Distribution of probabilities across alternatives If A and B are in S, then the probability mixture of (ApB) and B,
denoted as [(ApB)qB], should be indifferent to the probability mixture of A and B, denoted as (ApqB). This principle
ensures consistency in the agent’s preferences when dealing with probability mixtures of alternatives.

Solvability When faced with three alternatives A, B, and C, with the preference order A ⪰ B ⪰ C, there should be some
probabilistic way of combining A and C such that the agent is indifferent between choosing B or this combination. In other
words, the agent should be able to find a solution to the decision problem by making trade-offs between alternatives.
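Under an expected-utility reading of these principles, solvability admits a closed form: if u(A) >= u(B) >= u(C), the agent is indifferent between B and the mixture ApC exactly when p = (u(B) - u(C)) / (u(A) - u(C)). The sketch below checks this numerically; the utility values are made up purely for illustration.

```python
def mixture_utility(p: float, u_a: float, u_c: float) -> float:
    """Expected utility of the probabilistic combination ApC:
    alternative A with probability p, alternative C otherwise."""
    return p * u_a + (1.0 - p) * u_c

# Hypothetical utilities consistent with the preference order A >= B >= C.
u = {"A": 10.0, "B": 7.0, "C": 2.0}

# Solvability: the mixing probability making the agent indifferent
# between B and the mixture ApC is p = (u(B) - u(C)) / (u(A) - u(C)).
p = (u["B"] - u["C"]) / (u["A"] - u["C"])  # = 5/8 = 0.625
assert abs(mixture_utility(p, u["A"], u["C"]) - u["B"]) < 1e-9

# Transitivity: with numeric utilities, A >= B and B >= C imply A >= C.
assert u["A"] >= u["B"] >= u["C"] and u["A"] >= u["C"]
print(f"indifference point p = {p}")
```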

B. LLM-based Evaluations
Recent research underscores a critical need for more rational LLM-based evaluation methods, particularly for assessing
open-ended language responses. CoBBLEr (Koo et al., 2023) provides a cognitive bias benchmark for evaluating LLMs as
evaluators, revealing a preference for their own outputs over those from other LLMs. Stureborg et al. (2024) argue that
LLMs are biased evaluators towards more familiar tokens and previous predictions, and exhibit strong self-inconsistency in
the score distribution. Luo et al. (2023); Shen et al. (2023); Gao et al. (2023c); Wang et al. (2023b); Chen et al. (2023);
Chiang & Lee (2023); Zheng et al. (2024b); Fu et al. (2023); Liu et al. (2023b) also point out the problem with a single
LLM as the evaluator, with concerns over factual and rating inconsistencies, a high dependency on prompt design, a low
correlation with human evaluations, and struggles with the comparison, i.e., the orderability of preferences.
Multi-agent systems might be a possible remedy. By involving multiple evaluative agents from diverse perspectives, it
becomes possible to achieve a more balanced and consistent orderability of preferences. For instance, ChatEval (Chan et al.,
2023) posits that a multi-agent debate evaluation usually offers judgments that are better aligned with human annotators
compared to single-agent ones. Bai et al. (2024) also finds decentralized methods yield fairer evaluation results. Multi-Agent
VQA (Jiang et al., 2024) relies on a group of LLM-based graders for evaluating zero-shot, open-world visual question
answering, where exact answer matches are no longer feasible.
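One lightweight diagnostic for this orderability failure is to audit an evaluator's pairwise judgments for transitivity violations. The sketch below does so over a fabricated judgment set; the r1/r2/r3 preferences are invented for illustration, not drawn from any cited experiment.

```python
from itertools import permutations

# Fabricated pairwise judgments from a hypothetical LLM evaluator:
# judgments[(x, y)] is True when the evaluator prefers response x to y.
judgments = {
    ("r1", "r2"): True, ("r2", "r1"): False,
    ("r2", "r3"): True, ("r3", "r2"): False,
    ("r3", "r1"): True, ("r1", "r3"): False,  # creates a preference cycle
}

def transitivity_violations(judged: dict) -> list[tuple[str, str, str]]:
    """Flag triples where x > y and y > z but not x > z, i.e. cases in
    which the evaluator's preferences cannot be ordered consistently."""
    items = {x for pair in judged for x in pair}
    return [
        (x, y, z)
        for x, y, z in permutations(items, 3)
        if judged.get((x, y)) and judged.get((y, z)) and not judged.get((x, z))
    ]

print(transitivity_violations(judgments))  # non-empty: r1 > r2 > r3 > r1
```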

C. More Explanations on Scope


Rationality, by definition, is not equivalent to reasoning (Khardon & Roth, 1997; Huang & Chang, 2022; Zhang et al.,
2024a; Qiao et al., 2022), although deeply intertwined. Rationality involves making logically consistent decisions grounded
with reality, while reasoning refers to the cognitive process of drawing logical inferences and conclusions from available
information, as illustrated in the following thought experiment:

Consider an environment where the input space and the output decision space are finite. A lookup table with
consistent mapping from input to output is inherently rational, while no reasoning is necessarily present in the
mapping.
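
Rendered as code, the thought experiment is deliberately trivial: the mapping below is perfectly consistent, and therefore rational in the sense above, yet no inference takes place.

# A lookup-table agent over a finite input space: the same input always
# yields the same decision, so its choices are consistent, but nothing
# resembling reasoning happens between input and output.
POLICY = {"red_light": "stop", "yellow_light": "slow", "green_light": "go"}

def act(observation: str) -> str:
    return POLICY[observation]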

Despite this example, it is still crucial to acknowledge that reasoning typically plays a vital role in ensuring rationality, especially in complex and dynamic real-world scenarios where a simple lookup table is insufficient. Agents must possess the ability to reason through novel situations, adapt to changing circumstances, make plans, and achieve rational decisions based on incomplete or uncertain information. Furthermore, reasoning is crucial when faced with conflicting data or competing objectives: it helps systems weigh the evidence, consider alternative perspectives, and make trade-offs between different courses of action, all of which are fundamental steps in making rational decisions. This process allows for more nuanced and context-dependent decision-making while navigating the intricacies of the real world.
Rationality is also different from Theory of Mind (ToM) (Apperly & Butterfill, 2009; Nye et al., 2021; Oguntola et al., 2021; Hagendorff, 2023; Li et al., 2023a; Sclar et al., 2023; Kosinski, 2023) in machine psychology. ToM refers to a model's ability to understand that the mental states of others, such as beliefs, desires, emotions, and intentions, may differ from its own.

D. More Related Work on Multi-Modal Models


As a picture is worth a thousand words, recent advances in large vision-language pretraining have enabled LLMs with robust
language comprehension capabilities to finally perceive the visual world. Multi-modal foundation models, including but
not limited to CLIP (Radford et al., 2021), VLBERT and ViLBERT (Su et al., 2019; Lu et al., 2019), BLIP-2 (Li et al.,
2023b), (Open) Flamingo (Alayrac et al., 2022; Awadalla et al., 2023), LLaVA (Liu et al., 2024a; 2023a), CogVLM (Wang
et al., 2023c), MiniGPT-4 (Zhu et al., 2023a), GPT-4 Vision (OpenAI, 2023) and GPT-4o (OpenAI, 2024), and Gemini 1.5
Pro (Reid et al., 2024) serve as the cornerstones for multi-modal agent systems to ground knowledge in vision and beyond.
Chain-of-Action (Pan et al., 2024) advances the single-modal Search-in-the-Chain (Xu et al., 2023) by supporting multi-modal data retrieval for faithful question answering. DoraemonGPT (Yang et al., 2024) decomposes complex tasks into simpler ones toward understanding dynamic scenes, where multi-modal understanding is necessary for spatio-temporal video analysis. RA-CM3 (Yasunaga et al., 2022) augments baseline retrieval-augmented
LLMs with raw multi-modal documents that include both images and texts, assuming that these two modalities can
contextualize each other and make the documents more informative, leading to better generator performance. The multi-
modal capabilities also allow HuggingGPT (Shen et al., 2024b), Agent LUMOS (Yin et al., 2023), ToolAlpaca (Tang et al.,
2023), and AssistGPT (Gao et al., 2023b) to expand the scope of tasks they can address, including cooperation among
specialized agents or tools capable of handling different information modalities.
Web agents are another example of how multi-modal agents surpass language-only ones. In agents like Pix2Act (Shaw et al., 2024), WebGUM (Furuta et al., 2023), CogAgent (Hong et al., 2023b), and SeeAct (Zheng et al., 2024a), web navigation is grounded on graphical user interfaces (GUIs) rather than solely on HTML text (Shen et al., 2024a; Yao et al., 2022a; Deng et al., 2024; Gur et al., 2023). This visual grounding offers higher information density than HTML code, which is usually lengthy, noisy, and sometimes even incomplete (Zheng et al., 2024a). Supporting the importance of vision, ablation studies in WebGUM (Furuta et al., 2023) report a 5.5% success-rate improvement on the MiniWoB++ dataset (Shi et al., 2017; Liu et al., 2018) from simply adding the image modality.
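
A minimal sketch of one GUI-grounded navigation step follows; the model wrapper and the JSON action format are assumptions for illustration, not the interface of any cited agent.

def web_agent_step(vlm, screenshot_png: bytes, goal: str) -> dict:
    # `vlm` is an assumed vision-language-model wrapper that accepts an image
    # plus text and returns a parsed JSON action, e.g.,
    # {"action": "click", "x": 312, "y": 88} or {"action": "type", "text": "..."}.
    prompt = (f"Goal: {goal}\n"
              "From the screenshot, return the single next UI action as JSON.")
    return vlm(image=screenshot_png, text=prompt)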
Large world models are an emerging and promising direction for reducing multimodal hallucinations. The notion also appears in “Objective-driven AI” (LeCun, 2024), where agents’ behavior is driven by fulfilling objectives, i.e., drives, and they understand how the world works through common-sense knowledge, beyond auto-regressive generation. LeCun (2024) argues that agents urgently need to learn to reason beyond feed-forward, System 1 subconscious computation, and begin performing System 2 reasoning and planning over complicated actions to satisfy objectives, grounded in world models. For example, Ghost-in-the-Minecraft (Zhu et al., 2023b) and Voyager (Wang et al., 2023a) have agents living in a well-defined game-world environment. JEPA (LeCun, 2022) creates a recurrent world model in an abstract representation space. Large World Model (LWM) (Liu et al., 2024b) and Sora (Brooks et al., 2024) develop insights from both textual knowledge and the world through video sequences. Both advance toward general-purpose simulators of the world, but still lack reliable physics engines for guaranteed grounding in real-world dynamics.
The concept of invariance is the cornerstone of Visual Question Answering (VQA) agents (Chen et al., 2022; Jiang et al.,
2024; Wang et al., 2023d; Yi et al., 2018; Wang et al., 2022a; Bao et al., 2022; Zhao & Xu, 2023). On one hand, these
agents must grasp the invariant semantics of any open-ended questions posed about images, maintaining consistency despite
variations in wording, syntax, or language. On the other hand, within a multi-agent VQA system, visual agents can provide
crucial verification and support for language-based reasoning (Wang et al., 2023d; Jiang et al., 2024; Zhao & Xu, 2023),
while language queries can direct the attention of visual agents, based on a shared and invariant underlying knowledge
across vision and language domains.


E. More Related Work on Knowledge Retrieval


Multiple works construct large-scale knowledge graphs (KGs) (Hogan et al., 2021) from real-world sources to effectively expand agents’ working memory. Specifically, MAVEx (Wu et al., 2022) improves its scores by 9.5% over an image-only baseline through the integration of knowledge from ConceptNet (Speer et al., 2017) and Wikipedia (Wikipedia contributors, 2004), and by 8.3% by using the image modality for cross-modal validation with an oracle. Thanks to the external knowledge base, ReAct (Yao et al., 2022b) reduces false positive rates from hallucination by 8.0% compared to CoT (Wei et al., 2022). CuriousLLM (Yang & Zhu, 2024) presents ablation studies showing the effectiveness of KGs in improving reasoning within the search process. MineDojo (Fan et al., 2022) observes that internet-scale multi-modal knowledge allows models to significantly outperform all creative task baselines. Equipped with world knowledge, RA-CM3 (Yasunaga et al., 2022) generates more faithful images from captions than CM3 (Aghajanyan et al., 2022) and Stable Diffusion (Rombach et al., 2022). CooperKGC (Ye et al., 2023) enables multi-agent collaboration by leveraging the knowledge bases of different experts; it finds that incorporating KGs improves F1 scores by 10.0-33.6% across different backgrounds, and that adding more collaboration rounds enhances performance by about 10.0-30.0%. DoraemonGPT (Yang et al., 2024) supports knowledge tools to assist the understanding of specialized video contents. SIRI (Wang et al., 2023d) builds a multi-view knowledge base to increase the explainability of visual question answering. Grounding agents in external knowledge bases also promotes more factual rationales and fewer hallucinations, especially in scientific and medical domains, exemplified by Chameleon (Lu et al., 2024), Chain-of-Knowledge (Li et al., 2023c), WildfireGPT (Xie et al., 2024b), and Agent Hospital (Li et al., 2024b). Chain-of-Knowledge (Li et al., 2023c) even finds in its experiments that integrating multiple knowledge sources enhances performance by 2.1% compared to using a single source.
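
The common pattern behind these systems can be sketched as follows, with an assumed retriever interface: fetch the triples most relevant to the query from a KG and prepend them to the prompt, so the generator is grounded in external knowledge.

def kg_augmented_answer(llm, kg_retrieve, question: str, top_k: int = 5) -> str:
    # `kg_retrieve` is an assumed function returning (subject, relation, object)
    # triples ranked by relevance to the question.
    triples = kg_retrieve(question)[:top_k]
    facts = "\n".join(f"- {s} {r} {o}" for s, r, o in triples)
    prompt = (f"Known facts:\n{facts}\n\n"
              f"Question: {question}\n"
              "Answer using only the facts above; reply 'unknown' otherwise.")
    return llm(prompt)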

F. More Related Work on Using Tools


VisProg (Gupta & Kembhavi, 2023), ViperGPT (Surís et al., 2023), and Parsel (Zelikman et al., 2023) generate Python programs to reliably execute subroutines. Gupta & Kembhavi (2023) and Surís et al. (2023) also invoke off-the-shelf models for multimodal assistance. Foundation models are not specifically trained for object detection or segmentation, so BuboGPT (Zhao et al., 2023) and Multi-Agent VQA (Jiang et al., 2024) call SAM (Kirillov et al., 2023; Ren et al., 2024) as a tool, and Jiang et al. (2024) find an 8.8% accuracy improvement compared to a single agent. Besides, BabyAGI (Nakajima, 2023), Chameleon (Lu et al., 2024), AssistGPT (Gao et al., 2023b), Avis (Hu et al., 2024), ToolAlpaca (Tang et al., 2023), MetaGPT (Hong et al., 2023a), Agent LUMOS (Yin et al., 2023), AutoAct (Qiao et al., 2024), α-UMi (Shen et al., 2024a), and ConAgents (Shi et al., 2024) harness compositional reasoning to enable generalized multi-agent systems with planning and modular tool-using capabilities in real-world scenarios.
To boil a task down to its logical essence, Multi-Agent VQA (Jiang et al., 2024), as an example, has its LLM provide only the relevant object names, rather than the whole visual question, to the Grounded SAM (Ren et al., 2024) component that acts as the system’s object detector. Similarly, the image editing tools in VisProg (Gupta & Kembhavi, 2023) only receive a fixed set of arguments translated from user queries to perform deterministic code executions. SeeAct (Zheng et al., 2024a), as a Web agent, explores vision-language models, ranking models, and a bounding box annotation tool to improve the grounding of Web elements from lengthy and noisy HTML code. Consequently, using tools in a multi-agent system enhances the invariance and independence of agents from irrelevant contexts, ensuring that their operations are streamlined and focused solely on necessary information.
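
The pattern can be sketched as follows, with hypothetical wrappers: the LLM reduces the free-form question to the minimal arguments the tool needs, so the tool never sees irrelevant context.

def detect_objects_for_question(llm, detector, image, question: str) -> dict:
    # `llm` and `detector` are assumed wrappers; e.g., detector(image, "dog")
    # returns bounding boxes for that category.
    # Step 1: boil the question down to the entities the tool actually needs.
    reply = llm(f"List only the object names mentioned in: {question!r}, "
                "comma-separated.")
    names = [n.strip() for n in reply.split(",") if n.strip()]
    # Step 2: call the tool with clean, minimal arguments.
    return {name: detector(image, name) for name in names}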

G. More Related Work on Neuro-Symbolic Reasoning


Coherent Orderability of Preference SymbolicToM (Sclar et al., 2023) and KRISP (Marino et al., 2021) construct
explicit symbolic graphs and answer questions by retrieving nodes in the graph. Binder (Cheng et al., 2022), Parsel (Zelikman
et al., 2023), LEFT (Hsu et al., 2024), and Fang et al. (2024) decompose tasks into planning, parsing, and execution, where
the symbolic reasoning agents can help maintain a coherent order of preferences among symbolic options in the system
outputs. When the symbolic module is skipped, Parsel (Zelikman et al., 2023) observes a substantial performance drop of 19.5%. LEFT (Hsu et al., 2024) also outperforms end-to-end baselines without symbolic programs by 3.85% on average across multiple experiments. In more explicit scenarios, logical modules can directly compare the order of options represented as variables, such as “left” or “right” in relational logic (Hsu et al., 2024), rather than relying on a single LLM to generate responses non-deterministically within the natural language space.
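
In code, the contrast is between sampling an answer from an LLM and executing a deterministic comparison over parsed variables; a sketch under assumed interfaces:

def compare_positions(llm, scene: dict, query: str) -> str:
    # `scene` maps object names to x-coordinates; the parsing prompt and its
    # output format are assumptions of this sketch.
    reply = llm(f"Extract the two object names in: {query!r}, comma-separated.")
    obj_a, obj_b = (name.strip() for name in reply.split(","))
    # The symbolic module returns the same, well-ordered answer every time.
    return "left" if scene[obj_a] < scene[obj_b] else "right"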


Abstraction that Boils Down to Logical Essence Beyond detailed symbolic reasoning steps, these modules typically expect a standardized input format, similar to the API calls of tool usage. This layer of abstraction enhances independence from irrelevant contexts and maintains the invariance of LLMs when handling natural language queries: the only relevant factor is the input parsed into the predetermined neuro-symbolic programs. For instance, Ada (Wong et al., 2023) introduces symbolic operators to abstract actions, ensuring that lower-level planning models are not compromised by irrelevant information in the queries and observations. Without the symbolic action library, a single LLM frequently fails at grounding objects or obeying environmental conditions, resulting in a significant accuracy gap of approximately 59.0-89.0%.

H. More Related Work on Reflection, Debate, and Memory


Corex (Sun et al., 2023) finds that orchestrating multiple agents to work together yields better complex reasoning results,
exceeding strong single-agent baselines (Wang et al., 2022b) by an average of 1.1-10.6%. Retroformer (Yao et al., 2023)
equips the single-agent Reflexion (Shinn et al., 2023) algorithm with an additional LLM to generate verbal reinforcement cues
and assist its self-improvement, enhancing accuracy by 1.0-20.9%. ChatEval (Chan et al., 2023) introduces a multi-agent debate framework to mimic human annotators collaborating in robust answer evaluations. Its multi-agent approach achieves greater alignment with human preferences than single-agent evaluations, enhancing accuracy by 6.2% for GPT-3.5 and 2.5% for GPT-4, and increasing the average Spearman and Kendall-Tau correlations (Zhong et al., 2022) with human judgments by 16.3% and 10.0%, respectively, for GPT-4. MetaAgents (Li et al., 2023e) effectively coordinates agents within task-oriented social contexts to achieve consistent behavior patterns, and the implementation of agent reflection in this system leads to a 21.0% improvement in success rates.
LM vs LM (Cohen et al., 2023), FORD (Xiong et al., 2023), Multi-Agent Debate (Liang et al., 2023; Du et al., 2023),
DyLAN (Liu et al., 2023c), and Khan et al. (2024) highlight the profound impact of multi-agent collaboration through
cross-examination and debates. These studies demonstrate substantial improvements in performance when multiple agents
are orchestrated to work in collaboration. Specifically, LM vs LM (Cohen et al., 2023) illustrates how its multi-agent
framework improves F1 scores by an average of 15.7% compared to the single-agent baseline (Yoshikawa & Okazaki, 2023).
FORD (Xiong et al., 2023) reports an accuracy increase of up to 4.9% compared to a single LLM. Liang et al. (2023) reports significant accuracy improvements, 17.0% on translation tasks and 16.0% on reasoning tasks, from employing a multi-agent strategy, effectively bridging the performance gap between GPT-3.5 and GPT-4. Du et al. (2023) finds that multi-agent debates not only enhance reasoning performance by 8.0-14.8%, but, more importantly, increase factual accuracy by 7.2-15.9%. DyLAN (Liu et al., 2023c) observes accuracy improvements of 3.5-4.1% over single-agent execution. Multi-agent debating in Khan et al. (2024) also leads to more truthful answers, boosting single-agent
baselines by 28.0%. Multi-Agent Collaboration (Talebirad & Nadiri, 2023), ChatDev (Qian et al., 2023), AgentCF (Zhang
et al., 2023), AutoGen (Wu et al., 2023), Social Learning (Mohtashami et al., 2023), S3 (Gao et al., 2023a), Ke et al. (2024),
and Chern et al. (2024) continue to push the frontier of a multi-agent system’s applications beyond daily conversation to a
versatile set of real-world task completions.
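
The skeleton shared by many of these debate frameworks can be sketched as follows; this is a simplification for illustration, not the exact procedure of any one paper. Each agent first answers independently, then revises after reading its peers’ answers for a fixed number of rounds, and a simple majority decides.

def multi_agent_debate(agents, question: str, rounds: int = 2) -> str:
    # `agents` is an assumed list of LLM wrappers mapping a prompt to text.
    answers = [agent(question) for agent in agents]  # independent first pass
    for _ in range(rounds):
        revised = []
        for i, agent in enumerate(agents):
            peers = [a for j, a in enumerate(answers) if j != i]
            prompt = (f"Question: {question}\n"
                      f"Other agents answered: {peers}\n"
                      "Reconsider and give your final answer.")
            revised.append(agent(prompt))
        answers = revised
    # Majority vote over the final-round answers (ties broken arbitrarily).
    return max(set(answers), key=answers.count)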
