whustan

Python · 165 stars · 16 forks · Updated Apr 22, 2024

Benchmarks, environments, and toolkits for general computer agents

Python · 152 stars · 10 forks · Updated Aug 31, 2024

m&ms: A Benchmark to Evaluate Tool-Use for multi-step multi-modal tasks

Python · 30 stars · 3 forks · Updated Apr 7, 2024
Python · 32 stars · 5 forks · Updated Jul 2, 2024

[ACL 2024] ToolSword: Unveiling Safety Issues of Large Language Models in Tool Learning Across Three Stages

5 stars · Updated May 16, 2024

MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning

Python · 93 stars · 3 forks · Updated May 9, 2024

A language model (LM)-based emulation framework for identifying the risks of LM agents with tool use

Python · 106 stars · 11 forks · Updated Mar 22, 2024

ToolQA, a new dataset to evaluate the capabilities of LLMs in answering challenging questions with external tools. It offers two levels (easy/hard) across eight real-life scenarios.

Jupyter Notebook · 234 stars · 7 forks · Updated Aug 19, 2023

RoTBench: A Multi-Level Benchmark for Evaluating the Robustness of Large Language Models in Tool Learning

Python · 11 stars · Updated Apr 11, 2024

The source code and dataset mentioned in the paper Seal-Tools: Self-Instruct Tool Learning Dataset for Agent Tuning and Detailed Benchmark.

Python · 31 stars · 2 forks · Updated Aug 7, 2024

Companion code to https://arxiv.org/abs/2402.15491

Python · 9 stars · Updated Apr 26, 2024

ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios

Python · 63 stars · 5 forks · Updated Apr 11, 2024

An LLM-based autonomous agent controlling real-world applications via RESTful APIs

Python · 1,297 stars · 93 forks · Updated Jun 7, 2024

ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases

Python · 281 stars · 34 forks · Updated Jul 4, 2024

ToolBench, an evaluation suite for LLM tool manipulation capabilities.

Python · 134 stars · 11 forks · Updated Feb 28, 2024

Tool Learning for Big Models, Open-Source Solutions of ChatGPT-Plugins

Python · 2,876 stars · 269 forks · Updated Dec 5, 2023

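Chinese LLM capability leaderboard: it currently covers 115 large models, including commercial models such as chatgpt, gpt4o, Baidu ERNIE Bot (文心一言), Alibaba Tongyi Qianwen (通义千问), iFlytek Spark (讯飞星火), SenseTime senseChat and minimax, as well as open-source models such as Baichuan (百川), qwen2, glm4, yi, InternLM2 (书生internLM2) and llama3, evaluated across multiple capability dimensions. It provides not only a capability-score leaderboard but also the raw outputs of every model!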

2,330 stars · 112 forks · Updated Sep 10, 2024

⚡️🧪 Fast LLM Tool Calling Experimentation, big and smol

Jupyter Notebook · 133 stars · 12 forks · Updated Mar 9, 2024

Evaluation code for various unsupervised automated metrics for Natural Language Generation.

Python · 1,338 stars · 224 forks · Updated Aug 20, 2024

Google Research

Jupyter Notebook · 33,794 stars · 7,825 forks · Updated Sep 10, 2024

SWE-agent takes a GitHub issue and tries to fix it automatically, using GPT-4 or your LM of choice. It solves 12.47% of bugs in the SWE-bench evaluation set and takes just 1 minute to run.

Python · 13,270 stars · 1,295 forks · Updated Sep 10, 2024

[ICLR 2024] MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use

Python · 60 stars · 8 forks · Updated Mar 21, 2024

A Python module to repair invalid JSON, commonly used to parse the output of LLMs

Python · 705 stars · 40 forks · Updated Sep 10, 2024

An opinionated list of awesome Python frameworks, libraries, software and resources.

Python · 218,284 stars · 24,778 forks · Updated Aug 11, 2024

Epubit Books (异步图书): LLM Application Development: Build an AI Agent Hands-On (大模型应用开发 动手做AI Agent)

Jupyter Notebook · 149 stars · 33 forks · Updated Jul 4, 2024

LangGPT: Empowering everyone to become a prompt expert! 🚀 Structured Prompt, Language of GPT: structured prompts for LLMs

Jupyter Notebook · 5,321 stars · 461 forks · Updated Sep 4, 2024

🔥 Curated Chinese prompts 🔥 and a ChatGPT usage guide that makes ChatGPT more fun and more useful! 🚀

2,971 stars · 266 forks · Updated Jun 11, 2024

Automated Design of Agentic Systems

Python · 808 stars · 121 forks · Updated Aug 20, 2024

[ACL 2024] On the Multi-turn Instruction Following for Conversational Web Agents

Python · 10 stars · Updated Jul 12, 2024