Agents 2
Dinesh Raghu
Senior Researcher, IBM Research
LLMs and Tools
tool_calls=[ FunctionCall(name='payment_status',
arguments={"transaction_id": "T1001"}) ]
{"status": "Paid"}
tool_calls=[ FunctionCall(name='payment_status', arguments={"transaction_id": "?"}) ]
Hi there! I can help with that. Can you please provide your transaction ID?
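The exchange above can be sketched as a small dispatch step. This is a minimal illustration, not any particular framework's API: the `FunctionCall` dataclass, the `TOOLS` registry, the `execute` helper, and the in-memory payment records are all assumptions introduced here. It shows the two paths from the slides: a complete call returns the tool result, while a call with a placeholder argument is not executed, so the assistant can ask the user for the missing value.

```python
import json
from dataclasses import dataclass


@dataclass
class FunctionCall:
    """Hypothetical container mirroring the tool_calls entries on the slide."""
    name: str
    arguments: dict


def payment_status(transaction_id: str) -> dict:
    """Toy backend lookup; the records are made up for illustration."""
    records = {"T1001": "Paid"}
    return {"status": records.get(transaction_id, "Unknown")}


# Registry mapping tool names to Python callables (an assumption of this sketch).
TOOLS = {"payment_status": payment_status}


def execute(call: FunctionCall) -> str:
    """Run one model-emitted tool call and return a JSON string for the model."""
    missing = [k for k, v in call.arguments.items() if v in (None, "", "?")]
    if missing:
        # Do not call the tool with a placeholder value; report what is missing
        # so the assistant can ask the user for it.
        return json.dumps({"error": "missing_arguments", "fields": missing})
    return json.dumps(TOOLS[call.name](**call.arguments))


print(execute(FunctionCall("payment_status", {"transaction_id": "T1001"})))
print(execute(FunctionCall("payment_status", {"transaction_id": "?"})))
```

In a real system the JSON string would be appended to the conversation as a tool message, and the model would produce the final reply or the clarifying question.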
1. Gorilla
2. ToolAlpaca
3. ToolLLM
4. APIGen
*Gorilla: Large Language Model Connected with Massive APIs, Patil et al., Nov 2023
*ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases, Tang et al., Sep 2023
2. ToolAlpaca Dataset
• High diversity (400 real-world-inspired APIs)
• Multi-turn dialogs with question generation and response generation
*ToolLLM: Facilitating Large Language Models to Master 16000+ Real World APIs, Qin et al., Oct 2023
3. ToolBench (ToolLLM)
• High diversity (16K real-world APIs from RapidAPI Hub)
• Multi-turn dialogs
• Has a single-tool setup and a multi-tool setup
*APIGen: Automated PIpeline for Generating Verifiable and Diverse Function-Calling Datasets, Liu et al., Jun 2024
4. xlam-function-calling-60k (APIGen)
• High diversity (3K high-quality real-world APIs from RapidAPI Hub)
• High-quality multi-turn dialogs, thanks to the 3-stage filtering
• Has a single-tool setup and a multi-tool setup, along with their parallel variants
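The 3-stage filtering mentioned for APIGen can be illustrated with a toy pipeline. The APIGen paper describes the stages as a format check, an execution check, and a semantic check; the sketch below follows that outline under heavy simplification. The `REGISTRY`, the `get_weather` function, and the keyword-based `semantic_check` are assumptions made here for illustration (the paper's semantic stage relies on an LLM judge, which is stubbed out), so this is not the authors' implementation.

```python
import json


def get_weather(city: str) -> dict:
    """Hypothetical API used only for this sketch."""
    return {"city": city, "temp_c": 21}


# Hypothetical registry of executable APIs.
REGISTRY = {"get_weather": get_weather}


def format_check(raw: str):
    """Stage 1: output must be a JSON list of {"name", "arguments"} calls to known APIs."""
    try:
        calls = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(calls, list) or not calls:
        return None
    for call in calls:
        if not isinstance(call, dict):
            return None
        if call.get("name") not in REGISTRY or not isinstance(call.get("arguments"), dict):
            return None
    return calls


def execution_check(calls):
    """Stage 2: every call must execute without raising."""
    results = []
    for call in calls:
        try:
            results.append(REGISTRY[call["name"]](**call["arguments"]))
        except Exception:
            return None
    return results


def semantic_check(query: str, calls, results) -> bool:
    """Stage 3 stand-in: require every argument value to appear in the query.
    A real pipeline would ask an LLM whether calls and results answer the query."""
    return all(
        str(value).lower() in query.lower()
        for call in calls
        for value in call["arguments"].values()
    )


def keep(query: str, raw: str) -> bool:
    """Return True only if a generated sample passes all three stages."""
    calls = format_check(raw)
    if calls is None:
        return False
    results = execution_check(calls)
    if results is None:
        return False
    return semantic_check(query, calls, results)


query = "What is the weather in Paris?"
print(keep(query, '[{"name": "get_weather", "arguments": {"city": "Paris"}}]'))
print(keep(query, '[{"name": "get_weather", "arguments": {"town": "Paris"}}]'))
```

The second sample is well-formed JSON but fails at the execution stage, because `get_weather` does not accept a `town` argument.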
2. Generalizability: Even though the datasets are generated using diverse sets of APIs (e.g., from RapidAPI),
Basu et al. (2024) have shown that models trained on these datasets have difficulty generalizing to
out-of-domain datasets
3. Granular tasks: Function calling encompasses multiple granular sub-tasks such as function-name
detection, slot filling, and detecting the ordered sequence of functions needed to be called. Existing
models trained to perform function calling lack the ability to handle these granular tasks independently
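To make the granular sub-tasks concrete, the sketch below derives separate targets for function-name detection, function sequencing, and slot filling from one fully annotated function-calling example. The `granular_targets` helper, the target names, and the `send_receipt` function are hypothetical and chosen for illustration; they are not taken from any of the datasets discussed here.

```python
def granular_targets(calls: list[dict]) -> dict:
    """Split one ordered list of function calls into per-sub-task targets."""
    return {
        # Which functions are needed at all (order ignored).
        "function_name_detection": sorted({c["name"] for c in calls}),
        # The ordered sequence of functions to call.
        "function_sequencing": [c["name"] for c in calls],
        # Argument names and values for each call, in order.
        "slot_filling": [c["arguments"] for c in calls],
    }


# Hypothetical annotation: check a payment, then send a receipt.
annotation = [
    {"name": "payment_status", "arguments": {"transaction_id": "T1001"}},
    {"name": "send_receipt", "arguments": {"transaction_id": "T1001", "channel": "email"}},
]

targets = granular_targets(annotation)
for task, target in targets.items():
    print(task, "->", target)
```

Each entry can then serve as the output of its own training instance built from the same user query, so that a model is supervised on the sub-tasks individually as well as on the full function call.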
*Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks, Abdelaziz et al., Jun 2024
1. Openness: synthesize training data using models with permissive open licenses
2. Generalizability: repurpose existing datasets with permissive licenses and add newly synthesized
datasets to improve generalization