LLMs On The Fly: Text-To-JSON For Custom API Calling
Abstract
In the rapidly evolving landscape of Natural Language Processing (NLP), there is a growing demand for agile and intuitive tools, driven by increasing model capabilities, primarily in the field of Large Language Models (LLMs). In recent months, we have seen great progress in the Natural Language Generation (NLG) landscape, with a proliferation of generative AI applications leveraging LLMs for a vast number of tasks. The power of LLMs resides in their ability to generalize almost any NLP task to the problem of next-token prediction, thus simplifying traditional NLP pipelines consisting of intensive data labeling and domain-specific fine-tuning for a single task. Moreover, LLMs are enhanced (1) with external knowledge bases, which improve their reasoning and domain understanding, and (2) with external tools, which improve their ability to perform actions.

We present a novel approach that harnesses the power of LLMs to transform natural language inputs into structured data representations, facilitating seamless interaction with custom APIs for real-time data visualization. We explore the integration of the Flythings® Technologies API for Internet of Things (IoT) device solutions in the Industry 4.0 domain. This system demonstration presents a chat-based virtual assistant that allows users to query the status of monitored machines and devices. The core component of the application is an LLM that serves as a bridge between user queries and machine-readable JSON objects, which adhere to a predefined schema following the Flythings standard. Our LLM output facilitates the interaction with the Flythings API, leading to the generation of visualizations that illustrate IoT device status in real time.
Keywords
NLP, LLM, Fine-tuning, agents, assistants, visualization, API tools, IoT, Monitoring, Industry 4.0
tuning and deployment of the optimized and production-ready LLM. In Section 4, we illustrate the practical examples carried out and the real-world utility of our tool, presenting its limitations in Section 5. We conclude with Section 6 by summarizing our findings and outlining the future directions of our research.

2. Related Work

In recent months, we have seen a myriad of LLM research papers addressing the topic of context-aware LLMs through in-context learning. This capability enables them to generalize to almost any NLP task, commonly unseen during the pre-training and fine-tuning stages [3, 5, 6]. This direction has led the research community to explore the integration of LLMs with external tools such as document stores [7] or APIs [8], enhancing their generalization capabilities even more. LLM agents [9] are a new concept that arose from providing LLMs with (1) extensive up-to-date data pools beyond their fixed knowledge representations and (2) functions or tools to perform actions and automate processes [10, 11, 12, 13]. Such a two-fold strategy reduces the need for regular re-training. For example, Gorilla [8] leverages a multitude of APIs and documentation through document retrievers, highlighting the effectiveness of this framework.

Moreover, the reasoning capabilities of LLMs are influenced by the prompting strategies employed [5, 14, 15], since how natural language instructions are written significantly affects performance [16]. More complex prompting strategies like ReAct [9] became popular, combining reasoning and planning techniques by adding reasoning traces and task-specific actions to the prompt. These strategies benefit the integration of the LLM with external sources. In this new landscape, new benchmark frameworks were proposed [17, 18], which aim at designing reliable and robust evaluation methodologies.

The introduction of Generative Information Extraction (GIE) has further boosted the NLP field [19]. Recent studies [20] propose LLMs to generate structured information from natural language. Some closely related tasks, like text-to-SQL [21, 22], involve the transformation of natural language into SQL for querying external tools (i.e., databases). This generative approach proves effective even in scenarios involving complex schemas with millions of entities [23]. The ability of LLMs to manage these large schemas without dropping performance (effectively generating the target query following a specific format) is particularly significant for our research. We propose a generation step aiming at transforming natural language queries (sent to our virtual assistant) into structured JSON objects with the relevant parameters for the integration of the FlyThings® API.
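To make this concrete, the following minimal sketch shows the kind of mapping we target; the query text and the JSON field names are illustrative placeholders and do not reproduce the actual Flythings schema introduced in Section 3.1.

```python
import json

# Hypothetical user query sent to the virtual assistant (illustrative only).
query = "Show me the temperature of machine M-103 over the last 24 hours as a line chart."

# Hypothetical structured target the LLM should emit; field names are
# placeholders, not the real Flythings JSON schema.
target = {
    "device": "M-103",
    "property": "temperature",
    "time_range": {"last": 24, "unit": "hours"},
    "visualization": "line_chart",
}

print(json.dumps(target, indent=2))
```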
3. Proposed Method

In this section, we present our methodology, covering all the steps involved in our pipeline. We describe our data preparation stage, including the seed data creation and data augmentation process. We also formulate our supervised fine-tuning (SFT) method for our information extraction task, as well as the inference optimizations taken into account for our LLM deployment. The overall process is depicted in Figure 1.

3.1. Seed Data

In the absence of pre-existing user data for our task, which depends on the FlyThings® technology, we started creating a dataset. We collected feedback from the Flythings team, who provided us with initial examples of potential user inputs and expected outputs. In this way, we obtained a seed dataset consisting of 6 outputs, each of them with 3 different ways of expressing the input, agreed upon with the Flythings team. Given these pairs, we agreed on a specification, defining a JSON schema as the golden rule. Our pipeline starts with (1) a template-based method for generating new JSON outputs, as described in Figure 1, randomly selecting one of the available options for each of the JSON fields, following the schema depicted in Figure 2. In this way, we obtained a pool of examples for the next data augmentation step.
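As a minimal sketch of step (1), assuming a flat set of fields with a closed list of admissible values, the template-based generation could look as follows; the field names and value options are hypothetical stand-ins for the actual Flythings schema depicted in Figure 2.

```python
import json
import random

# Hypothetical value options per JSON field; the real options follow the
# Flythings schema depicted in Figure 2 (not reproduced here).
FIELD_OPTIONS = {
    "device": ["M-101", "M-102", "M-103", "pump-7"],
    "property": ["temperature", "vibration", "pressure", "energy_consumption"],
    "time_range": [
        {"last": 1, "unit": "hours"},
        {"last": 24, "unit": "hours"},
        {"last": 7, "unit": "days"},
    ],
    "visualization": ["line_chart", "bar_chart", "gauge"],
}


def sample_json_target(rng: random.Random) -> dict:
    """Randomly select one option per field to build a JSON output target."""
    return {field: rng.choice(options) for field, options in FIELD_OPTIONS.items()}


if __name__ == "__main__":
    rng = random.Random(42)
    # Pool of JSON output targets fed into the data augmentation step (2).
    pool = [sample_json_target(rng) for _ in range(100)]
    print(json.dumps(pool[0], indent=2))
```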
3.2. Data Augmentation

Our seed dataset was scarce and limited in scope, lacking input query diversity. Therefore, we followed a data augmentation approach. We created a custom pipeline for generating alternative input queries, given the reference (input, output) pairs from the seed data. For this task, we leveraged the Mixture of Experts (MoE) LLM Mixtral-8x7B-Instruct-v0.1 from Mistral AI [24].

We aimed at generating variant inputs for each JSON output from the pool depicted in Figure 1, so that we could increase the available (input, output) pairs. We used the original seed as reference within the instruction illustrated in Figure 3, generating 3 variations of the input for each target through few-shot in-context learning [6]. This process corresponds to the (2) data augmentation step depicted in Figure 1. We increased our dataset up to 355 curated samples for the following SFT stage.
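A rough sketch of this step is shown below, assuming the model is queried through the Hugging Face transformers text-generation pipeline; the prompt wording and the seed pair are illustrative only (the actual instruction is the one shown in Figure 3), and in practice Mixtral's [INST] ... [/INST] chat format and a multi-GPU setup would be applied.

```python
import json
from transformers import pipeline

# Seed (input, output) pair used as a few-shot reference; both are hypothetical
# placeholders, not entries from our actual seed dataset.
SEED_INPUT = "Show me the temperature of machine M-103 over the last 24 hours as a line chart."
SEED_OUTPUT = {
    "device": "M-103",
    "property": "temperature",
    "time_range": {"last": 24, "unit": "hours"},
    "visualization": "line_chart",
}

# Illustrative instruction; the actual prompt is the one illustrated in Figure 3.
PROMPT_TEMPLATE = (
    "You rewrite user queries for an IoT monitoring assistant.\n"
    "Reference query: {seed_input}\n"
    "Reference JSON: {seed_output}\n"
    "Target JSON: {target}\n"
    "Write 3 alternative user queries a person could use to ask for the target JSON:\n"
)


def augment(target: dict, generator) -> str:
    """Generate alternative input queries for one JSON output target."""
    prompt = PROMPT_TEMPLATE.format(
        seed_input=SEED_INPUT,
        seed_output=json.dumps(SEED_OUTPUT),
        target=json.dumps(target),
    )
    result = generator(prompt, max_new_tokens=256, do_sample=True, temperature=0.7)
    # Strip the echoed prompt and keep only the generated variations.
    return result[0]["generated_text"][len(prompt):]


if __name__ == "__main__":
    # Loading Mixtral-8x7B-Instruct-v0.1 requires substantial GPU memory;
    # loading details (device mapping, dtype) are omitted in this sketch.
    generator = pipeline("text-generation", model="mistralai/Mixtral-8x7B-Instruct-v0.1")
    new_target = {"device": "pump-7", "property": "pressure",
                  "time_range": {"last": 7, "unit": "days"}, "visualization": "bar_chart"}
    print(augment(new_target, generator))
```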
3.3. Supervised Fine-Tuning

Before diving into the details of the fine-tuning process, it is important to understand why supervised fine-tuning was necessary in the first place. While zero-shot or few-shot (i.e., in-context) learning [25] can be effective for general NLP tasks, it entails challenges when the task
[Figure 1 diagram: JSON schema → (1) JSON output pool → instruction-based (2) input–output task pool → (3) supervised fine-tuning → (4) AWQ quantization → (5) deployment]
Figure 1: Our pipeline begins with the design of the JSON schema with the formatting rules, used as the specification for (1) a template-based method for the generation of random JSON output targets for our task. These outputs are fed into (2) a data augmentation phase utilizing an LLM to generate multiple inputs corresponding to each previously generated JSON output, so that we add diversity to how users convey queries. Subsequently, we perform (3) the supervised fine-tuning for our information extraction task, (4) the quantization stage for model inference optimization, and (5) the deployment phase, culminating in the integration of the FlyThings® endpoint for the creation of a virtual assistant enhanced with visualizations.
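Steps (4) and (5) of the caption could be sketched roughly as below, assuming the fine-tuned model is quantized with AWQ (as labeled in the diagram) and served with vLLM [29]; whether vLLM is the actual serving engine is an assumption here, and the checkpoint name, prompt format, and Flythings endpoint URL are placeholders rather than our deployment configuration.

```python
import json

import requests
from vllm import LLM, SamplingParams

# Placeholders: neither the fine-tuned AWQ checkpoint nor the real Flythings
# endpoint is reproduced here.
MODEL_PATH = "our-org/flythings-json-extractor-awq"               # hypothetical checkpoint
FLYTHINGS_ENDPOINT = "https://example-flythings-endpoint/api/query"  # hypothetical URL

# Serve the AWQ-quantized model with vLLM; greedy decoding keeps the JSON stable.
llm = LLM(model=MODEL_PATH, quantization="awq")
params = SamplingParams(temperature=0.0, max_tokens=256)


def handle_user_query(query: str) -> dict:
    """Turn a natural language query into JSON and forward it to the platform."""
    prompt = f"Extract the request as JSON.\nQuery: {query}\nJSON:"  # illustrative prompt
    generation = llm.generate([prompt], params)[0].outputs[0].text
    payload = json.loads(generation)  # assumes the model emits valid JSON
    response = requests.post(FLYTHINGS_ENDPOINT, json=payload, timeout=30)
    response.raise_for_status()
    return response.json()  # data later rendered as a visualization in the assistant
```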
developers team, for their continuous support and feedback to enhance our LLM generation capabilities and integration within their systems.

References

[1] S. Bubeck, V. Chandrasekaran, R. Eldan, J. Gehrke, E. Horvitz, E. Kamar, P. Lee, Y. T. Lee, Y. Li, S. Lundberg, et al., Sparks of artificial general intelligence: Early experiments with gpt-4, arXiv preprint arXiv:2303.12712 (2023).
[2] A. Kulkarni, A. Shivananda, A. Kulkarni, D. Gudivada, LLMs for Enterprise and LLMOps, Apress, Berkeley, CA, 2023, pp. 117–154. URL: https://doi.org/10.1007/978-1-4842-9994-4_7. doi:10.1007/978-1-4842-9994-4_7.
[3] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever, Language Models are Unsupervised Multitask Learners, 2019. URL: https://api.semanticscholar.org/CorpusID:160025533.
[4] J. Wei, M. Bosma, V. Zhao, K. Guu, A. W. Yu, B. Lester, N. Du, A. M. Dai, Q. V. Le, Finetuned Language Models Are Zero-Shot Learners, ArXiv abs/2109.01652 (2021). URL: https://api.semanticscholar.org/CorpusID:237416585.
[5] T. Kojima, S. S. Gu, M. Reid, Y. Matsuo, Y. Iwasawa, Large Language Models are Zero-Shot Reasoners, ArXiv abs/2205.11916 (2022). URL: https://api.semanticscholar.org/CorpusID:249017743.
[6] D. Dai, Y. Sun, L. Dong, Y. Hao, S. Ma, Z. Sui, F. Wei, Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers (2023). arXiv:2212.10559.
[7] P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W.-t. Yih, T. Rocktäschel, et al., Retrieval-augmented generation for knowledge-intensive nlp tasks, Advances in Neural Information Processing Systems 33 (2020) 9459–9474.
[8] S. G. Patil, T. Zhang, X. Wang, J. E. Gonzalez, Gorilla: Large Language Model Connected with Massive APIs, arXiv preprint arXiv:2305.15334 (2023).
[9] S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. Narasimhan, Y. Cao, React: Synergizing reasoning and acting in language models, arXiv preprint arXiv:2210.03629 (2022).
[10] A. Parisi, Y. Zhao, N. Fiedel, TALM: Tool Augmented Language Models, ArXiv abs/2205.12255 (2022). URL: https://api.semanticscholar.org/CorpusID:249017698.
[11] T. Schick, J. Dwivedi-Yu, R. Dessì, R. Raileanu, M. Lomeli, L. Zettlemoyer, N. Cancedda, T. Scialom, Toolformer: Language models can teach themselves to use tools, arXiv preprint arXiv:2302.04761 (2023).
[12] R. Nakano, J. Hilton, S. Balaji, J. Wu, L. Ouyang, C. Kim, C. Hesse, S. Jain, V. Kosaraju, W. Saunders, X. Jiang, K. Cobbe, T. Eloundou, G. Krueger, K. Button, M. Knight, B. Chess, J. Schulman, WebGPT: Browser-assisted question-answering with human feedback, CoRR abs/2112.09332 (2021). URL: https://arxiv.org/abs/2112.09332. arXiv:2112.09332.
[13] S. Yao, R. Rao, M. Hausknecht, K. Narasimhan, Keep CALM and explore: Language models for action generation in text-based games, in: B. Webber, T. Cohn, Y. He, Y. Liu (Eds.), Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, Online, 2020, pp. 8736–8754. URL: https://aclanthology.org/2020.emnlp-main.704. doi:10.18653/v1/2020.emnlp-main.704.
[14] J. Wei, X. Wang, D. Schuurmans, M. Bosma, E. H. hsin Chi, F. Xia, Q. Le, D. Zhou, Chain of Thought Prompting Elicits Reasoning in Large Language Models, ArXiv abs/2201.11903 (2022). URL: https://api.semanticscholar.org/CorpusID:246411621.
[15] W. Huang, P. Abbeel, D. Pathak, I. Mordatch, Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents, CoRR abs/2201.07207 (2022). URL: https://arxiv.org/abs/2201.07207. arXiv:2201.07207.
[16] Anthropic, Long context prompting for claude 2.1, 2023. URL: https://www.anthropic.com/news/claude-2-1-prompting.
[17] Q. Xu, F. Hong, B. Li, C. Hu, Z. Chen, J. Zhang, On the Tool Manipulation Capability of Open-source Large Language Models, arXiv preprint arXiv:2305.16504 (2023).
[18] Y. Qin, S. Liang, Y. Ye, K. Zhu, L. Yan, Y. Lu, Y. Lin, X. Cong, X. Tang, B. Qian, et al., Toolllm: Facilitating large language models to master 16000+ real-world apis, arXiv preprint arXiv:2307.16789 (2023).
[19] D. Xu, W. Chen, W. Peng, C. Zhang, T. Xu, X. Zhao, X. Wu, Y. Zheng, E. Chen, Large Language Models for Generative Information Extraction: A Survey, ArXiv abs/2312.17617 (2023). URL: https://api.semanticscholar.org/CorpusID:266690657.
[20] A. Dunn, J. Dagdelen, N. Walker, S. Lee, A. S. Rosen, G. Ceder, K. Persson, A. Jain, Structured information extraction from complex scientific text with fine-tuned large language models, 2022. arXiv:2212.05238.
[21] J. Li, B. Hui, G. Qu, J. Yang, B. Li, B. Li, B. Wang, B. Qin, R. Cao, R. Geng, N. Huo, X. Zhou, C. Ma, G. Li, K. C. C. Chang, F. Huang, R. Cheng, Y. Li, Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs, 2023. arXiv:2305.03111.
[22] R. Srivastava, Defog SQLCoder, 2023. URL: https://github.com/defog-ai/sqlcoder.
[23] M. Josifoski, N. De Cao, M. Peyrard, F. Petroni, R. West, GenIE: Generative information extraction, in: M. Carpuat, M.-C. de Marneffe, I. V. Meza Ruiz (Eds.), Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, Seattle, United States, 2022, pp. 4626–4643. URL: https://aclanthology.org/2022.naacl-main.342. doi:10.18653/v1/2022.naacl-main.342.
[24] Mistral AI, Mixtral of experts, 2023. URL: https://mistral.ai/news/mixtral-of-experts/ and https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1.
[25] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, D. Amodei, Language Models are Few-Shot Learners, 2020. arXiv:2005.14165.
[26] L. Huang, W. Yu, W. Ma, W. Zhong, Z. Feng, H. Wang, Q. Chen, W. Peng, X. Feng, B. Qin, T. Liu, A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions, ArXiv abs/2311.05232 (2023). URL: https://api.semanticscholar.org/CorpusID:265067168.
[27] T. Dettmers, A. Pagnoni, A. Holtzman, L. Zettlemoyer, QLoRA: Efficient Finetuning of Quantized LLMs, ArXiv abs/2305.14314 (2023). URL: https://api.semanticscholar.org/CorpusID:258841328.
[28] E. J. Hu, yelong shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, W. Chen, LoRA: Low-Rank Adaptation of Large Language Models, in: International Conference on Learning Representations, 2022. URL: https://openreview.net/forum?id=nZeVKeeFYf9.
[29] W. Kwon, Z. Li, S. Zhuang, Y. Sheng, L. Zheng, C. H. Yu, J. E. Gonzalez, H. Zhang, I. Stoica, Efficient Memory Management for Large Language Model Serving with PagedAttention, in: Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles, 2023.
[30] E. Frantar, S. Ashkboos, T. Hoefler, D. Alistarh, GPTQ: Accurate Post-training Compression for Generative Pretrained Transformers, arXiv preprint arXiv:2210.17323 (2022).
[31] J. Lin, J. Tang, H. Tang, S. Yang, X. Dang, S. Han, AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration, arXiv (2023).
[32] L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. L. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray, J. Schulman, J. Hilton, F. Kelton, L. Miller, M. Simens, A. Askell, P. Welinder, P. Christiano, J. Leike, R. Lowe, Training language models to follow instructions with human feedback, 2022. arXiv:2203.02155.
[33] R. Rafailov, A. Sharma, E. Mitchell, S. Ermon, C. D. Manning, C. Finn, Direct Preference Optimization: Your Language Model is Secretly a Reward Model, 2023. arXiv:2305.18290.

A. Flythings

The FlyThings® platform is an all-in-one tool for IoT device management for many different productive sectors. It is designed for the analysis and forecasting of data records of IoT devices, considering any of the data types available at scale. FlyThings® handles a wide variety of sensors, systems and applications for specific use cases including, but not limited to, smart industries or intelligent energy. FlyThings® helps in the decision-making process, yielding better results for enterprises, with ad hoc offerings including modular Big Data as a Service (BDaaS) with standard APIs for data management and visualization. Check https://itg.es/en/monitoring-iot-platform-flythings/ for more details.