Developing a Scalable Benchmark for Assessing Large Language Models in Knowledge Graph Engineering

Meyer, Lars-Peter; Frey, Johannes; Junghanns, Kurt; Brei, Felix; Bulert, Kirill; Gründer-Fahrer, Sabine; Martin, Michael

arXiv:2308.16622 (cs)

[Submitted on 31 Aug 2023]

Abstract: As the field of Large Language Models (LLMs) evolves at an accelerated pace, the critical need to assess and monitor their performance emerges. We introduce a benchmarking framework focused on knowledge graph engineering (KGE) accompanied by three challenges addressing syntax and error correction, facts extraction and dataset generation. We show that while being a useful tool, LLMs are yet unfit to assist in knowledge graph generation with zero-shot prompting. Consequently, our LLM-KG-Bench framework provides automatic evaluation and storage of LLM responses as well as statistical data and visualization tools to support tracking of prompt engineering and model performance.

Comments:	To be published in SEMANTICS 2023 poster track proceedings. SEMANTICS 2023 EU: 19th International Conference on Semantic Systems, September 20-22, 2023, Leipzig, Germany
Subjects:	Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Databases (cs.DB)
Cite as:	arXiv:2308.16622 [cs.AI]
	(or arXiv:2308.16622v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2308.16622

Submission history

From: Lars-Peter Meyer
[v1] Thu, 31 Aug 2023 10:31:19 UTC (429 KB)
