Arabic Stable LM: Adapting Stable LM 2 1.6B to Arabic

Alyafeai, Zaid; Pieler, Michael; Teufel, Hannah; Tow, Jonathan; Bellagente, Marco; Phung, Duy; Pinnaparaju, Nikhil; Adithyan, Reshinth; Rocha, Paulo; Zhuravinskyi, Maksym; Riquelme, Carlos

arXiv:2412.04277 (cs)

[Submitted on 5 Dec 2024]

Abstract: Large Language Models (LLMs) have shown impressive results in multiple domains of natural language processing (NLP) but are mainly focused on the English language. Recently, more LLMs have incorporated a larger proportion of multilingual text to represent low-resource languages. In Arabic NLP, several Arabic-centric LLMs have shown remarkable results on multiple benchmarks in the past two years. However, most Arabic LLMs have more than 7 billion parameters, which increases their hardware requirements and inference latency compared to smaller LLMs. This paper introduces Arabic Stable LM 1.6B, in base and chat versions, as a small but powerful Arabic-centric LLM. Our Arabic Stable LM 1.6B chat model achieves impressive results on several benchmarks, beating multiple models with up to 8x the parameters. In addition, we show the benefit of mixing in synthetic instruction tuning data by augmenting our fine-tuning data with a large synthetic dialogue dataset.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2412.04277 [cs.CL]
	(or arXiv:2412.04277v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2412.04277

Submission history

From: Zaid Alyafeai
[v1] Thu, 5 Dec 2024 15:59:29 UTC (4,943 KB)
