Porting Large Language Models To Mobile Devices For Question Answering
6th International Conference on Advances in Signal Processing and Artificial Intelligence (ASPAI' 2024), 17-19 April 2024, Funchal (Madeira Island), Portugal
Summary: Deploying Large Language Models (LLMs) on mobile devices makes their natural language processing capabilities available directly on the device. An important use case of LLMs is question answering, which can provide accurate and contextually relevant answers to a wide range of user queries. We describe how we ported state-of-the-art LLMs to mobile devices, enabling them to run natively on the device. We employ llama.cpp, a flexible and self-contained C++ framework for LLM inference. We selected a 6-bit quantized version of the 3-billion-parameter Orca-Mini-3B model and present the prompt format this model expects (sketched below). Experimental results show that LLM inference runs at interactive speed on a Galaxy S21 smartphone and that the model delivers high-quality answers to user queries on subjects such as politics, geography, and history.
Keywords: deep learning, large language models, question answering, mobile devices, Termux
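The abstract refers to the prompt format that Orca-Mini-3B expects. The following minimal sketch illustrates that template using the llama-cpp-python bindings rather than the native llama.cpp binary the paper runs under Termux; the model file name, context size, and sampling parameters are illustrative assumptions.

    # Sketch: querying a 6-bit quantized Orca-Mini-3B model via the
    # llama-cpp-python bindings (the paper itself runs the native
    # llama.cpp binary on the device; this variant is for illustration).
    from llama_cpp import Llama

    # Hypothetical local path to a 6-bit quantized GGUF checkpoint.
    llm = Llama(model_path="orca-mini-3b.q6_K.gguf", n_ctx=2048)

    # Prompt template from the orca_mini_3b model card: a system
    # instruction, the user question, and an empty response header.
    prompt = (
        "### System:\n"
        "You are an AI assistant that follows instruction extremely well. "
        "Help as much as you can.\n\n"
        "### User:\n"
        "Which river is the longest in Europe?\n\n"
        "### Response:\n"
    )

    # Generate at most 256 tokens and stop if the model begins a new turn.
    output = llm(prompt, max_tokens=256, stop=["### User:"])
    print(output["choices"][0]["text"].strip())

On the device itself, the same template string can be passed to the llama.cpp executable, e.g. via its -p flag.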
1 https://huggingface.co/docs/hub/models-the-hub
2 https://github.com/ggerganov/llama.cpp
3 https://termux.dev/en/
4 https://f-droid.org/
5 https://github.com/Genymobile/scrcpy
6 https://huggingface.co/pankajmathur/orca_mini_3b