Cross-platform wrapper scripts for `llama-server` that simplify listing and running `.gguf` models, with all configuration managed through environment variables and/or an INI configuration file.
Available Scripts:
- `llms.ps1` - PowerShell script for Windows, Linux, and macOS (PowerShell Core)
- `llms` - Bash script for UNIX systems (Linux, macOS, WSL)

Each script serves as an intelligent wrapper around `llama-server` that:
- Simplifies model management: Automatically discovers and runs GGUF models from configured directories
- Provides flexible configuration: Supports both environment variables and INI file for all parameters
- Enables partial matching: Find models using partial names without typing full filenames
- Handles multi-modal models: Automatically detects and loads companion `.mmproj-*.gguf` files
- Offers dry-run capability: Preview commands before execution
- Maintains consistent defaults: Fallback values ensure the script works out-of-the-box
- Cross-platform support: PowerShell script for Windows, bash script for UNIX systems
- Smart model discovery: Searches multiple directories for `.gguf` files
- Partial name matching: Find models without typing complete filenames
- Multi-modal support: Automatically loads companion `.mmproj-*.gguf` files
- Flexible configuration: Environment variables override INI file settings with platform-appropriate paths
- Fallback defaults: Works out-of-the-box with sensible defaults
- Dry-run capability: Preview commands before execution
- Color-coded output: Enhanced terminal display with syntax highlighting
- Error handling: Clear error messages for missing configurations and models
- Configuration priority: Script directory configs override user configs

Both scripts share identical command syntax and functionality.
- `list`: List all available `.gguf` models in configured directories
- `<partial_model_name>`: Partial name to match against model files (case-insensitive)
- `<context_size>`: Required - Context window size for the model
- `[llama-server args...]`: Optional additional arguments passed to `llama-server`
- `[--dry-run]`: Preview the command without executing it

List the available models:

    llms list

Run a model:

    llms <partial_model_name> <context_size>

Example:

    llms Mistral-Small-3.1-24B-Instruct-2503-UD-Q4 64000

The script automatically detects companion `.mmproj-*.gguf` files and adds appropriate parameters.
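As an illustration of what that detection amounts to, the sketch below (not the scripts' actual code) globs for a companion file next to the resolved model and, if one is found, assumes it can be handed to `llama-server` via its `--mmproj` flag; the model path is a placeholder.

```bash
# Hypothetical sketch of the companion-file lookup; paths are examples only.
model="/path/to/models/SomeModel-Q4_K_M.gguf"   # the resolved model file
model_dir="$(dirname "$model")"

# Look for a companion .mmproj-*.gguf file next to the model.
mmproj="$(find "$model_dir" -maxdepth 1 -name '*.mmproj-*.gguf' | head -n 1)"

args=(-m "$model" -c 64000)
if [[ -n "$mmproj" ]]; then
  args+=(--mmproj "$mmproj")   # assumes the llama-server build accepts --mmproj
fi

llama-server "${args[@]}"
```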
Additional arguments after the context size are passed straight through to `llama-server`:

    llms <partial_model_name> <context_size> [llama-server args...]

Example with a custom chat template:

    llms GLM-4-32B 24000 --chat-template-file ~/llm/chat-template-chatml.jinja

Preview the command that would be executed without running it:

    llms Devstral-Small-2505-UD-Q4 100000 --no-webui --dry-run

Configuration values are resolved in the following priority order:

- Environment variables (always override everything)
- `llms.ini` in the script directory (highest file priority)
- `llms.ini` in the user config directory:
  - Windows: `%USERPROFILE%\AppData\Local\llms.ini`
  - Linux/macOS: `~/.config/llms.ini` (or `$XDG_CONFIG_HOME/llms.ini`)
- Fallback values in code
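
A minimal sketch of that resolution order, using the port setting and the UNIX user-config path as an example. The real scripts implement this internally in PowerShell/Bash; the helper below is purely illustrative.

```bash
#!/usr/bin/env bash
# Illustrative only: resolve one setting the way the priority list above describes.
script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
user_ini="${XDG_CONFIG_HOME:-$HOME/.config}/llms.ini"

# Read "Key = value" from an INI file; prints nothing if the file or key is missing.
ini_get() {
  [[ -f "$1" ]] && sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" | head -n 1
}

port="${LLMS_PORT:-}"                                                # 1. environment variable
[[ -z "$port" ]] && port="$(ini_get "$script_dir/llms.ini" Port)"    # 2. script directory
[[ -z "$port" ]] && port="$(ini_get "$user_ini" Port)"               # 3. user config directory
port="${port:-8080}"                                                 # 4. built-in fallback

echo "llama-server will listen on port $port"
```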
All configuration options can be set via environment variables:

| Environment Variable | Description | Fallback Value |
|---|---|---|
| `LLMS_MODELS_DIRS` | Model search directories (comma-separated) | Required - no fallback |
| `LLMS_HOST` | IP address for llama-server to bind to | 127.0.0.1 |
| `LLMS_PORT` | Port to listen on | 8080 |
| `LLMS_API_KEY` | API key for authentication | secret |
| `LLMS_CACHE_TYPE_K` | Cache type for K | q8_0 |
| `LLMS_CACHE_TYPE_V` | Cache type for V | q8_0 |
| `LLMS_UBATCH_SIZE` | Micro-batch size | 1024 |
| `LLMS_N_GPU_LAYERS` | Number of GPU layers | 999 |
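
For a persistent setup on UNIX systems, the same variables can simply be exported from a shell profile; a sketch with example paths (only `LLMS_MODELS_DIRS` is strictly required, the rest override the documented fallbacks):

```bash
# Example ~/.bashrc or ~/.zshrc snippet; adjust paths to your setup.
export LLMS_MODELS_DIRS="$HOME/models,/opt/ai-models"   # required: comma-separated search dirs
export LLMS_HOST="127.0.0.1"                            # llama-server bind address
export LLMS_PORT="8080"                                 # listen port
export LLMS_API_KEY="secret"                            # API key
export LLMS_N_GPU_LAYERS="999"                          # layers offloaded to GPU
```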
The scripts look for `llms.ini` in these locations (in priority order):

Priority 1: Script Directory

- Same directory as the script (`./llms.ini`)

Priority 2: User Config Directory

- Windows: `%USERPROFILE%\AppData\Local\llms.ini`
- Linux/macOS: `~/.config/llms.ini` (or `$XDG_CONFIG_HOME/llms.ini`)

Example `llms.ini`:

    # Model directories (comma-separated paths)
    ModelsDirs = C:\path\to\models1,D:\path\to\models2

    # Performance settings
    CacheTypeK = q8_0
    CacheTypeV = q8_0
    UbatchSize = 1024
    NGpuLayers = 99

    # Network settings (can also be set via ENV)
    Host = 0.0.0.0
    Port = 8080
    ApiKey = secret

| INI Directive | Environment Variable | Description | Default |
|---|---|---|---|
| `ModelsDirs` | `LLMS_MODELS_DIRS` | Comma-separated list of model directories | Required |
| `CacheTypeK` | `LLMS_CACHE_TYPE_K` | Cache type for K tensors | q8_0 |
| `CacheTypeV` | `LLMS_CACHE_TYPE_V` | Cache type for V tensors | q8_0 |
| `UbatchSize` | `LLMS_UBATCH_SIZE` | Micro-batch size for processing | 1024 |
| `NGpuLayers` | `LLMS_N_GPU_LAYERS` | Number of layers to offload to GPU | 999 |
| `Host` | `LLMS_HOST` | Server bind address | 127.0.0.1 |
| `Port` | `LLMS_PORT` | Server port | 8080 |
| `ApiKey` | `LLMS_API_KEY` | API authentication key | secret |
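
Since every directive except `ModelsDirs` has a fallback, a minimal configuration can be a one-line `llms.ini` in the user config directory. A sketch for Linux/macOS, with example paths:

```bash
# Create a minimal llms.ini containing only the required directive (example paths).
mkdir -p "${XDG_CONFIG_HOME:-$HOME/.config}"
cat > "${XDG_CONFIG_HOME:-$HOME/.config}/llms.ini" <<'EOF'
ModelsDirs = /home/user/models,/opt/ai-models
EOF
```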
Environment variables only:

PowerShell:

    $env:LLMS_MODELS_DIRS = "C:\models,D:\ai-models"
    $env:LLMS_PORT = 8081
    $env:LLMS_N_GPU_LAYERS = 50
    llms mistral 32000

Bash:

    LLMS_MODELS_DIRS="/home/user/models,/opt/ai-models" \
    LLMS_PORT=8080 \
    LLMS_N_GPU_LAYERS=50 \
    llms mistral 32000

Configuration file with environment override:

`llms.ini`:

    ModelsDirs = C:\models,D:\ai-models
    # or Linux/macOS paths
    #ModelsDirs = /home/user/models,/opt/ai-models
    UbatchSize = 2048
    NGpuLayers = 40

Override via environment:

PowerShell:

    $env:LLMS_PORT = 8081
    llms mistral 32000

Bash:

    LLMS_PORT=8081 \
    llms mistral 32000

Complete `llms.ini` example:

    # Model locations
    ModelsDirs = C:\Users\username\models,D:\shared-models,E:\large-models

    # Performance tuning
    CacheTypeK = q8_0
    CacheTypeV = q8_0
    UbatchSize = 1024
    NGpuLayers = 99

    # Server settings
    Host = 0.0.0.0
    Port = 8080
    ApiKey = my-secret-key

Both scripts use identical syntax:

    llms list
    llms <partial_model_name> <context_size> [llama-server args...] [--dry-run]

This project uses Pester v5 for testing the PowerShell script. To run the tests:
- Ensure Pester is installed:

      Install-Module -Name Pester -Force -Scope CurrentUser

- Run the tests:

      Invoke-Pester ./llms.tests.ps1
The bash script has been manually tested on macOS and should work on Linux systems. Automated testing for the bash script is planned for future releases.
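Until then, a quick manual check can lean on the documented `list` and `--dry-run` behavior; a hypothetical smoke test (model name and paths are illustrative, and it assumes the script is run from the repository directory):

```bash
#!/usr/bin/env bash
# Hypothetical smoke test for the Bash script; nothing is actually served,
# because --dry-run only prints the llama-server command.
set -euo pipefail

export LLMS_MODELS_DIRS="$HOME/models"   # example path

./llms list                              # should enumerate the available .gguf models
./llms mistral 32000 --dry-run           # should print the command without running it

echo "smoke test passed"
```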


