This article explains how to download Ollama and deploy large language models locally (such as DeepSeek-R1, Llama 3.2, and others). Ollama is an open-source tool for serving large language models, letting you run open-source AI models directly on your own computer. We'll walk through installation and setup step by step so you can interact with AI models seamlessly.
Table of Contents
Step 1: Download and Install Ollama
Step 2: Install AI Models
Step 3: Interact with AI Models
Step 4: Optional GUI/Web Interface Support
Step 5: Debug the Local AI API
Step 1: Download and Install Ollama
1. Visit Ollama's official GitHub repository: https://github.com/ollama/ollama
2. Download the version for your operating system (this tutorial uses macOS as an example; Windows follows similar steps).
3. Complete the installation.
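If you're on Linux, the official install script from ollama.com handles download and installation in one command:

curl -fsSL https://ollama.com/install.sh | sh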
After installation, open the Terminal (on macOS, press F4 and search for "Terminal"). Enter ollama - if a help prompt listing the available commands appears, the installation was successful.
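You can also confirm the installed version as a quick sanity check (a minimal example; the version string on your machine will differ):

ollama --version
# prints something like: ollama version is 0.5.7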
Step 2: Install AI Models
After installing Ollama, download the desired AI model using these commands:
ollama run llama3.2
Available models (replace llama3.2 with your preferred model):
Model | Parameters | Size | Download
--- | --- | --- | ---
DeepSeek-R1 | 7B | 4.7GB | ollama run deepseek-r1
DeepSeek-R1 | 671B | 404GB | ollama run deepseek-r1:671b
Llama 3.3 | 70B | 43GB | ollama run llama3.3
Llama 3.2 | 3B | 2.0GB | ollama run llama3.2
Llama 3.2 | 1B | 1.3GB | ollama run llama3.2:1b
Llama 3.2 Vision | 11B | 7.9GB | ollama run llama3.2-vision
Llama 3.2 Vision | 90B | 55GB | ollama run llama3.2-vision:90b
Llama 3.1 | 8B | 4.7GB | ollama run llama3.1
Llama 3.1 | 405B | 231GB | ollama run llama3.1:405b
Phi 4 | 14B | 9.1GB | ollama run phi4
Phi 4 Mini | 3.8B | 2.5GB | ollama run phi4-mini
Gemma 2 | 2B | 1.6GB | ollama run gemma2:2b
Gemma 2 | 9B | 5.5GB | ollama run gemma2
Gemma 2 | 27B | 16GB | ollama run gemma2:27b
Mistral | 7B | 4.1GB | ollama run mistral
Moondream 2 | 1.4B | 829MB | ollama run moondream
Neural Chat | 7B | 4.1GB | ollama run neural-chat
Starling | 7B | 4.1GB | ollama run starling-lm
Code Llama | 7B | 3.8GB | ollama run codellama
Llama 2 Uncensored | 7B | 3.8GB | ollama run llama2-uncensored
LLaVA | 7B | 4.5GB | ollama run llava
Granite-3.2 | 8B | 4.9GB | ollama run granite3.2
A progress indicator will appear during the download (duration depends on your internet speed). When the "Send a message" prompt appears, you're ready to interact with the model.
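A few standard Ollama CLI commands help you keep track of what's installed:

ollama list          # list models downloaded to this machine
ollama ps            # list models currently loaded in memory
ollama rm llama3.2   # remove a model you no longer need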
Step 3: Interact with AI Models
Example interaction (asking "Who are you?"):
Use Control + D to end the current session. To restart later, simply rerun ollama run llama3.2.
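For quick, one-off questions you can also pass the prompt directly as an argument instead of opening an interactive session:

ollama run llama3.2 "Explain what a large language model is in one sentence."

The model prints its answer and then returns you to the shell.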
Step 4: Optional GUI/Web Interface Support
Using a terminal for daily interactions can be inconvenient. For a more user-friendly experience, Ollama’s GitHub repository lists multiple community-driven GUI and web-based tools (e.g., Ollama WebUI, LM Studio). You can explore these options independently, as each project provides its own setup instructions. Here’s a brief overview:
- GUI Tools
  - Ollama Desktop: native app for macOS/Windows (supports model management and chat).
  - LM Studio: cross-platform interface with model library integration.
- Web Interfaces
  - Ollama WebUI: browser-based chat interface (runs locally).
  - OpenWebUI: customizable web dashboard for model interaction.

For details, visit the Ollama GitHub README.
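As one concrete example, Open WebUI can be launched with Docker (a minimal sketch based on the Open WebUI README; it assumes Docker is installed and Ollama is already running on its default port 11434):

docker run -d -p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data \
--name open-webui \
ghcr.io/open-webui/open-webui:main

Once the container is up, open http://localhost:3000 in your browser and connect it to your local Ollama instance.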
Step 5: Debug the Local AI API
Ollama exposes a local API by default, served at http://localhost:11434. Refer to the Ollama API Docs for details.
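Before wiring up a client, you can confirm the server is reachable by listing your local models through the API (the /api/tags endpoint is part of Ollama's documented API):

curl http://localhost:11434/api/tags

This returns a JSON list describing every model you've pulled.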
Below, we'll use Apidog to debug the local API served by Ollama. If you haven't installed Apidog yet, you can download and install it - it's an excellent tool for API debugging, API documentation, API mocking, and automated API testing.
Create a New Request
Copy this cURL command:
curl --location --request POST 'http://localhost:11434/api/generate' \
--header 'Content-Type: application/json' \
--data-raw '{
"model": "llama3.2",
"prompt": "Why is the sky blue?",
"stream": false
}'
In Apidog:
1. Create a new HTTP project.
2. Paste the cURL command into the request builder.
3. Save the configuration.
Send the Request
Navigate to the "Run" tab and click "Send". The AI response will appear.
For streaming output, set "stream": true.
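Beyond /api/generate, Ollama also documents a conversational endpoint, /api/chat, which accepts a messages array so you can carry multi-turn context:

curl --location --request POST 'http://localhost:11434/api/chat' \
--header 'Content-Type: application/json' \
--data-raw '{
    "model": "llama3.2",
    "messages": [
        { "role": "user", "content": "Why is the sky blue?" }
    ],
    "stream": false
}'

You can debug this request in Apidog the same way as the /api/generate example above.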
Conclusion
This guide covered:
- Ollama installation
- Model deployment
- Command-line interaction
- API testing with Apidog
You now have a complete workflow for local AI model experimentation and application development.