Install Ollama with Docker to get a local LLM server that runs entirely on your own hardware, persists downloaded models across updates, and exposes an OpenAI-compatible API on port 11434, with no Python environment required.
Ollama handles everything model-related: downloading weights, managing quantization, and serving inference through a clean REST API. Running it inside Docker means the host system stays untouched, and your models survive container rebuilds because they live in a volume, not the container itself.
What You Need Before Starting
You need Docker and Docker Compose installed, the ability to run Docker commands without sudo (your user is in the docker group), and at least 8 GB of RAM with 10 GB of free disk space.
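Before moving on, it is worth confirming each prerequisite. These are standard checks, and the exact output varies by distribution:
docker --version
docker compose version
groups | grep docker   # output should include "docker"
free -h                # available RAM
df -h ~                # free space on your home filesystem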
Create the Project Directory
mkdir -p ~/homelab/ollama
cd ~/homelab/ollama
Write the Compose File
Create docker-compose.yml:
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ./ollama_data:/root/.ollama
    environment:
      - TZ=America/New_York
Replace America/New_York with your timezone. Run timedatectl if you are unsure what yours is.
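If you want to copy the exact identifier, this filters it out of the timedatectl output on most systemd-based systems:
timedatectl | grep "Time zone"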
The volume mount ./ollama_data:/root/.ollama is the critical line. Ollama stores all downloaded models at /root/.ollama inside the container. Mapping that to a local directory means pulling a new image version later will not touch your models; they stay exactly where they are.
Start the Container
docker compose up -d
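If you want to watch the server start, or debug a container that exits immediately, follow the logs (Ctrl+C stops following without stopping the container):
docker compose logs -f ollama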
Verify It’s Running
Check the container status:
docker compose ps
The ollama service should show as Up. Then confirm the API is responding:
curl http://localhost:11434/api/tags
Expected response:
{"models":[]}
No models yet, but the API is live.
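A second quick check is the version endpoint, which reports the Ollama build the container is running (the number you see depends on the image you pulled):
curl http://localhost:11434/api/version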
Pull Your First Model
docker exec -it ollama ollama pull qwen2.5:3b
Qwen2.5 3B is a practical starting point: about 2 GB to download, it runs on CPU without issues, and it handles general instruction tasks well. The pull output shows layer-by-layer progress and ends with success when complete.
Verify it landed:
docker exec -it ollama ollama list
NAME            ID              SIZE      MODIFIED
qwen2.5:3b      123abc456def    2.1 GB    2 minutes ago
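Because of the volume mount, those weights now live on the host as well. Listing the data directory should show Ollama's model store:
ls ollama_data/models
You should see Ollama's internal layout, typically blobs and manifests directories, though the exact structure can vary between Ollama versions.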
Run a Quick Test
curl -X POST http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5:3b",
    "prompt": "What is Docker?",
    "stream": false
  }'
The response JSON will include a response field with the model’s output. If you get a connection refused error instead, the container is not up yet; run docker compose ps to check its status.
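The same port also serves the OpenAI-compatible routes mentioned at the top, so clients that speak the OpenAI chat format can point at the local server directly. A minimal chat completion request looks like this (recent Ollama images expose the /v1 path):
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5:3b",
    "messages": [{"role": "user", "content": "What is Docker?"}]
  }'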
What Port 11434 Means for Your Network
Port 11434 is Ollama’s default API port. The Compose file maps it as 11434:11434 (host port on the left, container port on the right). Any tool or service that can reach your server on that port can now send inference requests, including local scripts, other containers on the same Docker network, or applications configured to use a local Ollama endpoint.
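To see that cross-container reachability in action, you can run a throwaway container on the Compose network and call the service by its container name. The network name below assumes Compose derived the project name from the ollama directory; run docker network ls if yours differs:
docker run --rm --network ollama_default curlimages/curl \
  http://ollama:11434/api/tags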
The API is not authenticated by default. Do not expose this port directly to the internet. For remote access, route it through a VPN tunnel or a reverse proxy with authentication in front of it.
The Takeaway
Ollama is now running in Docker with a persistent volume protecting your models and a live API on port 11434. You pulled a model, confirmed it is loaded, and ran a generation request against it. The server will restart automatically with Docker and survive image updates without losing any model data.
