How to Fix Slow Docker Builds for ML Containers

by Faveren Caleb

Slow Docker builds for ML projects are almost always caused by the same three mistakes: no .dockerignore file, dependencies reinstalling on every code change, and model weights downloading at container startup instead of build time. Fixing these three things cuts most build times from minutes to seconds.

Add a .dockerignore File First

If your project has no .dockerignore, Docker sends everything in the project directory to the build daemon before it runs a single instruction. Virtual environments, Git history, cached model files, and local datasets all get included. On an ML project, this can easily be several gigabytes of transfer before the build even starts.

Create a .dockerignore file in the same directory as your Dockerfile:

.git/
__pycache__/
*.pyc
.venv/
venv/
env/
*.log
data/
models/
.pytest_cache/
.coverage
.idea/
.vscode/

The models/ and data/ lines are important. If local copies of model weights or datasets sit in the project directory, they are sent to the daemon on every build without these entries. Two lines in .dockerignore eliminate that transfer.
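
To see how much the excluded directories would otherwise add to the build context, here is a quick sketch that sums their on-disk sizes (the directory names are assumptions; adjust them to your project):

```python
import os

def dir_size_mb(path):
    """Total size of all files under path, in megabytes."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if os.path.isfile(fp):
                total += os.path.getsize(fp)
    return total / (1024 * 1024)

# Directories commonly excluded by the .dockerignore above
for d in (".git", ".venv", "data", "models"):
    if os.path.isdir(d):
        print(f"{d}: {dir_size_mb(d):.1f} MB kept out of the build context")
```

Run it from the directory containing your Dockerfile; anything sizeable it reports is transfer you save on every single build.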

Fix the Layer Order in Your Dockerfile

Docker caches each layer and reuses it on the next build if nothing above it has changed. The mistake that kills build speed is copying application code before installing dependencies.

This is the wrong order:

COPY . /app
RUN pip install -r requirements.txt

Every time any file in the project changes, even a single line in a Python file or a README edit, Docker invalidates the cache at the COPY step and reinstalls every dependency from scratch. For a PyTorch or TensorFlow project, that reinstall alone takes several minutes.

The correct order puts dependencies first:

FROM python:3.10-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

Now Docker only reinstalls dependencies when requirements.txt itself changes. Code changes hit only the final COPY layer and rebuild in seconds.
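
The cache decision hinges on file contents, not timestamps: Docker checksums the files referenced by each COPY instruction and reuses the layer when the checksum matches. A sketch of the same idea in Python (illustrative only, not Docker's actual implementation):

```python
import hashlib

def copy_layer_key(paths):
    """Illustrative cache key for a COPY layer: a checksum over the
    contents of the copied files. If none of them changed, the key is
    identical and the layer (plus everything cached before it) is reused."""
    h = hashlib.sha256()
    for path in sorted(paths):
        with open(path, "rb") as f:
            h.update(f.read())
    return h.hexdigest()

# COPY requirements.txt .  -> the key depends only on requirements.txt,
# so editing application code leaves this layer's key, and the
# pip install layer behind it, untouched.
```

This is why `COPY . .` must come last: its key changes on every code edit, while the key for `COPY requirements.txt .` stays stable.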

Apply the same principle to the full dependency stack. Things that change rarely go at the top; things that change constantly go at the bottom:

FROM python:3.10-slim

# System dependencies change rarely
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

# Python dependencies change occasionally
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Model weights change periodically
COPY ./models /models

# Application code changes constantly
COPY ./app /app

CMD ["python", "serve.py"]

Pre-Download Model Weights During the Build

The most expensive startup delay in ML containers is downloading model weights at runtime. A 500MB model adds two to five minutes to every container start depending on network speed. A multi-gigabyte model makes the container effectively unusable for auto-scaling.

The fix is to download weights during the build and bake them into the image. The container starts with the weights already present.

For Hugging Face models:

FROM python:3.10-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Download the weights in their own layer, after the dependency layers
RUN pip install --no-cache-dir huggingface-hub && \
    huggingface-cli download sentence-transformers/all-MiniLM-L6-v2 \
    --local-dir /models/all-MiniLM-L6-v2

COPY ./app /app

CMD ["python", "serve.py"]

The huggingface-cli tool ships with the huggingface-hub Python package. See the Hugging Face Hub CLI docs for authentication options if the model requires a token.

The image grows by the size of the model weights, but startup time drops from minutes to under thirty seconds. For production deployments where new instances must start quickly, the tradeoff is almost always worth it.
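
With the weights baked in, serve.py can verify them at startup instead of downloading them. A minimal sketch, assuming the /models path from the Dockerfile above (MODEL_DIR and the check itself are illustrative, not part of any framework):

```python
import os

# Path assumed from the Dockerfile above; override via environment if needed
MODEL_DIR = os.environ.get("MODEL_DIR", "/models/all-MiniLM-L6-v2")

def ensure_weights(model_dir):
    """Fail fast at startup if the weights were not baked into the image.

    With build-time downloads this check passes instantly; without them,
    the alternative is a multi-minute network download on every start.
    """
    if not os.path.isdir(model_dir) or not os.listdir(model_dir):
        raise RuntimeError(
            f"model weights missing from {model_dir}; "
            "rebuild the image with the download step included"
        )
    return model_dir

# In serve.py, call ensure_weights(MODEL_DIR) before loading the model.
```

Failing fast here turns a silent multi-minute stall into an immediate, diagnosable error.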

If model weights update frequently without a full rebuild, mount them as a Docker volume instead:

services:
  model-server:
    volumes:
      - ./models:/models

See How to Read and Write a Docker Compose File if the compose syntax above is unfamiliar. This keeps the image lean but reintroduces network dependency at startup. For stable model versions, baking them in is the better approach.
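
If you go the volume route, the serving code can prefer whatever is already on disk and fall back to a runtime download only when the mount is empty. A sketch assuming the huggingface-hub package is installed (the repo id and local path are placeholders):

```python
import os

def resolve_model_dir(local_dir, repo_id):
    """Return a directory containing the model weights.

    Prefers weights already on disk (baked into the image or volume-mounted),
    and falls back to a network download only when the directory is empty,
    which reintroduces the startup delay this article is trying to avoid.
    """
    if os.path.isdir(local_dir) and os.listdir(local_dir):
        return local_dir  # weights present, no network needed
    # Fallback: fetch the repo at runtime (assumes huggingface-hub is installed)
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

The import lives inside the fallback branch on purpose: when the volume is populated, the container never touches the network or the hub library at all.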

The Takeaway

Slow Docker builds on ML projects come down to three fixable problems: a missing .dockerignore sending unnecessary files to the build daemon, wrong layer order forcing full dependency reinstalls on every code change, and model weights downloading at runtime instead of build time. Each fix is independent; apply whichever one matches the bottleneck you are hitting.
