Slow Docker builds for ML projects are almost always caused by the same three mistakes: no .dockerignore file, dependencies reinstalling on every code change, and model weights downloading at container startup instead of build time. Fixing these three things cuts most build times from minutes to seconds.
Add a .dockerignore File First
If your project has no .dockerignore, Docker sends everything in the project directory to the build daemon before it runs a single instruction. Virtual environments, Git history, cached model files, and local datasets all get included. On an ML project, this can easily be several gigabytes of transfer before the build even starts.
Create a .dockerignore file in the same directory as your Dockerfile:
.git/
__pycache__/
*.pyc
.venv/
venv/
env/
*.log
data/
models/
.pytest_cache/
.coverage
.idea/
.vscode/
The models/ and data/ lines matter most. Without them, any local copies of model weights or datasets sitting in the project directory are sent to the daemon on every build; a single line in .dockerignore eliminates that. (If you later bake weights into the image with a COPY ./models instruction, remove the models/ line first, since ignored paths are excluded from the build context and cannot be copied.)
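To get a feel for how much the ignore file saves, this hypothetical Python sketch walks the project and sums the bytes the patterns above would exclude. It is a simplification: real .dockerignore matching follows Go's filepath.Match rules plus ** support, not Python's fnmatch, and the function names here are made up for illustration.

```python
import fnmatch
import os

# Simplified patterns from the .dockerignore above. Directory entries are
# matched against each path component -- an approximation of Docker's rules.
IGNORE = [".git", "__pycache__", "*.pyc", ".venv", "venv", "env",
          "*.log", "data", "models", ".pytest_cache", ".coverage",
          ".idea", ".vscode"]

def is_ignored(relpath):
    """True if any path component matches an ignore pattern."""
    parts = relpath.split(os.sep)
    return any(fnmatch.fnmatch(p, pat) for p in parts for pat in IGNORE)

def excluded_bytes(root="."):
    """Total size of files that would be kept out of the build context."""
    total = 0
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if is_ignored(os.path.relpath(full, root)):
                total += os.path.getsize(full)
    return total
```

Running excluded_bytes on an ML project with local weights and datasets typically reports gigabytes, all of which would otherwise be transferred to the daemon on every single build.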
Fix the Layer Order in Your Dockerfile
Docker caches each layer and reuses it on the next build if nothing above it has changed. The mistake that kills build speed is copying application code before installing dependencies.
This is the wrong order:
COPY . /app
RUN pip install -r requirements.txt
Every time any file in the project changes, whether it is a single line in a Python file or a README edit, Docker invalidates the cache at the COPY step and reinstalls all dependencies from scratch. For a PyTorch or TensorFlow project, that reinstall alone takes several minutes.
The correct order puts dependencies first:
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
Now Docker only reinstalls dependencies when requirements.txt itself changes. Code changes hit only the final COPY layer and rebuild in seconds.
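The caching rule behind this can be sketched in a few lines of Python. This is a hypothetical model, not Docker's actual implementation: each layer's cache key folds in the parent layer's key, the instruction text, and a checksum of any files being copied in, so a change high in the Dockerfile invalidates every layer below it.

```python
import hashlib

def layer_key(parent, instruction, file_bytes=b""):
    # A layer's cache key mixes the parent layer's key, the instruction
    # text, and a checksum of any files the instruction copies in.
    h = hashlib.sha256()
    h.update(parent.encode() + instruction.encode() + file_bytes)
    return h.hexdigest()

def build(order, code, reqs):
    """Return the cache key of each layer for the given COPY ordering."""
    keys = [layer_key("", "FROM python:3.10-slim")]
    if order == "wrong":
        keys.append(layer_key(keys[-1], "COPY . /app", code + reqs))
        keys.append(layer_key(keys[-1], "RUN pip install -r requirements.txt"))
    else:
        keys.append(layer_key(keys[-1], "COPY requirements.txt .", reqs))
        keys.append(layer_key(keys[-1], "RUN pip install -r requirements.txt"))
        keys.append(layer_key(keys[-1], "COPY . .", code + reqs))
    return keys

# Edit application code only; requirements.txt is untouched.
before = build("right", b"code v1", b"reqs v1")
after = build("right", b"code v2", b"reqs v1")
# Dependencies-first: the pip-install layer key is unchanged, cache reused.
assert before[2] == after[2]

w_before = build("wrong", b"code v1", b"reqs v1")
w_after = build("wrong", b"code v2", b"reqs v1")
# COPY-first: the install layer's parent changed, so pip runs again.
assert w_before[2] != w_after[2]
```

The same code edit produces an identical install-layer key in the dependencies-first ordering and a different one in the COPY-first ordering, which is exactly why the reordered Dockerfile rebuilds in seconds.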
Apply the same principle to the full dependency stack. Things that change rarely go at the top; things that change constantly go at the bottom:
FROM python:3.10-slim
WORKDIR /app

# System dependencies change rarely
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential \
    && rm -rf /var/lib/apt/lists/*

# Python dependencies change occasionally
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Model weights change periodically
COPY ./models /models

# Application code changes constantly
COPY ./app /app

CMD ["python", "serve.py"]
Pre-Download Model Weights During the Build
The most expensive startup delay in ML containers is downloading model weights at runtime. A 500 MB model can add minutes to every container start, depending on network speed, and a multi-gigabyte model makes the container effectively unusable for auto-scaling.
The fix is to download weights during the build and bake them into the image. The container starts with the weights already present.
For Hugging Face models:
FROM python:3.10-slim
WORKDIR /app

RUN pip install --no-cache-dir huggingface-hub

RUN mkdir -p /models && \
    huggingface-cli download sentence-transformers/all-MiniLM-L6-v2 \
        --local-dir /models/all-MiniLM-L6-v2

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY ./app /app
CMD ["python", "serve.py"]
The huggingface-cli tool ships with the huggingface-hub package; see the Hub CLI docs for authentication options if the model requires a token.
The image grows by the size of the model weights, but startup time drops from minutes to seconds. For production deployments where new instances need to start quickly, that tradeoff is almost always worth it.
If model weights update frequently without a full rebuild, mount them as a Docker volume instead:
services:
  model-server:
    volumes:
      - ./models:/models
See How to Read and Write a Docker Compose File if the compose syntax above is unfamiliar. This keeps the image lean but reintroduces a network dependency at startup. For stable model versions, baking them in is the better approach.
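If you go the volume route, the serving code has to tolerate an empty mount. A minimal sketch, assuming the serving process is responsible for the fallback; ensure_weights and the download callback are hypothetical names, not part of any library:

```python
import os

def ensure_weights(model_dir, download_fn):
    """Use weights from the mounted volume if present; otherwise fall
    back to downloading them -- the startup network cost that volumes
    reintroduce compared with baking weights into the image."""
    if os.path.isdir(model_dir) and os.listdir(model_dir):
        return "cached"
    os.makedirs(model_dir, exist_ok=True)
    download_fn(model_dir)  # e.g. a wrapper around a Hub download call
    return "downloaded"
```

Called at the top of serve.py before the model loads, this makes the container work the same way whether the volume is populated ahead of time or left empty; only the startup cost differs.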
The Takeaway
Slow Docker builds on ML projects come down to three fixable problems: a missing .dockerignore sending unnecessary files to the build daemon, wrong layer order forcing full dependency reinstalls on every code change, and model weights downloading at runtime instead of build time. Each fix is independent; apply whichever one matches the bottleneck you are hitting.
