When working on a couple of Dockerfiles/Containerfiles, I ran into situations where I had to clear the docker/podman build cache (for various reasons). One indispensable step was updating or downloading certain packages from apt. That is often a long process, and I wondered whether there is a way to speed it up or optimize it.

The results of the investigation are below:

Caching apt-get downloads

There are two caching options:

  • mount cache
  • apt-cacher-ng

mount cache

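BuildKit mechanism that keeps a directory (here the apt download cache) between builds, even when the layer cache is invalidated or skipped with --no-cache: https://docs.docker.com/build/cache/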

Dockerfile extract:

[...]
RUN rm -f /etc/apt/apt.conf.d/docker-clean # needed in ubuntu to persist cache, otherwise the mount won't work!
RUN --mount=type=cache,mode=0755,target=/var/cache/apt \
    apt-get update \
    && apt-get install -yqq --no-install-recommends \
    git gcc make \
    && rm -rf /var/lib/apt/lists/*
[...]

(https://vsupalov.com/buildkit-cache-mount-dockerfile/)
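A variant of the same idea, adapted from the RUN --mount example in Docker's Dockerfile reference and reusing the package list above. Besides /var/cache/apt, it also caches /var/lib/apt, so the package lists from apt-get update survive between builds too. sharing=locked makes concurrent builds take turns on the cache instead of writing to it at the same time. Treat this as a sketch. The keep-cache file name is arbitrary.

```dockerfile
# The official Debian/Ubuntu images ship docker-clean, which deletes downloaded
# .debs after each install; remove it so the cache mount can accumulate them.
# keep-cache makes the `apt` front end keep downloaded packages as well.
RUN rm -f /etc/apt/apt.conf.d/docker-clean; \
    echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache
# Cache both the downloaded archives and the package lists.
# sharing=locked: a second build using the same cache waits for the first one.
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    apt-get update \
    && apt-get install -yqq --no-install-recommends git gcc make
```

Because /var/lib/apt is a cache mount, its contents are not committed to the image layer. That means the rm -rf /var/lib/apt/lists/* cleanup is not needed in this variant.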

apt-cacher-ng

Acquire::http::Proxy is the apt config option for routing downloads through a proxy. However, apt fails when that proxy is not available, so an "improved" option exists: Acquire::http::Proxy-Auto-Detect, which runs a command that reports which proxy to use. apt-cacher-ng is a companion to this mechanism. apt gets pointed at the apt-cacher-ng instance, but falls back to the original source if the instance is not available.

apt-cacher-ng is a separate app, most probably running in a separate container (or as a quadlet). It acts as a man-in-the-middle between apt and the source servers and keeps a copy of every downloaded .deb. When a file is not in its cache, it is downloaded from the upstream server. When it is requested again, it is served from the local cache, which gives the speedup.

It is a nice addition to the mount cache mechanism: even if the mount cache is not hit, the files are still cached by apt-cacher-ng.
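Here is a concrete Proxy-Auto-Detect setup. apt runs the configured command with the target URI as its only argument and reads one line from stdout: either a proxy URL or the word DIRECT. This is a minimal sketch of a hand-rolled detector. It assumes apt-cacher-ng is reachable as host.docker.internal:3142 (see extra_hosts in the compose file). The script path /usr/local/bin/apt-proxy-detect and the 01proxy-detect file name are made up for illustration. As far as I can tell, auto-apt-proxy, used in the Dockerfile later, is a ready-made detector of this kind.

```dockerfile
# Hypothetical helper: print the cacher URL if its port answers within 1 s,
# otherwise print DIRECT so apt downloads from the original mirrors.
# apt passes the target URI as $1; this sketch ignores it.
RUN printf '%s\n' \
      '#!/bin/bash' \
      'if timeout 1 bash -c "exec 3<>/dev/tcp/host.docker.internal/3142" 2>/dev/null; then' \
      '  echo "http://host.docker.internal:3142"' \
      'else' \
      '  echo DIRECT' \
      'fi' \
      > /usr/local/bin/apt-proxy-detect \
 && chmod +x /usr/local/bin/apt-proxy-detect \
 && echo 'Acquire::http::Proxy-Auto-Detect "/usr/local/bin/apt-proxy-detect";' \
      > /etc/apt/apt.conf.d/01proxy-detect
```

When the cacher is down, the script prints DIRECT and apt goes straight to the mirrors. A static Acquire::http::Proxy line lacks exactly this fallback.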

Step-by-step setup:

  1. Create a named background service:
docker run --name apt-cacher-ng --init -d --restart=always \
--publish 3142:3142 \
--volume /var/cache/apt-cacher-ng:/var/cache/apt-cacher-ng \
sameersbn/apt-cacher-ng

Check whether the cacher works by opening http://localhost:3142 in a browser; you should see a page. (If needed, replace localhost with the IP of the machine running the container, e.g. with WSL2 or a VM.)

If something does not work, try the host network:

docker run --name apt-cacher-ng --init -d --restart=always \
--network=host \
--volume /var/cache/apt-cacher-ng:/var/cache/apt-cacher-ng \
sameersbn/apt-cacher-ng

Once the cacher is set up, let's create a Containerfile that uses the caching container. We'll create a Dockerfile and a compose.yaml to configure it easily:

example compose.yaml:

services:
  service-1:
    # image: ubuntu
    build:
      # dockerfile: Dockerfile
      context: .
    tty: true
    # not needed, commented out:
    # network_mode: bridge
    # hostname: service-2
    # container_name: service-2
    command: " sleep 100000"
    extra_hosts:
      - "host.docker.internal:host-gateway"

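Note the "extra_hosts" entry: it maps host.docker.internal to the host's gateway IP, so the container can reach services running on the host, such as apt-cacher-ng on port 3142. With plain Docker Engine on Linux it won't work without it. Service-level extra_hosts applies to the running container. For the RUN steps of the build, Compose also accepts extra_hosts under build:.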

read more [1] [2]

example Dockerfile:

FROM docker.io/python:3.11-bookworm

RUN apt-get -q update \
    && apt-get install -qy auto-apt-proxy \
    && apt-get install -qy python3-pip git jq curl unzip gcc

Note the important part: apt-get install auto-apt-proxy is a separate command from installing the other packages. This matters because apt first downloads all packages of one install command and only then installs them. If auto-apt-proxy were part of the same command, it would not yet be installed while the other packages are being downloaded. Running it as a separate command installs it first. The subsequent command then goes through apt-cacher-ng, because the detection script is already present in the system.
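One possible refinement, sketched but not tested against every base image: put auto-apt-proxy in its own RUN step, so it gets its own cached layer. Then print apt's proxy-related settings before the heavier install runs. The exact config key the auto-apt-proxy package registers is not asserted here, which is why the grep pattern is deliberately loose.

```dockerfile
FROM docker.io/python:3.11-bookworm

# Layer 1: only auto-apt-proxy; small and rarely changes, so it stays cached.
RUN apt-get -q update \
    && apt-get install -qy auto-apt-proxy

# Debug aid: show any proxy-related apt settings (|| true: don't fail if none).
RUN apt-config dump | grep -i proxy || true

# Layer 2: the real packages; downloads now go through whatever proxy
# auto-apt-proxy detects (apt-cacher-ng), or directly if it finds none.
RUN apt-get -q update \
    && apt-get install -qy python3-pip git jq curl unzip gcc
```

BuildKit collapses RUN output by default, so run the build with --progress=plain to see what the grep step prints.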

auto-apt-proxy source code

Refs:
[1] https://stackoverflow.com/questions/70725881/what-is-the-equivalent-of-add-host-host-docker-internalhost-gateway-in-a-comp
[2] https://github.com/microsoft/docker/blob/master/docs/examples/apt-cacher-ng.md
[3] https://blog.packagecloud.io/using-apt-cacher-ng-with-ssl-tls/
[4] https://razinj.dev/build-and-run-apt-cacher-ng-proxy-in-docker/
[5] https://medium.com/@TimvanBaarsen/how-to-connect-to-the-docker-host-from-inside-a-docker-container-112b4c71bc66

Caching python (pip) and/or poetry

There are problems with caching pip. It is related to issues like this

Basically, the problem is not yet resolved.

The best way I found to cache python installs is a cache mount.

mount cache – pip

Dockerfile extract:

RUN --mount=type=cache,mode=0755,target=/var/cache/pip \
    pip install --cache-dir=/var/cache/pip \
    requests [..other packages..]

https://stackoverflow.com/questions/58018300/using-a-pip-cache-directory-in-docker-builds

https://pmac.io/2019/02/multi-stage-dockerfile-and-python-virtualenv/
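Here is an alternative sketch that avoids --cache-dir. When pip runs as root, its default cache directory is /root/.cache/pip, so mounting the cache there is enough. The requirements.txt file is assumed to exist in the build context. It is bind-mounted only for this step instead of being copied into a layer. Both mount types are standard BuildKit RUN --mount options.

```dockerfile
FROM docker.io/python:3.11-bookworm

# pip's default cache for root is /root/.cache/pip; keep it between builds.
# requirements.txt (assumed to be in the build context) is bind-mounted
# read-only for this step instead of being COPY'd into the image.
RUN --mount=type=cache,target=/root/.cache/pip \
    --mount=type=bind,source=requirements.txt,target=/tmp/requirements.txt \
    pip install -r /tmp/requirements.txt
```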

mount cache – poetry

  1. Configure poetry:
     RUN poetry config cache-dir /var/cache/poetry-cache
     a. You can set an environment variable instead: https://python-poetry.org/docs/configuration/#cache-dir
  2. Install, prefixing the command with a (BuildKit) cache mount:
     RUN --mount=type=cache,mode=0755,target=/var/cache/poetry-cache \
         poetry install
  3. The cache should still be used after a rebuild with --no-cache.
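The steps above can be combined with the environment-variable variant, as a sketch. POETRY_CACHE_DIR is the env equivalent of poetry config cache-dir. One thing to watch: per Poetry's configuration docs, virtualenvs.path defaults to {cache-dir}/virtualenvs. With the cache dir on a cache mount, the virtualenv Poetry creates would also land in the mount and be missing from the final image. This sketch therefore sets POETRY_VIRTUALENVS_IN_PROJECT=true to create .venv inside the project instead. It also assumes Poetry is installed with pip, that /app is the project path, and that both pyproject.toml and poetry.lock exist.

```dockerfile
FROM docker.io/python:3.11-bookworm

# Assumption: Poetry installed from PyPI; pin a version in real builds.
RUN pip install --no-cache-dir poetry

# Env equivalent of `poetry config cache-dir /var/cache/poetry-cache`.
ENV POETRY_CACHE_DIR=/var/cache/poetry-cache
# Create the virtualenv as /app/.venv; by default it would live under the
# cache dir, i.e. inside the cache mount, and not be part of the image.
ENV POETRY_VIRTUALENVS_IN_PROJECT=true

WORKDIR /app
COPY pyproject.toml poetry.lock ./
# Dependencies only (--no-root): the project's own code is not copied yet.
RUN --mount=type=cache,mode=0755,target=/var/cache/poetry-cache \
    poetry install --no-root
```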

poetry setup for docker https://github.com/orgs/python-poetry/discussions/1879#discussioncomment-216865 https://github.com/python-poetry/poetry/issues/525#issuecomment-1227231432

Other stuff, to be cleaned up later

docker multistage builds

https://charlesxu.io/docker_multi_stage/

https://medium.com/@tonistiigi/advanced-multi-stage-build-patterns-6f741b852fae

https://www.gasparevitta.com/posts/advanced-docker-multistage-parallel-build-buildkit/

https://pythonspeed.com/articles/multi-stage-docker-python/

buildkit cache mounts

https://dev.doroshev.com/blog/docker-mount-type-cache/

mode=0755 needed problem:

https://github.com/moby/buildkit/issues/2447

https://stackoverflow.com/questions/61459775/docker-buildkit-mount-type-cache-not-working-why

https://docs.docker.com/build/cache/

https://forums.docker.com/t/how-to-delete-build-cache-buildkit-experimental/70714

https://github.com/moby/buildkit/issues/3155

mount type cache in depth https://github.com/moby/buildkit/issues/1673

docker network ping by hostname between containers

Default bridge network: containers cannot reach each other by hostname. User-defined bridge network: they can, because Docker runs an embedded DNS server for it. https://stackoverflow.com/a/53364345

docker compose for testing the cache

services:
  service-1:
    image: busybox
    # container_name: service-1
    # hostname: service-1
    network_mode: bridge
    command: "ping service-2"
    extra_hosts:
      - "host.docker.internal:host-gateway"
  service-2:
    image: ubuntu
    tty: true
    network_mode: bridge
    # hostname: service-2
    # container_name: service-2
    command: "ping service-1"
    extra_hosts:
      - "host.docker.internal:host-gateway"

networks:
  mynet:
    driver: bridge

https://stackoverflow.com/questions/31324981/how-to-access-host-port-from-docker-container/ https://medium.com/@TimvanBaarsen/how-to-connect-to-the-docker-host-from-inside-a-docker-container-112b4c71bc66 https://www.howtogeek.com/devops/how-to-connect-to-localhost-within-a-docker-container/

https://docs.docker.com/storage/bind-mounts/ (mount type=cache is missing from the examples)

And service-1 can ping the host with ping host.docker.internal, so it can access the apt-cacher-ng instance available at (localhost):3142 as host.docker.internal:3142. However, it needs to be specified first: