I’ve seen images that accidentally install tensorflow twice, too. It wouldn’t be so bad if large files were shared between layers, but they aren’t. It’s bad enough that I’m building an alternative registry and snapshotter with file-level dedupe to deal with it.
Sounds like it would be useful. Many common dev workflows start falling apart once they have to deal with more than tiny code files. In the Python world, uv has helped massively; with pip we were seeing 30+ min build times on fairly simple images with torch.
The other issue is that it's not like they can go back and run this deduplication after the fact. Image layers are stored as a single gzipped tar of the contents of the layer. You can't just pull a single file out of that. If you go and reorganize it as multiple gzip streams, you'll change the digest of the layer. A new registry could do that reorganization on import (and return the new digest), or provide tooling to build the layers in the right format to begin with.
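To make the digest point concrete, here is a minimal Python sketch. It assumes the layer digest is the SHA-256 of the compressed blob, and the file names and chunk size are made up for illustration. The same tar is compressed once as a single gzip stream and once as several concatenated gzip members; both decompress to identical bytes, but the compressed blobs hash differently.

```python
import gzip
import hashlib
import io
import tarfile


def build_tar(files):
    """Return an uncompressed tar archive (bytes) for a name -> bytes mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in sorted(files.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = 0  # fixed timestamp so the tar bytes are reproducible
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def gzip_single(raw):
    """Compress the whole tar as one gzip stream."""
    return gzip.compress(raw, mtime=0)


def gzip_chunked(raw, chunk_size):
    """Compress the tar as several concatenated gzip members."""
    return b"".join(
        gzip.compress(raw[i:i + chunk_size], mtime=0)
        for i in range(0, len(raw), chunk_size)
    )


def digest(blob):
    """SHA-256 digest in the usual 'sha256:<hex>' form."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()


# Hypothetical layer contents, for illustration only.
layer_tar = build_tar({
    "usr/lib/libfoo.so": b"\x7fELF" + b"\0" * 4096,
    "etc/foo.conf": b"key=value\n",
})
single = gzip_single(layer_tar)
chunked = gzip_chunked(layer_tar, chunk_size=1024)

# Same uncompressed content, different compressed digests.
print(gzip.decompress(single) == gzip.decompress(chunked))
print(digest(single) == digest(chunked))
```

The uncompressed tar is unchanged, so only the digest of the compressed blob moves, which is why a registry that reorganizes on import would have to hand back a new digest.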
I'm well aware of `COPY --link`; it doesn't solve the problem. I'm a heavy heavy user of it, combined with throwaway build stages. `COPY --link` won't help my `apt install` commands.
The use case here isn't `FROM python:3.10`, it's `FROM ubuntu; RUN apt install -y vim wget curl software-properties-common python3.10`/`RUN rosdep install`/`RUN --mount=type=cache,target=/root/.cache/uv --mount=type=bind,source=uv.lock,target=uv.lock --mount=type=bind,source=pyproject.toml,target=pyproject.toml uv sync --locked --no-install-project`. All of those dependencies get merged onto a single layer that isn't shared with anything else. You'd better hope something like tensorflow isn't one of those dependencies.
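A rough Python sketch of why layer-level sharing misses this case, under stated assumptions: the two "layers" are toy uncompressed tars, the file names are hypothetical, and hashing each file's content stands in for whatever a file-level dedupe scheme would actually do. Both layers contain the same large file, yet their whole-layer digests differ, so nothing is shared unless you look at individual files.

```python
import hashlib
import io
import tarfile


def make_layer(files):
    """Build a toy uncompressed tar layer from a name -> bytes mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in sorted(files.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def file_digests(layer):
    """Map the SHA-256 of each regular file's content to its size in bytes."""
    digests = {}
    with tarfile.open(fileobj=io.BytesIO(layer), mode="r") as tar:
        for member in tar:
            if member.isfile():
                data = tar.extractfile(member).read()
                digests[hashlib.sha256(data).hexdigest()] = member.size
    return digests


def shared_bytes(layer_a, layer_b):
    """Bytes of file content in layer_a that also appear in layer_b."""
    a, b = file_digests(layer_a), file_digests(layer_b)
    return sum(size for file_hash, size in a.items() if file_hash in b)


# Hypothetical contents: one big shared library plus a small app file each.
big = b"\x01" * 1_000_000
image_a = make_layer({"opt/venv/lib/libtensorflow.so": big, "app/a.py": b"print('a')\n"})
image_b = make_layer({"opt/venv/lib/libtensorflow.so": big, "app/b.py": b"print('b')\n"})

layer_digest_a = hashlib.sha256(image_a).hexdigest()
layer_digest_b = hashlib.sha256(image_b).hexdigest()
duplicated = shared_bytes(image_a, image_b)

print(layer_digest_a == layer_digest_b)  # layers differ, so no layer-level sharing
print(duplicated)                        # bytes a file-level scheme could store once
```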
Meta: I think your example code would benefit from being a code block; in HN this is done by prefixing with 2 spaces.
e.g.

  FROM ubuntu
  RUN apt install -y vim wget curl software-properties-common python3.10
  RUN rosdep install
  RUN --mount=type=cache,target=/root/.cache/uv --mount=type=bind,source=uv.lock,target=uv.lock --mount=type=bind,source=pyproject.toml,target=pyproject.toml uv sync --locked --no-install-project