Hacker News | a_t48's comments

That's neat, though dragging on the timeline wrecked my browser history

A docker/container registry that deduplicates at the file level instead of the layer level. Faster pushes, cheaper storage costs.

This is neat. I’m about to dive into snapshotters myself; any pitfalls to watch out for?

I’ve seen images that accidentally install tensorflow twice, too. It wouldn’t be so bad if large files were shared between layers but they aren’t. It’s bad enough that I’m building an alternative registry and snapshotter with file level dedupe to deal with it.
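To illustrate the difference between layer-level and file-level dedupe, here is a minimal sketch. It is a toy model, not how any real registry stores blobs: the layers are plain dicts, there is no compression or tar metadata, and the names `layer_level_bytes` and `file_level_bytes` are made up for this example.

```python
import hashlib

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Stand-in for one large dependency that ends up in two unrelated layers.
big = b"\x00" * 1_000_000
layer_a = {"site-packages/big.bin": big, "app/a.py": b"print('a')\n"}
layer_b = {"site-packages/big.bin": big, "app/b.py": b"print('b')\n"}

def layer_level_bytes(layers):
    """Bytes stored when blobs are keyed by the digest of a whole layer."""
    store = {}
    for layer in layers:
        # Simplified serialization: any differing file yields a different blob.
        blob = b"".join(
            name.encode() + b"\0" + data for name, data in sorted(layer.items())
        )
        store[digest(blob)] = len(blob)
    return sum(store.values())

def file_level_bytes(layers):
    """Bytes stored when blobs are keyed by the digest of each file's content."""
    store = {}
    for layer in layers:
        for data in layer.values():
            store[digest(data)] = len(data)
    return sum(store.values())

by_layer = layer_level_bytes([layer_a, layer_b])
by_file = file_level_bytes([layer_a, layer_b])
print(by_layer, by_file)
```

In this toy setup the large file is stored twice under layer-level keys and once under file-level keys, because the two layers differ by a single small file.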

Sounds like it would be useful. Many common dev workflows start falling apart when they have to deal with more than tiny code files. In the Python world, uv has helped massively; with pip we were seeing 30+ min build times on fairly simple images with torch.

uv is one of my inspirations. Take a familiar interface, do the same thing but better/faster.

The other issue is that it's not like they can go back and run this deduplication after the fact. Image layers are stored as a single gzipped tar of the contents of the layer. You can't just pull a single file out of that. If you go and reorganize as multiple gzip streams, you'll change the digest of the layer. A new registry could do that reorganization on import (and return the new digest), or provide tooling to build the layers in the right format to begin with.
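To make the digest point concrete, here is a minimal sketch using only the Python standard library. The file names, sizes, and the 4096-byte chunking are arbitrary assumptions for illustration; a real tool would more likely split on file boundaries. It builds one tar, compresses it once as a single gzip stream and once as several concatenated gzip members, and compares the results.

```python
import gzip
import hashlib
import io
import tarfile

def build_tar(files):
    """Return an uncompressed tar archive (bytes) for a name -> content mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in sorted(files.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = 0  # fixed timestamp so the tar bytes are reproducible
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def sha256(data):
    return "sha256:" + hashlib.sha256(data).hexdigest()

# Hypothetical layer contents.
files = {"usr/lib/libbig.so": b"A" * 100_000, "etc/app.conf": b"key=value\n"}
raw_tar = build_tar(files)

# Conventional layout: one gzip stream over the whole tar.
single = gzip.compress(raw_tar, mtime=0)

# Reorganized layout: the same tar bytes as many independent gzip members.
CHUNK = 4096
multi = b"".join(
    gzip.compress(raw_tar[i:i + CHUNK], mtime=0)
    for i in range(0, len(raw_tar), CHUNK)
)

# Both decompress to the identical tar...
same_contents = gzip.decompress(single) == gzip.decompress(multi) == raw_tar
# ...but the compressed blobs, and so their digests, differ.
same_digest = sha256(single) == sha256(multi)
print(same_contents, same_digest)
```

The layer digest in an image manifest covers the compressed bytes, so even though the unpacked contents are identical, the reorganized blob is a different layer as far as existing manifests are concerned.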

I'm well aware of `COPY --link`, it doesn't solve the problem. I'm a heavy heavy user of it, combined with throwaway build stages. `COPY --link` won't help my `apt install` commands.

The use case here isn't `FROM python:3.10`, it's `FROM ubuntu; RUN apt install -y vim wget curl software-properties-common python3.10`/`RUN rosdep install`/`RUN --mount=type=cache,target=/root/.cache/uv --mount=type=bind,source=uv.lock,target=uv.lock --mount=type=bind,source=pyproject.toml,target=pyproject.toml uv sync --locked --no-install-project`. All of those dependencies get merged onto a single layer that isn't shared with anything else. You'd better hope something like tensorflow isn't one of those dependencies.


Meta: I think your example code would benefit from being a code block; in HN this is done by prefixing with 2 spaces.

eg.

  FROM ubuntu
  RUN apt install -y vim wget curl software-properties-common python3.10
  RUN rosdep install
  RUN --mount=type=cache,target=/root/.cache/uv --mount=type=bind,source=uv.lock,target=uv.lock --mount=type=bind,source=pyproject.toml,target=pyproject.toml uv sync --locked --no-install-project

They were intended to be three separate examples but point taken, yes, I should have.

What? It’s much much better now, you can just use uv. Yeah, it’s yet another package manager, but it does it well.

Or go up a rung or two on the abstraction ladder, and use mise to manage all the things (node, npm, python, etc).

Squashing the image means you end up duplicating all those files across images though, unless I'm misunderstanding.

You can also docker save + docker load :)

What's your alternative here?
