Hacker News | new | past | comments | ask | show | jobs | submit | bitbang's comments


Devices should offer a local signing cert, where you can sign an app for that device only. Then make the app signing process enforce a binding agreement that you assume all responsibility related to the app.


For anybody on an Android phone, I highly recommend https://github.com/aj3423/SpamBlocker

It is highly configurable, with every feature I've ever wanted but could never find for call filtering in the app store. I've essentially set mine so that if the calling number is not in my contacts, or is not a number I've called in the last 90 days, the phone never rings and the call is sent to voicemail. But it supports lots of other filtering mechanisms too, like regex, or how many times the number has called within a given timeframe.
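For illustration only (this is not SpamBlocker's actual API or config format), the screening rule described above boils down to a few lines of logic:

```python
from datetime import datetime, timedelta

def should_ring(number, contacts, outgoing_log, now=None):
    """Ring only if the caller is a contact or was dialed in the last
    90 days; otherwise the call goes to voicemail.

    outgoing_log maps numbers to the datetime of the last outgoing call.
    """
    now = now or datetime.now()
    if number in contacts:
        return True
    last = outgoing_log.get(number)
    return last is not None and now - last <= timedelta(days=90)
```

The app expresses the same idea through its rule UI rather than code, of course.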


That’s not what that means at all; you’re inserting modern values into the parable. In the culture of that time, the rich were viewed with high regard. The understanding was that if you were rich, then clearly you were in a favorable relationship with God, because he was blessing you with wealth. With that understanding, the sentence that directly follows the parable makes a lot more sense: “When the disciples heard this, they were greatly astonished, saying, ‘Who then can be saved?’” A modern telling of the parable would replace the rich man with a monk who’s taken a vow of poverty to run an orphanage in a God-forsaken third-world country. The parable was intended to portray an absurdly impossible standard of entry; the whole point being that human merit, status, or morality, however the cultural context may define them, does not afford one any distinctive advantage before God.


The rich man is firmly attached to worldly things; he would rather sink with his gold than let it go. The monk you've described is attached to his self, training it with sophisticated hardships. He hoards inner peace just as the rich man hoards gold. Both are practicing the culture of personality. They need to leave that baggage behind, their self-centered life and their polished personas, and reorient their lives around helping others. Once they do, an enormous internal conflict will emerge, the struggle between their selfish and selfless sides, and at the end of this path they'll enter the kingdom of God.

Those who want to climb to the mountain top need to leave everything behind. The higher they climb, the longer the fall will be if they look back for a moment and slip on this narrow path, longing for what they left behind.


Got some links to support this interpretation?


Why is the footer metadata not sufficient for this need? The metadata should contain the min and max timestamp values for the column of interest, so that when executing a query, the query tool can read the metadata and decide whether that Parquet file needs to be read at all, depending on what time range the query covers.


Because the footer metadata is in the Parquet file, which is already far too late to give an efficient query.

If I have an S3 bucket containing five years worth of Parquet files, each covering a few days worth of rows, and I tell my favorite query tool (DuckDB, etc) about that bucket, then the tool will need to do a partial read (which is multiple operations, I think, since it will need to find the footer and then read the footer) of ~500 files just to find out which ones contain the data of interest. A good query plan would be to do a single list operation on the bucket to find the file names and then to read the file or files needed to answer my query.

Iceberg and Delta Lake (I think -- I haven't actually tried it) can do this, but plain Parquet plus Hive partitioning can't, and I'm not aware of any other lightweight scheme that is well supported that can do it. My personal little query tool (which predates Parquet) can do it just fine by the simple expedient of reading directory names.


Maybe I'm misunderstanding something about how ducklake works, but isn't that the purpose of the 'catalog database'? To store the metadata about all the files to optimize the query?

In theory, going off of the schema diagram they have, all your files are listed in `data_file`, the timestamp range for that file would be in `file_column_stats`, and that information could be used to decide what files to _actually_ read based on your query.

Whether duckdb's query engine takes advantage of this is a different story, but even if it doesn't yet, it should be possible to do so eventually.
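In principle that lookup is just a join against the catalog. A minimal sketch, with sqlite3 standing in for the catalog database; the table and column names (`data_file`, `file_column_stats`, `min_value`/`max_value`) follow the schema diagram as read above and may not match DuckLake's real schema exactly:

```python
import sqlite3

con = sqlite3.connect(":memory:")  # stand-in for the catalog database
con.executescript("""
    CREATE TABLE data_file (data_file_id INTEGER, path TEXT);
    CREATE TABLE file_column_stats (
        data_file_id INTEGER, column_name TEXT,
        min_value TEXT, max_value TEXT);
    INSERT INTO data_file VALUES (1, 'a.parquet'), (2, 'b.parquet');
    INSERT INTO file_column_stats VALUES
        (1, 'ts', '2025-01-01', '2025-01-03'),
        (2, 'ts', '2025-01-04', '2025-01-06');
""")

# Prune to files whose [min, max] overlaps the query window [lo, hi].
lo, hi = '2025-01-05', '2025-01-31'
paths = [row[0] for row in con.execute("""
    SELECT f.path FROM data_file f
    JOIN file_column_stats s ON s.data_file_id = f.data_file_id
    WHERE s.column_name = 'ts' AND s.min_value <= ? AND s.max_value >= ?
""", (hi, lo))]
# only b.parquet overlaps the window, so only it gets read
```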


Yes, and this is how basically every “lake” thing works. But all the lake solutions add a lot more complexity than just improving the parquet filename scheme, and all of them require that all the readers and all the writers agree on a particular “lake”.


That's fair! I guess I see it as trading technical complexity for the human complexity of getting everyone on board with an update to the standard, and getting that standard implemented across the board. It's a lot easier to get my coworkers to just use duckdb as a reader/writer with ducklake than to change the system.

Frankly, I'm not entirely sure what the process of proposing that change to the hive file scheme would even look like


> Frankly, I'm not entirely sure what the process of proposing that change to the hive file scheme would even look like

Maybe convince DuckDB and/or clickhouse-local and/or polars.scan_parquet to implement it as a pilot? If it's a success, other tools might follow suit.

Or maybe something like DuckLake could have an option to put column statistics in the filenames. I raised this as a discussion:

https://github.com/duckdb/ducklake/discussions/92


I'm not super sure about it being in the filename, if only because my understanding is that some of the lakes use it for partitioning and other metadata (metameta-data?).

Imo a range is probably the most useful statistic in a folder/file name anyway for partitioning purposes. My vote would be for `^` as the range separator, to minimize the risk of collision and confusion, i.e. `timestamp=2025-03-27T00:00:00-0800^2025-03-30-0700` or `hour=0^12`, `hour=12^24`. `^` is valid across all systems, and I'd be very surprised if it was commonly used as a property/column name. The only collision I can think of is that it's the start-of-line anchor in regex.


Too late to edit buuut

There's a standard! (for time intervals, and I could see it working here)[0]

> Section 3.2.6 of ISO 8601-1:2019 notes that "A solidus may be replaced by a double hyphen ["--"] by mutual agreement of the communicating partners",

So forget what I said; why exacerbate the standards problem?[1]

[0] https://en.wikipedia.org/wiki/ISO_8601#Time_intervals
[1] https://xkcd.com/927/


This can also be done using row group metadata within the Parquet file. The row group metadata can include the min and max values of ordered columns, so you can "partition" on timestamps without having to have a file per time range.


But I want a file per range! I’m already writing out an entire chunk of rows, and that chunk is a good size for a Parquet file, and that chunk doesn’t overlap the previous chunk.

Sure, metadata in the Parquet file handles this, but a query planner has to read that metadata, whereas a sensible way to stick the metadata in the file path would allow avoiding reading the file at all.


I have the same gripe. You want a canonical standard that's like "hive partitioning" but defines the range [val1, val2) as column=val1_val2. It's a trivial addition on top of Parquet.
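A hypothetical sketch of pruning under that proposed scheme: each directory name encodes a half-open range [val1, val2) as `column=val1_val2`, so files can be skipped from the bucket listing alone, before any Parquet I/O (paths and the `ts` column name are invented for the example):

```python
def overlaps(part, lo, hi):
    """part like 'ts=2025-01-01_2025-02-01'; query window is [lo, hi)."""
    _, _, rng = part.partition("=")
    start, _, end = rng.partition("_")
    # Half-open ranges overlap iff start < hi and end > lo.
    return start < hi and end > lo

# Result of a single list operation on the bucket.
listing = [
    "ts=2025-01-01_2025-02-01/part-0.parquet",
    "ts=2025-02-01_2025-03-01/part-0.parquet",
]
hits = [p for p in listing
        if overlaps(p.split("/")[0], "2025-02-10", "2025-02-20")]
```

String comparison works here because ISO 8601 timestamps sort lexicographically.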


That would do the trick, as would any other spelling of the same thing.


I wish we had more control of the row group metadata when writing Parquet files with DuckDB.


Very nice, needs option for json/jsonl output.


Thanks! Yep, I was thinking of doing that next; it will be very easy, as under the hood the data is stored in Python dictionaries.


I've done this to build custom RPi images. Way faster than trying to build on a low power ARM platform, and way less fragile than cross compilers.


Same here, and it is blazingly fast for me, running on an M2 MacBook Air using macOS's built-in virtualization framework to run an arm64 Debian.

Probably even faster on Asahi Linux, but having both macOS and a fast Debian at the same time is soo neat :)


Perhaps it's fast because the M2 is already arm64.


I have little patience for worthless studies that serve no purpose beyond a means of coping with time and effort sunk into a worthless humanities degree.


If I understand it correctly, Landlock is an API used by an app to sandbox itself. The app itself controls the sandboxing. Bubblewrap is user-space tooling external to the app, so the app has no direct awareness or control of its sandboxing. The scenarios each is intended for are orthogonal to one another.


Landlock can be used to sandbox a launched subprocess, as it is here, just as the kernel APIs used by Bubblewrap can be (and sometimes are!) used by programs to sandbox themselves.


Not exactly correct. Bubblewrap, Firejail, and (I'm not sure, but maybe) even AppArmor all remove capabilities, create and join restricted fs/net namespaces, and then fork the actual thing you want to execute. So it's exactly the same concept, but those use capabilities and cgroups.


If your publisher requires Word documents (or _any_ word processor format), you need to find a better publisher. I can understand if they prefer Word for the text copy, which is then pulled into a typesetting app. But to use that as the pre-press format is a terrible workflow. This isn't a limitation of LibreOffice; it's a limitation of not being competent in pre-press typesetting and publishing software.

