I think OP is referring to the "unprivileged user namespaces" [1] feature of Linux, which has caused numerous security incidents in the past. AFAIK, this is mainly because with this feature enabled, unprivileged users can create environments/namespaces which allow them to exploit kernel bugs much more easily. Most of them revolve around broken permission checks (read: root inside the container but not outside, yet feature X wrongly checks the permissions _inside_). [2] has a nice list of CVEs caused by unprivileged user namespaces.
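To make the "root inside, but not outside" situation concrete, here is a minimal sketch (mine, not taken from [1] or [2]) of what any unprivileged process can do when the feature is enabled, in Rust using the libc crate:

    // Sketch: an unprivileged process creates a user namespace and maps
    // itself to UID 0 inside it. Assumes the `libc` crate.
    use std::fs;

    fn main() {
        let euid = unsafe { libc::geteuid() };

        // Create a new user namespace; needs no privileges when
        // unprivileged user namespaces are enabled.
        if unsafe { libc::unshare(libc::CLONE_NEWUSER) } != 0 {
            panic!("unshare failed: {}", std::io::Error::last_os_error());
        }

        // Map our outside UID to root *inside* the namespace. Writing a
        // single line that maps your own EUID is allowed without privileges.
        fs::write("/proc/self/uid_map", format!("0 {} 1", euid)).unwrap();

        // We are now "root" inside the namespace -- exactly the state that
        // permission checks get wrong when they look *inside* instead of
        // *outside*.
        assert_eq!(unsafe { libc::geteuid() }, 0);
    }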
Given that rootful Docker, for example, is also prone to security issues, it's ultimately an attacker-model / pick-your-poison situation, though.
My experience with C++ isn't that extensive, but what is the use case of a garbage collector in this language? I always had the impression that, with well-defined object lifetimes, you wouldn't really create garbage to begin with, but I guess there are some use cases I don't yet know about?
It's pretty useful. Chrome, for example, uses one extensively (called Oilpan), and so does Unreal Engine. GC in C++ is much more widely used than people realize.
The problem is that big programs in the real world often don't have well-defined lifetimes for everything. The idea that everything in your app can have its lifetime worked out in advance isn't true when you're modelling the world (a world), or when lifetimes are controlled by other people (e.g. website authors).
Generally what you see is that these apps start out trying to do manual memory management, decide it's too difficult to do reliably at scale, and switch to a conservatively GC'd heap with a custom collector or Boehm.
Note that Rust doesn't fix this problem. Rust just encodes the assumption of pre-known lifetimes much more strongly in the language. If you're not in such a domain then you have to fall back to refcounting, which is (a) slow and (b) easy to get wrong such that you leak anyway. Early versions of Rust used refcounting for everything and iirc they found anywhere between 10-15% of the resulting binaries was refcounting instructions!
> Note that Rust doesn't fix this problem. Rust just encodes the assumption of pre-known lifetimes much more strongly in the language. If you're not in such a domain then you have to fall back to refcounting, which is (a) slow and (b) easy to get wrong such that you leak anyway. Early versions of Rust used refcounting for everything and iirc they found anywhere between 10-15% of the resulting binaries was refcounting instructions!
Well, modern idiomatic Rust only uses Arc/Rc on the few objects where it's needed, so the overhead of reference count adjustment is so tiny as to never show up. You typically only see reference count traffic be expensive when either (a) everything is reference counted, as in ancient Rust; or (b) on super-inefficient implementations of reference counting, as in COM where AddRef() and Release() are virtual calls.
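As a minimal sketch of that idiom (the names are mine, purely illustrative): ownership covers almost everything, and Rc appears only on the one genuinely shared object.

    use std::rc::Rc;

    struct Texture { bytes: Vec<u8> }

    struct Sprite {
        position: (f32, f32), // plainly owned: no counting at all
        texture: Rc<Texture>, // the one genuinely shared resource
    }

    fn main() {
        let tex = Rc::new(Texture { bytes: vec![0; 1024] });
        // Only these clones touch a reference count; everything else
        // moves or borrows, so count-adjustment traffic stays negligible.
        let sprites: Vec<Sprite> = (0..3)
            .map(|i| Sprite { position: (i as f32, 0.0), texture: Rc::clone(&tex) })
            .collect();
        println!("{} sprites share one texture", sprites.len());
    }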
Right, but that's what I mean by encoding the assumption of known lifetimes in the language. The design is intended for cases where most lifetimes are known, and only a few need ref counting, and you can reason about lifetimes and relationships well enough in advance to avoid cycles. At least, that's my understanding (my Rust-fu is shamefully poor).
> Early versions of Rust used refcounting for everything and iirc they found anywhere between 10-15% of the resulting binaries was refcounting instructions!
Do you happen to have a citation for this? I don’t remember ever hearing about it, but it’s possible this was before my time, as I started in the smart pointers era.
The Rust compiler does this. Even so, 19% of the binary size in rustc is adjusting reference counts.
I am not exaggerating this. One-fifth of the code in the binary is sitting there wasted adjusting reference counts. This is much of the reason we're moving to tracing garbage collection.
It's interesting how many strategies Rust tried before settling on linear types.
> It's interesting how many strategies Rust tried before settling on linear types.
Rust doesn’t actually have linear types. I’m not sure what Rust’s types are called (affine?), but linear types are the “must be consumed” (can’t leak) types, and Rust doesn’t have any support for this.
Rust’s guarantee is that you MUST NOT use an object after dropping it. Linear types would add the additional requirement that you MUST drop the object.
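A tiny illustration of the distinction (my example, not from any Rust docs):

    // Affine: a value may be used at most once, but need not be consumed.
    fn main() {
        let token = String::from("resource");

        let moved = token;      // ownership moves; `token` is now unusable
        // println!("{token}"); // compile error: the MUST NOT side

        // ...but safe Rust never *forces* consumption: leaking is fine.
        std::mem::forget(moved); // linear types would reject this (MUST drop)
    }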
Due to the nature of web engine workloads, migrating objects to being GC'd isn't performance-negative (as most people would expect). With care, it can often end up performance-positive.
There are a few tricks that Oilpan can apply. Concurrent tracing helps a lot (e.g. instead of incrementing/decrementing refs, you can trace on a different thread); in addition, when destructing objects, the destructors typically become trivial, meaning the object can just be dropped from memory. Both of these free up main-thread time. (The tradeoff with concurrent tracing is that you need atomic barriers when assigning pointers, which needs care.)
This is on top of the safety improvements you gain from being GC'd vs. smart pointers, etc.
One major tradeoff is that UAF bugs become more difficult to fix, as you are just accessing objects which "should" be dead.
> Are you referring to access through a raw pointer after ownership has been dropped and then garbage collection is non deterministic?
No - basically, objects sometimes have some notion of when they are "destroyed", e.g. an Element detached from the DOM tree[1]. Other parts of the codebase might have references to these objects, and previously, accessing them after they were destroyed would be a UAF. Now it's just a bug. This is good! It's not a security bug anymore! However, it's much harder to determine what is happening, as it isn't a hard crash.
Not sure early versions of Rust are the best example of refcounting overhead. There are a bunch of tricks you can use to decrease it, and it usually doesn't make sense to invest too much time into that sort of thing while there is so much flux in the language.
Yeah I was thinking the same thing. "10 years ago the Rust compiler couldn't produce a binary without significant size coming from reference counts after spending minimal effort to try and optimize it" doesn't seem like an especially damning indictment of the overall strategy. Rust is a language which is sensitive to binary size, so they probably just saw a lower-effort, higher-reward way to get that size back and made the decision to abandon reference counts instead of sinking time into optimizing them.
It was probably right for that language at that time, but I don't see it as being a generalizable decision.
Swift and ObjC have plenty of optimizations for reference counting that go beyond "Elide unless there's an ownership change".
It's worth noting that percent of instructions is a bad metric, since modern CPUs have lots of spare compute; adding simple integer instructions that aren't on the critical path will often not affect the wall time at all.
The problem is that once threading gets involved they have to be interlocked adjustments and that's very slow due to all the cache coherency traffic. Refcounts are also branches that may or may not be well predicted. And you're filling up icache, which is valuable.
You could have a hybrid refcount where you use plain integers when adjusting the ref count on the current thread, and atomics to adjust the global count when the local one hits 0 (hybrid-rc is the crate you can use, though something Swift-like, where the compiler does ARC for specific values when you opt in, may not be the worst idea). Also, when the type isn't Send you don't need atomic refcounts at all, although the interaction with unsafe does get a little more complex. (Rough sketch below.)
But yeah, at this point Rust's niche is as a competitor to C/C++, and in that world implicit refcounting doesn't have a lot of traction; people favor explicit GC and "manual" resource management (RAII mechanisms like Drop and destructors are OK).
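For what it's worth, a rough sketch of the hybrid idea (my own simplification, not the actual hybrid-rc API): within a thread, clones are plain Rc increments; only handing a value to another thread touches the atomic count, once per thread.

    use std::rc::Rc;
    use std::sync::Arc;

    // Thread-local handle: cloning bumps a non-atomic count.
    type LocalRc<T> = Rc<Arc<T>>;

    // Crossing threads: pay a single atomic increment.
    fn for_other_thread<T>(h: &LocalRc<T>) -> Arc<T> {
        Arc::clone(h) // the only atomic operation
    }

    fn main() {
        let local: LocalRc<String> = Rc::new(Arc::new("shared".to_owned()));

        // Hot path: non-atomic clones, no cache-coherency traffic.
        let a = Rc::clone(&local);
        let b = Rc::clone(&local);

        // Cold path: one atomic bump to cross a thread boundary...
        let handle = for_other_thread(&local);
        std::thread::spawn(move || {
            // ...and non-atomic again on the other side.
            let rewrapped: LocalRc<String> = Rc::new(handle);
            println!("worker sees: {}", rewrapped);
        })
        .join()
        .unwrap();

        println!("{} {} {}", a, b, local);
    }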
> what is the use-case of a garbage collector in this language?
Same as other languages.
> I always had the impression that with the well-defined lifetimes of objects, you wouldn't really create garbage to begin with
There's no well-defined lifetime of objects when it comes to dynamic allocation. For example, if you allocate something with the new keyword, there are no language guarantees that it won't leak.
I'm using C++ to build jank, a native Clojure dialect on LLVM with C++ interop. All of Clojure's runtime objects are dynamically allocated, and the churn of reference counting is far too slow compared to garbage collection. I had originally started with an intrusive_ptr and an atomic count; the Boehm GC was about 2x faster for that benchmark (and at least as much for every later benchmark).
Even outside of writing languages on top of C++, if you're using something like immer for persistent immutable data structures in C++ (as jank does), it has memory policies for reference counting or garbage collection. This is because immutable data generally results in more garbage, even when transients are used for the most pathological cases. That garbage is the trade-off for knowing your values will never be mutated in place. The huge win is complete thread safety for reading those values, as well as complete trust in reproducibility/purity, and trivial memoization.
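To give a feel for the trade-off, here is a Rust analogue using the im crate (my choice for the sketch; it is not what jank uses): persistent structures share state, so old values stay valid for readers while edits produce garbage instead of mutating in place.

    use im::Vector;

    fn main() {
        let v1: Vector<i32> = (0..5).collect();
        let mut v2 = v1.clone(); // O(1): shares structure with v1
        v2.push_back(99);        // copies only the touched path; the rest is shared

        // v1 is untouched: readers holding it never observe mutation, which
        // is the reproducibility/memoization win. The superseded nodes are
        // the extra garbage a GC (or refcounting policy) has to reclaim.
        assert_eq!(v1.len(), 5);
        assert_eq!(v2.len(), 6);
    }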
MIT-licensed projects can be taken and sold by others without sharing any changes made to them. With the GPL, all changes have to be made available again when a third party distributes the software.
So the hardcore FOSS community (which GNU certainly belongs to; see Richard Stallman) is against using MIT and the like, as they enable others to profit from your work without giving any improvements back to the open community.
(hand-wavy, non-detailed explanation - read other sources for clarity)
So the MIT license is less restrictive for users, but ultimately only in the interest of people looking for profit, not consumers.
Specifically, if you are making a widget with embedded Linux, the anti-tivoisation clauses in GPLv3 are a bit of a pain (even if you are not particularly trying to lock the system down against tinkering): they effectively mean you need to develop and provide an extra partial-update mechanism.
This is an ideological view that I would oppose.
GPL is beneficial for the user compared to MIT because any improvements made by users of a project will be open for everybody again.
With MIT it's just easier to make a profit without giving back. That is not in the interest of consumers.
Conversely, it means that the software can end up in places it wouldn’t otherwise. This could be better for consumers if the alternative is a poorer quality library with a more permissive license.
It is worth noting that this release currently breaks the banIP package [1], which relies on the old fw3. So for those relying on it, it might be worth waiting for a short while.
Fascinating link, thank you!
On that note, though: how are the horizontal bars (the first one is right below the introduction) made? My guess is that they gathered all colours from all pixels, made a histogram, and then converted that into a horizontal stacked bar (rough sketch of this hypothesis below). However, the ordering seems weird, as some colours look like they appear multiple times (for example, the almost-black regions), disproving my histogram hypothesis. Then again, that may be due to image compression. And ordering the colour vectors would depend heavily on the colour space and the partial order used.
I similarly stumbled over the image version of these bars in the section "The colours within a single object" further down; I just don't quite understand how they're made.
Does someone have insight on the methodology or maybe a link to a paper? The present methodology section seems to focus on object similarity.
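For reference, this is what I mean by the histogram hypothesis, as a rough Rust sketch (the image crate and the sort order are my assumptions, not the site's actual method):

    use image::GenericImageView;
    use std::collections::HashMap;

    fn main() {
        // Count how often each exact RGB colour occurs.
        let img = image::open("painting.jpg").expect("failed to open image");
        let mut hist: HashMap<[u8; 3], u64> = HashMap::new();
        for (_, _, px) in img.pixels() {
            let [r, g, b, _] = px.0;
            *hist.entry([r, g, b]).or_insert(0) += 1;
        }

        // Each colour would become one bar segment with width proportional
        // to its count. The open question is the ordering: sorting by
        // frequency (below), hue, or lightness all give different bars.
        let mut entries: Vec<_> = hist.into_iter().collect();
        entries.sort_by_key(|&(_, n)| std::cmp::Reverse(n));
        for ([r, g, b], n) in entries.iter().take(10) {
            println!("#{r:02x}{g:02x}{b:02x}: {n} px");
        }
    }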
> Note: This blog post has been rewritten at least 3 times. I started with describing how I configured the system, then I went to bragging about how I love bspwm, how I set up all my jails, etc. I might still write about it at some point, but not this time. Every time I started writing the post, I realised that I was missing a point. I can say now that I know what I really wanted to say: that I love FreeBSD and I find joy in using it.
I really appreciate the honesty and I think it was a good choice to focus on fleshing out the message.
Also: Quite the un-evangelistic stance on this rather controversy-inducing topic!
What I found interesting are the ripple-like throughput fluctuations. Especially on the TX2 v8.1a, this seems very odd.
User 'MB' has already asked this on the site's forum, but the author could not explain it either.
Maybe someone on HN has an idea?
I took a university class on said book last year, taught by one of the authors, Prof. Tobias Nipkow.
It's a fascinating introduction to general program semantics, formal analysis, and proofs of program properties.
During the lecture I struggled somewhat, as I didn't really have sufficient background knowledge for it to go smoothly.
But all the concepts are supported by, well, concrete examples which you can immediately try to solve in Isabelle. So I definitely recommend giving it a go!
It was an eye-opening experience for me, especially since I never really looked into formal program analysis.
I am no expert, but Coq and Isabelle "feel" very similar at first; Isabelle, however, seems to provide much more powerful tools for proof automation than Coq. Isar, Isabelle's proof language, tries to stay close to natural-language proofs, which makes it more intuitive.
I really wish Coq had equally powerful automation tools and a more intuitive proof language, because it feels like the more solid language overall, and I really liked the clear correspondence of types <-> statements, terms <-> proofs, and type checking <-> proof verification, including the ability to print the raw proof terms of theorems.
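For readers unfamiliar with that correspondence, a tiny illustration (written in Lean 4 here purely for brevity; Coq's Print does the same job):

    -- The type is the statement; the term is the proof; type checking
    -- is proof verification.
    theorem modus_ponens (p q : Prop) : p → (p → q) → q :=
      fun hp hpq => hpq hp   -- this term *is* the proof

    #print modus_ponens      -- prints the raw proof term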
[1] https://www.man7.org/linux/man-pages/man7/user_namespaces.7....
[2] https://security.stackexchange.com/a/209533