Try something completely different from your field of expertise. For a typical nerd, this might be motor skills, as in gymnastics. In my experience, this takes a very long time to learn.
This is how many registers the ISA exposes, but not the number of registers actually in the CPU. Typical CPUs have hundreds of registers. For example, Zen 4's integer register file has 224 registers, and the FP/vector register file has 192 registers (per Wikipedia). This is useful to know because it can affect behavior. E.g. I've seen results where doing a register allocation pass with a large number of registers, followed by a pass with the number of registers exposed in the ISA, leads to better performance.
What you describe sounds counter-intuitive. And the paper you cite seems to suggest an ISA extension to increase the number of architected (!) registers. That is something very different. It makes most sense in VLIW architectures, like the ones described in the paper. Architectures like x86 do hardware register renaming (or similar techniques; there are several) to exploit as much instruction-level parallelism as possible. That is why I find your claim hard to believe. VLIW architectures traditionally provide huge register sets and make less use of transparent register renaming etc.; that part is either explicit in the ISA or left entirely to the compiler. These are very different animals from our good old x86...
> Compiler controlled memory: There is a mechanism in the processor where frequently accessed memory locations can be as fast as registers. In Figure 2, if the address of u is the same as x, then the last load μ-op is a nop. The internal value in register r25 is forwarded to register r28, by a process called write-buffer feedforwarding. That is to say, provided the store is pending or the value to be stored is in the write-buffer, then loading from a memory location is as fast as accessing external registers.
I think it over-sells the benefit. Store forwarding is a thing, but it does not erase the cost of the load or store; certainly not on the last ~20 years of chips, and I don't think on the PII (the target of the paper) either.
The load and store still effectively occur in terms of port usage, so the usual throughput limits etc. still apply. There is a benefit of a few cycles in latency. Perhaps the L1 cache access itself is also omitted, which could help with bank conflicts, though on later uarches there were few to none of those, so you're left with perhaps a small power benefit.
2 out of the 3 most problematic bugs I've had in the last two years or so were in statically typed languages where previous developers didn't use the type system effectively.
One bug was in a system that had an Email type but didn't actually enforce the invariants of emails. The one that caused the problem: it didn't enforce case-insensitive comparisons. Trivial to fix, but it was encased in layers of stuff that made tracking it down difficult.
The other was a home grown ORM that used the same optional / maybe type to represent both "leave this column as the default" and "set this column to null". It should be obvious how this could go wrong. Easy to fix but it fucked up some production data.
Both of these are failures to apply "parse, don't validate". The former didn't enforce the invariants it had supposedly parsed the data into. The latter didn't differentiate between two different parse results.
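To make the second failure concrete, here's a minimal TypeScript sketch (all names invented for illustration, not from the codebase in question) of how a discriminated union keeps "leave the column at its default" and "set it to NULL" from sharing one optional type:

    // Instead of one Maybe<T> doing double duty, make the two intents
    // distinct members of a union so the ORM can't confuse them.
    type ColumnUpdate<T> =
      | { kind: "keep-default" }          // don't touch the column
      | { kind: "set-null" }              // explicitly write NULL
      | { kind: "set-value"; value: T };  // write a concrete value

    function toAssignment<T>(col: string, update: ColumnUpdate<T>): string | null {
      switch (update.kind) {
        case "keep-default":
          return null;                    // omit the column from the UPDATE entirely
        case "set-null":
          return `${col} = NULL`;
        case "set-value":
          // placeholder quoting only; a real ORM would use bound parameters
          return `${col} = ${JSON.stringify(update.value)}`;
      }
    }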
> the local-part MUST be interpreted and assigned semantics only by the host specified in the domain part of the address.
You're not allowed to do that. The email address `foo@bar.com` is identical to `foo@BAR.com`, but not necessarily identical to `FOO@bar.com`. If we're going to talk about 'commonly applied normalisations at most email providers', where do you draw that line? Should `foo+whatever@bar.com` be considered equal to `foo@bar.com`? That sounds weird, except - that is exactly how gmail works, a couple of other mail providers have taken up that particular torch, and if your aim is to uniquely identify a 'recipient', you can hardcode that `a@gmail.com` and `a+whatever@gmail.com` definitely, guaranteed, end up at the same mailbox.
In practice, yes, users _expect_ that email addresses are case insensitive. Not just users, even - various intermediate systems apply the same incorrect logic.
This gets to an intriguing aspect of hardcoding types: you lose the flexibility, mostly. Types are still better - the alternative is that you have to reliably write the same logic (or at least a call to some logic) to disentangle this mess every time you do anything with a string you happen to know is an email address. That is terrible, but it does give you the option of intentionally not applying the usual logic when you don't want to.
That's no way to program, hence actual types and the general trend that comes with them (namely: we do this right, we write it once, and there is no flexibility left). Programming is too hard to leave room for exotic cases that programmers aren't going to think about when dealing with this concept. And if you do need to deal with it, it can still be encoded in the type, but that makes visible things that are invisible in untyped systems (if my email type only has a '.compare(boolean caseSensitive)' style method, and is not itself inherently comparable because of the case-sensitivity issue, it _seems_ much more complicated than plain old strings). This is a lie - emails in strings *ARE* complicated. They just are. You can't make that go away. But you can hide it, and shoving all data into overly generic data types (numbers and strings) tends to do that.
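To make that '.compare(...)' idea concrete, a rough TypeScript sketch (purely illustrative; the parsing is deliberately naive) could look like:

    // An Email type that refuses to be "just comparable" and forces the
    // caller to say how the local part should be treated.
    class Email {
      private constructor(readonly localPart: string, readonly domain: string) {}

      static parse(raw: string): Email | null {
        const at = raw.lastIndexOf("@");
        if (at <= 0 || at === raw.length - 1) return null;
        // Domains are case-insensitive, so normalise them eagerly.
        return new Email(raw.slice(0, at), raw.slice(at + 1).toLowerCase());
      }

      equals(other: Email, opts: { caseSensitiveLocalPart: boolean }): boolean {
        if (this.domain !== other.domain) return false;
        return opts.caseSensitiveLocalPart
          ? this.localPart === other.localPart
          : this.localPart.toLowerCase() === other.localPart.toLowerCase();
      }
    }

The complexity was always there; the type just forces every caller to acknowledge it once.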
These days the world assumes that all parts of emails are case-insensitive, even if RFC5321 says otherwise. If it’s true for Google, Outlook & Apple mail then it’s basically true everywhere & everyone else has to get with the program.
If you don’t want to lose potentially important email then you need to make sure your own systems are case-insensitive everywhere. Otherwise you’ll find out the hard way when a customer or supplier is using a system that capitalises entire email addresses (yes, I have seen this happen) & you lose important messages.
Genuinely curious: are non-ASCII characters also case-insensitive? With Unicode come different case-folding rules depending on Unicode version and locale.
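As the question hints, there's no single answer. A small TypeScript/JavaScript illustration of the locale dependence (behaviour assumes a runtime with full ICU data):

    // Case mapping is locale-sensitive, so "case-insensitive" is not one operation.
    // The classic example is the Turkish dotted/dotless i:
    const plain = "I".toLowerCase();                // "i"
    const turkish = "I".toLocaleLowerCase("tr-TR"); // "ı" (dotless i)
    console.log(plain === turkish);                 // false

Two systems that both claim to compare local parts "case-insensitively" can therefore disagree, depending on which locale and Unicode version their runtime applies.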
The innovation here is, I think, the use of union types. The problem with errors as standard algebraic data types (ADTs) is you end up with lots of boilerplate to transform from one ADT to another, as errors propagate through the system. With union types (as found in Typescript and Scala 3) you can add and remove types from the union in an ad-hoc manner. IIRC Elm doesn't have union types, so I think the blog post is a bit inaccurate.
I mean union types in the programming language theory sense, not in the "we called this language feature union types" sense. Elm's union types are algebraic data types (ADTs). Union types in the PLT sense are what Typescript has. The main difference is that an ADT must be declared upfront, whilst a union type can be constructed in an ad-hoc manner based on whatever types are used together at a particular point in the program.
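A TypeScript sketch of that ad-hoc construction (every name here is invented for illustration):

    type ParseError = { kind: "parse"; message: string };
    type NetworkError = { kind: "network"; status: number };
    type DbError = { kind: "db"; code: string };

    function fetchUser(id: string): NetworkError | { name: string } {
      if (id.startsWith("x")) return { kind: "network", status: 503 }; // toy stub
      return { name: id };
    }

    function saveUser(name: string): DbError | ParseError | { ok: true } {
      if (name.length === 0) return { kind: "parse", message: "empty name" };
      return { ok: true };
    }

    // The return type is assembled "on the spot" from whatever the two calls
    // produce; nothing like it was declared upfront, which is exactly the
    // boilerplate an ADT-based error hierarchy forces you to write and convert
    // between as errors propagate.
    function fetchAndSave(id: string): NetworkError | DbError | ParseError | { ok: true } {
      const user = fetchUser(id);
      if ("kind" in user) return user; // propagate the NetworkError unchanged
      return saveUser(user.name);
    }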
I used to think the same, but after having many discussions just like this I accepted that there’s no agreed-upon formal definition of what is a union type and what isn’t. Being created ad hoc or not seems like a made-up-on-the-spot distinction. If you do think there is a definition that most researchers agree on that’s distinct from ADTs, do provide some sources.
Here's Types and Programming Languages on union types. My takeaway is that the preferred terminology is union types for non-disjoint unions and sum types for disjoint unions.
"Sum and variant types are sometimes called disjoint unions. The type T1+T2 is
a “union” of T1 and T2 in the sense that its elements include all the elements
from T1 and T2. This union is disjoint because the sets of elements of T1 or
T2 are tagged with inl or inr,respectively, before they are combined, so that
it is always clear whether a given element of the union comes from T1 or T2.
The phrase union type is also used to refer to untagged (non-disjoint) union
types, described in §15.7." p142
"The dual notion of union types, T1 ∨ T2, also turns out to be quite useful.
Unlike sum and variant types (which, confusingly, are sometimes also called
“unions”), T1 ∨T2 denotes the ordinary union of the set of values belonging
to T1 and the set of values belonging to T2, with no added tag to identify the
origin of a given element." p207
Both passages agree that union types are about being possibly non-disjoint, and that ADTs are always disjoint, with each value tagged as coming from a single alternative. I can accept that, since it defines union as the same concept as in set theory. There’s nothing there about types being ad hoc, though that is arguably a consequence of the syntax, since putting a symbol between two different types to “union” them cannot be expected to produce a disjoint union. So I think that’s a useful and defensible position, but I don’t think many people will use these definitions consistently, given there’s already a lot of confusion around these terms.
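For what it's worth, the TAPL distinction is easy to render in TypeScript (just an illustration of the terminology, not something from the quoted text):

    // Disjoint sum: every value carries a tag saying which side it came from,
    // so Sum<string, string> still has two distinguishable kinds of element.
    type Sum<A, B> = { tag: "inl"; value: A } | { tag: "inr"; value: B };

    // Untagged union: the plain set-theoretic union of the value sets.
    // Union<string, string> collapses to string; the origin of a value is lost.
    type Union<A, B> = A | B;

    const tagged: Sum<string, string> = { tag: "inl", value: "hi" };
    const collapsed: Union<string, string> = "hi"; // just a string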
Can anyone describe how this differs from Tangled (https://tangled.org/)? Both seem very interesting, but I'm not deep enough into either to understand how they differ.
Radicle is architecturally local-first: you run your own node, sync repositories from a P2P gossip network, and then everything—browsing code, creating issues, reviewing patches—happens against your local data store. There's no round-trip to a server. Issues and patches are stored as signed Git objects (COBs) that replicate with the repo itself. The network is only involved when you choose to sync. This makes it extremely performant for day-to-day work and fully functional offline.
Tangled to my understanding is federated in theory but centralized in practice. It relies on "knots" (servers that host Git repos) and a central AppView at tangled.sh that aggregates the network. Issues and social artifacts live on Personal Data Servers, not locally. While you can self-host a knot, the default experience routes through Tangled's managed infrastructure. The architecture is fundamentally client-server: your operations go over the network to wherever your data lives.
That implementation sounds really awesome, but it raises a few questions for me (which I didn't immediately see answered when skimming the landing page, though I realize the answers might be in the docs somewhere).
I found the answer to one of them (how automatic pinning works) which I'll paste here because others are likely to wonder as well. Related, I assume there's a way to block overly large files if you run a seed node?
> They can vary in their seeding policies, from public seed nodes that openly seed all repositories to community seed nodes that selectively seed repositories from a group of trusted peers.
Suppose I'm A and I collaborate with B, C, ... Z. If I file an issue locally and sync to C, am I able to see if and when that propagates through the network to everyone else? I guess what I'm wondering about is what the latency, reliability, and end user understandability are like when using this to collaborate in practice. Like if I file an issue on GitHub I know that it's globally visible immediately. How does that work here?
Currently, with Radicle still under active development, we already reach convergence times that are negligible for async collaboration (like working on code or issues). Working on a well-seeded repo, my changes sync to ~10 nodes within a tenth of a second and to ~80 nodes within 3 seconds.
This is obviously not fast enough for sync collaboration, like writing on a virtual whiteboard together, but that's also not what Radicle is designed for. Also, if you share larger files (e.g. you attach a screenshot to your issue), the above times might not be a good estimate anymore, but that's the exception for now.
It's really strange to see people assume that peer-to-peer networks must somehow be slow. In my experience, since everything runs locally, working with Radicle feels way snappier than any web interface, which has lots of latency on every other click.
As the network scales, it'll of course take some care to keep the speed up, but that's known and there are a few models to take inspiration from.
It's not that I assume it must be slow, but rather that from experience being slow is a distinct possibility so I know to ask about it. But I also asked about reliability and visibility into the process. The latter is what I'm most curious about.
I'm not meaning to suggest that I have a problem with any of it. It's just that when I see anything P2P that's mutable I start wondering about propagation of changes and ordering of events and how "eventual consistency" presents to end users in practice. Particularly in the face of a node unexpectedly falling off the network.
I realize I could browse the docs but I figure it's better to ask here because others likely have similar questions and we're here to discuss the thing after all.
There's `rad sync status` which will show you (for a particular repository) which other nodes have echoed back to you that they have received and verified the most recent state of your namespace of that repository. So, if you expect some other node to have received your changes, you can use this command to verify that.
When the user explicitly asks to sync, then by default the process will be considered to have completed successfully as soon as three other nodes have echoed that they have received your changes. This threshold is configurable. Further, one can define a list of nodes that they care particularly much about, in which case the process will only be considered to have completed successfully if all these nodes also signaled that they have received your changes.
For anything deeper than that, you'd have to resort to logs. And if you connect your node to the other one you are interested in, you can get a pretty good picture of what's going on.
If one node "falls off" the network, then the above mechanisms will communicate that to you, or fail after a timeout.
With Git repositories, humans establish order explicitly: they push commits, which form a DAG. The collaboration around that (mostly discussions on issues and patches) is also stored in and synced by Git, but here humans do not have to establish order explicitly. Rather, these things, called "Collaborative Objects" in Radicle lingo, are CRDTs, so they merge automatically. Nodes also opportunistically tag operations on these CRDTs with the latest operation they know of, to help establish an order where possible.
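For intuition only (this is not Radicle's actual data format), the "merge automatically" part can be as simple as a grow-only set of operations, sketched in TypeScript:

    // Each operation is keyed by (author, opId), so merging two replicas is a
    // plain set union: commutative, associative, and idempotent. Any sync
    // order between nodes converges to the same state.
    type Op = { author: string; opId: number; body: string };
    const keyOf = (op: Op) => `${op.author}:${op.opId}`;

    function merge(a: Map<string, Op>, b: Map<string, Op>): Map<string, Op> {
      const out = new Map(a);
      for (const [key, op] of b) out.set(key, op); // duplicate keys carry identical ops
      return out;
    }

Radicle's collaborative objects are richer than this (and causally tagged, as described above), but the same property is what removes the need for a human to pick an order.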
This sounds so much more appealing to me than github and co. Unfortunately I guess there's no multibillion dollar exit in the cards in this case.
Has there been any thought about how this might interact with centralized-ish hosting? For example, suppose a large project chose to use a Radicle repo as its "blessed" point of coordination. Being a major project, of course there's a mirror on (at minimum) GitHub that points back to a web page (presumably the Radicle app) for filing issues, collab, wiki, whatever.
So a user who doesn't have any interest in learning about Radicle wants to file an issue using the web app. When I glanced at the heartwood repo, it seemed to be read-only with no indication of being able to log in (that's entirely unsurprising, of course). How much work / community welcome / etc. is there likely to be for a project to offer a usable web front end, presumably leveraging a solution such as OIDC? Basically, being able to "guest" users of centralized platforms into the project so that they can collaborate with near-zero overhead.
As a motivating example, consider outfits that want to self-host a Git forge but also want to offer centralized services to users. Communities such as KDE and SDL come to mind. Many of them have ended up migrating to GitHub or GitLab over the years for various reasons, but in an alternate reality it didn't have to be that way!
I realize I'm effectively asking "do you have thoughts about implementing a partially federated model" but hopefully you can see the real world usecase that's motivating the (otherwise seemingly unreasonable) question.
It's a valid question, and in fact there's quite some interest in adding write features to the web app. The current version of Radicle was designed with one user per node in mind, to get things off the ground. The process of relaxing this is currently ongoing - first to multiple users per node, which would make use-cases like the one you are sketching viable. What we'd like to avoid in that case is handing the key to the server; instead we'd generate an Ed25519 key in the browser and sign there, with some web-compatible transport (HTTP? WebSocket?) in between. And that's just a bit more intricate than it sounds.
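For the curious, the browser side of that could look roughly like the following (a sketch assuming WebCrypto's Ed25519 support, which is still uneven across browsers; in practice the generated key would be persisted, e.g. in IndexedDB, rather than regenerated per call):

    // Keep the key in the browser: generate a non-extractable Ed25519 key pair
    // and sign the payload locally before sending it over whatever transport
    // (HTTP, WebSocket) the node exposes.
    async function signInBrowser(payload: Uint8Array): Promise<ArrayBuffer> {
      const keyPair = (await crypto.subtle.generateKey(
        { name: "Ed25519" },
        /* extractable */ false, // the private key can never leave the browser
        ["sign", "verify"],
      )) as CryptoKeyPair;

      return crypto.subtle.sign({ name: "Ed25519" }, keyPair.privateKey, payload);
    }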
Tangled is built on top of the AT Protocol, and "mediates" between what they call "knots": Git servers. Their strength is using the AT Protocol to make communication across multiple Git servers work smoothly.
Radicle is completely peer to peer. There are no such things as servers and clients, only nodes. However, there are quite a few nodes that then act as HTTP servers to offer convenient access via the browser.
Also, Tangled is VC-funded. I cannot find information about Radicle, but considering the authorship is not advertised on their website, and that P2P is not easily monetizable, I would bet it is not VC-funded.
All in all, seems like an awesome project and instantly more trustworthy and rugpull-resistant than Tangled.
Quite ironic: Radicle seems to have raised $7M from "Radworks", some sort of crypto foundation.
That being said, why is not being monetizable a good thing? Their website says Radicle has been in development for four years already. Without more money in the bank, how would they continue to build the thing?
Radicle is a free software project, not a company or commercial product. Many other open source projects are also not monetizable in the narrow sense, and still manage to attract enough contributors and funding to flourish. Sometimes these projects even become so widely adopted that multiple companies build consortia/foundations to fund the development, even if there is no direct revenue stream from it.
As long as someone is willing to fund the development of Radicle, the developers will just have a stronger incentive to work on it. Without any more funding, it will of course join the (very large) club of less well-funded free software projects.
If enough people join and contribute now, and then some companies make the switch, it might well be feasible to pay a small team to continue working on it, financed by donations.
Just don't think of it as a commercial product merely because someone decided to put their money towards its development. If you don't like that it's not a company, then that's okay. I am just trying to give another perspective.
Speaking of prolific Racketeers... Noel! Just an hour ago, on a walk, I was thinking, "I should work through that one LLM book, and implement it in Racket." (But have started job-hunting, so will probably be Python.)
I've got so much other stuff I'd rather learn and code I'd rather write (C/wasm backend for my language), but I've also started job hunting and probably should understand how this latest fad works. Neural networks have long been on my todo list anyway.
Interesting stuff. I didn't read the whole paper in detail. I'm unclear on how FIP differs from adding uniqueness types to a language. A number of languages are exploring this (Rust being the most well known) so what's the novelty that FIP brings?
Whereas uniqueness qualifiers on values or references are a semantic component of a language's terms, FIP is more akin to a polymorphic System F-based language having restrictions on term formation to enable decidable type inference.
Uniqueness types (which Rust can only be said to have as an implementation detail of the borrow checker's operation, as opposed to having programmer-assignable uniqueness qualifier syntax for constructing types) are semantic constructs that a user places on types, which then restrict the operations available on that object. In this paper and in Koka, the language that semi-inspired the exploration here, the functional-in-place mutation scheme is an optimization performed by the compiler in those cases where the resulting code is provably equivalent to the surface-level syntax. Said surface-level syntax presents a logical view of the program as only having persistent, immutable objects.
The implicit novelty, although this paper is more an exploration of existing concepts in a specific environment (i.e. no GC at run time), is that there is no annotation burden or conceptual distinction to be made by users of the language to receive the performance benefits of mutation where available.
This, Local-first Software [1], the Humane Web Manifesto [2], etc. make me optimistic that we're moving away from the era of "you are the product" dystopian enshittification to a more user-centric world. Here's hoping.