
Tangentially related, I once wanted to render a NetworkX DAG in ASCII, and created phart to do so.

Among some more trivial output examples, the README at https://github.com/scottvr/phart/blob/main/README.md#example... includes a rendering of a fairly complicated graph of chess grandmaster PGN data, taken from a matplotlib example on the NetworkX documentation website.

(You will need to expand the examples by tapping/clicking the rightward-facing triangle under "Examples" so that it rotates to face downward and the hidden content is displayed.)


Yes. This type of behavior is what I was referring to in an earlier comment mentioning flashbacks to seeing logs from named filled with "cannot have cname and other data", and slapping my forehead asking "who keeps doing this?", in the days when editing zone files by hand was the norm. And then, of course, having that same feeling again as tools were built, automation became increasingly common, and large service providers "standardized" interfaces (ostensibly to ensure correctness) that allowed, or even encouraged, the creation of bad zone configurations.

The more things change, the more things stay the same. :-)


You just caused flashbacks to BIND error messages of the sort "cannot have CNAME and other data", arising from exactly this proximate cause, and to having to explain the problem many, many times. Confusion and ambiguity have existed forever among people creating domain RRs, whether by editing zone files directly or via the automated, more machined equivalents.
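
For anyone who never ran into it, the classic shape of the mistake in a hand-edited zone file was roughly the following (names invented purely for illustration):

    ; illegal: a name that owns a CNAME may not own any other data (RFC 1034, section 3.6.2)
    www.example.com.   IN  CNAME  host.example.com.
    www.example.com.   IN  MX 10  mail.example.com.

    ; what was usually intended: hang the other records off the canonical name
    www.example.com.   IN  CNAME  host.example.com.
    host.example.com.  IN  A      192.0.2.10
    host.example.com.  IN  MX 10  mail.example.com.

named (or named-checkzone) rejects the first form with exactly that "CNAME and other data" complaint.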

Relatedly, the phrase "CNAME chains" brings back vague memories of confusion around the concept of "CNAME" and the casual use of the term "alias". Without re-reading RFC 1034 today, I recall that my understanding back in the day was that the "C" stood for "canonical", and that the name the CNAME pointed at must itself have an A record and not be another CNAME; and I acknowledge, as already discussed, that my "must" is doing a lot of lifting there, since the RFC in question predates the normative-language RFC (RFC 2119) itself.

So, I don't remember exactly the point I was trying to get at with my second paragraph; maybe it's that there have always been various failure modes arising from varying interpretations, which have only compounded with age, new blood, non-standard language in providers' self-serve DNS interfaces, and so on, which I suppose only strengthens the "ambiguity" claim. That doesn't excuse such a large, critical service provider, though, at all.


So, I posted this link. I actually did so assuming it had likely already been submitted, and I wanted to discuss it with people more qualified and educated in the subject than I am. The authors of this paper are definitely more qualified to publish such a paper than I am; I'm not an ML scientist and I'm not trying to pose as one. The paper made me feel a certain way and raised a bunch of questions I didn't find answers to in the paper, though I'm willing to suppose maybe I'm not even qualified to read such a paper. I considered messaging the authors somewhere like Twitter, or leaving review/feedback on the arXiv submission (which I probably don't have access to do with my account anyway, but I digress). I decided that might make me seem like a hostile critic, or, more likely, I'd just come off as an unqualified idiot.

So... HN came quickly to mind as a place where I could share a thought or considered opinion and ask questions, with the potential to have them answered by very smart and knowledgeable folks on neutral ground. If you've made it this far into my comment, I already appreciate you. :)

Ok so... I've already disclaimed any authority, so I will get to my point and see what you guys can tell me. I read the paper (it is 80+ pages, so admittedly I skimmed some of the math, but I also re-read some passages to feel more certain that I understood what they are saying).

I understand the phenomenon, and have no reason to doubt anything they put in the paper. But, as I mentioned, while reading it I had some intangible gut "feelings" that seeing the math backing their claims could not resolve for me. Maybe that is just because I don't understand the proofs. Still, I realized when I stopped reading that it wasn't anything they said; it was what, to my naive brain, seemed to go unsaid, and that I felt should have been said.

I'll try to get to the point. I completely buy that reframing prompts can reduce mode collapse. But, as I understand it, the chat interface in front of the backend API of any LLM tested has no insight into logits, probabilities, etc. The parameters passed with the prompt request, and the probabilities returned alongside the generations (if the API request asks for them), do not leak into the chat conversation context in any way. So when you prompt an LLM to return a probability, it is responding with, essentially, the language about probabilities it learned during training, and it seems rather unlikely that many training datasets contain actual factual information about their own contents' distributions from which the model could "learn", during pretraining or RLHF, any useful probabilistic information about its own training data.

So, a part of the paper I re-read more than once says at one point (in 4.2): "Our method is training-free, model-agnostic, and requires no logit access." This statement is unequivocally true and honest, but - and I'm not trying to be rude or mean, I just feel like there is something subtle I'm missing or misunderstanding - said another way, it could be equally true and honest if it read "Our method has no logit access, because the chat interface isn't designed that way." And what immediately follows in my mind is this: "the model learned how humans write about probabilities and will output a number that may be near to (or far from) the actual probability of the token/word/sentence/what-have-you, and we observed that if you prompt the model in a way that causes it to output a number that looks like a probability (some digits, a decimal somewhere) along with the requested five jokes, it has an effect on the 'creativity' of the list of five jokes it gives you."

So, naturally, one wonders what actual correlation, if any, exists between the numbers the LLM generates as "hallucinated" probabilities for the jokes it produced (I'm not using the word in a loaded way; it's simply the term everyone understands for this, with no sentiment behind my usage) and the actual probabilities thereof. I did see that they measured empirical frequencies of generated answers across runs and compared that empirical histogram to a proxy pretraining distribution, and that they clearly acknowledge they did no comparison or correlation against the "probabilities" output by the model. So, without belaboring the point further, this is probably core to my confusion about the framing of what the paper says the phenomenon indicates.

It is hard for me to stop asking all the slight variations on these questions that led me to write this, but I will stop and try to get to a TL;DR that I think dear HN readers may appreciate more than my exposition of befuddlement bordering on dubiousness:

I guess the TL;DR of my comment is that I am curious whether the authors examined any relationship between the LLM's verbalized "probabilities" and the actual model sampling likelihoods (logprobs or selection frequency). I am not convinced that the verbalized "probabilities" themselves are doing any work beyond functioning as token noise or prompt reframing.
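
To make that concrete, the kind of sanity check I have in mind looks roughly like the sketch below, using vLLM with a local model. The model name, prompts, and the crude number extraction are all placeholders I made up for illustration; none of it is taken from the paper.

    import re
    from collections import Counter

    from vllm import LLM, SamplingParams

    # Placeholder local model; any chat-tuned model with open weights works here.
    llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")

    # 1) Empirical selection frequency: ask for a single joke many times and count repeats.
    plain = "Tell me a one-line joke about coffee. Reply with only the joke."
    outs = llm.generate([plain], SamplingParams(n=200, temperature=1.0, max_tokens=40))
    freq = Counter(o.text.strip().lower() for o in outs[0].outputs)

    # 2) Verbalized "probabilities": one VS-style request for jokes plus numbers.
    vs = ("Tell me 5 one-line jokes about coffee. After each joke, append the "
          "probability that you would tell that particular joke.")
    vs_out = llm.generate([vs], SamplingParams(temperature=1.0, max_tokens=300))
    verbalized = re.findall(r"\d*\.\d+", vs_out[0].outputs[0].text)  # crude number scrape

    # Eyeball whether the verbalized numbers track the empirical frequencies at all.
    for joke, count in freq.most_common(5):
        print(f"{count / 200:.3f}  {joke}")
    print("model-verbalized:", verbalized)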

I didn't see a control for, or even a comparison against, multi-slot prompts with arbitrary labels or non-semantic "decorative" annotations. In my experience poking and prodding LLMs as a user, trying to influence generations in specific and sometimes unknown ways, even lightweight slotting without any probability language substantially reduces repetition, which makes me wonder how much of the gain from VS is attributable to task reframing rather than to the probability verbalization itself.
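
For the record, the kind of control I mean is just a comparison between prompt shapes like these two (again, my own made-up illustrations, not prompts from the paper):

    # VS-style prompt: multiple slots plus explicit probability verbalization.
    vs_prompt = (
        "Generate 5 jokes about coffee. For each joke, also output the "
        "probability that you would generate that joke."
    )

    # Control prompt: the same multi-slot structure with arbitrary, non-semantic
    # labels and no probability language at all.
    slotted_prompt = (
        "Generate 5 jokes about coffee, labeled [A] through [E]. "
        "Make each entry distinct."
    )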

This may not even be a topic of interest for anyone, and maybe nobody will even see my comment/questions, so I'll stop for now... but if anyone has insights, clarifications, or can point out where I'm being dense, I actually have quite a bit more to say and ask about this paper.

I can't really explain why I just had to see if I could get another insightful opinion on this paper. I usually don't have such a strong reaction when reading academic papers I may not fully understand, but there's some gap in my knowledge (or, less likely, something off about the framing of the phenomenon described), and it's causing me to really hope for discussion, so I can ask my perhaps even less qualified questions, which boil down to mostly just my intuition (or maybe incomprehension. Heh.)

Thanks so much if you've read this and even more if you can talk to me about what I've used too many words to try to convey here.


Hello! I'm one of the main authors of the paper. Thanks for engaging with our work so thoughtfully – that's a very clear and valid question.

We didn't get around to addressing this within the paper itself – 80 pages is a lot, and deadlines, etc. But I have unpublished experiments that show that in a reasonably broad setting I'm doing some work in, verbalized probabilities are restoring a distribution that looks almost identical to the base distribution. It is not possible to demonstrate this on frontier models, since their public models are already mode-collapsed, and they don't share the base model or logprobs anyway. But I've established this to my personal satisfaction on large local models which offer base / post-trained pairs.

To share some intuition on why one might believe this is occurring: there are a bunch of tasks implicit in the pre-training corpus that encourage the model to learn this capability. Consider sentences in news and research articles like: "Scientists discover that [doing something] increases [some outcome] on [some population] by X%". It seems quite natural that the model might learn a pathway by which it can translate its base probabilities into the equivalent numeric tokens in order to "beat" the task of reducing loss on the "X%" prediction. I can even almost visualize how this works mechanically in terms of what the upper layers of an MLP would do to learn this, i.e. translating from weights into specific token slots. And this is almost certainly more parameter-efficient than constructing an entire separate emulated reality for filling in X. Although I'm not ruling out that the latter might still be happening – perhaps some future interp research might be able to validate this!

I'm actually working on a paper that packs up some of the above findings in passing. But if helpful in the meantime, this also builds on related work by Tian et al. 2023, "Just Ask for Calibration" [1] and Meister et al. 2024, "Benchmarking Distributional Alignment of LLMs" [2], which gives some extra confidence here. Their findings indicate that, whether or not verbalized probabilities are rooted in the model's base probabilities, they seem to be useful for the purposes people care about. (Oh, and you can probably set up an experiment to verify this independently with vLLM in a few Claude Code requests!)

Hope that was helpful – feel free to ping me with follow-ups! (Replies might be a little delayed; I happened to see this at a good time, but I'm having quite a crunchy week.)

[1] https://arxiv.org/abs/2305.14975

[2] https://arxiv.org/abs/2411.05403


Maybe I am missing something or am just naive, but isn't it fairly common for social media accounts of well-known figures to be taken over (hacked/phished/whatever) for the purpose of shilling some crypto scam? Launching a memecoin and then very quickly (30 minutes later, apparently) rug-pulling seems at least as likely to fit that type of scam as one where the public figure themselves is actually behind it.

Not making a claim as to what is actually true, just positing explanations. Heck, maybe plot twist: it is actually Eric Adams behind it, but the "account takeover" possibility was planned to serve as plausible deniability.

You know... like "an actor that's playing a dude, disguised as another dude" type thing.



Just pointing out, this clip could have been done with AI just as well.


Yeah but I doubt it. These people have PR teams and could have easily released a statement if this was fake.


Yeah, just following up to my grandparent comment to say "wow. Holy shit. It is how it looks." I'm not sure why I was surprised; maybe I'm an optimist, or as I suggested in my first comment, a bit naive.

In my defense, I don't think I'm stupid; I just don't want to believe so many people in power are cartoonishly evil, so I tend to look for explanations that don't require it. I think my internal sense of the world wants there to be a distinction between, say, average cryptoscammer evil buffoonery and the people in positions where at least ostensibly they try to present as a good guy while trying to keep their evildoings secret. This story gives me some sort of cognitive dissonance, and while reflecting on that fact, I get a bit sad. This world is bonkers.


It is interesting, for sure, that they are using a gmail.com address for a role account whose recipient, as of May 2025, is apparently CPT John Hutchison [0]. But that's not what actually inspired me to write this reply, which I thought some of you might enjoy reading.

Incidentally, the dot in the local part of that NSA veterinarian address brings something of a fond anecdote to mind. Since, at delivery time, "dots" do not matter in the LHS of a Gmail recipient address (excluding organizationally managed Workspace addresses) [1], this account (being in the gmail.com domain) is effectively just "nsabahrainvetclinic[at]gmail.com"; the dot serves only as a visual cue to make its meaning clearer to the human reader/sender. But that's just a preface to my actual anecdote.

More preface: Gmail account names (the LHS) must be at least six characters long when the account is submitted for creation [2].

As an early adopter from Gmail's invite-only beta stage, I was able to obtain my long-held, well-known (by my peers) 7-character UNIX login name @gmail.com without issue. It consists of my five-letter first name followed immediately by a two-letter abbreviation of my lengthy Dutch surname, and had been used for years as my Unix login (and thus email address) and sometimes as my online forum handle.

In those early days of Gmail, I wanted to "reserve" similarly short, memorable, tradition-preserving usernames for my children, who would soon be reaching ages where having an email account would be relevant, and my allotment of invites put me in a position to secure such "good" addresses for them. For my daughter this was easy, as her first name plus the surname abbreviation came to exactly six characters. For my son it seemed impossible, since his given name was only three letters long, and 3+2 being 5 meant that creating a Gmail account for him under my newly imposed family naming scheme appeared to be out of reach.

So, on a hunch that there might be something exploitable here (and slightly influenced by the burgeoning corporate trend, unconstrained by Unix login-name lengths, of standardizing on first.last[at]domain addresses), I hypothesized that the Gmail web front end enforced the letter of the length rule in a way that would let me violate its spirit. I followed through and got my son's address past the minimum-six-character criterion by creating it as his three-letter first name, followed by a "dot", with the two-letter abbreviation of our long surname at the end; something like abc.xy@gmail.com. My hunch paid off: as described in [1], the dot was simply ignored at SMTP address parsing and delivery (and perhaps also, or because, at username creation/storage time, but that's just a guess; I'm unsure how it actually worked at a technical level, since I did not work at Google). That effectively gave my son a five-letter Gmail "username" in the intended "first name followed by two-letter surname short form" I had created for my progeny; senders simply omitted the '.' from his username when emailing his Gmail address! :-) (My son has sadly since passed - RIP my sweet boy Ryk; I miss you terribly every day.) I have no idea if this technique is still exploitable in this way today.

I did later wonder whether I could have done something similar using the fact that "+anything" is ignored in the LHS when parsing a Gmail delivery address, perhaps pulling off a three-letter username for him, but I never actually tried it back when it would have been trivial to probe that sort of front-end-validation-vs-backend-implementation gap in Gmail addresses. shrug
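
For anyone curious, the delivery-time normalization I'm describing amounts to roughly the following toy sketch. It's my own model of the documented behavior in [1], not anything resembling Google's actual code, and it applies only to consumer gmail.com addresses:

    def canonical_gmail(address: str) -> str:
        """Rough model of how a consumer gmail.com address collapses at delivery time."""
        local, _, domain = address.partition("@")
        if domain.lower() != "gmail.com":
            return address  # Workspace / other domains make no such promises
        local = local.split("+", 1)[0]   # "+anything" suffix is ignored
        local = local.replace(".", "")   # dots in the local part are ignored
        return f"{local.lower()}@gmail.com"

    # Both of these land in the same inbox:
    print(canonical_gmail("abc.xy@gmail.com"))    # abcxy@gmail.com
    print(canonical_gmail("abcxy+hn@gmail.com"))  # abcxy@gmail.com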

I hope y'all don't mind my little off-topic tangent, and that you enjoy the story of this (afaik) little-known feat that could be pulled off, at least for a time.

[0] https://www.cusnc.navy.mil/Portals/17/NSA%20BAHRAIN%20IMPORT...

[1] https://support.google.com/mail/answer/7436150?hl=en

[2] https://support.google.com/mail/answer/9211434?hl=en


I just wanted to say that I enjoyed your story and I am deeply sorry for your loss.


Thank you, on both counts.


I can't quite figure out what sort of irony the blurb at the bottom of the post is (I'm unsure whether it was intentional snark, a human typo, or an inadvertent demonstration that Haiku is not well suited to spelling and grammar checks), but either way I got a chuckle:

> Disclaimer: This post was written by a human and edited for spelling, grammer by Haiku 4.5


The most plausible explanation is that the only typo in that post was made by a human.


I think he's saying that he (GlenTheMachine) is Glen Henshaw, "space roboticist", and was (understandably) a bit excited that a somewhat famous document containing a "law" attributed to him by name was posted at this water cooler. It's a way to get some minor attention for it in a comment thread full of like-minded users, and probably also a genuine (and maybe coy/tongue-in-cheek) offer to answer questions about that specific line-item law.

I like that he waved from the crowd in this way, if only for the "huh. Small world" moment I had reading his comment.


> Couldn't you send data at megabits per seconds over a mile long copper wire

Yes, but you need the bare copper pair without any telco signaling on it. We operated a local ISP in the '90s and did exactly that by ordering so-called "alarm circuits" from the telco (with no dial tone) and placing a copper T1 CSU on each end. We marketed it as "metro T1" and undercut traditional T1 pricing by a huge margin, with great success in the surrounding downtown area.


As someone who writes HTML only rarely (I'm more of a "backend" guy, and I use even that term loosely... most of my webdev experience dates back to the CGI days, when the HTML was often spat out by Perl scripts) and usually in vim, I am pleased to know there is a built-in alternative to properly indenting or manually counting divs. Thanks for enlightening me.


A more common alternative to counting divs would be CSS classnames or (for unique elements on the page) IDs. You'd do `document.querySelector('.my-class')` to locate `<div class="my-class">` or similar, rather than using the fact that e.g. something is nested 3 divs inside <body>.

Even if this custom element trick didn't work, I don't see why one would need to count divs (at least if you control the markup, but if not then made-up tags aren't an option anyway). The article even mentions using class names as an option.


Sorry, I didn't mention class names because the article explicitly did and I assumed that my aversion to the extra typing would be presumed by a reader of my comment. My mistake.

So yeah, I guess what wasn't obvious from my statement of gratitude was that I appreciate knowing there is a more concise way of keeping track, even without CSS styling. If I make up tags, they just inherit default styling, but to my eye it's clear where things are closed and where to insert things later. I was talking about manual editing (in vim, as I mentioned), rather than any dynamic query selectors. Make more sense?

