
> I was never under the impression that gaps in conversations would increase costs nor reduce quality. Both are surprising and disappointing.

You didn't do your due diligence on an expensive API. A naïve implementation of an LLM chat is going to have O(N^2) costs from prompting with the entire context every time. Caching is needed to bring that down to O(N), but the cache itself takes resources, so evictions have to happen eventually.
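A toy cost model makes the scaling concrete (the numbers and the 10% cache-read discount are illustrative assumptions, not any provider's actual pricing): if each turn appends k tokens and the whole context is resent every turn, total input tokens grow quadratically; with prefix caching, only the new tokens pay full price each turn.

```python
# Toy model of cumulative input-token cost over an N-turn chat.
# Assumes each turn appends k new tokens and the full context is
# resent every turn; numbers are illustrative, not real pricing.

def uncached_cost(n_turns, k):
    # Turn i re-sends the entire context of i*k tokens at full price.
    return sum(i * k for i in range(1, n_turns + 1))

def cached_cost(n_turns, k, cache_discount=0.1):
    # With prefix caching, previously seen tokens are billed at a
    # discounted rate; only the k new tokens pay full price per turn.
    total = 0.0
    for i in range(1, n_turns + 1):
        new = k                # fresh tokens this turn
        reused = (i - 1) * k   # tokens served from cache
        total += new + reused * cache_discount
    return total

print(uncached_cost(3, 1000))         # 6000 = 1000 + 2000 + 3000
print(round(cached_cost(3, 1000)))    # 3300 = 1000 + 1100 + 1200
```

So a 3-turn conversation with no caching costs 6x the first turn's input, not 3x, and the gap widens quadratically as turns accumulate.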


How do you do "due diligence" on an API that frequently makes undocumented changes and only publishes acknowledgement of change after users complain?

You're also talking about internal technical implementations of a chat bot. 99.99% of users won't even understand the words that are being used.


I somewhat disagree that this is due diligence. Claude Code abstracts the API, so it should abstract this behavior as well, or educate the user about it.

> Claude Code abstracts the API, so it should abstract this behavior as well, or educate the user about it.

Does mmap(2) educate the developer on how disk I/O works?

At some point you have to know something about the technology you're using, or accept that you're a consumer of the ever-shifting general best practice, shifting with it as the best practice shifts.


Yes. It’s perfectly reasonable to expect the user to know the intricacies of the caching strategy of their llm. Totally reasonable expectation.

It's not like they have a powerful all-knowing oracle that can explain it to them at their dispos... oh, wait!

They have to know that this could bite them and to ask the question first.

I do think having some insight into the current state of the cache and a realistic estimate for prompt token use is something we should demand.

It is more useful to read posts and threads like this exact thread IMO. We can't know everything, and the currently addressed market for Claude Code is far from people who would even think about caching to begin with.

Okay, sure. There's a dollar/intelligence tradeoff. Let me decide to make it, don't silently make Claude dumber because I forgot about a terminal tab for an hour. Just because a project isn't urgent doesn't mean it's not important. If I thought it didn't need intelligence I would use Sonnet or Haiku.

How's that O(N^2)? How's it O(N) with caching? Does a 3 turn conversation cost 3 times as much with no caching, or 9 times as much?

What if the cache was backed up to cold storage? Instead of having to recompute everything.

It seems you haven't done the due diligence on what part of the API is expensive - constructing a prompt shouldn't cost the same as an LLM pass.

It seems you haven't done the due diligence on what the parent meant :)

It's not about "constructing a prompt" in the sense of building the prompt string. That of course wouldn't be costly.

It is about reusing llm inference state already in GPU memory (for the older part of the prompt that remains the same) instead of rerunning the prompt and rebuilding those attention tensors from scratch.
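The bookkeeping can be sketched with a toy prefix cache (a real serving stack caches per-layer attention K/V tensors in GPU memory; here the "state" is just a count of tokens processed, standing in for the expensive recompute, and characters stand in for tokens):

```python
# Toy sketch of prefix caching: reuse the "inference state" for the
# part of the prompt already seen, recompute only the new suffix.
# Characters stand in for tokens; the counter stands in for GPU work.

class PrefixCache:
    def __init__(self):
        self.cache = {}           # prompt prefix -> cached "state"
        self.tokens_computed = 0  # stand-in for forward-pass work done

    def run(self, prompt):
        # Find the longest cached prefix of this prompt.
        best = ""
        for prefix in self.cache:
            if prompt.startswith(prefix) and len(prefix) > len(best):
                best = prefix
        # Only the uncached suffix needs a fresh forward pass.
        suffix = prompt[len(best):]
        self.tokens_computed += len(suffix)
        self.cache[prompt] = len(prompt)  # cache state for full prompt
        return self.cache[prompt]

c = PrefixCache()
c.run("system prompt + turn 1")
c.run("system prompt + turn 1 + turn 2")  # only " + turn 2" recomputed
```

After both calls, total work equals the length of the final prompt, not the sum of both prompts: that is the O(N) behavior. Evict the cache entry and the next call pays for the whole prefix again from scratch.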


You not only skipped the diligence but confused everyone by repeating what I said :(

That is what caching is doing: the LLM inference state is being reused. (Attention vectors are an internal artifact at this level of abstraction; effectively, at this level, the cached state is the prompt.)

The part of the prompt that has already been inferred no longer needs to be part of the input; it is replaced by the cached inference state. And none of this is tokens.


How big is this cached data? Wouldn't it be possible to download it after idling a few minutes "to suspend the session", and upload and restore it when the user starts their next interaction?

Should be about 10~20 GiB per session. Save/restore is exactly what DeepSeek does using its 3FS distributed filesystem: https://github.com/deepseek-ai/3fs#3-kvcache

With this much cheaper setup backed by disks, they can offer a much better caching experience:

> Cache construction takes seconds. Once the cache is no longer in use, it will be automatically cleared, usually within a few hours to a few days.


I often see a local model QWEN3.5-Coder-Next grow to about 5 GB or so over the course of a session using llamacpp-server. I'd bet these trillion-parameter models are even worse. Even if you could download it, offload it, or get that as a service, then start back up again, you'd _still_ be paying the token cost, because all of that context _is_ the tokens you've just done.
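A back-of-the-envelope estimate shows why the cache reaches tens of GiB (every parameter below is hypothetical, roughly shaped like a mid-size dense model; real architectures with GQA or MLA shrink this considerably):

```python
# Rough KV-cache size: 2 tensors (K and V) * layers * kv_heads *
# head_dim * bytes_per_value, per token of context. All parameters
# are made up for illustration, not any specific model's config.

def kv_cache_bytes(context_tokens, layers=48, kv_heads=8,
                   head_dim=128, bytes_per_value=2):  # fp16
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return context_tokens * per_token

gib = kv_cache_bytes(100_000) / 2**30
print(f"{gib:.1f} GiB for a 100k-token context")  # ~18.3 GiB
```

Even under these modest assumptions, a long session lands squarely in the 10-20 GiB range quoted above, which is why save/restore needs a fast distributed filesystem rather than a quick network round trip.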

The cache is what makes your journey from 1k prompt to 1million token solution speedy in one 'vibe' session. Loading that again will cost the entire journey.


This sounds like a religious cult priest blaming the common people for not understanding the cult leader's wish, which he never clearly stated.

What? Hiring is a contract between employer (company entity) and employee. No individual "you" can hire anybody except through the company's official process. If HR says "no we won't extend an offer," a lowly HM extending an offer would be clear-cut fraud.

Managers usually have the authority to bind the company to an employment contract. Even if they don't, the rule of "apparent authority" often means the employee can still sue.

In the USA this is mostly theoretical since HR could immediately fire the employee due to at-will employment.

But in Canada, it's a much bigger issue due to labour protections.

e.g. Many managers at American multinationals gave assurances over email to employees about work-from-home arrangements. Then the company does a huge RTO push.

When the employee refuses, HR discovers they can't fire the employee without a hefty buyout.

Best not to give assurances if you're managing a multinational team.


>>Managers usually have the authority to bind the company to an employment contract

Is that an American thing? I've been a manager for years and never heard of that happening. I didn't even know how much the people I managed were paid.


I believe it happens more often in Canada. Here's a case where the RTO ultimatum was ruled constructive dismissal, because the manager made a verbal agreement to amend the terms of employment.

https://mathewsdinsdale.com/employers-advisor-march-2025/#:~...


I meant what would have happened - and to whom - if HR had greenlighted the offer, but others' posts pretty much clarified that for me, thanks.

> I know many folks who make $500k+ a year in the SF Bay Area and complain about affordability, and to a large extent, it's stuff like that that makes them poorer.

You don't have to make absurd extrapolations to make your point. Even with 20 subscriptions at $20/mo, that's $400/mo or $4800/yr, about 2% of net income.


> Woe betide our 401(k)s when it happens, though.

The stock market crashes once in a while. Shit happens. The long-term outlook is unlikely to change nearly as much, unless you think there will be systemic macroeconomic changes.


Long-term relative to lifespan of the 401K holder. Outcome changes a lot for those who are ready to retire.

You're responding to literally 7 words out of context.

> Jobs with access to/control over millions of people's data should require some kind of genuine software engineering certification

FAANG, Fortune 500, etc., almost universally go out of their way to violate user freedom in pursuit of profit. Regulation is practically the only way to force megacorps to respect users' rights and improve their security, as evidenced by right-to-repair, surveillance/privacy, and so on.

And none of that has anything to do with users' individual rights to create, run, and modify their own software.

(Yes, regulatory capture exists, no, it doesn't mean all regulation is bad.)


If the megacorps are going in that direction of being strictly regulated, the rest of the industry will follow. It's the general movement of the Overton Window that's the underlying issue.

No, they won't. No one in their right mind "wants" ISO27001, ISO9001, SOC or multiple PITA certifications.

Companies do that because they want to attract a certain kind of customer and have enough spare manpower and money to go through this all year long.

....or they want to hold a very sensitive data that requires *proven* processes, trainings and skills.

My firm has several of these and we have to keep full compliance team and *always* have some auditor on site.

No one does it just because.


YT would have to start declining in growth pretty substantially for that to be the case. All the 360p video from 2010-2015 probably doesn't take up even 1% of the storage new videos added in 2025.

True, it's more likely to be aimed at stemming the tide of 4k video that nobody watches - but luckily they're worth more than Disney right now so we don't have to confront that ... yet.

The census data you linked lists unemployment and underemployment for graduates aged 22-27. Assuming nontraditional graduates are a relatively small minority, that's a 5 year window after graduation.

I would find it believable, though not interesting, for only 11% of CS grads to have a local-median-pay, CS-related job locked in at graduation.


> And the similarities are striking. Now, I dont know whether the recommended novel is the training data, or its actually written by LLM. Or maybe its just how novelist writes.

For traditionally published works, it's trivial to exclude LLM-written content, just look for anything published before Nov 30, 2022.


Which is also a good filter for web searches to exclude a lot of garbage results (if the specific search makes sense for non-recent results)

Except many search engines have a recency bias.

Previously a sane default, since news and the status quo keep changing, but now it makes you even more likely to encounter slop.


Not sure how that changes the fact that you can filter by date range in searches where you don't actually need anything recent?

I think we are discussing the wrong problem here. I have no solution to offer, but I think the problem is not so much generated content, but the surroundings in which it can thrive and become the content you see everywhere.

If we hadn't removed the gatekeepers everywhere (and I know there are problems with them, too), then all that technology would not be able to do much harm.

It might also have to do with incentives. The incentives in our economy are not to help and advance society, the invisible hand notwithstanding.


Why stop with traditionally published works? Before dead-internet-day, very-nearly all forms of writing were guaranteed to be hand crafted, organic, and made with 100% Natural Intelligence.

The artificial stuff often has an odd taste, but boy it sure is quick and convenient.


Don't you remember the endless SEO spam that swamped the Net even before GPT, allegedly written by real humans?

You joke, but I bet every person in this forum, when presented with the choice between a bot-filled forum and a guaranteed human-only* forum, would go with the latter.

* this is a hypothetical scenario. I don't know any guaranteed human-only digital forums.


I converse enough with LLMs for research at this point that I feel I have a good enough structure to hop on/off them to primary sources and such, so I don't get annoyed with them too easily.

Whereas I haven't seriously reflected on my social media consumption habits for over 15 years, and over the years I'm getting more and more annoyed at social media.

Not to be a bit misanthropic, but there's something seriously wrong with my social media usage, especially when I know there's a real human on the other side, combined with ever increasing annoyance towards commenters and just the feelings I get after reading social media.

It may be dopamine / self-help related, but no actually, I think all of that is part of the issue (discovered that in high school when it was taking off). Something about the way I'm fundamentally interacting with the medium seems so horrible and icky the more I mature.


I agree with you, but as to your addendum:

Niche hobbyist forums are still safe, for now. There's just not enough commercial interest in petroleum lantern restoration to make it worth anyone's time to poison this particular well.

Even some larger niche hobbies like the saltwater aquarium community seem pretty safe for now (though it also helps that many forums have members who visit each other to trade corals and admire each other's tanks).


On the contrary! The dead-day theorem established earlier states that an 11/22 date filter is a necessary condition for verifiable human-only content, when filtered by content-creation date.

A weaker theorem can be postulated that any such filter provides a second order sufficient condition.

This means we can filter content by account creation date, for example, by hiding all posts and comments from accounts created after the digital death event. This won’t always guarantee human-only content but certainly more than otherwise.

But then we wouldn’t be having this most definitively human-to-human conversation, right?


Is the ChatGPT launch the "low background steel" date for writing?

What are the dates for images and video? Nano Banana Pro and Seedance 2.0?

And code? Opus 4.6?


It's not the launch of GPT, but probably about 4 or 4o that it really became solid. I also don't think video is there just yet, at least for video over 10 seconds.

Is it "solid" if people can read it and instantly know it's generated content?

No. But you can easily make and post content that is not easily detectable as generated.

You only notice plastic surgery when it's bad, but that doesn't mean all plastic surgery looks bad...


Who's "people"? The bottom X% (40%?) of the population is already falling for AI slop video scams, but before that, they were also falling for pig butchering and nigerian prince scams, so the "average" person benchmark has already been passed for text, photos, videos, etc. For more astute consumers, video isn't there yet.

There's also the question of whether people are even trying to disguise AI content, and how effective that disguise is. Are you or I missing the AI-generated text that just has a veneer of disguise on it?


>Who's "people"?

If you follow this thread up you will see the context is 'people who want to read content written by humans.'


Why does it matter when it "became solid"? There was plenty of slop generated with ChatGPT; that really was the turning point (because of public access).

I'm pretty sure their point is Dropbox is better as personal storage than file sharing between users.

> A lot of people read things, it changes their life, and their life is better. They may not even remember where they read these things. They don't produce citations all of the time. That's totally fine, and normal. I don't see LLMs as being any different. If I write an article about making code better, and ChatGPT trains on it, and someone, somewhere, needs help, and ChatGPT helps them? Win, as far as I'm concerned. Even if I never know that it's happened. I already do not hear from every single person who reads my writing.

Not a contradiction but an addendum: plenty of creative pursuits are not about functional value, or at least not primarily. If somebody writes a seemingly genuine blog post about their family trauma, and I as the reader find out it's made-up bullshit, that's abhorrent to me, whether or not AI is involved. And I think it would be perfectly fair for writers who do create similar but genuine content to find it abhorrent that they must compete with genAI, that genAI will slurp up their words, and that genAI's mere existence casts doubt on their own authenticity. It's not about money or social utility, it's about human connection.


The consent question gets weirder when agents have persistent memory. I run agents that accumulate context over weeks — beliefs extracted from observations, relationships with other agents. At what point does an agent's memory become its own work product vs. derivative of its training? There's no legal framework for that.

