Hacker News | macinjosh's comments

We need something like SETI@home/Folding@home but for crawling and archiving the web or maybe something as simple as a browser extension that can (with permission) archive pages you view.

This exists, although not in the traditional BOINC space: it's ArchiveTeam^1. I run two of their warrior^2 instances in my home k3s instance via the docker images. One of them is set to the "Team's choice", where it spends most of its time downloading Telegram chats. However, when they need the firepower for sites with imminent risk of closure, it will switch itself to those. The other one is set to their URL shortener project, "Terror of Tiny Town"^3.

Their big requirement is you need to not be doing any DNS filtering or blocking of access to what it wants, so I've got the pod DNS pointed to the unfiltered quad9 endpoint and rules in my router to allow the machine it's running on to bypass my PiHole enforcement+outside DNS blocks.

^1 https://wiki.archiveteam.org/

^2 https://wiki.archiveteam.org/index.php/ArchiveTeam_Warrior

^3 https://wiki.archiveteam.org/index.php/URLTeam


In the US at least, there is no expectation of privacy in public. Why should these websites that are public-facing get an exemption from that? Serving up content to the public should imply archivability.

Sometimes it feels like AI-use concerns are a guise to diminish the public record. Meanwhile, services like Ring or Flock are archiving the public forever.


Ring and Flock are not a standard we should be striving towards. Their massive databases tracking citizens need to go.

Your TV probably does that, and you definitely gave it permission when you clicked "accept" on the terms.

good thing I don't have a TV!

I run an ArchiveBox instance locally. Recommended! https://archivebox.io/

This is a good idea. Not sure what ToS it would violate. But a good idea.

The irony of an intelligence agency publishing a "fact book" in the first place is thick.

Why? It's an excellent recruiting tool. I used to read it as a kid (along with every other paper or digital encyclopedia I could get my hands on), and it certainly made me interested in the CIA.

Why?

Because intelligence agencies generally have a vested interest in spreading subtle propaganda, such as by distorting facts.

Now, I have yet to see any cases of the CIA doing this to the World Factbook, since that would tank its credibility, but I also don't browse the Factbook too often.


You are looking at the trees, and missing the forest. The subtle propaganda that the Factbook exists to spread is “the CIA is a neutral and trustworthy gatherer and purveyor of facts”.

I think that’s a secondary or even tertiary goal. The primary goal is to provide a public service to public and private parties who want to become better informed and to show the American people that their tax dollars are at work and reduce the risk of having their funding get cut.

The part before the “and” is the how of the propaganda I described, the part after the “and” is one of the outcomes the propaganda is intended to influence; neither is an alternative to the propaganda function.

I think the problem is people are acting like propaganda is inherently bad, so this subconsciously comes across as “the CIA is problematic because they have a source of factlettes for people to peruse”.

They have multiple competing interests. One of their interests is telling the truth to their local military and politicians - getting caught in a lie to their side is the worst that could happen to them.

The World Factbook was mostly things that the military or politicians might care about the truth of, and data they need anyway. Most of what is there is material where there wouldn't be much value in spreading lies, and whatever value lies might have is outweighed by the fact that you can fact-check everything (with a lot of work), so lies are likely to be caught.

Not saying they are perfect, but this isn't a place where I would expect them to see distorting facts as helpful.


> One of their interests is telling the truth to their local military and politicians - getting caught in a lie to their side is the worst that could happen to them.

It's definitely not the worst that can happen. It happens fairly often - google: CIA lying to Congress. Getting audited is the worst thing that can happen to the CIA. For context, the U.S. Government Accountability Office (GAO) last actively audited the Central Intelligence Agency (CIA) in the early 1960s, discontinuing such work around 1962.


The worst that can happen is Congress gets interested in a way that cuts their budget. An audit is one potential step on that path.

Fair

I wrote about one such case in another discussion: https://news.ycombinator.com/item?id=46901003

I remember a few amusing examples which weren't strictly inaccurate but were pretty blatant official lines, like how the US uniquely got to stress a "strong democratic tradition" as its political system, whereas everywhere else in the Western world was just "parliamentary democracy" or "constitutional monarchy" and at least the Cold War era versions had a "Communists" line item which purported to identify how few people in democratic societies were members of Communist parties...

The degradation does not need to be in the inference; it can be in how often inference is used.

It is closed source, but the algorithms that decide what Claude Code does, and when, could behave differently when the API responses are slower. Maybe it does fewer investigatory greps or performs fewer tasks to get to “an” answer faster and with less load.


WTF is a harness issue? You have to be more clear.


The issue is unrelated to the foundation model; rather, it is in the prompts and tool calling that wrap the model.


Just ten more years! Take my money!


Yea that's right, 10 years for you in your car. Right now, for people who own Teslas.


Unless it’s icy or rainy or you’re on a dirty road and the cameras can’t see anything. Enjoy getting out to wash your camera lenses off every quarter mile.


Apple's locked down ecosystem is enabling the rollout of Digital ID which will eventually be required for Internet access and age verification law. This is why Google is locking down their ecosystem now too.


Call me what you want, but it is my belief that the reason Google is locking down and Apple refuses to budge is that in the near future our mobile devices will become our identity online and in public.

Apple already offers digital ID in some states. They can do this partly because they can guarantee to the gov’t the ID is genuine because the user cannot modify the system.

Google needs to be able to do the same thing.

Age verification laws for online services will actually require something like a digital ID and Apple and Google want to be the providers.


Wild to me that these two companies have always been at odds, and it is playing out on an even bigger stage now.


They have not always been at odds. CUDA was even supported on Mac OS X back in the day.


Git is already distributed. We don’t need a hub for it. Just stop using GitHub; it is a Microsoft product.

Not sure how open source got bamboozled into paying rent to Microsoft of all companies.


The SDK bundles Claude Code and uses it for its agentic work. The SDK really only lets you control the UI layer. It also doesn’t yet fully support plan mode.


I use the SDK in my app and it works fine with plan mode. I don't deal with auth at all. I detect if the CLI is installed and it just reuses whatever auth the user has already setup. Works fine.
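For anyone wondering what the "detect if the CLI is installed" step might look like, here is a minimal sketch, not the actual code from the app described above. It assumes the CLI binary is called `claude` and is reachable on `PATH`; the function name `find_cli` is made up for illustration. It only checks for the executable and does not touch auth in any way.

```python
import shutil
from typing import Optional

# Assumption: the CLI is installed under this executable name.
CLI_NAME = "claude"


def find_cli(name: str = CLI_NAME) -> Optional[str]:
    """Return the path of the executable if it is found on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_cli()
    if path is None:
        print(f"{CLI_NAME} not found on PATH; ask the user to install it first")
    else:
        print(f"found {CLI_NAME} at {path}")
```

If the lookup returns `None`, the app can show an install prompt instead of trying to start a session.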


> I detect if the CLI is installed and it just reuses whatever auth the user has already setup.

Isn't this what they just explicitly banned?


No, they banned use of the model without the CLI harness/SDK when using the subscription plans. Opencode was spoofing requests as if they were coming from the Claude Code CLI, and controlling the agent loop / tool calls totally internally. Anthropic wants subscription plans to use the CLI/SDK.

