More

kpw94 · 2025-12-16T20:45:59 1765917959

Very cool! And important for sure, thank you.

Few questions:

- is the stack to index those open source?

- is there some standardized APIs each municipality provides, or do you go through the tedious task of building a per-municipality crawling tool?

- how often do you refresh the data? Checked a city, it has meeting minutes until 6/17, but the official website has more recent minutes (up to 12/2 at least)

phildini · 2025-12-16T21:33:36 1765920816

Thanks for asking!

- The framework for crawling is open-source. https://github.com/civicband

- There is absolutely not a standardized API for nearly any of this. I build generalized crawlers when I can, and then build custom crawlers when I need.

- Can you let me know which city? The crawlers run for every municipality at least once every day, so that's probably a bug

kpw94 · 2025-12-09T21:40:02 1765316402

> I've personally decided to just rent systems with GPUs from a cloud provider and setup SSH tunnels to my local system.

That's a good idea!

Curious about this, if you don't mind sharing:

- what's the stack ? (Do you run like llama.cpp on that rented machine?)

- what model(s) do you run there?

- what's your rough monthly cost? (Does it come up much cheaper than if you called the equivalent paid APIs)

clusterhacks · 2025-12-09T22:46:12 1765320372

I ran ollama first because it was easy, but now download source and build llama.cpp on the machine. I don't bother saving a file system between runs on the rented machine, I build llama.cpp every time I start up.

I am usually just running gpt-oss-120b or one of the qwen models. Sometimes gemma? These are mostly "medium" sized in terms of memory requirements - I'm usually trying unquantized models that will easily run on an single 80-ish gb gpu because those are cheap.

I tend to spend $10-$20 a week. But I am almost always prototyping or testing an idea for a specific project that doesn't require me to run 8 hrs/day. I don't use the paid APIs for several reasons but cost-effectiveness is not one of those reasons.

Juminuvi · 2025-12-10T02:36:34 1765334194

I know you say you don't use the paid apis, but renting a gpu is something I've been thinking about and I'd be really interested in knowing how this compares with paying by the token. I think gpt-oss-120b is 0.10/input 0.60/output per million tokens in azure. In my head this could go a long way but I haven't used gpt oss agentically long enough to really understand usage. Just wondering if you know/be willing to share your typical usage/token spend on that dedicated hardware?

KronisLV · 2025-12-10T10:32:47 1765362767

For comparison, here's my own usage with various cloud models for development:

  * Claude in December: 91 million tokens in, 750k out
  * Codex in December: 43 million tokens in, 351k out
  * Cerebras in December: 41 million tokens in, 301k out
  * (obviously those figures above are so far in the month only)
  * Claude in November: 196 million tokens in, 1.8 million out
  * Codex in November: 214 million tokens in, 4 million out
  * Cerebras in November: 131 million tokens in, 1.6 million out
  * Claude in October: 5 million tokens in, 79k out
  * Codex in October: 119 million tokens in, 3.1 million out

As for Cerebras in October, I don't have the data because they don't show the Qwen3 Coder model that was deprecated, but it was way more: https://blog.kronis.dev/blog/i-blew-through-24-million-token...

In general, I'd say that for the stuff I do my workloads are extremely read heavy (referencing existing code, patterns, tests, build and check script output, implementation plans, docs etc.), but it goes about like this:

  * most fixed cloud subscriptions will run out really quickly and will be insufficient (Cerebras being an exception)
  * if paying per token, you *really* want the provider to support proper caching, otherwise you'll go broke
  * if you have local hardware that is great, but it will *never* compete with the cloud models, so your best bet is to run something good enough, basically cover all of your autocomplete needs, and also with tools like KiloCode an advanced cloud model can do the planning and a simpler local model do the implementation, then the cloud model validate the output

adam_patarino · 2025-12-12T16:12:38 1765555958

This is the perfect use case for local models. It's why we set out to create cortex.build! A local LLM

clusterhacks · 2025-12-10T14:32:28 1765377148

Sorry, I don't much track or keep up with those specifics other than knowing I'm not spending much per week. My typical scenario is to spin up an instance that costs less than $2/hr for 2-4 hours. It's all just exploratory work really. Sometimes I'm running a script that is making a call to the LLM server api, other times I'm just noodling around in the web chat interface.

bigiain · 2025-12-10T01:19:21 1765329561

I don't suppose you have (or would be interested in writing) a blog post about how you set that up? Or maybe a list of links/resources/prompts you used to learn how to get there?

clusterhacks · 2025-12-10T02:22:19 1765333339

No, I don't blog. But I just followed the docs for starting an instance on lambda.ai and the llama.cpp build instructions. Both are pretty good resources. I had already setup an SSH key with lambda and the lambda OS images are linux pre-loaded with CUDA libraries on startup.

Here are my lazy notes + a snippet of the history file from the remote instance for a recent setup where I used the web chat interface built into llama.cpp.

I created an instance gpu_1x_gh200 (96 GB on ARM) at lambda.ai.

connected from terminal on my box at home and setup the ssh tunnel.

ssh -L 22434:127.0.0.1:11434 ubuntu@<ip address of rented machine - can see it on lambda.ai console or dashboard>

  Started building llama.cpp from source, history:    
     21  git clone   https://github.com/ggml-org/llama.cpp
     22  cd llama.cpp
     23  which cmake
     24  sudo apt list | grep libcurl
     25  sudo apt-get install libcurl4-openssl-dev
     26  cmake -B build -DGGML_CUDA=ON
     27  cmake --build build --config Release

MISTAKE on 27, SINGLE-THREADED and slow to build see -j 16 below for faster build

     28  cmake --build build --config Release -j 16
     29  ls
     30  ls build
     31  find . -name "llama.server"
     32  find . -name "llama"
     33  ls build/bin/
     34  cd build/bin/
     35  ls
     36  ./llama-server -hf ggml-org/gpt-oss-120b-GGUF -c 0 --jinja

MISTAKE, didn't specify the port number for the llama-server

     37  clear;history
     38  ./llama-server -hf Qwen/Qwen3-VL-30B-A3B-Thinking -c 0 --jinja --port 11434
     39  ./llama-server -hf Qwen/Qwen3-VL-30B-A3B-Thinking.gguf -c 0 --jinja --port 11434
     40  ./llama-server -hf Qwen/Qwen3-VL-30B-A3B-Thinking-GGUF -c 0 --jinja --port 11434
     41  clear;history

I switched to qwen3 vl because I need a multimodal model for that day's experiment. Lines 38 and 39 show me not using the right name for the model. I like how llama.cpp can download and run models directly off of huggingface.

Then pointed my browser at http//:localhost:22434 on my local box and had the normal browser window where I could upload files and use the chat interface with the model. That also gives you an openai api-compatible endpoint. It was all I needed for what I was doing that day. I spent a grand total of $4 that day doing the setup and running some NLP-oriented prompts for a few hours.

bigiain · 2025-12-10T07:17:02 1765351022

Thanks, much appreciated.

kpw94 · 2025-11-12T18:19:36 1762971576

> $1 and $2 coins in wide circulation (instead of worn-out $1 bills).

This has its own pros/cons...

One advantage of $1 bill over coin is the majority of people in US don't need a wallet with zipper to hold coins. Five $1 bills is much less bulky and much lighter than five $1 CAD or five 1€ coins

wasabi991011 · 2025-11-13T00:15:36 1762992936

Of course everything has its pros and cons, but not all of them are worth considering.

The amount of wallets with zipper is a country is not worth considering when discussing what coins should be minted.

jamincan · 2025-11-13T13:40:36 1763041236

I would contend that 5 bills are more bulky than 5 coins. The only upside of dealing with US bills when travelling in the US is that you feel like a millionaire when you pull out the massive wad of bills from your pocket.

kpw94 · 2025-11-10T23:52:17 1762818737

> one can absolutely check the text to remove all occurrences of Indiana Jones

How do you handle this kind of prompt:

“Generate an image of a daring, whip-wielding archaeologist and adventurer, wearing a fedora hat and leather jacket. Here's some back-story about him: With a sharp wit and a knack for languages, he travels the globe in search of ancient artifacts, often racing against rival treasure hunters and battling supernatural forces. His adventures are filled with narrow escapes, booby traps, and encounters with historical and mythical relics. He’s equally at home in a university lecture hall as he is in a jungle temple or a desert ruin, blending academic expertise with fearless action. His journey is as much about uncovering history’s secrets as it is about confronting his own fears and personal demons.”

Try copy-pasting it in any image generation model. It looks awfully like Indiana Jones for all my attempts, yet I've not referenced Indiana Jones even once!

runeblaze · 2025-11-11T09:17:11 1762852631

Emmmm sure, but throw this to a human artist who has not heard of Indiana Jones and see if they draw something alike.

kpw94 · 2025-10-20T23:27:25 1761002845

Nice!

So the cache check tries to find if a previously existing text embedding has >0.8 match with the current text.

If you get a cache hit here, iiuc, you return that matched' text label right away. But do you also insert a text embedding of the current text in the text embeddings table? Or do you only insert it in case of cache miss?

From reading the GitHub readme it seems you only "store text embedding for future lookups" in the case of cache miss. This is by design to keep the text embedding table not too big?

frenchmajesty · 2025-10-20T23:42:52 1761003772

Op here. Yes that's right. We do also insert the current text embedding on misses to expand the boundaries of the cluster.

For instance: I love McDonalds (1). I love burgers. (0.99) I love cheeseburgers with ketchup (?).

This is a bad example but in this case the last text could end up right at the boundary of the similarity to that 1st label if we did not store the 2nd, which could cause a cluster miss we don't want.

We only store the text on cache misses, though you could do both. I had not considered that idea but it make sense. I'm not very concerned about the dataset size because vector storage is generally cheap (~ $2/mo for 1M vectors) and the savings in $$$ not spend generating tokens covers for that expense generously.

kpw94 · 2025-09-21T02:39:07 1758422347

A workaround: Long press on image, "open image in a new tab"

kpw94 · 2025-09-02T17:38:13 1756834693

Yeah the landscpe when there were many more Search engines must have been exactly the same...

I think the eng teams behind those were just more competent / more frugal on their processing.

And since there wasn't any AWS equivalent, they had to be better citizens since well-known IP range ban for the crawled websites was trivial.

danudey · 2025-09-02T21:20:55 1756848055

It's worth noting that search engines back then (and now? except the AI ones) generally tended to follow robots.txt, which meant that if there were heavy areas of your site that you didn't want them to index you could filter them out and let them just follow static pages. You could block off all of /cgi-bin/ for example and then they would never be hitting your CGI scripts - useful if your guestbook software wrote out static files to be served, for example.

The search engines were also limited in resources, so they were judicious about what they fetched, when, and how often; optimizing their own crawlers saved them money, and in return it also saved the websites too. Even with a hundred crawlers actively indexing your site, they weren't going to index it more than, say, once a day, and 100 requests in a day isn't really that much even back then.

Now, companies are pumping billions of dollars into AI; budgets are infinite, limits are bypassed, and norms are ignored. If the company thinks it can benefit from indexing your site 30 times a minute then it will, but even if it doesn't benefit from it there's no reason for them to stop it from doing so because it doesn't cost them anything. They cannot risk being anything other than up-to-date, because if users are coming to you asking about current events and why space force is moving to Alabama and your AI doesn't know but someone else's does, then you're behind the times.

So in the interests of maximizing short-term profit above all else - which is the only thing AI companies are doing in any way shape or form - they may as well scrape every URL on your site once per second, because it doesn't cost them anything and they don't care if you go bankrupt and shut down.

acdha · 2025-09-02T19:18:46 1756840726

Bandwidth cost more then, so the early search engines had an inventive not to massively increase their own costs if nothing else.

ccgreg · 2025-09-02T17:55:06 1756835706

The blekko search engine index was only 1 billion pages, compared to Common Crawl Foundation's crawl of 3 billion webpages per month.

kpw94 · 2025-06-10T06:40:57 1749537657

> military forces exist to protect the country from existential threats — such as an invasion or rebellion — not to enforce the law.

serious question: are Countries such as Italy, France etc not a democracy?

All of them are, verbatim from wikipedia, "a military force with law enforcement duties among the civilian population.". Ditto for spain Guardia Civil, and many of the countries listed in that same wiki page: Algeria, Netherlands, Poland, Argentina, Romania, Turkey, Ukraine, Chile, France, Italy, Portugal, Spain, ...

https://en.wikipedia.org/wiki/Gendarmerie

the_gipsy · 2025-06-10T10:12:33 1749550353

Having police not separated from military doesn't invalidate the democracy, it just makes it easier to subvert democracy at some point.

The spanish Guardia Civil is a very good example of a police force tied too deeply with the military. In 1981 some parts of the force attempted an actual coup, with one guy entering the parliament and shooting in the air (or ceiling).

https://en.wikipedia.org/wiki/1981_Spanish_coup_attempt

The continuity of the Guardia Civil after Franco's dictatorship is one of many vestiges that has not been removed due to fears of creating an instability leading to some coup and a reversal to fascism. IMHO this may have been justified the years immediately after Franco's death, but should have been addressed at some point. See the 1981 coup as for why "appeasing" the oppressors usually doesn't work out, or even works out for the oppressors.

anthk · 2025-06-10T12:02:43 1749556963

The Guardia Civil itself predates Franco, and to be fair some GC agents fought for the Republican side in the war.

the_gipsy · 2025-06-10T12:41:41 1749559301

True. But AFAIK they were a crucial element of the regime's oppression, especially in rural areas.

Their logo even today still contains a fasces[1] shield, which as been added during the Franco regime.

[1]: https://en.wikipedia.org/wiki/Fasces

forty · 2025-06-10T06:59:16 1749538756

Gendarmerie are simply policemen with a military status which give them some duty (like I think they cannot strike) and some benefits (earlier retirement) but they are still really a police force in reality. I don't think it would look good to send actual army to fight citizens, and I don't think the army would appreciate it either (it might have been done already, no idea)

Y_Y · 2025-06-10T07:43:58 1749541438

What you say is true, but I'd add that Gendarms/Guardia Civil/Carabinieri etc.; tend to hang around carrying big guns, are responsible to the country as a whole (rather than the local community), are under the relevant defence ministry (while also reporting to the interior ministry).

In my experience they don't act at all like normal cops, and sometimes can be in conflict with them. The only interactions I ever hear of with citizens is if they beat the shit out of someone. You're not going to be going to them for a lost phone or a cat in a tree.

vladvasiliu · 2025-06-10T08:12:56 1749543176

I don't know about the other forces mentioned here, but the French Gendarmerie are pretty much "regular police" as far as the people are concerned. The main difference with "actual regular police" is that they tend to operate in sparsely populated areas instead of large cities.

But they absolutely will do traffic police on highways, intervene to reason with a loud neighbor, etc. They'll also routinely show up during large protests in big cities.

The "big-gun carrying" Gendarmerie is a special unit, the GIGN, probably akin to US' SWAT teams. They'll intervene when "very dangerous" people are involved, think hostage situations or the like. "Regular police" also has a similar outfit.

Y_Y · 2025-06-10T10:27:54 1749551274

Thank you for the correction. Indeed the main force of the French Gendarmerie (Gendarmerie Départementale) is much more like a "regular" police force than I described.

The unit I was confusing with the Gendarmerie as a whole was the Mobile Gendarmerie, whose role is more similar to the the Guardia Civil and Carabinieri.

https://en.wikipedia.org/wiki/Mobile_Gendarmerie

I wouldn't have included GIGN, since I they appear to be much smaller and have a more "special”/"tactical" role.

I'll also note that the the Gendarmerie don't appear to be sending a team to the AWC (the olympics of smashing through the ceiling and shooting you in your bed) in two weeks, whereas the Guardia Civil and Carabinieri will. This may be a geopolitical thing though.

https://www.kasotc.com/14th-annual-warrior-competition

seadan83 · 2025-06-10T19:17:42 1749583062

Lived in Paris 30 years ago, my experience:

Seeing Gens D'Armes on the street was somewhat common. The Gens D'Armes are akin to 'heavy' police and are a show of force. The Gens D'Armes were pretty common to see in the subways, airports, and/or just on patrol. They were Gens D'Armes stations in the city just how there were also regular police stations. Gens D'Armes patrols were a bit distinct from other police patrols, almost always larger groups, around 5 to 7 people with long-guns and plate carriers. Meanwhile regular police had much lighter weapons, no body armor, and very rarely were in groups of more than 2 or 3.

vladvasiliu · 2025-06-10T20:09:29 1749586169

Times have changed. Nowadays, the gendarmes only show up when protests are expected to turn into rioting (so basically most of them). You don't see them around Paris in day to day life. We now have actual military patrolling the streets, "Operation Sentinelle". They're supposed to show some muscle to discourage terrorism. They are actual military, with actual military weapons. This has been going on for multiple years, I don't remember when it started.

However, regular police now wear bulletproof vests, too, even when randomly patrolling the streets. Since some years ago, we now have "municipal police", basically police which answer to the mayor [0], as opposed to the state, with somewhat fewer powers. But even they walk around with bullet-proof vests.

---

[0] In France, "the police" usually means "Police Nationale", which answers to the Prefect, who represents the State in the local Jurisdiction (département) – they are not elected, but appointed by the Interior Ministry. The "Municial police" answers to the City, but they're not allowed to conduct all the operations that the National Police do. The City means the Mayor, who's elected by the local population.

lloeki · 2025-06-13T16:30:06 1749832206

> The Gens D'Armes are akin to 'heavy' police and are a show of force

I've only seen that when they show up as support for or operating in a similar role as CRS† (crowd control, security for major events) which indeed would be Gendarmerie Mobile but that's a far cry from the range of operational responsibilities of Gendarmerie as a whole.

Turns out this is probably what city dwellers in France would only see of Gendarmerie, because Police Nationale and Municipale (city) typically have much more presence in cities than countryside, and the other way around for Gendarmerie.

† https://en.wikipedia.org/wiki/Compagnies_Républicaines_de_Sé...

closewith · 2025-06-10T07:16:30 1749539790

That is not universally true. A Gendarmerie is literally a military force with law enforcement duties and many are exactly that.

In the Netherlands, the Royal Marechaussee are literal soldiers who perform military police duties and also many civilian policing duties, but all of them are soldiers first.

close04 · 2025-06-10T08:18:08 1749543488

> A Gendarmerie is literally a military force with law enforcement duties

The second part is a huge differentiator from "normal" military. A police force even if administratively under the military has one crucial differentiator: their daily duties and training revolve almost exclusively around policing civilians from the same country. Military training and tactics are overwhelmingly aimed at dealing with foreign enemy combatants, mainly other military forces.

The methods give away the intentions and expected outcome. The US already has a very "militarized" police force. You send actual military only if you want to inflict the maximum amount of damage, and with that threat overwhelmingly scare the country into compliance.

closewith · 2025-06-10T08:35:34 1749544534

> their daily duties and training revolve almost exclusively around policing mainly civilians, citizens of the same country.

That is the part that is not universally true. There are plenty of Gendarmeries who are soldiers first, with combat training and ethos, who also perform policing duties, the Marechaussee included.

close04 · 2025-06-10T09:05:41 1749546341

> plenty of Gendarmeries who are soldiers first

Fair enough, but Wikipedia confirms that they all have civilian law enforcement and police duties so clearly their training, tactics, and experience revolve heavily around dealing with civilians.

I'll still take that over "soldiers only", even more with US's very active military where the soldiers routinely see active combat. Both the theory and practice shapes their "soldier vs. enemy combatant" world view. That's a hammer if I've ever seen one.

davedx · 2025-06-10T10:04:05 1749549845

It's not the same though:

* when used domestically, it's under the Minister of Justice and Security

* there's also no Dutch equivalent of the U.S. presidency with unilateral executive control over the military

I'd argue this kind of danger is something you get more in presidential systems. Not that we all shouldn't be wary of military forces within our civilian populations.

forty · 2025-06-10T09:42:40 1749548560

Yes, sorry, I was answering only regarding the French gendarmerie, which I thought was made clear by the fact it's a French word but it turns out to be used more broadly.

aredox · 2025-06-10T09:06:07 1749546367

Superficial argument. The "gendarmerie" is exclusively trained in law enforcement. The military aspect is only relative to organisational aspects.

jxjnskkzxxhx · 2025-06-10T09:03:47 1749546227

In Portugal, the Guarda Civil are cops in rural areas. I have no special insight into their training or hierarchy, but I can tell you that in practice they interact with the population like cops, not like soldiers. E.g. you wouldn't report shoplifting to the army, but you can report to the Guarda Civil.

So I don't think your comment makes any sense, at least in Portugal.

tiagod · 2025-06-10T15:19:46 1749568786

There is no "Guarda Civil" in Portugal. It's called Guarda Nacional Republicana (GNR).

jxjnskkzxxhx · 2025-06-10T16:08:08 1749571688

I haven't lived there in almost 15 years. I stand corrected. In fact I'm closer in time to having lived in Spain than in Portugal, that must be the origin of my confusion.

In any case, I hope you agree my description of the GNR was accurate in substance.

tiagod · 2025-06-10T16:59:40 1749574780

Yes you are correct. They also patrol some highways (although I believe some are the jurisdiction of PSP)

AnimalMuppet · 2025-06-10T13:21:55 1749561715

If the US has laws that forbid that, and other nations have laws that establish that, then the US military being used for police activities is threatening to democracy - or at least to the rule of law - in a way that it is not threatening in other countries.

Other countries can do that if they want. It may or may not be a threat to them. But in the US, it's absolutely a threat to democracy, because it's already the executive deploying the military against the law.

JumpCrisscross · 2025-06-10T06:44:54 1749537894

> serious question: are Countries such as Italy, France etc not a democracy?

They are, but not in the the "framework of US constitutional democracy." A system for which we have more evidence of stability than either of Italy or France's modern republics. (Note, too, les gendarmes' heritage: imperial France. Also, gendarmes aren't usually deployed overseas. They are, in a sense, more similar to the FBI than the U.S. Marines.)

gabaix · 2025-06-10T07:34:30 1749540870

I have always found confusing the existence of the gendarmes. They are indeed a vestigial force of the XIXth century, and should be transformed into a regular police force.

aredox · 2025-06-10T09:13:12 1749546792

On the contrary, they are more relevant than ever in today's era of peacekeeping and anti-terrorism activities. They are fundamental to the stabilisation of the Balkans, for example. They fill the gap between full war and "normal" (punctual) criminality.

gabaix · 2025-06-10T09:59:35 1749549575

The issues are two-fold

1- the territorial split between gendarmerie/police within the French territory

2- the fact the gendarmes for police work report to the Ministry of Defense.

If one had to design the police system from crash, they would likely merge police and gendarmes for police work.

BrandoElFollito · 2025-06-10T13:32:13 1749562333

You forgot 3: a hatred between the organizations for ego reasons (not everyone, not everywhere).

The split is nonsense today.

eldgfipo · 2025-06-10T07:11:20 1749539480

As a French, I'd argue we're a flawed democracy. Shame on us when we compare ourselves to Scandinavian countries.

dontlaugh · 2025-06-10T09:17:19 1749547039

Those are bad too. Anyone that grew up in a country with a gendarmerie knows they are the most violent, unpleasant and fascist (personally, not like "all cops are fascist") people you’ll ever meet.

hotmeals · 2025-06-10T08:07:00 1749542820

Some of the cases you mention involve "military" police who are under the authority of the Ministry of the Interior, instead of the Ministry of Defense. Many also are not the only police force, in Chile the investigative duties fall to the non-military PDI.

IMO as Chilean, it's a pretty bad thing democratically, for both historical (dictatorship) and more recent reasons. Still, there is a clear difference between when the police with deep ties to the army enforce the law and when actual troops do it.

While copper Gutiérrez and grunt Herrera both technically have the rank of corporal, one mostly writes tickets, deals with noise complaints, and has riot training, while the other only knows how to march and shoot an assault rifle.

The actually important thing is that this is testing the waters. Trump will use the troops for flimsier and flimsier reasons.

NOTE: Chilean police are semi-routinely brutal; this is not an endorsement.

kpw94 · 2025-05-28T18:32:05 1748457125

Books have a back cover for that reason: so you can read it before buying.

Long-form articles could have a back cover summary too, or an enticing intro... and some substack paid articles do that already: they let you read an intro and cut before going in the interesting details.

But for short newspapers articles it becomes harder to do based on topic. If the summary has to give out 90% of the information to not be too vague, you may then feel robbed paying for it once you realize the remaining 10% wasn't that useful.

jaredwiener · 2025-05-28T19:32:20 1748460740

Not to mention, the reporting that went into the headline or blurb is what is expensive. You got the value by reading it for free.

https://blog.forth.news/a-business-model-for-21st-century-ne...

kpw94 · 2025-05-02T15:06:04 1746198364

So first linux distribution was this one Feb 1992.

And first linux distribution with a GUI was "TAMU linux", 3 months later: https://lwn.net/Articles/91371/

Both were released by universities