- is there some standardized APIs each municipality provides, or do you go through the tedious task of building a per-municipality crawling tool?
- how often do you refresh the data? Checked a city, it has meeting minutes until 6/17, but the official website has more recent minutes (up to 12/2 at least)
- There is absolutely not a standardized API for nearly any of this. I build generalized crawlers when I can, and then build custom crawlers when I need.
- Can you let me know which city? The crawlers run for every municipality at least once every day, so that's probably a bug
I ran ollama first because it was easy, but now download source and build llama.cpp on the machine. I don't bother saving a file system between runs on the rented machine, I build llama.cpp every time I start up.
I am usually just running gpt-oss-120b or one of the qwen models. Sometimes gemma? These are mostly "medium" sized in terms of memory requirements - I'm usually trying unquantized models that will easily run on an single 80-ish gb gpu because those are cheap.
I tend to spend $10-$20 a week. But I am almost always prototyping or testing an idea for a specific project that doesn't require me to run 8 hrs/day. I don't use the paid APIs for several reasons but cost-effectiveness is not one of those reasons.
I know you say you don't use the paid apis, but renting a gpu is something I've been thinking about and I'd be really interested in knowing how this compares with paying by the token. I think gpt-oss-120b is 0.10/input 0.60/output per million tokens in azure. In my head this could go a long way but I haven't used gpt oss agentically long enough to really understand usage. Just wondering if you know/be willing to share your typical usage/token spend on that dedicated hardware?
For comparison, here's my own usage with various cloud models for development:
* Claude in December: 91 million tokens in, 750k out
* Codex in December: 43 million tokens in, 351k out
* Cerebras in December: 41 million tokens in, 301k out
* (obviously those figures above are so far in the month only)
* Claude in November: 196 million tokens in, 1.8 million out
* Codex in November: 214 million tokens in, 4 million out
* Cerebras in November: 131 million tokens in, 1.6 million out
* Claude in October: 5 million tokens in, 79k out
* Codex in October: 119 million tokens in, 3.1 million out
In general, I'd say that for the stuff I do my workloads are extremely read heavy (referencing existing code, patterns, tests, build and check script output, implementation plans, docs etc.), but it goes about like this:
* most fixed cloud subscriptions will run out really quickly and will be insufficient (Cerebras being an exception)
* if paying per token, you *really* want the provider to support proper caching, otherwise you'll go broke
* if you have local hardware that is great, but it will *never* compete with the cloud models, so your best bet is to run something good enough, basically cover all of your autocomplete needs, and also with tools like KiloCode an advanced cloud model can do the planning and a simpler local model do the implementation, then the cloud model validate the output
Sorry, I don't much track or keep up with those specifics other than knowing I'm not spending much per week. My typical scenario is to spin up an instance that costs less than $2/hr for 2-4 hours. It's all just exploratory work really. Sometimes I'm running a script that is making a call to the LLM server api, other times I'm just noodling around in the web chat interface.
I don't suppose you have (or would be interested in writing) a blog post about how you set that up? Or maybe a list of links/resources/prompts you used to learn how to get there?
No, I don't blog. But I just followed the docs for starting an instance on lambda.ai and the llama.cpp build instructions. Both are pretty good resources. I had already setup an SSH key with lambda and the lambda OS images are linux pre-loaded with CUDA libraries on startup.
Here are my lazy notes + a snippet of the history file from the remote instance for a recent setup where I used the web chat interface built into llama.cpp.
I created an instance gpu_1x_gh200 (96 GB on ARM) at lambda.ai.
connected from terminal on my box at home and setup the ssh tunnel.
ssh -L 22434:127.0.0.1:11434 ubuntu@<ip address of rented machine - can see it on lambda.ai console or dashboard>
Started building llama.cpp from source, history:
21 git clone https://github.com/ggml-org/llama.cpp
22 cd llama.cpp
23 which cmake
24 sudo apt list | grep libcurl
25 sudo apt-get install libcurl4-openssl-dev
26 cmake -B build -DGGML_CUDA=ON
27 cmake --build build --config Release
MISTAKE on 27, SINGLE-THREADED and slow to build see -j 16 below for faster build
28 cmake --build build --config Release -j 16
29 ls
30 ls build
31 find . -name "llama.server"
32 find . -name "llama"
33 ls build/bin/
34 cd build/bin/
35 ls
36 ./llama-server -hf ggml-org/gpt-oss-120b-GGUF -c 0 --jinja
MISTAKE, didn't specify the port number for the llama-server
I switched to qwen3 vl because I need a multimodal model for that day's experiment. Lines 38 and 39 show me not using the right name for the model. I like how llama.cpp can download and run models directly off of huggingface.
Then pointed my browser at http//:localhost:22434 on my local box and had the normal browser window where I could upload files and use the chat interface with the model. That also gives you an openai api-compatible endpoint. It was all I needed for what I was doing that day. I spent a grand total of $4 that day doing the setup and running some NLP-oriented prompts for a few hours.
> $1 and $2 coins in wide circulation (instead of worn-out $1 bills).
This has its own pros/cons...
One advantage of $1 bill over coin is the majority of people in US don't need a wallet with zipper to hold coins. Five $1 bills is much less bulky and much lighter than five $1 CAD or five 1€ coins
I would contend that 5 bills are more bulky than 5 coins. The only upside of dealing with US bills when travelling in the US is that you feel like a millionaire when you pull out the massive wad of bills from your pocket.
> one can absolutely check the text to remove all occurrences of Indiana Jones
How do you handle this kind of prompt:
“Generate an image of a daring, whip-wielding archaeologist and adventurer, wearing a fedora hat and leather jacket. Here's some back-story about him: With a sharp wit and a knack for languages, he travels the globe in search of ancient artifacts, often racing against rival treasure hunters and battling supernatural forces. His adventures are filled with narrow escapes, booby traps, and encounters with historical and mythical relics. He’s equally at home in a university lecture hall as he is in a jungle temple or a desert ruin, blending academic expertise with fearless action. His journey is as much about uncovering history’s secrets as it is about confronting his own fears and personal demons.”
Try copy-pasting it in any image generation model. It looks awfully like Indiana Jones for all my attempts, yet I've not referenced Indiana Jones even once!
So the cache check tries to find if a previously existing text embedding has >0.8 match with the current text.
If you get a cache hit here, iiuc, you return that matched' text label right away. But do you also insert a text embedding of the current text in the text embeddings table? Or do you only insert it in case of cache miss?
From reading the GitHub readme it seems you only "store text embedding for future lookups" in the case of cache miss. This is by design to keep the text embedding table not too big?
Op here. Yes that's right. We do also insert the current text embedding on misses to expand the boundaries of the cluster.
For instance: I love McDonalds (1). I love burgers. (0.99) I love cheeseburgers with ketchup (?).
This is a bad example but in this case the last text could end up right at the boundary of the similarity to that 1st label if we did not store the 2nd, which could cause a cluster miss we don't want.
We only store the text on cache misses, though you could do both. I had not considered that idea but it make sense. I'm not very concerned about the dataset size because vector storage is generally cheap (~ $2/mo for 1M vectors) and the savings in $$$ not spend generating tokens covers for that expense generously.
It's worth noting that search engines back then (and now? except the AI ones) generally tended to follow robots.txt, which meant that if there were heavy areas of your site that you didn't want them to index you could filter them out and let them just follow static pages. You could block off all of /cgi-bin/ for example and then they would never be hitting your CGI scripts - useful if your guestbook software wrote out static files to be served, for example.
The search engines were also limited in resources, so they were judicious about what they fetched, when, and how often; optimizing their own crawlers saved them money, and in return it also saved the websites too. Even with a hundred crawlers actively indexing your site, they weren't going to index it more than, say, once a day, and 100 requests in a day isn't really that much even back then.
Now, companies are pumping billions of dollars into AI; budgets are infinite, limits are bypassed, and norms are ignored. If the company thinks it can benefit from indexing your site 30 times a minute then it will, but even if it doesn't benefit from it there's no reason for them to stop it from doing so because it doesn't cost them anything. They cannot risk being anything other than up-to-date, because if users are coming to you asking about current events and why space force is moving to Alabama and your AI doesn't know but someone else's does, then you're behind the times.
So in the interests of maximizing short-term profit above all else - which is the only thing AI companies are doing in any way shape or form - they may as well scrape every URL on your site once per second, because it doesn't cost them anything and they don't care if you go bankrupt and shut down.
> military forces exist to protect the country from existential threats — such as an invasion or rebellion — not to enforce the law.
serious question: are Countries such as Italy, France etc not a democracy?
All of them are, verbatim from wikipedia, "a military force with law enforcement duties among the civilian population.". Ditto for spain Guardia Civil, and many of the countries listed in that same wiki page: Algeria, Netherlands, Poland, Argentina, Romania, Turkey, Ukraine, Chile, France, Italy, Portugal, Spain, ...
Having police not separated from military doesn't invalidate the democracy, it just makes it easier to subvert democracy at some point.
The spanish Guardia Civil is a very good example of a police force tied too deeply with the military. In 1981 some parts of the force attempted an actual coup, with one guy entering the parliament and shooting in the air (or ceiling).
The continuity of the Guardia Civil after Franco's dictatorship is one of many vestiges that has not been removed due to fears of creating an instability leading to some coup and a reversal to fascism. IMHO this may have been justified the years immediately after Franco's death, but should have been addressed at some point. See the 1981 coup as for why "appeasing" the oppressors usually doesn't work out, or even works out for the oppressors.
Gendarmerie are simply policemen with a military status which give them some duty (like I think they cannot strike) and some benefits (earlier retirement) but they are still really a police force in reality. I don't think it would look good to send actual army to fight citizens, and I don't think the army would appreciate it either (it might have been done already, no idea)
What you say is true, but I'd add that Gendarms/Guardia Civil/Carabinieri etc.; tend to hang around carrying big guns, are responsible to the country as a whole (rather than the local community), are under the relevant defence ministry (while also reporting to the interior ministry).
In my experience they don't act at all like normal cops, and sometimes can be in conflict with them. The only interactions I ever hear of with citizens is if they beat the shit out of someone. You're not going to be going to them for a lost phone or a cat in a tree.
I don't know about the other forces mentioned here, but the French Gendarmerie are pretty much "regular police" as far as the people are concerned. The main difference with "actual regular police" is that they tend to operate in sparsely populated areas instead of large cities.
But they absolutely will do traffic police on highways, intervene to reason with a loud neighbor, etc. They'll also routinely show up during large protests in big cities.
The "big-gun carrying" Gendarmerie is a special unit, the GIGN, probably akin to US' SWAT teams. They'll intervene when "very dangerous" people are involved, think hostage situations or the like. "Regular police" also has a similar outfit.
Thank you for the correction. Indeed the main force of the French Gendarmerie (Gendarmerie Départementale) is much more like a "regular" police force than I described.
The unit I was confusing with the Gendarmerie as a whole was the Mobile Gendarmerie, whose role is more similar to the the Guardia Civil and Carabinieri.
I wouldn't have included GIGN, since I they appear to be much smaller and have a more "special”/"tactical" role.
I'll also note that the the Gendarmerie don't appear to be sending a team to the AWC (the olympics of smashing through the ceiling and shooting you in your bed) in two weeks, whereas the Guardia Civil and Carabinieri will. This may be a geopolitical thing though.
Seeing Gens D'Armes on the street was somewhat common. The Gens D'Armes are akin to 'heavy' police and are a show of force. The Gens D'Armes were pretty common to see in the subways, airports, and/or just on patrol. They were Gens D'Armes stations in the city just how there were also regular police stations. Gens D'Armes patrols were a bit distinct from other police patrols, almost always larger groups, around 5 to 7 people with long-guns and plate carriers. Meanwhile regular police had much lighter weapons, no body armor, and very rarely were in groups of more than 2 or 3.
Times have changed. Nowadays, the gendarmes only show up when protests are expected to turn into rioting (so basically most of them). You don't see them around Paris in day to day life. We now have actual military patrolling the streets, "Operation Sentinelle". They're supposed to show some muscle to discourage terrorism. They are actual military, with actual military weapons. This has been going on for multiple years, I don't remember when it started.
However, regular police now wear bulletproof vests, too, even when randomly patrolling the streets. Since some years ago, we now have "municipal police", basically police which answer to the mayor [0], as opposed to the state, with somewhat fewer powers. But even they walk around with bullet-proof vests.
---
[0] In France, "the police" usually means "Police Nationale", which answers to the Prefect, who represents the State in the local Jurisdiction (département) – they are not elected, but appointed by the Interior Ministry. The "Municial police" answers to the City, but they're not allowed to conduct all the operations that the National Police do. The City means the Mayor, who's elected by the local population.
> The Gens D'Armes are akin to 'heavy' police and are a show of force
I've only seen that when they show up as support for or operating in a similar role as CRS† (crowd control, security for major events) which indeed would be Gendarmerie Mobile but that's a far cry from the range of operational responsibilities of Gendarmerie as a whole.
Turns out this is probably what city dwellers in France would only see of Gendarmerie, because Police Nationale and Municipale (city) typically have much more presence in cities than countryside, and the other way around for Gendarmerie.
That is not universally true. A Gendarmerie is literally a military force with law enforcement duties and many are exactly that.
In the Netherlands, the Royal Marechaussee are literal soldiers who perform military police duties and also many civilian policing duties, but all of them are soldiers first.
> A Gendarmerie is literally a military force with law enforcement duties
The second part is a huge differentiator from "normal" military. A police force even if administratively under the military has one crucial differentiator: their daily duties and training revolve almost exclusively around policing civilians from the same country. Military training and tactics are overwhelmingly aimed at dealing with foreign enemy combatants, mainly other military forces.
The methods give away the intentions and expected outcome. The US already has a very "militarized" police force. You send actual military only if you want to inflict the maximum amount of damage, and with that threat overwhelmingly scare the country into compliance.
> their daily duties and training revolve almost exclusively around policing mainly civilians, citizens of the same country.
That is the part that is not universally true. There are plenty of Gendarmeries who are soldiers first, with combat training and ethos, who also perform policing duties, the Marechaussee included.
Fair enough, but Wikipedia confirms that they all have civilian law enforcement and police duties so clearly their training, tactics, and experience revolve heavily around dealing with civilians.
I'll still take that over "soldiers only", even more with US's very active military where the soldiers routinely see active combat. Both the theory and practice shapes their "soldier vs. enemy combatant" world view. That's a hammer if I've ever seen one.
* when used domestically, it's under the Minister of Justice and Security
* there's also no Dutch equivalent of the U.S. presidency with unilateral executive control over the military
I'd argue this kind of danger is something you get more in presidential systems. Not that we all shouldn't be wary of military forces within our civilian populations.
Yes, sorry, I was answering only regarding the French gendarmerie, which I thought was made clear by the fact it's a French word but it turns out to be used more broadly.
In Portugal, the Guarda Civil are cops in rural areas. I have no special insight into their training or hierarchy, but I can tell you that in practice they interact with the population like cops, not like soldiers. E.g. you wouldn't report shoplifting to the army, but you can report to the Guarda Civil.
So I don't think your comment makes any sense, at least in Portugal.
I haven't lived there in almost 15 years. I stand corrected. In fact I'm closer in time to having lived in Spain than in Portugal, that must be the origin of my confusion.
In any case, I hope you agree my description of the GNR was accurate in substance.
If the US has laws that forbid that, and other nations have laws that establish that, then the US military being used for police activities is threatening to democracy - or at least to the rule of law - in a way that it is not threatening in other countries.
Other countries can do that if they want. It may or may not be a threat to them. But in the US, it's absolutely a threat to democracy, because it's already the executive deploying the military against the law.
> serious question: are Countries such as Italy, France etc not a democracy?
They are, but not in the the "framework of US constitutional democracy." A system for which we have more evidence of stability than either of Italy or France's modern republics. (Note, too, les gendarmes' heritage: imperial France. Also, gendarmes aren't usually deployed overseas. They are, in a sense, more similar to the FBI than the U.S. Marines.)
I have always found confusing the existence of the gendarmes. They are indeed a vestigial force of the XIXth century, and should be transformed into a regular police force.
On the contrary, they are more relevant than ever in today's era of peacekeeping and anti-terrorism activities. They are fundamental to the stabilisation of the Balkans, for example. They fill the gap between full war and "normal" (punctual) criminality.
Those are bad too. Anyone that grew up in a country with a gendarmerie knows they are the most violent, unpleasant and fascist (personally, not like "all cops are fascist") people you’ll ever meet.
Some of the cases you mention involve "military" police who are under the authority of the Ministry of the Interior, instead of the Ministry of Defense. Many also are not the only police force, in Chile the investigative duties fall to the non-military PDI.
IMO as Chilean, it's a pretty bad thing democratically, for both historical (dictatorship) and more recent reasons. Still, there is a clear difference between when the police with deep ties to the army enforce the law and when actual troops do it.
While copper Gutiérrez and grunt Herrera both technically have the rank of corporal, one mostly writes tickets, deals with noise complaints, and has riot training, while the other only knows how to march and shoot an assault rifle.
The actually important thing is that this is testing the waters. Trump will use the troops for flimsier and flimsier reasons.
NOTE: Chilean police are semi-routinely brutal; this is not an endorsement.
Books have a back cover for that reason: so you can read it before buying.
Long-form articles could have a back cover summary too, or an enticing intro... and some substack paid articles do that already: they let you read an intro and cut before going in the interesting details.
But for short newspapers articles it becomes harder to do based on topic. If the summary has to give out 90% of the information to not be too vague, you may then feel robbed paying for it once you realize the remaining 10% wasn't that useful.
Few questions:
- is the stack to index those open source?
- is there some standardized APIs each municipality provides, or do you go through the tedious task of building a per-municipality crawling tool?
- how often do you refresh the data? Checked a city, it has meeting minutes until 6/17, but the official website has more recent minutes (up to 12/2 at least)
reply