FWIW I work at Steel (not the OP). While we’ve been iterating on the “right shape” for agent tooling, I’ve been building a benchmark harness to measure how different surfaces affect real web task completion: raw API context, CLI-only, opinionated “skills” (structured outputs + artifact capture), and combinations.
If you’ve run agents on the open web, I’d love suggestions for nasty-but-representative workflows to include in the benchmark.
This rings true: with every new model update I find myself leaving behind full workflows I’ve built. The article is really great, and I do admire the system, even if it is overengineered in places, but it already reads like last quarter’s workflow. These days, letting Codex 5.3 xhigh chug for 30 minutes on my super long dictated prompt seems to do the trick, and I’m hearing 5.4 is a meaningfully better model. For fully autonomous scaffolding of new projects toward a first prototype, I also have my own version of a very simple Ralph loop that gets fed a gpt-pro super spec file.
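For anyone unfamiliar, the Ralph-loop pattern is just "re-feed the same spec to the agent until it says it's done." A minimal sketch (the `run_agent` callable and the `DONE` marker convention are my own assumptions, not any specific CLI):

```python
from pathlib import Path

def ralph_loop(spec_path, run_agent, max_iters=10, done_marker="DONE"):
    """Repeatedly feed the full spec to the agent; stop when its output
    contains the completion marker or the iteration budget runs out."""
    spec = Path(spec_path).read_text()
    for i in range(1, max_iters + 1):
        output = run_agent(spec)  # e.g. shell out to your agent CLI here
        if done_marker in output:
            return i              # converged after i iterations
    return None                   # budget exhausted, still not done
```

In practice `run_agent` would shell out to whatever coding agent you use, with the agent instructed to print the marker once the spec is satisfied.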
For no special reason, besides that I could, I’ve slop-coded this ephemeral VM orchestrator for AI agents, which I use inside any agent to manipulate and maintain my coding VMs on Proxmox. It could probably make sense to simplify it further and move from Proxmox to something like this. Link: https://github.com/nibzard/agentlab
This is exciting, but as some already commented, I had to read and check everything twice to figure it out. A strong feedback loop is the ultimate unlock for AI agents, and having twins is exactly the right approach.
For sure! Just ask "why" enough times and you will find the root. The main issue is how many people actually do that, and it’s becoming even more critical now.
Reproducibility is a fascinating topic for me, and today, with AI coding agents, we could have automated reproducibility, at least in some fields. The concept they touch on in the paper, post-publication verification, could replace or add onto existing research valorization.
I thought radiologists need to know what to look for in order to diagnose something? Do they brute force every potential condition in the body that can be detected with an MRI?
Exactly, because an MRI is not a simple "shows problems" machine. It provides a very simplified model of certain aspects of the state of the body. We very often can't know if parts of that state are a health problem or not.
To my knowledge, studies have not shown any benefit from regular full-body MRIs. You might find a problem, or you might find a non-problem and, in the process of fixing it (via operation or medication), create a problem. Those two effects seem to balance each other out on average.
> I thought radiologists need to know what to look for in order to diagnose something? Do they brute force every potential condition in the body that can be detected with an MRI?
No, when they read a scan, they're supposed to read everything visible for every problem. Think of it this way: if you break your leg and they take an MRI, do you want the radiologist to miss a tumor because he was focused on the break?
About how many "parameters" do they evaluate roughly for a full body scan? And is one typically qualified to evaluate across the entire body or do they specialize in different areas of the body?
I don't know, but I've heard from doctors (many times, sometimes quite forcefully) that it's a radiologist's job to call out all abnormalities on the full image they get, and the reasoning makes sense.
I suppose a full body MRI would be very expensive and take a lot of time to read.
Strong point. I’m considering tagging patterns better and adding fields like “model/toolchain-specific” and “last validated (month/year).” Things change fast; for example, “context anxiety” is likely less relevant now and should be reframed that way (or retired).