Hacker News | manmal's comments

Many large systems can't be built well enough because they just fall apart. Try letting a junior dev build an ERP or a database system.

You are describing the toy projects that had us all amazed at the end of last year. Large, maintainable software that can serve paying customers is in a completely different galaxy.

I've used it in a previous engagement. Unfortunately it's not customizable enough, and performance for deep forms is really bad. Also, I'd definitely use agents to set it up.

Curious, why did Zed with ACP not work for you?

Because I wanted the full IDE on my iPhone so I can code while away from my laptop, doing fun stuff with my kids. And I don't like the Claude Code / Codex fire-and-forget approach.

The IDE I built has a full terminal, file system, git integration, and an AI agent. It uses a private cloud Linux container that is persistent, so I can install packages and do anything I want from any phone, computer, or browser. It's amazing that we live in a time where we can build custom software for ourselves just for fun. I will never have to worry about Cursor or VS Code changing, getting bought, or being mothballed like Atom (my favorite IDE). I now own my tool, and will forever.


It will literally break overnight when some key dependency changes. Your LLM might not be able to fix it. Then I guess you regenerate it all from scratch? Sounds exhausting tbh.

I've built enterprise software for 10 years with multiple upgrades over that time. With good test coverage and the right abstractions, maintenance is feasible.

Also, because I wrote and own the code, I don't have to update if I don't want to. I could choose instead to build around the dependency. That's much more control than I had when Microsoft bought GitHub and destroyed the Atom IDE, which I loved, in favor of VS Code, which I still hate.


I'm just guessing, but an IDE that needs 3D acceleration just to make its UI run "smoothly" is ridiculous.

Who runs an IDE, with LLM agents accessing the local filesystem, on bare metal?

Or am I alone in running everything LLM-related in a VM just for development work? Then, because of Zed's genius decision, you need to share your GPU with the VM, and some important features stop working, like snapshots. So you also need a workaround for that, etc.

Too much hassle, Zed is not for me.

But I'm anti-Apple, so maybe that's the reason :)

Btw, even the ImHex devs realized this, and they provide a version without acceleration for VM use. They're using ImGui. Using that for a local desktop app UI is also ridiculous, imho. Whatever.


I would imagine running a local LLM for development isn’t as popular as using a hosted provider. I don’t personally host a local model, but I have shared GPUs and storage volumes with VMs and I didn’t see it as that much of a hassle. What kinds of problems are you running into?

Doesn't Ghostty also use graphics acceleration? I was under the impression that rendering text is a relatively challenging graphics compute task.


I run a local LLM on my MacBook together with frontier models for different tasks. I am in the process of setting up a three-Mac-Studio system to serve AI to my team.

What's wrong with using a 3D accelerator and falling back to CPU graphics if needed? Pixels per joule is orders of magnitude better on an iGPU than on the CPU. (Which can matter over an 8-12 hour editing session, maybe.)

Modern IDEs don't use 3D at all, nor do they use the sprite-like 2D graphics that GPUs excel at, which can accelerate e.g. mobile touch- and swipe-based UX. The main thing they do is font rendering, and accelerating that on the GPU while keeping visual quality unchanged is quite complicated. The graphics pipeline doesn't really help all that much.

Agents are read-only by default in Zed. You should really get off your high horse.

Yes, that's the case. And as Anthropic staff, the author has an incentive to promote workflows that require an agent to interact with text documents.

I've yet to see Anthropic promote any sort of token optimization strategy to its users - they always assume we all have infinite inference.

"No bread? Let them eat cake!"


Not sure how you use CC, but the last 6 months have felt like significant optimization efforts to me. Last year Claude would just read and edit files; now it's all kinds of basic tool gymnastics with grep/awk/sed/etc. to narrowly slice and avoid token-heavy reads. Resuming sessions that aren't even that large gets a scary prompt about using a significant portion of your token budget if you continue without compacting.

To me it feels like a worse experience, and they probably feel it too, but it makes sense from an optimization perspective. I've probably learned some shell tricks, but I'm also going blind watching Claude try dozens of variations of some multi-line, chained-and-piped wall-of-bash nightmare instead of just reading a few files.


Valid points, but they address a totally different matter than the one I pointed out.

They give you multiple models and knobs to control effort? What are you looking for?

I completely agree, but hadn't found a way to put it into words. It could also be the model itself being trained on optimized strategies.

At the same time it has gotten WAY better at parsing giant documents due to this.

Nah they do. They push Sonnet pretty hard rather than Opus for most tasks.

Also: https://platform.claude.com/docs/en/agents-and-tools/tool-us...


I've noticed that's changed over the past month or so. Claude-code used to happily pipe build commands straight into context, but recently it's been running them as background tasks that pipe to file, and it'll search and do partial reads on the output instead.

It also gives tips on reducing context size when you run /context.

Presumably they are actually starting to feel the pinch on inference costs themselves with what still feels like a fairly generous max plan.


And it seems to use head, tail, etc. more than it used to, even when unnecessary. Combined with the recent(?) tendency toward more chaining and, as you said, piping to temp files and the like, this totally screwed up Claude Code's auto-approval system for me. (By auto-approval I mean the system that decides which commands can be run without a permission prompt, based on the permissions.allow setting among other things; not to be confused with the specific new approval mode called "auto" that burns more tokens to decide whether the command is safe.) I had to write my own auto-approval system and plug it in as a hook.
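A custom hook along those lines can be sketched roughly like this. To be clear, this is my own minimal sketch, not the commenter's implementation: the SAFE_PREFIXES list, helper names, and the JSON payload fields are assumptions, and the exact PreToolUse payload shape and exit-code semantics should be checked against the current Claude Code hooks docs.

```python
# Sketch of an auto-approval hook that vets chained/piped shell commands.
# Payload field names ("tool_input", "command") and exit-code meanings are
# assumed; verify them against the Claude Code hooks documentation.
import json
import re
import sys

# Command prefixes we consider safe to run without a permission prompt.
SAFE_PREFIXES = ("ls", "cat", "head", "tail", "grep", "rg", "git status", "git diff")

def is_safe(command: str) -> bool:
    """Approve only if every segment of a chained/piped command is safe."""
    # Split on common shell chaining operators and vet each part separately,
    # so that e.g. `cat foo | sh` is not approved just because it starts
    # with an allowed `cat`.
    for part in re.split(r"\|\||&&|;|\|", command):
        part = part.strip()
        if not part:
            return False
        if not any(part == p or part.startswith(p + " ") for p in SAFE_PREFIXES):
            return False
    return True

def main() -> int:
    """Entry point a PreToolUse hook script would run: read the proposed
    command from stdin, return 0 to approve, nonzero to fall back to the
    normal permission prompt (semantics assumed, not documented here)."""
    payload = json.load(sys.stdin)
    command = payload.get("tool_input", {}).get("command", "")
    return 0 if is_safe(command) else 1
```

The key design point is vetting each segment of a pipeline independently, which is exactly what prefix-only matching on the whole command string gets wrong.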

HTML is by far simpler than Markdown.

By what measure? HTML as they're describing it here includes CSS and presumably JavaScript. Markdown has only a handful of elements, and they're all more human-readable in their unrendered form. Show both to someone who has never seen either and it's obvious which they would describe as "simpler".

Markdown covers only a very small subset of HTML. By definition you are wrong.

How often do your emails get marked as spam for recipients? A few years ago it read like there's a whole science behind avoiding getting flagged. Is this easier now with agents aiding the setup?

Not the person you replied to, and it's impossible to know with certainty how often you're in someone else's spam, but very rarely.

I had an issue with Yahoo a couple of years ago, that's all. The "it read like there's a whole science" bit is sadly a trope mostly repeated by people who have never tried, because it gets upvotes on Reddit.

There are some steps you have to take, but not many, and systems like the Mox mailserver or Stalwart guide you through them, and mail-tester will check if you got it right.

Email, other than tweaking spam filters, is one of my lowest maintenance systems. I can't remember the last time I touched Exim or Mox config


You got me really interested here, I ran my own mailserver years ago and eventually just gave it up. I am getting rid of Google Workspace and have been planning a migration to Proton for two domains. But this sounds like a fun project. Any advice? I am going to check out Mox and Stalwart.

What providers are good hosting candidates? I have a website on DO, but from my understanding their entire IP ranges are heavily blacklisted.


If I remember rightly, DO has some restrictions, like outbound port 25 being blocked on IPv6.

I can't speak for all of them, but I use Mythic Beasts in the UK for one mail server (they are a very knowledgeable old-school host) and it has been good. I also have a dedicated server with OVH, which is fine, and a couple of small-scale setups (e.g. SimpleLogin, a notification server) with IONOS, but those only deliver to me, so I can't say how reliably they deliver elsewhere.

Mox is great, but I think it's still alpha. I've been using it for 2 years in production for a low-traffic domain. For the other I use Exim (with Mythic Beasts' Sympl, which sets it up), but it's a little more hands-on at the beginning.


Excellent, thanks

Not very often at all, but it did happen at least once. Note that even email sent from Google itself can be marked as spam depending on the message.

I imagine an agent would make a lot of the first time setup from scratch easier, but the fastest reliable way to get up and running is mail-in-a-box or mailcow. Before those were available I built a flurdy style Postfix+Courier+Amavisd+MySQL setup and have been evolving it ever since. Now I'm on Postfix+Dovecot+rspamd+MySQL but I don't think that's for everyone or even the best way to start.

The science of not getting flagged is easy when you're not sending large volumes of untrusted mail; it only gets complicated if you start hosting mail for "customers" or let your system forward mail unfiltered into gmail/yahoo.

Here's my hit list of universal things to configure:

* Start with an IP with good or neutral reputation, non-residential; it's nearly impossible to fix an IP that has been burned by a spammer. (Network)

* Valid reverse dns for your IP matching your mailhost forward dns (DNS)

* Valid SPF record; -all (DNS)

* Valid DKIM; with sufficiently sized key (DNS+Config)

* Valid DMARC; start with p=none to test and move to p=reject once you're configured (DNS)

* ARC if you or your users will ever possibly forward mail (Config)

* Don't get your messages flagged as spam anywhere, ever; filter outbound mail even if it's just you. All it takes is one piece of malware and a saved password and you'll have to get a new IP. (Config)

* Don't configure services behind your mail server with example domains that you don't control. I get so much misconfigured test mail from people who think it's cute to use my domain as an example in their practice lab. It all gets reported as spam or bounces, and then their smarthost bounce rate goes up. (Config)

* Test for open relay; only relay for authenticated users. (Config)

* Use strong authentication, preferably with certificates or MFA. (Config)

* Secure everything; IMAP/SMTP/POP are old AF, so make sure you're requiring STARTTLS, and set up MTA-STS to prevent downgrade attacks and enforce encryption in transit. Use a real certificate from Let's Encrypt; don't self-sign. (DNS+http+Config)

* fail2ban your auth; you're going to get so much drive-by password spraying and credential stuffing. I fail2ban-block entire subnets at a time with iptables actions. I also have a bunch of "poison pill" rules for weird stuff I see in my logs, e.g. block anyone who tries to auth with the NTLM hash for 'password'. (Config)

* Don't bother with BIMI at home; you can't get a blue check mark without deep pockets and a trademark (VMC), and most platforms only show logos that have a matching VMC. (DNS+https+config)

* DMARC reporting and TLS-RPT reporting are a pain to manage but are helpful for troubleshooting deliverability; be prepared to read some XML reports or set up a stack to parse them as they arrive. (DNS+Config+https)

* Set up the SMTP submission port (587); so many networks block port 25 outbound, and it's the right way for clients to connect. (Config)

* configure BACKUPS; don't skip this step. Encrypted restic backups to S3 or Backblaze B2 are cheap and easy. (config)

* track your configs in git, don't commit secrets. (config)

* configure a free blacklist monitor on mxtoolbox for your domain(s) (config)

If you do those things you'll be in a pretty good spot; you could probably paste this list into your agent and vibe up a solid mailserver.
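For a sense of what the DNS side of that list looks like, here are the three core TXT records in zone-file form. The domain, selector, key, and reporting address are placeholders, and real values depend on your setup:

```
; SPF: only this domain's MX hosts may send; hard-fail everyone else
example.com.                  IN TXT "v=spf1 mx -all"

; DKIM: public key published under a selector of your choosing
mail._domainkey.example.com.  IN TXT "v=DKIM1; k=rsa; p=<base64-public-key>"

; DMARC: start at p=none while testing, then move to p=reject
_dmarc.example.com.           IN TXT "v=DMARC1; p=none; rua=mailto:dmarc@example.com"
```

Together with matching forward/reverse DNS for the mailhost, these cover the SPF, DKIM, and DMARC items above.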

For me keeping the spam and phishing out is a bigger hassle than deliverability issues. rspamd does a pretty good job of keeping it manageable.

I do all of those things, and with all of that set up, the only place I ever run into issues is with users on AT&T's residential broadband mail servers. AT&T appears to block you if you're not known to them, and they have a short memory. If you don't have regular correspondence with AT&T users, they will block you after a bit. I'm a fairly low-volume sender, so I end up blocked every other time I try to send to AT&T through no fault of my own. I've talked most of those friends off of AT&T's free email and on to ProtonMail at this point.


For the people whose mail service blocks you and who cannot or will not change their mail provider, what is your solution?

I would just send those domains through mailgun with a transport map in postfix, it probably wouldn't even break the free tier.

If you use Mailgun or similar, you have to set up DKIM keys for them and add them to your SPF.
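A per-domain relay in Postfix looks roughly like this. The domains here are illustrative, and while smtp.mailgun.org:587 is Mailgun's usual smarthost, you should confirm it (and configure SASL credentials for it) per their docs:

```
# /etc/postfix/transport: route only the problem domains via the relay
att.net         relay:[smtp.mailgun.org]:587
sbcglobal.net   relay:[smtp.mailgun.org]:587
```

```
# /etc/postfix/main.cf
transport_maps = hash:/etc/postfix/transport
```

After editing the transport file, run `postmap /etc/postfix/transport` and reload Postfix; everything not listed continues to be delivered directly.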


Great info, thanks

It's quite easy to remote-control an Android phone with an agent (e.g. there's agent-device). I don't think this will keep automation from happening.

I will never get these two minutes back.


I know a cardiologist who founded a training & knowledge-base startup for doctors. He once told me (this was before LLMs) that it's super common to tell a patient that the doc needs to look up something in their patient history, and to then instead google the symptoms. Or, even more often, quickly text a colleague.

I have no way of knowing if this is true. But I'd rather have a complete, guided prompt be the basis of a diagnosis than a two-minute Google search.


> quickly text a colleague.

This is still common and useful to gut check and make sure you aren't missing something. Source: wife is a doctor.


Does she think this really does the complexity of each case justice though? I doubt you can compress an anamnesis into a two-liner without losing essential data.

> Does she think this really does the complexity of each case justice though?

Do you believe that (prior to the 2020-ish mass evacuation of doctors from the profession) the typical specialist would misrepresent the facts of a case when asking for a cross-check?

Related: Have you ever worked as "the guys who actually work on the thing"-level tech support for a nontrivial Enterprise Software Product (or System)? If you have, did you never send a quick message to a knowledgeable coworker to double-check something that you were pretty sure was correct, but weren't 100% certain about?


An enterprise product is not comparable with the human body at all. A single cell contains hundreds of times more information/entropy in its state than an operating system.

> An enterprise product is not comparable with the human body at all.

Incorrect! Enterprise products are often sprawling projects that

* are poorly designed

* are inadequately (and often incorrectly) documented

* have confusing and/or inadequate diagnostic facilities

* are far, far too large for any one person to completely understand

* have one or more components that no one adequately understands

* are pretty much constantly in a state of partial failure

* usually don't require an understanding of, say, the QM principles that govern the behavior of the medium that embodies the system in order to perform system diagnostics and repair

Given that you dodged them, I'll assume that your answer to my first question to you is "Yes", and to my second is "No".


Well, I don't doubt that enterprise deployments can be complex, but this is a false analogy.

Even the original Stable Diffusion app had img2img. It just didn't work as well. I'm not sure why this is supposed to be novel.

It’s obviously not a new model capability. But using this well-known, existing capability to solve this particular issue is only obvious after the fact.

It’s a useful trick to have in one’s toolbox, and I’m grateful to the author for sharing it.


It's not novel in the sense that nobody knew about img2img. It's novel in the sense that nobody thought of using img2img to solve this problem in this way.

It's novel if you never played with img2img, including especially several forms of (text+img)2img. Or, if you never tried editing images by text prompt in recent multimodal LLMs.

That said, I spent plenty of time doing both, and yet it would probably take me a while to arrive at this approach. For some reason, the "draw a sketch, have a model flesh it out" approach got bucketed with Stable Diffusion in my mind, and multimodal LLMs with "take detailed content, make targeted edits to it". So I'm glad the OP posted it.


They're actually quite good at it. I've had a number of situations where I've wanted to re-render some of my older comics. You can basically tell any SOTA multimodal model (NB, GPT-Image-X) to treat them as storyboards and prompt for a specific style: newsprint, crosshatching, monochromatic ink sketch, etc.

Another thing I’ve gotten very used to doing is avoiding the “one-shot” approach. If I generate something and don’t like the results, I bring it into Krita, move things around, redraw some elements, and then send it back in with instructions to just clean it up (remove any smudges or imperfections). The state-of-the-art models can do an astonishing job with that workflow.

https://imgpb.com/eGDJIb


Ok, it might just be me then. I view Nvidia's DLSS as a similar thing. There was even this meme that video games will in the future only output basic geometry, and the AI layer transforms it into stunning graphics.
