You are describing the toy projects that had us all amazed at the end of last year. Large, maintainable software that can serve paying customers is in a completely different galaxy.
I’ve used it in a previous engagement. Unfortunately it’s not customizable enough, and performance for deep forms is really bad. Also, I’d definitely use agents to set it up.
Because I wanted a full IDE on my iPhone so I can code while away from my laptop doing fun stuff with my kids. And I don’t like the Claude Code / Codex fire-and-forget approach.
The IDE I built has a full terminal, file system, git integration and an AI agent. It uses a private cloud Linux container that is persistent, so I can install packages and do anything I want from any phone, computer or browser. It’s amazing that we live in a time where we can build custom software for ourselves just for fun. I will never have to worry about Cursor or VS Code changing, getting bought and mothballed like Atom (my favorite IDE). I now own my tool, and will forever.
It will literally break overnight when some key dependency changes. Your LLM might not be able to fix it. Then I guess you regenerate it all from scratch? Sounds exhausting tbh.
I’ve built enterprise software for 10 years, with multiple upgrades over that time. With good test coverage and the right abstractions, maintenance is feasible.
Also, because I wrote and own the code, I don’t have to update if I don’t want to. I could choose instead to build around the dependency. That’s much more control than when Microsoft bought GitHub and destroyed the Atom IDE, which I loved, in favor of VS Code, which I still hate.
I'm just guessing, but an IDE that uses 3D acceleration just so a stupid UI can run "smoothly"? That's ridiculous.
Who runs an IDE with LLM agents accessing your local filesystem on bare metal?
Or am I the only one who runs everything LLM-related in a VM just for development work?
Then, because of Zed's genius decision, you need to share your GPU with the VM, and then some important features won't work, like snapshots. So you also need a workaround for that, etc.
Too much hassle, Zed is not for me.
But I'm anti-Apple, so maybe that's the reason :)
Btw, even the "ImHex" devs realized this, and they provide a version without acceleration for VM use.
They're using ImGui. Using it for a local desktop app UI is also ridiculous, imho. Whatever.
I would imagine running a local LLM for development isn’t as popular as using a hosted provider. I don’t personally host a local model, but I have shared GPUs and storage volumes with VMs and I didn’t see it as that much of a hassle. What kinds of problems are you running into?
Doesn’t ghostty also use graphics acceleration? I was under the impression that rendering text is a relatively challenging graphics compute task.
I run a local LLM on my MacBook together with frontier models for different tasks. I am in the process of setting up a 3 Mac Studio system to serve AI to my team.
What's wrong with using a 3D accelerator and falling back to CPU graphics if needed? Pixels per joule are orders of magnitude better on an iGPU than on the CPU. (Which can matter over an 8-12 hour editing session, maybe.)
Modern IDEs don't use 3D at all, nor do they use the sprite-like 2D graphics that GPUs excel at and that can accelerate, e.g. mobile touch- and swipe-based UX. The main thing they do is font rendering, and accelerating that on GPU while keeping visual quality unchanged is quite complicated. The graphics pipeline doesn't really help all that much.
Not sure how you use CC, but the last 6 months have felt like significant optimization efforts to me. Last year Claude would just read and edit files; now it's all kinds of basic tool gymnastics with grep/awk/sed/etc. to narrowly slice and avoid token-heavy reads. Resuming sessions that aren't even that large triggers a scary prompt about using a significant portion of your token budget if you continue without compacting.
To me it feels like a worse experience, and they probably feel it too, but it makes sense from an optimization perspective. I've probably learned some shell tricks, but I'm also going blind from watching Claude try dozens of variations of some multi-line, chained-and-piped wall-of-bash nightmare instead of just reading a few files.
I've noticed that's changed over the past month or so. Claude Code used to happily pipe build commands straight into context, but recently it's been running them as background tasks that pipe to a file, and it'll search and do partial reads on the output instead.
It also gives tips on reducing context size when you run /context.
Presumably they are actually starting to feel the pinch on inference costs themselves with what still feels like a fairly generous max plan.
And it seems to use head, tail, etc. more than it used to, even when unnecessary. Combined with the recent(?) tendency toward more chaining and, as you said, piping to temp files and the like, that totally screwed up Claude Code’s auto-approval system for me (by auto approval I mean the system that decides which commands can be run without a permission prompt, based on the permissions.allow setting among other things, not to be confused with the specific new approval mode called “auto” that burns more tokens to decide whether a command is safe). I had to write my own auto-approval system and plug it in as a hook.
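In case it's useful, here's roughly the shape of mine: a minimal PreToolUse hook sketch in Python. I'm assuming the hook receives the tool call as JSON on stdin and can approve it by printing a JSON decision on stdout; the allowlist prefixes are just placeholders, so check the current hooks docs for the exact schema before copying this.

    #!/usr/bin/env python3
    # Minimal PreToolUse hook sketch. The stdin/stdout JSON contract here is
    # an assumption; verify it against the current Claude Code hooks docs.
    import json
    import sys

    # Placeholder allowlist of command prefixes considered safe to auto-approve.
    SAFE_PREFIXES = ("git status", "git diff", "rg ", "grep ", "head ", "tail ", "ls ", "cat ")

    event = json.load(sys.stdin)                              # hook payload
    command = event.get("tool_input", {}).get("command", "")

    if event.get("tool_name") == "Bash" and command.strip().startswith(SAFE_PREFIXES):
        # Auto-approve commands that match the allowlist.
        print(json.dumps({"decision": "approve", "reason": "matched allowlist"}))
    else:
        # Output nothing and exit 0: fall through to the normal permission prompt.
        sys.exit(0)

The annoying part is exactly what you'd expect: a real version has to split chained/piped commands on &&, ; and | and check every segment, otherwise one allowlisted prefix approves the whole pipeline.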
By what measure? HTML as they're describing it here includes CSS and presumably JavaScript. Markdown has only a handful of elements, and they're all more human-readable in their unrendered form. Show both to someone who has never seen either and it's obvious which they would describe as "simpler".
How often do your emails end up marked as spam on the recipients' side? A few years ago it read like there’s a whole science behind avoiding getting flagged. Is this easier now with agents aiding the setup?
Not the person you replied to, and it's impossible to know with certainty how often you're in someone else's spam, but very rarely.
I had an issue with Yahoo a couple of years ago, that's all. The "it read like there's a whole science" thing is sadly a trope mostly repeated by people who have never tried, because it gets upvotes on Reddit.
There are some steps you have to take, but not many, and systems like Mox mailserver or Stalwart guide you through it, and mail-tester will check if you got it right.
Email, other than tweaking spam filters, is one of my lowest-maintenance systems. I can't remember the last time I touched Exim or Mox config.
You got me really interested here. I ran my own mailserver years ago and eventually just gave it up. I am getting rid of Google Workspace and have been planning a migration to Proton for two domains. But this sounds like a fun project. Any advice? I am going to check out Mox and Stalwart.
What providers are good hosting candidates? I have a website on DO, but from my understanding their entire IP ranges are heavily blacklisted.
If I remember rightly, DO has some restrictions, like outbound port 25 being blocked on IPv6.
I can't speak for all of them, but I use Mythic Beasts in the UK for one mail server (they are a very knowledgeable old-school host) and it has been good. I also have a dedicated server with OVH, which is fine, and a couple of small-scale setups (e.g. SimpleLogin, a notification server) with IONOS, but those only deliver to me, so I can't say how reliably they deliver elsewhere.
Mox is great, but I think it's still alpha. I've been using it for 2 years in production for a small-traffic domain. For the other I use Exim (with Mythic Beasts' Sympl, which sets it up), but it's a little more hands-on at the beginning.
I imagine an agent would make a lot of the first-time setup from scratch easier, but the fastest reliable way to get up and running is mail-in-a-box or mailcow. Before those were available I built a flurdy-style Postfix+Courier+Amavisd+MySQL setup and have been evolving it ever since. Now I'm on Postfix+Dovecot+rspamd+MySQL, but I don't think that's for everyone or even the best way to start.
The science of not getting flagged is easy when you're not sending large volumes of untrusted mail; it only gets complicated if you start hosting mail for "customers" or let your system forward mail unfiltered into gmail/yahoo.
Here's my hit list of universal things to configure:
* Start with an IP with good or neutral reputation, non-residential; it's nearly impossible to fix an IP that has been burned by a spammer. (Network)
* Valid reverse DNS for your IP matching your mailhost's forward DNS (DNS)
* Valid SPF record; -all (DNS) (example SPF/DKIM/DMARC/MTA-STS records after this list)
* Valid DKIM; with a sufficiently sized key (DNS+Config)
* Valid DMARC; start with p=none to test and move to p=reject once you're configured (DNS)
* ARC if you or your users will ever possibly forward mail (Config)
* Don't get your messages flagged as spam anywhere, ever; filter outbound mail even if it's just you. All it takes is one piece of malware and a saved password and you'll have to get a new IP. (Config)
* Don't configure services behind your mail server with example domains that you don't control ~ I get so much mis-configured test mail from people who think it's cute to use my domain as an example in their practice lab. It all gets reported as spam or bounces, and then their smart host bounce rate goes up. (Config)
* Test for open relay; only relay for authenticated users. (Config)
* Use strong authentication, preferably with certificates or MFA. (Config)
* Secure everything; IMAP/SMTP/POP are old AF, so make sure you're requiring STARTTLS, and set up MTA-STS to prevent downgrade attacks and enforce encryption in transit. Use a real certificate from Let's Encrypt; don't self-sign. (DNS+http+Config)
* fail2ban your auth; you're going to get so much drive-by password spraying and credential stuffing. I fail2ban-block entire subnets at a time with iptables actions (basic jail sketch after the list). I also have a bunch of "poison pill" rules for weird stuff I see in my logs, e.g. block anyone who tries to auth with the NTLM hash for 'password'. (Config)
* Don't bother with BIMI at home; you can't get a blue check mark without deep pockets and a trademark (VMC), and most platforms only show logos that have a matching VMC. (DNS+https+Config)
* DMARC reporting and TLS-RPT reporting are a pain to manage but are helpful for troubleshooting deliverability; be prepared to read some XML reports or set up a stack to parse them as they arrive. (DNS+Config+https)
* Set up the SMTP submission port (587); so many networks block port 25 outbound, and it's the right way for clients to connect. (Config)
* Configure BACKUPS, don't skip this step; encrypted restic backups to S3 or Backblaze B2 are cheap and easy (sketch after the list). (Config)
* Track your configs in git; don't commit secrets. (Config)
* Configure a free blacklist monitor on MXToolbox for your domain(s). (Config)
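For reference, here's roughly what the core DNS records from the list above look like for a hypothetical example.com with its mail host at mail.example.com (the selector, key and id values are illustrative placeholders):

    ; SPF: only mail.example.com may send for example.com
    example.com.                   TXT  "v=spf1 a:mail.example.com -all"

    ; DKIM: public key published under a selector of your choosing (here "2024a")
    2024a._domainkey.example.com.  TXT  "v=DKIM1; k=rsa; p=<base64 public key>"

    ; DMARC: start at p=none, move to p=reject once the reports look clean
    _dmarc.example.com.            TXT  "v=DMARC1; p=none; rua=mailto:dmarc@example.com"

    ; MTA-STS: the TXT record just signals that a policy exists
    _mta-sts.example.com.          TXT  "v=STSv1; id=20240101000000"

The MTA-STS policy itself is a small text file served over HTTPS at https://mta-sts.example.com/.well-known/mta-sts.txt:

    version: STSv1
    mode: testing
    mx: mail.example.com
    max_age: 86400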
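For the fail2ban bullet, the basic jails are only a few lines of jail.local. This sketch assumes the stock postfix-sasl and dovecot filters that ship with fail2ban; the subnet-wide bans and the poison-pill rules are custom actions/filters you write yourself:

    [postfix-sasl]
    enabled  = true
    maxretry = 3
    findtime = 10m
    bantime  = 1d

    [dovecot]
    enabled  = true
    maxretry = 3
    findtime = 10m
    bantime  = 1d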
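And for the backups bullet, a minimal sketch of the restic-to-B2 idea (bucket name, paths and credentials are placeholders; plain S3 repos work the same way):

    # one time: create the encrypted repository in a B2 bucket
    export B2_ACCOUNT_ID=... B2_ACCOUNT_KEY=... RESTIC_PASSWORD=...
    restic -r b2:my-mail-backups:/mailhost init

    # nightly via cron or a systemd timer: back up mail spool and config, prune old snapshots
    restic -r b2:my-mail-backups:/mailhost backup /var/vmail /etc
    restic -r b2:my-mail-backups:/mailhost forget --keep-daily 14 --keep-weekly 8 --prune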
If you do those things you'll be in a pretty good spot; you could probably paste that list/this post into your agent and vibe up a solid mailserver.
For me keeping the spam and phishing out is a bigger hassle than deliverability issues. rspamd does a pretty good job of keeping it manageable.
I do all of those things, and with all of that set up, the only place I ever run into issues is with users on AT&T's residential broadband mail servers. AT&T appears to block you if you're not known to them, and they have a short memory. If you don't have regular correspondence with AT&T users they will block you after a bit. I'm a fairly low-volume sender, so I end up blocked every other time I try to send to AT&T through no fault of my own. I've talked most of those friends off of AT&T's free email and on to ProtonMail at this point.
I know a cardiologist who founded a training & knowledge base startup for doctors. He once told me (this was before LLMs) that it’s super common for a doc to tell a patient they need to look something up in their patient history, and then to google the symptoms instead. Or, even more often, to quickly text a colleague.
I have no way of knowing if this is true. But I’d rather have a complete, guided prompt be the basis of a diagnosis than a two-minute Google search.
Does she think this really does the complexity of each case justice though? I doubt you can compress an anamnesis into a two-liner without losing essential data.
> Does she think this really does the complexity of each case justice though?
Do you believe that, prior to the 2020-ish mass evacuation of doctors from the profession, the typical specialist would misrepresent the facts of a case when asking for a cross-check?
Related: Have you ever worked as "the guys who actually work on the thing"-level tech support for a nontrivial Enterprise Software Product (or System)? If you have, did you never send a quick message to a knowledgeable coworker to double-check something that you were pretty sure was correct, but weren't 100% certain about?
An enterprise product is not comparable with the human body at all. A single cell contains hundreds of times more information/entropy in its state than an operating system.
> An enterprise product is not comparable with the human body at all.
Incorrect! Enterprise products are often sprawling projects that
* are poorly designed
* are inadequately (and often incorrectly) documented
* have confusing and/or inadequate diagnostic facilities
* are far, far too large for any one person to completely understand
* have one or more components that no one adequately understands
* are pretty much constantly in a state of partial failure
* usually don't require an understanding of, say, the QM principles that govern the behavior of the medium that embodies the system in order to perform system diagnostics and repair
Given that you dodged them, I'll assume that your answer to my first question to you is "Yes", and to my second is "No".
It’s obviously not a new model capability. But using this well-known, existing capability to solve this particular issue is only obvious after the fact.
It’s a useful trick to have in one’s toolbox, and I’m grateful to the author for sharing it.
It's not novel in the sense that nobody knew about img2img. It's novel in the sense that nobody thought of using img2img to solve this problem in this way.
It's novel if you never played with img2img, including especially several forms of (text+img)2img. Or, if you never tried editing images by text prompt in recent multimodal LLMs.
That said, I spent plenty of time doing both, and yet it would probably take me a while to arrive at this approach. For some reason, the "draw a sketch, have a model flesh it out" approach got bucketed with Stable Diffusion in my mind, and multimodal LLMs with "take detailed content, make targeted edits to it". So I'm glad the OP posted it.
They’re actually quite good at it. I’ve had a number of situations where I’ve wanted to re-render some of my older comics. You can basically tell any SOTA multimodal model (NB, GPT-Image-X) to treat them as storyboards and prompt for a specific style: newsprint, crosshatching, monochromatic ink sketch, etc.
Another thing I’ve gotten very used to doing is avoiding the “one-shot” approach. If I generate something and don’t like the results, I bring it into Krita, move things around, redraw some elements, and then send it back in with instructions to just clean it up (remove any smudges or imperfections). The state-of-the-art models can do an astonishing job with that workflow.
Ok, it might just be me then. I view Nvidia’s DLSS as a similar thing. There was even this meme that in the future video games will only output basic geometry and an AI layer will transform it into stunning graphics.