More

polyrand · 2026-01-26T10:26:39 1769423199

I share the same feeling. I waited as much as possible to upgrade to iOS 26 / macOS Tahoe.

Two days ago, I finally upgraded. Liquid Glass is one of the worst things I've ever seen in terms of design. It reminds me of when I personalized old cheap android phones or Linux distros just "to look cool". Cool-looking: yes. Unusable: also yes. Tasteful design: almost absent.

Just the increase of the border-radius in all elements makes it hideous. Apps with a search bar on a scrollable list look like a CSS bug when the search bar is on top of the elements. Neither the search bar nor the element underneath are visible. Although this applies to most transparency effects on Liquid Glass. Neither the elements above nor below the "glass" are visible. And the extra value added is zero.

The thing is, I can still adapt to it, or tweak transparency and contrast. But I've seen elderly relatives struggle just because WhatsApp decided to add the "Meta AI" floating button. I can't imagine what this "inaccessible" UI changes can do.

freehorse · 2026-01-26T10:51:30 1769424690

It is the first time I am trying to skip a macos version. I really hope in macos27 they will fix things. I used to skip every second windows version, so back here we are.

inatreecrown2 · 2026-01-26T11:53:21 1769428401

same here. using mac since OS9, and Tahoe is the first time i skipped a version (downgrade after 2 months)

polyrand · 2026-01-24T21:08:42 1769288922

Don't forget that if you're using SQLite on something like EBS, multiple queries may not be efficient.

I'm saying this as a huge SQLite fan, but also beware of what kind of storage you're using in your instance.

andersmurphy · 2026-01-25T18:45:05 1769366705

Yeah, you really want directly connected NVME drives to your machine/VPS. It can make orders of magnitude difference.

polyrand · 2026-01-19T17:51:56 1768845116

I've been using z.ai models through their coding plan (incredible price/performance ratio), and since GLM-4.7 I'm even more confident with the results it gives me. I use it both with regular claude-code and opencode (more opencode lately, since claude-code is obviously designed to work much better with Anthropic models).

Also notice that this is the "-Flash" version. They were previously at 4.5-Flash (they skipped 4.6-Flash). This is supposed to be equivalent to Haiku. Even on their coding plan docs, they mention this model is supposed to be used for `ANTHROPIC_DEFAULT_HAIKU_MODEL`.

RickHull · 2026-01-19T18:41:32 1768848092

Same, I got 12 months of subscription for $28 total (promo offer), with 5x the usage limits of the $20/month Claude Pro plan. I have only used it with claude code so far.

theshrike79 · 2026-01-21T07:58:41 1768982321

This offer was so stupid cheap there was no point in NOT getting :D

stogot · 2026-01-19T20:38:13 1768855093

Do they still have that promo offer?

Mashimo · 2026-01-19T21:39:13 1768858753

Looks like they have something for 29 USD with 3x the claude code usage: https://z.ai/subscribe

victorbjorklund · 2026-01-19T21:53:16 1768859596

How has the performance been lately? I heard some people say that they change their limits likely making it almost not useable

chewz · 2026-01-19T23:52:17 1768866737

Never had any problems with Z.ai models.

However they are using more thinking internally and that makes them seem slow.

polyrand · 2025-12-24T15:59:26 1766591966

Not sure about the impact of these, I guess it depends on the context where this engine is used. But there seems to be already exploits for the engine:

https://x.com/itszn13/status/2003707921679679563

https://x.com/itszn13/status/2003808443761938602

polyrand · 2025-12-22T22:29:19 1766442559

A few comments mentioning distillation. If you use claude-code with the z.ai coding plan, I think it quickly becomes obvious they did train on other models. Even the "you're absolutely right" was there. But that's ok. The price/performance ratio is unmatched.

hashbig · 2025-12-23T04:54:20 1766465660

I had Gemini 3 Flash hit me this morning with "you're absolutely right" when I corrected it on a mistake it did. It's not conclusive of anything.

polyrand · 2025-12-23T06:13:00 1766470380

That's interesting, thanks for sharing!

It's a pattern I saw more often with claude code, at least in terms of how frequently it says it (much improved now). But it's true that just this pattern alone is not enough to infer the training methods.

theptip · 2025-12-23T06:26:58 1766471218

Or it’s conclusive of an even broader trend!

ljosifov · 2025-12-23T10:12:27 1766484747

I imagine - and sure hope so - everyone trains on everything else. Distillation - ofc if one has bigger/other models providing true posterior token probabilities in the (0,1) interval (a number between 0 and 1), rather than 1-hot-N targets that are '0 for 200K-sans-this-token, and 1 for the desired output token' - one should use the former instead of the latter. It's amazing how as a simple as straightforward idea should face so much resistance (paper rejected) and from the supposedly most open minded and devoted to knowing (academia) and on the wrong grounds ('will have no impact on industry'; in fact - it's had tremendous impact on industry; better rejection wd have been 'duh it is obvious'). We are not trying to torture the model and the gpu cluster to be learning from 0 - when knowledge is already available. :-)

Havoc · 2025-12-23T01:23:02 1766452982

>Even the "you're absolutely right" was there.

I don't think that's particularly conclusive for training on other models. Seems plausible to me that the internet data corpus simply converges on this hence multiple models doing this.

...or not...hard to tell either way.

polyrand · 2025-12-22T09:35:52 1766396152

I enjoyed the post. I was about to link the "Let Me Speak Freely" paper and "Say What You Mean" response from dottxt, but that's already been posted in the comments.

I'm a huge fan of structured outputs, but also recently started splitting both steps, and I think it has a bunch of upsides normally not discussed:

1. Separate concerns, schema validation errors don't invalidate the whole LLM response. If the only error is in generating schema-compliant tokens (something I've seen frequently), retries are much cheaper.

2. Having the original response as free text AND the structured output has value.

3. In line with point 1, it allows using a more expensive (reasoning) model for free-text generation, then a smaller model like gemini-2.5-flash to convert the outputs to structured text.

polyrand · 2025-12-11T11:33:07 1765452787

A frozen dictionary would be very welcome. You can already do something similar using MappingProxyType [0]

  from types import MappingProxyType
  
  d = {}
  
  d["a"] = 1
  d["b"] = 2
  
  print(d)
  
  frozen = MappingProxyType(d)
  
  print(frozen["a"])
  
  # Error:
  frozen["b"] = "new"

[0]: https://docs.python.org/3/library/types.html#types.MappingPr...

zahlman · 2025-12-11T13:27:04 1765459624

> You can already do something similar

Only if you deny access to the underlying real dict.

ali_m · 2025-12-11T13:53:47 1765461227

Yes, this only prevents the callee from mutating it, it can't provide a strong guarantee that the underlying mapping won't be changed upstream (and hence MappingProxyType can't be washable).

polyrand · 2025-11-25T12:40:29 1764074429

There is too much focus on students cheating with AI and not enough on the other side of the equation: teachers.

I've seen assignments that were clearly graded by ChatGPT. The signs are obvious: suggestions that are unrelated to the topic or corrections for points the student actually included. But of course, you can't 100% prove it. It's creating a strange feedback loop: students use an LLM to write the essay, and teachers use an LLM to grade it. It ends up being just one LLM talking to another, with no human intelligence in the middle.

However, we can't just blame the teachers. This requires a systemic rethink, not just personal responsibility. Evaluating students based on this new technology requires time, probably much more time than teachers currently have. If we want teachers to move away from shortcuts and adapt to a new paradigm of grading, that effort needs to be compensated. Otherwise, teachers will inevitably use the same tools as the students to cope with the workload.

Education seemed slow to adapt to the internet and mobile phones, usually treating them as threats rather than tools. Given the current incentive structure and the lack of understanding of how LLMs work, I'm not optimistic this will be solved anytime soon.

I guess the advantage will be for those that know how to use LLMs to learn on their own instead of just as a shortcut. And teachers who can deliver real value beyond what an LLM can provide will (or should) be highly valued.

ericmcer · 2025-11-25T18:12:12 1764094332

It is probably a good time to view the root goals of education instead of the markers of success that we have been shooting at for a long time now (worksheets, standardized tests, etc.).

A one hour lecture where students (especially <20 year old kids) need to proactively interject if they don't understand something is a pretty terrible format.

> "Education seemed slow to adapt to the internet and mobile phones, usually treating them as threats rather than tools. Given the current incentive structure and the lack of understanding of how LLMs work"

Good point, it is less like a threat and more like... "how do we shoehorn this into our current processes without adapting them at all? Oh cool now the LLM generates and grades the worksheets for me!".

We might need to adjust to more long term projects, group projects, and move away from lectures. A teacher has 5*60=300 minutes a week with a class of ~26. If you broke the class into groups of 4 - 5 you could spend a significant amount of time with each group and really get a feel for the students beyond what grade the computer gives to their worksheet.

gchallen · 2025-11-25T16:48:40 1764089320

As a teacher, I agree. There's a ton of covert AI grading taking place on college campuses. Some of it by actual permanent faculty, but I suspect most of it by overworked adjuncts and graduate student teaching assistants. I've seen little reporting on this, so it seems to be largely flying under the radar. For now. But it's definitely happening.

Is using AI to support grading such a bad idea? I think that there are probably ways to use it effectively to make grading more efficient and more fair. I'm sure some people are using good AI-supported grading workflows today, and their students are benefiting. But of course there are plenty of ways to get it wrong, and the fact that we're all pretending that it isn't happening is not facilitating the sharing of best practices.

Of course, contemplating the role of AI grading also requires facing the reality of human grading, which is often not pretty. Particularly the relationship between delay and utility in providing students with grading feedback. Rapid feedback enables learning and change, while once feedback is delayed too long, its utility falls to near zero. I suspect this curve actually goes to zero much more quickly than most people think. If AI can help educators get feedback returned to students more quickly, that may be a significant win, even if the feedback isn't quite as good. And reducing grading burden also opens up opportunities for students to directly respond to the critical feedback through resubmission, which is rare today on anything that is human-graded.

And of course, a lot of times university students get the worst of both worlds: feedback that is both unhelpful and delayed. I've been enrolling in English courses at my institution—which are free to me as a faculty member. I turned in a 4-page paper for the one I'm enrolled in now in mid-October. I received a few sentences of written feedback over a month later, and only two days before our next writing assignment was due. I feel lucky to have already learned how to write, somehow. And I hope that my fellow students in the course who are actual undergraduates are getting more useful feedback from the instructor. But in this case, AI would have provided better feedback, and much more quickly.

whearyou · 2025-11-25T18:11:23 1764094283

“It's creating a strange feedback loop: students use an LLM to write the essay, and teachers use an LLM to grade it. It ends up being just one LLM talking to another, with no human intelligence in the middle.”

This was the plot to a recent South Park episode: https://m.imdb.com/title/tt27035146/

HDThoreaun · 2025-11-25T17:18:39 1764091119

When I was in high school none of my teachers actually read any of the homework we turned in. They all skimmed it, maybe read the opening and closing paragraph if it was an essay. So I guess the question is if having an ai grade it is better than having a teacher look at it for 15 seconds, because that’s the real alternative.

polyrand · 2025-09-30T06:27:47 1759213667

Instead of containers, which may not always be available, I'm experimenting with having control over the shell to whitelist the commands that the LLM can run [0]. Similar to an allow list, but configured outside the terminal agent. Also trying to make it easy to use the same technique in macOS and Linux

[0]: https://ricardoanderegg.com/posts/control-shell-permissions-...

jcgl · 2025-09-30T12:10:53 1759234253

Not specific to LLM stuff, but I've lately been using bubblewrap more and more to isolate bits of software that are somewhat more sketchy (NPM stuff, binaries downloaded from GitHub, honestly most things not distro-packaged). It was a little rocky start out with, but it is nice knowing that a random binary can't snoop on and exfiltrate e.g. my shell history.

tennox · 2025-09-30T22:32:45 1759271565

You might might my (alpha-level) attempt at this: https://gitlab.com/txlab/ai/sandcastle

jcgl · 2025-10-01T20:58:18 1759352298

Looks like it's probably neat, but is kinda inverse from what I myself want. I want:

- something general-purpose (not specific to LLMS (I myself don't use agents--just duck.ai when I want to ask an LLM a question)) - something focused on sandboxing (bells and whistles like git and nix integration sound like things I'd want to use orthogonal tools for)

philipp-gayret · 2025-09-30T07:54:11 1759218851

I really like this and we're doing a similar approach but instead using Claude Code hooks. What's really nice about this style of whitelisting is that you can provide context on what to do instead; Let's say if `terraform apply` is banned, you can tell it why and instruct it to only do `terraform plan`. Has been working amazing for me.

polyrand · 2025-09-30T16:51:11 1759251071

Me too! I also have a bunch of hooks in claude code for this. But codex doesn't have a hooks feature as polished as claude code (same for their command permissions, it's worse than Claude Code as of today). That's why I explored this "workaround" with bash itself.

khafra · 2025-09-30T06:33:32 1759214012

An interesting exercise would be to let a friend into this restricted shell, with a prize for breaking out and running rm -rf / --no-preserve-root. Then you know to switch to something higher-security once LLM capabilities reach the level of that friend.

user3939382 · 2025-09-30T07:54:20 1759218860

You have to put them in the same ACL, chroot, whatever permission context for authorization you’d apply to any other user human or otherwise. For some resources it’s cumbersome to setup but anything else is a hope and a prayer.

_heimdall · 2025-09-30T13:36:34 1759239394

This is how I've been using Gemini CLI. It has no permissions by default, whether it wants to search google, run tests, or update a markdown file it has to propose exactly what it needs to do next and I approve it. Often its helpful even just to redirect the LLM, if it starts going down the wrong path I catch it early rather than 20 steps down that road.

I have no way of really guaranteeing that it will do exactly what it proposed and nothing more, but so far I haven't seen it deviate from a command I approved.

hboon · 2025-09-30T09:12:57 1759223577

I didn’t check, but sometimes Claude Code writes scripts and run them (their decision); does your approach guard against that?

polyrand · 2025-09-30T16:49:38 1759250978

It depends. If you allow running any of bash/ruby/python3/perl, etc. and also allow Claude to create and edit files without permission, then it won't protect against the pattern you describe.

polyrand · 2025-09-11T21:33:44 1757626424

A bit off topic but:

  The reason for the "lite" in the name is that it doesn’t run a separate process, it doesn’t listen on a port or a socket, and you can’t connect to it.

The name doesn't really contain "lite". It's SQL-ite. So the suffix is "ite":

  The suffix "ite" is derived from the Greek word lithos (from its adjectival form -ites), meaning rock or stone [0]

[0]: https://english.stackexchange.com/a/34010

chuckadams · 2025-09-11T21:48:21 1757627301

From the horse's mouth (Hipp's): "I wrote SQLite, and I think it should be pronounced "S-Q-L-ite". Like a mineral. But I'm cool with y'all pronouncing it any way you want. :-)"

Me, I say "sequeLITE" with the emphasis on the last syllable, but now I'm thinking of switching to "SEQuelite". You'll never catch me pronouncing it "ess-cue-ell" either way dammit!

:)

arcanemachiner · 2025-09-12T01:55:43 1757642143

In my head, I call it "ess-kew-lite".

christophilus · 2025-09-11T21:57:25 1757627845

Yeah the first time I heard Hipp pronounce it, my brain glitched.

stevula · 2025-09-12T05:52:39 1757656359

The Stack Exchange link is incorrect about -ite being etymologically derived from lithos, as one of the commenters there noted. Maybe a misunderstanding of this wiktionary note or similar:

> But by the Hellenistic period, both the masculine -ίτης (-ítēs) and the feminine -ῖτις (-îtis) became very productive in forming technical terms for products, diseases, minerals and gems (adjectives with elliptic λίθος (líthos, “stone”)), ethnic designations and Biblical tribal names.

The meaning of that is not that -ite is etymologically derived from lithos. It’s trying to say that mineral names like “hematite” (αἱματίτης - literally “blood-red”) are originally adjectives agreeing with an implied noun lithos.

jjgreen · 2025-09-12T08:32:11 1757665931

Comments like this are why I read HN ...