chermanowicz's comments

It's so capable at some things, and garbage at others. I uploaded a photo of some words for a spelling bee and asked it to quiz my kid on them. The first word it asked wasn't on the list. After multiple attempts to get it to ask only the words in the uploaded pic, it did, but then it would get the spellings wrong in the Q&A. I gave up.

I had it process a photo of my D&D character sheet and help me debug it, as I'm a n00b at the game. It also did a decent, although not perfect, job of adding up a handwritten bowling score sheet.

The talk is that the SambaNova-Intel deal is akin to a fire sale, so who knows what the real story is with any single company. Same with Graphcore.

https://www.datacenterdynamics.com/en/news/sambanova-explori...


How do you implement the same thing on Android?


It looks like there are spacers that might fill in different gaps depending on the case - the whole USB plug looks a bit long, presumably to suit different cases.


Will you offer this without a subscription fee? I like it, but for this it should just be a small voice model running on-device, not in the cloud - especially given you don't use APIs. I think you'll get a lot of adoption if it doesn't have a subscription or cloud requirement.



I can also guarantee you points 2 & 3 are pretty wrong as well. Funny (and also sad) how different people's realities can be. I have been worn out sitting on both sides of the table, to tell you the truth. (One thing I will say is that within VC, the vast majority of the folks actually doing the work are sympathetic and helpful to companies and to early/senior employees, but that gets lost up the food chain, so to speak: there are usually just one or two people making the final decisions about a particular deal or a whole portfolio, and those people are generally very self-interested.)


On one hand, some of these results are impressive; on the other, the illegal-move count is alarming - it suggests no real reasoning ability, since there should never be an illegal move. I mean, how could violating the rules of a fairly basic game (from a rules perspective) result in any 'outcome' for a model other than failure?


Agreed, this is what makes evaluating this so hard. A 1700-Elo chess player would never make an illegal move, let alone make illegal moves 12% of the time.

So from the model's perspective, we see at the same time both brilliance (most 1700 chess players would not be able to solve as many puzzles by looking just at the FEN notation) and a complete lack of any understanding of what it is trying to do at a fundamental, human-reasoning level.
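
To make the illegality point concrete: checking whether a model's move is even legal is mechanical. A rough sketch below, assuming the python-chess library; the moves here are just placeholder examples, not the benchmark's actual data.

    # Check a model's proposed SAN move against the legal moves for a
    # position, using the python-chess library. Moves below are placeholders.
    import chess

    def is_legal(fen: str, san_move: str) -> bool:
        board = chess.Board(fen)
        try:
            board.parse_san(san_move)  # raises ValueError if illegal/ambiguous
            return True
        except ValueError:
            return False

    start = chess.STARTING_FEN
    print(is_legal(start, "Nf3"))  # True
    print(is_legal(start, "Nf6"))  # False: no white knight can reach f6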


That's because an LLM does not reason. To me, as a layman, it seems strange that they don't wire in some kind of Prolog engine to fill the gap (like they wired in Python to fill the gap in arithmetic), but probably it's not that easy.
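
As I understand it, the "wired in Python" part typically works as a tool call: the model emits a marked-up expression, the host evaluates it with a real engine and splices the result back in. A toy sketch of that pattern - the CALC(...) syntax is made up purely for illustration, not any real API:

    # Toy tool-call pattern: find CALC(...) markers in the model's text,
    # evaluate them with actual Python arithmetic, splice the results back in.
    import ast, operator, re

    OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}

    def safe_eval(expr: str):
        def ev(node):
            if isinstance(node, ast.BinOp):
                return OPS[type(node.op)](ev(node.left), ev(node.right))
            if isinstance(node, ast.Constant):
                return node.value
            raise ValueError("unsupported expression")
        return ev(ast.parse(expr, mode="eval").body)

    def run_tools(model_output: str) -> str:
        return re.sub(r"CALC\((.*?)\)",
                      lambda m: str(safe_eval(m.group(1))), model_output)

    print(run_tools("The total is CALC(17 * 23) grams."))  # -> "The total is 391 grams."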


Prolog doesn't reason either; it does a depth-first, backtracking search over the possible derivations of your goal, and if that's not fast enough it can table (cache, memoize) previously solved subgoals.

People build reasoning engines from it, in the same way they do with Python and LISPs.
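
To make that concrete, the search-plus-tabling idea can be sketched in a few lines of Python: facts, one recursive rule, and a memo table so repeated subgoals are solved only once. This is only an illustration of the idea, not how a real Prolog engine is implemented.

    # Toy Prolog-flavored example: parent/2 facts, an ancestor/2 rule, and a
    # "table" (memo) so each subgoal is searched at most once.
    parents = {("alice", "bob"), ("bob", "carol"), ("carol", "dave")}
    table = {}

    def ancestor(x, y):
        goal = (x, y)
        if goal in table:
            return table[goal]
        table[goal] = False  # guard against cycles during the search
        table[goal] = (x, y) in parents or any(
            ancestor(z, y) for (p, z) in parents if p == x)
        return table[goal]

    print(ancestor("alice", "dave"))  # True, via bob and carol
    print(ancestor("dave", "alice"))  # False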


What do you mean by “an LLM doesn’t reason”?


I mean that it does not follow basic logic rules when constructing its answers. For many tasks it will get it right; however, it's not that hard to find a task for which an LLM will yield an answer that is obviously logically wrong - something that would be impossible for a human with basic reasoning.


I disagree, but I don’t have a cogent argument yet. So I can’t really refute you.

What I can say is, I think there's a very important disagreement here, and it divides nerds into two camps: the first thinks LLMs can reason, the second thinks they can't.

It's very important to resolve this debate, because if the former are correct then we are likely very close to AGI, historically speaking (<10 years). If not, then this is just a stepwise improvement, and we will now plateau until the next level of model sophistication or compute power etc. is achieved.

I think a lot of very smart people are in the second camp. But they are biased by their overestimation of human cognition. And that bias might be causing them to misjudge the most important innovation in history. An innovation that will certainly be more impactful than the steam engine and may be more dangerous than the atomic bomb.

We should really resolve this argument asap so we can all either breathe a sigh of relief or start taking the situation very very seriously.


I'm actually in the first camp, for I believe that our brains are really LLMs on steroids and logic rules are just part of our "prompt".

What we need is an LLM that will iterate over its output until it feels that it's correct. Right now LLM output is like a random thought in my mind, which might be true or not. Before writing a forum post I'd think it over twice, and maybe I'll rewrite the post before submitting it. When I'm solving a complex problem, it might take weeks and thousands of iterations. Even reading a math proof can take a lot of effort. An LLM should learn to do the same; I think that's the key to imitating human intelligence.
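
A rough sketch of that iterate-until-it-feels-correct loop - generate() and critique() here are hypothetical stand-ins for model calls, not any real API:

    # Hypothetical self-refinement loop: draft, critique, revise, repeat.
    def refine(prompt, generate, critique, max_rounds=5):
        draft = generate(prompt)
        for _ in range(max_rounds):
            feedback = critique(prompt, draft)  # e.g. "looks correct" or a list of problems
            if feedback.strip().lower() == "looks correct":
                break
            draft = generate(prompt + "\n\nPrevious attempt:\n" + draft
                             + "\n\nProblems found:\n" + feedback)
        return draft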


My guess is the probabilistic engine does sequence variation and just will not do anything else, so a simple A->B sort of logic is elusive at a deep level; secondly, the adaptive and very broad kinds of questions and behaviors it handles also make it difficult to write logic that could correct defective answers to simple logic questions.


Sucks, but there were three users, I think.


This is a list from 2018

