Another thing... they alter the localStorage & sessionStorage prototype, wrapping the native ones with a wrapper that prevents any key not in their whitelist from being set.
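For anyone curious what that pattern looks like, here's a rough sketch of a whitelist wrapper. To be clear, this is my own guess at the approach, not their actual code; the allow-list keys and the Proxy-based technique are placeholders.

```typescript
// Rough sketch of a storage wrapper that only lets whitelisted keys through.
// Key names below are hypothetical.
const ALLOWED_KEYS = new Set(["session_id", "consent"]);

function wrapStorage(native: Storage): Storage {
  return new Proxy(native, {
    get(target, prop) {
      if (prop === "setItem") {
        return (key: string, value: string) => {
          // Non-whitelisted keys are silently dropped.
          if (ALLOWED_KEYS.has(key)) {
            target.setItem(key, value);
          }
        };
      }
      const value = Reflect.get(target, prop, target);
      return typeof value === "function" ? value.bind(target) : value;
    },
    set(target, prop, value) {
      // Also guard direct assignment like localStorage.foo = "bar".
      if (typeof prop === "string" && ALLOWED_KEYS.has(prop)) {
        target.setItem(prop, String(value));
      }
      return true; // pretend the write succeeded either way
    },
  });
}

// Shadow the globals so existing code transparently goes through the wrapper.
// This works in current major browsers, though it's not guaranteed by spec.
Object.defineProperty(window, "localStorage", {
  value: wrapStorage(window.localStorage),
  configurable: true,
});
Object.defineProperty(window, "sessionStorage", {
  value: wrapStorage(window.sessionStorage),
  configurable: true,
});
```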
As I understand it, Comma.ai is focused on driver assistance, not fully autonomous self-driving.
The features listed on the Wikipedia article are lane centering, cruise control, driver monitoring, and assisted lane changes.[1]
The article I linked to from Starsky addresses how the first 90% is much easier than the last 10% and even cites "The S-Curve here is why Comma.ai, with 5–15 engineers, sees performance not wholly different than Tesla’s 100+ person autonomy team."
To give an example of the difficulty of the last 10%: I saw an engineer from Waymo give a talk about how they had a whole team dedicated to detecting emergency vehicle sirens and acting appropriately. Both false positives and false negatives could be catastrophic so they didn't have a lot of margin for error.
Speaking as a user of Openpilot / Comma device, it is exactly what the Wikipedia article described. In other words, it's a level 2 ADAS.
My point was that he had more than a naive / "pedestrian-level" (pun intended?) understanding of the problem domain, since he worked on the Comma.ai project for quite some time, even if the device is only capable of solving maybe 40% of the autonomous driving problem.
> At the end, movies are about the stories, not just pretty graphics.
The great people at Pixar and DreamWorks would be a bit offended. Over the past three or so decades they have pushed every aspect of rendering to its very limits: water, hair, atmospheric effects, reflections, subsurface scattering, and more. Watching a modern Pixar film is a visual feast. Sure, the stories are also good, but the graphics are mind-bendingly good.
What you said is possible by feeding the output of speech-to-text tools into an LLM. You can prompt the LLM to make sense of what you're trying to achieve and turn it into a set of actions. With a CLI it's trivial: you can have your verbal command translated into working shell commands. With a GUI it's slightly more complicated because the LLM agent also needs to know what's on your screen, etc.
That CLI bit I mentioned earlier is already possible. For instance, on macOS there’s an app called MacWhisper that can send dictation output to an OpenAI‑compatible endpoint.
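To make the CLI part concrete, here's a rough sketch of the transcript-to-shell-command step against an OpenAI-compatible chat endpoint. The endpoint URL, model name, prompt, and example transcript are all placeholders; this isn't anything MacWhisper itself ships with.

```typescript
// Sketch: send a speech-to-text transcript to an OpenAI-compatible
// chat completions endpoint and ask for a single shell command back.
const ENDPOINT = "http://localhost:8080/v1/chat/completions"; // placeholder

async function transcriptToShellCommand(transcript: string): Promise<string> {
  const response = await fetch(ENDPOINT, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "local-model", // placeholder model name
      messages: [
        {
          role: "system",
          content:
            "Translate the user's spoken request into one POSIX shell command. " +
            "Reply with the command only, no explanation.",
        },
        { role: "user", content: transcript },
      ],
      temperature: 0,
    }),
  });
  const data = await response.json();
  return data.choices[0].message.content.trim();
}

// The dictated phrase would come from whatever speech-to-text tool you use.
transcriptToShellCommand("show me the five largest files in my downloads folder")
  .then((cmd) => console.log(cmd)); // e.g. something like: du -a ~/Downloads | sort -rn | head -5
```

Asking for the command only (and keeping temperature at 0) makes the output easy to show in a confirmation prompt before actually executing anything.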
I was just thinking about building something like this; looks like you beat me to the punch. I'll have to try it out.
I'm curious whether it handles commands just as well as wording you want cleaned up. I could see a model getting confused between editing the dictated input into text to be inserted and responding to it as a command. Sorry if that's unclear; it might be better if I just try it.
What's likely is that there won't be anything open or significant coming out of them anymore.