charcircuit's Hacker News comments

A pretty simple one would be to have every model try and one shot every ticket your company has and then measure the acceptance rate of each model.

Except that if you tried one-shotting your ticket twenty times, at different hours of the day and on different days of the week, you would see enough variance in the results to make them look like different benchmarks even if you used the same model every time. Much more so if you fiddled with the thinking budget or changed the prompt.
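A toy sketch of the point above, with hypothetical numbers standing in for real model calls: even a fixed model with a fixed 60% true acceptance rate will post visibly different scores across repeated benchmark runs, purely from sampling noise.

```python
import random

def one_shot_accepted(model_seed: int, ticket: int, run: int) -> bool:
    # Stand-in for "same model, same ticket, different hour/day":
    # acceptance is stochastic even with everything else held fixed.
    seed = model_seed * 1_000_003 + ticket * 1_009 + run
    return random.Random(seed).random() < 0.60  # assumed true 60% acceptance rate

tickets = range(100)          # hypothetical ticket backlog
rates = []
for run in range(20):         # twenty one-shot passes over the same tickets
    accepted = sum(one_shot_accepted(0, t, run) for t in tickets)
    rates.append(accepted / len(tickets))

print(f"spread across runs: min={min(rates):.2f} max={max(rates):.2f}")
```

Nothing here models a real LLM; it just shows that a benchmark built on 100 tickets has several percentage points of run-to-run spread before any model difference enters the picture.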

Because the models are non-deterministic, because of constant updates and changes, and because the models are throttled according to the number of users, releases, etc.


You never get "the same" Steph Curry, he might be tired, annoyed by a fan, getting older... but if he and I were to throw 100 3-pointers, we could all correctly guess who will perform better.

Good point.

But I use Codex and Claude daily (for work and hobby respectively). And there are days where one or the other just seems to have gotten up on the wrong side of the bed. Or is just being lazy. Or is suddenly super-powered, doing everything including what I asked it not to. (To be fair, the same thing happens with me. :/)

I am convinced that if I was bench-marking, I would be convinced these are different models on different days.

[This conviction may say more about me than about the model.]


From what I've seen on YouTube the cars do drive themselves. This seems more like the type of thing with AI where people change the goal posts of what AI means. Just because a car did not slow down in a school zone, that doesn't mean that the car wasn't driving itself.

This is a common misconception. People tend to think driving is controlling the steering and pedals, so if FSD does those things it must be driving.

It's not. Driving is whatever has ultimate responsibility for the vehicle and its occupants. If a cop pulls you over while FSD is enabled, it's not Tesla who's paying the ticket. If FSD has an issue, you're the driver who has to respond.

Think of FSD as a very nice cruise control. You're still driving, even if you aren't touching the wheel.


Sort of how programming isn't the same as writing code — it also involves a bunch of other things, like all the design and planning work.

It's a common misconception because the thing is called "full self driving."

technically it was called "Full Self Driving (BETA)" and then "Full Self Driving (Supervised)"

The bottom line is, no one else is even remotely close to that experience for the driver, liable or not. Probably with good reason, as every other car company actually listens to its lawyers.

So if the law says that a human in the car has to be responsible then it is impossible for a self driving car to exist. I do not think tying the definition to legal liability is right.

I don't see why self driving couldn't just be steering and pedals. It would be pretty limiting but it would be able to drive itself in a circle at least.


No. The law allows passengers in self-driving taxis not to be responsible, including taxis operated by Tesla.

Here Tesla makes it clear to people who turn on "Full Self Driving" that the driver must maintain supervision, and thus responsibility. As such, it's Tesla's choice that they aren't selling self-driving cars.

It wouldn’t be such a big deal if some random engineer said they’d eventually do X, but when it’s the CEO repeatedly saying the same across many public appearances that’s as binding as a Super Bowl advertisement.


It's not about legal liability, though I admit the example of tickets was informal and confusing.

Let me state it a little more clearly: the driver is the component in the vehicle system design that's ultimately responsible for ensuring the safety invariants are maintained. In a normal car, that's clearly the human in the driver's seat. Less obviously, the same is true of a Tesla with FSD. If we move that human to a remote control room, they're still the driver even if they're not physically in the vehicle.

It's only when the computer itself becomes responsible for maintaining system safety that it becomes the driver. Waymo is an example. Waymo also employs people in a remote call center, but those humans aren't responsible for safety and hence aren't drivers. But a Waymo employee out on the street using their <5mph remote control mode is driving it, because they've taken on the safety role again.

Legal liability can follow from this, but it's a much more complicated classification that I don't expect to ever have a singular answer, or even a knowable answer in many cases.


By that logic it’s ok if the car slams itself against a concrete wall - just because it failed to stop in time doesn’t mean it wasn’t driving itself.

Self driving cars are supposed to obey the same rules as human drivers.


Well ... yes. By that logic it is the case. It applies to humans too - if a human slams their car into a concrete wall then the human was still driving the car. They did a bad job of it, but they were in fact driving.

A car being driven autonomously doesn't imply much about the quality of that driving. They're still going to make bad decisions and have accidents, just like humans do (a friend of mine died slamming their car into a tree). There is probably some minimum where we'd say that it isn't really driving because it can't do anything right, but modern self driving systems are past that.


> A car being driven autonomously doesn't imply much about the quality of that driving

Only that’s not what they’re selling us - they say autonomous cars are safer than humans, fewer accidents per mile driven, faster reaction times yadda yadda. I think this implies quality and not respecting speed limits is not something that sounds very high-quality. At least not while they have to share the road with humans.


I'm just going to quote myself here:

> (a friend of mine died slamming their car into a tree)

Autonomous cars can run headlong into concrete walls and still be substantially better drivers than humans. There is no inherent contradiction there at all. They can speed and still be more law-abiding than humans too, humans get pretty casual about speed limits. I don't think you've grappled with just how bad humans are at operating rolling tin cans travelling at speeds evolution has not prepared us to move at. We're really bad at it. Autonomous cars aren't ever going to be perfect, they are merely a better alternative than humans.


Tesla FSD is vulnerable to RoadRunner and Wile E. Coyote style tricks.

Fortunately the ACME products are flawed and subject to their own litigation, see e.g. Coyote vs. ACME (2026).

It's not. That video was using Autopilot, not FSD, and subsequent videos using the actual new FSD were fine.

> it's not.

"Tesla FSD is invulnerable to tricks" is a pretty strong claim.


Both statements can be true. Human vs. self-driving is a different classification from good vs. bad driving. Humans can slam into a wall too.

By this definition, putting a brick on the accelerator and tying the steering wheel in place is self-driving.

When full liability is put on the manufacturer, then we can talk about "cars driving themselves".

Mercedes-Benz accepts full liability when their Drive Pilot autonomous system is active.

https://www.mbusa.com/en/drive-pilot


>the cars do drive themselves

Those are cars with the "HW4" FSD hardware, which was released in March 2023.

There were a lot of cars sold with "HW2" (Nvidia-based) and HW3 (Tesla silicon). Those cars apparently cannot be upgraded to HW4 because of physical size differences between the units. HW2 was able to be upgraded to HW3.

Those videos you are talking about seeing do not represent the FSD experience for all, or possibly even most, Tesla FSD vehicles in the wild.


It's fairly simple. Tesla says I have to supervise, and they are not liable for anything the car does wrong. It is not full self-driving any more than a decades-old car with cruise control is.

AI never had fixed goalposts; it has always meant programming designed to look like human behavior, like the AI opponents in old video games.

Tesla FSD won't be level 5 until Tesla has liability for any crashes it causes the way Waymo does.

Elon Musk's claims included (exact quotes; these posts are still on X):

Jan 10, 2016: In ~2 years, summon should work anywhere connected by land & not blocked by borders, eg you're in LA and the car is in NY

Jul 16, 2019: If we make all cars with FSD package self-driving, as planned, any such Tesla should be worth $100k to $200k, as utility increases from ~12 hours/week to ~60 hours/week

These aren't goalposts moved by antis; these are the expectations set by Elon Musk himself when advertising his products.


Here's more than a decade of claims from Tesla on self driving vehicles summarized in one handy table:

https://en.wikipedia.org/wiki/List_of_predictions_for_autono...


Those YouTubers are all there to make Tesla look good. It’s a grift. The ones that are honest and show the bad side get kicked out of the Tesla club fast and dogpiled on.

Also a school zone is one of the most basic things the car should be able to handle. If it can’t do that, it’s not ready for public use.


>Also a school zone is one of the most basic things the car should be able to handle. If it can’t do that, it’s not ready for public use.

Humans don't always follow the law driving through school zones. And when humans speed through a school zone, the human is definitely driving the car. Are we ready to let humans drive on public roads?

The argument has to go into the magnitude of the problem to get anywhere meaningful.


See, that's really the best argument for this. It can drive itself the same way I can fly an Airbus A321. You can't sue me because I didn't land the plane "intact".

This does not make sense to support. Businesses that have proper privacy controls and security do not want to be lumped together with random shady apps; they want users to explicitly opt out per service. Another issue with this header is that users could set it and then accidentally opt out of other sharing without realizing it, since the header is set in one global place. Standardizing a per-app way to revoke consent, along with showing the privacy policies and security measures each app has put in place, would be a more sensible alternative that could gain traction.

Gathering information without real consent is shady.

Bots trying to brute-force accounts may not have the API implemented the way a real device does.

Sure, and my desktop computer just reports 100% battery level? Which can't be easily replicated by a static header in the bot?

This would be a silly thing to use to identify bots.
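A minimal sketch of why it's a weak signal, with a hypothetical function name and made-up report values: a check that flags clients whose reported battery level never varies will catch only the laziest bots, and it false-positives on exactly the desktop-pinned-at-100% case mentioned above. Any bot that adds jitter defeats it.

```python
def suspiciously_static(battery_reports: list[float]) -> bool:
    """Flag clients whose reported battery level never varies across sessions.

    A real phone's level drifts between requests; a naive bot sending a
    hard-coded value does not. Trivially defeated by adding random jitter,
    and it misfires on plugged-in desktops, which is why it's a weak
    signal on its own.
    """
    return len(battery_reports) >= 3 and len(set(battery_reports)) == 1

print(suspiciously_static([1.0, 1.0, 1.0, 1.0]))  # True (a desktop pinned at 100%)
print(suspiciously_static([0.83, 0.79, 0.91]))    # False
```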


It's not silly if it works.

You could get that same educational value from programming things on a smartphone.

Current smartphones are highly optimized for content consumption, a.k.a. doomscrolling. Nothing serious exists for programming. On top of that, a touch keyboard and hard-to-reach special characters make programming on a modern smartphone a big chore. I miss the old days of smartphones that had a hardware keyboard with tactile feedback. I used to write and maintain a PHP-based dynamic website circa 2007 on a Sony Ericsson K770i, uploading through a J2ME-based FTP client that also had a text editor in it. If I remember correctly, it was called MobyExplorer.

It's much harder to type on a TI calculator than a smartphone.

Not sure I agree. You can "blind type" on a physical keyboard, and even if it has less sophistication in the way of inputting large amounts of text (lack of auto complete, lack of fuzzy typing/auto correct), a calculator is purpose built with tons of shortcuts and contextual menus that you access from muscle memory without second guessing yourself. Right now, if I've got a mildly complicated mathematical expression to type, I'd rather do it on a last-century calculator rather than e.g. on Android's GeoGebra.

It's been a long time since I've done it, but I could type pretty quickly on a TI-83 - even with the silly ABC keyboard layout and all.

I did, in J2ME for my Nokia. That was a completely different experience and much harder to get going. I did make a color 2D game, though, in about a week.

The TI-200 was much more accessible and fun, creating small little programs during or after class. Only once you wanted to go assembly did it become a chore again.

To summarize: not the same.


What's your favorite free programming environment for commonly used smartphones?

> What's your favorite free programming environment for commonly used smartphones?

Termux

  pkg install python
  python
  print('hello')
  ctrl+D
Haven't tried these, but have seen them recommended:

Acode

Termux + neovim

Termux + code-server (vscode-like, accessed through phone browser at localhost)


I like Codea for iOS, though the free version has a soft-limit at 500 lines. If a project gets bigger than 500 lines you can still run code but it'll nag you to upgrade.

I don't have a favorite. I am not aware of anyone who has made a proper investment in a quality development app for mobile, due to the low market demand. While development there is better than on a calculator, the options still fall below my expectations.

It is not proof. It is clearly a derivative work infringing on Nintendo's copyright.

Depends on the country; a lot of countries have exceptions for interoperability (at least the whole EU), and since these projects are mainly used to make ports to other systems, it may be covered.

This is an absolutely ridiculous interpretation. There is no interoperability here at all, you are literally just copying the work in question.

It is like claiming that compiling Samba to run in $NEW_PLATFORM suddenly strips Samba of the GPLv3.


You absolutely aren't copying the work, recompilation projects are intensive work and a re-imagining of what the source code could look like. Compilation is still a one way process.

And then for the legal part, that's why it's called an exception.


There is little to no creative work whatsoever if you end up with exactly the same game; and often they end up with exactly the same binary as well. Source translations are derivative works almost by definition. It doesn't matter what magic you use to generate it.

And again, where is the interoperability here? An interoperability exception would apply if there were whitebox cryptography, Nintendo logo-style checks, or anything else where the only way for the work to run would be to violate the copyright of _exactly that_. Under no circumstances can you simply copy and distribute the entire work (or derivatives) while claiming "interoperability exception!". It makes utterly no sense.


I disagree, the creative work is in figuring out what the game does, and the resulting recompilation is completely different from the original source code.

And then for the interoperability, these decompilation projects are primarily made to target other systems, not the original platform. That's the textbook definition of interoperability.

Let's be real, N64 and the PS1/PS2 (where most of these projects are based) are crumbling old platforms at this point and these projects are sometimes the best way to run games when they exist.


Decompilation produces a derivative work. This is not up for debate, or disagreement.

The exception for interoperability only applies to _the minimum required_ for interoperability. You can use this exception to distribute e.g. game authorization code even if copyright would not allow you to do it.

You _cannot_ use this as an excuse to pirate the entire program, much less to create your own derivative work and distribute it!

This is just wishful thinking that comes up every so often in these threads (this is now the 5th time I've seen it parroted here). And then, when Nintendo inevitably shuts everything down, cue the crying. This ignorance is simply setting these projects up for failure.

https://news.ycombinator.com/item?id=45643106


That's your interpretation; I have mine. As far as I know, none of these recompilation projects has ended up in any EU court yet, so your interpretation is as valid as mine.

And Nintendo can pound sand, sorry. The only realistic ways to play those aging games is on an emulator or recompilation projects nowadays.

Nintendo also hasn't struck these projects down; maybe they are afraid of setting a precedent.


So, wishful thinking it is.

There is plenty of jurisprudence about decompilation in the EU. Just search for your favorite case. I'm based in the EU (France). But FYI, despite what you may think, in practice the US is more lax about this than the EU is.

In the EU, for example, decompilation even if you don't distribute may very well be illegal (because it would be an unauthorized temporary copy of the program); the US courts are way more lax when it comes to these temporary never-distributed copies (which are almost always fair use, a concept that doesn't exist per-se in the EU). This is a big problem in the EU for security research (which obviously does not fall into interoperability).

Emulation would be acceptable, which is yet another reason the interoperability clause does not apply (since you _already_ have a way to interoperate that doesn't require distributing copyrighted software, and the EU interoperability clause very explicitly says that then it does _not_ apply).


Derivative works aren't some unknowable arcane legal term. They're a pretty fundamental aspect of copyright law. The canonical examples of derivative works are things like adaptation of a book to a film, translation of a book, or a sequel.

And given these examples, it's very clear that recompilation to play on modern hardware is quite similar in spirit to translating a book into a different language, which makes it a derivative work. The other alternative is that there is insufficient creativity in the recompilation effort to merit independent copyright at all, in which case it's just plain copying of the original work. In either case, it's infringement.


The computer doesn't even work. Disappointing.

The headline is pretty misleading.

Awesome project, regardless.


As the browser vendor, you do not have to license Widevine. It is the responsibility of your OS vendor to provide and license a DRM solution. So, for example, when you build your browser on a Mac, it uses Apple's APIs, relying on FairPlay to handle Spotify.

It's unrealistic to expect every app on the system to have to deal with licensing DRM themselves.


This is pure speculation. It is a million times more likely that this data is strictly used to combat scraping and fraud.

You saw speculation, and you raised with speculation and hyperbole!

>Openclaw is not a competitor with Claude

Not Claude, but other Anthropic products such as Claude Cowork.

