_alternator_'s comments

This is different: the cost of plastic goes up if fossil fuel consumption goes down, because plastic production currently uses a waste stream. Not sure if it's true, but it's different from my prior intuition about fossil fuels and plastic.

Exactly - a common understanding of fossil fuels is that we could just "use them for planes and plastics," but there would be an unexpected cost there, because the plastics are basically "free" waste products of processing done for other needs.

It's similar to how car heaters run on waste heat from the internal-combustion engine, which has to be accounted for separately in electric cars.


Right on with special relativity: Lorentz was also developing the theory and was a bit sour that Einstein got so much credit. Einstein basically said "what if special relativity were true for all of physics," not just electromagnetism, and out dropped E = mc^2. It was a bold step, but not an inexplicable one.

As for general relativity, he spent several years working to learn differential geometry (which was well-developed mathematics at the time but looked like abstract nonsense to most physicists). I'm not sure how he was turned on to this mathematics being applicable to gravity, but my guess is that it was motivated by some symmetry ideas. (It always comes down to symmetry.)


If people want to study this, perhaps it makes more sense to do it the way we used to: leave the "labels" of relativity out of the training set and see if the model comes up with it on its own.

Mass AI job displacement is real. The cost per engineer is going to go up, even as each engineer writes 5-10x the code, because AI tools are expensive. The solution is to reduce engineering headcount, and we are seeing that across the board.

Similar things are going to happen in accounting, HR, sales, and other areas. Everyone who is still employed will be more productive, but they will cost more to employ.


Citation needed, and not from a CEO who is looking to put a positive spin on their company's layoffs.

‘Potato quality’ ahahahahahahhaa I hope this was iPhone autocorrect to prove the point.

No, it's an old phrase. It came from the question, "Was this filmed on a potato?" when someone posted a video of particularly bad quality, as if their phone was a potato.

It wasn't too long ago, either. I've mentioned it in prior comments, but due to how MMS works at one major carrier (Verizon), picture quality was knocked back to pre-smartphone days for a large percentage of Android users.

The quick explainer: phones send a user agent with the request to fetch a media message, and that user agent carries a link to a file describing what the device can handle. Apple and BlackBerry hosted these files themselves; Verizon hosted most of the Android ones on its own network. It decommissioned the server hosting them a few years ago, which made all affected devices pull down the lowest, potato-quality image for compatibility. Huge number of complaints.
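For a rough sketch of what that lookup looks like (the X-Wap-Profile header is the standard UAProf one, but the fallback logic, element names, and sizes here are illustrative assumptions, not Verizon's actual implementation):

    import urllib.request
    import xml.etree.ElementTree as ET

    # "Potato" fallback used when the device's capability profile can't be fetched.
    FALLBACK_MAX_PIXELS = 640 * 480

    def max_image_pixels(request_headers: dict) -> int:
        """Largest image size the requesting device claims to support."""
        # The phone's fetch request carries a link to its UAProf capability file.
        profile_url = request_headers.get("X-Wap-Profile")
        if not profile_url:
            return FALLBACK_MAX_PIXELS
        try:
            with urllib.request.urlopen(profile_url, timeout=2) as resp:
                doc = ET.parse(resp)
            # UAProf is RDF/XML; assume a ScreenSize element like "1440x3040".
            width, height = (int(v) for v in doc.find(".//{*}ScreenSize").text.split("x"))
            return width * height
        except Exception:
            # Profile host decommissioned -> every affected device falls back to
            # the lowest-common-denominator quality, which is what users saw.
            return FALLBACK_MAX_PIXELS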


It's a phrase that's been around for years to mean "poor quality" (https://knowyourmeme.com/memes/recorded-with-a-potato). One theory behind the term is that the recording device was so bad/low-tech, it could be powered by a potato battery.

For smartphones, I think it has always meant you could replace your smartphone with a potato and get the same functionality.

Pretty sure that was sarcasm.

So… it’s like you completely understand the issue :)

And obviously, you tax the fuel at the source, right when it comes out of the ground. Higher prices get passed down, changing behavior because the product's externalities are priced correctly from the start.


To be clear, the source of the emissions would still be the consumer. Hydrocarbons can be used for non-CO2-emitting purposes such as chemical feedstock for pharmaceuticals, solvents, etc. We should only be levying a tax on uses that emit CO2 into the atmosphere, e.g. burning them in your ICE vehicle. It's not the fracking company that's emitting the CO2 (unless they're gas flaring or similarly emitting carbon during extraction, but that is a rounding error on total emissions).
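For a sense of the pass-through at the pump, a back-of-the-envelope sketch (the ~8.9 kg CO2 per gallon emission factor is the EPA's figure for motor gasoline; the $50/tonne carbon price is purely illustrative):

    # Back-of-the-envelope: carbon tax passed through to a gallon of gasoline.
    CO2_KG_PER_GALLON = 8.9        # EPA emission factor for motor gasoline
    CARBON_PRICE_PER_TONNE = 50.0  # illustrative, dollars per metric tonne of CO2

    tax_per_gallon = CO2_KG_PER_GALLON / 1000 * CARBON_PRICE_PER_TONNE
    print(f"~${tax_per_gallon:.2f} per gallon")  # roughly $0.45 per gallon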

Right, because processing oil into non-fuel products takes zero energy and produces zero emissions, right?

You can. Everything, including basic things like food, transportation, construction, and healthcare, will of course become more expensive. My objection was to asking fossil fuel companies to pay after you've already bought and burned your fuel cheap.

Citations? Seems like this is a general assertion so it’d be nice to see if it’s true in any particular case.

This is just a general pattern: applied mathematicians are often using things pure mathematicians haven't proved to be true yet. Examples tied to the generalized Riemann hypothesis alone are widespread. There are statements we aren't sure about, but there's also a lot that we are sure about but not sure about the proof of.
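For concreteness, here is the statement usually being assumed, in rough LaTeX (Miller's deterministic primality test, for example, is provably polynomial-time only under it):

    \textbf{Conjecture (Generalized Riemann Hypothesis).}
    For every Dirichlet character $\chi$, every nontrivial zero $\rho$ of the
    Dirichlet $L$-function $L(s,\chi) = \sum_{n \ge 1} \chi(n)\, n^{-s}$
    satisfies $\operatorname{Re}(\rho) = \tfrac{1}{2}$.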

People—like me—who’ve used LaTeX often learn to love en and em dashes. I think they are great, and I appreciate that people care about typography enough to use them. I also use an Oxford comma. It’s about care and quality; the fact that LLMs use them suggests that they are preferable. I’d encourage everyone to start.

Let's be clear about what endorsement for arXiv means here. To get a paper on arXiv, you either need a validated email address (e.g., most .edu addresses) or an endorsement from someone who has one. It's a simple gate that helps keep arXiv relatively free from spam, but it's not peer review.

I view it as nice that you’ve got someone serious who thinks the work is worth posting to arXiv, but the endorsement bar is generally quite low. I’d encourage you to send it to a journal (Didier might be able to recommend an appropriate venue) and really engage with the process and community. I’ve found that process to be extremely valuable (and humbling).


These are very serious research-level math questions. They are not "Erdős-style" questions; they look more like problems or lemmas I encountered while doing my PhD: things that don't make it into the papers but were part of an interesting diversion along the way.

It seems likely that PhD students in the authors' subfields are capable of solving these problems. What makes them interesting is that they seem to require fairly deep research-level context to really make progress.

It's a test of whether the LLMs can really synthesize results from knowledge that takes a human several years of postgraduate preparation in a specific research area to acquire.


So these are like those problems that are “left for the reader”?

Not necessarily. Even the statements may not appear in the final paper. The questions arose during research, and understanding them was needed for the authors to progress, but they may not be needed for the goal in mind.

No, results in a paper are marked as "left for the reader" because they are thought to be straightforward for the paper's audience. These were chosen because they are novel. I didn't see any reason to think they are easier than the main results, just maybe not of as much interest.

Very serious for mathematicians - not for ML researchers.

If the paper hadn't had the AI spin, would those 10 questions still have been interesting?

It seems to me that we have here a paper that is interesting solely because of the AI spin, while at the same time that spin is really poorly executed from the point of view of AI research, where this should be a blog post at most, not an arXiv preprint.


I'm confused by this comment. I'm pretty sure that someone at each of the big labs is running these questions through their models and will report back as soon as the results arrive (if not sooner, assuming they can somehow verify the answers).

The fact that you find it odd that this landed on arXiv is maybe a cultural thing… mathematicians kinda reflexively throw work up there that they think should be taken seriously. I doubt they intend to publish it in a peer-reviewed journal.


Yes, but people at those labs may be running those problems because a Fields Medalist is on the paper and it got hype.

Not because of the problems, and not because this is new methodology.

And once the labs report back, what do we know that we didn't know before? We already know, as humans, the answers to the problems, so that's not it. We already know that LLMs can solve some hard problems and fail on easy ones, so that's not it either.

So what do we really learn?


Ah. I think the issue is that research mathematicians haven’t yet hit the point where the big models are helping them on the problems they care about.

Right now I can have Claude Code write a single-purpose app in a couple of hours, complete with a nice front end, auth, a db, etc. (with a little babysitting). The models solve a lot of the annoying little issues that an experienced software developer has had to solve to get out an MVP.

These problems are representative of the types of subproblems research mathematicians have to solve to get a “research result”. They are finding that LLMs aren’t that useful for mathematical research because they can’t crush these problems along the way. And I assume they put this doc together because they want that to change :)


> These problems are representative of the types of subproblems research mathematicians have to solve to get a “research result”. They are finding that LLMs aren’t that useful for mathematical research because they can’t crush these problems along the way. And I assume they put this doc together because they want that to change :)

The same holds true for the IMProofBench problems. This dataset shows nothing new.


> So what do we really learn?

We will learn if the magical capabilities attributed to these tools are really true or not. Capabilities like being able to magically solve any math problem out there. This is important because AI hype is creating the narrative that these tools can solve PhD-level problems, and this will help disinfect that narrative. In my book, any tests that refute and dispel false narratives make a huge contribution.


> We will learn if the magical capabilities attributed to these tools are really true or not.

They're not. We already know that. FrontierMath, Yu Tsumura's 553rd problem, the RealMath benchmark; the list goes on. As I said many times in this thread, there is nothing novel in this benchmark.

The fact that this benchmark is so hyped shows that the community knows nothing, NOTHING, about prior work in this space, which makes me sad.


The last "unsolved" Erdős problem proof generated by LLMs that hit the news was so uninteresting that a paper published by Erdős himself had already stated the proof

aaaaaaand no one cared enough to check.

So I think the question is: are those problems interesting by themselves, or are they just uninteresting problems no one will ever care about, except that solving them would indicate LLMs are good at solving complex novel problems that don't exist in their training set?


The timed-reveal aspect is also interesting.

How is that interesting from a scientific point of view? This seems more like a social experiment dressed up as science.

Science should be about reproducibility, and almost nothing here is reproducible.


> Science should be about reproducibility, and almost nothing here is reproducible.

I can see your frustration. You are looking for reproducible "benchmarks". But you have to realize several things.

1) Research-level problems are those that bring the "unknown" into the "known," and as such they are not reproducible. That is why "creativity" has no formula. There are no prescribed processes or rules for "reproducing" creative work; if there were, it would not be considered "research."

2) Things already learned and trained on are in the realm of the "known," i.e., boilerplate, templated, and reproducible.

The problems in 2) above are where LLMs excel, but they have been hyped as excelling at 1) as well. And this experiment is trying to test that hypothesis.


DeepMind's Nobel Prize was primarily for its performance in CASP, which is pretty much exactly this: labs solve protein structures but don't publish them until after all the computational teams have submitted their predictions.

So I'm not sure where you're coming from in claiming that this isn't scientific.


It wasn't like this in any way.

CASP relies on a robust benchmark (not just 10 random proteins) and has clear participation criteria, objective metrics for how the eval plays out, etc.

So I stand by my claim: This isn't scientific. If CASP is Japan, a highly organized & civilized society, this is a banana republic.


Reproducibility is just one aspect of science; logic and reasoning from principles and data are the major aspect.

There are some experiments which cannot be carried out more than once.


> There are some experiments which cannot be carried out more than once

Yes, in which case a very detailed methodology is required: which hardware, runtimes, token counts, etc.

This does none of that.

