Hacker News | jvanderbot's comments

And a world model is useful for ... action space search which would require prediction?

It should improve agents' action selection by allowing them to evaluate actions' effects before performing them.

An agent using only a regular LLM has no real way to predict the results of its actions. It has to just take an action based on its training data and hope it's the right one. With a world model like this, it could do a second pass before each action to catch mistakes.

I don't know if this actually delivers yet, but if it does it might help make agents more usable.


Yeah, the fun part is the lookahead search, and here we are back in classical action-space fanout search, except I guess emulated in an LLM
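
For what it's worth, the lookahead being described can be sketched in a few lines. This is only a toy under stated assumptions: `predict` stands in for the world model and `score` for whatever evaluates a predicted state. Both names, the grid world, and the goal position are made up for illustration and are not from any real system.

```python
from typing import Callable, Optional, Sequence, Tuple

State = Tuple[int, int]
Action = str


def lookahead(
    state: State,
    actions: Sequence[Action],
    predict: Callable[[State, Action], State],
    score: Callable[[State], float],
    depth: int,
) -> Tuple[float, Optional[Action]]:
    """Exhaustive fanout search: try every action sequence of length `depth`
    against the predictor and return (best value, best first action)."""
    if depth == 0:
        return score(state), None
    best_value, best_action = float("-inf"), None
    for action in actions:
        predicted = predict(state, action)  # ask the "world model"
        value, _ = lookahead(predicted, actions, predict, score, depth - 1)
        if value > best_value:
            best_value, best_action = value, action
    return best_value, best_action


# Hypothetical toy world: an agent on a grid trying to reach GOAL.
ACTIONS = ("up", "down", "left", "right")
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
GOAL = (2, 1)


def predict(state: State, action: Action) -> State:
    """Stand-in world model: deterministic grid movement."""
    dx, dy = MOVES[action]
    return (state[0] + dx, state[1] + dy)


def score(state: State) -> float:
    """Negative Manhattan distance to the goal (higher is better)."""
    return -(abs(state[0] - GOAL[0]) + abs(state[1] - GOAL[1]))


if __name__ == "__main__":
    print(lookahead((0, 0), ACTIONS, predict, score, depth=3))
```

The cost is the usual one for fanout search: the number of predictor calls grows as len(actions) ** depth, which matters a lot if each call is a model forward pass.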

A 2D sprite TUI interface to a JPL telnet service? Yes please.

This is bananas to me. There's been successful entries to snow plow competitions for ages. What a world that people now expect networks to handhold through it. Irresistible to all parties I suppose.

Well I guess I'll have to have a look!


Yeah, there's commercially available snow plow robots, you can buy a Yarbo for your house today. As far as I can tell, they all operate on a classical robotics stack - for the Yarbo you install an RTK antenna to give the robot cm-level precision, define a map and a routine, then the Yarbo can execute that routine by itself.

But can it deal with arbitrary lots without extensive premapping, manage piles, handle obstacles intelligently, correct itself (i.e., notice that a spot needs a second clearing), tackle windrows, etc.? It can't, and my hunch is that LLMs are the first tech we have that can plausibly handle all the various cases that a proper robot would need to handle.


My hunch is that some kind of planning stack with environmental awareness at a network level is a good solution to this. My hunch is that LLMs aren't really it. Maybe VLA but I'd bet lower.

Robotics probably will absorb a lot of RL/diffusion-based tech, with an LLM as a high-level interface at best.


Yeah, afaik the approach people take today is always some form of bi- or tri-level hierarchical control, with a slow LLM doing planning and subtask management and diffusion or VLA doing the motor control at higher frequencies. The major differences seem to be where and how you draw the boundaries. For my project I'm personally trying to use ROS2 as a low-level tool call (instead of diffusion), with an agent/LLM making the main decisions.
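
A minimal sketch of the two-rate idea, assuming a slow planner that is consulted every `plan_every` ticks and a fast controller that runs on every tick. The function names and the stub planner/controller are hypothetical placeholders; nothing here talks to ROS2, an LLM, or a real policy.

```python
from typing import Callable, List, Optional, Tuple


def run_hierarchy(
    ticks: int,
    plan_every: int,
    planner: Callable[[int], str],
    controller: Callable[[int, Optional[str]], Tuple[int, Optional[str]]],
) -> List[Tuple[int, Optional[str]]]:
    """Run a fast control loop that refreshes its subgoal from a slow planner."""
    subgoal: Optional[str] = None
    log = []
    for t in range(ticks):
        if t % plan_every == 0:
            subgoal = planner(t)            # slow layer: runs rarely
        log.append(controller(t, subgoal))  # fast layer: runs every tick
    return log


def stub_planner(t: int) -> str:
    """Placeholder for the slow planning layer (e.g. an LLM call)."""
    return f"subgoal-{t // 3}"


def stub_controller(t: int, subgoal: Optional[str]) -> Tuple[int, Optional[str]]:
    """Placeholder for the fast layer; just records what it was tracking."""
    return (t, subgoal)


if __name__ == "__main__":
    for entry in run_hierarchy(6, 3, stub_planner, stub_controller):
        print(entry)
```

Where the boundary goes is then mostly a question of what `plan_every` has to be for the slow layer to keep up.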

Having said that, this scheme seems like it might just be a reaction to current hardware limitations. When I saw Taalas demonstrate an 8B model running on a custom chip at 17k tok/sec, the first thing I thought was "wow, you can just run an LLM in a control loop".
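
Back-of-envelope on what that throughput buys per control tick. Only the 17k tok/sec figure comes from the comment above; the loop rates are illustrative assumptions, and this ignores prompt processing time and anything other than raw decode throughput.

```python
def tokens_per_tick(tokens_per_second: float, loop_hz: float) -> float:
    """Output tokens available in one control period at a given loop rate."""
    return tokens_per_second / loop_hz


if __name__ == "__main__":
    throughput = 17_000  # tok/sec, figure quoted in the comment
    for hz in (10, 50, 100, 1000):  # assumed loop rates, for illustration only
        print(f"{hz:>5} Hz -> {tokens_per_tick(throughput, hz):.0f} tokens per tick")
```

So a short action string per tick fits at tens of Hz, while a 1 kHz motor loop would only leave a handful of tokens per tick.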


> This is bananas to me. There's been successful entries to snow plow competitions for ages.

Why do you hate subscriptions? What if you get a summertime snow storm?


At some point in the apparently-impending "software is free" era, s/w stops being a product that has to be "popular" and starts being mostly bespoke. One possible future is that your machine does what you want because you have a local agent molding it into the right form all the time.

Bit of a stretch, but possible. I've had agents write 100x more code for me _to be productive at things_ than they do for new projects I want to sell/share.


What would be accomplished by doing this vs placing them basically anywhere else on earth?

It is a comment on the absurdity of orbital data centers. Mountaintop data centers sound absurd, but are more feasible and efficient than orbital ones in nearly every aspect.

Cooling is not the crux of the real problem. It's the fact that we have no way to replace single failed units in a running space-based data center without another launch, and if you're already stressing your total launch cadence with 'new' datacenters, at what % do you repair or replace the whole slab?

The launch tempo, following the invention of a functioning approach to in-space single-node replacement for even a modest portion of the planned workload capacity, is something that strains credulity, even at the normal earth-level maintenance rate.

Addressing the increased failure rates due to the hard rads and geomagnetic effects, while demanding that orbital systems remain above nm% load (that's n% of the hardware still operating at 100% power and thermal, or 100% of the hardware at m% of power and thermal, or the intersection of those two slopes at any given time) in order to meet shareholders' profit expectations, pushes up that launch cadence and cost just to maintain the baseline of workload. And, well, the math of that for even a minimal % of currently deployed earthbound demand is just staggeringly many launches per year.

Maybe I'm missing something, but bigger vehicles for putting up larger payloads don't make it better, they make it worse.


That's fine, if the argument for DC in space is just "Let's put them in the hardest place possible". Then less hard -> absurd, implies more harder -> more absurder.

But space-based DCs accomplish something that mountaintop DCs do not. The different list of benefits/tradeoffs is why space DCs are proposed and mountaintop ones are not. It's a difference of kind, not degree. It's not a meaningful experiment to just try to build DCs in hard places and then claim we can finally validate space.

Stated benefits in particular:

- Power available 24/7 for "free"

- coms w/o interruption using existing infra

- Rideshare (SPX can build out capacity while other lifts pay some of the bill for lift)

- Nonregulation

- Very low latency to "places of interest far from USA mountains"

And no, I do not believe that mountaintop automatically satisfies these benefits in a smooth way such that mountaintop is a meaningful stepping stone towards space.


> - Power available 24/7 for "free"

The Sun is visible from Earth as well, the last time I checked.

In LEO you don't get power 24/7 because you are only 500km above the Earth. Yes the Sun is more attenuated on Earth but what we care about is $/W not raw wattage, and Earth certainly has cheaper $/W than space.
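
A rough back-of-envelope for the eclipse point, assuming a circular orbit, a cylindrical Earth shadow, and the Sun lying in the orbital plane (the worst case). The 500 km altitude is from the sentence above; the constants are standard values and the function names are just for illustration. Orbits at a high beta angle, such as dawn-dusk sun-synchronous ones, spend much less time in shadow, so treat this as an upper bound rather than a typical figure.

```python
import math

EARTH_RADIUS_KM = 6371.0       # mean Earth radius
EARTH_MU_KM3_S2 = 398600.4418  # Earth's gravitational parameter


def orbital_period_s(altitude_km: float) -> float:
    """Period of a circular orbit at the given altitude (Kepler's third law)."""
    r = EARTH_RADIUS_KM + altitude_km
    return 2 * math.pi * math.sqrt(r**3 / EARTH_MU_KM3_S2)


def max_eclipse_fraction(altitude_km: float) -> float:
    """Fraction of the orbit spent in shadow when the Sun is in the orbital
    plane, using a cylindrical shadow of Earth's radius."""
    r = EARTH_RADIUS_KM + altitude_km
    return math.asin(EARTH_RADIUS_KM / r) / math.pi


if __name__ == "__main__":
    h = 500.0  # km, altitude mentioned in the comment
    period_min = orbital_period_s(h) / 60
    frac = max_eclipse_fraction(h)
    print(f"period: {period_min:.1f} min")
    print(f"worst-case eclipse: {frac:.1%} of orbit, {frac * period_min:.1f} min")
```

Under those assumptions a 500 km orbit has a period of roughly 94 to 95 minutes and spends a bit over a third of each orbit in shadow.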

> - coms w/o interruption using existing infra

I'm perplexed how comms might be easier in space than on Earth where you can just run a cable.

> - Rideshare (SPX can build out capacity while other lifts pay some of the bill for lift)

On Earth you don't need to rideshare because you don't have to ride a rocket.

> - Nonregulation

Space is more regulated than Earth. The only way to get to space is via a rocket which is the same as an ICBM. Governments regulate the process of building ICBMs and what payloads can ride on them.

If you want non-regulation then go to international waters or find a bribable government.

> - Very low latency to "places of interest far from USA mountains"

The latency is not terrible in LEO but it's nowhere near as good as on Earth.
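
For scale, here is the propagation-only floor, assuming a satellite directly overhead at 500 km and a typical fiber refractive index of about 1.47 (an assumed round value; real fibers vary). It ignores slant range, routing, queuing, and processing, so actual latencies on either path will be higher. Function names are illustrative.

```python
C_KM_S = 299_792.458  # speed of light in vacuum
FIBER_INDEX = 1.47    # assumed typical refractive index of optical fiber


def vacuum_delay_ms(distance_km: float) -> float:
    """One-way propagation delay through vacuum/air, in milliseconds."""
    return distance_km / C_KM_S * 1000


def fiber_delay_ms(distance_km: float) -> float:
    """One-way propagation delay through fiber, in milliseconds."""
    return distance_km * FIBER_INDEX / C_KM_S * 1000


if __name__ == "__main__":
    up_down = vacuum_delay_ms(2 * 500)  # ground -> satellite -> ground, overhead
    print(f"LEO up+down at 500 km: {up_down:.2f} ms")
    for km in (100, 1000, 5000):
        print(f"fiber {km:>5} km: {fiber_delay_ms(km):.2f} ms")
```

On propagation alone, the up-and-down hop costs a few milliseconds, which is more than a short fiber run and less than a long one; everything else depends on the actual network path.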


We're losing the direct chain of thought here. My assertion is that "Nonexistence of Mountaintop DC is not a counter-example to space DC". That's it. The reasons were spelled out.

Your points: "Mountaintop" is how comms is easier in space vs on earth. Starlink already serves many rural areas simply b/c it is easier to go to/from space in some places than "running a cable". "Latency is nowhere near as good as on earth" is just false. "Mountaintop" is why. But more broadly, my most recent vacation cabin has higher latency than Starlink offers. Case closed I guess?

And one more on latency: I was referring to latency in areas of interest far from USA mountaintops / USA in general. You might want to peruse the DARPA programs on low latency in-situ, closed loop comms for in theater (sometimes space based) compute. Something close to the action.

Power: "Mountaintop" is how space has a better power case than earth. Not all of earth. Mountaintop earth. The top-level comment was talking about a wind turbine on a mountaintop. That's an attempt at 24h power which is very likely strictly worse.

You can step back and make larger arguments, but this thread is narrower.

"Space is more regulated than Earth". Yes, again, you're talking about wider counts of regulation. Just go look around at the pushback to data centers and you'll see some of the case for DC in space. The path to getting equipment into space is clean - just get permits and launch same as SPX does for starlink. The path to building a data center on a mountaintop probably encounters at least some non-paperwork pushback that's likely to trip big political fights. That's it. Are there a lot of mountaintops that are sufficiently cold to warrant "cooling" arguments that are not part of large state/federal parks?

So going back to the thread - if you believe that a mountaintop datacenter is a counter example to the feasibility of a space-based data center, then I think you're making a category error on some of the above criteria. Your comments don't dissuade me at all about that because they don't address either side of that argument.



This is a continuation of the clapback from the DoW kerfuffle, right?

Great first half of a movie, by the way. Up there with Sunshine for "Sit down for a great hour-long ambiance".

I usually end Legend after the mannequin trap, and end Sunshine after the transit of Mercury.


"It is well that we are so foolish, or what little freedom we have would be wasted on us. It is for this that Book of Cold Rain says one must never take the shortest path between two points."

https://croissanthology.com/earring


Gently, as long as you work with humans, you should consider yourself working _for_ those humans. Everyone needs shared state to work from, and that's just the cost of doing business.

That said, sometimes low-trust environments are the issue, not PRs. In a higher trust environment, PR review is a helpful thing you usually desire, not dread.


> In a higher trust environment, PR review is a helpful thing you usually desire, not dread

Respectfully, in a high-trust environment, feedback should be delivered well before the PR stage. If you've let someone write a whole bunch of code without having a shared understanding of how the solution should work, you may have earlier process issues that PRs are papering over.


You cannot deliver feedback on something that doesn't exist. If you mean a review in the style of "all of this is wrong and needs to be rewritten differently" then yes, that's something to be discussed beforehand. But I don't imagine this is what people think of when discussing a review.

Depends on how PRs function within teams. For some, the PR is a lightweight thing that is the preferred method of communication. It sounds like you are imagining a case where face to face communication, or communication over chat, is preferred for early stages, with the PR being a nearly final artifact. But it doesn't have to work like that.

I think that's a valuable point. Especially as LLMs bring the cost of prototyping down (and reduce emotional investment in code written), it may be more viable to use PRs as proposals/sketches of a solution.

With human reviewers, I find that by the time someone has churned out enough of a solution to post a PR, they are already quite invested in the specifics of the solution, and it makes it emotionally costly (to both author and reviewer) when someone says "hey, I'm not a fan of this whole approach, let's start over and do it this other way".


I have seen many a PR where it is obvious it is exploratory work: e.g., figuring out how to use an external dependency that is imperfectly or incorrectly documented. (You can claim this should be done ahead of time, but experience tells me you need to code it to learn it.)

The emotional toll there is real, but this is exactly the moment when you expose the knowledge of that external dependency to the unbiased party that is the reviewer.

I like combining approvals to satisfy the urge for completion and closure, with a request for fast-follow refactor to better match the newly discovered model of interaction. (The worst code review experience I have seen is when a reviewer accepts it as-is and does a fast follow refactor themselves, depriving the author of the opportunity to learn and remain an expert in that area)


A discussion ahead of the implementation can also bias the two parties to that discussion and have them overlook the same implementation issue: many things you only understand once you start implementing.

If you have these parties review each other's code, I agree that rarely brings much value.

I think the best way to understand our experience with reviews is to stop and say: in a few sentences, what do you expect out of a quality code review? (sounds like nothing in your case, but I am curious)


> in a few sentences, what do you expect out of a quality code review? (sounds like nothing in your case, but I am curious)

From my perspective, there are three sorts of PRs:

- One is very close to the final form of a particular change, and any feedback you get at that late stage is indicative of holes in your process.

- Another is one where someone throws something up and says "hey, this is an experiment, can I get feedback on the approach". This is great, the parameters are clear, not much to say about these.

- The 3rd sort is someone making a trivial 5-line patch to a makefile/cargo.toml/github workflow/etc. These add basically no value to anyone.

Of those only the 2nd type really brings much value, and those are the ones that folks would keep posting even if you didn't require PRs (since they have an actual question, or a cool thing to show off).

I'll also note that this only really negatively impacts small remote teams, because on a sufficiently large, co-located team, you just ask your buddy one desk over to rubber stamp all the trivial commits...


On the first category, what is a process you use which has no "holes" in it?

Does everybody produce completely readable, tested code every time? Perhaps that's just "style" to you when it is "maintainability" to me?


> Does everybody produce completely readable, tested code every time?

Do your coworkers not reliably produce readable, tested code?

That's kind of the minimum bar for a software engineer in my book


I like to invert that to: do I produce code I am perfectly happy with in regards to readability and maintainability or would I benefit from another pair of eyes?

Every question I get (when my code is reviewed) is a signal that code could be more self-explanatory, unless it is a complex algorithm itself, and that my (by now deep) exposure to the problem is keeping me misguided about what is and isn't "obvious" or "clear". A reviewer can take a step back and help ensure both they and I will be able to easily grasp the same code 3 or 24 months later.

Note that one of the best pieces of advice I got early in my career about doing a good code review is that you "just" need to ask good questions: the point is not for a reviewer to show how much smarter they are, but for both to develop a shared understanding and ensure code can be interpreted as quickly as possible.


Depends on the change. Certainly most PRs don't need feedback before the PR is ready - the task is too obvious, and there's little to feed back on before there's any code.

For bigger changes, of course you need feedback on designs. But that could easily be in the form of draft PRs.

I definitely would push back on anything that required feedback before PRs. That's way too much process. Just going to slow you down for no benefit.


Agreed. But those things are not mutually exclusive.

Agree. All the subtleties of how a high-trust environment works are hard to enumerate.

Worse, the text that does exist concerning "war games" is probably "WarGames" and descendants/predecessors ... in which the AI always nukes.

It's just gonna do what we expect it to!

