Hacker News | lreeves's comments

In my experience Qwen3.5/Qwen3-Coder-Next perform best in their own harness, Qwen-Code. You can also crib the system prompt and tool definitions from there. One caveat: despite the Qwen models being the state of the art for local models, they are about a year behind anything you can pay for commercially, so asking one to build a new app from scratch might be a bit much.

Could not agree more. It's wild that my Apple TVs now ask which profile is using them, but the iPad still hasn't gotten this.

Profiles don't work well on Apple TVs at all though. You choose a profile on the device, and then you still have to choose a profile whenever you launch any given streaming app as well. I don't know what changing profiles on an Apple TV actually does.

Apps can hook into the Apple TV user profiles if they want, but many don't.

As a developer myself, I respect and understand that it's not their fault that profiles are useless

As a consumer, I don't care whose fault it is that profiles are useless.


The developer needs to write code to detect the current profile. Most apps don't do this, so they explicitly ask a second time. Not Apple's fault.

There are some apps that get this right. Infuse recently added support for this.


I don’t know, I think Apple should be able to keep copy-on-write (COW) filesystems for every user, applied atop a read-only file system. Unique apps, unique settings (maybe unify TV settings into an admin panel), and no cross-contamination or need for app owners to switch profiles. macOS software doesn’t need explicit understanding of profile switching; neither should iPadOS software.

It's not the end user's problem whose fault it is.

as an end user it is my problem when trying to complain in the right place

I agree with you.

and the end user can blame Netflix, Prime Video, Disney+, YouTube, etc for not delivering the best experience for their customers. ¯\_(ツ)_/¯


Or they can blame Apple for delivering a developer experience to those companies that makes those companies not want to play ball.

Not that they likely will, as Apple owns the framing.


In my household we have two Apple TVs sitting next to each other, and two remotes with my partner's name and mine on them, since most apps don't properly support profiles and that's the easiest solution. If Apple does this so people buy more devices... it's definitely working.

I also ended up doing this. With HDMI-CEC powering up the TV and the receiver automatically, then switching to the correct input on any AppleTV remote button press, this is actually a really friction free option if you can stomach buying two devices for the same purpose. I put the remotes in different colored rubber cases (red and blue) to make clear which device is being operated.

At one stage I even had a third AppleTV, that was hooked permanently to a VPN exiting in a foreign country, so I could get TV content and applications restricted to another region I watch a lot of content in. It was so nice to just pick up a remote and instantly have the foreign appleTV experience, rather than juggle VPN apps and foreign Apple Store accounts on the same device.


Probably the only apple platform whose price point is low enough that I’d be on board with this idea.

It’s the vendors not supporting platform features. Usually, actively avoiding it because they think it’d dilute their brand or some shit.

I solved this by just pirating everything and putting it in Jellyfin with Infuse on my AppleTV. Managing profiles and parental controls (and god forbid you also want actual curation) is just totally broken if you pay money for the content, but if you pirate it, it works. Go figure. Dropped from like seven or eight streaming services at peak to I think two. It's not worth doing just for the savings, though that's a nice bonus (it all ends up in hard drives or electricity anyway), but it's the only way to get sane UX. Friggin' irritating.


I'm not even sure I'd put the fault only with the vendors in this case, as I could very easily imagine that feature being buggy (on Apple's side) or not supporting some use cases they might want, given that no large streaming service seems to use it.

It's a bit similar to them not supporting Apple TV's "Continue Watching" feature as they don't want to hand over all their watch data to Apple.

In any case, once you have a good setup, the pirating UX is very hard to beat. (I'm looking forward to the day that Jellyfin on tvOS has feature parity with Plex; not a big fan of Infuse personally. This is the issue to follow for that: https://github.com/jellyfin/Swiftfin/discussions/1294).


The one thing Infuse gets me is support for one fairly major audio codec (I forget which one). Have to pay for a license, doubt the official client will ever have it.

The UI is slightly janky out of the box but if you customize it it’s not bad. Key to note is that you probably want to use the “library” menu item for almost everything and drill down from there (that way you can filter by e.g. genre, or order by release date, or whatever, right up front) or else just go over to the entry for the server itself, which gets you a list of top-level items like you see in the Jellyfin web ui.

If you have much stuff at all you need to just ignore top level entries like “movies” or “tv” because (as far as I can tell) they’re just giant alphabetical lists of everything, which borders on useless. I think you can make them not show up at all. You just need search, “library”, and an entry for any server(s) you have to browse them “raw”.


I mean, no shit though? People calmly said this in Trump's first term, when he (unsuccessfully) first tried to go tariff-crazy. What does it add, though? Nobody is freaking out saying "all tariffs are bad"; they're saying "blanket tariffs for no reason, or the stupidest reasons possible, are bad".


And "tariffs that are utterly unpredictable and can change after barely-concealed bribery" are unhelpful to plan a business around.


I run the larger version of it on a Threadripper with 512GB RAM and a 32GB GPU for the non-expert layers and context, using llama.cpp. Performs great, however god forbid you try to get that much memory these days.


I sometimes still code with a local LLM but can't imagine doing it on a laptop. I have a server that has GPUs and runs llama.cpp behind llama-swap (letting me switch between models quickly). The best local coding setup I've been able to do so far is using Aider with gpt-oss-120b.

I guess you could get a Ryzen AI Max+ with 128GB RAM to try to do that locally, but non-Nvidia hardware is very slow for coding usage since the prompts become very large and prompt processing takes far longer. gpt-oss is a sparse model, though, so maybe it won't be that bad.

Also just to point it out, if you use OpenRouter with things like Aider or roocode or whatever you can also flag your account to only use providers with a zero-data retention policy if you are truly concerned about anyone training on your source code. GPT5 and Claude are infinitely better, faster and cheaper than anything I can do locally and I have a monster setup.
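Beyond the account-level flag, OpenRouter also exposes this as a per-request provider routing preference. A minimal sketch of such a request body (the field names follow my reading of OpenRouter's provider-routing docs and the model slug is illustrative; double-check both before relying on them):

```python
import json

# Hypothetical OpenRouter chat request that asks the router to only use
# providers which do not store or train on your prompts.
payload = {
    "model": "openai/gpt-5",  # example slug, substitute your model
    "messages": [{"role": "user", "content": "Refactor this function..."}],
    "provider": {
        # Skip any provider that retains/collects prompt data.
        "data_collection": "deny",
    },
}

# This JSON string would be POSTed to /api/v1/chat/completions
# with your Authorization header.
body = json.dumps(payload)
```

The same preference can also be set once in the account dashboard, which is what the comment above describes.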


gpt-oss-120b is amazing. I created a RAG agent to hold most of GCP documentation (separate download, parsing, chunking, etc). ChatGPT finished a 50 question quiz in 6 min with a score of 46 / 50. gpt-oss-120b took over an hour but got 47 / 50. All the other local LLMs I tried were small and performed way worse, like less than 50% correct.

I ran this on an i7 with 64gb of RAM and an old nvidia card with 8g of vram.

EDIT: Forgot to say what the RAG system was doing which was answering a 50 question multiple choice test about GCP and cloud engineering.


> gpt-oss-120b is amazing

Yup, I agree, easily best local model you can run today on local hardware, especially when reasoning_effort is set to "high", but "medium" does very well too.

I think people missed out on how great it was because a bunch of the runners botched their implementations at launch, and it wasn't until 2-3 weeks after launch that you could properly evaluate it, and once I could run the evaluations myself on my own tasks, it really became evident how much better it is.

If you haven't tried it yet, or you tried it very early after the release, do yourself a favor and try it again with updated runners.


What do you use to run gpt-oss here? ollama, vLLM, etc


Not parent, but a frequent user of GPT-OSS; I've tried all the different ways of running it. The choice goes something like this:

- Need batching + highest total throughput? vLLM, though it's complicated to deploy and install, and you need special versions for top performance with GPT-OSS

- Easiest to manage + fast enough: llama.cpp, easier to deploy as well (just a binary) and super fast, getting ~260 tok/s on a RTX Pro 6000 for the 20B version

- Easiest for people not used to running shell commands or need a GUI and don't care much for performance: Ollama

Then if you really wanna go fast, try to get TensorRT running on your setup, and I think that's pretty much the fastest GPT-OSS can go currently.


> I created a RAG agent to hold most of GCP documentation (separate download, parsing, chunking, etc)

If you could share the scripts to gather the GCP documentation, that'd be great. I've had an idea to do something like this, and the part I don't want to deal with is getting the data


I tried scripts but got blocked. I used wget to download them


On what hardware did you manage to run gpt-oss-120b locally?


you can run the 120b model on an 8GB GPU? or are you running this on CPU with the 64GB RAM?

I'm about to try this out lol

The 20b model is not great, so I'm hoping 120b is the golden ticket.


With MoE models like gpt-oss, you can run some layers on the CPU (and some on GPU): https://github.com/ggml-org/llama.cpp/discussions/15396

Mentions 120b is runnable on 8GB VRAM too: "Note that even with just 8GB of VRAM, we can adjust the CPU layers so that we can run the large 120B model too"
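A hedged sketch of what such an invocation might look like, based on the flags discussed in that llama.cpp thread (the model filename is a placeholder, and the right `--n-cpu-moe` value depends on how much VRAM you have):

```shell
# Keep attention and shared layers on the GPU, push expert (MoE) tensors
# for the first N layers to CPU RAM so the model fits in 8GB of VRAM.
llama-server \
  -m gpt-oss-120b-mxfp4.gguf \
  --n-gpu-layers 999 \
  --n-cpu-moe 30 \
  --ctx-size 32768
```

Raise `--n-cpu-moe` until the model loads without out-of-memory errors; lower it for more speed once it fits.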


I have in many cases had better results with the 20b model over the 120b model, mostly because it is faster and I can iterate prompts quicker to coerce it to follow instructions.


> had better results with the 20b model, over the 120b model

The difference of quality and accuracy of the responses between the two is vastly different though, if tok/s isn't your biggest priority, especially when using reasoning_effort "high". 20B works great for small-ish text summarization and title generation, but for even moderately difficult programming tasks, 20B fails repeatedly while 120B gets it right on the first try.


But the 120b model has just as bad, if not worse, formatting issues compared to the 20b one. For simple refactorings, or chatting about possible solutions, I actually feel the 20b hallucinates less than the 120b, even if it is less competent. Might also be because of 120b not liking being in q8, or not being properly deployed.


> But the 120b model has just as bad if not worse formatting issues, compared to the 20b one

What runtime/tools are you using? That hasn't been my experience at all, but I've also mostly used it via llama.cpp and my own "coding agent". It was slightly tricky to get the Harmony parsing in place and working correctly, but once that's done, I haven't seen any formatting issues at all.

The 20B is definitely worse than 120B for me in every case and scenario, but it is a lot faster. Are you running the "native" MXFP4 weights or something else? That would have a drastic impact on the quality of responses you get.

Edit:

> Might also be because of 120b not liking being in q8

Yeah, that's definitely the issue, I wouldn't use either without letting them be MXFP4.


Everything I run, even the small models, some amount goes to the GPU and the rest to RAM.


Hmmm...now that you say that, it might have been the 20b model.

And like a dumbass I accidentally deleted the directory, and didn't have a backup or version control.

Either way, I do know for a fact that the gpt-oss-XXb model beat chatgpt by 1 answer and it was 46/50 at 6 minutes and 47/50 at 1+ hour. I remember because I was blown away that I could get that type of result running locally and I had texted a friend about it.

I was really impressed, but disappointed at the huge disparity in time between the two.


What were you using for RAG? Did you build your own or some off the shelf solution (e.g. openwebui)


I used pgvector, chunking on paragraphs. For the answers, I saved them to a flat text file and then parsed out what I needed.

For parsing and vectorizing of the GCP docs I used a Python script. For reading each quiz question, getting a text embedding and submitting to an LLM, I used Spring AI.
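Since the original script is gone, here is a hypothetical reconstruction of what a paragraph-chunking step like that could look like (the function name and the merge threshold are my assumptions, not the deleted code):

```python
# Split scraped docs into paragraph chunks, merging tiny paragraphs so
# embeddings aren't wasted on one-line fragments.
def chunk_paragraphs(text: str, min_chars: int = 200) -> list[str]:
    chunks: list[str] = []
    buf = ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        # Accumulate paragraphs until the chunk is big enough to embed.
        buf = f"{buf}\n\n{para}" if buf else para
        if len(buf) >= min_chars:
            chunks.append(buf)
            buf = ""
    if buf:  # flush the trailing partial chunk
        chunks.append(buf)
    return chunks
```

Each chunk would then get an embedding and a row in a pgvector table, with retrieval done by nearest-neighbor search against the question's embedding.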

It was all roll your own.

But like I stated in my original post I deleted it without backup or vcs. It was the wrong directory that I deleted. Rookie mistake for which I know better.


What quantization settings?



In a sense you can think of it that way. As a Canadian: we counter-tariff the US, and that can be considered punishing ourselves; however, the US is only one country, and this encouraged more free trade with every other one of our trading partners. So in a game-theory sense it's affecting Canadian trade negatively with one country, and affecting US trade negatively with, you know... every country.


I see, so like saying "we'll make it less appetizing for our nationals to do business with you, so they'll go shopping elsewhere"?


Exactly right. There are trade deals forming between countries in unprecedented ways to avoid dealing with the constantly changing tariffs, while one country says it'll take its ball and play alone.


But the US is the bigger country right next to it, and also the most practical to trade with. Trading with countries further apart means less efficient transport. Is it not still self-inflicted harm?


The Econ 101 view would say yes; note that most countries haven't imposed 1:1 retaliatory tariffs.

But economic considerations are not the only ones. Opposition to the American Revolution is a fundamental theme in Canadian history. People shouldn't be surprised when Canada acts accordingly.


What options do Canadians have? Deal with the wildly capricious economic policies of the US president, or go seeking other, more stable opportunities elsewhere? Almost all countertariffs we have in place are targeted as opposed to the sweeping tariffs Trump is implementing.


They could seek other opportunities elsewhere without adding tariffs themselves: continue to import from the US and other countries as before. They may indeed export less to the US due to reduced US demand, but reciprocating the tariff won't help with that.


But the point is to hurt the other country's producers, to motivate them to complain to their government about the original tariffs.

There are no winners in the trade war.


No, it is not. It would be being a doormat to a bully, and the bully would come back and bully them more.


> less efficient in transport

Not after factoring in the 35%-50% tariff Trump has imposed on many Canadian goods.


It's not practical when Trump sees a TV ad that enrages him and then cancels all negotiations, how are Canadian leaders supposed to proceed? There's no good faith whatsoever from him.


On iOS you can deny an app cellular data access which accomplishes this, as long as you don't launch it on Wifi. But yes I too wish I could deny apps internet access completely.


"... as long as you don't launch it on WiFi."

Unfortunately, apps can still connect even when they are not "launched"

There are ways to deny apps internet access completely. But this is not something that is provided by Apple or Google


Syncthing-fork is in the Play Store and works fine for me.



The commands aren't the special sauce; it's the analytical capability of the LLM to view the outputs of all those commands and correlate data. You could accomplish the same thing by prefilling a gigantic context window with all the logs, but when the commands are presented ahead of time the LLM can "decide" which one to run based on what it needs to do.
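The "present commands ahead of time" pattern can be sketched as an allowlist of named tools the model chooses from, rather than letting it compose raw shell (the tool names and commands here are hypothetical examples, not from any particular product):

```python
import subprocess

# Hypothetical allowlist of diagnostic commands the model may pick from.
# The model only ever names a tool; it never writes shell itself.
TOOLS = {
    "disk_usage": ["df", "-h"],
    "recent_logs": ["tail", "-n", "100", "/var/log/syslog"],
}

def run_tool(name: str) -> str:
    """Execute an allowlisted command and return its output for the model."""
    if name not in TOOLS:
        return f"unknown tool: {name}"
    result = subprocess.run(TOOLS[name], capture_output=True, text=True)
    return result.stdout
```

The harness feeds the tool names and descriptions to the model up front; each model response naming a tool is dispatched through `run_tool`, and the output goes back into the context for the next turn.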


Electronic warfare is pretty effective against drones that use radio waves for their communication. Earlier in the war you could see a lot of drone footage become washed out with static as the drones got closer to tanks, so it's much more reliable to use spools of fiber.

