Hacker News | 0xjmp's comments

Location: SF
Remote: Either
Willing to relocate: Yes
Technologies: Ruby on Rails, Rust, Go, Security
Résumé/CV: jpeterson.co
Email: hello@jpeterson.co


This happens top-down historically though, yes?

Someone releases a maxed-out-parameter model. Another distills it. Another bifurcates it. With some nuance sprinkled in.
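For what it's worth, the distillation step works roughly like this: a small "student" model is trained to match a large "teacher" model's output distribution (soft targets) instead of hard labels. A toy sketch of the loss, with names and the temperature value being my own illustrative choices:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-softened softmax; higher T spreads probability mass out."""
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distill_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) between temperature-softened distributions:
    zero when the student exactly matches the teacher, positive otherwise."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p) - np.log(q))))
```

Minimizing this over a corpus of teacher outputs is far cheaper than pretraining from scratch, which is why the "maxed out, then distilled" sequence keeps recurring.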


I must be missing something important here. How do the Chinese train these models if they don't have access to the GPUs to train them?


I believe they mean distribution (inference). The Chinese model is currently B.Y.O.GPU; the American model is GPUaaS.


Why is inference less attainable when it technically requires less GPU processing to run? Kimi has a chat app on their page using K2 so they must have figured out inference to some extent.


That entirely depends on the number of users.

Inference is usually less GPU-compute heavy, but much more GPU-VRAM heavy pound-for-pound compared to training. A general rule of thumb is that you need 20x more VRAM to train a model with X params than to run inference on that same-size model. So assuming batch size b, serving more than 20*b users would tilt VRAM use toward inference.

This isn't really accurate; it's an extremely rough rule of thumb that ignores a lot. But it's important to point out that inference is quickly becoming a major cost for all AI companies. DeepSeek claims they spent $5.6 million to train DeepSeek V3; that's about 10-20 trillion tokens at their current pricing, or 1 million users sending just 100 requests at full context size.
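The rule of thumb above can be put into a back-of-envelope sketch. The 20-bytes-per-param multiplier and the per-user KV-cache size are assumptions from the thread, not measured numbers; real footprints depend on precision, optimizer, context length, and batching.

```python
def training_vram_gb(params_billions, bytes_per_param=20):
    """~20 bytes/param loosely covers weights, gradients, optimizer
    states, and activations. 1e9 params * N bytes == N GB."""
    return params_billions * bytes_per_param

def inference_vram_gb(params_billions, concurrent_users, kv_gb_per_user=0.5):
    """One bf16 copy of the weights (2 bytes/param) plus an assumed
    KV cache per concurrent user."""
    return params_billions * 2 + concurrent_users * kv_gb_per_user
```

Under these assumptions a 70B model needs ~1400 GB to train but only ~140 GB to serve one user, and the balance flips once enough concurrent users pile up KV caches.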


> it technically requires less GPU processing to run

Not when you have to scale. There's a reason why every LLM SaaS aggressively rate limits and even then still experiences regular outages.


tl;dr the person you originally responded to is wrong.


That's super wrong. A lot of why people flipped out about DeepSeek V3 is how cheap and how fast their GPUaaS model is.

There is so much misinformation on HN, and in this very thread, about LLMs, GPUs, and cloud, and it's exhausting trying to call it all out, especially when it comes from folks who are considered "respected" in the field.


> How do the Chinese train these models if they don't have access to the GPUs to train them?

They may be taking some western models (Llama, gpt-oss, Gemma, Mistral, etc.) and doing post-training, which requires far fewer resources.


If they were doing that, I expect someone would have found evidence of it. Everything I've seen so far has led me to believe that these Chinese AI labs are training their own models from scratch.


Not sure what kind of evidence that could be...


Just one example: if you know the training data used for a model you can prompt it in a way that can expose whether or not that training data was used.

The NYT used tricks like this as part of their lawsuit against OpenAI: page 30 onwards of https://nytco-assets.nytimes.com/2023/12/NYT_Complaint_Dec20...
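A minimal sketch of that kind of probe, with all names hypothetical and `generate` standing in for whatever model API you're testing: feed the model the opening of a document you suspect was in its training set, then measure how much of its continuation matches the real text verbatim.

```python
def overlap_score(continuation: str, ground_truth: str, n: int = 5) -> float:
    """Fraction of the continuation's word n-grams that appear
    verbatim in the ground-truth text."""
    def ngrams(text):
        words = text.split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}
    cont, truth = ngrams(continuation), ngrams(ground_truth)
    return len(cont & truth) / len(cont) if cont else 0.0

def probe(generate, document: str, prefix_words: int = 50) -> float:
    """Prompt with the document's opening; score the continuation
    against the real remainder. High scores suggest memorization."""
    words = document.split()
    prefix = " ".join(words[:prefix_words])
    rest = " ".join(words[prefix_words:])
    return overlap_score(generate(prefix), rest)
```

As the sibling comment notes, this is noisy in practice (paraphrase, shared public datasets), but long verbatim reproductions are hard to explain any other way, which is why the NYT complaint leaned on them.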


You either don't know which training data was used for, say, gpt-oss, or the training data could be included in some open dataset like The Pile. I think this test is very unreliable, and even if someone came to such a conclusion, it's not clear what the value of that conclusion would be, or whether that someone could be trusted.


My intuition tells me it is vanishingly unlikely that any of the major AI labs - including the Chinese ones - have fine-tuned someone else's model and claimed that they trained it from scratch and got away with it.

Maybe I'm wrong about that, but I've never heard any of the AI training experts (and they're a talkative bunch) raise that as a suspicion.

There have been allegations of distillation - where models are partially trained on output from other models, eg using OpenAI models to generate training data for DeepSeek. That's not the same as starting with open model weights and training on those - until recently (gpt-oss) OpenAI didn't release their model weights.

I don't think OpenAI ever released evidence that DeepSeek had distilled from their models, that story seemed to fizzle out. It got a mention in a congressional investigation though: https://cyberscoop.com/deepseek-house-ccp-committee-report-n...

> An unnamed OpenAI executive is quoted in a letter to the committee, claiming that an internal review found that “DeepSeek employees circumvented guardrails in OpenAI’s models to extract reasoning outputs, which can be used in a technique known as ‘distillation’ to accelerate the development of advanced model reasoning capabilities at a lower cost.”


Additionally, it would be interesting to know if there are dynamics in the opposite direction: US corps (OAI, xAI) could now incorporate Chinese models into their core models as one or several expert towers.


> That's not the same as starting with open model weights and training on those - until recently (gpt-oss) OpenAI didn't release their model weights.

There was obviously Llama.


What 1T parameter base model have you seen from any of those labs?


It's MoE; each expert tower can be branched from some smaller model.


That's not how MoE works: you need to train the FFNs directly, or else the FFN gate would have no clue how to activate the experts.
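A toy sketch of why, with shapes and names purely illustrative: the gate is just a learned projection whose scores decide which experts run, so it only makes sense when trained jointly with those experts. Bolt a fresh gate onto experts it never saw during training and its routing is noise.

```python
import numpy as np

def moe_forward(x, gate_W, experts, k=2):
    """Toy top-k MoE layer: score experts with the gate, softmax the
    top-k scores, and mix those experts' outputs by weight."""
    logits = x @ gate_W                       # one score per expert
    top = np.argsort(logits)[-k:]             # indices of the k best experts
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()                              # softmax over the winners
    return sum(wi * experts[i](x) for wi, i in zip(w, top))
```

The gate's weights (`gate_W` here) are optimized against the actual experts' losses during training, which is why you can't assemble a 1T MoE by stapling together independently pretrained smaller models.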


I think Sundar Pichai put it succinctly: “durable cost savings”


In this current job market 6 months to next job is a very quick turnaround...


Can you explain? Unemployment is still very, very low and there are hundreds of thousands of open positions out there. Not sure about game companies, hence my asking.


Not just gaming. Tech companies have been shedding jobs like crazy and slashing hiring. Pretty much every day there is news of some company cutting hundreds or thousands of jobs. Anecdotally, it's not unusual to take 6 months to find a new job at only 2/3 of the previous salary.


Tech companies are definitely still hiring too though. My company is and I have no shortage of recruiter LinkedIn mail with leads should I need or want to find another job.


I read an article about this recently, but can’t find it to officially cite it. However, the thrust was something like this:

Despite unemployment being low, the reality is that many of those “thousands of positions” are either geographically-distributed duplicates of the same position, eternally-open cattle calls, or open-but-basically-a-formality for either getting a visa or an internal move. This makes the number of actual openings quite opaque, yet much smaller than it appears.


Yeah, that's my feeling. I see plenty of positions and still get maybe 5 calls a week for work. But it really does feel like most aren't even looking at my resume, nor do recruiters get me much further than a hiring manager.

Of course I could simply be unlucky and swamped by other, more attractive candidates. Apparently seniors are swamping junior roles, so maybe I'm just competing with a bunch of staff/principals now in mid/senior roles.


I don't think these numbers are for tech only anyway. Further, it seems like the high demand is in service jobs, not white-collar office jobs. So it's probably easy to find a job, but you're going to be selling donuts, not producing art for video games or earning 400k+ building apps.


Fair, but it's also true that many of the positions eliminated were actually unfilled (paper shuffling). It's so hard to tell the facts from the spin.


Happy to see publishers getting paid! But definitely unsure what this means for society long-term.


Axel Springer is the German Fox News. This is very bad news unless you want Tay 2.0.


Hang on a sec.

> Users don't have the same life experience as security people and sometimes a user simply do not know how to verify a link on e.g. his iPhone.

Later...

> My password is mine. I control my password. I own my password. I am not dependent upon some third party closed proprietary operating system or device to handle my security. I would rather have a piece of paper with all my passwords written down, stored in a drawer at home

So the general public apparently lacks the ability to verify the URL in an email, but _does not_ lack the ability to safely control their password? You also completely ignore the site administrator's role in safely storing your password, by the way.

Altruistic as it may be to be anti-Big Tech, these companies are pushing the needle forward on cybersecurity. Look at Apple: they invested millions into a special biometric device that ensures the fingerprint cannot be retrieved by *anyone*.

Also, I missed the part of this article where anyone claimed that hardware keys are for the entire population of the world. We do not roll keys out to every single employee - you're right about that. It's an interoperability nightmare. We do roll keys out to critical staff, though: CEOs, COOs, high-profile figures, critical service admins, etc. These folks are already trained and understand exactly why this initiative is so important, usually because they themselves have been targeted already.

Hardware keys are a great thing. I'm frankly getting really exhausted reading about how every new solution must be engineered to fit the needs of every single human on planet Earth. They're simply not designed with everyone in mind, because everyone simply isn't ready yet. It is what it is.

Nuclear power plants don't air gap controls because they hope the same system is used by Walmart down the street. They do it because their risk is... well... nuclear. Hardware keys in tech are no different.


I took a year off. I couldn't exactly afford it, and I didn't exactly sign up for it either (laid off). But it was exactly what I needed.

I lost a loved one and mistakenly assumed work peers would understand that I was struggling. They didn't. It's no secret the workplace makes people astoundingly cold. This left me feeling bitter at the industry.

Towards the 8-month mark I started to have a different crisis: I still didn't want to work again. Would I ever want to work again? Just in the nick of time, I found my stride. My point is that it took patience. You're in the wrong environment for that.

I'm sure you've heard this before but your work doesn't care about you. Your co-workers don't care about you. If you put even a small amount of care or emotion into the job you're playing yourself.

Your kid matters. Your job doesn't. Your deadlines are a lie made up by someone who should be focused on going to therapy. Sounds like you're a driven, talented individual. Take a break for you and your kid. That ambitious drive won't leave you just because you took a break to focus on what actually matters.


This idea that an equivalent level of talent to SV is readily available in Indiana or Costa Rica for cheaper pay is deeply flawed.


OP mentioned slashing salaries just by half, not by 75%. Most IT people in western European countries are not making even 200k per year. Even in London it's hard to get 120k unless maybe you're working as a contractor.

A lot of those SV talents are not American but migrated from Europe or elsewhere. There are still talented people in the EU who simply don't want to move to the USA these days, even if salaries are at least 2x. You wouldn't have a problem finding real talent in eastern Europe for 150k.


You're both contradicting yourself and proving my point.

Eastern Europe. For a non-profit privacy focused company. You're joking right?


Second this. From using LLMs within my daily IC workflow, it's not that developers are getting a productivity boost; it's that our job became more enjoyable. I didn't even realize how much frustration builds up over time while relying on Google search for things such as that one obscure API method I forgot the name of. I have to dig and dig... Somehow using GPT in its place has brought a noticeable bump to my quality of life. Most of my dev colleagues share this sentiment. Impact on the tech economy from this? Unknown.


I would imagine that if your job is more enjoyable, you're getting a productivity boost.

Unless you're suggesting that using an LLM is just as efficient as not using one, but somehow more fun?


Possibly, but why assume that developers are passing on that increased productivity without expecting a relative pay increase?


Why would they get a pay increase? Everyone can use LLMs.

