fergie's comments | Hacker News

Surely Red Hat has gone from being the de facto default Linux to relative obscurity?

> Proton for Business newsletter

AFAIK you are legally allowed to spam businesses, but not individuals. A handy get-out clause for marketeers.


How do you know the address you’re emailing belongs to a business? The head of A&A ISP in the UK used to regularly win ~£100 judgements in small claims from spammers because his personal email was leased for a nominal fee from aa.net.uk, the same domain as his business.

OP had checked that they would like to receive the "Proton for Business newsletter", and on that basis was deemed a "business".

If your email is used as a contact on a Business subscription, it is safe to assume that it is used for business purposes.

I mean that's cute and all, but it's a party trick, and very unlikely it caused any actual behaviour to change.

I'm 30 years in, and literally don't understand the question.

After a quick look, this can be seen as a low-level GPU/TPU optimization problem where you have to consider the throughput and depth of different arithmetic pipelines. If you want to hire people who understand how to do that, you unfortunately have to give them such a convoluted task and emulate the relevant parts of the HW. (In reality this is probably more like a TPU since it has scalar pipelines, but the optimization methods are not that different.)

The task is to parallelize tree traversal, which is embarrassingly unparallel so it's tricky.


This also shows that a performance engineer's job, even at Anthropic, is to be a glorified human compiler, who is often easily beaten by LLMs.

> who is often easily beaten by LLMs

Is that really the case? My experience is fairly limited, but I've found that the LLM's willingness to fill in plausible sounding (but not necessarily at all accurate) numbers where it needs them to be a significant hindrance when asking it to think about performance.


I think the job is to be one of the few that's better than LLMs.

And how would one do that these days if they didn't spend their career doing this pre-LLM? Just expect to study and perform such projects as a hobby for a few years on the side? These are specialized problems that you only really do for a few select companies.

I mean yeah... You kind of have to learn this stuff (performance engineering) by yourself (a strong education background helps a lot of course). There are transferable parts of it and there are platform-specific parts where you need to be somewhat familiar with GPUs.

Seems like another catch-22 when companies still care about 3-5 years of experience in industry, even if you work on some hobby projects. I'm not in this sector, but I had similar struggles getting noticed in another specific domain despite studying it for a while.

Since it's a CPU, you start with the idea that there is an ALU and spiral outward from that. That gives you something concrete to wrap your head around while you climb up the abstraction levels.

However, when I hit "scratch_write" and it wasn't in the Machine class and it wasn't coming from some Decorator and it was getting defined and deleted by a member function ... I stopped. That's paying lip service to the variable typing that is scattered around and actively hampers even basic IDE usage. Probably the typing was added by AI/LLM after the fact, and it missed that unusual usage. The Python convention used to be that those kinds of variables got declared as "_scratch_write" with a leading underscore to flag that they were "private/internal".

That was the gigantic red "We write shitty code" signal or worse "We don't care about wasting your time" signal. Human review should have flagged that.

Shame. I was kinda looking forward to the technical problem, but I'm not going to spend a bunch of time using grep to untangle garbage code to get at it.

I suspect everything would actually be much clearer if you wrote it in SystemVerilog and tested with Cocotb. Let's see if their LLMs can handle that porting job. HAH!


What is variable typing?

The types on the variables. Python recently adopted "gradual typing", but it isn't enforced by default. Consequently, you may have to actually execute a Python program to determine what an unlabeled variable type is.
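
A toy example of the difference (hypothetical code, not from the takehome):

    # Untyped: to know what "counts" is you may have to run the code.
    def total(counts):
        return sum(counts.values())

    # Gradually typed: the annotation tells the reader and the IDE.
    def total_typed(counts: dict[str, int]) -> int:
        return sum(counts.values())

    print(total({"a": 1, "b": 2}))        # 3
    print(total_typed({"a": 1, "b": 2}))  # 3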

A lot of people write Python code and then run "AI" on it to fill in the variable types. This, of course, is error prone and shitty. And the AI will miss strange usages like the one I flagged.

Although I am sorry for phrasing it as "variable typing". I can see how you might read that as "typing that varies" instead.


The question isn't clearly written down anywhere, that's why. Presumably actual candidates would have been given more info over the phone or email. Part of the "challenge" is reverse engineering their Python; unclear if that's intentional.

If you look at the top of perf_takehome.py then there is a brief comment saying the challenge is to optimize a kernel. Kernel in GPU land means a program that computes on data in parallel, it's not an OS kernel:

    Optimize the kernel (in KernelBuilder.build_kernel) as much as possible in the
    available time, as measured by test_kernel_cycles on a frozen separate copy
    of the simulator.
However, this kernel doesn't run on an actual GPU. It runs on a little interpreter for a custom assembly language written in Python. Thus you will be optimizing the program built in-memory by the function on this line:

https://github.com/anthropics/original_performance_takehome/...

This function is described only as:

    Like reference_kernel2 but building actual instructions.
    Scalar implementation using only scalar ALU and load/store.
The KernelBuilder class has some fields like "instrs" but we can't immediately see what they're meant to be because this is Python and types are optional. Nonetheless we can see that instructions are being added to a list, and below we can see the test_kernel_cycles function that runs the interpreter on the program. So our mission is to change the build_kernel function to make a better program. And it says this is an assembly version of the python function reference_kernel2 which is found in problem.py.

What exactly is this kernel doing? The reference_kernel2 function doesn't explain itself either - it's some sort of parallel tree walk. Let's put that to one side for a second and explore the machine, which is defined in problem.py. The machine itself is also largely undocumented, but there's a brief description in a docstring on line 66.

At this point it helps to understand the design of exotic processors. The emulator is for a fictional CPU that uses a VLIW SIMD ISA. Normal programmers will never encounter such a chip. Intel tried to make such a machine decades ago and it never took off, since then the concept has been largely dead. I believe it's still used in some mobile DSPs like Qualcomm's Hexagon. Notably, NVIDIA PTX is not such an ISA so this seems to have been chosen just to make things harder. As the comment explains, in a VLIW machine multiple instructions are packed together into a "slot" and executed in parallel. In a normal CPU the hardware reads a serial stream of instructions and works out just in time which can be executed in parallel, using fancy out-of-order circuitry. In a VLIW machine that's done ahead of time by the compiler or (in this case) the humble programmer, you. But this isn't just a VLIW machine, it's also multi-core, and multi-"engine", so there are multiple levels of execution going on. And it's SIMD, meaning each instruction can itself operate on multiple bits of data simultaneously.
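
To make the VLIW idea concrete, you can picture one instruction bundle as a tuple of independent operations, one per functional unit, all issued in the same cycle (the opcode names here are made up, not the simulator's real encoding):

    bundle = (
        ("alu",  "add",  "s2", "s0", "s1"),   # scalar ALU: s2 = s0 + s1
        ("valu", "vadd", "v2", "v0", "v1"),   # vector ALU: v2 = v0 + v1, per lane
        ("load", "s3", 0x40),                 # load scratch word s3 from address 0x40
    )
    # the programmer/compiler must guarantee these don't depend on each other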

This machine doesn't have registers or cache but it does have "scratch space", and so you can use the vector instructions to load data into a series of 32 bit scratch words and then do things on them in parallel. And multiple vector instructions can also run in parallel. "Broadcasting a scalar" in SIMD-speak means taking a single value and repeating it over multiple scratch space slots (or register subwords in a real machine), so you take e.g. 0xFF and get 0xFFFFFFFFFFFFFFFF.
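
A toy model of broadcasting (illustrative only):

    VLEN = 8                           # vector width in 32-bit words (made up)
    lanes = [0xFF] * VLEN              # broadcast: the scalar repeated in every lane
    doubled = [x * 2 for x in lanes]   # one "vector op" then touches all lanes at once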

And that's it, that's all we get. As the code says: "This comment is not meant to be full ISA documentation though, for the rest you should look through the simulator code". Possible point of confusion: real ISAs are serialized to bytes but this one is just Python tuples. The code is only partially typed; sometimes you're just left guessing.

So to recap, the problem is to optimize an undocumented program expressed in undocumented data structures returned by a Python function whose result is interpreted by a partly documented Python class that simulates a fictional exotic CPU architecture using an abandoned design that gives a lot of parallel computational capacity, but which requires all parallelism to be statically declared ahead of time, whilst simultaneously reverse engineering the Python that does all this.

Does that help? Sounds like a fun exercise :)

Edit: I just checked and Google TPUs are much more VLIW like so perhaps this simulator is designed to match a TPU. I know Anthropic rely on TPUs for serving and have done some optimization for them.


It does seem a bit of a strange challenge - a bit reminiscent of high school math problems where understanding the question was as much part of it as actually solving the problem when you understood it.

Since the focus of the challenge appears(?) intended to be optimization, not reverse engineering, it's a bit odd that they don't give a clear statement of what the kernel is meant to be computing. Perhaps the challenge is intended to be a combination of the two, but then correctly reverse engineering it becomes a gate for the optimization part; otherwise you'll be solving the wrong problem.

Given the focus on results achieved by Opus 4.5, maybe that's the main point - to show how well Opus can reverse engineer something like this. If they gave the actual clear problem statement, then maybe you could brute force an optimal solution using tree search.


I just threw this prompt at Gemini, and it seems (I haven't analyzed the problem to see if it is correct) to be able to extract a clear understanding of the problem, and a specification for the kernel.

"Can you "reverse engineer" what the kernel in this optimization exercise is actually doing - write a specification for it?

https://github.com/anthropics/original_performance_takehome"

Gemini says it's doing inference on a random forest - taking a batch of inputs, running each one through each decision tree, and for each input outputting the sum of these decision tree outputs - the accumulated evidence.


So looking at the actual code (reference_kernel() in problem.py), this "random forest inference" is completely wrong!

It's doing some sort of binary tree traversal, but the hashing and wrap around looks weird - maybe just a made up task rather than any useful algorithm?
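
Something in this spirit, purely as an illustration of the shape of the task (this is not the actual reference_kernel, and the hash constant is made up):

    def toy_tree_walk(tree, start_idx, steps):
        # Walk a binary tree stored in a flat array: at each step, mix the
        # running value with the node's value, use one bit of the result to
        # pick the left/right child, and wrap the index when it overflows.
        idx, val = start_idx, 0
        for _ in range(steps):
            val = (val * 1103515245 + tree[idx]) & 0xFFFFFFFF
            idx = (2 * idx + 1 + (val & 1)) % len(tree)
        return val

    print(toy_tree_walk(list(range(16)), 0, 10))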


Yes, it’s made up.

This isn't "reverse engineering" it's merely "being able to read fairly simple code you didn't write". A much simpler version of the kernel is provided at the end of problem.py as reference_kernel2.

If you can't make sense of such a small codebase or don't immediately recognize the algorithm that's being used (I'm guilty of the latter) then you presumably aren't someone that they want to hire.


Fair enough, and there are clues in the comments too, but why not just provide the specification of the kernel (inputs and outputs) as part of the problem?

They do. They provide reference_kernel which shows the algorithm itself, build_mem_image which shows the data format you will be working with, and finally reference_kernel2 which implements said algorithm on said data format.

They then provide you with a very naive implementation that runs on their (very simple) VLIW architecture that you are to optimize.

If at the end of that someone is still lost, I think it is safe to say it was their goal that such a person should fail.


Well, yes, they have a reference implementation as documentation, just as they have the simulator as documentation for the ISA ...

The problem is about pipelining memory loads and ALU operations, so why not just give clear documentation and state the task rather than "here's a kernel - optimize it"? ¯\_(ツ)_/¯


Presumably that is only one of two purposes, with the other being to test your ability to efficiently read, understand, and edit low level code that you didn't write. I imagine you'd regularly run into raw PTX if you worked for them in the relevant capacity.

And perhaps a third purpose is to use the simulator to test your ability to reason about hardware that you are only just getting familiar with.


I would assume that anyone optimizing kernels at Anthropic has full documentation and specs for what they are working on, as well as a personal butler attending to their every need. This is big money work - every 1% performance improvement must translate to millions of cost savings.

Maybe they specified the challenge in this half-assed way to deliberately test those sorts of skills (even if irrelevant to the job), or maybe it was just lazily put together.

The other thing to note is that if you look at what the reference_kernel() is actually doing, it really looks like a somewhat arbitrary synthetic task (hashes, wraparound), so any accurate task specification would really need to be a "line by line" description of the steps, at which point you may as well just say "here's some code - do this".


In a fast-paced domain such as this one, and especially with respect to the (global) competitiveness, the development/leadership process is most likely chaotic, and the "best" practices that we would normally find in other lower-paced companies cannot be followed here. I think that by underspecifying the assignment they wanted to test a candidate's ability to fit into such an environment, apart from the obvious reason, which is to filter out insufficiently motivated candidates.

They do, but documentation is not always complete or correct.

> as well as a personal butler attending to their every need

I think they do and his name is Claude ;)


> but which requires all parallelism to be statically declared ahead of time

This is what all specialized chips like TPU/Cerebras require today, and it allows for better optimization than a generic CPU, since you can "waste" 30 min figuring out the perfect routing/sequencing of operations instead of doing it in the CPU in nanoseconds/cycles.

Another benefit is that you can throw away all the CPU out-of-order/branch-prediction logic and put useful matrix multipliers in its place.


This is a nice writeup, thanks. Another commenter said it would've taken them 2h just to sketch out ideas; sans LLMs it would've taken me more than 2h just to collect all this info, let alone start optimizing it.

It took me about 10 minutes to generate that writeup the old fashioned 100% organic way, because one of the things that's unspecified is whether you're allowed to use AI to help solve it! So I assumed as it's a job interview question you're not allowed, but now I see other comments saying it was allowed. That would let you get much further.

I think I'd be able to make some progress optimizing this program in two hours but probably not much. I'm not a performance engineer but have designed exotic emulated CPU architectures before, so that helps a lot.


I've not written a VM before, but the comments in perf_takehome.py and problem.py explain the basics of this.

I gleaned about half of this comment in a few minutes of just skimming the code and reading the comments on the functions and classes. There's only 500 lines of code really (the rest is the benchmark framework).


Same thought. I doubt they provided additional explanation to candidates - it seems that basic code literacy within the relevant domain is one of the first things being tested.

On the whole I don't think I'd perform all that well on this task given a short time limit but it seems to me to be an extremely well designed task given the stated context. The reference kernel easily fits on a single screen and even the intrinsic version almost does. I think this task would do a good job filtering the people they don't want working for them (and it seems quite likely that I'm borderline or maybe worse by their metric).


I think calling VLIW "an abandoned design" is somewhat of an exaggeration; such architectures are pretty common for embedded audio processing.

Worth adding on that note:

From JAX to VLIW: Tracing a Computation Through the TPU Compiler Stack, https://patricktoulme.substack.com/p/from-jax-to-vliw-tracin...

Google’s Training Chips Revealed: TPUv2 and TPUv3, HotChips 2020, https://hc32.hotchips.org/assets/program/conference/day2/Hot...

Ten Lessons From Three Generations Shaped Google’s TPUv4i, ISCA 2021, https://gwern.net/doc/ai/scaling/hardware/2021-jouppi.pdf


Thanks, that JAX writeup was interesting.

Sure. I did mention DSPs. But how many people write code for DSPs?

x86-64 SSE and AVX are also SIMD

SIMD and VLIW are somewhat similar but very different in the end.

True.

The ISA in this Anthropic machine is actually both, VLIW and SIMD, and both are relevant to the problem.


    Sounds like a fun exercise :)
I'll be honest, that sounds like the opposite of fun since the worst parts of my job are touching the parts of a Python codebase that are untyped. The sad part is this work codebase isn't even that old, maybe a few years, and the developers definitely should have known better if they had anyone capable leading them. Alas, they're all gone now.

Harder than figuring out the instruction set for some exotic CPU are definitely the giant untyped dicts/lists common in data science code.


On the one hand, this exercise probably reflects a realistic task. Daily engineering work comprises a lot of reverse engineering and debugging of messy code. On the other hand, this does not seem very suitable as an isolated assignment. The lack of code base-specific context has a lot of potential for frustration. I wonder what they really tested on the candidates, and whether this was what they wanted to filter for.

> The lack of code base-specific context has a lot of potential for frustration.

I think that's one of the intentional points. Being able to quickly understand what the provided source code is doing.


Wow! Thanks for the explanation :)

"Performance can be optimized by not using python."

Generate instructions for their simulator to compute some numbers (hashes) in whatever is considered the memory of their "machine"¹. I didn't see any places where they actually disallow cheating b/c it says they only check the final state of the memory² so seems like if you know the final state you could just "load" the final state into memory. The cycle count is supposedly the LLM figuring out the fewest number of instructions to compute the final state but again, it's not clear what they're actually measuring b/c if you know the final state you can cheat & there is no way to tell how they're prompting the LLM to avoid the answers leaking into the prompt.

¹https://github.com/anthropics/original_performance_takehome/...

²https://github.com/anthropics/original_performance_takehome/...
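
To be concrete about the kind of "cheating" I mean (hypothetical opcode name, not their actual ISA):

    # If the expected final memory image is known, a "solution" can simply
    # store those constants instead of computing anything, for almost no cycles.
    expected_mem = {0: 0xDEADBEEF, 1: 0x1234}            # pretend known answer
    instrs = [("store_const", addr, value)
              for addr, value in expected_mem.items()]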


Well, they read your code in the actual hiring loop.

My point still stands. I don't know what the LLM is doing so my guess is it's cheating unless there is evidence to the contrary.

I guess your answer to "Try to run Claude Code on your own 'ill-defined' problem" would be "I'm not interested." Correct? I think we can stop here then.

Well that's certainly a challenge when you use LLMs for this test driven style of programming.

Why do you assume it’s cheating?

Because it's a well-known failure mode of neural networks & scalar-valued optimization problems in general: https://www.nature.com/articles/s42256-020-00257-z

Again, you can just read the code

You're missing the point. There is no evidence to support their claims which means they are more than likely leaking the memory into the LLM prompt & it is cheating by simply loading constants into memory instead of computing anything. This is why formal specifications are used to constrain optimization. Without proof that the code is equivalent you might as well just load constants into memory & claim victory.

> There is no evidence to support their claims

Do you make a habit of not presuming even basic competence? You believe that Anthropic left the task running for hours, got a score back, and never bothered to examine the solution? Not even out of curiosity?

Also if it was cheating you'd expect the final score to be unbelievably low. Unless you also suppose that the LLM actively attempted to deceive the human reviewers by adding extra code to burn (approximately the correct number of) cycles.


This has nothing to do w/ me & consistently making it a personal problem instead of addressing the claims is a common tactic for people who do not know what it means to present evidence for their claims. Anthropic has not provided the necessary evidence for me to conclude that their LLM is not cheating. I have no opinion on their competence b/c that is not what is at issue. They could be incompetent & not notice that their LLM is cheating at their take home exam but I don't care about that.

You are implying that you believe them to be incompetent since otherwise you would not expect evidence in this instance. They also haven't provided independent verification of their claims - do you suspect them of lying as well?

How do you explain the specific score that was achieved if as you suggest the LLM simply copied the answer directly?


Either they have proof that their LLM is not cheating or they don't. The linked post does not provide evidence that the LLM is not cheating. I don't have to explain anything on my end b/c my claim is very simple & easily refuted w/ the proper evidence.

And? Anthropic is not aware of this 2020 paper? The problem is not solvable?

Why are you asking me? Email & ask Anthropic.

Obviously, because you use this old paper as an argument.

I don't have any insider information on what they know or don't know so you're welcome to keep asking nonsensical questions but eventually I'll stop answering.

Which part exactly are you having trouble with?

- Optimize the kernel (in KernelBuilder.build_kernel) as much as possible in the available time, as measured by test_kernel_cycles on a frozen separate copy of the simulator


Thank goodness, I thought it was just me...

Right, but if you have a long line that is, for example, a JSON object, then surely it can't properly be validated or syntax-highlighted before the entire line is scanned?

I do agree that Emacs can be slower than the terminal when handling long lines/files, although (depending on your case) this can be easily mitigated by running a terminal inside of Emacs.

Generally though, for everyday use, Emacs feels a lot snappier than VSCode.


Good point. Though for widget UIs you're typically rendering structured data you control, not parsing arbitrary text files. The syntax highlighting / validation concern applies to editing code, not to building interactive interfaces.

> Generally though, for everyday use, Emacs feels a lot snappier than VSCode.

+1


I see why this is easy and fun, but is it really "self-hosting" if you are dependent on a $1200 a year AI-service to build and maintain it?


You only have to spend 5 minutes browsing for MCP servers to see that there is an issue with AI slop. MCP is probably the first "standard" to be built out in the vibe-coding era and it really shows.

As mentioned in the article, it's not clear to me what the advantage over OpenAPI is. Surely a swagger file solves more or less the same issue.

That said, one minor nice thing about the MCP servers is that they operate locally over stdin/stdout, which feels a lot faster than HTTP/REST.
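
For what that looks like in practice, here's a minimal sketch (in Python; the server command is made up, and the real protocol also requires an initialize handshake that I'm skipping):

    import json, subprocess

    # Launch a local MCP server; the stdio transport is newline-delimited JSON-RPC.
    proc = subprocess.Popen(
        ["some-mcp-server"],          # hypothetical command
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
    proc.stdin.write(json.dumps(request) + "\n")
    proc.stdin.flush()
    print(proc.stdout.readline())     # the server's JSON-RPC response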


What do you mean by "locally over stdin/stdout"? This is only true if the MCP server (and original service) runs locally.


It sounds weird, but for reasons that I don't fully understand (bandwidth issues maybe?), first-gen MCP servers run and accept "queries" locally and talk to their AI-minds themselves on the back end. It's not an HTTP API direct to a remote service as you might expect.


There is a large subset of security problems that are solved by simply eliminating the compilation steps typically included in "postinstall". If you want a more secure, more debuggable, more extensible lib, then you should definitely publish it in pure JS (rather than, say, TypeScript), so that there is no postinstall attack surface.


With type stripping in Node LTS now, there's no reason at all to have a postinstall for TypeScript code either. There are also fewer reasons not to publish a "pure TS" library.


The TypeScript compiler is being ported to Go, so if you want type-checking going forward you will need to execute a native binary.


In all of this, people forget that NPM packages are largely maintained by volunteers. If you are going to put up hurdles and give us extra jobs, you need to start paying us. Open source licenses explicitly state some variation of "use at your own risk". A big motivation for most maintainers is that we can create without being told what to do.

I had 25 million downloads on NPM last year. Not a huge amount compared to the big libs, but OTOH, people actually use my stuff. For this I have received exactly $0 (if they were Spotify or YouTube streams I would realistically be looking at ~$100,000).

I propose that we have two NPMs: a non-commercial NPM that is 100% use-at-your-own-risk, and a commercial NPM that has various guarantees that authors and maintainers are paid to uphold.


NPM has to decide between either being a friendly place for hobbyists to explore their passions or being the backbone for a significant slice of the IT industry.

Every time someone pulls/messes with/uploads malware to NPM, people complain and blame NPM.

Every time NPM takes steps to prevent pulling/messing with/uploading malware to NPM, people complain and blame NPM.

I don't think splitting NPM will change that. Current NPM is already the "100% use at your own risk" NPM and still people complain when a piece of protestware breaks their build.


In my opinion the problem has more to do with the whole corporate software ecosystem having lost past good practices:

Previously you were never supposed to use a public version of something as-is. Each company had its own corporate repository, with each new version of a dependency being carefully curated before being added to the repository.

Normally you should not update anything without at least looking at the release-note diff to understand why you are updating, but nowadays people add or update whatever package without even looking.

You just have to look at how many downloads typosquatted clones of famous projects get.

To me it is even bad for the whole ecosystem: since everyone is doing that, the ones still curating are at a disadvantage, slower and less nimble. And so there is a dumping effect, with no one committed any more to paying the cost of serious software practices.

In my opinion, node, npm and the JS ecosystem are responsible for a big part of the current situation, pushing people and newbies toward bad practices. Cf. all the "is-*" packages...


> If you are going to put up hurdles and give us extra jobs, you need to start paying us.

Alternatively, we can accept that there will be fewer libraries because some volunteers won't do the extra work for free. Arguably there are too many libraries already so maybe a contraction in the size of the ecosystem would be a net positive.


Note: the bad guys are incentivized to work for free, so this would increase the problem considerably.


The npm left-pad incident would be the classic argument against this position


It's a bit more complicated than that. The ecosystem around node is just weird. It's not clear what role NPM wants to have.

Lots of people chase downloads on NPM. It's their validation, their youtube subscribers, or their github stars if you will. That's how they get job offers. Or at least they think they do, I don't know if it actually works. There's tons of good software there, but the signal to noise ratio is still rather low.

Given that, I'd rather get paid for including your software as a dependency to my software, boosting your downloads for a long time.

Just kidding, of course. On that last part. But it wouldn't surprise me the least if something like it actually happened. After all, you can buy stars on github just like on any other social media. And that does strange things to the social dynamics.


I agree with you here; it feels like management said "well, we have to do SOMETHING!" and this is what they chose: push more of the burden onto the developers giving away stuff for free, when the burden should be on the developers and companies consuming the stuff for free.


But the management who decided that gets rewarded for pushing work to someone else.


Not looking forward to the mandatory doxxing that would probably come along if this was introduced today.


This makes no sense, maintainers are not exactly operating under a cloak of anonymity. Quite the opposite in fact.


Yes! I despise how the open source and free software culture turns into just free labour for freeloading million-dollar and billion-dollar companies.

The culture made sense in the early days when it was a bunch of random nerds helping each other out and having fun. Now the freeloaders have managed to hijack it and inject themselves into it.

They also weaponise the culture against the devs by shaming them for wanting money for their software.

Many companies spend thousands of dollars every month on all sorts of things without much thought. But good luck getting a one-time $100 license fee out of them for some critical library that their whole product depends on.

Personally I'd like to see the "give stuff to them for free then beg and pray for donations" culture end.

We need to establish a balance based on the commercial value that is being provided.

For example I want licensing to be based on the size and scale of the user (non-commercial user, tiny commercial user, small business, medium business, massive enterprise).

It's absurd for a multi-million company to leech off a random dev for free.


I have no idea how much of this stuff is volunteer written, and how much is paid work that is open-sourced.

No one is forced to use these licences. Even some FOSS licences such as the AGPL will not be used by many companies (or even the GPL, where it's software that is distributed to users). You could use a FOSS license and add an exemption for non-commercial use, or use a non-FOSS license that is free for non-commercial use or small businesses.

On the other hand a lot of people choose permissive licenses. I assume they are happy to do so.


I only use copyleft licenses, it keeps away most of them I imagine.


https://econofact.org/factbrief/do-private-equity-firms-own-...

> "Large institutional investors, defined as those owning over 100 homes (which includes private equity firms), own 3 percent of the single-family rental stock nationwide according to Brookings. This share is higher in some local markets — in the 20 Metropolitan Statistical Areas where these investors are most present, they own 12.4 percent"

I personally believe that it's problematic that large institutional investors own 12.4% of single-family properties in the 20 main metro areas of the US.


Proposal for new word: "employtainment"

