Hacker News | SpaceMartini's comments

This hit me when moving from a start-up to a FAANG. There is effectively an infinite amount of work for me to do on any given day, so at some point I just have to decide to stop - if I don't, I'll just end up tired tomorrow with an equally infinite amount of stuff still to do.


I work in HPC for a cloud provider, and fully endorse this move. Anonymously, of course.

You can make an economic argument for or against cloud in practically every IT domain, but in HPC the case for on-prem is really compelling; none of the cloud networking/resiliency value-add is relevant to batch workflows, and costs per core-hour are only remotely comparable if you use spot - which is itself a major compromise.

The only real advantage cloud has for science is object storage, which is genuinely a much better idea than trying to manage your own long-term archival storage.

If I were independent I would recommend people buy and build on-prem clusters and shuffle data out of fast scratch into Glacier, but other than that just don't worry about cloud until price pressure kicks in and we are down to 1-2 cents per core-hour on-demand.
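To make the "1-2 cents per core-hour" threshold concrete, here is a back-of-the-envelope sketch; every number in it (node price, lifetime, utilization, overhead multiplier, cloud rate) is an illustrative assumption, not a quote from any vendor:

```python
# Rough comparison of amortized on-prem vs cloud cost per core-hour.
# All inputs are illustrative assumptions.

node_price = 12_000.0    # assumed price of a 128-core compute node (USD)
cores = 128
lifetime_years = 5       # assumed useful life of the node
utilization = 0.80       # assumed average cluster utilization
overhead = 2.0           # assumed multiplier for power, cooling, admin

hours = lifetime_years * 365 * 24 * utilization
on_prem_per_core_hour = node_price * overhead / (cores * hours)

cloud_on_demand = 0.04   # assumed on-demand rate, USD per core-hour

print(f"on-prem: ${on_prem_per_core_hour:.4f}/core-hour")
print(f"cloud  : ${cloud_on_demand:.4f}/core-hour")
```

Under these assumptions on-prem lands around half a cent per core-hour, which is why on-demand cloud pricing would need to fall an order of magnitude before the economics flip.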

I'd love a role where I can say these things non-anonymously, but the salary for such a position would be at least 50% lower than working for a cloud provider. Keep that in mind when talking to your supplier - we may not believe the pitch ourselves, but making it is just part of the job.


The only real advantage cloud has for science is object storage

As someone who has done a fair bit of HPC, I consider the real advantage to be temporary scalability. If my 'normal' compute nodes have 128 GB of RAM and all of a sudden I have a job that needs 300 GB of RAM, with cloud I can just change a line in a config file and run that calculation on a machine with 300 GB of RAM. Or if I have a job that will optimally run on 100s of 1-core machines with only 4 GB of RAM, I can set up a cluster of such machines within minutes.
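The "change a line in a config file" workflow looks roughly like this in an AWS ParallelCluster-style YAML (queue and resource names are made up for illustration; the instance sizes are real EC2 types):

```yaml
# Hypothetical ParallelCluster-style queue definition.
# Bumping a job from ~128 GB to ~384 GB of RAM is a one-line change:
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: bigmem
      ComputeResources:
        - Name: bigmem-nodes
          InstanceType: r5.12xlarge  # 48 vCPU / 384 GB; was r5.4xlarge (128 GB)
          MinCount: 0
          MaxCount: 10
```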

That being said, I 100% agree that if you have a normal baseline workload, it should absolutely be done on in-house hardware.


Those are fair points - I have seen truly spiky workloads like that very occasionally, but more often those spikes are a precursor to more sustained usage in a similar manner and so would quickly warrant hardware purchases.


As an addendum to this: if you absolutely must use cloud, stick with AWS. Using Azure is (IMO) a fucking miserable experience and their only advantage (InfiniBand) is better served by buying your own hardware. GCP and OCI might be fine if you are getting a lot of credits, but the skills will not be useful down the line - while AWS is expensive, you will at least learn a bunch of in-demand operational skills.


> you will at least learn a bunch of in-demand operational skills.

Are you saying we should stick with AWS because most stick with AWS?

My personal experience with GKE/GCP was quite good, except, as expected, for their support.


> Are you saying we should stick with AWS because most stick with AWS?

Broadly speaking yes - there is a lot of value in having a deeper pool of skilled people to hire from, and there are enough differences between cloud offerings to knock at least a couple of "effective years" of experience off someone who changes provider.


So nobody got fired for buying IBM?


I can hang 32 terminals off just one PC. You're still blowing your budget on standalones.


Dang it all, we had 20+ teletypes hanging off a 32Kb PDP-8 (with DECTape for random access storage), "and we liked it!" :-)


> The only real advantage cloud has for science is object storage, which is genuinely a much better idea than trying to manage your own long-term archival storage.

I work with an academic HPC group, and because researchers generally pay only for the hardware, and maybe some recharge rate for occasional maintenance, the cost per TB per month for 100s-of-TB and larger systems works out about the same as Glacier Deep (about $1/TB/mo) - except there is no 180-day minimum storage requirement, no egress fees, and no transfer fees. And disk just keeps getting cheaper.
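The hardware-only arithmetic behind that $1/TB/mo figure is easy to sketch; the drive price, redundancy overhead, and lifetime below are assumptions, and the comparison deliberately excludes chassis, power, and labor (which, per the comment, researchers here mostly don't pay for):

```python
# Rough $/TB/month for DIY archival disk vs Glacier Deep Archive.
# All inputs are illustrative assumptions.

drive_cost_per_tb = 15.0   # assumed raw HDD price, USD/TB
redundancy = 1.5           # assumed raidz2-style capacity overhead
lifetime_months = 60       # assumed 5-year drive life

diy_per_tb_month = drive_cost_per_tb * redundancy / lifetime_months

# Glacier Deep Archive lists at roughly $0.00099/GB-month in us-east-1.
glacier_deep = 0.00099 * 1000

print(f"DIY (hardware only): ${diy_per_tb_month:.2f}/TB/mo")
print(f"Glacier Deep       : ${glacier_deep:.2f}/TB/mo")
```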

I'm told that a big part of the solution is their use of ZFS.


If you can set up HPC at scale, you can set up Ceph for object storage. It will save you so much money in the long run.


I have heard too many horror stories about Ceph (and OpenStack) to be confident about that. I certainly don't think I can truly beat S3 on cost or performance at the terabyte scale for household data - and while larger scale would give on-prem savings, there are also higher expectations (in terms of availability and performance) of a multi-petabyte storage array.


Really depends on your scale. At the terabyte to 100s-of-TB level, you can solve most storage problems at minimum cost with NAS or ZFS on commodity hardware.

Ceph/Object storage comes into its own at the multi-petabyte and higher levels, which is not very many groups or institutions.


Solving storage at the tens of TB scale with commodity hardware is fine to a point (I have a ZFS NAS at home) but has much more ongoing maintenance burden than S3 and you need at least 2 copies for it to be a remotely comparable solution in terms of durability.
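The "at least 2 copies" point can be made concrete with a toy durability calculation; the annual failure rate is an assumed number, and treating failures as independent with instant rebuilds is a deliberate simplification:

```python
# Toy model: annual probability of data loss with 1 vs 2 copies.
# AFR and independence are simplifying assumptions, not measurements.

afr = 0.05                   # assumed annual failure rate of one storage box
p_loss_1_copy = afr
p_loss_2_copies = afr ** 2   # assumes independent failures and that the
                             # surviving copy is re-replicated instantly

print(f"1 copy  : {p_loss_1_copy:.4f} chance of loss per year")
print(f"2 copies: {p_loss_2_copies:.4f} chance of loss per year")
```

Even this optimistic model shows why a single on-prem box is not in the same durability class as replicated object storage.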

Ultimately you just have to design for what is important to you; I don't want to spend time managing this stuff any more, so I keep a local NAS for my partner to access and put the bulk of my "cold" data into 2 different cloud object storage providers. Note that neither of these is actually S3; for business use I would absolutely use AWS, but for personal files I can manage with the reduced capabilities and lower prices others offer.


Yep, there is quite an arbitrage going on. Unless you get startup credits at the highest tier ($100K), it is way too expensive for most startups.

For a lot of our customers, cloud is impossible now for HPC: shortages are so bad that you have to know someone high up at the top cloud providers to get access to right-sized GPUs. (T4? Forget it -- one of our tickets has been open since ~Christmas.)

We have gone hybrid and, for growing compute, multi-cloud: the main stuff runs on the top 3 clouds (CPU, light minimal GPU...), with GPU elasticity on the other ones. And for a lot of it... yep, just buy GPUs for local dev.


I'm curious what people are spending / overspending all their money on in the cloud?

My exposure to the actual granular costs and billing has been limited to a small company, and in that case the costs were pretty appealing compared to running everything yourself. Granted, this was also a bit of a hybrid, with some services local and others in the cloud.

I've not had much exposure to where the deep costs start to pile up as far as cloud services go. I wonder where those pop up?


> I'm curious what people are spending / over spending all their money on in the cloud?

Outbound data.

Cloud companies generally make inbound data close to free, but outbound data incredibly expensive.
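A quick worked example of why egress dominates; the per-GB rate below is an assumption in the ballpark of typical internet-egress list prices, not a quote:

```python
# Why outbound data dominates many cloud bills: 100 TB of egress.
# The $/GB rate is an assumed ballpark figure.

egress_gb = 100 * 1000   # 100 TB expressed in GB (decimal)
rate_per_gb = 0.09       # assumed blended internet-egress rate, USD/GB

bill = egress_gb * rate_per_gb
print(f"Egress bill: ${bill:,.0f}")
```

At these rates, shipping a single 100 TB dataset out of the cloud costs thousands of dollars, while moving it in was essentially free.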


> If I were independent I would recommend people buy and build on-prem clusters and shuffle data out of fast scratch into Glacier,

Pretty much 100% agree with this statement. The only exception is maybe at the beginning of a long series of computations: you might want to start on-demand to fully understand and size your exact needs, and then provision on-prem.


Scalability might still be a major advantage if you have infrequent enough needs for HPC-scale compute. Basically, use the cloud service as a pure computational "grid". But the pitfalls of cloud business models (including sky-high costs for data egress) often make that unworkable.


Anything by Rooster Teeth is worth a try for comedy; not to everyone's taste, but if you like the main RT podcast there is a long backlog worth listening to. Skipping back 200 episodes got me through my PhD write-up.

Any shows hosted by Joe Ressington are good for Linux news if you want down-to-earth, honest conversation. Jupiter Broadcasting shows may also be worth a look, but personally I find most of the hosts quite annoying and their discussions superficial.

The Economist has a good spread, I mostly stick with Babbage and "Checks and Balance".

For hipsters talking about "productivity" (ie podcaster autofellatio) using Macs and other expensive tools, check the relay.fm offerings.

