IBM was ahead of the curve! They had Watson on Jeopardy years ago! /s
I think you make a fair point about the potential disruption to their consulting business, but didn't they try to de-risk a bit with the Kyndryl spinout?
I used to add stickers obtained from visiting my company's clients/partners to my laptop. After 5 years the laptop was basically advertising a bunch of dead brands (either through acquisition or lack of VC funding).
Yeah, I don't know about S3, but years back I talked a fair bit with someone who did storage for HPC, and one thing he described was building huge JBOD arrays where only a handful of disks per rack would be spun up at any time, basically pushing past what could be done with SCSI extenders and the like. It wouldn't surprise me if they're doing something similar, batch-scheduling the drive activations over a window of minutes to hours.
I think that's close to the truth. IIRC it's something like a massive cluster of machines that are effectively powered off 99% of the time with a careful sharding scheme where they're turned on and off in batches over a long period of time for periodic backup or restore of blobs.
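If the spin-up-in-batches guess is right, the scheduling math is easy to sketch. Here's a minimal toy version in Python; every constant and name below is invented for illustration, not anything AWS has confirmed:

```python
# Toy sketch of the speculated batch-activation scheme; all constants
# here are made up for illustration, not from any AWS source.

DRIVES_PER_RACK = 1000   # hypothetical dense JBOD rack
ACTIVE_PER_BATCH = 10    # "only a handful of disks spun up" at a time
WINDOW_MINUTES = 30      # how long each batch stays powered

def batches(drive_ids, size=ACTIVE_PER_BATCH):
    """Partition drives into batches that take turns being powered on;
    I/O for a drive queues until its batch's window comes around."""
    return [drive_ids[i:i + size] for i in range(0, len(drive_ids), size)]

rack = list(range(DRIVES_PER_RACK))
schedule = batches(rack)
full_pass_hours = len(schedule) * WINDOW_MINUTES / 60
print(f"{len(schedule)} windows, ~{full_pass_hours:.0f}h per full pass")
# -> 100 windows, ~50h per full pass: a restore waits for its drive's
#    window, so latency is hours, not milliseconds
```

With made-up numbers like these you'd get multi-hour restore latencies almost for free, which would fit the user-visible behavior people describe.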
It's amazing that Glacier is such a huge system, with so many people working on it, and yet how it works is still a public mystery. I've not seen a single confirmation.
Not even the higher tiers of Glacier were tape, afaict (at least when it was first created); it's just the observation that hard drives hold far more data than you can reasonably access in useful time (e.g., reading a 16 TB drive end to end at ~200 MB/s takes roughly 22 hours).
In the early days, when there were articles speculating about what Glacier was backed by, it was actually on crusty old S3 gear (and at the very beginning it was just a wrapper on S3 itself with a hand-wavy price discount, AWS eating the costs to get people to buy into the idea!). Later on (2018 or so) they began moving to a home-grown tape-based solution (at least for some tiers).
I'm not aware of AWS ever confirming tape for Glacier. My own speculation is that they likely use HDDs for Glacier, especially in the smaller regions, and eat the cost.
Someone recently came across planning documents filed in London for a small "datacenter" that wasn't attached to their usual London compute DCs and was built to house tape libraries (this was explicitly called out because there were concerns about power; tape libraries don't use much). So I'd be fairly confident they wait until Glacier volumes grow enough on HDD before building out tape infrastructure.
Do you have any sources for that? I'm really curious about Glacier's infrastructure and AWS has been notoriously tight-lipped about it. I haven't found anything better than informed speculation.
My speculation: writes are to /dev/null, and the fact that reads are expensive and that you need to inventory your data before reading means Amazon is recreating your data from network transfer logs.
I'd be curious whether simulating a shitty restoration experience was part of the emulation when they first ran Glacier on plain S3 to test the market.
There might be surprisingly little value in going to tape, given all the specialization required. As the other comment suggests, the lower tiers likely just represent IO bandwidth classes: a 16 TB disk with 100 IOPS can only offer 1 IOPS over 160 GB each to 100 customers, or 0.1 IOPS over 16 GB each to 1,000, etc. Scale that thinking up to a building full of disks and it still applies.
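For the curious, that arithmetic as a trivial Python loop (assuming an even split of capacity and IO across customers; the disk numbers are the ones from the comment):

```python
# Even-split arithmetic for the IOPS-class point above.
DISK_TB = 16      # disk capacity
DISK_IOPS = 100   # random IO operations per second the disk can sustain

for customers in (100, 1_000):
    capacity_gb = DISK_TB * 1_000 / customers  # GB of space per customer
    iops = DISK_IOPS / customers               # IOPS per customer
    print(f"{customers} customers: {capacity_gb:g} GB at {iops:g} IOPS each")
# 100 customers: 160 GB at 1 IOPS each
# 1000 customers: 16 GB at 0.1 IOPS each
```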
I realize you're making a general point about space/IO ratios; the below is orthogonal, not a contradiction.
In a large distributed storage system, the user-facing IO capacity per disk that you can actually "sell" is a lot lower than the raw spec. There's constant maintenance churn to keep data available:
- local hardware failure
- planned larger scale maintenance
- transient, unplanned larger scale failures
(etc)
In general, you can fall back to serving reads via reconstruction from the erasure codes during degradation. But that's a) enormously expensive in IO and CPU, and b) a higher availability and/or durability risk, because you've lost redundancy.
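To put a rough number on (a), here's a minimal sketch assuming a generic k-of-(k+m) erasure layout; the 10+4 parameters are invented for illustration, not any particular system's:

```python
# Why degraded reads are IO-expensive under a k-of-(k+m) erasure code.
K, M = 10, 4   # hypothetical stripe: 10 data shards + 4 parity shards

def shard_reads(degraded: bool) -> int:
    """A healthy read of one shard touches 1 disk; reconstructing a
    missing shard requires reading any K survivors and decoding."""
    return K if degraded else 1

print(shard_reads(False))  # 1  -> normal read
print(shard_reads(True))   # 10 -> 10x IO amplification while degraded
```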
Additionally, it may make sense to rebalance where data lives for optimal read throughput (and other performance reasons).
So in practice, there's constant rebalancing going on in a sophisticated distributed storage system, and it eats a good chunk of your HDD IOPS.
This + garbage collection also makes tape really unattractive for all but very static archives.
See comments above about AWS per-request cost - if your customers want higher performance, they'll pay enough to let AWS waste some of that space and earn a profit on it.
Kinda feels like a lot of companies think they can be option 1 because someone else will be option 2, and then they'll hire the young people away after they become experienced.
It has been many moons since I used Magic Lantern. Has anamorphic desqueeze ever been a feature or could it be in the future? That's one missing feature that bums me out about shooting videos on Canon.
Every time there is a thread on big O notation I'm hoping someone is going to explain something that will make it clear how it relates to the anime The Big O. I'm still trying to understand what happened in that show.
This is the reason I also go to Microcenter for components. It's too bad there are so few of them and Fry's is defunct. Not many brick and mortar options left.
I had an aunt who was a holdout until this past year. She was in a rather wooded, sparsely populated area, and although faster internet became available a while ago, it was much more expensive, and she was already used to the limitations of dial-up, so she didn't feel compelled to make the jump. If she really needed fast internet for some reason (maybe emailing an attachment), she would drive to the nearest library.