I can confirm this. I used SeaweedFS to serve 1M daily users with 56 million images (~100TB) on just 2 servers with HDDs only, while MinIO couldn't handle this. SeaweedFS performance is much better than MinIO's.
The only problem is that SeaweedFS documentation is hard to understand.
This is Chris, the creator of SeaweedFS. I am starting to work full time on SeaweedFS now. Just create issues on the SeaweedFS repo if you run into anything.
Recently SeaweedFS has been moving fast and has added a lot more features, such as:
* Server Side Encryption: SSE-S3, SSE-KMS, SSE-C
* Object Versioning
* Object Lock & Retention
* IAM integration
* a lot of integration tests
Also, SeaweedFS performance came out best in almost all categories in a user's benchmark: https://www.repoflow.io/blog/benchmarking-self-hosted-s3-com...
And since then, a recent architectural change has increased performance even more, reducing write latency by 30%.
Thank you for your work. I was in a position where I had to choose between MinIO and SeaweedFS, and though SeaweedFS was better in every way, the lack of an included dashboard or UI was a huge factor for me back then. I don't expect or even want you to make any roadmap changes, but just wanted to let you know of a possible pain point.
One similar use case used Cassandra as the SeaweedFS filer store, creating thousands of files per second in a temp folder and then moving them to a final folder. The updates caused a lot of tombstones in Cassandra.
Later, they switched to Redis for the temp folder and kept Cassandra for the other folders. Everything has been very smooth since then.
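For anyone with a similar workload: the filer supports path-specific stores, so the hot temp folder can live in Redis while everything else stays in Cassandra. A rough sketch of filer.toml (option names from memory, so treat this only as a sketch; the template printed by "weed scaffold -config=filer" is authoritative):

    [cassandra]
    enabled = true
    keyspace = "seaweedfs"
    hosts = [ "cassandra1:9042", "cassandra2:9042" ]

    # path-specific override: entries under /tmp go to Redis instead
    [redis2.tmp]
    enabled = true
    address = "redis-host:6379"
    location = "/tmp/"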
SeaweedFS is built on top of a blob storage based on Facebook's Haystack paper.
The features are not fully developed yet, but what makes it different is a new way of programming for the cloud era.
When needing some storage, just fallocate some space to write to, and a file_id is returned. Use the file_id like a pointer to a memory block.
More features will be built on top of it; the file system and object store are just a couple of them. More help is needed on this.
The allocated storage is append-only. For updates, just allocate another blob; the deleted blobs will be garbage collected later. So it is not really like mmap.
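To make the "file_id as a pointer" part concrete, the basic flow against the master and volume server HTTP API looks roughly like this (a minimal sketch along the lines of the README examples; master assumed at localhost:9333, error handling omitted):

    package main

    import (
        "bytes"
        "encoding/json"
        "fmt"
        "io"
        "mime/multipart"
        "net/http"
    )

    func main() {
        // 1. ask the master to allocate space; it returns a file id and a volume server
        resp, _ := http.Get("http://localhost:9333/dir/assign")
        var a struct {
            Fid string `json:"fid"` // e.g. "3,01637037d6"
            Url string `json:"url"` // volume server holding that volume
        }
        json.NewDecoder(resp.Body).Decode(&a)
        resp.Body.Close()

        // 2. write the blob to the volume server under that file id
        var buf bytes.Buffer
        mw := multipart.NewWriter(&buf)
        fw, _ := mw.CreateFormFile("file", "hello.txt")
        fw.Write([]byte("hello, blob"))
        mw.Close()
        http.Post("http://"+a.Url+"/"+a.Fid, mw.FormDataContentType(), &buf)

        // 3. read it back later using only the file id, like dereferencing a pointer
        // (in general you would first ask the master /dir/lookup for the volume location)
        got, _ := http.Get("http://" + a.Url + "/" + a.Fid)
        data, _ := io.ReadAll(got.Body)
        fmt.Println(a.Fid, string(data))
    }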
> Also what is the difference between a file, an object, a blob, a filesystem and an object store?
The answer would be too long to fit here. Maybe chatgpt can help. :)
I, too, am interested in your views on the last 2 questions, since your views, not chatGPT's, are what informed the design. Part of learning from others' designs [0] is understanding what the designers think about their own design, and how they came about it.
Would you mind elaborating on them? HN gives a lot of space, and I'm confident you can find a way to summarize without running out, or sounding dismissive (which is what the response kind of sounds like now).
The blob storage is what SeaweedFS is built on. All blob access takes O(1) network and disk operations.
Files and S3 are higher layers above the blob storage. They require metadata to manage the blobs, plus other metadata for directories, S3 access, etc.
This metadata usually sits together with the disks containing the files. But in highly scalable systems, the metadata has dedicated stores, e.g., Google's Colossus, Facebook's Tectonic, etc. The SeaweedFS file system layer is built as a web application that manages the metadata of blobs.
Actually, the SeaweedFS file system implementation is just one way to manage the metadata. There are other possible variations, depending on requirements.
There are a couple of slides on the SeaweedFS GitHub README page. You may get more details there.
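As a rough illustration of what that metadata layer holds: each file entry is basically a path plus a list of chunks, and each chunk just points at a blob by its file id (simplified sketch for illustration only; the real definitions live in the filer's protobuf):

    // Simplified shape of the filer metadata for one file (illustration only).
    type FileChunk struct {
        FileId string // blob id, e.g. "3,01637037d6" (volume 3, needle 01637037d6)
        Offset int64  // where this chunk starts within the logical file
        Size   uint64 // chunk length in bytes
    }

    type Entry struct {
        FullPath string      // e.g. "/buckets/photos/cat.jpg"
        Chunks   []FileChunk // the blobs that make up the file content
    }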
Thank you, that was very informative. I appreciate your succinct, information dense writing style, and appreciate it in the documentation, too, after reviewing that.
> what makes it different is a new way of programming for the cloud era.
But you aren't even explaining how anything is different from what a normal file system can do, let alone what makes it a "new way of programming for the cloud era".
Sorry, everybody has a different background of knowledge. It is hard to understand where the question is coming from.
They were straightforward questions. The paper you linked talks about blobs as a term for appending to files. Mostly it seems to be about wrapping and replicating XFS.
Is that why you are avoiding talking about specifics? Are you wrapping XFS?
I'm a little confused why people are being so weird with the OP; asking what the difference between a blob and a file is isn't really a question specific to SeaweedFS, lol. Blobs, files, and the other terms are used to describe different layers of data allocation in almost every modern object storage solution.
Blobs are what lie under files: you can have a file split into multiple blobs spread across different drives or different servers, and then put it back together into a file when requested. That's how I understand it at a basic level.
I think they are being weird. According to the Facebook PDF they linked, I think that would fall under chunks, but either way, why would someone advertise a filesystem for 'blob' storage when users don't interact with that? According to the paper, 'blobs' are sent to append to files, but that isn't really 'blob storage'; it's just a different name for an operation that's been done since the 70s: appending to a networked file. No one would say 'this filesystem can store all your file appends', and no one is storing discrete serialized data without a name; once you do that, you have a file.
They also seem like they are being vague and patronizing to avoid admitting that their product is not a unique filesystem, but just something to distribute XFS.
Why does a user need that? Filesystems already break up files into blocks / sectors. Why wouldn't a user just deal with files and let the filesystem handle it? I really don't understand why you aren't eager to explain the differences and what problems are being solved.
> Why does a user need that? Filesystems already break up files into blocks / sectors. Why wouldn't a user just deal with files and let the filesystem handle it?
A blob has its own storage, which can be replicated to other hosts in case the current host is not available. It can scale up independently of the file metadata.
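Concretely, "scales independently" means the blob layer can be queried and replicated on its own: given a file id like "3,01637037d6", any client can ask the master where the replicas of volume 3 currently live, without touching the file metadata layer at all. A minimal sketch (master assumed at localhost:9333):

    package main

    import (
        "fmt"
        "io"
        "log"
        "net/http"
    )

    func main() {
        // the volume id is the part of a file id before the comma
        resp, err := http.Get("http://localhost:9333/dir/lookup?volumeId=3")
        if err != nil {
            log.Fatal(err)
        }
        defer resp.Body.Close()
        body, _ := io.ReadAll(resp.Body)
        // e.g. {"locations":[{"url":"127.0.0.1:8080","publicUrl":"..."}]}
        fmt.Println(string(body))
    }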
First, the feature set you have built is very impressive.
I think SeaweedFS would really benefit from more documentation on what exactly it does.
People who want to deploy production systems need that, and it would also help potential contributors.
Some examples:
* It says "optimised for small files", but it is not super clear from the whitepaper and other documentation what that means. It mostly talks about about how small the per-file overhad is, but that's not enough. For example, on Ceph I can also store 500M files without problem, but then later discover that some operations that happen only infrequently, such as recovery or scrubs, are O(files) and thus have O(files) many seeks, which can mean 2 months of seeks for a recovery of 500M files to finish. ("Recovery" here means when a replica fails and the data is copied to another replica.)
* More on small files: Assuming small files are packed somehow to solve the seek problem, what happens if I delete some files in the middle of the pack? Do I get fragmentation (space wasted by holes)? If yes, is there a defragmentation routine?
* One page https://github.com/seaweedfs/seaweedfs/wiki/Replication#writ... says "volumes are append only", which suggests that there will be fragmentation. But here I need to piece together info from different unrelated pages in order to answer a core question about how SeaweedFS works.
* https://github.com/seaweedfs/seaweedfs/wiki/FAQ#why-files-ar... suggests that "vacuum" is the defragmentation process. It says it triggers automatically when deleted-space overhead reaches 30%. But what performance implications does a vacuum have? Can it take long and block some data access? This would be the immediate next question any operator would have (see also the note after this list).
* Scrubs and integrity: It is common for redundant-storage systems (md-RAID, ZFS, Ceph) to detect and recover from bitrot via checksums and cross-replica comparisons. This requires automatic regular inspections of the stored data ("scrubs"). For SeaweedFS, I can find no docs about it, only some Github issues (https://github.com/seaweedfs/seaweedfs/issues?q=scrub) that suggest that there is some script that runs every 17 minutes. But looking at that script, I can't find which command is doing the "repair" action. Note that just having checksums is not enough for preventing bitrot: It helps detect it, but does not guarantee that the target number of replicas is brought back up (as it may take years until you read some data again). For that, regular scrubs are needed.
* Filers: For a production store of a highly-available POSIX FUSE mount I need to choose a suitable Filer backend. There's a useful page about these at https://github.com/seaweedfs/seaweedfs/wiki/Filer-Stores. But there are many of them, and the information is limited to ~8 words per backend. To know how a backend will perform, I need to know both the backend itself and how SeaweedFS will use it. I will also be subject to the operational workflows of that backend; e.g., running and upgrading a large HA Postgres is unfortunately not easy. As another example, Postgres itself does not scale beyond a single machine unless one uses something like Citus, and I have no info on whether SeaweedFS will work with that.
* The word "Upgrades" seems generally unmentioned in the Wiki and README. How are forward and backward compatibility handled? Can I just switch SeaweedFS versions forward and backward and expect everything to work automatically? For Ceph there are usually detailed instructions on how one should upgrade a large cluster and its clients.
In general the way this should be approached is: Pretend to know nothing about SeaweedFS, and imagine what a user that wants to use it in production wants to know, and what their followup questions would be.
Some parts of that are partially answered in the presentations, but it is difficult to piece together how the software currently works from presentations of different ages (maybe they are already outdated?), and the presentations are also quite light on information (usually only 1 slide per topic). I think the GitHub Wiki is a good way to do it, but it, too, is too light on information, and I'm not sure it has everything that's in the presentations.
I understand the README already says "more tools and documentation", I just want to highlight how important the "what does it do and how does it behave" part of documentation is for software like this.