Thanks for writing OrientDB! - I tried it, but I was pressed for time, so I needed something that more or less worked instantly for my requirements - which in the end was elasticsearch.
TL;DR:
I researched MongoDB and OrientDB for a side project with a fairly heavy data structure (10M+ docs, 800+ fields on two to three levels). MongoDB was blazingly fast, but it segfaulted somewhere in the process (also, index creation needs extra time and isn't really ad hoc). OrientDB wasn't as fast and the initial setup was a little harder, but the insert speed was OK for a while (500k docs or so) and then it degraded. I also looked at CouchDB, but I missed the ad-hoc query infrastructure.
My current solution, which works nicely for the moment, is elasticsearch; it's fast - and it's possible to get a prototype from 0 to 10M docs in about 50 minutes - or less, if you load balance the bulk inserts on a cluster - which is so easy to set up it's scary - and then let a full copy of the data settle on each machine in the background.
Disclaimer - since this is a side project, I did only minimal research on each of the technologies (call it a 5-minute test) and ES clearly won the first round over both MongoDB and OrientDB.
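For anyone wondering what "load balance the bulk inserts" can look like, here is a minimal sketch, not my actual loader. It only builds the newline-delimited JSON body that the `_bulk` endpoint expects (one action line, then one source line per doc) and deals the batches out round-robin. The index name and host names are made up, and older ES versions also want a `_type` in the action line.

```python
import json


def build_bulk_body(docs, index_name):
    """Return an NDJSON payload: one action line plus one source line per doc."""
    lines = []
    for doc_id, doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    # The bulk body has to end with a newline.
    return "\n".join(lines) + "\n"


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def assign_batches(batches, hosts):
    """Spread batches over the given hosts round-robin."""
    plan = {host: [] for host in hosts}
    for i, batch in enumerate(batches):
        plan[hosts[i % len(hosts)]].append(batch)
    return plan


# Hypothetical usage: five tiny docs, batches of two, two nodes.
docs = [(str(i), {"n": i}) for i in range(5)]
plan = assign_batches(list(chunked(docs, 2)), ["node-a:9200", "node-b:9200"])
payloads = {host: [build_bulk_body(b, "prototype") for b in batches]
            for host, batches in plan.items()}
```

Each payload would then be POSTed to `/_bulk` on its host with whatever HTTP client you prefer; that part is left out on purpose.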
I love ES, but I don't really feel comfortable with it as a primary datastore. We tend to use CouchDB to write to, and ES to query against. It all happens automagically with a single shell command.
I won't use ES on its own, because I have experienced situations in the past where the dynamic type mapping gets confused, i.e. the first time it sees a field, it indexes it as an integer, but then one of the later records has 'n/a' instead of a number. The entire record became unqueryable after that, even if it might have stored the original data.
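To make the failure mode concrete, here is a toy model of "first value seen decides the type". This is only an illustration of the behaviour I ran into, not how ES implements mappings; the class and field names are invented.

```python
class ToyIndex:
    """Toy model of first-seen-type mapping; not Elasticsearch's actual code."""

    def __init__(self):
        self.mapping = {}    # field name -> Python type locked in on first sight
        self.indexed = []    # documents that made it into the "index"
        self.rejected = []   # documents whose values clashed with the mapping

    def add(self, doc):
        # Work on a copy so a rejected doc does not leave half-inferred fields.
        inferred = dict(self.mapping)
        for field, value in doc.items():
            expected = inferred.setdefault(field, type(value))
            if not isinstance(value, expected):
                self.rejected.append(doc)
                return False
        self.mapping = inferred
        self.indexed.append(doc)
        return True


idx = ToyIndex()
idx.add({"price": 10})      # "price" is now locked in as int
idx.add({"price": "n/a"})   # clashes with int, so the whole doc is left out
```

If the mapping had been set to `str` for `price` before the first insert, the 'n/a' record would have gone in fine, which is the "mapping by hand" fix.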
You could fix this by creating the mapping by hand, BEFORE any data has been imported, as it can't be modified later. But then you have to maintain a schema just to keep it from 'randomly' ignoring data.
You also can't just tell ES to rebuild an index when you need to mess with the mappings. You have to actually create a new index, change the mappings and then reimport the data into the new index (possibly from the existing index).
It also feels right to me to split storing the data and querying the data between separate applications: their concerns are different enough that being able to scale them out independently is sometimes a boon.
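The "new index, new mapping, reimport" dance looks roughly like this when boiled down to plain dictionaries. Again a sketch under assumptions: the index names, the `CASTS` table and the `reindex` helper are all hypothetical and stand in for real ES calls.

```python
# Hypothetical mapping type names -> Python conversion functions.
CASTS = {"string": str, "integer": int}


def reindex(indices, source, target, new_mapping):
    """Create `target` with `new_mapping` and copy every doc over from `source`."""
    if target in indices:
        raise ValueError("target index already exists: " + target)
    docs = []
    for doc in indices[source]["docs"]:
        converted = {}
        for field, value in doc.items():
            # Fields without an explicit mapping are copied unchanged.
            cast = CASTS[new_mapping[field]] if field in new_mapping else (lambda v: v)
            converted[field] = cast(value)
        docs.append(converted)
    indices[target] = {"mapping": dict(new_mapping), "docs": docs}
    return indices[target]


indices = {
    "items_v1": {"mapping": {"price": "integer"},
                 "docs": [{"price": 10}, {"price": 20}]},
}
# The old index is left untouched; queries would be pointed at items_v2 afterwards.
reindex(indices, "items_v1", "items_v2", {"price": "string"})
```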
Thank you for your input. Had minor issues with dynamic mapping, too - but since the data is more or less just strings, I could circumvent ES' mechanism to infer datatype from value by simply using an empty default-mapping.js. I'll definitely give your approach a try.
I have always been curious about OrientDB, but from what I saw it was very small and not backed by any commercial entity, and its usage was not widespread. Also Luca, you should in fairness write that you are the maintainer.
- Non-counting B-Trees: OrientDB uses an MVRB-Tree that keeps a counter, so size() costs practically nothing
- Poor Memory Management: OrientDB uses MMAP too, but with many settings to optimize its usage
- Uncompressed field names: the same as OrientDB
- Global write lock: this kills your concurrency! OrientDB handles read/write locks at segment level, so it's really multi-threaded under the hood
- Safe off by default: the same as OrientDB (turn on sync to stay safe or use good HW/multiple servers)
- Offline table compaction: OrientDB compacts at each update/delete so the underlying segments are always well defragmented
- Secondaries do not keep hot data in RAM: totally different because OrientDB is multi-master
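To illustrate the first point in the list (a tree that carries a counter), here is a deliberately simple sketch. It is an unbalanced binary search tree, not an MVRB-Tree and not OrientDB code; it only shows why size() is cheap when the count is maintained on insert instead of being computed by walking the tree.

```python
class CountedTree:
    """Unbalanced BST that keeps a running entry count; a toy, not MVRB-Tree."""

    def __init__(self):
        self._root = None   # each node is a list: [key, left, right]
        self._count = 0

    def insert(self, key):
        if self._root is None:
            self._root = [key, None, None]
            self._count += 1
            return
        node = self._root
        while True:
            if key == node[0]:
                return  # duplicate key: nothing added, counter unchanged
            side = 1 if key < node[0] else 2
            if node[side] is None:
                node[side] = [key, None, None]
                self._count += 1
                return
            node = node[side]

    def size(self):
        # Constant time: just read the counter.
        return self._count

    def size_by_walking(self):
        # What a tree without a counter has to do: visit every node.
        total, stack = 0, [self._root]
        while stack:
            node = stack.pop()
            if node is not None:
                total += 1
                stack.extend(node[1:])
        return total
```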
Furthermore, you have Transactions, SQL and support for Graphs. Maybe they could avoid using an RDBMS for some tasks by using OrientDB for everything.
My 0,02.