Hey, by reading all the bad things seems that OrientDB would fit better than Mon...

mtrn · on April 13, 2012

Thanks for writing OrientDB! I tried it, but I was pressed for time, so I needed something that more or less worked instantly for my requirements, which in the end was elasticsearch.

TL;

I researched MongoDB and OrientDB for a side project with a fairly heavy data structure (10M+ docs, 800+ fields on two to three levels). MongoDB was blazingly fast, but it segfaulted somewhere in the process (also, index creation needs extra time and isn't really ad-hoc). OrientDB wasn't as fast and the initial setup was a little harder, but the insert speed was OK for a while (500k docs or so) before it degraded. I also looked at CouchDB, but there I missed an ad-hoc query infrastructure.

My current solution, which works nicely for the moment, is elasticsearch: it's fast, and it's possible to get a prototype from 0 to 10M docs in about 50 minutes, or less if you load balance the bulk inserts across a cluster (which is so easy to set up it's scary) and then let a full copy of the data settle on each machine in the background.
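For anyone wondering what load balancing the bulk inserts might look like, here is a minimal sketch, not the exact script used for this project. It only builds the newline-delimited request bodies that the elasticsearch bulk endpoint expects and assigns each batch to a node round-robin; it does not send anything. The host URLs, the index name `prototype` and the function names are made up for illustration, and releases from the time of this thread also expected a document type alongside the index name.

```python
import itertools
import json


def bulk_body(batch, index):
    """Build one newline-delimited bulk body: an action line, then the document."""
    lines = []
    for doc_id, doc in batch:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def plan_bulk_requests(docs, index, hosts, batch_size):
    """Pair each batch with a host, cycling through the hosts round-robin."""
    host_cycle = itertools.cycle(hosts)
    return [(next(host_cycle), bulk_body(batch, index))
            for batch in chunked(docs, batch_size)]


# Hypothetical data and nodes, for illustration only.
docs = [(str(i), {"title": "doc %d" % i}) for i in range(5)]
hosts = ["http://node1:9200", "http://node2:9200"]
plan = plan_bulk_requests(docs, "prototype", hosts, batch_size=2)
```

Each entry in `plan` is a `(host, body)` pair; the body would be POSTed to that host's `/_bulk` endpoint with whatever HTTP client is at hand.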

Disclaimer: since this is a side project, I did only minimal research on each of the technologies (call it a 5-minute test), and ES clearly won the first round over both MongoDB and OrientDB.

AdrianRossouw · on April 14, 2012

I love ES, but I don't really feel comfortable with it as a primary datastore. We tend to use CouchDB to write to, and ES to query against. It all happens automagically with a single shell command.

I won't use ES on its own, because I have experienced situations in the past where the dynamic type mapping gets confused, i.e. the first time it sees a field it indexes it as an integer, but then one of the later records has 'n/a' instead of a number. The entire record became unqueryable after that, even if the original data might still have been stored.
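To make the failure mode concrete, here is a toy model in plain Python. It is not ES code and does not reproduce its real coercion rules; it only imitates the "first value seen decides the field type" behaviour described above. The class and function names are invented for this sketch.

```python
def infer_type(value):
    """Guess a field type from a value, the way a dynamic mapping might."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    return "string"


class ToyIndex:
    """Toy stand-in for an index with first-seen-wins dynamic mapping."""

    def __init__(self):
        self.mapping = {}   # field name -> type fixed by the first document
        self.indexed = []   # documents that can be queried
        self.rejected = []  # documents that conflicted with the mapping

    def add(self, doc):
        new_fields = {}
        for field, value in doc.items():
            seen = infer_type(value)
            expected = self.mapping.get(field)
            if expected is None:
                new_fields[field] = seen
            elif expected != seen:
                # One conflicting field makes the whole record unqueryable.
                self.rejected.append(doc)
                return False
        self.mapping.update(new_fields)
        self.indexed.append(doc)
        return True


index = ToyIndex()
first_ok = index.add({"price": 10})      # fixes "price" as integer
second_ok = index.add({"price": "n/a"})  # conflicts with the integer mapping
```

In this model the second record ends up in `rejected` even though nothing is wrong with it from the application's point of view, which is the 'random' data loss I mean.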

You could fix this by creating the mapping by hand, BEFORE any data has been imported, as the type of an existing field can't be changed later. But then you have to maintain a schema just to keep it from 'randomly' ignoring data.
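A hand-written mapping could be generated along these lines. This is a sketch under assumptions: the body layout (`mappings` / `properties`) and the `keyword` type follow recent ES releases, while releases from the time of this thread used a different layout and type names, so check the docs for your version. The field names are hypothetical.

```python
import json


def string_mapping(field_names, field_type="keyword"):
    """Declare every listed field with one explicit type, so nothing is inferred."""
    properties = {name: {"type": field_type} for name in field_names}
    return {"mappings": {"properties": properties}}


# Hypothetical fields; "price" is declared as a string so 'n/a' is accepted.
mapping = string_mapping(["price", "title"])
mapping_json = json.dumps(mapping, indent=2, sort_keys=True)
```

The resulting JSON would be sent as the body of the request that creates the index, before the first document is imported.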

You also can't just tell ES to rebuild an index when you need to mess with the mappings; you have to actually create a new index, change the mappings and then reimport the data into the new index (possibly from the existing index).
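The create-then-reimport dance can be written down as a small request plan. This sketch only describes the two HTTP requests and sends nothing. The `_reindex` endpoint it refers to was added to ES in releases well after this thread; at the time, the copy step meant reading the documents out and bulk-inserting them again. The index names and the mapping are hypothetical.

```python
def reindex_plan(old_index, new_index, new_mapping):
    """Return (method, path, body) tuples for rebuilding an index with new mappings."""
    return [
        # Step 1: create the new index with the corrected mapping.
        ("PUT", "/" + new_index, new_mapping),
        # Step 2: copy every document from the old index into the new one.
        ("POST", "/_reindex", {
            "source": {"index": old_index},
            "dest": {"index": new_index},
        }),
    ]


# Hypothetical names, for illustration only.
steps = reindex_plan(
    "docs_v1",
    "docs_v2",
    {"mappings": {"properties": {"price": {"type": "keyword"}}}},
)
```

Clients then have to be pointed at the new index name, which is one more reason I prefer keeping the source of truth somewhere else.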

It also feels right to me to split storing the data and querying the data between separate applications, because their concerns are different enough that being able to scale them out differently is sometimes a boon.

mtrn · on April 14, 2012

Thank you for your input. I had minor issues with dynamic mapping, too, but since the data is more or less just strings, I could circumvent ES's mechanism for inferring the datatype from the value by simply using an empty default-mapping.js. I'll definitely give your approach a try.

amalag · on April 14, 2012

I have always been curious about OrientDB, but from what I saw it was very small, not backed by any commercial entity, and its usage was not widespread. Also, Luca, in fairness you should mention that you are the maintainer.