Differences in Map/Reduce between RavenDB & MongoDB

time to read 6 min | 1119 words

Ben Foster has a really cool article showing some of the similarities and differences between MongoDB & RavenDB with regards to their map/reduce implementation.

However, there is a very important distinction that was missed. Map/reduce operations are run online in MongoDB, that means that for large collections, map/reduce is going to be very expensive. MongoDB has the option of taking the result of a map/reduce operation and writing it to a collection, so you don’t need to run map/reduce jobs all the time. However, that is a snapshot view of the data, not a live view. Ben mentioned that you can do something called incremental map/reduce, but that isn’t actually really good idea at all.

Let us look at the following sequence of operations:

   1: db.items.insert({name: 'oren', ts: 1 });
   2: db.items.insert({name: 'ayende', ts: 2});
   3:  
   4: var map = function Map() { emit(this.name,null); };
   5: var reduce = function(key, val) { return key; };
   6:  
   7: db.items.mapReduce(map,reduce, { out: 'distinct_item_names' });

This creates two items, and give me the distinct names in a separate collection. Now, let us see how that works with updates…

   1: db.items.insert({name: 'eini', ts: 3 });
   2:  
   3: db.items.mapReduce(map,reduce, { out: {reduce: 'distinct_item_names'}, query: {ts: {$gt: 2} } });

This is actually nice, mongo is able to merge the previous results with the new results, so you only have to do the work on the new data. But this has several implications:

  • You have to kick something like ‘ts’ property around to check for new stuff. And you have to _udpate_ that ts property on every update.
  • You have to run this on a regular basis yourself, mongo won’t do that for you.
  • It can’t work with deletes.

It is the last part that is really painful:

   1: db.items.remove({name: 'oren'});

Now, there is just no way for you to construct a map/reduce job that would remove the name when it is gone.

This sort of thing works very nicely when what you want is to just append stuff. That is easy. It is PITA when we are talking about actually using it for live data, that can change and be modified.

Contrast that with the map/reduce implementation in RavenDB:

  • No need to manually maintain state, the database does it for you.
  • No need to worry about updates & deletes, the database does it for you.
  • No need to schedule map/reduce job updates, database does it for you.
  • Map/reduce queries are very fast, regardless of data size.

To be frank, the map/reduce implementation in RavenDB is complex, and pretty much all of it comes down to the fact that we don’t do stupid stuff like run a map/reduce operation on a large database on every query, and that we support edge cases scenarios like data that is actually updated or deleted.

Naturally I’m biased, but it seems to me that trying to use map/reduce in Mongo just means that you have to do a lot of hand holding yourself, while with RavenDB, we take care of everything and leaving you to actually do stuff.