Saturday, April 3, 2010

MongoDB Sequences

I came across an issue today with MongoDB, the first one where SQL would have had a simple answer - sequences.

If you're familiar with SQL you know that this is very simple to do by declaring an auto-incrementing column in a table. MongoDB has no built-in capability for this (as of 1.4.0, which I'm using now; I wouldn't be surprised to see some built-in sequence capability in a future release).

MongoDB does have one construct that guarantees order, the "capped" collection, which preserves insertion order, but it's not really meant for this purpose.

I found some hints online about how this could be done. Since none of those were satisfying and I ended up coming up with my own relatively simple way, I thought I'd share it as food for thought.

The approach is to save a JavaScript function to the database that can be called from the client to do our bidding. The client calls db.eval() to invoke this function, which inserts the object for us.

To set this up, I created a collection named "sequences" where the "_id" of each entry is the name of another collection in the database that I want to be sequenced. The collection just needs to have an entry with an initial value for the sequence (take your pick) before the insert function is ever called.

For instance, if I had a collection named "foo" I would start with an entry like:
{_id: "foo", seq: 1}
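
A minimal sketch of seeding that entry, assuming a mongo-shell-style db object is passed in. The helper name seedSequence and its start parameter are my own, for illustration only. It skips the write when the entry already exists, so running it again doesn't reset a sequence that is already in use.

```javascript
// Hypothetical helper (not part of MongoDB): create the sequence entry for
// `coll` unless one already exists. `db` is assumed to expose a `sequences`
// collection with mongo-shell-style findOne() and save().
function seedSequence(db, coll, start) {
  var existing = db.sequences.findOne({_id: coll});
  if (!existing) {
    db.sequences.save({_id: coll, seq: start});
    return true;   // entry was created
  }
  return false;    // entry already there; left untouched
}
```

In the shell this would be called as seedSequence(db, "foo", 1). The check and the save are two separate steps, so this is meant for one-time setup, not for concurrent use.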

To insert an object into the database I invoke db.eval(), passing the function the name of the collection to insert into and the object to be inserted.

The function that does the insert is:

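function(coll, obj) {
    // Fetch the counter document for this collection and bump it.
    var s = db.sequences.findOne({_id: coll});
    s.seq++;
    db.sequences.save(s);
    // Stamp the object with the new sequence value and insert it.
    obj['seq'] = s.seq;
    db[coll].insert(obj);
    // Report the allocated sequence along with the outcome of the insert.
    return {'seq': s.seq,
            'error_status': db.runCommand("getlasterror")};
}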

I'm returning the sequence value that was allocated (in my use case I keep track of it when the insert succeeds) along with the error information for the insert. That way, if the insert fails for some reason (like a unique index violation), I still find out about it.
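
On the client side, checking that return value could look something like the sketch below. It assumes the getlasterror document signals failure through a non-null err field. The helper name seqOrThrow is hypothetical.

```javascript
// Hypothetical client-side helper: take the object returned by the insert
// function and hand back the allocated sequence, or throw if the insert failed.
// Assumes `error_status` is a getlasterror document whose `err` field is
// null (or absent) when the last operation succeeded.
function seqOrThrow(result) {
  var status = result.error_status || {};
  if (status.err) {
    throw new Error("insert failed: " + status.err);
  }
  return result.seq;
}
```

Note that a failed insert still uses up a sequence number, since the function saves the incremented counter before it attempts the insert.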

Some caveats are probably in order.

I'm not in a sharded environment (yet), and when I am I suspect I will have to revisit this.

This also isn't the most efficient approach where high performance matters, because db.eval() monopolizes mongod while the function runs, so depending on the database usage pattern this could be pretty disruptive. On the other hand, this mongod behavior effectively acts as a lock, which means each call to this function is atomic. I'm going to wait and see whether this is actually a performance issue in my environment before implementing an approach that brings more complexity into the application.
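
To see why that atomicity matters, here is a small in-memory simulation of two clients running the read, increment, save steps. It is plain JavaScript with no MongoDB involved, and all the names are made up for illustration. When the steps interleave, both clients end up with the same number. When they run back to back, which is what the db.eval() behavior forces, they don't.

```javascript
// In-memory stand-in for the sequences entry; not a real MongoDB collection.
function makeCounter(start) {
  var doc = {seq: start};
  return {
    read: function () { return {seq: doc.seq}; },  // returns a copy, like findOne()
    save: function (s) { doc.seq = s.seq; }
  };
}

// Each client's steps finish before the next client starts (what the lock gives us).
function runSerialized(counter) {
  var results = [];
  for (var i = 0; i < 2; i++) {
    var s = counter.read();
    s.seq++;
    counter.save(s);
    results.push(s.seq);
  }
  return results;
}

// Both clients read before either one saves (what could happen without the lock).
function runInterleaved(counter) {
  var a = counter.read();
  var b = counter.read();
  a.seq++;
  b.seq++;
  counter.save(a);
  counter.save(b);
  return [a.seq, b.seq];
}

console.log(runSerialized(makeCounter(1)));   // [ 2, 3 ]
console.log(runInterleaved(makeCounter(1)));  // [ 2, 2 ]
```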

Whatever the case, I thought this was an interesting way to solve the problem, and one worth thinking about, as it's a pretty straightforward analog to the SQL sequence functionality.

By the time I need to do it differently, maybe there will be a native way to do so in MongoDB.
