How does MongoDB handle a large array field?

By : Jaepil
Source: Stackoverflow.com
Question!

I'm trying to store a list of ObjectIds in a document as an array field.

I understand MongoDB has a 4 MB size limit for a single document. So, considering that an ObjectId is 12 bytes long, a document should be able to hold more than 300,000 entries in one array field. (Let me know if the calculation is off.)
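As a rough check of that estimate, the sketch below adds the BSON encoding overhead that the 12-byte figure leaves out. BSON stores an array as a document whose keys are the decimal indexes "0", "1", and so on, so each entry costs one type byte, the index string with its terminating null, and the 12-byte ObjectId. The function names are made up for illustration, the 4 MB limit is the one quoted above, and the document's other fields are ignored.

```javascript
// Size in bytes of a BSON array holding `count` ObjectIds.
function bsonObjectIdArraySize(count) {
  let size = 4 + 1; // int32 length prefix + trailing null byte of the array document
  for (let i = 0; i < count; i++) {
    const keyBytes = String(i).length + 1; // index as a C string, e.g. "42\0"
    size += 1 + keyBytes + 12;             // type byte + key + ObjectId value
  }
  return size;
}

// Largest number of ObjectIds whose array encoding still fits in `limitBytes`.
function maxObjectIdsWithin(limitBytes) {
  let size = 4 + 1;
  let count = 0;
  while (true) {
    const next = 1 + String(count).length + 1 + 12; // cost of the next element
    if (size + next > limitBytes) return count;
    size += next;
    count++;
  }
}

const limit = 4 * 1024 * 1024; // the 4 MB limit mentioned in the question
console.log(maxObjectIdsWithin(limit));
```

Under these assumptions the array tops out at a little over 200,000 entries rather than 300,000, because most entries cost about 20 bytes instead of 12.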

If the number of entries in the array gets close to that limit, what kind of performance can I expect? Especially when the field is indexed? Any memory issues?


Typical queries would look like below:

Query by a single value

db.myCollection.find(
  {
    myObjectIds: ObjectId('47cc67093475061e3d95369d')
  }
);

Query by multiple values

db.myCollection.find(
  {
    myObjectIds: {$in: [ObjectId('47cc67093475061e3d95369d'), ...]}
  }
);

Add a new value to multiple documents

db.myCollection.update(
  {
    _id: {$in: [ObjectId('56cc67093475061e3d95369d'), ...]}
  },
  {
    $addToSet: {myObjectIds: ObjectId('69cc67093475061e3d95369d')}
  },
  // without multi: true, update() changes only the first matching document
  {multi: true}
);




Answers

You won't notice when you hit the document size limit unless you call getLastError after each update. The update simply fails, and a message is written to the database log. I have anecdotal evidence from my local ops guy that MongoDB seems to work harder when a lot of updates fail because the document size limit has been reached.

I know of no easy way of avoiding it, other than designing around it. As far as I know there is no way to conditionally push to a list. I've seen other questions here on SO where people have been trying to build fixed size lists and such, but no good solutions have been found.

By : Theo


TBH, I think the best thing you can do is to benchmark it. Create some dummy data, and test the performance as you increase the number of items in the array. It may be quicker to knock up a test in your environment than to wait for an answer here.

It is one thing on my TODO list to investigate and blog about, but I haven't got round to it yet. If you do, I'd definitely be interested to see what your findings are! Likewise, if I get round to it soon I will post the results here too.
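A minimal sketch of what such a benchmark could look like, assuming it is run from the mongo shell. The helper names (`buildTestDoc`, `timeFind`) and the `benchArrays` collection are hypothetical. The id factory and the collection are passed in as parameters so the helpers do not depend on shell globals.

```javascript
// Build a test document whose array field holds `arraySize` ids.
// `makeId` is supplied by the caller, e.g. function () { return new ObjectId(); } in the shell.
function buildTestDoc(arraySize, makeId) {
  var ids = [];
  for (var i = 0; i < arraySize; i++) {
    ids.push(makeId());
  }
  return { myObjectIds: ids };
}

// Run one query against `collection` and report how many documents matched
// and how many milliseconds the round trip took.
function timeFind(collection, query) {
  var start = Date.now();
  var matched = collection.find(query).toArray().length;
  return { matched: matched, ms: Date.now() - start };
}

// Example run inside the mongo shell (needs a live server, so it is left commented out):
// [1000, 10000, 100000].forEach(function (n) {
//   var doc = buildTestDoc(n, function () { return new ObjectId(); });
//   db.benchArrays.insert(doc);
//   print(n + " entries: " + tojson(timeFind(db.benchArrays, { myObjectIds: doc.myObjectIds[0] })));
// });
```

Repeating the run with and without an index on `myObjectIds` would show how much the index changes query and insert times as the array grows.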

By : AdaTheDev


Since the release of MongoDB 2.4 you can use capped arrays. When pushing to the array, you can tell MongoDB to $sort and $slice it so that it stays at a fixed length based on your criteria (if you don't mind throwing data away). For example, you could use this to keep only the most recent N entries of a data log.
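To illustrate the effect, the sketch below emulates on a plain JavaScript array what a `$push` with `$each`, `$sort` and `$slice` does to an array field. The function `pushCapped`, the `logs` collection and the `entries` and `ts` fields are hypothetical names. The matching shell update is shown as a comment because it needs a running server.

```javascript
// Emulates $push with $each + $sort + $slice on a plain array of documents.
function pushCapped(existing, newItems, sortField, keepLast) {
  const merged = existing.concat(newItems);           // like $each: append all new items
  merged.sort((a, b) => a[sortField] - b[sortField]); // like $sort: {<sortField>: 1}, numeric fields only
  return keepLast === 0 ? [] : merged.slice(-keepLast); // like $slice: -keepLast
}

// The corresponding shell update (hypothetical collection and field names):
// db.logs.update(
//   { _id: someId },
//   { $push: { entries: { $each: [{ ts: 4 }], $sort: { ts: 1 }, $slice: -3 } } }
// );

const log = [{ ts: 1 }, { ts: 2 }, { ts: 3 }];
console.log(pushCapped(log, [{ ts: 4 }], "ts", 3));
```

With a cap of 3, pushing the entry with `ts: 4` drops the oldest entry (`ts: 1`) and keeps the three most recent ones.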


