mongodb - retrieve array subset

Question!

What seemed like a simple task turned out to be a challenge for me.

I have the following mongodb structure:

{
(...)
"services": {
    "TCP80": {
      "data": [{
          "status": 1,
          "delay": 3.87,
          "ts": 1308056460
        },{
          "status": 1,
          "delay": 2.83,
          "ts": 1308058080
        },{
          "status": 1,
          "delay": 5.77,
          "ts": 1308060720
        }]
    }
}}

Now, the following query returns the whole document:

{ 'services.TCP80.data.ts':{$gt:1308067020} }

I wonder: is it possible to receive only those "data" array entries that match the $gt criterion (a kind of trimmed-down document)?

I was considering MapReduce, but could not find a single example of how to pass external arguments (the timestamp) to the map() function. (This feature was added in 1.1.4: https://jira.mongodb.org/browse/SERVER-401)

There is also the alternative of writing a stored JavaScript function, but since we are talking about large quantities of data, database locks can't be tolerated here.

Most likely I'll have to redesign the structure into something one level deep, like:

{
   status:1,delay:3.87,ts:1308056460,service:TCP80
},{
   status:1,delay:2.83,ts:1308058080,service:TCP80
},{
   status:1,delay:5.77,ts:1308060720,service:TCP80
}

but the DB will grow dramatically, since "service" is only one of many fields that would have to be appended to each document.

Please advise!

Thanks in advance.



Answers

I'm attempting to do something similar. I tried your suggestion of using the GROUP function, but I couldn't keep the embedded documents separate or was doing something incorrectly.

I needed to pull/get a subset of embedded documents by ID. Here's how I did it using Map/Reduce:

db.parent.mapReduce(
  function(parent_id, child_ids){
    if(this._id == parent_id) 
      emit(this._id, {children: this.children, ids: child_ids})
  }, 
  function(key, values){
    var toReturn = [];

    values[0].children.forEach(function(child){
      if(values[0].ids.indexOf(child._id.toString()) != -1)
        toReturn.push(child);
    });
    return {children: toReturn};
  }, 
  { 
     mapparams: [
       "4d93b112c68c993eae000001", //example parent id
       ["4d97963ec68c99528d000007", "4debbfd5c68c991bba000014"] //example embedded children ids
     ]
  }
).find()

I've abstracted my collection name to 'parent' and its embedded documents to 'children'. I pass in two parameters: the parent document ID and an array of the embedded document IDs that I want to retrieve from the parent. Those parameters are passed in through the mapparams option in the third argument to the mapReduce function.

In the map function I find the parent document in the collection (which I'm pretty sure uses the _id index) and emit its id and children to the reduce function.

In the reduce function, I take the passed in document and loop through each of the children, collecting the ones with the desired ID. Looping through all the children is not ideal, but I don't know of another way to find by ID on an embedded document.
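
The child-collecting loop described above can be tried outside the database. This is only a minimal sketch in plain JavaScript: pickChildren and the sample data are hypothetical, and it assumes each child's _id converts to its hex ID string (plain strings stand in for ObjectIds here).

```javascript
// Hypothetical helper: keep only the embedded children whose _id is in wantedIds.
// Assumes String(child._id) yields the hex ID; plain strings stand in for ObjectIds.
function pickChildren(children, wantedIds) {
  return children.filter(function (child) {
    return wantedIds.indexOf(String(child._id)) !== -1;
  });
}

// Made-up sample data, not taken from a real collection.
var children = [
  { _id: "4d97963ec68c99528d000007", name: "a" },
  { _id: "4d97963ec68c99528d000008", name: "b" },
  { _id: "4debbfd5c68c991bba000014", name: "c" }
];

var picked = pickChildren(children, [
  "4d97963ec68c99528d000007",
  "4debbfd5c68c991bba000014"
]);
```

The scan is still linear in the number of children, which matches the limitation noted above.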

I also assume in the reduce function that only one document is emitted, since I'm searching by ID. If you expect more than one parent_id to match, then you will have to loop through the values array in the reduce function.

I hope this helps someone out there, as I googled everywhere with no results. Hopefully we'll see a built in feature soon from MongoDB, but until then I have to use this.

By : Fadi


Fadi, as for "keeping embedded documents separate" - group should handle this with no issues

function getServiceData(collection, criteria) {

    var res=db[collection].group({
        cond: criteria,
        initial: {vals:[],globalVar:0},
        reduce: function(doc, out) {
            // keep every second matching document; the path below is a placeholder
            if (out.globalVar%2==0)
                out.vals.push(doc.whatever.kind.and.depth);
            out.globalVar++;
        },
        finalize: function(out) {
            if (out.vals.length==0)
                out.vals='sorry, no data';
            return out.vals;
        }
    });

    return res[0];
};


This is not currently supported. By default you will always receive the whole document/array unless you use field restrictions or the $slice operator. Currently these tools do not allow filtering the array elements based on the search criteria.
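
One workaround in the meantime is to fetch the whole document and filter the array in application code. The following is a minimal sketch in plain JavaScript, not a server-side solution; filterServiceData is a hypothetical helper, and the sample document simply mirrors the structure from the question.

```javascript
// Hypothetical helper: return only the entries of the service's "data" array
// whose ts is greater than minTs. The document itself is still fetched whole.
function filterServiceData(doc, serviceName, minTs) {
  var service = (doc.services || {})[serviceName];
  if (!service || !Array.isArray(service.data)) {
    return []; // unknown service or no data array
  }
  return service.data.filter(function (entry) {
    return entry.ts > minTs;
  });
}

// Sample document shaped like the one in the question.
var doc = { services: { TCP80: { data: [
  { status: 1, delay: 3.87, ts: 1308056460 },
  { status: 1, delay: 2.83, ts: 1308058080 },
  { status: 1, delay: 5.77, ts: 1308060720 }
] } } };

var recent = filterServiceData(doc, "TCP80", 1308058000);
```

This keeps the query simple, at the cost of transferring the full array over the wire for every matching document.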

You should watch this request for a way to do this: https://jira.mongodb.org/browse/SERVER-828
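
For readers on later releases: MongoDB 3.2 and newer provide a $filter operator in the aggregation framework, which can do this kind of array filtering on the server. The sketch below only builds such a pipeline as a plain JavaScript array; buildFilterPipeline is a hypothetical helper, the output field name "data" is an arbitrary choice, and the pipeline has not been run against the asker's collection.

```javascript
// Hypothetical helper: build an aggregation pipeline that returns only the
// "data" entries with ts greater than minTs for one service.
// Assumes MongoDB 3.2 or newer, where the $filter operator is available.
function buildFilterPipeline(serviceName, minTs) {
  var path = "services." + serviceName + ".data";
  var match = {};
  match[path + ".ts"] = { $gt: minTs }; // skip documents with no matching entry

  return [
    { $match: match },
    { $project: {
        data: {
          $filter: {
            input: "$" + path,
            as: "entry",
            cond: { $gt: ["$$entry.ts", minTs] }
          }
        }
    } }
  ];
}

var pipeline = buildFilterPipeline("TCP80", 1308067020);
```

In the shell, the resulting array would be passed to the aggregate() method of the collection in question.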


