Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Split array-like string values in MongoDB aggregation

A field in my data is stored as a string instead of a real nested array:

{
  "uuid": "yxcvb", 
  "product": "[{\"pid\":\"4f76c06825aa486db9a7a5cedbc2ea19\",\"size\":\"S\"},{\"pid\":\"d3055e0a1bb040dba65674b9733bf4ce\",\"size\":\"M\"}]",
  "timestamp":{"$date":"2023-01-01T11:25:47+00:00"}
},
{
  "uuid": "asdfg", 
  "product": "[{\"pid\":\"369b33cc768d496193b18d657d706920\",\"size\":\"M\"}]",
  "timestamp":{"$date":"2023-01-02T11:25:47+00:00"}
},
{
  "uuid": "qwert", 
  "product": "[{\"pid\":\"13325cb71fd7413c8f3e18caf20b5d6d\",\"size\":\"S\"},{\"pid\":\"ac431397d49d449cab44d98dc13ec57c\",\"size\":\"M\"},{\"pid\":\"ba35b9f804a44e15bf197d3ef671dc34\",\"size\":\"L\"}]",
  "timestamp":{"$date":"2023-01-03T11:25:47+00:00"}
}

/// "product" has a maximum of 4 dict-like pid-size units

I would like to know the count of each size and group by timestamp. Expected output:

[
  {
    "week": 52,
    "sizeS": 1
  },
  {
    "week": 52,
    "sizeM": 1
  },
  {
    "week": 1,
    "sizeS": 1
  },
  {
    "week": 1,
    "sizeM": 2
  },
  {
    "week": 1,
    "sizeL": 1
  }
]

To realize this, the following problems need to be solved:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

  1. Turn the array-like string into a real string, then I could use $unwind.
  2. If 1 cannot be achieved, extract all pid and size values in a document, and save only the value after the colon (e.g. "4f76c06825aa486db9a7a5cedbc2ea19", "M").
  3. After 2, count total records of each size within the grouped timeframe (week).

>Solution :

You can use $function to parse the json strings and use it as normal mongodb fields.

db.collection.aggregate([
  {
    "$set": {
      "product": {
        "$function": {
          "body": "function(product) {return JSON.parse(product)}",
          "args": [
            "$product"
          ],
          "lang": "js"
        }
      }
    }
  },
  {
    "$unwind": "$product"
  },
  {
    $group: {
      _id: {
        week: {
          "$isoWeek": "$timestamp"
        },
        size: "$product.size"
      },
      count: {
        $sum: 1
      }
    }
  },
  {
    $sort: {
      "_id.week": -1
    }
  }
])

Mongo Playground

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading