Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How do I summarize tags by category in mongodb

I have a collection that is shaped like this:

[
    {
        _id: ObjectId("5d8e8c9b8f8b9b7b7a8b4567"),
        tags: {
            language: [ 'en' ],
            industries: [ 'agency', 'travel' ],
            countries: [ 'ca', 'us' ],
            regions: [ 'north-america' ],
        }
    },
    {
        _id: ObjectId("5d8e8c9b8f8b9b7b7a8b4568"),
        tags: {
            language: [ 'en', 'fr' ],
            industries: [ 'travel' ],
            countries: [ 'ca' ]
        }
    },
    {
        _id: ObjectId("5d8e8c9b8f8b9b7b7a8b4569"),
        tags: {
            language: [ 'en' ],
            industries: [ 'agency', 'travel' ],
            countries: [ 'ca', 'us' ],
            regions: [ 'south-america' ]
        }
    },
]

and I would like to generate this as a result…

{
    //* count of all documents
    "count": 3,
    //* count of all documents that contain any slug within the given category
    "countWithCategorySlug": {
        "language": 3,
        "industries": 3,
        "countries": 3,
        "regions": 2
    },
    //* per category: count of documents that contain that slug in the givin category
    "language" {
        "en": 3,
        "fr": 1
    },
    "industries" {
        "agency": 2,
        "travel": 3,
    },
    "countries" {
        "ca": 3,
        "us": 2
    },
    "regions" {
        "north-america": 1,
        "south-america": 1
    }
}

super stuck so any help would be appreciated. 🙂

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

The number of categories is unknown and I have a code solution that queries the list of disctint categories and slugs then for each one generates a $group stage… The resultant query is excessively big and there needs to be a better way… problem is that I have absolutely no idea on how to optimize it…

>Solution :

Query

  • the first part before the facet is done to seperate them and make for each value 1 document like
  [{
  "type": "language",
  "value": "en",
  "_id": ObjectId("5d8e8c9b8f8b9b7b7a8b4567")
},
{
  "type": "industries",
  "value": "agency",
  "_id": ObjectId("5d8e8c9b8f8b9b7b7a8b4567")
},
{
  "type": "industries",
  "value": "travel",
  "_id": ObjectId("5d8e8c9b8f8b9b7b7a8b4567")
},
{
  "type": "countries",
  "value": "ca",
  "_id": ObjectId("5d8e8c9b8f8b9b7b7a8b4567")
}]
  • and then facet with 3 fields and count the documents
  • and after than transformations to have data on keys like the expected output

Playmongo

ggregate(
[{"$set": {"tags": {"$objectToArray": "$tags"}}},
 {"$set": 
   {"tags": 
     {"$map": 
       {"input": "$tags",
        "in": {"type": "$$this.k", "value": "$$this.v", "_id": "$_id"}}}}},
 {"$unwind": "$tags"},
 {"$replaceRoot": {"newRoot": "$tags"}},
 {"$unwind": "$value"},
 {"$facet": 
   {"count": 
     [{"$group": {"_id": null, "count": {"$addToSet": "$_id"}}},
       {"$set": {"count": {"$size": "$count"}}}],
    "category": 
     [{"$group": {"_id": "$type", "count": {"$addToSet": "$_id"}}},
       {"$set": {"count": {"$size": "$count"}}}],
    "values": 
     [{"$group": 
         {"_id": "$value",
          "type": {"$first": "$type"},
          "values": {"$addToSet": "$_id"}}},
       {"$set": {"values": {"$size": "$values"}}},
       {"$group": 
         {"_id": "$type",
          "values": 
           {"$push": 
             {"type": "$type", "value": "$_id", "count": "$values"}}}}]}},
 {"$set": 
   {"count": 
     {"$getField": 
       {"field": "count", "input": {"$arrayElemAt": ["$count", 0]}}},
    "category": 
     {"$arrayToObject": 
       [{"$map": 
           {"input": "$category",
            "in": {"k": "$$this._id", "v": "$$this.count"}}}]},
    "values": 
     {"$arrayToObject": 
       [{"$map": 
           {"input": "$values",
            "in": 
             {"k": "$$this._id",
              "v": 
               {"$arrayToObject": 
                 [{"$map": 
                     {"input": "$$this.values",
                      "in": {"k": "$$this.value", "v": "$$this.count"}}}]}}}}]}}}])

Outputs

[{
  "count": 3,
  "category": {
    "countries": 3,
    "industries": 3,
    "regions": 2,
    "language": 3
  },
  "values": {
    "regions": {
      "south-america": 1,
      "north-america": 1
    },
    "countries": {
      "us": 2,
      "ca": 3
    },
    "language": {
      "fr": 1,
      "en": 3
    },
    "industries": {
      "agency": 2,
      "travel": 3
    }
  }
}]
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading