I have a document structure like this:
[
{
"country": "UK",
"shops": [
{"city": "London", "fruits": ["banana", "apple"]},
{"city": "Birmingham", "fruits": ["banana", "pineapple"]},
],
},
{
"country": "DE",
"shops": [
{"city": "Munich", "fruits": ["banana", "strawberry"]},
{"city": "Berlin", "fruits": ["kiwi", "pineapple"]},
],
},
]
In my Python script, I have a dict that assigns fruits to category values:
categories = {
1: ["apple"],
2: ["banana", "kiwi"],
3: ["pineapple", "strawberry"]
}
DESIRED OUTPUT:
Using the MongoDB aggregation framework, I would like to add a max_category field to each document. It should hold the highest category value among all fruits found in the document's shops sub-documents, according to the category mapping.
[
{
"country": "UK",
"shops": [
{"city": "London", "fruits": ["banana", "apple"]},
{"city": "Birmingham", "fruits": ["banana", "pineapple"]},
],
"max_category": 3
},
{
"country": "DE",
"shops": [
{"city": "Munich", "fruits": ["banana", "apple"]},
{"city": "Berlin", "fruits": ["kiwi", "apple"]},
],
"max_category": 2
},
]
Thanks for your help!
>Solution :
Your category dict cannot be passed to MongoDB as it is, because MongoDB requires object keys to be strings and your keys are integers. You can wrangle the dict into the form below for easier processing:
[
  {
    category: 1,
    fruits: ["apple"]
  },
  {
    category: 2,
    fruits: ["banana", "kiwi"]
  },
  {
    category: 3,
    fruits: ["pineapple", "strawberry"]
  }
]
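The wrangling itself can be done in the Python script before the pipeline is built. Below is a minimal sketch, assuming the `categories` dict from the question; the name `category_docs` is only chosen here for illustration.

```python
# The mapping from the question: integer category -> list of fruits.
categories = {
    1: ["apple"],
    2: ["banana", "kiwi"],
    3: ["pineapple", "strawberry"],
}

# Turn {category: [fruits]} into a list of sub-documents that only use
# string keys, so the list can be embedded in an aggregation stage.
category_docs = [
    {"category": category, "fruits": fruits}
    for category, fruits in categories.items()
]
```

The category number moves from the key position into a value, so no integer keys are left.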
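In the aggregation pipeline, $unwind the shops, then use $reduce to iterate over the wrangled category array. The accumulator is updated only when a category is greater than the current value and shares at least one fruit with the shop, so it ends up holding the max matched category for that shop. Finally, $group the shops back into one document per _id and take the $max across them.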
db.collection.aggregate([
  {
    "$unwind": "$shops"
  },
  {
    "$set": {
      "max_category": {
        "$reduce": {
          "input": [
            { "category": 1, "fruits": ["apple"] },
            { "category": 2, "fruits": ["banana", "kiwi"] },
            { "category": 3, "fruits": ["pineapple", "strawberry"] }
          ],
          "initialValue": null,
          "in": {
            "$cond": {
              "if": {
                "$and": [
                  { "$gt": ["$$this.category", "$$value"] },
                  {
                    "$gt": [
                      { "$size": { "$setIntersection": ["$$this.fruits", "$shops.fruits"] } },
                      0
                    ]
                  }
                ]
              },
              "then": "$$this.category",
              "else": "$$value"
            }
          }
        }
      }
    }
  },
  {
    "$group": {
      "_id": "$_id",
      "country": { "$first": "$country" },
      "max_category": { "$max": "$max_category" },
      "shops": { "$push": "$shops" }
    }
  }
])
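Since the mapping lives in your Python script, you probably want to generate the pipeline from the dict instead of hard-coding the categories. The following is a rough sketch under that assumption, not a definitive implementation: `build_pipeline` and `reference_max_category` are hypothetical helper names, and the second one is only a plain-Python mirror of the same logic for sanity-checking results locally.

```python
def build_pipeline(categories):
    """Build the aggregation pipeline from a {category: [fruits]} dict."""
    # Wrangle the dict into sub-documents with string keys only.
    category_docs = [
        {"category": category, "fruits": fruits}
        for category, fruits in categories.items()
    ]
    # A category wins if it beats the accumulator and shares a fruit with the shop.
    matches_and_is_larger = {"$and": [
        {"$gt": ["$$this.category", "$$value"]},
        {"$gt": [
            {"$size": {"$setIntersection": ["$$this.fruits", "$shops.fruits"]}},
            0,
        ]},
    ]}
    return [
        {"$unwind": "$shops"},
        {"$set": {"max_category": {"$reduce": {
            "input": category_docs,
            "initialValue": None,
            "in": {"$cond": {
                "if": matches_and_is_larger,
                "then": "$$this.category",
                "else": "$$value",
            }},
        }}}},
        {"$group": {
            "_id": "$_id",
            "country": {"$first": "$country"},
            "max_category": {"$max": "$max_category"},
            "shops": {"$push": "$shops"},
        }},
    ]


def reference_max_category(doc, categories):
    """Plain-Python mirror of the pipeline logic, for local checks only."""
    matched = [
        category
        for category, fruits in categories.items()
        for shop in doc["shops"]
        if set(fruits) & set(shop["fruits"])
    ]
    # None when no fruit of any shop appears in the mapping.
    return max(matched, default=None)


categories = {
    1: ["apple"],
    2: ["banana", "kiwi"],
    3: ["pineapple", "strawberry"],
}
pipeline = build_pipeline(categories)
```

With PyMongo, the result could then be fetched with something like `list(collection.aggregate(pipeline))`, where `collection` stands for your own collection handle.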