I have post documents like this :
post = {
"_id": "uid1",
"userID": "user1",
"likeNum": 30,
}
My current pipeline looks like this with user_ids an array of user IDs to avoid and seen_ids also an array of post IDs to avoid:
fake_pipeline = [
{'$match': {'userID': {'$nin': user_ids}, '_id': {"$nin": seen_ids}}},
{'$group': {'_id': '$userID', 'posts': {'$push': '$likeNum'}}},
{'$project': {'posts': {'$slice': [{'$sortArray': {'input': '$posts', 'sortBy': {'likeNum': -1}}}, 2]}}},
{'$limit': 10}
]
I would like the aggregation to return a list of 10 post ordered by likeNum descending but with at max 2 post per userID.
Example :
post_list_in_db = [
{"_id": "uid1", "userID": "user1", "likeNum": 29},
{"_id": "uid2", "userID": "user1", "likeNum": 2},
{"_id": "uid3", "userID": "user1", "likeNum": 13},
{"_id": "uid4", "userID": "user2", "likeNum": 21},
{"_id": "uid5", "userID": "user2", "likeNum": 19},
{"_id": "uid6", "userID": "user3", "likeNum": 1},
{"_id": "uid7", "userID": "user3", "likeNum": 8},
{"_id": "uid8", "userID": "user3", "likeNum": 14},
{"_id": "uid9", "userID": "user3", "likeNum": 4},
{"_id": "uid10", "userID": "user4", "likeNum": 20},
{"_id": "uid11", "userID": "user4", "likeNum": 9},
{"_id": "uid12", "userID": "user4", "likeNum": 11},
]
The expected output is :
[
{"_id": "uid1", "userID": "user1", "likeNum": 29},
{"_id": "uid4", "userID": "user2", "likeNum": 21},
{"_id": "uid10", "userID": "user4", "likeNum": 20},
{"_id": "uid5", "userID": "user2", "likeNum": 19},
{"_id": "uid8", "userID": "user3", "likeNum": 14},
{"_id": "uid3", "userID": "user1", "likeNum": 13},
{"_id": "uid12", "userID": "user4", "likeNum": 11},
{"_id": "uid7", "userID": "user3", "likeNum": 8},
]
Note : for performance reason, i would like the minimum code in the aggregation, no need to format the data perfectly like in the example output unless it has no impact on performances, otherwise i prefer reorder / transform objects myself in the code as long as i have the correct post list content (but not especially formatted or ordered) as output of the aggregation.
>Solution :
You’re quite close to what you need.
-
Sort on
$likeNumafter the$matchstage. -
Then when you
$groupon userID, you can use the$firstNaggregation accumulator selecting just the first 2 posts per user. So these will be the highest 2likeNum‘s per UserID (since we sorted before this).- Also, use
"$$ROOT"to get the actual post document rather than just thelikeNum.
- Also, use
-
Then unwind the accumulated
"$posts", replace those as the docs themselves, and sort again bylikeNum.
(The parts in blue are the differences from your pipeline.)
db.post.aggregate([
{"$match": {"userID": {"$nin": user_ids}, "_id": {"$nin": seen_ids}}},
{ $sort: { likeNum: -1 } },
{
"$group": {
"_id": "$userID",
"posts": {
$firstN: {
input: "$$ROOT",
n: 2
}
}
}
},
{ $unwind: "$posts" },
{ $replaceWith: "$posts" },
{ $sort: { likeNum: -1 } },
{ "$limit": 10 }
])
Mongo Playground. (No changes to the first $match stage, I’ve just put ["user3"] and ["uid9"] as values in the playground example.)
Wrt "i prefer reorder / transform objects myself in the code" – The first sort cannot/should not be done in code, since it’s for all posts after the match. The second $sort could be done in code but it’s likely to be slower than letting MongoDB sort and then select just 10 results. Otherwise, you need to fetch all results and then sort & limit yourself. And if you plan to have pagination, the skip + limit step have to be done by MongoDB after the sort is also done by MongoDB.