How to sort by year using a date string in PyMongo Aggregate

Another programmer and I have gone around in circles on this and cannot find an answer online.

We have a MongoDB collection of documents pulled from an API. Each document has a field called attributes.created, which is a string.

The format of the string is like this:
"2019-04-25T07:06:02Z"

Our aim is to get a list of counts per year. We can do this on a different collection which just has an int value for year… but we cannot figure it out here as we need to convert a string to a date.

An example of our aggregate function is:

publicationsByYear = collection.aggregate([
    {"$group": {"_id": {"$dateFromString": {"dateString": "attributes.created", "format": "%Y-%m-%d %H:%M:%S"}}, "num_publications": {"$sum": 1}}}
])

We have also tried the $toDate instead of $dateFromString, but the results are the same. No matter what we try we seem to get one of the same two errors:

  1. Error parsing date, a four digit year could not be found. (this is the error the above code throws)
  2. passing a time zone identifier as part of the string is not allowed. (this is thrown if we remove the ‘format’ parameter and go with the default format).

We’ve tried various uses of $toDate and $dateFromString with different values for the format parameter, but we are getting the same errors. We’ve also tried various format specifications, but can’t seem to get it right.

What are we missing?

Solution:

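The pipeline in the question passes "attributes.created" without the leading $, so MongoDB tries to parse that literal text as a date instead of reading the field. With the path written as "$attributes.created", the format parameter can be dropped, because the string is already in ISO 8601 form.

Using the $dateFromString operator:

  • convert the string to a date with $dateFromString (it takes a document with a dateString key)
  • extract the year from the converted date with $year

publicationsByYear = collection.aggregate([
  {
    "$group": {
      "_id": {
        "$year": {"$dateFromString": {"dateString": "$attributes.created"}}
      },
      "num_publications": {"$sum": 1}
    }
  }
])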
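
Since the aim is a list of counts per year, it may help to order the result as well. The following is a minimal sketch, not a definitive implementation: the helper name year_count_pipeline is hypothetical, and it assumes the field always holds a parseable date string. It builds the same $group stage and appends a $sort stage on the year.

```python
def year_count_pipeline(field="attributes.created"):
    """Build an aggregation pipeline counting documents per year.

    `field` is the dotted path of a string date field (assumed to be
    ISO 8601, e.g. "2019-04-25T07:06:02Z").
    """
    return [
        {
            "$group": {
                # "$" + field makes this a field path rather than a literal string.
                "_id": {"$year": {"$dateFromString": {"dateString": "$" + field}}},
                "num_publications": {"$sum": 1},
            }
        },
        # Order the grouped results by year, ascending.
        {"$sort": {"_id": 1}},
    ]


pipeline = year_count_pipeline()
```

With a PyMongo collection object at hand, the pipeline would be run as list(collection.aggregate(pipeline)).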

Using the $toDate operator:

  • convert the string to a date with $toDate
  • extract the year from the converted date with $year

publicationsByYear = collection.aggregate([
  {
    "$group": {
      "_id": {
        "$year": {"$toDate": "$attributes.created"}
      },
      "num_publications": {"$sum": 1}
    }
  }
])
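
To sanity-check the counts that either pipeline returns, the same grouping can be reproduced locally on a handful of strings. This is only a sketch for verification: count_by_year is a hypothetical helper, and apart from the first value the sample strings are made up for illustration.

```python
from collections import Counter
from datetime import datetime


def count_by_year(date_strings):
    """Count strings like "2019-04-25T07:06:02Z" per year, sorted by year."""
    years = (
        # "T" and "Z" are matched as literal characters in the format.
        datetime.strptime(s, "%Y-%m-%dT%H:%M:%SZ").year
        for s in date_strings
    )
    return sorted(Counter(years).items())


# Only the first value comes from the question; the rest are illustrative.
sample = ["2019-04-25T07:06:02Z", "2019-11-02T10:00:00Z", "2020-01-15T08:30:00Z"]
print(count_by_year(sample))  # [(2019, 2), (2020, 1)]
```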
