I am trying to count the number of books in a dataset whose publication year is equal to or greater than 2000.
Here is the format of the column:
publication_date = "dd/mm/yyyy"
Here is my code:
df[int(df["publication_date"][-4: 0]) >= 2000]["publication_date"].count()
I am receiving the error below:
TypeError Traceback (most recent call last)
<ipython-input-31-ed1072acfb26> in <module>
----> 1 df[int(df["publication_date"][-4: 0]) >= 2000]["publication_date"].count()
/opt/conda/lib/python3.8/site-packages/pandas/core/series.py in wrapper(self)
127 if len(self) == 1:
128 return converter(self.iloc[0])
--> 129 raise TypeError(f"cannot convert the series to {converter}")
130
131 wrapper.__name__ = f"__{converter.__name__}__"
TypeError: cannot convert the series to <class 'int'>
What should I do to fix it?
>Solution :
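The error occurs because int() cannot convert a whole Series to a single integer, and [-4: 0] slices the Series itself (its rows) rather than the characters of each date string. To compare by year, convert the column to datetime with pd.to_datetime, then extract the year with .dt.year and compare that.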
import pandas as pd
data = {'publication_date': ['10/05/1999', '15/12/2005', '23/09/2002', '05/03/2000', '18/07/2008']}
df = pd.DataFrame(data)
df['publication_date'] = pd.to_datetime(df['publication_date'], format='%d/%m/%Y')
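# Use >= so that books published in 2000 are included; count one column to get a single number
print(df[df['publication_date'].dt.year >= 2000]['publication_date'].count())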
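If you would rather keep the column as strings, the slicing approach from the question can also work once it is applied element-wise. The following is a minimal sketch, assuming every value is a well-formed "dd/mm/yyyy" string with no missing entries; the variable names `years` and `count_2000_or_later` are only illustrative.

```python
import pandas as pd

# Illustrative sample data in the "dd/mm/yyyy" format described in the question
df = pd.DataFrame(
    {"publication_date": ["10/05/1999", "15/12/2005", "23/09/2002", "05/03/2000", "18/07/2008"]}
)

# .str[-4:] takes the last four characters of each string (the year);
# .astype(int) converts the whole Series, which the built-in int() cannot do
years = df["publication_date"].str[-4:].astype(int)

# Summing a boolean mask counts the rows where the condition is True
count_2000_or_later = int((years >= 2000).sum())
print(count_2000_or_later)  # 4
```

This version would fail on malformed or missing values, which is one reason the datetime conversion above is usually the safer choice.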