I am using map on a pandas Series to apply a function that extracts the date from each string, or returns an empty string if the string contains no date.
import pandas as pd
import dateparser
text_series = pd.Series(data={'label 1':'some text',
'label 2':'something happened on 2012-12-31',
'label 3':'2013-12-31'})
new_series = text_series.map(lambda x: dateparser.search.search_dates(x)[-1][1] if dateparser.search.search_dates(x) else "")
The code works as expected, and I end up with a new Series of datetime objects representing the dates found in the strings.
label 1 NaT
label 2 2012-12-31
label 3 2013-12-31
dtype: datetime64[ns]
My issue is that I get a warning: map infers a datetime dtype from data that also contains strings (the empty string returned by the function), and apparently that behaviour is deprecated, so the dtype should be indicated explicitly.
FutureWarning: Inferring datetime64[ns] from data containing strings is deprecated and will be removed in a future version. To retain the old behavior explicitly pass Series(data, dtype={value.dtype})
How can I avoid this warning and keep this code from breaking once the old behaviour is removed?
>Solution :
I took a different approach, using a regex:
import pandas as pd
import re
text_series = pd.Series(data={'label 1':'some text',
'label 2':'something happened on 2012-12-31',
'label 3':'2013-12-31'})
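def make_dt(row):
    # Look for a YYYY-MM-DD date anywhere in the string
    x = re.search(r'(\d{4}-\d{2}-\d{2})', row)
    if x:
        return pd.to_datetime(x.group(1))
    # No date found: return NaT so the result never mixes strings and datetimes
    return pd.NaT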
new_series = text_series.apply(make_dt)
If the date parts are not always that length, a looser pattern can be used: r'(\d+-\d+-\d+)'
Output:
label 1 NaT
label 2 2012-12-31
label 3 2013-12-31
dtype: datetime64[ns]
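As a possible alternative to the row-by-row function above, the same idea can be sketched with vectorized string methods: extract the date substring first, then convert explicitly with pd.to_datetime so that nothing is left to dtype inference. This is only a sketch under the same assumption as the answer (dates appear as YYYY-MM-DD), and it takes the first match in each string, whereas the code in the question takes the last one found by dateparser.

```python
import pandas as pd

text_series = pd.Series(data={'label 1': 'some text',
                              'label 2': 'something happened on 2012-12-31',
                              'label 3': '2013-12-31'})

# Pull out the first YYYY-MM-DD substring per row; rows without a match become missing values
date_strings = text_series.str.extract(r'(\d{4}-\d{2}-\d{2})', expand=False)

# Convert explicitly; errors='coerce' turns missing or unparseable values into NaT
new_series = pd.to_datetime(date_strings, format='%Y-%m-%d', errors='coerce')
print(new_series)
```

Because the conversion is done by pd.to_datetime rather than inferred from mixed values, the result is a datetime Series with NaT for rows that have no date, and no empty strings are involved.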