How to indicate the dtype when using map on a series?

I am using map on a pandas Series to apply a function that extracts the date contained in each string, or returns an empty string if the string contains no date.

import pandas as pd
import dateparser.search  # import the search submodule explicitly

text_series = pd.Series(data={'label 1':'some text',
                              'label 2':'something happened on 2012-12-31',
                              'label 3':'2013-12-31'})

new_series = text_series.map(lambda x: dateparser.search.search_dates(x)[-1][1] if dateparser.search.search_dates(x) else "")

The code works as expected, and I end up with a new Series of datetime objects representing the dates found in the strings.

label 1          NaT
label 2   2012-12-31
label 3   2013-12-31
dtype: datetime64[ns]

My issue is that I get a warning, because map infers the datetime dtype from data that also contains the strings returned by the function. Apparently that behaviour is deprecated and the dtype should be indicated explicitly.

FutureWarning: Inferring datetime64[ns] from data containing strings is deprecated and will be removed in a future version. To retain the old behavior explicitly pass Series(data, dtype={value.dtype})

How can I avoid this warning and keep this code from breaking once the old behaviour is removed?

>Solution :

I took a different approach, using a regular expression instead of dateparser:

import pandas as pd
import re

text_series = pd.Series(data={'label 1':'some text',
                              'label 2':'something happened on 2012-12-31',
                              'label 3':'2013-12-31'})

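def make_dt(row):
    # Look for an ISO-style date (YYYY-MM-DD) anywhere in the string
    x = re.search(r'(\d{4}-\d{2}-\d{2})', row)
    if x:
        return pd.to_datetime(x.group(1))
    # No date found: return NaT explicitly instead of an implicit None
    return pd.NaT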

new_series = text_series.apply(make_dt)

If the parts of the date do not always have these lengths, loosen the quantifiers, for example: r'(\d+-\d+-\d+)'

output:
label 1          NaT
label 2   2012-12-31
label 3   2013-12-31
dtype: datetime64[ns]
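
The warning in the question appears to come from mixing datetime objects with the empty string "", which leaves pandas to infer the dtype. A minimal sketch of the other way around, keeping map but stating the target type explicitly: let the mapped function return a date string or None, then convert the whole result with pd.to_datetime. Here find_date is a hypothetical stand-in for the dateparser call, so that the example only needs pandas. With dateparser, the same idea should apply if the lambda returns pd.NaT instead of "".

```python
import re
import pandas as pd

text_series = pd.Series(data={'label 1': 'some text',
                              'label 2': 'something happened on 2012-12-31',
                              'label 3': '2013-12-31'})

def find_date(text):
    """Hypothetical stand-in for the dateparser call: date string or None."""
    match = re.search(r'\d{4}-\d{2}-\d{2}', text)
    return match.group(0) if match else None

# map only produces strings and None here, so the result stays object dtype
raw = text_series.map(find_date)

# Convert explicitly; missing values (None) become NaT
new_series = pd.to_datetime(raw, format='%Y-%m-%d', errors='coerce')
print(new_series)
```

Because the conversion happens in a single explicit pd.to_datetime call, the result no longer depends on dtype inference inside map.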