How to correctly split a column that doesn't follow a pattern in Pandas?

I have a large DataFrame with a column that holds a date and a location together, and I want to extract just the year from it. The problem is that the values don’t follow a single pattern, and I haven’t been able to figure out how to do this. Here’s a sample of the three patterns I found in the column:

col
February 28, 2020 (United States)
April 1990 (United States)
1981 (United States)

Ideal output is:

col                                         yearcorrect
February 28, 2020 (United States)           2020
April 1990 (United States)                  1990
1981 (United States)                        1981

I managed to get the first and the third pattern right by doing this:

df['yearcorrect'] = df['col'].astype(str).str.split(', ').str[-1].astype(str).str[:4]

But I still have a problem with the middle pattern: for that row it returns "Apri" instead of the year. Any idea how to get only the year from ‘col’ and save it in a ‘yearcorrect’ column?

Solution:

I think a short regex would be more appropriate here:

df['yearcorrect'] = df['col'].str.extract(r'(\d{4})')

or, to match only the four digits that come right before the opening parenthesis:

df['yearcorrect'] = df['col'].str.extract(r'(\d{4})\s*\(')

output:

                                 col yearcorrect
0  February 28, 2020 (United States)        2020
1         April 1990 (United States)        1990
2               1981 (United States)        1981
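
For anyone who wants to try this end to end, here is a minimal, self-contained sketch built on the sample rows from the question. A few things in it are assumptions rather than part of the original answer: `expand=False` is passed so that `str.extract` returns a Series instead of a one-column DataFrame, the `year_int` column is a hypothetical extra that stores the year as a nullable integer, and the `tricky` row is made-up data that only illustrates where the two patterns differ.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "col": [
        "February 28, 2020 (United States)",
        "April 1990 (United States)",
        "1981 (United States)",
    ]
})

# Capture four digits that are followed by optional whitespace and "(".
# expand=False returns a Series because there is a single capture group.
df["yearcorrect"] = df["col"].str.extract(r"(\d{4})\s*\(", expand=False)

# Hypothetical extra column: the year as a nullable integer, so rows
# without a match would become <NA> instead of raising an error.
df["year_int"] = pd.to_numeric(df["yearcorrect"]).astype("Int64")

# Made-up row with another four-digit number before the year, to show
# how the unanchored and anchored patterns can disagree.
tricky = pd.Series(["Episode 1000, 2020 (United States)"])
first_four_digits = tricky.str.extract(r"(\d{4})", expand=False)
digits_before_paren = tricky.str.extract(r"(\d{4})\s*\(", expand=False)

print(df)
print(first_four_digits[0], digits_before_paren[0])
```

On the made-up row, the plain `(\d{4})` pattern picks up the first four-digit run it finds, while the anchored pattern only accepts digits sitting directly before the parenthesis. If the column can contain other four-digit numbers, the anchored version is probably the safer choice.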