Is it possible to split a pandas column after last integer?

I am trying to split a pandas column into two separate columns, where the first should contain just the date and the second the string. I don't want to split at a fixed position, such as by counting characters up to the last integer; instead, I want code that works in general.

My column looks like this:

Column A
01.01.2000John Doe
01.01.2002Jane Doe

And I want it to look like this:

Column A Column B
01.01.2000 John Doe
01.01.2002 Jane Doe

# Fixed-position split: only works when the date part is always 19 characters long
df_t['date'] = df_t['date_time'].str[:19]
df_t['name'] = df_t['date_time'].str[19:]

tid = df_t.drop(['date_time'], axis=1)

This is how I did it, but I need a general approach, as described above.

>Solution :

You can use str.extract together with regular expressions:

import pandas as pd

# Sample data
data = {'Column A': ['01.01.2000John Doe', '01.01.2002Jane Doe']}
df = pd.DataFrame(data)

# Regular expression pattern
pattern = r'(?P<Date>\d{2}\.\d{2}\.\d{4})(?P<Name>.*)'

# Extracting the date and name into separate columns
df[['Column A', 'Column B']] = df['Column A'].str.extract(pattern)

print(df)

Explanation:

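  • The pattern variable contains the regular expression. The group (?P<Date>\d{2}\.\d{2}\.\d{4}) captures the date (two digits, a literal dot, two digits, a literal dot, four digits), and (?P<Name>.*) captures everything after it as the name.
  • The (?P<name>...) syntax names the capture groups. str.extract uses these names as the column labels of the DataFrame it returns; here the two extracted columns are then assigned to 'Column A' and 'Column B'.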
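
The pattern above assumes the date always has the form dd.mm.yyyy. If you want to split after the last digit regardless of the date format, as the question asks, a minimal sketch could use a greedy group instead. This is only an illustration: the name general_pattern and the parts DataFrame are made up for this example, and it assumes the text after the date contains no digits.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'Column A': ['01.01.2000John Doe', '01.01.2002Jane Doe']})

# Greedy (.*\d) extends to the last digit in the string;
# the second group takes whatever follows it.
general_pattern = r'^(?P<Date>.*\d)(?P<Name>.*)$'

parts = df['Column A'].str.extract(general_pattern)

# Remove stray whitespace in case the input has a space after the date
parts['Name'] = parts['Name'].str.strip()

print(parts)
```

Because the first group is greedy, a digit inside the name (for example "John Doe 2") would move the split point to that digit, so this only fits data where the last digit belongs to the date.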
