I am trying to split a pandas column into two separate columns, where the first should contain just the date and the second the remaining string (the name). I don't want to split at a fixed character position, such as counting where the last digit is. Instead, I want code that works in general.
My column looks like this:
Column A |
---|
01.01.2000John Doe |
01.01.2002Jane Doe |
And I want it to look like this:
Column A | Column B |
---|---|
01.01.2000 | John Doe |
01.01.2002 | Jane Doe |
```python
# Fixed positions: the first 19 characters become the date, the rest the name
df_t['date'] = df_t['date_time'].str[0:19]
df_t['name'] = df_t['date_time'].str[19:]
tid = df_t.drop(['date_time'], axis=1)
```
This is how I did it, but I need a general approach, as described above.
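To illustrate why a fixed slice is brittle, here is a small sketch that slices at position 10 (the length of `dd.mm.yyyy`, analogous to the `.str[0:19]` above). The second row is a hypothetical value whose date has no leading zeros:

```python
import pandas as pd

# Hypothetical data: the second date has no leading zeros
s = pd.Series(['01.01.2000John Doe', '1.1.2002Jane Doe'])

# Slice at a fixed position (10 = length of 'dd.mm.yyyy')
dates = s.str[0:10]
names = s.str[10:]
print(pd.DataFrame({'date': dates, 'name': names}))
```

For the second row the date picks up `Ja` and the name becomes `ne Doe`, so the split point has to come from the content, not from a position.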
Solution:
You can use `str.extract` together with a regular expression:
```python
import pandas as pd

# Sample data
data = {'Column A': ['01.01.2000John Doe', '01.01.2002Jane Doe']}
df = pd.DataFrame(data)

# Regular expression pattern: a dd.mm.yyyy date followed by the rest of the string
pattern = r'(?P<Date>\d{2}\.\d{2}\.\d{4})(?P<Name>.*)'

# Extract the date and the name into 'Column A' and 'Column B'
df[['Column A', 'Column B']] = df['Column A'].str.extract(pattern)
print(df)
```
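A quick way to see what `str.extract` produces before assigning it is to run it on its own. The third value below is a hypothetical row with no leading date; for rows where the pattern does not match, `str.extract` returns NaN in every column:

```python
import pandas as pd

pattern = r'(?P<Date>\d{2}\.\d{2}\.\d{4})(?P<Name>.*)'

# The third value is a hypothetical malformed row without a date
s = pd.Series(['01.01.2000John Doe', '01.01.2002Jane Doe', 'no date here'])

# One column per capture group, labelled with the group names
parts = s.str.extract(pattern)
print(parts)
```

Checking `parts.isna()` afterwards is a simple way to find rows that don't follow the expected format.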
Explanation:
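- The `pattern` variable holds the regular expression. `(?P<Date>\d{2}\.\d{2}\.\d{4})` captures the date (two digits, a literal dot, two digits, a literal dot, four digits; `\.` escapes the dot), and `(?P<Name>.*)` captures the rest of the string as the name.
- The `(?P<name>...)` syntax names the capture groups. `str.extract` uses these names as the column labels of the DataFrame it returns, which makes the result easier to work with when creating the new columns.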
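The pattern above assumes the date is exactly `dd.mm.yyyy`. If the date prefix can vary (for example the 19-character timestamps the question slices off with `.str[0:19]`), one looser sketch is to split at the first letter. This assumes the name always starts with an ASCII letter and the date part never contains one; the sample rows are hypothetical:

```python
import pandas as pd

# Hypothetical mixed data: dotted dates with and without leading zeros,
# and an ISO-style timestamp
df = pd.DataFrame({'Column A': [
    '01.01.2000John Doe',
    '1.1.2002Jane Doe',
    '2000-01-01 12:30:00Max Mustermann',
]})

# Date: everything before the first ASCII letter
# Name: the first letter and everything after it
general_pattern = r'^(?P<Date>[^A-Za-z]*)(?P<Name>[A-Za-z].*)$'
parts = df['Column A'].str.extract(general_pattern)

# Remove any whitespace left between the date and the name
parts['Date'] = parts['Date'].str.strip()
print(parts)
```

Under these assumptions a name beginning with a non-ASCII letter (e.g. `Émile`) or a timestamp containing a letter (e.g. an ISO `T` separator) would be split in the wrong place, so the stricter date pattern is preferable whenever the format is known.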