I have the following pandas dataframe:
import pandas as pd
import numpy as np
ds1 = {'Name':["Juan Pablo Montoya","Jose Hiero","Martin Vasquez"], "Comments" : ["DOB 18 May 1967; POB Mexico.","POB Mexico.","-0-"]}
df1 = pd.DataFrame(data=ds1)
Which looks like this:
print(df1)
Name Comments
0 Juan Pablo Montoya DOB 18 May 1967; POB Mexico.
1 Jose Hiero POB Mexico.
2 Martin Vasquez -0-
I need to create two new columns based on the contents of the Comments
column.
The names of the two new columns are:
DOB
POB
The values in column DOB
is the value following DOB in the Comments
column (up until the semi-colon).
The values in column POB
is the value following POB in the Comments
column (up until the dot).
If there is a value of "-0-" in the Comments
then both new columns contain NaN
.
So, from the example above, the resulting data frame would look like this:
>Solution :
You can do this with str.extract()
:
df1['DOB'] = df1['Comments'].str.extract('DOB (.*);+')
df1['POB'] = df1['Comments'].str.extract('POB (.*)')
Output:
Name Comments DOB POB
0 Juan Pablo Montoya DOB 18 May 1967; POB Mexico. 18 May 1967 Mexico.
1 Jose Hiero POB Mexico. NaN Mexico.
2 Martin Vasquez -0- NaN NaN