How to create new pandas columns from a pandas data frame column content

I have the following pandas dataframe:

import pandas as pd
import numpy as np


ds1 = {'Name':["Juan Pablo Montoya","Jose Hiero","Martin Vasquez"], "Comments" : ["DOB 18 May 1967; POB Mexico.","POB Mexico.","-0-"]}

df1 = pd.DataFrame(data=ds1)

Which looks like this:

print(df1)
                 Name                      Comments
0  Juan Pablo Montoya  DOB 18 May 1967; POB Mexico.
1          Jose Hiero                   POB Mexico.
2      Martin Vasquez                           -0-

I need to create two new columns based on the contents of the Comments column.
The names of the two new columns are:

  • DOB
  • POB

The values in column DOB is the value following DOB in the Comments column (up until the semi-colon).
The values in column POB is the value following POB in the Comments column (up until the dot).
If there is a value of "-0-" in the Comments then both new columns contain NaN.

So, from the example above, the resulting data frame would look like this:
enter image description here

>Solution :

You can do this with str.extract():

df1['DOB'] = df1['Comments'].str.extract('DOB (.*);+')
df1['POB'] = df1['Comments'].str.extract('POB (.*)')

Output:

    Name                Comments                        DOB           POB
0   Juan Pablo Montoya  DOB 18 May 1967; POB Mexico.    18 May 1967   Mexico.
1   Jose Hiero          POB Mexico.                     NaN           Mexico.
2   Martin Vasquez      -0-                             NaN           NaN

Leave a Reply