Generating 3 columns from one with .apply on dataframe

I want to extract some data from each row and turn it into new columns of an existing or new dataframe, without repeating the same re.match operation for every column.

Here’s how one entry of the dataframe looks:

00:00 Someones_name: some text goes here

And I have a regex that successfully captures the 3 groups I need:

re.match(r"^(\d{2}:\d{2}) (.*): (.*)$", x)

The problem is how to get matched_part[1], [2], and [3] without running the match again for every new column.

The solution that I don’t want is:

import re

def function1(x):
  return re.match(r"^(\d{2}:\d{2}) (.*): (.*)$", x)[1]
# function2 and function3 repeat the same match and return groups [2] and [3]

new_df['time'] = old_df['text'].apply(function1)
new_df['name'] = old_df['text'].apply(function2)
new_df['text'] = old_df['text'].apply(function3)

Solution:

You can use str.extract with your pattern. It applies the regex once per row and returns one column per capture group:

df[['time','name', 'text']] = df['col1'].str.extract(r"^(\d{2}:\d{2}) (.*): (.*)$")
print(df)
#                                        col1   time           name  \
# 0  00:00 Someones_name: some text goes here  00:00  Someones_name   

#                   text  
# 0  some text goes here  
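
A variant of the same idea, as a minimal sketch: with named groups, str.extract labels the resulting columns itself, and rows that do not match the pattern come back as NaN instead of raising an error. The sample data, the second non-matching row, and the variable names `pattern`, `parts`, and `result` are assumptions made for illustration; only `col1` and the regex come from the answer above.

```python
import pandas as pd

# Hypothetical sample data; 'col1' is the column name used in the answer.
df = pd.DataFrame({"col1": [
    "00:00 Someones_name: some text goes here",
    "this row does not match the pattern",
]})

# Named groups (?P<label>...) become the column names of the result.
pattern = r"^(?P<time>\d{2}:\d{2}) (?P<name>.*): (?P<text>.*)$"

# The regex runs once per row; non-matching rows yield NaN in every column.
parts = df["col1"].str.extract(pattern)

# Attach the extracted columns to the original frame (aligned on the index).
result = df.join(parts)
print(result)
```

Note that `(.*)` for the name is greedy, so if the message text itself contains ": ", the name group would run up to the last such separator; a non-greedy `(.*?)` may be a better fit in that case.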