How to extract information from a dataframe column and create a new column based on that information

I have a pandas DataFrame that contains a list of URLs, like this:

api
https://apis.us/image/
https://apis.emea/video/
https://apis.asia/docs/
https://apis.general/

I want to get a new column region that holds the corresponding region for each URL; if there is no region in the URL, it should be marked as global.

api                         region
https://apis.us/image/      us
https://apis.emea/video/    emea
https://apis.asia/docs/     asia
https://apis.general/       global

How can I achieve this in an efficient way? Across all the URLs I only have to search for these three regions: us, emea, and asia.
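For reference, a minimal reproducible setup of the sample above (the column name api comes from the question; the DataFrame name df is an assumption):

import pandas as pd

# Sample data copied from the question; df is an assumed variable name.
df = pd.DataFrame({'api': ['https://apis.us/image/',
                           'https://apis.emea/video/',
                           'https://apis.asia/docs/',
                           'https://apis.general/']})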

Solution:

If you need to test for values that come right after the apis. text, use Series.str.extract with a positive lookbehind and the possible values from the list joined by |, then replace non-matched values using Series.fillna:

vals = ['us', 'emea', 'asia']
# The lookbehind anchors the match to the text right after "https://apis.";
# expand=False makes str.extract return a Series instead of a one-column DataFrame.
df['region'] = (df['api'].str.extract(rf'(?<=https://apis\.)({"|".join(vals)})',
                                      expand=False)
                         .fillna('global'))

print(df)
                        api  region
0    https://apis.us/image/      us
1  https://apis.emea/video/    emea
2   https://apis.asia/docs/    asia
3     https://apis.general/  global
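
If per-region conditions are easier to extend later (say, with extra rules per region), an equivalent sketch uses numpy.select with one boolean mask per region, reusing the vals list from above:

import numpy as np

# One boolean mask per region; the first matching condition wins,
# and rows matching no mask fall back to the default.
masks = [df['api'].str.contains(rf'apis\.{v}') for v in vals]
df['region'] = np.select(masks, vals, default='global')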

If you need to test for the values as any substring of the URL:

vals = ['us', 'emea', 'asia']
# No lookbehind: the region may appear anywhere in the URL.
df['region'] = df['api'].str.extract(rf'({"|".join(vals)})', expand=False).fillna('global')
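
One caveat with the plain substring version: a short token such as us can also match inside an unrelated part of a URL. If that is a concern, word boundaries restrict the match to a whole label (a sketch, reusing vals):

# \b stops 'us' from matching inside a longer word elsewhere in the URL.
df['region'] = (df['api'].str.extract(rf'\b({"|".join(vals)})\b', expand=False)
                         .fillna('global'))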