Pandas extract word from link domain

I have a DataFrame:

import pandas as pd    
d = {'domain': ['linkedin.com','aumniversal.tumblr.com','plasticdrea.ms','linkedin.com','s-lw.tumblr.com','newsonline.media','creshendo.co.vu','deadly-skz-gods-cb.tumblr.com','deo.progr.am']}
df = pd.DataFrame(d)
df

I want to extract the word just before the last dot-separated part of each domain (for example, the word before .com, although not every domain ends in .com). The expected result is:

    domain                            words
0   linkedin.com                    linkedin
1   aumniversal.tumblr.com          tumblr
2   plasticdrea.ms                  plasticdrea
3   linkedin.com                    linkedin
4   s-lw.tumblr.com                 tumblr
5   newsonline.media                newsonline
6   creshendo.co.vu                 co
7   deadly-skz-gods-cb.tumblr.com   tumblr
8   deo.progr.am                    progr

Solution:

Use str.extract with a regex that captures the part immediately before the last dot:

df['words'] = df['domain'].str.extract(r'([^.]+)\.[^.]*$')

Output:

                          domain        words
0                   linkedin.com     linkedin
1         aumniversal.tumblr.com       tumblr
2                 plasticdrea.ms  plasticdrea
3                   linkedin.com     linkedin
4                s-lw.tumblr.com       tumblr
5               newsonline.media   newsonline
6                creshendo.co.vu           co
7  deadly-skz-gods-cb.tumblr.com       tumblr
8                   deo.progr.am        progr
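
For anyone who wants to run this end to end, here is a self-contained sketch of the same approach. Two things in it are not part of the answer above and are my own additions: `expand=False`, which makes `str.extract` return a Series rather than a one-column DataFrame, and the `alt` variable, which cross-checks the result with a plain split on dots (a different technique, not a regex).

```python
import pandas as pd

d = {'domain': ['linkedin.com', 'aumniversal.tumblr.com', 'plasticdrea.ms',
                'linkedin.com', 's-lw.tumblr.com', 'newsonline.media',
                'creshendo.co.vu', 'deadly-skz-gods-cb.tumblr.com',
                'deo.progr.am']}
df = pd.DataFrame(d)

# Same pattern as in the answer; expand=False (an addition here) returns a
# Series because the pattern has exactly one capture group.
df['words'] = df['domain'].str.extract(r'([^.]+)\.[^.]*$', expand=False)

# Alternative without a regex (an assumption-free cross-check, not the
# answer's method): split on dots and take the second-to-last piece.
alt = df['domain'].str.split('.').str[-2]

print(df)
print((alt == df['words']).all())
```

On this data the two approaches agree. Note that neither one knows about multi-part public suffixes, which is why creshendo.co.vu yields co, matching the expected output in the question.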

Regex breakdown:

([^.]+)   # capture word
\.[^.]*   # followed by .xxx
$         # and end of string
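
The breakdown above can also be written directly as a commented pattern using Python's `re.VERBOSE` flag. This is only an illustrative sketch with the standard `re` module; the helper name `second_to_last_label` is hypothetical and not part of pandas or the original answer.

```python
import re

# The same pattern, spelled out with inline comments via re.VERBOSE.
PATTERN = re.compile(r"""
    ([^.]+)   # capture a run of non-dot characters (the word we want)
    \.[^.]*   # a literal dot followed by the final part, e.g. ".com"
    $         # anchored at the end of the string
""", re.VERBOSE)


def second_to_last_label(domain):
    """Return the part before the last dot-separated piece, or None."""
    match = PATTERN.search(domain)
    return match.group(1) if match else None


print(second_to_last_label('deo.progr.am'))
print(second_to_last_label('creshendo.co.vu'))
print(second_to_last_label('localhost'))  # no dot, so there is no match
```

A string with no dot does not match at all, so the helper returns None for it; in pandas, `str.extract` gives NaN for such rows.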