How to replace strings in pandas column that are in a list?

I have looked through the existing posts on this question but could not find an answer for my situation. I have a pandas DataFrame with a column of company names, some of which are given as domain names.

import pandas as pd

df = pd.DataFrame(['amazon.us', 'pepsi', 'YOUTUBE.COM', 'apple.inc'], columns=['firm'])

I want to remove the domain extension from all the strings. The extensions are given in a list:

web_domains = ['.com', '.us']

The following attempt left the values unchanged:


df['firm'].str.lower().replace(web_domains, '')

Can someone please help me out, and possibly also explain why my solution does not work?

Solution:

You need to pass regex=True to Series.replace. With regex=False (the default), it only replaces values that match an entire cell exactly, not substrings inside a cell.

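For example, with regex=False the value 'a' only replaces a cell that is exactly 'a'; cells such as 'ab' or 'bab' are left unchanged. That is why your list ['.com', '.us'] matched nothing: no cell is exactly '.com' or '.us'.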
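A small sketch of the difference on a toy Series, reusing the 'a' / 'ab' / 'bab' example and the firm names from the question. With regex=True the value is treated as a pattern and every occurrence inside each cell is substituted.

```python
import pandas as pd

s = pd.Series(['a', 'ab', 'bab'])

# regex=False (the default): only cells whose whole value equals 'a' are replaced
exact = s.replace('a', 'x')

# regex=True: every occurrence of the pattern inside each cell is substituted
partial = s.replace('a', 'x', regex=True)

print(exact.tolist())    # ['x', 'ab', 'bab']
print(partial.tolist())  # ['x', 'xb', 'bxb']

# The attempt from the question: no cell is exactly '.com' or '.us', so nothing changes
firms = pd.Series(['amazon.us', 'pepsi', 'YOUTUBE.COM', 'apple.inc']).str.lower()
unchanged = firms.replace(['.com', '.us'], '')
print(unchanged.equals(firms))  # True
```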

web_domains = [r'\.com', r'\.us']  # raw strings; escape `.` for regex=True, otherwise it matches any character

df['removed'] = df['firm'].str.lower().replace(web_domains, '', regex=True)
print(df)

          firm    removed
0    amazon.us     amazon
1        pepsi      pepsi
2  YOUTUBE.COM    youtube
3    apple.inc  apple.inc
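If you would rather keep the original, unescaped list, one possible variant (a sketch, not the only way) builds a single pattern with re.escape and anchors it with `$`, so only a trailing extension is removed. The extra row 'foo.usa' is a hypothetical value added only to show the effect of the anchor: without `$`, the '.us' inside it would also be stripped, giving 'fooa'.

```python
import re
import pandas as pd

# 'foo.usa' is a hypothetical extra row, not part of the original data
df = pd.DataFrame(['amazon.us', 'pepsi', 'YOUTUBE.COM', 'apple.inc', 'foo.usa'], columns=['firm'])

web_domains = ['.com', '.us']

# re.escape turns '.' into '\.'; '$' restricts the match to the end of the string
pattern = '(?:' + '|'.join(map(re.escape, web_domains)) + ')$'

df['removed'] = df['firm'].str.lower().str.replace(pattern, '', regex=True)
print(df)
```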