I have scrolled through the posts on this question and was unable to find an answer to my situation. I have a pandas dataframe with a list of company names, some of which are represented as their domain name.
df = pd.DataFrame(['amazon.us', 'pepsi', 'YOUTUBE.COM', 'apple.inc'], columns=['firm'])
I want to remove the domain extension from all the strings. The extensions are given in a list:
web_domains = ['.com', '.us']
The following attempt did not yield any results:
df['firm'].str.lower().replace(web_domains, '')
Can someone please help me out, and possibly also explain why my solution does not work?
>Solution :
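You need to pass regex=True to Series.replace. With the default regex=False, it only replaces values that match the entire string exactly, so none of your company names (which merely contain '.com' or '.us') are touched.
For example, a is replaced only when the whole value is a, not when the value is ab or bab.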
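To make the difference concrete, here is a minimal sketch using a throwaway Series (the values and the replacement 'X' are just illustrative, not from the question):

```python
import pandas as pd

s = pd.Series(['a', 'ab', 'bab'])

# Default (regex=False): only values that are exactly 'a' get replaced.
exact = s.replace('a', 'X')

# regex=True: 'a' is treated as a pattern and replaced wherever it occurs.
substring = s.replace('a', 'X', regex=True)

print(exact.tolist())      # ['X', 'ab', 'bab']
print(substring.tolist())  # ['X', 'Xb', 'bXb']
```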
web_domains = [r'\.com', r'\.us'] # raw strings; escape `.` so it matches a literal dot under regex=True
df['removed'] = df['firm'].str.lower().replace(web_domains, '', regex=True)
print(df)
          firm    removed
0    amazon.us     amazon
1        pepsi      pepsi
2  YOUTUBE.COM    youtube
3    apple.inc  apple.inc
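If you would rather keep web_domains unescaped, as in the question, one possible variation is to build a single pattern with re.escape and anchor it with $ so that only a trailing extension is stripped. This is a sketch of an alternative, not part of the solution above; the name pattern is my own, and it uses Series.str.replace instead of Series.replace:

```python
import re
import pandas as pd

df = pd.DataFrame(['amazon.us', 'pepsi', 'YOUTUBE.COM', 'apple.inc'], columns=['firm'])
web_domains = ['.com', '.us']  # unescaped, as given in the question

# Escape each extension, join them as alternatives, and anchor at the end
# of the string so only a trailing extension is removed.
pattern = '(?:' + '|'.join(re.escape(d) for d in web_domains) + ')$'

df['removed'] = df['firm'].str.lower().str.replace(pattern, '', regex=True)
print(df['removed'].tolist())  # ['amazon', 'pepsi', 'youtube', 'apple.inc']
```

Without the $ anchor, an extension such as .us would also be removed when it appears in the middle of a name, which may or may not be what you want.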