Extract features from text data in Python

I have a pandas DataFrame like this:

ID    email
1     abc@google.com
2     abc@facebook.com
3     abc@GOOGLE.COM
4     abc@tesla.com
5     abc@hilton.com
6     abc@FaceBook.com

I want to extract the company from each email address (the part after the @) and add one Yes/No column per company. The desired output looks like this.

Sample output

ID    email               WorkGoogle    WorkFacebook    etc.
1     abc@google.com      Yes           No              ..
2     abc@facebook.com    No            Yes             ..
3     abc@GOOGLE.COM      Yes           No              ..
4     abc@tesla.com       No            No              ..
5     abc@hilton.com      No            No              ..
6     abc@FaceBook.com    No            Yes             ..

The matching needs to be case-insensitive: uppercase and lowercase spellings of the same company should count as the same.

Solution:

FYI: this solution is not performance-efficient, because it goes over the whole email column once per company. The comments on this answer may offer a more efficient solution.

I would first build the set of all companies, lowercased so that GOOGLE.COM and google.com count as the same company:

companies = {email.split('@')[1].split('.')[0].lower() for email in df['email']}
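
To make the extraction step easier to follow, here is a small sketch that breaks the same split chain into a named helper. The function name `company_from_email` and the `example_` variables are hypothetical and only used for illustration; the sketch assumes every address contains exactly one "@" followed by a dotted domain.

```python
def company_from_email(email: str) -> str:
    """Return the lowercased text between '@' and the first dot after it."""
    domain = email.split('@')[1]          # 'abc@GOOGLE.COM' -> 'GOOGLE.COM'
    return domain.split('.')[0].lower()   # 'GOOGLE.COM' -> 'google'


# Hypothetical sample addresses taken from the question's table
example_emails = ['abc@google.com', 'abc@GOOGLE.COM',
                  'abc@FaceBook.com', 'abc@tesla.com']

# A set removes duplicates; sorting just makes the printed order stable
example_companies = sorted({company_from_email(e) for e in example_emails})
print(example_companies)  # ['facebook', 'google', 'tesla']
```

Lowercasing before building the set is what collapses the mixed-case duplicates into a single company.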

Then simply iterate over this:

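for company in companies:
    # Exact, case-insensitive match on the company part, reported as Yes/No
    matches = df['email'].apply(lambda x: x.split('@')[1].split('.')[0].lower()) == company
    df['Work' + company.capitalize()] = matches.map({True: 'Yes', False: 'No'})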
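
As one possible more efficient variant, the loop can be swapped for a single pass with `str.extract` and `pd.get_dummies`. This is a sketch under assumptions, not the method from the answer above: the sample frame is rebuilt from the question's table, the regular expression assumes the company is the text between "@" and the next dot, and the names `company`, `flags`, and `result` are illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question's table
df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6],
    'email': ['abc@google.com', 'abc@facebook.com', 'abc@GOOGLE.COM',
              'abc@tesla.com', 'abc@hilton.com', 'abc@FaceBook.com'],
})

# Company label: text between '@' and the next '.', lowercased
company = df['email'].str.extract(r'@([^.]+)', expand=False).str.lower()

# One indicator column per distinct company, converted to Yes/No strings
flags = pd.get_dummies(company).astype(bool)
flags = flags.apply(lambda col: col.map({True: 'Yes', False: 'No'}))
flags.columns = ['Work' + name.capitalize() for name in flags.columns]

# Attach the new columns to the original frame
result = pd.concat([df, flags], axis=1)
print(result)
```

`get_dummies` orders the new columns alphabetically by company, so reorder them afterwards if a specific column order matters.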