Using .find() on a pandas DataFrame column (Series)

I have the following df:

import pandas as pd

data = {'Org': ['<a href="/00xO" target="_blank">Chocolate</a>'],
        'Owner': ['Charlie']}

df = pd.DataFrame(data)

print(df)

When I apply the lambda function below, instead of giving me ‘Chocolate’ it returns 0.

df['Correct Org']=df['Org'].apply(lambda st: st[st.find(">"):st.find("<")])

I’ve tried adding ‘str’ as follows:

df['Correct Org']=df['Org'].str.apply(lambda st: st[st.find(">")+1:st.find("<")])

and I get the following error:

AttributeError: 'StringMethods' object has no attribute 'apply'

Solution:

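You’re getting an empty string rather than ‘Chocolate’ because df['Org'][0].find(">") returns 31 while df['Org'][0].find("<") returns 0, so the slice st[st.find(">"):st.find("<")] is st[31:0], which is empty. Adding .str does not help: the .str accessor exposes string methods such as find, but it has no apply method, hence the AttributeError. You can use bs4.BeautifulSoup to create a soup object and get the text inside the <a> tag directly: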

from bs4 import BeautifulSoup

# Name the parser explicitly so bs4 does not have to guess one
df['Org'] = df['Org'].apply(lambda x: BeautifulSoup(x, 'html.parser').text)

Output:

         Org    Owner
0  Chocolate  Charlie
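
If you would rather keep the .find() approach, here is a minimal sketch of how the slice could be repaired. It is not part of the original answer. The helper name inner_text is hypothetical, and it assumes each value contains a single tag pair like the example: it starts one character past the first ">" and looks for the next "<" only after that position.

```python
def inner_text(st: str) -> str:
    """Return the text between the first '>' and the next '<' after it."""
    start = st.find(">")
    if start == -1:
        # No tag at all: return the string unchanged
        return st
    end = st.find("<", start + 1)  # search only after the opening tag
    if end == -1:
        return st[start + 1:]
    return st[start + 1:end]


org = '<a href="/00xO" target="_blank">Chocolate</a>'

# The original slice: find(">") is 31, find("<") is 0, so st[31:0] is empty
print(org.find(">"), org.find("<"))
print(repr(org[org.find(">"):org.find("<")]))

# The repaired slice
print(inner_text(org))
```

With the DataFrame above, this would be applied as df['Correct Org'] = df['Org'].apply(inner_text). For anything beyond a single simple tag, a real HTML parser such as BeautifulSoup is the more robust choice.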