
Filter DataFrame rows that contain specific characters from user input (Python)

I’m trying to find the names in the Name column that contain letters supplied by user input. In this case I want the names that contain both ‘a’ and ‘i’, but I’m getting an error:

import pandas as pd
data = {'Name': ['Aerial', 'Tom', 'Amie', 'Anuj'],
        'Age': [27, 24, 22, 32],
        'Address': ['pennsylvania', 'newyork', 'newjersey', 'delaware'],
        'Qualification': ['Msc', 'MA', 'MCA', 'Phd']}
df = pd.DataFrame(data)
df["Name"] = df["Name"].str.lower()
print(df)
letters_in = input('Words in Name Column that contain these letters: \n ').split()
new_output = df.loc[df['Name'].str.contains(letters_in, case=False)]

Code run:

Words in Name Column that contain these letters: 

>? a e
ERROR: 
TypeError: unhashable type: 'list'

Ideal Output (as dataframe):

Aerial
Amie

>Solution :

First, to address your error message, the contains() method expects a string as its first argument, not a list.

The string it expects is a character sequence or regular expression (see the pandas documentation for Series.str.contains) that it will attempt to match against each value. I believe that is different from what you are attempting, namely to find the rows whose Name contains all of the input letters.
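To illustrate the difference, here is a minimal sketch of what happens if the input letters are simply joined into one regular expression. The names `names`, `pattern` and `any_match` are made up for this illustration, and `re.escape` is only there in case a user types a regex metacharacter. The joined pattern matches a name that contains any one of the letters, so it returns more rows than wanted:

```python
import re

import pandas as pd

# Same names as in the question, already lower-cased.
names = pd.Series(['aerial', 'tom', 'amie', 'anuj'])
letters_in = ['a', 'i']

# 'a|i' is a regex alternation: it matches if EITHER letter is present.
pattern = '|'.join(re.escape(letter) for letter in letters_in)
any_match = names[names.str.contains(pattern, case=False)]

print(any_match.tolist())  # ['aerial', 'amie', 'anuj']
```

'anuj' is included even though it has no 'i', so a single contains() call with an alternation pattern does not give the "contains every letter" behaviour asked for in the question.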

To match every input letter, you can use the following approach, for example:

import pandas as pd
data = {'Name': ['Aerial', 'Tom', 'Amie', 'Anuj'],
        'Age': [27, 24, 22, 32],
        'Address': ['pennsylvania', 'newyork', 'newjersey', 'delaware'],
        'Qualification': ['Msc', 'MA', 'MCA', 'Phd']}
df = pd.DataFrame(data)
df["Name"] = df["Name"].str.lower()
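# Hard-coded here so the example is reproducible; swap in the input() line to read the letters interactively.
#letters_in = input('Words in Name Column that contain these letters: \n ').split()
letters_in = ['a', 'i']
# Keep only the rows whose Name contains every letter in letters_in.
new_output = df[df.apply(lambda x: all(letter in x['Name'] for letter in letters_in), axis=1)]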
print(new_output)

Output:

     Name  Age       Address Qualification
0  aerial   27  pennsylvania           Msc
2    amie   22     newjersey           MCA
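
As an alternative sketch, the same filter can be built without a row-wise apply by combining one boolean mask per letter. The helper name `filter_names_containing_all` and its `column` parameter are hypothetical, not part of pandas. It assumes each input item should be treated as a literal substring (`regex=False`) and matched case-insensitively, so the Name column does not need to be lower-cased first:

```python
import pandas as pd

def filter_names_containing_all(df, letters, column='Name'):
    """Return the rows of df whose `column` contains every item in `letters`."""
    # Start with every row selected, then narrow the mask one letter at a time.
    mask = pd.Series(True, index=df.index)
    for letter in letters:
        # regex=False treats the letter as plain text; case=False ignores case.
        mask = mask & df[column].str.contains(letter, case=False, regex=False)
    return df[mask]

data = {'Name': ['Aerial', 'Tom', 'Amie', 'Anuj'],
        'Age': [27, 24, 22, 32]}
df = pd.DataFrame(data)

# 'a i' stands in for what the user would type at the input() prompt.
letters_in = 'a i'.split()
result = filter_names_containing_all(df, letters_in)
print(result['Name'].tolist())  # ['Aerial', 'Amie']
```

Because the match ignores case, the original capitalisation of the names is preserved in the result.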