Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Removing rows based on multiple lists in Python

I am trying to remove rows based on multiple list and a related column. The condition is that when the location string doesn’t fall into the list of country, remove the rows. My current approach is to use np.select and I wonder if there’s a way to make it more succinct or easier to re-use.

Europe = ["London", "Paris", "Berlin"]
Amercia = ["Boston", "New York", "Florida"]
Asia = ["Hong Kong", "Singapore", "Tokyo"]

data = {
    "country1": ["Europe", "Asia", "Asia", "Europe", "Europe", "Amercia", "Amercia",  "Asia", "Asia", "Europe"], 
    'location':["London", "London", "Hong Kong", "Hong Kong", "New York", "New York", "Singapore", "Singapore",  "London", "London"]}
df = pd.DataFrame(data)

   country   location
0   Europe     London
1     Asia     London
2     Asia  Hong Kong
3   Europe  Hong Kong
4   Europe   New York
5  Amercia   New York
6  Amercia  Singapore
7     Asia  Singapore
8     Asia     London
9   Europe     London

Current approach

conditions = [
    (df['location'].str.contains('|'.join(Europe))),
    (df['location'].str.contains('|'.join(Amercia))),
    (df['location'].str.contains('|'.join(Asia)))]

choices = ["Europe", "Amercia", "Asia"]
df['country2'] = np.select(conditions, choices, default=0)
df2 = df.query("country1 == country2").reset_index(drop = True)

Expected output:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

   country   location
0   Europe     London
1     Asia  Hong Kong
2  Amercia   New York
3     Asia  Singapore
4   Europe     London

>Solution :

If possible mapping location by values from list convertd to flatten dictionaries and comapred with original column use:

d = {"Europe":Europe, "Amercia":Amercia, "Asia":Asia}

d1 = {x:k for k, v in d.items() for x in v}
print (d1)
{'London': 'Europe', 'Paris': 'Europe', 'Berlin': 'Europe',
 'Boston': 'Amercia',  'New York': 'Amercia', 'Florida': 'Amercia', 
 'Hong Kong': 'Asia', 'Singapore': 'Asia', 'Tokyo': 'Asia'}


df1 = df[df['location'].map(d1).eq(df['country1'])].reset_index(drop = True)
print (df1)
  country1   location
0   Europe     London
1     Asia  Hong Kong
2  Amercia   New York
3     Asia  Singapore
4   Europe     London

If possible substring in original column is possible test first matched value by dictionary:

df1 = df[df['location'].str.extract(f'({"|".join(d1)})', expand=False).map(d1).eq(df['country1'])]
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading