Home Removing rows based on multiple lists in Python

Questions

Removing rows based on multiple lists in Python

December 9, 2021

I am trying to remove rows based on multiple list and a related column. The condition is that when the location string doesn’t fall into the list of country, remove the rows. My current approach is to use np.select and I wonder if there’s a way to make it more succinct or easier to re-use.

Europe = ["London", "Paris", "Berlin"]
Amercia = ["Boston", "New York", "Florida"]
Asia = ["Hong Kong", "Singapore", "Tokyo"]

data = {
    "country1": ["Europe", "Asia", "Asia", "Europe", "Europe", "Amercia", "Amercia",  "Asia", "Asia", "Europe"], 
    'location':["London", "London", "Hong Kong", "Hong Kong", "New York", "New York", "Singapore", "Singapore",  "London", "London"]}
df = pd.DataFrame(data)

   country   location
0   Europe     London
1     Asia     London
2     Asia  Hong Kong
3   Europe  Hong Kong
4   Europe   New York
5  Amercia   New York
6  Amercia  Singapore
7     Asia  Singapore
8     Asia     London
9   Europe     London

Current approach

conditions = [
    (df['location'].str.contains('|'.join(Europe))),
    (df['location'].str.contains('|'.join(Amercia))),
    (df['location'].str.contains('|'.join(Asia)))]

choices = ["Europe", "Amercia", "Asia"]
df['country2'] = np.select(conditions, choices, default=0)
df2 = df.query("country1 == country2").reset_index(drop = True)

Expected output:

   country   location
0   Europe     London
1     Asia  Hong Kong
2  Amercia   New York
3     Asia  Singapore
4   Europe     London

>Solution :

If possible mapping location by values from list convertd to flatten dictionaries and comapred with original column use:

d = {"Europe":Europe, "Amercia":Amercia, "Asia":Asia}

d1 = {x:k for k, v in d.items() for x in v}
print (d1)
{'London': 'Europe', 'Paris': 'Europe', 'Berlin': 'Europe',
 'Boston': 'Amercia',  'New York': 'Amercia', 'Florida': 'Amercia', 
 'Hong Kong': 'Asia', 'Singapore': 'Asia', 'Tokyo': 'Asia'}


df1 = df[df['location'].map(d1).eq(df['country1'])].reset_index(drop = True)
print (df1)
  country1   location
0   Europe     London
1     Asia  Hong Kong
2  Amercia   New York
3     Asia  Singapore
4   Europe     London

If possible substring in original column is possible test first matched value by dictionary:

df1 = df[df['location'].str.extract(f'({"|".join(d1)})', expand=False).map(d1).eq(df['country1'])]

pandas

byMR

Published December 09, 2021

Add a comment

mypy error when reshaping an array with a list of tuples

byMR

December 9, 2021

Questions

Alternating messages on chatbot creation

byMR

December 9, 2021

Questions

Flex/Lex unrecognized rule for paranthesis'

byMR

December 9, 2021

Questions

can't save the image after using PIL Image.thumbnail

byMR

December 9, 2021

Questions

jQuery: Regex to validate US postal doesn't work

byMR

December 9, 2021

Questions

Cannot convert from out System.Guid? to System.Guid

byMR

December 9, 2021

Removing rows based on multiple lists in Python

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Like this:

Leave a ReplyCancel reply

Read more

mypy error when reshaping an array with a list of tuples

Alternating messages on chatbot creation

Flex/Lex unrecognized rule for paranthesis'

can't save the image after using PIL Image.thumbnail

jQuery: Regex to validate US postal doesn't work

Cannot convert from out System.Guid? to System.Guid

Keep Up to Date with the Most Important News

Removing rows based on multiple lists in Python

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Share this:

Like this:

Leave a ReplyCancel reply

Keep Up to Date with the Most Important News

Read more

mypy error when reshaping an array with a list of tuples

Alternating messages on chatbot creation

Flex/Lex unrecognized rule for paranthesis'

can't save the image after using PIL Image.thumbnail

jQuery: Regex to validate US postal doesn't work

Cannot convert from out System.Guid? to System.Guid

Discover more from Dev solutions