Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas: replace value in col1 with value in col2 based on filter / boolean test

I’m working with a small dataframe (9K rows) that has county and city fields (=columns). Some rows have NaN values for the county but have county names in the city column.

I can find these by doing

filter1 = df['county'].isna()
filter2 = df['city'].str.contains('County')
df[ filter1 & filter2 ]

For rows that match this criteria, I want assign the city value to the county field. This stackoverflow discusses how to do this, but I can’t get any of the examples to work.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I’ve tried a variety of things:

1. df[filter1 & filter2]['county'] = df[filter1 & filter2]['city']

2. df['county'] = df.apply(lambda x: x['city'] if ( x['county'].isna() & 'County' in x['city'] ) else x['county'] , axis=1 )

3. df['county'] = np.where(df['county']=='unknown' & df['city'].str.contains('County'), df['city'], df['county']  )

I get a variety of errors:

  1. A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead

Helpful feedback, but I couldn’t figure out the syntax for using .loc for this. I got as far as

df.loc[df['county'].isna() & df['city'].str.contains('County),'county'] = ???

I don’t know if the .loc function would work, but even if it does, how do I specify the correct / row city field?

  1. Type Error: TypeError: unsupported operand type(s) for &: ‘str’ and ‘str’

I thought this might be because county is a string field, so the lambda f’n objected to doing a .isna() test on it. I changed the NaNs to "unknown" but I still get the error. (I’ve seen a lot of comments about the slowness of .apply() but on a 9K table, it should be fine.)

  1. Raises no errors, but doesn’t actually change the county field!

This seems like a pretty basic operation, but I can’t figure out the correct syntax to get it to make the change. What am I doing wrong?

>Solution :

You can obtain the indexes of the filter you created and then use .loc to make the desired change:

filter1 = df['county'].isna()
filter2 = df['city'].str.contains('County')
idx = df[ filter1 & filter2 ].index
df.loc[idx, "county"] = df.loc[idx, "city"]
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading