I apologize for the potentially confusing title, but I will try to explain my situation as best I can.
Let’s say I have a hypothetical DataFrame df, which has an id column and is arranged like…
time id x y
1.0 0 5 9
2.0 1 6 8
3.0 2 7 7
4.0 1 8 6
Now let’s say I want only the data from rows where df['id'] == 1, but instead of dropping the other rows I fill their values with NaN, like this…
time id x y
1.0 0 NaN NaN
2.0 1 6 8
3.0 2 NaN NaN
4.0 1 8 6
Note that I specifically want to keep the time and id columns, and only change the values of x and y to NaN for any rows whose id is not 1.
My first attempt was to use DataFrame.groupby(), but that drops any rows without the specific id value entirely, which I don’t want. My next instinct is to go through df row by row, check the id column, and set the values to NaN manually if id != 1, but this seems like a very cumbersome and un-Pythonic way of doing it.
Any ideas?
Thanks in advance!
>Solution:
You can use simple selection with a boolean mask:
df.loc[df['id'].ne(1), ['x', 'y']] = float('nan')
Output:
time id x y
0 1.0 0 NaN NaN
1 2.0 1 6.0 8.0
2 3.0 2 NaN NaN
3 4.0 1 8.0 6.0
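A minimal, self-contained sketch of the approach, reconstructing the example DataFrame from the question (`np.nan` is used here, which is interchangeable with `float('nan')`):

```python
import pandas as pd
import numpy as np

# Rebuild the example DataFrame from the question
df = pd.DataFrame({
    'time': [1.0, 2.0, 3.0, 4.0],
    'id':   [0, 1, 2, 1],
    'x':    [5, 6, 7, 8],
    'y':    [9, 8, 7, 6],
})

# Boolean mask: True for rows whose id is NOT 1
mask = df['id'].ne(1)  # equivalent to df['id'] != 1

# Assign NaN to x and y only on the masked rows;
# time and id are left untouched
df.loc[mask, ['x', 'y']] = np.nan

print(df)
```

Note that x and y become float columns after the assignment, since integer columns can’t hold NaN (unless you use a nullable dtype like `Int64`).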