One common thing people seem to want to do in pandas is to replace None values with the next or previous non-None value. This is easily done with .fillna. However, I want to do something similar but different.
I have a dataframe, df, with some entries. Every row has a different number of entries, and they are all "left-adjusted": if the df is 10 columns wide and some row has n<10 entries, the first n columns hold the entries and the remaining columns are None.
What I want to do is find the last non-None entry in every row and change it to None as well. This could be in any of the columns, from the first to the last.
I could of course do this with a for-loop, but my dfs can be quite large, so something quicker would be preferable. Any ideas?
Thanks!
>Solution :
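With help from NumPy, this is quite easy. Because the rows are left-adjusted, counting the nulls in each row tells you where the last non-null value sits: its column position is the number of columns minus the null count, minus one. NumPy indexing can then set that value to None (stored as NaN in a float column). First, some sample data: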
import numpy as np
import pandas as pd
data = np.random.random((4, 6))
df = pd.DataFrame(data)
# Blank out the tail of each row so the rows are left-adjusted
df.iloc[0, 3:] = None
df.iloc[1, 5:] = None
df.iloc[2, 3:] = None
df.iloc[3, 4:] = None
Original dataframe looks like this:
          0         1         2         3         4    5
0  0.992337  0.651785  0.521422       NaN       NaN  NaN
1  0.912962  0.292458  0.620195  0.507071  0.010205  NaN
2  0.061320  0.565979  0.344755       NaN       NaN  NaN
3  0.521936  0.057917  0.359699  0.484009       NaN  NaN
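# Rows are left-adjusted, so the last filled column is (width - number of nulls - 1)
col = df.shape[1] - df.isnull().sum(axis=1).to_numpy() - 1
values = df.to_numpy(copy=True)  # df.values can be a copy or read-only, so edit an explicit copy
values[np.arange(len(df)), col] = np.nan
df = pd.DataFrame(values, index=df.index, columns=df.columns)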
Updated dataframe looks like this:
          0         1         2         3    4    5
0  0.992337  0.651785       NaN       NaN  NaN  NaN
1  0.912962  0.292458  0.620195  0.507071  NaN  NaN
2  0.061320  0.565979       NaN       NaN  NaN  NaN
3  0.521936  0.057917  0.359699       NaN  NaN  NaN
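If you would rather not write into the underlying array at all, the same counting idea can probably be expressed with a boolean mask and DataFrame.mask. The following is only a sketch: the function name blank_last_valid is made up for illustration, and it assumes the rows are left-adjusted as described in the question. It returns a new dataframe and leaves rows that are already entirely null untouched.

```python
import numpy as np
import pandas as pd


def blank_last_valid(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the last non-null entry of each row set to NaN.

    Hypothetical helper; assumes rows are left-adjusted (all nulls sit to
    the right of the values).
    """
    # Number of filled cells per row; the last filled cell is at position count - 1.
    counts = df.notna().sum(axis=1).to_numpy()
    # Boolean mask that is True only at (row, count - 1). For an all-null row,
    # count - 1 is -1, which matches no column position, so nothing changes.
    positions = np.arange(df.shape[1])
    mask = positions[None, :] == (counts - 1)[:, None]
    # DataFrame.mask replaces entries where the condition is True with NaN.
    return df.mask(mask)


df = pd.DataFrame(
    [
        [1.0, 2.0, 3.0, np.nan],
        [4.0, np.nan, np.nan, np.nan],
        [np.nan, np.nan, np.nan, np.nan],
        [5.0, 6.0, 7.0, 8.0],
    ]
)
result = blank_last_valid(df)
print(result)
```

Building the mask by broadcasting keeps everything vectorized, so it should scale to large dataframes about as well as the indexing approach above.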