Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas : fill NaN with the closest value, according to a category column

Let’s take a sample dataframe :

df = pd.DataFrame({"Date": ["2022-10-01","2022-10-02","2022-10-03","2022-10-04","2022-10-05","2022-10-06","2022-10-01","2022-10-02","2022-10-03","2022-10-04","2022-10-05","2022-10-06"],
                   "Animal" :["Cat","Cat","Cat","Cat","Cat","Cat","Dog","Dog","Dog","Dog","Dog","Dog"],
                   "Quantity":[np.nan,4,3,5,1,np.nan,6,5,np.nan,np.nan,2,1]})

          Date Animal  Quantity
0   2022-10-01    Cat       NaN
1   2022-10-02    Cat       4.0
2   2022-10-03    Cat       3.0
3   2022-10-04    Cat       5.0
4   2022-10-05    Cat       1.0
5   2022-10-06    Cat       NaN
6   2022-10-01    Dog       6.0
7   2022-10-02    Dog       5.0
8   2022-10-03    Dog       NaN
9   2022-10-04    Dog       NaN
10  2022-10-05    Dog       2.0
11  2022-10-06    Dog       1.0

I would like to fill the NaN values in the column Quantity using the following method :

  • Replace the NaN values with the closest value that is before the NaN value and which share the same value in Animal column
  • If there is still some NaN values, replace the remaining NaN values with the closest value that is after the Nan value and which share the same value in Animal column

I thought to Series.interpolate but I don’t know how to deal with the Animal column. Would you please know an efficient way to reach the expected output ?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Expected output :

          Date Animal  Quantity
0   2022-10-01    Cat         4
1   2022-10-02    Cat         4
2   2022-10-03    Cat         3
3   2022-10-04    Cat         5
4   2022-10-05    Cat         1
5   2022-10-06    Cat         1
6   2022-10-01    Dog         6
7   2022-10-02    Dog         5
8   2022-10-03    Dog         5
9   2022-10-04    Dog         5
10  2022-10-05    Dog         2
11  2022-10-06    Dog         1
``

>Solution :

You can use ffill/bfill per group:

df['Quantity'] = (df.groupby('Animal', group_keys=False)['Quantity']
                    .apply(lambda s: s.bfill().ffill())
                 )

Output:

          Date Animal  Quantity
0   2022-10-01    Cat       4.0
1   2022-10-02    Cat       4.0
2   2022-10-03    Cat       3.0
3   2022-10-04    Cat       5.0
4   2022-10-05    Cat       1.0
5   2022-10-06    Cat       1.0
6   2022-10-01    Dog       6.0
7   2022-10-02    Dog       5.0
8   2022-10-03    Dog       5.0
9   2022-10-04    Dog       5.0
10  2022-10-05    Dog       2.0
11  2022-10-06    Dog       1.0
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading