How to replace nan / fillna per group in a pandas dataframe?

I have the following data:

               type           group
0           Drought  Climatological
1               nan  Climatological
2         Explosion   Technological
3   Ground movement     Geophysical
4               nan     Geophysical
5          Ash fall     Geophysical
6          Rockfall     Geophysical
7          Ash fall     Geophysical
8               nan   Technological
9         Explosion   Technological
10              nan  Meteorological

import pandas as pd

data_pd = pd.DataFrame({'type':['Drought','nan','Explosion','Ground movement','nan','Ash fall','Rockfall','Ash fall','nan','Explosion','nan'],
                        'group':['Climatological','Climatological','Technological','Geophysical','Geophysical',
                        'Geophysical','Geophysical','Geophysical','Technological','Technological','Meteorological']})

How can I replace the 'nan' depending on the group?

Below is my current approach:

I want to replace the 'nan' strings in the type column with an alternative string that depends on the value in the group column.

Here is a sample of data from my dataset where it ceases to work. It was an output of DataFrame.to_dict(), which I wanted to keep as it is to replicate my issue:

for ty, go in zip(data_pd['type'].values, data_pd['group'].values):
    if ty == 'nan' and go == 'Climatological':
        #ty = ['Drought']
        print(ty) #prints nothing as it did not work

>Solution :

Do NOT iterate for this kind of task; it is inefficient!

You can build a boolean mask and use Series.mask to replace only the matching rows:

data_pd['type'] = data_pd['type'].mask(data_pd['type'].eq('nan') & data_pd['group'].eq('Climatological'), 'Drought')

output:

               type           group
0           Drought  Climatological
1           Drought  Climatological
2         Explosion   Technological
3   Ground movement     Geophysical
4               nan     Geophysical
5          Ash fall     Geophysical
6          Rockfall     Geophysical
7          Ash fall     Geophysical
8               nan   Technological
9         Explosion   Technological
10              nan  Meteorological
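
The same conditional replacement can also be written as a boolean-indexed assignment with .loc, which modifies the column in place. This is only a minimal sketch on the sample data from the question; the variable name cond is introduced here for illustration:

```python
import pandas as pd

data_pd = pd.DataFrame({
    'type': ['Drought', 'nan', 'Explosion', 'Ground movement', 'nan', 'Ash fall',
             'Rockfall', 'Ash fall', 'nan', 'Explosion', 'nan'],
    'group': ['Climatological', 'Climatological', 'Technological', 'Geophysical',
              'Geophysical', 'Geophysical', 'Geophysical', 'Geophysical',
              'Technological', 'Technological', 'Meteorological'],
})

# True only where 'type' is the literal string 'nan' AND the group is 'Climatological'
cond = data_pd['type'].eq('nan') & data_pd['group'].eq('Climatological')

# Overwrite 'type' on the selected rows; all other rows are left untouched
data_pd.loc[cond, 'type'] = 'Drought'
```

Note that the values here are the string 'nan', not real missing values; if the column held actual NaN, the first condition would presumably need data_pd['type'].isna() instead of .eq('nan').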

For a much cleaner solution, if your objective is to fillna per group, you could use a dictionary and groupby:

subs = {'Climatological': 'Drought', 'Technological': 'foo'}

(data_pd.replace('nan', pd.NA)
        # group_keys=False keeps the original index instead of prepending the group key
        .groupby('group', group_keys=False)
        .apply(lambda g: g.fillna(subs.get(g.name, 'nan')))
)

output:

               type           group
0           Drought  Climatological
1           Drought  Climatological
2         Explosion   Technological
3   Ground movement     Geophysical
4               nan     Geophysical
5          Ash fall     Geophysical
6          Rockfall     Geophysical
7          Ash fall     Geophysical
8               foo   Technological
9         Explosion   Technological
10              nan  Meteorological
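
If you would rather avoid groupby.apply, a possible alternative is to look up the substitute for each row with Series.map and then apply it with a mask. This is a sketch under the same assumptions as above (sample data from the question, the subs dictionary from this answer); the names fill_values, is_nan_string, has_substitute and result are made up for illustration:

```python
import pandas as pd

data_pd = pd.DataFrame({
    'type': ['Drought', 'nan', 'Explosion', 'Ground movement', 'nan', 'Ash fall',
             'Rockfall', 'Ash fall', 'nan', 'Explosion', 'nan'],
    'group': ['Climatological', 'Climatological', 'Technological', 'Geophysical',
              'Geophysical', 'Geophysical', 'Geophysical', 'Geophysical',
              'Technological', 'Technological', 'Meteorological'],
})

subs = {'Climatological': 'Drought', 'Technological': 'foo'}

# Per-row substitute looked up from the group; missing for groups not listed in subs
fill_values = data_pd['group'].map(subs)

is_nan_string = data_pd['type'].eq('nan')   # rows holding the placeholder string
has_substitute = fill_values.notna()        # rows whose group has an entry in subs

# Work on a copy so the original frame stays unchanged
result = data_pd.copy()
result['type'] = result['type'].mask(is_nan_string & has_substitute, fill_values)
```

Rows whose group has no entry in subs (here Geophysical and Meteorological) keep their 'nan' string, which matches the output shown above.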