Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

rank for nan values based on group

I have dataframe with column d1 and now i am trying calculate ‘out’ column after ranking that column when there in ‘nan’ value with in a column.

  data_input = {'Name':['Renault', 'Renault', 'Renault', 'Renault','Renault','Renault','Renault','Renault','Renault','Renault','Renault','Renault','Renault','Renault'],
                  'type':['Duster', 'Duster', 'Duster','Duster','Duster','Duster','Duster','Triber','Triber','Triber','Triber','Triber','Triber','Triber'],
             'd1':['nan','10','10','10','nan','nan','20','20','nan','nan','30','30','30','nan']}  
    
    df_input = pd.DataFrame(data_input)
    
    data_out = {'Name':['Renault', 'Renault', 'Renault', 'Renault','Renault','Renault','Renault','Renault','Renault','Renault','Renault','Renault','Renault','Renault'],
                  'type':['Duster', 'Duster', 'Duster','Duster','Duster','Duster','Duster','Triber','Triber','Triber','Triber','Triber','Triber','Triber'],
             'd1':['nan','10','10','10','nan','nan','20','20','nan','nan','30','30','30','nan'],
             'out':[1,np.NaN,np.NaN,np.NaN,2,2,np.NaN,np.NaN,1,1,np.NaN,np.NaN,np.NaN,2]}  
    
    df_out = pd.DataFrame(data_out)

If in that particular group if nan appers before and after some values then rank should be in asscending.
ex: rank for index-0 will be 1 and index-4&5 will be 2(because there is no after values in that group)

df_out["out"] = df_out.groupby(["Name","type"])['d1'].rank(method="first")

Thanks in advance!

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

Use GroupBy.cumsum by consecutive mising values per groups:

df_out['d1'] = pd.to_numeric(df_out['d1'], errors='coerce')


m = df_out['d1'].isna()

df_out["out1"] = (df_out.assign(a = (m & ~m.shift(fill_value=False)))
                        .groupby(["Name","type"])['a']
                        .cumsum()
                        .where(m))

Alternative solution with boolean indexing:

df_out["out1"] = (df_out.assign(a = (m & ~m.shift(fill_value=False)))[m]
                        .groupby(["Name","type"])['a']
                        .cumsum())
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading