Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to group rows based on column ID in a pandas dataframe?

I have below the dataframe below df1:

ID       Label   Value
id_1     A
id_1     B
id_1     C
id_1     D
id_1     E
id_1             10
id_1             20
id_1             30
id_2     F
id_2     G
id_2     H
id_2             40
id_2             50
id_2             60
id_2             70
id_2             80
id_2             90

I would like to group the rows based on the ID column in the following way :

ID      Label   Value
id_1     A      10
id_1     B      20
id_1     C      30
id_1     D
id_1     E      
id_2     F      40
id_2     G      50
id_2     H      60
id_2            70
id_2            80
id_2            90

My goal is to always align the first value in column "Label" for a given ID, with the first value of column "Value" for the same ID (the empty cells inbetwwen numbers are expected).

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

How can I do this in the most optimal way ?

I tried with groupby feature but didn’t manage to get what I want, Im pretty sure there is an optimal to do this but can’t figure it out right now.

>Solution :

Assuming empty cells are NaN/None, you could count the number of leading empty rows in Value, of trailing ones in Label (with isna+cummin+sum), then groupby.apply to shift "Value" up and remove empty rows in the end with head:

def cust_shift(g):
    # number of leading empty rows
    n1 = g['Value'].isna().cummin().sum()
    # number of trailing empty rows
    n2 = g.loc[::-1, 'Label'].isna().cummin().sum()
    # shift Value up and remove trailing empty rows
    return g.assign(Value=g['Value'].shift(-n1)).head(-min(n1, n2))

out = df.groupby('ID', group_keys=False)[list(df)].apply(cust_shift)

Output:

      ID Label  Value
0   id_1     A   10.0
1   id_1     B   20.0
2   id_1     C   30.0
3   id_1     D    NaN
4   id_1     E    NaN
8   id_2     F   40.0
9   id_2     G   50.0
10  id_2     H   60.0
11  id_2  None   70.0

Reproducible input:

from numpy import nan
df = pd.DataFrame({'ID': ['id_1', 'id_1', 'id_1', 'id_1', 'id_1', 'id_1', 'id_1', 'id_1',
                          'id_2', 'id_2', 'id_2', 'id_2', 'id_2', 'id_2', 'id_2'],
                   'Label': ['A', 'B', 'C', 'D', 'E', None, None, None, 'F', 'G', 'H', None, None, None, None],
                   'Value': [nan, nan, nan, nan, nan, 10.0, 20.0, 30.0, nan, nan, nan, 40.0, 50.0, 60.0, 70.0]})

empty cells are empty strings

If your empty cells are in fact empty strings, just adapt the above code to use eq('') in place of isna and add fill_value='' in shift:

def cust_shift(g):
    # number of leading empty rows
    n1 = g['Value'].eq('').cummin().sum()
    # number of trailing empty rows
    n2 = g.loc[::-1, 'Label'].eq('').cummin().sum()
    return (g.assign(Value=g['Value'].shift(-n1, fill_value=''))
             .head(-min(n1, n2))
            )

out = (df.groupby('ID', group_keys=False)[list(df)]
         .apply(cust_shift)
      )

Output:

      ID Label Value
0   id_1     A    10
1   id_1     B    20
2   id_1     C    30
3   id_1     D      
4   id_1     E      
8   id_2     F    40
9   id_2     G    50
10  id_2     H    60
11  id_2          70

Alternative input:

df = pd.DataFrame({'ID': ['id_1', 'id_1', 'id_1', 'id_1', 'id_1', 'id_1', 'id_1', 'id_1',
                          'id_2', 'id_2', 'id_2', 'id_2', 'id_2', 'id_2', 'id_2'],
                   'Label': ['A', 'B', 'C', 'D', 'E', '', '', '', 'F', 'G', 'H', '', '', '', ''],
                   'Value': ['', '', '', '', '', 10, 20, 30, '', '', '', 40, 50, 60, 70]})
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading