How to complete missing rows based on numeric columns?

August 8, 2024

My input is a dataframe :

import pandas as pd

df = pd.DataFrame({'ID': ['ID001', 'ID001', 'ID001', 'ID002'],
 'N1': [1, 1, 2, 1],
 'N2': [1, 2, 1, 3],
 'SUB_GROUP': [2, 2, 2, 1],
 'CHUNK': [2, 2, 2, 4]})

I’m trying to add some rows (blue ones in the image below) based on the values of the columns SUB_GROUP and CHUNK. For example for ID002, the dataframe has only one row N1=1 and N2=3 and since this id must have one subgroup with 4 units each, then we add 1/1, 1/2 and 1/4.

I was able to generate almost the new rows to add but can’t find a way to merge them in my df :

for id in df['ID'].unique():
    new_rows = []
    df1 = df.loc[df['ID'] == id]
    len_sub_group = df1['SUB_GROUP'].iat[0]
    len_chunk = df1['CHUNK'].iat[0]
    for list1 in [list(n) for n in df1[['N1', 'N2']].values]:
        list1 = list([tuple(list1)])
        new_rows = [(len_sub_group, i)
            for i in range(1, len_chunk + 1)
            if (len_sub_group, i) not in list1]

Do you guys have an idea to do that ?

>Solution :

You could use a custom groupby.apply with itertools.product and merge:

from itertools import product

cols = ['ID', 'SUB_GROUP', 'CHUNK']

def add_missing(g):
    p =product(range(1, g['SUB_GROUP'].iat[0]+1),
               range(1, g['CHUNK'].iat[0]+1),
              )
    return g.merge(
        pd.DataFrame(p,
                     columns=['N1', 'N2']),
        how='right'
    )

out = (df.groupby(cols)[list(df)]
         .apply(add_missing).drop(columns=cols)
         .reset_index(cols)[list(df)]
      )

Output:

      ID  N1  N2  SUB_GROUP  CHUNK
0  ID001   1   1          2      2
1  ID001   1   2          2      2
2  ID001   2   1          2      2
3  ID001   2   2          2      2
0  ID002   1   1          1      4
1  ID002   1   2          1      4
2  ID002   1   3          1      4
3  ID002   1   4          1      4