Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas Fill in Missing Row in Group with multiple keys

I’m looking to fill in a dataframe with a missing row based on a few criteria.

Data columns (total 7 columns):
 #   Column          Non-Null Count  Dtype         
---  ------          --------------  -----         
 0   keyA            39686 non-null  object        
 1   keyB            39686 non-null  int64         
 2   keyC            39686 non-null  object        
 3   keyD            39686 non-null  object        
 4   snapshot_week   39686 non-null  datetime64[ns]
 5   metric1         39686 non-null  int64        
 6   metric2         39686 non-null  int64        
dtypes: datetime64[ns](1), int64(1), object(5)

A1/B1/C1/D1 has data missing for 2022-08-20, and A3/B3/C3/D3 is missing 2022-08-27.

[['A1','B1','C1','D1','2022-08-27',5000,5000],
['A2','B2','C2','D2','2022-08-20',4278,4278],
['A2','B2','C2','D2','2022-08-27',6852,6852],
['A3','B3','C3','D3','2022-08-20',9587,9587]]

I’d like to update the data frame with the following extra rows, using 0 as the default value for metric1/metric2.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

[['A1','B1','C1','D1','2022-08-20',0,0],  # <--- 
['A1','B1','C1','D1','2022-08-27',5000,5000],
['A2','B2','C2','D2','2022-08-20',4278,4278],
['A2','B2','C2','D2','2022-08-27',6852,6852],
['A3','B3','C3','D3','2022-08-20',9587,9587],
['A3','B3','C3','D3','2022-08-27',0,0]] # <--

I’ve tried a few different techniques such as reindex, asfreq, groupby but I have yet to achieve the desired results. The snapshot_week values will only be on a Saturday, and only two dates will ever be present at a given time. Not all key permutations are needed, and some keys are numeric identifiers. Essentially, I just need to have the week over week record for the key combination with metrics defaulting to 0.

Thanks in advance for the support!

>Solution :

Try as follows:

import pandas as pd

data = [['A1','B1','C1','D1','2022-08-27',5000,5000],
['A2','B2','C2','D2','2022-08-20',4278,4278],
['A2','B2','C2','D2','2022-08-27',6852,6852],
['A3','B3','C3','D3','2022-08-20',9587,9587]]

cols = ['keyA','keyB','keyC','keyD','snapshot_week', 'metric1', 'metric2']
df = pd.DataFrame(data, columns=cols)

df_new = df.pivot(index=['keyA','keyB','keyC','keyD'], 
                  columns=['snapshot_week'], 
                  values=['metric1','metric2'])\
    .stack(dropna=False).reset_index(drop=False).fillna(0)
    
print(df_new)

  keyA keyB keyC keyD snapshot_week  metric1  metric2
0   A1   B1   C1   D1    2022-08-20      0.0      0.0
1   A1   B1   C1   D1    2022-08-27   5000.0   5000.0
2   A2   B2   C2   D2    2022-08-20   4278.0   4278.0
3   A2   B2   C2   D2    2022-08-27   6852.0   6852.0
4   A3   B3   C3   D3    2022-08-20   9587.0   9587.0
5   A3   B3   C3   D3    2022-08-27      0.0      0.0
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading