Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

pandas dataframe generate rows

I have a Input dataframe:

import pandas as pd

# Define the input data
data = {
    'ID':  [500, 200, 300],
    'A': [3, 3, ''],
    'B': [3, 1, ''],
    'C': [2, '' ,''],
    'D': ['', 2, 1],
    'E': ['', '',2 ],
}

# Convert the input data to a Pandas DataFrame
df = pd.DataFrame(data)

Input table

enter image description here

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I need to transform this input as you can see in below output example:

enter image description here

If you have any ideas please share. Thank you very much!

>Solution :

You can repeat the rows using the max number per row, then compare to a groupby.cumcount to map the 1/'':

tmp = df.set_index('ID').apply(pd.to_numeric, errors='coerce')

out = (
 tmp.loc[tmp.index.repeat(tmp.max(axis=1))]
    .pipe(lambda d: d.gt(d.groupby(level='ID').cumcount(), axis=0))
    .replace({True: 1, False: ''})
    .reset_index()
)

NB. you can simplify the code if you use NaN in place of empty strings.

Output:

    ID  A  B  C  D  E
0  500  1  1  1      
1  500  1  1  1      
2  500  1  1         
3  200  1  1     1   
4  200  1        1   
5  200  1            
6  300           1  1
7  300              1
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading