Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas: Values to columns and then group and merge by same Id

I have a dataframe like this

df = DataFrame({'Id':[1,2,3,3,4,5,6,6,6],
             'Type': ['T1','T1','T2','T3','T2','T1','T1','T2','T3'],
             'Duration':[5,10,5,7,5,10,15,20,15]})
df
   Id   Type    Duration
0   1   T1      5
1   2   T1      10
2   3   T2      5
3   3   T3      7
4   4   T2      5
5   5   T1      10
6   6   T1      15
7   6   T2      20
8   6   T3      15

I want to create new columns based on the unique values from Type with values from Duration and then make it one row for each Id

col_list = df.Type.unique().tolist()
df[col_list] = nan

def fill_values(duration):
    return duration

for col in col_list:
    df[col] = df['Duration'].loc[df['Type'] == col].apply(fill_values)

Output:
    Id  Type    Duration    T1  T2  T3
0   1   T1      5           5.0 NaN NaN
1   2   T1      10          10.0 NaN NaN
2   3   T2      5           NaN 5.0 NaN
3   3   T3      7           NaN NaN 7.0
4   4   T2      5           NaN 5.0 NaN
5   5   T1      10          10.0 NaN NaN
6   6   T1      15          15.0 NaN NaN
7   6   T2      20          NaN 20.0 NaN
8   6   T3      15          NaN NaN 15.0

Q1: Is there a better way to do this

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Q2: How to combine the rows with same Id in to one?

Expected output:

   Id   T1  T2  T3
0   1   5   NaN NaN 
1   2   10  NaN NaN 
2   3   NaN 5   7
3   4   NaN 5   NaN 
4   5   10  NaN NaN 
5   6   15  20   15

Any help appreciated.

>Solution :

Q1: Is there a better way to do this
Blockquote

A1: yes, use pivot_table

Q2: How to combine the rows with same Id in to one?

A2: Like this:

df = df.pivot_table(index='Id',columns='Type',values='Duration').rename_axis(None, axis=1).reset_index() 



expected result:

    Id  T1     T2      T3
0   1   5.0    NaN     NaN
1   2   10.0   NaN     NaN
2   3   NaN    5.0     7.0
3   4   NaN    5.0     NaN
4   5   10.0   NaN     NaN
5   6   15.0   20.0    15.0
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading