Home Transforme a dataframe with a column for each value within a colum in pandas

Questions

Transforme a dataframe with a column for each value within a colum in pandas

July 13, 2022

I have a dataframe such as :

Groups  Species_1 Species_2
G1      SP1_A     SP1
G1      SP1B      SP1
G1      SP2_AZ    SP2
G1      SP3_12:A  SP3
G1      SP4-2     SP4
G2      SP1_2     SP2
G2      SP3:21    SP3
G3      SP5(2)    SP5

And I would like to create a new dataframe with a for each Species_2 values create a column where for each Groups I add a row with the corresponding Species_1 values.

If there are multiple Species_1, I separate them with a ; in the cell as in the G1-SP1 example.

I should then get the following result;

Groups SP1          SP2     SP3      SP4   SP5
G1     SP1_A;SP1B   SP2_AZ  SP3_12:A SP4-2 NA
G2     NA           SP1_2   SP3:21   NA    NA
G3     NA           NA      NA       NA    SP5(2)

Here is the dataframe in dict format if it can help :

{'Groups': {0: 'G1', 1: 'G1', 2: 'G1', 3: 'G1', 4: 'G1', 5: 'G2', 6: 'G2', 7: 'G3'}, 'Species_1': {0: 'SP1_A', 1: 'SP1B', 2: 'SP2_AZ', 3: 'SP3_12:A', 4: 'SP4-2', 5: 'SP1_2', 6: 'SP3:21', 7: 'SP5(2)'}, 'Species_2': {0: 'SP1', 1: 'SP1', 2: 'SP2', 3: 'SP3', 4: 'SP4', 5: 'SP2', 6: 'SP3', 7: 'SP5'}}

Thanks a lot for your help

>Solution :

Try with pivot_table

out =df.pivot_table(index ='Groups', columns = 'Species_2',values = 'Species_2',aggfunc = ';'.join).reset_index()
Out[73]: 
Species_2 Groups         SP1     SP2       SP3    SP4     SP5
0             G1  SP1_A;SP1B  SP2_AZ  SP3_12:A  SP4-2     NaN
1             G2         NaN   SP1_2    SP3:21    NaN     NaN
2             G3         NaN     NaN       NaN    NaN  SP5(2)