I have a dataframe such as :
Groups Species_1 Species_2
G1 SP1_A SP1
G1 SP1B SP1
G1 SP2_AZ SP2
G1 SP3_12:A SP3
G1 SP4-2 SP4
G2 SP1_2 SP2
G2 SP3:21 SP3
G3 SP5(2) SP5
And I would like to create a new dataframe with a for each Species_2 values create a column where for each Groups I add a row with the corresponding Species_1 values.
If there are multiple Species_1, I separate them with a ; in the cell as in the G1-SP1 example.
I should then get the following result;
Groups SP1 SP2 SP3 SP4 SP5
G1 SP1_A;SP1B SP2_AZ SP3_12:A SP4-2 NA
G2 NA SP1_2 SP3:21 NA NA
G3 NA NA NA NA SP5(2)
Here is the dataframe in dict format if it can help :
{'Groups': {0: 'G1', 1: 'G1', 2: 'G1', 3: 'G1', 4: 'G1', 5: 'G2', 6: 'G2', 7: 'G3'}, 'Species_1': {0: 'SP1_A', 1: 'SP1B', 2: 'SP2_AZ', 3: 'SP3_12:A', 4: 'SP4-2', 5: 'SP1_2', 6: 'SP3:21', 7: 'SP5(2)'}, 'Species_2': {0: 'SP1', 1: 'SP1', 2: 'SP2', 3: 'SP3', 4: 'SP4', 5: 'SP2', 6: 'SP3', 7: 'SP5'}}
Thanks a lot for your help
>Solution :
Try with pivot_table
out =df.pivot_table(index ='Groups', columns = 'Species_2',values = 'Species_2',aggfunc = ';'.join).reset_index()
Out[73]:
Species_2 Groups SP1 SP2 SP3 SP4 SP5
0 G1 SP1_A;SP1B SP2_AZ SP3_12:A SP4-2 NaN
1 G2 NaN SP1_2 SP3:21 NaN NaN
2 G3 NaN NaN NaN NaN SP5(2)