Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Create a new column for each group and merge multiple element in a column

I have a dataframe such as

Groups  Species        Assignation
G1      Homo_sapiens   assign1
G1      Homo_sapiens   assign2
G1      Homo_sapiens   assign3
G1      Danio_rerio    NA
G1      Mus_musculus   NA
G1      Apis_mellifera NA 
G2      Danio_rerio    NA
G2      Homo_sapiens   assign4
G2      Cavius_percel  NA
G3      Danio_rerio    NA
G3      Mus_musculus   NA

And I would like to create a New_assignation column where for each Groups if there is multiple Assignation, I merge them separated by a "-" such as:

Groups  Species        Assignation   New_assignation
G1      Homo_sapiens   assign1       assign1-assign2-assign3
G1      Homo_sapiens   assign2       assign1-assign2-assign3
G1      Homo_sapiens   assign3       assign1-assign2-assign3
G1      Danio_rerio    NA            assign1-assign2-assign3
G1      Mus_musculus   NA            assign1-assign2-assign3
G1      Apis_mellifera NA            assign1-assign2-assign3
G2      Danio_rerio    NA            assign4
G2      Homo_sapiens   assign4       assign4
G2      Cavius_percel  NA            assign4
G3      Danio_rerio    NA            NA
G3      Mus_musculus   NA            NA

Here is the dict format of the dataframe if it can helps:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

{'Groups ': {0: 'G1', 1: 'G1', 2: 'G1', 3: 'G1', 4: 'G1', 5: 'G1', 6: 'G2', 7: 'G2', 8: 'G2', 9: 'G3', 10: 'G3'}, 'Species': {0: 'Homo_sapiens', 1: 'Homo_sapiens', 2: 'Homo_sapiens', 3: 'Danio_rerio', 4: 'Mus_musculus', 5: 'Apis_mellifera', 6: 'Danio_rerio ', 7: 'Homo_sapiens', 8: 'Cavius_percel', 9: 'Danio_rerio', 10: 'Mus_musculus'}, 'Assignation': {0: 'assign1', 1: 'assign2', 2: 'assign3', 3: nan, 4: nan, 5: 'NA ', 6: nan, 7: 'assign4', 8: nan, 9: nan, 10: nan}}

>Solution :

Use GroupBy.transform with lambda function for join and remove missing values:

f = lambda x: '-'.join(x.dropna())
df['New_assignation'] = df.groupby('Groups')['Assignation'].transform(f).replace('', np.nan)
print (df)
   Groups         Species Assignation          New_assignation
0      G1    Homo_sapiens     assign1  assign1-assign2-assign3
1      G1    Homo_sapiens     assign2  assign1-assign2-assign3
2      G1    Homo_sapiens     assign3  assign1-assign2-assign3
3      G1     Danio_rerio         NaN  assign1-assign2-assign3
4      G1    Mus_musculus         NaN  assign1-assign2-assign3
5      G1  Apis_mellifera         NaN  assign1-assign2-assign3
6      G2     Danio_rerio         NaN                  assign4
7      G2    Homo_sapiens     assign4                  assign4
8      G2   Cavius_percel         NaN                  assign4
9      G3     Danio_rerio         NaN                      NaN
10     G3    Mus_musculus         NaN                      NaN

If need also remove duplicated values in original order use dict.fromkeys trick:

print (df)
   Groups         Species Assignation
0      G1    Homo_sapiens     assign1
1      G1    Homo_sapiens     assign1
2      G1    Homo_sapiens     assign3
3      G1     Danio_rerio         NaN
4      G1    Mus_musculus         NaN
5      G1  Apis_mellifera         NaN
6      G2     Danio_rerio         NaN
7      G2    Homo_sapiens     assign4
8      G2   Cavius_percel         NaN
9      G3     Danio_rerio         NaN
10     G3    Mus_musculus         NaN

f = lambda x: '-'.join(dict.fromkeys(x.dropna()))
df['New_assignation'] = df.groupby('Groups')['Assignation'].transform(f).replace('', np.nan)
print (df)
   Groups         Species Assignation  New_assignation
0      G1    Homo_sapiens     assign1  assign1-assign3
1      G1    Homo_sapiens     assign1  assign1-assign3
2      G1    Homo_sapiens     assign3  assign1-assign3
3      G1     Danio_rerio         NaN  assign1-assign3
4      G1    Mus_musculus         NaN  assign1-assign3
5      G1  Apis_mellifera         NaN  assign1-assign3
6      G2     Danio_rerio         NaN          assign4
7      G2    Homo_sapiens     assign4          assign4
8      G2   Cavius_percel         NaN          assign4
9      G3     Danio_rerio         NaN              NaN
10     G3    Mus_musculus         NaN              NaN
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading