Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Create a binary table with presence/absence from groups in pandas

I have a dataframe such as:

Groups Name             
G1     BLOC1_Lgth
G1     Canis_lupus
G1     Rattus_rattus
G2     Apis_mellifera
G2     Apis_terrestris
G3     Canis_lupus
G4     Bombus_bobomus
G4     Rattus_rattus
G4     Canis_lupus
G4     Apis_mellifera
G5     BLOC4_Lgth
G5     Apis_terrestris
G6     BLOC5_Lgth
G7     BLOC2_Lgth
G7     Apis_mellifera
G7     Apis_terrestris 
G7     Canis_lupus
G7     Rattus_rattus
G8     BLOC3_Lgth

and a list such as :

A_list=['BLOC1_Lgth','BLOC2_Lgth','BLOC3_Lgth','BLOC4_Lgth','BLOC5_Lgth']

And I would like to create a binary table where for each elements in A_list , if a Name is present in the same Group, I fill a column by 1 value.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

For instance, we see that BLOC1_Lgth is in G1, where I found two different Name values :

Canis_lupus and Rattus_rattus, so I add the column and add 1 to both such as:

A_list_values    Groups  Canis_lupus Rattus_rattus Apis_mellifera Apis_terrestris 
BLOC1_Lgth       G1      1           1             0              0
 

and zero for the other Name not present in that Group (Apis_mellifera and Apis_terrestris)

Expected results:

A_list_values    Groups  Canis_lupus Rattus_rattus Apis_mellifera Apis_terrestris 
BLOC1_Lgth       G1      1           1             0              0
BLOC2_Lgth       G7      1           1             1              1
BLOC3_Lgth       G8      0           0             0              0
BLOC4_Lgth       G5      0           0             0              1
BLOC5_Lgth       G6      0           0             0              0

Here is that dataframe in dict format if it can helps :

{'Groups': {0: 'G1', 1: 'G1', 2: 'G1', 3: 'G2', 4: 'G2', 5: 'G3', 6: 'G4', 7: 'G4', 8: 'G4', 9: 'G4', 10: 'G5', 11: 'G5', 12: 'G6', 13: 'G7', 14: 'G7', 15: 'G7', 16: 'G7', 17: 'G7', 18: 'G8'}, 'Name  ': {0: 'BLOC1_Lgth', 1: 'Canis_lupus', 2: 'Rattus_rattus', 3: 'Apis_mellifera', 4: 'Apis_terrestris', 5: 'Canis_lupus', 6: 'Bombus_bobomus', 7: 'Rattus_rattus', 8: 'Canis_lupus', 9: 'Apis_mellifera', 10: 'BLOC4_Lgth', 11: 'Apis_terrestris', 12: 'BLOC5_Lgth', 13: 'BLOC2_Lgth', 14: 'Apis_mellifera', 15: 'Apis_terrestris ', 16: 'Canis_lupus', 17: 'Rattus_rattus', 18: 'BLOC3_Lgth'}}

>Solution :

Use:

#typo in Apis_terrestris - removed last space
df = pd.DataFrame({'Groups': {0: 'G1', 1: 'G1', 2: 'G1', 3: 'G2', 4: 'G2', 5: 'G3', 6: 'G4', 7: 'G4', 8: 'G4', 9: 'G4', 10: 'G5', 11: 'G5', 12: 'G6', 13: 'G7', 14: 'G7', 15: 'G7', 16: 'G7', 17: 'G7', 18: 'G8'}, 'Name  ': {0: 'BLOC1_Lgth', 1: 'Canis_lupus', 2: 'Rattus_rattus', 3: 'Apis_mellifera', 4: 'Apis_terrestris', 5: 'Canis_lupus', 6: 'Bombus_bobomus', 7: 'Rattus_rattus', 8: 'Canis_lupus', 9: 'Apis_mellifera', 10: 'BLOC4_Lgth', 11: 'Apis_terrestris', 12: 'BLOC5_Lgth', 13: 'BLOC2_Lgth', 14: 'Apis_mellifera', 15: 'Apis_terrestris', 16: 'Canis_lupus', 17: 'Rattus_rattus', 18: 'BLOC3_Lgth'}})
  
A_list=['BLOC1_Lgth','BLOC2_Lgth','BLOC3_Lgth','BLOC4_Lgth','BLOC5_Lgth']

You can create A_list_values column by filter groups with A_list and forward missing values per groups by GroupBy.ffill, then remove NaNs groups by DataFrame.dropna and create indicator columns by get_dummies with aggregate max, last remove BLOC columns:

m = df['Name  '].isin(A_list)
df['A_list_values'] = df['Name  '].where(m).groupby(df['Groups']).ffill()
df = (pd.get_dummies(df.dropna(subset=['A_list_values']), 
                     columns=['Name  '], 
                     prefix='', 
                     prefix_sep='')
        .groupby(['A_list_values','Groups'], as_index=False)
        .max())
df = df.loc[:, ~df.columns.str.startswith('BLOC')]
print (df)
  A_list_values Groups  Apis_mellifera  Apis_terrestris  Canis_lupus  \
0    BLOC1_Lgth     G1               0                0            1   
1    BLOC2_Lgth     G7               1                1            1   
2    BLOC3_Lgth     G8               0                0            0   
3    BLOC4_Lgth     G5               0                1            0   
4    BLOC5_Lgth     G6               0                0            0   

   Rattus_rattus  
0              1  
1              1  
2              0  
3              0  
4              0  
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading