Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas : Create new column based on text value of another column

This might be very simple question, but here’s my dataframe:

    id      text                    position        labels
0   39088   skin melanoma           [58.0, 71.0]    indication
1   39088   proteinase              [137.0, 147.0]  protein
2   39088   plasminogen activator   [170.0, 191.0]  protein
3   39088   NaN                     [nan, nan]      NaN
4   39088   NaN                     [nan, nan]      NaN
5   39088   proteinase substrates   [36.0, 57.0]    protein
6   39088   tumors                  [67.0, 73.0]    indication
7   39088   NaN                     [nan, nan]      NaN
8   39088   Melanoma                [0.0, 8.0]      indication
9   39088   EDTA                    [172.0, 176.0]  protein
{'pmid': [39088,
  39088,
  39088,
  39088,
  39088,
  39088,
  39088,
  39088,
  39088,
  39088],
 'text': ['skin melanoma',
  'proteinase',
  'plasminogen activator',
  nan,
  nan,
  'proteinase substrates',
  'tumors',
  nan,
  'Melanoma',
  'EDTA'],
 'position': ['[58.0, 71.0]',
  '[137.0, 147.0]',
  '[170.0, 191.0]',
  '[nan, nan]',
  '[nan, nan]',
  '[36.0, 57.0]',
  '[67.0, 73.0]',
  '[nan, nan]',
  '[0.0, 8.0]',
  '[172.0, 176.0]'],
 'labels': ['indication',
  'protein',
  'protein',
  nan,
  nan,
  'protein',
  'indication',
  nan,
  'indication',
  'protein']}

And here’s the WANTED OUTPUT , where I want to create 2 new columns based on the values of the labels column, and put as values the corresponding text and position depending on whether they are indication or protein, and for the rest NaN.

    id      indication     indication.position       protein                 protein.position 
0   39088   skin melanoma   [58.0, 71.0]             NaN                     [nan, nan]
1   39088   NaN             [nan, nan]               proteinase              [137.0, 147.0]
2   39088   NaN             [nan, nan]               plasminogen activator   [170.0, 191.0]  
3   39088   NaN             [nan, nan]               NaN                     [nan, nan]
4   39088   NaN             [nan, nan]               NaN                     [nan, nan]
5   39088   NaN             [nan, nan]               proteinase substrates   [36.0, 57.0] 
6   39088   tumors          [67.0, 73.0]             NaN                     [nan, nan]
7   39088   NaN             [nan, nan]               NaN                     [nan, nan]
8   39088   Melanoma                [0.0, 8.0]       NaN                     [nan, nan]
9   39088   NaN             [nan, nan]               EDTA                    [172.0, 176.0]     

What is the best way to do this? Can someone help?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

You can use:

out = (df
   .drop(columns=['position', 'result.value.labels'])
   .join(
 df.reset_index().dropna(subset=['result.value.labels'])
   .pivot(index='index', columns='result.value.labels', values='position')
   .reindex(df.index)
   .fillna('[nan, nan]')
   .add_suffix('.position')
   )
)

output:

    pmid      result.value.text indication.position protein.position
0  39088          skin melanoma        [58.0, 71.0]       [nan, nan]
1  39088             proteinase          [nan, nan]   [137.0, 147.0]
2  39088  plasminogen activator          [nan, nan]   [170.0, 191.0]
3  39088                    NaN          [nan, nan]       [nan, nan]
4  39088                    NaN          [nan, nan]       [nan, nan]
5  39088  proteinase substrates          [nan, nan]     [36.0, 57.0]
6  39088                 tumors        [67.0, 73.0]       [nan, nan]
7  39088                    NaN          [nan, nan]       [nan, nan]
8  39088               Melanoma          [0.0, 8.0]       [nan, nan]
9  39088                   EDTA          [nan, nan]   [172.0, 176.0]
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading