Substitute numbers in a list of type object pandas

I have a DataFrame df that looks as follows:

id  cited_ids   dummy_paper  d
2   [4]         NaN          NaN
4   [9,18,6]    NaN          NaN
6   []          9            0
7   [2]         NaN          NaN
9   [4]         7            0
14  [18,6]      3            0
18  [7]         1            0

What I would like to do is: (i) replace each entry in df['cited_ids'] with 0 whenever the id it refers to has d = 0, and (ii) set d = 1 for every row whose df['cited_ids'] list contains at least one 0 and whose d was not already 0. In other words, the first step (i) would result in:

id  cited_ids   dummy_paper  d
2   [4]         NaN          NaN
4   [0,0,6]     NaN          NaN
6   []          9            0
7   [2]         NaN          NaN
9   [4]         7            0
14  [0,6]       3            0
18  [0]         1            0

The second step (ii) would then result in:

id  cited_ids   dummy_paper  d
2   [4]         NaN          NaN
4   [0,0,6]     NaN          1
6   []          9            0
7   [2]         NaN          NaN
9   [4]         7            0
14  [0,6]       3            0
18  [0]         1            0

Please also note that df['cited_ids'] has dtype object (each cell holds a Python list).

df.to_dict() gives:

{'docdb': {0: 2, 1: 4, 2: 6, 3: 7, 4: 9, 5: 14, 6: 18},
 'cited_docdb': {0: [4],
  1: [9, 18, 6],
  2: [],
  3: [2],
  4: [4],
  5: [18, 6],
  6: [7]},
 'fronteer': {0: nan, 1: nan, 2: 9.0, 3: nan, 4: 7.0, 5: 3.0, 6: 1.0},
 'distance': {0: nan, 1: nan, 2: 0.0, 3: nan, 4: 0.0, 5: 0.0, 6: 0.0}}
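Note that the keys in this dictionary (docdb, cited_docdb, fronteer, distance) differ from the column names in the tables above. A minimal sketch for rebuilding the frame, assuming the columns correspond in the order shown (that mapping is a guess, not something stated in the question):

```python
import pandas as pd

nan = float("nan")

# Dictionary as printed by df.to_dict() in the question
data = {
    "docdb": {0: 2, 1: 4, 2: 6, 3: 7, 4: 9, 5: 14, 6: 18},
    "cited_docdb": {0: [4], 1: [9, 18, 6], 2: [], 3: [2],
                    4: [4], 5: [18, 6], 6: [7]},
    "fronteer": {0: nan, 1: nan, 2: 9.0, 3: nan, 4: 7.0, 5: 3.0, 6: 1.0},
    "distance": {0: nan, 1: nan, 2: 0.0, 3: nan, 4: 0.0, 5: 0.0, 6: 0.0},
}

# Assumed mapping from the real column names to the ones used in the tables
df = pd.DataFrame(data).rename(columns={
    "docdb": "id",
    "cited_docdb": "cited_ids",
    "fronteer": "dummy_paper",
    "distance": "d",
})

print(df.dtypes)  # cited_ids is an object column whose cells are Python lists
```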

Thank you

Solution:

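The exact logic is unclear and your expected output doesn't seem to match the description (for example, id 6 has d = 0 but is left as 6 in the lists, while id 7 has no d but is replaced by 0), but IIUC: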

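import pandas as pd

# Map each id to its d value, keeping only ids whose d is known
s = df.set_index('id')['d'].dropna().convert_dtypes()

# (i) replace every cited id that has a known d with that d (0 here)
df['cited_ids'] = [[s.get(i, i) for i in x]
                   for x in df['cited_ids']]

# (ii) rows whose list now contains a 0, as a boolean Series aligned with df
m = pd.Series([0 in x for x in df['cited_ids']], index=df.index)

# set d = 1 only where d is still missing
df.loc[m & df['d'].isna(), 'd'] = 1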

output:

   id  cited_ids  dummy_paper    d
0   2        [4]          NaN  NaN
1   4  [0, 0, 0]          NaN  1.0
2   6         []          9.0  0.0
3   7        [2]          NaN  NaN
4   9        [4]          7.0  0.0
5  14     [0, 0]          3.0  0.0
6  18        [7]          1.0  0.0
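
For reference, the same two steps can be wrapped in a function that leaves the input frame untouched. This is only a sketch of the approach above under the same interpretation of the question: the name substitute_and_flag is made up for illustration, and it assumes every non-missing d is a whole number.

```python
import numpy as np
import pandas as pd


def substitute_and_flag(df):
    """Return a copy of df with steps (i) and (ii) applied (hypothetical helper)."""
    out = df.copy()

    # Lookup table id -> d for rows where d is known (assumes integer-valued d)
    known = out.set_index("id")["d"].dropna().astype(int).to_dict()

    # (i) swap each cited id for its d when one is known, else keep the id
    out["cited_ids"] = [[known.get(i, i) for i in ids]
                        for ids in out["cited_ids"]]

    # (ii) rows whose list contains a 0 and whose d is still missing get d = 1
    has_zero = pd.Series([0 in ids for ids in out["cited_ids"]],
                         index=out.index)
    out.loc[has_zero & out["d"].isna(), "d"] = 1
    return out


df = pd.DataFrame({
    "id": [2, 4, 6, 7, 9, 14, 18],
    "cited_ids": [[4], [9, 18, 6], [], [2], [4], [18, 6], [7]],
    "dummy_paper": [np.nan, np.nan, 9, np.nan, 7, 3, 1],
    "d": [np.nan, np.nan, 0, np.nan, 0, 0, 0],
})

result = substitute_and_flag(df)
print(result)
```

Because the list column is rebuilt with new lists rather than edited in place, the original df keeps its cited_ids unchanged.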