Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Get the position of an element in a list from a pandas dataframe column

I have this dataframe:

import pandas as pd
data  = {'day':[18,18,19,19,20],
        'tokens':[['a1','a2','a3','a3'],['a1','a2','a5','a6'],['b1','b2','b3'],['b0','b2','b3'],['c1','c2','c3']]}
df = pd.DataFrame(data)

Output:

    day    tokens
0   18  [a1, a2, a3, a3]
1   18  [a1, a2, a5, a6]
2   19  [b1, b2, b3]
3   19  [b0, b2, b3]
4   20  [c1, c2, c3]

I want to get the position of each element in the list in the column of tokens, and I want to get the position both from the beginning and from the end (to get two positions for each element):
Those for element a1 in the first row will be 0 first and 3 from the end (I want the positions to be row by row).

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

     day    tokens  position
    0   18    a1     0
    1   18    a1     3
    2   18    a2     1
    3   18    a2     2
    4   18    a3     2
    5   18    a3     1
    6   18    a4     3
    7   18    a4     0
...

>Solution :

You can try assign each index from the beginning and from the end to postion column first

df['position'] = df['tokens'].apply(lambda lst: list(zip(range(len(lst)), reversed(range(len(lst))))))
print(df)

   day            tokens                          position
0   18  [a1, a2, a3, a3]  [(0, 3), (1, 2), (2, 1), (3, 0)]
1   18  [a1, a2, a5, a6]  [(0, 3), (1, 2), (2, 1), (3, 0)]
2   19      [b1, b2, b3]          [(0, 2), (1, 1), (2, 0)]
3   19      [b0, b2, b3]          [(0, 2), (1, 1), (2, 0)]
4   20      [c1, c2, c3]          [(0, 2), (1, 1), (2, 0)]

The explode multiple list columns to rows:

df = (df.set_index(['day'])
      .apply(pd.Series.explode).reset_index()
      .drop_duplicates(['tokens', 'position'])
      .apply(pd.Series.explode).reset_index(drop=True)
      .sort_values(['day', 'tokens']))
print(df)

    day tokens position
0    18     a1        0
1    18     a1        3
2    18     a2        1
3    18     a2        2
4    18     a3        2
5    18     a3        1
6    18     a3        3
7    18     a3        0
8    18     a5        2
9    18     a5        1
10   18     a6        3
11   18     a6        0
18   19     b0        0
19   19     b0        2
12   19     b1        0
13   19     b1        2
14   19     b2        1
15   19     b2        1
16   19     b3        2
17   19     b3        0
20   20     c1        0
21   20     c1        2
22   20     c2        1
23   20     c2        1
24   20     c3        2
25   20     c3        0
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading