Convert tuples into grouped rows in dataframe without changing the order

I have a list of tuples that I need to convert to a DataFrame.

res1_ =  [
  ('z1', '1'),
  ('z1', '2'),
  ('x1', '1'),
  ('x2', '1'),
  ('x1', '3'),
  ('z1', '1')]

The expected DataFrame should look like this:

docid secid
z1    [1,2]
x1    [1]
x2    [1]
x1    [3]
z1    [1]

Note that the order is unchanged: when the same docid repeats in consecutive rows, their secids are merged into a single list.
Although x1 occurs twice, secids 1 and 3 are not in a single list, because docid x2 sits between the two x1 rows.

I tried:

import pandas as pd

df = pd.DataFrame(res1_, columns=['docid', 'secid'])
df.groupby('docid')['secid'].apply(list)

But no luck: this loses the original order, and both x1 rows are merged into one group.
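
To make the problem concrete, here is a small sketch of that attempt on the sample data (the name `attempt` is only for illustration). A plain groupby on docid sorts the keys by default and collects every row with the same docid, wherever it appears:

```python
import pandas as pd

res1_ = [('z1', '1'), ('z1', '2'), ('x1', '1'),
         ('x2', '1'), ('x1', '3'), ('z1', '1')]

df = pd.DataFrame(res1_, columns=['docid', 'secid'])

# One group per distinct docid, keys sorted: x1, x2, z1.
attempt = df.groupby('docid')['secid'].apply(list)

# x1 -> ['1', '3'] and z1 -> ['1', '2', '1']: non-adjacent rows are merged,
# and the original z1, x1, x2, x1, z1 order is gone.
print(attempt)
```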

Any pointers appreciated.

Thank you.

>Solution:

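You can use the DataFrame constructor, then GroupBy.agg. Comparing each docid with the previous row (shift) and taking the cumulative sum of the changes gives every run of consecutive identical docids its own label, so repeated but non-adjacent docids stay in separate groups:

df = pd.DataFrame(res1_, columns=['docid', 'secid'])
# label each run of consecutive identical docid values: 1, 1, 2, 3, 4, 5
group = df['docid'].ne(df['docid'].shift()).cumsum()
# group on the run labels; keep the first docid and collect secid into a list
df = df.groupby(group.values).agg({'docid': 'first', 'secid': list})

output:

  docid   secid
1    z1  [1, 2]
2    x1     [1]
3    x2     [1]
4    x1     [3]
5    z1     [1]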
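
As a rough sketch of what the grouping key looks like, the steps can be split apart and inspected one at a time. The names `starts`, `run_id` and `out` are hypothetical, the column is called secid as in the question, and the final reset_index is an optional extra if a 0-based index is preferred:

```python
import pandas as pd

res1_ = [('z1', '1'), ('z1', '2'), ('x1', '1'),
         ('x2', '1'), ('x1', '3'), ('z1', '1')]

df = pd.DataFrame(res1_, columns=['docid', 'secid'])

# True wherever docid differs from the previous row. The first row is
# always True because shift() leaves a missing value there.
starts = df['docid'].ne(df['docid'].shift())

# Running total of the run starts: one integer label per consecutive run.
run_id = starts.cumsum()
print(run_id.tolist())  # [1, 1, 2, 3, 4, 5]

# Group on the run labels rather than on docid itself, so the two
# separate x1 runs (labels 2 and 4) are never merged.
out = (df.groupby(run_id.values)
         .agg({'docid': 'first', 'secid': list})
         .reset_index(drop=True))
print(out)
```

Because the labels increase in row order, grouping on them keeps the rows in their original sequence.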