Python: Add an id column that resets based on another column's value

I have a data frame like the one below, and I want to add an id column that restarts from 0 for each node value.

node1,0.858
node1,0.897
node1,0.954
node2,3.784
node2,7.640
node2,11.592

For example, I want the following output:

0, node1, 0.858
1, node1, 0.897
2, node1, 0.954
0, node2, 3.784
1, node2, 7.640
2, node2, 11.592

I have tried using an index based on the node values, but it does not reset the counter when a new node starts. I could use a loop, but that is an anti-pattern in pandas.
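For reference, here is a minimal sketch of the explicit loop the question wants to avoid. It assumes the sample data is loaded with the hypothetical column names node and value (the original data has no header row). The counter restarts whenever the node differs from the previous row, which pins down the behaviour a vectorized pandas approach has to reproduce:

```python
import pandas as pd

# The question's sample data; the column names "node" and "value" are assumed.
df = pd.DataFrame({
    "node": ["node1", "node1", "node1", "node2", "node2", "node2"],
    "value": [0.858, 0.897, 0.954, 3.784, 7.640, 11.592],
})

# Explicit loop: reset the counter whenever the node differs from the previous row.
ids = []
previous = None
counter = 0
for node in df["node"]:
    if node != previous:
        counter = 0
        previous = node
    ids.append(counter)
    counter += 1

df["id"] = ids
print(df)
```

Because this runs row by row in Python, it gets slow on large frames, which is why a grouped, vectorized operation is usually preferred.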


Solution:

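You can group by the column you want to base the partition on and then call cumcount(), which numbers the rows within each group starting from 0. (cumsum() only gives the same counter if you apply it to a column of ones and subtract 1; on the Value column it would sum the values instead.) Then use set_index() to make the new field the index. If you only need the counter as a column, you can skip that line.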
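A quick sketch, reusing the answer's sample data, that compares cumcount() with a grouped cumsum() over a column of ones. The helper Series named ones is introduced here only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["node1", "node1", "node1", "node2", "node2", "node3"],
    "Value": [1000, 20000, 40000, 30000, 589, 682],
})

# cumcount() numbers the rows inside each group, starting at 0.
by_cumcount = df.groupby("Name").cumcount()

# cumsum() only yields a row counter when it runs over a column of ones:
# the running total within each group is 1, 2, 3, ..., so subtract 1.
ones = pd.Series(1, index=df.index)  # hypothetical helper column
by_cumsum = ones.groupby(df["Name"]).cumsum() - 1

print(by_cumcount.tolist())  # [0, 1, 2, 0, 1, 0]
print(by_cumsum.tolist())    # [0, 1, 2, 0, 1, 0]
```

cumcount() is the more direct choice, since it needs no helper column.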

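import pandas as pd

data = {'Name': ['node1', 'node1', 'node1', 'node2', 'node2', 'node3'],
        'Value': [1000, 20000, 40000, 30000, 589, 682]}

df = pd.DataFrame(data)

# Number the rows within each Name group: 0, 1, 2, ... restarting per group
df['New_Index'] = df.groupby('Name').cumcount()
# Optional: make the counter the index (skip this to keep it as a column)
df.set_index('New_Index', inplace=True)

print(df)  # display(df) only works inside Jupyter/IPython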
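Applied to the question's own data, a sketch might look like the following. It assumes the rows are read from the CSV text shown in the question with the hypothetical column names node and value, and uses DataFrame.insert to put the counter first, matching the requested output. Note that cumcount() keeps counting if a node reappears later in the frame; the second half of the sketch shows one way to restart on every change instead, by grouping on runs of consecutive values:

```python
import io

import pandas as pd

# The question's sample data has no header row; the column names
# "node" and "value" are assumed here for illustration.
raw = """node1,0.858
node1,0.897
node1,0.954
node2,3.784
node2,7.640
node2,11.592
"""
df = pd.read_csv(io.StringIO(raw), header=None, names=["node", "value"])

# Per-node counter, inserted as the first column to match the requested layout.
df.insert(0, "id", df.groupby("node").cumcount())
print(df.to_string(index=False))

# If a node can reappear later (e.g. node1, node2, node1) and the counter should
# restart on every change, group on consecutive runs instead of the node name.
nodes = pd.Series(["node1", "node1", "node2", "node1", "node1"])
run_id = (nodes != nodes.shift()).cumsum()  # 1, 1, 2, 3, 3
per_run = nodes.groupby(run_id).cumcount()
per_node = nodes.groupby(nodes).cumcount()
print(per_run.tolist())   # [0, 1, 0, 0, 1]
print(per_node.tolist())  # [0, 1, 0, 2, 3]
```

For the question's data, where each node's rows are contiguous, both groupings give the same ids.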