Python: Add an id column that resets based on another column's value

November 30, 2022

I have a data frame like the one below, and I want to add an id column that restarts based on the node value.

node1,0.858
node1,0.897
node1,0.954
node2,3.784
node2,7.640
node2,11.592

For example, I want the output below:

0, node1, 0.858
1, node1, 0.897
2, node1, 0.954
0, node2, 3.784
1, node2, 7.640
2, node2, 11.592

I have tried to use an index based on the node values, but this does not reset the column's value when a new node appears. I could use a loop, but that is an anti-pattern in pandas.

>Solution :

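You can group by the column that defines the partition and then call cumcount(), which numbers the rows within each group starting from 0. (Calling cumsum() on a column of ones within each group gives the same counter, starting from 1.) Then use set_index() to make the new field the index. You can skip the set_index() line if you only need the per-group counter as a regular column.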

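import pandas as pd

# Sample data: 'Name' is the column that defines the groups
data = {'Name': ['node1', 'node1', 'node1', 'node2', 'node2', 'node3'],
        'Value': [1000, 20000, 40000, 30000, 589, 682]}

df = pd.DataFrame(data)

# Number the rows within each 'Name' group, restarting at 0 for each new name
df['New_Index'] = df.groupby('Name').cumcount()
# Optional: make the counter the index (skip this to keep it as a column)
df = df.set_index('New_Index')

print(df)  # display(df) is only available in Jupyter/IPython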
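
Applied to the data from the question, the same approach might look like the sketch below. The column names node, value, and id are assumptions, since the original frame is shown without a header. df.insert() is used here only to place the counter first, matching the desired output.

```python
import io

import pandas as pd

# The question's data, read from an in-memory CSV with no header row.
# The column names "node" and "value" are assumed for illustration.
raw = """node1,0.858
node1,0.897
node1,0.954
node2,3.784
node2,7.640
node2,11.592
"""
df = pd.read_csv(io.StringIO(raw), header=None, names=["node", "value"])

# cumcount() numbers the rows inside each node group, restarting at 0.
# insert() puts the counter in the first column position under the name "id".
df.insert(0, "id", df.groupby("node").cumcount())

# The id column now reads 0, 1, 2 for node1 and 0, 1, 2 again for node2.
print(df.to_string(index=False))
```

Because cumcount() counts per group rather than per contiguous run, rows of the same node get consecutive ids even if they are not adjacent in the frame.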