How to avoid a KeyError when a newly added operation occurs only once

July 24, 2023

With the following code I am able to calculate the maximum gaps of each operation:

import pandas as pd

data = [
  {'order': 1, 'operation': 'milling', 'start': 0, 'end': 70},
  {'order': 1, 'operation': 'milling', 'start': 200, 'end': 210},
  {'order': 1, 'operation': 'milling', 'start': 500, 'end': 600},
  {'order': 1, 'operation': 'grinding', 'start': 90, 'end': 150},
  {'order': 2, 'operation': 'grinding', 'start': 150, 'end': 170},
  {'order': 3, 'operation': 'grinding', 'start': 400, 'end': 420},
  {'order': 3, 'operation': 'milling', 'start': 610, 'end': 660}
]

df = pd.DataFrame(data)

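# Pair each row's end with the start of the following row.
df['start'] = df['start'].shift(-1)
# For each operation, keep the row with the largest gap (next start minus end).
df = df.groupby('operation').apply(lambda x: x.loc[(x['start'] - x['end']).idxmax()])[['operation', 'end', 'start']].reset_index(drop=True)
# The gap begins at the old 'end' and finishes at the next 'start', so rename.
df.columns = ['operation', 'start', 'end']
df['max_gap'] = df['end'] - df['start']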

print(df)

Prints:

  operation  start    end  max_gap
0  grinding    170  400.0    230.0
1   milling    210  500.0    290.0

The problem is that when a new order introduces a new operation (e.g. "new_operation"), I get a KeyError (KeyError: nan), I guess because that operation only exists once.


data = [
  {'order': 1, 'operation': 'milling', 'start': 0, 'end': 70},
  {'order': 1, 'operation': 'milling', 'start': 200, 'end': 210},
  {'order': 1, 'operation': 'milling', 'start': 500, 'end': 600},
  {'order': 1, 'operation': 'grinding', 'start': 90, 'end': 150},
  {'order': 2, 'operation': 'grinding', 'start': 150, 'end': 170},
  {'order': 3, 'operation': 'grinding', 'start': 400, 'end': 420},
  {'order': 3, 'operation': 'milling', 'start': 610, 'end': 660},
  {'order': 3, 'operation': 'new_operation', 'start': 610, 'end': 660}
]

...

KeyError: nan

How can I avoid this in a clean way?

>Solution :

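When you use df["start"] = df["start"].shift(-1), the last row's start is filled with NaN. "new_operation" has only that one row, so its group contains nothing but NaN: idxmax() has no valid value to pick and returns NaN, and x.loc[NaN] then raises KeyError: nan. To avoid this, fill the missing value, either by forward-filling it or with the fill_value option of shift: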

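# Option 1: forward-fill the NaN left by the shift
# (fillna(method="ffill") is deprecated in recent pandas; ffill() is equivalent)
df["start"] = df["start"].shift(-1).ffill()

# Option 2 (use instead of option 1): give shift an explicit fill value
df["start"] = df["start"].shift(-1, fill_value=0)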
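
For reference, here is a minimal end-to-end sketch of the forward-fill fix applied to the extended data from the question. It is one possible arrangement, not the only one: the helper column names next_start and gap are made up for illustration, and the per-group selection uses groupby(...).idxmax() with df.loc instead of the groupby(...).apply(...) call from the question, on the assumption that this form behaves the same across pandas versions.

```python
import pandas as pd

data = [
    {'order': 1, 'operation': 'milling', 'start': 0, 'end': 70},
    {'order': 1, 'operation': 'milling', 'start': 200, 'end': 210},
    {'order': 1, 'operation': 'milling', 'start': 500, 'end': 600},
    {'order': 1, 'operation': 'grinding', 'start': 90, 'end': 150},
    {'order': 2, 'operation': 'grinding', 'start': 150, 'end': 170},
    {'order': 3, 'operation': 'grinding', 'start': 400, 'end': 420},
    {'order': 3, 'operation': 'milling', 'start': 610, 'end': 660},
    {'order': 3, 'operation': 'new_operation', 'start': 610, 'end': 660},
]

df = pd.DataFrame(data)

# Same global shift as in the question, then forward-fill the NaN
# that the shift leaves in the last row.
# 'next_start' and 'gap' are illustrative column names, not from the question.
df['next_start'] = df['start'].shift(-1).ffill()
df['gap'] = df['next_start'] - df['end']

# No NaN is left in 'gap', so idxmax returns a valid row label for every
# operation, including the one that occurs only once.
best = df.loc[df.groupby('operation')['gap'].idxmax()]

# Present the gap as an interval: it begins at the row's 'end' and
# finishes at the following row's 'start'.
result = (
    best[['operation', 'end', 'next_start', 'gap']]
    .rename(columns={'end': 'start', 'next_start': 'end', 'gap': 'max_gap'})
    .reset_index(drop=True)
)

print(result)
```

Note that the filled value is only a placeholder. With the forward fill, "new_operation" shows up with a max_gap of -50 (610 - 660), and with fill_value=0 it would be -660. If an operation that occurs only once should not be reported at all, filtering the result on max_gap > 0 afterwards is one way to drop it.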