Validate consecutive numbers in column, based on category in another column

I am working with a DataFrame with the following structure:

data = {'alpha_3_id': ['LIA', 'LIA', 'LIA', 'LIA', 'MIL', 'MIL', 'DEA', 'DEA', 'DEA', 'DEA'],
        'id': [1, 2, 3, 4, 1, 2, 1, 2, 3, 4]
       }
df = pd.DataFrame(data)

  alpha_3_id  id
0        LIA   1
1        LIA   2
2        LIA   3
3        LIA   4
4        MIL   1
5        MIL   2
6        DEA   1
7        DEA   2
8        DEA   3
9        DEA   4

I need to validate that for each "alpha_3_id", numbers in column "id" are listed in a consecutive order.

I tried to do this using code below but it only works if all values in column are consecutive but I need to test it for each category (LIA, MIL DEA, etc).

all(j == i + 1 for i, j in zip(list_of_values, list_of_values[1:]))

>Solution :

For test consecutive values use custom lambda function with Series.diff with omit first value and test if all values are 1 by Series.all:

out = df.groupby('alpha_3_id')['id'].apply(lambda x: x.diff().iloc[1:].eq(1).all())
print (out)
alpha_3_id
DEA    True
LIA    True
MIL    True
Name: id, dtype: bool

out = df.groupby('alpha_3_id')['id'].apply(lambda x: x.diff().iloc[1:].eq(1).all()).all()
print (out)
True

Another idea is use DataFrameGroupBy.diff, remove first values per groups by Series.duplicated, test by 1 and last use Series.all:

out = df.groupby('alpha_3_id')['id'].diff()[df['alpha_3_id'].duplicated()].eq(1).all()
print (out)
True

Leave a Reply