I am working with a DataFrame with the following structure:
data = {'alpha_3_id': ['LIA', 'LIA', 'LIA', 'LIA', 'MIL', 'MIL', 'DEA', 'DEA', 'DEA', 'DEA'],
'id': [1, 2, 3, 4, 1, 2, 1, 2, 3, 4]
}
df = pd.DataFrame(data)
alpha_3_id id
0 LIA 1
1 LIA 2
2 LIA 3
3 LIA 4
4 MIL 1
5 MIL 2
6 DEA 1
7 DEA 2
8 DEA 3
9 DEA 4
I need to validate that for each "alpha_3_id", numbers in column "id" are listed in a consecutive order.
I tried to do this using code below but it only works if all values in column are consecutive but I need to test it for each category (LIA, MIL DEA, etc).
all(j == i + 1 for i, j in zip(list_of_values, list_of_values[1:]))
>Solution :
For test consecutive values use custom lambda function with Series.diff with omit first value and test if all values are 1 by Series.all:
out = df.groupby('alpha_3_id')['id'].apply(lambda x: x.diff().iloc[1:].eq(1).all())
print (out)
alpha_3_id
DEA True
LIA True
MIL True
Name: id, dtype: bool
out = df.groupby('alpha_3_id')['id'].apply(lambda x: x.diff().iloc[1:].eq(1).all()).all()
print (out)
True
Another idea is use DataFrameGroupBy.diff, remove first values per groups by Series.duplicated, test by 1 and last use Series.all:
out = df.groupby('alpha_3_id')['id'].diff()[df['alpha_3_id'].duplicated()].eq(1).all()
print (out)
True