Using pandas, I have created a dataframe that contains the info about weed severity by field. Every field can contain various types of weed. Records are sorted by [field, weed, date] in order to observe the evolution of weed. Ideally, the severity of every weed should be supressed as the time goes by. In order to detect critical items, I would like to compare two consecutive weed severities, grouped by field and weed type and annotate the supression status as the difference between the two observed severities.
Below is the example data
Field Date Weed Severity
1 2021-06-08 Cirsium arvense 3
1 2021-07-06 Cirsium arvense 3
1 2021-07-27 Cirsium arvense 2
1 2021-08-19 Cirsium arvense 1
1 2021-07-26 Lactuca Serriola 3
1 2021-06-08 Myagrum perfoliatum 2
1 2021-07-06 Sonchus oleraceus 3
1 2021-08-19 Sorghum halepense 2
1 2021-07-26 Xanthium strumarium 2
1 2021-07-27 Xanthium strumarium 2
1 2021-08-19 Xanthium strumarium 1
1 2020-07-08 Euphorbia cyparissias 1
1 2020-07-08 Matricaria chamomilla 1
1 2020-07-08 Polygonum Lapathifolium 1
1 2020-07-08 Sorghum halepense 5
1 2020-07-08 Xanthium strumarium 1
2 2021-09-14 Ambrosia artemisifolia 2
2 2021-05-29 Chenopodium album 1
2 2021-05-29 Cirsium arvense 1
2 2021-09-14 Echinochola crus gali 4
2 2021-05-29 Phragmites Australis 1
2 2021-05-29 Polygonum spp 1
2 2021-05-29 Setaria viridis 2
2 2021-05-29 Sinapis arvensis 1
2 2021-07-06 Sorghum halepense 4
2 2021-09-14 Sorghum halepense 3
2 2021-05-29 Xanthium strumarium 1
2 2021-09-14 Xanthium strumarium 3
On field 1, Cirsium arvense is successfully supressed as the severity decreases, thus the success rate should be (for rows 1 -4) [0, 0, 1, 1].
On the other hand, for example, Xanthium strumarium on field 2 (last two rows) is not successfully supressed and thus the values should be [0, -2].
In case of unconditionally comparing consecutive rows, I usually shift the column values and subtract the shifted column from the original. In case of observing the column, depending on other values, I wonder, what is a good way to observe the differences. Say rows 4 and 5 can’t be simply compared as the severity values of different weeds shouldn’t be compared.
>Solution :
Subtract from right side by shifted values by Series.rsub with DataFrameGroupBy.shift:
df['rate'] = (df['Severity'].rsub(df.groupby(['Field','Weed'])['Severity'].shift())
.fillna(0, downcast='infer'))
print (df)
Field Date Weed Severity rate
0 1 2021-06-08 Cirsium arvense 3 0
1 1 2021-07-06 Cirsium arvense 3 0
2 1 2021-07-27 Cirsium arvense 2 1
3 1 2021-08-19 Cirsium arvense 1 1
4 1 2021-07-26 Lactuca Serriola 3 0
5 1 2021-06-08 Myagrum perfoliatum 2 0
6 1 2021-07-06 Sonchus oleraceus 3 0
7 1 2021-08-19 Sorghum halepense 2 0
8 1 2021-07-26 Xanthium strumarium 2 0
9 1 2021-07-27 Xanthium strumarium 2 0
10 1 2021-08-19 Xanthium strumarium 1 1
11 1 2020-07-08 Euphorbia cyparissias 1 0
12 1 2020-07-08 Matricaria chamomilla 1 0
13 1 2020-07-08 Polygonum Lapathifolium 1 0
14 1 2020-07-08 Sorghum halepense 5 0
15 1 2020-07-08 Xanthium strumarium 1 0
16 2 2021-09-14 Ambrosia artemisifolia 2 0
17 2 2021-05-29 Chenopodium album 1 0
18 2 2021-05-29 Cirsium arvense 1 0
19 2 2021-09-14 Echinochola crus gali 4 0
20 2 2021-05-29 Phragmites Australis 1 0
21 2 2021-05-29 Polygonum spp 1 0
22 2 2021-05-29 Setaria viridis 2 0
23 2 2021-05-29 Sinapis arvensis 1 0
24 2 2021-07-06 Sorghum halepense 4 0
25 2 2021-09-14 Sorghum halepense 3 1
26 2 2021-05-29 Xanthium strumarium 1 0
27 2 2021-09-14 Xanthium strumarium 3 -2