Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Summarize decreases by group only when number does not increase again

I am trying to calculate the number of apples and pears lost from fruit stands (plots) at a market in a given year. Here, a loss would be defined as when the number of apples or pears decreases but remains at that number and does not increase again in that plot for that year. In other words, an apple or pear can be lost from the plot, but if another is added (e.g. re-stock), then this does not constitute a "loss". I am looking to summarize the number of apples lost and the number of pears lost by year, and also by year and plot. The date order is important here (i.e. a loss cannot happen from a date in the future to a date in the past), but I have already sorted my dataset by year so this should not be an issue.

Here is an example of the data:

table <- "date year plot apples pears
1  2021-05-26 2020   a    1      1
2  2021-05-27 2020   a    1      1
3  2021-05-28 2020   a    0      1
4  2021-05-29 2020   a    1      1
5  2021-05-30 2020   a    1      1
6  2021-05-27 2021   b    2      1
7  2021-05-28 2021   b    2      1
8  2021-05-29 2021   b    1      0
9  2021-05-30 2021   b    1      0
10 2021-05-31 2021   b    1      0
11 2021-05-27 2021   c    1      0
12 2021-05-28 2021   c    1      1
13 2021-05-29 2021   c    0      1
14 2021-05-30 2021   c    0      1
15 2021-05-31 2021   c    0      1"

Based on this example, you would expect:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

  1. In 2020, there were no apples lost and no pears lost (the number did
    not decrease and remain at that decreased number).
  2. In 2021, there were two apples lost (one in plot b and one in plot c) and 1 pear lost (in plot b)

Which as an output would look similar to this summarized by year:

table <- "date year apples.lost pears.lost
1  2020   0      0
2  2021   2      1"

Or this if also grouped by plot:

table <- "date year plot apples.lost pears.lost
1  2020   a    0      0
2  2021   b    1      1
3  2021   c    1      0"

I have spent hours trying to figure out how to do this and I cannot come up with viable code. I can calculate increases/decreases in datasets, based on resources such as this, but I cannot seem to find a way to work in counting only decreases that remain at that number for the remainder of the year in that specific plot.

>Solution :

Using the dplyr package:

library(dplyr)

df %>%
  group_by(year, plot) %>%
  summarise(apples.lost = max(first(apples) - last(apples), 0),
            pears.lost = max(first(pears) - last(pears), 0)) %>%
  ungroup()

#> # A tibble: 3 x 4
#>    year plot  apples.lost pears.lost
#>   <dbl> <chr>       <dbl>      <dbl>
#> 1  2020 a               1          0
#> 2  2021 b               1          1
#> 3  2021 c               1          0

To get the total sum per year, you’d summarise it again:

df %>% 
  group_by(year, plot) %>%
  summarise(apples.lost = max(first(apples) - last(apples), 0),
            pears.lost = max(first(pears) - last(pears), 0)) %>%
  group_by(year) %>%
  summarise(apples.lost = sum(apples.lost),
            pears.lost = sum(pears.lost))

#> # A tibble: 2 x 3
#>    year apples.lost pears.lost
#>   <dbl>       <dbl>      <dbl>
#> 1  2020           1          0
#> 2  2021           2          1
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading