mutate by order of appearance and conditions by group in R

I have a table that looks like the following

customer_id status amount channel
1 target 15 shop
1 beneficiary 10 shop
1 beneficiary 5 shop
2 target 25 shop
3 beneficiary 30 online
3 beneficiary 15 shop
3 target 10 shop

I want to do the following operations:

  1. create a new variable named flag. Suppose the table is arranged in a time order(because the order of appearance is important in the table). for each customer, we check for the first occurrence of the variable status that is equal to "beneficiary", if we find one, then we check if the channel variable is equal to "shop". If yes, the variable flag should be marked as "taker", otherwise, the variable flag should be marked as "targeted".

  2. create a variable that aggregates all amounts by customer_id.

The resulting table for the above example should look like this:

customer_id ttl_amount flag
1 30 beneficiary
2 25 targeted
3 55 targeted

As you can see, for customer_id =3, even though there is a line where status=="beneficiary" & channel=="shop", the variable flag is marked as "targeted" because the first occurrence of "beneficiary" was having "online" as a channel and NOT "shop".

My approach to the problem is the following: for aggregating the total amount, it’s

group_by(customer_id)%>%mutate(ttl_amount=sum(amount))

To create the flag, it seems that I should also apply group_by(customer_id), but I am having difficulty finding the first occurrence of "beneficiary" for each customer id, and if there is, check if the channel =="shop". How do you think we can solve this with dplyr?

>Solution :

You can check for the first occurrence using which and indexing [1].The coalesce is for when there are no occurances:

library(tidyverse)

data <- read_table("customer_id status  amount  channel
1   target  15  shop
1   beneficiary 10  shop
1   beneficiary 5   shop
2   target  25  shop
3   beneficiary 30  online
3   beneficiary 15  shop
3   target  10  shop") 


data |> 
  group_by(customer_id) |>
  summarise(ttl_amount = sum(amount),
            flag = coalesce(
              if_else(channel[which(status == "beneficiary")[1]] == "shop",
                        "beneficiary", "targeted"), 
              "targeted")
            )
#> # A tibble: 3 x 3
#>   customer_id ttl_amount flag         
#>         <dbl>      <dbl> <chr>      
#> 1           1         30 beneficiary
#> 2           2         25 targeted   
#> 3           3         55 targeted

Leave a Reply