Home finding the value of new column based on values and conditions in another columns

Questions

finding the value of new column based on values and conditions in another columns

February 24, 2024

suppose I have following data :

df<- data.frame(
  id = c(1,1,1,1,2,2,2,3,3,4,4,4,5,5,5,5,5,6),
  age =c (41,43,20,8,33,30,6,45,90,60,7,3,50,50,7,1,83,45),
  activity = c(412,442,0,0,419,413,0,421,0,412,0,0,421,411,0,0,0,425),
  minute = c(10,15,0,0,20,15,0,25,0,20,0,0,25,20,0,0,0,30)
)

What I want is to determine the value of column consumers, people who benefit from activities which produced by producers. The column id is the same for all members of households. The activity column shows what kind of activity is done by a members; activities 412, 442, 413, 419, and 411 are activities that are done only for children, people under the age of 15, while 421 and 425 are activities done for adults, those over the age of 15. People whose value in the 'activity column is non-zero are producers, and others are potential consumers. The minute column shows the amount of time spent on an activity. When I identify consumers, I have to assign the value of the ‘minute’ column to consumers. If in a family there is more than one consumer, I have to assign the same amount of time to all consumers.
my expected output is like this:

    id age activity minute consumers
1   1  41      412     10         0
2   1  43      442     15         0
3   1  20        0      0        25
4   1   8        0      0        25
5   2  33      419     20         0
6   2  30      413     15         0
7   2   6        0      0        35
8   3  45      421     25         0
9   3  90        0      0        25
10  4  60      412     20         0
11  4   7        0      0        20
12  4   3        0      0        20
13  5  50      421     25         0
14  5  50      411     20         0
15  5   7        0      0        20
16  5   1        0      0        20
17  5  83        0      0        25
18  6  45      425     30        30

I have tried by following code but it seems not based on my expected output:

df <- df %>%
  group_by(id) %>%
  mutate(
    consumers = ifelse(activity == 0 & age > 15, sum(minute), 0)
  )

thank you in advance.

>Solution :

I think we want to sum up different activities based on the row’s age value and activitys of producers.

if an activity is non-zero, then consumers should be 0
if age < 15, then we need to sum up minutes for those activities in your "only for children" list
similarly if age >= 15 we need to sum up minutes for the other activities

This means that in your expected output, row 3 (age 20) should have 0 assigned, since both of the activity values in id=1 are "only for children".

I think this is a good time to use case_when.

library(dplyr)
df |>
  group_by(id) |>
  mutate(
    consumers = case_when(
      activity > 0 ~ 0, 
      age < 15 ~ sum(minute[activity %in% c(411, 412, 413, 419, 442)]),
      age >= 15 ~ sum(minute[activity %in% c(421, 425)])
    )
  ) |>
  ungroup()
# # A tibble: 18 × 5
#       id   age activity minute consumers
#    <dbl> <dbl>    <dbl>  <dbl>     <dbl>
#  1     1    41      412     10         0
#  2     1    43      442     15         0
#  3     1    20        0      0         0
#  4     1     8        0      0        25
#  5     2    33      419     20         0
#  6     2    30      413     15         0
#  7     2     6        0      0        35
#  8     3    45      421     25         0
#  9     3    90        0      0        25
# 10     4    60      412     20         0
# 11     4     7        0      0        20
# 12     4     3        0      0        20
# 13     5    50      421     25         0
# 14     5    50      411     20         0
# 15     5     7        0      0        20
# 16     5     1        0      0        20
# 17     5    83        0      0        25
# 18     6    45      425     30         0

The 0 in sum(c(0, minute[..])) is to safeguard against minute[..] having no matches (so no numbers to sum up). If you’d