suppose I have following data :
df<- data.frame(
id = c(1,1,1,1,2,2,2,3,3,4,4,4,5,5,5,5,5,6),
age =c (41,43,20,8,33,30,6,45,90,60,7,3,50,50,7,1,83,45),
activity = c(412,442,0,0,419,413,0,421,0,412,0,0,421,411,0,0,0,425),
minute = c(10,15,0,0,20,15,0,25,0,20,0,0,25,20,0,0,0,30)
)
What I want is to determine the value of column consumers, people who benefit from activities which produced by producers. The column id is the same for all members of households. The activity column shows what kind of activity is done by a members; activities 412, 442, 413, 419, and 411 are activities that are done only for children, people under the age of 15, while 421 and 425 are activities done for adults, those over the age of 15. People whose value in the 'activity column is non-zero are producers, and others are potential consumers. The minute column shows the amount of time spent on an activity. When I identify consumers, I have to assign the value of the ‘minute’ column to consumers. If in a family there is more than one consumer, I have to assign the same amount of time to all consumers.
my expected output is like this:
id age activity minute consumers
1 1 41 412 10 0
2 1 43 442 15 0
3 1 20 0 0 25
4 1 8 0 0 25
5 2 33 419 20 0
6 2 30 413 15 0
7 2 6 0 0 35
8 3 45 421 25 0
9 3 90 0 0 25
10 4 60 412 20 0
11 4 7 0 0 20
12 4 3 0 0 20
13 5 50 421 25 0
14 5 50 411 20 0
15 5 7 0 0 20
16 5 1 0 0 20
17 5 83 0 0 25
18 6 45 425 30 30
I have tried by following code but it seems not based on my expected output:
df <- df %>%
group_by(id) %>%
mutate(
consumers = ifelse(activity == 0 & age > 15, sum(minute), 0)
)
thank you in advance.
>Solution :
I think we want to sum up different activities based on the row’s age value and activitys of producers.
- if an
activityis non-zero, thenconsumersshould be0 - if
age < 15, then we need to sum up minutes for those activities in your "only for children" list - similarly if
age >= 15we need to sum up minutes for the other activities
This means that in your expected output, row 3 (age 20) should have 0 assigned, since both of the activity values in id=1 are "only for children".
I think this is a good time to use case_when.
library(dplyr)
df |>
group_by(id) |>
mutate(
consumers = case_when(
activity > 0 ~ 0,
age < 15 ~ sum(minute[activity %in% c(411, 412, 413, 419, 442)]),
age >= 15 ~ sum(minute[activity %in% c(421, 425)])
)
) |>
ungroup()
# # A tibble: 18 × 5
# id age activity minute consumers
# <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1 41 412 10 0
# 2 1 43 442 15 0
# 3 1 20 0 0 0
# 4 1 8 0 0 25
# 5 2 33 419 20 0
# 6 2 30 413 15 0
# 7 2 6 0 0 35
# 8 3 45 421 25 0
# 9 3 90 0 0 25
# 10 4 60 412 20 0
# 11 4 7 0 0 20
# 12 4 3 0 0 20
# 13 5 50 421 25 0
# 14 5 50 411 20 0
# 15 5 7 0 0 20
# 16 5 1 0 0 20
# 17 5 83 0 0 25
# 18 6 45 425 30 0
The 0 in sum(c(0, minute[..])) is to safeguard against minute[..] having no matches (so no numbers to sum up). If you’d