Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

finding the value of new column based on values and conditions in another columns

suppose I have following data :

df<- data.frame(
  id = c(1,1,1,1,2,2,2,3,3,4,4,4,5,5,5,5,5,6),
  age =c (41,43,20,8,33,30,6,45,90,60,7,3,50,50,7,1,83,45),
  activity = c(412,442,0,0,419,413,0,421,0,412,0,0,421,411,0,0,0,425),
  minute = c(10,15,0,0,20,15,0,25,0,20,0,0,25,20,0,0,0,30)
)

What I want is to determine the value of column consumers, people who benefit from activities which produced by producers. The column id is the same for all members of households. The activity column shows what kind of activity is done by a members; activities 412, 442, 413, 419, and 411 are activities that are done only for children, people under the age of 15, while 421 and 425 are activities done for adults, those over the age of 15. People whose value in the 'activity column is non-zero are producers, and others are potential consumers. The minute column shows the amount of time spent on an activity. When I identify consumers, I have to assign the value of the ‘minute’ column to consumers. If in a family there is more than one consumer, I have to assign the same amount of time to all consumers.
my expected output is like this:

    id age activity minute consumers
1   1  41      412     10         0
2   1  43      442     15         0
3   1  20        0      0        25
4   1   8        0      0        25
5   2  33      419     20         0
6   2  30      413     15         0
7   2   6        0      0        35
8   3  45      421     25         0
9   3  90        0      0        25
10  4  60      412     20         0
11  4   7        0      0        20
12  4   3        0      0        20
13  5  50      421     25         0
14  5  50      411     20         0
15  5   7        0      0        20
16  5   1        0      0        20
17  5  83        0      0        25
18  6  45      425     30        30

I have tried by following code but it seems not based on my expected output:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df <- df %>%
  group_by(id) %>%
  mutate(
    consumers = ifelse(activity == 0 & age > 15, sum(minute), 0)
  )

thank you in advance.

>Solution :

I think we want to sum up different activities based on the row’s age value and activitys of producers.

  • if an activity is non-zero, then consumers should be 0
  • if age < 15, then we need to sum up minutes for those activities in your "only for children" list
  • similarly if age >= 15 we need to sum up minutes for the other activities

This means that in your expected output, row 3 (age 20) should have 0 assigned, since both of the activity values in id=1 are "only for children".

I think this is a good time to use case_when.

library(dplyr)
df |>
  group_by(id) |>
  mutate(
    consumers = case_when(
      activity > 0 ~ 0, 
      age < 15 ~ sum(minute[activity %in% c(411, 412, 413, 419, 442)]),
      age >= 15 ~ sum(minute[activity %in% c(421, 425)])
    )
  ) |>
  ungroup()
# # A tibble: 18 × 5
#       id   age activity minute consumers
#    <dbl> <dbl>    <dbl>  <dbl>     <dbl>
#  1     1    41      412     10         0
#  2     1    43      442     15         0
#  3     1    20        0      0         0
#  4     1     8        0      0        25
#  5     2    33      419     20         0
#  6     2    30      413     15         0
#  7     2     6        0      0        35
#  8     3    45      421     25         0
#  9     3    90        0      0        25
# 10     4    60      412     20         0
# 11     4     7        0      0        20
# 12     4     3        0      0        20
# 13     5    50      421     25         0
# 14     5    50      411     20         0
# 15     5     7        0      0        20
# 16     5     1        0      0        20
# 17     5    83        0      0        25
# 18     6    45      425     30         0

The 0 in sum(c(0, minute[..])) is to safeguard against minute[..] having no matches (so no numbers to sum up). If you’d

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading