taking sum of rows in R based on conditions

Advertisements

I have a data in this format

ColA ColB ColC
A 2 1
A 1 1
B 3 2
B 5 2
C 2 3
C 5 3
A 1 1
A 3 1
B 7 2
B 1 2

I want to get a new column with the sum of the rows of ColB, something like this:

ColA ColB ColC ColD
A 2 1 3
A 1 1 3
B 3 2 8
B 5 2 8
C 2 3 7
C 5 3 7
A 1 1 4
A 3 1 4
B 7 2 8
B 1 2 8

Thanks much for your help!

I tried

df$ColD <- with(df, sum(ColB[ColC == 1]))

>Solution :

It seems to me that you want ColD to have the sum of ColB for each consecutive group defined by the values in ColA. In which case, we may do:

library(dplyr)

df %>% 
  mutate(group = data.table::rleid(ColA)) %>%
  group_by(group) %>%
  mutate(ColD = sum(ColB)) %>%
  ungroup() %>%
  select(-group)
#> # A tibble: 10 x 4
#>    ColA   ColB  ColC  ColD
#>    <chr> <int> <int> <int>
#>  1 A         2     1     3
#>  2 A         1     1     3
#>  3 B         3     2     8
#>  4 B         5     2     8
#>  5 C         2     3     7
#>  6 C         5     3     7
#>  7 A         1     1     4
#>  8 A         3     1     4
#>  9 B         7     2     8
#> 10 B         1     2     8

This, at any rate, is the same as the expected output.

Created on 2023-01-16 with reprex v2.0.2


Data from question in reproducible format

df <- structure(list(ColA = c("A", "A", "B", "B", "C", "C", "A", "A", 
"B", "B"), ColB = c(2L, 1L, 3L, 5L, 2L, 5L, 1L, 3L, 7L, 1L), 
    ColC = c(1L, 1L, 2L, 2L, 3L, 3L, 1L, 1L, 2L, 2L)), 
class = "data.frame", row.names = c(NA, -10L))

Leave a ReplyCancel reply