Summarizing table data with repetitive and nonrepetitive IDs

April 7, 2023

I have a dataset with building IDs, each ID represents a structure (building, house and etc.). Now if a building has multiple units/condos/apartments then the IDs will be repeated for each of the units.

For each district, 1 and 6 in this case:

I am trying to separate repetitive IDs from nonrepetitive IDs.
And then count the total number of each repetitive ID and sum their
dollar value.
Lastly create two new columns, ID_Count and Floor. Whenever an
ID is repeated more than once, ID_Count will be greater than 1
and Floor will have a value multistory

Based on the sample data below how can I do this?

    structure(list(ID = c(1, 1, 1, 2, 2, 3, 4, 4, 4, 5, 6, 11, 11, 
11, 22, 22, 33, 44, 44, 44, 55, 66), Zoning = c("Residential", 
"Residential", "Residential", "Commercial", "Commercial", "Residential", 
"Miscellaneous", "Miscellaneous", "Miscellaneous", "Commercial", 
"Residential", "Residential", "Residential", "Residential", "Commercial", 
"Commerical", "Residential", "Miscellaneous", "Miscellaneous", 
"Miscellaneous", "Commercial", "Residential"), District = c(1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6
), Value = c(11111, 12111, 13111, 14111, 16111, 17111, 18111, 
19111, 22222, 33333, 11111, 12111, 13111, 14111, 16111, 17111, 
18111, 19111, 22222, 33333, 44444, 55555)), class = "data.frame", row.names = c(NA, 
-22L))

Desired Output

ID  Zoning        District Value  ID_Count  Floor
1   Residential   1        36333  3         multistory
2   Commercial    1        30222  2         multistory
3   Residential   1        17111  1         singlestory
4   Miscellaneous 1        59444  3         multistory
5   Commercial    1        33333  1         singlestory
6   Residential   1        11111  1         singlestory
11  Residential   6        36333  3         multistory
22  Commercial    6        30222  2         multistory
33  Residential   6        17111  1         singlestory
44  Miscellaneous 6        59444  3         multistory
55  Commercial    6        33333  1         singlestory
66  Residential   6        11111  1         singlestory

>Solution :

With dplyr

library(dplyr)
df %>% group_by(ID,Zoning,District) %>% summarise(n = n(), Value = sum(Value), Floor = ifelse(n>1,"multistory","singlestory"))
# A tibble: 12 × 6
# Groups:   ID, Zoning [12]
#      ID Zoning        District     n Value Floor      
#   <dbl> <chr>            <dbl> <int> <dbl> <chr>      
# 1     1 Residential          1     3 36333 multistory 
# 2     2 Commercial           1     2 30222 multistory 
# 3     3 Residential          1     1 17111 singlestory
# 4     4 Miscellaneous        1     3 59444 multistory 
# 5     5 Commercial           1     1 33333 singlestory
# 6     6 Residential          1     1 11111 singlestory
# 7    11 Residential          6     3 39333 multistory 
# 8    22 Commercial           6     2 33222 multistory 
# 9    33 Residential          6     1 18111 singlestory
#10    44 Miscellaneous        6     3 74666 multistory 
#11    55 Commercial           6     1 44444 singlestory
#12    66 Residential          6     1 55555 singlestory