Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Summarizing table data with repetitive and nonrepetitive IDs

I have a dataset with building IDs, each ID represents a structure (building, house and etc.). Now if a building has multiple units/condos/apartments then the IDs will be repeated for each of the units.

For each district, 1 and 6 in this case:

  1. I am trying to separate repetitive IDs from nonrepetitive IDs.
  2. And then count the total number of each repetitive ID and sum their
    dollar value.
  3. Lastly create two new columns, ID_Count and Floor. Whenever an
    ID is repeated more than once, ID_Count will be greater than 1
    and Floor will have a value multistory

Based on the sample data below how can I do this?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

    structure(list(ID = c(1, 1, 1, 2, 2, 3, 4, 4, 4, 5, 6, 11, 11, 
11, 22, 22, 33, 44, 44, 44, 55, 66), Zoning = c("Residential", 
"Residential", "Residential", "Commercial", "Commercial", "Residential", 
"Miscellaneous", "Miscellaneous", "Miscellaneous", "Commercial", 
"Residential", "Residential", "Residential", "Residential", "Commercial", 
"Commerical", "Residential", "Miscellaneous", "Miscellaneous", 
"Miscellaneous", "Commercial", "Residential"), District = c(1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6
), Value = c(11111, 12111, 13111, 14111, 16111, 17111, 18111, 
19111, 22222, 33333, 11111, 12111, 13111, 14111, 16111, 17111, 
18111, 19111, 22222, 33333, 44444, 55555)), class = "data.frame", row.names = c(NA, 
-22L))

Desired Output

ID  Zoning        District Value  ID_Count  Floor
1   Residential   1        36333  3         multistory
2   Commercial    1        30222  2         multistory
3   Residential   1        17111  1         singlestory
4   Miscellaneous 1        59444  3         multistory
5   Commercial    1        33333  1         singlestory
6   Residential   1        11111  1         singlestory
11  Residential   6        36333  3         multistory
22  Commercial    6        30222  2         multistory
33  Residential   6        17111  1         singlestory
44  Miscellaneous 6        59444  3         multistory
55  Commercial    6        33333  1         singlestory
66  Residential   6        11111  1         singlestory 

>Solution :

With dplyr

library(dplyr)
df %>% group_by(ID,Zoning,District) %>% summarise(n = n(), Value = sum(Value), Floor = ifelse(n>1,"multistory","singlestory"))
# A tibble: 12 × 6
# Groups:   ID, Zoning [12]
#      ID Zoning        District     n Value Floor      
#   <dbl> <chr>            <dbl> <int> <dbl> <chr>      
# 1     1 Residential          1     3 36333 multistory 
# 2     2 Commercial           1     2 30222 multistory 
# 3     3 Residential          1     1 17111 singlestory
# 4     4 Miscellaneous        1     3 59444 multistory 
# 5     5 Commercial           1     1 33333 singlestory
# 6     6 Residential          1     1 11111 singlestory
# 7    11 Residential          6     3 39333 multistory 
# 8    22 Commercial           6     2 33222 multistory 
# 9    33 Residential          6     1 18111 singlestory
#10    44 Miscellaneous        6     3 74666 multistory 
#11    55 Commercial           6     1 44444 singlestory
#12    66 Residential          6     1 55555 singlestory
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading