Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Check if all rows are equal by group ID and return boolean value

I have a data frame where a unique ID is given to each unique instance where there is a string in either title.1 or title.2. Each ID is coded with one or more names. See below:

title.1 title.2 name ID
A A1 fruit 1
A A1 fruit 1
B1 fruit 2
B fruit, vegetable 3
C C1 vegetable, poultry, grain 4
C C1 vegetable, poultry 4
C C1 vegetable, poultry 4
D1 poultry 5
D1 vegetable 5

I need to identify which IDs have the same name across rows and which do not. To do this, I’d like to group by ID and test to see if all name values are the same across all rows with that ID. Then, I’d like to append a new column with a boolean value indicating which IDs meet this condition and which do not. The output should look like this:

title.1 title.2 name ID names.equal
A A1 fruit 1 TRUE
A A1 fruit 1 TRUE
B1 fruit 2 TRUE
B fruit, vegetable 3 TRUE
C C1 vegetable, poultry, grain 4 FALSE
C C1 vegetable, poultry 4 FALSE
C C1 vegetable, poultry 4 FALSE
D1 poultry 5 FALSE
D1 vegetable 5 FALSE

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

We may use n_distinct on name to get the unique count and create logical with the count after grouping by ID

library(dplyr)
df1 %>%
   group_by(ID) %>%
   mutate(names.equal = n_distinct(name) == 1) %>%
   ungroup

-output

# A tibble: 9 × 5
  title.1 title.2 name                         ID names.equal
  <chr>   <chr>   <chr>                     <int> <lgl>      
1 A       A1      fruit                         1 TRUE       
2 A       A1      fruit                         1 TRUE       
3 <NA>    B1      fruit                         2 TRUE       
4 B       <NA>    fruit, vegetable              3 TRUE       
5 C       C1      vegetable, poultry, grain     4 FALSE      
6 C       C1      vegetable, poultry            4 FALSE      
7 C       C1      vegetable, poultry            4 FALSE      
8 <NA>    D1      poultry                       5 FALSE      
9 <NA>    D1      vegetable                     5 FALSE     

data

df1 <- structure(list(title.1 = c("A", "A", NA, "B", "C", "C", "C", 
NA, NA), title.2 = c("A1", "A1", "B1", NA, "C1", "C1", "C1", 
"D1", "D1"), name = c("fruit", "fruit", "fruit", "fruit, vegetable", 
"vegetable, poultry, grain", "vegetable, poultry", "vegetable, poultry", 
"poultry", "vegetable"), ID = c(1L, 1L, 2L, 3L, 4L, 4L, 4L, 5L, 
5L)), class = "data.frame", row.names = c(NA, -9L))
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading