I have a dataset that looks like this:
ID countries income
1 x HIC
1 x HIC
1 y LIC
1 z UIC
2 a HIC
3 y LIC
3 z UIC
I want to filter to keep unique income rows PER ID such that I would eliminate only the repeat income values within each ID, but not across all IDs; thus the data would look like:
ID countries income
1 x HIC
1 y LIC
1 z UIC
2 a HIC
3 y LIC
3 z UIC
the current command I have now is this, but this removes all repeats of income values. What can I tweak so that it’s only unique by ID?
dat2 <- dat1 %>% distinct(income, .keep_all = TRUE)
2nd question, if I wanted to further merge the rows such that the above data would look like this, how could I condense the countries and income rows by ID? Could also create a new variable to achieve this?
ID countries income
1 x,y,z HIC,LIC, UIC
2 a HIC
3 y,z LIC,UIC
Thank you in advance!
>Solution :
You may try
library(dplyr)
df %>%
group_by(ID) %>%
distinct(income, .keep_all = TRUE) %>%
summarise(across(everything(), ~paste0(.x, collapse = ",")))
ID countries income
<int> <chr> <chr>
1 1 x,y,z HIC,LIC,UIC
2 2 a HIC
3 3 y,z LIC,UIC