Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Combine multiple rows based on ID column

I have the following df:

df<- structure(list(X18.digit.contact.id = c("0034y00002kIZ3rAAG", 
"0034y00002kIZ3rAAG", "0034y00002kIZ3rAAG", "0034y00002PpX11AAF", 
"0034y00002PpX11AAF", "0034y00002PpX11AAF", "0034y00002jHjYKAA0", 
"0034y00002jHjYKAA0", "0034y00002jHjYKAA0"), `Fitness Goal` = c(2L, 
NA, NA, -1L, NA, NA, NA, 1L, NA), `Nutrition/Hydration Goal` = c(NA, 
NA, NA, NA, 0L, NA, 2L, NA, NA), `Lifestyle Goal` = c(NA, NA, 
2L, NA, NA, 0L, NA, NA, 1L)), class = c("grouped_df", "tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -9L), groups = structure(list(
    X18.digit.contact.id = c("0034y00002PpX11AAF", "0034y00002jHjYKAA0", 
    "0034y00002kIZ3rAAG"), .rows = structure(list(4:6, 7:9, 1:3), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -3L), .drop = TRUE))

I’m hoping to collapse these rows together based on the X18contactid column. So, the final df would consist of 3 rows (3 unique IDs), and four columns (id, fitness goal, nutrition/hydration, lifestyle).

I’ve tried various solutions that I’ve found on here from summarize functions to

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

 lapply: df2<- df[, lapply(.SD, paste0, collapse=""), by=X18.digit.contact.id] 

However, none of them seem to work. I can pivot it longer, remove the NA’s and then pivot it wide again which gives me the row format that I want, however, it gives the following warning message which restricts the functions I am able to do on the columns afterwards:

Values from `value` are not uniquely identified; output will contain list-cols.

Note: I can get rid of this warning message by assigning a unique row number to each when I use pivot_wider, however, the output is the same as df, where the rows are not grouped by the id variable.

Any help/insight with this would be appreciated.

>Solution :

in Base R:

aggregate(.~X18.digit.contact.id, df, na.omit, na.action = identity)

  X18.digit.contact.id Fitness Goal Nutrition/Hydration Goal Lifestyle Goal
1   0034y00002jHjYKAA0            1                        2              1
2   0034y00002kIZ3rAAG            2                        1              2
3   0034y00002PpX11AAF           -1                        0              0

pivot_longer(df,-1, values_drop_na = TRUE)%>%
   pivot_wider()

# A tibble: 3 × 4
# Groups:   X18.digit.contact.id [3]
  X18.digit.contact.id `Fitness Goal` `Nutrition/Hydration Goal` `Lifestyle Goal`
  <chr>                         <int>                      <int>            <int>
1 0034y00002kIZ3rAAG                2                          1                2
2 0034y00002PpX11AAF               -1                          0                0
3 0034y00002jHjYKAA0                1                          2                1

In datatable

dcast(na.omit(melt(df, 1)),X18.digit.contact.id~variable)
   X18.digit.contact.id Fitness Goal Nutrition/Hydration Goal Lifestyle Goal
1:   0034y00002PpX11AAF           -1                        0              0
2:   0034y00002jHjYKAA0            1                        2              1
3:   0034y00002kIZ3rAAG            2                        1              2
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading