I have the following df:
df<- structure(list(X18.digit.contact.id = c("0034y00002kIZ3rAAG",
"0034y00002kIZ3rAAG", "0034y00002kIZ3rAAG", "0034y00002PpX11AAF",
"0034y00002PpX11AAF", "0034y00002PpX11AAF", "0034y00002jHjYKAA0",
"0034y00002jHjYKAA0", "0034y00002jHjYKAA0"), `Fitness Goal` = c(2L,
NA, NA, -1L, NA, NA, NA, 1L, NA), `Nutrition/Hydration Goal` = c(NA,
NA, NA, NA, 0L, NA, 2L, NA, NA), `Lifestyle Goal` = c(NA, NA,
2L, NA, NA, 0L, NA, NA, 1L)), class = c("grouped_df", "tbl_df",
"tbl", "data.frame"), row.names = c(NA, -9L), groups = structure(list(
X18.digit.contact.id = c("0034y00002PpX11AAF", "0034y00002jHjYKAA0",
"0034y00002kIZ3rAAG"), .rows = structure(list(4:6, 7:9, 1:3), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -3L), .drop = TRUE))
I’m hoping to collapse these rows together based on the X18contactid column. So, the final df would consist of 3 rows (3 unique IDs), and four columns (id, fitness goal, nutrition/hydration, lifestyle).
I’ve tried various solutions that I’ve found on here from summarize functions to
lapply: df2<- df[, lapply(.SD, paste0, collapse=""), by=X18.digit.contact.id]
However, none of them seem to work. I can pivot it longer, remove the NA’s and then pivot it wide again which gives me the row format that I want, however, it gives the following warning message which restricts the functions I am able to do on the columns afterwards:
Values from `value` are not uniquely identified; output will contain list-cols.
Note: I can get rid of this warning message by assigning a unique row number to each when I use pivot_wider
, however, the output is the same as df
, where the rows are not grouped by the id variable.
Any help/insight with this would be appreciated.
>Solution :
in Base R:
aggregate(.~X18.digit.contact.id, df, na.omit, na.action = identity)
X18.digit.contact.id Fitness Goal Nutrition/Hydration Goal Lifestyle Goal
1 0034y00002jHjYKAA0 1 2 1
2 0034y00002kIZ3rAAG 2 1 2
3 0034y00002PpX11AAF -1 0 0
pivot_longer(df,-1, values_drop_na = TRUE)%>%
pivot_wider()
# A tibble: 3 × 4
# Groups: X18.digit.contact.id [3]
X18.digit.contact.id `Fitness Goal` `Nutrition/Hydration Goal` `Lifestyle Goal`
<chr> <int> <int> <int>
1 0034y00002kIZ3rAAG 2 1 2
2 0034y00002PpX11AAF -1 0 0
3 0034y00002jHjYKAA0 1 2 1
In datatable
dcast(na.omit(melt(df, 1)),X18.digit.contact.id~variable)
X18.digit.contact.id Fitness Goal Nutrition/Hydration Goal Lifestyle Goal
1: 0034y00002PpX11AAF -1 0 0
2: 0034y00002jHjYKAA0 1 2 1
3: 0034y00002kIZ3rAAG 2 1 2