I’m struggling to calculate the wear of a component using the lag of a variable. I need the wear per group, so I’m using the group_by function. The problem is that when I group by the variable I actually need, the result is a column of NAs, but when I test by grouping on another variable that has fewer levels, the calculation works.
The data frame I’m using has 4093902 rows and 52 columns. The variable I need to group by for the wear calculation has 90183 levels. The other one that I tested, and which worked, has 11321 levels.
Here’s the code I’m using:
final_date = result_data %>%
  arrange(time) %>%
  group_by(id_specific) %>%
  mutate(wear = dplyr::lag(some_value, n = 1, default = NA) - some_value)
Does anyone know if there is a factor limit for grouping? Or any other tips on how I can perform this calculation?
>Solution :
The NA can come from either of two sources: lag, which by default returns NA for the first row of each group, or some_value itself, which may also contain NA. Any arithmetic with an NA on the lhs or rhs, including -, returns NA. One option is to compute the difference with a function (rowSums) that accepts na.rm = TRUE:
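library(dplyr)
final_date <- result_data %>%
  arrange(time) %>%
  group_by(id_specific) %>%
  # previous value within each group; NA on the first row of a group
  mutate(some_value_new = dplyr::lag(some_value, n = 1, default = NA)) %>%
  ungroup() %>%
  # previous minus current, ignoring NA on either side; then drop the helper column
  mutate(wear = rowSums(cbind(some_value_new, -1 * some_value), na.rm = TRUE),
         some_value_new = NULL)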
NOTE: It is better to ungroup before calling rowSums. rowSums already works row by row, so the grouping is not needed at that point and dropping it avoids per-group overhead.
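To see where the NAs come from, here is a minimal sketch on made-up data. The data frame toy and the column names previous, wear_plain and wear_sum are hypothetical and only serve the illustration; id_specific, time and some_value mirror the names in the question. It compares the plain subtraction with the rowSums version, and then counts rows and missing readings per group, which may help to check whether the real data has many one-row groups or NAs in some_value.

```r
library(dplyr)

# Hypothetical toy data: two components, one with a missing reading
toy <- data.frame(
  id_specific = c("a", "a", "a", "b", "b"),
  time        = c(1, 2, 3, 1, 2),
  some_value  = c(10, 8, 5, 20, NA)
)

toy_wear <- toy %>%
  arrange(time) %>%
  group_by(id_specific) %>%
  mutate(
    previous   = dplyr::lag(some_value),   # NA on the first row of each group
    wear_plain = previous - some_value     # NA if either operand is NA
  ) %>%
  ungroup() %>%
  # NA operands are dropped from the sum instead of propagating
  mutate(wear_sum = rowSums(cbind(previous, -1 * some_value), na.rm = TRUE))

# Diagnostic: rows and missing readings per group
group_check <- toy %>%
  group_by(id_specific) %>%
  summarise(n_rows = n(), n_missing = sum(is.na(some_value)))

print(toy_wear)
print(group_check)
```

Note that with na.rm = TRUE the first row of each group gets -some_value rather than NA, and a row with a missing reading gets the previous value. Whether those numbers are meaningful as wear depends on the data, so it may be preferable to keep the NA there or filter those rows afterwards.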