I’m writing an R function that operates on each row of a data frame. It needs to apply an adjustment to a measurement column holding the mass or volume of a food or drink. The adjustment differs depending on whether the item is a food or a drink and, for a drink, on what kind of drink it is.
I’ve written the following small example that shows roughly the code structure I have.
df <- data.frame(type = c("food","drink","drink"), drink_type = c(NA, "ice-cream","soft-drink"), measurement = c(100, 50, 60))
adjuster <- function(row) {
  # Foods use a fixed factor; drinks are delegated to drink_converter()
  adjusted_val <- switch(tolower(row[['type']]),
    "food" = row[['measurement']] * 1.2,
    "drink" = drink_converter(row),
    stop(paste0("adjuster: Unable to determine food type from value: ", row[['type']], " from `type` column"))
  )
  return(adjusted_val)
}
drink_converter <- function(row) {
  # The factor depends on the kind of drink
  adjusted_val <- switch(tolower(row[['drink_type']]),
    "ice-cream" = row[['measurement']] * 1.09,
    "soft-drink" = row[['measurement']] * 1.03,
    stop(paste0("drink_converter: Unable to determine drink type from value: ", row[['drink_type']], " from `drink_type` column"))
  )
  return(adjusted_val)
}
This works as expected one row at a time.
> adjuster(df[1,])
[1] 120
> adjuster(df[2,])
[1] 54.5
> adjuster(df[3,])
[1] 61.8
But I’m unsure how to apply it across the whole data frame in one go. apply does not work here: because my data frame contains character columns, apply coerces every value to character, and I get the following error:
> apply(df, 1, adjuster)
Error in row[["measurement"]] * 1.2 :
non-numeric argument to binary operator
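The coercion can be checked directly. This is a minimal sketch that rebuilds the same example data; the names m and row_classes are only illustrative. apply passes a data frame through as.matrix before iterating, so every row arrives as a character vector.

```r
# Same example data as above
df <- data.frame(type = c("food", "drink", "drink"),
                 drink_type = c(NA, "ice-cream", "soft-drink"),
                 measurement = c(100, 50, 60))

# A matrix holds a single type, so mixing character and numeric
# columns produces a character matrix
m <- as.matrix(df)
typeof(m)

# Each row that apply() hands to the function is therefore character,
# even though df$measurement itself is still numeric
row_classes <- apply(df, 1, class)
row_classes
is.numeric(df$measurement)
```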
Now I could handle this by wrapping every row[['measurement']] in as.numeric, which would give the behaviour I want. However, I’d like to know whether that is sensible or whether there is a neater way to do this in base R.
>Solution :
Try rowwise from dplyr (pick() needs dplyr 1.1.0 or later):
library(dplyr)
df %>%
  rowwise %>%
  mutate(out = adjuster(pick(everything()))) %>%
  ungroup
-output
# A tibble: 3 × 4
  type  drink_type measurement   out
  <chr> <chr>            <dbl> <dbl>
1 food  <NA>               100 120
2 drink ice-cream           50  54.5
3 drink soft-drink          60  61.8
apply with MARGIN = 1 converts the data frame to a matrix, and a matrix can hold only a single type, so the numeric column becomes character. Instead, use lapply over the row indices and subset one row at a time:
lapply(seq_len(nrow(df)), \(i) adjuster(df[i,]))
[[1]]
[1] 120
[[2]]
[1] 54.5
[[3]]
[1] 61.8
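If a plain numeric vector is more convenient than a list, the same row-index idea can be sketched with vapply. This is only a sketch under the question's assumptions: it repeats the example data and functions so that it runs on its own, and the out column name is borrowed from the dplyr example above.

```r
df <- data.frame(type = c("food", "drink", "drink"),
                 drink_type = c(NA, "ice-cream", "soft-drink"),
                 measurement = c(100, 50, 60))

# Functions from the question, with the error message of
# drink_converter() pointing at the drink_type column
drink_converter <- function(row) {
  switch(tolower(row[["drink_type"]]),
    "ice-cream" = row[["measurement"]] * 1.09,
    "soft-drink" = row[["measurement"]] * 1.03,
    stop("drink_converter: unknown drink_type: ", row[["drink_type"]])
  )
}

adjuster <- function(row) {
  switch(tolower(row[["type"]]),
    "food" = row[["measurement"]] * 1.2,
    "drink" = drink_converter(row),
    stop("adjuster: unknown type: ", row[["type"]])
  )
}

# df[i, ] is a one-row data frame, so column types are preserved.
# vapply() checks that every result is a single numeric value and
# returns a numeric vector that can be stored as a new column.
df$out <- vapply(seq_len(nrow(df)),
                 function(i) adjuster(df[i, ]),
                 numeric(1))
df
```

Compared with lapply, vapply fails early if any row returns something other than one number, which can help catch a missing case in either switch.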