The example data is shown below:
| id | a | b | n1 | n2 |
|---|---|---|---|---|
| 1 | 1 | 1 | 10 | 20 |
| 2 | 2 | 2 | 20 | 40 |
| 3 | 0 | 0 | 10 | 20 |
| 4 | 1 | 0 | 20 | 40 |
| 5 | 0 | 1 | 10 | 20 |
I need to calculate two scores, k1 and k2, in R.
C is a constant.
k1=(a/b)/(n1/n2+C)
k2=(a/b)/(n1+n2+C)
Because row 3 is a double-zero row (a = 0 and b = 0), a/b is 0/0, so k1 and k2 evaluate to NaN, which is.na() treats as missing. If k1 or k2 is NA, an alternative formula is used instead:
k1=n1/(n1+n2)
k2=n2/(n1+n2)
What I did was use a for loop to compute each value cell by cell, but this will be very slow on a large dataset. The apply function seems like a faster method, but I don't know how to write a working function for apply(data, 1, function), or what kind of input apply passes to it. Is there a faster, more elegant way to do this than a for loop? Thank you so much.
My code is pasted below:
k1 = c()
k2 = c()
C = 0.25
for (i in 1:nrow(data)) {
  # Main formulas
  k1[i] = (data[i, "a"] / data[i, "b"]) / (data[i, "n1"] / data[i, "n2"] + C)
  k2[i] = (data[i, "a"] / data[i, "b"]) / (data[i, "n1"] + data[i, "n2"] + C)
  # Alternative formulas when the result is NA
  if (is.na(k1[i])) {
    k1[i] = data[i, "n1"] / (data[i, "n1"] + data[i, "n2"])
  }
  if (is.na(k2[i])) {
    k2[i] = data[i, "n2"] / (data[i, "n1"] + data[i, "n2"])
  }
}
>Solution :
You can use the mutate() function from {dplyr}:
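library(dplyr)  # provides mutate() and the %>% pipe

# Calculate k1 and k2 (C is the constant from the question, C = 0.25)
data <- data %>%
  mutate(
    k1 = (a / b) / (n1 / n2 + C),               # main formula for k1
    k2 = (a / b) / (n1 + n2 + C),               # main formula for k2
    k1 = ifelse(is.na(k1), n1 / (n1 + n2), k1), # alternative formula if k1 is NA
    k2 = ifelse(is.na(k2), n2 / (n1 + n2), k2)  # alternative formula if k2 is NA
  )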
This gives the same result as your code, but is more efficient because mutate() operates on whole columns at once rather than one cell at a time:
# A tibble: 5 × 6
      a     b    n1    n2      k1      k2
  <dbl> <dbl> <dbl> <dbl>   <dbl>   <dbl>
1     1     1    10    20   1.33   0.0331
2     2     2    20    40   1.33   0.0166
3     0     0    10    20   0.333  0.667
4     1     0    20    40   Inf    Inf
5     0     1    10    20   0      0
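If you would rather not depend on {dplyr}, the same idea can be sketched in base R, since arithmetic on data frame columns is already vectorized. This is only a minimal sketch: `add_scores` is a hypothetical helper name I made up for illustration, and the data frame is rebuilt from the table in the question with C = 0.25 as in the question's code.

```r
# Example data, copied from the table in the question
data <- data.frame(
  id = 1:5,
  a  = c(1, 2, 0, 1, 0),
  b  = c(1, 2, 0, 0, 1),
  n1 = c(10, 20, 10, 20, 10),
  n2 = c(20, 40, 20, 40, 20)
)
C <- 0.25  # constant, same value as in the question's code

# Hypothetical helper (not part of the original answer):
# computes both scores on whole columns at once, with no explicit loop
add_scores <- function(df, C) {
  ratio <- df$a / df$b                 # 0/0 gives NaN, 1/0 gives Inf
  k1 <- ratio / (df$n1 / df$n2 + C)    # main formula for k1
  k2 <- ratio / (df$n1 + df$n2 + C)    # main formula for k2
  total <- df$n1 + df$n2
  # is.na() is TRUE for both NA and NaN, so double-zero rows get the fallback
  df$k1 <- ifelse(is.na(k1), df$n1 / total, k1)
  df$k2 <- ifelse(is.na(k2), df$n2 / total, k2)
  df
}

result <- add_scores(data, C)
print(result)
```

As for apply(data, 1, f): it hands f one row at a time as a vector, so it is still a row-by-row loop under the hood, which is why working on whole columns tends to be faster. Also note that row 4 stays Inf because is.na(Inf) is FALSE; if single-zero rows like that should use the alternative formula too (an assumption about what you want), testing with !is.finite() instead of is.na() would catch them.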