If I have the following two objects:
> set.seed(100)
> lookup <- sample(1:3, 20, replace=T)
> lookup
[1] 2 3 2 3 1 2 2 3 2 2 3 2 2 3 3 3 3 2 1 3
and
> tb <- tibble(A=runif(20,0,1), B=runif(20,0,1), C= runif(20,0,1))
> tb
> tb
# A tibble: 20 Ă— 3
A B C
<dbl> <dbl> <dbl>
1 0.770 0.780 0.456
2 0.882 0.884 0.445
3 0.549 0.208 0.245
4 0.278 0.307 0.694
5 0.488 0.331 0.412
6 0.929 0.199 0.328
7 0.349 0.236 0.573
8 0.954 0.275 0.967
9 0.695 0.591 0.662
10 0.889 0.253 0.625
11 0.180 0.123 0.857
12 0.629 0.230 0.775
13 0.990 0.598 0.834
14 0.130 0.211 0.0915
15 0.331 0.464 0.460
16 0.865 0.647 0.599
17 0.778 0.961 0.920
18 0.827 0.676 0.983
19 0.603 0.445 0.0378
20 0.491 0.358 0.578
How do I use lookup to select the value of the corresponding row/column from tb?
i.e.
- if the first element of
lookup= 1 then I would like to select the value in A from the first row oftb - if the second element of
lookup= 2 then I would like to select the value in B from the second row oftb
So I should end up with a 1d vector that is the same size as lookup. It will look like this:
> new data
> [1] 0.780 0.445 0.208 0.694 0.488 ... 0.578
Thanks!
>Solution :
data.frame (but not tibble or data.table) supports indexing on a matrix, so with this data,
set.seed(42)
lookup <- sample(1:3, 20, replace=T)
lookup
# [1] 1 1 1 1 2 2 2 1 3 3 1 1 2 2 2 3 3 1 1 3
tb <- tibble(A=runif(20,0,1), B=runif(20,0,1), C= runif(20,0,1))
head(tb)
# # A tibble: 6 x 3
# A B C
# <dbl> <dbl> <dbl>
# 1 0.514 0.958 0.189
# 2 0.390 0.888 0.271
# 3 0.906 0.640 0.828
# 4 0.447 0.971 0.693
# 5 0.836 0.619 0.241
# 6 0.738 0.333 0.0430
We can do
as.data.frame(tb)[cbind(seq_along(lookup), lookup)]
# [1] 0.514211784 0.390203467 0.905738131 0.446969628 0.618838207 0.333427211 0.346748248 0.388108283 0.479398564
# [10] 0.197410342 0.832916080 0.007334147 0.171264330 0.261087964 0.514412935 0.581604003 0.157905208 0.037431033
# [19] 0.973539914 0.775823363
A less-efficient method can be done without as.data.frame:
mapply(`[[`, list(tb), seq_along(lookup), lookup)
# [1] 0.514211784 0.390203467 0.905738131 0.446969628 0.618838207 0.333427211 0.346748248 0.388108283 0.479398564
# [10] 0.197410342 0.832916080 0.007334147 0.171264330 0.261087964 0.514412935 0.581604003 0.157905208 0.037431033
# [19] 0.973539914 0.775823363
## also works with `list(as.data.table(tb))`
Though it does take a big hit in performance (not a surprise):
bench::mark(
sindri_baldur1 = unlist(tb, use.names = FALSE)[seq_along(lookup) + (lookup - 1L)*nrow(tb)],
sindri_baldur2 = unlist(tb)[seq_along(lookup) + (lookup - 1L)*nrow(tb)],
base = as.data.frame(tb)[cbind(seq_along(lookup), lookup)],
mapply = mapply(`[[`, list(tb), seq_along(lookup), lookup),
paulsmith2 = {
tb %>%
mutate(lookup = lookup) %>%
rowwise %>%
mutate(new = c_across(A:C)[lookup]) %>%
pull(new)
},
check = FALSE)
# # A tibble: 5 x 13
# expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result memory time gc
# <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm> <list> <list> <list> <list>
# 1 sindri_baldur1 4.5us 5.3us 159430. 736B 15.9 9999 1 62.7ms <NULL> <Rprof~ <benc~ <tibb~
# 2 sindri_baldur2 13.2us 14.7us 56723. 1.44KB 0 10000 0 176.3ms <NULL> <Rprof~ <benc~ <tibb~
# 3 base 78.3us 91.6us 7334. 944B 8.59 3414 4 465.5ms <NULL> <Rprof~ <benc~ <tibb~
# 4 mapply 612.4us 779.45us 942. 720B 6.39 442 3 469.4ms <NULL> <Rprof~ <benc~ <tibb~
# 5 paulsmith2 4.37ms 5.85ms 147. 20.3KB 6.51 68 3 461.1ms <NULL> <Rprof~ <benc~ <tibb~
(I have to use check=FALSE to work with the names introduced in sindri_baldur2, otherwise all results are numerically identical.)