I have a nested list sampleList that can contain a variable number of data frames. In this example there are 3 data frames:
df1 <- data.frame(id = as.integer(c(1, 6)), key = c('apple', 'apple.green'), stringsAsFactors=FALSE)
df2 <- data.frame(id = as.integer(c(1, 3, 5)), key = c('apple', 'apple.red', 'apple.red.rotten'), stringsAsFactors = FALSE)
df3 <- data.frame(id = as.integer(c(17)), key = c('orange'), stringsAsFactors = FALSE)
sampleList <- list(df1, df2, df3)
I want to search for specific integers, e.g. 6, in the id column across all data frames contained in sampleList. As a result, I need the position and, if possible, the associated value from the key column.
The closest I got was finding the position within one specific data frame, e.g. the first one:
which(sampleList[[1]] == 6)
[1] 2
Since the number of data frames can be different each time, I need a more dynamic query.
Thanks a lot for your help.
>Solution :
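I have slightly altered the data, adding a row with id 6 to df3, and rebuilt sampleList so that it picks up the change.
df3 <- data.frame(id = as.integer(c(17, 6)), key = c('orange', "blue"), stringsAsFactors = FALSE)
sampleList <- list(df1, df2, df3)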
Filter(nrow,
lapply(sampleList, subset, id == 6)
)
[[1]]
  id         key
2  6 apple.green

[[2]]
  id  key
2  6 blue
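Explanation: We first subset every list element by the criterion id == 6, which leaves a zero-row data frame wherever nothing matched. Filter then coerces the result of nrow to logical, so a row count of 0 is treated as FALSE and those empty data frames are dropped, while any positive row count counts as TRUE.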
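The coercion step can be checked in isolation. The following is a minimal sketch using two toy data frames, `a` and `b`, which are made up for illustration and are not part of the question's data:

```r
# Hypothetical toy data, only for illustration
a <- data.frame(id = c(1L, 6L), key = c("x", "y"), stringsAsFactors = FALSE)
b <- data.frame(id = 3L, key = "z", stringsAsFactors = FALSE)

# Subset each element; `b` has no id equal to 6, so it becomes a zero-row data frame
hits <- lapply(list(a, b), subset, id == 6)

counts <- sapply(hits, nrow)   # row counts per element: 1 and 0
as.logical(counts)             # TRUE FALSE, which is what Filter acts on

# Filter keeps only the elements whose row count coerces to TRUE
kept <- Filter(nrow, hits)
length(kept)                   # 1
```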
To extract the positions (subset keeps the original row names, which equal the row positions as long as the data frames carry default row names),
Filter(nrow,
lapply(sampleList, subset, id == 6)
) |>
lapply(\(x) as.integer(rownames(x)))
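If the data frames might carry custom row names, the row-name approach above would no longer return positions. One possible alternative, sketched here under the assumption that every list element has an `id` column, is to ask `which()` for the positions directly:

```r
df1 <- data.frame(id = c(1L, 6L), key = c("apple", "apple.green"), stringsAsFactors = FALSE)
df2 <- data.frame(id = c(1L, 3L, 5L), key = c("apple", "apple.red", "apple.red.rotten"), stringsAsFactors = FALSE)
df3 <- data.frame(id = c(17L, 6L), key = c("orange", "blue"), stringsAsFactors = FALSE)
sampleList <- list(df1, df2, df3)

# which() returns the matching row positions, regardless of row names
positions <- lapply(sampleList, function(d) which(d$id == 6))

# Drop data frames without a match (length 0 coerces to FALSE)
positions <- Filter(length, positions)
```

Here `positions` holds one integer vector per data frame that contained a match; for the altered sample data that is row 2 in each of the two matching data frames.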
To make it clear in which data.frame matches were found,
Filter(nrow,
lapply(sampleList, subset, id == 6) |>
setNames(seq_along(sampleList)) # swap to appropriate naming policy
) |>
lapply(\(x) as.integer(rownames(x)))
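Since the question asks for the position and the key together, it may be convenient to collect everything into a single data frame. The helper below is only a sketch: the name `find_id` and the output columns `df`, `row` and `key` are hypothetical choices, and it assumes every list element has `id` and `key` columns.

```r
df1 <- data.frame(id = c(1L, 6L), key = c("apple", "apple.green"), stringsAsFactors = FALSE)
df2 <- data.frame(id = c(1L, 3L, 5L), key = c("apple", "apple.red", "apple.red.rotten"), stringsAsFactors = FALSE)
df3 <- data.frame(id = c(17L, 6L), key = c("orange", "blue"), stringsAsFactors = FALSE)
sampleList <- list(df1, df2, df3)

# Hypothetical helper: one output row per match, recording which data frame
# it came from, the row position inside it, and the associated key
find_id <- function(lst, value) {
  rows <- lapply(seq_along(lst), function(i) {
    pos <- which(lst[[i]]$id == value)
    data.frame(df  = rep(i, length(pos)),
               row = pos,
               key = lst[[i]]$key[pos],
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)  # data frames without a match contribute zero rows
}

result <- find_id(sampleList, 6L)
result$df   # 1 3
result$row  # 2 2
result$key  # "apple.green" "blue"
```

Because the helper loops over `seq_along(lst)`, it works for any number of data frames in the list, which is the dynamic behaviour the question asks for.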