I am trying to understand how to properly format a combination of lapply, rbind and do.call in one statement, and I can’t get the statement to run. I have supplied a simple example function and data that I’m using to understand the formatting. I fully understand that this scenario could be run using a simpler method; the purpose is simply to understand the formatting and how to use lapply and rbind with a custom function.
Here’s some test data:
facility_id patient_number test_result
123 1000 25
123 1000 30
25 1001 12
25 1002 67
25 1010 75
65 1009 8
22 1222 95
22 1223 89
I’m essentially trying to subset the data inside a custom function using a list of facility id values, and then bind together the data tables that the custom function returns.
Here’s the code I’ve used:
facilities_id_list<-c(123, 25)
facility_counts<-function(facilities_id_list){
facility<-facilities_id_list[[i]]
subset<-data[facility_id==facility]
}
results <- do.call("rbind", lapply(seq_along(facilities_id_list), function(i) facility_counts)
The result I’m hoping to achieve:
facility_id patient_number test_result
123 1000 25
123 1000 30
25 1001 12
25 1002 67
25 1010 75
Why does this not work? Do I need to change the formatting?
>Solution :
Instead of looping and comparing with ==, use %in% to subset directly against the whole list of ids:
subset(data, facility_id %in% facilities_id_list)
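To make the one-liner easy to try, here is a minimal self-contained sketch. It assumes the test data from the question is held in a plain data.frame named data (the question does not show how the object was created, so the data.frame() call is an assumption), and result_in is just an illustrative name.

```r
# Test data from the question, rebuilt as a data.frame (assumed construction)
data <- data.frame(
  facility_id    = c(123, 123, 25, 25, 25, 65, 22, 22),
  patient_number = c(1000, 1000, 1001, 1002, 1010, 1009, 1222, 1223),
  test_result    = c(25, 30, 12, 67, 75, 8, 95, 89)
)
facilities_id_list <- c(123, 25)

# Keep every row whose facility_id matches any value in the list
result_in <- subset(data, facility_id %in% facilities_id_list)
print(result_in)
```

Note that the rows come back in the order they appear in data, not in the order of facilities_id_list; for this data the two orders happen to agree.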
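In the OP’s code, there are multiple issues: 1) the function’s argument is facilities_id_list, whereas lapply loops over the index sequence, so the function should take i; 2) facility_id==facility should be data$facility_id==facility, because [ on a data.frame does not look up column names inside data; 3) the row index must be followed by a comma, because without one the index is treated as a column index in a data.frame; 4) function(i) facility_counts returns the function itself instead of calling it, so pass facility_counts to lapply directly (the original call is also missing a closing parenthesis).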
facility_counts<-function(i){
facility<-facilities_id_list[[i]]
data[data$facility_id == facility, ]
}
> do.call(rbind, lapply(seq_along(facilities_id_list), facility_counts))
facility_id patient_number test_result
1 123 1000 25
2 123 1000 30
3 25 1001 12
4 25 1002 67
5 25 1010 75
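As a variation on the same lapply / do.call(rbind, ...) pattern, the function can take the facility id itself instead of an index, with the data passed in explicitly. This is only a sketch: facility_rows, df and pieces are hypothetical names, and the data.frame() construction of the test data is an assumption.

```r
# Test data from the question, rebuilt as a data.frame (assumed construction)
data <- data.frame(
  facility_id    = c(123, 123, 25, 25, 25, 65, 22, 22),
  patient_number = c(1000, 1000, 1001, 1002, 1010, 1009, 1222, 1223),
  test_result    = c(25, 30, 12, 67, 75, 8, 95, 89)
)
facilities_id_list <- c(123, 25)

# Hypothetical helper: receives one facility id and the data to subset
facility_rows <- function(id, df) {
  df[df$facility_id == id, ]  # the trailing comma selects rows, keeping all columns
}

# lapply passes each id as the first argument; df = data is forwarded unchanged
pieces <- lapply(facilities_id_list, facility_rows, df = data)

# do.call(rbind, pieces) is equivalent to rbind(pieces[[1]], pieces[[2]])
results <- do.call(rbind, pieces)
print(results)
```

Passing the data as an argument avoids relying on a global variable inside the function, and looping over the ids directly removes the need for seq_along and [[i]].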