Is it possible to interpolate a list of data frames in R?

I am using the function from the answer by lhs (https://stackoverflow.com/a/72467827/11124121):

#From lhs
library(tidyverse)
data("population")

# create some data to interpolate
population_5 <- population %>% 
  filter(year %% 5 == 0) %>% 
  mutate(female_pop = population / 2,
         male_pop = population / 2)

interpolate_func <- function(variable, data) {
  data %>% 
    group_by(country) %>% 
    # can't interpolate if only one year
    filter(n() >= 2) %>% 
    group_modify(~as_tibble(approx(.x$year, .x[[variable]], 
                                   xout = min(.x$year):max(.x$year)))) %>% 
    set_names(c("country", "year", paste0(variable, "_interpolated"))) %>% 
    ungroup()
}

The years that already have data, i.e. 2000 and 2005, are also interpolated. I want to keep the original data and interpolate only the missing years, that is,

2001-2004 ; 2006-2009

Therefore, I would like to construct a list:

population_5_list <- list(
  population_5 %>% filter(year %in% c(2000, 2005)),
  population_5 %>% filter(year %in% c(2005, 2010))
)

and interpolate the data frames in the list one by one.

However, an error appeared:

Error in UseMethod("group_by") :
no applicable method for 'group_by' applied to an object of class "list"
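
The message points at S3 dispatch: group_by() has methods for data frames, but the object passed in is the list that wraps them. A minimal sketch of the situation, assuming dplyr is installed; toy_list is hypothetical toy data standing in for population_5_list:

```r
library(dplyr)

# Hypothetical toy data standing in for population_5_list.
toy_list <- list(
  data.frame(country = c("A", "A"), year = c(2000, 2005), population = c(10, 20)),
  data.frame(country = c("A", "A"), year = c(2005, 2010), population = c(20, 40))
)

class(toy_list)       # "list": there is no group_by() method for this class
class(toy_list[[1]])  # "data.frame": this is what group_by() expects

# Calling group_by() on the list itself raises the error; keep only its message.
msg <- tryCatch(
  group_by(toy_list, country),
  error = function(e) conditionMessage(e)
)

# Calling group_by() on a single element of the list works.
grouped <- group_by(toy_list[[1]], country)
```

So the function has to be called once per element of the list rather than on the list as a whole.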

I am wondering how I can adapt interpolate_func to work with purrr so that it can be applied to a list.

Solution:

group_by() has no method for a list, so we need to loop over the list with map and apply interpolate_func to each data frame in turn.

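library(purrr)
library(dplyr)
# vars_to_interpolate is not defined in the question; it is assumed here
# to be every column except the keys country and year
vars_to_interpolate <- names(select(population_5, -country, -year))
# outer map(): one data frame of the list at a time
# inner map(): interpolate each variable, then join the results by key
map(population_5_list,
    ~ map(vars_to_interpolate, interpolate_func, data = .x) %>%
        reduce(full_join, by = c("country", "year")))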

Output:

[[1]]
# A tibble: 1,266 × 5
   country      year population_interpolated female_pop_interpolated male_pop_interpolated
   <chr>       <int>                   <dbl>                   <dbl>                 <dbl>
 1 Afghanistan  2000               20595360                10297680              10297680 
 2 Afghanistan  2001               21448459                10724230.             10724230.
 3 Afghanistan  2002               22301558                11150779              11150779 
 4 Afghanistan  2003               23154657                11577328.             11577328.
 5 Afghanistan  2004               24007756                12003878              12003878 
 6 Afghanistan  2005               24860855                12430428.             12430428.
 7 Albania      2000                3304948                 1652474               1652474 
 8 Albania      2001                3283184.                1641592.              1641592.
 9 Albania      2002                3261421.                1630710.              1630710.
10 Albania      2003                3239657.                1619829.              1619829.
# … with 1,256 more rows
# ℹ Use `print(n = ...)` to see more rows

[[2]]
# A tibble: 1,278 × 5
   country      year population_interpolated female_pop_interpolated male_pop_interpolated
   <chr>       <int>                   <dbl>                   <dbl>                 <dbl>
 1 Afghanistan  2005               24860855                12430428.             12430428.
 2 Afghanistan  2006               25568246.               12784123.             12784123.
 3 Afghanistan  2007               26275638.               13137819.             13137819.
 4 Afghanistan  2008               26983029.               13491515.             13491515.
 5 Afghanistan  2009               27690421.               13845210.             13845210.
 6 Afghanistan  2010               28397812                14198906              14198906 
 7 Albania      2005                3196130                 1598065               1598065 
 8 Albania      2006                3186933.                1593466.              1593466.
 9 Albania      2007                3177735.                1588868.              1588868.
10 Albania      2008                3168538.                1584269.              1584269.
# … with 1,268 more rows
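
In the output, the known years (2000, 2005, 2010) keep their original values, which is what linear interpolation with approx() should give at the supplied points. A small base-R check, offered as a sketch rather than a statement about the full data set: it uses only the three Afghanistan values shown above, and the names whole, seg1 and seg2 are illustrative.

```r
# Known values for Afghanistan, copied from the output above.
year <- c(2000, 2005, 2010)
pop  <- c(20595360, 24860855, 28397812)

# Interpolate over the whole range in a single call.
whole <- approx(year, pop, xout = 2000:2010)

# Interpolate each five-year segment separately, as in the list approach.
seg1 <- approx(year[1:2], pop[1:2], xout = 2000:2005)
seg2 <- approx(year[2:3], pop[2:3], xout = 2005:2010)

# The known years come back unchanged.
known_kept <- all.equal(whole$y[whole$x %in% year], pop)

# Both approaches agree; 2005 appears in both segments, so drop one copy.
same_result <- all.equal(whole$y, c(seg1$y, seg2$y[-1]))
```

If both checks come back TRUE, splitting the data into five-year pieces is a matter of preference for this kind of linear interpolation; a single call over the full range would fill the same gaps.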