I have a dataframe like:
species <- c("Tulip", "Tulip", "Tulip", "Dandy", "Dandy", "Dandy", "Rose", "Rose", "Rose")
Year <- c(2018, 2019, 2020, 2015, 2018, 2019, 2011, 2018, 2019)
length1 <- c(4, 2, 3, 4, 1,5,6,4,7)
length2 <- c(3,2,4, NA, 4, 4, NA, 3,2)
df <- data.frame(species, Year, length, length2)
The earliest year is 2011 for "Rose" in the species column, and the latest year is 2020 for "Tulip".
I want to add the missing years from 2011 to 2020 for all the groups in the "species" column, so that each group goes from 2011 to 2020.
And for column "length" and "length2", I want the values for the newly added years rows to be "0" since there is no data for it.
How would I do this?
edit: In my working data, I have up to 80 columns like "length1" and "length2"
TIA
>Solution :
Using tidyr::complete you could do:
df |>
tidyr::complete(species, Year = 2011:2020, fill = list(length1 = 0, length2 = 0))
#> # A tibble: 30 × 4
#> species Year length1 length2
#> <chr> <dbl> <dbl> <dbl>
#> 1 Dandy 2011 0 0
#> 2 Dandy 2012 0 0
#> 3 Dandy 2013 0 0
#> 4 Dandy 2014 0 0
#> 5 Dandy 2015 4 0
#> 6 Dandy 2016 0 0
#> 7 Dandy 2017 0 0
#> 8 Dandy 2018 1 4
#> 9 Dandy 2019 5 4
#> 10 Dandy 2020 0 0
#> # … with 20 more rows