Aggrate (sum) across multiple columns by FIPS i.e. county ID for two time period ranges

Based on the data below how can I sum the columns Inflow, Outflow, NetMigration, InAGI and OutAGI by FIPS across two time periods 2011-2015 and 2016-2020? Some counties might not have the data for a particular fiscal year but, that does not matter since the idea is to sum the data within these two time period ranges. Naturally some NAs are to be expected in the final dataset. I am using FIPS because there are several counties with the same name. So, the column Key is no longer needed as it’s a concatenate of FIPS and Year.

Desired output schema/columns:

    FIPS    County          State   TotInflow   TotOutflow  TotNetMigration TotInAGI    TotOutAGI   Time_Period
12001       Alachua County  FL                                                                      2011-2015
12001       Alachua County  FL                                                                      2016-2020
08001       Adams County    CO                                                                      2011-2015
08001       Adams County    CO                                                                      2016-2020

Sample data:

     df = structure(list(Key = c("080012020", "120012020", "120012018", 
"120012017", "080012017", "120012016", "120012015", "080012014", 
"120012013", "120012012", "080012012", "080012011", "080012016"
), County = c("Adams County", "Alachua County", "Alachua County", 
"Alachua County", "Adams County", "Alachua County", "Alachua County", 
"Adams County", "Alachua County", "Alachua County", "Adams County", 
"Adams County", "Adams County"), State = c("CO", "FL", "FL", 
"FL", "CO", "FL", "FL", "CO", "FL", "FL", "CO", "CO", "CO"), 
    FIPS = c("08001", "12001", "12001", "12001", "08001", "12001", 
    "12001", "08001", "12001", "12001", "08001", "08001", "08001"
    ), Inflow = c(38L, 261L, 321L, 339L, 58L, 288L, 254L, 46L, 
    413L, 433L, 30L, 42L, NA), InAGI = c(1817L, 6287L, 8423L, 
    8364L, 1865L, 14720L, 5224L, 1074L, 11774L, 10151L, 921L, 
    500L, NA), FiscalYear = c("2019- 2020", "2019- 2020", "2017 - 2018", 
    "2016 - 2017", "2016 - 2017", "2015 - 2016", "2014 - 2015", 
    "2013 - 2014", "2012 - 2013", "2011 - 2012", "2011 - 2012", 
    "2010 - 2011", "2015 - 2016"), Year = c(2020L, 2020L, 2018L, 
    2017L, 2017L, 2016L, 2015L, 2014L, 2013L, 2012L, 2012L, 2011L, 
    2016L), Outflow = c(54L, 447L, 444L, 558L, 44L, 436L, 334L, 
    49L, 466L, 495L, 39L, 31L, 51L), OutAGI = c(1879L, 13106L, 
    15409L, 16496L, 2408L, 12675L, 7448L, 733L, 10309L, 11677L, 
    847L, 605L, 1114L), NetMigration = c(-16L, -186L, -123L, 
    -219L, 14L, -148L, -80L, -3L, -53L, -62L, -9L, 11L, NA)), row.names = c(NA, 
-13L), class = "data.frame")

>Solution :

Do the grouping by ‘FIPS’, ‘County’, ‘State’ and a Time_Period column created from ‘Year’ based on whether the ‘Year’ lies between some start, end year, and then get the sum of the columns interested by looping across those column names

library(dplyr)
df %>%
   group_by(FIPS, County, State, 
   Time_Period = case_when(between(Year, 2011, 2015)~
     '2011-2015', between(Year, 2016, 2020)~ '2016-2020')) %>% 
  summarise(across(c(Inflow, InAGI, Outflow, OutAGI, NetMigration), 
   ~ sum(.x, na.rm = TRUE), .names = "Total{.col}"),.groups = "drop")

-output

# A tibble: 4 × 9
  FIPS  County         State Time_Period TotalInflow TotalInAGI TotalOutflow TotalOutAGI TotalNetMigration
  <chr> <chr>          <chr> <chr>             <int>      <int>        <int>       <int>             <int>
1 08001 Adams County   CO    2011-2015           118       2495          119        2185                -1
2 08001 Adams County   CO    2016-2020            96       3682          149        5401                -2
3 12001 Alachua County FL    2011-2015          1100      27149         1295       29434              -195
4 12001 Alachua County FL    2016-2020          1209      37794         1885       57686              -676

Leave a Reply