Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Having trouble with pivoting data in R

Below is the sample data. My goal is to create a single row (monthly data) for each area/industry/ownership combination. For this sample data set, there will be 24 values per area/industry/ownership combination. I know that you have to do a series of pivots but my attempts have not been successful. At the bottom is the desired result.

In my larger data set, I have far more than 3 years and one industry but this keeps it manageable.

 naicscode <- c("111","111","111","111","111","111","111","111","111","111","111","111","111","111","111","111")
 areavalue <- c("000000","000000","000000","000000","000000","000000","000000","000000","000003","000003","000003","000003","000003","000003","000003","000003")
 ownership <- c("50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50")
 period <- c("01","02","03","04","01","02","03","04","01","02","03","04","01","02","03","04")
 periodyear <- c("2020","2020","2020","2020","2021","2021","2021","2021", "2020","2020","2020","2020", "2021","2021","2021","2021")
 mnth1emp<- c(25000,25005,25010,25020,25025,20506,20510,21555,16000,16005,16025,16020,16035,13595,14010,13985)
 mnth2emp<- c(25005,25010,25000,24995,25005,25010,25060,24995,15995,16005,16015,16020,16030,14015,14000,14200)
 mnth3emp<- c(24985,25000,25005,25010,25009,25040,25090,25080,15990,16000,16065,16025,16030,14665,14550,14620)


 test <- data.frame(naicscode,areavalue,ownership,periodyear,period,mnth1emp,mnth2emp,mnth3emp)




  naicscode       areavalue    ownership     202001     202002    202003  202004   202005   202006  ... and on until 202112. 
     111            000000        50          25000       25005     24985   25005   25010   25000

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

I’m assuming period means quarter, and the number in name denotes the month number within that quarter.

If that’s the case, your column headers are 100*periodyear + (period-1)*4 + the number inside name.

library(tidyverse)
test %>%
  pivot_longer(starts_with("mnth")) %>%
  mutate(period_num = as.numeric(periodyear)*100 + (as.numeric(period)-1)*4 + parse_number(name)) %>%
  select(-c(periodyear:name)) %>%
  pivot_wider(names_from = period_num, values_from = value)
  

result

# A tibble: 2 × 27
  naicscode areavalue ownership 20200…¹ 20200…² 20200…³ 20200…⁴ 20200…⁵ 20200…⁶
  <chr>     <chr>     <chr>       <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1 111       000000    50          25000   25005   24985   25005   25010   25000
2 111       000003    50          16000   15995   15990   16005   16005   16000
# … with 18 more variables: `202009` <dbl>, `202010` <dbl>, `202011` <dbl>,
#   `202013` <dbl>, `202014` <dbl>, `202015` <dbl>, `202101` <dbl>,
#   `202102` <dbl>, `202103` <dbl>, `202105` <dbl>, `202106` <dbl>,
#   `202107` <dbl>, `202109` <dbl>, `202110` <dbl>, `202111` <dbl>,
#   `202113` <dbl>, `202114` <dbl>, `202115` <dbl>, and abbreviated variable
#   names ¹​`202001`, ²​`202002`, ³​`202003`, ⁴​`202005`, ⁵​`202006`, ⁶​`202007`
# ℹ Use `colnames()` to see all variable names
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading