From my dataset of 5,000-odd samples, I made a frequency table of dates by survey type, and I want to weight the frequencies based on year.
In this table I’ve got three columns of data that I want to run the same case_when series on, so I thought I’d write a ‘for’ loop. I seem to be stuck, though: the code below produces the error "Error in for (. in i) AESOP:GNSOP : 4 arguments passed to ‘for’ which requires 3"
b%<>%for(i in AESOP:GNSOP)
{
case_when(year == "2015" ~ i*0.87,
year == "2016" ~ i*0.84,
year == "2017" ~ i*0.75,
year == "2018" ~ i*0.75,
year == "2019" ~ i*0.69,
year == "2020" ~ i*0.69,
year == "2021" ~ i*0.69,
TRUE ~ i)}
2013 and 2014 are not included as they don’t need to be manipulated.
Here’s an example of my data:
b<- data.frame (year = c(2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021),
AESOP = c(7, 6, 13, 18, 22, 25, 39, 22, 31),
ASSOP = c(8, 14, 17, 25, 31, 39, 50, 67, 88),
GNSOP = c(19, 30, 34, 45, 49, 45, 67, 72, 88))
I’ve looked around at different answers on here and reddit but can’t understand how they apply to my situation. From what I can read, maybe I should be using . somewhere in there, or change to if else statements? Maybe I’m missing mutate()?
The above case_when block worked when I had a frequency table for one survey (not in a loop, but calling df$AESOP etc.), so I could just run the same code block three times over. I thought doing it in a loop would be neater than repeating the same thing, but if you think there’s a better solution than a for loop I’m open to that.
Thanks in advance for the answer on fixing my code, apologies if it hurts anyone’s eyes.
P.S. If there’s a good for loop tutorial y’all can point me to for future reference, that’d be awesome.
>Solution :
You cannot use a for loop in a pipe in this way, nor do you need to. You are getting this error because for is a function in R. When you write a simple loop, such as for (i in 1:10) print(i), this is parsed as `for`(i, 1:10, print(i)). If you try to add an extra argument, e.g. `for`(i, 1:10, print(i), 1), you will get the same error, 4 arguments passed to 'for' which requires 3. In your code the pipe inserts its left-hand side as an extra first argument to that call, which is how for ends up with four arguments.
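The function-call form described above can be checked directly in a console. This is only a small sketch: the names loop, total and msg are made up for illustration, and the loop runs over 1:3 rather than your columns.

```r
# Capture a loop without running it and inspect how R parsed it
loop <- quote(for (i in 1:3) print(i))
as.list(loop)   # the function `for`, then i, 1:3 and print(i)
length(loop)    # 4 elements: the function itself plus its three arguments

# Calling the function form directly behaves like the ordinary loop
total <- 0
`for`(i, 1:3, total <- total + i)
total           # 6

# A fourth argument triggers the arity error; capture the message instead of stopping
msg <- tryCatch(
  `for`(i, 1:3, print(i), 1),
  error = function(e) conditionMessage(e)
)
msg
```
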
Rather than writing a loop, as you’re using dplyr, you can mutate() across() the columns in question to modify them:
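library(dplyr)

# The native pipe |> and the \(x) lambda shorthand need R 4.1 or later
b |>
  mutate(
    across(AESOP:GNSOP, \(x) case_when(
      year == 2015 ~ x * 0.87,
      year == 2016 ~ x * 0.84,
      year == 2017 ~ x * 0.75,
      year == 2018 ~ x * 0.75,
      year == 2019 ~ x * 0.69,
      year == 2020 ~ x * 0.69,
      year == 2021 ~ x * 0.69,
      TRUE ~ x  # 2013 and 2014 are left unchanged
    ))
  )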
# year AESOP ASSOP GNSOP
# 1 2013 7.00 8.00 19.00
# 2 2014 6.00 14.00 30.00
# 3 2015 11.31 14.79 29.58
# 4 2016 15.12 21.00 37.80
# 5 2017 16.50 23.25 36.75
# 6 2018 18.75 29.25 33.75
# 7 2019 26.91 34.50 46.23
# 8 2020 15.18 46.23 49.68
# 9 2021 21.39 60.72 60.72
Alternatively, you can use the .names parameter to keep the original columns and create new ones, e.g. with a _mutated suffix:
b |>
  mutate(
    across(AESOP:GNSOP, \(x) case_when(
      year == 2015 ~ x * 0.87,
      year == 2016 ~ x * 0.84,
      year == 2017 ~ x * 0.75,
      year == 2018 ~ x * 0.75,
      year == 2019 ~ x * 0.69,
      year == 2020 ~ x * 0.69,
      year == 2021 ~ x * 0.69,
      TRUE ~ x
    ), .names = "{.col}_mutated")
  )
# year AESOP ASSOP GNSOP AESOP_mutated ASSOP_mutated GNSOP_mutated
# 1 2013 7 8 19 7.00 8.00 19.00
# 2 2014 6 14 30 6.00 14.00 30.00
# 3 2015 13 17 34 11.31 14.79 29.58
# 4 2016 18 25 45 15.12 21.00 37.80
# 5 2017 22 31 49 16.50 23.25 36.75
# 6 2018 25 39 45 18.75 29.25 33.75
# 7 2019 39 50 67 26.91 34.50 46.23
# 8 2020 22 67 72 15.18 46.23 49.68
# 9 2021 31 88 88 21.39 60.72 60.72
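Since you asked about loops for future reference: if you do want an explicit for loop, the usual pattern is to loop over column names and assign back with [[ ]], outside of any pipe. The following is a base R sketch rather than the dplyr approach above; the names weights, w and weighted are made up for illustration, and the named lookup vector is swapped in for case_when.

```r
b <- data.frame(
  year  = c(2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021),
  AESOP = c(7, 6, 13, 18, 22, 25, 39, 22, 31),
  ASSOP = c(8, 14, 17, 25, 31, 39, 50, 67, 88),
  GNSOP = c(19, 30, 34, 45, 49, 45, 67, 72, 88)
)

# Weight per year, stored as a named lookup vector (hypothetical helper)
weights <- c("2015" = 0.87, "2016" = 0.84, "2017" = 0.75, "2018" = 0.75,
             "2019" = 0.69, "2020" = 0.69, "2021" = 0.69)

# One weight per row; years missing from the lookup (2013, 2014) get weight 1
w <- unname(weights[as.character(b$year)])
w[is.na(w)] <- 1

# Loop over the column names and overwrite each column in a copy of b
weighted <- b
for (col in c("AESOP", "ASSOP", "GNSOP")) {
  weighted[[col]] <- b[[col]] * w
}
weighted
```

The loop variable here is a column name (a string), not the column itself, which is why weighted[[col]] is used to read and write the data.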
Note also that your year column is numeric, so you should write year == 2015 rather than year == "2015". Comparing against a quoted value coerces the numeric column to a character vector before the comparison.
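As a small illustration of why that coercion is worth avoiding: it happens to work for four-digit years, but it can give surprising results for numbers that R prints in scientific notation. The variables year and big below are made up for this sketch.

```r
year <- c(2014, 2015)

# The numeric vector is converted to character before comparing;
# for values like these the quoted and unquoted forms agree
year == "2015"
year == 2015

# The same coercion can misfire once the number is formatted differently as text
big <- 100000
as.character(big)   # shows how R writes this number as a string
big == "100000"     # compares that string with "100000"
big == 100000       # numeric comparison, no coercion involved
```
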