1. I have a column named budgetv2 that holds the budget for each month. Cost is the total expenses for a day, Period is the name of the month in Indonesian (it distinguishes each month's budget), and Final Budget is the column I want to fill in using the code below, which doesn't run.
2. Suppose I have a data frame df1, as defined in the code below.
3. The code should fill in the Final Budget column, which I initialized with NA.
4. Here is an example of what I want. The first row is the first row of January, so Final Budget takes its value directly from the budgetv2 column, and the Final Budget value is 10000.
5. The second row is no longer the first row of January, so Final Budget is calculated as:
budgetv2(second row) + Final Budget(row above/first row) - Cost(row above/first row), so the Final Budget value in the second row is 10000 + 10000 - 1000 = 19000.
6. The third row follows the same formula as step 5: budgetv2(row 3) + Final Budget(row above/row 2) - Cost(row above/row 2), so the calculation becomes 10000 + 19000 - 500 = 28500.
7. Row 4 belongs to a different month. Because row 4 is the first row of the February (not January) period, Final Budget takes the value of budgetv2, which is 500, so the Final Budget value in row 4 is 500.
8. Row 5 is the second row of the February period, so it uses the same formula as in steps 5 and 6: Final Budget = budgetv2(row 5) + Final Budget(row above/row 4) - Cost(row above/row 4), i.e. 500 + 500 - 200 = 800 is the Final Budget value in row 5.

I already tried this code:
df1 <- data.frame(Cost = c(1000,500,200,200,100,300,200),
                  budgetv2 = c(10000,10000,10000,500,500,700,700),
                  Period = c("Januari","Januari","Januari","Februari","Februari","Maret","Maret"),
                  Final_Budget = c(NA,NA,NA,NA,NA,NA,NA))
for (i in 1:nrow(df1)) {
  if (i == 1 | df1$Period[i] != df1$Period[i-1]) { # if the current row is the first row of that month
    df1$Final_Budget[i] <- df1$budgetv2[i]
  } else {
    df1$Final_Budget[i] - df1$budgetv2[i] + df1$Final_Budget[i-1] - df1$Cost[i-1]
  }
}
It gave me this error:
Error in if (i == 1 | df1$period[i]!= df1$period[i - 1]) : argument is of length zero
My expected result is:
| Cost | budgetv2 | Period | Final Budget |
|---|---|---|---|
| 1000 | 10000 | Januari | 10000 |
| 500 | 10000 | Januari | 19000 |
| 200 | 10000 | Januari | 28500 |
| 200 | 500 | Februari | 500 |
| 100 | 500 | Februari | 800 |
| 300 | 700 | Maret | 700 |
| 200 | 700 | Maret | 1100 |
>Solution :
The problem is the zero index in the first iteration. In your first if-statement you have:
i == 1 | df1$Period[i] != df1$Period[i-1]
When i is 1, i-1 is 0, and df1$Period[0] is a zero-length vector, so the comparison returns logical(0). Using | does not mean the second part is skipped: | evaluates both sides, the whole condition becomes logical(0), and if() stops with "argument is of length zero". I added an ifelse that returns FALSE for i==1 and only evaluates the second part where i != 1.
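To see what happens at i = 1, here is a minimal sketch on a small made-up vector x (a stand-in for df1$Period, not part of your data):

```r
# x is a hypothetical stand-in for df1$Period
x <- c("Januari", "Januari", "Februari")

x[0]                   # indexing with 0 gives a zero-length vector
cmp <- x[1] != x[0]    # comparing against it gives logical(0)
length(cmp)            # 0

length(TRUE | cmp)     # 0: `|` is vectorized and uses both sides
TRUE || cmp            # TRUE: `||` stops as soon as the left side is TRUE
```

A zero-length condition is what if() rejects with "argument is of length zero".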
You are also missing the assignment in your else. I believe you meant to have
df1$Final_Budget[i] <- df1$budgetv2[i] + df1$Final_Budget[i-1] - df1$Cost[i-1]
Instead of
df1$Final_Budget[i] - df1$budgetv2[i] + df1$Final_Budget[i-1] - df1$Cost[i-1]
Notice <- instead of -.
This should work for you:
for (i in 1:nrow(df1)) {
  if (i == 1 | ifelse(i==1, FALSE, df1$Period[i] != df1$Period[i-1])) {
    df1$Final_Budget[i] <- df1$budgetv2[i]
  } else {
    df1$Final_Budget[i] <- df1$budgetv2[i] + df1$Final_Budget[i-1] - df1$Cost[i-1]
  }
}
df1
#> Cost budgetv2 Period Final_Budget
#> 1 1000 10000 Januari 10000
#> 2 500 10000 Januari 19000
#> 3 200 10000 Januari 28500
#> 4 200 500 Februari 500
#> 5 100 500 Februari 800
#> 6 300 700 Maret 700
#> 7 200 700 Maret 1100
Created on 2023-03-24 by the reprex package (v2.0.1)
Above, I tried to apply the least amount of change to your code. See these variations below for better practices.
Separating your conditions:
for (i in 1:nrow(df1)) {
  if (i == 1) {
    df1$Final_Budget[i] <- df1$budgetv2[i]
  } else if (df1$Period[i] != df1$Period[i-1]) {
    df1$Final_Budget[i] <- df1$budgetv2[i]
  } else {
    df1$Final_Budget[i] <- df1$budgetv2[i] + df1$Final_Budget[i-1] - df1$Cost[i-1]
  }
}
or have an ifelse that evaluates conditions in order:
for (i in 1:nrow(df1)) {
  if (ifelse(i==1, TRUE, df1$Period[i] != df1$Period[i-1])) {
    df1$Final_Budget[i] <- df1$budgetv2[i]
  } else {
    df1$Final_Budget[i] <- df1$budgetv2[i] + df1$Final_Budget[i-1] - df1$Cost[i-1]
  }
}
or use || instead of | as suggested in the comments (|| only evaluates the right-hand side when the left-hand side is not TRUE):
for (i in 1:nrow(df1)) {
  if (i == 1 || df1$Period[i] != df1$Period[i-1]) {
    df1$Final_Budget[i] <- df1$budgetv2[i]
  } else {
    df1$Final_Budget[i] <- df1$budgetv2[i] + df1$Final_Budget[i-1] - df1$Cost[i-1]
  }
}
Or simply in dplyr:
library(dplyr)
df1 %>%
  group_by(Period) %>%
  mutate(Final_Budget = cumsum(budgetv2) - cumsum(lag(Cost, default = 0)))
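The cumulative-sum form works because, within one month, unrolling the loop gives Final_Budget = (sum of budgetv2 up to the current row) - (sum of Cost over the earlier rows of that month). If you would rather avoid the dplyr dependency, a rough base R sketch of the same idea could use ave(); the helper name prev_cost is only for illustration, and this assumes each month appears as one contiguous block of rows, as in your data:

```r
df1 <- data.frame(Cost = c(1000, 500, 200, 200, 100, 300, 200),
                  budgetv2 = c(10000, 10000, 10000, 500, 500, 700, 700),
                  Period = c("Januari", "Januari", "Januari",
                             "Februari", "Februari", "Maret", "Maret"))

# Cost of the previous row within the same Period (0 for the first row of a month)
prev_cost <- ave(df1$Cost, df1$Period,
                 FUN = function(x) c(0, head(x, -1)))

# Running budget total minus running total of the earlier costs, per Period
df1$Final_Budget <- ave(df1$budgetv2, df1$Period, FUN = cumsum) -
  ave(prev_cost, df1$Period, FUN = cumsum)

df1
```

Note that grouping by Period (here and in the dplyr version) treats all rows with the same month name as one group, whereas the loop restarts whenever Period differs from the row above. The two agree as long as a month name never reappears later in the data.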