My dataframe has 2 categorical variables, and one is of a lower hierarchy than the other. I want to sum the numerical value of all rows within the sub-category using dplyr.
Thank you in advance for all those who can help me with!
This is the dataframe I start with:
transportation <- data.frame(
Country = c("A", "A", "A", "B", "B", "B"),
Mode = c("Car", "Train", "Plane", "Car", "Train", "Plane"),
Energy = c(10000, 9000, 20000, 200000, 160000, 450000)
)
And this is the dataframe I want to end up with:
country_sum <- data.frame(
Country = c("A", "A", "A", "B", "B", "B"),
Mode = c("Car", "Train", "Plane", "Car", "Train", "Plane"),
Energy = c(10000, 9000, 20000, 200000, 160000, 450000),
country_sum = c(39000, 39000, 39000, 810000, 810000, 810000)
)
>Solution :
First group by Country and then mutate with sum:
library(dplyr)
transportation %>%
group_by(Country) %>%
mutate(country_sum = sum(Energy))
Country Mode Energy country_sum
<chr> <chr> <dbl> <dbl>
1 A Car 10000 39000
2 A Train 9000 39000
3 A Plane 20000 39000
4 B Car 200000 810000
5 B Train 160000 810000
6 B Plane 450000 810000

