The sample data is shown below:
| n | period | age |
|---|---|---|
| 15 | 1991 | 5 |
| 20 | 1991 | 5 |
| 16 | 1991 | 15 |
| 29 | 1991 | 15 |
| 77 | 1991 | 25 |
| 44 | 1991 | 25 |
I use the following code to get the sum from the data grouped by period and age:
# The dataset is named a.
a %>%
  group_by(period, age) %>%
  mutate(n = sum(n))
But the result is:
| n | period | age |
|---|---|---|
| 35 | 1991 | 5 |
| 35 | 1991 | 5 |
| 45 | 1991 | 15 |
| 45 | 1991 | 15 |
| 121 | 1991 | 25 |
| 121 | 1991 | 25 |
Why are there duplicate rows? Is it because it sums every element in each group?
>Solution :
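You need to use the summarize() function. mutate() keeps every original row and writes the group sum into each of them (here it overwrites n), so rows in the same group end up identical. summarize() collapses each group into a single row. Here’s a reproducible example: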
## Load dplyr; if it is not installed, install it and then load it ##
if (!require(dplyr)) {
  install.packages("dplyr")
  library(dplyr)
}
## Creating the data ##
n <- c(15, 20, 16, 29, 77, 44)
period <- rep(1991, 6)
age <- c(5, 5, 15, 15, 25, 25)
a <- data.frame(n = n, period = period, age = age)
## Calculation with summarize() ##
a %>%
  group_by(period, age) %>%
  summarize(n = sum(n))
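To make the difference concrete, here is a minimal sketch that runs both verbs on the same data and compares the row counts. The object names `with_mutate`, `with_summarize`, and `deduped` are only illustrative, and the `.groups = "drop"` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Same sample data as in the question
a <- data.frame(
  n      = c(15, 20, 16, 29, 77, 44),
  period = rep(1991, 6),
  age    = c(5, 5, 15, 15, 25, 25)
)

# mutate() keeps all six rows and repeats each group's sum on every row
with_mutate <- a %>%
  group_by(period, age) %>%
  mutate(n = sum(n)) %>%
  ungroup()

# summarize() collapses each (period, age) group into a single row;
# .groups = "drop" (an assumption: dplyr >= 1.0.0) returns an ungrouped result
with_summarize <- a %>%
  group_by(period, age) %>%
  summarize(n = sum(n), .groups = "drop")

nrow(with_mutate)     # 6 rows, duplicates within each group
nrow(with_summarize)  # 3 rows, one per group

# If you already have the mutate() result, distinct() removes the duplicates
deduped <- distinct(with_mutate)
```

Note that summarize() puts the grouping columns first (period, age, n), while the mutate() result keeps the original column order (n, period, age).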