Why are there duplicate rows after using group_by and mutate?

The sample data is shown below:

n period age
15 1991 5
20 1991 5
16 1991 15
29 1991 15
77 1991 25
44 1991 25

I use the following code to get the sum from the data grouped by period and age:

# The dataset is named a.
a %>%
  group_by(period, age) %>%
  mutate(n = sum(n))

But the result is:

n period age
35 1991 5
35 1991 5
45 1991 15
45 1991 15
121 1991 25
121 1991 25

Why are there duplicate rows? Is it because it sums every element in each group?

Solution:

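You need to use summarize() instead of mutate(). mutate() adds or overwrites a column but keeps every input row, so each row in a group receives that group's sum, which is why the rows look duplicated. summarize() collapses each group into a single row. Here’s a reproducible example: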

## Load dplyr; if it is not installed, install it and then load it ##
if (!require(dplyr)) {
  install.packages("dplyr")
  library(dplyr)
}

## Creating the data ##
n <- c(15, 20, 16, 29, 77, 44)
period <- rep(1991, 6)
age <- c(5, 5, 15, 15, 25, 25)

a <- data.frame(n = n, period = period, age = age)

## Calculation with summarize(): one row per (period, age) group ##
a %>%
  group_by(period, age) %>%
  summarize(n = sum(n))
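
To make the difference concrete, here is a minimal sketch that runs both verbs on the question's data and compares the row counts. It assumes dplyr 1.0.0 or later, because it uses the `.groups` argument of summarize(). The object names `with_mutate`, `with_summarize` and `deduplicated` are illustrative only. The final distinct() step is not part of the answer above; it is one possible way to drop the repeated rows if you would rather keep mutate().

```r
library(dplyr)

# Same data as in the question
a <- data.frame(n = c(15, 20, 16, 29, 77, 44),
                period = rep(1991, 6),
                age = c(5, 5, 15, 15, 25, 25))

# mutate() keeps all six input rows; each row gets its group's sum
with_mutate <- a %>%
  group_by(period, age) %>%
  mutate(n = sum(n)) %>%
  ungroup()

# summarize() returns one row per (period, age) group
# (.groups = "drop" is assumed to be available, i.e. dplyr >= 1.0.0)
with_summarize <- a %>%
  group_by(period, age) %>%
  summarize(n = sum(n), .groups = "drop")

# If mutate() is kept, distinct() removes the repeated rows afterwards
deduplicated <- distinct(with_mutate)

nrow(with_mutate)     # 6
nrow(with_summarize)  # 3
nrow(deduplicated)    # 3
```

summarize() is usually the more direct choice when only the group totals are needed. mutate() is the better fit when the total must sit alongside the original rows, for example to compute each row's share of its group.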