I have some duplicated data and I want to change a value within each group so that the rows are no longer duplicates. As you can see in the data below, some names appear more than once, so I would like to set the count column to 2 where a name appears for the second time (as in mydf_final further below). I'm not quite sure how to do this, so I tried to group my data first before changing it. aggregate() looked like a good function for this in base R, but I couldn't crack it.
mydf
name id count
dave 128 1
john 123 1
steve 111 1
dave 128 1
harry 130 1
harry 130 1
will 11 1
mydf_final
name id count
dave 128 1
john 123 1
steve 111 1
dave 128 2
harry 130 1
harry 130 2
will 11 1
>Solution :
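Within each group of rows that share the same name and id, the row number is exactly the count you want: 1 for the first occurrence, 2 for the second, and so on. So you can overwrite count with row_number() computed per group.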
mydf <- read.table(header = TRUE, text =
'name id count
dave 128 1
john 123 1
steve 111 1
dave 128 1
harry 130 1
harry 130 1
will 11 1')
library(dplyr)
# .by needs dplyr >= 1.1.0; the native pipe |> needs R >= 4.1
mydf |>
  mutate(count = row_number(),
         .by = c(name, id))
#>    name  id count
#> 1  dave 128     1
#> 2  john 123     1
#> 3 steve 111     1
#> 4  dave 128     2
#> 5 harry 130     1
#> 6 harry 130     2
#> 7  will  11     1
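Since the question asked about base R: aggregate() collapses each group to a single row, which is probably why it did not fit here. A minimal base R sketch of the same idea uses ave(), which applies a function within groups and returns a vector as long as the input. The name mydf_final is borrowed from the question, and using seq_along as the per-group function is just one way to number the rows.

```r
mydf <- read.table(header = TRUE, text =
'name id count
dave 128 1
john 123 1
steve 111 1
dave 128 1
harry 130 1
harry 130 1
will 11 1')

# Copy the data so the original stays untouched
mydf_final <- mydf

# ave() splits the first argument by the grouping vectors (name, id),
# applies FUN to each piece, and puts the results back in the original row order.
# seq_along() numbers the rows of each group 1, 2, 3, ...
mydf_final$count <- ave(seq_len(nrow(mydf)), mydf$name, mydf$id, FUN = seq_along)

mydf_final$count
#> [1] 1 1 1 2 1 2 1
```

The row order is preserved, so the result should match the dplyr output above without needing any packages.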