I have a large dataset with unique values for individuals that have some repeats. Each of these repeated rows are important data that I need to keep. However, I want to assign a unique ID to each individual.
Below is example raw data:
| pit | let | ret | otherdata |
|---|---|---|---|
| 19000 | 1 | 2 | xx |
| 19030 | 3 | 4 | sy |
| 19530 | 5 | 6 | gh |
| 19030 | 3 | 4 | sh |
| 19530 | 5 | 6 | lh |
What I want is:
| pit | let | ret | ID | otherdata |
|---|---|---|---|---|
| 19000 | 1 | 2 | 1 | xx |
| 19030 | 3 | 4 | 2 | sy |
| 19530 | 5 | 6 | 3 | gh |
| 19030 | 3 | 4 | 2 | sh |
| 19530 | 5 | 6 | 3 | lh |
I’m not sure how to get the unique number to be used for each set of the three columns and retain all the rows.
Thanks
>Solution :
We could use
library(dplyr)
df1 %>%
group_by(pit, let, ret) %>%
mutate(ID = cur_group_id(), .before = 'otherdata') %>%
ungroup
-output
# A tibble: 5 × 5
pit let ret ID otherdata
<int> <int> <int> <int> <chr>
1 19000 1 2 1 xx
2 19030 3 4 2 sy
3 19530 5 6 3 gh
4 19030 3 4 2 sh
5 19530 5 6 3 lh