assigning unique value to new column based on multiple columns/rows

September 19, 2022

I have a large dataset with unique values for individuals that have some repeats. Each of these repeated rows are important data that I need to keep. However, I want to assign a unique ID to each individual.

Below is example raw data:

pit	let	ret	otherdata
19000	1	2	xx
19030	3	4	sy
19530	5	6	gh
19030	3	4	sh
19530	5	6	lh

What I want is:

pit	let	ret	ID	otherdata
19000	1	2	1	xx
19030	3	4	2	sy
19530	5	6	3	gh
19030	3	4	2	sh
19530	5	6	3	lh

I’m not sure how to get the unique number to be used for each set of the three columns and retain all the rows.

Thanks

>Solution :

We could use

library(dplyr)
df1 %>% 
  group_by(pit, let, ret) %>% 
  mutate(ID = cur_group_id(), .before = 'otherdata') %>% 
  ungroup

-output

# A tibble: 5 × 5
    pit   let   ret    ID otherdata
  <int> <int> <int> <int> <chr>    
1 19000     1     2     1 xx       
2 19030     3     4     2 sy       
3 19530     5     6     3 gh       
4 19030     3     4     2 sh       
5 19530     5     6     3 lh