Conditionally rename values in an R dataframe

I have a simple dataframe with a large column of 6-digit numbers in R:

library(dplyr)

df <- data.frame(number = c(111110, 111211, 111311, 111411, 111110, 111311, 930920, 039203, 940291, 111110), 
                 value = seq(1,10,1)
                 )

Output

df
#   number value
#1  111110     1
#2  111211     2
#3  111311     3
#4  111411     4
#5  111110     5
#6  111311     6
#7  930920     7
#8   39203     8
#9  940291     9
#10 111110    10

I would like to rename these numbers with "IDs", but the problem I am encountering is that there are duplicates in the dataset, and each duplicate should get the same ID.

Expected output

df
#   number  ID  value
#1  111110  ID-1   1
#2  111211  ID-2   2
#3  111311  ID-3   3
#4  111411  ID-4   4
#5  111110  ID-1   5
#6  111311  ID-3   6
#7  930920  ID-5   7
#8   39203  ID-6   8
#9  940291  ID-7   9
#10 111110  ID-1   10

Question

How can I create an extra column in the dataframe that gives every unique number its own ID? I thought the dplyr package and its mutate function might be an option.

Solution

Create a factor from the number column, and then convert it to an integer. The conversion from factor to integer will give you the index of the factor’s level:

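library(dplyr)

df <- data.frame(number = c(111110, 111211, 111311, 111411, 111110, 111311, 930920, 039203, 940291, 111110),
                 value = seq(1, 10, 1)
)

# factor() gives each unique number a level; as.integer() returns the index of that level
df <- df %>% mutate(ID = paste0("ID-", as.integer(factor(number))))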

This gives:

df

   number value   ID
1  111110     1 ID-2
2  111211     2 ID-3
3  111311     3 ID-4
4  111411     4 ID-5
5  111110     5 ID-2
6  111311     6 ID-4
7  930920     7 ID-6
8   39203     8 ID-1
9  940291     9 ID-7
10 111110    10 ID-2
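
The IDs in this output follow the sorted order of the values, not their order in the data, which is why 39203 ends up as ID-1. A small sketch to inspect this, using a bare vector named number (a copy of the column, introduced here only for illustration):

```r
# Same values as the number column; 039203 is stored as the numeric 39203
number <- c(111110, 111211, 111311, 111411, 111110, 111311, 930920, 39203, 940291, 111110)

f <- factor(number)

# Levels are the sorted unique values, stored as character strings
levels(f)
# "39203" "111110" "111211" "111311" "111411" "930920" "940291"

# as.integer() returns the position of each value among those levels
as.integer(f)
# 2 3 4 5 2 4 6 1 7 2
```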

If you want the IDs numbered in order of first appearance, as in the expected output, specify the levels of the factor:

df <- df %>% mutate(ID=paste0("ID-", as.integer(factor(number, levels=unique(number)))))
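
As an alternative that skips the factor entirely, base R's match() against unique() should give the same first-appearance numbering. This is a sketch rather than part of the original answer; the names number, idx, ID and same are used here only for illustration:

```r
# Same values as the number column; 039203 is stored as the numeric 39203
number <- c(111110, 111211, 111311, 111411, 111110, 111311, 930920, 39203, 940291, 111110)

# unique() keeps values in order of first appearance;
# match() returns each value's position in that de-duplicated vector
idx <- match(number, unique(number))
ID <- paste0("ID-", idx)
ID
# "ID-1" "ID-2" "ID-3" "ID-4" "ID-1" "ID-3" "ID-5" "ID-6" "ID-7" "ID-1"

# Check that this agrees with the factor-based version
same <- identical(idx, as.integer(factor(number, levels = unique(number))))
```

Inside the pipeline this would read mutate(ID = paste0("ID-", match(number, unique(number)))).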
