I have a simple dataframe in R with a large column of 6-digit numbers:
library(dplyr)
df <- data.frame(number = c(111110, 111211, 111311, 111411, 111110, 111311, 930920, 039203, 940291, 111110),
value = seq(1,10,1)
)
Output
df
# number value
#1 111110 1
#2 111211 2
#3 111311 3
#4 111411 4
#5 111110 5
#6 111311 6
#7 930920 7
#8 39203 8
#9 940291 9
#10 111110 10
I would like to rename these numbers with "IDs", but the problem I am encountering is that there are duplicates in the dataset.
Expected output
df
# number ID value
#1 111110 ID-1 1
#2 111211 ID-2 2
#3 111311 ID-3 3
#4 111411 ID-4 4
#5 111110 ID-1 5
#6 111311 ID-3 6
#7 930920 ID-5 7
#8 39203 ID-6 8
#9 940291 ID-7 9
#10 111110 ID-1 10
Question
How can I create an extra column in the dataframe that gives every unique number its own ID? I thought the dplyr package and its mutate function might be an option.
>Solution :
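Create a factor from the number column, and then convert it to an integer. Converting a factor to an integer gives the index of each value's level. Note that factor() sorts its levels by default, so the indices follow the sorted order of the numbers rather than the order in which they first appear: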
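library(dplyr)  # provides %>% and mutate()
df <- data.frame(number = c(111110, 111211, 111311, 111411, 111110, 111311, 930920, 039203, 940291, 111110),
value = seq(1,10,1)
)
# as.integer(factor(number)) returns the index of each number's level
df <- df %>% mutate(ID=paste0("ID-", as.integer(factor(number))))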
This gives:
df
number value ID
1 111110 1 ID-2
2 111211 2 ID-3
3 111311 3 ID-4
4 111411 4 ID-5
5 111110 5 ID-2
6 111311 6 ID-4
7 930920 7 ID-6
8 39203 8 ID-1
9 940291 9 ID-7
10 111110 10 ID-2
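To see where these indices come from, it can help to inspect the factor directly. The following is a minimal sketch using the same numbers as above; the standalone variables `number` and `f` are only for illustration and are not part of the original answer:

```r
# Same numbers as in the question (R drops the leading zero of 039203).
number <- c(111110, 111211, 111311, 111411, 111110, 111311, 930920, 39203, 940291, 111110)

f <- factor(number)

# Levels are the sorted unique values, so the smallest number (39203) comes first.
print(levels(f))

# Each element is mapped to the position of its level.
print(as.integer(f))
```

Because 39203 is the smallest value, it becomes the first level, which is why row 8 receives ID-1.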
The IDs above follow the sorted order of the numbers. To number them in order of first appearance instead, as in the expected output, set the factor's levels explicitly:
df <- df %>% mutate(ID=paste0("ID-", as.integer(factor(number, levels=unique(number)))))
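As an alternative that avoids factors altogether, base R's match() can look up each number's position among the unique values. This is a sketch of a different technique from the one in the answer, not the answer's own method, and it assumes the same example data:

```r
# Same data as in the question (R drops the leading zero of 039203).
df <- data.frame(
  number = c(111110, 111211, 111311, 111411, 111110, 111311, 930920, 39203, 940291, 111110),
  value = 1:10
)

# unique() keeps values in order of first appearance; match() returns,
# for each number, its position in that vector.
df$ID <- paste0("ID-", match(df$number, unique(df$number)))

print(df)
```

With this data, the ID column matches the expected output in the question: duplicates such as 111110 share ID-1, and 930920 gets ID-5. The same expression should also work inside mutate() if you prefer to stay within a dplyr pipeline.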