Conditional rename values in R dataframe

I have a simple dataframe with a large column of 6-digit numbers in R:

library(dplyr)

df <- data.frame(number = c(111110, 111211, 111311, 111411, 111110, 111311, 930920, 039203, 940291, 111110), 
                 value = seq(1,10,1)
                 )

Output

df
#   number value
#1  111110     1
#2  111211     2
#3  111311     3
#4  111411     4
#5  111110     5
#6  111311     6
#7  930920     7
#8   39203     8
#9  940291     9
#10 111110    10

I would like to replace these numbers with "IDs", but the problem I am encountering is that the dataset contains duplicates, and every occurrence of the same number should get the same ID.

Expected output

df
#   number  ID  value
#1  111110  ID-1   1
#2  111211  ID-2   2
#3  111311  ID-3   3
#4  111411  ID-4   4
#5  111110  ID-1   5
#6  111311  ID-3   6
#7  930920  ID-5   7
#8   39203  ID-6   8
#9  940291  ID-7   9
#10 111110  ID-1   10

Question

How can I create an extra column in the dataframe that gives every unique number its own ID? I thought the dplyr package and its mutate function might be an option.

Solution

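Create a factor from the number column, and then convert it to an integer. Converting a factor to an integer gives you the index of each value's level. By default the levels are the sorted unique values, so the IDs follow numeric order rather than order of appearance: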

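library(dplyr)

df <- data.frame(number = c(111110, 111211, 111311, 111411, 111110, 111311, 930920, 039203, 940291, 111110), 
    value = seq(1,10,1)
)

# factor() assigns each distinct number a level; as.integer() returns that level's index
df <- df %>% mutate(ID=paste0("ID-", as.integer(factor(number))))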

This gives:

df

   number value   ID
1  111110     1 ID-2
2  111211     2 ID-3
3  111311     3 ID-4
4  111411     4 ID-5
5  111110     5 ID-2
6  111311     6 ID-4
7  930920     7 ID-6
8   39203     8 ID-1
9  940291     9 ID-7
10 111110    10 ID-2
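
To see why 39203 ends up as ID-1 here, it can help to look at the factor on its own. The following is a minimal sketch in base R; the names f and codes are only illustrative and are not part of the original answer:

```r
number <- c(111110, 111211, 111311, 111411, 111110, 111311, 930920, 39203, 940291, 111110)

# factor() sorts the unique values to build its levels (stored as strings)
f <- factor(number)
print(levels(f))
# "39203" "111110" "111211" "111311" "111411" "930920" "940291"

# as.integer() returns, for each element, the position of its level
codes <- as.integer(f)
print(codes)
# 2 3 4 5 2 4 6 1 7 2
```

Because 39203 is the smallest value, it is the first level and therefore gets index 1, even though it appears eighth in the data.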

If you want the IDs numbered in order of first appearance instead (as in the expected output), specify the levels of the factor explicitly:

df <- df %>% mutate(ID=paste0("ID-", as.integer(factor(number, levels=unique(number)))))
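
As a cross-check, the same first-appearance numbering can also be obtained without a factor by using base R's match() against unique(). This is an alternative sketch rather than the method from the answer above; ids_factor and ids_match are illustrative names:

```r
number <- c(111110, 111211, 111311, 111411, 111110, 111311, 930920, 39203, 940291, 111110)

# Approach from the answer: levels in order of first appearance
ids_factor <- paste0("ID-", as.integer(factor(number, levels = unique(number))))

# Alternative: match() returns the position of each value in unique(number)
ids_match <- paste0("ID-", match(number, unique(number)))

print(ids_factor)
# "ID-1" "ID-2" "ID-3" "ID-4" "ID-1" "ID-3" "ID-5" "ID-6" "ID-7" "ID-1"

# Both approaches should agree
print(identical(ids_factor, ids_match))
```

Inside a dplyr pipeline the alternative would read mutate(ID = paste0("ID-", match(number, unique(number)))).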