R: converting a data frame of strings to unique numbers

I have a very large data frame (say, 8 rows by 10,000 columns) filled with strings. I want to replace each unique string with a number of its own.

For example, if I had a dataframe:

   X1       X2       X3
1 cat    mouse     rabbit
2 dog   cat, dog    dog

I’d like to convert it to:

   X1        X2     X3
1   1         2       3
2   4         5       4

Note that the combined label "cat, dog" gets its own unique number. The actual number assigned to each string is irrelevant, as I'm doing this for an inter-rater reliability calculation.

Short of collecting all the unique elements, assigning each one a number, and replacing them myself, is there a more elegant way to do this?

Also, if an element is blank (e.g. ""), it should become NA in the numeric data frame.

Solution:

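You can match each value against the unique values of the whole data frame. Numbers are assigned in order of first appearance, reading column by column, which is why they differ from the numbering in the question: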

df[] <- sapply(df, match, unique(unlist(df)))

> df
  X1 X2 X3
1  1  3  5
2  2  4  2
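A minimal, self-contained sketch that rebuilds the question's two-row example and applies the match() approach above, so the output can be checked directly. The variable name lookup is only for illustration; the data values come from the question.

```r
# Rebuild the example from the question (2 rows x 3 columns; the real data
# would be 8 x 10,000). stringsAsFactors = FALSE keeps the columns character.
df <- data.frame(
  X1 = c("cat", "dog"),
  X2 = c("mouse", "cat, dog"),
  X3 = c("rabbit", "dog"),
  stringsAsFactors = FALSE
)

# Every distinct string, in the order unlist() meets them (column by column,
# top to bottom): "cat", "dog", "mouse", "cat, dog", "rabbit".
# The combined label "cat, dog" is treated as one whole string.
lookup <- unique(unlist(df))

# Replace each column with the positions of its values in lookup.
# Assigning into df[] keeps the data.frame class and the column names.
df[] <- sapply(df, match, lookup)
df
```

Because the codes are just positions in lookup, keeping lookup around also lets you translate the numbers back to labels later with lookup[code].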

Or, even simpler:

df[] <- match(unlist(df), unique(unlist(df)))
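Neither one-liner handles the blank-to-NA requirement: "" is treated as just another string and gets its own number (and so would an existing NA, since match() matches NA to NA). A minimal sketch of one way to handle it, assuming blanks are exactly "" (cells containing only spaces would need trimws() first), is to drop blanks and NAs from the lookup table, because match() returns NA for anything it cannot find. The blank cell in the data below is hypothetical.

```r
# Example data with one blank cell (hypothetical, adapted from the question).
df <- data.frame(
  X1 = c("cat", "dog"),
  X2 = c("mouse", ""),
  X3 = c("rabbit", "dog"),
  stringsAsFactors = FALSE
)

# Distinct labels in column-by-column order of first appearance,
# with blanks and NAs removed so they never receive a number.
lookup <- unique(unlist(df))
lookup <- lookup[!is.na(lookup) & lookup != ""]

# match() returns NA (its default nomatch value) for values missing from
# lookup, so "" (and any pre-existing NA) comes out as NA.
df[] <- lapply(df, match, lookup)
df
```

Here lapply() is used instead of sapply() so each column comes back as its own vector; either works when assigned into df[].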