Simplifying column values using letters in R

February 28, 2023

I have a data frame with a column, ‘Longitude’, that contains over 300 values.
I would like to identify rows by their Longitude in a simpler way,
so I would like to replace each unique value of Longitude (or add a new column) with a consecutive integer preceded by the letter ‘A’.

For example,

df$Longitude
-110.59241 -108.66734  -67.00473  -75.71540 -104.88282 -143.77540

would become

df$new
A1         A2           A3         A4        A5         A6

I have done this before, but in this case different values for Longitude have different frequencies over the data frame, so I can’t do a simple ‘sort and replace’.

>Solution :

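factor() gives each unique value its own integer code, so as.numeric(factor()) labels every distinct Longitude consistently no matter how often it occurs. The codes follow the sorted order of the values, not the order in which they first appear. You could do: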

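# Fix the random seed so the example is reproducible
set.seed(123)

# Example data: 6 distinct values repeated with unequal frequencies
df <- data.frame(
  Longitude = sample(runif(6), 20, replace = TRUE)
)

# Integer code of each unique value (in sorted order), prefixed with "A"
df$new <- paste0("A", as.numeric(factor(df$Longitude)))

# Check: each Longitude maps to exactly one label
table(df$Longitude, df$new)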
#>                     
#>                      A1 A2 A3 A4 A5 A6
#>   0.0455564993899316  3  0  0  0  0  0
#>   0.287577520124614   0  4  0  0  0  0
#>   0.4089769218117     0  0  5  0  0  0
#>   0.788305135443807   0  0  0  3  0  0
#>   0.883017404004931   0  0  0  0  2  0
#>   0.940467284293845   0  0  0  0  0  3
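
The factor() approach above numbers the values in sorted order. The example in the question instead appears to number them in order of first appearance. If that is what you need, a minimal sketch using match() against unique() could look like this. The first six values are the ones from the question; the two repeats at the end are made up here to show that duplicates get the same label.

```r
# Longitudes from the question, plus two made-up repeats at the end
df <- data.frame(
  Longitude = c(-110.59241, -108.66734, -67.00473, -75.71540,
                -104.88282, -143.77540, -108.66734, -110.59241)
)

# unique() keeps values in order of first appearance;
# match() returns the position of each value in that vector
df$new <- paste0("A", match(df$Longitude, unique(df$Longitude)))

# For comparison: factor() codes follow the sorted order of the values
df$new_sorted <- paste0("A", as.integer(factor(df$Longitude)))

df
```

Here df$new is A1 to A6 for the first six rows and A2, A1 for the two repeats, while df$new_sorted assigns A1 to the smallest value (-143.77540).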