Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Transform column into numeric if some values have % sign and some don't

I have a dataframe df containing multiple columns named x_1, x_2, x_3 and so on that are class character but I want to turn them numeric. The problem is, that some of the values are decimals and others are in percentages, containing a %-sign. See my input here:

df <- data.frame(Company  = c("abc", "def", "ghi"),
                  x_1 = c(0.05, "0.6%", "11.25%"),
                  x_y_1 = c("val_1", "val_2", "val_3"),
                  x_2 = c("3.5%", 0.12, 0.7),
                  x_y_2 = c("val_1", "val_2", "val_3"),
                  x_3 = c(0.83, 0.4, "0.9%"),
                  x_y_3 = c("val_1", "val_2", "val_3")
                  )
> df
  Company    x_1 x_y_1  x_2 x_y_2  x_3 x_y_3
1     abc   0.05 val_1 3.5% val_1 0.83 val_1
2     def   0.6% val_2 0.12 val_2  0.4 val_2
3     ghi 11.25% val_3  0.7 val_3 0.9% val_3

I want to turn the columns x_1, x_2, x_3 into numeric, so the output should look something like this:

> df
  Company    x_1   x_y_1  x_2   x_y_2  x_3   x_y_3
1     abc   0.05   val_1  0.035 val_1  0.83  val_1
2     def   0.006  val_2  0.12  val_2  0.4   val_2
3     ghi   0.1125 val_3  0.7   val_3  0.009 val_3

How do I transform only the values containing % signs into the right numeric value?
Thank you.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

The premise in these examples is a two-step process:

  1. Determine the denominator based on the presence of %;
  2. Derive the numerator by removing the % (whether or not it exists) and converting to numeric.

base R

df[c(2,4,6)] <- lapply(df[c(2,4,6)],
                       function(z) as.numeric(sub("%", "", z)) / ifelse(grepl("%", z), 100, 1))
df
#   Company    x_1 x_y_1   x_2 x_y_2   x_3 x_y_3
# 1     abc 0.0500 val_1 0.035 val_1 0.830 val_1
# 2     def 0.0060 val_2 0.120 val_2 0.400 val_2
# 3     ghi 0.1125 val_3 0.700 val_3 0.009 val_3

dplyr

library(dplyr)
df %>%
  mutate(across(c(2, 4, 6),
         ~ as.numeric(sub("%", "", .)) / if_else(grepl("%", .), 100, 1)))
#   Company    x_1 x_y_1   x_2 x_y_2   x_3 x_y_3
# 1     abc 0.0500 val_1 0.035 val_1 0.830 val_1
# 2     def 0.0060 val_2 0.120 val_2 0.400 val_2
# 3     ghi 0.1125 val_3 0.700 val_3 0.009 val_3
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading