How to truncate multiple columns in R

I need to truncate many columns to the range -3.0 to 3.0. That is, any value greater than +3.0 should be recoded as +3.0 in a new variable, and any value less than -3.0 should be recoded as -3.0 in that same new variable.

Here is an example dataset:

library(tidyverse)
MyData <- tibble( a = c(2.3, 3.0, -1.5, 3.7, -4.7, 5.2),
                  b = c(3.6, 1.52, -5.4, 4.6, 1.5, 2.2),
                  c = c(1.0, -2.6, -1.2, 2.5, -4.0, 3.0))

I worked out how to do this by creating a new variable for each existing variable with mutate() and case_when(). However, I have too many variables to do this by hand, so I am looking for a shorter and more elegant way. I would like output like that produced by this manual code:

MyData %>% 
  mutate(Ta = case_when(a >= 3.0 ~ 3.0,
                        a <= -3.0 ~ -3.0,
                        TRUE ~ a),
         Tb = case_when(b >= 3.0 ~ 3.0,
                        b <= -3.0 ~ -3.0,
                        TRUE ~ b),
         Tc = case_when(c >= 3.0 ~ 3.0,
                        c <= -3.0 ~ -3.0,
                        TRUE ~ c))

# A tibble: 6 x 6
      a     b     c    Ta    Tb    Tc
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1   2.3  3.6    1     2.3  3      1  
2   3    1.52  -2.6   3    1.52  -2.6
3  -1.5 -5.4   -1.2  -1.5 -3     -1.2
4   3.7  4.6    2.5   3    3      2.5
5  -4.7  1.5   -4    -3    1.5   -3  
6   5.2  2.2    3     3    2.2    3  

Solution:

You can define a function and then apply it to many columns using across(). pmin(3, pmax(x, -3)) first takes the larger of -3 and x, then takes the smaller of that result and 3; i.e. it constrains x to the range -3 to 3. The .names argument of across() specifies that each result should be stored in a new column named T followed by the original column name.
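As a quick check of that nesting, here is a minimal base R sketch that runs the two steps separately on the values of column a from the example data. The intermediate names floored and capped are only illustrative.

```r
x <- c(2.3, 3.0, -1.5, 3.7, -4.7, 5.2)  # values of column a from the example

floored <- pmax(x, -3)       # element-wise: anything below -3 is raised to -3
capped  <- pmin(3, floored)  # element-wise: anything above 3 is lowered to 3

print(floored)
print(capped)
```

The values in capped should match the Ta column shown in the question. With the default na.rm = FALSE, pmin() and pmax() leave missing values as NA rather than replacing them with a bound.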

cap3 <- function(x) { pmin(3, pmax(x, -3)) }

MyData %>%
  mutate(across(a:c, cap3, .names = "T{.col}"))
  # mutate(across(1:3, cap3, .names = "T{.col}"))            # Equivalent: selects the same columns by position
  # mutate(across(everything(), cap3, .names = "T{.col}"))   # Equivalent here because MyData has only these three columns

Result

# A tibble: 6 x 6
      a     b     c    Ta    Tb    Tc
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1   2.3  3.6    1     2.3  3      1  
2   3    1.52  -2.6   3    1.52  -2.6
3  -1.5 -5.4   -1.2  -1.5 -3     -1.2
4   3.7  4.6    2.5   3    3      2.5
5  -4.7  1.5   -4    -3    1.5   -3  
6   5.2  2.2    3     3    2.2    3  
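If the bounds may change, or the data also contains non-numeric columns, the same idea could be generalized along the following lines. This is only a sketch: the helper clamp(), its lower and upper arguments, the extra id column, and the Truncated object are hypothetical names introduced for illustration and are not part of dplyr or of the original question.

```r
library(dplyr)

# Hypothetical helper: constrain x to the closed range [lower, upper].
clamp <- function(x, lower, upper) {
  stopifnot(lower <= upper)
  pmin(upper, pmax(x, lower))
}

# Example data from the question, plus an illustrative character column
# (id) to show that non-numeric columns are skipped.
MyData <- tibble(a  = c(2.3, 3.0, -1.5, 3.7, -4.7, 5.2),
                 b  = c(3.6, 1.52, -5.4, 4.6, 1.5, 2.2),
                 c  = c(1.0, -2.6, -1.2, 2.5, -4.0, 3.0),
                 id = letters[1:6])

# Apply clamp() to every numeric column; each result goes into a new
# column named "T" followed by the original column name.
Truncated <- MyData %>%
  mutate(across(where(is.numeric),
                ~ clamp(.x, lower = -3, upper = 3),
                .names = "T{.col}"))

print(Truncated)
```

Selecting with where(is.numeric) avoids listing the columns by name, which may help when there are many of them. The original columns are kept and the new T columns are appended after them.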