R dplyr change numeric values in sequence of rows to other numeric values

July 15, 2022

Suppose I have the following dataset, and I want to change the values in the range 20010001-20010010 to 2001-2010.

How can I do this?

Sample data (df):

structure(list(x = c(20010001, 20010001, 20010002, 20010002, 
20010003, 20010003, 20010004, 20010004, 20010005, 20010005, 20010006, 
20010006, 20010007, 20010007, 20010008, 20010008, 20010009, 20010009, 
200100010, 200100010, 20, 2, 19, 18, 17, 16, 15, 14965, 14964
), y = c("2001", "ORIG", "2001", "ORIG", "2001", "ORIG", "2001", 
"ORIG", "2001", "ORIG", "2001", "ORIG", "2001", "ORIG", "2001", 
"ORIG", "2001", "ORIG", "2001", "ORIG", "2020", "2020", "2020", 
"2020", "2020", "2020", "2020", "2022", "2022")), class = "data.frame", row.names = c(NA, -29L))

Code:

library(tidyverse)

# To change a single value at a time
df["1", "x"] = 2010

# Now, how can I do this for a range of values without doing it one by one?

Solution:

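One possible solution uses str_replace() from stringr with a regular expression.

EXPLANATION

The pattern 2001[0]+(?=\\d{2}$) matches 2001 followed by one or more zeros, but only when exactly two digits remain before the end of the string. The lookahead (?=...) checks for those two digits without consuming them. The matched part is replaced with 20, so 20010001 becomes 2001 and 200100010 becomes 2010. Values that do not match the pattern, such as 20 or 14965, are left unchanged. Because str_replace() works on strings, the new column z is character, not numeric.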
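To check the pattern in isolation, here is a minimal sketch on a few hand-picked strings. The vector name codes is made up for illustration, and 20010010 is included as the 8-digit form mentioned in the question.

```r
library(stringr)

# Hypothetical test strings: values that should change and values that should not
codes <- c("20010001", "20010010", "200100010", "20", "14965")

# "2001", then one or more zeros, then a lookahead requiring
# exactly two digits before the end of the string
result <- str_replace(codes, "2001[0]+(?=\\d{2}$)", "20")

print(result)
```

Strings without the 2001 prefix pass through untouched, which is why the short values in the sample data stay as they are.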

library(tidyverse)

df %>% 
  mutate(z = str_replace(x, "2001[0]+(?=\\d{2}$)", "20"))

#>            x    y     z
#> 1   20010001 2001  2001
#> 2   20010001 ORIG  2001
#> 3   20010002 2001  2002
#> 4   20010002 ORIG  2002
#> 5   20010003 2001  2003
#> 6   20010003 ORIG  2003
#> 7   20010004 2001  2004
#> 8   20010004 ORIG  2004
#> 9   20010005 2001  2005
#> 10  20010005 ORIG  2005
#> 11  20010006 2001  2006
#> 12  20010006 ORIG  2006
#> 13  20010007 2001  2007
#> 14  20010007 ORIG  2007
#> 15  20010008 2001  2008
#> 16  20010008 ORIG  2008
#> 17  20010009 2001  2009
#> 18  20010009 ORIG  2009
#> 19 200100010 2001  2010
#> 20 200100010 ORIG  2010
#> 21        20 2020    20
#> 22         2 2020     2
#> 23        19 2020    19
#> 24        18 2020    18
#> 25        17 2020    17
#> 26        16 2020    16
#> 27        15 2020    15
#> 28     14965 2022 14965
#> 29     14964 2022 14964
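If the goal is to overwrite x and keep it numeric, the same replacement can be wrapped in as.numeric(). This is only a sketch: df_small is a made-up stand-in holding a few representative rows of the question's df, and the explicit as.character() call is an assumption added to make the coercion visible.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for df with a few representative rows
df_small <- data.frame(
  x = c(20010001, 20010002, 200100010, 20, 14965),
  y = c("2001", "2001", "2001", "2020", "2022")
)

df_fixed <- df_small %>%
  mutate(
    # Convert to text, apply the regex, then convert back to numeric
    x = as.numeric(
      str_replace(as.character(x), "2001[0]+(?=\\d{2}$)", "20")
    )
  )

print(df_fixed)
```

Overwriting x in place loses the original codes, so writing to a new column first (as in the answer above) may be the safer choice while checking the result.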