Extracting strings by position in R, preferably with the tidyverse

April 21, 2022

I have a dataset as follows:

library(tibble)

My_data <- tibble(ref = 1:3, codes = c(12204, 35478, 67456))

I want to split the codes column into new variables by position, as follows:

The first digit of the codes column forms a new variable clouds.

The second and third digits of the codes column form a new variable wind_direction.

The last two digits of the codes column form a new variable wind_speed.
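In terms of character positions, that layout is position 1, positions 2 to 3, and positions 4 to 5. Here is a small base-R sketch on the first code, 12204, assuming every code has exactly five digits:

```r
# First code from My_data, written as a string so positions can be counted
code <- "12204"

clouds         <- substr(code, 1, 1)  # 1st digit
wind_direction <- substr(code, 2, 3)  # 2nd and 3rd digits
wind_speed     <- substr(code, 4, 5)  # 4th and 5th digits

c(clouds, wind_direction, wind_speed)
```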

NB: I know that str_match() and str_match_all() can do this. The problem is that they return a matrix; I want a solution that extends the tibble with the three additional variables.

Thank you.

Solution:

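You can use tidyr::extract() with a regular expression that has one capture group per new variable. Because the two (\d{2}) groups must match the last four digits, the greedy (\d+) group is left with only the leading digit. The r"{...}" raw-string syntax requires R 4.0.0 or later.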

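library(tidyverse)

My_data %>% 
  mutate(codes = as.character(codes)) %>%  # treat each code as a string of digits
  extract(codes, c("clouds", "wind_direction", "wind_speed"),
          r"{(\d+)(\d{2})(\d{2})}")        # leading digit(s), then 2 digits, then 2 digits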

#     ref clouds wind_direction wind_speed
#   <int> <chr>  <chr>          <chr>     
# 1     1 1      22             04        
# 2     2 3      54             78        
# 3     3 6      74             56
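Because the question is about positions rather than patterns, another option is to slice each code with stringr::str_sub() inside a single mutate(). This extends the tibble directly. The sketch below assumes every code is meant to be exactly five digits. It pads the codes first, because a code such as 02204 would be stored as the number 2204, and the regular expression above would then fail to match it and return NA.

```r
library(dplyr)
library(stringr)
library(tibble)

My_data <- tibble(ref = 1:3, codes = c(12204, 35478, 67456))

result <- My_data %>%
  mutate(
    # assumption: codes are five digits; left-pad with "0" so a leading zero is kept
    codes          = str_pad(as.character(codes), width = 5, pad = "0"),
    clouds         = str_sub(codes, 1, 1),  # 1st digit
    wind_direction = str_sub(codes, 2, 3),  # 2nd and 3rd digits
    wind_speed     = str_sub(codes, 4, 5)   # 4th and 5th digits
  ) %>%
  select(-codes)                            # drop the source column, as extract() does

result
```

If numeric columns are preferred, extract() also accepts convert = TRUE, which runs type.convert() on the new columns. Note that a wind_speed of "04" would then become 4.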