Split string keeping spaces in R

January 18, 2022

I would like to prepare a table from raw text using readr::read_fwf. There is an argument col_position responsible for determining columns width which in my case could differ.
Table always includes 4 columns and is based on 4 first words from the string like besides one:
category variable description value sth

> text_for_column_width = "category    variable   description      value      sth"
> nchar("category    ")
[1] 12
> nchar("variable   ")
[1] 11
> nchar("description      ")
[1] 17
> nchar("value      ")
[1] 11

I want obtain 4 first words but keeping spaces to have category with 8[a-b]+4[spaces] characters and finally create a vector including number of characters for each of four names c(12,11,17,11). I tried using strsplit with space split argument and then calculate existing zeros however I believe there is faster way just using proper regular expression.

>Solution :

A possible solution, using stringr:

library(tidyverse)

text_for_column_width = "category    variable   description      value      sth"

strings <- text_for_column_width %>% 
  str_remove("sth$") %>% 
  str_split("(?<=\\s)(?=\\S)") %>% 
  unlist

strings

#> [1] "category    "      "variable   "       "description      "
#> [4] "value      "

strings %>% str_count

#> [1] 12 11 17 11