Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Using R to split a structured string into multiple columns and set column names based on string values

I have a column of character string with each row containing information for multiple columns, including the column names. Every cell is structured so one can easily see the different pieces of information, but it is not easy to work with in this format. A cell is formatted like ‘variable: value’, with multiple inputs separated by commas. I would like to split the column into separate columns with the column names based on what’s written before the colon. Here’s an example.

my_df <- tibble(
address = c("street: AvenueName, number: 1, code: 1234 AB, city: City One", 
"street: AnotherStreet, number: 99, code: 5678 CD, city: Town Two")
)

I know I can split strings by using separate_wider_delim and then assign the column names to the data.frame. While this works, I would like to see a method that does not depend on setting the column names ‘manually’, but derive them from the values inside the string.

My expected dataframe would look like this:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

expected_df <- tibble(
street= c("AvenueName", "AnotherStreet"),
number = c(1, 99),
code = c("1234 AB", "5678 CD"),
city = c("City One", "Town Two")
)

>Solution :

Without hard-coding column names, here is one approach –

library(dplyr)
library(tidyr)

my_df %>%
  #To keep track of each row of data 
  mutate(row = row_number()) %>%
  # Bring data in separate rows splitting on comma
  separate_rows(address, sep = ",\\s*") %>%
  # Split data on colon to get data in two columns
  separate_wider_delim(address, ": ", names = c("col", "value")) %>%
  # Get data in wide format
  pivot_wider(names_from = col, values_from = value) %>%
  # Drop row column
  select(-row)

#  street        number code    city    
#  <chr>         <chr>  <chr>   <chr>   
#1 AvenueName    1      1234 AB City One
#2 AnotherStreet 99     5678 CD Town Two

data

I have taken the liberty to fix your data to match it with the expected output.

my_df <- tibble(
  address = c("street: AvenueName, number: 1, code: 1234 AB, city: City One", 
            "street: AnotherStreet, number: 99, code: 5678 CD, city: Town Two")
)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading