Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Extracting columns from strings in R

I have somewhat of a complex case that I am not able to parse out for myself (see structure and the dput below). What the df looks like:

String for each user

Each participant has a string associated with it. I would like to create columns for each participants from the strings. I see that this can be accomplished with extract() or separate() of the tidyverse package. However, the catch is that I first want to create additional rows for each ID based on ; separator.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

So an intermediate step would look something like each IDs would have multiple rows of data. E.g., ID=1 has 2 semi-colons so it should have two rows of data. Basically, every substring after ; belongs into a new row.

After that, I want to separate values by , into columns.

So far, I’m trying to accomplish the first step.

mandatory_puzzles %>% mutate(Mandatory = str_split(Mandatory, pattern = ";"))

Which gives me a character list for each ID.

Character list for each ID

When I pull Mandatory column, I see that I am able to achieve an intermediary step.

Intermediate Step

How do I proceed from here?

Here’s the data:

structure(list(Mandatory = c("Manipulate environment objects , 0 , 285.54 , 270.81 , 51 , 7.18 , 43.82 , 6 , 3 , 0 , 1 , 59 , 25.67449;Manipulate environment objects , 1 , 542.94 , 255 , 320.02 , 28.26 , 291.76 , 18 , 2 , 2 , 5 , 55 , 24.58493;Manipulate environment objects , 2 , 856.8 , 311.39 , 124.58 , 21.45 , 103.13 , 15 , 3 , 0 , 2 , 36 , 26.28657;Manipulate environment objects , 3 , 1129.1 , 267.08 , 58.62 , 12.65 , 45.96 , 10 , 2 , 0 , 4 , 48 , 25.64844;Manipulate environment objects , 1 , 1226.98 , 89.83 , 65.93 , 18.71 , 47.21 , 13 , 1 , 0 , 5 , 16 , 23.01501", 
"Manipulate environment objects , 0 , 282.28 , 254.48 , 96.46 , 7.23 , 89.23 , 7 , 2 , 1 , 1 , 47 , 60.15876;Manipulate environment objects , 1 , 457.42 , 170.06 , 66.17 , 7.99 , 58.18 , 10 , 2 , 0 , 4 , 39 , 59.98176;Manipulate environment objects , 2 , 947.78 , 484.34 , 640.47 , 29.06 , 611.41 , 29 , 3 , 3 , 1 , 25 , 60.04829;Manipulate environment objects , 3 , 1073.8 , 122.48 , 83.37 , 7.91 , 75.46 , 9 , 2 , 0 , 3 , 23 , 59.94974", 
"Manipulate environment objects , 0 , 161.66 , 147.03 , 30.98 , 5.56 , 25.42 , 6 , 2 , 0 , 1 , 60 , 59.344;Manipulate environment objects , 1 , 293.38 , 126.06 , 42.92 , 11.66 , 31.26 , 9 , 2 , 0 , 4 , 52 , 59.47927;Manipulate environment objects , 2 , 548.26 , 252.67 , 221.09 , 36.73 , 184.36 , 33 , 3 , 1 , 1 , 46 , 59.70362;Manipulate environment objects , 3 , 682.66 , 132.26 , 43.8 , 7.34 , 36.45 , 8 , 2 , 0 , 3 , 52 , 62.20168;Manipulate environment objects , 1 , 818.18 , 123.68 , 14.31 , 3.15 , 11.15 , 5 , 1 , 0 , 4 , 52 , 59.62266;Manipulate environment objects , 2 , 976.86 , 155.28 , 0 , 0 , 0 , 0 , 0 , 0 , Puzzle Not Solved , 25 , 61.21601;Manipulate environment objects , 2 , 1008.76 , 29.8 , 0 , 0 , 0 , 0 , 0 , 0 , Puzzle Not Solved , 13 , 59.28677;Manipulate environment objects , 2 , 1019.34 , 8.43 , 0 , 0 , 0 , 0 , 0 , 0 , Puzzle Not Solved , 2 , 59.81207", 
"Manipulate environment objects , 1 , 59.06 , 48.72 , 0 , 0 , 0 , 0 , 0 , 0 , Puzzle Not Solved , 27 , 75.03592;Manipulate environment objects , 1 , 93.14 , 27.5 , 0 , 0 , 0 , 0 , 0 , 0 , Puzzle Not Solved , 6 , 74.96835;Manipulate environment objects , 1 , 223.1 , 125.77 , 60.6 , 6.03 , 54.57 , 8 , 2 , 0 , 2 , 37 , 74.98416;Manipulate environment objects , 2 , 485.24 , 258 , 141.3 , 17.31 , 124 , 20 , 3 , 0 , 5 , 34 , 74.94012;Manipulate environment objects , 3 , 644.2 , 156.72 , 74.81 , 12.27 , 62.54 , 11 , 2 , 0 , 1 , 36 , 74.96394", 
"Manipulate environment objects , 1 , 2002.74 , 108.46 , 24.89 , 3.86 , 21.03 , 9 , 2 , 0 , 2 , 55 , 59.97829;Manipulate environment objects , 2 , 2249.74 , 244.39 , 94.38 , 10.67 , 83.71 , 21 , 3 , 1 , 5 , 43 , 60.01971;Manipulate environment objects , 3 , 2417.84 , 165.16 , 29.56 , 6.43 , 23.13 , 12 , 2 , 0 , 1 , 53 , 59.98882;Manipulate environment objects , 3 , 2481.99 , 61.77 , 9.57 , 2.28 , 7.29 , 5 , 1 , 0 , 2 , 16 , 60.0066;Manipulate environment objects , 1 , 2614.9 , 123.53 , 25.84 , 4.89 , 20.95 , 10 , 2 , 0 , 4 , 53 , 59.98421;Manipulate environment objects , 2 , 2818.64 , 201.51 , 76.99 , 12.27 , 64.72 , 26 , 4 , 1 , 2 , 44 , 60.03096;Manipulate environment objects , 3 , 2913.61 , 92.3 , 28.49 , 4.66 , 23.83 , 11 , 2 , 0 , 4 , 16 , 60.08723;Manipulate environment objects , 1 , 3039.14 , 116.06 , 50.32 , 11.83 , 38.48 , 22 , 2 , 0 , 6 , 53 , 59.97778;Manipulate environment objects , 2 , 3057.44 , 16.34 , 8.94 , 1.72 , 7.22 , 4 , 1 , 0 , 1 , 5 , 59.97208;Manipulate environment objects , 2 , 3217.52 , 153.71 , 37.96 , 4.72 , 33.25 , 14 , 3 , 1 , 4 , 43 , 60.01913;Manipulate environment objects , 3 , 3359.48 , 140.2 , 51.85 , 8.67 , 43.18 , 18 , 2 , 1 , 6 , 50 , 60.06129;Manipulate environment objects , 1 , 3480.78 , 114.93 , 23.72 , 4.31 , 19.41 , 13 , 2 , 0 , 2 , 52 , 59.95073;Manipulate environment objects , 2 , 3638.04 , 154.73 , 38.13 , 7.31 , 30.82 , 19 , 4 , 0 , 6 , 44 , 60.00352;Manipulate environment objects , 3 , 3764.88 , 124.92 , 25.51 , 6.16 , 19.35 , 12 , 2 , 0 , 2 , 51 , 60.01723;Manipulate environment objects , 1 , 3883.16 , 112.63 , 20.66 , 4.06 , 16.6 , 9 , 2 , 0 , 4 , 54 , 59.95847;Manipulate environment objects , 2 , 4025.9 , 140.86 , 68.49 , 13.15 , 55.33 , 24 , 3 , 1 , 1 , 43 , 59.98686;Manipulate environment ..."
), ID = 1:5), row.names = c(NA, -5L), class = c("tbl_df", "tbl", 
"data.frame"))


I appreciate any help you can provide here.

>Solution :

We may do this with separate_rows to split into rows based on the ; and then split at the , into columns with separate

library(dplyr)
library(tidyr)
library(stringr)
out <- mandatory_puzzles %>%
    separate_rows( Mandatory, sep = "\\s*;\\s*")%>% 
    separate(Mandatory, into = str_c("col", 
     seq_len(str_count(first(.$Mandatory), ",") +1)), 
         sep = "\\s*,\\s*", convert = TRUE, extra = "merge")
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading