Why doesn't str_split_fixed() create new columns?

I have the following data set:

> data_short

   Symbol_ID      GFP_Mean  GFP_SD Cells
   <chr>             <dbl>   <dbl> <dbl>
 1 Control_0        0.0303 0.00657 7071.
 2 XRCC4_7518       0.0396 0.00768 5022 
 3 XRCC5_7520       0.0305 0.00629 5781.
 4 BRCA1_672        0.0178 0.00833 1822.
 5 DDX48_9775       0.109  0.0201   239 
 6 HMGN1_3150       0.0997 0.00875 1173 
 7 PRDM13_59336     0.0789 0.00794  980 
 8 UBOX5_22888      0.0734 0.00653 1378 
 9 HIST1H2AE_3012   0.0719 0.00592 1906 
10 HMGN2_3151       0.0691 0.00934  738 

I tried to split the first column into two separate columns, and at first glance it seems to work:

data_short <- data_short %>% mutate(Symbol_ID = str_split_fixed(Symbol_ID, "_", 2))

   Symbol_ID[,1] [,2]  GFP_Mean  GFP_SD Cells
   <chr>         <chr>    <dbl>   <dbl> <dbl>
 1 Control       0       0.0303 0.00657 7071.
 2 XRCC4         7518    0.0396 0.00768 5022 
 3 XRCC5         7520    0.0305 0.00629 5781.
 4 BRCA1         672     0.0178 0.00833 1822.
 5 DDX48         9775    0.109  0.0201   239 
 6 HMGN1         3150    0.0997 0.00875 1173 
 7 PRDM13        59336   0.0789 0.00794  980 
 8 UBOX5         22888   0.0734 0.00653 1378 
 9 HIST1H2AE     3012    0.0719 0.00592 1906 
10 HMGN2         3151    0.0691 0.00934  738 

But when I check str(data_short), it looks like it didn't actually work:

> str(data_short)
tibble [1,177 × 4] (S3: tbl_df/tbl/data.frame)
 $ Symbol_ID: chr [1:1177, 1:2] "Control" "XRCC4" "XRCC5" "BRCA1" ...
 $ GFP_Mean : num [1:1177] 0.0303 0.0396 0.0305 0.0178 0.1088 ...
 $ GFP_SD   : num [1:1177] 0.00657 0.00768 0.00629 0.00833 0.02014 ...
 $ Cells    : num [1:1177] 7071 5022 5781 1822 239 ...

Why is that? How can I fix it?
Thanks in advance!

Solution:

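str_split_fixed() returns a character matrix, so mutate() stores the whole two-column matrix inside the single column Symbol_ID instead of creating two separate columns. That is why str() reports Symbol_ID as chr [1:1177, 1:2] and the tibble still has only 4 columns. tidyr::separate() is more suitable in this case, e.g.: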

data_short %>%
  tidyr::separate(Symbol_ID, into = c("SymbolID1", "SymbolID2"), sep = "_")
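
To see the difference on a small scale, here is a minimal sketch using a toy tibble built from three rows of the question's data (the toy tibble and the helper names parts, by_matrix and by_separate are made up for illustration). It shows that the split result is a matrix, and two ways to end up with real columns: pulling the matrix columns out by hand, or using separate() as above.

```r
library(dplyr)
library(stringr)
library(tidyr)

# Toy data: three rows copied from the question (assumption: a reduced example)
data_short <- tibble(
  Symbol_ID = c("Control_0", "XRCC4_7518", "HIST1H2AE_3012"),
  GFP_Mean  = c(0.0303, 0.0396, 0.0719)
)

# str_split_fixed() returns one character matrix with 2 columns,
# not two separate vectors
parts <- str_split_fixed(data_short$Symbol_ID, "_", 2)
is.matrix(parts)

# Option 1: assign each matrix column to its own tibble column
by_matrix <- data_short %>%
  mutate(SymbolID1 = parts[, 1], SymbolID2 = parts[, 2]) %>%
  select(-Symbol_ID)

# Option 2: let separate() split the column and drop the original
by_separate <- data_short %>%
  separate(Symbol_ID, into = c("SymbolID1", "SymbolID2"), sep = "_")

str(by_separate)
```

Both options give two ordinary character columns. If the ID part should be numeric, separate() also accepts convert = TRUE. Remember to assign the result back (data_short <- ...) if you want to keep it.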