Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

how to separate a col into mutiple cols without knowing the final col number ahead of time?

If I have a df and I want to separate player by sep="_". Is it a smart way that I can do this without manually check how many col I will need? For my example, it is easy to tell I need to separate player into 3 new cols. What is I have 100+ rows. Is it a way to tell how many cols I need to separate player into? or is it a way to split player without knowing this info?

df <- data.frame(player=c('John_Wall', 'Dirk_Nowitzki', 'Steve_Nash_try'),
                 points=c(22, 29, 18),
                 assists=c(8, 4, 15))
## current method:
df %>% separate(player, c('v1', 'v2', 'v3'), sep ="_")

>Solution :

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

We may use str_count to find the count of _, get the max count, convert to sequence, paste (str_c) with ‘v’ to create the column names. Add remove = FALSE, if the original column needs to kept

library(tidyr)
library(dplyr)
library(stringr)
df %>% 
   separate(player, str_c('v', 
      seq_len(max(str_count(.$player, '_') + 1))), sep = "_", fill = "right")

-output

     v1       v2   v3 points assists
1  John     Wall <NA>     22       8
2  Dirk Nowitzki <NA>     29       4
3 Steve     Nash  try     18      15

Or instead of using separate, try with cSplit, which does automatically detect and create those columns

library(splitstackshape)
cSplit(df, 'player', sep = '_')
   points assists player_1 player_2 player_3
    <num>   <num>   <char>   <char>   <char>
1:     22       8     John     Wall     <NA>
2:     29       4     Dirk Nowitzki     <NA>
3:     18      15    Steve     Nash      try

Or using base R

cbind(df,  read.table(text = df$player, sep = "_", fill = TRUE, 
    header = FALSE, na.strings = ""))
          player points assists    V1       V2   V3
1      John_Wall     22       8  John     Wall <NA>
2  Dirk_Nowitzki     29       4  Dirk Nowitzki <NA>
3 Steve_Nash_try     18      15 Steve     Nash  try
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading