If I have a df and I want to separate player by sep="_". Is it a smart way that I can do this without manually check how many col I will need? For my example, it is easy to tell I need to separate player into 3 new cols. What is I have 100+ rows. Is it a way to tell how many cols I need to separate player into? or is it a way to split player without knowing this info?
df <- data.frame(player=c('John_Wall', 'Dirk_Nowitzki', 'Steve_Nash_try'),
points=c(22, 29, 18),
assists=c(8, 4, 15))
## current method:
df %>% separate(player, c('v1', 'v2', 'v3'), sep ="_")
>Solution :
We may use str_count to find the count of _, get the max count, convert to sequence, paste (str_c) with ‘v’ to create the column names. Add remove = FALSE, if the original column needs to kept
library(tidyr)
library(dplyr)
library(stringr)
df %>%
separate(player, str_c('v',
seq_len(max(str_count(.$player, '_') + 1))), sep = "_", fill = "right")
-output
v1 v2 v3 points assists
1 John Wall <NA> 22 8
2 Dirk Nowitzki <NA> 29 4
3 Steve Nash try 18 15
Or instead of using separate, try with cSplit, which does automatically detect and create those columns
library(splitstackshape)
cSplit(df, 'player', sep = '_')
points assists player_1 player_2 player_3
<num> <num> <char> <char> <char>
1: 22 8 John Wall <NA>
2: 29 4 Dirk Nowitzki <NA>
3: 18 15 Steve Nash try
Or using base R
cbind(df, read.table(text = df$player, sep = "_", fill = TRUE,
header = FALSE, na.strings = ""))
player points assists V1 V2 V3
1 John_Wall 22 8 John Wall <NA>
2 Dirk_Nowitzki 29 4 Dirk Nowitzki <NA>
3 Steve_Nash_try 18 15 Steve Nash try