Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Create new columns and a row per each string pattern instance in dplyr

I have the following table:

MatchID Player Event
1096 Marcel NA
1096 Juan G70’IO85′
1090 Andre P43’G87′

I want to create two new columns based on the column Event. The new columns are: Event_type and Event_time, where the event type is the letter and event time is the numeric value after the letter. For example, in G70′, the event type is G, and the time is 70. Note that the resulting table should show each event in a separate line. P43’G87′ has two events for example. If Event has nothing, then show the new columns as NAs.

The resulting table should look like this:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

MatchID Player Event_type Event_time
1096 Marcel NA NA
1096 Juan G 70
1096 Juan IO 85
1090 Andre P 43
1090 Andre G 87

My approach to the problem is by using the following functions:

df%mutate(event_type=str_match_all(Event,"[A-Z]+"),
          event_time=str_match_all(Event,"[0-9]+"))

The issue is that this doesn’t give each event in a separate row. If it finds two events like in here: P43’G87′, it just puts in the same row: "P,G". How can I separate them into a single line per event like it’s shown in the expected table?

>Solution :

We split the ‘Event’ column at the ' to expand the rows, then filter out the blank ("") elements, and capture the non digits ((\\D+)) and the digits ((\\d+)) separately from the ‘Event’ to create two columns ‘Event_type’ and ‘Event_time’

library(dplyr)
library(tidyr)
df %>% 
  separate_rows(Event, sep = "'") %>% 
  filter(nzchar(Event)) %>%
   extract(Event, into = c("Event_type", "Event_time"), 
     "(\\D+)(\\d+)", convert = TRUE)

-output

# A tibble: 5 × 4
  MatchID Player Event_type Event_time
    <int> <chr>  <chr>           <int>
1    1096 Marcel <NA>               NA
2    1096 Juan   G                  70
3    1096 Juan   IO                 85
4    1090 Andre  P                  43
5    1090 Andre  G                  87

data

df <- structure(list(MatchID = c(1096L, 1096L, 1090L), Player = c("Marcel", 
"Juan", "Andre"), Event = c(NA, "G70'IO85'", "P43'G87'")), 
class = "data.frame", row.names = c(NA, 
-3L))
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading