Write a function to get later elements from str_split()

I’ve recently received a raft of data sets with multiple pieces of data in a single column and, as the title suggests, I’m trying to write a function that returns some of the later split elements. Hunting around, I’ve seen solutions for getting just the first element or just the last, but not for selecting which elements are returned. This looks like a persistent issue in these data sets, so a solution that I can abstract would be delightful.

Ideally this function would return just the binomial names of these organisms, but I don’t want it anchored to the back of the string, as sometimes there is more unneeded information after the names.


library(stringr)
library(dplyr)  # provides %>%

foo <- data.frame(id = paste0("a", 1:6),
                  Organisms = c("EA - Enterobacter aerogenes", "EA - Enterobacter aerogenes",
                                "KP - Klebsiella pneumoniae", "ACBA - Acinetobacter baumannii",
                                "ENC - Enterobacter cloacae", "KP - Klebsiella pneumoniae"))

## just the first element (does not allow you to select 2 elements)
Orgsplit_abrev <- function(x) {
  sapply(str_split(x, " "), getElement, 1)
}

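foo %>%
  mutate(Organisms = Orgsplit_abrev(Organisms))

## selects the elements I want, but only for the first row
str_split(foo$Organisms, " ")[[1]][c(3, 4)]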

Solution:

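We can use tail(). Because more than one element is returned per string, the result is a list, which becomes a list column in the data frame. Note that tail() counts from the end of the split pieces, so it only gives the name when nothing follows it.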

Orgsplit_abrev <- function(x) {
  # keep the last two pieces of each split string
  lapply(str_split(x, " "), tail, 2)
}


foo %>%
  transmute(Organisms = Orgsplit_abrev(Organisms))
1   Enterobacter, aerogenes
2   Enterobacter, aerogenes
3    Klebsiella, pneumoniae
4  Acinetobacter, baumannii
5     Enterobacter, cloacae
6    Klebsiella, pneumoniae
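
Since the question mentions that some strings carry extra information after the name, here is a small sketch of how tail() behaves in that case. The second input string is made up for illustration and does not come from the original data.

```r
library(stringr)

# Hypothetical inputs: the second string carries extra text after the name
clean    <- "KP - Klebsiella pneumoniae"
trailing <- "KP - Klebsiella pneumoniae (ESBL positive)"

# tail() counts from the end of the split pieces
tail_clean    <- tail(str_split(clean, " ")[[1]], 2)
tail_trailing <- tail(str_split(trailing, " ")[[1]], 2)

print(tail_clean)     # the binomial name
print(tail_trailing)  # the trailing text, not the name
```

So tail() is fine when the name is always last, but it is anchored to the back of the string.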

To select elements by position from the front of the string instead, pass an anonymous (lambda) function:

Orgsplit_abrev <- function(x) {
  # pieces 3 and 4 are the genus and species in "EA - Enterobacter aerogenes"
  lapply(str_split(x, " "), function(x) x[c(3, 4)])
}

Or pass the extract operator [ directly as the function:

Orgsplit_abrev <- function(x) {
  # `[` is called on each split with c(3, 4) as the index
  lapply(str_split(x, " "), `[`, c(3, 4))
}
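
If the goal is a reusable helper, one possible generalisation is to make the positions an argument and paste the selected pieces back together, so the result is a plain character vector rather than a list column. This is only a sketch: the name split_pick and its idx and sep arguments are hypothetical, and the second test string is made up.

```r
library(stringr)

# Hypothetical helper: `split_pick`, `idx` and `sep` are illustrative names
split_pick <- function(x, idx, sep = " ") {
  pieces <- str_split(x, sep)
  # keep the requested positions and glue them back into one string per input
  vapply(pieces, function(p) paste(p[idx], collapse = " "), character(1))
}

organisms <- c("EA - Enterobacter aerogenes",
               "ACBA - Acinetobacter baumannii (extra note)")
binomial <- split_pick(organisms, c(3, 4))
print(binomial)
```

Because it returns one string per row, it should drop straight into mutate(), for example mutate(Binomial = split_pick(Organisms, c(3, 4))). One caveat: if a string has fewer pieces than the requested positions, the missing pieces come back as the text "NA", so short or malformed rows may need checking first.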
