
Write a function to get later elements from str_split()

I've recently received a raft of data sets with multiple pieces of data in a single column, and, as the title suggests, I'm trying to write a function that returns some of the later split elements. Hunting around, I've found solutions for getting just the first element or just the last, but not for choosing which elements are returned. This looks like a persistent issue in these data sets, so a solution I can abstract would be delightful.

Example:
Ideally this function would return just the binomial names of these organisms, but I don't want it anchored to the end of the string, as sometimes there is more unneeded information after the names.

library(tidyverse)

foo <- data.frame(id = paste0("a", 1:6),
                  Organisms = c("EA - Enterobacter aerogenes", "EA - Enterobacter aerogenes",
                                "KP - Klebsiella pneumoniae", "ACBA - Acinetobacter baumannii",
                                "ENC - Enterobacter cloacae", "KP - Klebsiella pneumoniae"))

## returns just the first element (does not allow you to select 2 elements)
Orgsplit_abrev <- function(x){
  sapply(str_split(x, " "), getElement, 1)
}

foo %>%
  summarise(Orgsplit_abrev(Organisms))

## selecting elements 3 and 4 works for a single row, but not for the whole column
str_split(foo$Organisms, " ")[[1]][c(3, 4)]


Solution:

We can use tail(). Because more than one element is returned per row, return the result as a list column:

Orgsplit_abrev <- function(x){
  lapply(str_split(x," "), tail, 2)
}

Testing:

foo %>%
   summarise(Orgsplit_abrev(Organisms))
Orgsplit_abrev(Organisms)
1   Enterobacter, aerogenes
2   Enterobacter, aerogenes
3    Klebsiella, pneumoniae
4  Acinetobacter, baumannii
5     Enterobacter, cloacae
6    Klebsiella, pneumoniae
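One caveat: tail() counts from the end of each split, so it is anchored to the back of the string, which the question wanted to avoid. A minimal check, using base R's strsplit() in place of stringr::str_split() and a hypothetical entry with an extra trailing word (not from the original data):

```r
# Hypothetical entry with extra information after the binomial name
x <- "KP - Klebsiella pneumoniae ESBL"

# Base-R split on a single space; parts is "KP" "-" "Klebsiella" "pneumoniae" "ESBL"
parts <- strsplit(x, " ", fixed = TRUE)[[1]]

tail(parts, 2)   # "pneumoniae" "ESBL": anchored to the end, picks up the extra word
parts[c(3, 4)]   # "Klebsiella" "pneumoniae": selected by position instead
```

Selecting by position, as in the next two variants, avoids this.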

Alternatively, to choose the elements by position (so the result is not anchored to the end of the string), use an anonymous function:

Orgsplit_abrev <- function(x){
  lapply(str_split(x," "), function(x) x[c(3, 4)])
}
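The hard-coded c(3, 4) can be lifted into an argument, which gives the reusable function the question asks for. A sketch under two assumptions: the name split_select() is hypothetical, and base R's strsplit() stands in for stringr::str_split():

```r
# Hypothetical helper: split each string on `sep` and keep the pieces at
# positions `idx`. Returns a list with one character vector per input string;
# positions past the end of a split come back as NA.
split_select <- function(x, idx, sep = " ") {
  lapply(strsplit(x, sep, fixed = TRUE), function(parts) parts[idx])
}

orgs <- c("EA - Enterobacter aerogenes", "ACBA - Acinetobacter baumannii")
res <- split_select(orgs, c(3, 4))
# res[[1]] is c("Enterobacter", "aerogenes")
```

Because the result is a list, it fits the same list-column pattern as the tail() version, e.g. foo %>% mutate(binomial = split_select(Organisms, c(3, 4))).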

Or, equivalently, pass the extract operator [ to lapply() directly, with the indices as an extra argument:

Orgsplit_abrev <- function(x){
  lapply(str_split(x, " "), `[`, c(3, 4))
}
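If a plain character column is more convenient than a list column (for example, to group by the name), the selected pieces can be pasted back together. A sketch under the same assumptions as above (base R strsplit(), hypothetical name split_paste()):

```r
# Hypothetical helper: keep the pieces at positions `idx` and join them with
# `collapse`, giving one string per input (a character vector, not a list).
# Note: missing positions are pasted as the literal text "NA".
split_paste <- function(x, idx, sep = " ", collapse = " ") {
  vapply(strsplit(x, sep, fixed = TRUE),
         function(parts) paste(parts[idx], collapse = collapse),
         character(1))
}

binomials <- split_paste(c("EA - Enterobacter aerogenes", "KP - Klebsiella pneumoniae"), c(3, 4))
# binomials is c("Enterobacter aerogenes", "Klebsiella pneumoniae")
```

With the question's data this would be used as foo %>% mutate(binomial = split_paste(Organisms, c(3, 4))).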