Randomly reshuffle word order in a string

I have a large data frame of texts, and I want to randomly reshuffle the order of the words in each string.

To give a concrete example, my data looks roughly like this:

library(stringi)
library(tidyverse)

set.seed(123)

n <- 100
df <- data.frame(id = 1:n,
                 text = stri_rand_lipsum(n))

# Some preprocessing
df <- df %>%
  mutate(text = tolower(text),
         text = gsub("[[:punct:]]", "", text))

I want to reshuffle word order at random in each string found in the variable text.

I found several ways to reshuffle the letters of a string, but none to reshuffle the order of its words at random. Does anybody know how to do it? An important factor is that my data consists of millions of rows, so the approach needs to be suitable for large data sets as well.

Thanks!

Solution:

We can strsplit each string with a space " " as the delimiter, use sample on the resulting words to put them in random order, and paste them back into one string. If efficiency is the goal, it is probably better to assign the result directly to a new column instead of going through mutate. However, I'm not sure how efficient this code is on millions of rows. Note that the \(x) shorthand for an anonymous function requires R 4.1 or later; on older versions, write function(x) instead.

df$random_text <- sapply(strsplit(df$text, " "), \(x) paste(sample(x), collapse = " "))
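
For larger data, a variant of the same split, sample, paste idea might look like the sketch below. It is an illustration, not a benchmarked solution: the helper name shuffle_words is made up for this example, splitting on runs of whitespace (rather than a single space) is an assumption about how the text should be tokenized, and vapply is used only because it guarantees one string per row.

```r
library(stringi)

# Hypothetical helper (the name is not from any package): shuffle the
# words of every element of a character vector.
shuffle_words <- function(text) {
  # Split on runs of whitespace; omit_empty drops the empty tokens that
  # leading, trailing, or repeated spaces would otherwise produce.
  words <- stri_split_regex(text, "\\s+", omit_empty = TRUE)
  # vapply checks that each element yields exactly one string.
  vapply(words,
         function(w) paste(sample(w), collapse = " "),
         character(1))
}

# Usage on a small example vector
set.seed(123)
shuffle_words(c("lorem ipsum dolor sit amet", "single"))
```

With the data frame from the question, this would be called as df$random_text <- shuffle_words(df$text). Whether it is actually faster than the sapply version depends on the data, so it is worth timing both on a sample of the rows before running either on the full set.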