R: Trimming a very long string to complete words from the beginning and end

Let’s assume I have this dataframe:

df <- data.frame(
  text = c("This is a very long sentence that I would like to trim because I might need to put it as a label somewhere",
           "This is another very long sentence that I would also like to trim because I might need to put it as who knows what"),
  col2 = c("1234", "5678")
)

Following this post, I was able to create a new column containing the start of each sentence in complete words, which works fine.

df$short_txt = sapply(strsplit(df$text, ' '), function(i) paste(i[cumsum(nchar(i)) <= 20], collapse = ' '))

> df$short_txt
[1] "This is a very long"  "This is another very"

However, I would also like to append the end of the sentence, in complete words, starting about 20 characters before the end, to get something close to this output:

> df$short_txt
[1] "This is a very long...it as a label somewhere"  "This is another very...it as who knows what"

I can’t figure out how to complete the sapply call to reach this outcome. I tried adding "..." and a second cumsum condition inside paste, but it does not return what I want:

df$short_txt = sapply(strsplit(df$text, ' '), function(i) paste(i[cumsum(nchar(i)) <= 20],"...",i[cumsum(nchar(i)) >= (nchar(i)-20)], collapse = ' '))

Appreciate the help.

Solution:

Perhaps we can regex this?

gsub("^(.{20}\\S*)\\b.*\\b(\\S*.{20})$", "\\1...\\2", df$text)
# [1] "This is a very long sentence...as a label somewhere" "This is another very...it as who knows what"        

Regex explanation:

^(.{20}\\S*)\\b.*\\b(\\S*.{20})$
^                              $   beginning and end of string, respectively
 (.........)        (.........)    first and second saved groups
  .{20}                  .{20}     exactly 20 characters of any kind
       \\S*          \\S*          zero or more non-space characters
            \\b  \\b               word boundaries
               .*                  anything else (including nothing)
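
To see what each saved group actually captures, the same pattern can be run through regexec() and regmatches() from base R. This is only an illustrative sketch; the variable names txt, pattern and parts are made up here and are not part of the answer.

```r
txt <- "This is a very long sentence that I would like to trim because I might need to put it as a label somewhere"
pattern <- "^(.{20}\\S*)\\b.*\\b(\\S*.{20})$"

# regexec() returns the full match followed by one entry per saved group
parts <- regmatches(txt, regexec(pattern, txt))[[1]]

parts[2]         # first group:  "This is a very long sentence"
parts[3]         # second group: "as a label somewhere"
nchar(parts[3])  # 20
```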

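The first result does not include the "it" at the start of the trailing part that your expected output shows: "as a label somewhere" is already exactly 20 characters long, so nothing more is pulled in. At the other end, the 20th character of the first string is a space, so \\S* goes on to take the whole next word ("sentence").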

Here is df$text[1] with various lengths for the leading and trailing complete-word substrings:

sapply(seq(10, 24, by = 2), function(len) gsub(sprintf("^(.{%d}\\S*)\\b.*\\b(\\S*.{%d})$", len, len), "\\1...\\2", df$text[1]))
# [1] "This is a very... somewhere"                            
# [2] "This is a very...label somewhere"                       
# [3] "This is a very...label somewhere"                       
# [4] "This is a very long... label somewhere"                 
# [5] "This is a very long... a label somewhere"               
# [6] "This is a very long sentence...as a label somewhere"    
# [7] "This is a very long sentence...it as a label somewhere" 
# [8] "This is a very long sentence... it as a label somewhere"

I don’t know off-hand how to prevent the spaces before/after the inserted ... within the same regex, but they can be cleaned up afterwards (safe as long as your strings don’t already contain "..."). Note that the _ placeholder for the native pipe needs R 4.2 or later.

sapply(seq(10, 24, by = 2), function(len) gsub(sprintf("^(.{%d}\\S*)\\b.*\\b(\\S*.{%d})$", len, len), "\\1...\\2", df$text[1])) |>
  sub(" *(\\.\\.\\.) *", "\\1", x = _)
# [1] "This is a very...somewhere"                            
# [2] "This is a very...label somewhere"                      
# [3] "This is a very...label somewhere"                      
# [4] "This is a very long...label somewhere"                 
# [5] "This is a very long...a label somewhere"               
# [6] "This is a very long sentence...as a label somewhere"   
# [7] "This is a very long sentence...it as a label somewhere"
# [8] "This is a very long sentence...it as a label somewhere"
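
As an alternative to the regex, the strsplit idea from the question can be extended to both ends. This is only a sketch and not part of the answer above: trim_words is a made-up name, and unlike the question's snippet it counts the separating spaces, so each side is at most n characters long and no stray spaces end up next to the "...".

```r
# Hypothetical helper: keep whole words totalling at most `n` characters
# (spaces included) from each end of every string in `x`.
trim_words <- function(x, n = 20) {
  one <- function(s) {
    if (is.na(s)) return(s)
    w <- strsplit(s, " ", fixed = TRUE)[[1]]
    len <- nchar(w) + 1                       # each word plus one separating space
    head_n <- sum(cumsum(len) - 1 <= n)       # leading words that fit in n characters
    tail_n <- sum(cumsum(rev(len)) - 1 <= n)  # trailing words that fit in n characters
    # Leave the string alone when there is nothing to cut out of the middle
    if (head_n + tail_n >= length(w)) return(s)
    paste0(paste(head(w, head_n), collapse = " "), "...",
           paste(tail(w, tail_n), collapse = " "))
  }
  vapply(x, one, character(1), USE.NAMES = FALSE)
}

df <- data.frame(
  text = c("This is a very long sentence that I would like to trim because I might need to put it as a label somewhere",
           "This is another very long sentence that I would also like to trim because I might need to put it as who knows what"),
  col2 = c("1234", "5678")
)

trim_words(df$text)
# [1] "This is a very long...as a label somewhere"
# [2] "This is another very...it as who knows what"
```

Strings that are already short enough come back unchanged, for example trim_words("a short label"). Because this version never goes past n characters, the first result stops at "long" rather than running on to "sentence" as the regex does.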