I have a data frame with a text column and a column holding a word count.
How can I delete the last n words of each text (where n is the value in the ‘wc’ column) and save the result to a third column?
In other words, I need the "introductory" part of a bunch of texts, and I know when the intro ends, so I want to cut the text off at that point and save the intro in a new column.
df <- data.frame(text = c("this is a long text", "this is also a long text", "another long text"), wc = c('1', '2', '1'))
Desired output:
| text | wc | chopped_off_text |
|---|---|---|
| this is a long text | 1 | this is a long |
| this is also a long text | 2 | this is also a |
| another long text | 1 | another long |
>Solution :
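You can use the word function from the stringr package to extract a range of "words" from a sentence. str_count(text, "\\s") + 1 counts the whitespace characters and adds one, which equals the number of words as long as the words are separated by single spaces. Subtracting as.integer(wc) (the wc column is stored as character) gives the position of the last word to keep.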
library(stringr)
library(dplyr)
df %>%
  mutate(chopped_off_text =
           word(text, 1, end = str_count(text, "\\s") + 1 - as.integer(wc)))
                      text wc chopped_off_text
1      this is a long text  1   this is a long
2 this is also a long text  2   this is also a
3        another long text  1     another long
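
If the texts may contain repeated spaces or leading/trailing whitespace, counting whitespace characters no longer gives the word count. Below is a minimal base-R sketch for that case, assuming words are separated by any run of whitespace. drop_last_words is a hypothetical helper name, not a function from any package.

```r
df <- data.frame(
  text = c("this is a long text", "this is also a long text", "another long text"),
  wc = c("1", "2", "1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop the last n whitespace-separated words of each string.
drop_last_words <- function(text, n) {
  # Trim the ends, then split on runs of whitespace so double spaces do not matter.
  words <- strsplit(trimws(text), "\\s+")
  mapply(
    function(w, k) {
      # Number of words to keep; never negative, so n = 0 keeps everything
      # and n >= number of words yields an empty string.
      keep <- max(length(w) - k, 0)
      paste(head(w, keep), collapse = " ")
    },
    words, as.integer(n),
    USE.NAMES = FALSE
  )
}

df$chopped_off_text <- drop_last_words(df$text, df$wc)
```

Note that the result is rebuilt with single spaces between words, so any irregular spacing in the original text is normalised.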