Remove row if string contains more than one "@" using regular expression

I have a data frame with two columns. cnn_handle contains Twitter handles and tweet contains tweets where the Twitter handle in the corresponding row is mentioned. However, most tweets mention at least one other user/handle indicated by @. I want to remove all rows where a tweet contains more than one @.

df
    cnn_handle      tweet
1   @DanaBashCNN    @JohnKingCNN @DanaBashCNN @kaitlancollins @eliehonig @thelauracoates @KristenhCNN CNN you are still FAKE NEWS !!!
2   @DanaBashCNN    @DanaBashCNN He could have made the same calls here, from SC.
3   @DanaBashCNN    @DanaBashCNN GRAMMER ALERT:  THAT'S FORMER PRESIDENT TRUMP Please don't forget this important point.   Also please refrain from showing a pic of him till you have one in his casket.   thank you
4   @brianstelter   @eliehonig @brianstelter My apologies to you sir. Just seems like that story disappeared. Imo the nursing home scandal is just as bad.
5   @brianstelter   @DrAndrewBaer1 @JGreenblattADL @brianstelter @CNN @TuckerCarlson @FoxNews Anti-Semite are you,  Herr Doktor? How very Mengele of you.
6   @brianstelter   @ma_makosh @Shortguy1 @brianstelter @ChrisCuomo Liberals, their feelings before facts and their crucifixion of people before due process. Never a presumption of innocence when it concerns the rival party. So un-American.
7   @andersoncooper @BrendonLeslie And Biden was a staunch opponent of “forced busing”. He also said that integrating schools will cause a “racial jungle”. But u won’t hear this on @ChrisCuomo @jaketapper @Acosta @andersoncooper bc they continue to cover up the truth about Biden & his family.
8   @andersoncooper Anderson Cooper revealed that he "wanted a change" when reflecting on his break from news as #TheMole arrives on Netflix.
9   @andersoncooper @johnnydollar01 @newsbusters @drsanjaygupta @andersoncooper He was terrible as a host

I suspect some type of regular expression is needed. However, I am not sure how to combine it with a greater-than sign.

The desired result, i.e. tweets mentioning no handle other than the corresponding cnn_handle:

    cnn_handle      tweet
2   @DanaBashCNN    @DanaBashCNN He could have made the same calls here, from SC.
3   @DanaBashCNN    @DanaBashCNN GRAMMER ALERT:  THAT'S FORMER PRESIDENT TRUMP Please don't forget this important point.   Also please refrain from showing a pic of him till you have one in his casket.   thank you
8   @andersoncooper Anderson Cooper revealed that he "wanted a change" when reflecting on his break from news as #TheMole arrives on Netflix.

Solution:

A straightforward solution uses str_count from stringr to count the @ characters in each tweet and keeps only the rows with at most one. It presupposes that @ occurs only in Twitter handles:

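base R subsetting (stringr is still used for the count):

library(stringr)
# keep rows whose tweet contains at most one "@"
df[str_count(df$tweet, "@") <= 1, ]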
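
If loading stringr is not an option, the count can also be done with base R functions alone. The following is a minimal sketch under assumptions: the data frame `toy` and its contents are made up for illustration and are not the data from the question.

```r
# Hypothetical toy data, made up for illustration
toy <- data.frame(
  cnn_handle = c("@a", "@a", "@b"),
  tweet = c("@x @a hello", "@a hello", "no mention here"),
  stringsAsFactors = FALSE
)

# Count literal "@" characters per tweet: gregexpr() locates the matches,
# regmatches() extracts them, lengths() counts them (0 when there is none)
n_at <- lengths(regmatches(toy$tweet, gregexpr("@", toy$tweet, fixed = TRUE)))

# Keep rows with at most one "@"
toy[n_at <= 1, ]

# Regular-expression alternative: drop rows where a second "@"
# follows a first one
toy[!grepl("@[^@]*@", toy$tweet), ]
```

Both subsetting calls keep the second and third rows of `toy`. The grepl variant avoids counting altogether, which is closer to the "regular expression plus greater-than" idea in the question: the pattern itself encodes "more than one".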

dplyr:

library(dplyr)
library(stringr)
df %>%
  # keep rows whose tweet contains at most one "@"
  filter(str_count(tweet, "@") <= 1)
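
Both versions rely on the presupposition stated above, namely that every @ belongs to a handle. If tweets may also contain e-mail addresses or a bare @, a stricter pattern can be counted instead. This is only a sketch: the `tweets` vector is made up, and `handle_pattern` is an assumed approximation of a handle, not an official definition.

```r
library(stringr)

# Hypothetical tweets, made up for illustration
tweets <- c(
  "@a mail me at someone@example.com",  # one handle plus an e-mail address
  "@a @b hello",                        # two handles
  "rated 5 @ the box office"            # a bare "@", no handle
)

# Assumed handle pattern: an "@" that is not preceded by a word character
# and is followed by at least one word character
handle_pattern <- "(?<!\\w)@\\w+"

n_all     <- str_count(tweets, "@")             # every "@" character
n_handles <- str_count(tweets, handle_pattern)  # handle-like tokens only

# Keep tweets that mention at most one handle-like token
tweets[n_handles <= 1]
```

With the plain "@" count the first tweet would be dropped because of the e-mail address; with the stricter pattern it is kept.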