Replace specific text values in a row of a dataframe based on a true condition on the row before

I am doing a text analysis of the Congressional Record, specifically of passages where senators speak about each other. In many instances one senator refers to another who has just finished speaking without naming them (e.g., "my colleague", "my friend"). I am trying to replace those references with the senator's name.

The speeches are split into rows. The senator who is speaking is listed by name at the start of the row.

I tried three different functions. The first attempt used if / else if:

#function 1 (had error)
r_sen_from_states <- function(x){
  if(x == "the senator from Alabama" & lag(x)=="^Mr\\. SHELBY \\(R; Alabama\\):"){
    str_replace(x, "the senator from Alabama", "Senator Shelby \\(R; Alabama\\)")
  } else if (x == "the senator from Alabama" & lag(x)=="^Mr.\\. SESSIONS \\(R; Alabama\\)"){
    str_replace(x, "the senator from Alabama", "Senator Sessions \\(R; Alabama\\)")
  }
}
test_df_ran <- r_sen_from_states(test_df)
##output -> error: the condition has length > 1 and only the first element will be used

The second attempt used ifelse:

#function 2 (does not replace values with new values but no error because ifelse vectorization)
r_sen_from_states <- function(x){
  ifelse(x %in% "the senator from Alabama" & lag(x)=="^Mr\\. SHELBY \\(R; Alabama\\):",
         str_replace(x, "the senator from Alabama","senator Shelby"), x)}
test_df_ran <- r_sen_from_states(test_df$speeches)
##output -> the dataframe but without replacing any values

The third attempt used a for loop around ifelse:

#function 3 (produced "NULL")
r_sen_from_states <- function(x){
  for (i in 1:nrow(x)) {
    ifelse(x == "the senator from Alabama" & lag(x, n = 1L) == "^Mr\\. SHELBY \\(R; Alabama\\):",
           str_replace(x, "the senator from Alabama", "senator Shelby"), x)
  }
}
test_df_ran <- r_sen_from_states(test_df)
##output -> "NULL"

If I can get the ifelse() statement to apply and change the declared values, I will then build the r_sen_from_states function from nested ifelse() statements, one for each state and senator combination.

e.g., ifelse(x=="the senator from Alabama" & lag(x)=="^Mr\\. SHELBY \\(R; Alabama\\):", str_replace(x,"the senator from Alabama","senator Shelby"), ifelse(x=="the senator from Alabama" & lag(x)=="^Mr\\. SESSIONS \\(R; Alabama\\):", str_replace(x, "the senator from Alabama", "senator Sessions"),...[etc. for each state and senator pairing])

Here’s some sample data for replication/debugging purposes.

#environment data below
test_col <- c("Mr. SHELBY (R; Alabama): I acknowledge this is a test.",
              "Mrs. MURRAY (D; Washington): I say to my friend, the senator from Alabama, that they are wrong.",
              "Mr. SHELBY (R; Alabama): I do not agree with my colleague.",
              "Mr. FRIST (R; Tennessee): The senator from Alabama is correct, senator Murray.",
              "Mr. SHELBY (R; Alabama): I thank the majority leader for their support.",
              "Mr. SESSIONS (R; Alabama): I am proud of my junior, the senator from Alabama.",
              "Mr. SHELBY (R; Alabama): To my senior peer, the senator from Alabama, I say great things.")
test_df <- data.frame(test_col)
colnames(test_df) <- c("speeches")

Solution:

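The comparison x == "the senator from Alabama" is TRUE only when x is exactly that text and nothing else. In the same way, == never treats its right-hand side as a regular expression, so the lag(x) == "^Mr\\. SHELBY ..." test cannot match either. Use str_detect for both checks instead. I swapped it into your second function (I haven't tried the others) and it worked: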

library(stringr)  # str_detect(), str_replace()
library(dplyr)    # lag(), %>%

r_sen_from_states <- function(x){
  ifelse(str_detect(x, "the senator from Alabama") & str_detect(lag(x), "^Mr\\. SHELBY \\(R; Alabama\\):"),
         str_replace(x, "the senator from Alabama","senator Shelby"), x)}
test_df_ran <- r_sen_from_states(test_df$speeches) %>% print()

[1] "Mr. SHELBY (R; Alabama): I acknowledge this is a test."                                   
[2] "Mrs. MURRAY (D; Washington): I say to my friend, senator Shelby, that they are wrong."    
[3] "Mr. SHELBY (R; Alabama): I do not agree with my colleague."                               
[4] "Mr. FRIST (R; Tennessee): The senator from Alabama is correct, senator Murray."           
[5] "Mr. SHELBY (R; Alabama): I thank the majority leader for their support."                  
[6] "Mr. SESSIONS (R; Alabama): I am proud of my junior, senator Shelby."                      
[7] "Mr. SHELBY (R; Alabama): To my senior peer, the senator from Alabama, I say great things."
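
To see why the original comparisons never fired, here is a minimal sketch contrasting == with str_detect on one of the sample speeches. The variable names are only illustrative and are not part of the original code.

```r
library(stringr)

speech  <- "Mrs. MURRAY (D; Washington): I say to my friend, the senator from Alabama, that they are wrong."
speaker <- "Mr. SHELBY (R; Alabama): I acknowledge this is a test."

# == asks whether the whole string is identical to the right-hand side
exact_match <- speech == "the senator from Alabama"

# str_detect() asks whether the pattern occurs anywhere in the string
contains_phrase <- str_detect(speech, "the senator from Alabama")

# == compares literally, so a regex on the right-hand side is just odd text
literal_compare <- speaker == "^Mr\\. SHELBY \\(R; Alabama\\):"

# str_detect() interprets the same string as a regular expression
regex_match <- str_detect(speaker, "^Mr\\. SHELBY \\(R; Alabama\\):")

print(c(exact_match, contains_phrase, literal_compare, regex_match))
# [1] FALSE  TRUE FALSE  TRUE
```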

(BTW, I had never come across lag before, so thank you for bringing it to my attention!)
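
For the follow-up plan of nesting one ifelse() per state and senator, a lookup table with a loop may be easier to maintain. The sketch below is only one possible approach, not something tested on the full Congressional Record: the table `senators`, its column names, and the function `replace_previous_speaker` are hypothetical, and only the two Alabama senators from the sample data are filled in. It uses `dplyr::lag(x, default = "")` so the first row is compared against an empty string rather than NA.

```r
library(stringr)
library(dplyr)

# Hypothetical lookup table: one row per senator (only two filled in here)
senators <- data.frame(
  speaker_pattern = c("^Mr\\. SHELBY \\(R; Alabama\\):",
                      "^Mr\\. SESSIONS \\(R; Alabama\\):"),
  phrase          = c("the senator from Alabama",
                      "the senator from Alabama"),
  replacement     = c("senator Shelby", "senator Sessions"),
  stringsAsFactors = FALSE
)

replace_previous_speaker <- function(x, lookup) {
  # Text of the previous row; default = "" avoids an NA for the first row
  prev <- dplyr::lag(x, default = "")
  for (i in seq_len(nrow(lookup))) {
    # Rows that contain the phrase AND directly follow this senator's speech
    hit <- str_detect(x, fixed(lookup$phrase[i])) &
      str_detect(prev, lookup$speaker_pattern[i])
    # Replace the first occurrence of the phrase in those rows only
    x[hit] <- str_replace(x[hit], fixed(lookup$phrase[i]), lookup$replacement[i])
  }
  x
}

# Last three rows of the sample data from the question
sample_speeches <- c(
  "Mr. SHELBY (R; Alabama): I thank the majority leader for their support.",
  "Mr. SESSIONS (R; Alabama): I am proud of my junior, the senator from Alabama.",
  "Mr. SHELBY (R; Alabama): To my senior peer, the senator from Alabama, I say great things."
)
result <- replace_previous_speaker(sample_speeches, senators)
print(result)
# [1] "Mr. SHELBY (R; Alabama): I thank the majority leader for their support."
# [2] "Mr. SESSIONS (R; Alabama): I am proud of my junior, senator Shelby."
# [3] "Mr. SHELBY (R; Alabama): To my senior peer, senator Sessions, I say great things."
```

Adding a senator then means adding a row to the table instead of another nested ifelse(). Note that the match is case-sensitive, so "The senator from Alabama" at the start of a sentence is left untouched, just as in the answer above.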
