Match and replace based on conditions in R

I scraped this massive (10M+ entries) Twitter dataset using academictwitteR, and as I prepare to do some network analysis, I’ve come up against an issue: when a tweet is a reply to another user, the dataset identifies that user only by ID (see mockup below). What I am trying to do across this dataset is a conditional replace whereby the user ID in the "in response to" column is replaced by the corresponding username.

Current database

ID_column Username In_response_to
ID12345 JohnA NA
ID54321 JaneB ID12345
ID51243 MarkE ID54321

Desired outcome

ID_column Username In_response_to
ID12345 JohnA NA
ID54321 JaneB JohnA
ID51243 MarkE JaneB

I have looked around extensively through SO and other forums for solutions, but I haven’t managed to find one. Being relatively new to R, I am sure the answer will be staring me in the face…

>Solution :

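library(dplyr)

# Rebuild the mock data from the question; read.table() splits on any whitespace
data_df <- read.table(text = '
ID12345 JohnA   NA
ID54321 JaneB   ID12345
ID51243 MarkE   ID54321
', header = FALSE, col.names = c('ID_column', 'Username', 'In_response_to'))

# Named vector: names are user IDs, values are usernames
lookup_list <- setNames(data_df$Username, data_df$ID_column)

# Replace each ID in In_response_to with its username;
# NA and IDs without a match are left unchanged
data_df |>
  mutate(In_response_to = recode(In_response_to, !!!lookup_list))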
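
The recode() call above splices one argument per user ID into the function, which may become slow on a dataset with 10M+ rows. As a hedged alternative, here is a minimal sketch using base R's match(), which looks up all IDs in one vectorised step. The mock data frame simply mirrors the example in the question. Keeping the raw ID when no username is found is an assumption about what is wanted for replies to users outside the dataset.

```r
# Mock data mirroring the question's example
data_df <- data.frame(
  ID_column      = c("ID12345", "ID54321", "ID51243"),
  Username       = c("JohnA", "JaneB", "MarkE"),
  In_response_to = c(NA, "ID12345", "ID54321"),
  stringsAsFactors = FALSE
)

# Row position of each In_response_to ID within ID_column
# (NA when the value is missing or the ID is not in the dataset)
idx <- match(data_df$In_response_to, data_df$ID_column)

# Take the matched username where one exists; otherwise keep the original
# value (assumption: unmatched IDs should be left as they are)
data_df$In_response_to <- ifelse(
  is.na(idx),
  data_df$In_response_to,
  data_df$Username[idx]
)

print(data_df)
```

If ID_column can contain duplicates (for example, one row per tweet rather than per user), match() uses the first occurrence, so it may be worth building the lookup from a de-duplicated table of users first.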