Updating integer within string row for each instance in R dataframe


I have a dataframe similar to the following reproducible one in which one column contains HTML code:

ID <- c(15, 25, 90, 1, 23, 543)

HTML <- c(
  "[demography_form][1]<div></table<text-align>[demography_form_date][1]",
  "<text-ali>[geography_form][1]<div></table<text-align>[geography_form_date][1]",
  "[social_isolation][1]<div></table<div><text-align>[social_isolation_date][1]",
  "<text-align>[geography_form][1]<div></table<text-align>[geography_form_date][1]",
  "<div>[demography_form][1]<div></table<text-align>[demography_form_date][1]",
  "[geography_form][1]<div></table<text-align>[geography_form_date][1]</table"
)

df <- data.frame(ID, HTML)

I would like to update the integer within the square brackets of the HTML column so that it counts the repeated instances of each form across rows. For example, in the second row where [demography_form] appears, the integers in the square brackets that follow it should be 2.

What’s the best way to do this? I was thinking of creating an instance column, using it to update the value in the square brackets, and then deleting it at the end. Thanks in advance.

>Solution :

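Create a grouping column from the first name inside [] in the HTML column (str_extract with group = 1). Then, within each group, replace the digits inside [] with the row's position in that group (row_number()) using str_replace_all. Note that every [digits] in a row gets the same counter, and that the group argument of str_extract needs stringr 1.5.0 or later.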

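library(dplyr)
library(stringr)  # str_extract(group = ) needs stringr >= 1.5.0
df %>% 
  # grouping key: the first bracketed name in each row, e.g. "demography_form"
  group_by(grp = str_extract(HTML, "\\[(\\w+)\\]", group = 1)) %>% 
  # replace every [digits] in the row with that row's position within its group
  mutate(HTML = str_replace_all(HTML, "\\[(\\d+)\\]", 
     sprintf("[%d]", row_number()))) %>% 
  ungroup() %>%
  # drop the helper column
  select(-grp)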

-output

# A tibble: 6 × 2
     ID HTML                                                                           
  <dbl> <chr>                                                                          
1    15 [demography_form][1]<div></table<text-align>[demography_form_date][1]          
2    25 <text-ali>[geography_form][1]<div></table<text-align>[geography_form_date][1]  
3    90 [social_isolation][1]<div></table<div><text-align>[social_isolation_date][1]   
4     1 <text-align>[geography_form][2]<div></table<text-align>[geography_form_date][2]
5    23 <div>[demography_form][2]<div></table<text-align>[demography_form_date][2]     
6   543 [geography_form][3]<div></table<text-align>[geography_form_date][3]</table     
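
If you would rather avoid the extra packages, the same idea (a temporary per-form counter that is used for the replacement and then discarded) can be sketched in base R. This is only a sketch under a few assumptions: every row contains at least one [name] token, the first such token identifies the form, and all [digits] in a row should receive the same counter. The helper names key and instance are made up for illustration.

```r
ID <- c(15, 25, 90, 1, 23, 543)
HTML <- c(
  "[demography_form][1]<div></table<text-align>[demography_form_date][1]",
  "<text-ali>[geography_form][1]<div></table<text-align>[geography_form_date][1]",
  "[social_isolation][1]<div></table<div><text-align>[social_isolation_date][1]",
  "<text-align>[geography_form][1]<div></table<text-align>[geography_form_date][1]",
  "<div>[demography_form][1]<div></table<text-align>[demography_form_date][1]",
  "[geography_form][1]<div></table<text-align>[geography_form_date][1]</table"
)
df <- data.frame(ID, HTML, stringsAsFactors = FALSE)

# Grouping key: the first [name] token in each row, brackets included.
# Assumes every row has a match; regmatches() drops rows without one.
key <- regmatches(df$HTML, regexpr("\\[\\w+\\]", df$HTML))

# Running count per key: 1 for the first row with that key, 2 for the second, ...
instance <- ave(seq_along(key), key, FUN = seq_along)

# gsub() is not vectorised over the replacement, so pair each row with its counter.
df$HTML <- mapply(
  function(html, n) gsub("\\[\\d+\\]", sprintf("[%d]", n), html),
  df$HTML, instance,
  USE.NAMES = FALSE
)

df
```

Because key and instance live outside the data frame, there is no helper column to delete afterwards.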
