Updating integer within string row for each instance in R dataframe

I have a dataframe similar to the following reproducible one in which one column contains HTML code:

ID <- c(15, 25, 90, 1, 23, 543)

HTML <- c("[demography_form][1]<div></table<text-align>[demography_form_date][1]", "<text-ali>[geography_form][1]<div></table<text-align>[geography_form_date][1]", "[social_isolation][1]<div></table<div><text-align>[social_isolation_date][1]", "<text-align>[geography_form][1]<div></table<text-align>[geography_form_date][1]", "<div>[demography_form][1]<div></table<text-align>[demography_form_date][1]", "[geography_form][1]<div></table<text-align>[geography_form_date][1]</table")

df <- data.frame(ID, HTML)

I would like to update the integer within the square brackets of the HTML column so that it reflects which repeat instance each row is. For example, in the second row where [demography_form] appears, the square brackets that follow it should contain 2, giving [demography_form][2] and [demography_form_date][2].

What is the best way to do this? I was thinking of creating an instance column, using it to update the value in the square brackets, and then deleting it at the end. Thanks in advance.

Solution:

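Create a grouping column from the first substring inside [] in the HTML column, then use str_replace_all to replace the digits inside every [] with the row's position within its group (row_number()). Note that the group argument of str_extract requires stringr 1.5.0 or later.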

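library(dplyr)
library(stringr)
df %>% 
  # group rows by the first bracketed name, e.g. "demography_form"
  group_by(grp = str_extract(HTML, "\\[(\\w+)\\]", group = 1)) %>% 
  # replace every "[digits]" with the row's position within its group
  mutate(HTML = str_replace_all(HTML, "\\[(\\d+)\\]", 
     sprintf("[%d]", row_number()))) %>% 
  ungroup() %>%
  # drop the helper column
  select(-grp)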

Output:

# A tibble: 6 × 2
     ID HTML                                                                           
  <dbl> <chr>                                                                          
1    15 [demography_form][1]<div></table<text-align>[demography_form_date][1]          
2    25 <text-ali>[geography_form][1]<div></table<text-align>[geography_form_date][1]  
3    90 [social_isolation][1]<div></table<div><text-align>[social_isolation_date][1]   
4     1 <text-align>[geography_form][2]<div></table<text-align>[geography_form_date][2]
5    23 <div>[demography_form][2]<div></table<text-align>[demography_form_date][2]     
6   543 [geography_form][3]<div></table<text-align>[geography_form_date][3]</table     
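
The instance-column idea from the question can also be sketched in base R, without dplyr or stringr. This is only a minimal sketch, not part of the answer above: it assumes that the first bracketed name in each row identifies the form and that every row contains one. The names form and instance are illustrative helpers introduced here.

```r
# Rebuild the example data from the question.
ID <- c(15, 25, 90, 1, 23, 543)
HTML <- c("[demography_form][1]<div></table<text-align>[demography_form_date][1]",
          "<text-ali>[geography_form][1]<div></table<text-align>[geography_form_date][1]",
          "[social_isolation][1]<div></table<div><text-align>[social_isolation_date][1]",
          "<text-align>[geography_form][1]<div></table<text-align>[geography_form_date][1]",
          "<div>[demography_form][1]<div></table<text-align>[demography_form_date][1]",
          "[geography_form][1]<div></table<text-align>[geography_form_date][1]</table")
df <- data.frame(ID, HTML, stringsAsFactors = FALSE)

# Helper 1 (hypothetical name): the first bracketed name in each row,
# e.g. "demography_form". Assumes every row has at least one "[name]".
form <- sub("^.*?\\[(\\w+)\\].*$", "\\1", df$HTML, perl = TRUE)

# Helper 2 (hypothetical name): running count of each form, in row order.
instance <- ave(seq_along(form), form, FUN = seq_along)

# Replace every "[digits]" in a row with that row's instance number.
# gsub() is not vectorised over the replacement, hence mapply().
df$HTML <- mapply(
  function(html, n) gsub("\\[[0-9]+\\]", sprintf("[%d]", n), html),
  df$HTML, instance,
  USE.NAMES = FALSE
)

print(df)
```

Because form and instance are kept as standalone vectors rather than columns, there is nothing to delete from df at the end.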