Extract uppercase rows and fill down until next uppercase row

I have some data which looks like:

   RegionName
   <chr>     
 1 ANDALUCÍA 
 2 Almería   
 3 Abla      
 4 Abrucena  
 5 Adra      
 6 ALBÁNCHEZ 
 7 Alboloduy 
 8 Albox     
 9 ALCOLEA   
10 Alcóntar

Some of the rows are in uppercase. I want to extract the uppercase rows into a new column and fill down until the next uppercase row.

Expected output:

   RegionName  REGIONNAME
   <chr>       <chr>
 1 ANDALUCÍA   ANDALUCÍA   -first result
 2 Almería     ANDALUCÍA
 3 Abla        ANDALUCÍA
 4 Abrucena    ANDALUCÍA
 5 Adra        ANDALUCÍA
 6 ALBÁNCHEZ   ALBÁNCHEZ  - change here
 7 Alboloduy   ALBÁNCHEZ
 8 Albox       ALBÁNCHEZ
 9 ALCOLEA     ALCOLEA    - change here
10 Alcóntar    ALCOLEA

Data:

data = structure(list(RegionName = c("ANDALUCÍA", "Almería", "Abla", 
"Abrucena", "Adra", "ALBÁNCHEZ", "Alboloduy", "Albox", "ALCOLEA", 
"Alcóntar")), row.names = c(NA, -10L), class = c("tbl_df", "tbl", 
"data.frame"))

Solution:

You can group the rows by a running count of the names that equal their own uppercase form (RegionName == toupper(RegionName)), so that each all-caps row starts a new group. Then set REGIONNAME within each group to the first RegionName, which is the all-caps one.
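To see how the grouping key is built, here is a minimal base R sketch on a plain character vector. The names `region`, `is_upper`, and `grp` are chosen for illustration only.

```r
# Same names as in the question, as a plain character vector
region <- c("ANDALUCÍA", "Almería", "Abla", "Abrucena", "Adra",
            "ALBÁNCHEZ", "Alboloduy", "Albox", "ALCOLEA", "Alcóntar")

# TRUE where the name is unchanged by toupper(), i.e. already all caps
is_upper <- region == toupper(region)

# cumsum() counts TRUE as 1, so the counter steps up at each all-caps row
grp <- cumsum(is_upper)
grp
```

Rows 1 to 5 get group 1, rows 6 to 8 get group 2, and rows 9 to 10 get group 3. One caveat: an NA in RegionName would make every later value of cumsum() NA, so missing names would need handling first. The dplyr pipeline below uses the same key inside group_by().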

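library(tidyverse)

df %>%
  # start a new group at every row whose name is already all caps
  group_by(grp = cumsum(RegionName == toupper(RegionName))) %>%
  # the first row of each group is the all-caps name
  mutate(REGIONNAME = first(RegionName))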

Output

   RegionName   grp REGIONNAME
   <chr>      <int> <chr>     
 1 ANDALUCÍA      1 ANDALUCÍA 
 2 Almería        1 ANDALUCÍA 
 3 Abla           1 ANDALUCÍA 
 4 Abrucena       1 ANDALUCÍA 
 5 Adra           1 ANDALUCÍA 
 6 ALBÁNCHEZ      2 ALBÁNCHEZ 
 7 Alboloduy      2 ALBÁNCHEZ 
 8 Albox          2 ALBÁNCHEZ 
 9 ALCOLEA        3 ALCOLEA   
10 Alcóntar       3 ALCOLEA 
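
Since the question mentions filling down, an alternative sketch with tidyr::fill() may read closer to that intent. It keeps the name only on all-caps rows, leaves NA elsewhere, and then carries the last non-missing value downward. This is one possible variant rather than the method above, and `result` is an illustrative name.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  RegionName = c("ANDALUCÍA", "Almería", "Abla", "Abrucena", "Adra",
                 "ALBÁNCHEZ", "Alboloduy", "Albox", "ALCOLEA", "Alcóntar"),
  stringsAsFactors = FALSE
)

result <- df %>%
  # keep the name on all-caps rows, NA on every other row
  mutate(REGIONNAME = if_else(RegionName == toupper(RegionName),
                              RegionName, NA_character_)) %>%
  # carry the last non-NA value down to the following rows
  fill(REGIONNAME, .direction = "down")

result
```

This version adds no helper grp column and leaves the data ungrouped, which can be convenient if further steps should not operate per group.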

Data

df <- structure(list(RegionName = c("ANDALUCÍA", "Almería", "Abla", 
"Abrucena", "Adra", "ALBÁNCHEZ", "Alboloduy", "Albox", "ALCOLEA", 
"Alcóntar")), class = "data.frame", row.names = c("1", "2", 
"3", "4", "5", "6", "7", "8", "9", "10"))