Extract uppercase rows and fill down until next uppercase row

April 6, 2022

I have some data which looks like:

   RegionName
   <chr>     
 1 ANDALUCÍA 
 2 Almería   
 3 Abla      
 4 Abrucena  
 5 Adra      
 6 ALBÁNCHEZ 
 7 Alboloduy 
 8 Albox     
 9 ALCOLEA   
10 Alcóntar

Where some of the columns are uppercase. I want to extract the uppercase columns into a new column and fill(down) until the next uppercase column.

Expected output:

   RegionName REGIONNAME
   <chr>        <chr>
 1 ANDALUCÍA   ANDALUCÍA   -first result
 2 Almería     ANDALUCÍA
 3 Abla        ANDALUCÍA
 4 Abrucena    ANDALUCÍA
 5 Adra        ANDALUCÍA
 6 ALBÁNCHEZ   ALBÁNCHEZ  - change here
 7 Alboloduy   ALBÁNCHEZ
 8 Albox       ALBÁNCHEZ
 9 ALCOLEA     ALCOLEA    - change here
10 Alcóntar    ALCOLEA

Data:

data = structure(list(RegionName = c("ANDALUCÍA", "Almería", "Abla", 
"Abrucena", "Adra", "ALBÁNCHEZ", "Alboloduy", "Albox", "ALCOLEA", 
"Alcóntar")), row.names = c(NA, -10L), class = c("tbl_df", "tbl", 
"data.frame"))

>Solution :

You can group the regions together based on if their name is == to their name in all upper case. Then set all names within the group to the first RegionName which is in all caps.

library(tidyverse) 

df %>%
  group_by(grp = cumsum(RegionName == toupper(RegionName))) %>%
  mutate(REGIONNAME = first(RegionName))

Output

   RegionName   grp REGIONNAME
   <chr>      <int> <chr>     
 1 ANDALUCÍA      1 ANDALUCÍA 
 2 Almería        1 ANDALUCÍA 
 3 Abla           1 ANDALUCÍA 
 4 Abrucena       1 ANDALUCÍA 
 5 Adra           1 ANDALUCÍA 
 6 ALBÁNCHEZ      2 ALBÁNCHEZ 
 7 Alboloduy      2 ALBÁNCHEZ 
 8 Albox          2 ALBÁNCHEZ 
 9 ALCOLEA        3 ALCOLEA   
10 Alcóntar       3 ALCOLEA

Data

df <- structure(list(RegionName = c("ANDALUCÍA", "Almería", "Abla", 
"Abrucena", "Adra", "ALBÁNCHEZ", "Alboloduy", "Albox", "ALCOLEA", 
"Alcóntar")), class = "data.frame", row.names = c("1", "2", 
"3", "4", "5", "6", "7", "8", "9", "10"))