Keep separator using Regex in separate_rows()

How can I keep the closing parenthesis in Q11 in the data below? This column comes from a Google Form in which people could choose as many Brazilian regions as they wished, and now I have to split it into one region per row. Google separates the answers with commas, so every region is separated by "),". How can I split the rows using ")," as a separator but keep the closing ")"?

  • Code:
df %>% select(Q1, Q11) %>%
  pivot_longer(c(Q11)) %>%
  separate_rows(value, sep = "\\),") %>%   ### NOT WORKING: the split consumes the closing ")"
  # group_by(Q2, ID) %>%
  mutate(row = row_number()) %>%
  pivot_wider() %>%
  select(-row) %>%
  mutate(Q11 = str_trim(Q11, side = "both"))

  • Desired output: one row per Brazilian region, like in the original Forms:

Data in the Forms looks like this:

## Sul (Rio Grande do Sul, Santa Catarina, Paraná), Sudeste (Espírito Santo, Minas Gerais, Rio de Janeiro e São Paulo), Nordeste (Alagoas, Bahia, Ceará, Maranhão, Piauí, Pernambuco, Paraíba, Rio Grande do Norte e Sergipe)
## Nordeste (Alagoas, Bahia, Ceará, Maranhão, Piauí, Pernambuco, Paraíba, Rio Grande do Norte e Sergipe)
## Sudeste (Espírito Santo, Minas Gerais, Rio de Janeiro e São Paulo)
## Centro-Oeste (Goiás, Mato Grosso, Mato Grosso do Sul e o Distrito Federal)
## Norte (Acre, Amazonas, Amapá, Pará, Rondônia, Roraima e Tocantins)

These links were a bit helpful: 1, 2, but I couldn't get my head around them. Thanks in advance.

  • data:
structure(list(Q11 = structure(c(6L, 6L, 6L, 6L, 6L, 6L, 6L, 
6L, 6L, 6L, 6L, 6L, 3L, 6L, 6L, 2L, 6L, 4L, 5L, 3L, 6L, 1L, 6L, 
6L, 7L), .Label = c("Sudeste (Espírito Santo, Minas Gerais, Rio de Janeiro e São Paulo), Centro-Oeste (Goiás, Mato Grosso, Mato Grosso do Sul e o Distrito Federal)", 
"Sudeste (Espírito Santo, Minas Gerais, Rio de Janeiro e São Paulo), Centro-Oeste (Goiás, Mato Grosso, Mato Grosso do Sul e o Distrito Federal), Nordeste (Alagoas, Bahia, Ceará, Maranhão, Piauí, Pernambuco, Paraíba, Rio Grande do Norte e Sergipe)", 
"Sudeste (Espírito Santo, Minas Gerais, Rio de Janeiro e São Paulo), Nordeste (Alagoas, Bahia, Ceará, Maranhão, Piauí, Pernambuco, Paraíba, Rio Grande do Norte e Sergipe)", 
"Sul (Rio Grande do Sul, Santa Catarina, Paraná), Sudeste (Espírito Santo, Minas Gerais, Rio de Janeiro e São Paulo)", 
"Sul (Rio Grande do Sul, Santa Catarina, Paraná), Sudeste (Espírito Santo, Minas Gerais, Rio de Janeiro e São Paulo), Centro-Oeste (Goiás, Mato Grosso, Mato Grosso do Sul e o Distrito Federal), Nordeste (Alagoas, Bahia, Ceará, Maranhão, Piauí, Pernambuco, Paraíba, Rio Grande do Norte e Sergipe)", 
"Sul (Rio Grande do Sul, Santa Catarina, Paraná), Sudeste (Espírito Santo, Minas Gerais, Rio de Janeiro e São Paulo), Centro-Oeste (Goiás, Mato Grosso, Mato Grosso do Sul e o Distrito Federal), Norte (Acre, Amazonas, Amapá, Pará, Rondônia, Roraima e Tocantins), Nordeste (Alagoas, Bahia, Ceará, Maranhão, Piauí, Pernambuco, Paraíba, Rio Grande do Norte e Sergipe)", 
"Sul (Rio Grande do Sul, Santa Catarina, Paraná), Sudeste (Espírito Santo, Minas Gerais, Rio de Janeiro e São Paulo), Nordeste (Alagoas, Bahia, Ceará, Maranhão, Piauí, Pernambuco, Paraíba, Rio Grande do Norte e Sergipe)"
), class = "factor")), row.names = c(NA, -25L), class = "data.frame")

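Solution:

Use a look-behind regex, i.e. split on a comma that is preceded by a closing parenthesis: (?<=\\)),
The look-behind only checks that a ) comes before the comma without consuming it, so the parenthesis stays in the output.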

df %>%
  # name the column to split; \\s* also drops the space that follows the comma
  separate_rows(Q11, sep = '(?<=\\)),\\s*')
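
To check the look-behind approach on something small, here is a minimal sketch. It assumes the tidyr package is installed, and the toy data frame (including the Q1 values "a" and "b") is made up for illustration, using only two of the answers from the question.

```r
library(tidyr)

# Toy data mimicking the Google Forms export; Q1 values are hypothetical
df <- data.frame(
  Q1  = c("a", "b"),
  Q11 = c(
    "Sul (Rio Grande do Sul, Santa Catarina, Paraná), Sudeste (Espírito Santo, Minas Gerais, Rio de Janeiro e São Paulo)",
    "Norte (Acre, Amazonas, Amapá, Pará, Rondônia, Roraima e Tocantins)"
  ),
  stringsAsFactors = FALSE
)

# Split only on commas that directly follow ")"; the look-behind leaves the
# ")" in place, and \\s* removes the space after the comma
out <- separate_rows(df, Q11, sep = "(?<=\\)),\\s*")

print(out$Q11)
```

The commas inside the parentheses follow a letter rather than a ")", so they are not treated as separators, and each region keeps its full list of states.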