Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Take string from variable and split the variable up in two variables r

I have a dataset in which my unit of analysis and variable are together in the same column. I would like to split this up into two variables, but not sure how to.

Say I have the following data:

de <- data.frame(year = c(2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010,
                          2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012),
                 company = c("Company A - Variable 1", "Company A - Variable 2", "Company A - Variable 3",
                             "CompanyB - Variable 1", "CompanyB - Variable 2", "CompanyB - Variable 3", 
                             "Compan- C - Variable 1", "Compan- C - Variable 2", "Compan- C - Variable 3",
                             "Company A - Variable 1", "Company A - Variable 2", "Company A - Variable 3",
                             "CompanyB - Variable 1", "CompanyB - Variable 2", "CompanyB - Variable 3", 
                             "Compan- C - Variable 1", "Compan- C - Variable 2", "Compan- C - Variable 3"),
                 score = c(1,3,5,7,9,11,13,15,17,2,4,6,8,10,12,14,16,18))

Which looks like:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

year company                score
<dbl> <chr>                  <dbl>
2010 Company A - Variable 1     1
2010 Company A - Variable 2     3
2010 Company A - Variable 3     5
2010 CompanyB - Variable 1      7
2010 CompanyB - Variable 2      9
2010 CompanyB - Variable 3     11
2010 Compan- C - Variable 1    13
2010 Compan- C - Variable 2    15
2010 Compan- C - Variable 3    17
2012 Company A - Variable 1     2
2012 Company A - Variable 2     4
2012 Company A - Variable 3     6
2012 CompanyB - Variable 1      8
2012 CompanyB - Variable 2     10
2012 CompanyB - Variable 3     12
2012 Compan- C - Variable 1    14
2012 Compan- C - Variable 2    16
2012 Compan- C - Variable 3    18

The company variable is a combination of company names (in all different forms and shapes), always followed by " – " and then the variable name. I want to wrangle the data so that I arrive at the following structure, splitting up the company and the variable into new columns:


year   company   variable score
2010 Company A Variable 1     1
2010 Company A Variable 2     3
2010 Company A Variable 3     5
2010  CompanyB Variable 1     7
2010  CompanyB Variable 2     9
2010  CompanyB Variable 3    11
2010 Compan- C Variable 1    13
2010 Compan- C Variable 2    15
2010 Compan- C Variable 3    17
2012 Company A Variable 1     2
2012 Company A Variable 2     4
2012 Company A Variable 3     6
2012  CompanyB Variable 1     8
2012  CompanyB Variable 2    10
2012  CompanyB Variable 3    12
2012 Compan- C Variable 1    14
2012 Compan- C Variable 2    16
2012 Compan- C Variable 3    18

I guess then use something to extract the text before " – " (the company name) and after " – " (the variable name). Then sort of split this up into two different variables: company and variable. Any elegant solution would be much appreciated. Thanks

>Solution :

You can use tidyr::separate with a lookahead regex to split at the blank space before the string "Variable".

library(tidyr)

separate(de, company, sep = "(?=Variable)", into = c('Company', 'Variable'))
#>    year      Company   Variable score
#> 1  2010 Company A -  Variable 1     1
#> 2  2010 Company A -  Variable 2     3
#> 3  2010 Company A -  Variable 3     5
#> 4  2010  CompanyB -  Variable 1     7
#> 5  2010  CompanyB -  Variable 2     9
#> 6  2010  CompanyB -  Variable 3    11
#> 7  2010 Compan- C -  Variable 1    13
#> 8  2010 Compan- C -  Variable 2    15
#> 9  2010 Compan- C -  Variable 3    17
#> 10 2012 Company A -  Variable 1     2
#> 11 2012 Company A -  Variable 2     4
#> 12 2012 Company A -  Variable 3     6
#> 13 2012  CompanyB -  Variable 1     8
#> 14 2012  CompanyB -  Variable 2    10
#> 15 2012  CompanyB -  Variable 3    12
#> 16 2012 Compan- C -  Variable 1    14
#> 17 2012 Compan- C -  Variable 2    16
#> 18 2012 Compan- C -  Variable 3    18
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading