Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Splitting merged data into columns

I have a dataframe containing a column of the following format, where I have Vars separated by commas ordered by count, with their respective counts after the colon

df1 <- data.frame(A = c('Var1:1', 'Var1:2,Var2:1', 'Var3:2,Var2:1'))

A
Var1:1
Var1:2,Var2:1
Var2:1,Var3:1

I’ve separated these out with:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df1_split <- data.frame(dplyr::str_split(df1, A, pattern = ":|,", simplify = TRUE))

Which yields:

X1 X2 X3 X4
Var1 1
Var1 2 Var2 1
Var3 2 Var2 1

As you can see, sometimes variables are in column X1, but sometimes they appear in column X3. So Vars are always in odd columns, and their counts from the : are always in the proceeding even column#

My end goal is to have a dataframe with columns in the format:

Var1 Var 2 Var 3
1
2 1
1 2

I’ve tried using a function, for loop and case when for this, as I have 19 unique variable names in my original dataframe but they don’t always appear in the same order.

fVARCOUNT <- function(x, y){`
  # x is df_split, y are the unique Var nanmes e.g. Var 1, Var 2
  df_mutated <- df_split
  # Loop over each unique variable
  for(i in y){
    df_mutated <- df_mutated %>%
    # Check if that variable is in one of X1, X3, X5 and get the value from the proceeding column if it is
    dplyr::mutate('{{i}}' := dplyr::case_when(X1 == i ~ X2,
                                X3 == i ~ X4))
  }
}
  

However, the result I’m getting is as follows:

Var1 Var2 Var 3
1
2
2

It’s as though case_when is not checking beyond X1.

>Solution :

Data

df1 <- data.frame(A = c('Var1:1', 'Var1:2,Var2:1', 'Var3:2,Var2:1'))

Code

library(tidyr)
library(dplyr)

df1 %>% 
  mutate(id = row_number()) %>% 
  separate_rows(A,sep = ",") %>% 
  separate(col = A,into = c("name","value")) %>% 
  pivot_wider(names_from = name,values_from = value)

Output

# A tibble: 3 x 4
     id Var1  Var2  Var3 
  <int> <chr> <chr> <chr>
1     1 1     NA    NA   
2     2 2     1     NA   
3     3 NA    1     2  
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading