Assign group identifiers to groups of rows falling between rows containing a string in R

I’ve been given an Excel file in which the end of each group of data is marked by a row that is blank except for one cell containing a string such as "Person 1", "Person 2", "Person 3", and so on. The data belonging to Person 1 are in the rows preceding the row containing "Person 1"; the data belonging to Person 2 are in the rows between the "Person 1" row and the "Person 2" row. This pattern continues to the end of the file, where the last row contains a cell with "Person 100". To make things even more interesting, the "Person [n]" string is not always in the same column, and the number of rows per person varies. See the toy example below.

df_1 <- data.frame(
  iv1 = c(rbinom(3, 1, .4), NA, rbinom(4, 1, .4), NA, rbinom(2, 1, .4), NA),
  iv2 = c(rbinom(3, 1, .4), NA, rbinom(4, 1, .4), NA, rbinom(2, 1, .4), NA),
  iv3 = c(rbinom(3, 1, .4), "Person 1", rbinom(4, 1, .4), NA, rbinom(2, 1, .4), "Person 3"),
  dv1 = c(rbinom(3, 1, .4), NA, rbinom(4, 1, .4), "Person 2", rbinom(2, 1, .4), NA),
  dv2 = c(rbinom(3, 1, .4), NA, rbinom(4, 1, .4), NA, rbinom(2, 1, .4), NA),
  dv3 = c(rbinom(3, 1, .4), NA, rbinom(4, 1, .4), NA, rbinom(2, 1, .4), NA)
)

This yields a data frame like the one below (the exact values vary between runs, since no seed is set):

   iv1 iv2      iv3      dv1 dv2 dv3
1    1   1        0        1   1   0
2    0   0        1        0   0   0
3    1   0        0        1   0   1
4   NA  NA Person 1     <NA>  NA  NA
5    1   1        0        0   1   1
6    1   0        0        0   0   0
7    0   0        0        1   0   0
8    1   0        0        1   1   1
9   NA  NA     <NA> Person 2  NA  NA
10   0   0        0        1   0   0
11   0   1        0        0   0   1
12  NA  NA Person 3     <NA>  NA  NA

What I would like to do is create a new column ("Person_ID") that identifies the data belonging to each person, so Person_ID would equal 1 for rows belonging to Person 1, Person_ID would equal 2 for rows belonging to Person 2, and so on, as in the data frame below.


  iv1 iv2 iv3 dv1 dv2 dv3 Person_ID
1   1   1   0   1   1   0         1
2   0   0   1   0   0   0         1
3   1   0   0   1   0   1         1
4   1   1   0   0   1   1         2
5   1   0   0   0   0   0         2
6   0   0   0   1   0   0         2
7   1   0   0   1   1   1         2
8   0   0   0   1   0   0         3
9   0   1   0   0   0   1         3

I would love a dplyr-based solution, but of course, I’m open to whatever works. Thanks!

> Solution:

We could do it this way. (Note that the values in iv1:dv3 below do not match the question's output because no seed was set.)

The first solution relies on the NA rows as group delimiters, which could conflict with genuine NA values elsewhere in the data. The second solution does not depend on NAs:

library(dplyr)

df_1 %>% 
  mutate(Person_ID=cumsum(is.na(iv1))+1) %>% 
  na.omit()

   iv1   iv2 iv3   dv1     dv2   dv3 Person_ID
  <int> <int> <chr> <chr> <int> <int>     <dbl>
1     0     0 0     0         0     0         1
2     1     1 1     0         1     0         1
3     1     0 0     0         0     0         1
4     1     1 0     0         0     1         2
5     1     1 0     0         0     1         2
6     1     0 0     0         1     1         2
7     0     0 1     1         1     0         2
8     0     0 1     0         0     0         3
9     1     1 0     1         0     0         3
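The counting trick can be seen on a bare vector. This sketch assumes, as the solution does, that the delimiter rows are the only places where iv1 is NA:

```r
# Toy vector mimicking iv1: NA marks the end of each person's block
iv1 <- c(1, 0, 1, NA, 1, 1, 0, 1, NA, 0, 0, NA)

# cumsum(is.na(...)) increments at every delimiter row, so adding 1
# assigns each data row the id of the person block it belongs to
ids <- cumsum(is.na(iv1)) + 1
ids
# [1] 1 1 1 2 2 2 2 2 3 3 3 4

# The delimiter rows themselves pick up the *next* id, but they are
# dropped afterwards by na.omit(); keeping only the data rows gives:
ids[!is.na(iv1)]
# [1] 1 1 1 2 2 2 2 3 3
```

If the real data contain genuine NAs in iv1, this counter would over-increment, which is exactly the caveat noted above.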

Another way could be:

library(tidyverse)

df_1 %>% 
  mutate(Person_ID = coalesce(iv3, dv1),
         Person_ID = ifelse(str_detect(Person_ID, "Person"), parse_number(Person_ID), NA)) %>% 
  fill(Person_ID, .direction = "up") %>% 
  na.omit()
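The two moving parts of this version, extracting the number from the marker string and filling ids upward, can be sketched without tidyverse as well. `na_fill_up` below is a hypothetical base-R stand-in for `fill(..., .direction = "up")`, not part of the answer above:

```r
# Extract the numeric part of a marker string (what parse_number() does here)
as.numeric(gsub("\\D", "", "Person 2"))
# [1] 2

# Upward fill: each NA takes the first non-NA value that appears below it,
# mirroring fill(Person_ID, .direction = "up")
na_fill_up <- function(v) {
  for (i in rev(seq_along(v))[-1]) {
    if (is.na(v[i])) v[i] <- v[i + 1]
  }
  v
}

na_fill_up(c(NA, NA, 1, NA, NA, 2))
# [1] 1 1 1 2 2 2
```

Because the "Person n" marker sits *below* its group, filling upward propagates each id to the rows that precede the marker.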