Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Filling rows of multiple columns based on multiple conditions

I have example data as follows:

dat <- structure(list(
zipcode = c(1001, 1002, 1003, 1004, 1101, 1102, 1103, 1104, 1201, 1202, 1203, 1302), 
areacode = c(4, 4, NA, 4, 4, 4, NA, 1, 4, 4, NA, 4), 
type = structure(c(1L, 1L, NA, 1L, 2L, 2L, NA, 1L, 1L, 1L, NA, 1L), 
.Label = c("clay", "sand"), class = "factor"), 
region = c(3, 3, NA, 3, 3, 3, NA, 3, 3, 3, NA, 3), 
do_not_fill = c(1, NA, NA, 1, 1, NA, NA, 1, NA, NA, NA, 1)), 
class = c("data.table", "data.frame"), row.names = c(NA, -4L))

    zipcode areacode type region do_not_fill
 1:    1001        4 clay      3           1
 2:    1002        4 clay      3          NA
 3:    1003       NA <NA>     NA          NA
 4:    1004        4 clay      3           1
 5:    1101        4 sand      3           1
 6:    1102        4 sand      3          NA
 7:    1103       NA <NA>     NA          NA
 8:    1104        1 clay      3           1
 9:    1201        4 clay      3          NA
10:    1202        4 clay      3          NA
11:    1203       NA <NA>     NA          NA
12:    1302        4 clay      3           1

I want to fill ONLY the columns areacode, type and region based on two conditions.

  1. The areacode has to be the same before and after the NA.
  2. The first two digits of the zipcode have to be the same before and after the NA.

Based on this solution, and this solution, I attempted following (however data.table solutions are welcomed and even preferred):

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

library(dplyr)
dat |> 
  mutate(type = as.character(type)) |> 
  mutate(across(1:4,
                ~ ifelse(is.na(.) & lag(areacode) == lead(areacode) & 
                         lag(as.numeric(substr(zipcode, 1, 2))) == lead(as.numeric(substr(zipcode, 1, 2))),
                         lag(.),
                         .)))

But somewhere I am doing something wrong, because I get:

Error:
! `n` and `row.names` must be consistent.
Run `rlang::last_error()` to see where the error occurred.

Desired output:

    zipcode areacode type region do_not_fill
 1:    1001        4 clay      3           1
 2:    1002        4 clay      3          NA
 3:    1003        4 clay      3          NA
 4:    1004        4 clay      3           1
 5:    1101        4 sand      3           1
 6:    1102        4 sand      3          NA
 7:    1103       NA <NA>     NA          NA
 8:    1104        1 clay      3           1
 9:    1201        4 clay      3          NA
10:    1202        4 clay      3          NA
11:    1203       NA <NA>     NA          NA
12:    1302        4 clay      3           1

EDIT

as_tibble(dat) |>
  mutate(type = as.character(areacode)) |> 
  mutate(across(1:4,
                ~ ifelse(is.na(.) & lag(areacode) == lead(areacode) & 
                           lag(as.numeric(substr(zipcode, 1, 2))) == lead(as.numeric(substr(zipcode, 1, 2))),
                         lag(.),
                         .)))


# A tibble: 12 x 5
   zipcode areacode type  region do_not_fill
     <dbl>    <dbl> <chr>  <dbl>       <dbl>
 1    1001        4 4          3           1
 2    1002        4 4          3          NA
 3    1003        4 4          3          NA
 4    1004        4 4          3           1
 5    1101        4 4          3           1
 6    1102        4 4          3          NA
 7    1103       NA NA        NA          NA
 8    1104        1 1          3           1
 9    1201        4 4          3          NA
10    1202        4 4          3          NA
11    1203       NA NA        NA          NA
12    1302        4 4          3           1

>Solution :

You need to convert it to a tibble first. I think this is because data.table has extra attributes

Have a look at the rownames,

rownames(as_tibble(dat))
 [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12"
rownames(dat)
 [1] "1" "2" "3" "4"

as_tibble(dat) |>
  mutate(type = as.character(type)) |> 
  mutate(across(1:4,
                ~ ifelse(is.na(.) & lag(areacode) == lead(areacode) & 
                           lag(as.numeric(substr(zipcode, 1, 2))) == lead(as.numeric(substr(zipcode, 1, 2))),
                         lag(.),
                         .)))

# A tibble: 12 x 5
   zipcode areacode type  region do_not_fill
     <dbl>    <dbl> <chr>  <dbl>       <dbl>
 1    1001        4 clay       3           1
 2    1002        4 clay       3          NA
 3    1003        4 clay       3          NA
 4    1004        4 clay       3           1
 5    1101        4 sand       3           1
 6    1102        4 sand       3          NA
 7    1103       NA NA        NA          NA
 8    1104        1 clay       3           1
 9    1201        4 clay       3          NA
10    1202        4 clay       3          NA
11    1203       NA NA        NA          NA
12    1302        4 clay       3           1
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading