Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Sum across columns if date is not missing and if year of date is after a certain number

I want to create a new variable called bankruptcy_c, which will equal to the number of times the year of bankruptcy1, bankruptcy2, ..., bankruptcy10 is between 2007 and 2019, inclusive. At first, I created a variable that counted the number of times the columns had a date using the following code:

vars <- df %>%
  mutate(bankruptcy_c = rowSums(!is.na(across(contains("bankruptcy")))))

However, now I’m looking to only count each instance if the year is between 2007 and 2019. Here is some sample data:

structure(list(rowid = c(16L, 28L, 29L, 37L, 80778L), bankruptcy1 = structure(c(9282, 
11438, NA, 8664, 14048), class = "Date"), bankruptcy2 = structure(c(12892, 
NA, NA, 14809, 15012), class = "Date"), bankruptcy3 = structure(c(14713, 
NA, NA, NA, NA), class = "Date"), bankruptcy4 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), bankruptcy5 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), bankruptcy6 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), bankruptcy7 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), bankruptcy8 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), bankruptcy9 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), bankruptcy10 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), class = "Date")), row.names = c(NA, 
-5L), class = c("tbl_df", "tbl", "data.frame"))

Using the sample data above, bankruptcy_c would equal to 1 for rowid = 16, 0 for rowid = 28 (because the dates are out of range), 0 for rowid = 29 (since all of the 10 instances are missing), 1 for rowid = 37, and 2 for rowid = 80778. Any help would be greatly appreciated!

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

One approach would be to reshape the data longer, filter, and count.

library(tidyverse)
df %>%
  pivot_longer(-rowid) %>%
  filter(year(as.Date(value)) %in% 2007:2019) %>%
  count(rowid) %>%
  complete(rowid = df$rowid, fill = list(n=0)) # EDIT to list all rowids


# alternate filter syntax if you care about specific dates:
filter(value >= as.Date("2007-01-01"), value < as.Date("2020-01-01")) %>% 

Result

# A tibble: 5 x 2
  rowid     n
  <int> <dbl>
1    16     1
2    28     0
3    29     0
4    37     1
5 80778     2
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading