Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Match list elements based on attribute component

I have a data set that i split into two list int1 and int2.

library(lubridate)
library(tidyverse)
library(purrr)

date <- rep_len(seq(dmy("01-01-2011"), dmy("01-01-2013"), by = "days"), 300)
ID <- rep(c("A","B", "C"), 300)
df <- data.frame(date = date,
                 x = runif(length(date), min = 60000, max = 80000),
                 y = runif(length(date), min = 800000, max = 900000),
                 ID)

df$month <- month(df$date)
df$year <- year(df$date)

# Create first list
int1 <- df %>%
  mutate(new = floor_date(date, '10 day')) %>%
  mutate(new = if_else(day(new) == 31, new - days(10), new)) %>% 
  group_by(ID, new) %>%
  filter(month == "1") %>% 
  group_split()

# Create second list
int2 <- df %>%
  mutate(new = floor_date(date, '10 day')) %>%
  mutate(new = if_else(day(new) == 31, new - days(10), new)) %>% 
  group_by(ID, new) %>%
  filter(month == "2") %>% 
  group_split()


names(int1) <- sapply(int1, function(x) paste(x$ID[1],
                                                   x$year[1], sep = '_'))

names(int2) <- sapply(int2, function(x) paste(x$ID[1],
                                                    x$year[1], sep = '_'))

I then assign a attribute to each list (match). I created a function check to grab this attribute more easily. I removed some elements from one list for this exmaple.


int1 <- int1[-c(3,6)]

# Convenience function to grab the attributes for you
check <- function(x) {
  return(attr(x, "match"))
}

# Add an attribute to hold the attributes of each list element
attr(int1, "match") <- data.frame(id = sapply(int1, function(x) paste(x$ID[1])),
                                     interval_start_date = sapply(int1, function(x) paste(x$new[1]))
)

# Check the attributes
check(int1)

# Add an attribute "tab" to hold the attributes of each list element
attr(int2, "match") <- data.frame(id = sapply(int2, function(x) paste(x$ID[1])),
                                     interval_start_date = sapply(int2, function(x) paste(x$new[1]))
) 

# Check the attributes
check(int2)

I would like to remove elements that are not in another based on the attribute that I had added. Specifically I would like to remove any that don’t have the same interval_start_date and ID associated with it. For the interval_start_date, only the year and the day have to match, as the month will most likely differ between the two list. In this case, I would like int2 to match int1. Any thoughts on how I could do this? A base r method is preferred, if possible.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

# Expected results
expected_int2 <- list(int2[[1]], int2[[2]], int2[[3]], int2[[4]], int2[[5]], 
                      int2[[6]], int2[[7]])

names(expected_int2) <- sapply(int1, function(x) paste(x$ID[1],
                                                       x$year[1], sep = "_"))

>Solution :

We may create an index with %in% after pasteing the ‘id’ and the formatted ‘interval_start_date’ i.e. after removing the ‘month’ part

i1 <-  with(check(int2), paste(id, format(as.Date(interval_start_date), 
     "%Y-%d"))) %in%  with(check(int1), paste(id, 
      format(as.Date(interval_start_date), "%Y-%d")))
> which(i1)
[1] 1 2 4 5 7 8 9
out <- int2[i1]
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading