Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Keep only the hours for which the value have not changed within the hour R

I have a timeseries dataset with ‘n’ number of columns. In the dataset, I would like to filter and remove the hours for which the value in a column changed within the hour. In other words, I want to keep the hours that has unchanged value.

Some info about the data:

  • 1-minute data with date in POSCIXT format (%Y-%m-%d %H:%M:%S)

    MEDevel.com: Open-source for Healthcare and Education

    Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

    Visit Medevel

  • The data is recorded for every 5 minutes, so for the time, for which the data is not recorded is filled with NA

    data1<- structure(list(PDT = structure(c(1668927600, 1668927660, 
       1668927720, 
      1668927780, 1668927840, 1668927900, 1668927960, 1668928020, 1668928080, 
      1668928140, 1668928200, 1668928260, 1668928320, 1668928380, 1668928440, 
      1668928500, 1668928560, 1668928620, 1668928680, 1668928740, 1668928800, 
      1668928860, 1668928920, 1668928980, 1668929040, 1668929100, 1668929160, 
      1668929220, 1668929280, 1668929340, 1668929400, 1668929460, 1668929520, 
      1668929580, 1668929640, 1668929700, 1668929760, 1668929820, 1668929880, 
      1668929940, 1668930000, 1668930060, 1668930120, 1668930180, 1668930240, 
      1668930300, 1668932640, 1668932700, 1668932760, 1668932820, 1668932880, 
      1668932940, 1668933000, 1668933060, 1668933120, 1668933180, 1668933240, 
      1668933300, 1668933360, 1668933720, 1668933780, 1668933840, 1668933900, 
      1668933960, 1668934020, 1668934080, 1668934140, 1668934200, 1668934260, 
      1668934320, 1668934380, 1668934440, 1668934500, 1668934560, 1668934620, 
      1668934680, 1668934740, 1668934800), class = c("POSIXct", "POSIXt"
      ), tzone = "UTC"), Date = structure(c(1668902400, 1668902400, 
      1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 
      1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 
      1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 
      1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 
      1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 
      1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 
      1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 
      1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 
      1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 
      1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 
      1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 
      1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 
      1668902400, 1668902400, 1668902400, 1668902400), class = c("POSIXct", 
      "POSIXt"), tzone = "UTC"), DayOfWeek = c("Sunday", "Sunday", 
      "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", 
      "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", 
      "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", 
      "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", 
      "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", 
      "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", 
      "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", 
      "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", 
      "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", 
      "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", 
      "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", "Sunday"), 
          Month = c(11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 
          11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 
          11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 
          11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 
          11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 
          11, 11, 11, 11, 11, 11), Day = c(20, 20, 20, 20, 20, 20, 
          20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 
          20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 
          20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 
          20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 
          20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20), Hour_hr = c(7, 
          7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 
          7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 
          7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 
          8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 
          9), Minute = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 
          13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 
          28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 
          43, 44, 45, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 
          36, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 
          56, 57, 58, 59, 0), ColA = c(NA, NA, NA, 65.5, NA, NA, NA, 
          NA, 65.5, NA, NA, NA, NA, 65.5, NA, NA, NA, NA, 65.5, NA, 
          NA, NA, NA, 65.5, NA, NA, NA, NA, 65.5, NA, NA, NA, NA, 65.5, 
          NA, NA, NA, NA, 65.5, NA, NA, NA, NA, 65.5, NA, NA, NA, NA, 
          NA, NA, 65.5, NA, NA, NA, NA, 65.5, NA, NA, NA, NA, 65.5, 
          NA, NA, NA, NA, 65.5, NA, NA, NA, NA, 65.5, NA, NA, NA, NA, 
          65.7, NA, NA)), row.names = 3041:3118, class = "data.frame")
    

Expected output:

In the above example, I want to exclude hour 8 from my dataset, as the value in ColA is not constant.

I have a feeling that group_by() and filter() from dplyr might do the job, but I am not sure about the function to find the unchanged values within an hour.

Any help regarding this is much appreciated. Thanks.

>Solution :

This does it:

data1 %>% group_by(Hour_hr)  %>% filter(n_distinct(ColA) < 3)

Checking results:

count(data1, Hour_hr)

  Hour_hr     n
    <dbl> <int>
1       7    46
2       9     1

This will keep colA if there’s only one numerical value or no numerical values (NA), keeping hour 7 and 9.

Equivalently you could do:

data1 %>% group_by(Hour_hr)  %>% filter(n_distinct(ColA, na.rm = T) < 2)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading