Home Keep only the hours for which the value have not changed within the hour R

Questions

Keep only the hours for which the value have not changed within the hour R

May 9, 2023

I have a timeseries dataset with ‘n’ number of columns. In the dataset, I would like to filter and remove the hours for which the value in a column changed within the hour. In other words, I want to keep the hours that has unchanged value.

Some info about the data:

1-minute data with date in POSCIXT format (%Y-%m-%d %H:%M:%S)

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.
Visit Medevel

The data is recorded for every 5 minutes, so for the time, for which the data is not recorded is filled with NA

data1<- structure(list(PDT = structure(c(1668927600, 1668927660, 
   1668927720, 
  1668927780, 1668927840, 1668927900, 1668927960, 1668928020, 1668928080, 
  1668928140, 1668928200, 1668928260, 1668928320, 1668928380, 1668928440, 
  1668928500, 1668928560, 1668928620, 1668928680, 1668928740, 1668928800, 
  1668928860, 1668928920, 1668928980, 1668929040, 1668929100, 1668929160, 
  1668929220, 1668929280, 1668929340, 1668929400, 1668929460, 1668929520, 
  1668929580, 1668929640, 1668929700, 1668929760, 1668929820, 1668929880, 
  1668929940, 1668930000, 1668930060, 1668930120, 1668930180, 1668930240, 
  1668930300, 1668932640, 1668932700, 1668932760, 1668932820, 1668932880, 
  1668932940, 1668933000, 1668933060, 1668933120, 1668933180, 1668933240, 
  1668933300, 1668933360, 1668933720, 1668933780, 1668933840, 1668933900, 
  1668933960, 1668934020, 1668934080, 1668934140, 1668934200, 1668934260, 
  1668934320, 1668934380, 1668934440, 1668934500, 1668934560, 1668934620, 
  1668934680, 1668934740, 1668934800), class = c("POSIXct", "POSIXt"
  ), tzone = "UTC"), Date = structure(c(1668902400, 1668902400, 
  1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 
  1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 
  1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 
  1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 
  1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 
  1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 
  1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 
  1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 
  1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 
  1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 
  1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 
  1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 1668902400, 
  1668902400, 1668902400, 1668902400, 1668902400), class = c("POSIXct", 
  "POSIXt"), tzone = "UTC"), DayOfWeek = c("Sunday", "Sunday", 
  "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", 
  "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", 
  "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", 
  "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", 
  "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", 
  "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", 
  "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", 
  "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", 
  "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", 
  "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", 
  "Sunday", "Sunday", "Sunday", "Sunday", "Sunday", "Sunday"), 
      Month = c(11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 
      11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 
      11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 
      11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 
      11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 
      11, 11, 11, 11, 11, 11), Day = c(20, 20, 20, 20, 20, 20, 
      20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 
      20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 
      20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 
      20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 
      20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20), Hour_hr = c(7, 
      7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 
      7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 
      7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 
      8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 
      9), Minute = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 
      13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 
      28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 
      43, 44, 45, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 
      36, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 
      56, 57, 58, 59, 0), ColA = c(NA, NA, NA, 65.5, NA, NA, NA, 
      NA, 65.5, NA, NA, NA, NA, 65.5, NA, NA, NA, NA, 65.5, NA, 
      NA, NA, NA, 65.5, NA, NA, NA, NA, 65.5, NA, NA, NA, NA, 65.5, 
      NA, NA, NA, NA, 65.5, NA, NA, NA, NA, 65.5, NA, NA, NA, NA, 
      NA, NA, 65.5, NA, NA, NA, NA, 65.5, NA, NA, NA, NA, 65.5, 
      NA, NA, NA, NA, 65.5, NA, NA, NA, NA, 65.5, NA, NA, NA, NA, 
      65.7, NA, NA)), row.names = 3041:3118, class = "data.frame")

Expected output:

In the above example, I want to exclude hour 8 from my dataset, as the value in ColA is not constant.

I have a feeling that group_by() and filter() from dplyr might do the job, but I am not sure about the function to find the unchanged values within an hour.

Any help regarding this is much appreciated. Thanks.

>Solution :

This does it:

data1 %>% group_by(Hour_hr)  %>% filter(n_distinct(ColA) < 3)

Checking results:

count(data1, Hour_hr)

  Hour_hr     n
    <dbl> <int>
1       7    46
2       9     1

This will keep colA if there’s only one numerical value or no numerical values (NA), keeping hour 7 and 9.

Equivalently you could do:

data1 %>% group_by(Hour_hr)  %>% filter(n_distinct(ColA, na.rm = T) < 2)

identify

byMR

Published May 09, 2023

Add a comment

TinyMCE override font style menu items

byMR

May 9, 2023

Questions

How to get values in a Pandas DataFrame using .loc with a MultiIindex?

byMR

May 9, 2023

Questions

Where Are JanusGraph's Vertexes In Cassandra?

byMR

May 9, 2023

Questions

Pinescript Beginner Strategy not generating orders?

byMR

May 9, 2023

Questions

abline not showing in ggplot

byMR

May 9, 2023

Questions

Have Sysbench memory rndrd/rndwr tests eliminate the affect of cache?

byMR

May 9, 2023

Keep only the hours for which the value have not changed within the hour R

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Like this:

Leave a ReplyCancel reply

Read more

TinyMCE override font style menu items

How to get values in a Pandas DataFrame using .loc with a MultiIindex?

Where Are JanusGraph's Vertexes In Cassandra?

Pinescript Beginner Strategy not generating orders?

abline not showing in ggplot

Have Sysbench memory rndrd/rndwr tests eliminate the affect of cache?

Keep Up to Date with the Most Important News

Keep only the hours for which the value have not changed within the hour R

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Share this:

Like this:

Leave a ReplyCancel reply

Keep Up to Date with the Most Important News

Read more

TinyMCE override font style menu items

How to get values in a Pandas DataFrame using .loc with a MultiIindex?

Where Are JanusGraph's Vertexes In Cassandra?

Pinescript Beginner Strategy not generating orders?

abline not showing in ggplot

Have Sysbench memory rndrd/rndwr tests eliminate the affect of cache?

Discover more from Dev solutions