Sum duplicate rows that are grouped and combine their IDs in R

I have seal data grouped by the columns seal_ID, dive number, and dive phase, which I summarize over 20-second intervals from my original dataframe. Doing this creates duplicate 20-second time intervals whenever two different dive phases overlap within the same window. I'd like to sum the values of beats20_max in those duplicate rows and make their dive phase the combination of the two (e.g., A and B would become AB). Hopefully this will make sense after looking at my data below.

seal_ID diveNum dive_phase datetime            HR_mean HR_max beats20_mean beats20_max
   <chr>     <dbl> <chr>      <dttm>                <dbl>  <dbl>        <dbl>       <int>
 8 Baikal       19 D          2019-04-02 14:43:00    38.6   44.8          6.5          12
 9 Baikal       19 D          2019-04-02 14:43:20    42.2   48            7.5          14
10 Baikal       19 D          2019-04-02 14:43:40    44.0   54.1          8            15
11 Baikal       19 D          2019-04-02 14:44:00    45.5   61.9          8            15
12 Baikal       19 D          2019-04-02 14:44:20    42.1   49.2          7.5          14
13 Baikal       19 D          2019-04-02 14:44:40    39.9   44.1          7            13
14 Baikal       19 D          2019-04-02 14:45:00    45.5   54.5          8            15
15 Baikal       19 D          2019-04-02 14:45:20    44.6   53.1          8            15
16 Baikal       19 D          2019-04-02 14:45:40    45.9   51.7          8            15
17 Baikal       19 B          2019-04-02 14:46:00    46.1   51.7          7.5          14
18 Baikal       19 D          2019-04-02 14:46:00    55.8   59.4          1.5           2
19 Baikal       19 B          2019-04-02 14:46:20    47.4   57.1          8            15
20 Baikal       19 B          2019-04-02 14:46:40    45.4   53.6          8            15

As you can see, rows 17 and 18 have duplicate times but different dive phases; I'd like to sum the beats20_max column and make their dive phase "DB". There are multiple instances of this throughout the dataframe, so if there's a way to fix it with aggregate or dplyr, that would be very helpful.
I should mention that when I do this aggregation or summarization, I'll still need to group by seal_ID and diveNum, because some seals' datetimes are the same. Thanks for any advice!

Ideal outcome:


seal_ID diveNum dive_phase datetime            beats20_max
   <chr>     <dbl> <chr>      <dttm>                <int>
 8 Baikal       19 D          2019-04-02 14:43:00    12
 9 Baikal       19 D          2019-04-02 14:43:20    14
10 Baikal       19 D          2019-04-02 14:43:40    15
11 Baikal       19 D          2019-04-02 14:44:00    15
12 Baikal       19 D          2019-04-02 14:44:20    14
13 Baikal       19 D          2019-04-02 14:44:40    13
14 Baikal       19 D          2019-04-02 14:45:00    15
15 Baikal       19 D          2019-04-02 14:45:20    15
16 Baikal       19 D          2019-04-02 14:45:40    15
17 Baikal       19 DB         2019-04-02 14:46:00    16  
18 Baikal       19 B          2019-04-02 14:46:20    15
19 Baikal       19 B          2019-04-02 14:46:40    15

Solution:

We could use group_by() with summarise() to paste (str_c()) the 'dive_phase' values together and sum 'beats20_max':

library(dplyr)
library(stringr)
df1 %>%
    group_by(seal_ID, diveNum, datetime) %>%
    summarise(dive_phase = str_c(dive_phase, collapse = ""), 
         beats20_max = sum(beats20_max, na.rm = TRUE), .groups = 'drop') %>%
    select(any_of(names(df1)))
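
Since the question also mentions aggregate, a base R sketch of the same idea is possible; it computes the two summaries separately and merges them, since aggregate() cannot easily apply different functions to a character and a numeric column in one call:

```r
# Base R alternative (a sketch, assuming the df1 defined below):
# collapse dive_phase per group, sum beats20_max per group, then merge
phases <- aggregate(dive_phase ~ seal_ID + diveNum + datetime, data = df1,
                    FUN = paste, collapse = "")
beats  <- aggregate(beats20_max ~ seal_ID + diveNum + datetime, data = df1,
                    FUN = sum)
merge(phases, beats, by = c("seal_ID", "diveNum", "datetime"))
```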

Output:

# A tibble: 12 × 5
   seal_ID diveNum dive_phase datetime            beats20_max
   <chr>     <int> <chr>      <chr>                     <int>
 1 Baikal       19 D          2019-04-02 14:43:00          12
 2 Baikal       19 D          2019-04-02 14:43:20          14
 3 Baikal       19 D          2019-04-02 14:43:40          15
 4 Baikal       19 D          2019-04-02 14:44:00          15
 5 Baikal       19 D          2019-04-02 14:44:20          14
 6 Baikal       19 D          2019-04-02 14:44:40          13
 7 Baikal       19 D          2019-04-02 14:45:00          15
 8 Baikal       19 D          2019-04-02 14:45:20          15
 9 Baikal       19 D          2019-04-02 14:45:40          15
10 Baikal       19 BD         2019-04-02 14:46:00          16
11 Baikal       19 B          2019-04-02 14:46:20          15
12 Baikal       19 B          2019-04-02 14:46:40          15
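
Note that str_c() concatenates the phases in row order, so the combined phase comes out as "BD" here rather than the "DB" shown in the ideal outcome. If the order matters, one option is to sort within each group before collapsing; for this particular data, sorting dive_phase in descending order puts "D" before "B":

```r
# Sketch: force "D" before "B" by sorting within groups before collapsing
df1 %>%
    group_by(seal_ID, diveNum, datetime) %>%
    arrange(desc(dive_phase), .by_group = TRUE) %>%
    summarise(dive_phase = str_c(dive_phase, collapse = ""),
              beats20_max = sum(beats20_max, na.rm = TRUE), .groups = 'drop')
```

For phase labels where alphabetical order doesn't give the desired sequence, converting dive_phase to a factor with explicitly ordered levels and arranging by that would generalize.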

data

df1 <- structure(list(seal_ID = c("Baikal", "Baikal", "Baikal", "Baikal", 
"Baikal", "Baikal", "Baikal", "Baikal", "Baikal", "Baikal", "Baikal", 
"Baikal", "Baikal"), diveNum = c(19L, 19L, 19L, 19L, 19L, 19L, 
19L, 19L, 19L, 19L, 19L, 19L, 19L), dive_phase = c("D", "D", 
"D", "D", "D", "D", "D", "D", "D", "B", "D", "B", "B"), 
datetime = c("2019-04-02 14:43:00", 
"2019-04-02 14:43:20", "2019-04-02 14:43:40", "2019-04-02 14:44:00", 
"2019-04-02 14:44:20", "2019-04-02 14:44:40", "2019-04-02 14:45:00", 
"2019-04-02 14:45:20", "2019-04-02 14:45:40", "2019-04-02 14:46:00", 
"2019-04-02 14:46:00", "2019-04-02 14:46:20", "2019-04-02 14:46:40"
), HR_mean = c(38.6, 42.2, 44, 45.5, 42.1, 39.9, 45.5, 44.6, 
45.9, 46.1, 55.8, 47.4, 45.4), HR_max = c(44.8, 48, 54.1, 61.9, 
49.2, 44.1, 54.5, 53.1, 51.7, 51.7, 59.4, 57.1, 53.6), beats20_mean = c(6.5, 
7.5, 8, 8, 7.5, 7, 8, 8, 8, 7.5, 1.5, 8, 8), beats20_max = c(12L, 
14L, 15L, 15L, 14L, 13L, 15L, 15L, 15L, 14L, 2L, 15L, 15L)),
 class = "data.frame", row.names = c("8", 
"9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", 
"20"))