Proportion calculation based on time

I have a dataset of measurements taken at several points in time (columns t1 to t7). I would like to calculate how often a measurement in one time period is followed by the same measurement in the next period, that is, across the rows, the percentage whose value did not change from one period to the next. How can I do this?

Sample data:

timedata <- structure(list(t1 = c(1, 2, 1), t2 = c(1, 1, 1), t3 = c(1, 3, 
4), t4 = c(2, 2, 2), t5 = c(3, 3, 3), t6 = c(3, 3, 3), t7 = c(1, 
1, 1)), row.names = c(NA, -3L), spec = structure(list(cols = list(
    t1 = structure(list(), class = c("collector_double", "collector"
    )), t2 = structure(list(), class = c("collector_double", 
    "collector")), t3 = structure(list(), class = c("collector_double", 
    "collector")), t4 = structure(list(), class = c("collector_double", 
    "collector")), t5 = structure(list(), class = c("collector_double", 
    "collector")), t6 = structure(list(), class = c("collector_double", 
    "collector")), t7 = structure(list(), class = c("collector_double", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), delim = ","), class = "col_spec"), class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"))
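As a quick hand check of the quantity being asked for, the R snippet below compares periods t1 and t2 of the sample data directly. It rebuilds the sample as a plain `data.frame` with the same values (an assumption made so the snippet runs on its own; the dput above is a readr tibble) and covers only one pair of periods.

```r
# Same values as the sample data above, rebuilt as a plain data.frame
timedata <- data.frame(
  t1 = c(1, 2, 1), t2 = c(1, 1, 1), t3 = c(1, 3, 4), t4 = c(2, 2, 2),
  t5 = c(3, 3, 3), t6 = c(3, 3, 3), t7 = c(1, 1, 1)
)

# TRUE where a row's value in t2 equals its value in t1
unchanged <- timedata$t2 == timedata$t1
unchanged               # TRUE FALSE TRUE
sum(unchanged)          # 2 of the 3 rows are unchanged
mean(unchanged) * 100   # 66.66667 percent
```
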

Solution:

To compare each time period with the previous one, it's probably easiest to reshape the data to long form and compare each value with its lag (the value from the previous period in the same row):

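library(dplyr)
library(tidyr)

timedata |>
    # give each row an id so its values can be followed through time
    mutate(id = row_number()) |>
    # one row per (id, time) pair; the measurement goes into `value`
    pivot_longer(
        -id,
        names_to = "time",
        values_to = "value"
    ) |>
    # within each id, compare every value with the previous period's value
    # (relies on the columns t1, t2, ... already being in chronological order)
    group_by(id) |>
    mutate(nochange = value == lag(value)) |>
    # count and percentage of rows with no change, per time period;
    # t1 has no previous period, so nochange is NA there and t1 always shows 0
    group_by(time) |>
    summarise(
        num_repeated = sum(nochange, na.rm = TRUE),
        percent_repeated = num_repeated / n() * 100
    )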

# A tibble: 7 x 3
#   time  num_repeated percent_repeated
#   <chr>        <int>            <dbl>
# 1 t1               0              0
# 2 t2               2             66.7
# 3 t3               1             33.3
# 4 t4               0              0
# 5 t5               0              0
# 6 t6               3            100
# 7 t7               0              0
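A base-R alternative can be sketched as follows, assuming the columns are already in chronological order and contain no missing values: compare the matrix with itself shifted by one column, then average the TRUE/FALSE results. `colMeans()` gives the per-period percentages (leaving out t1, which has no previous period, instead of reporting it as 0), and `rowMeans()` gives a per-row rate, which is one possible reading of "how often each row has the same measurement". The names `m`, `same_as_prev`, `per_period` and `per_row` are illustrative only.

```r
# Same values as the sample data, as a plain data.frame
timedata <- data.frame(
  t1 = c(1, 2, 1), t2 = c(1, 1, 1), t3 = c(1, 3, 4), t4 = c(2, 2, 2),
  t5 = c(3, 3, 3), t6 = c(3, 3, 3), t7 = c(1, 1, 1)
)

# Assumes columns are in chronological order (t1, t2, ..., t7)
m <- as.matrix(timedata)

# Element [i, j] is TRUE when row i has the same value in period j + 1 as in period j
same_as_prev <- m[, -1, drop = FALSE] == m[, -ncol(m), drop = FALSE]
colnames(same_as_prev) <- colnames(m)[-1]  # label each comparison by the later period

# Per period: percentage of rows unchanged from the previous period
# t2 = 66.7, t3 = 33.3, t4 = 0, t5 = 0, t6 = 100, t7 = 0
per_period <- colMeans(same_as_prev) * 100
round(per_period, 1)

# Per row: percentage of period-to-period transitions with no change
# rows 1 to 3: 50, 16.7, 33.3
per_row <- rowMeans(same_as_prev) * 100
round(per_row, 1)
```

The per-period values match the dplyr result above for t2 to t7. If the data can contain `NA`, passing `na.rm = TRUE` to `colMeans()` and `rowMeans()` averages over the available comparisons only.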
