I have a dataset of measurements taken at different points in time. I would like to calculate the percentage of times a measurement in one time period is followed by the same measurement in the next time period, i.e. how often each row keeps the same measurement from one period to the next. How can I do this?
Sample data:
structure(list(t1 = c(1, 2, 1), t2 = c(1, 1, 1), t3 = c(1, 3,
4), t4 = c(2, 2, 2), t5 = c(3, 3, 3), t6 = c(3, 3, 3), t7 = c(1,
1, 1)), row.names = c(NA, -3L), spec = structure(list(cols = list(
t1 = structure(list(), class = c("collector_double", "collector"
)), t2 = structure(list(), class = c("collector_double",
"collector")), t3 = structure(list(), class = c("collector_double",
"collector")), t4 = structure(list(), class = c("collector_double",
"collector")), t5 = structure(list(), class = c("collector_double",
"collector")), t6 = structure(list(), class = c("collector_double",
"collector")), t7 = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), delim = ","), class = "col_spec"), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"))
Solution:
To compare each time period to the previous time period, it’s probably easiest to put the data in long form and compare to the lag:
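library(dplyr)
library(tidyr)

# timedata is the sample data frame from the question
timedata |>
  mutate(id = row_number()) |>   # remember which original row each value came from
  pivot_longer(
    -id,
    names_to = "time"
  ) |>
  group_by(id) |>
  # compare each value with the previous period in the same row (NA for t1)
  mutate(nochange = value == lag(value)) |>
  group_by(time) |>
  summarise(
    num_repeated = sum(nochange, na.rm = TRUE),
    percent_repeated = num_repeated / n() * 100
  )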
# A tibble: 7 x 3
#   time  num_repeated percent_repeated
#   <chr>        <int>            <dbl>
# 1 t1               0              0
# 2 t2               2             66.7
# 3 t3               1             33.3
# 4 t4               0              0
# 5 t5               0              0
# 6 t6               3            100
# 7 t7               0              0
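The summary above is per time period. If what you want is the percentage per row, as the question's wording suggests, here is a minimal base R sketch. It assumes every column is a time period and that the columns are already in chronological order. The names `m`, `same`, `row_percent` and `col_percent` are only illustrative, and the data frame is rebuilt from the sample data so the snippet runs on its own.

```r
# Sample data from the question, rebuilt as a plain data frame
timedata <- data.frame(
  t1 = c(1, 2, 1), t2 = c(1, 1, 1), t3 = c(1, 3, 4), t4 = c(2, 2, 2),
  t5 = c(3, 3, 3), t6 = c(3, 3, 3), t7 = c(1, 1, 1)
)

m <- as.matrix(timedata)

# Compare columns t2..t7 with columns t1..t6:
# TRUE where a value equals the value in the previous period
same <- m[, -1] == m[, -ncol(m)]

# Percentage of the 6 transitions in each row that repeat the previous value
row_percent <- rowMeans(same) * 100

# Percentage of rows that repeat the previous value, for each period t2..t7
col_percent <- colMeans(same) * 100

print(row_percent)
print(col_percent)
```

For the sample data, `row_percent` comes out at 50, 16.7 and 33.3 (rounded) for rows 1 to 3, and `col_percent` matches the `percent_repeated` column above for t2 to t7. Unlike the long-form version, t1 is left out, since it has no previous period to compare against.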
