R: Verifying The Results Of Coin Flips

byMR

May 7, 2023

I am working with the R programming language.

Suppose I have the following problem:

There is a coin where if it lands head then the probability of the next flip being heads is 0.6 (and if tails then the next flip being tails is also 0.6)
There are 100 students in a class
Each student flips this coin a random number of times
The last flip of student_n does not influence the first flip of student_n+1 (i.e. when the next student flips the coin, the first flip has 0.5 probability of heads or tails, but the next flip for this student depends on the previous flip)

I am trying to write R code to represent this problem.

First I defined the variables:

library(dplyr)
library(stringr)

# generate data
set.seed(123)
ids <- 1:100
student_id <- sample(ids, 100000, replace = TRUE)
coin_result <- character(1000)
coin_result[1] <- sample(c("H", "T"), 1)

Next, I tried to write the flipping process:

for (i in 2:length(coin_result)) {
  if (student_id[i] != student_id[i-1]) {
    coin_result[i] <- sample(c("H", "T"), 1)
  } else if (coin_result[i-1] == "H") {
    coin_result[i] <- sample(c("H", "T"), 1, prob = c(0.6, 0.4))
  } else {
    coin_result[i] <- sample(c("H", "T"), 1, prob = c(0.4, 0.6))
  }
}

#tidy up
my_data <- data.frame(student_id, coin_result)
my_data <- my_data[order(my_data$student_id),]

Finally, I tried to verify the results:

my_data %>%
  group_by(student_id) %>%
  summarize(Sequence = str_c(coin_result, lead(coin_result)), .groups = 'drop') %>%
  filter(!is.na(Sequence)) %>%
  count(Sequence)

Even though the code ran, I don’t think my code is correct – when I look at the results:

# A tibble: 4 x 2
  Sequence     n
  <chr>    <int>
1 HH       23810
2 HT       25043
3 TH       25042
4 TT       26005

I think if I was correct, HH should have been significantly greater than HT , and TT should have been significantly greater than TH.

Can someone please tell me if I have done this correctly and how to correct it?

Thanks!

>Solution :

I think you need to sort the student_id vector before the loop, so that your comparison of student_id[i] != student_id[i-1] would be valid. Otherwise, it’s not catching consecutive flips from the same student.

The result seems to make sense, where HH and TT together occupies 60.4% of the total flips.

library(tidyverse)

set.seed(123)
ids <- 1:100
# only the following line was changed, all other lines are same as your code
student_id <- sort(sample(ids, 100000, replace = TRUE))
coin_result <- character(1000)
coin_result[1] <- sample(c("H", "T"), 1)

for (i in 2:length(coin_result)) {
  if (student_id[i] != student_id[i-1]) {
    coin_result[i] <- sample(c("H", "T"), 1)
  } else if (coin_result[i-1] == "H") {
    coin_result[i] <- sample(c("H", "T"), 1, prob = c(0.6, 0.4))
  } else {
    coin_result[i] <- sample(c("H", "T"), 1, prob = c(0.4, 0.6))
  }
}

#tidy up
my_data <- data.frame(student_id, coin_result)
my_data <- my_data[order(my_data$student_id),]

my_data %>%
  group_by(student_id) %>%
  summarize(Sequence = str_c(coin_result, lead(coin_result)), .groups = 'drop') %>%
  filter(!is.na(Sequence)) %>%
  count(Sequence)

# A tibble: 4 × 2
  Sequence     n
  <chr>    <int>
1 HH       29763
2 HT       19782
3 TH       19775
4 TT       30580