Draw Bernoulli outcome from ifelse statement on list of dataframes

I am trying to draw a 1 or 0 from a Bernoulli distribution for each row of every data frame in a list, whenever the value in the first column (distance) exceeds 1000.

I believe my current code draws once per data frame in the list rather than once per row. Is there a way I can confirm this? For each row where distance is > 1000, I want to draw a 1 or 0 from the Bernoulli distribution. Each row should have its own chance of being 0 or 1.

mylistnew <- lapply(mylist, transform,
                    outcome = ifelse(distance > 1000,
                                     rbinom(length(distance), 1, 0.8), NA))

I can't see how to change rbinom(length(distance), ...) into a single draw per row, rather than one call covering the length of the data frame / ifelse statement.


Subset of the data:

list(structure(c(775.056695476403, 1414.15314106691, 2509.95923787194, 
1666.71143236238, 585.640129954299, 1169.17884175758, 152.505503148836, 
619.226302243787, 1263.66546590149, 1682.8712425131, -2.86809018002943, 
-2.87220511792857, -2.91236875367306, -2.91236875367306, -2.91137226768259, 
-2.91236875367306, -2.86275243787543, -2.8606012634912, -2.86264610888995, 
-2.86004943151114, 58.2523804031471, 58.2594633464797, 58.1998311185373, 
58.1998311185373, 58.1999333186371, 58.1998311185373, 58.243480631029, 
58.2359999509482, 58.2407966146843, 58.2335609045358, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1), .Dim = c(10L, 4L), .Dimnames = list(NULL, 
    c("distance", "lon", "lat", "ID"))), structure(c(775.056695476403, 
1414.15314106691, 2509.95923787194, 1666.71143236238, 585.640129954299, 
1169.17884175758, 152.505503148836, 619.226302243787, 1263.66546590149, 
1682.8712425131, -2.86809018002943, -2.87220511792857, -2.91236875367306, 
-2.91236875367306, -2.91137226768259, -2.91236875367306, -2.86275243787543, 
-2.8606012634912, -2.86264610888995, -2.86004943151114, 58.2523804031471, 
58.2594633464797, 58.1998311185373, 58.1998311185373, 58.1999333186371, 
58.1998311185373, 58.243480631029, 58.2359999509482, 58.2407966146843, 
58.2335609045358, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), .Dim = c(10L, 
4L), .Dimnames = list(NULL, c("distance", "lon", "lat", "ID"))))

Solution:

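rbinom(length(distance), 1, 0.8) returns a vector of length(distance) independent and identically distributed (i.i.d.) Bernoulli draws, one per row, and ifelse picks from that vector element-wise, so your code already does what you want. One way to verify this is the following snippet: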

set.seed(12123)
n <- 10000
rowSums(                        # [3]
  (mat <- replicate(n,          # [2]
             rbinom(10, 1, 0.8) # [1]
  ))
) / n
# [1] 0.8004 0.7979 0.8025 0.8033 0.7974 0.7988 0.7984 0.7993 0.7990 0.8013

cor(t(mat))
#                [,1]          [,2]          [,3]          [,4]          [,5] [...]
#  [1,]  1.0000000000  0.0028711704  0.0036386366 -0.0003859466  0.0097167804 [...]
# [...]

Explanation

  1. Draw 10 Bernoulli random variables.
  2. Repeat this 10000 times (the data are then organized as a 10 x 10000 matrix with the repetitions in columns and the 10 independent variables in rows).
  3. Take the average of each row. As we drew from a Bernoulli distribution with p = .8, we would expect an average of around .8, which the results show.
  4. If we look at the correlation between the 10 observations, we see that the off-diagonal values are all very close to 0, which is consistent with the draws being independent.
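
If you would rather make the per-row behaviour explicit, one option is to draw only for the rows that qualify and leave the rest as NA. The following is a minimal sketch, not the original code: the helper name draw_outcome, its threshold and p arguments, and the small stand-in list are all assumptions made for illustration.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical stand-in for `mylist`: small data frames with a `distance` column
mylist <- list(
  data.frame(distance = c(775, 1414, 2510, 1667, 586), ID = 1),
  data.frame(distance = c(1169, 153, 619, 1264, 1683), ID = 1)
)

# Hypothetical helper: one Bernoulli draw per qualifying row, NA elsewhere
draw_outcome <- function(df, threshold = 1000, p = 0.8) {
  df <- as.data.frame(df)              # also accepts matrices with column names
  far <- df$distance > threshold       # logical index of the rows that need a draw
  df$outcome <- NA_integer_            # default for rows at or below the threshold
  df$outcome[far] <- rbinom(sum(far), size = 1, prob = p)  # sum(far) independent draws
  df
}

mylistnew <- lapply(mylist, draw_outcome)
```

Each element of mylistnew then has an outcome column that is 0 or 1 where distance > 1000 and NA otherwise. Because rbinom recycles its prob argument as a vector, a row-specific success probability could be passed as prob = df$p[far], where p is a hypothetical column that the example data does not contain.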