Conditional Random Assignment

I have a question about conditional random assignment. The simplified dataset looks as follows:

COMPANY         BOARDROLE               INSIDER
A               Acting Director         Yes
B               CEO                     Yes
C               Independent Director    No
D               Chairman                Unknown
E               Chairman                Unknown
F               Member                  Unknown
G               Independent Director    Outsider
H               Member                  Unknown
I               Member                  Unknown
J               Member                  Unknown

Now I want to create a fourth column, Insider Presence, that has the value 1 or 0. If the third column says No or Outsider, there is no insider, so Insider Presence should be 0. I know I can achieve that with the following code:

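library(dplyr)
library(stringr)

# Rows whose INSIDER value contains "No" or "Outsider" get 0, all others get 1
pattern <- paste(c("No", "Outsider"), collapse = "|")
df <- df %>%
  mutate(`InsiderPresence` = ifelse(str_detect(INSIDER, pattern), 0, 1))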

But now I also want a random 50% of the ‘Unknown’ rows to be labelled 1, so that I get, for example, the following output:

COMPANY         BOARDROLE               INSIDER       INSIDER PRESENCE
A               Acting Director         Yes           1
B               CEO                     Yes           1
C               Independent Director    No            0 
D               Chairman                Unknown       1
E               Chairman                Unknown       0 
F               Member                  Unknown       0
G               Independent Director    Outsider      0
H               Member                  Unknown       0
I               Member                  Unknown       1
J               Member                  Unknown       1

I hope someone can help me.

Solution:

Here is one option using case_when:

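library(tidyverse)
df %>%
    mutate(`Insider Presence` = case_when(
        str_detect(INSIDER, "Yes") ~ 1L,                  # known insider
        str_detect(INSIDER, "No|Outsider") ~ 0L,          # known outsider
        # unknown: independent fair draw per row, 0 or 1 with probability 0.5 each
        str_detect(INSIDER, "Unknown") ~ sample(c(0L, 1L), n(), replace = TRUE),
        TRUE ~ NA_integer_))                               # fall-through, should never be reached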
#    COMPANY            BOARDROLE  INSIDER Insider Presence
# 1        A      Acting Director      Yes                1
# 2        B                  CEO      Yes                1
# 3        C Independent Director       No                0
# 4        D             Chairman  Unknown                1
# 5        E             Chairman  Unknown                1
# 6        F               Member  Unknown                0
# 7        G Independent Director Outsider                0
# 8        H               Member  Unknown                1
# 9        I               Member  Unknown                1
#10        J               Member  Unknown                1

We use case_when to cover all cases; the last line, TRUE ~ NA_integer_, should never be reached, but including a fall-through is good practice for debugging. We use sample to draw values uniformly with replacement from (0, 1), i.e. each draw is 0 or 1 with probability 0.5.
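To see what the sample call does on its own, here is a small base-R sketch (not part of the original answer) that draws many fair 0/1 values and compares them with the equivalent rbinom(n, 1, 0.5) formulation. The seed, the number of draws, and the 30% example are arbitrary choices for illustration; the prob argument of sample is one way to target a proportion other than 50%.

```r
set.seed(42)  # arbitrary seed, only so the run is reproducible
n <- 10000

# Same call as in the answer: n independent fair draws from {0, 1}
draws <- sample(c(0L, 1L), n, replace = TRUE)

# Equivalent formulation: n Bernoulli(0.5) draws
draws_binom <- rbinom(n, size = 1, prob = 0.5)

# Hypothetical 30% target: weight the two outcomes via prob = c(P(0), P(1))
draws_30 <- sample(c(0L, 1L), n, replace = TRUE, prob = c(0.7, 0.3))

# Observed proportions of 1s: close to, but generally not exactly, 0.5, 0.5 and 0.3
print(c(sample = mean(draws), rbinom = mean(draws_binom), weighted = mean(draws_30)))
```

For large n the observed proportions settle near the target, which is the same "asymptotically 50%" behaviour the answer relies on; for a handful of rows they can be far off.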

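Note that in the case_when call we draw as many samples as there are rows in total, N_tot (not just as many as there are rows with INSIDER == "Unknown"); case_when then keeps only the draws at the positions of the "Unknown" rows. Because every draw is independently 0 or 1 with probability 0.5, the draws that land on any subset of rows are also fair. The realized split is only close to 50% for large enough sample sizes, though: in the output above, 5 of the 6 "Unknown" rows got a 1.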
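The per-row coin flip above does not guarantee that exactly half of the ‘Unknown’ rows are labelled 1. If that is what you need, one option is to shuffle a fixed vector of labels over just those rows. Below is a minimal base-R sketch, assuming the new column is named InsiderPresence (as in the question) and that rounding down is acceptable when the number of ‘Unknown’ rows is odd; the seed value is arbitrary.

```r
# Sample data, same as in the answer's "Sample data" section
df <- read.table(text = "COMPANY BOARDROLE INSIDER
A 'Acting Director' Yes
B CEO Yes
C 'Independent Director' No
D Chairman Unknown
E Chairman Unknown
F Member Unknown
G 'Independent Director' Outsider
H Member Unknown
I Member Unknown
J Member Unknown", header = TRUE)

set.seed(1)  # arbitrary seed for reproducibility

# Deterministic cases first: Yes -> 1, No/Outsider -> 0, anything else -> NA for now
df$InsiderPresence <- ifelse(df$INSIDER == "Yes", 1L,
                      ifelse(df$INSIDER %in% c("No", "Outsider"), 0L, NA_integer_))

# "Unknown" rows: exactly floor(k / 2) ones and the rest zeros, in random order
unknown <- which(df$INSIDER == "Unknown")
k <- length(unknown)
n_ones <- k %/% 2
labels <- rep(c(1L, 0L), times = c(n_ones, k - n_ones))
# sample.int(k) is a random permutation of 1..k (safe even when k is 0 or 1)
df$InsiderPresence[unknown] <- labels[sample.int(k)]

print(df)
```

With the six ‘Unknown’ rows in the sample data this always produces three 1s and three 0s among them; only which rows get the 1s depends on the seed.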


Sample data

df <- read.table(text = "COMPANY  BOARDROLE               INSIDER
A        'Acting Director'       Yes
B        CEO                     Yes
C        'Independent Director'  No
D        Chairman                Unknown
E        Chairman                Unknown
F        Member                  Unknown
G        'Independent Director'  Outsider
H        Member                  Unknown
I        Member                  Unknown
J        Member                  Unknown", header = TRUE)