R: combine two variables with the same character values for use in logistic regression

I’ve looked up what to do in this case and haven’t found much information I could use, so any advice would be greatly appreciated.

I have a dataset that separates males and females for certain variables. I would like to combine them and use the combined variable in logistic regression.

Example of how the data looks:

male<- c("weekly","monthly","","never","","","weekly")
female<- c("","","never","","daily","weekly","")
df<-data.frame(male,female)

My code looks like this:

df$combined <- paste(df$male, df$female)
model_00_ <- glm(outcome ~ main_predictor + combined, data = df, family = binomial(link = "logit"))
exp(cbind(OR = coef(model_00_), confint(model_00_)))

But when I do, the output looks like this (arbitrary numbers for simplicity):

               OR     2.5%    97.5%
intercept      9      6       11
daily          4      3       7
weekly         3      2       6
monthly        2.5    1.5     4
never          0.75   0.6     0.9
daily          4      3       7
weekly         3      2       6
monthly        2.5    1.5     4
never          NA     NA      NA

I think this is happening because of the "paste" function, but I am unsure how to combine the two variables without it.

Solution:

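As others have mentioned, paste is a bad solution because it adds whitespace between the things being pasted: by default it joins with sep = " ", so a male "weekly" becomes "weekly " while a female "weekly" becomes " weekly", and glm treats those as two different categories. That is why each level shows up twice in your output. But I do not like using paste0 either, because it doesn't really treat the original variables as data; it just pastes them together as character strings.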
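To see the duplicated levels directly, here is a small base-R sketch using the question's example vectors (with "" for the empty cells). It only shows what paste() produces; it is not part of the fix.

```r
# The question's example data, with "" marking the empty cells
male   <- c("weekly", "monthly", "", "never", "", "", "weekly")
female <- c("", "", "never", "", "daily", "weekly", "")

# paste() uses sep = " " by default, so the empty side leaves a stray space
pasted <- paste(male, female)
print(pasted)

# Six distinct strings instead of four categories:
# "weekly " (from male) and " weekly" (from female) are different values
print(unique(pasted))

# Stripping the whitespace collapses them back to the four real categories
print(unique(trimws(pasted)))
```

Each of those six strings would become its own factor level in glm, which is where the repeated rows come from.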

As Limey’s comment above mentions, I think coalesce (from the dplyr package) is a better solution than either. coalesce(x, y) simply takes the value of x unless it is NA, in which case the value of y is used. Thus:

male <- c("weekly", "monthly", NA, "never", NA, NA, "weekly")
female <- c(NA, NA, "never", NA, "daily", "weekly", NA)

df <- data.frame(male, female)
df
> df
     male female
1  weekly   <NA>
2 monthly   <NA>
3    <NA>  never
4   never   <NA>
5    <NA>  daily
6    <NA> weekly
7  weekly   <NA>

library(dplyr)
desired_output <- coalesce(male, female)
desired_output

> desired_output
[1] "weekly"  "monthly" "never"   "never"   "daily"   "weekly"  "weekly" 
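If you would rather not load dplyr for this, a base-R sketch of the same two-vector case uses ifelse(). This is not how dplyr implements coalesce; it just picks male where it is present and female otherwise, and it assumes (as in the example) that at most one of the two columns is filled on each row.

```r
male   <- c("weekly", "monthly", NA, "never", NA, NA, "weekly")
female <- c(NA, NA, "never", NA, "daily", "weekly", NA)

# Sanity check for the assumption above: no row has both columns filled
stopifnot(!any(!is.na(male) & !is.na(female)))

# Element-wise: keep male unless it is NA, otherwise fall back to female
combined <- ifelse(is.na(male), female, male)
print(combined)
```

If a row ever did have both values, this keeps the male one, which matches what coalesce(male, female) would do.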

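However, note that if the empty cells in your original data contain whitespace or are empty strings (""), as in your example, then coalesce will not work. An empty string is different from a missing value (NA), so coalesce treats "" as a real value and keeps it instead of falling back to the other column.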
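One way to handle that is to turn blank strings into NA first and only then combine. The sketch below does this in base R and then fits the logistic regression on the combined factor. The helper blank_to_na is hypothetical, and the outcome and main_predictor columns are simulated placeholders (the question does not show them), so the fitted numbers mean nothing; the point is that the combined variable now gives one coefficient per non-reference category. It uses Wald intervals from confint.default() to avoid profiling; confint(), as in the question, gives profile-likelihood intervals instead.

```r
# The question's example data, where empty cells are "" rather than NA
male   <- c("weekly", "monthly", "", "never", "", "", "weekly")
female <- c("", "", "never", "", "daily", "weekly", "")

# Hypothetical helper: treat empty or whitespace-only strings as missing
blank_to_na <- function(x) {
  x[!is.na(x) & trimws(x) == ""] <- NA
  x
}
male   <- blank_to_na(male)
female <- blank_to_na(female)

# Take male where present, otherwise female (same idea as coalesce(male, female))
combined <- ifelse(is.na(male), female, male)

# Hypothetical data frame: recycle the 7 example rows and simulate
# placeholder outcome/main_predictor columns purely for illustration
set.seed(1)
n  <- 210
df <- data.frame(
  combined       = factor(rep(combined, length.out = n)),
  main_predictor = rnorm(n)
)
df$outcome <- rbinom(n, size = 1, prob = 0.5)

# Refer to columns by name in the formula and let `data` supply them
model_00_ <- glm(outcome ~ main_predictor + combined,
                 data = df, family = binomial(link = "logit"))

# Odds ratios with (Wald) confidence intervals
odds <- exp(cbind(OR = coef(model_00_), confint.default(model_00_)))
print(odds)
```

With four categories, "daily" (first alphabetically) becomes the reference level and the table has a single row each for monthly, never and weekly, with no duplicates and no NA row.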
