How to calculate partial correlations when the data frame contains missing values

I want to calculate partial correlations between pairs of variables while controlling for all the other variables in a data frame.

To do this, I used pcor(c("variable1", "variable2", "control1", "control2", etc.), var(dataFrame)) from the ggm package. However, it didn’t work: it returned NA for the partial correlation coefficient.

My data frame has scores of personality test results assessing the participants for neuroticism, extraversion, openness to experience, agreeableness, and conscientiousness:

studentLecturerPersonality <- read.delim("http://www.discoveringstatistics.com/docs/Chamorro-Premuzic.dat", header = TRUE)

names(studentLecturerPersonality) <- c("age", "gender", "studentNeuroticism", "studentExtraversion", "studentOpenness", "studentAgreeableness", "studentConscientiousness", "lecturerNeuroticism", "lecturerExtraversion", "lecturerOpenness", "lecturerAgreeableness", "lecturerConscientiousness")

studentLecturerPersonalityOnlyTraits <- subset(studentLecturerPersonality, select = c("studentNeuroticism", "studentExtraversion", "studentOpenness", "studentAgreeableness", "studentConscientiousness")) 

I calculated the correlations between the variables using both cor(dataFrame, use = "pairwise.complete.obs", method = "pearson") and cor(variable1, variable2, use = "pairwise.complete.obs", method = "pearson"), where the use argument lets me specify how missing values (NAs) are handled.

I wanted to calculate the partial correlation coefficient between extraversion and neuroticism while controlling for openness to experience, agreeableness, and conscientiousness:

studentLecturerPersonalityOnlyTraitsMatrix <- as.matrix(studentLecturerPersonalityOnlyTraits)

pcExtraversionNeuroticism <- pcor(c("studentExtraversion", "studentNeuroticism",
                                    "studentOpenness", 
                                    "studentAgreeableness", 
                                    "studentConscientiousness"), var(studentLecturerPersonalityOnlyTraitsMatrix))

pcExtraversionNeuroticism

which returns [1] NA.

I don’t know if it’s because the data frame contains missing values (NAs), which I didn’t (or couldn’t) specify how R should deal with (like in cor()).

Can anyone suggest how I can make pcor() work, or suggest an alternative method?

I really appreciate any help you can provide.

Solution:

First, use complete.cases() to subset the matrix to just the rows which do not contain NA:

complete_matrix  <- studentLecturerPersonalityOnlyTraitsMatrix[
    complete.cases(studentLecturerPersonalityOnlyTraitsMatrix),
]
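The original call most likely returned NA because var() defaults to use = "everything", so a single missing value in a column makes every covariance involving that column NA. As a sketch of an alternative to filtering the rows by hand, var() also accepts use = "complete.obs", which applies the same listwise deletion internally. The toy data frame below is made up for illustration and is not the personality data set.

```r
# Toy data (hypothetical values) with a few missing entries.
toy <- data.frame(
  x = c(1.0, 2.1, 2.9, 4.2, 5.1, 6.3, NA, 8.0),
  y = c(2.0, 1.5, 3.7, 3.9, NA, 6.1, 7.2, 7.5),
  z = c(0.5, 1.9, 1.1, 3.0, 2.2, 4.1, 3.3, 5.0)
)

# Default behaviour: NA propagates into every covariance involving x or y.
cov_default <- var(toy)

# Listwise deletion performed inside var().
cov_listwise <- var(toy, use = "complete.obs")

# Filtering the rows first with complete.cases() gives the same matrix.
cov_manual <- var(toy[complete.cases(toy), ])

anyNA(cov_default)                            # does the default contain NA?
isTRUE(all.equal(cov_listwise, cov_manual))   # do the two approaches agree?
```

Under this approach, var(studentLecturerPersonalityOnlyTraitsMatrix, use = "complete.obs") could be passed to pcor() in place of a covariance matrix built from pre-filtered rows.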

Then compute the partial correlation as before, passing var(complete_matrix) instead of the covariance of the original matrix:

pcExtraversionNeuroticism <- pcor(
    c(
        "studentExtraversion",
        "studentNeuroticism",
        "studentOpenness",
        "studentAgreeableness",
        "studentConscientiousness"
    ), var(complete_matrix)
)

pcExtraversionNeuroticism
# [1] -0.2971974
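As a cross-check that does not depend on ggm, the partial correlation of two variables given all the others can be read off the inverse of the covariance matrix P as -P[i, j] / sqrt(P[i, i] * P[j, j]). The sketch below compares that with the residual-based definition. The helper name partial_cor_from_cov is hypothetical, and the data are simulated rather than taken from the personality data set.

```r
# Hypothetical helper: partial correlation between variables i and j,
# controlling for every other variable in the covariance matrix S.
partial_cor_from_cov <- function(S, i, j) {
  P <- solve(S)  # precision (inverse covariance) matrix
  -P[i, j] / sqrt(P[i, i] * P[j, j])
}

set.seed(1)  # simulated data, for illustration only
toy <- data.frame(x = rnorm(50), y = rnorm(50), z = rnorm(50))
toy$y <- toy$y + 0.5 * toy$z
toy$x[c(3, 17)] <- NA  # introduce missing values

# Covariance matrix from complete rows only.
S <- var(toy, use = "complete.obs")
from_precision <- partial_cor_from_cov(S, "x", "y")

# Equivalent definition: correlate the residuals of x and y after
# regressing each on the control variable z (complete rows only).
cc <- toy[complete.cases(toy), ]
from_residuals <- cor(
  resid(lm(x ~ z, data = cc)),
  resid(lm(y ~ z, data = cc))
)

all.equal(from_precision, from_residuals)
```

If the two values agree on your own data, that is a reasonable sign the missing values were handled consistently.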

Note that calling complete.cases() on the whole matrix drops every row that has an NA in any column, not only rows with an NA in the columns you are actually using. Here you are using all the columns, so that isn’t a problem. However, if you were only using, for example, the first two columns, you might prefer:

cols_to_use  <- c("studentExtraversion", "studentNeuroticism")
complete_matrix <- studentLecturerPersonalityOnlyTraitsMatrix[
    complete.cases(studentLecturerPersonalityOnlyTraitsMatrix[, cols_to_use]),
]
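To make the difference concrete, here is a minimal sketch on a made-up matrix (the values and the column names a, b, c are hypothetical). Unlike the snippet above, it also keeps only the selected columns after filtering.

```r
# Toy matrix (hypothetical values); the NAs sit in different columns.
m <- cbind(
  a = c(1, 2, NA, 4, 5, 6),
  b = c(2, 1, 4, 3, 6, 5),
  c = c(NA, 3, 2, 5, NA, 7)
)

# Filter on every column: rows 1, 3 and 5 are dropped.
all_cols <- m[complete.cases(m), ]

# Filter only on the columns actually used: only row 3 is dropped.
cols_to_use <- c("a", "b")
some_cols <- m[complete.cases(m[, cols_to_use]), cols_to_use]

nrow(all_cols)   # 3
nrow(some_cols)  # 5
```

Filtering on fewer columns retains more observations, which matters when missing values are scattered across variables you are not analysing.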

As an aside, your variable names are very long. The Style Guide in Advanced R by Hadley Wickham says:

Generally, variable names should be nouns and function names should be verbs. Strive for names that are concise and meaningful (this is not easy!).

You have certainly got meaningful names. This is a matter of taste, but I wonder if they could be a little more concise!
