Number of observations used by cor function in R

I have a big matrix in R with more than 2000 columns and 10,000 rows, and many missing values. This line of code calculates the correlation matrix in R.

cor(data, use = "complete.obs")

My question is: how can I find the number of observations that have been used to calculate each correlation in the output matrix?

The output should be something like this:

v1 v2 v3 v4
v1 20 12 15 18
v2 12 15 10 11
v3 15 10 25 20
v4 18 11 20 20

Thanks for any suggestion

Solution:

Let’s use a sample matrix data filled with random NAs:

library(dplyr)  # provides the %>% pipe

set.seed(1234)
data <- rnorm(100) %>%
    matrix(nrow = 10) %>%
    {
        m <- .
        # set roughly 30% of the entries to NA at random
        m[rnorm(100) > .5] <- NA
        m
    }


            [,1]       [,2]       [,3]        [,4]       [,5]       [,6]
 [1,] 0.48522682         NA  0.8951720 -0.32439330 0.05913517  0.4369306
 [2,] 0.69676878 -0.4002352  0.6602126          NA 0.41339889         NA
 [3,] 0.18551392  1.4934931  2.2734835 -0.93350334         NA  0.4521904
 [4,]         NA -1.6070809  1.1734976          NA         NA  0.6631986
 [5,] 0.31168103 -0.4157518  0.2877097  0.31916024 0.71888873 -1.1363736
 [6,] 0.76046236         NA -0.6597701 -1.07754212         NA         NA
 [7,] 1.84246363 -0.1517365         NA -3.23315213 1.35727444         NA
 [8,]         NA         NA  0.6774155          NA 0.40446847 -1.2239038
 [9,] 0.03266396 -0.3047211         NA  0.02951783 0.26436427  0.2580684
[10,]         NA  0.6295361  0.1864921  0.59427377 0.26804390         NA
            [,7]       [,8]       [,9]      [,10]
 [1,]         NA -0.3046139 -1.0118219         NA
 [2,]         NA  1.8250111  0.4701675  0.1832475
 [3,]  0.1586254  0.6705594 -0.7009703 -1.7662292
 [4,] -1.7632551  0.9486326         NA         NA
 [5,]  0.3385960  2.0494030         NA         NA
 [6,]         NA -0.6511136         NA         NA
 [7,] -0.2386466  0.8086193         NA -1.1750368
 [8,] -1.1877653  0.9865806 -0.2457632         NA
 [9,]  0.3849353         NA -1.5528590  0.3536254
[10,]         NA  0.3190524  0.1284340  0.3191562

You can transform it into a logical matrix dna where dna[i,j] == TRUE means that data[i,j] is not NA:

dna <- !is.na(data)

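Then take the cross-product of dna with itself, t(dna) %*% dna, to count the non-missing observations shared by each pair of columns. Entry [i, j] is the number of rows in which both column i and column j are observed, which is the sample size behind the corresponding entry of cor(data, use = "pairwise.complete.obs"). Note that with use = "complete.obs" every correlation is computed from the same rows, and their number is sum(complete.cases(data)). The rows of data are the observations, so the transpose must be on the left: dna %*% t(dna) would count shared columns per pair of rows instead.

crossprod(dna)  # same as t(dna) %*% dna

The result is a symmetric matrix with one row and one column per variable, for example: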

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    8    7    4    6    4    3    4    8    5     7
 [2,]    7    9    6    6    5    4    6    8    6     8
 [3,]    4    6    6    4    4    3    4    5    4     5
 [4,]    6    6    4    7    3    3    3    6    5     6
 [5,]    4    5    4    3    5    2    4    5    3     4
 [6,]    3    4    3    3    2    5    4    4    3     5
 [7,]    4    6    4    3    4    4    6    5    4     6
 [8,]    8    8    5    6    5    4    5    9    5     8
 [9,]    5    6    4    5    3    3    4    5    6     5
[10,]    7    8    5    6    4    5    6    8    5     9
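
As a quick sanity check, here is a minimal sketch on a small hand-made matrix where the counts can be verified by eye. The toy values and the names x, observed, n_pair, n_v1_v2 and n_complete are illustrative assumptions, not part of the question's data.

```r
# Hypothetical toy data: 5 observations (rows) of 3 variables (columns).
x <- cbind(
  v1 = c(1.0, 2.0,  NA, 4.0, 5.0),
  v2 = c(2.1,  NA, 3.3, 4.2, 5.9),
  v3 = c( NA, 1.5, 2.5, 3.5,  NA)
)

observed <- !is.na(x)           # TRUE where a value is present
n_pair   <- crossprod(observed) # t(observed) %*% observed; TRUE/FALSE count as 1/0

# Cross-check one entry: rows where both v1 and v2 are observed.
n_v1_v2 <- sum(complete.cases(x[, c("v1", "v2")]))

# Number of rows shared by every correlation under use = "complete.obs".
n_complete <- sum(complete.cases(x))

print(n_pair)
#    v1 v2 v3
# v1  4  3  2
# v2  3  4  2
# v3  2  2  3
```

The diagonal holds the non-missing count of each column, and n_pair["v1", "v2"] agrees with n_v1_v2 (3 here). Only one row is complete across all three columns, so n_complete is 1, which shows how different the casewise and pairwise sample sizes can be.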