Entropy calculation NaN – is adding na.omit to entropy function acceptable?

I’m performing latent class analysis (LCA) using the poLCA package in R and trying to calculate entropy, which for some of my models outputs NaN.

Here is the example code I use for the entropy calculation:

> entropy<-function (p) sum(-p*log(p))

> error_prior <- entropy(lca2$P) # Class proportions model 2
> error_post <- mean(apply(lca2$posterior, 1, entropy), na.rm = TRUE)
> results[2,8] <- round(((error_prior - error_post) / error_prior), 3)

From the answer to the question "Entropy output is NaN for some class solutions and not others" I understand this to be caused by zeros passed into the entropy function. The issue is resolved by adding na.omit to the entropy function as follows:


entropy <- function (p) sum(na.omit(-p*log(p)))

My question is: is adding na.omit to the entropy calculation a technically accepted way of resolving this issue without affecting the integrity of the result?

When I run the entropy calculations with and without na.omit, around a third of the values (those with zeros somewhere in the calculation, as expected) change. I’m now unsure whether I should always use na.omit in the entropy function or whether there is another way to resolve this problem.
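The NaN is easy to reproduce: in R, log(0) returns -Inf, and 0 * -Inf evaluates to NaN, which then propagates through sum(). A minimal demonstration:

```r
# A zero probability poisons the plain entropy sum
entropy <- function (p) sum(-p * log(p))

log(0)                   # -Inf
0 * log(0)               # NaN
entropy(c(0.5, 0.5, 0))  # NaN, although the true entropy is log(2)
```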

Solution:

It is valid, but not transparent at first glance. The reason is that the mathematical limit of x * log(x) as x -> 0 is 0 (this can be shown with L’Hôpital’s rule), so p = 0 terms should contribute nothing to the sum. In this regard, the most robust definition of the function is:

entropy.safe <- function (p) {
  if (any(p > 1 | p < 0)) stop("probability must be between 0 and 1")
  # treat 0 * log(0) as 0: leave log.p at 0 wherever p == 0
  log.p <- numeric(length(p))
  safe <- p != 0
  log.p[safe] <- log(p[safe])
  sum(-p * log.p)
}
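The limit underlying this definition can be checked numerically: -x * log(x) shrinks toward 0 as x approaches 0 from above.

```r
# -x * log(x) -> 0 as x -> 0+, so defining the p == 0 term as 0 is consistent
x <- 10^-(1:6)
-x * log(x)
# each value is smaller than the last, heading to 0
```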

But simply dropping p = 0 cases gives identical results, because the entropy at p = 0 is 0 and contributes nothing anyway.

entropy <- function (p) {
  if (any(p > 1 | p < 0)) stop("probability must be between 0 and 1")
  # 0 * log(0) evaluates to NaN; na.rm = TRUE simply drops those terms
  sum(-p * log(p), na.rm = TRUE)
}

p <- seq(0, 1, 0.1)
entropy(p)
#[1] 2.455935
entropy.safe(p)
#[1] 2.455935
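The na.omit variant from the question behaves the same way, because na.omit drops NaN elements and the only NaN terms come from p = 0:

```r
# na.omit removes the NaN terms produced by 0 * log(0)
entropy.naomit <- function (p) sum(na.omit(-p * log(p)))

p <- seq(0, 1, 0.1)
entropy.naomit(p)
#[1] 2.455935
```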