I’m performing LCA using the poLCA package in R and trying to calculate entropy, which returns NaN for some of my models.
The following example code is used for the entropy calculation:
> entropy<-function (p) sum(-p*log(p))
> error_prior <- entropy(lca2$P) # Class proportions model 2
> error_post <- mean(apply(lca2$posterior, 1, entropy), na.rm = TRUE)
> results[2,8] <- round(((error_prior - error_post) / error_prior), 3)
From the answer to the question "Entropy output is NaN for some class solutions and not others" I understand this to be caused by zeros passed to the entropy function. The issue is resolved by adding na.omit inside entropy as follows:
entropy <- function (p) sum(na.omit(-p*log(p)))
My question is – is adding this na.omit to entropy calculations a technically accepted method for resolving this issue without affecting the integrity of the calculation?
When I run the entropy calculations with and without na.omit, around a third of the values (presumably those with a zero somewhere in the calculation) change. I’m now unsure whether I should always use na.omit in the entropy function or whether there is another way of resolving this problem.
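For reference, the NaN is easy to reproduce in base R with a made-up posterior row containing an exact zero (not real model output): log(0) is -Inf, and 0 * -Inf evaluates to NaN, which poisons the whole sum.

```r
## Minimal reproduction: a toy posterior row with a structural zero
entropy <- function(p) sum(-p * log(p))
p <- c(0.7, 0.3, 0)

entropy(p)                # NaN: the last term is 0 * log(0) = 0 * -Inf
sum(na.omit(-p * log(p))) # 0.6108643: the NaN term is dropped
```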
>Solution :
It is valid, but not transparent at first glance. The reason is that the mathematical limit of x * log(x) as x -> 0 is 0 (which can be proved with L’Hôpital’s rule). In this regard, the most robust definition of the function is:
entropy.safe <- function (p) {
  if (any(p > 1 | p < 0)) stop("probability must be between 0 and 1")
  ## log(0) is -Inf; leave those entries at 0 so the p = 0 term contributes 0
  log.p <- numeric(length(p))
  safe <- p != 0
  log.p[safe] <- log(p[safe])
  sum(-p * log.p)
}
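The limit claim can also be checked numerically (a quick sanity check, not part of the original answer): x * log(x) shrinks toward 0 as x approaches 0 from above.

```r
## x * log(x) -> 0 as x -> 0+, so defining the p = 0 term as 0 is natural
x <- 10^-(1:6)
round(x * log(x), 7)
#> [1] -0.2302585 -0.0460517 -0.0069078 -0.0009210 -0.0001151 -0.0000138
```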
But simply dropping p = 0 cases gives identical results, because the entropy at p = 0 is 0 and contributes nothing anyway.
entropy <- function (p) {
  if (any(p > 1 | p < 0)) stop("probability must be between 0 and 1")
  ## -p * log(p) is NaN exactly when p = 0; na.rm = TRUE drops those terms
  sum(-p * log(p), na.rm = TRUE)
}
p <- seq(0, 1, 0.1)
entropy(p)
#[1] 2.455935
entropy.safe(p)
#[1] 2.455935
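Tying this back to the poLCA workflow: with either definition, the relative-entropy statistic from the question can be computed even when posterior rows contain exact zeros. A sketch with made-up stand-ins for lca2$P and lca2$posterior (a 3-row, 2-class posterior, not real model output):

```r
entropy <- function (p) {
  if (any(p > 1 | p < 0)) stop("probability must be between 0 and 1")
  sum(-p * log(p), na.rm = TRUE)
}

## Hypothetical stand-ins for lca2$P and lca2$posterior
P <- c(0.6, 0.4)
posterior <- rbind(c(0.9, 0.1),
                   c(1.0, 0.0),   # exact zero: no NaN with this entropy()
                   c(0.2, 0.8))

error_prior <- entropy(P)                          # entropy of class proportions
error_post  <- mean(apply(posterior, 1, entropy))  # mean row-wise posterior entropy
round((error_prior - error_post) / error_prior, 3) # relative entropy (0 to 1 scale)
#> [1] 0.591
```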