Entropy of a Mixture
Given a pair of probability density functions $p_0$ and $p_1$ and an interpolating factor $\lambda \in [0, 1],$ consider the mixture $p_\lambda = (1 - \lambda)\,p_0 + \lambda\,p_1.$ How does the entropy of this mixture vary as a function of $\lambda$? The widget below shows what this situation looks like for a pair of discrete distributions on the plane. (Drag to adjust $\lambda.$) It turns out that entropy is concave as a function of probabilities: $H(p_\lambda) \ge (1 - \lambda) H(p_0) + \lambda H(p_1).$
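As a rough numerical companion to the widget (not part of the original post), here is a short Python sketch that checks this concavity claim for two small discrete distributions. The distributions `p0` and `p1` and the helper name `entropy` are illustrative choices of mine, not taken from the post.

```python
# A minimal sketch: verify that the entropy of the mixture p_lam = (1-lam)*p0 + lam*p1
# is at least the corresponding mixture of the entropies (concavity of H).
import numpy as np

def entropy(p):
    """Shannon entropy in nats of a discrete distribution p (zero terms ignored)."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

p0 = np.array([0.7, 0.2, 0.1])  # example distributions, chosen arbitrarily
p1 = np.array([0.1, 0.3, 0.6])

for lam in np.linspace(0.0, 1.0, 6):
    p_lam = (1 - lam) * p0 + lam * p1
    # "bulge" = how far H(p_lam) sits above the straight line between H(p0) and H(p1)
    bulge = entropy(p_lam) - ((1 - lam) * entropy(p0) + lam * entropy(p1))
    print(f"lambda={lam:.1f}  H(p_lam)={entropy(p_lam):.4f}  bulge={bulge:.4f}")
```

The printed bulge is non-negative for every $\lambda$ and vanishes at the endpoints, which is exactly the concave shape the widget traces out.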
Now, the idea that $I(X;A)$ measures how much knowing $A$ decreases our expected surprisal about $X$ leads to a way to express $I(X;A)$ in terms of KL divergences between the distributions $p_0,$ $p_1,$ and $p_\lambda.$ So far, we've seen that the bulge in the curve $H(p_\lambda)$ measures the mutual information $I(X;A)$ between a draw from $p_\lambda$ and a fictitious latent variable $A,$ and we've remembered how to write down $I(X;A)$ explicitly. This can also be seen in terms of information gain: in response to a value $X,$ Bayes' rule instructs us to increase the log-likelihood of our belief that $A = 0$ by $\ln\bigl(p_0(X)/p_\lambda(X)\bigr),$ but in expectation over $X \sim p_0$ this amounts to an update of only $\E_{p_0}\!\left[\ln \frac{p_0(X)}{p_\lambda(X)}\right] = \KL(p_0 \Vert p_\lambda) = O(\lambda^2).$
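A numerical sketch of these two facts, under the same assumptions as before (the example distributions and helper names `entropy` and `kl` are mine, not the post's): the bulge coincides with the $\lambda$-weighted mixture of KL divergences $(1-\lambda)\,\KL(p_0 \Vert p_\lambda) + \lambda\,\KL(p_1 \Vert p_\lambda),$ and $\KL(p_0 \Vert p_\lambda)$ shrinks like $\lambda^2$ as $\lambda \to 0.$

```python
# A sketch checking (1) bulge == mixture of KL divergences, i.e. I(X;A),
# and (2) KL(p0 || p_lam) = O(lambda^2): the ratio KL/lambda^2 stays bounded.
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def kl(p, q):
    """KL(p || q) in nats for discrete distributions with q > 0 wherever p > 0."""
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

p0 = np.array([0.7, 0.2, 0.1])  # same illustrative distributions as above
p1 = np.array([0.1, 0.3, 0.6])

for lam in [0.4, 0.2, 0.1, 0.05]:
    p_lam = (1 - lam) * p0 + lam * p1
    bulge = entropy(p_lam) - ((1 - lam) * entropy(p0) + lam * entropy(p1))
    mi = (1 - lam) * kl(p0, p_lam) + lam * kl(p1, p_lam)  # I(X;A)
    print(f"lambda={lam:.2f}  bulge={bulge:.6f}  I(X;A)={mi:.6f}  "
          f"KL(p0||p_lam)/lambda^2={kl(p0, p_lam) / lam**2:.4f}")
```

The first two columns agree to numerical precision, and the last column settles toward a constant as $\lambda$ shrinks, which is the $O(\lambda^2)$ behaviour of the expected Bayesian update described above.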