
Entropy of a Mixture


Given a pair of probability density functions $p_0, p_1$ and an interpolating factor $\lambda \in [0, 1]$, consider the mixture $p_\lambda = (1 - \lambda)\, p_0 + \lambda\, p_1$. How does the entropy $H(p_\lambda)$ of this mixture vary as a function of $\lambda$? The widget below shows what this situation looks like for a pair of discrete distributions on the plane. (Drag to adjust $\lambda$.) It turns out that entropy is concave as a function of the underlying probabilities, so $H(p_\lambda) \geq (1 - \lambda)\, H(p_0) + \lambda\, H(p_1)$.
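
To make the setup concrete, here is a minimal numerical sketch in Python. The two small discrete distributions `p0` and `p1` are made up for illustration (they are not the ones in the widget); the code evaluates $H(p_\lambda)$ along a grid of $\lambda$ and checks that the curve never dips below the chord between its endpoints.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats, ignoring zero-probability entries."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Two arbitrary discrete distributions (illustrative values only).
p0 = np.array([0.7, 0.2, 0.1])
p1 = np.array([0.1, 0.3, 0.6])

for lam in np.linspace(0.0, 1.0, 6):
    p_lam = (1 - lam) * p0 + lam * p1                       # the mixture p_lambda
    chord = (1 - lam) * entropy(p0) + lam * entropy(p1)     # linear interpolation of endpoint entropies
    print(f"lambda={lam:.1f}  H(p_lam)={entropy(p_lam):.4f}  chord={chord:.4f}")
    assert entropy(p_lam) >= chord - 1e-12                  # concavity: curve lies on or above the chord
```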

So far, we've seen that the bulge in the curve $H(p_\lambda)$ measures the mutual information $I(X;A)$ between a draw from $p_\lambda$ and a fictitious latent variable $A$ indicating which component generated the draw, and we've remembered how to write down $I(X;A)$ explicitly. Now, the idea that $I(X;A)$ measures how much knowing $A$ decreases our expected surprisal about $X$ leads to a way to express $I(X;A)$ in terms of KL divergences between the distributions $p_0$, $p_1$ and $p_\lambda$. We can see this in terms of information gain: in response to a value $X$, Bayes' rule instructs us to increase the log likelihood of our belief that $A = 0$ by $\ln\big(p_0(X)/p_\lambda(X)\big)$, but in expectation over $X \sim p_0$ this causes an update of only $\mathbb{E}_{p_0}\!\left[\ln \frac{p_0(X)}{p_\lambda(X)}\right] = \operatorname{KL}(p_0 \Vert p_\lambda) = O(\lambda^2)$.
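
As a quick sanity check of these claims, the sketch below (again with made-up `p0` and `p1`, as above) computes the bulge $H(p_\lambda) - (1-\lambda)H(p_0) - \lambda H(p_1)$, compares it with the standard expression of $I(X;A)$ as a mixture of KL divergences, $(1-\lambda)\operatorname{KL}(p_0 \Vert p_\lambda) + \lambda\,\operatorname{KL}(p_1 \Vert p_\lambda)$, and watches $\operatorname{KL}(p_0 \Vert p_\lambda)/\lambda^2$ settle toward a constant as $\lambda \to 0$, consistent with the $O(\lambda^2)$ claim.

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def kl(p, q):
    """KL(p || q) in nats; assumes q > 0 wherever p > 0."""
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

# Same illustrative distributions as in the previous sketch.
p0 = np.array([0.7, 0.2, 0.1])
p1 = np.array([0.1, 0.3, 0.6])

for lam in [0.5, 0.1, 0.01, 0.001]:
    p_lam = (1 - lam) * p0 + lam * p1
    bulge = entropy(p_lam) - (1 - lam) * entropy(p0) - lam * entropy(p1)   # I(X;A) as the gap above the chord
    kl_form = (1 - lam) * kl(p0, p_lam) + lam * kl(p1, p_lam)              # I(X;A) as a mixture of KL divergences
    print(f"lam={lam:.3f}  bulge={bulge:.6f}  KL form={kl_form:.6f}  "
          f"KL(p0||p_lam)/lam^2={kl(p0, p_lam) / lam**2:.4f}")
```

The two middle columns agree to numerical precision, and the last column stabilizes as $\lambda$ shrinks, which is exactly the $\operatorname{KL}(p_0 \Vert p_\lambda) = O(\lambda^2)$ behavior described above.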
