Cross-Entropy and KL Divergence
Cross-entropy is widely used in modern ML to compute the loss for classification tasks. This post is a brief overview of the math behind it and a related concept called Kullback-Leibler (KL) divergence.
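As a concrete illustration (my own sketch, not code from the post), here is how cross-entropy is typically computed as a classification loss, assuming a model that outputs a probability vector q over classes and a one-hot true label p; the class indices and probabilities below are made up for the example:

```python
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    """Cross-entropy H(P, Q) = -sum_x P(x) * log Q(x), in nats."""
    q = np.clip(q, eps, 1.0)  # avoid log(0)
    return -np.sum(p * np.log(q))

# Hypothetical 3-class example: the true class is index 1.
p = np.array([0.0, 1.0, 0.0])   # one-hot true label
q = np.array([0.1, 0.7, 0.2])   # model's predicted probabilities

# For a one-hot label this reduces to -log(q[true_class]).
print(cross_entropy(p, q))      # ~0.357 (== -log(0.7))
```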
An information-theoretic interpretation of cross-entropy: it is the average number of bits required to encode data drawn from the true distribution P, when we use a code built under the assumption that the data follows Q instead. Cross-entropy is useful for tracking the training loss of a model (more on this in the next section), but it has some mathematical properties that make it less than ideal as a statistical tool for comparing two probability distributions; in particular, the cross-entropy of P with itself is not zero but equals the entropy of P. KL divergence fixes this by subtracting that entropy, so it is zero exactly when the two distributions coincide. We've shown that maximum likelihood estimation is equivalent to minimizing the cross-entropy between the true and predicted data distributions.
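To make that relationship concrete, here is a small sketch (again my own illustration, with made-up distributions) showing that cross-entropy decomposes into the entropy of P plus the KL divergence from P to Q, and that the cross-entropy of P with itself is its entropy rather than zero:

```python
import numpy as np

def entropy(p):
    """H(P) = -sum_x P(x) * log P(x), in nats."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def cross_entropy(p, q, eps=1e-12):
    """H(P, Q) = -sum_x P(x) * log Q(x), in nats."""
    q = np.clip(q, eps, 1.0)
    return -np.sum(p * np.log(q))

def kl_divergence(p, q, eps=1e-12):
    """D_KL(P || Q) = sum_x P(x) * log(P(x) / Q(x)), in nats."""
    q = np.clip(q, eps, 1.0)
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

p = np.array([0.5, 0.3, 0.2])   # "true" distribution (made up for illustration)
q = np.array([0.4, 0.4, 0.2])   # model's predicted distribution (made up)

# Cross-entropy = entropy of P + KL divergence from P to Q.
print(np.isclose(cross_entropy(p, q), entropy(p) + kl_divergence(p, q)))  # True

# Cross-entropy of P with itself is H(P), not 0; KL divergence is 0 instead.
print(cross_entropy(p, p), kl_divergence(p, p))  # ~1.03, 0.0
```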