INTRODUCTION:

In the words of Wikipedia: “In mathematical statistics, the Kullback–Leibler divergence is a measure of how one probability distribution diverges from a second, expected probability distribution.” In other words, KL Divergence gives us an estimate of how much two probability distributions differ.

When we talk about KL Divergence, we compare two probability distributions: one is the true probability distribution and the other is an estimated one. The formula for the KL Divergence is given by:

\[ D_{\mathrm{KL}}(A \,\|\, B) = H(A, B) - H(A) \]

Figure 1

Here H(A, B) denotes the cross-entropy between distributions A and B, and H(A) denotes the entropy of distribution A, where A is the true distribution and B the estimated one. On further simplification, the formula becomes:

\[ D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)} \]

Figure 2

Here P and Q play the same roles as A and B, i.e. P is the true distribution and Q the estimated one.
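As a quick sanity check, here is a minimal sketch (not part of the original post; the distributions p and q are made up and natural logarithms are assumed) that computes the divergence both ways and confirms that Figure 1 and Figure 2 agree:

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])   # "true" distribution P
q = np.array([0.4, 0.4, 0.2])   # estimated distribution Q

# Figure 2: D_KL(P || Q) = sum_x P(x) * log(P(x) / Q(x))
kl_direct = np.sum(p * np.log(p / q))

# Figure 1: D_KL(P || Q) = H(P, Q) - H(P)
cross_entropy = -np.sum(p * np.log(q))   # H(P, Q)
entropy = -np.sum(p * np.log(p))         # H(P)
kl_from_entropies = cross_entropy - entropy

print(kl_direct, kl_from_entropies)      # both print the same value (about 0.0253 nats)
```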

UNDERSTANDING AND APPLICATIONS:

There are a few ways to look at the above two equations; each yields a similar yet distinct interpretation, relevant in a different context.

In information and coding theory, KL Divergence gives the extra number of bits required to represent data drawn from the “true” probability distribution P when we describe it using another distribution Q.
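To make the “extra bits” reading concrete, here is a hedged illustration using base-2 logarithms; the distributions are invented purely for this sketch:

```python
import numpy as np

p = np.array([0.7, 0.2, 0.1])   # true source distribution P
q = np.array([0.1, 0.2, 0.7])   # mismatched coding distribution Q

bits_optimal  = -np.sum(p * np.log2(p))        # H(P): bits/symbol with the ideal code for P
bits_mismatch = -np.sum(p * np.log2(q))        # H(P, Q): bits/symbol when coding as if data came from Q
extra_bits    = bits_mismatch - bits_optimal   # = D_KL(P || Q) in bits

print(f"optimal: {bits_optimal:.3f} bits, mismatched: {bits_mismatch:.3f} bits, "
      f"overhead: {extra_bits:.3f} bits per symbol")
```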

In more intuitive terms, KL Divergence also measures the degree of surprise one would experience on believing that data is sampled from Q when it is actually sampled from P. Here is an interesting example by Dr. Harold Williams on Quora: https://www.quora.com/What-is-a-good-laymans-explanation-for-the-Kullback-Leibler-Divergence/answer/Harold-Williams-2

Another way of looking at KL Divergence is in terms of the likelihood ratio:

\[ D_{\mathrm{KL}}(P \,\|\, Q) = \mathbb{E}_{x \sim P}\!\left[ \log \frac{P(x)}{Q(x)} \right] \]

This equation looks quite similar to Figure 2, and the likelihood-ratio form is used in machine learning algorithms such as GANs. http://www.inference.vc/an-alternative-update-rule-for-generative-adversarial-networks/
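For instance, the expectation in the likelihood-ratio form can be estimated by Monte Carlo. The sketch below is only illustrative: the Gaussian parameters, sample size, and the use of scipy.stats.norm are assumptions of this example rather than anything from the linked article, and the closed-form Gaussian KL is included just as a check:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# P = N(0, 1), Q = N(1, 2) -- arbitrary choices for illustration
mu_p, sigma_p = 0.0, 1.0
mu_q, sigma_q = 1.0, 2.0

x = rng.normal(mu_p, sigma_p, size=200_000)                          # samples drawn from P
log_ratio = norm.logpdf(x, mu_p, sigma_p) - norm.logpdf(x, mu_q, sigma_q)
kl_monte_carlo = log_ratio.mean()                                    # E_P[log P(x)/Q(x)]

# Closed-form KL between two univariate Gaussians, for comparison
kl_exact = (np.log(sigma_q / sigma_p)
            + (sigma_p**2 + (mu_p - mu_q)**2) / (2 * sigma_q**2)
            - 0.5)

print(f"Monte Carlo: {kl_monte_carlo:.4f}, closed form: {kl_exact:.4f}")
```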

Apart from its many uses in statistics, KL Divergence also appears in other areas of machine learning, such as high-dimensional visualization with t-SNE, making it a really important statistical tool to get used to!
