information theory | An Algorithmic Lucidity

(Part of Math and Wellness Month.)

Say you have a biased coin that comes up Heads 80% of the time. (I like to imagine that the Heads side has a portrait of Bernoulli.) Flip it 100 times. The naïve way to report the outcome—just report the sequences of Headses and Tailses—costs 100 bits. But maybe you don't have 100 bits. What to do?

One thing to notice is that because it was a biased coin, some bit sequences are vastly more probable than others: "all Tails" has probability 0.2¹⁰⁰ ≈ 1.268 · 10⁻⁷⁰, whereas "all Heads" has probability 0.8¹⁰⁰ ≈ 2.037 · 10⁻¹⁰, differing by a factor of sixty orders of magnitude!!

Even though "all Heads" is the uniquely most probable sequence, you'd still be pretty surprised to see it—there's only one such possible outcome, and it only happens a 2.037 · 10⁻¹⁰th of the time. You probably expect to get a sequence with about twenty Tails in it, and there are lots of those, even though each individual one is less probable than "all Heads."

Call the number of times we flip our Bernoulli coin N, and call the entropy of the coinflip H. (For the 80/20 biased coin, H is ⅕ lg 5 + 4/5 lg 5/4 ≈ 0.7219.)

It turns out for sufficiently large N (I know, one of those theorems, right?), almost all of the probability mass is going to live in a subset of 2^NH outcomes, each of which have a probability close to 2^−NH (and you'll notice that 2^NH · 2^−NH = 1).

An Algorithmic Lucidity

a blog

Tag Archives: information theory

The Typical Set