(Part of Math and Wellness Month.)

Say you have a biased coin that comes up Heads 80% of the time. (I like to imagine that the Heads side has a portrait of Bernoulli.) Flip it 100 times. The naïve way to report the outcome—just report the sequence of Headses and Tailses—costs 100 bits. But maybe you don't have 100 bits. What to do?

One thing to notice is that because it was a biased coin, some bit sequences are *vastly* more probable than others: "all Tails" has probability 0.2^{100} ≈ 1.268 · 10^{−70}, whereas "all Heads" has probability 0.8^{100} ≈ 2.037 · 10^{−10}, differing by a factor of *sixty orders of magnitude*!!
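(The arithmetic is easy to check with a few lines of Python, a throwaway sanity check and nothing more:)

```python
from math import log10

p = 0.8   # Heads-probability of our biased coin
n = 100   # number of flips

p_all_tails = (1 - p) ** n  # 0.2^100 ≈ 1.268e-70
p_all_heads = p ** n        # 0.8^100 ≈ 2.037e-10

print(f"P(all Tails) ≈ {p_all_tails:.3e}")
print(f"P(all Heads) ≈ {p_all_heads:.3e}")
print(f"all-Heads is ≈ 10^{log10(p_all_heads / p_all_tails):.0f} times more probable")
```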

Even though "all Heads" is the uniquely most probable sequence, you'd still be pretty surprised to see it—there's only *one* such possible outcome, and it only happens a 2.037 · 10^{−10}th of the time. You *probably* expect to get a sequence with *about* twenty Tails in it, and there are *lots* of those, even though each individual one is less probable than "all Heads."
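You can see the tradeoff numerically: any *particular* twenty-Tails sequence is less probable than all-Heads, but there are astronomically many of them, so getting *some* twenty-Tails sequence or other is a routine event. (Another quick Python check; `math.comb` counts the sequences.)

```python
from math import comb

p = 0.8
p_all_heads = p ** 100

# One specific sequence with exactly 20 Tails is less probable than all-Heads ...
p_one_20tails_seq = p ** 80 * (1 - p) ** 20
print(p_one_20tails_seq < p_all_heads)  # True

# ... but there are C(100, 20) such sequences, and collectively they dominate:
n_seqs = comb(100, 20)                          # ≈ 5.36e20 of them
p_exactly_20_tails = n_seqs * p_one_20tails_seq
print(f"P(exactly 20 Tails) ≈ {p_exactly_20_tails:.3f}")  # ≈ 0.099
```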

Call the number of times we flip our Bernoulli coin *N*, and call the entropy of the coinflip *H*. (For the 80/20 biased coin, *H* is ⅕ lg 5 + ⅘ lg 5/4 ≈ 0.7219 bits per flip.)
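(In standard notation that's the usual Shannon entropy −*p* lg *p* − (1−*p*) lg (1−*p*) of a weighted coin, which we can confirm in a line or two:)

```python
from math import log2

def coin_entropy(p):
    """Shannon entropy, in bits, of a coin with Heads-probability p."""
    return -(p * log2(p) + (1 - p) * log2(1 - p))

print(f"{coin_entropy(0.8):.4f}")  # ≈ 0.7219
```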

It turns out that for sufficiently large *N* (I know, one of *those* theorems, right?), *almost all* of the probability mass is going to live in a set of about 2^{NH} outcomes, each of which has a probability close to 2^{−NH} (and you'll notice that 2^{NH} · 2^{−NH} = 1).
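We can watch this concentration happen empirically: sample a lot of length-100 sequences from the 80/20 coin and compute each one's per-flip surprisal −(1/*N*) lg P(sequence). The values cluster around *H*. (A quick simulation sketch, not anything resembling the theorem's proof:)

```python
import random
from math import log2

random.seed(0)
p, N = 0.8, 100
H = -(p * log2(p) + (1 - p) * log2(1 - p))  # ≈ 0.7219 bits per flip

def per_flip_surprisal(seq):
    """-(1/N) lg P(seq): bits per flip "spent" on this particular outcome."""
    logp = sum(log2(p) if flip == "H" else log2(1 - p) for flip in seq)
    return -logp / len(seq)

# draw 10,000 length-100 sequences from the biased coin
samples = ["".join("H" if random.random() < p else "T" for _ in range(N))
           for _ in range(10_000)]
rates = [per_flip_surprisal(s) for s in samples]
mean_rate = sum(rates) / len(rates)
print(f"H = {H:.4f}; mean per-flip surprisal of sampled sequences = {mean_rate:.4f}")
```

(The per-sequence surprisals hug *H* more and more tightly as *N* grows, which is exactly the concentration the theorem is about.)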