Group Theory for Wellness I

(Part of Math and Wellness Month.)

Groups! A group is a set with an associative binary operation such that there exists an identity element and inverse elements! And my favorite thing about groups is that all the time that you spend thinking about groups, is time that you're not thinking about pain, betrayal, politics, or moral uncertainty!

Groups have subgroups, which you can totally guess just from the name are subsets of the group that themselves satisfy the group axioms!

The order of a finite group is its number of elements, but this is not to be confused with the order of an element of a group, which is the smallest positive integer such that the element raised to that power equals the identity! Both senses of "order" are indicated with vertical bars like an absolute value (|G|, |a|).

Lagrange proved that the order of a subgroup divides the order of the group of which it is a subgroup! History remains ignorant of how often Lagrange cried.

To show that a nonempty subset H of a group is in fact a subgroup, it suffices to show that if x, y ∈ H, then xy⁻¹ ∈ H.

Exercise #6 in §2.1 of Dummit and Foote's Abstract Algebra (3rd ed'n) asks us to prove that if G is a commutative ("abelian") group, then the torsion subgroup {g ∈ G | |g| < ∞} is in fact a subgroup. I argue as follows: we need to show that if x and y have finite order, then so does xy⁻¹, that is, that (xy⁻¹)^n equals the identity for some positive integer n. But (xy⁻¹)^n equals (xy⁻¹)(xy⁻¹)...(xy⁻¹), "n times"—that is, pretend n ≥ 3, and pretend that instead of "..." I wrote zero or more extra copies of "(xy⁻¹)" so that the expression has n factors. (I usually dislike it when authors use ellipsis notation, which feels so icky and informal compared to a nice Π or Σ, but let me have this one.) Because group operations are associative, we can drop the parens to get xy⁻¹ xy⁻¹ ... xy⁻¹. And because we said the group was commutative, we can reörder the factors to get xxx...y⁻¹y⁻¹y⁻¹, and then we can consolidate into powers to get x^n y^(−n)—but that's the identity if n is the least common multiple of |x| and |y|, which means that xy⁻¹ has finite order, which is what I've been trying to tell you this entire time.
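The argument can be spot-checked by machine. Here's a minimal sketch of my own in the cyclic group ℤ₂₄ under addition mod 24 (abelian, so it qualifies), where "xy⁻¹" is x − y and the order of g is 24/gcd(24, g); the names are mine, not Dummit and Foote's:

```python
from math import gcd, lcm

n = 24  # the abelian group Z_24 under addition mod 24

def order(g):
    """Smallest positive k with k*g ≡ 0 (mod n), i.e., |g|."""
    return n // gcd(n, g % n)

# for every x and y, "xy⁻¹" (here, x - y) raised to the lcm of |x| and |y|
# is the identity, confirming the last step of the argument
for x in range(n):
    for y in range(n):
        m = lcm(order(x), order(y))
        assert (m * (x - y)) % n == 0
```

(`math.lcm` needs Python 3.9 or later.)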

The Typical Set

(Part of Math and Wellness Month.)

Say you have a biased coin that comes up Heads 80% of the time. (I like to imagine that the Heads side has a portrait of Bernoulli.) Flip it 100 times. The naïve way to report the outcome—just report the sequences of Headses and Tailses—costs 100 bits. But maybe you don't have 100 bits. What to do?

One thing to notice is that because it was a biased coin, some bit sequences are vastly more probable than others: "all Tails" has probability 0.2¹⁰⁰ ≈ 1.268 · 10⁻⁷⁰, whereas "all Heads" has probability 0.8¹⁰⁰ ≈ 2.037 · 10⁻¹⁰, differing by sixty orders of magnitude!!

Even though "all Heads" is the uniquely most probable sequence, you'd still be pretty surprised to see it—there's only one such possible outcome, and it only happens a 2.037 · 10⁻¹⁰th of the time. You probably expect to get a sequence with about twenty Tails in it, and there are lots of those, even though each individual one is less probable than "all Heads."

Call the number of times we flip our Bernoulli coin N, and call the entropy of the coinflip H. (For the 80/20 biased coin, H is ⅕ lg 5 + ⅘ lg (5/4) ≈ 0.7219.)

It turns out that for sufficiently large N (I know, one of those theorems, right?), almost all of the probability mass is going to live in a subset of 2^(NH) outcomes, each of which has a probability close to 2^(−NH) (and you'll notice that 2^(NH) · 2^(−NH) = 1).
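As a quick numerical sanity check (a sketch of my own, not from any particular text), we can compute H for the 80/20 coin and see how much probability mass sits on sequences with roughly twenty Tails:

```python
from math import log2, comb

p, N = 0.8, 100  # P(Heads), number of flips
H = -p * log2(p) - (1 - p) * log2(1 - p)  # entropy per flip, ≈ 0.7219 bits

def prob_k_tails(k):
    """Probability of getting exactly k Tails in N flips."""
    return comb(N, k) * p**(N - k) * (1 - p)**k

# mass of sequences with 12..28 Tails (mean 20, standard deviation 4,
# so this window is about ±2σ): most of the probability lives here
mass = sum(prob_k_tails(k) for k in range(12, 29))
```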

April Is Separability Month

It is now April! Did you know that April is one of the months in which every compact metric space is separable?

Proof. Let it be April, and let M be a compact metric space. Because M is compact, it is totally bounded, so for each n∈ℕ, we can cover M with finitely many open balls of radius 1/n. The centers of all such balls form a countable set (a countable union of finite sets), which we can call C. But C is dense, because an arbitrary point p∈M is a limit point of C: given ε > 0, choose n such that 1/n < ε; then p lies in one of the 1/n-balls of the corresponding covering, whose center is a point of C within ε of p. Thus M contains a countable dense subset.

From the Top

Theorem. The product of the additive inverse of the multiplicative identity with itself is equal to the multiplicative identity.

Proof. The sum of the multiplicative identity and its additive inverse is the additive identity: that is, the expression "–1 + 1" is equal to the expression "0". Multiplying both of these expressions by the additive inverse of the multiplicative identity, then applying the distributivity axiom, the theorem of multiplication by the additive identity, and the law of multiplicative identity, we get:

–1(–1 + 1) = –1(0)

(–1)(–1) + (–1)1 = 0

(–1)(–1) + (–1) = 0

But then adding the multiplicative identity to both of these expressions and applying the law of additive inverses and the law of additive identity, we get:

(–1)(–1) + (–1) + 1 = 0 + 1

(–1)(–1) = 1

But that's what I've been trying to tell you this whole time.

Epiphenomenal Coordinates

In the study of elementary linear algebra, unwary novices are often inclined to think of a vector as an ordered list of real numbers; to them, linear algebra is then conceived of as the study of multiplying matrices with column vectors. But this is a horribly impoverished perspective; we can do so much better for ourselves with a bit of abstraction and generality.

You can think of arrows or lists of numbers if you want or if you must, but the true, ultimate meaning of a vector space is ... well, anything that satisfies the vector space axioms. If you have things that you can "add" (meaning that we have an associative, commutative binary operation and we have inverse elements and an identity element with respect to this operation), and you can "multiply" these things by other things that come from a field (the "vectors" in the space and the "scalars" from the field play nicely together in a way that is distributive &c.), then these things that you have are a vector space over that field, and any of the theorems that we prove about vector spaces in general apply in full force to the things you have, which don't have to be lists of real numbers; they could be matrices or polynomials or functions or whatever.

Okay, so it turns out that n-dimensional vector spaces are isomorphic to the space of lists of n numbers (elements of the appropriate field), but that's not part of our fundamental notion of vectorness; it's something we can prove.
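For instance (an illustration of my own), polynomials of degree less than three over ℝ form a vector space, and choosing the basis {1, x, x²} gives exactly that coordinate isomorphism to lists of three numbers:

```python
# coordinates of c0 + c1*x + c2*x^2 relative to the basis {1, x, x^2}
p = [1.0, 2.0, 0.0]   # the polynomial 1 + 2x
q = [0.0, 1.0, 3.0]   # the polynomial x + 3x^2

# vector-space operations become componentwise operations on coordinates
p_plus_q = [a + b for a, b in zip(p, q)]  # 1 + 3x + 3x^2
two_p = [2 * a for a in p]                # 2 + 4x
```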

Continue reading

Iff as Conditional Chain

I'm not sure I like how when we want to prove that two statements are equivalent, we typically say "A if and only if B" and we prove it by separately proving "both directions" A ⇒ B and B ⇒ A, but when we want to prove three or more statements are equivalent, we typically say "The following are equivalent" and prove a "circular chain" of conditionals (1) ⇒ (2) ⇒ [...] ⇒ (n) ⇒ (1), as if these were different proof strategies. Because really, the "both directions" business is just a special case of the chain-of-conditionals idea: (1) ⇒ (2) ⇒ (1). At the very least, one of my books ought to have mentioned this.


Say we have a linear transformation A and some nonzero vector v, and suppose that Av = λv for some scalar λ. This is a very special situation; we say that λ is an eigenvalue of A corresponding to the eigenvector v.

How can we find eigenvalues? Here's one criterion. If Av = λv for some unknown λ, we at least know that Av − λv equals the zero vector, which implies that the linear transformation (A − λI) maps the nonzero vector v to zero. But then (A − λI) has a nontrivial kernel, which is to say that it can't be invertible, and this happens exactly when its determinant is zero: the determinant measures how the linear transformation distorts (signed) areas (volumes, 4-hypervolumes, &c.), so if the determinant is zero, it means you've lost a dimension; the space has been smashed infinitely thin. But det(A − λI) is a polynomial in λ, and so the roots of that polynomial are exactly the eigenvalues of A.
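To make the criterion concrete, here's a sketch of my own for a hypothetical 2×2 matrix, where det(A − λI) = λ² − (tr A)λ + det A and the quadratic formula finds the roots:

```python
from math import sqrt

A = [[2.0, 1.0],
     [1.0, 2.0]]

tr = A[0][0] + A[1][1]                   # trace: a + d
det = A[0][0]*A[1][1] - A[0][1]*A[1][0]  # determinant: ad - bc

# characteristic polynomial det(A - λI) = λ² - tr·λ + det; its roots:
disc = sqrt(tr*tr - 4*det)
eigenvalues = [(tr + disc) / 2, (tr - disc) / 2]

# check: for λ = 3, the vector v = (1, 1) satisfies Av = λv
v = (1.0, 1.0)
Av = (A[0][0]*v[0] + A[0][1]*v[1],
      A[1][0]*v[0] + A[1][1]*v[1])
```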

Two Views of the Monotone Sequence Theorem

If a sequence of real numbers (a_n) is bounded and monotone (and I'm actually going to say nondecreasing, without loss of generality), then it converges. I'm going to tell you why and I'm going to tell you twice.

If our sequence is bounded, the completeness of the reals ensures that it has a least upper bound, which we'll call, I don't know, B, but there have to be sequence elements arbitrarily close to (but not greater than) B, because if there weren't, then B couldn't be a least upper bound. So for whatever arbitrarily small ε, there's an N such that a_N > B − ε, which implies that |a_N − B| < ε, but if the sequence is nondecreasing, we also have |a_n − B| < ε for n ≥ N, which is what I've been trying to tell you—

twice; suppose by way of contraposition that our sequence is not convergent. Then there exists an ε such that for all N, there exist m and n greater than or equal to N, such that |a_m − a_n| is greater than or equal to ε. Suppose it's monotone, without loss of generality nondecreasing; that implies that for all N, we can find n > m ≥ N such that a_n − a_m ≥ ε. Now suppose our sequence is bounded above by some bound B. However, we can actually describe an algorithm to find sequence points greater than B, thus showing that this alleged bound is really not a bound at all. Start at a_1. We can find points later in the sequence that are separated from each other by at least ε, but if we do this ⌈(B − a_1)/ε⌉ times, then we'll have found a sequence point greater than the alleged bound.
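Here's the first argument in numerical miniature (a toy example of my own): the bounded nondecreasing sequence a_n = 1 − 1/n has least upper bound B = 1, and for any ε we can find the N beyond which every term stays within ε of B:

```python
eps = 1e-3
a = lambda n: 1 - 1/n  # bounded, nondecreasing; least upper bound B = 1

# find the first N with a_N > B - eps
N = next(n for n in range(1, 10**7) if a(n) > 1 - eps)

# monotonicity then keeps every later term within eps of B
assert all(abs(a(n) - 1) < eps for n in range(N, N + 1000))
```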

Subscripting as Function Composition

Dear reader, don't laugh: I had thought I already understood subsequences, but then it turned out that I was mistaken. I should have noticed the vague, unverbalized discomfort I felt about the subscripted-subscript notation, (a_{n_k}). But really it shouldn't be confusing at all: as Bernd S. W. Schröder points out in his Mathematical Analysis: A Concise Introduction, it's just a function composition. If it helps (it helped me), say that (a_n) is mere syntactic sugar for a(n): ℕ → ℝ, a function from the naturals to the reals. And (a_{n_k}) is just the composition a(n(k)), with n(k): ℕ → ℕ being a strictly increasing function from the naturals to the naturals.
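In code, the sugar melts away entirely (a toy illustration of mine, with a hypothetical sequence):

```python
a = lambda n: (-1)**n / n  # a sequence, i.e., a function from ℕ to ℝ
n_ = lambda k: 2 * k       # a strictly increasing function from ℕ to ℕ
sub = lambda k: a(n_(k))   # the subsequence (a_{n_k}) is just a ∘ n

# sub picks out the even-indexed terms: sub(1) = a(2), sub(2) = a(4), ...
```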

Bounded but Not Totally Bounded, Redux

Theorem. A nonempty open set in real sequence space under the ℓ∞ norm is not totally bounded.

Proof. Consider an open set U containing a point p. Suppose by way of contradiction that U is totally bounded. Then for every ε > 0, there exists a finite ε-net for U. Fix ε, and let m be the number of points in our ε-net, which net we'll denote {S_i}, i ∈ {1, ..., m}. We're going to construct a very special point y, which does not live in U. For all i ∈ {1, ..., m}, we can choose the ith component y_i such that the absolute value of its difference from the ith component of the ith point in the net is strictly greater than ε (that is, |y_i − S_{i,i}| > ε) but also so that the absolute value of its difference from the ith component of p is less than or equal to ε (that is, |y_i − p_i| ≤ ε). Then for j > m, set y_j = p_j. Because y differs from the ith net point by more than ε in the ith coordinate, y isn't within ε of any point of the net; since the ε-balls around the net points cover U, y is not in U. But the ℓ∞ distance from y to p is at most ε, so there are points arbitrarily close to p which are not in U, which is an absurd thing to happen to a point in an open set! But that's what I've been trying to tell you this entire time.

Bounded but Not Totally Bounded

The idea of total boundedness in metric space (for every ε, you can cover the set with a finite number of ε-balls; discussed previously on An Algorithmic Lucidity) is distinct from (and in fact, stronger than) the idea of mere boundedness (there's an upper bound for the distance between any two points in the set), but to an uneducated mind, it's not immediately clear why. What would be an example of a set that's bounded but not totally bounded? Wikipedia claims that the unit ball in an infinite-dimensional Banach space will do. Eric Hayashi made this more explicit for me: consider sequence space under the ℓ∞ norm, and the "standard basis" set (1, 0, 0, ...), (0, 1, 0, 0, ...), (0, 0, 1, 0, 0, ...). The distance between any two points in this set is one, so it's bounded, but an open ½-ball can contain at most one of these points (two of them within the same ball would be less than 1 apart, by the triangle inequality), so no finite number of open ½-balls can cover the set, so it's not totally bounded, which is what I've been trying to tell you this entire time.
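A quick computation with finite truncations of these sequences (my own sketch) confirms the pairwise distances:

```python
def e(i, dim=6):
    """Truncation of the i-th 'standard basis' sequence to dim coordinates."""
    return [1.0 if j == i else 0.0 for j in range(dim)]

def dist_inf(x, y):
    """The ℓ∞ (supremum) distance between two finite sequences."""
    return max(abs(a - b) for a, b in zip(x, y))

# every distinct pair of basis points is at distance exactly 1
distances = {dist_inf(e(i), e(j)) for i in range(6) for j in range(6) if i != j}
```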

[RETRACTED] Introducing the Fractional Arithmetic Derivative

[NOTICE: The conclusion of this post is hereby retracted because it turns out that the proposed definition of a "fractional arithmetic derivative" doesn't actually make sense. It fails to meet the basic desideratum of corresponding with an iterated arithmetic derivative. E.g., consider that 225″ = (225′)′ = ((3²·5²)′)′ = (2·3·5² + 3²·2·5)′ = (150 + 90)′ = 240′ = (2⁴·3·5)′ = 4·2³·3·5 + 2⁴·5 + 2⁴·3 = 480 + 80 + 48 = 608. Whereas, under the proposed definition we would allegedly equivalently have 225⁽²⁾ = 2!·3⁰·5² + 3²·2!·5⁰ = 50 + 18 = 68. I apologize to anyone who read the original post (??) who was thereby misled. The original post follows (with the erroneous section struck through).]
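The computation in the notice can be mechanized. Here's a sketch of mine of the ordinary (non-fractional) arithmetic derivative, via the Leibniz-rule formula n′ = n·Σ(e_p/p) over the prime factorization n = Π p^(e_p) (the function names are my own):

```python
def factorize(n):
    """Prime factorization as a dict {prime: exponent}."""
    factors, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def arith_deriv(n):
    """n' = n * sum(e_p / p), computed in exact integer arithmetic."""
    return sum(e * n // p for p, e in factorize(n).items())

# 225' = 240 and 225'' = 240' = 608, as in the notice
second = arith_deriv(arith_deriv(225))
```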

Continue reading


What is a vector in Euclidean space? Some might say it's an entity characterized by possessing a magnitude and a direction. But scholars of the geometric algebra (such as Eric Chisolm and Dorst et al.) tell us that it's better to decompose the idea of direction into the two ideas of subspace attitude (our vector's quality of living in a particular line) and orientation (its quality of pointing in a particular direction in that line, and not the other). On this view, a vector is an attitudinal oriented length element. But having done this, it becomes inevitable that we should want to talk about attitudinal oriented area (volume, 4-hypervolume, &c.) elements. To this end we introduce the outer or wedge product ∧ on vectors. It is bilinear, it is anticommutative (swapping the order of arguments swaps the sign, so a∧b = –b∧a), and that's all you need to know.

Suppose we have two vectors a and b in Euclidean space and also a basis for the subspace that the vectors live in, e₁ and e₂, so that we can write a := a₁e₁ + a₂e₂ and b := b₁e₁ + b₂e₂. Then the claim is that the outer product a∧b can be said to represent a generalized vector (call it a 2-blade—and in general, when we wedge k vectors together, it's a k-blade) with a subspace attitude of the plane that our vectors live in and a magnitude equal to the area of the parallelogram spanned by them. Following Dorst et al., let's see what happens when we expand a∧b in terms of our basis—
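In the two-dimensional case (a toy computation of my own, assuming bilinearity kills the e₁∧e₁ and e₂∧e₂ terms and anticommutativity flips e₂∧e₁), the single surviving coefficient is a₁b₂ − a₂b₁, the signed parallelogram area:

```python
# a = a1*e1 + a2*e2 and b = b1*e1 + b2*e2, as coordinate pairs
a = (3.0, 1.0)
b = (1.0, 2.0)

# a∧b = (a1*b2 - a2*b1) e1∧e2: signed area of the spanned parallelogram
wedge_ab = a[0]*b[1] - a[1]*b[0]

# anticommutativity: b∧a carries the opposite sign
wedge_ba = b[0]*a[1] - b[1]*a[0]
```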

Continue reading

The True Secret About Conjugate Roots and Field Automorphisms

In the study of elementary algebra, one occasionally hears of the conjugate roots theorem, which says that if z₀ is a root of a polynomial with real coefficients, then its complex conjugate is also a root. Or if you prefer, nonreal roots come in conjugate pairs. It also works in the other direction: if nonreal roots of a polynomial come in conjugate pairs, then the polynomial has real coefficients, because the purely imaginary parts cancel when you do the algebra: (x − (a + bi))(x − (a − bi)) = x² − x(a + bi) − x(a − bi) + (a² − (bi)²) = x² − 2ax + a² + b².
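We can watch the imaginary parts cancel numerically (a sketch of my own, with a hypothetical root):

```python
z = complex(2, 3)       # a hypothetical nonreal root, 2 + 3i
zbar = z.conjugate()    # its conjugate, 2 - 3i

# coefficients of (x - z)(x - zbar) = x² - (z + zbar)x + z*zbar
b = -(z + zbar)  # -2a: the imaginary parts cancel in the sum
c = z * zbar     # a² + b²: the imaginary parts cancel in the product
```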

There's also this idea that conjugation is the unique nontrivial "well-behaved" automorphism on ℂ, a map from ℂ to itself that respects addition and multiplication: the sum (respectively product) of the conjugates is the conjugate of the sum (respectively product). The complex numbers are symmetrical around the real axis in a way that they're not around the imaginary axis: while i and –i are different from each other, you can't "tell which is which" because they behave the same way. Contrast to 1 and –1, which do behave differently: if someone put either 1 or –1 in a box, but they wouldn't tell you which, but they were willing to tell you that "The number in the box squares to itself," then you could figure out that the number in the box was 1, because –1 doesn't do that.

The existence of these two ideas (the conjugate roots theorem and conjugation-as-automorphism) can't possibly be a coincidence; there must be some sense in which nonreal roots of real-coefficient polynomials come in conjugate pairs because the polynomial "can't tell" "which is which". But it would be unsatisfying to just say this much and nothing more ("Theorem: That can't possibly be a coincidence. Proof ...??"); we want to say something much more general and precise. And in fact, we can—

Continue reading

Inclusion, Exclusion

In this modern day and age, it simply cannot be doubted that it is of the very utmost importance that we find the size of the union of some sets. One might try just adding the sizes of all the sets, but that's not correct, because then one would be double-counting the elements that appear in more than one set. But it's a good start. One might then think that one could begin by adding the sizes of the sets, but then subtract the sizes of the intersections of each pair of sets, in order to correct for the double-counting. But this is also incorrect, because then what about the elements that appear in three sets and had thus initially been triple-counted?—after subtracting the pairwise intersections, these elements haven't been included in the count at all! So one realizes that one must then add the sizes of the triplewise intersections ...

And in one of the dark recesses of the human mind, untouched by outside light or sound, silent and unyielding to the invidious scrutiny of introspection and cognitive science—a conjecture is formed—

The size of the union of n sets is given by the alternating (starting from positive) sum of the sums of the sizes of all j-way intersections amongst the sets from j:=1 to n!

(N.b., the exclamation mark indicates an excited tone, not "factorial".)

Continue reading

Summing the Multinomial Coefficients

The sum of binomial coefficients \sum_{j=0}^n {n \choose j} equals 2^n, because {n \choose j} is the number of ways to pick j elements from a set of size n, and 2^n is the size of the powerset, the set of all subsets, of a set of size n: the sum, over all subset sizes, of the number of ways to choose subsets of a given size, is equal to the number of subsets. You can also see this using the binomial theorem itself:

2^{n} = (1 + 1)^{n} = \sum_{j=0}^{n} {n \choose j} 1^{j}1^{n-j} = \sum_{j=0}^{n} {n \choose j}

But of course there's nothing special about two; it works for multinomial coefficients just the same. The sum, over all m-tuples of subset sizes, of the number of ways to split a set of size n into subsets of sizes given by the m-tuple, is equal to the number of ways to distribute the elements of a set of size n among m labeled (possibly empty) subsets (viz., m^n).
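A brute-force check of my own, for n = 5 and m = 3:

```python
from math import factorial
from itertools import product

def multinomial(ks):
    """n! / (k1! k2! ... km!) for a tuple ks of subset sizes summing to n."""
    result = factorial(sum(ks))
    for k in ks:
        result //= factorial(k)
    return result

n, m = 5, 3
# sum of multinomial coefficients over all m-tuples of sizes summing to n
total = sum(
    multinomial(ks)
    for ks in product(range(n + 1), repeat=m)
    if sum(ks) == n
)
# total equals m**n
```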

Straight Talk About Precompactness

So we have this metric space, which is this set of points along with a way of defining "distances" between them that behaves in a basically noncrazy way (points that are zero distance away from "each other" are really just the same point, the distance from one to the other is the same as the distance from the other to the one, and something about triangles).

Let's say (please, if you don't mind) that a sequence of points (x_n) in our space is fundamental (or maybe Cauchy) iff (sic) for all positive ε, there's a point far enough along in the sequence such that beyond that point, any two points of the sequence are less than ε apart. Let's also agree (if that's okay with you) to say that our metric space is sequentially precompact iff every sequence has a fundamental subsequence. If, furthermore, the precompact space is complete (all fundamental sequences actually converge to a point in the space, rather than leading up to an ætherial gap or missing edge), then we say it's compact. It turns out that compactness is an important property to pay attention to because it implies lots of cool stuff: like, compactness is preserved by homeomorphisms (continuously invertible continuous maps), and continuous functions with compact domains are bounded, and probably all sorts of other things that I don't know (yet). I'm saying sequentially precompact because I'm given to understand that while the convergent subsequences criterion for compactness is equivalent to this other definition (viz., "every open cover has a finite subcover") for metric spaces, the two ideas aren't the same for more general topological spaces. Just don't ask me what in the world we're going to do with a nonmetrizable space, 'cause I don't know (yet).

But anyway, as long as we're naming ideas, why not say that our metric space is totally bounded iff for every ε, there exists a finite number of open (that is, not including the boundary) balls that cover the whole space? We can call the centers of such a group of balls an ε-net. Our friend Shilov quotes his friend Liusternik as saying, "Suppose a lamp illuminating a ball of radius ε is placed at every point of a set B which is an ε-net for a set M. Then the whole set M will be illuminated." At the risk of having names for things that possibly don't actually deserve names, I'm going to call each point in an ε-net a lamp. Actually Shilov, and thus likely Liusternik, is talking about closed balls of light around the lamps, not the open ones that I'm talking about. In a lot of circumstances, this could probably make all the difference in the world, but for the duration of this post, I don't think you should worry about it.

But this fear of having too many names for things is really a very serious one, because it turns out that sequential precompactness and total boundedness are the same thing: not only can you not have one without the other, but you can't even have the other without the one! Seriously, like, who even does that?!

Continue reading

Interpolating Between Vectorized Green's Theorems

Green's theorem says that (subject to some very reasonable conditions that we need not concern ourselves with here) the counterclockwise line integral of the vector field F = [P Q] around the boundary of a region is equal to the double integral of \frac{\partial Q}{\partial x}-\frac{\partial P}{\partial y} over the region itself. It's natural to think of it as a special case of Stokes's theorem in the case of a plane. We can also think of the line integral as the integral of the inner product of the vector field with the unit tangent, leading us to write Green's theorem like this:

 \oint_{\partial D}\vec{\mathbf{F}}\cdot\vec{\mathbf{T}}\, ds=\iint_{D}(\mathrm{curl\,}\vec{\mathbf{F}})\cdot\vec{\mathbf{k}}\, dA

But some texts (I have Marsden and Tromba's Vector Calculus and Stewart's Calculus: Early Transcendentals in my possession; undoubtedly there are others) point out that we can also think of Green's theorem as a special case of the divergence theorem! Suppose we take the integral of the inner product of the vector field with the outward-facing unit normal (instead of the unit tangent)—it turns out that

\oint_{\partial D}\vec{\mathbf{F}}\cdot\vec{\mathbf{n}}\, ds=\iint_{D}\mathrm{div\,}\vec{\mathbf{F}}\, dA

—which suggests that there's some deep fundamental sense in which Stokes's theorem and the divergence theorem are really just mere surface manifestations of one and the same underlying idea! (I'm told that it's called the generalized Stokes's theorem, but regrettably I don't know the details yet.)
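As a numerical illustration (my own, for the hypothetical field F = (−y, x) on the unit disk, where ∂Q/∂x − ∂P/∂y = 2 and so the double integral is 2·(area) = 2π):

```python
from math import pi, cos, sin

# left side: counterclockwise line integral of F·T ds around the unit circle,
# approximated by summing F·dr over small parameter steps
N = 100_000
dt = 2 * pi / N
line_integral = 0.0
for i in range(N):
    t = i * dt
    x, y = cos(t), sin(t)               # point on the boundary
    dx, dy = -sin(t) * dt, cos(t) * dt  # tangent step dr
    line_integral += (-y) * dx + x * dy # F·dr with F = (-y, x)

# right side: ∬ (curl F)·k dA = 2 · (area of the unit disk) = 2π
right_side = 2 * pi
```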

Continue reading

The Derivative of the Natural Logarithm

Most people learn during their study of the differential and integral calculus that the derivative of the natural logarithm ln x is the reciprocal function 1/x. Indeed, sometimes the natural logarithm is defined as  \int_1^x \frac{1}{t}\,dt. However, on observing the graphs of ln x and 1/x, the inquisitive seeker of knowledge can hardly fail to notice a disturbing anomaly:

[Figure: the graphs of y = ln(x) and y = 1/x.]

The natural logarithm is only defined for positive numbers; no part of its graph lies in quadrants II or III. But the reciprocal function is defined for all nonzero numbers. So (one cannot help oneself but wonder) how could the latter be the derivative of the former? If the graph of the natural logarithm isn't there to be differentiated in the left half of the plane, how could its derivative be defined in that region?
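One numerical hint (a sketch of my own, not giving anything away): the function ln|x| is defined on the left half of the plane, and its symmetric difference quotient at a negative point matches 1/x there:

```python
from math import log

def ln_abs(x):
    """The natural logarithm of the absolute value, defined for all x ≠ 0."""
    return log(abs(x))

# symmetric difference quotient of ln|x| at a negative point
x, h = -2.0, 1e-6
deriv = (ln_abs(x + h) - ln_abs(x - h)) / (2 * h)
# deriv is close to 1/x = -0.5
```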
Continue reading