"Just in case that was non-rhetorical, the answer is that your brain evolved to be good at factorizing overall appearance into orthogonal 'personal appearance' and 'age appearance' dimensions that can be tracked separately, just as [*x*, *y*] = [1, 2] and [4, 2] are so different with respect to *x*, and yet so the same with respect to *y*, at the same time."

"I'm not thinking about that right now. But like, if you got another bookcase, I wouldn't object."

"Where would we put it?"

"I'm also not thinking about that right now, but I've already started speaking a sentence in response to your question, so I might as well finish it. Oh. I guess I just did."

]]>"Not true! Note that the dishes pile up just as badly when you're away."

"So?"

"So, it's not that I'm inconsiderate of *others*; I'm inconsiderate towards *people in the future*, independently of whether they happen to be me."

It all started at my old dayjob, where some of my coworkers had an office chess game going. I wanted to participate and be part of the team, but I didn't want to invest the effort in actually learning how to play chess well. So, I did what any programmer would do and wrote a chess engine to do it for me.

(Actually, I felt like writing a chess engine was too much of a cliché, so I decided that *my* program was an AI for a game that *happens* to be exactly like chess, except that everything has different names.)

My program wasn't actually terribly good, but I learned a lot about *how to think*, for the same reason that building a submarine in your garage is a great way to learn how to swim.

Consider a two-player board game like chess—or tic-tac-toe, Reversi, or indeed, *any* two-player, zero-sum, perfect information game. Suppose we know how to calculate how "good" a particular board position is for a player—in chess, this is traditionally done by assigning a point value to each type of piece and totaling up the point values of remaining pieces for each player.

Because only one player can win the game, what's good for one player is equally bad for the other: so if we add up all the piece values for one player, and *subtract* all the piece values for the other, we get a "score" for the board position that the first player is trying to maximize, and the second player is trying to minimize.
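To make the bookkeeping concrete, here's a minimal sketch of that material-count score in Python (the board representation and the name `evaluate` are my own illustrative choices, not any real engine's):

```python
# Traditional material-count evaluation: sum the maximizing player's
# piece values, subtract the minimizing player's. (Illustrative toy
# representation: a board is a list of (piece_type, owner) pairs.)

PIECE_VALUES = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}

def evaluate(board):
    """Score a position: positive favors the maximizer."""
    score = 0
    for piece, owner in board:
        value = PIECE_VALUES.get(piece, 0)  # kings aren't counted as material
        score += value if owner == "max" else -value
    return score

print(evaluate([("queen", "max"), ("pawn", "min")]))  # → 8
```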

So consider a player pondering her move. For every possible legal move she could make, she knows what the board position will look like after that move, and can calculate the value of that position. So you might think she should choose the move that results in the best value: for example, if she can capture the opponent's queen, that would make the subsequent board position be worth 9 more points.

The problem with that is that it's short-sighted. If capturing the opponent's queen would just result in the opponent capturing the first player's queen back, then what looked like a 9-point gain after one turn ends up being a wash after both players have taken their turns.

To take this into account, the first player should consider not just the immediate outcome of her move, but what the other player is likely to do after that. And the way the first player can compute what she predicts the second player will do is by asking, well, what would *I* do if I were in that position, except trying to minimize the score rather than maximizing it?

... and so on recursively. So instead of just choosing the move with the best *immediate* consequences, we want to look at the entire "game tree" of "my best move, *given* her best move, *given* my best move, *given* her best move"—down to some given depth at which we give up, take the point count at face value, and propagate that information back up the call stack.
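The whole recursion can be sketched in a few lines, with the game tree as nested lists whose leaves are point counts (a toy representation of my own, not the engine's actual board logic):

```python
def minimax(node, maximizing):
    """Recursive game-tree search: leaves are point counts taken at face
    value; at internal nodes, the player to move picks the best child."""
    if isinstance(node, int):  # leaf position: take the score at face value
        return node
    scores = (minimax(child, not maximizing) for child in node)
    return max(scores) if maximizing else min(scores)

# The queen-trade scenario as a toy tree. Branch one: capture the queen,
# but the only reply recaptures, leaving a wash (0). Branch two: a quiet
# move, after which the minimizer's best reply still leaves us up 1.
# Looking one ply deeper reverses the naive preference for the capture.
tree = [[0], [1, 2]]
print(minimax(tree, True))  # → 1: the "quiet" branch wins after lookahead
```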

So, that's how you play chess. I want to tell you about two more philosophical insights I learned from this endeavor.

First, on the emergence of instrumental goals. Some decision theorists like to distinguish between "terminal" goals and "instrumental" goals. Terminal goals are things that you want to achieve for their own sake—for example, love, or happiness, or winning a chess game. Whereas instrumental goals are things that you want to achieve *because* they lead to terminal goals: for example, washing your hair, or getting enough sleep, or capturing one of your opponent's pawns.

Chess enthusiasts have names for special board situations that are advantageous for a player.

For example, when a piece is in a position to attack two others, that's called a "fork", or when one piece moves out of the way to "reveal" an attack by another, that's called a "discovered attack."

When observing a chess engine's behavior, it's very tempting to interpret it in such "psychological" terms, as: "Oh, it's 'trying' to set up a fork; it 'wants' to set up a discovered attack."

But it *can't* be—*literally* can't be—because those *concepts* aren't *represented* anywhere in the algorithm! The code is just brute-forcing the game tree to find sequences of moves that result in capturing material. Humans don't have the raw computational power to do this efficiently, so we tend to notice features of board situations that lead to capturing material and give them special names, and treat them as instrumental goals to be sought out—as, indeed, the piece-counting score in our chess engine is just an instrumental goal that happens to typically be useful towards the terminal goal of checkmate.

Similarly, if you could do a God's-eye-view brute-force search for the optimal paths through a human life, *many* such paths would, as a statistical regularity, happen to involve getting enough sleep—and if you don't have enough computational power, you might just want to treat that as an instrumental, tactical goal to reason about directly.

Second insight! On counterfactual reasoning. The adversarial, recursive nature of this "my best move *given* her best move *given* my best move" *&c.* reasoning leads to some behavior that looks *very* strange compared to how you would reason about optimizing an environment that *isn't* intelligently opposing your goals. If you're not facing an intelligent opponent, you should just make plans to directly accomplish your goals, and in particular, you wouldn't bother trying things that you can *predict* won't happen: you wouldn't bother packing your suitcase if you didn't intend to go anywhere.

On the other hand, maybe you *would* bother loading a gun even if you didn't intend to fire it. When facing an intelligent opponent, you need to take into account how your choices affect your opponent's choices. This leads our algorithm to set up attacks that it *predicts* won't be realized, because the credible *threat* constrains the opposing player's choices.

This position came up in a game with my coworkers as part of the engine's planning in a scenario where Black's previous move was bishop to f5—

Here, the engine's predicted move for Black is knight to g3. At first glance, this looked crazy to me: why would you move the knight to be diagonally in front of those pawns that could capture it?

And of course, what's actually happening is that moving the knight reveals a discovered attack of the black bishop on f5 against the white queen on c2.

Saving the queen is more important to White than capturing the black knight, allowing Black to use *her* next turn to capture the white rook on h1.

But this is pretty weird, right? The algorithm has gone to all this trouble to set up a discovered attack on the white queen—in order to capture the white *rook*, not the queen!

This kind of behavior has analogues in real life whenever you have situations where different agents, different systems, have conflicting goals and can respond to each other's behavior. If people can *predict* that *if* they were to commit crimes, *then* they would be punished—that incentivizes them to obey the law in the first place: the *threat* of punishment is shaping the population's behavior even if no one is actually going to be punished for that very reason.

There's an old joke about a UC Santa Cruz student sprinkling powder outside her dorm, who, when questioned, responds, "Oh, this? It's elephant repellent!"

The questioner replies, "But there aren't any elephants in Santa Cruz!"

The student counterreplies, "Well, that's how you know it's working!"

But you see, sometimes, that actually is the explanation. Thank you.

Groups! A group is a set with an associative binary operation such that there exists an identity element and inverse elements! And my *favorite* thing about groups is that all the time that you spend thinking about groups, is time that you're *not* thinking about pain, betrayal, politics, or moral uncertainty!

Groups have subgroups, which you can totally guess just from the name are subsets of the group that themselves satisfy the group axioms!

The *order* of a finite group is its number of elements, but this is not to be confused with the order of an *element* of a group, which is the smallest positive integer such that the element raised to that power equals the identity! Both senses of "order" are indicated with vertical bars like an absolute value (|*G*|, |*a*|).

Lagrange proved that the order of a subgroup divides the order of the group of which it is a subgroup! History remains ignorant of how often Lagrange cried.

To show that a nonempty subset *H* of a group is in fact a subgroup, it suffices to show that if *x*, *y* ∈ *H*, then *xy*⁻¹ ∈ *H*.
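As an illustration (my own toy example, not the book's), here's that criterion checked by brute force in the additive group ℤ/12, where the group operation is addition mod 12, *y*⁻¹ is −*y*, and so *xy*⁻¹ becomes (*x* − *y*) mod 12:

```python
# One-step subgroup criterion, brute-forced in Z/12 under addition.
# (Illustrative only; works for any subset of Z/n.)

def is_subgroup(H, n=12):
    """Nonempty H ⊆ Z/n passes iff (x − y) mod n stays in H."""
    return bool(H) and all((x - y) % n in H for x in H for y in H)

print(is_subgroup({0, 3, 6, 9}))  # multiples of 3: True (and 4 divides 12)
print(is_subgroup({0, 3, 6}))     # 3 − 6 ≡ 9 escapes the set: False
```

Note that the first example also illustrates Lagrange: the subgroup's order, 4, divides 12.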

Exercise #6 in §2.1 of Dummit and Foote *Abstract Algebra* (3rd ed'n) asks us to prove that if *G* is a commutative ("abelian") group, then the *torsion subgroup* {*g* ∈ *G* | |*g*| < ∞} is in fact a subgroup. I argue as follows: we need to show that if *x* and *y* have finite order, then so does *xy*⁻¹, that is, that (*xy*⁻¹)^*n* equals the identity for some positive integer *n*. But (*xy*⁻¹)^*n* equals (*xy*⁻¹)(*xy*⁻¹)...(*xy*⁻¹), "*n* times"—that is, pretend *n* ≥ 3, and pretend that instead of "..." I wrote zero or more extra copies of "(*xy*⁻¹)" so that the expression has *n* factors. (I usually dislike it when authors use ellipsis notation, which feels so icky and informal compared to a nice Π or Σ, but let me have this one.) Because group operations are associative, we can drop the parens to get *xy*⁻¹ *xy*⁻¹ ... *xy*⁻¹. And because we said the group was commutative, we can reörder the factors to get *xxx*...*y*⁻¹*y*⁻¹*y*⁻¹, and *then* we can consolidate into powers to get *x*^*n* *y*^(−*n*)—but that's the identity if *n* is the least common multiple of |*x*| and |*y*|, which means that *xy*⁻¹ has finite order, which is what I've been trying to tell you this entire time.
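A quick numerical sanity check of that last step, again in additive notation in ℤ/12 (the example values are mine):

```python
from math import gcd, lcm  # math.lcm requires Python 3.9+

def order(a, n=12):
    """Order of a in the additive group Z/n: least k ≥ 1 with k·a ≡ 0,
    which works out to n / gcd(a, n)."""
    return n // gcd(a, n)

# In additive notation, (xy⁻¹)ⁿ becomes n·(x − y); with n = lcm(|x|, |y|),
# n·x and n·y each vanish, so commutativity kills the whole expression.
x, y = 8, 9                   # |8| = 3 and |9| = 4 in Z/12
n = lcm(order(x), order(y))   # n = 12
print(n, (n * (x - y)) % 12)  # → 12 0: so xy⁻¹ has finite order
```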

A toy example for illustration: if you try to Forgive a three-digit integer with a 2 in the tens place, the moral force of your Forgiveness needs to spread out to cover all 9·10 = 90 possibilities (120, 121, ... 928, 929), which dilutes the amount of Forgiveness received by each integer—except the actual situation is *far* more extreme, because real-world sins are *vastly* more complicated than integers.

To truly Forgive a sin, You need to know *exactly* what the sin was and *exactly* why it happened. In order to withhold punishment, you need to compute what the optimal punishment *would* have been, had you been less merciful.

Thus, bounded agents can only approximate true Forgiveness, and even a poor approximation (*far* below the theoretical limits imposed by quantum uncertainty, which are themselves far below Absolute Forgiveness under the moral law) can be extremely computationally expensive. What we cannot afford to Forgive—where it would be impractical to mourn for weeks and months, analyzing the darkness in pain—we instead Forget.

This is how I will stop being trash, after five months of being trash. The program that sings, *I was wrong; I was wrong—even if my cause was just, I was wrong*, does not terminate. Even as the moral law requires that it finishes its work, the economic law does not permit it: it *must* be killed, its resources reallocated to something else that helps pay the rent: something like math, or whatever Wellness can exist in the presence of sin.

Say you have a biased coin that comes up Heads 80% of the time. (I like to imagine that the Heads side has a portrait of Bernoulli.) Flip it 100 times. The naïve way to report the outcome—just report the sequences of Headses and Tailses—costs 100 bits. But maybe you don't have 100 bits. What to do?

One thing to notice is that because it was a biased coin, some bit sequences are *vastly* more probable than others: "all Tails" has probability 0.2^{100} ≈ 1.268 · 10^{−70}, whereas "all Heads" has probability 0.8^{100} ≈ 2.037 · 10^{−10}, differing by *sixty orders of magnitude*!!
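If you want to double-check those numbers (I did), a few lines of Python suffice:

```python
import math

# Probabilities of the two extreme 100-flip sequences for a coin
# with P(Heads) = 0.8.
p_all_tails = 0.2 ** 100
p_all_heads = 0.8 ** 100
print(f"{p_all_tails:.3e}")  # → 1.268e-70
print(f"{p_all_heads:.3e}")  # → 2.037e-10
print(round(math.log10(p_all_heads / p_all_tails)))  # → 60
```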

Even though "all Heads" is the uniquely most probable sequence, you'd still be pretty surprised to see it—there's only *one* such possible outcome, and it only happens a 2.037 · 10^{−10}th of the time. You *probably* expect to get a sequence with *about* twenty Tails in it, and there are *lots* of those, even though each individual one is less probable than "all Heads."

Call the number of times we flip our Bernoulli coin *N*, and call the entropy of the coinflip *H*. (For the 80/20 biased coin, *H* is ⅕ lg 5 + 4/5 lg 5/4 ≈ 0.7219.)
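Here's that entropy computed directly from the general formula *H* = −*p* lg *p* − (1 − *p*) lg(1 − *p*):

```python
import math

def entropy(p):
    """Shannon entropy in bits of a coin with P(Heads) = p."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(round(entropy(0.8), 4))  # → 0.7219, matching ⅕ lg 5 + 4/5 lg 5/4
```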

It turns out for sufficiently large *N* (I know, one of *those* theorems, right?), *almost all* of the probability mass is going to live in a subset of 2^{NH} outcomes, each of which has a probability close to 2^{−NH} (and you'll notice that 2^{NH} · 2^{−NH} = 1).
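And a numerical spot-check of that claim for our *N* = 100 coin (the "within 10 Tails of expected" window is my own crude stand-in for the typical set):

```python
import math

N, p = 100, 0.8
H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Probability mass of sequences whose Tails count is near the expected
# 20 (here, between 10 and 30): this captures almost everything.
mass = sum(math.comb(N, k) * (1 - p) ** k * p ** (N - k)
           for k in range(10, 31))
print(mass > 0.98)            # → True

# ...yet those typical sequences are a vanishing fraction of all 2^N:
print(2 ** (N * H) / 2 ** N)  # ≈ 4e-9
```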