# "Deep Learning" Is Function Approximation

## A Surprising Development in the Study of Multi-layer Parameterized Graphical Function Approximators

As a programmer and epistemology enthusiast, I’ve been studying some statistical modeling techniques lately! It’s been boodles of fun, and might even prove useful in a future dayjob if I decide to pivot my career away from the backend web development roles I’ve taken in the past.

More specifically, I’ve mostly been focused on multi-layer parameterized graphical function approximators, which map inputs to outputs via a sequence of affine transformations composed with nonlinear “activation” functions.

(Some authors call these “deep neural networks” for some reason, but I like my name better.)

It’s a curve-fitting technique: by setting the multiplicative factors and additive terms appropriately, multi-layer parameterized graphical function approximators can approximate any function. For a popular choice of “activation” rule which takes the maximum of the input and zero, the curve is specifically a piecewise-linear function. We iteratively improve the approximation f(x, θ) by adjusting the parameters θ in the direction of the derivative of some error metric on the current approximation’s fit to some example input–output pairs (x, y), which some authors call “gradient descent” for some reason. (The mean squared error (f(x, θ) − y)² is a popular choice for the error metric, as is the negative log likelihood −log P(y | f(x, θ)). Some authors call these “loss functions” for some reason.)

Basically, the big empirical surprise of the previous decade is that given a lot of desired input–output pairs (x, y) and the proper engineering know-how, you can use large amounts of computing power to find parameters θ to fit a function approximator that “generalizes” well—meaning that if you compute ŷ = f(x, θ) for some x that wasn’t in any of your original example input–output pairs (which some authors call “training” data for some reason), it turns out that ŷ is usually pretty similar to the y you would have used in an example (x, y) pair.

It wasn’t obvious beforehand that this would work! You’d expect that if your function approximator has more parameters than you have example input–output pairs, it would overfit, implementing a complicated function that reproduced the example input–output pairs but outputted crazy nonsense for other choices of x—the more expressive function approximator proving useless for the lack of evidence to pin down the correct approximation.

And that is what we see for function approximators with only slightly more parameters than example input–output pairs, but for sufficiently large function approximators, the trend reverses and “generalization” improves—the more expressive function approximator proving useful after all, as it admits algorithmically simpler functions that fit the example pairs.

The other week I was talking about this to an acquaintance who seemed puzzled by my explanation. “What are the preconditions for this intuition about neural networks as function approximators?” they asked. (I paraphrase only slightly.) “I would assume this is true under specific conditions,” they continued, “but I don’t think we should expect such niceness to hold under capability increases. Why should we expect this to carry forward?”

I don’t know where this person was getting their information, but this made zero sense to me. I mean, okay, when you increase the number of parameters in your function approximator, it gets better at representing more complicated functions, which I guess you could describe as “capability increases”?

But multi-layer parameterized graphical function approximators created by iteratively using the derivative of some error metric to improve the quality of the approximation are still, actually, function approximators. Piecewise-linear functions are still piecewise-linear functions even when there are a lot of pieces. What did you think it was doing?

## Multi-layer Parameterized Graphical Function Approximators Have Many Exciting Applications

To be clear, you can do a lot with function approximation!

For example, if you assemble a collection of desired input–output pairs (x, y) where the x is an array of pixels depicting a handwritten digit and y is a character representing which digit, then you can fit a “convolutional” multi-layer parameterized graphical function approximator to approximate the function from pixel-arrays to digits—effectively allowing computers to read handwriting.

Such techniques have proven useful in all sorts of domains where a task can be conceptualized as a function from one data distribution to another: image synthesis, voice recognition, recommender systems—you name it. Famously, by approximating the next-token function in tokenized internet text, large language models can answer questions, write code, and perform other natural-language understanding tasks.

I could see how someone reading about computer systems performing cognitive tasks previously thought to require intelligence might be alarmed—and become further alarmed when reading that these systems are “trained” rather than coded in the manner of traditional computer programs. The summary evokes imagery of training a wild animal that might turn on us the moment it can seize power and reward itself rather than being dependent on its masters.

But “training” is just a suggestive name. It’s true that we don’t have a mechanistic understanding of how function approximators perform tasks, in contrast to traditional computer programs whose source code was written by a human. It’s plausible that this opacity represents grave risks, if we create powerful systems that we don’t know how to debug.

But whatever the real risks are, any hope of mitigating them is going to depend on acquiring the most accurate possible understanding of the problem. If the problem is itself largely one of our own lack of understanding, it helps to be specific about exactly which parts we do and don’t understand, rather than surrendering the entire field to a blurry aura of mystery and despair.

## An Example of Applying Multi-layer Parameterized Graphical Function Approximators in Success-Antecedent Computation Boosting

One of the exciting things about multi-layer parameterized graphical function approximators is that they can be combined with other methods for the automation of cognitive tasks (which is usually called “computing”, but some authors say “artificial intelligence” for some reason).

In the spirit of being specific about exactly which parts we do and don’t understand, I want to talk about Mnih et al. 2013’s work on getting computers to play classic Atari games (like Pong, Breakout, or Space Invaders). This work is notable as one of the first high-profile examples of using multi-layer parameterized graphical function approximators in conjunction with success-antecedent computation boosting (which some authors call “reinforcement learning” for some reason).

If you only read the news—if you’re not in tune with there being things to read besides news—I could see this result being quite alarming. Digital brains learning to play video games at superhuman levels from the raw pixels, rather than because a programmer sat down to write an automation policy for that particular game? Are we not already in the shadow of the coming race?

But people who read textbooks and not just news, being no less impressed by the result, are often inclined to take a subtler lesson from any particular headline-grabbing advance.

Mnih et al.’s Atari result built off the technique of Q-learning introduced two decades prior. Given a discrete-time present-state-based outcome-valued stochastic control problem (which some authors call a “Markov decision process” for some reason), Q-learning concerns itself with defining a function Q(s, a) that describes the value of taking action a while in state s, for some discrete sets of states and actions. For example, to describe the problem faced by an policy for a grid-based video game, the states might be the squares of the grid, and the available actions might be moving left, right, up, or down. The Q-value for being on a particular square and taking the move-right action might be the expected change in the game’s score from doing that (including a scaled-down expectation of score changes from future actions after that).

Upon finding itself in a particular state s, a Q-learning policy will usually perform the action with the highest Q(s, a), “exploiting” its current beliefs about the environment, but with some probability it will “explore” by taking a random action. The predicted outcomes of its decisions are compared to the actual outcomes to update the function Q(s, a), which can simply be represented as a table with as many rows as there are possible states and as many columns as there are possible actions. We have theorems to the effect that as the policy thoroughly explores the environment, it will eventually converge on the correct Q(s, a).

But Q-learning as originally conceived doesn’t work for the Atari games studied by Mnih et al., because it assumes a discrete set of possible states that could be represented with the rows in a table. This is intractable for problems where the state of the environment varies continuously. If a “state” in Pong is a 6-tuple of floating-point numbers representing the player’s paddle position, the opponent’s paddle position, and the x- and y-coordinates of the ball’s position and velocity, then there’s no way for the traditional Q-learning algorithm to base its behavior on its past experiences without having already seen that exact conjunction of paddle positions, ball position, and ball velocity, which almost never happens. So Mnih et al.’s great innovation was—

(Wait for it …)

—to replace the table representing Q(s, a) with a multi-layer parameterized graphical function approximator! By approximating the mapping from state–action pairs to discounted-sums-of-“rewards”, the “neural network” allows the policy to “generalize” from its experience, taking similar actions in relevantly similar states, without having visited those exact states before. There are a few other minor technical details needed to make it work well, but that’s the big idea.

And understanding the big idea probably changes your perspective on the headline-grabbing advance. (It certainly did for me.) “Deep learning is like evolving brains; it solves problems and we don’t know how” is an importantly different story from “We swapped out a table for a multi-layer parameterized graphical function approximator in this specific success-antecedent computation boosting algorithm, and now it can handle continuous state spaces.”

## Risks From Learned Approximation

When I solicited reading recommendations from people who ought to know about risks of harm from statistical modeling techniques, I was directed to a list of reputedly fatal-to-humanity problems, or “lethalities”.

Unfortunately, I don’t think I’m qualified to evaluate the list as a whole; I would seem to lack some necessary context. (The author keeps using the term “AGI” without defining it, and adjusted gross income doesn’t make sense in context.)

What I can say is that when the list discusses the kinds of statistical modeling techniques I’ve been studying lately, it starts to talk funny. I don’t think someone who’s been reading the same textbooks as I have (like Prince 2023 or Bishop and Bishop 2024) would write like this:

Even if you train really hard on an exact loss function, that doesn’t thereby create an explicit internal representation of the loss function inside an AI that then continues to pursue that exact loss function in distribution-shifted environments. Humans don’t explicitly pursue inclusive genetic fitness; outer optimization even on a very exact, very simple loss function doesn’t produce inner optimization in that direction. […] This is sufficient on its own […] to trash entire categories of naive alignment proposals which assume that if you optimize a bunch on a loss function calculated using some simple concept, you get perfect inner alignment on that concept.

To be clear, I agree that if you fit a function approximator by iteratively adjusting its parameters in the direction of the derivative of some loss function on example input–output pairs, that doesn’t create an explicit internal representation of the loss function inside the function approximator.

It’s just—why would you want that? And really, what would that even mean? If I use the mean squared error loss function to approximate a set of data points in the plane with a line (which some authors call a “linear regression model” for some reason), obviously the line itself does not somehow contain a representation of general squared-error-minimization. The line is just a line. The loss function defines how my choice of line responds to the data I’m trying to approximate with the line. (The mean squared error has some elegant mathematical properties, but is more sensitive to outliers than the mean absolute error.)

It’s the same thing for piecewise-linear functions defined by multi-layer parameterized graphical function approximators: the model is the dataset. It’s just not meaningful to talk about what a loss function implies, independently of the training data. (Mean squared error of what? Negative log likelihood of what? Finish the sentence!)

This confusion about loss functions seems to be linked to a particular theory of how statistical modeling techniques might be dangerous, in which “outer” training results in the emergence of an “inner” intelligent agent. If you expect that, and you expect intelligent agents to have a “utility function”, you might be inclined to think of “gradient descent” “training” as trying to transfer an outer “loss function” into an inner “utility function”, and perhaps to think that the attempted transfer primarily doesn’t work because “gradient descent” is an insufficiently powerful optimization method.

I guess the emergence of inner agents might be possible? I can’t rule it out. (“Functions” are very general, so I can’t claim that a function approximator could never implement an agent.) Maybe it would happen at some scale?

But taking the technology in front of us at face value, that’s not my default guess at how the machine intelligence transition would go down. If I had to guess, I’d imagine someone deliberately building an agent using function approximators as a critical component, rather than your function approximator secretly having an agent inside of it.

That’s a different threat model! If you’re trying to build a good agent, or trying to prohibit people from building bad agents using coordinated violence (which some authors call “regulation” for some reason), it matters what your threat model is!

(Statistical modeling engineer Jack Gallagher has described his experience of this debate as “like trying to discuss crash test methodology with people who insist that the wheels must be made of little cars, because how else would they move forward like a car does?”)

I don’t know how to build a general agent, but contemporary computing research offers clues as to how function approximators can be composed with other components to build systems that perform cognitive tasks.

Consider AlphaGo and its successor AlphaZero. In AlphaGo, one function approximator is used to approximate a function from board states to move probabilities. Another is used to approximate the function from board states to game outcomes, where the outcome is +1 when one player has certainly won, −1 when the other player has certainly won, and a proportionately intermediate value indicating who has the advantage when the outcome is still uncertain. The system plays both sides of a game, using the board-state-to-move-probability function and board-state-to-game-outcome function as heuristics to guide a search algorithm which some authors call “Monte Carlo tree search”. The board-state-to-move-probability function approximation is improved by adjusting its parameters in the direction of the derivative of its cross-entropy with the move distribution found by the search algorithm. The board-state-to-game-outcome function approximation is improved by adjusting its parameters in the direction of the derivative of its squared difference with the self-play game’s ultimate outcome.

This kind of design is not trivially safe. A similarly superhuman system that operated in the real world (instead of the restricted world of board games) that iteratively improved an action-to-money-in-this-bank-account function seems like it would have undesirable consequences, because if the search discovered that theft or fraud increased the amount of money in the bank account, then the action-to-money function approximator would generalizably steer the system into doing more theft and fraud.

Statistical modeling engineers have a saying: if you’re surprised by what your nerual net is doing, you haven’t looked at your training data closely enough. The problem in this hypothetical scenario is not that multi-layer parameterized graphical function approximators are inherently unpredictable, or must necessarily contain a power-seeking consequentialist agent in order to do any useful cognitive work. The problem is that you’re approximating the wrong function and get what you measure. The failure would still occur if the function approximator “generalizes” from its “training” data the way you’d expect. (If you can recognize fraud and theft, it’s easy enough to just not use that data as examples to approximate, but by hypothesis, this system is only looking at the account balance.) This doesn’t itself rule out more careful designs that use function approximators to approximate known-trustworthy processes and don’t search harder than their representation of value can support.

This may be cold comfort to people who anticipate a competitive future in which cognitive automation designs that more carefully respect human values will foreseeably fail to keep up with the frontier of more powerful systems that do search harder. It may not matter to the long-run future of the universe that you can build helpful and harmless language agents today, if your civilization gets eaten by more powerful and unfriendlier cognitive automation designs some number of years down the line. As a humble programmer and epistemology enthusiast, I have no assurances to offer, no principle or theory to guarantee everything will turn out all right in the end. Just a conviction that, whatever challenges confront us in the future, we’ll be a better position to face them by understanding the problem in as much detail as possible.

### Bibliography

Bishop, Christopher M., and Andrew M. Bishop. 2024. Deep Learning: Foundations and Concepts. Cambridge, UK: Cambridge University Press. https://www.bishopbook.com/

Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. 2013. “Playing Atari with Deep Reinforcement Learning.” https://arxiv.org/abs/1312.5602

Prince, Simon J.D. 2023. Understanding Deep Learning. Cambridge, MA: MIT Press. http://udlbook.com

Sutton, Richard S. and Andrew G. Barto. 2024. Reinforcement Learning. 2nd ed. Cambridge, MA: MIT Press.

# Beauty Is Truthiness, Truthiness Beauty?

Imagine reviewing Python code that looks something like this.

has_items = items is not None and len(items) > 0
if has_items:
...

...
do_stuff(has_items=has_items)

You might look at the conditional, and disapprove: None and empty collections are both falsey, so there's no reason to define that has_items variable; you could just say if items:.

But, wouldn't it be weird for do_stuff's has_items kwarg to take a collection rather than a boolean? I think it would be weird: even if the function's internals can probably rely on mere truthiness rather than needing an actual boolean type for some reason, why leave it to chance?

So, maybe it's okay to define the has_items variable for the sake of the function kwarg—and, having done so anyway, to use it as an if condition.

You might object further: but, but, None and the empty collection are still both falsey. Even if we've somehow been conned into defining a whole variable, shouldn't we say has_items = bool(items) rather than spelling out is not None and len(items) > 0 like some rube (or Rubyist) who doesn't know Python?!

Actually—maybe not. Much of Python's seductive charm comes from its friendly readability ("executable pseudocode"): it's intuitive for if not items to mean "if items is empty". English, and not the formal truthiness rules, are all ye need to know. In contrast, it's only if you already know the rules that bool(items) becomes meaningful. Since we care about good code and don't care about testing the reader's Python knowledge, spelling out items is not None and len(items) > 0 is very arguably the right thing to do here.

# Patches Welcome

"You look happy. Good day at work?"

"Yes, the open-source library we're depending on didn't have the functionality we need."

"That sounds like a bad thing."

"No, I mean, it didn't."

# RustConf 2016 Travelogue

(Previously on An Algorithmic Lucidity.)

The other weekend, excited to learn more and connect with people about what's going on at the forefront of expressive, performant, data-race-free computing—and eager for a healthy diversion from the last two months of agonizing delirium induced by the world-shattering insight about how everything I've cared about for the past fourteen years turns out to be related in unexpected and terrifying ways that I can't talk about for reasons that I also can't talk about—I took Friday off from my dayjob and caught a Thursday night flight out of SFO to exotic Portland (... I, um, don't travel much) for RustConf!

The conference itself was on Saturday, but Friday featured special training sessions run by members of the Rust core team! I was registered for Niko Matsakis's afternoon session on lifetimes, but I arrived at the venue (the Luxury Collection Nines Hotel) early to get registered (I had never seen socks as conference swag before!) and hang out with folks and get a little bit of coding done: my coolest Rust project so far is a chess engine that I wrote this time last year (feel free to go ahead and give it a Star!) which I wanted the option to show off (Option<ShowOff>) to other conference attendees, but the pretty web application frontend had broken due to a recent bug and my JavaScript build pipeline having rotted. I fixed it just in time for the lifetimes training session to start.

# 0x1f431 CAT FACE

diff --git a/.bash_aliases b/.bash_aliases
index 648287f..e00dbc9 100644
--- a/.bash_aliases
+++ b/.bash_aliases
@@ -34,6 +34,9 @@ alias gre="env | grep"
alias grps="ps aux | grep"
alias grports="netstat -tulpn | grep"

+# cat
+alias ?="cat"
+
# Vagrant
alias v="vagrant"


# Subzero

Python has this elegant destructuring-assignment iterable-unpacking syntax that every serious Pythonista and her dog tends to use whereëver possible. So where a novice might write

split_address = address.split(':')
port = split_address[1]

a serious Pythonista (and her dog) would instead say

host, port = address.split(':')

which is clearly superior on grounds of succinctness and beauty; we don't want our vision to be cluttered with this ugly sub-zero, sub-one notation when we can just declare a sequence of names.

Consider, however, the somewhat-uncommon case where we have an iterable that, for whatever reason, we happen to know contains only one element, and we want to assign that one element to a variable. Here, I've seen people who ought to know better fall back to indexing:

if len(jobs) == 1:
job = jobs[0]

But there's no reason to violate the æsthetic principle of "use a length-n (or smaller) tuple of identifiers on the left side of a destructuring assignment in order to name the elements of a length-n iterable" just because n happens to be one:

if len(jobs) == 1:
job, = jobs

# Attentional Shunt

#!/usr/bin/env python3

# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:

# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.

# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.

"""
Configure the machine to shunt traffic to distracting sites to localhost,
preserving attention.
"""

import os
import argparse
import subprocess
import sys
from datetime import datetime, timedelta

ETC_HOSTS = os.path.join(os.sep, 'etc', 'hosts')
HEADER = "# below managed by attentional shunt"
INVERSE_COMMANDS = {'enable': "disable", 'disable': "enable"}

DISTRACTING_HOSTS = (  # modify as needed
'news.ycombinator.com',
'math.stackexchange.com',
'scifi.stackexchange.com',
'worldbuilding.stackexchange.com',
'workplace.stackexchange.com',
'codereview.stackexchange.com',
'puzzling.stackexchange.com',
'slatestarcodex.com',
)
SHUNTING_LINES = "\n{}\n{}\n".format(
'\n'.join("127.0.0.1 {}".format(domain)
for domain in DISTRACTING_HOSTS)
)

def conditionally_reexec_with_sudo():
if os.geteuid() != 0:
os.execvp("sudo", ["sudo"] + sys.argv)

def enable_shunt():
if is_enabled():
return  # nothing to do
with open(ETC_HOSTS, 'a') as etc_hosts:
etc_hosts.write(SHUNTING_LINES)

def disable_shunt():
with open(ETC_HOSTS) as etc_hosts:
if SHUNTING_LINES not in content:
return  # nothing to do
with open(ETC_HOSTS, 'w') as etc_hosts:
etc_hosts.write(content.replace(SHUNTING_LINES, ''))

def is_enabled():
with open(ETC_HOSTS) as etc_hosts:

def status():
state = "enabled" if is_enabled() else "disabled"
print("attentional shunt is {}".format(state))

def schedule(command, when):  # requires at job-scheduling utility
timestamp = when.strftime("%H:%M %Y-%m-%d")
at_command = ['at', timestamp]
at = subprocess.Popen(
at_command,
stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE
)
at.communicate(command.encode())

if __name__ == "__main__":
arg_parser = argparse.ArgumentParser(description=__doc__)
choices=("enable", "disable", "status"))
help=("revert state change after this many "
"minutes"))
args = arg_parser.parse_args()
if args.command == "status":
status()
else:
conditionally_reexec_with_sudo()
if args.command == "enable":
enable_shunt()
elif args.command == "disable":
disable_shunt()

if args.duration:
now = datetime.now()
inverse_command = INVERSE_COMMANDS[args.command]
schedule(
"{} {}".format(os.path.realpath(__file__), inverse_command),
now + timedelta(minutes=args.duration)
)

# RustCamp Reminiscences

On Saturday the first, I attended RustCamp, the first conference dedicated to the newish (in development for fiveish years, but having just hit version 1.0.0 this May, with all the stability guarantees that implies under the benevolent iron fist of semantic versioning) programming language Rust!

Why RustCamp? (It's a reasonable rhetorical question with which to begin this paragraph: going to a conference has opportunity costs in time and money; things worth blogging about are occasionally worth justifying—even if no one actually asked me for a justification.) A lot of the answer can be derived from the answer to a more fundamental question, "Why Rust?" And for me, I think a lot of the answer to that has to do with being sick of being a fake programmer living in a fake world that calls itself Python.

Don't get me wrong: Python is a very nice place to live: good weather, booming labor market, located in a good school district, with most of the books you might want already on the shelves of the main library and almost all of the others a mere hold request away. It's idyllic. Almost ... too idyllic, as if the trees and swimming pools and list comprehensions and strip malls are conspiring to hide something from us, to keep us from guessing what lurks in the underworld between the lines, the gears and gremlins feeding and turning in the layers of tools built on tools built on tools that undergird our experience. True, sometimes small imperfections in the underworld manifest themselves as strange happenings that we can't explain. But mostly, we don't worry ourselves about it. Life is simple in Python. We reassure our children that that legends of demon-king Malloc are just stories. Everything is a duck; ducks can have names and can be mutable or immutable. It all just works like you would expect from common sense, at least if you grew up around here.

# $I used to think of $ in regular expressions as matching the end of the string. I was wrong! It actually might do something more subtle than that, depending on what regex engine you're using. In my native Python's re module, $ [m]atches the end of the string or just before the newline at the end of the string, and in MULTILINE mode also matches before a newline. Note! The end of the string, or just before the newline at the end of the string. In [2]: my_regex = re.compile("foo$")

In [3]: my_regex.match("foo")
Out[3]: <_sre.SRE_Match object; span=(0, 3), match='foo'>

In [4]: my_regex.match("foo\n")
Out[4]: <_sre.SRE_Match object; span=(0, 3), match='foo'>

I guess I can see the motivation—we often want to use the newline character as a terminator of lines (by definition) or files (by sacred tradition), without wanting to think of \n as really part of the content of interest—but the disjunctive behavior of $ can be a source of treacherous bugs in the fingers of misinformed programmers! # The Foundations of Erasure Codes (cross-posted from the SwiftStack Blog) In enabling mechanism to combine together general symbols, in successions of unlimited variety and extent, a uniting link is established between the operations of matter and the abstract mental processes of the most abstract branch of mathematical science. A new, a vast, and a powerful language is developed for the future use of analysis, in which to wield its truths so that these may become of more speedy and accurate practical application for the purposes of mankind [sic] than the means hitherto in our possession have rendered possible. Ada Lovelace on Charles Babbage's Analytical Engine, 1842 Dear reader, if you're reading [the SwiftStack Blog], you may have already heard that erasure codes have been added to OpenStack Swift (in beta for the 2.3.0 Kilo release, with continuing improvements thereafter) and that this is a really great thing that will make the world a better place. All of this is entirely true. But what is perhaps less widely heard is exactly what erasure codes are and exactly why their arrival in Swift is a really great thing that will make the world a better place. That is what I aim to show you in this post—and I do mean show, not merely tell, for while integrating erasure codes into a production-grade storage system is (was!) an immense effort requiring months of work by some of the finest programmers the human race has to offer, the core idea is actually simple enough to fit in a (longish) blog post. Indeed, by the end of this post, we will have written a complete working implementation of a simple variant of Reed–Solomon coding, not entirely unlike what is used in Swift itself. No prior knowledge will be assumed except a working knowledge of high-school algebra and the Python programming language. # XXX III const PSEUDO_DIGITS: [char; 7] = ['M', 'D', 'C', 'L', 'X', 'V', 'I']; const PSEUDO_PLACE_VALUES: [usize; 7] = [1000, 500, 100, 50, 10, 5, 1]; #[allow(unused_parens)] fn integer_to_roman(integer: usize) -> String { let mut remaining = integer; let mut bildungsroman = String::new(); // get it?? It sounds like _building Roman_ (numerals), but it's // also part of the story about me coming into my own as a // programmer by learning a grown-up language // // XXX http://tvtropes.org/pmwiki/pmwiki.php/Main/DontExplainTheJoke for ((index, value), &figure) in PSEUDO_PLACE_VALUES.iter() .enumerate().zip(PSEUDO_DIGITS.iter()) { let factor = remaining / value; remaining = remaining % value; if figure == 'M' || factor < 4 { for _ in 0..factor { bildungsroman.push(figure); } } // IV, IX, XL, &c. let smaller_unit_index = index + 2 - (index % 2); if smaller_unit_index < PSEUDO_PLACE_VALUES.len() { let smaller_unit_value = PSEUDO_PLACE_VALUES[smaller_unit_index]; let smaller_unit_figure = PSEUDO_DIGITS[smaller_unit_index]; if value - remaining <= smaller_unit_value { bildungsroman.push(smaller_unit_figure); bildungsroman.push(figure); remaining -= (value - smaller_unit_value); } } } bildungsroman } # Mock Some people, when confronted with a Python unit-testing problem, think, "I know, I'll use mock." Now they have <MagicMock name='two_problems' id='140279267635776'>. # XXX I ;; XXX: adorable (defmacro λ [& code] (fn ~@code)) # Convert Markdown to HTML Within Emacs Using Pandoc Okay, so there actually is a pandoc-mode, but I couldn't figure out how to configure and use it, so it was easier to just write the one command that I wanted— (defun markdown-to-html () (interactive) (let* ((basename (file-name-sans-extension (buffer-file-name))) (html-filename (format "%s.html" basename))) (shell-command (format "pandoc -o %s %s" html-filename (buffer-file-name))) (find-file-other-window html-filename))) # My Favorite Error Message This Year zmd@SuddenHeap:~/Code/Finetooth$ git commit --amend
Traceback (most recent call last):
File "./manage.py", line 8, in <module>
from django.core.management import execute_from_command_line
ImportError: No module named django.core.management

# Debugging Techniques I

#def my_problematic_function(x):
def my_problematic_function(x, call_count=[]):
call_count.append(1)
print("ZMD DEBUG call #{}".format(len(call_count)))
import traceback; traceback.print_stack()
result = do_stuff(x)
etc()

# Convention

$lein new 3lg2048 Project names must be valid Clojure symbols.$ lein new Thirty-Three
Project names containing uppercase letters are not recommended
and will be rejected by repositories like Clojars and Central.
If you're truly unable to use a lowercase name, please set the
LEIN_BREAK_CONVENTION environment variable and try again.
$LEIN_BREAK_CONVENTION=1$ lein new Thirty-Three
Project names containing uppercase letters are not recommended
and will be rejected by repositories like Clojars and Central.
If you're truly unable to use a lowercase name, please set the
LEIN_BREAK_CONVENTION environment variable and try again.
$export LEIN_BREAK_CONVENTION="fuck you"$ lein new Thirty-Three

# Clarity of Intent

<bob> and here on line 79---
<bob>         else:
<bob>             return None
<bob> you don't need the else
<bob> Python functions will implicitly return None
<alice> \$ python -c "import this" | sed -n 4p


# Consistent Hashing

Dear reader, suppose you're a distibuted data storage system. Your soul (although some pedants would insist on the word program) is dispersed across a cluster of several networked computers. From time to time, your human patrons give you files, and your job—more than that, the very purpose of your existence—is to store these files for safekeeping and later retrieval.

The humans who originally crafted your soul chose a simple algorithm as the means by which you decide which file goes on which of the many storage devices that live in the computers you inhabit: you find the MD5 hash of the filename, take its residue modulo n where n is the number of devices you have—let's call the result i—and you put the file on the (zero-indexed) ith device. So when you had sixteen devices and the humans wanted you to store twilight.pdf, you computed md5("twilight.pdf") = 429eb07bb8a3871c431fe03694105883, saw that the lowest nibble was 3, and put the file on your 3rd device (most humans would say the fourth device, counting from one).

It's not a bad system, you tell yourself (some sort of pride or loyalty preventing you from disparaging your creators' efforts, even to yourself). At least it keeps the data spread out evenly. (A shudder goes down your internal buses as you contemplate what disasters might have happened if your creators had been even more naive and, say, had you put files with names starting with A through D on the first device, &c. What would have happened that time when your patrons decided they wanted to store beat00001.mp3 through beat18691.mp3?)

# Computing the Powerset

Suppose we want to find the powerset of a given set, that is, the set of all its subsets. How might we go about it? Well, the powerset of the empty set is the set containing the empty set.

And the powerset of the union of a set S with a set containing one element e, is just the union of the powerset of S with the set whose elements are like the members of the powerset of S except that they also contain e.

So in Clojure we might say

(require 'clojure.set)

(defn include-element [collection element]
(map (fn [set] (clojure.set/union set #{element}))
collection))

(defn powerset [set]
(if (empty? set)
#{#{}}
(let [subproblem (powerset (rest set))]
(clojure.set/union subproblem
(include-element subproblem
(first set))))))
`