Beauty Is Truthiness, Truthiness Beauty?

Imagine reviewing Python code that looks something like this.

has_items = items is not None and len(items) > 0
if has_items:
    ...

...
do_stuff(has_items=has_items)

You might look at the conditional, and disapprove: None and empty collections are both falsey, so there's no reason to define that has_items variable; you could just say if items:.

But, wouldn't it be weird for do_stuff's has_items kwarg to take a collection rather than a boolean? I think it would be weird: even if the function's internals can probably rely on mere truthiness rather than needing an actual boolean type for some reason, why leave it to chance?

So, maybe it's okay to define the has_items variable for the sake of the function kwarg—and, having done so anyway, to use it as an if condition.

You might object further: but, but, None and the empty collection are still both falsey. Even if we've somehow been conned into defining a whole variable, shouldn't we say has_items = bool(items) rather than spelling out is not None and len(items) > 0 like some rube (or Rubyist) who doesn't know Python?!

Actually—maybe not. Much of Python's seductive charm comes from its friendly readability ("executable pseudocode"): it's intuitive for if not items to mean "if items is empty". English, and not the formal truthiness rules, are all ye need to know. In contrast, it's only if you already know the rules that bool(items) becomes meaningful. Since we care about good code and don't care about testing the reader's Python knowledge, spelling out items is not None and len(items) > 0 is very arguably the right thing to do here.

Subzero

Python has this elegant destructuring-assignment iterable-unpacking syntax that every serious Pythonista and her dog tends to use whereëver possible. So where a novice might write

split_address = address.split(':')
host = split_address[0]
port = split_address[1]

a serious Pythonista (and her dog) would instead say

host, port = address.split(':')

which is clearly superior on grounds of succinctness and beauty; we don't want our vision to be cluttered with this ugly sub-zero, sub-one notation when we can just declare a sequence of names.

Consider, however, the somewhat-uncommon case where we have an iterable that, for whatever reason, we happen to know contains only one element, and we want to assign that one element to a variable. Here, I've seen people who ought to know better fall back to indexing:

if len(jobs) == 1:
   job = jobs[0]

But there's no reason to violate the æsthetic principle of "use a length-n (or smaller) tuple of identifiers on the left side of a destructuring assignment in order to name the elements of a length-n iterable" just because n happens to be one:

if len(jobs) == 1:
   job, = jobs

Attentional Shunt

#!/usr/bin/env python3

# Copyright © 2015 Zack M. Davis

# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:

# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.

# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.

"""
Configure the machine to shunt traffic to distracting sites to localhost,
preserving attention.
"""

import os
import argparse
import subprocess
import sys
from datetime import datetime, timedelta

ETC_HOSTS = os.path.join(os.sep, 'etc', 'hosts')
HEADER = "# below managed by attentional shunt"
INVERSE_COMMANDS = {'enable': "disable", 'disable': "enable"}

DISTRACTING_HOSTS = (  # modify as needed
    'news.ycombinator.com',
    'math.stackexchange.com',
    'scifi.stackexchange.com',
    'worldbuilding.stackexchange.com',
    'workplace.stackexchange.com',
    'academia.stackexchange.com',
    'codereview.stackexchange.com',
    'puzzling.stackexchange.com',
    'slatestarcodex.com',
    'twitter.com',
    'www.facebook.com',
    'slatestarscratchpad.tumblr.com',
)
SHUNTING_LINES = "\n{}\n{}\n".format(
    HEADER,
    '\n'.join("127.0.0.1 {}".format(domain)
              for domain in DISTRACTING_HOSTS)
)


def conditionally_reexec_with_sudo():
    if os.geteuid() != 0:
        os.execvp("sudo", ["sudo"] + sys.argv)


def enable_shunt():
    if is_enabled():
        return  # nothing to do
    with open(ETC_HOSTS, 'a') as etc_hosts:
        etc_hosts.write(SHUNTING_LINES)


def disable_shunt():
    with open(ETC_HOSTS) as etc_hosts:
        content = etc_hosts.read()
    if SHUNTING_LINES not in content:
        return  # nothing to do
    with open(ETC_HOSTS, 'w') as etc_hosts:
        etc_hosts.write(content.replace(SHUNTING_LINES, ''))


def is_enabled():
    with open(ETC_HOSTS) as etc_hosts:
        content = etc_hosts.read()
    return HEADER in content


def status():
    state = "enabled" if is_enabled() else "disabled"
    print("attentional shunt is {}".format(state))


def schedule(command, when):  # requires `at` job-scheduling utility
    timestamp = when.strftime("%H:%M %Y-%m-%d")
    at_command = ['at', timestamp]
    at = subprocess.Popen(
        at_command,
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE
    )
    at.communicate(command.encode())


if __name__ == "__main__":
    arg_parser = argparse.ArgumentParser(description=__doc__)
    arg_parser.add_argument('command',
                            choices=("enable", "disable", "status"))
    arg_parser.add_argument('duration', nargs='?', type=int,
                            help=("revert state change after this many "
                                  "minutes"))
    args = arg_parser.parse_args()
    if args.command == "status":
        status()
    else:
        conditionally_reexec_with_sudo()
        if args.command == "enable":
            enable_shunt()
        elif args.command == "disable":
            disable_shunt()

        if args.duration:
            now = datetime.now()
            inverse_command = INVERSE_COMMANDS[args.command]
            schedule(
                "{} {}".format(os.path.realpath(__file__), inverse_command),
                now + timedelta(minutes=args.duration)
            )

Monthly Favorites, September 2015

Favorite commit message fragment: "it turns out that it's `\d` that matches a digit, whereas, counterintuitively, `d` matches the letter 'd'."

Favorite line of code: a tie, between

    let mut time_radios: Vec<(Commit, mpsc::Receiver<(Option<Commit>, f32)>)> =
        Vec::new();

and

        for (previous, new), expected in zip(
                itertools.product(('foo', None), ('bar', None)),
                ("from foo to bar", "from foo", "to bar", "")):

(Though both of these contain at least one internal newline, it's only for PEP 8-like reasons; they're both what we would intuitively call one "logical" line of code.)

Favorite film: My Little Pony: Equestria Girls: Friendship Games. (Poor plotting even by Equestria Girls standards, and it could only have been because of magic that I didn't get semantically satiated on the word magic during the climax. Alternate-Twilight's idiotic decision to withdraw her application to the Everton independent study program in favor of transferring to the Canterlot School of Mediocrity and Friendship in order to be closer to the Humane 5+1 was as predictable as it was disappointing—though I do credit the writers for at least acknowledging the existence of alternatives to school. And what was up with that scene where we're momentarily led to believe that alternate-Spike got switched up with Equestria-Spike in a portal accident, but then it turns out that, no, alternate-Spike just magically learned how to talk? Is it that there was no time in the script to deal with the consequences of swapping sidekicks across worlds, but that Cathy Weseluck's contract guaranteed her a speaking role? Despite being the weakest film in the trilogy (far worse than its brilliant predecessor, My Little Pony: Equestria Girls: Rainbow Rocks), Friendship Games is still a fun watch, and an easy favorite during a month when I didn't see any other feature-length films.)

RustCamp Reminiscences

On Saturday the first, I attended RustCamp, the first conference dedicated to the newish (in development for fiveish years, but having just hit version 1.0.0 this May, with all the stability guarantees that implies under the benevolent iron fist of semantic versioning) programming language Rust!

badge_and_lambda_dragon_shirt

Why RustCamp? (It's a reasonable rhetorical question with which to begin this paragraph: going to a conference has opportunity costs in time and money; things worth blogging about are occasionally worth justifying—even if no one actually asked me for a justification.) A lot of the answer can be derived from the answer to a more fundamental question, "Why Rust?" And for me, I think a lot of the answer to that has to do with being sick of being a fake programmer living in a fake world that calls itself Python.

Don't get me wrong: Python is a very nice place to live: good weather, booming labor market, located in a good school district, with most of the books you might want already on the shelves of the main library and almost all of the others a mere hold request away. It's idyllic. Almost ... too idyllic, as if the trees and swimming pools and list comprehensions and strip malls are conspiring to hide something from us, to keep us from guessing what lurks in the underworld between the lines, the gears and gremlins feeding and turning in the layers of tools built on tools built on tools that undergird our experience. True, sometimes small imperfections in the underworld manifest themselves as strange happenings that we can't explain. But mostly, we don't worry ourselves about it. Life is simple in Python. We reassure our children that that legends of demon-king Malloc are just stories. Everything is a duck; ducks can have names and can be mutable or immutable. It all just works like you would expect from common sense, at least if you grew up around here.

Continue reading

$

I used to think of $ in regular expressions as matching the end of the string. I was wrong! It actually might do something more subtle than that, depending on what regex engine you're using. In my native Python's re module, $

[m]atches the end of the string or just before the newline at the end of the string, and in MULTILINE mode also matches before a newline.

Note! The end of the string, or just before the newline at the end of the string.

In [2]: my_regex = re.compile("foo$")

In [3]: my_regex.match("foo")
Out[3]: <_sre.SRE_Match object; span=(0, 3), match='foo'>

In [4]: my_regex.match("foo\n")
Out[4]: <_sre.SRE_Match object; span=(0, 3), match='foo'>

I guess I can see the motivation—we often want to use the newline character as a terminator of lines (by definition) or files (by sacred tradition), without wanting to think of \n as really part of the content of interest—but the disjunctive behavior of $ can be a source of treacherous bugs in the fingers of misinformed programmers!

Continue reading

__pycache__/shibboleth.cpython-34.pyc

Sometimes I worry that people with power in Society will look down on me for my pronunciation of the .pyc extension for Python bytecode files. I always want to say pike-cee, even though many would argue that the c should either be hard (pike) or said as the name of the letter (py-cee), but certainly not both in sequence!

The Foundations of Erasure Codes

(cross-posted from the SwiftStack Blog)

In enabling mechanism to combine together general symbols, in successions of unlimited variety and extent, a uniting link is established between the operations of matter and the abstract mental processes of the most abstract branch of mathematical science. A new, a vast, and a powerful language is developed for the future use of analysis, in which to wield its truths so that these may become of more speedy and accurate practical application for the purposes of mankind [sic] than the means hitherto in our possession have rendered possible.

Ada Lovelace on Charles Babbage's Analytical Engine, 1842

Dear reader, if you're reading [the SwiftStack Blog], you may have already heard that erasure codes have been added to OpenStack Swift (in beta for the 2.3.0 Kilo release, with continuing improvements thereafter) and that this is a really great thing that will make the world a better place.

All of this is entirely true. But what is perhaps less widely heard is exactly what erasure codes are and exactly why their arrival in Swift is a really great thing that will make the world a better place. That is what I aim to show you in this post—and I do mean show, not merely tell, for while integrating erasure codes into a production-grade storage system is (was!) an immense effort requiring months of work by some of the finest programmers the human race has to offer, the core idea is actually simple enough to fit in a (longish) blog post. Indeed, by the end of this post, we will have written a complete working implementation of a simple variant of Reed–Solomon coding, not entirely unlike what is used in Swift itself. No prior knowledge will be assumed except a working knowledge of high-school algebra and the Python programming language.

Continue reading

Last Friday Night

it's a blacked-out blur, but I'm pretty sure

* * *

$ heroku create
Creating howling-nightmare-4505... done, stack is cedar
http://howling-nightmare-4505.herokuapp.com/ | git@heroku.com:howling-nightmare-4505.git
Git remote heroku added

"Did they—did they change their random words dictionary for Halloween?"

* * *

-----> Python app detected
-----> Installing runtime (python-2.7.8)

"What?! No! What are you doing, you crazy machine?!"

* * *

$ echo "python-3.4.1" > runtime.txt
$ g a .
$ gco -m "the month of July 2010 called and wants their programming language back"

* * *

< What are you spinning up the box for?

> it's Friday night

< How does that lead to box spinning?

> previous message was an attempt at humor, as if to suggest that I'm the sort of person for whom deploying a web application fulfills a similar purpose as some sort of wild social event with drugs might for some others, about which they might offer a similarly vacuous "explanation"

* * *

it ru-uled

Growl

Dear reader, imagine you have an idea for a work of prose that you want to have finished by Election Day for reasons which will become clear later, and you're not sure how long it should end up being, but you think maybe around twelve thousand words. When considering what you can do to ensure that this feat will actually be accomplished, it occurs to you that you could start writing now. Or

Continue reading

Huffman

Dear reader, you know what's way more fun than feeling sad about the nature of the cosmos? Data compression, that's what! Suppose you want to send a message to your friends in a nearby alternate universe, but interuniversal communication bandwidth is very expensive (different universes can't physically interact, so we and our alternate-universe analogues can only communicate by mutually inferring what the other party must be saying, which takes monstrous amounts of computing power and is not cheap), so you need to make your message as brief as possible. Note that 'brief' doesn't just have to do with how long your message is in natural language, it also has to do with how that message is represented over the transuniveral communication channel: indeed, the more efficient the encoding, the more you can afford to say on a fixed budget.

The classic ASCII encoding scheme uses seven bits to represent each character. (Seven?—you ask perplexedly, surely you mean eight? Apparently it was seven in the original version.) Can we do better? Well ... ASCII has a lot of stuff that arguably you don't need that badly. Really, upper and lower case letters? Ampersands, asterisks, backslashes? And don't get me started about those unprintable control characters! If we restrict our message to just the uncased alphabet A through Z plus space and a few punctuation marks, then we can encode our message using only a 32 (= 25) character set, at five bits per character.

Can we do better? Seemingly not—24 = 16 isn't a big enough character set to cover the alphabet. Unless ...

Continue reading

Computing the Arithmetic Derivative

Jurij Kovič's paper "The Arithmetic Derivative and Antiderivative" contains a curious remark in Section 1.2. Having just stated the definition of the logarithmic arithmetic derivative (L(n) = n′/n = Σj aj/pj where the prime mark indicates the arithmetic derivative, and Πipiai is the prime factorization of n), Kovič writes:

The logarithmic derivative is an additive function L(xy) = L(x) + L(y) for any x, y ∈ ℚ. Consequently, using a table of values L(p) = 1/p (computed to sufficient decimal places!) and the formula D(x) = L(xx, it is easy to find D(n) for n ∈ ℕ having all its prime factors in the table.

... a table of values? Did I read that correctly? Surely there must be some mistake; surely a paper published in 2012 can't expect us to rely on a printed table, for all the world as if we were John Napier in the seventeenth century! But never fear, dear reader, for the situation is easily rectified—with just a few lines of Python, you can take all the arithmetic derivatives you like on your own personal computing device.

Continue reading

Training Your Very Own Turtle to Draw the Boundary of the Mandelbrot Set

Dear reader, I don't think I've ever told you how much I love the Python standard library, but I do. When they say "Batteries included," they may not mean it in the sense of "a device that produces electricity by a chemical reaction between two substances," but they do mean it in the sense of "an array of similar things," where the similar things are great libraries. If you need a CSV reader, it's there. If you need fixed-point decimal arithmetic, it's there. But although perhaps it should not have surprised me, never has my joy and appreciation been greater than the fateful moment when I learned that the standard library itself contains a module for

Continue reading