RustConf 2016 Travelogue

(Previously on An Algorithmic Lucidity.)

sfo_reflections

The other weekend, excited to learn more and connect with people about what's going on at the forefront of expressive, performant, data-race-free computing—and eager for a healthy diversion from the last two months of agonizing delirium induced by the world-shattering insight about how everything I've cared about for the past fourteen years turns out to be related in unexpected and terrifying ways that I can't talk about for reasons that I also can't talk about—I took Friday off from my dayjob and caught a Thursday night flight out of SFO to exotic Portland (... I, um, don't travel much) for RustConf!

The conference itself was on Saturday, but Friday featured special training sessions run by members of the Rust core team! I was registered for Niko Matsakis's afternoon session on lifetimes, but I arrived at the venue (the Luxury Collection Nines Hotel) early to get registered (I had never seen socks as conference swag before!) and hang out with folks and get a little bit of coding done: my coolest Rust project so far is a chess engine that I wrote this time last year (feel free to go ahead and give it a Star!) which I wanted the option to show off (Option<ShowOff>) to other conference attendees, but the pretty web application frontend had broken due to a recent bug and my JavaScript build pipeline having rotted. I fixed it just in time for the lifetimes training session to start.

Continue reading

0x1f431 CAT FACE

diff --git a/.bash_aliases b/.bash_aliases
index 648287f..e00dbc9 100644
--- a/.bash_aliases
+++ b/.bash_aliases
@@ -34,6 +34,9 @@ alias gre="env | grep"
 alias grps="ps aux | grep"
 alias grports="netstat -tulpn | grep"
 
+# cat
+alias 🐱="cat"
+
 # Vagrant
 alias v="vagrant"

Subzero

Python has this elegant destructuring-assignment iterable-unpacking syntax that every serious Pythonista and her dog tends to use whereëver possible. So where a novice might write

split_address = address.split(':')
host = split_address[0]
port = split_address[1]

a serious Pythonista (and her dog) would instead say

host, port = address.split(':')

which is clearly superior on grounds of succinctness and beauty; we don't want our vision to be cluttered with this ugly sub-zero, sub-one notation when we can just declare a sequence of names.

Consider, however, the somewhat-uncommon case where we have an iterable that, for whatever reason, we happen to know contains only one element, and we want to assign that one element to a variable. Here, I've seen people who ought to know better fall back to indexing:

if len(jobs) == 1:
   job = jobs[0]

But there's no reason to violate the æsthetic principle of "use a length-n (or smaller) tuple of identifiers on the left side of a destructuring assignment in order to name the elements of a length-n iterable" just because n happens to be one:

if len(jobs) == 1:
   job, = jobs

Attentional Shunt

#!/usr/bin/env python3

# Copyright © 2015 Zack M. Davis

# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:

# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.

# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.

"""
Configure the machine to shunt traffic to distracting sites to localhost,
preserving attention.
"""

import os
import argparse
import subprocess
import sys
from datetime import datetime, timedelta

ETC_HOSTS = os.path.join(os.sep, 'etc', 'hosts')
HEADER = "# below managed by attentional shunt"
INVERSE_COMMANDS = {'enable': "disable", 'disable': "enable"}

DISTRACTING_HOSTS = (  # modify as needed
    'news.ycombinator.com',
    'math.stackexchange.com',
    'scifi.stackexchange.com',
    'worldbuilding.stackexchange.com',
    'workplace.stackexchange.com',
    'academia.stackexchange.com',
    'codereview.stackexchange.com',
    'puzzling.stackexchange.com',
    'slatestarcodex.com',
    'twitter.com',
    'www.facebook.com',
    'slatestarscratchpad.tumblr.com',
)
SHUNTING_LINES = "\n{}\n{}\n".format(
    HEADER,
    '\n'.join("127.0.0.1 {}".format(domain)
              for domain in DISTRACTING_HOSTS)
)


def conditionally_reexec_with_sudo():
    if os.geteuid() != 0:
        os.execvp("sudo", ["sudo"] + sys.argv)


def enable_shunt():
    if is_enabled():
        return  # nothing to do
    with open(ETC_HOSTS, 'a') as etc_hosts:
        etc_hosts.write(SHUNTING_LINES)


def disable_shunt():
    with open(ETC_HOSTS) as etc_hosts:
        content = etc_hosts.read()
    if SHUNTING_LINES not in content:
        return  # nothing to do
    with open(ETC_HOSTS, 'w') as etc_hosts:
        etc_hosts.write(content.replace(SHUNTING_LINES, ''))


def is_enabled():
    with open(ETC_HOSTS) as etc_hosts:
        content = etc_hosts.read()
    return HEADER in content


def status():
    state = "enabled" if is_enabled() else "disabled"
    print("attentional shunt is {}".format(state))


def schedule(command, when):  # requires `at` job-scheduling utility
    timestamp = when.strftime("%H:%M %Y-%m-%d")
    at_command = ['at', timestamp]
    at = subprocess.Popen(
        at_command,
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE
    )
    at.communicate(command.encode())


if __name__ == "__main__":
    arg_parser = argparse.ArgumentParser(description=__doc__)
    arg_parser.add_argument('command',
                            choices=("enable", "disable", "status"))
    arg_parser.add_argument('duration', nargs='?', type=int,
                            help=("revert state change after this many "
                                  "minutes"))
    args = arg_parser.parse_args()
    if args.command == "status":
        status()
    else:
        conditionally_reexec_with_sudo()
        if args.command == "enable":
            enable_shunt()
        elif args.command == "disable":
            disable_shunt()

        if args.duration:
            now = datetime.now()
            inverse_command = INVERSE_COMMANDS[args.command]
            schedule(
                "{} {}".format(os.path.realpath(__file__), inverse_command),
                now + timedelta(minutes=args.duration)
            )

RustCamp Reminiscences

On Saturday the first, I attended RustCamp, the first conference dedicated to the newish (in development for fiveish years, but having just hit version 1.0.0 this May, with all the stability guarantees that implies under the benevolent iron fist of semantic versioning) programming language Rust!

badge_and_lambda_dragon_shirt

Why RustCamp? (It's a reasonable rhetorical question with which to begin this paragraph: going to a conference has opportunity costs in time and money; things worth blogging about are occasionally worth justifying—even if no one actually asked me for a justification.) A lot of the answer can be derived from the answer to a more fundamental question, "Why Rust?" And for me, I think a lot of the answer to that has to do with being sick of being a fake programmer living in a fake world that calls itself Python.

Don't get me wrong: Python is a very nice place to live: good weather, booming labor market, located in a good school district, with most of the books you might want already on the shelves of the main library and almost all of the others a mere hold request away. It's idyllic. Almost ... too idyllic, as if the trees and swimming pools and list comprehensions and strip malls are conspiring to hide something from us, to keep us from guessing what lurks in the underworld between the lines, the gears and gremlins feeding and turning in the layers of tools built on tools built on tools that undergird our experience. True, sometimes small imperfections in the underworld manifest themselves as strange happenings that we can't explain. But mostly, we don't worry ourselves about it. Life is simple in Python. We reassure our children that that legends of demon-king Malloc are just stories. Everything is a duck; ducks can have names and can be mutable or immutable. It all just works like you would expect from common sense, at least if you grew up around here.

Continue reading

$

I used to think of $ in regular expressions as matching the end of the string. I was wrong! It actually might do something more subtle than that, depending on what regex engine you're using. In my native Python's re module, $

[m]atches the end of the string or just before the newline at the end of the string, and in MULTILINE mode also matches before a newline.

Note! The end of the string, or just before the newline at the end of the string.

In [2]: my_regex = re.compile("foo$")

In [3]: my_regex.match("foo")
Out[3]: <_sre.SRE_Match object; span=(0, 3), match='foo'>

In [4]: my_regex.match("foo\n")
Out[4]: <_sre.SRE_Match object; span=(0, 3), match='foo'>

I guess I can see the motivation—we often want to use the newline character as a terminator of lines (by definition) or files (by sacred tradition), without wanting to think of \n as really part of the content of interest—but the disjunctive behavior of $ can be a source of treacherous bugs in the fingers of misinformed programmers!

Continue reading

The Foundations of Erasure Codes

(cross-posted from the SwiftStack Blog)

In enabling mechanism to combine together general symbols, in successions of unlimited variety and extent, a uniting link is established between the operations of matter and the abstract mental processes of the most abstract branch of mathematical science. A new, a vast, and a powerful language is developed for the future use of analysis, in which to wield its truths so that these may become of more speedy and accurate practical application for the purposes of mankind [sic] than the means hitherto in our possession have rendered possible.

Ada Lovelace on Charles Babbage's Analytical Engine, 1842

Dear reader, if you're reading [the SwiftStack Blog], you may have already heard that erasure codes have been added to OpenStack Swift (in beta for the 2.3.0 Kilo release, with continuing improvements thereafter) and that this is a really great thing that will make the world a better place.

All of this is entirely true. But what is perhaps less widely heard is exactly what erasure codes are and exactly why their arrival in Swift is a really great thing that will make the world a better place. That is what I aim to show you in this post—and I do mean show, not merely tell, for while integrating erasure codes into a production-grade storage system is (was!) an immense effort requiring months of work by some of the finest programmers the human race has to offer, the core idea is actually simple enough to fit in a (longish) blog post. Indeed, by the end of this post, we will have written a complete working implementation of a simple variant of Reed–Solomon coding, not entirely unlike what is used in Swift itself. No prior knowledge will be assumed except a working knowledge of high-school algebra and the Python programming language.

Continue reading

XXX III

const PSEUDO_DIGITS: [char; 7] = ['M', 'D', 'C', 'L', 'X', 'V', 'I'];
const PSEUDO_PLACE_VALUES: [usize; 7] = [1000, 500, 100, 50, 10, 5, 1];

#[allow(unused_parens)]
fn integer_to_roman(integer: usize) -> String {
    let mut remaining = integer;
    let mut bildungsroman = String::new();
    // get it?? It sounds like _building Roman_ (numerals), but it's
    // also part of the story about me coming into my own as a
    // programmer by learning a grown-up language
    //
    // XXX http://tvtropes.org/pmwiki/pmwiki.php/Main/DontExplainTheJoke
    for ((index, value), &figure) in PSEUDO_PLACE_VALUES.iter()
        .enumerate().zip(PSEUDO_DIGITS.iter())
    {
        let factor = remaining / value;
        remaining = remaining % value;

        if figure == 'M' || factor < 4 {
            for _ in 0..factor {
                bildungsroman.push(figure);
            }
        }

        // IV, IX, XL, &c.
        let smaller_unit_index = index + 2 - (index % 2);
        if smaller_unit_index < PSEUDO_PLACE_VALUES.len() {
            let smaller_unit_value = PSEUDO_PLACE_VALUES[smaller_unit_index];
            let smaller_unit_figure = PSEUDO_DIGITS[smaller_unit_index];

            if value - remaining <= smaller_unit_value {
                bildungsroman.push(smaller_unit_figure);
                bildungsroman.push(figure);
                remaining -= (value - smaller_unit_value);
            }
        }
    }
    bildungsroman
}

Convert Markdown to HTML Within Emacs Using Pandoc

Okay, so there actually is a pandoc-mode, but I couldn't figure out how to configure and use it, so it was easier to just write the one command that I wanted—

(defun markdown-to-html ()
  (interactive)
  (let* ((basename (file-name-sans-extension (buffer-file-name)))
         (html-filename (format "%s.html" basename)))
    (shell-command (format "pandoc -o %s %s"
                           html-filename (buffer-file-name)))
    (find-file-other-window html-filename)))

Convention

$ lein new 3lg2048
Project names must be valid Clojure symbols.
$ lein new Thirty-Three
Project names containing uppercase letters are not recommended 
and will be rejected by repositories like Clojars and Central. 
If you're truly unable to use a lowercase name, please set the 
LEIN_BREAK_CONVENTION environment variable and try again.
$ LEIN_BREAK_CONVENTION=1
$ lein new Thirty-Three
Project names containing uppercase letters are not recommended 
and will be rejected by repositories like Clojars and Central. 
If you're truly unable to use a lowercase name, please set the 
LEIN_BREAK_CONVENTION environment variable and try again.
$ export LEIN_BREAK_CONVENTION="fuck you"
$ lein new Thirty-Three

Consistent Hashing

Dear reader, suppose you're a distibuted data storage system. Your soul (although some pedants would insist on the word program) is dispersed across a cluster of several networked computers. From time to time, your human patrons give you files, and your job—more than that, the very purpose of your existence—is to store these files for safekeeping and later retrieval.

The humans who originally crafted your soul chose a simple algorithm as the means by which you decide which file goes on which of the many storage devices that live in the computers you inhabit: you find the MD5 hash of the filename, take its residue modulo n where n is the number of devices you have—let's call the result i—and you put the file on the (zero-indexed) ith device. So when you had sixteen devices and the humans wanted you to store twilight.pdf, you computed md5("twilight.pdf") = 429eb07bb8a3871c431fe03694105883, saw that the lowest nibble was 3, and put the file on your 3rd device (most humans would say the fourth device, counting from one).

It's not a bad system, you tell yourself (some sort of pride or loyalty preventing you from disparaging your creators' efforts, even to yourself). At least it keeps the data spread out evenly. (A shudder goes down your internal buses as you contemplate what disasters might have happened if your creators had been even more naive and, say, had you put files with names starting with A through D on the first device, &c. What would have happened that time when your patrons decided they wanted to store beat00001.mp3 through beat18691.mp3?)

Continue reading

Computing the Powerset

Suppose we want to find the powerset of a given set, that is, the set of all its subsets. How might we go about it? Well, the powerset of the empty set is the set containing the empty set.

\mathcal{P}(\emptyset)=\{\emptyset\}

And the powerset of the union of a set S with a set containing one element e, is just the union of the powerset of S with the set whose elements are like the members of the powerset of S except that they also contain e.

\mathcal{P}(S\cup\{e\})=\mathcal{P}(S)\cup\{t\cup\{e\}\}_{t\in\mathcal{P}(S)}

So in Clojure we might say

(require 'clojure.set)

(defn include-element [collection element]
  (map (fn [set] (clojure.set/union set #{element}))
       collection))

(defn powerset [set]
  (if (empty? set)
    #{#{}}
    (let [subproblem (powerset (rest set))]
      (clojure.set/union subproblem
                         (include-element subproblem
                                          (first set))))))

App Academy Diary, Week Nine

Sunday 17 November 2013— This was the last week of App Academy's regular course content; the next cohort starts Monday and my cohort will begin the three-week "post-course" mostly focused on interview practice, applying for jobs, &c. I got Superscription into a non-embarassing state: I made the feed-fetching happen as a scheduled task, added guest users, introduced the ability to mark entries as having been read, made an attractive click-and-drag category selector, &c. I still want to—at the very least—implement infinite-scroll pagination (fetching all the unread entries from the start can be very slow if there are a lot of them) and rewrite the category selector's terrible, terrible code. On Friday a lot of my class went to the San Francisco Startup Job Fair at noon, and we also had our demo day at the office at three. I think I made an okay showing? But thanks for reading.

App Academy Diary, Week Eight

Tuesday 5 Novembmer 2013— Yesterday was our last pair-programming project; I worked with Ben Watts on a little chat server in Node using WebSockets. I felt like perhaps there was something regrettable about shoving so much functionality into a big callback, but maybe that's just the nature of JavaScript, and not really regrettable at all? On Sunday I started my RSS-aggregator capstone project, now called Superscription.

superscription_entity-relationship-diagramThursday 7 Novembmer 2013— I guess Superscription is going okay. Here's how it works. You can sign up and sign in and stuff. (I used Devise for this rather than rolling my own.) There are forms for adding a category or a subscription URL. When you add a URL, stuff happens which depends on stuff. More specifically, if you enter a URL which the system has never seen before, then a new Subscription model (which accepts_nested_attributes_for an associated UserSubscription join model) is created, and it requests the feed from the URL you supplied and tries to parse the XML using Nokogiri. If that doesn't work, it suggests that maybe you got the URL wrong, but if it does work, then it makes a bunch of Entry objects for all the stories in the feed, saves everything, and redirects you to your subscription index page. But if the system has already seen your URL before (presumably from another user with similar reading tastes), then it doesn't wastefully fetch all that data again: it just makes a new UserSubscription that points to you and your Category and the Subscription that it already has. It's efficient! Then when you click the "Read!" button (which is non-hideously positioned thanks to nearly an hour of fighting the CSS), all your news is organized by category under Bootstrap tabs. And on the subscription index page, when you click the little trash can next to a subscription, then it does the cool jQuery fade-out thing and sends an Ajax request telling the server to destroy that UserSubscription.

Continue reading