$

I used to think of $ in regular expressions as matching the end of the string. I was wrong! It actually might do something more subtle than that, depending on what regex engine you're using. In my native Python's re module, $

[m]atches the end of the string or just before the newline at the end of the string, and in MULTILINE mode also matches before a newline.

Note! The end of the string, or just before the newline at the end of the string.

In [2]: my_regex = re.compile("foo$")

In [3]: my_regex.match("foo")
Out[3]: <_sre.SRE_Match object; span=(0, 3), match='foo'>

In [4]: my_regex.match("foo\n")
Out[4]: <_sre.SRE_Match object; span=(0, 3), match='foo'>

I guess I can see the motivation—we often want to use the newline character as a terminator of lines (by definition) or files (by sacred tradition), without wanting to think of \n as really part of the content of interest—but the disjunctive behavior of $ can be a source of treacherous bugs in the fingers of misinformed programmers!

Continue reading

Convention

$ lein new 3lg2048
Project names must be valid Clojure symbols.
$ lein new Thirty-Three
Project names containing uppercase letters are not recommended 
and will be rejected by repositories like Clojars and Central. 
If you're truly unable to use a lowercase name, please set the 
LEIN_BREAK_CONVENTION environment variable and try again.
$ LEIN_BREAK_CONVENTION=1
$ lein new Thirty-Three
Project names containing uppercase letters are not recommended 
and will be rejected by repositories like Clojars and Central. 
If you're truly unable to use a lowercase name, please set the 
LEIN_BREAK_CONVENTION environment variable and try again.
$ export LEIN_BREAK_CONVENTION="fuck you"
$ lein new Thirty-Three

Consistent Hashing

Dear reader, suppose you're a distibuted data storage system. Your soul (although some pedants would insist on the word program) is dispersed across a cluster of several networked computers. From time to time, your human patrons give you files, and your job—more than that, the very purpose of your existence—is to store these files for safekeeping and later retrieval.

The humans who originally crafted your soul chose a simple algorithm as the means by which you decide which file goes on which of the many storage devices that live in the computers you inhabit: you find the MD5 hash of the filename, take its residue modulo n where n is the number of devices you have—let's call the result i—and you put the file on the (zero-indexed) ith device. So when you had sixteen devices and the humans wanted you to store twilight.pdf, you computed md5("twilight.pdf") = 429eb07bb8a3871c431fe03694105883, saw that the lowest nibble was 3, and put the file on your 3rd device (most humans would say the fourth device, counting from one).

It's not a bad system, you tell yourself (some sort of pride or loyalty preventing you from disparaging your creators' efforts, even to yourself). At least it keeps the data spread out evenly. (A shudder goes down your internal buses as you contemplate what disasters might have happened if your creators had been even more naive and, say, had you put files with names starting with A through D on the first device, &c. What would have happened that time when your patrons decided they wanted to store beat00001.mp3 through beat18691.mp3?)

Continue reading

Computing the Powerset

Suppose we want to find the powerset of a given set, that is, the set of all its subsets. How might we go about it? Well, the powerset of the empty set is the set containing the empty set.

\mathcal{P}(\emptyset)=\{\emptyset\}

And the powerset of the union of a set S with a set containing one element e, is just the union of the powerset of S with the set whose elements are like the members of the powerset of S except that they also contain e.

\mathcal{P}(S\cup\{e\})=\mathcal{P}(S)\cup\{t\cup\{e\}\}_{t\in\mathcal{P}(S)}

So in Clojure we might say

(require 'clojure.set)

(defn include-element [collection element]
  (map (fn [set] (clojure.set/union set #{element}))
       collection))

(defn powerset [set]
  (if (empty? set)
    #{#{}}
    (let [subproblem (powerset (rest set))]
      (clojure.set/union subproblem
                         (include-element subproblem
                                          (first set))))))