The Foundations of Erasure Codes

(cross-posted from the SwiftStack Blog)

In enabling mechanism to combine together general symbols, in successions of unlimited variety and extent, a uniting link is established between the operations of matter and the abstract mental processes of the most abstract branch of mathematical science. A new, a vast, and a powerful language is developed for the future use of analysis, in which to wield its truths so that these may become of more speedy and accurate practical application for the purposes of mankind [sic] than the means hitherto in our possession have rendered possible.

Ada Lovelace on Charles Babbage's Analytical Engine, 1842

Dear reader, if you're reading [the SwiftStack Blog], you may have already heard that erasure codes have been added to OpenStack Swift (in beta for the 2.3.0 Kilo release, with continuing improvements thereafter) and that this is a really great thing that will make the world a better place.

All of this is entirely true. But what is perhaps less widely heard is exactly what erasure codes are and exactly why their arrival in Swift is a really great thing that will make the world a better place. That is what I aim to show you in this post—and I do mean show, not merely tell, for while integrating erasure codes into a production-grade storage system is (was!) an immense effort requiring months of work by some of the finest programmers the human race has to offer, the core idea is actually simple enough to fit in a (longish) blog post. Indeed, by the end of this post, we will have written a complete working implementation of a simple variant of Reed–Solomon coding, not entirely unlike what is used in Swift itself. No prior knowledge will be assumed except a working knowledge of high-school algebra and the Python programming language.

Continue reading


"Don't worry, we've got our T.O.P. engineer working on it," said the support man on the phone with our most important customer, glancing meaningfully across the open-plan office in my direction; I winced briefly, then spasmed back towards my screen and fumbled with the keyboard, intending to return my attention to the definition of the DeviceAssignmentRuleComponentManagerFactory, but somehow fat-fingering C-x C-c along the way, every awkward, ungainly movement bearing testimony to the most casual of onlookers that I was Totally Observably Pathetic.



[19:26:50] <bob>    alice: you still around?
[19:27:08] <alice>  bob, sort of
[19:27:20] <bob>    alice: ok. never mind.
[19:27:41] <alice>  bob, what were you going to ask? I am at the office, 
                    trying to finish up an email but I'm really slow at 
                    choosing words
[19:28:21] <bob>    alice: i was just wondering if you happened to know a 
                    way to manually foo the bar-quuxing device
[19:28:22] <alice>  perhaps because of my overly-ornate and wordy writing 
                    style, which, for not-well-understood psychological 
                    reasons, I nevertheless continue to use despite its 
                    obvious disadvantages in business communication


const PSEUDO_DIGITS: [char; 7] = ['M', 'D', 'C', 'L', 'X', 'V', 'I'];
const PSEUDO_PLACE_VALUES: [usize; 7] = [1000, 500, 100, 50, 10, 5, 1];

fn integer_to_roman(integer: usize) -> String {
    let mut remaining = integer;
    let mut bildungsroman = String::new();
    // get it?? It sounds like _building Roman_ (numerals), but it's
    // also part of the story about me coming into my own as a
    // programmer by learning a grown-up language
    // XXX
    for ((index, value), &figure) in PSEUDO_PLACE_VALUES.iter()
        let factor = remaining / value;
        remaining = remaining % value;

        if figure == 'M' || factor < 4 {
            for _ in 0..factor {

        // IV, IX, XL, &c.
        let smaller_unit_index = index + 2 - (index % 2);
        if smaller_unit_index < PSEUDO_PLACE_VALUES.len() {
            let smaller_unit_value = PSEUDO_PLACE_VALUES[smaller_unit_index];
            let smaller_unit_figure = PSEUDO_DIGITS[smaller_unit_index];

            if value - remaining <= smaller_unit_value {
                remaining -= (value - smaller_unit_value);