Posts in "Software"

Feb. 5, 2010

A GIL Adventure (with a happy ending)

I just halved the running time of one of my test suites.

The tests in question are multi-threaded, and while they perform a lot of IO they still push the CPU pretty hard. For some time now, nose has been reporting a happy little message along these lines:

Ran 35 tests in 24.893s

I wouldn't have thought anything of it, but every so often this number would drop dramatically – often down to as little as 15 seconds. After a lot of puzzling, I realised that the tests would run faster whenever I had another test suite running at the same time. Making my computer work harder made these tests run almost twice as fast!

Could it be? Yes, I was finally seeing a manifestation of Python's dreaded Global Interpreter Lock - a.k.a. the "GIL of Doom". Because I'm running on a dual core system, the different threads in this test suite were spreading themselves over both processors and engaging in an epic GIL Battle that bogged down the whole process.
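A minimal sketch of the effect, assuming CPython with the GIL enabled and a multi-core machine. This is not the test suite itself; the function names and the loop size are made up for illustration. It times the same CPU-bound work run back-to-back and then split across two threads:

```python
import threading
import time


def count_down(n):
    """Pure-Python CPU-bound loop; the running thread needs the GIL throughout."""
    while n > 0:
        n -= 1


def time_sequential(n, workers=2):
    """Run the loop `workers` times, one after another, in the main thread."""
    start = time.perf_counter()
    for _ in range(workers):
        count_down(n)
    return time.perf_counter() - start


def time_threaded(n, workers=2):
    """Run the loop once in each of `workers` threads and wait for them all."""
    threads = [
        threading.Thread(target=count_down, args=(n,)) for _ in range(workers)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 5000000  # arbitrary; large enough for the timings to be visible
    print("sequential: %.2fs" % time_sequential(n))
    print("threaded:   %.2fs" % time_threaded(n))
```

On a GIL-bound interpreter the threaded run should not be expected to beat the sequential one, and on multiple cores it may well come out slower. The actual numbers depend heavily on the interpreter version and the machine.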

The typical response to this awful multi-core behaviour is "just use multiprocessing". That's not an option here, not least because these tests are supposed to be checking the thread safety of my code!

Continue reading...

Aug. 4, 2009

New Python module: extprot

compact, efficient, extensible data serialisation

One of my commercial projects requires a space-efficient object serialisation format, and until now I've been using the obvious choice in Google's Protocol Buffers. I'm happy enough with the format itself, but the experience of using the Python bindings was just barely satisfactory. The interface feels quite Java-ish and there are some non-obvious gotchas, such as having to use special methods to manipulate list fields. I ploughed ahead, but was quietly looking around for alternatives.

The last straw came when I tried to establish a deployment scheme using pip requirements files. Both "pip install protobuf" and "easy_install protobuf" fail hard: the pypi eggs are out of date, the source download has a non-standard structure, and the setup.py script tries to bootstrap itself using the protobuf compiler that it assumes you have already built. Yuck. This was more pain than I was willing to put up with. Plus it was a good opportunity to take another look around.

I toyed briefly with Facebook's...errr...I mean Apache Thrift, but it had too much remote-procedure-call baggage and not enough documentation. Then I stumbled across a great little screed about extprot, a technology to create "compact, efficient and extensible binary protocols that can be used for cross-language communication and long-term data serialization".

Continue reading...

May 22, 2009

XML Namespace Selectors for jQuery

I hit my first real roadbump with jQuery yesterday, a missing feature that really made me stop and stare in puzzlement: jQuery doesn't support XML namespace selectors. Since I'm trying to parse WebDAV response bodies, and such documents make extensive use of namespaces, it's quite the issue for me. Or rather, it was quite the issue – read on if you're interested in the details, or just download my solution if you're impatient.

Oh sure, jQuery supports prefix selectors just fine. If your document contains an element <D:response>, you can quite safely query for that element by name as long as you remember to backslash-escape the colon:

$(doc).find("D\\:response")

This works as long as you can guarantee that a single prefix is used for the target namespace, and that it's used uniformly throughout the document. But the XML Namespaces standard, as well as the WebDAV standard, make it very clear that you can't rely on this in general. The node name prefix is a purely syntactic construct, while its actual namespace is a semantic property that can be specified in several different ways.
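To make the prefix-versus-namespace distinction concrete, here is a small sketch using Python's standard xml.etree.ElementTree rather than jQuery. The three sample documents and the helper name are made up for illustration; "DAV:" is the WebDAV namespace URI. Matching on the namespace URI finds the right elements whichever prefix the document happens to use, and ignores a look-alike prefix bound to a different namespace:

```python
import xml.etree.ElementTree as ET

DAV_NS = "DAV:"  # the WebDAV namespace URI

# Hypothetical sample documents, for illustration only.
# Same namespace, written with an explicit "D" prefix...
doc_prefixed = (
    '<D:multistatus xmlns:D="DAV:"><D:response/><D:response/></D:multistatus>'
)
# ...and written with a default namespace and no prefix at all.
doc_default = '<multistatus xmlns="DAV:"><response/></multistatus>'
# Same "D" prefix, but bound to an unrelated namespace.
doc_other = (
    '<D:multistatus xmlns:D="urn:example:not-dav"><D:response/></D:multistatus>'
)


def count_dav_responses(xml_text):
    """Count child <response> elements that are in the WebDAV namespace."""
    root = ET.fromstring(xml_text)
    # ElementTree addresses elements as "{namespace-uri}localname",
    # so the prefix used in the source text is irrelevant here.
    return len(root.findall("{%s}response" % DAV_NS))


if __name__ == "__main__":
    for doc in (doc_prefixed, doc_default, doc_other):
        print(count_dav_responses(doc))
```

A query keyed on the literal "D:" prefix would miss the second document and wrongly match the third, which is exactly the problem with prefix selectors.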

Fortunately, the CSS Level 3 standard provides a very clear syntax and semantics for namespace-aware queries. After declaring 'D' to be the proper WebDAV namespace URI, the CSS Level 3 equivalent to the above prefix query would be:

Continue reading...

Feb. 12, 2009

filelike 0.3.2 released

Just a quick note to mark an updated release of filelike. Version 0.3.2 offers a lot of small robustness fixes and many new unit tests, much improved support for files opened in append mode, and a new filelike wrapper "FlushableBuffer" for buffering streaming reads/writes so they can be treated like a random-access file.

This library forms part of the backbone of a new project I'm working on, so expect to see more frequent releases over the next few months.

Continue reading...

Jan. 29, 2009

Forking EC2 instances for Mozart/Oz

My long-standing obsession with Mozart/Oz is no secret, but I often find it difficult to articulate precisely why I'm so fascinated by the language. I never seem to make much headway by describing the power and elegance of its novel control structures such as first-class computation spaces – which, by the way, I would rank right up there with continuations on the list of "language features that sound useless but are actually incredibly powerful"...but instead of going down that esoteric road, let me demonstrate a short and eminently useful little hack that I put together last week, one which really highlights the power of the Mozart platform.

First, a brief primer on one of Mozart's great strengths: distributed programming. If you have remote access to a computer with Mozart installed, it's very easy to move some of your computational workload onto that machine. Here's a short Mozart script that uses the remote computer "guava" (the computer hosting this website) to calculate 1000 factorial:

% Wrap the target code in a functor
F = functor
       export result:Result
       define
          fun {Fact X}
             if X < 2 then 1 else X * {Fact X-1} end
          end
          Result = {Fact 1000}
    end

% Spawn a new Mozart instance on the remote machine
R = {New Remote.manager init(host:guava fork:ssh)}

% Have it execute the functor, and print the result
Res = {R apply(F $)}
{System.showInfo Res.result}

Continue reading...