Posts in "Software"
The latest release of PyEnchant now contains an experimental binary distribution for OSX, as both an mpkg installer and a python egg. In theory, users on OSX 10.4 or later should be able to just drop pyenchant-1.6.3-py2.6-macosx-10.4-universal.egg somewhere on sys.path and be up and running and spellchecking with ease.
If you're a Mac user, please try it out and let me know if anything doesn't work the way you expect.
The experience of building this was quite interesting, and more than a little painful, because I wanted to build a proper universal library that could be used on almost any Mac out there. The gory details can be found in pyenchant-bdist-osx-sources-1.6.3.tar.gz; this post is a quick set of notes that might help others get started.
Fortunately for me, the familiar build toolchain of "./configure; make; make install" is pretty much intact on OSX. The only real trickery is getting the resulting library to work on systems other than your own. I hit two major stumbling blocks in this regard:
- how to build fat binaries that still work on older versions of OSX?
- how to make the libraries relocatable, so they can be installed at any location?
This may all be old news to seasoned OSX veterans, but hopefully these notes can help out other expat Linux users like me.
I've just spent a few days trying to improve the performance of a frozen Python app - specifically, the time it takes to start up and present a login window. Most of the improvements were down to good old-fashioned writing of better code, but I also put together a couple of tricks to help shave off even more milliseconds. They both target one of the major sources of slowness when starting up a Python app: imports.
Import processing is an area where an app written in Python is at a big disadvantage compared to compiled languages such as C or Java. In such languages the equivalent of an "import" statement is usually a compile-time directive that sucks in code from another file, and its impact on startup time is negligible. In Python, the import statement is a run-time directive that goes looking for the named module, compiles the source file if necessary, loads the compiled code into memory, executes the code in a new namespace, and finally returns the resulting module object. Clearly, the fewer imports you can do at application startup, the better.
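To get a feel for that run-time cost, here is a minimal sketch that times a first import against a repeat import of the same module. The helper name `timed_import` and the choice of the stdlib `wave` module are purely illustrative assumptions, and the actual numbers will vary by machine and Python version.

```python
import importlib
import time


def timed_import(name):
    """Import the named module and return (module, seconds taken).

    `timed_import` is a hypothetical helper for illustration only.
    """
    start = time.perf_counter()
    module = importlib.import_module(name)
    return module, time.perf_counter() - start


# The first call does the full find/compile/load/execute work, unless
# something else in the process has already imported the module.
first, first_secs = timed_import("wave")

# The second call is served from the sys.modules cache, so it only
# pays for a dictionary lookup.
second, second_secs = timed_import("wave")

print("first import:  %.6fs" % first_secs)
print("repeat import: %.6fs" % second_secs)
```

Both calls return the very same module object; only the first one pays for executing the module's code.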
Lazy Imports
I first learned how important lazy imports can be from Andrew Bennetts, who works for Canonical on the Bazaar version control system. Most Python-related conferences in Australia feature Andrew giving a presentation on performance (most recently it was at PyCon AU with Making your python code fast) and he always mentions the lazy import mechanism used by Bazaar.
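The general idea can be sketched with a small proxy object that defers the real import until the module is first used. This is not Bazaar's implementation, just a minimal illustration of the technique; the class name `LazyModule` is an assumption made up for this example.

```python
import importlib
import sys


class LazyModule(object):
    """Stand-in for a module that is imported on first attribute access.

    A hypothetical illustration of the lazy import idea; it is not the
    mechanism that Bazaar ships.
    """

    def __init__(self, name):
        self._name = name
        self._module = None

    def _load(self):
        # Perform the real import only once, then reuse the result.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return self._module

    def __getattr__(self, attr):
        # Only called for attributes not found on the proxy itself,
        # i.e. anything the real module is expected to provide.
        return getattr(self._load(), attr)


# Creating the proxy costs almost nothing at startup...
colorsys = LazyModule("colorsys")

# ...and the real import happens here, on first use.
hue, saturation, value = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)
```

The trade-off is that the import cost moves from startup to the first use of the module, and any import error surfaces there as well.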
Updates to jquery.xmlns.js
I've finally gotten around to updating my XML namespace selector module for jQuery. The new version fixes a few typo-related bugs and brings compatibility with the recently-released jQuery version 1.4.
Grab the source file here: jquery.xmlns.js.
I just halved the running time of one of my test suites.
The tests in question are multi-threaded, and while they perform a lot of IO they still push the CPU pretty hard. For some time now, nose has been reporting a happy little message along these lines:
Ran 35 tests in 24.893s
I wouldn't have thought anything of it, but every so often this number would drop dramatically – often down to as little as 15 seconds. After a lot of puzzling, I realised that the tests would run faster whenever I had another test suite running at the same time. Making my computer work harder made these tests run almost twice as fast!
Could it be? Yes, I was finally seeing a manifestation of Python's dreaded Global Interpreter Lock - a.k.a. the "GIL of Doom". Because I'm running on a dual core system, the different threads in this test suite were spreading themselves over both processors and engaging in an epic GIL Battle that bogged down the whole process.
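A rough way to see the effect for yourself is to time the same CPU-bound work run sequentially and then split across threads. This is only a sketch under stated assumptions: the function names are made up for illustration, the workload is pure CPU rather than the mixed IO and CPU load of my tests, and the results depend heavily on the Python version and the number of cores.

```python
import threading
import time


def count_down(n):
    """Pure-Python busy loop; on a standard CPython build it needs the GIL."""
    while n > 0:
        n -= 1


def run_sequential(n, workers=2):
    """Run the loop `workers` times, one after the other; return seconds taken."""
    start = time.perf_counter()
    for _ in range(workers):
        count_down(n)
    return time.perf_counter() - start


def run_threaded(n, workers=2):
    """Run the loop in `workers` threads at once; return seconds taken."""
    threads = [threading.Thread(target=count_down, args=(n,))
               for _ in range(workers)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 2000000  # arbitrary workload size, chosen only for illustration
    print("sequential: %.3fs" % run_sequential(n))
    print("threaded:   %.3fs" % run_threaded(n))
```

On a standard CPython build with the GIL, the threaded run is typically no faster than the sequential one, and on some versions and core counts it can be noticeably slower.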
The typical response to this awful multi-core behaviour is "just use multiprocessing". That's not an option here, not least because these tests are supposed to be checking the thread safety of my code!
New Python module: extprot
compact, efficient, extensible data serialisation
One of my commercial projects requires a space-efficient object serialisation format, and until now I've been using the obvious choice in Google's Protocol Buffers. I'm happy enough with the format itself, but the experience of using the Python bindings was just barely satisfactory. The interface feels quite Java-ish and there are some non-obvious gotchas, such as having to use special methods to manipulate list fields. I ploughed ahead, but was quietly looking around for alternatives.
The last straw came when I tried to establish a deployment scheme using pip requirements files. Both "pip install protobuf" and "easy_install protobuf" fail hard: the pypi eggs are out of date, the source download has a non-standard structure, and the setup.py script tries to bootstrap itself using the protobuf compiler that it assumes you have already built. Yuck. This was more pain than I was willing to put up with. Plus it was a good opportunity to take another look around.
I toyed briefly with Facebook's...errr...I mean Apache Thrift, but it had too much remote-procedure-call baggage and not enough documentation. Then I stumbled across a great little screed about extprot, a technology to create "compact, efficient and extensible binary protocols that can be used for cross-language communication and long-term data serialization".