Posts in "Python"
Like most web frameworks, Django provides a convenient mechanism for storing data across requests in a persistent "session" object. Like most web frameworks, Django implements sessions using a simple mapping from a "session key" to a session object stored on the server. And like most web frameworks, Django's default session implementation is trivially vulnerable to session hijacking attacks.
Django's session implementation is quite similar to that provided by PHP; for all the gory details here is an excellent article on The Truth about Sessions, but the simplified version is as follows. When you first visit a Django-powered site, the server generates a random "session key" and returns it to your browser in a cookie. Any data that the server wants to remember about you (say, whether you have logged in and under what username) is stored in a giant dictionary indexed by the session key. On each subsequent visit your browser sends the key back to the server, which looks up your data in this dictionary and proceeds merrily on its way. The interaction looks something like the following:
- You log in at the (hypothetical) Django-powered website http://www.my-todo-list.com/.
- The server stores your login details in its session database, and sends back a session key of "123456".
- You send a request to update your todo list, presenting a session key of "123456".
- The server looks up "123456" in its session database, checks that the session is correctly logged in as you, and proceeds with the requested update.
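The exchange above can be sketched in a few lines of plain Python. This is only an illustration of the key-to-data mapping, not Django's actual session backend; the names `SESSIONS`, `create_session` and `load_session` are made up for the example.

```python
import secrets

# Hypothetical in-memory stand-in for the server's session database.
SESSIONS = {}


def create_session(data):
    """Store data under a fresh random key and return the key.

    In a real deployment the key would be sent to the browser in a cookie.
    """
    key = secrets.token_hex(16)
    SESSIONS[key] = dict(data)
    return key


def load_session(key):
    """Return whatever was stored under the presented key, or None."""
    return SESSIONS.get(key)


# Logging in creates the session; later requests just present the key.
key = create_session({"username": "alice", "logged_in": True})
session = load_session(key)
```

Note that `load_session` looks only at the key: nothing in the lookup ties the session to the browser that originally received it.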
It's a simple and convenient mechanism, but it has an important security issue: anyone who knows your session key can impersonate you to the server! Consider what happens next:
Deploying Django projects is in general a straightforward affair, but it still suffers from a pain-point that's as old as web apps themselves: deploying at an arbitrary root URL. In my ideal world, I would push my shiny new Django project to the server, instruct Apache to mount it at "/my/shiny/app", and everything would just work – all URLs would magically have "/my/shiny/app" stripped off on their way into Django and prepended again on their way out. In the real world, Django comes pretty close to this ideal but stops just far enough short to be annoying.
First, here's what Django gets right: reverse(), permalink() and {% url %} are awesome. They introspect Django's runtime environment to translate an application-level name or object into a deployment-level URL. Your applications have no excuse for hard-coding URLs or even URL fragments. In theory, these tools should be enough to make Django completely agnostic about its deployment location.
Now here's what Django gets wrong: some of its core components don't use them. Instead they use hard-coded URLs defined in the settings module, such as settings.ADMIN_MEDIA_PREFIX and settings.LOGIN_URL. Attempts to patch these components to avoid hard-coded URLs have been closed wontfix, so I guess we're stuck with them for a while.
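One possible stopgap, sketched here under assumptions rather than taken from Django itself, is to derive the hard-coded settings from a single mount-point value so that only one line changes per deployment. The `MOUNT_POINT` name, the `mounted` helper and the example paths are all hypothetical; only `LOGIN_URL` and `ADMIN_MEDIA_PREFIX` are the setting names mentioned above.

```python
# Fragment of a hypothetical settings.py.

# Hypothetical deployment root; change this one value per deployment.
MOUNT_POINT = "/my/shiny/app"


def mounted(path, mount_point=MOUNT_POINT):
    """Prefix an application-level path with the deployment root."""
    root = mount_point.strip("/")
    prefix = "/" + root if root else ""
    return prefix + "/" + path.lstrip("/")


# The paths below are illustrative, not Django defaults.
LOGIN_URL = mounted("/accounts/login/")
ADMIN_MEDIA_PREFIX = mounted("/media/admin/")
```

This does not make the core components deployment-agnostic; it just concentrates the hard-coding in one place.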
New Python module: extprot
compact, efficient, extensible data serialisation
One of my commercial projects requires a space-efficient object serialisation format, and until now I've been using the obvious choice in Google's Protocol Buffers. I'm happy enough with the format itself, but the experience of using the Python bindings was just barely satisfactory. The interface feels quite Java-ish and there are some non-obvious gotchas, such as having to use special methods to manipulate list fields. I ploughed ahead, but was quietly looking around for alternatives.
The last straw came when I tried to establish a deployment scheme using pip requirements files. Both "pip install protobuf" and "easy_install protobuf" fail hard: the pypi eggs are out of date, the source download has a non-standard structure, and the setup.py script tries to bootstrap itself using the protobuf compiler that it assumes you have already built. Yuck. This was more pain than I was willing to put up with. Plus it was a good opportunity to take another look around.
I toyed briefly with Facebook's...errr...I mean Apache Thrift, but it had too much remote-procedure-call baggage and not enough documentation. Then I stumbled across a great little screed about extprot, a technology to create "compact, efficient and extensible binary protocols that can be used for cross-language communication and long-term data serialization".
Just a quick note to mark an updated release of filelike. Version 0.3.2 offers a lot of small robustness fixes and many new unit tests, much improved support for files opened in append mode, and a new filelike wrapper "FlushableBuffer" for buffering streaming reads/writes so they can be treated like a random-access file.
This library forms part of the backbone of a new project I'm working on, so expect to see more frequent releases over the next few months.
Following my previous post on testing Django with Windmill, I quickly ran into a common snag with in-browser web app testing: it's not possible to programmatically set the value of file input fields. This makes it very difficult to test file upload functionality using frameworks such as Windmill or Selenium.
In Firefox it's possible to request elevated permissions for your unit tests, but this is far from ideal. It means the tests are no longer automatic (you have to click "yes, grant this page extra permissions" whenever the tests are run) and it takes other browsers out of the testing loop. Like many things in life, the easiest solution seems to be simply to fake it.
But like any convincing fakery, the details are never that simple in practice. Uploading a big file from a web browser will take a long time, but could be nearly instantaneous if you fake it using a server-side file. And what if you have custom upload handlers to enable things like upload progress reporting? How can we make fake file uploads as transparent and convincing as possible?