Authorize the REST web service

Python, Web April 19th, 2009

Once we step into REST territory, the session-based AuthenticationMiddleware is no longer an option because it violates the stateless principle. Digest authentication seems to be one of the very few options left in this situation. The basic concept is that the client and the server share a private secret key; the client signs the HTTP request, and the server validates the signature before any further operation.

There is no off-the-shelf digest authentication middleware available yet, so let's roll up our sleeves and home-brew our own or, more specifically, shamelessly copy the S3 (authentication spec) Python library with slight simplifications:

The signed entities are cut down to five: HTTP verb, Content-Length, Content-Type, Date and the body. The HTTP verb and content type specify the REST request, and the date guards against man-in-the-middle replay attacks. The signature is then computed as a digest over the stream and placed in the Authentication header.
Update: The URL is also essential; otherwise a man-in-the-middle may record a REST operation on one entry point and replay it against another.
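
A minimal sketch of the signing step, loosely modeled on the S3 scheme. The function name sign_request, the newline-joined string-to-sign, the field order, and the choice of HMAC-SHA1 with Base64 output are all assumptions made for illustration; they are not necessarily the layout used in pattee.

```python
import base64
import hashlib
import hmac


def sign_request(secret_key, verb, url, content_length, content_type, date, body):
    """Return a Base64 HMAC-SHA1 signature over the request entities.

    The field order and the newline separator are assumptions of this sketch.
    """
    header_part = "\n".join([verb, url, str(content_length), content_type, date])
    mac = hmac.new(secret_key, header_part.encode("utf-8") + b"\n", hashlib.sha1)
    mac.update(body)  # the body is digested as a byte stream
    return base64.b64encode(mac.digest()).decode("ascii")


# Hypothetical request values, for illustration only.
body = b"example payload"
signature = sign_request(
    b"not-a-real-secret", "PUT", "/bookshelf/books/0306406152",
    len(body), "application/octet-stream", "Sun, 19 Apr 2009 10:00:00 GMT", body,
)
```

Because the URL is part of the signed string, replaying the same request against a different entry point yields a signature mismatch on the server.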

It is also pretty straightforward to wrap the digest authentication up as middleware: create a new model named Token with an access_id and access_key pair, plus a foreign key to User. We can then just copy AuthenticationMiddleware and override the get_user method to integrate the digest validation. Check revisions 21 to 22 of pattee for more details if you are interested.
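
A framework-free sketch of what such a get_user override might check. The TOKENS dict stands in for the Token model, and the "access_id:signature" credential format, the 15-minute freshness window, and the helper compute_signature are assumptions of this sketch rather than pattee's actual code.

```python
import base64
import hashlib
import hmac
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Stand-in for the Token model: access_id -> (access_key, username).
TOKENS = {"demo-id": (b"demo-key", "alice")}
MAX_SKEW = timedelta(minutes=15)  # assumed freshness window


def compute_signature(key, string_to_sign):
    digest = hmac.new(key, string_to_sign, hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def get_user(credentials, string_to_sign, date_header, now=None):
    """Return the username for a valid 'access_id:signature', else None."""
    access_id, _, signature = credentials.partition(":")
    token = TOKENS.get(access_id)
    if token is None:
        return None
    key, username = token
    now = now or datetime.now(timezone.utc)
    if abs(now - parsedate_to_datetime(date_header)) > MAX_SKEW:
        return None  # stale (or future-dated) request: possible replay
    expected = compute_signature(key, string_to_sign)
    # compare_digest avoids leaking the mismatch position through timing.
    if hmac.compare_digest(expected.encode("ascii"), signature.encode("utf-8")):
        return username
    return None
```

In a real middleware the lookup would hit the Token table and return the related User instead of a plain username.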

There is another issue worthy of our attention: both digest and session-based authentication are required, since the latter is the gatekeeper to the admin interface that manages the tokens used by the former. But they may not play well together: the server-side resources spent on cookie management are totally wasted when handling REST requests, and furthermore, cookies leave the door open to security exploits if they are stolen. We will discuss this issue later, stay tuned.

Book Review: Django 1.0 Template Development

Development, Python, Web March 12th, 2009

This is not a paid review: I did not receive a dime from the author, the publisher or their affiliates. However, I did get a free copy of the book, so some harsh criticism may be sugar-coated. Read with caution.

Though I am a die-hard RTFM guy, it never hurts to take advantage of the expertise of one's peers. Django 1.0 Template Development focuses on a relatively narrow topic, the template system of Django. The author puts himself in a dilemma: he expects the target audience to have a basic idea of the Django system, but still has to go through all the hassle (not really, though) of kick-starting a new project to make the book self-contained. I think the author did a great job for beginners, but I still highly recommend the official DjangoBook, tutorial and documentation.

After two chapters of warm-up, Chapter 3 shows the magic of Context and RequestContext. It is interesting to see how the project evolves from low-level operations to the shortcuts. Chapter 4 introduces the built-in tags in the toolbox. Chapters 5 and 6 demonstrate template inheritance and how multiple templates are served. In Chapter 7, developers can extend their toolbox by creating new filters and tags. Chapter 9 gives a series of examples of admin UI customization. Chapters 8 and 10 are about performance: pagination and caching. Last but not least, L10N in Chapter 11.

The examples in each chapter are atomic for easy understanding, but in my humble opinion, most of them do not impress upon the reader the power of Django. The framework shines when solving BIG and complex problems. I just wonder what would happen if the author started a much more ambitious project with a complicated specification, then decomposed it into small tasks and addressed them chapter by chapter to make the point, just like Dive into Python does.

Furthermore, I would appreciate it if the author could share more first-hand experience with readers. Engineering is always about problem-solving. The framework is naturally easy to learn; otherwise, why bother? But it may suck big time if it does not scale. Any real-world case would help to establish confidence for further adoption.

The bottom line: a good book for beginners, and some chapters are quite beefy for the topic. As Django is a fast-evolving project, I hope the author will bring more juicy examples in a future edition.

How to PUT a file in Django

Python, Web January 28th, 2009

Once we decide to go for PUT instead of POST, we step out of Django's comfort zone: there is no mapped form field and no validation, and we have to deal with the raw WSGI interface ourselves. Still, we can use django.core.files.File.

If we dig into the source code, django.core.files.File defines open, close, read, tell, seek, flush and some other Django-specific operations such as chunks, readlines, xreadlines etc. Ticket #8501 glues File and the file object together when the chunks method is missing.

It is interesting that the interface File exposes explicitly requires the underlying file object to support random access, which is most likely overkill for general use. Sometimes, less is more. It also implicitly expects that read will eventually return EOF, which is not true for wsgi.input. So we end up brewing our own:

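from django.core.files import File


class SocketFile(File):
    """File wrapper for a forward-only stream such as wsgi.input."""

    def __init__(self, socket, size):
        super(SocketFile, self).__init__(socket)
        self._size = int(size)  # total bytes available, i.e. CONTENT_LENGTH
        self._pos = 0           # bytes consumed so far

    def read(self, num_bytes=None):
        # Never read past the declared size, so callers see EOF
        # (an empty result) once the body is consumed.
        remaining = self._size - self._pos
        if num_bytes is None:
            num_bytes = remaining
        else:
            num_bytes = min(num_bytes, remaining)
        self._pos += num_bytes
        return self.file.read(num_bytes)

    def tell(self):
        return self._pos

    def seek(self, position):
        # Random access is not supported on a socket; ignore seek requests.
        pass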

The SocketFile object is initialized with the length of the socket file object, a.k.a. CONTENT_LENGTH, and the read method acts as the gatekeeper so that the operation eventually returns EOF. seek is inherited from File, so we just override it with a no-op. Simply wrap the raw wsgi.input in a SocketFile and use it as a File. Please check views.py for the usage.
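
To see the read-limiting idea in isolation, here is a sketch that does not depend on Django. BoundedReader is a hypothetical stand-in for SocketFile, and io.BytesIO plays the role of wsgi.input; neither name comes from the pattee source.

```python
import io


class BoundedReader:
    """Forward-only reader that reports EOF after `size` bytes."""

    def __init__(self, stream, size):
        self._stream = stream
        self._remaining = int(size)

    def read(self, num_bytes=None):
        # Clamp every request to what is left of the declared length.
        if num_bytes is None or num_bytes < 0 or num_bytes > self._remaining:
            num_bytes = self._remaining
        data = self._stream.read(num_bytes)
        self._remaining -= len(data)
        return data


# The trailing bytes mimic data on the socket that is not part of this body.
raw = io.BytesIO(b"hello worldEXTRA")
reader = BoundedReader(raw, size=11)  # size would come from CONTENT_LENGTH

chunks = []
while True:
    chunk = reader.read(4)
    if not chunk:  # empty bytes is the EOF signal a file-like consumer expects
        break
    chunks.append(chunk)
```

The loop terminates after the declared eleven bytes even though the underlying stream still holds more data, which is the behavior a chunked consumer relies on.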

Is get lazily evaluated?

Development, Python January 24th, 2009

What do you think a dict.get operation does?

d.get(the_key, fall_back)

Something like this?

try:
    return d[the_key]
except KeyError:
    return fall_back

That was my intuition: just like C's short-circuit evaluation, the fallback is not evaluated unless the requested key does not exist. Unfortunately, this is not true. For example:

>>> d = {1: 3, 2: 4}
>>> def f():
...     print(7)
...
>>> d.get(3, f())
7
>>> d.get(1, f())
7
3

f is always evaluated regardless of whether the key exists or not.

This is expected behavior from the compiler/interpreter perspective: the fallback argument needs to be reduced to a value before get is invoked. It would be nice if get were evaluated lazily, to avoid the expensive try/except mechanism or the tedious has_key test.
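
One way to get the lazy behavior is to pass a callable instead of a value. The helper get_lazy below is a hypothetical name for illustration, not a built-in, and the sketch uses Python 3 syntax.

```python
def get_lazy(mapping, key, make_default):
    """Like dict.get, but only calls make_default() when key is missing."""
    if key in mapping:
        return mapping[key]
    return make_default()


calls = []


def expensive_default():
    calls.append("called")  # record each invocation
    return 7


d = {1: 3, 2: 4}
hit = get_lazy(d, 1, expensive_default)   # key present: factory is not called
miss = get_lazy(d, 3, expensive_default)  # key missing: factory is called once
```

The function object is passed without parentheses, so nothing runs at the call site; the fallback is only computed inside get_lazy on a miss.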

RESTful Django practice

Python, Web January 14th, 2009

After several rounds of reading RESTful Web Services, I still have feeble confidence in my understanding of the hyped REST idea, so please never hesitate to criticize or make suggestions in the comments.

Expose the resources

A book can easily be identified by its ISBN or EAN; however, the identifier may stand for the book itself, as distinct from other books, or it may refer to an eBook instance used to present the content. We use an additional argument, format, to differentiate them:

/bookshelf/books/(isbn|ean)?format=(pdf|chm|…|json)

UPDATE: A better approach is to use a pseudo file name:

/bookshelf/books/(isbn|ean)(.(pdf|chm|…|json))

The default format is JSON: the server renders the meta data and the available eBook formats in JSON.
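
A small sketch of how the pseudo file name could be split into identifier and format, falling back to JSON. The regular expression is an assumption: its character classes are loose placeholders rather than a real ISBN/EAN validator, and since the full list of formats is elided above, any lowercase alphanumeric extension is accepted.

```python
import re

# Identifier followed by an optional ".format" suffix.
BOOK_URL = re.compile(
    r"^/bookshelf/books/(?P<ident>[0-9Xx]+)(?:\.(?P<format>[a-z0-9]+))?$"
)


def parse_book_url(path):
    """Return (identifier, format), or None if the path does not match."""
    match = BOOK_URL.match(path)
    if match is None:
        return None
    # No extension means the JSON representation of the book itself.
    return match.group("ident"), match.group("format") or "json"
```

For example, parse_book_url("/bookshelf/books/0306406152.pdf") yields the identifier together with "pdf", while the same path without an extension yields "json".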

Client request        Server response
GET ../isbn(.pdf)     Book meta data and available formats, or the eBook data.
                      404/Not Found if the book does not exist.
PUT ../isbn           201/Created if the server creates a new Book instance.
                      200/OK if the book exists.
                      400/Bad Request if the ISBN is invalid.
PUT ../isbn.pdf       201/Created if the eBook in the specified format does not exist.
                      200/OK if the eBook exists. The admin needs to moderate later.
                      400/Bad Request if the ISBN is invalid.
HEAD ../isbn          Available formats.
                      404/Not Found if the book does not exist.
HEAD ../isbn.pdf      The content length and other information about the file.
                      404/Not Found if the book does not exist.
DELETE ../isbn        200/OK: remove the Book instance if no related eBook exists.
                      404/Not Found if the book does not exist.
                      409/Conflict if at least one related eBook exists.
DELETE ../isbn.pdf    200/OK: remove the eBook instance; the admin needs to moderate.
                      404/Not Found if the book does not exist.
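
The PUT and DELETE rules for a Book can be sketched against an in-memory shelf. Everything here is an assumption for illustration: the shelf is a plain dict mapping an identifier to the set of its eBook formats, the function names are hypothetical, and only the 13-digit ISBN/EAN form is validated.

```python
import re


def is_valid_isbn13(isbn):
    """Check the ISBN-13/EAN-13 check digit (weights alternate 1 and 3)."""
    if not re.fullmatch(r"[0-9]{13}", isbn):
        return False
    total = sum(int(ch) * (3 if i % 2 else 1) for i, ch in enumerate(isbn))
    return total % 10 == 0


def put_book(shelf, isbn):
    if not is_valid_isbn13(isbn):
        return 400  # Bad Request: invalid ISBN
    if isbn in shelf:
        return 200  # OK: the book already exists
    shelf[isbn] = set()
    return 201  # Created


def delete_book(shelf, isbn):
    if isbn not in shelf:
        return 404  # Not Found
    if shelf[isbn]:
        return 409  # Conflict: related eBooks still exist
    del shelf[isbn]
    return 200  # removed
```

Checking the conflict before deleting keeps the Book in place until every related eBook has been removed first.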

Furthermore, the URL representation is supposed to be discoverable, so we add two boring URLs:

Client request          Server response
GET /bookshelf/         Available list; currently only books are supported.
GET /bookshelf/books    All books with pagination, ?page=n

Serialization

There is a generic RESTful Django project, django-rest-interface; unsurprisingly, it uses the built-in JSON serializer.

The default JSON serializer is convenient, but from my understanding, it is more or less geared towards round-trip data serialization, while we favor presentation only:

  • Too many database details are exposed to end users.
  • ForeignKey and ManyToMany are rendered as external links using the id field.

Furthermore, the JSON serializer is not lazy enough: the data has to be fetched from the database and held in memory before it is dumped to the stream. This may result in a serious scaling issue. A side effect of this writing policy is that it is impossible to serialize recursively, simply because the stream is not flushed until all the objects have been processed. Otherwise, ForeignKey and ManyToMany could be easily handled. A better solution is to take an approach similar to SAX: the tags are emitted recursively as soon as a new object needs to be serialized.
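
A rough sketch of the SAX-like idea using a Python 3 generator: fragments are emitted as soon as each object is reached, and nested objects are serialized inline by recursion. The function iter_json and the sample data are hypothetical and are not Pattee's implementation; a generator stands in for a lazily evaluated queryset.

```python
import json


def iter_json(value):
    """Yield JSON text fragments for value, descending into containers lazily."""
    if isinstance(value, dict):
        yield "{"
        for index, (key, item) in enumerate(value.items()):
            if index:
                yield ", "
            yield json.dumps(str(key)) + ": "
            yield from iter_json(item)  # related objects are emitted inline
        yield "}"
    elif isinstance(value, (list, tuple)) or hasattr(value, "__next__"):
        yield "["
        for index, item in enumerate(value):
            if index:
                yield ", "
            yield from iter_json(item)
        yield "]"
    else:
        yield json.dumps(value)  # scalars: str, int, float, bool, None


def books():
    # Rows are produced one at a time, never collected into a list.
    yield {"isbn": "9780306406157", "formats": ["pdf", "chm"]}
    yield {"isbn": "0000000000000", "formats": []}


fragments = iter_json({"books": books()})
text = "".join(fragments)  # a real view would write each fragment to the response
```

Joining the fragments is only done here to inspect the result; writing them to the response as they arrive keeps at most one row in memory at a time.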

We will discuss Pattee's implementation next time.