How to PUT a file in Django

Python, Web January 28th, 2009

Once we decide to go for PUT instead of POST, we step out of Django's comfort zone: there is no mapped form field and no validation, and we have to deal with the raw WSGI interface ourselves. Still, we can use django.core.files.File.

If we dig into the source code, django.core.files.File defines open, close, read, tell, seek, flush and some Django-specific operations such as chunks, readlines and xreadlines. Ticket #8501 glues File and a plain file object together when the chunks method is missing.

It is interesting that the interface File exposes explicitly requires the underlying file object to support random access, which is more than most uses need. Sometimes, less is more. It also implicitly expects read to return EOF, which is not guaranteed for wsgi.input. So we end up brewing our own:

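from django.core.files import File


class SocketFile(File):
    # Only forward access is allowed
    def __init__(self, socket, size):
        super(SocketFile, self).__init__(socket)
        self._size = int(size)  # total bytes expected, from CONTENT_LENGTH
        self._pos = 0           # bytes handed out so far

    def read(self, num_bytes=None):
        # Never ask the socket for more than the bytes that remain,
        # so the caller sees EOF (an empty result) at the end of the body
        remaining = self._size - self._pos
        if num_bytes is None:
            num_bytes = remaining
        else:
            num_bytes = min(num_bytes, remaining)
        data = self.file.read(num_bytes)
        self._pos += len(data)  # count what was actually read
        return data

    def tell(self):
        return self._pos

    def seek(self, position):
        # Random access is not possible on a socket; ignore the request
        pass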

The SocketFile object is initialized with the length of the request body, i.e. CONTENT_LENGTH, and the read method caps every read so that the caller eventually sees EOF. seek is part of the File interface, so we simply override it as a no-op. Wrap the raw wsgi.input with SocketFile and use it as a File. Please check views.py for the usage.
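To see the bounded-read idea in isolation, here is a minimal sketch that runs without Django. BoundedReader, its chunks method, and the simulated environ are hypothetical stand-ins, not the code in views.py; an in-memory stream plays the role of wsgi.input.

```python
import io


class BoundedReader(object):
    """Forward-only reader that reports EOF after `size` bytes.

    Hypothetical, Django-free stand-in for the SocketFile idea.
    """

    def __init__(self, stream, size):
        self._stream = stream
        self._remaining = int(size)

    def read(self, num_bytes=None):
        # Cap the request at the number of bytes still owed to the caller.
        if num_bytes is None or num_bytes > self._remaining:
            num_bytes = self._remaining
        data = self._stream.read(num_bytes)
        self._remaining -= len(data)
        return data

    def chunks(self, chunk_size=4):
        # Yield pieces until read() returns an empty result.
        while True:
            data = self.read(chunk_size)
            if not data:
                break
            yield data


# Simulated WSGI environ: the stream holds more bytes than the declared
# body length, so an unbounded read() would return too much.
environ = {
    "CONTENT_LENGTH": "11",
    "wsgi.input": io.BytesIO(b"hello worldEXTRA"),
}
body = BoundedReader(environ["wsgi.input"], environ["CONTENT_LENGTH"])
received = b"".join(body.chunks())
```

Only the first 11 bytes end up in received; every later read returns an empty result, which is the EOF signal the File interface expects.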

RESTful Django practice

Python, Web January 14th, 2009

After several rounds of reading RESTful Web Services, I still have little confidence in my understanding of the hyped REST idea, so please never hesitate to criticize or make suggestions in the comments.

Expose the resources

A book can easily be identified by its ISBN or EAN. However, the identifier may refer either to the book itself, as distinct from other books, or to an eBook instance that carries the content for presentation. We use an additional argument, format, to differentiate them:

/bookshelf/books/(isbn|ean)?format=(pdf|chm|…|json)

UPDATE: A better approach is to use a pseudo file name:

/bookshelf/books/(isbn|ean)(.(pdf|chm|…|json))

The default format is JSON: the server renders the metadata and the available eBook formats in JSON.

Client request       Server response

GET ../isbn(.pdf)    Book metadata and available formats, or the eBook data.
                     404/Not Found if the book does not exist.

PUT ../isbn          201/Created if the server creates a new Book instance.
                     200/OK if the book exists.
                     400/Bad Request if the ISBN is invalid.

PUT ../isbn.pdf      201/Created if the eBook in the specified format does not exist.
                     200/OK if the eBook exists. The admin needs to moderate later.
                     400/Bad Request if the ISBN is invalid.

HEAD ../isbn         Available formats.
                     404/Not Found if the book does not exist.

HEAD ../isbn.pdf     The content length and other information about the file.
                     404/Not Found if the book does not exist.

DELETE ../isbn       202/Accepted; the Book instance is removed if no related eBook exists.
                     404/Not Found if the book does not exist.
                     409/Conflict if at least one related eBook exists.

DELETE ../isbn.pdf   202/Accepted; the eBook instance is removed once the admin has moderated.
                     404/Not Found if the book does not exist.
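As an illustration of the PUT ../isbn row, here is a minimal sketch of the status-code decision against an in-memory store. The put_book function, the dict used as storage, and the format-only validity check are assumptions made for the example; they are not Pattee's implementation.

```python
import re

# Assumption: an identifier counts as valid if it is 10 or 13 letters or
# digits. This is a stand-in for real ISBN/EAN checksum validation.
ISBN_RE = re.compile(r"^(?:[0-9A-Za-z]{10}|[0-9A-Za-z]{13})$")


def put_book(store, isbn, metadata):
    """Return the HTTP status code for PUT ../isbn."""
    if not ISBN_RE.match(isbn):
        return 400  # Bad Request: the ISBN is invalid
    created = isbn not in store
    store[isbn] = metadata  # create or update the Book
    return 201 if created else 200  # Created vs. OK


store = {}
first = put_book(store, "0123456789", {"title": "Example"})
second = put_book(store, "0123456789", {"title": "Example, revised"})
invalid = put_book(store, "not-an-isbn", {})
```

The first PUT creates the book, the second one finds it already there, and the malformed identifier is rejected before the store is touched.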

Furthermore, the URL representation is supposed to be discoverable, so we add two boring URLs:

Client request         Server response

GET /bookshelf/        List of available resources; currently only books are supported.

GET /bookshelf/books   All books with pagination, ?page=n

Serialization

There is a generic RESTful Django project, django-rest-interface; no surprise, it uses the built-in JSON serializer.

The default JSON serializer is convenient, but as far as I understand, it is geared towards round-trip data serialization, while we care about presentation only:

  • Too many database details are exposed to the end users.
  • ForeignKey and ManyToMany fields are rendered as external links using the id field.

Furthermore, the JSON serializer is not lazy enough: the data has to be fetched from the database and held in memory before it is dumped to the stream, which may become a serious scaling issue. A side effect of this writing policy is that it is impossible to serialize recursively, because the stream is not flushed until all the objects have been processed; otherwise, ForeignKey and ManyToMany fields could be handled easily. A better solution is to take an approach similar to SAX: emit the tags recursively as soon as a new object needs to be serialized.
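A rough sketch of the SAX-like idea, assuming plain Python objects instead of Django models: a generator emits JSON fragments recursively, so nested objects are written as they are reached and nothing has to be collected up front. The emit and books names are hypothetical; this is not Django's serializer API.

```python
import json


def emit(obj):
    """Yield JSON text fragments for obj, one small piece at a time."""
    if obj is None or isinstance(obj, (str, int, float, bool)):
        yield json.dumps(obj)
    elif isinstance(obj, dict):
        yield "{"
        for i, (key, value) in enumerate(obj.items()):
            if i:
                yield ", "
            yield json.dumps(str(key)) + ": "
            yield from emit(value)  # recurse into the nested value
        yield "}"
    else:
        # Any other iterable, including a lazy generator.
        yield "["
        for i, item in enumerate(obj):
            if i:
                yield ", "
            yield from emit(item)
        yield "]"


def books():
    # Hypothetical lazy source standing in for a database cursor.
    for n in range(2):
        yield {"isbn": "000000000%d" % n, "formats": ["pdf", "chm"]}


# A real view would write each fragment to the response stream;
# joining them here only makes the result easy to inspect.
text = "".join(emit({"books": books()}))
```

Because books() is consumed one item at a time, each book can be written out before the next one is fetched.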

We will discuss Pattee’s implementation next time.

Tip: Reuse Django view in urlconf

Python, Web January 11th, 2009

Pro Django introduces a convenient way to reuse a view: pass an optional dictionary in the URL pattern to feed it extra information.

We may extend this trick a little further. The Book model has two unique fields, isbn and ean. Queries on them are essentially the same except for the field being queried, so we can reuse the view by telling the requests apart by the length of the identifier:

from django.conf.urls.defaults import patterns  # Django 1.0-era import

urlpatterns = patterns('',
    (r'^books/(?P<isbn>[\d\w]{10})$', views.detail),  # ISBN
    (r'^books/(?P<ean>[\d\w]{13})$', views.detail),   # EAN
)

Then, in detail, use the magic kwargs to bring the information in:

def detail(request, **kwargs):
  book_qs = Book.objects.filter(**kwargs)
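The dispatch can be tried outside Django with the re module alone. In this sketch the resolve helper is hypothetical; it only mimics how the named group of the first matching pattern becomes the keyword arguments of the view.

```python
import re

# The same two patterns as in the urlconf, tried in order.
PATTERNS = [
    re.compile(r"^books/(?P<isbn>[\d\w]{10})$"),
    re.compile(r"^books/(?P<ean>[\d\w]{13})$"),
]


def resolve(path):
    """Return the keyword arguments the view would receive, or None."""
    for pattern in PATTERNS:
        match = pattern.match(path)
        if match:
            return match.groupdict()
    return None


isbn_kwargs = resolve("books/0123456789")    # 10 characters
ean_kwargs = resolve("books/9780123456789")  # 13 characters
no_match = resolve("books/12345")            # neither length
```

Since the key of the dictionary is the field name, the filter(**kwargs) call in the view turns into filter(isbn=...) or filter(ean=...) without any branching.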

Customize the Django newform admin UI

Python, Web January 11th, 2009

One of the exciting features Django 1.0 brings to the table is the integration of newforms into the admin UI. Hats off to Brian, great work.

The main workload in Pattee’s admin UI is entering the meta information of a book and then linking it to the eBook file. It is more convenient to take advantage of Amazon Web Services, so the user scenario is:

  1. Search Amazon by keyword or ISBN
  2. Select the correct book
  3. Update the eBook file
  4. Save it to the database
Add a Book

It is straightforward to override the default template and implementation: redirect the admin URL to our own version of the change_form.html template and add_view implementation, then save the object regardless of the data integration. Alternatively, we may reuse the infrastructure and implement only the functions we need. Pattee takes the latter approach.

First, declare the BookAdmin and eBookAdmin in admin.py, and hook eBook inline:

from django.contrib import admin

class eBookAdmin(admin.StackedInline):
    # Edit eBook rows inline on the Book page
    model = eBook
    extra = 1

class BookAdmin(admin.ModelAdmin):
    form = BookForm
    inlines = [eBookAdmin]
    change_form_template = 'book_change_form.html'

Don’t forget to register it with admin.site, which is required by Django 1.0:

admin.site.register(Book, BookAdmin)

Then, in BookForm, we can override full_clean to provide the cleaned_data. In BookAdmin, the following trick guards against duplicate Book instances:

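from django.core.exceptions import ObjectDoesNotExist

# Defined on BookAdmin
def save_form(self, request, form, change):
    if form.is_valid():
        try:
            # Reuse the existing Book with this ISBN instead of creating a duplicate
            instance = self.model.objects.get(isbn=form.cleaned_data['isbn'])
        except ObjectDoesNotExist:
            return super(BookAdmin, self).save_form(request, form, change)
        return instance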

That is all. You may want to check out the source code for details:

svn checkout http://my-svn.assembla.com/svn/pattee/trunk pattee

Restart the Django engine

Python, Web January 10th, 2009

It has been one year since the last update of my Learning Django by Example series. This is partly on account of a demanding day job, but the main reason is that the project became bloated with advanced features that delayed scratching my own itch. Thanks to the snow storm in the holiday season, our vacation to Las Vegas was canceled, and I had time to restart the Django engine.

The new project, Pattee, named after the library at Penn State University, aims to be a simple, stupid eBook management system. It is so simple that it does not even have a dedicated UI; it is only integrated into a third-party web application (currently douban.com). You may check it out here:

svn checkout http://my-svn.assembla.com/svn/pattee/trunk pattee

There is no step-by-step tutorial following in the footsteps of the Django Book; each post will target a specific topic.

I may post