Tip: Reuse Django view in urlconf

Python, Web January 11th, 2009

Pro Django introduces a convenient way to reuse a view: pass an optional dictionary in the urlconf to feed the view extra information.

We may extend this trick a little further. The Book model has two unique fields, isbn and ean. Lookups on them are essentially the same except for the queried field, so we can reuse one view and tell the requests apart by the length of the identifier:

urlpatterns = patterns('',
    (r'^books/(?P<isbn>[\d\w]{10})$', views.detail),  # ISBN
    (r'^books/(?P<ean>[\d\w]{13})$', views.detail),  # EAN
)

Then, in detail, use the magic kwargs to bring the information in:

def detail(request, **kwargs):
  book_qs = Book.objects.filter(**kwargs)
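
To see why a single view can serve both patterns, here is a minimal sketch in plain Python 3 that needs no Django. The resolve helper and the URL_PATTERNS table are hypothetical stand-ins for Django's URL resolver, not its real implementation; the point is only that the name of the matched group becomes the keyword passed on to the view, and from there to filter(**kwargs).

```python
import re

# Hypothetical stand-in for the urlconf: same regexes, view given by name.
URL_PATTERNS = [
    (re.compile(r'^books/(?P<isbn>[\d\w]{10})$'), 'detail'),  # ISBN
    (re.compile(r'^books/(?P<ean>[\d\w]{13})$'), 'detail'),   # EAN
]


def resolve(path):
    """Return (view_name, kwargs) for the first matching pattern, else None."""
    for pattern, view_name in URL_PATTERNS:
        match = pattern.match(path)
        if match:
            # The named group decides which field the view will query.
            return view_name, match.groupdict()
    return None
```

A 10-character identifier arrives as {'isbn': ...} and a 13-character one as {'ean': ...}, so the view never has to inspect the length itself.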

Customize the Django newform admin UI

Python, Web January 11th, 2009

One of the exciting features Django 1.0 brings to the table is the integration of newforms into the admin UI. Hats off to Brian, great work.

The essential workload in Pattee’s admin UI is to input the meta information of a book, then link it to the eBook file. It is more convenient to take advantage of Amazon Web Services, so the user scenario is:

  1. Search Amazon by keyword or ISBN
  2. Select the correct book
  3. Update the eBook file
  4. Save it to the database
Add a Book

It is straightforward to override the default template and implementation: redirect the admin URL to our own version of the change_form.html template and add_view implementation, then save the object regardless of the data integration. Alternatively, we may reuse the infrastructure and implement only the functions we need. Pattee takes the latter approach.

First, declare the BookAdmin and eBookAdmin in admin.py, and hook eBook inline:

class eBookAdmin(admin.StackedInline):
    model = eBook
    extra = 1

class BookAdmin(admin.ModelAdmin):
    form = BookForm
    inlines = [eBookAdmin]
    change_form_template = 'book_change_form.html'

And don’t forget to register it with admin.site, which is required by Django 1.0.

admin.site.register(Book, BookAdmin)

Then, in BookForm, we can override full_clean to provide the cleaned_data. And in BookAdmin, the following trick guards against duplicate Book instances:

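# Requires: from django.core.exceptions import ObjectDoesNotExist
def save_form(self, request, form, change):
    if form.is_valid():
        try:
            # Reuse the existing Book with this ISBN instead of creating a duplicate
            instance = self.model.objects.get(isbn=form.cleaned_data['isbn'])
        except ObjectDoesNotExist:
            # No such book yet: fall back to the default behaviour
            return super(BookAdmin, self).save_form(request, form, change)
        return instance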
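
The control flow of that guard can be tried without a Django project. The following is only a sketch in plain Python 3: Book and BookStore are hypothetical in-memory stand-ins for the model and its manager, and save_unique plays the role of save_form. It shows the same idea: look the ISBN up first and create a new object only when the lookup fails.

```python
class Book:
    """Hypothetical stand-in for the Book model."""

    def __init__(self, isbn, title):
        self.isbn = isbn
        self.title = title


class BookStore:
    """Hypothetical in-memory stand-in for Book.objects."""

    def __init__(self):
        self._by_isbn = {}

    def save_unique(self, cleaned_data):
        # Same decision as save_form: return the existing instance if the
        # ISBN is already known, otherwise create and remember a new one.
        existing = self._by_isbn.get(cleaned_data['isbn'])
        if existing is not None:
            return existing
        book = Book(**cleaned_data)
        self._by_isbn[book.isbn] = book
        return book


store = BookStore()
first = store.save_unique({'isbn': '0131103628', 'title': 'First entry'})
second = store.save_unique({'isbn': '0131103628', 'title': 'Second entry'})
```

Here second is the very same object as first, so submitting the form twice does not produce two books.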

That is all. You may want to check out the source code for details:

svn checkout http://my-svn.assembla.com/svn/pattee/trunk pattee

Restart the Django engine

Python, Web January 10th, 2009

It has been one year since the last update of my Learning Django by Example series. This is partly on account of the demanding day job, but the main reason is that the project became bloated with advanced features, which delayed scratching my itch. Thanks to the snow storm in the holiday season, our vacation to Las Vegas was canceled, so I had time to restart the Django engine.

The new project, Pattee, named after the library at Penn State University, aims to be a simple, stupid eBook management system. It is so simple that it does not even have a dedicated UI; it is only integrated into a third-party web application (currently douban.com). You may check it out here:

svn checkout http://my-svn.assembla.com/svn/pattee/trunk pattee

There is no step-by-step tutorial that follows in the footsteps of the Django Book; each post will target a specific topic.

I may post

Parse HTML file with BeautifulSoup

Development, Python October 12th, 2008

In the last post, a regular expression was used to fetch the specific information. To access structured information, BeautifulSoup is preferred for its simplicity and convenient API:

  • You may override fromEncoding in the constructor, which is very useful for non-Roman, non-standard web pages.
  • Versatile find/findAll on tags and attributes.
  • Developer-friendly syntactic sugar: Tag implements the interfaces of string, list, dict, and callable, so there are many ways to access the data. The drawback of this approach is that a typo is only caught at run time rather than at compile time.
  • Easy to deploy: only one BeautifulSoup.py file.

Something I don’t like:

  • No XPath support, so more effort is needed to port from JavaScript.
  • The API does not support streams or file objects. Laziness is always cherished for pipelining.
  • Why BeautifulSoup? I have mistyped it as soap more than ten times.

Here is the home-brewed script that wades through a Dvbbs thread to find the corresponding messages: elevator.py

WYS is not always WYG in python.re

Development, Python October 2nd, 2008

After almost two months of hard work, I finally checked in the feature, and tonight I decided to relax with some leisurely Python programming.

This side project is quite trivial: fetch the HTML content, search for the keywords in the thread, and build a linked table of contents for navigation. The only intriguing highlight that makes this post worth five minutes of your time is that the page is in Chinese and encoded in GB2312.

Long story short, I am trying to search the total number of posts in the thread using this regular expression:

pattern = re.compile('(?<=<b class="page">总数 )(?P<total>\d+)')

The first catch is that I have to declare the encoding of the source code, as the Python interpreter complains:

SyntaxError: Non-ASCII character '\xe6' in file ./elevator.py on line 17, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

OK, I will stick to UTF-8, so add this declaration in the second line:
# -*- coding: utf-8 -*-

It still does not work, and the dumped content of the page is a mess. Oops, we forgot to decode the content to Unicode. Use codecs to wrap the handle returned by urlopen:

import codecs
import urllib

gb = codecs.lookup('gb2312')
# load the page and decode it from GB2312 to Unicode
content = gb.streamreader(urllib.urlopen(url)).read()
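
The same wrapping can be tried offline. This is a minimal sketch in Python 3, assuming an in-memory io.BytesIO object as a stand-in for the handle that urlopen would return; the sample markup is made up for illustration. The stream reader turns GB2312 bytes into Unicode text as they are read.

```python
import codecs
import io

# Made-up sample page, encoded the way the server would send it.
raw = '<b class="page">总数 128</b>'.encode('gb2312')

# Stand-in for the network handle: any object with a read() method works.
handle = io.BytesIO(raw)

gb = codecs.lookup('gb2312')
# Wrap the byte stream so that read() returns decoded text.
content = gb.streamreader(handle).read()
```

After this, content is ordinary Unicode text and can be searched with a Unicode pattern.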

And don’t forget to add either the Unicode prefix or the re.UNICODE flag to the pattern.

pattern = re.compile('(?<=<b class="page">总数 )(?P<total>\d+)', re.UNICODE)

Still no luck. Yet the same pattern works in the Python console with faked data, and it also works in the script if we change it a little:

pattern = re.compile('(?<=<b class="page">.{2} )(?P<total>\d+)', re.UNICODE)

It looks like the troublemaker is the non-Latin characters: 总数. Let’s play a little in the pdb console:

(Pdb) '总数'
'\xe6\x80\xbb\xe6\x95\xb0'
(Pdb) '总数'.decode('utf8')
u'\u603b\u6570'

And it works finally with the hard-coded Unicode character:

pattern = re.compile(u'(?<=<b class="page">\u603b\u6570 )(?P<total>\d+)', re.UNICODE)

We can use the decode method to avoid the ugly Unicode escapes and improve readability:

pattern = re.compile('(?<=<b class="page">总数 )(?P<total>\d+)'.decode('utf-8'), re.UNICODE)

Note that the codec passed to decode MUST be consistent with the encoding declared for the source file.
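
A small sketch of why the two must agree, written in Python 3 where the byte string has to be built explicitly. It assumes the source file is saved as UTF-8, so source_bytes models what the Python 2 literal '总数' holds in memory; errors='replace' is used only so the mismatched decode does not raise.

```python
literal = '总数'

# What a Python 2 byte-string literal contains when the file is UTF-8.
source_bytes = literal.encode('utf-8')

# Decoding with the codec that matches the file encoding restores the text.
matching = source_bytes.decode('utf-8')

# Decoding with a different codec does not; the result is no longer 总数.
mismatched = source_bytes.decode('gb2312', errors='replace')
```

Only the matching decode gives back the two characters the pattern is supposed to contain.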

Some speculations based upon the observation:

  • re.UNICODE does not enforce Unicode mode; it just redefines escapes such as \b and \w.
  • A pattern and a string that are both Unicode implicitly invoke Unicode mode. That explains why some patterns work in the Python console only: there, both are encoded in UTF-8, so re really runs in 8-bit mode!
  • The Python interpreter does not translate a literal string even though the code page is specified.

Please leave your insight in the comments. Thanks

UPDATE:
First, thanks for all the comments. It seems that I had a typo when testing the pattern with the Unicode prefix. Here are the test cases:

patterns = [
    re.compile('(?<=<b class="page">总数 )(?P<total>\d+)</b>'.decode('utf8'), re.UNICODE),
    re.compile(ur'(?<=<b class="page">总数 )(?P<total>\d+)</b>', re.UNICODE),
    re.compile(u'(?<=<b class="page">总数 )(?P<total>\d+)</b>', re.UNICODE),
    re.compile('(?<=<b class="page">总数 )(?P<total>\d+)</b>', re.UNICODE),
    ]

print [ pattern.search(s) for pattern in patterns ]

The output is:

[<_sre.SRE_Match object at 0xb7c18260>, <_sre.SRE_Match object at 0xb7c18360>, <_sre.SRE_Match object at 0xb7c183a0>, None]
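
A plausible reading of that result is that only the last pattern is a byte string, while s is Unicode. The sketch below replays the situation in Python 3, where str is always Unicode and byte strings must be created explicitly; the page variable is made-up sample data, and the byte pattern models a Python 2 literal in a UTF-8 source file. Note one difference: Python 3 refuses to mix a bytes pattern with a text subject, whereas Python 2 allowed it and simply found no match.

```python
import re

page = '<b class="page">总数 128</b>'  # made-up sample data

# Text pattern against text: the equivalent of the u'...' patterns.
text_pattern = re.compile(r'(?<=<b class="page">总数 )(?P<total>\d+)</b>')
text_match = text_pattern.search(page)

# Byte pattern: what the unprefixed literal holds in a UTF-8 source file.
byte_pattern = re.compile(
    r'(?<=<b class="page">总数 )(?P<total>\d+)</b>'.encode('utf-8'))

# Bytes against bytes in the same encoding match (the console experiment).
same_encoding = byte_pattern.search(page.encode('utf-8'))

# Bytes against bytes in another encoding do not match.
other_encoding = byte_pattern.search(page.encode('gb2312'))

# Python 3 rejects a bytes pattern on a text subject outright.
try:
    byte_pattern.search(page)
    mixed = 'searched'
except TypeError:
    mixed = 'TypeError'
```

So the pattern and the subject have to be the same kind of string, and if both are bytes they also have to share an encoding.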

Download test.py