One up to Python expert (1) - Decorators

Development November 15th, 2007

I have been using Python for almost four years. I still remember how amazed I was, at first sight, by the elegance of its indentation and its simplicity. I started with the official tutorial, then dove in deeper: daily scripts, then a couple of side projects. Yet even today, I hesitate to call myself a Python expert. Instead, I would like to share this series as a journey to broaden my vision and deepen my insight into this beautiful language.

Introduction

Dr. David Mertz gives a stunning demonstration of this language feature, the decorator, in his Charming Python column. Here is the sample code, which adds spam, Python’s favorite food, to an arbitrary function:

#!/usr/bin/env python

def addspam(fn):
    def new(*args):
        "new method"
        print("spam spam spam")
        return fn(*args)
    return new

@addspam
def add(a, b):
    "add method"
    print(a**2 + b**2)

if __name__ == "__main__":
    add(3, 4)

The output is:

spam spam spam
25

According to the Python Language Reference, decorators are defined as follows:

Decorator expressions are evaluated when the function is defined, in the scope that contains the function definition. The result must be a callable, which is invoked with the function object as the only argument. The returned value is bound to the function name instead of the function object. Multiple decorators are applied in nested fashion.

Therefore, our decorated add is equivalent to:

add = addspam(add)

new is a nested function; its name is arbitrary, since we never call it by that name. The decorator expression is evaluated when add is defined: addspam(add) runs once and returns the callable new, which is then bound to the name add. When add(3, 4) is invoked later, it is really new that receives the arguments. As a well-behaved decorator, new eventually calls the decoratee after spreading the word, “spam”.
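To see that the @ line really is just that assignment, here is a sketch in Python 3 syntax that spells the decoration out by hand. The extra print in addspam is added for illustration only: it fires once, when the decoration happens, not on every call.

```python
def addspam(fn):
    # Added for illustration: runs once, at decoration time.
    print("decorating " + fn.__name__)
    def new(*args):
        "new method"
        print("spam spam spam")
        return fn(*args)
    return new

def add(a, b):
    "add method"
    print(a**2 + b**2)

add = addspam(add)  # same effect as @addspam above def add; prints "decorating add"
add(3, 4)           # prints "spam spam spam", then 25
print(add.__name__) # new
```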

Before we rush to more sophisticated applications, let’s take a look at a flaw in the crystal ball: it is not crystal transparent. new blocks the signature (and other metadata, such as the docstring) of add:

>>> print(add.__doc__)
new method
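Copying the metadata by hand takes only a couple of assignments, as in this sketch (Python 3 syntax). It fixes __doc__ and __name__, but not the argument list: introspection still sees new’s *args instead of (a, b).

```python
import inspect

def addspam(fn):
    def new(*args):
        "new method"
        print("spam spam spam")
        return fn(*args)
    # Copy the metadata by hand so that new impersonates fn.
    new.__name__ = fn.__name__
    new.__doc__ = fn.__doc__
    return new

@addspam
def add(a, b):
    "add method"
    print(a**2 + b**2)

print(add.__doc__)                       # add method
print(inspect.getfullargspec(add).args)  # [] : the (a, b) signature is still hidden
```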

We could copy the metadata by hand, but is there a smarter way to avoid the boilerplate? Yes, we can use Michele Simionato’s decorator library like this:

#!/usr/bin/env python
from decorator import decorator

@decorator
def addspam(fn, *args, **kw):
    "new method"
    print("spam spam spam", args)
    return fn(*args, **kw)

@addspam
def add(a, b):
    "add method"
    print(a**2 + b**2)

if __name__ == "__main__":
    print(add.__doc__)
    add(3, 4)

That is quite dizzying. What on earth is going on under the hood?

Under the hood

First, let us inspect the vanilla version: each decorator blocks the signature of its decoratee, as illustrated by the different colors.

Signatures blocked by decorators

Here is a code snippet from decorator.py:

def decorator(caller):
    def _decorator(func): # the real meat is here
        infodict = getinfo(func)
        argnames = infodict['argnames']
        assert not ('_call_' in argnames or '_func_' in argnames), (
            'You cannot use _call_ or _func_ as argument names!')
        src = "lambda %(signature)s: _call_(_func_, %(signature)s)" % infodict
        # import sys; print >> sys.stderr, src # for debugging purposes
        dec_func = eval(src, dict(_func_=func, _call_=caller))
        return update_wrapper(dec_func, func, infodict)
    return update_wrapper(_decorator, caller)
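To make the string-building line concrete, here is a sketch (Python 3 syntax) of what src becomes for add(a, b) and what eval turns it into. The infodict below is a hand-written stand-in for getinfo’s result, holding only the 'signature' key the template uses; the real dictionary has more entries, such as 'argnames'.

```python
def add(a, b):
    "add method"
    return a**2 + b**2

def addspam(fn, *args):
    print("spam spam spam", args)
    return fn(*args)

# Hypothetical stand-in for getinfo(add): only the key used by the template.
infodict = {"signature": "a, b"}
src = "lambda %(signature)s: _call_(_func_, %(signature)s)" % infodict
print(src)  # lambda a, b: _call_(_func_, a, b)

# eval the source with _func_ and _call_ bound in its globals, as decorator.py does.
dec_func = eval(src, dict(_func_=add, _call_=addspam))
print(dec_func(3, 4))  # prints "spam spam spam (3, 4)", then 25
```

The resulting lambda takes exactly (a, b), so its own argument list matches add’s.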

The first difference that caught my eye was _decorator’s argument, func, as opposed to new’s *args. Does this matter?

Yes, that is the trick behind the magic. decorator decorates addspam, which in turn decorates add. Are you still awake? So add is the argument of decorator’s inner function, i.e. _decorator.

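There are two assistants behind the magic: getinfo collects the signature information of a function into a dictionary, and update_wrapper seals a wrapper with the metadata of the function it stands in for (first _decorator with caller’s, later dec_func with func’s). When addspam is defined under @decorator, decorator(addspam) is evaluated and returns _decorator carrying addspam’s signature; in other words, decorator is transparent to addspam. Since the name addspam now refers to _decorator, decorating add with @addspam executes _decorator(add) when add is defined: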

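  1. Sanity check: make sure the reserved names _call_ and _func_ are not used as argument names.
  2. Cook the real meat: build the source of a lambda with exactly add’s signature, whose body calls addspam(add, ...), and eval it into dec_func. See 3.6.2 String Formatting Operations in the Python Library Reference for the %(name)s mapping syntax.
  3. Seal the can with func’s (i.e. add’s) signature.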
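The three steps can be condensed into the following sketch, written for Python 3 with inspect.signature and functools.update_wrapper standing in for getinfo and decorator.py’s own update_wrapper. It is not Michele Simionato’s code, and it assumes the decorated function takes only plain positional parameters (no defaults, *args, or **kw).

```python
import functools
import inspect

def decorator(caller):
    """Turn caller(func, *args) into a signature-preserving decorator (sketch)."""
    def _decorator(func):
        # Assumption: plain positional parameters only, so the signature is a list of names.
        argnames = list(inspect.signature(func).parameters)
        # Step 1: sanity check on the reserved names.
        assert not ("_call_" in argnames or "_func_" in argnames), (
            "You cannot use _call_ or _func_ as argument names!")
        # Step 2: build a lambda with func's exact argument list and eval it.
        signature = ", ".join(argnames)
        src = "lambda %s: _call_(_func_, %s)" % (signature, signature)
        dec_func = eval(src, dict(_func_=func, _call_=caller))
        # Step 3: seal the can with func's metadata (__name__, __doc__, ...).
        return functools.update_wrapper(dec_func, func)
    return functools.update_wrapper(_decorator, caller)

@decorator
def addspam(fn, *args):
    "new method"
    print("spam spam spam", args)
    return fn(*args)

@addspam
def add(a, b):
    "add method"
    return a**2 + b**2

print(add.__doc__)                       # add method
print(inspect.getfullargspec(add).args)  # ['a', 'b']
print(add(3, 4))                         # prints "spam spam spam (3, 4)", then 25
```

In Python 3, inspect.signature also follows the __wrapped__ attribute that update_wrapper sets, so the trick matters less there; the eval step still shows up when a tool inspects the wrapper’s own argument list, as getfullargspec does here.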

Here is the dynamic illustration:

Signatures relayed by decorators

Patterns

I will discuss this topic in detail later; meanwhile, you could take a look at the official Python wiki.

Memoize

This pattern is well documented on Wikipedia, and Dr. Dobb’s Journal (DDJ) has an effective but obtrusive implementation[1]. The corresponding Python implementation is much more intuitive:

@memoize
def fib(n):
    print("%d is calculated" % n)
    if n < 2:
        return 1
    else:
        return fib(n-1) + fib(n-2)
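memoize itself is not defined above. Here is a minimal sketch in Python 3 syntax, assuming a hypothetical memoize that caches results in a dictionary keyed by the positional arguments, which therefore must be hashable:

```python
import functools

def memoize(fn):
    """Hypothetical helper: cache fn's results by positional arguments."""
    cache = {}
    @functools.wraps(fn)
    def wrapper(*args):
        if args not in cache:
            cache[args] = fn(*args)
        return cache[args]
    return wrapper

@memoize
def fib(n):
    print("%d is calculated" % n)
    if n < 2:
        return 1
    else:
        return fib(n-1) + fib(n-2)

# The recursive calls go through the memoized name fib, so each n from 10
# down to 0 is calculated exactly once before 89 is printed.
print(fib(10))
```

Since Python 3.2, the standard library offers the same unbounded caching as @functools.lru_cache(maxsize=None).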

Programming by Contract

Programming by contract is an approach to software engineering. Microsoft Visual C++ introduced SAL annotations for preconditions and postconditions. Decorators help separate the contract from the logic[2]:

@precondition("tom > 0")
@precondition("jerry > 1")
@postcondition(lambda x: x > 1)
def foo(tom=1, jerry=2, rose=3, jack=4):
    print(tom)
    return jack

A precondition states requirements on the arguments, so it is convenient to express it in terms of the argument names; a postcondition constrains the return value, so a function object that receives the returned value is more appropriate. Here is the full implementation.
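That implementation is not reproduced here. The following is only a minimal sketch, not the linked code, assuming Python 3.5+ (for Signature.apply_defaults) and raising AssertionError on a violation. precondition evaluates its string against the call’s arguments with defaults filled in; postcondition hands the return value to its function object.

```python
import functools
import inspect

def precondition(expr):
    """Check expr, a string over the argument names, before each call."""
    def decorate(fn):
        # inspect.signature follows __wrapped__, so stacked wrappers still see foo's parameters.
        sig = inspect.signature(fn)
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()  # make defaults such as tom=1 visible to expr
            if not eval(expr, fn.__globals__, dict(bound.arguments)):
                raise AssertionError("precondition failed: " + expr)
            return fn(*args, **kwargs)
        return wrapper
    return decorate

def postcondition(check):
    """Check the return value with the predicate check after each call."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            if not check(result):
                raise AssertionError("postcondition failed: %r" % (result,))
            return result
        return wrapper
    return decorate

@precondition("tom > 0")
@precondition("jerry > 1")
@postcondition(lambda x: x > 1)
def foo(tom=1, jerry=2, rose=3, jack=4):
    print(tom)
    return jack

foo()       # prints 1 and returns 4
foo(tom=5)  # prints 5 and returns 4
# foo(tom=0) and foo(jack=0) both raise AssertionError.
```

Decorators are applied bottom-up, so at call time the outermost check, tom > 0, runs first and the postcondition runs last.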

Aspect oriented programming

AOP is quite popular in the Java community, and there is a corresponding Python project, PEAK, for enterprise environments. For lightweight AOP, decorators can help too, as in the canonical funds-transfer example:

@precondition("amount > 0")
@precondition("fromAccount.amount > amount")
def transfer(fromAccount, toAccount, amount):
    # TODO: add transaction here.
    fromAccount.withdraw(amount)
    toAccount.deposit(amount)
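The TODO hints at a crosscutting concern that an aspect could supply. Here is a minimal sketch in Python 3 syntax, assuming a hypothetical Account class (the post does not define one) and a hypothetical transactional decorator that restores the balances if the body raises. It is not PEAK’s API, it is not safe under concurrency, and the precondition decorators are left out only to keep the sketch self-contained.

```python
import functools

class Account(object):
    """Hypothetical account class; the post does not define one."""
    def __init__(self, amount):
        self.amount = amount

    def withdraw(self, amount):
        self.amount -= amount

    def deposit(self, amount):
        # Hypothetical limit, only so that the demo below can fail mid-transfer.
        if amount > 1000:
            raise ValueError("deposit limit exceeded")
        self.amount += amount

def transactional(fn):
    """Hypothetical aspect: restore every Account argument's balance if fn raises."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        accounts = [a for a in list(args) + list(kwargs.values())
                    if isinstance(a, Account)]
        saved = [(a, a.amount) for a in accounts]
        try:
            return fn(*args, **kwargs)
        except Exception:
            for account, amount in saved:
                account.amount = amount  # roll back
            raise
    return wrapper

@transactional
def transfer(fromAccount, toAccount, amount):
    fromAccount.withdraw(amount)
    toAccount.deposit(amount)

a, b = Account(5000), Account(0)
transfer(a, b, 100)       # a.amount == 4900, b.amount == 100
try:
    transfer(a, b, 2000)  # withdraw succeeds, then deposit fails
except ValueError:
    pass
print(a.amount, b.amount) # 4900 100: the half-done transfer was rolled back
```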

Conclusion

Decorators open a door to overriding the default behavior of functions. The add-on lets the magic shine while hiding the mechanism behind the curtain.

[1] Implementing decorators via metaprogramming would be a juicy topic of its own.
[2] Serious DBC users may consider PyDBC

HOWTO convert Chinese MP3 tags to the ID3 v2.3 standard

Gentoo November 7th, 2007

The Amarok developers probably gave little thought to how Chinese users would respond when they finally dropped ID3 tag encoding detection and enforced the ID3v2 specification. “Amarok is dead,” someone claimed on linuxfans.org, the community-powered Magic Linux support forum. Why? In China, quite a few MP3 files carry GB2312-encoded ID3v1 tags, and even worse, some files store GB2312 text in ID3 v2.3 tags. What a mess!

I respect their decision: a player has no responsibility to clean up the shit left by lousy encoders. Still, we have to face reality. Here is my cruel life: on Linux I prefer Amarok, occasionally using mpg123 in console mode; on Windows I use foobar2000, sometimes Windows Media Player; my portable MP3 player is a Creative Zen Micro. No Mac, no iPod. To make things even worse, my Linux locale is UTF-8, while Windows uses UTF-16LE internally. Last but not least, I do respect the specification.

So ID3v1 is out of the question: it only supports ISO-8859-1, which cannot hold CJK characters. Among the ID3v2 versions, the most popular is v2.3, which unfortunately does not support UTF-8. v2.4 does, but it is seldom adopted by hardware manufacturers or application developers.

Let’s start with the latest specification, ID3 v2.4.

The first piece of bad news is that id3v2-0.1.11, the de facto ID3v2 command-line tool, does not support v2.4. It took me several hours to figure out why my newly added v2.4 tags mysteriously disappeared: id3v2 cannot even recognize v2.4 tags. eyeD3 is the remedy: this pure-Python library provides a very neat command-line utility for manipulating ID3 v2.4 tags. The good news is that the Creative Zen Micro supports v2.4. In fact, I am not quite sure whether the credit goes to Creative Labs or to the libnjb developers.

The other option is v2.3, the most widely implemented version so far. Unfortunately, it only supports Latin-1 and Unicode with a byte-order mark, either little-endian (the native order of Microsoft Windows) or big-endian; there is no UTF-8 support. Worse still, id3v2 writes to the tag regardless of the locale, which is really horrible!

Here is my effort to address this problem: eyeD3conv. As the name suggests, it depends on the eyeD3 library. This small utility converts mis-encoded tags into standard UTF-16LE ID3 v2.3 tags.
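The heart of such a conversion can be sketched in a few lines of Python 3. This is not eyeD3conv’s code and does not touch eyeD3’s API; fix_mojibake is a hypothetical helper that assumes a mis-encoded frame reaches us as Latin-1 text whose underlying bytes are really GB2312. The repaired string is then encoded as UTF-16 with a byte-order mark, which for CJK characters in the Basic Multilingual Plane matches ID3 v2.3’s Unicode text encoding.

```python
def fix_mojibake(text, real_encoding="gb2312"):
    """Hypothetical helper: recover text whose GB2312 bytes were decoded as Latin-1."""
    try:
        return text.encode("latin-1").decode(real_encoding)
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either already genuine Unicode or not GB2312 after all: leave it alone.
        return text

raw = "中文".encode("gb2312")       # bytes a GB2312-minded encoder writes into the tag
garbled = raw.decode("latin-1")     # what a spec-abiding player shows
fixed = fix_mojibake(garbled)
print(fixed)                        # 中文
tag_bytes = fixed.encode("utf-16")  # byte-order mark followed by UTF-16 code units
```

Pure ASCII text passes through unchanged, and text that is already proper Unicode is left alone because it cannot be encoded as Latin-1.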

You also need to apply this patch to fix an encoding bug in eyeD3-0.6.14. The patch has been submitted upstream.

Update: thanks to the quick response from Travis, the author of eyeD3: according to the specification, URLs are supposed to be encoded in ASCII, so we can simply ignore the URLFrame. Forget the patch and use the updated version.

Other mis-encoded frames may raise a UnicodeDecodeError when a frame is read or written, which cancels the subsequent file rename. Here are some pragmatic tips to work around this issue:

# remove all comments
eyeD3 --remove-comments foo.mp3
# remove WXXX frame
eyeD3 --set-text-frame="WXXX:" foo.mp3

I have no idea which application inserts such crap into the tags.