Learning Django by Example (6): Search

Web January 4th, 2008

Search is one of the must-have features in Gelman. Here is a SearchQuerySet based on the MySQL full-text search extension. It is really cool and neat, but:

  • First, I don’t want to build my application against a specific database extension, even though MySQL is widely used in Web applications.
  • Second, I prefer a more flexible and powerful search syntax than what MySQL provides.
  • Last but not least, I may still need Lucene or Xapian to index and search PDF and CHM eBooks.

So I home-brewed the search using PLY, the Python Lex-Yacc toolchain. You can check the code here. Most of parser.py is just boilerplate; the interesting part is building the django.db.models.Q objects:

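from django.db.models import Q

def p_expression_term(t):
    'expression : TERM'
    # A bare term becomes a case-insensitive substring match on the title
    t[0] = Q(**{'title__icontains': t[1]})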

The semantics are quite straightforward: AND (the default) and OR are supported directly by Q, and only the title field is searched. We may later extend the syntax with a field prefix such as author:, as Google does, so stay tuned.
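To see how an implicit AND and an explicit OR compose, here is a minimal sketch that runs without Django or PLY. The Q class below is only a stand-in that records the expression tree (the real one is django.db.models.Q, which supports the same & and | operators), and parse_query is a hypothetical hand-written parser, not the grammar from parser.py. It also assumes that AND binds tighter than OR, which is a guess about the intended precedence.

```python
class Q:
    """Stand-in for django.db.models.Q: records the tree instead of building SQL."""

    def __init__(self, **lookups):
        self.text = " & ".join(f"{k}={v!r}" for k, v in sorted(lookups.items()))

    @classmethod
    def _combine(cls, left, op, right):
        node = cls()
        node.text = f"({left.text} {op} {right.text})"
        return node

    def __and__(self, other):
        return Q._combine(self, "AND", other)

    def __or__(self, other):
        return Q._combine(self, "OR", other)

    def __repr__(self):
        return self.text


def parse_query(query):
    """Turn 'a b OR c' into Q objects; adjacent terms are ANDed (assumed precedence)."""
    result, conj = None, None
    # A trailing sentinel "OR" flushes the last group of ANDed terms.
    for token in query.split() + ["OR"]:
        if token == "OR":
            if conj is not None:
                result = conj if result is None else result | conj
            conj = None
        else:
            q = Q(title__icontains=token)
            conj = q if conj is None else conj & q
    return result


print(parse_query("django search OR python"))
```

With the real Q class, the returned object could be passed straight to a queryset's filter() call; an empty query yields None here, so a caller would need to handle that case.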

Return of the Lex

Development July 7th, 2006

I hadn’t touched flex/yacc since the final exam of that class, until today, when I had to cope with FORTRAN 90 input data. The input file looks like this:

80000.00 50.00
80000.00 0.00
120000.00 0.00

with various data formats, variable amounts of white space, and scattered blank lines. I first tried standard string search/match and soon lost patience with the tedious if..else. So I decided to try Lex: my very first flex in action.

First, the flex rules. We just need the floating-point numbers and can ignore all spaces and newlines; at end of file, yylex() returns 0 by itself. We want the tokens returned one by one, so the rule returns a flag as soon as the number pattern matches.

%option noyywrap

space   [ \t]+
/* require at least one digit, so a bare sign or dot is not a number */
number  [-+]?([0-9]+\.?[0-9]*|\.[0-9]+)

%%

{number}      { return 1;  }

{space} |
\n          ;

%%
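The same scanning rule can be tried out quickly in Python before committing to flex. This is a minimal sketch using the standard re module, not the flex scanner itself, and scan_numbers is a hypothetical name. The pattern requires at least one digit, because a pattern in which every part is optional, such as [-+]?[0-9]*\.?[0-9]*, would also accept an empty string, a bare sign, or a lone dot.

```python
import re

# Optional sign, then digits with an optional fraction, or a fraction alone.
NUMBER = re.compile(r"[-+]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)")


def scan_numbers(text):
    """Return every number found in text as a float, skipping spaces and blank lines."""
    return [float(m.group()) for m in NUMBER.finditer(text)]


sample = """
80000.00   50.00

  80000.00 0.00
120000.00\t0.00
"""
print(scan_numbers(sample))
```

One difference worth keeping in mind: finditer silently skips any character that is not part of a number, whereas a flex scanner with no matching rule falls back to its default action of echoing the character to the output.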

In main.c, we override the default input handle, stdin, by pointing yyin at a file if an additional argument is given (for some reason, posting this part causes a WordPress 503 error, so take a look at the tarball; sorry for the inconvenience). Then we fetch the tokens one by one and convert the matched yytext to a floating-point number, until the end of the file.

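#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

extern FILE* yyin;
extern char* yytext;
extern int yylex(void);

int main(int argc, char* argv[] )
{
        double x;

        /* Read from the file named on the command line, otherwise stdin.
           The fopen branch is reconstructed from the description; the
           posted original omitted it. */
        if( argc > 1 )
                yyin = fopen( argv[1], "r" );
        else
                yyin = stdin;
        assert( yyin );

        /* yylex() returns 1 for each number and 0 at end of file */
        while( yylex() )
        {
                fprintf(stderr, "text = %s ", yytext );
                x = atof( yytext );
                fprintf(stderr, "x = %f\n", x );
        }

        return 0;
}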

At the end, a Makefile glues all the parts together.

LEX = flex
LEXFLAGS = -t  
CFLAGS = -g
OBJS = obstac.o main.o
TARGET = scan

all : ${TARGET}

${TARGET} : ${OBJS}
	${CC} ${CFLAGS} -o ${TARGET} ${OBJS}

%.c : %.l
	${LEX} ${LEXFLAGS} $< > $@

%.o : %.c
	${CC} ${CFLAGS} -o $@ -c $<

.PHONY: clean
clean :
	${RM} ${TARGET} ${OBJS}

Here is the tarball for the source code and example input file.