<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: WYS is not always WYG in python.re</title>
	<atom:link href="http://kunxi.org/archives/2008/10/wys-is-not-always-wyg-in-pythonre/feed/" rel="self" type="application/rss+xml" />
	<link>http://kunxi.org/archives/2008/10/wys-is-not-always-wyg-in-pythonre/</link>
	<description>Yet another code monkey blog.</description>
	<lastBuildDate>Wed, 04 Jan 2012 07:28:05 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
	<item>
		<title>By: Serge</title>
		<link>http://kunxi.org/archives/2008/10/wys-is-not-always-wyg-in-pythonre/comment-page-1/#comment-54269</link>
		<dc:creator>Serge</dc:creator>
		<pubDate>Sun, 05 Oct 2008 01:45:54 +0000</pubDate>
		<guid isPermaLink="false">http://kunxi.org/?p=301#comment-54269</guid>
		<description>&#039;总数&#039;.decode(&#039;utf-8&#039;) 

is equal to

# -*- coding: utf-8 -*-
u&#039;总数&#039;

Notice the u prefix</description>
		<content:encoded><![CDATA[<p>&#039;总数&#039;.decode(&#039;utf-8&#039;)</p>
<p>is equal to</p>
<p># -*- coding: utf-8 -*-<br />
u&#039;总数&#039;</p>
<p>Notice the u prefix</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mikko Ohtamaa</title>
		<link>http://kunxi.org/archives/2008/10/wys-is-not-always-wyg-in-pythonre/comment-page-1/#comment-54252</link>
		<dc:creator>Mikko Ohtamaa</dc:creator>
		<pubDate>Sat, 04 Oct 2008 14:01:42 +0000</pubDate>
		<guid isPermaLink="false">http://kunxi.org/?p=301#comment-54252</guid>
		<description>I do not possess skills in Python regular expression madness, but I suggest you try the BeautifulSoup module for HTML pattern matching.</description>
		<content:encoded><![CDATA[<p>I do not possess skills in Python regular expression madness, but I suggest you try the BeautifulSoup module for HTML pattern matching.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bob Ippolito</title>
		<link>http://kunxi.org/archives/2008/10/wys-is-not-always-wyg-in-pythonre/comment-page-1/#comment-54219</link>
		<dc:creator>Bob Ippolito</dc:creator>
		<pubDate>Fri, 03 Oct 2008 16:16:31 +0000</pubDate>
		<guid isPermaLink="false">http://kunxi.org/?p=301#comment-54219</guid>
		<description>I think the problem is that you&#039;re not actually using unicode, you&#039;re still using strings.

Instead of this:
pattern = re.compile(&#039;(?&lt;=&lt;b&gt;总数 )(?P\d+)&#039;, re.UNICODE)

You should do this:
pattern = re.compile(u&#039;(?&lt;=&lt;b&gt;总数 )(?P\d+)&#039;, re.UNICODE)

Notice the &quot;u&quot; prefix on the string literal. That makes it a unicode literal.</description>
		<content:encoded><![CDATA[<p>I think the problem is that you&#8217;re not actually using unicode, you&#8217;re still using strings.</p>
<p>Instead of this:<br />
pattern = re.compile(&#039;(?&lt;=&lt;b&gt;总数 )(?P\d+)&#039;, re.UNICODE)</p>
<p>You should do this:<br />
pattern = re.compile(u&#039;(?&lt;=&lt;b&gt;总数 )(?P\d+)&#039;, re.UNICODE)</p>
<p>Notice the &#8220;u&#8221; prefix on the string literal. That makes it a unicode literal.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: zakhar</title>
		<link>http://kunxi.org/archives/2008/10/wys-is-not-always-wyg-in-pythonre/comment-page-1/#comment-54216</link>
		<dc:creator>zakhar</dc:creator>
		<pubDate>Fri, 03 Oct 2008 14:22:44 +0000</pubDate>
		<guid isPermaLink="false">http://kunxi.org/?p=301#comment-54216</guid>
		<description>u&#039;(?&lt;=总数 )(?P\\d+)&#039; == &#039;(?&lt;=总数 )(?P\d+)&#039;.decode(&#039;utf-8&#039;)</description>
		<content:encoded><![CDATA[<p>u&#039;(?&lt;=总数 )(?P\\d+)&#039; == &#039;(?&lt;=总数 )(?P\d+)&#039;.decode(&#039;utf-8&#039;)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: zakhar</title>
		<link>http://kunxi.org/archives/2008/10/wys-is-not-always-wyg-in-pythonre/comment-page-1/#comment-54215</link>
		<dc:creator>zakhar</dc:creator>
		<pubDate>Fri, 03 Oct 2008 14:17:30 +0000</pubDate>
		<guid isPermaLink="false">http://kunxi.org/?p=301#comment-54215</guid>
		<description>I think this code should work:
pattern = re.compile(u&#039;(?&lt;=&lt;b&gt;总数 )(?P\\d+)&#039;, re.UNICODE)</description>
		<content:encoded><![CDATA[<p>I think this code should work:<br />
pattern = re.compile(u&#039;(?&lt;=&lt;b&gt;总数 )(?P\\d+)&#039;, re.UNICODE)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Thomas Waldmann</title>
		<link>http://kunxi.org/archives/2008/10/wys-is-not-always-wyg-in-pythonre/comment-page-1/#comment-54213</link>
		<dc:creator>Thomas Waldmann</dc:creator>
		<pubDate>Fri, 03 Oct 2008 10:31:31 +0000</pubDate>
		<guid isPermaLink="false">http://kunxi.org/?p=301#comment-54213</guid>
		<description>You&#039;re making it too complicated.

# -*- coding: utf-8 -*-
content = urllib.urlopen(url).read()
content = content.decode(coding)

pattern = re.compile(ur&#039;&lt;b&gt;总数 (?P\d+)&#039;, re.UNICODE)

Please note:
 * strictly speaking, you should get the content encoding from the page itself
 * I don&#039;t think you need that lookbehind assertion
 * but what you definitely need is u&#039;...&#039; or ur&#039;...&#039;!</description>
		<content:encoded><![CDATA[<p>You&#8217;re making it too complicated.</p>
<p># -*- coding: utf-8 -*-<br />
content = urllib.urlopen(url).read()<br />
content = content.decode(coding)</p>
<p>pattern = re.compile(ur&#039;&lt;b&gt;总数 (?P\d+)&#039;, re.UNICODE)</p>
<p>Please note:<br />
 * strictly speaking, you should get the content encoding from the page itself<br />
 * I don&#8217;t think you need that lookbehind assertion<br />
 * but what you definitely need is u&#039;&#8230;&#039; or ur&#039;&#8230;&#039;!</p>
]]></content:encoded>
	</item>
</channel>
</rss>

