<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Chen Yufei's blog &#187; decorator</title>
	<atom:link href="http://chenyufei.info/blog/tag/decorator/feed/" rel="self" type="application/rss+xml" />
	<link>http://chenyufei.info/blog</link>
	<description>Keep your head about you while all those are losing theirs</description>
	<lastBuildDate>Wed, 21 Jul 2010 05:30:05 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Shell like data processing in Python &#8212; using decorators</title>
		<link>http://chenyufei.info/blog/2009-10-03/shell-like-data-processing-in-python-using-decorators/</link>
		<comments>http://chenyufei.info/blog/2009-10-03/shell-like-data-processing-in-python-using-decorators/#comments</comments>
		<pubDate>Sat, 03 Oct 2009 15:31:29 +0000</pubDate>
		<dc:creator>chenyufei</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[decorator]]></category>
		<category><![CDATA[pipe]]></category>

		<guid isPermaLink="false">http://chenyufei.info/blog/?p=232</guid>
		<description><![CDATA[Earlier posts showed the benefits of pipes and how to apply the pipe idea in Python programs. The code in those posts still has one flaw, though. Compare the following shell script and Python code: find logdir -name "access-log*" &#124; \ xargs cat &#124; \ grep '[^-]$' &#124; \ awk '{ total += $NF } END { print total }' lines = grep&#40;'[^-]$', cat&#40;find&#40;'logdir', 'access-log*'&#41;&#41;&#41; col = &#40;line.rsplit&#40;None, 1&#41;&#91;1&#93; for line in lines&#41; print sum&#40;int&#40;c&#41; for c in col&#41; In the Python version, find, cat, [...]]]></description>
			<content:encoded><![CDATA[<p>Earlier posts showed <a href="http://chenyufei.info/blog/2009-01-19/why-unix-pipe-is-a-good-thing/">the benefits of pipes</a> and how to <a href="http://chenyufei.info/blog/2009-01-28/writing-unix-pipe-style-python-code/">apply the pipe idea</a> in Python programs. The code in those posts still has one flaw, though. Compare the following shell script and Python code:</p>
<pre>
find logdir -name "access-log*" | \
xargs cat | \
grep '[^-]$' | \
awk '{ total += $NF } END { print total }'
</pre>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">lines = grep<span style="color: black;">&#40;</span><span style="color: #483d8b;">'[^-]$'</span>, cat<span style="color: black;">&#40;</span>find<span style="color: black;">&#40;</span><span style="color: #483d8b;">'logdir'</span>, <span style="color: #483d8b;">'access-log*'</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
col = <span style="color: black;">&#40;</span>line.<span style="color: black;">rsplit</span><span style="color: black;">&#40;</span><span style="color: #008000;">None</span>, <span style="color: #ff4500;">1</span><span style="color: black;">&#41;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span> <span style="color: #ff7700;font-weight:bold;">for</span> line <span style="color: #ff7700;font-weight:bold;">in</span> lines<span style="color: black;">&#41;</span>
<span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #008000;">sum</span><span style="color: black;">&#40;</span><span style="color: #008000;">int</span><span style="color: black;">&#40;</span>c<span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">for</span> c <span style="color: #ff7700;font-weight:bold;">in</span> col<span style="color: black;">&#41;</span></pre></div></div>

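<p>In the Python version, the calls to find, cat and grep are nested inside layers of parentheses, and execution starts from the innermost pair, so it is less readable than the shell script. Can a Python script use shell-like syntax to improve readability? Operator overloading makes this possible. The original idea <a href="http://code.activestate.com/recipes/276960/">comes from this recipe</a>, which overloads the bitwise OR operator &#8220;|&#8221;. The new Python code is shown below; I call this style pipe syntax. (The code that extracts the last column and sums it is unchanged.)</p>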

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">lines = find<span style="color: black;">&#40;</span><span style="color: #483d8b;">'logdir'</span>, <span style="color: #483d8b;">'access-log*'</span><span style="color: black;">&#41;</span> | cat | grep<span style="color: black;">&#40;</span><span style="color: #483d8b;">'[^-]$'</span><span style="color: black;">&#41;</span></pre></div></div>

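<p>With plain operator overloading, putting a function into a pipeline requires defining a class of your own and doing the actual work in its <code>__ror__()</code> method. Defining a class every time is not convenient enough for my taste, so I use a decorator to simplify it.</p>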
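<p>To make the "one class per pipeline stage" approach concrete, here is a minimal sketch of what such a hand-written class might look like. The class name <code>Grep</code> and the sample log lines are made up for illustration, and the sketch is written for Python 3, whereas the listings in this post are Python 2.</p>

```python
import re


class Grep:
    """Hypothetical hand-written pipeline stage: keeps lines matching a pattern."""

    def __init__(self, pattern):
        # search() lets the pattern match anywhere in the line, like shell grep.
        self.search = re.compile(pattern).search

    def __ror__(self, lines):
        # Called for `lines | Grep(...)` because the left operand (a list or
        # generator) does not know how to apply | to a Grep instance.
        return (line for line in lines if self.search(line))


log = ["GET /a 200", "GET /b -"]
matched = list(log | Grep(r"[^-]$"))
print(matched)  # ['GET /a 200']
```

<p>Every new stage would need another class of this shape, which is the repetition the decorator is meant to remove.</p>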
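<p><strong>Put simply, a decorator changes a function's behavior by replacing the original function with another object</strong>. For an introduction to decorators, see two articles by Bruce Eckel, which explain how to create decorators <a href="http://www.artima.com/weblogs/viewpost.jsp?thread=240808">without arguments</a> and <a href="http://www.artima.com/weblogs/viewpost.jsp?thread=240845">with arguments</a>; the article on decorators without arguments also explains what decorators are for. Bruce Eckel argues that a decorator works like a macro in that it can change the semantics of a function, and that Python decorators are as powerful as Lisp macros. I do not quite agree, because the two are very different: a decorator only provides a way to replace a function dynamically at run time, whereas a Lisp macro generates code at compile time. In terms of power, Lisp macros are certainly stronger, but decorators are simpler and already powerful enough.</p>
<p>The code below defines grep using a decorator. The &#8220;@&#8221; before the function is the syntax that was added to Python to support decorators:</p>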
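<p>The "replace the function with another object" idea can be seen in a tiny example that has nothing to do with pipes. The decorator name <code>count_calls</code> and the function <code>add</code> are invented for this sketch (Python 3); the point is only that after decoration the name <code>add</code> refers to an instance of the decorator class, not to the original function.</p>

```python
class count_calls:
    """Hypothetical class-based decorator that counts how often a function runs."""

    def __init__(self, func):
        # Runs once, at decoration time: remember the original function.
        self.func = func
        self.calls = 0

    def __call__(self, *args, **kwds):
        # Runs on every call of the decorated name.
        self.calls += 1
        return self.func(*args, **kwds)


@count_calls
def add(a, b):
    return a + b


result = add(1, 2)
print(result, add.calls, type(add).__name__)  # 3 1 count_calls
```

<p>The replacement happens when the <code>def</code> statement executes at run time, which is the difference from compile-time macros described above.</p>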

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">re</span>
<span style="color: #ff7700;font-weight:bold;">from</span> <span style="color: #dc143c;">itertools</span> <span style="color: #ff7700;font-weight:bold;">import</span> ifilter
&nbsp;
@pipeable
<span style="color: #ff7700;font-weight:bold;">def</span> grep<span style="color: black;">&#40;</span><span style="color: #008000;">iter</span>, match<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #008000;">callable</span><span style="color: black;">&#40;</span>match<span style="color: black;">&#41;</span>:
        fun = match
    <span style="color: #ff7700;font-weight:bold;">else</span>:
        <span style="color: #808080; font-style: italic;"># Use search, not match: like shell grep, the pattern may match anywhere in the line.</span>
        fun = <span style="color: #dc143c;">re</span>.<span style="color: #008000;">compile</span><span style="color: black;">&#40;</span>match<span style="color: black;">&#41;</span>.<span style="color: black;">search</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> ifilter<span style="color: black;">&#40;</span>fun, <span style="color: #008000;">iter</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #808080; font-style: italic;"># Without @pipeable before the function definition of grep,</span>
<span style="color: #808080; font-style: italic;"># we can use the following code to achieve the same effect.</span>
grep = pipeable<span style="color: black;">&#40;</span>grep<span style="color: black;">&#41;</span></pre></div></div>

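<p>pipeable is just an ordinary Python object; it can be a function or a class. If it is a function, grep is passed to it as its argument; if it is a class, grep is passed as the argument to its initializer. <strong>pipeable(grep) must return a callable object (one with a <code>__call__()</code> method)</strong>; there are no other requirements. So decorator syntax is only syntactic sugar, but this sugar lets us put the decoration in front of the function, and it makes it obvious from the code at a glance that grep can be used in a pipeline.</p>
<p>Below I first describe what pipeable requires of the functions it decorates and how they behave afterwards, and then look at the implementation of pipeable.</p>
<p>A function decorated with pipeable has to meet only one requirement: its first argument must support iteration. If other pipeline stages follow the function, its return value must support iteration as well.</p>
<p>After being decorated with pipeable, a function behaves as follows:</p>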
<ol>
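<li>It can be called exactly as it was before decoration, and its behavior is unchanged.</li>
<li>It can be called with every argument except the first. When the resulting function object then appears on the right-hand side of &#8220;|&#8221;, the object on the left-hand side becomes the first argument and is combined with the arguments given earlier to complete the call.</li>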
</ol>
<p>The decorator that implements pipe syntax is shown below (full code: <a href='http://chenyufei.info/blog/wp-content/uploads/2009/10/shelike.py'>shelike.py</a>).</p>

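<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">from</span> copy <span style="color: #ff7700;font-weight:bold;">import</span> deepcopy
<span style="color: #ff7700;font-weight:bold;">from</span> functools <span style="color: #ff7700;font-weight:bold;">import</span> update_wrapper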
<span style="color: #ff7700;font-weight:bold;">class</span> pipeable:
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, method<span style="color: black;">&#41;</span>:
        <span style="color: #008000;">self</span>.<span style="color: black;">func</span> = method
        <span style="color: #008000;">self</span>.<span style="color: black;">args</span> = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
        <span style="color: #008000;">self</span>.<span style="color: black;">kwds</span> = <span style="color: black;">&#123;</span><span style="color: black;">&#125;</span>
        <span style="color: #008000;">self</span>.<span style="color: black;">reqlen</span> = <span style="color: #ff4500;">0</span>
        <span style="color: #808080; font-style: italic;"># Classes must also be usable in a pipe, so method is not always a function;</span>
        <span style="color: #808080; font-style: italic;"># only plain functions have func_code.</span>
        <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #008000;">hasattr</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>.<span style="color: black;">func</span>, <span style="color: #483d8b;">'func_code'</span><span style="color: black;">&#41;</span>:
            <span style="color: #008000;">self</span>.<span style="color: black;">reqlen</span> = <span style="color: #008000;">self</span>.<span style="color: black;">func</span>.<span style="color: black;">func_code</span>.<span style="color: black;">co_argcount</span>
        <span style="color: #808080; font-style: italic;"># make the wrapper object look like the wrapped function</span>
        update_wrapper<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, method<span style="color: black;">&#41;</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__call__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, <span style="color: #66cc66;">*</span>args, <span style="color: #66cc66;">**</span>kwds<span style="color: black;">&#41;</span>:
        curlen = <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>args<span style="color: black;">&#41;</span>
        <span style="color: #808080; font-style: italic;"># An ugly hack to handle classes.</span>
        <span style="color: #ff7700;font-weight:bold;">if</span> curlen == <span style="color: #008000;">self</span>.<span style="color: black;">reqlen</span> <span style="color: #ff7700;font-weight:bold;">or</span> <span style="color: #ff4500;">0</span> == <span style="color: #008000;">self</span>.<span style="color: black;">reqlen</span>:
            <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">self</span>.<span style="color: black;">func</span><span style="color: black;">&#40;</span><span style="color: #66cc66;">*</span>args, <span style="color: #66cc66;">**</span>kwds<span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">elif</span> curlen <span style="color: #66cc66;">!</span>= <span style="color: #008000;">self</span>.<span style="color: black;">reqlen</span> - <span style="color: #ff4500;">1</span>:
            <span style="color: #ff7700;font-weight:bold;">raise</span> <span style="color: #008000;">TypeError</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'Arguments number wrong.'</span><span style="color: black;">&#41;</span>
&nbsp;
        cpy = deepcopy<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>
        cpy.<span style="color: black;">args</span> = args
        cpy.<span style="color: black;">kwds</span> = kwds
        <span style="color: #ff7700;font-weight:bold;">return</span> cpy
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__ror__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, <span style="color: #008000;">iter</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">self</span>.<span style="color: black;">func</span><span style="color: black;">&#40;</span><span style="color: #008000;">iter</span>, <span style="color: #66cc66;">*</span><span style="color: #008000;">self</span>.<span style="color: black;">args</span>, <span style="color: #66cc66;">**</span><span style="color: #008000;">self</span>.<span style="color: black;">kwds</span><span style="color: black;">&#41;</span></pre></div></div>

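<p>Because the replacement object has to support the &#8220;|&#8221; operator, it is more convenient to implement pipeable as a class. (After reading Bruce Eckel's articles I also prefer using classes to create decorators.)</p>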
<ol>
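<li>
<code>__init__()</code> receives the function being decorated and saves it for later calls. It runs at the moment the function is decorated; in effect it creates a pipeable object and binds the original function name to that object. The final update_wrapper call copies the <code>__name__, __doc__, __module__</code> attributes of the decorated function onto the new object, so that the object looks more like the original function. (Otherwise these attributes would hold the object's own values.)
</li>
<li>
<code>__call__()</code> runs when the decorated function (that is, the replacement object) is called. Depending on the number of arguments, it either calls the original function directly or saves the arguments for later.</li>
<li>
<code>__ror__()</code> is the key to pipe syntax: it combines the sequence on the left of &#8220;|&#8221; with the saved arguments to complete the call to the decorated function. The ugly hacks in the code handle the case where a class, rather than a function, is passed to the initializer.</li>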
</ol>
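<p>The listing above targets Python 2 (<code>func_code</code>, <code>ifilter</code>). As a rough sketch only, the same three methods might be ported to Python 3 as follows. This is an adaptation, not the code from shelike.py: it reads <code>__code__</code> instead of <code>func_code</code>, uses a shallow <code>copy</code> where the original uses <code>deepcopy</code>, and the <code>grep</code> and <code>aslist</code> helpers at the end are simplified stand-ins written for this example.</p>

```python
import re
from copy import copy
from functools import update_wrapper


class pipeable:
    def __init__(self, func):
        self.func = func
        self.args = ()
        self.kwds = {}
        # Plain functions expose their positional parameter count via __code__;
        # classes do not, so reqlen stays 0 and they are always called directly.
        self.reqlen = func.__code__.co_argcount if hasattr(func, "__code__") else 0
        update_wrapper(self, func)

    def __call__(self, *args, **kwds):
        if self.reqlen == 0 or len(args) == self.reqlen:
            # All arguments present: behave like the undecorated function.
            return self.func(*args, **kwds)
        if len(args) != self.reqlen - 1:
            raise TypeError("Wrong number of arguments.")
        # Everything except the first argument was given: remember it on a
        # copy so the shared decorator object itself is never modified.
        bound = copy(self)
        bound.args = args
        bound.kwds = kwds
        return bound

    def __ror__(self, iterable):
        # `iterable | stage`: the left operand becomes the first argument.
        return self.func(iterable, *self.args, **self.kwds)


@pipeable
def grep(lines, pattern):
    test = pattern if callable(pattern) else re.compile(pattern).search
    return filter(test, lines)


@pipeable
def aslist(lines):
    return list(lines)


piped = ["a 1", "b -", "c 22"] | grep(r"[^-]$") | aslist
direct = list(grep(["a 1", "b -"], r"[^-]$"))
print(piped, direct)
```

<p>The last lines exercise both behaviors from the list above: <code>grep</code> called with all of its arguments runs immediately, while <code>grep</code> called without the first argument waits for the left-hand side of &#8220;|&#8221;.</p>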
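<p>Reimplementing every common shell command in Python would be tedious, so I also wrote a function that converts Python data to a string, hands it to a shell command for processing, and then converts the output back into Python strings, one per line. Compare the implementation using pipeable with one that defines a class by hand. (The current approach converts the entire input to a single string and passes it to the external program through an OS pipe, so memory overhead is high for large inputs, but I have not thought of a good solution.)</p>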

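<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">from</span> <span style="color: #dc143c;">subprocess</span> <span style="color: #ff7700;font-weight:bold;">import</span> Popen, PIPE
&nbsp;
@pipeable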
<span style="color: #ff7700;font-weight:bold;">def</span> shell<span style="color: black;">&#40;</span><span style="color: #008000;">iter</span>, <span style="color: #dc143c;">cmd</span>=<span style="color: #008000;">None</span><span style="color: black;">&#41;</span>:
    pipe = Popen<span style="color: black;">&#40;</span><span style="color: #dc143c;">cmd</span>, shell=<span style="color: #008000;">True</span>, stdin = PIPE, stdout = PIPE<span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> pipe.<span style="color: black;">communicate</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">''</span>.<span style="color: black;">join</span><span style="color: black;">&#40;</span><span style="color: #008000;">iter</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>.<span style="color: black;">splitlines</span><span style="color: black;">&#40;</span><span style="color: #008000;">True</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> shell:
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, <span style="color: #dc143c;">cmd</span><span style="color: black;">&#41;</span>:
        <span style="color: #008000;">self</span>.<span style="color: #dc143c;">cmd</span> = <span style="color: #dc143c;">cmd</span>
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__ror__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, <span style="color: #008000;">iter</span><span style="color: black;">&#41;</span>:
        pipe = Popen<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>.<span style="color: #dc143c;">cmd</span>, shell=<span style="color: #008000;">True</span>, stdin = PIPE, stdout = PIPE<span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">return</span> pipe.<span style="color: black;">communicate</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">''</span>.<span style="color: black;">join</span><span style="color: black;">&#40;</span><span style="color: #008000;">iter</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>.<span style="color: black;">splitlines</span><span style="color: black;">&#40;</span><span style="color: #008000;">True</span><span style="color: black;">&#41;</span></pre></div></div>

<p>With this pipeline stage, the summation can be handed directly to awk.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">sumtmp = lines | shell<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;awk '{ total += $NF } END { print total }'&quot;</span><span style="color: black;">&#41;</span> | aslist
<span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #008000;">int</span><span style="color: black;">&#40;</span>sumtmp<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span></pre></div></div>
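<p>The shell stage shown earlier joins its whole input into one string before calling <code>communicate()</code>, which is the memory cost mentioned above. One possible way around it, offered only as a sketch and not as part of shelike.py, is to feed the child process from a background thread while reading its output lazily. The name <code>shell_stream</code> is invented here, the code is Python 3, and it assumes a POSIX shell with <code>sort</code> available for the usage line.</p>

```python
import subprocess
import threading


def shell_stream(lines, cmd):
    """Hypothetical streaming variant: yields output lines of cmd one by one."""
    proc = subprocess.Popen(cmd, shell=True, text=True,
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE)

    def feed():
        # Write input line by line so it never has to be joined in memory.
        try:
            for line in lines:
                proc.stdin.write(line)
        except BrokenPipeError:
            pass  # the command exited before reading all of its input
        finally:
            try:
                proc.stdin.close()
            except BrokenPipeError:
                pass

    # Writing on a separate thread avoids a deadlock when both OS pipe
    # buffers fill up while this generator is busy reading.
    feeder = threading.Thread(target=feed, daemon=True)
    feeder.start()
    for out in proc.stdout:
        yield out
    feeder.join()
    proc.stdout.close()
    proc.wait()


if __name__ == "__main__":
    print(list(shell_stream(["b\n", "a\n"], "sort")))
```

<p>A plain generator function like this should also be usable as a pipeline stage once decorated, since its first argument is iterable and it returns something iterable. Commands such as sort still buffer their own input, so the saving applies only to the Python side.</p>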

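<p>Of course, if you do not want to depend on an external program, you can write a function that extracts the n-th or last column of a string and put it into the pipeline as the argument of tr (the name is not a good one; it is really just map; see <a href='http://chenyufei.info/blog/wp-content/uploads/2009/10/shelike.py'>shelike.py</a> for the code). Then convert the values to integers and add them up with sum.</p>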

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: black;">&#40;</span>lines | tr<span style="color: black;">&#40;</span>last_column<span style="color: black;">&#41;</span> | tr<span style="color: black;">&#40;</span><span style="color: #008000;">int</span><span style="color: black;">&#41;</span> | <span style="color: #008000;">sum</span><span style="color: black;">&#41;</span></pre></div></div>
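<p>The definitions of <code>tr</code> and <code>last_column</code> live in shelike.py and are not shown in this post, so the following is only a guess at their shape, written for Python 3. To keep the example runnable on its own it uses a deliberately stripped-down stand-in called <code>stage</code> instead of the full pipeable class, and it wraps the builtin <code>sum</code> under the made-up name <code>total</code>, since the builtin itself cannot appear on the right of &#8220;|&#8221;.</p>

```python
class stage:
    """Minimal stand-in for pipeable: only supports use on the right of |."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args

    def __call__(self, *args):
        # Remember the extra arguments; the input arrives later via __ror__.
        return stage(self.func, *args)

    def __ror__(self, iterable):
        return self.func(iterable, *self.args)


@stage
def tr(items, func):
    # As the post says, this is really just map.
    return map(func, items)


def last_column(line):
    # rsplit with no separator splits on whitespace, ignoring the newline.
    return line.rsplit(None, 1)[1]


# The builtin sum has no __ror__, so it has to be wrapped to end a pipeline.
total = stage(sum)

lines = ["GET /a 200 512\n", "GET /b 200 1024\n"]
result = lines | tr(last_column) | tr(int) | total
print(result)  # 1536
```

<p>Because <code>map</code> is lazy in Python 3, nothing is computed until <code>total</code> consumes the chain, which matches the generator style used throughout the post.</p>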

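<p>I really like reading code like this: it has a functional flavor, and it is also very readable :)</p>
<p>If you are interested, the <a href="http://pypi.python.org/pypi/decorator">decorator module</a> and the <a href="http://wiki.python.org/moin/PythonDecoratorLibrary">PythonDecoratorLibrary</a> offer more examples of using decorators. Recommended.</p>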
]]></content:encoded>
			<wfw:commentRss>http://chenyufei.info/blog/2009-10-03/shell-like-data-processing-in-python-using-decorators/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
