Posted 2006-4-14 15:45:00
Re: Building Hybrid Systems with Boost.Python
Deeper Reflection on the Horizon?
Admittedly, this formula is tedious to repeat, especially on a project with many polymorphic classes. That it is necessary reflects some limitations in C++'s compile-time introspection capabilities: there's no way to enumerate the members of a class and find out which are virtual functions. At least one very promising project has been started to write a front-end which can generate these dispatchers (and other wrapping code) automatically from C++ headers.
Pyste is being developed by Bruno da Silva de Oliveira. It builds on GCC_XML, which generates an XML version of GCC's internal program representation. Since GCC is a highly-conformant C++ compiler, this ensures correct handling of the most-sophisticated template code and full access to the underlying type system. In keeping with the Boost.Python philosophy, a Pyste interface description is neither intrusive on the code being wrapped, nor expressed in some unfamiliar language: instead it is a 100% pure Python script. If Pyste is successful it will mark a move away from wrapping everything directly in C++ for many of our users. It will also allow us the choice to shift some of the metaprogram code from C++ to Python. We expect that soon, not only our users but the Boost.Python developers themselves will be "thinking hybrid" about their own code.
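Python itself has no such blind spot, which is part of why a Python-scripted front-end is attractive. As a rough illustration only (this is not Pyste or Boost.Python code), the sketch below uses the standard inspect module to ask which methods of a hypothetical Base class are redefined by a hypothetical Derived class, the same kind of question a dispatcher generator has to answer about C++ virtual functions:

```python
import inspect


class Base:
    # Hypothetical base class standing in for a polymorphic C++ class.
    def f(self):
        return "Base.f"

    def g(self):
        return "Base.g"


class Derived(Base):
    # Hypothetical subclass that overrides only f().
    def f(self):
        return "Derived.f"


def overridden_methods(base, derived):
    """Return the sorted names of base's plain-Python methods that derived redefines."""
    return sorted(
        name
        for name, member in inspect.getmembers(base, inspect.isfunction)
        if getattr(derived, name) is not member
    )


print(overridden_methods(Base, Derived))
```

Only f is reported, because Derived.g resolves to the very same function object as Base.g.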
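Serialization
Serialization is the process of converting objects in memory into a form that can be stored on disk or sent over a network connection. The serialized object (most often a plain string) can be retrieved and converted back into the original object. A good serialization system will automatically convert entire object hierarchies. Python's standard pickle module is just such a system. It leverages the language's strong runtime introspection facilities to serialize practically arbitrary user-defined objects. With a few simple and unintrusive provisions, this powerful machinery can be extended to work for wrapped C++ objects as well. Here is an example: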
#include <string>
struct World
{
    World(std::string a_msg) : msg(a_msg) {}
    std::string greet() const { return msg; }
    std::string msg;
};
#include <boost/python.hpp>
using namespace boost::python;
struct World_picklers : pickle_suite
{
    static tuple
    getinitargs(World const& w) { return make_tuple(w.greet()); }
};
BOOST_PYTHON_MODULE(hello)
{
    class_<World>("World", init<std::string>())
        .def("greet", &World::greet)
        .def_pickle(World_picklers())
        ;
}
Now let's create a World object and put it to rest on disk:
>>> import hello
>>> import pickle
>>> a_world = hello.World("howdy")
>>> pickle.dump(a_world, open("my_world", "wb"))
Then, potentially in a different script, on a different computer, running a different operating system:
>>> import pickle
>>> resurrected_world = pickle.load(open("my_world", "rb"))
>>> resurrected_world.greet()
'howdy'
Of course, the cPickle module (in Python 2, a faster C implementation of pickle) can also be used for faster processing.
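The claim that pickle converts entire object hierarchies automatically can be checked with nothing but built-in types. In this small sketch (all names are illustrative), a graph containing a shared sub-object and a reference cycle is pickled. Both the sharing and the cycle survive the round trip, because pickle memoizes every object it has already written:

```python
import pickle

shared = {"msg": "howdy"}
graph = {"left": shared, "right": shared, "items": [shared, [1, 2, 3]]}
graph["self"] = graph  # a reference cycle back to the root

restored = pickle.loads(pickle.dumps(graph))

# Sharing is preserved: both keys refer to one object, not two copies.
print(restored["left"] is restored["right"])
# The cycle is preserved as well.
print(restored["self"] is restored)
```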
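Boost.Python's pickle_suite fully supports the pickle protocol defined in the standard Python documentation. Like Python's __getinitargs__ function, pickle_suite's getinitargs() is responsible for creating the argument tuple that will be used to reconstruct the pickled object. The other elements of the Python pickling protocol, __getstate__ and __setstate__, can optionally be provided through C++ getstate and setstate functions. C++'s static type system allows the library to ensure at compile time that nonsensical combinations of functions (for example, getstate without setstate) are not used.
Enabling serialization of more complex C++ objects requires a little more work than is shown in the example above. Fortunately, the object interface (see the next section) greatly helps in keeping the code manageable.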
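For comparison, here is a minimal pure-Python sketch of the same three roles. It is an illustration under stated assumptions, not Boost.Python's implementation. Python 3 no longer consults __getinitargs__ (that hook applied only to Python 2 classic classes), so this sketch uses __reduce__ to return the constructor arguments (the role getinitargs plays) together with extra state that pickle then hands to __setstate__. The visits counter is a hypothetical piece of state that the constructor does not accept:

```python
import pickle


class World:
    def __init__(self, msg):
        self.msg = msg
        self.visits = 0  # hypothetical extra state, not a constructor argument

    def greet(self):
        self.visits += 1
        return self.msg

    # Plays the role of getstate(): capture what the constructor cannot rebuild.
    def __getstate__(self):
        return {"visits": self.visits}

    # Plays the role of setstate(): reapply that state after construction.
    def __setstate__(self, state):
        self.visits = state["visits"]

    # Plays the role of getinitargs(): (callable, constructor args, state).
    def __reduce__(self):
        return (World, (self.msg,), self.__getstate__())


w = World("howdy")
w.greet()
restored = pickle.loads(pickle.dumps(w))
print(restored.msg, restored.visits)
```

Omitting __setstate__ here would not be caught until unpickling, when pickle would fall back to updating the instance __dict__. That is exactly the kind of mismatch the C++ pickle_suite rejects at compile time.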
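Object interface
Experienced authors of C extension modules will be familiar with the ubiquitous PyObject*, manual reference counting, and the need to remember which API calls return "new" (owned) references and which return "borrowed" (raw) references. These constraints are not just cumbersome; they are also a major source of errors, especially in the presence of exceptions.
Boost.Python provides a class object which automates reference counting and provides conversion to Python from C++ objects of arbitrary type. This significantly reduces the learning effort for prospective extension module authors.
Creating an object from any other type is extremely simple: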
object s("hello, world"); // s manages a Python string
object has templated interactions with all other types, with automatic to-Python conversions. This happens so naturally that it is easily overlooked:
object ten_Os = 10 * s[4]; // -> "oooooooooo"
In the example above, 4 and 10 are converted to Python objects before the indexing and multiplication operations are invoked.
The extract<T> class template can be used to convert Python objects to C++ types:
double x = extract<double>(o);
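If a conversion in either direction cannot be performed, an appropriate exception is thrown at runtime.
The object type is accompanied by a set of derived types that mirror Python's built-in types, such as list, dict, and tuple, as closely as possible. This makes it convenient to manipulate these high-level types from C++: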
dict d;
d["some"] = "thing";
d["lucky_number"] = 13;
list l = d.keys();
This looks and works almost exactly like regular Python code, but it is pure C++. Of course, we can also wrap C++ functions that accept or return object instances.
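For readers comparing the two languages, the C++ fragments above correspond roughly to the plain Python below. This is a side-by-side sketch, not generated code. float() stands in only loosely for extract<double> (both raise an exception when the conversion is impossible), and list() is needed because in Python 3 dict.keys() returns a view rather than a list:

```python
s = "hello, world"    # object s("hello, world");
ten_Os = 10 * s[4]    # object ten_Os = 10 * s[4];

o = 3.5               # an existing Python object holding a number
x = float(o)          # double x = extract<double>(o);

d = {}                # dict d;
d["some"] = "thing"   # d["some"] = "thing";
d["lucky_number"] = 13
l = list(d.keys())    # list l = d.keys();

print(ten_Os)
print(l)
```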
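Thinking hybrid
Because of the practical and mental difficulties of combining programming languages, it is common to settle on a single language at the outset of any development effort. For many applications, performance considerations dictate the use of a compiled language for the core algorithms. Unfortunately, due to the complexity of the static type system, the price we pay for runtime performance is often a significant increase in development time. Experience shows that writing maintainable C++ code usually takes longer and requires far more hard-earned working experience than developing comparable Python code. Even when developers are comfortable working exclusively in compiled languages, they often augment their systems with some kind of ad hoc scripting layer for the benefit of their users, without ever availing themselves of the same advantages.
Boost.Python enables us to think hybrid. Python can be used for rapidly prototyping a new application; its ease of use and its large standard library give us a head start on the way to a working system. If necessary, the working code can be used to discover rate-limiting hotspots, that is, the code that runs most often or consumes the most time and resources. To maximize performance, these hotspots can be reimplemented in C++, together with the Boost.Python bindings needed to tie them back into the existing higher-level procedure.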
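One common way to find such hotspots in a Python prototype is the standard library profiler. The sketch below is illustrative only: slow_core and glue are hypothetical stand-ins for a compute-intensive kernel (a candidate for reimplementation in C++) and for the higher-level Python procedure that calls it:

```python
import cProfile
import pstats


def slow_core(n):
    # Hypothetical compute-intensive kernel: a candidate for a C++ rewrite.
    total = 0
    for i in range(n):
        total += i * i
    return total


def glue(n):
    # Hypothetical higher-level procedure that would stay in Python.
    return [slow_core(n) for _ in range(5)]


profiler = cProfile.Profile()
profiler.enable()
results = glue(200_000)
profiler.disable()

# Each entry maps (file, line, function name) to
# (primitive calls, total calls, own time, cumulative time, callers).
stats = pstats.Stats(profiler).stats
hotspot = max(stats.items(), key=lambda item: item[1][2])[0][2]
print("hotspot:", hotspot)
```

Ranking by own time rather than cumulative time points at the kernel itself instead of the Python glue that merely calls it.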
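Of course, this top-down approach is less attractive if it is clear from the start that much of the code will eventually have to be implemented in C++. Fortunately, Boost.Python also lets us pursue a bottom-up approach. We have used this approach very successfully in developing a toolbox for scientific applications. The toolbox started out mainly as a library of C++ classes with Boost.Python bindings, and for a while its growth was concentrated mainly in the C++ parts. However, as the toolbox has become more complete, more and more newly added functionality can be implemented in Python.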
This figure shows the estimated ratio of newly added C++ and Python code over time as new algorithms are implemented. We expect this ratio to level out near 70% Python. Being able to solve new problems mostly in Python rather than a more difficult statically typed language is the return on our investment in Boost.Python. The ability to access all of our code from Python allows a broader group of developers to use it in the rapid development of new applications.
Development history
The first version of Boost.Python was developed in 2000 by Dave Abrahams at Dragon Systems, where he was privileged to have Tim Peters as a guide to "The Zen of Python". One of Dave's jobs was to develop a Python-based natural language processing system. Since it was eventually going to be targeting embedded hardware, it was always assumed that the compute-intensive core would be rewritten in C++ to optimize speed and memory footprint [1]. The project also wanted to test all of its C++ code using Python test scripts [2]. The only tool we knew of for binding C++ and Python was SWIG, and at the time its handling of C++ was weak. It would be false to claim any deep insight into the possible advantages of Boost.Python's approach at this point. Dave's interest and expertise in fancy C++ template tricks had just reached the point where he could do some real damage, and Boost.Python emerged as it did because it filled a need and because it seemed like a cool thing to try.
This early version was aimed at many of the same basic goals we've described in this paper, differing most noticeably by having a slightly more cumbersome syntax and by a lack of special support for operator overloading, pickling, and component-based development. These last three features were quickly added by Ullrich Koethe and Ralf Grosse-Kunstleve [3], and other enthusiastic contributors arrived on the scene to contribute enhancements like support for nested modules and static member functions.
By early 2001 development had stabilized and few new features were being added; however, a disturbing new fact came to light: Ralf had begun testing Boost.Python on pre-release versions of a compiler using the EDG front-end, and the mechanism at the core of Boost.Python responsible for handling conversions between Python and C++ types was failing to compile. As it turned out, we had been exploiting a very common bug in the implementation of all the C++ compilers we had tested. We knew that as C++ compilers rapidly became more standards-compliant, the library would begin failing on more platforms. Unfortunately, because the mechanism was so central to the functioning of the library, fixing the problem looked very difficult.
Fortunately, later that year Lawrence Berkeley and later Lawrence Livermore National labs contracted with Boost Consulting for support and development of Boost.Python, and there was a new opportunity to address fundamental issues and ensure a future for the library. A redesign effort began with the low level type conversion architecture, building in standards-compliance and support for component-based development (in contrast to version 1 where conversions had to be explicitly imported and exported across module boundaries). A new analysis of the relationship between the Python and C++ objects was done, resulting in more intuitive handling for C++ lvalues and rvalues.
The emergence of a powerful new type system in Python 2.2 made the choice of whether to maintain compatibility with Python 1.5.2 easy: the opportunity to throw away a great deal of elaborate code for emulating classic Python classes alone was too good to pass up. In addition, Python iterators and descriptors provided crucial and elegant tools for representing similar C++ constructs. The development of the generalized object interface allowed us to further shield C++ programmers from the dangers and syntactic burdens of the Python 'C' API. A great number of other features including C++ exception translation, improved support for overloaded functions, and most significantly, CallPolicies for handling pointers and references, were added during this period.
In October 2002, version 2 of Boost.Python was released. Development since then has concentrated on improved support for C++ runtime polymorphism and smart pointers. Peter Dimov's ingenious boost::shared_ptr design in particular has allowed us to give the hybrid developer a consistent interface for moving objects back and forth across the language barrier without loss of information. At first, we were concerned that the sophistication and complexity of the Boost.Python v2 implementation might discourage contributors, but the emergence of Pyste and several other significant feature contributions have laid those fears to rest. Daily questions on the Python C++-sig and a backlog of desired improvements show that the library is getting used. To us, the future looks bright.
Conclusions
Boost.Python achieves seamless interoperability between two rich and complementary language environments. Because it leverages template metaprogramming to introspect about types and functions, the user never has to learn a third syntax: the interface definitions are written in concise and maintainable C++. Also, the wrapping system doesn't have to parse C++ headers or represent the type system: the compiler does that work for us.
Computationally intensive tasks play to the strengths of C++ and are often impossible to implement efficiently in pure Python, while jobs like serialization that are trivial in Python can be very difficult in pure C++. Given the luxury of building a hybrid software system from the ground up, we can approach design with new confidence and power.
Footnotes
[1] In retrospect, it seems that "thinking hybrid" from the ground up might have been better for the NLP system: the natural component boundaries defined by the pure Python prototype turned out to be inappropriate for getting the desired performance and memory footprint out of the C++ core, which eventually caused some redesign overhead on the Python side when the core was moved to C++.
[2] We also have some reservations about driving all C++ testing through a Python interface, unless that's the only way it will be ultimately used. Any transition across language boundaries with such different object models can inevitably mask bugs.
[3] These features were expressed very differently in v1 of Boost.Python.