Python's Flying Circus

Posted by cwright on 2008.02.22 @ 15:35

Lately I’ve been working on integrating (or, more accurately, attempting to integrate) the Python scripting language into some of the plugins we develop for an application. We’ve wrapped many libraries with varying levels of success, so this one wasn’t going to be much different. Or so we thought.

Technically, we’re interested in “Embedding” Python – We have a native Objective-C application that needs to make use of the Python interpreter at various stages of execution. The Python scripts are user-supplied, and are exceptionally free-form: the functions they write can take variable numbers of inputs, produce variable numbers of outputs, and can use any Python modules they have installed.

Unfortunately, this makes the embedding process somewhat complicated: from Objective-C, we need to be able to parse the script, find all its functions, and get all those functions’ input and output parameters. From this data, we can expose the Python module’s interface usefully. As far as we can tell, though, Python does not allow this kind of introspection from the outside (and it’s questionable whether it’s even possible from the inside).

In searching for information on this, Numerous Documents are Found that Needlessly Complicate or Obfuscate the difference between Embedding and Extending. As if it’s really that difficult (embed means “fix firmly and deeply in a surrounding mass” while extend means “cause to cover a larger area; make longer or wider”. From these obvious definitions, we can infer that “Embedding” means putting something inside, while “Extending” means adding functionality or abilities. Maybe it’s not so clear for non-native English speakers. I don’t know.)

Another annoying side-trip of this research was running into an overwhelmingly smug attitude that almost reeks of Java: the recorded, smarmy discussions of why one should “Extend” rather than “Embed” (these reasons are then used to explain away why there isn’t any good documentation on actually embedding Python). The arguments go something like this:

  • Python’s so good, cross-platform, and flexible that it’s actually more cost-effective to Throw Out All Your C/C++/ObjC Code And Rewrite Everything In Python.
  • Embedding is so cumbersome to code, and so difficult compared to embedding Python in Python, that you should Throw Out All Your C/C++/ObjC Code And Rewrite Everything In Python.
  • High-performance code can be written in C, and then called from within Python’s runtime when you actually need the performance boost. Why not just Throw Out All Your C/C++/ObjC Code And Rewrite Everything In Python Except for the fast bits?
  • If you embed Python, you’ll annoy Python developers who can’t access the modules they’re used to using. Why not Throw Out All Your C/C++/ObjC Code And Rewrite Everything In Python so you don’t annoy your Python developers?

These arguments cover some pretty diverse ground. Unfortunately, there are two fatal flaws in the list above.

First, embedded Python can in fact use installed modules just like raw Python, so the fourth point above is flagrantly incorrect. There are some suggestions that it’s simply annoying to develop in such an environment because namespaces are all strange (or wrong) and nothing works quite right. Guess What: Welcome To Plugin Development! It’s like these Python developers have never worked on real projects before or something.

Second, all of the above points assume that it’s possible to discard your entire code-base and rewrite everything. While this is technically possible, it’s not very likely when, as in our case, you don’t have access to the source of the application to be discarded. The unfortunate downside is that All The Listed “Solutions” Hinge On This One Idea.

What’s with these non-solutions from allegedly flexible languages? Even Objective-C, a compiled language, and the (totally undocumented) JavaVM-ObjC bridge, offer enough introspection to at least find methods and parameter counts without too much hassle. Maybe Python does this just as simply, but if so, no one’s talking about it…

An epilogue:

Python in Python does allow for a similar degree of introspection. Parameter types are not available, though (from what I am told, this is changing in Python 3), so it’s difficult to make a free-form interface, even internally.
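For the curious, here is a rough sketch of what that inside-Python introspection can look like. It assumes a reasonably recent Python 3 (where `inspect.signature` exists), and the script contents, the function names `blend` and `split`, and the helper `describe_functions` are all made up for illustration; this is not the code we used in the plugin.

```python
import inspect
import types

# Hypothetical user-supplied script; these functions are invented for the example.
USER_SCRIPT = '''
def blend(a, b, amount=0.5):
    return a * (1 - amount) + b * amount

def split(value: float, *rest):
    return value, rest
'''


def describe_functions(source, module_name="user_script"):
    """Run the script in a fresh module and describe each top-level function.

    Returns a dict mapping function name to a list of
    (parameter name, parameter kind, annotation or None) tuples.
    """
    module = types.ModuleType(module_name)
    # Note: this executes the user's code, so it is only as safe as the script.
    exec(compile(source, module_name, "exec"), module.__dict__)

    described = {}
    for name, func in inspect.getmembers(module, inspect.isfunction):
        params = inspect.signature(func).parameters.values()
        described[name] = [
            (
                p.name,
                p.kind.name,
                None if p.annotation is inspect.Parameter.empty else p.annotation,
            )
            for p in params
        ]
    return described


if __name__ == "__main__":
    for name, params in describe_functions(USER_SCRIPT).items():
        print(name, params)
```

This only covers the input side: parameter names, whether they are variadic, and any annotation the author bothered to write. The number of outputs is still not something the signature tells you, and none of this helps directly when you are sitting on the Objective-C side of the fence.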

In the third-to-last paragraph, where I said “It’s like these Python developers have never worked on real projects”, I meant people who wrote documents I found or discussed things with me on IRC, AIM, or via e-mail, not the actual Python Developers (who were ahead of their time back when Python was first developed). In retrospect, this is an embarrassingly condescending attitude, but I feel it has some validity: there’s never an excuse for incomplete documentation, especially for longer-running, widely used technologies.

For an anecdotal demonstration, let’s compare two real-world scenarios:

First, I burned a bit over a week fighting with the Python library to get it to do roughly what I wanted. I couldn’t get introspection working from outside Python space, and I couldn’t properly handle exceptions from outside Python space either.

Second, on our weekly trip to the Yon Reptile Campaign (a 4 hour drive in a small VW Cabrio), smokris was able to, without internet access, make a functionally identical proof-of-concept application using Apple’s JavaVM bridge (which, as the article notes, is largely undocumented). With another hour on my hands, I was able to twist the runtime to tell us everything we needed to know except for parameter names: Java class binaries don’t store this data, so it’s impossible to provide after compilation. A regrettable weakness, but also a sound demonstration of a technology that was designed to work in heterogeneous environments.