Customize Plone 5 default theme on the fly

When I recently wrote about how to reintroduce ploneCustom for Plone 5 TTW (through the web), I got some feedback that it was the wrong thing to do, and that the correct way would always be to create your own custom theme.

If you are ready to let the precious ploneCustom go, here's how to customize the default Barceloneta theme on the fly by creating a new custom theme.

Inherit a new theme from Barceloneta

So, let's customize a brand new Plone 5 site by creating a new theme, which inherits everything from Barceloneta theme, yet allows us to add additional rules and styles:

  1. Open Site Setup and Theming control panel.

  2. Create New theme, not yet activated, with title mytheme (or your own title, once you get the concept)

  3. In the opened theme editor, replace the contents of rules.xml with the following code:

    <?xml version="1.0" encoding="UTF-8"?>
    <rules
        xmlns="http://namespaces.plone.org/diazo"
        xmlns:css="http://namespaces.plone.org/diazo/css"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xi="http://www.w3.org/2001/XInclude">
    
      <!-- Import Barceloneta rules -->
      <xi:include href="++theme++barceloneta/rules.xml" />
    
      <rules css:if-content="#visual-portal-wrapper">
        <!-- Placeholder for your own additional rules -->
      </rules>
    
    </rules>
    
  4. Still in the theme editor, add a New file with the name styles.less, then edit it and Save it with the following content:

    /* Import Barceloneta styles */
    @import "++theme++barceloneta/less/barceloneta.plone.less";
    
    /* Customize navbar color */
    @plone-sitenav-bg: pink;
    @plone-sitenav-link-hover-bg: darken(pink, 20%);
    
    /* Customize navbar text color */
    .plone-nav > li > a {
      color: @plone-text-color;
    }
    
    /* Customize search button */
    #searchGadget_form .searchButton {
      /* Re-use mixin from Barceloneta */
      .button-variant(@plone-text-color, pink, @plone-gray-lighter);
    }
    
    /* Inspect Barceloneta theme (and its less-folder) for more... */
    

But before activating the new theme, there's one more manual step to do...

Register and build a new LESS bundle

We just created a new LESS file, which first imports the main Barceloneta LESS file and then adds our own additional styles using some features of the LESS syntax. To actually turn that LESS file into usable CSS (through the browser), we need to register a new bundle for it and build it:

  1. Open Site Setup and Resource Registries control panel.

  2. Add resource with name mytheme and a single CSS/LESS file with path ++theme++mytheme/styles.less to locate the file we just added into our theme:

    http://2.bp.blogspot.com/-cUE7pFkPMhY/VWJsTaQYOhI/AAAAAAAAAps/HaW5g6OCNJY/s1600/resource.png
  3. Save.

  4. Add bundle with name mytheme, requiring the mytheme resource we just created, and with Does your bundle contain any RequireJS or LESS files? checked:

    http://1.bp.blogspot.com/-6sXxYJmR80o/VWJsTXQ86aI/AAAAAAAAApo/IQmHdiaWRrE/s1600/bundle.png
  5. Save.

  6. Build mytheme bundle.

Now you should be ready to return back to Theming control panel, activate the theme, and see the gorgeous pink navigation bar:

http://4.bp.blogspot.com/-PPj1JGOUNDY/VWJsTW6_76I/AAAAAAAAApw/K31MZDUf8-c/s1600/result.png

Note: To really be a good citizen and follow the rules, there are a few additional steps:

  1. Add production-css setting into your theme's manifest.cfg to point to the compiled CSS bundle:

    [theme]
    title = mytheme
    description =
    production-css = /++plone++static/mytheme-compiled.css
    
  2. In Resource Registries, disable mytheme bundle by unchecking its Enabled checkbox and clicking Save.

  3. Deactivate and activate the theme once.

Technically this changes the CSS bundle to be registered as a so-called Diazo bundle instead of a regular bundle. The difference is that a Diazo bundle is always rendered last and can therefore override any CSS rule introduced by the other enabled bundles. Also, as a Diazo bundle it gets disabled and enabled properly when the active theme gets changed.

ploneCustom for Plone 5

No more custom skins folder with infamous ploneCustom in Plone 5, they said.

Well, they can take away the skins folder, but they cannot take away our ploneCustom. I know that the recommended way of customizing Plone 5 is via a custom theme through the Theming control panel in Site Setup. Still, sometimes you only need to add a few custom rules on top of an existing theme, and creating a completely new theme would feel like overkill.

Meet the new resource registry

One of the many big changes in Plone 5 is the completely new way CSS and JavaScript resources are managed. Plone 5 introduces a completely new Resource Registries control panel and two new concepts for managing CSS and JavaScript there: resources and resource bundles.

A resource is a single CSS/LESS file, a single JavaScript file, or one of each, providing some named feature for Plone 5. For example, a new embedded JavaScript based applet could be defined as a resource containing both its JavaScript code and the required CSS/LESS stylesheet. In addition to those single files, JavaScript files can depend on named requirejs modules provided by the other resources. Also, LESS files can include any number of other available LESS files. (LESS is a superset of CSS with some optional superpowers like hierarchical directives, variables and optimized includes.)

Resource Bundle is a composition of named resources, which is eventually built into a single JavaScript and/or CSS file to be linked with each rendered page. When the page is rendered, bundles are linked (using either script-tags or stylesheet link-tags) in an order depending on their mutual dependencies. Bundles can be disabled and they can have conditions, so bundles are somewhat comparable to the legacy resource registry registrations in Plone 4 and earlier.

http://1.bp.blogspot.com/-hlFLUGS_BKE/VVBfXKQweuI/AAAAAAAAAnQ/vRnVuyvKs_4/s1600/08_bundle_ploneCustom.png

Now that you are familiar with the concepts, let's bring our precious ploneCustom back to life.

Defining the next generation ploneCustom

These steps will define a new ploneCustom bundle, which provides both a custom CSS (with LESS) and a custom JavaScript file to allow arbitrary site customizations without introducing a new theme.

Creating and editing

At first, you need to add the actual LESS and JavaScript files. Instead of the deprecated skins custom folder, you can add them into your Plone 5 site using our old friend, the ZMI (Zope Management Interface); a scripted alternative is also sketched after these steps.

If you are running a development site, please open the following URL: http://localhost:8080/Plone/portal_resources/manage_main

http://3.bp.blogspot.com/-PYwj1yQ1nys/VVB1FeC3xpI/AAAAAAAAAos/le7yakSqO_U/s1600/01_portal_resources.png

This portal_resources is the new database (ZODB) based storage for any kind of custom resources (introduced with the new Theming control panel in Plone 4.3). Its functionality is based on plone.resource, but right now you only need to know how to use it with Plone 5 resource registries.

  1. So, in portal_resources, add a new BTreeFolder2 with name plone:

    http://2.bp.blogspot.com/-lIvOEy0ZdDc/VVB10b7OlPI/AAAAAAAAAo8/8hBPrJpWcBY/s1600/02_portal_resources.png
  2. Then navigate into that folder (select plone and press the Edit button), add another BTreeFolder2 with name custom, and navigate into that folder until you are at portal_resources/plone/custom:

    http://4.bp.blogspot.com/-qFUjEt26Qk0/VVCqHDko4eI/AAAAAAAAApU/93SOfj89Dpk/s1600/03_portal_resources.png
  3. Now Add a new File named ploneCustom.js and another named ploneCustom.less:

    http://3.bp.blogspot.com/-fA4tg9R5L0U/VVBfVsmWP8I/AAAAAAAAAm8/jnsW9BZy4ys/s1600/04_portal_resources.png
  4. And, finally, you can navigate into those files (select and press Edit button) to edit and save them with your CSS and JavaScript:

    http://2.bp.blogspot.com/-EMwY36pL8jk/VVBfWRvazhI/AAAAAAAAAnA/p3gFbqRDGZo/s1600/06_portal_resources.png

    The example JavaScript above would only annoy you, just to tell you that it works:

    jQuery(function($) {
        alert("Hello World!");
    });
    
    http://1.bp.blogspot.com/-atBQWKrV6g4/VVBfWAilnXI/AAAAAAAAAnI/O7icbR3b34o/s1600/05_portal_resources.png

    The example CSS above would replace the portal logo with a custom text:

    #portal-logo:before {
      display: inline-block;
      content: "My Plone Site";
      font-size: 300%;
    }
    #portal-logo img {
      display: none;
    }
    

    In addition to that, you could add a little bit extra to learn more. The following lines would re-use button classes from the Bootstrap 3 resources shipped with Plone 5 (beta). This is an example of how to use LESS to cherry-pick just a few special CSS rules from the Bootstrap 3 framework and apply them next to the currently active theme:

    @import (reference) "../++plone++static/components/bootstrap/less/bootstrap.less";
    #searchGadget_form .searchButton {
      &:extend(.btn);
      &:extend(.btn-success);
    }
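
If you prefer scripting to clicking, roughly the same setup can also be done from a bin/instance debug prompt. This is only a hedged sketch: it assumes a standard buildout with a bin/instance script, the example site id Plone, and that plain OFS File objects (the same type the ZMI Add File step creates) are enough for plone.resource to serve the content:

from OFS.Image import File
import transaction

resources = app.Plone.portal_resources  # "Plone" is the example site id

# The same two BTreeFolder2 folders as in the ZMI steps above
for folder_id in ('plone', 'custom'):
    if folder_id not in resources.objectIds():
        resources.manage_addProduct['BTreeFolder2'].manage_addBTreeFolder(folder_id)
    resources = getattr(resources, folder_id)

# Plain OFS File objects, just like the ZMI "Add File" step
resources._setObject('ploneCustom.js', File(
    'ploneCustom.js', '', 'jQuery(function($) { alert("Hello World!"); });'))
resources._setObject('ploneCustom.less', File(
    'ploneCustom.less', '', '#portal-logo img { display: none; }'))

transaction.commit()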
    

Registering and enabling

To register the resource and add it into a bundle (or create a new one), go to Resource Registries control panel (e.g. at http://localhost:8080/@@resourceregistry-controlpanel). Click Add resource to show the add resource form and fill it like in the screenshot below:

http://2.bp.blogspot.com/-d91GJ1BmojY/VVBfW_h0I2I/AAAAAAAAAnM/AMbzV3kU1L0/s1600/07_resource_ploneCustom.png

Note that the strings ++plone++custom/ploneCustom.js and ++plone++custom/ploneCustom.less are actually relative (public) URLs for the resources you just added into portal_resources.

After saving the resource by clicking Save, click Add bundle to create a new bundle for your ploneCustom resource. Fill in the opened form as follows:

http://1.bp.blogspot.com/-hlFLUGS_BKE/VVBfXKQweuI/AAAAAAAAAnQ/vRnVuyvKs_4/s1600/08_bundle_ploneCustom.png

Note that the bundle depends on the Plone bundle. That makes it load only after the Plone bundle, which includes jQuery, which our custom JavaScript code depends on. (Later you may wonder why jQuery was not required with requirejs. That would also work and is recommended for other libraries, but currently you can rely on jQuery being globally available after the Plone bundle has been loaded.)

When you have saved the new ploneCustom resource bundle, it will appear in the Bundles list on the left. The final step is to click the Build button below the ploneCustom bundle label in that list. That will open a popup modal showing the build progress.

http://4.bp.blogspot.com/-2VcdXwU9So0/VVBfXe21ZLI/AAAAAAAAAoQ/1YCnr6OrlDo/s1600/09_build_ploneCustom.png

Once the build is done, you can click Close and reload the page to see your new ploneCustom bundle applied to your site:

http://3.bp.blogspot.com/-eIJ3ZhE-qqE/VVBfX2nylqI/AAAAAAAAAnY/rW6RIeUcntk/s1600/10_ploneCustom.png

Note how the Plone logo has been replaced with a custom text and the Search button has been styled after the Bootstrap 3 button styles. (Also, you should by now have seen an annoying alert popup from your ploneCustom JavaScript.)

To modify your ploneCustom bundle, just go edit the files and return to the Resource Registries control panel to click the Build button again.

Now you have your ploneCustom back in Plone 5. Congratulations!

P.S. Don't forget that you can also tweak (at least the default) Plone theme a lot from the Resource Registries control panel without the ploneCustom bundle, simply by changing the theme's LESS variables and building the Plone bundle.

EXTRA: TTW ReactJS App in Plone

The new Resource Registries may feel complex at first, but once you get used to them, they are a blessing. Just define the dependencies properly, and never again do you need to order Plone CSS and JavaScript resources manually, and never again (well, once add-ons get updated to this new configuration) should add-ons break your site by re-registering resources in a broken order.

As an example, let's implement a ReactJS Hello World for Plone TTW using the new resource registry:

At first, you need to register the ReactJS library as a resource. You could upload the library into portal_resources, but for a quick experiment you can also refer to a cloud-hosted version (https://fb.me/react-0.13.3.js). So, go to the Resource Registries control panel and Add resource with the following details:

http://1.bp.blogspot.com/-tUd-UQ7KCws/VVBfYBYAlBI/AAAAAAAAAns/nz5T8qHEwvI/s1600/11_resource_reactjs.png

Note how the library is defined to be wrapped for requirejs with the name react013. (Plone 5 actually ships with the ReactJS library, but because the version in the first beta is just 0.10, we need to add a newer version with a version-specific name.)

Next, go to portal_resources/plone/custom/manage_main as before and add a new file called reactApp.js with the following ReactJS Hello World as its contents:

define([
  'react013',
], function(React) {

'use strict';

var ExampleApplication = React.createClass({
  render: function() {
    var elapsed = Math.round(this.props.elapsed  / 100);
    var seconds = elapsed / 10 + (elapsed % 10 ? '' : '.0' );
    var message = 'React has been successfully running for ' + seconds + ' seconds.';
    return React.createElement("p", null, message);
  }
});

var start = new Date().getTime();

setInterval(function() {
  React.render(
    React.createElement(ExampleApplication, {elapsed: new Date().getTime() - start}),
    document.getElementById('portal-logo')
  );
}, 50);

return ExampleApplication;

});

jQuery(function($) {
  require(['reactApp']);
});

Note how ReactJS is required as react013, and how the example application is required as reactApp at the bottom (using jQuery onLoad convention).

Of course, reactApp must also be defined as a new resource in the Resource Registries control panel. It should depend on the previously added react013 resource (wrapped for requirejs) and export itself for requirejs as reactApp:

http://4.bp.blogspot.com/-6-0GxcKsJro/VVBfZXZBv7I/AAAAAAAAAn0/FRx_z_NSWd0/s1600/13_resource_reactApp.png

Finally, you can Add bundle for this example reactApp:

http://4.bp.blogspot.com/-oP5-me9bnVM/VVBfYbKdgBI/AAAAAAAAAnk/bxv6UK82H6k/s1600/12_bundle_reactApp.png

And after Save, Build the bundle from the button below the new bundle name in Bundles list:

http://2.bp.blogspot.com/-zGc9aH7HD68/VVBfZo4ZOBI/AAAAAAAAAoA/7NmP9kYmT_4/s1600/14_build_reactApp.png

Note that, because the cloud-hosted ReactJS library was used, the new bundle contains only the code from reactApp.js, and requirejs will require ReactJS from the cloud on demand. If you had added the library into portal_resources, it would have been included in the resulting bundle.

After page reload, your ReactJS Hello World should be alive:

http://1.bp.blogspot.com/-x6gPspdZdro/VVBfZ9IL1AI/AAAAAAAAAn8/8jO9TWbdAkY/s1600/15_reactApp.png

Transmogrifier, the Python migration pipeline, also for Python 3

TL;DR: I forked collective.transmogrifier into just transmogrifier (not yet released) to make its core usable without Plone dependencies, use Chameleon for TAL expressions, be installable with just pip install, and be compatible with Python 3.

Transmogrifier is one of the many great developer tools by the Plone community. It's a generic pipeline tool for data manipulation, configurable with plain text INI-files, while new re-usable pipeline section blueprints can be implemented and packaged in Python. It could be used to process any number of things, but historically it's been mainly developed and used as a pluggable way to import legacy content into Plone.

A simple transmogrifier pipeline for dumping news from Slashdot to a CSV file could look like:

[transmogrifier]
pipeline =
    from_rss
    to_csv

[from_rss]
blueprint = transmogrifier.from
modules = feedparser
expression = python:modules['feedparser'].parse(options['url']).get('entries', [])
url = http://rss.slashdot.org/slashdot/slashdot

[to_csv]
blueprint = transmogrifier.to_csv
fieldnames =
    title
    link
filename = slashdot.csv

Actually, at the time of writing this, I've yet to do any Plone migrations using transmogrifier. But when we recently had a reasonably sized non-Plone migration task, I knew not to re-invent the wheel, but to transmogrify it. And we succeeded. The transmogrifier pipeline helped us design the migration better, and splitting the data processing into multiple pipeline sections helped us delegate the work between multiple developers.

Unfortunately, collective.transmogrifier currently has unnecessary dependencies on CMFCore, is not installable without a long known-good set of versions, and is missing any built-in command-line interface. At first, I tried to do all the necessary refactoring inside collective.transmogrifier, but eventually a fork was required to make the transmogrifier core usable outside Plone environments, be compatible with Python 3, and not break any existing workflows depending on the old transmogrifier.

So, meet the new transmogrifier:

  • can be installed with pip install (although, not yet released at PyPI)
  • new mr.migrator inspired command-line interface (see transmogrify --help for all the options)
  • new base classes for custom blueprints (a minimal sketch follows this list)
    • transmogrifier.blueprints.Blueprint
    • transmogrifier.blueprints.ConditionalBlueprint
  • new ZCML-directives for registering blueprints and re-usable pipelines
    • <transmogrifier:blueprint component="" name="" />
    • <transmogrifier:pipeline id="" name="" description="" configuration="" />
  • uses Chameleon for TAL-expressions (e.g. in ConditionalBlueprint)
  • has only a few generic built-in blueprints
  • supports z3c.autoinclude for package transmogrifier
  • fully backwards compatible with blueprints for collective.transmogrifier
  • runs with Python >= 2.6, including Python 3+
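
For example, a minimal custom blueprint only subclasses the new base class and implements __iter__, passing items through from self.previous and reading its configuration from self.options (a sketch; the section's behaviour is made up, and it would be registered either with the <transmogrifier:blueprint /> directive above or with venusianconfiguration as in the later example):

from transmogrifier.blueprints import Blueprint


class TitleUpper(Blueprint):
    """Uppercase the configured key of every item passing through."""

    def __iter__(self):
        key = self.options.get('key', 'title')
        for item in self.previous:
            if key in item:
                item[key] = item[key].upper()
            yield item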

There's still much work to do before a real release (e.g. documenting and testing the new CLI script and the new built-in blueprints), but let's see how it already works...

P.S. Please, use a clean Python virtualenv for these examples.

Example pipeline

Let's start with an easy installation

$ pip install git+https://github.com/datakurre/transmogrifier
$ transmogrify --help
Usage: transmogrify <pipelines_and_overrides>...
                [--overrides=overrides.cfg>]
                [--include=package_or_module>...]
                [--include=package:filename>...]
                [--context=<package.module.factory>]
   transmogrify --list
                [--include=package_or_module>...]
   transmogrify --show=<pipeline>
                [--include=package_or_module>...]

and with an example filesystem pipeline.cfg

[transmogrifier]
pipeline =
    from_rss
    to_csv

[from_rss]
blueprint = transmogrifier.from
modules = feedparser
expression = python:modules['feedparser'].parse(options['url']).get('entries', [])
url = http://rss.slashdot.org/slashdot/slashdot

[to_csv]
blueprint = transmogrifier.to_csv
fieldnames =
    title
    link
filename = slashdot.csv

and its dependencies

$ pip install feedparser

and the results

$ transmogrify pipeline.cfg
INFO:transmogrifier:CSVConstructor:to_csv wrote 25 items to /.../slashdot.csv

using, for example, Python 2.7 or Python 3.4.

Minimal migration project

Let's create an example migration project with custom blueprints using Python 3. In addition to transmogrifier, we need venusianconfiguration for easy blueprint registration and, of course, the actual dependencies for our blueprints:

$ pip install git+https://github.com/datakurre/transmogrifier
$ pip install git+https://github.com/datakurre/venusianconfiguration
$ pip install fake-factory

Now we can implement custom blueprints in, for example, blueprints.py

from venusianconfiguration import configure

from transmogrifier.blueprints import Blueprint
from faker import Faker


@configure.transmogrifier.blueprint.component(name='faker_contacts')
class FakerContacts(Blueprint):
    def __iter__(self):
        for item in self.previous:
            yield item

        amount = int(self.options.get('amount', '0'))
        fake = Faker()

        for i in range(amount):
            yield {
                'name': fake.name(),
                'address': fake.address()
            }

and see them registered next to the built-in ones (or those from other packages hooking into the transmogrifier autoinclude entry point):

$ transmogrify --list --include=blueprints

Available blueprints
--------------------
faker_contacts
...

Now, we can make an example pipeline.cfg

[transmogrifier]
pipeline =
    from_faker
    to_csv

[from_faker]
blueprint = faker_contacts
amount = 2

[to_csv]
blueprint = transmogrifier.to_csv

and enjoy the results

$ transmogrify pipeline.cfg to_csv:filename=- --include=blueprints
address,name
"534 Hintz Inlet Apt. 804
Schneiderchester, MI 55300",Dr. Garland Wyman
"44608 Volkman Islands
Maryleefurt, AK 42163",Mrs. Franc Price DVM
INFO:transmogrifier:CSVConstructor:to_csv saved 2 items to -

An alternative would be to just use the shipped mr.bob-template...

Migration project using the template

The new transmogrifier ships with an easy getting started template for your custom migration project. To use the template, you need a Python environment with mr.bob and the new transmogrifier:

$ pip install mr.bob readline  # readline is an implicit mr.bob dependency
$ pip install git+https://github.com/datakurre/transmogrifier

Then you can create a new project directory with:

$ mrbob bobtemplates.transmogrifier:project

Once the new project directory is created, inside the directory you can install the rest of the dependencies and activate the project with:

$ pip install -r requirements.txt
$ python setup.py develop

Now transmogrify knows your project's custom blueprints and pipelines:

$ transmogrify --list

Available blueprints
--------------------
myprojectname.mock_contacts
...

Available pipelines
-------------------
myprojectname_example
    Example: Generates uppercase mock addresses

And the example pipeline can be executed with:

$ transmogrify myprojectname_example
name,address
ISSAC KOSS I,"PSC 8465, BOX 1625
APO AE 97751"
TESS FAHEY,"PSC 7387, BOX 3736
APO AP 13098-6260"
INFO:transmogrifier:CSVConstructor:to_csv wrote 2 items to -

Please see the created README.rst for how to edit the example blueprints and pipelines and create more.

Mandatory example with Plone

Using the new transmogrifier with Plone should be as simple as adding it into your buildout.cfg next to the old transmogrifier packages:

[buildout]
extends = http://dist.plone.org/release/4.3-latest/versions.cfg
parts = instance plonesite
versions = versions

extensions = mr.developer
sources = sources
auto-checkout = *

[sources]
transmogrifier = git https://github.com/datakurre/transmogrifier

[instance]
recipe = plone.recipe.zope2instance
eggs =
    Plone
    z3c.pt
    transmogrifier
    collective.transmogrifier
    plone.app.transmogrifier
user = admin:admin
zcml = plone.app.transmogrifier

[plonesite]
recipe = collective.recipe.plonesite
site-id = Plone
instance = instance

[versions]
setuptools =
zc.buildout =

Let's also write a fictional migration pipeline, which would create Plone content from Slashdot RSS-feed:

[transmogrifier]
pipeline =
    from_rss
    id
    fields
    folders
    create
    update
    commit

[from_rss]
blueprint = transmogrifier.from
modules = feedparser
expression = python:modules['feedparser'].parse(options['url']).get('entries', [])
url = http://rss.slashdot.org/Slashdot/slashdot

[id]
blueprint = transmogrifier.set
modules = uuid
id = python:str(modules['uuid'].uuid4())

[fields]
blueprint = transmogrifier.set
portal_type = string:Document
text = path:item/summary
_path = string:slashdot/${item['id']}

[folders]
blueprint = collective.transmogrifier.sections.folders

[create]
blueprint = collective.transmogrifier.sections.constructor

[update]
blueprint = plone.app.transmogrifier.atschemaupdater

[commit]
blueprint = transmogrifier.to_expression
modules = transaction
expression = python:modules['transaction'].commit()
mode = items

Now, the new CLI script can be used together with bin/instance -Ositeid run provided by plone.recipe.zope2instance, so that transmogrifier will get your site as its context simply by calling zope.component.hooks.getSite:

$ bin/instance -OPlone run bin/transmogrify pipeline.cfg --context=zope.component.hooks.getSite

With Plone you should, of course, still use Python 2.7.
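
The --context option simply takes the dotted name of a callable returning the object to be used as the pipeline context, so a custom factory can be as small as the following sketch (the module path myproject.context is hypothetical, and the factory is assumed to be called without arguments):

# myproject/context.py -- a hypothetical module for this sketch
from zope.component.hooks import getSite


def get_context():
    """Return the object the pipeline blueprints should use as context."""
    site = getSite()  # the Plone site set up by ``bin/instance run``
    # For example, target a subfolder instead of the site root, if it exists:
    return getattr(site, 'slashdot', site)

It would then be passed to the command above as --context=myproject.context.get_context.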

Funnelweb example with Plone

Funnelweb is a collection of transmogrifier blueprints and pipelines for scraping any web site into Plone. I heard that its example pipelines are a little outdated, but they make a nice demo anyway.

Let's extend our previous Plone-example with the following funnelweb.cfg buildout to include all the necessary transmogrifier blueprints and the example funnelweb.ttw pipeline:

[buildout]
extends = buildout.cfg

[instance]
eggs +=
    transmogrify.pathsorter
    funnelweb

We also need a small additional pipeline commit.cfg to commit all the changes made by funnelweb.ttw:

[transmogrifier]
pipeline = commit

[commit]
blueprint = transmogrifier.interval
modules = transaction
expression = python:modules['transaction'].commit()

Now, after the buildout has been run, the following command would use pipelines funnelweb.ttw and commit.cfg to somewhat scrape my blog into Plone:

$ bin/instance -OPlone run bin/transmogrify funnelweb.ttw commit.cfg crawler:url=http://datakurre.pandala.org "crawler:ignore=feeds\ncsi.js" --context=zope.component.hooks.getSite

For tuning the import further, the used pipelines could easily be exported to the filesystem, customized, and then executed similarly to commit.cfg:

$ bin/instance -OPlone run bin/transmogrify --show=funnelweb.ttw > myfunnelweb.cfg

Too many ways to do async tasks with Plone

Triggering asynchronous tasks from Plone is hard, we hear. And that's actually quite surprising, given that, from its very beginning, Plone has been running on top of the first asynchronous web server written in Python, medusa.

Of course, there exist many, too many, different solutions to run asynchronous tasks with Plone:

  • plone.app.async is the only one in the Plone namespace, and probably the most criticized one, because it uses ZODB to persist its task queue
  • netsight.async, on the other hand, is simpler, just executing the given task outside the Zope worker pool (but requiring its own database connection).
  • finally, if you happen to like Celery, Nathan Van Gheem is working on a simple Celery integration, collective.celery, based on earlier work by David Glick.

To add insult to injury, I've ended up developing more than one method of my own, because of being warned about plone.app.async, being hit hard by the opinionated internals of Celery, being unaware of netsight.async, and because a single solution has not fit all my use cases.

I believe my various use cases can mostly fit into these categories:

  • Executing simple tasks with unpredictable execution time so that the execution cannot block all of the valuable Zope worker threads serving HTTP requests (the number of threads is fixed in Zope, because ZODB connection caches cannot be shared between simultaneous requests and one can afford only so much server memory per site).

    Examples: communicating to external services, loading an external RSS feed, ...

  • Queueing a lot of background tasks to be executed now or later, because the possible results can be delivered asynchronously (e.g. the user can return to see them later, can get notified about finished tasks, etc.), or when it would be beneficial to distribute the work between multiple Zope worker instances.

    Examples: converting files, encoding videos, burning PDFs, sending a lot of emails, ...

  • Communicating with external services.

    Examples: integration between sites or different systems, synchronizing content between sites, performing migrations, ...

For further reading about all the possible issues when queueing asynchronous tasks, I'd recommend Wichert Akkerman's blog post about task queues.

So, here's the summary, from my simplest solution to enterprise messaging with RabbitMQ:

ZPublisher stream iterator workers

class MyView(BrowserView):

    def __call__(self):
        return AsyncWorkerStreamIterator(some_callable, self.request)

I've already blogged earlier in detail about how to abuse ZPublisher's stream iterator interface to free the current Zope worker thread and process the current response outside the Zope worker threads before letting the response continue its way towards the requesting client (browser).

An example of this trick is yet another zip-export add-on, collective.jazzport. It exports Plone folders as zip files by downloading all the to-be-zipped files separately, simply through ZPublisher (or, actually, using the site's public address). It can also download files in parallel to use all the available load-balanced instances. Yet, because it downloads files only after freeing the current Zope worker thread, it should not block any worker thread by itself (see its browser.py and iterators.py).

There are two major limitations to this approach (common to all ZPublisher stream iterators):

  • The code should not access ZODB after the worker thread has been freed (unless a completely new connection with new cache is created).
  • This does not help installations with HAProxy or a similar front-end proxy with a fixed number of allowed simultaneous requests per Zope instance.

Also, of course, this is not real async, because it keeps the client waiting until the request is completed and cannot distribute work between Zope instances.

collective.futures

class MyView(BrowserView):

    def __call__(self):
        try:
            return futures.result('my_unique_key')
        except futures.FutureNotSubmittedError:
            futures.submit('my_unique_key', some_callable, 'foo', 'bar')
            return u'A placeholder value, which is never really returned.'

collective.futures was the next step from the previous approach. It provides a simple API for registering multiple tasks (which do not need to access the ZODB) so that they will be executed outside the current Zope worker thread.

Once all the registered tasks have been executed, the same request will be queued for ZPublisher to be processed again, now with the responses from those registered tasks.

Finally, the response will be returned to the requesting client like with any other request.

collective.futures has the same issues as the previous approach (used in collective.jazzport), and it may also waste resources by processing certain parts of the request twice (like publish traverse).

We use this, for example, for loading external RSS feeds, so that the Zope worker threads are freed to process other requests while we are waiting for the external services to return those feeds.
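
As a sketch of that feed use case (the view and helper names are made up; only the futures calls come from the example above, and feedparser is just assumed to be available), the submitted callable does the blocking network I/O and returns plain data:

import feedparser
from collective import futures
from Products.Five import BrowserView


def fetch_entries(url):
    # Runs outside the Zope worker thread: plain network I/O, no ZODB access.
    return feedparser.parse(url).get('entries', [])


class FeedTitlesView(BrowserView):

    def __call__(self):
        try:
            entries = futures.result('slashdot_feed')
        except futures.FutureNotSubmittedError:
            futures.submit('slashdot_feed', fetch_entries,
                           'http://rss.slashdot.org/slashdot/slashdot')
            return u''  # placeholder; the request is processed again later
        return u'\n'.join(entry.get('title', u'') for entry in entries)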

collective.taskqueue

class MyView(BrowserView):

    def __call__(self):
        taskqueue.add('/Plone/path/to/some/other/view')
        return u'Task queued, and a better view could now display a throbber.'

collective.taskqueue should be a real alternative to plone.app.async and netsight.async. I see it as a simple and opinionated sibling of collective.zamqp, and it should be able to handle all the most basic asynchronous tasks where no other systems are involved.

collective.taskqueue provides one or more named asynchronously consumed task queues, which may contain any number of tasks: asynchronously dispatched simple requests to any traversable resources in Plone.

With out-of-the-box Plone (without any other add-ons or external services) it provides instance-local, volatile, memory-based task queues, which are consumed by the other one of the default two Zope worker threads. With Redis, it supports persistent task queues with guaranteed delivery and distributed consumption. For example, you could have dedicated Plone instances that only consume those shared task queues from Redis.

To not sound too good to be true: collective.taskqueue does not have any kind of monitoring of the task queues out-of-the-box (only an instance-Z2.log entry with the resulting status code is generated for each consumed task).
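
Since a queued task is just a request to a traversable resource, the worker side is nothing more than a regular view that does the slow work when the queued request gets dispatched (a sketch; the view class and what it does are made up):

from logging import getLogger

from Products.Five import BrowserView

logger = getLogger('my.addon')


class SomeOtherView(BrowserView):
    """The view behind /Plone/path/to/some/other/view in the example above."""

    def __call__(self):
        # Executed later in a normal Zope worker thread with full ZODB
        # access, but for a queued request instead of a user-facing one.
        logger.info('Doing the slow work for %s', self.context.absolute_url())
        return u'Done.'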

collective.zamqp

class MyView(BrowserView):

    def __call__(self):
        producer = getUtility(IProducer, name='my.asyncservice')
        producer.register()  # bind to successful transaction
        producer.publish({'title': u'My title'})
        return u'Task queued, and a better view could now display a throbber.'

Finally, collective.zamqp is a very flexible asynchronous framework and RabbitMQ integration for Plone, which I re-wrote from affinitic.zamqp before figuring out any of the previous approaches.

As the story behind it goes, we did use affinitic.zamqp at first, but because of its issues we had to start a rewrite to make it more stable and compatible with newer AMQP specifications. At first I tried to build it on top of Celery, then on top of Kombu (the transport framework behind Celery), but in the end it had to be based directly on top of pika (0.9.4), a popular Python AMQP library. Otherwise it would have been really difficult to benefit from all the possible features of RabbitMQ and be compatible with other than Python-based services.

collective.zamqp is best used for configuring and executing asynchronous messaging between Plone sites, and between Plone sites and other AMQP-connected services. It's also possible to use it to build frontend messaging services (possibly secured using SSL) with RabbitMQ's webstomp server (see the chatbehavior example). Yet, it has a few problems of its own:

  • it depends on five.grok
  • it's way too tightly integrated with pika 0.9.5, which makes upgrading the integration more difficult than necessary (and pika 0.9.5 has a few serious bugs related to synchronous AMQP connections, luckily not required for c.zamqp)
  • it has quite a bit of poorly documented magic in how to use it for all the possible AMQP messaging configurations.

collective.zamqp does not provide monitoring utilities of its own (beyond very detailed logging of messaging events). Yet, the basic monitoring needs can be covered with RabbitMQ's web and console UIs and RESTful APIs, and all decent monitoring tools should have their own RabbitMQ plugins.

For more detailed examples of collective.zamqp, please, see my related StackOverflow answer and our presentation from PloneConf 2012 (more examples are linked from the last slide).

Nix expressions as executable commands

Updated 2014-09-24: I learned that in a mixed (OSX and nixpkgs) environment, one should not set LD_LIBRARY_PATH, but fix dynamic linking to use absolute paths. Yet, I refactored my wrapper to use myEnvFun when required (see the buildout example).

Updated 2014-09-22: I was wrong about how nix-built Python environments could be used together with buildout, and updated this post to reflect my experiences.

My main tools for Python based software development have been virtualenv and buildout for a long time. I've used virtualenv for providing an isolated Python installation (separate from the so often polluted system Python) and buildout for managing the required Python packages, the packages under development, and supporting software (like Redis or memcached).

Basically everything still works, but:

  • Managing clean Python virtualenvs only to avoid possible conflicts with system-installed packages feels like a lot of work for a small return.
  • Remembering to activate and deactivate the correct Python virtualenv is not fun either.
  • Also, while buildout provides an excellent tool (mr.developer) for managing the sources of all the project packages, it's far from optimal for building and managing supporting (non-Python) software.

I've also been using quite a bit of Vagrant and Docker but, because I'm mostly working on a Mac, those require a VM, which makes them much less convenient.

About Nix

I believe I heard about the Nix package manager for the first time from Rok at the Barcelona Plone Testing Sprint in early 2013. It sounded a bit esoteric and complex back then, but after about twenty months of more virtualenvs, buildouts, Vagrantfiles, Docker containers and puppet manifests... not so much anymore.

Currently, outside NixOS, I understand Nix as

  1. a functional language for describing configuration of software and
  2. a package manager for managing those configurations.

From my own experience, the easiest way to get familiar with Nix is to follow Domen's blog post about getting started with the Nix package manager. But to really make it a new tool in your toolbox, you should learn to write your own Nix expressions.

Even though the most common way to use the Nix package manager is to install Nix expressions into your current environment with nix-env, the expressions can also be used without really installing them, in a quite stateless way.

I'm not sure how proper use of Nix this is, but it seems to work for me.

(Yes, I'm aware of myEnvFun, for creating named stateful development environments with Nix expressions, but here I'm trying to use Nix in a more stateless, Docker-inspired way.)

Nix expressions as virtualenv replacements

It's almost never safe to install Python software directly into your system Python. Different software may require different versions of the same libraries, and sooner or later the conflicting requirements break your Python installation.

Let's take a small utility called i18ndude as an example of software with dependencies way too frightening for any system Python. Traditionally, you could install it into a separate Python virtualenv and use it with the following steps:

$ virtualenv ~/.../i18ndude-env
$ source ~/.../i18ndude-env/bin/activate
$ pip install i18ndude
$ i18ndude
...
$ deactivate

With an executable Nix expression, I can call it in a stateless way by simply executing the expression:

$ ./i18ndude.nix
➜ /nix/store/gjhzw843qs1736r0qcd9mz69247g4svb-python2.7-i18ndude-3.3.5/bin/i18ndude
usage: i18ndude [-h]
                {find-untranslated,rebuild-pot,merge,sync,filter,admix,list,trmerge}
                ...
i18ndude: error: too few arguments

Maybe even better, I can install the expression into my default Nix environment with

$ nix-env -i -f i18ndude.nix

and use it as if it had been installed into my system Python in the first place (but this time without polluting it):

$ i18ndude.nix
usage: i18ndude [-h]
                {find-untranslated,rebuild-pot,merge,sync,filter,admix,list,trmerge}
                ...
i18ndude: error: too few arguments

No more activating or deactivating virtualenvs, not to mention needing to remember their names or locations.

For the most common Python software, it's not required to write your own expression; you can simply install the contributed expressions directly from the Nix packages repository.

The easiest way to check for existing expressions in the nixpkgs Python packages seems to be grepping the package list with nix-env -qaP \*|grep something.

If you'd like to see more packages available by default, you can contribute them to upstream with a simple pull request.

Anyway, since i18ndude was not yet available at the time of writing (although most of its dependencies were), this is what my expression for it looked like:

#!/usr/bin/env nix-exec
with import <nixpkgs> { };

let dependencies = rec {
  ordereddict = buildPythonPackage {
    name = "ordereddict-1.1";
    src = fetchurl {
      url = "https://pypi.python.org/packages/source/o/ordereddict/ordereddict-1.1.tar.gz";
      md5 = "a0ed854ee442051b249bfad0f638bbec";
    };
  };
};

in with dependencies; rec {
  i18ndude = buildPythonPackage {
    name = "i18ndude-3.3.5";
    src = fetchurl {
      url = "https://pypi.python.org/packages/source/i/i18ndude/i18ndude-3.3.5.zip";
      md5 = "ef599b1c64eaabba4049fcd2b027ba21";
    };
    propagatedBuildInputs = [
      ordereddict
      python27Packages."zope.tal-3.5.2"
      python27Packages."plone.i18n-2.0.9"
    ];
  };
}

Nix expression for nix-exec shell wrapper

Of course, Nix expressions are not executable by default. To get them to work as I wanted, I had to create a tiny wrapper script to be used in the hash-bang line #!/usr/bin/env nix-exec of executable expressions.

The script simply calls nix-build and then the named executable from the build output directory (with some standard environment variables set). To put it another way, the wrapper script translates the following command:

$ ./i18ndude.nix --help

into

$ `nix-build --no-out-link i18ndude.nix`/bin/i18ndude --help

It's not required to suffix the expression files with .nix; they could also be named without a suffix to look more like real commands.

The wrapper script itself, of course, can be installed from a Nix expression into your default Nix environment with nix-env -i -f filename.nix:

with import <nixpkgs> { };

stdenv.mkDerivation {
  name = "datakurre-nix-exec-1.2.1";

  builder = builtins.toFile "builder.sh" "
    source $stdenv/setup
    mkdir -p $out/bin
    echo \"#!/bin/bash
build=\\`nix-build --no-out-link \\$1\\`
if [ \\$build ]; then

  MY_TZ=\\\"\\$TZ\\\"
  MY_PATH=\\\"\\$build/bin:\\$build/sbin:\\$build/libexec:\\$PATH\\\"
  MY_http_proxy=\\\"\\$http_proxy\\\"
  MY_ftp_proxy=\\\"\\$http_proxy\\\"

  if [ -d \\$build/dev-envs ]; then
    source \\\"\\$build/dev-envs/\\\"*

    export TZ=\\\"\\$MY_TZ\\\"
    export PATH=\\\"\\$PATH:\\$MY_PATH\\\"
    export http_proxy=\\\"\\$MY_http_proxy\\\"
    export ftp_proxy=\\\"\\$MY_ftp_proxy\\\"

    export CFLAGS=\\`echo \\$NIX_CFLAGS_COMPILE|sed 's/-isystem /-I/g'\\`
    export LDFLAGS=\\$NIX_LDFLAGS
  else
    export PATH=\\\"\\$MY_PATH\\\"
  fi

  cmd=\\$\{1##*/\}; cmd=\\$\{cmd%%@*\}; cmd=\\$\{cmd%.nix\}
  paths=(\\\"\\$build/bin\\\" \\\"\\$build/sbin\\\" \\\"\\$build/libexec\\\")
  for path in \\\"\\$\{paths[@]\}\\\"; do
    if [ -f \\\"\\$path/\\$\{cmd\}\\\" ]; then
      cmd=\\\"\\$path/\\$\{cmd\}\\\"
      break
    fi
  done

  if [ -t 1 ]; then echo \\\"➜\\\" \\$cmd \\\"\\$\{@:2\}\\\"; fi
  \\\"\\$cmd\\\" \\\"\\$\{@:2\}\\\"
fi
\" > $out/bin/nix-exec
    chmod a+x $out/bin/nix-exec
  ";
}

The wrapper does not execute the command defined by the expression in a fully clean environment (the only isolation is the one myEnvFun provides), but mostly prepends everything defined by the expression into its surrounding execution environment (so that its paths are preferred over the versions in the current environment).

A mostly positive side effect of using Nix expressions like this (only building them, but not installing them into any environment) is that they can be cleaned from the disk anytime with simply:

$ nix-collect-garbage

Nix expressions with buildout

Update 2014-09-24: The example was updated to use myEnvFun to simplify the wrapper script.

Update 2014-09-22: I originally covered Nix expressions with buildout as an example of replacing Python virtualenvs with Nix. Unfortunately, because of some buildout limitations that didn't work out as I expected...

A very special case of Python development environment is the one with buildout, which is required e.g. for all development with Plone.

When using Nix expressions with buildout, there is one very special limitation: buildout does not support any additional Python packages installed into your Nix expression based environment.

That's because buildout sees the Nix-defined Python as a system Python, and buildout does its best to prevent any extra packages installed into the system Python from being available for the buildout by default.

An additional issue for buildout is that the extra Python packages defined in the Nix expression are not installed directly under the Python installation, but are made available only when that Python is executed through a special Nix generated wrapper.

But to cut this short, here's an example executable Nix expression, which could be used as a Plone-compatible Python environment. It includes a clean Python installation with some additional (non-Python) libraries required by Plone buildout to be able to compile a few special Python packages (like Pillow, lxml and python-ldap):

#!/usr/bin/env nix-exec
with import <nixpkgs> { };

let dependencies = rec {
  buildInputs = [
    cyrus_sasl
    openldap
    libxslt
    libxml2
    freetype
    libpng
    libjpeg
    python27Full
  ];
};

in with dependencies; buildEnv {
  name = "nix";
  paths = [(myEnvFun { name = "nix"; inherit buildInputs; })] ++ buildInputs;
}

With this Nix expression saved as an executable ./python.nix, it can be used to run buildout's bootstrap, buildout and eventually launch the Plone site like:

$ ./python.nix bootstrap.py
$ ./python.nix bin/buildout  # or ./python.nix -S bin/buildout
$ ./python.nix bin/instance fg

I must agree that this is not as convenient as it should be, because each command (bootstrap, buildout and the final buildout generated script) must be executed explicitly through our executable Nix expression defining the required Python environment.

Also, probably because my wrapper does not completely isolate the Nix expression call from its surrounding environment, sometimes it's required to call buildout with -S given for the Python expression, like ./python.nix -S bin/buildout (otherwise buildout does not find its own bootstrapped installation).

On the other hand, this approach defines the execution environment explicitly and statelessly for each call.

P.S. Because I'm working with RHEL systems, it's nice to use a Python configured similarly to theirs. With Nix, it's easy to define local overrides for existing packages (nixpkgs derivations) using a special function with only the required configuration changes. The following ~/.nixpkgs/config.nix example configures Python with a unicode flag similar to RHEL's native Python:

{
  packageOverrides = pkgs : with pkgs; rec {
    python27 = pkgs.python27.overrideDerivation (args: {
      configureFlags = "--enable-shared --with-threads --enable-unicode=ucs4";
    }) // { modules = pkgs.python27.modules; };
  };
}

Nix expressions as stateless development environments

Updated 2014-09-24: Because of OSX, I had to fix the openldap expression to link one library with an absolute path, so that it does not resolve an OSX library instead of the Nix-built one.

In test driven development, the whole development environment can be built just around the selected test runner.

Here's an example Nix expression which, saved as an executable file called ./py.test, can be used to execute the pytest test runner with a couple of selected plugins and all the dependencies required by the software under test:

#!/usr/bin/env nix-exec
with import <nixpkgs> { };

let dependencies = rec {
  execnet = buildPythonPackage {
    name = "execnet-1.2.0";
    src = fetchurl {
      url = "https://pypi.python.org/packages/source/e/execnet/execnet-1.2.0.tar.gz";
      md5 = "1886d12726b912fc2fd05dfccd7e6432";
    };
    doCheck = false;
  };
  pycparser = buildPythonPackage {
    name = "pycparser-2.10";
    src = fetchurl {
      url = "https://pypi.python.org/packages/source/p/pycparser/pycparser-2.10.tar.gz";
      md5 = "d87aed98c8a9f386aa56d365fe4d515f";
    };
  };
  cffi = buildPythonPackage {
    name = "cffi-0.8.6";
    src = fetchurl {
      url = "http://pypi.python.org/packages/source/c/cffi/cffi-0.8.6.tar.gz";
      md5 = "474b5a68299a6f05009171de1dc91be6";
    };
    propagatedBuildInputs = [ pycparser ];
  };
  pytest_cache = buildPythonPackage {
    name = "pytest-cache-1.0";
    src = fetchurl {
      url = "https://pypi.python.org/packages/source/p/pytest-cache/pytest-cache-1.0.tar.gz";
      md5 = "e51ff62fec70a1fd456d975ce47977cd";
    };
    propagatedBuildInputs = [
       python27Packages.pytest
       execnet
    ];
  };
  pytest_flakes = buildPythonPackage {
    name = "pytest-flakes-0.2";
    src = fetchurl {
      url = "https://pypi.python.org/packages/source/p/pytest-flakes/pytest-flakes-0.2.zip";
      md5 = "44b8f9746fcd827de5c02f14b01728c1";
    };
    propagatedBuildInputs = [
       python27Packages.pytest
       python27Packages.pyflakes
       pytest_cache
    ];
  };
  pytest_pep8 = buildPythonPackage {
    name = "pytest-pep8-1.0.6";
    src = fetchurl {
      url = "https://pypi.python.org/packages/source/p/pytest-pep8/pytest-pep8-1.0.6.tar.gz";
      md5 = "3debd0bac8f63532ae70c7351e73e993";
    };
    propagatedBuildInputs = [
      python27Packages.pytest
      python27Packages.pep8
      pytest_cache
    ];
  };
  buildInputs = [
    (python27Packages.pytest.override {
      propagatedBuildInputs = [
        python27Packages.readline
        python27Packages.plumbum
        python27Packages.py
        pytest_flakes
        pytest_pep8
      ];
    })
    (lib.overrideDerivation openldap (args: {
      postBuild = if stdenv.isDarwin then ''
        install_name_tool -change /libsasl2.dylib ${cyrus_sasl}/lib/libsasl2.dylib servers/slapd/slapadd
     '' else null;
    }))
  ];
};

in with dependencies; buildEnv {
  name = "nix";
  paths = [(myEnvFun { name = "nix"; inherit buildInputs; })] ++ buildInputs;
}

In other words, this expression could work as a stateless environment for developing the product in question:

$ ./py.test
➜ /nix/store/a2w3hwc66gqm6bncic8km6b69lw2byc6-py.test/bin/py.test
================================== test session starts ==================================
platform darwin -- Python 2.7.8 -- pytest-2.5.1
plugins: flakes, cache, pep8
collected 2 items

src/.../tests/test_things.py ..
=============================== 2 passed in 0.22 seconds ================================

And, once the development is completed, another expression could be defined for using the developed product.

Nix expression for Robot Framework test runner

Finally, as a bonus, here's an expression, which configures a Python environment with Robot Framework and its Selenium2Library with PhantomJS:

#!/usr/bin/env nix-exec
with import <nixpkgs> { };

let dependencies = rec {
  docutils = buildPythonPackage {
    name = "docutils-0.12";
    src = fetchurl {
      url = "https://pypi.python.org/packages/source/d/docutils/docutils-0.12.tar.gz";
      md5 = "4622263b62c5c771c03502afa3157768";
    };
  };
  selenium = buildPythonPackage {
    name = "selenium-2.43.0";
    src = fetchurl {
      url = "https://pypi.python.org/packages/source/s/selenium/selenium-2.43.0.tar.gz";
      md5 = "bf2b46caa5c1ea4b68434809c695d69b";
    };
  };
  decorator = buildPythonPackage {
    name = "decorator-3.4.0";
    src = fetchurl {
      url = "https://pypi.python.org/packages/source/d/decorator/decorator-3.4.0.tar.gz";
      md5 = "1e8756f719d746e2fc0dd28b41251356";
    };
  };
  robotframework = buildPythonPackage {
    name = "robotframework-2.8.5";
    src = fetchurl {
      url = "https://pypi.python.org/packages/source/r/robotframework/robotframework-2.8.5.tar.gz";
      md5 = "2d2c6938830f71a6aa6f4be32227997f";
    };
    propagatedBuildInputs = [
      docutils
    ];
  };
  robotframework-selenium2library = buildPythonPackage {
    name = "robotframework-selenium2library-1.5.0";
    src = fetchurl {
      url = "https://pypi.python.org/packages/source/r/robotframework-selenium2library/robotframework-selenium2library-1.5.0.tar.gz";
      md5 = "07c64a9e183642edd682c2b79ba2f32c";
    };
    propagatedBuildInputs = [
      robotframework
      decorator
      selenium
    ];
  };
};

in with dependencies; buildEnv {
  name = "pybot";
  paths = [
    phantomjs
    (robotframework.override {
      propagatedBuildInputs = [ robotframework-selenium2library ];
    })
  ];
}

Since you may need differently configured Robot Framework installations (with different add-on keyword libraries installed) for different projects, this should be a good fit as an executable Nix expression:

$ ./pybot.nix
➜ /nix/store/q15bimgng25qcxkq2q10finyk0n6qkm2-pybot/bin/pybot
[ ERROR ] Expected at least 1 argument, got 0.

Try --help for usage information.

Asynchronous stream iterators and experimental promises for Plone

This post may contain traces of legacy Zope2 and Python 2.x.

Some may think that Plone is bad at concurrency, because it's not common to deploy it with WSGI, but to run it on top of a barely known last millennium asynchronous HTTP server called Medusa.

See, the out-of-the-box installation of Plone launches with only a single asynchronous HTTP server with just two fixed long-running worker threads. And it's way too easy to write custom code that keeps those worker threads busy (for example, by writing blocking calls to external services), effectively resulting in denial of service for the rest of the incoming requests.

Well, as far as I know, the real bottleneck is not Medusa, but the way ZODB database connections work. It seems that to optimize the database connection related caches, ZODB is best used with a fixed number of concurrent worker threads and one dedicated database connection per thread. Finally, MVCC in ZODB means that each thread can serve only one request at a time.

In practice, of course, Plone-sites use ZEO-clustering (and replication) to overcome the limitations described above.

Back to the topic (with a disclaimer). The methods described in this blog post have not been battle-tested yet and they may turn out to be bad ideas. Still, it's been fun to figure out how our old asynchronous friend, Medusa, could be used to serve more concurrent requests in certain special cases.

ZPublisher stream iterators

If you have been working with Plone long enough, you must have heard the rumor that blobs, which basically means files and images, are served from the filesystem in some special non-blocking way.

So, when someone downloads a file from Plone, the current worker thread only initiates the download and can then continue to serve the next request. The actual file is left to be served asynchronously by the main thread.

This is possible because of a ZPublisher feature called stream iterators (search for the IStreamIterator interface and its implementations in Zope2 and plone.app.blob). Stream iterators are basically a way to postpone I/O-bound operations into the main thread's asyncore loop through a special Medusa-level producer object.

And because stream iterators are consumed only within the main thread, they come with some very strict limitations:

  • they are executed only after a completed transaction so they cannot interact with the transaction anymore
  • they must not read from the ZODB (because their origin connection is either closed or in use of their origin worker thread)
  • they must not fail unexpectedly, because you don't want to crash the main thread
  • they must not block the main thread, for obvious reasons.

Because of these limitations, stream iterators as such are usable only for the purpose they were made for: streaming files or similar immediately available buffers.
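
To make the interface concrete: a minimal stream iterator only has to hand out ready buffers from next() and report its total length for the Content-Length header (a simplified sketch in the spirit of the blob iterators, not the actual Zope implementation):

from zope.interface import implements
from ZPublisher.Iterators import IStreamIterator


class SimpleFileStreamIterator(object):
    """Stream an already opened file in chunks from the main thread."""

    implements(IStreamIterator)

    def __init__(self, fileobj, size, streamsize=1 << 16):
        self._file = fileobj
        self._size = size
        self._streamsize = streamsize

    def next(self):
        data = self._file.read(self._streamsize)
        if not data:
            self._file.close()
            raise StopIteration
        return data

    def __len__(self):
        # ZPublisher uses this to set the Content-Length header.
        return self._size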

Asynchronous stream iterators

What if you could use ZPublisher's stream iterator support also for CPU-bound post-processing tasks? Or for post-processing tasks requiring calls to external web services or command-line utilities?

If you have a local Plone instance running somewhere, you can add the following proof-of-concept code and its slow_ok method as a new External Method (also available as a gist):

import StringIO
import threading

from zope.interface import implements
from ZPublisher.Iterators import IStreamIterator
from ZServer.PubCore.ZEvent import Wakeup

from zope.globalrequest import getRequest


class zhttp_channel_async_wrapper(object):
    """Medusa channel wrapper to defer producers until released"""

    def __init__(self, channel):
        # (executed within the current Zope worker thread)
        self._channel = channel

        self._mutex = threading.Lock()
        self._deferred = []
        self._released = False
        self._content_length = 0

    def _push(self, producer, send=1):
        if (isinstance(producer, str)
                and producer.startswith('HTTP/1.1 200 OK')):
            # Fix Content-Length to match the real content length
            # (an alternative would be to use chunked encoding)
            producer = producer.replace(
                'Content-Length: 0\r\n',
                'Content-Length: {0:s}\r\n'.format(str(self._content_length))
            )
        self._channel.push(producer, send)

    def push(self, producer, send=1):
        # (executed within the current Zope worker thread)
        with self._mutex:
            if not self._released:
                self._deferred.append((producer, send))
            else:
                self._push(producer, send)

    def release(self, content_length):
        # (executed within the exclusive async thread)
        self._content_length = content_length
        with self._mutex:
            for producer, send in self._deferred:
                self._push(producer, send)
            self._released = True
        Wakeup()  # wake up the asyncore loop to read our results

    def __getattr__(self, key):
        return getattr(self._channel, key)


class AsyncWorkerStreamIterator(StringIO.StringIO):
    """Stream iterator to publish the results of the given func"""

    implements(IStreamIterator)

    def __init__(self, func, response, streamsize=1 << 16):
        # (executed within the current Zope worker thread)

        # Init buffer
        StringIO.StringIO.__init__(self)
        self._streamsize = streamsize

        # Wrap the Medusa channel to wait for the func results
        self._channel = response.stdout._channel
        self._wrapped_channel = zhttp_channel_async_wrapper(self._channel)
        response.stdout._channel = self._wrapped_channel

        # Set content-length as required by ZPublisher
        response.setHeader('content-length', '0')

        # Fire the given func in a separate thread
        self.thread = threading.Thread(target=func, args=(self.callback,))
        self.thread.start()

    def callback(self, data):
        # (executed within the exclusive async thread)
        self.write(data)
        self.seek(0)
        self._wrapped_channel.release(len(data))

    def next(self):
        # (executed within the main thread)
        if not self.closed:
            data = self.read(self._streamsize)
            if not data:
                self.close()
            else:
                return data
        raise StopIteration

    def __len__(self):
        return len(self.getvalue())


def slow_ok_worker(callback):
    # (executed within the exclusive async thread)
    import time
    time.sleep(1)
    callback('OK')


def slow_ok():
    """The publishable example method"""
    # (executed within the current Zope worker thread)
    request = getRequest()
    return AsyncWorkerStreamIterator(slow_ok_worker, request.response)

The above code example simulates trivial post-processing with time.sleep, but the same approach should apply to anything from building a PDF out of extracted data to calling an external web service before returning the final response.

An out-of-the-box Plone instance can handle only two (2) concurrent calls to a method that takes one (1) second to complete.
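
As a blocking External Method, that baseline would look roughly like the following (my own sketch for comparison; slow_ok_sync is just a made-up name):

import time


def slow_ok_sync():
    """Publishable example method, which blocks its worker thread"""
    # The worker thread is reserved for the whole second, so only as
    # many concurrent calls as there are worker threads can be served
    time.sleep(1)
    return 'OK'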

With the asynchronous stream iterator above, however, the post-processing is delegated to a completely new thread, freeing the Zope worker thread to continue handling the next request. Because of that, we can get much, much better concurrency:

$ ab -c 100 -n 100 http://localhost:8080/Plone/slow_ok
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient).....done

Server Software:        Zope/(2.13.22,
Server Hostname:        localhost
Server Port:            8080

Document Path:          /Plone/slow_ok
Document Length:        2 bytes

Concurrency Level:      100
Time taken for tests:   1.364 seconds
Complete requests:      100
Failed requests:        0
Write errors:           0
Total transferred:      15400 bytes
HTML transferred:       200 bytes
Requests per second:    73.32 [#/sec] (mean)
Time per request:       1363.864 [ms] (mean)
Time per request:       13.639 [ms] (mean, across all concurrent requests)
Transfer rate:          11.03 [Kbytes/sec] received

Connection Times (ms)
               min  mean[+/-sd] median   max
Connect:        1    2   0.6      2       3
Processing:  1012 1196  99.2   1202    1359
Waiting:     1011 1196  99.3   1202    1359
Total:       1015 1199  98.6   1204    1361

Percentage of the requests served within a certain time (ms)
  50%   1204
  66%   1256
  75%   1283
  80%   1301
  90%   1331
  95%   1350
  98%   1357
  99%   1361
  100%   1361 (longest request)

Of course, most of the stream iterator limits still apply: an asynchronous stream iterator must not access the database, which limits the possible use cases a lot. For the same reason, plone.transformchain is also effectively skipped (no Diazo or Blocks), so this is usable only for non-HTML responses.

experimental.promises

To go even further with experimenting, what if you could do similar non-blocking asynchronous processing in the middle of a request? For example, to free the current Zope worker thread while fetching a missing or outdated RSS feed in a separate thread, and only then continue rendering the final response.

An interesting side effect of using stream iterators is that they allow you to inject code into the main thread's asyncore loop. And once you are there, it's even possible to queue a completely new request for ZPublisher to handle.

So, how would the following approach sound:

  • let add-on code annotate requests with promises for fetching the required data (each promise would be a standalone function, which could be executed under the asynchronous stream iterator rules and, when called, would resolve into a value, effectively the future of the promise), for example (see the sketch of IPromises and IFutures after this list):

    @property
    def content(self):
        if 'my_unique_key' in IFutures(self.request):
            return IFutures(self.request)['my_unique_key']
        else:
            IPromises(self.request)['my_unique_key'] = my_promise_func
            return u''
    
  • when promises are found, the response is turned into an asynchronous stream iterator, which would then execute all the promises in parallel threads and collect the resolved values, the futures:

    def transformIterable(self, result, encoding):
        if IPromises(self.request):
            return PromiseWorkerStreamIterator(
                IPromises(self.request), self.request, self.request.response)
        else:
            return None
    
  • finally, we'd wrap the current Medusa channel so that, instead of publishing any data yet, a cloned request is queued for ZPublisher (similarly to how retries are done after conflict errors), with the cloned request annotated to carry the resolved futures:

    def next(self):
        if self._futures:
            IFutures(self._zrequest).update(self._futures)
            self._futures = {}  # mark consumed to raise StopIteration

            from ZServer.PubCore import handle
            handle('Zope2', self._zrequest, self._zrequest.response)
        else:
            raise StopIteration
    
  • now the add-on code in question would find the futures in the request, would not issue any promises anymore, and the request would result in a normal response pushed all the way to the browser that initiated the original request.
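
To make the snippets above more concrete, here's my own simplified guess at what the IPromises and IFutures helpers could look like; the actual experimental.promises implementation most likely defines them as proper interfaces looked up through the adapter registry:

def IPromises(request):
    """Hypothetical helper: mapping of key -> promise function,
    collected while rendering the first pass of the request"""
    if not hasattr(request, '_promises'):
        request._promises = {}
    return request._promises


def IFutures(request):
    """Hypothetical helper: mapping of key -> resolved value,
    available when the cloned request is published again"""
    if not hasattr(request, '_futures'):
        request._futures = {}
    return request._futures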

I'm not sure yet how good or bad an idea this would be, but I've been tinkering with a proof-of-concept implementation called experimental.promises to figure it out.

Of course, there are limits and issues to be aware of. Handling the same request twice is not free, which makes the approach effective only when some significant processing can be moved outside the worker threads. Also, because there may be other requests between the first and the second pass (freeing the worker to handle other requests is the whole point), the database may change between the passes (kind of breaking the MVCC promise). Finally, it's currently possible to write code that always sets new promises and ends up in a never-ending loop.

Anyway, if you are interested in trying out these approaches (at your own risk, of course), feel free to ask more via Twitter or IRC.