Building Docker containers from scratch using Nix

Nix makes it reasonable to build Docker containers from scratch. The resulting containers are still big (yet I heard there's ongoing work to make Nix builds more lean), but at least you don't need to think about choosing and keeping the base images up to date.

What follows is an example of how to make a Docker image for Plone with Nix.

Creating Nix expression with collective.recipe.nix

First, we need a Nix expression for Plone. Here I use one built with my buildout-based generator, collective.recipe.nix. It generates a few expressions, including plone.nix and plone-env.nix. The first one is only really usable with nix-shell, but the other one can be used for building a standalone Plone for a Docker image.

To create ./plone-env.nix, I need a buildout environment in ./default.nix:

with import <nixpkgs> {}; {
  myEnv = stdenv.mkDerivation {
    name = "myEnv";
    buildInputs = [
      pythonPackages.buildout
    ];
    shellHook = ''
      export SSL_CERT_FILE=~/.nix-profile/etc/ca-bundle.crt
    '';
  };
}

And a minimal Plone buildout using my recipe in ./buildout.cfg:

[buildout]
extends = https://dist.plone.org/release/4-latest/versions.cfg
parts = plone
versions = versions

[instance]
recipe = plone.recipe.zope2instance
eggs = Plone
user = admin:admin

[plone]
recipe = collective.recipe.nix
eggs =
    ${instance:eggs}
    plone.recipe.zope2instance

[versions]
zc.buildout =
setuptools =

And finally produce both plone.nix and the required plone-env.nix with:

$ nix-shell --run buildout

Creating Docker container with Nix Docker buildpack

Next up is building the container from our Nix expression with the help of a builder container, which I call the Nix Docker buildpack.

First, we need to clone it:

$ git clone https://github.com/datakurre/nix-build-pack-docker
$ cd nix-build-pack-docker

And build the builder:

$ cd builder
$ docker build -t nix-build-pack --rm=true --force-rm=true --no-cache=true .
$ cd ..

Now the builder can be used to build a tarball, which contains only the built Nix derivation of Plone. Let's copy the created plone-env.nix into the current working directory and run:

$ docker run --rm -v `pwd`:/opt nix-build-pack /opt/plone-env.nix

After a while, that directory should contain a file called plone-env.nix.tar.gz, which contains only two directories at its root: /nix for the built derivation and /app for easy-access symlinks, like /app/bin/python.
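To double-check what ended up in the tarball before building an image from it, a few lines of Python with the standard library tarfile module are enough. This is only a sketch: the helper name top_level_entries is made up here, and the expectation of exactly nix and app at the root comes from the description above rather than from the buildpack's code.

```python
import tarfile


def top_level_entries(path):
    """Return the sorted top-level names found in a (possibly gzipped) tarball."""
    names = set()
    with tarfile.open(path, "r:*") as tar:  # "r:*" auto-detects compression
        for member in tar.getmembers():
            # Normalize "./nix/store/..." or "nix/store/..." into "nix"
            parts = [p for p in member.name.split("/") if p not in ("", ".")]
            if parts:
                names.add(parts[0])
    return sorted(names)
```

For example, top_level_entries("plone-env.nix.tar.gz") should list just app and nix if the build went as described.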

Now we need ./Dockerfile for building the final Plone image:

FROM scratch
ADD plone-env.nix.tar.gz /
EXPOSE 8080
USER 1000
ENTRYPOINT ["/app/bin/python"]

And finally, a Plone image can be built with:

$ docker build -t plone --rm=true --force-rm=true --no-cache=true .

Running Nix-built Plone container

To run Plone in a container with the image built above, we still need the configuration for Plone. We can use the normal buildout-generated configuration, but we need to:

  1. remove site.py from parts/instance.
  2. fix the paths in parts/instance/etc/zope.conf to match the mounted paths in the Docker container (/opt/...)
  3. create a temporary directory (./tmp) to be mounted into the container
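Fixing the paths in zope.conf is easy to script. The following Python sketch assumes that the zope.conf read by instance.py (parts/instance/etc/zope.conf) contains absolute paths under the host's buildout directory, and simply swaps that prefix for /opt, where the directory is mounted in the container. Both helper names are hypothetical.

```python
from pathlib import Path


def rewrite_paths(text, host_root, container_root="/opt"):
    """Replace the host buildout directory prefix with the container mount point."""
    host_root = host_root.rstrip("/")
    container_root = container_root.rstrip("/")
    return text.replace(host_root + "/", container_root + "/")


def fix_zope_conf(buildout_dir, container_root="/opt"):
    """Rewrite parts/instance/etc/zope.conf in place (assumes absolute host paths)."""
    buildout_dir = Path(buildout_dir).resolve()
    conf = buildout_dir / "parts" / "instance" / "etc" / "zope.conf"
    text = conf.read_text()
    conf.write_text(rewrite_paths(text, str(buildout_dir), container_root))
```

Running fix_zope_conf(".") from the buildout directory would rewrite the file once; keep a copy if you still want to run the instance outside Docker.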

Also, we need a small wrapper to call the Plone instance script, ./instance.py, because we cannot use the buildout generated one:

import sys
import plone.recipe.zope2instance.ctl

sys.exit(plone.recipe.zope2instance.ctl.main(
    ['-C', '/opt/parts/instance/etc/zope.conf']
    + sys.argv[1:]
))

When these are in place, we should now be able to run Plone in a Docker container from within the buildout directory with:

$ docker run --rm -v `pwd`:/opt -v `pwd`/tmp:/tmp -P plone /opt/instance.py fg

The current working directory is mapped to /opt and a temporary directory is mapped to /tmp (because our image doesn't even contain a /tmp).

Note: When I tried this out, for some reason (possibly because of the VirtualBox mount used by boot2docker), I had to remove ./var/filestorage/Data.fs.tmp between runs, or I got errors on ZODB writes.

Creating Nix-expressions with buildout

The greatest blocker for using Nix with complex Python projects like Plone, I think, is the work needed to make all the required Python packages (usually in very specific versions) available as Nix expressions. In the most extreme case, that would require every version of every package on PyPI to be in nixpkgs.

Announcing collective.recipe.nix

collective.recipe.nix is my attempt at generating Nix expressions for arbitrary Python projects. It's an experimental buildout recipe, which re-uses zc.recipe.egg for figuring out all the required packages and their dependencies.

Example of usage

First, bootstrap your environment by defining Python with buildout in ./default.nix:

with import <nixpkgs> {}; {
  myEnv = stdenv.mkDerivation {
    name = "myEnv";
    buildInputs = [
      pythonPackages.buildout
    ];
    shellHook = ''
      export SSL_CERT_FILE=~/.nix-profile/etc/ca-bundle.crt
    '';
  };
}

And an example ./buildout.cfg:

[buildout]
parts =
    i18ndude
    releaser
    robot
    sphinx

[i18ndude]
recipe = collective.recipe.nix
eggs = i18ndude

[releaser]
recipe = collective.recipe.nix
eggs = zest.releaser[recommended]

[robot]
recipe = collective.recipe.nix
eggs = robotframework
propagated-build-inputs =
    robotframework=robotframework-debuglibrary
    robotframework=robotframework-selenium2library
    robotframework=robotframework-selenium2screenshots

[sphinx]
recipe = collective.recipe.nix
eggs = sphinx
propagated-build-inputs =
    sphinx=sphinxcontrib_robotframework[docs]

Run the buildout:

$ nix-shell --run buildout

The recipe generates three kinds of expressions:

  • default [name].nix usable with nix-shell
  • buildEnv based [name]-env.nix usable with nix-build
  • buildPythonPackage based [name]-package.nix usable with nix-env -i -f

So, now you should be able to run zest.releaser with:

$ nix-shell releaser.nix --run fullrelease

You could also build a Nix environment with symlinks into the folder ./releaser (or for a Docker image) with:

$ nix-build releaser-env.nix -o releaser

Finally, you could install zest.releaser into your current Nix-profile with:

$ nix-env -i -f releaser-zest_releaser.nix

Stateless Nix environments revisited

It's been almost a year since I first tried to bend the Nix package manager to fit my own workflows. I disliked the recommended way of describing Nix environments in global configuration and activating and deactivating them in a stateful way. Back then, I worked around that by defining a wrapper to make local Nix expressions into callable executables.

Consider that deprecated.

Nix 1.9 introduced shebang support for using Nix-built interpreters in callable scripts. This alone is a major new feature and solves most of my use cases, where I wanted to define the required Nix dependencies locally, as close to their usage as possible.
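For illustration, a Python script can declare its Nix-built interpreter with the two-line nix-shell shebang. In this minimal sketch the package list (just python) and the script body are assumptions; Python itself treats the second #! line as an ordinary comment, so the file also runs with any regular interpreter.

```python
#! /usr/bin/env nix-shell
#! nix-shell -i python -p python
# With the lines above, ./script.py is run by nix-shell: it provides the
# packages listed after -p and then interprets the file with python (-i).
import sys


def greeting():
    """Report which interpreter ended up running this script."""
    return "Hello from " + sys.executable


if __name__ == "__main__":
    print(greeting())
```

Adding more names after -p (for example Python libraries from pythonPackages) keeps the dependencies declared right next to the code that uses them.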

Still, I have one more use case: for example, I want to run make in an environment that has locally defined Nix-built dependencies. Because make in this particular example produces just a static PDF file, it does not make sense to turn that project into a Nix derivation itself. (Nor does it make much sense to make the Makefile an executable.)

Of course, I start by defining my dependencies in a Nix derivation, and to make that more convenient with nix-shell, I save it into a file called ./default.nix:

with import <nixpkgs> {}; {
  myEnv = stdenv.mkDerivation {
    name = "myEnv";
    buildInputs = [
      (texLiveAggregationFun { paths = [
        texLive
        texLiveAuctex
        texLiveExtra
        texLivePGF
      ];})
      (rWrapper.override { packages = with rPackages; [
        tikzDevice
      ];})
      dot2tex
      gnumake
      graphviz
      pythonPackages.dateutil
      pythonPackages.matplotlib
      pythonPackages.numpy
      pythonPackages.scipy
    ];
  };
}

Now, I can run make in that environment in a stateless manner with:

$ nix-shell --pure --run "make clean all"

Unfortunately, while that works, it's a bit of a long command to type every time.

Initially, I would have preferred to be able to define a local callable script named ./make, which was possible with my old approach. Yet this time I realized that I can achieve almost the same result by defining the following bash function:

function nix() { echo nix-shell --pure --run \"$@\" | sh; }

or with a garbage collection root to avoid re-evaluating the expression on every call:

function nix() {
    if [ ! -e shell.drv ]; then
        nix-instantiate --indirect --add-root $PWD/shell.drv
    fi
    echo nix-shell $PWD/shell.drv --pure --run \"$@\" | sh;
}
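The same idea can be sketched in Python for those who would rather not pipe a string through sh. The helper names nix_shell_argv and nix_run are hypothetical; the sketch quotes each word with shlex.quote, so arguments containing spaces survive the --run string, and it reuses the shell.drv garbage collection root like the second bash function does.

```python
import os
import shlex
import subprocess


def nix_shell_argv(command, drv="shell.drv"):
    """Build the nix-shell argv for running `command` (a list of words)."""
    argv = ["nix-shell"]
    if drv and os.path.exists(drv):
        argv.append(os.path.abspath(drv))  # reuse the instantiated derivation
    # --run takes a single string, so quote each word for the inner shell.
    argv += ["--pure", "--run", " ".join(shlex.quote(word) for word in command)]
    return argv


def nix_run(command, drv="shell.drv"):
    """Instantiate shell.drv once (as in the bash version), then run `command`."""
    if drv and not os.path.exists(drv):
        subprocess.check_call(
            ["nix-instantiate", "--indirect", "--add-root", os.path.abspath(drv)]
        )
    return subprocess.call(nix_shell_argv(command, drv))
```

For example, nix_run(["make", "clean", "all"]) corresponds to the nix make clean all call shown next.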

With this helper in place, I can run any command in the environment defined by the local default.nix simply with:

$ nix make clean all

Customize Plone 5 default theme on the fly

When I recently wrote about how to reintroduce ploneCustom for Plone 5 TTW (through the web) by yourself, I got some feedback that it was the wrong thing to do, and that the correct way would always be to create your own custom theme.

If you are ready to let the precious ploneCustom go, here's how to currently customize the default Barceloneta theme on the fly by creating a new custom theme.

Inherit a new theme from Barceloneta

So, let's customize a brand new Plone 5 site by creating a new theme, which inherits everything from Barceloneta theme, yet allows us to add additional rules and styles:

  1. Open Site Setup and Theming control panel.

  2. Create New theme, not yet activated, with title mytheme (or your own title, once you get the concept)

  3. In the opened theme editor, replace the contents of rules.xml with the following code:

    <?xml version="1.0" encoding="UTF-8"?>
    <rules
        xmlns="http://namespaces.plone.org/diazo"
        xmlns:css="http://namespaces.plone.org/diazo/css"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xi="http://www.w3.org/2001/XInclude">
    
      <!-- Import Barceloneta rules -->
      <xi:include href="++theme++barceloneta/rules.xml" />
    
      <rules css:if-content="#visual-portal-wrapper">
        <!-- Placeholder for your own additional rules -->
      </rules>
    
    </rules>
    
  4. Still in the theme editor, add New file with name styles.less and edit and Save it with the following content:

    /* Import Barceloneta styles */
    @import "++theme++barceloneta/less/barceloneta.plone.less";
    
    /* Customize navbar color */
    @plone-sitenav-bg: pink;
    @plone-sitenav-link-hover-bg: darken(pink, 20%);
    
    /* Customize navbar text color */
    .plone-nav > li > a {
      color: @plone-text-color;
    }
    
    /* Customize search button */
    #searchGadget_form .searchButton {
      /* Re-use mixin from Barceloneta */
      .button-variant(@plone-text-color, pink, @plone-gray-lighter);
    }
    
    /* Inspect Barceloneta theme (and its less-folder) for more... */
    

But before activating the new theme, there's one more manual step to do...

Register and build a new LESS bundle

We just created a new LESS file, which first imports the main Barceloneta LESS file and then adds our own additional styles using some features of the LESS syntax. To actually turn that LESS file into usable CSS (through the browser), we need to register a new bundle for it and build it:

  1. Open Site Setup and Resource Registries control panel.

  2. Add resource with name mytheme and a single CSS/LESS file with path ++theme++mytheme/styles.less to locate the file we just added into our theme:

    http://2.bp.blogspot.com/-cUE7pFkPMhY/VWJsTaQYOhI/AAAAAAAAAps/HaW5g6OCNJY/s1600/resource.png
  3. Save.

  4. Add bundle with name mytheme, requiring the mytheme resource we just created, and with Does your bundle contain any RequireJS or LESS files? checked:

    http://1.bp.blogspot.com/-6sXxYJmR80o/VWJsTXQ86aI/AAAAAAAAApo/IQmHdiaWRrE/s1600/bundle.png
  5. Save.

  6. Build mytheme bundle.

Now you should be ready to return back to Theming control panel, activate the theme, and see the gorgeous pink navigation bar:

http://4.bp.blogspot.com/-PPj1JGOUNDY/VWJsTW6_76I/AAAAAAAAApw/K31MZDUf8-c/s1600/result.png

Note: To really be a good citizen and follow the rules, there are a few additional steps:

  1. Add production-css setting into your theme's manifest.cfg to point to the compiled CSS bundle:

    [theme]
    title = mytheme
    description =
    production-css = /++plone++static/mytheme-compiled.css
    
  2. In Resource Registries, disable mytheme bundle by unchecking its Enabled checkbox and clicking Save.

  3. Deactivate and activate the theme once.

Technically, this changes the CSS bundle to be registered as a so-called Diazo bundle instead of a regular bundle. The difference is that a Diazo bundle is always rendered last and can therefore override any CSS rule introduced by the other enabled bundles. Also, as a Diazo bundle, it gets disabled and enabled properly when the active theme is changed.

ploneCustom for Plone 5

No more custom skins folder with infamous ploneCustom in Plone 5, they said.

Well, they can take away the skins folder, but they cannot take away our ploneCustom. I know that the recommended way of customizing Plone 5 is via a custom theme through the Theming control panel in Site Setup. Still, sometimes you only need to add a few custom rules on top of an existing theme, and creating a completely new theme would feel like overkill.

Meet the new resource registry

One of the many big changes in Plone 5 is the completely new way CSS and JavaScript resources are managed. Plone 5 introduces a completely new Resource Registries control panel and two new concepts to manage CSS and JavaScript there: resources and resource bundles.

A resource is a single CSS/LESS file, a single JavaScript file, or one of each, providing some named feature for Plone 5. For example, a new embedded JavaScript-based applet could be defined as a resource containing both its JavaScript code and the required CSS/LESS stylesheet. Beyond those single files, JavaScript files can depend on named requirejs modules provided by other resources, and LESS files can include any number of other available LESS files. (LESS is a superset of CSS with some optional superpowers like hierarchical directives, variables or optimized includes.)

A resource bundle is a composition of named resources, which is eventually built into a single JavaScript and/or CSS file to be linked from each rendered page. When the page is rendered, bundles are linked (using either script tags or stylesheet link tags) in an order depending on their mutual dependencies. Bundles can be disabled and they can have conditions, so bundles are somewhat comparable to the legacy resource registry registrations in Plone 4 and earlier.
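To illustrate what "an order depending on their mutual dependencies" means, here is a small topological sort (Kahn's algorithm) in Python. It is not Plone's actual implementation: the bundle names are hypothetical, and conditions and disabled bundles are not modeled.

```python
from collections import deque


def bundle_order(bundles):
    """Order bundle names so that every bundle comes after its dependencies.

    `bundles` maps a bundle name to a list of bundle names it depends on.
    Raises ValueError on circular dependencies.
    """
    # Count unresolved dependencies and remember who depends on whom.
    pending = {name: len(deps) for name, deps in bundles.items()}
    dependents = {name: [] for name in bundles}
    for name, deps in bundles.items():
        for dep in deps:
            dependents[dep].append(name)  # KeyError if dep is not registered
    # Start with bundles without dependencies, in a stable (sorted) order.
    ready = deque(sorted(name for name, count in pending.items() if count == 0))
    order = []
    while ready:
        name = ready.popleft()
        order.append(name)
        for child in sorted(dependents[name]):
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)
    if len(order) != len(bundles):
        raise ValueError("circular dependency between bundles")
    return order


# Hypothetical registrations: two bundles depending on the core "plone" bundle.
example = {"plone": [], "plone-logged-in": ["plone"], "ploneCustom": ["plone"]}
```

Here bundle_order(example) returns ['plone', 'plone-logged-in', 'ploneCustom'], so any bundle depending on plone is linked after it.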

http://1.bp.blogspot.com/-hlFLUGS_BKE/VVBfXKQweuI/AAAAAAAAAnQ/vRnVuyvKs_4/s1600/08_bundle_ploneCustom.png

Now that you are familiar with the concepts, you can bring our precious ploneCustom back to life.

Defining the next generation ploneCustom

These steps will define a new ploneCustom bundle, which provides both a custom CSS (with LESS) and a custom JavaScript file to allow arbitrary site customizations without introducing a new theme.

Creating and editing

First, you need to add the actual LESS and JavaScript files. Instead of the deprecated skins custom folder, you can add them into your Plone 5 site by using an old friend, the ZMI (Zope Management Interface).

If you are running a development site, open the following URL: http://localhost:8080/Plone/portal_resources/manage_main

http://3.bp.blogspot.com/-PYwj1yQ1nys/VVB1FeC3xpI/AAAAAAAAAos/le7yakSqO_U/s1600/01_portal_resources.png

This portal_resources is the new database (ZODB) based storage for any kind of custom resources (introduced with the new Theming control panel in Plone 4.3). Its functionality is based on plone.resource, but right now you only need to know how to use it with Plone 5 resource registries.

  1. So, in portal_resources, add a new BTreeFolder2 with name plone:

    http://2.bp.blogspot.com/-lIvOEy0ZdDc/VVB10b7OlPI/AAAAAAAAAo8/8hBPrJpWcBY/s1600/02_portal_resources.png
  2. Then navigate into that folder (select plone and press the Edit button), add another BTreeFolder2 with name custom, and navigate into that folder, so that you are at portal_resources/plone/custom:

    http://4.bp.blogspot.com/-qFUjEt26Qk0/VVCqHDko4eI/AAAAAAAAApU/93SOfj89Dpk/s1600/03_portal_resources.png
  3. Now Add a new File named ploneCustom.js and another named ploneCustom.less:

    http://3.bp.blogspot.com/-fA4tg9R5L0U/VVBfVsmWP8I/AAAAAAAAAm8/jnsW9BZy4ys/s1600/04_portal_resources.png
  4. And, finally, you can navigate into those files (select and press Edit button) to edit and save them with your CSS and JavaScript:

    http://2.bp.blogspot.com/-EMwY36pL8jk/VVBfWRvazhI/AAAAAAAAAnA/p3gFbqRDGZo/s1600/06_portal_resources.png

    This example JavaScript would only annoy you, just to show that it works:

    jQuery(function($) {
        alert("Hello World!");
    });
    
    http://1.bp.blogspot.com/-atBQWKrV6g4/VVBfWAilnXI/AAAAAAAAAnI/O7icbR3b34o/s1600/05_portal_resources.png

    This example CSS would replace the portal logo with custom text:

    #portal-logo:before {
      display: inline-block;
      content: "My Plone Site";
      font-size: 300%;
    }
    #portal-logo img {
      display: none;
    }
    

    In addition, you could add a little extra to learn more. The following lines re-use button classes from the Bootstrap 3 resources shipped with Plone 5 (beta). This is an example of how to use LESS to cherry-pick just a few special CSS rules from the Bootstrap 3 framework and apply them next to the currently active theme:

    @import (reference) "../++plone++static/components/bootstrap/less/bootstrap.less";
    #searchGadget_form .searchButton {
      &:extend(.btn);
      &:extend(.btn-success);
    }
    

Registering and enabling

To register the resource and add it into a bundle (or create a new one), go to Resource Registries control panel (e.g. at http://localhost:8080/@@resourceregistry-controlpanel). Click Add resource to show the add resource form and fill it like in the screenshot below:

http://2.bp.blogspot.com/-d91GJ1BmojY/VVBfW_h0I2I/AAAAAAAAAnM/AMbzV3kU1L0/s1600/07_resource_ploneCustom.png

Note that the strings ++plone++custom/ploneCustom.js and ++plone++custom/ploneCustom.less are actually relative (public) URLs for the resources you just added into portal_resources.
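The correspondence between those ++plone++custom/... URLs and the folders created in portal_resources can be sketched as a simple string mapping. This is a simplified illustration of the convention used in this post (++type++name/path resolving to portal_resources/type/name/path), not plone.resource's actual traversal code; the helper name is hypothetical.

```python
import re

# "++<type>++<name>/<path>", e.g. ++plone++custom/ploneCustom.js
RESOURCE_URL = re.compile(r"^\+\+(?P<type>[^+/]+)\+\+(?P<name>[^/]+)(?:/(?P<path>.*))?$")


def portal_resources_path(url):
    """Translate a ++type++name/path URL into a portal_resources folder path."""
    match = RESOURCE_URL.match(url)
    if match is None:
        raise ValueError("not a ++type++name resource URL: %r" % url)
    parts = ["portal_resources", match.group("type"), match.group("name")]
    if match.group("path"):
        parts.append(match.group("path"))
    return "/".join(parts)
```

So ++plone++custom/ploneCustom.less points at the portal_resources/plone/custom/ploneCustom.less file added earlier, which is why both BTreeFolder2 folders were needed.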

After saving the resource by clicking Save, click Add bundle to create a new bundle for your ploneCustom resource. Fill in the opened form as follows:

http://1.bp.blogspot.com/-hlFLUGS_BKE/VVBfXKQweuI/AAAAAAAAAnQ/vRnVuyvKs_4/s1600/08_bundle_ploneCustom.png

Note that the bundle depends on the Plone bundle. That makes it load only after the Plone bundle, which includes jQuery, which our custom JavaScript code depends on. (Later you may wonder why jQuery was not required with requirejs. That would also work and is recommended for other libraries, but currently you can rely on jQuery being globally available after the Plone bundle has been loaded.)

When you have saved the new ploneCustom resource bundle, it will appear in the Bundles list on the left. The final step is to click the Build button below the ploneCustom bundle label in that list. That will open a popup modal showing the build progress.

http://4.bp.blogspot.com/-2VcdXwU9So0/VVBfXe21ZLI/AAAAAAAAAoQ/1YCnr6OrlDo/s1600/09_build_ploneCustom.png

Once the build is done, you can click Close and reload the page to see your new ploneCustom bundle being applied for your site:

http://3.bp.blogspot.com/-eIJ3ZhE-qqE/VVBfX2nylqI/AAAAAAAAAnY/rW6RIeUcntk/s1600/10_ploneCustom.png

Note how the Plone logo has been replaced with custom text and the Search button has been styled after Bootstrap 3 button styles. (Also, you should now have seen an annoying alert popup from your ploneCustom JavaScript.)

To modify your ploneCustom bundle, just edit the files and return to the Resource Registries control panel to click the Build button again.

Now you have your ploneCustom back in Plone 5. Congratulations!

P.S. Don't forget that you can also tweak (at least the default) Plone theme a lot from the Resource Registries control panel without ploneCustom bundle simply by changing theme's LESS variables and building Plone bundle.

EXTRA: TTW ReactJS App in Plone

The new Resource Registries may feel complex to begin with, but once you get used to them, they are a blessing. Just define dependencies properly, and you never again need to order Plone CSS and JavaScript resources manually; and never again (well, once add-ons get updated to this new configuration) should add-ons break your site by re-registering resources in a broken order.

As an example, let's implement a ReactJS Hello World for Plone TTW using the new resource registry:

First, you need to register the ReactJS library as a resource. You could upload the library into portal_resources, but for a quick experiment you can also refer to a cloud-hosted version (https://fb.me/react-0.13.3.js). So, go to the Resource Registries control panel and Add resource with the following details:

http://1.bp.blogspot.com/-tUd-UQ7KCws/VVBfYBYAlBI/AAAAAAAAAns/nz5T8qHEwvI/s1600/11_resource_reactjs.png

Note how the library is defined to be wrapped for requirejs with the name react013. (Plone 5 actually ships with the ReactJS library, but because the version in the first beta is just 0.10, we need to add a newer version with a version-specific name.)

Next, go to portal_resources/plone/custom/manage_main as before and add a new file called reactApp.js with the following ReactJS Hello World as its contents:

define([
  'react013',
], function(React) {

'use strict';

var ExampleApplication = React.createClass({
  render: function() {
    var elapsed = Math.round(this.props.elapsed  / 100);
    var seconds = elapsed / 10 + (elapsed % 10 ? '' : '.0' );
    var message = 'React has been successfully running for ' + seconds + ' seconds.';
    return React.createElement("p", null, message);
  }
});

var start = new Date().getTime();

setInterval(function() {
  React.render(
    React.createElement(ExampleApplication, {elapsed: new Date().getTime() - start}),
    document.getElementById('portal-logo')
  );
}, 50);

return ExampleApplication;

});

jQuery(function($) {
  require(['reactApp']);
});

Note how ReactJS is required as react013, and how the example application is required as reactApp at the bottom (using the jQuery document-ready convention).

Of course, reactApp must also be defined as a new resource in the Resource Registries control panel. It should depend on the previously added resource react013 (which is wrapped for requirejs) and export itself for requirejs as reactApp:

http://4.bp.blogspot.com/-6-0GxcKsJro/VVBfZXZBv7I/AAAAAAAAAn0/FRx_z_NSWd0/s1600/13_resource_reactApp.png

Finally, you can Add bundle for this example reactApp:

http://4.bp.blogspot.com/-oP5-me9bnVM/VVBfYbKdgBI/AAAAAAAAAnk/bxv6UK82H6k/s1600/12_bundle_reactApp.png

And after Save, Build the bundle from the button below the new bundle name in Bundles list:

http://2.bp.blogspot.com/-zGc9aH7HD68/VVBfZo4ZOBI/AAAAAAAAAoA/7NmP9kYmT_4/s1600/14_build_reactApp.png

Note that, because the cloud-hosted ReactJS library was used, the new bundle contains only the code from reactApp.js, and requirejs will load ReactJS from the cloud on demand. If you had added the library into portal_resources, it would have been included in the resulting bundle.

After page reload, your ReactJS Hello World should be alive:

http://1.bp.blogspot.com/-x6gPspdZdro/VVBfZ9IL1AI/AAAAAAAAAn8/8jO9TWbdAkY/s1600/15_reactApp.png

Transmogrifier, the Python migration pipeline, also for Python 3

TL;DR: I forked collective.transmogrifier into just transmogrifier (not yet released) to make its core usable without Plone dependencies, use Chameleon for TAL expressions, be installable with just pip install, and be compatible with Python 3.

Transmogrifier is one of the many great developer tools by the Plone community. It's a generic pipeline tool for data manipulation, configurable with plain text INI-files, while new re-usable pipeline section blueprints can be implemented and packaged in Python. It could be used to process any number of things, but historically it's been mainly developed and used as a pluggable way to import legacy content into Plone.

A simple transmogrifier pipeline for dumping news from Slashdot to a CSV file could look like:

[transmogrifier]
pipeline =
    from_rss
    to_csv

[from_rss]
blueprint = transmogrifier.from
modules = feedparser
expression = python:modules['feedparser'].parse(options['url']).get('entries', [])
url = http://rss.slashdot.org/slashdot/slashdot

[to_csv]
blueprint = transmogrifier.to_csv
fieldnames =
    title
    link
filename = slashdot.csv
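Conceptually, each pipeline section wraps the iterator of the previous section and yields items (plain dictionaries) onward. The following plain-Python sketch mimics the from_rss and to_csv sections with generators. It is not transmogrifier's implementation, and a hard-coded list of hypothetical entries stands in for feedparser so that it runs offline.

```python
import csv
import io


def from_entries(previous, entries):
    """Source section: pass earlier items through, then emit new ones."""
    for item in previous:
        yield item
    for entry in entries:
        yield dict(entry)


def to_csv(previous, stream, fieldnames):
    """Writer section: write the selected fields as CSV and pass items on."""
    writer = csv.DictWriter(stream, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    for item in previous:
        writer.writerow(item)
        yield item


def run(pipeline):
    """Drain the last section; nothing happens until items are pulled through."""
    return sum(1 for _ in pipeline)


# Hypothetical stand-in for feedparser entries.
entries = [
    {"title": "First story", "link": "http://example.com/1", "summary": "Lorem"},
    {"title": "Second story", "link": "http://example.com/2", "summary": "Ipsum"},
]
out = io.StringIO()
count = run(to_csv(from_entries(iter(()), entries), out, ["title", "link"]))
```

Because every section is lazy, adding a step in the middle of the pipeline is just another generator wrapped around the previous one, which is also what makes it easy to split work between developers.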

Actually, at the time of writing, I've yet to do any Plone migrations using transmogrifier. But when we recently had a reasonably sized non-Plone migration task, I knew not to re-invent the wheel, but to transmogrify it. And we succeeded. The transmogrifier pipeline helped us to design the migration better, and splitting data processing into multiple pipeline sections helped us to delegate the work between multiple developers.

Unfortunately, collective.transmogrifier currently has unnecessary dependencies on CMFCore, is not installable without a long known-good set of versions, and lacks any built-in command-line interface. At first, I tried to do all the necessary refactoring inside collective.transmogrifier, but eventually a fork was required to make the transmogrifier core usable outside Plone environments and compatible with Python 3, without breaking any existing workflows that depend on the old transmogrifier.

So, meet the new transmogrifier:

  • can be installed with pip install (although, not yet released at PyPI)
  • new mr.migrator-inspired command-line interface (see transmogrify --help for all the options)
  • new base classes for custom blueprints
    • transmogrifier.blueprints.Blueprint
    • transmogrifier.blueprints.ConditionalBlueprint
  • new ZCML-directives for registering blueprints and re-usable pipelines
    • <transmogrifier:blueprint component="" name="" />
    • <transmogrifier:pipeline id="" name="" description="" configuration="" />
  • uses Chameleon for TAL-expressions (e.g. in ConditionalBlueprint)
  • has only a few generic built-in blueprints
  • supports z3c.autoinclude for package transmogrifier
  • fully backwards compatible with blueprints for collective.transmogrifier
  • runs with Python >= 2.6, including Python 3+

There's still much work to do before a real release (e.g. documenting and testing the new CLI-script and new built-in blueprints), but let's still see how it works already...

P.S. Please, use a clean Python virtualenv for these examples.

Example pipeline

Let's start with an easy installation

$ pip install git+https://github.com/datakurre/transmogrifier
$ transmogrify --help
Usage: transmogrify <pipelines_and_overrides>...
                [--overrides=<overrides.cfg>]
                [--include=<package_or_module>...]
                [--include=<package:filename>...]
                [--context=<package.module.factory>]
   transmogrify --list
                [--include=<package_or_module>...]
   transmogrify --show=<pipeline>
                [--include=<package_or_module>...]

and an example filesystem pipeline.cfg

[transmogrifier]
pipeline =
    from_rss
    to_csv

[from_rss]
blueprint = transmogrifier.from
modules = feedparser
expression = python:modules['feedparser'].parse(options['url']).get('entries', [])
url = http://rss.slashdot.org/slashdot/slashdot

[to_csv]
blueprint = transmogrifier.to_csv
fieldnames =
    title
    link
filename = slashdot.csv

and its dependencies

$ pip install feedparser

and the results

$ transmogrify pipeline.cfg
INFO:transmogrifier:CSVConstructor:to_csv wrote 25 items to /.../slashdot.csv

using, for example, Python 2.7 or Python 3.4.

Minimal migration project

Let's create an example migration project with custom blueprints using Python 3. In addition to transmogrifier, we need venusianconfiguration for easy blueprint registration and, of course, the actual dependencies for our blueprints:

$ pip install git+https://github.com/datakurre/transmogrifier
$ pip install git+https://github.com/datakurre/venusianconfiguration
$ pip install fake-factory

Now we can implement custom blueprints in, for example, blueprints.py

from venusianconfiguration import configure

from transmogrifier.blueprints import Blueprint
from faker import Faker


@configure.transmogrifier.blueprint.component(name='faker_contacts')
class FakerContacts(Blueprint):
    def __iter__(self):
        for item in self.previous:
            yield item

        amount = int(self.options.get('amount', '0'))
        fake = Faker()

        for i in range(amount):
            yield {
                'name': fake.name(),
                'address': fake.address()
            }

and see them registered next to the built-in ones (or from the other packages hooking into transmogrifier autoinclude entry-point):

$ transmogrify --list --include=blueprints

Available blueprints
--------------------
faker_contacts
...

Now, we can make an example pipeline.cfg

[transmogrifier]
pipeline =
    from_faker
    to_csv

[from_faker]
blueprint = faker_contacts
amount = 2

[to_csv]
blueprint = transmogrifier.to_csv

and enjoy the results

$ transmogrify pipeline.cfg to_csv:filename=- --include=blueprints
address,name
"534 Hintz Inlet Apt. 804
Schneiderchester, MI 55300",Dr. Garland Wyman
"44608 Volkman Islands
Maryleefurt, AK 42163",Mrs. Franc Price DVM
INFO:transmogrifier:CSVConstructor:to_csv saved 2 items to -
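The quoted, multi-line address values above are standard CSV: a field containing a line break has to be quoted. Python's own csv module behaves the same way, as this small sketch shows. Using sorted keys as the header (to get the address,name order seen above) is an assumption made for the example, not documented transmogrifier behavior, and the helper name is hypothetical.

```python
import csv
import io


def items_to_csv(items):
    """Serialize dict items to CSV text, using the sorted keys as the header."""
    fieldnames = sorted({key for item in items for key in item})
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(items)
    return out.getvalue()
```

Any CSV reader that understands quoting, such as csv.DictReader, will read such a multi-line address back as a single field.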

An alternative would be to just use the shipped mr.bob-template...

Migration project using the template

The new transmogrifier ships with an easy getting started template for your custom migration project. To use the template, you need a Python environment with mr.bob and the new transmogrifier:

$ pip install mr.bob readline  # readline is an implicit mr.bob dependency
$ pip install git+https://github.com/datakurre/transmogrifier

Then you can create a new project directory with:

$ mrbob bobtemplates.transmogrifier:project

Once the new project directory is created, you can install the rest of the dependencies and activate the project from inside the directory with:

$ pip install -r requirements.txt
$ python setup.py develop

Now transmogrify knows your project's custom blueprints and pipelines:

$ transmogrify --list

Available blueprints
--------------------
myprojectname.mock_contacts
...

Available pipelines
-------------------
myprojectname_example
    Example: Generates uppercase mock addresses

And the example pipeline can be executed with:

$ transmogrify myprojectname_example
name,address
ISSAC KOSS I,"PSC 8465, BOX 1625
APO AE 97751"
TESS FAHEY,"PSC 7387, BOX 3736
APO AP 13098-6260"
INFO:transmogrifier:CSVConstructor:to_csv wrote 2 items to -

Please see the created README.rst for how to edit the example blueprints and pipelines and create more.

Mandatory example with Plone

Using the new transmogrifier with Plone should be as simple as adding it into your buildout.cfg next to the old transmogrifier packages:

[buildout]
extends = http://dist.plone.org/release/4.3-latest/versions.cfg
parts = instance plonesite
versions = versions

extensions = mr.developer
sources = sources
auto-checkout = *

[sources]
transmogrifier = git https://github.com/datakurre/transmogrifier

[instance]
recipe = plone.recipe.zope2instance
eggs =
    Plone
    z3c.pt
    transmogrifier
    collective.transmogrifier
    plone.app.transmogrifier
user = admin:admin
zcml = plone.app.transmogrifier

[plonesite]
recipe = collective.recipe.plonesite
site-id = Plone
instance = instance

[versions]
setuptools =
zc.buildout =

Let's also write a fictional migration pipeline, which would create Plone content from Slashdot RSS-feed:

[transmogrifier]
pipeline =
    from_rss
    id
    fields
    folders
    create
    update
    commit

[from_rss]
blueprint = transmogrifier.from
modules = feedparser
expression = python:modules['feedparser'].parse(options['url']).get('entries', [])
url = http://rss.slashdot.org/Slashdot/slashdot

[id]
blueprint = transmogrifier.set
modules = uuid
id = python:str(modules['uuid'].uuid4())

[fields]
blueprint = transmogrifier.set
portal_type = string:Document
text = path:item/summary
_path = string:slashdot/${item['id']}

[folders]
blueprint = collective.transmogrifier.sections.folders

[create]
blueprint = collective.transmogrifier.sections.constructor

[update]
blueprint = plone.app.transmogrifier.atschemaupdater

[commit]
blueprint = transmogrifier.to_expression
modules = transaction
expression = python:modules['transaction'].commit()
mode = items
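Conceptually, each section in such a pipeline consumes the items yielded by the previous section and yields them onward, possibly modified. The plain-Python sketch below mimics only the from_rss → id → fields → commit part of the chain, with hard-coded entries instead of feedparser and a list instead of a real transaction commit. The function names are hypothetical and this is not how transmogrifier implements its blueprints; the [folders], [create] and [update] sections are left out because they need a real Plone site.

```python
import uuid
from typing import Dict, Iterable, Iterator, List

Item = Dict[str, str]


def from_entries(entries: Iterable[Item]) -> Iterator[Item]:
    """Source section, like [from_rss]: yield one item per feed entry."""
    for entry in entries:
        yield dict(entry)


def set_id(items: Iterable[Item]) -> Iterator[Item]:
    """Like [id]: give every item a fresh random UUID."""
    for item in items:
        item["id"] = str(uuid.uuid4())
        yield item


def set_fields(items: Iterable[Item]) -> Iterator[Item]:
    """Like [fields]: derive portal_type, text and _path from the item."""
    for item in items:
        item["portal_type"] = "Document"
        item["text"] = item.get("summary", "")
        item["_path"] = "slashdot/" + item["id"]
        yield item


def commit(items: Iterable[Item], sink: List[Item]) -> Iterator[Item]:
    """Stand-in for [commit]: record each item instead of committing a transaction."""
    for item in items:
        sink.append(item)
        yield item


def run_pipeline(entries: Iterable[Item]) -> List[Item]:
    """Chain the sections and drain the last one; nothing runs before that."""
    committed: List[Item] = []
    for _ in commit(set_fields(set_id(from_entries(entries))), committed):
        pass
    return committed


# Hard-coded entries stand in for what feedparser would return.
items = run_pipeline([
    {"title": "First story", "summary": "Hello"},
    {"title": "Second story", "summary": "World"},
])
print([item["text"] for item in items])  # ['Hello', 'World']
```

Because every section is a generator, items flow through the whole chain one at a time, and nothing happens until the last section is iterated.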

Now, the new CLI script can be used together with bin/instance -O<siteid> run, provided by plone.recipe.zope2instance, so that transmogrifier gets your site as its context simply by calling zope.component.hooks.getSite:

$ bin/instance -OPlone run bin/transmogrify pipeline.cfg --context=zope.component.hooks.getSite

With Plone you should, of course, still use Python 2.7.
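The --context option takes the dotted name of a callable. As a rough sketch of how such an option could be turned into a context object (an assumption about the mechanism, not transmogrify's actual code), the name can be imported and then called without arguments. The helper names resolve_dotted and get_context are hypothetical.

```python
import importlib
from typing import Any, Callable


def resolve_dotted(name: str) -> Callable[..., Any]:
    """Import 'package.module.attribute' and return the attribute."""
    module_name, _, attribute = name.rpartition(".")
    if not module_name:
        raise ValueError("expected a dotted name like 'module.attribute': %r" % name)
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


def get_context(dotted_name: str) -> Any:
    """Resolve the callable and call it without arguments to obtain the context."""
    return resolve_dotted(dotted_name)()


# In the Plone case the equivalent call would be
# get_context("zope.component.hooks.getSite"); for a standalone demo,
# any zero-argument callable from the standard library works.
print(get_context("builtins.dict"))  # {}
```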

Funnelweb example with Plone

Funnelweb is a collection of transmogrifier blueprints and pipelines for scraping any web site into Plone. I heard that its example pipelines are a little outdated, but they make a nice demo anyway.

Let's extend our previous Plone-example with the following funnelweb.cfg buildout to include all the necessary transmogrifier blueprints and the example funnelweb.ttw pipeline:

[buildout]
extends = buildout.cfg

[instance]
eggs +=
    transmogrify.pathsorter
    funnelweb

We also need a small additional pipeline commit.cfg to commit all the changes made by funnelweb.ttw:

[transmogrifier]
pipeline = commit

[commit]
blueprint = transmogrifier.interval
modules = transaction
expression = python:modules['transaction'].commit()

Now, after the buildout has been run, the following command would use the pipelines funnelweb.ttw and commit.cfg to (at least partially) scrape my blog into Plone:

$ bin/instance -OPlone run bin/transmogrify funnelweb.ttw commit.cfg crawler:url=http://datakurre.pandala.org "crawler:ignore=feeds\ncsi.js" --context=zope.component.hooks.getSite

For tuning the import further, the pipelines used could easily be exported to the filesystem, customized, and then executed just like commit.cfg:

$ bin/instance -OPlone run bin/transmogrify --show=funnelweb.ttw > myfunnelweb.cfg

Too many ways to do async tasks with Plone

Triggering asynchronous tasks from Plone is hard, we hear. And that's actually quite surprising, given that, from its very beginning, Plone has been running on top of the first asynchronous web server written in Python, medusa.

Of course, there exist many, too many, different solutions for running asynchronous tasks with Plone:

  • plone.app.async is the only one in the Plone namespace, and probably the most criticized one, because it uses ZODB to persist its task queue;
  • netsight.async, on the other hand, is simpler: it just executes the given task outside the Zope worker pool (but requires its own database connection);
  • finally, if you happen to like Celery, Nathan Van Gheem is working on a simple Celery integration, collective.celery, based on earlier work by David Glick.

To add insult to injury, I've ended up developing more than one additional method myself: I had been warned about plone.app.async, I was hit hard by the opinionated internals of Celery, I was unaware of netsight.async, and no single solution has fit all my use cases.

I believe my various use cases mostly fit into these categories:

  • Executing simple tasks with unpredictable execution time so that their execution cannot block all of the valuable Zope worker threads serving HTTP requests (the number of threads is fixed in Zope, because ZODB connection caches cannot be shared between simultaneous requests, and one can afford only so much server memory per site).

    Examples: communicating to external services, loading an external RSS feed, ...

  • Queueing a lot of background tasks to be executed now or later, either because the results can be delivered asynchronously (e.g. the user can return to see them later or get notified about finished tasks), or because it helps to distribute the work between multiple Zope worker instances.

    Examples: converting files, encoding videos, burning PDFs, sending a lot of emails, ...

  • Communicating with external services.

    Examples: integration between sites or different systems, synchronizing content between sites, performing migrations, ...

For further reading about all the possible issues when queueing asynchronous tasks, I'd recommend Wichert Akkerman's blog post about task queues.

So, here's the summary, from my simplest solution to enterprise messaging with RabbitMQ:

ZPublisher stream iterator workers

class MyView(BrowserView):

    def __call__(self):
        return AsyncWorkerStreamIterator(some_callable, self.request)

I've already blogged earlier, in detail, about how to abuse ZPublisher's stream iterator interface to free the current Zope worker thread and process the current response outside the Zope worker threads before letting the response continue on its way to the requesting client (browser).

An example of this trick is yet another zip-export add-on, collective.jazzport. It exports Plone folders as zip files by downloading each of the to-be-zipped files separately through ZPublisher (or, actually, through the site's public address). It can also download files in parallel to use all the available load-balanced instances. Yet, because it downloads files only after freeing the current Zope worker thread, it should not block any worker thread by itself (see its browser.py and iterators.py).
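The core of the trick is that the view returns an iterator instead of a finished body, and the expensive work only runs when something iterates over it. The sketch below shows just that deferral in plain Python: a hypothetical DeferredResultIterator runs its callable on first iteration, and a second thread plays the part of the server thread that streams the response. It does not implement ZPublisher's real stream iterator interface, nor does it free an actual Zope worker thread.

```python
import threading
from typing import Callable, Iterator, List


class DeferredResultIterator:
    """Runs `func` lazily, in whichever thread iterates over it."""

    def __init__(self, func: Callable[[], bytes]) -> None:
        self._func = func  # nothing is computed yet

    def __iter__(self) -> Iterator[bytes]:
        # The expensive call happens here, not in __init__.
        yield self._func()


def view() -> DeferredResultIterator:
    """Stand-in for a view's __call__: returns at once without doing the work."""
    def expensive() -> bytes:
        return ("computed in %s" % threading.current_thread().name).encode()
    return DeferredResultIterator(expensive)


def serve(response: DeferredResultIterator, out: List[bytes]) -> None:
    """Stand-in for the server side streaming the body to the client."""
    for chunk in response:
        out.append(chunk)


body: List[bytes] = []
response = view()  # the "worker thread" is done at this point
streamer = threading.Thread(target=serve, args=(response, body), name="streamer")
streamer.start()
streamer.join()
print(body)  # [b'computed in streamer']
```

The output shows that the work ran in the streaming thread, not in the thread that called the view.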

There are two major limitations to this approach (common to all ZPublisher stream iterators):

  • The code should not access ZODB after the worker thread has been freed (unless a completely new connection with a new cache is created).
  • This does not help installations where HAProxy or a similar front-end proxy allows only a fixed number of simultaneous requests per Zope instance.

Also, of course, this is not real async, because it keeps the client waiting until the request is completed and cannot distribute work between Zope instances.
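The collective.jazzport idea of fetching files in parallel and packing them into a zip file can be sketched with the standard library alone. Here fetch is a hypothetical stand-in for an HTTP GET against the site's public address (no network access happens), and the paths are made up; see collective.jazzport's own browser.py and iterators.py for how it actually does this.

```python
import io
import zipfile
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable


def fetch(path: str) -> bytes:
    """Hypothetical stand-in for downloading `path` through the site's public URL."""
    return ("contents of %s" % path).encode("utf-8")


def build_zip(paths: Iterable[str], max_workers: int = 4) -> bytes:
    """Fetch all paths in parallel and return them packed into one zip archive."""
    path_list = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in input order, so names and payloads line up.
        payloads = list(pool.map(fetch, path_list))
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for path, payload in zip(path_list, payloads):
            archive.writestr(path, payload)
    return buffer.getvalue()


data = build_zip(["folder/a.txt", "folder/b.txt"])
with zipfile.ZipFile(io.BytesIO(data)) as archive:
    print(archive.namelist())  # ['folder/a.txt', 'folder/b.txt']
```

With a load balancer in front, parallel requests like these can be spread over all available instances, which is the point of downloading in parallel.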

collective.futures

class MyView(BrowserView):

    def __call__(self):
        try:
            return futures.result('my_unique_key')
        except futures.FutureNotSubmittedError:
            futures.submit('my_unique_key', some_callable, 'foo', 'bar')
            return u'A placeholder value, which is never really returned.'

collective.futures was the next step from the previous approach. It provides a simple API for registering multiple tasks (which do not need to access ZODB) so that they will be executed outside the current Zope worker thread.

Once all the registered tasks have been executed, the same request will be queued for ZPublisher to be processed again, now with the responses from those registered tasks.

Finally, the response is returned to the requesting client like any other response.

collective.futures has the same issues as the previous approach (used in collective.jazzport), and it may also waste resources by processing certain parts of the request twice (like publish traverse).

We use this, for example, for loading external RSS feeds, so that the Zope worker threads are free to process other requests while we are waiting for the external services to return those feeds.
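To make the two-pass flow more concrete, here is a toy, single-process model of it: the first pass submits work and returns a placeholder, the "request" is processed again once the work is done, and the second pass finds the result. The Futures class, publish and FutureNotSubmittedError only mirror the names in the snippet above for readability; this is not collective.futures' implementation, and unlike the real thing it blocks while waiting instead of re-queuing the request.

```python
import concurrent.futures
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Dict


class FutureNotSubmittedError(Exception):
    """Raised when a result is asked for before its task was submitted."""


class Futures:
    """Toy per-request registry of background tasks, keyed by unique strings."""

    def __init__(self, executor: ThreadPoolExecutor) -> None:
        self._executor = executor
        self._futures: Dict[str, Future] = {}

    def __len__(self) -> int:
        return len(self._futures)

    def submit(self, key: str, func: Callable[..., Any], *args: Any) -> None:
        if key not in self._futures:
            self._futures[key] = self._executor.submit(func, *args)

    def result(self, key: str) -> Any:
        if key not in self._futures:
            raise FutureNotSubmittedError(key)
        return self._futures[key].result()

    def wait_all(self) -> None:
        concurrent.futures.wait(list(self._futures.values()))


def some_callable(a: str, b: str) -> str:
    return a + b  # imagine a slow call to an external service here


def my_view(futures: Futures) -> str:
    """Same shape as MyView.__call__ above."""
    try:
        return futures.result("my_unique_key")
    except FutureNotSubmittedError:
        futures.submit("my_unique_key", some_callable, "foo", "bar")
        return "A placeholder value, which is never really returned."


def publish(view: Callable[[Futures], str]) -> str:
    """Run the view, and run it again whenever it submitted new tasks."""
    with ThreadPoolExecutor(max_workers=2) as executor:
        futures = Futures(executor)
        response = view(futures)
        seen = 0
        while len(futures) > seen:
            seen = len(futures)
            # collective.futures would queue the request again here instead
            # of blocking; this toy simply waits in the calling thread.
            futures.wait_all()
            response = view(futures)
        return response


print(publish(my_view))  # foobar
```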

collective.taskqueue

from collective.taskqueue import taskqueue
from Products.Five.browser import BrowserView


class MyView(BrowserView):

    def __call__(self):
        taskqueue.add('/Plone/path/to/some/other/view')
        return u'Task queued, and a better view could now display a throbber.'

collective.taskqueue should be a real alternative to plone.app.async and netsight.async. I see it as a simple and opinionated sibling of collective.zamqp, and it should be able to handle all the most basic asynchronous tasks where no other systems are involved.

collective.taskqueue provides one or more named asynchronously consumed task queues, which may contain any number of tasks: asynchronously dispatched simple requests to any traversable resources in Plone.

With out-of-the-box Plone (without any other add-ons or external services) it provides instance-local, volatile, memory-based task queues, which are consumed by one of the two default Zope worker threads. With Redis, it supports persistent task queues with guaranteed delivery and distributed consumption. For example, you could have dedicated Plone instances that only consume those shared task queues from Redis.

Not to sound too good to be true: collective.taskqueue does not have any kind of monitoring of the task queues out of the box (only an instance-Z2.log entry with the resulting status code is generated for each consumed task).
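The volatile, instance-local mode can be pictured as an in-memory queue drained by a background thread that "dispatches" each queued path and logs a status code. In this sketch, LocalTaskQueue and the dictionary of handlers standing in for traversable resources are hypothetical, and a dedicated thread replaces the Zope worker thread; the real collective.taskqueue dispatches actual requests to traversable resources in Plone. Redis-backed persistence and distributed consumption are not modeled.

```python
import queue
import threading
from typing import Callable, Dict, List, Optional

Handler = Callable[[], object]


class LocalTaskQueue:
    """Volatile, in-memory task queue drained by one background consumer thread."""

    def __init__(self, resources: Dict[str, Handler]) -> None:
        self._resources = resources  # hypothetical path -> view lookup
        self._queue: "queue.Queue[Optional[str]]" = queue.Queue()
        self.log: List[str] = []
        self._consumer = threading.Thread(target=self._consume, daemon=True)
        self._consumer.start()

    def add(self, path: str) -> None:
        """Queue a path and return immediately."""
        self._queue.put(path)

    def _consume(self) -> None:
        while True:
            path = self._queue.get()
            if path is None:  # sentinel put by close()
                return
            handler = self._resources.get(path)
            try:
                if handler is None:
                    status = 404
                else:
                    handler()
                    status = 200
            except Exception:
                status = 500
            # One status line per consumed task, a bit like an instance-Z2.log entry.
            self.log.append("%d %s" % (status, path))

    def close(self) -> None:
        """Let the consumer finish everything queued so far, then stop it."""
        self._queue.put(None)
        self._consumer.join()


ran: List[str] = []


def other_view() -> str:
    ran.append("other_view")
    return "done"


tasks = LocalTaskQueue({"/Plone/path/to/some/other/view": other_view})
tasks.add("/Plone/path/to/some/other/view")
tasks.add("/Plone/no/such/view")
tasks.close()
print(tasks.log)  # ['200 /Plone/path/to/some/other/view', '404 /Plone/no/such/view']
```

Since memory is the only storage here, anything still queued is lost if the process dies, which is exactly the trade-off the Redis backend addresses.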

collective.zamqp

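from collective.zamqp.interfaces import IProducer
from Products.Five.browser import BrowserView
from zope.component import getUtility


class MyView(BrowserView):

    def __call__(self):
        producer = getUtility(IProducer, name='my.asyncservice')
        producer.register()  # bind to successful transaction
        producer.publish({'title': u'My title'})
        return u'Task queued, and a better view could now display a throbber.'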

Finally, collective.zamqp is a very flexible asynchronous framework and RabbitMQ integration for Plone, which I rewrote from affinitic.zamqp before figuring out any of the previous approaches.

As the story behind it goes, we did use affinitic.zamqp at first, but because of its issues we had to start rewriting it to make it more stable and compatible with newer AMQP specifications. At first, I tried to build it on top of Celery, then on top of Kombu (the transport framework behind Celery), but in the end it had to be based directly on top of pika (0.9.4), a popular Python AMQP library. Otherwise it would have been really difficult to benefit from all the possible features of RabbitMQ and to be compatible with non-Python services.

collective.zamqp is best used for configuring and executing asynchronous messaging between a Plone site, other Plone sites and other AMQP-connected services. It's also possible to use it to build frontend messaging services (possibly secured using SSL) with RabbitMQ's Web-STOMP server (see the chatbehavior example). Yet, it has a few problems of its own:

  • it depends on five.grok
  • it's way too tightly integrated with pika 0.9.5, which makes upgrading the integration more difficult than necessary (and pika 0.9.5 has a few serious bugs related to synchronous AMQP connections, luckily not required by c.zamqp)
  • it has quite a bit of poorly documented magic in how it should be used to set up all the possible AMQP messaging configurations.

collective.zamqp does not provide monitoring utilities of its own (beyond very detailed logging of messaging events). Yet, the basic monitoring needs can be covered with RabbitMQ's web and console UIs and RESTful APIs, and all decent monitoring tools should have their own RabbitMQ plugins.

For more detailed examples of collective.zamqp, please, see my related StackOverflow answer and our presentation from PloneConf 2012 (more examples are linked from the last slide).
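The producer.register() call in the collective.zamqp snippet above binds publishing to a successful transaction, so that no message is sent for a request whose transaction is later aborted. The idea can be sketched without Zope or RabbitMQ: messages are buffered while the transaction is open and only handed over to the broker on commit. TransactionBoundProducer and the list standing in for the broker are hypothetical, and the explicit commit()/abort() calls replace what the Zope transaction would trigger; this is not collective.zamqp's code.

```python
from typing import Any, Dict, List


class TransactionBoundProducer:
    """Buffers messages until commit and drops them on abort."""

    def __init__(self, broker: List[Dict[str, Any]]) -> None:
        self._broker = broker  # stand-in for an AMQP exchange
        self._pending: List[Dict[str, Any]] = []
        self._registered = False

    def register(self) -> None:
        """Bind subsequent publish() calls to the current transaction."""
        self._registered = True

    def publish(self, message: Dict[str, Any]) -> None:
        if self._registered:
            self._pending.append(message)
        else:
            # This toy's choice: unbound messages are sent right away.
            self._broker.append(message)

    def commit(self) -> None:
        """Transaction succeeded: deliver the buffered messages."""
        self._broker.extend(self._pending)
        self._reset()

    def abort(self) -> None:
        """Transaction failed: forget the buffered messages."""
        self._reset()

    def _reset(self) -> None:
        self._pending = []
        self._registered = False


broker: List[Dict[str, Any]] = []
producer = TransactionBoundProducer(broker)

producer.register()
producer.publish({"title": "Aborted"})
producer.abort()

producer.register()
producer.publish({"title": "My title"})
producer.commit()

print(broker)  # [{'title': 'My title'}]
```

Tying delivery to the commit avoids consumers acting on content that was never actually saved in the Plone site.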