Transmogrifier, the Python migration pipeline, also for Python 3

TL;DR: I forked collective.transmogrifier into plain transmogrifier (not yet released) to make its core usable without Plone dependencies, to use Chameleon for TAL-expressions, and to make it installable with just pip install and compatible with Python 3.

Transmogrifier is one of the many great developer tools by the Plone community. It's a generic pipeline tool for data manipulation, configurable with plain text INI-files, while new re-usable pipeline section blueprints can be implemented and packaged in Python. It could be used to process any number of things, but historically it's been mainly developed and used as a pluggable way to import legacy content into Plone.
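
The core idea is simple: items are plain dictionaries, and each pipeline section is a generator that wraps the previous one, so draining the last section pulls every item through the whole chain. As a rough, library-free sketch of the pattern (this is not transmogrifier's actual API, just the underlying idea):

def from_list(previous, records):
    # pass through anything from earlier sections, then inject new items
    for item in previous:
        yield item
    for record in records:
        yield record

def add_title_prefix(previous, prefix):
    for item in previous:
        item['title'] = prefix + item['title']
        yield item

def to_console(previous):
    for item in previous:
        print(item)
        yield item

pipeline = to_console(add_title_prefix(from_list(iter(()), [
    {'title': 'Hello'}, {'title': 'World'},
]), 'News: '))

for _ in pipeline:  # draining the last section runs the whole chain
    pass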

A simple transmogrifier pipeline for dumping news from Slashdot to a CSV file could look like:

[transmogrifier]
pipeline =
    from_rss
    to_csv

[from_rss]
blueprint = transmogrifier.from_expression
modules = feedparser
expression = python:modules['feedparser'].parse(options['url']).get('entries', [])
url = http://rss.slashdot.org/slashdot/slashdot

[to_csv]
blueprint = transmogrifier.to_csv
fieldnames =
    title
    link
filename = slashdot.csv

Actually, at the time of writing this, I've yet to do any Plone migrations using transmogrifier. But when we recently had a reasonably sized non-Plone migration task, I knew not to re-invent the wheel, but to transmogrify it. And we succeeded. The transmogrifier pipeline helped us to design the migration better, and splitting data processing into multiple pipeline sections made it easy to delegate the work between multiple developers.

Unfortunately, collective.transmogrifier currently has unnecessary dependencies on CMFCore, is not installable without a long known-good set of versions, and is missing any built-in command-line interface. At first, I tried to do all the necessary refactoring inside collective.transmogrifier, but eventually a fork was required to make the transmogrifier core usable outside Plone environments, make it compatible with Python 3, and avoid breaking any existing workflows depending on the old transmogrifier.

So, meet the new transmogrifier:

  • can be installed with pip install (although, not yet released at PyPI)
  • new mr.migrator-inspired command-line interface (see transmogrify --help for all the options)
  • new base classes for custom blueprints
    • transmogrifier.blueprints.Blueprint
    • transmogrifier.blueprints.ConditionalBlueprint
  • new ZCML-directives for registering blueprints and re-usable pipelines
    • <transmogrifier:blueprint component="" name="" />
    • <transmogrifier:pipeline id="" name="" description="" configuration="" />
  • uses Chameleon for TAL-expressions (e.g. in ConditionalBlueprint)
  • has only a few generic built-in blueprints
  • supports z3c.autoinclude for package transmogrifier (see the setup.py sketch after this list)
  • fully backwards compatible with blueprints for collective.transmogrifier
  • runs with Python >= 2.6, including Python 3+
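
For the z3c.autoinclude support mentioned above, a third-party distribution could get its blueprints picked up automatically by declaring the plugin entry point in its setup.py. A hedged sketch (the package name and version are made up):

from setuptools import setup

setup(
    name='my.blueprints',  # hypothetical add-on package
    version='1.0',
    install_requires=['transmogrifier'],
    entry_points="""
    [z3c.autoinclude.plugin]
    target = transmogrifier
    """,
)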

There's still much work to do before a real release (e.g. documenting and testing the new CLI script and the new built-in blueprints), but let's see how it works already...

P.S. Please, use a clean Python virtualenv for these examples.

Example pipeline

Let's start with an easy installation

$ pip install git+https://github.com/datakurre/transmogrifier
$ transmogrify --help
Usage: transmogrify <pipelines_and_overrides>...
                [--overrides=<overrides.cfg>]
                [--include=<package_or_module>...]
                [--include=<package:filename>...]
                [--context=<package.module.factory>]
   transmogrify --list
                [--include=<package_or_module>...]
   transmogrify --show=<pipeline>
                [--include=<package_or_module>...]

and with example filesystem pipeline.cfg

[transmogrifier]
pipeline =
    from_rss
    to_csv

[from_rss]
blueprint = transmogrifier.from_expression
modules = feedparser
expression = python:modules['feedparser'].parse(options['url']).get('entries', [])
url = http://rss.slashdot.org/slashdot/slashdot

[to_csv]
blueprint = transmogrifier.to_csv
fieldnames =
    title
    link
filename = slashdot.csv

and its dependencies

$ pip install feedparser

and the results

$ transmogrify pipeline.cfg
INFO:transmogrifier:CSVConstructor:to_csv wrote 25 items to /.../slashdot.csv

using, for example, Python 2.7 or Python 3.4.

Minimal migration project

Let's create an example migration project with custom blueprints using Python 3. In addition to transmogrifier, we need venusianconfiguration for easy blueprint registration and, of course, the actual dependencies for our blueprints:

$ pip install git+https://github.com/datakurre/transmogrifier
$ pip install git+https://github.com/datakurre/venusianconfiguration
$ pip install fake-factory

Now we can implement custom blueprints in, for example, blueprints.py

from venusianconfiguration import configure

from transmogrifier.blueprints import Blueprint
from faker import Faker


@configure.transmogrifier.blueprint.component(name='faker_contacts')
class FakerContacts(Blueprint):
    def __iter__(self):
        for item in self.previous:
            yield item

        amount = int(self.options.get('amount', '0'))
        fake = Faker()

        for i in range(amount):
            yield {
                'name': fake.name(),
                'address': fake.address()
            }

and see them registered next to the built-in ones (and to any from other packages hooking into the transmogrifier autoinclude entry point):

$ transmogrify --list --include=blueprints

Available blueprints
--------------------
faker_contacts
...

Now, we can make an example pipeline.cfg

[transmogrifier]
pipeline =
    from_faker
    to_csv

[from_faker]
blueprint = faker_contacts
amount = 2

[to_csv]
blueprint = transmogrifier.to_csv

and enjoy the results

$ transmogrify pipeline.cfg to_csv:filename=- --include=blueprints
address,name
"534 Hintz Inlet Apt. 804
Schneiderchester, MI 55300",Dr. Garland Wyman
"44608 Volkman Islands
Maryleefurt, AK 42163",Mrs. Franc Price DVM
INFO:transmogrifier:CSVConstructor:to_csv saved 2 items to -

An alternative would be to just use the shipped mr.bob-template...

Migration project using the template

The new transmogrifier ships with an easy getting started template for your custom migration project. To use the template, you need a Python environment with mr.bob and the new transmogrifier:

$ pip install mr.bob readline  # readline is an implicit mr.bob dependency
$ pip install git+https://github.com/datakurre/transmogrifier

Then you can create a new project directory with:

$ mrbob bobtemplates.transmogrifier:project

Once the new project directory is created, you can install the rest of the dependencies and activate the project inside the directory with:

$ pip install -r requirements.txt
$ python setup.py develop

Now transmogrify knows your project's custom blueprints and pipelines:

$ transmogrify --list

Available blueprints
--------------------
myprojectname.mock_contacts
...

Available pipelines
-------------------
myprojectname_example
    Example: Generates uppercase mock addresses

And the example pipeline can be executed with:

$ transmogrify myprojectname_example
name,address
ISSAC KOSS I,"PSC 8465, BOX 1625
APO AE 97751"
TESS FAHEY,"PSC 7387, BOX 3736
APO AP 13098-6260"
INFO:transmogrifier:CSVConstructor:to_csv wrote 2 items to -

Please see the created README.rst for how to edit the example blueprints and pipelines and create more.

Mandatory example with Plone

Using the new transmogrifier with Plone should be as simple as adding it into your buildout.cfg next to the old transmogrifier packages:

[buildout]
extends = http://dist.plone.org/release/4.3-latest/versions.cfg
parts = instance plonesite
versions = versions

extensions = mr.developer
sources = sources
auto-checkout = *

[sources]
transmogrifier = git https://github.com/datakurre/transmogrifier

[instance]
recipe = plone.recipe.zope2instance
eggs =
    Plone
    z3c.pt
    transmogrifier
    collective.transmogrifier
    plone.app.transmogrifier
user = admin:admin
zcml = plone.app.transmogrifier

[plonesite]
recipe = collective.recipe.plonesite
site-id = Plone
instance = instance

[versions]
setuptools =
zc.buildout =

Let's also write a fictional migration pipeline, which would create Plone content from Slashdot RSS-feed:

[transmogrifier]
pipeline =
    from_rss
    id
    fields
    folders
    create
    update
    commit

[from_rss]
blueprint = transmogrifier.from_expression
modules = feedparser
expression = python:modules['feedparser'].parse(options['url']).get('entries', [])
url = http://rss.slashdot.org/Slashdot/slashdot

[id]
blueprint = transmogrifier.expression
modules = uuid
id = python:str(modules['uuid'].uuid4())

[fields]
blueprint = transmogrifier.expression
portal_type = string:Document
text = path:item/summary
_path = string:slashdot/${item['id']}

[folders]
blueprint = collective.transmogrifier.sections.folders

[create]
blueprint = collective.transmogrifier.sections.constructor

[update]
blueprint = plone.app.transmogrifier.atschemaupdater

[commit]
blueprint = transmogrifier.to_expression
modules = transaction
expression = python:modules['transaction'].commit()
mode = items

Now the new CLI script can be used together with the bin/instance -Ositeid run command provided by plone.recipe.zope2instance, so that transmogrifier gets your site as its context simply by calling zope.component.hooks.getSite:

$ bin/instance -OPlone run bin/transmogrify pipeline.cfg --context=zope.component.hooks.getSite

With Plone you should, of course, still use Python 2.7.
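
The --context option just names an importable callable whose return value becomes the pipeline context. If the site root is not the right context, a custom factory could be pointed at in the same way; a hypothetical sketch (the module path and folder id are made up):

# migration/context.py -- a hypothetical custom context factory
from zope.component.hooks import getSite


def get_import_folder():
    # Return the folder the pipeline should use instead of the site root;
    # assumes such a folder already exists in the site.
    site = getSite()
    return site['imported-content']

which would then be used with --context=migration.context.get_import_folder instead.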

Funnelweb example with Plone

Funnelweb is a collection of transmogrifier blueprints and pipelines for scraping any web site into Plone. I heard that its example pipelines are a little outdated, but they make a nice demo anyway.

Let's extend our previous Plone-example with the following funnelweb.cfg buildout to include all the necessary transmogrifier blueprints and the example funnelweb.ttw pipeline:

[buildout]
extends = buildout.cfg

[instance]
eggs +=
    transmogrify.pathsorter
    funnelweb

We also need a small additional pipeline commit.cfg to commit all the changes made by funnelweb.ttw:

[transmogrifier]
pipeline = commit

[commit]
blueprint = transmogrifier.to_expression
modules = transaction
expression = python:modules['transaction'].commit()
mode = items

Now, after the buildout has been run, the following command would use the pipelines funnelweb.ttw and commit.cfg to scrape (more or less) my blog into Plone:

$ bin/instance -OPlone run bin/transmogrify funnelweb.ttw commit.cfg crawler:url=http://datakurre.pandala.org "crawler:ignore=feeds\ncsi.js" --context=zope.component.hooks.getSite

For tuning the import further, the used pipelines can easily be exported to the filesystem, customized, and then executed just like commit.cfg:

$ bin/instance -OPlone run bin/transmogrify --show=funnelweb.ttw > myfunnelweb.cfg

Too many ways to do async tasks with Plone

Triggering asynchronous tasks from Plone is hard, we hear. And that's actually quite surprising, given that, from its very beginning, Plone has been running on top of the first asynchronous web server written in Python, Medusa.

Of course, there exist many, too many, different solutions to run asynchronous tasks with Plone:

  • plone.app.async is the only one in the Plone namespace, and probably the most criticized one, because it uses the ZODB to persist its task queue
  • netsight.async, on the other hand, is simpler, just executing the given task outside the Zope worker pool (but requiring its own database connection).
  • finally, if you happen to like Celery, Nathan Van Gheem is working on a simple Celery integration, collective.celery, based on earlier work by David Glick.

To add insult to injury, I've ended up developing more than one method of my own, because I was warned about plone.app.async, was hit hard by the opinionated internals of Celery, was unaware of netsight.async, and because no single solution has fit all my use cases.

I believe my various use cases mostly fit into these categories:

  • Executing simple tasks with unpredictable execution time so that the execution cannot block all of the valuable Zope worker threads serving HTTP requests (the number of threads is fixed in Zope, because ZODB connection caches cannot be shared between simultaneous requests and one can afford only so much server memory per site).

    Examples: communicating to external services, loading an external RSS feed, ...

  • Queueing a lot of background tasks to be executed now or later, because possible results can be delivered asynchronously (e.g. user can return to see it later, can get notified about finished tasks, etc), or when it would benefit to be able to distribute the work between multiple Zope worker instances.

    Examples: converting files, encoding videos, burning PDFs, sending a lot of emails, ...

  • Communicating with external services.

    Examples: integration between sites or different systems, synchronizing content between sites, performing migrations, ...

For further reading about all the possible issues when queuing asynchronous tasks, I'd recommend Wichert Akkerman's blog post about task queues.

So, here's the summary, from my simplest approach to enterprise messaging with RabbitMQ:

ZPublisher stream iterator workers

class MyView(BrowserView):

    def __call__(self):
        return AsyncWorkerStreamIterator(some_callable, self.request)

I've already blogged earlier in detail about how to abuse ZPublisher's stream iterator interface to free the current Zope worker thread and process the current response outside Zope worker threads before letting the response continue its way towards the requesting client (browser).

An example of this trick is yet another zip-export add-on, collective.jazzport. It exports Plone folders as zip files by downloading all the to-be-zipped files separately through ZPublisher (or, actually, using the site's public address). It can also download files in parallel to use all the available load-balanced instances. Yet, because it downloads files only after freeing the current Zope worker thread, it should not block any worker thread by itself (see its browser.py and iterators.py).

There are two major limitations for this approach (common to all ZPublisher stream iterators):

  • The code should not access ZODB after the worker thread has been freed (unless a completely new connection with new cache is created).
  • This does not help installations with HAProxy or similar front-end proxy with fixed allowed simultaneous requests per Zope instance.

Also, of course, this is not real async, because it keeps the client waiting until the request is completed and cannot distribute work between Zope instances.

collective.futures

class MyView(BrowserView):

    def __call__(self):
        try:
            return futures.result('my_unique_key')
        except futures.FutureNotSubmittedError:
            futures.submit('my_unique_key', some_callable, 'foo', 'bar')
            return u'A placeholder value, which is never really returned.'

collective.futures was the next step from the previous approach. It provides a simple API for registering multiple tasks (which must not access the ZODB) so that they will be executed outside the current Zope worker thread.

Once all the registered tasks have been executed, the same request will be queued for ZPublisher to be processed again, now with the responses from those registered tasks.

Finally, the response will be returned to the requesting client like any other response.

collective.futures has the same issues as the previous approach (used in collective.jazzport), and it may also waste resources by processing certain parts of the request twice (like publish traverse).

We use this, for example, for loading external RSS feeds so that the Zope worker threads are freed to process other requests while we are waiting for the external services to return those feeds.
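
As a rough sketch of that RSS use case, reusing the futures API shown above together with feedparser (the view and key names are made up, and the import path is my assumption of where the API lives):

import feedparser

from Products.Five.browser import BrowserView
from collective import futures  # assumed import path for the futures API


def fetch_feed(url):
    # runs outside the Zope worker thread and must not touch the ZODB
    return feedparser.parse(url).get('entries', [])


class FeedView(BrowserView):

    def entries(self):
        try:
            return futures.result('slashdot_feed')
        except futures.FutureNotSubmittedError:
            futures.submit('slashdot_feed', fetch_feed,
                           'http://rss.slashdot.org/Slashdot/slashdot')
            return []  # placeholder until the request is processed again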

collective.taskqueue

class MyView(BrowserView):

    def __call__(self):
        taskqueue.add('/Plone/path/to/some/other/view')
        return u'Task queued, and a better view could now display a throbber.'

collective.taskqueue should be a real alternative for plone.app.async and netsight.async. I see it as a simple and opinionated sibling of collective.zamqp, and it should be able to handle all the most basic asynchronous tasks where no other systems are involved.

collective.taskqueue provides one or more named asynchronously consumed task queues, which may contain any number of tasks: asynchronously dispatched simple requests to any traversable resources in Plone.
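
In other words, a queued task is just another request to a normal view, so the actual work could live in a plain browser view like the following sketch (all names are made up; the consumer simply requests the queued path later, as a normal request with full ZODB access and its own transaction):

import logging

from Products.Five.browser import BrowserView

logger = logging.getLogger('myproject')


class SomeOtherView(BrowserView):
    """The view behind /Plone/path/to/some/other/view in the example above."""

    def __call__(self):
        # Do the actual slow work here; this runs later as a normal request
        # dispatched by the task queue consumer.
        logger.info('Doing the heavy lifting for %s',
                    self.context.absolute_url())
        return u'OK'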

With out-of-the-box Plone (without any other add-ons or external services) it provides instance-local, volatile, memory-based task queues, which are consumed by the other of the default two Zope worker threads. With Redis, it supports persistent task queues with guaranteed delivery and distributed consumption. For example, you could have dedicated Plone instances only consuming those shared task queues from Redis.

To not sound too good to be true: collective.taskqueue does not provide any kind of monitoring of the task queues out of the box (only an instance-Z2.log entry with the resulting status code is written for each consumed task).

collective.zamqp

class MyView(BrowserView):

    def __call__(self):
        producer = getUtility(IProducer, name='my.asyncservice')
        producer.register()  # bind to successful transaction
        producer.publish({'title': u'My title'})
        return u'Task queued, and a better view could now display a throbber.'

Finally, collective.zamqp is a very flexible asynchronous framework and RabbitMQ integration for Plone, which I re-wrote from affinitic.zamqp before figuring out any of the previous approaches.

As the story behind it goes, we did use affinitic.zamqp at first, but because of its issues we had to start a rewrite to make it more stable and compatible with newer AMQP specifications. At first, I tried to build it on top of Celery, then on top of Kombu (the transport framework behind Celery), but in the end it had to be based directly on top of pika (0.9.4), a popular Python AMQP library. Otherwise it would have been really difficult to benefit from all the possible features of RabbitMQ and be compatible with other than Python-based services.

collective.zamqp is best used for configuring and executing asynchronous messaging between Plone sites, and between Plone sites and other AMQP-connected services. It's also possible to use it to build frontend messaging services (possibly secured using SSL) with RabbitMQ's webstomp server (see the chatbehavior example). Yet, it has a few problems of its own:

  • it depends on five.grok
  • it's way too tightly integrated with pika 0.9.5, which makes upgrading the integration more difficult than necessary (and pika 0.9.5 has a few serious bugs related to synchronous AMQP connections, luckily not required for c.zamqp)
  • it has quite a bit of poorly documented magic in how it can be used to set up all the possible AMQP messaging configurations.

collective.zamqp does not provide monitoring utilities of its own (beyond very detailed logging of messaging events). Yet, the basic monitoring needs can be covered with RabbitMQ's web and console UIs and RESTful APIs, and all decent monitoring tools should have their own RabbitMQ plugins.

For more detailed examples of collective.zamqp, please, see my related StackOverflow answer and our presentation from PloneConf 2012 (more examples are linked from the last slide).

Nix expressions as executable commands

Updated 2014-09-24: I learned that in a mixed (OSX and nixpkgs) environment, one should not set LD_LIBRARY_PATH, but fix dynamic linking to use absolute paths. Yet, I refactored my wrapper to use myEnvFun when required (see the buildout example).

Updated 2014-09-22: I was wrong about how nix-built Python environments can be used together with buildout, and updated this post to reflect my experiences.

My main tools for Python-based software development have been virtualenv and buildout for a long time. I've used virtualenv to provide an isolated Python installation (separate from the so-often-polluted system Python) and buildout for managing the required Python packages, the packages under development, and supporting software (like Redis or memcached).

Basically everything still works, but:

  • Managing clean Python virtualenvs only to avoid possible conflicts with system-installed packages feels like a lot of work for a small return.
  • Remembering to activate and deactivate the correct Python virtualenv is not fun either.
  • Also, while buildout provides an excellent tool (mr.developer) for managing the sources of all the project packages, it's far from optimal for building and managing supporting (non-Python) software.

I've also been using quite a bit of Vagrant and Docker, but, because I'm mostly working on a Mac, those require a VM, which makes them much less convenient.

About Nix

I believe I first heard about the Nix package manager from Rok at the Barcelona Plone Testing Sprint in early 2013. It sounded a bit esoteric and complex back then, but after about twenty months of more virtualenvs, buildouts, Vagrantfiles, Docker containers and Puppet manifests... not so much anymore.

Currently, outside NixOS, I understand Nix as

  1. a functional language for describing configuration of software and
  2. a package manager for managing those configurations.

From my own experience, the easiest way to get familiar with Nix is to follow Domen's blog post about getting started with the Nix package manager. But to really make it a new tool in your toolbox, you should learn to write your own Nix expressions.

Even though the most common way to use the Nix package manager is to install Nix expressions into your current environment with nix-env, the expressions can also be used without really installing them, in a quite stateless way.

I'm not sure how proper use of Nix this is, but it seems to work for me.

(Yes, I'm aware of myEnvFun, for creating named stateful development environments with Nix expressions, but here I'm trying to use Nix in a more stateless, Docker-inspired way.)

Nix expressions as virtualenv replacements

It's almost never safe to install Python software directly into your system Python. Different software may require different versions of the same libraries, and sooner or later the conflicting requirements break your Python installation.

Let's take a small utility called i18ndude as an example of software with way too frightening dependencies for any system Python. Traditionally, you could install it into a separate Python virtualenv and use it with the following steps:

$ virtualenv ~/.../i18ndude-env
$ source ~/.../i18ndude-env/bin/activate
$ pip install i18ndude
$ i18ndude
...
$ deactivate

With an executable Nix expression, I can call it in a stateless way by simply executing the expression:

$ ./i18ndude.nix
➜ /nix/store/gjhzw843qs1736r0qcd9mz69247g4svb-python2.7-i18ndude-3.3.5/bin/i18ndude
usage: i18ndude [-h]
                {find-untranslated,rebuild-pot,merge,sync,filter,admix,list,trmerge}
                ...
i18ndude: error: too few arguments

Maybe even better, I can install the expression into my default Nix environment with

$ nix-env -i -f i18ndude.nix

and use it as if it had been installed into my system Python in the first place (but this time without polluting it):

$ i18ndude.nix
usage: i18ndude [-h]
                {find-untranslated,rebuild-pot,merge,sync,filter,admix,list,trmerge}
                ...
i18ndude: error: too few arguments

No more activating or deactivating virtualenvs, not to mention needing to remember their names or locations.

For the most common Python software, it's not necessary to write your own expression; you can simply install the contributed expressions directly from the Nix packages repository.

The easiest way to check for existing expressions from nixpkgs Python packages seems to be grepping the package list with nix-env -qaP \*|grep something.

If you'd like to see more packages available by default, you can contribute them upstream with a simple pull request.

Anyway, since i18ndude was not yet available at the time of writing (although most of its dependencies were), this is what my expression for it looked like:

#!/usr/bin/env nix-exec
with import <nixpkgs> { };

let dependencies = rec {
  ordereddict = buildPythonPackage {
    name = "ordereddict-1.1";
    src = fetchurl {
      url = "https://pypi.python.org/packages/source/o/ordereddict/ordereddict-1.1.tar.gz";
      md5 = "a0ed854ee442051b249bfad0f638bbec";
    };
  };
};

in with dependencies; rec {
  i18ndude = buildPythonPackage {
    name = "i18ndude-3.3.5";
    src = fetchurl {
      url = "https://pypi.python.org/packages/source/i/i18ndude/i18ndude-3.3.5.zip";
      md5 = "ef599b1c64eaabba4049fcd2b027ba21";
    };
    propagatedBuildInputs = [
      ordereddict
      python27Packages."zope.tal-3.5.2"
      python27Packages."plone.i18n-2.0.9"
    ];
  };
}

Nix expression for nix-exec shell wrapper

Of course, Nix expressions are not executable by default. To get them to work as I wanted, I had to create a tiny wrapper script to be used via the hash-bang line #!/usr/bin/env nix-exec of executable expressions.

The script simply calls nix-build and then the named executable from the build output directory (with some standard environment variables set). To put it another way, the wrapper script translates the following command:

$ ./i18ndude.nix --help

into

$ `nix-build --no-out-link i18ndude.nix`/bin/i18ndude --help
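
The same logic fits into a few lines of Python; a simplified sketch that skips the environment tweaking done by the real wrapper below (it assumes nix-build is on PATH):

#!/usr/bin/env python
# Simplified nix-exec: build the expression, then exec the matching binary.
import os
import subprocess
import sys

expression = sys.argv[1]
out = subprocess.check_output(
    ['nix-build', '--no-out-link', expression],
    universal_newlines=True).strip()

# Derive the command name from the expression filename (i18ndude.nix -> i18ndude).
cmd = os.path.basename(expression).rsplit('.nix', 1)[0]

for directory in ('bin', 'sbin', 'libexec'):
    candidate = os.path.join(out, directory, cmd)
    if os.path.isfile(candidate):
        os.execv(candidate, [candidate] + sys.argv[2:])

sys.exit('%s not found under %s' % (cmd, out))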

It's not required to suffix the expression files with .nix; they could also be named without a suffix to look more like real commands.

The wrapper script itself, of course, can be installed from a Nix expression into your default Nix environment with nix-env -i -f filename.nix:

with import <nixpkgs> { };

stdenv.mkDerivation {
  name = "datakurre-nix-exec-1.2.1";

  builder = builtins.toFile "builder.sh" "
    source $stdenv/setup
    mkdir -p $out/bin
    echo \"#!/bin/bash
build=\\`nix-build --no-out-link \\$1\\`
if [ \\$build ]; then

  MY_TZ=\\\"\\$TZ\\\"
  MY_PATH=\\\"\\$build/bin:\\$build/sbin:\\$build/libexec:\\$PATH\\\"
  MY_http_proxy=\\\"\\$http_proxy\\\"
  MY_ftp_proxy=\\\"\\$http_proxy\\\"

  if [ -d \\$build/dev-envs ]; then
    source \\\"\\$build/dev-envs/\\\"*

    export TZ=\\\"\\$MY_TZ\\\"
    export PATH=\\\"\\$PATH:\\$MY_PATH\\\"
    export http_proxy=\\\"\\$MY_http_proxy\\\"
    export ftp_proxy=\\\"\\$MY_ftp_proxy\\\"

    export CFLAGS=\\`echo \\$NIX_CFLAGS_COMPILE|sed 's/-isystem /-I/g'\\`
    export LDFLAGS=\\$NIX_LDFLAGS
  else
    export PATH=\\\"\\$MY_PATH\\\"
  fi

  cmd=\\$\{1##*/\}; cmd=\\$\{cmd%%@*\}; cmd=\\$\{cmd%.nix\}
  paths=(\\\"\\$build/bin\\\" \\\"\\$build/sbin\\\" \\\"\\$build/libexec\\\")
  for path in \\\"\\$\{paths[@]\}\\\"; do
    if [ -f \\\"\\$path/\\$\{cmd\}\\\" ]; then
      cmd=\\\"\\$path/\\$\{cmd\}\\\"
      break
    fi
  done

  if [ -t 1 ]; then echo \\\"➜\\\" \\$cmd \\\"\\$\{@:2\}\\\"; fi
  \\\"\\$cmd\\\" \\\"\\$\{@:2\}\\\"
fi
\" > $out/bin/nix-exec
    chmod a+x $out/bin/nix-exec
  ";
}

The wrapper does not execute the command defined by the expression in a fully clean environment (the only isolation is the one myEnvFun provides), but mostly prepends everything defined by the expression to its surrounding execution environment (so that its paths are preferred over the versions in the current environment).

A mostly positive side effect from using Nix expressions like this (only building them, but not installing them into any environment) is that they can be cleaned (from the disk) anytime with simply:

$ nix-collect-garbage

Nix expressions with buildout

Update 2014-09-24: The example was updated to use myEnvFun to simplify the wrapper script.

Update 2014-09-22: I originally covered Nix expressions with buildout as an example of replacing Python virtualenvs with Nix. Unfortunately, because of some buildout limitations that didn't work out as I expected...

A very special case of Python development environment is the one with buildout, which is required e.g. for all development with Plone.

When using Nix expressions with buildout, there is one very special limitation: buildout does not support any additional Python packages installed into your Nix-expression-based environment.

That's because buildout sees Nix defined Python as a system Python, and buildout does its best to prevent any extra packages installed into system Python being available for the buildout by default.

An additional issue for buildout is that the extra Python packages defined in a Nix expression are not installed directly under the Python installation, but are made available only when that Python is executed through a special Nix-generated wrapper.

But to cut this short, here's an example executable Nix expression, which could be used as a Plone-compatible Python environment. It includes a clean Python installation with some additional (non-Python) libraries required by Plone buildout to be able to compile a few special Python packages (like Pillow, lxml and python-ldap):

#!/usr/bin/env nix-exec
with import <nixpkgs> { };

let dependencies = rec {
  buildInputs = [
    cyrus_sasl
    openldap
    libxslt
    libxml2
    freetype
    libpng
    libjpeg
    python27Full
  ];
};

in with dependencies; buildEnv {
  name = "nix";
  paths = [(myEnvFun { name = "nix"; inherit buildInputs; })] ++ buildInputs;
}

With this Nix expression saved as an executable ./python.nix, it can be used to execute buildout's bootstrap, run buildout and eventually launch the Plone site like:

$ ./python.nix bootstrap.py
$ ./python.nix bin/buildout  # or ./python.nix -S bin/buildout
$ ./python.nix bin/instance fg

I must agree that this is not as convenient as it should be, because each command (bootstrap, buildout and the final buildout generated script) must be executed explicitly using our executable Nix expression defining the required Python-environment.

Also, probably because my wrapper does not completely isolate the Nix expression call from its surrounding environment, sometimes it's required to call the buildout with -S given for the Python expression, like ./python.nix -S bin/buildout (otherwise buildout does not find its own bootstrapped installation).

On the other hand, this approach defines the execution environment explicitly and statelessly for each call.

P.S. Because I'm working with RHEL systems, it's nice to use Python with a configuration similar to theirs. With Nix, it's easy to define local overrides for existing packages (nixpkgs derivations) with a special function, providing only the required configuration changes. The following ~/.nixpkgs/config.nix example configures Python with the same unicode flag as RHEL's native Python:

{
  packageOverrides = pkgs : with pkgs; rec {
    python27 = pkgs.python27.overrideDerivation (args: {
      configureFlags = "--enable-shared --with-threads --enable-unicode=ucs4";
    }) // { modules = pkgs.python27.modules; };
  };
}

Nix expressions as stateless development environments

Updated 2014-09-24: Because of OSX, I had to fix the openldap expression to link one library with an absolute path, so that it does not resolve an OSX library instead of the Nix-built one.

In test driven development, the whole development environment can be built just around the selected test runner.

Here's an example Nix expression which, saved as an executable file called ./py.test, can be used to execute the pytest test runner with a couple of selected plugins and all the dependencies required by the software under test:

#!/usr/bin/env nix-exec
with import <nixpkgs> { };

let dependencies = rec {
  execnet = buildPythonPackage {
    name = "execnet-1.2.0";
    src = fetchurl {
      url = "https://pypi.python.org/packages/source/e/execnet/execnet-1.2.0.tar.gz";
      md5 = "1886d12726b912fc2fd05dfccd7e6432";
    };
    doCheck = false;
  };
  pycparser = buildPythonPackage {
    name = "pycparser-2.10";
    src = fetchurl {
      url = "https://pypi.python.org/packages/source/p/pycparser/pycparser-2.10.tar.gz";
      md5 = "d87aed98c8a9f386aa56d365fe4d515f";
    };
  };
  cffi = buildPythonPackage {
    name = "cffi-0.8.6";
    src = fetchurl {
      url = "http://pypi.python.org/packages/source/c/cffi/cffi-0.8.6.tar.gz";
      md5 = "474b5a68299a6f05009171de1dc91be6";
    };
    propagatedBuildInputs = [ pycparser ];
  };
  pytest_cache = buildPythonPackage {
    name = "pytest-cache-1.0";
    src = fetchurl {
      url = "https://pypi.python.org/packages/source/p/pytest-cache/pytest-cache-1.0.tar.gz";
      md5 = "e51ff62fec70a1fd456d975ce47977cd";
    };
    propagatedBuildInputs = [
       python27Packages.pytest
       execnet
    ];
  };
  pytest_flakes = buildPythonPackage {
    name = "pytest-flakes-0.2";
    src = fetchurl {
      url = "https://pypi.python.org/packages/source/p/pytest-flakes/pytest-flakes-0.2.zip";
      md5 = "44b8f9746fcd827de5c02f14b01728c1";
    };
    propagatedBuildInputs = [
       python27Packages.pytest
       python27Packages.pyflakes
       pytest_cache
    ];
  };
  pytest_pep8 = buildPythonPackage {
    name = "pytest-pep8-1.0.6";
    src = fetchurl {
      url = "https://pypi.python.org/packages/source/p/pytest-pep8/pytest-pep8-1.0.6.tar.gz";
      md5 = "3debd0bac8f63532ae70c7351e73e993";
    };
    propagatedBuildInputs = [
      python27Packages.pytest
      python27Packages.pep8
      pytest_cache
    ];
  };
  buildInputs = [
    (python27Packages.pytest.override {
      propagatedBuildInputs = [
        python27Packages.readline
        python27Packages.plumbum
        python27Packages.py
        pytest_flakes
        pytest_pep8
      ];
    })
    (lib.overrideDerivation openldap (args: {
      postBuild = if stdenv.isDarwin then ''
        install_name_tool -change /libsasl2.dylib ${cyrus_sasl}/lib/libsasl2.dylib servers/slapd/slapadd
     '' else null;
    }))
  ];
};

in with dependencies; buildEnv {
  name = "nix";
  paths = [(myEnvFun { name = "nix"; inherit buildInputs; })] ++ buildInputs;
}

In other words, this expression could work as a stateless environment for developing the product in question:

$ ./py.test
➜ /nix/store/a2w3hwc66gqm6bncic8km6b69lw2byc6-py.test/bin/py.test
================================== test session starts ==================================
platform darwin -- Python 2.7.8 -- pytest-2.5.1
plugins: flakes, cache, pep8
collected 2 items

src/.../tests/test_things.py ..
=============================== 2 passed in 0.22 seconds ================================

And, once the development is completed, another expression could be defined for using the developed product.

Nix expression for Robot Framework test runner

Finally, as a bonus, here's an expression, which configures a Python environment with Robot Framework and its Selenium2Library with PhantomJS:

#!/usr/bin/env nix-exec
with import <nixpkgs> { };

let dependencies = rec {
  docutils = buildPythonPackage {
    name = "docutils-0.12";
    src = fetchurl {
      url = "https://pypi.python.org/packages/source/d/docutils/docutils-0.12.tar.gz";
      md5 = "4622263b62c5c771c03502afa3157768";
    };
  };
  selenium = buildPythonPackage {
    name = "selenium-2.43.0";
    src = fetchurl {
      url = "https://pypi.python.org/packages/source/s/selenium/selenium-2.43.0.tar.gz";
      md5 = "bf2b46caa5c1ea4b68434809c695d69b";
    };
  };
  decorator = buildPythonPackage {
    name = "decorator-3.4.0";
    src = fetchurl {
      url = "https://pypi.python.org/packages/source/d/decorator/decorator-3.4.0.tar.gz";
      md5 = "1e8756f719d746e2fc0dd28b41251356";
    };
  };
  robotframework = buildPythonPackage {
    name = "robotframework-2.8.5";
    src = fetchurl {
      url = "https://pypi.python.org/packages/source/r/robotframework/robotframework-2.8.5.tar.gz";
      md5 = "2d2c6938830f71a6aa6f4be32227997f";
    };
    propagatedBuildInputs = [
      docutils
    ];
  };
  robotframework-selenium2library = buildPythonPackage {
    name = "robotframework-selenium2library-1.5.0";
    src = fetchurl {
      url = "https://pypi.python.org/packages/source/r/robotframework-selenium2library/robotframework-selenium2library-1.5.0.tar.gz";
      md5 = "07c64a9e183642edd682c2b79ba2f32c";
    };
    propagatedBuildInputs = [
      robotframework
      decorator
      selenium
    ];
  };
};

in with dependencies; buildEnv {
  name = "pybot";
  paths = [
    phantomjs
    (robotframework.override {
      propagatedBuildInputs = [ robotframework-selenium2library ];
    })
  ];
}

Since you may need differently configured Robot Framework installations (with different add-on keyword libraries installed) for different projects, this should be a good fit as an executable Nix expression:

$ ./pybot.nix
➜ /nix/store/q15bimgng25qcxkq2q10finyk0n6qkm2-pybot/bin/pybot
[ ERROR ] Expected at least 1 argument, got 0.

Try --help for usage information.

Asynchronous stream iterators and experimental promises for Plone

This post may contain traces of legacy Zope2 and Python 2.x.

Some may think that Plone is bad at concurrency, because it's not commonly deployed with WSGI, but run on top of a barely known, last-millennium asynchronous HTTP server called Medusa.

See, the out-of-the-box installation of Plone launches with only a single asynchronous HTTP server with just two fixed long-running worker threads. And it's way too easy to write custom code that keeps those worker threads busy (for example, by making blocking calls to external services), effectively resulting in denial of service for the rest of the incoming requests.

Well, as far as I know, the real bottleneck is not Medusa, but the way ZODB database connections work. It seems that, to optimize the database connection related caches, ZODB is best used with a fixed number of concurrent worker threads and one dedicated database connection per thread. Finally, MVCC in ZODB means each thread can serve only one request at a time.

In practice, of course, Plone-sites use ZEO-clustering (and replication) to overcome the limitations described above.

Back to the topic (with a disclaimer). The methods described in this blog post have not been battle tested yet and they may turn out to be bad ideas. Still, it's been fun to figure out how our old asynchronous friend, Medusa, could be used to serve more concurrent requests in certain special cases.

ZPublisher stream iterators

If you have been working with Plone long enough, you must have heard the rumor that blobs, which basically means files and images, are served from the filesystem in some special non-blocking way.

So, when someone downloads a file from Plone, the current worker thread only initiates the download and can then continue to serve the next request. The actual file is left to be served asynchronously by the main thread.

This is possible because of a ZPublisher feature called stream iterators (search for the IStreamIterator interface and its implementations in Zope2 and plone.app.blob). Stream iterators are basically a way to postpone I/O-bound operations into the main thread's asyncore loop through a special Medusa-level producer object.
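
As a minimal sketch of what such an iterator looks like (simplified; the real blob support ships its own implementation, and the chunk size here is arbitrary):

import os

from zope.interface import implements
from ZPublisher.Iterators import IStreamIterator


class SimpleFileStreamIterator(object):
    """Serve an already opened file in chunks from the main thread."""

    implements(IStreamIterator)

    def __init__(self, filename, streamsize=1 << 16):
        self._file = open(filename, 'rb')
        self._streamsize = streamsize

    def next(self):
        data = self._file.read(self._streamsize)
        if not data:
            self._file.close()
            raise StopIteration
        return data

    def __len__(self):
        # ZPublisher needs the total length up front for the Content-Length header
        return os.fstat(self._file.fileno()).st_size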

And because stream iterators are consumed only within the main thread, they come with some very strict limitations:

  • they are executed only after a completed transaction so they cannot interact with the transaction anymore
  • they must not read from the ZODB (because their origin connection is either closed or in use of their origin worker thread)
  • they must not fail unexpectedly, because you don't want to crash the main thread
  • they must not block the main thread, for obvious reasons.

Because of these limitations, the stream iterators, as such, are usable only for the purpose they have been made for: streaming files or similar immediately available buffers.

Asynchronous stream iterators

What if you could use ZPublisher's stream iterator support also for CPU-bound post-processing tasks? Or for post-processing tasks requiring calls to external web services or command-line utilities?

If you have a local Plone instance running somewhere, you can add the following proof-of-concept code and its slow_ok-method into a new External Method (also available as a gist):

import StringIO
import threading

from zope.interface import implements
from ZPublisher.Iterators import IStreamIterator
from ZServer.PubCore.ZEvent import Wakeup

from zope.globalrequest import getRequest


class zhttp_channel_async_wrapper(object):
    """Medusa channel wrapper to defer producers until released"""

    def __init__(self, channel):
        # (executed within the current Zope worker thread)
        self._channel = channel

        self._mutex = threading.Lock()
        self._deferred = []
        self._released = False
        self._content_length = 0

    def _push(self, producer, send=1):
        if (isinstance(producer, str)
                and producer.startswith('HTTP/1.1 200 OK')):
            # Fix Content-Length to match the real content length
            # (an alternative would be to use chunked encoding)
            producer = producer.replace(
                'Content-Length: 0\r\n',
                'Content-Length: {0:s}\r\n'.format(str(self._content_length))
            )
        self._channel.push(producer, send)

    def push(self, producer, send=1):
        # (executed within the current Zope worker thread)
        with self._mutex:
            if not self._released:
                self._deferred.append((producer, send))
            else:
                self._push(producer, send)

    def release(self, content_length):
        # (executed within the exclusive async thread)
        self._content_length = content_length
        with self._mutex:
            for producer, send in self._deferred:
                self._push(producer, send)
            self._released = True
        Wakeup()  # wake up the asyncore loop to read our results

    def __getattr__(self, key):
        return getattr(self._channel, key)


class AsyncWorkerStreamIterator(StringIO.StringIO):
    """Stream iterator to publish the results of the given func"""

    implements(IStreamIterator)

    def __init__(self, func, response, streamsize=1 << 16):
        # (executed within the current Zope worker thread)

        # Init buffer
        StringIO.StringIO.__init__(self)
        self._streamsize = streamsize

        # Wrap the Medusa channel to wait for the func results
        self._channel = response.stdout._channel
        self._wrapped_channel = zhttp_channel_async_wrapper(self._channel)
        response.stdout._channel = self._wrapped_channel

        # Set content-length as required by ZPublisher
        response.setHeader('content-length', '0')

        # Fire the given func in a separate thread
        self.thread = threading.Thread(target=func, args=(self.callback,))
        self.thread.start()

    def callback(self, data):
        # (executed within the exclusive async thread)
        self.write(data)
        self.seek(0)
        self._wrapped_channel.release(len(data))

    def next(self):
        # (executed within the main thread)
        if not self.closed:
            data = self.read(self._streamsize)
            if not data:
                self.close()
            else:
                return data
        raise StopIteration

    def __len__(self):
        return len(self.getvalue())


def slow_ok_worker(callback):
    # (executed within the exclusive async thread)
    import time
    time.sleep(1)
    callback('OK')


def slow_ok():
    """The publishable example method"""
    # (executed within the current Zope worker thread)
    request = getRequest()
    return AsyncWorkerStreamIterator(slow_ok_worker, request.response)

The above code example simulates a trivial post-processing with time.sleep, but it should apply for anything from building a PDF from the extracted data to calling an external web service before returning the final response.

An out-of-the-box Plone instance can handle only two (2) concurrent calls to a method, which would take one (1) second to complete.

In the above code, however, the post-processing could be delegated to a completely new thread, freeing the Zope worker thread to continue to handle the next request. Because of that, we can get much much better concurrency:

$ ab -c 100 -n 100 http://localhost:8080/Plone/slow_ok
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient).....done

Server Software:        Zope/(2.13.22,
Server Hostname:        localhost
Server Port:            8080

Document Path:          /Plone/slow_ok
Document Length:        2 bytes

Concurrency Level:      100
Time taken for tests:   1.364 seconds
Complete requests:      100
Failed requests:        0
Write errors:           0
Total transferred:      15400 bytes
HTML transferred:       200 bytes
Requests per second:    73.32 [#/sec] (mean)
Time per request:       1363.864 [ms] (mean)
Time per request:       13.639 [ms] (mean, across all concurrent requests)
Transfer rate:          11.03 [Kbytes/sec] received

Connection Times (ms)
               min  mean[+/-sd] median   max
Connect:        1    2   0.6      2       3
Processing:  1012 1196  99.2   1202    1359
Waiting:     1011 1196  99.3   1202    1359
Total:       1015 1199  98.6   1204    1361

Percentage of the requests served within a certain time (ms)
  50%   1204
  66%   1256
  75%   1283
  80%   1301
  90%   1331
  95%   1350
  98%   1357
  99%   1361
  100%   1361 (longest request)

Of course, most of the stream iterator limits still apply: an asynchronous stream iterator must not access the database, which limits the possible use cases a lot. For the same reason, plone.transformchain is also effectively skipped (no Diazo or Blocks), which makes this usable only for non-HTML responses.

experimental.promises

To go experimenting even further, what if you could do similar non-blocking asynchronous processing in the middle of a request? For example, to free the current Zope worker thread while fetching a missing or outdated RSS feed in a separate thread and only then continue to render the final response.

An interesting side effect of using stream iterators is that they allow you to inject code into the main thread's asynchronous loop. And when you are there, it's even possible to queue a completely new request for ZPublisher to handle.

So, how does the following approach sound:

  • let add-on code annotate requests with promises for fetching the required data (each promise would be a standalone function, which could be executed under the asynchronous stream iterator rules and, when called, would resolve into a value, effectively the future of the promise), for example:

    @property
    def content(self):
        if 'my_unique_key' in IFutures(self.request):
            return IFutures(self.request)['my_unique_key']
        else:
            IPromises(self.request)['my_unique_key'] = my_promise_func
            return u''
    
  • when promises are found, the response is turned into an asynchronous stream iterator, which would then execute all the promises in parallel threads and collect the resolved values, the futures:

    def transformIterable(self, result, encoding):
        if IPromises(self.request):
            return PromiseWorkerStreamIterator(
                IPromises(self.request), self.request, self.request.response)
        else:
            return None
    
  • finally, we'd wrap the current Medusa channel so that, instead of publishing any data yet, a cloned request is queued for ZPublisher (similarly to how retries are done after conflict errors), with the cloned request annotated to carry the resolved futures:

    def next(self):
       if self._futures:
           IFutures(self._zrequest).update(self._futures)
           self._futures = {}  # mark consumed to raise StopIteration
    
           from ZServer.PubCore import handle
           handle('Zope2', self._zrequest, self._zrequest.response)
       else:
           raise StopIteration
    
  • now the add-on code in question would find the futures in the request, not issue any promises anymore, and the request would result in a normal response pushed all the way to the browser that initiated the original request.

I'm not sure yet how good or bad an idea this would be, but I've been tinkering with a proof-of-concept implementation called experimental.promises to figure it out.

Of course, there are limits and issues to be aware of. Handling the same request twice is not free, which makes the approach effective only when some significant processing can be moved outside the worker threads. Also, because there may be other requests between the first and the second pass (freeing the worker to handle other requests is the whole point), the database may change between the passes (kind of breaking the MVCC promise). Finally, it's currently possible to write code that always sets new promises and ends up in a never-ending loop.

Anyway, if you are interested in trying out these approaches (at your own risk, of course), feel free to ask more via Twitter or IRC.

Cross-Browser Selenium testing with Robot Framework and Sauce Labs

How do you keep your Selenium tests up-to-date with your ever-changing user interface? Do you try to fix your existing tests, or do you just re-record them over and over again?

In the Plone Community, we have chosen the former approach (Plone is a popular open source CMS written in Python). We use a tool called Robot Framework to write our Selenium acceptance tests as maintainable BDD-style stories. Robot Framework's extensible test language allows us to describe Plone's features in natural language sentences, which can then be expanded into either our domain-specific or Selenium WebDriver API based testing language.

As an example, have a look at the following real-life acceptance test case on the next generation multilingual support in Plone:

*** Test Cases ***

Scenario: As an editor I can add new translation
    Given a site owner
      and a document in English
      and a document in Catalan
     When I view the Catalan document
      and I add the document in English as a translation
      and I switch to English
     Then I can view the document in English

*** Keywords ***

a site owner
    Enable autologin as  Manager

a document in English
    Create content  type=Document
    ...  container=/${PLONE_SITE_ID}/en/
    ...  id=an-english-document
    ...  title=An English Document

a document in Catalan
    Create content  type=Document
    ...  container=/${PLONE_SITE_ID}/ca/
    ...  id=a-catalan-document
    ...  title=A Catalan Document

I view the Catalan document
    Go to  ${PLONE_URL}/ca/a-catalan-document
    Wait until page contains  A Catalan Document

I add the document in English as a translation
    Click Element  css=#plone-contentmenu-multilingual .actionMenuHeader a
    Wait until element is visible  css=#_add_translations

    Click Element  css=#_add_translations
    Wait until page contains element
    ...  css=#formfield-form-widgets-content-widgets-query .searchButton

    Click Element  css=#formfield-form-widgets-content-widgets-query .searchButton
    Wait until element is visible  css=#form-widgets-content-contenttree a[href$='/plone/en']

    Click Element  css=#form-widgets-content-contenttree a[href$='/plone/en']
    Wait until page contains  An English Document

    Click link  xpath=//*[contains(text(), 'An English Document')]/parent::a
    Click Element  css=.contentTreeAdd

    Select From List  name=form.widgets.language:list  en
    Click Element  css=#form-buttons-add_translations
    Click Element  css=#contentview-view a
    Wait until page contains  A Catalan Document

I switch to English
    Click Link  English
    Wait until page contains  An English Document

I can view the document in English
    Page Should Contain Element
    ...  xpath=//*[contains(@class, 'documentFirstHeading')][./text()='An English Document']
    Page Should Contain Element
    ...  xpath=//ul[@id='portal-languageselector']/li[contains(@class, 'currentLanguage')]/a[@title='English']

About Robot Framework

Robot Framework is a generic keyword-driven test automation framework for acceptance testing and acceptance test-driven development. It's a neat tool by itself, yet its testing capabilities can be extended by implementing custom test libraries in either Python or Java – without any other limits.
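
For example, a minimal Python test library is just a class whose public methods become keywords; a hedged sketch (the library and keyword names are made up):

# MyLibrary.py
class MyLibrary(object):
    """Each public method becomes a keyword, e.g. "Values Should Match"."""

    def values_should_match(self, first, second):
        # Fail the test unless the two given values are equal.
        if first != second:
            raise AssertionError('%r != %r' % (first, second))

After importing it with Library  MyLibrary.py in the *** Settings *** section, the keyword can be used in test cases as Values Should Match  foo  foo.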

The super powers of Robot Framework come from its user keyword feature: in addition to the keywords provided by the extension libraries, users can create new higher-level keywords from the existing ones using the same syntax that is used for creating test cases. And, of course, everything can be parametrized with variables.

How could you cross-browser test your web applications with Robot Framework and its popular Selenium WebDriver keyword library?

Installing Robot Framework

To get started, we need Firefox and Python 2.7 (or Python 2.6) with the pip package manager and the virtualenv isolation package installed. For Linux distributions, all of these should be available directly from the system repositories (e.g. using apt-get install python-virtualenv in Ubuntu), but on OS X and Windows some extra steps are needed.

Once all these prerequisites are available, you can install Robot Framework and all the requirements for Selenium testing with:

$ virtualenv robot --no-site-packages
$ robot/bin/pip install robotframework-selenium2library

The installation process should look something like:

$ virtualenv robot --no-site-packages
New python executable in robot/bin/python
Installing Setuptools...done.
Installing Pip...done.

$ robot/bin/pip install robotframework-selenium2library
Downloading/unpacking robotframework-selenium2library
  Downloading robotframework-selenium2library-1.5.0.tar.gz (216kB): 216kB downloaded
  Running setup.py egg_info for package robotframework-selenium2library
  ...
Downloading/unpacking decorator>=3.3.2 (from robotframework-selenium2library)
  Downloading decorator-3.4.0.tar.gz
  Running setup.py egg_info for package decorator
  ...
Downloading/unpacking selenium>=2.32.0 (from robotframework-selenium2library)
  Downloading selenium-2.40.0.tar.gz (2.5MB): 2.5MB downloaded
  Running setup.py egg_info for package selenium
  ...
Downloading/unpacking robotframework>=2.6.0 (from robotframework-selenium2library)
  Downloading robotframework-2.8.4.tar.gz (579kB): 579kB downloaded
  Running setup.py egg_info for package robotframework
  ...
Downloading/unpacking docutils>=0.8.1 (from robotframework-selenium2library)
  Downloading docutils-0.11.tar.gz (1.6MB): 1.6MB downloaded
  Running setup.py egg_info for package docutils
  ...
Installing collected packages: robotframework-selenium2library, decorator, selenium, robotframework, docutils
  Running setup.py install for robotframework-selenium2library
  ...
  Running setup.py install for decorator
  ...
  Running setup.py install for selenium
  ...
  Running setup.py install for robotframework
  ...
  Running setup.py install for docutils
  ...
 Successfully installed robotframework-selenium2library decorator selenium robotframework
 docutils
 Cleaning up...

And we should end up having the Robot Framework executable installed at:

$ robot/bin/pybot

Writing a Selenium test suite in robot

In the following examples, we use Robot Framework's space separated plain text test format. In this format a simple test suite can be written in a single plain text file named with a .robot suffix. To maximize readability, only two or more spaces are required to separate the different test syntax parts in the same line.

In the first example, we:

  • import Selenium2Library to enable Selenium keywords (because only the built-in keywords are available by default)
  • define simple test setup and teardown keywords
  • implement a simple test case using the imported Selenium keywords
  • use a tag to categorize the test case
  • abstract the test with a variable to make it easier to update the test later.

Now, write the following complete Selenium test suite into a file named test_saucelabs_login.robot:

*** Settings ***

Library  Selenium2Library

Test Setup  Open test browser
Test Teardown  Close test browser

*** Variables ***

${LOGIN_FAIL_MSG}  Incorrect username or password.

*** Test Cases ***

Incorrect username or password
    [Tags]  Login
    Go to  https://saucelabs.com/login

    Page should contain element  id=username
    Page should contain element  id=password

    Input text  id=username  anonymous
    Input text  id=password  secret

    Click button  id=submit

    Page should contain  ${LOGIN_FAIL_MSG}

*** Keywords ***

Open test browser
    Open browser  about:

Close test browser
    Close all browsers

A standalone test suite may contain any of the four sections *** Settings ***, *** Variables ***, *** Test Cases *** and *** Keywords ***, but it must always contain the *** Test Cases *** section. To summarize the sections:

Settings
Imports all the used keyword libraries and user keyword resource files. Contains all test suite level configuration such as suite/test setup and teardown instructions.
Variables
Defines all suite level variables with their default values.
Test Cases
Contains all the test cases for the test suite.
Keywords
Contains all the suite level user keyword implementations.

For the complete list of available features for each of these sections, refer to the Robot Framework User Guide.

Running a robot test suite

The default Robot Framework test runner is called pybot. Next, we can execute our test suite and create a test report from the execution by typing:

$ robot/bin/pybot test_saucelabs_login.robot

Besides opening a web browser, our example test suite run should look like:

$ robot/bin/pybot test_saucelabs_login.robot
==============================================================================
Test Saucelabs Login
==============================================================================
Incorrect username or password                                        | PASS |
------------------------------------------------------------------------------
Test Saucelabs Login                                                  | PASS |
1 critical test, 1 passed, 0 failed
1 test total, 1 passed, 0 failed
==============================================================================
Output:  /.../output.xml
Log:     /.../log.html
Report:  /.../report.html

And the test run should result in an HTML test report file named report.html and a complete step-by-step test log file named log.html. The latter should look like:

[Screenshot of the step-by-step test log (log.html): http://1.bp.blogspot.com/-EgtubAVwqRs/UypsztN22oI/AAAAAAAAAjs/IvuzBPFIrMY/s1600/saucelabs-robot-log.png]

To see all the available options for the test runner, just type:

$ robot/bin/pybot --help

Writing a Sauce Labs Selenium test suite in robot

Now that we have a working Robot Framework installation and a functional test suite, we can continue to refactor the test suite to support cross-browser testing with Sauce Labs.

Let's update our test_saucelabs_login.robot to look like:

*** Settings ***

Library  Selenium2Library
Library  SauceLabs

Test Setup  Open test browser
Test Teardown  Close test browser

*** Variables ***

${BROWSER}  firefox
${REMOTE_URL}
${DESIRED_CAPABILITIES}

${LOGIN_FAIL_MSG}  Incorrect username or password.

*** Test Cases ***

Incorrect username or password
    [Tags]  Login
    Go to  https://saucelabs.com/login

    Page should contain element  id=username
    Page should contain element  id=password

    Input text  id=username  anonymous
    Input text  id=password  secret

    Click button  id=submit

    Page should contain  ${LOGIN_FAIL_MSG}

*** Keywords ***

Open test browser
    Open browser  about:  ${BROWSER}
    ...  remote_url=${REMOTE_URL}
    ...  desired_capabilities=${DESIRED_CAPABILITIES}

Close test browser
    Run keyword if  '${REMOTE_URL}' != ''
    ...  Report Sauce status
    ...  ${SUITE_NAME} | ${TEST_NAME}
    ...  ${TEST_STATUS}  ${TEST_TAGS}  ${REMOTE_URL}
    Close all browsers

All the things we changed:

  • a new keyword library called SauceLabs is imported
  • keyword Open test browser is abstracted to be configurable with variables to support running the tests at Sauce Labs
  • keyword Close test browser is enhanced to send the test details and result to Sauce Labs by calling the new Report Sauce status keyword; the ${SUITE_NAME}, ${TEST_NAME}, ${TEST_STATUS} and ${TEST_TAGS} arguments are Robot Framework's automatic variables.

Next, we must implement our custom Sauce Labs keyword library with Python by creating the following SauceLabs.py file to provide the new Report Sauce status keyword:

import re
import requests
import simplejson as json

from robot.api import logger
from robot.libraries.BuiltIn import BuiltIn

USERNAME_ACCESS_KEY = re.compile(r'^(http|https)://([^:]+):([^@]+)@')


def report_sauce_status(name, status, tags=[], remote_url=''):
    # Parse username and access_key from the remote_url
    assert USERNAME_ACCESS_KEY.match(remote_url), 'Incomplete remote_url.'
    username, access_key = USERNAME_ACCESS_KEY.findall(remote_url)[0][1:]

    # Get selenium session id from the keyword library
    selenium = BuiltIn().get_library_instance('Selenium2Library')
    job_id = selenium._current_browser().session_id

    # Prepare payload and headers
    token = (':'.join([username, access_key])).encode('base64').strip()  # Python 2 only
    payload = {'name': name,
               'passed': status == 'PASS',
               'tags': tags}
    headers = {'Authorization': 'Basic {0}'.format(token)}

    # Put test status to Sauce Labs
    url = 'https://saucelabs.com/rest/v1/{0}/jobs/{1}'.format(username, job_id)
    response = requests.put(url, data=json.dumps(payload), headers=headers)
    assert response.status_code == 200, response.text

    # Log video url from the response
    video_url = json.loads(response.text).get('video_url')
    if video_url:
        logger.info('<a href="{0}">video.flv</a>'.format(video_url), html=True)
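
Robot Framework maps the module-level Python function report_sauce_status to the Report Sauce status keyword used in our test teardown, since keyword matching ignores case and treats spaces and underscores alike. As a purely illustrative sanity check (not part of the tutorial itself), the credential-parsing regular expression can be exercised on its own:

# Illustrative sanity check only: confirm that the regular expression
# used in SauceLabs.py extracts the username and access key from a
# Sauce Labs OnDemand remote URL the way report_sauce_status() expects.
import re

USERNAME_ACCESS_KEY = re.compile(r'^(http|https)://([^:]+):([^@]+)@')

remote_url = 'http://USERNAME:ACCESS_KEY@ondemand.saucelabs.com:80/wd/hub'
username, access_key = USERNAME_ACCESS_KEY.findall(remote_url)[0][1:]
assert (username, access_key) == ('USERNAME', 'ACCESS_KEY')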

Finally, we must install a couple of required Python libraries into our Python virtual environment with:

$ robot/bin/pip install simplejson requests

We are almost there!

Running a robot test suite with Sauce Labs

Once we have abstracted our test suite to support Sauce Labs through configurable suite variables, we can run the tests locally, at Sauce Labs, or against different browsers at Sauce Labs, simply by passing different arguments to the Robot Framework test runner.

  1. To run the test suite locally, we simply type:

    $ robot/bin/pybot test_saucelabs_login.robot
    
  2. To run the test at Sauce Labs, we pass the Sauce Labs OnDemand address as ${REMOTE_URL} variable by using -v argument supported by the test runner:

    $ robot/bin/pybot -v REMOTE_URL:http://USERNAME:ACCESS_KEY@ondemand.saucelabs.com:80/wd/hub test_saucelabs_login.robot
    

    Make sure to replace USERNAME and ACCESS_KEY with your Sauce Labs account username and its current access key!

  3. To change the Sauce Labs test browser or platform, we just need to add another variable with -v to define the required browser in the ${DESIRED_CAPABILITIES} variable passed to Selenium.

    The only trick is to know the format expected by the Selenium keyword library: a comma-separated string of key:value pairs for the desired WebDriver capabilities (see the illustrative sketch after this list).

    The full command to run the test suite with the iPhone (iOS 7) browser at Sauce Labs would look like:

    $ robot/bin/pybot -v DESIRED_CAPABILITIES:"platform:OS X 10.9,browserName:iphone,version:7" -v REMOTE_URL:http://USERNAME:ACCESS_KEY@ondemand.saucelabs.com:80/wd/hub test_saucelabs_login.robot
    
  4. And as icing on the cake, we can also include our CI build number, just by adding the parameter to our desired capabilities string:

    $ robot/bin/pybot -v DESIRED_CAPABILITIES:"build:demo,platform:OS X 10.9,browserName:iphone,version:7" -v REMOTE_URL:http://USERNAME:ACCESS_KEY@ondemand.saucelabs.com:80/wd/hub test_saucelabs_login.robot
    
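To make the desired capabilities string a bit more concrete, here is a small, purely illustrative Python sketch (not how Selenium2Library itself parses it) of how such a comma-separated string of key:value pairs maps onto a WebDriver capabilities dictionary:

# Illustrative only: turn a comma-separated "key:value" capabilities
# string into a dictionary, to show what the DESIRED_CAPABILITIES
# value expresses to the remote WebDriver.
def parse_capabilities(caps_string):
    capabilities = {}
    for item in caps_string.split(','):
        key, _, value = item.partition(':')
        capabilities[key.strip()] = value.strip()
    return capabilities

caps = parse_capabilities('build:demo,platform:OS X 10.9,browserName:iphone,version:7')
assert caps == {'build': 'demo', 'platform': 'OS X 10.9',
                'browserName': 'iphone', 'version': '7'}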

This is how our final tests would look in the Sauce Labs test table, with test names, tags, build numbers, results and all the stuff!

[Screenshot of the Sauce Labs test table: http://3.bp.blogspot.com/-UsrXUoyBFs0/UyptB8AcAqI/AAAAAAAAAj0/mrPBn_17xtU/s1600/saucelabs-table.png]

Quite nice, isn't it?

P.S. The final example can be downloaded as a gist at: https://gist.github.com/datakurre/9589707


Written by Asko Soukka – an occasional Plone core contributor and a full time web developer at University of Jyväskylä, Finland.