Transmogrifier, the Python migration pipeline, also for Python 3
TL;DR: I forked collective.transmogrifier into just transmogrifier (not yet released) to make its core usable without Plone dependencies, use Chameleon for TAL expressions, be installable with just pip install, and be compatible with Python 3.
Transmogrifier is one of the many great developer tools by the Plone community. It’s a generic pipeline tool for data manipulation, configurable with plain-text INI files, while new re-usable pipeline section blueprints can be implemented and packaged in Python. It can be used to process any number of things, but historically it has mainly been developed and used as a pluggable way to import legacy content into Plone.
A simple transmogrifier pipeline for dumping news from Slashdot to a CSV file could look like:
[transmogrifier]
pipeline =
    from_rss
    to_csv

[from_rss]
blueprint = transmogrifier.from
modules = feedparser
expression = python:modules['feedparser'].parse(options['url']).get('entries', [])
url = http://rss.slashdot.org/slashdot/slashdot

[to_csv]
blueprint = transmogrifier.to_csv
fieldnames =
    title
    link
filename = slashdot.csv
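To make the pipeline concrete, here's roughly what those two sections amount to in plain Python (my sketch, not transmogrifier's own code; it assumes feedparser has been installed):

import csv

import feedparser  # the module named in the from_rss section

# from_rss: evaluate the expression into a list of entry dicts
entries = feedparser.parse(
    'http://rss.slashdot.org/slashdot/slashdot').get('entries', [])

# to_csv: write only the configured fieldnames, ignoring extra keys
with open('slashdot.csv', 'w') as output:
    writer = csv.DictWriter(
        output, fieldnames=['title', 'link'], extrasaction='ignore')
    writer.writeheader()
    writer.writerows(entries)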
Actually, at the time of writing this, I’ve yet to do any Plone migrations using transmogrifier. But when we recently had a reasonably sized non-Plone migration task, I knew not to re-invent the wheel, but to transmogrify it. And we succeeded. The transmogrifier pipeline helped us to design the migration better, and splitting the data processing into multiple pipeline sections made it easy to divide the work between multiple developers.
Unfortunately, collective.transmogrifier currently has unnecessary dependencies on CMFCore, is not installable without a long known-good set of versions, and is missing any built-in command-line interface. At first, I tried to do all the necessary refactoring inside collective.transmogrifier, but eventually a fork was required to make the transmogrifier core usable outside Plone environments, be compatible with Python 3, and not break any existing workflows depending on the old transmogrifier.
So, meet the new transmogrifier:
- can be installed with pip install (although not yet released on PyPI)
- new mr.migrator-inspired command-line interface (see transmogrify --help for all the options)
- new base classes for custom blueprints: transmogrifier.blueprints.Blueprint and transmogrifier.blueprints.ConditionalBlueprint
- new ZCML-directives for registering blueprints and re-usable pipelines (see the filled-in sketch after this list):
  <transmogrifier:blueprint component="" name="" />
  <transmogrifier:pipeline id="" name="" description="" configuration="" />
- uses Chameleon for TAL-expressions (e.g. in ConditionalBlueprint)
- has only a few generic built-in blueprints
- supports z3c.autoinclude (with target package transmogrifier)
- fully backwards compatible with blueprints for collective.transmogrifier
- runs with Python >= 2.6, including Python 3+
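Filled in, the blueprint directive could look roughly like the following sketch (it registers the FakerContacts class from the blueprints.py example further below; the namespace URI here is my assumption, so check the package's meta.zcml for the real one):

<configure
    xmlns="http://namespaces.zope.org/zope"
    xmlns:transmogrifier="http://namespaces.plone.org/transmogrifier">

  <!-- register FakerContacts under the blueprint name faker_contacts -->
  <transmogrifier:blueprint
      component=".blueprints.FakerContacts"
      name="faker_contacts"
      />

</configure>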
There’s still much work to do before a real release (e.g. documenting and testing the new CLI-script and the new built-in blueprints), but let’s see how it already works…
P.S. Please use a clean Python virtualenv for these examples.
Example pipeline
Let’s start with an easy installation
$ pip install git+https://github.com/datakurre/transmogrifier
$ transmogrify --help
Usage: transmogrify <pipelines_and_overrides>...
                    [--overrides=<overrides.cfg>]
                    [--include=<package_or_module>...]
                    [--include=<package:filename>...]
                    [--context=<package.module.factory>]
       transmogrify --list
                    [--include=<package_or_module>...]
       transmogrify --show=<pipeline>
                    [--include=<package_or_module>...]
and with an example filesystem pipeline.cfg
[transmogrifier]
pipeline =
    from_rss
    to_csv

[from_rss]
blueprint = transmogrifier.from
modules = feedparser
expression = python:modules['feedparser'].parse(options['url']).get('entries', [])
url = http://rss.slashdot.org/slashdot/slashdot

[to_csv]
blueprint = transmogrifier.to_csv
fieldnames =
    title
    link
filename = slashdot.csv
and its dependencies
$ pip install feedparser
and the results
$ transmogrify pipeline.cfg
INFO:transmogrifier:CSVConstructor:to_csv wrote 25 items to /.../slashdot.csv
using, for example, Python 2.7 or Python 3.4.
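To double-check what ended up in the file, a few lines of Python are enough (assuming the pipeline above has been run in the current directory):

import csv

# Print the two configured fields of each row written by to_csv
with open('slashdot.csv') as csvfile:
    for row in csv.DictReader(csvfile):
        print(row['title'], row['link'])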
Minimal migration project
Let’s create an example migration project with custom blueprints using Python 3. In addition to transmogrifier, we need venusianconfiguration for easy blueprint registration and, of course, the actual dependencies for our blueprints:
$ pip install git+https://github.com/datakurre/transmogrifier
$ pip install git+https://github.com/datakurre/venusianconfiguration
$ pip install fake-factory
Now we can implement custom blueprints in, for example, blueprints.py
from venusianconfiguration import configure

from transmogrifier.blueprints import Blueprint
from faker import Faker


@configure.transmogrifier.blueprint.component(name='faker_contacts')
class FakerContacts(Blueprint):
    def __iter__(self):
        # Pass through the items from the previous sections as such
        for item in self.previous:
            yield item

        # Then yield the configured amount of new fake contacts
        amount = int(self.options.get('amount', '0'))
        fake = Faker()
        for i in range(amount):
            yield {
                'name': fake.name(),
                'address': fake.address()
            }
and see them registered next to the built-in ones (or those from other packages hooking into the transmogrifier autoinclude entry point):
$ transmogrify --list --include=blueprints
Available blueprints
--------------------
faker_contacts
...
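Blueprints can, of course, also transform items already flowing through the pipeline. Just as a sketch (this hypothetical uppercase_names section is not part of the example project), such a blueprint would iterate self.previous and yield the modified items:

from venusianconfiguration import configure

from transmogrifier.blueprints import Blueprint


@configure.transmogrifier.blueprint.component(name='uppercase_names')
class UppercaseNames(Blueprint):
    def __iter__(self):
        # Modify items in place as they pass through this section
        for item in self.previous:
            if 'name' in item:
                item['name'] = item['name'].upper()
            yield item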
Now, we can make an example pipeline.cfg
[transmogrifier]
pipeline =
    from_faker
    to_csv

[from_faker]
blueprint = faker_contacts
amount = 2

[to_csv]
blueprint = transmogrifier.to_csv
and enjoy the results
$ transmogrify pipeline.cfg to_csv:filename=- --include=blueprints
address,name
"534 Hintz Inlet Apt. 804
Schneiderchester, MI 55300",Dr. Garland Wyman
"44608 Volkman Islands
Maryleefurt, AK 42163",Mrs. Franc Price DVM
INFO:transmogrifier:CSVConstructor:to_csv saved 2 items to -
An alternative would be to just use the shipped mr.bob-template…
Migration project using the template
The new transmogrifier ships with an easy getting-started template for your custom migration project. To use the template, you need a Python environment with mr.bob and the new transmogrifier:
$ pip install mr.bob readline # readline is an implicit mr.bob dependency
$ pip install git+https://github.com/datakurre/transmogrifier
Then you can create a new project directory with:
$ mrbob bobtemplates.transmogrifier:project
Once the new project directory is created, you can install the rest of the dependencies and activate the project from inside the directory with:
$ pip install -r requirements.txt
$ python setup.py develop
Now transmogrify knows your project’s custom blueprints and pipelines:
$ transmogrify --list
Available blueprints
--------------------
myprojectname.mock_contacts
...
Available pipelines
-------------------
myprojectname_example
Example: Generates uppercase mock addresses
And the example pipeline can be executed with:
$ transmogrify myprojectname_example
name,address
ISSAC KOSS I,"PSC 8465, BOX 1625
APO AE 97751"
TESS FAHEY,"PSC 7387, BOX 3736
APO AP 13098-6260"
INFO:transmogrifier:CSVConstructor:to_csv wrote 2 items to -
Please see the created README.rst for how to edit the example blueprints and pipelines, and how to create more.
Mandatory example with Plone
Using the new transmogrifier with Plone should be as simple as adding it into your buildout.cfg next to the old transmogrifier packages:
[buildout]
extends = http://dist.plone.org/release/4.3-latest/versions.cfg
parts = instance plonesite
versions = versions
extensions = mr.developer
sources = sources
auto-checkout = *

[sources]
transmogrifier = git https://github.com/datakurre/transmogrifier

[instance]
recipe = plone.recipe.zope2instance
eggs =
    Plone
    z3c.pt
    transmogrifier
    collective.transmogrifier
    plone.app.transmogrifier
user = admin:admin
zcml = plone.app.transmogrifier

[plonesite]
recipe = collective.recipe.plonesite
site-id = Plone
instance = instance

[versions]
future =
setuptools =
zc.buildout =
Let’s also write a fictional migration pipeline, which would create Plone content from the Slashdot RSS feed:
[transmogrifier]
pipeline =
    from_rss
    id
    fields
    folders
    create
    update
    commit

[from_rss]
blueprint = transmogrifier.from
modules = feedparser
expression = python:modules['feedparser'].parse(options['url']).get('entries', [])
url = http://rss.slashdot.org/Slashdot/slashdot

[id]
blueprint = transmogrifier.set
modules = uuid
id = python:str(modules['uuid'].uuid4())

[fields]
blueprint = transmogrifier.set
portal_type = string:Document
text = path:item/summary
_path = string:slashdot/${item['id']}

[folders]
blueprint = collective.transmogrifier.sections.folders

[create]
blueprint = collective.transmogrifier.sections.constructor

[update]
blueprint = plone.app.transmogrifier.atschemaupdater

[commit]
blueprint = transmogrifier.transform
modules = transaction
commit = modules['transaction'].commit()
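The transmogrifier.set sections above are just expression-driven item updates; for example, the id section amounts to roughly the following generator (my sketch, not the blueprint's actual implementation):

import uuid


def set_id(previous):
    # Equivalent of the [id] section: stamp each item with a fresh UUID
    for item in previous:
        item['id'] = str(uuid.uuid4())
        yield item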
Now, the new CLI-script can be used together with bin/instance -Ositeid run provided by plone.recipe.zope2instance, so that transmogrifier gets your Plone site as its context simply by calling zope.component.hooks.getSite:
$ bin/instance -OPlone run bin/transmogrify pipeline.cfg --context=zope.component.hooks.getSite
With Plone you should, of course, still use Python 2.7.
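The --context factory doesn't have to be getSite: any importable callable returning the desired context object should do. A sketch with a hypothetical module name (assuming the factory is called without arguments, like getSite above):

# myproject/context.py (hypothetical)
from zope.component.hooks import getSite


def get_plone_site():
    # Return the site activated by bin/instance -O<site-id> run
    site = getSite()
    assert site is not None, 'no active site; use bin/instance -O<id> run'
    return site

This would then be used with --context=myproject.context.get_plone_site.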
Funnelweb example with Plone
Funnelweb is a collection of transmogrifier blueprints and pipelines for scraping any web site into Plone. I’ve heard that its example pipelines are a little outdated, but they make a nice demo anyway.
Let’s extend our previous Plone example with the following funnelweb.cfg buildout to include all the necessary transmogrifier blueprints and the example funnelweb.ttw pipeline:
[buildout]
extends = buildout.cfg

[instance]
eggs +=
    transmogrify.pathsorter
    funnelweb
We also need a small additional pipeline commit.cfg to commit all the changes made by funnelweb.ttw:
[transmogrifier]
pipeline = commit

[commit]
blueprint = transmogrifier.interval
modules = transaction
expression = python:modules['transaction'].commit()
Now, after the buildout has been run, the following command would use the pipelines funnelweb.ttw and commit.cfg to somewhat scrape my blog into Plone:
$ bin/instance -OPlone run bin/transmogrify funnelweb.ttw commit.cfg crawler:url=http://datakurre.pandala.org "crawler:ignore=feeds\ncsi.js" --context=zope.component.hooks.getSite
For tuning the import further, the used pipelines can easily be exported to the filesystem, customized, and then executed in the same way as commit.cfg:
$ bin/instance -OPlone run bin/transmogrify --show=funnelweb.ttw > myfunnelweb.cfg
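After customizing myfunnelweb.cfg, it should run like any other filesystem pipeline (a sketch following the earlier commands; the crawler options are again given as overrides):

$ bin/instance -OPlone run bin/transmogrify myfunnelweb.cfg commit.cfg crawler:url=http://datakurre.pandala.org --context=zope.component.hooks.getSite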