Skip to content

oxfamamerica/oxfama.transmogrifier

Repository files navigation

oxfama.transmogrifier

Tranmogrifier pipelines for the analysis and export of Plone content from the Oxfam America website.

Based on collective.transmogrifier and related tools.

What do You Get?

This package provides a transmogrifier export pipeline which will dump the field values of the schema of all of the items in the oxfam america website into a set of CSV files, organized by object type. Each type of object which has a schema will have its own csv file. Each file will have a header row which lists the fields output in alphabetical order. Each row after the header will contain the values for the given fields, if any, present on a single object within the site. In addition to schema fields, the path to the object within the plone site and the UID and UUID (if present) will be provided.

For file fields (images, files, etc), the value will be a simple string indicating that the field is a binary file field. No file data will be contained in the csv export. File data will be exported separately (see below).

For reference fields, the type and UID of the referenced object(s) will be listed. This information can be used to look up the referenced object. Simply find the row with the given UID value in the csv file named for the type.

You can control which fields are output (see Controlling the Pipeline below).

All binary file data (images, media, .pdf, .doc, etc.) will be written out to a filesystem structure that mirrors the Plone site structure. Each file will be located at the same path where it is found in the site. The sole exception to this is if a content object has more than one binary file field, in which case a folder named for the Plone object will be created and each of the files will be written to that folder.

Installing This Package

As this package is unreleased, installing it requires the mr.developer buildout extension.

First, include the package in the [sources] section of your buildout. The source is in a private repository, so unless you have ssh authentication for git set up on the machine where you are working, you'll need to use HTTP authentication:

oxfama.transmogrifier = git https://github.com/oxfamamerica/oxfama.transmogrifier.git

Next, include the package name in the eggs = section of the buildout:

eggs =
    ...
    oxfama.transmogrifier

Once this is done, you can re-run buildout. You should also see collective.transmogrifier and quintagroup.transmogrifier installed as dependencies of this package.

This package does not provide an installable Generic Setup profile, so you should not expect to see it listed either in the Add-ons panel in Plone Site Setup or in the list of available add-ons when you add a new Plone site.

To verify that the package is properly installed, go to the ZMI (Zope Management Interface). Find and click on the portal_setup tool and then click on the import tab. Look for the Oxfam America Site Dump profile listed in the drop-down list of available Profiles or Snapshots at the top.

Running the Export Pipeline

To run the export dump, you'll need to use the portal_setup as described above.

  • Go to the ZMI
  • Find and click on the portal_setup tool
  • Click on the Import tab
  • Find and select the Oxfam America Site Dump profile in the Select Profile or Snapshot dropdown.
  • Find the Run transmogrifier pipeline step in the list of available import steps. Click the checkbox to select it.
  • At the bottom of the page, unselect the Include dependencies checkbox.
  • Click the import selected steps button.

The export will run for some time. You can see progress in the terminal if you are running the site in fg mode.

After all items are dumped, the CSV files will be written to the destination provided in the pipeline configuration (see Controlling the Pipeline below)

Controlling the Pipeline

There are a number of settings you can control for the pipeline. These controls are available by editing the content_to_csv.cfg configuration file in this package. This file is located in oxfama/transmogrifier/export.

The file is organized into a series of sections, each delineated by a [name] in square brackets.

The [transmogrifier] section provides a list of the pipeline sections that make up the export pipeline as well as the local_destination which should be an absolute filesystem path to the folder where the CSV output files will be written:

[transmogrifier]
# configure pipeline and other required information
pipeline =
    source
    schemadumper
    writer
    logger

local_destination = /absolute/path/to/folder

It is vital to ensure that the user running the Plone process have write access to the destination folder. If no local destination is provided, the system will attempt to use /tmp.

The [source] section provides settings for the SiteWalkerSection that is used to walk the object graph of the site and read the contents. You can provide this section a starting path, if you wish to focus only on one section of the website. This path should be absolute, based from the ZMI (/). The object identified by the path will be included. The walk can be limited using the limit setting. At most this number of objects will be dumped. If the setting is omitted, or set to 0, all objects will be dumped:

[source]
blueprint = oxfama.transmogrifier.sitewalker
start-path = /oxfam
limit = 1000

The [schemadumper] section controls the way in which the schema of objects is converted to a csv row. You may provide a list of fields to exclude. If the named field exists in an object's schema, it will be left out of the final csv file for that type. This can be used to control the volume of data dumped:

[schemadumper]
blueprint = oxfama.transmogrifier.schemadumper
exclude =
    text
    description
    file
    image

The [filewriter] section controls the writing of binary file data to the filesystem. The path variable (which defaults to the local_destination value set above) determines the root filesystem path on the server to which all files will be written. This folder must exist and be writable by the user running Plone. No default location will be set if this value is omitted. The context variable determines how the files will be exported. Leave this set to directory to have all files written to the filesystem.

The [writer] section controls the writing of csv files to the filesystem. There are no settings currently available for this section.

The [logger] section controls the writing of log output for each object as it is passed through the pipeline. You can provide a list of the keys for each item that will be written to the log line.

To omit any section, simply remove its name from the pipeline setting in the [transmogrifier] section of the configuration file.

About

A place to hold our scripts that deal with Plone

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages