Legacy content import

Pages are migrated from Drupal Page nodes to Wagtail pages via a series of dedicated Django management commands, similar to those used for CIS Replication.

All of the Wagtail page types that are populated by an importer have a few additional fields to help with migration:

  • drupal_id: An IntegerField that stores the original 'node id' value of the equivalent page in Drupal.
  • last_imported_at: A DateTimeField that stores when the page was last updated by its respective importer. Importers use this value to avoid unnecessarily re-importing content that is already up to date in Wagtail.
  • legacy_path: A CharField that stores the 'path' of the page in the original site. Most importers use this value to determine the slug for new pages and to find a suitable parent page. If no parent page exists with an equivalent URL, or another page already exists with the same slug, the importer saves the page elsewhere (usually below the homepage). A fixup command later uses the value to reparent or rename the page, once the tree is better populated.
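The placement rule described for legacy_path can be sketched in plain Python. This is only an illustration, not the importers' actual code: the function names, the use of a set of existing page paths, and the empty string standing in for the homepage are all assumptions made for the example.

```python
from typing import Set, Tuple


def split_legacy_path(legacy_path: str) -> Tuple[str, str]:
    """Return (parent_path, slug) for a legacy path such as 'art/artists/turner'."""
    parts = [p for p in legacy_path.strip("/").split("/") if p]
    if not parts:
        raise ValueError("legacy_path has no segments")
    return "/".join(parts[:-1]), parts[-1]


def choose_placement(
    legacy_path: str,
    existing_paths: Set[str],
    fallback_parent: str = "",  # hypothetical: "" stands in for the homepage
) -> Tuple[str, str, bool]:
    """Return (parent_path, slug, needs_fixup) for a page about to be created."""
    parent_path, slug = split_legacy_path(legacy_path)
    target = f"{parent_path}/{slug}" if parent_path else slug
    parent_exists = parent_path == fallback_parent or parent_path in existing_paths
    slug_free = target not in existing_paths
    if parent_exists and slug_free:
        return parent_path, slug, False
    # No equivalent parent yet, or the slug is taken: park the page under the
    # fallback parent and leave it for a fixup command to move later.
    return fallback_parent, slug, True
```

For example, choose_placement("art/artists/turner", {"art"}) parks the page under the fallback parent and flags it for fixup, because "art/artists" does not exist yet.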

There are two types of import command:

1. Content import commands

These read content from the Tate API's 'nodes' endpoint (https://api.tate.org.uk/nodes/), using the "?facets=type:contenttypename" filter to fetch results of a specific Drupal content type, and turn the results into Wagtail pages.
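As a rough sketch, the request URL for one Drupal content type could be built like this. The endpoint and the facets filter come from the description above; the function name and the example content type are hypothetical, and any other query parameters the API may require are not shown.

```python
from urllib.parse import urlencode

NODES_ENDPOINT = "https://api.tate.org.uk/nodes/"


def build_nodes_url(content_type: str) -> str:
    """Build the 'nodes' URL that filters results to one Drupal content type."""
    # safe=":" keeps the colon in "type:<name>" unescaped, matching the
    # "?facets=type:contenttypename" form described above.
    query = urlencode({"facets": f"type:{content_type}"}, safe=":")
    return f"{NODES_ENDPOINT}?{query}"


print(build_nodes_url("page"))
```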

Thankfully, the API results include a 'changed' timestamp value, so if the commands need to be re-run for any reason, only rows that have changed since the last time the importer updated the equivalent Wagtail page will be processed.
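The skip logic amounts to comparing the API's 'changed' value with the page's last_imported_at. The sketch below assumes 'changed' is a Unix timestamp (as Drupal usually stores it) and that last_imported_at is a timezone-aware datetime; both are assumptions, and the function name is hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def needs_import(changed: int, last_imported_at: Optional[datetime]) -> bool:
    """Return True if the Drupal node should be (re-)imported."""
    if last_imported_at is None:
        # The page has never been imported.
        return True
    # Assumption: 'changed' is seconds since the Unix epoch, in UTC.
    changed_at = datetime.fromtimestamp(changed, tz=timezone.utc)
    return changed_at > last_imported_at
```

A row whose 'changed' value is not later than last_imported_at is skipped, which is what keeps re-runs cheap.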

Most commands translate Drupal nodes to a single Wagtail page type. Others convert content to multiple Wagtail page types, depending on a number of factors.

The commands are non-destructive: they should never delete a Wagtail page, only create or update one.

2. Fixup commands

These commands are run at the end of the migration process to fix problems that couldn't be solved at the time the pages were originally imported. For example:

  • A page couldn't be created in a certain part of the tree, because the pages that make up that section hadn't been migrated yet
  • The page content referenced a number of pages that hadn't yet been migrated
  • The page content referenced images or documents that hadn't yet been migrated

The fixup commands read imported data directly out of the database, and save any changes back to it, making them faster and more reliable than content importers.
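To illustrate the first kind of fixup, the sketch below finds pages that were saved in the wrong place but whose intended parent now exists. It works on in-memory records rather than the database, and the PageRecord class, its current_path field, and the function name are hypothetical stand-ins, not the real fixup command.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PageRecord:
    legacy_path: str   # path of the page on the original site
    current_path: str  # where the importer actually saved the page


def find_reparentable(pages: List[PageRecord]) -> List[Tuple[PageRecord, str]]:
    """Return (page, parent_path) pairs for misplaced pages that can now be moved."""
    current_paths = {p.current_path for p in pages}
    movable = []
    for page in pages:
        wanted = page.legacy_path.strip("/")
        if page.current_path == wanted:
            continue  # already where the legacy path says it should be
        parent_path = wanted.rpartition("/")[0]
        # Move only when the parent exists and the target path is still free.
        if parent_path in current_paths and wanted not in current_paths:
            movable.append((page, parent_path))
    return movable
```

Pages whose intended parent is still missing are left alone, so the pass can be repeated as more of the tree is populated.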

Migration process

Before running the import commands, run the following command to put all of the relevant structural pages in place:

python manage.py create_structural_pages

Then run each of the commands below, in the order they are listed:

TBC