Tate website data sources¶

A number of different sources supply the website with the data it requires. These consist of databases and processes within the Tate network, for the most part managed by the Technology department.

Diagram of Tate data sources

eMaintenance¶

eMaintenance is a database used by the Visitor Experience team to record artworks currently on display in all four galleries. Records are updated every Friday afternoon.

iBase¶

iBase is Tate's Digital Asset Management System (DAMS) for storing and accessing photography of its collection and archive.

ImageNet¶

ImageNet is the partner site to iBase, used to manage all non-collection photography, such as images of installations and events, Tate buildings and estate, or any photography not specific to a collection or archive item.

T-Drive & TextLoader¶

Detailed information about each collection item, such as display captions and summaries, artist biographies and catalogue entries, is saved on the T-Drive in Tate's intranet. A process called TextLoader syncs text from files in a watched folder to the CIS database.
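The watched-folder idea can be illustrated with a minimal polling sketch. This is not TextLoader's actual implementation, which is not documented here. The function name `changed_files` and the use of modification times to detect changes are assumptions made for illustration only.

```python
from pathlib import Path


def changed_files(folder: Path, last_seen: dict[str, float]) -> list[Path]:
    """Return files in `folder` that are new or modified since the last scan.

    `last_seen` maps file names to the modification time recorded on the
    previous scan and is updated in place. (Hypothetical helper, not part
    of TextLoader.)
    """
    changed = []
    for path in sorted(folder.iterdir()):
        if not path.is_file():
            continue
        mtime = path.stat().st_mtime
        if last_seen.get(path.name) != mtime:
            last_seen[path.name] = mtime
            changed.append(path)
    return changed
```

A sync process built this way would call `changed_files` on a timer and push only the returned files to the database, rather than reloading every text file on each pass.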

ImageLoader¶

Running on a JavaNet server, ImageLoader is a script that generates four image sizes (labelled 7, 8, 9 and 10) from collection photography stored in iBase and saves them to a shared drive in the WHAM stack every Friday at 6 PM. A separate ImageOrganiser script sorts images that lack copyright clearance into a special folder so that they do not appear on the website. The images folder on the shared drive is synced to an Azure Blob Storage container, which is served at media.tate.org.uk.
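The copyright sorting step might look something like the sketch below. It is an illustration under stated assumptions, not the ImageOrganiser script itself: the function name `quarantine_uncleared`, the idea that an image's ID is its file name without the extension, and the `cleared_ids` set are all hypothetical.

```python
import shutil
from pathlib import Path


def quarantine_uncleared(
    images_dir: Path, holding_dir: Path, cleared_ids: set[str]
) -> list[str]:
    """Move images whose ID is not copyright-cleared into `holding_dir`.

    Assumption: the image ID is the file name without its extension.
    Returns the names of the files that were moved.
    """
    holding_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(images_dir.iterdir()):
        # Skip sub-folders (including the holding folder, if nested here).
        if path.is_file() and path.stem not in cleared_ids:
            shutil.move(str(path), str(holding_dir / path.name))
            moved.append(path.name)
    return moved
```

Because uncleared files leave the images folder before it is synced, they would never reach the public storage container in a setup like this.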

WHAM stack¶

The outcome of a 'Web Hosting Azure Migration' in 2019, the WHAM stack comprises a number of virtual machines hosted on Microsoft's Azure platform. There are two stacks: production and pre-production. These VMs are gradually being replaced by various services in the Wagtail Azure account.

~~Frontend~~ (Legacy Django frontend server, now replaced by Wagtail frontend running in Azure App Service)
~~CMS~~ (Legacy Drupal CMS server, replaced by Wagtail CMS)
~~MySQL~~ (Database storing Drupal data, replaced by PostgreSQL DB service in Wagtail Azure)
API (serves Tate's API v1)
Varnish (runs the Varnish caching service)

CALM¶

CALM is the Library and Archive management system. Its data is fed to TMS.

CRAM¶

The 'Collection Research Assets Manager' contains the data for some legacy research publications, including the extensive Turner research project, comprising some 40,000 objects. CRAM supplies data for the rather ancient Research Pages frontend, which is built in the Java-based RSF framework.

TMS¶

TMS (The Museum System) is proprietary software used by Tate for collection management. In most cases it is the 'source of truth' for all Tate's collection objects (artworks) and constituents (artists). It also contains historical data on loans, exhibitions and shipping, as well as legal information relating to the collection.

CIS¶

CIS (Collection Information System) is where the magic happens. It's an Oracle database on a JavaNet server that unifies the multiple sources listed above into a single source (in the form of a Lucene index) comprising the base data that feeds the website. The CIS database is refreshed every Sunday evening and the new indices are copied to the Research VM on the WHAM stack, where they are fed into the API (v1) build.

API v1¶

Prior to the Wagtail migration, Tate's legacy API supplied the Django frontend with all the data it needed: a mix of information from CIS and the Drupal CMS. The latter has now been decommissioned. The API is powered by an Elasticsearch instance, which rebuilds its index every Monday at 5 AM via a 'cis2json' script that resides on WHAM's Research VM. The script translates CIS's Lucene indices into JSON files. The process is described in depth here, and more detailed legacy API documentation can be found here.
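Since several of the processes on this page run on a fixed weekly schedule, it can help to work out when a change will next be picked up. The sketch below is a hypothetical helper, not part of any Tate system. It uses only the two schedules for which this page gives an exact time (ImageLoader on Friday at 6 PM and the API index rebuild on Monday at 5 AM) and ignores time zones.

```python
from datetime import datetime, time, timedelta


def next_weekly_run(now: datetime, weekday: int, at: time) -> datetime:
    """Return the next datetime strictly after `now` on `weekday` at `at`.

    `weekday` follows datetime.weekday(): Monday is 0, Sunday is 6.
    `now` is expected to be a naive datetime (no time zone).
    """
    days_ahead = (weekday - now.weekday()) % 7
    candidate = datetime.combine(now.date() + timedelta(days=days_ahead), at)
    if candidate <= now:
        # Today's slot has already passed, so use next week's.
        candidate += timedelta(days=7)
    return candidate


# Schedules taken from this page (weekday, time of day).
IMAGELOADER = (4, time(18, 0))  # Friday 6 PM
API_REBUILD = (0, time(5, 0))   # Monday 5 AM
```

For example, `next_weekly_run(datetime.now(), *API_REBUILD)` gives the earliest point at which refreshed CIS data could appear in the API v1 index.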