
Caching strategy

This document explores the different cache mechanisms used by the project, and explains how they are used and evaluated.

1. Downstream caching

All hosted versions of the project use a downstream cache (or reverse proxy) to keep server activity to a minimum, and to provide faster access to already-requested content. This approach works especially well for Tate, because front-end users mostly see exactly the same thing when requesting most URLs across the site. Generally, the more user-specific or request-specific responses are, the less suitable they are for this type of caching.

In Tate's case, this service is provided by Cloudflare, which offers much more than just caching. For the sake of this article, though, caching is all we'll focus on.

TIP: If you need to implement changes that might significantly jeopardize the usefulness of the downstream cache, consider having the generic content load initially, and updating only the relevant parts with JavaScript.

How does it work?

Downstream caches sit 'in front of' the application and intercept every request. They attempt to find a suitable cached response to send back, without bothering the application server at all.

Where a cached response is unavailable (or unsuitable for sharing), requests are allowed through to the application server, and a response is generated in the usual way. Before handing the response back to the requester though, the downstream cache saves a copy, in case it can be reused.

The application tells the downstream cache which responses are suitable for caching (and under what circumstances) by including one or more HTTP headers with every response. It is good practice to include these headers regardless of whether a downstream cache is in place, because they also help web browsers and other software to understand when they can and cannot reuse a response they have already received.

How invalidation works

Content is usually only cached for a few hours (the default 'expiry' can be controlled via the CACHE_CONTROL_S_MAXAGE env var which, for those with access, can be updated via Azure), so in most cases it will expire automatically. In addition to this, Wagtail's front-end cache invalidator app triggers one-off purges when the following things happen:

  1. A page is published
  2. A page is unpublished
  3. A page is deleted
  4. Changes to a page are published

However, these actions only trigger a purge for the page in question - not everywhere on the site that page appears. On the Tate website, any page can be featured on any other page (either in a strip, or simply as a link in rich text), and changes made to a page are not automatically reflected in those places. The website implements the following measures to help reduce issues:
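Wagtail's front-end cache invalidator is enabled via the `WAGTAILFRONTENDCACHE` setting. A hedged sketch using Wagtail's built-in Cloudflare backend follows; the credential values are placeholders, and the project's real configuration may differ:

```python
# Settings sketch: wire Wagtail's front-end cache invalidator to Cloudflare.
# The token and zone ID below are placeholders, not real values.
WAGTAILFRONTENDCACHE = {
    "cloudflare": {
        "BACKEND": "wagtail.contrib.frontend_cache.backends.CloudflareBackend",
        "BEARER_TOKEN": "<api-token>",
        "ZONEID": "<zone-id>",
    },
}
```

With this in place, the publish/unpublish/delete actions listed above trigger purge requests against Cloudflare automatically.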

Automatic creation of redirects

Page URLs in Wagtail are managed automatically. In most cases, a page's URL is made up of its own slug and the slugs of its ancestors (joined by slashes '/'). So, if you change a page's slug or move a page to a different part of the tree, the canonical URL of that page (and any of its descendants) changes.
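As a toy illustration of that rule (not Wagtail's actual implementation, which also accounts for the site root among other things):

```python
def canonical_url(ancestor_slugs, slug):
    """Join ancestor slugs and the page's own slug with slashes.

    Toy illustration of Wagtail's URL rule, not its real implementation.
    """
    return "/" + "/".join(list(ancestor_slugs) + [slug]) + "/"


# Changing any slug in the chain changes this page's URL,
# and the URLs of all of its descendants:
print(canonical_url(["visit", "tate-modern"], "getting-here"))
# -> /visit/tate-modern/getting-here/
```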

Links to those pages in other parts of the site should automatically correct themselves in time. But while cached versions of those pages are still being served, visitors will still be clicking old links and expecting to reach the correct page. By automatically creating a (temporary) Redirect for pages when the CMS detects a change to their URL, we ensure that anyone requesting a page at its old URL is taken to the updated one.

Manual purging of URLs

If it's important that changes to a page are reflected on another page more urgently, a purge request can be submitted manually via the "Purge" menu item in Wagtail (either the "URL" or "Page URLs" option will work).
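The same purge can also be triggered programmatically (from a Django shell, for instance) using Wagtail's front-end cache utilities. A sketch, assuming the invalidator app is installed and configured; the URL below is illustrative:

```python
# Sketch: purge specific URLs from the downstream cache programmatically.
# Requires wagtail.contrib.frontend_cache to be installed and configured.
from wagtail.contrib.frontend_cache.utils import (
    purge_page_from_cache,
    purge_url_from_cache,
)

# Purge a single URL (illustrative address):
purge_url_from_cache("https://www.tate.org.uk/visit/tate-modern")

# Or purge every URL served by a given page object:
# purge_page_from_cache(page)
```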

Replicating things locally

The main downside of downstream caches is that they're difficult to replicate in local development environments. This isn't necessarily a bad thing, because always seeing the latest version of something is usually what you want during development. But when developing new features (especially those that make use of private data), you should consider testing the feature in preprod an important step in that feature's life-cycle.

2. Internal caching

All hosted versions of the project have access to a simple key/value store (separate from the database), which apps interact with indirectly, via Django's generic cache framework.

How does it work?

Being a Django-based CMS, Wagtail makes heavy use of Django's low-level cache API to improve the performance in many places, as do some features in this codebase.

Generally speaking, the cache framework is useful for caching small chunks of non-sensitive data that doesn't change very often, and would otherwise need to be repeatedly fetched from the database using one or many expensive queries. Lookups still have an overhead, but fetching data from a cache is generally much faster than fetching the same data from a database.

How invalidation works

In general, you shouldn't have to worry about invalidating this type of cache manually, because features that use the low-level cache API will typically use some kind of 'hook' offered by Django or Wagtail to automatically clear or replace cached data when it becomes invalid. However, having the option to purge manually can be useful in some circumstances (for example, when experiencing cache-related errors after a Django upgrade).

The most convenient way to purge a cache manually is to submit a "Django cache" purge request from the "Purge" menu within Wagtail. Or, if the CMS is inaccessible for some reason, you can SSH into the container, fire up a Django shell, and interact with the API directly.

TIP: You should be able to find most signal-related code in signal_handlers.py (where the functions are defined) and apps.py (where they are connected to signals).

Replicating things locally

Thanks to Django's generic APIs, you can easily benefit from (and test the impact of) this type of caching locally, without adding a bunch of complication to your local setup.

By default, Django's Local-memory cache backend is enabled for development environments, which simply stores values in local memory until the app restarts. However, you are welcome to experiment with other backends by overriding the CACHES setting value in your personal local.py settings file.
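For example, to try Django's built-in Redis backend instead, you could override CACHES in your local.py. This sketch assumes Django 4.0+ and a Redis server listening on the default local port; the location URL is illustrative:

```python
# local.py settings sketch: swap the local-memory backend for Redis.
# Assumes Django 4.0+ and a Redis server on the default port.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": "redis://127.0.0.1:6379",
    }
}
```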