cis2json

What is cis2json?

cis2json is a Java application that translates records in the CIS (collection index system) database into JSON files, which are then used to build the (legacy) API’s ElasticSearch index. The application is run by a series of bash scripts (found at this location in the Ansible repository). The scripts are:

  • compile_cis2json.sh
  • index_cis_data.sh
  • index_wiki_data.sh
  • refresh_cis_data.sh
  • refresh_wiki_data.sh
  • test_cis2json.sh

What does it do?

For more info see the README in the repo, but in a nutshell:

The refresh_cis_data.sh script is triggered first (via cron on research VM 10.184.1.4 at 5am). It:

  • Compiles the cis2json Java app (on same machine at /apps/cis2json)
  • Runs this Java app, passing the full_monty argument, to gather collection data from CIS via SQL calls and save these as Java objects [Lucene indices?]
  • Serialises these objects and saves them as JSON files in /apps/cis2json_data/[timestamp]
  • Symlinks the new timestamped directory as current and the old one as previous
  • Calls the API’s index_collection.py command (directly from the app’s instance on the same server: /apps/api)
  • This command rebuilds the ElasticSearch index (hosted at 172.16.96.80 [?])
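
The symlink step in the list above can be sketched as follows. This is a minimal illustration, not the code in refresh_cis_data.sh: the function name rotate_symlinks and its two arguments are hypothetical, and it assumes current and previous are relative symlinks living directly inside the data directory.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the "current"/"previous" symlink rotation.
# Not taken from refresh_cis_data.sh; names and layout are assumptions.

rotate_symlinks() {
    local data_root="$1"   # e.g. /apps/cis2json_data
    local new_dir="$2"     # name of the new timestamped directory inside data_root

    # If "current" already exists, point "previous" at whatever it targets.
    if [ -L "$data_root/current" ]; then
        local old_target
        old_target="$(readlink "$data_root/current")"
        ln -sfn "$old_target" "$data_root/previous"
    fi

    # Point "current" at the new directory. -n replaces the link itself
    # instead of creating a new link inside the directory it points to.
    ln -sfn "$new_dir" "$data_root/current"
}
```

Called as rotate_symlinks /apps/cis2json_data [timestamp], this would leave previous pointing at the old export and current at the new one. On the very first run no previous link is created, because there is nothing to demote.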

The JSON files are saved in /apps/cis2json_data/current/, in 4 sub-directories:

archive/
artist/
artwork/
group/
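
A quick sanity check of an export before it is indexed might look like the sketch below. It is not part of the existing scripts: the function name check_export is hypothetical, and it assumes the exported files carry a .json extension.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check for a cis2json export directory.
# Assumes the exported files end in .json (not confirmed by the scripts).

check_export() {
    local export_dir="$1"   # e.g. /apps/cis2json_data/current
    local sub count status=0

    for sub in archive artist artwork group; do
        if [ ! -d "$export_dir/$sub" ]; then
            echo "missing: $sub/" >&2
            status=1
            continue
        fi
        # Count the JSON files in this sub-directory (tr strips wc padding).
        count="$(find "$export_dir/$sub" -type f -name '*.json' | wc -l | tr -d ' ')"
        echo "$sub: $count"
    done

    return "$status"
}
```

Running check_export /apps/cis2json_data/current prints one count per sub-directory and returns non-zero if any of the four is missing.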

This data is read into the ElasticSearch index by a line in the script that basically does this:

cd /apps/api
python manage.py index_collection -d /apps/cis2json_data/current -a -p -V --settings=setting_production

(The -a option aliases the new index to the live index, -p purges expired indexes and -V purges Varnish.)
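
If the indexing step ever needs to be run by hand, the command above could be wrapped with a small guard so that it refuses to run against a missing export. This is only a sketch: the function name reindex and its default arguments are hypothetical, and the flags are copied unchanged from the command above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the index_collection command shown above.
# The guard and the function name are illustrative assumptions.

reindex() {
    local data_dir="${1:-/apps/cis2json_data/current}"
    local api_dir="${2:-/apps/api}"

    # Refuse to index if the export directory is missing or a dangling symlink.
    if [ ! -d "$data_dir" ]; then
        echo "export directory not found: $data_dir" >&2
        return 1
    fi

    # Run in a subshell so the caller's working directory is left untouched.
    (
        cd "$api_dir" &&
        python manage.py index_collection -d "$data_dir" -a -p -V \
            --settings=setting_production
    )
}
```

With no arguments, reindex uses the paths given on this page. Both can be overridden, for example to point at the previous export.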