cis2json
What is cis2json?
cis2json is a Java application that translates records in the CIS (collection index system) database into JSON files, which are then used to build the (legacy) API’s ElasticSearch index. The application is run by a series of bash scripts (found in the Ansible repository). The scripts are:
- compile_cis2json.sh
- index_cis_data.sh
- index_wiki_data.sh
- refresh_cis_data.sh
- refresh_wiki_data.sh
- test_cis2json.sh
What does it do?
For more info see the README in the repo, but in a nutshell:
The entry point is refresh_cis_data.sh, which is triggered via cron on research VM 10.184.1.4 at 5am. This script:
- Compiles the cis2json Java app (on the same machine at /apps/cis2json)
- Runs this Java app, passing the full_monty argument, to gather collection data from CIS via SQL calls and save these as Java objects [Lucene indices?]
- Serialises these objects and saves them as JSON files in /apps/cis2json_data/[timestamp]
- Symlinks the new timestamped directory as current and the old one as previous
- Calls the API’s index_collection.py command (directly from the app’s instance on the same server: /apps/api)
- Which rebuilds the ElasticSearch index (hosted at 172.16.96.80 [?])
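The symlink step above can be sketched as follows. This is only an illustration in Python, not the code in refresh_cis_data.sh (which is a bash script); the function names `rotate_symlinks` and `_replace_symlink` are hypothetical, and the sketch assumes a POSIX filesystem and link targets given relative to the data root.

```python
import os


def _replace_symlink(target: str, link_path: str) -> None:
    """Point link_path at target, replacing any existing link."""
    # Create the link under a temporary name, then rename it over the old
    # one, so readers never see a moment where the link is missing.
    tmp_path = link_path + ".tmp"
    if os.path.lexists(tmp_path):
        os.remove(tmp_path)
    os.symlink(target, tmp_path)
    os.replace(tmp_path, link_path)


def rotate_symlinks(data_root: str, new_dir_name: str) -> None:
    """Make `current` point at new_dir_name and `previous` at the old `current`."""
    current = os.path.join(data_root, "current")
    previous = os.path.join(data_root, "previous")

    # Remember where `current` pointed before switching it, if it exists.
    old_target = os.readlink(current) if os.path.islink(current) else None

    if old_target is not None:
        _replace_symlink(old_target, previous)
    _replace_symlink(new_dir_name, current)
```

On the first run there is no `current` link yet, so only `current` is created; `previous` appears from the second run onwards.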
The JSON files are saved in /apps/cis2json_data/current/, in 4 sub-directories:
archive/
artist/
artwork/
group/
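As a quick sanity check before indexing, one could count the files in each of these sub-directories. The sketch below is an assumption-laden illustration: `count_json_files` is a hypothetical helper, and it assumes the exported files carry a `.json` extension, which the description above does not confirm.

```python
from pathlib import Path

# The four sub-directories listed above.
SUBDIRS = ("archive", "artist", "artwork", "group")


def count_json_files(data_dir: str) -> dict:
    """Return the number of *.json files found in each expected sub-directory."""
    root = Path(data_dir)
    counts = {}
    for name in SUBDIRS:
        subdir = root / name
        # A missing sub-directory is reported as zero rather than raising.
        counts[name] = sum(1 for _ in subdir.glob("*.json")) if subdir.is_dir() else 0
    return counts
```

For example, `count_json_files("/apps/cis2json_data/current")` would return a dictionary keyed by the four directory names; a zero for any of them would suggest that the export did not complete.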
This data is read into the ElasticSearch index by a line in the script that basically does this:
cd /apps/api
python manage.py index_collection -d /apps/cis2json_data/current -a -p -V --settings=setting_production
(The -a option aliases the new index to the live index, -p purges expired indexes and -V purges Varnish.)
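To make the role of each flag explicit, the same call could be assembled programmatically. This is a hedged sketch only: `build_index_command` and `run_index` are hypothetical helpers that are not part of the API or the Ansible scripts, and the flags and settings name are simply copied from the command above.

```python
import subprocess


def build_index_command(data_dir, alias=True, purge=True, purge_varnish=True,
                        settings="setting_production"):
    """Build the argument list for the index_collection management command."""
    cmd = ["python", "manage.py", "index_collection", "-d", data_dir]
    if alias:
        cmd.append("-a")  # alias the new index to the live index
    if purge:
        cmd.append("-p")  # purge expired indexes
    if purge_varnish:
        cmd.append("-V")  # purge Varnish
    cmd.append("--settings=" + settings)
    return cmd


def run_index(data_dir="/apps/cis2json_data/current", api_dir="/apps/api"):
    """Run the command from the API directory, raising if it exits non-zero."""
    subprocess.run(build_index_command(data_dir), cwd=api_dir, check=True)
```

Passing `alias=False` would build a command that creates the new index without switching the live alias to it, which may be useful for a dry run, assuming the command behaves as described above when `-a` is omitted.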