A mostly haphazard collection of scripts (Bash, Perl) that take Zephir records, do some cleanup, and calculate bib rights, among other processes.
Parts of these should likely be extracted into their own repositories, or obviated by a re-architecture.
Clone repo using your protocol of choice.
docker compose build
There is no need for a bundle install step, as this is taken care of in the Dockerfile.
docker compose run --rm test
docker compose run --rm test bundle exec standardrb
docker compose run --rm test bundle exec rspec
- Process daily file of new/updated/deleted metadata provided by Zephir
- Send deleted bib record IDs (provided by Zephir) to Bill
- "Clean up" zephir records
- (re)determine bibliographic rights
- Write new/updated bib rights to file for Aaron's process to pick up and update the rights db (Why: possibly because of limited permissions on the rights database)
- File of processed new/updated records is copied to an HT server for Bill to index in the catalog
- Retrieves full bib metadata file from zephir and runs run_zephir_full_monthly.sh. (Why?)
The new/updated/deleted metadata provided by Zephir needs to make it to the catalog, and eventually into the rights database.
- ht_bib_export_incr_YYYY-MM-DD.json.gz (incremental updates from Zephir, ftps_zephir_get)
- vufind_removed_cids_YYYY-MM-DD.txt.gz (CIDs that have gone away, ftps_zephir_get)
- /tmp/rights_dbm (taken from ht_rights.rights_current table in the rights database)
- us_cities.db (dependency for bib_rights.pm)
- us_fed_pub_exception_file (dependency for bib_rights.pm, /htdata/govdocs/feddocs_oclc_filter/)
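As a rough illustration of the dated naming pattern for the two Zephir inputs listed above, the following sketch prints the file names expected for a given day. The function name `zephir_incr_inputs` and the date check are hypothetical and are not taken from the repository's scripts.

```bash
# Sketch only: print the two dated input file names for one daily run.
# zephir_incr_inputs is a hypothetical helper, not part of this repository.
zephir_incr_inputs() (
  day="$1"
  # Require the YYYY-MM-DD shape used in the file names.
  case "$day" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
    *) echo "expected YYYY-MM-DD, got: $day" >&2; exit 1 ;;
  esac
  printf '%s\n' \
    "ht_bib_export_incr_${day}.json.gz" \
    "vufind_removed_cids_${day}.txt.gz"
)

zephir_incr_inputs 2024-01-15
```

The check only validates the shape of the date string, not whether it is a real calendar date.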
- debug_current.txt (what and why for this?)
- zephir_upd_YYYYMMDD.rights - picked up hourly by https://github.com/hathitrust/feed_internal/blob/master/feed.hourly/populate_rights_data.pl and loaded into the rights_current table. Will be placed directly in /htapps/babel/feed/var/rights and will remove the scp logic from populate_rights_data.pl
- zephir_upd_YYYYMMDD_delete.txt.gz will be moved to /htsolr/catalog/prep. Used by the catalog to process deletes.
- zephir_upd_YYYYMMDD_dollar_dup.txt (generated by post_zephir_cleanup.pl, gets sent to Zephir, ftps_zephir_send, Zephir uningests these duplicate records)
- zephir_upd_YYYYMMDD.json.gz will be sent to /htsolr/catalog/prep for catalog indexing
- zephir_full_monthly_rpt.txt: Does anyone need this?
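The destinations described for the incremental outputs can be summarized as a small lookup. This is a sketch under stated assumptions: `route_output` is a hypothetical helper that only prints where a file is said to go, and it copies nothing. Files whose purpose is still an open question in the notes are reported as unknown.

```bash
# Sketch only: map an incremental output file name to its documented destination.
# route_output is a hypothetical helper; it prints a destination and moves nothing.
route_output() {
  case "$1" in
    zephir_upd_*_delete.txt.gz)  echo "/htsolr/catalog/prep" ;;
    zephir_upd_*_dollar_dup.txt) echo "Zephir (via ftps_zephir_send)" ;;
    zephir_upd_*.json.gz)        echo "/htsolr/catalog/prep" ;;
    zephir_upd_*.rights)         echo "/htapps/babel/feed/var/rights" ;;
    *)                           echo "unknown"; return 1 ;;
  esac
}

route_output zephir_upd_20240115.rights   # prints /htapps/babel/feed/var/rights
```

The more specific patterns come first so that a name with a `_delete` or `_dollar_dup` suffix is never matched by a broader pattern.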
- bld_rights_db.pl (builds /tmp/rights_dbm)
- bib_rights.pm
- postZephir.pm
- ftps_zephir_get
- ftps_zephir_send
run_process_zephir_full.sh
- Pulls a full bib metadata file from zephir
- Moves groove_full.tsv.gz to /htapps/babel/feed/var/bibrecords
- Assembles zephir_ingested_items.txt.gz and moves to /htapps/babel/feed/var/bibrecords
- Processes the full zephir file:
- Splits input file and runs multiple invocations of postZephir.pm in parallel
- Generate new/updated bib rights
Previously generated the HTRC datasets. All that remains is the zephir_ingested_items and bib rights.
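The split-and-run-in-parallel step can be pictured with a generic shell pattern. The sketch below is not the script's actual logic: it assumes a decompressed input with one record per line, and `process_chunk` is a placeholder that merely counts lines where the real script would invoke postZephir.pm.

```bash
# Sketch only: split a line-oriented file into chunks and process them in parallel.
# process_chunk stands in for the real per-chunk work; here it just counts lines.
process_chunk() {
  wc -l < "$1" | tr -d ' ' > "$1.out"
}

run_in_parallel() (
  input="$1"
  lines_per_chunk="$2"
  workdir=$(mktemp -d) || exit 1
  split -l "$lines_per_chunk" "$input" "$workdir/chunk_"
  for chunk in "$workdir"/chunk_*; do
    process_chunk "$chunk" &     # one background job per chunk
  done
  wait                           # block until every job has finished
  cat "$workdir"/chunk_*.out     # combine per-chunk results in chunk order
  rm -rf "$workdir"
)

demo=$(mktemp)
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > "$demo"
run_in_parallel "$demo" 4        # prints 4, 4, 2 (one line count per chunk)
rm -f "$demo"
```

Starting one job per chunk is the simplest form. A real run over a full export would likely want to cap the number of concurrent jobs.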
- US Fed Doc exception list /htdata/govdocs/feddocs_oclc_filter/oclcs_removed_from_registry.txt
- /tmp/rights_dbm
- groove_export_YYYY-MM-DD.tsv.gz (ftps from cdlib)
- ht_bib_export_full_YYYY-MM-DD.json.gz
- groove_export_YYYY-MM-DD.tsv.gz will be moved to /htapps/babel/feed/var/bibrecords/groove_full.tsv.gz
- zephir_full_${YESTERDAY}_vufind.json.gz: catalog archive. Indexed into catalog via the same process as for run_process_zephir_incremental.sh
- zephir_full_${YESTERDAY}.rights moved to /htapps/babel/feed/var/rights/
- zephir_full_${YESTERDAY}.rights.debug, doesn't appear to be used
- zephir_full_monthly_rpt.txt moved to ../data/full/
- zephir_full_${YESTERDAY}.rights_rpt.tsv moved to ./data/full/
- zephir_ingested_items.txt.gz - copied to /htapps/babel/feed/var/bibrecords. Used by https://github.com/hathitrust/feed_internal/blob/master/feed.monthly/zephir_diff.pl to refresh the full feed_zephir_items table on a monthly basis.
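Several of the full-run outputs embed `${YESTERDAY}` in their names. The sketch below shows one way such a stamp could be derived and expanded. It assumes GNU `date` and a YYYYMMDD stamp, neither of which is confirmed by these notes, and both function names are hypothetical.

```bash
# Sketch only: derive a YESTERDAY-style stamp and expand the dated output names.
# Assumes GNU date and a YYYYMMDD stamp; both are assumptions, not documented facts.
day_before() {
  date -u -d "$1 1 day ago" +%Y%m%d
}

full_outputs() (
  stamp="$1"
  printf '%s\n' \
    "zephir_full_${stamp}_vufind.json.gz" \
    "zephir_full_${stamp}.rights" \
    "zephir_full_${stamp}.rights.debug" \
    "zephir_full_${stamp}.rights_rpt.tsv"
)

full_outputs "$(day_before 2024-03-01)"
```

Passing the reference date explicitly, rather than reading the clock inside the function, keeps the naming easy to check for a known day.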
- bld_rights_db.pl
- bib_rights.pm
- postZephir.pm
- ftps_zephir_get
- ftps_zephir_send
Tests with limited coverage can be run with Docker.
docker compose build
docker compose up -d
docker compose run --rm pz perl t/test_postZephir.t
For test coverage, replace the previous docker compose run with:
docker compose run --rm pz bash -c "perl -MDevel::Cover=-silent,1 t/*.t && cover -nosummary /usr/src/app/cover_db"