fix: Implement Unique Document ID for Elasticsearch Indexing to Prevent Duplication #2
Pull Request Description
This Pull Request introduces a unique document ID generation mechanism for the Elasticsearch indexing process used by the LiteRev platform. The goal is to prevent duplicate documents when new data from the medRxiv and bioRxiv servers is indexed each day. Ensuring that each document indexed into Elasticsearch is unique maintains data integrity and improves the platform's overall search efficiency.
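As an illustration of the idea, here is a minimal sketch of deterministic ID generation. It is not the code from this PR: the function names `make_document_id` and `to_bulk_action`, and the choice of `doi` and `version` as identifying fields, are assumptions made for the example. The action dictionary follows the shape accepted by `elasticsearch.helpers.bulk`. Indexing a document under an `_id` that already exists overwrites the stored document instead of adding a second copy, which is what makes a stable ID effective against duplication.

```python
import hashlib


def make_document_id(record, key_fields=("doi", "version")):
    """Derive a stable ID from the fields assumed to identify a preprint.

    The field names are hypothetical; use whatever uniquely identifies
    a record in the actual medRxiv/bioRxiv payload.
    """
    parts = []
    for field in key_fields:
        value = record.get(field)
        if value is None:
            raise KeyError(f"record is missing required field {field!r}")
        # Normalize so trivial formatting differences do not change the ID.
        parts.append(str(value).strip().lower())
    # Join with a unit-separator character that should not occur in the values.
    joined = "\x1f".join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def to_bulk_action(index_name, record):
    """Build one bulk action that indexes the record under its stable ID."""
    return {
        "_op_type": "index",
        "_index": index_name,
        "_id": make_document_id(record),
        "_source": record,
    }
```

Because the ID depends only on the record's identifying fields, running the daily job twice over the same data produces the same `_id` values and therefore the same set of documents.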
How to Test These Changes
To test these changes, follow the steps below:
... medrxiv or biorxiv data.
Pull Request Checklists
This PR is a:
About this PR:
... (/tmp/elasticrxivx_{index_name}_{timestamp}.log).
Author's Checklist:
Additional Implementation
1. Secure Password Management for Elasticsearch
Introduced a script that automatically resets and updates the password of the Elasticsearch 'elastic' user, so that the credential is managed automatically instead of by hand. The script runs as part of the container startup process, so the credentials are reset and updated each time it is needed.
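A rough sketch of how such a reset step could look is shown below. It is not the script from this PR. It assumes Elasticsearch 8.x, where the `elasticsearch-reset-password` tool is available, and it assumes the tool lives under `/usr/share/elasticsearch/bin`. The helper names `build_reset_command` and `reset_password` are hypothetical.

```python
import subprocess


def build_reset_command(bin_dir="/usr/share/elasticsearch/bin", username="elastic"):
    """Return the argv for a non-interactive password reset.

    -u selects the user, -a requests an auto-generated password,
    -b skips the confirmation prompt, -s reduces console output.
    The bin_dir default is an assumption about the container layout.
    """
    return [f"{bin_dir}/elasticsearch-reset-password", "-u", username, "-a", "-b", "-s"]


def reset_password(run=subprocess.run, **kwargs):
    """Run the reset tool and return what it printed, stripped of whitespace.

    Assumption: in silent batch mode the tool prints only the new password.
    The `run` parameter exists so the call can be replaced in tests.
    """
    result = run(build_reset_command(**kwargs), capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

The returned value would then need to be passed to whatever component connects to Elasticsearch, for example through an environment variable or a secrets file. How this PR does that is not shown here.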
Reviewer's Checklist
Please use the following checklist for reviewing this PR: