Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use collection_id for linking rather than inferring from eadid #1531

Merged
merged 1 commit into from
May 7, 2024

Conversation

corylown
Copy link
Contributor

@corylown corylown commented May 4, 2024

At the last community call (2024-04-22) we discussion a desired feature to make it easier to customize the formation of the collection document id. The use cases cited were needing to add the repository name/code to prevent identifier collisions across repositories and adding a slug version of the title to the id for friendlier URLs.

As I was looking at how we could better support this I found that there were a couple places in the application where we were inferring the parent collection id from the normalized form of the eadid. I found this confusing and was concerned that allowing customization of the document id would break these assumptions about the relationship between the parent collection document id and the eadid.

This PR attempts to remove this assumption from the app so we can (in a future PR) add a seam for customizing the collection document id formation.

Instead of inferring the parent collection document id from the eadid this creates a SolrDocument accessor method collection_id using the _root_ Solr field from Solr's nested document structure. This change works with either older style legacy_id parent field relationships or the newer parent_ids field. It should be completely backwards compatible with existing arclight v1 apps.

Grateful for feedback, questions, concerns.

@corylown corylown marked this pull request as ready for review May 6, 2024 15:35
@seanaery
Copy link
Contributor

seanaery commented May 7, 2024

@corylown This looks good. Thanks for thinking it through. I have tested these changes out locally and they work well.

I would recommend taking this refactor even further, and removing Arclight::NormalizedId and all references to it. It's redundant at this point. The app already normalizes the id (strips whitespace and turns . to -) during indexing here, and as you've noted that value ends up as the _root_ on all the collection & component docs. The ead_ssi field holds the non-normalized version of theeadid, but that is sparsely used in the app: it factors into the qf in solrconfig.xml but otherwise isn't apparently displayed anywhere or used for anything that impacts the UI.

I.e., remove this file:

lib/arclight/normalized_id.rb

Remove this method from app/models/concerns/arclight/solr_document.rb, since nothing calls it anymore:

def normalized_eadid
  Arclight::NormalizedId.new(eadid).to_s
end

...or if we're concerned that implementers may have called normalized_eadid in their code, we could deprecate the method and change it so it just calls collection_id.

In app/models/arclight/parents.rb & app/models/arclight/parent.rb (these models exist only for the breadcrumb display), the eadid is modeled and available but is not actually used anywhere. So it could be entirely stripped out, or alternatively this...

def eadid
  Arclight::NormalizedId.new(@eadid).to_s
end

Could be:

def eadid
  @eadid
end

And this...

eadid = document.eadid

Could just be:

eadid = document.collection_id

Here's a gist, note: without rspec tests. I did test these changes out by checking out v1.0.1, rebuilding/reindexing, then checking out this new branch and before reindexing verifying the breadcrumb & context hierarchy UI worked; then reindexed and verified again that it worked.

@corylown
Copy link
Contributor Author

corylown commented May 7, 2024

@seanaery thanks for taking a close look at this and for your thorough testing. My initial work on this included many of the changes you suggest, then I backed them out for a few reasons.

  1. I plan to make changes to Arclight::NormalizedId in a future PR to serve as the seam where collection id formation can be customized. It didn't make sense to me to remove it only to add it back again. A preview of where I am heading with this: main...collection-id-creation
  2. As you note, my reason for leaving normalized_eadid in place is because it's a public method and could be in use by an implementation for various reasons. We could deprecate it, but I don't think we should make it equivalent to collection_id. Adding a **kwargs argument to the Arclight::NormalizedId would support its use for either applying normalization to just the eadid or forming an id from the eadid and other components. I'm torn on this. Maybe this could be treated as a separate issue?
  3. I had a similar concern about ripping eadid out of app/models/arclight/parents.rb & app/models/arclight/parent.rb that maybe implementers are using these models and expecting eadid to be there. It would be nice to make these classes leaner. Would you be open to considering this change in a separate PR? I'd like to keep this one simple and not make too many changes at once.

@seanaery seanaery self-requested a review May 7, 2024 20:55
Copy link
Contributor

@seanaery seanaery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That all sounds good to me. 👍

@seanaery seanaery merged commit 8151234 into main May 7, 2024
4 checks passed
@seanaery seanaery deleted the links-from-ids branch May 7, 2024 22:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants