Implement fulltext indexing and search across works in Curate and Lux #2185

Open
4 of 5 tasks
eporter23 opened this issue Aug 1, 2023 · 1 comment · Fixed by #2189

Comments

@eporter23
Contributor

eporter23 commented Aug 1, 2023

Story

As a repository end user, I want to be alerted to full-text matches in my search results, so that I can more quickly find material that matches my interests.

As a repository administrator, I want a user interface to manage full-text indexing of specific works, so that I can ensure only materials with OCR data undergo the additional processing.

See also the enhancement request from the Rose Library.

Acceptance Criteria

Use one or more of the following options to provide acceptance criteria.

Notes

This epic follows earlier work to research and prototype options for enabling full-text search of works; its goal is to implement a solution. Its scope includes enhancements to the full-text indexing process in Curate, as well as search and display of full-text matches within Lux (https://digital.library.emory.edu). Searching within an individual work in the Universal Viewer will be a separate effort.

As background, our digitized text materials that have already undergone OCR processing come in two main output types, "Kirtas" and "LIMB". Kirtas is the legacy output; examples include the Yellowbacks. LIMB is the newer, current output and covers collections such as the Yearbooks.

Both will always contain:

  • a PDF of the entire volume
  • a plain-text .txt file for each page containing textual content (File Use: "Transcript File")

Kirtas volumes also contain:

  • an OCR XML file for the entire volume

LIMB volumes also contain:

  • an OCR XML file (either ALTO or ABBYY) for each page containing textual content (File Use: "Extracted")
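Since both Kirtas and LIMB volumes always carry a per-page "Transcript File" .txt, one format-independent way to build a work's full-text value is to join those transcripts in page order and ignore the OCR XML. The sketch below assumes that approach; the issue does not say which source Curate actually indexes. `FileRecord` and `assemble_full_text` are hypothetical names, and the `page` field assumes each file's page position is known.

```python
from dataclasses import dataclass
from typing import Iterable

# File Use label taken from the issue; everything else here is hypothetical.
TRANSCRIPT_USE = "Transcript File"


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical record for one file attached to a work."""
    page: int       # 1-based page position within the volume (assumed available)
    file_use: str   # e.g. "Transcript File" or "Extracted"
    text: str       # file contents; only used for plain-text transcripts


def assemble_full_text(files: Iterable[FileRecord]) -> str:
    """Join per-page plain-text transcripts, in page order, into one string.

    Behaves the same for Kirtas and LIMB volumes, because both include a
    "Transcript File" .txt for every page with textual content. OCR XML
    files (any other File Use) are ignored.
    """
    transcripts = sorted(
        (f for f in files if f.file_use == TRANSCRIPT_USE),
        key=lambda f: f.page,
    )
    # Trim whitespace and drop blank pages so they add no noise to the index.
    pages = [f.text.strip() for f in transcripts if f.text.strip()]
    return "\n\n".join(pages)


if __name__ == "__main__":
    volume = [
        FileRecord(2, "Transcript File", "Second page text"),
        FileRecord(1, "Transcript File", "First page text\n"),
        FileRecord(1, "Extracted", "<alto>...</alto>"),
    ]
    print(assemble_full_text(volume))
```

Skipping the XML keeps the indexer agnostic to whether a volume has one volume-level file (Kirtas) or per-page ALTO/ABBYY files (LIMB). The XML would only be needed later for word coordinates, for example search within the Universal Viewer.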

Links to Additional Information

Add links here

Checklist

  • A UI component is available to Curate admin users so they can run the reindexing process on selected works/IDs only.
  • The existing display implemented in Curate search results is added to Lux. Please revise the existing search results label "Keyword matches" to "Full-text matches".
  • The highlighting of matching full-text search terms in search results, currently implemented in Curate, should also display in Lux.
  • The new full-text Solr field should be available in the Common Fields and All Fields search options in Lux.
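For the Lux checklist items, the Solr side of a highlighted full-text search can be sketched as follows. This is a minimal, hypothetical illustration in Python, not the applications' own code. The field names (`full_text_tsimv`, `title_tesim`, and so on) are placeholders, since the issue does not name the new full-text field, and the snippet count and fragment size are arbitrary. `defType`, `qf`, `hl`, `hl.fl`, `hl.snippets`, and `hl.fragsize` are standard Solr request parameters. Solr's JSON response returns per-document snippets under `highlighting`, keyed by document ID.

```python
from typing import Dict, List
from urllib.parse import urlencode

# Hypothetical field names; the issue does not name the new full-text field.
FULL_TEXT_FIELD = "full_text_tsimv"
COMMON_FIELDS = ["title_tesim", "creator_tesim", "subject_tesim"]
MATCHES_LABEL = "Full-text matches"  # revised label from the checklist


def build_search_params(query: str, include_full_text: bool = True) -> Dict[str, str]:
    """Build edismax params; optionally search and highlight the full-text field."""
    fields = list(COMMON_FIELDS)
    if include_full_text:
        fields.append(FULL_TEXT_FIELD)
    params = {
        "q": query,
        "defType": "edismax",
        "qf": " ".join(fields),  # fields the query is matched against
        "wt": "json",
    }
    if include_full_text:
        params.update({
            "hl": "true",              # turn on highlighting
            "hl.fl": FULL_TEXT_FIELD,  # highlight only the full-text field
            "hl.snippets": "3",        # at most three snippets per document
            "hl.fragsize": "150",      # approximate snippet length in characters
        })
    return params


def full_text_matches(response: dict, doc_id: str) -> List[str]:
    """Return the full-text snippets Solr produced for one document, if any."""
    per_doc = response.get("highlighting", {}).get(doc_id, {})
    return per_doc.get(FULL_TEXT_FIELD, [])


def render_matches(snippets: List[str]) -> str:
    """Format snippets under the "Full-text matches" label; empty if none."""
    if not snippets:
        return ""
    return MATCHES_LABEL + ": " + " ... ".join(snippets)


if __name__ == "__main__":
    print(urlencode(build_search_params("yearbook")))
    sample = {"highlighting": {"abc123": {FULL_TEXT_FIELD: ["the 1923 <em>yearbook</em> staff"]}}}
    print(render_matches(full_text_matches(sample, "abc123")))
```

Adding the full-text field to `qf` corresponds to exposing it in a search option such as "All Fields". Keeping it out of `qf` for other options leaves those searches unchanged.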

Optional if time permits

  • The full-text indexing process can be added to the current bulk-import process, so that new works are indexed as they are ingested

Given/When/Then

  • Given (some context) and (some other optional context)
  • When (some action is carried out)
  • Then (a set of observable consequences should occur)
eporter23 changed the title from "Implement fulltext indexing and search in Curate and Lux [placeholder]" to "Implement fulltext indexing and search in Curate and Lux" on Aug 1, 2023
bwatson78 self-assigned this on Aug 7, 2023
bwatson78 linked a pull request on Aug 10, 2023 that will close this issue
eporter23 reopened this on Aug 10, 2023
eporter23 added the Epic label on Aug 10, 2023
@eporter23
Contributor Author

Re-opening because this is an Epic.

eporter23 changed the title from "Implement fulltext indexing and search in Curate and Lux" to "Implement fulltext indexing and search across works in Curate and Lux" on Oct 30, 2023