Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed IIIFv3 manifest parsing to max image dimensions. #51

Merged
merged 2 commits into from
Nov 29, 2024
Merged

Conversation

mdellabitta
Copy link
Contributor

@mdellabitta mdellabitta commented Nov 29, 2024

Important

Enhance IIIFv3 manifest handling by maximizing image URLs to highest resolution and update tracking and tests accordingly.

  • Behavior:
    • Updated iiif_v3_urls() in metadata.py to maximize image URLs to highest resolution using maximize_iiif_v3_url().
    • Added maximize_iiif_v3_url() to handle URL transformation for IIIFv3 manifests.
    • Modified process_item() in downloader.py to skip items with no media.
  • Regex:
    • Added IMAGE_API_UP_THROUGH_IDENTIFIER_REGEX and FULL_IMAGE_API_URL_REGEX in metadata.py for URL parsing.
  • Tracker:
    • Added BAD_IIIF_MANIFEST, NO_MEDIA, and BAD_IMAGE_API_V3 to Result enum in tracker.py.
    • Updated Tracker to increment new result types.
  • Tests:
    • Updated test_iiif_v3_urls() in test_metadata.py to test new URL maximization behavior.
    • Updated test_str_representation() in test_tracker.py to reflect new result types.

This description was created by Ellipsis for 35469ab. It will automatically update as commits are pushed.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Reviewed everything up to 35469ab in 1 minute and 26 seconds

More details
  • Looked at 285 lines of code in 5 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. ingest_wikimedia/metadata.py:181
  • Draft comment:
    The logging statement should use exc_info=e to correctly log the exception.
            logging.warning("Unable to parse IIIF manifest.", exc_info=e)
  • Reason this comment was not posted:
    Comment was on unchanged code.
2. ingest_wikimedia/metadata.py:309
  • Draft comment:
    The regex should not include the prefix group.
    r"^(?P<scheme>http|https)://(?P<server>[^/]+)/"
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    The 'prefix' group is used in the 'maximize_iiif_v3_url' function to construct URLs. Removing it from the regex would affect how URLs are parsed and constructed. The comment does not provide strong evidence that the 'prefix' is unnecessary. The code seems to rely on this group for URL construction, so the comment may not be valid.
    I might be missing the specific context or requirement that makes the 'prefix' group unnecessary. The comment could be based on a deeper understanding of the URL patterns expected by the application.
    The code uses the 'prefix' group in URL construction, indicating its importance. Without clear evidence or explanation, it's risky to assume the 'prefix' is unnecessary.
    The comment should be deleted as it lacks strong evidence or explanation for removing the 'prefix' group, which is used in URL construction.

Workflow ID: wflow_B9E1VLxB7YdkERBW


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

tools/downloader.py Outdated Show resolved Hide resolved
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
@mdellabitta mdellabitta merged commit 954ad1d into main Nov 29, 2024
5 checks passed
@mdellabitta mdellabitta deleted the mwdl-fix2 branch November 29, 2024 22:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant