[BUG] "figure" or "picture" not parsed #654

Open
jdavidlopez opened this issue Jan 4, 2025 · 0 comments
Labels
bug Something isn't working

Describe the bug
When an article has images embedded inside a <figure> element, those images are not parsed (all the examples I found are from Substack or Medium).

To Reproduce
Parse any article from Substack/Medium that contains images.

Expected behavior
When using keep_article_html=True, the images should be embedded in the resulting article HTML.

Screenshots
N/A

System information

  • OS: Linux
  • Python version: 3.8
  • Library version: 0.9.3.1

Additional context
Example markup of an image that is not parsed:

<figure class="mi mj mk ml mm mn mf mg paragraph-image">
  <div role="button" tabindex="0" class="mo mp ed mq bh mr">
    <div class="mf mg mh">
      <picture>
        <source srcset="https://miro.medium.com/v2/resize:fit:640/format:webp/1*IWuB7jVUQ0lsNICWUv5gPg.jpeg 640w, https://miro.medium.com/v2/resize:fit:720/format:webp/1*IWuB7jVUQ0lsNICWUv5gPg.jpeg 720w, https://miro.medium.com/v2/resize:fit:750/format:webp/1*IWuB7jVUQ0lsNICWUv5gPg.jpeg 750w, https://miro.medium.com/v2/resize:fit:786/format:webp/1*IWuB7jVUQ0lsNICWUv5gPg.jpeg 786w, https://miro.medium.com/v2/resize:fit:828/format:webp/1*IWuB7jVUQ0lsNICWUv5gPg.jpeg 828w, https://miro.medium.com/v2/resize:fit:1100/format:webp/1*IWuB7jVUQ0lsNICWUv5gPg.jpeg 1100w, https://miro.medium.com/v2/resize:fit:1400/format:webp/1*IWuB7jVUQ0lsNICWUv5gPg.jpeg 1400w" sizes="(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px" type="image/webp">
        <source data-testid="og" srcset="https://miro.medium.com/v2/resize:fit:640/1*IWuB7jVUQ0lsNICWUv5gPg.jpeg 640w, https://miro.medium.com/v2/resize:fit:720/1*IWuB7jVUQ0lsNICWUv5gPg.jpeg 720w, https://miro.medium.com/v2/resize:fit:750/1*IWuB7jVUQ0lsNICWUv5gPg.jpeg 750w, https://miro.medium.com/v2/resize:fit:786/1*IWuB7jVUQ0lsNICWUv5gPg.jpeg 786w, https://miro.medium.com/v2/resize:fit:828/1*IWuB7jVUQ0lsNICWUv5gPg.jpeg 828w, https://miro.medium.com/v2/resize:fit:1100/1*IWuB7jVUQ0lsNICWUv5gPg.jpeg 1100w, https://miro.medium.com/v2/resize:fit:1400/1*IWuB7jVUQ0lsNICWUv5gPg.jpeg 1400w" sizes="(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px">
        <img alt="Two hands reaching for one another, seen under pink light against a bright pink background" class="bh ko ms c" width="700" height="680" loading="eager" src="https://miro.medium.com/v2/resize:fit:1155/1*IWuB7jVUQ0lsNICWUv5gPg.jpeg">
      </picture>
    </div>
  </div>
</figure>
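
To check whether a given piece of extracted HTML still contains the image, something like the following standard-library sketch may help. It is not part of the library under report: `FigureImageCollector` and `images_in_figures` are hypothetical names introduced here, and `SAMPLE` is a trimmed copy of the markup above plus one assumed extra `<img>` outside the figure for contrast. It only shows that the `<img>` nested in `<figure>`/`<picture>` is reachable with a plain parser, so the same function could be run on the article HTML to see whether the image survived.

```python
from html.parser import HTMLParser


class FigureImageCollector(HTMLParser):
    """Collect <img> tags that appear anywhere inside a <figure> element."""

    def __init__(self):
        super().__init__()
        self.figure_depth = 0  # > 0 while inside one or more <figure> tags
        self.images = []       # one dict per <img> found inside a <figure>

    def handle_starttag(self, tag, attrs):
        if tag == "figure":
            self.figure_depth += 1
        elif tag == "img" and self.figure_depth > 0:
            attr_map = dict(attrs)
            self.images.append(
                {"src": attr_map.get("src"), "alt": attr_map.get("alt")}
            )

    def handle_endtag(self, tag):
        if tag == "figure" and self.figure_depth > 0:
            self.figure_depth -= 1


def images_in_figures(html):
    """Return the src/alt of every <img> nested inside a <figure>."""
    collector = FigureImageCollector()
    collector.feed(html)
    collector.close()
    return collector.images


# Trimmed copy of the markup from this report (class names and most
# srcset candidates dropped), plus one assumed <img> outside the figure.
SAMPLE = """
<figure class="paragraph-image">
  <div role="button" tabindex="0">
    <picture>
      <source srcset="https://miro.medium.com/v2/resize:fit:640/format:webp/1*IWuB7jVUQ0lsNICWUv5gPg.jpeg 640w" type="image/webp">
      <img alt="Two hands reaching for one another" width="700" height="680" src="https://miro.medium.com/v2/resize:fit:1155/1*IWuB7jVUQ0lsNICWUv5gPg.jpeg">
    </picture>
  </div>
</figure>
<img alt="outside" src="https://example.com/outside.png">
"""

if __name__ == "__main__":
    for image in images_in_figures(SAMPLE):
        print(image["src"], "|", image["alt"])
```

Running `images_in_figures` on the extracted article HTML and getting an empty list would confirm that the figure image was dropped during parsing.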
@jdavidlopez jdavidlopez added the bug Something isn't working label Jan 4, 2025