Hi @djstrong, thanks for the question. Currently, pdf is the only supported doc_type. It exists to handle text extracted from a PDF, which often has a line break in the middle of a sentence. An html doc_type could definitely be worth supporting. Please feel free to submit a pull request if you are willing and able.
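To illustrate the kind of cleanup a PDF-oriented mode has to do, here is a minimal sketch that joins soft line wraps before sentence splitting. It is not this library's implementation: the function name `join_soft_line_breaks` and the punctuation heuristic are assumptions made purely for illustration.

```python
import re


def join_soft_line_breaks(text: str) -> str:
    """Replace mid-sentence newlines with a space (hypothetical helper).

    Heuristic assumption: a single newline that does not follow
    sentence-ending punctuation and is not part of a blank line is a
    soft wrap introduced by PDF text extraction.
    """
    # Skip newlines preceded by . ! ? : or another newline, and skip
    # newlines followed by another newline (paragraph breaks).
    return re.sub(r"(?<![.!?:\n])\n(?!\n)", " ", text)


sample = "This sentence was\nwrapped by the PDF.\n\nNew paragraph."
cleaned = join_soft_line_breaks(sample)
```

An html doc_type would presumably need a different step, such as stripping markup before segmentation, which is why it would have to be added separately.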
What kind of doc_types are supported? I have tried html, but it is not working.