-
Notifications
You must be signed in to change notification settings - Fork 277
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support big endian platform by providing new ops implementation #559
Conversation
I think this looks like a good match for how we've been doing things. @adrianeboyd do you have any comments? It looks okay to me. If it all works, I'm pleased to resolve this long-standing platform support issue! I do think the mechanism we've designed for the ops selection could be better though. It's quite unsightly the way we have to reference the subclasses in Thinc, and it leads to the circular import problems. Here's a suggestion we could try to improve this:
This will prevent Thinc from having to know all of the ops classes in order to resolve which one should be used. Generally ops classes should return I think we could merge the current PR as-is, as it doesn't seem to make things worse (that I can see). But I think the patch does indicate we could improve our system here. |
For reference on associated issues, I should mention I have the needed murmurhash changes here: https://github.com/andrewsi-z/murmurhash |
I think I'd reverse/massage the logic in Adding
So we need to be more careful around |
thinc/backends/__init__.py
Outdated
cls = ops_by_name.get("apple", ops_by_name.get("numpy")) | ||
|
||
if "bigendian" in ops_by_name: | ||
cls = ops_by_name.get("bigendian", ops_by_name.get("numpy")) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cls = ops_by_name.get("apple", ops_by_name.get("numpy")) | |
if "bigendian" in ops_by_name: | |
cls = ops_by_name.get("bigendian", ops_by_name.get("numpy")) | |
cls = ops_by_name.get("numpy") | |
cls = ops_by_name.get("apple", cls) | |
cls = ops_by_name.get("bigendian", cls) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is where a ranking from the ops classes themselves would be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated with suggested change and retested.
thinc/backends/__init__.py
Outdated
# avoid fallback to base NumpyOps if on big endian platform | ||
if current_ops.name == "bigendian" and name == "numpy": | ||
name = current_ops.name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we've removed all the use_ops("numpy")
from our code, so I think I'd rather have at most a warning/error rather than having use_ops
silently not do what you just told it to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the highlighted code... At best, this was my attempt to keep an explicit use_ops("numpy") specification from reverting back to numpy ops when explicitly requested from outside of thinc. It is best not to ignore an explicit request.
Hi @adrianeboyd, thank you for your comments and I'll start reviewing them in more detail. You brought up CI... I know github actions doesn't support s390x yet, but there are a number of supporting options that can potentially (?) be triggered by a github action. These would be community available no cost resources . I can explore this as a follow-on action and present some options for discussion (and help get it going if you and the explosion team believe it is worthwhile to purse). |
While we probably want to think about more generic solutions in the future, this should be fine to merge as is for now, and will become available in Thinc's next release 8.0.14. |
Hi, first sorry, I know that we are 2024 but I have a IBM Power7 ppc64 and I want to use to learn about model languages, I installed Debian 12 because have a version to use with ppc64. I really new about model languages, I tried to install PyTorch but architecture of processor of Power7 doesn't work. Now install spacy was successfully but I got the error: ValueError: Little-endian buffer not supported on big-endian compiler. But reading this repo you can fixed the problem with endians, can somebody help me to how configure or use in a python file, I have this as a little example: `import spacy from thinc.api import use_ops with use_ops("numpy", use_blis=False): Input texttext = "SpaCy amazing NLP." Process the textdoc = nlp(text) Tokenization: Print each token in the textprint("Tokens:") Named Entity Recognition (NER)print("\nNamed Entities:") Part-of-Speech Taggingprint("\nPart-of-Speech Tags:") Dependency Parsingprint("\nDependency Parsing:") The idea to use and learn on this Power7 server is resources of this server. Thanks, thanks! |
Based on discussion in spaCy issue 9428, there are a set of modules related to the load and use of pre-trained language pipelines and/or models on a platform with a different byteorder. This leads to incorrect results produced by the use of these pretrained models on platforms such as s390x.
As discussed in the spaCy issue referenced above, this PR adds support for thinc-bigendian-ops , which is created and located here: https://github.com/andrewsi-z/thinc-bigendian-ops .
thinc-bigendian-ops follows the precedent and use of thinc-apple-ops. When imported, it provides a custom ops class that implements the following:
The logic in thinc proper is meant to mirror the logic added for thinc_apple_ops in a minimally intrusive way.
Notes: