Byte order marker (BOM) is displayed as empty cell #1095
Comments
With reference to https://github.com/martanne/vis/wiki/FAQ#how-should-i-edit-files-in-legacy-encodings I would suggest WONTFIX here. Yes, a BOM in UTF-8 is an abomination of lesser platforms (so-called “operating systems”), which punish everybody else for their unfortunate decision to use a double-byte encoding for text. UTF-8 doesn’t need a BOM, but in my opinion that whole business should be kept outside of vis.
The issue isn't that vis should interpret or remove BOMs; it's that a ZWNBSP at the start of a file (a BOM) is currently rendered differently from a ZWNBSP elsewhere in the file. See zwnbsp.txt. The ZWNBSP between 'H' and 'e' is correctly rendered as invisible.
That was the point. If you open I have noticed this problem before but usually I just press
The same behavior can be seen with other zero-width characters such as zero-width space (ZWSP) and word joiner (WJ). zwsp-start.txt vs zwsp-middle.txt
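For anyone who wants to reproduce this, here is a minimal sketch that generates start/middle test files for the three characters mentioned so far. The file contents (`Hello`) and the `wj-*`/`zwnbsp-*` file names are assumptions for illustration; the attachments in this thread may differ.

```python
# Zero-width characters mentioned in this thread.
ZERO_WIDTH = {
    "zwnbsp": "\ufeff",  # ZERO WIDTH NO-BREAK SPACE (doubles as the BOM)
    "zwsp": "\u200b",    # ZERO WIDTH SPACE
    "wj": "\u2060",      # WORD JOINER
}


def make_samples(text="Hello\n"):
    """Return {filename: bytes}, placing each character at the start of
    the text and directly after its first letter."""
    samples = {}
    for name, ch in ZERO_WIDTH.items():
        samples[f"{name}-start.txt"] = (ch + text).encode("utf-8")
        samples[f"{name}-middle.txt"] = (text[0] + ch + text[1:]).encode("utf-8")
    return samples


if __name__ == "__main__":
    # Write the files to the current directory and show their raw bytes.
    for filename, data in make_samples().items():
        with open(filename, "wb") as f:
            f.write(data)
        print(filename, data.hex(" "))
```

Opening each `*-start.txt` next to its `*-middle.txt` counterpart should make the difference in rendering easy to compare.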
I still believe that the principle matters: all shenanigans with incorrectly encoded files (and yes, a file with a BOM is an incorrectly encoded one) should stay outside of vis and by definition are NOT a vis problem.
I agree with the principle but I also don't like that the UI gets garbled by files like
Sure, if that is so, then I guess, “SHOW ME THE PATCH!”. Also, what should happen with the content of the file? Should the BOM just be hidden but left untouched in the file, or should it really be removed?
Leave it untouched, as happens when the bytes appear in the middle of the file. I'll look into it later if I have time, but I suspect that vis is decrementing the index of where the next character is supposed to be drawn by one cell too many when it's the first character in the line. Then everything is off by one for the rest of the window.
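To make the expected behavior concrete, here is a toy model of cell layout. It is not vis's drawing code and makes no claim about where the actual bug is. It only assumes that a zero-width character should attach to a neighboring cell instead of occupying one, and that the start of the line (where there is no previous cell) needs its own branch. The `is_zero_width` test is a simplification for this sketch.

```python
import unicodedata


def is_zero_width(ch):
    # Assumption for this sketch: treat format (Cf) and nonspacing (Mn)
    # characters as zero width. A real editor would use wcwidth() or similar.
    return unicodedata.category(ch) in ("Cf", "Mn")


def layout(line):
    """Split a line into display cells. Zero-width characters share a cell
    with a visible neighbor; they only get their own cell if the line
    contains nothing else."""
    cells = []
    pending = ""  # zero-width characters seen before any visible character
    for ch in line:
        if is_zero_width(ch):
            if cells:
                cells[-1] += ch  # attach to the previous cell
            else:
                pending += ch    # start of line: no previous cell exists yet
        else:
            cells.append(pending + ch)
            pending = ""
    if pending:
        cells.append(pending)  # keep the bytes even if nothing visible follows
    return cells
```

With this model, `layout("\ufeffHello")` and `layout("H\ufeffello")` both produce five cells, and joining the cells gives back the original string, so the file content stays untouched.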
The byte order marker (BOM) is the use of a zero width no-break space character (U+FEFF) at the start of a file to indicate the encoding byte order in UTF-16/32. While not useful in UTF-8, it is legal and occasionally used as a signature to indicate UTF-8 encoding.
Consider this file: bom.txt
When opened in vis, the BOM is visible as a blank cell when it should be invisible. Interestingly, ZWNBSP is correctly displayed (or rather not displayed) when part of the rest of the file.
https://unicode.org/faq/utf_bom.html#BOM
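As a quick illustration of the description above, the following sketch shows how U+FEFF is serialized in each encoding and how to check a byte string for the UTF-8 signature. The helper names `bom_bytes` and `has_utf8_signature` are made up for this example; the `codecs` constants and the `utf-8-sig` codec are part of the Python standard library.

```python
import codecs

BOM = "\ufeff"  # ZERO WIDTH NO-BREAK SPACE, used as the byte order mark


def bom_bytes():
    """Return the serialized form of U+FEFF in each encoding."""
    return {
        enc: BOM.encode(enc)
        for enc in ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be")
    }


def has_utf8_signature(data):
    """True if the byte string starts with the UTF-8 encoded BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


if __name__ == "__main__":
    for enc, raw in bom_bytes().items():
        print(f"{enc:10} {raw.hex(' ')}")

    data = BOM.encode("utf-8") + b"Hello\n"
    print(has_utf8_signature(data))
    # The plain "utf-8" codec keeps U+FEFF as a character,
    # while "utf-8-sig" strips a leading one.
    print(repr(data.decode("utf-8")))
    print(repr(data.decode("utf-8-sig")))
```

In UTF-16 and UTF-32 the two byte orders give different byte sequences, which is what makes the mark useful there. In UTF-8 there is only one possible sequence, so it can serve only as a signature.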