Replies: 33 comments 1 reply
-
Hi @NiKola-UE The description of this issue is not very clear to me. To what does it apply? Is there something specific in this repository, or more generally in the add-on translation framework, that prevents you from translating to Serbian Cyrillic? The issue with punctuation seems quite specific. Does it apply to NVDA? Or to a specific TTS add-on?
-
Hello, Thank you for your response. I apologize if I posted this incorrectly or in the wrong place: I received an error message while posting this issue, and until recently I was just an ordinary user who has yet to get to know programming, so I was hesitant about whether I should post it here. Yes, by this issue I mean the complete translation of the entire NVDA into Serbian Cyrillic. In the language selection options in the NVDA menu > "Preferences" > "Settings" > "General", Serbian (Latin) is already available, but there is still no translation into Serbian (Cyrillic). However, the main reason for opening this issue is that the pronunciation of punctuation symbols in Serbian Cyrillic is not recognized, because all speech synthesizers incorrectly identify them as Russian. This most often happens when I use the arrow keys to navigate through already written Serbian (Cyrillic) text, in any text editor or on a web page. I hope that is a little clearer now. Like I said, I'm willing to fix it if I can at all. Thank you for your help.
-
For clarity, I'd recommend handling the two issues separately. Regarding the issue about bad punctuation reading, could you report it in a new bug report? Please complete the template carefully so that anybody can reproduce your issue, even if they do not speak Serbian. Also indicate whether it is reproducible with any TTS supporting Serbian or only with some of them. Re the translation of the GUI: if you still want to translate the GUI into Serbian Cyrillic, I'd recommend confirming with NV Access that your work can be integrated correctly before proceeding. If I am correct, Serbian (Latin) and Serbian (Cyrillic) are the same language, i.e. Serbian from Serbia, but written with two distinct alphabets; as of today I am not sure at all that a translation in Serbian Cyrillic could be integrated seamlessly.
-
Hello, As a start, could you mention which synthesizer you are using? The translation of the NVDA interface does not change this. As a last point, could you indicate some sample characters which aren't read correctly?
-
I have nothing against the issue of the correct punctuation marks being resolved separately, but first the already existing Serbian translation should be Cyrillicized, because it is really only a matter of two scripts of the same, not so polycentric language. The one difference is that three consonants are digraphs in the Latin variant, as in Croatian (one letter is written as two: Љ & љ = Lj & lj, Њ & њ = Nj & nj, Џ & џ = Dž & dž). I don't know how synthesizers affect the correct pronunciation, but until recently I used AlfaNum's commercial AnReader, and lately, besides ESpeak, I also use Newfon (which is currently only available as an NVDA add-on) and the FOSS RHVoice, in whose development I want to be involved; it depends on my needs, since I like to experiment. But in all cases the result is the same, except that ESpeak additionally misidentifies both Serbian Latin and Croatian as English, which is a problem of that synthesizer. Most punctuation marks are misidentified. So, . (tačka / тачка) is read as точка, , (zarez / зарез) as запятая, ! (uzvičnik / узвичник) as восклицательный знак, ? (upitnik / упитник) as вопросительный знак, etc. The same happens with the recognition of emoticons. Perhaps the solution to this problem could be to add a folder "sr_CY" to the "Locales" program folder (the folder "sr" could also be renamed to "sr_LA" or similar), which will probably happen automatically once Serbian is equally available in NVDA in both its variants. This has already been done for languages such as Spanish ("es" and "es_CO"), Portuguese ("pt", "pt_BR" and "pt_PT" - why three variants for the same language?) or Chinese ("zh", "zh_CN", "zh_HK" and "zh_TW"), for which there are, so to speak, parallel translations, even though we are talking about the same languages here.
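To make the digraph point concrete, here is a minimal Python sketch of a Serbian Cyrillic to Latin transliteration table. It is only an illustration of the one-to-one letter correspondence described above; the names `CYR_TO_LAT` and `to_latin` are made up for this example and are not part of NVDA or of any synthesizer.

```python
# Illustrative only: the 30 letters of the Serbian Cyrillic alphabet and
# their Latin equivalents. CYR_TO_LAT and to_latin are hypothetical names,
# not NVDA code.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}


def to_latin(text: str) -> str:
    """Transliterate Serbian Cyrillic to Latin, leaving other characters as-is."""
    out = []
    for ch in text:
        lower = ch.lower()
        if lower not in CYR_TO_LAT:
            # Punctuation, digits, spaces and Latin letters pass through.
            out.append(ch)
            continue
        latin = CYR_TO_LAT[lower]
        if ch != lower:
            # Uppercase Cyrillic letter: capitalize only the first Latin
            # letter, so that "Љ" becomes "Lj" (title case), as listed above.
            latin = latin[0].upper() + latin[1:]
        out.append(latin)
    return "".join(out)


print(to_latin("тачка, зарез, узвичник, упитник"))
print(to_latin("Љ љ Њ њ Џ џ"))
```

Note that punctuation passes through unchanged: the comma and the full stop are the same characters in both scripts, so only their spoken names depend on the language. The sketch does not handle all-caps words (where "Љ" would normally become "LJ").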
-
@NiKola-UE on my end, trying in Notepad to read punctuation with eSpeak in Serbian or in Croatian produces localized words, not English ones. |
-
I use ESpeak every day, and I have never encountered this issue. I also haven't seen any similar complaints from other users. That is to say, there must be another factor on your end causing this, one that a Cyrillic translation certainly won't fix. There is no different comma, period, question mark or exclamation mark in Cyrillic, so if they're misrecognized at this moment, there is no reason for another translation to fix that, since the Serbian punctuation file already has these symbols perfectly translated. I am not sure if Newfon or RHVoice have any bugs (maybe one of them introduces some bad symbols.dic replacement on a global level), but as a start I would try to disable all add-ons and then reproduce the issue, making sure that in NVDA's voice settings ESpeak's language is set to Serbian. Detailed steps to reproduce would certainly be helpful. One last possibility: since you said the issue happens on the web, maybe you are just encountering web pages whose authors have marked the wrong language for their website. You can also try to disable NVDA's automatic language detection in the voice settings dialog and see if the issue still reproduces. In any case, the so-called Cyrillization of the already existing translation doesn't bring any improvement to anybody; it only brings a worse experience for the synthesizers which do not support Cyrillic, as I indicated in my comment, so I would avoid that unless we can definitely and clearly identify the issue. Certain problematic punctuation marks (if they can be clearly identified) can also be manually added to the symbols file for Serbian, but the ones mentioned are already there.
-
Thanks for moving this issue to where it belongs, since I didn't know how to do it. I can open a new issue about punctuation symbols, but I don't know what else to say because I've already said almost everything here. I don't use ESpeak that often because when I read e-books or edit some text, I prefer to use synthesizers with natural voices. In addition to web pages, the indicated errors with punctuation symbols also appear when I read a document in a text editor, but they do not occur while I modify and edit that document. But when I save the changes, the same errors appear again, so it turns out that the characters are recognized based on the most closely related alphabet, and not the language, which does not happen in the case of the Latin alphabet. Again, with documents and pages written in Serbian Latin, this does not happen, so I assumed that an additional Cyrillic translation of the complete program interface could help here, making language recognition as accurate as possible, which could also "help" synthesizers cope with language and script more easily. Of course, maybe I'm wrong because I'm (still) not a developer; I don't know programming languages or how interfaces and translations from one language to another are created (including languages represented in multiple variants). However, it is possible that the errors are connected with AnReader, which I used most often and somehow got used to. And the majority of users in Serbia still use that synthesizer most often. All symbols are nicely translated into Serbian Latin, but it would not hurt if they were also translated into Cyrillic, which means an additional translation of the entire program interface (that's why I mentioned all this here, because I consider the issue global). I have already mentioned some languages for which parallel translations already exist, and this does not pose any problem for NVDA or any synthesizer.
By the way, all synthesizers and NVDA TTS add-ons that support the Serbian language also support the Cyrillic alphabet and have no problems with it...
-
Yes of course, I did not indicate it here because I do not use that synthesizer, but with AnReader, everything should work fine as well. It definitely rather sounds to me like your documents are marked as written in Russian, rather than any NVDA issue here. Is there any particular Web page you could share that demonstrates this behaviour? Background for other community members: AnReader is a commercial SAPI 5 synthesizer for Serbian, Croatian and Macedonian.
This is only true on the surface. If we look strictly at Serbian synthesizers, this is correct. However, it is also important to keep in mind that Serbian, Croatian, Bosnian and Montenegrin are very similar languages, i.e. all pronunciation and spelling rules apply equally, so community members like to have more voice variety. Other translations were mentioned, such as Portuguese or Chinese, but those translations do have semantic differences, whereas SR Cyrillic would be exactly the same as SR Latin (apart from the difference of script). There are no semantic or grammatical differences here. Anyway, it's important to first identify why you are having this issue; then we can focus on the best ways to solve it. I still think it is a simple error somewhere that will be easy to resolve without any major investment such as a new language translation.
-
Nidžo, Congratulations on the effort. That's why I will try to be as brief as possible. First of all, a Macedonian version of AnReader used to exist, but since version 3.0 it no longer does. As for the above-mentioned four languages, they are actually one and the same language, but the reasons for the different names are more political than linguistic, and politics is the main reason why the supporters of the different names of one language do not recognize the others, so I will not talk about it anymore. But I completely agree that new varieties and dialects should be added, which has already been done in ESpeak even for smaller regional varieties of the same languages (e.g. Southern American English). Adding new scripts for the same languages can only be good for the program interface - synthesizers that supported Cyrillic will continue to support it, those that didn't won't, and that's perfectly fine. The Cyrillic text is perfectly readable on all pages; only the punctuation is incorrectly recognized as Russian when I move through the text with the left and right arrows. If it helps in any way, I can send audio files demonstrating it, but I don't know exactly where to forward them. Again, maybe it depends on the synth. The automatic language change option is good precisely because of the recognition of different languages and possibly dialects, which is useful with multilingual synthesizers that support language and voice synchronization (like ESpeak or Nuance's Vocalizer Expressive). As for Microsoft's OneCore voices, it is understandable that the Croatian or Slovenian voices do not read Cyrillic, because that script has not been used in Croatia since the 19th century, although most Croats still know it. Interestingly, the Serbian OneCore voices can only read Cyrillic, but those voices cannot be used in Windows; at least as far as I know.
On the other hand, Lana, a Croatian voice of the aforementioned Vocalizer, can read Cyrillic (the Slovenian voice Tina cannot), but there are problems with the pronunciation of the letters, just like with Latin diacritical marks, which is of course a problem with that synthesizer. Bulgarian, Russian and Ukrainian voices can read Serbian Cyrillic more or less well, but they cannot recognize some letters that do not exist in the Cyrillic alphabets of those languages. The same goes for the Russian, Ukrainian and Macedonian voices of RHVoice, which also has to do with those synthesizers, not with NVDA. Newfon supports only Serbian Latin in its interface, but reads Cyrillic flawlessly; probably also because the authors are from this area, i.e. from the Balkans (ex-Yugoslavia). Without further ado, I agree that the problem is not a big one. It doesn't bother me much and I use NVDA in English, but it can confuse some other users, mainly beginners who are just getting acquainted with these things and may not understand what it is actually about... However, you, Nidžo, should try to integrate Serbian Cyrillic as much as possible into the interface and the menus themselves, which should appear in the next version, if at all possible. If there are similar errors with all synths even after updating, I will open a separate issue for that.
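The remark about letters that exist in one Cyrillic alphabet but not in another can be sketched in a few lines of Python. This is only a toy heuristic written for this discussion, under the assumption that looking at distinguishing letters is a reasonable first guess; it is not how NVDA, Windows or any synthesizer actually determines the language, and the names `SERBIAN_ONLY`, `RUSSIAN_ONLY` and `guess_cyrillic_language` are hypothetical.

```python
import unicodedata

# Letters used in Serbian Cyrillic but not in Russian, and vice versa.
# Hypothetical helper names; not NVDA, Windows or synthesizer code.
SERBIAN_ONLY = set("ђјљњћџ")
RUSSIAN_ONLY = set("ёйщъыьэюя")


def guess_cyrillic_language(text):
    """Return "sr", "ru", or None when the text gives no clear signal."""
    # Normalize so that letters such as "й" are single code points.
    letters = set(unicodedata.normalize("NFC", text).lower())
    has_sr = bool(letters & SERBIAN_ONLY)
    has_ru = bool(letters & RUSSIAN_ONLY)
    if has_sr and not has_ru:
        return "sr"
    if has_ru and not has_sr:
        return "ru"
    # Mixed signals, or only letters shared by both alphabets.
    return None


print(guess_cyrillic_language("Ово је тест."))
print(guess_cyrillic_language("Это тест."))
print(guess_cyrillic_language("тачка, зарез"))
print(guess_cyrillic_language("?"))
```

The last two calls show why punctuation is the weak spot: a lone "?" or a word made only of shared letters carries no information about the language, so whatever language tag the application or the operating system supplies ends up deciding how the symbol is spoken.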
-
With the information provided in this informal discussion, I have not been able to reproduce the issue on my side. First of all clear and detailed steps to reproduce the issue are missing. E.g.:
-
As I said: "Nidžo should try to incorporate as much as possible and integrate Serbian Cyrillic into the interface and menu itself, which should appear in the next version, if at all possible. If there are similar errors with all synths even after re-updating, I will open a separate issue for that." So, I'll wait for the next version of NVDA (no need to rush) and then I'll check if anything has changed, which means testing with all synths that support Serbian Cyrillic. If everything remains the same, then I will open a new issue. For now, there is no need for that because I have already said everything I wanted to here. The primary reason for opening this issue was the integration of the Cyrillic alphabet as such, which I thought should be better implemented in the program itself, but this is just my opinion, which may not be correct at all. That is all. Please close this issue if you think it has served its purpose, as I don't know what else to add. There is no reason to be nervous.
-
That won't solve the issue. After some investigation, interestingly enough, I just found it. I have never seen this before, but I also must admit that I do not use the automatic language switching function. I reproduce this in the Windows 11 Notepad, and at the moment I do not have access to my main device until Monday, which is a device running Windows 10, so I am unsure if this is Windows 11 specific or not. For @CyrilleB79 here are some steps to reproduce it:
Results: If I paste this sentence from Chrome into Notepad, when I read it, ESpeak switches to Russian. Somewhere, a bug is present, and the clipboard data (if Cyrillic) is assumed to be in Russian. This is probably Windows giving that language information, but I am not sure if it is a regression or if it was always so. I can perform more useful tests next week and create an issue if needed, but I am not sure if NVDA can do anything here. @NiKola-UE Temporarily, when using AnReader, you can disable language detection inside NVDA's voice settings. This setting won't do anything useful in this case anyway, because AnReader does not support any other languages to switch to.
-
@nidza07 thanks for reproducing the Russian-switch issue. Re the localization of NVDA in Serbian Cyrillic, I have no opinion, not being a Serbian speaker myself. Do not expect, though, that such a localization will solve the punctuation issue without any other investigation.
-
Note: I am using Windows 10 to try to reproduce. |
-
As far as I'm concerned, it's not a problem to close both issues. But since this issue is open, I will comment only briefly here. In principle, I have already said everything I wanted to. I can only attach audio recordings that demonstrate what was written about, purely so that Nidžo would have some idea of what it looks like in practice, if it could be of any use to him. But as I said before, those recordings are completely useless to those who don't know Serbian, unless they want to hear how that language sounds or something like that... Anyway, I think this will be fixed in one of the next versions of NVDA. |
-
Hello, Unless I can really find a case on the web where this reproduces even though the Web page language is Serbian, I'm not sure if we should really open an issue or rather submit feedback to Microsoft for the Notepad problem. @NiKola-UE I don't need audio recordings, but in one of my earlier comments, I have asked for some sample web pages where you reproduce the issue, and there was no reply to this. |
-
@nidza07 or @NiKola-UE, would you be able to provide a log at debug level while reproducing the punctuation issue? A log would help see if wrong language information is passed by NVDA in the speech sequence or if the TTS is responsible for this issue. |
-
Hi, Nidžo. As I said before, I don't use Notepad in Windows 11 because I don't like the way it looks. Instead, I use the programs I mentioned earlier, although I get those errors when I use LibreOffice and WordPad, which has already been discontinued. Of all the synthesizers we named, I used ESpeak the least often. Maybe it's a general problem with Microsoft's SAPI 5. Although it's not appropriate to ask you here (I'll delete this part of the comment when you read it), I would like the addon Anyway, let Sean close this issue whenever he wants.
-
Hello, Anyway, it seems like the issue does affect rich edit controls, even on Windows 10, as I was able to reproduce it in WordPad. Thus, I have two debug logs: They basically demonstrate the same steps: open Chrome, type "this is a test" in Serbian Cyrillic, copy it, and paste it into the editor of either Notepad in Windows 11 or WordPad.
-
Yes, Nidžo, you said and did everything well. However, take into account that I mostly used SAPI 5-based synthesizers, which generally do not have their own settings for languages and their customization, and that for writing and editing documents I use the cross-platform FOSS LibreOffice, whose main format is ODF, so it can happen that .DOC and .DOCX documents are displayed drastically differently than in Word, which may cause certain problems for screen readers. The same errors can occur when I use PDF readers such as Foxit PDF Reader, which is my main one, although I also happily use Blind Pandas Team's Bookworm, which seems to be no longer developed because the last version is from two years ago, and the official website getbookworm.com is not available... I don't think that this is a bug that needs to be removed, but more of a so-called "misunderstanding" that will surely not happen again as the program continues to develop, improve and progress in the future... Also, please tell me a little something about the IBMTTS addon or, better yet, send me a private message via Anyway, thanks for the expert help and greetings.
-
OK. Thanks to both. With WordPad, I am finally able to reproduce this on my side on Windows 10. I (or anybody else) need to investigate where the ru_RU comes from.
-
Additional information: |
-
Hi, CyrilleB. WordPad is a simple program that should not be used unless absolutely necessary - usually when there is no other, better choice. This program can be good only for RTF documents. Earlier versions had a lot of problems with Unicode, and if you edit and save any .doc(x) documents, it can easily happen that there are quality losses, which the program itself will warn you about. It's no surprise that Microsoft withdrew it because it's almost no longer used, and it's no secret that some malware was created in WordPad. As for what we are discussing here, I repeat that the problems with incorrect recognition of punctuation symbols occur mainly when I use the arrows to navigate through already saved and finished texts, not while I am writing.
-
@NiKola-UE, WordPad is useful for me to be able to test this issue, and more specifically the Rich Edit component. The Rich Edit component is also used in other applications, e.g. Windows 11 Notepad or NVDA's log window. But to be clear, I do not recommend that anybody use WordPad for normal usage.
-
Hi, In that case turn off the language detection, because you are getting the wrong markup of the language.
-
I'm not saying that WordPad is so bad; I agree that it's good for some simpler documents where Notepad gets stuck, although WordPad does not support code editing. I think LibreOffice should also be adapted a little better for NVDA, but a separate issue can be opened for that, unless someone has already done it.
-
Correct me if I'm wrong, but I think I've finally discovered which bush the rabbit is in. It is possible that the problem is not in the option "Automatic language switching...", but in the next two: "Trust voice language when processing characters and symbols" and "Include Unicode consortium data...". So it is about (Unicode) symbols, not about any language. It seems to me that something is not quite as it should be, but how it should be solved - I don't know; I'm not smart... Of course, I can very easily uncheck both mentioned options, but... Although I still think there should be a parallel interface translation in Cyrillic, Nidžo07 is absolutely right when he says that basically nothing would be improved. But maybe now we know how and why these mistakes happen.
-
Just to refresh the thread a bit before it's closed. I remember that I had to uncheck the "Report spelling errors" option in "Document Formatting" because it almost always happened that most, if not all, Croatian and Serbian Latin words were incorrectly recognized as spelling errors (misspellings), since those words do not exist in English. In the newer versions this has been corrected; the error still occurs sometimes, but less often. I assume that it will be the same with what we are talking about here, especially now that we know where it happens. If no one has anything to add, we can slowly close this issue, which I mistitled and probably posted in the wrong place without knowing where the problem actually is or why it's happening...
-
It's OK; since there are no more reactions to this, I will take the liberty of closing this issue. Just one note for @nidza07: think about the options I mentioned in the last messages. Yes, it's easy to turn them off, but you need to check why it's happening in order to fix it somehow. Thank you for your attention.
-
Hello,
Serbian Cyrillic punctuation symbols and marks are mistakenly recognized as Russian. This mostly happens when reading already written text; it doesn't happen when typing.
The Serbian and Russian Cyrillic alphabets are similar to some extent, but the languages differ drastically, so the names of those characters also differ. This can confuse users who do not know Russian, or even frustrate those who simply do not want the punctuation in Serbian texts to be pronounced in Russian.
Sometimes it also happens that the pronunciations of Serbian and Croatian Latin characters are mixed up, but this is much less common, even though these two Latin alphabets are exactly the same, apart from minor differences in the names of the characters.
By the way, NVDA has already been translated into Serbian Latin, but not yet into Serbian Cyrillic, which might solve the problem at least to some extent. I'm not a programmer, but I can translate the files used for this. However, I really don't know exactly what to do, how to do it, or where to send it.
Thanks in advance for all suggestions and any help.