Define how grammars work and give examples #57
Do you mean how to use @gshires do you have any context on what these methods do? @marcoscaceres when you've looked at the API, could you make sense of this bit? I'm adding use counters to Chromium to figure out how this is used, and if the grammar stuff isn't really used in the wild it's possible it could be removed. |
Yes, I mean having a understanding use case where |
The basic idea is to decrease word error rate for STT.
You'd set An example, say you are doing English STT for restaurant that has Indian dishes. So you'd want to have a grammar that includes the names of Indian dishes, e.g. palak paneer, to decrease word error rate for STT unfamiliar with such terms. As to if it's used in Chromium, I do not know. |
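A minimal sketch of what such a grammar string could look like for the restaurant case, assuming JSGF with its `/weight/` alternative syntax. The helper name `buildWeightedJsgf`, the grammar and rule names, and the dishes other than palak paneer are made up for illustration, and nothing in this thread confirms that any browser engine honors these weights.

```javascript
// Hypothetical helper (not part of the Web Speech API): builds a JSGF
// grammar string whose alternatives carry JSGF-style /weight/ prefixes.
function buildWeightedJsgf(grammarName, ruleName, weightedPhrases) {
  const alternatives = weightedPhrases
    .map(({ phrase, weight }) => `/${weight}/ ${phrase}`)
    .join(' | ');
  return `#JSGF V1.0; grammar ${grammarName}; public <${ruleName}> = ${alternatives} ;`;
}

// Illustrative dish list; only "palak paneer" comes from the comment above.
const dishGrammar = buildWeightedJsgf('dishes', 'dish', [
  { phrase: 'palak paneer', weight: 10 },
  { phrase: 'chana masala', weight: 5 },
  { phrase: 'naan', weight: 1 },
]);
```

The resulting string is what would be handed to `SpeechGrammarList.addFromString()`; whether that changes recognition results depends entirely on the speech engine.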
The spec fails to specify the format the grammar is in (see "ISSUE 3" in the spec). This really should not be in the spec at all, given how poorly specified it all is. We should get rid of all of In Chrome, the src just returns the URL from the web page - so seems completely useless. |
@marcoscaceres, here @jlguenego was asking: What is a use case for a grammar. The issues you mention, while valid, are orthogonal to the "what is a use case for a grammar" question. So maybe they should be in a different GitHub issue? |
A new issue for removing SpeechGrammarList and associated uses would be great. |
Sent PR #58 |
https://bugs.chromium.org/p/chromium/issues/detail?id=680944 has some clues about what grammars are for, although it's a bug about something that doesn't work:
recognizer = new SpeechRecognition(); //standard
recognizer.continuous = false; //stops listening in a pause
recognizer.lang = "en-GB"; //"en-GB" "el-GR"
recognizer.interimResults = true;
recognizer.maxAlternatives = 5;
var commands = ['transfer' , 'inquiry', 'statement','balance'];
var grammar = '#JSGF V1.0; grammar commands; public <command> = '
+ commands.join(' | ') + ' ;';
var speechRecognitionList = new SpeechGrammarList();
speechRecognitionList.addFromString(grammar, 1);
recognizer.grammars = speechRecognitionList;
This example is about limiting recognition to a small set of words. A banking use case can be inferred from the set of strings in the example: 'transfer', 'inquiry', 'statement', 'balance'. But again, this doesn't work in Chrome. @jlguenego I'll go ahead and rename this issue to go beyond just asking for examples, hope you don't mind. |
A fuller example from MDN is the tutorial we wrote several years ago. However, I doubt if any browser supports the tutorial as written as it uses JSGF which as far as I know is not supported in any browser. |
Unless the interface works and actually does something useful by changing the outcome of speech recognition, the API surface itself is write-only and doesn't reveal anything, there isn't even a way to feature detect what's supported :) |
I've done some digging in HTTP Archive for pages containing "SpeechRecognition" and ".grammars" and found 381 results. Most are variations of the same script, and all the bits producing grammar strings that I could interpret are using JSGF. So, probably that has worked to some extent. |
I found 32 references to "jspeech" which is probably https://github.com/tur-nr/node-jspeech maintained by @tur-nr. @tur-nr, do you know which browsers JSGF works in? |
@foolip Andre Natal and I implemented JSGF in Firefox and Firefox OS many years ago, both "prefed off" by default, but removed both. I'd guess most of the HTTP Archive hits you found are variations of the tutorial we made for the now dead FirefoxOS. |
@kdavis-mozilla numerically most actually seem to be https://cdn.botframework.com/botframework-webchat/latest/botchat.js or variations on this. botframework.com is a Microsoft framework, so perhaps we could find someone to help shed light on how grammar is being used in this project. @thejohnjansen, are you able to help? |
@foolip I wrote that library a few years back when doing a hackathon project with the speech API. I don't know the browser support unfortunately, but I used Chromium (WebKit) Speech Recognition. Wasn't sure it was doing anything as Chrome just captured anything I said regardless of the grammar I gave it 🤷♀️ Microsoft have used it in their bot framework yes. You can review their usage on GitHub. Hope that helps 😬 |
SpeechGrammar's addFromUri method was already measured because the capitalization doesn't match the spec, but also measure the other parts of the grammar API surface to learn how widely used it is. Prompted by spec issue: WebAudio/web-speech-api#57 Change-Id: Ib9289f911ad4966d5e0c836924444cd1d3b4be60 Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/1732790 Auto-Submit: Philip Jägenstedt <[email protected]> Reviewed-by: Henrik Boström <[email protected]> Commit-Queue: Philip Jägenstedt <[email protected]> Cr-Commit-Position: refs/heads/master@{#683515}
Thanks @tur-nr! I tried to find use of grammars in https://github.com/microsoft/BotFramework-WebChat but couldn't find the same code as I see in https://cdn.botframework.com/botframework-webchat/latest/botchat.js. @billba @danmarshall @corinagum, I see you are among the top contributors to that repo at Microsoft. Can any of you shed light on how https://cdn.botframework.com/botframework-webchat/latest/botchat.js uses the web-exposed APIs in https://w3c.github.io/speech-api/#speechreco-speechgrammar, which we're discussing in this issue? Specifically, what kind of values are you passing to |
That code resembles https://github.com/microsoft/BotFramework-WebChat/blob/4849ce2928125475ee801a7bc90973cfa8db9c6e/src/SpeechModule.ts |
Yes, that's it, added by @compulim in microsoft/BotFramework-WebChat#937. @compulim, you mentioned "Web Speech API + JSRF" there, did you get that working in any browser at the time? |
With help from @gshires I've been able to locate what happens with the grammars in Chromium, the weights and URLs are passed along to the speech recognition engine. Since neither are interpreted by Chromium, this is effectively just a way to pass engine-specific configuration or options along. As with text layout engines or WebRTC, having some controls is reasonable, and standardizing some options across speech engines might be possible. So, at least as implemented in Chromium, grammars aren't quite what you'd expect them to be, and I would not recommend trying to use it since the behavior isn't documented and could change. I've sent https://chromium-review.googlesource.com/c/chromium/src/+/1732790 to measure the usage of these APIs in more detail. |
What does "implemented" mean here? Is the grammar only retained or is it retained and used to affect STT results? |
@foolip Agree with your observations. Although Chromium says it supports JSGF, when I send JSGF with weighted phrases, I don't notice any difference. It feels like the JSGF is simply ignored. But my observation is very subjective because I am using my own voice to test the engine. Without looking at the source code, it's very hard to say whether JSGF is working or ignored. |
@kdavis-mozilla Chromium only exposes the interfaces and passes along the grammar URL and weight as given without interpretation. Any actual effect would be in the speech engine service and that is neither open source nor documented, AFAICT. I wasn't able to find any use of grammars in httparchive that seems to have any effect in Chrome. @compulim I'm pretty sure at this point that What I'd like to do at this point:
I think the outcome will likely be adding other ways to configure the speech recognition, maybe more attributes. But it depends a lot on the feasibility of changing/removing the existing APIs. |
As a designer of an STT engine I'd say this is not the way to go. The only attributes I can think of that might make sense across engines are grammars/language models with weights. Even this is very engine specific. Generally, there are many other such "attributes" that are STT engine specific and even vary from one version of an STT engine to the next. So exposing them in this API is not going to be a good design decision. |
It's probably easier to discuss a concrete example than the general idea of adding attributes, but I don't have a concrete example at this time. |
I can come up with many concrete examples, all of them bad 😊 Off the top of my head here are two... Beam Width However, the beam width is usually tuned such that the beam is as large as possible for the given resource budget. Allowing the users to set the beam width above this value will increase the quality but invalidate the financial calculations that went into deployment of the STT engine. Allowing users to decrease this beam width will decrease the quality of the STT results and frustrate users. Language Model vs Acoustic Model Weighting These weights could be some of these "attributes". However, giving users access to these weights will allow them to tune the engine away from its optimal configuration for the in-browser use case. Decreasing quality and frustrating users. If you want more examples I can provide more. |
No, no, I wasn't looking for examples, of course there are more bad ideas than good ideas, and bad ones are easy to list. I'll open new issues for any actual proposals if they show up. |
The Chrome/Chromium implementation of What is known is that at Chrome/Chromium no permission is requested and no notification is provided that the user PII biometric data (the users' voice) is being recorded and sent to an undisclosed third-party web service. It is unclear if the users' voice is stored forever, and further used for research and development of proprietary technologies #56. It is not documented exactly how the third-party web service performs STT. Additionally, so-called "curse" words should not be censored in the result. Until the glaring issue at Chrome/Chromium of users not being notified and not being asked permission for their voice to be recorded and sent to an undisclosed web service, there is no way to practically test or implement grammars. What can be done now is to 1) review how https://github.com/cmusphinx/pocketsphinx handles grammars https://github.com/cmusphinx/pocketsphinx/search?q=grammar&unscoped_q=grammar; 2) start from scratch converting voice to IPA (https://en.wikipedia.org/wiki/International_Phonetic_Alphabet) to words in a given language, essentially the reverse of |
Hello @foolip, is there any update on the usage report for the grammars API? |
Hi @szewai! Here's the data from the use counters we have in Chrome, including the ones added in #57 (comment) and some more:
The However, as stated in #57 (comment), these "grammars" are effectively engine-specific options. If we find that usage in the wild depends on this in some important way, I think we should first try to define what the effect of certain invocations of |
Safari does not currently implement webkitSpeechGrammarList and throws an error at line 2. I have guarded against this, so the script now works in Safari on desktop and iPhone. I've also added a comment around the grammar code because no browser currently supports them, as noted in this WICG discussion: WebAudio/web-speech-api#57 Since we'll check usage stats to determine if grammars should be removed from the spec, and this script encourages grammar usage (I imagine this code will be copy/pasted assuming they have an effect), I added a comment to discourage their use. I didn't want to remove grammars completely, as this script is used as a demonstration of the spec on MDN.
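The guard described in that commit message can be sketched roughly as follows. This is an assumption about its shape, not the actual MDN script: the function names are hypothetical, and the `scope` parameter stands in for `window` so the logic can be exercised outside a browser.

```javascript
// Returns the grammar-list constructor if the given global scope exposes
// one (unprefixed or webkit-prefixed), otherwise null.
function getSpeechGrammarListCtor(scope) {
  return scope.SpeechGrammarList || scope.webkitSpeechGrammarList || null;
}

// Attaches a grammar only when the constructor exists, instead of throwing
// where it is missing. A true return value means the grammar was attached,
// NOT that the speech engine will take it into account.
function attachGrammarIfSupported(recognizer, scope, grammarString, weight = 1) {
  const GrammarList = getSpeechGrammarListCtor(scope);
  if (!GrammarList) {
    return false;
  }
  const list = new GrammarList();
  list.addFromString(grammarString, weight);
  recognizer.grammars = list;
  return true;
}

// In a browser this might be called as:
//   attachGrammarIfSupported(recognizer, window, grammar, 1);
```

Detecting the constructor only shows that the interface is exposed; as noted earlier in the thread, there is no way to detect whether a given grammar format has any effect.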
The removal of SpeechGrammar from the Web Speech API spec was discussed at TPAC 2024, and the popular opinion was to remove it from the spec, as it is underspecified and unsupported by many implementations of the Web Speech API, including Chrome. |
You should provide a fully working example of how to use grammar, because I did not see any use case where I could use that.
I do not understand what is the purpose of having grammar and how to use it.
"google search" did not help me.