Define how grammars work and give examples #57

jlguenego · 2019-07-29T14:37:29Z

You should provide a fully working example of how to use grammars, because I have not seen any use case where I could use them.

I do not understand the purpose of grammars or how to use them.
A Google search did not help me.

foolip · 2019-08-01T14:36:47Z

Do you mean how to use recognition.grammars.addFromURI(...) and recognition.grammars.addFromString(...)? A while ago I tried to work this out myself by looking at the Chromium source code, but I couldn't get to the bottom of what they actually do.

@gshires do you have any context on what these methods do?

@marcoscaceres when you've looked at the API, could you make sense of this bit?

I'm adding use counters to Chromium to figure out how this is used, and if the grammar stuff isn't really used in the wild it's possible it could be removed.

jlguenego · 2019-08-01T14:47:04Z

Yes, I mean having an understandable use case where recognition.grammars.addFromURI(...) would be useful.

kdavis-mozilla · 2019-08-01T15:14:20Z

> I do not understand what is the purpose of having grammar...

The basic idea is to decrease word error rate for STT.

> ...and how to use it

You'd set grammars on a SpeechRecognition instance before calling start().

For example, say you are doing English STT for a restaurant that serves Indian dishes. You'd want a grammar that includes the names of those dishes, e.g. palak paneer, to decrease the word error rate of an STT engine unfamiliar with such terms.

As to whether it's used in Chromium, I do not know.
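
To make the restaurant use case concrete, here is a minimal sketch of building such a grammar string. It assumes the JSGF format that appears later in this thread; the spec itself does not define a grammar format, and the helper name `buildJsgfGrammar` and the extra dish names are hypothetical, not part of the Web Speech API.

```javascript
// Hypothetical helper (not part of the Web Speech API): builds a JSGF
// grammar string with a single public rule listing alternative phrases.
function buildJsgfGrammar(grammarName, ruleName, phrases) {
  if (phrases.length === 0) {
    throw new Error("at least one phrase is required");
  }
  // JSGF separates alternatives with "|".
  const alternatives = phrases.join(" | ");
  return `#JSGF V1.0; grammar ${grammarName}; public <${ruleName}> = ${alternatives} ;`;
}

// Dish names are illustrative only.
const dishGrammar = buildJsgfGrammar("dishes", "dish", [
  "palak paneer",
  "chana masala",
  "aloo gobi",
]);

console.log(dishGrammar);
```

The resulting string is what one would pass to addFromString() before calling start(). Whether it changes recognition results depends entirely on the engine behind the browser.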

marcoscaceres · 2019-08-02T04:35:44Z

The spec fails to specify the format the grammar is in (see "ISSUE 3" in the spec). This really should not be in the spec at all, given how poorly specified it all is.

We should get rid of SpeechGrammarList entirely (looks like more fingerprinting surface).

In Chrome, the src just returns the URL from the web page, so it seems completely useless.

kdavis-mozilla · 2019-08-02T05:13:54Z

@marcoscaceres, here @jlguenego was asking: What is a use case for a grammar.

The issues you mention, while valid, are orthogonal to the "what is a use case for a grammar" question. So maybe they should be in a different GitHub issue?

marcoscaceres · 2019-08-02T05:18:40Z

A new issue for removing SpeechGrammarList and associated uses would be great.

marcoscaceres · 2019-08-02T05:34:00Z

Sent PR #58

foolip · 2019-08-02T06:07:21Z

https://bugs.chromium.org/p/chromium/issues/detail?id=680944 has some clues about what grammars are for, although it's a bug about something that doesn't work:

const recognizer = new SpeechRecognition(); // standard
recognizer.continuous = false; // stop listening after a pause
recognizer.lang = "en-GB"; // e.g. "en-GB" or "el-GR"
recognizer.interimResults = true;
recognizer.maxAlternatives = 5;
const commands = ['transfer', 'inquiry', 'statement', 'balance'];
// JSGF grammar with one public rule listing the allowed commands
const grammar = '#JSGF V1.0; grammar commands; public <command> = '
  + commands.join(' | ') + ' ;';
const speechRecognitionList = new SpeechGrammarList();
speechRecognitionList.addFromString(grammar, 1); // second argument is the weight
recognizer.grammars = speechRecognitionList;

This example is about limiting recognition to a small set of words. A banking use case can be inferred from the set of strings in the example: 'transfer' , 'inquiry', 'statement', 'balance'.

But again, this doesn't work in Chrome.

@jlguenego I'll go ahead and rename this issue to go beyond just asking for examples, hope you don't mind.

foolip · 2019-08-02T06:33:18Z

Searching for "webkitSpeechRecognition JSGF" there are some examples:
https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecognition/grammars#Examples

StackOverflow questions:

kdavis-mozilla · 2019-08-02T06:44:18Z

A fuller example from MDN is the tutorial we wrote several years ago.

However, I doubt that any browser supports the tutorial as written, because it uses JSGF, which as far as I know is not supported in any browser.

foolip · 2019-08-02T07:08:49Z

> looks like more fingerprinting surface

Unless the interface works and actually does something useful by changing the outcome of speech recognition, the API surface itself is write-only and doesn't reveal anything. There isn't even a way to feature detect what's supported :)

foolip · 2019-08-02T08:33:52Z

I've done some digging in HTTP Archive for pages containing "SpeechRecognition" and ".grammars" and found 381 results. Most are variations of the same script, and all the bits producing grammar strings that I could interpret are using JSGF. So, probably that has worked to some extent.

foolip · 2019-08-02T08:36:20Z

I found 32 references to "jspeech" which is probably https://github.com/tur-nr/node-jspeech maintained by @tur-nr. @tur-nr, do you know which browsers JSGF works in?

kdavis-mozilla · 2019-08-02T08:45:47Z

@foolip Andre Natal and I implemented JSGF in Firefox and Firefox OS many years ago, both "prefed off" by default, but we later removed both.

I'd guess most of the HTTP Archive hits you found are variations of the tutorial we made for the now dead Firefox OS.

foolip · 2019-08-02T09:11:19Z

@kdavis-mozilla numerically most actually seem to be https://cdn.botframework.com/botframework-webchat/latest/botchat.js or variations on this. botframework.com is a Microsoft framework, so perhaps we could find someone to help shed light on how grammar is being used in this project. @thejohnjansen, are you able to help?

tur-nr · 2019-08-02T09:29:06Z

@foolip I wrote that library a few years back when doing a hackathon project with the speech API. I don't know the browser support unfortunately, but I used Chromium (WebKit) Speech Recognition. Wasn't sure it was doing anything as Chrome just captured anything I said regardless of the grammar I gave it 🤷‍♀️

Microsoft have used it in their bot framework yes. You can review their usage on GitHub.

Hope that helps 😬

SpeechGrammar's addFromUri method was already measured because the capitalization doesn't match the spec, but also measure the other parts of the grammar API surface to learn how widely used it is.

Prompted by spec issue: WebAudio/web-speech-api#57

Change-Id: Ib9289f911ad4966d5e0c836924444cd1d3b4be60
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/1732790
Auto-Submit: Philip Jägenstedt
Reviewed-by: Henrik Boström
Commit-Queue: Philip Jägenstedt
Cr-Commit-Position: refs/heads/master@{#683515}

foolip · 2019-08-02T09:55:15Z

Thanks @tur-nr! I tried to find use of grammars in https://github.com/microsoft/BotFramework-WebChat but couldn't find the same code as I see in https://cdn.botframework.com/botframework-webchat/latest/botchat.js.

@billba @danmarshall @corinagum, I see you are among the top contributors to that repo at Microsoft. Can any of you shed light on how https://cdn.botframework.com/botframework-webchat/latest/botchat.js uses the web-exposed APIs in https://w3c.github.io/speech-api/#speechreco-speechgrammar, which we're discussing in this issue? Specifically, what kind of values are you passing to addFromString and have you found it to have an effect on any browser?

saschanaz · 2019-08-05T06:30:32Z

> Thanks @tur-nr! I tried to find use of grammars in https://github.com/microsoft/BotFramework-WebChat but couldn't find the same code as I see in https://cdn.botframework.com/botframework-webchat/latest/botchat.js.

That code resembles https://github.com/microsoft/BotFramework-WebChat/blob/4849ce2928125475ee801a7bc90973cfa8db9c6e/src/SpeechModule.ts

foolip · 2019-08-05T07:38:56Z

Yes, that's it, added by @compulim in microsoft/BotFramework-WebChat#937. @compulim, you mentioned "Web Speech API + JSRF" there, did you get that working in any browser at the time?

foolip · 2019-08-05T07:51:05Z

With help from @gshires I've been able to locate what happens with the grammars in Chromium: the weights and URLs are passed along to the speech recognition engine. Since neither is interpreted by Chromium, this is effectively just a way to pass engine-specific configuration or options along.

As with text layout engines or WebRTC, having some controls is reasonable, and standardizing some options across speech engines might be possible.

So, at least as implemented in Chromium, grammars aren't quite what you'd expect them to be, and I would not recommend trying to use them since the behavior isn't documented and could change.

I've sent https://chromium-review.googlesource.com/c/chromium/src/+/1732790 to measure the usage of these APIs in more detail.

kdavis-mozilla · 2019-08-05T07:55:27Z

> So, at least as implemented in Chromium

What does "implemented" mean here?

Is the grammar only retained or is it retained and used to affect STT results?

compulim · 2019-08-05T08:32:24Z

@foolip Agree with your observations.

Although Chromium says it supports JSGF, when I send JSGF with weighted phrases, I don't feel the difference. It feels like the JSGF is simply ignored.

But my observation is very subjective because I am using my voice to test the engine. Without looking at the source code, it's very hard to say whether JSGF is working or ignored.

foolip · 2019-08-05T08:36:20Z

@kdavis-mozilla Chromium only exposes the interfaces and passes along the grammar URL and weight as given without interpretation. Any actual effect would be in the speech engine service and that is neither open source nor documented, AFAICT. I wasn't able to find any use of grammars in httparchive that seems to have any effect in Chrome.

@compulim I'm pretty sure at this point that addFromString doesn't do anything at all in Chromium, and the source code doesn't mention JSGF anywhere.

What I'd like to do at this point:

Understand what configuration the engine is capable of via the grammar-related APIs
Wait for stats from new use counters to be available

I think the outcome will likely be adding other ways to configure the speech recognition, maybe more attributes. But it depends a lot on the feasibility of changing/removing the existing APIs.

kdavis-mozilla · 2019-08-05T09:20:07Z

> ...maybe more attributes...

As a designer of an STT engine I'd say this is not the way to go.

The only attributes I can think of that might make sense across engines are grammars/language models with weights. Even this is very engine specific.

Generally, there are many other such "attributes" that are STT engine specific and even vary from one version of an STT engine to the next. So exposing them in this API is not going to be a good design decision.

foolip · 2019-08-05T12:18:32Z

It's probably easier to discuss a concrete example than the general idea of adding attributes, but I don't have a concrete example at this time.

kdavis-mozilla · 2019-08-05T12:43:54Z

I can come up with many concrete examples, all of them bad 😊 Off the top of my head here are two...

Beam Width
For example, when using a language model one usually introduces a beam search in which the beam elements are ordered by language model and acoustic model scoring. The width of this beam, the "beam width", could be one of these "attributes".

However, the beam width is usually tuned such that the beam is as large as possible for the given resource budget. Allowing the users to set the beam width above this value will increase the quality but invalidate the financial calculations that went into deployment of the STT engine. Allowing users to decrease this beam width will decrease the quality of the STT results and frustrate users.
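
To illustrate why beam width affects quality, here is a toy beam search. It is a sketch only: the scorer `nextWordLogProbs`, its vocabulary, and all probabilities are made up for illustration and do not describe any real STT engine. With a width of 1 the search commits to the locally best first word and misses the better overall transcript, while a width of 2 keeps the alternative alive long enough to find it.

```javascript
// Toy scorer with made-up numbers: returns log-probabilities for the next
// word given the words chosen so far. A real engine would combine acoustic
// and language model scores here.
function nextWordLogProbs(tokens) {
  if (tokens.length === 0) {
    return { pollock: Math.log(0.6), palak: Math.log(0.4) };
  }
  if (tokens[0] === "pollock") {
    return { paneer: Math.log(0.1), pioneer: Math.log(0.3) };
  }
  return { paneer: Math.log(0.9), pioneer: Math.log(0.1) };
}

// Keeps only the `beamWidth` best partial transcripts after each step.
function beamSearch(scorer, numSteps, beamWidth) {
  let beam = [{ tokens: [], score: 0 }];
  for (let step = 0; step < numSteps; step++) {
    const expanded = [];
    for (const hyp of beam) {
      for (const [token, logProb] of Object.entries(scorer(hyp.tokens))) {
        expanded.push({
          tokens: [...hyp.tokens, token],
          score: hyp.score + logProb,
        });
      }
    }
    // Highest score first, then prune to the beam width.
    expanded.sort((a, b) => b.score - a.score);
    beam = expanded.slice(0, beamWidth);
  }
  return beam;
}

const narrow = beamSearch(nextWordLogProbs, 2, 1)[0].tokens.join(" ");
const wide = beamSearch(nextWordLogProbs, 2, 2)[0].tokens.join(" ");
console.log({ narrow, wide });
```

The wider beam does more work per step (it expands two hypotheses instead of one), which is the resource cost referred to above.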

Language Model vs Acoustic Model Weighting
When one creates an STT engine it generally has two components, a language model and an acoustic model. Both models work together to assign probabilities to proposed transcripts. The language model suggests its probability for a transcript and the acoustic model suggests its probability for the same transcript. The final probability is given by assigning a weight to the language model's probabilities and a weight to the acoustic model's probabilities. These weights are laboriously tuned to optimize performance for particular use cases.

These weights could be some of these "attributes". However, giving users access to these weights will allow them to tune the engine away from its optimal configuration for the in-browser use case, decreasing quality and frustrating users.
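
As a rough sketch of the weighting described above, one common form is a weighted sum of log-probabilities. That form is an assumption here, as are the function names and all numbers; the comment does not say how any particular engine combines its scores. The example shows how changing the language model weight flips which transcript wins.

```javascript
// Assumed combination rule: weighted sum of log-probabilities.
function combinedScore(candidate, weights) {
  return (
    weights.acoustic * Math.log(candidate.acousticProb) +
    weights.language * Math.log(candidate.languageProb)
  );
}

// Returns the transcript text with the highest combined score.
function pickBest(candidates, weights) {
  let best = candidates[0];
  for (const candidate of candidates) {
    if (combinedScore(candidate, weights) > combinedScore(best, weights)) {
      best = candidate;
    }
  }
  return best.text;
}

// Made-up probabilities: the first transcript sounds slightly closer to the
// audio, the second is far more plausible as a phrase.
const candidates = [
  { text: "pollock pioneer", acousticProb: 0.6, languageProb: 0.01 },
  { text: "palak paneer", acousticProb: 0.3, languageProb: 0.2 },
];

const balanced = pickBest(candidates, { acoustic: 1, language: 1 });
const acousticOnly = pickBest(candidates, { acoustic: 1, language: 0 });
console.log({ balanced, acousticOnly });
```

With both weights at 1 the language model's preference decides the outcome. With the language weight at 0 only the acoustic score counts, which is the kind of detuning the comment warns about.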

If you want more examples I can provide more.

foolip · 2019-08-06T04:53:07Z

No, no, I wasn't looking for examples, of course there are more bad ideas than good ideas, and bad ones are easy to list. I'll open new issues for any actual proposals if they show up.

guest271314 · 2019-08-12T20:47:53Z

@compulim

> @foolip Agree with your observations.
>
> Although Chromium says it supports JSGF, when I send JSGF with weighted phrases, I don't feel the difference. It feels like the JSGF is simply ignored.
>
> But my observation is very subjective because I am using my voice to test the engine. Without looking at the source code, it's very hard to say whether JSGF is working or ignored.

The Chrome/Chromium implementation of SpeechRecognition is essentially a black box.

What is known is that in Chrome/Chromium no permission is requested and no notification is provided that the user's biometric PII (the user's voice) is being recorded and sent to an undisclosed third-party web service. It is unclear whether the user's voice is stored forever and further used for research and development of proprietary technologies (see #56). It is not documented exactly how the third-party web service performs STT.

Additionally, so-called "curse" words should not be censored in the result.

Until the glaring issue in Chrome/Chromium of users not being notified and not being asked permission for their voice to be recorded and sent to an undisclosed web service is resolved, there is no way to practically test or implement grammars.

What can be done now is to 1) review how https://github.com/cmusphinx/pocketsphinx handles grammars https://github.com/cmusphinx/pocketsphinx/search?q=grammar&unscoped_q=grammar; 2) start from scratch converting voice to IPA (https://en.wikipedia.org/wiki/International_Phonetic_Alphabet) to words in a given language, essentially the reverse of
https://github.com/itinerarium/phoneme-synthesis; see also

szewai · 2021-01-20T04:02:29Z

Hello @foolip, is there any update on the usage report for the grammars API?

foolip · 2021-01-25T19:59:57Z

Hi @szewai!

Here's the data from the use counters we have in Chrome, including the ones added in #57 (comment) and some more:

new webkitSpeechRecognition(): ~10%
SpeechRecognition.prototype.start(): ~0.06%
SpeechRecognition.prototype.grammars getter: ~0.0001%
SpeechRecognition.prototype.grammars setter: ~0.6%
new webkitSpeechGrammar(): ~0.0006%
new webkitSpeechGrammarList(): ~2%
SpeechGrammarList.prototype.addFromString(): ~0.0008%
SpeechGrammarList.prototype.addFromUri(): ~0.02%
SpeechGrammarList.prototype.item(): ~0%

The new webkitSpeechRecognition() and new webkitSpeechGrammarList() usage is much higher than I would have guessed, but SpeechRecognition.prototype.start() gives a much better idea of the real usage. The addFromString() and addFromUri() usage in particular should be understood in relation to that, and a reasonable interpretation is that addFromUri() is often used when there's real usage of the API happening. (It could be that usage of start() and addFromUri() is mutually exclusive, but I see no reason to suspect it.)
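
The comparison against start() can be worked through with the figures above. This sketch assumes all counters share the same denominator, which the comment does not state, and the ratios say nothing about whether the same pages trigger both counters (the caveat already noted above).

```javascript
// Approximate use counter values from the list above, in percent.
const usagePercent = {
  start: 0.06,
  addFromUri: 0.02,
  addFromString: 0.0008,
};

// Usage of one counter expressed as a fraction of another counter.
function relativeTo(counter, baseline) {
  return usagePercent[counter] / usagePercent[baseline];
}

const uriShare = relativeTo("addFromUri", "start");
const stringShare = relativeTo("addFromString", "start");

console.log(`addFromUri() vs start(): ${(uriShare * 100).toFixed(1)}%`);
console.log(`addFromString() vs start(): ${(stringShare * 100).toFixed(1)}%`);
```

Under that assumption, addFromUri() is seen about a third as often as start(), while addFromString() is seen a little over 1% as often.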

However, as stated in #57 (comment), these "grammars" are effectively engine-specific options. If we find that usage in the wild depends on this in some important way, I think we should first try to define what the effect of certain invocations of addFromUri() should be, or if that turns out impractical to implement for other engines, define an alternative way to communicate those settings, and try to migrate the usage in the wild to that standardized mechanism.

Safari does not currently implement webkitSpeechGrammarList and throws an error at line 2. I have guarded against this, so the script now works in Safari on desktop and iPhone. I've also added a comment around the grammar code because no browser currently supports them, as noted in this WICG discussion: WebAudio/web-speech-api#57 Since we'll check usage stats to determine if grammars should be removed from the spec, and this script encourages grammar usage (I imagine this code will be copy/pasted assuming they have an effect), I added a comment to discourage their use. I didn't want to remove grammars completely, as this script is used as a demonstration of the spec on MDN.
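
A guard of the kind described might look like the sketch below. This is not the actual change made to that script; the function name `tryAttachGrammar` and the `scope` parameter are hypothetical, introduced so the check can be exercised outside a browser. It only detects whether the grammar list constructor exists, not whether grammars have any effect.

```javascript
// Attaches a grammar list only when a grammar list constructor is available.
// Returns true if a list was attached, false if the interfaces are missing
// (for example where webkitSpeechGrammarList is undefined).
function tryAttachGrammar(recognition, grammarString, scope = globalThis) {
  const GrammarList = scope.SpeechGrammarList || scope.webkitSpeechGrammarList;
  if (typeof GrammarList !== "function") {
    return false;
  }
  const list = new GrammarList();
  list.addFromString(grammarString, 1);
  recognition.grammars = list;
  return true;
}

// Outside a browser there is no grammar list constructor, so nothing is attached.
const placeholderRecognition = {};
const attached = tryAttachGrammar(
  placeholderRecognition,
  "#JSGF V1.0; grammar colors; public <color> = red | green | blue ;"
);
console.log(attached ? "grammar attached" : "grammar interfaces not available");
```

In a page, the first argument would be the SpeechRecognition (or webkitSpeechRecognition) instance, and the rest of the script can carry on regardless of the return value.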

evanbliu · 2025-01-07T01:35:18Z

The removal of SpeechGrammar from the Web Speech API spec was discussed at TPAC 2024, and the popular opinion was to remove it from the spec, as it is underspecified and unsupported by many implementations of the Web Speech API, including Chrome.

marcoscaceres linked a pull request Aug 2, 2019 that will close this issue

BREAKING CHANGE: Drop grammars #58

Open

foolip changed the title ~~grammar example~~ Define how grammars work and give examples Aug 2, 2019

kennethkufluk mentioned this issue Jan 27, 2022

Update speech-color-changer to be compatible with Safari mdn/web-speech-api#64

Merged

brollin mentioned this issue Apr 5, 2022

play puzzles by voice lichess-org/lila#10644

Closed

msub2 mentioned this issue Dec 8, 2022

Implementing grammars msub2/sepia-speechrecognition-polyfill#1

Open

evanbliu linked a pull request Nov 5, 2024 that will close this issue

Remove SpeechGrammar from the Web Speech API spec #117

Open
