Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Repetition with CompoundRule #239

Open
cjbassi opened this issue Apr 18, 2020 · 14 comments
Open

Repetition with CompoundRule #239

cjbassi opened this issue Apr 18, 2020 · 14 comments
Labels
Documentation Issues related to documentation Feature Request Feature requests not plan to be implemented by OP

Comments

@cjbassi
Copy link

cjbassi commented Apr 18, 2020

Is it possible to use repetition within the CompoundRule? Is there some syntax that can be used in the spec to achieve this? Is there any documentation for this that i missed? Thanks!

@daanzu
Copy link
Collaborator

daanzu commented Apr 19, 2020

Not currently. There has been some discussion of it in the past on Gitter (most of which I'm afraid I have forgotten). The question of how to design it is non trivial. It is definitely worth discussing here, as it is a pain point currently!

@mrob95
Copy link
Member

mrob95 commented Apr 19, 2020

Could you give an example of what you're trying to do?

@cjbassi
Copy link
Author

cjbassi commented Apr 19, 2020

I'm trying to create a compound rule where certain words can be repeated. For my use case, the syntax that I would most prefer is word* for repetitions of 0 or more and word+ for repetitions of 1 or more. Although maybe we need to instead specify the repetition in the extras, so that you can be more precise with specifying the repetition parameters. Also, it needs to work with other extras like Choice.

@mrob95
Copy link
Member

mrob95 commented Apr 19, 2020

Ahh I see. You can obviously use a Repetition to create a new extra which does this but there is no syntax at the moment, though it would be possible.

* would need to produce Optional(Repetition(object)) and + Repetition(object). Hard part is that we would probably also need to do some trickery with the names so that you could refer to both the repeated version and the normal version of something with the same name.

@cjbassi
Copy link
Author

cjbassi commented Apr 19, 2020

Oh you can specify it as an extra already? Is there some documentation for this that I missed? I specifically need it to work with a Choice extra, if that's possible. Thanks for the help.

@mrob95
Copy link
Member

mrob95 commented Apr 19, 2020

Ah yeah you're right, there is a docstring for it but I can't find it in the docs. Basically it's just another extra you can wrap around others, couple of example usages:

# Matches between one and four integers
Repetition(IntegerRef("", 0, 10), min=1, max=5, name="num_seq"),
# Matches between one and three alphabet characters in a Choice
Repetition(Choice("", {...}), 1, 4, "alphabet_seq"),

It will return a list of the matched elements, and you can specify a default as normal if it's going to be optional.

Docs: https://github.com/dictation-toolbox/dragonfly/blob/master/dragonfly/grammar/elements_basic.py#L563

@cjbassi
Copy link
Author

cjbassi commented Apr 20, 2020

I got it to work based on the examples you provided, so thanks for that. At this point, the only question here is whether the CompoundRule should provide syntax support for * and +. I'm not really sure i have an opinion on this, but I think it might be nice.

@drmfinlay
Copy link
Member

Use of Repetition elements, either for continuous command recognition (CCR) or for more simple use cases, is currently poorly documented.

@mrob95 Would you mind if your examples were included in the docstring for the Repetition element class?

At this point, the only question here is whether the CompoundRule should provide syntax support for * and +. I'm not really sure i have an opinion on this, but I think it might be nice.

As @daanzu said, this has come up in the past and has unfortunately been forgotten about. I would be in support of adding this syntax to Compound specs. It would be nice to have!

@drmfinlay drmfinlay added Documentation Issues related to documentation Feature Request Feature requests not plan to be implemented by OP labels Apr 23, 2020
@daanzu
Copy link
Collaborator

daanzu commented Apr 24, 2020

At this point, the only question here is whether the CompoundRule should provide syntax support for * and +. I'm not really sure i have an opinion on this, but I think it might be nice.

I think this could come in handy fairly often. The main design question is how to handle the extras: does the extras variable get assigned an array of all of the matches? I seem to remember coming up with various possibilities, but I haven't considered it lately.

@cjbassi
Copy link
Author

cjbassi commented Apr 24, 2020

I've already implemented this in an application I'm writing: https://github.com/osprey-voice/osprey/blob/master/osprey/voice.py#L138, and the way I've implemented it is both * and + return an array of the matches. * is implemented with an Optional so I just set it's default to [].

@cjbassi
Copy link
Author

cjbassi commented May 22, 2020

Is it possible to have an unbounded number of repetitions? Otherwise we are going to have to put a limit on the number of repetitions for * and +. If that's the case, it might be better to not implement this feature in dragonfly since it would be opinionated.

@daanzu
Copy link
Collaborator

daanzu commented May 23, 2020

Is it possible to have an unbounded number of repetitions?

For engines that support it (I think Kaldi and natlink), the optimize parameter makes the number unbounded, and ignores the min and max.

@cjbassi
Copy link
Author

cjbassi commented May 25, 2020

That didn't work for me unfortunately. It still checks max which is 1 by default. Maybe we should add another repetition parameter called unbounded, which makes it so that it doesn't check max.

@kb100
Copy link
Contributor

kb100 commented Jul 23, 2020

Most regex languages support using {min,max} or {,max} notation for bounded repetitions, in the short term that fits the model already in place. As for dragonfly, unbounded wouldnt be hard to add to repetition. Alternatively, there is a natural bound on repetitions, namely the number of recognized words (except in the trivial repetition of empty element). Thus we dont need to worry about infinite loops while decoding.

Perhaps this should be a separate issue, but I should also point out that the dfly convention of making min inclusive and max exclusive in repetitions is both nonstandard compared to regex languages and very confusing. Reading the documentation saying the default is max=min+1 hurts my head :(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Documentation Issues related to documentation Feature Request Feature requests not plan to be implemented by OP
Projects
None yet
Development

No branches or pull requests

5 participants