
Investigate the possibility of separating weights from value stores #105

Open
narekgharibyan opened this issue Sep 14, 2018 · 2 comments

@narekgharibyan
Member

In the long run, I see weights as something independent of the value store itself; for example, we could have a completion dictionary with JSON values. This will certainly require adding new interfaces as well.
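
To make the idea concrete, here is a minimal Python sketch of what separate interfaces might look like. Everything in it is hypothetical: `ValueStore`, `WeightStore`, `JsonValueStore`, `DictWeightStore` and `complete` are illustrative names, not keyvi's API, and a plain dict with a linear prefix scan stands in for the automaton. It only shows that ranking can read weights from one store while results come from another, so the values are free to be JSON.

```python
import json
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Tuple


class ValueStore(ABC):
    """Hypothetical interface: maps a key's handle to its user-visible value."""

    @abstractmethod
    def add(self, value: Any) -> int: ...

    @abstractmethod
    def get(self, handle: int) -> Any: ...


class WeightStore(ABC):
    """Hypothetical interface: maps a key's handle to a ranking weight, independent of the value."""

    @abstractmethod
    def set(self, handle: int, weight: int) -> None: ...

    @abstractmethod
    def get(self, handle: int) -> int: ...


class JsonValueStore(ValueStore):
    """Values are serialized JSON documents; they carry no weight of their own."""

    def __init__(self) -> None:
        self._docs: List[str] = []

    def add(self, value: Any) -> int:
        self._docs.append(json.dumps(value))
        return len(self._docs) - 1

    def get(self, handle: int) -> Any:
        return json.loads(self._docs[handle])


class DictWeightStore(WeightStore):
    """Weights kept in a separate table, keyed by the same handle as the value."""

    def __init__(self) -> None:
        self._weights: Dict[int, int] = {}

    def set(self, handle: int, weight: int) -> None:
        self._weights[handle] = weight

    def get(self, handle: int) -> int:
        return self._weights.get(handle, 0)


def complete(prefix: str, keys: Dict[str, int], values: ValueStore,
             weights: WeightStore, top_n: int = 10) -> List[Tuple[str, Any]]:
    """Rank prefix matches using the weight store; return values from the value store.

    `keys` (key -> handle) stands in for the automaton; a real FSA would walk the prefix.
    """
    matches = [(k, h) for k, h in keys.items() if k.startswith(prefix)]
    matches.sort(key=lambda kh: weights.get(kh[1]), reverse=True)
    return [(k, values.get(h)) for k, h in matches[:top_n]]
```

With `apple` (weight 10) and `apricot` (weight 30), `complete("ap", ...)` returns `apricot` first together with its JSON document, even though neither document contains a weight.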

@hendrikmuhs
Contributor

A JSON dictionary with inner weights would certainly be useful (completion is the most prominent use case, but there might be others). Note that implementing this would not be a big deal in the current state of the code.

Still, I think separation is interesting for various other reasons. Some thoughts:

Weight function

Right now the inner weight is the max of the child weights. It would be good to support other criteria instead of max, e.g. the sum.
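
A minimal sketch of a pluggable weight function, assuming a plain (unminimized) trie rather than keyvi's FSA: each node's inner weight is computed bottom-up from its own final weight and its children's inner weights, using whatever `combine` function is passed in. `Node`, `build` and `compute_inner_weights` are hypothetical names chosen for illustration.

```python
from functools import reduce
from typing import Callable, Dict, Iterable, Optional, Tuple

# Hypothetical: the weight function is any associative binary operation, e.g. max or addition.
Combine = Callable[[int, int], int]


class Node:
    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.final_weight: Optional[int] = None  # set if a key ends at this node
        self.inner_weight: int = 0               # aggregate over the node's subtree


def build(entries: Iterable[Tuple[str, int]]) -> Node:
    """Insert (key, weight) pairs into a plain, unminimized trie."""
    root = Node()
    for key, weight in entries:
        node = root
        for ch in key:
            node = node.children.setdefault(ch, Node())
        node.final_weight = weight
    return root


def compute_inner_weights(node: Node, combine: Combine) -> int:
    """Post-order pass: combine the node's own final weight with its children's inner weights."""
    parts = [compute_inner_weights(child, combine) for child in node.children.values()]
    if node.final_weight is not None:
        parts.append(node.final_weight)
    node.inner_weight = reduce(combine, parts) if parts else 0
    return node.inner_weight
```

With the entries `car`=3, `cart`=5 and `cat`=2, the node for `ca` gets 5 with `max` and 10 with addition; only the `combine` argument changes.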

One application would be data analysis, e.g. classification (I did some prototyping in this direction a while back).

Requirements

Flexibility

Today's implementation is hardcoded for the completion case. The inner weight is taken from the value at the beginning and then passed through the process of building the FSA. I remember having some problems implementing sum in place of max.
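
The thread does not say what the problems with sum were. One property that plausibly matters when a weight is pushed along a key's path during construction is idempotence: `max(w, w) == w`, whereas `w + w` equals `w` only for `w == 0`. The hypothetical `propagate` below models each prefix of a key as a state and folds the key's weight into it; if the same weight reaches a state twice (here via a duplicate key), max is unaffected while sum double-counts. This is an illustration of the difference, not a diagnosis of keyvi's builder.

```python
import operator
from typing import Callable, Dict, Iterable, Tuple

Combine = Callable[[int, int], int]


def propagate(inner: Dict[str, int], key: str, weight: int, combine: Combine) -> None:
    """Fold a key's weight into every state on its path, modelled here as the key's prefixes."""
    for i in range(1, len(key) + 1):
        prefix = key[:i]
        if prefix in inner:
            inner[prefix] = combine(inner[prefix], weight)
        else:
            inner[prefix] = weight


def build_inner_weights(entries: Iterable[Tuple[str, int]], combine: Combine) -> Dict[str, int]:
    inner: Dict[str, int] = {}
    for key, weight in entries:
        propagate(inner, key, weight, combine)
    return inner


# "cart" appears twice, e.g. a duplicate in the input.
entries = [("cart", 5), ("cart", 5), ("cat", 2)]
print(build_inner_weights(entries, max)["ca"])           # 5: max is idempotent
print(build_inner_weights(entries, operator.add)["ca"])  # 12: the duplicate is counted twice
```

A sum-based builder would therefore need to guarantee that every value contributes exactly once per state, which max gets for free.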

Storage

The inner weight has only 2 bytes available for storage; that's good enough for the completion case but not for other use cases. We should store it similarly to a final value. There is no need to make it arbitrarily large, as long as one pointer fits in there; this pointer can point into an extra buffer, like we do for the value store. This change will require a new version of the binary format, compatibility code, etc.
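
A rough illustration of the storage trade-off using Python's `struct` module. The layout is hypothetical, not keyvi's binary format: a 2-byte inline field caps weights at 65535 (the sketch saturates larger values, which is an assumption about overflow handling), whereas an 8-byte offset into a side buffer, as with the value store, allows arbitrary payloads such as a double.

```python
import struct

# Hypothetical layout for illustration only; not keyvi's binary format.
INLINE_WEIGHT_FORMAT = "<H"  # 2 bytes: unsigned 16-bit, max 65535
POINTER_FORMAT = "<Q"        # 8 bytes: offset into a side buffer
LENGTH_FORMAT = "<I"         # 4-byte length prefix for each payload


def encode_inline(weight: int) -> bytes:
    """Store a non-negative weight directly in 2 bytes, saturating at the 16-bit maximum."""
    return struct.pack(INLINE_WEIGHT_FORMAT, min(weight, 0xFFFF))


class WeightBuffer:
    """Side buffer holding arbitrary weight payloads; the automaton would store only a pointer."""

    def __init__(self) -> None:
        self.data = bytearray()

    def append(self, payload: bytes) -> bytes:
        """Append a length-prefixed payload and return the encoded pointer to it."""
        offset = len(self.data)
        self.data += struct.pack(LENGTH_FORMAT, len(payload)) + payload
        return struct.pack(POINTER_FORMAT, offset)

    def read(self, pointer: bytes) -> bytes:
        """Decode the pointer and return the payload it refers to."""
        (offset,) = struct.unpack(POINTER_FORMAT, pointer)
        (length,) = struct.unpack_from(LENGTH_FORMAT, self.data, offset)
        start = offset + struct.calcsize(LENGTH_FORMAT)
        return bytes(self.data[start:start + length])
```

For example, `encode_inline(70000)` silently loses information, while `buf.append(struct.pack("<d", 1234567.5))` keeps the full value at the cost of a fixed 8-byte pointer per state plus the buffer entry.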

Data Structure

The inner weight is stored on every level (depth of the FSA). There is an optimization that skips storage for depth > x (default 30); again, this is hardcoded for the completion case. It would be good to be more flexible here, e.g. to store it only on certain levels to prevent size explosion.
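
The depth rule could be expressed as a pluggable policy. In this hypothetical sketch, `cutoff_policy` mirrors the behaviour described above (store up to depth x, default 30) and `levels_policy` stores weights only on selected depths. Depths are counted from 1 for the state after the first character, which is an assumption.

```python
from typing import Callable, Iterable, List

# Hypothetical: a policy decides, per depth, whether a state carries an inner weight.
DepthPolicy = Callable[[int], bool]


def cutoff_policy(max_depth: int = 30) -> DepthPolicy:
    """Store weights up to max_depth and skip deeper states."""
    return lambda depth: depth <= max_depth


def levels_policy(levels: Iterable[int]) -> DepthPolicy:
    """Store weights only on an explicit set of depths."""
    allowed = frozenset(levels)
    return lambda depth: depth in allowed


def stored_depths(key_length: int, policy: DepthPolicy) -> List[int]:
    """Depths along one key's path (1..key_length) at which a weight would be written."""
    return [d for d in range(1, key_length + 1) if policy(d)]
```

For a 40-character key, the default cutoff writes 30 weights, while `levels_policy([1, 2, 4, 8, 16, 32])` writes 6; lookups at depths without a stored weight would need to fall back to the nearest stored ancestor or descendant.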

@narekgharibyan
Member Author

@hendrikmuhs Thanks for sharing your thoughts on this; the points you brought up should be evaluated during the investigation.
Some time ago I implemented a somewhat hacky prototype (https://github.com/narekgharibyan/keyvi-1/tree/json_inner_weight), but I would prefer a proper implementation instead.
