
Avoid instantiating huge tensors as input to similarity functions #308

Open
matt-gardner opened this issue Apr 20, 2017 · 4 comments

@matt-gardner
Contributor

I'm not sure exactly how this would work yet, but our current approach takes a huge amount of memory: we tile both inputs up to a common shape and then do an elementwise multiplication. There might be a way to compute the same thing with some kind of batch_dot or dot instead.
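For concreteness, here's a rough sketch of what the tiled formulation costs (the shapes and variable names are made up for illustration, not our actual code):

```python
import tensorflow as tf

# Illustrative shapes: batch=32, 200 rows on each side, embedding dim=300.
queries = tf.random_normal([32, 200, 300])
keys = tf.random_normal([32, 200, 300])

# Tiled approach: broadcast both tensors up to (batch, num_queries, num_keys, dim),
# multiply elementwise, then sum over the last axis.  The 4-D intermediate holds
# 32 * 200 * 200 * 300 floats (~1.5 GB at float32) just to get dot products.
tiled_queries = tf.expand_dims(queries, 2)   # (32, 200, 1, 300)
tiled_keys = tf.expand_dims(keys, 1)         # (32, 1, 200, 300)
similarities = tf.reduce_sum(tiled_queries * tiled_keys, axis=-1)  # (32, 200, 200)
```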

@matt-gardner
Contributor Author

It looks like tf.einsum might do the trick, at least for simple similarity functions. For more complicated ones, I'm not sure.
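Something along these lines (just a sketch, with illustrative shapes) would compute a plain dot-product similarity without ever building the 4-D intermediate:

```python
import tensorflow as tf

queries = tf.random_normal([32, 200, 300])
keys = tf.random_normal([32, 200, 300])

# Contract over the embedding dimension directly; the largest tensor that
# gets materialized is the (32, 200, 200) similarity matrix itself.
similarities = tf.einsum('bqd,bkd->bqk', queries, keys)
```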

@matt-peters

tf.matmul works well for generic dot-product-based similarities. It's probably a lot faster, too, since it calls the optimized matrix routines directly.
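For example (again just a sketch with made-up shapes), a batched matmul gives the (batch, num_rows_1, num_rows_2) similarity matrix directly:

```python
import tensorflow as tf

queries = tf.random_normal([32, 200, 300])
keys = tf.random_normal([32, 200, 300])

# Batched matrix multiply dispatches to the optimized BLAS / cuBLAS kernels
# and never materializes the tiled 4-D intermediate.
similarities = tf.matmul(queries, keys, transpose_b=True)  # (32, 200, 200)
```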

@matt-gardner
Contributor Author

The issue is that our similarity functions try to be flexible, letting you easily swap in different parameterized and non-parameterized functions when computing attentions. The trouble is that the way we make that easy is by taking a whole lot of memory, so we need to re-think the API a bit.
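One possible shape for a reworked API (purely hypothetical, not something we've implemented): hand the similarity function the two un-tiled (batch, num_rows, dim) tensors and let it decide how to produce the full similarity matrix, so a dot-product similarity can use a matmul while fancier parameterized ones do whatever they need internally.

```python
import tensorflow as tf

class DotProductMatrixSimilarity:
    """Hypothetical interface: take two un-tiled tensors of shape
    (batch, num_rows_1, dim) and (batch, num_rows_2, dim) and return a
    (batch, num_rows_1, num_rows_2) similarity matrix."""

    def compute_similarity_matrix(self, tensor_1, tensor_2):
        # A plain dot-product similarity never needs the tiled intermediate.
        return tf.matmul(tensor_1, tensor_2, transpose_b=True)
```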

@matt-gardner
Contributor Author

I'm decreasing the priority of this, as the adaptive batch size and dynamic padding stuff makes this not too big of an issue anymore.

It'd still be a nice optimization, and would likely make runtimes faster, but it's not blocking anything anymore.

@matt-gardner matt-gardner added P2 and removed P1 labels May 10, 2017