Issue description

I tried to execute a slightly modified version of this script (no significant changes were made) on an embedding with a large vocabulary and 600 dimensions:
import numpy as np
from nncompress import EmbeddingCompressor
# Load my embedding matrix
matrix = np.load("data/glove.6B.300d.npy")
# Initialize the compressor
compressor = EmbeddingCompressor(32, 16, "data/mymodel")
# Train the quantization model
compressor.train(matrix)
# Evaluate
distance = compressor.evaluate(matrix)
print("Mean euclidean distance:", distance)
# Export the codes and codebook
compressor.export(matrix, "data/mymodel")
But then, this is what I got:
Traceback (most recent call last):
  File "compress.py", line 82, in <module>
    pipe\
  File "compress.py", line 70, in train
    compressor.train(matrix)
  File "/home/user/summer/smallnilc/nncompress/embed_compress.py", line 159, in train
    word_ids_var, loss_op, train_op, maxp_op = self.build_training_graph(embed_matrix)
  File "/home/user/summer/smallnilc/nncompress/embed_compress.py", line 114, in build_training_graph
    input_matrix = tf.constant(embed_matrix, name="embed_matrix")
  File "/home/user/summer/smallnilc/small/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 180, in constant_v1
    allow_broadcast=False)
  File "/home/user/summer/smallnilc/small/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 284, in _constant_impl
    allow_broadcast=allow_broadcast))
  File "/home/user/summer/smallnilc/small/lib/python3.6/site-packages/tensorflow/python/framework/tensor_util.py", line 537, in make_tensor_proto
    "Cannot create a tensor proto whose content is larger than 2GB.")
ValueError: Cannot create a tensor proto whose content is larger than 2GB.
TensorFlow developers have answered similar issues by saying that the only solution is to rewrite the code so that it does not hit the hard 2 GB limit imposed by protobuf.
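For reference, the workaround usually suggested for TF 1.x is to stop serializing the matrix into the graph via tf.constant and instead feed it through a placeholder at session time. A minimal sketch under that assumption; the names build_graph_with_placeholder and embed_pl are illustrative and not part of nncompress:

import numpy as np
import tensorflow as tf

def build_graph_with_placeholder(vocab_size, embed_dim):
    # Hypothetical replacement for the tf.constant call in build_training_graph:
    # a placeholder is never serialized into the GraphDef, so the 2 GB protobuf
    # limit does not apply to it.
    embed_pl = tf.placeholder(tf.float32, shape=[vocab_size, embed_dim],
                              name="embed_matrix")
    # ... the rest of the training graph would consume embed_pl here ...
    return embed_pl

matrix = np.load("data/glove.6B.300d.npy")  # same matrix as in the script above
embed_pl = build_graph_with_placeholder(*matrix.shape)
with tf.Session() as sess:
    # Feed the large matrix at run time instead of baking it into the graph,
    # e.g. sess.run(loss_op, feed_dict={embed_pl: matrix}).
    mean = sess.run(tf.reduce_mean(embed_pl), feed_dict={embed_pl: matrix})
    print("fed a %s matrix without hitting the protobuf limit" % (matrix.shape,))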
Steps to reproduce the issue
Simply try to compress an embedding with more than 300 dimensions (either 600 or 1000) over a large vocabulary.
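Note that the trigger is really the total byte size of the matrix that tf.constant tries to serialize (vocabulary size × dimensions × bytes per float), not the dimensionality by itself. A quick sanity check, with a hypothetical path for the 600-dimensional matrix:

import numpy as np

matrix = np.load("data/myvectors.600d.npy")  # hypothetical path to the large embedding
print("dtype:", matrix.dtype, "shape:", matrix.shape)
print("size: %.2f GiB" % (matrix.nbytes / 1024 ** 3))
# Anything over 2 GiB here will raise the ValueError above when passed to tf.constant.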