This repository has been archived by the owner on Jul 26, 2019. It is now read-only.

Error in sparse_categorical_crossentropy when using the Theano backend #7

Open
HighCWu opened this issue Nov 30, 2018 · 5 comments

@HighCWu (Contributor) commented Nov 30, 2018

There is no problem at all with the TensorFlow backend. Now I'm testing Theano.
When running train_model from tutorial.ipynb, T.nnet.softmax() (called inside K.sparse_categorical_crossentropy) fails because it expects a 1-d or 2-d tensor but receives a TensorType(float32, 3D):

<ipython-input-22-27837df85ad1> in classification_loss(y_true, y_pred)
      2 import keras.backend as K
      3 def classification_loss(y_true, y_pred):
----> 4     return K.sparse_categorical_crossentropy(y_true, y_pred, from_logits=True)
      5 train.classification_loss = classification_loss

/usr/local/lib/python3.6/dist-packages/keras/backend/theano_backend.py in sparse_categorical_crossentropy(target, output, from_logits, axis)
   1788     target = T.extra_ops.to_one_hot(target, nb_class=output.shape[-1])
   1789     target = reshape(target, shape(output))
-> 1790     return categorical_crossentropy(target, output, from_logits, axis=-1)
   1791 
   1792 

/usr/local/lib/python3.6/dist-packages/keras/backend/theano_backend.py in categorical_crossentropy(target, output, from_logits, axis)
   1762         target = permute_dimensions(target, permutation)
   1763     if from_logits:
-> 1764         output = T.nnet.softmax(output)
   1765     else:
   1766         # scale preds so that the class probas of each sample sum to 1

/usr/local/lib/python3.6/dist-packages/theano/tensor/nnet/nnet.py in softmax(c)
    813     if c.broadcastable[-1]:
    814         warnings.warn("The softmax is applied on a dimension of shape 1, which does not have a semantic meaning.")
--> 815     return softmax_op(c)
    816 
    817 

/usr/local/lib/python3.6/dist-packages/theano/gof/op.py in __call__(self, *inputs, **kwargs)
    613         """
    614         return_list = kwargs.pop('return_list', False)
--> 615         node = self.make_node(*inputs, **kwargs)
    616 
    617         if config.compute_test_value != 'off':

/usr/local/lib/python3.6/dist-packages/theano/tensor/nnet/nnet.py in make_node(self, x)
    428                 or x.type.dtype not in tensor.float_dtypes:
    429             raise ValueError('x must be 1-d or 2-d tensor of floats. Got %s' %
--> 430                              x.type)
    431         if x.ndim == 1:
    432             warnings.warn("DEPRECATION: If x is a vector, Softmax will not automatically pad x "

ValueError: x must be 1-d or 2-d tensor of floats. Got TensorType(float32, 3D)
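
The restriction is in Theano itself: its softmax op only accepts 1-d or 2-d input, while the Keras TensorFlow backend happily applies softmax along the last axis of a 3-d tensor. A minimal sketch of the failure, independent of Keras (assuming only Theano is installed):

    import theano.tensor as T

    x2 = T.matrix('x2')
    y2 = T.nnet.softmax(x2)   # fine: 2-d input

    x3 = T.tensor3('x3')
    y3 = T.nnet.softmax(x3)   # raises ValueError: x must be 1-d or 2-d tensor of floats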

I worked around it by monkey-patching Theano's softmax to flatten 3-d inputs to 2-d:

    import keras.backend as K

    # Keep a reference to the original op, then route 3-d inputs through it
    # as a flattened 2-d tensor and restore the original shape afterwards.
    _softmax = K.T.nnet.softmax

    def softmax(x):
        if x.ndim == 3:
            d1, d2, d3 = x.shape
            return _softmax(x.reshape((d1 * d2, d3))).reshape((d1, d2, d3))
        return _softmax(x)

    K.T.nnet.softmax = softmax
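
Note that this patches T.nnet.softmax globally, so it affects every op built afterwards, and it has to run before Keras constructs the training graph.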

But when I run

m = train_model(base_model=sequence_encoder, is_causal=False, tasks_meta_data=tasks, pretrain_generator=generator,
                finetune_generator=generator, pretrain_epochs=100, pretrain_steps=number_of_pretrain_steps // 100,
                finetune_epochs=100, finetune_steps=number_of_finetune_steps // 100, verbose=2, TPUStrategy=strategy)

again, I get this error:

/usr/local/lib/python3.6/dist-packages/keras/layers/core.py:665: UserWarning: `output_shape` argument not specified for layer lm_logits and cannot be automatically inferred with the Theano backend. Defaulting to output shape `(None, 6)` (same as input shape). If the expected output shape is different, specify it via the `output_shape` argument.
  .format(self.name, input_shape))
/usr/local/lib/python3.6/dist-packages/keras/layers/core.py:665: UserWarning: `output_shape` argument not specified for layer lm_loss and cannot be automatically inferred with the Theano backend. Defaulting to output shape `[(None, 1), (None, 8), (None, 8, 6), (None, 8)]` (same as input shape). If the expected output shape is different, specify it via the `output_shape` argument.
  .format(self.name, input_shape))
/usr/local/lib/python3.6/dist-packages/keras/layers/core.py:665: UserWarning: `output_shape` argument not specified for layer odd_flatten and cannot be automatically inferred with the Theano backend. Defaulting to output shape `(None, 8, 6)` (same as input shape). If the expected output shape is different, specify it via the `output_shape` argument.
  .format(self.name, input_shape))
/usr/local/lib/python3.6/dist-packages/keras/layers/core.py:665: UserWarning: `output_shape` argument not specified for layer odd_gather and cannot be automatically inferred with the Theano backend. Defaulting to output shape `[(None, 8, 6), (None, 1)]` (same as input shape). If the expected output shape is different, specify it via the `output_shape` argument.
  .format(self.name, input_shape))
/usr/local/lib/python3.6/dist-packages/keras/layers/core.py:665: UserWarning: `output_shape` argument not specified for layer odd_loss and cannot be automatically inferred with the Theano backend. Defaulting to output shape `[(None, 1), (None, 1), (None, 8, 2)]` (same as input shape). If the expected output shape is different, specify it via the `output_shape` argument.
  .format(self.name, input_shape))
/usr/local/lib/python3.6/dist-packages/keras/layers/core.py:665: UserWarning: `output_shape` argument not specified for layer lm_random_loss and cannot be automatically inferred with the Theano backend. Defaulting to output shape `[(None, 1), (None, 8), (None, 8, 25), (None, 8)]` (same as input shape). If the expected output shape is different, specify it via the `output_shape` argument.
  .format(self.name, input_shape))
Epoch 1/100
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs)
    902             outputs =\
--> 903                 self.fn() if output_subset is None else\
    904                 self.fn(output_subset=output_subset)

/usr/local/lib/python3.6/dist-packages/theano/gof/op.py in rval(p, i, o, n)
    891             def rval(p=p, i=node_input_storage, o=node_output_storage, n=node):
--> 892                 r = p(n, [x[0] for x in i], o)
    893                 for o in node.outputs:

/usr/local/lib/python3.6/dist-packages/theano/tensor/subtensor.py in perform(self, node, inputs, out_)
   2338         if self.set_instead_of_inc:
-> 2339             out[0][inputs[2:]] = inputs[1]
   2340         else:

IndexError: index 8 is out of bounds for axis 1 with size 6

During handling of the above exception, another exception occurred:

IndexError                                Traceback (most recent call last)
<ipython-input-39-7b7276d2ce06> in <module>()
      1 m = train_model(base_model=sequence_encoder, is_causal=False, tasks_meta_data=tasks, pretrain_generator=generator,
      2                 finetune_generator=generator, pretrain_epochs=100, pretrain_steps=number_of_pretrain_steps // 100,
----> 3                 finetune_epochs=100, finetune_steps=number_of_finetune_steps // 100, verbose=2, TPUStrategy=strategy)
      4 # now m is ready to be used!
      5 print(m.inputs)

/content/bert_keras_repo/transformer/train.py in train_model(base_model, is_causal, tasks_meta_data, pretrain_generator, finetune_generator, pretrain_epochs, pretrain_optimizer, pretrain_steps, pretrain_callbacks, finetune_epochs, finetune_optimizer, finetune_steps, finetune_callbacks, verbose, TPUStrategy)
    145 
    146     if pretrain_generator is not None:
--> 147         train_step(True)
    148     if finetune_generator is not None:
    149         train_step(False)

/content/bert_keras_repo/transformer/train.py in train_step(is_pretrain)
    142         _model.fit_generator(_generator, steps_per_epoch=pretrain_steps if is_pretrain else finetune_steps,
    143                              verbose=verbose, callbacks=pretrain_callbacks if is_pretrain else finetune_callbacks,
--> 144                              shuffle=False, epochs=pretrain_epochs if is_pretrain else finetune_epochs)
    145 
    146     if pretrain_generator is not None:

/usr/local/lib/python3.6/dist-packages/keras/legacy/interfaces.py in wrapper(*args, **kwargs)
     89                 warnings.warn('Update your `' + object_name + '` call to the ' +
     90                               'Keras 2 API: ' + signature, stacklevel=2)
---> 91             return func(*args, **kwargs)
     92         wrapper._original_function = func
     93         return wrapper

/usr/local/lib/python3.6/dist-packages/keras/engine/training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
   1416             use_multiprocessing=use_multiprocessing,
   1417             shuffle=shuffle,
-> 1418             initial_epoch=initial_epoch)
   1419 
   1420     @interfaces.legacy_generator_methods_support

/usr/local/lib/python3.6/dist-packages/keras/engine/training_generator.py in fit_generator(model, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
    215                 outs = model.train_on_batch(x, y,
    216                                             sample_weight=sample_weight,
--> 217                                             class_weight=class_weight)
    218 
    219                 outs = to_list(outs)

/usr/local/lib/python3.6/dist-packages/keras/engine/training.py in train_on_batch(self, x, y, sample_weight, class_weight)
   1215             ins = x + y + sample_weights
   1216         self._make_train_function()
-> 1217         outputs = self.train_function(ins)
   1218         return unpack_singleton(outputs)
   1219 

/usr/local/lib/python3.6/dist-packages/keras/backend/theano_backend.py in __call__(self, inputs)
   1386     def __call__(self, inputs):
   1387         assert isinstance(inputs, (list, tuple))
-> 1388         return self.function(*inputs)
   1389 
   1390 

/usr/local/lib/python3.6/dist-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs)
    915                     node=self.fn.nodes[self.fn.position_of_error],
    916                     thunk=thunk,
--> 917                     storage_map=getattr(self.fn, 'storage_map', None))
    918             else:
    919                 # old-style linkers raise their own exceptions

/usr/local/lib/python3.6/dist-packages/theano/gof/link.py in raise_with_op(node, thunk, exc_info, storage_map)
    323         # extra long error message in that case.
    324         pass
--> 325     reraise(exc_type, exc_value, exc_trace)
    326 
    327 

/usr/local/lib/python3.6/dist-packages/six.py in reraise(tp, value, tb)
    690                 value = tp()
    691             if value.__traceback__ is not tb:
--> 692                 raise value.with_traceback(tb)
    693             raise value
    694         finally:

/usr/local/lib/python3.6/dist-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs)
    901         try:
    902             outputs =\
--> 903                 self.fn() if output_subset is None else\
    904                 self.fn(output_subset=output_subset)
    905         except Exception:

/usr/local/lib/python3.6/dist-packages/theano/gof/op.py in rval(p, i, o, n)
    890             # default arguments are stored in the closure of `rval`
    891             def rval(p=p, i=node_input_storage, o=node_output_storage, n=node):
--> 892                 r = p(n, [x[0] for x in i], o)
    893                 for o in node.outputs:
    894                     compute_map[o][0] = True

/usr/local/lib/python3.6/dist-packages/theano/tensor/subtensor.py in perform(self, node, inputs, out_)
   2337 
   2338         if self.set_instead_of_inc:
-> 2339             out[0][inputs[2:]] = inputs[1]
   2340         else:
   2341             np.add.at(out[0], tuple(inputs[2:]), inputs[1])

IndexError: index 8 is out of bounds for axis 1 with size 6
Apply node that caused the error: AdvancedIncSubtensor{inplace=False,  set_instead_of_inc=True}(Alloc.0, TensorConstant{1}, ARange{dtype='int64'}.0, Reshape{1}.0)
Toposort index: 315
Inputs types: [TensorType(float32, matrix), TensorType(int8, scalar), TensorType(int64, vector), TensorType(int32, vector)]
Inputs shapes: [(64, 6), (), (64,), (64,)]
Inputs strides: [(24, 4), (), (8,), (4,)]
Inputs values: ['not shown', array(1, dtype=int8), 'not shown', 'not shown']
Outputs clients: [[Reshape{3}(AdvancedIncSubtensor{inplace=False,  set_instead_of_inc=True}.0, MakeVector{dtype='int64'}.0)]]

Backtrace when the node is created(use Theano flag traceback.limit=N to make it longer):
  File "bert_keras_repo/transformer/train.py", line 68, in train_model
    [task_loss_weight, task_target, logits, task_mask])
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/base_layer.py", line 457, in __call__
    output = self.call(inputs, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/keras/layers/core.py", line 687, in call
    return self.function(inputs, **arguments)
  File "bert_keras_repo/transformer/train.py", line 67, in <lambda>
    task_loss = Lambda(lambda x: x[0] * masked_classification_loss(x[1], x[2], x[3]), name=task.name + '_loss')(
  File "bert_keras_repo/transformer/train.py", line 20, in masked_classification_loss
    return _mask_loss(y_true, y_pred, y_mask, classification_loss)
  File "bert_keras_repo/transformer/train.py", line 11, in _mask_loss
    l = K.switch(y_mask, element_wise_loss(y_true, y_pred), K.zeros_like(y_mask, dtype=K.floatx()))
  File "<ipython-input-22-27837df85ad1>", line 4, in classification_loss
    return K.sparse_categorical_crossentropy(y_true, y_pred, from_logits=True)
  File "/usr/local/lib/python3.6/dist-packages/keras/backend/theano_backend.py", line 1788, in sparse_categorical_crossentropy
    target = T.extra_ops.to_one_hot(target, nb_class=output.shape[-1])

HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

It doesn't seem to be a bug in my code, because I checked out the branch from before TPU support and got the same error.

@HighCWu (Contributor, Author) commented Nov 30, 2018

I found a related discussion: keras-users/EhWwuq6R0lQ
I'm not familiar with Theano, so I don't know why this is fine on TensorFlow but fails on Theano.
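
For what it's worth, one backend-agnostic alternative to patching Theano (a sketch I haven't tested against this repo; classification_loss_2d and its reshape logic are my own, assuming y_pred has shape (batch, seq_len, n_classes)) is to flatten the logits to 2-d inside the loss itself:

    import keras.backend as K

    def classification_loss_2d(y_true, y_pred):
        # Collapse all leading dimensions so the backend softmax only ever
        # sees a 2-d tensor, then restore the per-timestep loss shape.
        y_true_shape = K.shape(y_true)
        num_classes = K.shape(y_pred)[-1]
        y_pred_flat = K.reshape(y_pred, (-1, num_classes))
        y_true_flat = K.reshape(y_true, (-1,))
        loss = K.sparse_categorical_crossentropy(y_true_flat, y_pred_flat,
                                                 from_logits=True)
        return K.reshape(loss, y_true_shape)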

@Separius (Owner) commented Nov 30, 2018 via email

@HighCWu (Contributor, Author) commented Nov 30, 2018

Oh, I see. Maybe Theano support isn't really necessary; hardly anyone uses Theano these days.
I should have seen that earlier.
It seems I've done some useless work; I should spend my time on something else.
Will you spend your time on this?

@Separius (Owner) commented Nov 30, 2018

TBH, I spent a day on this and by the end I just hated Keras (for allowing such issues) and myself! So no, I'm not going to waste any more time on this.
Right now I'm changing the attention mechanism of BERT and trying to make it faster.

If you want to play with BERT, learn something, and help others, a good direction is to train a distilled version of BERT: maybe you can train a model that is only 8 layers deep with 16 heads per layer but reaches similar accuracy.
Another idea you could try is to use an encoder other than the Transformer; maybe a multilayer bidirectional QRNN could work instead?
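
On the distillation idea, the usual objective (a minimal sketch of Hinton-style distillation; the function name and temperature value are illustrative, not code from this repo) has the student match the teacher's temperature-softened predictions:

    import keras.backend as K

    def distillation_loss(teacher_logits, student_logits, temperature=2.0):
        # Soft targets from the teacher; cross-entropy for the student,
        # scaled by T^2 so gradients keep a comparable magnitude.
        soft_targets = K.softmax(teacher_logits / temperature)
        log_probs = K.log(K.softmax(student_logits / temperature) + K.epsilon())
        return -K.sum(soft_targets * log_probs, axis=-1) * temperature ** 2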

Oh, and thanks for making sure that the TPU version is correct and for checking backward compatibility 👍

@HighCWu (Contributor, Author) commented Nov 30, 2018

Thanks for your advice. BERT is really a large model for me.
I'll try your suggestions, and I wish you success with your new approach.
