Make summarize_tensor robust to non-float dtypes #171

Merged

4 commits merged into finegrain-ai:main from summarize-non-float-tensor on Jan 11, 2024

Conversation

@piercus (Collaborator) commented on Jan 9, 2024

The current implementation of summarize_tensor can raise a secondary error that obfuscates the original error.

Actual

While running some experiments, the stack trace displayed a secondary error:

Traceback (most recent call last):
  File "/home/pierre/dev/refiners/scripts/training/finetune-ldm-color-palette.py", line 15, in <module>
    trainer.train()
  File "/home/pierre/dev/refiners/src/refiners/training_utils/trainer.py", line 89, in inner_wrapper
    result = func(*args, **kwargs)
  File "/home/pierre/dev/refiners/src/refiners/training_utils/trainer.py", line 528, in train
    self.epoch()
  File "/home/pierre/dev/refiners/src/refiners/training_utils/trainer.py", line 512, in epoch
    self.step(batch=batch)
  File "/home/pierre/dev/refiners/src/refiners/training_utils/trainer.py", line 503, in step
    loss = self.compute_loss(batch=batch)
  File "/home/pierre/dev/refiners/src/refiners/training_utils/color_palette.py", line 234, in compute_loss
    prediction = self.unet(noisy_latents)
  File "/home/pierre/anaconda3/envs/refiners/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/pierre/anaconda3/envs/refiners/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/pierre/dev/refiners/src/refiners/fluxion/layers/chain.py", line 279, in forward
    result = self._call_layer(layer, name, *intermediate_args)
  File "/home/pierre/dev/refiners/src/refiners/fluxion/layers/chain.py", line 273, in _call_layer
    raise ChainError(message) from None
refiners.fluxion.layers.chain.ChainError: RuntimeError:
   File "/home/pierre/dev/refiners/src/refiners/fluxion/utils.py", line 199, in summarize_tensor
    f"mean={tensor.mean():.2f}",

mean(): could not infer output dtype. Input dtype must be either a floating point or complex dtype. Got: Long
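
For reference, this secondary error is easy to reproduce on its own; the snippet below is a hypothetical illustration, not code from the PR:

import torch

tensor = torch.tensor([1, 2, 3])  # dtype is torch.int64 (Long)
try:
    tensor.mean()
except RuntimeError as e:
    print(e)
    # mean(): could not infer output dtype. Input dtype must be either
    # a floating point or complex dtype. Got: Long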

Expected

I expect to see the original error in the stack trace,

which in my case was:

Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)

Solution

This PR implements a fix for this issue.
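
For context, here is a minimal sketch of the kind of guard this fix introduces (names and fields simplified; the actual implementation lives in src/refiners/fluxion/utils.py and may differ):

import torch
from torch import Tensor


def summarize_tensor(tensor: Tensor) -> str:
    # Cast non-float dtypes (e.g. Long, Bool) to float32 before computing
    # statistics, so .mean()/.std() cannot raise a secondary error that
    # hides the original exception.
    stats = tensor if tensor.is_floating_point() else tensor.float()
    return "Tensor(" + ", ".join([
        f"dtype={tensor.dtype}",
        f"shape={tuple(tensor.shape)}",
        f"device={tensor.device}",
        f"mean={stats.mean():.2f}",
        f"std={stats.std():.2f}",
        f"min={stats.min():.2f}",
        f"max={stats.max():.2f}",
    ]) + ")"


print(summarize_tensor(torch.arange(6)))  # no longer raises for an int64 tensor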

@limiteinductive (Contributor) previously approved these changes on Jan 9, 2024 and left a comment:

Seems good to me, important fix

@piercus (Collaborator, Author) commented on Jan 9, 2024

Thanks for your feedback @limiteinductive.

I feel it's worth unit testing summarize_tensor.
It will prevent regressions here (which might not be visible elsewhere in the CI).
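
A hypothetical parametrized test along those lines, assuming summarize_tensor returns a str (the actual tests were added under tests/fluxion/test_utils.py):

import pytest
import torch

from refiners.fluxion.utils import summarize_tensor


@pytest.mark.parametrize("dtype", [torch.float32, torch.int64, torch.int32, torch.bool])
def test_summarize_tensor_does_not_raise(dtype: torch.dtype) -> None:
    # Regression guard: summarizing a non-float tensor must not raise.
    tensor = torch.zeros(2, 3, dtype=dtype)
    assert isinstance(summarize_tensor(tensor), str)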

Review comments on tests/fluxion/test_utils.py and src/refiners/fluxion/utils.py were marked outdated and resolved.
piercus added a commit to piercus/refiners that referenced this pull request Jan 10, 2024
@limiteinductive force-pushed the summarize-non-float-tensor branch from aa232ce to d4cc507 on January 10, 2024 at 23:04
@deltheil added and removed the run-ci (Run CI) label on Jan 11, 2024
@deltheil merged commit c141091 into finegrain-ai:main on Jan 11, 2024 (1 check passed)
@deltheil (Member) commented

Thanks @piercus!

deltheil added a commit that referenced this pull request Jan 19, 2024
Calling `tensor.float()` on a complex tensor raises a warning:

    UserWarning: Casting complex values to real discards the imaginary
    part (Triggered internally at ../aten/src/ATen/native/Copy.cpp:299.)

Follow up of #171
catwell pushed a commit that referenced this pull request Jan 19, 2024
Calling `tensor.float()` on a complex tensor raises a warning:

    UserWarning: Casting complex values to real discards the imaginary
    part (Triggered internally at ../aten/src/ATen/native/Copy.cpp:299.)

Follow up of #171
deltheil added a commit that referenced this pull request Jan 19, 2024
Calling `tensor.float()` on a complex tensor raises a warning:

    UserWarning: Casting complex values to real discards the imaginary
    part (Triggered internally at ../aten/src/ATen/native/Copy.cpp:299.)

Follow up of #171
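
Based on the commit message above (an assumption, not the exact diff), the follow-up amounts to restricting the cast so complex tensors are left untouched, e.g. with a hypothetical helper:

from torch import Tensor


def to_summarizable(tensor: Tensor) -> Tensor:
    # Cast only dtypes that support neither mean() nor std() natively;
    # complex tensors are left as-is so the UserWarning above is not triggered.
    if tensor.is_floating_point() or tensor.is_complex():
        return tensor
    return tensor.float()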
Labels: run-ci (Run CI)
3 participants