Make summarize_tensor robust to non-float dtypes #171

Merged

4 commits merged into finegrain-ai:main from summarize-non-float-tensor on Jan 11, 2024

Conversation

@piercus (Collaborator) commented on Jan 9, 2024

The current implementation of summarize_tensor can raise a secondary error that obfuscates the original error.

Actual

While running some experiments, the stack trace displayed a secondary error:

Traceback (most recent call last):
  File "/home/pierre/dev/refiners/scripts/training/finetune-ldm-color-palette.py", line 15, in <module>
    trainer.train()
  File "/home/pierre/dev/refiners/src/refiners/training_utils/trainer.py", line 89, in inner_wrapper
    result = func(*args, **kwargs)
  File "/home/pierre/dev/refiners/src/refiners/training_utils/trainer.py", line 528, in train
    self.epoch()
  File "/home/pierre/dev/refiners/src/refiners/training_utils/trainer.py", line 512, in epoch
    self.step(batch=batch)
  File "/home/pierre/dev/refiners/src/refiners/training_utils/trainer.py", line 503, in step
    loss = self.compute_loss(batch=batch)
  File "/home/pierre/dev/refiners/src/refiners/training_utils/color_palette.py", line 234, in compute_loss
    prediction = self.unet(noisy_latents)
  File "/home/pierre/anaconda3/envs/refiners/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/pierre/anaconda3/envs/refiners/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/pierre/dev/refiners/src/refiners/fluxion/layers/chain.py", line 279, in forward
    result = self._call_layer(layer, name, *intermediate_args)
  File "/home/pierre/dev/refiners/src/refiners/fluxion/layers/chain.py", line 273, in _call_layer
    raise ChainError(message) from None
refiners.fluxion.layers.chain.ChainError: RuntimeError:
   File "/home/pierre/dev/refiners/src/refiners/fluxion/utils.py", line 199, in summarize_tensor
    f"mean={tensor.mean():.2f}",

mean(): could not infer output dtype. Input dtype must be either a floating point or complex dtype. Got: Long
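
For reference, this secondary error is easy to reproduce on its own; the snippet below is a hypothetical illustration, not code from the PR:

import torch

tensor = torch.tensor([1, 2, 3])  # dtype is torch.int64 (Long)
try:
    tensor.mean()
except RuntimeError as e:
    print(e)
    # mean(): could not infer output dtype. Input dtype must be either
    # a floating point or complex dtype. Got: Long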

Expected

I expect to see the original error in the stack trace,

which in my case was:

Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)

Solution

This PR implements a fix for this issue.
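
For context, here is a minimal sketch of the kind of guard this fix introduces (names and fields simplified; the actual implementation lives in src/refiners/fluxion/utils.py and may differ):

import torch
from torch import Tensor


def summarize_tensor(tensor: Tensor) -> str:
    # Cast non-float dtypes (e.g. Long, Bool) to float32 before computing
    # statistics, so .mean()/.std() cannot raise a secondary error that
    # hides the original exception.
    stats = tensor if tensor.is_floating_point() else tensor.float()
    return "Tensor(" + ", ".join([
        f"dtype={tensor.dtype}",
        f"shape={tuple(tensor.shape)}",
        f"device={tensor.device}",
        f"mean={stats.mean():.2f}",
        f"std={stats.std():.2f}",
        f"min={stats.min():.2f}",
        f"max={stats.max():.2f}",
    ]) + ")"


print(summarize_tensor(torch.arange(6)))  # no longer raises for an int64 tensor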

@limiteinductive (Contributor) previously approved these changes on Jan 9, 2024 and left a comment:

Seems good to me, important fix

@piercus (Collaborator, Author) commented on Jan 9, 2024

Thanks for your feedback @limiteinductive.

I feel it's worth unit testing summarize_tensor.
It will prevent regressions here (which might not be visible elsewhere in the CI).
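
A hypothetical parametrized test along those lines, assuming summarize_tensor returns a str (the actual tests were added under tests/fluxion/test_utils.py):

import pytest
import torch

from refiners.fluxion.utils import summarize_tensor


@pytest.mark.parametrize("dtype", [torch.float32, torch.int64, torch.int32, torch.bool])
def test_summarize_tensor_does_not_raise(dtype: torch.dtype) -> None:
    # Regression guard: summarizing a non-float tensor must not raise.
    tensor = torch.zeros(2, 3, dtype=dtype)
    assert isinstance(summarize_tensor(tensor), str)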

Review comments on tests/fluxion/test_utils.py and src/refiners/fluxion/utils.py were marked outdated and resolved.
piercus added a commit to piercus/refiners that referenced this pull request Jan 10, 2024
@limiteinductive force-pushed the summarize-non-float-tensor branch from aa232ce to d4cc507 on January 10, 2024 at 23:04
@deltheil added and removed the run-ci (Run CI) label on Jan 11, 2024
@deltheil merged commit c141091 into finegrain-ai:main on Jan 11, 2024 (1 check passed)
@deltheil (Member) commented

Thanks @piercus!

deltheil added a commit that referenced this pull request Jan 19, 2024
Calling `tensor.float()` on a complex tensor raises a warning:

    UserWarning: Casting complex values to real discards the imaginary
    part (Triggered internally at ../aten/src/ATen/native/Copy.cpp:299.)

Follow up of #171
catwell pushed a commit that referenced this pull request Jan 19, 2024
Calling `tensor.float()` on a complex tensor raises a warning:

    UserWarning: Casting complex values to real discards the imaginary
    part (Triggered internally at ../aten/src/ATen/native/Copy.cpp:299.)

Follow up of #171
deltheil added a commit that referenced this pull request Jan 19, 2024
Calling `tensor.float()` on a complex tensor raises a warning:

    UserWarning: Casting complex values to real discards the imaginary
    part (Triggered internally at ../aten/src/ATen/native/Copy.cpp:299.)

Follow up of #171
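
Based on the commit message above (an assumption, not the exact diff), the follow-up amounts to restricting the cast so complex tensors are left untouched, e.g. with a hypothetical helper:

from torch import Tensor


def to_summarizable(tensor: Tensor) -> Tensor:
    # Cast only dtypes that support neither mean() nor std() natively;
    # complex tensors are left as-is so the UserWarning above is not triggered.
    if tensor.is_floating_point() or tensor.is_complex():
        return tensor
    return tensor.float()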
Labels: run-ci (Run CI)
3 participants