display module dtype and device #173

Merged: 3 commits, Jan 11, 2024

piercus
Collaborator

@piercus piercus commented Jan 11, 2024

Context

In the context of #165, I'm splitting the training on different GPUs.

While doing this, I hit many errors such as "Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!".

Actual

The stack trace reports the device of the input tensors, but says nothing about the module's device and dtype.

Expected

Print the device and the dtype of modules in the stack trace.
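
To illustrate the expected behaviour, here is a minimal sketch in plain PyTorch (not the refiners implementation from this PR). It assumes a hypothetical subclass, DescribedLinear, that overrides nn.Module.extra_repr so the printed module also carries the device and dtype of its weight:

```python
import torch
from torch import nn


class DescribedLinear(nn.Linear):
    """Hypothetical subclass: appends device and dtype to the module's repr."""

    def extra_repr(self) -> str:
        # Reuse nn.Linear's own description, then add where the weight lives.
        base = super().extra_repr()
        return f"{base}, device={self.weight.device}, dtype={self.weight.dtype}"


# CPU-only so the sketch runs anywhere; on a multi-GPU machine the device
# would read e.g. cuda:0 or cuda:1 instead.
layer = DescribedLinear(4, 2).to(dtype=torch.float64)
print(layer)
```

With this kind of output, a module sitting on the wrong device stands out as soon as the module is printed.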

@limiteinductive
Contributor

This clutters the tree view a lot.
image

I have two ideas to improve this matter:

  • show dtype/device only for WeightedModule
  • show it only on the top-level Chain, and hide it on submodules except where it differs
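
The second idea can be sketched in plain Python. This is only an illustration under stated assumptions: Node and render are hypothetical names, devices and dtypes are plain strings, and none of this is the refiners API. Each node is annotated only when its device or dtype differs from its parent's, so the root is always annotated and uniform subtrees stay quiet:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a module in a tree view (not the refiners API)."""

    name: str
    device: str
    dtype: str
    children: List["Node"] = field(default_factory=list)


def render(node: Node, parent: Optional[Node] = None, depth: int = 0) -> List[str]:
    """Return one line per node, annotating device/dtype only where it changes."""
    differs = parent is None or (node.device, node.dtype) != (parent.device, parent.dtype)
    suffix = f" (device={node.device}, dtype={node.dtype})" if differs else ""
    lines = [f"{'    ' * depth}{node.name}{suffix}"]
    for child in node.children:
        lines.extend(render(child, node, depth + 1))
    return lines


tree = Node(
    "Chain",
    "cuda:0",
    "float32",
    [
        Node("Linear", "cuda:0", "float32"),
        Node("Linear", "cuda:1", "float32"),  # the misplaced submodule
    ],
)
print("\n".join(render(tree)))
```

In this example only the root Chain and the second Linear get an annotation, which keeps the tree short while still pointing at the submodule on cuda:1.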

Contributor

@limiteinductive limiteinductive left a comment

The idea is sound, let's find a better way to do it.

piercus added a commit to piercus/refiners that referenced this pull request Jan 11, 2024
@piercus
Collaborator Author

piercus commented Jan 11, 2024

This version is much simpler (KISS) and should also be less verbose.
Since Chain has _show_only_tag(), I feel we should show only the tag for Chain.

@piercus
Collaborator Author

piercus commented Jan 11, 2024

Thanks for your help @limiteinductive

@limiteinductive limiteinductive force-pushed the module-log-dtype-device branch from 075ab1e to cfea67c Compare January 11, 2024 17:27
@deltheil deltheil added the run-ci Run CI label Jan 11, 2024
@deltheil deltheil merged commit 457c3f5 into finegrain-ai:main Jan 11, 2024
1 check passed
piercus added a commit to piercus/refiners that referenced this pull request Jan 12, 2024
@piercus piercus mentioned this pull request Jan 12, 2024
deltheil pushed a commit that referenced this pull request Jan 12, 2024