[Refactor] Apply latest version of flux and merge all branches from ai-compiler-study/flux #1

Draft
wants to merge 17 commits into main
Conversation

@cmpark0126 commented on Nov 30, 2024

Features

  • Apply the latest version of black-forest-labs/flux
  • Merge ai-compiler-study:main and ai-compiler-study:triton
  • Refactor the code to address the following:
    • Make custom kernel implementations optional while keeping backward compatibility with the original FLUX implementation, so users can choose between kernel versions (see the sketch after this list).
      • e.g., the Triton kernel implemented by @sjjeong94 in src/flux/model.py
      • e.g., XFORMERS_FLASH3 in src/flux/math.py
    • Keep cli.py as a CLI program rather than a benchmarking tool
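
A minimal sketch of what "optional kernel implementations" can look like in practice. The flag names mirror the ones mentioned above, but the env-var wiring is an assumption, not the PR's exact code:

```python
import os

# Hypothetical configuration switches; the PR exposes similar flags such as
# XFORMERS_FLASH3 in src/flux/math.py, but reading them from env vars here is illustrative.
XFORMERS_FLASH3 = os.environ.get("XFORMERS_FLASH3", "0") == "1"
TORCH_SDPA = os.environ.get("TORCH_SDPA", "0") == "1"
TRITON_ATTENTION = os.environ.get("TRITON_ATTENTION", "0") == "1"

# With every switch off, execution falls back to the original FLUX code path,
# which is what preserves backward compatibility.
```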

Comment on lines +246 to +247
if not CHECK_NSFW or nsfw_score < NSFW_THRESHOLD:
    buffer = BytesIO()
Author
I have switched to using CHECK_NSFW instead of the previously commented-out code.
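
A minimal sketch of this flag-based gating; how CHECK_NSFW is actually defined in the PR is an assumption (an env-var toggle is just one option):

```python
import os
from io import BytesIO

CHECK_NSFW = os.environ.get("CHECK_NSFW", "1") == "1"  # hypothetical wiring
NSFW_THRESHOLD = 0.85  # illustrative value

def encode_if_safe(image_bytes: bytes, nsfw_score: float) -> BytesIO | None:
    # Skip the check entirely when CHECK_NSFW is off; otherwise gate on the score.
    if not CHECK_NSFW or nsfw_score < NSFW_THRESHOLD:
        buffer = BytesIO()
        buffer.write(image_bytes)
        return buffer
    return None
```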

Comment on lines -28 to -58
class CudaTimer:
    """
    A context manager class for measuring the execution time of PyTorch code
    using CUDA events. It synchronizes GPU operations to ensure accurate time measurements.
    """

    def __init__(self, name="", precision=5, display=False):
        self.name = name
        self.precision = precision
        self.display = display

    def __enter__(self):
        torch.cuda.synchronize()
        self.start_event = torch.cuda.Event(enable_timing=True)
        self.end_event = torch.cuda.Event(enable_timing=True)
        self.start_event.record()
        return self

    def __exit__(self, *exc):
        self.end_event.record()
        torch.cuda.synchronize()
        # Convert from ms to s
        self.elapsed_time = self.start_event.elapsed_time(self.end_event) * 1e-3

        if self.display:
            print(f"{self.name}: {self.elapsed_time:.{self.precision}f} s")

    def get_elapsed_time(self):
        """Returns the elapsed time in seconds."""
        return self.elapsed_time

Author

Removed the benchmarking functionality from cli.py, since similar benchmarking capabilities are available in benchmark/benchmark_flux.py. This keeps cli.py aligned with its primary purpose, which is not benchmarking.
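
For reference, the removed timer was used as a context manager; a minimal usage sketch (assumes a CUDA device and the CudaTimer class shown above):

```python
import torch

x = torch.randn(4096, 4096, device="cuda")

# Events are recorded around the block and synchronized on exit, so the
# reported time covers the GPU work rather than just the kernel launch.
with CudaTimer(name="matmul", display=True) as timer:
    y = x @ x

print(f"elapsed: {timer.get_elapsed_time():.5f} s")
```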

Comment on lines 39 to +65
if xformers_flash3:
    if torch_sdpa or triton_attention:
        print(
            "Warning: xformers_flash3 is enabled, but torch_sdpa or triton_attention is also enabled. "
            "Please keep only one of them enabled."
        )

    q = q.permute(0, 2, 1, 3)  # B, H, S, D
    k = k.permute(0, 2, 1, 3)  # B, H, S, D
    v = v.permute(0, 2, 1, 3)  # B, H, S, D

    x = _compiled_xformers_flash_hopper(q, k, v).permute(0, 2, 1, 3)
elif torch_sdpa:
    if triton_attention:
        print(
            "Warning: torch_sdpa is enabled, but triton_attention is also enabled. "
            "Please keep only one of them enabled."
        )

    x = scaled_dot_product_attention(q, k, v)
elif triton_attention:
    from triton.ops import attention as attention_triton

    softmax_scale = q.size(-1) ** -0.5
    x = attention_triton(q, k, v, True, softmax_scale)
else:
    x = torch.nn.functional.scaled_dot_product_attention(q, k, v)
Author

Refactored the attention implementation to improve maintainability and to provide clearer warnings when multiple attention methods are enabled at the same time. Major changes include:

  • Lazy importing of optional dependencies (xformers, triton); see the sketch after this list
  • Warning messages for conflicting attention method selections
  • An explicit default attention method (torch.nn.functional.scaled_dot_product_attention)
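
A sketch of the lazy-import dispatch pattern described above. This is not the PR's exact code; in particular, the xformers call and the assumed (B, H, S, D) input layout in that branch are assumptions:

```python
import warnings
import torch
import torch.nn.functional as F

def dispatch_attention(q, k, v, *, xformers_flash3=False, torch_sdpa=False, triton_attention=False):
    """Pick one attention backend, warn on conflicts, default to PyTorch SDPA."""
    enabled = [n for n, f in (("xformers_flash3", xformers_flash3),
                              ("torch_sdpa", torch_sdpa),
                              ("triton_attention", triton_attention)) if f]
    if len(enabled) > 1:
        warnings.warn(f"Multiple attention backends enabled ({enabled}); using {enabled[0]}.")

    if xformers_flash3:
        # Lazy import: xformers is only required when this branch runs.
        from xformers.ops import memory_efficient_attention
        # Assumes q/k/v arrive as (B, H, S, D); xformers expects (B, S, H, D).
        q, k, v = (t.permute(0, 2, 1, 3) for t in (q, k, v))
        return memory_efficient_attention(q, k, v).permute(0, 2, 1, 3)
    if triton_attention:
        # Lazy import: only older Triton releases ship triton.ops.attention.
        from triton.ops import attention as attention_triton
        return attention_triton(q, k, v, True, q.size(-1) ** -0.5)
    # torch_sdpa and the default path share the same implementation.
    return F.scaled_dot_product_attention(q, k, v)
```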

Comment on lines +14 to +25
try:
    import triton_kernels
    from triton_kernels import SingleStreamBlock, DoubleStreamBlock
except ImportError:
    print("Triton kernels not found, using flux native implementation.")
    from flux.modules.layers import SingleStreamBlock, DoubleStreamBlock
except ModuleNotFoundError:
    print("Triton kernels not found, using flux native implementation.")
    from flux.modules.layers import SingleStreamBlock, DoubleStreamBlock
except Exception as e:
    print(f"Error: {e}")
    from flux.modules.layers import SingleStreamBlock, DoubleStreamBlock
Author

Added a graceful fallback for the triton kernel imports: if the triton kernels are unavailable or fail to load, the code automatically falls back to the native FLUX implementation and prints an appropriate message. This ensures smoother execution across different environments and clearer debugging information.
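
Note that ModuleNotFoundError is a subclass of ImportError, so the second except clause above never fires. A minimal sketch of the same fallback with that redundancy removed (module names follow the diff; the broad Exception handler is kept for import-time failures inside triton_kernels):

```python
try:
    # Prefer the optional Triton kernels when the package is installed.
    from triton_kernels import SingleStreamBlock, DoubleStreamBlock
except ImportError:
    print("Triton kernels not found, using flux native implementation.")
    from flux.modules.layers import SingleStreamBlock, DoubleStreamBlock
except Exception as e:  # e.g. a failure while triton_kernels itself imports
    print(f"Error: {e}")
    from flux.modules.layers import SingleStreamBlock, DoubleStreamBlock
```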
