LIU-412: Basic graph saving in DIM dlg/workspace #291

Merged
merged 8 commits into from
Nov 4, 2024

myxie (Collaborator) commented Oct 17, 2024

JIRA Ticket

LIU-412

Type

  • Feature (addition)

Problem/Issue

To progress towards re-running a graph, or to determine what the graph configuration was for a given session, we need to store graphs. Currently this is not something we can do in DALiUGE.

Solution

This PR adds an option for the DIM/MM (DataIslandManager/MasterManager) to store graphs in the DALiUGE workspace of the node on which the DIM is running. The DIM/MM has to do this because NodeManagers only receive partitioned graphs and therefore do not have the entire "picture".

Example output: [image]

After an in-person discussion, we also decided to remove the -w/--work-dir command-line argument, as it was not achieving its intended purpose: the working directory was meant to be the workspace, but we never set the workspace directory to anything other than DLG_ROOT/workspace.

Summary by Sourcery

Implement basic graph saving functionality in the DALiuGE workspace by adding an option to store graphs through the DIM/MM. Remove the '-w/--work-dir' command line argument to simplify configuration.

New Features:

  • Add functionality to store graphs in the DALiuGE workspace for the node that the DIM is running on.

Enhancements:

  • Remove the '-w/--work-dir' command line argument as it was not serving its intended purpose.

coveralls commented Oct 17, 2024

Coverage Status

coverage: 79.655% (-0.006%) from 79.661% when pulling 9a3de7d on LIU-412 into ff6f09b on master.

myxie and others added 6 commits October 22, 2024 14:02
…mmand line flag.

- This is supported in cluster deployments, where the CLI flag is set for the highest-level manager, so graphs are stored on the node on which that manager is running.
- Previously, we had a work directory, but only stored the settings in there. This is because we never kept track of the user-provided work directory, and instead used the default.
- We use a new DLG_WORKSPACE environment variable to maintain this state on the node.
- This never actually worked as intended, as we didn't maintain the state of the working directory/workspace (getDlgWorkDir always returned DLG_ROOT/workspace).
- This removes code previously added as part of this work to set the working directory, as we have decided there's no real need to maintain a workspace separate from the standard one.
myxie marked this pull request as ready for review October 31, 2024 07:52

sourcery-ai bot (Contributor) commented Oct 31, 2024

Reviewer's Guide by Sourcery

This PR implements graph storage functionality in DALiUGE by adding the ability to save physical graphs in the workspace directory. It also removes the deprecated working directory command line arguments, simplifying the workspace directory handling by always using DLG_ROOT/workspace.

Sequence diagram for graph storage process

sequenceDiagram
    participant User
    participant DIM
    participant CompositeManager
    participant FileSystem

    User->>DIM: Submit graph
    DIM->>CompositeManager: Add graphSpec
    alt dump_graphs is true
        CompositeManager->>FileSystem: _dump_graph_to_file(sessionId, graphSpec)
        FileSystem-->>CompositeManager: Graph saved
    end
    CompositeManager-->>DIM: Graph processed
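The sequence above can be sketched in Python roughly as follows. This is a minimal, hypothetical stand-in for CompositeManager, not the DALiUGE implementation: the class name CompositeManagerSketch, the add_graph_spec method, and the `<sessionId>.json` file name are assumptions for illustration. Only the dump_graphs flag and the _dump_graph_to_file(sessionId, graphSpec) step come from the PR.

```python
import json
import os


class CompositeManagerSketch:
    """Hypothetical, stripped-down stand-in for CompositeManager.

    Only the optional graph-dumping path is modelled; partitioning and
    forwarding to child managers are omitted.
    """

    def __init__(self, workspace: str, dump_graphs: bool = False):
        self._workspace = workspace  # assumed to be $DLG_ROOT/workspace
        self._dump_graphs = dump_graphs
        self._graphs: dict = {}  # sessionId -> accumulated graph spec

    def _dump_graph_to_file(self, session_id: str, graph_spec: list) -> str:
        # Assumed layout: one JSON file per session, named after the session.
        os.makedirs(self._workspace, exist_ok=True)
        path = os.path.join(self._workspace, f"{session_id}.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(graph_spec, f, indent=2)
        return path

    def add_graph_spec(self, session_id: str, graph_spec: list) -> None:
        # The DIM/MM sees the whole, unpartitioned graph, so it is the
        # natural place to persist it.
        self._graphs.setdefault(session_id, []).extend(graph_spec)
        if self._dump_graphs:
            self._dump_graph_to_file(session_id, graph_spec)
```

In this sketch dump_graphs defaults to False, so nothing is written unless the flag is passed explicitly, which matches the opt-in --dump_graphs option described in the review guide.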

Class diagram for CompositeManager and related classes

classDiagram
    class CompositeManager {
        -list[str] dmHosts
        -str pkeyPath
        -int dmCheckTimeout
        -bool dump_graphs
        +_dump_graph_to_file(sessionId: str, graphSpec: dict)
    }
    class DataIslandManager {
        -list[str] dmHosts
        -str pkeyPath
        -int dmCheckTimeout
        -bool dump_graphs
    }
    class MasterManager {
        -list[str] dmHosts
        -str pkeyPath
        -int dmCheckTimeout
        -bool dump_graphs
    }
    CompositeManager <|-- DataIslandManager
    CompositeManager <|-- MasterManager
    note for CompositeManager "Added dump_graphs attribute and _dump_graph_to_file method"
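A minimal sketch of how the dump_graphs setting could be threaded through the hierarchy in the class diagram. The constructor signatures are assumptions modelled on the diagram's attributes (dmHosts, pkeyPath, dmCheckTimeout, dump_graphs), not copied from composite_manager.py.

```python
class CompositeManager:
    """Hypothetical sketch: the base class owns the shared settings."""

    def __init__(self, dmHosts, pkeyPath, dmCheckTimeout, dump_graphs=False):
        self._dmHosts = list(dmHosts)
        self._pkeyPath = pkeyPath
        self._dmCheckTimeout = dmCheckTimeout
        self._dump_graphs = dump_graphs  # attribute added by this PR


class DataIslandManager(CompositeManager):
    """Hypothetical DIM sketch: its children are NodeManagers."""

    def __init__(self, dmHosts, pkeyPath, dmCheckTimeout, dump_graphs=False):
        # Forward the flag unchanged so the base class can act on it.
        super().__init__(dmHosts, pkeyPath, dmCheckTimeout, dump_graphs=dump_graphs)


class MasterManager(CompositeManager):
    """Hypothetical MM sketch: its children are DataIslandManagers."""

    def __init__(self, dmHosts, pkeyPath, dmCheckTimeout, dump_graphs=False):
        super().__init__(dmHosts, pkeyPath, dmCheckTimeout, dump_graphs=dump_graphs)
```

Keeping the flag on the base class means the dumping logic only has to live in one place, in line with the diagram's note that both the attribute and _dump_graph_to_file were added to CompositeManager.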

File-Level Changes

Added graph storage functionality to the Composite Manager
  • Added new --dump_graphs command line option
  • Implemented _dump_graph_to_file method to store physical graphs as JSON
  • Added dump_graphs parameter to CompositeManager, DataIslandManager, and MasterManager classes
  • Enabled graph dumping in cluster deployment scripts
  Files: daliuge-engine/dlg/manager/cmdline.py, daliuge-engine/dlg/manager/composite_manager.py, daliuge-engine/dlg/deploy/start_dlg_cluster.py

Simplified workspace directory handling
  • Removed -w/--work-dir command line option
  • Removed --cwd option and related logic
  • Standardized working directory to always use DLG_ROOT/workspace
  • Removed get_workspace_dir function from cluster deployment
  Files: daliuge-engine/dlg/manager/cmdline.py, daliuge-engine/dlg/deploy/start_dlg_cluster.py

Enhanced logging functionality
  • Added warning message to indicate logging level at startup
  • Improved log message formatting
  Files: daliuge-engine/dlg/manager/cmdline.py
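The new option can be illustrated with a hedged argparse sketch. Only the --dump_graphs name comes from the PR; the parser, its prog name, and the manager_kwargs helper are hypothetical, and the real cmdline.py may define its options differently. The sketch also shows that, with -w/--work-dir no longer registered, passing it is rejected.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; only --dump_graphs is taken from the PR.
    parser = argparse.ArgumentParser(prog="dim-sketch")
    parser.add_argument(
        "--dump_graphs",
        action="store_true",  # off unless the flag is given
        help="Store submitted graphs in $DLG_ROOT/workspace",
    )
    # No -w/--work-dir option: the workspace is always $DLG_ROOT/workspace.
    return parser


def manager_kwargs(argv: list) -> dict:
    # Translate parsed options into keyword arguments for a manager constructor.
    opts = build_parser().parse_args(argv)
    return {"dump_graphs": opts.dump_graphs}
```

For example, manager_kwargs(["--dump_graphs"]) yields {"dump_graphs": True}, while an unknown option such as --work-dir makes argparse exit with a usage error.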


sourcery-ai bot left a comment

Hey @myxie - I've reviewed your changes and they look great!

Here's what I looked at during the review
  • 🟢 General issues: all looks good
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good


@@ -216,15 +203,14 @@ def start_dim(node_list, log_dir, origin_ip, logv=1):
args = [

issue (code-quality): Replace interpolated string formatting with f-string (replace-interpolation-with-fstring)

@@ -242,15 +228,14 @@
args = [

issue (code-quality): Replace interpolated string formatting with f-string (replace-interpolation-with-fstring)
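For reference, the replace-interpolation-with-fstring suggestion amounts to a change like the one below. The helper names and values are hypothetical, since the diff hunks do not show the actual contents of the args lists; the point is that both forms produce identical strings.

```python
def endpoint_old(host: str, port: int) -> str:
    # %-interpolation: the pattern flagged by replace-interpolation-with-fstring.
    return "%s:%d" % (host, port)


def endpoint_new(host: str, port: int) -> str:
    # Equivalent f-string form.
    return f"{host}:{port}"


def verbosity_old(logv: int) -> str:
    # Hypothetical verbosity flag built from a repeat count.
    return "-%s" % ("v" * logv)


def verbosity_new(logv: int) -> str:
    return f"-{'v' * logv}"
```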

awicenec (Contributor) left a comment

The changes look good, but I have two questions/concerns:

  1. It is not totally clear to me whether the previous default functionality to put the workspace under $DLG_ROOT is still maintained, since the get_workspace function, which returned that directory, is now gone.
  2. Does this interfere with start_dlg_cluster, which already puts everything into that directory?

myxie (Collaborator, Author) commented Nov 4, 2024

Hi Andreas,

We discussed this in person on Friday, but I thought I'd leave a detailed reply here for posterity and to clear up any lingering confusion.

It is not totally clear to me whether the previous default functionality to put the workspace under $DLG_ROOT is still maintained, since the get_workspace function, which returned that directory, is now gone.

The deleted get_workspace function was attempting to place the DALiUGE workspace on Setonix in the parent directory of log_dir. That never actually took effect, because the -w/--work-dir command-line argument never set the workspace directory; the workspace directory was always the one returned by utils.getDlgWorkDir:

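import os


def getDlgWorkDir():
    """
    Returns the location of the directory used by the DALiUGE framework to store
    results. If `createIfMissing` is True, the directory will be created if it
    currently doesn't exist
    """
    # NB: despite the docstring, there is no createIfMissing parameter and
    # nothing is created here; the path is simply <DLG_ROOT>/workspace.
    return os.path.join(getDlgDir(), "workspace")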

As your question suggests, this always sets the workspace directory to $DLG_ROOT/workspace, because getDlgDir returns $DLG_ROOT (falling back to ~/dlg, and exporting it, when DLG_ROOT is unset):

import logging
import os

logger = logging.getLogger(__name__)


def getDlgDir():
    """
    Returns the root of the directory structure used by the DALiUGE framework at
    runtime.
    """
    if "DLG_ROOT" in os.environ:
        path = os.environ["DLG_ROOT"]
    else:
        path = os.path.join(os.path.expanduser("~"), "dlg")
        os.environ["DLG_ROOT"] = path
    logger.debug(f"DLG_ROOT directory is {path}")
    return path
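Put together, getDlgDir and getDlgWorkDir can be restated as a single pure function, which makes the resolution rule easy to check. resolve_workspace is a hypothetical helper written for illustration; it takes the environment and home directory as arguments instead of reading os.environ and ~ directly.

```python
import os


def resolve_workspace(environ: dict, home: str) -> str:
    """Hypothetical restatement of getDlgDir() + getDlgWorkDir()."""
    # getDlgDir(): use DLG_ROOT if present, otherwise ~/dlg.
    if "DLG_ROOT" in environ:
        dlg_root = environ["DLG_ROOT"]
    else:
        dlg_root = os.path.join(home, "dlg")
        environ["DLG_ROOT"] = dlg_root  # mirrors the os.environ side effect
    # getDlgWorkDir(): always <DLG_ROOT>/workspace; nothing can override it.
    return os.path.join(dlg_root, "workspace")
```

With DLG_ROOT set to /scratch/proj/alice/dlg the workspace is /scratch/proj/alice/dlg/workspace; with DLG_ROOT unset and a home of /home/alice it is /home/alice/dlg/workspace.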

Does this interfere with the start_dlg_cluster, which already puts everything into that directory?

This should not interfere with start_dlg_cluster, because the SlurmConfig we create for Setonix sets the DLG_ROOT environment variable to the same directory that get_workspace was trying to use:

HOME_DIR = f"/scratch/{ACCOUNT}"
DLG_ROOT = f"{HOME_DIR}/{USER}/dlg"
LOG_DIR = f"{DLG_ROOT}/log"
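Plugging placeholder values into the SlurmConfig excerpt shows why the two approaches coincide: the parent of LOG_DIR is exactly DLG_ROOT. The setonix_paths helper and the account and user values are made up for illustration.

```python
from pathlib import PurePosixPath


def setonix_paths(account: str, user: str) -> dict:
    # Same layout as the SlurmConfig excerpt above.
    home_dir = f"/scratch/{account}"
    dlg_root = f"{home_dir}/{user}/dlg"
    log_dir = f"{dlg_root}/log"
    return {
        "DLG_ROOT": dlg_root,
        "LOG_DIR": log_dir,
        # Parent of log_dir: where, per the reply, get_workspace aimed the workspace.
        "log_dir_parent": str(PurePosixPath(log_dir).parent),
        # What getDlgWorkDir() resolves to once DLG_ROOT is exported.
        "workspace": str(PurePosixPath(dlg_root) / "workspace"),
    }
```

For setonix_paths("myproject", "alice"), both DLG_ROOT and log_dir_parent are /scratch/myproject/alice/dlg, and the workspace is /scratch/myproject/alice/dlg/workspace.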

myxie merged commit 7163226 into master on Nov 4, 2024
19 checks passed
myxie deleted the LIU-412 branch November 4, 2024 08:43

3 participants