Swarm Learning Command Line Interface (SWCI) is the command-line tool for the Swarm Learning framework. It is used to view the status of the framework and to control and manage it. SWCI manages the Swarm Learning framework using contexts and contracts. For more information on how to start the SWCI tool, see Starting SWCI.

The user can enter any command from the predefined set of commands at the SWCI prompt. Each command is parsed and executed, and its output is shown on the command line.

The user can also provide an init script file containing SWCI commands to be executed when SWCI starts. If this option is provided, all SWCI commands in the script file are processed before SWCI enters interactive mode and waits for user commands. Users can simulate a non-interactive SWCI run by ending the init script with an SWCI EXIT command. This can be used for automation.
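
As a rough illustration of the automation case, the following Python sketch assembles an init script that ends with EXIT. It assumes the script file holds one SWCI command per line, which this page does not state explicitly. The helper name `build_init_script` and the file name `swci-init.txt` are hypothetical; the commands themselves are taken from this page.

```python
import tempfile
from pathlib import Path


def build_init_script(commands):
    """Join SWCI commands into init-script text that ends with EXIT."""
    lines = [c.strip() for c in commands if c.strip()]
    # A trailing EXIT keeps SWCI from dropping into interactive mode.
    if not lines or lines[-1].upper() != "EXIT":
        lines.append("EXIT")
    # Assumption: one command per line in the script file.
    return "\n".join(lines) + "\n"


script = build_init_script([
    "CREATE CONTEXT testContext WITH IP sn.test.sw.net",
    "SWITCH CONTEXT testContext",
    "LIST CONTRACTS",
])

# Hypothetical file name; see Starting SWCI for how to pass the file to SWCI.
init_path = Path(tempfile.gettempdir()) / "swci-init.txt"
init_path.write_text(script)
print(script, end="")
```

The helper appends EXIT only when the last command is not already EXIT, so a script that ends with `EXIT ON FAILURE ON` still receives a final EXIT line.
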

The following sections describe the SWCI commands. The same descriptions are available in the online help within the tool.

A single SWCI tool can control operations on multiple Swarm Learning framework instances. Each instance is encapsulated and represented as a context. At any given time, only one context is active. All SWCI commands work with the currently active context (the default context) and do not take a context as an explicit argument. Before running any other SWCI command, users must explicitly create a context (using the CREATE CONTEXT command) and activate it as follows:

```
SWCI:13 > CREATE CONTEXT testContext WITH IP sn.test.sw.net
API Server is UP!
CONTEXT CREATED : testContext
SWCI:14 > SWITCH CONTEXT testContext
DEFAULT CONTEXT SET TO : testContext
SWCI:15 >
```

| Command | Description and parameters |
|---|---|
| `CREATE CONTEXT <contextName : string> WITH IP <ip : string> [port : string]` | Creates a SWCI context.<br>- contextName: A user-defined identifier for this context.<br>- ip: IP address or FQDN of the API server (Swarm Network node) serving Swarm Learning APIs.<br>- port (optional): The port number on which the API server is listening. |
| `CREATE CONTEXT <contextName : string> WITH SERVICE <service : string>` | Creates a SWCI context. Use this form when a reverse proxy is in use.<br>- contextName: A user-defined identifier for this context.<br>- service: FQDN for the API service of the associated Swarm Network node. If the reverse proxy is configured to run on a non-default port, append the port to the service string with a ':' separator. The port is optional. |
| `GET CONTEXT INFO [contextName : string]` | Prints the context information. If contextName is not specified, the information for the current context is printed. |
| `GET CONTEXT ENV <contextName:string> <env:string>` | Obtains the value of the specified context-specific environment variable. If contextName is not specified, the default context is used.<br>- contextName: Specifies the name of the target context.<br>- env: Specifies the target environment variable within the context. |
| `GET CONTEXT VERSION` | Returns the API server's version information. |
| `LIST CONTEXTS` | Prints the list of contexts related to the current SWCI session. |
| `LIST CONTEXT ENV <contextName : string>` | Displays the list of context-specific environment variables. If contextName is not specified, the default context is used.<br>- contextName: Specifies the context from which these environment variables are listed. |
| `SET CONTEXT ENV <contextName : string> <env : string> <envValue : string>` | Sets the value of the specified context-specific environment variable. If contextName is not specified, the default context is used.<br>- contextName: Specifies the name of the target context.<br>- env: Specifies the target environment variable within the context.<br>- envValue: The value for the variable being set. |
| `SWITCH CONTEXT <contextName : string>` | Switches the current SWCI context to the specified context. |
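
The two CREATE CONTEXT forms differ in how the optional port is passed: as a separate argument with WITH IP, and joined to the service name with ':' with WITH SERVICE. The Python sketch below formats both forms to make that difference concrete. The helper names and the port numbers are hypothetical placeholders, not defaults of the tool.

```python
def create_context_with_ip(name, ip, port=None):
    """Format CREATE CONTEXT ... WITH IP, adding the optional port if given."""
    parts = ["CREATE CONTEXT", name, "WITH IP", ip]
    if port is not None:
        parts.append(str(port))  # the port is a separate argument here
    return " ".join(parts)


def create_context_with_service(name, fqdn, port=None):
    """Format CREATE CONTEXT ... WITH SERVICE; a port is joined with ':'."""
    service = fqdn if port is None else f"{fqdn}:{port}"
    return f"CREATE CONTEXT {name} WITH SERVICE {service}"


# Port numbers below are illustrative placeholders.
print(create_context_with_ip("testContext", "sn.test.sw.net"))
print(create_context_with_ip("testContext", "sn.test.sw.net", 9000))
print(create_context_with_service("testContext", "sn.test.sw.net", 8443))
```
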

| Command | Description and parameters |
|---|---|
| `CREATE CONTRACT <SLContractName:string>` | Registers the specified Swarm Learning contract into the Swarm Learning network. |
| `GET CONTRACT INFO <SLContractName:string>` | Displays the static information about the specified Swarm Learning contract. |
| `GET CONTRACT STATUS <SLContractName:string>` | Reports the current dynamic status of the specified Swarm Learning contract. |
| `LIST CONTRACTS` | Displays the list of Swarm Learning contracts currently registered into the Swarm Learning network. |
| `PERFDATA CONTRACT <SLContractName:string>` | Displays performance data about the training under the Swarm Learning contract. For each SL-ML pair, it reports the UID, SL ADMIN status, model loss, model metric, total number of epochs, and number of completed epochs. |
| `RESET CONTRACT <SLContractName:string>` | Resets the state of the contract to the uninitialized state.<br>WARNING: This action cannot be undone. Reset only completed Swarm Learning contracts. Resetting an active contract can result in unexpected behavior.<br>A typical use is to reuse a completed training contract to start a new training session. |

| Command | Description and parameters |
|---|---|
| `APPEND TASK BODY <taskName : string> <idx : int> <contentLine : string>` | Adds or overwrites the specified line in the task body.<br>NOTE: A finalized task cannot be modified.<br>- taskName: Specifies the task that must be modified.<br>- idx: Specifies the nonzero number of the line that must be added or modified.<br>- contentLine: The actual text, which must be fewer than 80 characters. |
| `CREATE TASK <taskType : string> <taskName : string> <author : string> <prereq : string> <outcome : string>` | Creates a task and registers it into the Swarm network.<br>- taskType: Specifies the type of task to be created.<br>- taskName: Specifies the unique name of the task.<br>- author: Specifies the author of the task.<br>- prereq: Specifies the prerequisite required to create and execute the task.<br>- outcome: Specifies the name of the artifact that is produced as a result of this task. |
| `CREATE TASK FROM <taskDefFile:string>` | Creates a task from a task definition file.<br>- taskDefFile: A relative path to the task definition file. |
| `DELETE TASK <taskName : string>` | Deletes the specified task. This command is used for removing tasks that had errors while being created.<br>NOTE: A finalized task cannot be deleted.<br>- taskName: Specifies the task that must be deleted. |
| `FINALIZE TASK <taskName : string>` | Finalizes the specified task. Once finalized, the task body cannot be modified further.<br>- taskName: Specifies the task that must be finalized. |
| `GET TASK BODY <taskName : string>` | Prints the consolidated task body for the specified task. |
| `GET TASK INFO <taskName : string>` | Prints task information for the specified task. |
| `LIST TASKS` | Displays the list of tasks that are registered into the Swarm Learning network. |
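
The constraints on APPEND TASK BODY (a nonzero idx and a contentLine shorter than 80 characters) can be checked before a script is generated. The Python sketch below numbers the lines from 1 and rejects lines that are too long. It is only an illustration: the helper name and the placeholder body text are hypothetical, and because this page does not say how contentLine should be quoted, the text is appended verbatim.

```python
MAX_CONTENT_LEN = 80  # contentLine must be shorter than this


def append_task_body_commands(task_name, body_lines):
    """Return one APPEND TASK BODY command per line, then FINALIZE TASK."""
    commands = []
    # Start at 1 because idx must be nonzero.
    for idx, line in enumerate(body_lines, start=1):
        if len(line) >= MAX_CONTENT_LEN:
            raise ValueError(
                f"line {idx} has {len(line)} characters; "
                f"it must have fewer than {MAX_CONTENT_LEN}"
            )
        # Assumption: contentLine is passed as-is, without quoting.
        commands.append(f"APPEND TASK BODY {task_name} {idx} {line}")
    # A finalized task can no longer be modified, so finalize last.
    commands.append(f"FINALIZE TASK {task_name}")
    return commands


# Hypothetical task name and placeholder body text.
for command in append_task_body_commands(
    "demo_task", ["first placeholder line", "second placeholder line"]
):
    print(command)
```
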

| Command | Description and parameters |
|---|---|
| `ASSIGN TASK <taskName : string> TO <trName : string> WITH <peersNeeded> PEERS` | Assigns a task to a Taskrunner instance and specifies the minimum peer count for declaring the result. This triggers the SWOPs to execute the task.<br>- taskName: Specifies the unique name of the task.<br>- trName: Specifies the unique name of the Taskrunner. SWOPs listening on this Taskrunner participate in and execute this task.<br>- `WITH <peersNeeded> PEERS`: peersNeeded is a nonzero positive integer specifying the minimum number of peers required to complete this task.<br><br>The meaningful value of the minimum peer count depends on the task type. For a RUN_SWARM task, it is the number of SL and ML node pairs. For all other task types, it is the number of SWOP nodes.<br><br>For the RUN_SWARM task type, the actual number of SL/ML peers started can be equal to or greater than peersNeeded, depending on the number of SL nodes defined in the SWOP profiles.<br><br>For other task types, the number of participating SWOPs equals peersNeeded. |
| `CREATE TASKRUNNER <trName : string>` | Creates and registers a Taskrunner contract into the Swarm network.<br>- trName: Specifies the unique name of the Taskrunner. |
| `GET TASKRUNNER INFO <taskRunnerName : string>` | Displays the current status of the specified Taskrunner ID. |
| `GET TASKRUNNER PEER STATUS <taskRunnerName:string> <swopIndex\|swopUid>` | Displays the current status of the specified peer in the Taskrunner's context.<br>- taskRunnerName: Specifies the unique name of the Taskrunner.<br>- swopIndex: The index number of the SWOP node, starting at 0 and going up to ENROLLEDSWOP.<br>- swopUid: The UID string of the SWOP node. Specify either swopIndex or swopUid, not both.<br><br>The status of the peer depends on the type of the task currently assigned to the Taskrunner.<br><br>For a RUN_SWARM task, the status summary reports the SWOP node UID, the number of SL peers this SWOP has spawned, and a list of all SL node information (UID, Status, Description).<br><br>For all other task types, the status summary reports the SWOP node status (UID, Status, Description).<br><br>Note: The node UID can be used to identify the container name/ID from the output of the `LIST NODES` command. With the container name/ID, the user can debug the error using docker logs. |
| `GET TASKRUNNER STATUS <trName:string>` | Displays the current status of the specified Taskrunner.<br>- trName: Specifies the unique name of the Taskrunner.<br><br>It provides the following information:<br>- TASK NAME: The currently running task, or 'Empty' if no task is assigned.<br>- PEER TYPE: SL for RUN_SWARM tasks, SWOP for all other tasks.<br>- TASK STATE: The current task state.<br>- ACTIVE, COMPLETED, and FAILED PEERS information.<br>- Timestamps of various events on the Taskrunner contract. |
| `LIST TASKRUNNERS` | Displays the list of Taskrunners that are registered into the Swarm Learning network. |
| `LIST TASKRUNNER PEERS <taskRunnerName:string>` | Displays the list of enrolled peers for the specified Taskrunner ID. |
| `RESET TASKRUNNER <trName : string>` | Resets the state of the Taskrunner contract to the uninitialized state.<br>WARNING: This action cannot be undone. Reset only completed Taskrunner contracts. Resetting an active Taskrunner contract can result in unexpected behavior.<br>- trName: Specifies the unique name of the Taskrunner. |
| `WAIT FOR TASKRUNNER <trName : string>` | Waits for the Taskrunner to complete its current task.<br>- trName: Specifies the unique name of the Taskrunner.<br><br>`WAITING FOR TASKRUNNER TO COMPLETE - Maximum wait time is : <SWCI_TASK_MAX_WAIT_TIME>`<br><br>If `SWCI_TASK_MAX_WAIT_TIME` is not set, the default value is 120 minutes.<br><br>The command prints `#` (a progress indicator) periodically until the task completes. When the task completes, the final state of the task is printed.<br><br>If the maximum wait time is reached, the following warning message is displayed:<br>`WARNING - Maximum configured SWCI wait time is over` |
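
A common pattern is to assign a task and then block until the Taskrunner finishes it. The Python sketch below formats that two-command sequence and enforces the rule that peersNeeded is a nonzero positive integer. The helper name, task name, and Taskrunner name are hypothetical; only the command syntax comes from the descriptions above.

```python
def assign_task(task_name, tr_name, peers_needed):
    """Format ASSIGN TASK after checking that peersNeeded is an integer >= 1."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(peers_needed, bool) or not isinstance(peers_needed, int):
        raise TypeError("peersNeeded must be an integer")
    if peers_needed < 1:
        raise ValueError("peersNeeded must be a nonzero positive integer")
    return f"ASSIGN TASK {task_name} TO {tr_name} WITH {peers_needed} PEERS"


# Hypothetical task and Taskrunner names.
sequence = [
    assign_task("demo_task", "demo_tr", 2),
    "WAIT FOR TASKRUNNER demo_tr",
]
print("\n".join(sequence))
```

For a RUN_SWARM task, the value 2 here would mean two SL and ML node pairs; for other task types it would mean two SWOP nodes.
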

| Command | Description and parameters |
|---|---|
| `cd <dirPath>` | Changes the current working directory of the SWCI container. |
| `EXIT` | Exits the SWCI session unconditionally. |
| `EXIT ON FAILURE [ON/OFF]` | Instructs SWCI to exit the current session when any subsequent command fails. The default value is OFF. |
| `LIST NODES` | Displays the list of Swarm nodes that have registered and are currently active.<br><br>For each Swarm node, it displays the node type, host IP, port, container name, UUID, parent UUID, and the timestamp of the last received 'i-am-alive' message.<br><br>SWCI and ML nodes are not displayed.<br><br>In a reverse proxy scenario, it displays the service name of the associated node instead of the host IP. Users can look at their NGINX configuration file to find the IP addresses for the respective service names.<br><br>For Sentinel and Non-Sentinel nodes, it displays their respective API service names. For SL nodes, it displays the FS service names. For a SWOP node, it displays the API service name of the associated SN node. |
| `ls()` | Displays the directory contents of the SWCI container. |
| `pwd()` | Displays the present working directory of the SWCI container. |
| `SLEEP` | Sleeps for a specified time before executing the subsequent commands.<br><br>For example, between a WAIT FOR TASKRUNNER and a RESET TASKRUNNER, one can use `SLEEP 10` to allow a grace period of 10 seconds before the RESET command cleans up the SL and user containers.<br><br>This grace period lets the user ML code save the model or run inference with it after the Swarm training is over.<br><br>For more information, see the example SWCI scripts in the swarm-learning/examples/ directory. |
| `WAIT FOR IP <ip:string> [port:string] [retries:string]` | Waits for the specified API server to accept connections.<br>- ip: The IP address or FQDN of the API server (SN node) serving Swarm Learning APIs.<br>- port (optional): The string representation of the port number on which the API server is listening.<br>- retries (optional): The maximum number of times SWCI reattempts to connect, waiting for a 5-second timeout period before each attempt. The default is 360 retries (30 minutes). |
| `WAIT FOR SERVICE <service:string> [retries:string]` | Waits for the specified API server to accept connections.<br>- service: FQDN for the API service of the associated Swarm Network node. If a port number is needed, append it to the service string with a ':' separator. The port is optional.<br>- retries (optional): The maximum number of times SWCI reattempts to connect, waiting for a 5-second timeout period before each attempt. The default is 360 retries (30 minutes). |
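
The default of 360 retries corresponds to 30 minutes because each retry follows a 5-second timeout period. The short Python sketch below reproduces that arithmetic and shows how a retries value could be chosen for a different time budget. It assumes the connection attempts themselves take negligible time, so the results are estimates; the function names are hypothetical.

```python
import math

RETRY_INTERVAL_SECONDS = 5  # timeout period before each reattempt
DEFAULT_RETRIES = 360       # default retries for WAIT FOR IP / WAIT FOR SERVICE


def max_wait_minutes(retries=DEFAULT_RETRIES):
    """Approximate total wait, in minutes, for a given number of retries."""
    return retries * RETRY_INTERVAL_SECONDS / 60


def retries_for_minutes(minutes):
    """Smallest retries value that covers at least the given wait budget."""
    return math.ceil(minutes * 60 / RETRY_INTERVAL_SECONDS)


print(max_wait_minutes())       # 30.0
print(retries_for_minutes(10))  # 120
```
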

For the programmatic interface of SWCI, see SWCI APIs.