Replace legacy commands with 'dask worker' and 'dask scheduler'. #399

Closed · wants to merge 2 commits
5 changes: 1 addition & 4 deletions dask_cloudprovider/aws/ec2.py
@@ -265,9 +265,6 @@ class EC2Cluster(VMCluster):
It is assumed that the ``ami`` will not have Docker installed (or the NVIDIA drivers for GPU instances).
If ``bootstrap`` is ``True`` these dependencies will be installed on instance start. If you are using
a custom AMI which already has these dependencies, set this to ``False``.
worker_command: string (optional)
The command workers should run when starting. By default this will be ``"dask-worker"`` unless
``instance_type`` is a GPU instance in which case ``dask-cuda-worker`` will be used.
ami: string (optional)
The base OS AMI to use for scheduler and workers.
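
For illustration, a minimal sketch of the ``bootstrap``/``ami`` behaviour described above (the AMI ID is a placeholder, not a real image):

from dask_cloudprovider.aws import EC2Cluster

# A custom AMI that already has Docker (and, for GPU instances, the NVIDIA drivers)
# baked in, so the instance-start bootstrapping step can be skipped.
cluster = EC2Cluster(ami="ami-0123456789abcdef0", bootstrap=False)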

@@ -340,7 +337,7 @@ class EC2Cluster(VMCluster):
The Docker image to run on all instances.

This image must have a valid Python environment and have ``dask`` installed in order for the
``dask-scheduler`` and ``dask-worker`` commands to be available. It is recommended the Python
``dask scheduler`` and ``dask worker`` commands to be available. It is recommended the Python
environment matches your local environment where ``EC2Cluster`` is being created from.

For GPU instance types the Docker image must have NVIDIA drivers and ``dask-cuda`` installed.
79 changes: 44 additions & 35 deletions dask_cloudprovider/aws/ecs.py
@@ -77,7 +77,7 @@ class Task:
AWS resource tags to be applied to any resources that are created.

name: str (optional)
Name for the task. Currently used for the ``--name`` command line argument to dask-worker.
Name for the task. Currently used for the ``--name`` command line argument to `dask worker`.

platform_version: str (optional)
Version of the AWS Fargate platform to use, e.g. "1.4.0" or "LATEST". This
@@ -368,7 +368,7 @@ class Scheduler(Task):
scheduler_timeout: str
Time of inactivity after which to kill the scheduler.
scheduler_extra_args: List[str] (optional)
Any extra command line arguments to pass to dask-scheduler, e.g. ``["--tls-cert", "/path/to/cert.pem"]``
Any extra command line arguments to pass to ``dask scheduler``, e.g. ``["--tls-cert", "/path/to/cert.pem"]``

Defaults to `None`, no extra command line arguments.
kwargs:
@@ -386,7 +386,8 @@ def __init__(
self.task_type = "scheduler"
self._overrides = {
"command": [
"dask-scheduler",
"dask",
"scheduler",
"--idle-timeout",
scheduler_timeout,
]
@@ -434,24 +435,25 @@ def __init__(
self._mem = mem
self._gpu = gpu
self._nthreads = nthreads
_command = [

Member: I'm curious, why create this variable?

"dask",
"cuda" if self._gpu else None,

Member: I'm apprehensive about having a None in this string. Have you tested that this works as expected?

Author: There's a None but it gets filtered out before we use it.

Member: I found the introduction and filtering of None hard to parse when reviewing this. I think it would be more readable to have something like

if self._gpu:
    _command = ["dask", "cuda", "worker"]
else:
    _command = ["dask", "worker"]
_command += [OTHERARGS...]

"worker",
self.scheduler,
"--name",
str(self.name),
"--nthreads",
"{}".format(
max(int(self._cpu / 1024), 1) if nthreads is None else self._nthreads
),
"--memory-limit",
"{}GB".format(int(self._mem / 1024)),
"--death-timeout",
"60",
]
_command = [e for e in _command if e is not None]
self._overrides = {
"command": [
"dask-cuda-worker" if self._gpu else "dask-worker",
self.scheduler,
"--name",
str(self.name),
"--nthreads",
"{}".format(
max(int(self._cpu / 1024), 1)
if nthreads is None
else self._nthreads
),
"--memory-limit",
"{}GB".format(int(self._mem / 1024)),
"--death-timeout",
"60",
]
+ (list() if not extra_args else extra_args)
"command": _command + (list() if not extra_args else extra_args)
}
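
For readers following the thread above, a minimal standalone sketch of the two ways of building the variable-length worker command (the flag values here are illustrative, not taken from the PR):

gpu = True

# Approach taken in this PR: include a possible None, then filter it out.
command = [e for e in ["dask", "cuda" if gpu else None, "worker"] if e is not None]

# Reviewer's suggested alternative: branch up front, then append the shared arguments.
if gpu:
    command = ["dask", "cuda", "worker"]
else:
    command = ["dask", "worker"]
command += ["--nthreads", "1", "--death-timeout", "60"]

Both variants yield ["dask", "cuda", "worker", ...] for GPU workers and ["dask", "worker", ...] otherwise.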


@@ -507,7 +509,7 @@ class ECSCluster(SpecCluster, ConfigMixin):

Defaults to ``8786``
scheduler_extra_args: List[str] (optional)
Any extra command line arguments to pass to dask-scheduler, e.g. ``["--tls-cert", "/path/to/cert.pem"]``
Any extra command line arguments to pass to ``dask scheduler``, e.g. ``["--tls-cert", "/path/to/cert.pem"]``

Defaults to `None`, no extra command line arguments.
scheduler_task_definition_arn: str (optional)
@@ -553,7 +555,7 @@ class ECSCluster(SpecCluster, ConfigMixin):
Defaults to `None`, meaning that the task definition will be created along with the cluster, and cleaned up once
the cluster is shut down.
worker_extra_args: List[str] (optional)
Any extra command line arguments to pass to dask-worker, e.g. ``["--tls-cert", "/path/to/cert.pem"]``
Any extra command line arguments to pass to ``dask worker``, e.g. ``["--tls-cert", "/path/to/cert.pem"]``

Defaults to `None`, no extra command line arguments.
worker_task_kwargs: dict (optional)
@@ -702,7 +704,7 @@ class ECSCluster(SpecCluster, ConfigMixin):
... worker_gpu=1)

Setting the ``worker_gpu`` option to something other than ``None`` will cause the cluster
to run ``dask-cuda-worker`` as the worker startup command. Setting this option will also change
to run ``dask cuda worker`` as the worker startup command. Setting this option will also change
the default Docker image to ``rapidsai/rapidsai:latest``; if you're using a custom image
you must ensure the NVIDIA CUDA toolkit is installed with a version that matches the host machine
along with ``dask-cuda``.
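
For reference, a minimal sketch of the GPU case described above (the cluster ARN is a placeholder assumption and other arguments take their defaults):

from dask_cloudprovider.aws import ECSCluster

# worker_gpu=1 switches the worker startup command from "dask worker" to "dask cuda worker".
cluster = ECSCluster(
    cluster_arn="arn:aws:ecs:us-east-1:111111111111:cluster/example",  # placeholder ARN
    worker_gpu=1,
)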
@@ -1195,7 +1197,8 @@ async def _create_scheduler_task_definition_arn(self):
"memoryReservation": self._scheduler_mem,
"essential": True,
"command": [
"dask-scheduler",
"dask",
"scheduler",
"--idle-timeout",
self._scheduler_timeout,
]
@@ -1266,17 +1269,23 @@ async def _create_worker_task_definition_arn(self):
"resourceRequirements": resource_requirements,
"essential": True,
"command": [
"dask-cuda-worker" if self._worker_gpu else "dask-worker",
"--nthreads",
"{}".format(
max(int(self._worker_cpu / 1024), 1)
if self._worker_nthreads is None
else self._worker_nthreads
),
"--memory-limit",
"{}MB".format(int(self._worker_mem)),
"--death-timeout",
"60",
e
for e in [
"dask",
"cuda" if self._worker_gpu else None,

Member: Same here.

Author: To both the above, just so it was more visible that we're filtering Nones out. That seemed like the easiest way to have a variable number of arguments in the command.

"worker",
"--nthreads",
"{}".format(
max(int(self._worker_cpu / 1024), 1)
if self._worker_nthreads is None
else self._worker_nthreads
),
"--memory-limit",
"{}MB".format(int(self._worker_mem)),
"--death-timeout",
"60",
]
if e is not None
]
+ (
list()
2 changes: 1 addition & 1 deletion dask_cloudprovider/azure/azurevm.py
@@ -312,7 +312,7 @@ class AzureVMCluster(VMCluster):
The Docker image to run on all instances.

This image must have a valid Python environment and have ``dask`` installed in order for the
``dask-scheduler`` and ``dask-worker`` commands to be available. It is recommended the Python
``dask scheduler`` and ``dask worker`` commands to be available. It is recommended the Python
environment matches your local environment where ``AzureVMCluster`` is being created from.

For GPU instance types the Docker image must have NVIDIA drivers and ``dask-cuda`` installed.
1 change: 0 additions & 1 deletion dask_cloudprovider/cloudprovider.yaml
@@ -41,7 +41,6 @@ cloudprovider:
availability_zone: null # The availability zone to start your clusters. By default AWS will select the AZ with the most free capacity.
bootstrap: true # It is assumed that the AMI does not have Docker and needs bootstrapping. Set this to false if using a custom AMI with Docker already installed.
auto_shutdown: true # Shutdown instances automatically if the scheduler or worker services time out.
# worker_command: "dask-worker" # The command for workers to run. If the instance_type is a GPU instance dask-cuda-worker will be used.
ami: null # AMI ID to use for all instances. Defaults to latest Ubuntu 20.04 image.
instance_type: "t2.micro" # Instance type for the scheduler and all workers
scheduler_instance_type: "t2.micro" # Instance type for the scheduler
2 changes: 1 addition & 1 deletion dask_cloudprovider/digitalocean/droplet.py
@@ -119,7 +119,7 @@ class DropletCluster(VMCluster):
The Docker image to run on all instances.

This image must have a valid Python environment and have ``dask`` installed in order for the
``dask-scheduler`` and ``dask-worker`` commands to be available. It is recommended the Python
``dask scheduler`` and ``dask worker`` commands to be available. It is recommended the Python
environment matches your local environment where ``EC2Cluster`` is being created from.

For GPU instance types the Docker image must have NVIDIA drivers and ``dask-cuda`` installed.
2 changes: 1 addition & 1 deletion dask_cloudprovider/gcp/instances.py
@@ -437,7 +437,7 @@ class GCPCluster(VMCluster):
The Docker image to run on all instances.

This image must have a valid Python environment and have ``dask`` installed in order for the
``dask-scheduler`` and ``dask-worker`` commands to be available. It is recommended the Python
``dask scheduler`` and ``dask worker`` commands to be available. It is recommended the Python
environment matches your local environment where ``EC2Cluster`` is being created from.

For GPU instance types the Docker image must have NVIDIA drivers and ``dask-cuda`` installed.
2 changes: 1 addition & 1 deletion dask_cloudprovider/gcp/tests/test_gcp.py
@@ -61,7 +61,7 @@ async def test_get_cloud_init():
docker_args="--privileged",
extra_bootstrap=["gcloud auth print-access-token"],
)
assert "dask-scheduler" in cloud_init
assert "dask scheduler" in cloud_init
assert "# Bootstrap" in cloud_init
assert " --privileged " in cloud_init
assert "- gcloud auth print-access-token" in cloud_init
4 changes: 2 additions & 2 deletions dask_cloudprovider/generic/vmcluster.py
@@ -193,7 +193,7 @@ class VMCluster(SpecCluster):
The Docker image to run on all instances.

This image must have a valid Python environment and have ``dask`` installed in order for the
``dask-scheduler`` and ``dask-worker`` commands to be available. It is recommended the Python
``dask scheduler`` and ``dask worker`` commands to be available. It is recommended the Python
environment matches your local environment where ``EC2Cluster`` is being created from.

For GPU instance types the Docker image must have NVIDIA drivers and ``dask-cuda`` installed.
@@ -375,7 +375,7 @@ def get_cloud_init(cls, *args, **kwargs):
cluster.auto_shutdown = False
return cluster.render_cloud_init(
image=cluster.options["docker_image"],
command="dask-scheduler --version",
command="dask scheduler --version",
docker_args=cluster.options["docker_args"],
extra_bootstrap=cluster.options["extra_bootstrap"],
gpu_instance=cluster.gpu_instance,
4 changes: 2 additions & 2 deletions doc/source/gpus.rst
@@ -10,7 +10,7 @@ Each cluster manager handles this differently but generally you will need to con

- Configure the hardware to include GPUs. This may be by changing the hardware type or adding accelerators.
- Ensure the OS/Docker image has the NVIDIA drivers. For Docker images it is recommended to use the [RAPIDS images](https://hub.docker.com/r/rapidsai/rapidsai/).
- Set the ``worker_module`` config option to ``dask_cuda.cli.dask_cuda_worker`` or ``worker_command`` option to ``dask-cuda-worker``.
- Set the ``worker_module`` config option to ``dask_cuda.cli.dask_cuda_worker`` or set ``resources`` to include ``GPU=n`` where ``n`` is the number of GPUs you require. This will cause ``dask cuda worker`` to be used in place of ``dask worker``.

In the following AWS :class:`dask_cloudprovider.aws.EC2Cluster` example we set the ``ami`` to be a Deep Learning AMI with NVIDIA drivers, the ``docker_image`` to RAPIDS, the ``instance_type``
to ``p3.2xlarge`` which has one NVIDIA Tesla V100 and the ``worker_module`` to ``dask_cuda.cli.dask_cuda_worker``.
@@ -24,4 +24,4 @@ to ``p3.2xlarge`` which has one NVIDIA Tesla V100 and the ``worker_module`` to
bootstrap=False,
filesystem_size=120)

See each cluster manager's example sections for info on starting a GPU cluster.
See each cluster manager's example sections for info on starting a GPU cluster.
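
The hunk above only shows the tail of the GPU example described in the prose; a fuller sketch of the same cluster definition (the AMI ID is a placeholder for a Deep Learning AMI with NVIDIA drivers):

from dask_cloudprovider.aws import EC2Cluster

cluster = EC2Cluster(
    ami="ami-0123456789abcdef0",  # placeholder: Deep Learning AMI with NVIDIA drivers
    docker_image="rapidsai/rapidsai:latest",
    instance_type="p3.2xlarge",  # one NVIDIA Tesla V100
    worker_module="dask_cuda.cli.dask_cuda_worker",
    bootstrap=False,
    filesystem_size=120,
)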