Running Cactus on AWS

Cactus supports running on AWS with an auto-scaling cluster using Toil. Check out the Toil docs on running in the cloud for the full story, but here's a short walkthrough of running on AWS.

AWS setup

If you have a fresh AWS account or haven't used EC2 before, you'll need to go through some initial setup.

Keypair

Make sure you have an AWS keypair ready. This document will tell you how to create an AWS keypair that will allow you to log into the instances you create.

Access keys

You'll also need to have your AWS access credentials set up in ~/.aws/credentials or the typical AWS environment variables (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY). Here is AWS's documentation on setting up your access key, and this guide will help you set up ~/.aws/credentials.
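Before launching anything, it can save time to confirm that one of those two credential sources is actually populated. Here's a minimal sketch; the function name aws_credentials_present is made up for this example, and it only checks that values exist, not that AWS accepts them.

```shell
# Return 0 if AWS credentials appear to be configured, 1 otherwise.
# aws_credentials_present is a hypothetical helper, not part of Toil or the AWS CLI.
aws_credentials_present() {
    # Option 1: the standard environment variables are both set.
    if [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]; then
        return 0
    fi
    # Option 2: a non-empty credentials file that contains a secret key entry.
    # The path can be passed as $1; the default is the location named above.
    creds_file="${1:-${HOME:-}/.aws/credentials}"
    [ -s "$creds_file" ] && grep -q 'aws_secret_access_key' "$creds_file"
}

if aws_credentials_present; then
    echo "AWS credentials found"
else
    echo "No AWS credentials found; set them up before launching a cluster" >&2
fi
```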

Instance limits

By default, AWS will restrict you to running only a few small instances at a time. If you have a new AWS account, or you're not sure what your limits are, you will probably need to increase them. See this guide for information on how to check your existing limits and how to increase them if necessary. AWS support takes only one or two business days to process your request if you have the default "basic" support package.

Look below for tips on how many instances you may need for your specific alignment. Keep in mind that spot-market instance limits are separate from "on-demand" (non-preemptable) instance limits. (You'll probably want to request a slightly higher limit than you think you need, just in case you want to tweak the number of instances later on.)

Installing Toil on your local machine

The simplest way to do this is to download the latest precompiled binary Cactus release and follow the installation instructions.

Estimating the maximum number of worker instances you'll need

The cluster will automatically scale up and down, but you'll want to set a maximum number of nodes so the scaler doesn't get overly aggressive and waste money, or go over your AWS limits. We typically use c4.8xlarge on the spot market for most jobs, and r3.8xlarge on-demand for database jobs. Here are some very rough estimates of what we typically use for the maximum of each type (round up):

  • N mammal-size genomes (~2-4Gb): (N / 2) * 20 c4.8xlarge on the spot market, (N / 2) r3.8xlarge on-demand
  • N bird-size genomes (~1-2Gb): (N / 2) * 10 c4.8xlarge on the spot market, (N / 4) r3.8xlarge on-demand
  • N nematode-size genomes (~100-300Mb): (N / 2) c4.8xlarge on the spot market, (N / 10) r3.8xlarge on-demand
  • For anything less than 100Mb, the computational requirements are so small that you may be better off running it on a single machine than using an autoscaling cluster.
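The rules of thumb above are easy to turn into a small calculator. This is only a sketch: the function names are made up for this example, and it assumes "round up" applies to the final value of each formula rather than to N / 2 on its own.

```shell
# ceil_div A B: integer ceiling of A / B, for positive integers.
ceil_div() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# estimate_max_nodes N CLASS
# Prints "<spot c4.8xlarge> <on-demand r3.8xlarge>" using the rough estimates above.
# CLASS is one of: mammal, bird, nematode (hypothetical labels for this sketch).
estimate_max_nodes() {
    n=$1
    case "$2" in
        mammal)   spot=$(ceil_div $(( n * 20 )) 2); ondemand=$(ceil_div "$n" 2) ;;
        bird)     spot=$(ceil_div $(( n * 10 )) 2); ondemand=$(ceil_div "$n" 4) ;;
        nematode) spot=$(ceil_div "$n" 2);          ondemand=$(ceil_div "$n" 10) ;;
        *) echo "unknown genome class: $2" >&2; return 1 ;;
    esac
    echo "$spot $ondemand"
}

estimate_max_nodes 5 mammal      # prints "50 3"
estimate_max_nodes 7 bird        # prints "35 2"
estimate_max_nodes 12 nematode   # prints "6 2"
```

The two numbers map onto the spot and on-demand maximums respectively, and they're also a reasonable starting point for the instance limits you request from AWS.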

Launching the "leader" instance

Make sure you have your AWS keypair active in your ssh-agent, unless you used your existing SSH key. If ssh-agent isn't started by your operating system, you may have to run eval $(ssh-agent) to start it. You can activate the AWS keypair for a session by running:

ssh-add path/to/your_aws_ssh_keypair

Then launch the leader like so:

toil launch-cluster -z us-west-2b <clusterName> --keyPairName <yourKeyPairName> --leaderNodeType t2.medium

Transfer over local input data (or use URLs in your seqfile)

You need to get your actual data to the leader somehow. You can use http:// and s3:// to specify FASTA locations in your input file, or you can rsync your data over like so:

toil rsync-cluster -z us-west-2b <clusterName> -avP seqFile.txt input1.fa input2.fa :/
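If all of your FASTAs live at remote locations, the seqFile is the only thing you need to copy over. Here's a sketch of what such a file could look like, using the usual seqFile layout (a Newick tree on the first line, then one "name location" pair per genome). The genome names, host and bucket below are placeholders, not real data.

```shell
# Write a hypothetical seqFile whose FASTA locations are all remote.
# First line: Newick tree; remaining lines: "<genome name> <FASTA location>".
cat > seqFile.txt <<'EOF'
((speciesA:0.1,speciesB:0.1):0.05,speciesC:0.2);
speciesA http://example.org/genomes/speciesA.fa
speciesB s3://my-example-bucket/genomes/speciesB.fa
speciesC s3://my-example-bucket/genomes/speciesC.fa
EOF

# Sanity check: count the lines that point at an http:// or s3:// location.
grep -c -E '^[A-Za-z0-9_]+ (http|s3)://' seqFile.txt   # prints 3
```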

Log into the leader and set up the cluster's Cactus installation

Log into the leader by running:

toil ssh-cluster -z us-west-2b <clusterName>

You should now get a prompt like:

root@ip-172-31-34-148:/#

indicating that you're on the leader.

You now install Cactus again by downloading the latest precompiled binary Cactus release and following the installation instructions BUT WITH 2 CRITICAL CHANGES to make sure you use the correct Toil:

  • add the --system-site-packages option when creating the virtualenv
  • do not run python3 -m pip install -U -r ./toil-requirement.txt (i.e., don't reinstall Toil)

For example, the installation instructions for v2.8.0 (use the latest release where you can) would become:

wget -q https://github.com/ComparativeGenomicsToolkit/cactus/releases/download/v2.8.0/cactus-bin-v2.8.0.tar.gz
tar -xzf cactus-bin-v2.8.0.tar.gz
cd cactus-bin-v2.8.0
virtualenv -p python3 --system-site-packages venv-cactus-v2.8.0
source venv-cactus-v2.8.0/bin/activate
python3 -m pip install -U setuptools pip wheel
python3 -m pip install -U .

Run the alignment (on the leader)

The key parameters you'll care about changing (besides the usual Cactus parameters) are the autoscaling parameters --nodeTypes, --minNodes, --maxNodes and the jobStore location, aws:<region>:<jobStoreName>.

You must use the AWS jobstore (or other cloud jobstore, though others may incur data egress charges), not a directory jobstore, because there is no shared filesystem in the cluster. Set the region to whatever region you're running the leader in. The jobStoreName must be globally unique.
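Since the name has to be globally unique, one simple approach is to append a timestamp and the shell's process ID to a readable prefix. This is a sketch only: make_jobstore is a made-up helper, and it assumes the name should stick to lower-case letters, digits and single hyphens (check the Toil docs for the exact naming rule).

```shell
# make_jobstore REGION PREFIX
# Prints an aws:<region>:<jobStoreName> locator. The timestamp and PID suffix
# make collisions unlikely, though not impossible.
make_jobstore() {
    region=$1
    # Lower-case the prefix, then turn every run of other characters into one "-".
    prefix=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]' | tr -cs 'a-z0-9' '-')
    echo "aws:${region}:${prefix}-$(date +%Y%m%d%H%M%S)-$$"
}

JOBSTORE=$(make_jobstore us-west-2 "Cactus_Run")
echo "$JOBSTORE"   # something like aws:us-west-2:cactus-run-<timestamp>-<pid>
```

Write the generated value down somewhere: you'll need the exact same locator again if you have to restart the run.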

The --nodeTypes option lets you specify the list of instances you want as well as the price you're willing to pay for spot instances. For example, c4.8xlarge:0.6 says that we want a c4.8xlarge instance on the spot market, and we're willing to pay up to 60 cents an hour for it. A value like r3.8xlarge indicates that we also want on-demand r3.8xlarge instances (for which we pay exactly the on-demand price, which is usually substantially higher). The --maxNodes option will let you specify a list containing the maximum number of nodes of each type to use at any given time (in the same order as --nodeTypes). The --minNodes option should probably be left at 0 unless you really know what you're doing and you're having substantial difficulties with the autoscaler.
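Because the three lists are matched up by position, a mismatch in length is an easy mistake to make when you add or drop a node type. Here's a small sketch of a pre-flight check; count_fields and check_node_lists are made-up helpers, not Toil or Cactus commands.

```shell
# Comma-separated lists: position i in each list describes the same node type.
NODE_TYPES="c4.8xlarge:0.6,r3.8xlarge"
MIN_NODES="0,0"
MAX_NODES="20,2"

# count_fields LIST: number of comma-separated entries in LIST.
count_fields() {
    echo "$1" | awk -F, '{ print NF }'
}

# check_node_lists TYPES MIN MAX: succeed only if all three lists have equal length.
check_node_lists() {
    n=$(count_fields "$1")
    [ "$n" -eq "$(count_fields "$2")" ] && [ "$n" -eq "$(count_fields "$3")" ]
}

if check_node_lists "$NODE_TYPES" "$MIN_NODES" "$MAX_NODES"; then
    echo "node lists are consistent"
else
    echo "--nodeTypes, --minNodes and --maxNodes need the same number of entries" >&2
fi
```

If the check passes, the three values can go into the command unchanged: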

cactus --nodeTypes c4.8xlarge:0.6,r3.8xlarge --minNodes 0,0 --maxNodes 20,2 --provisioner aws --batchSystem mesos --metrics aws:us-west-2:<jobStoreName> seqFile.txt output.hal

This will take a while. You'll want to run this inside something that will preserve your session, like tmux or screen, so the command doesn't terminate when you disconnect.

Restarting after failure

If you cancel your run, or it fails for some reason, you can start where you left off by running:

cactus [all your usual options...] --restart

Shut down your leader

When the alignment is done, your leader will still be active, costing you a little bit of money every day. Make sure you get rid of it when you're done:

toil destroy-cluster -z us-west-2b <clusterName>