Update pip setup and documentation.
pichuan committed Jan 18, 2022
1 parent b478af5 commit f1413ee
Showing 4 changed files with 39 additions and 16 deletions.
6 changes: 3 additions & 3 deletions README.md
@@ -64,9 +64,9 @@ See the [quick start](https://github.com/google/deepconsensus/blob/main/docs/qui

## Where does DeepConsensus fit into my pipeline?

After a PacBio sequencing run, DeepConsensus is meant to be run on the CCS reads
and subreads to create new corrected reads in FASTQ format that can take the
place of the CCS reads for downstream analyses.
After a PacBio sequencing run, DeepConsensus is meant to be run on the subreads
to create new corrected reads in FASTQ format that can take the place of the CCS
reads for downstream analyses.

See the [quick start](https://github.com/google/deepconsensus/blob/main/docs/quick_start.md)
for an example of inputs and outputs.
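For orientation, here is a minimal sketch of that pipeline. The flags, file names, and model checkpoint path below are assumptions drawn from the quick start; the exact invocations may differ by version.

```bash
# Sketch only: subreads -> ccs draft consensus -> subread-to-consensus alignment -> DeepConsensus.
ccs --min-rq=0.88 subreads.bam ccs.bam             # draft consensus; --min-rq value assumed
actc subreads.bam ccs.bam subreads_to_ccs.bam      # align subreads to the draft consensus
# actc also emits a FASTA of the draft consensus, used below as --ccs_fasta (name assumed).
deepconsensus run \
  --subreads_to_ccs=subreads_to_ccs.bam \
  --ccs_fasta=ccs.fasta \
  --checkpoint="${MODEL_DIR}/checkpoint" \
  --output=output.fastq
```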
21 changes: 21 additions & 0 deletions README_pip.md
@@ -0,0 +1,21 @@
# Important: Pip install is different for CPU versus GPU

If you're on a GPU machine:

```bash
pip install deepconsensus[gpu]==0.2.0
# To make sure the `deepconsensus` CLI works, set the PATH:
export PATH="/home/${USER}/.local/bin:${PATH}"
```

If you're on a CPU machine:

```bash
pip install deepconsensus[cpu]==0.2.0
# To make sure the `deepconsensus` CLI works, set the PATH:
export PATH="/home/${USER}/.local/bin:${PATH}"
```
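Either way, a quick sanity check that the CLI is on your PATH (a sketch; the exact help text varies by version):

```bash
# Should print the DeepConsensus usage text if the install and PATH are set up correctly.
deepconsensus --help
```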

## Documentation, quick start, citation

All other documentation is on GitHub: [https://github.com/google/deepconsensus](https://github.com/google/deepconsensus).
26 changes: 14 additions & 12 deletions docs/quick_start.md
@@ -9,7 +9,7 @@ This covers the following stages:
to use DeepConsensus from existing *ccs* reads, but yield will be higher when
including all reads)
2. Aligning subreads to the *ccs* consensus with *[actc]*
3. Running DeepConsensus using one of two options (with pip or using Docker)
3. Running DeepConsensus using either pip or Docker

## System configuration

@@ -24,9 +24,9 @@ GPU: 1 nvidia-tesla-p100
```

DeepConsensus can be run on any compatible Unix system. In this case, we used a
[n1-standard-16 machine on GCP](https://cloud.google.com/compute/docs/general-purpose-machines#n1_machines), with a NVIDIA P100 GPU.
[n1-standard-16 machine on GCP](https://cloud.google.com/compute/docs/general-purpose-machines#n1_machines), with an NVIDIA P100 GPU.

## Download data for testing
## Download example data

This will download about 142 MB of data, and the model is another 245 MB.

@@ -40,16 +40,17 @@ MODEL_DIR="${QUICKSTART_DIRECTORY}/model"
mkdir -p "${DATA}"
mkdir -p "${MODEL_DIR}"

# Download the input data which is PacBio subreads.
# Download the input data, which is PacBio subreads.
gsutil cp gs://brain-genomics-public/research/deepconsensus/quickstart/v0.2/subreads.bam* "${DATA}"/

# Download DeepConsensus model.
# Download the DeepConsensus model.
gsutil cp gs://brain-genomics-public/research/deepconsensus/models/v0.2/* "${MODEL_DIR}"/
```
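As a rough check that the downloads completed (sizes are approximate):

```bash
# The subreads data should be roughly 142 MB and the model roughly 245 MB.
du -sh "${DATA}" "${MODEL_DIR}"
```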

## If running with GPU, set up your GPU machine correctly.

In our example run, because we're using a GPU, we used:

```bash
curl https://raw.githubusercontent.com/google/deepvariant/r1.3/scripts/install_nvidia_docker.sh -o install_nvidia_docker.sh
bash install_nvidia_docker.sh
@@ -62,8 +63,8 @@ to make sure our GPU is set up correctly.
You can install *[ccs]* and *[actc]* on your own. For convenience, we put them in
a Docker image:

```
DOCKER_IMAGE=google/deepconsensus:0.2.0rc1-gpu
```bash
DOCKER_IMAGE=google/deepconsensus:0.2.0-gpu
sudo docker pull ${DOCKER_IMAGE}
```
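As a rough illustration, the same image can be used to run *ccs* on the example subreads. The `--min-rq` value and paths here are assumptions, so follow the full command given later in this quick start:

```bash
# Sketch: generate the draft consensus (ccs.bam) from the subreads inside the container.
# --min-rq=0.88 keeps lower-quality consensus reads so DeepConsensus can try to rescue them (value assumed).
sudo docker run -v "${DATA}":"/data" ${DOCKER_IMAGE} \
  ccs --min-rq=0.88 /data/subreads.bam /data/ccs.bam
```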

@@ -84,7 +85,7 @@ quality threshold.
If you want to split up the task for parallelization, we recommend using the
`--chunk` option in *ccs*.
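For example, a chunked run might look roughly like this (chunk count, flags, and the merge step are assumptions; check the *ccs* documentation for your version):

```bash
# Sketch: split the ccs job into 4 chunks, run them in parallel, then merge.
# (ccs chunking needs a .pbi index; run `pbindex subreads.bam` first if one is missing.)
for i in 1 2 3 4; do
  ccs --min-rq=0.88 --chunk="${i}/4" subreads.bam "ccs.${i}.bam" &
done
wait
pbmerge -o ccs.bam ccs.*.bam   # pbmerge is from pbbam; samtools merge may also work (assumed)
```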

Then, we create `subreads_to_ccs.bam` was created by running *actc*:
Then, we create `subreads_to_ccs.bam` by running *actc*:

```bash
sudo docker run -v "${DATA}":"/data" ${DOCKER_IMAGE} \
@@ -94,7 +95,7 @@ sudo docker run -v "${DATA}":"/data" ${DOCKER_IMAGE} \
/data/subreads_to_ccs.bam
```

DeepConsensus will take FASTA format of *ccs*.
DeepConsensus will take the consensus sequences output by *ccs* in FASTA format.

*actc* already converted the BAM into FASTA. Rename and index it.
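A minimal sketch of that step (the FASTA file names are assumptions; the quick start runs the equivalent through the Docker image):

```bash
# Sketch: rename actc's FASTA output and index it with samtools.
mv "${DATA}"/subreads_to_ccs.fasta "${DATA}"/ccs.fasta   # actc's output name is assumed
samtools faidx "${DATA}"/ccs.fasta
```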

@@ -113,7 +114,7 @@ sudo docker run -v "${DATA}":"/data" ${DOCKER_IMAGE} \
You can install DeepConsensus using `pip`:

```bash
pip install deepconsensus[gpu]==0.2.0rc1
pip install deepconsensus[gpu]==0.2.0
```

NOTE: If you're using a CPU machine, install with `deepconsensus[cpu]` instead.
@@ -139,14 +140,15 @@ time deepconsensus run \
```

At the end of your run, you should see:

```
Processed 1000 ZMWs in 341.3297851085663 seconds
Outcome counts: OutcomeCounter(empty_sequence=0, only_gaps_and_padding=50, failed_quality_filter=424, failed_length_filter=0, success=526)
```
the outputs can be found at the following paths:

The final output FASTQ can be found at the following path:

```bash
# Final output FASTQ file containing the DeepConsensus reads.
ls "${DATA}"/output.fastq
```
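One quick sanity check is to count the reads in that file (a FASTQ record is 4 lines); with the run above, this should roughly match the `success` count in the outcome summary:

```bash
# Sketch: number of DeepConsensus reads written to the output FASTQ.
echo $(( $(wc -l < "${DATA}"/output.fastq) / 4 ))
```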

2 changes: 1 addition & 1 deletion setup.py
@@ -38,7 +38,7 @@
here = pathlib.Path(__file__).parent.resolve()

# Get the long description from the README file
long_description = (here / 'README.md').read_text(encoding='utf-8')
long_description = (here / 'README_pip.md').read_text(encoding='utf-8')

REQUIREMENTS = (here / 'requirements.txt').read_text().splitlines()
EXTRA_REQUIREMENTS = {
