SmartSeq2 Pipeline Timing and Cost

Jishu Xu edited this page Sep 25, 2017 · 33 revisions


Google Cloud Computing Pricing

Estimating the cost of cloud computing can be complicated. Google Cloud bills for reserved resources, such as the number of cores, disk size, and RAM, at an hourly rate for as long as those resources are reserved. Pricing also varies by machine type: a high-spec machine (more cores and more RAM) has a higher hourly rate than a low-spec one. The pricing page lists the details.
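The billing model described above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this page, the additive per-core/per-GB rate is a simplifying assumption, and the per-unit prices are parameters that must be filled in from the pricing page rather than values taken from it.

```python
def hourly_rate(cores, ram_gb, disk_gb,
                usd_per_core_hour, usd_per_gb_ram_hour, usd_per_gb_disk_hour):
    """Hourly rate of one reservation, assuming price is additive per resource.

    All three per-unit prices are placeholders supplied by the caller.
    """
    return (cores * usd_per_core_hour
            + ram_gb * usd_per_gb_ram_hour
            + disk_gb * usd_per_gb_disk_hour)


def reservation_cost(rate_usd_per_hour, hours):
    """Cost of holding a reservation: hourly rate times hours reserved."""
    if rate_usd_per_hour < 0 or hours < 0:
        raise ValueError("rate and hours must be non-negative")
    return rate_usd_per_hour * hours
```

For instance, `reservation_cost(hourly_rate(4, 15, 100, a, b, c), 2.0)` would give the cost of a 2-hour job on a 4-core, 15 GB machine with a 100 GB disk, where `a`, `b`, and `c` are whatever per-unit prices currently apply.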

scRNASeq Pipeline Timing and Cost

The cost of processing RNA-Seq data in the cloud depends on several factors: the running time, the amount of resources reserved during processing, and the machine type. These factors interact, which makes the cost tricky to estimate. For example, to run one RSEM job we can request either minimal resources, such as a 1-core machine with 4 GB RAM, or a larger 4-core machine with 15 GB RAM. The first has a lower hourly rate than the second, but the job takes longer to finish.
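The trade-off in the RSEM example reduces to a simple comparison: the larger machine is cheaper overall only if it shortens the job by more than the ratio of the two hourly rates. A minimal sketch follows, with hypothetical function names and with rates and runtimes left as inputs, since neither is given here.

```python
def job_cost(rate_usd_per_hour, hours):
    """Total cost of one job on one machine."""
    return rate_usd_per_hour * hours


def break_even_speedup(small_rate, large_rate):
    """Speedup the larger machine must exceed to be cheaper overall."""
    return large_rate / small_rate


def larger_machine_is_cheaper(small_rate, small_hours, large_rate, large_hours):
    """True if the job costs less on the larger (pricier per hour) machine."""
    return job_cost(large_rate, large_hours) < job_cost(small_rate, small_hours)
```

If the 4-core machine costs four times as much per hour, for example, the job has to run more than four times faster on it before the larger request pays off.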

Pipeline Runtime

In this task, we set out to estimate the cost of processing scRNA-Seq data on Google Cloud. The test pipeline includes the following modules/steps:

  • STAR alignment [requests 8 cores and 40 GB RAM]
  • RSEM to estimate gene counts [requests 4 cores and 4 GB RAM]
  • featureCounts to calculate gene/exon/transcript counts [requests 1 core and 4 GB RAM]
  • Picard to collect several sequencing and RNA-Seq specific metrics [requests 1 core and 4 GB RAM]
  • Python script to parse pipeline output [requests 1 core and 2 GB RAM]
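The per-step requests listed above can be kept in a small data structure so that cost calculations refer to one source of truth. The sketch below is illustrative only: the class and variable names are hypothetical, and the real pipeline declares these requests in its own workflow definition, which is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceRequest:
    cores: int
    ram_gb: int


# Values copied from the list of steps above.
REQUESTS = {
    "STAR": ResourceRequest(cores=8, ram_gb=40),
    "RSEM": ResourceRequest(cores=4, ram_gb=4),
    "featureCounts": ResourceRequest(cores=1, ram_gb=4),
    "Picard": ResourceRequest(cores=1, ram_gb=4),
    "Python parser": ResourceRequest(cores=1, ram_gb=2),
}


def largest_request(requests):
    """Smallest single machine shape able to host any one step."""
    return ResourceRequest(
        cores=max(r.cores for r in requests.values()),
        ram_gb=max(r.ram_gb for r in requests.values()),
    )
```

Here `largest_request(REQUESTS)` is determined entirely by STAR, which is why the alignment step has the highest hourly rate in the pipeline.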

Timing and Cost

We then tested this pipeline on a published scRNA-Seq dataset of full-length RNA-Seq data from 864 single cells. The following table summarizes the total time (in hours) and total cost (in dollars) for each step.

StepName                    Program/software   Total Hours   Total Cost ($)
Star                        STAR                   292.133          140.115
RSEM                        RSEM                   151.983           29.376
CollectAlignemntMetrics     Picard                 144.216           13.858
CollectRnaMetrics           Picard                 144.555           13.890
CollectInsertSizeMetrics    Picard                 144.116           13.849
CollectDuplicationMetrics   Picard                 144.433           13.958
FeatureCountsUniqueCounts   featureCounts          144.266           14.179
FeatureCountsMultiCounts    featureCounts          144.483           14.200
CollectMetricsbySample      Python                 144.466            7.535
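The table can be summarized with a short script. The sketch below adds up the per-step figures and divides by the 864 cells in the dataset. It assumes that each row is a total over all cells, which the table does not state explicitly.

```python
# (step name, total hours, total cost in dollars), copied from the table above.
STEPS = [
    ("Star", 292.133, 140.115),
    ("RSEM", 151.983, 29.376),
    ("CollectAlignemntMetrics", 144.216, 13.858),
    ("CollectRnaMetrics", 144.555, 13.890),
    ("CollectInsertSizeMetrics", 144.116, 13.849),
    ("CollectDuplicationMetrics", 144.433, 13.958),
    ("FeatureCountsUniqueCounts", 144.266, 14.179),
    ("FeatureCountsMultiCounts", 144.483, 14.200),
    ("CollectMetricsbySample", 144.466, 7.535),
]
N_CELLS = 864

total_hours = sum(hours for _, hours, _ in STEPS)
total_cost = sum(cost for _, _, cost in STEPS)
cost_per_cell = total_cost / N_CELLS

print(f"total hours:   {total_hours:.3f}")    # 1454.651
print(f"total cost:    {total_cost:.3f}")     # 260.960
print(f"cost per cell: {cost_per_cell:.3f}")  # 0.302
```

Under that assumption, the whole pipeline comes to roughly 30 cents per cell, with the STAR step alone accounting for a little over half of the total cost.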

Association with scRNASeq Quality

We also examined the impact of scRNA-Seq data quality on total hours and total cost. The two QC measurements we examined are TOTAL READS and PCT_USABLE_BASES. In general, the first indicates the size of the data and the second indicates the efficiency of the RNA-Seq experiment. For example, we compared STAR timing and cost against TOTAL READS and PCT_USABLE_BASES.

The same comparison was made for RSEM.
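One way to quantify such an association is a correlation coefficient between a QC metric and the per-cell runtime or cost. The sketch below computes a Pearson correlation in plain Python. It is an illustration of the idea, not the analysis used for this page, and the per-cell values would have to come from the pipeline's own metrics output.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

Calling `pearson(total_reads, star_hours)` on per-cell lists would give a value near 1 if runtime grows linearly with the number of reads, and a value near 0 if the two are unrelated.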