This website contains information on obtaining the whole-genome shotgun sequence of the Cannabis sativa cultivar "Chemdawg." The data is provided by Medicinal Genomics with the help of Nimbus Informatics. Academic use is free of charge, but Amazon EC2 costs are the responsibility of the user. If you are a commercial enterprise, please contact Medicinalgenomics@gmail.com for a commercial license.


The sequence data is derived from an Illumina HiSeq run using v2.0 chemistry with 2x100 bp reads. There are 7 lanes in total, which add up to 131 Gb of sequence. Quality statistics for the run can be found here. The genome is estimated to be 400 Mb, which gives an estimated 327X coverage.
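
The coverage figure follows directly from the numbers above (total bases divided by estimated genome size). The short Python sketch below reproduces that arithmetic. The function and variable names are illustrative only, and the per-lane figure assumes the 7 lanes contributed equal amounts of sequence, which the run statistics may not bear out.

```python
def estimated_coverage(total_bases: float, genome_size: float) -> float:
    """Return average fold coverage: sequenced bases / genome size."""
    return total_bases / genome_size


TOTAL_BASES = 131e9   # 131 Gb of sequence across all lanes (from the text)
GENOME_SIZE = 400e6   # estimated 400 Mb genome (from the text)
LANES = 7             # number of lanes (from the text)

coverage = estimated_coverage(TOTAL_BASES, GENOME_SIZE)

# Assumption: every lane yielded the same amount of sequence.
coverage_per_lane = coverage / LANES

print(f"Total coverage:    {coverage:.1f}X")           # 327.5X
print(f"Coverage per lane: {coverage_per_lane:.1f}X")  # 46.8X
```

Under the equal-yield assumption, even a two-lane subset would represent roughly 90X coverage of a 400 Mb genome.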

There are several ways in which we anticipate people will want to use this data:

  • Reassembly of the data with different assemblers. Only two have been tried so far, SOAPdenovo and CLC bio, and neither has assembled more than 2 lanes of data. It's possible that a far better assembly could be made using Contrail or the Celera Assembler, both available on the web.
  • SNP and indel calling. We have performed preliminary calls and are mapping these to BLASTX hits to prioritize functional variants. The C. sativa strain is more polymorphic than the C. indica strain currently being assembled.
  • Other cloud-based annotation tools.

If you make improvements to the assembly or variant calls, we ask that you post them to Amazon in public EBS volumes and send a note to Medicinalgenomics@gmail.com so we can link to your improvements from our website.

Download Data
We have made the files available via S3 for direct download:
Use Data on Amazon
You can create your own EBS volume from our Amazon-hosted public snapshot "snap-f8af5298". For more information on using an EBS volume on EC2, please see this document.
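
As a rough illustration of the step above, the sketch below creates a volume from the public snapshot using boto3, the AWS SDK for Python. This is not the method described on this page: boto3 is a third-party library that postdates this announcement, the region and availability zone shown are assumptions (a volume can only be created in the region where the snapshot lives, which is not stated here), and the helper names are hypothetical. The snapshot itself may no longer be available.

```python
SNAPSHOT_ID = "snap-f8af5298"  # public snapshot ID given in the text


def build_volume_request(snapshot_id: str, availability_zone: str) -> dict:
    """Build the keyword arguments for the EC2 create_volume call."""
    return {
        "SnapshotId": snapshot_id,
        "AvailabilityZone": availability_zone,
    }


def create_volume_from_snapshot(snapshot_id: str,
                                availability_zone: str,
                                region: str) -> str:
    """Create an EBS volume from a snapshot and return the new volume ID.

    Requires boto3 and valid AWS credentials; EC2/EBS charges apply.
    """
    import boto3  # imported here so the helper above works without it

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.create_volume(
        **build_volume_request(snapshot_id, availability_zone)
    )
    return response["VolumeId"]


# Example call (assumed region and zone; adjust to where the snapshot lives):
# volume_id = create_volume_from_snapshot(SNAPSHOT_ID, "us-east-1a", "us-east-1")
```

The volume must be created in the same availability zone as the EC2 instance you intend to attach it to.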
Latest News

August 18, 2011

Today we posted the FASTQ files for download and made an EBS snapshot of the data available on Amazon.