ICGC ARGO RNA Seq Analysis
- GRCh38 Genome Build Version: GRCh38_Verily_v1
file name | size | md5sum |
---|---|---|
GRCh38_Verily_v1.genome.fa | 3150152408 | 16626761857940321a7a1142e03f8217 |
GRCh38_Verily_v1.genome.fa.fai | 123145 | b373ad1f64003c910dce216f93718aab |
GRCh38_Verily_v1.genome.fa.gz | 887918831 | 1fb31dcb45ca7c52d0e27c523504bc9a |
GRCh38_Verily_v1.genome.fa.gz.gzi | 772104 | 55b7a860d1cef3793fcda54af56664e3 |
GRCh38_Verily_v1.genome.fa.gz.fai | 123145 | b373ad1f64003c910dce216f93718aab |
README.txt | 1492 | db3b3e4233b6ddb92ff3e3dc152ccda8 |
The files above must be staged under a file system path that the workflow jobs can access. They can be downloaded with `wget`, for example:
wget https://object.cancercollaboratory.org:9080/swift/v1/genomics-public-data/rna-seq-references/GRCh38_Verily_v1.genome/README.txt
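To stage every file in the table rather than just the README, the same URL pattern can be applied to each file name. The following is a minimal sketch, assuming the other files sit in the same remote directory as `README.txt` (the README shows only that one URL); `genome_urls` and `stage_genome` are hypothetical helper names, not part of the workflow.

```shell
# Base URL and directory are taken from the example wget command above.
BASE_URL="https://object.cancercollaboratory.org:9080/swift/v1/genomics-public-data/rna-seq-references"
GENOME_DIR="GRCh38_Verily_v1.genome"

# Print one download URL per file listed in the table (no network access).
# Assumption: every file lives next to README.txt on the server.
genome_urls() {
  local f
  for f in \
    GRCh38_Verily_v1.genome.fa \
    GRCh38_Verily_v1.genome.fa.fai \
    GRCh38_Verily_v1.genome.fa.gz \
    GRCh38_Verily_v1.genome.fa.gz.gzi \
    GRCh38_Verily_v1.genome.fa.gz.fai \
    README.txt
  do
    printf '%s/%s/%s\n' "$BASE_URL" "$GENOME_DIR" "$f"
  done
}

# Download all files into the staging directory given as $1.
# wget -c resumes partial downloads, -P sets the target directory,
# and -i - reads the URL list from standard input.
stage_genome() {
  local dest="$1"
  mkdir -p "$dest"
  genome_urls | wget -c -P "$dest" -i -
}
```

Calling `stage_genome /path/to/references/genome` (the path is a placeholder) would then fetch all six files; running `genome_urls` alone is a safe way to inspect the URLs first.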
- Because RNA-Seq aligners are not ALT-aware, ICGC ARGO uses a slightly different version of the reference genome for RNA-Seq analysis. The genome FASTA file is composed of the following sequences:
- GRCh38 primary assembly
- Decoy sequences
- Epstein-Barr virus (EBV) sequence
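
The composition listed above can be checked on a staged copy by tallying sequence names in the `.fai` index. This is a minimal sketch, assuming decoy contigs carry a `_decoy` suffix and the EBV contig is named `chrEBV`; these are naming conventions common to GRCh38 analysis sets and are not confirmed by this README. `classify_contigs` is a hypothetical helper name.

```shell
# Tally sequences in a .fai index by category.
# A .fai file is tab-delimited; column 1 is the sequence name.
# Assumptions: "chrEBV" is the EBV contig, names ending in "_decoy" are
# decoys, and everything else belongs to the primary assembly
# (including unplaced and unlocalized scaffolds).
classify_contigs() {
  awk -F'\t' '
    $1 == "chrEBV" { n["ebv"]++;   next }
    $1 ~ /_decoy$/ { n["decoy"]++; next }
                   { n["primary"]++ }
    END {
      printf "primary=%d decoy=%d ebv=%d\n", n["primary"], n["decoy"], n["ebv"]
    }
  ' "$1"
}
```

For example, `classify_contigs GRCh38_Verily_v1.genome.fa.fai` prints a one-line summary; if the naming assumptions do not hold for this build, the decoy and EBV counts will read as zero.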
- GENCODE v40 contains the comprehensive gene annotation on the reference chromosomes, scaffolds, assembly patches and alternate loci (haplotypes)
file name | size | md5sum |
---|---|---|
gencode.v40.chr_patch_hapl_scaff.annotation.gtf | 1616162883 | beeee37565d2a76f477fb474fcfa922e |
The file above must be staged under a file system path that the workflow jobs can access. It can be downloaded with `wget`, for example:
wget https://object.cancercollaboratory.org:9080/swift/v1/genomics-public-data/rna-seq-references/GRCh38_Verily_v1.annotation/gencode.v40.chr_patch_hapl_scaff.annotation.gtf
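After a download of this size it is worth confirming that the file matches the table. The following is a minimal sketch, assuming the `size` column is in bytes and that GNU `md5sum` is available; `verify_file` is a hypothetical helper name.

```shell
# Compare a staged file against the size and md5sum from the table.
# Usage: verify_file <path> <expected_size_in_bytes> <expected_md5>
# Assumption: the "size" column of the table is a byte count.
verify_file() {
  local path="$1" want_size="$2" want_md5="$3" got_size got_md5
  if [ ! -f "$path" ]; then
    echo "MISSING: $path" >&2
    return 1
  fi
  # The size check is cheap, so run it before hashing a large file.
  got_size=$(wc -c < "$path" | tr -d '[:space:]')
  if [ "$got_size" != "$want_size" ]; then
    echo "SIZE MISMATCH: $path (expected $want_size, got $got_size)" >&2
    return 1
  fi
  got_md5=$(md5sum "$path" | cut -d' ' -f1)
  if [ "$got_md5" != "$want_md5" ]; then
    echo "MD5 MISMATCH: $path (expected $want_md5, got $got_md5)" >&2
    return 1
  fi
  echo "OK: $path"
}
```

With the values from the table, the call would be `verify_file gencode.v40.chr_patch_hapl_scaff.annotation.gtf 1616162883 beeee37565d2a76f477fb474fcfa922e`.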
- STAR index files (remote directory `GRCh38_Verily_v1.STARindex.sjdbOverhang_75`)
file name | size | md5sum |
---|---|---|
Genome | 3823751360 | c40d86d0b50c34dd46a9347472462937 |
SA | 24750976325 | 318f38f408c9b48e8c2e4c4911bcf470 |
SAindex | 1565873619 | aa883f548dbc9399e7a1444891fdd741 |
STARindex.log | 763 | acb99118146c8723378709968565c3c4 |
chrLength.txt | 13158 | 7f5964f5965ea24ade6257990c7461cf |
chrName.txt | 66140 | fbb1fe18634dc8fc7192930225f0e6a1 |
chrNameLength.txt | 79298 | 98e83030349933a0b0ca21888a71edb0 |
chrStart.txt | 28378 | d87a026a4ec84d67c3e53256a985f904 |
exonGeTrInfo.tab | 56068230 | 696ce75a16af0d50a47f79c4b95ff4b1 |
exonInfo.tab | 22952209 | 8932772eace6ab5408133590b4f34b56 |
geneInfo.tab | 2591817 | 303aa1d1f63fae8bd954dba3c5f5dcb9 |
genomeParameters.txt | 1008 | 7ad39ed85712bdb3f7e238e364f39de4 |
sjdbInfo.txt | 11620218 | 29b9af281debb5900d8db82f832a0642 |
sjdbList.fromGTF.out.tab | 12610890 | 1d4d6966ec9f67d9125067b7636c3038 |
sjdbList.out.tab | 10259582 | 207834d2baf062f5d0a303c18fdb8798 |
transcriptInfo.tab | 16599748 | 445aa2a51ddfc112f0f6f6b8463f9b8d |
The files above must be staged under a file system path that the workflow jobs can access. They can be downloaded with `wget`, for example:
wget https://object.cancercollaboratory.org:9080/swift/v1/genomics-public-data/rna-seq-references/GRCh38_Verily_v1.STARindex.sjdbOverhang_75/STARindex.log
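Because the index consists of many files that must sit together in one directory, a quick completeness check before launching jobs can catch a partial download. This is a minimal sketch; `check_star_index` is a hypothetical helper name, and the file list is copied from the table above.

```shell
# Check that every STAR index file from the table is present and non-empty
# in the staging directory given as $1. Returns non-zero if any is missing.
check_star_index() {
  local dir="$1" f missing=0
  for f in \
    Genome SA SAindex STARindex.log \
    chrLength.txt chrName.txt chrNameLength.txt chrStart.txt \
    exonGeTrInfo.tab exonInfo.tab geneInfo.tab genomeParameters.txt \
    sjdbInfo.txt sjdbList.fromGTF.out.tab sjdbList.out.tab transcriptInfo.tab
  do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}
```

This checks presence only; it does not replace comparing sizes and checksums against the table.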
- HISAT2 index files (remote directory `GRCh38_Verily_v1.HISAT2index`)
file name | size | md5sum |
---|---|---|
GRCh38_Verily_v1.1.ht2 | 1818403521 | 60998540231f7e21ad8a53d13898de08 |
GRCh38_Verily_v1.2.ht2 | 736877080 | a6b58d2aa00d32007c1227e9835e2038 |
GRCh38_Verily_v1.3.ht2 | 31508 | 682418739b6d9c3dd92dc39df73fdfeb |
GRCh38_Verily_v1.4.ht2 | 735167267 | aac99bf451926a49e0cb5a921588fdf2 |
GRCh38_Verily_v1.5.ht2 | 1772593003 | 834ad923bead0a77562f80ea55ed3c93 |
GRCh38_Verily_v1.6.ht2 | 749013982 | 89110c7f502a5ffa5fd9895cba2f87da |
GRCh38_Verily_v1.7.ht2 | 14465092 | bef9ed20ad08932a0d07b5da317be62b |
GRCh38_Verily_v1.8.ht2 | 2823782 | 4bfcde812f6b0ce124439d6da85ccdf6 |
GRCh38_Verily_v1.log | 10620 | 4e833f06e59568c17e409b1a69cf7b11 |
The files above must be staged under a file system path that the workflow jobs can access. They can be downloaded with `wget`, for example:
wget https://object.cancercollaboratory.org:9080/swift/v1/genomics-public-data/rna-seq-references/GRCh38_Verily_v1.HISAT2index/GRCh38_Verily_v1.log
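When a directory holds several files, the table rows can be turned into a manifest that `md5sum -c` verifies in one pass. The following is a minimal sketch, assuming the table has been saved to a text file in the `file name | size | md5sum |` layout shown above and that GNU `md5sum` is available; `table_to_md5_manifest` is a hypothetical helper name.

```shell
# Convert "file name | size | md5sum |" rows on stdin into the
# "<md5>  <file name>" format that `md5sum -c` expects.
# Header and separator rows are skipped because their third column
# is not a 32-character hexadecimal digest.
table_to_md5_manifest() {
  awk -F'|' '
    NF < 3 { next }
    {
      name = $1; sum = $3
      gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim surrounding whitespace
      gsub(/^[ \t]+|[ \t]+$/, "", sum)
      if (length(sum) == 32 && sum ~ /^[0-9a-f]+$/) print sum "  " name
    }
  '
}
```

A possible use, with placeholder paths: `table_to_md5_manifest < hisat2_table.txt > hisat2.md5`, then run `md5sum -c hisat2.md5` from inside the directory where the `.ht2` files are staged.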
- `--ref_flat`: a tab-delimited file containing information about the location of RNA transcripts, exon start and stop sites, etc.
- `--ribosomal_interval_list`: the locations of rRNA sequences in the genome, in interval_list format. If not specified, no bases will be identified as being ribosomal.
file name | size | md5sum |
---|---|---|
GRCh38_Verily_v1.rRNA.interval_list | 134077 | 6e00a55590ec6cbddafe9bd59f7f444b |
GRCh38_Verily_v1.refFlat.txt.gz | 8043021 | 21ebee2684e7be6df13500d880b2b6ad |
The files above must be staged under a file system path that the workflow jobs can access. They can be downloaded with `wget`, for example:
wget https://object.cancercollaboratory.org:9080/swift/v1/genomics-public-data/rna-seq-references/GRCh38_Verily_v1.Picard_CollectRnaSeqMetrics/GRCh38_Verily_v1.rRNA.interval_list
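To get a feel for what the rRNA interval list contains, it can be summarized with `awk`. This is a minimal sketch, assuming the standard Picard interval_list layout: header lines starting with `@`, followed by tab-delimited records of sequence name, 1-based inclusive start, end, strand, and interval name. `summarize_interval_list` is a hypothetical helper name.

```shell
# Count the intervals in an interval_list file and sum their lengths.
# Header lines (SAM-style, starting with "@") are skipped.
# Coordinates are 1-based and inclusive, so length = end - start + 1.
# Overlapping intervals are not merged, so "bases" may double-count.
summarize_interval_list() {
  awk -F'\t' '
    /^@/    { next }
    NF >= 3 { n++; bases += $3 - $2 + 1 }
    END     { printf "intervals=%d bases=%d\n", n, bases }
  ' "$1"
}
```

For example, `summarize_interval_list GRCh38_Verily_v1.rRNA.interval_list` prints the number of rRNA intervals and the total bases they span.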