Whole Genome and Whole Exome Sequencing Modules
FASTQC
- omics_pipe.modules.fastqc.fastqc(sample, fastqc_flag)
QC check of raw .fastq files using FASTQC.
- input:
- .fastq file
- output:
- folder and zipped folder containing html, txt and image files
- citation:
- Babraham Bioinformatics
- link:
- http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
- parameters from parameters file:
RAW_DATA_DIR:
QC_PATH:
FASTQC_VERSION:
COMPRESSION:
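Each module reads its settings from the parameters file, so a run can fail late if one of the listed names is missing. The sketch below is not part of omics_pipe; it only illustrates checking a loaded parameter mapping against the names listed above. The helper `missing_parameters` and all example values are hypothetical.

```python
# Parameter names listed for the fastqc module in this section.
FASTQC_PARAMS = ("RAW_DATA_DIR", "QC_PATH", "FASTQC_VERSION", "COMPRESSION")


def missing_parameters(params, required=FASTQC_PARAMS):
    """Return the required names that are absent from params, in listed order."""
    return [name for name in required if name not in params]


# Hypothetical values, for illustration only.
example_params = {
    "RAW_DATA_DIR": "/data/raw",
    "QC_PATH": "/data/qc",
    "FASTQC_VERSION": "0.10.1",
}

print(missing_parameters(example_params))  # ['COMPRESSION']
```

The same helper can be reused for the other modules by passing their parameter names as `required`.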
BWA-MEM
- omics_pipe.modules.bwa.bwa_mem(sample, bwa_mem_flag)
BWA aligner with BWA-MEM algorithm.
- input:
- .fastq
- output:
- .sam
- citation:
- Li H. and Durbin R. (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25, 1754-1760. [PMID: 19451168]
- link:
- http://bio-bwa.sourceforge.net/bwa.shtml
- parameters from parameters file:
BWA_RESULTS:
TEMP_DIR:
SAMTOOLS_VERSION:
BWA_VERSION:
GENOME:
RAW_DATA_DIR:
BWA_OPTIONS:
COMPRESSION:
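To make the roles of these parameters concrete, here is a minimal sketch of how a BWA-MEM command line could be assembled from them. It is not the module's actual implementation. The function name `build_bwa_mem_command`, the file naming scheme, and the reading of GENOME as the path to an indexed reference are all assumptions; only the general `bwa mem [options] <reference> <reads.fq>` form, which writes SAM to standard output, comes from the BWA manual.

```python
import posixpath
import shlex


def build_bwa_mem_command(sample, params):
    """Return (argv, sam_path) for a single-end BWA-MEM run.

    Assumptions for illustration: reads are in <RAW_DATA_DIR>/<sample>.fastq,
    the SAM file goes to <BWA_RESULTS>/<sample>.sam, and GENOME is the path
    to a reference that has already been indexed with `bwa index`.
    """
    fastq = posixpath.join(params["RAW_DATA_DIR"], sample + ".fastq")
    sam = posixpath.join(params["BWA_RESULTS"], sample + ".sam")
    # BWA_OPTIONS is treated as a shell-style string of extra flags.
    options = shlex.split(params.get("BWA_OPTIONS", ""))
    argv = ["bwa", "mem"] + options + [params["GENOME"], fastq]
    return argv, sam


# Hypothetical parameter values.
example_params = {
    "RAW_DATA_DIR": "/data/raw",
    "BWA_RESULTS": "/data/bwa",
    "GENOME": "/ref/genome.fa",
    "BWA_OPTIONS": "-t 4 -M",
}

argv, sam_path = build_bwa_mem_command("sample1", example_params)
print(argv)
print(sam_path)  # /data/bwa/sample1.sam
```

Because `bwa mem` writes SAM to standard output, a caller would typically pass `argv` to `subprocess.run` with `stdout` opened on `sam_path`.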
PICARD Mark Duplicates
- omics_pipe.modules.picard_mark_duplicates.picard_mark_duplicates(sample, picard_mark_duplicates_flag)
Picard tools Mark Duplicates.
- input:
- sorted.bam
- output:
- _sorted.rg.md.bam
- citation:
- http://picard.sourceforge.net/
- link:
- http://picard.sourceforge.net/
- parameters from parameters file:
BWA_RESULTS:
TEMP_DIR:
PICARD_VERSION:
SAMTOOLS_VERSION:
GATK Preprocessing
WES
- omics_pipe.modules.GATK_preprocessing_WES.GATK_preprocessing_WES(sample, GATK_preprocessing_WES_flag)
GATK preprocessing steps for whole exome sequencing.
- input:
- sorted.rg.md.bam
- output:
- .ready.bam
- citation:
- McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA (2010). The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20:1297-303.
- link:
- http://www.broadinstitute.org/gatk/
- parameters from parameters file:
BWA_RESULTS:
TEMP_DIR:
GATK_VERSION:
GENOME:
DBSNP:
MILLS:
G1000:
CAPTURE_KIT_BED:
SAMTOOLS_VERSION:
WGS
- omics_pipe.modules.GATK_preprocessing_WGS.GATK_preprocessing_WGS(sample, GATK_preprocessing_WGS_flag)
GATK preprocessing steps for whole genome sequencing.
- input:
- sorted.rg.md.bam
- output:
- .ready.bam
- citation:
- McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA (2010). The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20:1297-303.
- link:
- http://www.broadinstitute.org/gatk/
- parameters from parameters file:
BWA_RESULTS:
TEMP_DIR:
GATK_VERSION:
GENOME:
DBSNP:
MILLS:
G1000:
SAMTOOLS_VERSION:
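The WES and WGS preprocessing functions list nearly identical parameters. As a quick illustration (the set names below are hypothetical; the members are copied from the two lists above), a set difference shows that the capture kit BED file is the only parameter specific to the exome variant:

```python
# Parameter names as listed for each preprocessing function.
WES_PARAMS = {
    "BWA_RESULTS", "TEMP_DIR", "GATK_VERSION", "GENOME", "DBSNP",
    "MILLS", "G1000", "CAPTURE_KIT_BED", "SAMTOOLS_VERSION",
}
WGS_PARAMS = {
    "BWA_RESULTS", "TEMP_DIR", "GATK_VERSION", "GENOME", "DBSNP",
    "MILLS", "G1000", "SAMTOOLS_VERSION",
}

only_wes = sorted(WES_PARAMS - WGS_PARAMS)
only_wgs = sorted(WGS_PARAMS - WES_PARAMS)

print(only_wes)  # ['CAPTURE_KIT_BED']
print(only_wgs)  # []
```

A parameters file prepared for WES should therefore also satisfy the WGS function's listed parameters, but not the other way around.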
GATK Variant Discovery
- omics_pipe.modules.GATK_variant_discovery.GATK_variant_discovery(sample, GATK_variant_discovery_flag)
GATK variant discovery.
- input:
- sorted.rg.md.bam
- output:
- .ready.bam
- citation:
- McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA (2010). The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20:1297-303.
- link:
- http://www.broadinstitute.org/gatk/
- parameters from parameters file:
BWA_RESULTS:
TEMP_DIR:
GATK_VERSION:
GENOME:
DBSNP:
VARIANT_RESULTS:
GATK Variant Filtering
- omics_pipe.modules.GATK_variant_filtering.GATK_variant_filtering(sample, GATK_variant_filtering_flag)
GATK variant filtering.
- input:
- sorted.rg.md.bam
- output:
- .ready.bam
- citation:
- McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA (2010). The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20:1297-303.
- link:
- http://www.broadinstitute.org/gatk/
- parameters from parameters file:
VARIANT_RESULTS:
TEMP_DIR:
GATK_VERSION:
GENOME:
DBSNP:
MILLS:
OMNI:
HAPMAP:
R_VERSION:
G1000_SNPs:
G1000_Indels:
- omics_pipe.modules.GATK_variant_filtering.GATK_variant_filtering_group(sample, GATK_variant_filtering_group_flag)
GATK variant filtering (group version).
- input:
- sorted.rg.md.bam
- output:
- .ready.bam
- citation:
- McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA (2010). The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20:1297-303.
- link:
- http://www.broadinstitute.org/gatk/
- parameters from parameters file:
VARIANT_RESULTS:
TEMP_DIR:
GATK_VERSION:
GENOME:
DBSNP:
MILLS_G1000:
OMNI:
HAPMAP:
R_VERSION:
G1000:
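The two filtering functions use slightly different names for their known-sites resources, which is easy to overlook when one parameters file is adapted for the other. The sketch below only compares the two lists as documented above; the variable names are hypothetical, and it makes no claim about how the functions use these values internally.

```python
# Parameter names as listed for each filtering function.
SINGLE_PARAMS = {
    "VARIANT_RESULTS", "TEMP_DIR", "GATK_VERSION", "GENOME", "DBSNP",
    "MILLS", "OMNI", "HAPMAP", "R_VERSION", "G1000_SNPs", "G1000_Indels",
}
GROUP_PARAMS = {
    "VARIANT_RESULTS", "TEMP_DIR", "GATK_VERSION", "GENOME", "DBSNP",
    "MILLS_G1000", "OMNI", "HAPMAP", "R_VERSION", "G1000",
}

only_single = sorted(SINGLE_PARAMS - GROUP_PARAMS)
only_group = sorted(GROUP_PARAMS - SINGLE_PARAMS)

print(only_single)  # ['G1000_Indels', 'G1000_SNPs', 'MILLS']
print(only_group)   # ['G1000', 'MILLS_G1000']
```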