Skip to content

Whole Genome Analysis

For whole-genome sequencing studies it is highly recommended to split the analysis by chromosome and then merge the results. Splitting the work by chromosome will reduce overall runtime requirements to analyze whole-genome data.

SLURM

NUM_CORES=64

for chrom in $(head -24 GRCh38.fasta.fai | cut -f1 | tr '\n' ' ')
do
sbatch --job-name="Lancet2_${chrom}" --cpus-per-task="${NUM_CORES}" --wrap \
"Lancet2 pipeline --num-threads ${NUM_CORES} \
    --normal normal.bam --tumor tumor.bam \
    --reference GRCh38.fasta --region ${chrom} \
    --out-vcfgz output.${chrom}.vcf.gz"
done

SGE (Sun Grid Engine)

NUM_CORES=64

for chrom in $(head -24 GRCh38.fasta.fai | cut -f1 | tr '\n' ' ')
do
qsub -N "Lancet2_${chrom}" -cwd -pe smp "${NUM_CORES}" -j y -b y \
"Lancet2 pipeline --num-threads ${NUM_CORES} \
    --normal normal.bam --tumor tumor.bam \
    --reference GRCh38.fasta --region ${chrom} \
    --out-vcfgz output.${chrom}.vcf.gz"
done

Merging results

After all per-chromosome jobs complete, merge the VCF files:

bcftools concat -Oz -o merged.vcf.gz output.chr*.vcf.gz
bcftools index -t merged.vcf.gz