Getting Started¶
Background¶
Lancet2 is a command-line somatic variant caller (SNVs and InDels) for short-read sequencing data, implemented in modern C++. It performs joint multi-sample localized colored de Bruijn graph assembly for more accurate variant calls, especially InDels.
In addition to better variant calling accuracy and improved somatic filtering, Lancet2 has significant runtime performance improvements over Lancet1 (up to ∼10x speedup and 50% less peak memory usage).
Installation¶
Build prerequisites¶
Build commands¶
git clone https://github.com/nygenome/Lancet2.git
cd Lancet2 && mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release .. && make -j$(nproc)
Static binary¶
Note
For maximum runtime performance, it is recommended to build Lancet2 from scratch on the target machine where processing is expected to happen.
If you have a Linux-based operating system and a CPU that supports AVX2 instructions, the simplest way to use Lancet2
is to download the binary from the latest available release. The release binary is static, has no dependencies, and needs only executable permissions before it can be used.
chmod +x Lancet2
./Lancet2 --help
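Before downloading the release binary, it can help to confirm that the CPU advertises AVX2. The following is a minimal sketch, assuming a Linux host that exposes CPU flags in `/proc/cpuinfo`; the helper name `has_cpu_flag` is hypothetical and not part of Lancet2.

```shell
#!/bin/sh
# Hypothetical helper (not part of Lancet2): succeeds if the given CPU flag
# appears as a whole word in a cpuinfo-style file.
# Usage: has_cpu_flag FLAG [CPUINFO_FILE]
has_cpu_flag() {
    flag="$1"
    cpuinfo="${2:-/proc/cpuinfo}"
    # -q: quiet; -w: whole-word match, so "avx" does not match "avx2"
    grep -qw -- "$flag" "$cpuinfo" 2>/dev/null
}

if has_cpu_flag avx2; then
    echo "AVX2 detected: the static release binary should be usable"
else
    echo "AVX2 not detected: consider building from source instead"
fi
```

The same check with `avx512f` reports the AVX-512 foundation flag, which is a necessary (though possibly not sufficient) condition for CPUs targeted by AVX512 builds.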
Docker images¶
Note
A CPU that supports the AVX512 instruction set is required to use the pre-built public Docker images. Users can build custom Docker images for older CPUs by modifying the BUILD_ARCH
argument in the Dockerfile.
Public Docker images hosted on Google Cloud are available for recent tagged releases.
Basic Usage¶
The following command demonstrates basic usage of the Lancet2 variant calling pipeline for a tumor/normal BAM file pair on chr22.
Lancet2 pipeline \
--normal /path/to/normal.bam \
--tumor /path/to/tumor.bam \
--reference /path/to/reference.fasta \
--region "chr22" --num-threads $(nproc) \
--out-vcfgz /path/to/output.vcf.gz
See here for more information on how to score and filter somatic variants using explainable machine learning models.
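Once a run finishes, a quick sanity check is to count the variant records in the output file. The following is a minimal sketch, assuming the output is a standard gzip- or bgzip-compressed VCF whose header lines start with `#`; the helper name `count_vcf_records` is hypothetical and not part of Lancet2.

```shell
#!/bin/sh
# Hypothetical helper (not part of Lancet2): prints the number of
# non-header records in a gzip/bgzip-compressed VCF.
# Usage: count_vcf_records /path/to/output.vcf.gz
count_vcf_records() {
    # gzip -dc streams the decompressed file; bgzip output is gzip-compatible.
    # grep -vc '^#' counts the lines that are not header lines.
    gzip -dc -- "$1" | grep -vc '^#'
}
```

A count of zero is not necessarily an error, since a small region may contain no somatic calls, but it is worth a second look at the inputs and the region string.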
License¶
Lancet2 is distributed under the BSD 3-Clause License.
Citing Lancet2¶
- Lancet2: Improved and accelerated somatic variant calling with joint multi-sample local assembly graphs
- Somatic variant analysis of linked-reads sequencing data with Lancet
- Genome-wide somatic variant calling using localized colored de Bruijn graphs
Funding¶
Lancet2 is supported by Informatics Technology for Cancer Research (ITCR) under NCI U01 award 1U01CA253405-01A1.