Getting Started

Background

Lancet2 is a command-line somatic variant caller (SNVs and InDels) for short-read sequencing data, implemented in modern C++. It performs joint multi-sample localized colored de Bruijn graph assembly for more accurate variant calls, especially InDels.

In addition to better variant calling accuracy and improved somatic filtering, Lancet2 runs substantially faster than Lancet1 (up to ~10x speedup and 50% less peak memory usage).

Installation

Lancet2 packages with full Cloud I/O support (s3://, gs://, http(s)://, ftp(s)://) are published to prefix.dev/channels/lancet2. Install using your preferred package manager:

| Package Manager | Install Command |
| --- | --- |
| Pixi (recommended) | pixi global install --channel https://prefix.dev/channels/lancet2 lancet2 |
| Conda | conda install -c https://prefix.dev/channels/lancet2 lancet2 |
| Mamba | mamba install -c https://prefix.dev/channels/lancet2 lancet2 |

Development builds are published automatically on every commit. To install a specific stable release, pin the version:

pixi global install --channel https://prefix.dev/channels/lancet2 'lancet2==v2.9.0'

Docker images

Note

The pre-built public Docker images require a CPU that supports the AVX2 instruction set. For older CPUs, build a custom Docker image by modifying the BUILD_ARCH argument in the Dockerfile.

Public Docker images hosted on Google Cloud are available for recent tagged releases.
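
Before pulling an image, it can help to confirm that the host meets the AVX2 requirement noted above. The following is a minimal sketch for Linux hosts, assuming the CPU flags are exposed through /proc/cpuinfo. The has_avx2 helper is a hypothetical name, not part of Lancet2, and its file argument exists only so the check can be pointed at a saved cpuinfo dump.

```shell
# Return 0 if the given cpuinfo file lists the avx2 flag (Linux only).
# The file argument defaults to /proc/cpuinfo.
has_avx2() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # -w matches avx2 as a whole word, so "avx" or "avx512f" alone do not count.
    grep -qw avx2 "$cpuinfo" 2>/dev/null
}

if has_avx2; then
    echo "AVX2 detected: the pre-built images should run on this CPU"
else
    echo "AVX2 not detected: consider a custom image or a source build"
fi
```

On hosts without /proc/cpuinfo (for example macOS), the check reports AVX2 as not detected rather than failing, so treat a negative result there as inconclusive.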

Build from source

Note

Building from source on the target machine is recommended for maximum runtime performance.

Cloud support in static builds

Static builds (the default) do not support Cloud Streaming (gs://, s3://, http(s)://, ftp(s)://) because cloud I/O requires dynamic linking of the host OS's network stack (libcurl / openssl). Use the pre-built packages or Docker images instead, or build with -DLANCET_ENABLE_CLOUD_IO=ON (see below).
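
To decide whether a static build is sufficient for a given set of inputs, one option is to check each input path against the URI schemes listed above. This is a sketch only: is_cloud_uri is a hypothetical helper, and gs://bucket/tumor.bam is a made-up example path.

```shell
# Return 0 if the path starts with one of the cloud URI schemes
# named in the note above, 1 otherwise.
is_cloud_uri() {
    case "$1" in
        s3://*|gs://*|http://*|https://*|ftp://*|ftps://*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example paths are placeholders, not real files.
for input in /path/to/normal.bam gs://bucket/tumor.bam; do
    if is_cloud_uri "$input"; then
        echo "$input: needs a cloud-enabled build or package"
    else
        echo "$input: local path, a static build is sufficient"
    fi
done
```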

Build prerequisites

Build commands

git clone https://github.com/nygenome/Lancet2.git
cd Lancet2 && mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release .. && make -j$(nproc)
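
The build commands above rely on nproc, which comes from GNU coreutils and is not present on a stock macOS install. A portable fallback might look like the sketch below. The cpu_count function is a hypothetical helper, and the final line only prints the make invocation rather than running it.

```shell
# Print the number of available CPU cores, trying several sources in turn.
cpu_count() {
    if command -v nproc >/dev/null 2>&1; then
        nproc                                   # GNU coreutils (most Linux systems)
    elif n="$(sysctl -n hw.ncpu 2>/dev/null)"; then
        echo "$n"                               # macOS and BSD
    else
        # POSIX-style fallback; default to 1 if nothing else works.
        getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
    fi
}

echo "make -j$(cpu_count)"
```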

macOS Homebrew Configuration

If CMake fails to find the Homebrew-installed BZip2 on macOS, explicitly set CMAKE_PREFIX_PATH to the Homebrew bzip2 prefix when running CMake. Homebrew often skips linking bzip2 into the shared prefix to prevent conflicts with system libraries.

For Apple Silicon (M1/M2/M3):

cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH=/opt/homebrew/opt/bzip2 ..

For Intel Mac:

cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH=/usr/local/opt/bzip2 ..
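
The two cases above can be folded into one script by selecting the prefix from the machine architecture. This is a sketch under the assumption that Homebrew is installed in its default location. The bzip2_prefix_for function is a hypothetical helper, and the cmake command is printed rather than executed.

```shell
# Map a machine architecture (as reported by `uname -m`) to the
# default Homebrew bzip2 location used in the commands above.
bzip2_prefix_for() {
    case "$1" in
        arm64)  echo "/opt/homebrew/opt/bzip2" ;;   # Apple Silicon
        x86_64) echo "/usr/local/opt/bzip2" ;;      # Intel Mac
        *)      echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# Print the configure command for this machine (dry run).
if prefix="$(bzip2_prefix_for "$(uname -m)")"; then
    echo "cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH=$prefix .."
fi
```

If Homebrew lives in a non-default location, `brew --prefix bzip2` reports the actual path and can be used in place of the hard-coded values.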

Cloud I/O Support (GCS, S3, HTTP/S, FTP/S)

Cloud streaming can be enabled by setting -DLANCET_ENABLE_CLOUD_IO=ON during CMake configuration. This is opt-in and requires dynamic linkage (-DLANCET_BUILD_STATIC=OFF) with libcurl and openssl installed. For usage details, see the Cloud Streaming Guide.

cmake -DCMAKE_BUILD_TYPE=Release -DLANCET_BUILD_STATIC=OFF -DLANCET_ENABLE_CLOUD_IO=ON ..
make -j$(nproc)

Basic Usage

Tumor-Normal Somatic Calling

This is the primary supported workflow. Variants are classified as CASE (tumor-only), CTRL (normal-only), or SHARED (both) in the VCF INFO field.

Lancet2 pipeline \
    --normal /path/to/normal.bam \
    --tumor /path/to/tumor.bam \
    --reference /path/to/reference.fasta \
    --region "chr22" --num-threads $(nproc) \
    --out-vcfgz /path/to/output.vcf.gz
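
To process several regions, one approach is to issue one invocation per region, each with its own output file. The sketch below only prints the commands (a dry run) and uses just the flags shown above. The print_region_cmds function, the THREADS variable, and the per-region output naming are assumptions made for illustration, not Lancet2 conventions.

```shell
# Print one Lancet2 command per region; nothing is executed.
# Arguments: normal BAM, tumor BAM, reference FASTA, output directory, then regions.
# THREADS is an optional environment variable (defaults to 1).
print_region_cmds() {
    normal="$1"; tumor="$2"; ref="$3"; outdir="$4"
    shift 4
    for region in "$@"; do
        printf 'Lancet2 pipeline --normal %s --tumor %s --reference %s --region %s --num-threads %s --out-vcfgz %s/%s.vcf.gz\n' \
            "$normal" "$tumor" "$ref" "$region" "${THREADS:-1}" "$outdir" "$region"
    done
}

# Placeholder paths, as in the example above.
print_region_cmds /path/to/normal.bam /path/to/tumor.bam \
    /path/to/reference.fasta /path/to/out chr21 chr22
```

The printed lines are not shell-quoted, so paths containing spaces would need quoting before the commands are run.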

Why CASE/CTRL instead of TUMOR/NORMAL?

Lancet2 uses case/control terminology to generalize beyond tumor-normal analysis (e.g., treated vs. untreated, responder vs. non-responder). The --normal and --tumor CLI flags are permanent and first-class — tumor-normal somatic calling is the primary supported workflow. See the CLI Reference for details.

Single-Sample Mode

Omit --tumor to run on a single sample. Lancet2 generates raw variant candidates across the full allele spectrum (germline, mosaic, artifact) without somatic state classification — no SHARED/CTRL/CASE tags appear in the VCF.

Lancet2 pipeline \
    --normal /path/to/sample.bam \
    --reference /path/to/reference.fasta \
    --region "chr22" --num-threads $(nproc) \
    --out-vcfgz /path/to/output.vcf.gz

Multi-Sample Mode

Additional samples can be added using -s,--sample alongside the standard flags. See Multi-Sample & Germline Mode for details.

Experimental — no pre-trained ML models

Single-sample and multi-sample modes produce raw variant candidates. No pre-trained ML models are currently provided for filtering in these modes — variant calls require custom downstream filtering. The only pre-trained model available is for the standard tumor-normal somatic workflow (v2.8.7 compatible). See Scoring Somatic Variants for details.

License

Lancet2 is distributed under the BSD 3-Clause License.

Citing Lancet2

Funding

Informatics Technology for Cancer Research (ITCR) under the NCI U01 award 1U01CA253405-01A1.