Reference¶
pipeline¶
The pipeline subcommand runs the entire Lancet variant calling pipeline on one or more regions of interest. The full help text for the subcommand can be generated using the following command line:
Lancet2 pipeline --help
Required¶
-r,--reference¶
[PATH]
Path to the reference FASTA file. Supports local paths and cloud URIs (s3://, gs://, http(s)://, ftp(s)://) when built with cloud I/O support. See Native Cloud Streaming for setup instructions.
-o,--out-vcfgz¶
[PATH]
Output path to the compressed VCF file (.vcf.gz). Supports cloud URIs for direct streaming uploads. When writing to a cloud bucket, Lancet2 performs an upfront authentication check before processing any windows. See VCF Output Reference for the full output format specification.
Datasets¶
Terminology: Case/Control vs. Tumor/Normal
Lancet2 uses case/control terminology to generalize beyond tumor-normal analysis (e.g., treated vs. untreated, responder vs. non-responder). The --normal and --tumor CLI flags remain first-class and permanent, because tumor-normal somatic calling is the primary supported workflow. The flags map directly: --normal → control samples, --tumor → case samples.
-n,--normal¶
[PATH...]
Path to one or more normal BAM/CRAM files. Required. Maps to control samples internally. Multiple paths enable multi-sample mode, where all control samples share the CTRL graph color. Sample names are read from the SM tag of the BAM read groups. See Multi-Sample & Germline Mode for caveats.
-t,--tumor¶
[PATH...]
Path to one or more tumor BAM/CRAM files. Optional: when omitted, Lancet2 runs in germline-only mode (no SHARED/CTRL/CASE INFO tags in the VCF). Maps to case samples internally. Multiple paths enable multi-sample case-control mode. See Multi-Sample & Germline Mode.
-s,--sample¶
[PATH:ROLE...]
Unified sample input for N-sample mode. Each argument is a <path>:<role> pair where role is control or case. If the role suffix is omitted, the sample defaults to control. The role is identified by checking the suffix after the last colon, so cloud URIs like s3://bucket/file.bam:case work correctly. This flag can be combined with --normal/--tumor to add further samples; --normal remains required even when --sample is used. Samples from all flags are merged into a single sorted sample list.
Examples:
# Add a third sample alongside standard tumor-normal
Lancet2 pipeline -n normal.bam -t tumor.bam -s extra.bam:case ...
# Three-way comparison with shared control
Lancet2 pipeline -n control.bam -s treated_a.bam:case -s treated_b.bam:case ...
# Omitted role defaults to control
Lancet2 pipeline -n normal.bam -s population.bam ...
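The last-colon rule described for --sample can be illustrated with a small shell sketch. This is not Lancet2 code: parse_sample_arg is a hypothetical helper, and it assumes that a suffix counts as a role only when it is exactly case or control, with everything else treated as part of the path.

```shell
# Hypothetical helper (not part of Lancet2): split a --sample argument
# into "<path><TAB><role>".
# Assumption: only a trailing ":case" or ":control" is read as a role;
# any other text after the last colon stays part of the path.
parse_sample_arg() {
  local arg="$1"
  case "$arg" in
    *:case|*:control)
      # ${arg%:*} drops the shortest ":suffix"; ${arg##*:} keeps only the suffix
      printf '%s\t%s\n' "${arg%:*}" "${arg##*:}"
      ;;
    *)
      # No recognised role suffix: default to control
      printf '%s\tcontrol\n' "$arg"
      ;;
  esac
}
```

Under these assumptions, parse_sample_arg s3://bucket/file.bam:case yields the path s3://bucket/file.bam with role case, while a bare s3://bucket/file.bam keeps its scheme colon and falls back to control.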
Regions¶
-R,--region¶
[REF[:START[-END]]...]
One or more genomic regions (1-based; start and end are both inclusive). When neither --region nor --bed-file is specified, Lancet2 processes all non-decoy, non-mitochondrial reference contigs.
-b,--bed-file¶
[PATH]
Path to BED file with regions to process. Supports cloud URIs (s3://, gs://, http(s)://) via htslib's remote file streaming. The BED must have exactly 3 tab-separated columns (chrom, start, end). Comment lines starting with # are ignored.
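Because the BED file must have exactly three tab-separated columns, a pre-flight check before a long run can be useful. The sketch below is a hypothetical helper built on awk, not a Lancet2 feature. It also flags blank lines, which is an assumption, since the text above does not say how they are handled.

```shell
# Hypothetical pre-flight check (not part of Lancet2).
# Reports every non-comment line that does not have exactly 3
# tab-separated columns and returns non-zero if any such line exists.
# Assumption: blank lines are reported as errors as well.
check_bed() {
  awk -F'\t' '
    /^#/    { next }   # comment lines are ignored
    NF != 3 { printf "line %d: expected 3 columns, found %d\n", NR, NF; bad = 1 }
    END     { exit bad + 0 }
  ' "$1"
}
```

A typical use would be check_bed regions.bed && Lancet2 pipeline --bed-file regions.bed ..., where regions.bed is a placeholder for a local file.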
-P,--padding¶
[0-1000]. Default value → 500
Extends each input region by N bases on both sides before windowing. Captures indel breakpoints that straddle region boundaries — a 50 bp deletion centered at a BED endpoint would be missed without sufficient padding. See Windowing & Overlap for how padding interacts with window size and overlap.
-p,--pct-overlap¶
[10-90]. Default value → 20
Percent overlap between consecutive windows. At defaults (1000 bp windows, 20% overlap), the step size is 800 bp, creating 200 bp overlaps. Higher overlap → fewer edge-effect blind spots but more total windows to process. See Windowing & Overlap for the step size formula and trade-offs.
-w,--window-size¶
[1000-2500]. Default value → 1000
Width of each micro-assembly window in base pairs. Larger windows provide more flanking context for large indels but consume more memory per thread. See Windowing & Overlap for detailed trade-offs.
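The step-size arithmetic quoted for --pct-overlap (1000 bp windows at 20% overlap give an 800 bp step and 200 bp of overlap) can be reproduced with shell integer arithmetic. The function names are hypothetical, and the rounding is an assumption (integer division). The authoritative formula is the one in Windowing & Overlap.

```shell
# Hypothetical helpers (not part of Lancet2).
# Assumption: step = window * (100 - pct) / 100, using integer division.
window_step() {
  local window="$1" pct="$2"
  echo $(( window * (100 - pct) / 100 ))
}

# Number of bases shared by two consecutive windows.
window_overlap() {
  local window="$1" pct="$2"
  echo $(( window - window * (100 - pct) / 100 ))
}

window_step 1000 20      # prints 800
window_overlap 1000 20   # prints 200
```

At the other extreme of the documented ranges, a 2500 bp window with 90% overlap gives a 250 bp step under the same assumption, which is why higher overlap means many more windows to process.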
Parameters¶
-T,--num-threads¶
Number of async worker threads for parallel window processing. Default value → 2. Each thread owns an independent VariantBuilder instance with no shared mutable state. See Performance & Parallelism for the threading architecture.
-k,--min-kmer¶
Minimum k-mer length to try for micro-assembly graph nodes. Default value → 13. Allowed range: [13–253]. The graph construction starts at this k-mer size and increments by --kmer-step on retry. Smaller values increase sensitivity for short variants but produce more complex (slower) graphs. See K-mer Retry Cascade for the retry logic.
-K,--max-kmer¶
Maximum k-mer length to try for micro-assembly graph nodes. Default value → 127. Allowed range: [15–255]. If no cycle-free graph is produced by this k-mer size, the window yields no assembled haplotypes. Higher values can resolve longer repeat motifs at the cost of requiring longer exact matches in reads. See K-mer Retry Cascade.
--kmer-step¶
K-mer step size for the retry cascade. Must be one of {2, 4, 6, 8, 10}. Default value → 6. Smaller steps try more intermediate k values before exhausting the search space — more chances to find a clean graph, but more rebuild iterations. See K-mer Retry Cascade.
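To see how --min-kmer, --max-kmer, and --kmer-step interact, the candidate k values can be listed with a short loop. This is only a sketch of the search space: kmer_schedule is a hypothetical name, the real cascade stops as soon as an attempt succeeds, and how Lancet2 treats a --max-kmer that does not fall on the step grid is not covered here.

```shell
# Hypothetical helper (not part of Lancet2): print the candidate k values,
# one per line, starting at min and adding step while k <= max.
kmer_schedule() {
  local k="$1" max="$2" step="$3"
  while [ "$k" -le "$max" ]; do
    echo "$k"
    k=$(( k + step ))
  done
}

# Defaults: min-kmer 13, max-kmer 127, kmer-step 6
kmer_schedule 13 127 6 | wc -l   # 20 candidate values: 13, 19, ..., 127
```

With a step of 2 over the same range the list grows to 58 values, which is the "more chances, more rebuild iterations" trade-off described above.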
--min-anchor-cov¶
Minimum coverage for source/sink anchor nodes in the De Bruijn graph. Default value → 5. Anchors are reference k-mers at the start and end of the assembled component — they must be well-supported to define reliable walk boundaries. Lower values may enable assembly in low-coverage regions but risk anchoring on noise.
--min-node-cov¶
Minimum coverage for non-anchor nodes in the De Bruijn graph. Default value → 2. Nodes below this threshold are pruned during the Graph Pruning Pipeline. Higher values aggressively filter noise (faster runtime) but reduce sensitivity for subclonal variants with low allele frequency.
--max-sample-cov¶
Maximum per-sample coverage before downsampling. Default value → 1000. Windows exceeding this threshold per sample are downsampled using a deterministic paired strategy (fixed seed for reproducibility). Both mates of a pair are symmetrically accepted or rejected. See Read Filtering & Downsampling for the full downsampling algorithm.
Flags¶
--verbose¶
Turn on verbose logging. Emits per-window status messages (skipped/assembled/genotyped) and graph complexity metrics to stderr.
--extract-pairs¶
Extract out-of-region mate reads for discordant and supplementary-aligned pairs. When enabled, reads that are not in a proper pair or have a supplementary alignment (SA tag) trigger targeted htslib queries to retrieve their out-of-region mates. This captures split reads spanning structural variant breakpoints at window edges, at the cost of additional I/O per discordant pair. See Read Filtering & Downsampling for details.
--no-active-region¶
Force assembly of all windows regardless of mutation evidence. By default, Lancet2 skips windows where no read shows variation (the Active Region Detection heuristic). This flag disables that fast-skip, so every window is assembled. Useful for completeness auditing, but runtime on WGS becomes 5–10× slower. See Active Region Detection for the heuristic algorithm and the MD tag requirement.
--no-contig-check¶
Skip contig name validation between the reference FASTA and BAM/CRAM headers. Use when contig naming conventions differ across files (e.g., chr1 vs 1). Without this flag, mismatched contig names cause Lancet2 to exit with an error.
Optional¶
--out-graphs-tgz¶
Output path for a gzipped TAR archive containing per-window assembly graphs in DOT and GFA format. Path must end in .tar.gz. Lancet writes a single archive containing two top-level subtrees: dbg_graph/<chrom>_<start>_<end>/...dot (Graphviz files, one per connected component per window) and poa_graph/<chrom>_<start>_<end>/...{gfa,fasta} (SPOA POA graphs + multiple-sequence alignments). Snapshots are written only for the k-attempt that succeeded — abandoned attempts (cycle / complexity retry) leave no artifacts in the archive. Extract with tar -xzf <archive>.tar.gz -C <where>/ to recover the per-window directory tree, then run dot -Tpdf <file>.dot on individual files. See Custom Visualization for rendering instructions and interpretation.
--graph-snapshots¶
[final|verbose]. Default value → final
Controls DOT snapshot verbosity when --out-graphs-tgz is set. final (default) emits one DOT per component per window: filename substring is enumerated_walks if walks were enumerated, fully_pruned otherwise. verbose additionally emits intermediate-stage snapshots after each pruning step once reference anchors are identified (compression1, low_cov_removal2, compression2, short_tip_removal). See Custom Visualization for the orthogonal styling axes (role / anchor / probe / walk-color overlays) and the multi-walk colorList composition.
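The extract-then-render workflow for the graphs archive can be wrapped in a small helper. The sketch below only extracts the archive and lists the DOT files it finds. list_dot_files and the file names in the usage comment are hypothetical, and rendering requires Graphviz to be installed.

```shell
# Hypothetical helper (not part of Lancet2): extract a graphs archive
# into outdir and print the path of every DOT file it contains, sorted.
list_dot_files() {
  local archive="$1" outdir="$2"
  mkdir -p "$outdir"
  tar -xzf "$archive" -C "$outdir"
  find "$outdir" -type f -name '*.dot' | sort
}

# Usage sketch (placeholder file names; needs Graphviz for dot):
#   list_dot_files graphs.tar.gz graphs_out | while IFS= read -r f; do
#     dot -Tpdf "$f" -o "${f%.dot}.pdf"
#   done
```

Writing each PDF next to its DOT file keeps the per-window directory layout intact, so a rendered graph can still be traced back to its window coordinates.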
--genome-gc-bias¶
[0.0-1.0]. Default value → 0.41
Global genome GC fraction for LongdustQ score correction. Set to 0.5 to disable GC correction (uniform model). Default value (0.41) is the human genome-wide GC average. Adjust for non-human genomes or targeted panels with significantly different GC content.
Lancet2 pipeline \
--genome-gc-bias 0.41 \
--normal normal.bam --tumor tumor.bam \
--reference ref.fasta --region "chr22" \
--out-vcfgz output.vcf.gz
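For a non-human genome, a value for --genome-gc-bias has to come from somewhere. One option is to measure the GC fraction of the reference directly, as in the awk sketch below. It is a hypothetical helper that works on a local, uncompressed FASTA file and assumes that N and other ambiguity codes should be excluded from the denominator. Whether this matches how the 0.41 default was derived is not stated above.

```shell
# Hypothetical helper (not part of Lancet2): GC fraction of a local,
# uncompressed FASTA file, printed with two decimals.
# Assumption: only A/C/G/T are counted; N and ambiguity codes are ignored.
gc_fraction() {
  awk '
    /^>/ { next }                       # skip sequence headers
    {
      seq = toupper($0)                 # soft-masked bases count too
      gc += gsub(/[GC]/, "", seq)       # gsub returns the number of matches
      at += gsub(/[AT]/, "", seq)
    }
    END {
      if (gc + at > 0) printf "%.2f\n", gc / (gc + at)
      else print "NA"
    }
  ' "$1"
}
```

The printed value can then be passed as --genome-gc-bias "$(gc_fraction ref.fasta)", where ref.fasta is a placeholder path.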
VCF Output¶
See the VCF Output Format guide for complete documentation of all INFO and FORMAT fields.