Scoring Somatic Variants
Model Compatibility
The somatic scoring script (score_variants.py) and its ML model are compatible only with VCF output from Lancet2 v2.8.7, so variants must be called with the v2.8.7 release to use this model. No pre-trained ML model is currently available for versions above v2.8.7.
Setup & Installation
The score_variants.py Python script requires Python 3 and a few additional dependencies, which are best installed in a virtual environment. The --upgrade-deps flag used in the venv command needs Python 3.9 or newer.
python3 -m venv --upgrade-deps pyenv
./pyenv/bin/pip install numpy==1.26.4 tqdm==4.66.2 pysam==0.22.0 interpret-core==0.5.1
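Because the dependencies are pinned to exact versions, it can help to confirm that the environment actually matches them before scoring. The following is a minimal sketch, not part of Lancet2: the `PINNED` table and the `check_pins` helper are hypothetical names, and the version strings are copied from the pip command above. It can be run with `./pyenv/bin/python3`.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the pip install command above.
PINNED = {
    "numpy": "1.26.4",
    "tqdm": "4.66.2",
    "pysam": "0.22.0",
    "interpret-core": "0.5.1",
}


def check_pins(pins, get_version=version):
    """Return {package: (expected, found)} for each mismatch.

    `found` is None when the package is not installed at all.
    """
    problems = {}
    for name, expected in pins.items():
        try:
            found = get_version(name)
        except PackageNotFoundError:
            found = None
        if found != expected:
            problems[name] = (expected, found)
    return problems


if __name__ == "__main__":
    # Prints nothing when every pinned package is present at the expected version.
    for name, (expected, found) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result from `check_pins(PINNED)` means the virtual environment matches the pinned versions.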
The explainable somatic machine learning model (somatic_ebm.lancet_6ef7ba445a.v1.pkl) is also needed to run the score_variants.py script.
Usage
./pyenv/bin/python3 score_variants.py \
lancet2_output.vcf.gz somatic_ebm.lancet_6ef7ba445a.v1.pkl \
> lancet2_output.somatic_scoring.vcf
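When scoring is driven from a pipeline rather than an interactive shell, the same invocation could be wrapped in Python. This is a rough sketch under stated assumptions: `build_command` and `run_scoring` are hypothetical helpers, the file names are the ones from the command above, and the only behavior relied on is that score_variants.py writes the scored VCF to standard output, as the shell redirection implies.

```python
import subprocess
from pathlib import Path


def build_command(python_bin, script, vcf, model):
    """Build the argument list equivalent to the shell command above."""
    return [str(python_bin), str(script), str(vcf), str(model)]


def run_scoring(python_bin, script, vcf, model, out_path):
    """Run the scoring script and capture its stdout in out_path."""
    # Fail early, before creating the output file, if an input is missing.
    missing = [str(p) for p in (script, vcf, model) if not Path(p).is_file()]
    if missing:
        raise FileNotFoundError(f"missing input(s): {', '.join(missing)}")
    with open(out_path, "w") as out:
        # check=True raises CalledProcessError if the script exits non-zero.
        subprocess.run(
            build_command(python_bin, script, vcf, model),
            stdout=out,
            check=True,
        )
```

A call mirroring the shell usage would be `run_scoring("./pyenv/bin/python3", "score_variants.py", "lancet2_output.vcf.gz", "somatic_ebm.lancet_6ef7ba445a.v1.pkl", "lancet2_output.somatic_scoring.vcf")`. If the script fails partway through, the output file may be left incomplete, so a caller may want to delete it when `CalledProcessError` is raised.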
The PASS somatic variants can then be filtered from the scored VCF as follows.