get_MNV

License: GPL v3

Paula Ruiz-Rodriguez¹ and Mireia Coscolla¹
¹ Institute for Integrative Systems Biology, I2SysBio, University of Valencia-CSIC, Valencia, Spain

get Multi-Nucleotide Variants

get_MNV is a tool designed to identify Multi-Nucleotide Variants (MNVs) within the same codon in genomic sequences. MNVs occur when multiple Single Nucleotide Variants (SNVs) are present within the same codon, so the translated amino acid can differ from the one predicted when each SNV is annotated on its own. Annotation programs such as ANNOVAR or SnpEff are primarily designed to work with individual SNVs and may therefore miss the actual amino acid change resulting from an MNV. get_MNV addresses this limitation, making the interpretation of genetic variants more complete.
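To illustrate why annotating SNVs one at a time can mislead, the following minimal Python sketch translates a codon with each SNV applied alone and with both applied together. The codon `GGA` and the two substitutions are illustrative values chosen here, not taken from get_MNV's code or data, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def apply_snvs(codon, snvs):
    """Return the codon after applying {offset_in_codon: alt_base} substitutions."""
    bases = list(codon)
    for offset, alt in snvs.items():
        bases[offset] = alt
    return "".join(bases)


def translate(codon):
    """Translate one codon into a one-letter amino acid ('*' is a stop)."""
    return CODON_TABLE[codon]


reference = "GGA"        # glycine (hypothetical example codon)
snvs = {0: "C", 1: "A"}  # two SNVs hitting the same codon

# Each SNV annotated on its own
individual = [translate(apply_snvs(reference, {o: a})) for o, a in snvs.items()]
# Both SNVs applied together, as a single MNV
combined = translate(apply_snvs(reference, snvs))

print(translate(reference), individual, combined)  # G ['R', 'E'] Q
```

Here the individual SNVs would be annotated as Gly to Arg and Gly to Glu, while the combined change is Gly to Gln, the same pattern as the `Gly92Gln` row in the example output of this README.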


IMPORTANT: this script works with SNVs called against a reference. Insertions and deletions that modify the reading frame are not currently supported.

💾 Features

  • MNV Identification: Detects SNVs occurring within the same codon and reclassifies them as MNVs.
  • Accurate Amino Acid Change Calculation: Computes the resulting amino acid changes based on genomic reads.
  • Integration with BAM and VCF Files: Supports input from VCF files for variants and optional BAM files for aligned reads.
  • Quality Analysis: Allows setting a minimum Phred quality threshold to filter out low-quality reads.
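The MNV identification step can be pictured as grouping SNV positions by the codon they fall in. Below is a minimal Python sketch of that idea, not get_MNV's actual implementation. It assumes 1-based, inclusive gene coordinates and that minus-strand genes are counted from `GeneEnd` backwards; the function names and SNV positions are hypothetical, while the gene coordinates come from the `ppiA_Rv0009` line of the gene file example in this README.

```python
from collections import defaultdict


def codon_number(position, start, end, strand):
    """Return the 1-based codon number of a genomic position within a gene.

    Assumes 1-based, inclusive coordinates; minus-strand genes are counted
    from the gene end.
    """
    if not start <= position <= end:
        raise ValueError(f"position {position} is outside {start}-{end}")
    offset = position - start if strand == "+" else end - position
    return offset // 3 + 1


def group_by_codon(positions, start, end, strand):
    """Map codon number -> sorted list of SNV positions in that codon."""
    groups = defaultdict(list)
    for pos in sorted(positions):
        groups[codon_number(pos, start, end, strand)].append(pos)
    return dict(groups)


# Gene coordinates from the gene file example; SNV positions are made up.
start, end, strand = 12468, 13016, "+"
snv_positions = [12468, 12471, 12472, 12500]

for codon, members in group_by_codon(snv_positions, start, end, strand).items():
    kind = "MNV" if len(members) > 1 else "SNP"
    print(codon, members, kind)
```

In this sketch, positions 12471 and 12472 share codon 2 and would be candidates for a single MNV, while the other two positions stand alone.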

🛠️ Installation

You can install get_MNV via conda or mamba (Unix/macOS), or by downloading the binary file (Unix):

🐍 Using conda

conda install -c bioconda get_mnv

🐍 Using mamba

mamba install -c bioconda get_mnv

📨 Using binary

wget https://github.com/PathoGenOmics-Lab/get_MNV/releases/download/1.0.0/get_mnv

📎 Usage

get_mnv [OPTIONS] --vcf <VCF_FILE> --fasta <FASTA_FILE> --genes <GENES_FILE>

🗃️ Options:

  • -v, --vcf <VCF_FILE>: VCF file containing the SNVs. (Required)
  • -b, --bam <BAM_FILE>: BAM file with aligned reads. (Optional)
  • -f, --fasta <FASTA_FILE>: FASTA file with the reference sequence. (Required)
  • -g, --genes <GENES_FILE>: File containing gene information. (Required)
  • -q, --quality: Minimum Phred quality score (default: 20). (Optional)

Example:

get_mnv \
  --vcf variants.vcf \
  --bam reads.bam \
  --fasta reference.fasta \
  --genes genes.txt \
  --quality 30
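For orientation when choosing a `--quality` value: a Phred score Q corresponds to an estimated error probability of 10^(-Q/10). The short Python sketch below applies that standard definition only; it makes no claim about how get_MNV applies the threshold internally, and the function name is hypothetical.

```python
def phred_to_error_probability(q):
    """Convert a Phred quality score to its estimated error probability."""
    return 10 ** (-q / 10)


for q in (20, 30):
    print(f"Q{q}: error probability {phred_to_error_probability(q):.3f}")
```

The default of 20 corresponds to roughly a 1% chance of error, and the value of 30 used in the example command to roughly 0.1%.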

Input File Formats

  • VCF File: Should contain the identified SNVs.
  • BAM File: (Optional) Genomic reads aligned to the reference sequence.
  • FASTA File: Reference genomic sequence.
  • Gene File: A tab-delimited text file with one gene per line and four columns (GeneName, GeneStart, GeneEnd, Strand):
Rv0007_Rv0007	9914	10828	+
ileT_Rvnt01	10887	10960	+
alaT_Rvnt02	11112	11184	+
Rv0008c_Rv0008c	11874	12311	-
ppiA_Rv0009	12468	13016	+
Rv0010c_Rv0010c	13133	13558	-
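To check a gene file before running the tool, the four-column layout above can be parsed with a few lines of Python. This is a minimal sketch, not part of get_MNV: the `Gene` type and `read_genes` function are hypothetical names, and two lines from the example above are inlined so the snippet runs without a file.

```python
import csv
import io
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int
    end: int
    strand: str


def read_genes(handle):
    """Parse tab-delimited GeneName, GeneStart, GeneEnd, Strand lines."""
    genes = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row:  # skip blank lines
            continue
        name, start, end, strand = row
        if strand not in {"+", "-"}:
            raise ValueError(f"unexpected strand {strand!r} for {name}")
        genes.append(Gene(name, int(start), int(end), strand))
    return genes


# Two lines from the example above, held in memory instead of a file.
example = "Rv0007_Rv0007\t9914\t10828\t+\nRv0008c_Rv0008c\t11874\t12311\t-\n"
genes = read_genes(io.StringIO(example))
print(genes[1].name, genes[1].strand, genes[1].end - genes[1].start + 1)
```

For a file on disk, such as the `genes.txt` used in the example command, open it with `open("genes.txt", newline="")` and pass the handle to `read_genes`.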

🎴 Output

The program generates a TSV file named <vcf_filename>.MNV.tsv containing the following information:

  • Gene: Name of the gene.
  • Positions: Positions of the variants.
  • Base Changes: Nucleotide base changes.
  • AA Changes: Resulting amino acid changes.
  • SNP AA Changes: Amino acid changes if considering individual SNVs.
  • Variant Type: Type of variant (SNP, MNV, or SNP/MNV).
  • Change Type: Type of change at the protein level (Synonymous, Non-synonymous, Stop gained).
  • SNP Reads: (If BAM provided) Count of reads supporting each SNP.
  • MNV Reads: (If BAM provided) Count of reads supporting the MNV.

Example:

Gene	Positions	Base Changes	AA Changes	SNP AA Changes	Variant Type	Change Type	SNP Reads	MNV Reads
Rv0095c_Rv0095c	104838	T	Asp126Glu	Asp126Glu	SNP	Non-synonymous	0	16
Rv0095c_Rv0095c	104941,104942	T,G	Gly92Gln	Gly92Glu; Gly92Arg	MNV	Non-synonymous	0,0	25
esxL_Rv1198	1341044	C	His13His	His13His	SNP	Synonymous	0	41
esxL_Rv1198	1341083	G	Ala26Ala	Ala26Ala	SNP	Synonymous	0	12
esxL_Rv1198	1341102,1341103	T,C	Arg33Ser	Arg33Cys; Arg33Pro	MNV	Non-synonymous	0,0	11

📉 Limitations

  • The script currently works only with SNVs compared against a reference sequence.
  • Insertions and deletions that modify the reading frame are not supported in this version.

✨ Contributors

get_MNV is developed with ❤️ by:

Paula Ruiz-Rodriguez

💻 🔬 🤔 🔣 🎨 🔧

Mireia Coscolla

🔍 🤔 🧑‍🏫 🔬 📓

This project follows the all-contributors specification (emoji key).


Fun

3D model logo

Click for the stl file
