Tolerant VCF parsing #156
Comments
Hi, can you show me where to find this? I looked in this one: https://storage.googleapis.com/gcp-public-data--gnomad/release/3.1.2/vcf/genomes/gnomad.genomes.v3.1.2.sites.chrY.vcf.bgz
The spec: https://samtools.github.io/hts-specs/VCFv4.2.pdf
So, I get that you want more lenient parsing, but do other parsers handle this? And is it something added at your institution, or does it come from the original gnomad files?
Thanks for the fast response! Yeah, I get that it's non-conformant. I like standards, and I think they are important, so I am sympathetic to the "your data is drunk. Come back when it's sober!" argument. Especially when it comes to the metadata, I think there are two kinds of non-conformance. In some cases the non-conformance leads to a situation where the program can't figure out how to produce correct output. In other cases the problem is essentially cosmetic and is orthogonal to the production of correct output. I am pretty sure the file is derived from the individual chromosome files by running VEP and concatenating them, but I don't know the precise provenance. I'm still trying to find out. The Python and Rust libraries I use (and the C++ I've written) ignore non-conformant meta lines with a warning when reading, but scrupulously make sure they only emit conformant data.
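To make the "ignore with a warning when reading" behaviour concrete, here is a minimal Python sketch. It is not vcfanno's parser, and it is not taken from any of the libraries mentioned above. The conformance check is an assumption chosen for illustration: a meta line must look like `##key=value`, and INFO/FORMAT lines must also be wrapped in `<...>` and carry ID, Number, Type and Description. The names `is_conformant` and `read_header` are hypothetical.

```python
import re
import warnings

# Assumption: a meta line is "conformant" if it has the shape ##key=value.
META_RE = re.compile(r"^##([^=\s]+)=(.+)$")
# Assumption: INFO/FORMAT lines must mention each of these fields.
REQUIRED = ("ID=", "Number=", "Type=", "Description=")


def is_conformant(line):
    """Rough, illustrative check of a single header meta line."""
    match = META_RE.match(line)
    if match is None:
        return False
    key, value = match.groups()
    if key in ("INFO", "FORMAT"):
        if not (value.startswith("<") and value.endswith(">")):
            return False
        return all(field in value for field in REQUIRED)
    return True


def read_header(lines):
    """Return (kept_meta_lines, column_line).

    Non-conformant header lines are skipped with a warning instead of
    aborting, so only lines that passed the check are kept for output.
    """
    kept = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("#CHROM"):
            return kept, line
        if not line.startswith("#"):
            break  # reached the records without seeing a column line
        if is_conformant(line):
            kept.append(line)
        else:
            warnings.warn(f"ignoring non-conformant meta line: {line!r}")
    raise ValueError("no #CHROM column line found")
```

Because the reader only keeps lines that passed the check, anything later written from `kept` stays conformant, which matches the "tolerant on input, strict on output" split described above.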
G'day. Thanks for making a nice tool.
I'm trying to use vcfanno (0.3.5, linux binary) with a large combined VCF of gnomad v3.1. The combined bgzipped file is ~2TB, so obviously manipulating it is inconvenient at best.

I don't know if these are standard in the gnomad downloads, but vcfanno is aborting:

The offending lines in the gnomad VCF are:

For reference, the config.toml I am using is:

Any chance that the VCF parsing could be made a bit more tolerant for headers? It would be pretty painful to have to modify the GnomAD VCF.
Tom.