Error when generating variants where there is a degenerate symbol in the reference #122
Comments
Yeah, I've never seen a "Y" in the reference before. I can investigate how to handle that. For now I would just do something like
I've been inspecting the ref files and apparently there are some ambiguous characters like K, Y, M, R, W... I guess I'll just have to preprocess them as you suggest. I don't know if I would have to reindex the human ref genome afterwards. Thanks!
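One possible way to do the preprocessing mentioned above is sketched below. It is not taken from NEAT or from the maintainers' suggestion: it assumes that replacing every degenerate IUPAC symbol with N is acceptable for the simulation, and the function names and file paths are hypothetical.

```python
import re

# IUPAC ambiguity codes other than N, in both cases.
# Assumption: masking them with N is acceptable for the downstream tool.
_DEGENERATE = re.compile(r"[RYSWKMBDHVryswkmbdhv]")


def mask_degenerate_line(line: str) -> str:
    """Replace degenerate bases with N (or n); FASTA header lines are returned unchanged."""
    if line.startswith(">"):
        return line
    return _DEGENERATE.sub(lambda m: "N" if m.group().isupper() else "n", line)


def mask_degenerate_fasta(src_path: str, dst_path: str) -> None:
    """Stream a FASTA file line by line and write a masked copy (paths are placeholders)."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(mask_degenerate_line(line))
```

Because the substitution is one-for-one, sequence lengths and line widths stay the same, so coordinates in an existing BED file still apply. Whether an index needs to be rebuilt depends on the tool that created it, and this sketch does not settle that.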
That sounds like you may have grabbed the protein reference, maybe? NEAT currently only works with DNA. Maybe the new revision of HG38 is doing something different.
It is indeed the DNA reference, but there are a few of these degenerate bases scattered across it to indicate variation or uncertainty in the assembly. You can read more here: https://en.wikipedia.org/wiki/Nucleic_acid_notation
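To see how many of these symbols a given reference actually contains, a small scan like the following could help. It is a sketch only: the function name is hypothetical, and it treats anything other than A, C, G, T, and N as degenerate.

```python
from collections import Counter


def count_degenerate(lines):
    """Count symbols other than A, C, G, T, N per FASTA record.

    `lines` is any iterable of FASTA lines (for example an open file).
    Returns {record_name: {symbol: count}} for records with at least one hit.
    """
    counts = {}
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            counts[name] = Counter()
        elif name is not None:
            counts[name].update(ch for ch in line.upper() if ch not in "ACGTN")
    return {rec: dict(c) for rec, c in counts.items() if c}
```

Passing an open file handle, as in `count_degenerate(open(path))` with `path` pointing at the reference, lists only the records that contain degenerate bases.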
Okay, I guess I just haven't run into those in the wild yet.
You might try HG19 or some older version of the reference.
Describe the bug
It looks like NEAT happened to encounter a “Y” in the reference sequence while generating variants and aborted the variant generation process.
To Reproduce
Steps to reproduce the behavior:
Download the latest human reference genome: https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000001405.40/
Make a copy of the provided template config file (I called it test_config_human.yml) and set the parameters:
reference: <path_to_GRCh38_latest_genomic.fna>
target_bed: <path_to_bed_file>
produce_vcf: true
produce_fastq: false
rng_seed: 6386514007882411
The rest are left at the default “.”.
Run neat on the command line:
neat --log-name test --log-detail HIGH --log-level DEBUG read-simulator -c test_config_human.yml -o test
Expected behavior
NEAT generates variants and writes them to a VCF file.
Desktop: