VerifyBamID: Ensuring Accurate .mu and .UD File Creation #61
Hi @shadizaheri, VB2 was designed to work with common SNPs in the reference panel VCF, which should usually be quite abundant and easy to randomly downsample. On chromosome 19, there should be far more than 219 sites available (I understand you were trying to extract the intersection between your VCF and the original resource markers). The recipe to create a new set of resource files is either to:
In your case, I would suggest to
Let me know if this works and I'm happy to help. Fan
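The idea of restricting a reference panel to common SNPs and then randomly downsampling them could be sketched roughly as below. This is only an illustration, not the procedure VerifyBamID itself uses: the function names, the `AF` INFO key, the 0.05 frequency cutoff, and the toy records are all assumptions made for the example.

```python
import random

def is_biallelic_snp(ref, alt):
    """True when REF and ALT are both a single A/C/G/T base (so indels are excluded)."""
    return len(ref) == 1 and len(alt) == 1 and ref in "ACGT" and alt in "ACGT"

def info_af(info):
    """Return the first AF value in a VCF INFO string, or None if it is absent or unparsable."""
    for item in info.split(";"):
        if item.startswith("AF="):
            try:
                return float(item[3:].split(",")[0])
            except ValueError:
                return None
    return None

def downsample_common_snps(vcf_lines, n_keep, min_af=0.05, seed=0):
    """Keep header lines, then at most n_keep randomly chosen common biallelic SNPs.

    min_af is a hypothetical cutoff; records with min_af <= AF <= 1 - min_af count as common.
    """
    header, records = [], []
    for line in vcf_lines:
        if line.startswith("#"):
            header.append(line)
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # not a complete VCF record
        af = info_af(cols[7])
        if is_biallelic_snp(cols[3], cols[4]) and af is not None and min_af <= af <= 1 - min_af:
            records.append(line)
    if len(records) > n_keep:
        rng = random.Random(seed)
        # Sort the sampled indices so the output stays in the input (coordinate) order.
        keep = sorted(rng.sample(range(len(records)), n_keep))
        records = [records[i] for i in keep]
    return header + records

# Synthetic records for illustration only.
toy_vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr19\t100\t.\tA\tG\t.\tPASS\tAF=0.30\n",
    "chr19\t200\t.\tAT\tA\t.\tPASS\tAF=0.40\n",   # indel, dropped
    "chr19\t300\t.\tC\tT\t.\tPASS\tAF=0.001\n",  # rare, dropped
    "chr19\t400\t.\tG\tA\t.\tPASS\tAF=0.20\n",
    "chr19\t500\t.\tT\tC\t.\tPASS\tAF=0.50\n",
]
subset = downsample_common_snps(toy_vcf, n_keep=2)
```

For real panels a dedicated VCF tool would be the more natural choice; the sketch only shows the filtering and sampling logic.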
Hi @Griffan, sorry to post on an old thread. Unrelated to the original poster's question, I've written a small Snakemake workflow to do this and have generated parameter estimates for T2T. I still need to validate that the downstream estimates I'm getting are consistent with other alignments, but once that's done, would you be interested in posting or linking to the T2T parameter estimates so that people don't have to duplicate the work? If so, I'd certainly welcome your inspection of the workflow so that you're comfortable with the results.
Hi @lightning-auriga, yes, I will definitely be interested in this update. Let me know what I can help with. You can directly reach me at [email protected].
We aim to create the .mu and .UD resource files, which are auxiliary files for the SVDPrefix, using the CHM13 variant call format files (VCFs) available at this link. Our objective is to conduct contamination analysis on a Telomere-to-Telomere (T2T) reference sequence that has been adapted through a lift-over process.

As a preliminary step, we attempted to regenerate these resource files specifically for chromosome 19, following the instructions provided for VerifyBamID under the --RefVCF option, as detailed in the VerifyBamID GitHub repository. The chromosome 19 source VCF has 219 sites, but while running the --RefVCF command we were warned that two indel sites were skipped (please see the warning below).
skip indel at chr19:6785595
skip indel at chr19:17225618
NOTICE - Number of Markers:217
NOTICE - Number of Individuals:3202
NOTICE - Success!
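The arithmetic behind this output (219 source sites minus 2 skipped indels gives 217 markers) can be checked mechanically. The following is a minimal sketch that parses only the log lines quoted above; the function name and the returned dictionary are illustrative, and other VerifyBamID log formats are not covered.

```python
import re

def check_marker_count(log_text, n_source_sites):
    """Compare the reported marker count with source sites minus skipped indels."""
    # Positions reported as "skip indel at <chrom>:<pos>".
    skipped = re.findall(r"skip indel at (\S+:\d+)", log_text)
    match = re.search(r"Number of Markers:\s*(\d+)", log_text)
    if match is None:
        raise ValueError("no 'Number of Markers' line found in log")
    n_markers = int(match.group(1))
    expected = n_source_sites - len(skipped)
    return {
        "skipped": skipped,
        "markers": n_markers,
        "expected": expected,
        "consistent": n_markers == expected,
    }

# Log text copied from the post; 219 is the site count of the source VCF.
log = """skip indel at chr19:6785595
skip indel at chr19:17225618
NOTICE - Number of Markers:217
NOTICE - Number of Individuals:3202
NOTICE - Success!"""
result = check_marker_count(log, 219)
```

If `result["consistent"]` is false, some sites were dropped for a reason other than the reported indel skips.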
As a result, we see 217 sites in our generated .mu and .UD files.
.mu: Mean Matrix Shape: (217,)
.UD: UD Matrix Shape: (217, 10)
Below is the header information from the resulting files for your review:
Could you advise on the following: whether the .mu and .UD files are generated correctly, or whether doing these quick checks is enough?

Thank you in advance for your help!
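A quick shape check like the one described above could look roughly like this. It is a sketch that assumes both files are plain whitespace-delimited text with one marker per line; that layout, the helper names, and the toy files are assumptions for illustration, not the documented VerifyBamID format.

```python
import os
import tempfile

def table_shape(path):
    """Return (rows, columns) of a whitespace-delimited text file; raise if rows are ragged."""
    n_rows, n_cols = 0, None
    with open(path) as fh:
        for line_no, line in enumerate(fh, start=1):
            fields = line.split()
            if not fields:
                continue  # ignore blank lines
            if n_cols is None:
                n_cols = len(fields)
            elif len(fields) != n_cols:
                raise ValueError(
                    f"{path}: line {line_no} has {len(fields)} fields, expected {n_cols}"
                )
            n_rows += 1
    return n_rows, (n_cols or 0)

def same_marker_count(mu_path, ud_path):
    """True when both files contain the same number of rows (markers)."""
    return table_shape(mu_path)[0] == table_shape(ud_path)[0]

# Synthetic three-marker files, written to a temporary directory for the demo.
with tempfile.TemporaryDirectory() as tmp:
    mu_path = os.path.join(tmp, "toy.mu")
    ud_path = os.path.join(tmp, "toy.UD")
    with open(mu_path, "w") as fh:
        fh.write("0.25\n0.50\n0.75\n")
    with open(ud_path, "w") as fh:
        for _ in range(3):
            fh.write("\t".join(["0.1"] * 10) + "\n")
    mu_shape = table_shape(mu_path)
    ud_shape = table_shape(ud_path)
    rows_match = same_marker_count(mu_path, ud_path)
```

Matching row counts only show that the two files describe the same number of markers; they do not validate the values themselves.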