VBID2 may be underestimating intra-family contamination when applied to long reads (CCS) #46
Comments
Initial discovery: mostly fine, but seems to underestimate familial mixtures
In the initial experiment, we created in silico mixtures of CCS BAM files at different mixture levels.
However, notice the mixtures that involve parent-daughter pairs, where VBID2 meaningfully underestimates the contamination levels. This table was generated using the following procedure
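For readers who want to reproduce this kind of in silico mixture, the arithmetic behind the downsampling step can be sketched as follows. This is a minimal sketch and not the pipeline used in this issue: the function names, the idea of fixing a target total coverage, and the example coverages are all assumptions made for illustration.

```python
def downsampling_fractions(main_cov, contam_cov, contam_level, target_cov):
    """Fractions of reads to keep from each BAM so the merged result has
    roughly `target_cov` total coverage, of which `contam_level` comes
    from the contaminant. All names here are hypothetical."""
    if not 0.0 <= contam_level < 1.0:
        raise ValueError("contam_level must be in [0, 1)")
    if min(main_cov, contam_cov, target_cov) <= 0:
        raise ValueError("coverages must be positive")
    main_fraction = (1.0 - contam_level) * target_cov / main_cov
    contam_fraction = contam_level * target_cov / contam_cov
    if main_fraction > 1.0 or contam_fraction > 1.0:
        raise ValueError("input coverage is too low for the requested mixture")
    return main_fraction, contam_fraction


def expected_contamination(main_cov, contam_cov):
    """Expected contamination level when two BAMs are merged as-is,
    measured as the contaminant's share of the total coverage."""
    return contam_cov / (main_cov + contam_cov)


# Example with made-up coverages: a 10% mixture at 20x from two 30x BAMs.
main_f, contam_f = downsampling_fractions(30.0, 30.0, 0.10, 20.0)
print(f"keep {main_f:.3f} of main reads, {contam_f:.3f} of contaminant reads")
```

Note that for long reads a per-read sampling fraction only approximates a per-base coverage fraction, and the approximation is better when the two BAMs have similar read-length distributions.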
Maybe the downsampling is wrong? Merging the BAMs directly
To avoid any artifacts that could have been introduced by the downsampling and pileup conversion steps in the pipeline, we simply merged the BAMs for the three HGSVC families (CHS, PUR, and YRI) without any downsampling, and ran VBID2 directly on the merged BAMs (re-headered to change the sample names and avoid an exception). The results are similar but also different: notice the PUR family has
And I started to notice that the coverage calculation by
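The re-headering step mentioned above could look roughly like the sketch below, which rewrites the `SM` field of every `@RG` line in a SAM header so that all read groups share one sample name. This is an illustration only: the function name is hypothetical, and it assumes that a single shared sample name is what is needed, which the issue does not spell out.

```python
def set_sample_name(header_text, new_sample):
    """Return SAM header text with the SM field of every @RG line set to
    `new_sample`. Hypothetical helper; other header lines are untouched."""
    out = []
    for line in header_text.splitlines():
        if line.startswith("@RG\t"):
            fields = line.split("\t")
            # Replace an existing SM tag, or append one if it is missing.
            if any(f.startswith("SM:") for f in fields):
                fields = [f"SM:{new_sample}" if f.startswith("SM:") else f
                          for f in fields]
            else:
                fields.append(f"SM:{new_sample}")
            line = "\t".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"


# Placeholder header with two read groups from different samples.
header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@RG\tID:rg1\tSM:sampleA\n"
    "@RG\tID:rg2\tSM:sampleB\n"
)
print(set_sample_name(header, "mixture"), end="")
```

The rewritten header text would then be applied to the merged BAM with a tool such as `samtools reheader`.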
Using BAMs from HGSVC2
Suspecting that the BAM files we used may have had some issues, we repeated the above step using the data generated by HGSVC2.
Using gnomAD short-read BAMs for the PUR family
In this case, the contamination estimates from VBID2 are concordant with the expected levels.
Hi, @SHuang-Broad
Thank you again for these experiments. I wonder if you could share part of your simulation data with me, so that I can run some experiments and adjust the underlying model to take these issues into account.
Fan
Thanks for the reply, Fan! I'll ask and see if I can share the data (they are on public samples, so it should be OK, but I need to confirm). In the meantime, please let us know where we could contribute code (we are happy to see VBID2 work). Thanks!
Hello Fan,
Continuing our effort to incorporate VBID2 into our pipeline for contamination estimation (see #43), we performed some experiments to evaluate its performance, given the fact that the tool was developed with short reads in mind.
Let me state the summary first, and post the experiments in subsequent posts.
Summary
In summary, it seems that VBID2 is underestimating contamination levels when
I am not sure how much time you have for addressing this issue.
But if you do, I can do more experiments as needed.
Thank you!
Steve