Finish developing/evaluating hapestry merge #15
Great to see some boxes getting checked! 👍 It might be nice to record some notes about the actions taken here, if they were meaty enough and you think they would be useful for yourself or others at some point.
Thanks for checking in! I can point to my commits with a bit of explanation:
After fixing multiple issues, the most egregious of which was forgetting to use the GIAB confident BED uniformly across the experiment, I am getting much better performance. Here are the results for various values of d_weight:

[Results for d_weight=1, d_weight=4, and d_weight=32 were shown here but were not preserved.]

I think there is still room for improvement, so I will continue to investigate the missing haplotype_coverage.

@samuelklee, since this is nearing a viable output, is there some WDL we should run to see how Hapestry performs downstream on the 47 HPRC samples? It might be interesting to see. The next major milestone for improving imputation results will probably be the inclusion of small variants, which @fabio-cunial is working on.
This sounds great! And yes, take a look at PhasedPanelEvaluation: ideally you should just be able to slot your VCF into the …
For the purpose of comparing to kanpig, is there any way to guarantee equal subsetting of the VCFs by a “confident” BED? I don’t want to make the same mistake twice and compare with unequal or missing regions.
The Vcfdist task takes in a BED file; right now we are using GIAB.
I see; I didn't realize we were skipping all the tandem repeats. Is that because of some issue caused by multiallelics? They are probably the biggest area of improvement for hapestry vs. truvari.
I think the choice of that BED file was perhaps arbitrary. I would hope that we expand the Vcfdist task to take in an array of BEDs and stratify. We'll get there! |
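One way to make the "equal subsetting" requirement for the kanpig comparison concrete is to apply a single containment rule to every VCF before comparing. The following is a minimal, hypothetical Python sketch: the names `load_bed`, `record_in_bed`, and `subset_vcf` are illustrative and are not part of hapestry, kanpig, or the Vcfdist task. It assumes uncompressed text VCFs with sequence-resolved alleles, so that the REF allele spans the whole event; symbolic SVs that rely on an INFO `END` field would need extra handling. A record is kept only if its REF span lies entirely inside one BED interval, after converting VCF's 1-based POS to BED's 0-based, half-open coordinates.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# BED intervals are 0-based and half-open: [start, end).
Interval = Tuple[int, int]


def load_bed(lines: Iterable[str]) -> Dict[str, List[Interval]]:
    """Parse BED lines into per-chromosome interval lists (hypothetical helper)."""
    regions: Dict[str, List[Interval]] = {}
    for line in lines:
        # Skip blank lines, comments, and UCSC track/browser headers.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.setdefault(chrom, []).append((int(start), int(end)))
    return regions


def record_in_bed(chrom: str, pos: int, ref: str,
                  regions: Dict[str, List[Interval]]) -> bool:
    """True if the REF allele span lies entirely inside one BED interval."""
    # VCF POS is 1-based; convert the REF span to 0-based, half-open.
    start = pos - 1
    end = start + len(ref)
    # Linear scan: fine for a sketch; an interval tree would scale better.
    return any(s <= start and end <= e for s, e in regions.get(chrom, []))


def subset_vcf(lines: Iterable[str],
               regions: Dict[str, List[Interval]]) -> Iterator[str]:
    """Yield header lines unchanged, plus records fully contained in the BED."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        if record_in_bed(chrom, pos, ref, regions):
            yield line
```

Running both the hapestry and kanpig VCFs through `subset_vcf` with the same `regions` would give them identical region coverage, and stratifying by several BEDs would amount to calling it once per BED. Established tools such as bcftools or bedtools can do this at scale; the sketch only makes the containment rule explicit.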
It looks like all of your evaluation is on GRCh38 😐
The workflow itself should be fairly reference agnostic, although there may be one or two resources in the evaluations that aren’t as readily available for CHM. |
Given that there are about 18 fields in the JSON that would need to be changed, I think I will wait until I rerun hapestry on GRCh38.
(Also note from #1 that this was the plan from the start… I’d rather not spend time doing HPRC-only for two references, given that we’re supposedly getting AoU1 access back tomorrow!)
ERROR: multiple paths are ref-only nodes: <3<2<1 != >1>2>3