Automated evaluation of current end-to-end pipeline. #1
Comments
#20 was just merged and gets us most of the way there. Some remaining TODOs:
But even before adding these, note that the drop in recall in the last VCF produced by PanGeniePanelCreation (e.g., for SVs, from 82% in VcfdistEvaluationShapeit4 to ~60% in VcfdistEvaluationPanel) is a good enough indicator of remaining overlap issues to guide development for now. Again, this drop results solely from the PanGenie script removing an entire allele if it is found to overlap in any one sample. In this case, the 6 additional FNs (note one locus with multiple removed alleles) are at:
It is easy to see how this problem is exacerbated when we have many more samples. In any case, we should already have what we need for some simple experiments, e.g.:
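To make the allele-removal effect described above concrete, here is a minimal toy sketch in Python. It is not the PanGenie script's actual logic: the interval representation, the function names (`overlaps`, `alleles_to_drop`, `recall`), and the choice to drop both members of an overlapping pair are assumptions made purely for illustration. It only shows how an allele that is clean in one sample can still be lost from the whole panel once another sample carries it alongside an overlapping allele.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy representation: an allele is a half-open (start, end) span.
Interval = Tuple[int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def alleles_to_drop(
    carried: Dict[str, List[str]], spans: Dict[str, Interval]
) -> Set[str]:
    """Allele IDs removed panel-wide because they overlap in ANY one sample.

    Assumption: both alleles of an overlapping pair are dropped.
    """
    drop: Set[str] = set()
    for allele_ids in carried.values():
        for i, a in enumerate(allele_ids):
            for b in allele_ids[i + 1:]:
                if overlaps(spans[a], spans[b]):
                    drop.update((a, b))
    return drop


def recall(tp: int, fn: int) -> float:
    """Recall = TP / (TP + FN)."""
    return tp / (tp + fn)


# Toy data: A and B overlap; C is isolated.
spans = {"A": (100, 200), "B": (150, 250), "C": (300, 400)}

# With only s1 in the panel, nothing overlaps and nothing is dropped.
one_sample = {"s1": ["A", "C"]}
# Adding s2, which carries A and B together, removes A for everyone.
two_samples = {"s1": ["A", "C"], "s2": ["A", "B"]}

dropped_one = alleles_to_drop(one_sample, spans)
dropped_two = alleles_to_drop(two_samples, spans)

# s1's truth alleles are A and C; any dropped allele becomes a false negative.
s1_truth = set(one_sample["s1"])
s1_fn = len(s1_truth & dropped_two)
s1_recall = recall(len(s1_truth) - s1_fn, s1_fn)
```

In this toy case, s1 loses allele A (and half of its recall) only because s2 was added to the panel, which is the mechanism by which the drop would be expected to grow with sample count.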
@kvg probably a good point for you to take a look and get caught up. Sorry, took just over a week 😆
I think we can consider this first push complete!
vcfdist still broken :(
but yes pipeline runs end to end 🎉
File an issue 😜 I did enable those arguments in a test run; didn’t seem to move the numbers too much, but maybe the story is different for tandems.
Meta-issue. Please spin out issues and self-assign.
For the first cut, let's organize WDLs and resources for a pipeline that takes
Inputs
and goes through
Methods
covered by per-stage evaluations (as sensible)
Evaluations
For now, freely open PRs and merge without review, but please do use descriptive commit messages and PR titles. Also, please commit fresh copies of all relevant WDLs, indicate their versioned provenance, and provide a link in a corresponding PR comment where appropriate and possible.
I will organize the end-to-end evaluation in a megaWDL and then do a round of cleanup of the subworkflows after an initial manual run (with the goal being to show that cleanup does not affect performance). I expect running this evaluation to be manual for the near future, but we can think about CI testing later if it makes sense.
We'll continue to work with hg38 chr1:100-110Mbp to start.
Once this settles (hopefully within a week or two), we'll be better able to see where @rlorigro can slot in Hapestry methods and demonstrate improvement. If it makes sense, we can expand coverage of the pipelines upstream to intra/intersample integration and downstream to SR genotyping/phasing/imputation.