-
Notifications
You must be signed in to change notification settings - Fork 597
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BQSR: avoid throwing an error when read group is missing in the recal table, and some refactoring. #9020
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just needs another pass at some of your comments and TODOs.
build.gradle
Outdated
@@ -188,7 +188,7 @@ configurations.all { | |||
} | |||
|
|||
tasks.withType(JavaCompile) { | |||
options.compilerArgs = ['-proc:none', '-Xlint:all', '-Werror', '-Xdiags:verbose'] | |||
options.compilerArgs = ['-proc:none', '-Xlint:all', '-Xdiags:verbose'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is beyond my job description but I trust there's a reason?
* If set to true, do not throw an error upon encountering a read with a read group that's not in the recalibration table. | ||
* Instead, simply set the quantized original base qualities as the recalibrated base qualities. | ||
*/ // tsato: should this be in the *unique* ApplyBQSRArgumentCollection? | ||
@Argument(fullName = ALLOW_MISSING_READ_GROUPS_LONG_NAME, doc = "", optional = true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
empty doc string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -90,7 +90,7 @@ public final class ApplyBQSR extends ReadWalker{ | |||
*/ | |||
@Argument(fullName=StandardArgumentDefinitions.BQSR_TABLE_LONG_NAME, shortName=StandardArgumentDefinitions.BQSR_TABLE_SHORT_NAME, doc="Input recalibration table for BQSR") | |||
@WorkflowInput | |||
public File BQSR_RECAL_FILE; | |||
public File bqsrRecalFile; // tsato: change to GATKPath? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Address this TODO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -26,14 +28,14 @@ | |||
|
|||
public final class BQSRReadTransformer implements ReadTransformer { | |||
private static final long serialVersionUID = 1L; | |||
|
|||
// tsato: remove? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Address
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
//clear indel qualities | ||
// clear indel qualities // tsato: why? Is this ok? Do we update them? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Address
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
for ( final NestedIntegerArray.Leaf<RecalDatum> leaf : byQualTable.getAllLeaves() ) { | ||
final int rgKey = leaf.keys[0]; | ||
final int eventIndex = leaf.keys[2]; | ||
final int rgKey = leaf.keys[readGroupIndex]; // tsato: ??? What are these indices? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -32,7 +32,7 @@ | |||
public final class RecalUtils { | |||
public static final String ARGUMENT_REPORT_TABLE_TITLE = "Arguments"; | |||
public static final String QUANTIZED_REPORT_TABLE_TITLE = "Quantized"; | |||
public static final String READGROUP_REPORT_TABLE_TITLE = "RecalTable0"; | |||
public static final String READGROUP_REPORT_TABLE_TITLE = "RecalTable0"; // TODO: make them more descriptive |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
private static boolean warnUserNullPlatform = false; | ||
|
||
private static final String SCRIPT_FILE = "BQSR.R"; | ||
public static final int EMPIRICAL_QUAL_DECIMAL_PLACES = 4; | ||
public static final int EMPIRICAL_Q_REPORTED_DECIMAL_PLACES = 4; | ||
public static final int REPORTED_QUALITY_DECIMAL_PLACES = 4; // tsato: "estimated" q reported...we need to rename (DONE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DONE?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
public final class ContextCovariate implements Covariate { | ||
private static final long serialVersionUID = 1L; | ||
private static final Logger logger = LogManager.getLogger(ContextCovariate.class); | ||
|
||
private final int mismatchesContextSize; | ||
private final int mismatchesContextSize; // TODO: rename "mismatch" here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -136,27 +152,30 @@ public double getEmpiricalErrorRate() { | |||
return doubleMismatches / doubleObservations; | |||
} | |||
} | |||
|
|||
public void setEmpiricalQuality(final double empiricalQuality) { | |||
// tsato: int-double |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clean up comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@takutosato I took a look. Definitely want to put back the gradle build to how it was. If you need help with the warning we can help. I'm not sure I totally understand the motivation for changing the internal quality value from a double to an int? I think you changed if to being PHRED scaled, but it seems like that might change the precision of the calculation which could change the results? Or is it always output as a PHRED value before further calculations are made? It's not clear to me.
Thank you for the clarifying comments.
Lots of unaddressed TODOs. Some of them probably should be removed but if you think that some of them are useful to future readers than that's fine.
build.gradle
Outdated
@@ -188,7 +188,7 @@ configurations.all { | |||
} | |||
|
|||
tasks.withType(JavaCompile) { | |||
options.compilerArgs = ['-proc:none', '-Xlint:all', '-Werror', '-Xdiags:verbose'] | |||
options.compilerArgs = ['-proc:none', '-Xlint:all', '-Xdiags:verbose'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What warning are we seeing that made you take this out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets fix the warning or supress it instead of turning this off.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has been reverted.
* If set to true, do not throw an error upon encountering a read with a read group that's not in the recalibration table. | ||
* Instead, simply set the quantized original base qualities as the recalibrated base qualities. | ||
*/ // tsato: should this be in the *unique* ApplyBQSRArgumentCollection? | ||
@Argument(fullName = ALLOW_MISSING_READ_GROUPS_LONG_NAME, doc = "", optional = true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgot the doc string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this note. I think this argument should be moved up. to ApplyBQSRUniqueArgumentCollection. It is only relevant to the apply stage and not to the recalibration stage so that makes sense for it to be there. Also, I think you need to update the method
ApplyBQSRUniqueArgumentCollection.toApplyBQSRArgumentCollection
to pass this argument to the newly created collection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can do away with this UniqueArgumentCollection construction --- for another time.
@@ -90,7 +90,7 @@ public final class ApplyBQSR extends ReadWalker{ | |||
*/ | |||
@Argument(fullName=StandardArgumentDefinitions.BQSR_TABLE_LONG_NAME, shortName=StandardArgumentDefinitions.BQSR_TABLE_SHORT_NAME, doc="Input recalibration table for BQSR") | |||
@WorkflowInput | |||
public File BQSR_RECAL_FILE; | |||
public File bqsrRecalFile; // tsato: change to GATKPath? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be a good thing to do but we could do it in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
@@ -26,14 +28,14 @@ | |||
|
|||
public final class BQSRReadTransformer implements ReadTransformer { | |||
private static final long serialVersionUID = 1L; | |||
|
|||
// tsato: remove? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it can be removed.
private byte[] staticQuantizedMapping; | ||
private final CovariateKeyCache keyCache; | ||
|
||
// tsato: new flag, needs to be moved to apply BQSR argument | ||
private boolean allowMissingReadGroup; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- allowMissingReadGroup -> allowMissingReadGroups
- make this final
- I think the comment is resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -92,6 +92,8 @@ public abstract class GATKBaseTest extends BaseTest { | |||
// Variants from a DBSNP 138 VCF form the first 65Mb of chr1 | |||
public static final String dbsnp_138_b37_1_65M_vcf = largeFileTestDir + "dbsnp_138.b37.1.1-65M.vcf"; | |||
|
|||
public static final String BQSR_TEST_RESOURCE_DIR = toolsTestDir + "BQSR/"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like the wrong place for this constant. I'd put it in a BQSR test or maybe just use the CommandLineProgram.getTestDataDir() if that works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
IllegalStateException.class); | ||
spec.executeTest("testMissingReadGroup", this); | ||
public void testMissingReadGroup() { | ||
// TODO: there are two copies of this file (one under Validation and one under BQSR) in the repo; conslidate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fine. Duplicated test files are alright. Git deduplicates them internally for transfer so they don't really take up meaningful amounts of room.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good!
final byte[] oldQuals = ReadUtils.getOriginalBaseQualities(originalRead); | ||
|
||
if (ReadUtils.getPlatformUnit(originalRead, originalBamHeader).equals(readGroupToFilterOut)) { | ||
// These are the read groups that are not in teh recal table. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
|
||
if (ReadUtils.getPlatformUnit(originalRead, originalBamHeader).equals(readGroupToFilterOut)) { | ||
// These are the read groups that are not in teh recal table. | ||
// final Random random = new Random(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be deleted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
@@ -97,12 +97,12 @@ public void testRecalDatumCopyAndCombine(RecalDatumTestProvider cfg) { | |||
@Test(dataProvider = "RecalDatumTestProvider") | |||
public void testRecalDatumModification(RecalDatumTestProvider cfg) { | |||
RecalDatum datum = cfg.makeRecalDatum(); | |||
datum.setEmpiricalQuality(10.1); | |||
Assert.assertEquals(datum.getEmpiricalQuality(), 10.1); | |||
datum.setEmpiricalQuality(10); // tsato: 10.1 -> 10. Downstream data might be affected. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does changing the quality from double -> int potentially change the results of the recalibration table by rounding earlier? I'm not sure what the advantage of that is?
@lbergelson thanks for the review! I will address the changes as soon as I can. The motivation for the double -> int change is in |
Github actions tests reported job failures from actions build 11903509453
|
Github actions tests reported job failures from actions build 12702971619
|
Addresses #6242.
Current behavior: when all the reads in a read group are filtered in the base recalibration step, the read group is not logged in the recal table. Then ApplyBQSR encounters these reads, can't find the read group in the recal table, and throws an error.
New behavior: if
--allow-read-group
flag is set to true, then ApplyBQSR outputs the original quantities (after quantizing).I avoided the alternative approach of collapsing (marginalizing) across the read groups, mostly because it would require a complete overhaul of the code. I also think that using recal data from other read groups might not be a good idea. In any case, using OQ should be good enough; I assume that these "missing" read groups are low enough quality to be filtered out and are likely to be thrown out by downstream tools.
I also refactored the BQSR code, mostly to update the variable and class names to be more accurate and descriptive. For instance:
ReadCovariates.java -> PerReadCovariateMatrix.java
EstimatedQReported -> ReportedQuality