parsing error in WGS single sample workflow #1331
Comments
We asked the User to:
The User replied and noticed: "Read pairs are different lengths which is why I asked about trimming. I can dig to see if this happens consistently across all failures. Parsing error was for read -
The User also mentioned that the mismatch in read pairs isn't consistent across parsing errors.
Are we sure that this is the line that causes issues? I don't know if "line 204289" refers to the line number with or without header lines. What happens if they only pass that read pair into the aligner? Does it fail? If not, does it fail if they pass plus/minus 10,000 read pairs around line 204289 into the aligner?
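To make the header question concrete, here is a minimal Python sketch of pulling a window of records around a reported line under both interpretations. It assumes the aligner output is available as SAM text, where header lines start with `@`. The function name `window_around` and its parameters are hypothetical, and `flank` counts alignment records rather than read pairs.

```python
def window_around(sam_lines, reported_line, flank=10_000, counts_header=True):
    """Return alignment records within `flank` records of `reported_line`.

    `reported_line` is 1-based. With counts_header=True the number is read as
    a raw line number (header lines included); with counts_header=False it is
    read as an index into alignment records only.
    """
    selected = []
    record_no = 0
    for raw_no, line in enumerate(sam_lines, start=1):
        if line.startswith("@"):
            # SAM header line: never part of the returned window.
            continue
        record_no += 1
        position = raw_no if counts_header else record_no
        if abs(position - reported_line) <= flank:
            selected.append(line)
    return selected


if __name__ == "__main__":
    # Made-up toy input: two header lines followed by four records.
    toy = ["@HD\tVN:1.6", "@SQ\tSN:chr1\tLN:100", "r1", "r2", "r3", "r4"]
    print(window_around(toy, 3, flank=0, counts_header=True))
    print(window_around(toy, 3, flank=0, counts_header=False))
```

Running both interpretations on the real output and aligning each window separately would show whether either one reproduces the failure.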
The other thing I'd suggest is to split apart the individual parts in the dragmap --> samtools --> mergebamalignments pipe in If there's nothing obvious there, the fastest way to find the solution is probably to run the data through samtools in a debugger. Depending on the user's familiarity with the samtools source code and/or software debugging in general, that may or may not be feasible. If they get to the debugging step but don't feel that's something they can do, we could try to take a look if we can find the time, if they are able to share the dragmap output bam with us (that's a big "if").
Ultimately, though, this will probably end up pointing us to some bug in dragmap, so it will likely need either a workaround or a fix from Illumina.
Thanks for the comments/suggestions. I've passed only the read pair corresponding to line 204289 to the workflow, as well as slices of the ubam (including one with X read pairs around line 204289). All these attempts aligned successfully. After looking at log files across ubam shards and across samples, the parsing error comes up around line 200K, which feels like a resource limitation to me.
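One way to check the "around line 200K" pattern across shards is to tally the reported line numbers from each log. The sketch below assumes each failing log contains a line that mentions parsing and includes the text `line <number>`. That pattern is a guess based on the "line 204289" wording in this thread, not a documented log format, and the names `reported_lines` and `summarize` are hypothetical.

```python
import re
from statistics import median

# Assumed pattern: the error line includes "line <number>".
LINE_RE = re.compile(r"line (\d+)")


def reported_lines(logs):
    """Map each log name to the line numbers found on parse-error lines.

    `logs` is a dict of {name: full log text}.
    """
    found = {}
    for name, text in logs.items():
        numbers = []
        for log_line in text.splitlines():
            if "pars" not in log_line.lower():
                continue  # keep only lines mentioning parse/parsing
            match = LINE_RE.search(log_line)
            if match:
                numbers.append(int(match.group(1)))
        found[name] = numbers
    return found


def summarize(found):
    """Return (min, median, max) over all reported line numbers, or None."""
    all_numbers = [n for numbers in found.values() for n in numbers]
    if not all_numbers:
        return None
    return min(all_numbers), median(all_numbers), max(all_numbers)


if __name__ == "__main__":
    # Made-up log snippets for illustration only.
    demo = {
        "shard_a": "start\nparsing error at line 204289\n",
        "shard_b": "start\nparsing error at line 199000\n",
    }
    print(summarize(reported_lines(demo)))
```

A tight spread across shards of different content would be consistent with something other than one malformed read, though it would not by itself confirm a resource limit.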
@kachulis @michaelgatzen I am running into a similar error when reprocessing WGS data initially processed with dragen v3.7.8. The files initially fail the Picard RevertSam step of CRAM-to-uBAM file conversion due to the formatting of a user-defined tag (XQ: Picard expects the XQ tag to be a string, but it is an integer). To convert the files, I set the RevertSam [--RESTORE_HARDCLIPS] flag to 'false', which ignores the XQ tag; this successfully converts the CRAM to a validated uBAM, but it subsequently fails in the SamToFastqAndDragmapAndMba task. As stated above, I suspect this is a resource limitation and can adjust disk/memory for this particular task, but I was wondering if you had run into the XQ issues and had advice on how to handle them. Many thanks,
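For anyone wanting to confirm how XQ is typed in their own files, a small Python sketch follows. It relies only on the SAM text layout, in which optional fields come after the 11 mandatory columns and are written as `TAG:TYPE:VALUE` (`i` for integer, `Z` for string). The function name `tag_types` is hypothetical, and the input is assumed to be SAM text lines, for example from decoding the CRAM with `samtools view`.

```python
from collections import Counter


def tag_types(sam_lines, tag="XQ"):
    """Count the SAM type codes used for one optional tag.

    A result such as Counter({'i': 1200}) means the tag is stored as an
    integer on every record that carries it.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        # Optional fields start after the 11 mandatory SAM columns.
        for field in fields[11:]:
            parts = field.split(":", 2)
            if len(parts) == 3 and parts[0] == tag:
                counts[parts[1]] += 1
    return counts


if __name__ == "__main__":
    # Made-up unmapped record carrying an integer-typed XQ tag.
    record = "\t".join(
        ["r1", "4", "*", "0", "0", "*", "*", "0", "0", "ACGT", "IIII", "XQ:i:5"]
    )
    print(tag_types([record]))
```

A count that is entirely `i` would match the mismatch described above, where the tag is an integer but a string is expected.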
@mmwheel is this still an issue for you?
This was posted by an external user:
I am harmonizing WGS data for the GREGoR consortium using the WGS single sample workflow (in dragen_mode) on AnVIL. I've hit a parsing error when reprocessing a subset of the consortium data (see attached log file).
In brief, the data that throws the error was pre-processed similarly and uploaded to AnVIL in CRAM format. I successfully converted these CRAMs to uBAMs (they passed ValidateSam) and used them as input for the WGS single sample workflow. As you can see in the log, the error comes up during alignment with DRAGMAP. It first prints "When maskLen < 15, the function ssw_align doesn't return 2nd best alignment information." and then throws a parsing error.
Could you please take a look and let me know if this looks familiar, or if you have any additional insights on how to troubleshoot?
Update:
Are there processing steps that are incompatible with reprocessing with WARP? For instance, I know that for these samples, reads were trimmed and duplicates dropped in the initial processing. Could this be causing the error?