ERROR! Tree file of gene COX2 not found : trees2/aligned_COX2_pro.zZ.fasta.treefile #21

Open
salvatierra8 opened this issue Aug 2, 2023 · 15 comments

Comments

@salvatierra8

Greetings,

I am getting the error in the title when running the tree command, and the process does not seem to complete because of it. I have not been able to determine what is causing the error. I checked the COX2_pro.zZ.fasta file:

zZ7200270472082568702zZ
MYFQDSATPNQEEDGQLRLLDTDTSIVAPVDTHIRFIVSAADVIHDFAIPSLGIKIDACPGRLNQVSALIEREGVFYGQCSELCGVAHSAMPIKLEVVSLPEFLE

zZ5232740519337231783zZ
MGRESLVSPRRSRAASARRLLPGLSRRVLTSLLLFSRRRSYGDLSGVSPEICNNGLGCGSPLDTSVAPEGMLGVSRPPALVDSPTSSDDPPSVLPAQNISATHFYVGSNVVRNYGIFLQARNIPGQHFAVHTWSHQYTTSLTNEQVVAEIGWTMQILADNNFGLIPAHWRPRELVSSRFFTSHDSDSSPLPSPRPAYGDVDNRVRAIAREVFGLKTITWNPDEYEARVRGSKSPGLIPLEHVTSVEAVDGELIYVLTPPKYLLLTLYPFVNQSSSRPTLSSSPKAGTSPTSFVASPLAFLPFVLLFLADAKFFLSVFSFSLISGQKPGTRTPTPATSRRVSLQESPSSRLASPMAPLLPWLQQEHPLLDTQNGGAFCCGPEFSWELWERGEGGVGGVGCSWEHSRFGHWFRLGRNELDQIVLIFDTLPRHFDSKTPSELSRPLFLLQPPTLSFSSEDAALSIKQPLQADETDLLDNLVSRLPPLPSPPPSMSISLLPQEVLEPILHLAIQPSTTEGASILLVCTLWHNLGREKLYEHVTLSSQAAYDSYFLLGGSKASWRPLAQAQRTLDYQNLRSLHLRFGPLTKLPFSLSSSNPSPPYLPRFRNLKLIHLDLAKGSSYLSCRPKPMARRVAKLMGGFSPETMILARSSSAEISLSAVMPHHLRRTKYLLLASHHVSHLPSLTPQSCPAMIKNVGFQLLTGVLLPPSALAPLSETTKPSSTRSSPSSSAVGGLQPAVLRDSTFAVCHFTASSVCRMAAVVVSSRQDTSLASRFRSARPNVLFSPPRGLKPSSSQLHPHFPPACSPPVSHLSPRPSPPGYVSPPPSFKDTGSKSCRESRKAKPISSHLSHPFPFLFPSSLPQAFSSTPRSAPSAAIVGNSMLAASGAEGRADLSRWIETQPGSLPTTSTTTSTRRTLPSRSSLPPPRRMYLFPSISGRKGRGADLWSELTSSSSFFFLLLPITLRSSTSSNHPSSTPHSPLHPWILHHRIYHQRHHHQLHSLPLFKTPSTNAFSNLGSLTTSLLAHADEVGVSFDSYMVPDNEIADGQPRLLDVDARVVLPIETHTRFILSSTDVIHDWAVPSLGIKMDAMPGRLNQTSTLIERKGLFFGQCSELCGVYHGFMPIVVEAVELPEYLAWLLAQE

zZ7320208565470240394zZ
MYFQDSATPNQEEDGQLRLLDTDTSIVAPVDTHIRFIVSAADVIHDFAIPSLGIKIDACPGRLNQVSALIEREGVFYGQCSELCGVAHSAMPIKLEVVSLPEFLE

The only odd thing I can discern is that one of the sequences is significantly longer than the others and shares less identity with them. What could be causing this error?
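For readers who want to check for this kind of length outlier without eyeballing the file, here is a minimal Python sketch. It assumes standard FASTA text with `>` header lines; the function names and the 3x-median threshold are illustrative choices, not part of UFCG.

```python
from statistics import median


def read_fasta(text):
    """Parse FASTA text into a dict of header -> sequence (assumes '>' headers)."""
    records, header = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def length_outliers(records, ratio=3.0):
    """Return headers whose sequence is more than `ratio` times the median length."""
    lengths = {h: len(s) for h, s in records.items()}
    if not lengths:
        return []
    mid = median(lengths.values())
    # The threshold is arbitrary (an assumption); tune it for your data.
    return [h for h, n in lengths.items() if n > ratio * mid]
```

Calling `length_outliers(read_fasta(open(path).read()))` on an unaligned gene file would list the headers worth inspecting as possible false positive hits.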

@endixk
Member

endixk commented Aug 7, 2023

Hello,

This looks like a false positive hit, which can be resolved by lowering the search sensitivity once I implement that feature (#19).

For now, could you please try a different tree inference method (FastTree or RAxML) and see if the issue persists? This will show which step is failing: alignment or tree inference.

@salvatierra8
Author

Hello,

I forgot to update the issue: I did use FastTree and it worked. I will also try the other solution and will hopefully remember to comment on it. Thank you very much!

@salvatierra8
Author

I just tested the new feature, but so far the default tree option does not work with any of the sensitivity options, at least for my data. I have rerun the tree with RAxML without any problem, using both the default and the lowest sensitivity options.

@endixk
Member

endixk commented Aug 16, 2023

Could you check whether the same very long COX1 sequence was also found in the profile generated with the lowest sensitivity option? If so, I will look into the reference gene database for these mitochondrial genes.

@salvatierra8
Author

Yes, it still happened, though no longer with COX but with TUB1.

@JWDebler

Hi, I'm getting the same error for another protein:

ERROR! Tree file of gene HEM12 not found : tree/aligned_HEM12_pro.zZ.fasta.treefile

I had a look at the FASTA file and there is no false positive hit as in @salvatierra8's case; my sequences all line up nicely.
I ran it again with RAxML and FastTree and it finished without problems.

@endixk
Member

endixk commented Jun 3, 2024

I recently ran into this error with a smaller dataset and found the exact reason why IQ-TREE fails.

IQ-TREE deduplicates the input MSA, so if the given MSA contains three or fewer unique alignment rows, no tree is produced, which in turn results in this "tree file of gene not found" error.
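To check whether a given alignment falls into this case, the unique rows can be counted directly. The following is a rough Python sketch, assuming the alignment is FASTA text with `>` headers; the function names are hypothetical, and the minimum of four unique rows simply restates the "three or fewer" condition described above.

```python
def unique_rows(alignment_text):
    """Return the set of distinct aligned sequences in FASTA-formatted text."""
    rows, current = [], None
    for line in alignment_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: store the previous one, if any.
            if current is not None:
                rows.append(current)
            current = ""
        elif current is not None:
            current += line
    if current is not None:
        rows.append(current)
    return set(rows)


def too_few_unique_rows(alignment_text, minimum=4):
    """True when the alignment has fewer than `minimum` distinct rows."""
    return len(unique_rows(alignment_text)) < minimum
```

Running `too_few_unique_rows` over each aligned gene file before tree inference would flag the genes likely to hit this error.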

My recent commit fixes this issue and will be included in the next stable release. A binary compiled from the most recent source should no longer have this problem.

I would greatly appreciate it if anyone could test this on their own dataset to see whether the issue is fixed.

@JWDebler

JWDebler commented Jul 3, 2024

I just ran tree with your recent commit version and got this error:

image

The command used:

ufcg tree -i output_lentis -l label -a nucleotide -t 16 -o output_lentis_tree_nucleotide

I am not sure where the -T it is complaining about comes from. The error still happens if I remove the -t 16.

@endixk
Member

endixk commented Jul 5, 2024

The -T option is passed internally to set the multi-thread option for the iqtree binary. This error should not happen unless the dependent binary is either not properly installed or has been updated with this argument removed (which is unlikely).

Please check your iqtree installation and try again. If the error persists, please provide the resulting messages with the -dev option given.

@JWDebler

JWDebler commented Jul 8, 2024

Looks like it was due to an old version of iqtree installed via apt.
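When several installations may coexist (apt, conda, manual builds), it helps to see which binary is actually resolved on PATH. A small Python sketch follows; the candidate names `iqtree` and `iqtree2` are assumptions about how the binary might be named on a given system, not something this thread confirms.

```python
import shutil


def locate_binaries(names=("iqtree", "iqtree2")):
    """Map each candidate binary name to its resolved path on PATH, or None."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in locate_binaries().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

A path under a system directory rather than the active conda environment would suggest that a system package is shadowing the intended install.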

@JWDebler

JWDebler commented Jul 8, 2024

OK, next problem :-)
The tree-building step finished correctly, but the final 'cleanup' didn't happen. All the files in the output directory have 'zZ' in their filenames, and the 'label' tag from the metadata file used during profile has not been applied. All the files instead contain strings such as zZ2641650705628771812zZ.
Previous successful runs cleaned up the directory and moved files into subfolders.
Can I run the respective commands manually somehow?
I just had a look at the prune module, but the run did not produce a .trm file; maybe that is the problem?
Cheers
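As a stopgap for the unreplaced identifiers, the zZ...zZ tokens can be substituted in a tree string or filename by hand. This is only a sketch and not UFCG's own cleanup step: the `labels` mapping from token to label is hypothetical and would have to be assembled by the user, since this thread does not say where UFCG stores it.

```python
import re

# Matches identifiers such as zZ2641650705628771812zZ.
TOKEN = re.compile(r"zZ\d+zZ")


def relabel(text, labels):
    """Replace each zZ<digits>zZ token with its label; unknown tokens stay as-is."""
    return TOKEN.sub(lambda m: labels.get(m.group(0), m.group(0)), text)
```

For example, `relabel(newick_text, {"zZ2641650705628771812zZ": "my_label"})` would rewrite that one identifier wherever it occurs and leave all others untouched.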

@JWDebler

JWDebler commented Jul 9, 2024

This seems to be a problem with the current git version. The version installed via conda (without the iqtree fix) properly processes all the files after the tree building step.

Git version:
image

Conda version:
image

@endixk
Member

endixk commented Jul 9, 2024

@JWDebler I looked into this and found that the Maven-compiled binary doesn't properly include the GSI calculation package as a dependency. The precompiled JAR (including the conda release) doesn't have this problem. The confusing part is that the process finishes without raising any error.

Since I do not have the source code for this package, I need to find a way to properly include it in the pom.xml configuration. Until I find a solution, please use the -G option added in the recent commit, which turns off the GSI analysis and avoids the problem. If you need a GSI-annotated tree output, please use the stable conda version.

@JWDebler

@endixk Thanks, yep renaming works with this commit. The folder doesn't get cleaned up though, all the files are in the same folder while the previous conda version (1.0.5) organises everything neatly like this:
image
I just ran the current conda version (1.0.6) and it also didn't clean up the results folder.

@endixk
Member

endixk commented Jul 10, 2024

The cleanup script is part of the config payload, which is lost after a version update. It should work fine again after downloading the payload with ufcg download -t config.
