Some configurations of taxons in --kingdoms don't work #20

Open
HeloiseMuller opened this issue Dec 5, 2022 · 0 comments

Hi,

I am trying to clean several de novo insect assemblies of common contamination sources: human, bacteria and the UniVec database. For this, I concatenated all the FASTA files into one file called 'cat.fna' and I made a mapping file at the species level called 'ids_cat'.

I then ran:
conterminator dna cat.fna ids_cat conterminator.results tmp_conterminator --threads 20 --blacklist 10239 --kingdoms '2,28384,9606,50557'

I changed the --kingdoms option in order to look for contamination between bacteria (2), other sequences (28384, which is the taxid I used for the UniVec sequences), Homo sapiens (9606) and insects (50557). I do not wish to look for contamination among my insect genomes.
I do not need to ignore any taxa, so I just specified 10239 in the --blacklist option in order to override the default blacklisted taxa (which include 28384, which I need).

Running this command, I get the following error message: rescorediagonal step died.
Interestingly, it works if I only specify 2,28384,9606 or 2,50557 or even 9606,50557 for --kingdoms. Do you have any idea why the combination I used does not work? 28384,50557 does not work either, but I get a different error message: Extractframes died
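
To narrow down which taxon combinations trigger the failure, it may help to enumerate every subset of the four taxids systematically instead of testing a few by hand. The following is only a sketch: it builds the command strings from the invocation above and does not run anything. The per-run suffixes on the result and temporary directory names are hypothetical, added here so that runs would not overwrite each other.

```python
from itertools import combinations

# Taxids from the original command: bacteria, other sequences (UniVec),
# Homo sapiens, insects.
TAXIDS = ["2", "28384", "9606", "50557"]


def kingdom_combinations(taxids):
    """Return every comma-joined subset of at least two taxids."""
    result = []
    for size in range(2, len(taxids) + 1):
        for combo in combinations(taxids, size):
            result.append(",".join(combo))
    return result


def build_command(kingdoms, index):
    """Build one conterminator command line for a given --kingdoms value.

    The _<index> suffixes are hypothetical names, not something
    conterminator requires.
    """
    return (
        "conterminator dna cat.fna ids_cat "
        f"conterminator.results_{index} tmp_conterminator_{index} "
        f"--threads 20 --blacklist 10239 --kingdoms '{kingdoms}'"
    )


if __name__ == "__main__":
    for i, kingdoms in enumerate(kingdom_combinations(TAXIDS)):
        print(build_command(kingdoms, i))
```

Running each printed command in turn and noting which step dies would give a complete table of working and failing combinations to attach to the report.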

Moreover, I do not understand why the output of the run with 2,50557 contains contamination between bacteria and human. Shouldn't it skip looking for contamination between these two taxa entirely in this configuration?

nohup_conterminator.txt

Thanks,

Héloïse

N.B. Just to let you know, it seems that conterminator cannot deal with some patterns of FASTA identifiers. The UniVec sequences have identifiers like gnl|uv|X66730.1:1-2687-49. I had to change these to gnl uv|X66730.1:1-2687-49 and to write in the mapping file:
gnl 28384. Otherwise I got the error: crosstaxonfilterorf step died.
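
For anyone hitting the same problem, the header change described above can be sketched in a few lines of Python. This is a minimal illustration, not part of conterminator: the function name and the file names in the usage part are hypothetical, and it only rewrites header lines that start with `>gnl|`, replacing the first `|` with a space so that the identifier becomes `gnl`.

```python
def rewrite_univec_header(line):
    """Replace the first '|' of a '>gnl|...' FASTA header with a space.

    Sequence lines and other headers are returned unchanged.
    """
    if line.startswith(">gnl|"):
        return line.replace("|", " ", 1)
    return line


def rewrite_fasta(in_path, out_path):
    """Copy a FASTA file, rewriting UniVec-style headers on the way."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(rewrite_univec_header(line))


if __name__ == "__main__":
    # Hypothetical file names, used for illustration only.
    rewrite_fasta("UniVec.fna", "UniVec.renamed.fna")
```

After this rewrite, a single mapping line assigning gnl to taxid 28384 covers every UniVec sequence, as in the workaround above.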
