Extraction of FASTA adds unnecessary .pdb extension if absent, which leads to inconsistencies #45

Open
valentynbez opened this issue Dec 27, 2023 · 2 comments
Labels
help wanted Extra attention is needed

Comments

@valentynbez

For example, when I extract FASTA from afdb_swissprot_v4:

foldcomp_id = AF-B1YUJ2-F1-model_v4
fasta_header = AF-B1YUJ2-F1-model_v4.pdb

When I extract FASTA from my personal db:

foldcomp_id = MIP_00183643.pdb
fasta_header = MIP_00183643.pdb

I cannot use FASTA headers to query the database consistently.
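Until the IDs are consistent, one possible client-side workaround is to strip a trailing ".pdb" from the extracted headers before querying the provided afdb_swissprot_v4 database. The sketch below assumes that database's lookup names carry no extension, as in the example above. `fasta_ids_without_pdb` is a hypothetical helper name, not part of foldcomp. Do not apply it to a personal database whose IDs keep ".pdb".

```shell
# Hypothetical helper (not part of foldcomp): print the ID from each FASTA
# header with a trailing ".pdb" removed, so it matches the extension-less
# names assumed to be in the provided afdb_swissprot_v4 lookup file.
fasta_ids_without_pdb() {
  # Reads the FASTA files given as arguments, or stdin if there are none.
  awk '/^>/ {
    id = substr($1, 2)       # first word of the header, without the leading ">"
    sub(/\.pdb$/, "", id)    # drop a final ".pdb" if present
    print id
  }' "$@"
}
```

For example, `fasta_ids_without_pdb extracted.fasta > ids.txt`, where `extracted.fasta` is a placeholder file name.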

@khb7840
Member

khb7840 commented Dec 29, 2023

Sorry for the inconsistency.
This only happens with the provided databases, where I removed ".pdb" from the .lookup files.
By default, foldcomp saves the file name, including its extension, as the ID, as in your example from the personal db.
For the AFDB databases we provide, we removed the .pdb extensions to save space and make scripting easier, but I think that is what caused this inconsistency.
Appending the ".pdb" extension to the second column of the lookup file, as in the script below, should help.

# Appending pdb extensions
awk -F '\t' 'BEGIN {OFS="\t";} {$2=$2 ".pdb"; print;}' afdb_swissprot_v4.lookup > afdb_swissprot_v4.new.lookup
# Replace afdb_swissprot_v4.lookup with afdb_swissprot_v4.new.lookup
mv afdb_swissprot_v4.new.lookup afdb_swissprot_v4.lookup
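A variant of the awk command above may be safer to re-run. It only appends the extension when column 2 does not already end in ".pdb", so running it twice does not produce names like `X.pdb.pdb`. This is a sketch assuming the same tab-separated lookup layout with the name in column 2. `add_pdb_ext` is a hypothetical function name.

```shell
# Hypothetical helper: append ".pdb" to column 2 of a tab-separated lookup file,
# skipping names that already end in ".pdb" so the fix is safe to re-run.
add_pdb_ext() {
  # NF >= 2 leaves blank or malformed lines untouched; OFS keeps tabs on output.
  awk -F '\t' 'BEGIN { OFS = "\t" }
    NF >= 2 && $2 !~ /\.pdb$/ { $2 = $2 ".pdb" }
    { print }' "$@"
}
```

Usage mirrors the original script: `add_pdb_ext afdb_swissprot_v4.lookup > afdb_swissprot_v4.new.lookup`, then replace the old file with the new one.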

@khb7840 khb7840 added the help wanted Extra attention is needed label Dec 29, 2023
@valentynbez
Author

Proposed behaviour

Remove all file suffixes after the first ".", and throw a warning that "." is not allowed in file names, as it might result in duplicate IDs.
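Here is a minimal sketch of the proposed rule. It assumes the rule is applied to a plain list of file names, one per line. `ids_from_filenames` is a hypothetical name, and this is not how foldcomp currently builds IDs. The sketch keeps everything before the first ".", warns on stderr when a name contains more than one ".", and warns when two names collapse to the same ID.

```shell
# Hypothetical sketch of the proposed behaviour (not current foldcomp code):
# ID = file name with everything from the first "." removed.
ids_from_filenames() {
  awk '
    {
      name = $0
      tmp = name
      ndots = gsub(/\./, "", tmp)   # count "." characters (tmp is a scratch copy)
      if (ndots > 1)
        printf "warning: \"%s\" contains \".\" before its extension\n", name > "/dev/stderr"
      id = name
      sub(/\..*$/, "", id)          # drop everything from the first "."
      if (id in seen)
        printf "warning: \"%s\" and \"%s\" both map to ID \"%s\"\n", seen[id], name, id > "/dev/stderr"
      else
        seen[id] = name
      print id
    }' "$@"
}
```

For instance, `prot.v1.pdb` and `prot.v2.pdb` would both become `prot`, which is the kind of duplication the warning is meant to flag.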
