Update README.md

steineggerlab · Jan 10, 2025 · 12b76f3
1 parent 6f52e8c
Showing 1 changed file with 10 additions and 0 deletions.
diff --git a/README.md b/README.md
@@ -31,6 +31,7 @@ Foldseek enables fast and sensitive comparisons of large protein structure sets,
     - [Databases](#databases)
       - [Create custom databases and indexes](#create-custom-databases-and-indexes)
       - [Create custom database from protein sequence (FASTA)](#create-custom-database-from-protein-sequence-fasta)
+      - [Pad database for fast GPU search](#pad-database-for-fast-gpu-search)
     - [Cluster](#cluster)
       - [Output Cluster](#output-cluster)
         - [Tab-separated cluster](#tab-separated-cluster)
@@ -204,6 +205,15 @@ foldseek createdb db.fasta db --prostt5-model weights --gpu 1
 - Use the `CUDA_VISIBLE_DEVICES` variable to select the GPU device(s).
   - `CUDA_VISIBLE_DEVICES=0` to use GPU 0.
   - `CUDA_VISIBLE_DEVICES=0,1` to use GPUs 0 and 1.
+
+#### Pad database for fast GPU search
+GPU searches require the target database to be reformatted with the `makepaddedseqdb` command, which adds padding to each sequence. The padded database can be used for both CPU and GPU searches.
+```
+# Prepare the database for GPU search
+foldseek makepaddedseqdb db db_pad
+# Perform GPU search
+foldseek search db db_pad aln tmp --gpu 1
+```
 
 ### Cluster
 The `easy-cluster` algorithm is designed for structural clustering by assigning structures to a representative protein structure using structural alignment. It accepts input either as protein structures in PDB/mmCIF format or as protein sequences in [FASTA](#create-custom-database-from-protein-sequence-fasta) format, with support for both flat and gzipped files. By default, easy-cluster generates three output files with the following suffixes: (1) `_clu.tsv`, (2) `_repseq.fasta`, and (3) `_allseq.fasta`. The first file (1) is a [tab-separated](#tab-separated-cluster) file mapping each representative to its members, the second file (2) contains only the [representative sequences](#representative-fasta), and the third file (3) includes all [cluster member sequences](#all-member-fasta).
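As a rough end-to-end illustration of the padding workflow added in this commit, the shell sketch below builds query and target databases, pads only the target, and runs the GPU search. It is a sketch under stated assumptions, not the project's documented pipeline. The `queries/` and `targets/` directories, the database names, the `DRY_RUN` switch and the `run` helper are all hypothetical. The `foldseek` subcommands (`createdb`, `makepaddedseqdb`, `search`), the `--gpu 1` flag and `CUDA_VISIBLE_DEVICES` come from the README, and `tmp` fills the temporary-directory argument that `foldseek search` expects.

```shell
#!/usr/bin/env bash
# Minimal sketch of a GPU search against a padded target database.
# Hypothetical inputs: query structures in ./queries/, target structures in ./targets/.
# By default the script only prints the commands (dry run); set DRY_RUN=0 to execute them.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
# Use GPU 0 unless the caller has already selected devices.
export CUDA_VISIBLE_DEVICES="${CUDA_VISIBLE_DEVICES:-0}"

run() {
  # Print the command, then execute it only when DRY_RUN is 0.
  echo "+ $*"
  if [ "$DRY_RUN" = 0 ]; then
    "$@"
  fi
}

# Build query and target databases from structure files.
run foldseek createdb queries/ queryDB
run foldseek createdb targets/ targetDB
# Pad the target database so it can be searched on the GPU.
run foldseek makepaddedseqdb targetDB targetDB_pad
# Search the padded target; `tmp` is the temporary directory argument.
run foldseek search queryDB targetDB_pad aln tmp --gpu 1
```

Running the script with `DRY_RUN=0` executes the commands. The default dry run only prints them, which makes it easy to check database names before starting a long search. Padding only the target mirrors the README example, where `db_pad` is the second (target) argument of `search`.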