From 12b76f35bfcdde7f23f47109b8fbfad219427e52 Mon Sep 17 00:00:00 2001
From: Martin Steinegger <themartinsteinegger@gmail.com>
Date: Fri, 10 Jan 2025 22:17:46 +0900
Subject: [PATCH] Update README.md

---
 README.md | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/README.md b/README.md
index 66106aac..6d62d831 100644
--- a/README.md
+++ b/README.md
@@ -31,6 +31,7 @@ Foldseek enables fast and sensitive comparisons of large protein structure sets,
     - [Databases](#databases)
       - [Create custom databases and indexes](#create-custom-databases-and-indexes)
       - [Create custom database from protein sequence (FASTA)](#create-custom-database-from-protein-sequence-fasta)
+      - [Pad database for fast GPU search](#pad-database-for-fast-gpu-search)
     - [Cluster](#cluster)
       - [Output Cluster](#output-cluster)
         - [Tab-separated cluster](#tab-separated-cluster)
@@ -204,6 +205,15 @@ foldseek createdb db.fasta db --prostt5-model weights --gpu 1
 - Use the `CUDA_VISIBLE_DEVICES` variable to select the GPU device(s).
   - `CUDA_VISIBLE_DEVICES=0` to use GPU 0.
   - `CUDA_VISIBLE_DEVICES=0,1` to use GPUs 0 and 1.
+
+#### Pad database for fast GPU search
+GPU searches require a padded database: the `makepaddedseqdb` command reformats the database and adds padding to each sequence. The padded database can be used for both CPU and GPU searches.
+```
+# Prepare the database for GPU search
+foldseek makepaddedseqdb db db_pad
+# Perform GPU search
+foldseek search db db_pad result_dir tmp --gpu 1
+```
 
 ### Cluster
 The `easy-cluster` algorithm is designed for structural clustering by assigning structures to a representative protein structure using structural alignment. It accepts input in either as protein structures as PDB/mmCIF or protein sequences as [fasta](#create-custom-database-from-protein-sequence-fasta) format, with support for both flat and gzipped files. By default, easy-cluster generates three output files with the following prefixes: (1) `_clu.tsv`, (2) `_repseq.fasta`, and (3) `_allseq.fasta`. The first file (1) is a [tab-separated](#tab-separated-cluster) file describing the mapping from representative to member, while the second file (2) contains only [representative sequences](#representative-fasta), and the third file (3) includes all [cluster member sequences](#all-member-fasta).