From 12b76f35bfcdde7f23f47109b8fbfad219427e52 Mon Sep 17 00:00:00 2001
From: Martin Steinegger
Date: Fri, 10 Jan 2025 22:17:46 +0900
Subject: [PATCH] Update README.md

---
 README.md | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/README.md b/README.md
index 66106aac..6d62d831 100644
--- a/README.md
+++ b/README.md
@@ -31,6 +31,7 @@ Foldseek enables fast and sensitive comparisons of large protein structure sets,
   - [Databases](#databases)
     - [Create custom databases and indexes](#create-custom-databases-and-indexes)
     - [Create custom database from protein sequence (FASTA)](#create-custom-database-from-protein-sequence-fasta)
+    - [Pad database for fast GPU search](#pad-database-for-fast-gpu-search)
   - [Cluster](#cluster)
     - [Output Cluster](#output-cluster)
       - [Tab-separated cluster](#tab-separated-cluster)
@@ -204,6 +205,15 @@ foldseek createdb db.fasta db --prostt5-model weights --gpu 1
 - Use the `CUDA_VISIBLE_DEVICES` variable to select the GPU device(s).
   - `CUDA_VISIBLE_DEVICES=0` to use GPU 0.
   - `CUDA_VISIBLE_DEVICES=0,1` to use GPUs 0 and 1.
+
+#### Pad database for fast GPU search
+GPU searches require the database to be reformatted, with padding added to each sequence using the `makepaddedseqdb` command. The padded database can be used for both CPU and GPU searches.
+```
+# Prepare the database for GPU search
+foldseek makepaddedseqdb db db_pad
+# Perform GPU search
+foldseek search db db_pad result_dir --gpu 1
+```
 ### Cluster
 The `easy-cluster` algorithm is designed for structural clustering by assigning structures to a representative protein structure using structural alignment. It accepts input in either as protein structures as PDB/mmCIF or protein sequences as [fasta](#create-custom-database-from-protein-sequence-fasta) format, with support for both flat and gzipped files. By default, easy-cluster generates three output files with the following prefixes: (1) `_clu.tsv`, (2) `_repseq.fasta`, and (3) `_allseq.fasta`. 
 The first file (1) is a [tab-separated](#tab-separated-cluster) file describing the mapping from representative to member, while the second file (2) contains only [representative sequences](#representative-fasta), and the third file (3) includes all [cluster member sequences](#all-member-fasta).
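
Reviewer note: the padding step and the `CUDA_VISIBLE_DEVICES` hint from the hunk above can be combined into one workflow. The following is a minimal dry-run sketch, not part of the patch. It only prints the commands, so it runs without Foldseek installed. The database names come from the patch; the `tmp` directory argument and the choice of GPU 0 are assumptions added for illustration, and `pad_and_search` is a hypothetical helper name.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of executing them.
# Remove the leading "echo" on each line to run them for real.

DB=db               # database name used in the patch
DB_PAD=db_pad       # padded database name used in the patch
RESULT=result_dir   # result name used in the patch
TMP=tmp             # assumption: temporary directory, not shown in the patch
GPU_IDS=0           # assumption: restrict the search to GPU 0

pad_and_search() {
    # Step 1: pad each sequence so the database can be used for GPU search
    echo foldseek makepaddedseqdb "$DB" "$DB_PAD"
    # Step 2: search against the padded database on the selected GPU(s)
    echo CUDA_VISIBLE_DEVICES="$GPU_IDS" foldseek search "$DB" "$DB_PAD" "$RESULT" "$TMP" --gpu 1
}

pad_and_search
```

Setting `GPU_IDS=0,1` would expose GPUs 0 and 1 to the search, matching the `CUDA_VISIBLE_DEVICES` examples in the README context lines.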