Commit

Update README.md
martin-steinegger authored Jan 10, 2025
1 parent 6f52e8c commit 12b76f3
Showing 1 changed file with 10 additions and 0 deletions.
10 changes: 10 additions & 0 deletions README.md
@@ -31,6 +31,7 @@ Foldseek enables fast and sensitive comparisons of large protein structure sets,
- [Databases](#databases)
- [Create custom databases and indexes](#create-custom-databases-and-indexes)
- [Create custom database from protein sequence (FASTA)](#create-custom-database-from-protein-sequence-fasta)
- [Pad database for fast GPU search](#pad-database-for-fast-gpu-search)
- [Cluster](#cluster)
- [Output Cluster](#output-cluster)
- [Tab-separated cluster](#tab-separated-cluster)
@@ -204,6 +205,15 @@ foldseek createdb db.fasta db --prostt5-model weights --gpu 1
- Use the `CUDA_VISIBLE_DEVICES` variable to select the GPU device(s).
- `CUDA_VISIBLE_DEVICES=0` to use GPU 0.
- `CUDA_VISIBLE_DEVICES=0,1` to use GPUs 0 and 1.

#### Pad database for fast GPU search
GPU searches require the database to be reformatted, with padding added to each sequence using the `makepaddedseqdb` command. The padded database can be used for both CPU and GPU searches.
```
# Prepare the database for GPU search
foldseek makepaddedseqdb db db_pad
# Perform GPU search
foldseek search db db_pad result_dir --gpu 1
```
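The two steps above can be combined with the `CUDA_VISIBLE_DEVICES` selection described earlier. The following is a minimal sketch that reuses the database names from the example (`db`, `db_pad`, `result_dir`). The `run` helper and the `DRY_RUN` variable are hypothetical conveniences, not part of Foldseek; by default the script only prints the commands and executes them only when `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Sketch: pad a database, then search it on a chosen GPU.
# DRY_RUN and run() are illustrative helpers, not Foldseek features.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"   # default: only print the commands

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Prepare the database for GPU search (same command as above).
run foldseek makepaddedseqdb db db_pad

# Restrict the process to GPU 0, then search against the padded database.
export CUDA_VISIBLE_DEVICES=0
run foldseek search db db_pad result_dir --gpu 1
```

Running the script with `DRY_RUN=0` executes both commands; padding only needs to be done once per database, so the first step can be skipped on later searches.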

### Cluster
The `easy-cluster` algorithm is designed for structural clustering by assigning structures to a representative protein structure using structural alignment. It accepts input either as protein structures in PDB/mmCIF format or as protein sequences in [fasta](#create-custom-database-from-protein-sequence-fasta) format, with support for both flat and gzipped files. By default, `easy-cluster` generates three output files with the following suffixes: (1) `_clu.tsv`, (2) `_repseq.fasta`, and (3) `_allseq.fasta`. The first file (1) is a [tab-separated](#tab-separated-cluster) file describing the mapping from representative to member, while the second file (2) contains only [representative sequences](#representative-fasta), and the third file (3) includes all [cluster member sequences](#all-member-fasta).
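To illustrate the representative-to-member mapping in `_clu.tsv`, the sketch below counts the members of each cluster. It assumes each line holds a representative identifier followed by a member identifier, separated by a tab; the identifiers and the temporary sample file are made up for the example and do not come from a real run.

```shell
#!/usr/bin/env bash
# Sketch: count cluster members per representative in a _clu.tsv-style file.
set -euo pipefail

# Hypothetical sample data; a real file would be written by easy-cluster.
tsv="$(mktemp)"
printf 'repA\trepA\nrepA\tmem1\nrepA\tmem2\nrepB\trepB\n' > "$tsv"

# Column 1 is assumed to be the representative, column 2 the member.
cluster_sizes="$(awk -F'\t' '{ n[$1]++ } END { for (r in n) print r "\t" n[r] }' "$tsv" | sort)"
echo "$cluster_sizes"

rm -f "$tsv"
```

Replacing the temporary file with the path to an actual `_clu.tsv` output gives the size of every cluster, one representative per line.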