I want to run the following code on a Google Colab GPU. The GPU is used briefly, but for most of the run it sits idle, which is causing problems. Could you suggest any improvements?
It looks like it is computing the Smith-Waterman (SW) alignment at this point. That step can be slow on Colab because its CPU cores are very weak. What exactly do you need: the score only, or the full alignment?
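To make the question concrete: the logged `align` call shows `--alignment-mode 2`, and the original command requests `--threads 4`. Below is a minimal sketch of how the search call could be assembled with those two settings exposed. `build_search_cmd` is a hypothetical helper, not part of MMseqs2. The mode meanings in the comments are recalled from the MMseqs2 help text and should be checked with `mmseqs search -h`. Whether a score-only mode can be combined with `--min-seq-id` is not confirmed here.

```python
import os
import shlex


def build_search_cmd(query_db, target_db, result_db, tmp_dir,
                     alignment_mode=None, threads=None, gpu=True):
    """Assemble an `mmseqs search` call as an argument list (hypothetical helper)."""
    if threads is None:
        # Match the thread count to the cores the runtime actually reports.
        threads = os.cpu_count() or 1
    cmd = ["mmseqs", "search", query_db, target_db, result_db, tmp_dir,
           "--threads", str(threads)]
    if gpu:
        cmd += ["--gpu", "1"]
    if alignment_mode is not None:
        # Assumption (verify with `mmseqs search -h`):
        # 1 = score and end position only, 3 = also sequence identity.
        cmd += ["--alignment-mode", str(alignment_mode)]
    return cmd


cmd = build_search_cmd("./mmseqs_work/db", "./mmseqs_work/db_gpu",
                       "./mmseqs_work/search_result", "./mmseqs_work/tmp",
                       alignment_mode=1, threads=2)
print(shlex.join(cmd))
```

Leaving `threads` unset lets the helper fall back to `os.cpu_count()`, so the command does not ask for more threads than the Colab instance has.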
# Install the required tools
!apt-get update -qq
!apt-get install -y -qq wget tar cmake build-essential
# Download and extract MMseqs2 (GPU build)
!wget https://mmseqs.com/latest/mmseqs-linux-gpu.tar.gz -O mmseqs-linux-gpu.tar.gz
!tar xvzf mmseqs-linux-gpu.tar.gz
!mv mmseqs/bin/mmseqs /usr/local/bin/
# Install the CUDA toolkit
!apt-get install -y -qq nvidia-cuda-toolkit
!nvcc --version  # Check that CUDA is installed
# Install PyCUDA and other Python libraries
!pip install -q pycuda biopython pandas
# Check GPU availability on Google Colab
!nvidia-smi
# Create the MMseqs2 working directory
import os
work_dir = "./mmseqs_work"
os.makedirs(work_dir, exist_ok=True)
# Specify the input FASTA file
input_fasta = "/content/Book2test.fasta"  # Change the file path as needed
# Create the MMseqs2 database (only once)
!mmseqs createdb {input_fasta} {work_dir}/db
# Convert the database to the GPU-compatible format (using makepaddedseqdb)
!mmseqs makepaddedseqdb {work_dir}/db {work_dir}/db_gpu
# Pairwise search of the database against itself (using the GPU)
search_result_path = os.path.join(work_dir, "search_result")
tmp_dir = os.path.join(work_dir, "tmp")
os.makedirs(tmp_dir, exist_ok=True)
!mmseqs search {work_dir}/db {work_dir}/db_gpu {search_result_path} {tmp_dir} \
    --min-seq-id 0.8 --threads 4 --search-type 3 --gpu 1 || echo "Search failed!"
# Check whether the .m8 file was generated
!ls {search_result_path}.m8 || echo "No .m8 file found!"
# Parse the output
import pandas as pd
from Bio import SeqIO
# Specify the MMseqs2 output file
search_result_m8 = f"{search_result_path}.m8"  # Path to the MMseqs2 output file
# Load the MMseqs2 output format
columns = ["query", "target", "pident", "alnlen", "mismatch", "gapopen", "qstart", "qend", "tstart", "tend", "evalue", "bits"]
try:
    results = pd.read_csv(search_result_m8, sep="\t", names=columns)
except FileNotFoundError:
    print(f"Error: MMseqs2 output file not found at {search_result_m8}")
except Exception as e:
    print(f"Unexpected error: {e}")
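One thing that may explain the missing file: as far as I know, `mmseqs search` writes a result database (`search_result` plus `.index` and `.dbtype` files) and not a `.m8` table. The `convertalis` module turns that database into BLAST-style tab-separated output. Here is a minimal sketch, assuming the paths from the code above. `convertalis_cmd` and `export_m8` are hypothetical helper names, and the call is skipped when no result database exists (for example after an interrupted search).

```python
import os
import subprocess


def convertalis_cmd(query_db, target_db, result_db, out_m8):
    """Argument list for `mmseqs convertalis` (hypothetical helper)."""
    return ["mmseqs", "convertalis", query_db, target_db, result_db, out_m8]


def export_m8(query_db, target_db, result_db, out_m8):
    """Convert a finished search result to .m8; return False if there is none."""
    # Assumption: a completed result database has an accompanying .index file.
    if not os.path.exists(result_db + ".index"):
        print(f"No result database at {result_db}; did the search finish?")
        return False
    subprocess.run(convertalis_cmd(query_db, target_db, result_db, out_m8),
                   check=True)
    return True


exported = export_m8("./mmseqs_work/db", "./mmseqs_work/db_gpu",
                     "./mmseqs_work/search_result",
                     "./mmseqs_work/search_result.m8")
```

If `exported` is `True`, the `pd.read_csv` call above should find `search_result.m8`; if it is `False`, the search itself did not complete.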
low deletions false
Filter MSA 1
Use filter only at N seqs 0
Maximum seq. id. threshold 0.9
Minimum seq. id. 0.0
Minimum score per column -20
Minimum coverage 0
Select N most diverse seqs 1000
Pseudo count mode 0
Profile output mode 0
Min codons in orf 30
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 1
Forward frames 1,2,3
Reverse frames 1,2,3
Translation table 1
Translate orf 0
Use all table starts false
Offset of numeric ids 0
Create lookup 0
Overlap between sequences 0
Sequence split mode 1
Header split mode 0
Chain overlapping alignments 0
Merge query 1
Search type 3
Search iterations 1
Start sensitivity 4
Search steps 1
Exhaustive search mode false
Filter results during exhaustive search 0
Strand selection 1
LCA search mode false
Disk space limit 0
MPI runner
Force restart with latest tmp false
Remove temporary files false
Translation mode 0
ungappedprefilter ./mmseqs_work/db ./mmseqs_work/db_gpu ./mmseqs_work/tmp/14843528504956813129/pref_0 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -c 0 -e 0.001 --cov-mode 0 --comp-bias-corr 1 --comp-bias-corr-scale 1 --min-ungapped-score 15 --max-seqs 300 --db-load-mode 0 --gpu 1 --gpu-server 0 --prefilter-mode 1 --threads 4 --compressed 0 -v 3
[=================================================================] 100.00% 25.33K 3m 2s 739ms
Time for merging to pref_0: 0h 0m 0s 4ms
Time for processing: 0h 3m 2s 790ms
align ./mmseqs_work/db ./mmseqs_work/db_gpu ./mmseqs_work/tmp/14843528504956813129/pref_0 ./mmseqs_work/search_result --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -a 0 --alignment-mode 2 --alignment-output-mode 0 --wrapped-scoring 0 -e 0.001 --min-seq-id 0.8 --min-aln-len 0 --seq-id-mode 0 --alt-ali 0 -c 0 --cov-mode 0 --max-seq-len 65535 --comp-bias-corr 1 --comp-bias-corr-scale 1 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 0 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --score-bias 0 --realign 0 --realign-score-bias -0.2 --realign-max-seqs 2147483647 --corr-score-weight 0 --gap-open aa:11,nucl:5 --gap-extend aa:1,nucl:2 --zdrop 40 --threads 4 --compressed 0 -v 3
Compute score and coverage
Query database size: 25329 type: Aminoacid
Target database size: 25329 type: Aminoacid
Calculation of alignments
^C
ls: cannot access './mmseqs_work/search_result.m8': No such file or directory
No .m8 file found!
Error: MMseqs2 output file not found at ./mmseqs_work/search_result.m8
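Reading the log, the `--gpu 1` flag is passed to `ungappedprefilter` but does not appear in the `align` command line, which is consistent with the GPU being busy only during the first stage. The sketch below checks this mechanically on an abbreviated copy of the two command lines (most flags omitted for brevity). `gpu_usage_by_module` is a hypothetical helper.

```python
# Abbreviated copy of the two module command lines from the log above.
LOG = """\
ungappedprefilter ./mmseqs_work/db ./mmseqs_work/db_gpu ./mmseqs_work/tmp/14843528504956813129/pref_0 --gpu 1 --gpu-server 0 --threads 4
Time for processing: 0h 3m 2s 790ms
align ./mmseqs_work/db ./mmseqs_work/db_gpu ./mmseqs_work/tmp/14843528504956813129/pref_0 ./mmseqs_work/search_result --alignment-mode 2 --threads 4
"""

# Module names to look for at the start of a log line.
MODULES = ("ungappedprefilter", "prefilter", "align")


def gpu_usage_by_module(log_text):
    """Map each module command line in the log to whether it carries `--gpu 1`."""
    usage = {}
    for line in log_text.splitlines():
        tokens = line.split()
        if tokens and tokens[0] in MODULES:
            # Look for the exact flag/value pair, so `--gpu-server 0` does not count.
            usage[tokens[0]] = any(
                flag == "--gpu" and value == "1"
                for flag, value in zip(tokens, tokens[1:])
            )
    return usage


print(gpu_usage_by_module(LOG))
# {'ungappedprefilter': True, 'align': False}
```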