My machine has a built-in gcc older than version 6. Even after I set `LD_LIBRARY_PATH` and `PATH` to use the new gcc, the build still picked up the old gcc in /usr/bin and compilation failed.
The key is to set `LD_LIBRARY_PATH` and `PATH` before typing `cmake ./`. cmake caches the compiler it detects on its first run (in CMakeCache.txt), so changing the variables afterwards has no effect on a source tree that is already configured. The easiest way may be:
- Delete the current source code with the command `rm -rf anchorwave`.
- Set `LD_LIBRARY_PATH`, `PATH`, `CC`, and `GCC` to the new gcc.
- Re-clone the AnchorWave repository and compile it.
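The three steps above can be sketched as a pair of shell functions. This is only an illustration: `use_gcc` and `rebuild_anchorwave` are hypothetical helper names, the repository URL is passed in by the caller, the sketch sets `CC` and `CXX` (the two variables cmake reads), and running `make` after `cmake ./` is an assumption about the build.

```shell
# Hypothetical helper: point the environment at a gcc installed under $1,
# e.g. use_gcc /usr/local/gcc-7.3.0
use_gcc() {
    prefix=$1
    export PATH="$prefix/bin:$PATH"
    # Append the old value only when it is non-empty, to avoid a trailing colon.
    export LD_LIBRARY_PATH="$prefix/lib64:$prefix/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export CC="$prefix/bin/gcc"
    export CXX="$prefix/bin/g++"
}

# Hypothetical helper: delete the old checkout, clone again, and build.
# $1 = gcc install prefix, $2 = repository URL (supplied by the caller).
rebuild_anchorwave() {
    use_gcc "$1" || return 1
    rm -rf anchorwave
    git clone "$2" anchorwave || return 1
    # The subshell keeps the caller's working directory unchanged.
    ( cd anchorwave && cmake ./ && make )
}
```

A call might look like `rebuild_anchorwave /usr/local/gcc-7.3.0 https://github.com/baoxingsong/AnchorWave.git`, assuming that is the repository you cloned from. Setting the variables before the fresh clone ensures cmake never sees the old compiler.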
Please check the log output of the `cmake ./` command:
-- The C compiler identification is GNU *****
-- The CXX compiler identification is GNU *****
Were the `C` and `CXX` compilers correctly recognized by cmake? Is their version 7.0 or later?
If you have a newer version of GNU gcc installed but cmake does not recognize it, you need to tell cmake where to find it.
I used the following commands to point cmake to the correct GNU gcc:
export LD_LIBRARY_PATH=/usr/local/gcc-7.3.0/lib64:/usr/local/gcc-7.3.0/lib:$LD_LIBRARY_PATH
export CC=/usr/local/gcc-7.3.0/bin/gcc
export CXX=/usr/local/gcc-7.3.0/bin/g++
On a different computer, you need to figure out where those programs are located.
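Before running cmake, it can help to confirm which compiler the environment points to and whether it is new enough. The following is a minimal sketch; `version_ok` and `check_cc` are hypothetical helper names, and the threshold of 7 follows the version requirement mentioned above.

```shell
# Succeed when the major version in "$1" (e.g. "7.3.0" or "11") is >= 7.
version_ok() {
    major=${1%%.*}
    [ "$major" -ge 7 ] 2>/dev/null
}

# Report the compiler named by $CC (falls back to the gcc found on PATH).
check_cc() {
    cc=${CC:-gcc}
    if ! command -v "$cc" >/dev/null 2>&1; then
        echo "compiler not found: $cc"
        return 1
    fi
    ver=$("$cc" -dumpversion)
    if version_ok "$ver"; then
        echo "$cc $ver: OK"
    else
        echo "$cc $ver: too old"
        return 1
    fi
}
```

Running `check_cc` after the `export` commands shows whether `CC` resolves to the intended gcc. `command -v gcc` and `command -v g++` are a quick way to locate the programs on another machine.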
To save memory and allow more threads to run in parallel, AnchorWave does not cache genome sequences in memory but reads them from disk on demand, many times over.
Please avoid network storage if possible. If you have enough memory, you can copy the genome files, especially the query genome file, into memory-backed storage such as `/dev/shm`.
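Copying a genome file into memory-backed storage can be sketched as below. `stage_genome` is a hypothetical helper name, and the destination defaults to `/dev/shm`, which is assumed to exist and to have enough free space.

```shell
# Hypothetical helper: copy a genome file into a fast directory
# (default /dev/shm) and print the path of the copy.
stage_genome() {
    src=$1
    dest_dir=${2:-/dev/shm}
    dest="$dest_dir/$(basename "$src")"
    cp "$src" "$dest" && echo "$dest"
}
```

For example, `query=$(stage_genome query.fa)` gives a path that can be passed to AnchorWave in place of the original file. Remember to delete the copy afterwards, since files in `/dev/shm` occupy RAM.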
Genome masking is not expected to improve the performance of AnchorWave.
AnchorWave does not use any soft-masking information. Hard masking would increase the computational cost of AnchorWave.
anchorwave: /net/eichler/vol26/projects/primate_sv/nobackups/Tools/anchorwave/src/service/TransferGffWithNucmerResult.cpp:203: void readSam(std::vector<AlignmentMatch>&, std::ifstream&, std::map<std::__cxx11::basic_string<char>, Transcript>&, int&, const double&, double&, std::set<std::__cxx11::basic_string<char> >&, const string&, int32_t&, int32_t&, int32_t&, int32_t&, int&, bool&, int&, std::map<std::__cxx11::basic_string<char>, std::__cxx11::basic_string<char> >&): Assertion 'databaseStart < databaseEnd' failed.
Aborted (core dumped)
We have observed three possible causes of this problem:
- AnchorWave may check the open reading frames of the input GFF file against the genome sequence. If many transcript annotations do not start with a start codon, do not end with a stop codon, contain a premature stop codon, or have non-standard splice sites, AnchorWave may fail.
- If the input CDS FASTA file was generated by another application, AnchorWave could fail. The `anchorwave gff2seq` command filters out some short CDS records from the GFF file, because minimap2 does not handle short CDS well. Please use the FASTA file generated by the `anchorwave gff2seq` command as input for minimap2 and for `anchorwave proali` or `anchorwave genoAli`.
- If you set the `-x` or `-m` parameter for `anchorwave gff2seq`, please set the identical parameter for `anchorwave proali` or `anchorwave genoAli`.
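One way to keep the `-x`/`-m` settings identical across steps is to pass them once and reuse them. The sketch below is hypothetical: `run_genoali` is an illustrative wrapper, the output file names are arbitrary, and the AnchorWave and minimap2 option names other than `-x` and `-m` are assumptions based on the AnchorWave README that should be checked against each command's help output.

```shell
# Hypothetical wrapper: run gff2seq, minimap2 and genoAli with one shared
# set of -x/-m options so the two anchorwave steps cannot drift apart.
# Usage: run_genoali ref.gff3 ref.fa query.fa [shared options, e.g. -m 20]
run_genoali() {
    gff=$1
    ref=$2
    query=$3
    shift 3
    # "$@" now holds only the options shared by gff2seq and genoAli.
    anchorwave gff2seq -i "$gff" -r "$ref" -o cds.fa "$@" || return 1
    # Map the CDS written by gff2seq (not a third-party file) to both genomes.
    minimap2 -x splice -a "$ref" cds.fa > ref.sam || return 1
    minimap2 -x splice -a "$query" cds.fa > cds.sam || return 1
    anchorwave genoAli -i "$gff" -as cds.fa -r "$ref" -a cds.sam -ar ref.sam \
        -s "$query" -n anchors -o out.maf "$@"
}
```

Because both `anchorwave` calls receive the same `"$@"`, a value such as `-m 20` (an illustrative value) cannot be given to one step and forgotten in the other.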
The `genoAli` function aligns each query chromosome sequence against the reference chromosome with the same name. Sequences paired by identical names should therefore be similar to each other.
- Some assemblies contain contigs or scaffolds. Contigs or scaffolds that share a name between the reference genome and the query genome are not similar to each other. They should be removed from the input files before performing genome alignment.
- Some assemblies concatenate contigs or scaffolds into chr0/chr00 or something similar. The chr0/chr00 sequences of the reference genome and the query genome should not be aligned using the global alignment strategy implemented in the `proali` function.
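To see which sequences would be paired by name, and to drop the ones that should not be, something like the following sketch may help. The helper names are hypothetical, and the example regular expression assumes chromosomes are named `chr1`, `chr2`, and so on, which will differ between assemblies.

```shell
# Print the sequence names (first word of each FASTA header), sorted.
fasta_names() {
    grep '^>' "$1" | sed 's/^>//; s/[[:space:]].*//' | sort -u
}

# Print the names present in both FASTA files, i.e. the pairs that
# would be aligned against each other by name.
shared_names() {
    ref_names=$(mktemp)
    qry_names=$(mktemp)
    fasta_names "$1" > "$ref_names"
    fasta_names "$2" > "$qry_names"
    comm -12 "$ref_names" "$qry_names"
    rm -f "$ref_names" "$qry_names"
}

# Write only the records whose name matches the regular expression in $2.
keep_matching() {
    awk -v re="$2" '/^>/ { name = substr($1, 2); keep = (name ~ re) } keep' "$1"
}
```

For example, `keep_matching ref.fa '^chr[1-9][0-9]*$' > ref.chr.fa` keeps numbered chromosomes and drops scaffolds as well as chr0/chr00. Review the output of `shared_names ref.fa query.fa` first to decide what to keep.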