PA-Star is software that performs a parallel A-Star search to solve the Multiple Sequence Alignment (MSA) problem. This is a complete Rust rewrite of the original C++ implementation.
- Serial A-Star: Classic A-Star algorithm for MSA
- Parallel A-Star: Multi-threaded implementation using work distribution
- Hybrid CPU Support: Optimized for asymmetric processors (Intel 12th-14th Gen)
- Multiple Cost Matrices: Support for PAM250 (proteins) and nucleotide scoring
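The cost matrix determines how each aligned pair of residues is scored. As a minimal sketch, assuming the usual sum-of-pairs objective for MSA, the cost of an alignment can be computed column by column. The `pair_cost` function and its values (0 for a match, 1 for a mismatch, 2 against a gap) are hypothetical placeholders, not the project's PAM250 or nucleotide matrices.

```rust
/// Hypothetical pairwise cost: 0 for a match, 1 for a mismatch, 2 against a gap.
/// These values are illustrative assumptions, not the project's matrices.
fn pair_cost(a: u8, b: u8) -> u32 {
    match (a, b) {
        (b'-', b'-') => 0,          // two gaps cost nothing
        (b'-', _) | (_, b'-') => 2, // residue aligned against a gap
        _ if a == b => 0,           // identical residues
        _ => 1,                     // mismatch
    }
}

/// Sum-of-pairs cost of one alignment column: add the cost of every unordered pair.
fn column_cost(column: &[u8]) -> u32 {
    let mut total = 0;
    for i in 0..column.len() {
        for j in (i + 1)..column.len() {
            total += pair_cost(column[i], column[j]);
        }
    }
    total
}

/// Cost of a whole alignment. Assumes every row has the same length.
fn alignment_cost(rows: &[&str]) -> u32 {
    let len = rows.first().map_or(0, |r| r.len());
    (0..len)
        .map(|c| {
            let column: Vec<u8> = rows.iter().map(|r| r.as_bytes()[c]).collect();
            column_cost(&column)
        })
        .sum()
}
```

For example, `alignment_cost(&["AC-", "A-G", "ACG"])` adds the three column costs under this toy scoring. Swapping `pair_cost` for a table lookup is all that would change between protein and nucleotide scoring in this sketch.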
- Rust 1.70 or later
- Cargo (comes with Rust)
# Clone the repository
git clone https://github.com/vncsmnl/astar_msa_rust
cd astar_msa_rust
# Build in release mode (optimized)
make release
# or
cargo build --release
# Binaries will be copied to bin/
# Also available in target/release/
After building with make release, use the binaries directly from bin/:
# Serial A-Star
./bin/msa_astar data/seqs/3/synthetic_easy.fasta
# Parallel A-Star
./bin/msa_pastar data/seqs/4/3pmg_ref1.fasta
# Basic usage
cargo run --release --bin msa_astar -- data/seqs/3/synthetic_easy.fasta
# With nucleotide scoring
cargo run --release --bin msa_astar -- -n data/seqs/NUC/EASY_instances/1.fasta
# Save output to file
cargo run --release --bin msa_astar -- -f output.fasta data/seqs/3/synthetic_easy.fasta
# Use all available cores
cargo run --release --bin msa_pastar -- data/seqs/4/3pmg_ref1.fasta
# Specify number of threads
cargo run --release --bin msa_pastar -- -t 4 data/seqs/4/3pmg_ref1.fasta
# With hash configuration
cargo run --release --bin msa_pastar -- --hash-type pzorder --hash-shift 10 data/seqs/5/EASY_instances/synthetic_easy.fasta
# Hybrid CPU configuration (Intel 12th Gen example: 8 P-cores, 8 E-cores)
cargo run --release --bin msa_pastar -- --p-cores-num 8 --p-cores-size 1 --e-cores-num 8 --e-cores-size 1 data/seqs/4/3pmg_ref1.fasta
- <FILE>: Input FASTA file (required)
- -f, --output-file <FILE>: Output FASTA file with alignment
- -n, --nucleotide: Use nucleotide cost matrix (default: PAM250 for proteins)
- -t, --threads <NUM>: Number of threads (default: number of CPUs)
- --hash-type <TYPE>: Hash type: fzorder, pzorder, fsum, psum (default: fzorder)
- --hash-shift <NUM>: Hash shift value (default: 8)
- --no-affinity: Disable thread affinity
- --affinity <LIST>: Thread affinity list (comma-separated core IDs)
- --p-cores-num <NUM>: Number of P-cores (hybrid CPU)
- --p-cores-size <NUM>: Size of P-core groups (hybrid CPU)
- --e-cores-num <NUM>: Number of E-cores (hybrid CPU)
- --e-cores-size <NUM>: Size of E-core groups (hybrid CPU)
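The hash options decide which thread owns a given search node. The sketch below is only an illustration of the idea, under two assumptions that the option names suggest but this README does not confirm: that "sum" means adding the coordinates, "zorder" means Z-order (Morton) bit interleaving, and that the shift drops low bits so neighboring nodes land on the same thread. The function names are hypothetical, and the full/partial distinction (the f and p prefixes) is not modeled.

```rust
/// Sum-based owner (assumed scheme): drop the low `shift` bits of each
/// coordinate, add the results, and reduce modulo the thread count.
/// `threads` must be greater than zero.
fn sum_owner(coord: &[u32], shift: u32, threads: usize) -> usize {
    let sum: u64 = coord
        .iter()
        .map(|&c| u64::from(c.checked_shr(shift).unwrap_or(0)))
        .sum();
    (sum % threads as u64) as usize
}

/// Z-order owner (assumed scheme): interleave the bits of the shifted
/// coordinates into one Morton code, then reduce modulo the thread count.
/// `threads` must be greater than zero.
fn zorder_owner(coord: &[u32], shift: u32, threads: usize) -> usize {
    let mut code: u64 = 0;
    let mut out_bit = 0;
    // Take bit 0 of every coordinate, then bit 1, and so on,
    // stopping once the 64-bit code is full.
    'outer: for bit in 0..32 {
        for &c in coord {
            if out_bit == 64 {
                break 'outer;
            }
            let shifted = c.checked_shr(shift).unwrap_or(0);
            let b = u64::from(shifted >> bit) & 1;
            code |= b << out_bit;
            out_bit += 1;
        }
    }
    (code % threads as u64) as usize
}
```

With a shift of 8, every coordinate inside the same 256-wide block maps to the same owner in both functions, which is the locality effect a larger shift value would be expected to give.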
# Easy test
cargo run --release --bin msa_astar -- data/seqs/Benchmark/1gpb_cutted.fasta
# Medium test with 2 threads
cargo run --release --bin msa_pastar -- -t 2 data/seqs/4/3pmg_ref1.fasta
# Nucleotide alignment
cargo run --release --bin msa_astar -- -n data/seqs/NUC/SARS-COV-2_2/all.fasta
# Save output
cargo run --release --bin msa_pastar -- -f aligned.fasta data/seqs/3/synthetic_veryeasy.fasta
The Rust implementation provides:
- Memory safety without garbage collection
- Zero-cost abstractions
- Fearless concurrency with data race prevention
- Performance comparable to or better than C++
The project is organized into modules:
- coord: Multidimensional coordinates
- node: Search space nodes
- cost: Alignment cost matrices
- sequences: Sequence management
- heuristic_hpair: Pairwise alignment heuristic
- astar: Serial A-Star algorithm
- pastar: Parallel A-Star algorithm
- priority_list: Priority queue implementation
- backtrace: Alignment reconstruction
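To illustrate the role a priority queue plays in A-Star, here is a minimal sketch of an open list that always returns the node with the smallest f = g + h. It uses the standard library's `BinaryHeap` and is not the project's `priority_list` module; the `OpenList` type and its methods are hypothetical names.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical open list keyed by f = g + h; the smallest f pops first.
/// `BinaryHeap` is a max-heap, so entries are wrapped in `Reverse`.
struct OpenList {
    heap: BinaryHeap<Reverse<(u32, Vec<u16>)>>,
}

impl OpenList {
    fn new() -> Self {
        Self { heap: BinaryHeap::new() }
    }

    /// Insert a node with path cost `g`, heuristic estimate `h`, and coordinate.
    fn push(&mut self, g: u32, h: u32, coord: Vec<u16>) {
        self.heap.push(Reverse((g + h, coord)));
    }

    /// Remove and return the entry with the lowest f, if any.
    fn pop(&mut self) -> Option<(u32, Vec<u16>)> {
        self.heap.pop().map(|Reverse(entry)| entry)
    }
}
```

Ties on f are broken by comparing the coordinate vectors, which is simply what tuple ordering gives here. A real implementation may prefer a different tie-breaking rule or support updating the priority of an existing node.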
# Run all tests
cargo test
# Run with output
cargo test -- --nocapture
# Run specific test
cargo test test_coord_creation
# Build with optimizations
cargo build --release
# Time an alignment
time ./target/release/msa_pastar data/seqs/5/EASY_instances/synthetic_easy.fasta
MIT License - See LICENSE.txt
- Original C++ version: Daniel Sundfeld
- Rust port: [Current maintainer]
If you use PA-Star in your research, please cite:
[Original PA-Star paper citation]
Contributions are welcome! Please feel free to submit pull requests.
- Memory Safety: Rust's ownership system eliminates many classes of bugs
- Concurrency: Using Rayon for data parallelism instead of manual thread management
- Dependencies: Using modern Rust crates instead of Boost
- Type Safety: Const generics for compile-time sequence number checking
- Error Handling: Using Result types instead of exceptions
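As an illustration of the const generics point, the sketch below fixes the number of sequences in the type, so a coordinate for 3 sequences cannot be mixed with one for 4 at compile time. The `Coord` type and `successors` method are hypothetical and are not taken from the project's `coord` module. The successor rule shown is the standard one for the MSA search lattice: every nonempty subset of sequences advances by one position.

```rust
/// Hypothetical coordinate for N sequences; N is checked at compile time.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct Coord<const N: usize>([u16; N]);

impl<const N: usize> Coord<N> {
    /// All successors of this coordinate: each nonempty subset of sequences
    /// advances by one, skipping moves past a sequence's end.
    /// Assumes N is smaller than 32 so the subset mask fits in a u32.
    fn successors(&self, lens: &[u16; N]) -> Vec<Coord<N>> {
        let mut out = Vec::new();
        for mask in 1u32..(1u32 << N) {
            let mut next = self.0;
            let mut valid = true;
            for i in 0..N {
                if mask & (1u32 << i) != 0 {
                    if next[i] == lens[i] {
                        valid = false; // this sequence is already consumed
                        break;
                    }
                    next[i] += 1;
                }
            }
            if valid {
                out.push(Coord(next));
            }
        }
        out
    }
}
```

A node in the interior of a 3-sequence lattice has 2^3 - 1 = 7 successors, and the count shrinks as sequences reach their end. Passing a `[u16; 4]` length array to a `Coord<3>` is rejected by the compiler rather than at run time.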
If you encounter build errors, ensure:
- Rust version is 1.70 or later: rustc --version
- Dependencies are up to date: cargo update
For optimal performance:
- Always build in release mode: --release
- Adjust thread count for your CPU
- Use appropriate hash function and shift values
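When choosing a thread count, a reasonable starting point is the number of CPUs the system reports, which is also the documented default for -t. The sketch below shows one way such a default could be resolved with the standard library; `resolve_threads` is a hypothetical helper, not a function from this project.

```rust
use std::thread;

/// Hypothetical helper: use the requested thread count when it is positive,
/// otherwise fall back to the parallelism reported by the system (or 1 if
/// that query fails).
fn resolve_threads(requested: Option<usize>) -> usize {
    match requested {
        Some(n) if n > 0 => n,
        _ => thread::available_parallelism()
            .map(|n| n.get())
            .unwrap_or(1),
    }
}
```

The reported parallelism does not distinguish P-cores from E-cores, so on hybrid CPUs it is worth comparing a few explicit -t values rather than relying on the default alone.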
For large sequences:
- Monitor memory usage
- Adjust thread count if needed
- Consider using a machine with more RAM