Note: This repository is based on teaching materials from the Bioinformatics Algorithms course at the University of Edinburgh (Spring 2025). See full credits below.
Blast101 is a Python command-line tool that mimics the core functionality of the BLAST algorithm alongside Smith-Waterman local alignment. It was initially developed for teaching purposes and uses fundamental bioinformatics algorithms, simple Python logic, and custom scoring heuristics.
This repository represents my extension of the original teaching code through:
- Building a command-line interface (CLI) with multiple modes
- Writing unit tests for key modules
- Improving structure, usability, and documentation
- Adding input validation (e.g., protein/DNA detection)
- Making the tool runnable from the terminal (outside an IDE)
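
The protein/DNA detection listed above could work along the following lines. This is a minimal sketch, not the code in `process_fasta_file.py`: the function name `guess_sequence_type` and the 0.9 threshold are assumptions made for illustration.

```python
def guess_sequence_type(sequence: str, threshold: float = 0.9) -> str:
    """Return 'dna' if the sequence is mostly nucleotide letters, else 'protein'.

    Hypothetical helper: the name and the 0.9 threshold are illustrative only.
    """
    letters = [ch for ch in sequence.upper() if ch.isalpha()]
    if not letters:
        raise ValueError("sequence contains no letters")
    # Count characters that are plausible nucleotide codes (N = unknown base).
    nucleotide_count = sum(ch in "ACGTUN" for ch in letters)
    return "dna" if nucleotide_count / len(letters) >= threshold else "protein"


print(guess_sequence_type("ATGGCGTACGTTAGC"))   # dna
print(guess_sequence_type("MSVDPACPQSLPCFEA"))  # protein
```

A fraction-based test like this tolerates a few ambiguous characters, but a short protein made only of A, C, G and T residues would still be misclassified.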

Features include:
- 🧬 BLAST-like word-based search with configurable word size
- 🧪 Smith-Waterman local alignment scoring
- ⚙️ Customisable via `settings.ini`
- 🔎 Validates FASTA inputs, including DNA-vs-protein detection
- 🧪 Unit tests using the `unittest` framework
- 🖥️ Easy-to-use CLI with usage guidance

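
To illustrate the word-based search idea, here is a minimal sketch of seeding with a configurable word size. It is not taken from `blast_101_search.py`: the function names `build_word_index` and `find_seed_hits` are hypothetical, and it matches exact words only, whereas BLAST itself also considers similar "neighbourhood" words.

```python
from collections import defaultdict


def build_word_index(sequence: str, word_size: int = 3) -> dict[str, list[int]]:
    """Map every overlapping word of length word_size to its start positions."""
    if word_size < 1:
        raise ValueError("word_size must be at least 1")
    index = defaultdict(list)
    for start in range(len(sequence) - word_size + 1):
        index[sequence[start:start + word_size]].append(start)
    return dict(index)


def find_seed_hits(query: str, subject: str, word_size: int = 3) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where a query word occurs in the subject."""
    subject_index = build_word_index(subject, word_size)
    hits = []
    for q_pos in range(len(query) - word_size + 1):
        word = query[q_pos:q_pos + word_size]
        for s_pos in subject_index.get(word, []):
            hits.append((q_pos, s_pos))
    return hits


print(find_seed_hits("MKVLA", "AAMKVLT"))  # [(0, 2), (1, 3)]
```

Each hit is a candidate seed that a BLAST-like search would then try to extend into a longer alignment. Larger word sizes give fewer, more specific seeds.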
To run a basic BLAST101 alignment:

    python run_blast101.py --query nanog.fasta --database uniprot_bit2.fasta --mode blast --verbose

| Mode | Description |
|---|---|
| `blast` | Run BLAST101 alignment |
| `sw` | Run Smith-Waterman alignment |
| `stats` | Run statistical scoring evaluation |
| `test` | Run all unit tests |

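
For readers curious how such a CLI can be wired up, the sketch below parses the same flags with `argparse`. It is an assumption about one reasonable design, not the contents of `run_blast101.py`. In particular, the help strings and the `blast` default for `--mode` are invented here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags shown in the usage example (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="run_blast101.py",
        description="Teaching implementation of BLAST-like search and Smith-Waterman.",
    )
    parser.add_argument("--query", help="FASTA file containing the query sequence")
    parser.add_argument("--database", help="FASTA file containing the sequences to search")
    parser.add_argument(
        "--mode",
        choices=["blast", "sw", "stats", "test"],
        default="blast",  # assumed default, not confirmed by the project
        help="which analysis to run",
    )
    parser.add_argument("--verbose", action="store_true", help="print extra detail")
    return parser


args = build_parser().parse_args(
    ["--query", "nanog.fasta", "--database", "uniprot_bit2.fasta", "--mode", "sw"]
)
print(args.mode, args.query, args.verbose)  # sw nanog.fasta False
```

Restricting `--mode` with `choices` means a mistyped mode is rejected with a usage message before any alignment code runs.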
To view CLI help and examples:

    python run_blast101.py --help

Run all tests with:

    python run_blast101.py --mode test

Test suite includes:
- Smith-Waterman scoring edge cases
- FASTA parsing with malformed/partial inputs
- Dictionary creation from sequences
- Core BLAST101 functionality
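
As an illustration of what a Smith-Waterman edge-case test might look like, here is a self-contained sketch. The scoring function is a simplified stand-in with assumed parameters (match 2, mismatch -1, linear gap -2). The real `smith_waterman_p.py` may use different scoring, and the names `smith_waterman_score` and `SmithWatermanEdgeCases` are hypothetical.

```python
import unittest


def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score using a linear gap penalty."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best


class SmithWatermanEdgeCases(unittest.TestCase):
    def test_empty_sequence_scores_zero(self):
        self.assertEqual(smith_waterman_score("", "ACGT"), 0)

    def test_identical_sequences(self):
        self.assertEqual(smith_waterman_score("ACGT", "ACGT"), 8)

    def test_no_shared_residues(self):
        self.assertEqual(smith_waterman_score("AAAA", "CCCC"), 0)

    def test_local_match_inside_longer_sequences(self):
        self.assertEqual(smith_waterman_score("TTACGTT", "GGACGGG"), 6)
```

Saved to a file such as `test_sw_sketch.py` (a hypothetical name), this can be run with `python -m unittest test_sw_sketch`.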

| File/Folder | Purpose |
|---|---|
| `run_blast101.py` | CLI script to control the entire app |
| `blast_101_search.py` | Heuristic BLAST-like alignment |
| `smith_waterman_p.py` | Smith-Waterman scoring implementation |
| `process_fasta_file.py` | FASTA parser and validator |
| `test_*.py` | Test modules for individual components |
| `programme_settings.py` | Default configuration |
| `settings.ini` | Editable scoring parameters |
| `logs/` | Logs of alignments and results |
| `nanog.fasta`, `uniprot_bit2.fasta` | Example input files |
| `requirements.txt` | List of dependencies (minimal) |

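
An INI file of editable parameters can be read with Python's standard `configparser` module. The sketch below shows the general pattern only: the `[scoring]` section and the `word_size` and `gap_penalty` keys are invented for illustration and may not match the project's actual `settings.ini` or `programme_settings.py`.

```python
import configparser

# Hypothetical settings: section and key names are illustrative assumptions,
# not the real contents of the project's settings.ini.
EXAMPLE_SETTINGS = """
[scoring]
word_size = 3
gap_penalty = -8
"""

parser = configparser.ConfigParser()
parser.read_string(EXAMPLE_SETTINGS)  # for a real file: parser.read("settings.ini")

# Fallbacks act as defaults when a key is missing from the file.
word_size = parser.getint("scoring", "word_size", fallback=3)
gap_penalty = parser.getint("scoring", "gap_penalty", fallback=-8)
extension_cutoff = parser.getint("scoring", "extension_cutoff", fallback=20)

print(word_size, gap_penalty, extension_cutoff)  # 3 -8 20
```

Keeping defaults in code and overrides in the INI file lets users tune scoring without editing Python source.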
This project is licensed under the MIT License.
© 2025 Simon Tomlinson, University of Edinburgh
You are free to reuse, modify, and distribute this code under the terms of the license. Please include appropriate attribution in any reuse or derivative work.
- 🧑‍🏫 Simon Tomlinson – Original author of the code for the Bioinformatics Algorithms MSc course (2025)
- 👩‍💻 cemileblks – Code extensions, CLI implementation, tests, and GitHub curation
- 📘 Based on ICA coursework submitted for grading, later modified for public release

This repository contains beta teaching code and is not intended for production use in real-world bioinformatics pipelines. Accuracy, speed, and feature-completeness may be limited.