Galaxy Agent XP-II is a FastAPI-powered AI assistant that recommends Galaxy tools and workflows in response to natural-language queries. It combines the Galaxy API (accessed through BioBlend), GitHub-hosted workflows, the Gemini large language model (LLM), and SBERT sentence embeddings to produce its recommendations.
- Features
- Architecture
- Quick Start
- API Endpoints
- How It Works
- Example Usage
- Tech Stack
- Security Notes
- Galaxy Tool Recommendation – Suggest relevant Galaxy tools for a given bioinformatics task
- Workflow Recommendation – Recommend publicly available Galaxy workflows (from GitHub) with descriptions, scores, and download links
- Secure Configuration – Uses `.env` for API keys and `config.yml` for paths
- Unified Recommendation Endpoint – `/recommend` merges tool and workflow suggestions in one API call
- Gemini-powered Query Classification – Automatically determines whether a query is about tools, workflows, or both
- Structured JSON Responses – Easy to consume for frontend or third-party applications
- FastAPI Backend – RESTful API endpoints for easy integration with frontend or other services
- Embeddings-based Search – Uses `intfloat/e5-large` for semantic search
Client → [Merged FastAPI endpoint] → [Gemini "classifier" layer]
                          ↙                    ↘
     ToolSuggestionAgent --------------------> WorkflowSuggestionAgent
                          ↘                    ↙
                         Summarizers (optional)
                                  ↓
                             API Response
- Client sends a natural-language query to `/recommend`
- Gemini classifier analyzes whether the query relates to:
  - Tools
  - Workflows
  - Both
- Routes the query to:
  - `ToolSuggestionAgent`
  - `WorkflowSuggestionAgent`
  - or both
- Returns a unified JSON response
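The routing steps above can be sketched in a few lines of Python. This is only an illustration, not the project's actual code: `route_query`, the stub classifier, and the stub agents are hypothetical names, the real classifier calls Gemini, and including a `type` field in single-category responses is an assumption.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for the real agents: takes (query, top_k)
# and returns a list of result dicts.
Agent = Callable[[str, int], List[dict]]

VALID_LABELS = {"tool", "workflow", "both"}


def route_query(
    query: str,
    top_k: int,
    classify: Callable[[str], str],
    tool_agent: Agent,
    workflow_agent: Agent,
) -> Dict[str, object]:
    """Classify the query, call the matching agent(s), and merge the results."""
    label = classify(query)
    if label not in VALID_LABELS:
        label = "both"  # assumption: query both agents on an unexpected label

    response: Dict[str, object] = {"type": label}
    if label in {"tool", "both"}:
        response["tool_results"] = tool_agent(query, top_k)
    if label in {"workflow", "both"}:
        response["workflow_results"] = workflow_agent(query, top_k)
    return response


# Stubs so the sketch runs without Gemini or a Galaxy instance.
def fake_classifier(query: str) -> str:
    return "workflow" if "pipeline" in query.lower() else "tool"


def fake_tool_agent(query: str, top_k: int) -> List[dict]:
    return [{"name": "BWA-MEM", "score": 0.92}][:top_k]


def fake_workflow_agent(query: str, top_k: int) -> List[dict]:
    return [{"name": "bacterial-genome-assembly", "score": 0.8218}][:top_k]


result = route_query(
    "bacterial genome assembly pipeline",
    3,
    fake_classifier,
    fake_tool_agent,
    fake_workflow_agent,
)
print(result)
```

Passing the classifier and agents in as callables keeps the routing logic testable without network access; the real service would plug in the Gemini-backed classifier and the two suggestion agents.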
- Python 3.9+
- Galaxy API Key – Create one from https://usegalaxy.eu/ (User → Preferences → API Key)
- GitHub Personal Access Token – For higher GitHub API rate limits
- Gemini API Key – For advanced LLM-based analysis (optional)
Clone and set up the environment:
git clone https://github.com/iCog-Labs-Dev/galaxy-agent-xp-II.git
cd galaxy-agent-xp-II
# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
Create a `.env` file in the project root:
# Copy these lines into .env and fill in your own values
GALAXY_URL='https://usegalaxy.eu/'
GALAXY_API_KEY='YOUR_GALAXY_API_KEY_HERE' # Replace with your Galaxy API key
GEMINI_API_KEY='YOUR_GEMINI_API_KEY_HERE' # Replace with your Gemini API key
GITHUB_TOKEN='YOUR_GITHUB_TOKEN_HERE' # GitHub personal access token for higher rate limits
Update `config.yml` if you want to customize paths:
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    data_to_encode_path: str = "data/tools_metadata.json"  # Raw Galaxy tools metadata
    embeddings_path: str = "embeddings/galaxy_embeddings.npy"  # Precomputed embeddings
    metadata_path: str = "embeddings/galaxy_metadata.json"  # Cleaned metadata for lookup

# Fetch and preprocess Galaxy tools metadata
python utilities/tools_metadata_downloader/run.py
# Fetch and preprocess publicly available workflows
python utilities/workflow_downloader/run.py
Generate embeddings for semantic search:
# Embed tools
python agents/scripts/embed_tools.py --input utilities/tools_metadata_downloader/data/preprocessed_tools_{timestamp}.json
# Embed workflows
python agents/scripts/embed_workflows.py --input utilities/workflow_downloader/data/preprocessed_workflows_{timestamp}.json
Note: Update `config.yml` to point to the generated embedding and metadata files.
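Before pointing the configuration at new files, it can help to confirm that the embeddings and metadata actually line up. The sketch below assumes the embeddings file is a 2-D NumPy array and the metadata file is a JSON list with one entry per embedding row; neither assumption is confirmed by this README, and `check_artifacts` is a hypothetical helper.

```python
import json
import tempfile
from pathlib import Path

import numpy as np


def check_artifacts(embeddings_path: str, metadata_path: str) -> int:
    """Return the number of entries if embeddings and metadata line up."""
    embeddings = np.load(embeddings_path)
    with open(metadata_path, encoding="utf-8") as fh:
        metadata = json.load(fh)

    if embeddings.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {embeddings.shape}")
    # Assumption: one metadata entry per embedding row, in the same order.
    if len(metadata) != embeddings.shape[0]:
        raise ValueError(
            f"{len(metadata)} metadata entries vs {embeddings.shape[0]} embedding rows"
        )
    return len(metadata)


# Demo with throwaway files standing in for the generated artifacts.
with tempfile.TemporaryDirectory() as tmp:
    emb_file = Path(tmp) / "galaxy_embeddings.npy"
    meta_file = Path(tmp) / "galaxy_metadata.json"
    np.save(emb_file, np.zeros((2, 4), dtype=np.float32))
    meta_file.write_text(
        json.dumps([{"name": "BWA-MEM"}, {"name": "Shovill"}]), encoding="utf-8"
    )
    n_entries = check_artifacts(str(emb_file), str(meta_file))

print(n_entries)
```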
uvicorn agents.app:app --reload
Server runs at: http://127.0.0.1:8000
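Once the server is up, requests can also be sent from Python using only the standard library. This is a minimal sketch: `build_post` and `send` are hypothetical helper names, and the example targets the `/recommend` endpoint with the `query` and `top_k` fields this README documents.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # default address printed by uvicorn


def build_post(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for the given endpoint path."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = build_post(
    "/recommend",
    {"query": "I want to perform RNA-seq alignment", "top_k": 5},
)
# With the server running, uncomment to perform the call:
# print(send(req))
```

Building the request separately from sending it makes it easy to inspect the URL and payload before any network traffic happens.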
GET /health
Response:
{"status": "ok"}
POST /suggest
Request:
{
  "query": "align sequences to a reference genome",
  "top_k": 5
}
Response:
{
  "results": [
    {
      "id": "toolshed.g2.bx.psu.edu/repos/devteam/bwa/bwa_mem/0.7.17.2",
      "name": "BWA-MEM",
      "description": "Align sequences to a reference genome",
      "help": "Performs sequence alignment using BWA-MEM algorithm",
      "category": "Alignment",
      "version": "0.7.17.2",
      "score": 0.92
    }
  ]
}
POST /suggest-workflows
Request:
{
  "query": "bacterial genome assembly pipeline",
  "top_k": 3
}
Response:
{
  "results": [
    {
      "name": "bacterial-genome-assembly",
      "category": "genome-assembly",
      "tools_used": ["Shovill", "Bandage Info", "Bandage Image", "ToolDistillator", "ToolDistillator Summarize"],
      "download_url": "https://raw.githubusercontent.com/galaxyproject/iwc/main/workflows/genome-assembly/bacterial-genome-assembly/bacterial_genome_assembly.ga",
      "readme_excerpt": "Bacterial genome assembly workflow for paired end data...",
      "score": 0.8218
    }
  ]
}
POST /recommend
A single endpoint for callers who do not want to specify whether their query concerns tools or workflows. Gemini classifies the query, and the response contains the matching results.
Request Body:
{
  "query": "I want to perform RNA-seq alignment",
  "top_k": 5
}
Possible Responses:
- If Gemini detects a Tool query
"tool_results": [...]
- If Gemini detects a Workflow query
"workflow_results": [...]
- If Gemini detects Both
{
  "type": "both",
  "tool_results": [...],
  "workflow_results": [...]
}
- Data Collection
  - Galaxy tools fetched via BioBlend
  - Public workflows retrieved from Galaxy IWC GitHub
- Embedding & Indexing
  - Tool and workflow metadata converted to vector embeddings using `intfloat/e5-large` or Gemini Embeddings
- AI-Powered Search
  - User queries converted to embeddings
  - Cosine similarity search finds the most relevant tools and workflows
- FastAPI Service
  - Single API layer to handle requests from frontend or CLI
- Gemini Classification
  - The internal Gemini layer classifies the query as tool, workflow, or both before routing
  - Independent Use: You can still call `/suggest` and `/suggest-workflows` directly if you already know the type of query
  - Merged Endpoint: `/recommend` is ideal for frontend clients or end users who simply provide a natural-language query
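The cosine similarity step described above can be illustrated with plain NumPy. This is a sketch under stated assumptions, not the project's search code: `top_k_cosine` is a hypothetical helper, and the three-dimensional toy vectors stand in for real embeddings, which would come from the embedding model.

```python
import numpy as np


def top_k_cosine(query_vec: np.ndarray, doc_matrix: np.ndarray, k: int):
    """Return (row_index, cosine_score) pairs for the k most similar rows."""
    # Normalize so that a dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    docs = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    scores = docs @ q
    k = min(k, len(scores))
    order = np.argsort(-scores)[:k]  # highest score first
    return [(int(i), float(scores[i])) for i in order]


# Toy data: three "tool" embeddings and one query embedding.
names = ["tool_a", "tool_b", "tool_c"]
docs = np.array(
    [
        [1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [1.0, 1.0, 0.0],
    ]
)
query = np.array([1.0, 0.1, 0.0])

hits = top_k_cosine(query, docs, k=2)
for index, score in hits:
    print(names[index], round(score, 3))
```

In this toy example the query points almost along the first axis, so `tool_a` ranks first and `tool_c`, which shares that direction, ranks second.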
After starting the FastAPI server:
curl -X POST "http://127.0.0.1:8000/suggest" \
-H "Content-Type: application/json" \
-d '{"query":"RNA sequencing analysis","top_k":3}'
- FastAPI – Backend API
- Gemini LLM – Query classification
- BioBlend – Communication with Galaxy API
- Galaxy Platform – Public instance: usegalaxy.eu
- Never commit your real `.env` file to GitHub
- Use the provided sample as a template only
- For production, use environment variable secrets or a vault service
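One way to follow the last point is to read secrets from the process environment at startup and fail early when something is missing. The sketch below uses the variable names from the `.env` example; which variables count as required is an assumption, and `load_secrets` and `mask` are hypothetical helpers.

```python
import os
from typing import Dict, Mapping

# Assumption: the Galaxy settings are mandatory, the other two are optional.
REQUIRED_VARS = ("GALAXY_URL", "GALAXY_API_KEY")
OPTIONAL_VARS = ("GEMINI_API_KEY", "GITHUB_TOKEN")


def load_secrets(environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect configured secrets, raising if a required one is absent."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    secrets = {name: environ[name] for name in REQUIRED_VARS}
    for name in OPTIONAL_VARS:
        if environ.get(name):
            secrets[name] = environ[name]
    return secrets


def mask(value: str, visible: int = 4) -> str:
    """Hide all but the last few characters so keys are safe to log."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


# Demo with an explicit mapping instead of the real environment.
demo = load_secrets(
    {"GALAXY_URL": "https://usegalaxy.eu/", "GALAXY_API_KEY": "abcd1234efgh"}
)
print(mask(demo["GALAXY_API_KEY"]))
```

Masking values before logging keeps API keys out of log files even when configuration is printed for debugging.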