iCog-Labs-Dev/galaxy-agent-xp-II

Galaxy Agent XP-II – AI-Powered Galaxy Tool & Workflow Recommender

Galaxy Agent XP-II is a FastAPI-powered AI assistant that recommends Galaxy tools and workflows based on natural-language queries. It combines the Galaxy API (accessed through BioBlend), GitHub-hosted workflows, the Gemini large language model (LLM), and SBERT sentence embeddings to produce its recommendations.

Features

  • Galaxy Tool Recommendation – Suggest relevant Galaxy tools for a given bioinformatics task
  • Workflow Recommendation – Recommend publicly available Galaxy workflows (from GitHub) with descriptions, scores, and download links
  • Secure Configuration – Uses .env for API keys and config.yml for paths
  • Unified Recommendation Endpoint – /recommend merges tool & workflow suggestions in one API call
  • Gemini-powered Query Classification – Automatically determines if a query is about tools, workflows, or both
  • Structured JSON Responses – Easy to consume for frontend or third-party applications
  • FastAPI Backend – RESTful API endpoints for easy integration with frontend or other services
  • Embeddings-based Search – Uses intfloat/e5-large for semantic search

Architecture

Client → [Merged FastAPI endpoint] → [Gemini "classifier" layer]
       ↙                                      ↘
  ToolSuggestionAgent -------------------->  WorkflowSuggestionAgent
                    ↘                         ↙
                       Summarizers (optional)
                                ↓
                           API Response

Workflow

  1. Client sends a natural-language query to /recommend
  2. Gemini classifier analyzes whether the query relates to:
    • Tools
    • Workflows
    • Both
  3. The query is routed to:
    • ToolSuggestionAgent
    • WorkflowSuggestionAgent
    • or both
  4. The API returns a unified JSON response
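
The routing steps above can be sketched as a small dispatcher. This is only an illustration, not the project's actual code: route_query, the label strings, the fallback to "both", and the "type" field on every response are assumptions, and the classifier and the two agents are passed in as plain callables so the sketch runs without Gemini or any embeddings.

```python
from typing import Callable, Dict, List

# Hypothetical label set; the README only says the classifier
# distinguishes tools, workflows, or both.
VALID_LABELS = {"tool", "workflow", "both"}


def route_query(
    query: str,
    top_k: int,
    classify: Callable[[str], str],
    suggest_tools: Callable[[str, int], List[dict]],
    suggest_workflows: Callable[[str, int], List[dict]],
) -> Dict[str, object]:
    """Classify the query, call the matching agent(s), and merge the results."""
    label = classify(query).strip().lower()
    if label not in VALID_LABELS:
        # Assumption: on an unexpected label, query both agents.
        label = "both"

    response: Dict[str, object] = {"type": label}
    if label in ("tool", "both"):
        response["tool_results"] = suggest_tools(query, top_k)
    if label in ("workflow", "both"):
        response["workflow_results"] = suggest_workflows(query, top_k)
    return response


# Usage with stub agents standing in for the real ones.
result = route_query(
    "bacterial genome assembly pipeline",
    top_k=3,
    classify=lambda q: "workflow",
    suggest_tools=lambda q, k: [{"name": "stub-tool"}],
    suggest_workflows=lambda q, k: [{"name": "stub-workflow"}],
)
print(result)
```

Injecting the classifier as a callable keeps the routing logic testable without an API key.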

Quick Start

1️⃣ Prerequisites

  • Python 3.9+
  • Galaxy API Key – Create one from https://usegalaxy.eu/ (User → Preferences → API Key)
  • GitHub Personal Access Token – For higher GitHub API rate limits
  • Gemini API Key – For advanced LLM-based analysis (optional)

2️⃣ Installation

Clone and set up the environment:

git clone https://github.com/iCog-Labs-Dev/galaxy-agent-xp-II.git
cd galaxy-agent-xp-II

# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate    # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

3️⃣ Configuration

Environment Variables

Create a .env file in the project root:

# Create a .env file and copy this in it

GALAXY_URL='https://usegalaxy.eu/'
GALAXY_API_KEY='GALAXY_API_KEY_HERE'        # Replace with your Galaxy API key
GEMINI_API_KEY='YOUR_GEMINI_API_KEY_HERE'   # Replace with your Gemini API key
GITHUB_TOKEN='YOUR_GITHUB_TOKEN_HERE'       # GitHub personal access token for higher rate limits

Application Configuration

Update config.yml if you want to customize paths. The path settings and their defaults are defined as follows:

from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    data_to_encode_path: str = "data/tools_metadata.json"  # Raw Galaxy tools metadata
    embeddings_path: str = "embeddings/galaxy_embeddings.npy"  # Precomputed embeddings
    metadata_path: str = "embeddings/galaxy_metadata.json"      # Cleaned metadata for lookup

4️⃣ Data Preparation & Embeddings

Fetch Galaxy Tools & Workflows

# Fetch and preprocess Galaxy tools metadata
python utilities/tools_metadata_downloader/run.py

# Fetch and preprocess publicly available workflows
python utilities/workflow_downloader/run.py

Generate Embeddings

Generate embeddings for semantic search:

# Embed tools
python agents/scripts/embed_tools.py --input utilities/tools_metadata_downloader/data/preprocessed_tools_{timestamp}.json

# Embed workflows
python agents/scripts/embed_workflows.py --input utilities/workflow_downloader/data/preprocessed_workflows_{timestamp}.json

Note: Update config.yml to point to the generated embedding and metadata files.

5️⃣ Run the FastAPI Backend

uvicorn agents.app:app --reload

Server runs at: http://127.0.0.1:8000

API Endpoints

Health Check

GET /health

Response:

{"status": "ok"}

Tool Recommendation

POST /suggest

Request:

{
  "query": "align sequences to a reference genome",
  "top_k": 5
}

Response:

{
  "results": [
    {
      "id": "toolshed.g2.bx.psu.edu/repos/devteam/bwa/bwa_mem/0.7.17.2",
      "name": "BWA-MEM",
      "description": "Align sequences to a reference genome",
      "help": "Performs sequence alignment using BWA-MEM algorithm",
      "category": "Alignment",
      "version": "0.7.17.2",
      "score": 0.92
    }
  ]
}
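
For callers that prefer Python, the request above could be sent with the standard library alone. This is a sketch: build_suggest_request and suggest_tools are illustrative names, and the base URL assumes the local development server described in Quick Start.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # assumption: local uvicorn server


def build_suggest_request(query, top_k=5, base_url=BASE_URL):
    """Build the POST /suggest request without sending it."""
    body = json.dumps({"query": query, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/suggest",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def suggest_tools(query, top_k=5):
    """Send the request and return the list under the "results" key."""
    with urllib.request.urlopen(build_suggest_request(query, top_k)) as resp:
        return json.load(resp)["results"]


# Usage (requires the server to be running):
# for tool in suggest_tools("align sequences to a reference genome"):
#     print(tool["name"], tool["score"])
```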

Workflow Recommendation

POST /suggest-workflows

Request:

{
  "query": "bacterial genome assembly pipeline",
  "top_k": 3
}

Response:

{
  "results": [
    {
      "name": "bacterial-genome-assembly",
      "category": "genome-assembly",
      "tools_used": ["Shovill", "Bandage Info", "Bandage Image", "ToolDistillator", "ToolDistillator Summarize"],
      "download_url": "https://raw.githubusercontent.com/galaxyproject/iwc/main/workflows/genome-assembly/bacterial-genome-assembly/bacterial_genome_assembly.ga",
      "readme_excerpt": "Bacterial genome assembly workflow for paired end data...",
      "score": 0.8218
    }
  ]
}

Unified Recommendation Endpoint

POST /recommend

A simpler endpoint for callers who do not want to specify whether a query is about tools or workflows. Gemini classifies the query, and the response contains the matching results.

Request Body:

{
  "query": "I want to perform RNA-seq alignment",
  "top_k": 5
}

Possible Responses:

  • If Gemini detects a Tool query:
{
  "tool_results": [...]
}
  • If Gemini detects a Workflow query:
{
  "workflow_results": [...]
}
  • If Gemini detects Both:
{
  "type": "both",
  "tool_results": [...],
  "workflow_results": [...]
}
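
Because either result list may be absent, a client has to handle all three response shapes. The sketch below shows one way to flatten a /recommend response into a single ranked list. merge_by_score is an illustrative helper, and it assumes each result carries "name" and "score" fields as in the /suggest and /suggest-workflows examples. Tool and workflow scores may come from different indexes, so treat the combined ordering as a rough guide only.

```python
def merge_by_score(response):
    """Flatten tool and workflow results into one list, highest score first."""
    merged = []
    for kind, key in (("tool", "tool_results"), ("workflow", "workflow_results")):
        # A missing key simply contributes no items.
        for item in response.get(key, []):
            merged.append(
                {
                    "kind": kind,
                    "name": item.get("name"),
                    "score": item.get("score", 0.0),
                }
            )
    return sorted(merged, key=lambda row: row["score"], reverse=True)


# Usage with a hand-written response in the "both" shape.
sample = {
    "type": "both",
    "tool_results": [{"name": "BWA-MEM", "score": 0.92}],
    "workflow_results": [{"name": "bacterial-genome-assembly", "score": 0.8218}],
}
for row in merge_by_score(sample):
    print(row["kind"], row["name"], row["score"])
```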

How It Works

  1. Data Collection
    • Galaxy tools metadata and publicly available workflows are fetched and preprocessed by the downloader utilities
  2. Embedding & Indexing
    • Tools & workflows metadata are converted to vector embeddings using intfloat/e5-large or Gemini Embeddings
  3. AI-Powered Search
    • User queries are converted to embeddings
    • Cosine similarity search finds the most relevant tools and workflows
  4. FastAPI Service
    • A single API layer handles requests from the frontend or CLI
  5. Gemini Classification
    • The internal Gemini layer classifies the query into tool, workflow, or both before routing
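
The cosine similarity search in step 3 can be illustrated with a dependency-free sketch. It is not the project's implementation, which presumably works on the precomputed .npy embeddings. The tiny two-dimensional vectors, the function names, and the rounding of scores are assumptions for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query_vec, item_vecs, metadata, k=5):
    """Return the k metadata entries whose vectors best match the query."""
    scored = [
        (cosine_similarity(query_vec, vec), meta)
        for vec, meta in zip(item_vecs, metadata)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [dict(meta, score=round(score, 4)) for score, meta in scored[:k]]


# Usage with toy vectors standing in for real embeddings.
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
items = [{"name": "a"}, {"name": "b"}, {"name": "c"}]
print(top_k([1.0, 0.0], vectors, items, k=2))
```

With real embeddings the same ranking is usually computed as one matrix-vector product over normalized vectors, which is much faster than a Python loop.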

Key Notes

  • Independent Use: You can still call /suggest and /suggest-workflows independently if you already know the type of query
  • Merged Endpoint: /recommend is ideal for frontend clients or end users who simply provide a natural-language query

Example Usage

After starting the FastAPI server:

curl -X POST "http://127.0.0.1:8000/suggest" \
    -H "Content-Type: application/json" \
    -d '{"query":"RNA sequencing analysis","top_k":3}'

Tech Stack

  • FastAPI – Backend API
  • Gemini LLM – Query classification
  • BioBlend – Communication with Galaxy API
  • Galaxy Platform – Public instance: usegalaxy.eu

Security Notes

  • Never commit your real .env file to GitHub
  • Use the provided sample as a template only
  • For production, use environment variable secrets or a vault service
