A multi-stage pipeline for extracting Metal-Organic Polyhedra (MOPs) information from scientific papers using MCP-enhanced LLM agents, producing structured knowledge graphs (TTL).
- Python 3.11+
- (Recommended) WSL on Windows for a smoother Linux-like environment
- Docker (only if you use MCP tools that require it; some tools are stdio-only)
```bash
# venv
python -m venv .venv
source .venv/bin/activate  # Windows PowerShell: .venv\Scripts\Activate.ps1

# or conda
conda create -n mcp_layer python=3.11
conda activate mcp_layer
```

Then install dependencies:

```bash
pip install -r requirements.txt
```

This repo git-ignores many runtime folders (caches, logs, generated prompts/scripts).
Some modules (notably models/locations.py) require directories to exist at import time.
Run:
```bash
python scripts/bootstrap_repo.py
```

If you plan to run grounding/lookup agents, also create grounding-cache folders:

```bash
python scripts/bootstrap_repo.py --with-grounding-cache ontospecies
```

Copy the example MCP config:

```bash
cp configs/mcp_configs.json.example configs/mcp_configs.json
```

Then edit configs/mcp_configs.json to reflect your local environment (paths, server commands).
This repo does not ship a committed .env.example. Create .env in the repo root with what your environment expects.
At minimum, many agents expect something like:
```bash
API_KEY=...
BASE_URL=...
```

The exact keys depend on your models/ModelConfig.py / models/LLMCreator.py configuration.
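As a minimal sketch of how such a file can be consumed (this parser is illustrative only — it is not the repo's actual loader, and real projects typically use python-dotenv's load_dotenv()):

```python
import os
from pathlib import Path


def load_dotenv_minimal(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines from a .env file into os.environ.

    Illustrative sketch: ignores quoting, multi-line values, and
    export prefixes; skips blank lines and # comments.
    """
    values = {}
    env_file = Path(path)
    if not env_file.exists():
        return values
    for line in env_file.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
        # Do not clobber variables already set in the environment.
        os.environ.setdefault(key.strip(), values[key.strip()])
    return values
```

With a .env containing API_KEY=... and BASE_URL=..., agents could then read both via os.environ.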
After running python scripts/bootstrap_repo.py, you should have (among others):
- data/ (runtime data, cached results; gitignored)
- data/log/ (required; some modules error if missing)
- data/ontologies/ (place ontology T-Box TTLs here)
- data/grounding_cache/<ontology>/labels (optional; for Script C fuzzy lookup)
- raw_data/ (PDF inputs; gitignored)
- sandbox/ (scratch scripts; gitignored)
- ai_generated_contents*/ (LLM-generated artifacts; gitignored)
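Before importing modules that need these folders at import time (notably models/locations.py), a quick sanity check can help. The helper below is hypothetical, not part of the repo:

```python
from pathlib import Path

# Folders this README expects bootstrap_repo.py to create
# (data/log is required at import time by some modules).
REQUIRED_DIRS = ["data", "data/log", "data/ontologies", "raw_data", "sandbox"]


def missing_dirs(root: str = ".") -> list[str]:
    """Return the required directories that do not yet exist under root."""
    base = Path(root)
    return [d for d in REQUIRED_DIRS if not (base / d).is_dir()]
```

If missing_dirs() returns a non-empty list, re-run python scripts/bootstrap_repo.py.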
There are two “layers”:
- Ontology-specific MCP lookup server (generated for a given ontology)
- Grounding consumer agent that applies mappings to TTLs
This repo includes configs/grounding.json to run the OntoSpecies lookup server via stdio.
The grounding agent lives at src/agents/grounding/grounding_agent.py.
- Single file:
```bash
python -m src.agents.grounding.grounding_agent --ttl path/to/file.ttl --write-grounded-ttl
```

- Batch folder (recursively processes *.ttl, skipping *_grounded.ttl and *link.ttl):

```bash
python -m src.agents.grounding.grounding_agent --batch-dir evaluation/data/merged_tll --write-grounded-ttl
```

Notes:
- Internal merge (deduplicating identical nodes across TTLs) runs by default in batch mode; disable with --no-internal-merge.
- The default grounding materialization mode is replace (replaces source_iri with grounded_iri). You can switch to sameas with --grounding-mode sameas.
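To make the two modes concrete, here is a minimal sketch of the materialization difference on plain (subject, predicate, object) triples. The function name and the owl:sameAs convention are assumptions for illustration, not the agent's actual implementation:

```python
OWL_SAMEAS = "http://www.w3.org/2002/07/owl#sameAs"


def materialize(triples, mapping, mode="replace"):
    """Apply grounding mappings (source_iri -> grounded_iri) to triples.

    mode='replace': rewrite every occurrence of a source IRI in place.
    mode='sameas':  keep the triples untouched and append owl:sameAs links.
    """
    if mode == "replace":
        return [tuple(mapping.get(t, t) for t in triple) for triple in triples]
    if mode == "sameas":
        links = [(src, OWL_SAMEAS, dst) for src, dst in mapping.items()]
        return list(triples) + links
    raise ValueError(f"unknown grounding mode: {mode}")
```

replace yields a smaller graph with canonical IRIs only; sameas preserves the original extraction IRIs and leaves the identity resolution explicit in the output.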
The main pipeline entrypoint is mop_main.py (see its CLI help):
```bash
python mop_main.py --help
```

Use the following canonical Python entrypoints to generate plans, prompts, and MCP scripts.
```bash
python -m src.agents.scripts_and_prompts_generation.task_division_agent \
  --tbox data/ontologies/ontosynthesis.ttl \
  --output configs/task_division_plan.json \
  --model gpt-5
```

```bash
python -m src.agents.scripts_and_prompts_generation.task_prompt_creation_agent \
  --version 1 \
  --plan configs/task_division_plan.json \
  --tbox data/ontologies/ontosynthesis.ttl \
  --model gpt-4.1 \
  --parallel 3
```

Legacy plan-driven mode (matches the old run_extraction_prompt_creation.sh intent):
```bash
python -m src.agents.scripts_and_prompts_generation.task_extraction_prompt_creation_agent \
  --version 1 \
  --plan configs/task_division_plan.json \
  --tbox data/ontologies/ontosynthesis.ttl \
  --model gpt-5 \
  --parallel 3
```

Iterations-driven mode (uses ontology flags + ai_generated_contents_candidate/iterations/**/iterations.json):
```bash
python -m src.agents.scripts_and_prompts_generation.task_extraction_prompt_creation_agent \
  --ontosynthesis \
  --version 1 \
  --model gpt-5 \
  --parallel 3
```

4) Generate MCP underlying scripts from T-Box (writes into ai_generated_contents_candidate/scripts/…)
All ontologies from ape_generated_contents/meta_task_config.json:
```bash
python -m src.agents.scripts_and_prompts_generation.mcp_underlying_script_creation_agent --all
```

Single ontology (by short name or by TTL path):
```bash
python -m src.agents.scripts_and_prompts_generation.mcp_underlying_script_creation_agent \
  --ontology ontosynthesis \
  --model gpt-5 \
  --split
```