This file provides guidance for AI coding agents working on the markdown-it-py repository.
markdown-it-py is a Python port of markdown-it, the JavaScript Markdown parser. It provides:
- A Markdown parser following the CommonMark spec
- Configurable syntax: you can add new rules and even replace existing ones
- Pluggable architecture with support for syntax extensions (see mdit-py-plugins)
- High performance with efficient parsing algorithms
- Safe by default with configurable HTML handling
markdown-it-py is designed as a foundation for projects requiring robust Markdown parsing in Python, with the same design principles as the original JavaScript implementation.
pyproject.toml # Project configuration and dependencies (flit)
tox.ini # Tox test environment configuration (use with tox-uv for faster env creation)
markdown_it/ # Main source code
├── __init__.py # Package init
├── main.py # MarkdownIt main class
├── token.py # Token dataclass
├── ruler.py # Ruler class for managing rules
├── tree.py # SyntaxTreeNode for AST representation
├── renderer.py # RendererHTML and RendererProtocol
├── parser_core.py # ParserCore - top-level rules executor
├── parser_block.py # ParserBlock - block-level tokenizer
├── parser_inline.py # ParserInline - inline tokenizer
├── utils.py # Utility types (OptionsType, PresetType, etc.)
├── common/ # Common utilities
├── helpers/ # Helper functions
├── presets/ # Configuration presets (commonmark, gfm-like, zero, etc.)
├── rules_core/ # Core parsing rules
├── rules_block/ # Block-level parsing rules
├── rules_inline/ # Inline parsing rules
├── cli/ # Command-line interface
└── py.typed # PEP 561 marker
tests/ # Test suite
├── test_api/ # API tests
├── test_cmark_spec/ # CommonMark spec compliance tests
├── test_port/ # Port-specific tests
├── test_tree/ # SyntaxTreeNode tests
├── fuzz/ # Fuzzing tests for OSS-Fuzz
├── test_cli.py # CLI tests
├── test_linkify.py # Linkify tests
└── test_tree.py # Tree tests
docs/ # Documentation source
├── conf.py # Sphinx configuration
├── index.md # Documentation index
├── architecture.md # Design principles
├── using.md # Usage guide
├── plugins.md # Plugin documentation
├── contributing.md # Contributing guide
├── performance.md # Performance benchmarks
└── security.md # Security considerations
benchmarking/ # Performance benchmarking
scripts/ # Utility scripts
All commands should be run via tox for consistency. The project uses tox-uv for faster environment creation.
# Run all tests
tox
# Run tests with specific Python version
tox -e py311
# Run tests with plugins
tox -e py311-plugins
# Run a specific test file
tox -- tests/test_api/test_main.py
# Run a specific test function
tox -- tests/test_api/test_main.py::test_get_rules
# Run tests with coverage
tox -- --cov=markdown_it --cov-report=html

# Build docs (clean)
tox -e docs-clean
# Build docs (incremental)
tox -e docs-update
# Specific builder (e.g., linkcheck)
BUILDER=linkcheck tox -e docs-update

# Run core benchmarks
tox -e py311-bench-core
# Run package comparison benchmarks
tox -e py311-bench-packages
# Run profiler
tox -e profile

# Run fuzzer on testcase file
tox -e fuzz path/to/testcase

# Run pre-commit hooks on all files
pre-commit run --all-files
# Type checking (via pre-commit)
pre-commit run mypy --all-files
# Linting and formatting (via pre-commit)
pre-commit run ruff --all-files
pre-commit run ruff-format --all-files

- Formatter/Linter: Ruff (configured in pyproject.toml)
- Type Checking: Mypy with strict settings (configured in pyproject.toml)
- Pre-commit: Use pre-commit hooks for consistent code style (.pre-commit-config.yaml)
- Type annotations: Use complete type annotations for all function signatures. The codebase uses strict mypy settings.
- Docstrings: Use Google-style or Sphinx-style docstrings. Types are not required in docstrings as they should be in type hints.
- Pure functions: Where possible, write pure functions without side effects.
- Immutability: Prefer immutable data structures. The Token class uses a dataclass with appropriate mutability.
- Testing: Write tests for all new functionality. Use pytest-regressions for output comparison tests.
from __future__ import annotations
from typing import Sequence
def parse_blocks(
    state: StateBlock,
    start_line: int,
    end_line: int,
    silent: bool = False,
) -> bool:
    """Parse block-level content.

    :param state: The parser state object
    :param start_line: Starting line number
    :param end_line: Ending line number
    :param silent: If True, only validate without generating tokens
    :return: True if parsing succeeded
    """
    ...

markdown-it-py follows a multi-stage parsing pipeline:
Markdown → Tokens → HTML
The parsing happens through three nested chains:
- Core Chain (parser_core.py): Top-level rules that orchestrate the parsing
- Block Chain (parser_block.py): Parse block-level content (headings, lists, code blocks, etc.)
- Inline Chain (parser_inline.py): Parse inline content (emphasis, links, code spans, etc.)
Instead of a traditional AST, markdown-it-py uses a token stream representation:
- Tokens are a simple sequence (list)
- Opening and closing tags are separate tokens
- Inline containers have nested tokens in their .children property
- This design follows the KISS principle and allows easy manipulation
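For example, parsing a one-line emphasis shows both properties of the design, the flat open/close pairing and the nested inline children:

```python
from markdown_it import MarkdownIt

md = MarkdownIt()
tokens = md.parse("*hello*")

# Opening and closing tags are separate tokens in a flat list
assert [t.type for t in tokens] == ["paragraph_open", "inline", "paragraph_close"]

# The "inline" container token holds its parsed content in .children
children = tokens[1].children
assert [t.type for t in children] == ["em_open", "text", "em_close"]
assert children[1].content == "hello"
```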
The main entry point for parsing:
- parse(): Parse markdown and return token stream
- render(): Parse and render to HTML
- use(): Add plugins
- enable() / disable(): Control rules
- set(): Set options
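These methods return the MarkdownIt instance, so configuration calls can be chained. A small sketch (the table syntax used here is a standard GFM-style table):

```python
from markdown_it import MarkdownIt

# Start from the commonmark preset, then selectively enable an extra rule
md = MarkdownIt("commonmark").enable("table")

html = md.render("| a |\n| - |\n| 1 |")
assert "<table>" in html
assert "<td>1</td>" in html
```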
Manages parsing rules:
- Rules can be enabled/disabled by name
- Rules can be inserted at specific positions
- Each parser (core/block/inline) has its own Ruler instance
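A sketch of how rules are inspected and toggled through the public MarkdownIt API (which delegates to the per-parser Ruler instances):

```python
from markdown_it import MarkdownIt

md = MarkdownIt()

# get_active_rules() reports the enabled rule names per chain
active = md.get_active_rules()
assert "emphasis" in active["inline"]
assert "heading" in active["block"]

# Rules can be toggled by name at runtime
md.disable("emphasis")
assert "emphasis" not in md.get_active_rules()["inline"]
assert "<em>" not in md.render("*text*")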
Represents a single token in the stream:
- type: Token type (e.g., "paragraph_open", "text", "heading_close")
- tag: HTML tag to use for rendering
- attrs: Attributes for the HTML tag
- content: Raw content
- children: Nested tokens for inline containers
- level: Nesting level
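Tokens can also be constructed and inspected directly, which is useful when writing rules or tests:

```python
from markdown_it.token import Token

# Token(type, tag, nesting): nesting is 1 for opening tags,
# -1 for closing tags, and 0 for self-contained tokens
tok = Token("link_open", "a", 1)
tok.attrSet("href", "https://example.com")

assert tok.attrGet("href") == "https://example.com"
assert tok.tag == "a" and tok.nesting == 1
```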
Converts token stream to HTML:
- render(): Convert full token stream to HTML
- renderToken(): Render a single token
- Custom render rules can be added via add_render_rule()
Input Markdown
↓
Core Rules (normalize, etc.)
↓
Block Parser → Block Tokens
↓
Core Rules (intermediate)
↓
Inline Parser → Inline Tokens (for each block token with "inline" type)
↓
Core Rules (final: abbreviations, footnotes, linkify, etc.)
↓
Token Stream
↓
Renderer
↓
HTML Output
- Tests use pytest with fixtures from conftest.py files
- CommonMark spec tests are in tests/test_cmark_spec/
- Port-specific tests verify JavaScript markdown-it parity
- Regression testing uses pytest-regressions for output comparison
- Fuzzing tests are in tests/fuzz/ for integration with OSS-Fuzz
- For API tests, add to the appropriate file in tests/test_api/
- For new syntax/rules, add test cases to tests/test_port/
- For CommonMark compliance, run the spec test updater
- Use the file_regression fixture for comparing output against stored fixtures
- Use parameterization for multiple test scenarios
- Test coverage: Write tests for all new functionality and bug fixes
- Isolation: Each test should be independent
- Descriptive names: Test function names should describe what is being tested
- Regression testing: Use file_regression.check() for complex output comparisons
- Parametrization: Use @pytest.mark.parametrize for multiple test scenarios
import pytest
from markdown_it import MarkdownIt
def test_basic_parsing():
    md = MarkdownIt()
    result = md.render("# Heading\n\nParagraph")
    assert "<h1>Heading</h1>" in result
    assert "<p>Paragraph</p>" in result

@pytest.mark.parametrize(
    "input_text,expected",
    [
        ("**bold**", "<strong>bold</strong>"),
        ("*italic*", "<em>italic</em>"),
    ],
)
def test_emphasis(input_text, expected):
    md = MarkdownIt()
    result = md.render(input_text)
    assert expected in result

Use this format:
<EMOJI> <KEYWORD>: Summarize in 72 chars or less (#<PR>)
Optional detailed explanation.
Keywords:
- ✨ NEW: – New feature
- 🐛 FIX: – Bug fix
- 👌 IMPROVE: – Improvement (no breaking changes)
- ‼️ BREAKING: – Breaking change
- 📚 DOCS: – Documentation
- 🔧 MAINTAIN: – Maintenance changes only (typos, etc.)
- 🧪 TEST: – Tests or CI changes only
- ♻️ REFACTOR: – Refactoring
Pull request titles use the same format as commit messages, except that the KEYWORD may be omitted, leaving only the EMOJI.
When submitting changes:
- Description: Include a meaningful description or link explaining the change
- Tests: Include test cases for new functionality or bug fixes
- Documentation: Update docs if behavior changes or new features are added
- Changelog: Update CHANGELOG.md under the appropriate section
- Code Quality: Ensure pre-commit run --all-files passes
- Type Checking: Ensure mypy passes with strict settings
- CommonMark Compliance: Don't break existing CommonMark spec tests unless intentional
- pyproject.toml - Project configuration, dependencies, and tool settings (Ruff, Mypy)
- tox.ini - Test environment configuration
- markdown_it/main.py - MarkdownIt main class
- markdown_it/token.py - Token dataclass
- markdown_it/renderer.py - HTML renderer
- markdown_it/parser_core.py - Core parsing rules
- markdown_it/parser_block.py - Block-level parser
- markdown_it/parser_inline.py - Inline parser
- markdown_it/ruler.py - Ruler class for managing rules
- markdown_it/utils.py - Type definitions and utilities
- markdown_it/presets/ - Configuration presets (commonmark, gfm-like, zero)
- Use the CLI with the markdown-it command to test parsing interactively
- Check the token stream with md.parse(text) to see tokens before rendering
- Use md.render(text) to see the final HTML output
- Enable specific rules with md.enable(['rule_name'])
- Disable rules with md.disable(['rule_name'])
- Use tox -- -v --tb=long for verbose test output with full tracebacks
- Check the live demo (JavaScript version) to compare behavior
from markdown_it import MarkdownIt
md = MarkdownIt()
# See the token stream
tokens = md.parse("# Heading\n\nParagraph")
for token in tokens:
    print(f"{token.type} | {token.tag} | {token.content}")

# See available rules
print(md.get_all_rules())

# Enable/disable specific rules
md.disable(['emphasis'])
result = md.render("*text*")  # Won't be emphasized

- Determine which parser the rule belongs to (core/block/inline)
- Create the rule function in the appropriate rules_*/ directory
- Rule signature for block rules:
  def rule_name(state: StateBlock, startLine: int, endLine: int, silent: bool) -> bool: ...
- Rule signature for inline rules:
  def rule_name(state: StateInline, silent: bool) -> bool: ...
- Register the rule in the appropriate parser's __init__ method
- Add tests for the new rule
- Update documentation if it's a user-facing feature
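A minimal sketch of these steps for an inline rule. The rule name, the "%%" marker, and its behavior are invented for illustration; only the StateInline API calls (src, pos, push) are real:

```python
from markdown_it import MarkdownIt
from markdown_it.rules_inline import StateInline

def double_percent(state: StateInline, silent: bool) -> bool:
    """Hypothetical inline rule: render a literal '%%' as an ellipsis."""
    if not state.src.startswith("%%", state.pos):
        return False
    if not silent:  # in silent (validation) mode a rule must not push tokens
        token = state.push("text", "", 0)
        token.content = "…"
    state.pos += 2  # consume the two marker characters
    return True

md = MarkdownIt()
# Register in the inline chain, ordered before the built-in "emphasis" rule
md.inline.ruler.before("emphasis", "double_percent", double_percent)
html = md.render("a %% b")
```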
- Create a plugin function that receives the MarkdownIt instance:

  def my_plugin(md: MarkdownIt, **options):
      """My custom plugin."""
      # Add rules
      md.block.ruler.before("fence", "my_block_rule", my_block_rule)
      md.inline.ruler.after("emphasis", "my_inline_rule", my_inline_rule)
      # Add render rules
      md.add_render_rule("my_token_type", render_my_token)

- Use the plugin with md.use(my_plugin, option1=value1)
- See existing plugins in mdit-py-plugins for examples
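The steps above can be exercised end to end with a small, self-contained plugin. The plugin name, the prefix option, and the id scheme are invented for illustration; use() and add_render_rule() are the real extension points:

```python
from markdown_it import MarkdownIt

def heading_ids(md: MarkdownIt, prefix: str = "h-") -> None:
    """Hypothetical plugin: add a sequential id attribute to every heading."""
    counter = {"n": 0}

    def render_heading_open(self, tokens, idx, options, env):
        counter["n"] += 1
        tokens[idx].attrSet("id", f"{prefix}{counter['n']}")
        return self.renderToken(tokens, idx, options, env)

    md.add_render_rule("heading_open", render_heading_open)

# Plugin keyword arguments are forwarded by use()
md = MarkdownIt().use(heading_ids, prefix="sec-")
html = md.render("# One\n\n## Two")
```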
from markdown_it import MarkdownIt
def render_custom_link(self, tokens, idx, options, env):
    tokens[idx].attrSet("target", "_blank")
    tokens[idx].attrSet("rel", "noopener noreferrer")
    return self.renderToken(tokens, idx, options, env)
md = MarkdownIt()
md.add_render_rule("link_open", render_custom_link)