[WIP] Adding Arrow support for Trino's spooling protocol #7
Arrow Spooling Support for aiotrino
🚀 Overview
This pull request adds Apache Arrow spooling support to aiotrino. It builds on aiotrino's SegmentCursor and uses a columnar data format together with parallel processing to make retrieval of large query results substantially faster.
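To illustrate the parallel-processing idea, here is a minimal sketch of fetching several result segments concurrently and reassembling them in order. It is not the code from this pull request: `fetch_segment`, `fetch_all_segments`, and `max_concurrency` are hypothetical names, and the download step is simulated with a short sleep instead of a real request to spooled storage.

```python
import asyncio


async def fetch_segment(segment_id: int) -> list:
    # Hypothetical stand-in for downloading and decoding one spooled segment.
    await asyncio.sleep(0.01)
    return [{"segment": segment_id, "row": i} for i in range(3)]


async def fetch_all_segments(segment_ids, max_concurrency: int = 4) -> list:
    # Cap the number of in-flight downloads so a large result set
    # does not open an unbounded number of connections.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(segment_id: int) -> list:
        async with semaphore:
            return await fetch_segment(segment_id)

    # asyncio.gather returns results in argument order, so rows stay
    # in segment order even when downloads finish out of order.
    batches = await asyncio.gather(*(bounded(s) for s in segment_ids))
    return [row for batch in batches for row in batch]


rows = asyncio.run(fetch_all_segments(range(5)))
```

With real Arrow data, each segment would decode to a table and the final step would concatenate tables instead of flattening lists.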
✨ Key Features
🏃‍♂️ Performance Improvements
🎯 Core Functionality
arrow+zstd (recommended) and arrow encoding options
fetchall_arrow() - Fast retrieval of complete result sets
fetchone_arrow() - Single segment retrieval as Arrow Table
📊 Performance Benchmarks
Real-World Performance Gains
Benchmark Environment Details
Single Docker Container Test:
Small Cluster Test:
SELECT * FROM iceberg_table WHERE attribute IN (range...)
🔄 Breaking Changes
None - This is a backward-compatible addition that:
📝 Usage Examples
Basic Arrow Usage
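The following is a hedged sketch of how the two fetch methods named in this description might be called. It takes an already-created segment cursor as a parameter, because the exact connection and encoding options are not shown here. Two things are assumptions rather than confirmed API: that `execute()`, `fetchall_arrow()`, and `fetchone_arrow()` are awaitable like aiotrino's other cursor methods, and that `fetchone_arrow()` returns `None` once all segments are consumed. The helper names `fetch_arrow_table` and `iter_arrow_segments` are hypothetical.

```python
import asyncio


async def fetch_arrow_table(cursor, sql):
    """Run `sql` and return the complete result set in one call."""
    # Assumption: both calls are coroutines on the segment cursor.
    await cursor.execute(sql)
    return await cursor.fetchall_arrow()


async def iter_arrow_segments(cursor, sql):
    """Yield the result one spooled segment at a time."""
    await cursor.execute(sql)
    while True:
        # Assumption: None signals that no segments remain,
        # mirroring DB-API fetchone() semantics.
        table = await cursor.fetchone_arrow()
        if table is None:
            break
        yield table
```

Iterating segment by segment keeps memory bounded for very large results, while the single-call variant is simpler when the whole table fits in memory.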
🛠️ Technical Implementation
Architecture Changes
Server-Side Requirements
Comprehensive Test Coverage
Supported Data Types
✅ Fully Supported:
Run Tests
# Run all arrow tests
pytest -k arrow
🔗 Related Work
Server-Side Implementation
Motivation
Addresses critical performance bottlenecks in Python-based Trino clients:
📁 Files Changed
Core Implementation
aiotrino/dbapi.py - SegmentCursor and Arrow fetch methods
aiotrino/constants.py - Arrow-related constants and configuration
aiotrino/client.py - Enhanced spooling segment handling
Testing & Benchmarks
tests/integration/test_dbapi_integration.py - Comprehensive Arrow tests
tests/benchmark/ - Performance comparison framework
tests/development_server.py - Arrow-enabled test server configuration
Documentation
README.md - Updated with Arrow usage examples and benchmarks
✅ Checklist
This work builds upon the excellent foundation provided by the Trino community and specifically the Arrow spooling prototype developed by @dysn and @wendigo.