Unlike fixed-strategy trading systems, PopAgent maintains populations of agents that learn to SELECT which methods to use from a shared inventory. This creates a meta-learning system where agents discover optimal method combinations through continual learning.
┌───────────────────────────────────────────────────────────────────────────┐
│                    POPAGENT: METHOD SELECTION LEARNING                     │
├───────────────────────────────────────────────────────────────────────────┤
│                                                                            │
│  INVENTORY (15 methods)             AGENT POPULATION (5 agents)            │
│  ┌────────────────────────┐         ┌─────────────────────────────┐        │
│  │ • RSI                  │         │ Agent 1                     │        │
│  │ • MACD                 │◄─selects─ Preferences: RSI↑ HMM↑      │        │
│  │ • BollingerBands       │         │ Picks: [RSI, HMM, Kalman]   │        │
│  │ • HMM_Regime           │         └─────────────────────────────┘        │
│  │ • KalmanFilter         │         ┌─────────────────────────────┐        │
│  │ • WaveletTransform     │◄─selects─ Agent 2                     │        │
│  │ • STL_Decomposition    │         │ Preferences: MACD↑ STL↑     │        │
│  │ • VolatilityClustering │         │ Picks: [MACD, STL, Wavelet] │        │
│  │ • ... (more)           │         └─────────────────────────────┘        │
│  └────────────────────────┘                       ...                      │
│                                                                            │
│                                      ▼                                     │
│  ┌──────────────────────────────────────────────────────────────────┐      │
│  │                       CONTINUAL LEARNING                         │      │
│  │                                                                  │      │
│  │  1. Agents select methods → Execute pipeline → Get reward        │      │
│  │  2. Update preferences: pref[method] += α × (reward - baseline)  │      │
│  │  3. Transfer: Best agent's preferences → Other agents            │      │
│  │  4. Diversity: Ensure agents don't all select same methods       │      │
│  └──────────────────────────────────────────────────────────────────┘      │
│                                                                            │
└────────────────────────────────────────────────────────────────────────────┘
| Traditional Approach | PopAgent Approach |
|---|---|
| Fixed agent strategies | Agents SELECT methods dynamically |
| Learn parameters | Learn WHICH methods to use |
| Single best agent | Population discovers combinations |
| Static configurations | Adapts to market conditions |
| Train-then-deploy | Online learning (models update every bar) |
- Meta-Learning for Trading: Agents learn to select strategies, not just tune parameters
- Selection Pressure: Inventory (15) > Selection (3) creates meaningful choices
- Preference Transfer: Knowledge sharing is about WHAT to select
- Context-Aware Selection: Different methods for different market regimes
- Online Learning: Models update after EVERY observation (like real hedge funds)
Key insight: Update frequency should match FEATURE TIMESCALE, not model complexity!
┌─────────────────────────────────────────────────────────────────────┐
│                 FEATURE-ALIGNED LEARNING ARCHITECTURE               │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  FAST FEATURES (momentum, vol) ─────────► Update: EVERY BAR         │
│  Model: Any (even XGBoost!)               Why: These change every 4h│
│                                                                     │
│  MEDIUM FEATURES (trend, daily) ────────► Update: EVERY 6 BARS      │
│  Model: Any                               Why: Trend changes daily  │
│                                                                     │
│  SLOW FEATURES (regime, corr) ──────────► Update: EVERY 42 BARS     │
│  Model: Any (even simple!)                Why: Regime changes weekly│
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
| Wrong Approach (Model-Based) | Right Approach (Feature-Based) |
|---|---|
| Simple model β fast update | Fast-changing feature β fast update |
| Complex model β slow update | Slow-changing feature β slow update |
| Computational constraint drives design | Data dynamics drive design |
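A minimal sketch of this idea (names here are hypothetical; the real implementation lives in `feature_aligned_learner.py`): each feature group carries its own update cadence, independent of whatever model it wraps.

```python
from dataclasses import dataclass

@dataclass
class FeatureGroup:
    """One timescale bucket: any model, updated on its own cadence."""
    update_every: int  # in bars (1 = fast, 6 = medium, 42 = slow)
    model: object      # any estimator exposing .update(x, y)

def on_new_bar(groups: dict, bar_index: int, features: dict, target: float):
    # Update frequency follows the FEATURE timescale, not model complexity
    for name, group in groups.items():
        if bar_index % group.update_every == 0:
            group.model.update(features[name], target)
```

The design point is that `update_every` lives on the feature group, so a complex model on fast features still updates every bar.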
Bar 1:    Observe → Predict → Trade → See outcome → UPDATE WEIGHTS
Bar 2:    Observe → Predict (better) → Trade → See outcome → UPDATE WEIGHTS
Bar 3:    Observe → Predict (even better) → Trade → See outcome → UPDATE WEIGHTS
...
Bar 8700: Model has been learning for 4 years
| Feature Group | Features | Update Freq | Models Used |
|---|---|---|---|
| Fast | ret_1bar, ret_5bar, vol_intrabar, momentum | Every bar | OnlineLinear + OnlineRidge |
| Medium | trend_strength, daily_vol, sma_ratio | Every 6 bars | Ridge with batch refit |
| Slow | regime, cross_correlation | Every 42 bars | RandomForest + regime means |
| Model | Algorithm | What It Learns |
|---|---|---|
| OnlineLinearRegression | SGD | Return prediction |
| OnlineRidge | Recursive Least Squares | Trend prediction |
| OnlineVolatility | EWMA | Volatility estimation |
| OnlineRegimeDetector | Bayesian HMM | Market regime (Bull/Bear/Neutral) |
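For intuition, an SGD-based online regressor in the spirit of OnlineLinearRegression might look like the sketch below (illustrative only, not the project's actual class):

```python
import numpy as np

class OnlineSGDRegressor:
    """Sketch of an online linear model: one gradient step per observation."""
    def __init__(self, n_features: int, lr: float = 0.01):
        self.w = np.zeros(n_features)
        self.b = 0.0
        self.lr = lr

    def predict(self, x: np.ndarray) -> float:
        return float(self.w @ x + self.b)

    def update(self, x: np.ndarray, y: float) -> None:
        # Single SGD step on squared error (pred - y)^2
        err = self.predict(x) - y
        self.w -= self.lr * err * x
        self.b -= self.lr * err
```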
# Online models update after EVERY bar:
for bar, next_bar in zip(price_data, price_data[1:]):  # consecutive bar pairs
    features = extract_features(bar)

    # Predict BEFORE seeing the outcome
    prediction = model.predict(features)

    # Execute trade based on the prediction
    execute_trade(prediction)

    # Next bar arrives: compute the realized return
    actual_return = next_bar.close / bar.close - 1

    # UPDATE model weights with the new observation
    model.update(features, actual_return)  # ← This is online learning!

Three lightweight, theoretically grounded RL improvements for robust learning:
Instead of deterministic UCB, agents sample from Beta distributions to naturally balance exploration and exploitation:
For each method m:
    sample ~ Beta(α_m, β_m)
    # High uncertainty → high variance → more exploration
    # High success rate → high mean → more exploitation
| Scenario | Alpha | Beta | Behavior |
|---|---|---|---|
| New method | 1 | 1 | Uniform sampling (explore) |
| 10 wins, 2 losses | 11 | 3 | High mean, exploit |
| 3 wins, 10 losses | 4 | 11 | Low mean, avoid |
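A minimal sketch of this selection rule, assuming per-method win/loss counts are tracked (the method names and counts below are illustrative):

```python
import numpy as np

rng = np.random.default_rng()

# Beta posterior per method: alpha = wins + 1, beta = losses + 1
posteriors = {"RSI": (11, 3), "MACD": (4, 11), "HMM_Regime": (1, 1)}

def thompson_select(posteriors: dict, k: int = 3) -> list:
    # One random draw per method; keep the k largest draws
    draws = {m: rng.beta(a, b) for m, (a, b) in posteriors.items()}
    return sorted(draws, key=draws.get, reverse=True)[:k]
```

An untried method (1, 1) samples uniformly and so still wins some draws, which is exactly the exploration behavior in the table above.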
Per-regime baselines for proper credit assignment:
Bull market: +2% is average (baseline = 2.5%) → advantage ≈ 0
Bear market: +2% is exceptional (baseline = -0.5%) → advantage ≈ +2.5%
Agents learn context-specific method preferences, not global averages.
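A sketch of a per-regime running baseline (illustrative; regime labels would come from the regime detector):

```python
from collections import defaultdict

class RegimeBaseline:
    """Running mean reward per regime; advantage = reward - regime baseline."""
    def __init__(self):
        self.mean = defaultdict(float)
        self.n = defaultdict(int)

    def advantage(self, regime: str, reward: float) -> float:
        adv = reward - self.mean[regime]  # credit relative to the regime average
        self.n[regime] += 1
        self.mean[regime] += (reward - self.mean[regime]) / self.n[regime]
        return adv
```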
Discounted future rewards for methods that sacrifice short-term for long-term:
G_t = r_t + γ·r_{t+1} + γ²·r_{t+2} + ...

With γ = 0.9:
Method A: Immediate +1%, then +0.5%, +0.5% → G = 1.86%
Method B: Immediate -0.5%, then +3%, +2% → G = 3.82% ✓
Multi-step returns properly credit methods that set up future gains.
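The worked numbers above can be reproduced with a one-liner (γ = 0.9):

```python
def discounted_return(rewards, gamma: float = 0.9) -> float:
    # G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ...
    return sum(gamma**i * r for i, r in enumerate(rewards))

discounted_return([1.0, 0.5, 0.5])   # ≈ 1.86  (Method A)
discounted_return([-0.5, 3.0, 2.0])  # ≈ 3.82  (Method B)
```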
population:
  use_thompson_sampling: true
  gamma: 0.9   # Discount factor
  n_step: 3    # Steps for multi-step returns

Each role has 10-15 methods available, but agents only select 3 at a time:
Analyst (15 methods):

| Category | Methods |
|---|---|
| Technical | RSI, MACD, BollingerBands, ADX, Stochastic |
| Statistical | Autocorrelation, VolatilityClustering, MeanReversion, Cointegration |
| Decomposition | STL, WaveletTransform, FourierAnalysis |
| ML | HMM_Regime, KalmanFilter, IsolationForest |
Researcher (12 methods):

| Category | Methods |
|---|---|
| Statistical | ARIMA, ExponentialSmoothing, VectorAutoregression, GARCH |
| ML | RandomForest, GradientBoosting, LSTM, TemporalFusion |
| Uncertainty | BootstrapEnsemble, QuantileRegression, BayesianInference, ConformalPrediction |
Trader (10 methods):

| Category | Methods |
|---|---|
| Execution | AggressiveMarket, PassiveLimit, TWAP, VWAP |
| Sizing | KellyCriterion, FixedFractional, VolatilityScaled |
| Entry | MomentumEntry, ContrarianEntry, BreakoutEntry |
Risk Manager (10 methods):

| Category | Methods |
|---|---|
| Position | MaxLeverage, MaxPositionSize, ConcentrationLimit |
| Loss | MaxDrawdown, DailyStopLoss, TrailingStop |
| Metrics | VaRLimit, ExpectedShortfall |
| Dynamic | VolatilityAdjusted, RegimeAware |
# Create and activate conda environment
conda create -n mas python=3.11 -y
conda activate mas
# Install core packages
conda install pandas numpy matplotlib requests pyyaml -y
conda install -c conda-forge openai -y
# Install project
cd /path/to/MAS_Final_With_Agents
pip install -e .

# Single asset backtest
python -m trading_agents.cli backtest --symbol BTC
# Multi-asset backtest
python -m trading_agents.cli backtest --symbols BTC,ETH,SOL,DOGE,XRP
# With options
python -m trading_agents.cli backtest --symbol BTC \
--population-size 5 \
--capital 10000 \
--start 2024-01-01 \
  --end 2024-06-01

Terminal 1 - Start API server:
conda activate mas
python -m trading_agents.cli api --port 8000

Terminal 2 - Start React dashboard:
cd dashboard
npm install
npm run dev

Open http://localhost:3000 in your browser.
# Run all 4 conditions: A=Baseline, B=LLM, C=News, D=Full
python -m trading_agents.cli ablation --condition all \
--symbols BTC,ETH,SOL,XRP,DOGE \
--start 2022-01-01 \
--end 2024-12-01
# Run single condition (e.g., baseline only)
python -m trading_agents.cli ablation --condition A

| Condition | LLM | News | Description |
|---|---|---|---|
| A (Baseline) | No | No | Pure Thompson Sampling |
| B (LLM Only) | Yes | No | LLM reasoning, no news |
| C (News Only) | No | Yes | News as features |
| D (Full) | Yes | Yes | Complete system |
# Run real-time learning with 4-hour iterations
python -m trading_agents.cli live --symbols BTC,ETH,SOL,XRP,DOGE
# With options
python -m trading_agents.cli live \
--symbols BTC,ETH,SOL \
--interval 4.0 \
--use-llm \
--use-news \
--testnet # Execute on Bybit testnet
# Test single iteration (no waiting)
python -m trading_agents.cli live --test-once

Key difference from backtesting:
- Backtest: Simulates historical data rapidly (1000+ iterations in minutes)
- Live Mode: Waits actual 4 hours between iterations, fetches live data
| Mode | Data Source | Wait Time | Use Case |
|---|---|---|---|
| Backtest | Historical CSV | None | Research, hyperparameter tuning |
| Live | Real-time API | 4 hours | Continuous learning, paper trading |
# Export experiment artifacts for NeurIPS
python -m trading_agents.cli export --experiment-id <exp_id> --output-dir exports/neurips

# Run the population in method-selection mode
python -m trading_agents.cli selector --config configs/multi_asset.yaml

# configs/multi_asset.yaml
population:
  mode: "selector"        # Use method selection (vs "fixed" for legacy)
  size: 5                 # 5 agents per role
  max_methods: 3          # Each agent picks 3 methods
  transfer_frequency: 10
  learning_rate: 0.1
  exploration_rate: 0.15

Iteration N:
│
├── 1. METHOD SELECTION
│   └── Each agent selects 3 methods from inventory (UCB + preferences)
│       Agent 1: [RSI, HMM_Regime, KalmanFilter]
│       Agent 2: [MACD, STL_Decomposition, WaveletTransform]
│       ...
│
├── 2. PIPELINE SAMPLING
│   └── Sample 25 combinations of (analyst, researcher, trader, risk)
│
├── 3. EVALUATION
│   └── Run each pipeline → measure PnL
│
├── 4. PREFERENCE UPDATE (Reinforcement Learning)
│   └── For each method used:
│       preference[method] += learning_rate × (reward - baseline)
│
├── 5. KNOWLEDGE TRANSFER (every 10 iterations)
│   └── Best agent's preferences → Other agents (soft update τ=0.1)
│
├── 6. DIVERSITY CHECK
│   └── If selection diversity < threshold → increase exploration
│
└── 7. Next Iteration
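A stripped-down sketch of steps 1 and 4 (illustrative only; the real MethodSelector in selector.py is richer, and the hyperparameters mirror the config above):

```python
import math

class SelectorSketch:
    """UCB-flavored method selection with RL preference updates (sketch)."""
    def __init__(self, methods, learning_rate: float = 0.1, c: float = 1.0):
        self.pref = {m: 0.0 for m in methods}
        self.counts = {m: 0 for m in methods}
        self.t = 0
        self.lr, self.c = learning_rate, c

    def select(self, k: int = 3) -> list:
        # Preference plus a UCB bonus that favors rarely-tried methods
        self.t += 1
        def score(m):
            bonus = self.c * math.sqrt(math.log(self.t) / (self.counts[m] + 1))
            return self.pref[m] + bonus
        return sorted(self.pref, key=score, reverse=True)[:k]

    def update(self, chosen: list, reward: float, baseline: float) -> None:
        # Step 4: preference[method] += learning_rate * (reward - baseline)
        for m in chosen:
            self.counts[m] += 1
            self.pref[m] += self.lr * (reward - baseline)
```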
trading_agents/
├── population/                     # Population-based method selection
│   ├── selector.py                 # MethodSelector class (core innovation)
│   ├── inventories.py              # 15 methods per role
│   ├── selector_workflow.py        # Selection-based workflow
│   └── ...                         # Transfer, diversity, scoring
├── agents/                         # Agent implementations
├── inventory/                      # Method implementations
│   ├── online_models.py            # Online learning (SGD, RLS, HMM)
│   ├── feature_aligned_learner.py  # Feature-timescale-aligned learning (v0.9.8)
│   └── ...                         # Analyst, Researcher, Trader, Risk methods
├── backtesting/                    # Backtesting engine
│   ├── engine.py                   # BacktestEngine with population support
│   └── executor.py                 # Order execution simulation
├── services/                       # LLM, events, notifications
│   ├── experiment_logger.py        # Structured logging (JSONL)
│   ├── scheduler.py                # 4-hour paper trading scheduler
│   └── neurips_export.py           # Publication-ready exports
├── api/                            # Dashboard API
│   └── server.py                   # FastAPI + WebSocket server
└── config/                         # Configuration management

dashboard/                          # React Visualization Dashboard
├── src/components/                 # AgentPopulation, MethodInventory, etc.
└── ...                             # Next.js app

tests/                              # Test suite
├── conftest.py                     # Pytest fixtures
└── test_*.py                       # Mock and integration tests
- (2025.07.03) First Meeting
- (2025.08.28) Project Proposal and Workflow First Draft
- (2025.09.18) Completed Micro & Macro News and Price Data Fetch
- (2025.10.17) Created config-driven, raw multi-agent pipeline
- (2025.12.19) Major Architecture Refactoring v0.2.0
- (2025.12.19) Multi-Asset Data Pipeline v0.3.0 (5 coins)
- (2025.12.19) Admin Agent & Paper Trading v0.4.0
- (2025.12.19) Bocha Search Integration v0.4.1
- (2025.12.19) PopAgent v0.5.0: Population-Based Learning
- (2025.12.19) PopAgent v0.6.0: Adaptive Method Selection
- Agents now SELECT methods from inventory (not fixed strategies)
- Extended inventories: 15/12/10/10 methods per role
- Selection learning via UCB + reinforcement learning
- Preference-based knowledge transfer
- Context-aware method selection
- (2025.12.19) PopAgent v0.7.0: RL Enhancements
- Thompson Sampling for Bayesian exploration
- Contextual baselines for regime-aware learning
- Multi-step returns for temporal credit assignment
- (2025.12.20) PopAgent v0.8.0: Testing & Visualization
- Complete test suite with mock data fixtures
- Population-based backtesting (run_population_backtest)
- React dashboard for visualization (Next.js + Tailwind)
- FastAPI backend with WebSocket live updates
- 4-hour paper trading scheduler
- NeurIPS export utilities (figures, tables, traces)
- (2025.12.20) PopAgent v0.9.0: Online Learning (Hedge Fund Style)
- TRUE Online Learning: Models update weights after EVERY observation
  - OnlineLinearRegression: SGD-based return predictor
  - OnlineRidge: Recursive Least Squares with forgetting factor
  - OnlineVolatility: EWMA variance estimation
  - OnlineRegimeDetector: Bayesian HMM with incremental updates
- Persistent model state across sessions
- Real-time learning mode (python -m trading_agents.cli live)
- (2025.12.20) PopAgent v0.9.1-v0.9.6: Incremental Improvements
- v0.9.1: Stay-flat metrics tracking (avoid trading in uncertainty)
- v0.9.4: Simplified trading logic (online model decides trade/no-trade)
- v0.9.5: Fixed momentum as PRIMARY driver (not overridden by untrained models)
- v0.9.6: Real pipeline execution, regime detector responsiveness fixes
- (2025.12.21) PopAgent v0.9.8: Feature-Aligned Learning
- KEY INSIGHT: Update frequency should match FEATURE TIMESCALE, not model complexity!
- Deprecated hybrid learning (the model-complexity → update-frequency approach was flawed)
- New FeatureAlignedLearner with 3 feature groups:
  - Fast features (momentum, vol spikes): update EVERY bar
  - Medium features (trend, daily vol): update every 6 bars (~daily)
  - Slow features (regime, correlations): update every 42 bars (~weekly)
- Each group can use ANY model complexity - complexity ≠ update frequency
- Adaptive blending weights based on market conditions
- feature_aligned_learner.py: 500+ lines of principled learning architecture
| Experiment | Description | Hypothesis |
|---|---|---|
| A: Online-Only | Pure SGD, update every bar | Fast adaptation, poor pattern capture |
| B: Batch-Only | Refit RF/XGB weekly | Good patterns, slow adaptation |
| C: Hybrid (Model-Based) | Simple→fast, Complex→slow | Suboptimal: wrong dimension |
| D: Feature-Aligned | Fast features→fast, Slow→slow | ✓ Best: matches data dynamics |
python -m trading_agents.cli ablation --experiment learning_approach \
  --conditions online,batch,hybrid,feature_aligned

| Config | Fast Freq | Medium Freq | Slow Freq |
|---|---|---|---|
| Aggressive | 1 bar | 3 bars | 21 bars |
| Default | 1 bar | 6 bars | 42 bars |
| Conservative | 1 bar | 12 bars | 84 bars |
| Condition | Strategy | Expected Outcome |
|---|---|---|
| Fixed-Best | Always use top-3 methods | Good baseline, no adaptation |
| Fixed-Random | Random method selection | Poor performance |
| PopAgent | Learned selection | Adapts to regime changes |
| Pop Size | Diversity | Convergence Speed | Final Performance |
|---|---|---|---|
| 3 | Low | Fast | Risk of local optima |
| 5 | Medium | Balanced | Default setting |
| 10 | High | Slow | Better exploration |
| Transfer Every | Effect |
|---|---|
| 5 iterations | Rapid homogenization |
| 10 iterations | Balanced (default) |
| 20 iterations | More diversity, slower learning |
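The soft preference transfer itself is a one-liner per method; a sketch (τ corresponds to the 0.1 soft update used in the workflow, and the `pref` dicts match the selector sketch above):

```python
def transfer_preferences(best_pref: dict, agent_pref: dict, tau: float = 0.1):
    # Pull each agent's preferences a fraction tau toward the best agent's
    for method, p_best in best_pref.items():
        agent_pref[method] = (1 - tau) * agent_pref[method] + tau * p_best
```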
| Condition | Description |
|---|---|
| Independent | Each asset learns separately |
| Shared Population | Single population, cross-asset features |
| Transfer Across Assets | BTC insights → altcoins |
- Sharpe Ratio (primary)
- Total Return %
- Maximum Drawdown
- Win Rate
- Stay-Flat Rate (% of iterations with no trade)
- Learning Improvement (avg PnL last 10% vs first 10%)
- Selection Diversity (entropy of method usage)
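The two less standard metrics, sketched (assuming a per-iteration PnL list and a flat log of selected methods):

```python
import math
from collections import Counter

def learning_improvement(pnl: list) -> float:
    # Average PnL over the last 10% of iterations minus the first 10%
    k = max(1, len(pnl) // 10)
    return sum(pnl[-k:]) / k - sum(pnl[:k]) / k

def selection_diversity(picks: list) -> float:
    # Shannon entropy of method-usage frequencies
    counts = Counter(picks)
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n) for c in counts.values())
```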
"PopAgent: Adaptive Method Selection in Multi-Agent LLM Trading via Continual Learning"
- Method Selection as Meta-Learning - Agents learn WHAT to use, not just HOW
- Inventory > Agents - Selection pressure creates meaningful learning
- Preference Transfer - Novel knowledge sharing mechanism
- Context-Aware Selection - Adapt to market regimes
- 5 crypto assets (BTC, ETH, SOL, DOGE, XRP)
- 2 years of 4h data
- Compare: Fixed strategies vs Method Selection
- Ablations: Transfer frequency, inventory size, exploration rate
Trades 5 cryptocurrencies with cross-asset market context:
| Coin | Symbol | Description |
|---|---|---|
| Bitcoin | BTC | Primary market benchmark |
| Ethereum | ETH | Smart contract platform |
| Solana | SOL | High-performance L1 |
| Dogecoin | DOGE | Meme coin / retail sentiment |
| Ripple | XRP | Payment-focused crypto |
- BTC dominance, altcoin momentum, ETH/BTC ratio
- Cross OI delta, aggregate funding, risk-on/off
- Market volatility, cross-correlation
data:
  multi_asset: true
  symbols: [BTC, ETH, SOL, DOGE, XRP]
  bybit_csv_dir: "data/bybit"

population:
  mode: "selector"
  size: 5
  max_methods: 3
  transfer_frequency: 10
  learning_rate: 0.1

This implementation builds on TradingAgents (Apache-2.0) and Population-Based Training research.