A comprehensive implementation of the Theta Model for time series forecasting. Includes model estimation, sensitivity analysis, diagnostics, and extensions. Uses statsmodels, pandas, and matplotlib.
A comprehensive implementation and analysis of the Theta forecasting model developed by Assimakopoulos & Nikolopoulos (2000). This notebook provides a complete workflow for time series forecasting using Python's statsmodels library, with detailed explanations, parameter analysis, and diagnostic evaluation.
- Default Estimation (SES + OLS): Traditional parameter estimation approach
- Maximum Likelihood Estimation (MLE): Statistical estimation using IMA(1,1) model
- Multi-theta Forecasting: Sensitivity analysis across different theta values
- Seasonal Decomposition: Automatic detection and handling of seasonality
- Housing Starts Analysis: US housing market data (FRED: HOUST)
- Personal Consumption Expenditure: Macroeconomic forecasting (FRED: PCEC)
- Comparative Analysis: Different estimation methods and their implications
- Residual Analysis: Distribution, autocorrelation, and normality tests
- Forecast Evaluation: MAE, RMSE, MAPE metrics and interpretation
- Model Diagnostics: Jarque-Bera, Ljung-Box tests for model validation
- Component Analysis: Trend, SES, and seasonal component decomposition
- Hedgehog Plots: Rolling origin forecasts for model stability assessment
- Parameter Sensitivity: Theta value impact on forecast trajectories
- Prediction Intervals: Uncertainty quantification in forecasts
- Residual Diagnostics: Comprehensive error analysis visualizations
pip install statsmodels pandas matplotlib seaborn numpy pandas-datareader scipy
- statsmodels (≥0.13.0): Theta model implementation
- pandas (≥1.3.0): Data manipulation and time series handling
- numpy (≥1.21.0): Numerical computations
- matplotlib (≥3.5.0): Data visualization
- pandas-datareader (≥0.10.0): FRED data access
- scipy (≥1.7.0): Statistical functions
# Basic Theta Model Implementation
from statsmodels.tsa.forecasting.theta import ThetaModel
import pandas_datareader as pdr
# Load data
reader = pdr.fred.FredReader(['HOUST'], start="1980-01-01", end="2020-04-01")
data = reader.read()
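series = data['HOUST']
# FRED data arrives without a set frequency; ThetaModel needs it to infer the seasonal period
series.index.freq = series.index.inferred_freq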
# Fit Theta Model
tm = ThetaModel(series)
results = tm.fit()
# Generate forecasts
forecast = results.forecast(12)
print(forecast)
- b0 (Drift/Trend): Linear trend component
  - Positive: Upward trend
  - Negative: Downward trend
  - Near-zero: No persistent trend
- alpha (Smoothing): Exponential smoothing parameter
  - Range: (0, 1)
  - High values: More weight to recent observations
  - Low values: More weight to historical data
- theta: Curvature adjustment parameter
  - θ = 0: Straight line (linear trend)
  - θ = 2: Standard Theta model
  - θ → ∞: IMA(1,1) with drift
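The role of theta in the forecast can be sketched in a few lines. This is a minimal illustration, assuming the non-seasonal forecast form in which the SES level is combined with a drift term weighted by (θ − 1)/θ. The function `theta_forecast` and its arguments are hypothetical names for this sketch, not part of statsmodels, and the sketch restricts itself to θ ≥ 1, where the weight lies between 0 and 1.

```python
def theta_forecast(b0, alpha, ses_level, nobs, steps, theta=2.0):
    """Illustrative non-seasonal Theta forecast (hypothetical helper).

    Assumed form:
        ses_level + w * b0 * (h - 1 + 1/alpha - (1 - alpha)**nobs / alpha)
    with trend weight w = (theta - 1) / theta.
    """
    if theta < 1:
        raise ValueError("this sketch assumes theta >= 1")
    # 1 - 1/theta equals (theta - 1)/theta and stays finite for theta = inf
    weight = 1.0 - 1.0 / theta
    adjust = 1.0 / alpha - (1.0 - alpha) ** nobs / alpha
    return [ses_level + weight * b0 * (h - 1 + adjust) for h in range(1, steps + 1)]


# Same fitted values, three theta settings (numbers are made up for illustration)
kwargs = dict(b0=0.5, alpha=0.6, ses_level=100.0, nobs=200, steps=3)
ses_only = theta_forecast(theta=1.0, **kwargs)             # weight 0: flat SES forecast
standard = theta_forecast(theta=2.0, **kwargs)             # weight 1/2: half the drift
full_drift = theta_forecast(theta=float("inf"), **kwargs)  # weight 1: full drift
```

Under this form, the per-step slope of the forecast path is the trend weight times b0, which is why larger theta values produce steeper trajectories.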
# Automatic frequency detection and seasonal adjustment
series.index.freq = series.index.inferred_freq
tm = ThetaModel(series, method='additive')
- Default: SES smoothing + OLS trend estimation
- MLE: Maximum likelihood via IMA(1,1) model
- Comparative analysis of estimation approaches
# Access individual forecast components
components = results.forecast_components(12)
# Returns: trend, SES, and seasonal components
# Compare different theta values
import numpy as np

forecasts = {
    'theta=1.2': results.forecast(12, theta=1.2),
    'theta=2': results.forecast(12),
    'theta=3': results.forecast(12, theta=3),
    'theta=inf': results.forecast(12, theta=np.inf)
}
- Data: 484 monthly observations (1980-2020)
- Trend: Negative drift (b0 = -0.9186)
- Smoothing: Moderate persistence (alpha = 0.6165)
- Accuracy: MAPE = 8.28% (Good performance)
- Data: 162 quarterly observations (1980-2020)
- Trend: Positive growth (b0 = 0.0130)
- Smoothing: High persistence (alpha = 0.9999)
- Pattern: Strong upward trajectory with COVID-19 impact visible
- Normality: Jarque-Bera test for distribution assessment
- Autocorrelation: Ljung-Box test for residual independence
- Heteroscedasticity: Variance stability checks
- Error Metrics: MAE, RMSE, MAPE calculation
- Hedgehog plots: Model stability over time
- Prediction intervals: Uncertainty quantification
- Out-of-sample testing: Model generalization assessment
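The normality and autocorrelation checks listed above rest on two simple statistics. The sketch below computes them from scratch to show what the tests measure. The function names are hypothetical, the residuals are made-up numbers, and the Jarque-Bera p-value uses the asymptotic chi-square(2) distribution.

```python
import math


def jarque_bera(resid):
    """Jarque-Bera statistic and asymptotic p-value (chi-square, 2 df)."""
    n = len(resid)
    mean = sum(resid) / n
    m2 = sum((e - mean) ** 2 for e in resid) / n
    m3 = sum((e - mean) ** 3 for e in resid) / n
    m4 = sum((e - mean) ** 4 for e in resid) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    stat = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    pvalue = math.exp(-stat / 2.0)  # survival function of chi-square with 2 df
    return stat, pvalue


def ljung_box(resid, lags):
    """Ljung-Box Q statistic over the first `lags` residual autocorrelations."""
    n = len(resid)
    mean = sum(resid) / n
    dev = [e - mean for e in resid]
    denom = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, lags + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total


residuals = [0.4, -1.1, 0.3, 0.9, -0.2, -0.7, 1.3, -0.5, 0.1, -0.5]
jb_stat, jb_pvalue = jarque_bera(residuals)
q_stat = ljung_box(residuals, lags=3)
```

In practice, `scipy.stats.jarque_bera` and `statsmodels.stats.diagnostic.acorr_ljungbox` provide the same tests with p-values. Q is compared against a chi-square distribution whose degrees of freedom equal the number of lags.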
Extension to support multiple theta values with weighted combinations:
class ExtendedThetaModel:
    """Support for multiple theta values with custom weighting"""

    def __init__(self, series, thetas=(0, 2, 3)):
        self.series = series
        self.thetas = list(thetas)  # copy so instances never share a mutable default
Reformulating the Theta model as a state space system enables a Kalman filter implementation. The state space representation decomposes the time series into unobserved components (level, trend, seasonal) with explicit transition equations, so the Kalman filter can estimate these hidden states recursively. Under linear-Gaussian assumptions this yields minimum mean-squared error estimates, handles missing data naturally, and produces prediction intervals directly from the model. The components are therefore estimated optimally rather than heuristically, while the model keeps its interpretable structure. This representation bridges the intuitive Theta decomposition with rigorous statistical estimation methods.
Grid search for optimal theta values based on validation performance.
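A grid search of this kind can be sketched without any library. The example below is a simplified illustration, not the notebook's implementation: it assumes a non-seasonal series, fixes alpha instead of estimating it, and scores each theta by MAE on a held-out tail. All function names are hypothetical.

```python
def ols_slope(y):
    """Slope of an OLS line fitted to y against t = 0, 1, ..., n-1."""
    n = len(y)
    t_mean = (n - 1) / 2.0
    y_mean = sum(y) / n
    num = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def ses_level(y, alpha):
    """Final level of simple exponential smoothing, initialised at y[0]."""
    level = y[0]
    for v in y[1:]:
        level = alpha * v + (1.0 - alpha) * level
    return level


def theta_forecast(train, steps, theta, alpha):
    """SES level plus drift weighted by (theta - 1)/theta (assumed form)."""
    b0 = ols_slope(train)
    level = ses_level(train, alpha)
    weight = 1.0 - 1.0 / theta
    adjust = 1.0 / alpha - (1.0 - alpha) ** len(train) / alpha
    return [level + weight * b0 * (h - 1 + adjust) for h in range(1, steps + 1)]


def grid_search_theta(series, thetas, horizon, alpha=0.5):
    """Return the theta with the lowest validation MAE, plus all scores."""
    train, valid = series[:-horizon], series[-horizon:]
    scores = {}
    for theta in thetas:
        fcast = theta_forecast(train, horizon, theta, alpha)
        scores[theta] = sum(abs(f - a) for f, a in zip(fcast, valid)) / horizon
    best = min(scores, key=scores.get)
    return best, scores


# Noiseless linear trend: more drift always helps, so the largest theta wins
series = [10.0 + 2.0 * t for t in range(40)]
best_theta, scores = grid_search_theta(series, thetas=[1.0, 1.5, 2.0, 3.0], horizon=8)
```

A single validation split is used here for brevity. Averaging scores over several rolling origins gives a more stable choice of theta.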
| Metric | Housing Starts | PCE |
|---|---|---|
| MAPE | 8.28% | 5-7% (estimated) |
| MAE | 103.90 units | Scale-dependent |
| RMSE | 141.03 units | Scale-dependent |
| Autocorrelation | None detected | None detected |
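The error metrics in the table follow their standard definitions. A minimal sketch is shown below. The function name and the sample numbers are illustrative and are not the notebook's data.

```python
import math


def forecast_errors(actual, predicted):
    """Return MAE, RMSE, and MAPE (in percent) for two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE is undefined when any actual value is zero
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape}


metrics = forecast_errors([100.0, 200.0, 400.0], [110.0, 190.0, 400.0])
```

MAE and RMSE are expressed in the units of the series, which is why they cannot be compared across Housing Starts and PCE, while MAPE is scale-free.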
- Simplicity: Easy to implement and interpret
- Flexibility: Theta parameter allows trend/smoothness tuning
- Performance: Particularly effective for monthly economic data
- Decomposition: Clear separation of trend, seasonal, and irregular components
- Linear Trend Assumption: Long-term forecasts assume linearity
- Parameter Sensitivity: Theta choice significantly impacts results
- Error Distribution: Residuals often non-normal in practice
- Seasonal Adjustment: Method choice (additive/multiplicative) affects results
- Always validate with out-of-sample testing
- Compare multiple theta values for optimal performance
- Check residuals for model adequacy
- Consider transformations (log, etc.) for non-stationary data
- Use hedgehog plots to assess forecast stability
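The rolling-origin procedure behind a hedgehog plot can be sketched as follows. To stay self-contained, the example uses a simple drift forecaster as a stand-in for the Theta model. Both function names are hypothetical.

```python
def rolling_origin_forecasts(series, forecaster, min_train, horizon, step=1):
    """Forecast `horizon` steps ahead from each origin; keys are origin positions."""
    paths = {}
    for origin in range(min_train, len(series) - horizon + 1, step):
        paths[origin] = forecaster(series[:origin], horizon)
    return paths


def drift_forecaster(train, horizon):
    """Stand-in forecaster (not Theta): last value plus the average historical change."""
    slope = (train[-1] - train[0]) / (len(train) - 1)
    return [train[-1] + slope * h for h in range(1, horizon + 1)]


series = [float(t * t) for t in range(12)]
paths = rolling_origin_forecasts(series, drift_forecaster, min_train=4, horizon=3, step=2)
```

Plotting each path from its origin on top of the observed series gives the hedgehog shape. To use the Theta model, the forecaster can wrap `ThetaModel(train).fit().forecast(horizon)`, provided `train` is a pandas Series with a frequency set.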
- Assimakopoulos, V., & Nikolopoulos, K. (2000). The theta model: a decomposition approach to forecasting. International Journal of Forecasting, 16(4), 521-530.
- Hyndman, R. J., & Billah, B. (2003). Unmasking the Theta method. International Journal of Forecasting, 19(2), 287-290.
- Fioruci, J. A., Pellegrini, T. R., Louzada, F., & Petropoulos, F. (2015). The optimized theta method. arXiv preprint arXiv:1503.03529.
- Statsmodels Documentation: Theta Model Implementation
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions, issues, and feature requests are welcome. Check the issues page if you want to contribute.
For questions and support:
- Check the Statsmodels documentation
- Review existing issues
- Submit detailed questions with reproducible examples
This implementation is based on statsmodels v0.13.0+. Some features may not be available in earlier versions. Always check compatibility with your environment.
This notebook is designed for educational and research purposes. Real-world applications may require additional considerations and validation.