This project focuses on predicting the daily realized volatility of the S&P 500 Index using various econometric and machine learning models. The research compares the effectiveness of traditional models, such as GARCH and VAR, with more advanced machine learning techniques, including LSTM and RNN.
- Data Collection and Preprocessing: The dataset combines daily realized-volatility data with macro-financial indicators and sentiment analysis of financial news headlines related to S&P 500 companies.
- Model Development: Univariate and multivariate models were developed, including GARCH, VAR, Linear Regression, Ridge, Lasso, Random Forest, XGBoost, Simple RNN, and LSTM. The models were trained and tested using standard evaluation metrics: R², RMSE, MAE, and MAPE (see the metrics sketch after this list).
- Sentiment Analysis: The project incorporates sentiment scores derived from the VADER model applied to news headlines, enhancing the predictive power of the models.
- Results and Comparison: The project systematically compares the out-of-sample forecasting performance of the models, highlighting the superior performance of LSTM in capturing complex temporal patterns in financial time series data.
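The four metrics above can be computed with scikit-learn. A minimal sketch, assuming `y_true` and `y_pred` hold out-of-sample realized-volatility targets and forecasts (the values below are illustrative):

```python
# Hedged sketch: the evaluation metrics used for model comparison.
# y_true / y_pred are illustrative placeholders, not project data.
import numpy as np
from sklearn.metrics import (
    r2_score,
    mean_squared_error,
    mean_absolute_error,
    mean_absolute_percentage_error,
)

y_true = np.array([0.012, 0.015, 0.011, 0.020])  # realized volatility
y_pred = np.array([0.013, 0.014, 0.012, 0.018])  # model forecasts

r2 = r2_score(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
mae = mean_absolute_error(y_true, y_pred)
mape = mean_absolute_percentage_error(y_true, y_pred)
print(f"R²={r2:.3f}  RMSE={rmse:.4f}  MAE={mae:.4f}  MAPE={mape:.2%}")
```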
This research provides valuable insights into the application of machine learning in finance, particularly in volatility forecasting, and demonstrates the potential of integrating sentiment analysis with traditional financial modeling techniques.
All Python dependencies are listed in requirements.txt.
This project can be divided into four main parts:
- Download market-capitalization data from CRSP, accessed through Wharton Research Data Services (WRDS)
- Download indicators
- Sentiment analysis
- Models
Data was downloaded from these sources:
- CRSP for the market capitalization of the S&P 500 and of the first 50 firms of the index
- Yahoo Finance for indicators
- The Economic Policy Uncertainty website for additional indicators
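Indicator series from Yahoo Finance can be pulled programmatically. A minimal sketch with the `yfinance` package, where the ticker list and date range are illustrative assumptions rather than the exact set used in the project:

```python
# Hedged sketch: downloading daily indicator series from Yahoo Finance.
# The tickers and date range below are examples, not the project's exact set.
import yfinance as yf

tickers = ["^GSPC", "^VIX", "^TNX"]  # S&P 500, VIX, 10-year Treasury yield
data = yf.download(tickers, start="2010-01-01", end="2023-01-01")["Close"]
data.to_csv("Data/indicators.csv")  # saved under the Data/ directory (see below)
```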
The sentiment analysis can be divided into these main steps:
- Scrape all information regarding the S&P 500 from Wikipedia, then scrape all news headlines from Markets Insider. Everything is done in scraper.py
- Run the sentiment scoring on all scraped headlines using sentimental.py (see the VADER sketch after this list)
- Plot and analyze the data scraped in the steps above in plots_SP500.ipynb
- Analyze the sentiment scores in sentimental_and_plots.ipynb
- Compute the extra weight given to the first 50 firms of the S&P 500. The code is in market_capitalization_weights.ipynb
- Adjust the sentiment scores using the weights computed above by running compute_weighted_sentiments.ipynb
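The scoring and weighting steps might look like the following sketch, assuming the headlines sit in a pandas column; the file name `sp500_news.csv` and the columns `headline` and `mcap_weight` are hypothetical:

```python
# Hedged sketch of the VADER scoring and market-cap weighting steps.
# File and column names are hypothetical, not the project's exact ones.
import nltk
import pandas as pd
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
analyzer = SentimentIntensityAnalyzer()

news = pd.read_csv("Data/sp500_news.csv")
# Compound score in [-1, 1], from most negative to most positive sentiment.
news["compound"] = news["headline"].apply(
    lambda text: analyzer.polarity_scores(str(text))["compound"]
)
# Scale each firm's score by its market-cap weight (hypothetical column).
news["weighted_compound"] = news["compound"] * news["mcap_weight"]
```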
Due to GitHub's storage limitations, some of the CSV files used in the project cannot be uploaded. However, they can still be visualized here.
To avoid any problems when running the code, all CSV files have to be saved in a directory called Data. For instance, the CSV file sp500_news_and_sentimental.csv must be located at Data\sp500_news_and_sentimental.csv.
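In code, that layout translates to reading every file relative to a local Data/ directory, e.g.:

```python
# Hedged sketch: all CSV inputs are read from the local Data/ directory.
from pathlib import Path
import pandas as pd

DATA_DIR = Path("Data")
sentiments = pd.read_csv(DATA_DIR / "sp500_news_and_sentimental.csv")
```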
Before running the financial econometrics and ML models, we need to scrape financial data online and merge it with the data from the Oxford-Man Institute and TwelveData. All steps can be visualized in RV dataset.ipynb. Data analysis is then carried out in Data Analysis and Visualization.ipynb.
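A minimal sketch of the merging step, where the file names, the date columns, and the `rv5` realized-variance column are assumptions about the layout in RV dataset.ipynb:

```python
# Hedged sketch: merging scraped indicators with a realized-volatility series.
# File and column names are assumptions, not the notebook's exact ones.
import pandas as pd

rv = pd.read_csv("Data/oxford_man_rv.csv", parse_dates=["date"])
indicators = pd.read_csv("Data/indicators.csv", parse_dates=["Date"])

dataset = rv.merge(indicators, left_on="date", right_on="Date", how="inner")
# Realized volatility is the square root of the realized variance (rv5).
dataset["rv"] = dataset["rv5"] ** 0.5
dataset.to_csv("Data/rv_dataset.csv", index=False)
```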
After the data analysis, we ran the models:
- Regression models: Linear, Lasso, Ridge. The code can be found in Regression.ipynb.
- Financial econometrics models (GARCH and VAR), with the testing of all their assumptions. The code can be found in GARCH_VAR.ipynb (see the GARCH sketch after this list).
- ML models (Random Forest, XGBoost, RNN, LSTM), executed in ML models.ipynb.
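For the econometric step, a GARCH(1,1) fit might look like the following sketch with the `arch` package; the input file and the `close` column used to build the return series are assumptions:

```python
# Hedged sketch: fitting a GARCH(1,1) model with the `arch` package.
# The input file and `close` column are assumptions, not the notebook's exact ones.
import pandas as pd
from arch import arch_model

prices = pd.read_csv("Data/rv_dataset.csv", parse_dates=["date"], index_col="date")
returns = 100 * prices["close"].pct_change().dropna()  # percent returns for numerical stability

model = arch_model(returns, vol="GARCH", p=1, q=1, mean="Constant", dist="normal")
result = model.fit(disp="off")
print(result.summary())

forecast = result.forecast(horizon=1)  # one-step-ahead conditional variance
```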
All model results are collected and compared in Results.ipynb.
For any question or curiosity, feel free to reach out.