NVIDIA Stock Forecasting and Analysis

This repository provides an end‑to‑end pipeline—data cleaning, feature engineering, classical econometrics, sentiment fusion, and deep learning—for NVIDIA (NVDA) price forecasting.

Project Overview

Forecast NVDA prices with ARIMA, SARIMAX, GARCH, and a hybrid Random‑Forest ➔ Bidirectional LSTM + Attention network.
Explain drivers such as S&P 500, Google, Microsoft, Intel, macro factors, and a news‑based ImpactScore.
Compare each layer of model complexity against a simple T‑2 baseline (price from two days prior).
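
As a minimal sketch of the T‑2 baseline described above, the forecast for each day is simply the price observed two days earlier. The function name `t2_baseline_mae` and the demo prices are hypothetical and only illustrate the idea; they are not taken from the repository.

```python
import numpy as np

def t2_baseline_mae(prices, lag=2):
    """MAE of predicting each price with the price `lag` days earlier."""
    prices = np.asarray(prices, dtype=float)
    if prices.size <= lag:
        raise ValueError("need more observations than the lag")
    forecast = prices[:-lag]  # price observed `lag` days before each target
    actual = prices[lag:]     # targets that have a lagged price available
    return float(np.mean(np.abs(actual - forecast)))

# Hypothetical closing prices, for illustration only.
demo_prices = [100.0, 102.0, 101.0, 105.0, 107.0]
demo_mae = t2_baseline_mae(demo_prices)
```

Any model in the pipeline can be scored with the same MAE on the same targets, which makes the comparison against the baseline like-for-like.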

Data

The Data directory contains the following files:

Backup.csv
Database.csv.csv
cleaned_data.csv
merged_data.csv
merged_data2.csv
merged_with_impact.csv
merged_with_impact_score.csv
nvidia_events_filtered.csv

Each file is used at a different stage of the data processing and analysis pipeline. See the corresponding section of the project documentation for details on how each file is used.

Environment and Setup

conda create -n nvda_env python=3.9
conda activate nvda_env
pip install -r requirements.txt

Key libraries – pandas • numpy • scikit-learn • statsmodels • arch • tensorflow (keras)

Usage

1. Clone

git clone https://github.com/YourUsername/NVIDIA-Forecasting.git
cd NVIDIA-Forecasting

2. Drop data into `data/`

mkdir -p data
# place cleaned_data.csv and merged_with_impact_score.csv here

3. Run full notebook‑style script (quick start)

python "Total Code.py"

4. Modular workflow (recommended)

| Step | Script | Core Techniques |
|---|---|---|
| ① Pre‑processing | `src/data_prep.py` | drop NA/∞, `StandardScaler`, add 20‑/80‑day MA |
| ② Feature ranking | `src/feature_select.py` | Pearson corr, PCA 95 %, LassoCV, Random Forest |
| ③ ARIMA | `src/arima.py` | ADF test → ARIMA(4,1,0) rolling forecast |
| ④ SARIMAX & GARCH | `src/sarimax_garch.py` | exog = `[SP500_log, ImpactScore, INTC_ret]` |
| ⑤ Sentiment scrape | `src/news_sentiment.py` | VADER / FinBERT → daily `ImpactScore` |
| ⑥ Deep model | `src/lstm_attention.py` | RF meta‑feature ➔ 2×50 bi‑LSTM + Attention |
| ⑦ Plots | `src/visualization.py` | saves all figures to `docs/img/` |

Data Pre‑processing & Feature Selection

We start by removing NaN/Inf rows and standardising all numerical columns.
Two complementary feature‑ranking tracks are applied:

| Technique | Purpose | Outcome |
|---|---|---|
| PCA (95 % var) | Orthogonalise & compress | 9 principal components retained |
| LassoCV + Random Forest | Sparse, non‑parametric importance | Top drivers: `SP500`, `MSFT_Adj_Close`, `ImpactScore`, 20‑/80‑day MA |
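
The "95 % variance" rule for PCA can be sketched as follows: keep the smallest number of components whose cumulative explained-variance ratio reaches the threshold. This is a NumPy-only illustration using an SVD, assuming the inputs are already standardised; the function name and the synthetic data are hypothetical, and the repository may well use `scikit-learn` for this step instead.

```python
import numpy as np

def pca_retain_variance(X, threshold=0.95):
    """Project X onto the fewest components explaining >= `threshold` variance."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # centre each column
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = s**2 / np.sum(s**2)                  # explained-variance ratio
    cumulative = np.cumsum(ratio)
    # First index where the cumulative ratio reaches the threshold.
    k = min(int(np.searchsorted(cumulative, threshold)) + 1, len(s))
    scores = Xc @ Vt[:k].T                       # component scores
    return scores, Vt[:k], ratio[:k]

# Synthetic data driven by two latent factors plus small noise (illustration only).
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
loadings = np.array([[1.0, 1.0, 0.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0, 1.0, 0.0]])
demo_X = latent @ loadings + 0.01 * rng.normal(size=(200, 5))
demo_scores, demo_components, demo_ratio = pca_retain_variance(demo_X)
```

The number of retained components is `demo_scores.shape[1]`; on the project data the table above reports nine.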

These features feed every downstream model to ensure consistency and avoid look‑ahead bias.
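
One way to keep the pre-processing free of look-ahead bias is to compute the scaling statistics on the training portion only and reuse them for later rows. The sketch below assumes a chronologically ordered frame with a `Close` column; the column names `MA20` and `MA80`, the 80 % split, and the function name are assumptions made for illustration, and the moving averages are added before the NaN/Inf rows are dropped so that the warm-up rows are removed too.

```python
import numpy as np
import pandas as pd

def preprocess(df, price_col="Close", train_frac=0.8):
    """Add 20-/80-day moving averages, drop NaN/Inf rows, scale with train stats."""
    out = df.copy()
    out["MA20"] = out[price_col].rolling(20).mean()
    out["MA80"] = out[price_col].rolling(80).mean()
    # Treat infinities as missing, then drop every incomplete row.
    out = out.replace([np.inf, -np.inf], np.nan).dropna()
    split = int(len(out) * train_frac)           # chronological split point
    num_cols = out.select_dtypes(include="number").columns
    train = out.iloc[:split][num_cols]
    mean = train.mean()
    std = train.std(ddof=0).replace(0, 1.0)      # guard against constant columns
    out[num_cols] = (out[num_cols] - mean) / std
    return out, split

# Hypothetical data with one infinite and one missing value.
n = 120
demo = pd.DataFrame({
    "Close": np.linspace(100.0, 160.0, n),
    "Volume": np.linspace(1e6, 2e6, n),
})
demo.loc[100, "Volume"] = np.inf
demo.loc[110, "Volume"] = np.nan
clean, split = preprocess(demo)
```

Rows before `split` form the training set and rows from `split` onward the test set; the test rows never influence the mean and standard deviation used for scaling.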

Classical Time‑Series Models

ARIMA

The ADF test rejects the unit‑root hypothesis after first differencing, leading to an ARIMA(4,1,0) specification chosen via an AIC grid search.
The autocorrelation structure is visualised below:

Both plots show strong short‑memory dependence up to three lags, justifying the AR terms.
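
An ARIMA(4,1,0) model is an AR(4) model on the first differences of the series, so a rolling one-step forecast can be sketched with plain least squares. This is a swapped-in illustration, not the repository's code: it fits by OLS with an intercept (a drift term) rather than by the maximum-likelihood routine in `statsmodels`, and all function names and the simulated series are hypothetical.

```python
import numpy as np

def fit_ar_on_diff(series, p=4):
    """Least-squares AR(p) with intercept, fitted on first differences."""
    d = np.diff(np.asarray(series, dtype=float))
    if len(d) <= p:
        raise ValueError("series too short for the requested AR order")
    # Row for target d[t] holds [1, d[t-1], ..., d[t-p]].
    X = np.array([np.concatenate(([1.0], d[t - p:t][::-1]))
                  for t in range(p, len(d))])
    coef, *_ = np.linalg.lstsq(X, d[p:], rcond=None)
    return coef

def forecast_next(series, coef):
    """One-step-ahead level forecast: last level plus predicted difference."""
    series = np.asarray(series, dtype=float)
    p = len(coef) - 1
    last_diffs = np.diff(series)[-p:][::-1]
    return float(series[-1] + coef[0] + coef[1:] @ last_diffs)

def rolling_forecast(series, start, p=4):
    """Refit on data up to t-1 and forecast t, for every t >= start."""
    preds = []
    for t in range(start, len(series)):
        history = series[:t]                     # only past observations
        preds.append(forecast_next(history, fit_ar_on_diff(history, p)))
    return np.array(preds)

# Simulated random walk with drift, for illustration only.
rng = np.random.default_rng(0)
demo_series = 100 + np.cumsum(rng.normal(0.1, 1.0, size=120))
demo_preds = rolling_forecast(demo_series, start=100)
```

Because each forecast uses only `series[:t]`, the resulting errors are genuine out-of-sample errors and can be compared directly with the T‑2 baseline.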

SARIMAX

Seasonality (12‑month) and exogenous regressors—SP500_log, ImpactScore, INTC_ret—are introduced in a SARIMAX(1,0,1)(0,1,1,12) framework.

Left panel: fitted vs. observed shows tight tracking.
Right panel: standardised residuals & Q‑Q plot indicate near‑normality with mild tail risk—later captured by GARCH.
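
The exogenous block can be assembled along these lines. The definitions are inferred from the regressor names and are assumptions: `SP500_log` is taken to be the natural log of the index level and `INTC_ret` the simple daily return of Intel. The input column `INTC_Adj_Close` is hypothetical, named by analogy with `MSFT_Adj_Close`.

```python
import numpy as np
import pandas as pd

def build_exog(df):
    """Build the three exogenous regressors from raw daily columns."""
    exog = pd.DataFrame(index=df.index)
    exog["SP500_log"] = np.log(df["SP500"])                    # log index level
    exog["ImpactScore"] = df["ImpactScore"]                    # daily sentiment
    intc = df["INTC_Adj_Close"]
    exog["INTC_ret"] = intc / intc.shift(1) - 1.0              # simple return
    return exog.dropna()                                       # first return is NaN

# Hypothetical three-day sample.
demo_raw = pd.DataFrame({
    "SP500": [100.0, 110.0, 121.0],
    "ImpactScore": [0.1, 0.2, 0.3],
    "INTC_Adj_Close": [10.0, 11.0, 12.1],
})
demo_exog = build_exog(demo_raw)
```

The endogenous series must be restricted to the same index as the returned frame before fitting, since the first row is lost to the return calculation.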

GARCH

A GARCH(1,1) layer is fitted to the log‑return residuals to model volatility clustering, yielding a log‑likelihood of −2156 (higher than ARIMA's).
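
For reference, the GARCH(1,1) conditional variance follows the recursion sigma2[t] = omega + alpha * resid[t-1]^2 + beta * sigma2[t-1], and the Gaussian log-likelihood is summed over the residuals given those variances. The sketch below evaluates both for fixed parameters; it does not estimate them (the `arch` package would handle that), and the parameter values and residuals are hypothetical.

```python
import numpy as np

def garch11_variance(resid, omega, alpha, beta):
    """Conditional variance path of a GARCH(1,1) for given parameters."""
    resid = np.asarray(resid, dtype=float)
    sigma2 = np.empty_like(resid)
    sigma2[0] = resid.var()                      # initialise at sample variance
    for t in range(1, len(resid)):
        sigma2[t] = omega + alpha * resid[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2

def gaussian_loglik(resid, sigma2):
    """Gaussian log-likelihood of residuals under time-varying variance."""
    resid = np.asarray(resid, dtype=float)
    return float(-0.5 * np.sum(np.log(2 * np.pi) + np.log(sigma2)
                               + resid ** 2 / sigma2))

# Hypothetical residuals and parameters, for illustration only.
demo_resid = np.array([1.0, -1.0, 1.0, -1.0])
demo_sigma2 = garch11_variance(demo_resid, omega=0.2, alpha=0.3, beta=0.5)
demo_ll = gaussian_loglik(demo_resid, demo_sigma2)
```

An optimiser would maximise `gaussian_loglik` over `omega`, `alpha`, and `beta`, subject to positivity and `alpha + beta < 1`.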

Sentiment Integration

Daily news headlines are scored by VADER and FinBERT; scores are averaged into an ImpactScore that enters SARIMAX and LSTM as a leading indicator.

The scatter plot shows a Pearson correlation of r = 0.43 between ImpactScore and next‑day return.
The event overlay shows price jumps aligning with major positive (green) and negative (red) news.
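
The aggregation into a daily ImpactScore and its alignment with the next day's return might look like the sketch below. It assumes each headline already carries a VADER score and a FinBERT score on the same [-1, 1] scale; the column names `vader` and `finbert`, the function names, and the sample values are hypothetical, and the scoring models themselves are not called here.

```python
import pandas as pd

def daily_impact_score(headlines):
    """Average the two model scores per headline, then average per day."""
    scored = headlines.copy()
    scored["score"] = scored[["vader", "finbert"]].mean(axis=1)
    return scored.groupby("date")["score"].mean().rename("ImpactScore")

def next_day_correlation(impact, returns):
    """Pearson r between today's ImpactScore and tomorrow's return."""
    next_ret = returns.shift(-1).rename("next_ret")   # tomorrow's return on today's row
    aligned = pd.concat([impact, next_ret], axis=1).dropna()
    return float(aligned["ImpactScore"].corr(aligned["next_ret"]))

# Hypothetical headline scores and daily returns.
days = pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"])
demo_headlines = pd.DataFrame({
    "date": [days[0], days[0], days[1], days[2], days[3]],
    "vader": [0.5, -0.1, 0.8, -0.6, 0.2],
    "finbert": [0.3, 0.1, 0.6, -0.4, 0.0],
})
demo_returns = pd.Series([0.0, 0.01, 0.03, -0.02], index=days)
demo_impact = daily_impact_score(demo_headlines)
demo_r = next_day_correlation(demo_impact, demo_returns)
```

Shifting the return series by one day, rather than the score, keeps the score on the date it was observed, which is what makes it usable as a leading indicator.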

Hybrid RF → LSTM‑Attention

Random Forest predicts next‑day price to create a meta‑feature.
Neural net architecture:

Input (20 features, T‑2 window)
     └─ bi‑LSTM (50) ─┐
     └─ bi‑LSTM (50) ─┘ → Attention → LSTM (50) → Dense(1)

Regularisation: Dropout 0.3, L2 0.01, EarlyStopping (patience 10).
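
Recurrent layers expect input of shape (samples, time_steps, features). A minimal sketch of turning a daily feature matrix into two-step windows, each paired with the following day's target, is given here; the function name and the random demo data are hypothetical.

```python
import numpy as np

def make_windows(features, target, time_steps=2):
    """Stack sliding windows of `time_steps` rows; label each with the next target."""
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    n_windows = len(features) - time_steps
    if n_windows <= 0:
        raise ValueError("not enough rows for a single window")
    # Window i covers rows i .. i+time_steps-1 and predicts row i+time_steps.
    X = np.stack([features[i:i + time_steps] for i in range(n_windows)])
    y = target[time_steps:]
    return X, y

# Hypothetical data: 10 days, 20 features.
rng = np.random.default_rng(0)
demo_features = rng.normal(size=(10, 20))
demo_target = rng.normal(size=10)
demo_X, demo_y = make_windows(demo_features, demo_target, time_steps=2)
```

With `time_steps=2`, every sample sees exactly the information available to the T‑2 baseline plus the extra features, so any gain over the baseline comes from the model rather than from a longer history.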

The plot compares Actual (blue), LSTM (red), and the naive T‑2 baseline (green).

Results

| Model | Inputs | Best Test Metric | T‑2 Baseline |
|---|---|---|---|
| ARIMA(4,1,0) | Close | MSE 34.2 | — |
| SARIMAX | Close + exog | MSE 22.5 | — |
| GARCH(1,1) | log‑σ² | LLH ↑ −2156 | — |
| LSTM‑Attention | 20 features, time_steps = 2 | MAE 2.42 | 3.44 |

Roughly 30 % MAE reduction from the T‑2 baseline (3.44) to the LSTM (2.42).
SARIMAX cuts ARIMA's MSE by about a third (34.2 → 22.5) by injecting macro and sentiment regressors.
Residual heavy tails in SARIMAX are mostly neutralised after the GARCH layer.

Thank you for visiting this project. If you have any questions or suggestions, feel free to open an issue or submit a pull request.
