ephys-gpt

Neural signal modeling toolkit for MEG/ephys research. It bundles preprocessing, dataset utilities, tokenizers, autoregressive/diffusion/flow/conv models, and PyTorch Lightning training + evaluation entrypoints.

Install

  • Python 3.13 (see setup.py).
  • Create/activate an environment, then run pip install -e . from the repository root.
  • Preprocessing depends on osl-ephys and MNE in addition to the packages in requirements.txt.

Repository layout

  • preprocess.py – CLI for multi-stage MEG preprocessing.
  • run.py – Unified train/test/eval/tokenizer driver built on ExperimentDL.
  • evals.py – Lightweight evaluation runner for automated checkpoint sweeps.
  • ephys_gpt/preprocessing – Dataset-specific preprocessors (OMEGA/MEG and MOUS) with OSL wrappers and post-processing steps.
  • ephys_gpt/dataset – Dataset builders, datasplitters, augmentations, mixup dataloader, and text helpers.
  • ephys_gpt/models – Model zoo (GPT-style AR, Wavenets, CNN/LSTM, diffusion/flow, tokenizers, MEGFormer, BrainOmni, etc.) plus layers/ and mdl/ submodules.
  • ephys_gpt/training – Lightning LitModel, experiment runners, optimizer/scheduler/pl logging glue, tokenizer trainers, and evaluation utilities.
  • ephys_gpt/eval – Evaluation classes invoked via run.py --mode eval (quantized AR, diffusion, flow/image, continuous, flat, text, VQ systems).
  • ephys_gpt/utils – Quantizers, plotting, sampling, metrics, and test helpers.
  • tests/ – Shape/causality checks for datasets and models.
  • configs/ – Example configs for preprocessing, models, training/eval, augmentations, and local development.

Running the main pipelines

Preprocessing (preprocess.py)

The preprocessing script orchestrates OSL-ephys (stage 1) and custom transforms (stage 2/3) for different datasets.

python preprocess.py --dataset <omega|mous|mous_conditioned> \
  --stage <stage_1|stage_2|stage_3|both|all> \
  --args configs/local/preprocess.yaml

  • --dataset selects the dataset-specific pipeline (OMEGA, MOUS, or MOUS with conditioning).
  • --stage chooses which stages to run (both = stage_1+stage_2, all runs all available stages).
  • --args points to a YAML file containing stage parameters.

Outputs are stored under preprocessed/<sub-...>/<i>.npy with data arrays and metadata (sfreq, ch_names, pos_2d, etc.).

Preprocess config anatomy (configs/local/preprocess.yaml)

  • Paths: data_path (raw dataset root), out_root (optional override for preprocessed/), osl_config (OSL YAML path or inline dict to forward to osl_ephys.preprocessing.run_proc_chain).
  • Stage selection: stage_1 keys mirror OSL-ephys parameters (filtering, ICA, bad channel detection). stage_2 handles chunking/normalization/quantization via preproc_config. stage_3 handles post-processing such as conditioning labels for MOUS.
  • Performance: n_workers (Dask workers), max_bad_channels and max_chunks to bound work, chunk_seconds to control sequence length for downstream models.
  • Transforms: preproc_config toggles normalize, clip, mulaw_quantize, random_sign_flip, etc. Use mulaw_quantize parameters to set codebook size for downstream quantizers.
  • Metadata: save_info enables saving sensor metadata alongside .npy chunks for plotting/evaluation.
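
A hypothetical sketch of how these pieces might fit together in configs/local/preprocess.yaml (key names follow the bullets above; the exact nesting is an assumption):

# Sketch only -- key names come from the descriptions above, but the exact
# nesting of configs/local/preprocess.yaml is an assumption.
data_path: /data/omega/raw            # raw dataset root
out_root: /data/omega/preprocessed    # optional override for preprocessed/
osl_config: configs/local/osl.yaml    # forwarded to osl_ephys.preprocessing.run_proc_chain
n_workers: 8                          # Dask workers
max_bad_channels: 10                  # bound the per-recording bad-channel count
max_chunks: 500                       # cap the number of chunks written per subject
chunk_seconds: 30                     # sequence length seen by downstream models

preproc_config:                       # stage_2 transforms
  normalize: true
  clip: 20
  mulaw_quantize: 256                 # codebook size; keep in sync with downstream quantizers
  random_sign_flip: false

save_info: true                       # store sensor metadata (sfreq, ch_names, pos_2d) alongside .npy chunks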

Training / testing / evaluation (run.py)

run.py loads a YAML config, resolves model/dataset/loss classes, and dispatches to the requested mode.

python run.py --mode train --args configs/gpt2meg/train.yaml
python run.py --mode test  --args configs/gpt2meg/train.yaml
python run.py --mode eval  --args configs/gpt2meg/eval.yaml
python run.py --mode tokenizer --args <tokenizer_yaml>
python run.py --mode tokenizer-text --args <text_tokenizer_yaml>

Training/test config anatomy

  • Composition: top-level YAML (configs/*/train.yaml) references a model_config (e.g., configs/gpt2meg/model.yaml) that defines architecture + tokenizer settings. Fields in the top-level file override/extend the nested model file when merged.
  • Datasets: datasplitter chooses a dataset class (OmegaDataset, QuantizedOmegaDataset, MousDataset, etc.) plus paths and sizing (dataset_root, example_seconds, step_seconds, val_ratio, example_overlap_seconds). dataset_kwargs configures quantization bins, positional encodings, and conditioning labels.
  • Dataloaders: dataloader configures batch size, workers, and persistence. For tokenizer training, use text_dataloader and tokenizer_data blocks to specify text corpora or image shards.
  • Models/losses: model_name and loss_name are strings resolved by training/utils.py. loss holds task-specific knobs (e.g., label_smoothing, alpha_l1, KL weights). model_config adds architecture depth/width, attention/windowing, quantizer heads, and tokenizer checkpoints.
  • Optim/Lightning: lightning carries optimizer choice (optimizer, lr, weight_decay, betas), scheduler (lr_scheduler, warmup_steps), AMP/compile toggles, gradient clipping, and logging frequency.
  • Trainer: trainer mirrors PyTorch Lightning arguments (accelerator, devices, precision, max_epochs, max_steps, accumulate_grad_batches, checkpointing/logging cadence).
  • Saving/resume: save_dir, resume_from, and ckpt_path control where checkpoints and logs land. Use version to segregate runs under the same save directory.
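
A hypothetical train.yaml skeleton using the field names above (the real configs/gpt2meg/train.yaml may nest or name things differently):

# Sketch only -- structure and values are assumptions based on the bullets above.
model_config: configs/gpt2meg/model.yaml   # architecture + tokenizer settings live here
model_name: GPT2MEG                        # resolved by training/utils.py
loss_name: CrossEntropy                    # hypothetical loss name
loss:
  label_smoothing: 0.1

datasplitter:
  dataset: QuantizedOmegaDataset           # hypothetical key for the dataset class
  dataset_root: preprocessed/
  example_seconds: 10
  step_seconds: 5
  val_ratio: 0.1
dataset_kwargs:
  quantization_bins: 256                   # hypothetical key; align with preproc_config

dataloader:
  batch_size: 32
  num_workers: 8
  persistent_workers: true

lightning:
  optimizer: adamw
  lr: 3.0e-4
  weight_decay: 0.01
  lr_scheduler: cosine
  warmup_steps: 1000

trainer:
  accelerator: gpu
  devices: 1
  precision: 16-mixed
  max_epochs: 100

save_dir: runs/gpt2meg
version: v0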

Evaluation config anatomy (run.py --mode eval)

  • eval_class chooses the evaluator (EvalQuant, EvalDiffusion, EvalFlow, EvalCont, EvalVQ, EvalFlat, EvalText).
  • eval block configures checkpoint selection (ckpt_path, version, step, best), data sizing (max_batches, num_examples), and generation settings (gen_sampling, temperature, top_k, top_p, future_steps, gen_seconds).
  • plot/sample options enable PSD plots, spectrograms, and sample dumping. shape_hints can enforce target sequence length or channels for diffusion/flow/image evaluators.
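
A hypothetical eval.yaml sketch along the same lines (field names follow the bullets above; nesting is an assumption):

# Sketch only -- field names follow the bullets above, nesting is an assumption.
eval_class: EvalQuant            # or EvalDiffusion, EvalFlow, EvalCont, EvalVQ, EvalFlat, EvalText
eval:
  ckpt_path: runs/gpt2meg
  best: true                     # pick the best checkpoint rather than a specific step
  max_batches: 50
  num_examples: 8
  gen_sampling: top_k
  temperature: 1.0
  top_k: 50
  gen_seconds: 5
plot:                            # hypothetical block for PSD plots / spectrograms / sample dumps
  psd: true
  spectrogram: true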

Automated evaluation runner (evals.py)

evals.py is a thin wrapper around ephys_gpt.training.eval_runner.EvaluationRunner, useful for periodic checkpoint checks.

# YAML config
python evals.py --args configs/gpt2meg/eval.yaml

# Inline JSON config
python evals.py --dict --args '{"save_dir": "...", "eval_runner": {"ckpt_path": "..."}, ...}'

The eval_runner block supports selecting a checkpoint (ckpt_path), version/step filtering, max_batches, num_examples, and optional generation settings under eval_runner.generate (strategy, temperature, top-k/top-p). The runner rebuilds the validation dataloader from the provided datasplitter/dataloader configuration before scoring.
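
A hypothetical YAML equivalent of the inline dict above (the eval_runner keys mirror the options just described; exact nesting is an assumption):

# Sketch only -- mirrors the eval_runner options described above.
save_dir: runs/gpt2meg
eval_runner:
  ckpt_path: runs/gpt2meg/checkpoints
  max_batches: 20
  num_examples: 4
  generate:
    strategy: top_p
    temperature: 0.9
    top_p: 0.95
datasplitter:                    # same blocks as the train config; the runner
  # ...                          # rebuilds the validation dataloader from them
dataloader:
  # ...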

Configuration tips

  • Start from the closest example under configs/ (e.g., configs/gpt2meg/train.yaml for OMEGA AR training, configs/cnnlstm/train.yaml for LibriBrain classification, configs/local/preprocess.yaml for preprocessing). Copy and adjust paths rather than editing the originals.
  • Keep architecture in model_config and data/training knobs in the top-level file. Shared defaults can be placed in a base YAML and extended via YAML anchors/include if desired.
  • Quantized pipelines: set datasplitter.dataset_root to preprocessed/ outputs and align tokenizer vocab/codebook sizes between preproc_config, dataset_kwargs, and the model config (see the sketch after this list).
  • Tokenizer training: use mode tokenizer/tokenizer-text with configs under configs/tokenizers/. These mirror the train configs but swap in ExperimentTokenizer/ExperimentTokenizerText and text/image dataset specs.
  • Remote runs: adjust trainer.accelerator/devices for multi-GPU, and ensure save_dir points to a shared filesystem; Lightning loggers are wired up in the training and logging modules.
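
For the quantized-pipeline tip above, the point is that the same codebook size should appear in all three places; a hypothetical illustration (key names are assumptions):

# preprocess config -> preproc_config
mulaw_quantize: 256      # number of mu-law levels written to preprocessed/

# train config -> dataset_kwargs
quantization_bins: 256   # hypothetical key; must match the preprocessing codebook

# model config
vocab_size: 256          # hypothetical key; the model's token vocabulary must match too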

Module quick reference

  • Preprocessing: dataset wrappers (Omega, MOUS, MOUSConditioned) orchestrating OSL-ephys and custom transforms.
  • Datasets: chunked OMEGA loaders (continuous/quantized/image), LibriBrain grouped classification helpers, text loaders, and a MixupDataLoader.
  • Models: GPT2MEG/STGPT2MEG/VQGPT2MEG, Wavenet variants, CNN/LSTM hybrids, diffusion (NTD), flow/image models (MEGFormer, ChronoFlowSSM), tokenizers (VideoGPTTokenizer, Emu3VisionVQ, BrainOmniTokenizer), and research attention stacks (LITRA/TACA/CK3D/TASA3D/LatteAR).
  • Layers/MDL: reusable attention/convolution blocks plus model wrappers in ephys_gpt.layers and ephys_gpt.mdl (e.g., Perceiver, LinearAttention, S4, CKConv, MLPMixer).
  • Losses: task-specific objectives in ephys_gpt.losses (e.g., contrastive, reconstruction, classification, multi-task losses).
  • Training: ExperimentDL, ExperimentTokenizer, and ExperimentTokenizerText wire configs into Lightning modules, optimizers, schedulers, and loggers; eval_runner and evaluation classes provide reusable scoring/generation hooks.
  • Eval: task-specific evaluators (EvalQuant, EvalDiffusion, EvalFlow, EvalCont, EvalVQ, EvalFlat, EvalText) that share dataset splitting logic and plotting/sampling utilities.
  • Logging: wrappers for WandB/CSV/TensorBoard in ephys_gpt.logging plus experiment helpers under scripts/.
  • Utils: quantizers (µ-law, residual VQ helpers), PSD/cov plotting, sampling helpers, YAML loaders, and gradient-causality assertions for tests.
  • Notebooks/Scripts: exploratory notebooks and utility scripts (e.g., dataset inspection, checkpoint export) under notebooks/ and scripts/.

Tests

Run the lightweight test suite with:

pytest -q

The suite checks dataset shapes, shift semantics, and image mapping; forward passes across key models (GPT2MEG, STGPT2MEG, MEGFormer, BENDR, NTD, Wavenet, tokenizers, etc.); and causality/gradient safety via utilities like tests/utils.assert_future_grad_zero. Tests use synthetic CPU inputs by default.

Contributing

  • Open issues or discussions before large changes; small fixes/docs are welcome via PR.
  • Keep README/config examples synchronized with new pipelines or arguments.
  • Ensure new models/datasets include minimal tests and synthetic-forward coverage. Prefer CPU-friendly fixtures in tests/.
  • Run pytest -q (and tokenizer-specific smoke tests if you add a tokenizer) before submitting.

About

Zoo of foundational models applied to MEG/ephys research.
