Superposition09m / SAE-Track

Code for the paper "Tracking the Feature Dynamics in LLM Training: A Mechanistic Study"

sae
sae_analysis
sae_lens
sae_training
sae_vis
transformer_lens
README.md
act_dynamics.py
cos_analysis_feature_centric.py
cos_plot_ckpt.py
feat_dynamics.py
feat_fns.py
generate_feat_seqs.py
pipeline.py
random_sim_baseline.py
sparse_autoencoder_trainer.py
traj.py
umap_vis.py
vis_in_one.py
w_no_jaccard.py
wjaccard_dynamics.py

Codebase for "Tracking the Feature Dynamics in LLM Training: A Mechanistic Study"

🔹 Utilities

feat_fns.py: Utility functions used throughout the codebase.
generate_feat_seqs.py: Generates datapoints corresponding to given features.

🔹 SAE-Track

pipeline.py: Implements SAE-Track by training a sequence of SAEs using sparse_autoencoder_trainer.py.
sparse_autoencoder_trainer.py: Trains individual SAEs on model activations.
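
The idea of training one SAE per model checkpoint can be sketched in a few lines. This is a minimal illustration, not the code in pipeline.py: the function names `train_sae` and `sae_track`, the tiny NumPy autoencoder, and all hyperparameters are hypothetical. It also assumes that each SAE is warm-started from the previous checkpoint's SAE so that feature indices stay comparable, which this README does not state.

```python
import numpy as np

def train_sae(acts, init=None, d_sae=8, steps=100, lr=1e-2, l1=1e-3, seed=0):
    """Fit a tiny ReLU sparse autoencoder to `acts` (n, d_model) by gradient descent.

    Hypothetical stand-in for a real SAE trainer. If `init` is given, training
    starts from a copy of those parameters instead of a random initialization.
    """
    rng = np.random.default_rng(seed)
    n, d_model = acts.shape
    if init is None:
        p = {
            "W_enc": rng.normal(0.0, 0.1, (d_model, d_sae)),
            "b_enc": np.zeros(d_sae),
            "W_dec": rng.normal(0.0, 0.1, (d_sae, d_model)),
            "b_dec": np.zeros(d_model),
        }
    else:
        p = {k: v.copy() for k, v in init.items()}

    for _ in range(steps):
        pre = acts @ p["W_enc"] + p["b_enc"]        # pre-activations, (n, d_sae)
        f = np.maximum(pre, 0.0)                    # sparse feature activations
        err = f @ p["W_dec"] + p["b_dec"] - acts    # reconstruction error
        # Gradients of: mean squared reconstruction error + l1 * mean |f|
        g_recon = 2.0 * err / n
        g_f = g_recon @ p["W_dec"].T + l1 * (f > 0) / n
        g_pre = g_f * (pre > 0)
        grads = {
            "W_enc": acts.T @ g_pre,
            "b_enc": g_pre.sum(axis=0),
            "W_dec": f.T @ g_recon,
            "b_dec": g_recon.sum(axis=0),
        }
        for k in p:
            p[k] -= lr * grads[k]
    return p

def sae_track(checkpoint_acts, **train_kwargs):
    """Train one SAE per checkpoint, warm-starting each from the previous one (assumed)."""
    saes, prev = [], None
    for acts in checkpoint_acts:
        prev = train_sae(acts, init=prev, **train_kwargs)
        saes.append(prev)
    return saes

# Usage: random arrays stand in for activations collected at three checkpoints.
rng = np.random.default_rng(1)
ckpts = [rng.normal(size=(32, 4)) for _ in range(3)]
saes = sae_track(ckpts, d_sae=6, steps=20)
```

Under the warm-start assumption, row i of `W_dec` in consecutive SAEs refers to the same tracked feature, which is what makes per-feature comparisons across checkpoints meaningful.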

🔹 Feature Semantics

vis_in_one.py: Feature panel visualization, including semantic information.

🔹 Feature Formation

umap_vis.py: UMAP visualization.
act_dynamics.py: Computes activation space progress measures.
feat_dynamics.py: Computes feature space progress measures.
w_no_jaccard.py: Uses Jaccard similarity as the progress measure.
wjaccard_dynamics.py: Uses weighted Jaccard similarity as the progress measure.
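
For reference, the two similarity measures named above have standard definitions, sketched below. The helper names `jaccard` and `weighted_jaccard` are illustrative and are not taken from the repository; how the scripts apply these measures to activations or features is not described in this README.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections treated as sets: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    # Convention chosen here: two empty sets are treated as identical.
    return len(a & b) / len(union) if union else 1.0

def weighted_jaccard(x, y):
    """Weighted Jaccard similarity of two equal-length non-negative vectors:
    sum_i min(x_i, y_i) / sum_i max(x_i, y_i)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if any(v < 0 for v in x) or any(v < 0 for v in y):
        raise ValueError("weighted Jaccard requires non-negative entries")
    denom = sum(max(p, q) for p, q in zip(x, y))
    # Same convention as above for two all-zero vectors.
    return sum(min(p, q) for p, q in zip(x, y)) / denom if denom else 1.0

print(jaccard({1, 2, 3}, {2, 3, 4}))            # 0.5
print(weighted_jaccard([1, 0, 2], [1, 1, 0]))   # 0.25
```

The weighted variant reduces to the set version when both vectors are 0/1 indicators, so it can be read as a Jaccard similarity that also accounts for activation magnitude.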

🔹 Feature Drift

cos_analysis_feature_centric.py: Cosine similarity analysis focusing on features.
cos_plot_ckpt.py: Cosine similarity visualization across checkpoints.
traj.py: Analyzes trajectories of decoder vectors (W_dec).
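
A minimal sketch of one way to measure decoder-vector drift with cosine similarity follows. It assumes that row i of each `W_dec` matrix refers to the same feature at every checkpoint and compares each checkpoint to the final one; both choices, and the function names, are assumptions for illustration rather than the repository's implementation.

```python
import numpy as np

def rowwise_cosine(W_a, W_b, eps=1e-12):
    """Cosine similarity between matching rows of two (n_features, d_model) matrices."""
    num = (W_a * W_b).sum(axis=1)
    denom = np.linalg.norm(W_a, axis=1) * np.linalg.norm(W_b, axis=1)
    return num / np.maximum(denom, eps)  # eps guards against zero-norm rows

def drift_to_final(W_dec_by_ckpt):
    """Return an (n_checkpoints, n_features) array of cosine similarities
    between each checkpoint's decoder vectors and those of the last checkpoint."""
    final = W_dec_by_ckpt[-1]
    return np.stack([rowwise_cosine(W, final) for W in W_dec_by_ckpt])

# Usage: two toy checkpoints with two features each.
W0 = np.array([[1.0, 0.0], [0.0, 1.0]])
W1 = np.array([[1.0, 1.0], [0.0, 2.0]])
sims = drift_to_final([W0, W1])
```

In this toy case the first feature rotates by 45 degrees between checkpoints (similarity about 0.707), while the second only changes in norm, so its similarity stays at 1.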
