Matthew A. Clarke, Hardik Bhatnagar, Joseph Bloom
This repository contains the code for the LessWrong post *Compositionality and Ambiguity: Latent Co-occurrence and Interpretable Subspaces*. See also our app to explore the results.
This project uses Poetry for dependency management. To install dependencies, run:
```bash
poetry install
```
You will also need access to HuggingFace via an API token for the base models.
In `src/cooc`:

`1_generate_normalised_features_loop.py` generates normalised feature co-occurrence data according to a config file specified by the user. The results are saved as compressed NumPy arrays (`.npz`). For each model and SAE layer this generates:

- A co-occurrence matrix of shape `[n_features, n_features]` (e.g., `feature_acts_cooc_total_threshold_1_5.npz`)
- A normalised co-occurrence matrix of shape `[n_features, n_features]` (e.g., `feature_acts_cooc_jaccard_threshold_1_5.npz`)
- A list of overall feature occurrences of shape `[n_features]` (e.g., `feature_acts_total_threshold_1_5.npz`)
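As a quick sanity check, the saved arrays can be loaded back with NumPy. A minimal sketch (the key names inside each `.npz` archive are not documented here, so we look them up first):

```python
import numpy as np

# Minimal sketch: inspect and load the arrays saved by
# 1_generate_normalised_features_loop.py. The key names inside each
# .npz archive may differ; list them before indexing.
with np.load("feature_acts_cooc_jaccard_threshold_1_5.npz") as data:
    print(data.files)                  # names of the stored arrays
    jaccard = data[data.files[0]]      # [n_features, n_features] normalised co-occurrence

with np.load("feature_acts_total_threshold_1_5.npz") as data:
    totals = data[data.files[0]]       # [n_features] overall feature occurrences

print(jaccard.shape, totals.shape)
```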
`2_generate_graphs_loop.py` calculates an edge-weight threshold such that the largest connected component of the normalised feature co-occurrence graph stays below a maximum size. It then generates a dataframe of all nodes, their subgraphs, and Neuronpedia links for those subgraphs.
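For intuition, this thresholding step amounts to dropping weak edges until the largest connected component is small enough. A minimal sketch with networkx, using the config default from below (the actual selection logic in `2_generate_graphs_loop.py` may differ):

```python
import networkx as nx
import numpy as np

def largest_cc_size(jaccard: np.ndarray, edge_threshold: float) -> int:
    """Size of the largest connected component after dropping weak edges."""
    adjacency = np.where(jaccard >= edge_threshold, jaccard, 0.0)
    np.fill_diagonal(adjacency, 0.0)
    graph = nx.from_numpy_array(adjacency)  # nonzero entries become weighted edges
    graph.remove_nodes_from(list(nx.isolates(graph)))
    if graph.number_of_nodes() == 0:
        return 0
    return max(len(cc) for cc in nx.connected_components(graph))

def find_edge_threshold(jaccard: np.ndarray, max_subgraph_size: int = 200) -> float:
    """Raise the threshold until the largest component is below max_subgraph_size."""
    for threshold in np.linspace(0.0, 1.0, 101):
        if largest_cc_size(jaccard, threshold) <= max_subgraph_size:
            return float(threshold)
    return 1.0
```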
Output:

- A datatable of all nodes, subgraphs, and Neuronpedia links (e.g., `dataframes/node_info_df_1_5.csv`)
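A minimal sketch for inspecting that datatable with pandas (the `subgraph_id` column name is an assumption based on the description above; check the printed columns for the real names):

```python
import pandas as pd

# Load the node/subgraph table written by 2_generate_graphs_loop.py.
node_info = pd.read_csv("dataframes/node_info_df_1_5.csv")
print(node_info.columns.tolist())  # inspect the actual column names

# Hypothetical example: nodes per subgraph, assuming a 'subgraph_id' column.
# print(node_info.groupby("subgraph_id").size().sort_values(ascending=False).head())
```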
`3_analyse_subspaces.ipynb` is a template notebook for analysing a subspace using PCA.
`4_pca_for_streamlit.py` generates PCA data for a set of example graphs.

Output:

- An h5 file containing:
  - PCA data (`pca_df`)
  - Results from the `ProcessedResults` class in `pca.py` (includes tokens, context, etc.)
A template notebook is also provided to examine and analyse PCA data saved as h5 files by `4_pca_for_streamlit.py`, rather than running the PCA from scratch as in `3_analyse_subspaces.ipynb`.
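A minimal sketch for opening those h5 files outside the notebook (the `pca_df` key is named above; the file name and any other keys are assumptions, so list them first):

```python
import h5py
import pandas as pd

path = "example_cluster_pca.h5"  # hypothetical file name; use your own output

# List the top-level keys to see what was stored.
with h5py.File(path, "r") as f:
    print(list(f.keys()))

# If the PCA dataframe was written via pandas' HDF5 support, it can be
# read back directly (adjust the key if it differs).
pca_df = pd.read_hdf(path, key="pca_df")
print(pca_df.head())
```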
Config options:

- `n_batches`: Number of batches of the activation store to cycle through (default: 1000 for gpt2-small, 500 for feature splitting)
- `model_name`: Name of the model to use
- `sae_release_short`: Short name of the SAE release (e.g., 'res-jb' or 'res-jb-feature-splitting')
- `sae_ids`: List of SAE IDs to use
- `activation_thresholds`: List of activation thresholds for feature activation counting
- `remove_special_tokens`: Remove pad, BOS, and EOS tokens before generating clusters

- `random_seed`: Random seed
- `min_subgraph_size`: Minimum size of the largest connected component (default: 150)
- `max_subgraph_size`: Maximum size of the largest connected component (default: 200)
- `min_subgraph_size_to_plot`: Minimum subgraph size for HTML/pyvis visualization
- `skip_subgraph_plots`: Toggle subgraph plotting
- `skip_subgraph_pickles`: Toggle saving subgraphs as pickle files
- `include_metrics`: Toggle inclusion of hubness metrics

- `candidate_sizes`: List of candidate subgraph sizes for PCA analysis
- `candidates_per_size`: Number of candidates per size
- `n_batches_reconstruction`: Number of batches used for PCA reconstruction
- `recalculate_results`: Deprecated
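Taken together, a run configuration might look roughly like the following, shown as a Python dict for illustration (a hypothetical sketch: the repository's actual config file format, SAE IDs, and values may differ):

```python
# Hypothetical config sketch; the real config file format may differ.
config = {
    "model_name": "gpt2-small",
    "sae_release_short": "res-jb",
    "sae_ids": ["blocks.8.hook_resid_pre"],  # example SAE ID; check your release
    "n_batches": 1000,
    "activation_thresholds": [1.5],
    "remove_special_tokens": True,
    "random_seed": 42,
    "min_subgraph_size": 150,
    "max_subgraph_size": 200,
    "min_subgraph_size_to_plot": 10,
    "skip_subgraph_plots": False,
    "skip_subgraph_pickles": False,
    "include_metrics": True,
    "candidate_sizes": [10, 25, 50],
    "candidates_per_size": 3,
    "n_batches_reconstruction": 100,
}
```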
This project uses Ruff for linting and formatting, Pyright for type checking, and Pytest for testing.
To run all checks:
```bash
make check-ci
```
In VSCode, install the Ruff extension for automatic linting and formatting. Enable formatting on save for best results.
Install the pre-commit hook for automatic linting and type-checking:
```bash
poetry run pre-commit install
```
Common Poetry commands:

- Add a main dependency: `poetry add <package>`
- Add a development dependency: `poetry add --dev <package>`
- Update the lockfile: `poetry lock`
- Run a command in the virtual environment: `poetry run <command>`
- Run a Python file as a module: `poetry run python -m sae_cooccurrence.path.to.file`
See `src/size_effects/gpt2_768_heatmap_and_cluster_stats.ipynb`
See `src/size_effects/features_active_per_token.py`
See `src/size_effects/feature_distribution_data.py` and `src/size_effects/feature_distribution_plots.py`
See `src/size_effects/subgraph_size_vs_width.ipynb`
See `src/size_effects/feature_graph_l0.py`
SAEs trained on Gemma-2-2b (Gemma Scope) encode qualitative statements about the number of items compositionally (Figures 12, 13, 14):
See `src/example_clusters/gemma_one_of_4740_layer_12_1_5_activation_100_batch_100_pca.ipynb`. This relies on loading the h5 file generated by `5_pca_for_streamlit.py` for this cluster.
Encoding of continuous properties in feature strength without compositionality (Figures 22, 23; Appendix Figure 16):
See `src/example_clusters/gemma_counting_1370_layer_0_1_5_100_100.ipynb` and `src/example_clusters/gemma_first_second_layer_21_1_5_511.ipynb`.
Encoding of continuous properties in feature strength without compositionality (Appendix Figure 17):
See `src/example_clusters/gemma_apostrophe_4334_layer_12_1_5_100_100.ipynb` and `src/example_clusters/gpt2_layer8_24k_1_5_787_how.ipynb`.
Distinguishing the type of entity whose possession is indicated by an apostrophe in Gemma-2-2b (Figures 28, 25, 26):
See `src/example_clusters/gemma_apostrophe_4334_layer_12_1_5_100_100.ipynb`
Months of the year (Appendix Figure 11)
The Streamlit app uses data generated by `5_pca_for_streamlit.py`, which depends on the dataframes generated by `2_generate_graphs_loop.py`.
Where the data files are too large to store in a GitHub repository, they can be split with `src/datahandling/split_h5.py`. This is not necessary when running locally.
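For reference, a generic byte-level split and rejoin looks roughly like this (a sketch only; the actual interface and chunking scheme of `src/datahandling/split_h5.py` may differ):

```python
from pathlib import Path

CHUNK_BYTES = 90 * 1024 * 1024  # stay under GitHub's 100 MB per-file limit

def split_file(path: str) -> None:
    """Split a large file into numbered byte chunks."""
    data = Path(path).read_bytes()
    for i in range(0, len(data), CHUNK_BYTES):
        part = Path(f"{path}.part{i // CHUNK_BYTES:03d}")
        part.write_bytes(data[i : i + CHUNK_BYTES])

def join_file(path: str) -> None:
    """Reassemble the numbered chunks back into the original file."""
    parts = sorted(Path(path).parent.glob(Path(path).name + ".part*"))
    Path(path).write_bytes(b"".join(p.read_bytes() for p in parts))
```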
To run the streamlit app:
```bash
poetry run streamlit run sae_cooccurrence/general_streamlit.py
```
The Streamlit app configuration is stored in `sae_cooccurrence/config_pca_streamlit_maxexamples.toml`. This allows the user to specify the models, SAE releases, and SAE IDs to use in the app.
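A minimal sketch for inspecting that configuration with Python's built-in tomllib (Python 3.11+; the table and key names inside the file are not documented here, so we just list them):

```python
import tomllib

# Load the Streamlit app config and list its top-level tables.
with open("sae_cooccurrence/config_pca_streamlit_maxexamples.toml", "rb") as f:
    config = tomllib.load(f)

print(list(config.keys()))
```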
Please cite the package as follows:

```bibtex
@misc{clarke2024saecooccurrence,
  title = {Compositionality and Ambiguity: Latent Co-occurrence and Interpretable Subspaces},
  author = {Matthew A. Clarke and Hardik Bhatnagar and Joseph Bloom},
  year = {2024},
  howpublished = {\url{https://github.com/mclarke1991/sae_cooccurrence}},
}
```