GitHub - acoastalfog/connections: An AI-generated embeddings and LLM workflow for NYT connections

NYT Connections Solver Assistant

This Python script assists in solving the New York Times Connections puzzle by generating potential groupings based on semantic similarity and suggesting category names using a local Large Language Model (LLM).

It takes the 16 puzzle words as input and produces multiple distinct guesses, each partitioning the words into four groups of four.

Features

  • Accepts the 16 Connections words via command-line input.
  • Uses Sentence Transformer models to embed words into vector space.
  • Calculates pairwise cosine similarity between word embeddings.
  • Employs a heuristic search to find multiple distinct 4x4 partitions of the words based on internal group cohesion (average similarity).
  • Optionally uses a local Large Language Model (LLM) via the Hugging Face transformers library to suggest a concise category name for each generated group.
  • Configurable options via command line:
    • Number of partition guesses to generate.
    • Choice of Sentence Transformer embedding model.
    • Choice of local LLM model.
    • Hugging Face Hub token for accessing models.
    • Option to disable the LLM feature entirely.

How it Works

  1. Embedding: The 16 input words are converted into numerical vectors (embeddings) using a specified Sentence Transformer model (e.g., all-MiniLM-L6-v2). These embeddings capture semantic meaning.
  2. Similarity Calculation: The cosine similarity between all pairs of word embeddings is computed.
  3. Group Scoring: All possible combinations of 4 words are generated, and each combination is scored based on the average pairwise similarity of the words within it.
  4. Heuristic Partitioning: The script greedily searches for valid partitions by combining high-scoring, disjoint groups until full 4x4 partitions are formed. This process is repeated to generate multiple distinct partition guesses.
  5. LLM Category Suggestion (Optional): If enabled, each 4-word group from the generated partitions is sent to the local LLM to request the single most likely concise category name.
  6. Output: The script displays the top N partition guesses, showing the word groups and, if enabled, the category suggestion from the LLM for each group.
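The core of steps 1–4 can be sketched as follows. This is a minimal illustration, not the repository's actual code: a real run would obtain embeddings from a Sentence Transformer model, but here random vectors stand in so the sketch is self-contained.

```python
from itertools import combinations
import numpy as np

def cosine_sim_matrix(emb):
    # Normalize rows; the Gram matrix of unit vectors is pairwise cosine similarity.
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return unit @ unit.T

def score_groups(sim, n_words=16, size=4):
    # Score every 4-word combination (C(16,4) = 1820) by average pairwise similarity.
    scored = []
    for group in combinations(range(n_words), size):
        pair_sims = [sim[i, j] for i, j in combinations(group, 2)]
        scored.append((float(np.mean(pair_sims)), group))
    return sorted(scored, reverse=True)

def greedy_partition(scored_groups, n_words=16):
    # Greedily take the highest-scoring group disjoint from those chosen so far.
    used, partition = set(), []
    for _score, group in scored_groups:
        if used.isdisjoint(group):
            partition.append(group)
            used.update(group)
        if len(used) == n_words:
            return partition
    return None

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(16, 384))  # stand-in for real model embeddings
sim = cosine_sim_matrix(embeddings)
partition = greedy_partition(score_groups(sim))
```

In practice the random matrix would be replaced by something like `SentenceTransformer("all-MiniLM-L6-v2").encode(words)`, and the greedy pass would be rerun with different seed groups to produce multiple distinct partition guesses.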

Requirements

  • Python: 3.8+ recommended.
  • Libraries:
    • sentence-transformers
    • scikit-learn (for cosine_similarity)
    • numpy
    • torch (PyTorch - ensure compatibility with your CUDA version if using GPU)
    • transformers (Hugging Face library)
    • accelerate (Hugging Face helper library)
    • bitsandbytes (for 4-bit LLM quantization)
    pip install sentence-transformers scikit-learn numpy torch transformers accelerate bitsandbytes
    (Note: Ensure you install a torch version compatible with your system and CUDA setup if applicable. See PyTorch installation guide)
  • Hardware (for LLM feature):
    • NVIDIA GPU: Required for running the local LLM efficiently. Ensure appropriate NVIDIA drivers and a compatible CUDA toolkit are installed.
    • VRAM: The default LLM (mistralai/Mistral-7B-Instruct-v0.3 loaded in 4-bit) requires ~6-8GB VRAM. Check requirements if changing models.
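Loading the default LLM in 4-bit with transformers, accelerate, and bitsandbytes looks roughly like this. This is a sketch of the standard quantized-loading pattern, not necessarily the script's exact code, and argument details can vary across library versions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization keeps the 7B model within ~6-8GB of VRAM.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model_id = "mistralai/Mistral-7B-Instruct-v0.3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # accelerate places layers on the available GPU(s)
)
```

The first call downloads the model weights from the Hugging Face Hub, which is where a token may be required for gated models.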

Setup

  1. Clone Repository (Optional):
    git clone <repository_url>
    cd <repository_directory>
  2. Install Dependencies:
    pip install sentence-transformers scikit-learn numpy torch transformers accelerate bitsandbytes
  3. Local LLM Setup (Optional):
    • Run with --disable_llm if you don't have a suitable GPU or don't want category suggestions.
    • Model Download: The first time you run with a specific --llm_model_id, the model weights (potentially several GB) will be downloaded.
    • CUDA: Ensure your environment (drivers, CUDA toolkit, PyTorch CUDA version) is correctly set up.
  4. Hugging Face Token (Optional):
    • Needed only for gated/private models. Provide via --hf_token "hf_YOURTOKEN", the DEFAULT_HF_TOKEN variable in the script, huggingface-cli login, or the HF_TOKEN environment variable.

Limitations

  • Heuristic Guesses: The partition finding is based on a heuristic and is not guaranteed to find the optimal or intended Connections solution, only plausible groupings based on semantic similarity.
  • LLM Dependence: Category suggestions rely heavily on the chosen local LLM's capability. LLMs can be inaccurate, hallucinate, or struggle with nuanced Connections categories (puns, specific knowledge).
  • Computational Cost: Running the local LLM requires significant computational resources (GPU, VRAM), and initial model downloads can be large.
  • Environment Sensitivity: Correct installation of libraries, drivers, and the CUDA toolkit is crucial for GPU features.

Usage

Run the script from your terminal, providing the 16 words as a single comma-separated string using the -w argument.

python connections_solver_heuristic.py -w "WORD1,WORD2,..." [options]

# Get 3 guesses using default models
python connections_solver_heuristic.py -w "APPLE,BANANA,ORANGE,PEAR,SHIRT,PANTS,SOCKS,HAT,TABLE,CHAIR,SOFA,LAMP,DOG,CAT,FISH,BIRD" -n 3

# Use a different embedding model and disable the LLM
python connections_solver_heuristic.py -w "RED,BLUE,GREEN,YELLOW,ONE,TWO,THREE,FOUR,SQUARE,CIRCLE,TRIANGLE,RECTANGLE,NORTH,SOUTH,EAST,WEST" --embedding_model_id "all-mpnet-base-v2" --disable_llm

# Use a different LLM and provide a token
python connections_solver_heuristic.py -w "PASTE,STICK,GLUE,TAPE,BUTTER,MARGARINE,LARD,OIL,BEAR,DEER,ELK,MOOSE,SOLDER,WELD,BRAZE,RIVET" --llm_model_id "HuggingFaceH4/zephyr-7b-beta" --hf_token "hf_YOURTOKEN"
