SketchAgent: Language-Driven Sequential Sketch Generation



Yael Vinker, Tamar Rott Shaham, Kristine Zheng, Alex Zhao, Judith E Fan, Antonio Torralba


SketchAgent leverages an off-the-shelf multimodal LLM to facilitate language-driven, sequential sketch generation through an intuitive sketching language. It can sketch diverse concepts, engage in interactive sketching with humans, and edit content via chat.

Setup

Clone the repository and navigate to the project folder:

git clone https://github.com/yael-vinker/SketchAgent.git
cd SketchAgent

Set up the environment:

conda env create -f environment.yml
conda activate sketch_agent

For Mac users, use the following environment file instead:

conda env create -f mac_environment.yml
conda activate sketch_agent

If Python prints a warning related to cairosvg, try reinstalling the package:

conda uninstall cairosvg && conda install cairosvg

API Key

This repository requires an Anthropic API key. If you don't have one, create an Anthropic account and follow the instructions to obtain a key.

Once you have the key, save it in the .env file:

ANTHROPIC_API_KEY=<your_key>
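Before launching any of the scripts, it can help to confirm that the key is actually visible. The following is a minimal sketch that parses simple KEY=VALUE lines by hand using only the standard library. It is an assumption about the file layout, not a description of how this repository loads .env; the helper names read_env_file and has_api_key are hypothetical.

```python
import os
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines; blank lines and '#' comments are skipped."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_api_key(path=".env", name="ANTHROPIC_API_KEY"):
    """Return True if the key is set in the file or in the process environment."""
    value = read_env_file(path).get(name) or os.environ.get(name, "")
    # Treat the README's literal placeholder as "not set".
    return bool(value) and value != "<your_key>"


# Usage (run from the folder that contains .env):
#   print("ANTHROPIC_API_KEY found:", has_api_key())
```

The check only reports whether a non-placeholder value is present; it does not verify that the key is valid with Anthropic.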

Start Sketching! 👩‍🎨 🎨

Text-to-Sketch

Generate a single sketch by running:

python gen_sketch.py --concept_to_draw "<your_concept_here>" 

For example:

python gen_sketch.py --concept_to_draw "sailboat" 

Optional arguments:

  • --seed_mode Default is "deterministic" for reproducible results. Set to "stochastic" for increased variability.
  • --path2save By default, results are saved to results/test/.
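To sketch several concepts in one go, the command line above can be assembled from Python. This is a rough sketch that uses only the three flags documented here (--concept_to_draw, --seed_mode, --path2save). The helper names build_command and sketch_all, the results/batch folder, and the choice of one output subfolder per concept are assumptions for illustration, since how the script names files inside --path2save is not described here.

```python
import subprocess
import sys


def build_command(concept, seed_mode="deterministic", path2save=None):
    """Build the argument list for one gen_sketch.py run."""
    cmd = [
        sys.executable, "gen_sketch.py",
        "--concept_to_draw", concept,
        "--seed_mode", seed_mode,
    ]
    if path2save is not None:
        cmd += ["--path2save", path2save]
    return cmd


def sketch_all(concepts, seed_mode="deterministic", base_dir="results/batch"):
    """Run gen_sketch.py once per concept, each with its own output folder."""
    for concept in concepts:
        # Hypothetical layout: results/batch/<concept>, spaces replaced by "_".
        out_dir = f"{base_dir}/{concept.replace(' ', '_')}"
        subprocess.run(build_command(concept, seed_mode, out_dir), check=True)


# Usage (run from the repository root with the conda environment active):
#   sketch_all(["sailboat", "coffee mug"], seed_mode="stochastic")
```

Passing the arguments as a list avoids shell quoting problems with multi-word concepts.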

Collaborative Sketching

Collaborate with SketchAgent by alternating strokes! To use the interactive interface:

python collab_sketch.py

This will launch a Flask-based web application. Once running, look for the following output in the terminal:

Server running at: http://<your-ip-address>:5000

Open the provided URL in your web browser to interact with the application. Results are saved to results/collab_sketching/. Use the text box to change the concept to be drawn.

Chat-Based Editing


Interact with SketchAgent through natural language to edit existing sketches! To use the chat-based editing interface:

python chat_and_edit.py

This will launch a Flask-based web application. Once running, look for output like:

Server running at: http://<your-ip-address>:5000

You can then:

  • Give textual instructions to edit specific sketch elements
  • Add new elements through natural conversation

Results are saved to:

results/api_{timestamp}_{session_id}
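Because each editing session writes to its own folder, a small helper can locate the most recent one. The sketch below assumes only the api_ prefix shown above and picks the folder with the latest modification time; it does not parse the timestamp, whose exact format is not specified here. The function name latest_session_dir is hypothetical.

```python
from pathlib import Path


def latest_session_dir(results_root="results"):
    """Return the most recently modified results/api_* folder, or None."""
    root = Path(results_root)
    candidates = [p for p in root.glob("api_*") if p.is_dir()]
    if not candidates:
        return None
    # Sort by modification time rather than by name, since the
    # timestamp format in the folder name is not documented.
    return max(candidates, key=lambda p: p.stat().st_mtime)


# Usage:
#   print(latest_session_dir())
```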

🧑‍💻 Accessing the Interface

✅ If Running Locally:

Open the printed URL (e.g., http://127.0.0.1:5000) in your browser.

🔐 If Running Remotely via SSH:

Use SSH port forwarding to access the app from your local browser:

  1. On your local machine (not the server), forward the port:

    ssh -L 5000:localhost:5000 your_username@remote_server_ip
  2. Once connected and the app is running, open:

    http://localhost:5000

    in your local browser. The SSH tunnel securely forwards traffic to the remote Flask app.
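If the page does not load, a quick way to tell whether the tunnel or the app is the problem is to test whether anything is listening on the forwarded port. A minimal sketch using the standard library socket module follows; the function name port_is_open is hypothetical, and port 5000 is taken from the examples above.

```python
import socket


def port_is_open(host="localhost", port=5000, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


# Usage (on your local machine, with the SSH session still open):
#   print("Port 5000 reachable:", port_is_open())
```

A result of False with the SSH session open usually means the Flask app is not running on the server yet, or that it is listening on a different port.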

Tips:

  • Sketches from gen_sketch.py can vary between runs, particularly with --seed_mode set to "stochastic". Try running it multiple times to explore different outcomes.
  • Prompts are available in the prompts.py file. For unique concepts, ensure that your input prompt is clear and meaningful.

TODOs

  • Add support for chat-based editing.
  • Add SVG drawing process animations in HTML.
  • Add support for other backbone models (GPT-4o, Llama 3).

Citation

If you find this useful for your research, please cite the following:

@misc{vinker2024sketchagent,
      title={SketchAgent: Language-Driven Sequential Sketch Generation}, 
      author={Yael Vinker and Tamar Rott Shaham and Kristine Zheng and Alex Zhao and Judith E Fan and Antonio Torralba},
      year={2024},
      eprint={2411.17673},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2411.17673}, 
}
