feat(image generation): Add image generation support for models OpenAI and Google Gemini by usiegj00 · Pull Request #1009 · patterns-ai-core/langchainrb
feat(image generation): Add image generation support for models OpenAI and Google Gemini #1009

Open · wants to merge 1 commit into main

Conversation

usiegj00

Image Generation Feature Implementation Summary

Overview

This PR adds first-class image generation support to langchainrb, addressing issue #924. The implementation follows existing patterns in the codebase while making image generation a provider-agnostic feature that works seamlessly across OpenAI, Google Gemini, and Google Vertex AI.

Key Design Decisions

1. Provider-Agnostic API

  • Added generate_image method to Langchain::LLM::Base that raises NotImplementedError
  • Each provider implements its own version with provider-specific parameters
  • Consistent interface across all providers: generate_image(prompt:, n: 1, ...)
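
A minimal sketch of what that base stub might look like (illustrative; the exact error message in the PR may differ):

module Langchain
  module LLM
    class Base
      # Providers that support image generation override this method;
      # everything else fails loudly instead of silently returning nil.
      def generate_image(prompt:, **_options)
        raise NotImplementedError, "#{self.class.name} does not support image generation"
      end
    end
  end
end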

2. Model Configuration via Defaults

  • Following the existing pattern, image generation models are configured through the @defaults hash
  • Added image_generation_model to DEFAULTS for each provider:
    • OpenAI: "dall-e-3"
    • Google Gemini: "gemini-2.0-flash-preview-image-generation"
    • Google Vertex AI: "imagen-3.0-generate-002"
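
Sketched for the OpenAI provider, that amounts to one extra key (illustrative only; the surrounding keys stand in for whatever each provider class already defines):

DEFAULTS = {
  # ...existing provider defaults (temperature, chat model, etc.)...
  image_generation_model: "dall-e-3" # Gemini: "gemini-2.0-flash-preview-image-generation",
                                     # Vertex AI: "imagen-3.0-generate-002"
}.freeze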

3. Response Handling

  • Providers return different formats:
    • OpenAI: URLs via image_urls method
    • Google providers: Base64-encoded data via image_base64s method
  • Response classes extended with appropriate helper methods
  • Google Gemini requires both TEXT and IMAGE response modalities in the request
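
For the Gemini case, the generateContent request body has to declare both modalities, roughly like this (a sketch of the documented request shape, not necessarily the PR's exact code):

request_body = {
  contents: [{parts: [{text: "A ruby gemstone"}]}],
  generationConfig: {responseModalities: ["TEXT", "IMAGE"]}
}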

Implementation Details

Files Modified

  1. lib/langchain/llm/base.rb

    • Added stub generate_image method that raises NotImplementedError
  2. lib/langchain/llm/openai.rb

    • Implemented generate_image using OpenAI's Images API (see the sketch after this list)
    • Returns URLs to generated images
    • Added image_generation_model to DEFAULTS
  3. lib/langchain/llm/google_gemini.rb

    • Implemented generate_image using Gemini's generateContent endpoint
    • Requires both TEXT and IMAGE response modalities
    • Returns base64-encoded images
    • Added image_generation_model to DEFAULTS
  4. lib/langchain/llm/google_vertexai.rb

    • Implemented generate_image using Vertex AI's Imagen model
    • Returns base64-encoded images
    • Added image_generation_model to DEFAULTS
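
As a rough illustration of the OpenAI piece (item 2 above), a generate_image built on the ruby-openai client that Langchain::LLM::OpenAI already wraps could look like the sketch below; the size and model parameters are assumptions, and the PR's actual code may differ:

# Sketch: assumes `client` is the wrapped ruby-openai OpenAI::Client and
# @defaults contains the image_generation_model key described earlier.
def generate_image(prompt:, n: 1, size: "1024x1024", model: nil)
  response = client.images.generate(
    parameters: {
      prompt: prompt,
      model: model || @defaults[:image_generation_model],
      n: n,
      size: size
    }
  )
  Langchain::LLM::OpenAIResponse.new(response)
end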

Response Classes

  1. lib/langchain/llm/response/openai_response.rb

    • Added image_urls method to extract URLs from image generation responses (see the sketch after this list)
  2. lib/langchain/llm/response/google_gemini_response.rb

    • Added image_base64s method to extract base64 data from inlineData.data field
  3. lib/langchain/llm/response/google_vertex_ai_response.rb (new file)

    • Created specialized response class for Vertex AI image responses
    • Implements image_base64s method
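
A sketch of the two helpers described above (items 1 and 2), assuming raw_response holds the provider's parsed JSON; the payload shapes follow the providers' documented response formats, though the PR's actual methods may differ:

class Langchain::LLM::OpenAIResponse
  # OpenAI's Images API returns {"data" => [{"url" => "..."}]} when URLs are requested.
  def image_urls
    raw_response["data"]&.map { |item| item["url"] }
  end
end

class Langchain::LLM::GoogleGeminiResponse
  # Gemini returns generated images as base64 in candidates[0].content.parts[].inlineData.data.
  def image_base64s
    parts = raw_response.dig("candidates", 0, "content", "parts") || []
    parts.filter_map { |part| part.dig("inlineData", "data") }
  end
end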

Examples and Tests

  1. examples/generate_image.rb

    • Comprehensive example showing unified API across all providers
    • Demonstrates polymorphic usage
    • Handles both URL and base64 response formats
    • Saves base64 images to files
  2. Test Coverage

    • spec/lib/langchain/llm/base_spec.rb - Tests NotImplementedError (see the sketch after this list)
    • spec/lib/langchain/llm/openai_image_spec.rb - Tests OpenAI implementation
    • spec/lib/langchain/llm/google_gemini_image_spec.rb - Tests Gemini implementation
    • spec/lib/langchain/llm/google_vertexai_image_spec.rb - Tests Vertex AI implementation
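
The base-class spec in particular can stay very small; a sketch of what it might assert (assuming Langchain::LLM::Base can be instantiated directly, as the existing base specs do):

RSpec.describe Langchain::LLM::Base do
  it "raises NotImplementedError for generate_image" do
    expect { described_class.new.generate_image(prompt: "a ruby gemstone") }
      .to raise_error(NotImplementedError)
  end
end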

Usage Example

# The unified API allows seamless provider switching
require "langchain"
require "base64"

llms = []
llms << Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"]) if ENV["OPENAI_API_KEY"]
llms << Langchain::LLM::GoogleGemini.new(api_key: ENV["GOOGLE_GEMINI_API_KEY"]) if ENV["GOOGLE_GEMINI_API_KEY"]

llms.each do |llm|
  response = llm.generate_image(prompt: "A ruby gemstone")

  if response.respond_to?(:image_urls)
    puts response.image_urls
  elsif response.respond_to?(:image_base64s)
    File.write("image.png", Base64.decode64(response.image_base64s.first))
  end
end

Benefits

  1. Unified Interface - Single API works across all providers
  2. Provider Flexibility - Easy to switch between providers without code changes
  3. Extensibility - New providers can implement generate_image following the same pattern
  4. Ruby Idiomatic - Follows existing langchainrb patterns and conventions
  5. First-Class Feature - Image generation is exposed through a generic generate_image interface rather than provider-specific nomenclature (DALL-E, Imagen)

Future Considerations

  • Additional providers (Anthropic, Replicate, etc.) can implement the same interface when they add image generation
  • The response format difference (URLs vs base64) could potentially be unified in the future
  • Parameters could be further standardized across providers

This implementation provides a solid foundation for image generation in langchainrb while maintaining backward compatibility and following established patterns in the codebase.

…tex AI

- Add generate_image method to LLM::Base that raises NotImplementedError
- Implement generate_image for OpenAI using DALL-E API
- Implement generate_image for Google Gemini using generateContent endpoint
- Implement generate_image for Google Vertex AI using Imagen model
- Add image_urls helper to OpenAIResponse for URL responses
- Add image_base64s helper to Google response classes for base64 data
- Add comprehensive specs for all implementations
- Add example demonstrating unified API across providers

Fixes patterns-ai-core#924
usiegj00 changed the title from "Add image generation support for models OpenAI and Google Gemini" to "feat(image generation): Add image generation support for models OpenAI and Google Gemini" on Jun 17, 2025