feat(image generation): Add image generation support for models OpenAI and Google Gemini by usiegj00 · Pull Request #1009 · patterns-ai-core/langchainrb
feat(image generation): Add image generation support for models OpenAI and Google Gemini #1009

Open · wants to merge 1 commit into main

Conversation

usiegj00

Image Generation Feature Implementation Summary

Overview

This PR adds first-class image generation support to langchainrb, addressing issue #924. The implementation follows existing patterns in the codebase while making image generation a provider-agnostic feature that works seamlessly across OpenAI, Google Gemini, and Google Vertex AI.

Key Design Decisions

1. Provider-Agnostic API

  • Added generate_image method to Langchain::LLM::Base that raises NotImplementedError
  • Each provider implements its own version with provider-specific parameters
  • Consistent interface across all providers: generate_image(prompt:, n: 1, ...)
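
A minimal sketch of what that base stub might look like (illustrative; the exact error message in the PR may differ):

module Langchain
  module LLM
    class Base
      # Providers that support image generation override this method;
      # everything else fails loudly instead of silently returning nil.
      def generate_image(prompt:, **_options)
        raise NotImplementedError, "#{self.class.name} does not support image generation"
      end
    end
  end
end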

2. Model Configuration via Defaults

  • Following the existing pattern, image generation models are configured through the @defaults hash
  • Added image_generation_model to DEFAULTS for each provider:
    • OpenAI: "dall-e-3"
    • Google Gemini: "gemini-2.0-flash-preview-image-generation"
    • Google Vertex AI: "imagen-3.0-generate-002"
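
Sketched for the OpenAI provider, that amounts to one extra key (illustrative only; the surrounding keys stand in for whatever each provider class already defines):

DEFAULTS = {
  # ...existing provider defaults (temperature, chat model, etc.)...
  image_generation_model: "dall-e-3" # Gemini: "gemini-2.0-flash-preview-image-generation",
                                     # Vertex AI: "imagen-3.0-generate-002"
}.freeze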

3. Response Handling

  • Providers return different formats:
    • OpenAI: URLs via image_urls method
    • Google providers: Base64-encoded data via image_base64s method
  • Response classes extended with appropriate helper methods
  • Google Gemini requires both TEXT and IMAGE response modalities in the request
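
For the Gemini case, the generateContent request body has to declare both modalities, roughly like this (a sketch of the documented request shape, not necessarily the PR's exact code):

request_body = {
  contents: [{parts: [{text: "A ruby gemstone"}]}],
  generationConfig: {responseModalities: ["TEXT", "IMAGE"]}
}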

Implementation Details

Files Modified

  1. lib/langchain/llm/base.rb

    • Added stub generate_image method that raises NotImplementedError
  2. lib/langchain/llm/openai.rb

    • Implemented generate_image using OpenAI's Images API (see the sketch after this list)
    • Returns URLs to generated images
    • Added image_generation_model to DEFAULTS
  3. lib/langchain/llm/google_gemini.rb

    • Implemented generate_image using Gemini's generateContent endpoint
    • Requires both TEXT and IMAGE response modalities
    • Returns base64-encoded images
    • Added image_generation_model to DEFAULTS
  4. lib/langchain/llm/google_vertexai.rb

    • Implemented generate_image using Vertex AI's Imagen model
    • Returns base64-encoded images
    • Added image_generation_model to DEFAULTS
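
As a rough illustration of the OpenAI piece (item 2 above), a generate_image built on the ruby-openai client that Langchain::LLM::OpenAI already wraps could look like the sketch below; the size and model parameters are assumptions, and the PR's actual code may differ:

# Sketch: assumes `client` is the wrapped ruby-openai OpenAI::Client and
# @defaults contains the image_generation_model key described earlier.
def generate_image(prompt:, n: 1, size: "1024x1024", model: nil)
  response = client.images.generate(
    parameters: {
      prompt: prompt,
      model: model || @defaults[:image_generation_model],
      n: n,
      size: size
    }
  )
  Langchain::LLM::OpenAIResponse.new(response)
end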

Response Classes

  1. lib/langchain/llm/response/openai_response.rb

    • Added image_urls method to extract URLs from image generation responses (see the sketch after this list)
  2. lib/langchain/llm/response/google_gemini_response.rb

    • Added image_base64s method to extract base64 data from inlineData.data field
  3. lib/langchain/llm/response/google_vertex_ai_response.rb (new file)

    • Created specialized response class for Vertex AI image responses
    • Implements image_base64s method
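
A sketch of the two helpers described above (items 1 and 2), assuming raw_response holds the provider's parsed JSON; the payload shapes follow the providers' documented response formats, though the PR's actual methods may differ:

class Langchain::LLM::OpenAIResponse
  # OpenAI's Images API returns {"data" => [{"url" => "..."}]} when URLs are requested.
  def image_urls
    raw_response["data"]&.map { |item| item["url"] }
  end
end

class Langchain::LLM::GoogleGeminiResponse
  # Gemini returns generated images as base64 in candidates[0].content.parts[].inlineData.data.
  def image_base64s
    parts = raw_response.dig("candidates", 0, "content", "parts") || []
    parts.filter_map { |part| part.dig("inlineData", "data") }
  end
end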

Examples and Tests

  1. examples/generate_image.rb

    • Comprehensive example showing unified API across all providers
    • Demonstrates polymorphic usage
    • Handles both URL and base64 response formats
    • Saves base64 images to files
  2. Test Coverage

    • spec/lib/langchain/llm/base_spec.rb - Tests NotImplementedError (see the sketch after this list)
    • spec/lib/langchain/llm/openai_image_spec.rb - Tests OpenAI implementation
    • spec/lib/langchain/llm/google_gemini_image_spec.rb - Tests Gemini implementation
    • spec/lib/langchain/llm/google_vertexai_image_spec.rb - Tests Vertex AI implementation
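
The base-class spec in particular can stay very small; a sketch of what it might assert (assuming Langchain::LLM::Base can be instantiated directly, as the existing base specs do):

RSpec.describe Langchain::LLM::Base do
  it "raises NotImplementedError for generate_image" do
    expect { described_class.new.generate_image(prompt: "a ruby gemstone") }
      .to raise_error(NotImplementedError)
  end
end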

Usage Example

# The unified API allows seamless provider switching
require "langchain"
require "base64"

llms = []
llms << Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"]) if ENV["OPENAI_API_KEY"]
llms << Langchain::LLM::GoogleGemini.new(api_key: ENV["GOOGLE_GEMINI_API_KEY"]) if ENV["GOOGLE_GEMINI_API_KEY"]

llms.each do |llm|
  response = llm.generate_image(prompt: "A ruby gemstone")

  if response.respond_to?(:image_urls)
    puts response.image_urls
  elsif response.respond_to?(:image_base64s)
    File.write("image.png", Base64.decode64(response.image_base64s.first))
  end
end

Benefits

  1. Unified Interface - Single API works across all providers
  2. Provider Flexibility - Easy to switch between providers without code changes
  3. Extensibility - New providers can implement generate_image following the same pattern
  4. Ruby Idiomatic - Follows existing langchainrb patterns and conventions
  5. First-Class Feature - Image generation is exposed through a generic generate_image interface rather than provider-specific nomenclature (DALL-E, Imagen)

Future Considerations

  • Additional providers (Anthropic, Replicate, etc.) can implement the same interface when they add image generation
  • The response format difference (URLs vs base64) could potentially be unified in the future
  • Parameters could be further standardized across providers

This implementation provides a solid foundation for image generation in langchainrb while maintaining backward compatibility and following established patterns in the codebase.

…tex AI

- Add generate_image method to LLM::Base that raises NotImplementedError
- Implement generate_image for OpenAI using DALL-E API
- Implement generate_image for Google Gemini using generateContent endpoint
- Implement generate_image for Google Vertex AI using Imagen model
- Add image_urls helper to OpenAIResponse for URL responses
- Add image_base64s helper to Google response classes for base64 data
- Add comprehensive specs for all implementations
- Add example demonstrating unified API across providers

Fixes patterns-ai-core#924
usiegj00 changed the title from "Add image generation support for models OpenAI and Google Gemini" to "feat(image generation): Add image generation support for models OpenAI and Google Gemini" on Jun 17, 2025