This document outlines the architecture and implementation plan for adding image generation support to ReqLLM, following the library's established patterns for operation-based dispatch and provider abstraction.
- Add first-class image generation support with `generate_image/3` and `generate_image!/3` functions
- Support OpenAI (DALL-E) and Google (Gemini) as initial providers
- Design a flexible API that can accommodate future providers
- Maintain consistency with existing ReqLLM patterns and conventions
Following the Vercel AI SDK-inspired pattern used for `generate_text/3` and `generate_object/4`:
```elixir
# Simple image generation
{:ok, response} = ReqLLM.generate_image("openai:dall-e-3", "A sunset over mountains")

# With options
{:ok, response} =
  ReqLLM.generate_image(
    "openai:dall-e-3",
    "A sunset over mountains",
    size: "1792x1024",
    quality: :hd,
    style: :natural
  )

# Bang version for simple use cases
image = ReqLLM.generate_image!("google:gemini-2.0-flash-preview-image-generation", "A cat in space")

# Access response data
response.images         # List of generated images
response.revised_prompt # Provider's revised prompt (if applicable)
response.usage          # Cost/usage metadata
```

Proposed file layout:

```
lib/req_llm/
├── image_generation.ex   # High-level API module (like Generation, Embedding)
├── image_generation/
│   └── response.ex       # ImageGenerationResponse struct
```
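The access pattern above extends naturally to batch output (`n > 1`); a sketch against the proposed API (filenames illustrative):

```elixir
# Persist every generated image from a response; :data holds the decoded
# binary when response_format is :b64_json (the proposed default).
response = ReqLLM.generate_image!("openai:dall-e-3", "A sunset over mountains")

Enum.each(response.images, fn image ->
  ext = if image.media_type == "image/jpeg", do: "jpg", else: "png"
  File.write!("sunset_#{image.index}.#{ext}", image.data)
end)
```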
A new response struct specifically for image generation results:
```elixir
defmodule ReqLLM.ImageGenerationResponse do
  @moduledoc """
  Response struct for image generation operations.

  Contains generated images with metadata, usage information,
  and provider-specific details.
  """
  use TypedStruct

  alias ReqLLM.ImageGenerationResponse.Image

  typedstruct enforce: true do
    # Core fields
    field(:id, String.t())
    field(:model, String.t())

    # Generated images (list to support n > 1)
    field(:images, [Image.t()])

    # Provider may revise the prompt (DALL-E 3 does this)
    field(:revised_prompt, String.t() | nil, default: nil)

    # Metadata
    field(:usage, map() | nil, default: nil)
    field(:provider_meta, map(), default: %{})

    # Error handling
    field(:error, Exception.t() | nil, default: nil)
  end

  @doc "Extract first image (convenience for the n = 1 case)"
  def image(response), do: List.first(response.images)

  @doc "Extract first image as binary data"
  def data(response), do: image(response) && image(response).data

  @doc "Extract first image URL (if url format was requested)"
  def url(response), do: image(response) && image(response).url
end
```

Individual image representation:
```elixir
defmodule ReqLLM.ImageGenerationResponse.Image do
  @moduledoc """
  Represents a single generated image.

  Images can be returned as URLs (temporary, provider-hosted) or
  as base64-encoded binary data, depending on the response_format option.
  """
  use TypedStruct

  typedstruct do
    # Image data (mutually exclusive with url)
    field(:data, binary() | nil)
    # e.g., "image/png"
    field(:media_type, String.t() | nil)

    # URL (mutually exclusive with data)
    field(:url, String.t() | nil)

    # Provider's revised prompt for this specific image
    field(:revised_prompt, String.t() | nil)

    # Index for batch generation (n > 1)
    field(:index, non_neg_integer(), default: 0)
  end

  @doc "Check if image is base64 data"
  def base64?(image), do: image.data != nil

  @doc "Check if image is URL"
  def url?(image), do: image.url != nil

  @doc "Convert to ContentPart for use in multi-modal prompts"
  def to_content_part(%__MODULE__{data: data, media_type: media_type}) when data != nil do
    ReqLLM.Message.ContentPart.image(data, media_type)
  end

  def to_content_part(%__MODULE__{url: url}) when url != nil do
    ReqLLM.Message.ContentPart.image_url(url)
  end
end
```

The base options schema shared across providers:

```elixir
@image_generation_schema NimbleOptions.new!(
  # Number of images to generate
  n: [
    type: :pos_integer,
    default: 1,
    doc: "Number of images to generate (1-10, provider dependent)"
  ],
  # Image dimensions
  size: [
    type: :string,
    doc: "Image size (e.g., '1024x1024', '1792x1024'). Provider-specific."
  ],
  # Response format
  response_format: [
    type: {:in, [:url, :b64_json]},
    default: :b64_json,
    doc: "Format for returned images: :url (temporary URL) or :b64_json (base64 data)"
  ],
  # User identifier
  user: [
    type: :string,
    doc: "User identifier for tracking and abuse detection"
  ],
  # Provider-specific options pass-through
  provider_options: [
    type: {:or, [:map, {:list, :any}]},
    doc: "Provider-specific options",
    default: []
  ],
  # HTTP options
  req_http_options: [
    type: {:or, [:map, {:list, :any}]},
    doc: "Req-specific HTTP options",
    default: []
  ],
  # Testing
  fixture: [
    type: {:or, [:string, {:tuple, [:atom, :string]}]},
    doc: "HTTP fixture for testing"
  ]
)
```

OpenAI-specific options:

```elixir
@openai_image_schema [
  # DALL-E 3 quality
  quality: [
    type: {:in, [:standard, :hd, "standard", "hd"]},
    default: :standard,
    doc: "Image quality (DALL-E 3 only): :standard or :hd"
  ],
  # DALL-E 3 style
  style: [
    type: {:in, [:vivid, :natural, "vivid", "natural"]},
    default: :vivid,
    doc: "Image style (DALL-E 3 only): :vivid or :natural"
  ]
]
```

Google-specific options:

```elixir
@google_image_schema [
  # Aspect ratio
  aspect_ratio: [
    type: {:in, ["1:1", "16:9", "9:16", "4:3", "3:4"]},
    doc: "Image aspect ratio"
  ],
  # Safety settings
  google_safety_settings: [
    type: {:list, :map},
    doc: "Safety filter settings"
  ]
]
```

OpenAI Images API endpoint: `POST https://api.openai.com/v1/images/generations`
Request Parameters:
- `model` - "dall-e-2" or "dall-e-3"
- `prompt` - Text description (required)
- `n` - Number of images (1-10 for DALL-E 2, only 1 for DALL-E 3)
- `size` - Image dimensions
  - DALL-E 2: "256x256", "512x512", "1024x1024"
  - DALL-E 3: "1024x1024", "1792x1024", "1024x1792"
- `quality` - "standard" or "hd" (DALL-E 3 only)
- `style` - "vivid" or "natural" (DALL-E 3 only)
- `response_format` - "url" or "b64_json"
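Before wiring this into the provider layer, the parameter mapping can be sanity-checked by calling the endpoint directly with Req; a minimal sketch (env-var name assumed):

```elixir
# Direct call to the OpenAI Images API with Req (no ReqLLM involved).
body = %{
  "model" => "dall-e-3",
  "prompt" => "A sunset over mountains",
  "n" => 1,
  "size" => "1024x1024",
  "quality" => "hd",
  "response_format" => "b64_json"
}

{:ok, resp} =
  Req.post("https://api.openai.com/v1/images/generations",
    json: body,
    auth: {:bearer, System.fetch_env!("OPENAI_API_KEY")},
    receive_timeout: 120_000
  )

# The response body carries one entry per image in "data".
[%{"b64_json" => b64} | _] = resp.body["data"]
png = Base.decode64!(b64)
```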
Response Format:
```json
{
  "created": 1234567890,
  "data": [
    {
      "b64_json": "...",
      "revised_prompt": "A detailed sunset..."
    }
  ]
}
```

Google Gemini API endpoint: `POST https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent`

Models:

- `gemini-2.0-flash-preview-image-generation` - Fast, efficient model
- Future: `gemini-3-pro-image-preview` - Advanced model with reasoning
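This endpoint can likewise be exercised directly with Req before provider integration; a minimal sketch (the `x-goog-api-key` header carries the API key; env-var name assumed):

```elixir
# Direct call to the Gemini generateContent endpoint with Req.
model = "gemini-2.0-flash-preview-image-generation"

body = %{
  contents: [%{role: "user", parts: [%{text: "A simple blue square"}]}],
  generationConfig: %{responseModalities: ["TEXT", "IMAGE"]}
}

{:ok, resp} =
  Req.post(
    "https://generativelanguage.googleapis.com/v1beta/models/#{model}:generateContent",
    json: body,
    headers: [{"x-goog-api-key", System.fetch_env!("GOOGLE_API_KEY")}],
    receive_timeout: 120_000
  )
```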
Request Format:
```json
{
  "contents": [
    {
      "role": "user",
      "parts": [{"text": "Generate an image of..."}]
    }
  ],
  "generationConfig": {
    "responseModalities": ["TEXT", "IMAGE"],
    "imageConfig": {
      "aspectRatio": "16:9"
    }
  }
}
```

Response Format:

```json
{
  "candidates": [
    {
      "content": {
        "parts": [
          {"text": "Here's the image..."},
          {
            "inlineData": {
              "mimeType": "image/png",
              "data": "base64..."
            }
          }
        ]
      }
    }
  ],
  "usageMetadata": {...}
}
```

Add `:image_generation` to the operation type:
```elixir
# In ReqLLM.Provider
@type operation :: :chat | :object | :embedding | :image_generation | atom()
```

```elixir
# In ReqLLM.Providers.OpenAI
@impl ReqLLM.Provider
def prepare_request(:image_generation, model_spec, prompt, opts) do
  with {:ok, model} <- ReqLLM.model(model_spec),
       {:ok, processed_opts} <- process_image_options(__MODULE__, model, opts) do
    http_opts = Keyword.get(processed_opts, :req_http_options, [])

    request =
      Req.new(
        [
          url: "/images/generations",
          method: :post,
          # Image generation can take longer than chat completions
          receive_timeout: 120_000
        ] ++ http_opts
      )
      |> Req.Request.register_options(image_option_keys())
      |> Req.Request.merge_options(
        Keyword.take(processed_opts, image_option_keys()) ++
          [
            model: model.id,
            prompt: prompt,
            operation: :image_generation,
            base_url: Keyword.get(processed_opts, :base_url, base_url())
          ]
      )
      |> attach_image(model, processed_opts)

    {:ok, request}
  end
end

defp encode_image_body(request) do
  body =
    %{
      "model" => request.options[:model],
      "prompt" => request.options[:prompt],
      "n" => request.options[:n] || 1,
      "size" => request.options[:size] || "1024x1024",
      "response_format" => to_string(request.options[:response_format] || :b64_json)
    }
    |> maybe_put("quality", request.options[:quality])
    |> maybe_put("style", request.options[:style])
    |> maybe_put("user", request.options[:user])

  request
  |> Req.Request.put_header("content-type", "application/json")
  |> Map.put(:body, Jason.encode!(body))
end

defp decode_image_response({req, %Req.Response{status: 200, body: body} = resp}) do
  parsed = ensure_parsed_body(body)

  images =
    parsed["data"]
    |> Enum.with_index()
    |> Enum.map(fn {image_data, index} ->
      %ReqLLM.ImageGenerationResponse.Image{
        data: image_data["b64_json"] && Base.decode64!(image_data["b64_json"]),
        url: image_data["url"],
        media_type: "image/png",
        revised_prompt: image_data["revised_prompt"],
        index: index
      }
    end)

  response = %ReqLLM.ImageGenerationResponse{
    id: to_string(parsed["created"]),
    model: req.options[:model],
    images: images,
    revised_prompt: List.first(images) && List.first(images).revised_prompt,
    # OpenAI doesn't return usage for image generation
    usage: nil,
    provider_meta: %{}
  }

  {req, %{resp | body: response}}
end
```

```elixir
# In ReqLLM.Providers.Google
@impl ReqLLM.Provider
def prepare_request(:image_generation, model_spec, prompt, opts) do
  with {:ok, model} <- ReqLLM.model(model_spec),
       {:ok, processed_opts} <- process_image_options(__MODULE__, model, opts) do
    http_opts = Keyword.get(processed_opts, :req_http_options, [])

    request =
      Req.new(
        [
          url: "/models/#{model.id}:generateContent",
          method: :post,
          receive_timeout: 120_000
        ] ++ http_opts
      )
      |> Req.Request.register_options(image_option_keys())
      |> Req.Request.merge_options(
        Keyword.take(processed_opts, image_option_keys()) ++
          [
            model: model.id,
            prompt: prompt,
            operation: :image_generation,
            base_url: effective_base_url(processed_opts)
          ]
      )
      |> attach_image(model, processed_opts)

    {:ok, request}
  end
end

# Bracket access (request.options[:operation]) is not allowed in guards,
# so dispatch on the operation via pattern matching instead.
defp encode_image_body(%Req.Request{options: %{operation: :image_generation}} = request) do
  prompt = request.options[:prompt]

  generation_config =
    %{responseModalities: ["TEXT", "IMAGE"]}
    |> maybe_put_image_config(request.options)

  body =
    %{
      contents: [
        %{
          role: "user",
          parts: [%{text: prompt}]
        }
      ],
      generationConfig: generation_config
    }
    |> maybe_put(:safetySettings, request.options[:google_safety_settings])

  request
  |> Req.Request.put_header("content-type", "application/json")
  |> Map.put(:body, Jason.encode!(body))
end

defp maybe_put_image_config(config, opts) do
  image_config =
    %{}
    |> maybe_put(:aspectRatio, opts[:aspect_ratio])

  if map_size(image_config) > 0 do
    Map.put(config, :imageConfig, image_config)
  else
    config
  end
end

defp decode_image_response(
       {%Req.Request{options: %{operation: :image_generation}} = req,
        %Req.Response{status: 200, body: body} = resp}
     ) do
  parsed = ensure_parsed_body(body)

  images =
    case parsed do
      %{"candidates" => [%{"content" => %{"parts" => parts}} | _]} ->
        parts
        # Filter before indexing so image indices are 0-based over images only,
        # matching the OpenAI decode path
        |> Enum.filter(&Map.has_key?(&1, "inlineData"))
        |> Enum.with_index()
        |> Enum.map(fn {part, index} ->
          inline_data = part["inlineData"]

          %ReqLLM.ImageGenerationResponse.Image{
            data: Base.decode64!(inline_data["data"]),
            media_type: inline_data["mimeType"],
            index: index
          }
        end)

      _ ->
        []
    end

  text_content =
    case parsed do
      %{"candidates" => [%{"content" => %{"parts" => parts}} | _]} ->
        parts
        |> Enum.filter(&Map.has_key?(&1, "text"))
        |> Enum.map_join("", & &1["text"])

      _ ->
        nil
    end

  usage = extract_usage_from_google_response(parsed)

  response = %ReqLLM.ImageGenerationResponse{
    id: "google-#{System.unique_integer([:positive])}",
    model: req.options[:model],
    images: images,
    revised_prompt: text_content,
    usage: usage,
    provider_meta: %{}
  }

  {req, %{resp | body: response}}
end
```

The high-level API module, following the pattern of `ReqLLM.Embedding`:
```elixir
defmodule ReqLLM.ImageGeneration do
  @moduledoc """
  Image generation functionality for ReqLLM.

  Provides text-to-image generation capabilities with support for:

  - Single and batch image generation
  - Multiple output formats (URL or base64)
  - Provider-specific options (quality, style, aspect ratio)

  ## Supported Providers

  - OpenAI (DALL-E 2, DALL-E 3)
  - Google (Gemini 2.0 Flash Image)

  ## Examples

      # Simple generation
      {:ok, response} = ReqLLM.ImageGeneration.generate("openai:dall-e-3", "A sunset")
      image_data = ReqLLM.ImageGenerationResponse.data(response)

      # With options
      {:ok, response} = ReqLLM.ImageGeneration.generate(
        "openai:dall-e-3",
        "A professional portrait",
        size: "1024x1024",
        quality: :hd,
        style: :natural
      )
  """

  alias ReqLLM.ImageGenerationResponse

  @base_schema NimbleOptions.new!(
                 n: [type: :pos_integer, default: 1],
                 size: [type: :string],
                 response_format: [type: {:in, [:url, :b64_json]}, default: :b64_json],
                 user: [type: :string],
                 provider_options: [type: {:or, [:map, {:list, :any}]}, default: []],
                 req_http_options: [type: {:or, [:map, {:list, :any}]}, default: []],
                 fixture: [type: {:or, [:string, {:tuple, [:atom, :string]}]}]
               )

  @doc "Returns the base image generation options schema."
  @spec schema :: NimbleOptions.t()
  def schema, do: @base_schema

  @doc """
  Returns the list of model specs that support image generation.
  """
  @spec supported_models() :: [String.t()]
  def supported_models do
    # Initially hardcoded; later integrate with LLMDB capabilities
    [
      "openai:dall-e-2",
      "openai:dall-e-3",
      "google:gemini-2.0-flash-preview-image-generation"
    ]
  end

  @doc """
  Validates that a model supports image generation operations.
  """
  @spec validate_model(String.t() | {atom(), keyword()} | struct()) ::
          {:ok, LLMDB.Model.t()} | {:error, term()}
  def validate_model(model_spec) do
    with {:ok, model} <- ReqLLM.model(model_spec) do
      model_string = LLMDB.Model.spec(model)

      if model_string in supported_models() do
        {:ok, model}
      else
        {:error,
         ReqLLM.Error.Invalid.Parameter.exception(
           parameter: "model: #{model_string} does not support image generation"
         )}
      end
    end
  end

  @doc """
  Generates images from a text prompt.

  ## Parameters

  * `model_spec` - Model specification (e.g., "openai:dall-e-3")
  * `prompt` - Text description of the image to generate
  * `opts` - Generation options

  ## Options

  * `:n` - Number of images to generate (default: 1)
  * `:size` - Image dimensions (provider-specific)
  * `:response_format` - :url or :b64_json (default: :b64_json)
  * `:quality` - :standard or :hd (OpenAI DALL-E 3 only)
  * `:style` - :vivid or :natural (OpenAI DALL-E 3 only)
  * `:aspect_ratio` - "1:1", "16:9", etc. (Google only)
  * `:provider_options` - Provider-specific options

  ## Examples

      {:ok, response} = ReqLLM.ImageGeneration.generate(
        "openai:dall-e-3",
        "A serene mountain landscape at sunset"
      )

      # Get the image data
      image = ReqLLM.ImageGenerationResponse.image(response)
      File.write!("landscape.png", image.data)
  """
  @spec generate(
          String.t() | {atom(), keyword()} | struct(),
          String.t(),
          keyword()
        ) :: {:ok, ImageGenerationResponse.t()} | {:error, term()}
  def generate(model_spec, prompt, opts \\ [])

  def generate(model_spec, prompt, opts) when is_binary(prompt) do
    with {:ok, model} <- validate_model(model_spec),
         :ok <- validate_prompt(prompt),
         {:ok, provider_module} <- ReqLLM.provider(model.provider),
         {:ok, request} <-
           provider_module.prepare_request(:image_generation, model, prompt, opts),
         {:ok, %Req.Response{status: status, body: response}} when status in 200..299 <-
           Req.request(request) do
      {:ok, response}
    else
      {:ok, %Req.Response{status: status, body: body}} ->
        {:error,
         ReqLLM.Error.API.Request.exception(
           reason: "HTTP #{status}: Request failed",
           status: status,
           response_body: body
         )}

      {:error, error} ->
        {:error, error}
    end
  end

  @doc """
  Generates images from a text prompt, raising on error.
  """
  @spec generate!(
          String.t() | {atom(), keyword()} | struct(),
          String.t(),
          keyword()
        ) :: ImageGenerationResponse.t()
  def generate!(model_spec, prompt, opts \\ []) do
    case generate(model_spec, prompt, opts) do
      {:ok, response} -> response
      {:error, error} -> raise error
    end
  end

  defp validate_prompt("") do
    {:error, ReqLLM.Error.Invalid.Parameter.exception(parameter: "prompt: cannot be empty")}
  end

  defp validate_prompt(prompt) when is_binary(prompt), do: :ok
end
```

Add to the main ReqLLM module:
```elixir
# In lib/req_llm.ex
alias ReqLLM.ImageGeneration

@doc """
Generates images from a text prompt using an AI model.

Returns a canonical ImageGenerationResponse which includes generated images,
usage data, and metadata.

## Parameters

* `model_spec` - Model specification (e.g., "openai:dall-e-3")
* `prompt` - Text description of the image to generate
* `opts` - Additional options (keyword list)

## Options

* `:n` - Number of images to generate (default: 1)
* `:size` - Image dimensions (e.g., "1024x1024", "1792x1024")
* `:response_format` - :url or :b64_json (default: :b64_json)
* `:quality` - :standard or :hd (OpenAI DALL-E 3 only)
* `:style` - :vivid or :natural (OpenAI DALL-E 3 only)
* `:aspect_ratio` - "1:1", "16:9", etc. (Google only)
* `:provider_options` - Provider-specific options

## Examples

    {:ok, response} = ReqLLM.generate_image("openai:dall-e-3", "A sunset over mountains")

    # Access first image
    image = ReqLLM.ImageGenerationResponse.image(response)
    File.write!("sunset.png", image.data)
"""
defdelegate generate_image(model_spec, prompt, opts \\ []), to: ImageGeneration, as: :generate

@doc """
Generates images from a text prompt, returning the response directly.
Raises on error.
"""
defdelegate generate_image!(model_spec, prompt, opts \\ []), to: ImageGeneration, as: :generate!
```

Representative error shapes:

```elixir
# Content policy violation
%ReqLLM.Error.API.Response{
  reason: "Content policy violation",
  status: 400,
  response_body: %{"error" => %{"code" => "content_policy_violation"}}
}

# Invalid model for image generation
%ReqLLM.Error.Invalid.Parameter{
  parameter: "model: openai:gpt-4 does not support image generation"
}

# Provider-specific errors
%ReqLLM.Error.API.Response{
  reason: "Rate limit exceeded",
  status: 429,
  response_body: %{}
}
```

Unit tests:

```elixir
defmodule ReqLLM.ImageGenerationTest do
  use ExUnit.Case

  describe "generate/3" do
    test "validates empty prompt" do
      assert {:error, %ReqLLM.Error.Invalid.Parameter{}} =
               ReqLLM.ImageGeneration.generate("openai:dall-e-3", "")
    end

    test "validates unsupported model" do
      assert {:error, %ReqLLM.Error.Invalid.Parameter{}} =
               ReqLLM.ImageGeneration.generate("openai:gpt-4", "A cat")
    end
  end

  describe "supported_models/0" do
    test "returns image generation capable models" do
      models = ReqLLM.ImageGeneration.supported_models()
      assert "openai:dall-e-3" in models
      refute "openai:gpt-4" in models
    end
  end
end
```

Integration tests with fixtures:

```elixir
defmodule ReqLLM.ImageGeneration.OpenAITest do
  use ExUnit.Case

  @moduletag :integration

  describe "OpenAI DALL-E 3" do
    test "generates image with default options" do
      {:ok, response} =
        ReqLLM.generate_image(
          "openai:dall-e-3",
          "A simple red circle on white background",
          fixture: "openai_dalle3_simple"
        )

      assert %ReqLLM.ImageGenerationResponse{} = response
      assert length(response.images) == 1
      assert response.images |> hd() |> Map.get(:data) |> is_binary()
    end
  end
end

defmodule ReqLLM.ImageGeneration.GoogleTest do
  use ExUnit.Case

  @moduletag :integration

  describe "Google Gemini Image" do
    test "generates image with default options" do
      {:ok, response} =
        ReqLLM.generate_image(
          "google:gemini-2.0-flash-preview-image-generation",
          "A simple blue square",
          fixture: "google_gemini_image_simple"
        )

      assert %ReqLLM.ImageGenerationResponse{} = response
      assert length(response.images) >= 1
    end
  end
end
```

Implementation checklist:

- Create `ReqLLM.ImageGenerationResponse` struct
- Create `ReqLLM.ImageGenerationResponse.Image` struct
- Create `ReqLLM.ImageGeneration` module with schema
- Add `:image_generation` operation type to the Provider behaviour
- Add model validation for image generation capability
- Implement `prepare_request(:image_generation, ...)` in the OpenAI provider
- Implement `encode_image_body/1` for the DALL-E API format
- Implement `decode_image_response/1` for DALL-E response parsing
- Add OpenAI-specific options (quality, style)
- Add unit tests
- Add integration tests with fixtures
- Implement `prepare_request(:image_generation, ...)` in the Google provider
- Modify `encode_body/1` to handle the image generation operation
- Modify `decode_response/1` to handle image responses
- Add Google-specific options (aspect_ratio)
- Add unit tests
- Add integration tests with fixtures
- Add `generate_image/3` and `generate_image!/3` to the ReqLLM module
- Add documentation with examples
- Update README with image generation section
- Add image generation models to LLMDB (if not already present)
- Add `image_generation` capability flag
- Add model constraints (sizes, aspect ratios, etc.)
- Complete module documentation
- Add usage examples
- Add error handling guide
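Once the `image_generation` capability flag exists in LLMDB, `supported_models/0` could be derived from model metadata rather than hardcoded; a hypothetical sketch (the `LLMDB.models/0` query and `:capabilities` field are assumptions, not current LLMDB API):

```elixir
# Hypothetical: derive image-capable models from LLMDB metadata
# instead of maintaining a hardcoded list.
def supported_models do
  LLMDB.models()
  |> Enum.filter(fn model -> :image_generation in Map.get(model, :capabilities, []) end)
  |> Enum.map(&LLMDB.Model.spec/1)
end
```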
```elixir
# Potential future API for image editing (DALL-E 2 only)
ReqLLM.edit_image("openai:dall-e-2",
  image: image_binary,
  mask: mask_binary,
  prompt: "Add a hat"
)
```

```elixir
# Potential future API for image variations (DALL-E 2 only)
ReqLLM.vary_image("openai:dall-e-2",
  image: original_image,
  n: 3
)
```

Google's Gemini 3 Pro supports using multiple reference images:

```elixir
# Potential future API
ReqLLM.generate_image("google:gemini-3-pro-image",
  prompt: "Combine these styles",
  reference_images: [image1, image2, image3]
)
```

This architecture provides:
- Consistency: Follows existing ReqLLM patterns for operations, providers, and responses
- Flexibility: Provider-specific options while maintaining a unified API
- Extensibility: Easy to add new providers and future features (editing, variations)
- Type Safety: Clear struct definitions with TypedStruct
- Testability: Fixture support and clear separation of concerns
The implementation follows the established patterns from `ReqLLM.Embedding` and `ReqLLM.Generation`, making it familiar to existing users and maintainers.