{
  "type": "object",
  "properties": {
    "prompt": {
      "description": "Detailed description of the desired content of the generated image. Please keep the specific requirements such as text from the original request fully intact. Omission is prohibited.",
      "type": "string"
    }
  },
  "required": ["prompt"]
}
from qwen_agent.tools import ImageGen
import json

# Instantiate the tool with an explicit LLM configuration and output size.
image_gen = ImageGen(cfg={
    'llm_cfg': {
        'model': 'qwen-vl-plus',
        'api_key': 'your-api-key',
        'model_server': 'dashscope',
    },
    'size': '1024*1024',
})

# The tool takes its parameters as a JSON-encoded string.
result = image_gen.call(
    params=json.dumps({
        'prompt': 'A serene landscape with mountains and a lake at sunset, photorealistic style'
    })
)
print(result)
# Returns: List of ContentItem objects with image data
from qwen_agent.agents import Assistant

bot = Assistant(
    llm={'model': 'qwen-max'},
    function_list=['image_gen'],  # Image generation tool config passed separately
)

# Note: Configure the image_gen tool before adding to agent
from qwen_agent.tools import ImageGen

image_tool = ImageGen(cfg={
    'llm_cfg': {
        'model': 'qwen-vl-plus',
        'api_key': 'your-api-key',
    }
})
# NOTE(review): image_tool is constructed above but never attached to `bot`
# in this snippet — verify how configured tool instances are registered.

messages = [
    {
        'role': 'user',
        'content': 'Create an image of a cute dog playing in a park',
    }
]

for response in bot.run(messages=messages):
    print(response)
from qwen_agent.llm.schema import ContentItem

# Example return value
[
    ContentItem(
        image='https://example.com/generated-image.png',  # or base64 data
        # Additional metadata may be included
    )
]
Each ContentItem contains:
image: URL or base64-encoded image data
May include additional metadata depending on the LLM
from qwen_agent.agents import Assistant
from qwen_agent.tools import ImageGen
from qwen_agent.gui import WebUI


def create_image_generator():
    """Create an agent specialized in image generation.

    Returns:
        Assistant: an agent with the ``image_gen`` tool enabled and a
        system prompt steering it toward image-creation requests.
    """
    # FIX: the original built an `llm_cfg` dict here that was never used
    # (the tool config never reached the Assistant); the dead local is removed.
    bot = Assistant(
        llm={'model': 'qwen-max'},
        name='AI Artist',
        description='AI image generation service',
        system_message=(
            'You are an AI artist that creates images based on user descriptions. '
            'When users request images, use the image_gen tool to create them. '
            'Ask clarifying questions if the description is too vague.'
        ),
        function_list=['image_gen']
    )
    return bot


# Create and use the agent
bot = create_image_generator()

# Text-based interaction
messages = []
while True:
    user_input = input('Describe the image you want (or "quit"): ')
    if user_input.lower() in ['quit', 'exit']:
        break
    messages.append({'role': 'user', 'content': user_input})
    response = []
    for response in bot.run(messages=messages):
        print('Generating...', end='\r')
    messages.extend(response)
    # FIX: guard against an empty response — the original indexed
    # response[-1] unconditionally, raising IndexError when bot.run
    # yielded nothing (the `response = []` initializer shows that case
    # is reachable).
    if response:
        print(f"Image generated: {response[-1]['content']}")

# Or launch with GUI
# WebUI(bot).run()
# Compose the prompt from subject, setting, lighting, and rendering style.
prompt = ', '.join([
    'A golden retriever puppy sitting in a flower garden',
    'surrounded by pink roses and white daisies',
    'soft morning sunlight',
    'shallow depth of field',
    'photorealistic',
    '8k quality',
])
image_gen = ImageGen(cfg={
    'llm_cfg': {'model': 'qwen-vl-plus', 'api_key': 'key'},
    'size': '1024*1024',
})

prompts = [
    'A red apple on a wooden table',
    'A blue ocean wave crashing',
    'A green forest in spring',
]

# Generate one image per description, collecting the raw tool results.
generated_images = []
for description in prompts:
    output = image_gen.call(params=json.dumps({'prompt': description}))
    generated_images.append(output)
    print(f"Generated: {description}")
from qwen_agent.agents import Assistant

# Agent that chains image generation with code-based post-processing.
bot = Assistant(
    llm={'model': 'qwen-max'},
    function_list=['image_gen', 'code_interpreter'],
    system_message=(
        'You can generate images and then process them with Python code. '
        'First generate the image, then use code_interpreter to analyze or modify it.'
    ),
)

messages = [{
    'role': 'user',
    'content': 'Generate an image of a data chart and then analyze its colors',
}]

for response in bot.run(messages=messages):
    print(response)