
Overview

Qwen-Agent supports multiple LLM providers and model types through a unified configuration interface. The framework automatically selects the appropriate model client based on your configuration.

Basic Configuration

from qwen_agent.llm import get_chat_model

# Simple configuration
llm = get_chat_model({
    'model': 'qwen-plus',
    'model_server': 'dashscope',
    'api_key': 'your-api-key'
})

# Or use shorthand
llm = get_chat_model('qwen-plus')

Configuration Parameters

Core Parameters

model (str, required)
Model identifier. Examples: 'qwen-plus', 'qwen-max', 'gpt-4'

model_server (str)
Model service endpoint:
  • 'dashscope' - Use Alibaba Cloud DashScope
  • 'http://127.0.0.1:7905/v1' - Custom OpenAI-compatible endpoint
  • 'https://api.openai.com/v1' - OpenAI API

api_key (str)
API key for authentication. Can also be set via environment variables:
  • DASHSCOPE_API_KEY for DashScope
  • OPENAI_API_KEY for OpenAI

model_type (str)
Explicitly specifies the model client. Auto-detected if not provided; see the example after this list. Available types:
  • 'qwen_dashscope' - Qwen models via DashScope
  • 'qwenvl_dashscope' - Qwen-VL vision models
  • 'qwenaudio_dashscope' - Qwen-Audio models
  • 'oai' - OpenAI-compatible API
  • 'qwenvl_oai' - Vision models via OpenAI API
  • 'azure' - Azure OpenAI
  • 'transformers' - Local Hugging Face models
  • 'openvino' - OpenVINO-optimized models

generate_cfg (dict)
Generation hyperparameters (see Generation Configuration below)
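
Auto-detection covers most cases, but model_type can also be pinned explicitly. A minimal sketch, assuming your DashScope account has access to the qwen-vl-plus vision model:

llm = get_chat_model({
    'model': 'qwen-vl-plus',           # assumed model name, for illustration
    'model_type': 'qwenvl_dashscope',  # skip auto-detection, use the Qwen-VL client
    'model_server': 'dashscope'
})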

Generation Configuration

The generate_cfg dictionary controls how the LLM generates responses.

Common Parameters

llm = get_chat_model({
    'model': 'qwen-plus',
    'generate_cfg': {
        'top_p': 0.8,
        'temperature': 0.7,
        'max_tokens': 2000,
        'max_input_tokens': 6500,
        'max_retries': 10,
        'seed': 42,
        'stop': ['\n\nObservation:', 'END'],
    }
})
top_p (float, default: 0.8)
Nucleus sampling parameter. Controls diversity by sampling from the top probability mass. Range: 0.0 to 1.0.

temperature (float, default: 1.0)
Sampling temperature. Higher values increase randomness. Range: 0.0 to 2.0.

max_tokens (int)
Maximum number of tokens to generate in the response.

max_input_tokens (int, default: 6500)
Maximum input context length. Messages are automatically truncated if it is exceeded. Set to -1 to disable truncation.

max_retries (int, default: 0)
Number of retry attempts on service errors, with exponential backoff.

seed (int)
Random seed for reproducible generation. Auto-generated if not provided.

stop (List[str])
Stop sequences that halt generation when encountered.

cache_dir (str)
Directory for caching LLM responses. Requires the diskcache package.

Function Calling Parameters

parallel_function_calls (bool, default: false)
Enable parallel execution of multiple function calls in a single response.

function_choice (str, default: 'auto')
Controls function-calling behavior; see the example after this list:
  • 'auto' - Model decides whether to call functions
  • 'none' - Disable function calling
  • a function name - Force the model to call that specific function

thought_in_content (bool, default: false)
Include reasoning thoughts in the content field alongside function calls.

fncall_prompt_type (str, default: 'nous')
Function-calling prompt style:
  • 'nous' - Nous Research format
  • 'qwen' - Qwen-specific format
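
For example, the following sketch enables parallel calls and forces one function; get_weather is a hypothetical name that must match a function passed to chat:

llm = get_chat_model({
    'model': 'qwen-plus',
    'generate_cfg': {
        'parallel_function_calls': True,   # allow several calls per response
        'function_choice': 'get_weather'   # force this function (hypothetical name)
    }
})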

Model Types

DashScope Models (Alibaba Cloud)

# Text generation
llm = get_chat_model({
    'model': 'qwen-max',
    'model_server': 'dashscope'
})

# Available models:
# - qwen-max: Most capable
# - qwen-plus: Balanced performance
# - qwen-turbo: Fast and efficient
Source Reference: qwen_agent/llm/__init__.py:31-100
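
If api_key is omitted from the configuration, the DashScope client falls back to the DASHSCOPE_API_KEY environment variable (see Core Parameters above). A minimal sketch:

import os
os.environ['DASHSCOPE_API_KEY'] = 'your-api-key'  # or export it in your shell

llm = get_chat_model({'model': 'qwen-plus', 'model_server': 'dashscope'})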

OpenAI-Compatible Models

llm = get_chat_model({
    'model': 'gpt-4',
    'model_server': 'https://api.openai.com/v1',
    'api_key': 'sk-...'
})
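
The same configuration shape works for self-hosted OpenAI-compatible servers (vLLM, Ollama, and similar). A sketch, assuming a server is already listening on the local endpoint shown under Core Parameters; the model name and placeholder key depend on your server:

llm = get_chat_model({
    'model': 'Qwen2.5-7B-Instruct',             # whatever model your server exposes
    'model_server': 'http://127.0.0.1:7905/v1',
    'api_key': 'EMPTY'                          # many local servers ignore the key
})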

Local Models

llm = get_chat_model({
    'model': 'Qwen/Qwen2.5-7B-Instruct',
    'model_type': 'transformers',
    'generate_cfg': {
        'device_map': 'auto'
    }
})

Using LLM Directly

Chat Interface

from qwen_agent.llm import get_chat_model
from qwen_agent.llm.schema import Message

llm = get_chat_model('qwen-plus')

# Simple query
responses = llm.chat(
    messages=[Message(role='user', content='Hello!')],
    stream=False
)
print(responses[0].content)
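
With stream=True, chat returns an iterator; in the default full-streaming mode each yielded item is the accumulated response so far. A minimal sketch under that assumption:

# Streaming: iterate over progressively longer partial responses
responses = []
for responses in llm.chat(
    messages=[Message(role='user', content='Hello!')],
    stream=True
):
    pass  # each `responses` is the full List[Message] accumulated so far
print(responses[-1].content)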

Function Calling

functions = [{
    'name': 'get_weather',
    'description': 'Get current weather',
    'parameters': {
        'type': 'object',
        'properties': {
            'location': {
                'type': 'string',
                'description': 'City name'
            }
        },
        'required': ['location']
    }
}]

responses = llm.chat(
    messages=[Message(role='user', content='What is the weather in Beijing?')],
    functions=functions,
    stream=False
)

# Check for function call
if responses[0].function_call:
    print(f"Function: {responses[0].function_call.name}")
    print(f"Arguments: {responses[0].function_call.arguments}")
Source Reference: qwen_agent/llm/base.py:118-290
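
To complete the loop, run the function yourself and return its result as a role='function' message; get_weather below is a hypothetical local implementation:

import json

if responses[0].function_call:
    args = json.loads(responses[0].function_call.arguments)
    result = get_weather(**args)  # hypothetical local implementation

    messages = [
        Message(role='user', content='What is the weather in Beijing?'),
        responses[0],  # assistant message carrying the function call
        Message(role='function', name='get_weather', content=result)
    ]
    final = llm.chat(messages=messages, functions=functions, stream=False)
    print(final[0].content)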

Advanced Configuration

Response Caching

llm = get_chat_model({
    'model': 'qwen-plus',
    'generate_cfg': {
        'cache_dir': './llm_cache'  # Requires: pip install diskcache
    }
})

# Identical requests will be served from cache
response1 = llm.chat(messages=[Message(role='user', content='Hello')], stream=False)
response2 = llm.chat(messages=[Message(role='user', content='Hello')], stream=False)
# response2 is instant - served from cache

Raw API Mode

Bypass Qwen-Agent preprocessing for direct model access:

import os
os.environ['QWEN_AGENT_USE_RAW_API'] = 'true'

llm = get_chat_model('qwen-plus')

# Or configure per-model
llm = get_chat_model({
    'model': 'qwen-plus',
    'generate_cfg': {
        'use_raw_api': True
    }
})
Note: Raw API mode only supports full streaming (stream=True, delta_stream=False).
Source Reference: qwen_agent/llm/base.py:89-223

Error Handling

from qwen_agent.llm import ModelServiceError

llm = get_chat_model({
    'model': 'qwen-plus',
    'generate_cfg': {
        'max_retries': 10  # Retry up to 10 times with exponential backoff
    }
})

try:
    responses = llm.chat(messages=[...], stream=False)
except ModelServiceError as e:
    print(f"Error code: {e.code}")
    print(f"Error message: {e.message}")

Message Schema

Message Format

from qwen_agent.llm.schema import Message, ContentItem, FunctionCall

# Text message
msg = Message(role='user', content='Hello')

# Multimodal message
msg = Message(
    role='user',
    content=[
        ContentItem(text='What is in this image?'),
        ContentItem(image='https://example.com/image.jpg')
    ]
)

# Assistant message with function call
msg = Message(
    role='assistant',
    content='',
    function_call=FunctionCall(
        name='get_weather',
        arguments='{"location": "Beijing"}'
    )
)

# Function result message
msg = Message(
    role='function',
    name='get_weather',
    content='Temperature: 20°C, Sunny'
)
Source Reference: qwen_agent/llm/schema.py:132-164

Content Types

Messages can contain multiple content types:
text (str)
Plain text content

image (str)
Image URL or base64-encoded image

file (str)
File URL or path

audio (Union[str, dict])
Audio URL or audio configuration

video (Union[str, list])
Video URL or frame list
Source Reference: qwen_agent/llm/schema.py:80-129

Best Practices

Model Selection

  • Use qwen-max for complex reasoning tasks
  • Use qwen-plus for balanced performance
  • Use qwen-turbo for speed-critical applications
  • Use vision models only when processing images

Context Management

  • Set max_input_tokens to prevent context overflow
  • The framework auto-truncates old messages when needed
  • Keep system messages concise
  • Consider RAG for large document contexts

Performance

  • Enable caching for repeated queries
  • Use streaming for better UX
  • Configure max_retries for production reliability
  • Use qwen-turbo for latency-sensitive apps

Function Calling

  • Use parallel_function_calls for independent operations
  • Set function_choice='none' to disable functions temporarily
  • Always validate function arguments
  • Handle tool errors gracefully

Environment Variables

DASHSCOPE_API_KEY (string)
API key for DashScope services

OPENAI_API_KEY (string)
API key for OpenAI services

QWEN_AGENT_USE_RAW_API (boolean, default: false)
Enable raw API mode globally. Set to 'true' to enable.
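
These can be exported in your shell before starting the application, or set from Python; a minimal sketch:

import os

os.environ['DASHSCOPE_API_KEY'] = 'your-dashscope-key'
os.environ['OPENAI_API_KEY'] = 'sk-...'
os.environ['QWEN_AGENT_USE_RAW_API'] = 'true'  # optional: enable raw API mode globally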

Related Pages

  • Agents - Learn how to use LLMs within agents
  • Function Calling - Deep dive into function calling