
Overview

Qwen-Agent supports multiple LLM providers and model types through a unified configuration interface. The framework automatically selects the appropriate model client based on your configuration.

Basic Configuration

from qwen_agent.llm import get_chat_model

# Simple configuration
llm = get_chat_model({
    'model': 'qwen-plus',
    'model_server': 'dashscope',
    'api_key': 'your-api-key'
})

# Or use shorthand
llm = get_chat_model('qwen-plus')

Configuration Parameters

Core Parameters

model (str, required)
Model identifier. Examples: 'qwen-plus', 'qwen-max', 'gpt-4'

model_server (str)
Model service endpoint:
  • 'dashscope' - Use Alibaba Cloud DashScope
  • 'http://127.0.0.1:7905/v1' - Custom OpenAI-compatible endpoint
  • 'https://api.openai.com/v1' - OpenAI API

api_key (str)
API key for authentication. Can also be set via environment variables:
  • DASHSCOPE_API_KEY for DashScope
  • OPENAI_API_KEY for OpenAI

model_type (str)
Explicitly specifies the model client. Auto-detected if not provided; see the example after this list. Available types:
  • 'qwen_dashscope' - Qwen models via DashScope
  • 'qwenvl_dashscope' - Qwen-VL vision models
  • 'qwenaudio_dashscope' - Qwen-Audio models
  • 'oai' - OpenAI-compatible API
  • 'qwenvl_oai' - Vision models via OpenAI API
  • 'azure' - Azure OpenAI
  • 'transformers' - Local Hugging Face models
  • 'openvino' - OpenVINO-optimized models

generate_cfg (dict)
Generation hyperparameters (see Generation Configuration below)
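
Auto-detection covers most cases, but model_type can also be pinned explicitly. A minimal sketch, assuming your DashScope account has access to the qwen-vl-plus vision model:

llm = get_chat_model({
    'model': 'qwen-vl-plus',           # assumed model name, for illustration
    'model_type': 'qwenvl_dashscope',  # skip auto-detection, use the Qwen-VL client
    'model_server': 'dashscope'
})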

Generation Configuration

The generate_cfg dictionary controls how the LLM generates responses.

Common Parameters

llm = get_chat_model({
    'model': 'qwen-plus',
    'generate_cfg': {
        'top_p': 0.8,
        'temperature': 0.7,
        'max_tokens': 2000,
        'max_input_tokens': 6500,
        'max_retries': 10,
        'seed': 42,
        'stop': ['\n\nObservation:', 'END'],
    }
})
top_p (float, default: 0.8)
Nucleus sampling parameter. Controls diversity by sampling from the top probability mass. Range: 0.0 to 1.0.

temperature (float, default: 1.0)
Sampling temperature. Higher values increase randomness. Range: 0.0 to 2.0.

max_tokens (int)
Maximum number of tokens to generate in the response.

max_input_tokens (int, default: 6500)
Maximum input context length. Messages are automatically truncated if it is exceeded. Set to -1 to disable truncation.

max_retries (int, default: 0)
Number of retry attempts on service errors, with exponential backoff.

seed (int)
Random seed for reproducible generation. Auto-generated if not provided.

stop (List[str])
Stop sequences that halt generation when encountered.

cache_dir (str)
Directory for caching LLM responses. Requires the diskcache package.

Function Calling Parameters

parallel_function_calls (bool, default: false)
Enable parallel execution of multiple function calls in a single response.

function_choice (str, default: 'auto')
Controls function-calling behavior; see the example after this list:
  • 'auto' - Model decides whether to call functions
  • 'none' - Disable function calling
  • a function name - Force the model to call that specific function

thought_in_content (bool, default: false)
Include reasoning thoughts in the content field alongside function calls.

fncall_prompt_type (str, default: 'nous')
Function-calling prompt style:
  • 'nous' - Nous Research format
  • 'qwen' - Qwen-specific format
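
For example, the following sketch enables parallel calls and forces one function; get_weather is a hypothetical name that must match a function passed to chat:

llm = get_chat_model({
    'model': 'qwen-plus',
    'generate_cfg': {
        'parallel_function_calls': True,   # allow several calls per response
        'function_choice': 'get_weather'   # force this function (hypothetical name)
    }
})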

Model Types

DashScope Models (Alibaba Cloud)

# Text generation
llm = get_chat_model({
    'model': 'qwen-max',
    'model_server': 'dashscope'
})

# Available models:
# - qwen-max: Most capable
# - qwen-plus: Balanced performance
# - qwen-turbo: Fast and efficient
Source Reference: qwen_agent/llm/__init__.py:31-100
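
If api_key is omitted from the configuration, the DashScope client falls back to the DASHSCOPE_API_KEY environment variable (see Core Parameters above). A minimal sketch:

import os
os.environ['DASHSCOPE_API_KEY'] = 'your-api-key'  # or export it in your shell

llm = get_chat_model({'model': 'qwen-plus', 'model_server': 'dashscope'})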

OpenAI-Compatible Models

llm = get_chat_model({
    'model': 'gpt-4',
    'model_server': 'https://api.openai.com/v1',
    'api_key': 'sk-...'
})
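
The same configuration shape works for self-hosted OpenAI-compatible servers (vLLM, Ollama, and similar). A sketch, assuming a server is already listening on the local endpoint shown under Core Parameters; the model name and placeholder key depend on your server:

llm = get_chat_model({
    'model': 'Qwen2.5-7B-Instruct',             # whatever model your server exposes
    'model_server': 'http://127.0.0.1:7905/v1',
    'api_key': 'EMPTY'                          # many local servers ignore the key
})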

Local Models

llm = get_chat_model({
    'model': 'Qwen/Qwen2.5-7B-Instruct',
    'model_type': 'transformers',
    'generate_cfg': {
        'device_map': 'auto'
    }
})

Using LLM Directly

Chat Interface

from qwen_agent.llm import get_chat_model
from qwen_agent.llm.schema import Message

llm = get_chat_model('qwen-plus')

# Simple query
responses = llm.chat(
    messages=[Message(role='user', content='Hello!')],
    stream=False
)
print(responses[0].content)
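
With stream=True, chat returns an iterator; in the default full-streaming mode each yielded item is the accumulated response so far. A minimal sketch under that assumption:

# Streaming: iterate over progressively longer partial responses
responses = []
for responses in llm.chat(
    messages=[Message(role='user', content='Hello!')],
    stream=True
):
    pass  # each `responses` is the full List[Message] accumulated so far
print(responses[-1].content)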

Function Calling

functions = [{
    'name': 'get_weather',
    'description': 'Get current weather',
    'parameters': {
        'type': 'object',
        'properties': {
            'location': {
                'type': 'string',
                'description': 'City name'
            }
        },
        'required': ['location']
    }
}]

responses = llm.chat(
    messages=[Message(role='user', content='What is the weather in Beijing?')],
    functions=functions,
    stream=False
)

# Check for function call
if responses[0].function_call:
    print(f"Function: {responses[0].function_call.name}")
    print(f"Arguments: {responses[0].function_call.arguments}")
Source Reference: qwen_agent/llm/base.py:118-290
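
To complete the loop, run the function yourself and return its result as a role='function' message; get_weather below is a hypothetical local implementation:

import json

if responses[0].function_call:
    args = json.loads(responses[0].function_call.arguments)
    result = get_weather(**args)  # hypothetical local implementation

    messages = [
        Message(role='user', content='What is the weather in Beijing?'),
        responses[0],  # assistant message carrying the function call
        Message(role='function', name='get_weather', content=result)
    ]
    final = llm.chat(messages=messages, functions=functions, stream=False)
    print(final[0].content)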

Advanced Configuration

Response Caching

llm = get_chat_model({
    'model': 'qwen-plus',
    'generate_cfg': {
        'cache_dir': './llm_cache'  # Requires: pip install diskcache
    }
})

# Identical requests will be served from cache
response1 = llm.chat(messages=[Message(role='user', content='Hello')], stream=False)
response2 = llm.chat(messages=[Message(role='user', content='Hello')], stream=False)
# response2 is instant - served from cache

Raw API Mode

Bypass Qwen-Agent preprocessing for direct model access:

import os
os.environ['QWEN_AGENT_USE_RAW_API'] = 'true'

llm = get_chat_model('qwen-plus')

# Or configure per-model
llm = get_chat_model({
    'model': 'qwen-plus',
    'generate_cfg': {
        'use_raw_api': True
    }
})
Note: Raw API mode only supports full streaming (stream=True, delta_stream=False).
Source Reference: qwen_agent/llm/base.py:89-223

Error Handling

from qwen_agent.llm import ModelServiceError

llm = get_chat_model({
    'model': 'qwen-plus',
    'generate_cfg': {
        'max_retries': 10  # Retry up to 10 times with exponential backoff
    }
})

try:
    responses = llm.chat(messages=[...], stream=False)
except ModelServiceError as e:
    print(f"Error code: {e.code}")
    print(f"Error message: {e.message}")

Message Schema

Message Format

from qwen_agent.llm.schema import Message, ContentItem, FunctionCall

# Text message
msg = Message(role='user', content='Hello')

# Multimodal message
msg = Message(
    role='user',
    content=[
        ContentItem(text='What is in this image?'),
        ContentItem(image='https://example.com/image.jpg')
    ]
)

# Assistant message with function call
msg = Message(
    role='assistant',
    content='',
    function_call=FunctionCall(
        name='get_weather',
        arguments='{"location": "Beijing"}'
    )
)

# Function result message
msg = Message(
    role='function',
    name='get_weather',
    content='Temperature: 20°C, Sunny'
)
Source Reference: qwen_agent/llm/schema.py:132-164

Content Types

Messages can contain multiple content types:
text (str)
Plain text content

image (str)
Image URL or base64-encoded image

file (str)
File URL or path

audio (Union[str, dict])
Audio URL or audio configuration

video (Union[str, list])
Video URL or frame list
Source Reference: qwen_agent/llm/schema.py:80-129

Best Practices

Model Selection

  • Use qwen-max for complex reasoning tasks
  • Use qwen-plus for balanced performance
  • Use qwen-turbo for speed-critical applications
  • Use vision models only when processing images

Context Management

  • Set max_input_tokens to prevent context overflow
  • The framework auto-truncates old messages when needed
  • Keep system messages concise
  • Consider RAG for large document contexts

Performance

  • Enable caching for repeated queries
  • Use streaming for better UX
  • Configure max_retries for production reliability
  • Use qwen-turbo for latency-sensitive apps

Function Calling

  • Use parallel_function_calls for independent operations
  • Set function_choice='none' to disable functions temporarily
  • Always validate function arguments
  • Handle tool errors gracefully

Environment Variables

DASHSCOPE_API_KEY (string)
API key for DashScope services

OPENAI_API_KEY (string)
API key for OpenAI services

QWEN_AGENT_USE_RAW_API (boolean, default: false)
Enable raw API mode globally. Set to 'true' to enable.
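
These can be exported in your shell before starting the application, or set from Python; a minimal sketch:

import os

os.environ['DASHSCOPE_API_KEY'] = 'your-dashscope-key'
os.environ['OPENAI_API_KEY'] = 'sk-...'
os.environ['QWEN_AGENT_USE_RAW_API'] = 'true'  # optional: enable raw API mode globally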

Related Pages

  • Agents - Learn how to use LLMs within agents
  • Function Calling - Deep dive into function calling