feat: add Claude Sonnet 4.5 support with global cross-region inference (#180)
This commit adds comprehensive support for Claude Sonnet 4.5 (claude-sonnet-4-5-20250929),
Anthropic's most intelligent model with enhanced coding capabilities and complex agent support.
Changes:
- Added global cross-region inference profile discovery (global.anthropic.*)
- Fixed temperature/topP compatibility for Claude Sonnet 4.5 (the model doesn't support both parameters simultaneously)
- Fixed reasoning_effort parameter handling to prevent a KeyError
- Added extended thinking/interleaved thinking support via extra_body parameter
- Updated documentation with Claude Sonnet 4.5 examples (English and Chinese)
- Updated README with Sonnet 4.5 announcement
Technical Details:
- src/api/models/bedrock.py: Added global profile support in list_bedrock_models()
- src/api/models/bedrock.py: Added Claude Sonnet 4.5 detection to remove topP parameter
- src/api/models/bedrock.py: Changed pop("topP") to pop("topP", None) to prevent KeyError
- docs/Usage.md: Added Chat Completions section with Sonnet 4.5 examples
- docs/Usage.md: Updated Interleaved thinking section with Sonnet 4.5 examples
- docs/Usage_CN.md: Added Chinese versions of all Sonnet 4.5 documentation
Model ID: global.anthropic.claude-sonnet-4-5-20250929-v1:0
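The temperature/topP compatibility fix and the KeyError fix described above can be sketched together as a small helper. This is a hedged, illustrative reduction only: `build_inference_config` is a hypothetical name, and the real logic lives inside `BedrockModel` in src/api/models/bedrock.py.

```python
def build_inference_config(model, temperature, top_p):
    """Illustrative sketch of the topP handling added in this commit."""
    config = {"temperature": temperature, "topP": top_p}
    # Claude Sonnet 4.5 rejects requests that set both temperature and topP,
    # so topP is dropped for that model family.
    if "claude-sonnet-4-5" in model.lower():
        # pop() with a default avoids a KeyError when topP was never set
        config.pop("topP", None)
    return config
```

For any other model the config passes through unchanged, so existing behavior is preserved.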
@@ -4,7 +4,9 @@ OpenAI-compatible RESTful APIs for Amazon Bedrock
 
 ## What's New 🔥
 
-This project supports reasoning for both **Claude 3.7 Sonnet** and **DeepSeek R1**, check [How to Use](./docs/Usage.md#reasoning) for more details. You need to first run the Models API to refresh the model list.
+This project now supports **Claude Sonnet 4.5**, Anthropic's most intelligent model with enhanced coding capabilities and complex agent support, available via global cross-region inference.
+
+It also supports reasoning for both **Claude 3.7 Sonnet** and **DeepSeek R1**. Check [How to Use](./docs/Usage.md#reasoning) for more details. You need to first run the Models API to refresh the model list.
 
 ## Overview
@@ -51,6 +51,43 @@ curl -s $OPENAI_BASE_URL/models -H "Authorization: Bearer $OPENAI_API_KEY" | jq
 ]
 ```
 
+## Chat Completions API
+
+### Basic Example with Claude Sonnet 4.5
+
+Claude Sonnet 4.5 is Anthropic's most intelligent model, excelling at coding, complex reasoning, and agent-based tasks. It's available via global cross-region inference profiles.
+
+**Example Request**
+
+```bash
+curl $OPENAI_BASE_URL/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer $OPENAI_API_KEY" \
+  -d '{
+    "model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
+    "messages": [
+      {
+        "role": "user",
+        "content": "Write a Python function to calculate the Fibonacci sequence using dynamic programming."
+      }
+    ]
+  }'
+```
+
+**Example SDK Usage**
+
+```python
+from openai import OpenAI
+
+client = OpenAI()
+completion = client.chat.completions.create(
+    model="global.anthropic.claude-sonnet-4-5-20250929-v1:0",
+    messages=[{"role": "user", "content": "Write a Python function to calculate the Fibonacci sequence using dynamic programming."}],
+)
+
+print(completion.choices[0].message.content)
+```
+
 ## Embedding API
 
 **Important Notice**: Please carefully review the following points before using this proxy API for embedding.
@@ -451,10 +488,31 @@ for chunk in response:
 
 Extended thinking with tool use in Claude 4 models supports [interleaved thinking](https://docs.aws.amazon.com/bedrock/latest/userguide/claude-messages-extended-thinking.html#claude-messages-extended-thinking-tool-use-interleaved), which enables Claude 4 models to think between tool calls and run more sophisticated reasoning after receiving tool results. This is helpful for more complex agentic interactions.
 
 With interleaved thinking, the `budget_tokens` can exceed the `max_tokens` parameter because it represents the total budget across all thinking blocks within one assistant turn.
 
+**Supported Models**: Claude Sonnet 4, Claude Sonnet 4.5
+
 **Example Request**
 
-- Non-Streaming
+- Non-Streaming (Claude Sonnet 4.5)
+
+```bash
+curl http://127.0.0.1:8000/api/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer bedrock" \
+  -d '{
+    "model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
+    "max_tokens": 2048,
+    "messages": [{
+      "role": "user",
+      "content": "Explain how to implement a binary search tree with self-balancing capabilities."
+    }],
+    "extra_body": {
+      "anthropic_beta": ["interleaved-thinking-2025-05-14"],
+      "thinking": {"type": "enabled", "budget_tokens": 4096}
+    }
+  }'
+```
+
+- Non-Streaming (Claude Sonnet 4)
 
 ```bash
 curl http://127.0.0.1:8000/api/v1/chat/completions \
@@ -474,7 +532,28 @@ curl http://127.0.0.1:8000/api/v1/chat/completions \
 
 }'
 ```
 
-- Streaming
+- Streaming (Claude Sonnet 4.5)
+
+```bash
+curl http://127.0.0.1:8000/api/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer bedrock" \
+  -d '{
+    "model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
+    "max_tokens": 2048,
+    "messages": [{
+      "role": "user",
+      "content": "Explain how to implement a binary search tree with self-balancing capabilities."
+    }],
+    "stream": true,
+    "extra_body": {
+      "anthropic_beta": ["interleaved-thinking-2025-05-14"],
+      "thinking": {"type": "enabled", "budget_tokens": 4096}
+    }
+  }'
+```
+
+- Streaming (Claude Sonnet 4)
 
 ```bash
 curl http://127.0.0.1:8000/api/v1/chat/completions \
@@ -49,6 +49,42 @@ curl -s $OPENAI_BASE_URL/models -H "Authorization: Bearer $OPENAI_API_KEY" | jq
 ]
 ```
 
+## Chat Completions API
+
+### Claude Sonnet 4.5 基础示例
+
+Claude Sonnet 4.5 是 Anthropic 最智能的模型,在编码、复杂推理和基于代理的任务方面表现出色。它通过全球跨区域推理配置文件提供。
+
+**Request 示例**
+
+```bash
+curl $OPENAI_BASE_URL/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer $OPENAI_API_KEY" \
+  -d '{
+    "model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
+    "messages": [
+      {
+        "role": "user",
+        "content": "编写一个使用动态规划计算斐波那契数列的Python函数。"
+      }
+    ]
+  }'
+```
+
+**SDK 使用示例**
+
+```python
+from openai import OpenAI
+
+client = OpenAI()
+completion = client.chat.completions.create(
+    model="global.anthropic.claude-sonnet-4-5-20250929-v1:0",
+    messages=[{"role": "user", "content": "编写一个使用动态规划计算斐波那契数列的Python函数。"}],
+)
+
+print(completion.choices[0].message.content)
+```
+
 ## Embedding API
 
@@ -452,10 +488,31 @@ Claude 4 模型支持借助工具使用的扩展思维功能(Extended Thinking
 
 在交错思考模式下,budget_tokens 可以超过 max_tokens 参数,因为它代表一次助手回合中所有思考块的总 Token 预算。
 
+**支持的模型**: Claude Sonnet 4, Claude Sonnet 4.5
+
 **Request 示例**
 
-- Non-Streaming
+- Non-Streaming (Claude Sonnet 4.5)
+
+```bash
+curl http://127.0.0.1:8000/api/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer bedrock" \
+  -d '{
+    "model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
+    "max_tokens": 2048,
+    "messages": [{
+      "role": "user",
+      "content": "解释如何实现一个具有自平衡功能的二叉搜索树。"
+    }],
+    "extra_body": {
+      "anthropic_beta": ["interleaved-thinking-2025-05-14"],
+      "thinking": {"type": "enabled", "budget_tokens": 4096}
+    }
+  }'
+```
+
+- Non-Streaming (Claude Sonnet 4)
 
 ```bash
 curl http://127.0.0.1:8000/api/v1/chat/completions \
@@ -475,7 +532,28 @@ curl http://127.0.0.1:8000/api/v1/chat/completions \
 
 }'
 ```
 
-- Streaming
+- Streaming (Claude Sonnet 4.5)
+
+```bash
+curl http://127.0.0.1:8000/api/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer bedrock" \
+  -d '{
+    "model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
+    "max_tokens": 2048,
+    "messages": [{
+      "role": "user",
+      "content": "解释如何实现一个具有自平衡功能的二叉搜索树。"
+    }],
+    "stream": true,
+    "extra_body": {
+      "anthropic_beta": ["interleaved-thinking-2025-05-14"],
+      "thinking": {"type": "enabled", "budget_tokens": 4096}
+    }
+  }'
+```
+
+- Streaming (Claude Sonnet 4)
 
 ```bash
 curl http://127.0.0.1:8000/api/v1/chat/completions \
@@ -158,6 +158,11 @@ def list_bedrock_models() -> dict:
         if profile_id in profile_list:
             model_list[profile_id] = {"modalities": input_modalities}
 
+        # Add global cross-region inference profiles
+        global_profile_id = "global." + model_id
+        if global_profile_id in profile_list:
+            model_list[global_profile_id] = {"modalities": input_modalities}
+
         # Add application inference profiles (emit all profiles for this model)
         if model_id in app_profiles_by_model:
             for profile_arn in app_profiles_by_model[model_id]:
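In isolation, the profile-discovery lines added in this hunk behave like the following minimal sketch. It is illustrative only: `add_global_profile` is a hypothetical standalone function, and `profile_list` stands in for the set of inference-profile IDs that `list_bedrock_models()` collects from the Bedrock API.

```python
def add_global_profile(model_id, input_modalities, profile_list, model_list):
    """Register the "global." cross-region profile for a model, if it exists."""
    # Only advertise the global profile when Bedrock actually exposes it
    # for this foundation model.
    global_profile_id = "global." + model_id
    if global_profile_id in profile_list:
        model_list[global_profile_id] = {"modalities": input_modalities}
    return model_list
```

Models without a global profile are left untouched, which is why the change is additive and safe for existing regional profiles.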
@@ -521,6 +526,11 @@ class BedrockModel(BaseChatModel):
             "topP": chat_request.top_p,
         }
 
+        # Claude Sonnet 4.5 doesn't support both temperature and topP
+        # Remove topP for this model
+        if "claude-sonnet-4-5" in chat_request.model.lower():
+            inference_config.pop("topP", None)
+
         if chat_request.stop is not None:
             stop = chat_request.stop
             if isinstance(stop, str):
@@ -547,7 +557,7 @@ class BedrockModel(BaseChatModel):
             )
             inference_config["maxTokens"] = max_tokens
             # unset topP - Not supported
-            inference_config.pop("topP")
+            inference_config.pop("topP", None)
 
         args["additionalModelRequestFields"] = {
             "reasoning_config": {"type": "enabled", "budget_tokens": budget_tokens}
@@ -575,6 +585,10 @@ class BedrockModel(BaseChatModel):
         if chat_request.extra_body:
             # reasoning_config will not be used
             args["additionalModelRequestFields"] = chat_request.extra_body
+            # Extended thinking doesn't support both temperature and topP
+            # Remove topP to avoid validation error
+            if "thinking" in chat_request.extra_body:
+                inference_config.pop("topP", None)
         return args
 
     def _create_response(
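The last hunk's interaction between `extra_body` and `topP` can be reduced to the following sketch. This is a hedged illustration, not the actual implementation: `build_args` is a hypothetical condensation of the request builder in `BedrockModel`, keeping only the lines this commit touches.

```python
def build_args(inference_config, extra_body):
    """Sketch: pass extra_body through, dropping topP when thinking is enabled."""
    args = {"inferenceConfig": inference_config}
    if extra_body:
        args["additionalModelRequestFields"] = extra_body
        # Extended thinking doesn't support both temperature and topP,
        # so topP is removed to avoid a Bedrock validation error.
        if "thinking" in extra_body:
            inference_config.pop("topP", None)
    return args
```

Because `pop("topP", None)` is used here and in the reasoning path, the builder no longer raises a KeyError when a caller never set `top_p` in the first place.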