feat: add Claude Sonnet 4.5 support with global cross-region inference (#180)
This commit adds comprehensive support for Claude Sonnet 4.5 (claude-sonnet-4-5-20250929),
Anthropic's most intelligent model with enhanced coding capabilities and complex agent support.
Changes:
- Added global cross-region inference profile discovery (global.anthropic.*)
- Fixed temperature/topP compatibility for Claude Sonnet 4.5 (model doesn't support both simultaneously)
- Fixed reasoning_effort parameter handling to prevent KeyError
- Added extended thinking/interleaved thinking support via extra_body parameter
- Updated documentation with Claude Sonnet 4.5 examples (English and Chinese)
- Updated README with Sonnet 4.5 announcement
Technical Details:
- src/api/models/bedrock.py: Added global profile support in list_bedrock_models()
- src/api/models/bedrock.py: Added Claude Sonnet 4.5 detection to remove topP parameter
- src/api/models/bedrock.py: Changed pop("topP") to pop("topP", None) to prevent KeyError
- docs/Usage.md: Added Chat Completions section with Sonnet 4.5 examples
- docs/Usage.md: Updated Interleaved thinking section with Sonnet 4.5 examples
- docs/Usage_CN.md: Added Chinese versions of all Sonnet 4.5 documentation
Model ID: global.anthropic.claude-sonnet-4-5-20250929-v1:0
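The `pop("topP")` → `pop("topP", None)` fix listed above comes down to `dict.pop` semantics; a minimal sketch (using a hypothetical `inference_config` dict, not the project's actual object):

```python
# dict.pop with a single argument raises KeyError when the key is absent;
# passing a default makes the removal a safe no-op.
inference_config = {"temperature": 0.7, "maxTokens": 2048}  # no "topP" key

try:
    inference_config.pop("topP")
except KeyError:
    print("pop without a default raises KeyError")

# The fixed call returns the default instead of raising.
result = inference_config.pop("topP", None)
print(result)  # → None
```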

@@ -4,7 +4,9 @@ OpenAI-compatible RESTful APIs for Amazon Bedrock

## What's New 🔥

This project now supports **Claude Sonnet 4.5**, Anthropic's most intelligent model with enhanced coding capabilities and complex agent support, available via global cross-region inference.

It also supports reasoning for both **Claude 3.7 Sonnet** and **DeepSeek R1**. Check [How to Use](./docs/Usage.md#reasoning) for more details. You need to first run the Models API to refresh the model list.

## Overview

@@ -51,6 +51,43 @@ curl -s $OPENAI_BASE_URL/models -H "Authorization: Bearer $OPENAI_API_KEY" | jq
]
```

## Chat Completions API

### Basic Example with Claude Sonnet 4.5

Claude Sonnet 4.5 is Anthropic's most intelligent model, excelling at coding, complex reasoning, and agent-based tasks. It's available via global cross-region inference profiles.

**Example Request**

```bash
curl $OPENAI_BASE_URL/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
    "messages": [
      {
        "role": "user",
        "content": "Write a Python function to calculate the Fibonacci sequence using dynamic programming."
      }
    ]
  }'
```

**Example SDK Usage**

```python
from openai import OpenAI

client = OpenAI()
completion = client.chat.completions.create(
    model="global.anthropic.claude-sonnet-4-5-20250929-v1:0",
    messages=[{"role": "user", "content": "Write a Python function to calculate the Fibonacci sequence using dynamic programming."}],
)

print(completion.choices[0].message.content)
```

## Embedding API

**Important Notice**: Please carefully review the following points before using this proxy API for embedding.

@@ -451,10 +488,31 @@ for chunk in response:

Extended thinking with tool use in Claude 4 models supports [interleaved thinking](https://docs.aws.amazon.com/bedrock/latest/userguide/claude-messages-extended-thinking.html#claude-messages-extended-thinking-tool-use-interleaved), which enables Claude 4 models to think between tool calls and run more sophisticated reasoning after receiving tool results. This is helpful for more complex agentic interactions.

With interleaved thinking, the `budget_tokens` can exceed the `max_tokens` parameter because it represents the total budget across all thinking blocks within one assistant turn.

**Supported Models**: Claude Sonnet 4, Claude Sonnet 4.5

**Example Request**

- Non-Streaming (Claude Sonnet 4.5)

```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer bedrock" \
  -d '{
    "model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
    "max_tokens": 2048,
    "messages": [{
      "role": "user",
      "content": "Explain how to implement a binary search tree with self-balancing capabilities."
    }],
    "extra_body": {
      "anthropic_beta": ["interleaved-thinking-2025-05-14"],
      "thinking": {"type": "enabled", "budget_tokens": 4096}
    }
  }'
```

- Non-Streaming (Claude Sonnet 4)

```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
@@ -474,7 +532,28 @@ curl http://127.0.0.1:8000/api/v1/chat/completions \
  }'
```

- Streaming (Claude Sonnet 4.5)

```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer bedrock" \
  -d '{
    "model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
    "max_tokens": 2048,
    "messages": [{
      "role": "user",
      "content": "Explain how to implement a binary search tree with self-balancing capabilities."
    }],
    "stream": true,
    "extra_body": {
      "anthropic_beta": ["interleaved-thinking-2025-05-14"],
      "thinking": {"type": "enabled", "budget_tokens": 4096}
    }
  }'
```
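The same streaming interleaved-thinking request body can be assembled programmatically before sending it to the proxy; a sketch with a hypothetical helper function (`build_thinking_request` is not part of the project):

```python
import json

# Hypothetical helper: builds the same JSON body as the streaming
# Claude Sonnet 4.5 curl example above.
def build_thinking_request(prompt, budget_tokens=4096):
    return {
        "model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
        "max_tokens": 2048,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
        "extra_body": {
            "anthropic_beta": ["interleaved-thinking-2025-05-14"],
            "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        },
    }

body = build_thinking_request("Explain how to implement a binary search tree.")
print(json.dumps(body, indent=2))
```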

- Streaming (Claude Sonnet 4)

```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
```

@@ -49,6 +49,42 @@ curl -s $OPENAI_BASE_URL/models -H "Authorization: Bearer $OPENAI_API_KEY" | jq
]
```

## Chat Completions API

### Basic Example with Claude Sonnet 4.5

Claude Sonnet 4.5 is Anthropic's most intelligent model, excelling at coding, complex reasoning, and agent-based tasks. It is available via global cross-region inference profiles.

**Example Request**

```bash
curl $OPENAI_BASE_URL/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
    "messages": [
      {
        "role": "user",
        "content": "编写一个使用动态规划计算斐波那契数列的Python函数。"
      }
    ]
  }'
```

**Example SDK Usage**

```python
from openai import OpenAI

client = OpenAI()
completion = client.chat.completions.create(
    model="global.anthropic.claude-sonnet-4-5-20250929-v1:0",
    messages=[{"role": "user", "content": "编写一个使用动态规划计算斐波那契数列的Python函数。"}],
)

print(completion.choices[0].message.content)
```

## Embedding API

@@ -452,10 +488,31 @@ Claude 4 models support Extended Thinking with tool use

With interleaved thinking, `budget_tokens` can exceed the `max_tokens` parameter because it represents the total token budget across all thinking blocks within one assistant turn.

**Supported Models**: Claude Sonnet 4, Claude Sonnet 4.5

**Example Request**

- Non-Streaming (Claude Sonnet 4.5)

```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer bedrock" \
  -d '{
    "model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
    "max_tokens": 2048,
    "messages": [{
      "role": "user",
      "content": "解释如何实现一个具有自平衡功能的二叉搜索树。"
    }],
    "extra_body": {
      "anthropic_beta": ["interleaved-thinking-2025-05-14"],
      "thinking": {"type": "enabled", "budget_tokens": 4096}
    }
  }'
```

- Non-Streaming (Claude Sonnet 4)

```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
@@ -475,7 +532,28 @@ curl http://127.0.0.1:8000/api/v1/chat/completions \
  }'
```

- Streaming (Claude Sonnet 4.5)

```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer bedrock" \
  -d '{
    "model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
    "max_tokens": 2048,
    "messages": [{
      "role": "user",
      "content": "解释如何实现一个具有自平衡功能的二叉搜索树。"
    }],
    "stream": true,
    "extra_body": {
      "anthropic_beta": ["interleaved-thinking-2025-05-14"],
      "thinking": {"type": "enabled", "budget_tokens": 4096}
    }
  }'
```

- Streaming (Claude Sonnet 4)

```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
```

@@ -158,6 +158,11 @@ def list_bedrock_models() -> dict:
        if profile_id in profile_list:
            model_list[profile_id] = {"modalities": input_modalities}

        # Add global cross-region inference profiles
        global_profile_id = "global." + model_id
        if global_profile_id in profile_list:
            model_list[global_profile_id] = {"modalities": input_modalities}

        # Add application inference profiles (emit all profiles for this model)
        if model_id in app_profiles_by_model:
            for profile_arn in app_profiles_by_model[model_id]:
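The global-profile addition above just prefixes the base model ID with `global.` and checks membership in the discovered profile list; a standalone sketch with hypothetical sample data (the surrounding loop and the construction of `profile_list` are not shown in this diff):

```python
# Hypothetical sample data mirroring the list_bedrock_models() hunk above:
# inference profile IDs discovered from the account.
profile_list = [
    "us.anthropic.claude-sonnet-4-5-20250929-v1:0",
    "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
]

model_id = "anthropic.claude-sonnet-4-5-20250929-v1:0"
input_modalities = ["TEXT", "IMAGE"]
model_list = {}

# Regional cross-region profiles (hypothetical prefixes for illustration).
for prefix in ("us.", "eu.", "apac."):
    profile_id = prefix + model_id
    if profile_id in profile_list:
        model_list[profile_id] = {"modalities": input_modalities}

# Global cross-region profile: the new "global." prefix added by this commit.
global_profile_id = "global." + model_id
if global_profile_id in profile_list:
    model_list[global_profile_id] = {"modalities": input_modalities}

print(sorted(model_list))
```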

@@ -521,6 +526,11 @@ class BedrockModel(BaseChatModel):
            "topP": chat_request.top_p,
        }

        # Claude Sonnet 4.5 doesn't support both temperature and topP;
        # remove topP for this model.
        if "claude-sonnet-4-5" in chat_request.model.lower():
            inference_config.pop("topP", None)

        if chat_request.stop is not None:
            stop = chat_request.stop
            if isinstance(stop, str):

@@ -547,7 +557,7 @@ class BedrockModel(BaseChatModel):
        )
        inference_config["maxTokens"] = max_tokens
        # unset topP - Not supported
        inference_config.pop("topP", None)

        args["additionalModelRequestFields"] = {
            "reasoning_config": {"type": "enabled", "budget_tokens": budget_tokens}

@@ -573,8 +583,12 @@ class BedrockModel(BaseChatModel):
            args["toolConfig"] = tool_config
        # add additional fields to enable extended thinking
        if chat_request.extra_body:
            # reasoning_config will not be used
            args["additionalModelRequestFields"] = chat_request.extra_body
            # Extended thinking doesn't support both temperature and topP;
            # remove topP to avoid a validation error.
            if "thinking" in chat_request.extra_body:
                inference_config.pop("topP", None)
        return args

    def _create_response(
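Taken together, the two topP guards in this commit (the model-name match and the `"thinking"` key in `extra_body`) can be exercised in isolation; a sketch with a hypothetical helper and sample inputs (not the project's actual method):

```python
# Hypothetical helper mirroring the two guards added in this commit:
# Claude Sonnet 4.5 never gets topP, and any request with extended
# thinking enabled drops topP as well.
def strip_top_p_if_needed(model, extra_body, inference_config):
    if "claude-sonnet-4-5" in model.lower():
        inference_config.pop("topP", None)
    if extra_body and "thinking" in extra_body:
        inference_config.pop("topP", None)
    return inference_config

cfg = strip_top_p_if_needed(
    "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
    {"thinking": {"type": "enabled", "budget_tokens": 4096}},
    {"temperature": 0.7, "topP": 0.9},
)
print(cfg)  # → {'temperature': 0.7}
```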