diff --git a/README.md b/README.md
index 7da25db..c84e485 100644
--- a/README.md
+++ b/README.md
@@ -4,7 +4,9 @@ OpenAI-compatible RESTful APIs for Amazon Bedrock
 
 ## What's New 🔥
 
-This project supports reasoning for both **Claude 3.7 Sonnet** and **DeepSeek R1**, check [How to Use](./docs/Usage.md#reasoning) for more details. You need to first run the Models API to refresh the model list.
+This project now supports **Claude Sonnet 4.5**, Anthropic's most intelligent model with enhanced coding capabilities and complex agent support, available via global cross-region inference.
+
+It also supports reasoning for both **Claude 3.7 Sonnet** and **DeepSeek R1**. Check [How to Use](./docs/Usage.md#reasoning) for more details. You need to first run the Models API to refresh the model list.
 
 ## Overview
 
diff --git a/docs/Usage.md b/docs/Usage.md
index 872cbc6..c9d003c 100644
--- a/docs/Usage.md
+++ b/docs/Usage.md
@@ -51,6 +51,43 @@ curl -s $OPENAI_BASE_URL/models -H "Authorization: Bearer $OPENAI_API_KEY" | jq
 ]
 ```
 
+## Chat Completions API
+
+### Basic Example with Claude Sonnet 4.5
+
+Claude Sonnet 4.5 is Anthropic's most intelligent model, excelling at coding, complex reasoning, and agent-based tasks. It's available via global cross-region inference profiles.
+
+**Example Request**
+
+```bash
+curl $OPENAI_BASE_URL/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer $OPENAI_API_KEY" \
+  -d '{
+    "model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
+    "messages": [
+      {
+        "role": "user",
+        "content": "Write a Python function to calculate the Fibonacci sequence using dynamic programming."
+      }
+    ]
+  }'
+```
+
+**Example SDK Usage**
+
+```python
+from openai import OpenAI
+
+client = OpenAI()
+completion = client.chat.completions.create(
+    model="global.anthropic.claude-sonnet-4-5-20250929-v1:0",
+    messages=[{"role": "user", "content": "Write a Python function to calculate the Fibonacci sequence using dynamic programming."}],
+)
+
+print(completion.choices[0].message.content)
+```
+
 ## Embedding API
 
 **Important Notice**: Please carefully review the following points before using this proxy API for embedding.
@@ -451,10 +488,31 @@ for chunk in response:
 Extended thinking with tool use in Claude 4 models supports [interleaved thinking](https://docs.aws.amazon.com/bedrock/latest/userguide/claude-messages-extended-thinking.html#claude-messages-extended-thinking-tool-use-interleaved) enables Claude 4 models to think between tool calls and run more sophisticated reasoning after receiving tool results. which is helpful for more complex agentic interactions.
 
 With interleaved thinking, the `budget_tokens` can exceed the `max_tokens` parameter because it represents the total budget across all thinking blocks within one assistant turn.
+**Supported Models**: Claude Sonnet 4, Claude Sonnet 4.5
 
 **Example Request**
 
-- Non-Streaming
+- Non-Streaming (Claude Sonnet 4.5)
+
+```bash
+curl http://127.0.0.1:8000/api/v1/chat/completions \
+-H "Content-Type: application/json" \
+-H "Authorization: Bearer bedrock" \
+-d '{
+"model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
+"max_tokens": 2048,
+"messages": [{
+"role": "user",
+"content": "Explain how to implement a binary search tree with self-balancing capabilities."
+}],
+"extra_body": {
+"anthropic_beta": ["interleaved-thinking-2025-05-14"],
+"thinking": {"type": "enabled", "budget_tokens": 4096}
+}
+}'
+```
+
+- Non-Streaming (Claude Sonnet 4)
 
 ```bash
 curl http://127.0.0.1:8000/api/v1/chat/completions \
@@ -474,7 +532,28 @@ curl http://127.0.0.1:8000/api/v1/chat/completions \
 }'
 ```
 
-- Streaming
+- Streaming (Claude Sonnet 4.5)
+
+```bash
+curl http://127.0.0.1:8000/api/v1/chat/completions \
+-H "Content-Type: application/json" \
+-H "Authorization: Bearer bedrock" \
+-d '{
+"model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
+"max_tokens": 2048,
+"messages": [{
+"role": "user",
+"content": "Explain how to implement a binary search tree with self-balancing capabilities."
+}],
+"stream": true,
+"extra_body": {
+"anthropic_beta": ["interleaved-thinking-2025-05-14"],
+"thinking": {"type": "enabled", "budget_tokens": 4096}
+}
+}'
+```
+
+- Streaming (Claude Sonnet 4)
 
 ```bash
 curl http://127.0.0.1:8000/api/v1/chat/completions \
diff --git a/docs/Usage_CN.md b/docs/Usage_CN.md
index 4ce3d40..985f51b 100644
--- a/docs/Usage_CN.md
+++ b/docs/Usage_CN.md
@@ -49,6 +49,42 @@ curl -s $OPENAI_BASE_URL/models -H "Authorization: Bearer $OPENAI_API_KEY" | jq
 ]
 ```
 
+## Chat Completions API
+
+### Claude Sonnet 4.5 基础示例
+
+Claude Sonnet 4.5 是 Anthropic 最智能的模型,在编码、复杂推理和基于代理的任务方面表现出色。它通过全球跨区域推理配置文件提供。
+
+**Request 示例**
+
+```bash
+curl $OPENAI_BASE_URL/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer $OPENAI_API_KEY" \
+  -d '{
+    "model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
+    "messages": [
+      {
+        "role": "user",
+        "content": "编写一个使用动态规划计算斐波那契数列的Python函数。"
+      }
+    ]
+  }'
+```
+
+**SDK 使用示例**
+
+```python
+from openai import OpenAI
+
+client = OpenAI()
+completion = client.chat.completions.create(
+    model="global.anthropic.claude-sonnet-4-5-20250929-v1:0",
+    messages=[{"role": "user", "content": "编写一个使用动态规划计算斐波那契数列的Python函数。"}],
+)
+
+print(completion.choices[0].message.content)
+```
 ## Embedding API
 
 
@@ -452,10 +488,31 @@ Claude 4 模型支持借助工具使用的扩展思维功能(Extended Thinking
 
 在交错思考模式下,budget_tokens 可以超过 max_tokens 参数,因为它代表一次助手回合中所有思考块的总 Token 预算。
 
+**支持的模型**: Claude Sonnet 4, Claude Sonnet 4.5
 
 **Request 示例**
 
-- Non-Streaming
+- Non-Streaming (Claude Sonnet 4.5)
+
+```bash
+curl http://127.0.0.1:8000/api/v1/chat/completions \
+-H "Content-Type: application/json" \
+-H "Authorization: Bearer bedrock" \
+-d '{
+"model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
+"max_tokens": 2048,
+"messages": [{
+"role": "user",
+"content": "解释如何实现一个具有自平衡功能的二叉搜索树。"
+}],
+"extra_body": {
+"anthropic_beta": ["interleaved-thinking-2025-05-14"],
+"thinking": {"type": "enabled", "budget_tokens": 4096}
+}
+}'
+```
+
+- Non-Streaming (Claude Sonnet 4)
 
 ```bash
 curl http://127.0.0.1:8000/api/v1/chat/completions \
@@ -475,7 +532,28 @@ curl http://127.0.0.1:8000/api/v1/chat/completions \
 }'
 ```
 
-- Streaming
+- Streaming (Claude Sonnet 4.5)
+
+```bash
+curl http://127.0.0.1:8000/api/v1/chat/completions \
+-H "Content-Type: application/json" \
+-H "Authorization: Bearer bedrock" \
+-d '{
+"model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
+"max_tokens": 2048,
+"messages": [{
+"role": "user",
+"content": "解释如何实现一个具有自平衡功能的二叉搜索树。"
+}],
+"stream": true,
+"extra_body": {
+"anthropic_beta": ["interleaved-thinking-2025-05-14"],
+"thinking": {"type": "enabled", "budget_tokens": 4096}
+}
+}'
+```
+
+- Streaming (Claude Sonnet 4)
 
 ```bash
 curl http://127.0.0.1:8000/api/v1/chat/completions \
diff --git a/src/api/models/bedrock.py b/src/api/models/bedrock.py
index 374fcd1..e82da47 100644
--- a/src/api/models/bedrock.py
+++ b/src/api/models/bedrock.py
@@ -158,6 +158,11 @@ def list_bedrock_models() -> dict:
             if profile_id in profile_list:
                 model_list[profile_id] = {"modalities": input_modalities}
 
+            # Add global cross-region inference profiles
+            global_profile_id = "global." + model_id
+            if global_profile_id in profile_list:
+                model_list[global_profile_id] = {"modalities": input_modalities}
+
             # Add application inference profiles (emit all profiles for this model)
             if model_id in app_profiles_by_model:
                 for profile_arn in app_profiles_by_model[model_id]:
@@ -521,6 +526,11 @@ class BedrockModel(BaseChatModel):
             "topP": chat_request.top_p,
         }
 
+        # Claude Sonnet 4.5 doesn't support both temperature and topP
+        # Remove topP for this model
+        if "claude-sonnet-4-5" in chat_request.model.lower():
+            inference_config.pop("topP", None)
+
         if chat_request.stop is not None:
             stop = chat_request.stop
             if isinstance(stop, str):
@@ -547,7 +557,7 @@ class BedrockModel(BaseChatModel):
             )
             inference_config["maxTokens"] = max_tokens
             # unset topP - Not supported
-            inference_config.pop("topP")
+            inference_config.pop("topP", None)
 
             args["additionalModelRequestFields"] = {
                 "reasoning_config": {"type": "enabled", "budget_tokens": budget_tokens}
@@ -573,8 +583,12 @@ class BedrockModel(BaseChatModel):
             args["toolConfig"] = tool_config
         # add Additional fields to enable extend thinking
         if chat_request.extra_body:
-            # reasoning_config will not be used
+            # reasoning_config will not be used
             args["additionalModelRequestFields"] = chat_request.extra_body
+            # Extended thinking doesn't support both temperature and topP
+            # Remove topP to avoid validation error
+            if "thinking" in chat_request.extra_body:
+                inference_config.pop("topP", None)
         return args
 
     def _create_response(