Files
bedrock-access-gateway/docs/Usage_CN.md
Neil Mazumdar 66cb51bb36 feat: add Claude Sonnet 4.5 support with global cross-region inference (#180)
This commit adds comprehensive support for Claude Sonnet 4.5 (claude-sonnet-4-5-20250929),
Anthropic's most intelligent model with enhanced coding capabilities and complex agent support.

Changes:
- Added global cross-region inference profile discovery (global.anthropic.*)
- Fixed temperature/topP compatibility for Claude Sonnet 4.5 (model doesn't support both simultaneously)
- Fixed reasoning_effort parameter handling to prevent KeyError
- Added extended thinking/interleaved thinking support via extra_body parameter
- Updated documentation with Claude Sonnet 4.5 examples (English and Chinese)
- Updated README with Sonnet 4.5 announcement

Technical Details:
- src/api/models/bedrock.py: Added global profile support in list_bedrock_models()
- src/api/models/bedrock.py: Added Claude Sonnet 4.5 detection to remove topP parameter
- src/api/models/bedrock.py: Changed pop("topP") to pop("topP", None) to prevent KeyError
- docs/Usage.md: Added Chat Completions section with Sonnet 4.5 examples
- docs/Usage.md: Updated Interleaved thinking section with Sonnet 4.5 examples
- docs/Usage_CN.md: Added Chinese versions of all Sonnet 4.5 documentation

Model ID: global.anthropic.claude-sonnet-4-5-20250929-v1:0
2025-09-30 16:51:26 +08:00

576 lines
18 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
[English](./Usage.md)
# Usage Guide
假设您在部署后已设置以下环境变量:
```bash
export OPENAI_API_KEY=<API key>
export OPENAI_BASE_URL=<API base url>
```
**API 示例:**
- [Models API](#models-api)
- [Embedding API](#embedding-api)
- [Multimodal API](#multimodal-api)
- [Tool Call](#tool-call)
- [Reasoning](#reasoning)
- [Interleaved thinking (beta)](#Interleaved thinking (beta))
## Models API
你可以通过这个API 获取支持的models 列表。 另外如果Amazon Bedrock有新模型加入后你也可以用它来更新刷新模型列表。
**Request 示例**
```bash
curl -s $OPENAI_BASE_URL/models -H "Authorization: Bearer $OPENAI_API_KEY" | jq .data
```
**Response 示例**
```bash
[
...
{
"id": "anthropic.claude-3-5-sonnet-20240620-v1:0",
"created": 1734416893,
"object": "model",
"owned_by": "bedrock"
},
{
"id": "us.anthropic.claude-3-5-sonnet-20240620-v1:0",
"created": 1734416893,
"object": "model",
"owned_by": "bedrock"
},
...
]
```
## Chat Completions API
### Claude Sonnet 4.5 基础示例
Claude Sonnet 4.5 是 Anthropic 最智能的模型,在编码、复杂推理和基于代理的任务方面表现出色。它通过全球跨区域推理配置文件提供。
**Request 示例**
```bash
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
"messages": [
{
"role": "user",
"content": "编写一个使用动态规划计算斐波那契数列的Python函数。"
}
]
}'
```
**SDK 使用示例**
```python
from openai import OpenAI
client = OpenAI()
completion = client.chat.completions.create(
model="global.anthropic.claude-sonnet-4-5-20250929-v1:0",
messages=[{"role": "user", "content": "编写一个使用动态规划计算斐波那契数列的Python函数。"}],
)
print(completion.choices[0].message.content)
```
## Embedding API
**重要**: 在使用此代理 API 之前,请仔细阅读以下几点:
1. 如果您之前使用 OpenAI Embedding模型来创建向量,请注意切换到新模型可能没有那么直接。不同模型具有不同的维度(例如,embed-multilingual-v3.0 有 1024 个维度),即使对于相同的文本,它们也可能产生不同的结果。
2. 如果您使用 OpenAI Embedding模型传入的是整数编码(例如与 LangChain 一起使用),此方案将尝试使用 `tiktoken` 进行解码以检索原始文本。但是,无法保证解码后的文本准确无误。
3. 如果您对长文本使用 OpenAI Embedding,您应该验证 Bedrock 模型支持的最大Token数,例如为获得最佳性能,Bedrock 建议将文本长度限制在少于 512 个Token。
**Request 示例**
```bash
curl $OPENAI_BASE_URL/embeddings \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"input": "The food was delicious and the waiter...",
"model": "text-embedding-ada-002",
"encoding_format": "float"
}'
```
**Response 示例**
```json
{
"object": "list",
"data": [
{
"object": "embedding",
"embedding": [
-0.02279663,
-0.024612427,
0.012863159,
...
0.01612854,
0.0038928986
],
"index": 0
}
],
"model": "cohere.embed-multilingual-v3",
"usage": {
"prompt_tokens": 0,
"total_tokens": 0
}
}
```
或者你可以使用OpenAI 的SDK
```python
from openai import OpenAI
client = OpenAI()
def get_embedding(text, model="text-embedding-3-small"):
text = text.replace("\n", " ")
return client.embeddings.create(input=[text], model=model).data[0].embedding
text = "hello"
# will output like [0.003578186, 0.028717041, 0.031021118, -0.0014066696,...]
print(get_embedding(text))
```
或者 LangChain
```python
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(
model="text-embedding-3-large",
)
text = "This is a test document."
query_result = embeddings.embed_query(text)
print(query_result[:5])
doc_result = embeddings.embed_documents([text])
print(doc_result[0][:5])
```
## Multimodal API
**Request 示例**
```bash
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-3.5-turbo",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "please identify and count all the objects in this images, list all the names"
},
{
"type": "image_url",
"image_url": {
"url": "https://github.com/aws-samples/bedrock-access-gateway/blob/main/assets/obj-detect.png?raw=true"
}
}
]
}
]
}'
```
如果您需要使用此API处理非公开图像,您可以先对图像进行base64编码,然后传递编码后的字符串。
将"image/jpeg"替换为实际的内容类型(content type)。目前仅支持"image/jpeg"、"image/png"、"image/gif"或"image/webp"。
```bash
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-3.5-turbo",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "please identify and count all the objects in this images, list all the names"
},
{
"type": "image_url",
"image_url": {
"url": "data:image/jpeg;base64,<your image data>"
}
}
]
}
]
}'
```
**Response 示例**
```json
{
"id": "msg_01BY3wcz41x7XrKhxY3VzWke",
"created": 1712543069,
"model": "anthropic.claude-3-sonnet-20240229-v1:0",
"system_fingerprint": "fp",
"choices": [
{
"index": 0,
"finish_reason": "stop",
"message": {
"role": "assistant",
"content": "The image contains the following objects:\n\n1. A peach-colored short-sleeve button-up shirt\n2. An olive green plaid long coat/jacket\n3. A pair of white sneakers or canvas shoes\n4. A brown shoulder bag or purse\n5. A makeup brush or cosmetic applicator\n6. A tube or container (possibly lipstick or lip balm)\n7. A pair of sunglasses\n8. A thought bubble icon\n9. A footprint icon\n10. A leaf or plant icon\n11. A flower icon\n12. A cloud icon\n\nIn total, there are 12 distinct objects depicted in the illustrated scene."
}
}
],
"object": "chat.completion",
"usage": {
"prompt_tokens": 197,
"completion_tokens": 147,
"total_tokens": 344
}
}
```
## Tool Call
**重要**:在使用此代理API进行Tool Call之前,请仔细阅读以下几点:
1. OpenAI 已经废弃使用Function Call,而推荐使用Tool Call,因此Function Call在此处不受支持,您应该改为Tool Call。
**Request 示例**
```bash
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-3.5-turbo",
"messages": [
{
"role": "user",
"content": "What is the weather like in Shanghai today?"
}
],
"tools": [
{
"type": "function",
"function": {
"name": "get_current_weather",
"description": "Get the current weather in a given location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city or state which is required."
},
"unit": {
"type": "string",
"enum": [
"celsius",
"fahrenheit"
]
}
},
"required": [
"location"
]
}
}
},
{
"type": "function",
"function": {
"name": "get_current_location",
"description": "Use this tool to get the current location if user does not provide a location",
"parameters": {
"type": "object",
"properties": {}
}
}
}
],
"tool_choice": "auto"
}'
```
**Response 示例**
```json
{
"id": "msg_01PjrKDWhYGsrTNdeqzWd6D9",
"created": 1712543689,
"model": "anthropic.claude-3-sonnet-20240229-v1:0",
"system_fingerprint": "fp",
"choices": [
{
"index": 0,
"finish_reason": "stop",
"message": {
"role": "assistant",
"tool_calls": [
{
"id": "0",
"type": "function",
"function": {
"name": "get_current_weather",
"arguments": "{\"location\": \"Shanghai\", \"unit\": \"celsius\"}"
}
}
]
}
}
],
"object": "chat.completion",
"usage": {
"prompt_tokens": 256,
"completion_tokens": 64,
"total_tokens": 320
}
}
```
You can try it with different questions, such as:
1. Hello, who are you? (No tools are needed)
2. What is the weather like today? (Should use get_current_location tool first)
## Reasoning
**重要**: 使用此 reasoning 推理模式前,请仔细阅读以下要点。
- 目前仅 Claude 3.7 Sonnet / Deepseek R1 模型支持推理功能。使用前请确保所用模型支持推理。
- Claude 3.7 Sonnet 推理模式(或思考模式)默认未启用,您必须在请求中传递额外的 reasoning_effort 参数,参数值可选:lowmedium, high。另外请在请求中提供正确的 max_tokens或 max_completion_tokens参数。budget_tokens 基于 reasoning_effort 设置30%60%100% 的max tokens确保最小 budget_tokens 为 1,024Anthropic 建议至少使用 4,000 个令牌以获得全面的推理。详情请参阅 [Bedrock Document](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-anthropic-claude-37.html)。
- Deepseek R1 会自动使用推理模式,不需要在中传递额外的 reasoning_effort 参数(否则会报错)
- 推理结果(思维链结果、思考过程)被添加到名为 'reasoning_content' 的额外标签中,这不是 OpenAI 官方支持的格式。此设计遵循 [Deepseek Reasoning Model](https://api-docs.deepseek.com/guides/reasoning_model#api-example) 的规范。未来可能会有所变动。
**Request 示例**
- Claude 3.7 Sonnet
```bash
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
"messages": [
{
"role": "user",
"content": "which one is bigger, 3.9 or 3.11?"
}
],
"max_completion_tokens": 4096,
"reasoning_effort": "low",
"stream": false
}'
```
- DeepSeek R1
```bash
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "us.deepseek.r1-v1:0",
"messages": [
{
"role": "user",
"content": "which one is bigger, 3.9 or 3.11?"
}
],
"stream": false
}'
```
**Response 示例**
```json
{
"id": "chatcmpl-83fb7a88",
"created": 1740545278,
"model": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
"system_fingerprint": "fp",
"choices": [
{
"index": 0,
"finish_reason": "stop",
"logprobs": null,
"message": {
"role": "assistant",
"content": "3.9 is bigger than 3.11.\n\nWhen comparing decimal numbers, we need to understand what these numbers actually represent:...",
"reasoning_content": "I need to compare the decimal numbers 3.9 and 3.11.\n\nFor decimal numbers, we first compare the whole number parts, and if they're equal, we compare the decimal parts. \n\nBoth numbers ..."
}
}
],
"object": "chat.completion",
"usage": {
"prompt_tokens": 51,
"completion_tokens": 565,
"total_tokens": 616
}
}
```
或者使用 OpenAI SDK (请先运行`pip3 install -U openai` 升级到最新版本)
- Non-Streaming
```python
from openai import OpenAI
client = OpenAI()
messages = [{"role": "user", "content": "which one is bigger, 3.9 or 3.11?"}]
response = client.chat.completions.create(
model="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
messages=messages,
reasoning_effort="low",
max_completion_tokens=4096,
)
reasoning_content = response.choices[0].message.reasoning_content
content = response.choices[0].message.content
```
- Streaming
```python
from openai import OpenAI
client = OpenAI()
messages = [{"role": "user", "content": "9.11 and 9.8, which is greater?"}]
response = client.chat.completions.create(
model="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
messages=messages,
reasoning_effort="low",
max_completion_tokens=4096,
stream=True,
)
reasoning_content = ""
content = ""
for chunk in response:
if hasattr(chunk.choices[0].delta, 'reasoning_content') and chunk.choices[0].delta.reasoning_content:
reasoning_content += chunk.choices[0].delta.reasoning_content
elif chunk.choices[0].delta.content:
content += chunk.choices[0].delta.content
```
## Interleaved thinking (beta)
**重要提示**:在使用 Chat Completion API 的推理模式reasoning mode请务必仔细阅读以下内容。
Claude 4 模型支持借助工具使用的扩展思维功能Extended Thinking其中包含交错思考[interleaved thinking](https://docs.aws.amazon.com/bedrock/latest/userguide/claude-messages-extended-thinking.html#claude-messages-extended-thinking-tool-use-interleaved) )。该功能使 Claude 4 可以在多次调用工具之间进行思考,并在收到工具结果后执行更复杂的推理,这对处理更复杂的 Agentic AI 交互非常有帮助。
在交错思考模式下budget_tokens 可以超过 max_tokens 参数,因为它代表一次助手回合中所有思考块的总 Token 预算。
**支持的模型**: Claude Sonnet 4, Claude Sonnet 4.5
**Request 示例**
- Non-Streaming (Claude Sonnet 4.5)
```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bedrock" \
-d '{
"model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
"max_tokens": 2048,
"messages": [{
"role": "user",
"content": "解释如何实现一个具有自平衡功能的二叉搜索树。"
}],
"extra_body": {
"anthropic_beta": ["interleaved-thinking-2025-05-14"],
"thinking": {"type": "enabled", "budget_tokens": 4096}
}
}'
```
- Non-Streaming (Claude Sonnet 4)
```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bedrock" \
-d '{
"model": "us.anthropic.claude-sonnet-4-20250514-v1:0",
"max_tokens": 2048,
"messages": [{
"role": "user",
"content": "有一天,一个女孩参加数学考试只得了 38 分。她心里对父亲的惩罚充满恐惧,于是偷偷把分数改成了 88 分。她的父亲看到试卷后,怒发冲冠,狠狠地给了她一巴掌,怒吼道:“你这 8 怎么一半是绿的一半是红的,你以为我是傻子吗?”女孩被打后,委屈地哭了起来,什么也没说。过了一会儿,父亲突然崩溃了。请问这位父亲为什么过一会崩溃了?"
}],
"extra_body": {
"anthropic_beta": ["interleaved-thinking-2025-05-14"],
"thinking": {"type": "enabled", "budget_tokens": 4096}
}
}'
```
- Streaming (Claude Sonnet 4.5)
```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bedrock" \
-d '{
"model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
"max_tokens": 2048,
"messages": [{
"role": "user",
"content": "解释如何实现一个具有自平衡功能的二叉搜索树。"
}],
"stream": true,
"extra_body": {
"anthropic_beta": ["interleaved-thinking-2025-05-14"],
"thinking": {"type": "enabled", "budget_tokens": 4096}
}
}'
```
- Streaming (Claude Sonnet 4)
```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bedrock" \
-d '{
"model": "us.anthropic.claude-sonnet-4-20250514-v1:0",
"max_tokens": 2048,
"messages": [{
"role": "user",
"content": "有一天,一个女孩参加数学考试只得了 38 分。她心里对父亲的惩罚充满恐惧,于是偷偷把分数改成了 88 分。她的父亲看到试卷后,怒发冲冠,狠狠地给了她一巴掌,怒吼道:“你这 8 怎么一半是绿的一半是红的,你以为我是傻子吗?”女孩被打后,委屈地哭了起来,什么也没说。过了一会儿,父亲突然崩溃了。请问这位父亲为什么过一会崩溃了?"
}],
"stream": true,
"extra_body": {
"anthropic_beta": ["interleaved-thinking-2025-05-14"],
"thinking": {"type": "enabled", "budget_tokens": 4096}
}
}'
```