Support Cross-Region Inference for all Claude models (#65)
48
README.md
@@ -28,6 +28,7 @@ If you find this GitHub repository useful, please consider giving it a free star
 - [x] Support Tool Call (**new**)
 - [x] Support Embedding API (**new**)
 - [x] Support Multimodal API (**new**)
+- [x] Support Cross-Region Inference (**new**)

 Please check [Usage Guide](./docs/Usage.md) for more details about how to use the new APIs.

@@ -35,7 +36,7 @@ Please check [Usage Guide](./docs/Usage.md) for more details about how to use th

 Supported Amazon Bedrock models family:

-- Anthropic Claude 2 / 3 (Haiku/Sonnet/Opus)
+- Anthropic Claude 2 / 3 (Haiku/Sonnet/Opus) / 3.5 Sonnet
 - Meta Llama 2 / 3
 - Mistral / Mixtral
 - Cohere Command R / R+
@@ -153,6 +154,51 @@ print(completion.choices[0].message.content)

 Please check [Usage Guide](./docs/Usage.md) for more details about how to use embedding API, multimodal API and tool call.

+### Bedrock Cross-Region Inference
+
+
+Cross-Region Inference supports accessing foundation models across regions, allowing users to invoke models hosted in different AWS regions for inference. Main advantages:
+- **Improved Availability**: Provides regional redundancy and enhanced fault tolerance. When issues occur in the primary region, services can fail over to backup regions, ensuring continuous service availability and business continuity.
+- **Reduced Latency**: Enables selection of regions geographically closest to users, optimizing network paths and reducing transmission time, resulting in better user experience and response times.
+- **Better Performance and Capacity**: Implements load balancing to distribute request pressure, provides greater service capacity and throughput, and better handles traffic spikes.
+- **Flexibility**: Allows selection of models from different regions based on requirements, meets specific regional compliance requirements, and enables more flexible resource allocation and management.
+- **Cost Benefits**: Enables selection of more cost-effective regions, reduces overall operational costs through resource optimization, and improves resource utilization efficiency.
+
+
+Please check [Bedrock Cross-Region Inference](https://docs.aws.amazon.com/bedrock/latest/userguide/cross-region-inference.html) for details.
+
+**Limitations:**
+Currently, Bedrock Access Gateway only supports cross-region inference for the following models:
+- Claude 3 Haiku
+- Claude 3 Opus
+- Claude 3 Sonnet
+- Claude 3.5 Sonnet
+
+**Prerequisites:**
+- IAM policies must allow cross-region access. Callers need permissions to access models and inference profiles in both regions (already added in the CloudFormation template)
+- Model access must be enabled in both regions, as defined in the inference profiles
+
+**Example API Usage:**
+- To use Bedrock cross-region inference, specify the ID of an inference profile as the modelId when running model inference, such as `us.anthropic.claude-3-5-sonnet-20240620-v1:0`
+
+```bash
+curl $OPENAI_BASE_URL/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer $OPENAI_API_KEY" \
+  -d '{
+    "model": "us.anthropic.claude-3-5-sonnet-20240620-v1:0",
+    "max_tokens": 2048,
+    "messages": [
+      {
+        "role": "user",
+        "content": "Hello!"
+      }
+    ]
+  }'
+```
+
+
+
 ## Other Examples

 ### AutoGen
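Reviewer note: the README section above relies on a naming convention in which an inference profile ID is the foundation model ID with a geography prefix (`us.` or `eu.` in this commit). The sketch below illustrates that convention and builds the same request body as the curl example. The names `KNOWN_PREFIXES`, `split_inference_profile_id`, and `build_chat_request` are hypothetical and not part of the gateway; the prefix set is an assumption based only on the profiles added here.

```python
from typing import Optional

# Hypothetical helper names; nothing here is part of the gateway code.
# Assumption: only the "us" and "eu" prefixes added in this commit are recognised.
KNOWN_PREFIXES = {"us", "eu"}


def split_inference_profile_id(model_id: str) -> tuple[Optional[str], str]:
    """Return (geography prefix, base model ID); prefix is None for a plain model ID."""
    prefix, sep, rest = model_id.partition(".")
    if sep and prefix in KNOWN_PREFIXES:
        return prefix, rest
    return None, model_id


def build_chat_request(model_id: str, prompt: str, max_tokens: int = 2048) -> dict:
    """Build the same JSON body that the curl example sends."""
    return {
        "model": model_id,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
```

Passing the profile ID rather than the plain model ID is the only change a client needs; the rest of the request is identical to a single-region call.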
45
README_CN.md
@@ -29,6 +29,7 @@ integrate seamlessly with OpenAI's API or SDK and try out Amazon Bedrock models, without needing to
 - [x] Support Tool Call (**new**)
 - [x] Support Embedding API (**new**)
 - [x] Support Multimodal API (**new**)
+- [x] Support Cross-Region Inference (**new**)

 Please see the [Usage Guide](./docs/Usage_CN.md) for more details on how to use the new APIs.

@@ -36,7 +37,7 @@ integrate seamlessly with OpenAI's API or SDK and try out Amazon Bedrock models, without needing to

 Supported Amazon Bedrock model families:

-- Anthropic Claude 2 / 3 (Haiku/Sonnet/Opus)
+- Anthropic Claude 2 / 3 (Haiku/Sonnet/Opus) / 3.5 Sonnet
 - Meta Llama 2 / 3
 - Mistral / Mixtral
 - Cohere Command R / R+
@@ -157,6 +158,48 @@ print(completion.choices[0].message.content)

 Please see the [Usage Guide](./docs/Usage_CN.md) for more details on how to use the Embedding API, the multimodal API, and Tool Call.

+### Bedrock Cross-Region Inference
+
+Cross-Region Inference supports accessing foundation models across regions; that is, it allows users in one AWS region to invoke foundation models in other regions for inference. Main advantages:
+- **Improved Availability**: Provides regional redundancy and improves fault tolerance. When the primary region has problems, traffic can switch to a backup region, ensuring continuous service availability and business continuity
+- **Reduced Latency**: The region geographically closest to users can be selected, optimizing network paths and reducing transmission time for a better user experience and faster responses
+- **Performance and Capacity Optimization**: Load balancing spreads request pressure, provides greater service capacity and throughput, and handles traffic peaks better
+- **Flexibility**: Models in different regions can be selected as needed, region-specific compliance requirements can be met, and resources can be allocated and managed more flexibly
+- **Cost Benefits**: More cost-effective regions can be selected, overall operating costs fall through optimized resource use, and resources are utilized more efficiently
+
+For details, see [Bedrock Cross-Region Inference](https://docs.aws.amazon.com/bedrock/latest/userguide/cross-region-inference.html)
+
+**Limitations:**
+Currently, the Gateway only supports cross-region invocation for Claude 3 Haiku / Claude 3 Opus / Claude 3 Sonnet / Claude 3.5 Sonnet
+- Claude 3 Haiku
+- Claude 3 Opus
+- Claude 3 Sonnet
+- Claude 3.5 Sonnet
+
+**Prerequisites:**
+- The IAM policy grants the inference profile permissions and the model invocation permissions (already added in the CloudFormation template)
+- Model access is enabled for the models and in the regions defined in the inference profiles
+
+**Usage:**
+- When invoking a model, set modelId to the inference profile ID, for example `us.anthropic.claude-3-5-sonnet-20240620-v1:0`
+
+```bash
+curl $OPENAI_BASE_URL/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer $OPENAI_API_KEY" \
+  -d '{
+    "model": "us.anthropic.claude-3-5-sonnet-20240620-v1:0",
+    "max_tokens": 2048,
+    "messages": [
+      {
+        "role": "user",
+        "content": "Hello!"
+      }
+    ]
+  }'
+```
+
+
 ## Other Examples

 ### AutoGen
@@ -265,10 +265,15 @@
 {
   "Action": [
     "bedrock:InvokeModel",
-    "bedrock:InvokeModelWithResponseStream"
+    "bedrock:InvokeModelWithResponseStream",
+    "bedrock:GetInferenceProfile",
+    "bedrock:ListInferenceProfiles"
   ],
   "Effect": "Allow",
-  "Resource": "arn:aws:bedrock:*::foundation-model/*"
+  "Resource": [
+    "arn:aws:bedrock:*::foundation-model/*",
+    "arn:aws:bedrock:*:*:inference-profile/*"
+  ]
 },
 {
   "Action": [
@@ -327,10 +327,15 @@
 {
   "Action": [
     "bedrock:InvokeModel",
-    "bedrock:InvokeModelWithResponseStream"
+    "bedrock:InvokeModelWithResponseStream",
+    "bedrock:GetInferenceProfile",
+    "bedrock:ListInferenceProfiles"
   ],
   "Effect": "Allow",
-  "Resource": "arn:aws:bedrock:*::foundation-model/*"
+  "Resource": [
+    "arn:aws:bedrock:*::foundation-model/*",
+    "arn:aws:bedrock:*:*:inference-profile/*"
+  ]
 },
 {
   "Action": [
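Reviewer note: the two policy hunks above add a second resource pattern because an inference profile ARN carries an account ID and an `inference-profile/` path, so the original `foundation-model/*` pattern does not cover it. The sketch below checks this with shell-style globbing, which is only an approximation of IAM wildcard matching (an assumption). The example ARNs are illustrative and the account ID `123456789012` is a placeholder; `is_covered` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# Resource patterns from the updated policy statement in this commit.
RESOURCES = [
    "arn:aws:bedrock:*::foundation-model/*",
    "arn:aws:bedrock:*:*:inference-profile/*",
]


def is_covered(arn: str, patterns: list[str] = RESOURCES) -> bool:
    """Approximate IAM wildcard matching with shell-style globbing (an assumption)."""
    return any(fnmatchcase(arn, pattern) for pattern in patterns)


# Illustrative ARNs; the account ID is a placeholder.
model_arn = (
    "arn:aws:bedrock:us-west-2::foundation-model/"
    "anthropic.claude-3-5-sonnet-20240620-v1:0"
)
profile_arn = (
    "arn:aws:bedrock:us-east-1:123456789012:inference-profile/"
    "us.anthropic.claude-3-5-sonnet-20240620-v1:0"
)
```

Calling `is_covered(profile_arn, RESOURCES[:1])` evaluates the profile ARN against the pre-change policy, which contained only the first pattern.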
@@ -35,6 +35,8 @@ from api.schema import (
     EmbeddingsResponse,
     EmbeddingsUsage,
     Embedding,
+
+
 )
 from api.setting import DEBUG, AWS_REGION

@@ -197,6 +199,59 @@ class BedrockModel(BaseChatModel):
             "tool_call": True,
             "stream_tool_call": False,
         },
+        # claude 3 Haiku cross-region inference profile
+        "us.anthropic.claude-3-haiku-20240307-v1:0": {
+            "system": True,
+            "multimodal": True,
+            "tool_call": True,
+            "stream_tool_call": True,
+        },
+        "eu.anthropic.claude-3-haiku-20240307-v1:0": {
+            "system": True,
+            "multimodal": True,
+            "tool_call": True,
+            "stream_tool_call": True,
+        },
+        # claude 3 Opus cross-region inference profile
+        "us.anthropic.claude-3-opus-20240229-v1:0": {
+            "system": True,
+            "multimodal": True,
+            "tool_call": True,
+            "stream_tool_call": True,
+        },
+        # claude 3 Sonnet cross-region inference profile
+        "us.anthropic.claude-3-sonnet-20240229-v1:0": {
+            "system": True,
+            "multimodal": True,
+            "tool_call": True,
+            "stream_tool_call": True,
+        },
+        "eu.anthropic.claude-3-sonnet-20240229-v1:0": {
+            "system": True,
+            "multimodal": True,
+            "tool_call": True,
+            "stream_tool_call": True,
+        },
+        # claude 3.5 Sonnet cross-region inference profile
+        "us.anthropic.claude-3-5-sonnet-20240620-v1:0": {
+            "system": True,
+            "multimodal": True,
+            "tool_call": True,
+            "stream_tool_call": True,
+        },
+        "eu.anthropic.claude-3-5-sonnet-20240620-v1:0": {
+            "system": True,
+            "multimodal": True,
+            "tool_call": True,
+            "stream_tool_call": True,
+        },
+        # claude 3.5 Sonnet v2 cross-region inference profile(Now only us-west-2)
+        "us.anthropic.claude-3-5-sonnet-20241022-v2:0": {
+            "system": True,
+            "multimodal": True,
+            "tool_call": True,
+            "stream_tool_call": True,
+        },
     }

     def list_models(self) -> list[str]:
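Reviewer note: all eight entries added in the hunk above share the same four feature flags and differ only in geography prefix and model ID. The sketch below shows one possible way to derive the same keys from a compact table. It is not what the commit does, and the names `CLAUDE_3_FEATURES`, `CROSS_REGION_PROFILES`, and `build_cross_region_entries` are hypothetical; the prefix lists are copied from the diff.

```python
# Feature flags shared by every cross-region entry added in this commit.
CLAUDE_3_FEATURES = {
    "system": True,
    "multimodal": True,
    "tool_call": True,
    "stream_tool_call": True,
}

# Base model ID -> geography prefixes, copied from the entries in the diff.
CROSS_REGION_PROFILES = {
    "anthropic.claude-3-haiku-20240307-v1:0": ("us", "eu"),
    "anthropic.claude-3-opus-20240229-v1:0": ("us",),
    "anthropic.claude-3-sonnet-20240229-v1:0": ("us", "eu"),
    "anthropic.claude-3-5-sonnet-20240620-v1:0": ("us", "eu"),
    "anthropic.claude-3-5-sonnet-20241022-v2:0": ("us",),
}


def build_cross_region_entries() -> dict[str, dict[str, bool]]:
    """Expand the compact table into one entry per inference profile ID."""
    return {
        f"{prefix}.{model_id}": dict(CLAUDE_3_FEATURES)  # copy so entries stay independent
        for model_id, prefixes in CROSS_REGION_PROFILES.items()
        for prefix in prefixes
    }
```

The result could be merged into the existing model table with `dict.update`; whether the reduced repetition is worth the indirection is a judgment call for the maintainers.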
@@ -5,5 +5,6 @@ mangum==0.17.0
 tiktoken==0.6.0
 requests==2.32.3
 numpy==1.26.4
-boto3==1.34.132
-botocore==1.34.132
+boto3==1.35.49
+botocore==1.35.49
+