diff --git a/README.md b/README.md
index 859f296..c63505a 100644
--- a/README.md
+++ b/README.md
@@ -28,6 +28,7 @@ If you find this GitHub repository useful, please consider giving it a free star
 - [x] Support Tool Call (**new**)
 - [x] Support Embedding API (**new**)
 - [x] Support Multimodal API (**new**)
+- [x] Support Cross-Region Inference (**new**)
 
 Please check [Usage Guide](./docs/Usage.md) for more details about how to use the new APIs.
 
@@ -35,7 +36,7 @@ Please check [Usage Guide](./docs/Usage.md) for more details about how to use th
 
 Supported Amazon Bedrock models family:
 
-- Anthropic Claude 2 / 3 (Haiku/Sonnet/Opus)
+- Anthropic Claude 2 / 3 (Haiku/Sonnet/Opus) / 3.5 Sonnet
 - Meta Llama 2 / 3
 - Mistral / Mixtral
 - Cohere Command R / R+
@@ -153,6 +154,51 @@ print(completion.choices[0].message.content)
 
 Please check [Usage Guide](./docs/Usage.md) for more details about how to use embedding API, multimodal API and tool call.
 
+### Bedrock Cross-Region Inference
+
+
+Cross-Region Inference provides cross-Region access to foundation models: a request sent to one AWS Region can be served by the same model in other Regions covered by an inference profile. Main advantages:
+- **Improved Availability**: Provides regional redundancy and better fault tolerance. When the source Region is constrained or degraded, Bedrock can route requests to another Region in the profile, helping keep the service available.
+- **Reduced Latency**: Routing requests to a Region with spare capacity can reduce throttling and queueing delays at peak load, although cross-Region routing may add some network latency.
+- **Better Performance and Capacity**: Distributes requests across Regions, providing higher throughput and better handling of traffic spikes.
+- **Flexibility**: Geography-specific profiles (such as `us.` and `eu.`) keep requests within the Regions of one geography, which can help meet regional compliance requirements.
+- **Cost Benefits**: Bedrock charges no additional routing fee for cross-region inference; requests are billed at the price of the Region you call from.
+
+
+Please check [Bedrock Cross-Region Inference](https://docs.aws.amazon.com/bedrock/latest/userguide/cross-region-inference.html) for more details.
+
+**Limitations:**
+Currently, Bedrock Access Gateway supports cross-region inference only for the following models:
+- Claude 3 Haiku
+- Claude 3 Opus
+- Claude 3 Sonnet
+- Claude 3.5 Sonnet and Claude 3.5 Sonnet v2 (v2 through the `us.` profile only)
+
+**Prerequisites:**
+- IAM policies must allow cross-region access: callers need permission to invoke both the inference profile and the underlying model in every Region the profile covers (these permissions are already added to the CloudFormation templates).
+- Model access must be enabled in every Region defined in the inference profile.
+
+**Example API Usage:**
+- To use Bedrock cross-region inference, set the `model` field of the request to an inference profile ID instead of a plain model ID, for example `us.anthropic.claude-3-5-sonnet-20240620-v1:0`.
+
+```bash
+curl $OPENAI_BASE_URL/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer $OPENAI_API_KEY" \
+  -d '{
+    "model": "us.anthropic.claude-3-5-sonnet-20240620-v1:0",
+    "max_tokens": 2048,
+    "messages": [
+      {
+        "role": "user",
+        "content": "Hello!"
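+      }
+    ]
+  }'
+```
+
+
+
 ## Other Examples
 
 ### AutoGen
diff --git a/README_CN.md b/README_CN.md
index 4556304..c093a86 100644
--- a/README_CN.md
+++ b/README_CN.md
@@ -29,6 +29,7 @@ OpenAI's APIs or SDKs to seamlessly integrate and try out Amazon Bedrock models, without needing
 - [x] Support Tool Call (**new**)
 - [x] Support Embedding API (**new**)
 - [x] Support Multimodal API (**new**)
+- [x] Support Cross-Region Inference (**new**)
 
 Please see the [Usage Guide](./docs/Usage_CN.md) for more details on how to use the new APIs.
 
@@ -36,7 +37,7 @@ OpenAI's APIs or SDKs to seamlessly integrate and try out Amazon Bedrock models, without needing
 
 Supported Amazon Bedrock model families:
 
-- Anthropic Claude 2 / 3 (Haiku/Sonnet/Opus)
+- Anthropic Claude 2 / 3 (Haiku/Sonnet/Opus) / 3.5 Sonnet
 - Meta Llama 2 / 3
 - Mistral / Mixtral
 - Cohere Command R / R+
@@ -157,6 +158,48 @@ print(completion.choices[0].message.content)
 
 Please see the [Usage Guide](./docs/Usage_CN.md) for more details on how to use the Embedding API, multimodal API, and Tool Call.
 
+### Bedrock Cross-Region Inference
+
+Cross-Region Inference supports cross-Region access to foundation models: a request made in one AWS Region can be served by the model in other Regions covered by the inference profile. Main advantages:
+- **Improved Availability**: Provides regional redundancy and stronger fault tolerance. When the source Region has problems, requests can be routed to another Region in the profile, keeping the service available and the business running
+- **Reduced Latency**: Routing requests to a Region with spare capacity can reduce throttling and queueing delays at peak load, although cross-Region routing may add some network latency
+- **Performance and Capacity Optimization**: Spreads requests across Regions, providing more capacity and throughput and better handling of traffic peaks
+- **Flexibility**: Geography-specific profiles (such as `us.` and `eu.`) keep requests within one geography, which can help meet regional compliance requirements
+- **Cost Benefits**: There is no additional routing fee; requests are billed at the price of the Region you call from
+
+For a detailed introduction, see [Bedrock Cross-Region Inference](https://docs.aws.amazon.com/bedrock/latest/userguide/cross-region-inference.html)
+
+**Limitations:**
+Currently, the gateway supports cross-region inference only for the following models:
+- Claude 3 Haiku
+- Claude 3 Opus
+- Claude 3 Sonnet
+- Claude 3.5 Sonnet and Claude 3.5 Sonnet v2 (v2 through the `us.` profile only)
+
+**Prerequisites:**
+- The IAM policy must include inference-profile permissions as well as permission to invoke the models (already added in the CloudFormation templates)
+- Model access must be enabled for the model in every Region defined in the inference profile
+
+**Usage:**
+- When calling the gateway, set the `model` field of the request to the inference profile ID, for example `us.anthropic.claude-3-5-sonnet-20240620-v1:0`
+
+```bash
+curl $OPENAI_BASE_URL/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer $OPENAI_API_KEY" \
+  -d '{
+    "model": "us.anthropic.claude-3-5-sonnet-20240620-v1:0",
+    "max_tokens": 2048,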
+    "messages": [
+      {
+        "role": "user",
+        "content": "Hello!"
+      }
+    ]
+  }'
+```
+
+
 ## Other Examples
 
 ### AutoGen
diff --git a/deployment/BedrockProxy.template b/deployment/BedrockProxy.template
index 776ac9b..05eb8fc 100644
--- a/deployment/BedrockProxy.template
+++ b/deployment/BedrockProxy.template
@@ -265,10 +265,15 @@
         {
           "Action": [
             "bedrock:InvokeModel",
-            "bedrock:InvokeModelWithResponseStream"
+            "bedrock:InvokeModelWithResponseStream",
+            "bedrock:GetInferenceProfile",
+            "bedrock:ListInferenceProfiles"
           ],
           "Effect": "Allow",
-          "Resource": "arn:aws:bedrock:*::foundation-model/*"
+          "Resource": [
+            "arn:aws:bedrock:*::foundation-model/*",
+            "arn:aws:bedrock:*:*:inference-profile/*"
+          ]
         },
         {
           "Action": [
diff --git a/deployment/BedrockProxyFargate.template b/deployment/BedrockProxyFargate.template
index b6c5e9e..a17f85d 100644
--- a/deployment/BedrockProxyFargate.template
+++ b/deployment/BedrockProxyFargate.template
@@ -327,10 +327,15 @@
         {
           "Action": [
             "bedrock:InvokeModel",
-            "bedrock:InvokeModelWithResponseStream"
+            "bedrock:InvokeModelWithResponseStream",
+            "bedrock:GetInferenceProfile",
+            "bedrock:ListInferenceProfiles"
           ],
           "Effect": "Allow",
-          "Resource": "arn:aws:bedrock:*::foundation-model/*"
+          "Resource": [
+            "arn:aws:bedrock:*::foundation-model/*",
+            "arn:aws:bedrock:*:*:inference-profile/*"
+          ]
         },
         {
           "Action": [
diff --git a/src/api/models/bedrock.py b/src/api/models/bedrock.py
index 231c767..a546275 100644
--- a/src/api/models/bedrock.py
+++ b/src/api/models/bedrock.py
@@ -35,6 +35,8 @@ from api.schema import (
     EmbeddingsResponse,
     EmbeddingsUsage,
     Embedding,
+
+
 )
 
 from api.setting import DEBUG, AWS_REGION
@@ -197,6 +199,59 @@ class BedrockModel(BaseChatModel):
             "tool_call": True,
             "stream_tool_call": False,
         },
+        # claude 3 Haiku cross-region inference profile
+        "us.anthropic.claude-3-haiku-20240307-v1:0": {
+            "system": True,
+            "multimodal": True,
+            "tool_call": True,
+            "stream_tool_call": True,
+        },
+        "eu.anthropic.claude-3-haiku-20240307-v1:0": {
+            "system": True,
+            "multimodal": True,
+            "tool_call": True,
+            "stream_tool_call": True,
+        },
+        # claude 3 Opus cross-region inference profile
+        "us.anthropic.claude-3-opus-20240229-v1:0": {
+            "system": True,
+            "multimodal": True,
+            "tool_call": True,
+            "stream_tool_call": True,
+        },
+        # claude 3 Sonnet cross-region inference profile
+        "us.anthropic.claude-3-sonnet-20240229-v1:0": {
+            "system": True,
+            "multimodal": True,
+            "tool_call": True,
+            "stream_tool_call": True,
+        },
+        "eu.anthropic.claude-3-sonnet-20240229-v1:0": {
+            "system": True,
+            "multimodal": True,
+            "tool_call": True,
+            "stream_tool_call": True,
+        },
+        # claude 3.5 Sonnet cross-region inference profile
+        "us.anthropic.claude-3-5-sonnet-20240620-v1:0": {
+            "system": True,
+            "multimodal": True,
+            "tool_call": True,
+            "stream_tool_call": True,
+        },
+        "eu.anthropic.claude-3-5-sonnet-20240620-v1:0": {
+            "system": True,
+            "multimodal": True,
+            "tool_call": True,
+            "stream_tool_call": True,
+        },
+        # claude 3.5 Sonnet v2 cross-region inference profile (currently us-west-2 only)
+        "us.anthropic.claude-3-5-sonnet-20241022-v2:0": {
+            "system": True,
+            "multimodal": True,
+            "tool_call": True,
+            "stream_tool_call": True,
+        },
     }
 
     def list_models(self) -> list[str]:
diff --git a/src/requirements.txt b/src/requirements.txt
index 3d4663e..be7a2e7 100644
--- a/src/requirements.txt
+++ b/src/requirements.txt
@@ -5,5 +5,6 @@ mangum==0.17.0
 tiktoken==0.6.0
 requests==2.32.3
 numpy==1.26.4
-boto3==1.34.132
-botocore==1.34.132
\ No newline at end of file
+boto3==1.35.49
+botocore==1.35.49
+
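The model table in `bedrock.py` registers every inference profile as its own entry, and each profile ID is a foundation-model ID with a geography prefix (`us.` or `eu.` in this change). The Python sketch below illustrates that relationship by splitting a profile ID into its prefix and base model ID, then resolving capabilities from the base model. `split_inference_profile_id`, `lookup_features`, `GEO_PREFIXES`, and `BASE_MODEL_FEATURES` are hypothetical names that are not part of the gateway, and the `apac` prefix is an assumption beyond the prefixes used in this change.

```python
from typing import Optional, Tuple

# Assumption: geography prefixes used by Bedrock cross-region inference
# profile IDs. This change only uses "us" and "eu"; "apac" is an assumed
# extra entry and the set may need updating as AWS adds geographies.
GEO_PREFIXES = {"us", "eu", "apac"}


def split_inference_profile_id(model_id: str) -> Tuple[Optional[str], str]:
    """Split a model ID into (geography prefix, base foundation-model ID).

    Plain foundation-model IDs such as "anthropic.claude-3-haiku-20240307-v1:0"
    start with a provider name rather than a geography, so the prefix is None.
    """
    prefix, sep, rest = model_id.partition(".")
    if sep and prefix in GEO_PREFIXES:
        return prefix, rest
    return None, model_id


# Hypothetical capability table keyed by base model ID. The values are copied
# from the us./eu. Claude 3.5 Sonnet profile entries added in this change.
BASE_MODEL_FEATURES = {
    "anthropic.claude-3-5-sonnet-20240620-v1:0": {
        "system": True,
        "multimodal": True,
        "tool_call": True,
        "stream_tool_call": True,
    },
}


def lookup_features(model_id: str) -> Optional[dict]:
    """Return capabilities for a model or profile ID via its base model, or None."""
    _, base_id = split_inference_profile_id(model_id)
    return BASE_MODEL_FEATURES.get(base_id)


if __name__ == "__main__":
    print(split_inference_profile_id("us.anthropic.claude-3-5-sonnet-20240620-v1:0"))
    # ('us', 'anthropic.claude-3-5-sonnet-20240620-v1:0')
    print(lookup_features("eu.anthropic.claude-3-5-sonnet-20240620-v1:0")["tool_call"])
    # True
```

With a lookup like this, supporting a new geography would mean updating `GEO_PREFIXES` instead of duplicating each capability entry; the change as written keeps explicit per-profile entries, which is simpler to audit.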