Use readme

Aiden Dai
2024-12-16 17:11:54 +08:00
parent dc067affc0
commit 51bc727b38
5 changed files with 38 additions and 112 deletions


@@ -6,10 +6,26 @@ OpenAI-compatible RESTful APIs for Amazon Bedrock
## Breaking Changes
The source code has been refactored to use the new [Converse API](https://docs.aws.amazon.com/bedrock/latest/userguide/conversation-inference.html) provided by Bedrock, which offers native support for tool calls. The gateway can now automatically detect new models supported in Amazon Bedrock.
So whenever new models are added to Amazon Bedrock, you can try them immediately without waiting for code changes to this repo.
Model detection uses the Amazon Bedrock `ListFoundationModels` and `ListInferenceProfiles` APIs; because of this change, additional IAM permissions are required on your Lambda/Fargate role.
If you see the error 'Unsupported model xxx, please use models API to get a list of supported models' even though the model ID is correct,
please either update your existing stack with the new template in the deployment folder or manually add the permissions below to the related Lambda/Fargate role.
```json
{
    "Action": [
        "bedrock:ListFoundationModels",
        "bedrock:ListInferenceProfiles"
    ],
    "Resource": "*",
    "Effect": "Allow"
}
```
Please raise a GitHub issue if you still have problems.
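
A quick way to confirm the permissions are in place is to call the gateway's models API and check that the expected model IDs are returned. Below is a minimal sketch using the OpenAI Python SDK, assuming `OPENAI_API_KEY` and `OPENAI_BASE_URL` are already set in your environment.

```python
from openai import OpenAI

# The SDK picks up OPENAI_API_KEY and OPENAI_BASE_URL from the environment.
client = OpenAI()

# List the model IDs the gateway currently exposes; newly added Bedrock models
# should appear here automatically once the permissions above are granted.
for model in client.models.list():
    print(model.id)
```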
## Overview
@@ -160,65 +176,10 @@ print(completion.choices[0].message.content)
Please check the [Usage Guide](./docs/Usage.md) for more details about how to use the embedding API, multimodal API and tool calls.
### Bedrock Cross-Region Inference
Cross-Region Inference supports accessing foundation models across regions, allowing users to invoke models hosted in different AWS regions for inference. Main advantages:
- **Improved Availability**: Provides regional redundancy and enhanced fault tolerance. When issues occur in the primary region, services can failover to backup regions, ensuring continuous service availability and business continuity.
- **Reduced Latency**: Enables selection of regions geographically closest to users, optimizing network paths and reducing transmission time, resulting in better user experience and response times.
- **Better Performance and Capacity**: Implements load balancing to distribute request pressure, provides greater service capacity and throughput, and better handles traffic spikes.
- **Flexibility**: Allows selection of models from different regions based on requirements, meets specific regional compliance requirements, and enables more flexible resource allocation and management.
- **Cost Benefits**: Enables selection of more cost-effective regions, reduces overall operational costs through resource optimization, and improves resource utilization efficiency.
Please check [Bedrock Cross-Region Inference](https://docs.aws.amazon.com/bedrock/latest/userguide/cross-region-inference.html)
**Limitation:**
Currently, Bedrock Access Gateway only supports cross-region Inference for the following models:
- Claude 3 Haiku
- Claude 3 Opus
- Claude 3 Sonnet
- Claude 3.5 Sonnet
- Meta Llama 3.1 8B Instruct
- Meta Llama 3.1 70B Instruct
- Meta Llama 3.2 1B Instruct
- Meta Llama 3.2 3B Instruct
- Meta Llama 3.2 11B Vision Instruct
- Meta Llama 3.2 90B Vision Instruct
**Prerequisites:**
- IAM policies must allow cross-region access: callers need permissions to access models and inference profiles in both regions (already added in the CloudFormation template)
- Model access must be enabled in both regions defined in the inference profile
**Example API Usage:**
- To use Bedrock cross-region inference, specify the ID of an inference profile as the modelId when running model inference, such as `us.anthropic.claude-3-5-sonnet-20240620-v1:0`
```bash
curl $OPENAI_BASE_URL/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "us.anthropic.claude-3-5-sonnet-20240620-v1:0",
    "max_tokens": 2048,
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'
```
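
The same request can also be made with the OpenAI Python SDK. This is a minimal sketch, assuming the environment variables used in the curl example above are set; the inference profile ID is passed as the model name.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY and OPENAI_BASE_URL from the environment

# Use the inference profile ID as the model name to enable cross-region inference.
completion = client.chat.completions.create(
    model="us.anthropic.claude-3-5-sonnet-20240620-v1:0",
    max_tokens=2048,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)
```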
## Other Examples
### AutoGen
Below is an image of setting up the model in AutoGen Studio.
![AutoGen Model](assets/autogen-model.png)
### LangChain
Make sure you use `ChatOpenAI(...)` instead of `OpenAI(...)`
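
As a minimal sketch, you can point `ChatOpenAI` at the gateway like this; the base URL and API key below are placeholders for your own deployment values.

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="anthropic.claude-3-5-sonnet-20240620-v1:0",  # any model ID exposed by the gateway
    base_url="http://localhost:8000/api/v1",            # placeholder: your gateway endpoint
    api_key="bedrock",                                   # placeholder: your gateway API key
)

# ChatOpenAI speaks the OpenAI chat completions protocol, which the gateway implements.
print(llm.invoke("Hello!").content)
```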


@@ -6,9 +6,26 @@
## Breaking Changes
The project source code has been refactored with the new [Converse API](https://docs.aws.amazon.com/bedrock/latest/userguide/conversation-inference.html) provided by Bedrock, which offers native support for tool calls. The gateway can now automatically detect new models supported in Amazon Bedrock.
So when new models are added to Amazon Bedrock, you can try them immediately without waiting for updates to this codebase.
This is implemented with the Amazon Bedrock `ListFoundationModels` and `ListInferenceProfiles` APIs; because of this change, you need to add additional IAM permissions to your Lambda/Fargate role.
If you encounter the error "Unsupported model xxx, please use models API to get a list of supported models" even though the model ID is correct,
please either update your existing stack with the new template in the deployment folder or manually add the following permissions to the related Lambda/Fargate role.
```json
{
    "Action": [
        "bedrock:ListFoundationModels",
        "bedrock:ListInferenceProfiles"
    ],
    "Resource": "*",
    "Effect": "Allow"
}
```
Please raise a GitHub issue if you still have problems.
## Overview
@@ -158,62 +175,10 @@ print(completion.choices[0].message.content)
Please check the [Usage Guide](./docs/Usage_CN.md) for more details about how to use the Embedding API, multimodal API and tool calls.
### Bedrock Cross-Region Inference
Cross-Region Inference supports accessing foundation models across regions, allowing users to invoke foundation models hosted in other AWS regions for inference. Main advantages:
- **Improved availability**: Provides regional redundancy and stronger fault tolerance. When the primary region has issues, traffic can fail over to a backup region, ensuring continuous service availability and business continuity.
- **Reduced latency**: Lets you choose the region geographically closest to your users, optimizing network paths and reducing transmission time for a better user experience and faster responses.
- **Better performance and capacity**: Enables load balancing to spread request pressure, provides greater service capacity and throughput, and handles traffic spikes better.
- **Flexibility**: Choose models in different regions based on your requirements, meet region-specific compliance requirements, and allocate and manage resources more flexibly.
- **Cost benefits**: Choose more cost-effective regions and lower overall operating costs through better resource utilization.
For details, see [Bedrock Cross-Region Inference](https://docs.aws.amazon.com/bedrock/latest/userguide/cross-region-inference.html)
**Limitation:**
Currently, the Gateway only supports cross-region inference for the following models:
- Claude 3 Haiku
- Claude 3 Opus
- Claude 3 Sonnet
- Claude 3.5 Sonnet
- Meta Llama 3.1 8B Instruct
- Meta Llama 3.1 70B Instruct
- Meta Llama 3.2 1B Instruct
- Meta Llama 3.2 3B Instruct
- Meta Llama 3.2 11B Vision Instruct
- Meta Llama 3.2 90B Vision Instruct
**Prerequisites:**
- The IAM policy has the permissions for inference profiles and for invoking the models (already added in the CloudFormation template)
- Model access is enabled for the models and regions defined in the inference profile
**Example API Usage:**
- When invoking a model, set the modelId to the inference profile ID, e.g. `us.anthropic.claude-3-5-sonnet-20240620-v1:0`
```bash
curl $OPENAI_BASE_URL/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "us.anthropic.claude-3-5-sonnet-20240620-v1:0",
    "max_tokens": 2048,
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'
```
## Other Examples
### AutoGen
Below is an example of configuring and using the model in AutoGen Studio.
![AutoGen Model](assets/autogen-model.png)
### LangChain
Make sure you use `ChatOpenAI(...)` instead of `OpenAI(...)`

Two binary image files removed (209 KiB and 212 KiB); contents not shown.


@@ -8,6 +8,6 @@ RUN pip install --no-cache-dir --upgrade -r /app/requirements.txt
COPY ./api /app/api
ENV PORT=80
CMD ["sh", "-c", "uvicorn api.app:app --host 0.0.0.0 --port ${PORT}"] CMD ["sh", "-c", "uvicorn api.app:app --host 0.0.0.0 --port ${PORT}"]