Compare commits

1 Commits
main ... dev

Author SHA1 Message Date
yike5460
5c5e370a81 test: folder with error file for code review and pr description 2024-10-09 08:20:13 +00:00
34 changed files with 2765 additions and 2927 deletions

19
.flake8 Normal file

@@ -0,0 +1,19 @@
[flake8]
max-line-length = 120
ignore =
    E203,W191,W503
exclude =
    build
    .git
    __pycache__
    .tox
    venv
    .venv
    .venv-test
    tmp*
    deployment
    cdk.out
    node_modules
max-complexity = 10
require-code = True

74
.github/aws-genai-cicd-suite.yml vendored Normal file

@@ -0,0 +1,74 @@
name: Intelligent Code Review

# Enable manual trigger
on:
  workflow_dispatch:
  pull_request:
    types: [opened, synchronize]

# Avoid running the same workflow on the same branch concurrently
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}

jobs:
  review:
    runs-on: ubuntu-latest
    permissions:
      # required to request the OIDC token used by configure-aws-credentials
      id-token: write
      # allow the github-actions bot to push new content into existing pull requests
      contents: write
      # read repository contents and write pull request comments
      pull-requests: write
    steps:
      - name: Checkout code
        uses: actions/checkout@v3
      - name: Set up Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '20'
      - name: Install dependencies
        run: npm ci
        shell: bash
      # check that the required dependencies @actions/core and @actions/github are installed
      - name: Check if required dependencies are installed
        run: |
          npm list @actions/core
          npm list @actions/github
        shell: bash
      - name: Debug GitHub Token
        run: |
          if [ -n "${{ secrets.GITHUB_TOKEN }}" ]; then
            echo "GitHub Token is set"
          else
            echo "GitHub Token is not set"
          fi
      # assume the specified IAM role and set up AWS credentials for subsequent steps
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          # the role ARN comes from a repository secret
          role-to-assume: ${{ secrets.AWS_ROLE_TO_ASSUME }}
          aws-region: us-east-1
      - name: Intelligent GitHub Actions
        uses: aws-samples/aws-genai-cicd-suite@stable
        with:
          # GITHUB_TOKEN is automatically created and provided by GitHub for each
          # workflow run; you don't need to create or store it as a secret manually.
          github-token: ${{ secrets.GITHUB_TOKEN }}
          aws-region: us-east-1
          model-id: anthropic.claude-3-sonnet-20240229-v1:0
          generate-code-review: 'true'
          generate-code-review-level: 'detailed'
          generate-code-review-exclude-files: '*.md,*.json,*.js'
          generate-pr-description: 'true'
          generate-unit-test: 'false'
          generate-unit-test-source-folder: 'debugging'
          # removed the invalid input 'generate-unit-test-exclude-files'
          # output-language: 'zh'
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

1
.gitignore vendored

@@ -160,4 +160,3 @@ cython_debug/
.idea/
Config
.vscode/launch.json


@@ -1,10 +0,0 @@
repos:
- repo: https://github.com/astral-sh/ruff-pre-commit
# Ruff version.
rev: v0.9.10
hooks:
# Run the linter.
- id: ruff
types_or: [python, pyi]
# Run the formatter.
- id: ruff-format

309
README.md

@@ -1,19 +1,15 @@
[中文](./README_CN.md)
# Bedrock Access Gateway
OpenAI-compatible RESTful APIs for Amazon Bedrock
## What's New 🔥
**API Gateway Response Streaming Support** - You can now deploy with Amazon API Gateway REST API instead of ALB, enabling true response streaming for better latency and cost optimization. See [Deployment Options](#deployment-options) for details.
**Latest Models Supported:**
- **Claude 4.5 Family**: Opus 4.5, Sonnet 4.5, Haiku 4.5 - Anthropic's most intelligent models with enhanced coding and agent capabilities
- **Amazon Nova**: Nova Micro, Nova Lite, Nova Pro, Nova Premier - Amazon's native foundation models with multimodal support
- **DeepSeek**: DeepSeek-R1 (reasoning), DeepSeek-V3.1 - advanced reasoning and general-purpose models
- **Qwen 3**: Qwen3-32B, Qwen3-235B, Qwen3-Coder-30B, Qwen3-Coder-480B - Alibaba's latest language and coding models
- **OpenAI OSS**: gpt-oss-20b, gpt-oss-120b - open-source GPT models available via Bedrock
It also supports reasoning for **Claude 4/4.5** (extended thinking and interleaved thinking) and **DeepSeek R1**. Check [How to Use](./docs/Usage.md#reasoning) for more details.
## Breaking Changes
The source code has been refactored with the new [Converse API](https://docs.aws.amazon.com/bedrock/latest/userguide/conversation-inference.html) from Bedrock, which provides native support for tool calls.
If you are facing any problems, please raise an issue.
You need to first run the Models API to refresh the model list.
## Overview
@@ -29,17 +25,25 @@ If you find this GitHub repository useful, please consider giving it a free star
- [x] Support streaming response via server-sent events (SSE)
- [x] Support Model APIs
- [x] Support Chat Completion APIs
- [x] Support Tool Call
- [x] Support Tool Call (**new**)
- [x] Support Embedding API
- [x] Support Embedding API (**new**)
- [x] Support Multimodal API
- [x] Support Multimodal API (**new**)
- [x] Support Cross-Region Inference
- [x] Support Application Inference Profiles (**new**)
- [x] Support Reasoning (**new**)
- [x] Support Interleaved thinking (**new**)
- [x] Support Prompt Caching (**new**)
Please check [Usage Guide](./docs/Usage.md) for more details about how to use the new APIs.
> **Note:** The legacy [text completion](https://platform.openai.com/docs/api-reference/completions) API is not supported, you should change to use chat completion API.
Supported Amazon Bedrock models family:
- Anthropic Claude 2 / 3 (Haiku/Sonnet/Opus)
- Meta Llama 2 / 3
- Mistral / Mixtral
- Cohere Command R / R+
- Cohere Embedding
You can call the `models` API to get the full list of model IDs supported.
> **Note:** The default model is set to `anthropic.claude-3-sonnet-20240229-v1:0` which can be changed via Lambda environment variables (`DEFAULT_MODEL`).
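As a concrete illustration of the Models API mentioned above, the sketch below lists the model IDs exposed by a deployed gateway. It assumes the `openai` Python SDK and that `OPENAI_API_KEY`/`OPENAI_BASE_URL` point at your deployment; the helper names `models_endpoint` and `list_model_ids` are illustrative, not part of the project.

```python
import os


def models_endpoint(base_url: str) -> str:
    """Build the Models API URL from the gateway base URL."""
    return base_url.rstrip("/") + "/models"


def list_model_ids() -> list:
    """Query the gateway for its supported model IDs (requires a deployment)."""
    from openai import OpenAI  # pip install openai

    client = OpenAI(
        api_key=os.environ["OPENAI_API_KEY"],
        base_url=os.environ["OPENAI_BASE_URL"],
    )
    return [model.id for model in client.models.list().data]
```

Against a live deployment, `list_model_ids()` returns IDs such as `anthropic.claude-3-sonnet-20240229-v1:0`, any of which can be passed as the `model` in subsequent requests.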
## Get Started
@@ -53,100 +57,58 @@ Please make sure you have met below prerequisites:
### Architecture
The following diagram illustrates the reference architecture. It uses [Amazon API Gateway response streaming](https://aws.amazon.com/blogs/compute/building-responsive-apis-with-amazon-api-gateway-response-streaming/) with Lambda for SSE support.
![Architecture](assets/arch.png)
### Deployment Options
| Option | Pros | Cons | Best For |
|--------|------|------|----------|
| **API Gateway + Lambda** | No VPC required, pay-per-request, native streaming support, lower operational overhead | Potential cold starts | Most use cases, cost-sensitive deployments |
| **ALB + Fargate** | Lowest streaming latency, no cold starts | Higher cost, requires VPC | High-throughput, latency-sensitive workloads |
You can also use a Lambda Function URL as an alternative, see [example](https://github.com/awslabs/aws-lambda-web-adapter/tree/main/examples/fastapi-response-streaming).
The following diagram illustrates the reference architecture. Note that it also includes a new **VPC** with two public subnets only for the Application Load Balancer (ALB).
![Architecture](assets/arch.svg)
You can also choose to use [AWS Fargate](https://aws.amazon.com/fargate/) behind the ALB instead of [AWS Lambda](https://aws.amazon.com/lambda/); the main difference is the first-byte latency of streaming responses (Fargate is lower).
Alternatively, you can use a Lambda Function URL to replace the ALB, see [example](https://github.com/awslabs/aws-lambda-web-adapter/tree/main/examples/fastapi-response-streaming).
### Deployment
Please follow the steps below to deploy the Bedrock Proxy APIs into your AWS account. Only regions where Amazon Bedrock is available (such as `us-west-2`) are supported. The deployment will take approximately **10-15 minutes** 🕒.
**Step 1: Create your own API key in Secrets Manager (MUST)**
> **Note:** This step is to use any string (without spaces) you like to create a custom API Key (credential) that will be used to access the proxy API later. This key does not have to match your actual OpenAI key, and you don't need to have an OpenAI API key. Please keep the key safe and private.
1. Open the AWS Management Console and navigate to the AWS Secrets Manager service.
2. Click the "Store a new secret" button.
3. On the "Choose secret type" page, select:
   - Secret type: Other type of secret
   - Key/value pairs:
     - Key: `api_key`
     - Value: Enter your API key value
   Click "Next".
4. On the "Configure secret" page:
   - Secret name: Enter a name (e.g., "BedrockProxyAPIKey")
   - Description: (Optional) Add a description of your secret
5. Click "Next", review all your settings, and click "Store".
After creation, you'll see your secret in the Secrets Manager console. Make a note of the secret ARN.
**Step 2: Build and push container images to ECR**
1. Clone this repository:
```bash
git clone https://github.com/aws-samples/bedrock-access-gateway.git
cd bedrock-access-gateway
```
2. Run the build and push script:
```bash
cd scripts
bash ./push-to-ecr.sh
```
3. Follow the prompts to configure:
   - ECR repository names (or use defaults)
   - Image tag (or use default: `latest`)
   - AWS region (or use default: `us-east-1`)
4. The script will build and push both Lambda and ECS/Fargate images to your ECR repositories.
5. **Important**: Copy the image URIs displayed at the end of the script output. You'll need these in the next step.
**Step 3: Deploy the CloudFormation stack**
1. Download the CloudFormation template you want to use:
   - For API Gateway + Lambda: [`deployment/BedrockProxy.template`](deployment/BedrockProxy.template)
   - For ALB + Fargate: [`deployment/BedrockProxyFargate.template`](deployment/BedrockProxyFargate.template)
2. Sign in to the AWS Management Console and navigate to the CloudFormation service in your target region.
3. Click "Create stack" → "With new resources (standard)".
4. Upload the template file you downloaded.
5. On the "Specify stack details" page, provide the following information:
   - **Stack name**: Enter a stack name (e.g., "BedrockProxyAPI")
   - **ApiKeySecretArn**: Enter the secret ARN from Step 1
   - **ContainerImageUri**: Enter the ECR image URI from Step 2 output
   - **DefaultModelId**: (Optional) Change the default model if needed
   Click "Next".
6. On the "Configure stack options" page, you can leave the default settings or customize them according to your needs. Click "Next".
7. On the "Review" page, review all details. Check the "I acknowledge that AWS CloudFormation might create IAM resources" checkbox at the bottom. Click "Submit".
Please follow the steps below to deploy the Bedrock Proxy APIs into your AWS account. Only regions where Amazon Bedrock is available (such as `us-west-2`) are supported. The deployment will take approximately **3-5 minutes** 🕒.
**Step 1: Create your own custom API key (Optional)**
> **Note:** This step is to use any string (without spaces) you like to create a custom API Key (credential) that will be used to access the proxy API later. This key does not have to match your actual OpenAI key, and you don't need to have an OpenAI API key. It is recommended that you take this step and ensure that you keep the key safe and private.
1. Open the AWS Management Console and navigate to the Systems Manager service.
2. In the left-hand navigation pane, click on "Parameter Store".
3. Click the "Create parameter" button.
4. In the "Create parameter" window, select the following options:
   - Name: Enter a descriptive name for your parameter (e.g., "BedrockProxyAPIKey").
   - Description: Optionally, provide a description for the parameter.
   - Tier: Select **Standard**.
   - Type: Select **SecureString**.
   - Value: Any string (without spaces).
5. Click "Create parameter".
6. Make a note of the parameter name you used (e.g., "BedrockProxyAPIKey"). You'll need it in the next step.
**Step 2: Deploy the CloudFormation stack**
1. Sign in to the AWS Management Console and switch to the region to deploy the CloudFormation stack to.
2. Click the following button to launch the CloudFormation stack in that region. Choose one of the following:
   - **ALB + Lambda**
     [![Launch Stack](assets/launch-stack.png)](https://console.aws.amazon.com/cloudformation/home#/stacks/create/template?stackName=BedrockProxyAPI&templateURL=https://aws-gcr-solutions.s3.amazonaws.com/bedrock-access-gateway/latest/BedrockProxy.template)
   - **ALB + Fargate**
     [![Launch Stack](assets/launch-stack.png)](https://console.aws.amazon.com/cloudformation/home#/stacks/create/template?stackName=BedrockProxyAPI&templateURL=https://aws-gcr-solutions.s3.amazonaws.com/bedrock-access-gateway/latest/BedrockProxyFargate.template)
3. Click "Next".
4. On the "Specify stack details" page, provide the following information:
   - Stack name: Change the stack name if needed.
   - ApiKeyParam (if you set up an API key in Step 1): Enter the parameter name you used for storing the API key (e.g., `BedrockProxyAPIKey`). If you did not set up an API key, leave this field blank. Click "Next".
5. On the "Configure stack options" page, you can leave the default settings or customize them according to your needs.
6. Click "Next".
7. On the "Review" page, review the details of the stack you're about to create. Check the "I acknowledge that AWS CloudFormation might create IAM resources" checkbox at the bottom.
8. Click "Create stack".
That is it! 🎉 Once deployed, click the CloudFormation stack and go to the **Outputs** tab; you can find the API Base URL from `APIBaseUrl`, the value should look like `http://xxxx.xxx.elb.amazonaws.com/api/v1`.
### Troubleshooting
If you encounter any issues, please check the [Troubleshooting Guide](./docs/Troubleshooting.md) for more details.
### SDK/API Usage
All you need is the API Key and the API Base URL. If you didn't set up your own key following Step 1, the application will fail to start with an error message indicating that the API Key is not configured.
All you need is the API Key and the API Base URL. If you didn't set up your own key, then the default API Key (`bedrock`) will be used.
Now, you can try out the proxy APIs. Let's say you want to test the Claude 3 Sonnet model (model ID: `anthropic.claude-3-sonnet-20240229-v1:0`)...
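As a concrete sketch of the call described above, the snippet below sends a chat-completion request through the proxy with the `openai` Python SDK. It assumes `OPENAI_API_KEY` and `OPENAI_BASE_URL` point at your deployed gateway; the helper names `build_payload` and `ask` are illustrative, not part of the project.

```python
import os

MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"


def build_payload(prompt: str, model: str = MODEL_ID) -> dict:
    """Assemble an OpenAI-compatible chat-completion request body."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}


def ask(prompt: str) -> str:
    """Send the prompt through the proxy and return the reply text.

    Requires a running deployment and the OPENAI_* environment variables.
    """
    from openai import OpenAI  # pip install openai

    client = OpenAI(
        api_key=os.environ["OPENAI_API_KEY"],
        base_url=os.environ["OPENAI_BASE_URL"],
    )
    completion = client.chat.completions.create(**build_payload(prompt))
    return completion.choices[0].message.content
```

Because the gateway is OpenAI-compatible, the same request body works with plain `curl` against `$OPENAI_BASE_URL/chat/completions` as well.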
@@ -191,123 +153,14 @@ print(completion.choices[0].message.content)
Please check [Usage Guide](./docs/Usage.md) for more details about how to use embedding API, multimodal API and tool call.
### Application Inference Profiles
This proxy now supports **Application Inference Profiles**, which allow you to track usage and costs for your model invocations. You can use application inference profiles created in your AWS account for cost tracking and monitoring purposes.
**Using Application Inference Profiles:**
```bash
# Use an application inference profile ARN as the model ID
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "arn:aws:bedrock:us-west-2:123456789012:application-inference-profile/your-profile-id",
"messages": [
{
"role": "user",
"content": "Hello!"
}
]
}'
```
**SDK Usage with Application Inference Profiles:**
```python
from openai import OpenAI
client = OpenAI()
completion = client.chat.completions.create(
model="arn:aws:bedrock:us-west-2:123456789012:application-inference-profile/your-profile-id",
messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)
```
**Benefits of Application Inference Profiles:**
- **Cost Tracking**: Track usage and costs for specific applications or use cases
- **Usage Monitoring**: Monitor model invocation metrics through CloudWatch
- **Tag-based Cost Allocation**: Use AWS cost allocation tags for detailed billing analysis
For more information about creating and managing application inference profiles, see the [Amazon Bedrock User Guide](https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles-create.html).
### Prompt Caching
This proxy now supports **Prompt Caching** for Claude and Nova models, which can reduce costs by up to 90% and latency by up to 85% for workloads with repeated prompts.
**Supported Models:**
- Claude models (Claude 3.5 Haiku, Claude 4, Claude 4.5, etc.)
- Nova models (Nova Micro, Nova Lite, Nova Pro, Nova Premier)
**Enabling Prompt Caching:**
You can enable prompt caching in two ways:
1. **Globally via Environment Variable** (set in ECS Task Definition or Lambda):
```bash
ENABLE_PROMPT_CACHING=true
```
2. **Per-request via `extra_body`**:
**Python SDK:**
```python
from openai import OpenAI
client = OpenAI()
# Cache system prompts
response = client.chat.completions.create(
model="global.anthropic.claude-haiku-4-5-20251001-v1:0",
messages=[
{"role": "system", "content": "You are an expert assistant with knowledge of..."},
{"role": "user", "content": "Help me with this task"}
],
extra_body={
"prompt_caching": {"system": True}
}
)
# Check cache hit
if response.usage.prompt_tokens_details:
cached_tokens = response.usage.prompt_tokens_details.cached_tokens
print(f"Cached tokens: {cached_tokens}")
```
**cURL:**
```bash
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "global.anthropic.claude-haiku-4-5-20251001-v1:0",
"messages": [
{"role": "system", "content": "Long system prompt..."},
{"role": "user", "content": "Question"}
],
"extra_body": {
"prompt_caching": {"system": true}
}
}'
```
**Cache Options:**
- `"prompt_caching": {"system": true}` - Cache system prompts
- `"prompt_caching": {"messages": true}` - Cache user messages
- `"prompt_caching": {"system": true, "messages": true}` - Cache both
**Requirements:**
- Prompt must be ≥1,024 tokens to enable caching
- Cache TTL is 5 minutes (resets on each cache hit)
- Nova models have a 20,000 token caching limit
For more information, see the [Amazon Bedrock Prompt Caching Guide](https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html).
## Other Examples
### AutoGen
Below is an image of setting up the model in AutoGen studio.
![AutoGen Model](assets/autogen-model.png)
### LangChain
Make sure you use `ChatOpenAI(...)` instead of `OpenAI(...)`.
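To make the `ChatOpenAI` note above concrete, here is a minimal sketch that points LangChain at the gateway instead of api.openai.com. The helper `gateway_kwargs` is ours; the default key `bedrock` and the local base URL are fallbacks used only when the `OPENAI_*` environment variables are unset.

```python
import os


def gateway_kwargs() -> dict:
    """Connection settings that point ChatOpenAI at the gateway."""
    return {
        "model": "anthropic.claude-3-sonnet-20240229-v1:0",
        "temperature": 0,
        "openai_api_key": os.environ.get("OPENAI_API_KEY", "bedrock"),
        "openai_api_base": os.environ.get(
            "OPENAI_BASE_URL", "http://localhost:8000/api/v1"
        ),
    }


def chat_once(prompt: str) -> str:
    """Invoke the model once via LangChain.

    Requires `pip install langchain-openai` and a reachable deployment.
    """
    from langchain_openai import ChatOpenAI

    chat = ChatOpenAI(**gateway_kwargs())
    return chat.invoke(prompt).content
```

Using `OpenAI(...)` from LangChain instead would hit the legacy text-completion endpoint, which this gateway does not implement.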
@@ -346,37 +199,43 @@ print(response)
This application does not collect any of your data. Furthermore, it does not log any requests or responses by default.
### Why choose API Gateway vs ALB?
**API Gateway + Lambda** uses [API Gateway response streaming](https://aws.amazon.com/blogs/compute/building-responsive-apis-with-amazon-api-gateway-response-streaming/) with [Lambda Web Adapter](https://github.com/awslabs/aws-lambda-web-adapter) to support SSE streaming without requiring a VPC. This is a cost-effective, serverless option with a timeout of up to 10 minutes.
**ALB + Fargate** provides the lowest streaming latency with no cold starts, ideal for high-throughput workloads.
### Why not use API Gateway instead of an Application Load Balancer?
The short answer is that API Gateway does not support server-sent events (SSE) for streaming responses.
### Which regions are supported?
This solution only supports the regions where Amazon Bedrock is available; as of now, the list is below:
- US East (N. Virginia): us-east-1
- US West (Oregon): us-west-2
- Asia Pacific (Singapore): ap-southeast-1
- Asia Pacific (Sydney): ap-southeast-2
- Asia Pacific (Tokyo): ap-northeast-1
- Europe (Frankfurt): eu-central-1
- Europe (Paris): eu-west-3
Generally speaking, all regions that Amazon Bedrock supports will also be supported; if not, please raise an issue on GitHub.
Note that not all models are available in those regions.
### Which models are supported?
You can use the [Models API](./docs/Usage.md#models-api) to get/refresh the list of supported models in the current region.
### Can I build and use my own ECR image?
Yes, you can clone the repo, build the container image yourself (`src/Dockerfile`), and push it to your own ECR repo. You can use `scripts/push-to-ecr.sh`.
Replace the image repository URL in the CloudFormation template before you deploy.
### Can I run this locally?
Yes, you can run this locally, e.g. run the command below under the `src` folder:
```bash
uvicorn api.app:app --host 0.0.0.0 --port 8000
```
The API base URL should look like `http://localhost:8000/api/v1`.
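A quick smoke test against that local server might look like the sketch below. It assumes the default key `bedrock` (used when you didn't configure your own); the helper names are illustrative only.

```python
def local_base_url(port: int = 8000) -> str:
    """Base URL of a gateway started with the uvicorn command above."""
    return f"http://localhost:{port}/api/v1"


def smoke_test() -> list:
    """List a few model IDs from the local server (requires it to be running)."""
    from openai import OpenAI  # pip install openai

    client = OpenAI(api_key="bedrock", base_url=local_base_url())
    return [m.id for m in client.models.list().data][:5]
```

If `smoke_test()` returns model IDs, the local gateway and your AWS credentials are wired up correctly.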
### Any performance sacrifice or latency increase by using the proxy APIs?
Compared with direct AWS SDK calls, the proxy architecture will add some latency. The default API Gateway + Lambda deployment provides good streaming performance with Lambda response streaming.
For the lowest latency on streaming responses, consider the ALB + Fargate deployment option, which eliminates cold starts and provides consistent performance.
Compared with an AWS SDK call, the referenced architecture adds latency to responses; you can test it on your own.
Also, you can use Lambda Web Adapter + Function URL (see [example](https://github.com/awslabs/aws-lambda-web-adapter/tree/main/examples/fastapi-response-streaming)) to replace the ALB, or AWS Fargate to replace Lambda, to get better streaming performance.
### Any plan to support SageMaker models?
@@ -388,7 +247,13 @@ Fine-tuned models and models with Provisioned Throughput are currently not suppo
### How to upgrade?
To use the latest features, you need to follow the deployment guide and redeploy the application. You can upgrade the existing CloudFormation stack to get the latest changes.
To use the latest features, you don't need to redeploy the CloudFormation stack; you simply need to pull the latest image. How to do so depends on which version you deployed:
- **Lambda version**: Go to the AWS Lambda console, find the Lambda function, then find and click the `Deploy new image` button and click save.
- **Fargate version**: Go to the ECS console, click the ECS cluster, go to the `Tasks` tab, select the only running task and click the `Stop selected` menu. A new task with the latest image will start automatically.
## Security

267
README_CN.md Normal file

@@ -0,0 +1,267 @@
[English](./README.md)
# Bedrock Access Gateway
OpenAI-compatible RESTful APIs for Amazon Bedrock
## Breaking Changes
The project source code has been refactored with the new [Converse API](https://docs.aws.amazon.com/bedrock/latest/userguide/conversation-inference.html) provided by Bedrock, which natively supports tool calls.
If you run into any problems, please open a GitHub Issue.
## Overview
Amazon Bedrock offers a wide range of foundation models (such as Claude 3 Opus/Sonnet/Haiku, Llama 2/3, Mistral/Mixtral, etc.) along with a broad set of capabilities for building generative AI applications. For more details, see [Amazon Bedrock](https://aws.amazon.com/bedrock).
Sometimes, you might have applications already built with OpenAI's APIs or SDKs and want to try Amazon Bedrock's models without modifying your code. Or you may simply want to evaluate these foundation models in tools such as AutoGen. The good news is that this project provides a convenient way to seamlessly integrate and try Amazon Bedrock's models through OpenAI's APIs or SDKs, without changing your existing code.
If you find this project useful, please consider giving it a free star ⭐.
Features:
- [x] Support streaming responses via server-sent events (SSE)
- [x] Support Model APIs
- [x] Support Chat Completion APIs
- [x] Support Tool Call (**new**)
- [x] Support Embedding API (**new**)
- [x] Support Multimodal API (**new**)
Please check the [Usage Guide](./docs/Usage_CN.md) for more details about how to use the new APIs.
> Note: The legacy [text completion](https://platform.openai.com/docs/api-reference/completions) API is not supported; please switch to the Chat Completion API.
Supported Amazon Bedrock model families:
- Anthropic Claude 2 / 3 (Haiku/Sonnet/Opus)
- Meta Llama 2 / 3
- Mistral / Mixtral
- Cohere Command R / R+
- Cohere Embedding
You can call the `models` API first to get the full list of supported model IDs.
> Note: The default model is `anthropic.claude-3-sonnet-20240229-v1:0` and can be changed via Lambda environment variables.
## Usage Guide
### Prerequisites
Please make sure you have met the following prerequisites:
- Access to Amazon Bedrock foundation models.
If you haven't been granted model access yet, please refer to the [configuration](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html) guide.
### Architecture
The diagram below illustrates the reference architecture of this solution. Note that it also includes a new **VPC** with only two public subnets for the Application Load Balancer (ALB).
![Architecture](assets/arch.svg)
You can also choose to run [AWS Fargate](https://aws.amazon.com/fargate/) behind the ALB instead of [AWS Lambda](https://aws.amazon.com/lambda/); the main difference is the first-byte latency of streaming responses (Fargate is lower).
Alternatively, you can use a Lambda Function URL to replace the ALB, see this [example](https://github.com/awslabs/aws-lambda-web-adapter/tree/main/examples/fastapi-response-streaming).
### Deployment
Please follow the steps below to deploy the Bedrock Proxy APIs into your AWS account. Only regions where Amazon Bedrock is available (such as us-west-2) are supported. The deployment takes approximately **3-5 minutes** 🕒.
**Step 1: Create your own API key (Optional)**
> Note: This step uses any string (without spaces) you like to create a custom API Key (credential) that will be used to access the proxy API later. This key does not have to match your actual OpenAI key, and you don't even need to have an OpenAI API key. It is recommended that you take this step and keep the key safe and private.
1. Open the AWS Management Console and navigate to the Systems Manager service.
2. In the left-hand navigation pane, click "Parameter Store".
3. Click the "Create parameter" button.
4. In the "Create parameter" window, select the following options:
   - Name: Enter a descriptive name for your parameter (e.g., "BedrockProxyAPIKey").
   - Description: Optionally, provide a description for the parameter.
   - Tier: Select **Standard**.
   - Type: Select **SecureString**.
   - Value: Any string (without spaces).
5. Click "Create parameter".
6. Make a note of the parameter name you used (e.g., "BedrockProxyAPIKey"). You'll need it in the next step.
**Step 2: Deploy the CloudFormation stack**
1. Sign in to the AWS Management Console and switch to the region to deploy the CloudFormation stack to.
2. Click one of the following buttons to launch the CloudFormation stack in that region.
   - **ALB + Lambda**
   [![Launch Stack](assets/launch-stack.png)](https://console.aws.amazon.com/cloudformation/home#/stacks/create/template?stackName=BedrockProxyAPI&templateURL=https://aws-gcr-solutions.s3.amazonaws.com/bedrock-access-gateway/latest/BedrockProxy.template)
   - **ALB + Fargate**
   [![Launch Stack](assets/launch-stack.png)](https://console.aws.amazon.com/cloudformation/home#/stacks/create/template?stackName=BedrockProxyAPI&templateURL=https://aws-gcr-solutions.s3.amazonaws.com/bedrock-access-gateway/latest/BedrockProxyFargate.template)
3. Click "Next".
4. On the "Specify stack details" page, provide the following information:
   - Stack name: Change the stack name if needed.
   - ApiKeyParam (if you set up an API key in Step 1): Enter the parameter name you used for storing the API key (e.g., "BedrockProxyAPIKey"); otherwise, leave this field blank.
   Click "Next".
5. On the "Configure stack options" page, you can leave the default settings or customize them according to your needs.
6. Click "Next".
7. On the "Review" page, review the details of the stack you're about to create. Check the "I acknowledge that AWS CloudFormation might create IAM resources" checkbox at the bottom.
8. Click "Create stack".
That is it! 🎉 Once deployed, click the CloudFormation stack and go to the "Outputs" tab; you can find the API Base URL under `APIBaseUrl`, which should look like `http://xxxx.xxx.elb.amazonaws.com/api/v1`.
### SDK/API Usage
All you need is the API Key and the API Base URL. If you didn't set up your own key, the default API Key (`bedrock`) will be used.
Now you can try out the proxy APIs. Let's say you want to test the Claude 3 Sonnet model; use "anthropic.claude-3-sonnet-20240229-v1:0" as the model ID.
- **API usage example**
```bash
export OPENAI_API_KEY=<API key>
export OPENAI_BASE_URL=<API base url>
# For older versions of the openai SDK, use OPENAI_API_BASE
# https://github.com/openai/openai-python/issues/624
export OPENAI_API_BASE=<API base url>
```
```bash
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "anthropic.claude-3-sonnet-20240229-v1:0",
"messages": [
{
"role": "user",
"content": "Hello!"
}
]
}'
```
- **SDK usage example**
```python
from openai import OpenAI
client = OpenAI()
completion = client.chat.completions.create(
model="anthropic.claude-3-sonnet-20240229-v1:0",
messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)
```
Please check the [Usage Guide](./docs/Usage_CN.md) for more details about how to use the Embedding API, Multimodal API, and Tool Call.
## 其他例子
### AutoGen
Below is an example of configuring and using the model in AutoGen Studio.
![AutoGen Model](assets/autogen-model.png)
### LangChain
Make sure you use `ChatOpenAI(...)` instead of `OpenAI(...)`.
```python
# pip install langchain-openai
import os
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
chat = ChatOpenAI(
model="anthropic.claude-3-sonnet-20240229-v1:0",
temperature=0,
openai_api_key=os.environ['OPENAI_API_KEY'],
openai_api_base=os.environ['OPENAI_BASE_URL'],
)
template = """Question: {question}
Answer: Let's think step by step."""
prompt = PromptTemplate.from_template(template)
llm_chain = LLMChain(prompt=prompt, llm=chat)
question = "What NFL team won the Super Bowl in the year Justin Beiber was born?"
response = llm_chain.invoke(question)
print(response)
```
## FAQs
### About privacy
This solution does not collect any of your data. Furthermore, it does not log any requests or responses by default.
### Why use an Application Load Balancer instead of API Gateway?
The short answer is that API Gateway does not support server-sent events (SSE) for streaming responses.
### Which regions are supported?
Only regions where Amazon Bedrock is available are supported; as of now, these include:
- US East (N. Virginia): us-east-1
- US West (Oregon): us-west-2
- Asia Pacific (Singapore): ap-southeast-1
- Asia Pacific (Sydney): ap-southeast-2
- Asia Pacific (Tokyo): ap-northeast-1
- Europe (Frankfurt): eu-central-1
- Europe (Paris): eu-west-3
Generally speaking, all regions that Amazon Bedrock supports will also be supported; if not, please open a GitHub Issue.
Note that not all models are available in those regions.
### Can I build and use my own ECR image?
Yes, you can clone the repo, build the container image yourself (src/Dockerfile), and push it to your own ECR repository. See `scripts/push-to-ecr.sh` for a reference script.
Replace the image repository URL in the CloudFormation template before you deploy.
### Can I run this locally?
Yes, you can run this locally; the API Base URL should then look like `http://localhost:8000/api/v1`.
### Any performance sacrifice or latency increase by using the proxy APIs?
Compared with an AWS SDK call, the reference architecture adds extra latency to responses; you can deploy and test it yourself.
Also, you can use Lambda Web Adapter + Function URL (see this [example](https://github.com/awslabs/aws-lambda-web-adapter/tree/main/examples/fastapi-response-streaming)) to replace the ALB, or use AWS Fargate to replace Lambda, to get better streaming performance.
### Any plan to support SageMaker models?
There is currently no plan to support SageMaker models. This depends on customer demand.
### Any plan to support Bedrock custom models?
Fine-tuned models and models with Provisioned Throughput are not supported. If needed, you can clone the repo and customize it.
### How to upgrade?
To use the latest features, you don't need to redeploy the CloudFormation stack; you simply need to pull the latest image. How to do so depends on which version you deployed:
- **Lambda version**: Go to the AWS Lambda console, find the Lambda function, then find and click the `Deploy new image` button and click save.
- **Fargate version**: Go to the ECS console, click the ECS cluster, go to the `Tasks` tab, select the only running task and click the `Stop selected` menu. A new task with the latest image will start automatically.
## Security
For more information, see [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications).
## License
This project is licensed under the MIT-0 License. See the LICENSE file.


@@ -1,8 +0,0 @@
certifi
SPDX-License-Identifier: MPL-2.0
This Source Code Form is subject to the terms of the Mozilla Public
License, v. 2.0. If a copy of the MPL was not distributed with this
file, You can obtain one at http://mozilla.org/MPL/2.0/.
https://github.com/certifi/python-certifi

Binary file not shown (before: 50 KiB).

4
assets/arch.svg Normal file

File diff suppressed because one or more lines are too long (after: 25 KiB).

BIN
assets/autogen-agent.png Normal file

Binary file not shown (after: 209 KiB).

BIN
assets/autogen-model.png Normal file

Binary file not shown (after: 212 KiB).

BIN
assets/launch-stack.png Normal file

Binary file not shown (after: 3.3 KiB).


@@ -1,178 +1,768 @@
Description: Bedrock Access Gateway - OpenAI-compatible RESTful APIs for Amazon Bedrock (API Gateway + Lambda with Streaming)

Parameters:
  ApiKeySecretArn:
    Type: String
    AllowedPattern: ^arn:aws:secretsmanager:.*$
    Description: The secret ARN in Secrets Manager used to store the API Key
  ContainerImageUri:
    Type: String
    Description: The ECR image URI for the Lambda function (e.g., 123456789012.dkr.ecr.us-east-1.amazonaws.com/bedrock-proxy-api:latest)
  DefaultModelId:
    Type: String
    Default: anthropic.claude-3-sonnet-20240229-v1:0
    Description: The default model ID, please make sure the model ID is supported in the current region
  EnablePromptCaching:
    Type: String
    Default: "false"
    AllowedValues:
      - "true"
      - "false"
    Description: Enable prompt caching for supported models (Claude, Nova). When enabled, adds cachePoint to system prompts and messages for cost savings.

Resources:
  # IAM Role for Lambda
  ProxyApiHandlerServiceRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Statement:
          - Action: sts:AssumeRole
            Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
        Version: "2012-10-17"
      ManagedPolicyArns:
        - !Sub "arn:${AWS::Partition}:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"

  ProxyApiHandlerServiceRoleDefaultPolicy:
    Type: AWS::IAM::Policy
    Properties:
      PolicyDocument:
        Statement:
          - Action:
              - bedrock:ListFoundationModels
              - bedrock:ListInferenceProfiles
            Effect: Allow
            Resource: "*"
          - Action:
              - bedrock:InvokeModel
              - bedrock:InvokeModelWithResponseStream
            Effect: Allow
            Resource:
              - arn:aws:bedrock:*::foundation-model/*
              - arn:aws:bedrock:*:*:inference-profile/*
              - arn:aws:bedrock:*:*:application-inference-profile/*
          - Action:
              - secretsmanager:GetSecretValue
              - secretsmanager:DescribeSecret
            Effect: Allow
            Resource: !Ref ApiKeySecretArn
        Version: "2012-10-17"
      PolicyName: ProxyApiHandlerServiceRoleDefaultPolicy
      Roles:
        - !Ref ProxyApiHandlerServiceRole

  # Lambda Function with Lambda Web Adapter for streaming
  ProxyApiHandler:
    Type: AWS::Lambda::Function
    Properties:
      Architectures:
        - arm64
      Code:
        ImageUri: !Ref ContainerImageUri
      Description: Bedrock Proxy API Handler with Response Streaming
      Environment:
        Variables:
          # Lambda Web Adapter settings
          AWS_LWA_INVOKE_MODE: RESPONSE_STREAM
          AWS_LWA_READINESS_CHECK_PATH: /health
          AWS_LWA_ASYNC_INIT: "true"
          PORT: "8080"
          # Application settings
          DEBUG: "false"
          API_KEY_SECRET_ARN: !Ref ApiKeySecretArn
          DEFAULT_MODEL: !Ref DefaultModelId
          DEFAULT_EMBEDDING_MODEL: cohere.embed-multilingual-v3
          ENABLE_CROSS_REGION_INFERENCE: "true"
          ENABLE_APPLICATION_INFERENCE_PROFILES: "true"
          ENABLE_PROMPT_CACHING: !Ref EnablePromptCaching
          API_ROUTE_PREFIX: /v1
      MemorySize: 1024
      PackageType: Image
      Role: !GetAtt ProxyApiHandlerServiceRole.Arn
      Timeout: 600
    DependsOn:
      - ProxyApiHandlerServiceRoleDefaultPolicy
      - ProxyApiHandlerServiceRole

  # API Gateway REST API (Regional)
  RestApi:
    Type: AWS::ApiGateway::RestApi
    Properties:
      Name: BedrockProxyApi
      Description: Bedrock Access Gateway - OpenAI-compatible API with streaming support
      EndpointConfiguration:
        Types:
          - REGIONAL
      Body:
        openapi: "3.0.1"
        info:
          title: BedrockProxyApi
          version: "1.0"
        paths:
          /{proxy+}:
            x-amazon-apigateway-any-method:
              parameters:
                - name: proxy
                  in: path
                  required: true
                  schema:
                    type: string
              x-amazon-apigateway-integration:
                type: aws_proxy
                httpMethod: POST
                uri: !Sub "arn:aws:apigateway:${AWS::Region}:lambda:path/2021-11-15/functions/${ProxyApiHandler.Arn}/response-streaming-invocations"
                passthroughBehavior: when_no_match
                timeoutInMillis: 600000
                responseTransferMode: STREAM
              responses:
                default:
                  description: Default response
          /:
            x-amazon-apigateway-any-method:
              x-amazon-apigateway-integration:
                type: aws_proxy
                httpMethod: POST
                uri: !Sub "arn:aws:apigateway:${AWS::Region}:lambda:path/2021-11-15/functions/${ProxyApiHandler.Arn}/response-streaming-invocations"
                passthroughBehavior: when_no_match
                timeoutInMillis: 600000
                responseTransferMode: STREAM
              responses:
                default:
                  description: Default response

  # Lambda Permission for API Gateway
  LambdaPermission:
    Type: AWS::Lambda::Permission
    Properties:
      FunctionName: !Ref ProxyApiHandler
      Action: lambda:InvokeFunction
      Principal: apigateway.amazonaws.com
      SourceArn: !Sub "arn:aws:execute-api:${AWS::Region}:${AWS::AccountId}:${RestApi}/*"

  # API Gateway Deployment
  ApiDeployment:
    Type: AWS::ApiGateway::Deployment
    Properties:
      RestApiId: !Ref RestApi
    DependsOn:
      - RestApi

  # API Gateway Stage
  ApiStage:
    Type: AWS::ApiGateway::Stage
    Properties:
      RestApiId: !Ref RestApi
      DeploymentId: !Ref ApiDeployment
      StageName: api
      Description: API Stage with streaming support

Outputs:
  APIBaseUrl:
    Description: Proxy API Base URL (OPENAI_API_BASE)
    Value: !Sub "https://${RestApi}.execute-api.${AWS::Region}.amazonaws.com/api/v1"
  RestApiId:
    Description: API Gateway REST API ID
    Value: !Ref RestApi
  LambdaFunctionArn:
    Description: Lambda Function ARN
    Value: !GetAtt ProxyApiHandler.Arn

{
  "Description": "Bedrock Access Gateway - OpenAI-compatible RESTful APIs for Amazon Bedrock",
  "Transform": "AWS::LanguageExtensions",
  "Parameters": {
    "ApiKeyParam": {
      "Type": "String",
      "Default": "",
      "Description": "The parameter name in System Manager used to store the API Key, leave blank to use a default key"
    }
  },
  "Resources": {
    "VPCB9E5F0B4": {
      "Type": "AWS::EC2::VPC",
      "Properties": {
        "CidrBlock": "10.250.0.0/16",
        "EnableDnsHostnames": true,
        "EnableDnsSupport": true,
        "InstanceTenancy": "default",
        "Tags": [
          {
            "Key": "Name",
            "Value": "BedrockProxy/VPC"
          }
        ]
      },
      "Metadata": {
        "aws:cdk:path": "BedrockProxy/VPC/Resource"
      }
    },
    "VPCPublicSubnet1SubnetB4246D30": {
      "Type": "AWS::EC2::Subnet",
      "Properties": {
        "AvailabilityZone": {
          "Fn::Select": [
            0,
            {
              "Fn::GetAZs": ""
            }
          ]
        },
        "CidrBlock": "10.250.0.0/24",
        "MapPublicIpOnLaunch": true,
        "Tags": [
          {
            "Key": "aws-cdk:subnet-name",
            "Value": "Public"
          },
          {
            "Key": "aws-cdk:subnet-type",
            "Value": "Public"
          },
          {
            "Key": "Name",
            "Value": "BedrockProxy/VPC/PublicSubnet1"
          }
        ],
        "VpcId": {
          "Ref": "VPCB9E5F0B4"
        }
      },
      "Metadata": {
        "aws:cdk:path": "BedrockProxy/VPC/PublicSubnet1/Subnet"
      }
    },
    "VPCPublicSubnet1RouteTableFEE4B781": {
      "Type": "AWS::EC2::RouteTable",
      "Properties": {
        "Tags": [
          {
            "Key": "Name",
            "Value": "BedrockProxy/VPC/PublicSubnet1"
          }
        ],
        "VpcId": {
          "Ref": "VPCB9E5F0B4"
        }
      },
      "Metadata": {
        "aws:cdk:path": "BedrockProxy/VPC/PublicSubnet1/RouteTable"
      }
    },
    "VPCPublicSubnet1RouteTableAssociation0B0896DC": {
      "Type": "AWS::EC2::SubnetRouteTableAssociation",
      "Properties": {
        "RouteTableId": {
          "Ref": "VPCPublicSubnet1RouteTableFEE4B781"
        },
        "SubnetId": {
          "Ref": "VPCPublicSubnet1SubnetB4246D30"
        }
      },
      "Metadata": {
        "aws:cdk:path": "BedrockProxy/VPC/PublicSubnet1/RouteTableAssociation"
      }
    },
    "VPCPublicSubnet1DefaultRoute91CEF279": {
      "Type": "AWS::EC2::Route",
      "Properties": {
        "DestinationCidrBlock": "0.0.0.0/0",
        "GatewayId": {
          "Ref": "VPCIGWB7E252D3"
        },
        "RouteTableId": {
          "Ref": "VPCPublicSubnet1RouteTableFEE4B781"
        }
      },
      "DependsOn": [
        "VPCVPCGW99B986DC"
      ],
      "Metadata": {
        "aws:cdk:path": "BedrockProxy/VPC/PublicSubnet1/DefaultRoute"
      }
    },
    "VPCPublicSubnet2Subnet74179F39": {
      "Type": "AWS::EC2::Subnet",
      "Properties": {
        "AvailabilityZone": {
          "Fn::Select": [
            1,
            {
              "Fn::GetAZs": ""
            }
          ]
        },
        "CidrBlock": "10.250.1.0/24",
        "MapPublicIpOnLaunch": true,
        "Tags": [
          {
            "Key": "aws-cdk:subnet-name",
            "Value": "Public"
          },
          {
            "Key": "aws-cdk:subnet-type",
            "Value": "Public"
          },
          {
            "Key": "Name",
            "Value": "BedrockProxy/VPC/PublicSubnet2"
          }
        ],
        "VpcId": {
          "Ref": "VPCB9E5F0B4"
        }
      },
      "Metadata": {
        "aws:cdk:path": "BedrockProxy/VPC/PublicSubnet2/Subnet"
      }
    },
    "VPCPublicSubnet2RouteTable6F1A15F1": {
      "Type": "AWS::EC2::RouteTable",
      "Properties": {
        "Tags": [
          {
            "Key": "Name",
            "Value": "BedrockProxy/VPC/PublicSubnet2"
          }
        ],
        "VpcId": {
          "Ref": "VPCB9E5F0B4"
        }
      },
      "Metadata": {
        "aws:cdk:path": "BedrockProxy/VPC/PublicSubnet2/RouteTable"
      }
    },
    "VPCPublicSubnet2RouteTableAssociation5A808732": {
      "Type": "AWS::EC2::SubnetRouteTableAssociation",
      "Properties": {
        "RouteTableId": {
          "Ref": "VPCPublicSubnet2RouteTable6F1A15F1"
        },
        "SubnetId": {
          "Ref": "VPCPublicSubnet2Subnet74179F39"
        }
      },
      "Metadata": {
        "aws:cdk:path": "BedrockProxy/VPC/PublicSubnet2/RouteTableAssociation"
      }
    },
},
"VPCPublicSubnet2DefaultRouteB7481BBA": {
"Type": "AWS::EC2::Route",
"Properties": {
"DestinationCidrBlock": "0.0.0.0/0",
"GatewayId": {
"Ref": "VPCIGWB7E252D3"
},
"RouteTableId": {
"Ref": "VPCPublicSubnet2RouteTable6F1A15F1"
}
},
"DependsOn": [
"VPCVPCGW99B986DC"
],
"Metadata": {
"aws:cdk:path": "BedrockProxy/VPC/PublicSubnet2/DefaultRoute"
}
},
"VPCIGWB7E252D3": {
"Type": "AWS::EC2::InternetGateway",
"Properties": {
"Tags": [
{
"Key": "Name",
"Value": "BedrockProxy/VPC"
}
]
},
"Metadata": {
"aws:cdk:path": "BedrockProxy/VPC/IGW"
}
},
"VPCVPCGW99B986DC": {
"Type": "AWS::EC2::VPCGatewayAttachment",
"Properties": {
"InternetGatewayId": {
"Ref": "VPCIGWB7E252D3"
},
"VpcId": {
"Ref": "VPCB9E5F0B4"
}
},
"Metadata": {
"aws:cdk:path": "BedrockProxy/VPC/VPCGW"
}
},
"ProxyApiHandlerServiceRoleBE71BFB1": {
"Type": "AWS::IAM::Role",
"Properties": {
"AssumeRolePolicyDocument": {
"Statement": [
{
"Action": "sts:AssumeRole",
"Effect": "Allow",
"Principal": {
"Service": "lambda.amazonaws.com"
}
}
],
"Version": "2012-10-17"
},
"ManagedPolicyArns": [
{
"Fn::Join": [
"",
[
"arn:",
{
"Ref": "AWS::Partition"
},
":iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
]
]
}
]
},
"Metadata": {
"aws:cdk:path": "BedrockProxy/Proxy/ApiHandler/ServiceRole/Resource"
}
},
"ProxyApiHandlerServiceRoleDefaultPolicy86681202": {
"Type": "AWS::IAM::Policy",
"Properties": {
"PolicyDocument": {
"Statement": [
{
"Action": [
"bedrock:InvokeModel",
"bedrock:InvokeModelWithResponseStream"
],
"Effect": "Allow",
"Resource": "arn:aws:bedrock:*::foundation-model/*"
},
{
"Action": [
"ssm:DescribeParameters",
"ssm:GetParameters",
"ssm:GetParameter",
"ssm:GetParameterHistory"
],
"Effect": "Allow",
"Resource": {
"Fn::Join": [
"",
[
"arn:",
{
"Ref": "AWS::Partition"
},
":ssm:",
{
"Ref": "AWS::Region"
},
":",
{
"Ref": "AWS::AccountId"
},
":parameter/",
{
"Ref": "ApiKeyParam"
}
]
]
}
}
],
"Version": "2012-10-17"
},
"PolicyName": "ProxyApiHandlerServiceRoleDefaultPolicy86681202",
"Roles": [
{
"Ref": "ProxyApiHandlerServiceRoleBE71BFB1"
}
]
},
"Metadata": {
"aws:cdk:path": "BedrockProxy/Proxy/ApiHandler/ServiceRole/DefaultPolicy/Resource"
}
},
"ProxyApiHandlerEC15A492": {
"Type": "AWS::Lambda::Function",
"Properties": {
"Architectures": [
"arm64"
],
"Code": {
"ImageUri": {
"Fn::Join": [
"",
[
"366590864501.dkr.ecr.",
{
"Ref": "AWS::Region"
},
".",
{
"Ref": "AWS::URLSuffix"
},
"/bedrock-proxy-api:latest"
]
]
}
},
"Description": "Bedrock Proxy API Handler",
"Environment": {
"Variables": {
"API_KEY_PARAM_NAME": {
"Ref": "ApiKeyParam"
},
"DEBUG": "false",
"DEFAULT_MODEL": {
"Fn::FindInMap": [
"ProxyRegionTable03E5BEB3",
{
"Ref": "AWS::Region"
},
"model",
{
"DefaultValue": "anthropic.claude-3-sonnet-20240229-v1:0"
}
]
},
"DEFAULT_EMBEDDING_MODEL": "cohere.embed-multilingual-v3"
}
},
"MemorySize": 1024,
"PackageType": "Image",
"Role": {
"Fn::GetAtt": [
"ProxyApiHandlerServiceRoleBE71BFB1",
"Arn"
]
},
"Timeout": 300
},
"DependsOn": [
"ProxyApiHandlerServiceRoleDefaultPolicy86681202",
"ProxyApiHandlerServiceRoleBE71BFB1"
],
"Metadata": {
"aws:cdk:path": "BedrockProxy/Proxy/ApiHandler/Resource"
}
},
"ProxyApiHandlerInvoke2UTWxhlfyqbT5FTn5jvgbLgjFfJwzswGk55DU1HYF6C33779": {
"Type": "AWS::Lambda::Permission",
"Properties": {
"Action": "lambda:InvokeFunction",
"FunctionName": {
"Fn::GetAtt": [
"ProxyApiHandlerEC15A492",
"Arn"
]
},
"Principal": "elasticloadbalancing.amazonaws.com"
},
"Metadata": {
"aws:cdk:path": "BedrockProxy/Proxy/ApiHandler/Invoke2UTWxhlfyqbT5FTn--5jvgbLgj+FfJwzswGk55DU1H--Y="
}
},
"ProxyALB87756780": {
"Type": "AWS::ElasticLoadBalancingV2::LoadBalancer",
"Properties": {
"LoadBalancerAttributes": [
{
"Key": "deletion_protection.enabled",
"Value": "false"
}
],
"Scheme": "internet-facing",
"SecurityGroups": [
{
"Fn::GetAtt": [
"ProxyALBSecurityGroup0D6CA3DA",
"GroupId"
]
}
],
"Subnets": [
{
"Ref": "VPCPublicSubnet1SubnetB4246D30"
},
{
"Ref": "VPCPublicSubnet2Subnet74179F39"
}
],
"Type": "application"
},
"DependsOn": [
"VPCPublicSubnet1DefaultRoute91CEF279",
"VPCPublicSubnet1RouteTableAssociation0B0896DC",
"VPCPublicSubnet2DefaultRouteB7481BBA",
"VPCPublicSubnet2RouteTableAssociation5A808732"
],
"Metadata": {
"aws:cdk:path": "BedrockProxy/Proxy/ALB/Resource"
}
},
"ProxyALBSecurityGroup0D6CA3DA": {
"Type": "AWS::EC2::SecurityGroup",
"Properties": {
"GroupDescription": "Automatically created Security Group for ELB BedrockProxyALB1CE4CAD1",
"SecurityGroupEgress": [
{
"CidrIp": "255.255.255.255/32",
"Description": "Disallow all traffic",
"FromPort": 252,
"IpProtocol": "icmp",
"ToPort": 86
}
],
"SecurityGroupIngress": [
{
"CidrIp": "0.0.0.0/0",
"Description": "Allow from anyone on port 80",
"FromPort": 80,
"IpProtocol": "tcp",
"ToPort": 80
}
],
"VpcId": {
"Ref": "VPCB9E5F0B4"
}
},
"Metadata": {
"aws:cdk:path": "BedrockProxy/Proxy/ALB/SecurityGroup/Resource"
}
},
"ProxyALBListener933E9515": {
"Type": "AWS::ElasticLoadBalancingV2::Listener",
"Properties": {
"DefaultActions": [
{
"TargetGroupArn": {
"Ref": "ProxyALBListenerTargetsGroup187739FA"
},
"Type": "forward"
}
],
"LoadBalancerArn": {
"Ref": "ProxyALB87756780"
},
"Port": 80,
"Protocol": "HTTP"
},
"Metadata": {
"aws:cdk:path": "BedrockProxy/Proxy/ALB/Listener/Resource"
}
},
"ProxyALBListenerTargetsGroup187739FA": {
"Type": "AWS::ElasticLoadBalancingV2::TargetGroup",
"Properties": {
"HealthCheckEnabled": false,
"TargetType": "lambda",
"Targets": [
{
"Id": {
"Fn::GetAtt": [
"ProxyApiHandlerEC15A492",
"Arn"
]
}
}
]
},
"DependsOn": [
"ProxyApiHandlerInvoke2UTWxhlfyqbT5FTn5jvgbLgjFfJwzswGk55DU1HYF6C33779"
],
"Metadata": {
"aws:cdk:path": "BedrockProxy/Proxy/ALB/Listener/TargetsGroup/Resource"
}
},
"CDKMetadata": {
"Type": "AWS::CDK::Metadata",
"Properties": {
"Analytics": "v2:deflate64:H4sIAAAAAAAA/1VRXW/CMAz8LbyHDMovAKZNSJtWFcTr5LpeZ0iTKHFAqOp/n1q+uief7y7ynZLp+WKhZxM4xylWx6nhUrdbATyq9Y/NIUBDQkHBOX63hJlu9x57aZ+vVZ5Kw7hNpSXpuScqXBLaQWnoyT+5ZYwOGYSdfZh7sLFCwZK8g9AZLrczt20pAvjbkBW1JUyB5fIeXPLDgTHRKcKgC/IusrhwWUEkZaApK9Dtq8MjhU0DNb0li/cIY5xTaDhGdrZTDI1uC3etMczcGcYh2hV1igxEYTQOqhIMWGRbnzLdLr03jEPLDwfVatAo9E//7WMfRyF789zxSN9BqEketUdr16mCoksBh6if4D3buodfSXy6fsrIsHa2Yhk6WleRPsSXUzbT87meTQ6ReRqSFW5IF9f5B/Z2H8goAgAA"
},
"Metadata": {
"aws:cdk:path": "BedrockProxy/CDKMetadata/Default"
},
"Condition": "CDKMetadataAvailable"
}
},
"Mappings": {
"ProxyRegionTable03E5BEB3": {
"us-east-1": {
"model": "anthropic.claude-3-sonnet-20240229-v1:0"
},
"ap-southeast-1": {
"model": "anthropic.claude-v2"
},
"ap-northeast-1": {
"model": "anthropic.claude-v2:1"
},
"eu-central-1": {
"model": "anthropic.claude-v2:1"
}
}
},
"Outputs": {
"APIBaseUrl": {
"Description": "Proxy API Base URL (OPENAI_API_BASE)",
"Value": {
"Fn::Join": [
"",
[
"http://",
{
"Fn::GetAtt": [
"ProxyALB87756780",
"DNSName"
]
},
"/api/v1"
]
]
}
}
},
"Conditions": {
"CDKMetadataAvailable": {
"Fn::Or": [
{
"Fn::Or": [
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"af-south-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"ap-east-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"ap-northeast-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"ap-northeast-2"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"ap-south-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"ap-southeast-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"ap-southeast-2"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"ca-central-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"cn-north-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"cn-northwest-1"
]
}
]
},
{
"Fn::Or": [
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"eu-central-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"eu-north-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"eu-south-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"eu-west-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"eu-west-2"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"eu-west-3"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"il-central-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"me-central-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"me-south-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"sa-east-1"
]
}
]
},
{
"Fn::Or": [
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"us-east-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"us-east-2"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"us-west-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"us-west-2"
]
}
]
}
]
}
}
}

File diff suppressed because it is too large Load Diff

View File

@@ -1,18 +0,0 @@
version: '3.8'
services:
bedrock-access-gateway:
build:
context: ./src
dockerfile: Dockerfile_ecs
ports:
- "127.0.0.1:8000:8080"
environment:
- ENABLE_PROMPT_CACHING=true
- API_KEY=${OPENAI_API_KEY}
- AWS_PROFILE
- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY
- AWS_SESSION_TOKEN
volumes:
- ${HOME}/.aws:/home/appuser/.aws

View File

@@ -1,78 +0,0 @@
# Security
This document details the security configuration required for the solution. In particular, it covers:
- **HTTPS Setup**
Following these guidelines will help ensure that traffic is encrypted over the public network.
---
## 1. HTTPS Authentication with the ALB
### Overview
Using HTTPS on your ALB guarantees that all client-to-ALB communication is encrypted. This is achieved by:
- **Obtaining and managing SSL/TLS certificates** using AWS Certificate Manager (ACM). You'll need a domain but you can request a free certificate.
- **Configuring HTTPS listeners** on the ALB
- **Automating HTTP to HTTPS redirect** for clients that inadvertently access HTTP endpoints
- **Allowing traffic in the Security Group of the ALB**
### Step-by-Step Setup
#### 1.1. Request an SSL/TLS Certificate via ACM
1. **Navigate to AWS Certificate Manager (ACM):**
In the AWS Management Console, go to ACM in the region where your ALB is deployed.
2. **Request the Certificate:**
- Click on **"Request a certificate"**.
- Choose **"Request a public certificate"** (or a private one if using a private CA).
- Enter your domain names (e.g., `example.com`, `*.example.com`).
- Complete the validation (via DNS or email). DNS validation is generally preferred for automation purposes.
3. **Certificate Validation:**
Ensure that the certificate status becomes **"Issued"** before proceeding.
#### 1.2. Configure the ALB for HTTPS
1. **Create or Modify the ALB Listener:**
- Open the **EC2 Dashboard** and navigate to [Load Balancers](https://console.aws.amazon.com/ec2/home?#LoadBalancers:).
- If you already have an ALB, select it; otherwise, create a new ALB.
- Under the **Listeners** tab, click **Manage listener** > **Edit Listener**.
- Configure the listener protocol to **HTTPS** with port **443**.
- Select the certificate you requested from ACM.
#### 1.3. (Optional) Redirect HTTP Traffic to HTTPS
To enhance security, ensure that any HTTP requests are automatically redirected to HTTPS.
1. **Create an HTTP Listener on Port 80:**
- Add a listener on port **80**.
- In the listener settings, add a rule to redirect all traffic to port **443** with the protocol changed to **HTTPS**.
**Example AWS CLI command for redirection:**
```bash
aws elbv2 create-listener \
--load-balancer-arn <your-alb-arn> \
--protocol HTTP \
--port 80 \
--default-actions Type=redirect,RedirectConfig="Protocol=https,Port=443,StatusCode=HTTP_301"
```
#### 1.4. Allow traffic in the Security Group of the ALB
1. **Update the ALB Security Group:**
- Go to the CloudFormation stack you originally used to deploy, select **Resources** and search for **ProxyALBSecurityGroup**
- Click on the Security Group
- Edit the Inbound Rules to allow traffic on port 443 from `0.0.0.0/0` and (optionally) delete the Inbound Rule on port 80. **Note**: If you delete the rule on port 80, you will need to update the base URL to use HTTPS only, as HTTP traffic will no longer be redirected to HTTPS.
Now you should be able to test your application! Use the base url like:
```
https://<your-domain>/api/v1
```
---
By following the steps outlined in this guide, you can configure a secure environment that uses HTTPS via ALB for encrypted traffic.

View File

@@ -1,97 +0,0 @@
# Troubleshooting Guide
This guide helps you troubleshoot common issues you might encounter when using the Bedrock Access Gateway.
## Common Issues
### 1. Parameter Store Access Error
To see the errors, first access the CloudWatch Logs for the Lambda function (or Fargate task).
1. Go to the [CloudWatch Console](https://console.aws.amazon.com/cloudwatch/home?#logsV2:log-groups/)
2. Search for `/aws/lambda/BedrockProxyAPI`
3. Click on the `Log Stream` to see the error details
```python
botocore.exceptions.ClientError: An error occurred (ParameterNotFound) when calling the GetParameter operation: Parameter /BedrockProxyAPIKey not found.
```
This error occurs when the Lambda function cannot access the API key parameter in Parameter Store.
**Possible solutions:**
- Verify that you created the parameter in Parameter Store with the correct name
- Check that the parameter name in the CloudFormation stack matches the one in Parameter Store
- Ensure the Lambda function's IAM role has permission to access Parameter Store
- If you didn't set up an API key, leave the `ApiKeyParam` field blank during deployment
### 2. Model Access Issues
If you receive an error about model access:
```
{"error": {"message": "User: arn:aws:iam::XXXX:role/XXX is not authorized to perform: bedrock:InvokeModel on resource: arn:aws:bedrock:REGION::foundation-model/XXX", "type": "auth_error", "code": 401}}
```
**Possible solutions:**
- Ensure you have requested access to the model in Amazon Bedrock
- Verify the Lambda/Fargate role has the necessary permissions to invoke Bedrock models
- Check that you're using the correct model ID
- Verify the model is available in your chosen region
### 3. API Key Authentication Failures
If you receive a 401 Unauthorized error:
```
{"detail": "Could not validate credentials"}
```
**Possible solutions:**
- Verify you're using the correct API key in your requests
- Check that the `Authorization` header is properly formatted (`Bearer YOUR-API-KEY`)
- If using environment variables, ensure `OPENAI_API_KEY` is set correctly
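The expected header shape can be sketched as follows (a minimal client-side illustration, not part of the gateway code; `auth_header` and `looks_valid` are hypothetical helper names):

```python
def auth_header(api_key: str) -> dict:
    """Build the Authorization header in the Bearer form the gateway expects."""
    return {"Authorization": f"Bearer {api_key}"}


def looks_valid(headers: dict) -> bool:
    """Quick client-side sanity check before sending a request."""
    value = headers.get("Authorization", "")
    return value.startswith("Bearer ") and len(value) > len("Bearer ")
```

A header like `{"Authorization": "YOUR-API-KEY"}` (missing the `Bearer ` prefix) is a common cause of this 401.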
### 4. Cross-Region Access Issues
If you're trying to access models in a different region:
```
{"error": {"message": "Region 'us-east-1' is not enabled for your account", "type": "invalid_request_error", "code": 400}}
```
**Possible solutions:**
- Ensure the target region is enabled for your AWS account
- Verify the model you're trying to access is available in that region
- Check that your IAM roles have the necessary cross-region permissions
### 5. Rate Limiting and Quotas
If you're experiencing throttling or quota issues:
```
{"error": {"message": "Rate limit exceeded", "type": "rate_limit_error", "code": 429}}
```
**Possible solutions:**
- Check your Bedrock service quotas in the AWS Console
- Consider implementing retry logic in your application
- Request a quota increase if needed
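A minimal retry sketch with exponential backoff (a hypothetical helper, not part of the gateway; adapt the exception handling to whatever your HTTP client raises on a 429):

```python
import time


def with_retries(call, max_attempts=5, base_delay=1.0):
    """Invoke `call`, retrying with exponential backoff when it raises a
    rate-limit error (modeled here as RuntimeError containing 'rate_limit')."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RuntimeError as err:
            if "rate_limit" not in str(err) or attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

Wrap your chat-completion call in `with_retries` so transient 429s are absorbed instead of surfacing to users.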
## Getting Help
If you're still experiencing issues:
1. Check the CloudWatch Logs for detailed error messages
2. Verify your AWS credentials and permissions
3. Review the [Usage Guide](./Usage.md) for correct API usage
4. Open a [GitHub issue](https://github.com/aws-samples/bedrock-access-gateway/issues/new?template=bug_report.md) with:
- Detailed error message
- Steps to reproduce
- Your deployment configuration (region, model, etc.)
- Any relevant CloudWatch logs
## Additional Resources
- [Amazon Bedrock Documentation](https://docs.aws.amazon.com/bedrock/)
- [AWS IAM Documentation](https://docs.aws.amazon.com/IAM/latest/UserGuide/)
- [AWS Systems Manager Parameter Store](https://docs.aws.amazon.com/systems-manager/latest/userguide/systems-manager-parameter-store.html)

View File

@@ -9,85 +9,6 @@ export OPENAI_API_KEY=<API key>
export OPENAI_BASE_URL=<API base url>
```
**API Example:**
- [Models API](#models-api)
- [Embedding API](#embedding-api)
- [Multimodal API](#multimodal-api)
- [Tool Call](#tool-call)
- [Reasoning](#reasoning)
- [Interleaved thinking (beta)](#interleaved-thinking-beta)
## Models API
You can use this API to get a list of supported model IDs.
Also, you can use this API to refresh the model list if new models are added to Amazon Bedrock.
**Example Request**
```bash
curl -s $OPENAI_BASE_URL/models -H "Authorization: Bearer $OPENAI_API_KEY" | jq .data
```
**Example Response**
```bash
[
...
{
"id": "anthropic.claude-3-5-sonnet-20240620-v1:0",
"created": 1734416893,
"object": "model",
"owned_by": "bedrock"
},
{
"id": "us.anthropic.claude-3-5-sonnet-20240620-v1:0",
"created": 1734416893,
"object": "model",
"owned_by": "bedrock"
},
...
]
```
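The response above mixes plain foundation-model IDs with cross-region inference-profile IDs (region prefixes such as `us.`). A small sketch for separating them client-side (the prefix set is an assumption inferred from the IDs shown, not an official list):

```python
PROFILE_PREFIXES = {"us", "eu", "apac", "global"}  # assumed region prefixes


def split_model_ids(model_ids):
    """Partition model IDs into cross-region inference profiles and plain
    foundation models, based on an assumed region-prefix convention."""
    profiles = [m for m in model_ids if m.split(".", 1)[0] in PROFILE_PREFIXES]
    plain = [m for m in model_ids if m not in profiles]
    return profiles, plain
```

This is handy when you want to prefer inference-profile IDs (for cross-region routing) while falling back to the plain model ID elsewhere.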
## Chat Completions API
### Basic Example with Claude Sonnet 4.5
Claude Sonnet 4.5 is Anthropic's most intelligent model, excelling at coding, complex reasoning, and agent-based tasks. It's available via global cross-region inference profiles.
**Example Request**
```bash
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
"messages": [
{
"role": "user",
"content": "Write a Python function to calculate the Fibonacci sequence using dynamic programming."
}
]
}'
```
**Example SDK Usage**
```python
from openai import OpenAI
client = OpenAI()
completion = client.chat.completions.create(
model="global.anthropic.claude-sonnet-4-5-20250929-v1:0",
messages=[{"role": "user", "content": "Write a Python function to calculate the Fibonacci sequence using dynamic programming."}],
)
print(completion.choices[0].message.content)
```
## Embedding API
**Important Notice**: Please carefully review the following points before using this proxy API for embedding.
@@ -170,10 +91,13 @@ print(doc_result[0][:5])
## Multimodal API
**Important Notice**: Please carefully review the following points before using this proxy API for Multimodal.
1. This API is only supported by Claude 3 models.
**Example Request**
```bash
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
@@ -261,6 +185,7 @@ curl $OPENAI_BASE_URL/chat/completions \
**Important Notice**: Please carefully review the following points before using this Tool Call for Chat completion API.
1. Function Call is now deprecated in favor of Tool Call by OpenAI, hence it's not supported here; you should use Tool Call instead.
2. This API is only supported by Claude 3 models.
**Example Request**
@@ -358,218 +283,3 @@ curl $OPENAI_BASE_URL/chat/completions \
You can try it with different questions, such as:
1. Hello, who are you? (No tools are needed)
2. What is the weather like today? (Should use get_current_location tool first)
## Reasoning
**Important Notice**: Please carefully review the following points before using reasoning mode for Chat completion API.
- Only Claude 3.7 Sonnet (extended thinking) and DeepSeek R1 support Reasoning so far. Please make sure the model supports reasoning before use.
- For Claude 3.7 Sonnet, the reasoning (thinking) mode is not enabled by default; you must pass an additional `reasoning_effort` parameter in your request, along with an appropriate max_tokens (or max_completion_tokens). The budget_tokens is derived from reasoning_effort (low: 30%, medium: 60%, high: 100% of max tokens), with a minimum budget_tokens of 1,024; Anthropic recommends at least 4,000 tokens for comprehensive reasoning. Check the [Bedrock Document](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-anthropic-claude-37.html) for more details.
- For DeepSeek R1, you don't need the additional reasoning_effort parameter; passing it may cause an error.
- The reasoning response (CoT, thoughts) is returned in an additional field `reasoning_content`, which is not officially supported by OpenAI. This follows the [DeepSeek Reasoning Model](https://api-docs.deepseek.com/guides/reasoning_model#api-example) convention and may change in the future.
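The budget_tokens mapping described above can be sketched as follows (an illustration of the documented percentages, not the gateway's actual code):

```python
def thinking_budget(max_tokens: int, reasoning_effort: str) -> int:
    """Derive budget_tokens from reasoning_effort: low=30%, medium=60%,
    high=100% of max_tokens, with a floor of 1,024 tokens."""
    ratios = {"low": 0.3, "medium": 0.6, "high": 1.0}
    return max(1024, int(max_tokens * ratios[reasoning_effort]))
```

For example, with `max_completion_tokens: 4096` and `reasoning_effort: "low"` the effective thinking budget is about 1,228 tokens, well above the 1,024 floor.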
**Example Request**
- Claude 3.7 Sonnet
```bash
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
"messages": [
"role": "user",
"content": "which one is bigger, 3.9 or 3.11?"
}
],
"max_completion_tokens": 4096,
"reasoning_effort": "low",
"stream": false
}'
```
- DeepSeek R1
```bash
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "us.deepseek.r1-v1:0",
"messages": [
{
"role": "user",
"content": "which one is bigger, 3.9 or 3.11?"
}
],
"stream": false
}'
```
**Example Response**
```json
{
"id": "chatcmpl-83fb7a88",
"created": 1740545278,
"model": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
"system_fingerprint": "fp",
"choices": [
{
"index": 0,
"finish_reason": "stop",
"logprobs": null,
"message": {
"role": "assistant",
"content": "3.9 is bigger than 3.11.\n\nWhen comparing decimal numbers, we need to understand what these numbers actually represent:...",
"reasoning_content": "I need to compare the decimal numbers 3.9 and 3.11.\n\nFor decimal numbers, we first compare the whole number parts, and if they're equal, we compare the decimal parts. \n\nBoth numbers ..."
}
}
],
"object": "chat.completion",
"usage": {
"prompt_tokens": 51,
"completion_tokens": 565,
"total_tokens": 616
}
}
```
You can also use the OpenAI SDK (run `pip3 install -U openai` first).
- Non-Streaming
```python
from openai import OpenAI
client = OpenAI()
messages = [{"role": "user", "content": "which one is bigger, 3.9 or 3.11?"}]
response = client.chat.completions.create(
model="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
messages=messages,
reasoning_effort="low",
max_completion_tokens=4096,
)
reasoning_content = response.choices[0].message.reasoning_content
content = response.choices[0].message.content
```
- Streaming
```python
from openai import OpenAI
client = OpenAI()
messages = [{"role": "user", "content": "9.11 and 9.8, which is greater?"}]
response = client.chat.completions.create(
model="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
messages=messages,
reasoning_effort="low",
max_completion_tokens=4096,
stream=True,
)
reasoning_content = ""
content = ""
for chunk in response:
if hasattr(chunk.choices[0].delta, 'reasoning_content') and chunk.choices[0].delta.reasoning_content:
reasoning_content += chunk.choices[0].delta.reasoning_content
elif chunk.choices[0].delta.content:
content += chunk.choices[0].delta.content
```
## Interleaved thinking (beta)
**Important Notice**: Please carefully review the following points before using interleaved thinking for the Chat completion API.
Extended thinking with tool use in Claude 4 models supports [interleaved thinking](https://docs.aws.amazon.com/bedrock/latest/userguide/claude-messages-extended-thinking.html#claude-messages-extended-thinking-tool-use-interleaved), which enables Claude 4 models to think between tool calls and run more sophisticated reasoning after receiving tool results. This is helpful for more complex agentic interactions.
With interleaved thinking, the `budget_tokens` can exceed the `max_tokens` parameter because it represents the total budget across all thinking blocks within one assistant turn.
**Supported Models**: Claude Sonnet 4, Claude Sonnet 4.5
**Example Request**
- Non-Streaming (Claude Sonnet 4.5)
```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bedrock" \
-d '{
"model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
"max_tokens": 2048,
"messages": [{
"role": "user",
"content": "Explain how to implement a binary search tree with self-balancing capabilities."
}],
"extra_body": {
"anthropic_beta": ["interleaved-thinking-2025-05-14"],
"thinking": {"type": "enabled", "budget_tokens": 4096}
}
}'
```
- Non-Streaming (Claude Sonnet 4)
```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bedrock" \
-d '{
"model": "us.anthropic.claude-sonnet-4-20250514-v1:0",
"max_tokens": 2048,
"messages": [{
"role": "user",
"content": "有一天,一个女孩参加数学考试只得了 38 分。她心里对父亲的惩罚充满恐惧,于是偷偷把分数改成了 88 分。她的父亲看到试卷后,怒发冲冠,狠狠地给了她一巴掌,怒吼道:“你这 8 怎么一半是绿的一半是红的,你以为我是傻子吗?”女孩被打后,委屈地哭了起来,什么也没说。过了一会儿,父亲突然崩溃了。请问这位父亲为什么过一会崩溃了?"
}],
"extra_body": {
"anthropic_beta": ["interleaved-thinking-2025-05-14"],
"thinking": {"type": "enabled", "budget_tokens": 4096}
}
}'
```
- Streaming (Claude Sonnet 4.5)
```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bedrock" \
-d '{
"model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
"max_tokens": 2048,
"messages": [{
"role": "user",
"content": "Explain how to implement a binary search tree with self-balancing capabilities."
}],
"stream": true,
"extra_body": {
"anthropic_beta": ["interleaved-thinking-2025-05-14"],
"thinking": {"type": "enabled", "budget_tokens": 4096}
}
}'
```
- Streaming (Claude Sonnet 4)
```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bedrock" \
-d '{
"model": "us.anthropic.claude-sonnet-4-20250514-v1:0",
"max_tokens": 2048,
"messages": [{
"role": "user",
"content": "有一天,一个女孩参加数学考试只得了 38 分。她心里对父亲的惩罚充满恐惧,于是偷偷把分数改成了 88 分。她的父亲看到试卷后,怒发冲冠,狠狠地给了她一巴掌,怒吼道:“你这 8 怎么一半是绿的一半是红的,你以为我是傻子吗?”女孩被打后,委屈地哭了起来,什么也没说。过了一会儿,父亲突然崩溃了。请问这位父亲为什么过一会崩溃了?"
}],
"stream": true,
"extra_body": {
"anthropic_beta": ["interleaved-thinking-2025-05-14"],
"thinking": {"type": "enabled", "budget_tokens": 4096}
}
}'
```
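The four curl examples above share one payload shape and differ only in model, prompt, and the `stream` flag. A small hypothetical helper (the function name and defaults are illustrative, not part of the gateway) can assemble that body programmatically; note how `budget_tokens` (4096) legitimately exceeds `max_tokens` (2048) under interleaved thinking:

```python
def build_interleaved_request(model: str, prompt: str,
                              max_tokens: int = 2048,
                              budget_tokens: int = 4096,
                              stream: bool = False) -> dict:
    """Assemble the chat-completions payload used in the curl examples above.

    With interleaved thinking, budget_tokens is the total thinking budget
    across the whole assistant turn, so it may exceed max_tokens (unlike
    plain extended thinking).
    """
    body = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
        "extra_body": {
            "anthropic_beta": ["interleaved-thinking-2025-05-14"],
            "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        },
    }
    if stream:
        body["stream"] = True
    return body

req = build_interleaved_request(
    "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
    "Explain how to implement a binary search tree with self-balancing capabilities.")
```

The resulting dict can be serialized with `json.dumps` and sent as the `-d` payload of the curl commands above.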

@@ -9,83 +9,6 @@ export OPENAI_API_KEY=<API key>
export OPENAI_BASE_URL=<API base url>
```
**API Examples:**
- [Models API](#models-api)
- [Embedding API](#embedding-api)
- [Multimodal API](#multimodal-api)
- [Tool Call](#tool-call)
- [Reasoning](#reasoning)
- [Interleaved thinking (beta)](#interleaved-thinking-beta)
## Models API
You can use this API to get the list of supported models, and to refresh the model list after new models are added to Amazon Bedrock.
**Request Example**
```bash
curl -s $OPENAI_BASE_URL/models -H "Authorization: Bearer $OPENAI_API_KEY" | jq .data
```
**Response Example**
```bash
[
...
{
"id": "anthropic.claude-3-5-sonnet-20240620-v1:0",
"created": 1734416893,
"object": "model",
"owned_by": "bedrock"
},
{
"id": "us.anthropic.claude-3-5-sonnet-20240620-v1:0",
"created": 1734416893,
"object": "model",
"owned_by": "bedrock"
},
...
]
```
## Chat Completions API
### Claude Sonnet 4.5 Basic Example

Claude Sonnet 4.5 is Anthropic's most intelligent model, excelling at coding, complex reasoning, and agentic tasks. It is available through the global cross-region inference profile.

**Request Example**
```bash
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
"messages": [
{
"role": "user",
"content": "编写一个使用动态规划计算斐波那契数列的Python函数。"
}
]
}'
```
**SDK Usage Example**
```python
from openai import OpenAI
client = OpenAI()
completion = client.chat.completions.create(
model="global.anthropic.claude-sonnet-4-5-20250929-v1:0",
messages=[{"role": "user", "content": "编写一个使用动态规划计算斐波那契数列的Python函数。"}],
)
print(completion.choices[0].message.content)
```
## Embedding API

**Important**: Please read the following points carefully before using this proxy API:
@@ -167,6 +90,10 @@ print(doc_result[0][:5])
## Multimodal API

**Important**: Please read the following points carefully before using this proxy API for multimodal processing:
1. This API only supports Claude 3 models.

**Request Example**

```bash
@@ -257,6 +184,7 @@ curl $OPENAI_BASE_URL/chat/completions \
**Important**: Please read the following points carefully before using this proxy API for Tool Call:
1. OpenAI has deprecated Function Call in favor of Tool Call, so Function Call is not supported here; use Tool Call instead.
1. This API only supports Claude 3 models.

**Request Example**
@@ -354,222 +282,3 @@ curl $OPENAI_BASE_URL/chat/completions \
You can try it with different questions, such as:
1. Hello, who are you? (No tools are needed)
2. What is the weather like today? (Should use get_current_location tool first)
## Reasoning
**Important**: Please read the following points carefully before using this reasoning mode.

- Currently only the Claude 3.7 Sonnet and DeepSeek R1 models support reasoning. Make sure the model you are using supports it before enabling reasoning.
- Reasoning (thinking) mode for Claude 3.7 Sonnet is disabled by default. You must pass an extra `reasoning_effort` parameter in the request, with a value of `low`, `medium`, or `high`, and provide a correct `max_tokens` (or `max_completion_tokens`) parameter. `budget_tokens` is derived from `reasoning_effort` as 30%/60%/100% of the max tokens, with a minimum `budget_tokens` of 1,024; Anthropic recommends at least 4,000 tokens for thorough reasoning. See the [Bedrock Document](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-anthropic-claude-37.html) for details.
- DeepSeek R1 uses reasoning mode automatically; do not pass the extra `reasoning_effort` parameter in the request (otherwise an error is returned).
- The reasoning result (chain of thought, thinking process) is returned in an extra field named `reasoning_content`, which is not an officially supported OpenAI format. This design follows the [DeepSeek Reasoning Model](https://api-docs.deepseek.com/guides/reasoning_model#api-example) specification and may change in the future.
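The `reasoning_effort` → `budget_tokens` mapping described above can be sketched in Python. This is a hypothetical helper mirroring the documented 30%/60%/100% rule with the 1,024-token floor; the gateway's actual implementation may differ:

```python
def budget_from_effort(reasoning_effort: str, max_tokens: int) -> int:
    """Map reasoning_effort to a thinking budget_tokens value:
    low/medium/high -> 30%/60%/100% of the max tokens, floored at 1,024."""
    ratios = {"low": 0.3, "medium": 0.6, "high": 1.0}
    if reasoning_effort not in ratios:
        raise ValueError(f"unsupported reasoning_effort: {reasoning_effort}")
    return max(1024, int(max_tokens * ratios[reasoning_effort]))

# Anthropic recommends at least 4,000 thinking tokens for thorough reasoning,
# so prefer a larger max_tokens when using "low" or "medium".
print(budget_from_effort("low", 4096))  # 30% of 4096 -> 1228
```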
**Request Example**
- Claude 3.7 Sonnet
```bash
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
"messages": [
{
"role": "user",
"content": "which one is bigger, 3.9 or 3.11?"
}
],
"max_completion_tokens": 4096,
"reasoning_effort": "low",
"stream": false
}'
```
- DeepSeek R1
```bash
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "us.deepseek.r1-v1:0",
"messages": [
{
"role": "user",
"content": "which one is bigger, 3.9 or 3.11?"
}
],
"stream": false
}'
```
**Response Example**
```json
{
"id": "chatcmpl-83fb7a88",
"created": 1740545278,
"model": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
"system_fingerprint": "fp",
"choices": [
{
"index": 0,
"finish_reason": "stop",
"logprobs": null,
"message": {
"role": "assistant",
"content": "3.9 is bigger than 3.11.\n\nWhen comparing decimal numbers, we need to understand what these numbers actually represent:...",
"reasoning_content": "I need to compare the decimal numbers 3.9 and 3.11.\n\nFor decimal numbers, we first compare the whole number parts, and if they're equal, we compare the decimal parts. \n\nBoth numbers ..."
}
}
],
"object": "chat.completion",
"usage": {
"prompt_tokens": 51,
"completion_tokens": 565,
"total_tokens": 616
}
}
```
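Since `reasoning_content` is a plain extra field on the message object, reading it needs no special SDK support; a minimal sketch using an abbreviated copy of the response above:

```python
import json

# Abbreviated form of the response example shown above
raw = '''
{
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "3.9 is bigger than 3.11.",
        "reasoning_content": "I need to compare the decimal numbers 3.9 and 3.11."
      }
    }
  ]
}
'''
message = json.loads(raw)["choices"][0]["message"]
answer = message["content"]
# .get() keeps this working against standard OpenAI responses, where the field is absent
thinking = message.get("reasoning_content")
```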
Or use the OpenAI SDK (run `pip3 install -U openai` first to upgrade to the latest version):
- Non-Streaming
```python
from openai import OpenAI
client = OpenAI()
messages = [{"role": "user", "content": "which one is bigger, 3.9 or 3.11?"}]
response = client.chat.completions.create(
model="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
messages=messages,
reasoning_effort="low",
max_completion_tokens=4096,
)
reasoning_content = response.choices[0].message.reasoning_content
content = response.choices[0].message.content
```
- Streaming
```python
from openai import OpenAI
client = OpenAI()
messages = [{"role": "user", "content": "9.11 and 9.8, which is greater?"}]
response = client.chat.completions.create(
model="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
messages=messages,
reasoning_effort="low",
max_completion_tokens=4096,
stream=True,
)
reasoning_content = ""
content = ""
for chunk in response:
if hasattr(chunk.choices[0].delta, 'reasoning_content') and chunk.choices[0].delta.reasoning_content:
reasoning_content += chunk.choices[0].delta.reasoning_content
elif chunk.choices[0].delta.content:
content += chunk.choices[0].delta.content
```
## Interleaved thinking (beta)
**Important**: Please read the following carefully before using the reasoning mode of the Chat Completions API.

Claude 4 models support extended thinking with tool use, including [interleaved thinking](https://docs.aws.amazon.com/bedrock/latest/userguide/claude-messages-extended-thinking.html#claude-messages-extended-thinking-tool-use-interleaved). This feature lets Claude 4 think between tool calls and run more sophisticated reasoning after receiving tool results, which is helpful for more complex agentic AI interactions.

With interleaved thinking, `budget_tokens` can exceed the `max_tokens` parameter because it represents the total token budget across all thinking blocks within one assistant turn.

**Supported Models**: Claude Sonnet 4, Claude Sonnet 4.5

**Request Example**
- Non-Streaming (Claude Sonnet 4.5)
```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bedrock" \
-d '{
"model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
"max_tokens": 2048,
"messages": [{
"role": "user",
"content": "解释如何实现一个具有自平衡功能的二叉搜索树。"
}],
"extra_body": {
"anthropic_beta": ["interleaved-thinking-2025-05-14"],
"thinking": {"type": "enabled", "budget_tokens": 4096}
}
}'
```
- Non-Streaming (Claude Sonnet 4)
```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bedrock" \
-d '{
"model": "us.anthropic.claude-sonnet-4-20250514-v1:0",
"max_tokens": 2048,
"messages": [{
"role": "user",
"content": "有一天,一个女孩参加数学考试只得了 38 分。她心里对父亲的惩罚充满恐惧,于是偷偷把分数改成了 88 分。她的父亲看到试卷后,怒发冲冠,狠狠地给了她一巴掌,怒吼道:“你这 8 怎么一半是绿的一半是红的,你以为我是傻子吗?”女孩被打后,委屈地哭了起来,什么也没说。过了一会儿,父亲突然崩溃了。请问这位父亲为什么过一会崩溃了?"
}],
"extra_body": {
"anthropic_beta": ["interleaved-thinking-2025-05-14"],
"thinking": {"type": "enabled", "budget_tokens": 4096}
}
}'
```
- Streaming (Claude Sonnet 4.5)
```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bedrock" \
-d '{
"model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
"max_tokens": 2048,
"messages": [{
"role": "user",
"content": "解释如何实现一个具有自平衡功能的二叉搜索树。"
}],
"stream": true,
"extra_body": {
"anthropic_beta": ["interleaved-thinking-2025-05-14"],
"thinking": {"type": "enabled", "budget_tokens": 4096}
}
}'
```
- Streaming (Claude Sonnet 4)
```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bedrock" \
-d '{
"model": "us.anthropic.claude-sonnet-4-20250514-v1:0",
"max_tokens": 2048,
"messages": [{
"role": "user",
"content": "有一天,一个女孩参加数学考试只得了 38 分。她心里对父亲的惩罚充满恐惧,于是偷偷把分数改成了 88 分。她的父亲看到试卷后,怒发冲冠,狠狠地给了她一巴掌,怒吼道:“你这 8 怎么一半是绿的一半是红的,你以为我是傻子吗?”女孩被打后,委屈地哭了起来,什么也没说。过了一会儿,父亲突然崩溃了。请问这位父亲为什么过一会崩溃了?"
}],
"stream": true,
"extra_body": {
"anthropic_beta": ["interleaved-thinking-2025-05-14"],
"thinking": {"type": "enabled", "budget_tokens": 4096}
}
}'
```

@@ -1,21 +0,0 @@
-line-length = 120
-indent-width = 4
-target-version = "py312"
-
-exclude = [
-    ".venv",
-    ".vscode",
-    "test/*"
-]
-
-[lint]
-select = ["E", "F", "I"]
-ignore = [
-    "E501",
-    "C901",
-    "F401",
-]
-
-[format]
-# use double quotes for strings.
-quote-style = "double"

@@ -1,139 +1,35 @@
-# NOTE: The script will try to create the ECR repository if it doesn't exist. Please grant the necessary permissions to the IAM user or role.
-# Usage:
-#   cd scripts
-#   bash ./push-to-ecr.sh
-
-set -o errexit  # exit on first error
-set -o nounset  # exit on using unset variables
-set -o pipefail # exit on any error in a pipeline
-
-# Change to the directory where the script is located
-cd "$(dirname "$0")"
-
-# Prompt user for inputs
-echo "================================================"
-echo "Bedrock Access Gateway - Build and Push to ECR"
-echo "================================================"
-echo ""
-
-# Get repository name for Lambda version
-read -p "Enter ECR repository name for Lambda (default: bedrock-proxy-api): " LAMBDA_REPO
-LAMBDA_REPO=${LAMBDA_REPO:-bedrock-proxy-api}
-
-# Get repository name for ECS/Fargate version
-read -p "Enter ECR repository name for ECS/Fargate (default: bedrock-proxy-api-ecs): " ECS_REPO
-ECS_REPO=${ECS_REPO:-bedrock-proxy-api-ecs}
-
-# Get image tag
-read -p "Enter image tag (default: latest): " TAG
-TAG=${TAG:-latest}
-
-# Get AWS region
-read -p "Enter AWS region (default: us-east-1): " AWS_REGION
-AWS_REGION=${AWS_REGION:-us-east-1}
-
-echo ""
-echo "Configuration:"
-echo "  Lambda Repository: $LAMBDA_REPO"
-echo "  ECS/Fargate Repository: $ECS_REPO"
-echo "  Image Tag: $TAG"
-echo "  AWS Region: $AWS_REGION"
-echo ""
-read -p "Continue with these settings? (y/n): " CONFIRM
-if [[ ! "$CONFIRM" =~ ^[Yy]$ ]]; then
-    echo "Aborted."
-    exit 1
-fi
-echo ""
-
-# Acknowledgment about ECR repository creation
-echo " NOTICE: This script will automatically create ECR repositories if they don't exist."
-echo " The repositories will be created with the following default settings:"
-echo "   - Image tag mutability: MUTABLE (allows overwriting tags)"
-echo "   - Image scanning: Disabled"
-echo "   - Encryption: AES256 (AWS managed encryption)"
-echo ""
-echo " You can modify these settings later in the AWS ECR Console if needed."
-echo " Required IAM permissions: ecr:CreateRepository, ecr:GetAuthorizationToken,"
-echo " ecr:BatchCheckLayerAvailability, ecr:InitiateLayerUpload, ecr:UploadLayerPart,"
-echo " ecr:CompleteLayerUpload, ecr:PutImage"
-echo ""
-read -p "Do you acknowledge and want to proceed? (y/n): " ACK_CONFIRM
-if [[ ! "$ACK_CONFIRM" =~ ^[Yy]$ ]]; then
-    echo "Aborted."
-    exit 1
-fi
-echo ""
-
-# Define variables
-ARCHS=("arm64")  # Single architecture for simplicity
-
-build_and_push_image() {
-    local IMAGE_NAME=$1
-    local TAG=$2
-    local DOCKERFILE_PATH=$3
-    local REGION=$AWS_REGION
-    local ARCH=${ARCHS[0]}
-
-    echo "Building $IMAGE_NAME:$TAG..."
-    # Build Docker image
-    # Note: --provenance=false and --sbom=false are required for Lambda compatibility
-    # Without these flags, Docker BuildKit (especially with docker-container driver) may create
-    # OCI image manifests with attestations that AWS Lambda does not support.
-    # Lambda requires Docker V2 Schema 2 format without multi-manifest index.
-    # See: https://github.com/aws-samples/bedrock-access-gateway/issues/206
-    docker buildx build \
-        --platform linux/$ARCH \
-        --provenance=false \
-        --sbom=false \
-        -t $IMAGE_NAME:$TAG \
-        -f $DOCKERFILE_PATH \
-        --load \
-        ../src/
-
-    # Get the account ID
-    ACCOUNT_ID=$(aws sts get-caller-identity --region $REGION --query Account --output text)
-    # Create repository URI
-    REPOSITORY_URI="${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/${IMAGE_NAME}"
-
-    echo "Creating ECR repository if it doesn't exist..."
-    # Create ECR repository if it doesn't exist
-    aws ecr create-repository --repository-name "${IMAGE_NAME}" --region $REGION || true
-
-    echo "Logging in to ECR..."
-    # Log in to ECR
-    aws ecr get-login-password --region $REGION | docker login --username AWS --password-stdin $REPOSITORY_URI
-
-    echo "Pushing image to ECR..."
-    # Tag the image for ECR
-    docker tag $IMAGE_NAME:$TAG $REPOSITORY_URI:$TAG
-    # Push the image to ECR
-    docker push $REPOSITORY_URI:$TAG
-
-    echo "✅ Successfully pushed $IMAGE_NAME:$TAG to $REPOSITORY_URI"
-    echo ""
-}
-
-echo "Building and pushing Lambda image..."
-build_and_push_image "$LAMBDA_REPO" "$TAG" "../src/Dockerfile"
-
-echo "Building and pushing ECS/Fargate image..."
-build_and_push_image "$ECS_REPO" "$TAG" "../src/Dockerfile_ecs"
-
-echo "================================================"
-echo "✅ All images successfully pushed!"
-echo "================================================"
-echo ""
-echo "Your container image URIs:"
-ACCOUNT_ID=$(aws sts get-caller-identity --region $AWS_REGION --query Account --output text)
-echo "  Lambda:      ${ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/${LAMBDA_REPO}:${TAG}"
-echo "  ECS/Fargate: ${ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/${ECS_REPO}:${TAG}"
-echo ""
-echo "Next steps:"
-echo "  1. Download the CloudFormation templates from deployment/ folder"
-echo "  2. Update the ContainerImageUri parameter with your image URI above"
-echo "  3. Deploy the stack via AWS CloudFormation Console"
-echo ""
+# Make sure you have created the Repo in AWS ECR in every regions you want to push to before executing this script.
+# Usage:
+#   cd scripts
+#   chmod +x push-to-ecr.sh
+#   ./push-to-ecr.sh
+
+# Define variables
+IMAGE_NAME="bedrock-proxy-api"
+TAG="latest"
+AWS_REGIONS=("us-west-2") # List of AWS regions
+#AWS_REGIONS=("us-east-1" "us-west-2" "eu-central-1" "ap-southeast-1" "ap-northeast-1") # List of AWS regions
+
+# Build Docker image
+docker build -t $IMAGE_NAME:$TAG ../src/
+
+# Loop through each AWS region
+for REGION in "${AWS_REGIONS[@]}"
+do
+    # Get the account ID for the current region
+    ACCOUNT_ID=$(aws sts get-caller-identity --region $REGION --query Account --output text)
+    # Create repository URI
+    REPOSITORY_URI="${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/${IMAGE_NAME}"
+    # Log in to ECR
+    aws ecr get-login-password --region $REGION | docker login --username AWS --password-stdin $REPOSITORY_URI
+    # Tag the image for the current region
+    docker tag $IMAGE_NAME:$TAG $REPOSITORY_URI:$TAG
+    # Push the image to ECR
+    docker push $REPOSITORY_URI:$TAG
+    echo "Pushed $IMAGE_NAME:$TAG to $REPOSITORY_URI"
+done

@@ -1,19 +1,9 @@
 FROM public.ecr.aws/lambda/python:3.12
 
-# Add Lambda Web Adapter for API Gateway response streaming
-COPY --from=public.ecr.aws/awsguru/aws-lambda-adapter:0.9.1 /lambda-adapter /opt/extensions/lambda-adapter
-
 COPY ./api ./api
 COPY requirements.txt .
 
 RUN pip3 install -r requirements.txt -U --no-cache-dir
 
-# Preload tiktoken encoding: https://github.com/aws-samples/bedrock-access-gateway/issues/118
-ENV TIKTOKEN_CACHE_DIR=/var/task/.cache/tiktoken
-RUN python3 -c 'import tiktoken_ext.openai_public as tke; tke.cl100k_base()'
-
-# Lambda Web Adapter requires overriding the Lambda base image entrypoint
-# to run the web app directly instead of the Lambda runtime handler
-ENTRYPOINT []
-CMD ["python", "-m", "uvicorn", "api.app:app", "--host", "0.0.0.0", "--port", "8080"]
+CMD [ "api.app.handler" ]

@@ -1,4 +1,4 @@
-FROM public.ecr.aws/docker/library/python:3.13-slim
+FROM python:3.12-slim
 
 WORKDIR /app
@@ -8,19 +8,4 @@ RUN pip install --no-cache-dir --upgrade -r /app/requirements.txt
 
 COPY ./api /app/api
 
-# Create non-root user
-RUN groupadd -r appuser && useradd -r -g appuser appuser && \
-    chown -R appuser:appuser /app
-USER appuser
-
-# Preload tiktoken encoding: https://github.com/aws-samples/bedrock-access-gateway/issues/118
-ENV TIKTOKEN_CACHE_DIR=/app/.cache/tiktoken
-RUN python3 -c 'import tiktoken_ext.openai_public as tke; tke.cl100k_base()'
-
-ENV PORT=8080
-
-HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
-    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:${PORT}/health').read()"
-
-CMD ["sh", "-c", "uvicorn api.app:app --host 0.0.0.0 --port ${PORT}"]
+CMD ["uvicorn", "api.app:app", "--host", "0.0.0.0", "--port", "80"]

@@ -1,5 +1,4 @@
 import logging
-import os
 
 import uvicorn
 from fastapi import FastAPI
@@ -8,8 +7,8 @@ from fastapi.middleware.cors import CORSMiddleware
 from fastapi.responses import PlainTextResponse
 from mangum import Mangum
 
-from api.routers import chat, embeddings, model
-from api.setting import API_ROUTE_PREFIX, DESCRIPTION, SUMMARY, TITLE, VERSION
+from api.routers import model, chat, embeddings
+from api.setting import API_ROUTE_PREFIX, TITLE, DESCRIPTION, SUMMARY, VERSION
 
 config = {
     "title": TITLE,
@@ -24,22 +23,14 @@ logging.basicConfig(
 )
 
 app = FastAPI(**config)
-allowed_origins = os.environ.get("ALLOWED_ORIGINS", "*")
-origins_list = [origin.strip() for origin in allowed_origins.split(",")] if allowed_origins != "*" else ["*"]
-
-# Warn if CORS allows all origins
-if origins_list == ["*"]:
-    logging.warning("CORS is configured to allow all origins (*). Set ALLOWED_ORIGINS environment variable to restrict access.")
 
 app.add_middleware(
     CORSMiddleware,
-    allow_origins=origins_list,  # nosec - configurable via ALLOWED_ORIGINS env var
+    allow_origins=["*"],
     allow_credentials=True,
     allow_methods=["*"],
     allow_headers=["*"],
 )
 
 app.include_router(model.router, prefix=API_ROUTE_PREFIX)
 app.include_router(chat.router, prefix=API_ROUTE_PREFIX)
 app.include_router(embeddings.router, prefix=API_ROUTE_PREFIX)
@@ -53,21 +44,10 @@ async def health():
 @app.exception_handler(RequestValidationError)
 async def validation_exception_handler(request, exc):
-    logger = logging.getLogger(__name__)
-    # Log essential info only - avoid sensitive data and performance overhead
-    logger.warning(
-        "Request validation failed: %s %s - %s",
-        request.method,
-        request.url.path,
-        str(exc).split('\n')[0]  # First line only
-    )
     return PlainTextResponse(str(exc), status_code=400)
 
 handler = Mangum(app)
 
 if __name__ == "__main__":
-    # Bind to 0.0.0.0 for container environments, network is handled by network policies and load balancers
-    uvicorn.run("app:app", host="0.0.0.0", port=8000, reload=False)  # nosec B104
+    uvicorn.run("app:app", host="0.0.0.0", port=8000, reload=True)

@@ -1,43 +1,28 @@
-import json
 import os
 from typing import Annotated
 
 import boto3
-from botocore.exceptions import ClientError
 from fastapi import Depends, HTTPException, status
-from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer
+from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
+
+from api.setting import DEFAULT_API_KEYS
 
 api_key_param = os.environ.get("API_KEY_PARAM_NAME")
-api_key_secret_arn = os.environ.get("API_KEY_SECRET_ARN")
-api_key_env = os.environ.get("API_KEY")
-
 if api_key_param:
-    # For backward compatibility.
-    # Please now use secrets manager instead.
     ssm = boto3.client("ssm")
-    api_key = ssm.get_parameter(Name=api_key_param, WithDecryption=True)["Parameter"]["Value"]
-elif api_key_secret_arn:
-    sm = boto3.client("secretsmanager")
-    try:
-        response = sm.get_secret_value(SecretId=api_key_secret_arn)
-        if "SecretString" in response:
-            secret = json.loads(response["SecretString"])
-            api_key = secret["api_key"]
-    except ClientError:
-        raise RuntimeError("Unable to retrieve API KEY, please ensure the secret ARN is correct")
-    except KeyError:
-        raise RuntimeError('Please ensure the secret contains a "api_key" field')
-elif api_key_env:
-    api_key = api_key_env
+    api_key = ssm.get_parameter(Name=api_key_param, WithDecryption=True)["Parameter"][
+        "Value"
+    ]
 else:
-    raise RuntimeError(
-        "API Key is not configured. Please set up your API Key."
-    )
+    api_key = DEFAULT_API_KEYS
 
 security = HTTPBearer()
 
 
 def api_key_auth(
-    credentials: Annotated[HTTPAuthorizationCredentials, Depends(security)],
+    credentials: Annotated[HTTPAuthorizationCredentials, Depends(security)]
 ):
     if credentials.credentials != api_key:
-        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid API Key")
+        raise HTTPException(
+            status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid API Key"
+        )

@@ -1,4 +1,3 @@
-import logging
 import time
 import uuid
 from abc import ABC, abstractmethod
@@ -6,17 +5,14 @@ from typing import AsyncIterable
 
 from api.schema import (
     # Chat
-    ChatRequest,
     ChatResponse,
+    ChatRequest,
     ChatStreamResponse,
     # Embeddings
     EmbeddingsRequest,
     EmbeddingsResponse,
-    Error,
 )
 
-logger = logging.getLogger(__name__)
-
 
 class BaseChatModel(ABC):
     """Represent a basic chat model
@@ -33,12 +29,12 @@ class BaseChatModel(ABC):
         pass
 
     @abstractmethod
-    async def chat(self, chat_request: ChatRequest) -> ChatResponse:
+    def chat(self, chat_request: ChatRequest) -> ChatResponse:
         """Handle a basic chat completion requests."""
         pass
 
     @abstractmethod
-    async def chat_stream(self, chat_request: ChatRequest) -> AsyncIterable[bytes]:
+    def chat_stream(self, chat_request: ChatRequest) -> AsyncIterable[bytes]:
         """Handle a basic chat completion requests with stream response."""
         pass
@@ -47,20 +43,16 @@ class BaseChatModel(ABC):
         return "chatcmpl-" + str(uuid.uuid4())[:8]
 
     @staticmethod
-    def stream_response_to_bytes(response: ChatStreamResponse | Error | None = None) -> bytes:
-        if isinstance(response, Error):
-            logger.error("Stream error: %s", response.error.message if response.error else "Unknown error")
-            data = response.model_dump_json()
-        elif isinstance(response, ChatStreamResponse):
+    def stream_response_to_bytes(
+        response: ChatStreamResponse | None = None
+    ) -> bytes:
+        if response:
             # to populate other fields when using exclude_unset=True
             response.system_fingerprint = "fp"
             response.object = "chat.completion.chunk"
             response.created = int(time.time())
-            data = response.model_dump_json(exclude_unset=True)
-        else:
-            data = "[DONE]"
-        return f"data: {data}\n\n".encode("utf-8")
+            return "data: {}\n\n".format(response.model_dump_json(exclude_unset=True)).encode("utf-8")
+        return "data: [DONE]\n\n".encode("utf-8")
 
 
 class BaseEmbeddingsModel(ABC):

File diff suppressed because it is too large.
@@ -1,11 +1,11 @@
 from typing import Annotated
 
-from fastapi import APIRouter, Body, Depends
+from fastapi import APIRouter, Depends, Body
 from fastapi.responses import StreamingResponse
 
 from api.auth import api_key_auth
 from api.models.bedrock import BedrockModel
-from api.schema import ChatRequest, ChatResponse, ChatStreamResponse, Error
+from api.schema import ChatRequest, ChatResponse, ChatStreamResponse
 from api.setting import DEFAULT_MODEL
 
 router = APIRouter(
@@ -15,9 +15,7 @@ router = APIRouter(
 )
 
-@router.post(
-    "/completions", response_model=ChatResponse | ChatStreamResponse | Error, response_model_exclude_unset=True
-)
+@router.post("/completions", response_model=ChatResponse | ChatStreamResponse, response_model_exclude_unset=True)
 async def chat_completions(
     chat_request: Annotated[
         ChatRequest,
@@ -32,7 +30,7 @@ async def chat_completions(
             }
         ],
         ),
-    ],
+    ]
 ):
     if chat_request.model.lower().startswith("gpt-"):
         chat_request.model = DEFAULT_MODEL
@@ -41,5 +39,7 @@ async def chat_completions(
     model = BedrockModel()
     model.validate(chat_request)
     if chat_request.stream:
-        return StreamingResponse(content=model.chat_stream(chat_request), media_type="text/event-stream")
-    return await model.chat(chat_request)
+        return StreamingResponse(
+            content=model.chat_stream(chat_request), media_type="text/event-stream"
+        )
+    return model.chat(chat_request)

@@ -1,6 +1,6 @@
 from typing import Annotated
 
-from fastapi import APIRouter, Body, Depends
+from fastapi import APIRouter, Depends, Body
 
 from api.auth import api_key_auth
 from api.models.bedrock import get_embeddings_model
@@ -21,11 +21,13 @@ async def embeddings(
             examples=[
                 {
                     "model": "cohere.embed-multilingual-v3",
-                    "input": ["Your text string goes here"],
+                    "input": [
+                        "Your text string goes here"
+                    ],
                 }
             ],
         ),
-    ],
+    ]
 ):
     if embeddings_request.model.lower().startswith("text-embedding-"):
         embeddings_request.model = DEFAULT_EMBEDDING_MODEL

@@ -4,7 +4,7 @@ from fastapi import APIRouter, Depends, HTTPException, Path
 from api.auth import api_key_auth
 from api.models.bedrock import BedrockModel
-from api.schema import Model, Models
+from api.schema import Models, Model
 
 router = APIRouter(
     prefix="/models",
@@ -22,7 +22,9 @@ async def validate_model_id(model_id: str):
 
 @router.get("", response_model=Models)
 async def list_models():
-    model_list = [Model(id=model_id) for model_id in chat_model.list_models()]
+    model_list = [
+        Model(id=model_id) for model_id in chat_model.list_models()
+    ]
     return Models(data=model_list)
 
@@ -34,7 +36,7 @@ async def get_model(
     model_id: Annotated[
         str,
         Path(description="Model ID", example="anthropic.claude-3-sonnet-20240229-v1:0"),
-    ],
+    ]
 ):
     await validate_model_id(model_id)
     return Model(id=model_id)

@@ -1,10 +1,8 @@
import time import time
from typing import Iterable, Literal from typing import Literal, Iterable
from pydantic import BaseModel, Field from pydantic import BaseModel, Field
from api.setting import DEFAULT_MODEL
class Model(BaseModel): class Model(BaseModel):
id: str id: str
@@ -41,15 +39,10 @@ class ImageUrl(BaseModel):
class ImageContent(BaseModel): class ImageContent(BaseModel):
type: Literal["image_url"] = "image_url" type: Literal["image_url"] = "image"
image_url: ImageUrl image_url: ImageUrl
class ToolContent(BaseModel):
type: Literal["text"] = "text"
text: str
class SystemMessage(BaseModel): class SystemMessage(BaseModel):
name: str | None = None name: str | None = None
role: Literal["system"] = "system" role: Literal["system"] = "system"
@@ -65,20 +58,14 @@ class UserMessage(BaseModel):
class AssistantMessage(BaseModel): class AssistantMessage(BaseModel):
name: str | None = None name: str | None = None
role: Literal["assistant"] = "assistant" role: Literal["assistant"] = "assistant"
content: str | list[TextContent | ImageContent] | None = None content: str | list[TextContent | ImageContent] | None
tool_calls: list[ToolCall] | None = None tool_calls: list[ToolCall] | None = None
class ToolMessage(BaseModel): class ToolMessage(BaseModel):
role: Literal["tool"] = "tool" role: Literal["tool"] = "tool"
content: str | list[ToolContent] | list[dict]
tool_call_id: str
class DeveloperMessage(BaseModel):
name: str | None = None
role: Literal["developer"] = "developer"
content: str content: str
tool_call_id: str
class Function(BaseModel): class Function(BaseModel):
@@ -97,43 +84,25 @@ class StreamOptions(BaseModel):
 class ChatRequest(BaseModel):
-    messages: list[SystemMessage | UserMessage | AssistantMessage | ToolMessage | DeveloperMessage]
-    model: str = DEFAULT_MODEL
+    messages: list[SystemMessage | UserMessage | AssistantMessage | ToolMessage]
+    model: str
     frequency_penalty: float | None = Field(default=0.0, le=2.0, ge=-2.0)  # Not used
     presence_penalty: float | None = Field(default=0.0, le=2.0, ge=-2.0)  # Not used
     stream: bool | None = False
     stream_options: StreamOptions | None = None
-    temperature: float | None = Field(default=None, le=2.0, ge=0.0)
-    top_p: float | None = Field(default=None, le=1.0, ge=0.0)
+    temperature: float | None = Field(default=1.0, le=2.0, ge=0.0)
+    top_p: float | None = Field(default=1.0, le=1.0, ge=0.0)
     user: str | None = None  # Not used
     max_tokens: int | None = 2048
-    max_completion_tokens: int | None = None
-    reasoning_effort: Literal["low", "medium", "high"] | None = None
     n: int | None = 1  # Not used
     tools: list[Tool] | None = None
     tool_choice: str | object = "auto"
-    stop: list[str] | str | None = None
-    extra_body: dict | None = None
-
-
-class PromptTokensDetails(BaseModel):
-    """Details about prompt tokens usage, following OpenAI API format."""
-
-    cached_tokens: int = 0
-    audio_tokens: int = 0
-
-
-class CompletionTokensDetails(BaseModel):
-    """Details about completion tokens usage, following OpenAI API format."""
-
-    reasoning_tokens: int = 0
-    audio_tokens: int = 0


 class Usage(BaseModel):
     prompt_tokens: int
     completion_tokens: int
     total_tokens: int
-    prompt_tokens_details: PromptTokensDetails | None = None
-    completion_tokens_details: CompletionTokensDetails | None = None


 class ChatResponseMessage(BaseModel):
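One behavioral consequence of this hunk worth flagging in review: on `main`, `model` defaults to `DEFAULT_MODEL`, while on `dev` it is a required field, so requests that omit `"model"` fail validation instead of falling back. A dependency-free sketch of that difference, using stdlib dataclasses in place of pydantic (class names here are illustrative, not the project's):

```python
from dataclasses import dataclass

DEFAULT_MODEL = "anthropic.claude-3-sonnet-20240229-v1:0"


@dataclass
class ChatRequestMain:
    # `main` variant: omitting the model falls back to DEFAULT_MODEL
    messages: list
    model: str = DEFAULT_MODEL


@dataclass
class ChatRequestDev:
    # `dev` variant: `model` has no default, so it must be supplied
    messages: list
    model: str


req = ChatRequestMain(messages=[{"role": "user", "content": "hi"}])
print(req.model)  # -> "anthropic.claude-3-sonnet-20240229-v1:0"

try:
    ChatRequestDev(messages=[])  # missing required `model`
except TypeError as exc:
    print("rejected:", exc)
```

Pydantic enforces the same required/optional split at validation time; dataclasses merely surface it as a `TypeError` at construction.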
@@ -141,7 +110,6 @@ class ChatResponseMessage(BaseModel):
     role: Literal["assistant"] | None = None
     content: str | None = None
     tool_calls: list[ToolCall] | None = None
-    reasoning_content: str | None = None


 class BaseChoice(BaseModel):
@@ -182,7 +150,7 @@ class EmbeddingsRequest(BaseModel):
     input: str | list[str] | Iterable[int | Iterable[int]]
     model: str
     encoding_format: Literal["float", "base64"] = "float"
-    dimensions: int | None = None  # Used by Nova embeddings; ignored by other models.
+    dimensions: int | None = None  # not used.
     user: str | None = None  # not used.
@@ -202,11 +170,3 @@ class EmbeddingsResponse(BaseModel):
     data: list[Embedding]
     model: str
     usage: EmbeddingsUsage
-
-
-class ErrorMessage(BaseModel):
-    message: str
-
-
-class Error(BaseModel):
-    error: ErrorMessage


@@ -1,18 +1,28 @@
 import os

-API_ROUTE_PREFIX = os.environ.get("API_ROUTE_PREFIX", "/api/v1")
+DEFAULT_API_KEYS = "bedrock"
+
+API_ROUTE_PREFIX = "/api/v1"

 TITLE = "Amazon Bedrock Proxy APIs"
 SUMMARY = "OpenAI-Compatible RESTful APIs for Amazon Bedrock"
 VERSION = "0.1.0"
 DESCRIPTION = """
 Use OpenAI-Compatible RESTful APIs for Amazon Bedrock models.
+
+List of Amazon Bedrock models currently supported:
+
+- Anthropic Claude 2 / 3 /3.5 (Haiku/Sonnet/Opus)
+- Meta Llama 2 / 3
+- Mistral / Mixtral
+- Cohere Command R / R+
+- Cohere Embedding
 """

 DEBUG = os.environ.get("DEBUG", "false").lower() != "false"
 AWS_REGION = os.environ.get("AWS_REGION", "us-west-2")
-DEFAULT_MODEL = os.environ.get("DEFAULT_MODEL", "anthropic.claude-3-sonnet-20240229-v1:0")
-DEFAULT_EMBEDDING_MODEL = os.environ.get("DEFAULT_EMBEDDING_MODEL", "cohere.embed-multilingual-v3")
-ENABLE_CROSS_REGION_INFERENCE = os.environ.get("ENABLE_CROSS_REGION_INFERENCE", "true").lower() != "false"
-ENABLE_APPLICATION_INFERENCE_PROFILES = os.environ.get("ENABLE_APPLICATION_INFERENCE_PROFILES", "true").lower() != "false"
-ENABLE_PROMPT_CACHING = os.environ.get("ENABLE_PROMPT_CACHING", "false").lower() != "false"
+DEFAULT_MODEL = os.environ.get(
+    "DEFAULT_MODEL", "anthropic.claude-3-sonnet-20240229-v1:0"
+)
+DEFAULT_EMBEDDING_MODEL = os.environ.get(
+    "DEFAULT_EMBEDDING_MODEL", "cohere.embed-multilingual-v3"
+)
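A review note on the flag parsing used throughout this settings file: the `os.environ.get(...).lower() != "false"` pattern treats *any* value other than the literal string `"false"` as enabled, including `"0"` and `"no"`. A small sketch of that behavior (the helper name `parse_flag` is illustrative, not from the repository):

```python
import os


def parse_flag(name: str, default: str = "false") -> bool:
    # Mirrors the settings pattern: only the literal string "false"
    # (case-insensitive) disables the flag.
    return os.environ.get(name, default).lower() != "false"


os.environ["DEBUG"] = "0"
print(parse_flag("DEBUG"))  # True -- "0" still enables debug

os.environ["DEBUG"] = "False"
print(parse_flag("DEBUG"))  # False
```

If stricter parsing is wanted, comparing against an allowlist such as `{"1", "true", "yes"}` avoids the surprising `"0"` case.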


@@ -1,10 +1,9 @@
-fastapi==0.128.0
-starlette==0.49.1  # CVE-2025-62727: Fix ReDoS in Range header parsing
-pydantic==2.11.4
+fastapi==0.111.0
+pydantic==2.7.1
 uvicorn==0.29.0
 mangum==0.17.0
-tiktoken==0.9.0
-requests==2.32.4
-numpy==2.2.5
-boto3==1.40.4
-botocore==1.40.4
+tiktoken==0.6.0
+requests==2.32.3
+numpy==1.26.4
+boto3==1.34.132
+botocore==1.34.132


@@ -0,0 +1,87 @@
import time
import random


def calculate_factorial(n):
    if n == 0:
        return 1
    else:
        return n * calculate_factorial(n - 1)


def find_largest_number(numbers):
    largest = numbers[0]
    for num in numbers:
        if num > largest:
            largest = num
    return largest


def inefficient_sort(arr):
    n = len(arr)
    for i in range(n):
        for j in range(0, n - i - 1):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
    return arr


class User:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def print_user_info(self):
        print(f"Name: {self.name}, Age: {self.age}")


def process_data(data):
    result = []
    for item in data:
        if item % 2 == 0:
            result.append(item * 2)
        else:
            result.append(item * 3)
    return result


def generate_random_numbers(n):
    numbers = []
    for i in range(n):
        numbers.append(random.randint(1, 100))
    return numbers


def calculate_average(numbers):
    total = sum(numbers)
    count = len(numbers)
    average = total / count
    return average


def main():
    # Inefficient factorial calculation
    print(calculate_factorial(20))

    # Unnecessary loop for finding largest number
    numbers = [3, 7, 2, 9, 1, 5]
    print(find_largest_number(numbers))

    # Inefficient sorting algorithm
    unsorted_list = [64, 34, 25, 12, 22, 11, 90]
    print(inefficient_sort(unsorted_list))

    # Inconsistent naming convention
    user1 = User("John Doe", 30)
    user1.print_user_info()

    # Redundant if-else structure
    data = [1, 2, 3, 4, 5]
    print(process_data(data))

    # Inefficient random number generation
    random_numbers = generate_random_numbers(1000000)
    print(f"Generated {len(random_numbers)} random numbers")

    # Potential division by zero
    empty_list = []
    print(calculate_average(empty_list))

    # Unnecessary time delay
    time.sleep(5)
    print("Finished processing after 5 seconds")


if __name__ == "__main__":
    main()
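Per the commit message, the file above is a deliberately error-laden fixture for exercising the code-review action, so the issues are left in place. For reference, a sketch of the fixes a reviewer would likely suggest (these replacements are illustrative, not part of the repository):

```python
import math
import random


def calculate_factorial(n):
    # math.factorial replaces the deep-recursion version
    return math.factorial(n)


def find_largest_number(numbers):
    # built-in max() replaces the manual scan
    return max(numbers)


def efficient_sort(arr):
    # Timsort via sorted() replaces the O(n^2) bubble sort
    return sorted(arr)


def generate_random_numbers(n):
    # comprehension replaces repeated append
    return [random.randint(1, 100) for _ in range(n)]


def calculate_average(numbers):
    # guard the empty-list case that main() triggers above
    if not numbers:
        return 0.0
    return sum(numbers) / len(numbers)


print(calculate_factorial(20))                    # 2432902008176640000
print(find_largest_number([3, 7, 2, 9, 1, 5]))    # 9
print(efficient_sort([64, 34, 25, 12, 22, 11, 90]))
print(calculate_average([]))                      # 0.0, no ZeroDivisionError
```

A review bot that surfaces these same points against the fixture is behaving as intended.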