Compare commits

85 Commits
dev ... main

Author SHA1 Message Date
Donghee Na
737cf076a0 fix: Fix ImageContent schema to use proper default value (#234) 2026-03-13 10:42:22 +08:00
Kane Zhu
6ae73c0c69 fix: merge additionalModelRequestFields instead of overwriting
When both reasoning_effort and extra_body are provided,
additionalModelRequestFields set by reasoning_effort (containing
reasoning_config) was silently overwritten by extra_body processing.
This prevented features like anthropic_beta for 1M context from
coexisting with reasoning_effort.
2026-03-10 16:41:52 +08:00
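The commit above describes replacing a destructive dict overwrite with a merge, so fields set earlier (such as `reasoning_config`) survive `extra_body` processing. A minimal sketch of that approach — function and field names here are illustrative, not the project's actual code:

```python
# Illustrative sketch: deep-merge extra_body-provided fields into the
# existing additionalModelRequestFields dict instead of replacing it,
# so reasoning_config (set by reasoning_effort) and anthropic_beta
# (set via extra_body) can coexist.
def merge_model_request_fields(existing: dict, extra: dict) -> dict:
    merged = dict(existing)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Recurse so nested dicts are merged key-by-key, not clobbered.
            merged[key] = merge_model_request_fields(merged[key], value)
        else:
            merged[key] = value
    return merged


fields = {"reasoning_config": {"type": "enabled", "budget_tokens": 2048}}
extra = {"anthropic_beta": ["context-1m"]}  # hypothetical beta flag value
print(merge_model_request_fields(fields, extra))
```

With a plain `existing.update(extra)` or full replacement, `reasoning_config` would be lost whenever `extra_body` was present; the recursive merge keeps both.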
Donghee Na
d1dc4ed164 fix: Support reasoning_tokens at bedrock streaming response (#223) 2026-02-26 11:48:05 +08:00
Gabriel Koo
d14596ff47 feat: add Amazon Nova 2 multimodal embeddings support (#222)
* feat: add Amazon Nova 2 multimodal embeddings support

Adds support for `amazon.nova-2-multimodal-embeddings-v1:0` via the
new `NovaEmbeddingsModel` class, using the `taskType`/`singleEmbeddingParams`
request format documented in the Nova 2 user guide.

- Supports single and batch text inputs
- Respects the `dimensions` parameter (256/512/1024/2048/3072, default 3072)
- Supports `float` and `base64` encoding formats
- Includes `test_nova_embed.py` for quick end-to-end verification

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: remove test script from repo

Test script moved to PR description instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: validate Nova embedding dimensions and fix falsy-zero bug

- Add VALID_DIMENSIONS set and upfront validation with a clear error message
- Fix `dimensions or DEFAULT` which would incorrectly ignore dimensions=0
- Add inline comment explaining approximate token counting (Nova API
  does not return token counts in the response)
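The `dimensions or DEFAULT` pitfall mentioned above is a classic falsy-zero bug: `0 or 3072` evaluates to `3072`, so an explicit `dimensions=0` was silently replaced by the default instead of being rejected. A sketch of the corrected pattern, using the valid-dimension set from the follow-up review fix (names are illustrative):

```python
DEFAULT_DIMENSIONS = 3072
VALID_DIMENSIONS = {256, 384, 1024, 3072}


def resolve_dimensions(dimensions):
    # Buggy form:   resolved = dimensions or DEFAULT_DIMENSIONS
    # (treats 0 as "unset" because 0 is falsy).
    # Correct form: substitute the default only when the value is None.
    resolved = DEFAULT_DIMENSIONS if dimensions is None else dimensions
    if resolved not in VALID_DIMENSIONS:
        raise ValueError(
            f"dimensions must be one of {sorted(VALID_DIMENSIONS)}, got {resolved}"
        )
    return resolved
```

With the explicit `None` check, `dimensions=0` now reaches the validation and raises a clear error rather than silently becoming 3072.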

* fix: address PR review comments for NovaEmbeddingsModel

- Fix VALID_DIMENSIONS to {256, 384, 1024, 3072} per Nova embeddings schema docs
  (previous values 512/2048 were mistakenly referenced from Titan embedding model docs)
- Replace str(item) fallback with HTTPException(400) to avoid silent garbage embeddings
- Update schema.py dimensions comment: 'not used' -> 'Used by Nova embeddings'
- Replace getattr() with direct .dimensions access on Pydantic model
- Move dimension validation before the loop (validates once, not per-text)
- Add enumerate to batch loop; include input index in error detail
- Switch isinstance(item, Iterable) to isinstance(item, list) for precise matching
- Add comment explaining embeddingPurpose hardcoded to GENERIC_INDEX

---------

Co-authored-by: Gabriel <gabrielkoo@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 11:41:17 +08:00
mjkam
a1844f95d4 Preload tiktoken encoding in Dockerfile (Lambda) (#220)
PR #193 added tiktoken preloading to Dockerfile_ecs but the same fix
was not applied to the Lambda Dockerfile. This causes a ConnectTimeout
error in network-restricted environments (e.g. Lambda in VPC without
NAT Gateway) when tiktoken tries to download cl100k_base encoding at
runtime from openaipublic.blob.core.windows.net.

Cache the encoding at build time, consistent with Dockerfile_ecs.

Related to #118
2026-02-19 17:00:05 +08:00
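The build-time preload the commit describes amounts to importing the encoding once during `docker build`, so the cached files are baked into the image. An illustrative Dockerfile line (the exact line in the repo may differ):

```dockerfile
# Cache the cl100k_base encoding into the image at build time so tiktoken
# never reaches out to openaipublic.blob.core.windows.net at runtime
# (which fails in network-restricted environments like a VPC Lambda
# without a NAT Gateway).
RUN python -c "import tiktoken; tiktoken.get_encoding('cl100k_base')"
```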
Hooman Yar
a150f7bb1c fix: support continue response for claude opus 4.6 (#219)
Co-authored-by: Hooman Yar <yarhooma@amazon.com>
2026-02-12 15:21:50 +08:00
Mengxin Zhu
9b3da3a5c8 fix(deps): update fastapi and starlette for CVE-2025-62727 (#216)
Update dependencies to fix HIGH severity ReDoS vulnerability:
- fastapi==0.128.0
- starlette==0.49.1

CVE-2025-62727 allows unauthenticated attackers to send crafted HTTP
Range headers that trigger quadratic-time processing in FileResponse
Range parsing, causing CPU exhaustion and DoS.

Fixes #215
2026-01-19 11:57:01 +08:00
Angélica de Oliveira
1a7f55b89b Add support for 'developer' role in chat messages (#209) 2025-12-09 11:26:10 +08:00
Mengxin Zhu
b41633b826 feat(apigw): add API Gateway response streaming support (#207)
Replace ALB + Lambda architecture with API Gateway REST API + Lambda
using response streaming for SSE support. This provides:

- No VPC required, reducing complexity and cost
- Native streaming support via API Gateway response streaming
- Pay-per-request pricing model

Changes:
- Add Lambda Web Adapter to Dockerfile for streaming support
- Replace BedrockProxy.template with API Gateway configuration
- Update README with new deployment options and latest models
- Update architecture diagram for API Gateway flow
2025-12-05 10:54:13 +08:00
Hooman Yar
0411454b3a feat: add claude-opus-4-5 to TEMPERATURE_TOPP_CONFLICT_MODELS set (#208)
Co-authored-by: Hooman Yar <yarhooma@amazon.com>
2025-12-05 09:22:37 +08:00
Kane Zhu
2c518bbd70 fix(docker): add --provenance=false --sbom=false for Lambda compatibility
Docker BuildKit (especially with docker-container driver) may create
OCI image manifests with attestations that AWS Lambda does not support.
Lambda requires Docker V2 Schema 2 format without multi-manifest index.

This fix ensures the build script generates Lambda-compatible images
regardless of the user's Docker/BuildKit configuration.

Fixes #206
2025-11-27 18:54:58 +08:00
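The flags named in the commit, shown in an illustrative build invocation (the image tag and platform here are assumptions, not the project's actual script):

```bash
# --provenance=false --sbom=false keep BuildKit from wrapping the image in
# an OCI attestation index; AWS Lambda requires a plain Docker V2 Schema 2
# manifest and rejects multi-manifest indexes.
docker buildx build --provenance=false --sbom=false \
  --platform linux/arm64 -t bedrock-proxy-api:latest --load .
```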
Justin Dray
37374e79ba fix: Allow the push-to-ecr.sh script to run from anywhere instead of requiring the user to cd manually (#202)
* fix: Allow the push-to-ecr.sh script to run from anywhere instead of requiring the user to cd manually

* Add docker-compose to support running locally
2025-11-20 14:33:43 +08:00
Viktor Isaev
b3c1c82367 Fix healthcheck in Dockerfile_ecs (#199)
The healthcheck in Dockerfile_ecs used a hardcoded port instead of the ENV setting; it now uses the configured port.
2025-11-20 14:30:00 +08:00
user-error1
ce4cfabb21 Fixed <think> </think> tags for GPT-OSS in bedrock.py (#200)
Added handling for message and content block deltas, including safety checks for open thinking tags.

Results in working reasoning and makes GPT-OSS 80/120b usable in frontends that expect closing thinking tags.
2025-11-20 14:29:20 +08:00
Donghee Na
7e03ab062d fix: Fix invalid cache_creation_tokens metric key (#195) 2025-10-27 14:31:21 +08:00
Shion Ichikawa
18b68bd3a7 🐳 preload tiktoken encoding in Dockerfile_ecs (#193) 2025-10-22 22:28:40 +08:00
Kane Zhu
d86e64eed3 refactor(bedrock): unify inference profile metadata handling and cleanup
- Add unified profile_metadata dictionary for both SYSTEM_DEFINED and APPLICATION inference profiles
- Remove unused region prefix functions and defaultdict import
- Add TEMPERATURE_TOPP_CONFLICT_MODELS set for Claude model parameter conflicts
- Improve model ARN parsing and error handling in profile enumeration
- Consolidate profile metadata storage to enable consistent feature detection
2025-10-16 15:24:02 +08:00
Kane Zhu
b4800c54a0 feat: add prompt caching support for Claude and Nova models
Add comprehensive prompt caching support with flexible control options:

Features:
- ENV variable control (ENABLE_PROMPT_CACHING, default: false)
- Per-request control via extra_body.prompt_caching
- Pattern-based model detection (Claude, Nova)
- Token limit warnings (Nova 20K limit)
- OpenAI-compatible response format (prompt_tokens_details.cached_tokens)

Supported models:
- Claude 3+ models (anthropic.claude-*)
- Nova models (amazon.nova-*)
- Auto-detection prevents breaking unsupported models

Implementation:
- System prompts caching via extra_body.prompt_caching.system
- Messages caching via extra_body.prompt_caching.messages
- Non-streaming and streaming modes
- Compatible with reasoning, thinking, and tool calls
2025-10-15 11:03:19 +08:00
Scott Baxter
7756532b4c fix: ECS container /health endpoint does not require API_KEY Bearer Token (#184) 2025-10-13 11:59:42 +08:00
Li Yi
9cea7f9314 chore: polish code with little update (#182)
- Run Docker container as non-root user (appuser) to minimize security risks
- Add Docker HEALTHCHECK for better container orchestration
- Make CORS configurable via ALLOWED_ORIGINS env var with security warning
- Replace assertions with proper error handling (TypeError/ValueError)
- Add 30s timeout to HTTP requests to prevent hanging connections
- Disable auto-reload in production uvicorn settings
2025-10-11 14:49:18 +08:00
Fabian Franz
8177876e5e Support <think> tags (#117) 2025-09-30 20:29:19 +08:00
Neil Mazumdar
66cb51bb36 feat: add Claude Sonnet 4.5 support with global cross-region inference (#180)
This commit adds comprehensive support for Claude Sonnet 4.5 (claude-sonnet-4-5-20250929),
Anthropic's most intelligent model with enhanced coding capabilities and complex agent support.

Changes:
- Added global cross-region inference profile discovery (global.anthropic.*)
- Fixed temperature/topP compatibility for Claude Sonnet 4.5 (model doesn't support both simultaneously)
- Fixed reasoning_effort parameter handling to prevent KeyError
- Added extended thinking/interleaved thinking support via extra_body parameter
- Updated documentation with Claude Sonnet 4.5 examples (English and Chinese)
- Updated README with Sonnet 4.5 announcement

Technical Details:
- src/api/models/bedrock.py: Added global profile support in list_bedrock_models()
- src/api/models/bedrock.py: Added Claude Sonnet 4.5 detection to remove topP parameter
- src/api/models/bedrock.py: Changed pop("topP") to pop("topP", None) to prevent KeyError
- docs/Usage.md: Added Chat Completions section with Sonnet 4.5 examples
- docs/Usage.md: Updated Interleaved thinking section with Sonnet 4.5 examples
- docs/Usage_CN.md: Added Chinese versions of all Sonnet 4.5 documentation

Model ID: global.anthropic.claude-sonnet-4-5-20250929-v1:0
2025-09-30 16:51:26 +08:00
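The `pop("topP")` → `pop("topP", None)` change above is the standard guard against a missing key: `dict.pop` raises `KeyError` when the key is absent unless a default is supplied. A minimal illustration:

```python
# Hypothetical request params: "topP" was already removed (or never set)
# because Claude Sonnet 4.5 does not accept temperature and topP together.
request_params = {"temperature": 0.7}

# request_params.pop("topP") would raise KeyError here; supplying a
# default turns the removal into a no-op when the key is absent.
request_params.pop("topP", None)
print(request_params)  # → {'temperature': 0.7}
```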
Mengxin Zhu
371d11d101 chore: cleanup useless files 2025-09-30 16:08:56 +08:00
Mengxin Zhu
e3ee9a707f docs: update deployment instructions and enhance ECR push script 2025-09-30 16:06:21 +08:00
Divyateja Pasupuleti
bdfa57c277 chore: update requirements to fix vulnerability (#177)
* chore: update requirements to fix vulnerability

* Update Python base image to version 3.13-slim
2025-09-19 16:15:32 +08:00
jbrockett
911dfe26d6 models: fix Application Inference Profiles mapping (#175)
* models: fix Application Inference Profiles mapping to include all profiles per model_id; switch to defaultdict(set) and emit all AIPs

* Fix rebase issue

---------

Co-authored-by: Jeremy Brockett <313937+jbrockett@users.noreply.github.com>
2025-08-14 15:21:14 +08:00
RizviR
a2110ff648 Add pagination to list_inference_profiles calls (#173)
Co-authored-by: Rizvi Rahim <rizvi@rizvir.com>
2025-08-13 10:26:34 +08:00
Fabian Franz
0cce2edab0 feat: update boto3 to version 1.40.4 (#169)
Updates boto3 from 1.37.0 to 1.40.4 and botocore from 1.37.0 to 1.40.4. This update enables support for AWS_BEARER_TOKEN_BEDROCK functionality and includes the latest AWS service features and bug fixes.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Claude <noreply@anthropic.com>
2025-08-13 10:23:30 +08:00
heisenbergye
3f1b56a526 feat: support Claude 4 Interleaved thinking (beta) (#164) 2025-07-21 16:44:21 +08:00
Mengxin Zhu
76a3614f17 fix: properly handle tool_use messages in conversation 2025-06-30 00:14:26 +08:00
Gagan M
01836087b1 feat: add support to include application inference profiles as models (#131)
---------

Co-authored-by: Mengxin Zhu <843303+zxkane@users.noreply.github.com>
2025-06-23 22:49:27 +08:00
dependabot[bot]
dd191d7cd9 Bump requests from 2.32.3 to 2.32.4 in /src (#151)
Bumps [requests](https://github.com/psf/requests) from 2.32.3 to 2.32.4.
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](https://github.com/psf/requests/compare/v2.32.3...v2.32.4)

---
updated-dependencies:
- dependency-name: requests
  dependency-version: 2.32.4
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-20 17:50:19 +08:00
Zack Elias
844efec086 add titan G1 embeddings (#152) 2025-06-17 11:09:22 +08:00
UniMa007
aed57307bc Add Titan Embeddings G2 (#94) 2025-05-27 21:52:15 +08:00
Aiden Dai
4e8a913e43 fix empty content issue 2025-04-20 09:21:47 +08:00
Aiden Dai
b27e83624f fix typo 2025-03-26 13:10:07 +08:00
Aiden Dai
c98e123c8f optimize error response in streaming 2025-03-26 11:32:39 +08:00
Aiden Dai
4f1a75b49f fix potential process stuck issue 2025-03-22 18:39:08 +08:00
Aiden Dai
0ead770069 performance improvement 2025-03-13 18:24:08 +08:00
Aiden Dai
fa14ae8c05 apply ruff linter 2025-03-13 14:24:41 +08:00
Aiden Dai
879b8e2ac7 apply ruff linter 2025-03-13 13:58:18 +08:00
Aiden Dai
f21b9a2e84 apply ruff linter 2025-03-13 13:50:57 +08:00
Aiden Dai
33e8fcfd3b fix potential bad request issue 2025-03-13 07:16:42 +08:00
Aiden Dai
5ff18c0acd Update usage guide for deepseek-r1 2025-03-11 10:25:50 +08:00
Aiden Dai
fcbfa9fe3d Update usage guide for deepseek-r1 2025-03-11 10:24:19 +08:00
Aiden Dai
1a9c0f461e Update usage guide for deepseek-r1 2025-03-11 10:14:06 +08:00
Aiden Dai
66b8967d30 Update usage guide for deepseek-r1 2025-03-11 10:10:58 +08:00
Zhongsheng Ji
fcfebf9d9d feat: Response 429 if ThrottlingException (#91) 2025-03-10 09:01:33 +08:00
Aiden Dai
283115000a Support of reasoning 2025-02-28 08:08:54 +08:00
Aiden Dai
4095c2e74e Support of reasoning 2025-02-26 13:28:23 +08:00
Aiden Dai
a46e329c97 Support of reasoning 2025-02-26 12:25:38 +08:00
Omri Shaiko
54f4a2b017 Fix issue with toolResult error with Cursor. Use default DEFAULT_MODEL in ChatRequest (#110) 2025-02-26 10:43:44 +08:00
Aiden Dai
3ce47ff278 Partial support of reasoning 2025-02-25 16:23:06 +08:00
Sean Smith
b26ee3e9ea Added troubleshooting guide and made buttons cool (#96)
Signed-off-by: Sean Smith <sean.smith@contextual.ai>
2025-02-11 12:40:27 +08:00
Aiden Dai
1cb8a6a603 Update readme 2025-02-10 15:48:34 +08:00
Aiden Dai
c39f6bc942 Use secrets manager for api key 2025-02-10 15:25:12 +08:00
Aiden Dai
74ca3b938e Update architecture diagram 2025-02-10 10:02:43 +08:00
Aiden Dai
a6f3e1176b fix secret access issue 2025-02-09 06:53:23 +08:00
Aiden Dai
4d88731233 Use secrets manager for api key 2025-02-08 21:36:59 +08:00
Sean Smith
48bf360456 Security Guide (#101)
Signed-off-by: Sean Smith <sean.smith@contextual.ai>
2025-02-08 11:40:24 +08:00
yytdfc
093c6fa586 add stop parameter (#86) 2024-12-31 11:15:24 +08:00
Aiden Dai
b2c187c716 Increase connect timeout 2024-12-19 16:45:18 +08:00
Aiden Dai
581638b794 Update docs 2024-12-17 17:38:21 +08:00
Aiden Dai
51bc727b38 Use readme 2024-12-16 17:11:54 +08:00
Aiden Dai
dc067affc0 Use yaml template 2024-12-16 16:33:37 +08:00
Aiden Dai
29621ae59c Automatically detect model list 2024-12-16 16:15:09 +08:00
Aiden Dai
d4938a0af2 Automatically detect model list 2024-12-16 16:01:59 +08:00
Attila Szucs
cb38d328aa Add environment variable for PORT (#47)
* Customizable port

* Fix CMD
2024-12-16 10:00:17 +08:00
Fabio Nonato
4fc0d3bc94 Image error fix (#80)
---------

Co-authored-by: Fabio Nonato <fnp@amazon.com>
2024-12-11 11:26:51 +08:00
Hans Knecht
241d5c0f3e feat: allow the use of an ENV variable to set the API key if the ParameterStore isn't used. (#40) 2024-12-06 14:32:06 +08:00
Fabian Fischer
25b3cfb146 feat: add amazon nova inference profiles in us (#79) 2024-12-06 13:52:50 +08:00
mschfh
17503b032a Add cross-region inference profiles for Llama 3.2 models. (#75) 2024-12-05 11:22:11 +08:00
bkocik
6849ca828a Add cross-region inference profiles for Llama 3.1 models. (#72) 2024-11-20 09:57:35 +08:00
KAEYL98
11a31b5584 feat: add support for APAC claude 3 profiles (#69) 2024-11-07 16:43:15 +08:00
heisenbergye
5f7676608a support Cross-Region Inference for all Claude models (#65) 2024-10-29 14:43:31 +08:00
Meng Xin Zhu
9cc3ea8253 chore: publish templates to s3 in release workflow (#64) 2024-10-28 17:36:35 +08:00
Aaron Yi
8785c63ddf fix: remove the code review pipeline
until access rights can be granted to pull requests from forks
2024-10-25 13:12:59 +08:00
yike5460
0afd0463e1 fix: add debugging info to workflow 2024-10-25 02:33:26 +00:00
Sergei Mikhailov
3a97677b97 Added "new Claude 3.5 Sonnet" v2 model to the list (#60) 2024-10-23 14:54:45 +08:00
yike5460
728ef6d8a6 fix: update workflow action to use var instead of secret 2024-10-10 06:24:04 +00:00
Mengxin Zhu
46fb759137 chore: use correct Dockerfile for building lambda image 2024-10-09 23:39:37 +08:00
Mengxin Zhu
326e566105 chore: use arm64 architecture image for lambda 2024-10-09 23:15:10 +08:00
Meng Xin Zhu
c1ee1b4244 chore: add automation script to release images (#58) 2024-10-09 18:20:14 +08:00
yike5460
552578a0ee fix: fix action dep issue 2024-10-09 08:30:19 +00:00
yike5460
d9590d6504 fix: place action file into the right folder 2024-10-09 08:22:14 +00:00
34 changed files with 2931 additions and 2769 deletions

.flake8

@@ -1,19 +0,0 @@
[flake8]
max-line-length = 120
ignore =
E203,W191,W503
exclude =
build
.git
__pycache__
.tox
venv
.venv
.venv-test
tmp*
deployment
cdk.out
node_modules
max-complexity = 10
require-code = True


@@ -1,74 +0,0 @@
name: Intelligent Code Review
# Enable manual trigger
on:
workflow_dispatch:
pull_request:
types: [opened, synchronize]
# Avoid running the same workflow on the same branch concurrently
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
jobs:
review:
runs-on: ubuntu-latest
permissions:
# read repository contents and write pull request comments
id-token: write
# allow github action bot to push new content into existing pull requests
contents: write
# contents: read
pull-requests: write
steps:
- name: Checkout code
uses: actions/checkout@v3
- name: Set up Node.js
uses: actions/setup-node@v3
with:
node-version: '20'
- name: Install dependencies
run: npm ci
shell: bash
# check if required dependencies @actions/core and @actions/github are installed
- name: Check if required dependencies are installed
run: |
npm list @actions/core
npm list @actions/github
shell: bash
- name: Debug GitHub Token
run: |
if [ -n "${{ secrets.GITHUB_TOKEN }}" ]; then
echo "GitHub Token is set"
else
echo "GitHub Token is not set"
fi
# assume the specified IAM role and set up the AWS credentials for use in subsequent steps.
- name: Configure AWS Credentials
uses: aws-actions/configure-aws-credentials@v4
with:
# using repository secret to get the role arn
role-to-assume: ${{ secrets.AWS_ROLE_TO_ASSUME }}
aws-region: us-east-1
- name: Intelligent GitHub Actions
uses: aws-samples/aws-genai-cicd-suite@stable
with:
# Automatic Provision: The GITHUB_TOKEN is automatically created and provided by GitHub for each workflow run. You don't need to manually create or store this token as a secret.
github-token: ${{ secrets.GITHUB_TOKEN }}
aws-region: us-east-1
model-id: anthropic.claude-3-sonnet-20240229-v1:0
generate-code-review: 'true'
generate-code-review-level: 'detailed'
generate-code-review-exclude-files: '*.md,*.json,*.js'
generate-pr-description: 'true'
generate-unit-test: 'false'
generate-unit-test-source-folder: 'debugging'
# Removed the invalid input 'generate-unit-test-exclude-files'
# output-language: 'zh'
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

.gitignore

@@ -160,3 +160,4 @@ cython_debug/
.idea/
Config
.vscode/launch.json

.pre-commit-config.yaml

@@ -0,0 +1,10 @@
repos:
- repo: https://github.com/astral-sh/ruff-pre-commit
# Ruff version.
rev: v0.9.10
hooks:
# Run the linter.
- id: ruff
types_or: [python, pyi]
# Run the formatter.
- id: ruff-format

README.md

@@ -1,15 +1,19 @@
[中文](./README_CN.md)
# Bedrock Access Gateway
OpenAI-compatible RESTful APIs for Amazon Bedrock
## Breaking Changes
## What's New 🔥
The source code is refactored with the new [Converse API](https://docs.aws.amazon.com/bedrock/latest/userguide/conversation-inference.html) by bedrock which provides native support with tool calls.
**API Gateway Response Streaming Support** - You can now deploy with Amazon API Gateway REST API instead of ALB, enabling true response streaming for better latency and cost optimization. See [Deployment Options](#deployment-options) for details.
If you are facing any problems, please raise an issue.
**Latest Models Supported:**
- **Claude 4.5 Family**: Opus 4.5, Sonnet 4.5, Haiku 4.5 - Anthropic's most intelligent models with enhanced coding and agent capabilities
- **Amazon Nova**: Nova Micro, Nova Lite, Nova Pro, Nova Premier - Amazon's native foundation models with multimodal support
- **DeepSeek**: DeepSeek-R1 (reasoning), DeepSeek-V3.1 - Advanced reasoning and general-purpose models
- **Qwen 3**: Qwen3-32B, Qwen3-235B, Qwen3-Coder-30B, Qwen3-Coder-480B - Alibaba's latest language and coding models
- **OpenAI OSS**: gpt-oss-20b, gpt-oss-120b - Open-source GPT models available via Bedrock
It also supports reasoning for **Claude 4/4.5** (extended thinking and interleaved thinking) and **DeepSeek R1**. Check [How to Use](./docs/Usage.md#reasoning) for more details. You need to first run the Models API to refresh the model list.
## Overview
@@ -25,25 +29,17 @@ If you find this GitHub repository useful, please consider giving it a free star
- [x] Support streaming response via server-sent events (SSE)
- [x] Support Model APIs
- [x] Support Chat Completion APIs
- [x] Support Tool Call (**new**)
- [x] Support Embedding API (**new**)
- [x] Support Multimodal API (**new**)
- [x] Support Tool Call
- [x] Support Embedding API
- [x] Support Multimodal API
- [x] Support Cross-Region Inference
- [x] Support Application Inference Profiles (**new**)
- [x] Support Reasoning (**new**)
- [x] Support Interleaved thinking (**new**)
- [x] Support Prompt Caching (**new**)
Please check [Usage Guide](./docs/Usage.md) for more details about how to use the new APIs.
> **Note:** The legacy [text completion](https://platform.openai.com/docs/api-reference/completions) API is not supported, you should change to use chat completion API.
Supported Amazon Bedrock models family:
- Anthropic Claude 2 / 3 (Haiku/Sonnet/Opus)
- Meta Llama 2 / 3
- Mistral / Mixtral
- Cohere Command R / R+
- Cohere Embedding
You can call the `models` API to get the full list of model IDs supported.
> **Note:** The default model is set to `anthropic.claude-3-sonnet-20240229-v1:0` which can be changed via Lambda environment variables (`DEFAULT_MODEL`).
## Get Started
@@ -57,58 +53,100 @@ Please make sure you have met below prerequisites:
### Architecture
The following diagram illustrates the reference architecture. Note that it also includes a new **VPC** with two public subnets only for the Application Load Balancer (ALB).
The following diagram illustrates the reference architecture. It uses [Amazon API Gateway response streaming](https://aws.amazon.com/blogs/compute/building-responsive-apis-with-amazon-api-gateway-response-streaming/) with Lambda for SSE support.
![Architecture](assets/arch.svg)
![Architecture](assets/arch.png)
You can also choose to use [AWS Fargate](https://aws.amazon.com/fargate/) behind the ALB instead of [AWS Lambda](https://aws.amazon.com/lambda/), the main difference is the latency of the first byte for streaming response (Fargate is lower).
### Deployment Options
Alternatively, you can use Lambda Function URL to replace ALB, see [example](https://github.com/awslabs/aws-lambda-web-adapter/tree/main/examples/fastapi-response-streaming)
| Option | Pros | Cons | Best For |
|--------|------|------|----------|
| **API Gateway + Lambda** | No VPC required, pay-per-request, native streaming support, lower operational overhead | Potential cold starts | Most use cases, cost-sensitive deployments |
| **ALB + Fargate** | Lowest streaming latency, no cold starts | Higher cost, requires VPC | High-throughput, latency-sensitive workloads |
You can also use Lambda Function URL as an alternative, see [example](https://github.com/awslabs/aws-lambda-web-adapter/tree/main/examples/fastapi-response-streaming)
### Deployment
Please follow the steps below to deploy the Bedrock Proxy APIs into your AWS account. Only supports regions where Amazon Bedrock is available (such as `us-west-2`). The deployment will take approximately **3-5 minutes** 🕒.
Please follow the steps below to deploy the Bedrock Proxy APIs into your AWS account. Only supports regions where Amazon Bedrock is available (such as `us-west-2`). The deployment will take approximately **10-15 minutes** 🕒.
**Step 1: Create your own custom API key (Optional)**
**Step 1: Create your own API key in Secrets Manager (MUST)**
> **Note:** This step is to use any string (without spaces) you like to create a custom API Key (credential) that will be used to access the proxy API later. This key does not have to match your actual OpenAI key, and you don't need to have an OpenAI API key. It is recommended that you take this step and ensure that you keep the key safe and private.
> **Note:** This step is to use any string (without spaces) you like to create a custom API Key (credential) that will be used to access the proxy API later. This key does not have to match your actual OpenAI key, and you don't need to have an OpenAI API key. Please keep the key safe and private.
1. Open the AWS Management Console and navigate to the Systems Manager service.
2. In the left-hand navigation pane, click on "Parameter Store".
3. Click on the "Create parameter" button.
4. In the "Create parameter" window, select the following options:
- Name: Enter a descriptive name for your parameter (e.g., "BedrockProxyAPIKey").
- Description: Optionally, provide a description for the parameter.
- Tier: Select **Standard**.
- Type: Select **SecureString**.
- Value: Any string (without spaces).
5. Click "Create parameter".
6. Make a note of the parameter name you used (e.g., "BedrockProxyAPIKey"). You'll need this in the next step.
1. Open the AWS Management Console and navigate to the AWS Secrets Manager service.
2. Click on "Store a new secret" button.
3. In the "Choose secret type" page, select:
**Step 2: Deploy the CloudFormation stack**
Secret type: Other type of secret
Key/value pairs:
- Key: api_key
- Value: Enter your API key value
1. Sign in to AWS Management Console, switch to the region to deploy the CloudFormation Stack to.
2. Click the following button to launch the CloudFormation Stack in that region. Choose one of the following:
- **ALB + Lambda**
Click "Next"
4. In the "Configure secret" page:
Secret name: Enter a name (e.g., "BedrockProxyAPIKey")
Description: (Optional) Add a description of your secret
5. Click "Next" and review all your settings and click "Store"
[![Launch Stack](assets/launch-stack.png)](https://console.aws.amazon.com/cloudformation/home#/stacks/create/template?stackName=BedrockProxyAPI&templateURL=https://aws-gcr-solutions.s3.amazonaws.com/bedrock-access-gateway/latest/BedrockProxy.template)
- **ALB + Fargate**
After creation, you'll see your secret in the Secrets Manager console. Make note of the secret ARN.
[![Launch Stack](assets/launch-stack.png)](https://console.aws.amazon.com/cloudformation/home#/stacks/create/template?stackName=BedrockProxyAPI&templateURL=https://aws-gcr-solutions.s3.amazonaws.com/bedrock-access-gateway/latest/BedrockProxyFargate.template)
3. Click "Next".
4. On the "Specify stack details" page, provide the following information:
- Stack name: Change the stack name if needed.
- ApiKeyParam (if you set up an API key in Step 1): Enter the parameter name you used for storing the API key (e.g., `BedrockProxyAPIKey`). If you did not set up an API key, leave this field blank. Click "Next".
5. On the "Configure stack options" page, you can leave the default settings or customize them according to your needs.
6. Click "Next".
7. On the "Review" page, review the details of the stack you're about to create. Check the "I acknowledge that AWS CloudFormation might create IAM resources" checkbox at the bottom.
8. Click "Create stack".
**Step 2: Build and push container images to ECR**
1. Clone this repository:
```bash
git clone https://github.com/aws-samples/bedrock-access-gateway.git
cd bedrock-access-gateway
```
2. Run the build and push script:
```bash
cd scripts
bash ./push-to-ecr.sh
```
3. Follow the prompts to configure:
- ECR repository names (or use defaults)
- Image tag (or use default: `latest`)
- AWS region (or use default: `us-east-1`)
4. The script will build and push both Lambda and ECS/Fargate images to your ECR repositories.
5. **Important**: Copy the image URIs displayed at the end of the script output. You'll need these in the next step.
**Step 3: Deploy the CloudFormation stack**
1. Download the CloudFormation template you want to use:
- For API Gateway + Lambda: [`deployment/BedrockProxy.template`](deployment/BedrockProxy.template)
- For ALB + Fargate: [`deployment/BedrockProxyFargate.template`](deployment/BedrockProxyFargate.template)
2. Sign in to AWS Management Console and navigate to the CloudFormation service in your target region.
3. Click "Create stack" → "With new resources (standard)".
4. Upload the template file you downloaded.
5. On the "Specify stack details" page, provide the following information:
- **Stack name**: Enter a stack name (e.g., "BedrockProxyAPI")
- **ApiKeySecretArn**: Enter the secret ARN from Step 1
- **ContainerImageUri**: Enter the ECR image URI from Step 2 output
- **DefaultModelId**: (Optional) Change the default model if needed
Click "Next".
6. On the "Configure stack options" page, you can leave the default settings or customize them according to your needs. Click "Next".
7. On the "Review" page, review all details. Check the "I acknowledge that AWS CloudFormation might create IAM resources" checkbox at the bottom. Click "Submit".
That is it! 🎉 Once deployed, click the CloudFormation stack and go to **Outputs** tab, you can find the API Base URL from `APIBaseUrl`, the value should look like `http://xxxx.xxx.elb.amazonaws.com/api/v1`.
### Troubleshooting
If you encounter any issues, please check the [Troubleshooting Guide](./docs/Troubleshooting.md) for more details.
### SDK/API Usage
All you need is the API Key and the API Base URL. If you didn't set up your own key, then the default API Key (`bedrock`) will be used.
All you need is the API Key and the API Base URL. If you didn't set up your own key following Step 1, the application will fail to start with an error message indicating that the API Key is not configured.
Now, you can try out the proxy APIs. Let's say you want to test Claude 3 Sonnet model (model ID: `anthropic.claude-3-sonnet-20240229-v1:0`)...
@@ -153,14 +191,123 @@ print(completion.choices[0].message.content)
Please check [Usage Guide](./docs/Usage.md) for more details about how to use embedding API, multimodal API and tool call.
### Application Inference Profiles
This proxy now supports **Application Inference Profiles**, which allow you to track usage and costs for your model invocations. You can use application inference profiles created in your AWS account for cost tracking and monitoring purposes.
**Using Application Inference Profiles:**
```bash
# Use an application inference profile ARN as the model ID
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "arn:aws:bedrock:us-west-2:123456789012:application-inference-profile/your-profile-id",
"messages": [
{
"role": "user",
"content": "Hello!"
}
]
}'
```
**SDK Usage with Application Inference Profiles:**
```python
from openai import OpenAI
client = OpenAI()
completion = client.chat.completions.create(
model="arn:aws:bedrock:us-west-2:123456789012:application-inference-profile/your-profile-id",
messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)
```
**Benefits of Application Inference Profiles:**
- **Cost Tracking**: Track usage and costs for specific applications or use cases
- **Usage Monitoring**: Monitor model invocation metrics through CloudWatch
- **Tag-based Cost Allocation**: Use AWS cost allocation tags for detailed billing analysis
For more information about creating and managing application inference profiles, see the [Amazon Bedrock User Guide](https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles-create.html).
### Prompt Caching
This proxy now supports **Prompt Caching** for Claude and Nova models, which can reduce costs by up to 90% and latency by up to 85% for workloads with repeated prompts.
**Supported Models:**
- Claude models (Claude 3.5 Haiku, Claude 4, Claude 4.5, etc.)
- Nova models (Nova Micro, Nova Lite, Nova Pro, Nova Premier)
**Enabling Prompt Caching:**
You can enable prompt caching in two ways:
1. **Globally via Environment Variable** (set in ECS Task Definition or Lambda):
```bash
ENABLE_PROMPT_CACHING=true
```
2. **Per-request via `extra_body`**:
**Python SDK:**
```python
from openai import OpenAI
client = OpenAI()
# Cache system prompts
response = client.chat.completions.create(
model="global.anthropic.claude-haiku-4-5-20251001-v1:0",
messages=[
{"role": "system", "content": "You are an expert assistant with knowledge of..."},
{"role": "user", "content": "Help me with this task"}
],
extra_body={
"prompt_caching": {"system": True}
}
)
# Check cache hit
if response.usage.prompt_tokens_details:
cached_tokens = response.usage.prompt_tokens_details.cached_tokens
print(f"Cached tokens: {cached_tokens}")
```
**cURL:**
```bash
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "global.anthropic.claude-haiku-4-5-20251001-v1:0",
"messages": [
{"role": "system", "content": "Long system prompt..."},
{"role": "user", "content": "Question"}
],
"extra_body": {
"prompt_caching": {"system": true}
}
}'
```
**Cache Options:**
- `"prompt_caching": {"system": true}` - Cache system prompts
- `"prompt_caching": {"messages": true}` - Cache user messages
- `"prompt_caching": {"system": true, "messages": true}` - Cache both
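The three option combinations above map to a single `prompt_caching` object in the request body; a small helper (hypothetical, for illustration only, not part of the proxy) that assembles such a request might look like:

```python
def build_chat_request(model: str, messages: list,
                       cache_system: bool = False,
                       cache_messages: bool = False) -> dict:
    """Assemble an OpenAI-style request body with the proxy's
    prompt_caching extension (illustrative helper)."""
    body = {"model": model, "messages": messages}
    caching = {}
    if cache_system:
        caching["system"] = True
    if cache_messages:
        caching["messages"] = True
    if caching:  # omit the key entirely when no caching is requested
        body["prompt_caching"] = caching
    return body
```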
**Requirements:**
- Prompt must be ≥1,024 tokens to enable caching
- Cache TTL is 5 minutes (resets on each cache hit)
- Nova models have a 20,000 token caching limit
For more information, see the [Amazon Bedrock Prompt Caching Guide](https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html).
## Other Examples
### AutoGen
Below is an image of setting up the model in AutoGen studio.
![AutoGen Model](assets/autogen-model.png)
### LangChain
Make sure you use `ChatOpenAI(...)` instead of `OpenAI(...)`
@@ -199,43 +346,37 @@ print(response)
This application does not collect any of your data. Furthermore, it does not log any requests or responses by default.
### Why choose API Gateway vs ALB?
**API Gateway + Lambda** uses [API Gateway response streaming](https://aws.amazon.com/blogs/compute/building-responsive-apis-with-amazon-api-gateway-response-streaming/) with [Lambda Web Adapter](https://github.com/awslabs/aws-lambda-web-adapter) to support SSE streaming without requiring a VPC. This is a cost-effective, serverless option with up to 10 minutes timeout.
**ALB + Fargate** provides the lowest streaming latency with no cold starts, ideal for high-throughput workloads.
### Which regions are supported?
This solution only supports regions where Amazon Bedrock is available. As of now, these include:
- US East (N. Virginia): us-east-1
- US West (Oregon): us-west-2
- Asia Pacific (Singapore): ap-southeast-1
- Asia Pacific (Sydney): ap-southeast-2
- Asia Pacific (Tokyo): ap-northeast-1
- Europe (Frankfurt): eu-central-1
- Europe (Paris): eu-west-3
Generally speaking, all regions that Amazon Bedrock supports are also supported by this solution; if not, please raise an issue on GitHub.
Note that not all models are available in those regions.
### Can I build and use my own ECR image?
Yes, you can clone the repo, build the container image yourself (`src/Dockerfile`), and push it to your own ECR repo. You can use `scripts/push-to-ecr.sh`. Replace the image URI in the CloudFormation template before you deploy.
### Which models are supported?
You can use the [Models API](./docs/Usage.md#models-api) to get/refresh the list of supported models in the current region.
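The Models API returns an OpenAI-style list object; a minimal sketch of extracting model IDs from such a response (the sample payload here is illustrative, not captured from a live deployment):

```python
# Illustrative /models response in the OpenAI list format.
sample_response = {
    "object": "list",
    "data": [
        {"id": "anthropic.claude-3-sonnet-20240229-v1:0", "object": "model"},
        {"id": "cohere.embed-multilingual-v3", "object": "model"},
    ],
}

def list_model_ids(response: dict) -> list:
    """Return the model IDs from a /models list response."""
    return [m["id"] for m in response.get("data", [])]

print(list_model_ids(sample_response))
```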
### Can I run this locally?
Yes, you can run this locally. For example, run the command below under the `src` folder:
```bash
uvicorn api.app:app --host 0.0.0.0 --port 8000
```
The API base url should look like `http://localhost:8000/api/v1`.
### Any performance sacrifice or latency increase by using the proxy APIs?
Compared with direct AWS SDK calls, the proxy architecture adds some latency; you can benchmark it yourself. The default API Gateway + Lambda deployment provides good streaming performance with Lambda response streaming.
For the lowest latency on streaming responses, consider the ALB + Fargate deployment option, which eliminates cold starts and provides consistent performance.
### Any plan to support SageMaker models?
@@ -247,13 +388,7 @@ Fine-tuned models and models with Provisioned Throughput are currently not suppo
### How to upgrade?
To use the latest features, follow the deployment guide to redeploy the application. You can upgrade the existing CloudFormation stack to get the latest changes.
## Security


@@ -1,267 +0,0 @@
[English](./README.md)
# Bedrock Access Gateway
Access Amazon Bedrock with OpenAI-compatible APIs
## Breaking Changes
The project source code has been refactored to use the new [Converse API](https://docs.aws.amazon.com/bedrock/latest/userguide/conversation-inference.html) provided by Bedrock, which natively supports tool calls.
If you encounter any problems, please open a GitHub Issue.
## Overview
Amazon Bedrock offers a wide range of foundation models (such as Claude 3 Opus/Sonnet/Haiku, Llama 2/3, Mistral/Mixtral, etc.) along with a broad set of capabilities for building generative AI applications. For more details, please check [Amazon Bedrock](https://aws.amazon.com/bedrock).
Sometimes you may already have applications built with OpenAI's APIs or SDKs and want to try Amazon Bedrock models without modifying your code. Or you may simply want to evaluate these foundation models in tools such as AutoGen. The good news is that this project provides a convenient way to seamlessly integrate with and try Amazon Bedrock models through OpenAI's APIs or SDKs, without changing your existing code.
If you find this project useful, please consider giving it a free little star ⭐.
Features:
- [x] Streaming responses via server-sent events (SSE)
- [x] Model APIs
- [x] Chat Completion APIs
- [x] Tool Call (**new**)
- [x] Embedding API (**new**)
- [x] Multimodal API (**new**)
Please check the [Usage Guide](./docs/Usage_CN.md) for more details on how to use the new APIs.
> Note: The legacy [text completion](https://platform.openai.com/docs/api-reference/completions) API is not supported; please switch to the Chat Completion API.
Supported Amazon Bedrock model families:
- Anthropic Claude 2 / 3 (Haiku/Sonnet/Opus)
- Meta Llama 2 / 3
- Mistral / Mixtral
- Cohere Command R / R+
- Cohere Embedding
You can call the `models` API first to get the full list of supported model IDs.
> Note: The default model is `anthropic.claude-3-sonnet-20240229-v1:0`, which can be changed via the Lambda environment variables.
## Usage Guide
### Prerequisites
Please make sure you meet the following prerequisites:
- Access to Amazon Bedrock foundation models.
If you have not yet obtained model access, please refer to the [configuration](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html) guide.
### Architecture
The diagram below shows the reference architecture of this solution. Note that it also includes a new **VPC** with only two public subnets for the Application Load Balancer (ALB).
![Architecture](assets/arch.svg)
You can also choose to put [AWS Fargate](https://aws.amazon.com/fargate/) behind the ALB instead of [AWS Lambda](https://aws.amazon.com/lambda/); the main difference is the first-byte latency of streaming responses (lower with Fargate).
Alternatively, you can use a Lambda Function URL instead of the ALB; see this [example](https://github.com/awslabs/aws-lambda-web-adapter/tree/main/examples/fastapi-response-streaming).
### Deployment
Follow the steps below to deploy the Bedrock proxy APIs into your AWS account. Only regions where Amazon Bedrock is available (such as us-west-2) are supported. Deployment takes about **3-5 minutes** 🕒.
**Step 1: Create your own API Key (optional)**
> Note: In this step you use any string (without spaces) to create a custom API Key (credential) that will be used to access the proxy APIs later. This API Key does not have to match your actual OpenAI key, and you don't even need an OpenAI API Key. It is recommended to perform this step and keep the API Key safe.
1. Open the AWS Management Console and navigate to the Systems Manager service.
2. In the left navigation pane, click "Parameter Store".
3. Click the "Create parameter" button.
4. In the "Create parameter" window, choose the following options:
   - Name: enter a descriptive name for the parameter (e.g., "BedrockProxyAPIKey").
   - Description: optionally provide a description for the parameter.
   - Tier: select **Standard**.
   - Type: select **SecureString**.
   - Value: any string (without spaces).
5. Click "Create parameter".
6. Note down the parameter name you used (e.g., "BedrockProxyAPIKey"). You will need it in the next step.
**Step 2: Deploy the CloudFormation stack**
1. Sign in to the AWS Management Console and switch to the region where you want to deploy the CloudFormation stack.
2. Click one of the following buttons to launch the CloudFormation stack in that region (choose one deployment option).
- **ALB + Lambda**
[![Launch Stack](assets/launch-stack.png)](https://console.aws.amazon.com/cloudformation/home#/stacks/create/template?stackName=BedrockProxyAPI&templateURL=https://aws-gcr-solutions.s3.amazonaws.com/bedrock-access-gateway/latest/BedrockProxy.template)
- **ALB + Fargate**
[![Launch Stack](assets/launch-stack.png)](https://console.aws.amazon.com/cloudformation/home#/stacks/create/template?stackName=BedrockProxyAPI&templateURL=https://aws-gcr-solutions.s3.amazonaws.com/bedrock-access-gateway/latest/BedrockProxyFargate.template)
3. Click "Next".
4. On the "Specify stack details" page, provide the following information:
   - Stack name: change it if needed.
   - ApiKeyParam (if you set an API Key in Step 1): enter the parameter name used to store the API Key (e.g., "BedrockProxyAPIKey"); otherwise, leave this field blank.
   Click "Next".
5. On the "Configure stack options" page, keep the default settings or customize them as needed.
6. Click "Next".
7. On the "Review" page, review the details of the stack you are about to create. Check the "I acknowledge that AWS CloudFormation might create IAM resources." box at the bottom.
8. Click "Create stack".
That's it 🎉. Once deployed, click the CloudFormation stack and go to the "Outputs" tab, where you can find the API Base URL under "APIBaseUrl"; it should look like `http://xxxx.xxx.elb.amazonaws.com/api/v1`.
### SDK/API Usage
All you need is the API Key and the API Base URL. If you didn't set your own key, the default API Key (`bedrock`) will be used.
Now you can try out the proxy APIs. Suppose you want to test the Claude 3 Sonnet model; use "anthropic.claude-3-sonnet-20240229-v1:0" as the model ID.
- **API usage example**
```bash
export OPENAI_API_KEY=<API key>
export OPENAI_BASE_URL=<API base url>
# For older versions, use OPENAI_API_BASE
# https://github.com/openai/openai-python/issues/624
export OPENAI_API_BASE=<API base url>
```
```bash
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "anthropic.claude-3-sonnet-20240229-v1:0",
"messages": [
{
"role": "user",
"content": "Hello!"
}
]
}'
```
- **SDK usage example**
```python
from openai import OpenAI
client = OpenAI()
completion = client.chat.completions.create(
model="anthropic.claude-3-sonnet-20240229-v1:0",
messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)
```
Please check the [Usage Guide](./docs/Usage_CN.md) for more details on how to use the Embedding API, Multimodal API, and Tool Call.
## Other Examples
### AutoGen
For example, configuring and using a model in AutoGen Studio:
![AutoGen Model](assets/autogen-model.png)
### LangChain
Make sure you use `ChatOpenAI(...)` instead of `OpenAI(...)`
```python
# pip install langchain-openai
import os
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
chat = ChatOpenAI(
model="anthropic.claude-3-sonnet-20240229-v1:0",
temperature=0,
openai_api_key=os.environ['OPENAI_API_KEY'],
openai_api_base=os.environ['OPENAI_BASE_URL'],
)
template = """Question: {question}
Answer: Let's think step by step."""
prompt = PromptTemplate.from_template(template)
llm_chain = LLMChain(prompt=prompt, llm=chat)
question = "What NFL team won the Super Bowl in the year Justin Beiber was born?"
response = llm_chain.invoke(question)
print(response)
```
## FAQs
### About privacy
This solution does not collect any of your data. Moreover, it does not log any requests or responses by default.
### Why is an Application Load Balancer used instead of API Gateway?
The short answer is that API Gateway does not support server-sent events (SSE) for streaming responses.
### Which regions are supported?
Only regions where Amazon Bedrock is available are supported. As of now, these include:
- US East (N. Virginia): us-east-1
- US West (Oregon): us-west-2
- Asia Pacific (Singapore): ap-southeast-1
- Asia Pacific (Sydney): ap-southeast-2
- Asia Pacific (Tokyo): ap-northeast-1
- Europe (Frankfurt): eu-central-1
- Europe (Paris): eu-west-3
Generally speaking, all regions supported by Amazon Bedrock are supported; if not, please open a GitHub Issue.
Note that not all models are available in those regions.
### Can I build and use my own ECR image?
Yes, you can clone the repo and build the container image yourself (src/Dockerfile), then push it to your own ECR repository. You can refer to the script `scripts/push-to-ecr.sh`.
Replace the image repository URL in the CloudFormation template before deploying.
### Can I run this locally?
Yes, you can run this locally; the API Base URL should look like `http://localhost:8000/api/v1`.
### Is there any performance penalty or latency increase when using the proxy APIs?
Compared with AWS SDK calls, the reference architecture introduces additional response latency; you can deploy and test it yourself.
Also, you can use Lambda Web Adapter + Function URL (see this [example](https://github.com/awslabs/aws-lambda-web-adapter/tree/main/examples/fastapi-response-streaming)) instead of the ALB, or AWS Fargate instead of Lambda, for better streaming performance.
### Any plans to support SageMaker models?
There are currently no plans to support SageMaker models. It depends on customer demand.
### Any plans to support Bedrock custom models?
Fine-tuned models and models with Provisioned Throughput are not supported. You can clone the repo and customize it yourself if needed.
### How to upgrade?
To use the latest features, you don't need to redeploy the CloudFormation stack; you only need to pull the latest image.
How to do this depends on which version you deployed:
- **Lambda version**: Go to the AWS Lambda console, find the Lambda function, then find and click the `Deploy new image` button and click save.
- **Fargate version**: Go to the ECS console, click the ECS cluster, go to the `Tasks` tab, select the only running task, and click the `Stop selected` menu. A new task with the latest image will start automatically.
## Security
For more information, please refer to [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications).
## License
This project is licensed under the MIT-0 License. See the LICENSE file.

THIRD_PARTY Normal file

@@ -0,0 +1,8 @@
certifi
SPDX-License-Identifier: MPL-2.0
This Source Code Form is subject to the terms of the Mozilla Public
License, v. 2.0. If a copy of the MPL was not distributed with this
file, You can obtain one at http://mozilla.org/MPL/2.0/.
https://github.com/certifi/python-certifi

assets/arch.png Normal file (binary, 50 KiB, not shown)
(Four other binary asset files, of 25 KiB, 209 KiB, 212 KiB, and 3.3 KiB, were removed; one file diff was suppressed because its lines are too long.)

@@ -1,768 +1,178 @@
{
"Description": "Bedrock Access Gateway - OpenAI-compatible RESTful APIs for Amazon Bedrock",
"Transform": "AWS::LanguageExtensions",
"Parameters": {
"ApiKeyParam": {
"Type": "String",
"Default": "",
"Description": "The parameter name in System Manager used to store the API Key, leave blank to use a default key"
}
},
"Resources": {
"VPCB9E5F0B4": {
"Type": "AWS::EC2::VPC",
"Properties": {
"CidrBlock": "10.250.0.0/16",
"EnableDnsHostnames": true,
"EnableDnsSupport": true,
"InstanceTenancy": "default",
"Tags": [
{
"Key": "Name",
"Value": "BedrockProxy/VPC"
}
]
},
"Metadata": {
"aws:cdk:path": "BedrockProxy/VPC/Resource"
}
},
"VPCPublicSubnet1SubnetB4246D30": {
"Type": "AWS::EC2::Subnet",
"Properties": {
"AvailabilityZone": {
"Fn::Select": [
0,
{
"Fn::GetAZs": ""
}
]
},
"CidrBlock": "10.250.0.0/24",
"MapPublicIpOnLaunch": true,
"Tags": [
{
"Key": "aws-cdk:subnet-name",
"Value": "Public"
},
{
"Key": "aws-cdk:subnet-type",
"Value": "Public"
},
{
"Key": "Name",
"Value": "BedrockProxy/VPC/PublicSubnet1"
}
],
"VpcId": {
"Ref": "VPCB9E5F0B4"
}
},
"Metadata": {
"aws:cdk:path": "BedrockProxy/VPC/PublicSubnet1/Subnet"
}
},
"VPCPublicSubnet1RouteTableFEE4B781": {
"Type": "AWS::EC2::RouteTable",
"Properties": {
"Tags": [
{
"Key": "Name",
"Value": "BedrockProxy/VPC/PublicSubnet1"
}
],
"VpcId": {
"Ref": "VPCB9E5F0B4"
}
},
"Metadata": {
"aws:cdk:path": "BedrockProxy/VPC/PublicSubnet1/RouteTable"
}
},
"VPCPublicSubnet1RouteTableAssociation0B0896DC": {
"Type": "AWS::EC2::SubnetRouteTableAssociation",
"Properties": {
"RouteTableId": {
"Ref": "VPCPublicSubnet1RouteTableFEE4B781"
},
"SubnetId": {
"Ref": "VPCPublicSubnet1SubnetB4246D30"
}
},
"Metadata": {
"aws:cdk:path": "BedrockProxy/VPC/PublicSubnet1/RouteTableAssociation"
}
},
"VPCPublicSubnet1DefaultRoute91CEF279": {
"Type": "AWS::EC2::Route",
"Properties": {
"DestinationCidrBlock": "0.0.0.0/0",
"GatewayId": {
"Ref": "VPCIGWB7E252D3"
},
"RouteTableId": {
"Ref": "VPCPublicSubnet1RouteTableFEE4B781"
}
},
"DependsOn": [
"VPCVPCGW99B986DC"
],
"Metadata": {
"aws:cdk:path": "BedrockProxy/VPC/PublicSubnet1/DefaultRoute"
}
},
"VPCPublicSubnet2Subnet74179F39": {
"Type": "AWS::EC2::Subnet",
"Properties": {
"AvailabilityZone": {
"Fn::Select": [
1,
{
"Fn::GetAZs": ""
}
]
},
"CidrBlock": "10.250.1.0/24",
"MapPublicIpOnLaunch": true,
"Tags": [
{
"Key": "aws-cdk:subnet-name",
"Value": "Public"
},
{
"Key": "aws-cdk:subnet-type",
"Value": "Public"
},
{
"Key": "Name",
"Value": "BedrockProxy/VPC/PublicSubnet2"
}
],
"VpcId": {
"Ref": "VPCB9E5F0B4"
}
},
"Metadata": {
"aws:cdk:path": "BedrockProxy/VPC/PublicSubnet2/Subnet"
}
},
"VPCPublicSubnet2RouteTable6F1A15F1": {
"Type": "AWS::EC2::RouteTable",
"Properties": {
"Tags": [
{
"Key": "Name",
"Value": "BedrockProxy/VPC/PublicSubnet2"
}
],
"VpcId": {
"Ref": "VPCB9E5F0B4"
}
},
"Metadata": {
"aws:cdk:path": "BedrockProxy/VPC/PublicSubnet2/RouteTable"
}
},
"VPCPublicSubnet2RouteTableAssociation5A808732": {
"Type": "AWS::EC2::SubnetRouteTableAssociation",
"Properties": {
"RouteTableId": {
"Ref": "VPCPublicSubnet2RouteTable6F1A15F1"
},
"SubnetId": {
"Ref": "VPCPublicSubnet2Subnet74179F39"
}
},
"Metadata": {
"aws:cdk:path": "BedrockProxy/VPC/PublicSubnet2/RouteTableAssociation"
}
},
"VPCPublicSubnet2DefaultRouteB7481BBA": {
"Type": "AWS::EC2::Route",
"Properties": {
"DestinationCidrBlock": "0.0.0.0/0",
"GatewayId": {
"Ref": "VPCIGWB7E252D3"
},
"RouteTableId": {
"Ref": "VPCPublicSubnet2RouteTable6F1A15F1"
}
},
"DependsOn": [
"VPCVPCGW99B986DC"
],
"Metadata": {
"aws:cdk:path": "BedrockProxy/VPC/PublicSubnet2/DefaultRoute"
}
},
"VPCIGWB7E252D3": {
"Type": "AWS::EC2::InternetGateway",
"Properties": {
"Tags": [
{
"Key": "Name",
"Value": "BedrockProxy/VPC"
}
]
},
"Metadata": {
"aws:cdk:path": "BedrockProxy/VPC/IGW"
}
},
"VPCVPCGW99B986DC": {
"Type": "AWS::EC2::VPCGatewayAttachment",
"Properties": {
"InternetGatewayId": {
"Ref": "VPCIGWB7E252D3"
},
"VpcId": {
"Ref": "VPCB9E5F0B4"
}
},
"Metadata": {
"aws:cdk:path": "BedrockProxy/VPC/VPCGW"
}
},
"ProxyApiHandlerServiceRoleBE71BFB1": {
"Type": "AWS::IAM::Role",
"Properties": {
"AssumeRolePolicyDocument": {
"Statement": [
{
"Action": "sts:AssumeRole",
"Effect": "Allow",
"Principal": {
"Service": "lambda.amazonaws.com"
}
}
],
"Version": "2012-10-17"
},
"ManagedPolicyArns": [
{
"Fn::Join": [
"",
[
"arn:",
{
"Ref": "AWS::Partition"
},
":iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
]
]
}
]
},
"Metadata": {
"aws:cdk:path": "BedrockProxy/Proxy/ApiHandler/ServiceRole/Resource"
}
},
"ProxyApiHandlerServiceRoleDefaultPolicy86681202": {
"Type": "AWS::IAM::Policy",
"Properties": {
"PolicyDocument": {
"Statement": [
{
"Action": [
"bedrock:InvokeModel",
"bedrock:InvokeModelWithResponseStream"
],
"Effect": "Allow",
"Resource": "arn:aws:bedrock:*::foundation-model/*"
},
{
"Action": [
"ssm:DescribeParameters",
"ssm:GetParameters",
"ssm:GetParameter",
"ssm:GetParameterHistory"
],
"Effect": "Allow",
"Resource": {
"Fn::Join": [
"",
[
"arn:",
{
"Ref": "AWS::Partition"
},
":ssm:",
{
"Ref": "AWS::Region"
},
":",
{
"Ref": "AWS::AccountId"
},
":parameter/",
{
"Ref": "ApiKeyParam"
}
]
]
}
}
],
"Version": "2012-10-17"
},
"PolicyName": "ProxyApiHandlerServiceRoleDefaultPolicy86681202",
"Roles": [
{
"Ref": "ProxyApiHandlerServiceRoleBE71BFB1"
}
]
},
"Metadata": {
"aws:cdk:path": "BedrockProxy/Proxy/ApiHandler/ServiceRole/DefaultPolicy/Resource"
}
},
"ProxyApiHandlerEC15A492": {
"Type": "AWS::Lambda::Function",
"Properties": {
"Architectures": [
"arm64"
],
"Code": {
"ImageUri": {
"Fn::Join": [
"",
[
"366590864501.dkr.ecr.",
{
"Ref": "AWS::Region"
},
".",
{
"Ref": "AWS::URLSuffix"
},
"/bedrock-proxy-api:latest"
]
]
}
},
"Description": "Bedrock Proxy API Handler",
"Environment": {
"Variables": {
"API_KEY_PARAM_NAME": {
"Ref": "ApiKeyParam"
},
"DEBUG": "false",
"DEFAULT_MODEL": {
"Fn::FindInMap": [
"ProxyRegionTable03E5BEB3",
{
"Ref": "AWS::Region"
},
"model",
{
"DefaultValue": "anthropic.claude-3-sonnet-20240229-v1:0"
}
]
},
"DEFAULT_EMBEDDING_MODEL": "cohere.embed-multilingual-v3"
}
},
"MemorySize": 1024,
"PackageType": "Image",
"Role": {
"Fn::GetAtt": [
"ProxyApiHandlerServiceRoleBE71BFB1",
"Arn"
]
},
"Timeout": 300
},
"DependsOn": [
"ProxyApiHandlerServiceRoleDefaultPolicy86681202",
"ProxyApiHandlerServiceRoleBE71BFB1"
],
"Metadata": {
"aws:cdk:path": "BedrockProxy/Proxy/ApiHandler/Resource"
}
},
"ProxyApiHandlerInvoke2UTWxhlfyqbT5FTn5jvgbLgjFfJwzswGk55DU1HYF6C33779": {
"Type": "AWS::Lambda::Permission",
"Properties": {
"Action": "lambda:InvokeFunction",
"FunctionName": {
"Fn::GetAtt": [
"ProxyApiHandlerEC15A492",
"Arn"
]
},
"Principal": "elasticloadbalancing.amazonaws.com"
},
"Metadata": {
"aws:cdk:path": "BedrockProxy/Proxy/ApiHandler/Invoke2UTWxhlfyqbT5FTn--5jvgbLgj+FfJwzswGk55DU1H--Y="
}
},
"ProxyALB87756780": {
"Type": "AWS::ElasticLoadBalancingV2::LoadBalancer",
"Properties": {
"LoadBalancerAttributes": [
{
"Key": "deletion_protection.enabled",
"Value": "false"
}
],
"Scheme": "internet-facing",
"SecurityGroups": [
{
"Fn::GetAtt": [
"ProxyALBSecurityGroup0D6CA3DA",
"GroupId"
]
}
],
"Subnets": [
{
"Ref": "VPCPublicSubnet1SubnetB4246D30"
},
{
"Ref": "VPCPublicSubnet2Subnet74179F39"
}
],
"Type": "application"
},
"DependsOn": [
"VPCPublicSubnet1DefaultRoute91CEF279",
"VPCPublicSubnet1RouteTableAssociation0B0896DC",
"VPCPublicSubnet2DefaultRouteB7481BBA",
"VPCPublicSubnet2RouteTableAssociation5A808732"
],
"Metadata": {
"aws:cdk:path": "BedrockProxy/Proxy/ALB/Resource"
}
},
"ProxyALBSecurityGroup0D6CA3DA": {
"Type": "AWS::EC2::SecurityGroup",
"Properties": {
"GroupDescription": "Automatically created Security Group for ELB BedrockProxyALB1CE4CAD1",
"SecurityGroupEgress": [
{
"CidrIp": "255.255.255.255/32",
"Description": "Disallow all traffic",
"FromPort": 252,
"IpProtocol": "icmp",
"ToPort": 86
}
],
"SecurityGroupIngress": [
{
"CidrIp": "0.0.0.0/0",
"Description": "Allow from anyone on port 80",
"FromPort": 80,
"IpProtocol": "tcp",
"ToPort": 80
}
],
"VpcId": {
"Ref": "VPCB9E5F0B4"
}
},
"Metadata": {
"aws:cdk:path": "BedrockProxy/Proxy/ALB/SecurityGroup/Resource"
}
},
"ProxyALBListener933E9515": {
"Type": "AWS::ElasticLoadBalancingV2::Listener",
"Properties": {
"DefaultActions": [
{
"TargetGroupArn": {
"Ref": "ProxyALBListenerTargetsGroup187739FA"
},
"Type": "forward"
}
],
"LoadBalancerArn": {
"Ref": "ProxyALB87756780"
},
"Port": 80,
"Protocol": "HTTP"
},
"Metadata": {
"aws:cdk:path": "BedrockProxy/Proxy/ALB/Listener/Resource"
}
},
"ProxyALBListenerTargetsGroup187739FA": {
"Type": "AWS::ElasticLoadBalancingV2::TargetGroup",
"Properties": {
"HealthCheckEnabled": false,
"TargetType": "lambda",
"Targets": [
{
"Id": {
"Fn::GetAtt": [
"ProxyApiHandlerEC15A492",
"Arn"
]
}
}
]
},
"DependsOn": [
"ProxyApiHandlerInvoke2UTWxhlfyqbT5FTn5jvgbLgjFfJwzswGk55DU1HYF6C33779"
],
"Metadata": {
"aws:cdk:path": "BedrockProxy/Proxy/ALB/Listener/TargetsGroup/Resource"
}
},
"CDKMetadata": {
"Type": "AWS::CDK::Metadata",
"Properties": {
"Analytics": "v2:deflate64:H4sIAAAAAAAA/1VRXW/CMAz8LbyHDMovAKZNSJtWFcTr5LpeZ0iTKHFAqOp/n1q+uief7y7ynZLp+WKhZxM4xylWx6nhUrdbATyq9Y/NIUBDQkHBOX63hJlu9x57aZ+vVZ5Kw7hNpSXpuScqXBLaQWnoyT+5ZYwOGYSdfZh7sLFCwZK8g9AZLrczt20pAvjbkBW1JUyB5fIeXPLDgTHRKcKgC/IusrhwWUEkZaApK9Dtq8MjhU0DNb0li/cIY5xTaDhGdrZTDI1uC3etMczcGcYh2hV1igxEYTQOqhIMWGRbnzLdLr03jEPLDwfVatAo9E//7WMfRyF789zxSN9BqEketUdr16mCoksBh6if4D3buodfSXy6fsrIsHa2Yhk6WleRPsSXUzbT87meTQ6ReRqSFW5IF9f5B/Z2H8goAgAA"
},
"Metadata": {
"aws:cdk:path": "BedrockProxy/CDKMetadata/Default"
},
"Condition": "CDKMetadataAvailable"
}
},
"Mappings": {
"ProxyRegionTable03E5BEB3": {
"us-east-1": {
"model": "anthropic.claude-3-sonnet-20240229-v1:0"
},
"ap-southeast-1": {
"model": "anthropic.claude-v2"
},
"ap-northeast-1": {
"model": "anthropic.claude-v2:1"
},
"eu-central-1": {
"model": "anthropic.claude-v2:1"
}
}
},
"Outputs": {
"APIBaseUrl": {
"Description": "Proxy API Base URL (OPENAI_API_BASE)",
"Value": {
"Fn::Join": [
"",
[
"http://",
{
"Fn::GetAtt": [
"ProxyALB87756780",
"DNSName"
]
},
"/api/v1"
]
]
}
}
},
"Conditions": {
"CDKMetadataAvailable": {
"Fn::Or": [
{
"Fn::Or": [
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"af-south-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"ap-east-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"ap-northeast-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"ap-northeast-2"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"ap-south-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"ap-southeast-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"ap-southeast-2"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"ca-central-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"cn-north-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"cn-northwest-1"
]
}
]
},
{
"Fn::Or": [
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"eu-central-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"eu-north-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"eu-south-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"eu-west-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"eu-west-2"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"eu-west-3"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"il-central-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"me-central-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"me-south-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"sa-east-1"
]
}
]
},
{
"Fn::Or": [
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"us-east-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"us-east-2"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"us-west-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"us-west-2"
]
}
]
}
]
}
}
}
Description: Bedrock Access Gateway - OpenAI-compatible RESTful APIs for Amazon Bedrock (API Gateway + Lambda with Streaming)
Parameters:
ApiKeySecretArn:
Type: String
AllowedPattern: ^arn:aws:secretsmanager:.*$
Description: The secret ARN in Secrets Manager used to store the API Key
ContainerImageUri:
Type: String
Description: The ECR image URI for the Lambda function (e.g., 123456789012.dkr.ecr.us-east-1.amazonaws.com/bedrock-proxy-api:latest)
DefaultModelId:
Type: String
Default: anthropic.claude-3-sonnet-20240229-v1:0
Description: The default model ID, please make sure the model ID is supported in the current region
EnablePromptCaching:
Type: String
Default: "false"
AllowedValues:
- "true"
- "false"
Description: Enable prompt caching for supported models (Claude, Nova). When enabled, adds cachePoint to system prompts and messages for cost savings.
Resources:
# IAM Role for Lambda
ProxyApiHandlerServiceRole:
Type: AWS::IAM::Role
Properties:
AssumeRolePolicyDocument:
Statement:
- Action: sts:AssumeRole
Effect: Allow
Principal:
Service: lambda.amazonaws.com
Version: "2012-10-17"
ManagedPolicyArns:
- !Sub "arn:${AWS::Partition}:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
ProxyApiHandlerServiceRoleDefaultPolicy:
Type: AWS::IAM::Policy
Properties:
PolicyDocument:
Statement:
- Action:
- bedrock:ListFoundationModels
- bedrock:ListInferenceProfiles
Effect: Allow
Resource: "*"
- Action:
- bedrock:InvokeModel
- bedrock:InvokeModelWithResponseStream
Effect: Allow
Resource:
- arn:aws:bedrock:*::foundation-model/*
- arn:aws:bedrock:*:*:inference-profile/*
- arn:aws:bedrock:*:*:application-inference-profile/*
- Action:
- secretsmanager:GetSecretValue
- secretsmanager:DescribeSecret
Effect: Allow
Resource: !Ref ApiKeySecretArn
Version: "2012-10-17"
PolicyName: ProxyApiHandlerServiceRoleDefaultPolicy
Roles:
- !Ref ProxyApiHandlerServiceRole
# Lambda Function with Lambda Web Adapter for streaming
ProxyApiHandler:
Type: AWS::Lambda::Function
Properties:
Architectures:
- arm64
Code:
ImageUri: !Ref ContainerImageUri
Description: Bedrock Proxy API Handler with Response Streaming
Environment:
Variables:
# Lambda Web Adapter settings
AWS_LWA_INVOKE_MODE: RESPONSE_STREAM
AWS_LWA_READINESS_CHECK_PATH: /health
AWS_LWA_ASYNC_INIT: "true"
PORT: "8080"
# Application settings
DEBUG: "false"
API_KEY_SECRET_ARN: !Ref ApiKeySecretArn
DEFAULT_MODEL: !Ref DefaultModelId
DEFAULT_EMBEDDING_MODEL: cohere.embed-multilingual-v3
ENABLE_CROSS_REGION_INFERENCE: "true"
ENABLE_APPLICATION_INFERENCE_PROFILES: "true"
ENABLE_PROMPT_CACHING: !Ref EnablePromptCaching
API_ROUTE_PREFIX: /v1
MemorySize: 1024
PackageType: Image
Role: !GetAtt ProxyApiHandlerServiceRole.Arn
Timeout: 600
DependsOn:
- ProxyApiHandlerServiceRoleDefaultPolicy
- ProxyApiHandlerServiceRole
# API Gateway REST API (Regional)
RestApi:
Type: AWS::ApiGateway::RestApi
Properties:
Name: BedrockProxyApi
Description: Bedrock Access Gateway - OpenAI-compatible API with streaming support
EndpointConfiguration:
Types:
- REGIONAL
Body:
openapi: "3.0.1"
info:
title: BedrockProxyApi
version: "1.0"
paths:
/{proxy+}:
x-amazon-apigateway-any-method:
parameters:
- name: proxy
in: path
required: true
schema:
type: string
x-amazon-apigateway-integration:
type: aws_proxy
httpMethod: POST
uri: !Sub "arn:aws:apigateway:${AWS::Region}:lambda:path/2021-11-15/functions/${ProxyApiHandler.Arn}/response-streaming-invocations"
passthroughBehavior: when_no_match
timeoutInMillis: 600000
responseTransferMode: STREAM
responses:
default:
description: Default response
/:
x-amazon-apigateway-any-method:
x-amazon-apigateway-integration:
type: aws_proxy
httpMethod: POST
uri: !Sub "arn:aws:apigateway:${AWS::Region}:lambda:path/2021-11-15/functions/${ProxyApiHandler.Arn}/response-streaming-invocations"
passthroughBehavior: when_no_match
timeoutInMillis: 600000
responseTransferMode: STREAM
responses:
default:
description: Default response
# Lambda Permission for API Gateway
LambdaPermission:
Type: AWS::Lambda::Permission
Properties:
FunctionName: !Ref ProxyApiHandler
Action: lambda:InvokeFunction
Principal: apigateway.amazonaws.com
SourceArn: !Sub "arn:aws:execute-api:${AWS::Region}:${AWS::AccountId}:${RestApi}/*"
# API Gateway Deployment
ApiDeployment:
Type: AWS::ApiGateway::Deployment
Properties:
RestApiId: !Ref RestApi
DependsOn:
- RestApi
# API Gateway Stage
ApiStage:
Type: AWS::ApiGateway::Stage
Properties:
RestApiId: !Ref RestApi
DeploymentId: !Ref ApiDeployment
StageName: api
Description: API Stage with streaming support
Outputs:
APIBaseUrl:
Description: Proxy API Base URL (OPENAI_API_BASE)
Value: !Sub "https://${RestApi}.execute-api.${AWS::Region}.amazonaws.com/api/v1"
RestApiId:
Description: API Gateway REST API ID
Value: !Ref RestApi
LambdaFunctionArn:
Description: Lambda Function ARN
Value: !GetAtt ProxyApiHandler.Arn

File diff suppressed because it is too large
docker-compose.yml Normal file

@@ -0,0 +1,18 @@
version: '3.8'
services:
bedrock-access-gateway:
build:
context: ./src
dockerfile: Dockerfile_ecs
ports:
- "127.0.0.1:8000:8080"
environment:
- ENABLE_PROMPT_CACHING=true
- API_KEY=${OPENAI_API_KEY}
- AWS_PROFILE
- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY
- AWS_SESSION_TOKEN
volumes:
- ${HOME}/.aws:/home/appuser/.aws

docs/Security.md Normal file

@@ -0,0 +1,78 @@
# Security
This document details the security configuration required for the solution. In particular, it covers:
- **HTTPS Setup**
Following these guidelines will help ensure that traffic is encrypted over the public network.
---
## 1. HTTPS Authentication with the ALB
### Overview
Using HTTPS on your ALB guarantees that all client-to-ALB communication is encrypted. This is achieved by:
- **Obtaining and managing SSL/TLS certificates** using AWS Certificate Manager (ACM). You'll need a domain but you can request a free certificate.
- **Configuring HTTPS listeners** on the ALB
- **Automating HTTP to HTTPS redirect** for clients that inadvertently access HTTP endpoints
- **Allowing traffic in the Security Group of the ALB**
### Step-by-Step Setup
#### 1.1. Request an SSL/TLS Certificate via ACM
1. **Navigate to AWS Certificate Manager (ACM):**
In the AWS Management Console, go to ACM in the region where your ALB is deployed.
2. **Request the Certificate:**
- Click on **"Request a certificate"**.
- Choose **"Request a public certificate"** (or a private one if using a private CA).
- Enter your domain names (e.g., `example.com`, `*.example.com`).
- Complete the validation (via DNS or email). DNS validation is generally preferred for automation purposes.
3. **Certificate Validation:**
Ensure that the certificate status becomes **"Issued"** before proceeding.
#### 1.2. Configure the ALB for HTTPS
1. **Create or Modify the ALB Listener:**
- Open the **EC2 Dashboard** and navigate to [Load Balancers](https://console.aws.amazon.com/ec2/home?#LoadBalancers:).
- If you already have an ALB, select it; otherwise, create a new ALB.
- Under the **Listeners** tab, click **Manage listener** > **Edit Listener**.
- Configure the listener protocol to **HTTPS** with port **443**.
- Select the certificate you requested from ACM.
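For scripted setups, these console steps correspond to the Elastic Load Balancing `CreateListener` API. A minimal boto3-oriented sketch that only assembles the request parameters (the ARNs are placeholders you must replace, and the target group is assumed to already exist):

```python
# Assemble the parameters for the elbv2 CreateListener call that adds an
# HTTPS :443 listener with an ACM certificate. When ready, pass the result
# to boto3.client("elbv2").create_listener(**params).
def https_listener_params(alb_arn: str, cert_arn: str, target_group_arn: str) -> dict:
    return {
        "LoadBalancerArn": alb_arn,  # your ALB's ARN
        "Protocol": "HTTPS",
        "Port": 443,
        "Certificates": [{"CertificateArn": cert_arn}],  # the issued ACM certificate
        "DefaultActions": [{"Type": "forward", "TargetGroupArn": target_group_arn}],
    }
```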
#### 1.3. (Optional) Redirect HTTP Traffic to HTTPS
To enhance security, ensure that any HTTP requests are automatically redirected to HTTPS.
1. **Create an HTTP Listener on Port 80:**
- Add a listener on port **80**.
- In the listener settings, add a rule to redirect all traffic to port **443** with the protocol changed to **HTTPS**.
**Example AWS CLI command for redirection:**
```bash
aws elbv2 create-listener \
--load-balancer-arn <your-alb-arn> \
--protocol HTTP \
--port 80 \
--default-actions Type=redirect,RedirectConfig="Protocol=https,Port=443,StatusCode=HTTP_301"
```
#### 1.4. Allow traffic in the Security Group of the ALB
1. **Update the ALB Security Group:**
- Go to the CloudFormation stack you originally used to deploy, select **Resources** and search for **ProxyALBSecurityGroup**
- Click on the Security Group
- Edit the Inbound Rules to allow traffic on Port 443 from `0.0.0.0/0` and (optionally) delete the Inbound Rule on Port 80. **Note**: If you delete the rule on port 80, you will need to update the base url to use HTTPS only as it won't redirect HTTP traffic to HTTPS.
Now you should be able to test your application. Use a base URL like:
```
https://<your-domain>/api/v1
```
---
By following the steps outlined in this guide, you can configure a secure environment that uses HTTPS via ALB for encrypted traffic.

docs/Troubleshooting.md Normal file

@@ -0,0 +1,97 @@
# Troubleshooting Guide
This guide helps you troubleshoot common issues you might encounter when using the Bedrock Access Gateway.
## Common Issues
### 1. Parameter Store Access Error
To see errors, first access the CloudWatch Logs for your Lambda function or Fargate task.
1. Go to the [CloudWatch Console](https://console.aws.amazon.com/cloudwatch/home?#logsV2:log-groups/)
2. Search for `/aws/lambda/BedrockProxyAPI`
3. Click on the `Log Stream` to see the error details
```python
botocore.exceptions.ClientError: An error occurred (ParameterNotFound) when calling the GetParameter operation: Parameter /BedrockProxyAPIKey not found.
```
This error occurs when the Lambda function cannot access the API key parameter in Parameter Store.
**Possible solutions:**
- Verify that you created the parameter in Parameter Store with the correct name
- Check that the parameter name in the CloudFormation stack matches the one in Parameter Store
- Ensure the Lambda function's IAM role has permission to access Parameter Store
- If you didn't set up an API key, leave the `ApiKeyParam` field blank during deployment
### 2. Model Access Issues
If you receive an error about model access:
```
{"error": {"message": "User: arn:aws:iam::XXXX:role/XXX is not authorized to perform: bedrock:InvokeModel on resource: arn:aws:bedrock:REGION::foundation-model/XXX", "type": "auth_error", "code": 401}}
```
**Possible solutions:**
- Ensure you have requested access to the model in Amazon Bedrock
- Verify the Lambda/Fargate role has the necessary permissions to invoke Bedrock models
- Check that you're using the correct model ID
- Verify the model is available in your chosen region
### 3. API Key Authentication Failures
If you receive a 401 Unauthorized error:
```
{"detail": "Could not validate credentials"}
```
**Possible solutions:**
- Verify you're using the correct API key in your requests
- Check that the `Authorization` header is properly formatted (`Bearer YOUR-API-KEY`)
- If using environment variables, ensure `OPENAI_API_KEY` is set correctly
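As a quick sanity check, the header the gateway expects looks like this (a Python sketch; any HTTP client works):

```python
import os

# The gateway validates a standard OpenAI-style Bearer token.
api_key = os.environ.get("OPENAI_API_KEY", "<your-api-key>")
headers = {
    "Content-Type": "application/json",
    # Note the literal "Bearer" prefix followed by a single space.
    "Authorization": f"Bearer {api_key}",
}
```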
### 4. Cross-Region Access Issues
If you're trying to access models in a different region:
```
{"error": {"message": "Region 'us-east-1' is not enabled for your account", "type": "invalid_request_error", "code": 400}}
```
**Possible solutions:**
- Ensure the target region is enabled for your AWS account
- Verify the model you're trying to access is available in that region
- Check that your IAM roles have the necessary cross-region permissions
### 5. Rate Limiting and Quotas
If you're experiencing throttling or quota issues:
```
{"error": {"message": "Rate limit exceeded", "type": "rate_limit_error", "code": 429}}
```
**Possible solutions:**
- Check your Bedrock service quotas in the AWS Console
- Consider implementing retry logic in your application
- Request a quota increase if needed
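A minimal retry-with-exponential-backoff sketch (illustrative only; tune the delays, attempt count, and the exceptions you catch to your client library):

```python
import random
import time

def with_retries(fn, max_attempts=5, base_delay=1.0):
    """Call fn(), retrying with exponential backoff plus jitter on failure."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            # Backoff: base_delay * 2^attempt, with up to 100% random jitter
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```

For example, wrap a chat completion call as `with_retries(lambda: client.chat.completions.create(...))`.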
## Getting Help
If you're still experiencing issues:
1. Check the CloudWatch Logs for detailed error messages
2. Verify your AWS credentials and permissions
3. Review the [Usage Guide](./Usage.md) for correct API usage
4. Open a [GitHub issue](https://github.com/aws-samples/bedrock-access-gateway/issues/new?template=bug_report.md) with:
- Detailed error message
- Steps to reproduce
- Your deployment configuration (region, model, etc.)
- Any relevant CloudWatch logs
## Additional Resources
- [Amazon Bedrock Documentation](https://docs.aws.amazon.com/bedrock/)
- [AWS IAM Documentation](https://docs.aws.amazon.com/IAM/latest/UserGuide/)
- [AWS Systems Manager Parameter Store](https://docs.aws.amazon.com/systems-manager/latest/userguide/systems-manager-parameter-store.html)


@@ -9,6 +9,85 @@ export OPENAI_API_KEY=<API key>
export OPENAI_BASE_URL=<API base url>
```
**API Example:**
- [Models API](#models-api)
- [Embedding API](#embedding-api)
- [Multimodal API](#multimodal-api)
- [Tool Call](#tool-call)
- [Reasoning](#reasoning)
- [Interleaved thinking (beta)](#interleaved-thinking-beta)
## Models API
You can use this API to get a list of supported model IDs.
Also, you can use this API to refresh the model list if new models are added to Amazon Bedrock.
**Example Request**
```bash
curl -s $OPENAI_BASE_URL/models -H "Authorization: Bearer $OPENAI_API_KEY" | jq .data
```
**Example Response**
```json
[
...
{
"id": "anthropic.claude-3-5-sonnet-20240620-v1:0",
"created": 1734416893,
"object": "model",
"owned_by": "bedrock"
},
{
"id": "us.anthropic.claude-3-5-sonnet-20240620-v1:0",
"created": 1734416893,
"object": "model",
"owned_by": "bedrock"
},
...
]
```
## Chat Completions API
### Basic Example with Claude Sonnet 4.5
Claude Sonnet 4.5 is Anthropic's most intelligent model, excelling at coding, complex reasoning, and agent-based tasks. It's available via global cross-region inference profiles.
**Example Request**
```bash
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
"messages": [
{
"role": "user",
"content": "Write a Python function to calculate the Fibonacci sequence using dynamic programming."
}
]
}'
```
**Example SDK Usage**
```python
from openai import OpenAI
client = OpenAI()
completion = client.chat.completions.create(
model="global.anthropic.claude-sonnet-4-5-20250929-v1:0",
messages=[{"role": "user", "content": "Write a Python function to calculate the Fibonacci sequence using dynamic programming."}],
)
print(completion.choices[0].message.content)
```
## Embedding API
**Important Notice**: Please carefully review the following points before using this proxy API for embedding.
@@ -91,13 +170,10 @@ print(doc_result[0][:5])
## Multimodal API
**Important Notice**: Please carefully review the following points before using this proxy API for Multimodal.
1. This API is only supported by Claude 3 models.
**Example Request**
```bash
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
@@ -185,7 +261,6 @@ curl $OPENAI_BASE_URL/chat/completions \
**Important Notice**: Please carefully review the following points before using this Tool Call for Chat completion API.
1. Function Call has been deprecated by OpenAI in favor of Tool Call, so it is not supported here; use Tool Call instead.
2. This API is only supported by Claude 3 models.
**Example Request**
@@ -283,3 +358,218 @@ curl $OPENAI_BASE_URL/chat/completions \
You can try it with different questions, such as:
1. Hello, who are you? (No tools are needed)
2. What is the weather like today? (Should use get_current_location tool first)
## Reasoning
**Important Notice**: Please carefully review the following points before using reasoning mode for Chat completion API.
- Only Claude 3.7 Sonnet (extended thinking) and DeepSeek R1 support reasoning so far. Please make sure the model supports reasoning before use.
- For Claude 3.7 Sonnet, reasoning (thinking) mode is not enabled by default; you must pass an additional `reasoning_effort` parameter (`low`, `medium`, or `high`) in your request, and provide a suitable max_tokens (or max_completion_tokens). The budget_tokens is derived from reasoning_effort (low: 30%, medium: 60%, high: 100% of max tokens), with a minimum budget_tokens of 1,024; Anthropic recommends at least 4,000 tokens for comprehensive reasoning. Check the [Bedrock Document](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-anthropic-claude-37.html) for more details.
- For DeepSeek R1, reasoning is always on; do not pass the `reasoning_effort` parameter, otherwise you may get an error.
- The reasoning response (CoT, thoughts) is returned in an additional `reasoning_content` field, which is not officially supported by OpenAI. This follows the [DeepSeek Reasoning Model](https://api-docs.deepseek.com/guides/reasoning_model#api-example) API and may change in the future.
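The budget calculation described above can be sketched as follows (`estimate_budget_tokens` is an illustrative helper, not part of the gateway's API):

```python
# Approximate how the gateway maps reasoning_effort to a thinking budget:
# low -> 30%, medium -> 60%, high -> 100% of max tokens, floored at 1,024.
EFFORT_RATIOS = {"low": 0.3, "medium": 0.6, "high": 1.0}

def estimate_budget_tokens(reasoning_effort: str, max_tokens: int) -> int:
    ratio = EFFORT_RATIOS[reasoning_effort]
    return max(1024, int(max_tokens * ratio))
```

For example, with `max_completion_tokens=4096` and `reasoning_effort="low"`, the budget works out to roughly 1,228 tokens.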
**Example Request**
- Claude 3.7 Sonnet
```bash
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
"messages": [
"role": "user",
"content": "which one is bigger, 3.9 or 3.11?"
}
],
"max_completion_tokens": 4096,
"reasoning_effort": "low",
"stream": false
}'
```
- DeepSeek R1
```bash
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "us.deepseek.r1-v1:0",
"messages": [
{
"role": "user",
"content": "which one is bigger, 3.9 or 3.11?"
}
],
"stream": false
}'
```
**Example Response**
```json
{
"id": "chatcmpl-83fb7a88",
"created": 1740545278,
"model": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
"system_fingerprint": "fp",
"choices": [
{
"index": 0,
"finish_reason": "stop",
"logprobs": null,
"message": {
"role": "assistant",
"content": "3.9 is bigger than 3.11.\n\nWhen comparing decimal numbers, we need to understand what these numbers actually represent:...",
"reasoning_content": "I need to compare the decimal numbers 3.9 and 3.11.\n\nFor decimal numbers, we first compare the whole number parts, and if they're equal, we compare the decimal parts. \n\nBoth numbers ..."
}
}
],
"object": "chat.completion",
"usage": {
"prompt_tokens": 51,
"completion_tokens": 565,
"total_tokens": 616
}
}
```
You can also use the OpenAI SDK (run `pip3 install -U openai` first).
- Non-Streaming
```python
from openai import OpenAI
client = OpenAI()
messages = [{"role": "user", "content": "which one is bigger, 3.9 or 3.11?"}]
response = client.chat.completions.create(
model="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
messages=messages,
reasoning_effort="low",
max_completion_tokens=4096,
)
reasoning_content = response.choices[0].message.reasoning_content
content = response.choices[0].message.content
```
- Streaming
```python
from openai import OpenAI
client = OpenAI()
messages = [{"role": "user", "content": "9.11 and 9.8, which is greater?"}]
response = client.chat.completions.create(
model="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
messages=messages,
reasoning_effort="low",
max_completion_tokens=4096,
stream=True,
)
reasoning_content = ""
content = ""
for chunk in response:
if hasattr(chunk.choices[0].delta, 'reasoning_content') and chunk.choices[0].delta.reasoning_content:
reasoning_content += chunk.choices[0].delta.reasoning_content
elif chunk.choices[0].delta.content:
content += chunk.choices[0].delta.content
```
## Interleaved thinking (beta)
**Important Notice**: Please carefully review the following points before using interleaved thinking with the Chat Completions API.
Extended thinking with tool use in Claude 4 models supports [interleaved thinking](https://docs.aws.amazon.com/bedrock/latest/userguide/claude-messages-extended-thinking.html#claude-messages-extended-thinking-tool-use-interleaved), which enables the model to think between tool calls and run more sophisticated reasoning after receiving tool results. This is helpful for more complex agentic interactions.
With interleaved thinking, `budget_tokens` can exceed the `max_tokens` parameter because it represents the total thinking budget across all thinking blocks within one assistant turn.
**Supported Models**: Claude Sonnet 4, Claude Sonnet 4.5
**Example Request**
- Non-Streaming (Claude Sonnet 4.5)
```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bedrock" \
-d '{
"model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
"max_tokens": 2048,
"messages": [{
"role": "user",
"content": "Explain how to implement a binary search tree with self-balancing capabilities."
}],
"extra_body": {
"anthropic_beta": ["interleaved-thinking-2025-05-14"],
"thinking": {"type": "enabled", "budget_tokens": 4096}
}
}'
```
- Non-Streaming (Claude Sonnet 4)
```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bedrock" \
-d '{
"model": "us.anthropic.claude-sonnet-4-20250514-v1:0",
"max_tokens": 2048,
"messages": [{
"role": "user",
"content": "有一天,一个女孩参加数学考试只得了 38 分。她心里对父亲的惩罚充满恐惧,于是偷偷把分数改成了 88 分。她的父亲看到试卷后,怒发冲冠,狠狠地给了她一巴掌,怒吼道:“你这 8 怎么一半是绿的一半是红的,你以为我是傻子吗?”女孩被打后,委屈地哭了起来,什么也没说。过了一会儿,父亲突然崩溃了。请问这位父亲为什么过一会崩溃了?"
}],
"extra_body": {
"anthropic_beta": ["interleaved-thinking-2025-05-14"],
"thinking": {"type": "enabled", "budget_tokens": 4096}
}
}'
```
- Streaming (Claude Sonnet 4.5)
```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bedrock" \
-d '{
"model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
"max_tokens": 2048,
"messages": [{
"role": "user",
"content": "Explain how to implement a binary search tree with self-balancing capabilities."
}],
"stream": true,
"extra_body": {
"anthropic_beta": ["interleaved-thinking-2025-05-14"],
"thinking": {"type": "enabled", "budget_tokens": 4096}
}
}'
```
- Streaming (Claude Sonnet 4)
```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bedrock" \
-d '{
"model": "us.anthropic.claude-sonnet-4-20250514-v1:0",
"max_tokens": 2048,
"messages": [{
"role": "user",
"content": "有一天,一个女孩参加数学考试只得了 38 分。她心里对父亲的惩罚充满恐惧,于是偷偷把分数改成了 88 分。她的父亲看到试卷后,怒发冲冠,狠狠地给了她一巴掌,怒吼道:“你这 8 怎么一半是绿的一半是红的,你以为我是傻子吗?”女孩被打后,委屈地哭了起来,什么也没说。过了一会儿,父亲突然崩溃了。请问这位父亲为什么过一会崩溃了?"
}],
"stream": true,
"extra_body": {
"anthropic_beta": ["interleaved-thinking-2025-05-14"],
"thinking": {"type": "enabled", "budget_tokens": 4096}
}
}'
```
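The same request body can also be assembled programmatically, e.g. before sending it through the OpenAI SDK's `extra_body` passthrough (a sketch; `build_interleaved_payload` is a hypothetical helper mirroring the curl examples above):

```python
def build_interleaved_payload(model: str, prompt: str, budget_tokens: int = 4096, stream: bool = False) -> dict:
    # Mirrors the curl examples: enable the interleaved-thinking beta and
    # an extended-thinking budget, which the gateway forwards to Bedrock.
    return {
        "model": model,
        "max_tokens": 2048,
        "stream": stream,
        "messages": [{"role": "user", "content": prompt}],
        "extra_body": {
            "anthropic_beta": ["interleaved-thinking-2025-05-14"],
            "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        },
    }
```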


@@ -9,6 +9,83 @@ export OPENAI_API_KEY=<API key>
export OPENAI_BASE_URL=<API base url>
```
**API Examples:**
- [Models API](#models-api)
- [Embedding API](#embedding-api)
- [Multimodal API](#multimodal-api)
- [Tool Call](#tool-call)
- [Reasoning](#reasoning)
- [Interleaved thinking (beta)](#interleaved-thinking-beta)
## Models API
You can use this API to get a list of supported model IDs, and to refresh the list when new models are added to Amazon Bedrock.
**Example Request**
```bash
curl -s $OPENAI_BASE_URL/models -H "Authorization: Bearer $OPENAI_API_KEY" | jq .data
```
**Example Response**
```json
[
...
{
"id": "anthropic.claude-3-5-sonnet-20240620-v1:0",
"created": 1734416893,
"object": "model",
"owned_by": "bedrock"
},
{
"id": "us.anthropic.claude-3-5-sonnet-20240620-v1:0",
"created": 1734416893,
"object": "model",
"owned_by": "bedrock"
},
...
]
```
## Chat Completions API
### Basic Example with Claude Sonnet 4.5
Claude Sonnet 4.5 is Anthropic's most intelligent model, excelling at coding, complex reasoning, and agent-based tasks. It's available via global cross-region inference profiles.
**Example Request**
```bash
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
"messages": [
{
"role": "user",
"content": "编写一个使用动态规划计算斐波那契数列的Python函数。"
}
]
}'
```
**Example SDK Usage**
```python
from openai import OpenAI
client = OpenAI()
completion = client.chat.completions.create(
model="global.anthropic.claude-sonnet-4-5-20250929-v1:0",
messages=[{"role": "user", "content": "编写一个使用动态规划计算斐波那契数列的Python函数。"}],
)
print(completion.choices[0].message.content)
```
## Embedding API
**Important Notice**: Please carefully review the following points before using this proxy API for embedding:
@@ -90,10 +167,6 @@ print(doc_result[0][:5])
## Multimodal API
**Important Notice**: Please carefully review the following points before using this proxy API for multimodal:
1. This API is only supported by Claude 3 models.
**Example Request**
```bash
@@ -184,7 +257,6 @@ curl $OPENAI_BASE_URL/chat/completions \
**Important Notice**: Please carefully review the following points before using Tool Call with the Chat Completions API:
1. Function Call has been deprecated by OpenAI in favor of Tool Call, so it is not supported here; use Tool Call instead.
2. This API is only supported by Claude 3 models.
**Example Request**
@@ -282,3 +354,222 @@ curl $OPENAI_BASE_URL/chat/completions \
You can try it with different questions, such as:
1. Hello, who are you? (No tools are needed)
2. What is the weather like today? (Should use get_current_location tool first)
## Reasoning
**Important Notice**: Please carefully review the following points before using reasoning mode with the Chat Completions API.
- Only Claude 3.7 Sonnet (extended thinking) and DeepSeek R1 support reasoning so far. Please make sure the model supports reasoning before use.
- For Claude 3.7 Sonnet, reasoning (thinking) mode is not enabled by default; you must pass an additional `reasoning_effort` parameter (`low`, `medium`, or `high`) in your request, and provide a suitable max_tokens (or max_completion_tokens). The budget_tokens is derived from reasoning_effort (low: 30%, medium: 60%, high: 100% of max tokens), with a minimum budget_tokens of 1,024; Anthropic recommends at least 4,000 tokens for comprehensive reasoning. Check the [Bedrock Document](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-anthropic-claude-37.html) for more details.
- For DeepSeek R1, reasoning is always on; do not pass the `reasoning_effort` parameter, otherwise you may get an error.
- The reasoning response (CoT, thoughts) is returned in an additional `reasoning_content` field, which is not officially supported by OpenAI. This follows the [DeepSeek Reasoning Model](https://api-docs.deepseek.com/guides/reasoning_model#api-example) API and may change in the future.
**Example Request**
- Claude 3.7 Sonnet
```bash
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
"messages": [
{
"role": "user",
"content": "which one is bigger, 3.9 or 3.11?"
}
],
"max_completion_tokens": 4096,
"reasoning_effort": "low",
"stream": false
}'
```
- DeepSeek R1
```bash
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "us.deepseek.r1-v1:0",
"messages": [
{
"role": "user",
"content": "which one is bigger, 3.9 or 3.11?"
}
],
"stream": false
}'
```
**Example Response**
```json
{
"id": "chatcmpl-83fb7a88",
"created": 1740545278,
"model": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
"system_fingerprint": "fp",
"choices": [
{
"index": 0,
"finish_reason": "stop",
"logprobs": null,
"message": {
"role": "assistant",
"content": "3.9 is bigger than 3.11.\n\nWhen comparing decimal numbers, we need to understand what these numbers actually represent:...",
"reasoning_content": "I need to compare the decimal numbers 3.9 and 3.11.\n\nFor decimal numbers, we first compare the whole number parts, and if they're equal, we compare the decimal parts. \n\nBoth numbers ..."
}
}
],
"object": "chat.completion",
"usage": {
"prompt_tokens": 51,
"completion_tokens": 565,
"total_tokens": 616
}
}
```
Or use the OpenAI SDK (run `pip3 install -U openai` first to upgrade to the latest version).
- Non-Streaming
```python
from openai import OpenAI
client = OpenAI()
messages = [{"role": "user", "content": "which one is bigger, 3.9 or 3.11?"}]
response = client.chat.completions.create(
model="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
messages=messages,
reasoning_effort="low",
max_completion_tokens=4096,
)
reasoning_content = response.choices[0].message.reasoning_content
content = response.choices[0].message.content
```
- Streaming
```python
from openai import OpenAI
client = OpenAI()
messages = [{"role": "user", "content": "9.11 and 9.8, which is greater?"}]
response = client.chat.completions.create(
model="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
messages=messages,
reasoning_effort="low",
max_completion_tokens=4096,
stream=True,
)
reasoning_content = ""
content = ""
for chunk in response:
if hasattr(chunk.choices[0].delta, 'reasoning_content') and chunk.choices[0].delta.reasoning_content:
reasoning_content += chunk.choices[0].delta.reasoning_content
elif chunk.choices[0].delta.content:
content += chunk.choices[0].delta.content
```
## Interleaved thinking (beta)
**Important Notice**: Please carefully review the following points before using interleaved thinking with the Chat Completions API.
Extended thinking with tool use in Claude 4 models supports [interleaved thinking](https://docs.aws.amazon.com/bedrock/latest/userguide/claude-messages-extended-thinking.html#claude-messages-extended-thinking-tool-use-interleaved), which enables the model to think between tool calls and run more sophisticated reasoning after receiving tool results. This is helpful for more complex agentic interactions.
With interleaved thinking, `budget_tokens` can exceed the `max_tokens` parameter because it represents the total thinking budget across all thinking blocks within one assistant turn.
**支持的模型**: Claude Sonnet 4, Claude Sonnet 4.5
**Example Request**
- Non-Streaming (Claude Sonnet 4.5)
```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bedrock" \
-d '{
"model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
"max_tokens": 2048,
"messages": [{
"role": "user",
"content": "解释如何实现一个具有自平衡功能的二叉搜索树。"
}],
"extra_body": {
"anthropic_beta": ["interleaved-thinking-2025-05-14"],
"thinking": {"type": "enabled", "budget_tokens": 4096}
}
}'
```
- Non-Streaming (Claude Sonnet 4)
```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bedrock" \
-d '{
"model": "us.anthropic.claude-sonnet-4-20250514-v1:0",
"max_tokens": 2048,
"messages": [{
"role": "user",
"content": "有一天,一个女孩参加数学考试只得了 38 分。她心里对父亲的惩罚充满恐惧,于是偷偷把分数改成了 88 分。她的父亲看到试卷后,怒发冲冠,狠狠地给了她一巴掌,怒吼道:“你这 8 怎么一半是绿的一半是红的,你以为我是傻子吗?”女孩被打后,委屈地哭了起来,什么也没说。过了一会儿,父亲突然崩溃了。请问这位父亲为什么过一会崩溃了?"
}],
"extra_body": {
"anthropic_beta": ["interleaved-thinking-2025-05-14"],
"thinking": {"type": "enabled", "budget_tokens": 4096}
}
}'
```
- Streaming (Claude Sonnet 4.5)
```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bedrock" \
-d '{
"model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
"max_tokens": 2048,
"messages": [{
"role": "user",
"content": "解释如何实现一个具有自平衡功能的二叉搜索树。"
}],
"stream": true,
"extra_body": {
"anthropic_beta": ["interleaved-thinking-2025-05-14"],
"thinking": {"type": "enabled", "budget_tokens": 4096}
}
}'
```
- Streaming (Claude Sonnet 4)
```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bedrock" \
-d '{
"model": "us.anthropic.claude-sonnet-4-20250514-v1:0",
"max_tokens": 2048,
"messages": [{
"role": "user",
"content": "有一天,一个女孩参加数学考试只得了 38 分。她心里对父亲的惩罚充满恐惧,于是偷偷把分数改成了 88 分。她的父亲看到试卷后,怒发冲冠,狠狠地给了她一巴掌,怒吼道:“你这 8 怎么一半是绿的一半是红的,你以为我是傻子吗?”女孩被打后,委屈地哭了起来,什么也没说。过了一会儿,父亲突然崩溃了。请问这位父亲为什么过一会崩溃了?"
}],
"stream": true,
"extra_body": {
"anthropic_beta": ["interleaved-thinking-2025-05-14"],
"thinking": {"type": "enabled", "budget_tokens": 4096}
}
}'
```

ruff.toml Normal file

@@ -0,0 +1,21 @@
line-length = 120
indent-width = 4
target-version = "py312"
exclude = [
".venv",
".vscode",
"test/*"
]
[lint]
select = ["E", "F", "I"]
ignore = [
"E501",
"C901",
"F401",
]
[format]
# use double quotes for strings.
quote-style = "double"


@@ -1,35 +1,139 @@
# Make sure you have created the repo in AWS ECR in every region you want to push to before executing this script.
# NOTE: The script will try to create the ECR repository if it doesn't exist. Please grant the necessary permissions to the IAM user or role.
# Usage:
# cd scripts
# chmod +x push-to-ecr.sh
# ./push-to-ecr.sh
# bash ./push-to-ecr.sh
set -o errexit # exit on first error
set -o nounset # exit on using unset variables
set -o pipefail # exit on any error in a pipeline
# Change to the directory where the script is located
cd "$(dirname "$0")"
# Prompt user for inputs
echo "================================================"
echo "Bedrock Access Gateway - Build and Push to ECR"
echo "================================================"
echo ""
# Get repository name for Lambda version
read -p "Enter ECR repository name for Lambda (default: bedrock-proxy-api): " LAMBDA_REPO
LAMBDA_REPO=${LAMBDA_REPO:-bedrock-proxy-api}
# Get repository name for ECS/Fargate version
read -p "Enter ECR repository name for ECS/Fargate (default: bedrock-proxy-api-ecs): " ECS_REPO
ECS_REPO=${ECS_REPO:-bedrock-proxy-api-ecs}
# Get image tag
read -p "Enter image tag (default: latest): " TAG
TAG=${TAG:-latest}
# Get AWS region
read -p "Enter AWS region (default: us-east-1): " AWS_REGION
AWS_REGION=${AWS_REGION:-us-east-1}
echo ""
echo "Configuration:"
echo " Lambda Repository: $LAMBDA_REPO"
echo " ECS/Fargate Repository: $ECS_REPO"
echo " Image Tag: $TAG"
echo " AWS Region: $AWS_REGION"
echo ""
read -p "Continue with these settings? (y/n): " CONFIRM
if [[ ! "$CONFIRM" =~ ^[Yy]$ ]]; then
echo "Aborted."
exit 1
fi
echo ""
# Acknowledgment about ECR repository creation
echo " NOTICE: This script will automatically create ECR repositories if they don't exist."
echo " The repositories will be created with the following default settings:"
echo " - Image tag mutability: MUTABLE (allows overwriting tags)"
echo " - Image scanning: Disabled"
echo " - Encryption: AES256 (AWS managed encryption)"
echo ""
echo " You can modify these settings later in the AWS ECR Console if needed."
echo " Required IAM permissions: ecr:CreateRepository, ecr:GetAuthorizationToken,"
echo " ecr:BatchCheckLayerAvailability, ecr:InitiateLayerUpload, ecr:UploadLayerPart,"
echo " ecr:CompleteLayerUpload, ecr:PutImage"
echo ""
read -p "Do you acknowledge and want to proceed? (y/n): " ACK_CONFIRM
if [[ ! "$ACK_CONFIRM" =~ ^[Yy]$ ]]; then
echo "Aborted."
exit 1
fi
echo ""
# Define variables
IMAGE_NAME="bedrock-proxy-api"
TAG="latest"
AWS_REGIONS=("us-west-2") # List of AWS regions
#AWS_REGIONS=("us-east-1" "us-west-2" "eu-central-1" "ap-southeast-1" "ap-northeast-1") # List of AWS regions
ARCHS=("arm64") # Single architecture for simplicity
# Build Docker image
docker build -t $IMAGE_NAME:$TAG ../src/
build_and_push_image() {
local IMAGE_NAME=$1
local TAG=$2
local DOCKERFILE_PATH=$3
local REGION=$AWS_REGION
local ARCH=${ARCHS[0]}
# Loop through each AWS region
for REGION in "${AWS_REGIONS[@]}"
do
# Get the account ID for the current region
echo "Building $IMAGE_NAME:$TAG..."
# Build Docker image
# Note: --provenance=false and --sbom=false are required for Lambda compatibility
# Without these flags, Docker BuildKit (especially with docker-container driver) may create
# OCI image manifests with attestations that AWS Lambda does not support.
# Lambda requires Docker V2 Schema 2 format without multi-manifest index.
# See: https://github.com/aws-samples/bedrock-access-gateway/issues/206
docker buildx build \
--platform linux/$ARCH \
--provenance=false \
--sbom=false \
-t $IMAGE_NAME:$TAG \
-f $DOCKERFILE_PATH \
--load \
../src/
# Get the account ID
ACCOUNT_ID=$(aws sts get-caller-identity --region $REGION --query Account --output text)
# Create repository URI
REPOSITORY_URI="${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/${IMAGE_NAME}"
echo "Creating ECR repository if it doesn't exist..."
# Create ECR repository if it doesn't exist
aws ecr create-repository --repository-name "${IMAGE_NAME}" --region $REGION || true
echo "Logging in to ECR..."
# Log in to ECR
aws ecr get-login-password --region $REGION | docker login --username AWS --password-stdin $REPOSITORY_URI
# Tag the image for the current region
echo "Pushing image to ECR..."
# Tag the image for ECR
docker tag $IMAGE_NAME:$TAG $REPOSITORY_URI:$TAG
# Push the image to ECR
docker push $REPOSITORY_URI:$TAG
echo "Pushed $IMAGE_NAME:$TAG to $REPOSITORY_URI"
done
echo "✅ Successfully pushed $IMAGE_NAME:$TAG to $REPOSITORY_URI"
echo ""
}
echo "Building and pushing Lambda image..."
build_and_push_image "$LAMBDA_REPO" "$TAG" "../src/Dockerfile"
echo "Building and pushing ECS/Fargate image..."
build_and_push_image "$ECS_REPO" "$TAG" "../src/Dockerfile_ecs"
echo "================================================"
echo "✅ All images successfully pushed!"
echo "================================================"
echo ""
echo "Your container image URIs:"
ACCOUNT_ID=$(aws sts get-caller-identity --region $AWS_REGION --query Account --output text)
echo " Lambda: ${ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/${LAMBDA_REPO}:${TAG}"
echo " ECS/Fargate: ${ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/${ECS_REPO}:${TAG}"
echo ""
echo "Next steps:"
echo " 1. Download the CloudFormation templates from deployment/ folder"
echo " 2. Update the ContainerImageUri parameter with your image URI above"
echo " 3. Deploy the stack via AWS CloudFormation Console"
echo ""


@@ -1,9 +1,19 @@
FROM public.ecr.aws/lambda/python:3.12
# Add Lambda Web Adapter for API Gateway response streaming
COPY --from=public.ecr.aws/awsguru/aws-lambda-adapter:0.9.1 /lambda-adapter /opt/extensions/lambda-adapter
COPY ./api ./api
COPY requirements.txt .
RUN pip3 install -r requirements.txt -U --no-cache-dir
CMD [ "api.app.handler" ]
# Preload tiktoken encoding: https://github.com/aws-samples/bedrock-access-gateway/issues/118
ENV TIKTOKEN_CACHE_DIR=/var/task/.cache/tiktoken
RUN python3 -c 'import tiktoken_ext.openai_public as tke; tke.cl100k_base()'
# Lambda Web Adapter requires overriding the Lambda base image entrypoint
# to run the web app directly instead of the Lambda runtime handler
ENTRYPOINT []
CMD ["python", "-m", "uvicorn", "api.app:app", "--host", "0.0.0.0", "--port", "8080"]


@@ -1,4 +1,4 @@
FROM python:3.12-slim
FROM public.ecr.aws/docker/library/python:3.13-slim
WORKDIR /app
@@ -8,4 +8,19 @@ RUN pip install --no-cache-dir --upgrade -r /app/requirements.txt
COPY ./api /app/api
CMD ["uvicorn", "api.app:app", "--host", "0.0.0.0", "--port", "80"]
# Create non-root user
RUN groupadd -r appuser && useradd -r -g appuser appuser && \
chown -R appuser:appuser /app
USER appuser
# Preload tiktoken encoding: https://github.com/aws-samples/bedrock-access-gateway/issues/118
ENV TIKTOKEN_CACHE_DIR=/app/.cache/tiktoken
RUN python3 -c 'import tiktoken_ext.openai_public as tke; tke.cl100k_base()'
ENV PORT=8080
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:${PORT}/health').read()"
CMD ["sh", "-c", "uvicorn api.app:app --host 0.0.0.0 --port ${PORT}"]


@@ -1,4 +1,5 @@
import logging
import os
import uvicorn
from fastapi import FastAPI
@@ -7,8 +8,8 @@ from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import PlainTextResponse
from mangum import Mangum
from api.routers import model, chat, embeddings
from api.setting import API_ROUTE_PREFIX, TITLE, DESCRIPTION, SUMMARY, VERSION
from api.routers import chat, embeddings, model
from api.setting import API_ROUTE_PREFIX, DESCRIPTION, SUMMARY, TITLE, VERSION
config = {
"title": TITLE,
@@ -23,14 +24,22 @@ logging.basicConfig(
)
app = FastAPI(**config)
allowed_origins = os.environ.get("ALLOWED_ORIGINS", "*")
origins_list = [origin.strip() for origin in allowed_origins.split(",")] if allowed_origins != "*" else ["*"]
# Warn if CORS allows all origins
if origins_list == ["*"]:
logging.warning("CORS is configured to allow all origins (*). Set ALLOWED_ORIGINS environment variable to restrict access.")
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_origins=origins_list, # nosec - configurable via ALLOWED_ORIGINS env var
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
app.include_router(model.router, prefix=API_ROUTE_PREFIX)
app.include_router(chat.router, prefix=API_ROUTE_PREFIX)
app.include_router(embeddings.router, prefix=API_ROUTE_PREFIX)
@@ -44,10 +53,21 @@ async def health():
@app.exception_handler(RequestValidationError)
async def validation_exception_handler(request, exc):
logger = logging.getLogger(__name__)
# Log essential info only - avoid sensitive data and performance overhead
logger.warning(
"Request validation failed: %s %s - %s",
request.method,
request.url.path,
str(exc).split('\n')[0] # First line only
)
return PlainTextResponse(str(exc), status_code=400)
handler = Mangum(app)
if __name__ == "__main__":
uvicorn.run("app:app", host="0.0.0.0", port=8000, reload=True)
# Bind to 0.0.0.0 for container environments; network exposure is handled by network policies and load balancers
uvicorn.run("app:app", host="0.0.0.0", port=8000, reload=False) # nosec B104
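The CORS change above reads `ALLOWED_ORIGINS` as either `*` or a comma-separated list. A standalone sketch of that parsing logic (the function name `parse_allowed_origins` is assumed for illustration):

```python
def parse_allowed_origins(value: str) -> list[str]:
    # Mirrors the app.py logic: "*" allows every origin; otherwise the
    # value is a comma-separated list with surrounding whitespace trimmed.
    if value == "*":
        return ["*"]
    return [origin.strip() for origin in value.split(",")]
```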

View File

@@ -1,28 +1,43 @@
import json
import os
from typing import Annotated
import boto3
from botocore.exceptions import ClientError
from fastapi import Depends, HTTPException, status
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
from api.setting import DEFAULT_API_KEYS
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer
api_key_param = os.environ.get("API_KEY_PARAM_NAME")
api_key_secret_arn = os.environ.get("API_KEY_SECRET_ARN")
api_key_env = os.environ.get("API_KEY")
if api_key_param:
# For backward compatibility.
# Prefer Secrets Manager instead.
ssm = boto3.client("ssm")
api_key = ssm.get_parameter(Name=api_key_param, WithDecryption=True)["Parameter"][
"Value"
]
api_key = ssm.get_parameter(Name=api_key_param, WithDecryption=True)["Parameter"]["Value"]
elif api_key_secret_arn:
sm = boto3.client("secretsmanager")
try:
response = sm.get_secret_value(SecretId=api_key_secret_arn)
if "SecretString" in response:
secret = json.loads(response["SecretString"])
api_key = secret["api_key"]
except ClientError:
raise RuntimeError("Unable to retrieve API KEY, please ensure the secret ARN is correct")
except KeyError:
raise RuntimeError('Please ensure the secret contains an "api_key" field')
elif api_key_env:
api_key = api_key_env
else:
api_key = DEFAULT_API_KEYS
raise RuntimeError(
"API Key is not configured. Please set up your API Key."
)
security = HTTPBearer()
def api_key_auth(
credentials: Annotated[HTTPAuthorizationCredentials, Depends(security)]
credentials: Annotated[HTTPAuthorizationCredentials, Depends(security)],
):
if credentials.credentials != api_key:
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid API Key"
)
raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid API Key")
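The module above resolves the API key from one of three sources, in order, and now fails fast when none is set. A pure-function sketch of that precedence with the AWS calls factored out (the helper `resolve_api_key` and its parameters are hypothetical):

```python
import json

def resolve_api_key(ssm_value=None, secret_json=None, env_value=None):
    # Sketch of the precedence in api/auth.py: SSM parameter first,
    # then a Secrets Manager secret, then the API_KEY env var;
    # no source configured is a hard error.
    if ssm_value is not None:
        return ssm_value
    if secret_json is not None:
        try:
            return json.loads(secret_json)["api_key"]
        except KeyError:
            raise RuntimeError('Please ensure the secret contains an "api_key" field')
    if env_value is not None:
        return env_value
    raise RuntimeError("API Key is not configured. Please set up your API Key.")
```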

View File

@@ -1,3 +1,4 @@
import logging
import time
import uuid
from abc import ABC, abstractmethod
@@ -5,14 +6,17 @@ from typing import AsyncIterable
from api.schema import (
# Chat
ChatResponse,
ChatRequest,
ChatResponse,
ChatStreamResponse,
# Embeddings
EmbeddingsRequest,
EmbeddingsResponse,
Error,
)
logger = logging.getLogger(__name__)
class BaseChatModel(ABC):
"""Represent a basic chat model
@@ -29,12 +33,12 @@ class BaseChatModel(ABC):
pass
@abstractmethod
def chat(self, chat_request: ChatRequest) -> ChatResponse:
async def chat(self, chat_request: ChatRequest) -> ChatResponse:
"""Handle a basic chat completion request."""
pass
@abstractmethod
def chat_stream(self, chat_request: ChatRequest) -> AsyncIterable[bytes]:
async def chat_stream(self, chat_request: ChatRequest) -> AsyncIterable[bytes]:
"""Handle a basic chat completion request with a streamed response."""
pass
@@ -43,16 +47,20 @@ class BaseChatModel(ABC):
return "chatcmpl-" + str(uuid.uuid4())[:8]
@staticmethod
def stream_response_to_bytes(
response: ChatStreamResponse | None = None
) -> bytes:
if response:
def stream_response_to_bytes(response: ChatStreamResponse | Error | None = None) -> bytes:
if isinstance(response, Error):
logger.error("Stream error: %s", response.error.message if response.error else "Unknown error")
data = response.model_dump_json()
elif isinstance(response, ChatStreamResponse):
# Populate these fields so they survive serialization with exclude_unset=True
response.system_fingerprint = "fp"
response.object = "chat.completion.chunk"
response.created = int(time.time())
return "data: {}\n\n".format(response.model_dump_json(exclude_unset=True)).encode("utf-8")
return "data: [DONE]\n\n".encode("utf-8")
data = response.model_dump_json(exclude_unset=True)
else:
data = "[DONE]"
return f"data: {data}\n\n".encode("utf-8")
class BaseEmbeddingsModel(ABC):

File diff suppressed because it is too large

View File

@@ -1,11 +1,11 @@
from typing import Annotated
from fastapi import APIRouter, Depends, Body
from fastapi import APIRouter, Body, Depends
from fastapi.responses import StreamingResponse
from api.auth import api_key_auth
from api.models.bedrock import BedrockModel
from api.schema import ChatRequest, ChatResponse, ChatStreamResponse
from api.schema import ChatRequest, ChatResponse, ChatStreamResponse, Error
from api.setting import DEFAULT_MODEL
router = APIRouter(
@@ -15,7 +15,9 @@ router = APIRouter(
)
@router.post("/completions", response_model=ChatResponse | ChatStreamResponse, response_model_exclude_unset=True)
@router.post(
"/completions", response_model=ChatResponse | ChatStreamResponse | Error, response_model_exclude_unset=True
)
async def chat_completions(
chat_request: Annotated[
ChatRequest,
@@ -30,7 +32,7 @@ async def chat_completions(
}
],
),
]
],
):
if chat_request.model.lower().startswith("gpt-"):
chat_request.model = DEFAULT_MODEL
@@ -39,7 +41,5 @@ async def chat_completions(
model = BedrockModel()
model.validate(chat_request)
if chat_request.stream:
return StreamingResponse(
content=model.chat_stream(chat_request), media_type="text/event-stream"
)
return model.chat(chat_request)
return StreamingResponse(content=model.chat_stream(chat_request), media_type="text/event-stream")
return await model.chat(chat_request)
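The router remaps any `gpt-*` model name to the configured default so stock OpenAI clients work unchanged. A sketch of that fallback as a pure function (the helper name `normalize_model` is hypothetical):

```python
def normalize_model(requested: str, default_model: str) -> str:
    # Mirrors the chat router: "gpt-*" names fall back to DEFAULT_MODEL;
    # everything else passes through to Bedrock as-is.
    if requested.lower().startswith("gpt-"):
        return default_model
    return requested
```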

View File

@@ -1,6 +1,6 @@
from typing import Annotated
from fastapi import APIRouter, Depends, Body
from fastapi import APIRouter, Body, Depends
from api.auth import api_key_auth
from api.models.bedrock import get_embeddings_model
@@ -21,13 +21,11 @@ async def embeddings(
examples=[
{
"model": "cohere.embed-multilingual-v3",
"input": [
"Your text string goes here"
],
"input": ["Your text string goes here"],
}
],
),
]
],
):
if embeddings_request.model.lower().startswith("text-embedding-"):
embeddings_request.model = DEFAULT_EMBEDDING_MODEL

View File

@@ -4,7 +4,7 @@ from fastapi import APIRouter, Depends, HTTPException, Path
from api.auth import api_key_auth
from api.models.bedrock import BedrockModel
from api.schema import Models, Model
from api.schema import Model, Models
router = APIRouter(
prefix="/models",
@@ -22,9 +22,7 @@ async def validate_model_id(model_id: str):
@router.get("", response_model=Models)
async def list_models():
model_list = [
Model(id=model_id) for model_id in chat_model.list_models()
]
model_list = [Model(id=model_id) for model_id in chat_model.list_models()]
return Models(data=model_list)
@@ -36,7 +34,7 @@ async def get_model(
model_id: Annotated[
str,
Path(description="Model ID", example="anthropic.claude-3-sonnet-20240229-v1:0"),
]
],
):
await validate_model_id(model_id)
return Model(id=model_id)

View File

@@ -1,8 +1,10 @@
import time
from typing import Literal, Iterable
from typing import Iterable, Literal
from pydantic import BaseModel, Field
from api.setting import DEFAULT_MODEL
class Model(BaseModel):
id: str
@@ -39,10 +41,15 @@ class ImageUrl(BaseModel):
class ImageContent(BaseModel):
type: Literal["image_url"] = "image"
type: Literal["image_url"] = "image_url"
image_url: ImageUrl
class ToolContent(BaseModel):
type: Literal["text"] = "text"
text: str
class SystemMessage(BaseModel):
name: str | None = None
role: Literal["system"] = "system"
@@ -58,16 +65,22 @@ class UserMessage(BaseModel):
class AssistantMessage(BaseModel):
name: str | None = None
role: Literal["assistant"] = "assistant"
content: str | list[TextContent | ImageContent] | None
content: str | list[TextContent | ImageContent] | None = None
tool_calls: list[ToolCall] | None = None
class ToolMessage(BaseModel):
role: Literal["tool"] = "tool"
content: str
content: str | list[ToolContent] | list[dict]
tool_call_id: str
class DeveloperMessage(BaseModel):
name: str | None = None
role: Literal["developer"] = "developer"
content: str
class Function(BaseModel):
name: str
description: str | None = None
@@ -84,25 +97,43 @@ class StreamOptions(BaseModel):
class ChatRequest(BaseModel):
messages: list[SystemMessage | UserMessage | AssistantMessage | ToolMessage]
model: str
messages: list[SystemMessage | UserMessage | AssistantMessage | ToolMessage | DeveloperMessage]
model: str = DEFAULT_MODEL
frequency_penalty: float | None = Field(default=0.0, le=2.0, ge=-2.0) # Not used
presence_penalty: float | None = Field(default=0.0, le=2.0, ge=-2.0) # Not used
stream: bool | None = False
stream_options: StreamOptions | None = None
temperature: float | None = Field(default=1.0, le=2.0, ge=0.0)
top_p: float | None = Field(default=1.0, le=1.0, ge=0.0)
temperature: float | None = Field(default=None, le=2.0, ge=0.0)
top_p: float | None = Field(default=None, le=1.0, ge=0.0)
user: str | None = None # Not used
max_tokens: int | None = 2048
max_completion_tokens: int | None = None
reasoning_effort: Literal["low", "medium", "high"] | None = None
n: int | None = 1 # Not used
tools: list[Tool] | None = None
tool_choice: str | object = "auto"
stop: list[str] | str | None = None
extra_body: dict | None = None
class PromptTokensDetails(BaseModel):
"""Details about prompt tokens usage, following OpenAI API format."""
cached_tokens: int = 0
audio_tokens: int = 0
class CompletionTokensDetails(BaseModel):
"""Details about completion tokens usage, following OpenAI API format."""
reasoning_tokens: int = 0
audio_tokens: int = 0
class Usage(BaseModel):
prompt_tokens: int
completion_tokens: int
total_tokens: int
prompt_tokens_details: PromptTokensDetails | None = None
completion_tokens_details: CompletionTokensDetails | None = None
class ChatResponseMessage(BaseModel):
@@ -110,6 +141,7 @@ class ChatResponseMessage(BaseModel):
role: Literal["assistant"] | None = None
content: str | None = None
tool_calls: list[ToolCall] | None = None
reasoning_content: str | None = None
class BaseChoice(BaseModel):
@@ -150,7 +182,7 @@ class EmbeddingsRequest(BaseModel):
input: str | list[str] | Iterable[int | Iterable[int]]
model: str
encoding_format: Literal["float", "base64"] = "float"
dimensions: int | None = None # not used.
dimensions: int | None = None # Used by Nova embeddings; ignored by other models.
user: str | None = None # not used.
@@ -170,3 +202,11 @@ class EmbeddingsResponse(BaseModel):
data: list[Embedding]
model: str
usage: EmbeddingsUsage
class ErrorMessage(BaseModel):
message: str
class Error(BaseModel):
error: ErrorMessage
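The extended `Usage` schema above adds OpenAI-compatible token-detail objects (cached prompt tokens, reasoning tokens). A plain-dict sketch of the resulting wire shape (the builder `build_usage` is an illustrative name, not part of the schema):

```python
def build_usage(prompt: int, completion: int, reasoning: int = 0, cached: int = 0) -> dict:
    # Shape matches the Usage / PromptTokensDetails / CompletionTokensDetails
    # models in the diff above, with audio_tokens defaulting to 0.
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": prompt + completion,
        "prompt_tokens_details": {"cached_tokens": cached, "audio_tokens": 0},
        "completion_tokens_details": {"reasoning_tokens": reasoning, "audio_tokens": 0},
    }
```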

View File

@@ -1,28 +1,18 @@
import os
DEFAULT_API_KEYS = "bedrock"
API_ROUTE_PREFIX = "/api/v1"
API_ROUTE_PREFIX = os.environ.get("API_ROUTE_PREFIX", "/api/v1")
TITLE = "Amazon Bedrock Proxy APIs"
SUMMARY = "OpenAI-Compatible RESTful APIs for Amazon Bedrock"
VERSION = "0.1.0"
DESCRIPTION = """
Use OpenAI-Compatible RESTful APIs for Amazon Bedrock models.
List of Amazon Bedrock models currently supported:
- Anthropic Claude 2 / 3 / 3.5 (Haiku/Sonnet/Opus)
- Meta Llama 2 / 3
- Mistral / Mixtral
- Cohere Command R / R+
- Cohere Embedding
"""
DEBUG = os.environ.get("DEBUG", "false").lower() != "false"
AWS_REGION = os.environ.get("AWS_REGION", "us-west-2")
DEFAULT_MODEL = os.environ.get(
"DEFAULT_MODEL", "anthropic.claude-3-sonnet-20240229-v1:0"
)
DEFAULT_EMBEDDING_MODEL = os.environ.get(
"DEFAULT_EMBEDDING_MODEL", "cohere.embed-multilingual-v3"
)
DEFAULT_MODEL = os.environ.get("DEFAULT_MODEL", "anthropic.claude-3-sonnet-20240229-v1:0")
DEFAULT_EMBEDDING_MODEL = os.environ.get("DEFAULT_EMBEDDING_MODEL", "cohere.embed-multilingual-v3")
ENABLE_CROSS_REGION_INFERENCE = os.environ.get("ENABLE_CROSS_REGION_INFERENCE", "true").lower() != "false"
ENABLE_APPLICATION_INFERENCE_PROFILES = os.environ.get("ENABLE_APPLICATION_INFERENCE_PROFILES", "true").lower() != "false"
ENABLE_PROMPT_CACHING = os.environ.get("ENABLE_PROMPT_CACHING", "false").lower() != "false"
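The boolean settings above all follow one convention: a flag is on unless the environment value is exactly `"false"` (case-insensitive). A sketch of that idiom as a helper (the name `env_flag` is assumed; the repo inlines the expression per setting):

```python
import os

def env_flag(name: str, default: str = "true") -> bool:
    # Mirrors the setting.py convention: anything other than "false"
    # (case-insensitive) counts as enabled, including unset-with-default.
    return os.environ.get(name, default).lower() != "false"
```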

View File

@@ -1,9 +1,10 @@
fastapi==0.111.0
pydantic==2.7.1
fastapi==0.128.0
starlette==0.49.1 # CVE-2025-62727: Fix ReDoS in Range header parsing
pydantic==2.11.4
uvicorn==0.29.0
mangum==0.17.0
tiktoken==0.6.0
requests==2.32.3
numpy==1.26.4
boto3==1.34.132
botocore==1.34.132
tiktoken==0.9.0
requests==2.32.4
numpy==2.2.5
boto3==1.40.4
botocore==1.40.4

View File

@@ -1,87 +0,0 @@
import time
import random
def calculate_factorial(n):
if n == 0:
return 1
else:
return n * calculate_factorial(n - 1)
def find_largest_number(numbers):
largest = numbers[0]
for num in numbers:
if num > largest:
largest = num
return largest
def inefficient_sort(arr):
n = len(arr)
for i in range(n):
for j in range(0, n-i-1):
if arr[j] > arr[j+1]:
arr[j], arr[j+1] = arr[j+1], arr[j]
return arr
class User:
def __init__(self, name, age):
self.name = name
self.age = age
def print_user_info(self):
print(f"Name: {self.name}, Age: {self.age}")
def process_data(data):
result = []
for item in data:
if item % 2 == 0:
result.append(item * 2)
else:
result.append(item * 3)
return result
def generate_random_numbers(n):
numbers = []
for i in range(n):
numbers.append(random.randint(1, 100))
return numbers
def calculate_average(numbers):
total = sum(numbers)
count = len(numbers)
average = total / count
return average
def main():
# Inefficient factorial calculation
print(calculate_factorial(20))
# Unnecessary loop for finding largest number
numbers = [3, 7, 2, 9, 1, 5]
print(find_largest_number(numbers))
# Inefficient sorting algorithm
unsorted_list = [64, 34, 25, 12, 22, 11, 90]
print(inefficient_sort(unsorted_list))
# Inconsistent naming convention
user1 = User("John Doe", 30)
user1.print_user_info()
# Redundant if-else structure
data = [1, 2, 3, 4, 5]
print(process_data(data))
# Inefficient random number generation
random_numbers = generate_random_numbers(1000000)
print(f"Generated {len(random_numbers)} random numbers")
# Potential division by zero
empty_list = []
print(calculate_average(empty_list))
# Unnecessary time delay
time.sleep(5)
print("Finished processing after 5 seconds")
if __name__ == "__main__":
main()