Compare commits

1 Commits
main ... dev

Author SHA1 Message Date
yike5460
5c5e370a81 test: folder with error file for code review and pr description 2024-10-09 08:20:13 +00:00
34 changed files with 2765 additions and 2927 deletions

19
.flake8 Normal file

@@ -0,0 +1,19 @@
[flake8]
max-line-length = 120
ignore =
    E203,W191,W503
exclude =
    build
    .git
    __pycache__
    .tox
    venv
    .venv
    .venv-test
    tmp*
    deployment
    cdk.out
    node_modules
max-complexity = 10
require-code = True

74
.github/aws-genai-cicd-suite.yml vendored Normal file

@@ -0,0 +1,74 @@
name: Intelligent Code Review

# Enable manual trigger
on:
  workflow_dispatch:
  pull_request:
    types: [opened, synchronize]

# Avoid running the same workflow on the same branch concurrently
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}

jobs:
  review:
    runs-on: ubuntu-latest
    permissions:
      # required to request the OIDC token used by configure-aws-credentials
      id-token: write
      # allow the github-actions bot to push new content into existing pull requests
      contents: write
      # read repository contents and write pull request comments
      pull-requests: write
    steps:
      - name: Checkout code
        uses: actions/checkout@v3
      - name: Set up Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '20'
      - name: Install dependencies
        run: npm ci
        shell: bash
      # check that the required dependencies @actions/core and @actions/github are installed
      - name: Check if required dependencies are installed
        run: |
          npm list @actions/core
          npm list @actions/github
        shell: bash
      - name: Debug GitHub Token
        run: |
          if [ -n "${{ secrets.GITHUB_TOKEN }}" ]; then
            echo "GitHub Token is set"
          else
            echo "GitHub Token is not set"
          fi
      # assume the specified IAM role and set up AWS credentials for subsequent steps
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          # the role ARN comes from a repository secret
          role-to-assume: ${{ secrets.AWS_ROLE_TO_ASSUME }}
          aws-region: us-east-1
      - name: Intelligent GitHub Actions
        uses: aws-samples/aws-genai-cicd-suite@stable
        with:
          # GITHUB_TOKEN is automatically created and provided by GitHub for each
          # workflow run; you don't need to create or store it as a secret manually.
          github-token: ${{ secrets.GITHUB_TOKEN }}
          aws-region: us-east-1
          model-id: anthropic.claude-3-sonnet-20240229-v1:0
          generate-code-review: 'true'
          generate-code-review-level: 'detailed'
          generate-code-review-exclude-files: '*.md,*.json,*.js'
          generate-pr-description: 'true'
          generate-unit-test: 'false'
          generate-unit-test-source-folder: 'debugging'
          # removed the invalid input 'generate-unit-test-exclude-files'
          # output-language: 'zh'
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

1
.gitignore vendored

@@ -160,4 +160,3 @@ cython_debug/
.idea/
Config
.vscode/launch.json


@@ -1,10 +0,0 @@
repos:
- repo: https://github.com/astral-sh/ruff-pre-commit
# Ruff version.
rev: v0.9.10
hooks:
# Run the linter.
- id: ruff
types_or: [python, pyi]
# Run the formatter.
- id: ruff-format

309
README.md

@@ -1,19 +1,15 @@
[中文](./README_CN.md)
# Bedrock Access Gateway
OpenAI-compatible RESTful APIs for Amazon Bedrock
## What's New 🔥
**API Gateway Response Streaming Support** - You can now deploy with Amazon API Gateway REST API instead of ALB, enabling true response streaming for better latency and cost optimization. See [Deployment Options](#deployment-options) for details.
**Latest Models Supported:**
- **Claude 4.5 Family**: Opus 4.5, Sonnet 4.5, Haiku 4.5 - Anthropic's most intelligent models with enhanced coding and agent capabilities
- **Amazon Nova**: Nova Micro, Nova Lite, Nova Pro, Nova Premier - Amazon's native foundation models with multimodal support
- **DeepSeek**: DeepSeek-R1 (reasoning), DeepSeek-V3.1 - advanced reasoning and general-purpose models
- **Qwen 3**: Qwen3-32B, Qwen3-235B, Qwen3-Coder-30B, Qwen3-Coder-480B - Alibaba's latest language and coding models
- **OpenAI OSS**: gpt-oss-20b, gpt-oss-120b - open-source GPT models available via Bedrock
It also supports reasoning for **Claude 4/4.5** (extended thinking and interleaved thinking) and **DeepSeek R1**. Check [How to Use](./docs/Usage.md#reasoning) for more details.
## Breaking Changes
The source code has been refactored with the new [Converse API](https://docs.aws.amazon.com/bedrock/latest/userguide/conversation-inference.html) from Bedrock, which provides native support for tool calls.
If you are facing any problems, please raise an issue.
You need to first run the Models API to refresh the model list.
## Overview
@@ -29,17 +25,25 @@ If you find this GitHub repository useful, please consider giving it a free star
- [x] Support streaming response via server-sent events (SSE)
- [x] Support Model APIs
- [x] Support Chat Completion APIs
- [x] Support Tool Call
- [x] Support Tool Call (**new**)
- [x] Support Embedding API
- [x] Support Embedding API (**new**)
- [x] Support Multimodal API
- [x] Support Multimodal API (**new**)
- [x] Support Cross-Region Inference
- [x] Support Application Inference Profiles (**new**)
- [x] Support Reasoning (**new**)
- [x] Support Interleaved thinking (**new**)
- [x] Support Prompt Caching (**new**)
Please check [Usage Guide](./docs/Usage.md) for more details about how to use the new APIs.
> **Note:** The legacy [text completion](https://platform.openai.com/docs/api-reference/completions) API is not supported, you should change to use chat completion API.
Supported Amazon Bedrock models family:
- Anthropic Claude 2 / 3 (Haiku/Sonnet/Opus)
- Meta Llama 2 / 3
- Mistral / Mixtral
- Cohere Command R / R+
- Cohere Embedding
You can call the `models` API to get the full list of model IDs supported.
> **Note:** The default model is set to `anthropic.claude-3-sonnet-20240229-v1:0` which can be changed via Lambda environment variables (`DEFAULT_MODEL`).
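As a concrete illustration of the Models API mentioned above, the sketch below lists the model IDs exposed by a deployed gateway. It assumes the `openai` Python SDK and that `OPENAI_API_KEY`/`OPENAI_BASE_URL` point at your deployment; the helper names `models_endpoint` and `list_model_ids` are illustrative, not part of the project.

```python
import os


def models_endpoint(base_url: str) -> str:
    """Build the Models API URL from the gateway base URL."""
    return base_url.rstrip("/") + "/models"


def list_model_ids() -> list:
    """Query the gateway for its supported model IDs (requires a deployment)."""
    from openai import OpenAI  # pip install openai

    client = OpenAI(
        api_key=os.environ["OPENAI_API_KEY"],
        base_url=os.environ["OPENAI_BASE_URL"],
    )
    return [model.id for model in client.models.list().data]
```

Against a live deployment, `list_model_ids()` returns IDs such as `anthropic.claude-3-sonnet-20240229-v1:0`, any of which can be passed as the `model` in subsequent requests.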
## Get Started
@@ -53,100 +57,58 @@ Please make sure you have met below prerequisites:
### Architecture
The following diagram illustrates the reference architecture. It uses [Amazon API Gateway response streaming](https://aws.amazon.com/blogs/compute/building-responsive-apis-with-amazon-api-gateway-response-streaming/) with Lambda for SSE support.
![Architecture](assets/arch.png)
### Deployment Options
| Option | Pros | Cons | Best For |
|--------|------|------|----------|
| **API Gateway + Lambda** | No VPC required, pay-per-request, native streaming support, lower operational overhead | Potential cold starts | Most use cases, cost-sensitive deployments |
| **ALB + Fargate** | Lowest streaming latency, no cold starts | Higher cost, requires VPC | High-throughput, latency-sensitive workloads |
You can also use a Lambda Function URL as an alternative, see [example](https://github.com/awslabs/aws-lambda-web-adapter/tree/main/examples/fastapi-response-streaming).
The following diagram illustrates the reference architecture. Note that it also includes a new **VPC** with two public subnets only for the Application Load Balancer (ALB).
![Architecture](assets/arch.svg)
You can also choose to use [AWS Fargate](https://aws.amazon.com/fargate/) behind the ALB instead of [AWS Lambda](https://aws.amazon.com/lambda/); the main difference is the first-byte latency of streaming responses (Fargate is lower).
Alternatively, you can use a Lambda Function URL to replace the ALB, see [example](https://github.com/awslabs/aws-lambda-web-adapter/tree/main/examples/fastapi-response-streaming).
### Deployment
Please follow the steps below to deploy the Bedrock Proxy APIs into your AWS account. Only regions where Amazon Bedrock is available (such as `us-west-2`) are supported. The deployment will take approximately **10-15 minutes** 🕒.
**Step 1: Create your own API key in Secrets Manager (MUST)**
> **Note:** This step is to use any string (without spaces) you like to create a custom API Key (credential) that will be used to access the proxy API later. This key does not have to match your actual OpenAI key, and you don't need to have an OpenAI API key. Please keep the key safe and private.
1. Open the AWS Management Console and navigate to the AWS Secrets Manager service.
2. Click the "Store a new secret" button.
3. On the "Choose secret type" page, select:
   - Secret type: Other type of secret
   - Key/value pairs:
     - Key: `api_key`
     - Value: Enter your API key value
   Click "Next".
4. On the "Configure secret" page:
   - Secret name: Enter a name (e.g., "BedrockProxyAPIKey")
   - Description: (Optional) Add a description of your secret
5. Click "Next", review all your settings, and click "Store".
After creation, you'll see your secret in the Secrets Manager console. Make a note of the secret ARN.
**Step 2: Build and push container images to ECR**
1. Clone this repository:
```bash
git clone https://github.com/aws-samples/bedrock-access-gateway.git
cd bedrock-access-gateway
```
2. Run the build and push script:
```bash
cd scripts
bash ./push-to-ecr.sh
```
3. Follow the prompts to configure:
   - ECR repository names (or use defaults)
   - Image tag (or use default: `latest`)
   - AWS region (or use default: `us-east-1`)
4. The script will build and push both Lambda and ECS/Fargate images to your ECR repositories.
5. **Important**: Copy the image URIs displayed at the end of the script output. You'll need these in the next step.
**Step 3: Deploy the CloudFormation stack**
1. Download the CloudFormation template you want to use:
   - For API Gateway + Lambda: [`deployment/BedrockProxy.template`](deployment/BedrockProxy.template)
   - For ALB + Fargate: [`deployment/BedrockProxyFargate.template`](deployment/BedrockProxyFargate.template)
2. Sign in to the AWS Management Console and navigate to the CloudFormation service in your target region.
3. Click "Create stack" → "With new resources (standard)".
4. Upload the template file you downloaded.
5. On the "Specify stack details" page, provide the following information:
   - **Stack name**: Enter a stack name (e.g., "BedrockProxyAPI")
   - **ApiKeySecretArn**: Enter the secret ARN from Step 1
   - **ContainerImageUri**: Enter the ECR image URI from Step 2 output
   - **DefaultModelId**: (Optional) Change the default model if needed
   Click "Next".
6. On the "Configure stack options" page, you can leave the default settings or customize them according to your needs. Click "Next".
7. On the "Review" page, review all details. Check the "I acknowledge that AWS CloudFormation might create IAM resources" checkbox at the bottom. Click "Submit".
Please follow the steps below to deploy the Bedrock Proxy APIs into your AWS account. Only regions where Amazon Bedrock is available (such as `us-west-2`) are supported. The deployment will take approximately **3-5 minutes** 🕒.
**Step 1: Create your own custom API key (Optional)**
> **Note:** This step is to use any string (without spaces) you like to create a custom API Key (credential) that will be used to access the proxy API later. This key does not have to match your actual OpenAI key, and you don't need to have an OpenAI API key. It is recommended that you take this step and ensure that you keep the key safe and private.
1. Open the AWS Management Console and navigate to the Systems Manager service.
2. In the left-hand navigation pane, click on "Parameter Store".
3. Click the "Create parameter" button.
4. In the "Create parameter" window, select the following options:
   - Name: Enter a descriptive name for your parameter (e.g., "BedrockProxyAPIKey").
   - Description: Optionally, provide a description for the parameter.
   - Tier: Select **Standard**.
   - Type: Select **SecureString**.
   - Value: Any string (without spaces).
5. Click "Create parameter".
6. Make a note of the parameter name you used (e.g., "BedrockProxyAPIKey"). You'll need it in the next step.
**Step 2: Deploy the CloudFormation stack**
1. Sign in to the AWS Management Console and switch to the region to deploy the CloudFormation stack to.
2. Click the following button to launch the CloudFormation stack in that region. Choose one of the following:
   - **ALB + Lambda**
     [![Launch Stack](assets/launch-stack.png)](https://console.aws.amazon.com/cloudformation/home#/stacks/create/template?stackName=BedrockProxyAPI&templateURL=https://aws-gcr-solutions.s3.amazonaws.com/bedrock-access-gateway/latest/BedrockProxy.template)
   - **ALB + Fargate**
     [![Launch Stack](assets/launch-stack.png)](https://console.aws.amazon.com/cloudformation/home#/stacks/create/template?stackName=BedrockProxyAPI&templateURL=https://aws-gcr-solutions.s3.amazonaws.com/bedrock-access-gateway/latest/BedrockProxyFargate.template)
3. Click "Next".
4. On the "Specify stack details" page, provide the following information:
   - Stack name: Change the stack name if needed.
   - ApiKeyParam (if you set up an API key in Step 1): Enter the parameter name you used for storing the API key (e.g., `BedrockProxyAPIKey`). If you did not set up an API key, leave this field blank. Click "Next".
5. On the "Configure stack options" page, you can leave the default settings or customize them according to your needs.
6. Click "Next".
7. On the "Review" page, review the details of the stack you're about to create. Check the "I acknowledge that AWS CloudFormation might create IAM resources" checkbox at the bottom.
8. Click "Create stack".
That is it! 🎉 Once deployed, click the CloudFormation stack and go to the **Outputs** tab; you can find the API Base URL from `APIBaseUrl`, the value should look like `http://xxxx.xxx.elb.amazonaws.com/api/v1`.
### Troubleshooting
If you encounter any issues, please check the [Troubleshooting Guide](./docs/Troubleshooting.md) for more details.
### SDK/API Usage
All you need is the API Key and the API Base URL. If you didn't set up your own key following Step 1, the application will fail to start with an error message indicating that the API Key is not configured.
All you need is the API Key and the API Base URL. If you didn't set up your own key, then the default API Key (`bedrock`) will be used.
Now, you can try out the proxy APIs. Let's say you want to test the Claude 3 Sonnet model (model ID: `anthropic.claude-3-sonnet-20240229-v1:0`)...
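As a concrete sketch of the call described above, the snippet below sends a chat-completion request through the proxy with the `openai` Python SDK. It assumes `OPENAI_API_KEY` and `OPENAI_BASE_URL` point at your deployed gateway; the helper names `build_payload` and `ask` are illustrative, not part of the project.

```python
import os

MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"


def build_payload(prompt: str, model: str = MODEL_ID) -> dict:
    """Assemble an OpenAI-compatible chat-completion request body."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}


def ask(prompt: str) -> str:
    """Send the prompt through the proxy and return the reply text.

    Requires a running deployment and the OPENAI_* environment variables.
    """
    from openai import OpenAI  # pip install openai

    client = OpenAI(
        api_key=os.environ["OPENAI_API_KEY"],
        base_url=os.environ["OPENAI_BASE_URL"],
    )
    completion = client.chat.completions.create(**build_payload(prompt))
    return completion.choices[0].message.content
```

Because the gateway is OpenAI-compatible, the same request body works with plain `curl` against `$OPENAI_BASE_URL/chat/completions` as well.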
@@ -191,123 +153,14 @@ print(completion.choices[0].message.content)
Please check [Usage Guide](./docs/Usage.md) for more details about how to use embedding API, multimodal API and tool call.
### Application Inference Profiles
This proxy now supports **Application Inference Profiles**, which allow you to track usage and costs for your model invocations. You can use application inference profiles created in your AWS account for cost tracking and monitoring purposes.
**Using Application Inference Profiles:**
```bash
# Use an application inference profile ARN as the model ID
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "arn:aws:bedrock:us-west-2:123456789012:application-inference-profile/your-profile-id",
"messages": [
{
"role": "user",
"content": "Hello!"
}
]
}'
```
**SDK Usage with Application Inference Profiles:**
```python
from openai import OpenAI
client = OpenAI()
completion = client.chat.completions.create(
model="arn:aws:bedrock:us-west-2:123456789012:application-inference-profile/your-profile-id",
messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)
```
**Benefits of Application Inference Profiles:**
- **Cost Tracking**: Track usage and costs for specific applications or use cases
- **Usage Monitoring**: Monitor model invocation metrics through CloudWatch
- **Tag-based Cost Allocation**: Use AWS cost allocation tags for detailed billing analysis
For more information about creating and managing application inference profiles, see the [Amazon Bedrock User Guide](https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles-create.html).
### Prompt Caching
This proxy now supports **Prompt Caching** for Claude and Nova models, which can reduce costs by up to 90% and latency by up to 85% for workloads with repeated prompts.
**Supported Models:**
- Claude models (Claude 3.5 Haiku, Claude 4, Claude 4.5, etc.)
- Nova models (Nova Micro, Nova Lite, Nova Pro, Nova Premier)
**Enabling Prompt Caching:**
You can enable prompt caching in two ways:
1. **Globally via Environment Variable** (set in ECS Task Definition or Lambda):
```bash
ENABLE_PROMPT_CACHING=true
```
2. **Per-request via `extra_body`**:
**Python SDK:**
```python
from openai import OpenAI
client = OpenAI()
# Cache system prompts
response = client.chat.completions.create(
model="global.anthropic.claude-haiku-4-5-20251001-v1:0",
messages=[
{"role": "system", "content": "You are an expert assistant with knowledge of..."},
{"role": "user", "content": "Help me with this task"}
],
extra_body={
"prompt_caching": {"system": True}
}
)
# Check cache hit
if response.usage.prompt_tokens_details:
cached_tokens = response.usage.prompt_tokens_details.cached_tokens
print(f"Cached tokens: {cached_tokens}")
```
**cURL:**
```bash
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "global.anthropic.claude-haiku-4-5-20251001-v1:0",
"messages": [
{"role": "system", "content": "Long system prompt..."},
{"role": "user", "content": "Question"}
],
"extra_body": {
"prompt_caching": {"system": true}
}
}'
```
**Cache Options:**
- `"prompt_caching": {"system": true}` - Cache system prompts
- `"prompt_caching": {"messages": true}` - Cache user messages
- `"prompt_caching": {"system": true, "messages": true}` - Cache both
**Requirements:**
- Prompt must be ≥1,024 tokens to enable caching
- Cache TTL is 5 minutes (resets on each cache hit)
- Nova models have a 20,000 token caching limit
For more information, see the [Amazon Bedrock Prompt Caching Guide](https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html).
## Other Examples
### AutoGen
Below is an image of setting up the model in AutoGen studio.
![AutoGen Model](assets/autogen-model.png)
### LangChain
Make sure you use `ChatOpenAI(...)` instead of `OpenAI(...)`.
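To make the `ChatOpenAI` note above concrete, here is a minimal sketch that points LangChain at the gateway instead of api.openai.com. The helper `gateway_kwargs` is ours; the default key `bedrock` and the local base URL are fallbacks used only when the `OPENAI_*` environment variables are unset.

```python
import os


def gateway_kwargs() -> dict:
    """Connection settings that point ChatOpenAI at the gateway."""
    return {
        "model": "anthropic.claude-3-sonnet-20240229-v1:0",
        "temperature": 0,
        "openai_api_key": os.environ.get("OPENAI_API_KEY", "bedrock"),
        "openai_api_base": os.environ.get(
            "OPENAI_BASE_URL", "http://localhost:8000/api/v1"
        ),
    }


def chat_once(prompt: str) -> str:
    """Invoke the model once via LangChain.

    Requires `pip install langchain-openai` and a reachable deployment.
    """
    from langchain_openai import ChatOpenAI

    chat = ChatOpenAI(**gateway_kwargs())
    return chat.invoke(prompt).content
```

Using `OpenAI(...)` from LangChain instead would hit the legacy text-completion endpoint, which this gateway does not implement.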
@@ -346,37 +199,43 @@ print(response)
This application does not collect any of your data. Furthermore, it does not log any requests or responses by default.
### Why choose API Gateway vs ALB?
**API Gateway + Lambda** uses [API Gateway response streaming](https://aws.amazon.com/blogs/compute/building-responsive-apis-with-amazon-api-gateway-response-streaming/) with [Lambda Web Adapter](https://github.com/awslabs/aws-lambda-web-adapter) to support SSE streaming without requiring a VPC. This is a cost-effective, serverless option with a timeout of up to 10 minutes.
**ALB + Fargate** provides the lowest streaming latency with no cold starts, ideal for high-throughput workloads.
### Why not use API Gateway instead of an Application Load Balancer?
The short answer is that API Gateway does not support server-sent events (SSE) for streaming responses.
### Which regions are supported?
This solution only supports the regions where Amazon Bedrock is available; as of now, the list is below:
- US East (N. Virginia): us-east-1
- US West (Oregon): us-west-2
- Asia Pacific (Singapore): ap-southeast-1
- Asia Pacific (Sydney): ap-southeast-2
- Asia Pacific (Tokyo): ap-northeast-1
- Europe (Frankfurt): eu-central-1
- Europe (Paris): eu-west-3
Generally speaking, all regions that Amazon Bedrock supports will also be supported; if not, please raise an issue on GitHub.
Note that not all models are available in those regions.
### Which models are supported?
You can use the [Models API](./docs/Usage.md#models-api) to get/refresh the list of supported models in the current region.
### Can I build and use my own ECR image?
Yes, you can clone the repo, build the container image yourself (`src/Dockerfile`), and push it to your own ECR repo. You can use `scripts/push-to-ecr.sh`.
Replace the image repository URL in the CloudFormation template before you deploy.
### Can I run this locally?
Yes, you can run this locally, e.g. run the command below under the `src` folder:
```bash
uvicorn api.app:app --host 0.0.0.0 --port 8000
```
The API base URL should look like `http://localhost:8000/api/v1`.
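A quick smoke test against that local server might look like the sketch below. It assumes the default key `bedrock` (used when you didn't configure your own); the helper names are illustrative only.

```python
def local_base_url(port: int = 8000) -> str:
    """Base URL of a gateway started with the uvicorn command above."""
    return f"http://localhost:{port}/api/v1"


def smoke_test() -> list:
    """List a few model IDs from the local server (requires it to be running)."""
    from openai import OpenAI  # pip install openai

    client = OpenAI(api_key="bedrock", base_url=local_base_url())
    return [m.id for m in client.models.list().data][:5]
```

If `smoke_test()` returns model IDs, the local gateway and your AWS credentials are wired up correctly.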
### Any performance sacrifice or latency increase by using the proxy APIs?
Compared with direct AWS SDK calls, the proxy architecture will add some latency. The default API Gateway + Lambda deployment provides good streaming performance with Lambda response streaming.
For the lowest latency on streaming responses, consider the ALB + Fargate deployment option, which eliminates cold starts and provides consistent performance.
Compared with an AWS SDK call, the referenced architecture adds latency to responses; you can test it on your own.
Also, you can use Lambda Web Adapter + Function URL (see [example](https://github.com/awslabs/aws-lambda-web-adapter/tree/main/examples/fastapi-response-streaming)) to replace the ALB, or AWS Fargate to replace Lambda, to get better streaming performance.
### Any plan to support SageMaker models?
@@ -388,7 +247,13 @@ Fine-tuned models and models with Provisioned Throughput are currently not suppo
### How to upgrade?
To use the latest features, you need to follow the deployment guide and redeploy the application. You can upgrade the existing CloudFormation stack to get the latest changes.
To use the latest features, you don't need to redeploy the CloudFormation stack; you simply need to pull the latest image. How to do so depends on which version you deployed:
- **Lambda version**: Go to the AWS Lambda console, find the Lambda function, then find and click the `Deploy new image` button and click save.
- **Fargate version**: Go to the ECS console, click the ECS cluster, go to the `Tasks` tab, select the only running task and click the `Stop selected` menu. A new task with the latest image will start automatically.
## Security

267
README_CN.md Normal file

@@ -0,0 +1,267 @@
[English](./README.md)
# Bedrock Access Gateway
OpenAI-compatible RESTful APIs for Amazon Bedrock
## Breaking Changes
The project source code has been refactored with the new [Converse API](https://docs.aws.amazon.com/bedrock/latest/userguide/conversation-inference.html) provided by Bedrock, which natively supports tool calls.
If you run into any problems, please open a GitHub Issue.
## Overview
Amazon Bedrock offers a wide range of foundation models (such as Claude 3 Opus/Sonnet/Haiku, Llama 2/3, Mistral/Mixtral, etc.) along with a broad set of capabilities for building generative AI applications. For more details, see [Amazon Bedrock](https://aws.amazon.com/bedrock).
Sometimes, you might have applications already built with OpenAI's APIs or SDKs and want to try Amazon Bedrock's models without modifying your code. Or you may simply want to evaluate these foundation models in tools such as AutoGen. The good news is that this project provides a convenient way to seamlessly integrate and try Amazon Bedrock's models through OpenAI's APIs or SDKs, without changing your existing code.
If you find this project useful, please consider giving it a free star ⭐.
Features:
- [x] Support streaming responses via server-sent events (SSE)
- [x] Support Model APIs
- [x] Support Chat Completion APIs
- [x] Support Tool Call (**new**)
- [x] Support Embedding API (**new**)
- [x] Support Multimodal API (**new**)
Please check the [Usage Guide](./docs/Usage_CN.md) for more details about how to use the new APIs.
> Note: The legacy [text completion](https://platform.openai.com/docs/api-reference/completions) API is not supported; please switch to the Chat Completion API.
Supported Amazon Bedrock model families:
- Anthropic Claude 2 / 3 (Haiku/Sonnet/Opus)
- Meta Llama 2 / 3
- Mistral / Mixtral
- Cohere Command R / R+
- Cohere Embedding
You can call the `models` API first to get the full list of supported model IDs.
> Note: The default model is `anthropic.claude-3-sonnet-20240229-v1:0` and can be changed via Lambda environment variables.
## Usage Guide
### Prerequisites
Please make sure you have met the following prerequisites:
- Access to Amazon Bedrock foundation models.
If you haven't been granted model access yet, please refer to the [configuration](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html) guide.
### Architecture
The diagram below illustrates the reference architecture of this solution. Note that it also includes a new **VPC** with only two public subnets for the Application Load Balancer (ALB).
![Architecture](assets/arch.svg)
You can also choose to run [AWS Fargate](https://aws.amazon.com/fargate/) behind the ALB instead of [AWS Lambda](https://aws.amazon.com/lambda/); the main difference is the first-byte latency of streaming responses (Fargate is lower).
Alternatively, you can use a Lambda Function URL to replace the ALB, see this [example](https://github.com/awslabs/aws-lambda-web-adapter/tree/main/examples/fastapi-response-streaming).
### Deployment
Please follow the steps below to deploy the Bedrock Proxy APIs into your AWS account. Only regions where Amazon Bedrock is available (such as us-west-2) are supported. The deployment takes approximately **3-5 minutes** 🕒.
**Step 1: Create your own API key (Optional)**
> Note: This step uses any string (without spaces) you like to create a custom API Key (credential) that will be used to access the proxy API later. This key does not have to match your actual OpenAI key, and you don't even need to have an OpenAI API key. It is recommended that you take this step and keep the key safe and private.
1. Open the AWS Management Console and navigate to the Systems Manager service.
2. In the left-hand navigation pane, click "Parameter Store".
3. Click the "Create parameter" button.
4. In the "Create parameter" window, select the following options:
   - Name: Enter a descriptive name for your parameter (e.g., "BedrockProxyAPIKey").
   - Description: Optionally, provide a description for the parameter.
   - Tier: Select **Standard**.
   - Type: Select **SecureString**.
   - Value: Any string (without spaces).
5. Click "Create parameter".
6. Make a note of the parameter name you used (e.g., "BedrockProxyAPIKey"). You'll need it in the next step.
**Step 2: Deploy the CloudFormation stack**
1. Sign in to the AWS Management Console and switch to the region to deploy the CloudFormation stack to.
2. Click one of the following buttons to launch the CloudFormation stack in that region.
   - **ALB + Lambda**
   [![Launch Stack](assets/launch-stack.png)](https://console.aws.amazon.com/cloudformation/home#/stacks/create/template?stackName=BedrockProxyAPI&templateURL=https://aws-gcr-solutions.s3.amazonaws.com/bedrock-access-gateway/latest/BedrockProxy.template)
   - **ALB + Fargate**
   [![Launch Stack](assets/launch-stack.png)](https://console.aws.amazon.com/cloudformation/home#/stacks/create/template?stackName=BedrockProxyAPI&templateURL=https://aws-gcr-solutions.s3.amazonaws.com/bedrock-access-gateway/latest/BedrockProxyFargate.template)
3. Click "Next".
4. On the "Specify stack details" page, provide the following information:
   - Stack name: Change the stack name if needed.
   - ApiKeyParam (if you set up an API key in Step 1): Enter the parameter name you used for storing the API key (e.g., "BedrockProxyAPIKey"); otherwise, leave this field blank.
   Click "Next".
5. On the "Configure stack options" page, you can leave the default settings or customize them according to your needs.
6. Click "Next".
7. On the "Review" page, review the details of the stack you're about to create. Check the "I acknowledge that AWS CloudFormation might create IAM resources" checkbox at the bottom.
8. Click "Create stack".
That is it! 🎉 Once deployed, click the CloudFormation stack and go to the "Outputs" tab; you can find the API Base URL under `APIBaseUrl`, which should look like `http://xxxx.xxx.elb.amazonaws.com/api/v1`.
### SDK/API Usage
All you need is the API Key and the API Base URL. If you didn't set up your own key, the default API Key (`bedrock`) will be used.
Now you can try out the proxy APIs. Let's say you want to test the Claude 3 Sonnet model; use "anthropic.claude-3-sonnet-20240229-v1:0" as the model ID.
- **API usage example**
```bash
export OPENAI_API_KEY=<API key>
export OPENAI_BASE_URL=<API base url>
# For older versions of the openai SDK, use OPENAI_API_BASE
# https://github.com/openai/openai-python/issues/624
export OPENAI_API_BASE=<API base url>
```
```bash
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "anthropic.claude-3-sonnet-20240229-v1:0",
"messages": [
{
"role": "user",
"content": "Hello!"
}
]
}'
```
- **SDK usage example**
```python
from openai import OpenAI
client = OpenAI()
completion = client.chat.completions.create(
model="anthropic.claude-3-sonnet-20240229-v1:0",
messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)
```
Please check the [Usage Guide](./docs/Usage_CN.md) for more details about how to use the Embedding API, Multimodal API, and Tool Call.
## 其他例子
### AutoGen
Below is an example of configuring and using the model in AutoGen Studio.
![AutoGen Model](assets/autogen-model.png)
### LangChain
Make sure you use `ChatOpenAI(...)` instead of `OpenAI(...)`.
```python
# pip install langchain-openai
import os
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
chat = ChatOpenAI(
model="anthropic.claude-3-sonnet-20240229-v1:0",
temperature=0,
openai_api_key=os.environ['OPENAI_API_KEY'],
openai_api_base=os.environ['OPENAI_BASE_URL'],
)
template = """Question: {question}
Answer: Let's think step by step."""
prompt = PromptTemplate.from_template(template)
llm_chain = LLMChain(prompt=prompt, llm=chat)
question = "What NFL team won the Super Bowl in the year Justin Beiber was born?"
response = llm_chain.invoke(question)
print(response)
```
## FAQs
### About privacy
This solution does not collect any of your data. Furthermore, it does not log any requests or responses by default.
### Why use an Application Load Balancer instead of API Gateway?
The short answer is that API Gateway does not support server-sent events (SSE) for streaming responses.
### Which regions are supported?
Only regions where Amazon Bedrock is available are supported; as of now, these include:
- US East (N. Virginia): us-east-1
- US West (Oregon): us-west-2
- Asia Pacific (Singapore): ap-southeast-1
- Asia Pacific (Sydney): ap-southeast-2
- Asia Pacific (Tokyo): ap-northeast-1
- Europe (Frankfurt): eu-central-1
- Europe (Paris): eu-west-3
Generally speaking, all regions that Amazon Bedrock supports will also be supported; if not, please open a GitHub Issue.
Note that not all models are available in those regions.
### Can I build and use my own ECR image?
Yes, you can clone the repo, build the container image yourself (src/Dockerfile), and push it to your own ECR repository. See `scripts/push-to-ecr.sh` for a reference script.
Replace the image repository URL in the CloudFormation template before you deploy.
### Can I run this locally?
Yes, you can run this locally; the API Base URL should then look like `http://localhost:8000/api/v1`.
### Any performance sacrifice or latency increase by using the proxy APIs?
Compared with an AWS SDK call, the reference architecture adds extra latency to responses; you can deploy and test it yourself.
Also, you can use Lambda Web Adapter + Function URL (see this [example](https://github.com/awslabs/aws-lambda-web-adapter/tree/main/examples/fastapi-response-streaming)) to replace the ALB, or use AWS Fargate to replace Lambda, to get better streaming performance.
### Any plan to support SageMaker models?
There is currently no plan to support SageMaker models. This depends on customer demand.
### Any plan to support Bedrock custom models?
Fine-tuned models and models with Provisioned Throughput are not supported. If needed, you can clone the repo and customize it.
### How to upgrade?
To use the latest features, you don't need to redeploy the CloudFormation stack; you simply need to pull the latest image. How to do so depends on which version you deployed:
- **Lambda version**: Go to the AWS Lambda console, find the Lambda function, then find and click the `Deploy new image` button and click save.
- **Fargate version**: Go to the ECS console, click the ECS cluster, go to the `Tasks` tab, select the only running task and click the `Stop selected` menu. A new task with the latest image will start automatically.
## Security
For more information, see [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications).
## License
This project is licensed under the MIT-0 License. See the LICENSE file.


@@ -1,8 +0,0 @@
certifi
SPDX-License-Identifier: MPL-2.0
This Source Code Form is subject to the terms of the Mozilla Public
License, v. 2.0. If a copy of the MPL was not distributed with this
file, You can obtain one at http://mozilla.org/MPL/2.0/.
https://github.com/certifi/python-certifi

Binary file not shown (before: 50 KiB).

4
assets/arch.svg Normal file

File diff suppressed because one or more lines are too long (after: 25 KiB).

BIN
assets/autogen-agent.png Normal file

Binary file not shown (after: 209 KiB).

BIN
assets/autogen-model.png Normal file

Binary file not shown (after: 212 KiB).

BIN
assets/launch-stack.png Normal file

Binary file not shown (after: 3.3 KiB).


@@ -1,178 +1,768 @@
Description: Bedrock Access Gateway - OpenAI-compatible RESTful APIs for Amazon Bedrock (API Gateway + Lambda with Streaming)

Parameters:
  ApiKeySecretArn:
    Type: String
    AllowedPattern: ^arn:aws:secretsmanager:.*$
    Description: The secret ARN in Secrets Manager used to store the API Key
  ContainerImageUri:
    Type: String
    Description: The ECR image URI for the Lambda function (e.g., 123456789012.dkr.ecr.us-east-1.amazonaws.com/bedrock-proxy-api:latest)
  DefaultModelId:
    Type: String
    Default: anthropic.claude-3-sonnet-20240229-v1:0
    Description: The default model ID, please make sure the model ID is supported in the current region
  EnablePromptCaching:
    Type: String
    Default: "false"
    AllowedValues:
      - "true"
      - "false"
    Description: Enable prompt caching for supported models (Claude, Nova). When enabled, adds cachePoint to system prompts and messages for cost savings.

Resources:
  # IAM Role for Lambda
  ProxyApiHandlerServiceRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Statement:
          - Action: sts:AssumeRole
            Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
        Version: "2012-10-17"
      ManagedPolicyArns:
        - !Sub "arn:${AWS::Partition}:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"

  ProxyApiHandlerServiceRoleDefaultPolicy:
    Type: AWS::IAM::Policy
    Properties:
      PolicyDocument:
        Statement:
          - Action:
              - bedrock:ListFoundationModels
              - bedrock:ListInferenceProfiles
            Effect: Allow
            Resource: "*"
          - Action:
              - bedrock:InvokeModel
              - bedrock:InvokeModelWithResponseStream
            Effect: Allow
            Resource:
              - arn:aws:bedrock:*::foundation-model/*
              - arn:aws:bedrock:*:*:inference-profile/*
              - arn:aws:bedrock:*:*:application-inference-profile/*
          - Action:
              - secretsmanager:GetSecretValue
              - secretsmanager:DescribeSecret
            Effect: Allow
            Resource: !Ref ApiKeySecretArn
        Version: "2012-10-17"
      PolicyName: ProxyApiHandlerServiceRoleDefaultPolicy
      Roles:
        - !Ref ProxyApiHandlerServiceRole

  # Lambda Function with Lambda Web Adapter for streaming
  ProxyApiHandler:
    Type: AWS::Lambda::Function
    Properties:
      Architectures:
        - arm64
      Code:
        ImageUri: !Ref ContainerImageUri
      Description: Bedrock Proxy API Handler with Response Streaming
      Environment:
        Variables:
          # Lambda Web Adapter settings
          AWS_LWA_INVOKE_MODE: RESPONSE_STREAM
          AWS_LWA_READINESS_CHECK_PATH: /health
          AWS_LWA_ASYNC_INIT: "true"
          PORT: "8080"
          # Application settings
          DEBUG: "false"
          API_KEY_SECRET_ARN: !Ref ApiKeySecretArn
          DEFAULT_MODEL: !Ref DefaultModelId
          DEFAULT_EMBEDDING_MODEL: cohere.embed-multilingual-v3
          ENABLE_CROSS_REGION_INFERENCE: "true"
          ENABLE_APPLICATION_INFERENCE_PROFILES: "true"
          ENABLE_PROMPT_CACHING: !Ref EnablePromptCaching
          API_ROUTE_PREFIX: /v1
      MemorySize: 1024
      PackageType: Image
      Role: !GetAtt ProxyApiHandlerServiceRole.Arn
      Timeout: 600
    DependsOn:
      - ProxyApiHandlerServiceRoleDefaultPolicy
      - ProxyApiHandlerServiceRole

  # API Gateway REST API (Regional)
  RestApi:
    Type: AWS::ApiGateway::RestApi
    Properties:
      Name: BedrockProxyApi
      Description: Bedrock Access Gateway - OpenAI-compatible API with streaming support
      EndpointConfiguration:
        Types:
          - REGIONAL
      Body:
        openapi: "3.0.1"
        info:
          title: BedrockProxyApi
          version: "1.0"
        paths:
          /{proxy+}:
            x-amazon-apigateway-any-method:
              parameters:
                - name: proxy
                  in: path
                  required: true
                  schema:
                    type: string
              x-amazon-apigateway-integration:
                type: aws_proxy
                httpMethod: POST
                uri: !Sub "arn:aws:apigateway:${AWS::Region}:lambda:path/2021-11-15/functions/${ProxyApiHandler.Arn}/response-streaming-invocations"
                passthroughBehavior: when_no_match
                timeoutInMillis: 600000
                responseTransferMode: STREAM
              responses:
                default:
                  description: Default response
          /:
            x-amazon-apigateway-any-method:
              x-amazon-apigateway-integration:
                type: aws_proxy
                httpMethod: POST
                uri: !Sub "arn:aws:apigateway:${AWS::Region}:lambda:path/2021-11-15/functions/${ProxyApiHandler.Arn}/response-streaming-invocations"
                passthroughBehavior: when_no_match
                timeoutInMillis: 600000
                responseTransferMode: STREAM
              responses:
                default:
                  description: Default response

  # Lambda Permission for API Gateway
  LambdaPermission:
    Type: AWS::Lambda::Permission
    Properties:
      FunctionName: !Ref ProxyApiHandler
      Action: lambda:InvokeFunction
      Principal: apigateway.amazonaws.com
      SourceArn: !Sub "arn:aws:execute-api:${AWS::Region}:${AWS::AccountId}:${RestApi}/*"

  # API Gateway Deployment
  ApiDeployment:
    Type: AWS::ApiGateway::Deployment
    Properties:
      RestApiId: !Ref RestApi
    DependsOn:
      - RestApi

  # API Gateway Stage
  ApiStage:
    Type: AWS::ApiGateway::Stage
    Properties:
      RestApiId: !Ref RestApi
      DeploymentId: !Ref ApiDeployment
      StageName: api
      Description: API Stage with streaming support

Outputs:
  APIBaseUrl:
    Description: Proxy API Base URL (OPENAI_API_BASE)
    Value: !Sub "https://${RestApi}.execute-api.${AWS::Region}.amazonaws.com/api/v1"
  RestApiId:
    Description: API Gateway REST API ID
    Value: !Ref RestApi
  LambdaFunctionArn:
    Description: Lambda Function ARN
    Value: !GetAtt ProxyApiHandler.Arn

{
  "Description": "Bedrock Access Gateway - OpenAI-compatible RESTful APIs for Amazon Bedrock",
  "Transform": "AWS::LanguageExtensions",
  "Parameters": {
    "ApiKeyParam": {
      "Type": "String",
      "Default": "",
      "Description": "The parameter name in System Manager used to store the API Key, leave blank to use a default key"
    }
  },
  "Resources": {
    "VPCB9E5F0B4": {
      "Type": "AWS::EC2::VPC",
      "Properties": {
        "CidrBlock": "10.250.0.0/16",
        "EnableDnsHostnames": true,
        "EnableDnsSupport": true,
        "InstanceTenancy": "default",
        "Tags": [
          {
            "Key": "Name",
            "Value": "BedrockProxy/VPC"
          }
        ]
      },
      "Metadata": {
        "aws:cdk:path": "BedrockProxy/VPC/Resource"
      }
    },
    "VPCPublicSubnet1SubnetB4246D30": {
      "Type": "AWS::EC2::Subnet",
      "Properties": {
        "AvailabilityZone": {
          "Fn::Select": [
            0,
            {
              "Fn::GetAZs": ""
            }
          ]
        },
        "CidrBlock": "10.250.0.0/24",
        "MapPublicIpOnLaunch": true,
        "Tags": [
          {
            "Key": "aws-cdk:subnet-name",
            "Value": "Public"
          },
          {
            "Key": "aws-cdk:subnet-type",
            "Value": "Public"
          },
          {
            "Key": "Name",
            "Value": "BedrockProxy/VPC/PublicSubnet1"
          }
        ],
        "VpcId": {
          "Ref": "VPCB9E5F0B4"
        }
      },
      "Metadata": {
        "aws:cdk:path": "BedrockProxy/VPC/PublicSubnet1/Subnet"
      }
    },
    "VPCPublicSubnet1RouteTableFEE4B781": {
      "Type": "AWS::EC2::RouteTable",
      "Properties": {
        "Tags": [
          {
            "Key": "Name",
            "Value": "BedrockProxy/VPC/PublicSubnet1"
          }
        ],
        "VpcId": {
          "Ref": "VPCB9E5F0B4"
        }
      },
      "Metadata": {
        "aws:cdk:path": "BedrockProxy/VPC/PublicSubnet1/RouteTable"
      }
    },
    "VPCPublicSubnet1RouteTableAssociation0B0896DC": {
      "Type": "AWS::EC2::SubnetRouteTableAssociation",
      "Properties": {
        "RouteTableId": {
          "Ref": "VPCPublicSubnet1RouteTableFEE4B781"
        },
        "SubnetId": {
          "Ref": "VPCPublicSubnet1SubnetB4246D30"
        }
      },
      "Metadata": {
        "aws:cdk:path": "BedrockProxy/VPC/PublicSubnet1/RouteTableAssociation"
      }
    },
    "VPCPublicSubnet1DefaultRoute91CEF279": {
      "Type": "AWS::EC2::Route",
      "Properties": {
        "DestinationCidrBlock": "0.0.0.0/0",
        "GatewayId": {
          "Ref": "VPCIGWB7E252D3"
        },
        "RouteTableId": {
          "Ref": "VPCPublicSubnet1RouteTableFEE4B781"
        }
      },
      "DependsOn": [
        "VPCVPCGW99B986DC"
      ],
      "Metadata": {
        "aws:cdk:path": "BedrockProxy/VPC/PublicSubnet1/DefaultRoute"
      }
    },
    "VPCPublicSubnet2Subnet74179F39": {
      "Type": "AWS::EC2::Subnet",
      "Properties": {
        "AvailabilityZone": {
          "Fn::Select": [
            1,
            {
              "Fn::GetAZs": ""
            }
          ]
        },
        "CidrBlock": "10.250.1.0/24",
        "MapPublicIpOnLaunch": true,
        "Tags": [
          {
            "Key": "aws-cdk:subnet-name",
            "Value": "Public"
          },
          {
            "Key": "aws-cdk:subnet-type",
            "Value": "Public"
          },
          {
            "Key": "Name",
            "Value": "BedrockProxy/VPC/PublicSubnet2"
          }
        ],
        "VpcId": {
          "Ref": "VPCB9E5F0B4"
        }
      },
      "Metadata": {
        "aws:cdk:path": "BedrockProxy/VPC/PublicSubnet2/Subnet"
      }
    },
    "VPCPublicSubnet2RouteTable6F1A15F1": {
      "Type": "AWS::EC2::RouteTable",
      "Properties": {
        "Tags": [
          {
            "Key": "Name",
            "Value": "BedrockProxy/VPC/PublicSubnet2"
          }
        ],
        "VpcId": {
          "Ref": "VPCB9E5F0B4"
        }
      },
      "Metadata": {
        "aws:cdk:path": "BedrockProxy/VPC/PublicSubnet2/RouteTable"
      }
    },
    "VPCPublicSubnet2RouteTableAssociation5A808732": {
      "Type": "AWS::EC2::SubnetRouteTableAssociation",
      "Properties": {
        "RouteTableId": {
          "Ref": "VPCPublicSubnet2RouteTable6F1A15F1"
        },
        "SubnetId": {
          "Ref": "VPCPublicSubnet2Subnet74179F39"
        }
      },
      "Metadata": {
        "aws:cdk:path": "BedrockProxy/VPC/PublicSubnet2/RouteTableAssociation"
      }
    },
},
"VPCPublicSubnet2DefaultRouteB7481BBA": {
"Type": "AWS::EC2::Route",
"Properties": {
"DestinationCidrBlock": "0.0.0.0/0",
"GatewayId": {
"Ref": "VPCIGWB7E252D3"
},
"RouteTableId": {
"Ref": "VPCPublicSubnet2RouteTable6F1A15F1"
}
},
"DependsOn": [
"VPCVPCGW99B986DC"
],
"Metadata": {
"aws:cdk:path": "BedrockProxy/VPC/PublicSubnet2/DefaultRoute"
}
},
"VPCIGWB7E252D3": {
"Type": "AWS::EC2::InternetGateway",
"Properties": {
"Tags": [
{
"Key": "Name",
"Value": "BedrockProxy/VPC"
}
]
},
"Metadata": {
"aws:cdk:path": "BedrockProxy/VPC/IGW"
}
},
"VPCVPCGW99B986DC": {
"Type": "AWS::EC2::VPCGatewayAttachment",
"Properties": {
"InternetGatewayId": {
"Ref": "VPCIGWB7E252D3"
},
"VpcId": {
"Ref": "VPCB9E5F0B4"
}
},
"Metadata": {
"aws:cdk:path": "BedrockProxy/VPC/VPCGW"
}
},
"ProxyApiHandlerServiceRoleBE71BFB1": {
"Type": "AWS::IAM::Role",
"Properties": {
"AssumeRolePolicyDocument": {
"Statement": [
{
"Action": "sts:AssumeRole",
"Effect": "Allow",
"Principal": {
"Service": "lambda.amazonaws.com"
}
}
],
"Version": "2012-10-17"
},
"ManagedPolicyArns": [
{
"Fn::Join": [
"",
[
"arn:",
{
"Ref": "AWS::Partition"
},
":iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
]
]
}
]
},
"Metadata": {
"aws:cdk:path": "BedrockProxy/Proxy/ApiHandler/ServiceRole/Resource"
}
},
"ProxyApiHandlerServiceRoleDefaultPolicy86681202": {
"Type": "AWS::IAM::Policy",
"Properties": {
"PolicyDocument": {
"Statement": [
{
"Action": [
"bedrock:InvokeModel",
"bedrock:InvokeModelWithResponseStream"
],
"Effect": "Allow",
"Resource": "arn:aws:bedrock:*::foundation-model/*"
},
{
"Action": [
"ssm:DescribeParameters",
"ssm:GetParameters",
"ssm:GetParameter",
"ssm:GetParameterHistory"
],
"Effect": "Allow",
"Resource": {
"Fn::Join": [
"",
[
"arn:",
{
"Ref": "AWS::Partition"
},
":ssm:",
{
"Ref": "AWS::Region"
},
":",
{
"Ref": "AWS::AccountId"
},
":parameter/",
{
"Ref": "ApiKeyParam"
}
]
]
}
}
],
"Version": "2012-10-17"
},
"PolicyName": "ProxyApiHandlerServiceRoleDefaultPolicy86681202",
"Roles": [
{
"Ref": "ProxyApiHandlerServiceRoleBE71BFB1"
}
]
},
"Metadata": {
"aws:cdk:path": "BedrockProxy/Proxy/ApiHandler/ServiceRole/DefaultPolicy/Resource"
}
},
"ProxyApiHandlerEC15A492": {
"Type": "AWS::Lambda::Function",
"Properties": {
"Architectures": [
"arm64"
],
"Code": {
"ImageUri": {
"Fn::Join": [
"",
[
"366590864501.dkr.ecr.",
{
"Ref": "AWS::Region"
},
".",
{
"Ref": "AWS::URLSuffix"
},
"/bedrock-proxy-api:latest"
]
]
}
},
"Description": "Bedrock Proxy API Handler",
"Environment": {
"Variables": {
"API_KEY_PARAM_NAME": {
"Ref": "ApiKeyParam"
},
"DEBUG": "false",
"DEFAULT_MODEL": {
"Fn::FindInMap": [
"ProxyRegionTable03E5BEB3",
{
"Ref": "AWS::Region"
},
"model",
{
"DefaultValue": "anthropic.claude-3-sonnet-20240229-v1:0"
}
]
},
"DEFAULT_EMBEDDING_MODEL": "cohere.embed-multilingual-v3"
}
},
"MemorySize": 1024,
"PackageType": "Image",
"Role": {
"Fn::GetAtt": [
"ProxyApiHandlerServiceRoleBE71BFB1",
"Arn"
]
},
"Timeout": 300
},
"DependsOn": [
"ProxyApiHandlerServiceRoleDefaultPolicy86681202",
"ProxyApiHandlerServiceRoleBE71BFB1"
],
"Metadata": {
"aws:cdk:path": "BedrockProxy/Proxy/ApiHandler/Resource"
}
},
"ProxyApiHandlerInvoke2UTWxhlfyqbT5FTn5jvgbLgjFfJwzswGk55DU1HYF6C33779": {
"Type": "AWS::Lambda::Permission",
"Properties": {
"Action": "lambda:InvokeFunction",
"FunctionName": {
"Fn::GetAtt": [
"ProxyApiHandlerEC15A492",
"Arn"
]
},
"Principal": "elasticloadbalancing.amazonaws.com"
},
"Metadata": {
"aws:cdk:path": "BedrockProxy/Proxy/ApiHandler/Invoke2UTWxhlfyqbT5FTn--5jvgbLgj+FfJwzswGk55DU1H--Y="
}
},
"ProxyALB87756780": {
"Type": "AWS::ElasticLoadBalancingV2::LoadBalancer",
"Properties": {
"LoadBalancerAttributes": [
{
"Key": "deletion_protection.enabled",
"Value": "false"
}
],
"Scheme": "internet-facing",
"SecurityGroups": [
{
"Fn::GetAtt": [
"ProxyALBSecurityGroup0D6CA3DA",
"GroupId"
]
}
],
"Subnets": [
{
"Ref": "VPCPublicSubnet1SubnetB4246D30"
},
{
"Ref": "VPCPublicSubnet2Subnet74179F39"
}
],
"Type": "application"
},
"DependsOn": [
"VPCPublicSubnet1DefaultRoute91CEF279",
"VPCPublicSubnet1RouteTableAssociation0B0896DC",
"VPCPublicSubnet2DefaultRouteB7481BBA",
"VPCPublicSubnet2RouteTableAssociation5A808732"
],
"Metadata": {
"aws:cdk:path": "BedrockProxy/Proxy/ALB/Resource"
}
},
"ProxyALBSecurityGroup0D6CA3DA": {
"Type": "AWS::EC2::SecurityGroup",
"Properties": {
"GroupDescription": "Automatically created Security Group for ELB BedrockProxyALB1CE4CAD1",
"SecurityGroupEgress": [
{
"CidrIp": "255.255.255.255/32",
"Description": "Disallow all traffic",
"FromPort": 252,
"IpProtocol": "icmp",
"ToPort": 86
}
],
"SecurityGroupIngress": [
{
"CidrIp": "0.0.0.0/0",
"Description": "Allow from anyone on port 80",
"FromPort": 80,
"IpProtocol": "tcp",
"ToPort": 80
}
],
"VpcId": {
"Ref": "VPCB9E5F0B4"
}
},
"Metadata": {
"aws:cdk:path": "BedrockProxy/Proxy/ALB/SecurityGroup/Resource"
}
},
"ProxyALBListener933E9515": {
"Type": "AWS::ElasticLoadBalancingV2::Listener",
"Properties": {
"DefaultActions": [
{
"TargetGroupArn": {
"Ref": "ProxyALBListenerTargetsGroup187739FA"
},
"Type": "forward"
}
],
"LoadBalancerArn": {
"Ref": "ProxyALB87756780"
},
"Port": 80,
"Protocol": "HTTP"
},
"Metadata": {
"aws:cdk:path": "BedrockProxy/Proxy/ALB/Listener/Resource"
}
},
"ProxyALBListenerTargetsGroup187739FA": {
"Type": "AWS::ElasticLoadBalancingV2::TargetGroup",
"Properties": {
"HealthCheckEnabled": false,
"TargetType": "lambda",
"Targets": [
{
"Id": {
"Fn::GetAtt": [
"ProxyApiHandlerEC15A492",
"Arn"
]
}
}
]
},
"DependsOn": [
"ProxyApiHandlerInvoke2UTWxhlfyqbT5FTn5jvgbLgjFfJwzswGk55DU1HYF6C33779"
],
"Metadata": {
"aws:cdk:path": "BedrockProxy/Proxy/ALB/Listener/TargetsGroup/Resource"
}
},
"CDKMetadata": {
"Type": "AWS::CDK::Metadata",
"Properties": {
"Analytics": "v2:deflate64:H4sIAAAAAAAA/1VRXW/CMAz8LbyHDMovAKZNSJtWFcTr5LpeZ0iTKHFAqOp/n1q+uief7y7ynZLp+WKhZxM4xylWx6nhUrdbATyq9Y/NIUBDQkHBOX63hJlu9x57aZ+vVZ5Kw7hNpSXpuScqXBLaQWnoyT+5ZYwOGYSdfZh7sLFCwZK8g9AZLrczt20pAvjbkBW1JUyB5fIeXPLDgTHRKcKgC/IusrhwWUEkZaApK9Dtq8MjhU0DNb0li/cIY5xTaDhGdrZTDI1uC3etMczcGcYh2hV1igxEYTQOqhIMWGRbnzLdLr03jEPLDwfVatAo9E//7WMfRyF789zxSN9BqEketUdr16mCoksBh6if4D3buodfSXy6fsrIsHa2Yhk6WleRPsSXUzbT87meTQ6ReRqSFW5IF9f5B/Z2H8goAgAA"
},
"Metadata": {
"aws:cdk:path": "BedrockProxy/CDKMetadata/Default"
},
"Condition": "CDKMetadataAvailable"
}
},
"Mappings": {
"ProxyRegionTable03E5BEB3": {
"us-east-1": {
"model": "anthropic.claude-3-sonnet-20240229-v1:0"
},
"ap-southeast-1": {
"model": "anthropic.claude-v2"
},
"ap-northeast-1": {
"model": "anthropic.claude-v2:1"
},
"eu-central-1": {
"model": "anthropic.claude-v2:1"
}
}
},
"Outputs": {
"APIBaseUrl": {
"Description": "Proxy API Base URL (OPENAI_API_BASE)",
"Value": {
"Fn::Join": [
"",
[
"http://",
{
"Fn::GetAtt": [
"ProxyALB87756780",
"DNSName"
]
},
"/api/v1"
]
]
}
}
},
"Conditions": {
"CDKMetadataAvailable": {
"Fn::Or": [
{
"Fn::Or": [
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"af-south-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"ap-east-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"ap-northeast-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"ap-northeast-2"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"ap-south-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"ap-southeast-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"ap-southeast-2"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"ca-central-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"cn-north-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"cn-northwest-1"
]
}
]
},
{
"Fn::Or": [
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"eu-central-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"eu-north-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"eu-south-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"eu-west-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"eu-west-2"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"eu-west-3"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"il-central-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"me-central-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"me-south-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"sa-east-1"
]
}
]
},
{
"Fn::Or": [
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"us-east-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"us-east-2"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"us-west-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"us-west-2"
]
}
]
}
]
}
}
}

File diff suppressed because it is too large Load Diff

View File

@@ -1,18 +0,0 @@
version: '3.8'
services:
bedrock-access-gateway:
build:
context: ./src
dockerfile: Dockerfile_ecs
ports:
- "127.0.0.1:8000:8080"
environment:
- ENABLE_PROMPT_CACHING=true
- API_KEY=${OPENAI_API_KEY}
- AWS_PROFILE
- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY
- AWS_SESSION_TOKEN
volumes:
- ${HOME}/.aws:/home/appuser/.aws

View File

@@ -1,78 +0,0 @@
# Security
This document details the security configuration required for the solution. In particular, it covers:
- **HTTPS Setup**
Following these guidelines will help ensure that traffic is encrypted over the public network.
---
## 1. HTTPS Authentication with the ALB
### Overview
Using HTTPS on your ALB guarantees that all client-to-ALB communication is encrypted. This is achieved by:
- **Obtaining and managing SSL/TLS certificates** using AWS Certificate Manager (ACM). You'll need a domain but you can request a free certificate.
- **Configuring HTTPS listeners** on the ALB
- **Automating HTTP to HTTPS redirect** for clients that inadvertently access HTTP endpoints
- **Allowing traffic in the Security Group of the ALB**
### Step-by-Step Setup
#### 1.1. Request an SSL/TLS Certificate via ACM
1. **Navigate to AWS Certificate Manager (ACM):**
In the AWS Management Console, go to ACM in the region where your ALB is deployed.
2. **Request the Certificate:**
- Click on **"Request a certificate"**.
- Choose **"Request a public certificate"** (or a private one if using a private CA).
- Enter your domain names (e.g., `example.com`, `*.example.com`).
- Complete the validation (via DNS or email). DNS validation is generally preferred for automation purposes.
3. **Certificate Validation:**
Ensure that the certificate status becomes **"Issued"** before proceeding.
#### 1.2. Configure the ALB for HTTPS
1. **Create or Modify the ALB Listener:**
- Open the **EC2 Dashboard** and navigate to [Load Balancers](https://console.aws.amazon.com/ec2/home?#LoadBalancers:).
- If you already have an ALB, select it; otherwise, create a new ALB.
- Under the **Listeners** tab, click **Manage listener** > **Edit Listener**.
- Configure the listener protocol to **HTTPS** with port **443**.
- Select the certificate you requested from ACM.
#### 1.3. (Optional) Redirect HTTP Traffic to HTTPS
To enhance security, ensure that any HTTP requests are automatically redirected to HTTPS.
1. **Create an HTTP Listener on Port 80:**
- Add a listener on port **80**.
- In the listener settings, add a rule to redirect all traffic to port **443** with the protocol changed to **HTTPS**.
**Example AWS CLI command for redirection:**
```bash
aws elbv2 create-listener \
--load-balancer-arn <your-alb-arn> \
--protocol HTTP \
--port 80 \
--default-actions Type=redirect,RedirectConfig="Protocol=https,Port=443,StatusCode=HTTP_301"
```
#### 1.4. Allow traffic in the Security Group of the ALB
1. **Update the ALB Security Group:**
- Go to the CloudFormation stack you originally used to deploy, select **Resources** and search for **ProxyALBSecurityGroup**
- Click on the Security Group
- Edit the Inbound Rules to allow traffic on port 443 from `0.0.0.0/0` and (optionally) delete the Inbound Rule on port 80. **Note**: If you delete the rule on port 80, you will need to update the base URL to use HTTPS only, as HTTP traffic will no longer be redirected to HTTPS.
Now you should be able to test your application! Use the base url like:
```
https://<your-domain>/api/v1
```
---
By following the steps outlined in this guide, you can configure a secure environment that uses HTTPS via ALB for encrypted traffic.

View File

@@ -1,97 +0,0 @@
# Troubleshooting Guide
This guide helps you troubleshoot common issues you might encounter when using the Bedrock Access Gateway.
## Common Issues
### 1. Parameter Store Access Error
To see the errors, first access the CloudWatch Logs for the Lambda function (or Fargate task).
1. Go to the [CloudWatch Console](https://console.aws.amazon.com/cloudwatch/home?#logsV2:log-groups/)
2. Search for `/aws/lambda/BedrockProxyAPI`
3. Click on the `Log Stream` to see the error details
```python
botocore.exceptions.ClientError: An error occurred (ParameterNotFound) when calling the GetParameter operation: Parameter /BedrockProxyAPIKey not found.
```
This error occurs when the Lambda function cannot access the API key parameter in Parameter Store.
**Possible solutions:**
- Verify that you created the parameter in Parameter Store with the correct name
- Check that the parameter name in the CloudFormation stack matches the one in Parameter Store
- Ensure the Lambda function's IAM role has permission to access Parameter Store
- If you didn't set up an API key, leave the `ApiKeyParam` field blank during deployment
### 2. Model Access Issues
If you receive an error about model access:
```
{"error": {"message": "User: arn:aws:iam::XXXX:role/XXX is not authorized to perform: bedrock:InvokeModel on resource: arn:aws:bedrock:REGION::foundation-model/XXX", "type": "auth_error", "code": 401}}
```
**Possible solutions:**
- Ensure you have requested access to the model in Amazon Bedrock
- Verify the Lambda/Fargate role has the necessary permissions to invoke Bedrock models
- Check that you're using the correct model ID
- Verify the model is available in your chosen region
### 3. API Key Authentication Failures
If you receive a 401 Unauthorized error:
```
{"detail": "Could not validate credentials"}
```
**Possible solutions:**
- Verify you're using the correct API key in your requests
- Check that the `Authorization` header is properly formatted (`Bearer YOUR-API-KEY`)
- If using environment variables, ensure `OPENAI_API_KEY` is set correctly
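The expected header shape can be sketched as follows (a minimal client-side illustration, not part of the gateway code; `auth_header` and `looks_valid` are hypothetical helper names):

```python
def auth_header(api_key: str) -> dict:
    """Build the Authorization header in the Bearer form the gateway expects."""
    return {"Authorization": f"Bearer {api_key}"}


def looks_valid(headers: dict) -> bool:
    """Quick client-side sanity check before sending a request."""
    value = headers.get("Authorization", "")
    return value.startswith("Bearer ") and len(value) > len("Bearer ")
```

A header like `{"Authorization": "YOUR-API-KEY"}` (missing the `Bearer ` prefix) is a common cause of this 401.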
### 4. Cross-Region Access Issues
If you're trying to access models in a different region:
```
{"error": {"message": "Region 'us-east-1' is not enabled for your account", "type": "invalid_request_error", "code": 400}}
```
**Possible solutions:**
- Ensure the target region is enabled for your AWS account
- Verify the model you're trying to access is available in that region
- Check that your IAM roles have the necessary cross-region permissions
### 5. Rate Limiting and Quotas
If you're experiencing throttling or quota issues:
```
{"error": {"message": "Rate limit exceeded", "type": "rate_limit_error", "code": 429}}
```
**Possible solutions:**
- Check your Bedrock service quotas in the AWS Console
- Consider implementing retry logic in your application
- Request a quota increase if needed
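A minimal retry sketch with exponential backoff (a hypothetical helper, not part of the gateway; adapt the exception handling to whatever your HTTP client raises on a 429):

```python
import time


def with_retries(call, max_attempts=5, base_delay=1.0):
    """Invoke `call`, retrying with exponential backoff when it raises a
    rate-limit error (modeled here as RuntimeError containing 'rate_limit')."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RuntimeError as err:
            if "rate_limit" not in str(err) or attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

Wrap your chat-completion call in `with_retries` so transient 429s are absorbed instead of surfacing to users.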
## Getting Help
If you're still experiencing issues:
1. Check the CloudWatch Logs for detailed error messages
2. Verify your AWS credentials and permissions
3. Review the [Usage Guide](./Usage.md) for correct API usage
4. Open a [GitHub issue](https://github.com/aws-samples/bedrock-access-gateway/issues/new?template=bug_report.md) with:
- Detailed error message
- Steps to reproduce
- Your deployment configuration (region, model, etc.)
- Any relevant CloudWatch logs
## Additional Resources
- [Amazon Bedrock Documentation](https://docs.aws.amazon.com/bedrock/)
- [AWS IAM Documentation](https://docs.aws.amazon.com/IAM/latest/UserGuide/)
- [AWS Systems Manager Parameter Store](https://docs.aws.amazon.com/systems-manager/latest/userguide/systems-manager-parameter-store.html)

View File

@@ -9,85 +9,6 @@ export OPENAI_API_KEY=<API key>
export OPENAI_BASE_URL=<API base url>
```
**API Example:**
- [Models API](#models-api)
- [Embedding API](#embedding-api)
- [Multimodal API](#multimodal-api)
- [Tool Call](#tool-call)
- [Reasoning](#reasoning)
- [Interleaved thinking (beta)](#interleaved-thinking-beta)
## Models API
You can use this API to get a list of supported model IDs.
Also, you can use this API to refresh the model list if new models are added to Amazon Bedrock.
**Example Request**
```bash
curl -s $OPENAI_BASE_URL/models -H "Authorization: Bearer $OPENAI_API_KEY" | jq .data
```
**Example Response**
```bash
[
...
{
"id": "anthropic.claude-3-5-sonnet-20240620-v1:0",
"created": 1734416893,
"object": "model",
"owned_by": "bedrock"
},
{
"id": "us.anthropic.claude-3-5-sonnet-20240620-v1:0",
"created": 1734416893,
"object": "model",
"owned_by": "bedrock"
},
...
]
```
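The response above mixes plain foundation-model IDs with cross-region inference-profile IDs (region prefixes such as `us.`). A small sketch for separating them client-side (the prefix set is an assumption inferred from the IDs shown, not an official list):

```python
PROFILE_PREFIXES = {"us", "eu", "apac", "global"}  # assumed region prefixes


def split_model_ids(model_ids):
    """Partition model IDs into cross-region inference profiles and plain
    foundation models, based on an assumed region-prefix convention."""
    profiles = [m for m in model_ids if m.split(".", 1)[0] in PROFILE_PREFIXES]
    plain = [m for m in model_ids if m not in profiles]
    return profiles, plain
```

This is handy when you want to prefer inference-profile IDs (for cross-region routing) while falling back to the plain model ID elsewhere.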
## Chat Completions API
### Basic Example with Claude Sonnet 4.5
Claude Sonnet 4.5 is Anthropic's most intelligent model, excelling at coding, complex reasoning, and agent-based tasks. It's available via global cross-region inference profiles.
**Example Request**
```bash
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
"messages": [
{
"role": "user",
"content": "Write a Python function to calculate the Fibonacci sequence using dynamic programming."
}
]
}'
```
**Example SDK Usage**
```python
from openai import OpenAI
client = OpenAI()
completion = client.chat.completions.create(
model="global.anthropic.claude-sonnet-4-5-20250929-v1:0",
messages=[{"role": "user", "content": "Write a Python function to calculate the Fibonacci sequence using dynamic programming."}],
)
print(completion.choices[0].message.content)
```
## Embedding API
**Important Notice**: Please carefully review the following points before using this proxy API for embedding.
@@ -170,10 +91,13 @@ print(doc_result[0][:5])
## Multimodal API
**Important Notice**: Please carefully review the following points before using this proxy API for Multimodal.
1. This API is only supported by Claude 3 models.
**Example Request**
```bash
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
@@ -261,6 +185,7 @@ curl $OPENAI_BASE_URL/chat/completions \
**Important Notice**: Please carefully review the following points before using this Tool Call for Chat completion API.
1. Function Call is now deprecated in favor of Tool Call by OpenAI, hence it's not supported here; you should use Tool Call instead.
2. This API is only supported by Claude 3 models.
**Example Request**
@@ -358,218 +283,3 @@ curl $OPENAI_BASE_URL/chat/completions \
You can try it with different questions, such as:
1. Hello, who are you? (No tools are needed)
2. What is the weather like today? (Should use get_current_location tool first)
## Reasoning
**Important Notice**: Please carefully review the following points before using reasoning mode for Chat completion API.
- Only Claude 3.7 Sonnet (extended thinking) and DeepSeek R1 support Reasoning so far. Please make sure the model supports reasoning before use.
- For Claude 3.7 Sonnet, the reasoning (thinking) mode is not enabled by default; you must pass an additional `reasoning_effort` parameter in your request, along with an appropriate max_tokens (or max_completion_tokens). The budget_tokens is derived from reasoning_effort (low: 30%, medium: 60%, high: 100% of max tokens), with a minimum budget_tokens of 1,024; Anthropic recommends at least 4,000 tokens for comprehensive reasoning. Check the [Bedrock Document](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-anthropic-claude-37.html) for more details.
- For DeepSeek R1, you don't need the additional reasoning_effort parameter; passing it may cause an error.
- The reasoning response (CoT, thoughts) is returned in an additional field `reasoning_content`, which is not officially supported by OpenAI. This follows the [DeepSeek Reasoning Model](https://api-docs.deepseek.com/guides/reasoning_model#api-example) convention and may change in the future.
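The budget_tokens mapping described above can be sketched as follows (an illustration of the documented percentages, not the gateway's actual code):

```python
def thinking_budget(max_tokens: int, reasoning_effort: str) -> int:
    """Derive budget_tokens from reasoning_effort: low=30%, medium=60%,
    high=100% of max_tokens, with a floor of 1,024 tokens."""
    ratios = {"low": 0.3, "medium": 0.6, "high": 1.0}
    return max(1024, int(max_tokens * ratios[reasoning_effort]))
```

For example, with `max_completion_tokens: 4096` and `reasoning_effort: "low"` the effective thinking budget is about 1,228 tokens, well above the 1,024 floor.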
**Example Request**
- Claude 3.7 Sonnet
```bash
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
"messages": [
"role": "user",
"content": "which one is bigger, 3.9 or 3.11?"
}
],
"max_completion_tokens": 4096,
"reasoning_effort": "low",
"stream": false
}'
```
- DeepSeek R1
```bash
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "us.deepseek.r1-v1:0",
"messages": [
{
"role": "user",
"content": "which one is bigger, 3.9 or 3.11?"
}
],
"stream": false
}'
```
**Example Response**
```json
{
"id": "chatcmpl-83fb7a88",
"created": 1740545278,
"model": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
"system_fingerprint": "fp",
"choices": [
{
"index": 0,
"finish_reason": "stop",
"logprobs": null,
"message": {
"role": "assistant",
"content": "3.9 is bigger than 3.11.\n\nWhen comparing decimal numbers, we need to understand what these numbers actually represent:...",
"reasoning_content": "I need to compare the decimal numbers 3.9 and 3.11.\n\nFor decimal numbers, we first compare the whole number parts, and if they're equal, we compare the decimal parts. \n\nBoth numbers ..."
}
}
],
"object": "chat.completion",
"usage": {
"prompt_tokens": 51,
"completion_tokens": 565,
"total_tokens": 616
}
}
```
You can also use the OpenAI SDK (run `pip3 install -U openai` first).
- Non-Streaming
```python
from openai import OpenAI
client = OpenAI()
messages = [{"role": "user", "content": "which one is bigger, 3.9 or 3.11?"}]
response = client.chat.completions.create(
model="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
messages=messages,
reasoning_effort="low",
max_completion_tokens=4096,
)
reasoning_content = response.choices[0].message.reasoning_content
content = response.choices[0].message.content
```
- Streaming
```python
from openai import OpenAI
client = OpenAI()
messages = [{"role": "user", "content": "9.11 and 9.8, which is greater?"}]
response = client.chat.completions.create(
model="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
messages=messages,
reasoning_effort="low",
max_completion_tokens=4096,
stream=True,
)
reasoning_content = ""
content = ""
for chunk in response:
if hasattr(chunk.choices[0].delta, 'reasoning_content') and chunk.choices[0].delta.reasoning_content:
reasoning_content += chunk.choices[0].delta.reasoning_content
elif chunk.choices[0].delta.content:
content += chunk.choices[0].delta.content
```
## Interleaved thinking (beta)
**Important Notice**: Please carefully review the following points before using interleaved thinking for the Chat completion API.
Extended thinking with tool use in Claude 4 models supports [interleaved thinking](https://docs.aws.amazon.com/bedrock/latest/userguide/claude-messages-extended-thinking.html#claude-messages-extended-thinking-tool-use-interleaved), which enables Claude 4 models to think between tool calls and run more sophisticated reasoning after receiving tool results. This is helpful for more complex agentic interactions.
With interleaved thinking, the `budget_tokens` can exceed the `max_tokens` parameter because it represents the total budget across all thinking blocks within one assistant turn.
**Supported Models**: Claude Sonnet 4, Claude Sonnet 4.5
**Example Request**
- Non-Streaming (Claude Sonnet 4.5)
```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bedrock" \
-d '{
"model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
"max_tokens": 2048,
"messages": [{
"role": "user",
"content": "Explain how to implement a binary search tree with self-balancing capabilities."
}],
"extra_body": {
"anthropic_beta": ["interleaved-thinking-2025-05-14"],
"thinking": {"type": "enabled", "budget_tokens": 4096}
}
}'
```
- Non-Streaming (Claude Sonnet 4)
```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bedrock" \
-d '{
"model": "us.anthropic.claude-sonnet-4-20250514-v1:0",
"max_tokens": 2048,
"messages": [{
"role": "user",
"content": "有一天,一个女孩参加数学考试只得了 38 分。她心里对父亲的惩罚充满恐惧,于是偷偷把分数改成了 88 分。她的父亲看到试卷后,怒发冲冠,狠狠地给了她一巴掌,怒吼道:“你这 8 怎么一半是绿的一半是红的,你以为我是傻子吗?”女孩被打后,委屈地哭了起来,什么也没说。过了一会儿,父亲突然崩溃了。请问这位父亲为什么过一会崩溃了?"
}],
"extra_body": {
"anthropic_beta": ["interleaved-thinking-2025-05-14"],
"thinking": {"type": "enabled", "budget_tokens": 4096}
}
}'
```
- Streaming (Claude Sonnet 4.5)
```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bedrock" \
-d '{
"model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
"max_tokens": 2048,
"messages": [{
"role": "user",
"content": "Explain how to implement a binary search tree with self-balancing capabilities."
}],
"stream": true,
"extra_body": {
"anthropic_beta": ["interleaved-thinking-2025-05-14"],
"thinking": {"type": "enabled", "budget_tokens": 4096}
}
}'
```
- Streaming (Claude Sonnet 4)
```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bedrock" \
-d '{
"model": "us.anthropic.claude-sonnet-4-20250514-v1:0",
"max_tokens": 2048,
"messages": [{
"role": "user",
"content": "有一天,一个女孩参加数学考试只得了 38 分。她心里对父亲的惩罚充满恐惧,于是偷偷把分数改成了 88 分。她的父亲看到试卷后,怒发冲冠,狠狠地给了她一巴掌,怒吼道:“你这 8 怎么一半是绿的一半是红的,你以为我是傻子吗?”女孩被打后,委屈地哭了起来,什么也没说。过了一会儿,父亲突然崩溃了。请问这位父亲为什么过一会崩溃了?"
}],
"stream": true,
"extra_body": {
"anthropic_beta": ["interleaved-thinking-2025-05-14"],
"thinking": {"type": "enabled", "budget_tokens": 4096}
}
}'
```
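The four curl examples above share one payload shape and differ only in model, prompt, and the `stream` flag. A small hypothetical helper (the function name and defaults are illustrative, not part of the gateway) can assemble that body programmatically; note how `budget_tokens` (4096) legitimately exceeds `max_tokens` (2048) under interleaved thinking:

```python
def build_interleaved_request(model: str, prompt: str,
                              max_tokens: int = 2048,
                              budget_tokens: int = 4096,
                              stream: bool = False) -> dict:
    """Assemble the chat-completions payload used in the curl examples above.

    With interleaved thinking, budget_tokens is the total thinking budget
    across the whole assistant turn, so it may exceed max_tokens (unlike
    plain extended thinking).
    """
    body = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
        "extra_body": {
            "anthropic_beta": ["interleaved-thinking-2025-05-14"],
            "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        },
    }
    if stream:
        body["stream"] = True
    return body

req = build_interleaved_request(
    "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
    "Explain how to implement a binary search tree with self-balancing capabilities.")
```

The resulting dict can be serialized with `json.dumps` and sent as the `-d` payload of the curl commands above.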

@@ -9,83 +9,6 @@ export OPENAI_API_KEY=<API key>
export OPENAI_BASE_URL=<API base url>
```
**API Examples:**
- [Models API](#models-api)
- [Embedding API](#embedding-api)
- [Multimodal API](#multimodal-api)
- [Tool Call](#tool-call)
- [Reasoning](#reasoning)
- [Interleaved thinking (beta)](#interleaved-thinking-beta)
## Models API
You can use this API to get the list of supported models, and to refresh the model list after new models are added to Amazon Bedrock.
**Request Example**
```bash
curl -s $OPENAI_BASE_URL/models -H "Authorization: Bearer $OPENAI_API_KEY" | jq .data
```
**Response Example**
```bash
[
...
{
"id": "anthropic.claude-3-5-sonnet-20240620-v1:0",
"created": 1734416893,
"object": "model",
"owned_by": "bedrock"
},
{
"id": "us.anthropic.claude-3-5-sonnet-20240620-v1:0",
"created": 1734416893,
"object": "model",
"owned_by": "bedrock"
},
...
]
```
## Chat Completions API
### Claude Sonnet 4.5 Basic Example

Claude Sonnet 4.5 is Anthropic's most intelligent model, excelling at coding, complex reasoning, and agentic tasks. It is available through the global cross-region inference profile.

**Request Example**
```bash
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
"messages": [
{
"role": "user",
"content": "编写一个使用动态规划计算斐波那契数列的Python函数。"
}
]
}'
```
**SDK Usage Example**
```python
from openai import OpenAI
client = OpenAI()
completion = client.chat.completions.create(
model="global.anthropic.claude-sonnet-4-5-20250929-v1:0",
messages=[{"role": "user", "content": "编写一个使用动态规划计算斐波那契数列的Python函数。"}],
)
print(completion.choices[0].message.content)
```
## Embedding API

**Important**: Please read the following points carefully before using this proxy API:
@@ -167,6 +90,10 @@ print(doc_result[0][:5])
## Multimodal API

**Important**: Please read the following points carefully before using this proxy API for multimodal processing:
1. This API only supports Claude 3 models.

**Request Example**

```bash
@@ -257,6 +184,7 @@ curl $OPENAI_BASE_URL/chat/completions \
**Important**: Please read the following points carefully before using this proxy API for Tool Call:
1. OpenAI has deprecated Function Call in favor of Tool Call, so Function Call is not supported here; use Tool Call instead.
1. This API only supports Claude 3 models.

**Request Example**
@@ -354,222 +282,3 @@ curl $OPENAI_BASE_URL/chat/completions \
You can try it with different questions, such as:
1. Hello, who are you? (No tools are needed)
2. What is the weather like today? (Should use get_current_location tool first)
## Reasoning
**Important**: Please read the following points carefully before using this reasoning mode.

- Currently only the Claude 3.7 Sonnet and DeepSeek R1 models support reasoning. Make sure the model you are using supports it before enabling reasoning.
- Reasoning (thinking) mode for Claude 3.7 Sonnet is disabled by default. You must pass an extra `reasoning_effort` parameter in the request, with a value of `low`, `medium`, or `high`, and provide a correct `max_tokens` (or `max_completion_tokens`) parameter. `budget_tokens` is derived from `reasoning_effort` as 30%/60%/100% of the max tokens, with a minimum `budget_tokens` of 1,024; Anthropic recommends at least 4,000 tokens for thorough reasoning. See the [Bedrock Document](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-anthropic-claude-37.html) for details.
- DeepSeek R1 uses reasoning mode automatically; do not pass the extra `reasoning_effort` parameter in the request (otherwise an error is returned).
- The reasoning result (chain of thought, thinking process) is returned in an extra field named `reasoning_content`, which is not an officially supported OpenAI format. This design follows the [DeepSeek Reasoning Model](https://api-docs.deepseek.com/guides/reasoning_model#api-example) specification and may change in the future.
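The `reasoning_effort` → `budget_tokens` mapping described above can be sketched in Python. This is a hypothetical helper mirroring the documented 30%/60%/100% rule with the 1,024-token floor; the gateway's actual implementation may differ:

```python
def budget_from_effort(reasoning_effort: str, max_tokens: int) -> int:
    """Map reasoning_effort to a thinking budget_tokens value:
    low/medium/high -> 30%/60%/100% of the max tokens, floored at 1,024."""
    ratios = {"low": 0.3, "medium": 0.6, "high": 1.0}
    if reasoning_effort not in ratios:
        raise ValueError(f"unsupported reasoning_effort: {reasoning_effort}")
    return max(1024, int(max_tokens * ratios[reasoning_effort]))

# Anthropic recommends at least 4,000 thinking tokens for thorough reasoning,
# so prefer a larger max_tokens when using "low" or "medium".
print(budget_from_effort("low", 4096))  # 30% of 4096 -> 1228
```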
**Request Example**
- Claude 3.7 Sonnet
```bash
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
"messages": [
{
"role": "user",
"content": "which one is bigger, 3.9 or 3.11?"
}
],
"max_completion_tokens": 4096,
"reasoning_effort": "low",
"stream": false
}'
```
- DeepSeek R1
```bash
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "us.deepseek.r1-v1:0",
"messages": [
{
"role": "user",
"content": "which one is bigger, 3.9 or 3.11?"
}
],
"stream": false
}'
```
**Response Example**
```json
{
"id": "chatcmpl-83fb7a88",
"created": 1740545278,
"model": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
"system_fingerprint": "fp",
"choices": [
{
"index": 0,
"finish_reason": "stop",
"logprobs": null,
"message": {
"role": "assistant",
"content": "3.9 is bigger than 3.11.\n\nWhen comparing decimal numbers, we need to understand what these numbers actually represent:...",
"reasoning_content": "I need to compare the decimal numbers 3.9 and 3.11.\n\nFor decimal numbers, we first compare the whole number parts, and if they're equal, we compare the decimal parts. \n\nBoth numbers ..."
}
}
],
"object": "chat.completion",
"usage": {
"prompt_tokens": 51,
"completion_tokens": 565,
"total_tokens": 616
}
}
```
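Since `reasoning_content` is a plain extra field on the message object, reading it needs no special SDK support; a minimal sketch using an abbreviated copy of the response above:

```python
import json

# Abbreviated form of the response example shown above
raw = '''
{
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "3.9 is bigger than 3.11.",
        "reasoning_content": "I need to compare the decimal numbers 3.9 and 3.11."
      }
    }
  ]
}
'''
message = json.loads(raw)["choices"][0]["message"]
answer = message["content"]
# .get() keeps this working against standard OpenAI responses, where the field is absent
thinking = message.get("reasoning_content")
```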
Or use the OpenAI SDK (run `pip3 install -U openai` first to upgrade to the latest version):
- Non-Streaming
```python
from openai import OpenAI
client = OpenAI()
messages = [{"role": "user", "content": "which one is bigger, 3.9 or 3.11?"}]
response = client.chat.completions.create(
model="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
messages=messages,
reasoning_effort="low",
max_completion_tokens=4096,
)
reasoning_content = response.choices[0].message.reasoning_content
content = response.choices[0].message.content
```
- Streaming
```python
from openai import OpenAI
client = OpenAI()
messages = [{"role": "user", "content": "9.11 and 9.8, which is greater?"}]
response = client.chat.completions.create(
model="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
messages=messages,
reasoning_effort="low",
max_completion_tokens=4096,
stream=True,
)
reasoning_content = ""
content = ""
for chunk in response:
if hasattr(chunk.choices[0].delta, 'reasoning_content') and chunk.choices[0].delta.reasoning_content:
reasoning_content += chunk.choices[0].delta.reasoning_content
elif chunk.choices[0].delta.content:
content += chunk.choices[0].delta.content
```
## Interleaved thinking (beta)
**Important**: Please read the following carefully before using the reasoning mode of the Chat Completions API.

Claude 4 models support extended thinking with tool use, including [interleaved thinking](https://docs.aws.amazon.com/bedrock/latest/userguide/claude-messages-extended-thinking.html#claude-messages-extended-thinking-tool-use-interleaved). This feature lets Claude 4 think between tool calls and run more sophisticated reasoning after receiving tool results, which is helpful for more complex agentic AI interactions.

With interleaved thinking, `budget_tokens` can exceed the `max_tokens` parameter because it represents the total token budget across all thinking blocks within one assistant turn.

**Supported Models**: Claude Sonnet 4, Claude Sonnet 4.5

**Request Example**
- Non-Streaming (Claude Sonnet 4.5)
```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bedrock" \
-d '{
"model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
"max_tokens": 2048,
"messages": [{
"role": "user",
"content": "解释如何实现一个具有自平衡功能的二叉搜索树。"
}],
"extra_body": {
"anthropic_beta": ["interleaved-thinking-2025-05-14"],
"thinking": {"type": "enabled", "budget_tokens": 4096}
}
}'
```
- Non-Streaming (Claude Sonnet 4)
```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bedrock" \
-d '{
"model": "us.anthropic.claude-sonnet-4-20250514-v1:0",
"max_tokens": 2048,
"messages": [{
"role": "user",
"content": "有一天,一个女孩参加数学考试只得了 38 分。她心里对父亲的惩罚充满恐惧,于是偷偷把分数改成了 88 分。她的父亲看到试卷后,怒发冲冠,狠狠地给了她一巴掌,怒吼道:“你这 8 怎么一半是绿的一半是红的,你以为我是傻子吗?”女孩被打后,委屈地哭了起来,什么也没说。过了一会儿,父亲突然崩溃了。请问这位父亲为什么过一会崩溃了?"
}],
"extra_body": {
"anthropic_beta": ["interleaved-thinking-2025-05-14"],
"thinking": {"type": "enabled", "budget_tokens": 4096}
}
}'
```
- Streaming (Claude Sonnet 4.5)
```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bedrock" \
-d '{
"model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
"max_tokens": 2048,
"messages": [{
"role": "user",
"content": "解释如何实现一个具有自平衡功能的二叉搜索树。"
}],
"stream": true,
"extra_body": {
"anthropic_beta": ["interleaved-thinking-2025-05-14"],
"thinking": {"type": "enabled", "budget_tokens": 4096}
}
}'
```
- Streaming (Claude Sonnet 4)
```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bedrock" \
-d '{
"model": "us.anthropic.claude-sonnet-4-20250514-v1:0",
"max_tokens": 2048,
"messages": [{
"role": "user",
"content": "有一天,一个女孩参加数学考试只得了 38 分。她心里对父亲的惩罚充满恐惧,于是偷偷把分数改成了 88 分。她的父亲看到试卷后,怒发冲冠,狠狠地给了她一巴掌,怒吼道:“你这 8 怎么一半是绿的一半是红的,你以为我是傻子吗?”女孩被打后,委屈地哭了起来,什么也没说。过了一会儿,父亲突然崩溃了。请问这位父亲为什么过一会崩溃了?"
}],
"stream": true,
"extra_body": {
"anthropic_beta": ["interleaved-thinking-2025-05-14"],
"thinking": {"type": "enabled", "budget_tokens": 4096}
}
}'
```

@@ -1,21 +0,0 @@
-line-length = 120
-indent-width = 4
-target-version = "py312"
-
-exclude = [
-    ".venv",
-    ".vscode",
-    "test/*"
-]
-
-[lint]
-select = ["E", "F", "I"]
-ignore = [
-    "E501",
-    "C901",
-    "F401",
-]
-
-[format]
-# use double quotes for strings.
-quote-style = "double"

@@ -1,139 +1,35 @@
-# NOTE: The script will try to create the ECR repository if it doesn't exist. Please grant the necessary permissions to the IAM user or role.
-# Usage:
-#   cd scripts
-#   bash ./push-to-ecr.sh
-
-set -o errexit  # exit on first error
-set -o nounset  # exit on using unset variables
-set -o pipefail # exit on any error in a pipeline
-
-# Change to the directory where the script is located
-cd "$(dirname "$0")"
-
-# Prompt user for inputs
-echo "================================================"
-echo "Bedrock Access Gateway - Build and Push to ECR"
-echo "================================================"
-echo ""
-
-# Get repository name for Lambda version
-read -p "Enter ECR repository name for Lambda (default: bedrock-proxy-api): " LAMBDA_REPO
-LAMBDA_REPO=${LAMBDA_REPO:-bedrock-proxy-api}
-
-# Get repository name for ECS/Fargate version
-read -p "Enter ECR repository name for ECS/Fargate (default: bedrock-proxy-api-ecs): " ECS_REPO
-ECS_REPO=${ECS_REPO:-bedrock-proxy-api-ecs}
-
-# Get image tag
-read -p "Enter image tag (default: latest): " TAG
-TAG=${TAG:-latest}
-
-# Get AWS region
-read -p "Enter AWS region (default: us-east-1): " AWS_REGION
-AWS_REGION=${AWS_REGION:-us-east-1}
-
-echo ""
-echo "Configuration:"
-echo "  Lambda Repository: $LAMBDA_REPO"
-echo "  ECS/Fargate Repository: $ECS_REPO"
-echo "  Image Tag: $TAG"
-echo "  AWS Region: $AWS_REGION"
-echo ""
-read -p "Continue with these settings? (y/n): " CONFIRM
-if [[ ! "$CONFIRM" =~ ^[Yy]$ ]]; then
-    echo "Aborted."
-    exit 1
-fi
-echo ""
-
-# Acknowledgment about ECR repository creation
-echo " NOTICE: This script will automatically create ECR repositories if they don't exist."
-echo " The repositories will be created with the following default settings:"
-echo "   - Image tag mutability: MUTABLE (allows overwriting tags)"
-echo "   - Image scanning: Disabled"
-echo "   - Encryption: AES256 (AWS managed encryption)"
-echo ""
-echo " You can modify these settings later in the AWS ECR Console if needed."
-echo " Required IAM permissions: ecr:CreateRepository, ecr:GetAuthorizationToken,"
-echo " ecr:BatchCheckLayerAvailability, ecr:InitiateLayerUpload, ecr:UploadLayerPart,"
-echo " ecr:CompleteLayerUpload, ecr:PutImage"
-echo ""
-read -p "Do you acknowledge and want to proceed? (y/n): " ACK_CONFIRM
-if [[ ! "$ACK_CONFIRM" =~ ^[Yy]$ ]]; then
-    echo "Aborted."
-    exit 1
-fi
-echo ""
-
-# Define variables
-ARCHS=("arm64")  # Single architecture for simplicity
-
-build_and_push_image() {
-    local IMAGE_NAME=$1
-    local TAG=$2
-    local DOCKERFILE_PATH=$3
-    local REGION=$AWS_REGION
-    local ARCH=${ARCHS[0]}
-
-    echo "Building $IMAGE_NAME:$TAG..."
-    # Build Docker image
-    # Note: --provenance=false and --sbom=false are required for Lambda compatibility
-    # Without these flags, Docker BuildKit (especially with docker-container driver) may create
-    # OCI image manifests with attestations that AWS Lambda does not support.
-    # Lambda requires Docker V2 Schema 2 format without multi-manifest index.
-    # See: https://github.com/aws-samples/bedrock-access-gateway/issues/206
-    docker buildx build \
-        --platform linux/$ARCH \
-        --provenance=false \
-        --sbom=false \
-        -t $IMAGE_NAME:$TAG \
-        -f $DOCKERFILE_PATH \
-        --load \
-        ../src/
-
-    # Get the account ID
-    ACCOUNT_ID=$(aws sts get-caller-identity --region $REGION --query Account --output text)
-    # Create repository URI
-    REPOSITORY_URI="${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/${IMAGE_NAME}"
-
-    echo "Creating ECR repository if it doesn't exist..."
-    # Create ECR repository if it doesn't exist
-    aws ecr create-repository --repository-name "${IMAGE_NAME}" --region $REGION || true
-
-    echo "Logging in to ECR..."
-    # Log in to ECR
-    aws ecr get-login-password --region $REGION | docker login --username AWS --password-stdin $REPOSITORY_URI
-
-    echo "Pushing image to ECR..."
-    # Tag the image for ECR
-    docker tag $IMAGE_NAME:$TAG $REPOSITORY_URI:$TAG
-    # Push the image to ECR
-    docker push $REPOSITORY_URI:$TAG
-
-    echo "✅ Successfully pushed $IMAGE_NAME:$TAG to $REPOSITORY_URI"
-    echo ""
-}
-
-echo "Building and pushing Lambda image..."
-build_and_push_image "$LAMBDA_REPO" "$TAG" "../src/Dockerfile"
-
-echo "Building and pushing ECS/Fargate image..."
-build_and_push_image "$ECS_REPO" "$TAG" "../src/Dockerfile_ecs"
-
-echo "================================================"
-echo "✅ All images successfully pushed!"
-echo "================================================"
-echo ""
-echo "Your container image URIs:"
-ACCOUNT_ID=$(aws sts get-caller-identity --region $AWS_REGION --query Account --output text)
-echo "  Lambda:      ${ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/${LAMBDA_REPO}:${TAG}"
-echo "  ECS/Fargate: ${ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/${ECS_REPO}:${TAG}"
-echo ""
-echo "Next steps:"
-echo "  1. Download the CloudFormation templates from deployment/ folder"
-echo "  2. Update the ContainerImageUri parameter with your image URI above"
-echo "  3. Deploy the stack via AWS CloudFormation Console"
-echo ""
+# Make sure you have created the Repo in AWS ECR in every regions you want to push to before executing this script.
+# Usage:
+#   cd scripts
+#   chmod +x push-to-ecr.sh
+#   ./push-to-ecr.sh
+
+# Define variables
+IMAGE_NAME="bedrock-proxy-api"
+TAG="latest"
+AWS_REGIONS=("us-west-2") # List of AWS regions
+#AWS_REGIONS=("us-east-1" "us-west-2" "eu-central-1" "ap-southeast-1" "ap-northeast-1") # List of AWS regions
+
+# Build Docker image
+docker build -t $IMAGE_NAME:$TAG ../src/
+
+# Loop through each AWS region
+for REGION in "${AWS_REGIONS[@]}"
+do
+    # Get the account ID for the current region
+    ACCOUNT_ID=$(aws sts get-caller-identity --region $REGION --query Account --output text)
+    # Create repository URI
+    REPOSITORY_URI="${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/${IMAGE_NAME}"
+    # Log in to ECR
+    aws ecr get-login-password --region $REGION | docker login --username AWS --password-stdin $REPOSITORY_URI
+    # Tag the image for the current region
+    docker tag $IMAGE_NAME:$TAG $REPOSITORY_URI:$TAG
+    # Push the image to ECR
+    docker push $REPOSITORY_URI:$TAG
+    echo "Pushed $IMAGE_NAME:$TAG to $REPOSITORY_URI"
+done

@@ -1,19 +1,9 @@
 FROM public.ecr.aws/lambda/python:3.12
 
-# Add Lambda Web Adapter for API Gateway response streaming
-COPY --from=public.ecr.aws/awsguru/aws-lambda-adapter:0.9.1 /lambda-adapter /opt/extensions/lambda-adapter
-
 COPY ./api ./api
 COPY requirements.txt .
 
 RUN pip3 install -r requirements.txt -U --no-cache-dir
 
-# Preload tiktoken encoding: https://github.com/aws-samples/bedrock-access-gateway/issues/118
-ENV TIKTOKEN_CACHE_DIR=/var/task/.cache/tiktoken
-RUN python3 -c 'import tiktoken_ext.openai_public as tke; tke.cl100k_base()'
-
-# Lambda Web Adapter requires overriding the Lambda base image entrypoint
-# to run the web app directly instead of the Lambda runtime handler
-ENTRYPOINT []
-CMD ["python", "-m", "uvicorn", "api.app:app", "--host", "0.0.0.0", "--port", "8080"]
+CMD [ "api.app.handler" ]

@@ -1,4 +1,4 @@
-FROM public.ecr.aws/docker/library/python:3.13-slim
+FROM python:3.12-slim
 
 WORKDIR /app
@@ -8,19 +8,4 @@ RUN pip install --no-cache-dir --upgrade -r /app/requirements.txt
 
 COPY ./api /app/api
 
-# Create non-root user
-RUN groupadd -r appuser && useradd -r -g appuser appuser && \
-    chown -R appuser:appuser /app
-USER appuser
-
-# Preload tiktoken encoding: https://github.com/aws-samples/bedrock-access-gateway/issues/118
-ENV TIKTOKEN_CACHE_DIR=/app/.cache/tiktoken
-RUN python3 -c 'import tiktoken_ext.openai_public as tke; tke.cl100k_base()'
-
-ENV PORT=8080
-
-HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
-    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:${PORT}/health').read()"
-
-CMD ["sh", "-c", "uvicorn api.app:app --host 0.0.0.0 --port ${PORT}"]
+CMD ["uvicorn", "api.app:app", "--host", "0.0.0.0", "--port", "80"]

@@ -1,5 +1,4 @@
 import logging
-import os
 
 import uvicorn
 from fastapi import FastAPI
@@ -8,8 +7,8 @@ from fastapi.middleware.cors import CORSMiddleware
 from fastapi.responses import PlainTextResponse
 from mangum import Mangum
 
-from api.routers import chat, embeddings, model
-from api.setting import API_ROUTE_PREFIX, DESCRIPTION, SUMMARY, TITLE, VERSION
+from api.routers import model, chat, embeddings
+from api.setting import API_ROUTE_PREFIX, TITLE, DESCRIPTION, SUMMARY, VERSION
 
 config = {
     "title": TITLE,
@@ -24,22 +23,14 @@ logging.basicConfig(
 )
 
 app = FastAPI(**config)
-allowed_origins = os.environ.get("ALLOWED_ORIGINS", "*")
-origins_list = [origin.strip() for origin in allowed_origins.split(",")] if allowed_origins != "*" else ["*"]
-
-# Warn if CORS allows all origins
-if origins_list == ["*"]:
-    logging.warning("CORS is configured to allow all origins (*). Set ALLOWED_ORIGINS environment variable to restrict access.")
 
 app.add_middleware(
     CORSMiddleware,
-    allow_origins=origins_list,  # nosec - configurable via ALLOWED_ORIGINS env var
+    allow_origins=["*"],
     allow_credentials=True,
     allow_methods=["*"],
     allow_headers=["*"],
 )
 
 app.include_router(model.router, prefix=API_ROUTE_PREFIX)
 app.include_router(chat.router, prefix=API_ROUTE_PREFIX)
 app.include_router(embeddings.router, prefix=API_ROUTE_PREFIX)
@@ -53,21 +44,10 @@ async def health():
 @app.exception_handler(RequestValidationError)
 async def validation_exception_handler(request, exc):
-    logger = logging.getLogger(__name__)
-    # Log essential info only - avoid sensitive data and performance overhead
-    logger.warning(
-        "Request validation failed: %s %s - %s",
-        request.method,
-        request.url.path,
-        str(exc).split('\n')[0]  # First line only
-    )
     return PlainTextResponse(str(exc), status_code=400)
 
 handler = Mangum(app)
 
 if __name__ == "__main__":
-    # Bind to 0.0.0.0 for container environments, network is handled by network policies and load balancers
-    uvicorn.run("app:app", host="0.0.0.0", port=8000, reload=False)  # nosec B104
+    uvicorn.run("app:app", host="0.0.0.0", port=8000, reload=True)

@@ -1,43 +1,28 @@
-import json
 import os
 from typing import Annotated
 
 import boto3
-from botocore.exceptions import ClientError
 from fastapi import Depends, HTTPException, status
-from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer
+from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
+
+from api.setting import DEFAULT_API_KEYS
 
 api_key_param = os.environ.get("API_KEY_PARAM_NAME")
-api_key_secret_arn = os.environ.get("API_KEY_SECRET_ARN")
-api_key_env = os.environ.get("API_KEY")
-
 if api_key_param:
-    # For backward compatibility.
-    # Please now use secrets manager instead.
     ssm = boto3.client("ssm")
-    api_key = ssm.get_parameter(Name=api_key_param, WithDecryption=True)["Parameter"]["Value"]
-elif api_key_secret_arn:
-    sm = boto3.client("secretsmanager")
-    try:
-        response = sm.get_secret_value(SecretId=api_key_secret_arn)
-        if "SecretString" in response:
-            secret = json.loads(response["SecretString"])
-            api_key = secret["api_key"]
-    except ClientError:
-        raise RuntimeError("Unable to retrieve API KEY, please ensure the secret ARN is correct")
-    except KeyError:
-        raise RuntimeError('Please ensure the secret contains a "api_key" field')
-elif api_key_env:
-    api_key = api_key_env
+    api_key = ssm.get_parameter(Name=api_key_param, WithDecryption=True)["Parameter"][
+        "Value"
+    ]
 else:
-    raise RuntimeError(
-        "API Key is not configured. Please set up your API Key."
-    )
+    api_key = DEFAULT_API_KEYS
 
 security = HTTPBearer()
 
 
 def api_key_auth(
-    credentials: Annotated[HTTPAuthorizationCredentials, Depends(security)],
+    credentials: Annotated[HTTPAuthorizationCredentials, Depends(security)]
 ):
     if credentials.credentials != api_key:
-        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid API Key")
+        raise HTTPException(
+            status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid API Key"
+        )

@@ -1,4 +1,3 @@
-import logging
 import time
 import uuid
 from abc import ABC, abstractmethod
@@ -6,17 +5,14 @@ from typing import AsyncIterable
 
 from api.schema import (
     # Chat
-    ChatRequest,
     ChatResponse,
+    ChatRequest,
     ChatStreamResponse,
     # Embeddings
     EmbeddingsRequest,
     EmbeddingsResponse,
-    Error,
 )
 
-logger = logging.getLogger(__name__)
-
 
 class BaseChatModel(ABC):
     """Represent a basic chat model
@@ -33,12 +29,12 @@ class BaseChatModel(ABC):
         pass
 
     @abstractmethod
-    async def chat(self, chat_request: ChatRequest) -> ChatResponse:
+    def chat(self, chat_request: ChatRequest) -> ChatResponse:
         """Handle a basic chat completion requests."""
         pass
 
     @abstractmethod
-    async def chat_stream(self, chat_request: ChatRequest) -> AsyncIterable[bytes]:
+    def chat_stream(self, chat_request: ChatRequest) -> AsyncIterable[bytes]:
         """Handle a basic chat completion requests with stream response."""
         pass
@@ -47,20 +43,16 @@ class BaseChatModel(ABC):
         return "chatcmpl-" + str(uuid.uuid4())[:8]
 
     @staticmethod
-    def stream_response_to_bytes(response: ChatStreamResponse | Error | None = None) -> bytes:
-        if isinstance(response, Error):
-            logger.error("Stream error: %s", response.error.message if response.error else "Unknown error")
-            data = response.model_dump_json()
-        elif isinstance(response, ChatStreamResponse):
+    def stream_response_to_bytes(
+        response: ChatStreamResponse | None = None
+    ) -> bytes:
+        if response:
             # to populate other fields when using exclude_unset=True
             response.system_fingerprint = "fp"
             response.object = "chat.completion.chunk"
             response.created = int(time.time())
-            data = response.model_dump_json(exclude_unset=True)
-        else:
-            data = "[DONE]"
-        return f"data: {data}\n\n".encode("utf-8")
+            return "data: {}\n\n".format(response.model_dump_json(exclude_unset=True)).encode("utf-8")
+        return "data: [DONE]\n\n".encode("utf-8")
 
 
 class BaseEmbeddingsModel(ABC):

File diff suppressed because it is too large.
@@ -1,11 +1,11 @@
 from typing import Annotated
 
-from fastapi import APIRouter, Body, Depends
+from fastapi import APIRouter, Depends, Body
 from fastapi.responses import StreamingResponse
 
 from api.auth import api_key_auth
 from api.models.bedrock import BedrockModel
-from api.schema import ChatRequest, ChatResponse, ChatStreamResponse, Error
+from api.schema import ChatRequest, ChatResponse, ChatStreamResponse
 from api.setting import DEFAULT_MODEL
 
 router = APIRouter(
@@ -15,9 +15,7 @@ router = APIRouter(
 )
 
-@router.post(
-    "/completions", response_model=ChatResponse | ChatStreamResponse | Error, response_model_exclude_unset=True
-)
+@router.post("/completions", response_model=ChatResponse | ChatStreamResponse, response_model_exclude_unset=True)
 async def chat_completions(
     chat_request: Annotated[
         ChatRequest,
@@ -32,7 +30,7 @@ async def chat_completions(
             }
         ],
         ),
-    ],
+    ]
 ):
     if chat_request.model.lower().startswith("gpt-"):
         chat_request.model = DEFAULT_MODEL
@@ -41,5 +39,7 @@ async def chat_completions(
     model = BedrockModel()
     model.validate(chat_request)
     if chat_request.stream:
-        return StreamingResponse(content=model.chat_stream(chat_request), media_type="text/event-stream")
-    return await model.chat(chat_request)
+        return StreamingResponse(
+            content=model.chat_stream(chat_request), media_type="text/event-stream"
+        )
+    return model.chat(chat_request)

@@ -1,6 +1,6 @@
 from typing import Annotated
 
-from fastapi import APIRouter, Body, Depends
+from fastapi import APIRouter, Depends, Body
 
 from api.auth import api_key_auth
 from api.models.bedrock import get_embeddings_model
@@ -21,11 +21,13 @@ async def embeddings(
             examples=[
                 {
                     "model": "cohere.embed-multilingual-v3",
-                    "input": ["Your text string goes here"],
+                    "input": [
+                        "Your text string goes here"
+                    ],
                 }
             ],
         ),
-    ],
+    ]
 ):
     if embeddings_request.model.lower().startswith("text-embedding-"):
         embeddings_request.model = DEFAULT_EMBEDDING_MODEL

@@ -4,7 +4,7 @@ from fastapi import APIRouter, Depends, HTTPException, Path
 from api.auth import api_key_auth
 from api.models.bedrock import BedrockModel
-from api.schema import Model, Models
+from api.schema import Models, Model
 
 router = APIRouter(
     prefix="/models",
@@ -22,7 +22,9 @@ async def validate_model_id(model_id: str):
 
 @router.get("", response_model=Models)
 async def list_models():
-    model_list = [Model(id=model_id) for model_id in chat_model.list_models()]
+    model_list = [
+        Model(id=model_id) for model_id in chat_model.list_models()
+    ]
     return Models(data=model_list)
 
@@ -34,7 +36,7 @@ async def get_model(
     model_id: Annotated[
         str,
         Path(description="Model ID", example="anthropic.claude-3-sonnet-20240229-v1:0"),
-    ],
+    ]
 ):
     await validate_model_id(model_id)
     return Model(id=model_id)

@@ -1,10 +1,8 @@
import time import time
from typing import Iterable, Literal from typing import Literal, Iterable
from pydantic import BaseModel, Field from pydantic import BaseModel, Field
from api.setting import DEFAULT_MODEL
class Model(BaseModel): class Model(BaseModel):
id: str id: str
@@ -41,15 +39,10 @@ class ImageUrl(BaseModel):
class ImageContent(BaseModel): class ImageContent(BaseModel):
type: Literal["image_url"] = "image_url" type: Literal["image_url"] = "image"
image_url: ImageUrl image_url: ImageUrl
class ToolContent(BaseModel):
type: Literal["text"] = "text"
text: str
class SystemMessage(BaseModel): class SystemMessage(BaseModel):
name: str | None = None name: str | None = None
role: Literal["system"] = "system" role: Literal["system"] = "system"
@@ -65,20 +58,14 @@ class UserMessage(BaseModel):
class AssistantMessage(BaseModel): class AssistantMessage(BaseModel):
name: str | None = None name: str | None = None
role: Literal["assistant"] = "assistant" role: Literal["assistant"] = "assistant"
content: str | list[TextContent | ImageContent] | None = None content: str | list[TextContent | ImageContent] | None
tool_calls: list[ToolCall] | None = None tool_calls: list[ToolCall] | None = None
class ToolMessage(BaseModel): class ToolMessage(BaseModel):
role: Literal["tool"] = "tool" role: Literal["tool"] = "tool"
content: str | list[ToolContent] | list[dict]
tool_call_id: str
class DeveloperMessage(BaseModel):
name: str | None = None
role: Literal["developer"] = "developer"
content: str content: str
tool_call_id: str
class Function(BaseModel): class Function(BaseModel):
@@ -97,43 +84,25 @@ class StreamOptions(BaseModel):
 class ChatRequest(BaseModel):
-    messages: list[SystemMessage | UserMessage | AssistantMessage | ToolMessage | DeveloperMessage]
-    model: str = DEFAULT_MODEL
+    messages: list[SystemMessage | UserMessage | AssistantMessage | ToolMessage]
+    model: str
     frequency_penalty: float | None = Field(default=0.0, le=2.0, ge=-2.0)  # Not used
     presence_penalty: float | None = Field(default=0.0, le=2.0, ge=-2.0)  # Not used
     stream: bool | None = False
     stream_options: StreamOptions | None = None
-    temperature: float | None = Field(default=None, le=2.0, ge=0.0)
-    top_p: float | None = Field(default=None, le=1.0, ge=0.0)
+    temperature: float | None = Field(default=1.0, le=2.0, ge=0.0)
+    top_p: float | None = Field(default=1.0, le=1.0, ge=0.0)
     user: str | None = None  # Not used
     max_tokens: int | None = 2048
-    max_completion_tokens: int | None = None
-    reasoning_effort: Literal["low", "medium", "high"] | None = None
     n: int | None = 1  # Not used
     tools: list[Tool] | None = None
     tool_choice: str | object = "auto"
-    stop: list[str] | str | None = None
-    extra_body: dict | None = None
-
-
-class PromptTokensDetails(BaseModel):
-    """Details about prompt tokens usage, following OpenAI API format."""
-
-    cached_tokens: int = 0
-    audio_tokens: int = 0
-
-
-class CompletionTokensDetails(BaseModel):
-    """Details about completion tokens usage, following OpenAI API format."""
-
-    reasoning_tokens: int = 0
-    audio_tokens: int = 0


 class Usage(BaseModel):
     prompt_tokens: int
     completion_tokens: int
     total_tokens: int
-    prompt_tokens_details: PromptTokensDetails | None = None
-    completion_tokens_details: CompletionTokensDetails | None = None


 class ChatResponseMessage(BaseModel):
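One behavioral consequence of this hunk worth flagging in review: on `main`, `model` defaults to `DEFAULT_MODEL`, while on `dev` it is a required field, so requests that omit `"model"` fail validation instead of falling back. A dependency-free sketch of that difference, using stdlib dataclasses in place of pydantic (class names here are illustrative, not the project's):

```python
from dataclasses import dataclass

DEFAULT_MODEL = "anthropic.claude-3-sonnet-20240229-v1:0"


@dataclass
class ChatRequestMain:
    # `main` variant: omitting the model falls back to DEFAULT_MODEL
    messages: list
    model: str = DEFAULT_MODEL


@dataclass
class ChatRequestDev:
    # `dev` variant: `model` has no default, so it must be supplied
    messages: list
    model: str


req = ChatRequestMain(messages=[{"role": "user", "content": "hi"}])
print(req.model)  # -> "anthropic.claude-3-sonnet-20240229-v1:0"

try:
    ChatRequestDev(messages=[])  # missing required `model`
except TypeError as exc:
    print("rejected:", exc)
```

Pydantic enforces the same required/optional split at validation time; dataclasses merely surface it as a `TypeError` at construction.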
@@ -141,7 +110,6 @@ class ChatResponseMessage(BaseModel):
     role: Literal["assistant"] | None = None
     content: str | None = None
     tool_calls: list[ToolCall] | None = None
-    reasoning_content: str | None = None


 class BaseChoice(BaseModel):
@@ -182,7 +150,7 @@ class EmbeddingsRequest(BaseModel):
     input: str | list[str] | Iterable[int | Iterable[int]]
     model: str
     encoding_format: Literal["float", "base64"] = "float"
-    dimensions: int | None = None  # Used by Nova embeddings; ignored by other models.
+    dimensions: int | None = None  # not used.
     user: str | None = None  # not used.
@@ -202,11 +170,3 @@ class EmbeddingsResponse(BaseModel):
     data: list[Embedding]
     model: str
     usage: EmbeddingsUsage
-
-
-class ErrorMessage(BaseModel):
-    message: str
-
-
-class Error(BaseModel):
-    error: ErrorMessage


@@ -1,18 +1,28 @@
 import os

-API_ROUTE_PREFIX = os.environ.get("API_ROUTE_PREFIX", "/api/v1")
+DEFAULT_API_KEYS = "bedrock"
+
+API_ROUTE_PREFIX = "/api/v1"

 TITLE = "Amazon Bedrock Proxy APIs"
 SUMMARY = "OpenAI-Compatible RESTful APIs for Amazon Bedrock"
 VERSION = "0.1.0"
 DESCRIPTION = """
 Use OpenAI-Compatible RESTful APIs for Amazon Bedrock models.
+
+List of Amazon Bedrock models currently supported:
+
+- Anthropic Claude 2 / 3 /3.5 (Haiku/Sonnet/Opus)
+- Meta Llama 2 / 3
+- Mistral / Mixtral
+- Cohere Command R / R+
+- Cohere Embedding
 """

 DEBUG = os.environ.get("DEBUG", "false").lower() != "false"
 AWS_REGION = os.environ.get("AWS_REGION", "us-west-2")
-DEFAULT_MODEL = os.environ.get("DEFAULT_MODEL", "anthropic.claude-3-sonnet-20240229-v1:0")
-DEFAULT_EMBEDDING_MODEL = os.environ.get("DEFAULT_EMBEDDING_MODEL", "cohere.embed-multilingual-v3")
-ENABLE_CROSS_REGION_INFERENCE = os.environ.get("ENABLE_CROSS_REGION_INFERENCE", "true").lower() != "false"
-ENABLE_APPLICATION_INFERENCE_PROFILES = os.environ.get("ENABLE_APPLICATION_INFERENCE_PROFILES", "true").lower() != "false"
-ENABLE_PROMPT_CACHING = os.environ.get("ENABLE_PROMPT_CACHING", "false").lower() != "false"
+DEFAULT_MODEL = os.environ.get(
+    "DEFAULT_MODEL", "anthropic.claude-3-sonnet-20240229-v1:0"
+)
+DEFAULT_EMBEDDING_MODEL = os.environ.get(
+    "DEFAULT_EMBEDDING_MODEL", "cohere.embed-multilingual-v3"
+)
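A review note on the flag parsing used throughout this settings file: the `os.environ.get(...).lower() != "false"` pattern treats *any* value other than the literal string `"false"` as enabled, including `"0"` and `"no"`. A small sketch of that behavior (the helper name `parse_flag` is illustrative, not from the repository):

```python
import os


def parse_flag(name: str, default: str = "false") -> bool:
    # Mirrors the settings pattern: only the literal string "false"
    # (case-insensitive) disables the flag.
    return os.environ.get(name, default).lower() != "false"


os.environ["DEBUG"] = "0"
print(parse_flag("DEBUG"))  # True -- "0" still enables debug

os.environ["DEBUG"] = "False"
print(parse_flag("DEBUG"))  # False
```

If stricter parsing is wanted, comparing against an allowlist such as `{"1", "true", "yes"}` avoids the surprising `"0"` case.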


@@ -1,10 +1,9 @@
-fastapi==0.128.0
-starlette==0.49.1  # CVE-2025-62727: Fix ReDoS in Range header parsing
-pydantic==2.11.4
+fastapi==0.111.0
+pydantic==2.7.1
 uvicorn==0.29.0
 mangum==0.17.0
-tiktoken==0.9.0
-requests==2.32.4
-numpy==2.2.5
-boto3==1.40.4
-botocore==1.40.4
+tiktoken==0.6.0
+requests==2.32.3
+numpy==1.26.4
+boto3==1.34.132
+botocore==1.34.132


@@ -0,0 +1,87 @@
import time
import random


def calculate_factorial(n):
    if n == 0:
        return 1
    else:
        return n * calculate_factorial(n - 1)


def find_largest_number(numbers):
    largest = numbers[0]
    for num in numbers:
        if num > largest:
            largest = num
    return largest


def inefficient_sort(arr):
    n = len(arr)
    for i in range(n):
        for j in range(0, n - i - 1):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
    return arr


class User:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def print_user_info(self):
        print(f"Name: {self.name}, Age: {self.age}")


def process_data(data):
    result = []
    for item in data:
        if item % 2 == 0:
            result.append(item * 2)
        else:
            result.append(item * 3)
    return result


def generate_random_numbers(n):
    numbers = []
    for i in range(n):
        numbers.append(random.randint(1, 100))
    return numbers


def calculate_average(numbers):
    total = sum(numbers)
    count = len(numbers)
    average = total / count
    return average


def main():
    # Inefficient factorial calculation
    print(calculate_factorial(20))

    # Unnecessary loop for finding largest number
    numbers = [3, 7, 2, 9, 1, 5]
    print(find_largest_number(numbers))

    # Inefficient sorting algorithm
    unsorted_list = [64, 34, 25, 12, 22, 11, 90]
    print(inefficient_sort(unsorted_list))

    # Inconsistent naming convention
    user1 = User("John Doe", 30)
    user1.print_user_info()

    # Redundant if-else structure
    data = [1, 2, 3, 4, 5]
    print(process_data(data))

    # Inefficient random number generation
    random_numbers = generate_random_numbers(1000000)
    print(f"Generated {len(random_numbers)} random numbers")

    # Potential division by zero
    empty_list = []
    print(calculate_average(empty_list))

    # Unnecessary time delay
    time.sleep(5)
    print("Finished processing after 5 seconds")


if __name__ == "__main__":
    main()
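Per the commit message, the file above is a deliberately error-laden fixture for exercising the code-review action, so the issues are left in place. For reference, a sketch of the fixes a reviewer would likely suggest (these replacements are illustrative, not part of the repository):

```python
import math
import random


def calculate_factorial(n):
    # math.factorial replaces the deep-recursion version
    return math.factorial(n)


def find_largest_number(numbers):
    # built-in max() replaces the manual scan
    return max(numbers)


def efficient_sort(arr):
    # Timsort via sorted() replaces the O(n^2) bubble sort
    return sorted(arr)


def generate_random_numbers(n):
    # comprehension replaces repeated append
    return [random.randint(1, 100) for _ in range(n)]


def calculate_average(numbers):
    # guard the empty-list case that main() triggers above
    if not numbers:
        return 0.0
    return sum(numbers) / len(numbers)


print(calculate_factorial(20))                    # 2432902008176640000
print(find_largest_number([3, 7, 2, 9, 1, 5]))    # 9
print(efficient_sort([64, 34, 25, 12, 22, 11, 90]))
print(calculate_average([]))                      # 0.0, no ZeroDivisionError
```

A review bot that surfaces these same points against the fixture is behaving as intended.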