Compare commits

85 Commits
dev ... main

Author SHA1 Message Date
Donghee Na
737cf076a0 fix: Fix ImageContent schema to use proper default value (#234) 2026-03-13 10:42:22 +08:00
Kane Zhu
6ae73c0c69 fix: merge additionalModelRequestFields instead of overwriting
When both reasoning_effort and extra_body are provided,
additionalModelRequestFields set by reasoning_effort (containing
reasoning_config) was silently overwritten by extra_body processing.
This prevented features like anthropic_beta for 1M context from
coexisting with reasoning_effort.
2026-03-10 16:41:52 +08:00
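The commit above describes replacing a destructive dict overwrite with a merge, so fields set earlier (such as `reasoning_config`) survive `extra_body` processing. A minimal sketch of that approach — function and field names here are illustrative, not the project's actual code:

```python
# Illustrative sketch: deep-merge extra_body-provided fields into the
# existing additionalModelRequestFields dict instead of replacing it,
# so reasoning_config (set by reasoning_effort) and anthropic_beta
# (set via extra_body) can coexist.
def merge_model_request_fields(existing: dict, extra: dict) -> dict:
    merged = dict(existing)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Recurse so nested dicts are merged key-by-key, not clobbered.
            merged[key] = merge_model_request_fields(merged[key], value)
        else:
            merged[key] = value
    return merged


fields = {"reasoning_config": {"type": "enabled", "budget_tokens": 2048}}
extra = {"anthropic_beta": ["context-1m"]}  # hypothetical beta flag value
print(merge_model_request_fields(fields, extra))
```

With a plain `existing.update(extra)` or full replacement, `reasoning_config` would be lost whenever `extra_body` was present; the recursive merge keeps both.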
Donghee Na
d1dc4ed164 fix: Support reasoning_tokens at bedrock streaming response (#223) 2026-02-26 11:48:05 +08:00
Gabriel Koo
d14596ff47 feat: add Amazon Nova 2 multimodal embeddings support (#222)
* feat: add Amazon Nova 2 multimodal embeddings support

Adds support for `amazon.nova-2-multimodal-embeddings-v1:0` via the
new `NovaEmbeddingsModel` class, using the `taskType`/`singleEmbeddingParams`
request format documented in the Nova 2 user guide.

- Supports single and batch text inputs
- Respects the `dimensions` parameter (256/512/1024/2048/3072, default 3072)
- Supports `float` and `base64` encoding formats
- Includes `test_nova_embed.py` for quick end-to-end verification

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: remove test script from repo

Test script moved to PR description instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: validate Nova embedding dimensions and fix falsy-zero bug

- Add VALID_DIMENSIONS set and upfront validation with a clear error message
- Fix `dimensions or DEFAULT` which would incorrectly ignore dimensions=0
- Add inline comment explaining approximate token counting (Nova API
  does not return token counts in the response)
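The `dimensions or DEFAULT` pitfall mentioned above is a classic falsy-zero bug: `0 or 3072` evaluates to `3072`, so an explicit `dimensions=0` was silently replaced by the default instead of being rejected. A sketch of the corrected pattern, using the valid-dimension set from the follow-up review fix (names are illustrative):

```python
DEFAULT_DIMENSIONS = 3072
VALID_DIMENSIONS = {256, 384, 1024, 3072}


def resolve_dimensions(dimensions):
    # Buggy form:   resolved = dimensions or DEFAULT_DIMENSIONS
    # (treats 0 as "unset" because 0 is falsy).
    # Correct form: substitute the default only when the value is None.
    resolved = DEFAULT_DIMENSIONS if dimensions is None else dimensions
    if resolved not in VALID_DIMENSIONS:
        raise ValueError(
            f"dimensions must be one of {sorted(VALID_DIMENSIONS)}, got {resolved}"
        )
    return resolved
```

With the explicit `None` check, `dimensions=0` now reaches the validation and raises a clear error rather than silently becoming 3072.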

* fix: address PR review comments for NovaEmbeddingsModel

- Fix VALID_DIMENSIONS to {256, 384, 1024, 3072} per Nova embeddings schema docs
  (previous values 512/2048 were mistakenly referenced from Titan embedding model docs)
- Replace str(item) fallback with HTTPException(400) to avoid silent garbage embeddings
- Update schema.py dimensions comment: 'not used' -> 'Used by Nova embeddings'
- Replace getattr() with direct .dimensions access on Pydantic model
- Move dimension validation before the loop (validates once, not per-text)
- Add enumerate to batch loop; include input index in error detail
- Switch isinstance(item, Iterable) to isinstance(item, list) for precise matching
- Add comment explaining embeddingPurpose hardcoded to GENERIC_INDEX

---------

Co-authored-by: Gabriel <gabrielkoo@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 11:41:17 +08:00
mjkam
a1844f95d4 Preload tiktoken encoding in Dockerfile (Lambda) (#220)
PR #193 added tiktoken preloading to Dockerfile_ecs but the same fix
was not applied to the Lambda Dockerfile. This causes a ConnectTimeout
error in network-restricted environments (e.g. Lambda in VPC without
NAT Gateway) when tiktoken tries to download cl100k_base encoding at
runtime from openaipublic.blob.core.windows.net.

Cache the encoding at build time, consistent with Dockerfile_ecs.

Related to #118
2026-02-19 17:00:05 +08:00
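The build-time preload the commit describes amounts to importing the encoding once during `docker build`, so the cached files are baked into the image. An illustrative Dockerfile line (the exact line in the repo may differ):

```dockerfile
# Cache the cl100k_base encoding into the image at build time so tiktoken
# never reaches out to openaipublic.blob.core.windows.net at runtime
# (which fails in network-restricted environments like a VPC Lambda
# without a NAT Gateway).
RUN python -c "import tiktoken; tiktoken.get_encoding('cl100k_base')"
```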
Hooman Yar
a150f7bb1c fix: support continue response for claude opus 4.6 (#219)
Co-authored-by: Hooman Yar <yarhooma@amazon.com>
2026-02-12 15:21:50 +08:00
Mengxin Zhu
9b3da3a5c8 fix(deps): update fastapi and starlette for CVE-2025-62727 (#216)
Update dependencies to fix HIGH severity ReDoS vulnerability:
- fastapi==0.128.0
- starlette==0.49.1

CVE-2025-62727 allows unauthenticated attackers to send crafted HTTP
Range headers that trigger quadratic-time processing in FileResponse
Range parsing, causing CPU exhaustion and DoS.

Fixes #215
2026-01-19 11:57:01 +08:00
Angélica de Oliveira
1a7f55b89b Add support for 'developer' role in chat messages (#209) 2025-12-09 11:26:10 +08:00
Mengxin Zhu
b41633b826 feat(apigw): add API Gateway response streaming support (#207)
Replace ALB + Lambda architecture with API Gateway REST API + Lambda
using response streaming for SSE support. This provides:

- No VPC required, reducing complexity and cost
- Native streaming support via API Gateway response streaming
- Pay-per-request pricing model

Changes:
- Add Lambda Web Adapter to Dockerfile for streaming support
- Replace BedrockProxy.template with API Gateway configuration
- Update README with new deployment options and latest models
- Update architecture diagram for API Gateway flow
2025-12-05 10:54:13 +08:00
Hooman Yar
0411454b3a feat: add claude-opus-4-5 to TEMPERATURE_TOPP_CONFLICT_MODELS set (#208)
Co-authored-by: Hooman Yar <yarhooma@amazon.com>
2025-12-05 09:22:37 +08:00
Kane Zhu
2c518bbd70 fix(docker): add --provenance=false --sbom=false for Lambda compatibility
Docker BuildKit (especially with docker-container driver) may create
OCI image manifests with attestations that AWS Lambda does not support.
Lambda requires Docker V2 Schema 2 format without multi-manifest index.

This fix ensures the build script generates Lambda-compatible images
regardless of the user's Docker/BuildKit configuration.

Fixes #206
2025-11-27 18:54:58 +08:00
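The flags named in the commit, shown in an illustrative build invocation (the image tag and platform here are assumptions, not the project's actual script):

```bash
# --provenance=false --sbom=false keep BuildKit from wrapping the image in
# an OCI attestation index; AWS Lambda requires a plain Docker V2 Schema 2
# manifest and rejects multi-manifest indexes.
docker buildx build --provenance=false --sbom=false \
  --platform linux/arm64 -t bedrock-proxy-api:latest --load .
```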
Justin Dray
37374e79ba fix: Allow the push-to-ecr.sh script to run from anywhere instead of requiring the user to cd manually (#202)
* fix: Allow the push-to-ecr.sh script to run from anywhere instead of requiring the user to cd manually

* Add docker-compose to support running locally
2025-11-20 14:33:43 +08:00
Viktor Isaev
b3c1c82367 Fix healthcheck in Dockerfile_ecs (#199)
The healthcheck in Dockerfile_ecs used a hardcoded port instead of the ENV setting; it now uses the configured port.
2025-11-20 14:30:00 +08:00
user-error1
ce4cfabb21 Fixed <think> </think> tags for GPT-OSS in bedrock.py (#200)
Added handling for message and content block deltas, including safety checks for open thinking tags.

Results in working reasoning and makes GPT-OSS 80/120b usable in frontends that expect closing thinking tags.
2025-11-20 14:29:20 +08:00
Donghee Na
7e03ab062d fix: Fix invalid cache_creation_tokens metric key (#195) 2025-10-27 14:31:21 +08:00
Shion Ichikawa
18b68bd3a7 🐳 preload tiktoken encoding in Dockerfile_ecs (#193) 2025-10-22 22:28:40 +08:00
Kane Zhu
d86e64eed3 refactor(bedrock): unify inference profile metadata handling and cleanup
- Add unified profile_metadata dictionary for both SYSTEM_DEFINED and APPLICATION inference profiles
- Remove unused region prefix functions and defaultdict import
- Add TEMPERATURE_TOPP_CONFLICT_MODELS set for Claude model parameter conflicts
- Improve model ARN parsing and error handling in profile enumeration
- Consolidate profile metadata storage to enable consistent feature detection
2025-10-16 15:24:02 +08:00
Kane Zhu
b4800c54a0 feat: add prompt caching support for Claude and Nova models
Add comprehensive prompt caching support with flexible control options:

Features:
- ENV variable control (ENABLE_PROMPT_CACHING, default: false)
- Per-request control via extra_body.prompt_caching
- Pattern-based model detection (Claude, Nova)
- Token limit warnings (Nova 20K limit)
- OpenAI-compatible response format (prompt_tokens_details.cached_tokens)

Supported models:
- Claude 3+ models (anthropic.claude-*)
- Nova models (amazon.nova-*)
- Auto-detection prevents breaking unsupported models

Implementation:
- System prompts caching via extra_body.prompt_caching.system
- Messages caching via extra_body.prompt_caching.messages
- Non-streaming and streaming modes
- Compatible with reasoning, thinking, and tool calls
2025-10-15 11:03:19 +08:00
Scott Baxter
7756532b4c fix: ECS container /health endpoint does not require API_KEY Bearer Token (#184) 2025-10-13 11:59:42 +08:00
Li Yi
9cea7f9314 chore: polish code with little update (#182)
- Run Docker container as non-root user (appuser) to minimize security risks
- Add Docker HEALTHCHECK for better container orchestration
- Make CORS configurable via ALLOWED_ORIGINS env var with security warning
- Replace assertions with proper error handling (TypeError/ValueError)
- Add 30s timeout to HTTP requests to prevent hanging connections
- Disable auto-reload in production uvicorn settings
2025-10-11 14:49:18 +08:00
Fabian Franz
8177876e5e Support <think> tags (#117) 2025-09-30 20:29:19 +08:00
Neil Mazumdar
66cb51bb36 feat: add Claude Sonnet 4.5 support with global cross-region inference (#180)
This commit adds comprehensive support for Claude Sonnet 4.5 (claude-sonnet-4-5-20250929),
Anthropic's most intelligent model with enhanced coding capabilities and complex agent support.

Changes:
- Added global cross-region inference profile discovery (global.anthropic.*)
- Fixed temperature/topP compatibility for Claude Sonnet 4.5 (model doesn't support both simultaneously)
- Fixed reasoning_effort parameter handling to prevent KeyError
- Added extended thinking/interleaved thinking support via extra_body parameter
- Updated documentation with Claude Sonnet 4.5 examples (English and Chinese)
- Updated README with Sonnet 4.5 announcement

Technical Details:
- src/api/models/bedrock.py: Added global profile support in list_bedrock_models()
- src/api/models/bedrock.py: Added Claude Sonnet 4.5 detection to remove topP parameter
- src/api/models/bedrock.py: Changed pop("topP") to pop("topP", None) to prevent KeyError
- docs/Usage.md: Added Chat Completions section with Sonnet 4.5 examples
- docs/Usage.md: Updated Interleaved thinking section with Sonnet 4.5 examples
- docs/Usage_CN.md: Added Chinese versions of all Sonnet 4.5 documentation

Model ID: global.anthropic.claude-sonnet-4-5-20250929-v1:0
2025-09-30 16:51:26 +08:00
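The `pop("topP")` → `pop("topP", None)` change above is the standard guard against a missing key: `dict.pop` raises `KeyError` when the key is absent unless a default is supplied. A minimal illustration:

```python
# Hypothetical request params: "topP" was already removed (or never set)
# because Claude Sonnet 4.5 does not accept temperature and topP together.
request_params = {"temperature": 0.7}

# request_params.pop("topP") would raise KeyError here; supplying a
# default turns the removal into a no-op when the key is absent.
request_params.pop("topP", None)
print(request_params)  # → {'temperature': 0.7}
```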
Mengxin Zhu
371d11d101 chore: cleanup useless files 2025-09-30 16:08:56 +08:00
Mengxin Zhu
e3ee9a707f docs: update deployment instructions and enhance ECR push script 2025-09-30 16:06:21 +08:00
Divyateja Pasupuleti
bdfa57c277 chore: update requirements to fix vulnerability (#177)
* chore: update requirements to fix vulnerability

* Update Python base image to version 3.13-slim
2025-09-19 16:15:32 +08:00
jbrockett
911dfe26d6 models: fix Application Inference Profiles mapping (#175)
* models: fix Application Inference Profiles mapping to include all profiles per model_id; switch to defaultdict(set) and emit all AIPs

* Fix rebase issue

---------

Co-authored-by: Jeremy Brockett <313937+jbrockett@users.noreply.github.com>
2025-08-14 15:21:14 +08:00
RizviR
a2110ff648 Add pagination to list_inference_profiles calls (#173)
Co-authored-by: Rizvi Rahim <rizvi@rizvir.com>
2025-08-13 10:26:34 +08:00
Fabian Franz
0cce2edab0 feat: update boto3 to version 1.40.4 (#169)
Updates boto3 from 1.37.0 to 1.40.4 and botocore from 1.37.0 to 1.40.4. This update enables support for AWS_BEARER_TOKEN_BEDROCK functionality and includes the latest AWS service features and bug fixes.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Claude <noreply@anthropic.com>
2025-08-13 10:23:30 +08:00
heisenbergye
3f1b56a526 feat: support Claude 4 Interleaved thinking (beta) (#164) 2025-07-21 16:44:21 +08:00
Mengxin Zhu
76a3614f17 fix: properly handle tool_use messages in conversation 2025-06-30 00:14:26 +08:00
Gagan M
01836087b1 feat: add support to include application inference profiles as models (#131)
---------

Co-authored-by: Mengxin Zhu <843303+zxkane@users.noreply.github.com>
2025-06-23 22:49:27 +08:00
dependabot[bot]
dd191d7cd9 Bump requests from 2.32.3 to 2.32.4 in /src (#151)
Bumps [requests](https://github.com/psf/requests) from 2.32.3 to 2.32.4.
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](https://github.com/psf/requests/compare/v2.32.3...v2.32.4)

---
updated-dependencies:
- dependency-name: requests
  dependency-version: 2.32.4
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-20 17:50:19 +08:00
Zack Elias
844efec086 add titan G1 embeddings (#152) 2025-06-17 11:09:22 +08:00
UniMa007
aed57307bc Add Titan Embeddings G2 (#94) 2025-05-27 21:52:15 +08:00
Aiden Dai
4e8a913e43 fix empty content issue 2025-04-20 09:21:47 +08:00
Aiden Dai
b27e83624f fix typo 2025-03-26 13:10:07 +08:00
Aiden Dai
c98e123c8f optimize error response in streaming 2025-03-26 11:32:39 +08:00
Aiden Dai
4f1a75b49f fix potential process stuck issue 2025-03-22 18:39:08 +08:00
Aiden Dai
0ead770069 performance improvement 2025-03-13 18:24:08 +08:00
Aiden Dai
fa14ae8c05 apply ruff linter 2025-03-13 14:24:41 +08:00
Aiden Dai
879b8e2ac7 apply ruff linter 2025-03-13 13:58:18 +08:00
Aiden Dai
f21b9a2e84 apply ruff linter 2025-03-13 13:50:57 +08:00
Aiden Dai
33e8fcfd3b fix potential bad request issue 2025-03-13 07:16:42 +08:00
Aiden Dai
5ff18c0acd Update usage guide for deepseek-r1 2025-03-11 10:25:50 +08:00
Aiden Dai
fcbfa9fe3d Update usage guide for deepseek-r1 2025-03-11 10:24:19 +08:00
Aiden Dai
1a9c0f461e Update usage guide for deepseek-r1 2025-03-11 10:14:06 +08:00
Aiden Dai
66b8967d30 Update usage guide for deepseek-r1 2025-03-11 10:10:58 +08:00
Zhongsheng Ji
fcfebf9d9d feat: Response 429 if ThrottlingException (#91) 2025-03-10 09:01:33 +08:00
Aiden Dai
283115000a Support of reasoning 2025-02-28 08:08:54 +08:00
Aiden Dai
4095c2e74e Support of reasoning 2025-02-26 13:28:23 +08:00
Aiden Dai
a46e329c97 Support of reasoning 2025-02-26 12:25:38 +08:00
Omri Shaiko
54f4a2b017 Fix issue with toolResult error with Cursor. Use default DEFAULT_MODEL in ChatRequest (#110) 2025-02-26 10:43:44 +08:00
Aiden Dai
3ce47ff278 Partial support of reasoning 2025-02-25 16:23:06 +08:00
Sean Smith
b26ee3e9ea Added troubleshooting guide and made buttons cool (#96)
Signed-off-by: Sean Smith <sean.smith@contextual.ai>
2025-02-11 12:40:27 +08:00
Aiden Dai
1cb8a6a603 Update readme 2025-02-10 15:48:34 +08:00
Aiden Dai
c39f6bc942 Use secrets manager for api key 2025-02-10 15:25:12 +08:00
Aiden Dai
74ca3b938e Update architecture diagram 2025-02-10 10:02:43 +08:00
Aiden Dai
a6f3e1176b fix secret access issue 2025-02-09 06:53:23 +08:00
Aiden Dai
4d88731233 Use secrets manager for api key 2025-02-08 21:36:59 +08:00
Sean Smith
48bf360456 Security Guide (#101)
Signed-off-by: Sean Smith <sean.smith@contextual.ai>
2025-02-08 11:40:24 +08:00
yytdfc
093c6fa586 add stop parameter (#86) 2024-12-31 11:15:24 +08:00
Aiden Dai
b2c187c716 Increase connect timeout 2024-12-19 16:45:18 +08:00
Aiden Dai
581638b794 Update docs 2024-12-17 17:38:21 +08:00
Aiden Dai
51bc727b38 Use readme 2024-12-16 17:11:54 +08:00
Aiden Dai
dc067affc0 Use yaml template 2024-12-16 16:33:37 +08:00
Aiden Dai
29621ae59c Automatically detect model list 2024-12-16 16:15:09 +08:00
Aiden Dai
d4938a0af2 Automatically detect model list 2024-12-16 16:01:59 +08:00
Attila Szucs
cb38d328aa Add environment variable for PORT (#47)
* Customizable port

* Fix CMD
2024-12-16 10:00:17 +08:00
Fabio Nonato
4fc0d3bc94 Image error fix (#80)
---------

Co-authored-by: Fabio Nonato <fnp@amazon.com>
2024-12-11 11:26:51 +08:00
Hans Knecht
241d5c0f3e feat: allow the use of an ENV variable to set the API key if the ParameterStore isn't used. (#40) 2024-12-06 14:32:06 +08:00
Fabian Fischer
25b3cfb146 feat: add amazon nova inference profiles in us (#79) 2024-12-06 13:52:50 +08:00
mschfh
17503b032a Add cross-region inference profiles for Llama 3.2 models. (#75) 2024-12-05 11:22:11 +08:00
bkocik
6849ca828a Add cross-region inference profiles for Llama 3.1 models. (#72) 2024-11-20 09:57:35 +08:00
KAEYL98
11a31b5584 feat: add support for APAC claude 3 profiles (#69) 2024-11-07 16:43:15 +08:00
heisenbergye
5f7676608a support Cross-Region Inference for all Claude models (#65) 2024-10-29 14:43:31 +08:00
Meng Xin Zhu
9cc3ea8253 chore: publish templates to s3 in release workflow (#64) 2024-10-28 17:36:35 +08:00
Aaron Yi
8785c63ddf fix: remove the code review pipeline
until access rights can be granted to pull requests from forks
2024-10-25 13:12:59 +08:00
yike5460
0afd0463e1 fix: add debugging info to workflow 2024-10-25 02:33:26 +00:00
Sergei Mikhailov
3a97677b97 Added "new Claude 3.5 Sonnet" v2 model to the list (#60) 2024-10-23 14:54:45 +08:00
yike5460
728ef6d8a6 fix: update workflow action to use var instead of secret 2024-10-10 06:24:04 +00:00
Mengxin Zhu
46fb759137 chore: use correct Dockerfile for building lambda image 2024-10-09 23:39:37 +08:00
Mengxin Zhu
326e566105 chore: use arm64 architecture image for lambda 2024-10-09 23:15:10 +08:00
Meng Xin Zhu
c1ee1b4244 chore: add automation script to release images (#58) 2024-10-09 18:20:14 +08:00
yike5460
552578a0ee fix: fix action dep issue 2024-10-09 08:30:19 +00:00
yike5460
d9590d6504 fix: place action file into the right folder 2024-10-09 08:22:14 +00:00
34 changed files with 2931 additions and 2769 deletions

.flake8

@@ -1,19 +0,0 @@
[flake8]
max-line-length = 120
ignore =
E203,W191,W503
exclude =
build
.git
__pycache__
.tox
venv
.venv
.venv-test
tmp*
deployment
cdk.out
node_modules
max-complexity = 10
require-code = True


@@ -1,74 +0,0 @@
name: Intelligent Code Review
# Enable manual trigger
on:
workflow_dispatch:
pull_request:
types: [opened, synchronize]
# Avoid running the same workflow on the same branch concurrently
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
jobs:
review:
runs-on: ubuntu-latest
permissions:
# read repository contents and write pull request comments
id-token: write
# allow github action bot to push new content into existing pull requests
contents: write
# contents: read
pull-requests: write
steps:
- name: Checkout code
uses: actions/checkout@v3
- name: Set up Node.js
uses: actions/setup-node@v3
with:
node-version: '20'
- name: Install dependencies
run: npm ci
shell: bash
# check if required dependencies @actions/core and @actions/github are installed
- name: Check if required dependencies are installed
run: |
npm list @actions/core
npm list @actions/github
shell: bash
- name: Debug GitHub Token
run: |
if [ -n "${{ secrets.GITHUB_TOKEN }}" ]; then
echo "GitHub Token is set"
else
echo "GitHub Token is not set"
fi
# assume the specified IAM role and set up the AWS credentials for use in subsequent steps.
- name: Configure AWS Credentials
uses: aws-actions/configure-aws-credentials@v4
with:
# using repository secret to get the role arn
role-to-assume: ${{ secrets.AWS_ROLE_TO_ASSUME }}
aws-region: us-east-1
- name: Intelligent GitHub Actions
uses: aws-samples/aws-genai-cicd-suite@stable
with:
# Automatic Provision: The GITHUB_TOKEN is automatically created and provided by GitHub for each workflow run. You don't need to manually create or store this token as a secret.
github-token: ${{ secrets.GITHUB_TOKEN }}
aws-region: us-east-1
model-id: anthropic.claude-3-sonnet-20240229-v1:0
generate-code-review: 'true'
generate-code-review-level: 'detailed'
generate-code-review-exclude-files: '*.md,*.json,*.js'
generate-pr-description: 'true'
generate-unit-test: 'false'
generate-unit-test-source-folder: 'debugging'
# Removed the invalid input 'generate-unit-test-exclude-files'
# output-language: 'zh'
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

.gitignore

@@ -160,3 +160,4 @@ cython_debug/
.idea/
Config
.vscode/launch.json

.pre-commit-config.yaml

@@ -0,0 +1,10 @@
repos:
- repo: https://github.com/astral-sh/ruff-pre-commit
# Ruff version.
rev: v0.9.10
hooks:
# Run the linter.
- id: ruff
types_or: [python, pyi]
# Run the formatter.
- id: ruff-format

README.md

@@ -1,15 +1,19 @@
[中文](./README_CN.md)
# Bedrock Access Gateway
OpenAI-compatible RESTful APIs for Amazon Bedrock
## Breaking Changes
## What's New 🔥
The source code is refactored with the new [Converse API](https://docs.aws.amazon.com/bedrock/latest/userguide/conversation-inference.html) by bedrock which provides native support with tool calls.
**API Gateway Response Streaming Support** - You can now deploy with Amazon API Gateway REST API instead of ALB, enabling true response streaming for better latency and cost optimization. See [Deployment Options](#deployment-options) for details.
If you are facing any problems, please raise an issue.
**Latest Models Supported:**
- **Claude 4.5 Family**: Opus 4.5, Sonnet 4.5, Haiku 4.5 - Anthropic's most intelligent models with enhanced coding and agent capabilities
- **Amazon Nova**: Nova Micro, Nova Lite, Nova Pro, Nova Premier - Amazon's native foundation models with multimodal support
- **DeepSeek**: DeepSeek-R1 (reasoning), DeepSeek-V3.1 - Advanced reasoning and general-purpose models
- **Qwen 3**: Qwen3-32B, Qwen3-235B, Qwen3-Coder-30B, Qwen3-Coder-480B - Alibaba's latest language and coding models
- **OpenAI OSS**: gpt-oss-20b, gpt-oss-120b - Open-source GPT models available via Bedrock
It also supports reasoning for **Claude 4/4.5** (extended thinking and interleaved thinking) and **DeepSeek R1**. Check [How to Use](./docs/Usage.md#reasoning) for more details. You need to first run the Models API to refresh the model list.
## Overview
@@ -25,25 +29,17 @@ If you find this GitHub repository useful, please consider giving it a free star
- [x] Support streaming response via server-sent events (SSE)
- [x] Support Model APIs
- [x] Support Chat Completion APIs
- [x] Support Tool Call (**new**)
- [x] Support Embedding API (**new**)
- [x] Support Multimodal API (**new**)
- [x] Support Tool Call
- [x] Support Embedding API
- [x] Support Multimodal API
- [x] Support Cross-Region Inference
- [x] Support Application Inference Profiles (**new**)
- [x] Support Reasoning (**new**)
- [x] Support Interleaved thinking (**new**)
- [x] Support Prompt Caching (**new**)
Please check [Usage Guide](./docs/Usage.md) for more details about how to use the new APIs.
> **Note:** The legacy [text completion](https://platform.openai.com/docs/api-reference/completions) API is not supported, you should change to use chat completion API.
Supported Amazon Bedrock models family:
- Anthropic Claude 2 / 3 (Haiku/Sonnet/Opus)
- Meta Llama 2 / 3
- Mistral / Mixtral
- Cohere Command R / R+
- Cohere Embedding
You can call the `models` API to get the full list of model IDs supported.
> **Note:** The default model is set to `anthropic.claude-3-sonnet-20240229-v1:0` which can be changed via Lambda environment variables (`DEFAULT_MODEL`).
## Get Started
@@ -57,58 +53,100 @@ Please make sure you have met below prerequisites:
### Architecture
The following diagram illustrates the reference architecture. Note that it also includes a new **VPC** with two public subnets only for the Application Load Balancer (ALB).
The following diagram illustrates the reference architecture. It uses [Amazon API Gateway response streaming](https://aws.amazon.com/blogs/compute/building-responsive-apis-with-amazon-api-gateway-response-streaming/) with Lambda for SSE support.
![Architecture](assets/arch.svg)
![Architecture](assets/arch.png)
You can also choose to use [AWS Fargate](https://aws.amazon.com/fargate/) behind the ALB instead of [AWS Lambda](https://aws.amazon.com/lambda/), the main difference is the latency of the first byte for streaming response (Fargate is lower).
### Deployment Options
Alternatively, you can use Lambda Function URL to replace ALB, see [example](https://github.com/awslabs/aws-lambda-web-adapter/tree/main/examples/fastapi-response-streaming)
| Option | Pros | Cons | Best For |
|--------|------|------|----------|
| **API Gateway + Lambda** | No VPC required, pay-per-request, native streaming support, lower operational overhead | Potential cold starts | Most use cases, cost-sensitive deployments |
| **ALB + Fargate** | Lowest streaming latency, no cold starts | Higher cost, requires VPC | High-throughput, latency-sensitive workloads |
You can also use Lambda Function URL as an alternative, see [example](https://github.com/awslabs/aws-lambda-web-adapter/tree/main/examples/fastapi-response-streaming)
### Deployment
Please follow the steps below to deploy the Bedrock Proxy APIs into your AWS account. Only supports regions where Amazon Bedrock is available (such as `us-west-2`). The deployment will take approximately **3-5 minutes** 🕒.
Please follow the steps below to deploy the Bedrock Proxy APIs into your AWS account. Only supports regions where Amazon Bedrock is available (such as `us-west-2`). The deployment will take approximately **10-15 minutes** 🕒.
**Step 1: Create your own custom API key (Optional)**
**Step 1: Create your own API key in Secrets Manager (MUST)**
> **Note:** This step is to use any string (without spaces) you like to create a custom API Key (credential) that will be used to access the proxy API later. This key does not have to match your actual OpenAI key, and you don't need to have an OpenAI API key. It is recommended that you take this step and ensure that you keep the key safe and private.
> **Note:** This step is to use any string (without spaces) you like to create a custom API Key (credential) that will be used to access the proxy API later. This key does not have to match your actual OpenAI key, and you don't need to have an OpenAI API key. Please keep the key safe and private.
1. Open the AWS Management Console and navigate to the Systems Manager service.
2. In the left-hand navigation pane, click on "Parameter Store".
3. Click on the "Create parameter" button.
4. In the "Create parameter" window, select the following options:
- Name: Enter a descriptive name for your parameter (e.g., "BedrockProxyAPIKey").
- Description: Optionally, provide a description for the parameter.
- Tier: Select **Standard**.
- Type: Select **SecureString**.
- Value: Any string (without spaces).
5. Click "Create parameter".
6. Make a note of the parameter name you used (e.g., "BedrockProxyAPIKey"). You'll need this in the next step.
1. Open the AWS Management Console and navigate to the AWS Secrets Manager service.
2. Click on "Store a new secret" button.
3. In the "Choose secret type" page, select:
**Step 2: Deploy the CloudFormation stack**
Secret type: Other type of secret
Key/value pairs:
- Key: api_key
- Value: Enter your API key value
1. Sign in to AWS Management Console, switch to the region to deploy the CloudFormation Stack to.
2. Click the following button to launch the CloudFormation Stack in that region. Choose one of the following:
- **ALB + Lambda**
Click "Next"
4. In the "Configure secret" page:
Secret name: Enter a name (e.g., "BedrockProxyAPIKey")
Description: (Optional) Add a description of your secret
5. Click "Next" and review all your settings and click "Store"
[![Launch Stack](assets/launch-stack.png)](https://console.aws.amazon.com/cloudformation/home#/stacks/create/template?stackName=BedrockProxyAPI&templateURL=https://aws-gcr-solutions.s3.amazonaws.com/bedrock-access-gateway/latest/BedrockProxy.template)
- **ALB + Fargate**
After creation, you'll see your secret in the Secrets Manager console. Make note of the secret ARN.
[![Launch Stack](assets/launch-stack.png)](https://console.aws.amazon.com/cloudformation/home#/stacks/create/template?stackName=BedrockProxyAPI&templateURL=https://aws-gcr-solutions.s3.amazonaws.com/bedrock-access-gateway/latest/BedrockProxyFargate.template)
3. Click "Next".
4. On the "Specify stack details" page, provide the following information:
- Stack name: Change the stack name if needed.
- ApiKeyParam (if you set up an API key in Step 1): Enter the parameter name you used for storing the API key (e.g., `BedrockProxyAPIKey`). If you did not set up an API key, leave this field blank. Click "Next".
5. On the "Configure stack options" page, you can leave the default settings or customize them according to your needs.
6. Click "Next".
7. On the "Review" page, review the details of the stack you're about to create. Check the "I acknowledge that AWS CloudFormation might create IAM resources" checkbox at the bottom.
8. Click "Create stack".
**Step 2: Build and push container images to ECR**
1. Clone this repository:
```bash
git clone https://github.com/aws-samples/bedrock-access-gateway.git
cd bedrock-access-gateway
```
2. Run the build and push script:
```bash
cd scripts
bash ./push-to-ecr.sh
```
3. Follow the prompts to configure:
- ECR repository names (or use defaults)
- Image tag (or use default: `latest`)
- AWS region (or use default: `us-east-1`)
4. The script will build and push both Lambda and ECS/Fargate images to your ECR repositories.
5. **Important**: Copy the image URIs displayed at the end of the script output. You'll need these in the next step.
**Step 3: Deploy the CloudFormation stack**
1. Download the CloudFormation template you want to use:
- For API Gateway + Lambda: [`deployment/BedrockProxy.template`](deployment/BedrockProxy.template)
- For ALB + Fargate: [`deployment/BedrockProxyFargate.template`](deployment/BedrockProxyFargate.template)
2. Sign in to AWS Management Console and navigate to the CloudFormation service in your target region.
3. Click "Create stack" → "With new resources (standard)".
4. Upload the template file you downloaded.
5. On the "Specify stack details" page, provide the following information:
- **Stack name**: Enter a stack name (e.g., "BedrockProxyAPI")
- **ApiKeySecretArn**: Enter the secret ARN from Step 1
- **ContainerImageUri**: Enter the ECR image URI from Step 2 output
- **DefaultModelId**: (Optional) Change the default model if needed
Click "Next".
6. On the "Configure stack options" page, you can leave the default settings or customize them according to your needs. Click "Next".
7. On the "Review" page, review all details. Check the "I acknowledge that AWS CloudFormation might create IAM resources" checkbox at the bottom. Click "Submit".
That is it! 🎉 Once deployed, click the CloudFormation stack and go to **Outputs** tab, you can find the API Base URL from `APIBaseUrl`, the value should look like `http://xxxx.xxx.elb.amazonaws.com/api/v1`.
### Troubleshooting
If you encounter any issues, please check the [Troubleshooting Guide](./docs/Troubleshooting.md) for more details.
### SDK/API Usage
All you need is the API Key and the API Base URL. If you didn't set up your own key, then the default API Key (`bedrock`) will be used.
All you need is the API Key and the API Base URL. If you didn't set up your own key following Step 1, the application will fail to start with an error message indicating that the API Key is not configured.
Now, you can try out the proxy APIs. Let's say you want to test Claude 3 Sonnet model (model ID: `anthropic.claude-3-sonnet-20240229-v1:0`)...
@@ -153,14 +191,123 @@ print(completion.choices[0].message.content)
Please check [Usage Guide](./docs/Usage.md) for more details about how to use embedding API, multimodal API and tool call.
### Application Inference Profiles
This proxy now supports **Application Inference Profiles**, which allow you to track usage and costs for your model invocations. You can use application inference profiles created in your AWS account for cost tracking and monitoring purposes.
**Using Application Inference Profiles:**
```bash
# Use an application inference profile ARN as the model ID
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "arn:aws:bedrock:us-west-2:123456789012:application-inference-profile/your-profile-id",
"messages": [
{
"role": "user",
"content": "Hello!"
}
]
}'
```
**SDK Usage with Application Inference Profiles:**
```python
from openai import OpenAI
client = OpenAI()
completion = client.chat.completions.create(
model="arn:aws:bedrock:us-west-2:123456789012:application-inference-profile/your-profile-id",
messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)
```
**Benefits of Application Inference Profiles:**
- **Cost Tracking**: Track usage and costs for specific applications or use cases
- **Usage Monitoring**: Monitor model invocation metrics through CloudWatch
- **Tag-based Cost Allocation**: Use AWS cost allocation tags for detailed billing analysis
For more information about creating and managing application inference profiles, see the [Amazon Bedrock User Guide](https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles-create.html).
### Prompt Caching
This proxy now supports **Prompt Caching** for Claude and Nova models, which can reduce costs by up to 90% and latency by up to 85% for workloads with repeated prompts.
**Supported Models:**
- Claude models (Claude 3.5 Haiku, Claude 4, Claude 4.5, etc.)
- Nova models (Nova Micro, Nova Lite, Nova Pro, Nova Premier)
**Enabling Prompt Caching:**
You can enable prompt caching in two ways:
1. **Globally via Environment Variable** (set in ECS Task Definition or Lambda):
```bash
ENABLE_PROMPT_CACHING=true
```
2. **Per-request via `extra_body`**:
**Python SDK:**
```python
from openai import OpenAI
client = OpenAI()
# Cache system prompts
response = client.chat.completions.create(
model="global.anthropic.claude-haiku-4-5-20251001-v1:0",
messages=[
{"role": "system", "content": "You are an expert assistant with knowledge of..."},
{"role": "user", "content": "Help me with this task"}
],
extra_body={
"prompt_caching": {"system": True}
}
)
# Check cache hit
if response.usage.prompt_tokens_details:
cached_tokens = response.usage.prompt_tokens_details.cached_tokens
print(f"Cached tokens: {cached_tokens}")
```
**cURL:**
```bash
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "global.anthropic.claude-haiku-4-5-20251001-v1:0",
"messages": [
{"role": "system", "content": "Long system prompt..."},
{"role": "user", "content": "Question"}
],
"extra_body": {
"prompt_caching": {"system": true}
}
}'
```
**Cache Options:**
- `"prompt_caching": {"system": true}` - Cache system prompts
- `"prompt_caching": {"messages": true}` - Cache user messages
- `"prompt_caching": {"system": true, "messages": true}` - Cache both
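The three option combinations above map to a single `prompt_caching` object in the request body; a small helper (hypothetical, for illustration only, not part of the proxy) that assembles such a request might look like:

```python
def build_chat_request(model: str, messages: list,
                       cache_system: bool = False,
                       cache_messages: bool = False) -> dict:
    """Assemble an OpenAI-style request body with the proxy's
    prompt_caching extension (illustrative helper)."""
    body = {"model": model, "messages": messages}
    caching = {}
    if cache_system:
        caching["system"] = True
    if cache_messages:
        caching["messages"] = True
    if caching:  # omit the key entirely when no caching is requested
        body["prompt_caching"] = caching
    return body
```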
**Requirements:**
- Prompt must be ≥1,024 tokens to enable caching
- Cache TTL is 5 minutes (resets on each cache hit)
- Nova models have a 20,000 token caching limit
For more information, see the [Amazon Bedrock Prompt Caching Guide](https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html).
## Other Examples
### AutoGen
Below is an image of setting up the model in AutoGen studio.
![AutoGen Model](assets/autogen-model.png)
### LangChain
Make sure you use `ChatOpenAI(...)` instead of `OpenAI(...)`
@@ -199,43 +346,37 @@ print(response)
This application does not collect any of your data. Furthermore, it does not log any requests or responses by default.
### Why choose API Gateway vs ALB?
**API Gateway + Lambda** uses [API Gateway response streaming](https://aws.amazon.com/blogs/compute/building-responsive-apis-with-amazon-api-gateway-response-streaming/) with [Lambda Web Adapter](https://github.com/awslabs/aws-lambda-web-adapter) to support SSE streaming without requiring a VPC. This is a cost-effective, serverless option with up to 10 minutes timeout.
**ALB + Fargate** provides the lowest streaming latency with no cold starts, ideal for high-throughput workloads.
### Which regions are supported?
This solution only supports regions where Amazon Bedrock is available. As of now, these include:
- US East (N. Virginia): us-east-1
- US West (Oregon): us-west-2
- Asia Pacific (Singapore): ap-southeast-1
- Asia Pacific (Sydney): ap-southeast-2
- Asia Pacific (Tokyo): ap-northeast-1
- Europe (Frankfurt): eu-central-1
- Europe (Paris): eu-west-3
Generally speaking, all regions that Amazon Bedrock supports are also supported by this solution; if not, please raise an issue on GitHub.
Note that not all models are available in those regions.
### Can I build and use my own ECR image?
Yes, you can clone the repo, build the container image yourself (`src/Dockerfile`), and push it to your own ECR repo. You can use `scripts/push-to-ecr.sh`. Replace the image URI in the CloudFormation template before you deploy.
### Which models are supported?
You can use the [Models API](./docs/Usage.md#models-api) to get/refresh the list of supported models in the current region.
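The Models API returns an OpenAI-style list object; a minimal sketch of extracting model IDs from such a response (the sample payload here is illustrative, not captured from a live deployment):

```python
# Illustrative /models response in the OpenAI list format.
sample_response = {
    "object": "list",
    "data": [
        {"id": "anthropic.claude-3-sonnet-20240229-v1:0", "object": "model"},
        {"id": "cohere.embed-multilingual-v3", "object": "model"},
    ],
}

def list_model_ids(response: dict) -> list:
    """Return the model IDs from a /models list response."""
    return [m["id"] for m in response.get("data", [])]

print(list_model_ids(sample_response))
```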
### Can I run this locally?
Yes, you can run this locally. For example, run the command below under the `src` folder:
```bash
uvicorn api.app:app --host 0.0.0.0 --port 8000
```
The API base url should look like `http://localhost:8000/api/v1`.
### Any performance sacrifice or latency increase by using the proxy APIs?
Compared with direct AWS SDK calls, the proxy architecture adds some latency; you can benchmark it yourself. The default API Gateway + Lambda deployment provides good streaming performance with Lambda response streaming.
For the lowest latency on streaming responses, consider the ALB + Fargate deployment option, which eliminates cold starts and provides consistent performance.
### Any plan to support SageMaker models?
@@ -247,13 +388,7 @@ Fine-tuned models and models with Provisioned Throughput are currently not suppo
### How to upgrade?
To use the latest features, follow the deployment guide to redeploy the application. You can upgrade the existing CloudFormation stack to get the latest changes.
## Security


@@ -1,267 +0,0 @@
[English](./README.md)
# Bedrock Access Gateway
Access Amazon Bedrock with OpenAI-compatible APIs
## Breaking Changes
The project source code has been refactored to use the new [Converse API](https://docs.aws.amazon.com/bedrock/latest/userguide/conversation-inference.html) provided by Bedrock, which natively supports tool calls.
If you encounter any problems, please open a GitHub Issue.
## Overview
Amazon Bedrock offers a wide range of foundation models (such as Claude 3 Opus/Sonnet/Haiku, Llama 2/3, Mistral/Mixtral, etc.) along with a broad set of capabilities for building generative AI applications. For more details, please check [Amazon Bedrock](https://aws.amazon.com/bedrock).
Sometimes you may already have applications built with OpenAI's APIs or SDKs and want to try Amazon Bedrock models without modifying your code. Or you may simply want to evaluate these foundation models in tools such as AutoGen. The good news is that this project provides a convenient way to seamlessly integrate with and try Amazon Bedrock models through OpenAI's APIs or SDKs, without changing your existing code.
If you find this project useful, please consider giving it a free little star ⭐.
Features:
- [x] Streaming responses via server-sent events (SSE)
- [x] Model APIs
- [x] Chat Completion APIs
- [x] Tool Call (**new**)
- [x] Embedding API (**new**)
- [x] Multimodal API (**new**)
Please check the [Usage Guide](./docs/Usage_CN.md) for more details on how to use the new APIs.
> Note: The legacy [text completion](https://platform.openai.com/docs/api-reference/completions) API is not supported; please switch to the Chat Completion API.
Supported Amazon Bedrock model families:
- Anthropic Claude 2 / 3 (Haiku/Sonnet/Opus)
- Meta Llama 2 / 3
- Mistral / Mixtral
- Cohere Command R / R+
- Cohere Embedding
You can call the `models` API first to get the full list of supported model IDs.
> Note: The default model is `anthropic.claude-3-sonnet-20240229-v1:0`, which can be changed via the Lambda environment variables.
## Usage Guide
### Prerequisites
Please make sure you meet the following prerequisites:
- Access to Amazon Bedrock foundation models.
If you have not yet obtained model access, please refer to the [configuration](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html) guide.
### Architecture
The diagram below shows the reference architecture of this solution. Note that it also includes a new **VPC** with only two public subnets for the Application Load Balancer (ALB).
![Architecture](assets/arch.svg)
You can also choose to put [AWS Fargate](https://aws.amazon.com/fargate/) behind the ALB instead of [AWS Lambda](https://aws.amazon.com/lambda/); the main difference is the first-byte latency of streaming responses (lower with Fargate).
Alternatively, you can use a Lambda Function URL instead of the ALB; see this [example](https://github.com/awslabs/aws-lambda-web-adapter/tree/main/examples/fastapi-response-streaming).
### Deployment
Follow the steps below to deploy the Bedrock proxy APIs into your AWS account. Only regions where Amazon Bedrock is available (such as us-west-2) are supported. Deployment takes about **3-5 minutes** 🕒.
**Step 1: Create your own API Key (optional)**
> Note: In this step you use any string (without spaces) to create a custom API Key (credential) that will be used to access the proxy APIs later. This API Key does not have to match your actual OpenAI key, and you don't even need an OpenAI API Key. It is recommended to perform this step and keep the API Key safe.
1. Open the AWS Management Console and navigate to the Systems Manager service.
2. In the left navigation pane, click "Parameter Store".
3. Click the "Create parameter" button.
4. In the "Create parameter" window, choose the following options:
   - Name: enter a descriptive name for the parameter (e.g., "BedrockProxyAPIKey").
   - Description: optionally provide a description for the parameter.
   - Tier: select **Standard**.
   - Type: select **SecureString**.
   - Value: any string (without spaces).
5. Click "Create parameter".
6. Note down the parameter name you used (e.g., "BedrockProxyAPIKey"). You will need it in the next step.
**Step 2: Deploy the CloudFormation stack**
1. Sign in to the AWS Management Console and switch to the region where you want to deploy the CloudFormation stack.
2. Click one of the following buttons to launch the CloudFormation stack in that region (choose one deployment option).
- **ALB + Lambda**
[![Launch Stack](assets/launch-stack.png)](https://console.aws.amazon.com/cloudformation/home#/stacks/create/template?stackName=BedrockProxyAPI&templateURL=https://aws-gcr-solutions.s3.amazonaws.com/bedrock-access-gateway/latest/BedrockProxy.template)
- **ALB + Fargate**
[![Launch Stack](assets/launch-stack.png)](https://console.aws.amazon.com/cloudformation/home#/stacks/create/template?stackName=BedrockProxyAPI&templateURL=https://aws-gcr-solutions.s3.amazonaws.com/bedrock-access-gateway/latest/BedrockProxyFargate.template)
3. Click "Next".
4. On the "Specify stack details" page, provide the following information:
   - Stack name: change it if needed.
   - ApiKeyParam (if you set an API Key in Step 1): enter the parameter name used to store the API Key (e.g., "BedrockProxyAPIKey"); otherwise, leave this field blank.
   Click "Next".
5. On the "Configure stack options" page, keep the default settings or customize them as needed.
6. Click "Next".
7. On the "Review" page, review the details of the stack you are about to create. Check the "I acknowledge that AWS CloudFormation might create IAM resources." box at the bottom.
8. Click "Create stack".
That's it 🎉. Once deployed, click the CloudFormation stack and go to the "Outputs" tab, where you can find the API Base URL under "APIBaseUrl"; it should look like `http://xxxx.xxx.elb.amazonaws.com/api/v1`.
### SDK/API Usage
All you need is the API Key and the API Base URL. If you didn't set your own key, the default API Key (`bedrock`) will be used.
Now you can try out the proxy APIs. Suppose you want to test the Claude 3 Sonnet model; use "anthropic.claude-3-sonnet-20240229-v1:0" as the model ID.
- **API usage example**
```bash
export OPENAI_API_KEY=<API key>
export OPENAI_BASE_URL=<API base url>
# For older versions, use OPENAI_API_BASE
# https://github.com/openai/openai-python/issues/624
export OPENAI_API_BASE=<API base url>
```
```bash
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "anthropic.claude-3-sonnet-20240229-v1:0",
"messages": [
{
"role": "user",
"content": "Hello!"
}
]
}'
```
- **SDK usage example**
```python
from openai import OpenAI
client = OpenAI()
completion = client.chat.completions.create(
model="anthropic.claude-3-sonnet-20240229-v1:0",
messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)
```
Please check the [Usage Guide](./docs/Usage_CN.md) for more details on how to use the Embedding API, Multimodal API, and Tool Call.
## Other Examples
### AutoGen
For example, configuring and using a model in AutoGen Studio:
![AutoGen Model](assets/autogen-model.png)
### LangChain
Make sure you use `ChatOpenAI(...)` instead of `OpenAI(...)`
```python
# pip install langchain-openai
import os
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
chat = ChatOpenAI(
model="anthropic.claude-3-sonnet-20240229-v1:0",
temperature=0,
openai_api_key=os.environ['OPENAI_API_KEY'],
openai_api_base=os.environ['OPENAI_BASE_URL'],
)
template = """Question: {question}
Answer: Let's think step by step."""
prompt = PromptTemplate.from_template(template)
llm_chain = LLMChain(prompt=prompt, llm=chat)
question = "What NFL team won the Super Bowl in the year Justin Beiber was born?"
response = llm_chain.invoke(question)
print(response)
```
## FAQs
### About privacy
This solution does not collect any of your data. Moreover, it does not log any requests or responses by default.
### Why is an Application Load Balancer used instead of API Gateway?
The short answer is that API Gateway does not support server-sent events (SSE) for streaming responses.
### Which regions are supported?
Only regions where Amazon Bedrock is available are supported. As of now, these include:
- US East (N. Virginia): us-east-1
- US West (Oregon): us-west-2
- Asia Pacific (Singapore): ap-southeast-1
- Asia Pacific (Sydney): ap-southeast-2
- Asia Pacific (Tokyo): ap-northeast-1
- Europe (Frankfurt): eu-central-1
- Europe (Paris): eu-west-3
Generally speaking, all regions supported by Amazon Bedrock are supported; if not, please open a GitHub Issue.
Note that not all models are available in those regions.
### Can I build and use my own ECR image?
Yes, you can clone the repo and build the container image yourself (src/Dockerfile), then push it to your own ECR repository. You can refer to the script `scripts/push-to-ecr.sh`.
Replace the image repository URL in the CloudFormation template before deploying.
### Can I run this locally?
Yes, you can run this locally; the API Base URL should look like `http://localhost:8000/api/v1`.
### Is there any performance penalty or latency increase when using the proxy APIs?
Compared with AWS SDK calls, the reference architecture introduces additional response latency; you can deploy and test it yourself.
Also, you can use Lambda Web Adapter + Function URL (see this [example](https://github.com/awslabs/aws-lambda-web-adapter/tree/main/examples/fastapi-response-streaming)) instead of the ALB, or AWS Fargate instead of Lambda, for better streaming performance.
### Any plans to support SageMaker models?
There are currently no plans to support SageMaker models. It depends on customer demand.
### Any plans to support Bedrock custom models?
Fine-tuned models and models with Provisioned Throughput are not supported. You can clone the repo and customize it yourself if needed.
### How to upgrade?
To use the latest features, you don't need to redeploy the CloudFormation stack; you only need to pull the latest image.
How to do this depends on which version you deployed:
- **Lambda version**: Go to the AWS Lambda console, find the Lambda function, then find and click the `Deploy new image` button and click save.
- **Fargate version**: Go to the ECS console, click the ECS cluster, go to the `Tasks` tab, select the only running task, and click the `Stop selected` menu. A new task with the latest image will start automatically.
## Security
For more information, please refer to [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications).
## License
This project is licensed under the MIT-0 License. See the LICENSE file.

THIRD_PARTY Normal file

@@ -0,0 +1,8 @@
certifi
SPDX-License-Identifier: MPL-2.0
This Source Code Form is subject to the terms of the Mozilla Public
License, v. 2.0. If a copy of the MPL was not distributed with this
file, You can obtain one at http://mozilla.org/MPL/2.0/.
https://github.com/certifi/python-certifi

assets/arch.png Normal file (binary, 50 KiB, not shown)
(Four other binary asset files, of 25 KiB, 209 KiB, 212 KiB, and 3.3 KiB, were removed; one file diff was suppressed because its lines are too long.)

@@ -1,768 +1,178 @@
{
"Description": "Bedrock Access Gateway - OpenAI-compatible RESTful APIs for Amazon Bedrock",
"Transform": "AWS::LanguageExtensions",
"Parameters": {
"ApiKeyParam": {
"Type": "String",
"Default": "",
"Description": "The parameter name in System Manager used to store the API Key, leave blank to use a default key"
}
},
"Resources": {
"VPCB9E5F0B4": {
"Type": "AWS::EC2::VPC",
"Properties": {
"CidrBlock": "10.250.0.0/16",
"EnableDnsHostnames": true,
"EnableDnsSupport": true,
"InstanceTenancy": "default",
"Tags": [
{
"Key": "Name",
"Value": "BedrockProxy/VPC"
}
]
},
"Metadata": {
"aws:cdk:path": "BedrockProxy/VPC/Resource"
}
},
"VPCPublicSubnet1SubnetB4246D30": {
"Type": "AWS::EC2::Subnet",
"Properties": {
"AvailabilityZone": {
"Fn::Select": [
0,
{
"Fn::GetAZs": ""
}
]
},
"CidrBlock": "10.250.0.0/24",
"MapPublicIpOnLaunch": true,
"Tags": [
{
"Key": "aws-cdk:subnet-name",
"Value": "Public"
},
{
"Key": "aws-cdk:subnet-type",
"Value": "Public"
},
{
"Key": "Name",
"Value": "BedrockProxy/VPC/PublicSubnet1"
}
],
"VpcId": {
"Ref": "VPCB9E5F0B4"
}
},
"Metadata": {
"aws:cdk:path": "BedrockProxy/VPC/PublicSubnet1/Subnet"
}
},
"VPCPublicSubnet1RouteTableFEE4B781": {
"Type": "AWS::EC2::RouteTable",
"Properties": {
"Tags": [
{
"Key": "Name",
"Value": "BedrockProxy/VPC/PublicSubnet1"
}
],
"VpcId": {
"Ref": "VPCB9E5F0B4"
}
},
"Metadata": {
"aws:cdk:path": "BedrockProxy/VPC/PublicSubnet1/RouteTable"
}
},
"VPCPublicSubnet1RouteTableAssociation0B0896DC": {
"Type": "AWS::EC2::SubnetRouteTableAssociation",
"Properties": {
"RouteTableId": {
"Ref": "VPCPublicSubnet1RouteTableFEE4B781"
},
"SubnetId": {
"Ref": "VPCPublicSubnet1SubnetB4246D30"
}
},
"Metadata": {
"aws:cdk:path": "BedrockProxy/VPC/PublicSubnet1/RouteTableAssociation"
}
},
"VPCPublicSubnet1DefaultRoute91CEF279": {
"Type": "AWS::EC2::Route",
"Properties": {
"DestinationCidrBlock": "0.0.0.0/0",
"GatewayId": {
"Ref": "VPCIGWB7E252D3"
},
"RouteTableId": {
"Ref": "VPCPublicSubnet1RouteTableFEE4B781"
}
},
"DependsOn": [
"VPCVPCGW99B986DC"
],
"Metadata": {
"aws:cdk:path": "BedrockProxy/VPC/PublicSubnet1/DefaultRoute"
}
},
"VPCPublicSubnet2Subnet74179F39": {
"Type": "AWS::EC2::Subnet",
"Properties": {
"AvailabilityZone": {
"Fn::Select": [
1,
{
"Fn::GetAZs": ""
}
]
},
"CidrBlock": "10.250.1.0/24",
"MapPublicIpOnLaunch": true,
"Tags": [
{
"Key": "aws-cdk:subnet-name",
"Value": "Public"
},
{
"Key": "aws-cdk:subnet-type",
"Value": "Public"
},
{
"Key": "Name",
"Value": "BedrockProxy/VPC/PublicSubnet2"
}
],
"VpcId": {
"Ref": "VPCB9E5F0B4"
}
},
"Metadata": {
"aws:cdk:path": "BedrockProxy/VPC/PublicSubnet2/Subnet"
}
},
"VPCPublicSubnet2RouteTable6F1A15F1": {
"Type": "AWS::EC2::RouteTable",
"Properties": {
"Tags": [
{
"Key": "Name",
"Value": "BedrockProxy/VPC/PublicSubnet2"
}
],
"VpcId": {
"Ref": "VPCB9E5F0B4"
}
},
"Metadata": {
"aws:cdk:path": "BedrockProxy/VPC/PublicSubnet2/RouteTable"
}
},
"VPCPublicSubnet2RouteTableAssociation5A808732": {
"Type": "AWS::EC2::SubnetRouteTableAssociation",
"Properties": {
"RouteTableId": {
"Ref": "VPCPublicSubnet2RouteTable6F1A15F1"
},
"SubnetId": {
"Ref": "VPCPublicSubnet2Subnet74179F39"
}
},
"Metadata": {
"aws:cdk:path": "BedrockProxy/VPC/PublicSubnet2/RouteTableAssociation"
}
},
"VPCPublicSubnet2DefaultRouteB7481BBA": {
"Type": "AWS::EC2::Route",
"Properties": {
"DestinationCidrBlock": "0.0.0.0/0",
"GatewayId": {
"Ref": "VPCIGWB7E252D3"
},
"RouteTableId": {
"Ref": "VPCPublicSubnet2RouteTable6F1A15F1"
}
},
"DependsOn": [
"VPCVPCGW99B986DC"
],
"Metadata": {
"aws:cdk:path": "BedrockProxy/VPC/PublicSubnet2/DefaultRoute"
}
},
"VPCIGWB7E252D3": {
"Type": "AWS::EC2::InternetGateway",
"Properties": {
"Tags": [
{
"Key": "Name",
"Value": "BedrockProxy/VPC"
}
]
},
"Metadata": {
"aws:cdk:path": "BedrockProxy/VPC/IGW"
}
},
"VPCVPCGW99B986DC": {
"Type": "AWS::EC2::VPCGatewayAttachment",
"Properties": {
"InternetGatewayId": {
"Ref": "VPCIGWB7E252D3"
},
"VpcId": {
"Ref": "VPCB9E5F0B4"
}
},
"Metadata": {
"aws:cdk:path": "BedrockProxy/VPC/VPCGW"
}
},
"ProxyApiHandlerServiceRoleBE71BFB1": {
"Type": "AWS::IAM::Role",
"Properties": {
"AssumeRolePolicyDocument": {
"Statement": [
{
"Action": "sts:AssumeRole",
"Effect": "Allow",
"Principal": {
"Service": "lambda.amazonaws.com"
}
}
],
"Version": "2012-10-17"
},
"ManagedPolicyArns": [
{
"Fn::Join": [
"",
[
"arn:",
{
"Ref": "AWS::Partition"
},
":iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
]
]
}
]
},
"Metadata": {
"aws:cdk:path": "BedrockProxy/Proxy/ApiHandler/ServiceRole/Resource"
}
},
"ProxyApiHandlerServiceRoleDefaultPolicy86681202": {
"Type": "AWS::IAM::Policy",
"Properties": {
"PolicyDocument": {
"Statement": [
{
"Action": [
"bedrock:InvokeModel",
"bedrock:InvokeModelWithResponseStream"
],
"Effect": "Allow",
"Resource": "arn:aws:bedrock:*::foundation-model/*"
},
{
"Action": [
"ssm:DescribeParameters",
"ssm:GetParameters",
"ssm:GetParameter",
"ssm:GetParameterHistory"
],
"Effect": "Allow",
"Resource": {
"Fn::Join": [
"",
[
"arn:",
{
"Ref": "AWS::Partition"
},
":ssm:",
{
"Ref": "AWS::Region"
},
":",
{
"Ref": "AWS::AccountId"
},
":parameter/",
{
"Ref": "ApiKeyParam"
}
]
]
}
}
],
"Version": "2012-10-17"
},
"PolicyName": "ProxyApiHandlerServiceRoleDefaultPolicy86681202",
"Roles": [
{
"Ref": "ProxyApiHandlerServiceRoleBE71BFB1"
}
]
},
"Metadata": {
"aws:cdk:path": "BedrockProxy/Proxy/ApiHandler/ServiceRole/DefaultPolicy/Resource"
}
},
"ProxyApiHandlerEC15A492": {
"Type": "AWS::Lambda::Function",
"Properties": {
"Architectures": [
"arm64"
],
"Code": {
"ImageUri": {
"Fn::Join": [
"",
[
"366590864501.dkr.ecr.",
{
"Ref": "AWS::Region"
},
".",
{
"Ref": "AWS::URLSuffix"
},
"/bedrock-proxy-api:latest"
]
]
}
},
"Description": "Bedrock Proxy API Handler",
"Environment": {
"Variables": {
"API_KEY_PARAM_NAME": {
"Ref": "ApiKeyParam"
},
"DEBUG": "false",
"DEFAULT_MODEL": {
"Fn::FindInMap": [
"ProxyRegionTable03E5BEB3",
{
"Ref": "AWS::Region"
},
"model",
{
"DefaultValue": "anthropic.claude-3-sonnet-20240229-v1:0"
}
]
},
"DEFAULT_EMBEDDING_MODEL": "cohere.embed-multilingual-v3"
}
},
"MemorySize": 1024,
"PackageType": "Image",
"Role": {
"Fn::GetAtt": [
"ProxyApiHandlerServiceRoleBE71BFB1",
"Arn"
]
},
"Timeout": 300
},
"DependsOn": [
"ProxyApiHandlerServiceRoleDefaultPolicy86681202",
"ProxyApiHandlerServiceRoleBE71BFB1"
],
"Metadata": {
"aws:cdk:path": "BedrockProxy/Proxy/ApiHandler/Resource"
}
},
"ProxyApiHandlerInvoke2UTWxhlfyqbT5FTn5jvgbLgjFfJwzswGk55DU1HYF6C33779": {
"Type": "AWS::Lambda::Permission",
"Properties": {
"Action": "lambda:InvokeFunction",
"FunctionName": {
"Fn::GetAtt": [
"ProxyApiHandlerEC15A492",
"Arn"
]
},
"Principal": "elasticloadbalancing.amazonaws.com"
},
"Metadata": {
"aws:cdk:path": "BedrockProxy/Proxy/ApiHandler/Invoke2UTWxhlfyqbT5FTn--5jvgbLgj+FfJwzswGk55DU1H--Y="
}
},
"ProxyALB87756780": {
"Type": "AWS::ElasticLoadBalancingV2::LoadBalancer",
"Properties": {
"LoadBalancerAttributes": [
{
"Key": "deletion_protection.enabled",
"Value": "false"
}
],
"Scheme": "internet-facing",
"SecurityGroups": [
{
"Fn::GetAtt": [
"ProxyALBSecurityGroup0D6CA3DA",
"GroupId"
]
}
],
"Subnets": [
{
"Ref": "VPCPublicSubnet1SubnetB4246D30"
},
{
"Ref": "VPCPublicSubnet2Subnet74179F39"
}
],
"Type": "application"
},
"DependsOn": [
"VPCPublicSubnet1DefaultRoute91CEF279",
"VPCPublicSubnet1RouteTableAssociation0B0896DC",
"VPCPublicSubnet2DefaultRouteB7481BBA",
"VPCPublicSubnet2RouteTableAssociation5A808732"
],
"Metadata": {
"aws:cdk:path": "BedrockProxy/Proxy/ALB/Resource"
}
},
"ProxyALBSecurityGroup0D6CA3DA": {
"Type": "AWS::EC2::SecurityGroup",
"Properties": {
"GroupDescription": "Automatically created Security Group for ELB BedrockProxyALB1CE4CAD1",
"SecurityGroupEgress": [
{
"CidrIp": "255.255.255.255/32",
"Description": "Disallow all traffic",
"FromPort": 252,
"IpProtocol": "icmp",
"ToPort": 86
}
],
"SecurityGroupIngress": [
{
"CidrIp": "0.0.0.0/0",
"Description": "Allow from anyone on port 80",
"FromPort": 80,
"IpProtocol": "tcp",
"ToPort": 80
}
],
"VpcId": {
"Ref": "VPCB9E5F0B4"
}
},
"Metadata": {
"aws:cdk:path": "BedrockProxy/Proxy/ALB/SecurityGroup/Resource"
}
},
"ProxyALBListener933E9515": {
"Type": "AWS::ElasticLoadBalancingV2::Listener",
"Properties": {
"DefaultActions": [
{
"TargetGroupArn": {
"Ref": "ProxyALBListenerTargetsGroup187739FA"
},
"Type": "forward"
}
],
"LoadBalancerArn": {
"Ref": "ProxyALB87756780"
},
"Port": 80,
"Protocol": "HTTP"
},
"Metadata": {
"aws:cdk:path": "BedrockProxy/Proxy/ALB/Listener/Resource"
}
},
"ProxyALBListenerTargetsGroup187739FA": {
"Type": "AWS::ElasticLoadBalancingV2::TargetGroup",
"Properties": {
"HealthCheckEnabled": false,
"TargetType": "lambda",
"Targets": [
{
"Id": {
"Fn::GetAtt": [
"ProxyApiHandlerEC15A492",
"Arn"
]
}
}
]
},
"DependsOn": [
"ProxyApiHandlerInvoke2UTWxhlfyqbT5FTn5jvgbLgjFfJwzswGk55DU1HYF6C33779"
],
"Metadata": {
"aws:cdk:path": "BedrockProxy/Proxy/ALB/Listener/TargetsGroup/Resource"
}
},
"CDKMetadata": {
"Type": "AWS::CDK::Metadata",
"Properties": {
"Analytics": "v2:deflate64:H4sIAAAAAAAA/1VRXW/CMAz8LbyHDMovAKZNSJtWFcTr5LpeZ0iTKHFAqOp/n1q+uief7y7ynZLp+WKhZxM4xylWx6nhUrdbATyq9Y/NIUBDQkHBOX63hJlu9x57aZ+vVZ5Kw7hNpSXpuScqXBLaQWnoyT+5ZYwOGYSdfZh7sLFCwZK8g9AZLrczt20pAvjbkBW1JUyB5fIeXPLDgTHRKcKgC/IusrhwWUEkZaApK9Dtq8MjhU0DNb0li/cIY5xTaDhGdrZTDI1uC3etMczcGcYh2hV1igxEYTQOqhIMWGRbnzLdLr03jEPLDwfVatAo9E//7WMfRyF789zxSN9BqEketUdr16mCoksBh6if4D3buodfSXy6fsrIsHa2Yhk6WleRPsSXUzbT87meTQ6ReRqSFW5IF9f5B/Z2H8goAgAA"
},
"Metadata": {
"aws:cdk:path": "BedrockProxy/CDKMetadata/Default"
},
"Condition": "CDKMetadataAvailable"
}
},
"Mappings": {
"ProxyRegionTable03E5BEB3": {
"us-east-1": {
"model": "anthropic.claude-3-sonnet-20240229-v1:0"
},
"ap-southeast-1": {
"model": "anthropic.claude-v2"
},
"ap-northeast-1": {
"model": "anthropic.claude-v2:1"
},
"eu-central-1": {
"model": "anthropic.claude-v2:1"
}
}
},
"Outputs": {
"APIBaseUrl": {
"Description": "Proxy API Base URL (OPENAI_API_BASE)",
"Value": {
"Fn::Join": [
"",
[
"http://",
{
"Fn::GetAtt": [
"ProxyALB87756780",
"DNSName"
]
},
"/api/v1"
]
]
}
}
},
"Conditions": {
"CDKMetadataAvailable": {
"Fn::Or": [
{
"Fn::Or": [
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"af-south-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"ap-east-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"ap-northeast-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"ap-northeast-2"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"ap-south-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"ap-southeast-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"ap-southeast-2"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"ca-central-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"cn-north-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"cn-northwest-1"
]
}
]
},
{
"Fn::Or": [
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"eu-central-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"eu-north-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"eu-south-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"eu-west-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"eu-west-2"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"eu-west-3"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"il-central-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"me-central-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"me-south-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"sa-east-1"
]
}
]
},
{
"Fn::Or": [
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"us-east-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"us-east-2"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"us-west-1"
]
},
{
"Fn::Equals": [
{
"Ref": "AWS::Region"
},
"us-west-2"
]
}
]
}
]
}
}
}
Description: Bedrock Access Gateway - OpenAI-compatible RESTful APIs for Amazon Bedrock (API Gateway + Lambda with Streaming)
Parameters:
ApiKeySecretArn:
Type: String
AllowedPattern: ^arn:aws:secretsmanager:.*$
Description: The secret ARN in Secrets Manager used to store the API Key
ContainerImageUri:
Type: String
Description: The ECR image URI for the Lambda function (e.g., 123456789012.dkr.ecr.us-east-1.amazonaws.com/bedrock-proxy-api:latest)
DefaultModelId:
Type: String
Default: anthropic.claude-3-sonnet-20240229-v1:0
Description: The default model ID, please make sure the model ID is supported in the current region
EnablePromptCaching:
Type: String
Default: "false"
AllowedValues:
- "true"
- "false"
Description: Enable prompt caching for supported models (Claude, Nova). When enabled, adds cachePoint to system prompts and messages for cost savings.
Resources:
# IAM Role for Lambda
ProxyApiHandlerServiceRole:
Type: AWS::IAM::Role
Properties:
AssumeRolePolicyDocument:
Statement:
- Action: sts:AssumeRole
Effect: Allow
Principal:
Service: lambda.amazonaws.com
Version: "2012-10-17"
ManagedPolicyArns:
- !Sub "arn:${AWS::Partition}:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
ProxyApiHandlerServiceRoleDefaultPolicy:
Type: AWS::IAM::Policy
Properties:
PolicyDocument:
Statement:
- Action:
- bedrock:ListFoundationModels
- bedrock:ListInferenceProfiles
Effect: Allow
Resource: "*"
- Action:
- bedrock:InvokeModel
- bedrock:InvokeModelWithResponseStream
Effect: Allow
Resource:
- arn:aws:bedrock:*::foundation-model/*
- arn:aws:bedrock:*:*:inference-profile/*
- arn:aws:bedrock:*:*:application-inference-profile/*
- Action:
- secretsmanager:GetSecretValue
- secretsmanager:DescribeSecret
Effect: Allow
Resource: !Ref ApiKeySecretArn
Version: "2012-10-17"
PolicyName: ProxyApiHandlerServiceRoleDefaultPolicy
Roles:
- !Ref ProxyApiHandlerServiceRole
# Lambda Function with Lambda Web Adapter for streaming
ProxyApiHandler:
Type: AWS::Lambda::Function
Properties:
Architectures:
- arm64
Code:
ImageUri: !Ref ContainerImageUri
Description: Bedrock Proxy API Handler with Response Streaming
Environment:
Variables:
# Lambda Web Adapter settings
AWS_LWA_INVOKE_MODE: RESPONSE_STREAM
AWS_LWA_READINESS_CHECK_PATH: /health
AWS_LWA_ASYNC_INIT: "true"
PORT: "8080"
# Application settings
DEBUG: "false"
API_KEY_SECRET_ARN: !Ref ApiKeySecretArn
DEFAULT_MODEL: !Ref DefaultModelId
DEFAULT_EMBEDDING_MODEL: cohere.embed-multilingual-v3
ENABLE_CROSS_REGION_INFERENCE: "true"
ENABLE_APPLICATION_INFERENCE_PROFILES: "true"
ENABLE_PROMPT_CACHING: !Ref EnablePromptCaching
API_ROUTE_PREFIX: /v1
MemorySize: 1024
PackageType: Image
Role: !GetAtt ProxyApiHandlerServiceRole.Arn
Timeout: 600
DependsOn:
- ProxyApiHandlerServiceRoleDefaultPolicy
- ProxyApiHandlerServiceRole
# API Gateway REST API (Regional)
RestApi:
Type: AWS::ApiGateway::RestApi
Properties:
Name: BedrockProxyApi
Description: Bedrock Access Gateway - OpenAI-compatible API with streaming support
EndpointConfiguration:
Types:
- REGIONAL
Body:
openapi: "3.0.1"
info:
title: BedrockProxyApi
version: "1.0"
paths:
/{proxy+}:
x-amazon-apigateway-any-method:
parameters:
- name: proxy
in: path
required: true
schema:
type: string
x-amazon-apigateway-integration:
type: aws_proxy
httpMethod: POST
uri: !Sub "arn:aws:apigateway:${AWS::Region}:lambda:path/2021-11-15/functions/${ProxyApiHandler.Arn}/response-streaming-invocations"
passthroughBehavior: when_no_match
timeoutInMillis: 600000
responseTransferMode: STREAM
responses:
default:
description: Default response
/:
x-amazon-apigateway-any-method:
x-amazon-apigateway-integration:
type: aws_proxy
httpMethod: POST
uri: !Sub "arn:aws:apigateway:${AWS::Region}:lambda:path/2021-11-15/functions/${ProxyApiHandler.Arn}/response-streaming-invocations"
passthroughBehavior: when_no_match
timeoutInMillis: 600000
responseTransferMode: STREAM
responses:
default:
description: Default response
# Lambda Permission for API Gateway
LambdaPermission:
Type: AWS::Lambda::Permission
Properties:
FunctionName: !Ref ProxyApiHandler
Action: lambda:InvokeFunction
Principal: apigateway.amazonaws.com
SourceArn: !Sub "arn:aws:execute-api:${AWS::Region}:${AWS::AccountId}:${RestApi}/*"
# API Gateway Deployment
ApiDeployment:
Type: AWS::ApiGateway::Deployment
Properties:
RestApiId: !Ref RestApi
DependsOn:
- RestApi
# API Gateway Stage
ApiStage:
Type: AWS::ApiGateway::Stage
Properties:
RestApiId: !Ref RestApi
DeploymentId: !Ref ApiDeployment
StageName: api
Description: API Stage with streaming support
Outputs:
APIBaseUrl:
Description: Proxy API Base URL (OPENAI_API_BASE)
Value: !Sub "https://${RestApi}.execute-api.${AWS::Region}.amazonaws.com/api/v1"
RestApiId:
Description: API Gateway REST API ID
Value: !Ref RestApi
LambdaFunctionArn:
Description: Lambda Function ARN
Value: !GetAtt ProxyApiHandler.Arn

File diff suppressed because it is too large
docker-compose.yml Normal file

@@ -0,0 +1,18 @@
version: '3.8'
services:
bedrock-access-gateway:
build:
context: ./src
dockerfile: Dockerfile_ecs
ports:
- "127.0.0.1:8000:8080"
environment:
- ENABLE_PROMPT_CACHING=true
- API_KEY=${OPENAI_API_KEY}
- AWS_PROFILE
- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY
- AWS_SESSION_TOKEN
volumes:
- ${HOME}/.aws:/home/appuser/.aws

docs/Security.md Normal file

@@ -0,0 +1,78 @@
# Security
This document details the security configuration required for the solution. In particular, it covers:
- **HTTPS Setup**
Following these guidelines will help ensure that traffic is encrypted over the public network.
---
## 1. HTTPS Authentication with the ALB
### Overview
Using HTTPS on your ALB guarantees that all client-to-ALB communication is encrypted. This is achieved by:
- **Obtaining and managing SSL/TLS certificates** using AWS Certificate Manager (ACM). You'll need a domain but you can request a free certificate.
- **Configuring HTTPS listeners** on the ALB
- **Automating HTTP to HTTPS redirect** for clients that inadvertently access HTTP endpoints
- **Allowing traffic in the Security Group of the ALB**
### Step-by-Step Setup
#### 1.1. Request an SSL/TLS Certificate via ACM
1. **Navigate to AWS Certificate Manager (ACM):**
In the AWS Management Console, go to ACM in the region where your ALB is deployed.
2. **Request the Certificate:**
- Click on **"Request a certificate"**.
- Choose **"Request a public certificate"** (or a private one if using a private CA).
- Enter your domain names (e.g., `example.com`, `*.example.com`).
- Complete the validation (via DNS or email). DNS validation is generally preferred for automation purposes.
3. **Certificate Validation:**
Ensure that the certificate status becomes **"Issued"** before proceeding.
#### 1.2. Configure the ALB for HTTPS
1. **Create or Modify the ALB Listener:**
- Open the **EC2 Dashboard** and navigate to [Load Balancers](https://console.aws.amazon.com/ec2/home?#LoadBalancers:).
- If you already have an ALB, select it; otherwise, create a new ALB.
- Under the **Listeners** tab, click **Manage listener** > **Edit Listener**.
- Configure the listener protocol to **HTTPS** with port **443**.
- Select the certificate you requested from ACM.
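For scripted setups, these console steps correspond to the Elastic Load Balancing `CreateListener` API. A minimal boto3-oriented sketch that only assembles the request parameters (the ARNs are placeholders you must replace, and the target group is assumed to already exist):

```python
# Assemble the parameters for the elbv2 CreateListener call that adds an
# HTTPS :443 listener with an ACM certificate. When ready, pass the result
# to boto3.client("elbv2").create_listener(**params).
def https_listener_params(alb_arn: str, cert_arn: str, target_group_arn: str) -> dict:
    return {
        "LoadBalancerArn": alb_arn,  # your ALB's ARN
        "Protocol": "HTTPS",
        "Port": 443,
        "Certificates": [{"CertificateArn": cert_arn}],  # the issued ACM certificate
        "DefaultActions": [{"Type": "forward", "TargetGroupArn": target_group_arn}],
    }
```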
#### 1.3. (Optional) Redirect HTTP Traffic to HTTPS
To enhance security, ensure that any HTTP requests are automatically redirected to HTTPS.
1. **Create an HTTP Listener on Port 80:**
- Add a listener on port **80**.
- In the listener settings, add a rule to redirect all traffic to port **443** with the protocol changed to **HTTPS**.
**Example AWS CLI command for redirection:**
```bash
aws elbv2 create-listener \
--load-balancer-arn <your-alb-arn> \
--protocol HTTP \
--port 80 \
--default-actions Type=redirect,RedirectConfig="Protocol=https,Port=443,StatusCode=HTTP_301"
```
#### 1.4. Allow traffic in the Security Group of the ALB
1. **Update the ALB Security Group:**
- Go to the CloudFormation stack you originally used to deploy, select **Resources** and search for **ProxyALBSecurityGroup**
- Click on the Security Group
- Edit the Inbound Rules to allow traffic on Port 443 from `0.0.0.0/0` and (optionally) delete the Inbound Rule on Port 80. **Note**: If you delete the rule on port 80, you will need to update the base url to use HTTPS only as it won't redirect HTTP traffic to HTTPS.
Now you should be able to test your application. Use a base URL like:
```
https://<your-domain>/api/v1
```
---
By following the steps outlined in this guide, you can configure a secure environment that uses HTTPS via ALB for encrypted traffic.

docs/Troubleshooting.md Normal file

@@ -0,0 +1,97 @@
# Troubleshooting Guide
This guide helps you troubleshoot common issues you might encounter when using the Bedrock Access Gateway.
## Common Issues
### 1. Parameter Store Access Error
To see errors, first access the CloudWatch Logs for your Lambda function or Fargate task.
1. Go to the [CloudWatch Console](https://console.aws.amazon.com/cloudwatch/home?#logsV2:log-groups/)
2. Search for `/aws/lambda/BedrockProxyAPI`
3. Click on the `Log Stream` to see the error details
```python
botocore.exceptions.ClientError: An error occurred (ParameterNotFound) when calling the GetParameter operation: Parameter /BedrockProxyAPIKey not found.
```
This error occurs when the Lambda function cannot access the API key parameter in Parameter Store.
**Possible solutions:**
- Verify that you created the parameter in Parameter Store with the correct name
- Check that the parameter name in the CloudFormation stack matches the one in Parameter Store
- Ensure the Lambda function's IAM role has permission to access Parameter Store
- If you didn't set up an API key, leave the `ApiKeyParam` field blank during deployment
### 2. Model Access Issues
If you receive an error about model access:
```
{"error": {"message": "User: arn:aws:iam::XXXX:role/XXX is not authorized to perform: bedrock:InvokeModel on resource: arn:aws:bedrock:REGION::foundation-model/XXX", "type": "auth_error", "code": 401}}
```
**Possible solutions:**
- Ensure you have requested access to the model in Amazon Bedrock
- Verify the Lambda/Fargate role has the necessary permissions to invoke Bedrock models
- Check that you're using the correct model ID
- Verify the model is available in your chosen region
### 3. API Key Authentication Failures
If you receive a 401 Unauthorized error:
```
{"detail": "Could not validate credentials"}
```
**Possible solutions:**
- Verify you're using the correct API key in your requests
- Check that the `Authorization` header is properly formatted (`Bearer YOUR-API-KEY`)
- If using environment variables, ensure `OPENAI_API_KEY` is set correctly
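As a quick sanity check, the header the gateway expects looks like this (a Python sketch; any HTTP client works):

```python
import os

# The gateway validates a standard OpenAI-style Bearer token.
api_key = os.environ.get("OPENAI_API_KEY", "<your-api-key>")
headers = {
    "Content-Type": "application/json",
    # Note the literal "Bearer" prefix followed by a single space.
    "Authorization": f"Bearer {api_key}",
}
```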
### 4. Cross-Region Access Issues
If you're trying to access models in a different region:
```
{"error": {"message": "Region 'us-east-1' is not enabled for your account", "type": "invalid_request_error", "code": 400}}
```
**Possible solutions:**
- Ensure the target region is enabled for your AWS account
- Verify the model you're trying to access is available in that region
- Check that your IAM roles have the necessary cross-region permissions
### 5. Rate Limiting and Quotas
If you're experiencing throttling or quota issues:
```
{"error": {"message": "Rate limit exceeded", "type": "rate_limit_error", "code": 429}}
```
**Possible solutions:**
- Check your Bedrock service quotas in the AWS Console
- Consider implementing retry logic in your application
- Request a quota increase if needed
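A minimal retry-with-exponential-backoff sketch (illustrative only; tune the delays, attempt count, and the exceptions you catch to your client library):

```python
import random
import time

def with_retries(fn, max_attempts=5, base_delay=1.0):
    """Call fn(), retrying with exponential backoff plus jitter on failure."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            # Backoff: base_delay * 2^attempt, with up to 100% random jitter
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```

For example, wrap a chat completion call as `with_retries(lambda: client.chat.completions.create(...))`.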
## Getting Help
If you're still experiencing issues:
1. Check the CloudWatch Logs for detailed error messages
2. Verify your AWS credentials and permissions
3. Review the [Usage Guide](./Usage.md) for correct API usage
4. Open a [GitHub issue](https://github.com/aws-samples/bedrock-access-gateway/issues/new?template=bug_report.md) with:
- Detailed error message
- Steps to reproduce
- Your deployment configuration (region, model, etc.)
- Any relevant CloudWatch logs
## Additional Resources
- [Amazon Bedrock Documentation](https://docs.aws.amazon.com/bedrock/)
- [AWS IAM Documentation](https://docs.aws.amazon.com/IAM/latest/UserGuide/)
- [AWS Systems Manager Parameter Store](https://docs.aws.amazon.com/systems-manager/latest/userguide/systems-manager-parameter-store.html)


@@ -9,6 +9,85 @@ export OPENAI_API_KEY=<API key>
export OPENAI_BASE_URL=<API base url>
```
**API Example:**
- [Models API](#models-api)
- [Embedding API](#embedding-api)
- [Multimodal API](#multimodal-api)
- [Tool Call](#tool-call)
- [Reasoning](#reasoning)
- [Interleaved thinking (beta)](#interleaved-thinking-beta)
## Models API
You can use this API to get a list of supported model IDs.
Also, you can use this API to refresh the model list if new models are added to Amazon Bedrock.
**Example Request**
```bash
curl -s $OPENAI_BASE_URL/models -H "Authorization: Bearer $OPENAI_API_KEY" | jq .data
```
**Example Response**
```json
[
...
{
"id": "anthropic.claude-3-5-sonnet-20240620-v1:0",
"created": 1734416893,
"object": "model",
"owned_by": "bedrock"
},
{
"id": "us.anthropic.claude-3-5-sonnet-20240620-v1:0",
"created": 1734416893,
"object": "model",
"owned_by": "bedrock"
},
...
]
```
## Chat Completions API
### Basic Example with Claude Sonnet 4.5
Claude Sonnet 4.5 is Anthropic's most intelligent model, excelling at coding, complex reasoning, and agent-based tasks. It's available via global cross-region inference profiles.
**Example Request**
```bash
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
"messages": [
{
"role": "user",
"content": "Write a Python function to calculate the Fibonacci sequence using dynamic programming."
}
]
}'
```
**Example SDK Usage**
```python
from openai import OpenAI
client = OpenAI()
completion = client.chat.completions.create(
model="global.anthropic.claude-sonnet-4-5-20250929-v1:0",
messages=[{"role": "user", "content": "Write a Python function to calculate the Fibonacci sequence using dynamic programming."}],
)
print(completion.choices[0].message.content)
```
## Embedding API
**Important Notice**: Please carefully review the following points before using this proxy API for embedding.
@@ -91,13 +170,10 @@ print(doc_result[0][:5])
## Multimodal API
**Important Notice**: Please carefully review the following points before using this proxy API for Multimodal.
1. This API is only supported by Claude 3 models.
**Example Request**
```bash
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
@@ -185,7 +261,6 @@ curl $OPENAI_BASE_URL/chat/completions \
**Important Notice**: Please carefully review the following points before using this Tool Call for Chat completion API.
1. Function Call has been deprecated by OpenAI in favor of Tool Call, so it is not supported here; use Tool Call instead.
2. This API is only supported by Claude 3 models.
**Example Request**
@@ -283,3 +358,218 @@ curl $OPENAI_BASE_URL/chat/completions \
You can try it with different questions, such as:
1. Hello, who are you? (No tools are needed)
2. What is the weather like today? (Should use get_current_location tool first)
## Reasoning
**Important Notice**: Please carefully review the following points before using reasoning mode for Chat completion API.
- Only Claude 3.7 Sonnet (extended thinking) and DeepSeek R1 support reasoning so far. Please make sure the model supports reasoning before use.
- For Claude 3.7 Sonnet, reasoning (thinking) mode is not enabled by default; you must pass an additional `reasoning_effort` parameter (`low`, `medium`, or `high`) in your request, and provide a suitable max_tokens (or max_completion_tokens). The budget_tokens is derived from reasoning_effort (low: 30%, medium: 60%, high: 100% of max tokens), with a minimum budget_tokens of 1,024; Anthropic recommends at least 4,000 tokens for comprehensive reasoning. Check the [Bedrock Document](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-anthropic-claude-37.html) for more details.
- For DeepSeek R1, reasoning is always on; do not pass the `reasoning_effort` parameter, otherwise you may get an error.
- The reasoning response (CoT, thoughts) is returned in an additional `reasoning_content` field, which is not officially supported by OpenAI. This follows the [DeepSeek Reasoning Model](https://api-docs.deepseek.com/guides/reasoning_model#api-example) API and may change in the future.
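The budget calculation described above can be sketched as follows (`estimate_budget_tokens` is an illustrative helper, not part of the gateway's API):

```python
# Approximate how the gateway maps reasoning_effort to a thinking budget:
# low -> 30%, medium -> 60%, high -> 100% of max tokens, floored at 1,024.
EFFORT_RATIOS = {"low": 0.3, "medium": 0.6, "high": 1.0}

def estimate_budget_tokens(reasoning_effort: str, max_tokens: int) -> int:
    ratio = EFFORT_RATIOS[reasoning_effort]
    return max(1024, int(max_tokens * ratio))
```

For example, with `max_completion_tokens=4096` and `reasoning_effort="low"`, the budget works out to roughly 1,228 tokens.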
**Example Request**
- Claude 3.7 Sonnet
```bash
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
"messages": [
"role": "user",
"content": "which one is bigger, 3.9 or 3.11?"
}
],
"max_completion_tokens": 4096,
"reasoning_effort": "low",
"stream": false
}'
```
- DeepSeek R1
```bash
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "us.deepseek.r1-v1:0",
"messages": [
{
"role": "user",
"content": "which one is bigger, 3.9 or 3.11?"
}
],
"stream": false
}'
```
**Example Response**
```json
{
"id": "chatcmpl-83fb7a88",
"created": 1740545278,
"model": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
"system_fingerprint": "fp",
"choices": [
{
"index": 0,
"finish_reason": "stop",
"logprobs": null,
"message": {
"role": "assistant",
"content": "3.9 is bigger than 3.11.\n\nWhen comparing decimal numbers, we need to understand what these numbers actually represent:...",
"reasoning_content": "I need to compare the decimal numbers 3.9 and 3.11.\n\nFor decimal numbers, we first compare the whole number parts, and if they're equal, we compare the decimal parts. \n\nBoth numbers ..."
}
}
],
"object": "chat.completion",
"usage": {
"prompt_tokens": 51,
"completion_tokens": 565,
"total_tokens": 616
}
}
```
You can also use the OpenAI SDK (run `pip3 install -U openai` first).
- Non-Streaming
```python
from openai import OpenAI
client = OpenAI()
messages = [{"role": "user", "content": "which one is bigger, 3.9 or 3.11?"}]
response = client.chat.completions.create(
model="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
messages=messages,
reasoning_effort="low",
max_completion_tokens=4096,
)
reasoning_content = response.choices[0].message.reasoning_content
content = response.choices[0].message.content
```
- Streaming
```python
from openai import OpenAI
client = OpenAI()
messages = [{"role": "user", "content": "9.11 and 9.8, which is greater?"}]
response = client.chat.completions.create(
model="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
messages=messages,
reasoning_effort="low",
max_completion_tokens=4096,
stream=True,
)
reasoning_content = ""
content = ""
for chunk in response:
if hasattr(chunk.choices[0].delta, 'reasoning_content') and chunk.choices[0].delta.reasoning_content:
reasoning_content += chunk.choices[0].delta.reasoning_content
elif chunk.choices[0].delta.content:
content += chunk.choices[0].delta.content
```
## Interleaved thinking (beta)
**Important Notice**: Please carefully review the following points before using interleaved thinking with the Chat Completions API.
Extended thinking with tool use in Claude 4 models supports [interleaved thinking](https://docs.aws.amazon.com/bedrock/latest/userguide/claude-messages-extended-thinking.html#claude-messages-extended-thinking-tool-use-interleaved), which enables the model to think between tool calls and run more sophisticated reasoning after receiving tool results. This is helpful for more complex agentic interactions.
With interleaved thinking, `budget_tokens` can exceed the `max_tokens` parameter because it represents the total thinking budget across all thinking blocks within one assistant turn.
**Supported Models**: Claude Sonnet 4, Claude Sonnet 4.5
**Example Request**
- Non-Streaming (Claude Sonnet 4.5)
```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bedrock" \
-d '{
"model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
"max_tokens": 2048,
"messages": [{
"role": "user",
"content": "Explain how to implement a binary search tree with self-balancing capabilities."
}],
"extra_body": {
"anthropic_beta": ["interleaved-thinking-2025-05-14"],
"thinking": {"type": "enabled", "budget_tokens": 4096}
}
}'
```
- Non-Streaming (Claude Sonnet 4)
```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bedrock" \
-d '{
"model": "us.anthropic.claude-sonnet-4-20250514-v1:0",
"max_tokens": 2048,
"messages": [{
"role": "user",
"content": "有一天,一个女孩参加数学考试只得了 38 分。她心里对父亲的惩罚充满恐惧,于是偷偷把分数改成了 88 分。她的父亲看到试卷后,怒发冲冠,狠狠地给了她一巴掌,怒吼道:“你这 8 怎么一半是绿的一半是红的,你以为我是傻子吗?”女孩被打后,委屈地哭了起来,什么也没说。过了一会儿,父亲突然崩溃了。请问这位父亲为什么过一会崩溃了?"
}],
"extra_body": {
"anthropic_beta": ["interleaved-thinking-2025-05-14"],
"thinking": {"type": "enabled", "budget_tokens": 4096}
}
}'
```
- Streaming (Claude Sonnet 4.5)
```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bedrock" \
-d '{
"model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
"max_tokens": 2048,
"messages": [{
"role": "user",
"content": "Explain how to implement a binary search tree with self-balancing capabilities."
}],
"stream": true,
"extra_body": {
"anthropic_beta": ["interleaved-thinking-2025-05-14"],
"thinking": {"type": "enabled", "budget_tokens": 4096}
}
}'
```
- Streaming (Claude Sonnet 4)
```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bedrock" \
-d '{
"model": "us.anthropic.claude-sonnet-4-20250514-v1:0",
"max_tokens": 2048,
"messages": [{
"role": "user",
"content": "有一天,一个女孩参加数学考试只得了 38 分。她心里对父亲的惩罚充满恐惧,于是偷偷把分数改成了 88 分。她的父亲看到试卷后,怒发冲冠,狠狠地给了她一巴掌,怒吼道:“你这 8 怎么一半是绿的一半是红的,你以为我是傻子吗?”女孩被打后,委屈地哭了起来,什么也没说。过了一会儿,父亲突然崩溃了。请问这位父亲为什么过一会崩溃了?"
}],
"stream": true,
"extra_body": {
"anthropic_beta": ["interleaved-thinking-2025-05-14"],
"thinking": {"type": "enabled", "budget_tokens": 4096}
}
}'
```
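The same request body can also be assembled programmatically, e.g. before sending it through the OpenAI SDK's `extra_body` passthrough (a sketch; `build_interleaved_payload` is a hypothetical helper mirroring the curl examples above):

```python
def build_interleaved_payload(model: str, prompt: str, budget_tokens: int = 4096, stream: bool = False) -> dict:
    # Mirrors the curl examples: enable the interleaved-thinking beta and
    # an extended-thinking budget, which the gateway forwards to Bedrock.
    return {
        "model": model,
        "max_tokens": 2048,
        "stream": stream,
        "messages": [{"role": "user", "content": prompt}],
        "extra_body": {
            "anthropic_beta": ["interleaved-thinking-2025-05-14"],
            "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        },
    }
```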


@@ -9,6 +9,83 @@ export OPENAI_API_KEY=<API key>
export OPENAI_BASE_URL=<API base url>
```
**API Examples:**
- [Models API](#models-api)
- [Embedding API](#embedding-api)
- [Multimodal API](#multimodal-api)
- [Tool Call](#tool-call)
- [Reasoning](#reasoning)
- [Interleaved thinking (beta)](#interleaved-thinking-beta)
## Models API
You can use this API to get a list of supported model IDs, and to refresh the list when new models are added to Amazon Bedrock.
**Example Request**
```bash
curl -s $OPENAI_BASE_URL/models -H "Authorization: Bearer $OPENAI_API_KEY" | jq .data
```
**Example Response**
```json
[
...
{
"id": "anthropic.claude-3-5-sonnet-20240620-v1:0",
"created": 1734416893,
"object": "model",
"owned_by": "bedrock"
},
{
"id": "us.anthropic.claude-3-5-sonnet-20240620-v1:0",
"created": 1734416893,
"object": "model",
"owned_by": "bedrock"
},
...
]
```
## Chat Completions API
### Basic Example with Claude Sonnet 4.5
Claude Sonnet 4.5 is Anthropic's most intelligent model, excelling at coding, complex reasoning, and agent-based tasks. It's available via global cross-region inference profiles.
**Example Request**
```bash
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
"messages": [
{
"role": "user",
"content": "编写一个使用动态规划计算斐波那契数列的Python函数。"
}
]
}'
```
**Example SDK Usage**
```python
from openai import OpenAI
client = OpenAI()
completion = client.chat.completions.create(
model="global.anthropic.claude-sonnet-4-5-20250929-v1:0",
messages=[{"role": "user", "content": "编写一个使用动态规划计算斐波那契数列的Python函数。"}],
)
print(completion.choices[0].message.content)
```
## Embedding API
**Important Notice**: Please carefully review the following points before using this proxy API for embedding:
@@ -90,10 +167,6 @@ print(doc_result[0][:5])
## Multimodal API
**Important Notice**: Please carefully review the following points before using this proxy API for multimodal:
1. This API is only supported by Claude 3 models.
**Example Request**
```bash
@@ -184,7 +257,6 @@ curl $OPENAI_BASE_URL/chat/completions \
**Important Notice**: Please carefully review the following points before using Tool Call with the Chat Completions API:
1. Function Call has been deprecated by OpenAI in favor of Tool Call, so it is not supported here; use Tool Call instead.
2. This API is only supported by Claude 3 models.
**Example Request**
@@ -282,3 +354,222 @@ curl $OPENAI_BASE_URL/chat/completions \
You can try it with different questions, such as:
1. Hello, who are you? (No tools are needed)
2. What is the weather like today? (Should use get_current_location tool first)
## Reasoning
**Important Notice**: Please carefully review the following points before using reasoning mode with the Chat Completions API.
- Only Claude 3.7 Sonnet (extended thinking) and DeepSeek R1 support reasoning so far. Please make sure the model supports reasoning before use.
- For Claude 3.7 Sonnet, reasoning (thinking) mode is not enabled by default; you must pass an additional `reasoning_effort` parameter (`low`, `medium`, or `high`) in your request, and provide a suitable max_tokens (or max_completion_tokens). The budget_tokens is derived from reasoning_effort (low: 30%, medium: 60%, high: 100% of max tokens), with a minimum budget_tokens of 1,024; Anthropic recommends at least 4,000 tokens for comprehensive reasoning. Check the [Bedrock Document](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-anthropic-claude-37.html) for more details.
- For DeepSeek R1, reasoning is always on; do not pass the `reasoning_effort` parameter, otherwise you may get an error.
- The reasoning response (CoT, thoughts) is returned in an additional `reasoning_content` field, which is not officially supported by OpenAI. This follows the [DeepSeek Reasoning Model](https://api-docs.deepseek.com/guides/reasoning_model#api-example) API and may change in the future.
**Example Request**
- Claude 3.7 Sonnet
```bash
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
"messages": [
{
"role": "user",
"content": "which one is bigger, 3.9 or 3.11?"
}
],
"max_completion_tokens": 4096,
"reasoning_effort": "low",
"stream": false
}'
```
- DeepSeek R1
```bash
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "us.deepseek.r1-v1:0",
"messages": [
{
"role": "user",
"content": "which one is bigger, 3.9 or 3.11?"
}
],
"stream": false
}'
```
**Example Response**
```json
{
"id": "chatcmpl-83fb7a88",
"created": 1740545278,
"model": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
"system_fingerprint": "fp",
"choices": [
{
"index": 0,
"finish_reason": "stop",
"logprobs": null,
"message": {
"role": "assistant",
"content": "3.9 is bigger than 3.11.\n\nWhen comparing decimal numbers, we need to understand what these numbers actually represent:...",
"reasoning_content": "I need to compare the decimal numbers 3.9 and 3.11.\n\nFor decimal numbers, we first compare the whole number parts, and if they're equal, we compare the decimal parts. \n\nBoth numbers ..."
}
}
],
"object": "chat.completion",
"usage": {
"prompt_tokens": 51,
"completion_tokens": 565,
"total_tokens": 616
}
}
```
Or use the OpenAI SDK (run `pip3 install -U openai` first to upgrade to the latest version).
- Non-Streaming
```python
from openai import OpenAI
client = OpenAI()
messages = [{"role": "user", "content": "which one is bigger, 3.9 or 3.11?"}]
response = client.chat.completions.create(
model="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
messages=messages,
reasoning_effort="low",
max_completion_tokens=4096,
)
reasoning_content = response.choices[0].message.reasoning_content
content = response.choices[0].message.content
```
- Streaming
```python
from openai import OpenAI
client = OpenAI()
messages = [{"role": "user", "content": "9.11 and 9.8, which is greater?"}]
response = client.chat.completions.create(
model="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
messages=messages,
reasoning_effort="low",
max_completion_tokens=4096,
stream=True,
)
reasoning_content = ""
content = ""
for chunk in response:
if hasattr(chunk.choices[0].delta, 'reasoning_content') and chunk.choices[0].delta.reasoning_content:
reasoning_content += chunk.choices[0].delta.reasoning_content
elif chunk.choices[0].delta.content:
content += chunk.choices[0].delta.content
```
## Interleaved thinking (beta)
**Important Notice**: Please carefully review the following points before using interleaved thinking with the Chat Completions API.
Extended thinking with tool use in Claude 4 models supports [interleaved thinking](https://docs.aws.amazon.com/bedrock/latest/userguide/claude-messages-extended-thinking.html#claude-messages-extended-thinking-tool-use-interleaved), which enables the model to think between tool calls and run more sophisticated reasoning after receiving tool results. This is helpful for more complex agentic interactions.
With interleaved thinking, `budget_tokens` can exceed the `max_tokens` parameter because it represents the total thinking budget across all thinking blocks within one assistant turn.
**支持的模型**: Claude Sonnet 4, Claude Sonnet 4.5
**Example Request**
- Non-Streaming (Claude Sonnet 4.5)
```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bedrock" \
-d '{
"model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
"max_tokens": 2048,
"messages": [{
"role": "user",
"content": "解释如何实现一个具有自平衡功能的二叉搜索树。"
}],
"extra_body": {
"anthropic_beta": ["interleaved-thinking-2025-05-14"],
"thinking": {"type": "enabled", "budget_tokens": 4096}
}
}'
```
- Non-Streaming (Claude Sonnet 4)
```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bedrock" \
-d '{
"model": "us.anthropic.claude-sonnet-4-20250514-v1:0",
"max_tokens": 2048,
"messages": [{
"role": "user",
"content": "有一天,一个女孩参加数学考试只得了 38 分。她心里对父亲的惩罚充满恐惧,于是偷偷把分数改成了 88 分。她的父亲看到试卷后,怒发冲冠,狠狠地给了她一巴掌,怒吼道:“你这 8 怎么一半是绿的一半是红的,你以为我是傻子吗?”女孩被打后,委屈地哭了起来,什么也没说。过了一会儿,父亲突然崩溃了。请问这位父亲为什么过一会崩溃了?"
}],
"extra_body": {
"anthropic_beta": ["interleaved-thinking-2025-05-14"],
"thinking": {"type": "enabled", "budget_tokens": 4096}
}
}'
```
- Streaming (Claude Sonnet 4.5)
```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bedrock" \
-d '{
"model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
"max_tokens": 2048,
"messages": [{
"role": "user",
"content": "解释如何实现一个具有自平衡功能的二叉搜索树。"
}],
"stream": true,
"extra_body": {
"anthropic_beta": ["interleaved-thinking-2025-05-14"],
"thinking": {"type": "enabled", "budget_tokens": 4096}
}
}'
```
- Streaming (Claude Sonnet 4)
```bash
curl http://127.0.0.1:8000/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bedrock" \
-d '{
"model": "us.anthropic.claude-sonnet-4-20250514-v1:0",
"max_tokens": 2048,
"messages": [{
"role": "user",
"content": "有一天,一个女孩参加数学考试只得了 38 分。她心里对父亲的惩罚充满恐惧,于是偷偷把分数改成了 88 分。她的父亲看到试卷后,怒发冲冠,狠狠地给了她一巴掌,怒吼道:“你这 8 怎么一半是绿的一半是红的,你以为我是傻子吗?”女孩被打后,委屈地哭了起来,什么也没说。过了一会儿,父亲突然崩溃了。请问这位父亲为什么过一会崩溃了?"
}],
"stream": true,
"extra_body": {
"anthropic_beta": ["interleaved-thinking-2025-05-14"],
"thinking": {"type": "enabled", "budget_tokens": 4096}
}
}'
```

ruff.toml Normal file

@@ -0,0 +1,21 @@
line-length = 120
indent-width = 4
target-version = "py312"
exclude = [
".venv",
".vscode",
"test/*"
]
[lint]
select = ["E", "F", "I"]
ignore = [
"E501",
"C901",
"F401",
]
[format]
# use double quotes for strings.
quote-style = "double"


@@ -1,35 +1,139 @@
# Make sure you have created the repo in AWS ECR in every region you want to push to before executing this script.
# NOTE: The script will try to create the ECR repository if it doesn't exist. Please grant the necessary permissions to the IAM user or role.
# Usage:
# cd scripts
# chmod +x push-to-ecr.sh
# ./push-to-ecr.sh
# bash ./push-to-ecr.sh
set -o errexit # exit on first error
set -o nounset # exit on using unset variables
set -o pipefail # exit on any error in a pipeline
# Change to the directory where the script is located
cd "$(dirname "$0")"
# Prompt user for inputs
echo "================================================"
echo "Bedrock Access Gateway - Build and Push to ECR"
echo "================================================"
echo ""
# Get repository name for Lambda version
read -p "Enter ECR repository name for Lambda (default: bedrock-proxy-api): " LAMBDA_REPO
LAMBDA_REPO=${LAMBDA_REPO:-bedrock-proxy-api}
# Get repository name for ECS/Fargate version
read -p "Enter ECR repository name for ECS/Fargate (default: bedrock-proxy-api-ecs): " ECS_REPO
ECS_REPO=${ECS_REPO:-bedrock-proxy-api-ecs}
# Get image tag
read -p "Enter image tag (default: latest): " TAG
TAG=${TAG:-latest}
# Get AWS region
read -p "Enter AWS region (default: us-east-1): " AWS_REGION
AWS_REGION=${AWS_REGION:-us-east-1}
echo ""
echo "Configuration:"
echo " Lambda Repository: $LAMBDA_REPO"
echo " ECS/Fargate Repository: $ECS_REPO"
echo " Image Tag: $TAG"
echo " AWS Region: $AWS_REGION"
echo ""
read -p "Continue with these settings? (y/n): " CONFIRM
if [[ ! "$CONFIRM" =~ ^[Yy]$ ]]; then
echo "Aborted."
exit 1
fi
echo ""
# Acknowledgment about ECR repository creation
echo " NOTICE: This script will automatically create ECR repositories if they don't exist."
echo " The repositories will be created with the following default settings:"
echo " - Image tag mutability: MUTABLE (allows overwriting tags)"
echo " - Image scanning: Disabled"
echo " - Encryption: AES256 (AWS managed encryption)"
echo ""
echo " You can modify these settings later in the AWS ECR Console if needed."
echo " Required IAM permissions: ecr:CreateRepository, ecr:GetAuthorizationToken,"
echo " ecr:BatchCheckLayerAvailability, ecr:InitiateLayerUpload, ecr:UploadLayerPart,"
echo " ecr:CompleteLayerUpload, ecr:PutImage"
echo ""
read -p "Do you acknowledge and want to proceed? (y/n): " ACK_CONFIRM
if [[ ! "$ACK_CONFIRM" =~ ^[Yy]$ ]]; then
echo "Aborted."
exit 1
fi
echo ""
# Define variables
IMAGE_NAME="bedrock-proxy-api"
TAG="latest"
AWS_REGIONS=("us-west-2") # List of AWS regions
#AWS_REGIONS=("us-east-1" "us-west-2" "eu-central-1" "ap-southeast-1" "ap-northeast-1") # List of AWS regions
ARCHS=("arm64") # Single architecture for simplicity
# Build Docker image
docker build -t $IMAGE_NAME:$TAG ../src/
build_and_push_image() {
local IMAGE_NAME=$1
local TAG=$2
local DOCKERFILE_PATH=$3
local REGION=$AWS_REGION
local ARCH=${ARCHS[0]}
# Loop through each AWS region
for REGION in "${AWS_REGIONS[@]}"
do
# Get the account ID for the current region
echo "Building $IMAGE_NAME:$TAG..."
# Build Docker image
# Note: --provenance=false and --sbom=false are required for Lambda compatibility
# Without these flags, Docker BuildKit (especially with docker-container driver) may create
# OCI image manifests with attestations that AWS Lambda does not support.
# Lambda requires Docker V2 Schema 2 format without multi-manifest index.
# See: https://github.com/aws-samples/bedrock-access-gateway/issues/206
docker buildx build \
--platform linux/$ARCH \
--provenance=false \
--sbom=false \
-t $IMAGE_NAME:$TAG \
-f $DOCKERFILE_PATH \
--load \
../src/
# Get the account ID
ACCOUNT_ID=$(aws sts get-caller-identity --region $REGION --query Account --output text)
# Create repository URI
REPOSITORY_URI="${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/${IMAGE_NAME}"
echo "Creating ECR repository if it doesn't exist..."
# Create ECR repository if it doesn't exist
aws ecr create-repository --repository-name "${IMAGE_NAME}" --region $REGION || true
echo "Logging in to ECR..."
# Log in to ECR
aws ecr get-login-password --region $REGION | docker login --username AWS --password-stdin $REPOSITORY_URI
# Tag the image for the current region
echo "Pushing image to ECR..."
# Tag the image for ECR
docker tag $IMAGE_NAME:$TAG $REPOSITORY_URI:$TAG
# Push the image to ECR
docker push $REPOSITORY_URI:$TAG
echo "Pushed $IMAGE_NAME:$TAG to $REPOSITORY_URI"
done
echo "✅ Successfully pushed $IMAGE_NAME:$TAG to $REPOSITORY_URI"
echo ""
}
echo "Building and pushing Lambda image..."
build_and_push_image "$LAMBDA_REPO" "$TAG" "../src/Dockerfile"
echo "Building and pushing ECS/Fargate image..."
build_and_push_image "$ECS_REPO" "$TAG" "../src/Dockerfile_ecs"
echo "================================================"
echo "✅ All images successfully pushed!"
echo "================================================"
echo ""
echo "Your container image URIs:"
ACCOUNT_ID=$(aws sts get-caller-identity --region $AWS_REGION --query Account --output text)
echo " Lambda: ${ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/${LAMBDA_REPO}:${TAG}"
echo " ECS/Fargate: ${ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/${ECS_REPO}:${TAG}"
echo ""
echo "Next steps:"
echo " 1. Download the CloudFormation templates from deployment/ folder"
echo " 2. Update the ContainerImageUri parameter with your image URI above"
echo " 3. Deploy the stack via AWS CloudFormation Console"
echo ""


@@ -1,9 +1,19 @@
FROM public.ecr.aws/lambda/python:3.12
# Add Lambda Web Adapter for API Gateway response streaming
COPY --from=public.ecr.aws/awsguru/aws-lambda-adapter:0.9.1 /lambda-adapter /opt/extensions/lambda-adapter
COPY ./api ./api
COPY requirements.txt .
RUN pip3 install -r requirements.txt -U --no-cache-dir
CMD [ "api.app.handler" ]
# Preload tiktoken encoding: https://github.com/aws-samples/bedrock-access-gateway/issues/118
ENV TIKTOKEN_CACHE_DIR=/var/task/.cache/tiktoken
RUN python3 -c 'import tiktoken_ext.openai_public as tke; tke.cl100k_base()'
# Lambda Web Adapter requires overriding the Lambda base image entrypoint
# to run the web app directly instead of the Lambda runtime handler
ENTRYPOINT []
CMD ["python", "-m", "uvicorn", "api.app:app", "--host", "0.0.0.0", "--port", "8080"]


@@ -1,4 +1,4 @@
FROM python:3.12-slim
FROM public.ecr.aws/docker/library/python:3.13-slim
WORKDIR /app
@@ -8,4 +8,19 @@ RUN pip install --no-cache-dir --upgrade -r /app/requirements.txt
COPY ./api /app/api
CMD ["uvicorn", "api.app:app", "--host", "0.0.0.0", "--port", "80"]
# Create non-root user
RUN groupadd -r appuser && useradd -r -g appuser appuser && \
chown -R appuser:appuser /app
USER appuser
# Preload tiktoken encoding: https://github.com/aws-samples/bedrock-access-gateway/issues/118
ENV TIKTOKEN_CACHE_DIR=/app/.cache/tiktoken
RUN python3 -c 'import tiktoken_ext.openai_public as tke; tke.cl100k_base()'
ENV PORT=8080
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:${PORT}/health').read()"
CMD ["sh", "-c", "uvicorn api.app:app --host 0.0.0.0 --port ${PORT}"]


@@ -1,4 +1,5 @@
import logging
import os
import uvicorn
from fastapi import FastAPI
@@ -7,8 +8,8 @@ from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import PlainTextResponse
from mangum import Mangum
from api.routers import model, chat, embeddings
from api.setting import API_ROUTE_PREFIX, TITLE, DESCRIPTION, SUMMARY, VERSION
from api.routers import chat, embeddings, model
from api.setting import API_ROUTE_PREFIX, DESCRIPTION, SUMMARY, TITLE, VERSION
config = {
"title": TITLE,
@@ -23,14 +24,22 @@ logging.basicConfig(
)
app = FastAPI(**config)
allowed_origins = os.environ.get("ALLOWED_ORIGINS", "*")
origins_list = [origin.strip() for origin in allowed_origins.split(",")] if allowed_origins != "*" else ["*"]
# Warn if CORS allows all origins
if origins_list == ["*"]:
logging.warning("CORS is configured to allow all origins (*). Set ALLOWED_ORIGINS environment variable to restrict access.")
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_origins=origins_list, # nosec - configurable via ALLOWED_ORIGINS env var
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
app.include_router(model.router, prefix=API_ROUTE_PREFIX)
app.include_router(chat.router, prefix=API_ROUTE_PREFIX)
app.include_router(embeddings.router, prefix=API_ROUTE_PREFIX)
@@ -44,10 +53,21 @@ async def health():
@app.exception_handler(RequestValidationError)
async def validation_exception_handler(request, exc):
logger = logging.getLogger(__name__)
# Log essential info only - avoid sensitive data and performance overhead
logger.warning(
"Request validation failed: %s %s - %s",
request.method,
request.url.path,
str(exc).split('\n')[0] # First line only
)
return PlainTextResponse(str(exc), status_code=400)
handler = Mangum(app)
if __name__ == "__main__":
uvicorn.run("app:app", host="0.0.0.0", port=8000, reload=True)
# Bind to 0.0.0.0 for container environments; network exposure is handled by network policies and load balancers
uvicorn.run("app:app", host="0.0.0.0", port=8000, reload=False) # nosec B104
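The CORS change above reads `ALLOWED_ORIGINS` as either `*` or a comma-separated list. A standalone sketch of that parsing logic (the function name `parse_allowed_origins` is assumed for illustration):

```python
def parse_allowed_origins(value: str) -> list[str]:
    # Mirrors the app.py logic: "*" allows every origin; otherwise the
    # value is a comma-separated list with surrounding whitespace trimmed.
    if value == "*":
        return ["*"]
    return [origin.strip() for origin in value.split(",")]
```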

View File

@@ -1,28 +1,43 @@
import json
import os
from typing import Annotated
import boto3
from botocore.exceptions import ClientError
from fastapi import Depends, HTTPException, status
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
from api.setting import DEFAULT_API_KEYS
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer
api_key_param = os.environ.get("API_KEY_PARAM_NAME")
api_key_secret_arn = os.environ.get("API_KEY_SECRET_ARN")
api_key_env = os.environ.get("API_KEY")
if api_key_param:
# For backward compatibility.
# Prefer Secrets Manager instead.
ssm = boto3.client("ssm")
api_key = ssm.get_parameter(Name=api_key_param, WithDecryption=True)["Parameter"][
"Value"
]
api_key = ssm.get_parameter(Name=api_key_param, WithDecryption=True)["Parameter"]["Value"]
elif api_key_secret_arn:
sm = boto3.client("secretsmanager")
try:
response = sm.get_secret_value(SecretId=api_key_secret_arn)
if "SecretString" in response:
secret = json.loads(response["SecretString"])
api_key = secret["api_key"]
except ClientError:
raise RuntimeError("Unable to retrieve API KEY, please ensure the secret ARN is correct")
except KeyError:
raise RuntimeError('Please ensure the secret contains an "api_key" field')
elif api_key_env:
api_key = api_key_env
else:
api_key = DEFAULT_API_KEYS
raise RuntimeError(
"API Key is not configured. Please set up your API Key."
)
security = HTTPBearer()
def api_key_auth(
credentials: Annotated[HTTPAuthorizationCredentials, Depends(security)]
credentials: Annotated[HTTPAuthorizationCredentials, Depends(security)],
):
if credentials.credentials != api_key:
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid API Key"
)
raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid API Key")
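The module above resolves the API key from one of three sources, in order, and now fails fast when none is set. A pure-function sketch of that precedence with the AWS calls factored out (the helper `resolve_api_key` and its parameters are hypothetical):

```python
import json

def resolve_api_key(ssm_value=None, secret_json=None, env_value=None):
    # Sketch of the precedence in api/auth.py: SSM parameter first,
    # then a Secrets Manager secret, then the API_KEY env var;
    # no source configured is a hard error.
    if ssm_value is not None:
        return ssm_value
    if secret_json is not None:
        try:
            return json.loads(secret_json)["api_key"]
        except KeyError:
            raise RuntimeError('Please ensure the secret contains an "api_key" field')
    if env_value is not None:
        return env_value
    raise RuntimeError("API Key is not configured. Please set up your API Key.")
```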

View File

@@ -1,3 +1,4 @@
import logging
import time
import uuid
from abc import ABC, abstractmethod
@@ -5,14 +6,17 @@ from typing import AsyncIterable
from api.schema import (
# Chat
ChatResponse,
ChatRequest,
ChatResponse,
ChatStreamResponse,
# Embeddings
EmbeddingsRequest,
EmbeddingsResponse,
Error,
)
logger = logging.getLogger(__name__)
class BaseChatModel(ABC):
"""Represent a basic chat model
@@ -29,12 +33,12 @@ class BaseChatModel(ABC):
pass
@abstractmethod
def chat(self, chat_request: ChatRequest) -> ChatResponse:
async def chat(self, chat_request: ChatRequest) -> ChatResponse:
"""Handle a basic chat completion request."""
pass
@abstractmethod
def chat_stream(self, chat_request: ChatRequest) -> AsyncIterable[bytes]:
async def chat_stream(self, chat_request: ChatRequest) -> AsyncIterable[bytes]:
"""Handle a basic chat completion request with a streamed response."""
pass
@@ -43,16 +47,20 @@ class BaseChatModel(ABC):
return "chatcmpl-" + str(uuid.uuid4())[:8]
@staticmethod
def stream_response_to_bytes(
response: ChatStreamResponse | None = None
) -> bytes:
if response:
def stream_response_to_bytes(response: ChatStreamResponse | Error | None = None) -> bytes:
if isinstance(response, Error):
logger.error("Stream error: %s", response.error.message if response.error else "Unknown error")
data = response.model_dump_json()
elif isinstance(response, ChatStreamResponse):
# Populate these fields so they survive serialization with exclude_unset=True
response.system_fingerprint = "fp"
response.object = "chat.completion.chunk"
response.created = int(time.time())
return "data: {}\n\n".format(response.model_dump_json(exclude_unset=True)).encode("utf-8")
return "data: [DONE]\n\n".encode("utf-8")
data = response.model_dump_json(exclude_unset=True)
else:
data = "[DONE]"
return f"data: {data}\n\n".encode("utf-8")
class BaseEmbeddingsModel(ABC):

File diff suppressed because it is too large

View File

@@ -1,11 +1,11 @@
from typing import Annotated
from fastapi import APIRouter, Depends, Body
from fastapi import APIRouter, Body, Depends
from fastapi.responses import StreamingResponse
from api.auth import api_key_auth
from api.models.bedrock import BedrockModel
from api.schema import ChatRequest, ChatResponse, ChatStreamResponse
from api.schema import ChatRequest, ChatResponse, ChatStreamResponse, Error
from api.setting import DEFAULT_MODEL
router = APIRouter(
@@ -15,7 +15,9 @@ router = APIRouter(
)
@router.post("/completions", response_model=ChatResponse | ChatStreamResponse, response_model_exclude_unset=True)
@router.post(
"/completions", response_model=ChatResponse | ChatStreamResponse | Error, response_model_exclude_unset=True
)
async def chat_completions(
chat_request: Annotated[
ChatRequest,
@@ -30,7 +32,7 @@ async def chat_completions(
}
],
),
]
],
):
if chat_request.model.lower().startswith("gpt-"):
chat_request.model = DEFAULT_MODEL
@@ -39,7 +41,5 @@ async def chat_completions(
model = BedrockModel()
model.validate(chat_request)
if chat_request.stream:
return StreamingResponse(
content=model.chat_stream(chat_request), media_type="text/event-stream"
)
return model.chat(chat_request)
return StreamingResponse(content=model.chat_stream(chat_request), media_type="text/event-stream")
return await model.chat(chat_request)
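The router remaps any `gpt-*` model name to the configured default so stock OpenAI clients work unchanged. A sketch of that fallback as a pure function (the helper name `normalize_model` is hypothetical):

```python
def normalize_model(requested: str, default_model: str) -> str:
    # Mirrors the chat router: "gpt-*" names fall back to DEFAULT_MODEL;
    # everything else passes through to Bedrock as-is.
    if requested.lower().startswith("gpt-"):
        return default_model
    return requested
```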

View File

@@ -1,6 +1,6 @@
from typing import Annotated
from fastapi import APIRouter, Depends, Body
from fastapi import APIRouter, Body, Depends
from api.auth import api_key_auth
from api.models.bedrock import get_embeddings_model
@@ -21,13 +21,11 @@ async def embeddings(
examples=[
{
"model": "cohere.embed-multilingual-v3",
"input": [
"Your text string goes here"
],
"input": ["Your text string goes here"],
}
],
),
]
],
):
if embeddings_request.model.lower().startswith("text-embedding-"):
embeddings_request.model = DEFAULT_EMBEDDING_MODEL

View File

@@ -4,7 +4,7 @@ from fastapi import APIRouter, Depends, HTTPException, Path
from api.auth import api_key_auth
from api.models.bedrock import BedrockModel
from api.schema import Models, Model
from api.schema import Model, Models
router = APIRouter(
prefix="/models",
@@ -22,9 +22,7 @@ async def validate_model_id(model_id: str):
@router.get("", response_model=Models)
async def list_models():
model_list = [
Model(id=model_id) for model_id in chat_model.list_models()
]
model_list = [Model(id=model_id) for model_id in chat_model.list_models()]
return Models(data=model_list)
@@ -36,7 +34,7 @@ async def get_model(
model_id: Annotated[
str,
Path(description="Model ID", example="anthropic.claude-3-sonnet-20240229-v1:0"),
]
],
):
await validate_model_id(model_id)
return Model(id=model_id)

View File

@@ -1,8 +1,10 @@
import time
from typing import Literal, Iterable
from typing import Iterable, Literal
from pydantic import BaseModel, Field
from api.setting import DEFAULT_MODEL
class Model(BaseModel):
id: str
@@ -39,10 +41,15 @@ class ImageUrl(BaseModel):
class ImageContent(BaseModel):
type: Literal["image_url"] = "image"
type: Literal["image_url"] = "image_url"
image_url: ImageUrl
class ToolContent(BaseModel):
type: Literal["text"] = "text"
text: str
class SystemMessage(BaseModel):
name: str | None = None
role: Literal["system"] = "system"
@@ -58,16 +65,22 @@ class UserMessage(BaseModel):
class AssistantMessage(BaseModel):
name: str | None = None
role: Literal["assistant"] = "assistant"
content: str | list[TextContent | ImageContent] | None
content: str | list[TextContent | ImageContent] | None = None
tool_calls: list[ToolCall] | None = None
class ToolMessage(BaseModel):
role: Literal["tool"] = "tool"
content: str
content: str | list[ToolContent] | list[dict]
tool_call_id: str
class DeveloperMessage(BaseModel):
name: str | None = None
role: Literal["developer"] = "developer"
content: str
class Function(BaseModel):
name: str
description: str | None = None
@@ -84,25 +97,43 @@ class StreamOptions(BaseModel):
class ChatRequest(BaseModel):
messages: list[SystemMessage | UserMessage | AssistantMessage | ToolMessage]
model: str
messages: list[SystemMessage | UserMessage | AssistantMessage | ToolMessage | DeveloperMessage]
model: str = DEFAULT_MODEL
frequency_penalty: float | None = Field(default=0.0, le=2.0, ge=-2.0) # Not used
presence_penalty: float | None = Field(default=0.0, le=2.0, ge=-2.0) # Not used
stream: bool | None = False
stream_options: StreamOptions | None = None
temperature: float | None = Field(default=1.0, le=2.0, ge=0.0)
top_p: float | None = Field(default=1.0, le=1.0, ge=0.0)
temperature: float | None = Field(default=None, le=2.0, ge=0.0)
top_p: float | None = Field(default=None, le=1.0, ge=0.0)
user: str | None = None # Not used
max_tokens: int | None = 2048
max_completion_tokens: int | None = None
reasoning_effort: Literal["low", "medium", "high"] | None = None
n: int | None = 1 # Not used
tools: list[Tool] | None = None
tool_choice: str | object = "auto"
stop: list[str] | str | None = None
extra_body: dict | None = None
class PromptTokensDetails(BaseModel):
"""Details about prompt tokens usage, following OpenAI API format."""
cached_tokens: int = 0
audio_tokens: int = 0
class CompletionTokensDetails(BaseModel):
"""Details about completion tokens usage, following OpenAI API format."""
reasoning_tokens: int = 0
audio_tokens: int = 0
class Usage(BaseModel):
prompt_tokens: int
completion_tokens: int
total_tokens: int
prompt_tokens_details: PromptTokensDetails | None = None
completion_tokens_details: CompletionTokensDetails | None = None
class ChatResponseMessage(BaseModel):
@@ -110,6 +141,7 @@ class ChatResponseMessage(BaseModel):
role: Literal["assistant"] | None = None
content: str | None = None
tool_calls: list[ToolCall] | None = None
reasoning_content: str | None = None
class BaseChoice(BaseModel):
@@ -150,7 +182,7 @@ class EmbeddingsRequest(BaseModel):
input: str | list[str] | Iterable[int | Iterable[int]]
model: str
encoding_format: Literal["float", "base64"] = "float"
dimensions: int | None = None # not used.
dimensions: int | None = None # Used by Nova embeddings; ignored by other models.
user: str | None = None # not used.
@@ -170,3 +202,11 @@ class EmbeddingsResponse(BaseModel):
data: list[Embedding]
model: str
usage: EmbeddingsUsage
class ErrorMessage(BaseModel):
message: str
class Error(BaseModel):
error: ErrorMessage
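The extended `Usage` schema above adds OpenAI-compatible token-detail objects (cached prompt tokens, reasoning tokens). A plain-dict sketch of the resulting wire shape (the builder `build_usage` is an illustrative name, not part of the schema):

```python
def build_usage(prompt: int, completion: int, reasoning: int = 0, cached: int = 0) -> dict:
    # Shape matches the Usage / PromptTokensDetails / CompletionTokensDetails
    # models in the diff above, with audio_tokens defaulting to 0.
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": prompt + completion,
        "prompt_tokens_details": {"cached_tokens": cached, "audio_tokens": 0},
        "completion_tokens_details": {"reasoning_tokens": reasoning, "audio_tokens": 0},
    }
```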

View File

@@ -1,28 +1,18 @@
import os
DEFAULT_API_KEYS = "bedrock"
API_ROUTE_PREFIX = "/api/v1"
API_ROUTE_PREFIX = os.environ.get("API_ROUTE_PREFIX", "/api/v1")
TITLE = "Amazon Bedrock Proxy APIs"
SUMMARY = "OpenAI-Compatible RESTful APIs for Amazon Bedrock"
VERSION = "0.1.0"
DESCRIPTION = """
Use OpenAI-Compatible RESTful APIs for Amazon Bedrock models.
List of Amazon Bedrock models currently supported:
- Anthropic Claude 2 / 3 / 3.5 (Haiku/Sonnet/Opus)
- Meta Llama 2 / 3
- Mistral / Mixtral
- Cohere Command R / R+
- Cohere Embedding
"""
DEBUG = os.environ.get("DEBUG", "false").lower() != "false"
AWS_REGION = os.environ.get("AWS_REGION", "us-west-2")
DEFAULT_MODEL = os.environ.get(
"DEFAULT_MODEL", "anthropic.claude-3-sonnet-20240229-v1:0"
)
DEFAULT_EMBEDDING_MODEL = os.environ.get(
"DEFAULT_EMBEDDING_MODEL", "cohere.embed-multilingual-v3"
)
DEFAULT_MODEL = os.environ.get("DEFAULT_MODEL", "anthropic.claude-3-sonnet-20240229-v1:0")
DEFAULT_EMBEDDING_MODEL = os.environ.get("DEFAULT_EMBEDDING_MODEL", "cohere.embed-multilingual-v3")
ENABLE_CROSS_REGION_INFERENCE = os.environ.get("ENABLE_CROSS_REGION_INFERENCE", "true").lower() != "false"
ENABLE_APPLICATION_INFERENCE_PROFILES = os.environ.get("ENABLE_APPLICATION_INFERENCE_PROFILES", "true").lower() != "false"
ENABLE_PROMPT_CACHING = os.environ.get("ENABLE_PROMPT_CACHING", "false").lower() != "false"
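The boolean settings above all follow one convention: a flag is on unless the environment value is exactly `"false"` (case-insensitive). A sketch of that idiom as a helper (the name `env_flag` is assumed; the repo inlines the expression per setting):

```python
import os

def env_flag(name: str, default: str = "true") -> bool:
    # Mirrors the setting.py convention: anything other than "false"
    # (case-insensitive) counts as enabled, including unset-with-default.
    return os.environ.get(name, default).lower() != "false"
```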

View File

@@ -1,9 +1,10 @@
fastapi==0.111.0
pydantic==2.7.1
fastapi==0.128.0
starlette==0.49.1 # CVE-2025-62727: Fix ReDoS in Range header parsing
pydantic==2.11.4
uvicorn==0.29.0
mangum==0.17.0
tiktoken==0.6.0
requests==2.32.3
numpy==1.26.4
boto3==1.34.132
botocore==1.34.132
tiktoken==0.9.0
requests==2.32.4
numpy==2.2.5
boto3==1.40.4
botocore==1.40.4

View File

@@ -1,87 +0,0 @@
import time
import random
def calculate_factorial(n):
if n == 0:
return 1
else:
return n * calculate_factorial(n - 1)
def find_largest_number(numbers):
largest = numbers[0]
for num in numbers:
if num > largest:
largest = num
return largest
def inefficient_sort(arr):
n = len(arr)
for i in range(n):
for j in range(0, n-i-1):
if arr[j] > arr[j+1]:
arr[j], arr[j+1] = arr[j+1], arr[j]
return arr
class User:
def __init__(self, name, age):
self.name = name
self.age = age
def print_user_info(self):
print(f"Name: {self.name}, Age: {self.age}")
def process_data(data):
result = []
for item in data:
if item % 2 == 0:
result.append(item * 2)
else:
result.append(item * 3)
return result
def generate_random_numbers(n):
numbers = []
for i in range(n):
numbers.append(random.randint(1, 100))
return numbers
def calculate_average(numbers):
total = sum(numbers)
count = len(numbers)
average = total / count
return average
def main():
# Inefficient factorial calculation
print(calculate_factorial(20))
# Unnecessary loop for finding largest number
numbers = [3, 7, 2, 9, 1, 5]
print(find_largest_number(numbers))
# Inefficient sorting algorithm
unsorted_list = [64, 34, 25, 12, 22, 11, 90]
print(inefficient_sort(unsorted_list))
# Inconsistent naming convention
user1 = User("John Doe", 30)
user1.print_user_info()
# Redundant if-else structure
data = [1, 2, 3, 4, 5]
print(process_data(data))
# Inefficient random number generation
random_numbers = generate_random_numbers(1000000)
print(f"Generated {len(random_numbers)} random numbers")
# Potential division by zero
empty_list = []
print(calculate_average(empty_list))
# Unnecessary time delay
time.sleep(5)
print("Finished processing after 5 seconds")
if __name__ == "__main__":
main()