feat(apigw): add API Gateway response streaming support (#207)

Replace ALB + Lambda architecture with API Gateway REST API + Lambda
using response streaming for SSE support. This provides:

- No VPC required, reducing complexity and cost
- Native streaming support via API Gateway response streaming
- Pay-per-request pricing model

Changes:
- Add Lambda Web Adapter to Dockerfile for streaming support
- Replace BedrockProxy.template with API Gateway configuration
- Update README with new deployment options and latest models
- Update architecture diagram for API Gateway flow
This commit is contained in:
Mengxin Zhu
2025-12-05 10:54:13 +08:00
committed by GitHub
parent 0411454b3a
commit b41633b826
4 changed files with 136 additions and 225 deletions

View File

@@ -4,9 +4,16 @@ OpenAI-compatible RESTful APIs for Amazon Bedrock
## What's New 🔥
This project now supports **Claude Sonnet 4.5**, Anthropic's most intelligent model with enhanced coding capabilities and complex agent support, available via global cross-region inference.
**API Gateway Response Streaming Support** - You can now deploy with Amazon API Gateway REST API instead of ALB, enabling true response streaming for better latency and cost optimization. See [Deployment Options](#deployment-options) for details.
It also supports reasoning for both **Claude 3.7 Sonnet** and **DeepSeek R1**. Check [How to Use](./docs/Usage.md#reasoning) for more details. You need to first run the Models API to refresh the model list.
**Latest Models Supported:**
- **Claude 4.5 Family**: Opus 4.5, Sonnet 4.5, Haiku 4.5 - Anthropic's most intelligent models with enhanced coding and agent capabilities
- **Amazon Nova**: Nova Micro, Nova Lite, Nova Pro, Nova Premier - Amazon's native foundation models with multimodal support
- **DeepSeek**: DeepSeek-R1 (reasoning), DeepSeek-V3.1 - Advanced reasoning and general-purpose models
- **Qwen 3**: Qwen3-32B, Qwen3-235B, Qwen3-Coder-30B, Qwen3-Coder-480B - Alibaba's latest language and coding models
- **OpenAI OSS**: gpt-oss-20b, gpt-oss-120b - Open-source GPT models available via Bedrock
It also supports reasoning for **Claude 4/4.5** (extended thinking and interleaved thinking) and **DeepSeek R1**. Check [How to Use](./docs/Usage.md#reasoning) for more details. You need to first run the Models API to refresh the model list.
## Overview
@@ -46,13 +53,18 @@ Please make sure you have met below prerequisites:
### Architecture
The following diagram illustrates the reference architecture. Note that it also includes a new **VPC** with two public subnets only for the Application Load Balancer (ALB).
The following diagram illustrates the reference architecture. It uses [Amazon API Gateway response streaming](https://aws.amazon.com/blogs/compute/building-responsive-apis-with-amazon-api-gateway-response-streaming/) with Lambda for SSE support.
![Architecture](assets/arch.png)
You can also choose to use [AWS Fargate](https://aws.amazon.com/fargate/) behind the ALB instead of [AWS Lambda](https://aws.amazon.com/lambda/), the main difference is the latency of the first byte for streaming response (Fargate is lower).
### Deployment Options
Alternatively, you can use Lambda Function URL to replace ALB, see [example](https://github.com/awslabs/aws-lambda-web-adapter/tree/main/examples/fastapi-response-streaming)
| Option | Pros | Cons | Best For |
|--------|------|------|----------|
| **API Gateway + Lambda** | No VPC required, pay-per-request, native streaming support, lower operational overhead | Potential cold starts | Most use cases, cost-sensitive deployments |
| **ALB + Fargate** | Lowest streaming latency, no cold starts | Higher cost, requires VPC | High-throughput, latency-sensitive workloads |
You can also use Lambda Function URL as an alternative, see [example](https://github.com/awslabs/aws-lambda-web-adapter/tree/main/examples/fastapi-response-streaming)
### Deployment
@@ -105,8 +117,8 @@ After creation, you'll see your secret in the Secrets Manager console. Make note
**Step 3: Deploy the CloudFormation stack**
1. Download the CloudFormation template you want to use:
- For Lambda: [`deployment/BedrockProxy.template`](deployment/BedrockProxy.template)
- For Fargate: [`deployment/BedrockProxyFargate.template`](deployment/BedrockProxyFargate.template)
- For API Gateway + Lambda: [`deployment/BedrockProxy.template`](deployment/BedrockProxy.template)
- For ALB + Fargate: [`deployment/BedrockProxyFargate.template`](deployment/BedrockProxyFargate.template)
2. Sign in to AWS Management Console and navigate to the CloudFormation service in your target region.
@@ -227,7 +239,7 @@ For more information about creating and managing application inference profiles,
This proxy now supports **Prompt Caching** for Claude and Nova models, which can reduce costs by up to 90% and latency by up to 85% for workloads with repeated prompts.
**Supported Models:**
- Claude 3+ models (Claude 3.5 Haiku, Claude 3.7 Sonnet, Claude 4, Claude 4.5, etc.)
- Claude models (Claude 3.5 Haiku, Claude 4, Claude 4.5, etc.)
- Nova models (Nova Micro, Nova Lite, Nova Pro, Nova Premier)
**Enabling Prompt Caching:**
@@ -249,7 +261,7 @@ client = OpenAI()
# Cache system prompts
response = client.chat.completions.create(
model="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
model="global.anthropic.claude-haiku-4-5-20251001-v1:0",
messages=[
{"role": "system", "content": "You are an expert assistant with knowledge of..."},
{"role": "user", "content": "Help me with this task"}
@@ -271,7 +283,7 @@ curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
"model": "global.anthropic.claude-haiku-4-5-20251001-v1:0",
"messages": [
{"role": "system", "content": "Long system prompt..."},
{"role": "user", "content": "Question"}
@@ -334,9 +346,11 @@ print(response)
This application does not collect any of your data. Furthermore, it does not log any requests or responses by default.
### Why not used API Gateway instead of Application Load Balancer?
### Why choose API Gateway vs ALB?
Short answer is that API Gateway does not support server-sent events (SSE) for streaming response.
**API Gateway + Lambda** uses [API Gateway response streaming](https://aws.amazon.com/blogs/compute/building-responsive-apis-with-amazon-api-gateway-response-streaming/) with [Lambda Web Adapter](https://github.com/awslabs/aws-lambda-web-adapter) to support SSE streaming without requiring a VPC. This is a cost-effective, serverless option with up to 10 minutes timeout.
**ALB + Fargate** provides the lowest streaming latency with no cold starts, ideal for high-throughput workloads.
### Which regions are supported?
@@ -360,9 +374,9 @@ The API base url should look like `http://localhost:8000/api/v1`.
### Any performance sacrifice or latency increase by using the proxy APIs
Comparing with the AWS SDK call, the referenced architecture will bring additional latency on response, you can try and test that on you own.
Compared with direct AWS SDK calls, the proxy architecture will add some latency. The default API Gateway + Lambda deployment provides good streaming performance with Lambda response streaming.
Also, you can use Lambda Web Adapter + Function URL (see [example](https://github.com/awslabs/aws-lambda-web-adapter/tree/main/examples/fastapi-response-streaming)) to replace ALB or AWS Fargate to replace Lambda to get better performance on streaming response.
For lowest latency on streaming responses, consider the ALB + Fargate deployment option which eliminates cold starts and provides consistent performance.
### Any plan to support SageMaker models?

Binary file not shown.

Before

Width:  |  Height:  |  Size: 54 KiB

After

Width:  |  Height:  |  Size: 50 KiB

View File

@@ -1,4 +1,4 @@
Description: Bedrock Access Gateway - OpenAI-compatible RESTful APIs for Amazon Bedrock
Description: Bedrock Access Gateway - OpenAI-compatible RESTful APIs for Amazon Bedrock (API Gateway + Lambda with Streaming)
Parameters:
ApiKeySecretArn:
Type: String
@@ -19,116 +19,8 @@ Parameters:
- "false"
Description: Enable prompt caching for supported models (Claude, Nova). When enabled, adds cachePoint to system prompts and messages for cost savings.
Resources:
VPCB9E5F0B4:
Type: AWS::EC2::VPC
Properties:
CidrBlock: 10.250.0.0/16
EnableDnsHostnames: true
EnableDnsSupport: true
InstanceTenancy: default
Tags:
- Key: Name
Value: BedrockProxy/VPC
VPCPublicSubnet1SubnetB4246D30:
Type: AWS::EC2::Subnet
Properties:
AvailabilityZone:
Fn::Select:
- 0
- Fn::GetAZs: ""
CidrBlock: 10.250.0.0/24
MapPublicIpOnLaunch: true
Tags:
- Key: aws-cdk:subnet-name
Value: Public
- Key: aws-cdk:subnet-type
Value: Public
- Key: Name
Value: BedrockProxy/VPC/PublicSubnet1
VpcId:
Ref: VPCB9E5F0B4
VPCPublicSubnet1RouteTableFEE4B781:
Type: AWS::EC2::RouteTable
Properties:
Tags:
- Key: Name
Value: BedrockProxy/VPC/PublicSubnet1
VpcId:
Ref: VPCB9E5F0B4
VPCPublicSubnet1RouteTableAssociation0B0896DC:
Type: AWS::EC2::SubnetRouteTableAssociation
Properties:
RouteTableId:
Ref: VPCPublicSubnet1RouteTableFEE4B781
SubnetId:
Ref: VPCPublicSubnet1SubnetB4246D30
VPCPublicSubnet1DefaultRoute91CEF279:
Type: AWS::EC2::Route
Properties:
DestinationCidrBlock: 0.0.0.0/0
GatewayId:
Ref: VPCIGWB7E252D3
RouteTableId:
Ref: VPCPublicSubnet1RouteTableFEE4B781
DependsOn:
- VPCVPCGW99B986DC
VPCPublicSubnet2Subnet74179F39:
Type: AWS::EC2::Subnet
Properties:
AvailabilityZone:
Fn::Select:
- 1
- Fn::GetAZs: ""
CidrBlock: 10.250.1.0/24
MapPublicIpOnLaunch: true
Tags:
- Key: aws-cdk:subnet-name
Value: Public
- Key: aws-cdk:subnet-type
Value: Public
- Key: Name
Value: BedrockProxy/VPC/PublicSubnet2
VpcId:
Ref: VPCB9E5F0B4
VPCPublicSubnet2RouteTable6F1A15F1:
Type: AWS::EC2::RouteTable
Properties:
Tags:
- Key: Name
Value: BedrockProxy/VPC/PublicSubnet2
VpcId:
Ref: VPCB9E5F0B4
VPCPublicSubnet2RouteTableAssociation5A808732:
Type: AWS::EC2::SubnetRouteTableAssociation
Properties:
RouteTableId:
Ref: VPCPublicSubnet2RouteTable6F1A15F1
SubnetId:
Ref: VPCPublicSubnet2Subnet74179F39
VPCPublicSubnet2DefaultRouteB7481BBA:
Type: AWS::EC2::Route
Properties:
DestinationCidrBlock: 0.0.0.0/0
GatewayId:
Ref: VPCIGWB7E252D3
RouteTableId:
Ref: VPCPublicSubnet2RouteTable6F1A15F1
DependsOn:
- VPCVPCGW99B986DC
VPCIGWB7E252D3:
Type: AWS::EC2::InternetGateway
Properties:
Tags:
- Key: Name
Value: BedrockProxy/VPC
VPCVPCGW99B986DC:
Type: AWS::EC2::VPCGatewayAttachment
Properties:
InternetGatewayId:
Ref: VPCIGWB7E252D3
VpcId:
Ref: VPCB9E5F0B4
ProxyApiHandlerServiceRoleBE71BFB1:
# IAM Role for Lambda
ProxyApiHandlerServiceRole:
Type: AWS::IAM::Role
Properties:
AssumeRolePolicyDocument:
@@ -139,12 +31,9 @@ Resources:
Service: lambda.amazonaws.com
Version: "2012-10-17"
ManagedPolicyArns:
- Fn::Join:
- ""
- - "arn:"
- Ref: AWS::Partition
- :iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
ProxyApiHandlerServiceRoleDefaultPolicy86681202:
- !Sub "arn:${AWS::Partition}:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
ProxyApiHandlerServiceRoleDefaultPolicy:
Type: AWS::IAM::Policy
Properties:
PolicyDocument:
@@ -166,122 +55,124 @@ Resources:
- secretsmanager:GetSecretValue
- secretsmanager:DescribeSecret
Effect: Allow
Resource:
Ref: ApiKeySecretArn
Resource: !Ref ApiKeySecretArn
Version: "2012-10-17"
PolicyName: ProxyApiHandlerServiceRoleDefaultPolicy86681202
PolicyName: ProxyApiHandlerServiceRoleDefaultPolicy
Roles:
- Ref: ProxyApiHandlerServiceRoleBE71BFB1
ProxyApiHandlerEC15A492:
- !Ref ProxyApiHandlerServiceRole
# Lambda Function with Lambda Web Adapter for streaming
ProxyApiHandler:
Type: AWS::Lambda::Function
Properties:
Architectures:
- arm64
Code:
ImageUri:
Ref: ContainerImageUri
Description: Bedrock Proxy API Handler
ImageUri: !Ref ContainerImageUri
Description: Bedrock Proxy API Handler with Response Streaming
Environment:
Variables:
# Lambda Web Adapter settings
AWS_LWA_INVOKE_MODE: RESPONSE_STREAM
AWS_LWA_READINESS_CHECK_PATH: /health
AWS_LWA_ASYNC_INIT: "true"
PORT: "8080"
# Application settings
DEBUG: "false"
API_KEY_SECRET_ARN:
Ref: ApiKeySecretArn
DEFAULT_MODEL:
Ref: DefaultModelId
API_KEY_SECRET_ARN: !Ref ApiKeySecretArn
DEFAULT_MODEL: !Ref DefaultModelId
DEFAULT_EMBEDDING_MODEL: cohere.embed-multilingual-v3
ENABLE_CROSS_REGION_INFERENCE: "true"
ENABLE_APPLICATION_INFERENCE_PROFILES: "true"
ENABLE_PROMPT_CACHING:
Ref: EnablePromptCaching
ENABLE_PROMPT_CACHING: !Ref EnablePromptCaching
API_ROUTE_PREFIX: /v1
MemorySize: 1024
PackageType: Image
Role:
Fn::GetAtt:
- ProxyApiHandlerServiceRoleBE71BFB1
- Arn
Role: !GetAtt ProxyApiHandlerServiceRole.Arn
Timeout: 600
DependsOn:
- ProxyApiHandlerServiceRoleDefaultPolicy86681202
- ProxyApiHandlerServiceRoleBE71BFB1
ProxyApiHandlerInvoke2UTWxhlfyqbT5FTn5jvgbLgjFfJwzswGk55DU1HYF6C33779:
- ProxyApiHandlerServiceRoleDefaultPolicy
- ProxyApiHandlerServiceRole
# API Gateway REST API (Regional)
RestApi:
Type: AWS::ApiGateway::RestApi
Properties:
Name: BedrockProxyApi
Description: Bedrock Access Gateway - OpenAI-compatible API with streaming support
EndpointConfiguration:
Types:
- REGIONAL
Body:
openapi: "3.0.1"
info:
title: BedrockProxyApi
version: "1.0"
paths:
/{proxy+}:
x-amazon-apigateway-any-method:
parameters:
- name: proxy
in: path
required: true
schema:
type: string
x-amazon-apigateway-integration:
type: aws_proxy
httpMethod: POST
uri: !Sub "arn:aws:apigateway:${AWS::Region}:lambda:path/2021-11-15/functions/${ProxyApiHandler.Arn}/response-streaming-invocations"
passthroughBehavior: when_no_match
timeoutInMillis: 600000
responseTransferMode: STREAM
responses:
default:
description: Default response
/:
x-amazon-apigateway-any-method:
x-amazon-apigateway-integration:
type: aws_proxy
httpMethod: POST
uri: !Sub "arn:aws:apigateway:${AWS::Region}:lambda:path/2021-11-15/functions/${ProxyApiHandler.Arn}/response-streaming-invocations"
passthroughBehavior: when_no_match
timeoutInMillis: 600000
responseTransferMode: STREAM
responses:
default:
description: Default response
# Lambda Permission for API Gateway
LambdaPermission:
Type: AWS::Lambda::Permission
Properties:
FunctionName: !Ref ProxyApiHandler
Action: lambda:InvokeFunction
FunctionName:
Fn::GetAtt:
- ProxyApiHandlerEC15A492
- Arn
Principal: elasticloadbalancing.amazonaws.com
ProxyALB87756780:
Type: AWS::ElasticLoadBalancingV2::LoadBalancer
Principal: apigateway.amazonaws.com
SourceArn: !Sub "arn:aws:execute-api:${AWS::Region}:${AWS::AccountId}:${RestApi}/*"
# API Gateway Deployment
ApiDeployment:
Type: AWS::ApiGateway::Deployment
Properties:
LoadBalancerAttributes:
- Key: deletion_protection.enabled
Value: "false"
Scheme: internet-facing
SecurityGroups:
- Fn::GetAtt:
- ProxyALBSecurityGroup0D6CA3DA
- GroupId
Subnets:
- Ref: VPCPublicSubnet1SubnetB4246D30
- Ref: VPCPublicSubnet2Subnet74179F39
Type: application
RestApiId: !Ref RestApi
DependsOn:
- VPCPublicSubnet1DefaultRoute91CEF279
- VPCPublicSubnet1RouteTableAssociation0B0896DC
- VPCPublicSubnet2DefaultRouteB7481BBA
- VPCPublicSubnet2RouteTableAssociation5A808732
ProxyALBSecurityGroup0D6CA3DA:
Type: AWS::EC2::SecurityGroup
- RestApi
# API Gateway Stage
ApiStage:
Type: AWS::ApiGateway::Stage
Properties:
GroupDescription: Automatically created Security Group for ELB BedrockProxyALB1CE4CAD1
SecurityGroupEgress:
- CidrIp: 255.255.255.255/32
Description: Disallow all traffic
FromPort: 252
IpProtocol: icmp
ToPort: 86
SecurityGroupIngress:
- CidrIp: 0.0.0.0/0
Description: Allow from anyone on port 80
FromPort: 80
IpProtocol: tcp
ToPort: 80
VpcId:
Ref: VPCB9E5F0B4
ProxyALBListener933E9515:
Type: AWS::ElasticLoadBalancingV2::Listener
Properties:
DefaultActions:
- TargetGroupArn:
Ref: ProxyALBListenerTargetsGroup187739FA
Type: forward
LoadBalancerArn:
Ref: ProxyALB87756780
Port: 80
Protocol: HTTP
ProxyALBListenerTargetsGroup187739FA:
Type: AWS::ElasticLoadBalancingV2::TargetGroup
Properties:
HealthCheckEnabled: false
TargetType: lambda
Targets:
- Id:
Fn::GetAtt:
- ProxyApiHandlerEC15A492
- Arn
DependsOn:
- ProxyApiHandlerInvoke2UTWxhlfyqbT5FTn5jvgbLgjFfJwzswGk55DU1HYF6C33779
RestApiId: !Ref RestApi
DeploymentId: !Ref ApiDeployment
StageName: api
Description: API Stage with streaming support
Outputs:
APIBaseUrl:
Description: Proxy API Base URL (OPENAI_API_BASE)
Value:
Fn::Join:
- ""
- - http://
- Fn::GetAtt:
- ProxyALB87756780
- DNSName
- /api/v1
Value: !Sub "https://${RestApi}.execute-api.${AWS::Region}.amazonaws.com/api/v1"
RestApiId:
Description: API Gateway REST API ID
Value: !Ref RestApi
LambdaFunctionArn:
Description: Lambda Function ARN
Value: !GetAtt ProxyApiHandler.Arn

View File

@@ -1,9 +1,15 @@
FROM public.ecr.aws/lambda/python:3.12
# Add Lambda Web Adapter for API Gateway response streaming
COPY --from=public.ecr.aws/awsguru/aws-lambda-adapter:0.9.1 /lambda-adapter /opt/extensions/lambda-adapter
COPY ./api ./api
COPY requirements.txt .
RUN pip3 install -r requirements.txt -U --no-cache-dir
CMD [ "api.app.handler" ]
# Lambda Web Adapter requires overriding the Lambda base image entrypoint
# to run the web app directly instead of the Lambda runtime handler
ENTRYPOINT []
CMD ["python", "-m", "uvicorn", "api.app:app", "--host", "0.0.0.0", "--port", "8080"]