feat(apigw): add API Gateway response streaming support (#207)

Replace ALB + Lambda architecture with API Gateway REST API + Lambda
using response streaming for SSE support. This provides:

- No VPC required, reducing complexity and cost
- Native streaming support via API Gateway response streaming
- Pay-per-request pricing model

Changes:
- Add Lambda Web Adapter to Dockerfile for streaming support
- Replace BedrockProxy.template with API Gateway configuration
- Update README with new deployment options and latest models
- Update architecture diagram for API Gateway flow
Author: Mengxin Zhu
Date: 2025-12-05 10:54:13 +08:00 (committed by GitHub)
Parent: 0411454b3a
Commit: b41633b826
4 changed files with 136 additions and 225 deletions
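The streaming this commit enables carries OpenAI-compatible chat chunks as server-sent events through API Gateway. As a rough illustrative sketch (function names below are hypothetical, not from this repo), the SSE framing looks like:

```python
import json

def sse_event(chunk: dict) -> str:
    """Frame one OpenAI-style streaming chunk as a server-sent event."""
    return f"data: {json.dumps(chunk)}\n\n"

def sse_done() -> str:
    """OpenAI-compatible streams end with a literal [DONE] sentinel."""
    return "data: [DONE]\n\n"

# Example: two content deltas followed by the end-of-stream sentinel
chunks = [
    {"choices": [{"delta": {"content": "Hel"}}]},
    {"choices": [{"delta": {"content": "lo"}}]},
]
stream = "".join(sse_event(c) for c in chunks) + sse_done()
```

API Gateway response streaming forwards these events to the client as they are produced, instead of buffering the whole response as the REST API integration historically did.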

README.md

@@ -4,9 +4,16 @@ OpenAI-compatible RESTful APIs for Amazon Bedrock
 ## What's New 🔥
-This project now supports **Claude Sonnet 4.5**, Anthropic's most intelligent model with enhanced coding capabilities and complex agent support, available via global cross-region inference.
-It also supports reasoning for both **Claude 3.7 Sonnet** and **DeepSeek R1**. Check [How to Use](./docs/Usage.md#reasoning) for more details. You need to first run the Models API to refresh the model list.
+**API Gateway Response Streaming Support** - You can now deploy with Amazon API Gateway REST API instead of ALB, enabling true response streaming for better latency and cost optimization. See [Deployment Options](#deployment-options) for details.
+**Latest Models Supported:**
+- **Claude 4.5 Family**: Opus 4.5, Sonnet 4.5, Haiku 4.5 - Anthropic's most intelligent models with enhanced coding and agent capabilities
+- **Amazon Nova**: Nova Micro, Nova Lite, Nova Pro, Nova Premier - Amazon's native foundation models with multimodal support
+- **DeepSeek**: DeepSeek-R1 (reasoning), DeepSeek-V3.1 - Advanced reasoning and general-purpose models
+- **Qwen 3**: Qwen3-32B, Qwen3-235B, Qwen3-Coder-30B, Qwen3-Coder-480B - Alibaba's latest language and coding models
+- **OpenAI OSS**: gpt-oss-20b, gpt-oss-120b - Open-source GPT models available via Bedrock
+It also supports reasoning for **Claude 4/4.5** (extended thinking and interleaved thinking) and **DeepSeek R1**. Check [How to Use](./docs/Usage.md#reasoning) for more details. You need to first run the Models API to refresh the model list.
 ## Overview
@@ -46,13 +53,18 @@ Please make sure you have met below prerequisites:
 ### Architecture
-The following diagram illustrates the reference architecture. Note that it also includes a new **VPC** with two public subnets only for the Application Load Balancer (ALB).
+The following diagram illustrates the reference architecture. It uses [Amazon API Gateway response streaming](https://aws.amazon.com/blogs/compute/building-responsive-apis-with-amazon-api-gateway-response-streaming/) with Lambda for SSE support.
 ![Architecture](assets/arch.png)
-You can also choose to use [AWS Fargate](https://aws.amazon.com/fargate/) behind the ALB instead of [AWS Lambda](https://aws.amazon.com/lambda/), the main difference is the latency of the first byte for streaming response (Fargate is lower).
-Alternatively, you can use Lambda Function URL to replace ALB, see [example](https://github.com/awslabs/aws-lambda-web-adapter/tree/main/examples/fastapi-response-streaming)
+### Deployment Options
+| Option | Pros | Cons | Best For |
+|--------|------|------|----------|
+| **API Gateway + Lambda** | No VPC required, pay-per-request, native streaming support, lower operational overhead | Potential cold starts | Most use cases, cost-sensitive deployments |
+| **ALB + Fargate** | Lowest streaming latency, no cold starts | Higher cost, requires VPC | High-throughput, latency-sensitive workloads |
+You can also use Lambda Function URL as an alternative, see [example](https://github.com/awslabs/aws-lambda-web-adapter/tree/main/examples/fastapi-response-streaming)
 ### Deployment
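Whichever option is deployed, the stack exports an `APIBaseUrl` that becomes `OPENAI_API_BASE`. A small illustrative sketch of how the API Gateway variant's URL is shaped (the REST API id and region below are placeholders, not real resources):

```python
def api_base_url(rest_api_id: str, region: str, stage: str = "api") -> str:
    """Build the API Gateway invoke URL that the stack exports as APIBaseUrl.

    The 'api' path segment is the API Gateway stage; '/v1' is the
    application's route prefix.
    """
    return f"https://{rest_api_id}.execute-api.{region}.amazonaws.com/{stage}/v1"

base = api_base_url("abc123def4", "us-west-2")

# With the OpenAI SDK you would then point the client at this base URL, e.g.:
# client = OpenAI(base_url=base, api_key="<your key>")
# for chunk in client.chat.completions.create(model=..., messages=..., stream=True):
#     print(chunk.choices[0].delta.content or "", end="")
```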
@@ -105,8 +117,8 @@ After creation, you'll see your secret in the Secrets Manager console. Make note
 **Step 3: Deploy the CloudFormation stack**
 1. Download the CloudFormation template you want to use:
-   - For Lambda: [`deployment/BedrockProxy.template`](deployment/BedrockProxy.template)
-   - For Fargate: [`deployment/BedrockProxyFargate.template`](deployment/BedrockProxyFargate.template)
+   - For API Gateway + Lambda: [`deployment/BedrockProxy.template`](deployment/BedrockProxy.template)
+   - For ALB + Fargate: [`deployment/BedrockProxyFargate.template`](deployment/BedrockProxyFargate.template)
 2. Sign in to AWS Management Console and navigate to the CloudFormation service in your target region.
@@ -227,7 +239,7 @@ For more information about creating and managing application inference profiles,
 This proxy now supports **Prompt Caching** for Claude and Nova models, which can reduce costs by up to 90% and latency by up to 85% for workloads with repeated prompts.
 **Supported Models:**
-- Claude 3+ models (Claude 3.5 Haiku, Claude 3.7 Sonnet, Claude 4, Claude 4.5, etc.)
+- Claude models (Claude 3.5 Haiku, Claude 4, Claude 4.5, etc.)
 - Nova models (Nova Micro, Nova Lite, Nova Pro, Nova Premier)
 **Enabling Prompt Caching:**
@@ -249,7 +261,7 @@ client = OpenAI()
 # Cache system prompts
 response = client.chat.completions.create(
-    model="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
+    model="global.anthropic.claude-haiku-4-5-20251001-v1:0",
     messages=[
         {"role": "system", "content": "You are an expert assistant with knowledge of..."},
         {"role": "user", "content": "Help me with this task"}
@@ -271,7 +283,7 @@ curl $OPENAI_BASE_URL/chat/completions \
   -H "Content-Type: application/json" \
   -H "Authorization: Bearer $OPENAI_API_KEY" \
   -d '{
-    "model": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
+    "model": "global.anthropic.claude-haiku-4-5-20251001-v1:0",
     "messages": [
       {"role": "system", "content": "Long system prompt..."},
      {"role": "user", "content": "Question"}
@@ -334,9 +346,11 @@ print(response)
 This application does not collect any of your data. Furthermore, it does not log any requests or responses by default.
-### Why not used API Gateway instead of Application Load Balancer?
-Short answer is that API Gateway does not support server-sent events (SSE) for streaming response.
+### Why choose API Gateway vs ALB?
+**API Gateway + Lambda** uses [API Gateway response streaming](https://aws.amazon.com/blogs/compute/building-responsive-apis-with-amazon-api-gateway-response-streaming/) with [Lambda Web Adapter](https://github.com/awslabs/aws-lambda-web-adapter) to support SSE streaming without requiring a VPC. This is a cost-effective, serverless option with up to 10 minutes timeout.
+**ALB + Fargate** provides the lowest streaming latency with no cold starts, ideal for high-throughput workloads.
 ### Which regions are supported?
@@ -360,9 +374,9 @@ The API base url should look like `http://localhost:8000/api/v1`.
 ### Any performance sacrifice or latency increase by using the proxy APIs
-Comparing with the AWS SDK call, the referenced architecture will bring additional latency on response, you can try and test that on you own.
-Also, you can use Lambda Web Adapter + Function URL (see [example](https://github.com/awslabs/aws-lambda-web-adapter/tree/main/examples/fastapi-response-streaming)) to replace ALB or AWS Fargate to replace Lambda to get better performance on streaming response.
+Compared with direct AWS SDK calls, the proxy architecture will add some latency. The default API Gateway + Lambda deployment provides good streaming performance with Lambda response streaming.
+For lowest latency on streaming responses, consider the ALB + Fargate deployment option which eliminates cold starts and provides consistent performance.
 ### Any plan to support SageMaker models?

assets/arch.png — binary file not shown (54 KiB before, 50 KiB after)

deployment/BedrockProxy.template

@@ -1,4 +1,4 @@
-Description: Bedrock Access Gateway - OpenAI-compatible RESTful APIs for Amazon Bedrock
+Description: Bedrock Access Gateway - OpenAI-compatible RESTful APIs for Amazon Bedrock (API Gateway + Lambda with Streaming)
 Parameters:
   ApiKeySecretArn:
     Type: String
@@ -19,116 +19,8 @@ Parameters:
       - "false"
     Description: Enable prompt caching for supported models (Claude, Nova). When enabled, adds cachePoint to system prompts and messages for cost savings.
 Resources:
-  VPCB9E5F0B4:
-    Type: AWS::EC2::VPC
-    Properties:
-      CidrBlock: 10.250.0.0/16
-      EnableDnsHostnames: true
-      EnableDnsSupport: true
-      InstanceTenancy: default
-      Tags:
-        - Key: Name
-          Value: BedrockProxy/VPC
-  VPCPublicSubnet1SubnetB4246D30:
-    Type: AWS::EC2::Subnet
-    Properties:
-      AvailabilityZone:
-        Fn::Select:
-          - 0
-          - Fn::GetAZs: ""
-      CidrBlock: 10.250.0.0/24
-      MapPublicIpOnLaunch: true
-      Tags:
-        - Key: aws-cdk:subnet-name
-          Value: Public
-        - Key: aws-cdk:subnet-type
-          Value: Public
-        - Key: Name
-          Value: BedrockProxy/VPC/PublicSubnet1
-      VpcId:
-        Ref: VPCB9E5F0B4
-  VPCPublicSubnet1RouteTableFEE4B781:
-    Type: AWS::EC2::RouteTable
-    Properties:
-      Tags:
-        - Key: Name
-          Value: BedrockProxy/VPC/PublicSubnet1
-      VpcId:
-        Ref: VPCB9E5F0B4
-  VPCPublicSubnet1RouteTableAssociation0B0896DC:
-    Type: AWS::EC2::SubnetRouteTableAssociation
-    Properties:
-      RouteTableId:
-        Ref: VPCPublicSubnet1RouteTableFEE4B781
-      SubnetId:
-        Ref: VPCPublicSubnet1SubnetB4246D30
-  VPCPublicSubnet1DefaultRoute91CEF279:
-    Type: AWS::EC2::Route
-    Properties:
-      DestinationCidrBlock: 0.0.0.0/0
-      GatewayId:
-        Ref: VPCIGWB7E252D3
-      RouteTableId:
-        Ref: VPCPublicSubnet1RouteTableFEE4B781
-    DependsOn:
-      - VPCVPCGW99B986DC
-  VPCPublicSubnet2Subnet74179F39:
-    Type: AWS::EC2::Subnet
-    Properties:
-      AvailabilityZone:
-        Fn::Select:
-          - 1
-          - Fn::GetAZs: ""
-      CidrBlock: 10.250.1.0/24
-      MapPublicIpOnLaunch: true
-      Tags:
-        - Key: aws-cdk:subnet-name
-          Value: Public
-        - Key: aws-cdk:subnet-type
-          Value: Public
-        - Key: Name
-          Value: BedrockProxy/VPC/PublicSubnet2
-      VpcId:
-        Ref: VPCB9E5F0B4
-  VPCPublicSubnet2RouteTable6F1A15F1:
-    Type: AWS::EC2::RouteTable
-    Properties:
-      Tags:
-        - Key: Name
-          Value: BedrockProxy/VPC/PublicSubnet2
-      VpcId:
-        Ref: VPCB9E5F0B4
-  VPCPublicSubnet2RouteTableAssociation5A808732:
-    Type: AWS::EC2::SubnetRouteTableAssociation
-    Properties:
-      RouteTableId:
-        Ref: VPCPublicSubnet2RouteTable6F1A15F1
-      SubnetId:
-        Ref: VPCPublicSubnet2Subnet74179F39
-  VPCPublicSubnet2DefaultRouteB7481BBA:
-    Type: AWS::EC2::Route
-    Properties:
-      DestinationCidrBlock: 0.0.0.0/0
-      GatewayId:
-        Ref: VPCIGWB7E252D3
-      RouteTableId:
-        Ref: VPCPublicSubnet2RouteTable6F1A15F1
-    DependsOn:
-      - VPCVPCGW99B986DC
-  VPCIGWB7E252D3:
-    Type: AWS::EC2::InternetGateway
-    Properties:
-      Tags:
-        - Key: Name
-          Value: BedrockProxy/VPC
-  VPCVPCGW99B986DC:
-    Type: AWS::EC2::VPCGatewayAttachment
-    Properties:
-      InternetGatewayId:
-        Ref: VPCIGWB7E252D3
-      VpcId:
-        Ref: VPCB9E5F0B4
-  ProxyApiHandlerServiceRoleBE71BFB1:
+  # IAM Role for Lambda
+  ProxyApiHandlerServiceRole:
     Type: AWS::IAM::Role
     Properties:
       AssumeRolePolicyDocument:
@@ -139,12 +31,9 @@ Resources:
             Service: lambda.amazonaws.com
         Version: "2012-10-17"
       ManagedPolicyArns:
-        - Fn::Join:
-            - ""
-            - - "arn:"
-              - Ref: AWS::Partition
-              - :iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
-  ProxyApiHandlerServiceRoleDefaultPolicy86681202:
+        - !Sub "arn:${AWS::Partition}:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
+  ProxyApiHandlerServiceRoleDefaultPolicy:
     Type: AWS::IAM::Policy
     Properties:
       PolicyDocument:
@@ -166,122 +55,124 @@ Resources:
             - secretsmanager:GetSecretValue
             - secretsmanager:DescribeSecret
           Effect: Allow
-          Resource:
-            Ref: ApiKeySecretArn
+          Resource: !Ref ApiKeySecretArn
         Version: "2012-10-17"
-      PolicyName: ProxyApiHandlerServiceRoleDefaultPolicy86681202
+      PolicyName: ProxyApiHandlerServiceRoleDefaultPolicy
       Roles:
-        - Ref: ProxyApiHandlerServiceRoleBE71BFB1
-  ProxyApiHandlerEC15A492:
+        - !Ref ProxyApiHandlerServiceRole
+  # Lambda Function with Lambda Web Adapter for streaming
+  ProxyApiHandler:
     Type: AWS::Lambda::Function
     Properties:
       Architectures:
         - arm64
       Code:
-        ImageUri:
-          Ref: ContainerImageUri
-      Description: Bedrock Proxy API Handler
+        ImageUri: !Ref ContainerImageUri
+      Description: Bedrock Proxy API Handler with Response Streaming
       Environment:
        Variables:
+          # Lambda Web Adapter settings
+          AWS_LWA_INVOKE_MODE: RESPONSE_STREAM
+          AWS_LWA_READINESS_CHECK_PATH: /health
+          AWS_LWA_ASYNC_INIT: "true"
+          PORT: "8080"
+          # Application settings
           DEBUG: "false"
-          API_KEY_SECRET_ARN:
-            Ref: ApiKeySecretArn
-          DEFAULT_MODEL:
-            Ref: DefaultModelId
+          API_KEY_SECRET_ARN: !Ref ApiKeySecretArn
+          DEFAULT_MODEL: !Ref DefaultModelId
           DEFAULT_EMBEDDING_MODEL: cohere.embed-multilingual-v3
           ENABLE_CROSS_REGION_INFERENCE: "true"
           ENABLE_APPLICATION_INFERENCE_PROFILES: "true"
-          ENABLE_PROMPT_CACHING:
-            Ref: EnablePromptCaching
+          ENABLE_PROMPT_CACHING: !Ref EnablePromptCaching
+          API_ROUTE_PREFIX: /v1
       MemorySize: 1024
       PackageType: Image
-      Role:
-        Fn::GetAtt:
-          - ProxyApiHandlerServiceRoleBE71BFB1
-          - Arn
+      Role: !GetAtt ProxyApiHandlerServiceRole.Arn
       Timeout: 600
     DependsOn:
-      - ProxyApiHandlerServiceRoleDefaultPolicy86681202
-      - ProxyApiHandlerServiceRoleBE71BFB1
-  ProxyApiHandlerInvoke2UTWxhlfyqbT5FTn5jvgbLgjFfJwzswGk55DU1HYF6C33779:
+      - ProxyApiHandlerServiceRoleDefaultPolicy
+      - ProxyApiHandlerServiceRole
+  # API Gateway REST API (Regional)
+  RestApi:
+    Type: AWS::ApiGateway::RestApi
+    Properties:
+      Name: BedrockProxyApi
+      Description: Bedrock Access Gateway - OpenAI-compatible API with streaming support
+      EndpointConfiguration:
+        Types:
+          - REGIONAL
+      Body:
+        openapi: "3.0.1"
+        info:
+          title: BedrockProxyApi
+          version: "1.0"
+        paths:
+          /{proxy+}:
+            x-amazon-apigateway-any-method:
+              parameters:
+                - name: proxy
+                  in: path
+                  required: true
+                  schema:
+                    type: string
+              x-amazon-apigateway-integration:
+                type: aws_proxy
+                httpMethod: POST
+                uri: !Sub "arn:aws:apigateway:${AWS::Region}:lambda:path/2021-11-15/functions/${ProxyApiHandler.Arn}/response-streaming-invocations"
+                passthroughBehavior: when_no_match
+                timeoutInMillis: 600000
+                responseTransferMode: STREAM
+              responses:
+                default:
+                  description: Default response
+          /:
+            x-amazon-apigateway-any-method:
+              x-amazon-apigateway-integration:
+                type: aws_proxy
+                httpMethod: POST
+                uri: !Sub "arn:aws:apigateway:${AWS::Region}:lambda:path/2021-11-15/functions/${ProxyApiHandler.Arn}/response-streaming-invocations"
+                passthroughBehavior: when_no_match
+                timeoutInMillis: 600000
+                responseTransferMode: STREAM
+              responses:
+                default:
+                  description: Default response
+  # Lambda Permission for API Gateway
+  LambdaPermission:
     Type: AWS::Lambda::Permission
     Properties:
+      FunctionName: !Ref ProxyApiHandler
       Action: lambda:InvokeFunction
-      FunctionName:
-        Fn::GetAtt:
-          - ProxyApiHandlerEC15A492
-          - Arn
-      Principal: elasticloadbalancing.amazonaws.com
-  ProxyALB87756780:
-    Type: AWS::ElasticLoadBalancingV2::LoadBalancer
+      Principal: apigateway.amazonaws.com
+      SourceArn: !Sub "arn:aws:execute-api:${AWS::Region}:${AWS::AccountId}:${RestApi}/*"
+  # API Gateway Deployment
+  ApiDeployment:
+    Type: AWS::ApiGateway::Deployment
     Properties:
-      LoadBalancerAttributes:
-        - Key: deletion_protection.enabled
-          Value: "false"
-      Scheme: internet-facing
-      SecurityGroups:
-        - Fn::GetAtt:
-            - ProxyALBSecurityGroup0D6CA3DA
-            - GroupId
-      Subnets:
-        - Ref: VPCPublicSubnet1SubnetB4246D30
-        - Ref: VPCPublicSubnet2Subnet74179F39
-      Type: application
+      RestApiId: !Ref RestApi
     DependsOn:
-      - VPCPublicSubnet1DefaultRoute91CEF279
-      - VPCPublicSubnet1RouteTableAssociation0B0896DC
-      - VPCPublicSubnet2DefaultRouteB7481BBA
-      - VPCPublicSubnet2RouteTableAssociation5A808732
-  ProxyALBSecurityGroup0D6CA3DA:
-    Type: AWS::EC2::SecurityGroup
+      - RestApi
+  # API Gateway Stage
+  ApiStage:
+    Type: AWS::ApiGateway::Stage
     Properties:
-      GroupDescription: Automatically created Security Group for ELB BedrockProxyALB1CE4CAD1
-      SecurityGroupEgress:
-        - CidrIp: 255.255.255.255/32
-          Description: Disallow all traffic
-          FromPort: 252
-          IpProtocol: icmp
-          ToPort: 86
-      SecurityGroupIngress:
-        - CidrIp: 0.0.0.0/0
-          Description: Allow from anyone on port 80
-          FromPort: 80
-          IpProtocol: tcp
-          ToPort: 80
-      VpcId:
-        Ref: VPCB9E5F0B4
-  ProxyALBListener933E9515:
-    Type: AWS::ElasticLoadBalancingV2::Listener
-    Properties:
-      DefaultActions:
-        - TargetGroupArn:
-            Ref: ProxyALBListenerTargetsGroup187739FA
-          Type: forward
-      LoadBalancerArn:
-        Ref: ProxyALB87756780
-      Port: 80
-      Protocol: HTTP
-  ProxyALBListenerTargetsGroup187739FA:
-    Type: AWS::ElasticLoadBalancingV2::TargetGroup
-    Properties:
-      HealthCheckEnabled: false
-      TargetType: lambda
-      Targets:
-        - Id:
-            Fn::GetAtt:
-              - ProxyApiHandlerEC15A492
-              - Arn
-    DependsOn:
-      - ProxyApiHandlerInvoke2UTWxhlfyqbT5FTn5jvgbLgjFfJwzswGk55DU1HYF6C33779
+      RestApiId: !Ref RestApi
+      DeploymentId: !Ref ApiDeployment
+      StageName: api
+      Description: API Stage with streaming support
 Outputs:
   APIBaseUrl:
     Description: Proxy API Base URL (OPENAI_API_BASE)
-    Value:
-      Fn::Join:
-        - ""
-        - - http://
-          - Fn::GetAtt:
-              - ProxyALB87756780
-              - DNSName
-          - /api/v1
+    Value: !Sub "https://${RestApi}.execute-api.${AWS::Region}.amazonaws.com/api/v1"
+  RestApiId:
+    Description: API Gateway REST API ID
+    Value: !Ref RestApi
+  LambdaFunctionArn:
+    Description: Lambda Function ARN
+    Value: !GetAtt ProxyApiHandler.Arn

Dockerfile

@@ -1,9 +1,15 @@
 FROM public.ecr.aws/lambda/python:3.12
+# Add Lambda Web Adapter for API Gateway response streaming
+COPY --from=public.ecr.aws/awsguru/aws-lambda-adapter:0.9.1 /lambda-adapter /opt/extensions/lambda-adapter
 COPY ./api ./api
 COPY requirements.txt .
 RUN pip3 install -r requirements.txt -U --no-cache-dir
-CMD [ "api.app.handler" ]
+# Lambda Web Adapter requires overriding the Lambda base image entrypoint
+# to run the web app directly instead of the Lambda runtime handler
+ENTRYPOINT []
+CMD ["python", "-m", "uvicorn", "api.app:app", "--host", "0.0.0.0", "--port", "8080"]