diff --git a/README.md b/README.md index 306990e..362dee9 100644 --- a/README.md +++ b/README.md @@ -4,9 +4,16 @@ OpenAI-compatible RESTful APIs for Amazon Bedrock ## What's New 🔥 -This project now supports **Claude Sonnet 4.5**, Anthropic's most intelligent model with enhanced coding capabilities and complex agent support, available via global cross-region inference. +**API Gateway Response Streaming Support** - You can now deploy with Amazon API Gateway REST API instead of ALB, enabling true response streaming for better latency and cost optimization. See [Deployment Options](#deployment-options) for details. -It also supports reasoning for both **Claude 3.7 Sonnet** and **DeepSeek R1**. Check [How to Use](./docs/Usage.md#reasoning) for more details. You need to first run the Models API to refresh the model list. +**Latest Models Supported:** +- **Claude 4.5 Family**: Opus 4.5, Sonnet 4.5, Haiku 4.5 - Anthropic's most intelligent models with enhanced coding and agent capabilities +- **Amazon Nova**: Nova Micro, Nova Lite, Nova Pro, Nova Premier - Amazon's native foundation models with multimodal support +- **DeepSeek**: DeepSeek-R1 (reasoning), DeepSeek-V3.1 - Advanced reasoning and general-purpose models +- **Qwen 3**: Qwen3-32B, Qwen3-235B, Qwen3-Coder-30B, Qwen3-Coder-480B - Alibaba's latest language and coding models +- **OpenAI OSS**: gpt-oss-20b, gpt-oss-120b - Open-source GPT models available via Bedrock + +It also supports reasoning for **Claude 4/4.5** (extended thinking and interleaved thinking) and **DeepSeek R1**. Check [How to Use](./docs/Usage.md#reasoning) for more details. You need to first run the Models API to refresh the model list. ## Overview @@ -46,13 +53,18 @@ Please make sure you have met below prerequisites: ### Architecture -The following diagram illustrates the reference architecture. Note that it also includes a new **VPC** with two public subnets only for the Application Load Balancer (ALB). 
+The following diagram illustrates the reference architecture. It uses [Amazon API Gateway response streaming](https://aws.amazon.com/blogs/compute/building-responsive-apis-with-amazon-api-gateway-response-streaming/) with Lambda for SSE support. ![Architecture](assets/arch.png) -You can also choose to use [AWS Fargate](https://aws.amazon.com/fargate/) behind the ALB instead of [AWS Lambda](https://aws.amazon.com/lambda/), the main difference is the latency of the first byte for streaming response (Fargate is lower). +### Deployment Options -Alternatively, you can use Lambda Function URL to replace ALB, see [example](https://github.com/awslabs/aws-lambda-web-adapter/tree/main/examples/fastapi-response-streaming) +| Option | Pros | Cons | Best For | +|--------|------|------|----------| +| **API Gateway + Lambda** | No VPC required, pay-per-request, native streaming support, lower operational overhead | Potential cold starts | Most use cases, cost-sensitive deployments | +| **ALB + Fargate** | Lowest streaming latency, no cold starts | Higher cost, requires VPC | High-throughput, latency-sensitive workloads | + +You can also use Lambda Function URL as an alternative, see [example](https://github.com/awslabs/aws-lambda-web-adapter/tree/main/examples/fastapi-response-streaming) ### Deployment @@ -105,8 +117,8 @@ After creation, you'll see your secret in the Secrets Manager console. Make note **Step 3: Deploy the CloudFormation stack** 1. Download the CloudFormation template you want to use: - - For Lambda: [`deployment/BedrockProxy.template`](deployment/BedrockProxy.template) - - For Fargate: [`deployment/BedrockProxyFargate.template`](deployment/BedrockProxyFargate.template) + - For API Gateway + Lambda: [`deployment/BedrockProxy.template`](deployment/BedrockProxy.template) + - For ALB + Fargate: [`deployment/BedrockProxyFargate.template`](deployment/BedrockProxyFargate.template) 2. 
Sign in to AWS Management Console and navigate to the CloudFormation service in your target region. @@ -227,7 +239,7 @@ For more information about creating and managing application inference profiles, This proxy now supports **Prompt Caching** for Claude and Nova models, which can reduce costs by up to 90% and latency by up to 85% for workloads with repeated prompts. **Supported Models:** -- Claude 3+ models (Claude 3.5 Haiku, Claude 3.7 Sonnet, Claude 4, Claude 4.5, etc.) +- Claude models (Claude 3.5 Haiku, Claude 4, Claude 4.5, etc.) - Nova models (Nova Micro, Nova Lite, Nova Pro, Nova Premier) **Enabling Prompt Caching:** @@ -249,7 +261,7 @@ client = OpenAI() # Cache system prompts response = client.chat.completions.create( - model="us.anthropic.claude-3-7-sonnet-20250219-v1:0", + model="global.anthropic.claude-haiku-4-5-20251001-v1:0", messages=[ {"role": "system", "content": "You are an expert assistant with knowledge of..."}, {"role": "user", "content": "Help me with this task"} @@ -271,7 +283,7 @@ curl $OPENAI_BASE_URL/chat/completions \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $OPENAI_API_KEY" \ -d '{ - "model": "us.anthropic.claude-3-7-sonnet-20250219-v1:0", + "model": "global.anthropic.claude-haiku-4-5-20251001-v1:0", "messages": [ {"role": "system", "content": "Long system prompt..."}, {"role": "user", "content": "Question"} @@ -334,9 +346,11 @@ print(response) This application does not collect any of your data. Furthermore, it does not log any requests or responses by default. -### Why not used API Gateway instead of Application Load Balancer? +### Why choose API Gateway vs ALB? -Short answer is that API Gateway does not support server-sent events (SSE) for streaming response. 
+**API Gateway + Lambda** uses [API Gateway response streaming](https://aws.amazon.com/blogs/compute/building-responsive-apis-with-amazon-api-gateway-response-streaming/) with [Lambda Web Adapter](https://github.com/awslabs/aws-lambda-web-adapter) to support SSE streaming without requiring a VPC. This is a cost-effective, serverless option with up to 10 minutes timeout. + +**ALB + Fargate** provides the lowest streaming latency with no cold starts, ideal for high-throughput workloads. ### Which regions are supported? @@ -360,9 +374,9 @@ The API base url should look like `http://localhost:8000/api/v1`. ### Any performance sacrifice or latency increase by using the proxy APIs -Comparing with the AWS SDK call, the referenced architecture will bring additional latency on response, you can try and test that on you own. +Compared with direct AWS SDK calls, the proxy architecture will add some latency. The default API Gateway + Lambda deployment provides good streaming performance with Lambda response streaming. -Also, you can use Lambda Web Adapter + Function URL (see [example](https://github.com/awslabs/aws-lambda-web-adapter/tree/main/examples/fastapi-response-streaming)) to replace ALB or AWS Fargate to replace Lambda to get better performance on streaming response. +For lowest latency on streaming responses, consider the ALB + Fargate deployment option which eliminates cold starts and provides consistent performance. ### Any plan to support SageMaker models? 
diff --git a/assets/arch.png b/assets/arch.png index 3740a9f..7adb0b8 100644 Binary files a/assets/arch.png and b/assets/arch.png differ diff --git a/deployment/BedrockProxy.template b/deployment/BedrockProxy.template index 1b15de4..6f6cda3 100644 --- a/deployment/BedrockProxy.template +++ b/deployment/BedrockProxy.template @@ -1,4 +1,4 @@ -Description: Bedrock Access Gateway - OpenAI-compatible RESTful APIs for Amazon Bedrock +Description: Bedrock Access Gateway - OpenAI-compatible RESTful APIs for Amazon Bedrock (API Gateway + Lambda with Streaming) Parameters: ApiKeySecretArn: Type: String @@ -19,116 +19,8 @@ Parameters: - "false" Description: Enable prompt caching for supported models (Claude, Nova). When enabled, adds cachePoint to system prompts and messages for cost savings. Resources: - VPCB9E5F0B4: - Type: AWS::EC2::VPC - Properties: - CidrBlock: 10.250.0.0/16 - EnableDnsHostnames: true - EnableDnsSupport: true - InstanceTenancy: default - Tags: - - Key: Name - Value: BedrockProxy/VPC - VPCPublicSubnet1SubnetB4246D30: - Type: AWS::EC2::Subnet - Properties: - AvailabilityZone: - Fn::Select: - - 0 - - Fn::GetAZs: "" - CidrBlock: 10.250.0.0/24 - MapPublicIpOnLaunch: true - Tags: - - Key: aws-cdk:subnet-name - Value: Public - - Key: aws-cdk:subnet-type - Value: Public - - Key: Name - Value: BedrockProxy/VPC/PublicSubnet1 - VpcId: - Ref: VPCB9E5F0B4 - VPCPublicSubnet1RouteTableFEE4B781: - Type: AWS::EC2::RouteTable - Properties: - Tags: - - Key: Name - Value: BedrockProxy/VPC/PublicSubnet1 - VpcId: - Ref: VPCB9E5F0B4 - VPCPublicSubnet1RouteTableAssociation0B0896DC: - Type: AWS::EC2::SubnetRouteTableAssociation - Properties: - RouteTableId: - Ref: VPCPublicSubnet1RouteTableFEE4B781 - SubnetId: - Ref: VPCPublicSubnet1SubnetB4246D30 - VPCPublicSubnet1DefaultRoute91CEF279: - Type: AWS::EC2::Route - Properties: - DestinationCidrBlock: 0.0.0.0/0 - GatewayId: - Ref: VPCIGWB7E252D3 - RouteTableId: - Ref: VPCPublicSubnet1RouteTableFEE4B781 - DependsOn: - - 
VPCVPCGW99B986DC - VPCPublicSubnet2Subnet74179F39: - Type: AWS::EC2::Subnet - Properties: - AvailabilityZone: - Fn::Select: - - 1 - - Fn::GetAZs: "" - CidrBlock: 10.250.1.0/24 - MapPublicIpOnLaunch: true - Tags: - - Key: aws-cdk:subnet-name - Value: Public - - Key: aws-cdk:subnet-type - Value: Public - - Key: Name - Value: BedrockProxy/VPC/PublicSubnet2 - VpcId: - Ref: VPCB9E5F0B4 - VPCPublicSubnet2RouteTable6F1A15F1: - Type: AWS::EC2::RouteTable - Properties: - Tags: - - Key: Name - Value: BedrockProxy/VPC/PublicSubnet2 - VpcId: - Ref: VPCB9E5F0B4 - VPCPublicSubnet2RouteTableAssociation5A808732: - Type: AWS::EC2::SubnetRouteTableAssociation - Properties: - RouteTableId: - Ref: VPCPublicSubnet2RouteTable6F1A15F1 - SubnetId: - Ref: VPCPublicSubnet2Subnet74179F39 - VPCPublicSubnet2DefaultRouteB7481BBA: - Type: AWS::EC2::Route - Properties: - DestinationCidrBlock: 0.0.0.0/0 - GatewayId: - Ref: VPCIGWB7E252D3 - RouteTableId: - Ref: VPCPublicSubnet2RouteTable6F1A15F1 - DependsOn: - - VPCVPCGW99B986DC - VPCIGWB7E252D3: - Type: AWS::EC2::InternetGateway - Properties: - Tags: - - Key: Name - Value: BedrockProxy/VPC - VPCVPCGW99B986DC: - Type: AWS::EC2::VPCGatewayAttachment - Properties: - InternetGatewayId: - Ref: VPCIGWB7E252D3 - VpcId: - Ref: VPCB9E5F0B4 - ProxyApiHandlerServiceRoleBE71BFB1: + # IAM Role for Lambda + ProxyApiHandlerServiceRole: Type: AWS::IAM::Role Properties: AssumeRolePolicyDocument: @@ -139,12 +31,9 @@ Resources: Service: lambda.amazonaws.com Version: "2012-10-17" ManagedPolicyArns: - - Fn::Join: - - "" - - - "arn:" - - Ref: AWS::Partition - - :iam::aws:policy/service-role/AWSLambdaBasicExecutionRole - ProxyApiHandlerServiceRoleDefaultPolicy86681202: + - !Sub "arn:${AWS::Partition}:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole" + + ProxyApiHandlerServiceRoleDefaultPolicy: Type: AWS::IAM::Policy Properties: PolicyDocument: @@ -166,122 +55,124 @@ Resources: - secretsmanager:GetSecretValue - secretsmanager:DescribeSecret Effect: Allow - 
Resource: - Ref: ApiKeySecretArn + Resource: !Ref ApiKeySecretArn Version: "2012-10-17" - PolicyName: ProxyApiHandlerServiceRoleDefaultPolicy86681202 + PolicyName: ProxyApiHandlerServiceRoleDefaultPolicy Roles: - - Ref: ProxyApiHandlerServiceRoleBE71BFB1 - ProxyApiHandlerEC15A492: + - !Ref ProxyApiHandlerServiceRole + + # Lambda Function with Lambda Web Adapter for streaming + ProxyApiHandler: Type: AWS::Lambda::Function Properties: Architectures: - arm64 Code: - ImageUri: - Ref: ContainerImageUri - Description: Bedrock Proxy API Handler + ImageUri: !Ref ContainerImageUri + Description: Bedrock Proxy API Handler with Response Streaming Environment: Variables: + # Lambda Web Adapter settings + AWS_LWA_INVOKE_MODE: RESPONSE_STREAM + AWS_LWA_READINESS_CHECK_PATH: /health + AWS_LWA_ASYNC_INIT: "true" + PORT: "8080" + # Application settings DEBUG: "false" - API_KEY_SECRET_ARN: - Ref: ApiKeySecretArn - DEFAULT_MODEL: - Ref: DefaultModelId + API_KEY_SECRET_ARN: !Ref ApiKeySecretArn + DEFAULT_MODEL: !Ref DefaultModelId DEFAULT_EMBEDDING_MODEL: cohere.embed-multilingual-v3 ENABLE_CROSS_REGION_INFERENCE: "true" ENABLE_APPLICATION_INFERENCE_PROFILES: "true" - ENABLE_PROMPT_CACHING: - Ref: EnablePromptCaching + ENABLE_PROMPT_CACHING: !Ref EnablePromptCaching + API_ROUTE_PREFIX: /v1 MemorySize: 1024 PackageType: Image - Role: - Fn::GetAtt: - - ProxyApiHandlerServiceRoleBE71BFB1 - - Arn + Role: !GetAtt ProxyApiHandlerServiceRole.Arn Timeout: 600 DependsOn: - - ProxyApiHandlerServiceRoleDefaultPolicy86681202 - - ProxyApiHandlerServiceRoleBE71BFB1 - ProxyApiHandlerInvoke2UTWxhlfyqbT5FTn5jvgbLgjFfJwzswGk55DU1HYF6C33779: + - ProxyApiHandlerServiceRoleDefaultPolicy + - ProxyApiHandlerServiceRole + + # API Gateway REST API (Regional) + RestApi: + Type: AWS::ApiGateway::RestApi + Properties: + Name: BedrockProxyApi + Description: Bedrock Access Gateway - OpenAI-compatible API with streaming support + EndpointConfiguration: + Types: + - REGIONAL + Body: + openapi: "3.0.1" + info: + 
title: BedrockProxyApi + version: "1.0" + paths: + /{proxy+}: + x-amazon-apigateway-any-method: + parameters: + - name: proxy + in: path + required: true + schema: + type: string + x-amazon-apigateway-integration: + type: aws_proxy + httpMethod: POST + uri: !Sub "arn:aws:apigateway:${AWS::Region}:lambda:path/2021-11-15/functions/${ProxyApiHandler.Arn}/response-streaming-invocations" + passthroughBehavior: when_no_match + timeoutInMillis: 600000 + responseTransferMode: STREAM + responses: + default: + description: Default response + /: + x-amazon-apigateway-any-method: + x-amazon-apigateway-integration: + type: aws_proxy + httpMethod: POST + uri: !Sub "arn:aws:apigateway:${AWS::Region}:lambda:path/2021-11-15/functions/${ProxyApiHandler.Arn}/response-streaming-invocations" + passthroughBehavior: when_no_match + timeoutInMillis: 600000 + responseTransferMode: STREAM + responses: + default: + description: Default response + + # Lambda Permission for API Gateway + LambdaPermission: Type: AWS::Lambda::Permission Properties: + FunctionName: !Ref ProxyApiHandler Action: lambda:InvokeFunction - FunctionName: - Fn::GetAtt: - - ProxyApiHandlerEC15A492 - - Arn - Principal: elasticloadbalancing.amazonaws.com - ProxyALB87756780: - Type: AWS::ElasticLoadBalancingV2::LoadBalancer + Principal: apigateway.amazonaws.com + SourceArn: !Sub "arn:aws:execute-api:${AWS::Region}:${AWS::AccountId}:${RestApi}/*" + + # API Gateway Deployment + ApiDeployment: + Type: AWS::ApiGateway::Deployment Properties: - LoadBalancerAttributes: - - Key: deletion_protection.enabled - Value: "false" - Scheme: internet-facing - SecurityGroups: - - Fn::GetAtt: - - ProxyALBSecurityGroup0D6CA3DA - - GroupId - Subnets: - - Ref: VPCPublicSubnet1SubnetB4246D30 - - Ref: VPCPublicSubnet2Subnet74179F39 - Type: application + RestApiId: !Ref RestApi DependsOn: - - VPCPublicSubnet1DefaultRoute91CEF279 - - VPCPublicSubnet1RouteTableAssociation0B0896DC - - VPCPublicSubnet2DefaultRouteB7481BBA - - 
VPCPublicSubnet2RouteTableAssociation5A808732 - ProxyALBSecurityGroup0D6CA3DA: - Type: AWS::EC2::SecurityGroup + - RestApi + + # API Gateway Stage + ApiStage: + Type: AWS::ApiGateway::Stage Properties: - GroupDescription: Automatically created Security Group for ELB BedrockProxyALB1CE4CAD1 - SecurityGroupEgress: - - CidrIp: 255.255.255.255/32 - Description: Disallow all traffic - FromPort: 252 - IpProtocol: icmp - ToPort: 86 - SecurityGroupIngress: - - CidrIp: 0.0.0.0/0 - Description: Allow from anyone on port 80 - FromPort: 80 - IpProtocol: tcp - ToPort: 80 - VpcId: - Ref: VPCB9E5F0B4 - ProxyALBListener933E9515: - Type: AWS::ElasticLoadBalancingV2::Listener - Properties: - DefaultActions: - - TargetGroupArn: - Ref: ProxyALBListenerTargetsGroup187739FA - Type: forward - LoadBalancerArn: - Ref: ProxyALB87756780 - Port: 80 - Protocol: HTTP - ProxyALBListenerTargetsGroup187739FA: - Type: AWS::ElasticLoadBalancingV2::TargetGroup - Properties: - HealthCheckEnabled: false - TargetType: lambda - Targets: - - Id: - Fn::GetAtt: - - ProxyApiHandlerEC15A492 - - Arn - DependsOn: - - ProxyApiHandlerInvoke2UTWxhlfyqbT5FTn5jvgbLgjFfJwzswGk55DU1HYF6C33779 + RestApiId: !Ref RestApi + DeploymentId: !Ref ApiDeployment + StageName: api + Description: API Stage with streaming support + Outputs: APIBaseUrl: Description: Proxy API Base URL (OPENAI_API_BASE) - Value: - Fn::Join: - - "" - - - http:// - - Fn::GetAtt: - - ProxyALB87756780 - - DNSName - - /api/v1 - + Value: !Sub "https://${RestApi}.execute-api.${AWS::Region}.amazonaws.com/api/v1" + RestApiId: + Description: API Gateway REST API ID + Value: !Ref RestApi + LambdaFunctionArn: + Description: Lambda Function ARN + Value: !GetAtt ProxyApiHandler.Arn diff --git a/src/Dockerfile b/src/Dockerfile index 920a01e..c0cf065 100644 --- a/src/Dockerfile +++ b/src/Dockerfile @@ -1,9 +1,15 @@ FROM public.ecr.aws/lambda/python:3.12 +# Add Lambda Web Adapter for API Gateway response streaming +COPY 
--from=public.ecr.aws/awsguru/aws-lambda-adapter:0.9.1 /lambda-adapter /opt/extensions/lambda-adapter + COPY ./api ./api COPY requirements.txt . RUN pip3 install -r requirements.txt -U --no-cache-dir -CMD [ "api.app.handler" ] \ No newline at end of file +# Lambda Web Adapter requires overriding the Lambda base image entrypoint +# to run the web app directly instead of the Lambda runtime handler +ENTRYPOINT [] +CMD ["python", "-m", "uvicorn", "api.app:app", "--host", "0.0.0.0", "--port", "8080"] \ No newline at end of file
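
Reviewer note: the new `APIBaseUrl` output and the `RESPONSE_STREAM` invoke mode can be sanity-checked from the client side with a small sketch like the one below. It is not part of this change. The helper names `api_base_url` and `iter_sse_json` are hypothetical, the stage `api` and prefix `/v1` are taken from the template above, and the `data: ...` / `data: [DONE]` framing is an assumption based on the usual OpenAI-style SSE stream, which this proxy is expected (but not confirmed here) to emit.

```python
import json
from typing import Iterable, Iterator


def api_base_url(rest_api_id: str, region: str,
                 stage: str = "api", route_prefix: str = "/v1") -> str:
    """Rebuild the URL shape of the template's APIBaseUrl output.

    Hypothetical helper: mirrors
    https://${RestApi}.execute-api.${AWS::Region}.amazonaws.com/api/v1
    """
    return (
        f"https://{rest_api_id}.execute-api.{region}.amazonaws.com"
        f"/{stage}{route_prefix}"
    )


def iter_sse_json(lines: Iterable[str]) -> Iterator[dict]:
    """Yield JSON payloads from SSE 'data:' lines until '[DONE]' is seen.

    Assumes OpenAI-style framing; blank separators, comments (': ...')
    and other SSE fields are skipped.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        yield json.loads(payload)


if __name__ == "__main__":
    # Placeholder API ID and region, for illustration only.
    print(api_base_url("abc123", "us-east-1"))

    # Simulated stream; a real client would iterate over the HTTP response lines.
    sample = ['data: {"chunk": 1}', "", 'data: {"chunk": 2}', "data: [DONE]"]
    for event in iter_sse_json(sample):
        print(event)
```

With a real deployment, the lines passed to `iter_sse_json` would come from a streaming HTTP response to `<base_url>/chat/completions`. Seeing chunks arrive incrementally, rather than all at once, is a quick way to confirm that `responseTransferMode: STREAM` is in effect.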