Replace ALB + Lambda architecture with API Gateway REST API + Lambda
using response streaming for SSE support. This provides:
- No VPC required, reducing complexity and cost
- Native streaming support via API Gateway response streaming
- Pay-per-request pricing model
Changes:
- Add Lambda Web Adapter to Dockerfile for streaming support
- Replace BedrockProxy.template with API Gateway configuration
- Update README with new deployment options and latest models
- Update architecture diagram for API Gateway flow
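For reference, a minimal sketch of the SSE framing that a streamed response carries through API Gateway. The function name `sse_events` and the OpenAI-style `[DONE]` terminator are assumptions made for illustration, not code taken from this change.

```python
import json
from typing import Iterable, Iterator


def sse_events(chunks: Iterable[dict]) -> Iterator[bytes]:
    """Frame each chunk as one Server-Sent Event, then emit a terminator."""
    for chunk in chunks:
        # One event is a "data:" line followed by a blank line.
        yield f"data: {json.dumps(chunk)}\n\n".encode("utf-8")
    # Assumed OpenAI-style end-of-stream marker.
    yield b"data: [DONE]\n\n"
```

A generator like this can be returned from a streaming HTTP handler so that each event is flushed to the client as soon as it is produced.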
Docker BuildKit (especially with docker-container driver) may create
OCI image manifests with attestations that AWS Lambda does not support.
Lambda requires Docker V2 Schema 2 format without multi-manifest index.
This fix ensures the build script generates Lambda-compatible images
regardless of the user's Docker/BuildKit configuration.
Fixes #206
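A rough way to check a pushed image against the rule stated above is to inspect its manifest JSON (for example the output of `docker buildx imagetools inspect --raw`). The sketch below only encodes the rule as described in this message; the helper name `is_lambda_compatible` is hypothetical and is not part of the build script.

```python
# Media types that denote a multi-manifest index rather than a single image.
INDEX_MEDIA_TYPES = {
    "application/vnd.oci.image.index.v1+json",
    "application/vnd.docker.distribution.manifest.list.v2+json",
}
DOCKER_V2_SCHEMA2 = "application/vnd.docker.distribution.manifest.v2+json"


def is_lambda_compatible(manifest: dict) -> bool:
    """Return True only for a single-image Docker V2 Schema 2 manifest."""
    media_type = manifest.get("mediaType", "")
    if media_type in INDEX_MEDIA_TYPES or "manifests" in manifest:
        # An index, e.g. an image bundled with a BuildKit attestation.
        return False
    return media_type == DOCKER_V2_SCHEMA2
```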
* fix: Allow the push-to-ecr.sh script to run from any directory instead of requiring the user to cd into the script's directory first
* Add docker-compose to support running locally
Added handling for message and content block deltas, including safety checks for thinking tags that are opened but never closed.
This makes reasoning output work and makes GPT-OSS 80/120b usable in frontends that expect closing thinking tags.
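The safety check can be pictured as follows. This is a simplified sketch over a complete string, not the streaming code from this change; the tag names `<think>` and `</think>` and the helper name are assumptions.

```python
def close_open_thinking(
    text: str, open_tag: str = "<think>", close_tag: str = "</think>"
) -> str:
    """Append closing tags when the text opens more thinking blocks than it closes."""
    unclosed = text.count(open_tag) - text.count(close_tag)
    # max() guards against stray closing tags producing a negative count.
    return text + close_tag * max(unclosed, 0)
```

In a streaming setting the same count would be tracked across deltas and the closing tag emitted before the final chunk.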
- Add unified profile_metadata dictionary for both SYSTEM_DEFINED and APPLICATION inference profiles
- Remove unused region prefix functions and defaultdict import
- Add TEMPERATURE_TOPP_CONFLICT_MODELS set for Claude model parameter conflicts
- Improve model ARN parsing and error handling in profile enumeration
- Consolidate profile metadata storage to enable consistent feature detection
- Run Docker container as non-root user (appuser) to minimize security risks
- Add Docker HEALTHCHECK for better container orchestration
- Make CORS configurable via ALLOWED_ORIGINS env var with security warning
- Replace assertions with proper error handling (TypeError/ValueError)
- Add 30s timeout to HTTP requests to prevent hanging connections
- Disable auto-reload in production uvicorn settings
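As an illustration of the configurable CORS item, the sketch below reads `ALLOWED_ORIGINS` and logs a warning when every origin is allowed. The comma-separated format, the `*` default, and the function name `load_allowed_origins` are assumptions; the actual parsing in this change may differ.

```python
import logging
import os

logger = logging.getLogger(__name__)


def load_allowed_origins(env=None) -> list[str]:
    """Parse ALLOWED_ORIGINS (assumed comma-separated) into a list of origins."""
    env = os.environ if env is None else env
    raw = env.get("ALLOWED_ORIGINS", "*")  # assumed default: allow all
    origins = [item.strip() for item in raw.split(",") if item.strip()]
    if not origins or "*" in origins:
        logger.warning("CORS allows any origin; set ALLOWED_ORIGINS to restrict it.")
        return ["*"]
    return origins
```

The resulting list can be passed to the web framework's CORS middleware as its allowed-origins setting.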
This commit adds comprehensive support for Claude Sonnet 4.5 (claude-sonnet-4-5-20250929),
Anthropic's most intelligent model with enhanced coding capabilities and complex agent support.
Changes:
- Added global cross-region inference profile discovery (global.anthropic.*)
- Fixed temperature/topP compatibility for Claude Sonnet 4.5 (model doesn't support both simultaneously)
- Fixed reasoning_effort parameter handling to prevent KeyError
- Added extended thinking/interleaved thinking support via extra_body parameter
- Updated documentation with Claude Sonnet 4.5 examples (English and Chinese)
- Updated README with Sonnet 4.5 announcement
Technical Details:
- src/api/models/bedrock.py: Added global profile support in list_bedrock_models()
- src/api/models/bedrock.py: Added Claude Sonnet 4.5 detection to remove topP parameter
- src/api/models/bedrock.py: Changed pop("topP") to pop("topP", None) to prevent KeyError
- docs/Usage.md: Added Chat Completions section with Sonnet 4.5 examples
- docs/Usage.md: Updated Interleaved thinking section with Sonnet 4.5 examples
- docs/Usage_CN.md: Added Chinese versions of all Sonnet 4.5 documentation
Model ID: global.anthropic.claude-sonnet-4-5-20250929-v1:0
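The topP fix amounts to something like the sketch below. The names `CONFLICT_MODEL_MARKERS` and `drop_conflicting_top_p` are hypothetical, and dropping topP only when temperature is also set is an assumption; the point shown is that `pop("topP", None)` does not raise when the key is absent.

```python
# Hypothetical marker set; the real detection logic lives in src/api/models/bedrock.py.
CONFLICT_MODEL_MARKERS = {"claude-sonnet-4-5"}


def drop_conflicting_top_p(model_id: str, inference_config: dict) -> dict:
    """Return a copy of the config without topP for models that reject it alongside temperature."""
    config = dict(inference_config)
    if any(marker in model_id for marker in CONFLICT_MODEL_MARKERS):
        if "temperature" in config:
            # The None default avoids a KeyError when topP was never set.
            config.pop("topP", None)
    return config
```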
* models: fix Application Inference Profiles mapping to include all profiles per model_id; switch to defaultdict(set) and emit all AIPs
* Fix rebase issue
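The mapping change can be sketched as below. A plain dict keyed by model_id keeps only the last profile seen, while `defaultdict(set)` keeps all of them. The input shape (pairs of profile ARN and model_id) and the function name are assumptions for illustration.

```python
from collections import defaultdict


def group_profiles_by_model(profiles):
    """Map each model_id to every profile ARN that targets it."""
    by_model = defaultdict(set)
    for profile_arn, model_id in profiles:
        # add() accumulates profiles instead of overwriting the previous one.
        by_model[model_id].add(profile_arn)
    return by_model
```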
---------
Co-authored-by: Jeremy Brockett <313937+jbrockett@users.noreply.github.com>
Updates boto3 from 1.37.0 to 1.40.4 and botocore from 1.37.0 to 1.40.4. This update enables support for AWS_BEARER_TOKEN_BEDROCK functionality and includes the latest AWS service features and bug fixes.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-authored-by: Claude <noreply@anthropic.com>