Mengxin Zhu
b41633b826
feat(apigw): add API Gateway response streaming support (#207)
Replace ALB + Lambda architecture with API Gateway REST API + Lambda
using response streaming for SSE support. This provides:
- No VPC required, reducing complexity and cost
- Native streaming support via API Gateway response streaming
- Pay-per-request pricing model
Changes:
- Add Lambda Web Adapter to Dockerfile for streaming support
- Replace BedrockProxy.template with API Gateway configuration
- Update README with new deployment options and latest models
- Update architecture diagram for API Gateway flow
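For illustration only, the SSE framing that a streaming endpoint emits can be sketched as below. This is a minimal sketch, not code from this commit: the function name `sse_events` is hypothetical, and the `[DONE]` terminator is assumed from the OpenAI streaming convention rather than stated in the commit message.

```python
import json
from typing import Iterable, Iterator


def sse_events(chunks: Iterable[dict]) -> Iterator[str]:
    """Frame each chunk as one SSE event: a `data:` line followed by a blank line."""
    for chunk in chunks:
        yield f"data: {json.dumps(chunk)}\n\n"
    # OpenAI-style stream terminator; an assumption, not taken from the commit.
    yield "data: [DONE]\n\n"
```

Each yielded string is one complete event, so a streaming transport can flush the events to the client as they are produced instead of buffering the whole response.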
2025-12-05 10:54:13 +08:00
Kane Zhu
b4800c54a0
feat: add prompt caching support for Claude and Nova models
Add comprehensive prompt caching support with flexible control options:
Features:
- ENV variable control (ENABLE_PROMPT_CACHING, default: false)
- Per-request control via extra_body.prompt_caching
- Pattern-based model detection (Claude, Nova)
- Token limit warnings (Nova 20K limit)
- OpenAI-compatible response format (prompt_tokens_details.cached_tokens)
Supported models:
- Claude 3+ models (anthropic.claude-*)
- Nova models (amazon.nova-*)
- Auto-detection prevents breaking unsupported models
Implementation:
- System prompt caching via extra_body.prompt_caching.system
- Messages caching via extra_body.prompt_caching.messages
- Non-streaming and streaming modes
- Compatible with reasoning, thinking, and tool calls
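The control flow described above (ENV default, per-request override, pattern-based model detection) could look roughly like the sketch below. It is not the commit's implementation: the helper names and the precedence rule (a request setting overrides the ENV default) are assumptions. Only `ENABLE_PROMPT_CACHING`, the `extra_body.prompt_caching.system` / `.messages` keys, and the two model patterns come from the commit message.

```python
import os
from fnmatch import fnmatchcase

# Patterns quoted from the commit message. This sketch does not distinguish
# Claude 3+ from older Claude models, and prefixed IDs (for example
# cross-region inference profiles) would not match these patterns as written.
CACHEABLE_MODEL_PATTERNS = ("anthropic.claude-*", "amazon.nova-*")


def model_supports_caching(model_id: str) -> bool:
    """Return True if the model ID matches one of the cacheable patterns."""
    return any(fnmatchcase(model_id, p) for p in CACHEABLE_MODEL_PATTERNS)


def resolve_prompt_caching(model_id, extra_body=None, env=None):
    """Decide whether to cache the system prompt and the messages.

    Assumed precedence: a per-request value overrides the ENV default.
    Unsupported models never get caching, so they are not broken by it.
    """
    env = os.environ if env is None else env
    default = env.get("ENABLE_PROMPT_CACHING", "false").lower() == "true"
    if not model_supports_caching(model_id):
        return {"system": False, "messages": False}
    request_cfg = (extra_body or {}).get("prompt_caching", {})
    return {
        "system": bool(request_cfg.get("system", default)),
        "messages": bool(request_cfg.get("messages", default)),
    }
```

A client using the OpenAI Python SDK would typically pass the per-request setting as `extra_body={"prompt_caching": {"system": True}}` and read the cache hit count from `usage.prompt_tokens_details.cached_tokens` in the response.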
2025-10-15 11:03:19 +08:00
Mengxin Zhu
e3ee9a707f
docs: update deployment instructions and enhance ECR push script
2025-09-30 16:06:21 +08:00
Gagan M
01836087b1
feat: add support for including application inference profiles as models (#131)
---------
Co-authored-by: Mengxin Zhu <843303+zxkane@users.noreply.github.com>
2025-06-23 22:49:27 +08:00
Aiden Dai
c39f6bc942
Use secrets manager for api key
2025-02-10 15:25:12 +08:00
Aiden Dai
4d88731233
Use secrets manager for api key
2025-02-08 21:36:59 +08:00
Aiden Dai
dc067affc0
Use yaml template
2024-12-16 16:33:37 +08:00
heisenbergye
5f7676608a
support Cross-Region Inference for all Claude models (#65)
2024-10-29 14:43:31 +08:00
Aiden Dai
2bee83a79a
Add support for Sydney region
2024-04-12 13:51:57 +08:00
Aiden Dai
b7adaf3040
Update cfn template
2024-04-08 10:45:56 +08:00
Aiden Dai
f974cb2728
Initial commit
2024-03-27 15:20:24 +08:00