13 Commits

Author SHA1 Message Date
Mengxin Zhu
b41633b826 feat(apigw): add API Gateway response streaming support (#207)
Replace ALB + Lambda architecture with API Gateway REST API + Lambda
using response streaming for SSE support. This provides:

- No VPC required, reducing complexity and cost
- Native streaming support via API Gateway response streaming
- Pay-per-request pricing model

Changes:
- Add Lambda Web Adapter to Dockerfile for streaming support
- Replace BedrockProxy.template with API Gateway configuration
- Update README with new deployment options and latest models
- Update architecture diagram for API Gateway flow
2025-12-05 10:54:13 +08:00
Kane Zhu
b4800c54a0 feat: add prompt caching support for Claude and Nova models
Add comprehensive prompt caching support with flexible control options:

Features:
- ENV variable control (ENABLE_PROMPT_CACHING, default: false)
- Per-request control via extra_body.prompt_caching
- Pattern-based model detection (Claude, Nova)
- Token limit warnings (Nova 20K limit)
- OpenAI-compatible response format (prompt_tokens_details.cached_tokens)

Supported models:
- Claude 3+ models (anthropic.claude-*)
- Nova models (amazon.nova-*)
- Auto-detection prevents breaking unsupported models

Implementation:
- System prompts caching via extra_body.prompt_caching.system
- Messages caching via extra_body.prompt_caching.messages
- Non-streaming and streaming modes
- Compatible with reasoning, thinking, and tool calls
2025-10-15 11:03:19 +08:00
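The control flow described in the prompt caching commit above (an environment default, a per-request override, and pattern-based model detection) can be sketched roughly as follows. This is a minimal illustration, not the repository's code. The helper names `model_supports_caching` and `resolve_prompt_caching` are hypothetical. Two behaviours are assumptions: that `system` and `messages` are boolean flags, and that a per-request value takes precedence over `ENABLE_PROMPT_CACHING`. Only the variable name, the `extra_body.prompt_caching` keys, and the two model patterns come from the commit message.

```python
import fnmatch
import os

# Model ID patterns quoted in the commit message. Matching them with
# fnmatch-style globs is an assumption made for this sketch.
CACHEABLE_MODEL_PATTERNS = ("anthropic.claude-*", "amazon.nova-*")


def model_supports_caching(model_id):
    """Hypothetical helper: True if the model ID matches a cacheable pattern."""
    return any(
        fnmatch.fnmatchcase(model_id, pattern)
        for pattern in CACHEABLE_MODEL_PATTERNS
    )


def resolve_prompt_caching(model_id, extra_body=None, env=None):
    """Hypothetical helper: decide whether to cache system prompts and messages.

    Assumed precedence: per-request flag, then ENABLE_PROMPT_CACHING (default
    false). Unsupported models never get caching, so they are not broken.
    """
    env = os.environ if env is None else env
    default = env.get("ENABLE_PROMPT_CACHING", "false").lower() == "true"

    if not model_supports_caching(model_id):
        return {"system": False, "messages": False}

    requested = (extra_body or {}).get("prompt_caching", {})
    return {
        "system": bool(requested.get("system", default)),
        "messages": bool(requested.get("messages", default)),
    }
```

With these assumptions, a request that sets only `extra_body={"prompt_caching": {"system": True}}` on a Nova model would cache the system prompt and leave message caching at the environment default.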
Li Yi
9cea7f9314 chore: polish code with little update (#182)
- Run Docker container as non-root user (appuser) to minimize security risks
- Add Docker HEALTHCHECK for better container orchestration
- Make CORS configurable via ALLOWED_ORIGINS env var with security warning
- Replace assertions with proper error handling (TypeError/ValueError)
- Add 30s timeout to HTTP requests to prevent hanging connections
- Disable auto-reload in production uvicorn settings
2025-10-11 14:49:18 +08:00
Mengxin Zhu
e3ee9a707f docs: update deployment instructions and enhance ECR push script 2025-09-30 16:06:21 +08:00
Gagan M
01836087b1 feat: add support to include application inference profiles as models (#131)
---------

Co-authored-by: Mengxin Zhu <843303+zxkane@users.noreply.github.com>
2025-06-23 22:49:27 +08:00
Aiden Dai
c39f6bc942 Use secrets manager for api key 2025-02-10 15:25:12 +08:00
Aiden Dai
a6f3e1176b fix secret access issue 2025-02-09 06:53:23 +08:00
Aiden Dai
4d88731233 Use secrets manager for api key 2025-02-08 21:36:59 +08:00
Aiden Dai
dc067affc0 Use yaml template 2024-12-16 16:33:37 +08:00
heisenbergye
5f7676608a support all Claude models Cross-Region Inference (#65) 2024-10-29 14:43:31 +08:00
Aiden Dai
2bee83a79a Add support of Sydney region 2024-04-12 13:51:57 +08:00
Aiden Dai
b7adaf3040 Update cfn template 2024-04-08 10:45:56 +08:00
Aiden Dai
f974cb2728 Initial commit 2024-03-27 15:20:24 +08:00