Kane Zhu
b4800c54a0
feat: add prompt caching support for Claude and Nova models
Add comprehensive prompt caching support with flexible control options:
Features:
- ENV variable control (ENABLE_PROMPT_CACHING, default: false)
- Per-request control via extra_body.prompt_caching
- Pattern-based model detection (Claude, Nova)
- Token limit warnings (Nova 20K limit)
- OpenAI-compatible response format (prompt_tokens_details.cached_tokens)
Supported models:
- Claude 3+ models (anthropic.claude-*)
- Nova models (amazon.nova-*)
- Auto-detection prevents breaking unsupported models
Implementation:
- System prompt caching via extra_body.prompt_caching.system
- Message caching via extra_body.prompt_caching.messages
- Non-streaming and streaming modes
- Compatible with reasoning, thinking, and tool calls
2025-10-15 11:03:19 +08:00
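The commit above combines four controls: an env default, a per-request override, a model-pattern check, and a Nova token warning. Their interaction can be sketched as below. This is a minimal Python sketch under stated assumptions, not the project's code. The function names are hypothetical, and so are the following choices: treating `true`/`1`/`yes` as enabled values of `ENABLE_PROMPT_CACHING`, reading per-request flags from a `prompt_caching` dict with `system`/`messages` keys in `extra_body`, and reading "Nova 20K limit" as 20,000 tokens.

```python
import os
import re
import warnings

# Hypothetical names. The patterns mirror the anthropic.claude-* and amazon.nova-*
# families listed in the commit message; note the Claude pattern alone would also
# match pre-Claude-3 IDs, which the commit says are not supported.
CACHEABLE_MODEL_PATTERNS = (
    re.compile(r"anthropic\.claude-"),
    re.compile(r"amazon\.nova-"),
)
NOVA_CACHE_TOKEN_LIMIT = 20_000  # assumed reading of "Nova 20K limit"


def env_caching_enabled(env=None) -> bool:
    """ENABLE_PROMPT_CACHING defaults to false when unset (truthy spellings assumed)."""
    env = os.environ if env is None else env
    return env.get("ENABLE_PROMPT_CACHING", "false").strip().lower() in {"1", "true", "yes"}


def model_supports_caching(model_id: str) -> bool:
    # search() rather than match(), so prefixed IDs such as
    # "us.anthropic.claude-..." are detected as well.
    return any(p.search(model_id) for p in CACHEABLE_MODEL_PATTERNS)


def resolve_caching(model_id: str, extra_body=None, env=None) -> tuple[bool, bool]:
    """Return (cache_system, cache_messages) for a single request."""
    if not model_supports_caching(model_id):
        return False, False  # never enable caching on unsupported models
    default = env_caching_enabled(env)
    opts = (extra_body or {}).get("prompt_caching")
    if not isinstance(opts, dict):
        return default, default  # no per-request override: use the env setting
    return bool(opts.get("system", default)), bool(opts.get("messages", default))


def nova_limit_exceeded(model_id: str, estimated_tokens: int) -> bool:
    """Warn and return True when a Nova prompt exceeds the assumed cache limit."""
    if "amazon.nova-" in model_id and estimated_tokens > NOVA_CACHE_TOKEN_LIMIT:
        warnings.warn(
            f"{model_id}: ~{estimated_tokens} tokens exceeds the "
            f"{NOVA_CACHE_TOKEN_LIMIT}-token Nova caching limit"
        )
        return True
    return False
```

For example, `resolve_caching("amazon.nova-pro-v1:0", {"prompt_caching": {"system": True, "messages": False}})` caches only the system prompt regardless of the env default. A model ID matching neither pattern always resolves to `(False, False)`, which is how auto-detection keeps unsupported models unaffected.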
Li Yi
9cea7f9314
chore: polish code with little update (#182)
- Run Docker container as non-root user (appuser) to minimize security risks
- Add Docker HEALTHCHECK for better container orchestration
- Make CORS configurable via ALLOWED_ORIGINS env var with security warning
- Replace assertions with proper error handling (TypeError/ValueError)
- Add 30s timeout to HTTP requests to prevent hanging connections
- Disable auto-reload in production uvicorn settings
2025-10-11 14:49:18 +08:00
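The configurable CORS change in this commit can be illustrated with a small parser for `ALLOWED_ORIGINS`. This is a hypothetical sketch: the commit does not state the variable's format. The comma-separated list, the `*` default when unset, and emitting the security warning through the standard `logging` module are all assumptions.

```python
import logging
import os

logger = logging.getLogger(__name__)


def parse_allowed_origins(env=None) -> list[str]:
    """Hypothetical: read ALLOWED_ORIGINS as a comma-separated list of origins."""
    env = os.environ if env is None else env
    raw = env.get("ALLOWED_ORIGINS", "*")  # assumed default: allow all origins
    # Split on commas and drop blank entries left by stray separators or spaces.
    origins = [o.strip() for o in raw.split(",") if o.strip()]
    if not origins:
        origins = ["*"]
    if "*" in origins:
        # The security warning the commit mentions, modeled here as a log message.
        logger.warning("CORS allows all origins; set ALLOWED_ORIGINS in production")
    return origins
```

Assuming the service is a FastAPI app (the commit itself only mentions uvicorn), the returned list could be passed as `allow_origins` to `fastapi.middleware.cors.CORSMiddleware`.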
Mengxin Zhu
e3ee9a707f
docs: update deployment instructions and enhance ECR push script
2025-09-30 16:06:21 +08:00
Gagan M
01836087b1
feat: add support to include application inference profiles as models (#131)
---------
Co-authored-by: Mengxin Zhu <843303+zxkane@users.noreply.github.com>
2025-06-23 22:49:27 +08:00
Aiden Dai
c39f6bc942
Use secrets manager for api key
2025-02-10 15:25:12 +08:00
Aiden Dai
a6f3e1176b
fix secret access issue
2025-02-09 06:53:23 +08:00
Aiden Dai
4d88731233
Use secrets manager for api key
2025-02-08 21:36:59 +08:00
Aiden Dai
dc067affc0
Use yaml template
2024-12-16 16:33:37 +08:00
heisenbergye
5f7676608a
support Cross-Region Inference for all Claude models (#65)
2024-10-29 14:43:31 +08:00
Aiden Dai
2bee83a79a
Add support of Sydney region
2024-04-12 13:51:57 +08:00
Aiden Dai
b7adaf3040
Update cfn template
2024-04-08 10:45:56 +08:00