Commit Graph

134 Commits

Author SHA1 Message Date
user-error1
ce4cfabb21 Fixed <think> </think> tags for GPT-OSS in bedrock.py (#200)
Added handling for message and content block deltas, including safety checks for open thinking tags.

Results in working reasoning and makes GPT-OSS 80/120b usable in frontends that expect closing thinking tags.
2025-11-20 14:29:20 +08:00
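Note on ce4cfabb21: the message above mentions safety checks for open thinking tags. The following is only a minimal sketch of that idea, not the code in bedrock.py. The function name `wrap_thinking` and the `(kind, text)` delta format are assumptions made for illustration.

```python
from typing import Iterable, Iterator, Tuple


def wrap_thinking(deltas: Iterable[Tuple[str, str]]) -> Iterator[str]:
    """Yield text chunks, wrapping reasoning deltas in <think>...</think>.

    Each delta is a (kind, text) pair where kind is "reasoning" or "text".
    This pair format is hypothetical and chosen only for the sketch.
    """
    thinking_open = False
    for kind, text in deltas:
        if kind == "reasoning":
            if not thinking_open:
                # First reasoning chunk: open the tag once.
                thinking_open = True
                yield "<think>"
            yield text
        else:
            if thinking_open:
                # Close the tag before any regular content is emitted.
                thinking_open = False
                yield "</think>"
            yield text
    if thinking_open:
        # Safety check: the stream ended while the tag was still open.
        yield "</think>"
```

With this shape, a stream that ends during reasoning still produces a closing tag, which is what frontends that expect `</think>` rely on.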
Donghee Na
7e03ab062d fix: Fix invalid cache_creation_tokens metric key (#195) 2025-10-27 14:31:21 +08:00
Shion Ichikawa
18b68bd3a7 🐳 preload tiktoken encoding in Dockerfile_ecs (#193) 2025-10-22 22:28:40 +08:00
Kane Zhu
d86e64eed3 refactor(bedrock): unify inference profile metadata handling and cleanup
- Add unified profile_metadata dictionary for both SYSTEM_DEFINED and APPLICATION inference profiles
- Remove unused region prefix functions and defaultdict import
- Add TEMPERATURE_TOPP_CONFLICT_MODELS set for Claude model parameter conflicts
- Improve model ARN parsing and error handling in profile enumeration
- Consolidate profile metadata storage to enable consistent feature detection
2025-10-16 15:24:02 +08:00
Kane Zhu
b4800c54a0 feat: add prompt caching support for Claude and Nova models
Add comprehensive prompt caching support with flexible control options:

Features:
- ENV variable control (ENABLE_PROMPT_CACHING, default: false)
- Per-request control via extra_body.prompt_caching
- Pattern-based model detection (Claude, Nova)
- Token limit warnings (Nova 20K limit)
- OpenAI-compatible response format (prompt_tokens_details.cached_tokens)

Supported models:
- Claude 3+ models (anthropic.claude-*)
- Nova models (amazon.nova-*)
- Auto-detection prevents breaking unsupported models

Implementation:
- System prompts caching via extra_body.prompt_caching.system
- Messages caching via extra_body.prompt_caching.messages
- Non-streaming and streaming modes
- Compatible with reasoning, thinking, and tool calls
2025-10-15 11:03:19 +08:00
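Note on b4800c54a0: the controls listed above (the ENABLE_PROMPT_CACHING variable, the per-request `extra_body.prompt_caching` keys, and pattern-based model detection) could combine roughly as sketched here. The function names, the wildcard prefix on the patterns, and the rule that a per-request value overrides the environment default are all assumptions, not the project's confirmed behavior.

```python
import os
from fnmatch import fnmatchcase

# Patterns from the commit message, with a leading "*" added (an assumption)
# so that prefixed IDs such as "us.amazon.nova-..." also match.
CACHE_CAPABLE_PATTERNS = ("*anthropic.claude-*", "*amazon.nova-*")


def supports_caching(model_id: str) -> bool:
    """Pattern-based detection of models that accept cache points."""
    return any(fnmatchcase(model_id, p) for p in CACHE_CAPABLE_PATTERNS)


def resolve_caching(model_id, extra_body=None, env=None):
    """Return {"system": bool, "messages": bool} for one request.

    Hypothetical helper: unsupported models never get caching, and
    per-request keys take precedence over the environment default.
    """
    env = os.environ if env is None else env
    default = env.get("ENABLE_PROMPT_CACHING", "false").lower() == "true"
    if not supports_caching(model_id):
        return {"system": False, "messages": False}
    overrides = (extra_body or {}).get("prompt_caching") or {}
    return {
        "system": bool(overrides.get("system", default)),
        "messages": bool(overrides.get("messages", default)),
    }
```

Returning both flags as False for unmatched models mirrors the "auto-detection prevents breaking unsupported models" point in the message.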
Scott Baxter
7756532b4c fix: ECS container /health endpoint does not require API_KEY Bearer Token (#184) 2025-10-13 11:59:42 +08:00
Li Yi
9cea7f9314 chore: polish code with little update (#182)
- Run Docker container as non-root user (appuser) to minimize security risks
- Add Docker HEALTHCHECK for better container orchestration
- Make CORS configurable via ALLOWED_ORIGINS env var with security warning
- Replace assertions with proper error handling (TypeError/ValueError)
- Add 30s timeout to HTTP requests to prevent hanging connections
- Disable auto-reload in production uvicorn settings
2025-10-11 14:49:18 +08:00
Fabian Franz
8177876e5e Support <think> tags (#117) 2025-09-30 20:29:19 +08:00
Neil Mazumdar
66cb51bb36 feat: add Claude Sonnet 4.5 support with global cross-region inference (#180)
This commit adds comprehensive support for Claude Sonnet 4.5 (claude-sonnet-4-5-20250929),
Anthropic's most intelligent model with enhanced coding capabilities and complex agent support.

Changes:
- Added global cross-region inference profile discovery (global.anthropic.*)
- Fixed temperature/topP compatibility for Claude Sonnet 4.5 (model doesn't support both simultaneously)
- Fixed reasoning_effort parameter handling to prevent KeyError
- Added extended thinking/interleaved thinking support via extra_body parameter
- Updated documentation with Claude Sonnet 4.5 examples (English and Chinese)
- Updated README with Sonnet 4.5 announcement

Technical Details:
- src/api/models/bedrock.py: Added global profile support in list_bedrock_models()
- src/api/models/bedrock.py: Added Claude Sonnet 4.5 detection to remove topP parameter
- src/api/models/bedrock.py: Changed pop("topP") to pop("topP", None) to prevent KeyError
- docs/Usage.md: Added Chat Completions section with Sonnet 4.5 examples
- docs/Usage.md: Updated Interleaved thinking section with Sonnet 4.5 examples
- docs/Usage_CN.md: Added Chinese versions of all Sonnet 4.5 documentation

Model ID: global.anthropic.claude-sonnet-4-5-20250929-v1:0
2025-09-30 16:51:26 +08:00
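Note on 66cb51bb36: the change from `pop("topP")` to `pop("topP", None)` matters because `dict.pop` raises `KeyError` for a missing key unless a default is supplied. A minimal sketch follows; the helper name `drop_top_p` and the substring check are hypothetical and only illustrate the pattern.

```python
def drop_top_p(model_id: str, inference_config: dict) -> dict:
    """Return a copy of the config without topP for Claude Sonnet 4.5.

    Hypothetical helper; the detection in bedrock.py may differ.
    """
    config = dict(inference_config)
    if "claude-sonnet-4-5" in model_id:
        # The default argument makes this safe when "topP" was never set;
        # config.pop("topP") alone would raise KeyError in that case.
        config.pop("topP", None)
    return config
```

A request that sends only `temperature` therefore passes through unchanged instead of failing.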
Mengxin Zhu
371d11d101 chore: cleanup useless files 2025-09-30 16:08:56 +08:00
Mengxin Zhu
e3ee9a707f docs: update deployment instructions and enhance ECR push script 2025-09-30 16:06:21 +08:00
Divyateja Pasupuleti
bdfa57c277 chore: update requirements to fix vulnerability (#177)
* chore: update requirements to fix vulnerability

* Update Python base image to version 3.13-slim
2025-09-19 16:15:32 +08:00
jbrockett
911dfe26d6 models: fix Application Inference Profiles mapping (#175)
* models: fix Application Inference Profiles mapping to include all profiles per model_id; switch to defaultdict(set) and emit all AIPs

* Fix rebase issue

---------

Co-authored-by: Jeremy Brockett <313937+jbrockett@users.noreply.github.com>
2025-08-14 15:21:14 +08:00
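Note on 911dfe26d6: the switch to `defaultdict(set)` keeps every Application Inference Profile for a model instead of only the last one seen. A minimal sketch of that grouping, assuming the profiles arrive as `(model_id, profile_arn)` pairs; the function name and the placeholder values are illustrative only.

```python
from collections import defaultdict


def group_profiles_by_model(pairs):
    """Map each model ID to the set of all profile ARNs that point at it.

    `pairs` is an iterable of (model_id, profile_arn) tuples, a shape
    assumed for this sketch.
    """
    profiles_by_model = defaultdict(set)
    for model_id, profile_arn in pairs:
        # A plain dict assignment here would overwrite earlier profiles.
        profiles_by_model[model_id].add(profile_arn)
    return profiles_by_model
```

For example, `group_profiles_by_model([("model-a", "profile-1"), ("model-a", "profile-2")])` keeps both profiles under `"model-a"`.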
RizviR
a2110ff648 Add pagination to list_inference_profiles calls (#173)
Co-authored-by: Rizvi Rahim <rizvi@rizvir.com>
2025-08-13 10:26:34 +08:00
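Note on a2110ff648: paginating `list_inference_profiles` means following the `nextToken` value until the service stops returning one. The sketch below shows that loop under the assumption that `client` is a boto3 Bedrock client; the helper name and the stand-in client are hypothetical and exist only so the example runs without AWS credentials.

```python
def list_all_inference_profiles(client, **kwargs):
    """Collect every profile summary by following nextToken."""
    summaries = []
    token = None
    while True:
        params = dict(kwargs)
        if token:
            params["nextToken"] = token
        page = client.list_inference_profiles(**params)
        summaries.extend(page.get("inferenceProfileSummaries", []))
        token = page.get("nextToken")
        if not token:
            return summaries


class FakeBedrockClient:
    """Stand-in that serves two pages, for demonstration only."""

    def list_inference_profiles(self, **params):
        if params.get("nextToken") == "page-2":
            return {"inferenceProfileSummaries": [{"inferenceProfileId": "c"}]}
        return {
            "inferenceProfileSummaries": [
                {"inferenceProfileId": "a"},
                {"inferenceProfileId": "b"},
            ],
            "nextToken": "page-2",
        }
```

Without the loop, only the first page would be seen and profiles on later pages would be missing from the model list.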
Fabian Franz
0cce2edab0 feat: update boto3 to version 1.40.4 (#169)
Updates boto3 from 1.37.0 to 1.40.4 and botocore from 1.37.0 to 1.40.4. This update enables support for AWS_BEARER_TOKEN_BEDROCK functionality and includes the latest AWS service features and bug fixes.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Claude <noreply@anthropic.com>
2025-08-13 10:23:30 +08:00
heisenbergye
3f1b56a526 feat: support Claude 4 Interleaved thinking (beta) (#164) 2025-07-21 16:44:21 +08:00
Mengxin Zhu
76a3614f17 fix: properly handle tool_use messages in conversation 2025-06-30 00:14:26 +08:00
Gagan M
01836087b1 feat: add support to include application inference profiles as models (#131)
---------

Co-authored-by: Mengxin Zhu <843303+zxkane@users.noreply.github.com>
2025-06-23 22:49:27 +08:00
dependabot[bot]
dd191d7cd9 Bump requests from 2.32.3 to 2.32.4 in /src (#151)
Bumps [requests](https://github.com/psf/requests) from 2.32.3 to 2.32.4.
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](https://github.com/psf/requests/compare/v2.32.3...v2.32.4)

---
updated-dependencies:
- dependency-name: requests
  dependency-version: 2.32.4
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-20 17:50:19 +08:00
Zack Elias
844efec086 add titan G1 embeddings (#152) 2025-06-17 11:09:22 +08:00
UniMa007
aed57307bc Add Titan Embeddings G2 (#94) 2025-05-27 21:52:15 +08:00
Aiden Dai
4e8a913e43 fix empty content issue 2025-04-20 09:21:47 +08:00
Aiden Dai
b27e83624f fix typo 2025-03-26 13:10:07 +08:00
Aiden Dai
c98e123c8f optimize error response in streaming 2025-03-26 11:32:39 +08:00
Aiden Dai
4f1a75b49f fix potential process stuck issue 2025-03-22 18:39:08 +08:00
Aiden Dai
0ead770069 performance improvement 2025-03-13 18:24:08 +08:00
Aiden Dai
fa14ae8c05 apply ruff linter 2025-03-13 14:24:41 +08:00
Aiden Dai
879b8e2ac7 apply ruff linter 2025-03-13 13:58:18 +08:00
Aiden Dai
f21b9a2e84 apply ruff linter 2025-03-13 13:50:57 +08:00
Aiden Dai
33e8fcfd3b fix potential bad request issue 2025-03-13 07:16:42 +08:00
Aiden Dai
5ff18c0acd Update usage guide for deepseek-r1 2025-03-11 10:25:50 +08:00
Aiden Dai
fcbfa9fe3d Update usage guide for deepseek-r1 2025-03-11 10:24:19 +08:00
Aiden Dai
1a9c0f461e Update usage guide for deepseek-r1 2025-03-11 10:14:06 +08:00
Aiden Dai
66b8967d30 Update usage guide for deepseek-r1 2025-03-11 10:10:58 +08:00
Zhongsheng Ji
fcfebf9d9d feat: Response 429 if ThrottlingException (#91) 2025-03-10 09:01:33 +08:00
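Note on fcfebf9d9d: returning 429 for a ThrottlingException amounts to mapping the AWS error code to an HTTP status. A minimal sketch, assuming the input has the shape of a botocore `ClientError.response` dictionary; the function name and the 500 fallback are assumptions, not the project's confirmed behavior.

```python
def status_for_error(error_response: dict) -> int:
    """Map an AWS error response to an HTTP status code.

    `error_response` is expected to look like ClientError.response,
    i.e. {"Error": {"Code": "...", "Message": "..."}}.
    """
    code = error_response.get("Error", {}).get("Code", "")
    if code == "ThrottlingException":
        # Tell the caller it is being rate limited so it can back off.
        return 429
    # Fallback chosen for this sketch only.
    return 500
```

Clients that honor 429 can then retry with backoff instead of treating throttling as a generic server failure.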
Aiden Dai
283115000a Support of reasoning 2025-02-28 08:08:54 +08:00
Aiden Dai
4095c2e74e Support of reasoning 2025-02-26 13:28:23 +08:00
Aiden Dai
a46e329c97 Support of reasoning 2025-02-26 12:25:38 +08:00
Omri Shaiko
54f4a2b017 Fix issue with toolResult error with Cursor. Use default DEFAULT_MODEL in ChatRequest (#110) 2025-02-26 10:43:44 +08:00
Aiden Dai
3ce47ff278 Partial support of reasoning 2025-02-25 16:23:06 +08:00
Sean Smith
b26ee3e9ea Added troubleshooting guide and made buttons cool (#96)
Signed-off-by: Sean Smith <sean.smith@contextual.ai>
2025-02-11 12:40:27 +08:00
Aiden Dai
1cb8a6a603 Update readme 2025-02-10 15:48:34 +08:00
Aiden Dai
c39f6bc942 Use secrets manager for api key 2025-02-10 15:25:12 +08:00
Aiden Dai
74ca3b938e Update architecture diagram 2025-02-10 10:02:43 +08:00
Aiden Dai
a6f3e1176b fix secret access issue 2025-02-09 06:53:23 +08:00
Aiden Dai
4d88731233 Use secrets manager for api key 2025-02-08 21:36:59 +08:00
Sean Smith
48bf360456 Security Guide (#101)
Signed-off-by: Sean Smith <sean.smith@contextual.ai>
2025-02-08 11:40:24 +08:00
yytdfc
093c6fa586 add stop parameter (#86) 2024-12-31 11:15:24 +08:00
Aiden Dai
b2c187c716 Increase connect timeout 2024-12-19 16:45:18 +08:00
Aiden Dai
581638b794 Update docs 2024-12-17 17:38:21 +08:00