Commit Graph

147 Commits

Author SHA1 Message Date
Donghee Na
737cf076a0 fix: Fix ImageContent schema to use proper default value (#234) 2026-03-13 10:42:22 +08:00
Kane Zhu
6ae73c0c69 fix: merge additionalModelRequestFields instead of overwriting
When both reasoning_effort and extra_body are provided,
additionalModelRequestFields set by reasoning_effort (containing
reasoning_config) was silently overwritten by extra_body processing.
This prevented features like anthropic_beta for 1M context from
coexisting with reasoning_effort.
2026-03-10 16:41:52 +08:00
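The overwrite bug described above happens when a new dict is assigned where a merge was needed. The sketch below shows the fix in plain Python. It assumes the request is a plain dict and that `extra_body` keys end up in `additionalModelRequestFields`. The helper name `merge_additional_fields` is hypothetical and is not the project's code.

```python
from typing import Any


def merge_additional_fields(request: dict[str, Any], extra_body: dict[str, Any]) -> dict[str, Any]:
    """Merge extra_body into request["additionalModelRequestFields"] without
    discarding keys that were already set (e.g. reasoning_config).

    Hypothetical helper illustrating the fix; not the project's actual code.
    """
    existing = request.get("additionalModelRequestFields", {})
    # Shallow merge: extra_body wins on a key conflict, but unrelated keys
    # such as reasoning_config survive instead of being overwritten.
    request["additionalModelRequestFields"] = {**existing, **extra_body}
    return request
```

The merge is shallow. If both sources set the same top-level key, the `extra_body` value replaces the earlier one, and nested dicts are not combined.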
Donghee Na
d1dc4ed164 fix: Support reasoning_tokens in Bedrock streaming responses (#223) 2026-02-26 11:48:05 +08:00
Gabriel Koo
d14596ff47 feat: add Amazon Nova 2 multimodal embeddings support (#222)
* feat: add Amazon Nova 2 multimodal embeddings support

Adds support for `amazon.nova-2-multimodal-embeddings-v1:0` via the
new `NovaEmbeddingsModel` class, using the `taskType`/`singleEmbeddingParams`
request format documented in the Nova 2 user guide.

- Supports single and batch text inputs
- Respects the `dimensions` parameter (256/512/1024/2048/3072, default 3072)
- Supports `float` and `base64` encoding formats
- Includes `test_nova_embed.py` for quick end-to-end verification

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: remove test script from repo

Test script moved to PR description instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: validate Nova embedding dimensions and fix falsy-zero bug

- Add VALID_DIMENSIONS set and upfront validation with a clear error message
- Fix `dimensions or DEFAULT` which would incorrectly ignore dimensions=0
- Add inline comment explaining approximate token counting (Nova API
  does not return token counts in the response)

* fix: address PR review comments for NovaEmbeddingsModel

- Fix VALID_DIMENSIONS to {256, 384, 1024, 3072} per Nova embeddings schema docs
  (previous values 512/2048 were mistakenly referenced from Titan embedding model docs)
- Replace str(item) fallback with HTTPException(400) to avoid silent garbage embeddings
- Update schema.py dimensions comment: 'not used' -> 'Used by Nova embeddings'
- Replace getattr() with direct .dimensions access on Pydantic model
- Move dimension validation before the loop (validates once, not per-text)
- Add enumerate to batch loop; include input index in error detail
- Switch isinstance(item, Iterable) to isinstance(item, list) for precise matching
- Add comment explaining embeddingPurpose hardcoded to GENERIC_INDEX

---------

Co-authored-by: Gabriel <gabrielkoo@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 11:41:17 +08:00
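The dimension fixes above cover two points: validate once before the loop, and do not let `or` swallow a falsy value. The sketch below shows both. The valid set and the default of 3072 come from the commit message. The function name `resolve_dimensions` is hypothetical, and raising `ValueError` is a choice made for this sketch. A FastAPI handler might translate it into a 400 response.

```python
from typing import Optional

# Values from the commit message above.
VALID_DIMENSIONS = {256, 384, 1024, 3072}
DEFAULT_DIMENSIONS = 3072


def resolve_dimensions(dimensions: Optional[int]) -> int:
    """Return the embedding size to request, validating it once up front."""
    # `dimensions or DEFAULT_DIMENSIONS` would silently turn 0 into 3072;
    # an explicit None check keeps 0 so it can be rejected below.
    if dimensions is None:
        return DEFAULT_DIMENSIONS
    if dimensions not in VALID_DIMENSIONS:
        raise ValueError(
            f"dimensions must be one of {sorted(VALID_DIMENSIONS)}, got {dimensions}"
        )
    return dimensions
```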
mjkam
a1844f95d4 Preload tiktoken encoding in Dockerfile (Lambda) (#220)
PR #193 added tiktoken preloading to Dockerfile_ecs but the same fix
was not applied to the Lambda Dockerfile. This causes a ConnectTimeout
error in network-restricted environments (e.g. Lambda in VPC without
NAT Gateway) when tiktoken tries to download cl100k_base encoding at
runtime from openaipublic.blob.core.windows.net.

Cache the encoding at build time, consistent with Dockerfile_ecs.

Related to #118
2026-02-19 17:00:05 +08:00
Hooman Yar
a150f7bb1c fix: support continue response for claude opus 4.6 (#219)
Co-authored-by: Hooman Yar <yarhooma@amazon.com>
2026-02-12 15:21:50 +08:00
Mengxin Zhu
9b3da3a5c8 fix(deps): update fastapi and starlette for CVE-2025-62727 (#216)
Update dependencies to fix HIGH severity ReDoS vulnerability:
- fastapi==0.128.0
- starlette==0.49.1

CVE-2025-62727 allows unauthenticated attackers to send crafted HTTP
Range headers that trigger quadratic-time processing in FileResponse
Range parsing, causing CPU exhaustion and DoS.

Fixes #215
2026-01-19 11:57:01 +08:00
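To confirm that a deployed environment picked up the fix, you can compare the installed starlette version against 0.49.1. The check below is an illustrative sketch and handles only plain numeric versions. When pre-release or post-release tags are possible, `packaging.version.Version` is the more robust choice.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Parse a plain numeric version such as '0.49.1' into (0, 49, 1)."""
    # Assumes purely numeric, dot-separated components (no 'rc', 'post', etc.).
    return tuple(int(part) for part in version.split("."))


# Minimum patched starlette version named in the commit message above.
MIN_STARLETTE = parse_version("0.49.1")


def starlette_is_patched(installed: str) -> bool:
    # Tuples compare component by component, matching version ordering.
    return parse_version(installed) >= MIN_STARLETTE
```

In a running environment you could pass it `importlib.metadata.version("starlette")`.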
Angélica de Oliveira
1a7f55b89b Add support for 'developer' role in chat messages (#209) 2025-12-09 11:26:10 +08:00
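OpenAI's chat API accepts a `developer` role that fills the part `system` used to play. The sketch below shows one way a proxy could support it. It assumes developer messages are folded into the Bedrock Converse system prompt (a list of `{"text": ...}` blocks) exactly like system messages, and that message content is a plain string. The function name and that mapping are assumptions.

```python
from typing import Any


def split_system_messages(
    messages: list[dict[str, Any]],
) -> tuple[list[dict[str, str]], list[dict[str, Any]]]:
    """Separate system/developer instructions from the conversation turns.

    Assumption: 'developer' is treated exactly like 'system'; both become
    Converse-style system blocks of the form {"text": ...}.
    """
    system_blocks: list[dict[str, str]] = []
    conversation: list[dict[str, Any]] = []
    for message in messages:
        if message["role"] in ("system", "developer"):
            system_blocks.append({"text": message["content"]})
        else:
            conversation.append(message)
    return system_blocks, conversation
```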
Mengxin Zhu
b41633b826 feat(apigw): add API Gateway response streaming support (#207)
Replace ALB + Lambda architecture with API Gateway REST API + Lambda
using response streaming for SSE support. This provides:

- No VPC required, reducing complexity and cost
- Native streaming support via API Gateway response streaming
- Pay-per-request pricing model

Changes:
- Add Lambda Web Adapter to Dockerfile for streaming support
- Replace BedrockProxy.template with API Gateway configuration
- Update README with new deployment options and latest models
- Update architecture diagram for API Gateway flow
2025-12-05 10:54:13 +08:00
Hooman Yar
0411454b3a feat: add claude-opus-4-5 to TEMPERATURE_TOPP_CONFLICT_MODELS set (#208)
Co-authored-by: Hooman Yar <yarhooma@amazon.com>
2025-12-05 09:22:37 +08:00
Kane Zhu
2c518bbd70 fix(docker): add --provenance=false --sbom=false for Lambda compatibility
Docker BuildKit (especially with docker-container driver) may create
OCI image manifests with attestations that AWS Lambda does not support.
Lambda requires Docker V2 Schema 2 format without multi-manifest index.

This fix ensures the build script generates Lambda-compatible images
regardless of the user's Docker/BuildKit configuration.

Fixes #206
2025-11-27 18:54:58 +08:00
Justin Dray
37374e79ba fix: Allow the push-to-ecr.sh script to run from anywhere instead of requiring the user to cd manually (#202)
* fix: Allow the push-to-ecr.sh script to run from anywhere instead of requiring the user to cd manually

* Add docker-compose to support running locally
2025-11-20 14:33:43 +08:00
Viktor Isaev
b3c1c82367 Fix healthcheck in Dockerfile_ecs (#199)
The healthcheck in Dockerfile_ecs used a hardcoded port instead of the port set via ENV. This commit fixes that.
2025-11-20 14:30:00 +08:00
user-error1
ce4cfabb21 Fixed <think> </think> tags for GPT-OSS in bedrock.py (#200)
Added handling for message and content block deltas, including safety checks for open thinking tags.

This results in working reasoning output and makes GPT-OSS 20b/120b usable in frontends that expect closing thinking tags.
2025-11-20 14:29:20 +08:00
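The safety check for open thinking tags works like a small state machine over stream deltas. It opens `<think>` on the first reasoning delta, closes the tag when regular text resumes, and closes it at the end of the stream if no text ever arrived. The sketch below follows those assumptions. The `(kind, text)` delta tuples are a hypothetical simplification and do not mirror bedrock.py.

```python
from typing import Iterable, Iterator


def wrap_reasoning(deltas: Iterable[tuple[str, str]]) -> Iterator[str]:
    """Yield text chunks, wrapping reasoning deltas in <think>...</think>.

    Each delta is a hypothetical (kind, text) pair where kind is
    "reasoning" or "text".
    """
    in_think = False
    for kind, text in deltas:
        if kind == "reasoning" and not in_think:
            in_think = True
            yield "<think>"
        elif kind == "text" and in_think:
            # Regular content started: close the open reasoning block.
            in_think = False
            yield "</think>"
        yield text
    if in_think:
        # Stream ended mid-reasoning: never leave the tag unclosed.
        yield "</think>"
```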
Donghee Na
7e03ab062d fix: Fix invalid cache_creation_tokens metric key (#195) 2025-10-27 14:31:21 +08:00
Shion Ichikawa
18b68bd3a7 🐳 preload tiktoken encoding in Dockerfile_ecs (#193) 2025-10-22 22:28:40 +08:00
Kane Zhu
d86e64eed3 refactor(bedrock): unify inference profile metadata handling and cleanup
- Add unified profile_metadata dictionary for both SYSTEM_DEFINED and APPLICATION inference profiles
- Remove unused region prefix functions and defaultdict import
- Add TEMPERATURE_TOPP_CONFLICT_MODELS set for Claude model parameter conflicts
- Improve model ARN parsing and error handling in profile enumeration
- Consolidate profile metadata storage to enable consistent feature detection
2025-10-16 15:24:02 +08:00
Kane Zhu
b4800c54a0 feat: add prompt caching support for Claude and Nova models
Add comprehensive prompt caching support with flexible control options:

Features:
- ENV variable control (ENABLE_PROMPT_CACHING, default: false)
- Per-request control via extra_body.prompt_caching
- Pattern-based model detection (Claude, Nova)
- Token limit warnings (Nova 20K limit)
- OpenAI-compatible response format (prompt_tokens_details.cached_tokens)

Supported models:
- Claude 3+ models (anthropic.claude-*)
- Nova models (amazon.nova-*)
- Auto-detection prevents breaking unsupported models

Implementation:
- System prompts caching via extra_body.prompt_caching.system
- Messages caching via extra_body.prompt_caching.messages
- Non-streaming and streaming modes
- Compatible with reasoning, thinking, and tool calls
2025-10-15 11:03:19 +08:00
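The control flow above has three layers: the ENV default, the per-request override, and pattern-based model detection. The sketch below shows how they might combine. It assumes `extra_body.prompt_caching` is a dict with optional boolean `system` and `messages` keys, and that missing keys fall back to `ENABLE_PROMPT_CACHING`. The function names and the detection substrings are illustrative. The `{"cachePoint": {"type": "default"}}` block is the Bedrock Converse API's cache checkpoint syntax.

```python
import os
from typing import Any, Optional

# Illustrative substrings for pattern-based detection (Claude and Nova).
CACHE_CAPABLE_PATTERNS = ("anthropic.claude", "amazon.nova")


def caching_flags(model_id: str, extra_body: Optional[dict[str, Any]]) -> tuple[bool, bool]:
    """Return (cache_system, cache_messages) for one request."""
    if not any(p in model_id for p in CACHE_CAPABLE_PATTERNS):
        # Unsupported models never get cache points, whatever the request says.
        return False, False
    env_default = os.environ.get("ENABLE_PROMPT_CACHING", "false").lower() == "true"
    settings = (extra_body or {}).get("prompt_caching")
    if not isinstance(settings, dict):
        return env_default, env_default
    # Assumed shape: {"system": bool, "messages": bool}; missing keys fall back to ENV.
    return (
        bool(settings.get("system", env_default)),
        bool(settings.get("messages", env_default)),
    )


def add_system_cache_point(system_blocks: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Append a Converse API cache checkpoint after the system prompt blocks."""
    return [*system_blocks, {"cachePoint": {"type": "default"}}]
```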
Scott Baxter
7756532b4c fix: ECS container /health endpoint does not require API_KEY Bearer Token (#184) 2025-10-13 11:59:42 +08:00
Li Yi
9cea7f9314 chore: polish code with minor updates (#182)
- Run Docker container as non-root user (appuser) to minimize security risks
- Add Docker HEALTHCHECK for better container orchestration
- Make CORS configurable via ALLOWED_ORIGINS env var with security warning
- Replace assertions with proper error handling (TypeError/ValueError)
- Add 30s timeout to HTTP requests to prevent hanging connections
- Disable auto-reload in production uvicorn settings
2025-10-11 14:49:18 +08:00
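The configurable CORS setting above could be read along the lines below. This sketch assumes `ALLOWED_ORIGINS` holds a comma-separated list. It also assumes that an unset variable means `*`. Both are assumptions of the sketch, and the function name is hypothetical.

```python
import os
import warnings


def allowed_origins_from_env() -> list[str]:
    """Read CORS origins from ALLOWED_ORIGINS (comma-separated, assumed format)."""
    raw = os.environ.get("ALLOWED_ORIGINS", "*")
    origins = [o.strip() for o in raw.split(",") if o.strip()]
    if "*" in origins:
        # Mirror the commit's security warning for a wide-open policy.
        warnings.warn("ALLOWED_ORIGINS allows every origin ('*'); restrict it in production.")
    return origins
```

You would then pass the result as `allow_origins` to FastAPI's `CORSMiddleware`.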
Fabian Franz
8177876e5e Support <think> tags (#117) 2025-09-30 20:29:19 +08:00
Neil Mazumdar
66cb51bb36 feat: add Claude Sonnet 4.5 support with global cross-region inference (#180)
This commit adds comprehensive support for Claude Sonnet 4.5 (claude-sonnet-4-5-20250929),
Anthropic's most intelligent model with enhanced coding capabilities and complex agent support.

Changes:
- Added global cross-region inference profile discovery (global.anthropic.*)
- Fixed temperature/topP compatibility for Claude Sonnet 4.5 (model doesn't support both simultaneously)
- Fixed reasoning_effort parameter handling to prevent KeyError
- Added extended thinking/interleaved thinking support via extra_body parameter
- Updated documentation with Claude Sonnet 4.5 examples (English and Chinese)
- Updated README with Sonnet 4.5 announcement

Technical Details:
- src/api/models/bedrock.py: Added global profile support in list_bedrock_models()
- src/api/models/bedrock.py: Added Claude Sonnet 4.5 detection to remove topP parameter
- src/api/models/bedrock.py: Changed pop("topP") to pop("topP", None) to prevent KeyError
- docs/Usage.md: Added Chat Completions section with Sonnet 4.5 examples
- docs/Usage.md: Updated Interleaved thinking section with Sonnet 4.5 examples
- docs/Usage_CN.md: Added Chinese versions of all Sonnet 4.5 documentation

Model ID: global.anthropic.claude-sonnet-4-5-20250929-v1:0
2025-09-30 16:51:26 +08:00
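The topP fix above combines two pieces: detect a model that rejects `temperature` and `topP` together, and drop `topP` with `pop("topP", None)` so that a missing key cannot raise `KeyError`. The sketch below puts them together. The set entries and substring matching are hypothetical examples, not the real contents of `TEMPERATURE_TOPP_CONFLICT_MODELS`. Keeping `topP` when no temperature is set is also a choice made for this sketch.

```python
from typing import Any

# Hypothetical example entries; the real set lives in bedrock.py.
TEMPERATURE_TOPP_CONFLICT_MODELS = {"claude-sonnet-4-5", "claude-opus-4-5"}


def drop_conflicting_top_p(model_id: str, inference_config: dict[str, Any]) -> dict[str, Any]:
    """Remove topP when the model rejects temperature and topP together."""
    conflicts = any(name in model_id for name in TEMPERATURE_TOPP_CONFLICT_MODELS)
    if conflicts and "temperature" in inference_config:
        # pop with a default never raises KeyError if topP was absent.
        inference_config.pop("topP", None)
    return inference_config
```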
Mengxin Zhu
371d11d101 chore: cleanup useless files 2025-09-30 16:08:56 +08:00
Mengxin Zhu
e3ee9a707f docs: update deployment instructions and enhance ECR push script 2025-09-30 16:06:21 +08:00
Divyateja Pasupuleti
bdfa57c277 chore: update requirements to fix vulnerability (#177)
* chore: update requirements to fix vulnerability

* Update Python base image to version 3.13-slim
2025-09-19 16:15:32 +08:00
jbrockett
911dfe26d6 models: fix Application Inference Profiles mapping (#175)
* models: fix Application Inference Profiles mapping to include all profiles per model_id; switch to defaultdict(set) and emit all AIPs

* Fix rebase issue

---------

Co-authored-by: Jeremy Brockett <313937+jbrockett@users.noreply.github.com>
2025-08-14 15:21:14 +08:00
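Switching to `defaultdict(set)` matters because a plain `mapping[model_id] = arn` keeps only the last profile seen for each model. The sketch below assumes profile summaries shaped like Bedrock `ListInferenceProfiles` results, each with an `inferenceProfileArn` and a `models` list of `{"modelArn": ...}` entries. It also assumes the model ID is the last `/`-separated part of the model ARN. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Any


def map_profiles_by_model(profiles: list[dict[str, Any]]) -> dict[str, set[str]]:
    """Map each model ID to every application inference profile ARN that uses it."""
    by_model: defaultdict[str, set[str]] = defaultdict(set)
    for profile in profiles:
        for model in profile.get("models", []):
            model_id = model["modelArn"].split("/")[-1]
            # A set accumulates all profiles instead of overwriting earlier ones.
            by_model[model_id].add(profile["inferenceProfileArn"])
    return dict(by_model)
```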
RizviR
a2110ff648 Add pagination to list_inference_profiles calls (#173)
Co-authored-by: Rizvi Rahim <rizvi@rizvir.com>
2025-08-13 10:26:34 +08:00
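Pagination here means following `nextToken` until the service stops returning one. The sketch below takes any object with a `list_inference_profiles` method, such as a boto3 `bedrock` client. It assumes the `typeEquals` filter and the `inferenceProfileSummaries`/`nextToken` response keys of the Bedrock ListInferenceProfiles API. The function name and the default filter value are illustrative.

```python
from typing import Any, Optional


def list_all_inference_profiles(client: Any, type_equals: str = "APPLICATION") -> list[dict[str, Any]]:
    """Collect every page of ListInferenceProfiles by following nextToken."""
    summaries: list[dict[str, Any]] = []
    kwargs: dict[str, Any] = {"typeEquals": type_equals}
    while True:
        response = client.list_inference_profiles(**kwargs)
        summaries.extend(response.get("inferenceProfileSummaries", []))
        token: Optional[str] = response.get("nextToken")
        if not token:
            # No token means this was the last page.
            return summaries
        kwargs["nextToken"] = token
```

Taking the client as a parameter keeps the loop testable with a stub that returns canned pages.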
Fabian Franz
0cce2edab0 feat: update boto3 to version 1.40.4 (#169)
Updates boto3 from 1.37.0 to 1.40.4 and botocore from 1.37.0 to 1.40.4. This update enables support for AWS_BEARER_TOKEN_BEDROCK functionality and includes the latest AWS service features and bug fixes.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Claude <noreply@anthropic.com>
2025-08-13 10:23:30 +08:00
heisenbergye
3f1b56a526 feat: support Claude 4 Interleaved thinking (beta) (#164) 2025-07-21 16:44:21 +08:00
Mengxin Zhu
76a3614f17 fix: properly handle tool_use messages in conversation 2025-06-30 00:14:26 +08:00
Gagan M
01836087b1 feat: add support to include application inference profiles as models (#131)
---------

Co-authored-by: Mengxin Zhu <843303+zxkane@users.noreply.github.com>
2025-06-23 22:49:27 +08:00
dependabot[bot]
dd191d7cd9 Bump requests from 2.32.3 to 2.32.4 in /src (#151)
Bumps [requests](https://github.com/psf/requests) from 2.32.3 to 2.32.4.
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](https://github.com/psf/requests/compare/v2.32.3...v2.32.4)

---
updated-dependencies:
- dependency-name: requests
  dependency-version: 2.32.4
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-20 17:50:19 +08:00
Zack Elias
844efec086 add Titan G1 embeddings (#152) 2025-06-17 11:09:22 +08:00
UniMa007
aed57307bc Add Titan Embeddings G2 (#94) 2025-05-27 21:52:15 +08:00
Aiden Dai
4e8a913e43 fix empty content issue 2025-04-20 09:21:47 +08:00
Aiden Dai
b27e83624f fix typo 2025-03-26 13:10:07 +08:00
Aiden Dai
c98e123c8f optimize error response in streaming 2025-03-26 11:32:39 +08:00
Aiden Dai
4f1a75b49f fix potential stuck-process issue 2025-03-22 18:39:08 +08:00
Aiden Dai
0ead770069 performance improvement 2025-03-13 18:24:08 +08:00
Aiden Dai
fa14ae8c05 apply ruff linter 2025-03-13 14:24:41 +08:00
Aiden Dai
879b8e2ac7 apply ruff linter 2025-03-13 13:58:18 +08:00
Aiden Dai
f21b9a2e84 apply ruff linter 2025-03-13 13:50:57 +08:00
Aiden Dai
33e8fcfd3b fix potential bad request issue 2025-03-13 07:16:42 +08:00
Aiden Dai
5ff18c0acd Update usage guide for deepseek-r1 2025-03-11 10:25:50 +08:00
Aiden Dai
fcbfa9fe3d Update usage guide for deepseek-r1 2025-03-11 10:24:19 +08:00
Aiden Dai
1a9c0f461e Update usage guide for deepseek-r1 2025-03-11 10:14:06 +08:00
Aiden Dai
66b8967d30 Update usage guide for deepseek-r1 2025-03-11 10:10:58 +08:00
Zhongsheng Ji
fcfebf9d9d feat: Respond with 429 on ThrottlingException (#91) 2025-03-10 09:01:33 +08:00
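A botocore `ClientError` exposes the service error code at `e.response["Error"]["Code"]`, so choosing a 429 comes down to checking that code. The sketch below is minimal, and its name is hypothetical. Returning 500 for every other code is a fallback chosen for the sketch, not necessarily the project's behavior.

```python
from typing import Any


def status_for_client_error(error_response: dict[str, Any]) -> int:
    """Pick the HTTP status to return for a botocore ClientError response dict."""
    code = error_response.get("Error", {}).get("Code", "")
    if code == "ThrottlingException":
        # Surface throttling as 429 so OpenAI-compatible clients can back off and retry.
        return 429
    # Fallback chosen for this sketch only.
    return 500
```

In a FastAPI route this might be used as `raise HTTPException(status_code=status_for_client_error(e.response), detail=str(e))` inside an `except ClientError as e:` block.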
Aiden Dai
283115000a Support for reasoning 2025-02-28 08:08:54 +08:00
Aiden Dai
4095c2e74e Support for reasoning 2025-02-26 13:28:23 +08:00