When both reasoning_effort and extra_body were provided, the
additionalModelRequestFields set by reasoning_effort (containing
reasoning_config) were silently overwritten during extra_body processing.
This prevented features such as anthropic_beta for 1M context from
coexisting with reasoning_effort.
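
The overwrite described above can be avoided by merging the two sources instead of assigning one over the other. The following is a minimal sketch, not the code from this change: `merge_request_fields` is a hypothetical helper, and the `reasoning_config` contents and the beta flag value are placeholders.

```python
from copy import deepcopy


def merge_request_fields(base: dict, extra: dict) -> dict:
    """Recursively merge `extra` into a copy of `base`.

    Nested dicts are merged key by key; any other value in `extra`
    replaces the value in `base`. Neither argument is mutated.
    """
    merged = deepcopy(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_request_fields(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Placeholder fields derived from reasoning_effort (contents are assumed).
from_reasoning_effort = {"reasoning_config": {"type": "enabled", "budget_tokens": 4096}}
# Placeholder fields passed through extra_body (flag name is illustrative).
from_extra_body = {"anthropic_beta": ["example-beta-flag"]}

# Both keys survive, where a plain assignment would keep only extra_body's.
additional_fields = merge_request_fields(from_reasoning_effort, from_extra_body)
```

With a merge like this, keys that appear in both sources still resolve in favour of extra_body, which keeps the caller's explicit settings authoritative.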
* feat: add Amazon Nova 2 multimodal embeddings support
Adds support for `amazon.nova-2-multimodal-embeddings-v1:0` via the
new `NovaEmbeddingsModel` class, using the `taskType`/`singleEmbeddingParams`
request format documented in the Nova 2 user guide.
- Supports single and batch text inputs
- Respects the `dimensions` parameter (256/512/1024/2048/3072, default 3072)
- Supports `float` and `base64` encoding formats
- Includes `test_nova_embed.py` for quick end-to-end verification
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: remove test script from repo
Test script moved to PR description instead.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: validate Nova embedding dimensions and fix falsy-zero bug
- Add VALID_DIMENSIONS set and upfront validation with a clear error message
- Fix `dimensions or DEFAULT` which would incorrectly ignore dimensions=0
- Add inline comment explaining approximate token counting (Nova API
does not return token counts in the response)
* fix: address PR review comments for NovaEmbeddingsModel
- Fix VALID_DIMENSIONS to {256, 384, 1024, 3072} per Nova embeddings schema docs
(previous values 512/2048 were mistakenly referenced from Titan embedding model docs)
- Replace str(item) fallback with HTTPException(400) to avoid silent garbage embeddings
- Update schema.py dimensions comment: 'not used' -> 'Used by Nova embeddings'
- Replace getattr() with direct .dimensions access on Pydantic model
- Move dimension validation before the loop (validates once, not per-text)
- Add enumerate to batch loop; include input index in error detail
- Switch isinstance(item, Iterable) to isinstance(item, list) for precise matching
- Add comment explaining embeddingPurpose hardcoded to GENERIC_INDEX
---------
Co-authored-by: Gabriel <gabrielkoo@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
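
The validation fixes listed in this commit series can be sketched roughly as follows. This is an illustration under assumptions, not the `NovaEmbeddingsModel` code: the function names are hypothetical, and `ValueError` stands in for the `HTTPException(400)` mentioned above so the example has no dependencies. The dimension set and the default of 3072 come from the commit messages.

```python
VALID_DIMENSIONS = {256, 384, 1024, 3072}
DEFAULT_DIMENSIONS = 3072


def resolve_dimensions(dimensions):
    """Return the embedding dimension to use, or raise ValueError."""
    # `dimensions or DEFAULT_DIMENSIONS` would turn an explicit 0 into the
    # default; comparing against None keeps 0 so it is rejected below.
    resolved = DEFAULT_DIMENSIONS if dimensions is None else dimensions
    if resolved not in VALID_DIMENSIONS:
        raise ValueError(
            f"dimensions must be one of {sorted(VALID_DIMENSIONS)}, got {resolved}"
        )
    return resolved


def validate_inputs(inputs):
    """Accept a string or a list of strings; name the index of a bad item."""
    texts = [inputs] if isinstance(inputs, str) else inputs
    if not isinstance(texts, list):
        raise ValueError("input must be a string or a list of strings")
    for index, item in enumerate(texts):
        if not isinstance(item, str):
            raise ValueError(
                f"input[{index}] must be a string, got {type(item).__name__}"
            )
    return texts


def prepare_batch(inputs, dimensions=None):
    """Validate the dimension once, before looping over the batch."""
    resolved = resolve_dimensions(dimensions)
    return [(text, resolved) for text in validate_inputs(inputs)]
```

Checking the dimension before the loop means a bad value fails the whole request up front rather than after some texts have already been embedded.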
PR #193 added tiktoken preloading to Dockerfile_ecs but the same fix
was not applied to the Lambda Dockerfile. This causes a ConnectTimeout
error in network-restricted environments (e.g. Lambda in VPC without
NAT Gateway) when tiktoken tries to download cl100k_base encoding at
runtime from openaipublic.blob.core.windows.net.
Cache the encoding at build time, consistent with Dockerfile_ecs.
Related to #118
Update dependencies to fix HIGH severity ReDoS vulnerability:
- fastapi==0.128.0
- starlette==0.49.1
CVE-2025-62727 allows unauthenticated attackers to send crafted HTTP
Range headers that trigger quadratic-time processing in FileResponse
Range parsing, causing CPU exhaustion and DoS.
Fixes #215
Replace ALB + Lambda architecture with API Gateway REST API + Lambda
using response streaming for SSE support. This provides:
- No VPC required, reducing complexity and cost
- Native streaming support via API Gateway response streaming
- Pay-per-request pricing model
Changes:
- Add Lambda Web Adapter to Dockerfile for streaming support
- Replace BedrockProxy.template with API Gateway configuration
- Update README with new deployment options and latest models
- Update architecture diagram for API Gateway flow
Added handling for message and content block deltas, including safety checks for open thinking tags.
This makes reasoning output work and makes GPT-OSS 20b/120b usable in frontends that expect closing thinking tags.
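
One way to picture the safety check for open thinking tags is a generator that passes text deltas through and emits a closing tag if the stream ends with one still open. This is a sketch only: the tag names and the function are assumptions, and the actual change operates on Bedrock message and content block delta events rather than plain strings.

```python
OPEN_TAG = "<think>"    # assumed tag names; the commit does not spell them out
CLOSE_TAG = "</think>"


def close_open_thinking(deltas):
    """Yield text deltas unchanged, then a closing tag if one is still open."""
    seen = []
    for delta in deltas:
        seen.append(delta)
        yield delta
    # Count on the joined text so a tag split across two deltas is still seen.
    text = "".join(seen)
    if text.count(OPEN_TAG) > text.count(CLOSE_TAG):
        yield CLOSE_TAG


# The opening tag arrives split across deltas and is never closed upstream.
chunks = list(close_open_thinking(["<thi", "nk>weighing options", " done"]))
```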
- Add unified profile_metadata dictionary for both SYSTEM_DEFINED and APPLICATION inference profiles
- Remove unused region prefix functions and defaultdict import
- Add TEMPERATURE_TOPP_CONFLICT_MODELS set for Claude model parameter conflicts
- Improve model ARN parsing and error handling in profile enumeration
- Consolidate profile metadata storage to enable consistent feature detection
- Run Docker container as non-root user (appuser) to minimize security risks
- Add Docker HEALTHCHECK for better container orchestration
- Make CORS configurable via ALLOWED_ORIGINS env var with security warning
- Replace assertions with proper error handling (TypeError/ValueError)
- Add 30s timeout to HTTP requests to prevent hanging connections
- Disable auto-reload in production uvicorn settings
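
As an illustration of the configurable CORS item above, the sketch below reads `ALLOWED_ORIGINS` and warns when the result permits every origin. The comma-separated format, the wildcard fallback, and the helper name `parse_allowed_origins` are assumptions, not details confirmed by this commit.

```python
import logging
import os

logger = logging.getLogger(__name__)


def parse_allowed_origins(raw):
    """Split a comma-separated origin list; fall back to ["*"] when empty."""
    if raw is None:
        return ["*"]
    origins = [origin.strip() for origin in raw.split(",") if origin.strip()]
    return origins or ["*"]


allowed_origins = parse_allowed_origins(os.environ.get("ALLOWED_ORIGINS"))
if "*" in allowed_origins:
    # Security warning: a wildcard lets any site call the API from a browser.
    logger.warning("CORS allows all origins; set ALLOWED_ORIGINS to restrict access")
```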
This commit adds comprehensive support for Claude Sonnet 4.5 (claude-sonnet-4-5-20250929),
Anthropic's most intelligent model with enhanced coding capabilities and complex agent support.
Changes:
- Added global cross-region inference profile discovery (global.anthropic.*)
- Fixed temperature/topP compatibility for Claude Sonnet 4.5 (model doesn't support both simultaneously)
- Fixed reasoning_effort parameter handling to prevent KeyError
- Added extended thinking/interleaved thinking support via extra_body parameter
- Updated documentation with Claude Sonnet 4.5 examples (English and Chinese)
- Updated README with Sonnet 4.5 announcement
Technical Details:
- src/api/models/bedrock.py: Added global profile support in list_bedrock_models()
- src/api/models/bedrock.py: Added Claude Sonnet 4.5 detection to remove topP parameter
- src/api/models/bedrock.py: Changed pop("topP") to pop("topP", None) to prevent KeyError
- docs/Usage.md: Added Chat Completions section with Sonnet 4.5 examples
- docs/Usage.md: Updated Interleaved thinking section with Sonnet 4.5 examples
- docs/Usage_CN.md: Added Chinese versions of all Sonnet 4.5 documentation
Model ID: global.anthropic.claude-sonnet-4-5-20250929-v1:0
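
The topP handling described above might look roughly like this. It is a sketch under assumptions: the contents of `TEMPERATURE_TOPP_CONFLICT_MODELS`, the substring match on the model ID, and the choice to drop topP only when temperature is also set are illustrative, and `drop_conflicting_top_p` is a hypothetical helper.

```python
# Assumed contents; the real set lives in src/api/models/bedrock.py.
TEMPERATURE_TOPP_CONFLICT_MODELS = {"claude-sonnet-4-5"}


def drop_conflicting_top_p(model_id, inference_config):
    """Return a copy of the config without topP for models that reject it
    alongside temperature."""
    config = dict(inference_config)
    conflicts = any(name in model_id for name in TEMPERATURE_TOPP_CONFLICT_MODELS)
    if conflicts and "temperature" in config:
        # pop with a default never raises KeyError when topP is absent.
        config.pop("topP", None)
    return config


model_id = "global.anthropic.claude-sonnet-4-5-20250929-v1:0"
cleaned = drop_conflicting_top_p(model_id, {"temperature": 0.5, "topP": 0.9})
```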
* models: fix Application Inference Profiles mapping to include all profiles per model_id; switch to defaultdict(set) and emit all AIPs
* Fix rebase issue
---------
Co-authored-by: Jeremy Brockett <313937+jbrockett@users.noreply.github.com>
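
The `defaultdict(set)` switch in the mapping fix can be illustrated as below. A plain `dict` keyed by model_id keeps only the last profile written for each model, while a set per key keeps all of them. The helper name, the pair layout, and the ARN-like strings are placeholders, not values from the repository.

```python
from collections import defaultdict


def group_profiles_by_model(profiles):
    """Map each model_id to the set of profile ARNs that point at it."""
    by_model = defaultdict(set)
    for profile_arn, model_id in profiles:
        by_model[model_id].add(profile_arn)
    return by_model


# Placeholder (profile_arn, model_id) pairs; two profiles share one model.
profiles = [
    ("arn:example:profile/team-a", "example.model-v1:0"),
    ("arn:example:profile/team-b", "example.model-v1:0"),
    ("arn:example:profile/team-c", "example.other-v1:0"),
]
grouped = group_profiles_by_model(profiles)
```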
Updates boto3 from 1.37.0 to 1.40.4 and botocore from 1.37.0 to 1.40.4. This update enables support for AWS_BEARER_TOKEN_BEDROCK functionality and includes the latest AWS service features and bug fixes.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-authored-by: Claude <noreply@anthropic.com>