bedrock-access-gateway

Author	SHA1	Message	Date
Donghee Na	737cf076a0	fix: Fix ImageContent schema to use proper default value (#234)	2026-03-13 10:42:22 +08:00
Kane Zhu	6ae73c0c69	fix: merge additionalModelRequestFields instead of overwriting When both reasoning_effort and extra_body are provided, additionalModelRequestFields set by reasoning_effort (containing reasoning_config) was silently overwritten by extra_body processing. This prevented features like anthropic_beta for 1M context from coexisting with reasoning_effort.	2026-03-10 16:41:52 +08:00
Donghee Na	d1dc4ed164	fix: Support reasoning_tokens at bedrock streaming response (#223)	2026-02-26 11:48:05 +08:00
Gabriel Koo	d14596ff47	feat: add Amazon Nova 2 multimodal embeddings support (#222) * feat: add Amazon Nova 2 multimodal embeddings support Adds support for `amazon.nova-2-multimodal-embeddings-v1:0` via the new `NovaEmbeddingsModel` class, using the `taskType`/`singleEmbeddingParams` request format documented in the Nova 2 user guide. - Supports single and batch text inputs - Respects the `dimensions` parameter (256/512/1024/2048/3072, default 3072) - Supports `float` and `base64` encoding formats - Includes `test_nova_embed.py` for quick end-to-end verification Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: remove test script from repo Test script moved to PR description instead. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: validate Nova embedding dimensions and fix falsy-zero bug - Add VALID_DIMENSIONS set and upfront validation with a clear error message - Fix `dimensions or DEFAULT` which would incorrectly ignore dimensions=0 - Add inline comment explaining approximate token counting (Nova API does not return token counts in the response) * fix: address PR review comments for NovaEmbeddingsModel - Fix VALID_DIMENSIONS to {256, 384, 1024, 3072} per Nova embeddings schema docs (previous values 512/2048 were mistakenly referenced from Titan embedding model docs) - Replace str(item) fallback with HTTPException(400) to avoid silent garbage embeddings - Update schema.py dimensions comment: 'not used' -> 'Used by Nova embeddings' - Replace getattr() with direct .dimensions access on Pydantic model - Move dimension validation before the loop (validates once, not per-text) - Add enumerate to batch loop; include input index in error detail - Switch isinstance(item, Iterable) to isinstance(item, list) for precise matching - Add comment explaining embeddingPurpose hardcoded to GENERIC_INDEX --------- Co-authored-by: Gabriel <gabrielkoo@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 11:41:17 +08:00
mjkam	a1844f95d4	Preload tiktoken encoding in Dockerfile (Lambda) (#220) PR #193 added tiktoken preloading to Dockerfile_ecs but the same fix was not applied to the Lambda Dockerfile. This causes a ConnectTimeout error in network-restricted environments (e.g. Lambda in VPC without NAT Gateway) when tiktoken tries to download cl100k_base encoding at runtime from openaipublic.blob.core.windows.net. Cache the encoding at build time, consistent with Dockerfile_ecs. Related to #118	2026-02-19 17:00:05 +08:00
Hooman Yar	a150f7bb1c	fix: support continue response for claude opus 4.6 (#219) Co-authored-by: Hooman Yar <yarhooma@amazon.com>	2026-02-12 15:21:50 +08:00
Mengxin Zhu	9b3da3a5c8	fix(deps): update fastapi and starlette for CVE-2025-62727 (#216) Update dependencies to fix HIGH severity ReDoS vulnerability: - fastapi==0.128.0 - starlette==0.49.1 CVE-2025-62727 allows unauthenticated attackers to send crafted HTTP Range headers that trigger quadratic-time processing in FileResponse Range parsing, causing CPU exhaustion and DoS. Fixes #215	2026-01-19 11:57:01 +08:00
Angélica de Oliveira	1a7f55b89b	Add support for 'developer' role in chat messages (#209)	2025-12-09 11:26:10 +08:00
Mengxin Zhu	b41633b826	feat(apigw): add API Gateway response streaming support (#207) Replace ALB + Lambda architecture with API Gateway REST API + Lambda using response streaming for SSE support. This provides: - No VPC required, reducing complexity and cost - Native streaming support via API Gateway response streaming - Pay-per-request pricing model Changes: - Add Lambda Web Adapter to Dockerfile for streaming support - Replace BedrockProxy.template with API Gateway configuration - Update README with new deployment options and latest models - Update architecture diagram for API Gateway flow	2025-12-05 10:54:13 +08:00
Hooman Yar	0411454b3a	feat: add claude-opus-4-5 to TEMPERATURE_TOPP_CONFLICT_MODELS set (#208) Co-authored-by: Hooman Yar <yarhooma@amazon.com>	2025-12-05 09:22:37 +08:00
Kane Zhu	2c518bbd70	fix(docker): add --provenance=false --sbom=false for Lambda compatibility Docker BuildKit (especially with docker-container driver) may create OCI image manifests with attestations that AWS Lambda does not support. Lambda requires Docker V2 Schema 2 format without multi-manifest index. This fix ensures the build script generates Lambda-compatible images regardless of the user's Docker/BuildKit configuration. Fixes #206	2025-11-27 18:54:58 +08:00
Justin Dray	37374e79ba	fix: Allow the push-to-ecr.sh script to run from anywhere instead of requiring the user to cd manually (#202) * fix: Allow the push-to-ecr.sh script to run from anywhere instead of requiring the user to cd manually * Add docker-compose to support running locally	2025-11-20 14:33:43 +08:00
Viktor Isaev	b3c1c82367	Fix healthcheck in Dockerfile_ecs (#199) The healthcheck in Dockerfile_ecs uses the hardcoded port instead of ENV setting. This was fixed.	2025-11-20 14:30:00 +08:00
user-error1	ce4cfabb21	Fixed <think> </think> tags for GPT-OSS in bedrock.py (#200) Added handling for message and content block deltas, including safety checks for open thinking tags. Results in working reasoning and makes GPT-OSS 80/120b usable in frontends that expect closing thinking tags.	2025-11-20 14:29:20 +08:00
Donghee Na	7e03ab062d	fix: Fix invalid cache_creation_tokens metric key (#195)	2025-10-27 14:31:21 +08:00
Shion Ichikawa	18b68bd3a7	🐳 preload tiktoken encoding in Dockerfile_ecs (#193)	2025-10-22 22:28:40 +08:00
Kane Zhu	d86e64eed3	refactor(bedrock): unify inference profile metadata handling and cleanup - Add unified profile_metadata dictionary for both SYSTEM_DEFINED and APPLICATION inference profiles - Remove unused region prefix functions and defaultdict import - Add TEMPERATURE_TOPP_CONFLICT_MODELS set for Claude model parameter conflicts - Improve model ARN parsing and error handling in profile enumeration - Consolidate profile metadata storage to enable consistent feature detection	2025-10-16 15:24:02 +08:00
Kane Zhu	b4800c54a0	feat: add prompt caching support for Claude and Nova models Add comprehensive prompt caching support with flexible control options: Features: - ENV variable control (ENABLE_PROMPT_CACHING, default: false) - Per-request control via extra_body.prompt_caching - Pattern-based model detection (Claude, Nova) - Token limit warnings (Nova 20K limit) - OpenAI-compatible response format (prompt_tokens_details.cached_tokens) Supported models: - Claude 3+ models (anthropic.claude-) - Nova models (amazon.nova-) - Auto-detection prevents breaking unsupported models Implementation: - System prompts caching via extra_body.prompt_caching.system - Messages caching via extra_body.prompt_caching.messages - Non-streaming and streaming modes - Compatible with reasoning, thinking, and tool calls	2025-10-15 11:03:19 +08:00
Scott Baxter	7756532b4c	fix: ECS container /health endpoint does not require API_KEY Bearer Token (#184)	2025-10-13 11:59:42 +08:00
Li Yi	9cea7f9314	chore: polish code with little update (#182) - Run Docker container as non-root user (appuser) to minimize security risks - Add Docker HEALTHCHECK for better container orchestration - Make CORS configurable via ALLOWED_ORIGINS env var with security warning - Replace assertions with proper error handling (TypeError/ValueError) - Add 30s timeout to HTTP requests to prevent hanging connections - Disable auto-reload in production uvicorn settings	2025-10-11 14:49:18 +08:00
Fabian Franz	8177876e5e	Support <think> tags (#117)	2025-09-30 20:29:19 +08:00
Neil Mazumdar	66cb51bb36	feat: add Claude Sonnet 4.5 support with global cross-region inference (#180) This commit adds comprehensive support for Claude Sonnet 4.5 (claude-sonnet-4-5-20250929), Anthropic's most intelligent model with enhanced coding capabilities and complex agent support. Changes: - Added global cross-region inference profile discovery (global.anthropic.*) - Fixed temperature/topP compatibility for Claude Sonnet 4.5 (model doesn't support both simultaneously) - Fixed reasoning_effort parameter handling to prevent KeyError - Added extended thinking/interleaved thinking support via extra_body parameter - Updated documentation with Claude Sonnet 4.5 examples (English and Chinese) - Updated README with Sonnet 4.5 announcement Technical Details: - src/api/models/bedrock.py: Added global profile support in list_bedrock_models() - src/api/models/bedrock.py: Added Claude Sonnet 4.5 detection to remove topP parameter - src/api/models/bedrock.py: Changed pop("topP") to pop("topP", None) to prevent KeyError - docs/Usage.md: Added Chat Completions section with Sonnet 4.5 examples - docs/Usage.md: Updated Interleaved thinking section with Sonnet 4.5 examples - docs/Usage_CN.md: Added Chinese versions of all Sonnet 4.5 documentation Model ID: global.anthropic.claude-sonnet-4-5-20250929-v1:0	2025-09-30 16:51:26 +08:00
Mengxin Zhu	371d11d101	chore: cleanup useless files	2025-09-30 16:08:56 +08:00
Mengxin Zhu	e3ee9a707f	docs: update deployment instructions and enhance ECR push script	2025-09-30 16:06:21 +08:00
Divyateja Pasupuleti	bdfa57c277	chore: update requirements to fix vulnerability (#177) * chore: update requirements to fix vulnerability * Update Python base image to version 3.13-slim	2025-09-19 16:15:32 +08:00
jbrockett	911dfe26d6	models: fix Application Inference Profiles mapping (#175) * models: fix Application Inference Profiles mapping to include all profiles per model_id; switch to defaultdict(set) and emit all AIPs * Fix rebase issue --------- Co-authored-by: Jeremy Brockett <313937+jbrockett@users.noreply.github.com>	2025-08-14 15:21:14 +08:00
RizviR	a2110ff648	Add pagination to list_inference_profiles calls (#173) Co-authored-by: Rizvi Rahim <rizvi@rizvir.com>	2025-08-13 10:26:34 +08:00
Fabian Franz	0cce2edab0	feat: update boto3 to version 1.40.4 (#169) Updates boto3 from 1.37.0 to 1.40.4 and botocore from 1.37.0 to 1.40.4. This update enables support for AWS_BEARER_TOKEN_BEDROCK functionality and includes the latest AWS service features and bug fixes. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-authored-by: Claude <noreply@anthropic.com>	2025-08-13 10:23:30 +08:00
heisenbergye	3f1b56a526	feat: support Claude 4 Interleaved thinking (beta) (#164)	2025-07-21 16:44:21 +08:00
Mengxin Zhu	76a3614f17	fix: properly handle tool_use messages in conversation	2025-06-30 00:14:26 +08:00
Gagan M	01836087b1	feat: add support to include application inference profiles as models (#131) --------- Co-authored-by: Mengxin Zhu <843303+zxkane@users.noreply.github.com>	2025-06-23 22:49:27 +08:00
dependabot[bot]	dd191d7cd9	Bump requests from 2.32.3 to 2.32.4 in /src (#151) Bumps [requests](https://github.com/psf/requests) from 2.32.3 to 2.32.4. - [Release notes](https://github.com/psf/requests/releases) - [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md) - [Commits](https://github.com/psf/requests/compare/v2.32.3...v2.32.4) --- updated-dependencies: - dependency-name: requests dependency-version: 2.32.4 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-06-20 17:50:19 +08:00
Zack Elias	844efec086	add titan G1 embeddings (#152)	2025-06-17 11:09:22 +08:00
UniMa007	aed57307bc	Add Titan Embeddings G2 (#94)	2025-05-27 21:52:15 +08:00
Aiden Dai	4e8a913e43	fix empty content issue	2025-04-20 09:21:47 +08:00
Aiden Dai	b27e83624f	fix typo	2025-03-26 13:10:07 +08:00
Aiden Dai	c98e123c8f	optimize error response in streaming	2025-03-26 11:32:39 +08:00
Aiden Dai	4f1a75b49f	fix potential process stuck issue	2025-03-22 18:39:08 +08:00
Aiden Dai	0ead770069	performance improvement	2025-03-13 18:24:08 +08:00
Aiden Dai	fa14ae8c05	apply ruff linter	2025-03-13 14:24:41 +08:00
Aiden Dai	879b8e2ac7	apply ruff linter	2025-03-13 13:58:18 +08:00
Aiden Dai	f21b9a2e84	apply ruff linter	2025-03-13 13:50:57 +08:00
Aiden Dai	33e8fcfd3b	fix potential bad request issue	2025-03-13 07:16:42 +08:00
Aiden Dai	5ff18c0acd	Update usage guide for deepseek-r1	2025-03-11 10:25:50 +08:00
Aiden Dai	fcbfa9fe3d	Update usage guide for deepseek-r1	2025-03-11 10:24:19 +08:00
Aiden Dai	1a9c0f461e	Update usage guide for deepseek-r1	2025-03-11 10:14:06 +08:00
Aiden Dai	66b8967d30	Update usage guide for deepseek-r1	2025-03-11 10:10:58 +08:00
Zhongsheng Ji	fcfebf9d9d	feat: Response 429 if ThrottlingException (#91)	2025-03-10 09:01:33 +08:00
Aiden Dai	283115000a	Support of reasoning	2025-02-28 08:08:54 +08:00
Aiden Dai	4095c2e74e	Support of reasoning	2025-02-26 13:28:23 +08:00
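
Commit d14596ff47 in the list above mentions replacing `dimensions or DEFAULT` with upfront validation, because the `or` form silently ignores `dimensions=0`. A minimal sketch of that pattern follows. The function name `resolve_dimensions` and the constant name `DEFAULT_DIMENSIONS` are hypothetical and not taken from the repository; the allowed values and the default of 3072 come from the commit message. The repository may raise a different error type than the `ValueError` assumed here.

```python
# Allowed values as listed in the message of commit d14596ff47.
VALID_DIMENSIONS = {256, 384, 1024, 3072}
# Hypothetical constant name; the value 3072 is the default stated in the commit message.
DEFAULT_DIMENSIONS = 3072


def resolve_dimensions(dimensions=None):
    """Return the embedding size to request, or raise on an unsupported value."""
    # `dimensions or DEFAULT_DIMENSIONS` would treat 0 as "not provided".
    # Comparing against None keeps 0 visible so it is rejected below.
    if dimensions is None:
        return DEFAULT_DIMENSIONS
    if dimensions not in VALID_DIMENSIONS:
        raise ValueError(
            f"Unsupported dimensions {dimensions!r}; "
            f"expected one of {sorted(VALID_DIMENSIONS)}"
        )
    return dimensions
```

With this shape, an omitted value falls back to the default, a supported value passes through unchanged, and `0` produces an explicit error instead of being swapped for the default.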

147 Commits