feat: add prompt caching support for Claude and Nova models
Add comprehensive prompt caching support with flexible control options.

Features:
- ENV variable control (ENABLE_PROMPT_CACHING, default: false)
- Per-request control via extra_body.prompt_caching
- Pattern-based model detection (Claude, Nova)
- Token limit warnings (Nova 20K limit)
- OpenAI-compatible response format (prompt_tokens_details.cached_tokens)

Supported models:
- Claude 3+ models (anthropic.claude-*)
- Nova models (amazon.nova-*)
- Auto-detection prevents breaking unsupported models

Implementation:
- System prompt caching via extra_body.prompt_caching.system
- Message caching via extra_body.prompt_caching.messages
- Non-streaming and streaming modes
- Compatible with reasoning, thinking, and tool calls
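As a usage sketch of the per-request control described above (not taken from this diff): the gateway endpoint, API key, and the boolean shape of the prompt_caching flags are assumptions based on the feature list; only the extra_body.prompt_caching.system / .messages keys and the prompt_tokens_details.cached_tokens field are named in the commit message.

    # Hypothetical per-request prompt caching via extra_body, using the
    # OpenAI Python SDK against the gateway's OpenAI-compatible API.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:8000/api/v1",  # assumed gateway endpoint
        api_key="bedrock",                        # assumed placeholder key
    )

    response = client.chat.completions.create(
        model="anthropic.claude-3-sonnet-20240229-v1:0",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Summarize the attached document."},
        ],
        extra_body={
            "prompt_caching": {
                "system": True,    # cache the system prompt
                "messages": True,  # cache message content
            }
        },
    )

    # The commit adds OpenAI-compatible cache reporting; on cache hits,
    # cached_tokens should appear under prompt_tokens_details.
    details = getattr(response.usage, "prompt_tokens_details", None)
    if details is not None:
        print("cached tokens:", getattr(details, "cached_tokens", 0))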
@@ -11,6 +11,13 @@ Parameters:
     Type: String
     Default: anthropic.claude-3-sonnet-20240229-v1:0
     Description: The default model ID, please make sure the model ID is supported in the current region
+  EnablePromptCaching:
+    Type: String
+    Default: "false"
+    AllowedValues:
+      - "true"
+      - "false"
+    Description: Enable prompt caching for supported models (Claude, Nova). When enabled, adds cachePoint to system prompts and messages for cost savings.
 Resources:
   VPCB9E5F0B4:
     Type: AWS::EC2::VPC
@@ -251,6 +258,9 @@ Resources:
               Value: "true"
             - Name: ENABLE_APPLICATION_INFERENCE_PROFILES
               Value: "true"
+            - Name: ENABLE_PROMPT_CACHING
+              Value:
+                Ref: EnablePromptCaching
           Essential: true
           Image:
             Ref: ContainerImageUri
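The container code that consumes ENABLE_PROMPT_CACHING is not part of this diff. As a rough, hypothetical illustration of the gating the commit message describes (ENV default, per-request override, pattern-based model detection, Nova 20K warning), it could look like the following; all names here are invented for the sketch:

    import logging
    import os
    import re

    logger = logging.getLogger(__name__)

    # ENV default, matching the template parameter wired in above.
    ENABLE_PROMPT_CACHING = (
        os.environ.get("ENABLE_PROMPT_CACHING", "false").lower() == "true"
    )

    # Pattern-based detection so cachePoint is never sent to unsupported
    # models. The commit names anthropic.claude-* (Claude 3+) and
    # amazon.nova-*; the real code presumably also checks the Claude version.
    CACHEABLE_MODEL_PATTERNS = [
        re.compile(r"^anthropic\.claude-"),
        re.compile(r"^amazon\.nova-"),
    ]

    NOVA_CACHE_TOKEN_LIMIT = 20_000  # Nova's documented 20K cache limit


    def supports_prompt_caching(model_id: str) -> bool:
        """True only for model IDs matching a supported pattern."""
        return any(p.search(model_id) for p in CACHEABLE_MODEL_PATTERNS)


    def should_cache(model_id: str, extra_body: dict) -> bool:
        """Per-request prompt_caching overrides the ENV default."""
        if not supports_prompt_caching(model_id):
            return False
        opts = extra_body.get("prompt_caching")
        if opts is not None:
            return bool(opts.get("system") or opts.get("messages"))
        return ENABLE_PROMPT_CACHING


    def warn_if_over_nova_limit(model_id: str, prompt_tokens: int) -> None:
        """Emit the token-limit warning the commit adds for Nova."""
        if model_id.startswith("amazon.nova-") and prompt_tokens > NOVA_CACHE_TOKEN_LIMIT:
            logger.warning(
                "Prompt exceeds Nova's 20K-token cache limit; "
                "cachePoint may be ignored."
            )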