feat: add prompt caching support for Claude and Nova models
Add comprehensive prompt caching support with flexible control options.

Features:
- ENV variable control (ENABLE_PROMPT_CACHING, default: false)
- Per-request control via extra_body.prompt_caching
- Pattern-based model detection (Claude, Nova)
- Token limit warnings (Nova 20K limit)
- OpenAI-compatible response format (prompt_tokens_details.cached_tokens)

Supported models:
- Claude 3+ models (anthropic.claude-*)
- Nova models (amazon.nova-*)
- Auto-detection prevents breaking unsupported models

Implementation:
- System prompt caching via extra_body.prompt_caching.system
- Message caching via extra_body.prompt_caching.messages
- Non-streaming and streaming modes
- Compatible with reasoning, thinking, and tool calls
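As a usage sketch of the per-request control described above (not taken from this diff): the gateway endpoint, API key, and the boolean shape of the prompt_caching flags are assumptions based on the feature list; only the extra_body.prompt_caching.system / .messages keys and the prompt_tokens_details.cached_tokens field are named in the commit message.

    # Hypothetical per-request prompt caching via extra_body, using the
    # OpenAI Python SDK against the gateway's OpenAI-compatible API.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:8000/api/v1",  # assumed gateway endpoint
        api_key="bedrock",                        # assumed placeholder key
    )

    response = client.chat.completions.create(
        model="anthropic.claude-3-sonnet-20240229-v1:0",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Summarize the attached document."},
        ],
        extra_body={
            "prompt_caching": {
                "system": True,    # cache the system prompt
                "messages": True,  # cache message content
            }
        },
    )

    # The commit adds OpenAI-compatible cache reporting; on cache hits,
    # cached_tokens should appear under prompt_tokens_details.
    details = getattr(response.usage, "prompt_tokens_details", None)
    if details is not None:
        print("cached tokens:", getattr(details, "cached_tokens", 0))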
@@ -11,6 +11,13 @@ Parameters:
     Type: String
     Default: anthropic.claude-3-sonnet-20240229-v1:0
     Description: The default model ID, please make sure the model ID is supported in the current region
+  EnablePromptCaching:
+    Type: String
+    Default: "false"
+    AllowedValues:
+      - "true"
+      - "false"
+    Description: Enable prompt caching for supported models (Claude, Nova). When enabled, adds cachePoint to system prompts and messages for cost savings.
 Resources:
   VPCB9E5F0B4:
     Type: AWS::EC2::VPC
@@ -251,6 +258,9 @@ Resources:
               Value: "true"
             - Name: ENABLE_APPLICATION_INFERENCE_PROFILES
               Value: "true"
+            - Name: ENABLE_PROMPT_CACHING
+              Value:
+                Ref: EnablePromptCaching
           Essential: true
           Image:
             Ref: ContainerImageUri
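The container code that consumes ENABLE_PROMPT_CACHING is not part of this diff. As a rough, hypothetical illustration of the gating the commit message describes (ENV default, per-request override, pattern-based model detection, Nova 20K warning), it could look like the following; all names here are invented for the sketch:

    import logging
    import os
    import re

    logger = logging.getLogger(__name__)

    # ENV default, matching the template parameter wired in above.
    ENABLE_PROMPT_CACHING = (
        os.environ.get("ENABLE_PROMPT_CACHING", "false").lower() == "true"
    )

    # Pattern-based detection so cachePoint is never sent to unsupported
    # models. The commit names anthropic.claude-* (Claude 3+) and
    # amazon.nova-*; the real code presumably also checks the Claude version.
    CACHEABLE_MODEL_PATTERNS = [
        re.compile(r"^anthropic\.claude-"),
        re.compile(r"^amazon\.nova-"),
    ]

    NOVA_CACHE_TOKEN_LIMIT = 20_000  # Nova's documented 20K cache limit


    def supports_prompt_caching(model_id: str) -> bool:
        """True only for model IDs matching a supported pattern."""
        return any(p.search(model_id) for p in CACHEABLE_MODEL_PATTERNS)


    def should_cache(model_id: str, extra_body: dict) -> bool:
        """Per-request prompt_caching overrides the ENV default."""
        if not supports_prompt_caching(model_id):
            return False
        opts = extra_body.get("prompt_caching")
        if opts is not None:
            return bool(opts.get("system") or opts.get("messages"))
        return ENABLE_PROMPT_CACHING


    def warn_if_over_nova_limit(model_id: str, prompt_tokens: int) -> None:
        """Emit the token-limit warning the commit adds for Nova."""
        if model_id.startswith("amazon.nova-") and prompt_tokens > NOVA_CACHE_TOKEN_LIMIT:
            logger.warning(
                "Prompt exceeds Nova's 20K-token cache limit; "
                "cachePoint may be ignored."
            )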