Starexe
📖 Tutorial

How to Optimize Prompts in Amazon Bedrock: A Step-by-Step Guide

Last updated: 2026-05-18 01:11:33 Intermediate

Introduction

Amazon Bedrock's new Advanced Prompt Optimization tool automatically refines your prompts for any supported model and can compare performance across up to five models at once. Whether you're migrating to a newer model or want more accuracy from your current one, the tool uses a metric-driven feedback loop to improve your prompts iteratively. You can test optimized prompts against known use cases to check for regressions and to improve underperforming tasks. This guide walks you through the entire process, from preparation to analysis.

Source: aws.amazon.com

What You Need

  • An active AWS account with access to Amazon Bedrock.
  • Permission to use the Bedrock console and create prompt optimization jobs.
  • A set of prompt templates in JSONL format (see below for structure).
  • Example user inputs for variable placeholders in your prompts.
  • Ground truth answers for evaluation (if using a metric-driven approach).
  • An evaluation metric or rewriting guidance—choose one of:
    • A short natural language description of what you want.
    • An LLM-as-a-judge rubric (custom LLM prompt and model ID).
    • An AWS Lambda function ARN for custom evaluation.
  • (Optional) Multimodal inputs: PNG, JPG, or PDF files for document/image analysis tasks.

Step-by-Step Instructions

Step 1: Access the Advanced Prompt Optimization Page

Log in to the AWS Management Console and navigate to Amazon Bedrock. In the left navigation pane, under Prompt management, choose Advanced Prompt Optimization. Click the Create prompt optimization button to start a new job.

Step 2: Select Up to Five Inference Models

You can choose up to 5 models for prompt optimization. This feature is ideal for two scenarios:

  • Model migration: Select your current model as a baseline and up to 4 target models you want to evaluate.
  • Performance improvement: Select only your current model to compare original vs. optimized prompts on the same model.

For each model, the tool evaluates both the original prompt and an optimized version, so you can compare scores, cost, and latency side by side.

Step 3: Prepare Your Prompt Templates in JSONL Format

The tool expects a JSONL file where each line is a single JSON object representing one template. Each object uses the following fields, some required and some optional:

  • version (required): use "bedrock-2026-05-14".
  • templateId (required): a unique identifier for this template.
  • promptTemplate (required): your prompt text with variable placeholders, e.g., "Please summarize the following article: {{article}}", where the placeholder name matches a key in inputVariables.
  • steeringCriteria (optional): an array of strings describing desired behavior.
  • customEvaluationMetricLabel (required if you use a custom LLM judge or Lambda): a label for your metric.
  • customLLMJConfig (optional): an object with customLLMJPrompt (the rubric or instruction for the judge LLM) and customLLMJModelId (the model ID for the judge).
  • evaluationMetricLambdaArn (optional): ARN of a Lambda function that computes the metric.
  • evaluationSamples (required): an array of sample objects, each containing:
    • inputVariables: an array of objects mapping variable names to example values (e.g., [{"article": "Text of article here"}]).
    • referenceResp (required): the ground truth answer for this sample.

Important: Each JSON object must be on a single line. Save the file with a .jsonl extension.
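
The field list above can be turned into a file with a few lines of Python. The following is a minimal sketch, not an official AWS utility: the helper names (build_template, to_jsonl_line, validate_jsonl), the template ID, the sample text, and the {{article}} placeholder syntax are assumptions made for illustration. Only the field names and the version string come from this guide.

```python
import json

# Fields this guide marks as required on every template object.
REQUIRED_FIELDS = ("version", "templateId", "promptTemplate", "evaluationSamples")


def build_template(template_id, prompt_template, samples, steering=None):
    """Assemble one template object using the field names listed in this guide."""
    record = {
        "version": "bedrock-2026-05-14",
        "templateId": template_id,
        "promptTemplate": prompt_template,
        "evaluationSamples": samples,
    }
    if steering:
        record["steeringCriteria"] = list(steering)
    return record


def to_jsonl_line(record):
    """Serialize one record onto a single line; json.dumps escapes embedded newlines."""
    return json.dumps(record)


def validate_jsonl(text):
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    for number, line in enumerate(text.split("\n"), start=1):
        if not line.strip():
            continue  # ignore blank lines such as a trailing newline
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {number}: not valid JSON ({exc.msg})")
            continue
        if not isinstance(obj, dict):
            problems.append(f"line {number}: expected a JSON object")
            continue
        for field in REQUIRED_FIELDS:
            if field not in obj:
                problems.append(f"line {number}: missing {field}")
        for i, sample in enumerate(obj.get("evaluationSamples", [])):
            if "referenceResp" not in sample:
                problems.append(f"line {number}: sample {i} missing referenceResp")
    return problems


# Illustrative values only; replace them with your own template and samples.
sample = {
    "inputVariables": [{"article": "Text of article here"}],
    "referenceResp": "A short reference summary.",
}
record = build_template(
    "summarize-article-v1",  # hypothetical template ID
    "Please summarize the following article: {{article}}",  # placeholder syntax assumed
    [sample],
    steering=["Avoid overly verbose responses"],
)
line = to_jsonl_line(record)
```

To produce the upload file, write one such line per template to a file ending in .jsonl, and run validate_jsonl on the file contents first. These checks cover only the structure described in this guide; the service may enforce rules this sketch does not know about.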

Step 4: Configure the Evaluation Metric

Choose how the optimization will measure success:

  • Natural language description: Provide a short instruction like “The response should be accurate, concise, and in bullet points.” The tool uses an internal judge to score each output.
  • LLM-as-a-judge: Supply a custom rubric prompt and specify the model ID of the judge (e.g., the ID for Claude 3.5 Haiku). The judge evaluates model responses against your rubric.
  • Lambda function: Provide the ARN of your own Lambda function that takes the model response and ground truth and returns a numeric score.

If you use a custom metric (a custom LLM judge or a Lambda function), you must also provide a custom evaluation metric label (e.g., "accuracy_score").
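
For the Lambda option, a scoring function might look like the sketch below. This guide does not specify the event payload or the expected return shape, so the event keys ("modelResponse", "referenceResp") and the {"score": ...} return value are assumptions; check the Bedrock documentation for the actual contract before deploying. The token-overlap F1 metric is just one possible choice of numeric score.

```python
from collections import Counter


def token_f1(response: str, reference: str) -> float:
    """Token-overlap F1 between a model response and the ground truth (0.0 to 1.0)."""
    resp_tokens = response.lower().split()
    ref_tokens = reference.lower().split()
    if not resp_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most as often as it
    # appears in both texts.
    overlap = sum((Counter(resp_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(resp_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def lambda_handler(event, context):
    """Standard Python Lambda entry point.

    ASSUMPTION: the event keys and the return shape below are hypothetical
    and are not taken from the Bedrock documentation.
    """
    response = event.get("modelResponse", "")
    reference = event.get("referenceResp", "")
    return {"score": token_f1(response, reference)}
```

You can exercise the handler locally by calling lambda_handler with a hand-built dictionary and None for the context before wiring it to an optimization job.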

Step 5: Upload Your JSONL File and Start the Optimization

On the creation page, upload your prepared JSONL file. The system will parse the templates and samples. Review the configuration and click Start optimization. The process runs in a metric-driven feedback loop: the optimizer iteratively refines the prompt, generates model responses, evaluates them using your chosen metric, and adjusts the prompt until it converges on the best version.
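
To build intuition for what a metric-driven feedback loop does, here is a toy sketch. It is not Bedrock's algorithm, which this guide does not describe in detail. The functions optimize, toy_propose, and toy_score are hypothetical; in a real setting, scoring would involve calling the model on your evaluation samples and applying your chosen metric.

```python
from typing import Callable, List, Tuple


def optimize(
    prompt: str,
    propose: Callable[[str], List[str]],
    score: Callable[[str], float],
    max_rounds: int = 5,
    patience: int = 2,
) -> Tuple[str, float]:
    """Greedy loop: propose rewrites, score them, keep the best, stop when stalled."""
    best_prompt, best_score = prompt, score(prompt)
    stale = 0
    for _ in range(max_rounds):
        candidates = propose(best_prompt)
        if not candidates:
            break
        top = max(candidates, key=score)
        top_score = score(top)
        if top_score > best_score:
            best_prompt, best_score = top, top_score
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                break  # no improvement for `patience` rounds in a row
    return best_prompt, best_score


# Toy stand-ins so the loop can run without any model calls.
HINTS = [" Be concise.", " Use bullet points.", " Be polite."]


def toy_propose(prompt: str) -> List[str]:
    """Append each hint that the prompt does not already contain."""
    return [prompt + hint for hint in HINTS if hint not in prompt]


def toy_score(prompt: str) -> float:
    """Reward prompts that ask for concise, bulleted output."""
    return float(("concise" in prompt) + ("bullet" in prompt))


best_prompt, best_score = optimize("Summarize the article.", toy_propose, toy_score)
```

With these toy functions, the loop adds the two hints the metric rewards and stops once further rewrites no longer raise the score. That mirrors the refine, generate, evaluate, and adjust cycle described above.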

Step 6: Review the Results

After completion, you’ll see a report comparing original and optimized prompts for each selected model. The report includes:

  • Evaluation scores for both original and optimized prompts.
  • Cost estimates per inference call.
  • Latency predictions.
  • The final optimized prompt template.

You can drill down into individual samples to see how the optimized prompt performed. Use this information to decide whether to adopt the new prompt, deploy it to a new model, or iterate further.
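
If you copy the report figures into a script, a small helper can make the adopt-or-iterate decision more systematic. The sketch below assumes a hypothetical ModelResult record that you fill in by hand from the report; it does not read any Bedrock export format. The model names and numbers are placeholders, not real results.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelResult:
    """One row copied from the report (structure is an assumption for illustration)."""
    model: str
    original_score: float
    optimized_score: float
    cost_per_call: float
    latency_ms: float


def regressions(results: List[ModelResult]) -> List[str]:
    """Models where the optimized prompt scored lower than the original."""
    return [r.model for r in results if r.optimized_score < r.original_score]


def pick_candidate(
    results: List[ModelResult], max_cost: float, max_latency_ms: float
) -> Optional[ModelResult]:
    """Highest optimized score among models that fit the cost and latency budget."""
    eligible = [
        r for r in results
        if r.cost_per_call <= max_cost and r.latency_ms <= max_latency_ms
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda r: r.optimized_score)


# Placeholder numbers for illustration only.
results = [
    ModelResult("model-a", 0.70, 0.82, 0.004, 900.0),
    ModelResult("model-b", 0.75, 0.73, 0.002, 600.0),
    ModelResult("model-c", 0.68, 0.88, 0.010, 1500.0),
]
regressed = regressions(results)
choice = pick_candidate(results, max_cost=0.005, max_latency_ms=1000.0)
```

Filtering by budget first and ranking by score second reflects the point that the highest-scoring prompt is not always the one you can afford to run. Any model that appears in the regressions list deserves a closer look at its individual samples.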

Tips and Best Practices

  • Start with 2–3 samples for quick iteration; add more samples for higher reliability.
  • Include edge cases in your evaluation samples to test robustness.
  • For multimodal tasks, you can include image or PDF file references in the input variables (the tool supports PNG, JPG, and PDF).
  • Use steering criteria to guide the optimizer away from undesired behaviors (e.g., "Avoid overly verbose responses").
  • Compare across models to find the best fit for your use case—don’t assume your current model is optimal.
  • Review latency and cost alongside accuracy; the best prompt may not be the cheapest.
  • Iterate: The tool is designed to be used repeatedly as your tasks evolve.