Day 2

Prompt Engineering
Workshop

Mastering Prompt Techniques for All Users

AnyCompany Financial Group · Generative & Agentic AI on AWS

Module 1

Prompt Fundamentals
Deep Dive

The 4 pillars that determine 80% of output quality

The 80/20 Rule of Prompting

80% of prompt quality comes from 4 fundamentals:

1. Clarity

Say exactly what you mean. If a colleague would ask "what do you mean?" — your prompt needs work.

2. Context

Give the AI the background it needs. Without context, it guesses — dangerous in finance.

3. Role Assignment

Tell the AI who to be. A "risk analyst" focuses on different signals than a "support agent."

4. Output Framing

Define what "done" looks like — format, length, structure, style.

Pillar 1: Clarity

| Vague | Clear |
| --- | --- |
| "Summarize this report" | "Summarize this quarterly earnings report in 5 bullet points, focusing on revenue growth, cost changes, and risk factors" |
| "Help me with this data" | "Analyze this CSV of 500 transactions and identify the top 3 merchants by total volume" |
| "Write something about compliance" | "Draft a 200-word summary of MAS Notice 626 requirements for e-payment service providers" |

Rule of thumb: The more specific your prompt, the less the AI has to guess.

Pillar 2: Context

Without context:

"Is this transaction suspicious?"
With context:

"This merchant is a convenience store 
in Singapore, typically 50-80 txns/day 
averaging $15 SGD. Today: 340 txns 
averaging $4.50. Is this suspicious?"

Types of context: Domain · Data · Situational · Constraints

4 Types of Context

| Type | What it tells the AI | Finance example |
| --- | --- | --- |
| Domain | The industry, market, and business area | "In the context of Southeast Asian digital payments and PayLater services..." |
| Data | The specific numbers, records, or documents to analyze | "Here is the merchant's transaction history for the last 6 months: [data]" |
| Situational | Why you need this now — the trigger or event | "We are preparing for a quarterly board review" / "This merchant was flagged by our monitoring system" |
| Constraints | Rules, limits, and requirements the output must follow | "All amounts in SGD with 2 decimal places" / "Follow MAS Notice 626 guidelines" |

Rule of thumb: If you skip Domain context, the AI gives generic answers. If you skip Data context, it hallucinates. If you skip Situational context, it guesses your purpose. If you skip Constraints, it ignores your standards.

Context in Action: Merchant Review

[DOMAIN]
You are reviewing an AnyCompany Pay merchant in 
Singapore's food & beverage sector.

[DATA]
Merchant: Kopi Corner Pte Ltd (ID: MC-8842)
Monthly txn volume: 4,200 → 15,600 (6-month trend)
Avg transaction: $8.50 SGD
Chargeback rate: 0.3% → 4.1% (6-month trend)
Complaints: 12 in last 30 days (up from 2)

[SITUATIONAL]
Auto-flagged: chargeback rate exceeds 1.0% threshold.
Risk committee meets Friday.

[CONSTRAINTS]
- All amounts in SGD
- Reference AnyCompany's chargeback policy (max 1.0%)
- Use only the data provided above
- Include a GREEN/AMBER/RED risk rating

Pillar 3: Role Assignment

| Role | What changes in the output |
| --- | --- |
| Compliance Officer | Focuses on regulatory requirements, flags risks |
| Customer Support Agent | Empathetic language, resolution-focused |
| Financial Analyst | Numbers, trends, comparisons, frameworks |
| Fraud Investigator | Patterns, anomalies, evidence chains |

Pro tip: Add experience level — "Senior Credit Risk Analyst with 10 years of experience in Southeast Asian consumer lending, specializing in PayLater products"

Pillar 4: Output Framing

| Dimension | Example |
| --- | --- |
| Format | "Respond as a bullet list" / "Use a table" |
| Length | "In exactly 3 sentences" / "Under 200 words" |
| Structure | "Use sections: Summary, Analysis, Recommendation" |
| Style | "Write for a non-technical executive audience" |
| Constraints | "Do not include personal opinions" |

Output Framing in Action

Same question: "What happened with PayLater chargebacks this quarter?"

❌ No output framing

PayLater chargebacks have been increasing this quarter. The rate went from about 2.1% to 2.8% which is a notable increase. There are several factors that could be contributing to this including seasonal spending patterns, new merchant onboarding, and changes in customer demographics. The team should look into this further and consider implementing additional fraud checks. It might also be worth reviewing the underwriting criteria for new merchants. Overall, while the increase is concerning, it's not yet at critical levels but should be monitored closely going forward.

⚠ Wall of text. No structure. Vague recommendations. Key numbers buried.

✅ With output framing

Summary: Chargeback rate rose 2.1% → 2.8% (+33%).

Root causes:
• New merchant cohort (Q2): 4.2% vs 1.8% established
• "Item not received" disputes +45% in Indonesia

Actions:
• Tighten onboarding for high-risk MCCs — Risk, 2 wks
• Add delivery confirmation for ID — Product, 1 wk
• Review top 20 merchants by volume — Ops, immediate

✓ Scannable. Key number upfront. Actions with owners & timelines.

All 4 Pillars Together

[ROLE] You are a Payment Operations Manager at a 
Southeast Asian fintech.

[CONTEXT] Below is the daily transaction summary across 
6 markets. We had a system outage in Thailand from 
2:00-4:30 PM yesterday.

[TASK] Analyze the data and produce a daily operations 
brief for leadership.

[OUTPUT] Format as:
1. Executive Summary (3 sentences)
2. Market Performance Table
3. Incidents & Impact (bullets)
4. Action Items (numbered, with owners)
Keep under 400 words.
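The four bracketed sections above can be assembled programmatically so analysts never retype them. A minimal Python sketch, with a hypothetical `build_prompt` helper (the labels mirror the slide, not any library API):

```python
def build_prompt(role: str, context: str, task: str, output: str) -> str:
    """Assemble a 4-pillar prompt from labeled sections."""
    return "\n\n".join([
        f"[ROLE] {role}",
        f"[CONTEXT] {context}",
        f"[TASK] {task}",
        f"[OUTPUT] {output}",
    ])

prompt = build_prompt(
    role="You are a Payment Operations Manager at a Southeast Asian fintech.",
    context="Below is the daily transaction summary across 6 markets.",
    task="Analyze the data and produce a daily operations brief for leadership.",
    output="4 sections: Summary, Market Table, Incidents, Actions. Under 400 words.",
)
```

Each pillar stays a named argument, so a missing one fails loudly instead of producing a vague prompt.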

See the Difference: Merchant Review

❌ Without Technique

Prompt: "Review this merchant"

The merchant seems to be doing okay. Their transactions have been increasing which is good. There are some chargebacks but that's normal for this type of business. Overall they appear to be a reasonable merchant to work with. I would recommend continuing the relationship.

⚠ Vague, no data cited, no structure, no actionable recommendation, no risk rating

✅ With 4 Pillars

Prompt: Role + Context + Structure + Constraints

RISK RATING: AMBER
Transaction Analysis: Volume surged 271% (4,200 to 15,600) over 6 months. Avg txn size flat at $8.50 suggests organic growth, not fraud.
Chargeback Alert: Rate escalated from 0.3% to 4.1%, exceeding the 1.0% benchmark by 4x. Primary driver: "Item not received" (45%).
Action: 1) Issue chargeback warning letter (Merchant Ops, 5 days) 2) Reduce PayLater limit to $5 minimum (Risk, immediate)

✓ Structured, data-cited, specific actions with owners and timelines

Module 2

Chain-of-Thought
Reasoning

Making AI show its work — step by step

Why Chain-of-Thought?

Financial decisions require multi-step logic. CoT makes reasoning visible and auditable.

Without CoT

"Can this merchant afford a 
$50K credit line?"

Answer: "Yes, they can afford it."
(No reasoning shown)

With CoT

"Think through this step by step."

Revenue: $120K
- Costs: $95K = $25K free cash
- Existing debt: $15K
= Available: $10K
New repayment: $5K
DSCR = 2.0x → Affordable

CoT Techniques

| Technique | Trigger | Best for |
| --- | --- | --- |
| Zero-Shot CoT | "Think step by step" | Quick calculations, simple logic |
| Few-Shot CoT | Provide example with reasoning | Consistent multi-step processes |
| Step-Back | "First identify key factors, then analyze" | Complex analysis needing prioritization |
| Self-Consistency | "Solve 3 ways, report majority" | High-stakes decisions |

Finance rule: Any decision that could be audited should use CoT — the reasoning trail is your documentation.

Zero-Shot CoT Example

Just add "Think step by step" to any prompt — no examples needed.

❌ Without CoT

PROMPT: A merchant has revenue $120K, costs $95K, existing loan repayment $15K/month. They want a $50K credit line with $5K monthly repayments. Can they afford it?
AI RESPONSE:
Yes, they can afford it. The merchant has sufficient revenue to cover the additional repayment.

⚠ No math. No reasoning. Not auditable. Could be wrong.

✅ With "Think step by step"

PROMPT: Same data as left. + "Think through this step by step."
AI RESPONSE:
Revenue: $120K
− Operating costs: $95K
= Free cash flow: $25K
− Existing loan: $15K
= Available: $10K
New repayment: $5K
DSCR = $10K ÷ $5K = 2.0x
Affordable — DSCR above 1.25x threshold

✓ Every step visible. Math auditable. Conclusion backed by evidence.
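The same step-by-step math is easy to mirror in code, which lets you spot-check the AI's arithmetic. A minimal sketch (figures in $K per the slide; the function name and 1.25x threshold follow the example, not a real policy engine):

```python
def assess_credit_line(revenue, costs, existing_debt, new_repayment, threshold=1.25):
    """Mirror the CoT steps: free cash -> available cash -> DSCR -> decision."""
    free_cash = revenue - costs            # Revenue minus operating costs
    available = free_cash - existing_debt  # After existing loan repayments
    dscr = available / new_repayment       # Debt service coverage ratio
    return dscr, dscr >= threshold

dscr, affordable = assess_credit_line(revenue=120, costs=95,
                                      existing_debt=15, new_repayment=5)
```

If the model's stated DSCR disagrees with this calculation, the prompt output is wrong regardless of how confident it sounds.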

Few-Shot CoT Example

Provide one example with reasoning — the AI follows the exact same pattern.

Your prompt (with example)

PROMPT: Assess PayLater applications step by step.

EXAMPLE:
Customer A: income $4,000, balance $800, req $2,000 increase.
Step 1 — Income: $4,000 > $2,500 min ✅
Step 2 — Current DTI: $800/$4,000 = 20% < 30% ✅
Step 3 — New DTI: $2,800/$4,000 = 70% > 30% ❌
Step 4 — DECLINE: DTI exceeds policy max.

NOW ASSESS:
Customer B: income $6,500, balance $1,200, req $1,500 increase.

AI output (follows your pattern)

AI RESPONSE:
Step 1 — Income: $6,500 > $2,500 min ✅
Step 2 — Current DTI: $1,200/$6,500 = 18.5% < 30% ✅
Step 3 — New DTI: $2,700/$6,500 = 41.5% > 30% ❌
Step 4 — DECLINE: DTI 41.5% exceeds 30% max.

Counter-offer: Increase to $750 only (DTI = $1,950/$6,500 = 30.0%).

Key insight: The AI copied your exact Step 1-4 format, applied the same thresholds, and even added a counter-offer — all from one example.
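The policy thresholds from the few-shot example can also be encoded directly, so the AI's Step 1-4 output can be verified run after run. A minimal sketch (function name, step ordering, and the counter-offer rule are illustrative):

```python
def assess_paylater(income, balance, requested, min_income=2500, max_dti=0.30):
    """Apply the example's policy: income floor, then debt-to-income cap."""
    if income < min_income:
        return "DECLINE", None
    new_dti = (balance + requested) / income
    if new_dti > max_dti:
        # Counter-offer: the largest increase that keeps DTI at the cap
        counter = max(0, int(round(max_dti * income - balance)))
        return "DECLINE", counter
    return "APPROVE", None

decision, counter = assess_paylater(income=6500, balance=1200, requested=1500)
```

For Customer B this yields DECLINE with a $750 counter-offer, matching the AI's reasoning above.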

Step-Back Prompting Example

First identify principles, then apply them to the specific case.

Your prompt

PROMPT:
Before analyzing this insurance claim, first answer:
1. What are the 3 most important factors for travel insurance claim validity?
2. What are common red flags in Southeast Asia?

Now assess this claim using those factors:
Claimant: Sarah Tan (SG)
Trip: Bangkok, 3 nights
Claim: $4,200 SGD — laptop stolen from hotel
Filed: 12 days after return
Police report: Filed on last day of trip
Receipt: Online purchase, 2 months before trip

AI response

STEP 1 — KEY FACTORS:
1. Documentation timeliness (police report, filing delay)
2. Evidence quality (receipts, proof of ownership)
3. Claim amount vs trip context

STEP 2 — RED FLAGS:
• Filing delay >7 days • Police report on departure day
• High-value electronics • No hotel incident report

ASSESSMENT: HIGH RISK
• 12-day filing delay ❌
• Police report timing suspicious (last day) ❌
• $4,200 claim for 3-night trip ⚠
Recommend: Request hotel incident report + purchase proof before processing.

Why Step-Back works: The AI built a framework FIRST (factors + red flags), then applied it systematically — instead of jumping to "looks suspicious."

Self-Consistency for High Stakes

Solve 3 ways, report the majority — for decisions where being wrong is costly.

Your prompt

PROMPT:
Investigate this merchant for potential fraud.

Analyze using 3 independent approaches:
1. Velocity patterns (txn frequency vs norms)
2. Amount patterns (deviation from avg size)
3. Geographic patterns (location consistency)

For each: conclude FRAUDULENT / SUSPICIOUS / LEGITIMATE

Final assessment = majority conclusion.
If approaches disagree → flag for human review.

AI response

Approach 1 — Velocity: 340 txns/day vs 75 norm (+353%). Burst pattern at 2-4am. → FRAUDULENT

Approach 2 — Amount: Avg $4.50 vs $15 norm (−70%). Micro-transactions consistent with card testing. → FRAUDULENT

Approach 3 — Geographic: All transactions from Singapore IP. Consistent with merchant location. → LEGITIMATE

MAJORITY: 2/3 FRAUDULENT
Recommend: Suspend merchant, escalate to fraud team for full investigation.

Why 3 approaches? Approach 3 alone would have cleared this merchant. The majority vote catches what a single analysis misses.
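The majority-vote rule itself is simple enough to automate once each approach has returned a verdict. A minimal sketch (the escalation behavior when no majority exists is an assumption from the prompt's "flag for human review" rule):

```python
from collections import Counter

def majority_verdict(verdicts):
    """Self-consistency aggregation: majority wins; no majority -> human review."""
    top, count = Counter(verdicts).most_common(1)[0]
    if count > len(verdicts) // 2:
        return top, False          # (verdict, needs_human_review)
    return "UNDECIDED", True

verdict, review = majority_verdict(["FRAUDULENT", "FRAUDULENT", "LEGITIMATE"])
```

With the merchant above this returns FRAUDULENT with no escalation; a three-way split would instead route to a human.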

See the Difference: Loan Decision

❌ Without CoT

Prompt: "Should we approve this PayLater application?"

Yes, I would recommend approving this PayLater application. The customer has a decent income and has been on the platform for a reasonable amount of time. Their payment history is mostly good with only minor issues.

⚠ No reasoning shown, no math, not auditable, "decent" and "reasonable" are not metrics

✅ With Chain-of-Thought

Prompt: "Think through each policy criterion step by step"

Step 1 — Income: $5,200 > $2,000 min ✅
Step 2 — DTI: ($400 loan + $3,000 req) / $5,200 = 65% > 40% max ❌
Step 3 — Late payments: 2 in 8 months ≤ 2 max ✅
Step 4 — History: 8 months > 6 min ✅
Decision: DECLINE. DTI of 65% exceeds 40% policy maximum. Suggest $1,500 counter-offer (DTI = 37%).

✓ Every criterion checked, math shown, auditable, includes counter-offer

Module 3

Role & Persona
Prompting

Same data, dramatically different insights — just by changing who the AI "is"

The Persona Formula

You are [TITLE] at [COMPANY TYPE] 
with [X years] of experience in [SPECIALTY].
You are known for [CHARACTERISTIC]. 
When [SITUATION], you always [BEHAVIOR].

Conservative Analyst

"Senior Risk Analyst, 12 years in consumer credit. Known for being cautious. When in doubt, recommends additional verification."

Growth Analyst

"Business Development Analyst, 6 years in merchant acquisition. Known for finding opportunities. Always highlights growth potential alongside risks."

The last two fields matter most: CHARACTERISTIC and BEHAVIOR shape the AI's judgment calls. "Cautious" vs "opportunity-focused" produces completely different recommendations from the same data.

Persona in Action: Same Merchant, Different Eyes

Data: Merchant with 271% volume growth + chargeback rate rising from 0.3% to 4.1%

🛡 Conservative Risk Analyst

RISK RATING: RED — IMMEDIATE ACTION

Primary concern: Chargeback rate at 4.1% is 4x above the 1.0% policy threshold. This trajectory suggests systemic issues, not isolated incidents.

Volume growth concern: 271% growth without proportional infrastructure investment raises fulfillment risk. Rapid scaling often precedes quality collapse.

Recommendation:
• Reduce PayLater limit to minimum immediately
• Issue formal chargeback warning letter
• Place on 30-day enhanced monitoring
• If no improvement → initiate offboarding review

📈 Growth Business Analyst

ASSESSMENT: HIGH-GROWTH MERCHANT — MANAGE, DON'T EXIT

Growth signal: 271% volume increase indicates strong product-market fit. This merchant is scaling fast — a potential top-10 partner within 6 months.

Chargeback context: 4.1% rate is elevated but common during rapid scaling. 45% are "item not received" — a logistics issue, not fraud. Solvable with delivery confirmation.

Recommendation:
• Maintain current limits to support growth
• Offer delivery tracking integration (reduces chargebacks 60%)
• Assign dedicated merchant success manager
• Review in 60 days with chargeback target of 1.5%

Multi-Agent Framing

Get 3 perspectives in one prompt — no need to schedule 3 meetings.

Your prompt

PROMPT:
Evaluate this PayLater policy change (raising limit from $500 to $1,000):

🛡 RISK MANAGER: Default rate, exposure, regulation
📊 PRODUCT MANAGER: Adoption, competition, revenue
⚖ COMPLIANCE: Responsible lending, MAS guidelines

Each: 3 concerns + Recommendation + 1 metric to monitor

Then: Balanced synthesis of all three views.

AI response (synthesis excerpt)

🛡 RISK: "Doubling limits increases exposure by $12M. Default rate likely rises 0.3-0.5pp. Recommend: phased rollout to top-tier customers first."

📊 PRODUCT: "Competitors offer $1,500+. Current $500 limit is #1 reason for churn. Revenue uplift est. $2.4M/quarter."

⚖ COMPLIANCE: "MAS Notice requires affordability assessment above $500. Must add income verification step."

SYNTHESIS: Proceed with phased rollout ($750 first) with income verification. Monitor default rate weekly. Full $1,000 after 90-day review.

Why this works: Forces balanced analysis. No single perspective dominates. The synthesis is where the real insight lives.

Same Data, Different Audiences

Data: "PayLater default rate increased from 2.1% to 2.8% this quarter"

| Audience | Persona | Output style |
| --- | --- | --- |
| Board | "You are the CFO presenting to the board" | Strategic, 5-minute read |
| Ops Team | "You are the Ops Manager briefing your team" | Actionable, task-oriented |
| Regulators | "You are Compliance Head responding to MAS" | Formal, regulation-referenced |
| Customers | "You are a support specialist" | Simple, empathetic |

💡 Practice activity (10 min): Pick the same data point above. Write prompts for 2 different audiences. Compare how the tone, detail level, and recommendations change.

Module 4

Structured Outputs
& RAG

JSON extraction, document grounding, and meta-prompting

Why Structure Matters

Unstructured = Conversation

Different every time. Hard to compare. Can't feed into systems. Requires human parsing.

Structured = Form

Consistent format. Comparable across items. Machine-parseable. Scannable by busy stakeholders.

Finance use cases:

  • Invoice extraction → accounts payable system
  • Transaction categorization → reconciliation
  • Complaint classification → route to correct team
  • KYC document parsing → verification forms

RAG — Grounding in Documents

CRITICAL RULES:
1. Base your answer ONLY on the provided documents
2. After each claim, cite: [Doc Name, Section]
3. If not in documents: "Not available in 
   provided documents"
4. Do NOT use outside knowledge

DOCUMENTS:
[Doc 1: PayLater Terms v3.2]
[Doc 2: MAS Notice PSN 06]

QUESTION: What are our obligations if a customer 
misses 3 consecutive PayLater payments?

Critical for compliance: Without "ONLY on provided documents," the AI mixes its knowledge with yours — you can't trace which claims come from where.

Meta-Prompting

Ask the AI to write prompts for you:

I need a prompt that helps our compliance team 
assess whether a new MAS regulatory update 
affects our PayLater product.

The prompt should:
- Work with any MAS circular
- Produce a structured impact assessment
- Identify operations that need to change
- Suggest a compliance timeline

Write the best possible prompt for this task.

The AI builds a tool for your team. Use the generated prompt repeatedly — it's a reusable asset.

See the Difference: Policy Q&A

❌ Without RAG Grounding

Prompt: "What happens if a customer misses 3 PayLater payments?"

If a customer misses 3 consecutive PayLater payments, the account is typically suspended and the outstanding balance may be sent to collections. Late fees of around 1-2% per month are usually applied. The customer's credit score will be negatively impacted and they may be barred from future PayLater services.

⚠ Sounds plausible but NONE of this may be in your actual policy. "Typically" and "usually" = hallucination. No citations.

✅ With RAG Grounding

Prompt: "ONLY from provided documents" + policy doc attached

Per the PayLater Terms v3.2:
• After 3 missed payments, the account enters "Default" status [Section 8.3]
• A late fee of $5 or 1.5% (whichever is higher) applies per missed payment [Section 7.1]
• [INSUFFICIENT DATA: The provided document does not specify credit score reporting obligations. Refer to Section 12 or the Credit Bureau Reporting Policy.]

✓ Every claim cites a section. Admits what it doesn't know. No hallucination.

Module 5

Model-Specific
Tuning

Choosing the right model and parameters for each task

Model Comparison

| Capability | Claude | GPT-4 | Gemini |
| --- | --- | --- | --- |
| Long documents | 200K tokens | 128K tokens | 1M tokens |
| Structured output | XML tags | Numbered rules | Schemas |
| Instruction following | Very precise | Good (may add extras) | Good |
| Best prompt style | XML + constraints | Numbered rules | Grounding instructions |
| Finance/compliance | Strong safety | Flexible | Good with grounding |

Temperature Guide for Finance

| Temperature | Behavior | Use for |
| --- | --- | --- |
| 0.1 | Deterministic, same answer every time | Data extraction, compliance, calculations |
| 0.3 | Slightly varied, still focused | Report writing, narratives, summaries |
| 0.7 | Creative, diverse outputs | Brainstorming, ideation, drafting |

Rule: Never use a temperature above 0.3 for anything that will be audited or submitted to regulators — reserve 0.7 for brainstorming and drafting.
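Teams that wrap model calls often encode this guide as a lookup so nobody hand-picks temperatures per request. A minimal sketch (the task names and the audit cap are assumptions for illustration, not an AWS API):

```python
# Temperature per task type, following the guide above
TEMPERATURE_BY_TASK = {
    "extraction": 0.1,
    "compliance": 0.1,
    "calculation": 0.1,
    "report": 0.3,
    "summary": 0.3,
    "brainstorm": 0.7,
}

def pick_temperature(task: str, audited: bool = False) -> float:
    """Look up the temperature; audited work is capped at the focused setting."""
    temp = TEMPERATURE_BY_TASK.get(task, 0.3)
    if audited:
        temp = min(temp, 0.3)
    return temp
```

Passing `audited=True` enforces the rule in code rather than relying on each analyst to remember it.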

Module 6 · NEW

Evaluating
Your Prompts

How do you KNOW your prompts are working?

Why Evaluate?

  • Prompts degrade over time — model updates change behavior
  • "It looks good" is not a metric — you need measurable quality
  • Compliance requires evidence that AI outputs meet standards
  • You need to compare version A vs version B objectively

The problem: Most teams deploy prompts based on "it looked good when I tested it once." That's like shipping software without tests.

Manual Evaluation: Rubrics

| Criterion | 1 (Poor) | 3 (OK) | 5 (Excellent) |
| --- | --- | --- | --- |
| Completeness | Missing 3+ sections | All sections, some thin | All sections thorough |
| Data grounding | Unsupported claims | Mostly grounded | Every claim cites data |
| Actionability | No recommendation | Vague recommendation | Specific actions + owners |
| Consistency | Different each run | Mostly consistent | Identical structure |

Process: Run same prompt 5 times → score each → average = quality score
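Averaging the rubric over repeated runs takes a few lines. A minimal sketch (criterion names follow the rubric above; the score dicts are illustrative):

```python
def rubric_score(runs):
    """Average per-criterion rubric scores over repeated runs of one prompt."""
    criteria = runs[0].keys()
    per_criterion = {c: sum(r[c] for r in runs) / len(runs) for c in criteria}
    overall = sum(per_criterion.values()) / len(per_criterion)
    return per_criterion, overall

runs = [
    {"completeness": 5, "grounding": 4, "actionability": 3},
    {"completeness": 4, "grounding": 4, "actionability": 5},
]
per_criterion, overall = rubric_score(runs)
```

A low average on one criterion tells you exactly which pillar of the prompt to strengthen.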

LLM-as-Judge

Use one AI to evaluate another AI's output:

Score this merchant risk assessment:

1. All 8 sections present? (0-10)
2. Every claim cites data? (0-10)
3. Risk rating justified? (0-10)
4. Actions specific & actionable? (0-10)

Return JSON: {"completeness": X, "grounding": X, 
"consistency": X, "actionability": X, 
"total": X, "issues": ["..."]}

Run on 10 outputs → compare scores between template versions
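Because the judge returns JSON, comparing two template versions reduces to parsing and averaging the totals. A minimal sketch (the `total` key matches the judge prompt above; the comparison logic is an assumption):

```python
import json
from statistics import mean

def compare_versions(judge_outputs_a, judge_outputs_b):
    """Parse judge JSON per output and compare mean totals of two versions."""
    total_a = mean(json.loads(j)["total"] for j in judge_outputs_a)
    total_b = mean(json.loads(j)["total"] for j in judge_outputs_b)
    winner = "A" if total_a >= total_b else "B"
    return total_a, total_b, winner
```

Run each template version on the same inputs, feed the judge's JSON strings in, and the higher mean total wins the A/B test.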

A/B Testing Prompts

Process

  • Same input data, two prompt versions
  • Run both 10 times each
  • Score with the judge prompt
  • Higher average score wins

When to Re-evaluate

  • After any model update
  • When users report quality issues
  • Monthly for production templates
  • After any template modification

Module 7 · NEW

From Manual Prompts
to Automated Tools

You build the template once. The tools do the rest.

The Reality: Nobody Writes Long Prompts Every Day

You learn the techniques → build the template once → let the tools handle the rest.

| Phase | What you do | Tool |
| --- | --- | --- |
| 1. Learn | Master the techniques (today) | Your brain |
| 2. Build | Create a reusable template with {{variables}} | Kiro / any AI chat |
| 3. Optimize | Let AI rewrite your prompt for better performance | Bedrock Prompt Optimization |
| 4. Store & Share | Save versioned templates with metadata | Bedrock Prompt Management |
| 5. Reuse | Fill in variables and run — no rewriting needed | Bedrock Console / API |
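The {{variable}} syntax used throughout the exercises is plain string substitution, so a reusable template can be filled locally before any tooling is involved. A minimal sketch (the helper name and error behavior are illustrative, not the Bedrock API):

```python
import re

def fill_template(template: str, **values) -> str:
    """Substitute {{variable}} placeholders; error on any left unfilled."""
    filled = template
    for name, value in values.items():
        filled = filled.replace("{{" + name + "}}", str(value))
    leftover = re.findall(r"\{\{(\w+)\}\}", filled)
    if leftover:
        raise ValueError(f"Unfilled variables: {leftover}")
    return filled

prompt = fill_template(
    "Assess merchant {{merchant_id}} in {{market}}.",
    merchant_id="MC-8842", market="Singapore",
)
```

Failing on unfilled placeholders catches the classic mistake of shipping a prompt with a literal `{{merchant_id}}` in it.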

Bedrock Prompt Management

Your prompt library — stored, versioned, and shared across the team.

| Manual (today) | Bedrock Prompt Management |
| --- | --- |
| Templates in markdown files | Stored as managed resources |
| Copy-paste to test | One-click testing across models |
| No version history | Immutable version snapshots |
| Manual comparison | Side-by-side model comparison |
| Share via email/Slack | Shared across team via API |

No additional charge — you only pay for model tokens during testing.

Prompt Management: Key Features

  • Prompt Templates with {{variables}} — same syntax from the exercises. Define variables with descriptions and defaults.
  • Version Management — every change creates an immutable snapshot. Roll back anytime.
  • Multi-Model Testing — test across Claude, Nova, Llama side-by-side. Compare quality, latency, cost.
  • Up to 3 Prompt Variants — compare different versions of the same prompt to find the best performer.

Think of it as: Google Docs for prompts — versioned, shared, and always accessible. But with built-in testing across multiple AI models.

Prompt Optimization (Instructor Demo)

You write a basic prompt. Bedrock rewrites it for better performance — automatically.

Your prompt

"Assess this merchant's risk level"

6 words. No structure, no role, no constraints.

Bedrock's optimized version

"You are a Senior Risk Analyst 
specializing in SEA digital payments.
Produce a risk assessment:
1. Rating (GREEN/AMBER/RED)
2. Transaction Pattern Analysis
3. Chargeback Assessment
4. Recommended Actions
Base analysis ONLY on provided data."

Persona + structure + grounding — applied automatically

How Prompt Optimization Works

  • Step 1: Submit your prompt (even a short, rough one)
  • Step 2: Bedrock analyzes the prompt components
  • Step 3: It rewrites with best practices — structure, constraints, model-specific formatting
  • Step 4: Compare original vs optimized output side-by-side
  • Step 5: Save the optimized version to your Prompt Management library

GA — April 2025. Supports Claude, Amazon Nova, Meta Llama, DeepSeek, Mistral. The techniques you learned today help you evaluate whether the optimized prompt is actually good.

The Bottom Line

| Your concern | The solution |
| --- | --- |
| "I don't want to write long prompts every time" | Build the template once → reuse with {{variables}} |
| "I'm not sure my prompt is good enough" | Prompt Optimization rewrites it automatically |
| "My team needs to share and version prompts" | Prompt Management stores everything centrally |
| "Which model gives the best result?" | Multi-model testing compares side-by-side |

For developers: Intelligent Prompt Routing auto-selects cheaper models for simple tasks (up to 30% cost savings). Prompt Flows chains prompts into automated workflows. These are covered in Day 3.

Hands-on

Prompt Engineering
Exercises

Build a reusable prompt template in 7 steps

Choose Your Exercise

Exercise 1: Merchant Risk Assessment

Best for: Risk analysts, merchant ops, compliance

Techniques: Zero-Shot → Persona → Few-Shot → Structured → RAG + Self-Critique → Meta-Prompting → Validation

Deliverable: Reusable template for GREEN/AMBER/RED merchant risk assessments

Exercise 2: Credit Risk Narrative

Best for: Credit analysts, PayLater ops, financing

Techniques: Zero-Shot → Step-Back → Audience Framing → Multi-Perspective → Structured + Length Control → Meta-Prompting → Validation

Deliverable: Reusable template for APPROVE/CONDITIONS/DECLINE credit narratives

Open the workshop site → Prompt Engineering Exercises

Wrap-up

Best Practices &
Prompt Optimization

Common mistakes, optimization strategies, and recovery patterns

7 Prompt Mistakes Everyone Makes

| # | Mistake | Why it hurts | Quick fix |
| --- | --- | --- | --- |
| 1 | The Kitchen Sink | Cramming 5 tasks into 1 prompt | One task per prompt, chain results |
| 2 | The Blank Canvas | No examples = AI guesses your format | Show 1-2 examples of desired output |
| 3 | The Trust Fall | No grounding = confident hallucinations | "ONLY from provided data" |
| 4 | The Vague Ask | "Analyze this" — analyze what, how, for whom? | Specify audience, format, length |
| 5 | The One-Shot Wonder | Expecting perfection on first try | Plan for 2-3 refinement turns |
| 6 | The Copy-Paste Trap | Using the same prompt for different models | Tune syntax per model family |
| 7 | The Set-and-Forget | Never re-testing after model updates | Monthly prompt health checks |

The Draft-Score-Revise Loop

Don't accept the first output. Build a self-improving cycle into your prompt:

Step 1 — DRAFT: Write a merchant risk summary 
  using the data provided.

Step 2 — SCORE: Rate your draft on these criteria:
  - Completeness (0-5): All required sections?
  - Grounding (0-5): Every claim cites data?
  - Actionability (0-5): Specific next steps?

Step 3 — REVISE: If total < 12, rewrite to fix 
  the lowest-scoring area. Max 2 revisions.

Output only the final version.

Result: The AI self-corrects before you even read it. Teams using this pattern report 40-60% fewer revision cycles.
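The same loop can live in orchestration code rather than inside one prompt. A minimal sketch with injectable `generate` and `score` functions standing in for model calls (both names and the threshold of 12 are taken from the pattern above, not a library):

```python
def draft_score_revise(generate, score, threshold=12, max_revisions=2):
    """Draft, score against the rubric, revise the weakest area; cap revisions."""
    draft = generate("DRAFT the merchant risk summary.")
    for _ in range(max_revisions):
        scores = score(draft)                   # e.g. {"completeness": 5, ...}
        if sum(scores.values()) >= threshold:
            break
        weakest = min(scores, key=scores.get)   # lowest-scoring criterion
        draft = generate(f"REVISE the draft, fixing: {weakest}")
    return draft
```

Swapping in a real model call for `generate` and a judge prompt for `score` turns this into an automated quality gate.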

Break Big Tasks into Small Steps

Complex tasks fail when you ask for everything at once. Decompose instead:

❌ One Giant Prompt

"Analyze our Q2 transactions, 
identify fraud patterns, calculate 
loss exposure, compare to Q1, 
draft a board summary, and 
recommend 3 prevention measures."

6 tasks = shallow work on each

✅ Chained Prompts

Prompt 1: "Analyze Q2 transactions 
  and flag anomalies"
Prompt 2: "From these anomalies, 
  identify the top 3 fraud patterns"
Prompt 3: "Calculate loss exposure 
  for each pattern"
Prompt 4: "Draft a board summary 
  with prevention measures"

Each step gets full attention
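Chaining is just feeding each result into the next prompt, which is easy to script. A minimal sketch where `llm` is a placeholder for any model call (the function and prompt wording are illustrative):

```python
def run_chain(llm, q2_data: str) -> str:
    """Four focused prompts, each consuming the previous step's output."""
    anomalies = llm(f"Analyze Q2 transactions and flag anomalies:\n{q2_data}")
    patterns = llm(f"From these anomalies, identify the top 3 fraud patterns:\n{anomalies}")
    exposure = llm(f"Calculate loss exposure for each pattern:\n{patterns}")
    return llm(f"Draft a board summary with prevention measures:\n{exposure}")
```

Because each intermediate result is a plain string, you can log, review, or correct any step before the next one runs.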

Tell the AI What NOT to Do

Positive instructions tell the AI what to include. Negative constraints prevent common failure modes:

| Problem | Negative constraint to add |
| --- | --- |
| AI adds unsolicited opinions | "Do not include personal opinions or speculation" |
| AI uses data not in your input | "Do not reference any data outside the provided documents" |
| AI writes too much | "Do not exceed 300 words. Do not add a conclusion section" |
| AI hedges everything | "Do not use phrases like 'it depends' or 'generally speaking'" |
| AI explains obvious things | "Do not explain what PayLater is or how digital wallets work" |
| AI invents numbers | "If a metric is not in the data, write [DATA NOT AVAILABLE]" |

Pro tip: After your first test run, note what went wrong and add a "Do NOT" line for each issue. Your prompt improves with every iteration.

Structure Your Prompts Like Documents

Well-organized prompts produce well-organized outputs. Use clear sections and delimiters:

### ROLE
You are a Senior Payment Operations Analyst.

### CONTEXT
<<<
[Paste your transaction data or document here]
>>>

### TASK
Analyze the data for anomalies in Thailand and Vietnam.

### OUTPUT FORMAT
- Executive summary (3 sentences)
- Anomaly table: Market | Type | Severity | Evidence
- Recommended actions (numbered, with owner)

### CONSTRAINTS
- Use ONLY the data provided above
- All amounts in SGD
- Do not exceed 400 words

Why delimiters matter: Without clear separation, the AI may confuse your instructions with your data — especially dangerous when pasting policy documents.

Show, Don't Tell: The Power of Examples

One good example is worth 100 words of instruction:

❌ Telling

"Categorize each transaction as 
high risk, medium risk, or low 
risk based on amount, frequency, 
and merchant type. Format as a 
table with columns for transaction 
ID, category, and reasoning."

50 words of instruction, AI still guesses your format

✅ Showing

"Categorize transactions like this:

| ID | Risk | Reason |
| T001 | HIGH | $12K single txn, 
  new merchant, no history |
| T002 | LOW | $45 recurring, 
  12-month pattern |

Now categorize these: [data]"

One example = perfect format every time

The 3-Round Prompt Improvement Workflow

Every production-quality prompt goes through this cycle:

| Round | What you do | What improves |
| --- | --- | --- |
| Round 1: Baseline | Write your first prompt using the 4 pillars. Run it 3 times. | You see what the AI gets right and wrong |
| Round 2: Fix failures | Add negative constraints for each failure. Add an example of good output. Run 3 more times. | Consistency jumps from ~60% to ~85% |
| Round 3: Polish | Add self-review step. Tighten length/format. Test with edge cases. | Production-ready at ~95% consistency |

Total time: 15-20 minutes to go from first draft to production template. That template then saves hours every week.

Build a Team Prompt Library

Your best prompts are team assets, not personal notes. Treat them like shared templates:

What to include

  • Prompt name and purpose
  • The full prompt with {{variables}}
  • Which model and temperature to use
  • 1-2 example outputs (good vs bad)
  • Known limitations and edge cases
  • Last tested date and model version

Starter library for finance

  • Merchant risk assessment
  • Transaction anomaly detection
  • Customer complaint classification
  • Policy document Q&A (RAG)
  • Board summary generator
  • Regulatory impact assessment
  • KYC document extraction
  • Fraud investigation narrative

Start today: The template you built in the exercise is your first library entry. Share it with your team this week.

Why AI "Gets Dumber" Mid-Conversation

It's not a bug — it's a context window problem. Every AI has a limited "working memory."

What happens inside

  • Every message you send + every AI response stays in the context window
  • At 60-70% capacity, performance drops sharply — not gradually, but in sudden cliffs
  • The AI compresses and deprioritizes earlier messages to make room
  • "Lost in the Middle" effect: AI remembers the start and end of conversations best, but forgets what's in the middle

What you experience

  • AI contradicts instructions you gave 10 messages ago
  • AI re-introduces ideas you already rejected
  • AI ignores constraints from the start of the chat
  • Outputs get vague, generic, or repetitive
  • AI starts "hallucinating" more frequently

Key insight: Most people blame the AI for "getting stupid." The real problem is the conversation got too long. The fix is context management, not a better model.

5 Rules for Managing Long Conversations

# | Rule | Why it works
1 | One task per session — don't mix debugging, writing, and analysis in one chat | Each session gets full attention capacity
2 | Paste only what's relevant — don't dump entire documents when you need one section | Reduces noise, keeps AI focused on what matters
3 | Put key instructions at the start AND end — not buried in the middle | Exploits primacy + recency bias
4 | Keep sessions under 15-20 turns — start fresh after that | Stays within the performance sweet spot
5 | Use "session summaries" to carry state — ask AI to summarize, then paste into new chat | Fresh context window with all the knowledge
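Rule 3 (key instructions at the start AND end) is mechanical enough to automate. A sketch of a "sandwich" helper; the reminder wording is illustrative:

```python
def sandwich_prompt(key_instructions, body):
    """Rule 3: put the critical instructions at the start AND the end,
    so they land in the positions the model attends to most."""
    return (
        f"{key_instructions}\n\n"
        f"{body}\n\n"
        "Reminder: before answering, re-check these requirements:\n"
        f"{key_instructions}"
    )
```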

The Session Summary Technique

When a conversation gets too long but you can't lose the state:

Step 1: Ask for a summary

PROMPT (in the old session):

Summarize our conversation so far:
• Key decisions we made
• Data and findings so far
• What we still need to do next

Format as a briefing I can paste into a new session.

Step 2: Start fresh with context

PROMPT (in the new session):

Here is the context from our previous session:

[PASTE SUMMARY HERE]

Continue from where we left off. The next step is to draft the risk committee report based on the findings above.

✓ Fresh context window + all accumulated knowledge = best of both worlds

Think of it as "saving your game." You compress hours of conversation into a focused briefing, then load it into a fresh session with full attention capacity.
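The two steps above can be wrapped in small helpers so the handoff wording stays consistent across your team. A sketch; the prompt text is taken verbatim from the steps above:

```python
# Step 1: send this in the old session to get the briefing.
SUMMARY_REQUEST = (
    "Summarize our conversation so far:\n"
    "• Key decisions we made\n"
    "• Data and findings so far\n"
    "• What we still need to do next\n\n"
    "Format as a briefing I can paste into a new session."
)

def new_session_prompt(summary, next_step):
    """Step 2: wrap the pasted summary into the opening message of a fresh session."""
    return (
        "Here is the context from our previous session:\n\n"
        f"{summary}\n\n"
        f"Continue from where we left off. The next step is to {next_step}."
    )
```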

The Conversation Funnel

Start broad, then narrow. Each turn builds on context — but keep it focused.

The pattern

Turn 1 (Explore):
"Analyze this month's transaction data — identify top 3 trends"

Turn 2 (Deep-dive):
"Expand on trend #2 — the PayLater chargeback increase"

Turn 3 (Produce):
"Draft a 1-page summary for the risk committee"

Turn 4 (Polish):
"Make the tone more formal and add data citations"

Why it works

  • Each turn is focused on one thing
  • You review and correct at each step
  • Errors don't compound — you catch them early
  • 4 focused turns > 1 massive prompt

When to reset: If Turn 3 goes wrong, don't keep correcting. Start a new session with: "Here's the data and the trend analysis. Draft a risk committee summary."

When to Start Fresh vs. Continue

🟢 Start a New Session

  • Switching to a completely different task
  • Conversation has gone off track
  • Testing a refined prompt cleanly
  • Session is longer than 15-20 turns
  • AI keeps repeating the same mistake
  • AI contradicts earlier instructions

🔵 Continue the Session

  • Iterating on the same output
  • Need AI to remember earlier context
  • Building step by step (funnel pattern)
  • Refining format or tone
  • Follow-up questions on same topic
  • Session is still under the 15-20 turn limit

The 3-strike rule: If you've corrected the AI 3 times and it's still wrong — the context is working against you. Start fresh. It's faster than fighting a polluted conversation.
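The 3-strike rule is simple enough to track explicitly instead of by feel. A minimal sketch; the class and its messages are illustrative:

```python
class StrikeCounter:
    """Track how often you've corrected the AI in this session;
    after the limit (3 by default), recommend starting fresh."""

    def __init__(self, limit=3):
        self.limit = limit
        self.strikes = 0

    def record_correction(self):
        self.strikes += 1
        if self.strikes >= self.limit:
            return "START FRESH: the context is working against you"
        return f"Continue ({self.strikes}/{self.limit} corrections used)"
```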

Circuit Breaker Patterns

Pattern | Symptom | Fix
Repetition Loop | Same wrong output after correction | New session, rephrase
Hallucination Spiral | Inventing data | "Use ONLY provided data"
Over-Eager Helper | 2,000 words for 5 bullets | "Exactly 5 bullets, under 20 words"
Format Drift | Format changes mid-output | "Continue EXACTLY same format"
Confidence Trap | Uncertain info as fact | "Prefix uncertain with [UNCERTAIN]"

Using Kiro for Business Users

  • Vibe mode: Describe what you want → Kiro writes and runs the code
  • File context: Drag CSVs, PDFs, JSON into chat
  • Iterative refinement: "Make the chart bigger" / "Add a percentage column"
  • New Session per task: Keep context focused

Remember: You don't need to understand the code Kiro writes. You just need to describe what you want clearly — using the 4 pillars from Module 1.

Quick Reference Card

Technique | Trigger Phrase
Zero-Shot CoT | "Think step by step before answering"
Expert Persona | "You are a Senior [ROLE] with X years in [SPECIALTY]"
Multi-Perspective | "Present the case FOR and AGAINST"
Structured Output | "Use EXACTLY these sections: 1... 2... 3..."
RAG Grounding | "Base your answer ONLY on the provided documents"
Self-Critique | "Review: Is every claim supported by data?"
Meta-Prompting | "Write the best prompt for [TASK]"
LLM-as-Judge | "Score this output against these criteria"
Negative Constraints | "Do NOT include / Do NOT use / Do NOT exceed"
Task Decomposition | Break 1 big prompt into 3-4 focused prompts
Draft-Score-Revise | "Draft, then score on [rubric], then revise if < threshold"
Show Don't Tell | Include 1-2 examples of desired output format

Preview

From Prompts to
Workflow Automation

Everything you learned today becomes the foundation for autonomous AI agents

Your Prompt Skills = Agent Design Skills

Every technique you learned today maps directly to how AI agents are built:

Day 2: Prompt Technique | Day 3: Agent Component | What it does in an agent
Persona prompting | Agent role definition | Defines who the agent "is" and how it behaves
Structured output | Output contracts | Ensures consistent, usable results
Chain-of-Thought | Reasoning strategy | Agent thinks step-by-step before acting
RAG grounding | Knowledge base | Agent accesses your company's documents
Negative constraints | Guardrails | Prevents the agent from doing things it shouldn't
Prompt template | SKILL.md file | The template becomes a reusable, shareable skill

Key insight: You don't need to code to design an AI agent. You need to write great instructions — which is exactly what you practiced today.

Preview: Templates → Skills → Automation

Tomorrow you'll turn your prompt templates into automated workflows:

Today: Prompt template

You are a Senior Risk Analyst...
Analyze merchant data and produce:
1. Risk Rating (GREEN/AMBER/RED)
2. Transaction Analysis
3. Recommended Actions

Pasted manually each time

Tomorrow: SKILL.md + Hooks

---
name: merchant-risk-assessment
description: Generate risk assessments
---
[Same template + auto-trigger]

✓ Auto-activates, shared, versioned
✓ Works in Kiro AND Claude Cowork
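For the curious: the frontmatter format shown above is simple enough to read programmatically. A naive sketch that splits the `---` block from the body; it assumes plain `key: value` lines, and a real loader would use a proper YAML parser:

```python
def parse_frontmatter(skill_text):
    """Split a SKILL.md-style file into its frontmatter metadata and body."""
    lines = skill_text.strip().splitlines()
    assert lines[0] == "---", "expected frontmatter to open with ---"
    end = lines.index("---", 1)          # closing delimiter of the frontmatter
    meta = {}
    for line in lines[1:end]:
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:])    # everything after the frontmatter
    return {"meta": meta, "body": body}
```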

Day 3 covers: Workflow patterns (chaining, parallelization, routing, orchestration), the Kiro stack (steering + skills + hooks), and you'll design an agent for your team's workflow.


The 3-Day Journey

📚

Day 1

"What can AI do?"

Fundamentals, use cases, responsible AI

💬

Day 2 (Today)

"How do I talk to AI?"

Prompt engineering, templates, tools

🤖

Day 3 (Tomorrow)

"How do I make AI work on its own?"

Agentic AI, workflow automation, no code

💡 Homework: What repetitive task does your team do every week that could be automated? Come to Day 3 with a specific workflow — you'll design an AI agent for it.

Day 2 Outcomes

  • Design prompts using the 4 pillars (Clarity, Context, Role, Output)
  • Apply Chain-of-Thought and Self-Consistency for financial reasoning
  • Create expert personas for different audiences
  • Extract structured data and ground responses in documents
  • Evaluate prompt quality with rubrics and LLM-as-Judge
  • Use Bedrock tools to optimize and manage prompts at scale
  • Manage long conversations and know when to start fresh
  • Build reusable prompt templates — the foundation for AI agents
  • Identify a workflow from your team to automate on Day 3

Thank You

Tomorrow: Make AI Work On Its Own

Agentic AI · Workflow Automation · Agent Design · No Coding Required

💡 Homework: Come with a workflow your team does every week that could be automated

AnyCompany Financial Group · Generative & Agentic AI on AWS