Day 2

Prompt Engineering
Workshop

Mastering Prompt Techniques for All Users

AnyCompany Financial Group · Generative & Agentic AI on AWS

Module 1

Prompt Fundamentals
Deep Dive

The 4 pillars that determine 80% of output quality

The 80/20 Rule of Prompting

80% of prompt quality comes from 4 fundamentals:

1. Clarity

Say exactly what you mean. If a colleague would ask "what do you mean?" — your prompt needs work.

2. Context

Give the AI the background it needs. Without context, it guesses — dangerous in finance.

3. Role Assignment

Tell the AI who to be. A "risk analyst" focuses on different signals than a "support agent."

4. Output Framing

Define what "done" looks like — format, length, structure, style.

Pillar 1: Clarity

| Vague | Clear |
| --- | --- |
| "Summarize this report" | "Summarize this quarterly earnings report in 5 bullet points, focusing on revenue growth, cost changes, and risk factors" |
| "Help me with this data" | "Analyze this CSV of 500 transactions and identify the top 3 merchants by total volume" |
| "Write something about compliance" | "Draft a 200-word summary of MAS Notice 626 requirements for e-payment service providers" |

Rule of thumb: The more specific your prompt, the less the AI has to guess.

Pillar 2: Context

Without context:

"Is this transaction suspicious?"
With context:

"This merchant is a convenience store 
in Singapore, typically 50-80 txns/day 
averaging $15 SGD. Today: 340 txns 
averaging $4.50. Is this suspicious?"

Types of context: Domain · Data · Situational · Constraints

4 Types of Context

| Type | What it tells the AI | Finance example |
| --- | --- | --- |
| Domain | The industry, market, and business area | "In the context of Southeast Asian digital payments and PayLater services..." |
| Data | The specific numbers, records, or documents to analyze | "Here is the merchant's transaction history for the last 6 months: [data]" |
| Situational | Why you need this now — the trigger or event | "We are preparing for a quarterly board review" / "This merchant was flagged by our monitoring system" |
| Constraints | Rules, limits, and requirements the output must follow | "All amounts in SGD with 2 decimal places" / "Follow MAS Notice 626 guidelines" |

Rule of thumb: If you skip Domain context, the AI gives generic answers. If you skip Data context, it hallucinates. If you skip Situational context, it guesses your purpose. If you skip Constraints, it ignores your standards.

Context in Action: Merchant Review

[DOMAIN]
You are reviewing an AnyCompany Pay merchant in 
Singapore's food & beverage sector.

[DATA]
Merchant: Kopi Corner Pte Ltd (ID: MC-8842)
Monthly txn volume: 4,200 → 15,600 (6-month trend)
Avg transaction: $8.50 SGD
Chargeback rate: 0.3% → 4.1% (6-month trend)
Complaints: 12 in last 30 days (up from 2)

[SITUATIONAL]
Auto-flagged: chargeback rate exceeds 1.0% threshold.
Risk committee meets Friday.

[CONSTRAINTS]
- All amounts in SGD
- Reference AnyCompany's chargeback policy (max 1.0%)
- Use only the data provided above
- Include a GREEN/AMBER/RED risk rating

Pillar 3: Role Assignment

| Role | What changes in the output |
| --- | --- |
| Compliance Officer | Focuses on regulatory requirements, flags risks |
| Customer Support Agent | Empathetic language, resolution-focused |
| Financial Analyst | Numbers, trends, comparisons, frameworks |
| Fraud Investigator | Patterns, anomalies, evidence chains |

Pro tip: Add experience level — "Senior Credit Risk Analyst with 10 years of experience in Southeast Asian consumer lending, specializing in PayLater products"

Pillar 4: Output Framing

| Dimension | Example |
| --- | --- |
| Format | "Respond as a bullet list" / "Use a table" |
| Length | "In exactly 3 sentences" / "Under 200 words" |
| Structure | "Use sections: Summary, Analysis, Recommendation" |
| Style | "Write for a non-technical executive audience" |
| Constraints | "Do not include personal opinions" |

Output Framing in Action

Same question: "What happened with PayLater chargebacks this quarter?"

❌ No output framing

PayLater chargebacks have been increasing this quarter. The rate went from about 2.1% to 2.8% which is a notable increase. There are several factors that could be contributing to this including seasonal spending patterns, new merchant onboarding, and changes in customer demographics. The team should look into this further and consider implementing additional fraud checks. It might also be worth reviewing the underwriting criteria for new merchants. Overall, while the increase is concerning, it's not yet at critical levels but should be monitored closely going forward.

⚠ Wall of text. No structure. Vague recommendations. Key numbers buried.

✅ With output framing

Summary: Chargeback rate rose 2.1% → 2.8% (+33%).

Root causes:
• New merchant cohort (Q2): 4.2% vs 1.8% established
• "Item not received" disputes +45% in Indonesia

Actions:
• Tighten onboarding for high-risk MCCs — Risk, 2 wks
• Add delivery confirmation for ID — Product, 1 wk
• Review top 20 merchants by volume — Ops, immediate

✓ Scannable. Key number upfront. Actions with owners & timelines.

All 4 Pillars Together

[ROLE] You are a Payment Operations Manager at a 
Southeast Asian fintech.

[CONTEXT] Below is the daily transaction summary across 
6 markets. We had a system outage in Thailand from 
2:00-4:30 PM yesterday.

[TASK] Analyze the data and produce a daily operations 
brief for leadership.

[OUTPUT] Format as:
1. Executive Summary (3 sentences)
2. Market Performance Table
3. Incidents & Impact (bullets)
4. Action Items (numbered, with owners)
Keep under 400 words.
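The four bracketed sections above can be assembled programmatically so analysts never retype them. A minimal Python sketch, with a hypothetical `build_prompt` helper (the labels mirror the slide, not any library API):

```python
def build_prompt(role: str, context: str, task: str, output: str) -> str:
    """Assemble a 4-pillar prompt from labeled sections."""
    return "\n\n".join([
        f"[ROLE] {role}",
        f"[CONTEXT] {context}",
        f"[TASK] {task}",
        f"[OUTPUT] {output}",
    ])

prompt = build_prompt(
    role="You are a Payment Operations Manager at a Southeast Asian fintech.",
    context="Below is the daily transaction summary across 6 markets.",
    task="Analyze the data and produce a daily operations brief for leadership.",
    output="4 sections: Summary, Market Table, Incidents, Actions. Under 400 words.",
)
```

Each pillar stays a named argument, so a missing one fails loudly instead of producing a vague prompt.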

See the Difference: Merchant Review

❌ Without Technique

Prompt: "Review this merchant"

The merchant seems to be doing okay. Their transactions have been increasing which is good. There are some chargebacks but that's normal for this type of business. Overall they appear to be a reasonable merchant to work with. I would recommend continuing the relationship.

⚠ Vague, no data cited, no structure, no actionable recommendation, no risk rating

✅ With 4 Pillars

Prompt: Role + Context + Structure + Constraints

RISK RATING: AMBER
Transaction Analysis: Volume surged 271% (4,200 to 15,600) over 6 months. Avg txn size flat at $8.50 suggests organic growth, not fraud.
Chargeback Alert: Rate escalated from 0.3% to 4.1%, exceeding the 1.0% benchmark by 4x. Primary driver: "Item not received" (45%).
Action: 1) Issue chargeback warning letter (Merchant Ops, 5 days) 2) Reduce PayLater limit to $5 minimum (Risk, immediate)

✓ Structured, data-cited, specific actions with owners and timelines

Module 2

Chain-of-Thought
Reasoning

Making AI show its work — step by step

Why Chain-of-Thought?

Financial decisions require multi-step logic. CoT makes reasoning visible and auditable.

Without CoT

"Can this merchant afford a 
$50K credit line?"

Answer: "Yes, they can afford it."
(No reasoning shown)

With CoT

"Think through this step by step."

Revenue: $120K
- Costs: $95K = $25K free cash
- Existing debt: $15K
= Available: $10K
New repayment: $5K
DSCR = 2.0x → Affordable

CoT Techniques

| Technique | Trigger | Best for |
| --- | --- | --- |
| Zero-Shot CoT | "Think step by step" | Quick calculations, simple logic |
| Few-Shot CoT | Provide example with reasoning | Consistent multi-step processes |
| Step-Back | "First identify key factors, then analyze" | Complex analysis needing prioritization |
| Self-Consistency | "Solve 3 ways, report majority" | High-stakes decisions |

Finance rule: Any decision that could be audited should use CoT — the reasoning trail is your documentation.

Zero-Shot CoT Example

Just add "Think step by step" to any prompt — no examples needed.

❌ Without CoT

PROMPT: A merchant has revenue $120K, costs $95K, existing loan repayment $15K/month. They want a $50K credit line with $5K monthly repayments. Can they afford it?
AI RESPONSE:
Yes, they can afford it. The merchant has sufficient revenue to cover the additional repayment.

⚠ No math. No reasoning. Not auditable. Could be wrong.

✅ With "Think step by step"

PROMPT: Same data as left. + "Think through this step by step."
AI RESPONSE:
Revenue: $120K
− Operating costs: $95K
= Free cash flow: $25K
− Existing loan: $15K
= Available: $10K
New repayment: $5K
DSCR = $10K ÷ $5K = 2.0x
Affordable — DSCR above 1.25x threshold

✓ Every step visible. Math auditable. Conclusion backed by evidence.
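The same step-by-step math is easy to mirror in code, which lets you spot-check the AI's arithmetic. A minimal sketch (figures in $K per the slide; the function name and 1.25x threshold follow the example, not a real policy engine):

```python
def assess_credit_line(revenue, costs, existing_debt, new_repayment, threshold=1.25):
    """Mirror the CoT steps: free cash -> available cash -> DSCR -> decision."""
    free_cash = revenue - costs            # Revenue minus operating costs
    available = free_cash - existing_debt  # After existing loan repayments
    dscr = available / new_repayment       # Debt service coverage ratio
    return dscr, dscr >= threshold

dscr, affordable = assess_credit_line(revenue=120, costs=95,
                                      existing_debt=15, new_repayment=5)
```

If the model's stated DSCR disagrees with this calculation, the prompt output is wrong regardless of how confident it sounds.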

Few-Shot CoT Example

Provide one example with reasoning — the AI follows the exact same pattern.

Your prompt (with example)

PROMPT: Assess PayLater applications step by step.

EXAMPLE:
Customer A: income $4,000, balance $800, req $2,000 increase.
Step 1 — Income: $4,000 > $2,500 min ✅
Step 2 — Current DTI: $800/$4,000 = 20% < 30% ✅
Step 3 — New DTI: $2,800/$4,000 = 70% > 30% ❌
Step 4 — DECLINE: DTI exceeds policy max.

NOW ASSESS:
Customer B: income $6,500, balance $1,200, req $1,500 increase.

AI output (follows your pattern)

AI RESPONSE:
Step 1 — Income: $6,500 > $2,500 min ✅
Step 2 — Current DTI: $1,200/$6,500 = 18.5% < 30% ✅
Step 3 — New DTI: $2,700/$6,500 = 41.5% > 30% ❌
Step 4 — DECLINE: DTI 41.5% exceeds 30% max.

Counter-offer: Increase to $750 only (DTI = $1,950/$6,500 = 30.0%).

Key insight: The AI copied your exact Step 1-4 format, applied the same thresholds, and even added a counter-offer — all from one example.
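The policy thresholds from the few-shot example can also be encoded directly, so the AI's Step 1-4 output can be verified run after run. A minimal sketch (function name, step ordering, and the counter-offer rule are illustrative):

```python
def assess_paylater(income, balance, requested, min_income=2500, max_dti=0.30):
    """Apply the example's policy: income floor, then debt-to-income cap."""
    if income < min_income:
        return "DECLINE", None
    new_dti = (balance + requested) / income
    if new_dti > max_dti:
        # Counter-offer: the largest increase that keeps DTI at the cap
        counter = max(0, int(round(max_dti * income - balance)))
        return "DECLINE", counter
    return "APPROVE", None

decision, counter = assess_paylater(income=6500, balance=1200, requested=1500)
```

For Customer B this yields DECLINE with a $750 counter-offer, matching the AI's reasoning above.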

Step-Back Prompting Example

First identify principles, then apply them to the specific case.

Your prompt

PROMPT:
Before analyzing this insurance claim, first answer:
1. What are the 3 most important factors for travel insurance claim validity?
2. What are common red flags in Southeast Asia?

Now assess this claim using those factors:
Claimant: Sarah Tan (SG)
Trip: Bangkok, 3 nights
Claim: $4,200 SGD — laptop stolen from hotel
Filed: 12 days after return
Police report: Filed on last day of trip
Receipt: Online purchase, 2 months before trip

AI response

STEP 1 — KEY FACTORS:
1. Documentation timeliness (police report, filing delay)
2. Evidence quality (receipts, proof of ownership)
3. Claim amount vs trip context

STEP 2 — RED FLAGS:
• Filing delay >7 days • Police report on departure day
• High-value electronics • No hotel incident report

ASSESSMENT: HIGH RISK
• 12-day filing delay ❌
• Police report timing suspicious (last day) ❌
• $4,200 claim for 3-night trip ⚠
Recommend: Request hotel incident report + purchase proof before processing.

Why Step-Back works: The AI built a framework FIRST (factors + red flags), then applied it systematically — instead of jumping to "looks suspicious."

Self-Consistency for High Stakes

Solve 3 ways, report the majority — for decisions where being wrong is costly.

Your prompt

PROMPT:
Investigate this merchant for potential fraud.

Analyze using 3 independent approaches:
1. Velocity patterns (txn frequency vs norms)
2. Amount patterns (deviation from avg size)
3. Geographic patterns (location consistency)

For each: conclude FRAUDULENT / SUSPICIOUS / LEGITIMATE

Final assessment = majority conclusion.
If approaches disagree → flag for human review.

AI response

Approach 1 — Velocity: 340 txns/day vs 75 norm (+353%). Burst pattern at 2-4am. → FRAUDULENT

Approach 2 — Amount: Avg $4.50 vs $15 norm (−70%). Micro-transactions consistent with card testing. → FRAUDULENT

Approach 3 — Geographic: All transactions from Singapore IP. Consistent with merchant location. → LEGITIMATE

MAJORITY: 2/3 FRAUDULENT
Recommend: Suspend merchant, escalate to fraud team for full investigation.

Why 3 approaches? Approach 3 alone would have cleared this merchant. The majority vote catches what a single analysis misses.
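The majority-vote rule itself is simple enough to automate once each approach has returned a verdict. A minimal sketch (the escalation behavior when no majority exists is an assumption from the prompt's "flag for human review" rule):

```python
from collections import Counter

def majority_verdict(verdicts):
    """Self-consistency aggregation: majority wins; no majority -> human review."""
    top, count = Counter(verdicts).most_common(1)[0]
    if count > len(verdicts) // 2:
        return top, False          # (verdict, needs_human_review)
    return "UNDECIDED", True

verdict, review = majority_verdict(["FRAUDULENT", "FRAUDULENT", "LEGITIMATE"])
```

With the merchant above this returns FRAUDULENT with no escalation; a three-way split would instead route to a human.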

See the Difference: Loan Decision

❌ Without CoT

Prompt: "Should we approve this PayLater application?"

Yes, I would recommend approving this PayLater application. The customer has a decent income and has been on the platform for a reasonable amount of time. Their payment history is mostly good with only minor issues.

⚠ No reasoning shown, no math, not auditable, "decent" and "reasonable" are not metrics

✅ With Chain-of-Thought

Prompt: "Think through each policy criterion step by step"

Step 1 — Income: $5,200 > $2,000 min ✅
Step 2 — DTI: ($400 loan + $3,000 req) / $5,200 = 65% > 40% max ❌
Step 3 — Late payments: 2 in 8 months ≤ 2 max ✅
Step 4 — History: 8 months > 6 min ✅
Decision: DECLINE. DTI of 65% exceeds 40% policy maximum. Suggest $1,500 counter-offer (DTI = 37%).

✓ Every criterion checked, math shown, auditable, includes counter-offer

Module 3

Role & Persona
Prompting

Same data, dramatically different insights — just by changing who the AI "is"

The Persona Formula

You are [TITLE] at [COMPANY TYPE] 
with [X years] of experience in [SPECIALTY].
You are known for [CHARACTERISTIC]. 
When [SITUATION], you always [BEHAVIOR].

Conservative Analyst

"Senior Risk Analyst, 12 years in consumer credit. Known for being cautious. When in doubt, recommends additional verification."

Growth Analyst

"Business Development Analyst, 6 years in merchant acquisition. Known for finding opportunities. Always highlights growth potential alongside risks."

The last two fields matter most: CHARACTERISTIC and BEHAVIOR shape the AI's judgment calls. "Cautious" vs "opportunity-focused" produces completely different recommendations from the same data.

Persona in Action: Same Merchant, Different Eyes

Data: Merchant with 271% volume growth + chargeback rate rising from 0.3% to 4.1%

🛡 Conservative Risk Analyst

RISK RATING: RED — IMMEDIATE ACTION

Primary concern: Chargeback rate at 4.1% is 4x above the 1.0% policy threshold. This trajectory suggests systemic issues, not isolated incidents.

Volume growth concern: 271% growth without proportional infrastructure investment raises fulfillment risk. Rapid scaling often precedes quality collapse.

Recommendation:
• Reduce PayLater limit to minimum immediately
• Issue formal chargeback warning letter
• Place on 30-day enhanced monitoring
• If no improvement → initiate offboarding review

📈 Growth Business Analyst

ASSESSMENT: HIGH-GROWTH MERCHANT — MANAGE, DON'T EXIT

Growth signal: 271% volume increase indicates strong product-market fit. This merchant is scaling fast — a potential top-10 partner within 6 months.

Chargeback context: 4.1% rate is elevated but common during rapid scaling. 45% are "item not received" — a logistics issue, not fraud. Solvable with delivery confirmation.

Recommendation:
• Maintain current limits to support growth
• Offer delivery tracking integration (reduces chargebacks 60%)
• Assign dedicated merchant success manager
• Review in 60 days with chargeback target of 1.5%

Multi-Agent Framing

Get 3 perspectives in one prompt — no need to schedule 3 meetings.

Your prompt

PROMPT:
Evaluate this PayLater policy change (raising limit from $500 to $1,000):

🛡 RISK MANAGER: Default rate, exposure, regulation
📊 PRODUCT MANAGER: Adoption, competition, revenue
⚖ COMPLIANCE: Responsible lending, MAS guidelines

Each: 3 concerns + Recommendation + 1 metric to monitor

Then: Balanced synthesis of all three views.

AI response (synthesis excerpt)

🛡 RISK: "Doubling limits increases exposure by $12M. Default rate likely rises 0.3-0.5pp. Recommend: phased rollout to top-tier customers first."

📊 PRODUCT: "Competitors offer $1,500+. Current $500 limit is #1 reason for churn. Revenue uplift est. $2.4M/quarter."

⚖ COMPLIANCE: "MAS Notice requires affordability assessment above $500. Must add income verification step."

SYNTHESIS: Proceed with phased rollout ($750 first) with income verification. Monitor default rate weekly. Full $1,000 after 90-day review.

Why this works: Forces balanced analysis. No single perspective dominates. The synthesis is where the real insight lives.

Same Data, Different Audiences

Data: "PayLater default rate increased from 2.1% to 2.8% this quarter"

| Audience | Persona | Output style |
| --- | --- | --- |
| Board | "You are the CFO presenting to the board" | Strategic, 5-minute read |
| Ops Team | "You are the Ops Manager briefing your team" | Actionable, task-oriented |
| Regulators | "You are Compliance Head responding to MAS" | Formal, regulation-referenced |
| Customers | "You are a support specialist" | Simple, empathetic |

💡 Practice activity (10 min): Pick the same data point above. Write prompts for 2 different audiences. Compare how the tone, detail level, and recommendations change.

Module 4

Structured Outputs
& RAG

JSON extraction, document grounding, and meta-prompting

Why Structure Matters

Unstructured = Conversation

Different every time. Hard to compare. Can't feed into systems. Requires human parsing.

Structured = Form

Consistent format. Comparable across items. Machine-parseable. Scannable by busy stakeholders.

Finance use cases:

  • Invoice extraction → accounts payable system
  • Transaction categorization → reconciliation
  • Complaint classification → route to correct team
  • KYC document parsing → verification forms

RAG — Grounding in Documents

CRITICAL RULES:
1. Base your answer ONLY on the provided documents
2. After each claim, cite: [Doc Name, Section]
3. If not in documents: "Not available in 
   provided documents"
4. Do NOT use outside knowledge

DOCUMENTS:
[Doc 1: PayLater Terms v3.2]
[Doc 2: MAS Notice PSN 06]

QUESTION: What are our obligations if a customer 
misses 3 consecutive PayLater payments?

Critical for compliance: Without "ONLY on provided documents," the AI mixes its knowledge with yours — you can't trace which claims come from where.

Meta-Prompting

Ask the AI to write prompts for you:

I need a prompt that helps our compliance team 
assess whether a new MAS regulatory update 
affects our PayLater product.

The prompt should:
- Work with any MAS circular
- Produce a structured impact assessment
- Identify operations that need to change
- Suggest a compliance timeline

Write the best possible prompt for this task.

The AI builds a tool for your team. Use the generated prompt repeatedly — it's a reusable asset.

See the Difference: Policy Q&A

❌ Without RAG Grounding

Prompt: "What happens if a customer misses 3 PayLater payments?"

If a customer misses 3 consecutive PayLater payments, the account is typically suspended and the outstanding balance may be sent to collections. Late fees of around 1-2% per month are usually applied. The customer's credit score will be negatively impacted and they may be barred from future PayLater services.

⚠ Sounds plausible but NONE of this may be in your actual policy. "Typically" and "usually" = hallucination. No citations.

✅ With RAG Grounding

Prompt: "ONLY from provided documents" + policy doc attached

Per the PayLater Terms v3.2:
• After 3 missed payments, the account enters "Default" status [Section 8.3]
• A late fee of $5 or 1.5% (whichever is higher) applies per missed payment [Section 7.1]
• [INSUFFICIENT DATA: The provided document does not specify credit score reporting obligations. Refer to Section 12 or the Credit Bureau Reporting Policy.]

✓ Every claim cites a section. Admits what it doesn't know. No hallucination.

Module 5

Model-Specific
Tuning

Choosing the right model and parameters for each task

Model Comparison

| Capability | Claude | GPT-4 | Gemini |
| --- | --- | --- | --- |
| Long documents | 200K tokens | 128K tokens | 1M tokens |
| Structured output | XML tags | Numbered rules | Schemas |
| Instruction following | Very precise | Good (may add extras) | Good |
| Best prompt style | XML + constraints | Numbered rules | Grounding instructions |
| Finance/compliance | Strong safety | Flexible | Good with grounding |

Temperature Guide for Finance

| Temperature | Behavior | Use for |
| --- | --- | --- |
| 0.1 | Deterministic, same answer every time | Data extraction, compliance, calculations |
| 0.3 | Slightly varied, still focused | Report writing, narratives, summaries |
| 0.7 | Creative, diverse outputs | Brainstorming, ideation, drafting |

Rule: Never use a temperature above 0.3 for anything that will be audited or submitted to regulators — reserve 0.7 for brainstorming and drafting.
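Teams that wrap model calls often encode this guide as a lookup so nobody hand-picks temperatures per request. A minimal sketch (the task names and the audit cap are assumptions for illustration, not an AWS API):

```python
# Temperature per task type, following the guide above
TEMPERATURE_BY_TASK = {
    "extraction": 0.1,
    "compliance": 0.1,
    "calculation": 0.1,
    "report": 0.3,
    "summary": 0.3,
    "brainstorm": 0.7,
}

def pick_temperature(task: str, audited: bool = False) -> float:
    """Look up the temperature; audited work is capped at the focused setting."""
    temp = TEMPERATURE_BY_TASK.get(task, 0.3)
    if audited:
        temp = min(temp, 0.3)
    return temp
```

Passing `audited=True` enforces the rule in code rather than relying on each analyst to remember it.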

Module 6 · NEW

Evaluating
Your Prompts

How do you KNOW your prompts are working?

Why Evaluate?

  • Prompts degrade over time — model updates change behavior
  • "It looks good" is not a metric — you need measurable quality
  • Compliance requires evidence that AI outputs meet standards
  • You need to compare version A vs version B objectively

The problem: Most teams deploy prompts based on "it looked good when I tested it once." That's like shipping software without tests.

Manual Evaluation: Rubrics

| Criterion | 1 (Poor) | 3 (OK) | 5 (Excellent) |
| --- | --- | --- | --- |
| Completeness | Missing 3+ sections | All sections, some thin | All sections thorough |
| Data grounding | Unsupported claims | Mostly grounded | Every claim cites data |
| Actionability | No recommendation | Vague recommendation | Specific actions + owners |
| Consistency | Different each run | Mostly consistent | Identical structure |

Process: Run same prompt 5 times → score each → average = quality score
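Averaging the rubric over repeated runs takes a few lines. A minimal sketch (criterion names follow the rubric above; the score dicts are illustrative):

```python
def rubric_score(runs):
    """Average per-criterion rubric scores over repeated runs of one prompt."""
    criteria = runs[0].keys()
    per_criterion = {c: sum(r[c] for r in runs) / len(runs) for c in criteria}
    overall = sum(per_criterion.values()) / len(per_criterion)
    return per_criterion, overall

runs = [
    {"completeness": 5, "grounding": 4, "actionability": 3},
    {"completeness": 4, "grounding": 4, "actionability": 5},
]
per_criterion, overall = rubric_score(runs)
```

A low average on one criterion tells you exactly which pillar of the prompt to strengthen.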

LLM-as-Judge

Use one AI to evaluate another AI's output:

Score this merchant risk assessment:

1. All 8 sections present? (0-10)
2. Every claim cites data? (0-10)
3. Risk rating justified? (0-10)
4. Actions specific & actionable? (0-10)

Return JSON: {"completeness": X, "grounding": X, 
"consistency": X, "actionability": X, 
"total": X, "issues": ["..."]}

Run on 10 outputs → compare scores between template versions
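Because the judge returns JSON, comparing two template versions reduces to parsing and averaging the totals. A minimal sketch (the `total` key matches the judge prompt above; the comparison logic is an assumption):

```python
import json
from statistics import mean

def compare_versions(judge_outputs_a, judge_outputs_b):
    """Parse judge JSON per output and compare mean totals of two versions."""
    total_a = mean(json.loads(j)["total"] for j in judge_outputs_a)
    total_b = mean(json.loads(j)["total"] for j in judge_outputs_b)
    winner = "A" if total_a >= total_b else "B"
    return total_a, total_b, winner
```

Run each template version on the same inputs, feed the judge's JSON strings in, and the higher mean total wins the A/B test.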

A/B Testing Prompts

Process

  • Same input data, two prompt versions
  • Run both 10 times each
  • Score with the judge prompt
  • Higher average score wins

When to Re-evaluate

  • After any model update
  • When users report quality issues
  • Monthly for production templates
  • After any template modification

Module 7 · NEW

From Manual Prompts
to Automated Tools

You build the template once. The tools do the rest.

The Reality: Nobody Writes Long Prompts Every Day

You learn the techniques → build the template once → let the tools handle the rest.

| Phase | What you do | Tool |
| --- | --- | --- |
| 1. Learn | Master the techniques (today) | Your brain |
| 2. Build | Create a reusable template with {{variables}} | Kiro / any AI chat |
| 3. Optimize | Let AI rewrite your prompt for better performance | Bedrock Prompt Optimization |
| 4. Store & Share | Save versioned templates with metadata | Bedrock Prompt Management |
| 5. Reuse | Fill in variables and run — no rewriting needed | Bedrock Console / API |
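The {{variable}} syntax used throughout the exercises is plain string substitution, so a reusable template can be filled locally before any tooling is involved. A minimal sketch (the helper name and error behavior are illustrative, not the Bedrock API):

```python
import re

def fill_template(template: str, **values) -> str:
    """Substitute {{variable}} placeholders; error on any left unfilled."""
    filled = template
    for name, value in values.items():
        filled = filled.replace("{{" + name + "}}", str(value))
    leftover = re.findall(r"\{\{(\w+)\}\}", filled)
    if leftover:
        raise ValueError(f"Unfilled variables: {leftover}")
    return filled

prompt = fill_template(
    "Assess merchant {{merchant_id}} in {{market}}.",
    merchant_id="MC-8842", market="Singapore",
)
```

Failing on unfilled placeholders catches the classic mistake of shipping a prompt with a literal `{{merchant_id}}` in it.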

Bedrock Prompt Management

Your prompt library — stored, versioned, and shared across the team.

| Manual (today) | Bedrock Prompt Management |
| --- | --- |
| Templates in markdown files | Stored as managed resources |
| Copy-paste to test | One-click testing across models |
| No version history | Immutable version snapshots |
| Manual comparison | Side-by-side model comparison |
| Share via email/Slack | Shared across team via API |

No additional charge — you only pay for model tokens during testing.

Prompt Management: Key Features

  • Prompt Templates with {{variables}} — same syntax from the exercises. Define variables with descriptions and defaults.
  • Version Management — every change creates an immutable snapshot. Roll back anytime.
  • Multi-Model Testing — test across Claude, Nova, Llama side-by-side. Compare quality, latency, cost.
  • Up to 3 Prompt Variants — compare different versions of the same prompt to find the best performer.

Think of it as: Google Docs for prompts — versioned, shared, and always accessible. But with built-in testing across multiple AI models.

Prompt Optimization (Instructor Demo)

You write a basic prompt. Bedrock rewrites it for better performance — automatically.

Your prompt

"Assess this merchant's risk level"

6 words. No structure, no role, no constraints.

Bedrock's optimized version

"You are a Senior Risk Analyst 
specializing in SEA digital payments.
Produce a risk assessment:
1. Rating (GREEN/AMBER/RED)
2. Transaction Pattern Analysis
3. Chargeback Assessment
4. Recommended Actions
Base analysis ONLY on provided data."

Persona + structure + grounding — applied automatically

How Prompt Optimization Works

  • Step 1: Submit your prompt (even a short, rough one)
  • Step 2: Bedrock analyzes the prompt components
  • Step 3: It rewrites with best practices — structure, constraints, model-specific formatting
  • Step 4: Compare original vs optimized output side-by-side
  • Step 5: Save the optimized version to your Prompt Management library

GA — April 2025. Supports Claude, Amazon Nova, Meta Llama, DeepSeek, Mistral. The techniques you learned today help you evaluate whether the optimized prompt is actually good.

The Bottom Line

| Your concern | The solution |
| --- | --- |
| "I don't want to write long prompts every time" | Build the template once → reuse with {{variables}} |
| "I'm not sure my prompt is good enough" | Prompt Optimization rewrites it automatically |
| "My team needs to share and version prompts" | Prompt Management stores everything centrally |
| "Which model gives the best result?" | Multi-model testing compares side-by-side |

For developers: Intelligent Prompt Routing auto-selects cheaper models for simple tasks (up to 30% cost savings). Prompt Flows chains prompts into automated workflows. These are covered in Day 3.

Hands-on

Prompt Engineering
Exercises

Build a reusable prompt template in 7 steps

Choose Your Exercise

Exercise 1: Merchant Risk Assessment

Best for: Risk analysts, merchant ops, compliance

Techniques: Zero-Shot → Persona → Few-Shot → Structured → RAG + Self-Critique → Meta-Prompting → Validation

Deliverable: Reusable template for GREEN/AMBER/RED merchant risk assessments

Exercise 2: Credit Risk Narrative

Best for: Credit analysts, PayLater ops, financing

Techniques: Zero-Shot → Step-Back → Audience Framing → Multi-Perspective → Structured + Length Control → Meta-Prompting → Validation

Deliverable: Reusable template for APPROVE/CONDITIONS/DECLINE credit narratives

Open the workshop site → Prompt Engineering Exercises

Wrap-up

Best Practices &
Prompt Optimization

Common mistakes, optimization strategies, and recovery patterns

7 Prompt Mistakes Everyone Makes

| # | Mistake | Why it hurts | Quick fix |
| --- | --- | --- | --- |
| 1 | The Kitchen Sink | Cramming 5 tasks into 1 prompt | One task per prompt, chain results |
| 2 | The Blank Canvas | No examples = AI guesses your format | Show 1-2 examples of desired output |
| 3 | The Trust Fall | No grounding = confident hallucinations | "ONLY from provided data" |
| 4 | The Vague Ask | "Analyze this" — analyze what, how, for whom? | Specify audience, format, length |
| 5 | The One-Shot Wonder | Expecting perfection on first try | Plan for 2-3 refinement turns |
| 6 | The Copy-Paste Trap | Using the same prompt for different models | Tune syntax per model family |
| 7 | The Set-and-Forget | Never re-testing after model updates | Monthly prompt health checks |

The Draft-Score-Revise Loop

Don't accept the first output. Build a self-improving cycle into your prompt:

Step 1 — DRAFT: Write a merchant risk summary 
  using the data provided.

Step 2 — SCORE: Rate your draft on these criteria:
  - Completeness (0-5): All required sections?
  - Grounding (0-5): Every claim cites data?
  - Actionability (0-5): Specific next steps?

Step 3 — REVISE: If total < 12, rewrite to fix 
  the lowest-scoring area. Max 2 revisions.

Output only the final version.

Result: The AI self-corrects before you even read it. Teams using this pattern report 40-60% fewer revision cycles.
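The same loop can live in orchestration code rather than inside one prompt. A minimal sketch with injectable `generate` and `score` functions standing in for model calls (both names and the threshold of 12 are taken from the pattern above, not a library):

```python
def draft_score_revise(generate, score, threshold=12, max_revisions=2):
    """Draft, score against the rubric, revise the weakest area; cap revisions."""
    draft = generate("DRAFT the merchant risk summary.")
    for _ in range(max_revisions):
        scores = score(draft)                   # e.g. {"completeness": 5, ...}
        if sum(scores.values()) >= threshold:
            break
        weakest = min(scores, key=scores.get)   # lowest-scoring criterion
        draft = generate(f"REVISE the draft, fixing: {weakest}")
    return draft
```

Swapping in a real model call for `generate` and a judge prompt for `score` turns this into an automated quality gate.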

Break Big Tasks into Small Steps

Complex tasks fail when you ask for everything at once. Decompose instead:

❌ One Giant Prompt

"Analyze our Q2 transactions, 
identify fraud patterns, calculate 
loss exposure, compare to Q1, 
draft a board summary, and 
recommend 3 prevention measures."

6 tasks = shallow work on each

✅ Chained Prompts

Prompt 1: "Analyze Q2 transactions 
  and flag anomalies"
Prompt 2: "From these anomalies, 
  identify the top 3 fraud patterns"
Prompt 3: "Calculate loss exposure 
  for each pattern"
Prompt 4: "Draft a board summary 
  with prevention measures"

Each step gets full attention
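Chaining is just feeding each result into the next prompt, which is easy to script. A minimal sketch where `llm` is a placeholder for any model call (the function and prompt wording are illustrative):

```python
def run_chain(llm, q2_data: str) -> str:
    """Four focused prompts, each consuming the previous step's output."""
    anomalies = llm(f"Analyze Q2 transactions and flag anomalies:\n{q2_data}")
    patterns = llm(f"From these anomalies, identify the top 3 fraud patterns:\n{anomalies}")
    exposure = llm(f"Calculate loss exposure for each pattern:\n{patterns}")
    return llm(f"Draft a board summary with prevention measures:\n{exposure}")
```

Because each intermediate result is a plain string, you can log, review, or correct any step before the next one runs.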

Tell the AI What NOT to Do

Positive instructions tell the AI what to include. Negative constraints prevent common failure modes:

| Problem | Negative constraint to add |
| --- | --- |
| AI adds unsolicited opinions | "Do not include personal opinions or speculation" |
| AI uses data not in your input | "Do not reference any data outside the provided documents" |
| AI writes too much | "Do not exceed 300 words. Do not add a conclusion section" |
| AI hedges everything | "Do not use phrases like 'it depends' or 'generally speaking'" |
| AI explains obvious things | "Do not explain what PayLater is or how digital wallets work" |
| AI invents numbers | "If a metric is not in the data, write [DATA NOT AVAILABLE]" |

Pro tip: After your first test run, note what went wrong and add a "Do NOT" line for each issue. Your prompt improves with every iteration.

Structure Your Prompts Like Documents

Well-organized prompts produce well-organized outputs. Use clear sections and delimiters:

### ROLE
You are a Senior Payment Operations Analyst.

### CONTEXT
<<<
[Paste your transaction data or document here]
>>>

### TASK
Analyze the data for anomalies in Thailand and Vietnam.

### OUTPUT FORMAT
- Executive summary (3 sentences)
- Anomaly table: Market | Type | Severity | Evidence
- Recommended actions (numbered, with owner)

### CONSTRAINTS
- Use ONLY the data provided above
- All amounts in SGD
- Do not exceed 400 words

Why delimiters matter: Without clear separation, the AI may confuse your instructions with your data — especially dangerous when pasting policy documents.

Show, Don't Tell: The Power of Examples

One good example is worth 100 words of instruction:

❌ Telling

"Categorize each transaction as 
high risk, medium risk, or low 
risk based on amount, frequency, 
and merchant type. Format as a 
table with columns for transaction 
ID, category, and reasoning."

50 words of instruction, AI still guesses your format

✅ Showing

"Categorize transactions like this:

| ID | Risk | Reason |
| T001 | HIGH | $12K single txn, 
  new merchant, no history |
| T002 | LOW | $45 recurring, 
  12-month pattern |

Now categorize these: [data]"

One example = perfect format every time

The 3-Round Prompt Improvement Workflow

Every production-quality prompt goes through this cycle:

| Round | What you do | What improves |
| --- | --- | --- |
| Round 1: Baseline | Write your first prompt using the 4 pillars. Run it 3 times. | You see what the AI gets right and wrong |
| Round 2: Fix failures | Add negative constraints for each failure. Add an example of good output. Run 3 more times. | Consistency jumps from ~60% to ~85% |
| Round 3: Polish | Add self-review step. Tighten length/format. Test with edge cases. | Production-ready at ~95% consistency |

Total time: 15-20 minutes to go from first draft to production template. That template then saves hours every week.

Build a Team Prompt Library

Your best prompts are team assets, not personal notes. Treat them like shared templates:

What to include

  • Prompt name and purpose
  • The full prompt with {{variables}}
  • Which model and temperature to use
  • 1-2 example outputs (good vs bad)
  • Known limitations and edge cases
  • Last tested date and model version

Starter library for finance

  • Merchant risk assessment
  • Transaction anomaly detection
  • Customer complaint classification
  • Policy document Q&A (RAG)
  • Board summary generator
  • Regulatory impact assessment
  • KYC document extraction
  • Fraud investigation narrative

Start today: The template you built in the exercise is your first library entry. Share it with your team this week.

Why AI "Gets Dumber" Mid-Conversation

It's not a bug — it's a context window problem. Every AI has a limited "working memory."

What happens inside

  • Every message you send + every AI response stays in the context window
  • At 60-70% capacity, performance drops sharply — not gradually, but in sudden cliffs
  • The AI compresses and deprioritizes earlier messages to make room
  • "Lost in the Middle" effect: AI remembers the start and end of conversations best, but forgets what's in the middle

What you experience

  • AI contradicts instructions you gave 10 messages ago
  • AI re-introduces ideas you already rejected
  • AI ignores constraints from the start of the chat
  • Outputs get vague, generic, or repetitive
  • AI starts "hallucinating" more frequently

Key insight: Most people blame the AI for "getting stupid." The real problem is the conversation got too long. The fix is context management, not a better model.

5 Rules for Managing Long Conversations

# | Rule | Why it works
1 | One task per session — don't mix debugging, writing, and analysis in one chat | Each session gets full attention capacity
2 | Paste only what's relevant — don't dump entire documents when you need one section | Reduces noise, keeps AI focused on what matters
3 | Put key instructions at the start AND end — not buried in the middle | Exploits primacy + recency bias
4 | Keep sessions under 15-20 turns — start fresh after that | Stays within the performance sweet spot
5 | Use "session summaries" to carry state — ask AI to summarize, then paste into new chat | Fresh context window with all the knowledge
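Rule 3 (key instructions at the start AND end) is mechanical enough to automate. A sketch of a "sandwich" helper; the reminder wording is illustrative:

```python
def sandwich_prompt(key_instructions, body):
    """Rule 3: put the critical instructions at the start AND the end,
    so they land in the positions the model attends to most."""
    return (
        f"{key_instructions}\n\n"
        f"{body}\n\n"
        "Reminder: before answering, re-check these requirements:\n"
        f"{key_instructions}"
    )
```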

The Session Summary Technique

When a conversation gets too long but you can't lose the state:

Step 1: Ask for a summary

PROMPT (in the old session):

Summarize our conversation so far:
• Key decisions we made
• Data and findings so far
• What we still need to do next

Format as a briefing I can paste into a new session.

Step 2: Start fresh with context

PROMPT (in the new session):

Here is the context from our previous session:

[PASTE SUMMARY HERE]

Continue from where we left off. The next step is to draft the risk committee report based on the findings above.

✓ Fresh context window + all accumulated knowledge = best of both worlds

Think of it as "saving your game." You compress hours of conversation into a focused briefing, then load it into a fresh session with full attention capacity.
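The two steps above can be wrapped in small helpers so the handoff wording stays consistent across your team. A sketch; the prompt text is taken verbatim from the steps above:

```python
# Step 1: send this in the old session to get the briefing.
SUMMARY_REQUEST = (
    "Summarize our conversation so far:\n"
    "• Key decisions we made\n"
    "• Data and findings so far\n"
    "• What we still need to do next\n\n"
    "Format as a briefing I can paste into a new session."
)

def new_session_prompt(summary, next_step):
    """Step 2: wrap the pasted summary into the opening message of a fresh session."""
    return (
        "Here is the context from our previous session:\n\n"
        f"{summary}\n\n"
        f"Continue from where we left off. The next step is to {next_step}."
    )
```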

The Conversation Funnel

Start broad, then narrow. Each turn builds on context — but keep it focused.

The pattern

Turn 1 (Explore):
"Analyze this month's transaction data — identify top 3 trends"

Turn 2 (Deep-dive):
"Expand on trend #2 — the PayLater chargeback increase"

Turn 3 (Produce):
"Draft a 1-page summary for the risk committee"

Turn 4 (Polish):
"Make the tone more formal and add data citations"

Why it works

  • Each turn is focused on one thing
  • You review and correct at each step
  • Errors don't compound — you catch them early
  • 4 focused turns > 1 massive prompt

When to reset: If Turn 3 goes wrong, don't keep correcting. Start a new session with: "Here's the data and the trend analysis. Draft a risk committee summary."

When to Start Fresh vs. Continue

🟢 Start a New Session

  • Switching to a completely different task
  • Conversation has gone off track
  • Testing a refined prompt cleanly
  • Session is longer than 15-20 turns
  • AI keeps repeating the same mistake
  • AI contradicts earlier instructions

🔵 Continue the Session

  • Iterating on the same output
  • Need AI to remember earlier context
  • Building step by step (funnel pattern)
  • Refining format or tone
  • Follow-up questions on same topic
  • Session is still under the 15-20 turn limit

The 3-strike rule: If you've corrected the AI 3 times and it's still wrong — the context is working against you. Start fresh. It's faster than fighting a polluted conversation.
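The 3-strike rule is simple enough to track explicitly instead of by feel. A minimal sketch; the class and its messages are illustrative:

```python
class StrikeCounter:
    """Track how often you've corrected the AI in this session;
    after the limit (3 by default), recommend starting fresh."""

    def __init__(self, limit=3):
        self.limit = limit
        self.strikes = 0

    def record_correction(self):
        self.strikes += 1
        if self.strikes >= self.limit:
            return "START FRESH: the context is working against you"
        return f"Continue ({self.strikes}/{self.limit} corrections used)"
```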

Circuit Breaker Patterns

Pattern | Symptom | Fix
Repetition Loop | Same wrong output after correction | New session, rephrase
Hallucination Spiral | Inventing data | "Use ONLY provided data"
Over-Eager Helper | 2,000 words for 5 bullets | "Exactly 5 bullets, under 20 words"
Format Drift | Format changes mid-output | "Continue EXACTLY same format"
Confidence Trap | Uncertain info as fact | "Prefix uncertain with [UNCERTAIN]"

Using Kiro for Business Users

  • Vibe mode: Describe what you want → Kiro writes and runs the code
  • File context: Drag CSVs, PDFs, JSON into chat
  • Iterative refinement: "Make the chart bigger" / "Add a percentage column"
  • New Session per task: Keep context focused

Remember: You don't need to understand the code Kiro writes. You just need to describe what you want clearly — using the 4 pillars from Module 1.

Quick Reference Card

Technique | Trigger Phrase
Zero-Shot CoT | "Think step by step before answering"
Expert Persona | "You are a Senior [ROLE] with X years in [SPECIALTY]"
Multi-Perspective | "Present the case FOR and AGAINST"
Structured Output | "Use EXACTLY these sections: 1... 2... 3..."
RAG Grounding | "Base your answer ONLY on the provided documents"
Self-Critique | "Review: Is every claim supported by data?"
Meta-Prompting | "Write the best prompt for [TASK]"
LLM-as-Judge | "Score this output against these criteria"
Negative Constraints | "Do NOT include / Do NOT use / Do NOT exceed"
Task Decomposition | Break 1 big prompt into 3-4 focused prompts
Draft-Score-Revise | "Draft, then score on [rubric], then revise if < threshold"
Show Don't Tell | Include 1-2 examples of desired output format

Preview

From Prompts to
Workflow Automation

Everything you learned today becomes the foundation for autonomous AI agents

Your Prompt Skills = Agent Design Skills

Every technique you learned today maps directly to how AI agents are built:

Day 2: Prompt Technique | Day 3: Agent Component | What it does in an agent
Persona prompting | Agent role definition | Defines who the agent "is" and how it behaves
Structured output | Output contracts | Ensures consistent, usable results
Chain-of-Thought | Reasoning strategy | Agent thinks step-by-step before acting
RAG grounding | Knowledge base | Agent accesses your company's documents
Negative constraints | Guardrails | Prevents the agent from doing things it shouldn't
Prompt template | SKILL.md file | The template becomes a reusable, shareable skill

Key insight: You don't need to code to design an AI agent. You need to write great instructions — which is exactly what you practiced today.

Preview: Templates → Skills → Automation

Tomorrow you'll turn your prompt templates into automated workflows:

Today: Prompt template

You are a Senior Risk Analyst...
Analyze merchant data and produce:
1. Risk Rating (GREEN/AMBER/RED)
2. Transaction Analysis
3. Recommended Actions

Pasted manually each time

Tomorrow: SKILL.md + Hooks

---
name: merchant-risk-assessment
description: Generate risk assessments
---
[Same template + auto-trigger]

✓ Auto-activates, shared, versioned
✓ Works in Kiro AND Claude Cowork
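For the curious: the frontmatter format shown above is simple enough to read programmatically. A naive sketch that splits the `---` block from the body; it assumes plain `key: value` lines, and a real loader would use a proper YAML parser:

```python
def parse_frontmatter(skill_text):
    """Split a SKILL.md-style file into its frontmatter metadata and body."""
    lines = skill_text.strip().splitlines()
    assert lines[0] == "---", "expected frontmatter to open with ---"
    end = lines.index("---", 1)          # closing delimiter of the frontmatter
    meta = {}
    for line in lines[1:end]:
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:])    # everything after the frontmatter
    return {"meta": meta, "body": body}
```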

Day 3 covers: Workflow patterns (chaining, parallelization, routing, orchestration), the Kiro stack (steering + skills + hooks), and you'll design an agent for your team's workflow.


The 3-Day Journey

📚

Day 1

"What can AI do?"

Fundamentals, use cases, responsible AI

💬

Day 2 (Today)

"How do I talk to AI?"

Prompt engineering, templates, tools

🤖

Day 3 (Tomorrow)

"How do I make AI work on its own?"

Agentic AI, workflow automation, no code

💡 Homework: What repetitive task does your team do every week that could be automated? Come to Day 3 with a specific workflow — you'll design an AI agent for it.

Day 2 Outcomes

  • Design prompts using the 4 pillars (Clarity, Context, Role, Output)
  • Apply Chain-of-Thought and Self-Consistency for financial reasoning
  • Create expert personas for different audiences
  • Extract structured data and ground responses in documents
  • Evaluate prompt quality with rubrics and LLM-as-Judge
  • Use Bedrock tools to optimize and manage prompts at scale
  • Manage long conversations and know when to start fresh
  • Build reusable prompt templates — the foundation for AI agents
  • Identify a workflow from your team to automate on Day 3

Thank You

Tomorrow: Make AI Work On Its Own

Agentic AI · Workflow Automation · Agent Design · No Coding Required

💡 Homework: Come with a workflow your team does every week that could be automated

AnyCompany Financial Group · Generative & Agentic AI on AWS