AnyCompany Financial Group · Generative & Agentic AI on AWS
Module 1
Prompt Fundamentals Deep Dive
The 4 pillars that determine 80% of output quality
The 80/20 Rule of Prompting
80% of prompt quality comes from 4 fundamentals:
1. Clarity
Say exactly what you mean. If a colleague would ask "what do you mean?" — your prompt needs work.
2. Context
Give the AI the background it needs. Without context, it guesses — dangerous in finance.
3. Role Assignment
Tell the AI who to be. A "risk analyst" focuses on different signals than a "support agent."
4. Output Framing
Define what "done" looks like — format, length, structure, style.
Pillar 1: Clarity
Vague
Clear
"Summarize this report"
"Summarize this quarterly earnings report in 5 bullet points, focusing on revenue growth, cost changes, and risk factors"
"Help me with this data"
"Analyze this CSV of 500 transactions and identify the top 3 merchants by total volume"
"Write something about compliance"
"Draft a 200-word summary of MAS Notice 626 requirements for e-payment service providers"
Rule of thumb: The more specific your prompt, the less the AI has to guess.
Pillar 2: Context
Without context:
"Is this transaction suspicious?"
With context:
"This merchant is a convenience store
in Singapore, typically 50-80 txns/day
averaging $15 SGD. Today: 340 txns
averaging $4.50. Is this suspicious?"
Types of context: Domain · Data · Situational · Constraints
4 Types of Context
Type
What it tells the AI
Finance example
Domain
The industry, market, and business area
"In the context of Southeast Asian digital payments and PayLater services..."
Data
The specific numbers, records, or documents to analyze
"Here is the merchant's transaction history for the last 6 months: [data]"
Situational
Why you need this now — the trigger or event
"We are preparing for a quarterly board review" / "This merchant was flagged by our monitoring system"
Constraints
Rules, limits, and requirements the output must follow
"All amounts in SGD with 2 decimal places" / "Follow MAS Notice 626 guidelines"
Rule of thumb: If you skip Domain context, the AI gives generic answers. If you skip Data context, it hallucinates. If you skip Situational context, it guesses your purpose. If you skip Constraints, it ignores your standards.
Context in Action: Merchant Review
[DOMAIN]
You are reviewing an AnyCompany Pay merchant in
Singapore's food & beverage sector.
[DATA]
Merchant: Kopi Corner Pte Ltd (ID: MC-8842)
Monthly txn volume: 4,200 → 15,600 (6-month trend)
Avg transaction: $8.50 SGD
Chargeback rate: 0.3% → 4.1% (6-month trend)
Complaints: 12 in last 30 days (up from 2)
[SITUATIONAL]
Auto-flagged: chargeback rate exceeds 1.0% threshold.
Risk committee meets Friday.
[CONSTRAINTS]
- All amounts in SGD
- Reference AnyCompany's chargeback policy (max 1.0%)
- Use only the data provided above
- Include a GREEN/AMBER/RED risk rating
Pillar 3: Role Assignment
Role
What changes in the output
Compliance Officer
Focuses on regulatory requirements, flags risks
Customer Support Agent
Empathetic language, resolution-focused
Financial Analyst
Numbers, trends, comparisons, frameworks
Fraud Investigator
Patterns, anomalies, evidence chains
Pro tip: Add experience level — "Senior Credit Risk Analyst with 10 years of experience in Southeast Asian consumer lending, specializing in PayLater products"
Pillar 4: Output Framing
Dimension
Example
Format
"Respond as a bullet list" / "Use a table"
Length
"In exactly 3 sentences" / "Under 200 words"
Structure
"Use sections: Summary, Analysis, Recommendation"
Style
"Write for a non-technical executive audience"
Constraints
"Do not include personal opinions"
Output Framing in Action
Same question: "What happened with PayLater chargebacks this quarter?"
❌ No output framing
PayLater chargebacks have been increasing this quarter. The rate went from about 2.1% to 2.8% which is a notable increase. There are several factors that could be contributing to this including seasonal spending patterns, new merchant onboarding, and changes in customer demographics. The team should look into this further and consider implementing additional fraud checks. It might also be worth reviewing the underwriting criteria for new merchants. Overall, while the increase is concerning, it's not yet at critical levels but should be monitored closely going forward.
⚠ Wall of text. No structure. Vague recommendations. Key numbers buried.
✅ With output framing
Summary: Chargeback rate rose 2.1% → 2.8% (+33%).
Root causes:
• New merchant cohort (Q2): 4.2% vs 1.8% established
• "Item not received" disputes +45% in Indonesia
Actions:
• Tighten onboarding for high-risk MCCs — Risk, 2 wks
• Add delivery confirmation for ID — Product, 1 wk
• Review top 20 merchants by volume — Ops, immediate
✓ Scannable. Key number upfront. Actions with owners & timelines.
All 4 Pillars Together
[ROLE] You are a Payment Operations Manager at a
Southeast Asian fintech.
[CONTEXT] Below is the daily transaction summary across
6 markets. We had a system outage in Thailand from
2:00-4:30 PM yesterday.
[TASK] Analyze the data and produce a daily operations
brief for leadership.
[OUTPUT] Format as:
1. Executive Summary (3 sentences)
2. Market Performance Table
3. Incidents & Impact (bullets)
4. Action Items (numbered, with owners)
Keep under 400 words.
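The four-block prompt above can be assembled programmatically so the structure stays consistent across uses. This is a minimal sketch; the helper name and parameters are illustrative, not a real API.

```python
# Minimal sketch: assemble a 4-pillar prompt from its labeled parts.
def build_prompt(role: str, context: str, task: str, output_spec: str) -> str:
    """Combine ROLE, CONTEXT, TASK, and OUTPUT into one prompt string."""
    return (
        f"[ROLE] {role}\n\n"
        f"[CONTEXT] {context}\n\n"
        f"[TASK] {task}\n\n"
        f"[OUTPUT] {output_spec}"
    )

prompt = build_prompt(
    role="You are a Payment Operations Manager at a Southeast Asian fintech.",
    context="Below is the daily transaction summary across 6 markets. "
            "We had a system outage in Thailand from 2:00-4:30 PM yesterday.",
    task="Analyze the data and produce a daily operations brief for leadership.",
    output_spec="1. Executive Summary (3 sentences)\n"
                "2. Market Performance Table\n"
                "3. Incidents & Impact (bullets)\n"
                "4. Action Items (numbered, with owners)\n"
                "Keep under 400 words.",
)
```

Because the pillars are separate arguments, swapping the role or tightening the output spec never disturbs the rest of the prompt.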
See the Difference: Merchant Review
❌ Without Technique
Prompt: "Review this merchant"
The merchant seems to be doing okay. Their transactions have been increasing which is good. There are some chargebacks but that's normal for this type of business. Overall they appear to be a reasonable merchant to work with. I would recommend continuing the relationship.
⚠ Vague, no data cited, no structure, no actionable recommendation, no risk rating
✅ With 4 Pillars
Prompt: Role + Context + Structure + Constraints
RISK RATING: AMBER
Transaction Analysis: Volume surged 271% (4,200 to 15,600) over 6 months. Avg txn size flat at ~$8.50 suggests organic growth, not fraud.
Chargeback Alert: Rate escalated from 0.3% to 4.1%, exceeding the 1.0% benchmark by 4x. Primary driver: "Item not received" (45%).
Action:
1) Issue chargeback warning letter (Merchant Ops, 5 days)
2) Reduce PayLater limit to the minimum (Risk, immediate)
✓ Structured, data-cited, specific actions with owners and timelines
Module 2
Chain-of-Thought Reasoning
Making AI show its work — step by step
Why Chain-of-Thought?
Financial decisions require multi-step logic. CoT makes reasoning visible and auditable.
Without CoT
"Can this merchant afford a
$50K credit line?"
Answer: "Yes, they can afford it."
(No reasoning shown)
With CoT
"Think through this step by step."
Revenue: $120K
- Costs: $95K = $25K free cash
- Existing debt: $15K
= Available: $10K
New repayment: $5K
DSCR = 2.0x → Affordable
CoT Techniques
Technique
Trigger
Best for
Zero-Shot CoT
"Think step by step"
Quick calculations, simple logic
Few-Shot CoT
Provide example with reasoning
Consistent multi-step processes
Step-Back
"First identify key factors, then analyze"
Complex analysis needing prioritization
Self-Consistency
"Solve 3 ways, report majority"
High-stakes decisions
Finance rule: Any decision that could be audited should use CoT — the reasoning trail is your documentation.
Zero-Shot CoT Example
Just add "Think step by step" to any prompt — no examples needed.
❌ Without CoT
PROMPT: A merchant has revenue $120K, costs $95K, existing loan repayment $15K/month. They want a $50K credit line with $5K monthly repayments. Can they afford it?
AI RESPONSE:
Yes, they can afford it. The merchant has sufficient revenue to cover the additional repayment.
⚠ No math. No reasoning. Not auditable. Could be wrong.
✅ With "Think step by step"
PROMPT: Same data as above + "Think through this step by step."
AI RESPONSE:
Step 1 — Free cash: $120K revenue − $95K costs = $25K/month
Step 2 — After existing debt: $25K − $15K = $10K available
Step 3 — New repayment: $5K/month
Step 4 — DSCR: $10K ÷ $5K = 2.0x → Affordable
Key insight: One trigger phrase turned a bare "yes" into an auditable calculation — every step can be checked against the data.
Step-Back Prompting Example
First identify principles, then apply them to the specific case.
Your prompt
PROMPT:
Before analyzing this insurance claim, first answer:
1. What are the 3 most important factors for travel insurance claim validity?
2. What are common red flags in Southeast Asia?
Now assess this claim using those factors:
Claimant: Sarah Tan (SG)
Trip: Bangkok, 3 nights
Claim: $4,200 SGD — laptop stolen from hotel
Filed: 12 days after return
Police report: Filed on last day of trip
Receipt: Online purchase, 2 months before trip
AI response (excerpt):
STEP 2 — RED FLAGS:
• Filing delay >7 days • Police report on departure day
• High-value electronics • No hotel incident report
ASSESSMENT: HIGH RISK
• 12-day filing delay ❌
• Police report timing suspicious (last day) ❌
• $4,200 claim for 3-night trip ⚠
Recommend: Request hotel incident report + purchase proof before processing.
Why Step-Back works: The AI built a framework FIRST (factors + red flags), then applied it systematically — instead of jumping to "looks suspicious."
Self-Consistency for High Stakes
Solve 3 ways, report the majority — for decisions where being wrong is costly.
Your prompt
PROMPT:
Investigate this merchant for potential fraud.
Analyze using 3 independent approaches:
1. Velocity patterns (txn frequency vs norms)
2. Amount patterns (deviation from avg size)
3. Geographic patterns (location consistency)
For each: conclude FRAUDULENT / SUSPICIOUS / LEGITIMATE
Final assessment = majority conclusion.
If approaches disagree → flag for human review.
AI response
Approach 1 — Velocity: 340 txns/day vs 75 norm (+353%). Burst pattern at 2-4am. → FRAUDULENT
Approach 2 — Amount: Avg $4.50 vs $15 norm (−70%). Micro-transactions consistent with card testing. → FRAUDULENT
Approach 3 — Geographic: All transactions from Singapore IP. Consistent with merchant location. → LEGITIMATE
MAJORITY: 2/3 FRAUDULENT
Recommend: Suspend merchant, escalate to fraud team for full investigation.
Why 3 approaches? Approach 3 alone would have cleared this merchant. The majority vote catches what a single analysis misses.
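The majority-vote step at the end of the prompt is simple enough to sketch in code. This is illustrative; the function name and the tie-breaking rule (send non-majorities to human review, as the prompt instructs) are assumptions based on the slide.

```python
from collections import Counter

def self_consistency(verdicts: list[str]) -> str:
    """Return the strict-majority verdict; flag for human review if none."""
    counts = Counter(verdicts)
    top, top_count = counts.most_common(1)[0]
    if top_count <= len(verdicts) // 2:  # no strict majority → disagree
        return "HUMAN_REVIEW"
    return top

# The three approach verdicts from the merchant example:
print(self_consistency(["FRAUDULENT", "FRAUDULENT", "LEGITIMATE"]))  # FRAUDULENT
```

With three fully disagreeing verdicts (e.g. FRAUDULENT / SUSPICIOUS / LEGITIMATE) the function returns `HUMAN_REVIEW`, matching the "if approaches disagree → flag for human review" rule.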
See the Difference: Loan Decision
❌ Without CoT
Prompt: "Should we approve this PayLater application?"
Yes, I would recommend approving this PayLater application. The customer has a decent income and has been on the platform for a reasonable amount of time. Their payment history is mostly good with only minor issues.
⚠ No reasoning shown, no math, not auditable, "decent" and "reasonable" are not metrics
✅ With Chain-of-Thought
Prompt: "Think through each policy criterion step by step"
Step 1 — Income: $5,200 > $2,000 min ✅
Step 2 — DTI: ($400 loan + $3,000 req) / $5,200 = 65% > 40% max ❌
Step 3 — Late payments: 2 in 8 months ≤ 2 max ✅
Step 4 — History: 8 months > 6 min ✅
Decision: DECLINE. DTI of 65% exceeds 40% policy maximum. Suggest $1,500 counter-offer (DTI = 37%).
✓ Every criterion checked, math shown, auditable, includes counter-offer
Module 3
Role & Persona Prompting
Same data, dramatically different insights — just by changing who the AI "is"
The Persona Formula
You are [TITLE] at [COMPANY TYPE]
with [X years] of experience in [SPECIALTY].
You are known for [CHARACTERISTIC].
When [SITUATION], you always [BEHAVIOR].
Conservative Analyst
"Senior Risk Analyst, 12 years in consumer credit. Known for being cautious. When in doubt, recommends additional verification."
Growth Analyst
"Business Development Analyst, 6 years in merchant acquisition. Known for finding opportunities. Always highlights growth potential alongside risks."
The last two fields matter most: CHARACTERISTIC and BEHAVIOR shape the AI's judgment calls. "Cautious" vs "opportunity-focused" produces completely different recommendations from the same data.
Persona in Action: Same Merchant, Different Eyes
Data: Merchant with 271% volume growth + chargeback rate rising from 0.3% to 4.1%
🛡 Conservative Risk Analyst
RISK RATING: RED — IMMEDIATE ACTION
Primary concern: Chargeback rate at 4.1% is 4x above the 1.0% policy threshold. This trajectory suggests systemic issues, not isolated incidents.
Volume growth concern: 271% growth without proportional infrastructure investment raises fulfillment risk. Rapid scaling often precedes quality collapse.
Recommendation:
• Reduce PayLater limit to minimum immediately
• Issue formal chargeback warning letter
• Place on 30-day enhanced monitoring
• If no improvement → initiate offboarding review
📈 Growth Analyst
Growth signal: 271% volume increase indicates strong product-market fit. This merchant is scaling fast — a potential top-10 partner within 6 months.
Chargeback context: 4.1% rate is elevated but common during rapid scaling. 45% are "item not received" — a logistics issue, not fraud. Solvable with delivery confirmation.
Recommendation:
• Maintain current limits to support growth
• Offer delivery tracking integration (reduces chargebacks 60%)
• Assign dedicated merchant success manager
• Review in 60 days with chargeback target of 1.5%
Multi-Agent Framing
Get 3 perspectives in one prompt — no need to schedule 3 meetings.
Your prompt
PROMPT:
Evaluate this PayLater policy change (raising limit from $500 to $1,000) from 3 perspectives: Risk Analyst, Product Manager, Compliance Officer.
Each: 3 concerns + Recommendation + 1 metric to monitor
Then: Balanced synthesis of all three views.
AI response (synthesis excerpt)
🛡 RISK: "Doubling limits increases exposure by $12M. Default rate likely rises 0.3-0.5pp. Recommend: phased rollout to top-tier customers first."
📊 PRODUCT: "Competitors offer $1,500+. Current $500 limit is #1 reason for churn. Revenue uplift est. $2.4M/quarter."
⚖ COMPLIANCE: "MAS Notice requires affordability assessment above $500. Must add income verification step."
SYNTHESIS: Proceed with phased rollout ($750 first) with income verification. Monitor default rate weekly. Full $1,000 after 90-day review.
Why this works: Forces balanced analysis. No single perspective dominates. The synthesis is where the real insight lives.
Same Data, Different Audiences
Data: "PayLater default rate increased from 2.1% to 2.8% this quarter"
Audience
Persona
Output style
Board
"You are the CFO presenting to the board"
Strategic, 5-minute read
Ops Team
"You are the Ops Manager briefing your team"
Actionable, task-oriented
Regulators
"You are Compliance Head responding to MAS"
Formal, regulation-referenced
Customers
"You are a support specialist"
Simple, empathetic
💡 Practice activity (10 min): Pick the same data point above. Write prompts for 2 different audiences. Compare how the tone, detail level, and recommendations change.
Module 4
Structured Outputs & RAG
JSON extraction, document grounding, and meta-prompting
Why Structure Matters
Unstructured = Conversation
Different every time. Hard to compare. Can't feed into systems. Requires human parsing.
Structured = Form
Consistent format. Comparable across items. Machine-parseable. Scannable by busy stakeholders.
Finance use cases:
Invoice extraction → accounts payable system
Transaction categorization → reconciliation
Complaint classification → route to correct team
KYC document parsing → verification forms
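Feeding model output into a downstream system only works if the JSON it returns is actually parseable and complete. A minimal sketch of that validation step — the field names (`invoice_id`, `amount_sgd`, etc.) are illustrative placeholders, not your schema:

```python
import json

REQUIRED = {"invoice_id", "amount_sgd", "merchant", "category"}

def parse_extraction(raw: str) -> dict:
    """Parse model output as JSON and verify required fields exist."""
    # Models sometimes wrap JSON in prose; keep only the outermost object.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in model output")
    record = json.loads(raw[start : end + 1])
    missing = REQUIRED - record.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return record

raw_output = ('Here is the extraction: {"invoice_id": "INV-104", '
              '"amount_sgd": 842.50, "merchant": "Kopi Corner Pte Ltd", '
              '"category": "F&B"}')
record = parse_extraction(raw_output)
```

A failed parse or a missing field is your signal to tighten the prompt (add an example of the exact JSON shape) rather than to patch the output by hand.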
RAG — Grounding in Documents
CRITICAL RULES:
1. Base your answer ONLY on the provided documents
2. After each claim, cite: [Doc Name, Section]
3. If not in documents: "Not available in
provided documents"
4. Do NOT use outside knowledge
DOCUMENTS:
[Doc 1: PayLater Terms v3.2]
[Doc 2: MAS Notice PSN 06]
QUESTION: What are our obligations if a customer
misses 3 consecutive PayLater payments?
Critical for compliance: Without "ONLY on provided documents," the AI mixes its knowledge with yours — you can't trace which claims come from where.
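The grounding rules above are boilerplate worth reusing verbatim. A minimal sketch that assembles a grounded prompt from named documents — the helper itself is illustrative, but the rules mirror the slide:

```python
GROUNDING_RULES = (
    "CRITICAL RULES:\n"
    "1. Base your answer ONLY on the provided documents\n"
    "2. After each claim, cite: [Doc Name, Section]\n"
    '3. If not in documents: "Not available in provided documents"\n'
    "4. Do NOT use outside knowledge\n"
)

def grounded_prompt(documents: dict[str, str], question: str) -> str:
    """Prepend grounding rules, then named documents, then the question."""
    doc_block = "\n".join(
        f"[Doc: {name}]\n{text}" for name, text in documents.items()
    )
    return f"{GROUNDING_RULES}\nDOCUMENTS:\n{doc_block}\n\nQUESTION: {question}"

prompt = grounded_prompt(
    {"PayLater Terms v3.2": "...", "MAS Notice PSN 06": "..."},
    "What are our obligations if a customer misses 3 consecutive "
    "PayLater payments?",
)
```

Keeping the rules in one constant means every team member's RAG prompt carries the same grounding language, which makes citation behavior consistent and auditable.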
Meta-Prompting
Ask the AI to write prompts for you:
I need a prompt that helps our compliance team
assess whether a new MAS regulatory update
affects our PayLater product.
The prompt should:
- Work with any MAS circular
- Produce a structured impact assessment
- Identify operations that need to change
- Suggest a compliance timeline
Write the best possible prompt for this task.
The AI builds a tool for your team. Use the generated prompt repeatedly — it's a reusable asset.
See the Difference: Policy Q&A
❌ Without RAG Grounding
Prompt: "What happens if a customer misses 3 PayLater payments?"
If a customer misses 3 consecutive PayLater payments, the account is typically suspended and the outstanding balance may be sent to collections. Late fees of around 1-2% per month are usually applied. The customer's credit score will be negatively impacted and they may be barred from future PayLater services.
⚠ Sounds plausible but NONE of this may be in your actual policy. "Typically" and "usually" = hallucination. No citations.
✅ With RAG Grounding
Prompt: "ONLY from provided documents" + policy doc attached
Per the PayLater Terms v3.2:
• After 3 missed payments, the account enters "Default" status [Section 8.3]
• A late fee of $5 or 1.5% (whichever is higher) applies per missed payment [Section 7.1]
• [INSUFFICIENT DATA: The provided document does not specify credit score reporting obligations. Refer to Section 12 or the Credit Bureau Reporting Policy.]
✓ Every claim cites a section. Admits what it doesn't know. No hallucination.
Module 5
Model-Specific Tuning
Choosing the right model and parameters for each task
Model Comparison
Capability
Claude
GPT-4
Gemini
Long documents
200K tokens
128K tokens
1M tokens
Structured output
XML tags
Numbered rules
Schemas
Instruction following
Very precise
Good (may add extras)
Good
Best prompt style
XML + constraints
Numbered rules
Grounding instructions
Finance/compliance
Strong safety
Flexible
Good with grounding
Temperature Guide for Finance
Temperature
Behavior
Use for
0.1
Deterministic, same answer every time
Data extraction, compliance, calculations
0.3
Slightly varied, still focused
Report writing, narratives, summaries
0.7
Creative, diverse outputs
Brainstorming, ideation, drafting
Rule: Keep temperature at 0.3 or below for anything that will be audited or submitted to regulators; reserve 0.7 for brainstorming only.
Module 6 · NEW
Evaluating Your Prompts
How do you KNOW your prompts are working?
Why Evaluate?
Prompts degrade over time — model updates change behavior
"It looks good" is not a metric — you need measurable quality
Compliance requires evidence that AI outputs meet standards
You need to compare version A vs version B objectively
The problem: Most teams deploy prompts based on "it looked good when I tested it once." That's like shipping software without tests.
Manual Evaluation: Rubrics
Criterion
1 (Poor)
3 (OK)
5 (Excellent)
Completeness
Missing 3+ sections
All sections, some thin
All sections thorough
Data grounding
Unsupported claims
Mostly grounded
Every claim cites data
Actionability
No recommendation
Vague recommendation
Specific actions + owners
Consistency
Different each run
Mostly consistent
Identical structure
Process: Run same prompt 5 times → score each → average = quality score
Run on 10 outputs → compare scores between template versions
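The score-and-average step can be sketched in a few lines. The criterion names mirror the rubric table; the structure of `runs` is an assumption for illustration:

```python
CRITERIA = ["completeness", "grounding", "actionability", "consistency"]

def quality_score(runs: list[dict[str, int]]) -> float:
    """Mean of all 1-5 criterion scores across all runs (max 5.0)."""
    total = sum(run[c] for run in runs for c in CRITERIA)
    return total / (len(runs) * len(CRITERIA))

runs = [
    {"completeness": 5, "grounding": 4, "actionability": 3, "consistency": 5},
    {"completeness": 4, "grounding": 4, "actionability": 4, "consistency": 5},
]
print(round(quality_score(runs), 2))  # 4.25
```

A single averaged number per template version is what lets you compare versions objectively instead of eyeballing outputs.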
A/B Testing Prompts
Process
Same input data, two prompt versions
Run both 10 times each
Score each run with your rubric or an LLM-as-judge prompt
Higher average score wins
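The comparison step above reduces to comparing mean scores. A minimal sketch — the scores would come from your rubric or judge pass, and the sample values here are illustrative:

```python
from statistics import mean

def ab_winner(scores_a: list[float], scores_b: list[float]) -> str:
    """Pick the prompt version with the higher average score (ties keep A)."""
    return "A" if mean(scores_a) >= mean(scores_b) else "B"

scores_a = [3.8, 4.0, 3.9, 4.1, 3.7]  # version A: baseline template
scores_b = [4.4, 4.2, 4.5, 4.3, 4.6]  # version B: negative constraints added
print(ab_winner(scores_a, scores_b))  # B
```

With only 10 runs per version, treat small gaps in the averages as noise; act on clear differences, or run more trials.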
When to Re-evaluate
After any model update
When users report quality issues
Monthly for production templates
After any template modification
Module 7 · NEW
From Manual Prompts to Automated Tools
You build the template once. The tools do the rest.
The Reality: Nobody Writes Long Prompts Every Day
You learn the techniques → build the template once → let the tools handle the rest.
Phase
What you do
Tool
1. Learn
Master the techniques (today)
Your brain
2. Build
Create a reusable template with {{variables}}
Kiro / any AI chat
3. Optimize
Let AI rewrite your prompt for better performance
Bedrock Prompt Optimization
4. Store & Share
Save versioned templates with metadata
Bedrock Prompt Management
5. Reuse
Fill in variables and run — no rewriting needed
Bedrock Console / API
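The `{{variables}}` substitution in phase 5 is something Bedrock Prompt Management performs for stored templates; for local files it's a one-function sketch. The template text and variable names below are illustrative:

```python
import re

TEMPLATE = (
    "You are a Senior Risk Analyst. Review merchant {{merchant_id}} "
    "in the {{sector}} sector. Rate the risk GREEN/AMBER/RED and use "
    "ONLY the data provided:\n{{data}}"
)

def fill(template: str, variables: dict[str, str]) -> str:
    """Replace every {{name}} placeholder; fail loudly if one is missing."""
    def lookup(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"missing variable: {name}")
        return variables[name]
    return re.sub(r"\{\{(\w+)\}\}", lookup, template)

prompt = fill(TEMPLATE, {"merchant_id": "MC-8842",
                         "sector": "F&B",
                         "data": "[txn history]"})
```

Raising on a missing variable matters: a silently unfilled `{{data}}` placeholder is exactly the kind of defect that makes a model hallucinate the data instead.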
Bedrock Prompt Management
Your prompt library — stored, versioned, and shared across the team.
Manual (today)
Bedrock Prompt Management
Templates in markdown files
Stored as managed resources
Copy-paste to test
One-click testing across models
No version history
Immutable version snapshots
Manual comparison
Side-by-side model comparison
Share via email/Slack
Shared across team via API
No additional charge — you only pay for model tokens during testing.
Prompt Management: Key Features
Prompt Templates with {{variables}} — same syntax from the exercises. Define variables with descriptions and defaults.
Version Management — every change creates an immutable snapshot. Roll back anytime.
Multi-Model Testing — test across Claude, Nova, Llama side-by-side. Compare quality, latency, cost.
Up to 3 Prompt Variants — compare different versions of the same prompt to find the best performer.
Think of it as: Google Docs for prompts — versioned, shared, and always accessible. But with built-in testing across multiple AI models.
Prompt Optimization (Instructor Demo)
You write a basic prompt. Bedrock rewrites it for better performance — automatically.
Your prompt
"Assess this merchant's risk level"
6 words. No structure, no role, no constraints.
Bedrock's optimized version
"You are a Senior Risk Analyst
specializing in SEA digital payments.
Produce a risk assessment:
1. Rating (GREEN/AMBER/RED)
2. Transaction Pattern Analysis
3. Chargeback Assessment
4. Recommended Actions
Base analysis ONLY on provided data."
Persona + structure + grounding — applied automatically
How Prompt Optimization Works
Step 1: Submit your prompt (even a short, rough one)
Step 2: Bedrock analyzes the prompt components
Step 3: It rewrites with best practices — structure, constraints, model-specific formatting
Step 4: Compare original vs optimized output side-by-side
Step 5: Save the optimized version to your Prompt Management library
GA — April 2025. Supports Claude, Amazon Nova, Meta Llama, DeepSeek, Mistral. The techniques you learned today help you evaluate whether the optimized prompt is actually good.
The Bottom Line
Your concern
The solution
"I don't want to write long prompts every time"
Build the template once → reuse with {{variables}}
"I'm not sure my prompt is good enough"
Prompt Optimization rewrites it automatically
"My team needs to share and version prompts"
Prompt Management stores everything centrally
"Which model gives the best result?"
Multi-model testing compares side-by-side
For developers: Intelligent Prompt Routing auto-selects cheaper models for simple tasks (up to 30% cost savings). Prompt Flows chains prompts into automated workflows. These are covered in Day 3.
Deliverable: Reusable template for APPROVE/CONDITIONS/DECLINE credit narratives
Open the workshop site → Prompt Engineering Exercises
Wrap-up
Best Practices & Prompt Optimization
Common mistakes, optimization strategies, and recovery patterns
7 Prompt Mistakes Everyone Makes
#
Mistake
Why it hurts
Quick fix
1
The Kitchen Sink
Cramming 5 tasks into 1 prompt
One task per prompt, chain results
2
The Blank Canvas
No examples = AI guesses your format
Show 1-2 examples of desired output
3
The Trust Fall
No grounding = confident hallucinations
"ONLY from provided data"
4
The Vague Ask
"Analyze this" — analyze what, how, for whom?
Specify audience, format, length
5
The One-Shot Wonder
Expecting perfection on first try
Plan for 2-3 refinement turns
6
The Copy-Paste Trap
Using the same prompt for different models
Tune syntax per model family
7
The Set-and-Forget
Never re-testing after model updates
Monthly prompt health checks
The Draft-Score-Revise Loop
Don't accept the first output. Build a self-improving cycle into your prompt:
Step 1 — DRAFT: Write a merchant risk summary
using the data provided.
Step 2 — SCORE: Rate your draft on these criteria:
- Completeness (0-5): All required sections?
- Grounding (0-5): Every claim cites data?
- Actionability (0-5): Specific next steps?
Step 3 — REVISE: If total < 12, rewrite to fix
the lowest-scoring area. Max 2 revisions.
Output only the final version.
Result: The AI self-corrects before you even read it. Teams using this pattern report 40-60% fewer revision cycles.
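The loop structure above can be sketched in code. `llm` is a placeholder for your model call; the threshold and revision cap match the prompt, and the stub responses are illustrative:

```python
def draft_score_revise(llm, task: str, threshold: int = 12,
                       max_revisions: int = 2) -> str:
    """Draft, self-score, and revise until the score clears the threshold."""
    draft = llm(f"DRAFT: {task}")
    for _ in range(max_revisions):
        score = int(llm("SCORE 0-15 on completeness, grounding, "
                        f"actionability:\n{draft}"))
        if score >= threshold:
            break
        draft = llm(f"REVISE to fix the lowest-scoring area:\n{draft}")
    return draft

# Stub model for illustration: first draft scores 10, revision scores 13.
responses = iter(["draft v1", "10", "draft v2", "13"])
result = draft_score_revise(lambda p: next(responses),
                            "merchant risk summary")
print(result)  # draft v2
```

Capping revisions at 2 stops the loop from burning tokens when the model can't improve further; below-threshold finals are your cue to fix the prompt itself.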
Break Big Tasks into Small Steps
Complex tasks fail when you ask for everything at once. Decompose instead:
❌ One Giant Prompt
"Analyze our Q2 transactions,
identify fraud patterns, calculate
loss exposure, compare to Q1,
draft a board summary, and
recommend 3 prevention measures."
6 tasks = shallow work on each
✅ Chained Prompts
Prompt 1: "Analyze Q2 transactions
and flag anomalies"
Prompt 2: "From these anomalies,
identify the top 3 fraud patterns"
Prompt 3: "Calculate loss exposure
for each pattern"
Prompt 4: "Draft a board summary
with prevention measures"
Each step gets full attention
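The chained version can be expressed as a simple pipeline where each step's output becomes the next step's input. `llm` is a placeholder for your model call; the stub below just makes the flow visible:

```python
STEPS = [
    "Analyze Q2 transactions and flag anomalies:\n{prev}",
    "From these anomalies, identify the top 3 fraud patterns:\n{prev}",
    "Calculate loss exposure for each pattern:\n{prev}",
    "Draft a board summary with prevention measures:\n{prev}",
]

def run_chain(llm, initial_data: str) -> str:
    """Feed each step's output into the next step's prompt."""
    result = initial_data
    for step in STEPS:
        result = llm(step.format(prev=result))
    return result

# Stub model: records each prompt and echoes the step count.
calls = []
final = run_chain(lambda p: (calls.append(p) or f"output of step {len(calls)}"),
                  "[Q2 data]")
print(final)  # output of step 4
```

In a real pipeline you would review each intermediate result before passing it on — that checkpoint is exactly what the one-giant-prompt approach gives up.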
Tell the AI What NOT to Do
Positive instructions tell the AI what to include. Negative constraints prevent common failure modes:
Problem
Negative constraint to add
AI adds unsolicited opinions
"Do not include personal opinions or speculation"
AI uses data not in your input
"Do not reference any data outside the provided documents"
AI writes too much
"Do not exceed 300 words. Do not add a conclusion section"
AI hedges everything
"Do not use phrases like 'it depends' or 'generally speaking'"
AI explains obvious things
"Do not explain what PayLater is or how digital wallets work"
AI invents numbers
"If a metric is not in the data, write [DATA NOT AVAILABLE]"
Pro tip: After your first test run, note what went wrong and add a "Do NOT" line for each issue. Your prompt improves with every iteration.
Structure Your Prompts Like Documents
Well-organized prompts produce well-organized outputs. Use clear sections and delimiters:
### ROLE
You are a Senior Payment Operations Analyst.
### CONTEXT
<<<
[Paste your transaction data or document here]
>>>
### TASK
Analyze the data for anomalies in Thailand and Vietnam.
### OUTPUT FORMAT
- Executive summary (3 sentences)
- Anomaly table: Market | Type | Severity | Evidence
- Recommended actions (numbered, with owner)
### CONSTRAINTS
- Use ONLY the data provided above
- All amounts in SGD
- Do not exceed 400 words
Why delimiters matter: Without clear separation, the AI may confuse your instructions with your data — especially dangerous when pasting policy documents.
Show, Don't Tell: The Power of Examples
One good example is worth 100 words of instruction:
❌ Telling
"Categorize each transaction as
high risk, medium risk, or low
risk based on amount, frequency,
and merchant type. Format as a
table with columns for transaction
ID, category, and reasoning."
50 words of instruction, AI still guesses your format
✅ Showing
"Categorize transactions like this:
| ID | Risk | Reason |
| T001 | HIGH | $12K single txn,
new merchant, no history |
| T002 | LOW | $45 recurring,
12-month pattern |
Now categorize these: [data]"
One example = perfect format every time
The 3-Round Prompt Improvement Workflow
Every production-quality prompt goes through this cycle:
Round
What you do
What improves
Round 1: Baseline
Write your first prompt using the 4 pillars. Run it 3 times.
You see what the AI gets right and wrong
Round 2: Fix failures
Add negative constraints for each failure. Add an example of good output. Run 3 more times.
Consistency jumps from ~60% to ~85%
Round 3: Polish
Add self-review step. Tighten length/format. Test with edge cases.
Production-ready at ~95% consistency
Total time: 15-20 minutes to go from first draft to production template. That template then saves hours every week.
Build a Team Prompt Library
Your best prompts are team assets, not personal notes. Treat them like shared templates:
What to include
Prompt name and purpose
The full prompt with {{variables}}
Which model and temperature to use
1-2 example outputs (good vs bad)
Known limitations and edge cases
Last tested date and model version
Starter library for finance
Merchant risk assessment
Transaction anomaly detection
Customer complaint classification
Policy document Q&A (RAG)
Board summary generator
Regulatory impact assessment
KYC document extraction
Fraud investigation narrative
Start today: The template you built in the exercise is your first library entry. Share it with your team this week.
Why AI "Gets Dumber" Mid-Conversation
It's not a bug — it's a context window problem. Every AI has a limited "working memory."
What happens inside
Every message you send + every AI response stays in the context window
At 60-70% capacity, performance drops sharply — not gradually, but in sudden cliffs
The AI compresses and deprioritizes earlier messages to make room
"Lost in the Middle" effect: AI remembers the start and end of conversations best, but forgets what's in the middle
What you experience
AI contradicts instructions you gave 10 messages ago
AI re-introduces ideas you already rejected
AI ignores constraints from the start of the chat
Outputs get vague, generic, or repetitive
AI starts "hallucinating" more frequently
Key insight: Most people blame the AI for "getting stupid." The real problem is the conversation got too long. The fix is context management, not a better model.
5 Rules for Managing Long Conversations
#
Rule
Why it works
1
One task per session — don't mix debugging, writing, and analysis in one chat
Each session gets full attention capacity
2
Paste only what's relevant — don't dump entire documents when you need one section
Reduces noise, keeps AI focused on what matters
3
Put key instructions at the start AND end — not buried in the middle
Exploits primacy + recency bias
4
Keep sessions under 15-20 turns — start fresh after that
Stays within the performance sweet spot
5
Use "session summaries" to carry state — ask AI to summarize, then paste into new chat
Fresh context window with all the knowledge
The Session Summary Technique
When a conversation gets too long but you can't lose the state:
Step 1: Ask for a summary
PROMPT (in the old session):
Summarize our conversation so far:
• Key decisions we made
• Data and findings so far
• What we still need to do next
Format as a briefing I can paste into a new session.
Step 2: Start fresh with context
PROMPT (in the new session):
Here is the context from our previous session:
[PASTE SUMMARY HERE]
Continue from where we left off. The next step is to draft the risk committee report based on the findings above.
✓ Fresh context window + all accumulated knowledge = best of both worlds
Think of it as "saving your game." You compress hours of conversation into a focused briefing, then load it into a fresh session with full attention capacity.
The Conversation Funnel
Start broad, then narrow. Each turn builds on context — but keep it focused.
The pattern
Turn 1 (Explore):
"Analyze this month's transaction data — identify top 3 trends"
Turn 2 (Deep-dive):
"Expand on trend #2 — the PayLater chargeback increase"
Turn 3 (Produce):
"Draft a 1-page summary for the risk committee"
Turn 4 (Polish):
"Make the tone more formal and add data citations"
Why it works
Each turn is focused on one thing
You review and correct at each step
Errors don't compound — you catch them early
4 focused turns > 1 massive prompt
When to reset: If Turn 3 goes wrong, don't keep correcting. Start a new session with: "Here's the data and the trend analysis. Draft a risk committee summary."
When to Start Fresh vs. Continue
🟢 Start a New Session
Switching to a completely different task
Conversation has gone off track
Testing a refined prompt cleanly
Session is longer than 15-20 turns
AI keeps repeating the same mistake
AI contradicts earlier instructions
🔵 Continue the Session
Iterating on the same output
Need AI to remember earlier context
Building step by step (funnel pattern)
Refining format or tone
Follow-up questions on same topic
Session is still under 15 turns
The 3-strike rule: If you've corrected the AI 3 times and it's still wrong — the context is working against you. Start fresh. It's faster than fighting a polluted conversation.
Circuit Breaker Patterns
Pattern · Symptom · Fix
Repetition Loop · Same wrong output after correction · New session, rephrase
Hallucination Spiral · Inventing data · "Use ONLY provided data"
Over-Eager Helper · 2,000 words for 5 bullets · "Exactly 5 bullets, under 20 words"
Format Drift · Format changes mid-output · "Continue EXACTLY same format"
Confidence Trap · Uncertain info as fact · "Prefix uncertain with [UNCERTAIN]"
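One circuit breaker made concrete: a Repetition Loop can be spotted by measuring how similar the corrected output is to the previous one. This sketch uses only Python's standard library; the 0.9 similarity threshold is an assumption you would tune.

```python
# Detect the Repetition Loop pattern: if the model returns essentially
# the same output after a correction, stop correcting and start fresh.
from difflib import SequenceMatcher

def is_repetition(prev: str, curr: str, threshold: float = 0.9) -> bool:
    """True if the new output is nearly identical to the previous one."""
    return SequenceMatcher(None, prev, curr).ratio() >= threshold

before = "Risk rating: GREEN. No anomalies found in merchant data."
after = "Risk rating: GREEN. No anomalies found in merchant data."

if is_repetition(before, after):
    print("Same output after correction -> new session, rephrase")
```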
Using Kiro for Business Users
Vibe mode: Describe what you want → Kiro writes and runs the code
File context: Drag CSVs, PDFs, JSON into chat
Iterative refinement: "Make the chart bigger" / "Add a percentage column"
New session per task: Keep context focused
Remember: You don't need to understand the code Kiro writes. You just need to describe what you want clearly — using the 4 pillars from Module 1.
Quick Reference Card
Technique · Trigger Phrase
Zero-Shot CoT · "Think step by step before answering"
Expert Persona · "You are a Senior [ROLE] with X years in [SPECIALTY]"
Multi-Perspective · "Present the case FOR and AGAINST"
Structured Output · "Use EXACTLY these sections: 1... 2... 3..."
RAG Grounding · "Base your answer ONLY on the provided documents"
Self-Critique · "Review: Is every claim supported by data?"
Meta-Prompting · "Write the best prompt for [TASK]"
LLM-as-Judge · "Score this output against these criteria"
Negative Constraints · "Do NOT include / Do NOT use / Do NOT exceed"
Task Decomposition · Break 1 big prompt into 3-4 focused prompts
Draft-Score-Revise · "Draft, then score on [rubric], then revise if < threshold"
Show Don't Tell · Include 1-2 examples of desired output format
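The Draft-Score-Revise entry on the card can be sketched as a short loop. `draft` and `score` below are stand-ins for real model calls; here `score` just checks one toy rubric item (bullet count) so the loop runs end to end.

```python
# Sketch of the Draft-Score-Revise pattern: draft, score against a
# rubric, and revise once if the score falls below a threshold.

def draft(prompt: str, feedback: str = "") -> str:
    # Placeholder: a real call would go to your model of choice.
    # Simulates an improved second draft when feedback is given.
    bullets = 5 if feedback else 3
    return "\n".join(f"- point {i}" for i in range(1, bullets + 1))

def score(output: str) -> int:
    # Toy rubric: full marks only if the output has exactly 5 bullets.
    return 10 if output.count("- ") == 5 else 4

THRESHOLD = 8
out = draft("Summarize in exactly 5 bullets")
if score(out) < THRESHOLD:
    out = draft(
        "Summarize in exactly 5 bullets",
        feedback="Previous draft failed the rubric; revise.",
    )

print(score(out))  # 10
```

In practice the same model can score its own draft (LLM-as-Judge), but keeping draft and score as separate prompts makes each step easier to review.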
Preview
From Prompts to Workflow Automation
Everything you learned today becomes the foundation for autonomous AI agents
Your Prompt Skills = Agent Design Skills
Every technique you learned today maps directly to how AI agents are built:
Day 2: Prompt Technique · Day 3: Agent Component · What it does in an agent
Persona prompting · Agent role definition · Defines who the agent "is" and how it behaves
Structured output · Output contracts · Ensures consistent, usable results
Chain-of-Thought · Reasoning strategy · Agent thinks step-by-step before acting
RAG grounding · Knowledge base · Agent accesses your company's documents
Negative constraints · Guardrails · Prevents the agent from doing things it shouldn't
Prompt template · SKILL.md file · The template becomes a reusable, shareable skill
Key insight: You don't need to code to design an AI agent. You need to write great instructions — which is exactly what you practiced today.
Preview: Templates → Skills → Automation
Tomorrow you'll turn your prompt templates into automated workflows:
Today: Prompt template
You are a Senior Risk Analyst...
Analyze merchant data and produce:
1. Risk Rating (GREEN/AMBER/RED)
2. Transaction Analysis
3. Recommended Actions
✓ Auto-activates, shared, versioned ✓ Works in Kiro AND Claude Cowork
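As a purely hypothetical illustration (the field names are assumptions, not Kiro's actual skill schema), the template above plus a few lines of frontmatter could become a skill file along these lines:

```markdown
---
name: merchant-risk-analysis
description: Rates merchant risk from transaction data
trigger: "analyze merchant risk"
version: 1.0
---
You are a Senior Risk Analyst...
Analyze merchant data and produce:
1. Risk Rating (GREEN/AMBER/RED)
2. Transaction Analysis
3. Recommended Actions
```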
Day 3 covers: Workflow patterns (chaining, parallelization, routing, orchestration), the Kiro stack (steering + skills + hooks), and you'll design an agent for your team's workflow.
Quick preview only — don't go deep. Show the before/after to build excitement for tomorrow. The key message: the template they built today becomes a portable skill file with 4 lines of frontmatter. Day 3 covers the full stack and they'll design a real agent. The callout lists what's coming tomorrow.
The 3-Day Journey
📚
Day 1
"What can AI do?"
Fundamentals, use cases, responsible AI
💬
Day 2 (Today)
"How do I talk to AI?"
Prompt engineering, templates, tools
🤖
Day 3 (Tomorrow)
"How do I make AI work on its own?"
Agentic AI, workflow automation, no code
💡 Homework: What repetitive task does your team do every week that could be automated? Come to Day 3 with a specific workflow — you'll design an AI agent for it.
Day 2 Outcomes
Design prompts using the 4 pillars (Clarity, Context, Role, Output)
Apply Chain-of-Thought and Self-Consistency for financial reasoning
Create expert personas for different audiences
Extract structured data and ground responses in documents
Evaluate prompt quality with rubrics and LLM-as-Judge
Use Bedrock tools to optimize and manage prompts at scale
Manage long conversations and know when to start fresh
Build reusable prompt templates — the foundation for AI agents
Identify a workflow from your team to automate on Day 3
Thank You
Tomorrow: Make AI Work On Its Own
Agentic AI · Workflow Automation · Agent Design · No Coding Required
💡 Homework: Come with a workflow your team does every week that could be automated
AnyCompany Financial Group · Generative & Agentic AI on AWS