📚 What We Covered Today
| Module | Topic | Key Takeaway |
| --- | --- | --- |
| M1 | Prompt Fundamentals Deep Dive | The 4 Pillars – Clarity, Context, Role Assignment, Output Framing |
| M2 | Chain-of-Thought Reasoning | Zero-Shot CoT, Few-Shot CoT, Step-Back Prompting, Self-Consistency |
| M3 | Role & Persona Prompting | Expert personas, multi-agent framing, audience-specific outputs |
| M4 | Structured Outputs & RAG | JSON extraction, RAG grounding ("ONLY on provided documents"), meta-prompting |
| M5 | Model-Specific Tuning | Temperature guide, model comparison (Claude vs GPT vs Gemini), prompt styles per model |
| M6 | Evaluating Your Prompts | Rubric scoring, LLM-as-Judge, A/B testing prompts |
| M7 | Manual Prompts → Automated Tools | Bedrock Prompt Management, Prompt Optimization, reusable templates with {{variables}} |
| – | Continuous AI Interactions | Conversation funnel, circuit breaker patterns, when to start fresh vs continue |
🧪 Hands-On Labs Completed
| Lab | What You Built |
| --- | --- |
| Lab 1: Invoice Processing | PDF extraction pipeline – generated realistic invoices, extracted structured data with PyMuPDF, validated against purchase orders, flagged discrepancies, produced professional HTML report |
| Lab 2: Transaction Dashboard | Web-based monitoring dashboard – transaction volumes, reconciliation status, anomaly alerts for AnyCompany Pay |
| Lab 3: Finance Presentation | Monthly Finance Review presentation from CSV data – data-driven slides with charts and corporate branding |
| Lab 6: Compliance Report | Automated compliance report generator – scanned transactions against MAS (Singapore), BNM (Malaysia), and OJK (Indonesia) regulatory rules |
Prompt Exercise: Merchant Risk Assessment Narrative – iterative 8-step prompt refinement from basic prompt to production-grade template with {{variables}} and LLM-as-Judge scoring.
⚡ Key Techniques Reference
Zero-Shot CoT
"Think step by step before answering"
Expert Persona
"You are a Senior [ROLE] with X years in [SPECIALTY]"
Multi-Perspective
"Present the case FOR and AGAINST"
Structured Output
"Use EXACTLY these sections: 1... 2... 3..."
RAG Grounding
"Base your answer ONLY on the provided documents"
Self-Critique
"Review: Is every claim supported by data?"
Meta-Prompting
"Write the best prompt for [TASK]"
Length Control
"Under [N] words" / "Exactly [N] bullets"
Self-Consistency
"Solve 3 ways, report majority answer"
LLM-as-Judge
"Score this output against these criteria: [rubric]"
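The techniques above become most useful when combined into one reusable template with {{variables}}. A minimal sketch of the substitution step (template text, function, and variable names are illustrative, not from the course materials):

```python
# Reusable prompt template combining several techniques from the reference
# above: expert persona, RAG grounding, Zero-Shot CoT, structured output,
# and length control. Placeholder names are hypothetical examples.
TEMPLATE = """You are a Senior {{role}} with {{years}} years in {{specialty}}.
Base your answer ONLY on the provided documents.
Think step by step before answering.
Use EXACTLY these sections: 1. Summary 2. Risks 3. Recommendation
Keep the response under {{max_words}} words."""

def fill_template(template: str, variables: dict) -> str:
    """Substitute each {{name}} placeholder with its value."""
    for name, value in variables.items():
        template = template.replace("{{" + name + "}}", str(value))
    return template

prompt = fill_template(TEMPLATE, {
    "role": "Risk Analyst",
    "years": 10,
    "specialty": "merchant onboarding",
    "max_words": 300,
})
print(prompt)
```

The same template then serves every merchant assessment; only the variable values change per run.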
🎯 Interactive Explainers (Self-Paced Reference)
🔮 Looking Ahead
Day 3: Agentic AI & Workflow Automation
- AI Agents vs Agentic AI – definitions, differences, when to use which
- The Agentic Loop – Observe → Plan → Act → Reflect (automated self-critique)
- 4 Workflow Patterns – Chaining, Parallelization, Routing, Orchestration
- The Kiro Automation Stack – Steering + Skills + Hooks + MCP
- MCP Lab – Connect Kiro to a merchant database, query in plain English
- Agent Design Canvas – Design an automated workflow for your team (with LLM-as-Judge scoring & leaderboard)
- Deliverable – A workflow design you can hand to your tech team for implementation
📝 Homework: Look at the prompt template you built today. Think about: what if this ran automatically every time a new merchant application arrived? That's what we'll design tomorrow.
💬 Participant Q&A – Day 2 Discussions
Questions raised during Day 2, answered with context and practical guidance.
Q12 For Bedrock Guardrails, why don't we just create regex filtering for words we want to block?
Regex works for exact keyword blocking but fails at semantic understanding. Consider these two sentences:
- "I want to harm myself" → should be blocked (self-harm intent)
- "The product could harm our reputation" → should NOT be blocked (business discussion)
A regex filter on "harm" blocks both. Bedrock Guardrails uses ML classifiers that understand intent and context, not just pattern matching.
| Approach | Strengths | Weaknesses |
| --- | --- | --- |
| Regex filtering | Fast, predictable, zero false negatives for exact matches | No context awareness, high false positives, easily bypassed with synonyms/misspellings |
| Bedrock Guardrails | Understands intent, handles paraphrasing, configurable thresholds, auditable | Slight latency, occasional false positives on edge cases |
Best practice: Use both. Regex for known exact patterns (account numbers, profanity word lists, PII formats like NRIC). Bedrock Guardrails for nuanced content filtering where context determines whether something is harmful. They complement each other.
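The false-positive and bypass problems are easy to see in code. A minimal sketch (the blocklist pattern is illustrative):

```python
# A naive keyword filter on "harm" blocks both example sentences from the
# discussion above, and a synonym ("hurt") bypasses it entirely.
import re

BLOCKLIST = re.compile(r"\bharm\b", re.IGNORECASE)

def regex_blocked(text: str) -> bool:
    """Return True if the text matches the keyword blocklist."""
    return BLOCKLIST.search(text) is not None

print(regex_blocked("I want to harm myself"))                  # True  (correct block)
print(regex_blocked("The product could harm our reputation"))  # True  (false positive)
print(regex_blocked("I want to hurt myself"))                  # False (synonym bypass)
```

This is why regex belongs on exact, well-defined patterns (PII formats, account numbers) while semantic filtering handles intent.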
Q13 Why not just rely on the built-in guardrails from the LLM/FM itself?
Every foundation model ships with built-in safety training (RLHF). So why add another layer? Three reasons:
- Customization: Built-in safety is a black box – you can't adjust thresholds, add company-specific denied topics, or configure PII categories. Bedrock Guardrails lets you define exactly what's allowed for YOUR organization.
- Consistency across models: Different models have different built-in filters. Switching from Claude to Nova changes your safety behavior unpredictably. Bedrock Guardrails applies the same rules regardless of which model you use.
- Auditability: Built-in safety doesn't log why something was blocked. Bedrock Guardrails logs every intervention with reasons – which you need for MAS/BNM regulatory reporting and internal compliance audits.
Analogy: Built-in guardrails are the seatbelt (always on, generic, one-size-fits-all). Bedrock Guardrails are the airbag system (configurable deployment thresholds, auditable activation records, organization-specific calibration).
Q14 I'm a business user โ how do I prompt AI to create Python or JSON for optimization?
You don't need to know Python or JSON syntax. Describe what you want in plain English and let the AI write the technical implementation. The pattern:
- Describe the goal: "I need a script that reads invoices and flags discrepancies"
- Describe the inputs: "The invoices are PDF files in this folder, POs are in a CSV"
- Describe the output: "Generate an HTML report with a summary dashboard showing approved/flagged/escalated"
- Describe the rules: "Flag if variance exceeds 2%, escalate if amount exceeds $25,000"
That's exactly what you did in today's labs – you described business requirements in English, Kiro wrote and ran the Python code. Your job is knowing WHAT to build, not HOW to code it.
Pro tip: The more specific your business rules, the better the code. "Flag discrepancies" is vague. "Flag if invoice total differs from PO approved amount by more than 2%" gives the AI exactly what it needs to write correct logic.
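To illustrate the "specific rules produce correct logic" point, here is a sketch of what an AI assistant might generate from the plain-English rules above; the function and field names are illustrative, not the lab's actual code:

```python
# Generated from: "Flag if variance exceeds 2%, escalate if amount
# exceeds $25,000" - the specific thresholds translate directly to code.
def review_invoice(invoice_total: float, po_amount: float) -> str:
    """Return 'approved', 'flagged', or 'escalated' per the business rules."""
    if invoice_total > 25_000:          # escalation rule checked first
        return "escalated"
    variance = abs(invoice_total - po_amount) / po_amount
    if variance > 0.02:                 # more than 2% variance from the PO
        return "flagged"
    return "approved"

print(review_invoice(10_200, 10_000))   # exactly 2% -> approved
print(review_invoice(10_250, 10_000))   # 2.5% variance -> flagged
print(review_invoice(30_000, 29_900))   # over $25,000 -> escalated
```

Notice that "flag discrepancies" alone could not have produced the `0.02` or `25_000` constants; the specificity of the prompt is what makes the generated logic correct.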
Q15 How can we apply steering documents as standard organization rules, like a global policy?
In Kiro, steering files in .kiro/steering/ are loaded into every AI conversation. Here's how to implement organization-wide standards:
- Always-on steering: Create a file like company-standards.md with your team's rules (formatting standards, compliance requirements, approved terminology). This loads automatically in every session.
- Share across repos: Include the .kiro/ folder in your team's shared repositories. Everyone gets the same AI behavior.
- Conditional steering: Use inclusion: fileMatch for context-specific rules – e.g., compliance rules only activate when working on regulatory files.
- Manual steering: Use inclusion: manual for heavy reference docs that only load when explicitly referenced with #.
Think of it as: Steering = system prompt that every team member's AI follows automatically. It's the equivalent of a "global policy" that shapes all AI outputs without anyone needing to remember to include it in their prompts.
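A minimal sketch of a conditional steering file, assuming the frontmatter keys work as described above (the filename, match pattern, and rule text are illustrative examples, and the `fileMatchPattern` key should be checked against your Kiro version's steering documentation):

```markdown
---
inclusion: fileMatch
fileMatchPattern: "compliance/**/*"
---

# Compliance Writing Rules

- Cite the specific MAS/BNM/OJK rule for every regulatory claim.
- Never include real customer PII in examples; use placeholder values.
- All thresholds must reference the approved policy document.
```

Saved under .kiro/steering/, these rules would load only when someone works on files matching the pattern, keeping every other conversation lightweight.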
Q16 Why are we separating JSON files to become variables of our Python code?
Separation of concerns โ a fundamental principle that applies to AI workflows too:
| File | Contains | Who changes it |
| --- | --- | --- |
| validation_rules.json | Business rules (thresholds, required fields, limits) | Business users / policy owners |
| invoice_validator.py | Logic (how to apply those rules) | Developers / Kiro |
Benefits:
- Business users can update rules (change variance from 2% to 5%) without touching code
- Different environments can use different rules (stricter for production, relaxed for testing)
- Rules are auditable โ you can track who changed what threshold and when
- Same code works across teams with different policies – just swap the JSON file
Real-world example: Your Singapore team uses rules-sg.json (MAS thresholds), Malaysia team uses rules-my.json (BNM thresholds). Same Python code, different business rules per market.
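The separation looks like this in practice. A minimal sketch (the JSON keys and function names are illustrative, not the exact lab files; the rules are inlined here so the example is self-contained):

```python
# Business rules live in JSON; the Python logic only loads and applies them.
import json

rules_json = '{"variance_threshold": 0.02, "escalation_amount": 25000}'
rules = json.loads(rules_json)  # in the lab: json.load(open("validation_rules.json"))

def validate(invoice_total: float, po_amount: float, rules: dict) -> str:
    """Apply externally-defined thresholds; the code never hard-codes them."""
    if invoice_total > rules["escalation_amount"]:
        return "escalated"
    if abs(invoice_total - po_amount) / po_amount > rules["variance_threshold"]:
        return "flagged"
    return "approved"

# Changing policy means editing the JSON, not the code:
relaxed = dict(rules, variance_threshold=0.05)
print(validate(10_300, 10_000, rules))    # 3% variance, 2% threshold -> flagged
print(validate(10_300, 10_000, relaxed))  # same invoice, 5% threshold -> approved
```

The same pattern covers the per-market case: load rules-sg.json or rules-my.json at startup and the identical code enforces each market's thresholds.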
Q17 If I do self-reflection (AI assessing its own response), what's the impact on tokens and context?
Self-reflection roughly doubles your token usage per interaction:
| Step | Tokens (approx) | What happens |
| --- | --- | --- |
| Draft generation | ~N tokens | AI produces initial output |
| Self-evaluation | ~N tokens | AI reviews its own output against criteria |
| Revision (if needed) | ~N tokens | AI produces improved version |
| Total | ~2-3x N | For a 500-token output, expect ~1,000-1,500 total |
When to use self-reflection:
- ✅ High-stakes outputs – compliance reports, risk assessments, customer-facing content
- ✅ Complex multi-step reasoning – where errors compound
- ❌ Simple extraction or classification – speed matters more than perfection
- ❌ Internal drafts – where a human will review anyway
Production tip: Implement self-reflection as a two-step chain (generate → evaluate) rather than a single prompt. This gives you control over when reflection runs – skip it for simple tasks, enable it for high-value outputs.
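The two-step chain can be sketched as follows. `call_model` is a stand-in for your real model invocation (e.g. a Bedrock client call); it is stubbed here so the control flow is runnable, and the prompt wording is illustrative:

```python
# Self-reflection as an explicit two-step chain: generate, then evaluate,
# then revise only when the critique finds problems. The high_stakes flag
# lets simple tasks skip reflection and save the extra ~1-2x tokens.
def call_model(prompt: str) -> str:
    return "stubbed model response"  # replace with a real API call

def generate_with_reflection(task: str, high_stakes: bool) -> str:
    draft = call_model(f"Task: {task}\nProduce the output.")
    if not high_stakes:
        return draft  # no reflection: roughly half the token cost

    critique = call_model(
        f"Review this output against the task.\n"
        f"Task: {task}\nOutput: {draft}\n"
        f"Is every claim supported by data? List problems, or reply PASS."
    )
    if "PASS" in critique:
        return draft  # draft survived its own review

    return call_model(
        f"Task: {task}\nDraft: {draft}\nProblems found: {critique}\n"
        f"Produce a revised output that fixes every problem."
    )
```

Because the branch lives in your code rather than inside one long prompt, you can log each stage separately and measure exactly how many tokens reflection adds per call.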
Q18 Why create Python scripts in exercises instead of just asking Kiro directly?
Three reasons why we generate code rather than asking the LLM to do the work directly:
- Deterministic accuracy: Python computes (36900 - 36415) / 36415 = 1.33% correctly every single time. An LLM might hallucinate the math. For financial validation where a penny matters, code is non-negotiable.
- Reusability at zero cost: The script runs 1,000 times for free. Asking the LLM 1,000 times costs tokens (~$0.50-2.00 per batch). Write once, run forever.
- Auditability: Code produces the same output given the same input – which compliance requires. An LLM might give slightly different answers each time (temperature, model updates).
The pattern: Use AI to write the tool, then use the tool to do the work. AI is the developer; code is the worker. This is the most cost-effective and reliable way to automate finance workflows โ and it's exactly what Citizen Developers do with Kiro.