📨 Day 2 Summary

Prompt Engineering Workshop: thank you for attending!

| Module | Topic | Key Takeaway |
| --- | --- | --- |
| M1 | Prompt Fundamentals Deep Dive | The 4 Pillars: Clarity, Context, Role Assignment, Output Framing |
| M2 | Chain-of-Thought Reasoning | Zero-Shot CoT, Few-Shot CoT, Step-Back Prompting, Self-Consistency |
| M3 | Role & Persona Prompting | Expert personas, multi-agent framing, audience-specific outputs |
| M4 | Structured Outputs & RAG | JSON extraction, RAG grounding ("ONLY on provided documents"), meta-prompting |
| M5 | Model-Specific Tuning | Temperature guide, model comparison (Claude vs GPT vs Gemini), prompt styles per model |
| M6 | Evaluating Your Prompts | Rubric scoring, LLM-as-Judge, A/B testing prompts |
| M7 | Manual Prompts → Automated Tools | Bedrock Prompt Management, Prompt Optimization, reusable templates with {{variables}} |
| – | Continuous AI Interactions | Conversation funnel, circuit breaker patterns, when to start fresh vs continue |

| Lab | What You Built |
| --- | --- |
| Lab 1: Invoice Processing | PDF extraction pipeline: generated realistic invoices, extracted structured data with PyMuPDF, validated against purchase orders, flagged discrepancies, produced a professional HTML report |
| Lab 2: Transaction Dashboard | Web-based monitoring dashboard: transaction volumes, reconciliation status, anomaly alerts for AnyCompany Pay |
| Lab 3: Finance Presentation | Monthly Finance Review presentation from CSV data: data-driven slides with charts and corporate branding |
| Lab 6: Compliance Report | Automated compliance report generator: scanned transactions against MAS (Singapore), BNM (Malaysia), and OJK (Indonesia) regulatory rules |

Prompt Exercise: Merchant Risk Assessment Narrative, an iterative 8-step refinement from a basic prompt to a production-grade template with {{variables}} and LLM-as-Judge scoring.

| Pattern | Prompt snippet |
| --- | --- |
| Zero-Shot CoT | "Think step by step before answering" |
| Expert Persona | "You are a Senior [ROLE] with X years in [SPECIALTY]" |
| Multi-Perspective | "Present the case FOR and AGAINST" |
| Structured Output | "Use EXACTLY these sections: 1... 2... 3..." |
| RAG Grounding | "Base your answer ONLY on the provided documents" |
| Self-Critique | "Review: Is every claim supported by data?" |
| Meta-Prompting | "Write the best prompt for [TASK]" |
| Length Control | "Under [N] words" / "Exactly [N] bullets" |
| Self-Consistency | "Solve 3 ways, report majority answer" |
| LLM-as-Judge | "Score this output against these criteria: [rubric]" |

Day 3: Agentic AI & Workflow Automation

  • AI Agents vs Agentic AI: definitions, differences, when to use which
  • The Agentic Loop: Observe → Plan → Act → Reflect (automated self-critique)
  • 4 Workflow Patterns: Chaining, Parallelization, Routing, Orchestration
  • The Kiro Automation Stack: Steering + Skills + Hooks + MCP
  • MCP Lab: connect Kiro to a merchant database, query it in plain English
  • Agent Design Canvas: design an automated workflow for your team (with LLM-as-Judge scoring & leaderboard)
  • Deliverable: a workflow design you can hand to your tech team for implementation
๐Ÿ“ Homework: Look at the prompt template you built today. Think about: what if this ran automatically every time a new merchant application arrived? That's what we'll design tomorrow.

Questions raised during Day 2, answered with context and practical guidance.

Q12 For Bedrock Guardrails, why don't we just create regex filtering for words we want to block?

Regex works for exact keyword blocking but fails at semantic understanding. Consider two sentences that both contain the word "harm": a malicious request ("How do I harm a competitor's systems?") and a benign one ("How do we prevent harm to customers?"). A regex filter on "harm" blocks both. Bedrock Guardrails uses ML classifiers that understand intent and context, not just pattern matching.

| Approach | Strengths | Weaknesses |
| --- | --- | --- |
| Regex filtering | Fast, predictable, zero false negatives for exact matches | No context awareness, high false positives, easily bypassed with synonyms/misspellings |
| Bedrock Guardrails | Understands intent, handles paraphrasing, configurable thresholds, auditable | Slight latency, occasional false positives on edge cases |
Best practice: Use both. Regex for known exact patterns (account numbers, profanity word lists, PII formats like NRIC). Bedrock Guardrails for nuanced content filtering where context determines whether something is harmful. They complement each other.
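
To make the trade-off concrete, here is a minimal Python sketch of the layered approach. The NRIC pattern and the one-word blocklist are illustrative, and semantic_check is a hypothetical stand-in for a Bedrock Guardrails call, not the actual API.

```python
import re

# Exact patterns that regex handles well: known formats and literal keywords.
NRIC_PATTERN = re.compile(r"\b[STFG]\d{7}[A-Z]\b")  # illustrative Singapore NRIC shape
BLOCKLIST = {"harm"}                                # naive keyword blocklist

def regex_prefilter(text: str) -> bool:
    """Return True if the text trips an exact-match rule."""
    if NRIC_PATTERN.search(text):
        return True
    return any(word in text.lower() for word in BLOCKLIST)

def semantic_check(text: str) -> bool:
    """Hypothetical placeholder for a context-aware check such as Bedrock Guardrails."""
    return False  # in practice, call the guardrail service here

def should_block(text: str) -> bool:
    return regex_prefilter(text) or semantic_check(text)

# The keyword rule blocks both sentences, even though only one is malicious:
print(should_block("How do I harm a competitor's systems?"))  # True
print(should_block("How do we prevent harm to customers?"))   # True (false positive)
```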

Q13 Why not just rely on the built-in guardrails from the LLM/FM itself?

Every foundation model ships with built-in safety training (RLHF). So why add another layer? Three reasons:

  1. Customization: Built-in safety is a black box; you can't adjust thresholds, add company-specific denied topics, or configure PII categories. Bedrock Guardrails lets you define exactly what's allowed for YOUR organization.
  2. Consistency across models: Different models have different built-in filters. Switching from Claude to Nova changes your safety behavior unpredictably. Bedrock Guardrails applies the same rules regardless of which model you use.
  3. Auditability: Built-in safety doesn't log why something was blocked. Bedrock Guardrails logs every intervention with reasons, which you need for MAS/BNM regulatory reporting and internal compliance audits.
Analogy: Built-in guardrails are the seatbelt (always on, generic, one-size-fits-all). Bedrock Guardrails are the airbag system (configurable deployment thresholds, auditable activation records, organization-specific calibration).

Q14 I'm a business user: how do I prompt AI to create Python or JSON for optimization?

You don't need to know Python or JSON syntax. Describe what you want in plain English and let the AI write the technical implementation. The pattern:

  1. Describe the goal: "I need a script that reads invoices and flags discrepancies"
  2. Describe the inputs: "The invoices are PDF files in this folder, POs are in a CSV"
  3. Describe the output: "Generate an HTML report with a summary dashboard showing approved/flagged/escalated"
  4. Describe the rules: "Flag if variance exceeds 2%, escalate if amount exceeds $25,000"

That's exactly what you did in today's labs: you described business requirements in English, and Kiro wrote and ran the Python code. Your job is knowing WHAT to build, not HOW to code it.

Pro tip: The more specific your business rules, the better the code. "Flag discrepancies" is vague. "Flag if invoice total differs from PO approved amount by more than 2%" gives the AI exactly what it needs to write correct logic.
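
Putting the four parts together, a complete request might look like the sketch below. The folder and file names are illustrative; the 2% and $25,000 rules come straight from the lab scenario.

```text
I need a script that reads supplier invoices and flags discrepancies.

Inputs: the invoices are PDF files in the ./invoices folder; approved POs are in purchase_orders.csv.
Output: an HTML report with a summary dashboard showing approved / flagged / escalated invoices.
Rules: flag an invoice if its total differs from the PO approved amount by more than 2%;
escalate any invoice over $25,000 regardless of variance.
```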

Q15 How can we apply steering documents as standard organization rules, like a global policy?

In Kiro, steering files in .kiro/steering/ are loaded into every AI conversation. Here's how to implement organization-wide standards:

  1. Always-on steering: Create a file like company-standards.md with your team's rules (formatting standards, compliance requirements, approved terminology). This loads automatically in every session.
  2. Share across repos: Include the .kiro/ folder in your team's shared repositories. Everyone gets the same AI behavior.
  3. Conditional steering: Use inclusion: fileMatch for context-specific rules (e.g., compliance rules that only activate when working on regulatory files).
  4. Manual steering: Use inclusion: manual for heavy reference docs that only load when explicitly referenced with #.
Think of it as: Steering = system prompt that every team member's AI follows automatically. It's the equivalent of a "global policy" that shapes all AI outputs without anyone needing to remember to include it in their prompts.
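
As an illustration, a steering file might look like the sketch below. The file name, front-matter keys, and rules are assumptions made for this example; check Kiro's steering documentation for the exact front-matter syntax your version expects.

```markdown
---
# Assumed front matter: load this file only when working on regulatory files
inclusion: fileMatch
fileMatchPattern: "compliance/**"
---

# Company Standards (Compliance)

- State the currency explicitly for every monetary amount (SGD, MYR, IDR).
- Flag any output that references MAS, BNM, or OJK thresholds for human review.
- Use approved terminology: "merchant", not "client"; "chargeback", not "refund dispute".
```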

Q16 Why are we separating JSON files to become variables of our Python code?

Separation of concerns, a fundamental principle that applies to AI workflows too:

| File | Contains | Who changes it |
| --- | --- | --- |
| validation_rules.json | Business rules (thresholds, required fields, limits) | Business users / policy owners |
| invoice_validator.py | Logic (how to apply those rules) | Developers / Kiro |

Benefits:

  • Business users can change a threshold by editing a JSON value, with no code change or redeployment.
  • Rules can be reviewed and audited on their own, separately from the logic that applies them.
  • The same validator code is reused wherever the rules differ.

Real-world example: Your Singapore team uses rules-sg.json (MAS thresholds), Malaysia team uses rules-my.json (BNM thresholds). Same Python code, different business rules per market.
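
A minimal sketch of how the two files interact, assuming a rules file with illustrative field names:

```python
import json

# validation_rules.json (owned by business users) might look like:
# {"variance_threshold_pct": 2.0, "escalation_amount": 25000}

def load_rules(path: str = "validation_rules.json") -> dict:
    with open(path) as f:
        return json.load(f)

def validate_invoice(invoice_total: float, po_amount: float, rules: dict) -> str:
    """The Python logic stays fixed; the numbers it applies come from the JSON."""
    variance_pct = abs(invoice_total - po_amount) / po_amount * 100
    if invoice_total > rules["escalation_amount"]:
        return "escalated"
    if variance_pct > rules["variance_threshold_pct"]:
        return "flagged"
    return "approved"
```

Tightening a threshold is then a one-value edit in the JSON file; invoice_validator.py never changes.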

Q17 If I do self-reflection (AI assessing its own response), what's the impact on tokens and context?

Self-reflection typically costs two to three times the tokens of a single-pass answer:

| Step | Tokens (approx.) | What happens |
| --- | --- | --- |
| Draft generation | ~N tokens | AI produces the initial output |
| Self-evaluation | ~N tokens | AI reviews its own output against criteria |
| Revision (if needed) | ~N tokens | AI produces an improved version |
| Total | ~2-3x N | For a 500-token output, expect ~1,000-1,500 tokens in total |

When to use self-reflection:

  • Enable it for high-value or high-stakes outputs (compliance narratives, customer-facing text, financial summaries) where the extra tokens buy real quality.
  • Skip it for simple, low-risk tasks where a single pass is already good enough.

Production tip: Implement self-reflection as a two-step chain (generate → evaluate) rather than a single prompt. This gives you control over when reflection runs: skip it for simple tasks, enable it for high-value outputs.
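
A minimal sketch of that two-step chain, assuming a hypothetical call_llm(prompt) helper that wraps whatever model API you use:

```python
def call_llm(prompt: str) -> str:
    """Hypothetical helper: send a prompt to your model and return its text response."""
    raise NotImplementedError  # wire this to Bedrock or your model of choice

def generate_with_reflection(task: str, criteria: str, reflect: bool = True) -> str:
    draft = call_llm(task)
    if not reflect:          # skip the second pass for simple, low-stakes tasks
        return draft
    return call_llm(
        "Review the following output against these criteria:\n"
        f"{criteria}\n\nOutput:\n{draft}\n\n"
        "List any claims not supported by the data, then rewrite the output with those issues fixed."
    )
```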

Q18 Why create Python scripts in exercises instead of just asking Kiro directly?

Three reasons why we generate code rather than asking the LLM to do the work directly:

  1. Deterministic accuracy: Python computes (36900 - 36415) / 36415 = 1.33% correctly every single time. An LLM might hallucinate the math. For financial validation where a penny matters, code is non-negotiable.
  2. Reusability at zero cost: The script runs 1,000 times for free. Asking the LLM 1,000 times costs tokens (~$0.50-2.00 per batch). Write once, run forever.
  3. Auditability: Code produces the same output given the same input, which compliance requires. An LLM might give slightly different answers each time (temperature, model updates).
The pattern: Use AI to write the tool, then use the tool to do the work. AI is the developer; code is the worker. This is the most cost-effective and reliable way to automate finance workflows, and it's exactly what Citizen Developers do with Kiro.
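
The deterministic point from item 1, in code form (numbers taken from the example above):

```python
def variance_pct(invoice_total: float, po_amount: float) -> float:
    """Exact arithmetic: the same inputs always give the same answer."""
    return (invoice_total - po_amount) / po_amount * 100

print(round(variance_pct(36900, 36415), 2))  # 1.33 -> under the 2% threshold, no flag
```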