📨 Day 2 Summary

Prompt Engineering Workshop: thank you for attending!

| Module | Topic | Key Takeaway |
| --- | --- | --- |
| M1 | Prompt Fundamentals Deep Dive | The 4 Pillars: Clarity, Context, Role Assignment, Output Framing |
| M2 | Chain-of-Thought Reasoning | Zero-Shot CoT, Few-Shot CoT, Step-Back Prompting, Self-Consistency |
| M3 | Role & Persona Prompting | Expert personas, multi-agent framing, audience-specific outputs |
| M4 | Structured Outputs & RAG | JSON extraction, RAG grounding ("ONLY on provided documents"), meta-prompting |
| M5 | Model-Specific Tuning | Temperature guide, model comparison (Claude vs GPT vs Gemini), prompt styles per model |
| M6 | Evaluating Your Prompts | Rubric scoring, LLM-as-Judge, A/B testing prompts |
| M7 | Manual Prompts → Automated Tools | Bedrock Prompt Management, Prompt Optimization, reusable templates with {{variables}} |
| – | Continuous AI Interactions | Conversation funnel, circuit breaker patterns, when to start fresh vs continue |

| Lab | What You Built |
| --- | --- |
| Lab 1: Invoice Processing | PDF extraction pipeline: generated realistic invoices, extracted structured data with PyMuPDF, validated against purchase orders, flagged discrepancies, produced a professional HTML report |
| Lab 2: Transaction Dashboard | Web-based monitoring dashboard: transaction volumes, reconciliation status, anomaly alerts for AnyCompany Pay |
| Lab 3: Finance Presentation | Monthly Finance Review presentation from CSV data: data-driven slides with charts and corporate branding |
| Lab 6: Compliance Report | Automated compliance report generator: scanned transactions against MAS (Singapore), BNM (Malaysia), and OJK (Indonesia) regulatory rules |

Prompt Exercise: Merchant Risk Assessment Narrative, an iterative 8-step refinement from a basic prompt to a production-grade template with {{variables}} and LLM-as-Judge scoring.

| Pattern | Prompt snippet |
| --- | --- |
| Zero-Shot CoT | "Think step by step before answering" |
| Expert Persona | "You are a Senior [ROLE] with X years in [SPECIALTY]" |
| Multi-Perspective | "Present the case FOR and AGAINST" |
| Structured Output | "Use EXACTLY these sections: 1... 2... 3..." |
| RAG Grounding | "Base your answer ONLY on the provided documents" |
| Self-Critique | "Review: Is every claim supported by data?" |
| Meta-Prompting | "Write the best prompt for [TASK]" |
| Length Control | "Under [N] words" / "Exactly [N] bullets" |
| Self-Consistency | "Solve 3 ways, report majority answer" |
| LLM-as-Judge | "Score this output against these criteria: [rubric]" |

Day 3: Agentic AI & Workflow Automation

  • AI Agents vs Agentic AI: definitions, differences, when to use which
  • The Agentic Loop: Observe → Plan → Act → Reflect (automated self-critique)
  • 4 Workflow Patterns: Chaining, Parallelization, Routing, Orchestration
  • The Kiro Automation Stack: Steering + Skills + Hooks + MCP
  • MCP Lab: connect Kiro to a merchant database, query it in plain English
  • Agent Design Canvas: design an automated workflow for your team (with LLM-as-Judge scoring & leaderboard)
  • Deliverable: a workflow design you can hand to your tech team for implementation
๐Ÿ“ Homework: Look at the prompt template you built today. Think about: what if this ran automatically every time a new merchant application arrived? That's what we'll design tomorrow.

Questions raised during Day 2, answered with context and practical guidance.

Q12 For Bedrock Guardrails, why don't we just create regex filtering for words we want to block?

Regex works for exact keyword blocking but fails at semantic understanding. Consider two sentences that both contain the word "harm": a malicious request ("How do I harm a competitor's systems?") and a benign one ("How do we prevent harm to customers?"). A regex filter on "harm" blocks both. Bedrock Guardrails uses ML classifiers that understand intent and context, not just pattern matching.

| Approach | Strengths | Weaknesses |
| --- | --- | --- |
| Regex filtering | Fast, predictable, zero false negatives for exact matches | No context awareness, high false positives, easily bypassed with synonyms/misspellings |
| Bedrock Guardrails | Understands intent, handles paraphrasing, configurable thresholds, auditable | Slight latency, occasional false positives on edge cases |
Best practice: Use both. Regex for known exact patterns (account numbers, profanity word lists, PII formats like NRIC). Bedrock Guardrails for nuanced content filtering where context determines whether something is harmful. They complement each other.
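
To make the trade-off concrete, here is a minimal Python sketch of the layered approach. The NRIC pattern and the one-word blocklist are illustrative, and semantic_check is a hypothetical stand-in for a Bedrock Guardrails call, not the actual API.

```python
import re

# Exact patterns that regex handles well: known formats and literal keywords.
NRIC_PATTERN = re.compile(r"\b[STFG]\d{7}[A-Z]\b")  # illustrative Singapore NRIC shape
BLOCKLIST = {"harm"}                                # naive keyword blocklist

def regex_prefilter(text: str) -> bool:
    """Return True if the text trips an exact-match rule."""
    if NRIC_PATTERN.search(text):
        return True
    return any(word in text.lower() for word in BLOCKLIST)

def semantic_check(text: str) -> bool:
    """Hypothetical placeholder for a context-aware check such as Bedrock Guardrails."""
    return False  # in practice, call the guardrail service here

def should_block(text: str) -> bool:
    return regex_prefilter(text) or semantic_check(text)

# The keyword rule blocks both sentences, even though only one is malicious:
print(should_block("How do I harm a competitor's systems?"))  # True
print(should_block("How do we prevent harm to customers?"))   # True (false positive)
```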

Q13 Why not just rely on the built-in guardrails from the LLM/FM itself?

Every foundation model ships with built-in safety training (RLHF). So why add another layer? Three reasons:

  1. Customization: Built-in safety is a black box; you can't adjust thresholds, add company-specific denied topics, or configure PII categories. Bedrock Guardrails lets you define exactly what's allowed for YOUR organization.
  2. Consistency across models: Different models have different built-in filters. Switching from Claude to Nova changes your safety behavior unpredictably. Bedrock Guardrails applies the same rules regardless of which model you use.
  3. Auditability: Built-in safety doesn't log why something was blocked. Bedrock Guardrails logs every intervention with reasons, which you need for MAS/BNM regulatory reporting and internal compliance audits.
Analogy: Built-in guardrails are the seatbelt (always on, generic, one-size-fits-all). Bedrock Guardrails are the airbag system (configurable deployment thresholds, auditable activation records, organization-specific calibration).

Q14 I'm a business user: how do I prompt AI to create Python or JSON for optimization?

You don't need to know Python or JSON syntax. Describe what you want in plain English and let the AI write the technical implementation. The pattern:

  1. Describe the goal: "I need a script that reads invoices and flags discrepancies"
  2. Describe the inputs: "The invoices are PDF files in this folder, POs are in a CSV"
  3. Describe the output: "Generate an HTML report with a summary dashboard showing approved/flagged/escalated"
  4. Describe the rules: "Flag if variance exceeds 2%, escalate if amount exceeds $25,000"

That's exactly what you did in today's labs: you described business requirements in English, and Kiro wrote and ran the Python code. Your job is knowing WHAT to build, not HOW to code it.

Pro tip: The more specific your business rules, the better the code. "Flag discrepancies" is vague. "Flag if invoice total differs from PO approved amount by more than 2%" gives the AI exactly what it needs to write correct logic.
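
Putting the four parts together, a complete request might look like the sketch below. The folder and file names are illustrative; the 2% and $25,000 rules come straight from the lab scenario.

```text
I need a script that reads supplier invoices and flags discrepancies.

Inputs: the invoices are PDF files in the ./invoices folder; approved POs are in purchase_orders.csv.
Output: an HTML report with a summary dashboard showing approved / flagged / escalated invoices.
Rules: flag an invoice if its total differs from the PO approved amount by more than 2%;
escalate any invoice over $25,000 regardless of variance.
```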

Q15 How can we apply steering documents as standard organization rules, like a global policy?

In Kiro, steering files in .kiro/steering/ are loaded into every AI conversation. Here's how to implement organization-wide standards:

  1. Always-on steering: Create a file like company-standards.md with your team's rules (formatting standards, compliance requirements, approved terminology). This loads automatically in every session.
  2. Share across repos: Include the .kiro/ folder in your team's shared repositories. Everyone gets the same AI behavior.
  3. Conditional steering: Use inclusion: fileMatch for context-specific rules (e.g., compliance rules that only activate when working on regulatory files).
  4. Manual steering: Use inclusion: manual for heavy reference docs that only load when explicitly referenced with #.
Think of it as: Steering = system prompt that every team member's AI follows automatically. It's the equivalent of a "global policy" that shapes all AI outputs without anyone needing to remember to include it in their prompts.
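
As an illustration, a steering file might look like the sketch below. The file name, front-matter keys, and rules are assumptions made for this example; check Kiro's steering documentation for the exact front-matter syntax your version expects.

```markdown
---
# Assumed front matter: load this file only when working on regulatory files
inclusion: fileMatch
fileMatchPattern: "compliance/**"
---

# Company Standards (Compliance)

- State the currency explicitly for every monetary amount (SGD, MYR, IDR).
- Flag any output that references MAS, BNM, or OJK thresholds for human review.
- Use approved terminology: "merchant", not "client"; "chargeback", not "refund dispute".
```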

Q16 Why are we separating JSON files to become variables of our Python code?

Separation of concerns, a fundamental principle that applies to AI workflows too:

| File | Contains | Who changes it |
| --- | --- | --- |
| validation_rules.json | Business rules (thresholds, required fields, limits) | Business users / policy owners |
| invoice_validator.py | Logic (how to apply those rules) | Developers / Kiro |

Benefits:

  • Business users can change a threshold by editing a JSON value, with no code change or redeployment.
  • Rules can be reviewed and audited on their own, separately from the logic that applies them.
  • The same validator code is reused wherever the rules differ.

Real-world example: Your Singapore team uses rules-sg.json (MAS thresholds), Malaysia team uses rules-my.json (BNM thresholds). Same Python code, different business rules per market.
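
A minimal sketch of how the two files interact, assuming a rules file with illustrative field names:

```python
import json

# validation_rules.json (owned by business users) might look like:
# {"variance_threshold_pct": 2.0, "escalation_amount": 25000}

def load_rules(path: str = "validation_rules.json") -> dict:
    with open(path) as f:
        return json.load(f)

def validate_invoice(invoice_total: float, po_amount: float, rules: dict) -> str:
    """The Python logic stays fixed; the numbers it applies come from the JSON."""
    variance_pct = abs(invoice_total - po_amount) / po_amount * 100
    if invoice_total > rules["escalation_amount"]:
        return "escalated"
    if variance_pct > rules["variance_threshold_pct"]:
        return "flagged"
    return "approved"
```

Tightening a threshold is then a one-value edit in the JSON file; invoice_validator.py never changes.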

Q17 If I do self-reflection (AI assessing its own response), what's the impact on tokens and context?

Self-reflection typically costs two to three times the tokens of a single-pass answer:

| Step | Tokens (approx.) | What happens |
| --- | --- | --- |
| Draft generation | ~N tokens | AI produces the initial output |
| Self-evaluation | ~N tokens | AI reviews its own output against criteria |
| Revision (if needed) | ~N tokens | AI produces an improved version |
| Total | ~2-3x N | For a 500-token output, expect ~1,000-1,500 tokens in total |

When to use self-reflection:

  • Enable it for high-value or high-stakes outputs (compliance narratives, customer-facing text, financial summaries) where the extra tokens buy real quality.
  • Skip it for simple, low-risk tasks where a single pass is already good enough.

Production tip: Implement self-reflection as a two-step chain (generate → evaluate) rather than a single prompt. This gives you control over when reflection runs: skip it for simple tasks, enable it for high-value outputs.
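
A minimal sketch of that two-step chain, assuming a hypothetical call_llm(prompt) helper that wraps whatever model API you use:

```python
def call_llm(prompt: str) -> str:
    """Hypothetical helper: send a prompt to your model and return its text response."""
    raise NotImplementedError  # wire this to Bedrock or your model of choice

def generate_with_reflection(task: str, criteria: str, reflect: bool = True) -> str:
    draft = call_llm(task)
    if not reflect:          # skip the second pass for simple, low-stakes tasks
        return draft
    return call_llm(
        "Review the following output against these criteria:\n"
        f"{criteria}\n\nOutput:\n{draft}\n\n"
        "List any claims not supported by the data, then rewrite the output with those issues fixed."
    )
```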

Q18 Why create Python scripts in exercises instead of just asking Kiro directly?

Three reasons why we generate code rather than asking the LLM to do the work directly:

  1. Deterministic accuracy: Python computes (36900 - 36415) / 36415 = 1.33% correctly every single time. An LLM might hallucinate the math. For financial validation where a penny matters, code is non-negotiable.
  2. Reusability at zero cost: The script runs 1,000 times for free. Asking the LLM 1,000 times costs tokens (~$0.50-2.00 per batch). Write once, run forever.
  3. Auditability: Code produces the same output given the same input, which compliance requires. An LLM might give slightly different answers each time (temperature, model updates).
The pattern: Use AI to write the tool, then use the tool to do the work. AI is the developer; code is the worker. This is the most cost-effective and reliable way to automate finance workflows, and it's exactly what Citizen Developers do with Kiro.
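
The deterministic point from item 1, in code form (numbers taken from the example above):

```python
def variance_pct(invoice_total: float, po_amount: float) -> float:
    """Exact arithmetic: the same inputs always give the same answer."""
    return (invoice_total - po_amount) / po_amount * 100

print(round(variance_pct(36900, 36415), 2))  # 1.33 -> under the 2% threshold, no flag
```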