Capstone: Use AI to create a 5-document playbook for deploying generative AI at AnyCompany Financial Group.
Overview
This page provides customized prompts for Lab 3. Follow the same step-by-step instructions in the BuilderLabs page, but use the prompts below instead of the generic "AnyOrganization" ones.
💡 How this works: The BuilderLabs page tells you where to click (navigate to Bedrock, select the model, upload the file, clear the chat). This page gives you the customized prompts to paste. Keep both pages open side by side.
0 AnyCompany_UseCase.txt – Upload this file to Bedrock at the start of each task
1 AnyCompany_UseCase_Selection_SAMPLE.txt – Sample output for Document 1
2 AnyCompany_Model_Selection_SAMPLE.txt – Sample output for Document 2
3 AnyCompany_Performance_Improvement_SAMPLE.txt – Sample output for Document 3
4 AnyCompany_Evaluation_Framework_SAMPLE.txt – Sample output for Document 4
5 AnyCompany_Deployment_Plan_SAMPLE.txt – Sample output for Document 5
Instead of uploading the original AnyOrg_UseCase.txt, upload 0 AnyCompany_UseCase.txt from the zip:
Upload: AnyCompany_UseCase.txt
This file describes AnyCompany Financial Group's specific challenges: merchant risk assessment backlog (200+/month, 45 min each), customer support volume (50K tickets/month), compliance across 6 regulators, invoice processing (3,000+/month), and financial reporting cycles.
⏱️ Total time: 45–60 minutes · Tool: Bedrock Playground → Chat mode → Nova Lite 1.0 · Output: 5 markdown documents
Task 1: Use Case Selection Document
Follow BuilderLabs steps 3-23. Use these customized prompts instead of the originals.
1.3 Define the Problem Statement
After uploading AnyCompany_UseCase.txt, enter:
Act as a business analyst at a Southeast Asian fintech company. Write a clear problem statement for implementing generative AI to automate merchant risk assessments at the company described in the attached [Use Case]. The statement should identify current challenges (200+ merchants/month backlog, 45 min per assessment, inconsistent ratings) and the opportunity for AI automation (200-word maximum).
✅ Review: If the output is too generic, refine it: "In the previous problem statement, please add that the company operates across 6 Southeast Asian markets and must comply with MAS, BNM, and OJK regulations. Restate the whole problem statement."
1.4 Gather Requirements
Based on the problem statement for automating merchant risk assessments at the company described in the [Use Case], list 5 key requirements for the solution. Include both functional requirements (e.g., structured GREEN/AMBER/RED ratings, decision rule enforcement, multi-market support) and non-functional requirements (e.g., latency under 30 seconds, data stays in AWS, MAS audit compliance).
Temperature tip: Try changing the temperature to 0 and re-running the same prompt. Compare the outputs – lower temperature gives more focused, consistent requirements. This connects to the Temperature lesson from Module 1.
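If you want to see what this comparison looks like in code, here is a minimal sketch of building Bedrock Converse API requests for the same prompt at two temperatures. The prompt text and token limit are illustrative, and actually sending the requests requires AWS credentials with Bedrock model access.

```python
# Sketch: the same prompt at two temperatures via the Bedrock Converse API.

def build_converse_request(prompt: str, temperature: float) -> dict:
    """Build keyword arguments for bedrock-runtime's converse() call."""
    return {
        "modelId": "amazon.nova-lite-v1:0",  # Nova Lite, as used in this lab
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"temperature": temperature, "maxTokens": 1024},
    }

prompt = "List 5 key requirements for the merchant risk assessment solution."

# Same prompt, two temperatures: compare the focus and consistency of each output.
low_temp = build_converse_request(prompt, temperature=0.0)
default_temp = build_converse_request(prompt, temperature=0.7)

# With credentials configured, you would send each request like this:
# import boto3
# client = boto3.client("bedrock-runtime", region_name="us-east-1")
# response = client.converse(**low_temp)
# print(response["output"]["message"]["content"][0]["text"])
```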
1.5 Align Stakeholder Expectations
Provide 5 strategies for aligning stakeholder expectations when implementing an AI-powered merchant risk assessment system at the company described in the [Use Case]. Consider these stakeholder groups: Head of Risk Operations (wants 80% backlog reduction), Compliance team (needs full audit trail for MAS), Merchant Operations (needs actionable recommendations), Citizen Developers (will maintain prompt templates), and Regional teams across 6 SEA markets.
1.6 Identify Key Metrics
List and briefly explain 7 key metrics for evaluating the success of an AI-powered merchant risk assessment system at the company described in the [Use Case]. Include: assessment throughput (target: 50/day), rating accuracy vs human analysts (target: ≥90%), time per assessment (target: <5 min), backlog reduction, cost per assessment (target: <$2 SGD), compliance score (100% audit trail), and false negative rate (<5%).
1.7 Compile the Final Document
Using all the information we've generated about the merchant risk assessment AI project for the company described in the [Use Case], create a structured "Generative AI Use Case Selection" document. Include sections for: Problem Statement, Functional Requirements, Non-functional Requirements, Stakeholder Alignment, Key Metrics, Selection Criteria (with weights), and Organizational Readiness (assess data, infrastructure, skills, culture, governance, budget). Summarize and organize the information we've discussed into a single cohesive document. Add text formatting markdown.
💾 Save as: 1_use_case_selection_<your-name>.md – copy only the last response, not the full conversation.
Task 2: Model Selection Rubric
Clear the chat, re-upload AnyCompany_UseCase.txt, then follow BuilderLabs steps 24-35.
2.1 Identify Selection Factors
List and briefly explain 5 key factors to consider when selecting a foundation model for generating merchant risk assessment narratives at a regulated Southeast Asian fintech. Focus on: task quality (structured output, decision rule adherence), cost efficiency (cost per assessment at ~3,500 tokens), responsible AI (instruction following, bias mitigation), operational fit (Bedrock availability, team familiarity), and inference speed. Reference the attached [Use Case].
2.2 Evaluate Trade-offs
Based on the selection factors we just discussed, explain how you would evaluate trade-offs between Claude Sonnet 4 (highest quality, $3.00/$15.00 per 1M tokens), Amazon Nova Pro (good quality, $0.80/$3.20), and Llama 4 Maverick 17B (lowest cost, $0.22/$0.88). Provide 5 specific trade-offs for the merchant risk assessment use case – for example, quality vs cost for GREEN-rated merchants vs RED-rated merchants.
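To sanity-check what the model produces here, you can do the cost arithmetic yourself. The sketch below uses the per-1M-token prices quoted above; splitting the ~3,500-token assessment into 3,000 input and 500 output tokens is an assumption for illustration.

```python
# Back-of-envelope cost per assessment for the three candidate models.

PRICES = {  # (USD per 1M input tokens, USD per 1M output tokens)
    "Claude Sonnet 4":      (3.00, 15.00),
    "Amazon Nova Pro":      (0.80, 3.20),
    "Llama 4 Maverick 17B": (0.22, 0.88),
}

def cost_per_assessment(model: str, input_tokens: int = 3000,
                        output_tokens: int = 500) -> float:
    """Cost of one assessment given a token split (3,000 in / 500 out assumed)."""
    in_price, out_price = PRICES[model]
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

for model in PRICES:
    print(f"{model}: ${cost_per_assessment(model):.4f} per assessment")
```

At 200+ assessments per month, even the most expensive model costs under $4/month in inference, so quality and compliance factors usually dominate this particular trade-off.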
2.3 Assess Model Capabilities
Describe a step-by-step process for assessing Claude Sonnet 4, Amazon Nova Pro, and Llama 4 Maverick against the merchant risk assessment requirements. Include: how to test structured output consistency (8 required sections), decision rule compliance (GREEN/AMBER/RED thresholds), data grounding (no hallucinated metrics), and cost per assessment calculation.
2.4 Compile the Final Rubric
Based on all the information we've discussed about selecting foundation models for merchant risk assessments, create a detailed "Model Selection Rubric and Recommendation" document. Include: models evaluated (Claude Sonnet 4, Nova Pro, Llama 4 Maverick), scoring rubric (1-5 scale), evaluation categories (Task Performance 30%, Cost 20%, Responsible AI 20%, Architecture 15%, Operational Fit 15%), scoring results table, cost-per-assessment calculation, and a final recommendation with a tiered strategy (e.g., Nova Pro for GREEN, Claude for AMBER/RED). Add text formatting markdown.
💾 Save as: 2_model_selection_<your-name>.md
Task 3: Performance Improvement Plan
Clear the chat, re-upload AnyCompany_UseCase.txt, then follow BuilderLabs steps 36-46.
3.1 Explore Improvement Techniques
List and briefly explain 5 key techniques to improve the quality of AI-generated merchant risk assessments for the company described in the attached [Use Case]. Focus on: prompt engineering (iterative refinement with decision rules), RAG (grounding in company policy documents like Merchant Risk Policy v4.2), LLM-as-Judge evaluation, few-shot examples from historical assessments, and XML tag structure for Claude.
3.2 Plan for Prompt Engineering
Provide a step-by-step plan for iteratively improving the merchant risk assessment prompt. Start from a basic "assess this merchant" prompt and show 7 refinement steps: (1) add persona, (2) add structured output (8 sections), (3) add decision rules (RED if chargeback >3%, AMBER if >1%, GREEN otherwise), (4) add constraints (ONLY provided data, SGD currency), (5) add self-review step, (6) validate with LLM-as-Judge, (7) A/B test against previous version. Include expected quality improvement at each step.
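The decision rules in step (3) are deterministic, so they can also be checked outside the model as a guardrail on its output. A minimal sketch (thresholds taken from the prompt above; the function names are illustrative):

```python
# Deterministic cross-check of the model's rating against the decision rules.

def expected_rating(chargeback_rate: float) -> str:
    """Apply the lab's decision rules: RED if chargeback > 3%,
    AMBER if > 1%, GREEN otherwise. Rate is a fraction (0.03 == 3%)."""
    if chargeback_rate > 0.03:
        return "RED"
    if chargeback_rate > 0.01:
        return "AMBER"
    return "GREEN"

def rating_matches(ai_rating: str, chargeback_rate: float) -> bool:
    """Flag assessments where the AI's rating violates the decision rules."""
    return ai_rating.strip().upper() == expected_rating(chargeback_rate)
```

A check like this makes a good automated gate in step (6): any assessment where `rating_matches` is False fails validation regardless of its judge score.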
3.3 Plan RAG Implementation
Explain how Retrieval Augmented Generation (RAG) can be used to ground merchant risk assessments in AnyCompany's actual policy documents. Specify: which documents to index (Merchant Risk Policy v4.2, PayLater Merchant Terms, MAS Notice PSN 06, Chargeback Threshold Guidelines), how to chunk and retrieve relevant sections, and how to enforce citation rules ("After each recommendation, cite [Policy Name, Section X.X]"). (200 words maximum)
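As a toy illustration of the retrieval step, the sketch below scores policy chunks by keyword overlap and formats the citation the prompt requires. A real implementation would use embeddings (for example, a Bedrock Knowledge Base) rather than word overlap, and the chunk texts here are invented.

```python
# Toy RAG retrieval: rank policy chunks by word overlap, attach citations.

policy_chunks = [
    {"source": "Merchant Risk Policy v4.2, Section 3.1",
     "text": "Merchants with a chargeback rate above 3 percent must be rated RED."},
    {"source": "Chargeback Threshold Guidelines, Section 1.2",
     "text": "A chargeback rate between 1 and 3 percent warrants an AMBER rating."},
    {"source": "PayLater Merchant Terms, Section 7.4",
     "text": "Settlement may be withheld for merchants under enhanced review."},
]

def retrieve(query: str, chunks: list, k: int = 2) -> list:
    """Return the k chunks sharing the most words with the query."""
    q_words = set(query.lower().split())
    scored = sorted(chunks,
                    key=lambda c: len(q_words & set(c["text"].lower().split())),
                    reverse=True)
    return scored[:k]

def with_citation(chunk: dict) -> str:
    """Format a retrieved chunk with the citation rule from the prompt."""
    return f'{chunk["text"]} [{chunk["source"]}]'

hits = retrieve("merchant chargeback rate above 3 percent", policy_chunks)
```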
3.4 Compile the Final Plan
Using all the information we've generated about improving merchant risk assessment quality, create a "Performance Improvement Plan" document. Include sections for: Prompt Engineering Strategy (7-step refinement), RAG Implementation (policy documents, retrieval, citation rules), Evaluation Loop (LLM-as-Judge rubric scoring 5 criteria, A/B testing), Continuous Improvement (monthly cycle, trigger-based updates), and Expected Improvement (baseline 12/25 → target 19/25 in 4 weeks). Add text formatting markdown.
💾 Save as: 3_performance_plan_<your-name>.md
Task 4: Evaluation Framework
Clear the chat, re-upload AnyCompany_UseCase.txt, then follow BuilderLabs steps 47-58.
4.1 Define Evaluation Metrics
Suggest 7 key metrics for evaluating AI-generated merchant risk assessments at the company described in the attached [Use Case]. Include quality metrics (completeness, data grounding, actionability, analytical depth, rating justification β each scored 1-5), accuracy metrics (agreement rate with human analysts, false negative rate), and operational metrics (latency, cost per assessment). Explain why each metric matters for a regulated fintech.
4.2 Design Test Dataset
Design a 100-case test dataset for validating AI-generated merchant risk assessments. Specify: composition (40 GREEN, 30 AMBER, 20 RED, 10 edge cases), data sources (historical assessments from past 12 months, anonymized), diversity requirements (all 6 SEA markets, multiple sectors – F&B, retail, e-commerce, travel), and edge cases to include (borderline ratings, new merchants with limited data, seasonal patterns, cross-border merchants, improving trends).
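The planned composition can be sanity-checked in a few lines before any cases are collected. The two-letter market codes and the round-robin spread across markets below are illustrative assumptions, not part of the lab:

```python
from itertools import cycle

# Rating mix from the prompt; market codes are assumed for illustration.
composition = {"GREEN": 40, "AMBER": 30, "RED": 20, "EDGE": 10}
markets = ["SG", "MY", "ID", "TH", "PH", "VN"]

def allocate(composition: dict, markets: list) -> list:
    """Spread each rating bucket across markets round-robin,
    returning (rating, market) pairs for the full dataset plan."""
    cases = []
    for rating, n in composition.items():
        market_cycle = cycle(markets)
        cases.extend((rating, next(market_cycle)) for _ in range(n))
    return cases

cases = allocate(composition, markets)
```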
4.3 Plan Human Evaluation
Describe a human evaluation process for validating AI-generated merchant risk assessments. Include: evaluator panel (2 Senior Risk Analysts, 1 Compliance Officer, 1 Merchant Operations lead), weekly sample size (20 assessments), blind review process, scoring rubric alignment with automated metrics, disagreement resolution (weekly 30-min review meeting), and calibration sessions (monthly, target Cohen's kappa ≥ 0.8).
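Cohen's kappa, the calibration target above, can be computed from paired ratings like this. The two raters' labels below are invented for illustration:

```python
from collections import Counter

def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    expected = sum(freq_a[l] / n * freq_b[l] / n for l in labels)
    return (observed - expected) / (1 - expected)

a = ["GREEN", "GREEN", "AMBER", "RED"]
b = ["GREEN", "GREEN", "AMBER", "AMBER"]
print(cohens_kappa(a, b))  # 0.6 -- below the 0.8 calibration target
```

Note that the raters agree on 3 of 4 cases (75%), yet kappa is only 0.6 once chance agreement is subtracted, which is why the target is stated in kappa rather than raw agreement.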
4.4 Compile the Final Framework
Using all the information we've generated about evaluating merchant risk assessments, create a "Foundation Model Evaluation Framework" document. Include sections for: Evaluation Metrics (quality rubric, accuracy, operational), Test Dataset Design (composition, diversity, edge cases), Automated Evaluation (LLM-as-Judge pipeline, batch schedule), Human Evaluation (panel, process, calibration), and Continuous Monitoring (real-time alerts, monthly dashboard, quarterly review). Production threshold: total score ≥ 19/25. Add text formatting markdown.
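The production threshold can be expressed as a small aggregation over the five rubric criteria from step 4.1. The criterion names follow the quality metrics listed earlier, and the sample scores are invented:

```python
# Aggregate the five 1-5 rubric scores into the production gate (>= 19/25).

CRITERIA = ["completeness", "data_grounding", "actionability",
            "analytical_depth", "rating_justification"]

def passes_production_gate(scores: dict, threshold: int = 19) -> bool:
    """True when the summed rubric score meets the production threshold."""
    assert set(scores) == set(CRITERIA), "score every criterion exactly once"
    assert all(1 <= s <= 5 for s in scores.values()), "scores must be 1-5"
    return sum(scores.values()) >= threshold

sample = {"completeness": 4, "data_grounding": 5, "actionability": 4,
          "analytical_depth": 3, "rating_justification": 4}  # total: 20
```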
💾 Save as: 4_evaluation_framework_<your-name>.md
Task 5: Deployment Checklist & Plan
Clear the chat, re-upload AnyCompany_UseCase.txt, then follow BuilderLabs steps 59-72.
5.1 Pre-Deployment Checklist
Create a pre-deployment checklist for an AI merchant risk assessment system at a regulated Southeast Asian fintech described in the attached [Use Case]. Organize into 4 categories: Infrastructure (Bedrock access, data warehouse connection, S3 storage with 7-year retention), Security (IAM least-privilege, VPC endpoint, encryption, no PII in prompts), Compliance (MAS notification, audit trail design, human-in-the-loop for AMBER/RED), and Quality (prompt template finalized, 100-case validation passed with ≥90% accuracy, edge cases documented). Use ✅/⬜ status indicators.
5.2 Deployment Phases
Design a 3-phase deployment plan for the merchant risk assessment AI system: Phase 1 – Shadow Mode (weeks 1-2): AI runs in parallel with human analysts, no action taken on AI output, track agreement rate daily, exit when ≥90% agreement for 5 consecutive days. Phase 2 – Assisted Mode (weeks 3-4): AI generates draft, human reviews and approves/modifies, track modification rate and time savings. Phase 3 – Production Mode (week 5+): GREEN auto-filed with monthly batch review, AMBER queued for 48-hour human review, RED immediate alert for 4-hour review. Include exit criteria for each phase.
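The Phase 1 exit criterion (≥90% agreement for 5 consecutive days) can be sketched as a simple streak check over the daily agreement log:

```python
# Phase 1 exit check: a run of 5 consecutive days at or above 90% agreement.

def shadow_mode_exit(daily_agreement: list, threshold: float = 0.90,
                     required_days: int = 5) -> bool:
    """True once any run of `required_days` consecutive daily agreement
    rates (fractions, e.g. 0.92) meets or exceeds `threshold`."""
    streak = 0
    for rate in daily_agreement:
        streak = streak + 1 if rate >= threshold else 0  # a bad day resets the run
        if streak >= required_days:
            return True
    return False
```

The consecutive-days requirement matters: a single sub-threshold day resets the streak, so five good days scattered across two weeks do not trigger the exit.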
5.3 Rollback & Monitoring
Define rollback triggers and post-deployment monitoring for the merchant risk assessment AI system. Rollback triggers: agreement rate drops below 80% (revert to Shadow Mode), false negative detected (pause auto-filing), model service outage (manual process resumes), regulatory concern raised (pause all AI assessments within 30 minutes). Monitoring: daily automated checks (success rate, judge scores, latency, cost), weekly human review (20 random assessments), monthly report to leadership (savings, quality trends, override rate).
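The rollback triggers above can be encoded as a severity-ordered decision function, so the most serious condition always wins. The signal names and returned action labels are illustrative:

```python
# Map the deployment plan's rollback triggers to actions, most severe first.

def rollback_action(agreement_rate: float, false_negative: bool,
                    service_outage: bool, regulatory_concern: bool) -> str:
    """Evaluate the rollback triggers in order of severity."""
    if regulatory_concern:
        return "PAUSE_ALL"         # pause all AI assessments within 30 minutes
    if service_outage:
        return "MANUAL_PROCESS"    # analysts resume the manual workflow
    if false_negative:
        return "PAUSE_AUTO_FILING" # stop auto-filing GREEN assessments
    if agreement_rate < 0.80:
        return "REVERT_TO_SHADOW"  # drop back to Shadow Mode
    return "CONTINUE"
```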
5.4 Compile the Final Plan
Using all the information we've generated about deploying the merchant risk assessment AI system, create a "Deployment Checklist and Plan" document. Include sections for: Pre-Deployment Checklist (infrastructure, security, compliance, quality, with status indicators), Deployment Phases (Shadow → Assisted → Production with exit criteria), Rollback Plan (triggers and procedures), Post-Deployment Monitoring (daily/weekly/monthly), and 90-Day Success Metrics (compare baseline vs target: assessments/day 5 → 50, time per assessment 45 min → 5 min, backlog 200 → 20, cost $35 → $2, consistency 70% → 95%). Add text formatting markdown.
💾 Save as: 5_deployment_plan_<your-name>.md
Wrap-Up
You now have a 5-document GenAI Implementation Playbook customized for AnyCompany Financial Group's merchant risk assessment use case.
| Document | Contents | File |
| --- | --- | --- |
| 1. Use Case Selection | Problem statement, requirements, stakeholder alignment, metrics | 1_use_case_selection.md |
| 2. Model Selection Rubric | Selection factors, trade-offs, scoring rubric, tiered recommendation | 2_model_selection.md |
| 3. Performance Improvement Plan | Prompt engineering, RAG, evaluation loop, continuous improvement | 3_performance_plan.md |
| 4. Evaluation Framework | Metrics, test dataset, human + automated evaluation | 4_evaluation_framework.md |
| 5. Deployment Plan | Checklist, 3 phases, rollback, 90-day targets | 5_deployment_plan.md |
Key takeaway: You used AI to create a professional 5-document playbook in under an hour. The quality depended entirely on how well you prompted, which is exactly what Day 2 will teach you to master.