Prompt Engineering Best Practices

The techniques that turn vague AI outputs into production-grade, auditable results, with interactive before/after comparisons and a live prompt builder.

📖 Day 2 Reference | ⚡ Interactive | 🏦 Finance Examples

🎯 The 4 Pillars of Effective Prompts

80% of prompt quality comes from 4 fundamentals. Master these and every prompt you write will be dramatically better.

🎯 1. Clarity

Say exactly what you mean. If a colleague would ask "what do you mean?", your prompt needs work.

📚 2. Context

Give the AI the background it needs: domain, data, situation, constraints. Without context, it guesses.

👤 3. Role Assignment

Tell the AI who to be. A "risk analyst" focuses on different signals than a "support agent."

📋 4. Output Framing

Define what "done" looks like: format, length, structure, style. No framing = unpredictable output.

📚 4 Types of Context

Context is the most impactful pillar for finance. Skip any type and the output suffers in a specific way:

| Type | What it tells the AI | If you skip it... | Finance example |
| --- | --- | --- | --- |
| Domain | Industry, market, business area | Generic, non-specific answers | "In the context of Southeast Asian digital payments..." |
| Data | Specific numbers, records, documents | AI hallucinates plausible numbers | "Here is the merchant's 6-month transaction history: [data]" |
| Situational | Why you need this now (the trigger) | Wrong tone, urgency, framing | "Preparing for quarterly board review" / "Flagged by monitoring" |
| Constraints | Rules, limits, requirements | Ignores your standards | "All amounts in SGD" / "Follow MAS Notice 626" |
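
The four context types can be treated as a checklist before a prompt is sent. The sketch below is illustrative only: `CONTEXT_TYPES` and `build_context` are hypothetical names, and the warning texts paraphrase the "If you skip it..." column above.

```python
# Hypothetical helper: names and wording are illustrative, not from the source page.
CONTEXT_TYPES = {
    "domain": "generic, non-specific answers",
    "data": "hallucinated but plausible numbers",
    "situational": "wrong tone, urgency, or framing",
    "constraints": "output that ignores your standards",
}


def build_context(**provided: str) -> tuple[str, list[str]]:
    """Return a context block plus one warning per missing context type."""
    lines, warnings = [], []
    for kind, risk in CONTEXT_TYPES.items():
        value = provided.get(kind, "").strip()
        if value:
            lines.append(f"{kind.capitalize()}: {value}")
        else:
            warnings.append(f"Missing {kind} context: expect {risk}")
    return "\n".join(lines), warnings


block, warnings = build_context(
    domain="Southeast Asian digital payments",
    situational="Preparing for quarterly board review",
    constraints="All amounts in SGD; follow MAS Notice 626",
)
print(block)
print(warnings)  # the data context was left out on purpose
```

Here the data context is omitted, so the helper returns a single warning for it instead of silently building a prompt that invites invented numbers.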

👀 See the Difference

❌ Without 4 Pillars

Prompt: "Review this merchant"

The merchant seems to be doing okay. Their transactions have been increasing which is good. There are some chargebacks but that's normal for this type of business. Overall they appear to be a reasonable merchant to work with.

⚠️ Vague, no data cited, no structure, no risk rating, not actionable

✅ With 4 Pillars

Prompt: Role + Context + Structure + Constraints

RISK RATING: 🟡 AMBER
Transaction Analysis: Volume surged 271% (4,200 → 15,600). Avg txn flat at $10 suggests organic growth.
Chargeback Alert: Rate 0.3% → 4.1%, about 4x the 1.0% benchmark.
Action: 1) Issue warning letter (Ops, 5 days) 2) Reduce PayLater limit (Risk, immediate)

✓ Structured, data-cited, specific actions with owners and timelines

💡 Same data, dramatically different quality. The difference is entirely in the prompt, not the AI model.
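
Because the structured response cites figures, a reviewer can recompute them. A minimal sketch, using only the numbers quoted in the example above (`pct_change` is a hypothetical helper name):

```python
def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100


volume_growth = pct_change(4_200, 15_600)   # transaction volume, before vs after
chargeback_multiple = 4.1 / 1.0             # current rate vs the 1.0% benchmark

print(f"Volume growth: {volume_growth:.0f}%")                    # 271%
print(f"Chargebacks vs benchmark: {chargeback_multiple:.1f}x")   # 4.1x
```

The vague response offers nothing that could be checked this way.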

🧠 Chain-of-Thought Reasoning

Financial decisions require multi-step logic. Chain-of-thought (CoT) prompting makes the reasoning visible and auditable: the AI shows its work.

💬 Zero-Shot CoT

Just add "Think step by step." No examples needed. Best for quick calculations and simple logic.

📝 Few-Shot CoT

Provide one example with reasoning. The AI tends to follow the same pattern. Best for consistent processes.

🔭 Step-Back

"First identify key factors, then analyze." Forces prioritization before writing. Best for complex analysis.

🗳️ Self-Consistency

"Solve 3 ways, report majority." Multiple approaches catch what a single analysis misses. Best for high-stakes decisions.

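The four techniques differ mainly in what gets attached to the question. A rough sketch of that mapping, with the instruction strings taken from the descriptions above; `with_cot` and the few-shot layout are assumptions for illustration:

```python
# Instruction strings are quoted from the technique descriptions above.
COT_INSTRUCTIONS = {
    "zero_shot": "Think step by step.",
    "step_back": "First identify key factors, then analyze.",
    "self_consistency": "Solve 3 ways, report majority.",
}


def with_cot(question: str, technique: str = "zero_shot", worked_example: str = "") -> str:
    """Attach a chain-of-thought instruction (or a worked example) to a question."""
    if technique == "few_shot":
        if not worked_example:
            raise ValueError("few_shot needs one worked example with its reasoning")
        # Assumed layout: example first, then the new question.
        return (
            f"Example with reasoning:\n{worked_example}\n\n"
            f"Now answer the same way:\n{question}"
        )
    return f"{question}\n\n{COT_INSTRUCTIONS[technique]}"


print(with_cot("Can this merchant afford a $50K credit line?"))
```
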
👀 Zero-Shot CoT: Before & After

Question: "Can this merchant afford a $50K credit line?"

❌ Without CoT

Yes, they can afford it. The merchant has sufficient revenue to cover the additional repayment.

⚠️ No math. No reasoning. Not auditable. Could be wrong.

✅ With "Think step by step"

Revenue: $120K
− Operating costs: $95K
= Free cash flow: $25K
− Existing loan: $15K
= Available: $10K
New repayment: $5K
DSCR = $10K ÷ $5K = 2.0x
✅ Affordable: DSCR above 1.25x threshold

✓ Every step visible. Math auditable. Conclusion backed by evidence.
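
Once the steps are written out, the arithmetic can be replayed in code. The sketch below follows the example's own formula (cash left after the existing loan, divided by the new repayment). A conventional DSCR divides operating income by total debt service. The second ratio below shows that variant, assuming the $15K existing-loan figure is debt service for the same period. The function name and return keys are hypothetical.

```python
def affordability(revenue, operating_costs, existing_loan, new_repayment, threshold=1.25):
    """Replay the step-by-step affordability check from the example above."""
    free_cash_flow = revenue - operating_costs        # 120K - 95K = 25K
    available = free_cash_flow - existing_loan        # 25K - 15K = 10K
    ratio = available / new_repayment                 # 10K / 5K = 2.0 (the example's ratio)
    # Assumption: conventional DSCR = operating income / total debt service.
    conventional_dscr = free_cash_flow / (existing_loan + new_repayment)
    return {
        "free_cash_flow": free_cash_flow,
        "available": available,
        "ratio": ratio,
        "conventional_dscr": conventional_dscr,
        "affordable": ratio > threshold,              # "above 1.25x" as in the example
    }


result = affordability(120_000, 95_000, 15_000, 5_000)
print(result)
```

With visible steps, a reviewer can see which definition of the ratio was applied and whether the conclusion depends on it.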

🗳️ Self-Consistency: 3 Approaches, 1 Answer

For high-stakes decisions, analyze from 3 independent angles and take the majority vote:

| Approach | Analysis | Conclusion |
| --- | --- | --- |
| 1. Velocity | 340 txns/day vs 75 norm (+353%). Burst at 2-4am. | 🔴 FRAUDULENT |
| 2. Amount | Avg $4.50 vs $15 norm (−70%). Micro-transactions = card testing. | 🔴 FRAUDULENT |
| 3. Geographic | All from Singapore IP. Consistent with merchant location. | 🟢 LEGITIMATE |

Majority: 2/3 FRAUDULENT. Approach 3 alone would have cleared this merchant. The majority vote catches what a single analysis misses.
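
The vote itself is simple to automate once each approach returns a verdict. A minimal sketch using the three conclusions from the table; `majority_verdict` is a hypothetical name, and the tie rule (no verdict without a strict majority) is an assumption the source does not cover:

```python
from collections import Counter


def majority_verdict(verdicts: dict[str, str]) -> tuple[str, str]:
    """Return (winning verdict, 'votes/total'); require a strict majority."""
    counts = Counter(verdicts.values())
    winner, votes = counts.most_common(1)[0]
    tally = f"{votes}/{len(verdicts)}"
    if votes * 2 <= len(verdicts):
        return "NO MAJORITY", tally   # assumed behaviour for ties: escalate to a human
    return winner, tally


verdicts = {
    "velocity": "FRAUDULENT",
    "amount": "FRAUDULENT",
    "geographic": "LEGITIMATE",
}
print(majority_verdict(verdicts))  # ('FRAUDULENT', '2/3')
```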

⚠️ Finance rule: Any decision that could be audited should use CoT. The reasoning trail IS your documentation.

👤 Role & Persona Prompting

Same data, dramatically different insights, just by changing who the AI "is." The AI was trained on documents written by many kinds of professionals. Assigning a persona steers it toward the vocabulary, priorities, and conventions of that profession.

The Persona Formula

You are [TITLE] at [COMPANY TYPE]
with [X years] of experience in [SPECIALTY].
You are known for [CHARACTERISTIC].
When [SITUATION], you always [BEHAVIOR].

💡 The last two fields, CHARACTERISTIC and BEHAVIOR, matter most. "Cautious" vs "opportunity-focused" produces completely different recommendations from the same data.
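
The formula maps directly onto a string template. In this sketch the template text mirrors the formula above, while the `persona` helper and all field values are made-up examples:

```python
# Template text mirrors the Persona Formula; field names are illustrative.
PERSONA_TEMPLATE = (
    "You are {title} at {company_type}\n"
    "with {years} years of experience in {specialty}.\n"
    "You are known for {characteristic}.\n"
    "When {situation}, you always {behavior}."
)


def persona(**fields: str) -> str:
    """Fill the persona formula; str.format raises KeyError if a field is missing."""
    return PERSONA_TEMPLATE.format(**fields)


# Hypothetical field values, chosen only to show the shape of the output.
print(persona(
    title="a senior risk analyst",
    company_type="a digital payments company",
    years="10",
    specialty="merchant credit risk",
    characteristic="a cautious reading of chargeback trends",
    situation="data is incomplete",
    behavior="state what is missing before giving a rating",
))
```

Swapping only `characteristic` and `behavior` is a quick way to compare a cautious persona against an opportunity-focused one on the same data.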

👀 Same Merchant, Different Eyes

Data: Merchant with 271% volume growth + chargeback rate rising from 0.3% to 4.1%

🛡️ Conservative Risk Analyst

RISK RATING: RED, IMMEDIATE ACTION

Chargeback rate at 4.1% is about 4x the 1.0% threshold. 271% growth without infrastructure investment raises fulfillment risk.

Action: Reduce PayLater limit immediately. Issue formal warning. 30-day enhanced monitoring.

📈 Growth Business Analyst

HIGH-GROWTH MERCHANT: MANAGE, DON'T EXIT

271% volume increase = strong product-market fit. Potential top-10 partner within 6 months. 4.1% chargebacks are a logistics issue (45% "item not received"), not fraud.

Action: Maintain limits. Offer delivery tracking integration. Assign dedicated success manager.

Both are valid. The conservative analyst sees risk to mitigate. The growth analyst sees opportunity to capture. Neither is wrong; they serve different audiences.

🤝 Multi-Agent Framing

Get 3 perspectives in one prompt, with no need to schedule 3 meetings:

| Perspective | Focus | Key finding |
| --- | --- | --- |
| 🛡️ Risk Manager | Default rate, exposure, regulation | "Doubling limits increases exposure by $12M" |
| 📊 Product Manager | Adoption, competition, revenue | "Current $500 limit is #1 reason for churn" |
| ⚖️ Compliance | Responsible lending, MAS guidelines | "MAS requires affordability assessment above $500" |

Synthesis: Proceed with phased rollout ($750 first) with income verification. Monitor default rate weekly. Full $1,000 after 90-day review.

🔍 The synthesis is where the real insight lives. No single perspective dominates, and the balanced recommendation is stronger than any individual view.
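
A multi-perspective prompt can be assembled from a small table of roles. In the sketch below the roles and focus areas come from the table above, while the function name, the prompt wording, and the proposal text (inferred from the $500 and $1,000 figures) are assumptions:

```python
# Roles and focus areas are taken from the table above.
PERSPECTIVES = {
    "Risk Manager": "default rate, exposure, regulation",
    "Product Manager": "adoption, competition, revenue",
    "Compliance": "responsible lending, MAS guidelines",
}


def multi_perspective_prompt(proposal: str, perspectives: dict[str, str] = PERSPECTIVES) -> str:
    """Ask for one finding per perspective, then a synthesis across all of them."""
    parts = [f"Proposal: {proposal}", "", "Assess the proposal from each perspective in turn:"]
    for i, (role, focus) in enumerate(perspectives.items(), 1):
        parts.append(f"{i}. As the {role}, focusing on {focus}, state your key finding.")
    parts += ["", "Then write a Synthesis that reconciles all perspectives into one recommendation."]
    return "\n".join(parts)


print(multi_perspective_prompt("Double the credit limit from $500 to $1,000"))
```

Asking for the synthesis as a separate final step keeps the individual findings distinct before they are reconciled.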

📋 Structured Outputs & RAG Grounding

Consistent format + grounded in YOUR data = production-safe outputs.

Why Structure Matters

❌ Unstructured = Conversation

Different every time. Hard to compare. Can't feed into systems. Requires human parsing.

✅ Structured = Form

Consistent format. Comparable across items. Machine-parseable. Scannable by busy stakeholders.

XML Tags for Claude

You use Claude daily (Cowork, Cursor, Kiro, Bedrock). Claude responds well to XML-style structure: Anthropic's prompting guidance recommends XML tags for separating the parts of a complex prompt.

| Pattern | Tags | When to use |
| --- | --- | --- |
| Data analysis | <data> <task> <format> | Analyzing reports, transactions, metrics |
| Document Q&A | <document> <question> <rules> | Policy lookups, compliance checks |
| Multi-step | <context> <step1> <step2> | Complex workflows, chained analysis |
| Review | <draft> <criteria> <instructions> | Reviewing reports, emails, proposals |

💡 Quick rule: If your prompt has more than 3 sections or includes pasted data → use XML tags. For simple one-line questions → plain text is fine.
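
Wrapping sections in tags is mechanical, so a small helper keeps prompts consistent. A minimal sketch of the data-analysis pattern; `xml_prompt` is a hypothetical name and the section contents are placeholders:

```python
def xml_prompt(sections: dict[str, str]) -> str:
    """Wrap each named section in matching XML-style tags, in the order given."""
    return "\n\n".join(
        f"<{tag}>\n{body.strip()}\n</{tag}>" for tag, body in sections.items()
    )


# Placeholder content; the tag names follow the "Data analysis" row above.
prompt = xml_prompt({
    "data": "[6-month transaction history goes here]",
    "task": "Assess chargeback risk for this merchant.",
    "format": "Risk rating, three findings, recommended actions.",
})
print(prompt)
```

Tag names are free-form labels here; what matters is that each opening tag has a matching close and that the instructions refer to sections by the same names.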

🔒 RAG Grounding: Only YOUR Data

Without grounding rules, the AI mixes its training data with your documents, and you can't trace which claims come from where. In an audit, that's a disaster.

CRITICAL GROUNDING RULES:
1. Base your answer ONLY on the provided documents
2. After each claim, cite: [Doc Name, Section]
3. If not in documents: "Not available in provided documents"
4. Do NOT use outside knowledge

❌ Without RAG Grounding

Q: "What happens if a customer misses 3 PayLater payments?"

The account is typically suspended and sent to collections. Late fees of around 1-2% per month are usually applied.

⚠️ Sounds plausible but NONE of this may be in your actual policy. "Typically" and "usually" are signs the model is drawing on general knowledge, not your documents.

✅ With RAG Grounding

Q: Same question + policy doc attached + grounding rules

Per PayLater Terms v3.2:
• After 3 missed payments → "Default" status [Section 8.3]
• Late fee: $5 or 1.5%, whichever is higher [Section 7.1]
• [INSUFFICIENT DATA: credit score reporting not in provided document]

✓ Every claim cites a section. Admits what it doesn't know. No hallucination.
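
Citations also make grounding checkable after the fact. The sketch below flags bullet lines that neither cite a section nor declare a gap. It assumes the `[Section N.N]` and `[INSUFFICIENT DATA: ...]` formats shown in the example answer. The helper name and regular expressions are illustrative, and a real check would also confirm that the cited section exists in the document.

```python
import re

# Assumed formats, taken from the example answer above.
CITATION = re.compile(r"\[Section \d+(?:\.\d+)*\]")
DECLARED_GAP = re.compile(r"^\[INSUFFICIENT DATA: .+\]$")
BULLETS = ("•", "-", "*")


def ungrounded_claims(answer: str) -> list[str]:
    """Return bullet lines that neither cite a section nor declare a data gap."""
    problems = []
    for line in answer.splitlines():
        stripped = line.strip()
        if not stripped.startswith(BULLETS):
            continue  # headers and prose lines are not checked in this sketch
        claim = stripped.lstrip("•-* ").strip()
        if CITATION.search(claim) or DECLARED_GAP.match(claim):
            continue
        problems.append(claim)
    return problems


grounded = """Per PayLater Terms v3.2:
• After 3 missed payments → "Default" status [Section 8.3]
• Late fee: $5 or 1.5%, whichever is higher [Section 7.1]
• [INSUFFICIENT DATA: credit score reporting not in provided document]"""

print(ungrounded_claims(grounded))                         # []
print(ungrounded_claims("• Late fees are usually 1-2%"))   # ['Late fees are usually 1-2%']
```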

⚠️ 7 Prompt Mistakes Everyone Makes

Recognize these patterns? Fix them with one-line additions to your prompt.

| Mistake | Why it hurts | Quick fix |
| --- | --- | --- |
| 🍳 The Kitchen Sink | Cramming 5 tasks into 1 prompt | One task per prompt, chain results |
| 📄 The Blank Canvas | No examples = AI guesses your format | Show 1-2 examples of desired output |
| 🙈 The Trust Fall | No grounding = confident hallucinations | "ONLY from provided data" |
| 🔍 The Vague Ask | "Analyze this": analyze what, how, for whom? | Specify audience, format, length |
| ⏱️ The One-Shot Wonder | Expecting perfection on first try | Plan for 2-3 refinement turns |
| 📋 The Copy-Paste Trap | Same prompt for different models | Tune syntax per model family |
| ⚙️ The Set-and-Forget | Never re-testing after model updates | Monthly prompt health checks |

🔄 The 3-Round Improvement Workflow

Every production-quality prompt goes through this cycle:

| Round | What you do | Result |
| --- | --- | --- |
| 1. Baseline | Write prompt using 4 pillars. Run 3 times. | See what AI gets right and wrong (~60% quality) |
| 2. Fix failures | Add negative constraints + example of good output. Run 3 more. | Consistency jumps to ~85% |
| 3. Polish | Add self-review step. Tighten format. Test edge cases. | Production-ready at ~95% |

💡 Total time: 15-20 minutes to go from first draft to production template. That template then saves hours every week.
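
"Run 3 times" is more useful when each round is scored the same way. One possible measure, sketched below, is the share of runs that contain every required section. The marker strings, the function name, and the sample outputs are assumptions for illustration, and the score covers format consistency only, not correctness.

```python
# Assumed markers: the sections a well-formed merchant review should contain.
REQUIRED_MARKERS = ("RISK RATING:", "Action:")


def format_pass_rate(outputs: list[str], required: tuple[str, ...] = REQUIRED_MARKERS) -> float:
    """Share of outputs that contain every required marker."""
    if not outputs:
        raise ValueError("need at least one output")
    passing = sum(all(marker in text for marker in required) for text in outputs)
    return passing / len(outputs)


# Hypothetical outputs from three runs of the same prompt.
runs = [
    "RISK RATING: AMBER\nAction: Issue warning letter",
    "RISK RATING: RED\nAction: Reduce PayLater limit",
    "The merchant seems to be doing okay.",
]
print(f"{format_pass_rate(runs):.0%}")  # 67%
```

Re-running the same check after each round shows whether a change to the prompt actually improved consistency.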

🚫 Tell the AI What NOT to Do

Negative constraints prevent common failure modes:

| Problem | Add this constraint |
| --- | --- |
| AI adds unsolicited opinions | "Do not include personal opinions or speculation" |
| AI uses data not in your input | "Do not reference any data outside the provided documents" |
| AI writes too much | "Do not exceed 300 words" |
| AI hedges everything | "Do not use phrases like 'it depends' or 'generally speaking'" |
| AI explains obvious things | "Do not explain what PayLater is or how digital wallets work" |
| AI invents numbers | "If a metric is not in the data, write [DATA NOT AVAILABLE]" |

🔍 Source: Claude's prompting best practices recommend telling Claude what to do instead of what not to do for general instructions. Negative constraints remain effective for preventing specific failure modes, especially in finance, where hallucinated numbers are dangerous. See: Claude Prompting Best Practices.
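
Some of these constraints can be verified automatically after the response comes back. The sketch below checks two of them, the 300-word limit and the banned hedging phrases. The names are hypothetical and the check is a plain substring match, so it only catches the exact phrases listed.

```python
# Limits and phrases are taken from the constraint table above.
MAX_WORDS = 300
BANNED_PHRASES = ("it depends", "generally speaking")


def constraint_violations(response: str) -> list[str]:
    """List the negative constraints that a response breaks."""
    violations = []
    word_count = len(response.split())
    if word_count > MAX_WORDS:
        violations.append(f"too long: {word_count} words (limit {MAX_WORDS})")
    lowered = response.lower()
    for phrase in BANNED_PHRASES:
        if phrase in lowered:
            violations.append(f"hedging phrase: '{phrase}'")
    return violations


print(constraint_violations("Generally speaking, it depends on the merchant."))
print(constraint_violations("Chargeback rate is 4.1%. [DATA NOT AVAILABLE] for refunds."))
```

A failed check can feed the next refinement turn: the violation text tells you which constraint to restate or tighten.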