The techniques that turn vague AI outputs into production-grade, auditable results, with interactive before/after comparisons and a live prompt builder.
80% of prompt quality comes from 4 fundamentals. Master these and every prompt you write will be dramatically better.
Say exactly what you mean. If a colleague would ask "what do you mean?", your prompt needs work.
Give the AI the background it needs: domain, data, situation, constraints. Without context, it guesses.
Tell the AI who to be. A "risk analyst" focuses on different signals than a "support agent."
Define what "done" looks like: format, length, structure, style. No framing = unpredictable output.
Context is the most impactful pillar for finance. Skip any type and the output suffers in a specific way:
| Type | What it tells the AI | If you skip it... | Finance example |
|---|---|---|---|
| Domain | Industry, market, business area | Generic, non-specific answers | "In the context of Southeast Asian digital payments..." |
| Data | Specific numbers, records, documents | AI hallucinates plausible numbers | "Here is the merchant's 6-month transaction history: [data]" |
| Situational | Why you need this now (the trigger) | Wrong tone, urgency, framing | "Preparing for quarterly board review" / "Flagged by monitoring" |
| Constraints | Rules, limits, requirements | Ignores your standards | "All amounts in SGD" / "Follow MAS Notice 626" |
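One way to make sure no context type gets skipped is to assemble the block in code and fail loudly when a slot is empty. The sketch below is illustrative only: `build_context` and its labels are hypothetical names, and the sample values are the finance examples from the table above.

```python
def build_context(domain: str, data: str, situation: str, constraints: list[str]) -> str:
    """Assemble the four context types into one labelled block.

    Hypothetical helper: raises instead of silently omitting a context type.
    """
    parts = {"Domain": domain, "Data": data, "Situation": situation}
    missing = [name for name, value in parts.items() if not value.strip()]
    if not constraints:
        missing.append("Constraints")
    if missing:
        raise ValueError(f"Missing context types: {', '.join(missing)}")

    lines = [f"{name}: {value.strip()}" for name, value in parts.items()]
    lines.append("Constraints:")
    lines.extend(f"- {rule}" for rule in constraints)
    return "\n".join(lines)


context = build_context(
    domain="Southeast Asian digital payments",
    data="Merchant's 6-month transaction history: [data]",
    situation="Flagged by monitoring",
    constraints=["All amounts in SGD", "Follow MAS Notice 626"],
)
print(context)
```

Raising on a missing slot is a design choice: it surfaces the gap before the prompt is sent, which is cheaper than diagnosing a generic answer afterwards.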
Prompt: "Review this merchant"
The merchant seems to be doing okay. Their transactions have been increasing which is good. There are some chargebacks but that's normal for this type of business. Overall they appear to be a reasonable merchant to work with.
⚠️ Vague, no data cited, no structure, no risk rating, not actionable
Prompt: Role + Context + Structure + Constraints
RISK RATING: 🟡 AMBER
Transaction Analysis: Volume surged 271% (4,200 → 15,600). Avg txn flat at $10 suggests organic growth.
Chargeback Alert: Rate 0.3% → 4.1%, roughly 4x the 1.0% benchmark.
Action: 1) Issue warning letter (Ops, 5 days) 2) Reduce PayLater limit (Risk, immediate)
✅ Structured, data-cited, specific actions with owners and timelines
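The structured output is only as trustworthy as the figures it cites. One option is to compute the headline metrics yourself and supply them as Data context, so the model cites numbers instead of deriving them. A minimal sketch using the figures from the example above; `pct_change` is a hypothetical helper name.

```python
def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100


# Figures taken from the merchant example above.
volume_growth = pct_change(4_200, 15_600)
chargeback_multiple = 4.1 / 1.0  # current rate divided by the 1.0% benchmark

summary = (
    f"Volume growth: {volume_growth:.0f}% (4,200 -> 15,600 transactions)\n"
    f"Chargeback rate: 4.1%, {chargeback_multiple:.1f}x the 1.0% benchmark"
)
print(summary)
```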
Financial decisions require multi-step logic. Chain-of-thought (CoT) prompting makes the reasoning visible and auditable: the AI shows its work.
Just add "Think step by step." No examples needed. Best for quick calculations and simple logic.
Provide one example with reasoning. The AI follows the exact same pattern. Best for consistent processes.
"First identify key factors, then analyze." Forces prioritization before writing. Best for complex analysis.
"Solve 3 ways, report majority." Multiple approaches catch what a single analysis misses. Best for high-stakes.
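The four variants differ only in what gets wrapped around the question, so they can be sketched as one small function. The style keys (`step_by_step`, `worked_example`, `plan_first`, `majority`) and the wrapper wording for the example case are assumptions made for illustration; the instruction strings are the ones quoted above.

```python
COT_SUFFIXES = {
    "step_by_step": "Think step by step.",
    "plan_first": "First identify key factors, then analyze.",
    "majority": "Solve 3 ways, report majority.",
}


def add_cot(question: str, style: str, worked_example: str = "") -> str:
    """Wrap a question with one of the four reasoning instructions."""
    if style == "worked_example":
        if not worked_example:
            raise ValueError("worked_example style needs one example with reasoning")
        # Example first, so the model sees the pattern before the new question.
        return f"Example:\n{worked_example}\n\nNow answer in the same pattern.\n{question}"
    return f"{question}\n\n{COT_SUFFIXES[style]}"


question = "Can this merchant afford a $50K credit line?"
print(add_cot(question, "step_by_step"))
```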
Question: "Can this merchant afford a $50K credit line?"
Yes, they can afford it. The merchant has sufficient revenue to cover the additional repayment.
⚠️ No math. No reasoning. Not auditable. Could be wrong.
Revenue: $120K
− Operating costs: $95K
= Free cash flow: $25K
− Existing loan: $15K
= Available: $10K
New repayment: $5K
DSCR = $10K ÷ $5K = 2.0x
Affordable: DSCR above 1.25x threshold
✅ Every step visible. Math auditable. Conclusion backed by evidence.
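Because every step is visible, the arithmetic can be re-checked outside the model. The sketch below reproduces the worked example as written, assuming all figures cover the same period and using the example's own DSCR definition (cash available after existing debt, divided by the new repayment). `affordability` is a hypothetical function name.

```python
def affordability(revenue: float, operating_costs: float, existing_loan: float,
                  new_repayment: float, threshold: float = 1.25) -> dict:
    """Recompute each step of the worked example so it can be audited."""
    free_cash_flow = revenue - operating_costs
    available = free_cash_flow - existing_loan
    dscr = available / new_repayment
    return {
        "free_cash_flow": free_cash_flow,
        "available": available,
        "dscr": dscr,
        "affordable": dscr >= threshold,
    }


result = affordability(revenue=120_000, operating_costs=95_000,
                       existing_loan=15_000, new_repayment=5_000)
print(result)
```

Comparing the model's stated DSCR against `result["dscr"]` is a cheap guard: if the two disagree, the reasoning chain has an arithmetic error somewhere.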
For high-stakes decisions, analyze from 3 independent angles and take the majority vote:
| Approach | Analysis | Conclusion |
|---|---|---|
| 1. Velocity | 340 txns/day vs 75 norm (+353%). Burst at 2-4am. | 🔴 FRAUDULENT |
| 2. Amount | Avg $4.50 vs $15 norm (−70%). Micro-transactions = card testing. | 🔴 FRAUDULENT |
| 3. Geographic | All from Singapore IP. Consistent with merchant location. | 🟢 LEGITIMATE |
Majority: 2/3 FRAUDULENT. Approach 3 alone would have cleared this merchant. The majority vote catches what a single analysis misses.
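The vote itself is simple enough to tally in code once each approach has returned a verdict. A minimal sketch: `majority_verdict` and the `NO MAJORITY` fallback are assumptions added for illustration, since the example above only covers the 2-to-1 case.

```python
from collections import Counter


def majority_verdict(verdicts: dict[str, str]) -> tuple[str, int]:
    """Return (label, votes) for the strict-majority verdict.

    Falls back to "NO MAJORITY" when no label wins more than half the votes.
    """
    if not verdicts:
        raise ValueError("need at least one verdict")
    label, votes = Counter(verdicts.values()).most_common(1)[0]
    if votes * 2 <= len(verdicts):
        return "NO MAJORITY", votes
    return label, votes


verdicts = {
    "velocity": "FRAUDULENT",
    "amount": "FRAUDULENT",
    "geographic": "LEGITIMATE",
}
label, votes = majority_verdict(verdicts)
print(f"Majority: {votes}/{len(verdicts)} {label}")  # Majority: 2/3 FRAUDULENT
```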
Same data, dramatically different insights, just by changing who the AI "is." The AI was trained on millions of documents written by different professionals. When you assign a persona, you activate that specific knowledge cluster.
Data: Merchant with 271% volume growth + chargeback rate rising from 0.3% to 4.1%
RISK RATING: RED - IMMEDIATE ACTION
Chargeback rate at 4.1% is roughly 4x the 1.0% threshold. 271% growth without infrastructure investment raises fulfillment risk.
Action: Reduce PayLater limit immediately. Issue formal warning. 30-day enhanced monitoring.
HIGH-GROWTH MERCHANT - MANAGE, DON'T EXIT
271% volume increase = strong product-market fit. Potential top-10 partner within 6 months. 4.1% chargebacks are a logistics issue (45% "item not received"), not fraud.
Action: Maintain limits. Offer delivery tracking integration. Assign dedicated success manager.
Both are valid. The conservative analyst sees risk to mitigate. The growth analyst sees opportunity to capture. Neither is wrong; they serve different audiences.
Get 3 perspectives in one prompt, with no need to schedule 3 meetings:
| Perspective | Focus | Key finding |
|---|---|---|
| 🛡️ Risk Manager | Default rate, exposure, regulation | "Doubling limits increases exposure by $12M" |
| Product Manager | Adoption, competition, revenue | "Current $500 limit is #1 reason for churn" |
| ⚖️ Compliance | Responsible lending, MAS guidelines | "MAS requires affordability assessment above $500" |
Synthesis: Proceed with phased rollout ($750 first) with income verification. Monitor default rate weekly. Full $1,000 after 90-day review.
Consistent format + grounded in YOUR data = production-safe outputs.
Different every time. Hard to compare. Can't feed into systems. Requires human parsing.
Consistent format. Comparable across items. Machine-parseable. Scannable by busy stakeholders.
You use Claude daily (Cowork, Cursor, Kiro, Bedrock). Claude responds well to XML-style structure: Anthropic's prompting guidance recommends XML tags for separating the parts of a complex prompt.
| Pattern | Tags | When to use |
|---|---|---|
| Data analysis | `<data>` `<task>` `<format>` | Analyzing reports, transactions, metrics |
| Document Q&A | `<document>` `<question>` `<rules>` | Policy lookups, compliance checks |
| Multi-step | `<context>` `<step1>` `<step2>` | Complex workflows, chained analysis |
| Review | `<draft>` `<criteria>` `<instructions>` | Reviewing reports, emails, proposals |
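Any of these patterns can be produced by a small helper that wraps each section in its tag. The sketch below shows the data-analysis pattern; `xml_prompt` is a hypothetical name, the section contents are illustrative, and the closing-tag check is an added safeguard rather than anything the patterns require.

```python
def xml_prompt(sections: dict[str, str]) -> str:
    """Wrap each section in an XML-style tag, in the order given."""
    blocks = []
    for tag, body in sections.items():
        # Pasted content containing the closing tag would end the section early.
        if f"</{tag}>" in body:
            raise ValueError(f"Content for <{tag}> contains its own closing tag")
        blocks.append(f"<{tag}>\n{body.strip()}\n</{tag}>")
    return "\n\n".join(blocks)


prompt = xml_prompt({
    "data": "Merchant's 6-month transaction history: [data]",
    "task": "Assess chargeback risk for this merchant.",
    "format": "Risk rating, then at most 3 actions with owner and timeline.",
})
print(prompt)
```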
Without grounding rules, the AI mixes its training data with your documents, so you can't trace which claims come from where. In an audit, that's a disaster.
Q: "What happens if a customer misses 3 PayLater payments?"
The account is typically suspended and sent to collections. Late fees of around 1-2% per month are usually applied.
⚠️ Sounds plausible but NONE of this may be in your actual policy. "Typically" and "usually" = hallucination.
Q: Same question + policy doc attached + grounding rules
Per PayLater Terms v3.2:
• After 3 missed payments → "Default" status [Section 8.3]
• Late fee: $5 or 1.5% whichever higher [Section 7.1]
• [INSUFFICIENT DATA: credit score reporting not in provided document]
✅ Every claim cites a section. Admits what it doesn't know. No hallucination.
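Grounding rules are easier to enforce when the output can be checked mechanically. The sketch below assumes the answer format shown above: bulleted claims that either end with a `[Section N.N]` citation or consist of an `[INSUFFICIENT DATA: ...]` marker. The wording of `GROUNDING_RULES` and the function name are illustrative, and the check ignores any line that is not a bullet.

```python
import re

# Illustrative wording; adapt to your own policy documents.
GROUNDING_RULES = (
    "Answer ONLY from the provided document.\n"
    "End every claim with its section, e.g. [Section 8.3].\n"
    "If the document does not cover something, write [INSUFFICIENT DATA: reason]."
)

CITATION = re.compile(r"\[Section \d+(\.\d+)*\]\s*$")
GAP = re.compile(r"^\[INSUFFICIENT DATA: .+\]$")


def ungrounded_claims(answer: str) -> list[str]:
    """Return bulleted claims that carry neither a citation nor a gap marker."""
    problems = []
    for line in answer.splitlines():
        if not line.startswith(("•", "-")):
            continue  # headers and free text are not checked
        claim = line.lstrip("•- ").strip()
        if not (CITATION.search(claim) or GAP.match(claim)):
            problems.append(claim)
    return problems


grounded = (
    "Per PayLater Terms v3.2:\n"
    '• After 3 missed payments → "Default" status [Section 8.3]\n'
    "• Late fee: $5 or 1.5% whichever higher [Section 7.1]\n"
    "• [INSUFFICIENT DATA: credit score reporting not in provided document]"
)
print(ungrounded_claims(grounded))  # []
```

A non-empty result is a signal to re-run or escalate, not proof of hallucination: the check only confirms that a citation is present, not that the cited section actually says what the claim states.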
Recognize these patterns? Fix them with one-line additions to your prompt.
| Mistake | Why it hurts | Quick fix |
|---|---|---|
| 🍳 The Kitchen Sink | Cramming 5 tasks into 1 prompt | One task per prompt, chain results |
| The Blank Canvas | No examples = AI guesses your format | Show 1-2 examples of desired output |
| The Trust Fall | No grounding = confident hallucinations | "ONLY from provided data" |
| The Vague Ask | "Analyze this": analyze what, how, for whom? | Specify audience, format, length |
| ⏱️ The One-Shot Wonder | Expecting perfection on first try | Plan for 2-3 refinement turns |
| The Copy-Paste Trap | Same prompt for different models | Tune syntax per model family |
| ⚙️ The Set-and-Forget | Never re-testing after model updates | Monthly prompt health checks |
Every production-quality prompt goes through this cycle:
| Round | What you do | Result |
|---|---|---|
| 1. Baseline | Write prompt using 4 pillars. Run 3 times. | See what AI gets right and wrong (~60% quality) |
| 2. Fix failures | Add negative constraints + example of good output. Run 3 more. | Consistency jumps to ~85% |
| 3. Polish | Add self-review step. Tighten format. Test edge cases. | Production-ready at ~95% |
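"Run 3 times" becomes measurable if each run is scored against a few explicit checks. The sketch below uses pass rate as one possible proxy for the quality percentages in the table, which is an assumption, not how those figures were derived. `fake_model` is a stand-in that replays canned outputs; in practice you would pass a function that calls your real model.

```python
import itertools
from typing import Callable


def pass_rate(run: Callable[[str], str], prompt: str,
              checks: list[Callable[[str], bool]], runs: int = 3) -> float:
    """Fraction of runs whose output passes every check."""
    passed = 0
    for _ in range(runs):
        output = run(prompt)
        if all(check(output) for check in checks):
            passed += 1
    return passed / runs


# Hypothetical stand-in for a model call: cycles through canned outputs.
_canned = itertools.cycle([
    "RISK RATING: AMBER\nAction: Issue warning letter",
    "The merchant seems to be doing okay.",
    "RISK RATING: RED\nAction: Reduce PayLater limit",
])


def fake_model(prompt: str) -> str:
    return next(_canned)


checks = [
    lambda out: out.startswith("RISK RATING:"),
    lambda out: "Action:" in out,
]
rate = pass_rate(fake_model, "Review this merchant", checks)
print(f"{rate:.0%} of runs passed")
```

Keeping the checks fixed across rounds is what makes the rounds comparable: only the prompt changes, so a higher pass rate can be attributed to the prompt edit.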
Negative constraints prevent common failure modes:
| Problem | Add this constraint |
|---|---|
| AI adds unsolicited opinions | "Do not include personal opinions or speculation" |
| AI uses data not in your input | "Do not reference any data outside the provided documents" |
| AI writes too much | "Do not exceed 300 words" |
| AI hedges everything | "Do not use phrases like 'it depends' or 'generally speaking'" |
| AI explains obvious things | "Do not explain what PayLater is or how digital wallets work" |
| AI invents numbers | "If a metric is not in the data, write [DATA NOT AVAILABLE]" |
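Some of these constraints can be verified after the fact as well as requested up front. A minimal sketch, assuming the 300-word limit and the banned phrases from the table; `with_constraints` and `violations` are hypothetical helper names, and the phrase check is a simple substring match.

```python
NEGATIVE_CONSTRAINTS = [
    "Do not exceed 300 words",
    "Do not use phrases like 'it depends' or 'generally speaking'",
    "If a metric is not in the data, write [DATA NOT AVAILABLE]",
]
BANNED_PHRASES = ("it depends", "generally speaking")


def with_constraints(prompt: str) -> str:
    """Append the negative constraints to a prompt as a bulleted block."""
    rules = "\n".join(f"- {rule}" for rule in NEGATIVE_CONSTRAINTS)
    return f"{prompt}\n\nConstraints:\n{rules}"


def violations(output: str, max_words: int = 300) -> list[str]:
    """List which checkable constraints an output breaks."""
    found = []
    if len(output.split()) > max_words:
        found.append(f"exceeds {max_words} words")
    lowered = output.lower()
    for phrase in BANNED_PHRASES:
        if phrase in lowered:
            found.append(f"banned phrase: {phrase}")
    return found


print(with_constraints("Summarize this merchant's chargeback trend."))
print(violations("It depends on the merchant."))  # ['banned phrase: it depends']
```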