
How AI Answers from Your Documents

See how Retrieval-Augmented Generation (RAG) lets LLMs answer questions using AnyCompany's policies, not just their training data.


🧠 The Problem: LLMs Only Know Their Training Data

Large language models are trained on public internet data up to a cutoff date. They don't know about AnyCompany's internal policies, your latest MAS circulars, or yesterday's merchant risk reports. Ask Claude about your chargeback threshold policy and it will guess — confidently, but incorrectly.

🧠 The Knowledge Gap

LLMs can't access your private docs, recent regulatory updates, or internal procedures. They'll fill gaps with plausible-sounding but potentially wrong information.

📎 The Solution: Retrieve, Then Generate

Instead of retraining the model, RAG retrieves relevant document chunks and pastes them into the prompt. The LLM reads the context and generates a grounded answer.

🔢 Powered by Embeddings

Embeddings turn text into numbers (vectors) so we can measure similarity. "Chargeback threshold" and "dispute rate limit" have similar vectors — even with different words.
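To make "similar vectors" concrete, here's a toy cosine-similarity check in Python. The four-dimensional vectors are invented for illustration; real embeddings come from a model and have hundreds of dimensions:

```python
# Toy illustration of embedding similarity. The vectors are made up;
# real embeddings come from a model and have far more dimensions.
import math

def cosine_similarity(a, b):
    """1.0 = pointing the same direction (similar meaning), ~0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

chargeback_threshold = [0.55, 0.72, 0.13, -0.28]  # hypothetical vector
dispute_rate_limit   = [0.53, 0.70, 0.15, -0.25]  # different words, close vector
office_lunch_menu    = [-0.40, 0.05, 0.88, 0.31]  # unrelated topic

print(cosine_similarity(chargeback_threshold, dispute_rate_limit))  # ~1.0: similar
print(cosine_similarity(chargeback_threshold, office_lunch_menu))   # much lower
```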

🏢 Why This Matters for Finance

RAG is how enterprise AI apps work today. It's the bridge between a general-purpose LLM and your MAS compliance manuals, merchant policies, and KYC procedures.

💡 You're already doing manual RAG. Every time you paste a policy document into your AI assistant and ask a question, you're performing the RAG pattern by hand. The pipeline just automates it across thousands of documents.

🔗 Connection to Day 1

Remember the Embeddings Explainer where you clicked a word and found its nearest neighbors in 3D space? That's exactly what RAG's retrieval step does — but with document chunks instead of single words.

✂️ Tokenizer → 🔢 Embeddings → 🔍 RAG Retrieval → 🧠 Transformer → Output

RAG sits between embeddings and the transformer — it uses embedding similarity to find relevant documents before the LLM generates.

⚙️ The RAG Pipeline — 6 Steps

Every RAG system follows the same pattern. Here's what happens when someone at AnyCompany asks: "What are our obligations if a merchant's chargeback rate exceeds 3%?"

❓ User Query → 🔢 Embed Query → 🔍 Search Vectors → 📄 Retrieve Chunks → 🧩 Build Prompt → ✨ Generate Answer
| Step | What Happens | AnyCompany Example |
|------|--------------|--------------------|
| ❓ User Query | Someone asks a question in natural language | "What are our obligations if a merchant's chargeback rate exceeds 3%?" |
| 🔢 Embed Query | Question is converted to a vector (numbers) | Query becomes [0.55, 0.72, 0.13, -0.28, 0.41, ...] |
| 🔍 Search Vectors | Find document chunks with similar vectors | Searches across all chunked policy documents for vectors close to the query |
| 📄 Retrieve Chunks | Top 3-5 most relevant chunks returned with scores | Merchant Risk Policy §4.2 (0.94), Chargeback Procedures §7.1 (0.87), MAS Notice PSN-06 §3 (0.82) |
| 🧩 Build Prompt | Assemble: instructions + retrieved chunks + question | "Given the following context: [chunk 1] [chunk 2] [chunk 3]. Answer: ..." |
| ✨ Generate Answer | LLM reads context and generates grounded answer | Cites specific sections, admits gaps, no hallucination |
🔗 The "Build Prompt" step is what you learned today. The grounding rules you practiced — "ONLY from provided documents, cite sections, admit gaps" — are exactly what goes into the assembled prompt. RAG automates the document retrieval; your prompt skills control the generation quality.

⚖️ RAG vs. The Alternatives

| Approach | How It Works | Pros | Cons | Best For |
|----------|--------------|------|------|----------|
| Paste Full Doc | Copy entire document into prompt | Simple, no setup | Context window limits, expensive (200 pages = ~50K tokens/call), slow | Quick one-off questions, small docs |
| RAG ✓ | Auto-retrieve relevant chunks, inject into prompt | Fresh data, cites sources, cost-efficient, no retraining | Needs infrastructure (vector DB, embeddings), chunking quality matters | Production Q&A over large/changing doc sets |
| Fine-Tuning | Retrain model on your data | Model "learns" your domain style | Expensive, slow to update, can't cite sources, needs ML team | Specialized language/style (not facts) |
💬 When your tech team asks "should we fine-tune or use RAG?" — for document Q&A (policies, compliance, procedures), RAG wins almost every time. Fine-tuning is for changing how the model writes, not what it knows.

🎮 Interactive RAG Demo

Watch the RAG pipeline process a query step-by-step. Select a scenario and press play. Drag the 3D vector space to rotate.


✂️ Chunking — The Hidden Quality Driver

Before documents enter the vector store, they're split into chunks — smaller pieces the system can search and retrieve. How you chunk determines whether the AI gets complete answers or broken fragments.

❌ Bad Chunking — Split at Fixed Length
Chunk 1:
...merchant risk rating.

4.2 Chargeback Thresholds

Chunk 2:
Merchants exceeding 3.0% chargeback rate 
shall be classified as RED and subject to 
immediate review by the Risk Team within 
48 hours. Merchants between 1.0-3.0% are 
classified AMBER with enhanced monitoring...

⚠️ The section header got split from its content. Chunk 1 matches "chargeback thresholds" but has no useful answer. Chunk 2 has the answer but may score lower because it lacks the header.

✅ Good Chunking — Respect Section Boundaries
Chunk:
4.2 Chargeback Thresholds

Merchants exceeding 3.0% chargeback rate 
shall be classified as RED and subject to 
immediate review by the Risk Team within 
48 hours. Merchants between 1.0-3.0% are 
classified AMBER with enhanced monitoring.
Below 1.0% is GREEN with standard quarterly 
review cycle.

✓ Header + content together. The retrieval system finds the right chunk and the LLM gets the complete answer with all three thresholds.

⚠️ Your input matters here. Your MAS compliance manuals have tables, numbered sections, and cross-references. If those get split across chunks, the AI gets fragments instead of complete answers. You know which sections belong together — your tech team needs that knowledge to configure chunking correctly.

📏 Chunking Strategies

| Strategy | How It Works | Best For | Watch Out |
|----------|--------------|----------|-----------|
| Fixed-size | Split every N characters/tokens | Simple, fast | Breaks mid-sentence, splits tables |
| Sentence-based | Split at sentence boundaries | General text | May split related sentences apart |
| Section-based ✓ | Split at document headings/sections | Structured docs (policies, manuals) | Sections may be too large or too small |
| Semantic | Use embeddings to find natural topic breaks | Unstructured text | More complex, slower |
| Overlap | Chunks share N tokens at boundaries | Reducing context loss at edges | Increases storage, may retrieve duplicates |
🏦 For AnyCompany's compliance documents: Section-based chunking with overlap is the sweet spot. Your MAS circulars and policy manuals have clear section numbers — chunk at those boundaries, with 2-3 sentence overlap to preserve cross-references.
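A sketch of that recommendation, assuming numbered headings like "4.2 Chargeback Thresholds" at the start of a line. Tune the regex and overlap size to your actual documents:

```python
# Section-based chunking with sentence overlap. Assumes sections begin with
# numbered headings like "4.2 Chargeback Thresholds" on their own line;
# adjust the regex and overlap size to your documents' real heading style.
import re

def chunk_by_section(text, overlap_sentences=2):
    # Split at newlines that are immediately followed by a section number ("4.2 ...")
    sections = re.split(r"\n(?=\d+\.\d+\s)", text)
    chunks = []
    for i, section in enumerate(sections):
        if i > 0:
            # Prepend the last few sentences of the previous section so
            # cross-references at the boundary aren't lost
            prev_sentences = re.split(r"(?<=[.!?])\s+", sections[i - 1].strip())
            section = " ".join(prev_sentences[-overlap_sentences:]) + "\n" + section
        chunks.append(section.strip())
    return chunks
```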

🛠️ RAG for Business Users — No Infrastructure Needed

The pipeline demo shows how enterprise RAG systems work at scale. But you don't need a vector database to do RAG today. When you use Kiro or any AI assistant, the "RAG" you'll actually do is simpler — and just as powerful for your daily work.

🚫 What You DON'T Need

No vector database. No embeddings pipeline. No chunking configuration. No infrastructure. No tech team involvement.

✅ What You DO

Convert your documents to clean Markdown. Drop them in your workspace. Write a grounding prompt. The AI reads the full file.

📋 The 3-Step Workflow

1. Convert PDF → Markdown

PDFs are terrible for AI. Headers become random text, tables lose structure, columns merge, page numbers get injected mid-sentence. Markdown preserves the hierarchy so the AI can navigate your document.

💡 In Kiro: Drag a PDF into chat and ask: "Convert this PDF to clean Markdown. Preserve all headings, tables, and section numbers." Kiro will produce a structured .md file you can save to your workspace.
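If you have a whole folder of PDFs, you can also script the conversion. A sketch assuming the pymupdf4llm package (one of several PDF-to-Markdown tools); whichever tool you use, spot-check tables and section numbers in the output:

```python
# Batch-convert policy PDFs to Markdown, assuming pymupdf4llm is installed
# (pip install pymupdf4llm). The folder names are placeholders.
import pathlib
import pymupdf4llm

out_dir = pathlib.Path("policies")
out_dir.mkdir(exist_ok=True)

for pdf in pathlib.Path("policies-pdf").glob("*.pdf"):
    md_text = pymupdf4llm.to_markdown(str(pdf))  # headings and tables -> Markdown
    (out_dir / (pdf.stem + ".md")).write_text(md_text, encoding="utf-8")
```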
2. Add Files to Your Workspace

Place the .md files in your project folder. In Kiro, you can reference them with #File in chat, or the AI can read them directly from your workspace when you ask questions.

your-workspace/
├── policies/
│   ├── merchant-risk-policy.md
│   ├── paylater-terms-v3.md
│   ├── mas-notice-psn06.md
│   └── kyc-aml-procedures.md
├── .kiro/steering/
│   └── grounding-rules.md
└── ...
3. Ask with Grounding Rules

Reference the file and add grounding constraints. The AI reads the entire document into its context window — no chunking, no retrieval step. It's all in memory.

PROMPT:
Read #merchant-risk-policy.md

Answer ONLY from this document.
Cite [Section X.X] after each claim.
If not in the document, say "Not found."

Question: What are our obligations when a
merchant's chargeback rate exceeds 3%?

Why PDF → Markdown Makes a Huge Difference

❌ Raw PDF (what the AI sees)
4.2 Chargeback Thresholds
Merchants exceeding 3.0% chargeback
rate shall be classified as RED and
subject to immediate review by the
Risk Team within 48 hours. Merchants
Page 23 of 156
between 1.0-3.0% are classified
AMBER with enhanced monitoring.
Table 4.1: Threshold Summary
GREEN AMBER RED
≤1.0% 1.0-3.0% >3.0%
Quarterly Enhanced Immediate

⚠️ Page number injected mid-paragraph. Table structure lost. AI may misread thresholds.

✅ Clean Markdown (what the AI sees)
## 4.2 Chargeback Thresholds

Merchants exceeding 3.0% chargeback rate
shall be classified as RED and subject to
immediate review by the Risk Team within
48 hours. Merchants between 1.0-3.0% are
classified AMBER with enhanced monitoring.

| Rating | Threshold | Review Cycle |
|--------|-----------|-------------|
| GREEN  | ≤1.0%     | Quarterly   |
| AMBER  | 1.0-3.0%  | Enhanced    |
| RED    | >3.0%     | Immediate   |

✓ Clean heading. Table preserved. No page artifacts. AI reads it perfectly.

⚠️ The #1 optimization you can do today: Convert your most-used policy documents from PDF to Markdown. One hour of conversion buys hundreds of hours of better AI answers. Focus on documents with tables, numbered sections, and cross-references — those break the worst in PDF.

📝 Why Markdown? The AI's Preferred Language

You've noticed: steering files, SKILL.md, prompt templates, RAG documents — everything in this workshop is .md. That's not a coincidence. Markdown is the format AI understands best.

🏗️ Structure Without Overhead

## headings, | tables, - lists give the AI a document hierarchy to navigate — with zero parsing complexity (unlike HTML, XML, or JSON).

🪙 Token-Efficient

## 4.2 Chargeback Thresholds = ~8 tokens. The HTML equivalent = ~20 tokens. When your context window is limited, every token counts. (A quick check follows these cards.)

👥 Three Audiences, One Format

Your compliance officer can read it. The AI can parse it. Your tech team can version it in git. No other format serves all three.

🧠 LLMs Were Trained On It

GitHub, Stack Overflow, documentation sites — the training data is saturated with Markdown. Models understand its conventions natively.
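You can sanity-check the token-cost claim from the 🪙 card yourself. A quick sketch assuming the tiktoken library and its cl100k_base encoding (exact counts vary by tokenizer, and the HTML string here is a made-up equivalent):

```python
# Compare token counts for the same heading in Markdown vs. HTML.
# Assumes tiktoken (pip install tiktoken); counts differ per tokenizer,
# and the HTML markup shown is a hypothetical equivalent.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

markdown = "## 4.2 Chargeback Thresholds"
html = '<h2 class="section">4.2 Chargeback Thresholds</h2>'

print(len(enc.encode(markdown)))  # the Markdown heading: a handful of tokens
print(len(enc.encode(html)))      # the HTML version pays for every tag character
```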

| Format | Human Readable | AI Parseable | Token Cost | Versionable | Verdict |
|--------|----------------|--------------|------------|-------------|---------|
| PDF | ✅ Great | ❌ Terrible | N/A (binary) | ❌ No | Convert away from |
| Word (.docx) | ✅ Good | ⚠️ Needs extraction | N/A (binary) | ❌ No | OK for drafting |
| HTML | ⚠️ With browser | ✅ Good | 🔴 High (tags) | ✅ Yes | Too verbose |
| JSON | ❌ Hard | ✅ Great | 🟡 Medium | ✅ Yes | For data, not docs |
| Markdown ✓ | ✅ Great | ✅ Great | 🟢 Low | ✅ Yes | Best for AI docs |
🔗 The pattern across all 3 days:
Day 1: You learn that tokens cost money → Markdown is token-efficient
Day 2: You learn grounding and RAG → Markdown preserves document structure for accurate answers
Day 3: You create SKILL.md, steering files, and agent configs → all Markdown because it's the format AI tools read natively

Markdown isn't just a file format — it's the interface layer between you and AI.

🔄 How This Connects to the Full Pipeline

Your 3-step workflow and the enterprise RAG pipeline solve the same problem — they just operate at different scales:

| Step | Your Workflow (Kiro) | Enterprise Pipeline (Bedrock KB) |
|------|----------------------|----------------------------------|
| Prepare docs | You convert PDF → Markdown manually | Automated ingestion + chunking |
| Store docs | Files in your workspace folder | Vector database (embeddings) |
| Find relevant info | You reference the right file with #File | Similarity search retrieves top chunks |
| Ground the answer | Grounding rules in your prompt | Same grounding rules, automated |
| Scale | 1-5 documents at a time (context window limit) | Thousands of documents, auto-retrieved |
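For a feel of the right-hand column, here is roughly what the tech team's side can look like: a hedged sketch assuming boto3's bedrock-agent-runtime client and an already-created Knowledge Base, with placeholder IDs and model ARN:

```python
# Level-3 RAG: query a Bedrock Knowledge Base. Assumes boto3 and an existing
# Knowledge Base; knowledgeBaseId and modelArn below are placeholders.
import boto3

client = boto3.client("bedrock-agent-runtime", region_name="ap-southeast-1")

response = client.retrieve_and_generate(
    input={"text": "What are our obligations if a merchant's "
                   "chargeback rate exceeds 3%?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "YOUR-KB-ID",
            "modelArn": "arn:aws:bedrock:ap-southeast-1::foundation-model/...",
        },
    },
)

print(response["output"]["text"])  # grounded answer
print(response["citations"])       # retrieved chunks backing each claim
```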
🎯 The key insight: Your prompt skills (grounding rules, citation requirements, gap admission) are the same whether you're doing manual RAG in Kiro or your tech team builds a full pipeline. The quality of the answer depends on the quality of your prompt — not the infrastructure.

🎯 Three Levels of RAG

RAG isn't all-or-nothing. You're already at Level 1. Today you learn Level 2. Your tech team builds Level 3.

Level 1: Manual RAG
Copy-paste a policy section into your AI assistant, ask a question. You select the document, you paste the context.
👤 You do this today

Level 2: Prompt-Level RAG
Write grounding rules: "ONLY from provided documents, cite sections, admit gaps." The prompt controls quality.
📝 Today's skill (Day 2 M4)

Level 3: System-Level RAG
Bedrock Knowledge Base auto-retrieves from your document library. Chunking, embedding, and search happen automatically.
🔧 Tech team builds this
🔗 Day 3 connection: On Day 3, when we cover MCP (Model Context Protocol), you'll see how Kiro connects to databases and document stores. MCP is the plumbing that makes Level 3 RAG possible — the AI queries your systems directly instead of you pasting documents.

🏦 RAG Use Cases at AnyCompany

| Use Case | Documents | Who Benefits | RAG Level |
|----------|-----------|--------------|-----------|
| Policy Q&A | PayLater Terms, Merchant Onboarding, KYC/AML procedures | Operations, Compliance | Level 2-3 |
| Regulatory Impact | MAS/BNM/OJK circulars, internal compliance memos | Compliance, Legal | Level 3 |
| Merchant Risk Review | Risk policies, chargeback thresholds, historical assessments | Risk Team | Level 2-3 |
| Invoice Dispute Resolution | PTP procedures, vendor contracts, dispute history | PTP Team | Level 3 |
| Audit Preparation | RTR procedures, financial reporting standards, past audit findings | RTR, Finance CoE | Level 3 |
| New Hire Onboarding | Process manuals, team wikis, training materials | All teams | Level 2-3 |

Common Questions

Can we use RAG with our actual MAS compliance documents?

Yes — Bedrock Knowledge Bases supports PDF, Word, HTML. Your tech team uploads the documents, configures chunking, and exposes it as an API. You define which documents to include and review the output quality.

How is this different from just searching SharePoint?

SharePoint search matches keywords. RAG matches meaning. "Chargeback obligations" would find documents about "dispute resolution requirements" even if they don't use the word "chargeback." Plus, RAG doesn't just find the document — it reads it and generates an answer.

What about data security?

With Bedrock Knowledge Bases, documents stay in your AWS account. Embeddings are stored in your own vector database. Nothing leaves your environment. This is why AWS-hosted RAG is preferred over public tools for regulated industries.

How accurate is it? Can we trust it for compliance?

RAG dramatically reduces hallucination but doesn't eliminate it. That's why the grounding prompt rules are critical: cite sources, admit gaps, no outside knowledge. For compliance, always use RAG + human review (Level 2 autonomy from Day 3). The AI drafts, the human verifies.