
How AI Answers from Your Documents

See how Retrieval-Augmented Generation (RAG) lets LLMs answer questions using AnyCompany's policies, not just their training data.


🧠 The Problem: LLMs Only Know Their Training Data

Large language models are trained on public internet data up to a cutoff date. They don't know about AnyCompany's internal policies, your latest MAS circulars, or yesterday's merchant risk reports. Ask Claude about your chargeback threshold policy and it will guess — confidently, but incorrectly.

🧠 The Knowledge Gap

LLMs can't access your private docs, recent regulatory updates, or internal procedures. They'll fill gaps with plausible-sounding but potentially wrong information.

📎 The Solution: Retrieve, Then Generate

Instead of retraining the model, RAG retrieves relevant document chunks and pastes them into the prompt. The LLM reads the context and generates a grounded answer.

🔢 Powered by Embeddings

Embeddings turn text into numbers (vectors) so we can measure similarity. "Chargeback threshold" and "dispute rate limit" have similar vectors — even with different words.
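To make "similar vectors" concrete, here's a toy cosine-similarity check in Python. The four-dimensional vectors are invented for illustration; real embeddings come from a model and have hundreds of dimensions:

```python
# Toy illustration of embedding similarity. The vectors are made up;
# real embeddings come from a model and have far more dimensions.
import math

def cosine_similarity(a, b):
    """1.0 = pointing the same direction (similar meaning), ~0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

chargeback_threshold = [0.55, 0.72, 0.13, -0.28]  # hypothetical vector
dispute_rate_limit   = [0.53, 0.70, 0.15, -0.25]  # different words, close vector
office_lunch_menu    = [-0.40, 0.05, 0.88, 0.31]  # unrelated topic

print(cosine_similarity(chargeback_threshold, dispute_rate_limit))  # ~1.0: similar
print(cosine_similarity(chargeback_threshold, office_lunch_menu))   # much lower
```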

🏢 Why This Matters for Finance

RAG is how enterprise AI apps work today. It's the bridge between a general-purpose LLM and your MAS compliance manuals, merchant policies, and KYC procedures.

💡 You're already doing manual RAG. Every time you paste a policy document into your AI assistant and ask a question, you're performing the RAG pattern by hand. The pipeline just automates it across thousands of documents.

🔗 Connection to Day 1

Remember the Embeddings Explainer where you clicked a word and found its nearest neighbors in 3D space? That's exactly what RAG's retrieval step does — but with document chunks instead of single words.

✂️ Tokenizer → 🔢 Embeddings → 🔍 RAG Retrieval → 🧠 Transformer → Output

RAG sits between embeddings and the transformer — it uses embedding similarity to find relevant documents before the LLM generates.

⚙️ The RAG Pipeline — 6 Steps

Every RAG system follows the same pattern. Here's what happens when someone at AnyCompany asks: "What are our obligations if a merchant's chargeback rate exceeds 3%?"

❓ User Query → 🔢 Embed Query → 🔍 Search Vectors → 📄 Retrieve Chunks → 🧩 Build Prompt → ✨ Generate Answer
| Step | What Happens | AnyCompany Example |
|------|--------------|--------------------|
| ❓ User Query | Someone asks a question in natural language | "What are our obligations if a merchant's chargeback rate exceeds 3%?" |
| 🔢 Embed Query | Question is converted to a vector (numbers) | Query becomes [0.55, 0.72, 0.13, -0.28, 0.41, ...] |
| 🔍 Search Vectors | Find document chunks with similar vectors | Searches across all chunked policy documents for vectors close to the query |
| 📄 Retrieve Chunks | Top 3-5 most relevant chunks returned with scores | Merchant Risk Policy §4.2 (0.94), Chargeback Procedures §7.1 (0.87), MAS Notice PSN-06 §3 (0.82) |
| 🧩 Build Prompt | Assemble: instructions + retrieved chunks + question | "Given the following context: [chunk 1] [chunk 2] [chunk 3]. Answer: ..." |
| ✨ Generate Answer | LLM reads context and generates grounded answer | Cites specific sections, admits gaps, no hallucination |
🔗 The "Build Prompt" step is what you learned today. The grounding rules you practiced — "ONLY from provided documents, cite sections, admit gaps" — are exactly what goes into the assembled prompt. RAG automates the document retrieval; your prompt skills control the generation quality.

⚖️ RAG vs. The Alternatives

| Approach | How It Works | Pros | Cons | Best For |
|----------|--------------|------|------|----------|
| Paste Full Doc | Copy entire document into prompt | Simple, no setup | Context window limits, expensive (200 pages = ~50K tokens/call), slow | Quick one-off questions, small docs |
| RAG ✓ | Auto-retrieve relevant chunks, inject into prompt | Fresh data, cites sources, cost-efficient, no retraining | Needs infrastructure (vector DB, embeddings), chunking quality matters | Production Q&A over large/changing doc sets |
| Fine-Tuning | Retrain model on your data | Model "learns" your domain style | Expensive, slow to update, can't cite sources, needs ML team | Specialized language/style (not facts) |
💬 When your tech team asks "should we fine-tune or use RAG?" — for document Q&A (policies, compliance, procedures), RAG wins almost every time. Fine-tuning is for changing how the model writes, not what it knows.

🎮 Interactive RAG Demo

Watch the RAG pipeline process a query step-by-step. Select a scenario and press play. Drag the 3D vector space to rotate.


✂️ Chunking — The Hidden Quality Driver

Before documents enter the vector store, they're split into chunks — smaller pieces the system can search and retrieve. How you chunk determines whether the AI gets complete answers or broken fragments.

❌ Bad Chunking — Split at Fixed Length
Chunk 1:
...merchant risk rating.

4.2 Chargeback Thresholds

Chunk 2:
Merchants exceeding 3.0% chargeback rate 
shall be classified as RED and subject to 
immediate review by the Risk Team within 
48 hours. Merchants between 1.0-3.0% are 
classified AMBER with enhanced monitoring...

⚠️ The section header got split from its content. Chunk 1 matches "chargeback thresholds" but has no useful answer. Chunk 2 has the answer but may score lower because it lacks the header.

✅ Good Chunking — Respect Section Boundaries
Chunk:
4.2 Chargeback Thresholds

Merchants exceeding 3.0% chargeback rate 
shall be classified as RED and subject to 
immediate review by the Risk Team within 
48 hours. Merchants between 1.0-3.0% are 
classified AMBER with enhanced monitoring.
Below 1.0% is GREEN with standard quarterly 
review cycle.

✓ Header + content together. The retrieval system finds the right chunk and the LLM gets the complete answer with all three thresholds.

⚠️ Your input matters here. Your MAS compliance manuals have tables, numbered sections, and cross-references. If those get split across chunks, the AI gets fragments instead of complete answers. You know which sections belong together — your tech team needs that knowledge to configure chunking correctly.

📏 Chunking Strategies

| Strategy | How It Works | Best For | Watch Out |
|----------|--------------|----------|-----------|
| Fixed-size | Split every N characters/tokens | Simple, fast | Breaks mid-sentence, splits tables |
| Sentence-based | Split at sentence boundaries | General text | May split related sentences apart |
| Section-based ✓ | Split at document headings/sections | Structured docs (policies, manuals) | Sections may be too large or too small |
| Semantic | Use embeddings to find natural topic breaks | Unstructured text | More complex, slower |
| Overlap | Chunks share N tokens at boundaries | Reducing context loss at edges | Increases storage, may retrieve duplicates |
🏦 For AnyCompany's compliance documents: Section-based chunking with overlap is the sweet spot. Your MAS circulars and policy manuals have clear section numbers — chunk at those boundaries, with 2-3 sentence overlap to preserve cross-references.
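A sketch of that recommendation, assuming numbered headings like "4.2 Chargeback Thresholds" at the start of a line. Tune the regex and overlap size to your actual documents:

```python
# Section-based chunking with sentence overlap. Assumes sections begin with
# numbered headings like "4.2 Chargeback Thresholds" on their own line;
# adjust the regex and overlap size to your documents' real heading style.
import re

def chunk_by_section(text, overlap_sentences=2):
    # Split at newlines that are immediately followed by a section number ("4.2 ...")
    sections = re.split(r"\n(?=\d+\.\d+\s)", text)
    chunks = []
    for i, section in enumerate(sections):
        if i > 0:
            # Prepend the last few sentences of the previous section so
            # cross-references at the boundary aren't lost
            prev_sentences = re.split(r"(?<=[.!?])\s+", sections[i - 1].strip())
            section = " ".join(prev_sentences[-overlap_sentences:]) + "\n" + section
        chunks.append(section.strip())
    return chunks
```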

🛠️ RAG for Business Users — No Infrastructure Needed

The pipeline demo shows how enterprise RAG systems work at scale. But you don't need a vector database to do RAG today. When you use Kiro or any AI assistant, the "RAG" you'll actually do is simpler — and just as powerful for your daily work.

🚫 What You DON'T Need

No vector database. No embeddings pipeline. No chunking configuration. No infrastructure. No tech team involvement.

✅ What You DO

Convert your documents to clean Markdown. Drop them in your workspace. Write a grounding prompt. The AI reads the full file.

📋 The 3-Step Workflow

1. Convert PDF → Markdown

PDFs are terrible for AI. Headers become random text, tables lose structure, columns merge, page numbers get injected mid-sentence. Markdown preserves the hierarchy so the AI can navigate your document.

💡 In Kiro: Drag a PDF into chat and ask: "Convert this PDF to clean Markdown. Preserve all headings, tables, and section numbers." Kiro will produce a structured .md file you can save to your workspace.
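If you have a whole folder of PDFs, you can also script the conversion. A sketch assuming the pymupdf4llm package (one of several PDF-to-Markdown tools); whichever tool you use, spot-check tables and section numbers in the output:

```python
# Batch-convert policy PDFs to Markdown, assuming pymupdf4llm is installed
# (pip install pymupdf4llm). The folder names are placeholders.
import pathlib
import pymupdf4llm

out_dir = pathlib.Path("policies")
out_dir.mkdir(exist_ok=True)

for pdf in pathlib.Path("policies-pdf").glob("*.pdf"):
    md_text = pymupdf4llm.to_markdown(str(pdf))  # headings and tables -> Markdown
    (out_dir / (pdf.stem + ".md")).write_text(md_text, encoding="utf-8")
```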
2. Add Files to Your Workspace

Place the .md files in your project folder. In Kiro, you can reference them with #File in chat, or the AI can read them directly from your workspace when you ask questions.

your-workspace/
├── policies/
│   ├── merchant-risk-policy.md
│   ├── paylater-terms-v3.md
│   ├── mas-notice-psn06.md
│   └── kyc-aml-procedures.md
├── .kiro/steering/
│   └── grounding-rules.md
└── ...
3. Ask with Grounding Rules

Reference the file and add grounding constraints. The AI reads the entire document into its context window — no chunking, no retrieval step. It's all in memory.

PROMPT:
Read #merchant-risk-policy.md

Answer ONLY from this document.
Cite [Section X.X] after each claim.
If not in the document, say "Not found."

Question: What are our obligations when a
merchant's chargeback rate exceeds 3%?

Why PDF → Markdown Makes a Huge Difference

❌ Raw PDF (what the AI sees)
4.2 Chargeback Thresholds
Merchants exceeding 3.0% chargeback
rate shall be classified as RED and
subject to immediate review by the
Risk Team within 48 hours. Merchants
Page 23 of 156
between 1.0-3.0% are classified
AMBER with enhanced monitoring.
Table 4.1: Threshold Summary
GREEN AMBER RED
≤1.0% 1.0-3.0% >3.0%
Quarterly Enhanced Immediate

⚠️ Page number injected mid-paragraph. Table structure lost. AI may misread thresholds.

✅ Clean Markdown (what the AI sees)
## 4.2 Chargeback Thresholds

Merchants exceeding 3.0% chargeback rate
shall be classified as RED and subject to
immediate review by the Risk Team within
48 hours. Merchants between 1.0-3.0% are
classified AMBER with enhanced monitoring.

| Rating | Threshold | Review Cycle |
|--------|-----------|-------------|
| GREEN  | ≤1.0%     | Quarterly   |
| AMBER  | 1.0-3.0%  | Enhanced    |
| RED    | >3.0%     | Immediate   |

✓ Clean heading. Table preserved. No page artifacts. AI reads it perfectly.

⚠️ The #1 optimization you can do today: Convert your most-used policy documents from PDF to Markdown. One hour of conversion buys hundreds of hours of better AI answers. Focus on documents with tables, numbered sections, and cross-references — those break the worst in PDF.

📝 Why Markdown? The AI's Preferred Language

You've noticed: steering files, SKILL.md, prompt templates, RAG documents — everything in this workshop is .md. That's not a coincidence. Markdown is the format AI understands best.

🏗️ Structure Without Overhead

## headings, | tables, - lists give the AI a document hierarchy to navigate — with zero parsing complexity (unlike HTML, XML, or JSON).

🪙 Token-Efficient

## 4.2 Chargeback Thresholds = ~8 tokens. The HTML equivalent = ~20 tokens. When your context window is limited, every token counts. (A quick check follows these cards.)

👥 Three Audiences, One Format

Your compliance officer can read it. The AI can parse it. Your tech team can version it in git. No other format serves all three.

🧠 LLMs Were Trained On It

GitHub, Stack Overflow, documentation sites — the training data is saturated with Markdown. Models understand its conventions natively.
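You can sanity-check the token-cost claim from the 🪙 card yourself. A quick sketch assuming the tiktoken library and its cl100k_base encoding (exact counts vary by tokenizer, and the HTML string here is a made-up equivalent):

```python
# Compare token counts for the same heading in Markdown vs. HTML.
# Assumes tiktoken (pip install tiktoken); counts differ per tokenizer,
# and the HTML markup shown is a hypothetical equivalent.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

markdown = "## 4.2 Chargeback Thresholds"
html = '<h2 class="section">4.2 Chargeback Thresholds</h2>'

print(len(enc.encode(markdown)))  # the Markdown heading: a handful of tokens
print(len(enc.encode(html)))      # the HTML version pays for every tag character
```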

| Format | Human Readable | AI Parseable | Token Cost | Versionable | Verdict |
|--------|----------------|--------------|------------|-------------|---------|
| PDF | ✅ Great | ❌ Terrible | N/A (binary) | ❌ No | Convert away from |
| Word (.docx) | ✅ Good | ⚠️ Needs extraction | N/A (binary) | ❌ No | OK for drafting |
| HTML | ⚠️ With browser | ✅ Good | 🔴 High (tags) | ✅ Yes | Too verbose |
| JSON | ❌ Hard | ✅ Great | 🟡 Medium | ✅ Yes | For data, not docs |
| Markdown ✓ | ✅ Great | ✅ Great | 🟢 Low | ✅ Yes | Best for AI docs |
🔗 The pattern across all 3 days:
Day 1: You learn that tokens cost money → Markdown is token-efficient
Day 2: You learn grounding and RAG → Markdown preserves document structure for accurate answers
Day 3: You create SKILL.md, steering files, and agent configs → all Markdown because it's the format AI tools read natively

Markdown isn't just a file format — it's the interface layer between you and AI.

🔄 How This Connects to the Full Pipeline

Your 3-step workflow and the enterprise RAG pipeline solve the same problem — they just operate at different scales:

| Step | Your Workflow (Kiro) | Enterprise Pipeline (Bedrock KB) |
|------|----------------------|----------------------------------|
| Prepare docs | You convert PDF → Markdown manually | Automated ingestion + chunking |
| Store docs | Files in your workspace folder | Vector database (embeddings) |
| Find relevant info | You reference the right file with #File | Similarity search retrieves top chunks |
| Ground the answer | Grounding rules in your prompt | Same grounding rules, automated |
| Scale | 1-5 documents at a time (context window limit) | Thousands of documents, auto-retrieved |
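For a feel of the right-hand column, here is roughly what the tech team's side can look like: a hedged sketch assuming boto3's bedrock-agent-runtime client and an already-created Knowledge Base, with placeholder IDs and model ARN:

```python
# Level-3 RAG: query a Bedrock Knowledge Base. Assumes boto3 and an existing
# Knowledge Base; knowledgeBaseId and modelArn below are placeholders.
import boto3

client = boto3.client("bedrock-agent-runtime", region_name="ap-southeast-1")

response = client.retrieve_and_generate(
    input={"text": "What are our obligations if a merchant's "
                   "chargeback rate exceeds 3%?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "YOUR-KB-ID",
            "modelArn": "arn:aws:bedrock:ap-southeast-1::foundation-model/...",
        },
    },
)

print(response["output"]["text"])  # grounded answer
print(response["citations"])       # retrieved chunks backing each claim
```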
🎯 The key insight: Your prompt skills (grounding rules, citation requirements, gap admission) are the same whether you're doing manual RAG in Kiro or your tech team builds a full pipeline. The quality of the answer depends on the quality of your prompt — not the infrastructure.

🎯 Three Levels of RAG

RAG isn't all-or-nothing. You're already at Level 1. Today you learn Level 2. Your tech team builds Level 3.

Level 1: Manual RAG
Copy-paste a policy section into your AI assistant, ask a question. You select the document, you paste the context.
👤 You do this today

Level 2: Prompt-Level RAG
Write grounding rules: "ONLY from provided documents, cite sections, admit gaps." The prompt controls quality.
📝 Today's skill (Day 2 M4)

Level 3: System-Level RAG
Bedrock Knowledge Base auto-retrieves from your document library. Chunking, embedding, and search happen automatically.
🔧 Tech team builds this
🔗 Day 3 connection: On Day 3, when we cover MCP (Model Context Protocol), you'll see how Kiro connects to databases and document stores. MCP is the plumbing that makes Level 3 RAG possible — the AI queries your systems directly instead of you pasting documents.

🏦 RAG Use Cases at AnyCompany

| Use Case | Documents | Who Benefits | RAG Level |
|----------|-----------|--------------|-----------|
| Policy Q&A | PayLater Terms, Merchant Onboarding, KYC/AML procedures | Operations, Compliance | Level 2-3 |
| Regulatory Impact | MAS/BNM/OJK circulars, internal compliance memos | Compliance, Legal | Level 3 |
| Merchant Risk Review | Risk policies, chargeback thresholds, historical assessments | Risk Team | Level 2-3 |
| Invoice Dispute Resolution | PTP procedures, vendor contracts, dispute history | PTP Team | Level 3 |
| Audit Preparation | RTR procedures, financial reporting standards, past audit findings | RTR, Finance CoE | Level 3 |
| New Hire Onboarding | Process manuals, team wikis, training materials | All teams | Level 2-3 |

Common Questions

Can we use RAG with our actual MAS compliance documents?

Yes — Bedrock Knowledge Bases supports PDF, Word, HTML. Your tech team uploads the documents, configures chunking, and exposes it as an API. You define which documents to include and review the output quality.

How is this different from just searching SharePoint?

SharePoint search matches keywords. RAG matches meaning. "Chargeback obligations" would find documents about "dispute resolution requirements" even if they don't use the word "chargeback." Plus, RAG doesn't just find the document — it reads it and generates an answer.

What about data security?

With Bedrock Knowledge Bases, documents stay in your AWS account. Embeddings are stored in your own vector database. Nothing leaves your environment. This is why AWS-hosted RAG is preferred over public tools for regulated industries.

How accurate is it? Can we trust it for compliance?

RAG dramatically reduces hallucination but doesn't eliminate it. That's why the grounding prompt rules are critical: cite sources, admit gaps, no outside knowledge. For compliance, always use RAG + human review (Level 2 autonomy from Day 3). The AI drafts, the human verifies.