See how Retrieval-Augmented Generation (RAG) lets LLMs answer questions using AnyCompany's policies, not just their training data.
Large language models are trained on public internet data up to a cutoff date. They don't know about AnyCompany's internal policies, your latest MAS circulars, or yesterday's merchant risk reports. Ask Claude about your chargeback threshold policy and it will guess — confidently, but incorrectly.
LLMs can't access your private docs, recent regulatory updates, or internal procedures. They'll fill gaps with plausible-sounding but potentially wrong information.
Instead of retraining the model, RAG retrieves relevant document chunks and pastes them into the prompt. The LLM reads the context and generates a grounded answer.
Embeddings turn text into numbers (vectors) so we can measure similarity. "Chargeback threshold" and "dispute rate limit" have similar vectors — even with different words.
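To make similarity concrete, here is a minimal sketch of the math, assuming you already have embedding vectors in hand (the four-dimensional values below are made-up toy numbers, not real model output; production embeddings have hundreds or thousands of dimensions):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: close to 1.0 means similar meaning, near 0 or negative means unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors: similar meanings land close together, unrelated text lands far away.
chargeback_threshold = np.array([0.55, 0.72, 0.13, -0.28])
dispute_rate_limit   = np.array([0.51, 0.69, 0.20, -0.25])   # different words, similar meaning
office_lunch_menu    = np.array([-0.40, 0.05, 0.88, 0.31])   # unrelated topic

print(cosine_similarity(chargeback_threshold, dispute_rate_limit))  # high (close to 1.0)
print(cosine_similarity(chargeback_threshold, office_lunch_menu))   # low (here, negative)
```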
RAG is how enterprise AI apps work today. It's the bridge between a general-purpose LLM and your MAS compliance manuals, merchant policies, and KYC procedures.
Remember the Embeddings Explainer where you clicked a word and found its nearest neighbors in 3D space? That's exactly what RAG's retrieval step does — but with document chunks instead of single words.
RAG sits between embeddings and the transformer — it uses embedding similarity to find relevant documents before the LLM generates.
Every RAG system follows the same pattern. Here's what happens when someone at AnyCompany asks: "What are our obligations if a merchant's chargeback rate exceeds 3%?"
| Step | What Happens | AnyCompany Example |
|---|---|---|
| ❓ User Query | Someone asks a question in natural language | "What are our obligations if a merchant's chargeback rate exceeds 3%?" |
| 🔢 Embed Query | Question is converted to a vector (numbers) | Query becomes [0.55, 0.72, 0.13, -0.28, 0.41, ...] |
| 🔍 Search Vectors | Find document chunks with similar vectors | Searches across all chunked policy documents for vectors close to the query |
| 📄 Retrieve Chunks | Top 3-5 most relevant chunks returned with scores | Merchant Risk Policy §4.2 (0.94), Chargeback Procedures §7.1 (0.87), MAS Notice PSN-06 §3 (0.82) |
| 🧩 Build Prompt | Assemble: instructions + retrieved chunks + question | "Given the following context: [chunk 1] [chunk 2] [chunk 3]. Answer: ..." |
| ✨ Generate Answer | LLM reads context and generates grounded answer | Cites specific sections, admits gaps, no hallucination |
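Those six steps translate into surprisingly little code. Here is a minimal sketch of the same pipeline; `embed()` and `generate()` are hypothetical placeholders for whatever embedding model and LLM API your stack uses, and in a real system the chunk vectors would be precomputed and stored in a vector database rather than recomputed per query:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder: call your embedding model here."""
    raise NotImplementedError

def generate(prompt: str) -> str:
    """Placeholder: call your LLM here."""
    raise NotImplementedError

def answer(question: str, chunks: list[str], top_k: int = 3) -> str:
    # Steps 1-2: embed the query (chunk vectors are precomputed in real systems).
    q_vec = embed(question)

    # Step 3: score every chunk by cosine similarity to the query.
    scored = []
    for chunk in chunks:
        c_vec = embed(chunk)
        score = float(np.dot(q_vec, c_vec) /
                      (np.linalg.norm(q_vec) * np.linalg.norm(c_vec)))
        scored.append((score, chunk))

    # Step 4: keep the top-k most relevant chunks.
    top = sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]

    # Step 5: build the prompt from instructions + retrieved chunks + question.
    context = "\n\n".join(chunk for _, chunk in top)
    prompt = (
        "Answer using only the context below. Cite the section you relied on. "
        "If the context does not cover the question, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

    # Step 6: generate the grounded answer.
    return generate(prompt)
```

You would rarely write this by hand in production; the comparison below shows where RAG sits against simply pasting the whole document or fine-tuning a model.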
| Approach | How It Works | Pros | Cons | Best For |
|---|---|---|---|---|
| Paste Full Doc | Copy entire document into prompt | Simple, no setup | Context window limits, expensive (200 pages = ~50K tokens/call), slow | Quick one-off questions, small docs |
| RAG ✓ | Auto-retrieve relevant chunks, inject into prompt | Fresh data, cites sources, cost-efficient, no retraining | Needs infrastructure (vector DB, embeddings), chunking quality matters | Production Q&A over large/changing doc sets |
| Fine-Tuning | Retrain model on your data | Model "learns" your domain style | Expensive, slow to update, can't cite sources, needs ML team | Specialized language/style (not facts) |
Before documents enter the vector store, they're split into chunks — smaller pieces the system can search and retrieve. How you chunk determines whether the AI gets complete answers or broken fragments.
Chunk 1:
...merchant risk rating.
4.2 Chargeback Thresholds
Chunk 2:
Merchants exceeding 3.0% chargeback rate
shall be classified as RED and subject to
immediate review by the Risk Team within
48 hours. Merchants between 1.0-3.0% are
classified AMBER with enhanced monitoring...
⚠️ The section header got split from its content. Chunk 1 matches "chargeback thresholds" but has no useful answer. Chunk 2 has the answer but may score lower because it lacks the header.
Chunk:
4.2 Chargeback Thresholds
Merchants exceeding 3.0% chargeback rate
shall be classified as RED and subject to
immediate review by the Risk Team within
48 hours. Merchants between 1.0-3.0% are
classified AMBER with enhanced monitoring.
Below 1.0% is GREEN with standard quarterly
review cycle.
✓ Header + content together. The retrieval system finds the right chunk and the LLM gets the complete answer with all three thresholds.
| Strategy | How It Works | Best For | Watch Out |
|---|---|---|---|
| Fixed-size | Split every N characters/tokens | Simple, fast | Breaks mid-sentence, splits tables |
| Sentence-based | Split at sentence boundaries | General text | May split related sentences apart |
| Section-based ✓ | Split at document headings/sections | Structured docs (policies, manuals) | Sections may be too large or too small |
| Semantic | Use embeddings to find natural topic breaks | Unstructured text | More complex, slower |
| Overlap | Chunks share N tokens at boundaries | Reducing context loss at edges | Increases storage, may retrieve duplicates |
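For structured policy documents, section-based chunking can be as simple as splitting on Markdown headings so that each chunk keeps its header. A minimal sketch; the heading pattern, 2,000-character limit, and fallback behaviour are illustrative choices, not fixed rules:

```python
import re

def chunk_by_section(markdown_text: str, max_chars: int = 2000) -> list[str]:
    """Split a Markdown document at headings so each chunk keeps its section header."""
    # Split just before any line that starts with 1-6 '#' characters.
    sections = re.split(r"\n(?=#{1,6} )", markdown_text)

    chunks = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        if len(section) <= max_chars:
            chunks.append(section)
            continue
        # Oversized section: keep the heading and split the body,
        # repeating the heading so every piece stays searchable.
        heading, _, body = section.partition("\n")
        if not body:
            chunks.append(section)
            continue
        for start in range(0, len(body), max_chars):
            chunks.append(f"{heading}\n{body[start:start + max_chars]}")
    return chunks
```

Run over the Merchant Risk Policy, this keeps "4.2 Chargeback Thresholds" attached to its RED/AMBER/GREEN rules, avoiding the broken-fragment problem shown above.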
The pipeline demo shows how enterprise RAG systems work at scale. But you don't need a vector database to do RAG today. When you use Kiro or any AI assistant, the "RAG" you'll actually do is simpler — and just as powerful for your daily work.
No vector database. No embeddings pipeline. No chunking configuration. No infrastructure. No tech team involvement.
Convert your documents to clean Markdown. Drop them in your workspace. Write a grounding prompt. The AI reads the full file.
PDFs are terrible for AI. Headers become random text, tables lose structure, columns merge, page numbers inject mid-sentence. Markdown preserves the hierarchy so the AI can navigate your document.
Convert each document to a .md file you can save to your workspace. Place the .md files in your project folder. In Kiro, you can reference them with #File in chat, or the AI can read them directly from your workspace when you ask questions.
Reference the file and add grounding constraints. The AI reads the entire document into its context window — no chunking, no retrieval step. It's all in memory.
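For example, a grounding prompt in Kiro chat might look like this (the file name is illustrative): reference #File merchant-risk-policy.md, then ask, "Using only this document, what are our obligations if a merchant's chargeback rate exceeds 3%? Cite the section number for every claim. If the document doesn't cover something, say so instead of guessing."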
4.2 Chargeback Thresholds Merchants exceeding 3.0% chargeback rate shall be classified as RED and subject to immediate review by the Risk Team within 48 hours. Merchants Page 23 of 156 between 1.0-3.0% are classified AMBER with enhanced monitoring. Table 4.1: Threshold Summary GREEN AMBER RED ≤1.0% 1.0-3.0% >3.0% Quarterly Enhanced Immediate
⚠️ Page number injected mid-paragraph. Table structure lost. AI may misread thresholds.
## 4.2 Chargeback Thresholds Merchants exceeding 3.0% chargeback rate shall be classified as RED and subject to immediate review by the Risk Team within 48 hours. Merchants between 1.0-3.0% are classified AMBER with enhanced monitoring. | Rating | Threshold | Review Cycle | |--------|-----------|-------------| | GREEN | ≤1.0% | Quarterly | | AMBER | 1.0-3.0% | Enhanced | | RED | >3.0% | Immediate |
✓ Clean heading. Table preserved. No page artifacts. AI reads it perfectly.
You've noticed: steering files, SKILL.md, prompt templates, RAG documents — everything in this workshop is .md. That's not a coincidence. Markdown is the format AI understands best.
`##` headings, `|` tables, and `-` lists give the AI a document hierarchy to navigate — with zero parsing complexity (unlike HTML, XML, or JSON).
`## 4.2 Chargeback Thresholds` = ~8 tokens. The HTML equivalent = ~20 tokens. When your context window is limited, every token counts.
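If you want to check the numbers yourself, here is a rough sketch using the open-source tiktoken tokenizer as a proxy (exact counts vary by model and tokenizer, and the HTML attributes below are illustrative):

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")

markdown_heading = "## 4.2 Chargeback Thresholds"
html_heading = '<h2 class="section-title">4.2 Chargeback Thresholds</h2>'

# Tags and attributes all cost tokens; the Markdown form is noticeably cheaper.
print(len(enc.encode(markdown_heading)))
print(len(enc.encode(html_heading)))
```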
Your compliance officer can read it. The AI can parse it. Your tech team can version it in git. No other format serves all three.
GitHub, Stack Overflow, documentation sites — the training data is saturated with Markdown. Models understand its conventions natively.
| Format | Human Readable | AI Parseable | Token Cost | Versionable | Verdict |
|---|---|---|---|---|---|
| PDF | ✅ Great | ❌ Terrible | N/A (binary) | ❌ No | Convert away from |
| Word (.docx) | ✅ Good | ⚠️ Needs extraction | N/A (binary) | ❌ No | OK for drafting |
| HTML | ⚠️ With browser | ✅ Good | 🔴 High (tags) | ✅ Yes | Too verbose |
| JSON | ❌ Hard | ✅ Great | 🟡 Medium | ✅ Yes | For data, not docs |
| Markdown ✓ | ✅ Great | ✅ Great | 🟢 Low | ✅ Yes | Best for AI docs |
Your 3-step workflow and the enterprise RAG pipeline solve the same problem — they just operate at different scales:
| Step | Your Workflow (Kiro) | Enterprise Pipeline (Bedrock KB) |
|---|---|---|
| Prepare docs | You convert PDF → Markdown manually | Automated ingestion + chunking |
| Store docs | Files in your workspace folder | Vector database (embeddings) |
| Find relevant info | You reference the right file with #File | Similarity search retrieves top chunks |
| Ground the answer | Grounding rules in your prompt | Same grounding rules, automated |
| Scale | 1-5 documents at a time (context window limit) | Thousands of documents, auto-retrieved |
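Once your tech team has built the enterprise pipeline, querying it from a script takes only a few lines. A hedged sketch using boto3's Bedrock Agent Runtime client; the region, knowledge base ID, and model ARN are placeholders, and the API your team actually exposes may differ:

```python
import boto3

# Assumes a Bedrock Knowledge Base already exists in your AWS account.
client = boto3.client("bedrock-agent-runtime", region_name="ap-southeast-1")

response = client.retrieve_and_generate(
    input={"text": "What are our obligations if a merchant's chargeback rate exceeds 3%?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "YOUR-KB-ID",  # placeholder
            "modelArn": "arn:aws:bedrock:ap-southeast-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",  # placeholder
        },
    },
)

print(response["output"]["text"])             # the grounded answer
for citation in response.get("citations", []):
    print(citation)                           # the source chunks the answer drew on
```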
RAG isn't all-or-nothing. You're already at Level 1. Today you learn Level 2. Your tech team builds Level 3.
| Use Case | Documents | Who Benefits | RAG Level |
|---|---|---|---|
| Policy Q&A | PayLater Terms, Merchant Onboarding, KYC/AML procedures | Operations, Compliance | Level 2-3 |
| Regulatory Impact | MAS/BNM/OJK circulars, internal compliance memos | Compliance, Legal | Level 3 |
| Merchant Risk Review | Risk policies, chargeback thresholds, historical assessments | Risk Team | Level 2-3 |
| Invoice Dispute Resolution | PTP procedures, vendor contracts, dispute history | PTP Team | Level 3 |
| Audit Preparation | RTR procedures, financial reporting standards, past audit findings | RTR, Finance CoE | Level 3 |
| New Hire Onboarding | Process manuals, team wikis, training materials | All teams | Level 2-3 |
Bedrock Knowledge Bases supports PDF, Word, and HTML directly. Your tech team uploads the documents, configures chunking, and exposes the knowledge base as an API. You define which documents to include and review the output quality.
SharePoint search matches keywords. RAG matches meaning. "Chargeback obligations" would find documents about "dispute resolution requirements" even if they don't use the word "chargeback." Plus, RAG doesn't just find the document — it reads it and generates an answer.
With Bedrock Knowledge Bases, documents stay in your AWS account. Embeddings are stored in your own vector database. Nothing leaves your environment. This is why AWS-hosted RAG is preferred over public tools for regulated industries.
RAG dramatically reduces hallucination but doesn't eliminate it. That's why the grounding prompt rules are critical: cite sources, admit gaps, no outside knowledge. For compliance, always use RAG + human review (Level 2 autonomy from Day 3). The AI drafts, the human verifies.