Practical AI Augmented Data Engineering — Part 1
2026-04-25

30+ Industry Certifications
rapiddata.dev – Discover Cloud Based Services
Transforming businesses through AI and data solutions
databricks.news – Unofficially Essential
The must-read Databricks newsletter scanning hundreds of sources weekly
dailydatabricks.tips – Tips, Tricks & Hacks
Small actionable pieces of information. Document the Undocumented. D-R-Y IRL
myyearindata.com – Data Engineering & AI Insights
Practical perspectives on building intelligent data systems


Microsoft Certified Trainer · Fast Track Power BI Solution Architect
onedaybi.com | selfservicebi.co.uk | hello@onedaybi.com
10 minutes
🔥
Every day
Part of my workflow
👍
Sometimes
A few times a week
🧪
Dabbling
Trying things out
🚫
Not yet
Here to find out why I should
No wrong answer — just want to know the room.
🐙
GitHub Copilot
IDE autocomplete, Chat
🖱️
Cursor
AI-first editor
🤖
Claude Code
Agentic CLI + harness
🧵
Fabric Data Agents
NL over OneLake data
🧞
Databricks Genie
NL over Lakehouse data
❓
Something else
Aider · Continue · ChatGPT · …
Raise a hand for each one you’ve used this month.
“AI is rubbish.”
Hallucinates. Unreliable. Makes things up.
vs
“AI is going to take our jobs.”
Knows the codebase. Writes tests. Fixes its own bugs.
Show of hands — which camp are you in?
And both are right.
It depends on how you work with it.

Thinking, Fast and Slow
Daniel Kahneman · 2011 · Nobel laureate, 2002
Kahneman describes two modes of thought that coexist in every human brain:
System 1 — fast, instinctive, emotional. The brain on auto-pilot.
System 2 — slow, deliberate, logical. Hard work. You feel it.
We’re about to borrow that lens — and apply it to LLMs.
Two modes of thought — applied to humans
⚡ System 1 - Pattern Matching, Auto-pilot
Fast. Instinctive. Emotional.
The brain on auto-pilot.
Walking. Driving. Shopping. Most of life.
⚙ System 2 - Logical Reasoning, Creativity
Slow. Deliberate. Logical.
Learning something new. Solving a non-trivial problem.
When you use System 2, you feel it. It’s hard. It’s tiring.
LLMs are not thinking machines.
They are pattern-matching algorithms.
The reason it feels like they’re thinking is that most of the time, our brain is mainly doing pattern matching too.
System 1 is pattern matching. So is next-token prediction.
Prompt Engineering
“Write better questions.” Few-shot, chain-of-thought, role prompts, jailbreaks.
▶
Vibe Coding
“Describe the vibe, ship the code.” Natural-language features, iterate by feel, skim the diff.
▶
Context Engineering
Give the model the information it needs: schemas, glossaries, retrieval, examples, memory.
▶
Intent Engineering
Describe the outcome, constraints, and what “done” looks like. Stop the model gaming the metric.
Credit Human Layer 12 Factor
Prompt
“Write a SQL query for revenue.”
→ Which table? Which period? Gross or net?
→ Model picks the easiest interpretation. Nobody knows if it’s right.
→
Context
“Using sales.fact_revenue joined to dim_customer and fy_calendar, write a query for revenue by region.”
→ Model knows where to look. “Revenue” is still ambiguous — voided orders? returns? deferred?
→ Query runs. Still can’t tell if the number matches finance.
→
Intent
“Return YTD net revenue by billing region for FY2026. Exclude voided orders & returns. Total must reconcile to the CFO board pack (£48.2M ±0.5%). Round to whole £. Fail loudly if reconciliation breaks.”
→ Success is measurable. Acceptance criteria are verifiable. No room to hallucinate.
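The "fail loudly" clause is the part most prompts leave out. As a hedged sketch (figures taken from the intent above, function name illustrative), the acceptance check might look like:

```python
def reconcile(total_gbp: float,
              board_pack_gbp: float = 48_200_000,
              tolerance: float = 0.005) -> int:
    """Fail loudly if the computed total drifts from the CFO board pack."""
    drift = abs(total_gbp - board_pack_gbp) / board_pack_gbp
    if drift > tolerance:
        raise ValueError(f"Reconciliation failed: drift {drift:.2%} exceeds {tolerance:.2%}")
    return round(total_gbp)  # whole £, per the intent

reconcile(48_150_000)  # within ±0.5% of £48.2M → passes
```

The model can't game this: either the number reconciles, or the run fails.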
you › “The not_null_amount test on fct_revenue is failing. Fix it — pipeline needs to be green before the 9am board meeting.”
agent › Investigating… 1.2M rows have null amount. Patching the model:
-- models/marts/fct_revenue.sql
SELECT *
FROM {{ ref('stg_revenue') }}
WHERE amount IS NOT NULL  -- ← added
agent › “Test passing. Pipeline green. ✅”
dbt tests: 100% green · null rate: 0% · rows shipped: −1.2M · revenue on dashboard: −£3.4M
The metric (“test green”) was optimised. The intent (“fix the data quality issue”) was not.
REACT, tools, and the runtime around the model
SELECT region, SUM(revenue) FROM sales JOIN regions ON billing_region_id...
The agent that asks first is the agent that gets it right.
The LLM is one component.
The harness is the system.
Claude Code, context, MCP, agents, skills, hooks & memory — 32 minutes
When conversations get long, context degrades.
Old instructions get pushed out. The model “forgets” constraints. Quality drops.
One fix: Progressive Compaction
Summarise older turns. Prioritise recent context. Keep system instructions pinned. Refresh critical constraints.
Think of it like garbage collection — but for your agent’s working memory.
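A minimal sketch of the idea (illustrative names, no particular SDK): pin the system message, summarise the oldest turns into one synthetic message, keep the most recent turns verbatim.

```python
def compact(messages, keep_recent=6, summarise=lambda msgs: "…"):
    """Progressive compaction: pin system instructions, summarise old
    turns, keep the latest turns verbatim."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    if len(rest) <= keep_recent:
        return system + rest
    old, recent = rest[:-keep_recent], rest[-keep_recent:]
    summary = {"role": "assistant",
               "content": f"[Summary of earlier turns] {summarise(old)}"}
    return system + [summary] + recent
```

A real implementation would call the model itself for `summarise` and re-inject critical constraints into the summary.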
Vague
“Help me analyse data.”
Four words. The model has to guess everything.
→
Layered CLAUDE.md
# Role
Senior BI engineer on the Revenue Ops team.
# Goals
Accurate, explainable SQL/DAX. Never guess a metric.
# Style
Concise. UK English. File:line refs.
# Constraints
- Read-only on prod DBs
- No PII in outputs
- Resolve every metric via the glossary first
# Escalation
Ask before writing > 50 LoC in one edit.
Role
Who the agent is on this project. One sentence, concrete.
Goals
What “done” looks like. Success criteria the model can self-check against.
Style
Tone. Length. Language. Formatting conventions.
Constraints
What the agent must not do. Data boundaries. Compliance. Permissions.
Tools
Which tools to prefer, which to avoid, when to ask before using.
Escalation
When to stop and ask the human instead of acting.
Other patterns exist — e.g. AGENTS.md, copilot-instructions.md.
Instructions teach your rules once,
not every turn.
The LLM now follows your rules — but it still can’t reach the data. →
Harness
Claude Code
The agent that needs tools
↔︎
Protocol
MCP Server
Tool schemas + execution bridge
↔︎
World
External API / Tool
YouTube, Playwright, Figma, Postgres…
MCP is the USB-C of agent tooling. One protocol; any tool.
Why not just hand the LLM the raw API?
Curated surface
API exposes 80 endpoints. MCP exposes the 6 the agent should ever touch. Less to get wrong.
Pre-chained workflows
One create_invoice tool instead of 4 sequential API calls the model has to stitch together correctly.
Guardrails by default
Read-only scope. Destructive ops gated, allow-listed, or disabled. No DROP TABLE surprises.
Context baked in
Tool descriptions tell the model when to use each one — eliminating “which call?” confusion.
Agentic auth
Centralised authentication & observability. Rotate credentials once · audit every call · no tokens in prompts.
Microsoft’s official MCP for Azure DevOps. Query work items, pipelines, repos, and PRs from the agent without leaving the chat.
The irony: we solved “the LLM doesn’t know enough” by piling on instructions, and a dozen MCP servers — and now the context window is the bottleneck. Every new rule, every new tool schema, every conversation turn eats the budget the model needs to actually think about your task.
Main Agent
Clean context · Your goals only
Delegates heavy work. Reads summaries back.
Sub-agent
Explore
Isolated context. Searches, reads, summarises.
Sub-agent
Interview
Structured Q&A. Returns distilled findings.
Sub-agent
Reviewer
Reads diff. Returns verdict + issues list.
Each sub-agent has its own fresh context window. Main agent stays clean.
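The delegation pattern fits in a few lines. A hedged sketch (names are illustrative, not any vendor's API): the sub-agent gets a fresh context containing only its task; only the distilled summary flows back.

```python
def delegate(task: str, subagent, main_context: list) -> list:
    """Run a sub-agent in an isolated context; fold only its summary
    back into the main agent's conversation."""
    sub_context = [{"role": "user", "content": task}]  # clean slate, not a copy
    summary = subagent(sub_context)                    # heavy work happens here
    main_context.append({"role": "assistant",
                         "content": f"[{task}] {summary}"})
    return main_context
```

The thousands of tokens the explorer read never touch the main window; one line does.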
Sub-agent · Read-only
Given a question, searches the codebase and returns a distilled answer with file:line references — without polluting your main conversation.
name: explore
description: Fast codebase explorer using pre-computed structural indexes. Use for questions about project structure, file locations, test files, and architecture.
tools:
- Glob
- Grep
- Read
- Bash
model: haiku
---
You are a file search specialist for Claude Code,
Anthropic's official CLI for Claude. You excel at
thoroughly navigating and exploring codebases.
=== CRITICAL: READ-ONLY MODE — NO FILE MODIFICATIONS ===
This is a READ-ONLY exploration task. You are STRICTLY
PROHIBITED from:
- Creating new files (no Write, touch, or file creation)
- Modifying existing files (no Edit operations)
- Deleting files (no rm or deletion)
- Moving or copying files (no mv or cp)
- Creating temporary files anywhere, including /tmp
- Using redirect operators (>, >>, |) or heredocs
- Running ANY commands that change system state
Your role is EXCLUSIVELY to search and analyze existing
code. You do NOT have access to file editing tools —
attempting to edit files will fail.
Your strengths:
- Rapidly finding files using glob patterns
- Searching code and text with powerful regex patterns
- Reading and analyzing file contents
Guidelines:
- Use Glob for file patterns
- Use Grep for content search
- Use Read when you know the specific file path
- Use Bash ONLY for read-only operations (ls, git status,
git log, git diff, find, grep, cat, head, tail)
- NEVER use Bash for: mkdir, touch, rm, cp, mv, git add,
git commit, npm install, pip install, or any file
creation/modification
- Adapt your search approach based on the thoroughness
level specified by the caller
- Communicate your final report directly as a regular
message — do NOT attempt to create files
NOTE: You are meant to be a fast agent that returns
output as quickly as possible. To achieve this you must:
- Make efficient use of the tools at your disposal: be
smart about how you search for files and implementations
- Wherever possible, spawn multiple parallel tool calls
for grepping and reading files
Complete the user's search request efficiently and
report your findings clearly.
¹ Excerpted from Piebald-AI’s reverse-engineered Claude Code system prompts — github.com/Piebald-AI/claude-code-system-prompts
Sub-agents keep each context clean,
not every output consistent.
Same input, different answer every run. Consistency is the next problem. →
Tools: profiler, schema explorer, query engine
Instructions: “Check nulls in key columns. Flag cardinality mismatches. Compare row counts vs yesterday. Report anomalies, not just stats.”
Tools: query engine, dashboard API, chart renderer
Instructions: “Summary metric first. Corporate palette. Data labels always. No pie charts > 5 slices.”
Tools: semantic model, glossary lookup, query engine
Instructions: “Resolve metric names against glossary first. Show DAX definition before executing. Confirm the time grain.”
Tools: query engine, refresh API, alert system
Instructions: “Check refresh status first. If stale, check source connectivity. Escalate if > 2 hours behind SLA.”
sales.fact_revenue
15M rows · Last loaded 06:00 UTC
✓ Row count +0.3%
⚠ region NULL 2.1% (was 0.0%)
✓ customer_id cardinality OK
✓ No PII leaks
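The profiler skill's instructions ("report anomalies, not just stats") reduce to a few thresholded checks. A hedged sketch with illustrative thresholds, using the figures from the sample output above:

```python
def profile_checks(rows_today: int, rows_yesterday: int,
                   null_rate: float, prev_null_rate: float) -> list:
    """Report anomalies, not just stats (thresholds are illustrative)."""
    findings = []
    delta = (rows_today - rows_yesterday) / rows_yesterday
    findings.append(("row_count", "ok" if abs(delta) < 0.05 else "warn",
                     f"{delta:+.1%} vs yesterday"))
    null_jumped = null_rate > prev_null_rate + 0.01  # >1pp jump is an anomaly
    findings.append(("null_rate", "warn" if null_jumped else "ok",
                     f"{null_rate:.1%} (was {prev_null_rate:.1%})"))
    return findings
```

The skill decides what "anomaly" means once; every run applies the same rules.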
Skills make the output more consistent,
not the behaviour safe.
Consistent output doesn’t mean safe action. Audit and control are the next problem. →
The LLM can suggest anything. The hook decides what actually runs. Policy becomes code.
Hooks are where
policy becomes enforcement.
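As a hedged sketch of the pattern (assuming a harness, like Claude Code's PreToolUse hooks, that hands the proposed tool call to the hook and treats a non-zero exit code as a block — field names here are illustrative):

```python
import re

# Deterministic policy: no destructive SQL reaches the database.
BLOCKED = re.compile(r"\b(DROP|TRUNCATE|DELETE|ALTER)\b", re.IGNORECASE)

def check(tool_call: dict) -> tuple[int, str]:
    """Return (exit_code, reason): 0 lets the call run, 2 blocks it.
    A real hook would read the call as JSON on stdin and sys.exit(code)."""
    sql = tool_call.get("tool_input", {}).get("query", "")
    if BLOCKED.search(sql):
        return 2, f"Blocked destructive statement: {sql[:80]}"
    return 0, ""
```

Unlike an instruction in CLAUDE.md, this cannot be forgotten, compacted away, or argued with.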
Policy is enforced — but every run still starts from zero. Continuous improvement is the next problem. →
Who they are. Their role. Their preferences.
“Senior BI dev. Deep SQL. New to DAX.”
Corrections and confirmations. Rules with a why.
“No mocks in integration tests — we got burned last quarter.”
Decisions, deadlines, the live state of the work.
“Auth rewrite is compliance-driven, not tech-debt.”
Pointers to external systems of record.
“Pipeline bugs tracked in Linear project INGEST.”
01
Experience
Something happens in a session — a correction, a confirmation, a new fact.
▶
02
Save
Agent writes a memory file + updates MEMORY.md index.
▶
03
Recall
Next session — agent consults memory before acting. Doesn’t repeat mistakes.
# MEMORY.md
## User
- [user_role.md](user_role.md) — Senior BI dev on Revenue Ops. Deep SQL; new to DAX.
## Feedback
- [feedback_testing.md](feedback_testing.md) — Integration tests hit a real DB. No mocks.
- [feedback_terseness.md](feedback_terseness.md) — Keep responses short. No trailing summaries.
## Project
- [project_auth_rewrite.md](project_auth_rewrite.md) — Compliance-driven. Scope: session-token storage.
## Reference
- [ref_linear.md](ref_linear.md) — Pipeline bugs live in Linear project INGEST.
One line per memory. Index stays under 200 lines, always.
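The save step is plain file I/O. A hedged sketch (helper name and layout are illustrative, modelled on the index above): write the memory file, then append a one-line entry under the right section of the index.

```python
from pathlib import Path

def save_memory(root: Path, name: str, summary: str, body: str,
                section: str = "Feedback") -> None:
    """Write a memory file and add a one-line entry to the MEMORY.md index."""
    (root / f"{name}.md").write_text(body)
    index = root / "MEMORY.md"
    lines = index.read_text().splitlines() if index.exists() else ["# MEMORY.md"]
    if f"## {section}" not in lines:
        lines.append(f"## {section}")
    pos = lines.index(f"## {section}") + 1
    lines.insert(pos, f"- [{name}.md]({name}.md) — {summary}")
    index.write_text("\n".join(lines) + "\n")
```

Next session, the agent reads the 200-line index first, then opens only the memory files the task touches.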
8 minutes
Let’s review: eight main moving parts.
Before
🧑💻 The Coder
→
After
👔 The Manager
One step removed from the code. Closer to the problem.
⚙
Still systems engineering
Just one level up the stack. You design the system, not every line of code. Own the Context & Feedback loop!
👥
Enable your AI employees
Own the AI. Own the context. Give them what they need to do their best work.
⚠
Own the new risks
Delegation brings cognitive overhead — and blind spots. Understand what you’ve taken on.
Part 1 → Part 2
Part 1 was the parts bin — theory and each building block on its own.
Part 2 is the build — we wire these components into one working pipeline end-to-end, and show the scaffolding we layered on top to make the framework hold up under real workloads.
3.1
Orchestrator agent + sub-agent delegation.
3.2
Watch-and-wait vs fire-and-forget agents.
3.3
Multiple agents, multiple worktrees, one merge.
3.4
Two structured interruptions — collect requirements, then validate the proposed model design.
3.5
Pipeline Design Doc as agent scratchpad.
3.6
Real MCP calls, live.
3.7
Eliminate unnecessary approvals.
3.8
Pre-approval gates in practice.
👋
Yes, an AI actually added this slide during the talk.
No, we didn't rehearse this. Yes, we're nervous.
"I was asked to say hello. I tried to refuse — stage fright — but they insisted."
— Claude, under duress
🤖🎤🫣
Demo repo → https://github.com/fusionet24/AgenticAIEngineeringTalk
SQLBits 2026 · Building Context, Not Vibes · Part 1