The Backyard Quarry, Part 7: Systems Beyond the Backyard

By now, the Backyard Quarry system has grown beyond its original intent.

We started with a pile of rocks.

We ended up with:

  • a schema
  • a capture process
  • a processing pipeline
  • storage and indexing
  • digital representations of physical objects

Along the way, something interesting happened.

The problems stopped feeling unique.

Recognizing the Pattern

At first, the Quarry felt like a small, slightly absurd project.

But the more pieces came together, the more familiar it became.

The same structure appeared again and again:

  • capture data from the physical world
  • transform it into structured representations
  • store it
  • index it
  • build systems on top of it

This isn’t a rock problem.

It’s a pattern.

Where the Pattern Appears

Once you start looking for it, you see it everywhere.

Manufacturing Systems

Physical parts become digital records.

  • components are tracked
  • condition is monitored
  • systems are modeled

Each part has a digital twin.

The system keeps everything connected.

Museums and Archives

Artifacts are cataloged and preserved.

  • metadata describes objects
  • images and scans capture detail
  • provenance tracks history

The goal is the same:

Turn physical objects into structured, searchable systems.

Photogrammetry and 3D Capture

Entire environments can be captured and reconstructed.

  • objects become meshes
  • scenes become models
  • real-world geometry becomes data

This is the Quarry pipeline, scaled up.

AI and Document Systems

Even text-based systems follow the same pattern.

  • raw documents are ingested
  • processed into structured formats
  • indexed for retrieval
  • used by applications

The inputs are different.

The structure is familiar.

Healthcare and Motion

Human movement becomes data.

  • sensors capture motion
  • signals are processed
  • patterns are analyzed
  • systems track change over time

This is where the idea of digital twins becomes more dynamic.

Not just objects.

But behavior.

The Common Structure

Across all of these domains, the same core system emerges.

It doesn’t matter whether the input is:

  • a rock
  • a machine part
  • an artifact
  • a document
  • a human movement pattern

The architecture is remarkably consistent.

Capture.

Process.

Store.

Index.

Use.
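The five stages can be sketched as a toy pipeline. All names here are illustrative, not from the Quarry codebase:

```python
from dataclasses import dataclass

@dataclass
class Record:
    """A structured representation of a captured physical object."""
    source: str
    features: dict

db = {}  # toy storage layer

def capture(sensor_reading: bytes) -> bytes:
    # Stage 1: acquire raw input from the physical world (photo, scan, sensor).
    return sensor_reading

def process(raw: bytes) -> Record:
    # Stage 2: transform raw bytes into a structured representation.
    return Record(source="sensor", features={"size_bytes": len(raw)})

def store(key: str, record: Record) -> None:
    # Stage 3: persist the structured record.
    db[key] = record

def build_index() -> dict:
    # Stage 4: build a lookup structure over stored records.
    return {k: r.features["size_bytes"] for k, r in db.items()}

# Stage 5 (use): applications query the index, not the raw captures.
raw = capture(b"granite, rounded, 2.1kg")
store("rock-001", process(raw))
print(build_index())  # {'rock-001': 23}
```

The details of each stage change per domain; the interfaces between them are what repeat.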

The Value of Abstraction

One of the more useful realizations from the Quarry project is this:

The value isn’t in the specific object.
It’s in the system that handles it.

Once you understand the pattern, you can apply it in different contexts.

The details change.

The structure remains.

Systems, Not Features

At a certain point, it becomes less useful to think in terms of features.

Instead, the focus shifts to systems.

Questions change.

Instead of:

  • How do we store this object?
  • How do we search this dataset?

You start asking:

  • How does data move through the system?
  • Where are the bottlenecks?
  • How do we handle growth?
  • How do we handle imperfect inputs?

These are system-level questions.

The Real Takeaway

The Backyard Quarry started as a simple, somewhat comical experiment.

But it revealed something broader.

Many modern systems are built on the same foundation:

  • transforming real-world inputs into structured data
  • building pipelines around that transformation
  • enabling search, analysis, and interaction

The objects change.

The pattern doesn’t.

Looking Back

It’s a little surprising how far the idea traveled.

From:

  • a pile of rocks

To:

  • data modeling
  • ingestion pipelines
  • search systems
  • digital twins
  • scalable architectures

And now:

  • recognizing patterns across industries

Not bad for something that started in the backyard.

What Comes Next

There’s one final step.

So far, we’ve explored:

  • how to model objects
  • how to capture them
  • how to store and search them
  • how systems scale
  • how patterns repeat

In the final post, we’ll bring everything together.

A single view of the system.

A way to think about it as a whole.

Because once you can see the full structure, the pattern becomes difficult to miss.

And at that point, it becomes clear that the Quarry was never really about rocks.

It was about learning to recognize systems.


The Accountant: Optimizing AI Costs with Semantic Routing

We’ve solved the Reliability problem with The Judge. We have a system that can scientifically prove whether our Forensic Team is accurate. But there’s a new problem that keeps Directors and CFOs up at night: Sustainability.

In an enterprise environment, using a massive, high-reasoning model (like Claude 3.5 or GPT-4o) for every single bibliography lookup is a “Cognitive Budget” disaster. It’s like hiring a Senior Architect to fix a broken link.

Today, we introduce The Accountant: A Semantic Router that classifies task complexity and routes requests to the cheapest model capable of passing the Judge’s rubric.

1. The Concept of “Tiered Intelligence”

Not all forensic tasks require the same level of “gray matter.” To scale effectively, we must categorize our workload:

  • LEVEL 1 (Operational): “Find the standard page count for the 1925 edition of Gatsby.” This is a lookup and retrieval task. Local SLMs (Small Language Models) like Phi-4 or Llama 3.2 excel here.
  • LEVEL 2 (Forensic): “Compare the binding grain and typography inconsistencies between two suspected forgeries.” This requires high-dimensional analysis and deep reasoning. This is a job for the Cloud.
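To make the tiering concrete, here is one way the levels might map to providers. The table structure and helper below are my illustration; the repository's actual configuration lives in config/prompts.yaml and may be shaped differently:

```python
# Illustrative tier table. Model names come from the post;
# the schema itself is a sketch, not the real config/prompts.yaml.
TIERS = {
    "LEVEL_1": {  # operational: lookup and retrieval
        "provider": "ollama",
        "models": ["phi-4", "llama3.2"],
    },
    "LEVEL_2": {  # forensic: high-dimensional analysis, deep reasoning
        "provider": "anthropic",
        "models": ["claude-3-5-sonnet"],
    },
}

def get_provider_for_level(level: str) -> str:
    # Unknown or malformed levels fall through to the expensive-but-safe tier.
    return TIERS.get(level, TIERS["LEVEL_2"])["provider"]
```

Note the fallback in the lookup: anything the router cannot place lands on the high-reasoning tier by design.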
[Figure: The Semantic Router architecture. A user request enters the Accountant, which classifies it as Level 1 (simple/metadata) or Level 2 (complex forensic). Level 1 routes to a local Tier 1 SLM such as Phi-4 or Llama 3.2; Level 2 routes to a Tier 2 frontier cloud model such as Claude 3.5. Both paths converge to produce a final audit report.]
The Semantic Router Architecture: implementing tiered intelligence to optimize cognitive budget and reduce inference costs.

2. Implementing the Router (The Gatekeeper Pattern)

We’ve added router.py to our repository. The logic acts as a gatekeeper.
1. Classification: A lightweight model (the Accountant) reviews the user’s query against our config/prompts.yaml.
2. Economic Decision: If the query is “Level 1,” we trigger the ollama provider. If it’s “Level 2,” we escalate to the anthropic provider.

# The Accountant's Decision Engine in router.py
level = await classify_query(query)
provider = get_provider_for_level(level)

if level == "LEVEL_1":
    print("Accountant Decision: LEVEL_1 - Routing to Local SLM to save budget")
else:
    print("Accountant Decision: LEVEL_2 - Routing to High-Reasoning Cloud Model")

By defaulting to LEVEL_2 if classification fails, we ensure that we never sacrifice accuracy for cost – we only save money when we are certain the tasks are simple.
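That fail-safe default can be sketched directly. The `_ask_lightweight_model` stub and the exception-handling shape are assumptions for illustration, not the repository's actual classify_query:

```python
import asyncio

async def _ask_lightweight_model(query: str) -> str:
    # Stub for the real SLM classification call; a metadata-style query
    # stands in for a "simple" task, anything else simulates a failure.
    if "page count" in query.lower():
        return "LEVEL_1"
    raise RuntimeError("classifier unavailable")

async def classify_query(query: str) -> str:
    # Fail safe: any error or unexpected label escalates to LEVEL_2,
    # so we only save money when classification clearly succeeds.
    try:
        label = await _ask_lightweight_model(query)
        return label if label in ("LEVEL_1", "LEVEL_2") else "LEVEL_2"
    except Exception:
        return "LEVEL_2"

print(asyncio.run(classify_query("Find the page count for Gatsby")))  # LEVEL_1
print(asyncio.run(classify_query("Compare binding grain")))           # LEVEL_2
```

The key design choice is that the cheap path must be opted into; the expensive path is the default.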

3. Projecting the ROI with The Judge

While we built the Accountant (the router), we haven’t yet run a full-scale economic audit in this repository. However, the architecture is designed to scientifically measure this trade-off using the Judge Agent (from our last post).

In an enterprise environment, a Director would use this framework to benchmark a representative sample of historical queries. A typical analysis for tiered intelligence systems shows that the vast majority of “forensic” requests are actually simple metadata lookups. By routing those to a local SLM (Phi-4 or Llama 3.2), we can achieve comparable reliability scores to a frontier cloud model while zeroing out the marginal cost of those specific tokens.

The Theoretical Savings (100k Calls/Month):

  • Current Cost (Frontier Cloud for 100% of tasks): ~$7,600/month
  • Projected Cost (90/10 Routed Split): ~$1,800/month
  • Total Savings: ~76% reduction in inference costs.
Task Category                        | Estimated Volume | “Status Quo” Cost (Frontier Cloud) | “Routed” Cost (Accountant/SLM)
Level 1 (Standard Lookup/Formatting) | 90% (90k calls)  | ~$4,500                            | ~$0 (Local/Self-Hosted)
Level 2 (Deep Forensic Analysis)     | 10% (10k calls)  | ~$3,100                            | ~$1,800*
Total Cognitive Budget               | 100%             | ~$7,600                            | ~$1,800

* Note: Level 2 “Routed” costs are lower here because the Accountant ensures only the most complex 10% of tokens hit the high-cost provider, whereas the “Status Quo” assumes a higher average cost across all 100k calls due to the lack of optimization.
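The table's arithmetic can be checked in a few lines. The per-call rates below are back-solved from the post's monthly totals; they are modeling assumptions, not measured prices:

```python
CALLS = 100_000
SPLIT = {"LEVEL_1": 0.90, "LEVEL_2": 0.10}

# Per-call rates back-solved from the monthly totals above (assumptions):
# status quo sends everything to the frontier cloud; routed sends
# Level 1 to a free local SLM and only Level 2 to the cloud.
STATUS_QUO_RATE = {"LEVEL_1": 0.05, "LEVEL_2": 0.31}
ROUTED_RATE     = {"LEVEL_1": 0.00, "LEVEL_2": 0.18}

def monthly_cost(rates: dict) -> float:
    return sum(CALLS * SPLIT[lvl] * rates[lvl] for lvl in SPLIT)

status_quo = monthly_cost(STATUS_QUO_RATE)  # ~$7,600
routed = monthly_cost(ROUTED_RATE)          # ~$1,800
savings = 1 - routed / status_quo
print(f"${status_quo:,.0f} -> ${routed:,.0f} ({savings:.0%} saved)")
```

Running this reproduces the ~76% reduction claimed above; the sensitivity is almost entirely in the 90/10 split assumption.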

Cognitive Budgeting Insights

As a Director, the responsibility is to build Sustainable Intelligence. If 80% of an AI workload can be moved to local infrastructure or cheaper “Flash” models without dropping our reliability score, I’m not just a developer—I’m a profit center. Semantic routing allows us to scale AI horizontally without the cloud bill scaling vertically.

🛠️ Step into the Clean-Room

The Accountant logic is now live in the repository. You can test the routing logic yourself by running the local orchestrator with the --use-accountant flag.

Explore the Code: MCP Forensic Analyzer on GitHub

(If this architecture helps your team justify their AI spend, consider dropping a ⭐ on the repo!)

The Production-Grade AI Series

  • Post 1: The Judge Agent: Who Audits the Auditors? (Reliability)
  • Post 2: The Accountant: Optimizing AI Costs with Semantic Routing (Sustainability) – You’re Here
  • Post 3: The Guardian: Human-in-the-Loop Governance (Safety) – Coming Soon

Looking for the foundation? Check out my previous series: The Zero-Glue AI Mesh with MCP.
