The Local Eye (Sovereign Vision)

We’ve built a system that is Reliable, Affordable, and Governed. But until now, our Forensic Team has been “blind.” It could only reconcile text-based metadata.

In the world of rare book forensics, the text is only half the story. The typography, paper grain, and binding texture are the true “fingerprints.” However, sending high-resolution, proprietary scans of a $50,000 asset to a cloud-based LLM is a Data Sovereignty nightmare.

Today, we introduce The Local Eye: Edge-based Multimodal Vision that processes pixels without letting them leak into the cloud.

The Sovereignty Gap in Multimodal AI

Most multimodal implementations send raw images directly to frontier models (like GPT-4o). For an enterprise, this is a liability.

  1. Intellectual Property: Who owns the training data rights to the scan?
  2. Privacy: Does the image contain metadata or background information that violates NDAs?
  3. Cost: Sending 10MB 4K images for every query is an “Accountant’s” nightmare.

Implementing “Feature Extraction” at the Edge

Instead of sending the image to the cloud, we use Llama 3.2 Vision running locally via Ollama. Our MCP server acts as an “Airlock.”

The Handshake:

  1. Normalization: The sharp library resizes and standardizes the forensic scan locally.
  2. Local Inference: The Vision SLM analyzes the image and generates a text-based “Feature Map.”
  3. Metadata Egress: Only the textual description is passed to the reasoning agents. Even if The Accountant routes the task to a Cloud model for deep analysis, the cloud only sees our description, never the pixels.

Architectural diagram of the 'Local Eye' workflow. An artifact image is processed locally using the Sharp library and Llama 3.2 Vision. Only the resulting text metadata is allowed to pass through the security airlock to cloud-based reasoning models, ensuring the original pixels never leave the local environment.
The Sovereign Vision Workflow—Extracting intelligence at the edge to prevent data leakage.


In code, the airlock might look something like this:

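// From src/index.ts: The Vision Airlock
import sharp from 'sharp';
import ollama from 'ollama';

async function analyzeArtifactVision(imagePath: string, focus: string): Promise<string> {
  // Normalize locally. fit: 'inside' keeps the aspect ratio, so page margins are not cropped away.
  const processedImage = await sharp(imagePath)
    .resize(512, 512, { fit: 'inside' })
    .toBuffer();

  // Local-only call to Ollama
  const result = await ollama.generate({
    model: 'llama3.2-vision',
    prompt: `Analyze the ${focus} of this artifact.`,
    images: [processedImage.toString('base64')]
  });

  // generate() resolves to an object; the text lives in its `response` field
  return result.response; // Pixels stay here. Only text leaves.
}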

The “Zero-Pixel” Policy

The goal is to maximize Intelligence while minimizing Exposure. By implementing Local Vision, we treat the cloud as a “Reasoning Utility,” not a “Data Store.” We send it the logic puzzle, but we never give it the raw forensic evidence. We gain the power of frontier-model reasoning without the risk of data harvesting.
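One way to make the policy enforceable rather than aspirational is a small egress check on anything bound for the cloud. The sketch below is an illustration only: the `FeatureMap` shape, the `toCloudPayload` name, and the 256-character threshold are assumptions, not code from the Sovereign Vault. It refuses text that looks like a data URI or a long unbroken base64 run.

```typescript
/** Text-only payload allowed through the airlock (hypothetical shape). */
interface FeatureMap {
  focus: string;
  description: string;
}

// Heuristic (assumed threshold): data URIs or very long unbroken base64 runs
// suggest encoded pixels rather than a prose description.
const IMAGE_LIKE = /data:image\/|[A-Za-z0-9+/]{256,}={0,2}/;

/** Build the cloud-bound payload, refusing anything that looks like image data. */
function toCloudPayload(focus: string, description: string): FeatureMap {
  for (const field of [focus, description]) {
    if (IMAGE_LIKE.test(field)) {
      throw new Error("Egress blocked: payload appears to contain encoded image data");
    }
  }
  return { focus, description };
}
```

A heuristic like this will not catch every leak, but it turns an accidental "attach the buffer" mistake into a loud failure at the boundary.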

Developer Lessons: The “Latency of Locality”

In building the Sovereign Vault, we learned that ‘Data Sovereignty’ has a physical cost: Time.

While a cloud-based API might analyze a 4K image in seconds, running OCR and visual analysis with Llama 3.2 Vision on local consumer hardware takes significantly longer. We had to tune our “Airlock” timeouts, raising the ceiling from 120 seconds to 300 seconds, to give the local “Eye” enough time to process complex handwriting on a standard CPU.
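A timeout ceiling like this can be expressed as a small wrapper around the local call. This is a minimal sketch, assuming a plain `Promise.race` approach; the `withTimeout` helper and the `VISION_TIMEOUT_MS` constant are hypothetical names, and the real server may configure its timeout differently.

```typescript
// Hypothetical constant mirroring the 300-second ceiling described above.
const VISION_TIMEOUT_MS = 300 * 1000;

/** Reject if `work` has not settled within `ms` milliseconds. */
async function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  try {
    return await Promise.race([work, deadline]);
  } finally {
    // Always clear the timer so a fast result does not keep the process alive.
    if (timer !== undefined) clearTimeout(timer);
  }
}

// Usage (assumes the analyzeArtifactVision function from the airlock listing):
// const text = await withTimeout(
//   analyzeArtifactVision("scan.png", "inscription"), VISION_TIMEOUT_MS, "local vision");
```

Note that this only stops waiting; it does not cancel the underlying inference, which keeps running until the model finishes or the request is aborted separately.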

Additionally, we realized that our error logs were a potential privacy leak. We implemented Log Truncation to ensure that even our failures respect the Sovereign Vault’s privacy mandate.
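As an illustration of what Log Truncation could look like, here is a minimal sketch. The `truncateForLog` name, the 200-character limit, and the base64 masking rule are assumptions made for this example, not the project's actual implementation.

```typescript
// Hypothetical limit on how much of any message reaches the log.
const MAX_LOG_CHARS = 200;

/** Shorten a value for logging and mask long base64-looking runs. */
function truncateForLog(value: unknown, maxChars: number = MAX_LOG_CHARS): string {
  const text = value instanceof Error ? value.message : String(value);
  // Mask runs of 64+ base64 characters, which are likely encoded image data.
  const masked = text.replace(/[A-Za-z0-9+/]{64,}={0,2}/g, "[binary omitted]");
  if (masked.length <= maxChars) return masked;
  const dropped = masked.length - maxChars;
  return `${masked.slice(0, maxChars)}... [truncated ${dropped} chars]`;
}

// Usage: console.error("vision failed:", truncateForLog(err));
```

Masking before truncating matters: a failed request that echoes its base64 image would otherwise leave the first 200 characters of pixel data in the log.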

The “Zero-Glue” Discovery

In a traditional setup, adding vision would require rewriting the orchestrator’s core logic. Because we use the Model Context Protocol, the orchestrator simply asked the server: “What can you do?” The server replied with the analyze_artifact_vision manifest. The agent then dynamically decided to use this new “Eye” to investigate the Gatsby image. No new glue code was written to connect the vision model to the reasoning brain.
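To make the discovery step concrete, the sketch below models it without any SDK. The descriptor fields (`name`, `description`, `inputSchema`) and the `tools/call` request shape follow the MCP specification; the `imagePath` and `focus` argument names and the `buildToolCall` helper are assumptions for illustration.

```typescript
/** Shape of one entry in an MCP `tools/list` result. */
interface ToolDescriptor {
  name: string;
  description?: string;
  inputSchema: { type: "object"; properties?: Record<string, unknown>; required?: string[] };
}

// Hypothetical manifest, as the server might advertise the new tool.
const advertised: ToolDescriptor[] = [
  {
    name: "analyze_artifact_vision",
    description: "Describe visual features of an artifact scan using a local vision model.",
    inputSchema: {
      type: "object",
      properties: { imagePath: { type: "string" }, focus: { type: "string" } },
      required: ["imagePath", "focus"],
    },
  },
];

/** Build a `tools/call` request only if the tool was advertised and its required arguments are present. */
function buildToolCall(tools: ToolDescriptor[], name: string, args: Record<string, unknown>) {
  const tool = tools.find((t) => t.name === name);
  if (!tool) throw new Error(`Tool not advertised: ${name}`);
  const missing = (tool.inputSchema.required ?? []).filter((key) => !(key in args));
  if (missing.length > 0) throw new Error(`Missing arguments: ${missing.join(", ")}`);
  return { method: "tools/call" as const, params: { name, arguments: args } };
}
```

The orchestrator never hard-codes the tool: it reads whatever the server advertises and builds the call from the schema, which is why adding the "Eye" required no changes on the agent side.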

Case Study: The Gatsby Inscription

To test our Sovereign Vault, we ran a forensic audit on a high-value first edition of The Great Gatsby. Our local Vision Agent detected something anomalous on the title page: a cursive, multi-line inscription.

An image of The Great Gatsby copyright page
Image credit: [University of Southern Mississippi Special Collections](https://lib.usm.edu/spcol/exhibitions/item_of_the_month/iotm_june_2021.html) (June 2021 Item of the Month)

The Sovereign Trace

When we ran the analyze_artifact_vision tool, the local Llama 3.2 Vision model performed a deep scan and returned a fascinating finding:

**Visual Findings: Handwritten Inscription**
* Location: Right-hand margin of title page
* Medium: Faint pencil, cursive script
* Transcribed Content: "Then we are not alone at all when we remember that we have in our hearts that something so precious..."

Why this matters: Notice that the model didn’t just see “scribbles.” It attempted to transcribe a 40-word passage. Crucially, the Forensic Analyst (Claude) recognized that this text does not exist in any canonical version of The Great Gatsby.

This is a massive forensic win. The “Eye” identified a potential fabricated provenance or a non-standard owner intervention. Because this happened inside our “Airlock,” the specific handwriting and the non-canonical text were captured without ever touching a cloud API.

The Architect’s Trade-off: The Reasoning Gap

While our local Llama 3.2 Vision is an incredible “Eye,” it occasionally faces a Reasoning Gap. In certain runs, it may identify a note as “illegible” or produce repetitive output due to CPU thermal throttling or model constraints.

Instead of hallucinating a “clean” signature, our system is designed to Safe-Fail. It flags the finding as “Indeterminate” and triggers a High-Severity Human Authorization request.
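A Safe-Fail check of this kind might be sketched as follows. The `Verdict` type, the `triage` function, and the repetition threshold (fewer than 30% unique words across at least 8 words) are all assumptions chosen for illustration; the real system's criteria are not shown in this article.

```typescript
type Verdict =
  | { status: "ok"; transcription: string }
  | { status: "indeterminate"; reason: string; severity: "high"; requiresHumanAuthorization: true };

/** Crude repetition check: very few distinct words in a non-trivial output. */
function isRepetitive(text: string): boolean {
  const words = text.toLowerCase().split(/\s+/).filter(Boolean);
  if (words.length < 8) return false;
  return new Set(words).size / words.length < 0.3;
}

/** Classify raw vision output, escalating instead of guessing. */
function triage(raw: string): Verdict {
  const indeterminate = (reason: string): Verdict => ({
    status: "indeterminate",
    reason,
    severity: "high",
    requiresHumanAuthorization: true,
  });
  const text = raw.trim();
  if (text.length === 0) return indeterminate("empty output");
  if (/illegible/i.test(text)) return indeterminate("model reported illegible content");
  if (isRepetitive(text)) return indeterminate("repetitive output");
  return { status: "ok", transcription: text };
}
```

The important design choice is that every uncertain path returns a verdict requiring a human, so no downstream agent can mistake a degraded run for a clean reading.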

The Governance Challenge: We now have a transcribed inscription that might contain a previous owner’s private thoughts or names. If we simply passed this output to an LLM for summarization, we would have leaked a private message to a third-party server. This discovery sets the stage for our next architectural layer: The Redactor.


Building Your First MCP Server: TypeScript vs. Python

The 5-Minute “Hello World” Comparison

We’ve spent the last month talking about the End of Glue Code and the Enterprise AI Mesh. But if you’re a developer, you don’t just want to see the blueprint—you want to hold the tools.

Whether you are a TypeScript veteran or a Python enthusiast, building an MCP server is surprisingly simple. Today, we’re going to build the same “Hello World” tool in both languages to show you exactly how the protocol abstracts away the complexity.

1. The TypeScript Approach (Node.js)

TypeScript is the “native” language of the Model Context Protocol, and the @modelcontextprotocol/sdk is exceptionally robust for high-performance enterprise tools.

Prerequisites:

npm install @modelcontextprotocol/sdk zod

The Code:

import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

// McpServer is the high-level class that provides the .tool() helper;
// the low-level Server class does not have it.
const server = new McpServer({
  name: "hello-world-server",
  version: "1.0.0",
});

// Define a simple greeting tool
server.tool(
  "greet_user",
  { name: z.string().describe("The name of the person to greet") },
  async ({ name }) => {
    return {
      content: [{ type: "text", text: `Hello, ${name}! Welcome to the MCP Mesh.` }]
    };
  }
);

async function main() {
  const transport = new StdioServerTransport();
  await server.connect(transport);
}

main().catch(console.error);

2. The Python Approach

For data scientists and AI engineers, the Python SDK offers a clean, decorator-based approach. It feels more “agent-native” and integrates seamlessly with existing AI libraries.

Prerequisites:

pip install mcp

The Code:

from mcp.server.fastmcp import FastMCP

# Initialize FastMCP - the "Quick Start" wrapper
mcp = FastMCP("HelloWorld")

@mcp.tool()
async def greet_user(name: str) -> str:
    """Greets a user by name."""
    return f"Hello, {name}! Welcome to the MCP Mesh."

if __name__ == "__main__":
    mcp.run(transport='stdio')

Side-by-Side: Which Should You Choose?

| Feature | TypeScript (Standard SDK) | Python (FastMCP) |
| --- | --- | --- |
| Best For | High-performance, Type-safe tools | Rapid prototyping, AI logic |
| Validation | Zod (Explicit & Strict) | Pydantic / Type Hints (Implicit) |
| Verbosity | Moderate (Structured) | Minimal (Decorator-based) |
| Transport | STDIO, SSE, Custom | STDIO, SSE |

How to Test Your Server

Once you’ve saved your code, you don’t need a complex frontend to test it. Use the MCP Inspector:

# For TypeScript
npx @modelcontextprotocol/inspector node build/index.js

# For Python
npx @modelcontextprotocol/inspector python your_script.py

This will launch a local web interface where you can perform the “Protocol Handshake” and trigger your tools manually. It’s the best way to verify your “Zero-Glue” infrastructure before connecting it to an agent.

Conclusion

The “Zero-Glue” architecture isn’t about which language you use—it’s about the Protocol. As you can see, the logic for the “Hello World” tool is nearly identical in both versions. The Model Context Protocol ensures that no matter how you build your tools, your agents can discover and use them in a standardized way.

Ready to build your own?

Check out the reference repo for more complex examples, including Notion and Oracle 26ai integrations.

MCP Forensic Analyzer Repository

The “Zero-Glue” Series

What’s Next?

The Mesh is built.
The agents are ready.
But can you trust them?

In my next series, we explore the ‘Science of Reliability’—building the evaluators that turn AI experiments into production-grade systems.
