
What I’ve Been Building: Systems, AI, and Real-World Data

Over the past several weeks, I’ve been spending a lot of time thinking about systems.

Some of that thinking has taken the form of writing.

If you’ve come across any of my recent posts, they might seem like they cover very different topics:

cataloging rocks in a backyard
building AI systems using MCP
working with documents, images, and real-world data

At first glance, they don’t appear to have much in common.

But they’re all exploring the same underlying idea.

The Common Thread

Across all of these posts, the focus has been on a specific kind of problem:

How do we turn messy, real-world inputs into structured, usable systems?

That problem shows up in many different forms.

Sometimes the input is physical:

objects
artifacts
environments

Sometimes it’s digital:

documents
images
logs

Sometimes it’s dynamic:

motion
behavior
sensor data

But the challenge is the same.

The input is unstructured.

The system needs structure.

The Backyard Quarry

One way I explored this idea was through a small project I called the Backyard Quarry.

It started with a simple observation:

There are a lot of rocks in the yard.

From there, the problem evolved into something more interesting:

how to represent physical objects as data
how to capture images and measurements
how to build pipelines around that data
how to search and organize it
how to think about digital twins

What began as a small experiment became a way to explore system design in a constrained, tangible setting.
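As a minimal sketch of the first item, representing a physical object as data, a rock could be modeled as a small record. The field names here (`rock_id`, `weight_g`, `dimensions_mm`, `image_paths`) are assumptions made for illustration, not the schema used in the series.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class RockRecord:
    """Hypothetical record for one physical rock; all field names are illustrative."""
    rock_id: str
    weight_g: float
    dimensions_mm: tuple[float, float, float]  # length, width, height
    color: str = "unknown"
    image_paths: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict turns the dataclass into a plain dict that json can serialize
        return json.dumps(asdict(self), sort_keys=True)


rock = RockRecord("rock-001", 412.5, (90.0, 60.0, 45.0), image_paths=["rock-001/top.jpg"])
```

Once the object has a fixed shape like this, the remaining items on the list (capture, pipelines, search) have something concrete to operate on.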

MCP and AI Systems

In parallel, I’ve been writing about building AI systems using MCP.

On the surface, this looks very different.

Instead of rocks, the inputs are:

documents
APIs
models
agent workflows

But the structure is familiar.

inputs are ingested
processed
transformed
routed
used by applications

The system still needs to handle:

variability
scale
imperfect data
orchestration

Different inputs.

Same patterns.
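The shared shape can be sketched as a chain of small functions. Everything below is a hypothetical illustration rather than MCP code: the stage names mirror the list above, and the routing rule (grouping by a made-up `kind` field) is an assumption.

```python
from typing import Iterable

Record = dict  # a loosely structured record; a real system would use a stricter type


def ingest(raw_items: Iterable[str]) -> list[Record]:
    # Wrap each raw input in a record, whatever it looks like
    return [{"raw": item} for item in raw_items]


def process(records: list[Record]) -> list[Record]:
    # Handle imperfect data: trim whitespace and drop empty inputs
    cleaned = [{**r, "text": r["raw"].strip()} for r in records]
    return [r for r in cleaned if r["text"]]


def transform(records: list[Record]) -> list[Record]:
    # Derive a structured field from the cleaned text
    return [
        {**r, "kind": "question" if r["text"].endswith("?") else "statement"}
        for r in records
    ]


def route(records: list[Record]) -> dict[str, list[Record]]:
    # Group records by kind so each consumer sees only what it needs
    routed: dict[str, list[Record]] = {}
    for r in records:
        routed.setdefault(r["kind"], []).append(r)
    return routed


def run(raw_items: Iterable[str]) -> dict[str, list[Record]]:
    return route(transform(process(ingest(raw_items))))


routed = run(["  How big is it? ", "", "Granite, 400 g"])
```

The empty string is dropped during processing, and the two remaining inputs end up under different keys for an application to pick up.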

From Objects to Systems

One of the more useful realizations in working through these ideas is this:

The problem is rarely about the individual object.
It’s about the system that handles many objects over time.

Whether the object is:

a rock
a document
a sensor reading

the questions become:

how is it represented?
how does it enter the system?
how is it transformed?
how is it stored?
how is it retrieved?

These are system-level questions.

A Shared Architecture

Across these different domains, a common architecture begins to emerge.

[Figure: raw inputs are captured, processed, structured, indexed, and used by applications in a data system. Caption: A common pattern for transforming real-world inputs into usable systems.]

The labels change depending on the domain.

But the structure remains consistent.

Why This Matters

Understanding this pattern makes it easier to approach new problems.

Instead of starting from scratch each time, you can ask:

Where does the data come from?
How does it enter the system?
What transformations are required?
How will it be used?

This reduces complexity.

It also makes systems more predictable.

What I’m Interested In

Going forward, I’m particularly interested in systems that sit at the boundary between:

the physical world and digital systems
unstructured inputs and structured data
human workflows and automated processes

That includes areas like:

digital archiving
photogrammetry and 3D capture
AI-assisted analysis
systems that track objects or behavior over time

These problems are messy.

Which is part of what makes them interesting.

A Continuing Exploration

The posts I’ve been writing are not meant to be definitive.

They’re part of an ongoing exploration.

A way to think through problems in public.

And occasionally, a way to use a slightly unusual example — like a pile of rocks — to make broader ideas easier to see.

If You’re Interested

If any of this resonates, you might find these useful:

The Backyard Quarry Series

A systems-focused look at modeling and working with physical objects, starting with Turning Rocks Into Data.

MCP and AI Systems

A technical exploration of building agent-based systems and data pipelines. I’d suggest starting with The End of Glue Code: Why MCP is the USB-C Moment for AI Systems.

More to come.

And if nothing else, it turns out that even a backyard can be a good place to think about system design.

The Backyard Quarry, Part 7: Systems Beyond the Backyard

By now, the Backyard Quarry system has grown beyond its original intent.

We started with a pile of rocks.

We ended up with:

a schema
a capture process
a processing pipeline
storage and indexing
digital representations of physical objects

Along the way, something interesting happened.

The problems stopped feeling unique.

Recognizing the Pattern

At first, the Quarry felt like a small, slightly absurd project.

But the more pieces came together, the more familiar it became.

The same structure appeared again and again:

capture data from the physical world
transform it into structured representations
store it
index it
build systems on top of it

This isn’t a rock problem.

It’s a pattern.

Where the Pattern Appears

Once you start looking for it, you see it everywhere.

Manufacturing Systems

Physical parts become digital records.

components are tracked
condition is monitored
systems are modeled

Each part has a digital twin.

The system keeps everything connected.

Museums and Archives

Artifacts are cataloged and preserved.

metadata describes objects
images and scans capture detail
provenance tracks history

The goal is the same:

Turn physical objects into structured, searchable systems.
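A rough sketch of what such a catalog entry might look like, assuming a simple in-memory model. The class names, fields, and the example artifact are all hypothetical; real collection systems use much richer standards.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ProvenanceEvent:
    when: date
    holder: str
    note: str = ""


@dataclass
class Artifact:
    artifact_id: str
    title: str
    metadata: dict[str, str] = field(default_factory=dict)
    scans: list[str] = field(default_factory=list)  # paths or URIs to images and scans
    provenance: list[ProvenanceEvent] = field(default_factory=list)

    def history(self) -> list[ProvenanceEvent]:
        # Provenance is only meaningful in chronological order
        return sorted(self.provenance, key=lambda e: e.when)

    def current_holder(self) -> Optional[str]:
        events = self.history()
        return events[-1].holder if events else None


# Invented example data, entered out of order on purpose
vase = Artifact("A-17", "Stoneware vase", metadata={"material": "stoneware"})
vase.provenance.append(ProvenanceEvent(date(1988, 5, 2), "City Museum"))
vase.provenance.append(ProvenanceEvent(date(1931, 1, 15), "Private collection"))
```

Keeping provenance as a list of dated events, rather than a single "owner" field, is one way to preserve history instead of overwriting it.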

Photogrammetry and 3D Capture

Entire environments can be captured and reconstructed.

objects become meshes
scenes become models
real-world geometry becomes data

This is the Quarry pipeline, scaled up.
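To make "geometry becomes data" concrete, here is a minimal sketch of a triangle mesh stored as plain lists, with one derived measurement. The vertex and face layout is a common convention, but the type aliases and function names are assumptions for this example, not the output format of any particular capture tool.

```python
import math

Vertex = tuple[float, float, float]
Face = tuple[int, int, int]  # indices into the vertex list


def triangle_area(a: Vertex, b: Vertex, c: Vertex) -> float:
    # Half the magnitude of the cross product of two edge vectors
    ux, uy, uz = b[0] - a[0], b[1] - a[1], b[2] - a[2]
    vx, vy, vz = c[0] - a[0], c[1] - a[1], c[2] - a[2]
    cx, cy, cz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)


def surface_area(vertices: list[Vertex], faces: list[Face]) -> float:
    # Sum the area of every triangle in the mesh
    return sum(triangle_area(vertices[i], vertices[j], vertices[k]) for i, j, k in faces)


# A unit square in the z = 0 plane, split into two triangles
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
faces = [(0, 1, 2), (0, 2, 3)]
area = surface_area(vertices, faces)
```

Once an object is a list of vertices and faces, measurements like this become ordinary queries over data.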

AI and Document Systems

Even text-based systems follow the same pattern.

raw documents are ingested
processed into structured formats
indexed for retrieval
used by applications

The inputs are different.

The structure is familiar.
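The "indexed for retrieval" step can be sketched with a tiny inverted index. This is a deliberately simple stand-in, assuming keyword matching only; production document systems typically add ranking, embeddings, or both. The document ids and texts are invented.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters and digits
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    # Map each token to the set of document ids that contain it
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    # Return the documents that contain every query token (AND semantics)
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


docs = {
    "d1": "Granite rock, grey, 400 g",
    "d2": "Sandstone rock with red bands",
    "d3": "Invoice for garden tools",
}
index = build_index(docs)
```

A call such as `search(index, "red rock")` narrows three raw documents down to the one that mentions both terms.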

Healthcare and Motion

Human movement becomes data.

sensors capture motion
signals are processed
patterns are analyzed
systems track change over time

This is where the idea of digital twins becomes more dynamic.

Not just objects.

But behavior.
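A small sketch of that flow, assuming the sensor produces a list of joint-angle readings per session. The smoothing method (a simple moving average), the summary statistic (peak smoothed value), and the readings themselves are illustrative choices, not clinical practice.

```python
from statistics import mean


def moving_average(samples: list[float], window: int) -> list[float]:
    # Smooth noisy readings; the result has len(samples) - window + 1 values
    if window < 1 or window > len(samples):
        raise ValueError("window must be between 1 and len(samples)")
    return [mean(samples[i:i + window]) for i in range(len(samples) - window + 1)]


def change_over_time(sessions: dict[str, list[float]], window: int = 3) -> dict[str, float]:
    # Reduce each session to the peak of its smoothed signal
    return {label: max(moving_average(values, window)) for label, values in sessions.items()}


# Hypothetical knee-angle readings, in degrees, from two sessions
sessions = {
    "week_1": [10.0, 40.0, 70.0, 72.0, 68.0, 30.0],
    "week_4": [10.0, 50.0, 85.0, 90.0, 86.0, 35.0],
}
peaks = change_over_time(sessions)
```

Comparing `peaks["week_1"]` with `peaks["week_4"]` turns two noisy recordings into a single tracked quantity, which is the sense in which the twin describes behavior rather than a static object.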

The Common Structure

Across all of these domains, the same core system emerges.

It doesn’t matter whether the input is:

a rock
a machine part
an artifact
a document
a human movement pattern

The architecture is remarkably consistent.

Capture.

Process.

Store.

Index.

Use.
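One way to see how little of this depends on the domain is to write the five steps once and plug the domain in as functions. The `Pipeline` class below is a hypothetical sketch with in-memory storage; the parameter names (`process`, `key_of`, `terms_of`) are invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Generic, TypeVar

Raw = TypeVar("Raw")
Item = TypeVar("Item")


@dataclass
class Pipeline(Generic[Raw, Item]):
    process: Callable[[Raw], Item]         # raw input -> structured item
    key_of: Callable[[Item], str]          # how an item is identified in storage
    terms_of: Callable[[Item], list[str]]  # which terms make an item findable
    store: dict[str, Item] = field(default_factory=dict)
    index: dict[str, set[str]] = field(default_factory=dict)

    def capture(self, raw: Raw) -> str:
        item = self.process(raw)                  # Process
        key = self.key_of(item)
        self.store[key] = item                    # Store
        for term in self.terms_of(item):          # Index
            self.index.setdefault(term, set()).add(key)
        return key

    def use(self, term: str) -> list[Item]:
        # Use: look up stored items through the index, in key order
        return [self.store[k] for k in sorted(self.index.get(term, set()))]


# The rock-specific parts are just three small functions
rocks = Pipeline(
    process=lambda raw: {"id": raw[0], "color": raw[1]},
    key_of=lambda item: item["id"],
    terms_of=lambda item: [item["color"]],
)
rocks.capture(("rock-001", "grey"))
rocks.capture(("rock-002", "red"))
rocks.capture(("rock-003", "grey"))
grey_rocks = rocks.use("grey")
```

Swapping in different functions for documents or machine parts would leave the class itself unchanged.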

The Value of Abstraction

One of the more useful realizations from the Quarry project is this:

The value isn’t in the specific object.
It’s in the system that handles it.

Once you understand the pattern, you can apply it in different contexts.

The details change.

The structure remains.

Systems, Not Features

At a certain point, it becomes less useful to think in terms of features.

Instead, the focus shifts to systems.

Questions change.

Instead of:

How do we store this object?
How do we search this dataset?

You start asking:

How does data move through the system?
Where are the bottlenecks?
How do we handle growth?
How do we handle imperfect inputs?

These are system-level questions.
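The bottleneck question, for instance, can often be answered with very little machinery. The helper below is a minimal sketch that times each stage of a pipeline; the function names and the toy stages are assumptions for illustration.

```python
import time
from typing import Any, Callable

Stage = tuple[str, Callable[[Any], Any]]


def run_timed(stages: list[Stage], data: Any) -> tuple[Any, dict[str, float]]:
    # Run each stage in order and record how long it took, in seconds
    timings: dict[str, float] = {}
    for name, stage in stages:
        start = time.perf_counter()
        data = stage(data)
        timings[name] = time.perf_counter() - start
    return data, timings


def slowest(timings: dict[str, float]) -> str:
    # The stage with the largest elapsed time is the first place to look
    return max(timings, key=lambda name: timings[name])


# Toy stages standing in for real capture and processing work
stages: list[Stage] = [
    ("parse", lambda xs: [int(x) for x in xs]),
    ("square", lambda xs: [x * x for x in xs]),
    ("total", sum),
]
result, timings = run_timed(stages, ["1", "2", "3"])
```

With per-stage timings in hand, `slowest(timings)` points at where data is actually waiting, which is a question about the system rather than about any one feature.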

The Real Takeaway

The Backyard Quarry started as a simple, somewhat comical experiment.

But it revealed something broader.

Many modern systems are built on the same foundation:

transforming real-world inputs into structured data
building pipelines around that transformation
enabling search, analysis, and interaction

The objects change.

The pattern doesn’t.

Looking Back

It’s a little surprising how far the idea traveled.

From:

a pile of rocks

To:

data modeling
ingestion pipelines
search systems
digital twins
scalable architectures

And now:

recognizing patterns across industries

Not bad for something that started in the backyard.

What Comes Next

There’s one final step.

So far, we’ve explored:

how to model objects
how to capture them
how to store and search them
how systems scale
how patterns repeat

In the final post, we’ll bring everything together.

A single view of the system.

A way to think about it as a whole.

Because once you can see the full structure, the pattern becomes difficult to miss.

And at that point, it becomes clear that the Quarry was never really about rocks.

It was about learning to recognize systems.
