The Backyard Quarry, Part 4: Searching a Pile of Rocks

By this point, the Backyard Quarry has a schema, a capture process, and a growing collection of records.

Each rock has:

  • metadata
  • images
  • possibly a 3D model

In theory, everything is organized.

In practice, it quickly becomes difficult to find anything.

The First Search Problem

With a handful of rocks, you can rely on memory.

You remember roughly where things are.

You recognize shapes and colors.

But as the dataset grows, that breaks down.

You start asking questions like:

  • Which rocks are under 5 pounds?
  • Which ones are suitable for landscaping?
  • Where did that smooth gray stone go?

At that point, you’re no longer dealing with a pile.

You’re dealing with a dataset.

And datasets need to be searchable.

Filtering by Metadata

The most straightforward approach is to use structured queries.

If we have metadata like weight, color, and classification, we can filter directly.

Conceptually:

SELECT *
FROM rocks
WHERE weight_lb < 5
  AND color = 'gray'
  AND class_rank <= 2  -- ordinal rank, not a string comparison

This works well for clearly defined attributes.

It’s predictable.

It’s efficient.

And it’s the foundation of most data systems.
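The query above can be run against a real table. Here's a minimal sketch using Python's built-in sqlite3, with a hypothetical `rocks` table (the column names and sample data are made up for illustration):

```python
import sqlite3

# In-memory database with a minimal, hypothetical rocks table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE rocks (
        id INTEGER PRIMARY KEY,
        weight_lb REAL,
        color TEXT,
        class_rank INTEGER  -- 1 = smallest class, 5 = largest
    )
""")
conn.executemany(
    "INSERT INTO rocks (weight_lb, color, class_rank) VALUES (?, ?, ?)",
    [(0.3, "gray", 1), (2.5, "gray", 2), (40.0, "brown", 4), (3.0, "red", 2)],
)

# Structured filter: light, gray, at or below class rank 2.
rows = conn.execute(
    "SELECT id, weight_lb, color FROM rocks "
    "WHERE weight_lb < 5 AND color = 'gray' AND class_rank <= 2"
).fetchall()
print(rows)  # only the two gray rocks match
```

The filter is cheap, deterministic, and indexable, which is exactly why it's the default retrieval mechanism.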

The Role of Classification

This is where the Quarry Taxonomy starts to pay off.

Instead of requiring precise measurements, we can use categories:

  • Pebble Class
  • Hand Sample
  • Landscaping Rock
  • Wheelbarrow Class
  • Engine Block Class

This allows for simpler queries:

  • “Show me everything below Wheelbarrow Class”
  • “Exclude Engine Block Class entirely”

Classification reduces complexity.

It turns continuous values into discrete groups.

This is a common pattern in real-world systems.
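Turning continuous weight into discrete classes can be as simple as a lookup function. The class names below come from the taxonomy above; the weight cutoffs are illustrative assumptions, not official thresholds:

```python
def classify(weight_lb: float) -> str:
    """Map a continuous weight to a discrete taxonomy class.

    Cutoffs are illustrative, not official.
    """
    if weight_lb < 0.5:
        return "Pebble Class"
    if weight_lb < 5:
        return "Hand Sample"
    if weight_lb < 30:
        return "Landscaping Rock"
    if weight_lb < 150:
        return "Wheelbarrow Class"
    return "Engine Block Class"

print(classify(2.5))  # Hand Sample
print(classify(200))  # Engine Block Class
```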

When Metadata Isn’t Enough

Structured queries work well when you know exactly what you’re looking for.

But sometimes you don’t.

Sometimes the question looks more like:

Find rocks that look like this one.

Or:

Find something similar to the smooth stone I saw earlier.

At that point, metadata alone isn’t enough.

We need another way to compare objects.

Similarity and Representation

Images and 3D models contain information that isn’t captured in simple fields like color or weight.

To use that information, we need to represent it in a comparable way.

One approach is to generate embeddings — numerical representations of images or shapes.

Conceptually:

  • each rock image → vector representation
  • similar images → vectors close together
  • dissimilar images → vectors further apart

This allows for similarity search.

Instead of filtering by attributes, we search by resemblance.
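The core operation is comparing vectors. Here's a sketch with cosine similarity; the embeddings are hand-picked toy numbers, not real image features, which in practice would come from a vision model:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings, hand-picked so the two gray stones land close together.
smooth_gray = [0.9, 0.1, 0.3]
rough_gray  = [0.8, 0.2, 0.4]
red_granite = [0.1, 0.9, 0.2]

print(cosine_similarity(smooth_gray, rough_gray))   # close to 1
print(cosine_similarity(smooth_gray, red_granite))  # much lower
```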

A Different Kind of Query

With similarity search, queries look different.

Instead of:

color = 'gray'
weight_lb < 5

We might have:

find nearest neighbors to this image

This shifts the system from exact matching to approximate matching.

It’s less precise.

But often more useful.
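A nearest-neighbor query is, at its simplest, "sort everything by distance to the query vector and keep the top k." A brute-force sketch, which is fine for a backyard-sized dataset (large systems use approximate indexes instead):

```python
def nearest_neighbors(query, catalog, k=2):
    """Brute-force k-NN by squared Euclidean distance."""
    def dist2(vec):
        return sum((q - x) ** 2 for q, x in zip(query, vec))
    ranked = sorted(catalog.items(), key=lambda item: dist2(item[1]))
    return [rock_id for rock_id, _ in ranked[:k]]

# Toy 2-D embeddings standing in for image vectors.
catalog = {
    "rock-001": [0.9, 0.1],
    "rock-002": [0.8, 0.2],
    "rock-003": [0.1, 0.9],
}
print(nearest_neighbors([0.85, 0.15], catalog))  # the two nearby rocks
```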

A Familiar Pattern

At this point, the Backyard Quarry starts to resemble systems used in:

  • image search engines
  • product recommendation systems
  • digital asset management platforms
  • AI-powered retrieval systems

The objects are different.

The pattern is the same.

Store data.

Index it.

Provide multiple ways to retrieve it.

Combining Approaches

In practice, the most useful systems combine both methods.

Structured filtering:

  • weight
  • class
  • location

Similarity search:

  • appearance
  • shape
  • texture

Together, they provide flexibility.

You can narrow down the dataset and then explore it.
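The two stages compose naturally: apply the cheap structured filter first, then rank the survivors by embedding distance. A sketch with hypothetical field names:

```python
def search(rocks, max_weight, query_vec, k=3):
    """Structured filter, then similarity ranking.

    `rocks` is a list of dicts with hypothetical keys:
    id, weight_lb, and embedding (a small vector).
    """
    # Stage 1: narrow down with a cheap structured filter.
    candidates = [r for r in rocks if r["weight_lb"] < max_weight]

    # Stage 2: rank the remainder by distance to the query embedding.
    def dist2(r):
        return sum((q - x) ** 2 for q, x in zip(query_vec, r["embedding"]))
    return sorted(candidates, key=dist2)[:k]

rocks = [
    {"id": "r1", "weight_lb": 2.0, "embedding": [0.9, 0.1]},
    {"id": "r2", "weight_lb": 80.0, "embedding": [0.9, 0.1]},  # filtered out
    {"id": "r3", "weight_lb": 3.0, "embedding": [0.1, 0.9]},
]
hits = search(rocks, max_weight=5, query_vec=[0.85, 0.15])
print([r["id"] for r in hits])  # r1 ranks ahead of r3; r2 is never considered
```

Filtering first also keeps the expensive similarity computation off most of the dataset.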

The Cost of Search

Search doesn’t come for free.

It introduces:

  • indexing overhead
  • additional storage
  • preprocessing steps
  • more complex queries

And like everything else in the Quarry system, these tradeoffs become more significant as the dataset grows.

The Realization

At this point, something interesting becomes clear.

The hard part isn’t collecting rocks.

It isn’t even modeling them.

The hard part is making the data usable.

And usability, in most systems, comes down to one thing:

Search.

What Comes Next

With data captured and searchable, the next step is to zoom out.

What we’ve built so far is more than just a rock catalog.

It’s a small example of a larger idea.

In the next post, we’ll look at that idea more directly:

Digital twins.

Because once you can represent, store, and search objects, you’ve taken the first step toward building systems that mirror the physical world.

And somewhere in the process, it becomes clear that even a pile of rocks benefits from thoughtful indexing.

Which is not something I expected to say when this started.


The Backyard Quarry, Part 3: Capturing the Physical World

In the previous post, we designed a schema for representing rocks as structured data.

On paper, everything looked clean.

Each rock would have:

  • an identifier
  • dimensions
  • weight
  • metadata
  • possibly images or even a 3D model

The structure made sense.

The problem was getting the data.

From Schema to Reality

Designing a schema is straightforward.

You can sit down with a notebook or a whiteboard and define exactly what you want the system to store.

Capturing real-world data is a different problem entirely.

The moment you step outside, a few complications become obvious.

Lighting changes.

Objects aren’t uniform.

Measurements are approximate.

And perhaps most importantly:

The dataset doesn’t behave consistently.

The Scale Problem

The Backyard Quarry dataset spans a wide range of sizes:

  • pea-sized
  • hand-sized
  • wheelbarrow-sized
  • engine-block-sized

That variability immediately affects how data can be captured.

Small rocks can be photographed on a table.

Medium rocks might need to be placed on the ground with careful framing.

Large rocks don’t move easily at all.

Each category introduces different constraints.

This is a pattern that shows up in many real-world systems.

The same pipeline rarely works for every object.

Image Capture

The simplest form of data capture is photography.

Take a few images of each rock from different angles.

Store them.

Attach them to the record.

Even this introduces decisions:

  • how many images per object?
  • what angles?
  • what lighting conditions?
  • what background?

Inconsistent capture leads to inconsistent data.

And inconsistent data leads to unreliable systems.
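One way to keep capture consistent is to make those decisions explicit and check every batch against them. A sketch with made-up requirements:

```python
# Hypothetical capture requirements, written down once and enforced everywhere.
CAPTURE_SPEC = {
    "min_images": 3,          # at least three angles per rock
    "background": "neutral",  # consistent backdrop
}

def check_capture(batch):
    """Return a list of problems instead of silently accepting bad input."""
    problems = []
    if len(batch["images"]) < CAPTURE_SPEC["min_images"]:
        problems.append("too few images")
    if batch.get("background") != CAPTURE_SPEC["background"]:
        problems.append("non-standard background")
    return problems

print(check_capture({"images": ["a.jpg"], "background": "grass"}))
```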

Introducing Photogrammetry

If we take the idea a step further, we can generate a 3D model of each rock.

Photogrammetry works by combining multiple images to reconstruct the shape of an object.

Conceptually:

  • take overlapping photos
  • feed them into a processing tool
  • generate a 3D mesh

This produces a much richer representation than a single image.

But it also introduces:

  • processing time
  • storage requirements
  • failure cases

Not every rock will produce a clean model.

The Capture Pipeline

At this point, the process starts to look like a pipeline.

[Figure: A simplified pipeline for turning a physical object into structured data and associated assets: image capture → photogrammetry processing → metadata extraction → storage.]

Each step transforms the data in some way.

The output of one stage becomes the input of the next.

This is a common pattern in data engineering.

The difference here is that the input isn’t a clean dataset.

It’s the physical world.
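A pipeline like this is naturally expressed as a chain of stages, each consuming the previous stage's output. A skeletal sketch; the stage bodies are placeholders, not real photogrammetry:

```python
def capture_images(rock_id):
    # Placeholder: in reality, photos from a camera or phone.
    return [f"{rock_id}_angle{i}.jpg" for i in range(3)]

def build_model(images):
    # Placeholder: photogrammetry would reconstruct a mesh from overlap.
    return {"mesh": "model.obj", "source_images": images}

def extract_metadata(model):
    # Placeholder: measurements derived from the reconstructed mesh.
    return {"assets": model}

def store(record):
    # Placeholder: metadata to a database, assets to object storage.
    return record

def pipeline(rock_id):
    """Each stage's output becomes the next stage's input."""
    return store(extract_metadata(build_model(capture_images(rock_id))))

print(pipeline("rock-042"))
```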

Imperfect Data

No matter how carefully you design the pipeline, real-world data introduces imperfections.

Examples:

  • missing images
  • inconsistent lighting
  • partially occluded objects
  • measurement errors

A rock might be:

  • too reflective
  • too uniform in texture
  • partially buried
  • awkwardly shaped

All of these affect the output.

This means the system has to tolerate incomplete or imperfect data.

Which leads to an important realization:

Data systems are rarely about perfect data.
They are about handling imperfect data gracefully.
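In code, handling imperfect data gracefully often just means treating most fields as optional and recording what's missing instead of failing. A sketch with hypothetical field names:

```python
REQUIRED = {"id"}
EXPECTED = {"id", "weight_lb", "color", "images"}

def ingest(raw: dict):
    """Accept a partial record; reject only if the bare minimum is absent."""
    missing_required = REQUIRED - raw.keys()
    if missing_required:
        raise ValueError(f"cannot ingest without {missing_required}")
    record = {field: raw.get(field) for field in EXPECTED}
    # Track gaps explicitly so later queries can account for them.
    record["missing_fields"] = sorted(EXPECTED - raw.keys())
    return record

r = ingest({"id": "rock-007", "color": "gray"})
print(r["missing_fields"])  # ['images', 'weight_lb']
```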

Storage Considerations

Once data is captured, it needs to be stored.

Different types of data behave differently:

  • metadata → small, structured, easy to query
  • images → larger, unstructured
  • 3D models → even larger, more complex

This reinforces a pattern introduced earlier:

Separate structured data from large assets.

Store references rather than embedding everything directly.
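The reference pattern shows up directly in the record shape: the metadata row holds paths or URIs, while the heavy bytes live in file or object storage. A sketch with hypothetical paths:

```python
import json

# Metadata row: small, structured, queryable.
rock = {
    "id": "rock-042",
    "weight_lb": 3.2,
    "class": "Hand Sample",
    # References only -- the actual bytes live elsewhere.
    "image_uris": ["assets/rock-042/front.jpg", "assets/rock-042/top.jpg"],
    "model_uri": "assets/rock-042/mesh.obj",
}

# The record stays tiny no matter how large the assets get.
print(len(json.dumps(rock)), "bytes of metadata")
```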

A Familiar Pattern

At this point, the Backyard Quarry pipeline looks surprisingly familiar.

It resembles systems used for:

  • scanning historical artifacts
  • capturing industrial parts
  • generating 3D models for manufacturing
  • building datasets for computer vision

The specifics change.

The pattern remains the same.

What Comes Next

Once data is captured and stored, the next problem emerges.

How do we find anything?

A dataset of a few rocks is manageable.

A dataset of hundreds or thousands quickly becomes difficult to navigate without structure.

In the next post, we’ll look at how to index and search the dataset — and how even a pile of rocks benefits from thoughtful retrieval systems.

And somewhere along the way, it becomes clear that the hard part isn’t designing the schema.

It’s building systems that can reliably turn messy reality into usable data.
