By this point, the Backyard Quarry has a schema, a capture process, and a growing collection of records.
Each rock has:
- metadata
- images
- possibly a 3D model
In theory, everything is organized.
In practice, it quickly becomes difficult to find anything.
The First Search Problem
With a handful of rocks, you can rely on memory.
You remember roughly where things are.
You recognize shapes and colors.
But as the dataset grows, that breaks down.
You start asking questions like:
- Which rocks are under 5 pounds?
- Which ones are suitable for landscaping?
- Where did that smooth gray stone go?
At that point, you’re no longer dealing with a pile.
You’re dealing with a dataset.
And datasets need to be searchable.
Filtering by Metadata
The most straightforward approach is to use structured queries.
If we have metadata like weight, color, and classification, we can filter directly.
Conceptually:
SELECT *
FROM rocks
WHERE weight_lb < 5
AND color = 'gray'
AND rock_class IN ('Pebble Class', 'Hand Sample');
This works well for clearly defined attributes.
It’s predictable.
It’s efficient.
And it’s the foundation of most data systems.
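To make that concrete, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The table layout, column names, and sample rows are hypothetical, chosen to mirror the conceptual query. The class condition is written as an explicit list of allowed classes.

```python
import sqlite3

# In-memory database standing in for the Quarry catalog.
# Table layout and sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE rocks (
        id INTEGER PRIMARY KEY,
        name TEXT,
        weight_lb REAL,
        color TEXT,
        rock_class TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO rocks (name, weight_lb, color, rock_class) VALUES (?, ?, ?, ?)",
    [
        ("smooth gray stone", 1.2, "gray", "Hand Sample"),
        ("gray pebble", 0.1, "gray", "Pebble Class"),
        ("red flagstone", 12.0, "red", "Landscaping Rock"),
        ("gray boulder", 80.0, "gray", "Wheelbarrow Class"),
    ],
)

# Parameterized query: values are passed separately instead of being
# pasted into the SQL string, and the database does the filtering.
rows = conn.execute(
    """
    SELECT name, weight_lb
    FROM rocks
    WHERE weight_lb < ?
      AND color = ?
      AND rock_class IN ('Pebble Class', 'Hand Sample')
    ORDER BY weight_lb
    """,
    (5, "gray"),
).fetchall()

for name, weight in rows:
    print(f"{name}: {weight} lb")
```

Only the gray pebble and the smooth gray stone come back. The boulder is gray but too heavy, and the flagstone is the wrong color.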
The Role of Classification
This is where the Quarry Taxonomy starts to pay off.
Instead of requiring precise measurements, we can use categories:
- Pebble Class
- Hand Sample
- Landscaping Rock
- Wheelbarrow Class
- Engine Block Class
This allows for simpler queries:
- “Show me everything below Wheelbarrow Class”
- “Exclude Engine Block Class entirely”
Classification reduces complexity.
It turns continuous values into discrete groups.
This is a common pattern in real-world systems.
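A minimal sketch of how this might look in Python, assuming the classes are stored as an ordered value rather than free text. The enum, the weight thresholds, and the sample rocks are all hypothetical. The real Quarry Taxonomy boundaries may differ.

```python
from bisect import bisect_right
from enum import IntEnum


class RockClass(IntEnum):
    # Ordered smallest to largest, so classes compare with < and >.
    PEBBLE = 1
    HAND_SAMPLE = 2
    LANDSCAPING = 3
    WHEELBARROW = 4
    ENGINE_BLOCK = 5


# Hypothetical exclusive upper weight bounds (lb) for the first four
# classes; anything at or above 150 lb is ENGINE_BLOCK.
THRESHOLDS_LB = [0.5, 5.0, 40.0, 150.0]


def classify(weight_lb: float) -> RockClass:
    """Map a continuous weight onto a discrete class."""
    # bisect_right counts how many thresholds the weight has reached.
    return RockClass(bisect_right(THRESHOLDS_LB, weight_lb) + 1)


# Hypothetical rocks and weights.
rocks = {
    "gray pebble": 0.1,
    "smooth gray stone": 1.2,
    "red flagstone": 12.0,
    "gray boulder": 80.0,
    "old foundation stone": 400.0,
}
classes = {name: classify(w) for name, w in rocks.items()}

# "Show me everything below Wheelbarrow Class"
below_wheelbarrow = [n for n, c in classes.items() if c < RockClass.WHEELBARROW]

# "Exclude Engine Block Class entirely"
no_engine_blocks = [n for n, c in classes.items() if c != RockClass.ENGINE_BLOCK]

print(below_wheelbarrow)
print(no_engine_blocks)
```

Because the classes have an order, "below Wheelbarrow Class" becomes a single comparison instead of a list of names to keep in sync.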
When Metadata Isn’t Enough
Structured queries work well when you know exactly what you’re looking for.
But sometimes you don’t.
Sometimes the question looks more like:
Find rocks that look like this one.
Or:
Find something similar to the smooth stone I saw earlier.
At that point, metadata alone isn’t enough.
We need another way to compare objects.
Similarity and Representation
Images and 3D models contain information that isn’t captured in simple fields like color or weight.
To use that information, we need to represent it in a comparable way.
One approach is to generate embeddings: numerical representations (vectors) of images or shapes, usually produced by a trained machine learning model.
Conceptually:
- each rock image → vector representation
- similar images → vectors close together
- dissimilar images → vectors farther apart
This allows for similarity search.
Instead of filtering by attributes, we search by resemblance.
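Cosine similarity is one common way to measure whether two embeddings are "close together." This sketch uses made-up four-dimensional vectors in place of real model output. An actual image embedding model would typically produce hundreds of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings; the numbers are invented for illustration.
smooth_gray = [0.9, 0.1, 0.8, 0.2]
another_smooth_gray = [0.85, 0.15, 0.75, 0.25]
jagged_red = [0.1, 0.9, 0.2, 0.7]

print(cosine_similarity(smooth_gray, another_smooth_gray))  # close to 1
print(cosine_similarity(smooth_gray, jagged_red))  # much lower
```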
A Different Kind of Query
With similarity search, queries look different.
Instead of:
color = 'gray'
weight_lb < 5
We might have:
find nearest neighbors to this image
This shifts the system from exact matching to approximate matching.
It’s less precise.
But often more useful.
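A nearest-neighbor query can be sketched as a brute-force scan: compute the distance from the query embedding to every rock and keep the closest k. The catalog, rock IDs, and vectors below are hypothetical, and Euclidean distance is just one reasonable choice of metric.

```python
import heapq
import math

# Hypothetical catalog: rock id -> embedding vector (made-up values).
catalog = {
    "rock-001": [0.90, 0.10, 0.80],
    "rock-002": [0.20, 0.90, 0.30],
    "rock-003": [0.85, 0.20, 0.75],
    "rock-004": [0.10, 0.80, 0.90],
    "rock-005": [0.80, 0.15, 0.90],
}


def nearest_neighbors(query, catalog, k=3):
    """Exact k-NN: measure the distance to every rock, keep the k smallest."""
    return heapq.nsmallest(
        k,
        catalog.items(),
        key=lambda item: math.dist(query, item[1]),
    )


query = [0.88, 0.12, 0.82]  # embedding of "this image" (also hypothetical)
for rock_id, vector in nearest_neighbors(query, catalog, k=2):
    print(rock_id, round(math.dist(query, vector), 3))
```

This is exact search over the whole catalog, which is fine at backyard scale. Larger systems usually switch to approximate nearest-neighbor indexes, which give up a small amount of accuracy for much faster queries.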
A Familiar Pattern
At this point, the Backyard Quarry starts to resemble systems used in:
- image search engines
- product recommendation systems
- digital asset management platforms
- AI-powered retrieval systems
The objects are different.
The pattern is the same.
Store data.
Index it.
Provide multiple ways to retrieve it.
Combining Approaches
In practice, the most useful systems combine both methods.
Structured filtering:
- weight
- class
- location
Similarity search:
- appearance
- shape
- texture
Together, they provide flexibility.
You can narrow down the dataset and then explore it.
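One way to sketch the combination: filter on structured fields first, then rank only the remaining rocks by embedding distance. The `Rock` fields, location names, and vectors are hypothetical, and a class filter could be added to the first step the same way.

```python
import math
from dataclasses import dataclass


@dataclass
class Rock:
    # Hypothetical fields mirroring the metadata discussed above.
    rock_id: str
    weight_lb: float
    location: str
    embedding: list[float]


def hybrid_search(rocks, query_embedding, max_weight_lb, location, k=3):
    # Step 1: structured filtering narrows the candidate set cheaply.
    candidates = [
        r for r in rocks
        if r.weight_lb < max_weight_lb and r.location == location
    ]
    # Step 2: similarity search ranks only the survivors by resemblance.
    ranked = sorted(
        candidates,
        key=lambda r: math.dist(query_embedding, r.embedding),
    )
    return ranked[:k]


rocks = [
    Rock("rock-001", 1.2, "garden bed", [0.90, 0.10, 0.80]),
    Rock("rock-002", 3.0, "garden bed", [0.20, 0.90, 0.30]),
    Rock("rock-003", 0.4, "garden bed", [0.85, 0.20, 0.75]),
    Rock("rock-004", 60.0, "garden bed", [0.88, 0.12, 0.82]),
    Rock("rock-005", 2.0, "driveway", [0.88, 0.12, 0.82]),
]

results = hybrid_search(
    rocks, [0.88, 0.12, 0.82], max_weight_lb=5, location="garden bed", k=2
)
print([r.rock_id for r in results])
```

Filtering first keeps the comparison step limited to rocks that could actually answer the question. Note that rock-004 and rock-005 match the query embedding exactly but are excluded by weight and location.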
The Cost of Search
Search doesn’t come for free.
It introduces:
- indexing overhead
- additional storage
- preprocessing steps
- more complex queries
And like everything else in the Quarry system, these tradeoffs become more significant as the dataset grows.
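A rough sense of the storage cost: a dense embedding index needs one vector per rock. This sketch assumes 512-dimensional float32 embeddings (4 bytes per value), a hypothetical but plausible size. It counts only the raw vectors, not any index structure built on top of them or the images themselves.

```python
def embedding_storage_bytes(num_rocks: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Raw size of a dense embedding store: one vector per rock."""
    return num_rocks * dimensions * bytes_per_value


# Hypothetical collection sizes, 512-dimensional float32 vectors.
for n in (100, 10_000, 1_000_000):
    size = embedding_storage_bytes(n, 512)
    print(f"{n:>9,} rocks -> {size / 1_000_000:,.1f} MB")
```

At 10,000 rocks that is 20,480,000 bytes, or about 20.5 MB. The cost grows linearly with the collection and with the embedding size.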
The Realization
At this point, something interesting becomes clear.
The hard part isn’t collecting rocks.
It isn’t even modeling them.
The hard part is making the data usable.
And usability, in most systems, comes down to one thing:
Search.
What Comes Next
With data captured and searchable, the next step is to zoom out.
What we’ve built so far is more than just a rock catalog.
It’s a small example of a larger idea.
In the next post, we’ll look at that idea more directly:
Digital twins.
Because once you can represent, store, and search objects, you’ve taken the first step toward building systems that mirror the physical world.
And somewhere in the process, it becomes clear that even a pile of rocks benefits from thoughtful indexing.
Which is not something I expected to say when this started.
The Rock Quarry Series
- The Backyard Quarry: Turning Rocks Into Data
- The Backyard Quarry, Part 2: Designing a Schema for Physical Objects
- The Backyard Quarry, Part 3: Capturing the Physical World