{"id":1337,"date":"2026-03-31T09:23:23","date_gmt":"2026-03-31T16:23:23","guid":{"rendered":"https:\/\/www.kenwalger.com\/blog\/?p=1337"},"modified":"2026-05-05T07:11:21","modified_gmt":"2026-05-05T14:11:21","slug":"capturing-physical-objects-data-pipeline","status":"publish","type":"post","link":"https:\/\/www.kenwalger.com\/blog\/data-engineering\/capturing-physical-objects-data-pipeline\/","title":{"rendered":"The Backyard Quarry, Part 3: Capturing the Physical World"},"content":{"rendered":"<p>In the <a href=\"https:\/\/www.kenwalger.com\/blog\/data-engineering\/designing-a-schema-for-physical-objects\">previous post<\/a>, we designed a schema for representing rocks as structured data.<\/p>\n<p>On paper, everything looked clean.<\/p>\n<p>Each rock would have:<\/p>\n<ul>\n<li>an identifier<\/li>\n<li>dimensions<\/li>\n<li>weight<\/li>\n<li>metadata<\/li>\n<li>possibly images or even a 3D model<\/li>\n<\/ul>\n<p>The structure made sense.<\/p>\n<p>The problem was getting the data.<\/p>\n<h2>From Schema to Reality<\/h2>\n<p>Designing a schema is straightforward.<\/p>\n<p>You can sit down with a notebook or a whiteboard and define exactly what you want the system to store.<\/p>\n<p>Capturing real-world data is a different problem entirely.<\/p>\n<p>The moment you step outside, a few complications become obvious.<\/p>\n<p>Lighting changes.<\/p>\n<p>Objects aren\u2019t uniform.<\/p>\n<p>Measurements are approximate.<\/p>\n<p>And perhaps most importantly:<\/p>\n<p>The dataset doesn\u2019t behave consistently.<\/p>\n<h2>The Scale Problem<\/h2>\n<p>The Backyard Quarry dataset spans a wide range of sizes:<\/p>\n<pre><code class=\"language-plaintext\">pea-sized\nhand-sized\nwheelbarrow-sized\nengine-block-sized\n<\/code><\/pre>\n<p>That variability immediately affects how data can be captured.<\/p>\n<p>Small rocks can be photographed on a table.<\/p>\n<p>Medium rocks might need to be placed on the ground with careful framing.<\/p>\n<p>Large rocks don\u2019t move easily at 
all.<\/p>\n<p>Each category introduces different constraints.<\/p>\n<p>This is a pattern that shows up in many real-world systems.<\/p>\n<p>The same pipeline rarely works for every object.<\/p>\n<h2>Image Capture<\/h2>\n<p>The simplest form of data capture is photography.<\/p>\n<p>Take a few images of each rock from different angles.<\/p>\n<p>Store them.<\/p>\n<p>Attach them to the record.<\/p>\n<p>Even this introduces decisions:<\/p>\n<ul>\n<li>how many images per object?<\/li>\n<li>what angles?<\/li>\n<li>what lighting conditions?<\/li>\n<li>what background?<\/li>\n<\/ul>\n<p>Inconsistent capture leads to inconsistent data.<\/p>\n<p>And inconsistent data leads to unreliable systems.<\/p>\n<h2>Introducing Photogrammetry<\/h2>\n<p>If we take the idea a step further, we can generate a 3D model of each rock.<\/p>\n<p>Photogrammetry works by combining multiple images to reconstruct the shape of an object.<\/p>\n<p>Conceptually:<\/p>\n<ul>\n<li>take overlapping photos<\/li>\n<li>feed them into a processing tool<\/li>\n<li>generate a 3D mesh<\/li>\n<\/ul>\n<p>This produces a much richer representation than a single image.<\/p>\n<p>But it also introduces:<\/p>\n<ul>\n<li>processing time<\/li>\n<li>storage requirements<\/li>\n<li>failure cases<\/li>\n<\/ul>\n<p>Not every rock will produce a clean model.<\/p>\n<h2>The Capture Pipeline<\/h2>\n<p>At this point, the process starts to look like a pipeline.<\/p>\n<figure id=\"attachment_1338\" aria-describedby=\"caption-attachment-1338\" style=\"width: 404px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" data-attachment-id=\"1338\" data-permalink=\"https:\/\/www.kenwalger.com\/blog\/data-engineering\/capturing-physical-objects-data-pipeline\/attachment\/phyiscal-object-capture-data-pipeline-diagram\/\" data-orig-file=\"https:\/\/www.kenwalger.com\/blog\/wp-content\/uploads\/2026\/03\/phyiscal-object-capture-data-pipeline-diagram.png\" data-orig-size=\"547,1388\" data-comments-opened=\"1\" 
data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"phyiscal-object-capture-data-pipeline-diagram\" data-image-description=\"\" data-image-caption=\"&lt;p&gt;A simplified pipeline for turning a physical object into structured data and associated assets.&lt;\/p&gt;\n\" data-large-file=\"https:\/\/www.kenwalger.com\/blog\/wp-content\/uploads\/2026\/03\/phyiscal-object-capture-data-pipeline-diagram-404x1024.png\" class=\"size-large wp-image-1338\" src=\"https:\/\/www.kenwalger.com\/blog\/wp-content\/uploads\/2026\/03\/phyiscal-object-capture-data-pipeline-diagram-404x1024.png\" alt=\"Diagram showing a data pipeline for capturing physical objects, including image capture, photogrammetry processing, metadata extraction, and storage.\" width=\"404\" height=\"1024\" srcset=\"https:\/\/www.kenwalger.com\/blog\/wp-content\/uploads\/2026\/03\/phyiscal-object-capture-data-pipeline-diagram-404x1024.png 404w, https:\/\/www.kenwalger.com\/blog\/wp-content\/uploads\/2026\/03\/phyiscal-object-capture-data-pipeline-diagram-118x300.png 118w, https:\/\/www.kenwalger.com\/blog\/wp-content\/uploads\/2026\/03\/phyiscal-object-capture-data-pipeline-diagram.png 547w\" sizes=\"auto, (max-width: 404px) 85vw, 404px\" \/><figcaption id=\"caption-attachment-1338\" class=\"wp-caption-text\">A simplified pipeline for turning a physical object into structured data and associated assets.<\/figcaption><\/figure>\n<p>Each step transforms the data in some way.<\/p>\n<p>The output of one stage becomes the input of the next.<\/p>\n<p>This is a common pattern in data engineering.<\/p>\n<p>The difference here is that the input 
isn\u2019t a clean dataset.<\/p>\n<p>It\u2019s the physical world.<\/p>\n<h2>Imperfect Data<\/h2>\n<p>No matter how carefully you design the pipeline, real-world data introduces imperfections.<\/p>\n<p>Examples:<\/p>\n<ul>\n<li>missing images<\/li>\n<li>inconsistent lighting<\/li>\n<li>partially occluded objects<\/li>\n<li>measurement errors<\/li>\n<\/ul>\n<p>A rock might be:<\/p>\n<ul>\n<li>too reflective<\/li>\n<li>too uniform in texture<\/li>\n<li>partially buried<\/li>\n<li>awkwardly shaped<\/li>\n<\/ul>\n<p>All of these affect the output.<\/p>\n<p>This means the system has to tolerate incomplete or imperfect data.<\/p>\n<p>Which leads to an important realization:<\/p>\n<blockquote><p>\n  Data systems are rarely about perfect data.<br \/>\n  They are about handling imperfect data gracefully.\n<\/p><\/blockquote>\n<h2>Storage Considerations<\/h2>\n<p>Once data is captured, it needs to be stored.<\/p>\n<p>Different types of data behave differently:<\/p>\n<ul>\n<li>metadata \u2192 small, structured, easy to query<\/li>\n<li>images \u2192 larger, unstructured<\/li>\n<li>3D models \u2192 even larger, more complex<\/li>\n<\/ul>\n<p>This reinforces a pattern introduced earlier:<\/p>\n<p>Separate structured data from large assets.<\/p>\n<p>Store references rather than embedding everything directly.<\/p>\n<h2>A Familiar Pattern<\/h2>\n<p>At this point, the <strong>Backyard Quarry<\/strong> pipeline looks surprisingly familiar.<\/p>\n<p>It resembles systems used for:<\/p>\n<ul>\n<li>scanning historical artifacts<\/li>\n<li>capturing industrial parts<\/li>\n<li>generating 3D models for manufacturing<\/li>\n<li>building datasets for computer vision<\/li>\n<\/ul>\n<p>The specifics change.<\/p>\n<p>The pattern remains the same.<\/p>\n<h2>What Comes Next<\/h2>\n<p>Once data is captured and stored, the next problem emerges.<\/p>\n<p>How do we find anything?<\/p>\n<p>A dataset of a few rocks is manageable.<\/p>\n<p>A dataset of hundreds or thousands quickly becomes difficult to 
navigate without structure.<\/p>\n<p>In the next post, we\u2019ll look at how to index and search the dataset \u2014 and how even a pile of rocks benefits from thoughtful retrieval systems.<\/p>\n<p>And somewhere along the way, it becomes clear that the hard part isn\u2019t designing the schema.<\/p>\n<p>It\u2019s building systems that can reliably turn messy reality into usable data.<\/p>\n<h3>The Rock Quarry Series<\/h3>\n<ul>\n<li><a href=\"https:\/\/www.kenwalger.com\/blog\/software-engineering\/the-backyard-quarry-turning-rocks-into-data\">Turning Rocks into Data<\/a><\/li>\n<li><a href=\"https:\/\/www.kenwalger.com\/blog\/data-engineering\/designing-a-schema-for-physical-objects\">Designing a Schema for Physical Objects<\/a><\/li>\n<li><a href=\"https:\/\/www.kenwalger.com\/blog\/data-engineering\/capturing-physical-objects-data-pipeline\">Capturing the Physical World<\/a> &#8211; <em>This Post<\/em><\/li>\n<li><a href=\"https:\/\/www.kenwalger.com\/blog\/data-engineering\/searching-physical-objects-data-indexing\">Searching a Pile of Rocks<\/a><\/li>\n<li><a href=\"https:\/\/www.kenwalger.com\/blog\/data-engineering\/digital-twins-physical-objects-explained\">Digital Twins for Physical Objects<\/a><\/li>\n<li><a href=\"https:\/\/www.kenwalger.com\/blog\/data-engineering\/scaling-data-pipelines-physical-objects\">Scaling the Quarry<\/a><\/li>\n<li><a href=\"https:\/\/www.kenwalger.com\/blog\/data-engineering\/system-design-patterns-real-world-data-platforms\">Systems Beyond the Backyard<\/a><\/li>\n<li><a href=\"https:\/\/www.kenwalger.com\/blog\/data-engineering\/from-rocks-to-reality-system-design-patterns\">From Rocks to Reality<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>In the previous post, we designed a schema for representing rocks as structured data. On paper, everything looked clean. Each rock would have: an identifier dimensions weight metadata possibly images or even a 3D model The structure made sense. The problem was getting the data. From Schema to Reality Designing a schema is straightforward. 
You &hellip; <a href=\"https:\/\/www.kenwalger.com\/blog\/data-engineering\/capturing-physical-objects-data-pipeline\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;The Backyard Quarry, Part 3: Capturing the Physical World&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":1528,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"pmpro_default_level":"","_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_post_was_ever_published":false},"categories":[1739,1738],"tags":[1749,1736,1748,1747,1713],"yst_prominent_words":[507,420,782],"class_list":["post-1337","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-engineering","category-software-engineering","tag-computer-vision","tag-data-engineering","tag-data-pipelines","tag-photogrammetry","tag-system-design","pmpro-has-access"],"jetpack_featured_media_url":"https:\/\/www.kenwalger.com\/blog\/wp-content\/uploads\/2026\/04\/blog-of-ken-w.-alger-69ea5acc0b4ed.png","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p8lx70-lz","jetpack-related-posts":[],"_links":{"self":[{"href":"https:\/\/www.kenwalger.com\/blog\/wp-json\/wp\/v2\/posts\/1337","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.kenwalger.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.kenwalger.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.kenwalger.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.kenwalger.com\/blog\/wp-json\
/wp\/v2\/comments?post=1337"}],"version-history":[{"count":10,"href":"https:\/\/www.kenwalger.com\/blog\/wp-json\/wp\/v2\/posts\/1337\/revisions"}],"predecessor-version":[{"id":1570,"href":"https:\/\/www.kenwalger.com\/blog\/wp-json\/wp\/v2\/posts\/1337\/revisions\/1570"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.kenwalger.com\/blog\/wp-json\/wp\/v2\/media\/1528"}],"wp:attachment":[{"href":"https:\/\/www.kenwalger.com\/blog\/wp-json\/wp\/v2\/media?parent=1337"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.kenwalger.com\/blog\/wp-json\/wp\/v2\/categories?post=1337"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.kenwalger.com\/blog\/wp-json\/wp\/v2\/tags?post=1337"},{"taxonomy":"yst_prominent_words","embeddable":true,"href":"https:\/\/www.kenwalger.com\/blog\/wp-json\/wp\/v2\/yst_prominent_words?post=1337"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}