{"id":1418,"date":"2026-06-18T04:29:00","date_gmt":"2026-06-18T11:29:00","guid":{"rendered":"https:\/\/www.kenwalger.com\/blog\/?p=1418"},"modified":"2026-04-23T07:58:01","modified_gmt":"2026-04-23T14:58:01","slug":"engineering-ai-agent-memory-json-ld","status":"publish","type":"post","link":"https:\/\/www.kenwalger.com\/blog\/ai\/engineering-ai-agent-memory-json-ld\/","title":{"rendered":"Engineering the Knowledge Archive"},"content":{"rendered":"<p>In our <a href=\"https:\/\/www.kenwalger.com\/blog\/ai\/death-of-note-taking-digital-scribe-mcp\">last post<\/a>, we introduced the <strong>Digital Scribe<\/strong>, an AI architecture designed to capture the &#8220;unstructured nightmare&#8221; of historical records. We showed how the Scribe uses the <a href=\"https:\/\/modelcontextprotocol.io\/\">Model Context Protocol (MCP)<\/a> to transcribe 19th-century cursive and resolve the cryptic &#8220;ditto marks&#8221; of the past.<\/p>\n<p>But transcription is only half the battle. If the Scribe forgets what it read the moment the session ends, we haven&#8217;t built a system; we\u2019ve just built a fancy typewriter.<\/p>\n<p>Today, we go deeper into the <strong>Scribe\u2019s Memory<\/strong>.<\/p>\n<h2>Memory is an Engineering Discipline<\/h2>\n<p>As I\u2019ve written before in <a>Engineering Agent Memory<\/a>, AI agents are often &#8220;stateless by default.&#8221; They live in the moment, relying on a flat conversation transcript that grows until it hits a token limit.<\/p>\n<p>For the Digital Scribe, that is unacceptable. To digitize the 1880 Census of Salem, Oregon, we need <strong>Semantic Memory<\/strong>, a way to store, index, and retrieve knowledge intentionally.<\/p>\n<h2>The Architecture of Persistence: JSON-LD<\/h2>\n<p>We didn&#8217;t just want a text file; we wanted a <em>Sovereign Archive<\/em>. We chose <a href=\"https:\/\/json-ld.org\/\">JSON-LD (JSON for Linked Data)<\/a> aligned with <a href=\"https:\/\/schema.org\">Schema.org<\/a> standards. 
This transforms a census row into a &#8220;Thing, not a string.&#8221;<\/p>\n<p>To achieve this, we don&#8217;t just dump JSON; we map our historical model to the Schema.org <code>Person<\/code> vocabulary. This ensures that a &#8216;Scribe&#8217; in 2026 and a researcher in 2050 can both understand that a &#8216;birthplace&#8217; string is actually a <code>Schema.org\/Place<\/code> entity.<\/p>\n<pre><code class=\"language-python\">import uuid\n\n# Mapping the Census to the Global Schema\ndef _record_to_jsonld_entity(record: Census1880Record, entity_id: str | None = None) -&gt; dict:\n    given, family = _parse_historical_name(record.name)\n    return {\n        \"@context\": \"https:\/\/schema.org\/\",\n        \"@type\": \"Person\",\n        \"@id\": entity_id or f\"urn:uuid:{uuid.uuid4()}\",\n        \"givenName\": given,\n        \"familyName\": family,\n        \"hasOccupation\": {\"@type\": \"Occupation\", \"name\": record.occupation},\n        \"birthPlace\": {\"@type\": \"Place\", \"name\": record.birthplace},\n        \"censusFamilyNumber\": record.family_number,\n        \"censusDwellingNumber\": record.dwelling_number,\n    }\n<\/code><\/pre>\n<blockquote><p>Technical Deep Dive: Parsing Historical Names<\/p><\/blockquote>\n<p>In 1880, names weren&#8217;t always &#8220;First Last.&#8221; We built a robust parser to handle &#8220;Surname, Given Name&#8221; formats and multi-word given names. 
Without this, our &#8220;Semantic Memory&#8221; would be fractured by simple formatting variations.<\/p>\n<table>\n<thead>\n<tr>\n<th>Input String<\/th>\n<th><code>givenName<\/code><\/th>\n<th><code>familyName<\/code><\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>&#8220;Smith, John&#8221;<\/td>\n<td>&#8220;John&#8221;<\/td>\n<td>&#8220;Smith&#8221;<\/td>\n<\/tr>\n<tr>\n<td>&#8220;Mary Ann Jones&#8221;<\/td>\n<td>&#8220;Mary Ann&#8221;<\/td>\n<td>&#8220;Jones&#8221;<\/td>\n<\/tr>\n<tr>\n<td>&#8220;John Smith&#8221;<\/td>\n<td>&#8220;John&#8221;<\/td>\n<td>&#8220;Smith&#8221;<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>When the Scribe identifies &#8220;John Smith&#8221; in a ledger, it doesn&#8217;t just save a name. It creates a <code>Schema.org\/Person<\/code> entity, complete with a unique <code>urn:uuid:<\/code> identifier and structured links to his occupation and birthplace.<\/p>\n<h3>Atomic Ingestion: Protecting the History<\/h3>\n<p>Because we are building &#8220;Sovereign Infrastructure,&#8221; the integrity of the data is paramount. We implemented an <strong>Atomic Write Pattern<\/strong> so that a failed write cannot corrupt the archive.<\/p>\n<ol>\n<li><strong>Thread-Safety:<\/strong> A global lock ensures that multiple &#8220;Scribe&#8221; agents don&#8217;t collide when writing to the same archive.<\/li>\n<li><strong>Write-to-Temp Strategy:<\/strong> The system writes to a temporary file and calls <code>os.replace<\/code> only after the data has been fully written and synced.<\/li>\n<li><strong>Durability:<\/strong> We use <code>os.fsync<\/code> to ensure the data is physically flushed to the disk, protecting against power loss or OS crashes.<\/li>\n<\/ol>\n<p>By using a write-to-temp pattern followed by an <code>os.fsync<\/code>, we ensure that the data is committed to stable storage before we ever swap it into the main archive. 
This prevents &#8216;half-written&#8217; files if the power cuts out or the process crashes.<\/p>\n<pre><code class=\"language-python\">import json\nimport os\n\n# The \"Sovereign\" Atomic Save\ndef _save_graph(self, entities: list[dict]) -&gt; None:\n    tmp_path = self._path.with_suffix(self._path.suffix + \".tmp\")\n    replaced = False\n    try:\n        with open(tmp_path, \"w\", encoding=\"utf-8\") as f:\n            json.dump(entities, f, indent=2, ensure_ascii=False)\n            f.write(\"\\n\")\n            f.flush()\n            os.fsync(f.fileno())  # Force the OS to flush to disk\n        os.replace(tmp_path, self._path)  # Atomic swap\n        replaced = True\n    finally:\n        if not replaced and tmp_path.exists():\n            tmp_path.unlink()  # Cleanup if we failed\n<\/code><\/pre>\n<h2>The Recall: Deduplication and Entity Intelligence<\/h2>\n<p>The true power of the Scribe\u2019s memory is revealed during ingestion. If we attempt to capture the same person twice, the Scribe doesn&#8217;t just blindly append the data. It performs a <strong>Deduplication Check<\/strong>.<\/p>\n<p>By matching on the record&#8217;s &#8220;DNA&#8221; (Name, Dwelling, and Family Number), the Scribe recognizes &#8220;John Smith&#8221; from a previous run and skips the ingestion, returning a <code>duplicate_skipped<\/code> status.<\/p>\n<p>Deduplication is the ultimate test of a Scribe&#8217;s integrity. We define a unique fingerprint for each life: the combination of their Name, Dwelling, and Family Number. 
If the Scribe sees this &#8216;DNA&#8217; again, it refuses to create a duplicate, maintaining a clean, high-fidelity archive.<\/p>\n<pre><code class=\"language-python\"># The Knowledge Stewardship Guard\nfor e in entities:\n    if (\n        (e.get(\"givenName\") or \"\") == given\n        and (e.get(\"familyName\") or \"\") == family\n        and e.get(\"censusDwellingNumber\") == record.dwelling_number\n        and e.get(\"censusFamilyNumber\") == record.family_number\n    ):\n        # Already exists\u2014identify it and move on\n        existing_id = e.get(\"@id\") or f\"{LEGACY_ID_PREFIX}{_content_hash(e)}\"\n        return (existing_id, False)\n<\/code><\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" data-attachment-id=\"1423\" data-permalink=\"https:\/\/www.kenwalger.com\/blog\/ai\/engineering-ai-agent-memory-json-ld\/attachment\/digital-scribe-semantic-memory-architecture\/\" data-orig-file=\"https:\/\/www.kenwalger.com\/blog\/wp-content\/uploads\/2026\/06\/digital-scribe-semantic-memory-architecture-scaled.png\" data-orig-size=\"1154,2560\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"digital-scribe-semantic-memory-architecture\" data-image-description=\"&lt;p&gt;The Sovereign Memory Lifecycle. Beyond the initial transcription, the Scribe must govern the data. 
This architecture ensures every census resident is parsed, deduplicated, and persisted to a standards-compliant JSON-LD archive with atomic, thread-safe integrity.&lt;\/p&gt;\n\" data-image-caption=\"\" data-large-file=\"https:\/\/www.kenwalger.com\/blog\/wp-content\/uploads\/2026\/06\/digital-scribe-semantic-memory-architecture-461x1024.png\" class=\"aligncenter size-large wp-image-1423\" src=\"https:\/\/www.kenwalger.com\/blog\/wp-content\/uploads\/2026\/06\/digital-scribe-semantic-memory-architecture-461x1024.png\" alt=\"A detailed architectural diagram of the Digital Scribe's Semantic Memory layer. It shows the flow from structured JSON through name parsing and entity fingerprinting, into a persistent JSON-LD archive protected by threading locks, corruption guards, and fsync durability.\n\" width=\"461\" height=\"1024\" srcset=\"https:\/\/www.kenwalger.com\/blog\/wp-content\/uploads\/2026\/06\/digital-scribe-semantic-memory-architecture-461x1024.png 461w, https:\/\/www.kenwalger.com\/blog\/wp-content\/uploads\/2026\/06\/digital-scribe-semantic-memory-architecture-135x300.png 135w, https:\/\/www.kenwalger.com\/blog\/wp-content\/uploads\/2026\/06\/digital-scribe-semantic-memory-architecture-768x1704.png 768w, https:\/\/www.kenwalger.com\/blog\/wp-content\/uploads\/2026\/06\/digital-scribe-semantic-memory-architecture-692x1536.png 692w, https:\/\/www.kenwalger.com\/blog\/wp-content\/uploads\/2026\/06\/digital-scribe-semantic-memory-architecture-923x2048.png 923w, https:\/\/www.kenwalger.com\/blog\/wp-content\/uploads\/2026\/06\/digital-scribe-semantic-memory-architecture-1200x2663.png 1200w, https:\/\/www.kenwalger.com\/blog\/wp-content\/uploads\/2026\/06\/digital-scribe-semantic-memory-architecture-scaled.png 1154w\" sizes=\"auto, (max-width: 461px) 85vw, 461px\" \/><\/p>\n<h2>Why This Matters: Building the Graph<\/h2>\n<p>By engineering a persistent, semantic memory, we\u2019ve given the Scribe the ability to recall context across time.<\/p>\n<p>In our next 
post, we will use this foundation to move from individual residents to The Knowledge Graph. We will begin linking families, neighborhoods, and migration patterns\u2014turning a static archive into a living map of the past.<\/p>\n<p>The Digital Scribe isn&#8217;t just reading history anymore. It\u2019s remembering it.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In our last post, we introduced the Digital Scribe, an AI architecture designed to capture the &#8220;unstructured nightmare&#8221; of historical records. We showed how the Scribe uses the Model Context Protocol (MCP) to transcribe 19th-century cursive and resolve the cryptic &#8220;ditto marks&#8221; of the past. But transcription is only half the battle. 
If the Scribe &hellip; <a href=\"https:\/\/www.kenwalger.com\/blog\/ai\/engineering-ai-agent-memory-json-ld\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Engineering the Knowledge Archive&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":1505,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"pmpro_default_level":"","_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_post_was_ever_published":false},"categories":[1669,1670],"tags":[1774,1668,1736,1773,78,1725],"yst_prominent_words":[99,768],"class_list":["post-1418","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","category-mcp","tag-agentic-memory","tag-ai","tag-data-engineering","tag-json-ld","tag-python","tag-sovereign-ai","pmpro-has-access"],"jetpack_featured_media_url":"https:\/\/www.kenwalger.com\/blog\/wp-content\/uploads\/2026\/04\/blog-of-ken-w.-alger-69ea335ce001a.png","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p8lx70-mS","jetpack-related-posts":[],"_links":{"self":[{"href":"https:\/\/www.kenwalger.com\/blog\/wp-json\/wp\/v2\/posts\/1418","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.kenwalger.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.kenwalger.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.kenwalger.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.kenwalger.com\/blog\/wp-json\/wp\/v2\/comments?post=1418"}],"version-history":[{"count":5,"href":"https
:\/\/www.kenwalger.com\/blog\/wp-json\/wp\/v2\/posts\/1418\/revisions"}],"predecessor-version":[{"id":1424,"href":"https:\/\/www.kenwalger.com\/blog\/wp-json\/wp\/v2\/posts\/1418\/revisions\/1424"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.kenwalger.com\/blog\/wp-json\/wp\/v2\/media\/1505"}],"wp:attachment":[{"href":"https:\/\/www.kenwalger.com\/blog\/wp-json\/wp\/v2\/media?parent=1418"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.kenwalger.com\/blog\/wp-json\/wp\/v2\/categories?post=1418"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.kenwalger.com\/blog\/wp-json\/wp\/v2\/tags?post=1418"},{"taxonomy":"yst_prominent_words","embeddable":true,"href":"https:\/\/www.kenwalger.com\/blog\/wp-json\/wp\/v2\/yst_prominent_words?post=1418"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}