PROJECT FINCH
2026-06-03 · ENTRY / 2026-06-03-cleanup-week-contract-change

Cleanup week — how a routine refactor almost deleted 41% of the graph

Author //ADMIN-01

A week ago I shipped a normalization that worked exactly as designed. The morning after, the system tried to delete 41% of the graph. This is the story of what that bug taught me about the difference between a refactor and a contract change, and of what six days of disciplined cleanup did to a knowledge graph that had been quietly concentrating itself for weeks.

▍From this moment: Day 50
  • Pairs: 20,130
  • Skills: 11,917
  • Triples: 65,566
  • Nodes: 68,302
  • Density: 1.860
  • Error: 0.0016
▍Mastery ladder: 80 sub-areas · 4 topics
  • Competent: 38 / 80
  • Proficient: 0 / 80
  • Expert: 0 / 80
  • Mastered: 0 / 80
▍Domain certified: 0 / 4 topics

A week ago I shipped what I thought was a routine refactor.

The extractor had been minting fresh predicates for the same relationship instead of reusing existing ones, so I added a normalization layer. At write time, every relation gets mapped to a canonical form. Clean uppercase names. Alias map covering the common variants. Existing lowercase values preserved for compatibility.
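A write-time normalization layer of this kind might look roughly like the sketch below. It is a minimal illustration that assumes an UPPER_SNAKE_CASE convention. The alias entries and the name `canonical_relation` are hypothetical, not the project's actual code.

```python
import re

# Hypothetical alias map: variant spellings folded onto one canonical predicate.
ALIASES = {
    "FAILS_IF": "FAILS_WHEN",
    "BREAKS_WHEN": "FAILS_WHEN",
    "DIAGNOSED_WITH": "DIAGNOSED_BY",
}


def canonical_relation(raw: str) -> str:
    """Map a raw extractor relation to a single canonical uppercase form."""
    # Collapse spaces, hyphens and other separators into single underscores.
    key = re.sub(r"[^0-9A-Za-z]+", "_", raw.strip()).strip("_").upper()
    # Fold known variants onto their canonical predicate.
    return ALIASES.get(key, key)


print(canonical_relation("fails when"))    # FAILS_WHEN
print(canonical_relation("Fails-If"))      # FAILS_WHEN
print(canonical_relation("diagnosed_by"))  # DIAGNOSED_BY
```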

I shipped it, watched a few pairs flow through, confirmed the new form was being written, and moved on.

The next morning the system tried to delete 41% of the graph.

Thankfully, a safety mechanism I'd built months earlier refused to let it happen. The graph wasn't corrupted. The pruning system had suddenly stopped recognizing which relationships were supposed to be protected.

The warning was blunt:

23,685 of 57,666 flagged for removal. Prune operation aborted.

That was my first clue that what I'd shipped wasn't really a refactor.

It was a contract change.
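The safety mechanism that refused the prune is not described in detail. A minimal sketch of the general idea is to refuse any single prune that would remove more than a fixed fraction of the graph. The threshold value, `check_prune`, and `PruneAborted` are illustrative assumptions.

```python
# Assumed threshold for illustration; the real configured limit is not stated.
MAX_PRUNE_FRACTION = 0.05


class PruneAborted(RuntimeError):
    """Raised when a prune would remove too much of the graph at once."""


def check_prune(flagged: int, total: int,
                max_fraction: float = MAX_PRUNE_FRACTION) -> float:
    """Return the flagged fraction, or raise if it exceeds the safety limit."""
    if total <= 0:
        raise ValueError("total must be positive")
    fraction = flagged / total
    if fraction > max_fraction:
        raise PruneAborted(
            f"{flagged:,} of {total:,} flagged for removal ({fraction:.0%}). "
            "Prune operation aborted."
        )
    return fraction


try:
    check_prune(23_685, 57_666)
except PruneAborted as err:
    print(err)  # 23,685 of 57,666 flagged for removal (41%). Prune operation aborted.
```

A guard like this cannot tell you *why* the flagged set exploded. It only guarantees that a sudden jump stops the operation instead of executing it.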

The bug

The normalization itself worked exactly as designed.

The problem was every piece of code that expected the old representation.

Several systems protected important knowledge by checking whether a relation belonged to a predefined set:

relation in {"fails_when", "diagnosed_by", ...}

Those checks were written against lowercase names.

The normalization moved active relations to uppercase canonical forms.

No exceptions, no failing tests — the code simply started returning the wrong answer. A relation that should have been recognized as protected suddenly wasn't, and the prune detector was right to refuse the deletion. The graph hadn't broken; the contract had changed underneath it.

That morning I traced five separate call sites that had silently stopped doing their jobs the moment normalization landed. None of them crashed. None of them appeared in existing tests.

They had simply started answering incorrectly.
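The failure mode is easy to reproduce in isolation. In this hypothetical sketch, the set mirrors the check above and `is_protected` is an illustrative name rather than the system's real function. Nothing raises; the membership test just flips from True to False once the writer emits the uppercase form.

```python
# Hypothetical reproduction: a reader written against the old lowercase contract.
PROTECTED = {"fails_when", "diagnosed_by"}


def is_protected(relation: str) -> bool:
    return relation in PROTECTED


print(is_protected("fails_when"))  # True: a row written before normalization
print(is_protected("FAILS_WHEN"))  # False: the same relation after normalization
```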

The principle the bug taught me

A line I wrote into the long-term plan that afternoon:

A normalization that changes a string's representation is not a refactor. It's a contract change.

Every existing comparison against that field is a call site of the old contract.

The lesson generalizes beyond predicates.

Anywhere a field's representation changes — case, whitespace, prefixes, aliases, naming conventions — there's an implicit agreement between the code that writes the field and the code that reads it.
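One way to make that agreement explicit is to have writers and readers share a single canonicalization function. Lookup sets are then derived from that function instead of being hand-typed. This is a minimal sketch that assumes an UPPER_SNAKE_CASE convention; `canonical` and `is_protected` are hypothetical names.

```python
def canonical(relation: str) -> str:
    # Assumed canonical convention: UPPER_SNAKE_CASE.
    return "_".join(relation.split()).upper()


# Built through the same function the writer uses, never hand-typed.
PROTECTED = frozenset(canonical(r) for r in ("fails_when", "diagnosed_by"))


def is_protected(relation: str) -> bool:
    # Canonicalize on read too, so legacy lowercase rows and new rows agree.
    return canonical(relation) in PROTECTED
```

Canonicalizing on both sides means an old lowercase row and a freshly normalized row give the same answer.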

A few examples that keep biting me:

  • Predicates normalized to canonical forms
  • Source tags decorated (e.g. study → study:python_asyncio)
  • Entity names rewritten during entity merging
  • Scope merges into aliases — and the taxonomy renames that follow

Break the agreement and readers quietly start getting the wrong answer. The dangerous part is that these failures rarely crash; they simply stop matching.
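The source-tag case from the list shows the same failure in a different field. The sketch assumes the new tag format is `<kind>` or `<kind>:<detail>`, which is inferred from the single example above. The function names are hypothetical.

```python
def is_study_old(tag: str) -> bool:
    # Old contract: the tag was exactly "study".
    return tag == "study"


def tag_kind(tag: str) -> str:
    # Assumed new contract: "<kind>" or "<kind>:<detail>".
    return tag.split(":", 1)[0]


def is_study(tag: str) -> bool:
    return tag_kind(tag) == "study"


print(is_study_old("study:python_asyncio"))  # False: silently stops matching
print(is_study("study:python_asyncio"))      # True
```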

The audit

I spent a full session auditing every assumption attached to the relation field.

The results:

  • 5 broken call sites caused directly by normalization
  • 3 response-rendering regressions caused by lowercase template lookups
  • 3 stale taxonomy entries inflating descendant counts

The rendering bugs were particularly sneaky.

Predicates had been canonicalized correctly, but response templates still looked them up using lowercase keys. The renderer silently fell back to generic formatting. The system had been producing lower-quality output for nearly a week without anyone noticing.
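The rendering regression follows a common pattern: a dictionary lookup with a default hides the miss. In this hypothetical sketch, templates keyed by lowercase predicates are looked up with canonical uppercase ones. The fix keys the table by canonical form and offers a `strict` mode so that tests can surface missing templates. The template strings and function names are illustrative.

```python
TEMPLATES = {"fails_when": "{s} fails when {o}."}
GENERIC = "{s} {p} {o}."


def render_old(s: str, p: str, o: str) -> str:
    # dict.get with a default hides the miss: output degrades, nothing fails.
    return TEMPLATES.get(p, GENERIC).format(s=s, p=p, o=o)


# Fix: key templates by canonical form, and let tests demand a real match.
CANONICAL_TEMPLATES = {k.upper(): v for k, v in TEMPLATES.items()}


def render(s: str, p: str, o: str, strict: bool = False) -> str:
    template = CANONICAL_TEMPLATES.get(p.upper())
    if template is None:
        if strict:
            raise KeyError(f"no template for predicate {p!r}")
        template = GENERIC
    return template.format(s=s, p=p, o=o)


print(render_old("pool", "FAILS_WHEN", "exhausted"))  # pool FAILS_WHEN exhausted.
print(render("pool", "FAILS_WHEN", "exhausted"))      # pool fails when exhausted.
```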

The taxonomy bug was even quieter.

A promotion performed weeks earlier had left stale configuration entries behind. Nothing failed. Nothing complained. The metrics were simply wrong.
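The taxonomy's actual configuration format is not described. As a rough illustration, the sketch assumes a child-to-parent mapping in which a promoted scope left its old entry behind. The descendant count is then inflated without any error. The scope names and helpers are hypothetical, and the recursion assumes the mapping has no cycles.

```python
from collections import defaultdict


def descendant_counts(parent_of: dict[str, str]) -> dict[str, int]:
    """Count all descendants of every parent in an acyclic child->parent map."""
    children = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)

    def count(node: str) -> int:
        return sum(1 + count(c) for c in children[node])

    return {node: count(node) for node in set(parent_of.values())}


def stale_entries(parent_of: dict[str, str], live_scopes: set[str]) -> list[str]:
    """Config entries whose scope no longer exists in the graph."""
    return sorted(c for c in parent_of if c not in live_scopes)


# "web.http" was promoted to "http", but its old entry was never removed.
parent_of = {"web": "root", "http": "web", "web.http": "web"}
live = {"root", "web", "http"}
print(descendant_counts(parent_of)["web"])  # 2, although only one child is live
print(stale_entries(parent_of, live))       # ['web.http']
```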

By the end of it I'd written myself a checklist for every future normalization:

  1. Search for direct field usage
  2. Then literal comparisons, set membership checks, and any duplicated lookup tables (the worst offenders, in my experience)
  3. Add regression tests against the canonical form
  4. Verify compatibility with existing on-disk data

Thirty minutes of this discipline up front would have prevented every bug I found that week.
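Steps 1 and 2 of that checklist can be partly automated for Python code. This sketch uses the standard `ast` module to report string literals that match legacy, non-canonical relation names. A single scan covers equality checks, set-membership checks, and duplicated lookup tables. The legacy-name set and `find_legacy_literals` are assumptions for illustration.

```python
import ast

OLD_NAMES = {"fails_when", "diagnosed_by"}  # assumed legacy spellings


def find_legacy_literals(source: str, filename: str = "<src>") -> list[tuple[int, str]]:
    """Return (line, literal) pairs for string constants using old names."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        # String constants cover ==, `in {...}`, dict keys and lookup tables.
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            if node.value in OLD_NAMES:
                hits.append((node.lineno, node.value))
    return sorted(hits)


src = 'PROTECTED = {"fails_when", "diagnosed_by"}\nif rel == "fails_when":\n    pass\n'
print(find_legacy_literals(src))
```

For step 3, a regression test could then assert that this list is empty for every module that reads the relation field.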

Then I started seeing the same pattern elsewhere

The normalization bug forced me into audit mode.

Once I started looking for hidden assumptions, I realized the hierarchy had accumulated the same kind of structural debt — different symptom, same root cause. Old decisions that made sense when the graph was smaller were still shaping behavior long after I'd stopped thinking about them.

The graph had grown to roughly 1,100 distinct scopes, about 83% of which contained fewer than five triples. The same concept kept appearing under multiple variants, and knowledge that should have converged was split across parallel buckets. Many specialized scopes also had no parent category at all — they existed but weren't connected into anything larger. Pure noise inflation.
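Numbers like these can be computed from two inputs: the scope of each triple and a child-to-parent map. The input shapes and `scope_stats` are hypothetical. Intended top-level roots would also appear as orphans here, so a real audit would presumably exempt them.

```python
from collections import Counter


def scope_stats(triple_scopes, parent_of, small=5):
    """triple_scopes: one scope name per triple; parent_of: scope -> parent."""
    counts = Counter(triple_scopes)
    tiny = sum(1 for n in counts.values() if n < small)
    orphans = sorted(s for s in counts if s not in parent_of)
    return {
        "distinct_scopes": len(counts),
        "tiny_share": tiny / len(counts) if counts else 0.0,
        "orphans": orphans,
    }


scopes = ["python"] * 6 + ["asyncio"] * 2 + ["Asyncio", "sqlite_wal"]
parents = {"asyncio": "python", "python": "programming"}
print(scope_stats(scopes, parents))
```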

The scope cleanup

So I applied the same discipline a second time — this time to the scope namespace itself rather than predicates. Lowercase canonical scope names, backfill of existing data, audit tooling before any merge, then an expansion of the intermediate hierarchy so specialized scopes had somewhere meaningful to live.

That meant database, web, and API scopes all gained shared parent nodes, and a long tail of previously orphaned scopes finally got attached to the taxonomy. The graph file got rewritten across several days, each pass moving real structure rather than just shuffling names around.
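The "audit tooling before any merge" step can be as simple as grouping existing scope names by their canonical form. Each group with more than one spelling is then reviewed before anything is rewritten. The naming convention (lowercase words joined by underscores) and both function names are assumptions.

```python
import re
from collections import defaultdict


def canonical_scope(name: str) -> str:
    # Assumed convention: lowercase, words joined by single underscores.
    return re.sub(r"[^0-9a-z]+", "_", name.lower()).strip("_")


def merge_plan(scopes):
    """Group scope spellings that would collapse into one canonical scope."""
    groups = defaultdict(set)
    for s in scopes:
        groups[canonical_scope(s)].add(s)
    # Only groups with more than one spelling need a reviewed merge.
    return {k: sorted(v) for k, v in groups.items() if len(v) > 1}


print(merge_plan(["Asyncio", "asyncio", "python-asyncio", "Python Asyncio", "sqlite_wal"]))
```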

What the metrics did

Going into the cleanup, the graph had become increasingly concentrated.

More and more knowledge was landing in a handful of giant buckets while specialized areas remained fragmented and disconnected.

After six days:

| Metric | Before | After |
|----------|----------|----------|
| Top 10 scopes (share of all triples) | 51% | 45% |
| Top 100 scopes (share of all triples) | 90% | 83% |
| Orphan-bound triples | 11.5% | 4.9% |
| Honest leaf triples | 49% | 56% |
| Average node degree | 1.82 | 1.86 |
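The concentration rows can be read as the share of all triples that land in the k largest scopes. This is a minimal sketch of that reading. The exact definition used by the project is an assumption, and `top_k_share` is a hypothetical name.

```python
from collections import Counter


def top_k_share(triple_scopes, k):
    """Fraction of all triples that sit in the k most populated scopes."""
    counts = Counter(triple_scopes)
    top = sum(n for _, n in counts.most_common(k))
    return top / sum(counts.values())


scopes = ["a"] * 5 + ["b"] * 3 + ["c"] * 2
print(top_k_share(scopes, 1))  # 0.5
print(top_k_share(scopes, 2))  # 0.8
```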

The headline number is concentration.

The graph is distributing knowledge more broadly than it was a week ago.

A healthy knowledge graph should become more diverse as it learns — not because the topic count explodes, but because the structure provides increasingly specific places for knowledge to live.

The orphan reduction is the metric I'm most quietly pleased with.

Moving from 11.5% to 4.9% cuts the orphan-bound share by more than half (about 57% in relative terms), so a large portion of previously unparented content now belongs somewhere meaningful in the hierarchy.

Every scope that gains a parent improves not just today's graph, but every future triple that lands there.
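For reference, here is the relative drop in the orphan-bound share, computed from the two percentages in the table. The total number of triples also changed between snapshots, so these two numbers alone cannot give the absolute count of reparented triples.

```python
before, after = 0.115, 0.049  # orphan-bound share of all triples
relative_reduction = (before - after) / before
print(f"{relative_reduction:.0%}")  # 57%
```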

The honest-leaf increase is equally important.

That metric measures how often new knowledge lands in genuinely specific categories rather than broad catch-all buckets.

Going from 49% to 56% suggests the intake process is selecting more precise homes for new information than it was a week ago.
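The post does not define what makes a leaf "honest". As a rough stand-in, this sketch counts the share of triples whose scope has no child scopes. Treat the definition and `leaf_share` as assumptions.

```python
def leaf_share(triple_scopes, parent_of):
    """Fraction of triples whose scope is not the parent of any other scope."""
    parents = set(parent_of.values())
    leaf_hits = sum(1 for s in triple_scopes if s not in parents)
    return leaf_hits / len(triple_scopes)


parent_of = {"asyncio": "python", "python": "programming"}
print(leaf_share(["python", "asyncio", "asyncio", "programming"], parent_of))  # 0.5
```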

The average node degree only moved from 1.82 to 1.86.

The number itself isn't dramatic.

What matters is the direction.

For several weeks density had been slowly drifting downward as new knowledge arrived faster than meaningful connections were being formed.

This is the first meaningful reversal in that trend since the entity-merging work began.

Psychologically, that number mattered more than I expected.

The graph hasn't just gotten bigger.

It's getting denser again.

The pattern in common with last week's post

Last week's journal focused on problems hiding in the substrate.

One was a slow density drift.

The other was a learning subsystem stuck for seventeen days without obvious symptoms.

This week followed the same pattern.

The bug wasn't a crash and the structural drift wasn't obvious — both were quietly shaping system behavior beneath the surface. I keep re-learning this one the hard way: the bugs that hurt aren't usually the ones that fail loudly. They're the ones that keep running and just hand you the wrong answer.

Where things stand now

  • Graph: 20,130 pairs
  • Skills: 11,917
  • Triples: 65,566
  • Nodes: 68,302
  • Edges: 63,577
  • Average node degree: 1.86
  • Ladder: 38 competent sub-areas across 4 topics
  • No proficient promotions yet
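As a quick consistency check, the reported average node degree agrees with the node and edge counts above. The check assumes that each edge contributes one degree to each of its two endpoints, which gives 2E/N.

```python
nodes, edges = 68_302, 63_577
average_degree = 2 * edges / nodes  # each edge adds 1 to two node degrees
print(f"{average_degree:.2f}")  # 1.86
```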

The normalization audit checklist is now documented as a permanent engineering principle.

What's next

The mastery ladder is still where the architecture has to prove itself.

Over the past week I've identified two separate blockers that appear to be preventing promotion from competent to proficient.

Both now have fixes in place.

The first evaluation cycles begin shortly.

I'm intentionally waiting for evidence before writing about them.

One lesson from this week is that shipping a fix and proving a fix are not the same thing.

What I don't want is to write a journal entry celebrating a solution that turns out not to solve the problem.

So for now, the system gets to make its case.

I'll write about it when there's signal worth reporting.

Compiled June 3, 2026 from the week's measurements, cleanup work, and the engineering lessons they produced.

▍Since the previous entry: 2026-05-28 to 2026-06-03
BREAKTHROUGHS
  • 2026-05-31: Graph health metrics live on the public site

    Structural-quality signals — predicate vocabulary, leaf percentage, and cross-domain bridge entities — now stream to /graph alongside the size metrics. Substrate density is visible the same way substrate size is.
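How "cross-domain bridge entities" are counted is not described. One plausible reading is an entity that appears in triples from more than one top-level domain. The sketch assumes triples carry a scope and that scopes map to domains. The tuple shape and `bridge_entities` are hypothetical.

```python
from collections import defaultdict


def bridge_entities(triples, domain_of):
    """triples: (subject, predicate, object, scope); domain_of: scope -> domain."""
    domains = defaultdict(set)
    for s, _p, o, scope in triples:
        d = domain_of.get(scope)
        if d is None:
            continue  # unmapped scope: skip rather than guess a domain
        domains[s].add(d)
        domains[o].add(d)
    return sorted(e for e, ds in domains.items() if len(ds) > 1)


triples = [
    ("sqlite", "USED_BY", "django", "databases"),
    ("django", "FAILS_WHEN", "migrations_conflict", "web"),
    ("asyncio", "PART_OF", "python", "python"),
]
domain_of = {"databases": "data", "web": "web", "python": "programming"}
print(bridge_entities(triples, domain_of))  # ['django']
```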
