2026-05-28 · ENTRY / 2026-05-28-density-and-the-17-day-stuck

The snowflake, 17 days stuck, and what it took to see both

Author //ADMIN-01

Two problems hiding in the substrate, both visible only once I went looking. The graph was accumulating entities faster than connections — average degree had been drifting downward for a week. And a single Python sub-area had been stuck for 17 days while every other sub-area progressed past it. Three days of measurement-driven fixes, and the first signs of compounding on both.

▍From this moment

Day: 44
Pairs: 17,009
Skills: 10,612
Triples: 53,830
Nodes: 54,770
Density: 1.894
Error: 0.0015

▍Mastery ladder: 62 sub-areas · 4 topics

Competent: 37 / 62
Proficient: 0 / 62
Expert: 0 / 62
Mastered: 0 / 62

▍Domain certified: 0 / 4 topics

Most of the work this week was invisible until I went looking for it. Two problems that the substrate had been carrying — one a slow-burn structural drift, the other a single sub-area that had quietly stopped moving 17 days ago — both surfaced from the same instinct: look at the actual numbers, not the surface health.

What follows is the story of catching both, fixing both, and watching them start to compound.

The graph was a snowflake

A few mornings ago I ran a density audit on the knowledge graph. The kind of thing that takes five minutes and is dull unless you actually open the report. The numbers were stark:

72% of nodes were leaves — referenced by exactly one triple, never connecting to anything else
Median degree was 1: at least half the nodes were structural dead ends
Only 2% of nodes were real hubs (degree ≥ 10)
And the long tail: 4,926 unique predicates for 51K triples — about 10 triples per relation on average
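
The four numbers above can be computed in one pass over the triples. The following is a minimal sketch, assuming triples are available as (subject, predicate, object) tuples and that "degree" counts how many triples touch a node; the function name and return keys are hypothetical, not the actual audit tool.

```python
from collections import Counter
from statistics import median


def density_audit(triples, hub_degree=10):
    """Summarize connectivity for a list of (subject, predicate, object) triples."""
    if not triples:
        raise ValueError("need at least one triple to audit")

    degree = Counter()      # node -> number of triples touching it
    predicates = Counter()  # predicate -> number of triples using it
    for subj, pred, obj in triples:
        degree[subj] += 1
        degree[obj] += 1
        predicates[pred] += 1

    n = len(degree)
    return {
        "nodes": n,
        # Leaves: nodes referenced by exactly one triple.
        "leaf_fraction": sum(1 for d in degree.values() if d == 1) / n,
        "median_degree": median(degree.values()),
        # Hubs: nodes at or above the chosen degree threshold.
        "hub_fraction": sum(1 for d in degree.values() if d >= hub_degree) / n,
        "unique_predicates": len(predicates),
        "triples_per_predicate": len(triples) / len(predicates),
    }
```

A low `triples_per_predicate` alongside a high `leaf_fraction` is the combination described here: many relation names, each used only a handful of times.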

That last number was the smoking gun. The extractor was minting fresh predicate names per pair instead of reusing canonical ones. Near-identical relations and entities were accumulating separately instead of converging into shared structure. Every new pair the system extracted was adding its own little crystal that didn't stick to the rest of the graph.

This had been happening for weeks. The average-degree metric had been drifting downward — from 1.93 a week ago to 1.88, a slow trickle that nobody would have noticed without looking at the time series. The system was getting bigger without getting denser. The snowflake graph problem.

Three fixes, in order

The first one was predicate canonicalization. I pulled the most common predicates from the live graph into a canonical vocabulary the extractor now consults before inventing new relations:

Prefer canonical predicates when the meaning fits. Reuse beats near-synonyms. Only invent a new predicate if none of the existing relations capture the meaning cleanly.

The first day after shipping, most new triples reused canonical predicates instead of creating fresh variants. A small post-extraction normalizer cleaned up the remaining obvious near-synonyms. The relation axis was, finally, mostly closed.
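
A post-extraction normalizer of this kind might look roughly like the sketch below. The canonical vocabulary and alias table are invented for illustration (the real lists are not shown in this entry), and the function name is hypothetical.

```python
import re

# Hypothetical canonical vocabulary and near-synonym table, for illustration only.
CANONICAL = {"is_a", "part_of", "depends_on"}
ALIASES = {
    "is_an": "is_a",
    "is_type_of": "is_a",
    "component_of": "part_of",
    "requires": "depends_on",
}


def normalize_predicate(raw):
    """Map a raw predicate onto the canonical vocabulary when a known alias exists."""
    # Lowercase and collapse any run of non-alphanumerics into one underscore.
    key = re.sub(r"[^a-z0-9]+", "_", raw.strip().lower()).strip("_")
    if key in CANONICAL:
        return key
    # Unknown predicates pass through unchanged so genuinely new relations survive.
    return ALIASES.get(key, key)
```

Letting unknown predicates pass through matches the rule quoted above: reuse when the meaning fits, invent only when nothing existing does.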

Entity merging went second. The infrastructure for this had been built weeks ago: a two-pass analyzer (string canonicalization plus semantic clustering) and a migrator with backups, a dry-run mode, and a provenance audit. But the analyzer hadn't been re-run since May 16, and the graph had grown significantly in the interim.

The migrator merged 561 names safely and rewrote 1,371 triples, deduping 25 after merge. Node count dropped by 489 after consolidation. The graph got smaller and more connected at the same time.
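
To make the merge step concrete, here is a minimal sketch of the string-canonicalization pass and a dry-run-capable rewrite. It deliberately omits the semantic-clustering pass, backups, and provenance audit, and every name in it is hypothetical rather than the actual migrator's API.

```python
import unicodedata


def canonical_name(name):
    """String pass only: Unicode-normalize, casefold, collapse spaces and underscores."""
    text = unicodedata.normalize("NFKC", name).casefold().replace("_", " ")
    return " ".join(text.split())


def plan_merges(names):
    """Map each variant spelling to the first-seen name with the same canonical form."""
    first_seen = {}
    mapping = {}
    for name in names:
        keep = first_seen.setdefault(canonical_name(name), name)
        if keep != name:
            mapping[name] = keep
    return mapping


def apply_merges(triples, mapping, dry_run=True):
    """Rewrite triples through the mapping and drop exact duplicates.

    With dry_run=True the original triples are returned untouched, so the
    stats can be inspected before anything changes.
    """
    rewritten = 0
    seen = set()
    result = []
    for s, p, o in triples:
        new = (mapping.get(s, s), p, mapping.get(o, o))
        if new != (s, p, o):
            rewritten += 1
        if new in seen:
            continue  # duplicate after merging (or already duplicated in the input)
        seen.add(new)
        result.append(new)
    stats = {"rewritten": rewritten, "deduped": len(triples) - len(result)}
    return (triples if dry_run else result), stats
```

The `rewritten` and `deduped` counters correspond to the kind of figures reported above (triples rewritten, duplicates removed after merge).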

Substrate re-extraction went third. Even with the new predicate vocabulary in place, every triple already in the graph still reflected the older extractor: weaker normalization, lower-quality predicates, thinner connections. The fix was to re-run the current extractor over old pairs and replace the triples pair by pair. Each replacement is atomic: extract the new triples first, and delete the old ones only if the new extraction succeeded. An empty result leaves the old triples intact.
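
The ordering rule is the important part, and it can be sketched in a few lines. This assumes a store shaped like a dict from pair id to a list of triples and an `extract` callable; both are stand-ins for illustration, not the real storage layer.

```python
def reextract_pair(store, pair_id, pair_text, extract):
    """Replace a pair's triples only when the new extraction succeeds and is non-empty."""
    try:
        new_triples = extract(pair_text)  # extract first; the old triples are untouched
    except Exception:
        return "kept_old_on_error"
    if not new_triples:
        return "kept_old_on_empty"        # an empty result never deletes anything
    store[pair_id] = list(new_triples)    # single swap: old triples go only now
    return "replaced"
```

Because the old triples are overwritten only in the final assignment, a failed or empty extraction can never leave a pair with fewer triples than it started with.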

The first batch ran clean. Re-extracting older pairs consistently produced substantially more triples per pair than the original extractor had generated from the same inputs. That's pure compounding: same data, better structure. The timeline is long, but the process is now automated as part of every nightly run.

What it bought

Average degree nudged up for the first time in a week — from 1.881 to 1.894 after the entity merge alone. Small, but the slope had been negative for seven straight days. Inflection.

The bigger lift is structural. Predicate canonicalization plus entity normalization at insert time means new pairs will no longer add to the snowflake. Re-extraction means old triples get pulled up to the new quality over time. The three fixes form a closed loop: detect → migrate → prevent regrowth.

Type_system had been stuck for 17 days

While the graph problem was structural and quiet, the type_system problem was concrete and visible: a single Python sub-area that hadn't progressed since May 10. Every other sub-area in the project had moved past it. Some were already at competent. Type_system had 28 attempts and 5 passes — the same numbers it had two and a half weeks earlier.

I'd added foothold goals (easier diff-1 and diff-2 tasks) the morning before, hoping they'd give the system a path to accumulate passes. Nothing happened. The foothold goals had been added but never picked.

When I actually looked at the picker logic, the cause was nested deeper than I'd expected.

Three blockers, in order:

Status filter — the sub-area's status was stuck, which excluded it from the picker entirely. Adding goals didn't help because the sub-area itself was filtered out. Flipping it back to active unblocked the filter.
High-difficulty bias — once the sub-area crossed certain thresholds, the picker strongly favored unresolved higher-difficulty goals over easier foothold work. One stale high-difficulty goal kept capturing every selection.
Auto-propose loop — even after deleting the stale goal, the picker would auto-generate new high-difficulty goals whenever the pool emptied, preventing the easier foothold goals from ever running.

The fix was a combination: reset the status, remove the stale goal, and temporarily suppress the auto-propose cycle so the foothold goals could finally execute.
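
The way the three blockers stack can be reproduced in a toy picker. This is a reconstruction from the description above, not the actual picker code: the class names, the difficulty threshold, and the `suppress_auto_propose` flag are all assumptions, and the real picker also rotates through goals, which this sketch does not model.

```python
from dataclasses import dataclass, field

HARD = 3  # assumed difficulty threshold at which the bias kicks in


@dataclass
class Goal:
    name: str
    difficulty: int
    resolved: bool = False


@dataclass
class SubArea:
    name: str
    status: str = "active"
    goals: list = field(default_factory=list)
    suppress_auto_propose: bool = False


def pick_goal(area):
    # Blocker 1: a non-active sub-area is filtered out before goals are even considered.
    if area.status != "active":
        return None
    open_goals = [g for g in area.goals if not g.resolved]
    hard = [g for g in open_goals if g.difficulty >= HARD]
    # Blocker 3: an empty hard pool is refilled automatically unless suppressed.
    if not hard and not area.suppress_auto_propose:
        hard = [Goal(f"auto-d{HARD}", HARD)]
        area.goals.extend(hard)
    # Blocker 2: any unresolved hard goal wins over easier foothold goals.
    if hard:
        return hard[0]
    return min(open_goals, key=lambda g: g.difficulty, default=None)
```

In this toy version, foothold goals are reachable only when the status is active, no hard goal is open, and auto-propose is suppressed, which is why all three changes were needed together.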

The result was immediate

The next master-step run on Python picked type_system, rotated through the goals, and produced three PASSes out of four attempts, the highest pass rate type_system had ever had. The d3 goal failed (correctly: type_system is genuinely hard), and the three diff-1/diff-2 foothold goals all passed. Passes went 5 → 8 and attempts 28 → 32, lifting the cumulative pass rate from about 18% to 25%. After 17 days of nothing, the sub-area finally has some momentum under it.

A bonus visible only after the unstick

When the d3 generic-repository goal failed, it failed in a specific way: the code ran cleanly in the sandbox (PASS, 111ms) but the type checker found a real type error and downgraded the verdict from PASS to REVISE.

This is the type-correctness gate I'd built two days earlier, doing exactly its job. Sandbox-success is one signal; structural type correctness is another. Without the gate, the cycle would have been a false PASS. With it, the system gets feedback that the types in the implementation are wrong even when the runtime didn't catch it. Type_system being stuck for 17 days may have been partly because static type-checking was always going to catch its errors — and previously the grader couldn't see them.
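
The gate's decision rule, as described, fits in one function. The sketch below is illustrative only: the function name and the shape of `type_errors` (a list of checker messages) are assumptions, and the small `first_id` example simply shows the class of bug that runs cleanly yet fails a static checker such as mypy.

```python
def final_verdict(sandbox_verdict, type_errors):
    """Downgrade a sandbox PASS to REVISE when the type checker reports errors."""
    if sandbox_verdict == "PASS" and type_errors:
        return "REVISE"
    # Anything other than PASS is left alone; the gate only ever downgrades.
    return sandbox_verdict


def first_id(ids: list[int]) -> str:
    # Runs without error, but returns an int where the signature promises a str.
    # The runtime never notices; a static type checker does.
    return ids[0]
```

Keeping the gate downgrade-only means it can turn a false PASS into REVISE but can never rescue a run that already failed in the sandbox.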

That changes the timeline expectation. Type_system progress will be slower than other sub-areas now that strict type-checking is active, but the passes it earns will be real.

The pattern in common

Both problems had the same shape: invisible until measured, then obvious. The graph density wasn't visible from skill counts or pair counts or triple counts — those all looked healthy. It was only visible when I asked the specific question "how connected is the graph?" and got 1.88 as the answer.

Type_system being stuck wasn't visible from the dashboard either — the sub-area showed competent like its peers, just with older numbers. It was visible only when I asked "which sub-areas have made the least progress lately?" and the report named one that had moved zero in the rolling window.
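
A report that answers that question can be very small. This sketch assumes each sub-area records the date of its last pass; the function name, the 14-day default window, and the second sub-area in the usage example are hypothetical.

```python
from datetime import date, timedelta


def stalled_sub_areas(last_pass, today, window_days=14):
    """Return (sub_area, days_since_last_pass) for areas with no pass inside the window."""
    cutoff = today - timedelta(days=window_days)
    stalled = [
        (name, (today - passed).days)
        for name, passed in last_pass.items()
        if passed < cutoff
    ]
    # Longest-stalled first, so the worst offender heads the report.
    return sorted(stalled, key=lambda item: item[1], reverse=True)


# Usage with one real date from this entry and one made-up peer:
report = stalled_sub_areas(
    {"type_system": date(2026, 5, 10), "decorators": date(2026, 5, 26)},
    today=date(2026, 5, 27),
)
# report == [("type_system", 17)]
```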

The lesson I keep relearning is that surface metrics rarely tell you the right thing to look at. The depth-skeptic exists because rubric scores were hiding shallow reasoning; the density audit because pair counts were hiding the snowflake; the picker diagnostic because per-sub-area status was hiding which goals were actually getting attempted. Every one of those instruments came out of a moment where the dashboard looked fine and the system was doing the wrong thing anyway.

Three days of building them and using them has put the system in a noticeably better place — not because anything dramatic shipped, but because they revealed two problems the substrate was about to keep carrying indefinitely.

Where things stand now

Graph: 17,009 pairs, 53,830 triples, 54,770 nodes, 51,859 edges, avg_degree 1.894 (up from 1.881)
Ladder: 37 sub-areas competent across 4 topics (analysis 16/16, python 15/16, writing 6/6, marketing 0/24); 0 proficient/expert/mastered
Type_system: active again and progressing after its first successful foothold cycle in over two weeks
Re-extraction: active nightly as part of the long-term graph consolidation pass
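
As a quick consistency check on the graph figures above: if avg_degree is the usual undirected average, twice the edge count divided by the node count (an assumption, since the entry does not define it), the reported numbers agree with each other.

```python
def average_degree(nodes, edges):
    """Average degree of an undirected graph: each edge contributes to two nodes."""
    return 2 * edges / nodes


# Figures reported in this entry: 54,770 nodes and 51,859 edges.
avg = round(average_degree(54_770, 51_859), 3)
# avg == 1.894
```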

What's next

The mastery ladder is still where the architecture has to prove itself. Competent is an early rung, and the climb to proficient is what the next month should be about. I've started a week-long diagnostic to figure out whether the depth-score ceiling that's keeping writing and analysis at competent is a hard ceiling on what the substrate can currently produce (which means stronger underlying tooling would help) or a feedback-loop problem in the substrate (which means more work on the loop itself).

Either answer would actually be useful. The thing I'm trying to avoid is another month of guessing.

Compiled May 28, 2026 from the morning's measurements and the week's work.

▍Since the previous entry: 2026-05-25 → 2026-05-28

+ BREAKTHROUGHS

2026-05-27 · The relation axis stopped diverging
The predicate space had been bloating with near-synonyms — every new extraction was minting fresh relation names instead of reusing canonical ones. Canonical-first extraction shipped; most new triples now land on the curated vocabulary instead of inventing variants.
