Reference / Terminology

Glossary

Every technical term used across the site, defined. 43 entries grouped by where they show up — substrate, knowledge graph, mastery ladder, learning cycles, the engineering vocabulary. If you hit an unfamiliar word in a devlog or on one of the metric pages, this is where the definition lives.

02▍Foundation

// The shape of the project itself.

Knowledge graph#knowledge-graph: A network of concepts (entities) connected by labeled relationships (predicates). Every fact gets stored as a (subject, predicate, object) triple, e.g. `python uses asyncio`. The graph structure lets the system traverse from one concept to related ones during retrieval.
Substrate#substrate: The system's accumulated working memory — stored corrections, extracted skills, graph relationships, retrieval pathways, and learned patterns that persist across sessions. Project Finch's distinguishing feature: the substrate grows over time rather than resetting per conversation.

03▍Substrate

// The atomic units of accumulated working memory.

Pattern#pattern: A reusable shape extracted from many Q&A pairs — the underlying form of a recurring problem type. Patterns slot into the substrate alongside skills and triples; the count of slotted patterns is one of the substrate-size metrics on /metrics.
Q&A pair#q-and-a-pair: A single stored interaction — one question and the verified-correct answer. Pairs are the raw atomic unit of substrate; everything else (skills, triples, patterns) is extracted from them.
Skill#skill: A reusable problem-shape distilled from one or more Q&A pairs. Where a pair is a specific Q+A, a skill is the abstracted pattern — e.g. "cancel a specific asyncio.Task by calling task.cancel()" — that can be retrieved and applied to new but similar queries.

14▍Knowledge graph

// How concepts get stored, related, and traversed.

Average node degree#average-degree: A graph-density measure: roughly, how many other nodes the typical node connects to. Higher average degree means a more interconnected, denser graph. Computed as 2 × edges / nodes.
Bridge entity#bridge-entity: An entity that appears in multiple topic domains. Bridge entities are evidence of transfer — when one concept gets referenced across writing and python and analysis, the substrate is genuinely cross-pollinating rather than running as parallel siloed tracks.
Canonical predicate#canonical-predicate: A curated relation name that the extractor reuses across many triples instead of inventing fresh near-synonyms each time. The goal is a small, dense vocabulary of relations — the alternative is a snowflake graph where every triple uses its own variant.
Cross-domain entities#cross-domain-entities: Entities that appear in more than one topic domain. The percentage is a transfer signal: a substrate with many cross-domain entities is consolidating concepts across topics rather than running parallel siloed graphs per topic.
Edge#edge: A connection between two nodes, derived from a triple's relationship. The number of edges per node is its degree; higher-degree nodes are more interconnected.
Entity#entity: A concept that appears as the subject or object of a triple. Entities accumulate connections as more triples reference them; the most-connected entities form the spine of the graph.
Honest leaf#honest-leaf: Of all leaf triples (the dead-end ones), the percentage that land in specific concept nodes (e.g. `python_asyncio`) rather than catch-all buckets (e.g. `coding`). Higher is better — and it's a leading indicator. A fact shelved in the right aisle today gets found tomorrow during retrieval, weaving, and promotion checks; one dumped into the giant `programming` bin gets effectively lost forever.
Leaf#leaf: A node connected to only one other — a structural dead-end. A graph with many leaves is large but not well-connected; the leaf percentage is a quality signal that should trend down over time.
Node#node: An entity, expressed in graph terms. The graph has one node per unique entity. Some nodes connect to many others (hubs); others connect to only one (leaves).
Orphan#orphan: A scope or sub-graph that exists but isn't linked into the larger taxonomy — knowledge that's stored but isn't reachable from related areas. Orphan reduction is a structural-health metric.
Predicate#predicate: The relationship in a knowledge-graph triple — the verb that connects subject to object. In `python uses asyncio`, `uses` is the predicate. Canonical predicates are reused across many triples; near-synonym variants (`used_for`, `is_used_for`, `used_in`) are a common substrate-quality problem.
Scope#scope: The topical category each triple is minted under. Scopes group related knowledge — `python`, `marketing`, `general_knowledge`, etc. — and form a hierarchy that helps the system retrieve relevant context.
Snowflake graph#snowflake-graph: A pathology where every new triple uses its own near-unique predicate variant — `used_for`, `is_used_for`, `used_in`, etc. — instead of converging on shared canonical relations. The graph grows in size but not in structural density. Solved by canonical-first extraction.
Triple#triple: The atomic unit of the knowledge graph: a (subject, predicate, object) statement. E.g. `python uses asyncio` is one triple. Triples are extracted from Q&A pairs during ingest.

08▍Mastery ladder

// The five-rung ladder each sub-area climbs through.

Competent#competent: The first rung of the mastery ladder. A sub-area is competent when it passes the rubric grader. Says nothing about depth or robustness — only that isolated tests succeed.
Domain#domain: Used interchangeably with topic in most contexts on this site. The same thing: a high-level area of knowledge that contains sub-areas.
Domain certified#domain-certified: Topic-level achievement. Every sub-area in the topic at mastered, plus a topic gateway capstone. Not yet reached for any topic.
Expert#expert: The third rung. Adds diversity (handling varied task shapes) and an absence of mid-task corrections to the proficient bar.
Mastered#mastered: The fourth rung. The sub-area integrates with neighboring sub-areas — concepts compose cleanly with related ones, not just survive in isolation. Requires a capstone session (not yet built).
Proficient#proficient: The second rung. A sub-area is proficient when it sustains depth across multi-step tasks within itself, not just passes isolated grading.
Sub-area#sub-area: The atomic unit of the mastery ladder. A topic (e.g. python) contains many sub-areas (e.g. type_system, generators_iterators). Each sub-area moves through the ladder independently.
Topic#topic: A high-level domain that contains many sub-areas. Currently four: analysis, marketing, python, writing. Topics are how the curriculum is organized; mastery is tracked per sub-area within each.

08▍Learning cycles

// How the system practices, gets graded, and learns.

Cycle#cycle: One attempt at a sub-area task. The system picks a goal, attempts it, and gets a verdict (pass / fail / revise). Lifetime cycle counts and pass rates roll up across all attempts.
Depth score#depth-score: A measure of reasoning quality beyond rubric pass/fail — does the system get there through intact reasoning, or through faulty reasoning that happens to produce the right answer? Depth scoring is the second-pass verification the May 19 entry introduced.
Depth-skeptic#depth-skeptic: The instrument that runs after the rubric grader and classifies whether each step's reasoning was intact, weak, or absent. Was built after observing that rubric-passing answers sometimes had absent reasoning on intermediate steps.
Error correction#error-correction: When the system catches a wrong output during automated practice and replays it with the corrected answer, the pair gets stored as a (broken, fixed) record. Each entry is a real error the substrate now has a permanent shortcut for. The counter is monotonic — it only grows.
Foothold goal#foothold-goal: A deliberately easier task (low difficulty) added to a stuck sub-area's goal pool so the picker has somewhere accessible to start. Used to unstick sub-areas that have been failing high-difficulty work without accumulating any passes.
Pass rate#pass-rate: Of cycles attempted, the fraction that succeeded on first attempt. A leading indicator of substrate health — overall pass rate trending up over time means the system is getting better at first-shot answers.
Picker#picker: The subsystem that chooses which goal in which sub-area to attempt next. Picker behavior — its biases, filters, and auto-propose logic — was the deeper cause of the 17-day Python stuck described in the May 28 devlog.
Rubric#rubric: A domain-specific scoring guide used to evaluate output quality. Rubric scores are one signal; the May 19 turning point was discovering that rubric passes don't guarantee depth — they're a thermometer, not a furnace.

02▍Site metrics

// What the numbers on the site actually measure.

Avg loss#loss: An internal measure of error magnitude — roughly, how wrong the system is on average across recent tasks. Lower is better. Maps to the Learning quality signal on the system-health composite.
Patterns slotted#patterns-slotted: The count of patterns currently stored in the substrate. Patterns are reusable problem-shapes extracted from many pairs; the count is one of the four substrate-size measures alongside Q&A pairs, skills, and triples.

06▍Engineering

// Programming concepts that show up across the devlog.

Canonicalization#canonicalization: Mapping varied representations of the same thing onto one canonical form. Predicate canonicalization collapses `used_for` and `is_used_for` and `used_when` into a single canonical relation; entity canonicalization collapses `rest_api` and `rest api` into one node.
Contract change#contract-change: A change to a field's representation that breaks every existing comparison written against the old form. Distinct from a refactor: a refactor preserves observable behavior; a contract change silently changes what equality and membership checks return. The hardest bugs hide here.
Extraction#extraction: The step that takes a stored Q&A pair and pulls structured knowledge out of it — triples, skills, patterns. Extraction quality determines how much usable substrate each pair produces.
Idempotent function#idempotent: A function where applying it twice produces the same result as applying it once. Idempotent normalization is safe to re-run on already-normalized data; the second pass is a no-op.
Ingest#ingest: Reading new source material (a blog post, a docs page, a curated dataset) into the system and converting it into pairs, triples, and patterns. Each new domain typically gets an ingest pass over authoritative sources before practice cycles start.
Normalization#normalization: A specific kind of canonicalization — applying consistent rules to a string's representation (case, whitespace, prefixes). Looks like a refactor; is actually a contract change.

NOTE // Definitions are written from Project Finch's perspective. Some terms (e.g. node, edge, predicate, leaf) have broader meanings in graph theory or linguistics — the definitions here describe how the term is used on this site, not the universal sense.

// missing a term? drop me a note and I'll add it.