Glossary
Every technical term used across the site, defined. 43 entries grouped by where they show up — substrate, knowledge graph, mastery ladder, learning cycles, the engineering vocabulary. If you hit an unfamiliar word in a devlog or on one of the metric pages, this is where the definition lives.
// The shape of the project itself.
- Knowledge graph#knowledge-graph
- A network of concepts (entities) connected by labeled relationships (predicates). Every fact gets stored as a (subject, predicate, object) triple, e.g. `python uses asyncio`. The graph structure lets the system traverse from one concept to related ones during retrieval.
- Substrate#substrate
- The system's accumulated working memory — stored corrections, extracted skills, graph relationships, retrieval pathways, and learned patterns that persist across sessions. Project Finch's distinguishing feature: the substrate grows over time rather than resetting per conversation.
// The atomic units of accumulated working memory.
- Pattern#pattern
- A reusable shape extracted from many Q&A pairs — the underlying form of a recurring problem type. Patterns slot into the substrate alongside skills and triples; the count of slotted patterns is one of the substrate-size metrics on /metrics.
- Q&A pair#q-and-a-pair
- A single stored interaction — one question and the verified-correct answer. Pairs are the raw atomic unit of substrate; everything else (skills, triples, patterns) is extracted from them.
- Skill#skill
- A reusable problem-shape distilled from one or more Q&A pairs. Where a pair is a specific Q+A, a skill is the abstracted pattern — e.g. "cancel a specific asyncio.Task by calling task.cancel()" — that can be retrieved and applied to new but similar queries.
// How concepts get stored, related, and traversed.
- Average node degree#average-degree
- A graph-density measure: roughly, how many other nodes the typical node connects to. Higher average degree means a more interconnected, denser graph. Computed as 2 × edges / nodes.
- Bridge entity#bridge-entity
- An entity that appears in multiple topic domains. Bridge entities are evidence of transfer — when one concept gets referenced across writing and python and analysis, the substrate is genuinely cross-pollinating rather than running as parallel siloed tracks.
- Canonical predicate#canonical-predicate
- A curated relation name that the extractor reuses across many triples instead of inventing fresh near-synonyms each time. The goal is a small, dense vocabulary of relations — the alternative is a snowflake graph where every triple uses its own variant.
- Cross-domain entities#cross-domain-entities
- Entities that appear in more than one topic domain. The percentage is a transfer signal: a substrate with many cross-domain entities is consolidating concepts across topics rather than running parallel siloed graphs per topic.
- Edge#edge
- A connection between two nodes, derived from a triple's relationship. The number of edges per node is its degree; higher-degree nodes are more interconnected.
- Entity#entity
- A concept that appears as the subject or object of a triple. Entities accumulate connections as more triples reference them; the most-connected entities form the spine of the graph.
- Honest leaf#honest-leaf
- A stricter measure of leaf-ness that excludes some edge cases the raw leaf count over-credits. Lower honest-leaf % means more of the graph's mass is genuinely connected rather than dangling.
- Leaf#leaf
- A node connected to only one other — a structural dead-end. A graph with many leaves is large but not well-connected; the leaf percentage is a quality signal that should trend down over time.
- Node#node
- An entity, expressed in graph terms. The graph has one node per unique entity. Some nodes connect to many others (hubs); others connect to only one (leaves).
- Orphan#orphan
- A scope or sub-graph that exists but isn't linked into the larger taxonomy — knowledge that's stored but isn't reachable from related areas. Orphan reduction is a structural-health metric.
- Predicate#predicate
- The relationship in a knowledge-graph triple — the verb that connects subject to object. In `python uses asyncio`, `uses` is the predicate. Canonical predicates are reused across many triples; near-synonym variants (`used_for`, `is_used_for`, `used_in`) are a common substrate-quality problem.
- Scope#scope
- The topical category each triple is minted under. Scopes group related knowledge — `python`, `marketing`, `general_knowledge`, etc. — and form a hierarchy that helps the system retrieve relevant context.
- Snowflake graph#snowflake-graph
- A pathology where every new triple uses its own near-unique predicate variant — `used_for`, `is_used_for`, `used_in`, etc. — instead of converging on shared canonical relations. The graph grows in size but not in structural density. Solved by canonical-first extraction.
- Triple#triple
- The atomic unit of the knowledge graph: a (subject, predicate, object) statement. E.g. `python uses asyncio` is one triple. Triples are extracted from Q&A pairs during ingest.
// The five-rung ladder each sub-area climbs through.
- Competent#competent
- The first rung of the mastery ladder. A sub-area is competent when it passes the rubric grader. Says nothing about depth or robustness — only that isolated tests succeed.
- Domain#domain
- Used interchangeably with topic in most contexts on this site. The same thing: a high-level area of knowledge that contains sub-areas.
- Domain certified#domain-certified
- Topic-level achievement. Every sub-area in the topic at mastered, plus a topic gateway capstone. Not yet reached for any topic.
- Expert#expert
- The third rung. Adds diversity (handling varied task shapes) and an absence of mid-task corrections to the proficient bar.
- Mastered#mastered
- The fourth rung. The sub-area integrates with neighboring sub-areas — concepts compose cleanly with related ones, not just survive in isolation. Requires a capstone session (not yet built).
- Proficient#proficient
- The second rung. A sub-area is proficient when it sustains depth across multi-step tasks within itself, not just passes isolated grading.
- Sub-area#sub-area
- The atomic unit of the mastery ladder. A topic (e.g. python) contains many sub-areas (e.g. type_system, generators_iterators). Each sub-area moves through the ladder independently.
- Topic#topic
- A high-level domain that contains many sub-areas. Currently four: analysis, marketing, python, writing. Topics are how the curriculum is organized; mastery is tracked per sub-area within each.
// How the system practices, gets graded, and learns.
- Cycle#cycle
- One attempt at a sub-area task. The system picks a goal, attempts it, and gets a verdict (pass / fail / revise). Lifetime cycle counts and pass rates roll up across all attempts.
- Depth score#depth-score
- A measure of reasoning quality beyond rubric pass/fail — does the system get there through intact reasoning, or through faulty reasoning that happens to produce the right answer? Depth scoring is the second-pass verification the May 19 entry introduced.
- Depth-skeptic#depth-skeptic
- The instrument that runs after the rubric grader and classifies whether each step's reasoning was intact, weak, or absent. Was built after observing that rubric-passing answers sometimes had absent reasoning on intermediate steps.
- Error correction#error-correction
- When the system catches a wrong output during automated practice and replays it with the corrected answer, the pair gets stored as a (broken, fixed) record. Each entry is a real error the substrate now has a permanent shortcut for. The counter is monotonic — it only grows.
- Foothold goal#foothold-goal
- A deliberately easier task (low difficulty) added to a stuck sub-area's goal pool so the picker has somewhere accessible to start. Used to unstick sub-areas that have been failing high-difficulty work without accumulating any passes.
- Pass rate#pass-rate
- Of cycles attempted, the fraction that succeeded on first attempt. A leading indicator of substrate health — overall pass rate trending up over time means the system is getting better at first-shot answers.
- Picker#picker
- The subsystem that chooses which goal in which sub-area to attempt next. Picker behavior — its biases, filters, and auto-propose logic — was the deeper cause of the 17-day Python stuck described in the May 28 devlog.
- Rubric#rubric
- A domain-specific scoring guide used to evaluate output quality. Rubric scores are one signal; the May 19 turning point was discovering that rubric passes don't guarantee depth — they're a thermometer, not a furnace.
// What the numbers on the site actually measure.
- Avg loss#loss
- A model-side measure of error magnitude — roughly, how wrong the system is on average across recent tasks. Lower is better. Maps to the Learning quality signal on the system-health composite.
- Patterns slotted#patterns-slotted
- The count of patterns currently stored in the substrate. Patterns are reusable problem-shapes extracted from many pairs; the count is one of the four substrate-size measures alongside Q&A pairs, skills, and triples.
// Programming concepts that show up across the devlog.
- Canonicalization#canonicalization
- Mapping varied representations of the same thing onto one canonical form. Predicate canonicalization collapses `used_for` and `is_used_for` and `used_when` into a single canonical relation; entity canonicalization collapses `rest_api` and `rest api` into one node.
- Contract change#contract-change
- A change to a field's representation that breaks every existing comparison written against the old form. Distinct from a refactor: a refactor preserves observable behavior; a contract change silently changes what equality and membership checks return. The hardest bugs hide here.
- Extraction#extraction
- The step that takes a stored Q&A pair and pulls structured knowledge out of it — triples, skills, patterns. Extraction quality determines how much usable substrate each pair produces.
- Idempotent function#idempotent
- A function where applying it twice produces the same result as applying it once. Idempotent normalization is safe to re-run on already-normalized data; the second pass is a no-op.
- Ingest#ingest
- Reading new source material (a blog post, a docs page, a curated dataset) into the system and converting it into pairs, triples, and patterns. Each new domain typically gets an ingest pass over authoritative sources before practice cycles start.
- Normalization#normalization
- A specific kind of canonicalization — applying consistent rules to a string's representation (case, whitespace, prefixes). Looks like a refactor; is actually a contract change.
NOTE // Definitions are written from Project Finch's perspective. Some terms (e.g. node, edge, predicate, leaf) have broader meanings in graph theory or linguistics — the definitions here describe how the term is used on this site, not the universal sense.
// missing a term? drop me a note and I'll add it.