What we are building
A persistent character-state simulation engine where biology biases behavior. Not a diagnostic tool. Not an FDA device. A substrate that takes identity, memory, relationships, neurochemistry, and situation as inputs and produces consistent, replayable character behavior as output. Same five inputs, same character, same response, every time. That is what cannot be done by any standard language model and what this architecture does by construction.
The problem
Standard language models generate from a global vocabulary with no grounding in identity, memory, or biological state. Ask one to play a character in a specific neurochemical state and the character is in that state on turn one and has forgotten it by turn three. The character drifts because the substrate has no persistent state to anchor it. Every existing safety effort is a layer of regularization on top of a substrate that has no built-in obligation to be consistent over time, across sessions, or with a particular character's interior.
The same architectural problem shows up everywhere standard LLMs are asked to be a specific entity rather than a generic interlocutor: medical training simulations, narrative-driven games, behavioral research, longitudinal patient simulators, therapy training, drug-effect modeling. The character is never really there. There is only the prompt and the model's general tendency to comply with it for a few turns.
The architectural answer
We turn the relationship inside out. Instead of a giant model with a global vocabulary that we hope stays in character, we build a small biology-grounded substrate that physically cannot generate outside its closed vocabulary, with a behavior generator above it that reads five persistent inputs and produces a deterministic, replayable response.
The five inputs are identity (who the character is), memory (what they remember), relationships (who they trust and how), current neurochemical state (a 10-dimensional vector that biases tone and reactivity), and current situation (the prompt, scene, or interlocutor). The substrate provides the domain-specialty vocabulary. The behavior generator integrates the five inputs against the substrate. The character's response is a deterministic function of those inputs. Replay the same state, get the same response, byte-exact. The character does not drift between sessions because state persists in canonical tables, not in fading context.
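The replay property can be illustrated with a minimal Python sketch. It assumes the five inputs are held in an immutable record and serialized canonically before use; the `CharacterState` class, the `state_digest` and `respond` functions, the sample memory, and the placeholder canon are all hypothetical names for illustration, not the production interface. The digest-indexed lookup is only a stand-in for the real behavior generator. What it shows is that a pure function of the five inputs replays exactly.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CharacterState:
    """The five persistent inputs (field layout is an assumption)."""
    identity: str
    memory: tuple          # ordered remembered events
    relationships: tuple   # (name, trust) pairs
    neurochemistry: tuple  # 10-dimensional vector
    situation: str


def state_digest(state: CharacterState) -> str:
    """SHA-256 over a canonical JSON serialization of the five inputs."""
    canonical = json.dumps(asdict(state), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def respond(state: CharacterState, canon: tuple) -> str:
    """Stand-in generator: a pure function of the state, with no sampling."""
    return canon[int(state_digest(state), 16) % len(canon)]


xena = CharacterState(
    identity="Xena, half-elf ranger",
    memory=("ambushed on the north road",),   # hypothetical memory entry
    relationships=(("stranger", 0.0),),
    neurochemistry=(0.0,) * 10,
    situation="Do you trust the stranger?",
)
canon = ("response-a", "response-b", "response-c")  # placeholder canon rows

first = respond(xena, canon)
replay = respond(xena, canon)
print(first == replay)  # True
```

Because nothing in `respond` depends on a clock, a random seed, or conversation context, persisting the five inputs is enough to reproduce the response later.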
Chemistry biases. It does not determine. A character in a PTSD-like state is still themselves, just colored by hypervigilance and irritability. A character in a manic-like state is still themselves, just amplified and impulsive. The chemistry layers on top of identity and history. The substrate is the floor, not the ceiling.
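One way to picture "biases, does not determine" is an additive bias on top of identity-derived scores. The Python sketch below is an assumption about how that could be wired, not the actual behavior generator: the `choose` function, the `bias_scale` parameter, the per-candidate sensitivity vectors, and the two candidate lines are hypothetical. Identity supplies the candidate set and base scores; the 10-dimensional chemistry vector only reweights them.

```python
def choose(candidates, base_scores, sensitivities, chemistry, bias_scale=0.5):
    """Pick the highest-scoring candidate; chemistry shifts scores, never the candidate set."""
    scores = []
    for base, sens in zip(base_scores, sensitivities):
        # Dot product of the candidate's sensitivity with the chemistry vector.
        bias = sum(s * c for s, c in zip(sens, chemistry))
        scores.append(base + bias_scale * bias)
    # Deterministic tie-break: the lowest index wins.
    best = max(range(len(candidates)), key=lambda i: (scores[i], -i))
    return candidates[best]


# Candidates and base scores come from identity and history (hypothetical values).
candidates = ["I'll hear him out.", "Keep him where I can see him."]
base_scores = [1.0, 0.8]
# Only the second line is sensitive to dimension 0 of the chemistry vector.
sensitivities = [[0.0] * 10, [1.0] + [0.0] * 9]

baseline = [0.0] * 10
hypervigilant = [1.0] + [0.0] * 9

print(choose(candidates, base_scores, sensitivities, baseline))
print(choose(candidates, base_scores, sensitivities, hypervigilant))
```

Both outputs come from the same candidate set, so the character stays the same person in either state. Only the ordering of what they are likely to say changes.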
How the model is structured
The architecture is hierarchical. A main Aether AI handles conversation and context. Below it sits a router that dispatches clinical questions to mixture-of-experts specialists. Each specialist is one small four-neuron cluster trained for one organ system. Beneath all of that sits the biological cascade, which provides the chemistry substrate (136 heads across four tiers: amino acids, precursors, neurotransmitters, brain regions) that modulates every layer above.
The specialist cluster (we call it OrganVoice) is where the per-organ knowledge lives. One cluster per organ system. Each specialist has its own closed vocabulary. The cardiac specialist's vocabulary contains cardiac terms. The pulmonary specialist's vocabulary contains pulmonary terms. They do not share a global token space. A doctor and a plumber both use the word "valve," but they are not talking about the same valve, and they do not share a profession. The architecture mirrors that: each specialist is its own linguistic world, communicating with the others through the chemistry substrate underneath rather than through shared words. This is what locks the context window cleanly, what makes catastrophic forgetting structurally impossible across specialists, and what eliminates word-sense ambiguity by construction. A geology specialist cannot say "bank" and mean a financial institution because the financial token is not in its vocabulary.
The first specialist is cardiac. Inside each OrganVoice cluster are four trainable transformer neurons:
Neuron A · Vocabulary. A retrieval database. Each word in the closed cardiac vocabulary gets an embedding that places it near its clinical neighbors. Trained word by word; each row freezes after it learns.
Neuron B · Grammar. A second embedding over the same vocabulary, capturing part-of-speech structure.
Neuron C · Opcode recall. A third table mapping vocabulary to deterministic clinical functions. This is what learns when to call which canon function.
Neuron D · Voice. The integration layer with the most computational capacity. D learns to consume A, B, and C's specialized outputs and produce coherent speech.
The four neurons are connected via astrocyte and tunneling-nanotube primitives drawn from real cellular biology.
Each neuron trains individually first (each on its own job), then all four train together. When a neuron fills its capacity it divides into a daughter cell of the same type instead of overwriting itself. Catastrophic forgetting is structurally impossible because the parent never gets modified.
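The division rule can be sketched in a few lines of Python. This is a toy model under stated assumptions: the `Cell` class, its `capacity` parameter, and the write-once row policy are hypothetical, and a dictionary stands in for trained weights. It shows only the structural point: a full cell freezes and hands new learning to a daughter of the same type, so the parent's rows are never rewritten.

```python
class Cell:
    """Toy neuron cell: fills to capacity, then freezes and divides."""

    def __init__(self, kind, capacity):
        self.kind = kind
        self.capacity = capacity
        self.rows = {}        # stand-in for learned weights
        self.frozen = False
        self.daughter = None

    def lookup(self, key):
        """Search this cell, then its descendants, oldest first."""
        cell = self
        while cell is not None:
            if key in cell.rows:
                return cell.rows[key]
            cell = cell.daughter
        return None

    def learn(self, key, value):
        """Write-once learning: known rows are never modified."""
        if self.lookup(key) is not None:
            return
        cell = self
        while cell.frozen:            # walk to the youngest, still-plastic cell
            cell = cell.daughter
        cell.rows[key] = value
        if len(cell.rows) >= cell.capacity:
            cell.frozen = True        # parent is now read-only
            cell.daughter = Cell(cell.kind, cell.capacity)


vocab = Cell("vocabulary", capacity=2)
for token in ("angina", "metoprolol", "dyspnea"):
    vocab.learn(token, f"embedding:{token}")

print(vocab.frozen, sorted(vocab.rows), sorted(vocab.daughter.rows))
```

After the third token the parent holds the first two rows unchanged and the daughter holds the third, which is the no-overwrite property in miniature.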
Every public call into every cluster emits a SHA-256 hash into a tamper-evident audit chain. Every step is attributable, contemporaneous, original, accurate. The architecture is ALCOA+ compliant by construction, not by retrofit.
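A tamper-evident chain of this kind is commonly built by hashing each record together with the hash of the record before it. The Python sketch below shows that generic construction with the standard `hashlib` module. The record fields, the `append_entry` and `verify` functions, and the all-zero genesis value are assumptions for illustration, not the production audit format. Timestamps are left out so the example stays deterministic.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first link


def _digest(body: dict) -> str:
    """SHA-256 over a canonical JSON form of the record body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(chain: list, cluster: str, call: str, payload: str) -> str:
    """Append one public call to the chain, linked to the previous hash."""
    prev = chain[-1]["hash"] if chain else GENESIS
    body = {"cluster": cluster, "call": call, "payload": payload, "prev": prev}
    entry = dict(body, hash=_digest(body))
    chain.append(entry)
    return entry["hash"]


def verify(chain: list) -> bool:
    """Recompute every link; any edited or reordered record breaks the chain."""
    prev = GENESIS
    for entry in chain:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev"] != prev or _digest(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True


chain = []
append_entry(chain, "cardiac", "lookup", "warfarin amiodarone")
append_entry(chain, "cardiac", "emit", "major")
print(verify(chain))          # True

chain[0]["payload"] = "edited after the fact"
print(verify(chain))          # False
```

Editing any earlier record changes its digest, which no longer matches the stored hash or the `prev` field of the next record, so tampering is detectable without trusting the writer.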
Why this matters
Because every emission is the result of a deterministic lookup against a closed substrate, the output is provenanced by construction. Anyone can trace any character's behavior back to the five input layers (identity, memory, relationships, neurochemistry, situation) and from those to the canon tables that produced them, and from there to the cited sources behind each row.
Because the vocabulary is closed and the canon table is the only place vocabulary comes from, the model has no mechanical way to drift outside its substrate. Inconsistency across sessions is not prevented by training discipline. It is not in the space of things the architecture can do. Same five-input state, same character response, every time. This is what makes the system a persistent character-state engine rather than a probabilistic chatbot.
Because each specialist has its OWN vocabulary, three additional problems standard language models cannot solve are also solved by construction. Catastrophic forgetting becomes impossible across specialists, because no specialist's training can reach into another's weights. Word-sense ambiguity disappears, because a token like "bank" or "valve" has exactly one meaning per specialist. Context windows are pinned, because the words available to the specialist define the world it can talk about. None of these are training-time properties. They are architectural properties.
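The per-specialist vocabulary can be pictured as each specialist owning its own token table, with no table shared between them. The Python sketch below is illustrative only: the `Specialist` class, its `encode` and `decode` methods, and the two tiny vocabularies are hypothetical. It shows why an out-of-domain sense is unreachable: decoding can only return a row of the specialist's own table.

```python
class Specialist:
    """Toy specialist with a closed, private vocabulary."""

    def __init__(self, name, vocab):
        self.name = name
        self.sense = dict(vocab)             # token -> its single meaning here
        self.tokens = tuple(sorted(vocab))   # fixed token ids for this specialist

    def encode(self, token):
        """Raises ValueError for any token outside the closed vocabulary."""
        return self.tokens.index(token)

    def decode(self, scores):
        """Argmax over this specialist's own table; nothing else can be emitted."""
        if len(scores) != len(self.tokens):
            raise ValueError("one score per vocabulary row is required")
        best = max(range(len(scores)), key=lambda i: (scores[i], -i))
        return self.tokens[best]


cardiac = Specialist("cardiac", {
    "valve": "heart valve",
    "stenosis": "narrowing of a valve or vessel",
})
geology = Specialist("geology", {
    "bank": "sloped margin of a river",
    "fault": "fracture in rock",
})

print(geology.decode([0.9, 0.1]), "->", geology.sense["bank"])
print("bank" in cardiac.tokens)  # False
```

Since the two tables are separate objects, training that touches one specialist's rows has no path to the other's, which is the cross-specialist isolation described above.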
The architecture is patent-fortressed (four pending applications: 63/939,190 substrate-independent memory weighting, 63/962,385 neurochemical language model, 63/988,485 cell-as-opcode tree, 64/034,536 hash-verifiable specialist composition) and substrate-sovereign (the Scyla compiler, in which the entire stack is written, is owned by Nexus Concordat). Together those two facts mean a competitor cannot reproduce the function without infringing or rebuilding both the patents and the compiler.
Where we are
Today we ran the integrated test of the four-rule, four-head architecture on the curated cardiac substrate. Four brains, four trainable transformer voice towers, each conditioned on the chemistry signal from the frozen FDA cascade. Each brain learns with a different biological learning rule matched to its specialized role. Adam for vocabulary statistics. Hormone-LR for context-modulated specialization. Hebbian for opcode recall and association. STDP for the voice itself, where causal temporal structure is what matters.
The training took 5,000 steps on 3,470 cardiac tokens. A standard foundation model trains on hundreds of billions to trillions of tokens. We are doing this with roughly eight to nine orders of magnitude less data (about 3.5 × 10^3 tokens against 10^11 to 10^12 or more) and somewhere between five and seven orders of magnitude less compute. The whole training run for all four brains fits on a single A5000 GPU and produces four 12 MB checkpoints. The architecture is small because it has to be: every entry in the vocabulary is citable, every clinical claim is dispatched through the canon table, and the model only has to learn when to ask, not what to remember.
The results from the four brains confirm the architectural claim that each rule is doing different work. The three gradient-descent-class rules (Adam, Hormone, Hebbian) each found a different reading of the cardiac space. Hebbian, for example, clustered warfarin with simvastatin and apixaban, exactly the real-world drug interactions the corpus contains. That is opcode-association doing its job. Adam clustered warfarin with digoxin and amiodarone, also real interactions, slightly different angle. Hormone clustered warfarin with general clinical context (patient, hold, hemodynamically). Three different rules, three different but useful views.
STDP did what the voice should do. When we sampled from brain D at high temperature, every single emitted token was a cardiac concept: angina, metoprolol, ef, bp, hfpef, crackles, dyspnea, edema, furosemide, fibrillation. The longer generations included real cardiologist jargon (cha2ds2-vasc, hfref, aki, dyns/cm5) that a generic language model would only produce after training on orders of magnitude more medical text. No "the." No "of." No general-English noise leaking through. The voice neuron stayed in domain by construction, which is exactly the property we need from a clinical speaker.
The reproducibility of the result was the next thing we tested. The four-rule comparison was rerun with completely different random seeds. If the architecture's behavior was a function of where each brain happened to start in weight space, the second run would have produced different patterns. It did not. The second run reproduced the first within 0.07 to 0.14 nats at every measured step. A third independent run was done on a different corpus shape (clinical narrative sentences only, no question-and-answer pairs), and the same per-rule banding showed up there too: STDP consistently higher than the gradient-descent variants, the gradient-descent variants clustering with each other. Three independent runs, three confirmations. The work is being done by the rule, not by the seed and not by an artifact of the specific corpus structure. That is the difference between a one-off demo and an empirical finding.
When we tested whether the trained brains can respond to clinical prompts in a real way, brain B (the Hormone-LR brain, which the architecture pairs with grammatical and contextual specialization) was given the prompt "warfarin amiodarone" and returned the token "major" twice in a row as its top response. The warfarin-amiodarone combination is, in fact, a major drug-drug interaction. The model produced the medically correct severity classification from 3,470 training tokens and 5,000 training steps on a 523-word vocabulary. That is a thing standard language models cannot do at this scale.
These four trained checkpoints will become the four neurons of the cardiac OrganVoice cluster. The cluster framework, already registered in the production compiler binary alongside the eleven existing clusters, will hold them, route between them via the astrocyte and nanotube primitives, and emit every public call through the SHA-256 tamper-evident audit chain. The canon runtime that intercepts opcode emissions and injects deterministic citation-bearing answers is the parallel engineering track that lands the no-hallucination property at decode time.
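The interception step can be sketched as a pass over the decoded stream that swaps each opcode emission for a row from a fixed table. The Python below is a hypothetical illustration, not the canon runtime itself: the `CANON` layout, the `INTERACTION_SEVERITY` opcode name, the `resolve` function, and the citation placeholder are all assumptions. The single row reuses the warfarin and amiodarone severity reported above.

```python
# Hypothetical canon table: (opcode, unordered arguments) -> cited answer.
CANON = {
    ("INTERACTION_SEVERITY", frozenset({"warfarin", "amiodarone"})): {
        "answer": "major",
        "source": "CITATION-PLACEHOLDER",  # the real row would carry its cited source
    },
}


def resolve(stream, canon=CANON):
    """Replace opcode emissions (tuples) with canon answers; pass plain tokens through."""
    out = []
    for item in stream:
        if isinstance(item, tuple):          # opcode emission: (name, *args)
            name, *args = item
            row = canon.get((name, frozenset(args)))
            if row is None:
                # No canon row means no answer: the runtime refuses rather than guesses.
                raise LookupError(f"no canon row for {name} {sorted(args)}")
            out.append(f"{row['answer']} [{row['source']}]")
        else:
            out.append(item)
    return out


decoded = ["warfarin", "amiodarone", ("INTERACTION_SEVERITY", "amiodarone", "warfarin")]
print(resolve(decoded))
```

The design choice worth noting is the failure mode: when the table has no row, the sketch raises instead of letting the voice improvise, which is how a decode-time lookup can rule out uncited claims.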
The next training pass scales the corpus and adds the sister cell. The cardiac corpus has been expanded to ~20 million cleaned tokens from NIH NHLBI clinical guidelines, cardiology abstracts from PubMed, FDA drug labels, ClinicalTrials.gov protocols, and curated internal corpora. Every entry is citable in a research-defensible corpus declaration without licensing exposure. A second specialist, psych_gen1, is now in the queue, drawing on PubMed psychiatric MeSH, ClinicalTrials.gov mental-health arms, DailyMed psychiatric labels, and the NIMH RDoC matrix. Same architecture, second domain, and, if it holds, a third independent confirmation that the substrate is reusable. The vocabulary grows dynamically through cell division rather than by overwriting any previously learned weights, which is why a third or fourth specialist can be added without disturbing the first.
Why this is the real discovery
Cardiac is the prototype, not the product. The architecture is domain-agnostic by construction. Same compiler. Same heads. Same freeze mechanism. Different canon table per domain.
If cardiac generalizes to a sister specialty (which the in-queue psych_gen1 cell is designed to falsify or confirm), the same machinery replicates to every body system and every behavioral domain. Pulmonary. Renal. Hepatic. Neurologic. Endocrine. Hematologic. Psychiatric. Each one becomes its own curated vocabulary and its own deterministic canon, sitting on top of the same proven mechanism.
Once multiple specialist canons exist side by side, the cross-domain interaction layer emerges from their intersection. A neurochemical state affects whichever specialists it touches; the canon tables already know the substrate of each touchpoint. The cross-domain reactivity is not a separate model. It is what you get for free when the per-domain canons overlap.
That is the actual product line. A persistent character-state simulation engine in which biology biases behavior across every body system the architecture has a canon for. Cardiac is the demonstration that the substrate is real. Psych is the demonstration that it generalizes. Everything else is the same substrate, applied to the next domain.
What we propose to Mayo
Mayo is not a diagnostic customer. Mayo is a training and research partner.
Mayo's medical education arm runs one of the largest residency and CME programs in the United States. Every year, hundreds of psychiatry residents, internal-medicine residents, and cardiology fellows train against standardized patient actors at significant per-encounter cost. Those actors read from a script. They do not remember the trainee from yesterday's session. They cannot replay a stable neurochemical state. They do not have an interior.
The architecture produces a simulated patient that does. Identity, memory, relationships, neurochemistry, situation: five persistent inputs against a closed, biology-grounded substrate, every response deterministic and replayable. The same simulated PTSD patient on Monday and Friday behaves the same way given the same prompts. The same patient with manic-state chemistry behaves differently from the same patient with depressive-state chemistry. The chemistry biases. It does not replace the patient's identity. Mayo's residents would train against a simulated person, not a chatbot, with the substrate provenanced row by row.
The most persuasive demonstration is the one anyone can read at a glance:
Character: Xena (half-elf ranger, neutral, wary by default)
Prompt: "Do you trust the stranger?"
Healthy panel: "I don't know him, but I'll hear him out."
PTSD-like state: "Why is he here? How does he know my name? Keep him where I can see him."
MDD-severe state: "It doesn't matter. Do whatever you want."
Manic-like state: "Trust him? Of course. This is the opportunity I've been waiting for."
All four are Xena. Nobody needs a whitepaper to see what just happened.
Our proposal to Mayo is a co-development partnership, not a vendor sale. The shape:
Phase one. Demo. Mayo's medical education group reviews the live same-character comparison (a healthy baseline plus three altered states). If it lands, Phase two starts. Total Mayo time investment in Phase one: a one-hour meeting.
Phase two. Psychiatry residency pilot. Mayo psychiatrists curate the canonical patient panels (PTSD, MDD, GAD, bipolar phases) against published DSM-5-TR criteria and Mayo's own training scenarios. We build the simulated patients around their panels. Residents train against them. Joint publication on residency simulation effectiveness. Mayo gets a first-of-kind simulated-patient platform for their training pipeline. We get a validation partner whose name is the most trusted institutional voice in American medicine.
Phase three. Expand to additional Mayo training departments. Cardiology fellow simulations (where the cardiac specialist substrate already lives). Endocrinology, neurology, internal medicine. Each department curates its own canon against the same substrate. The cross-specialty patient (a person with multiple co-morbidities, a real Mayo case archetype) emerges from the intersection of the specialist canons. The result is a unified simulated-patient platform usable across the Mayo training network and licensable to other academic medical centers.
What we ask of Mayo: domain experts who can curate canonical patient panels, access to anonymized training-scenario libraries, and the Mayo name on the joint validation publications.
What this is not: a diagnostic tool, an FDA-submitted medical device, a clinical-decision-support system, or a replacement for clinician judgment. Those are different products in different regulatory tracks and that is not what we are bringing to Mayo. We are bringing a simulator. The patent fortress stays the same. The regulatory exposure does not.