Aether — 4-variant learning-rule comparison

May 9, 2026 overnight session · single T1000 8GB GPU · 240K total training steps (4 runs × 60K) · zero NaN
Same integrated architecture (voice + trainable cascade + entity embedding + Ψ coherence regularizer + per-step audit + neurotransmitter input). Same data (compound_tokens.txt, ~600K phoneme-decomposed tokens of a literary corpus). Same seed. Same eval region. Only the learning rule changed.
→ Try them yourself in the live demo
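
For concreteness, the controlled setup can be pictured as one shared config with a single varying field. A minimal sketch, assuming hypothetical field names (the actual config schema and seed value are not published here):

```python
from dataclasses import dataclass

@dataclass
class RunConfig:
    """Everything shared across variants A-D; only learning_rule differs."""
    data_path: str = "compound_tokens.txt"  # ~600K phoneme-decomposed tokens
    lr: float = 1e-4                        # Adam lr (variant A control values)
    cascade_lr: float = 1e-5
    total_steps: int = 60_000               # per run; 4 runs = 240K steps total
    eval_every: int = 5_000                 # held-out CE eval + snapshot cadence
    seed: int = 1234                        # same seed for every run (placeholder value)
    learning_rule: str = "adam"             # "adam" | "hormone_lr" | "hebbian" | "stdp"
```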

Final standings

| Rank | Variant | Best held-out CE | At step | Verdict |
|---|---|---|---|---|
| 1 | A. Adam baseline | 6.0903 | 60,000 | Won by 0.003 over C |
| 2 | C. Hebbian (degenerate scalar) | 6.0932 | 60,000 | Statistically tied with A |
| 3 | B. Hormone-LR (synthetic cyclical salience) | 6.1352 | 60,000 | Cost ~0.045 nats vs A |
| 4 | D. STDP (proxy spike-time) | 8.3187 | 5,000 | Cost ~2.2 nats on CE, but see §"D speaks differently" |

Held-out CE = mean cross-entropy loss on a held-out window of 1,000 examples (496 remain after filtering OOV-context samples). Lower is better.
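
Read literally, the metric is just mean next-token cross-entropy over that filtered window. A minimal PyTorch sketch, where `model` and the 496 filtered `(context, target)` pairs are hypothetical stand-ins:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def heldout_ce(model, heldout_pairs) -> float:
    """Mean next-token cross-entropy over the filtered held-out window."""
    losses = []
    for context, target in heldout_pairs:    # 496 pairs after OOV filtering
        logits = model(context)[-1]          # next-token logits, shape (vocab,)
        losses.append(F.cross_entropy(logits.unsqueeze(0),
                                      torch.tensor([target])))
    return float(torch.stack(losses).mean()) # lower is better
```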

What each variant changed (vs Adam baseline)

| Variant | Modification |
|---|---|
| A. Adam baseline | Standard Adam optimizer, lr=0.0001, cascade_lr=0.00001. Control. |
| B. Hormone-LR | Effective lr scaled by salience: lr × clamp(0.5 + salience, 0.5, 1.75). Salience cycled between 0.5 and 1.0 per step (synthetic, not real DB-derived). |
| C. Hebbian-blend | After the Adam step, applied a scalar Hebbian update w += hebb_alpha × pre × post on cascade Q-projections. Pre = head m_state (Na+ activation), post = h_state (Na+ inactivation). Degenerate (scalar) implementation: uniform offset per head. |
| D. STDP | After Adam, applied a spike-timing bias to embedding rows using proxy spike times: pre_t = step, post_t = step + (i % 5). STDP rule: w += a_plus × exp(−Δt/τ) when Δt > 0 (LTP only). All three modifications are sketched after this table. |
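
As a reading of the table, here is a minimal sketch of the three modifications, applied after a standard Adam step. It is an illustration, not the project's actual code: `cascade_q` (per-head Q-projection weights), `m_state`/`h_state`, and `emb` are hypothetical handles, and the exact salience waveform is an assumption:

```python
import math
import torch

BASE_LR = 1e-4

def hormone_lr(step: int) -> float:
    """B: effective lr = lr × clamp(0.5 + salience, 0.5, 1.75), with a
    synthetic salience cycling between 0.5 and 1.0 (assumed sinusoid)."""
    salience = 0.75 + 0.25 * math.sin(2 * math.pi * step / 1000)
    return BASE_LR * min(max(0.5 + salience, 0.5), 1.75)

@torch.no_grad()
def hebbian_blend(cascade_q, m_state, h_state, hebb_alpha=1e-5):
    """C: degenerate scalar Hebbian update on cascade Q-projections.
    cascade_q: (heads, d, d); m_state/h_state: (heads,) Na+ activation /
    inactivation. pre × post collapses to one scalar per head, so each
    head's weights receive a uniform offset."""
    offset = hebb_alpha * m_state * h_state   # (heads,)
    cascade_q += offset.view(-1, 1, 1)        # broadcast over each head's matrix

@torch.no_grad()
def stdp_bias(emb, step: int, a_plus=1e-4, tau=20.0):
    """D: LTP-only STDP bias on embedding rows with proxy spike times
    pre_t = step, post_t = step + (i % 5), hence Δt = i % 5."""
    i = torch.arange(emb.size(0), dtype=emb.dtype)
    dt = i % 5                                      # Δt per embedding row
    ltp = a_plus * torch.exp(-dt / tau) * (dt > 0)  # only Δt > 0 potentiates
    emb += ltp.unsqueeze(1)                         # uniform shift per row
```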

What the CE numbers tell us (the framework's clean reads)

| Comparison | Δ nats | Conclusion |
|---|---|---|
| A vs C | 0.003 | Pure noise. The degenerate Hebbian implementation has no measurable effect. |
| A vs B | 0.045 | Small but real signal. Synthetic cyclical LR slightly hurts long-run convergence. |
| A vs D | 2.228 | Big signal. The proxy STDP perturbation has a substantive cost on next-token prediction. |

The fact that we can confidently distinguish 0.003 (noise) from 2.228 (signal) means the comparison framework itself is the first scientific instrument here. Future variants — real vector Hebbian, real DB-derived hormone salience, real cascade-extracted spike timing — can be tested against this same baseline.

D speaks differently — CE is the wrong metric

D scored worst on cross-entropy by a wide margin. But at sampling temperatures of 1.5 and above, D produced the most visceral, emotionally charged language of any variant. This is the heart of tonight's finding.

Same prompt to all four:

[seed → "the arena was cruel"]
A · Adam (greedy)
his body to him with the blade to his eyes that through his eyes raziel could blade like his feet to his eyes the blade he was his feet to his eyes
B · Hormone-LR (greedy)
his blade to him with the blade he was him to get with his eyes and in the blade to re in the tall to make him eyes
C · Hebbian (greedy)
his body to him with the blade to his eyes of his boy with the blade to his eyes their blade in the sound to him down raziel could blade
D · STDP (greedy)
him that and and he his and and and him that and and like his and and and him that and and him his and and and...

A, B, and C all converge on similar "blade and eyes" battle prose. D collapses to a 2-token loop with greedy decoding.

But raise the temperature on D:

D · STDP at temp 1.5
him that an fighting he against with arms for could and the could hair and him that empty blood thirsty him that he that re that and the could hair and him that empty blood thirsty him
D · STDP at temp 2.0
empty let and blood thirsty grip could and his faces hair them him that like that in blade he carved and the thing his faces hair them him that like that in blade he last without the thing his faces hair
D · STDP at temp 2.5
carved and good and arena and his them him that down that at faces in could without good and cruel and his them him that down that at faces light arms around could and cruel and his them him that

D knows: fighting · against · arms · empty · blood thirsty · grip · faces · blade · carved · arena · cruel · light arms. The STDP perturbation pushed D's embeddings into a region greedy decoding cannot reach; with temperature noise, the sampler surfaces the visceral war/violence vocabulary the corpus contains.
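
The mechanism behind the contrast is ordinary temperature sampling: greedy decoding takes the argmax (where D loops), while dividing the logits by a temperature before the softmax lets lower-probability vocabulary surface. A minimal sketch, with `logits` standing in for the model's next-token output:

```python
import torch

def sample_next(logits: torch.Tensor, temperature: float = 0.0) -> int:
    """temperature == 0 → greedy argmax; temperature > 0 → sample from
    softmax(logits / temperature). Higher temperature flattens the
    distribution, surfacing tokens greedy decoding never emits."""
    if temperature == 0.0:
        return int(logits.argmax())
    probs = torch.softmax(logits / temperature, dim=-1)
    return int(torch.multinomial(probs, num_samples=1))
```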

The regulatory-relevant finding

Cross-entropy held-out loss is insufficient as the sole evaluation metric for biologically-modulated language models. The variant ranked "worst" on CE produced the richest sampled output, while the variants ranked first and second produced the output most similar to each other.

For FDA-2025-D-6131 (NAMs guidance) and any in-silico drug-development application, evaluation frameworks must combine quantitative metrics (CE, perplexity) with qualitative sampling at multiple temperatures and domain-specific output review. We will publish this argument formally in our NAMs comment (deadline May 18).
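
In code form, the proposed protocol is small. A sketch reusing the hypothetical `heldout_ce` from earlier, with `generate` passed in as an assumed sampling helper:

```python
TEMPERATURES = [0.0, 1.5, 2.0, 2.5]  # greedy plus the sweep shown above

def evaluate(model, heldout_pairs, generate, prompt="the arena was cruel"):
    """Combine the quantitative metric with qualitative samples at
    multiple temperatures; the samples go to domain-specific review."""
    report = {"heldout_ce": heldout_ce(model, heldout_pairs), "samples": {}}
    for t in TEMPERATURES:
        report["samples"][t] = generate(model, prompt, temperature=t)
    return report
```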

End-of-training drop is universal — likely architectural

All three CE-comparable variants (A, B, C) hit a plateau around step 35–40K, oscillated for ~15K steps, then dropped 0.32–0.36 nats at the very last eval (step 60K).

| Variant | Step 35K best | Step 60K final | Δ |
|---|---|---|---|
| A. Adam | 6.447 | 6.090 | −0.357 |
| B. Hormone-LR | 6.457 | 6.135 | −0.322 |
| C. Hebbian | 6.435 | 6.093 | −0.342 |

The drop occurred across all three rules at nearly the same magnitude. That points at the integrated stack itself — voice + cascade + entity + Ψ + audit + NT — finally settling into a coherent equilibrium at 60K steps, regardless of which learning rule drove the gradient updates.

D-full: the trajectory we kept

A bonus run (D-full) saved D's checkpoint at every 5,000-step eval, giving a 12-snapshot trajectory of how the STDP perturbation evolves over 60K steps. All snapshots are preserved on disk for future qualitative analysis.
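
The scheme itself is one line of bookkeeping in the training loop. A minimal sketch, reusing the hypothetical `heldout_ce` from earlier along with `train_step` and `heldout_pairs` stand-ins:

```python
import torch

EVAL_EVERY = 5_000

for step in range(1, 60_001):
    train_step(model)                        # one optimizer update
    if step % EVAL_EVERY == 0:
        ce = heldout_ce(model, heldout_pairs)
        torch.save(model.state_dict(),       # snapshot kept for later analysis
                   f"dfull_step{step}_ce{ce:.2f}.pt")
```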

| Step | Held-out CE |
|---|---|
| 5,000 | 8.32 (best) |
| 10,000 | 11.42 |
| 15,000 | 10.71 |
| 20,000 | 12.70 |
| 25,000 | 12.40 |
| 30,000 | 12.98 |
| 35,000 | 14.83 |
| 40,000 | 13.73 |
| 45,000 | 13.31 |
| 50,000 | 12.71 |
| 55,000 | 11.57 |
| 60,000 | 16.44 (most drifted) |

What's next (already designed, not yet executed)