| Rank | Variant | Best held-out CE | At step | Verdict |
|---|---|---|---|---|
| 1 | A. Adam baseline | 6.0903 | 60,000 | Won by 0.003 over C |
| 2 | C. Hebbian (degenerate scalar) | 6.0932 | 60,000 | Statistically tied with A |
| 3 | B. Hormone-LR (synthetic cyclical salience) | 6.1352 | 60,000 | Cost ~0.045 nats vs A |
| 4 | D. STDP (proxy spike-time) | 8.3187 | 5,000 | Cost ~2.2 nats on CE, but see §"D speaks differently" |
Held-out CE = mean cross-entropy loss (nats) over a held-out window of 1,000 examples (496 remain after filtering OOV-context samples). Lower is better.
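For reference, a minimal sketch of how a held-out CE number like this can be computed, assuming PyTorch-style logits and integer targets; the OOV-context filtering is represented here by a boolean mask, since the actual filtering criterion lives elsewhere in the pipeline.

```python
import torch
import torch.nn.functional as F

def heldout_ce(logits: torch.Tensor, targets: torch.Tensor, keep: torch.Tensor) -> float:
    """Mean cross-entropy (nats) over the held-out examples that survive filtering.

    logits:  (N, vocab) raw model outputs for the held-out window
    targets: (N,) gold next-token ids
    keep:    (N,) bool mask, False for the OOV-context samples that get filtered out
    """
    return F.cross_entropy(logits[keep], targets[keep], reduction="mean").item()
```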
| Variant | Modification |
|---|---|
| A. Adam baseline | Standard Adam optimizer, lr=0.0001, cascade_lr=0.00001. Control. |
| B. Hormone-LR | Effective lr scaled by salience: lr × clamp(0.5 + salience, 0.5, 1.75). Salience cycled 0.5–1.0 per step (synthetic, not real DB-derived). |
| C. Hebbian-blend | After Adam step, applied scalar Hebbian update w += hebb_alpha × pre × post on cascade Q-projections. Pre = head m_state (Na+ activation), post = h_state (Na+ inactivation). Degenerate (scalar) implementation — uniform offset per head. |
| D. STDP | After Adam, applied spike-timing bias to embedding rows using proxy spike times: pre_t = step, post_t = step + (i % 5). STDP rule: w += a_plus × exp(-Δt/τ) when Δt>0 (LTP). |
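A minimal sketch of the three post-step modifications as described in the table above, assuming PyTorch parameter tensors; `hebb_alpha`, `a_plus`, and `tau` are placeholder defaults (the actual values are not recorded here), and the salience signal for B is the synthetic 0.5–1.0 cycle noted above.

```python
import math
import torch

def hormone_lr(base_lr: float, salience: float) -> float:
    """Variant B: scale the effective learning rate by a clamped salience signal."""
    return base_lr * min(max(0.5 + salience, 0.5), 1.75)

def hebbian_blend(q_proj_weight: torch.Tensor, pre: float, post: float,
                  hebb_alpha: float = 1e-4) -> None:
    """Variant C (degenerate scalar form): after the Adam step, add a uniform
    offset hebb_alpha * pre * post to a head's cascade Q-projection weights.
    pre = the head's m_state (Na+ activation), post = its h_state (Na+ inactivation)."""
    with torch.no_grad():
        q_proj_weight += hebb_alpha * pre * post

def stdp_bias(embedding: torch.Tensor, step: int,
              a_plus: float = 1e-4, tau: float = 20.0) -> None:
    """Variant D: bias embedding rows using proxy spike times
    pre_t = step, post_t = step + (i % 5); LTP applies only when delta_t > 0."""
    with torch.no_grad():
        for i in range(embedding.shape[0]):
            delta_t = (step + (i % 5)) - step   # reduces to i % 5
            if delta_t > 0:                      # LTP branch of the STDP rule
                embedding[i] += a_plus * math.exp(-delta_t / tau)
```

Note that with salience cycling between 0.5 and 1.0, clamp(0.5 + salience, 0.5, 1.75) never actually saturates, so B's effective multiplier swept roughly 1.0–1.5× per step.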
| Comparison | Δ nats | Conclusion |
|---|---|---|
| A vs C | 0.003 | Pure noise. Degenerate Hebbian implementation has no measurable effect. |
| A vs B | 0.045 | Small but real signal. Synthetic cyclical LR slightly hurts long-run convergence. |
| A vs D | 2.228 | Big signal. Proxy STDP perturbation has substantive cost on next-token prediction. |
The fact that we can confidently distinguish 0.003 (noise) from 2.228 (signal) means the comparison framework itself is the first scientific instrument here. Future variants — real vector Hebbian, real DB-derived hormone salience, real cascade-extracted spike timing — can be tested against this same baseline.
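As a sketch, the comparison boils down to a one-line delta check; the `noise_floor` below is purely illustrative (the runs above only establish that 0.003 nats is indistinguishable from noise while 0.045 is not).

```python
def compare_to_baseline(baseline_ce: float, variant_ce: float,
                        noise_floor: float = 0.01) -> str:
    """Classify a variant's held-out CE against the Adam baseline."""
    delta = variant_ce - baseline_ce
    if abs(delta) <= noise_floor:
        return f"tied with baseline (delta = {delta:+.3f} nats)"
    return f"{'worse' if delta > 0 else 'better'} than baseline by {abs(delta):.3f} nats"

# Tonight's numbers:
print(compare_to_baseline(6.0903, 6.0932))   # C: tied
print(compare_to_baseline(6.0903, 6.1352))   # B: worse by 0.045
print(compare_to_baseline(6.0903, 8.3187))   # D: worse by 2.228
```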
D scored worst on cross-entropy by a wide margin. But at sampling temperature 1.5 and above, D produced the most visceral, emotionally charged language of any variant. This is the heart of tonight's finding.
Same prompt to all four:
A, B, and C all converge on similar "blade and eyes" battle prose. D collapses to a 2-token loop under greedy decoding.
But raise the temperature on D:
D knows: fighting · against · arms · empty · blood thirsty · grip · faces · blade · carved · arena · cruel · light arms. The STDP perturbation pushed D's embeddings into a region greedy decoding cannot reach; with temperature noise, it surfaces the visceral war/violence vocabulary the corpus contains.
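The mechanism is just decoding temperature. A minimal sketch, assuming a (vocab,)-shaped logits vector from the model: greedy decoding takes the argmax (where D loops), while dividing the logits by T ≥ 1.5 before the softmax flattens the distribution enough to reach D's rarer vocabulary.

```python
import torch

def sample_next_token(logits: torch.Tensor, temperature: float = 1.0) -> int:
    """Pick the next token id from a (vocab,) logits vector.

    temperature == 0   -> greedy argmax (D's 2-token loop)
    temperature >= 1.5 -> flattened distribution that surfaces D's visceral vocabulary
    """
    if temperature == 0.0:
        return int(torch.argmax(logits).item())
    probs = torch.softmax(logits / temperature, dim=-1)
    return int(torch.multinomial(probs, num_samples=1).item())
```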
Cross-entropy held-out loss is insufficient as the sole evaluation metric for biologically-modulated language models. The variant ranked "worst" on CE produced the richest sampled output; the variants ranked first and second produced output most similar to each other.
For FDA-2025-D-6131 (NAMs guidance) and any in-silico drug-development application, evaluation frameworks must combine quantitative metrics (CE, perplexity) with qualitative sampling at multiple temperatures and domain-specific output review. We will publish this argument formally in our NAMs comment (deadline May 18).
All three CE-comparable variants (A, B, C) hit a plateau around step 35–40K, oscillated for 15K steps, then dropped 0.32–0.36 nats at the very last eval (step 60K).
| Variant | Best CE at step 35K | Final CE at step 60K | Δ (nats) |
|---|---|---|---|
| A. Adam | 6.447 | 6.090 | −0.357 |
| B. Hormone-LR | 6.457 | 6.135 | −0.322 |
| C. Hebbian | 6.435 | 6.093 | −0.342 |
The drop happened across all three rules at nearly the same magnitude. That points at the integrated stack itself (voice + cascade + entity + Ψ + audit + NT) finally settling into coherent equilibrium after 60K steps, regardless of which learning rule drove the gradient updates.
A bonus run (D-full) saved D's checkpoint at every 5,000-step eval, giving a 12-snapshot trajectory of how the STDP perturbation evolves over 60K steps. All snapshots are preserved on disk for future qualitative analysis.
| Step | Held-out CE |
|---|---|
| 5,000 | 8.32 (best) |
| 10,000 | 11.42 |
| 15,000 | 10.71 |
| 20,000 | 12.70 |
| 25,000 | 12.40 |
| 30,000 | 12.98 |
| 35,000 | 14.83 |
| 40,000 | 13.73 |
| 45,000 | 13.31 |
| 50,000 | 12.71 |
| 55,000 | 11.57 |
| 60,000 | 16.44 (most drifted) |