Pebble AI is developing next-generation neural architectures in which models reason through structured matrix representations. We have completed 26 rigorous experiments on H100 GPUs, with verified novel findings in representation quality and emergent thinking dynamics.
Current neural networks think in flat vectors — fixed-length lists of numbers with no internal structure and no measurable notion of complexity. We replace this with matrix-valued representations where matrix rank provides a measurable complexity axis. Models process these matrices through multiplicative composition that can build rank, and we've discovered that the output architecture determines whether thinking enriches or collapses representations.
Each token is a 32x32 matrix instead of a 1024-dim vector. Outer-product embeddings create structured, low-rank initializations. Multiplicative composition (matrix multiplication) builds rank iteratively — a mechanism impossible with vectors.
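The outer-product construction can be sketched in a few lines. This is an illustrative NumPy sketch under assumed dimensions (byte vocab of 256, rank-32 factors); the tables `U` and `V` and the helper `embed` are hypothetical names standing in for learned parameters, not the actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, dim = 256, 32  # assumed: byte-level vocab, 32x32 matrix tokens
U = rng.normal(size=(vocab, dim))  # left factors u_t (stand-ins for learned tables)
V = rng.normal(size=(vocab, dim))  # right factors v_t

def embed(token_id: int) -> np.ndarray:
    # Outer product u v^T: a rank-1 32x32 matrix, using 64 embedding
    # parameters per token instead of 1024 for a dense vector.
    return np.outer(U[token_id], V[token_id])

X0, X1 = embed(5), embed(17)
# A single matrix product cannot raise rank (rank(AB) <= min of the ranks),
# but accumulating products of different tokens does build it:
Y = X0 @ X1 + X1 @ X0  # generically rank 2
```

Note the rank-building mechanism lives in *accumulated* products: any one matmul keeps rank at or below its inputs', so enrichment must come from sums of compositions (e.g. residual streams).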
A matrix's effective rank (via SVD) measures how many independent concepts it encodes. We've shown rank enrichment is emergent and output-head-dependent: MultiProbeHead drives rank 7.55; zero-param head drives rank 5.0. Same model, different dynamics.
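Effective rank here presumably follows the standard entropy-of-singular-values definition; a minimal sketch (function name and `eps` are ours):

```python
import numpy as np

def effective_rank(M: np.ndarray, eps: float = 1e-12) -> float:
    # Entropy-based effective rank: the exponential of the entropy of the
    # normalized singular-value distribution. A matrix with k equal
    # nonzero singular values has effective rank exactly k.
    s = np.linalg.svd(M, compute_uv=False)
    p = s / (s.sum() + eps)
    entropy = -(p * np.log(p + eps)).sum()
    return float(np.exp(entropy))

one_concept = np.outer(np.ones(32), np.ones(32))  # rank 1
many_concepts = np.eye(32)                        # 32 independent directions
```

Unlike exact rank, this measure varies smoothly as singular values shrink, which is what makes "rank 7.55 vs 5.0" a meaningful comparison.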
Models generate sequences of matrix-valued thoughts before producing output. Each thought attends to the inputs and to all previous thoughts. The thinking benefit scales monotonically with thought count: 1.6% (N=1), 2.8% (N=2), 10.6% (N=4). The mechanism is validated mechanistically; effects at scale are unknown.
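A minimal sketch of such a thinking loop, assuming Frobenius inner-product attention over matrix states and a residual multiplicative update; every name and the exact update rule are illustrative, not the production architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def frobenius_attention(query, keys):
    # Scores are Frobenius inner products <Q, K_i>_F, softmaxed,
    # then used to take a weighted sum of the matrix-valued keys.
    scores = np.array([np.sum(query * k) for k in keys]) / np.sqrt(query.size)
    w = np.exp(scores - scores.max())
    w = w / w.sum()
    return sum(wi * k for wi, k in zip(w, keys))

def think(inputs, n_thoughts=4):
    # Each new thought attends over the inputs plus all previous
    # thoughts; a matrix product mixes the attended context back in.
    thoughts = []
    state = sum(inputs) / len(inputs)
    for _ in range(n_thoughts):
        context = frobenius_attention(state, inputs + thoughts)
        state = state + state @ context  # multiplicative composition
        thoughts.append(state)
    return thoughts

inputs = [0.05 * rng.normal(size=(32, 32)) for _ in range(3)]
thoughts = think(inputs, n_thoughts=4)
```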
A certainty-driven mode-switching mechanism: a running average of recent prediction uncertainty modulates thinking depth, grounded in locus coeruleus neuroscience. Verified novel: to our knowledge, every existing adaptive-compute system makes per-token decisions.
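A hedged sketch of what such a controller could look like: an exponential moving average of prediction entropy selects depth at the passage level. The class name, decay, threshold, and depth values are all invented for illustration.

```python
import numpy as np

class ModeSwitch:
    # Invented illustration values: decay beta, entropy threshold, depths.
    def __init__(self, beta: float = 0.9, threshold: float = 2.0):
        self.beta = beta
        self.threshold = threshold
        self.avg_uncertainty = 0.0

    def update(self, probs: np.ndarray) -> int:
        entropy = -(probs * np.log(probs + 1e-12)).sum()
        # Running average over recent predictions, so the depth decision
        # is passage-level rather than per-token (ACT-style halting).
        self.avg_uncertainty = (self.beta * self.avg_uncertainty
                                + (1 - self.beta) * entropy)
        return 4 if self.avg_uncertainty > self.threshold else 1

uniform = np.ones(256) / 256                    # sustained confusion
confident = np.array([0.97, 0.01, 0.01, 0.01])  # easy stretch of text

deep = [ModeSwitch().update(uniform) for _ in range(1)]  # single token: still shallow
hard = ModeSwitch()
depth_hard = [hard.update(uniform) for _ in range(10)][-1]     # sustained: deep
easy = ModeSwitch()
depth_easy = [easy.update(confident) for _ in range(10)][-1]   # stays shallow
```

The key contrast with per-token halting is visible above: one confusing token does not flip the mode; a sustained confusing passage does.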
26 experiments across 8xH100 GPUs with rigorous methodology: pre-experiment checklists, hypothesis-driven design, comprehensive literature verification, and honest negative results.
| Finding | Result | Status |
|---|---|---|
| Outer-product embedding advantage (T=1) | 26% better per-parameter quality vs vectors | Proven |
| Rank enrichment is output-head-dependent | Novel phenomenon — MultiProbeHead rank 7.55 vs ZeroParam rank 5.0 | Proven |
| Matrix operations parameter efficiency | 130x more efficient per layer vs vector attention | Proven |
| Thought interleaving benefit scaling | 1.6% (N=1) to 10.6% (N=4) — monotonic with thought count | Proven |
| Adaptive compute novelty | Passage-level uncertainty → thinking depth. No prior system uses this. | Novel |
| Matrix structure advantage at scale | Requires 10M+ params minimum for conclusive evidence | Next Phase |
| Cross-domain generalization | Raw bytes (text + image + audio) — structure-dependent transfer | Next Phase |
Architecture verified against: COCONUT (Meta 2024), CoCoMix (Meta 2025), LoopFormer (ICLR 2026), Seq-VCR (ICLR 2025), MBLM (IBM 2025), bGPT (2024), EverMind MSA (2026), Quiet-STaR (Stanford), and 12 comprehensive research documents on thought generation, byte-level modeling, and neural reasoning.
Rigorous research means documenting failures as thoroughly as successes. These negative results are equally important — they narrow the hypothesis space and guide future work.
LoopFormer baseline: 0.87 BPB vs our matrix thinker: 1.67 BPB. Speed gains don't translate to quality. The operations are fundamentally more expensive.
Baseline E (48 layers, no thoughts): 3.524 BPB. Variant A (with thoughts): 3.535 BPB. At 288K params, simply adding depth beats iterative refinement.
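For readers unfamiliar with the metric, bits-per-byte is just byte-level cross-entropy rescaled from nats to bits:

```python
import math

def bits_per_byte(nats_per_byte: float) -> float:
    # BPB = mean byte-level cross-entropy (nats) / ln 2.
    return nats_per_byte / math.log(2)

# e.g. a mean cross-entropy of ~2.443 nats/byte corresponds to ~3.524 BPB
```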
3D attention is both slower and worse: it drives solidification (rank drops) and reaches 2.457 BPB, 29% worse than Frobenius attention. A confirmed dead end.
Adaptive halting mechanisms fail below 10M params: the expected step count collapses to 1.0. A fixed-iteration baseline is required instead.
At 288K params, models barely learn unigram statistics. Cannot draw conclusions about reasoning, generalization, or abstract thought. Requires 10M+ minimum.
The outer-product embedding could theoretically be flattened to vectors. Does the *matrix structure* help, or just the embedding? This question blocks all downstream decisions.
Four-phase approach to resolve the core question: does matrix structure matter at scale, or is the advantage purely from the embedding?
Three-way param-matched comparison (standard vs bottleneck vs outer-product) at 2.5M and 10M params. Determines whether the structure or just the embedding creates the advantage. Approximately 10 H100-hours.
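A back-of-the-envelope view of the three embedding variants' parameter counts, under assumed dimensions (byte vocab of 256, 1024-dim flat state, rank-32 bottleneck); true param matching would additionally adjust the backbone so total model sizes agree:

```python
# Assumed dimensions: byte vocab of 256, 1024-dim flat state, rank 32.
vocab, d, r = 256, 1024, 32

standard = vocab * d             # dense vector table: 256 x 1024
bottleneck = vocab * r + r * d   # 256 x 32 table + shared 32 x 1024 up-projection
outer_product = vocab * 2 * r    # two 32-dim factors per token (u_t, v_t)
```

The comparison isolates whether the advantage comes from low-rank parameter sharing (which the bottleneck also has) or from the matrix structure itself (which only the outer product has).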
CoCoMix-style thought interleaving at 10-50M params. Measure whether matrix thoughts provide advantages over vector thoughts at publication-grade scale. Approximately 20 H100-hours.
Mixed byte data (text + images + code + audio) as raw bytes. Measure transfer coefficients. Test whether matrix rank correlates with cross-domain generalization. Approximately 30 H100-hours.
Standard benchmarks (WikiText-103, The Pile) with honest comparison against EvaByte, BLT, MBLM, LoopFormer at matched scale. Approximately 100 H100-hours.
Total compute needed: ~160 H100-hours (~$400-800 at cloud rates). Either outcome (matrix helps or doesn't) is publishable and valuable.
Founder & CEO of Pebble AI (a Nevada S-Corp) and independent ML researcher focused on novel neural architectures. 26 experiments completed on H100 infrastructure with rigorous methodology: hypothesis-driven design, comprehensive ablations, and honest assessment of negative results. Sam is applying to GPU credit programs (NVIDIA Inception, Microsoft for Startups, Google Cloud, Modal, AWS, Lambda) to fund the Phase 1-4 research.