Novel Architecture Research

AI that thinks in matrices,
not vectors

Pebble AI is developing next-generation neural architectures in which models reason through structured matrix representations. We have completed 26 rigorous experiments on H100 GPUs, with verified novel findings in representation quality and emergent thinking dynamics.

26
Experiments Completed
26%
Better Per-Param Quality
130x
Parameter Efficiency
8xH100
Infrastructure Used

The Research

Current neural networks think in flat vectors — fixed-length lists of numbers with no internal structure and no measurable notion of complexity. We replace this with matrix-valued representations where matrix rank provides a measurable complexity axis. Models process these matrices through multiplicative composition that can build rank, and we've discovered that the output architecture determines whether thinking enriches or collapses representations.

Matrix Representations

Each token is a 32×32 matrix instead of a 1024-dimensional vector. Outer-product embeddings create structured, low-rank initializations. Multiplicative composition (matrix multiplication) builds rank iteratively, a mechanism impossible with flat vectors.
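A minimal NumPy sketch of the idea. The embedding and the exact update rule here are illustrative stand-ins, not the production architecture: a token embeds as a rank-1 outer product, and a residual multiplicative update lets matrix rank grow step by step.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 32  # each token is a d x d matrix (32 x 32 = 1024 entries)

def embed_token() -> np.ndarray:
    # Outer-product embedding: two d-vectors yield a rank-1 matrix,
    # a structured, low-rank initialization.
    return np.outer(rng.standard_normal(d), rng.standard_normal(d))

tokens = [embed_token() for _ in range(4)]
state = tokens[0]
ranks = [np.linalg.matrix_rank(state)]
for t in tokens[1:]:
    # Illustrative multiplicative update: the matrix product mixes state
    # and token, and the residual terms let rank accumulate -- something
    # a flat vector representation cannot express.
    state = state @ t + t + state
    ranks.append(np.linalg.matrix_rank(state))

print(ranks)  # rank grows by one per composition step: [1, 2, 3, 4]
```

Note that a pure product alone cannot raise rank (rank(AB) ≤ min(rank A, rank B)); it is the sum of multiplicative and residual terms that lets rank build.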

Rank as Abstraction

A matrix's effective rank (computed via SVD) measures how many independent concepts it encodes. We have shown that rank enrichment is emergent and output-head-dependent: a MultiProbeHead drives effective rank to 7.55, while a zero-parameter head drives it to 5.0. Same model, different dynamics.
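One common definition of effective rank is the exponential of the entropy of the normalized singular-value spectrum; the source does not state which variant it uses, so this is a sketch under that assumption:

```python
import numpy as np

def effective_rank(M: np.ndarray) -> float:
    """Effective rank: exp of the entropy of the normalized singular
    values (one standard definition; an assumption here)."""
    s = np.linalg.svd(M, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]
    return float(np.exp(-(p * np.log(p)).sum()))

# A rank-1 outer product has effective rank 1; the identity is full rank.
u, v = np.random.default_rng(0).standard_normal((2, 32))
print(round(effective_rank(np.outer(u, v)), 2))  # 1.0
print(round(effective_rank(np.eye(32)), 2))      # 32.0
```

Unlike the integer rank, this measure varies smoothly, which is what makes it usable as a complexity axis during training.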

Iterative Refinement

Models generate sequences of matrix-valued thoughts before producing output. Each thought attends to the inputs and to all previous thoughts. The thinking benefit scales monotonically with thought count: 1.6% (N=1), 2.8% (N=2), 10.6% (N=4). The mechanism itself is validated; its behavior at scale is unknown.
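A toy sketch of the refinement loop, assuming a Frobenius-inner-product attention score (the document mentions Frobenius attention elsewhere but not its exact form; all names and the update rule here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_inputs, n_thoughts = 32, 3, 4

def frobenius_attention(query: np.ndarray, keys: list) -> np.ndarray:
    # Scores are scaled Frobenius inner products <query, k>; the context
    # is the softmax-weighted sum of the matrix-valued keys.
    scores = np.array([np.sum(query * k) / d for k in keys])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return sum(wi * k for wi, k in zip(w, keys))

inputs = [rng.standard_normal((d, d)) / d for _ in range(n_inputs)]
thoughts = []
state = np.eye(d)
for _ in range(n_thoughts):
    # Each new thought attends over the inputs AND all previous thoughts.
    context = frobenius_attention(state, inputs + thoughts)
    state = state @ context + context  # multiplicative refinement (illustrative)
    thoughts.append(state)

print(len(thoughts))  # N matrix-valued thoughts generated before output
```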

Passage-Level Adaptive Compute

A certainty-driven mode-switching mechanism: a running average of recent prediction uncertainty modulates thinking depth. The design is grounded in locus coeruleus neuroscience. Verified novel: every existing adaptive compute system we surveyed makes per-token decisions.
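A sketch of the passage-level controller under assumed details: an exponential moving average (one way to realize a "running average") of per-token uncertainty maps to a thought count. The mapping function and all constants are hypothetical.

```python
def thinking_depth(uncertainty_ema: float, base: int = 1, max_extra: int = 3) -> int:
    # Hypothetical mapping from smoothed uncertainty to thought count.
    return base + round(max_extra * min(uncertainty_ema, 1.0))

# Passage-level signal: an exponential moving average of recent per-token
# prediction entropy, instead of a fresh decision at every token.
alpha, ema = 0.1, 0.0
depths = []
for entropy in [0.1, 0.1, 0.9, 0.95, 0.9, 0.2, 0.1]:  # toy uncertainty stream
    ema = (1 - alpha) * ema + alpha * entropy
    depths.append(thinking_depth(ema))

print(depths)  # depth ramps up only after the passage stays uncertain
```

The smoothing is the point: a single surprising token does not trigger extra compute, but a sustained stretch of uncertainty does.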

Verified Findings

26 experiments across 8xH100 GPUs with rigorous methodology: pre-experiment checklists, hypothesis-driven design, comprehensive literature verification, and honest negative results.

Finding | Result | Status
Outer-product embedding advantage (T=1) | 26% better per-parameter quality vs vectors | Proven
Rank enrichment is output-head-dependent | Novel phenomenon: MultiProbeHead rank 7.55 vs ZeroParam rank 5.0 | Proven
Matrix operations parameter efficiency | 130x more efficient per layer vs vector attention | Proven
Thought interleaving benefit scaling | 1.6% (N=1) to 10.6% (N=4), monotonic with thought count | Proven
Adaptive compute novelty | Passage-level uncertainty modulates thinking depth; no prior system does this | Novel
Matrix structure advantage at scale | Requires 10M+ params minimum for conclusive evidence | Next Phase
Cross-domain generalization | Raw bytes (text + image + audio), structure-dependent transfer | Next Phase

Literature Validation

Architecture verified against: COCONUT (Meta 2024), CoCoMix (Meta 2025), LoopFormer (ICLR 2026), Seq-VCR (ICLR 2025), MBLM (IBM 2025), bGPT (2024), EverMind MSA (2026), Quiet-STaR (Stanford), and 12 comprehensive research documents on thought generation, byte-level modeling, and neural reasoning.

Honest Negative Results

Rigorous research means documenting failures as thoroughly as successes. These negative results are equally important — they narrow the hypothesis space and guide future work.

Matrix Ops Don't Beat Vectors at Matched FLOPs

The LoopFormer baseline reaches 0.87 BPB versus 1.67 BPB for our matrix thinker at matched FLOPs. Speed gains do not translate into quality: the matrix operations are fundamentally more expensive.

Thought Interleaving Loses to Adding Layers

Baseline E (48 layers, no thoughts): 3.524 BPB. Variant A, with thoughts: 3.535 BPB. At 288K params, simply adding depth beats iterative refinement.

3D Matrix Attention is Dead

3D attention is both slower and worse: it drives solidification (rank drops) and reaches 2.457 BPB, 29% worse than Frobenius attention. Confirmed dead end.

PonderNet Halting Collapses at Small Scale

Adaptive halting mechanisms fail below 10M params: the expected number of steps collapses to 1.0. At this scale, a fixed-iteration baseline is required.

Scale Limitations of Current Results

At 288K params, models barely learn unigram statistics, so we cannot draw conclusions about reasoning, generalization, or abstract thought. Conclusive results require 10M+ params.

Critical Open Question: Structure vs Embedding

The outer-product embedding could, in principle, be flattened back to vectors. Does the matrix structure itself help, or only the embedding? This question blocks all downstream decisions.

Compute Plan

Four-phase approach to resolve the core question: does matrix structure matter at scale, or is the advantage purely from the embedding?

Phase 1: Embedding Ablation

Three-way param-matched comparison (standard vs bottleneck vs outer-product) at 2.5M and 10M params. Determines whether the structure or just the embedding creates the advantage. Approximately 10 H100-hours.
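Back-of-envelope parameter counts for the embedding variants, assuming a 256-byte vocabulary and 32×32 matrix tokens (illustrative numbers only, not the actual experiment configs):

```python
# Assumed sizes: 256-byte vocabulary, 32x32 matrix tokens (1024 entries).
V, d, r = 256, 32, 13

standard = V * d * d            # full table: one 1024-entry embedding per byte
outer_product = V * 2 * d       # two d-vectors per byte, outer product at lookup
bottleneck = V * r + r * d * d  # factorized table; r picked to roughly match

print(standard, outer_product, bottleneck)  # 262144 16384 16640
```

The three-way comparison then equalizes total parameters by adjusting the rest of the model; the 16x gap between the standard table and the outer-product table is exactly what the ablation must control for.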

Phase 2: Thought Scaling

CoCoMix-style thought interleaving at 10-50M params. Measure whether matrix thoughts provide advantages over vector thoughts at publication-grade scale. Approximately 20 H100-hours.

Phase 3: Cross-Domain Learning

Mixed byte data (text + images + code + audio) as raw bytes. Measure transfer coefficients. Test whether matrix rank correlates with cross-domain generalization. Approximately 30 H100-hours.

Phase 4: Publication Benchmarks

Standard benchmarks (WikiText-103, The Pile) with honest comparison against EvaByte, BLT, MBLM, LoopFormer at matched scale. Approximately 100 H100-hours.

Total compute needed: ~160 H100-hours (~$400-800 at cloud rates). Either outcome (matrix helps or doesn't) is publishable and valuable.

Team

SL

Sam Larson

Founder & CEO of Pebble AI (a Nevada S-Corp). Independent ML researcher focused on novel neural architectures. 26 experiments completed on H100 infrastructure with rigorous methodology: hypothesis-driven design, comprehensive ablations, and honest assessment of negative results. Sam is applying to GPU credit programs (NVIDIA Inception, Microsoft for Startups, Google Cloud, Modal, AWS, Lambda) to fund Phases 1-4 of the research.