2026-05-25

The Cartesian Product Bubble

I'm guilty of avoiding AI over the past decades because the techniques I've heard about still seem rather primitive. The last decade in particular has been characterised by VCs throwing obscene amounts of money at a fundamentally inefficient approach.

After noodling around in the current tech for a week or so, just to make sure I understand what's going on, I think it's safe to say that the titular concern of this post stands firm.

An analogy. At the heart of trendy LLMs is a giant Y times Y list of known words in English ( also other languages, but never mind those for now ), forming an enormous 2D table. It's Y times Y because both H and V axes have the same list of words - sure, half the table is redundant, so you can think of it as a triangular half-rectangle of unique pairs. If you look up the junction between any two words, you find a WINDOW to a realm of many, about 10^(3 +/- 1 ), dimensions of information about each pair of words.
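To put rough numbers on the analogy, here is a minimal sketch that counts the windows in the triangular half-table and the values stored behind them. It models only the picture painted above, not any real model's internals. The names `unique_pairs` and `table_entries` are made up for illustration, the vocabulary size of 50,000 is an assumed figure, and pairing a word with itself is counted as a window too.

```python
def unique_pairs(vocab_size: int) -> int:
    """Count unordered word pairs, including a word paired with itself.

    This is the triangular half of the Y x Y table, diagonal included.
    """
    return vocab_size * (vocab_size + 1) // 2


def table_entries(vocab_size: int, dims_per_window: int) -> int:
    """Total stored values: one vector of dims_per_window per window."""
    return unique_pairs(vocab_size) * dims_per_window


if __name__ == "__main__":
    Y = 50_000  # ASSUMPTION: illustrative vocabulary size, not from the post
    for exponent in (2, 3, 4):  # the post's 10^(3 +/- 1) dimensions
        dims = 10 ** exponent
        print(f"Y={Y}, dims=10^{exponent}: {table_entries(Y, dims):,} values")
```

Even at the low end of that range, the count grows with the square of the vocabulary, which is where the "Cartesian product" in the title comes from.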

Based on the "best" available public information, this is the data structure for storing everything the LLM knows about the world - which it knows ONLY FROM READING TEXT ( except for MLLMs, which we'll also ignore for now ). Even in the case of MLLMs, sensory spatio-temporal data can be understood to be stored within the realm accessed by each WINDOW.

Of course, when you ask an LLM a question like "What is the meaning of life?" it doesn't need to peek through ALL the WINDOWS and all the realms above, rather it only has to peek through a subset of windows, W_n. However, it's still fundamentally inefficient, because it doesn't JUST look at W_n and answer your question.

Oh no.

It looks at your question, goes to a set of windows, W_n1, takes a peek, grimaces in deep thought, spits out ONE word, then pats itself on the head and goes to W_n2, a COMPLETELY DIFFERENT set of windows, spits out a SECOND word, and goes on and on until W_nM to obtain M words. While there is a cache, this is exactly what it sounds like - quite a bit of work.
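The word-by-word routine above can be sketched as a toy loop. This stays inside the analogy and is not how any particular model is implemented: `windows_for`, `generate` and `toy_pick_next` are hypothetical names, a "window" is simply an unordered pair of words in the running text, and the cache is deliberately left out so that every step re-visits its whole set of windows.

```python
from itertools import combinations
from typing import Callable


def windows_for(context: list[str]) -> set[tuple[str, str]]:
    """Return the set of windows (unordered word pairs) for this context."""
    return {tuple(sorted(pair)) for pair in combinations(context, 2)}


def generate(
    prompt: list[str],
    pick_next: Callable[[list[str], set[tuple[str, str]]], str],
    m: int,
) -> tuple[list[str], int]:
    """Produce m words, one per pass, and count the window peeks made.

    ASSUMPTION: no cache, so each pass peeks through its full window set.
    """
    context = list(prompt)
    peeks = 0
    for _ in range(m):
        windows = windows_for(context)  # a different set on every pass
        peeks += len(windows)
        context.append(pick_next(context, windows))  # spit out ONE word
    return context[len(prompt):], peeks


def toy_pick_next(context: list[str], windows: set[tuple[str, str]]) -> str:
    """Stand-in for the 'deep thought': just invents a numbered word."""
    return f"w{len(context)}"


if __name__ == "__main__":
    words, peeks = generate(["what", "is", "life"], toy_pick_next, m=3)
    print(words, peeks)
```

In this toy version the window set grows with every word produced, so the number of peeks climbs faster than the number of words in the answer.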

I'm not a very smart fellow, so I'm probably going to be wrong about this. But, it would seem that the thin red circle that I am drawing around the entire architecture that drives most of our current AI tech is going to have to collapse, and be replaced by more efficient methods soon enough.

When? Who knows. I've been waiting since 2003, and "they" haven't figured it out yet. Some of "them" are pretty close, I think. I like the JEPA and MLLM approaches, and am eager to see where they end up.

One thing's for sure - if you create a model of the world based on this Cartesian product approach to windows upon a realm of thousands of dimensions, and have to crawl through the whole library, to peek through a different SET of windows, once per answer word ... even if it gets you the correct answer to a question, it's bloody tiring.

And to BUILD that library of windows to that realm ... it still takes WAY more money than it takes to build a human brain.
