2026-04-04 at

ML / AI jargon revised/learnt today

 ( a light survey this week, after putting it off for a decade or so ) : 

  • - dimension : information-wise differentiated aspect of some quantifiable substrate; grossly synonymous with "rank", "aspect", "degree of freedom", "axis", "characteristic"
  • - N-dimensionality : N refers to the number of different aspects needed to describe a datum; usually N is a Natural number ( 0, 1, 2, etc. ); grossly synonymous with "N-scalar", "N-vector", "N-matrix", "N-tensor" because specifically scalars are 0-rank-tensors, vectors are 1-rank-tensors, matrices are 2-rank-tensors, etc.
  • - feature : a dimension of data
  • - embedding : a set of dimensions used to position a datum
  • - intrinsic vs ambient dimensions : "intrinsic" refers to the fundamentally minimum set of dimensions required to position a datum; "ambient" refers to any ( arbitrarily greater than minimum ) set of dimensions in which a datum is positioned
  • - MI / mechanistic interpretability : the name of the concern, when people are running hot macro operations but don't understand the micro operations, and so loop back to do forensics / epistemology on those micro operations
  • - RAG / retrieval augmented generation : the general pattern of ( AGENT, RESOURCE ) -> RESULT
  • - MCP / model context protocol : in the context of RAG : the name of a particular open-sourced standard protocol, which enables AGENT traversal and manipulation of RESOURCE; AGENTs are on the client-side, RESOURCES are on the server-side
  • - rank vs dimension : in the context of TRANSFORMATIONS : where a function is [ a mapping between sets DOMAIN and RANGE ] : "dimension" refers to [ the N-spaceness of either set ], whereas "rank" refers to [ the dimension of the RANGE, specifically ]; also see the "rank-nullity theorem", where KERNEL a.k.a. NULL SPACE refers to [ the subset of the DOMAIN which maps to "0" in the RANGE ]
  • - SAE / sparse auto encoder : where some internal nodes of a cognitive system may store data redundantly, this refers to forcing-factors ( e.g. a sparsity penalty ) applied to encourage specialisation of memory per node
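The scalar / vector / matrix correspondence in the N-dimensionality bullet above can be sketched in plain Python. This is a toy illustration ( a hypothetical `rank` helper, not a real library function ), assuming rank = nesting depth of a list:

```python
def rank(x):
    """Toy tensor rank: the nesting depth of a list (0 for a bare number)."""
    if isinstance(x, list):
        return (1 + rank(x[0])) if x else 1
    return 0

scalar = 3.0                        # a 0-rank-tensor: zero aspects needed
vector = [1.0, 2.0, 3.0]            # a 1-rank-tensor: one index positions a datum
matrix = [[1.0, 0.0], [0.0, 1.0]]   # a 2-rank-tensor: two indices position a datum

assert rank(scalar) == 0
assert rank(vector) == 1
assert rank(matrix) == 2
```

Each extra level of nesting is one more "axis" / "degree of freedom" in the sense of the dimension bullet.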
A succinct description of the modelling concern, for human experience in general ( point of view of a formalist ) : the main problem with most people's use of natural language is that they overestimate the number of INTRINSIC DIMENSIONS required to describe consciousness, because they live in the mess of AMBIENT DIMENSIONS left behind by the messy evolution of informal languages.
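The intrinsic vs ambient distinction can be made concrete with a standard toy case: points on a circle sitting in 3-D ambient space need only one intrinsic coordinate ( the angle ). A minimal sketch, with a hypothetical `circle_point` helper:

```python
import math

def circle_point(theta):
    """One intrinsic coordinate (theta) positions a datum in 3 ambient dimensions."""
    return (math.cos(theta), math.sin(theta), 0.0)

p = circle_point(math.pi / 2)
assert len(p) == 3               # ambient dimensionality: 3 coordinates stored
assert abs(p[1] - 1.0) < 1e-9    # but theta alone determined the whole point
```

The ambient representation carries three numbers per datum, yet the intrinsic dimensionality of the dataset is 1.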

---

I think you should consider where spatio-sensory data structures fall into embeddings.

Current LLMs are mainly running on embeddings of distances between words, and concepts described in words. All words are intrinsically meaningless, so current LLMs are mainly Chinese Rooms.

When you switch out the underlying data, replacing words with sensory spatial structures, then the computation of distance between two structures can be done via various methods, to varying degrees of logical rigour. 
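One common method for the "computation of distance between two structures" is cosine distance over embedding vectors. A minimal sketch, assuming made-up toy embeddings ( the values for `cat`, `dog`, `car` are illustrative, not from any real model ):

```python
import math

def cosine_distance(u, v):
    """1 minus cosine similarity: 0 for identical direction, 2 for opposite."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

# Toy 3-dimensional embeddings (hypothetical values).
cat = [0.9, 0.1, 0.3]
dog = [0.8, 0.2, 0.4]
car = [0.1, 0.9, 0.0]

# "cat" should sit closer to "dog" than to "car" in this toy space.
assert cosine_distance(cat, dog) < cosine_distance(cat, car)
```

The same distance function applies whether the vectors encode word co-occurrence or sensory spatial structure; what changes is where the numbers come from.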

That is the future we are moving to inevitably. 
