2026-06-17

transformer architecture's non-determinism problem in normative applications


( Admittedly, my verbalisation of this concern is an incomplete study. )

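Broadly, the transformer architecture (T-arch) is popular because it recasts sequential (time series) data as position-tagged embeddings, so that every position in a sequence can be processed in parallel within a single forward pass, rather than step by step, for next-token prediction.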

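1. The parallelism itself can introduce non-deterministic results, depending on the hardware and kernel implementations : floating-point addition is not associative, so a parallel reduction whose summation order varies between runs or devices can return slightly different outputs for the same input. Folks take such shortcuts when imprecision can be afforded.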

2. More broadly, presuming a deterministic implementation : a common inference pattern is [ first input + first output -> next input, loop ], used in scenarios where end-users want to know what series a model fitted to past training data would probabilistically generate, under the assumption "present series behave like past series". This is to use T-arch as a DESCRIPTIVE tool.
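Returning to point 1 : a minimal sketch, in Python, of why summation order matters. It uses ordinary double-precision floats on a CPU as a stand-in ; no GPU kernel is involved, and the function name `reduce_in_order` and the sample values are my own illustrative assumptions, not taken from any library.

```python
# Toy illustration only: plain Python floats (IEEE 754 doubles), no GPU involved.
# Hypothetical sample values chosen so the rounding effect is easy to see.
values = [1e16, 1.0, -1e16]


def reduce_in_order(xs, order):
    """Sum xs by visiting indices in the given order (one possible reduction schedule)."""
    total = 0.0
    for i in order:
        total += xs[i]
    return total


# Same three numbers, two different schedules.
sequential = reduce_in_order(values, [0, 1, 2])  # (1e16 + 1.0) + (-1e16)
reordered = reduce_in_order(values, [0, 2, 1])   # (1e16 + (-1e16)) + 1.0

print(sequential, reordered)  # 0.0 1.0

# The same effect at small scale: grouping changes the rounded result.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False
```

A parallel kernel that does not fix its reduction order is, in effect, free to pick either schedule on a given run.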

However : T-arch is often used in tools expected to be NORMATIVE, enforcing specific governance (control). Because T-arch is fundamentally descriptive, governance internal to the looping inference process cannot be deductively guaranteed for novel inputs. Instead, the entire T-arch loop must be wrapped in an external layer whose rules are checked deductively, something like old-fashioned logic programming, as in proof assistants, and perhaps also including supervised learning architecture.
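A rough sketch of the wrapping idea, in Python. Everything here is hypothetical : `toy_next_token` is a random stand-in for a trained model (not a transformer), the vocabulary is invented, and the single rule ("delete" must never directly follow "prod") stands in for whatever governance is required. The point is only that the rule is checked outside the generator, so it holds by construction whatever the generator proposes.

```python
import random

VOCAB = ["prod", "delete", "read"]   # hypothetical toy vocabulary
REFUSED = "<refused>"                # hypothetical fail-closed marker


def toy_next_token(context, rng):
    """Stand-in for a learned next-token sampler; here just a uniform random pick."""
    return rng.choice(VOCAB)


def generate(prompt, steps, rng):
    """Descriptive use: [ input + output -> next input, loop ], with no checks."""
    seq = list(prompt)
    for _ in range(steps):
        seq.append(toy_next_token(seq, rng))
    return seq


def rule_allows(seq, candidate):
    """Hard rule checked outside the model: 'delete' may not directly follow 'prod'."""
    return not (seq and seq[-1] == "prod" and candidate == "delete")


def violates(seq):
    """True if any adjacent pair in seq breaks the rule."""
    return any(a == "prod" and b == "delete" for a, b in zip(seq, seq[1:]))


def governed_generate(prompt, steps, rng, max_retries=20):
    """Normative use: every proposed token must pass the external rule before it is kept."""
    seq = list(prompt)
    for _ in range(steps):
        for _ in range(max_retries):
            candidate = toy_next_token(seq, rng)
            if rule_allows(seq, candidate):
                seq.append(candidate)
                break
        else:
            # No acceptable token found: stop rather than emit a violation.
            seq.append(REFUSED)
            break
    return seq


if __name__ == "__main__":
    print(generate(["prod"], 8, random.Random(0)))
    print(governed_generate(["prod"], 8, random.Random(0)))
```

The guarantee comes from `rule_allows` alone : it does not depend on how well the generator was trained, only on the rule being stated correctly. A real deployment would need a far richer rule language than one forbidden pair, which is where the logic programming comes in.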

Buyer beware.

Credit for discussions 
- https://lnkd.in/gdAYcMaY
