(Admittedly, this is an incomplete articulation of the concern.)
Broadly, T-arch (transformer architecture) is popular because it encodes the order of a series as positions in an embedding space, so that the whole series can be processed in parallel rather than step by step, for next-token prediction.
1. The parallelism itself can, in some cases, produce non-deterministic results, depending on the hardware and kernel implementations. Floating-point addition is not associative, so the order in which a parallel reduction combines terms can change the result. Implementers take shortcuts where imprecision can be afforded.
2. More broadly, presuming a deterministic implementation: a common inference pattern is the loop [ input + output so far -> next input ], used where end-users want to know how a model trained on past data probabilistically generates series data, under the assumption "present series behave like past series". This is to use T-arch as a DESCRIPTIVE tool.
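The two points above can be sketched in a few lines of Python. This is a toy under stated assumptions, not a real T-arch: `TOY_MODEL`, `next_token` and `generate` are hypothetical names, and the lookup table merely stands in for whatever distribution a trained model would produce.

```python
import random

# Point 1: floating-point addition is not associative, so a parallel
# reduction that groups terms differently can return a different sum.
left_to_right = (0.1 + 0.2) + 0.3
regrouped = 0.1 + (0.2 + 0.3)
print(left_to_right == regrouped)  # False

# Point 2: a hypothetical stand-in for a trained model. It maps the last
# token to a distribution over next tokens, as if learned from past series.
TOY_MODEL = {
    "a": {"b": 0.9, "c": 0.1},
    "b": {"c": 1.0},
    "c": {"a": 0.5, "b": 0.5},
}

def next_token(context, rng):
    """Sample one next token given the series so far."""
    dist = TOY_MODEL[context[-1]]
    tokens = list(dist)
    weights = [dist[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

def generate(prompt, steps, seed=0):
    """Loop: append each output to the input, then predict again."""
    rng = random.Random(seed)  # fixed seed: the deterministic case
    context = list(prompt)
    for _ in range(steps):
        context.append(next_token(context, rng))
    return context
```

With a fixed seed the loop is repeatable, yet everything it emits is only a sample of what the table says past series looked like. Nothing inside the loop states what the series ought to do.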
However, T-arch is often used in tools expected to be NORMATIVE, enforcing specific governance (control). Because T-arch is fundamentally descriptive, governance internal to the looping inference process cannot be deductively guaranteed for novel inputs. Instead, the entire T-arch loop must be wrapped in something like old-fashioned logic programming, as in proof assistants, perhaps combined with a supervised learning architecture.
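A minimal sketch of that wrapping idea, in Python, under loud assumptions: `untrusted_proposer` is a hypothetical stand-in for one T-arch inference step, and `allowed` is a single hand-written rule, far simpler than logic programming or a proof assistant. The point is only where the guarantee lives: in the explicit check outside the model, which holds whatever the proposer emits.

```python
import random

def untrusted_proposer(context, rng):
    # Hypothetical stand-in for one inference step: any token may come out.
    return rng.choice(["a", "b", "c"])

def allowed(context, token):
    # Explicit rule, checked outside the model:
    # "c" must never directly follow "a".
    return not (context and context[-1] == "a" and token == "c")

def governed_generate(prompt, steps, proposer, seed=0,
                      max_retries=5, fallback="b"):
    """Run the inference loop, accepting only tokens that pass the rule.

    Assumes the prompt itself already satisfies the rule, and that the
    fallback token is always allowed (true for "b" under this rule).
    """
    rng = random.Random(seed)
    context = list(prompt)
    for _ in range(steps):
        token = fallback  # used if every proposal is rejected
        for _ in range(max_retries):
            candidate = proposer(context, rng)
            if allowed(context, candidate):
                token = candidate
                break
        context.append(token)
    return context

def violates(series):
    """True if the series contains "a" directly followed by "c"."""
    return any(p == "a" and n == "c" for p, n in zip(series, series[1:]))
```

For example, `governed_generate(["a"], 20, untrusted_proposer)` never contains "a" followed by "c", and the same holds for a proposer that always answers "c". The rule is guaranteed by the wrapper's logic, not by anything the proposer has learned.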
Buyer beware.
Credit for discussions
- https://lnkd.in/gdAYcMaY