Because the overall transformer is trained iteratively, the system will store information somehow in whatever parameters are available, without prejudice. If you give it one param, it will use it; if you give it ten, ditto.
The net architecture is simply a decision about how much memory to allocate to the internal nodes of a multinode NN.
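To make the "memory allocation" view concrete, here is a minimal sketch that counts the trainable parameters of a plain fully connected network. The function name `count_parameters` and the example layer sizes are illustrative assumptions, not taken from any particular library or model; a real transformer has more kinds of parameters than this, but the bookkeeping idea is the same.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a dense feed-forward network.

    layer_sizes is a hypothetical list such as [inputs, hidden, outputs].
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        # One weight per connection, plus one bias per receiving node.
        total += n_in * n_out + n_out
    return total


# Same inputs and outputs; only the internal layer width changes.
narrow = count_parameters([4, 1, 2])   # one internal node
wide = count_parameters([4, 10, 2])    # ten internal nodes

print(narrow, wide)
```

Widening the internal layer changes nothing about the inputs or outputs; it only changes how many parameters training has available to store information in.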