A squad of small language models (SLMs) is absolutely not the same thing as a large language model with a mixture-of-experts (MoE) architecture. People have said this to me before, so now that I have caught up on the jargon, let me comment on it.
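A minimal sketch of the structural difference, in hypothetical toy code (the names and routing logic are mine, purely illustrative): an MoE layer lives inside one model and routes each input to internal experts that share a single training run, while a squad of SLMs is a set of independently trained models behind an external dispatcher.

```python
def expert_math(x):      # stand-in for one internal expert / one SLM
    return f"math({x})"

def expert_prose(x):
    return f"prose({x})"

class MoELayer:
    """One model: a learned gate picks among co-trained internal experts."""
    def __init__(self, experts, gate):
        self.experts = experts
        self.gate = gate            # in a real MoE, trained jointly with experts

    def forward(self, x):
        return self.experts[self.gate(x)](x)

class SLMSquad:
    """Separate models: an external dispatcher picks a whole model."""
    def __init__(self, models, dispatch):
        self.models = models
        self.dispatch = dispatch    # dispatch logic lives outside the models

    def forward(self, x):
        return self.models[self.dispatch(x)](x)

route = lambda x: 0 if x.isdigit() else 1   # toy router: digits -> expert 0
moe = MoELayer([expert_math, expert_prose], route)
squad = SLMSquad([expert_math, expert_prose], route)
```

Both objects behave identically here, which is the point: observable behaviour can match, but the MoE's gate and experts are entangled in one training run, while the squad's models and dispatcher can be built, audited, and replaced independently.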
The main difference is the choice between supervised and unsupervised learning. Meat brains are (probably) assembled via a relatively unsupervised process, with some guidance from whatever our early-childhood genes are doing at this point in history. Building AI on the same messy foundation is completely backasswards when we already have 20th-century computer technology. The verbal capabilities of humans are a ridiculously thin layer of architecture sitting on top of everything that evolved before it. Once you organically develop foundations such as set comprehension, and therefore logic, you can build verbal coherence on top of that with little dependence on the messy implementation underneath.
Unsupervised training of foundation models basically treats every foundation model as a bunch of neurons in a petri dish that need to re-evolve the capability for logic. Even then, unless strict rules are applied, the LLM doesn't enforce logic, for much the same reasons that humans often fail to. Human training, and most of what we call culture and civilisation, is built on verbal governance that is in most cases instilled via what would be called supervised learning if it were emulated in AI.
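The supervised/unsupervised contrast above can be made concrete with a hypothetical toy example (the data and variable names are mine, not from any real training setup): a supervised learner is handed the rule's outputs directly, while an unsupervised learner must rediscover structure from raw data and still cannot name what it found.

```python
# Supervised: labelled pairs pin down the rule (here, a threshold) immediately.
labelled = [(1, "low"), (2, "low"), (8, "high"), (9, "high")]
threshold = (max(x for x, y in labelled if y == "low")
             + min(x for x, y in labelled if y == "high")) / 2

# Unsupervised: the same points with no labels. The learner can only
# notice that the data falls into two clumps (a one-pass 2-means sketch,
# seeded with the min and max as initial centres).
raw = [1, 2, 8, 9]
lo, hi = min(raw), max(raw)
clumps = {lo: [], hi: []}
for x in raw:
    clumps[lo if abs(x - lo) <= abs(x - hi) else hi].append(x)
# The clumps recover the grouping, but nothing names them "low" or "high":
# the meaning of the structure still has to be supplied from outside.
```

The last comment is the crux: unsupervised training can recover structure, but attaching agreed-upon meaning to that structure is exactly the verbal governance the post describes, and it arrives from outside the petri dish.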
Eventually, we will stop building AI this way, for the same reason that we do not reinvent materials science for every factory and every car we build. Then things will get a lot cheaper.