I could definitely see it! I would think voice commands would be more for the musician side of it, such as "start", "stop", "cut", "redo", "alternate", stuff like that. You don't really need tensors for that. But yeah, once they have a question like "how do I...?", you can layer in some of the latest DeepSeek-style chain-of-thought stuff and probably get some actually usable results with it.
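To make the "no tensors needed" point concrete, here's a rough sketch of what I mean, assuming speech has already been turned into text by something else. The names (`COMMANDS`, `route`) are made up for illustration and aren't from any real DAW:

```python
# Hypothetical sketch: fixed transport commands are a plain set lookup;
# anything else gets handed off as a free-form question.
COMMANDS = {"start", "stop", "cut", "redo", "alternate"}


def route(transcript: str) -> tuple[str, str]:
    """Return ("command", word) for a known keyword, else ("question", text)."""
    text = transcript.strip()
    # Normalise case and drop trailing punctuation so "Stop!" matches "stop".
    word = text.lower().rstrip(".!?")
    if word in COMMANDS:
        return ("command", word)
    # Not a known keyword: this is where a language model could be layered in.
    return ("question", text)
```

So `route("Stop!")` comes back as a command, while `route("How do I add reverb?")` falls through to the question path, and only that path would need anything heavier.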
Still though, all of that is a layer AFTER that initial barrier to entry.
Even this is still a problem, because it's unlikely they even know what question to ask. Or if a sensible question is asked, it may be an XY problem, where what is asked is not what is really intended.
Having thought about this for the last few minutes, it does seem inevitable that the software would have to start coaching the musician in the ways of engineering and of "music software" people, so that the inputs become more accurate and better aligned with the outcomes the software is capable of providing.
I think everyone would crave becoming more productive in the environment over time and not have to suffer the initial baby steps forever.
It's very difficult to imagine a DAW environment which exposes deeper functionality without ending up looking like a lot of the existing packages.
Edit: and one final thought - it's a hard environment to build because the work being done is by nature a creative process, with no correct answers, and it needs to support a multitude of different approaches to creativity. That's pretty much opposed to software generally being a machine with a fixed number of functions.