I feel like VRChat not going deep enough into technical details (postmortems, as suggested above) is hurting its credibility with the tech-oriented crowd here quite a bit.
The OG promise of Udon 2, as it could be read between the lines, was to use two industry-standard technologies (WASM and some sort of CIL AOT compilation), both of which can provide excellent performance and interoperability. With WASM, one could use any programming language given enough community-powered elbow grease, and an off-the-shelf CLR would mean VRChat could focus on the truly important things (more VRC+ gated features!).
Further discussion did reveal, as I understood it, that WASM was used as a sandbox for the Mono interpreter: worse for performance, and not really delivering on the universality angle of WASM. I can see how this specific approach would run into some of the problems mentioned in this and previous posts. They’re not unsolvable, but I’d argue against the general direction of using yet another interpreter anyway.
The ideal outcome for the tech-savvy here would be the reinstatement of WASM-based world scripting, with C# AOT-compiled into WASM modules that are loaded and JIT-compiled directly in the client, without any interpreter in between. The one exception would be the inevitable WASM interpreter on iDevices (a common Apple L), but Soba has no hope of a JIT on iStuff either. Speaking of Apple, capping everyone’s performance at that of the worst platform sounds odd. We don’t get Quest’s hard 20k poly limit and “no Very Poors” rule on PC, so why apply the same logic to scripting performance?
What we get instead is Soba, an Udon 1.5 of sorts: seemingly an expansion of the original Udon VM concept, which boils down to a basic bytecode interpreter. VRC now faces the monumental task of reimplementing the CLR. The entire Mono team tried that once; we all know how well that is going, and how long it took to reach its current capabilities.
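To make “basic bytecode interpreter” concrete: such a VM is essentially a fetch-decode-execute loop that pays dispatch overhead on every single instruction, which a JIT avoids by translating the bytecode into native code up front. Below is a minimal sketch in Python using a made-up four-opcode stack machine; the opcodes and the `interpret` function are hypothetical and say nothing about how Udon or Soba are actually implemented, they only illustrate the general technique.

```python
from typing import List, Tuple

# Hypothetical toy opcodes, NOT Udon's or Soba's real instruction set.
PUSH, ADD, MUL, HALT = range(4)

Instr = Tuple[int, int]  # (opcode, operand); the operand is only used by PUSH


def interpret(program: List[Instr]) -> Tuple[int, int]:
    """Run a toy stack program; return (top of stack, instructions dispatched)."""
    stack: List[int] = []
    pc = 0
    dispatched = 0
    while True:
        op, arg = program[pc]  # fetch
        pc += 1
        dispatched += 1
        # Decode + execute: this branch chain runs for every instruction,
        # which is the per-instruction overhead a JIT compiles away.
        if op == PUSH:
            stack.append(arg)
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == HALT:
            return stack[-1], dispatched
        else:
            raise ValueError(f"unknown opcode {op}")


# (2 + 3) * 4
prog = [(PUSH, 2), (PUSH, 3), (ADD, 0), (PUSH, 4), (MUL, 0), (HALT, 0)]
print(interpret(prog))  # (20, 6)
```

Even this trivial expression costs six trips through the dispatch loop; real interpreters optimize dispatch, but the overhead never fully disappears the way it does with native code.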
Sure, the initial scope of Soba is way smaller. Sure, we’ll get some very basic CLR features like generics soon™. Sure, the outer format is a CIL assembly, so the Soba interpreter could be swapped for something faster later on. But let me ask this: if I take an off-the-shelf C# library, how long would it take, on average, before Soba can run it? And if at some point it manages to run some horribly outdated net35 build of it, how fast would that be? If I transpile WASM bytecode into the subset of CIL that Soba supports, how big would the performance drop be compared to a proper solution with a WASM JIT?
There is a huge chicken-and-egg problem with Udon and what people do with it: everything we see is relatively simple because Udon doesn’t provide the performance for complex stuff, so there is little visible demand for more. The promised “up to 10x” speedup of Soba would barely get us into “current worlds don’t tank your framerate as hard” territory. Take a look at what creators can achieve with GPU shaders; that’s the benchmark for what should be possible and performant in “Udon Next”, not the current compute-restricted use cases. Oh, and the painful networking (what’s an RPC with arguments?) doesn’t help at all either.
Speaking of WASM: slim modules built from a native language, or from a restricted subset of C#, running on a JIT runtime with gas metering (each executed instruction draws from a fixed budget, and a script that exhausts it is halted) would open the door to proper, safe and performant avatar scripting, instead of having to build and run abysmal spaghetti-mess animators that tank your performance even on distance-hidden avatars. You could have built the tech stack once and used it everywhere. Somehow I doubt we’ll see any kind of proper avatar scripting anytime soon, partly because of the lackluster performance of existing and upcoming scripting solutions.
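Gas metering is what makes running untrusted avatar code safe: the host hands each script a budget, every instruction costs some of it, and a script that runs out is stopped instead of freezing the client. A minimal sketch in Python follows; the opcodes, the `GAS_COST` table, `OutOfGas` and `run_metered` are all hypothetical illustrations, not any real runtime’s API. Production WASM runtimes typically inject the counter into the compiled code itself (for example once per basic block), so metering still works under a JIT; this sketch checks per instruction only for simplicity.

```python
from typing import Dict, List, Tuple

# Hypothetical per-opcode gas costs; real runtimes pick their own cost tables.
GAS_COST: Dict[str, int] = {"push": 1, "add": 1, "jmp": 2, "halt": 0}


class OutOfGas(Exception):
    """Raised when a script exhausts its budget, so the host stays responsive."""


def run_metered(program: List[Tuple[str, int]], gas: int) -> Tuple[int, int]:
    """Run a toy stack program under a gas budget; return (result, gas left)."""
    stack: List[int] = []
    pc = 0
    while True:
        op, arg = program[pc]
        cost = GAS_COST[op]
        if cost > gas:  # charge before executing, so a script never overdraws
            raise OutOfGas(f"gas exhausted at pc={pc}")
        gas -= cost
        pc += 1
        if op == "push":
            stack.append(arg)
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "jmp":
            pc = arg  # unconditional jump to an absolute instruction index
        elif op == "halt":
            return stack[-1], gas
        else:
            raise ValueError(f"unknown opcode {op}")


print(run_metered([("push", 2), ("push", 3), ("add", 0), ("halt", 0)], gas=10))  # (5, 7)
try:
    # An infinite counting loop: without metering this would hang the host.
    run_metered([("push", 0), ("push", 1), ("add", 0), ("jmp", 1)], gas=100)
except OutOfGas as err:
    print("halted:", err)
```

The design choice that matters is that the budget is enforced by the runtime, not by the script author, which is exactly what animator-based “scripting” on avatars cannot offer today.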
All in all, I wonder whether VRChat ever considered a WASM-first scripting approach (no interpreters!), and if so, why it was not chosen (Apple is not a valid argument here). It looks like you’ll end up with two scripting runtimes anyway. Why build more of the same, similarly mediocre thing with only some aspects improved, when you could build practically the best thing possible instead?