Why Udon VM and not compiling to CIL?

I’ve glanced at Udon Assembly and at projects such as UdonSharp, and I have some experience working with them. However, one question has bugged me ever since I started:

Why Udon VM at all? Why not simply compile Udon graphs to CIL, do the security checks at the compiler level, and let the resulting bytecode be run by Mono directly?

I’ve played VRChat for some time now, and one thing that has always bothered me is that even a world that’s just a completely empty platform with nothing else has stutters and lag spikes. I obviously can’t know the exact reason, but since my hardware is fairly high-end, my educated guess is that it might have to do with Udon.

Sure, if all the security checks were done at the compiler level, the compiler would need to be vastly more complex: it would have to do a lot more analysis and prevent any potential abuse. But is a complex compiler that produces highly optimized code really that bad of a tradeoff? For security, the compilation could take place at avatar/world load time instead of at publish time, which would eliminate any possibility of injecting malicious CIL into the result.

So I just wanted to know the reason why the VRChat team decided to go this route, as it seems fairly odd to me.

If all you cared about was external security, like accessing the internet, then yes, a compiler check would be sufficient. Just don’t run the program if the code contains an illegal function.

However, the problem is that VRChat also cares about internal security. Objects like player avatars and the VRChat UI must be protected to prevent worlds from messing with them. No amount of compiler checks can stop that, because you can get references to those things with a simple raycast. You need runtime checks, and that is one of the things Udon VM does.
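To illustrate the difference between the two kinds of check, here is a minimal Python sketch. Everything in it is hypothetical (the `GameObject` class, the `protected` flag, the call names, and the tiny interpreter); it is not how Udon VM is actually implemented. It only shows why a program can pass a static check for forbidden calls and still need a check at run time, once the reference it obtained is known.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GameObject:
    name: str
    protected: bool = False  # hypothetical flag: True for avatars, UI, etc.


FORBIDDEN_CALLS = {"open_url"}  # hypothetical "external" call names


def static_check(program):
    """Compile-time style check: reject programs naming a forbidden call."""
    return all(call not in FORBIDDEN_CALLS for call, _ in program)


def raycast(scene):
    """Stand-in for a physics raycast: returns whatever object is hit first."""
    return scene[0]


def run(program, scene):
    """Tiny interpreter that checks the object reference when it is used."""
    log = []
    target = None
    for call, _ in program:
        if call == "raycast":
            target = raycast(scene)  # the result is only known at run time
        elif call == "destroy":
            if target is None or target.protected:
                log.append("blocked")
            else:
                log.append(f"destroyed {target.name}")
    return log


scene = [GameObject("PlayerAvatar", protected=True), GameObject("Crate")]
program = [("raycast", None), ("destroy", None)]

print(static_check(program))  # the program contains no forbidden call
print(run(program, scene))    # the runtime check still decides what happens
```

The static check has nothing to object to, because `raycast` and `destroy` are both legitimate calls; whether the combination is harmful depends on what the ray happens to hit.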

Thanks for the answer! It does clarify some things, however now I have another observation I think I can make. Also, I’m sorry if I’m saying something that’s obviously wrong, I am not nearly as experienced in Unity or C# as I probably should be.

What if such dangerous methods were wrapped by the compiler in some sort of security checking function? Surely the compiler could check if the return type of an expression is something dangerous, and simply wrap the return value of said expression in a function call that either returns the resulting reference or a reference to nothing if it considers the reference dangerous in any way.
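A rough Python sketch of the wrapping idea described above, under stated assumptions: the `tag` attribute, the `PROTECTED_TAGS` set, and `find_object` are all made up for illustration, and a decorator stands in for the wrapper a compiler would insert around each call.

```python
import functools

PROTECTED_TAGS = {"avatar", "ui"}  # hypothetical categories of protected objects


class Obj:
    def __init__(self, name, tag):
        self.name = name
        self.tag = tag


def is_dangerous(obj):
    """Hypothetical policy: a reference is dangerous if its tag is protected."""
    return getattr(obj, "tag", None) in PROTECTED_TAGS


def guarded(func):
    """Wrap a function so that dangerous return values are replaced by None."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        return None if is_dangerous(result) else result
    return wrapper


@guarded
def find_object(scene, name):
    """Return the first object with the given name, or None if absent."""
    for obj in scene:
        if obj.name == name:
            return obj
    return None


scene = [Obj("Menu", "ui"), Obj("Crate", "prop")]
print(find_object(scene, "Menu"))        # protected, so the caller gets None
print(find_object(scene, "Crate").name)  # ordinary object passes through
```

The protection only exists if the wrapper is actually applied, so it is only as trustworthy as whatever inserts it.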

That could also be a viable strategy if you trusted the compiler. But the compiler runs on the world uploader’s machine, so it cannot be trusted. It could easily be modified, or simply be an outdated version that does not have the right checks.

Now, the obvious counterpoint is that if there are dangerous methods like that, you could just blacklist them and create a security-checked copy of each function in some DLL. That… could actually work. The problem is that it would be playing whack-a-mole with hundreds of random functions: if you miss anything, you’ll have to fix it later, and fixing it later means breaking old worlds that used that function legitimately.

But what if the problem is approached from the other end?

Firstly, let the code compilation always run on the client machine, in-game, when the avatar or world loads; the world/avatar would have the code attached to it in source form. Sure, it would cause temporary lag spikes as things compile, and it could be abused by loading a particularly huge project. However, there could be a hard limit on how many code files a world/avatar may contain (perhaps even integrated into VRChat’s avatar performance metric system), or the files could be compiled gradually rather than all at once (while the avatar remains in its “processing” state).

The editor would sanity-check the code and try to compile it prior to submission in order to ensure it would actually compile when in-game, and even if this were to be bypassed, avatars with bad code could simply display their fallback or the default avatar, while worlds with bad code would simply refuse to load and redirect the player back to their home world or the default VRChat home world.
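As a rough illustration of the load-time flow in this proposal, here is a Python sketch. The file limit, the function name, and the return values are all invented for the example, and Python’s built-in `compile()` merely stands in for whatever compiler the client would ship.

```python
MAX_SOURCE_FILES = 8  # hypothetical hard limit on attached code files


def load_content(sources, max_files=MAX_SOURCE_FILES):
    """Return 'loaded' if every source file compiles, otherwise 'fallback'.

    `sources` maps a file name to its source text.
    """
    if len(sources) > max_files:
        return "fallback"  # too much code: refuse instead of stalling the client
    for filename, text in sources.items():
        try:
            compile(text, filename, "exec")  # compiles only, runs nothing
        except (SyntaxError, ValueError):
            return "fallback"  # bad code: show the fallback avatar / home world
    return "loaded"


print(load_content({"a.py": "x = 1"}))
print(load_content({"a.py": "x = ("}))
```

Note that this only covers the "does it compile" part; it says nothing about whether the compiled code is safe to run.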

Secondly, as I saw in the Udon graph editor, you (the VRChat developers) have complete control over which functions are whitelisted and allowed to be called from Udon. A whitelist-based system is much harder to bypass through a developer oversight, because anything nobody has reviewed is denied by default, and things can always be added to it later if there is demand for them.
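A minimal sketch of the whitelist idea in Python, with made-up call names (none of them are real Udon or Unity identifiers). Anything not explicitly listed is rejected, and extending the list later does not invalidate programs that were already accepted.

```python
# Hypothetical call names, chosen only for the example.
ALLOWED_CALLS = frozenset({"math.sin", "transform.move"})


def rejected_calls(calls, allowed=ALLOWED_CALLS):
    """Return the calls that are not whitelisted (an empty list means OK)."""
    return [call for call in calls if call not in allowed]


old_program = ["math.sin", "transform.move"]
new_program = ["math.sin", "audio.play"]

print(rejected_calls(old_program))
print(rejected_calls(new_program))

# Later, "audio.play" is reviewed and added; old programs are unaffected.
extended = ALLOWED_CALLS | {"audio.play"}
print(rejected_calls(new_program, extended))
print(rejected_calls(old_program, extended))
```

Compare this with a blacklist, where a forgotten entry is allowed by default and fixing it later can break content that relied on it.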

A solution very similar to this is employed in Garry’s Mod, and seems to work fairly well.

Technically speaking, VRChat is compiled with IL2CPP, so it’s not straightforward to load user CIL: IL2CPP doesn’t allow that at all, and everything needs to be compiled ahead of time.

Security-wise, as said before, you’d need a big layer of complexity that would be subject to exploits of its own. Mono also abandoned the goal of running untrusted code a decade ago and never really finished its work towards that goal in the first place. Judging by some benchmarks, I believe Unity has also stripped some of the obsolete security measures out of its copy of Mono in favor of performance.

Compiling user code on the client side can be a security risk in itself, since the compiler can have bugs that lead to exploits.

In general, computer security when it comes to running untrusted code is an act of mitigating risk. There is no way to know for certain that there aren’t any exploit vectors, and it is pretty likely that there are a number of them when you have codebases as large as .NET runtimes. In this case, VRChat decided that it was too risky to use a deprecated security model that Mono itself tells you not to use.