Anyone already working on a custom compiler?

I guess I’m not the only one who can’t stand blocks for code, especially when trying to create anything bigger, so I’m asking whether someone is already working on that, and whether there is already a group or an open source project for this.

I’m mostly thinking about a simple typed syntax, maybe similar to C#, since probably everyone who uses Unity for more than just VRChat knows C#. It would also be the best language to write the compiler itself in, so that it works in Unity without any issues.

Though for much better development we would probably need
https://feedback.vrchat.com/vrchat-udon-closed-alpha-feedback/p/assembler-request-allow-arbitrary-variables and
https://feedback.vrchat.com/vrchat-udon-closed-alpha-feedback/p/define-own-structuresclasses-assembly-only
And collections like lists etc., but I can’t link 3 posts.

Here! I’m trying to build something like OCaml or F# though.

I think we also need this! https://feedback.vrchat.com/vrchat-udon-closed-alpha-feedback/p/request-metadata-of-built-in-externs-for-static-analysis

How are the names of functions/externs related to the actual types they expect? I kind of expected this to be a perfect match.
Also a pretty interesting direction… I was thinking about a very limited and crude transpiler from CIL (compiled C#) to Udon assembly. It’s a bit hard because CIL is complex, but maybe it would be possible to keep some existing tools, and if the VRChat team added more features, maybe you could just copy some Unity snippets into your Udon scripts and they would just work.

We are discussing this in #udon-general on the Discord; the name seems to follow the convention func-name__inputtype1_inputtype2__outputtype1_outputtype2, but we’ve found something weird. GameObject.Find (static method) and Transform.Find (instance method) have same-ish names: Find__SystemString__UnityEngineGameObject and Find__SystemString__UnityEngineTransform

I went that way because I love writing parsers and implementing type inference and bytecode compilers :slight_smile:

I’ve seen examples where an output (out param) was in the middle of the signature (outputs are postfixed with Ref).
It was like SomeClass.SomeMethod__Param1_Param2_Param3Ref_Param4__Param5, with a corresponding C#-like signature:
static Param5 SomeClass.SomeMethod(Param1 param1, Param2 param2, out Param3 param3, Param4 param4).
And the input/output slots of the method’s node were like this: in: [ Param1 “param1”, Param2 “param2”, Param4 “param4” ], outs: [ Param3Ref (corresponding .NET type = Param3) “param3”, Param5 null ].

As for the same-ish signatures: they belong to different types (GameObject/Transform), so there is no ambiguity (full method signatures are always unique, they work like IDs). As for deciding whether a method is static or not: I consider a method to be an instance method if its 1st parameter (the node’s input slot) is of the method’s owner type and is called “instance”. It looks like this instance parameter is not listed in method signatures, so it’s important to know whether a method is static in order to push the parameters properly before the external call.
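
The static/instance heuristic from this post could be sketched roughly like this in Python. The slot lists are hypothetical (the real ones would come from the node registry), and the slot name “name” is an assumption made only for illustration:

```python
def is_instance_method(owner_type, input_slots):
    """Heuristic: a method is an instance method if its first input slot
    has the owner's type and is named "instance".

    input_slots is a list of (type_name, slot_name) pairs in node order.
    """
    if not input_slots:
        return False
    first_type, first_name = input_slots[0]
    return first_type == owner_type and first_name == "instance"


# Hypothetical slot lists for the two Find methods discussed above.
transform_find = [("UnityEngineTransform", "instance"), ("SystemString", "name")]
gameobject_find = [("SystemString", "name")]
```

With these inputs, Transform.Find would be classified as an instance method and GameObject.Find as static.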

I also found some strange stuff though. There were types in various methods and in one of the events which don’t correspond to any Type_ / Variable_ / Const_ graphs, so I can only find their “assembly” names by parsing signatures. For example List<UnityEngine.Object>.
For now I’m thinking of this rule for getting a method’s parameter type names: a signature like ParamA_ParamA_ParamB_ParamBRef_ParamA_ParamARef_ParamARef__ParamA would lead to inputs: [ ParamA, ParamA, ParamB, ParamA ] and outputs: [ ParamBRef, ParamARef, ParamARef, ParamA ] (the last one is always the return value, if it’s not void).
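
A minimal Python sketch of that rule, assuming the name__params__return layout mentioned earlier in the thread. The function name is made up, void return types are not handled (I don’t know how they are spelled in signatures), and a type whose real name happens to end in Ref or contain an underscore would be misread:

```python
def parse_signature(signature):
    """Split 'Name__P1_P2Ref__Ret' into (name, inputs, outputs).

    Parameters ending in 'Ref' are treated as outputs; the return type
    is appended as the last output.
    """
    parts = signature.split("__")
    if len(parts) != 3:
        raise ValueError(f"expected name__params__return, got {signature!r}")
    name, param_part, return_type = parts

    inputs, outputs = [], []
    for type_name in filter(None, param_part.split("_")):
        if type_name.endswith("Ref"):
            outputs.append(type_name)
        else:
            inputs.append(type_name)
    outputs.append(return_type)
    return name, inputs, outputs
```

Feeding it SomeClass.SomeMethod__Param1_Param2_Param3Ref_Param4__Param5 gives inputs Param1, Param2, Param4 and outputs Param3Ref, Param5, which matches the node slots described above.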

I’m still not sure which order is right for passing params to a method: the one from the signature or the one from the method node’s input/output definitions. Going to investigate the graph’s code for that later…

About the request for the ability to push integers as literals onto the stack: it looks like it can be done by just creating new variables for that, with null as the initial value in the assembly and the actual initial value passed as a runtime .NET type instance along with the assembly code as part of IUdonProgram (look at the graph’s program source code to see how they do that). Not being able to initialize literals in assembly code looks like a dirty hack, but it works this way for now at least.

PS: I get all graph data from the GetRegistries (or something like that) methods in a .dll which the graph editor uses to get all the data for drawing nodes.

I’m also trying to make a custom compiler. I would love to use C#, but I’m not sure it could be implemented with full support, and some reflection-related stuff looks like it would take a huge amount of “heap” memory to work, because it’s a runtime thing. And there is also custom stuff like synced variables. That could maybe be done with custom C# attributes.
For now I’m thinking about a compiler for a small C#/C-like language (no OOP, only functions/variables) based on ANTLR (a parser generator library). I think I know how to do most of what’s needed for simple programs in it, and it should be much easier than the graph, for coders at least. Not sure I have time for this though :frowning:

There is this event “Event_OnControllerColliderHit”: it has a parameter “hit”: UnityEngine.ControllerColliderHit, and there is no corresponding “assembly” name for it, neither in the const_/type_/variable_ graphs nor in method/event signatures. So I dug a little into the decompiled assemblies used by the Udon graph tool and found out that these “assembly” type names are just generated on the fly from the .NET type. There is no predefined dictionary of names. The code is in VRC.Udon.Compiler.dll -> VRC.Udon.Compiler.Compilers.UdonGraphCompiler.BuildAssemblyString():
codeBuilder.AppendLine(dataObjectContainer.identifier + ": %" + UdonGraphCompiler.ConstTypeNameFromType(dataObjectContainer.type).Replace("[]", "Array") + ", " + dataObjectContainer.value);
This is the code for setting up a new heap variable.
private static string ConstTypeNameFromType(Type t) { if (t == (Type) null) t = typeof (object); return t.FullName != null ? t.FullName.Replace(".", "").Replace("+", "") : "null"; }
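
For anyone not reading C#, here is a rough Python port of the two snippets above, just as a sketch. It takes the .NET full name as a plain string instead of a Type, so the branch where FullName itself is null is left out, and both function names are mine:

```python
def const_type_name(full_name):
    """Mimic ConstTypeNameFromType: strip '.' and '+' from the full name.

    None stands in for a missing type, which the original maps to
    typeof(object), i.e. System.Object.
    """
    if full_name is None:
        full_name = "System.Object"
    return full_name.replace(".", "").replace("+", "")


def heap_declaration(identifier, full_name, value="null"):
    """Build one heap variable line the way BuildAssemblyString does."""
    type_name = const_type_name(full_name).replace("[]", "Array")
    return f"{identifier}: %{type_name}, {value}"
```

So the “hit” parameter of UnityEngine.ControllerColliderHit would be declared as hit: %UnityEngineControllerColliderHit, null.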

Just found out how to compute jump addresses (it looks like this is not documented anywhere):
Every instruction starting from .code_start has its own address and a certain length:
The 1st instruction of the 1st declared event has address 0x000000 ( = 0), and the 1st instruction of the 2nd declared event has an address equal to the sum of the lengths of all the 1st event’s instructions.

Lengths of instructions are:

ANNOTATION: 5
COPY: 1
EXTERN: 5
JUMP: 5
JUMP_IF_FALSE: 5
JUMP_INDIRECT: 5
NOP: 1
POP: 1
PUSH: 5

Except for ANNOTATION, it looks like these are the resulting bytecode lengths: 1 byte for the opcode plus 4 bytes for the parameter, if there is one (an integer used as an address pointer).

Found these values from decompiled VRC.Udon.Compiler.dll -> VRC.Udon.Compiler.Compilers.UdonGraphCompiler.InstructionsToByteCount

Note that addresses in the assembly language are written in hex.
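
The address calculation described above can be sketched in Python like this. The sizes are the ones listed in this post; the event names in the usage note and the 8-digit hex width are assumptions, not something taken from the compiler:

```python
# Instruction lengths as listed above (from InstructionsToByteCount).
INSTRUCTION_SIZES = {
    "ANNOTATION": 5,
    "COPY": 1,
    "EXTERN": 5,
    "JUMP": 5,
    "JUMP_IF_FALSE": 5,
    "JUMP_INDIRECT": 5,
    "NOP": 1,
    "POP": 1,
    "PUSH": 5,
}


def event_start_addresses(events):
    """Map each event name to the address of its first instruction.

    events is an ordered list of (event_name, [opcode, ...]) pairs,
    in the order they are declared after .code_start.
    """
    addresses = {}
    offset = 0
    for name, opcodes in events:
        addresses[name] = offset
        offset += sum(INSTRUCTION_SIZES[op] for op in opcodes)
    return addresses


def as_hex(address):
    """Format an address as hex; the 8-digit width is an assumption."""
    return f"0x{address:08X}"
```

For example, if a first event consists of PUSH, PUSH, EXTERN, JUMP, it takes 5 + 5 + 5 + 5 = 20 bytes, so the second event starts at address 20, which is 14 in hex.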