Hello all. I don’t know if this is intended or not (I really hope it isn’t), but it seems that all strings synced between players are re-encoded as UTF-16.
In this test environment:
- The string is generated from the range `U+D7FF` to `U+FFFF`.
- The output is the generated string from above, converted to a raw byte array, with each byte written as a hexadecimal number.
Expected: Exact same output between players
Issue: Characters in the range U+D7FF to U+DFFF are replaced with U+FFFD.
The official creator documentation states that char, string, and VRCUrl all have a range of U+0000 to U+FFFF.
But it seems that a hidden internal serializer forcibly re-encodes the string and tries to turn it into valid UTF-16.
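For comparison, plain .NET shows the same symptom whenever a string containing unpaired surrogates (code units in U+D800 to U+DFFF without a valid partner) passes through a text encoder. The sketch below is only an illustration of that behaviour, not VRChat's serializer: which encoding VRChat uses internally is unknown, UTF-8 is picked here as an assumption, and `LoneSurrogateDemo`, `ToHex`, and `RoundTrip` are made-up names.

```csharp
using System;
using System.Text;

public static class LoneSurrogateDemo
{
    // Formats each UTF-16 code unit of a string as four hex digits.
    public static string ToHex(string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s)
        {
            if (sb.Length > 0) sb.Append(' ');
            sb.Append(((int)c).ToString("X4"));
        }
        return sb.ToString();
    }

    // Encodes the string to bytes and decodes it again, which is roughly
    // what a text-based serializer would do (assumption, see lead-in).
    public static string RoundTrip(string s, Encoding encoding)
    {
        return encoding.GetString(encoding.GetBytes(s));
    }

    public static void Main()
    {
        // U+D7FF and U+E000 are ordinary characters.
        // U+DC00 and U+D800 in this order are both unpaired surrogates.
        string input = "\uD7FF\uDC00\uD800\uE000";

        Console.WriteLine(ToHex(input));
        Console.WriteLine(ToHex(RoundTrip(input, Encoding.UTF8)));
    }
}
```

With the default encoder fallback, the unpaired surrogates come back as U+FFFD while the neighbouring characters survive, which matches the shape of the output seen between players.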
You might ask: why not use `byte[]` or `char[]` for this kind of thing?
The reasons I’m not working with any kind of array are:
- To compare two `byte[]` values, I have to write a comparison function myself, and I do not trust Udon to run it fast enough.
- When an array is synced, `FieldChangeCallback` is not called unless the array itself has changed (for example, its length).
Working with strings is much better:
- I can use C#’s built-in `string.Equals()` to compare two data containers without paying Udon’s interpretation overhead.
- I can pack 2 bytes into one character, which is more efficient.
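The packing idea can be sketched roughly as follows. This is plain C# that has not been checked against UdonSharp's supported API, and `BytePacking`, `Pack`, `Unpack`, and `HasSurrogate` are hypothetical names. It also shows where the scheme runs into the problem: any byte pair whose first byte is 0xD8 to 0xDF becomes a surrogate code unit.

```csharp
using System;
using System.Text;

public static class BytePacking
{
    // Packs pairs of bytes into UTF-16 code units, high byte first.
    // An odd trailing byte is padded with 0x00.
    public static string Pack(byte[] data)
    {
        var sb = new StringBuilder((data.Length + 1) / 2);
        for (int i = 0; i < data.Length; i += 2)
        {
            int hi = data[i];
            int lo = i + 1 < data.Length ? data[i + 1] : 0;
            sb.Append((char)((hi << 8) | lo));
        }
        return sb.ToString();
    }

    // Reverses Pack; always returns an even number of bytes.
    public static byte[] Unpack(string packed)
    {
        var data = new byte[packed.Length * 2];
        for (int i = 0; i < packed.Length; i++)
        {
            data[2 * i] = (byte)(packed[i] >> 8);
            data[2 * i + 1] = (byte)(packed[i] & 0xFF);
        }
        return data;
    }

    // True if any packed code unit falls in U+D800..U+DFFF.
    public static bool HasSurrogate(string packed)
    {
        foreach (char c in packed)
        {
            if (char.IsSurrogate(c)) return true;
        }
        return false;
    }

    public static void Main()
    {
        byte[] safe = { 0x12, 0x34, 0xAB, 0xCD };
        byte[] risky = { 0xD8, 0x00, 0x12, 0x34 };

        Console.WriteLine(HasSurrogate(Pack(safe)));   // False
        Console.WriteLine(HasSurrogate(Pack(risky)));  // True

        // Comparison is a single built-in call on the packed strings.
        Console.WriteLine(string.Equals(Pack(safe), Pack(Unpack(Pack(safe)))));  // True
    }
}
```

Since 8 of the 256 possible high-byte values land in the surrogate range, about 3% of random byte pairs would be affected if the serializer rejects unpaired surrogates.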
I don’t know what VRChat (or Udon) is doing here, but I see a few potential fixes:
- Treat the string data type just like `byte[]`.
- Add an attribute for `UdonSynced` variables so that a string variable is not encoded and is treated like `byte[]` when serialized:
  a. Such as `[UdonSynced, NotEncoded] string someData = "\ud123";`
  b. Such as `[UdonSyncedRaw] string someData = "\ud123";`
Please don’t tell me this won’t be fixed and that I have to go over all of my code again.
