Strings are force-encoded to UTF-16 when serialized?

Hello all. I don’t know if it’s a feature or not (I really hope it isn’t a feature), but it seems like all the strings that are synced between players are UTF-16 encoded.

In this test environment:

  • The string is generated from characters in the range U+D7FF to U+FFFF.
  • The output is that generated string, converted to a raw byte array, with each byte written as hexadecimal digits.

Expected: Exact same output between players
Issue: Characters in the range U+D7FF to U+DFFF are replaced with U+FFFD.
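For anyone who wants to reproduce this outside VRChat, here is a minimal sketch in plain C#. It assumes the serializer behaves like a standard .NET text encoder, which nothing here confirms: Encoding.Unicode is only a stand-in, and SurrogateRoundTrip, ToHex and RoundTrip are names made up for this example. The UTF-16 surrogate range is U+D800 to U+DFFF, and a surrogate that is not part of a valid pair cannot be encoded as text, so an encoder with a replacement fallback substitutes U+FFFD for it.

```csharp
using System;
using System.Text;

public static class SurrogateRoundTrip
{
    // Dump the UTF-16 code units directly, without going through any Encoding,
    // so the dump itself cannot alter the data.
    public static string ToHex(string s)
    {
        var sb = new StringBuilder(s.Length * 4);
        foreach (char c in s)
        {
            sb.Append(((int)c).ToString("X4"));
        }
        return sb.ToString();
    }

    // Hypothetical stand-in for the network serializer (an assumption):
    // encode the string as UTF-16 text, then decode it again.
    public static string RoundTrip(string s)
    {
        byte[] bytes = Encoding.Unicode.GetBytes(s);
        return Encoding.Unicode.GetString(bytes);
    }
}
```

Comparing ToHex(input) with ToHex(RoundTrip(input)) for a string that contains an unpaired surrogate such as "\uD800" should show whether the code unit survives. Characters outside the surrogate range, for example "\uE000", pass through unchanged.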

The official creator documentation states that char, string, and VRCUrl all have a range of U+0000 to U+FFFF.

But it seems like a hidden internal serializer forcibly re-encodes every string and tries to turn it into valid UTF-16.

You might ask: why not use byte[] or char[] for this kind of thing?
I'm avoiding array types for two reasons:

  • To compare two byte[] values, I have to write the comparison function myself, and I do not trust Udon to run it fast enough.
  • When an array is synced, FieldChangeCallback is not called unless the array itself has changed (for example, its length).
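To make the first point concrete, this is roughly the comparison I would have to write for byte[]. It is a plain C# sketch, and ByteArrayCompare and BytesEqual are names made up for this example. In Udon, every iteration of this loop runs through the interpreter, which is the cost I am worried about.

```csharp
public static class ByteArrayCompare
{
    // Element-by-element comparison of two byte arrays.
    public static bool BytesEqual(byte[] a, byte[] b)
    {
        if (a == b) return true;                    // same reference, or both null
        if (a == null || b == null) return false;   // exactly one is null
        if (a.Length != b.Length) return false;

        for (int i = 0; i < a.Length; i++)
        {
            if (a[i] != b[i]) return false;
        }
        return true;
    }
}
```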

While working with string, it’s much better:

  • I can use C#'s built-in string.Equals() to compare two data containers without the performance cost of a loop interpreted by Udon.
  • I can pack 2 bytes into one character, which is more efficient.
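The packing itself is simple. The following is a minimal sketch in plain C#, assuming an even number of bytes and big-endian order within each character; BytePacking, Pack and Unpack are names made up for this example. It also shows why the issue matters: any byte pair whose high byte is 0xD8 to 0xDF packs into a surrogate code unit.

```csharp
using System;

public static class BytePacking
{
    // Pack two bytes into each UTF-16 code unit (high byte first).
    public static string Pack(byte[] data)
    {
        if (data.Length % 2 != 0)
        {
            throw new ArgumentException("Expected an even number of bytes.", nameof(data));
        }

        var chars = new char[data.Length / 2];
        for (int i = 0; i < chars.Length; i++)
        {
            chars[i] = (char)((data[2 * i] << 8) | data[2 * i + 1]);
        }
        return new string(chars);
    }

    // Reverse of Pack: split each code unit back into two bytes.
    public static byte[] Unpack(string s)
    {
        var data = new byte[s.Length * 2];
        for (int i = 0; i < s.Length; i++)
        {
            data[2 * i] = (byte)(s[i] >> 8);
            data[2 * i + 1] = (byte)(s[i] & 0xFF);
        }
        return data;
    }
}
```

Two packed containers can then be compared with string.Equals(a, b). For example, Pack(new byte[] { 0xD8, 0x00 }) produces the single code unit U+D800, which is exactly the kind of character that does not survive syncing.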

I don’t know what VRChat (or Udon) is doing here, but I see some potential fixes:

  1. Treat string data types just like byte[]
  2. Add an attribute for UdonSynced variables so that a marked string variable is not encoded and is treated like byte[] when serialized
    a. Such as [UdonSynced, NotEncoded] string someData = "\ud123";
    b. Such as [UdonSyncedRaw] string someData = "\ud123";

Please don’t tell me they won’t fix this and I have to go over all of my code again.

I suggest you post this as a bug report, here: World/Udon Bugs & Feature Requests | VRChat