Developer Update - 28 September 2023

Incremental performance improvements are always a good thing, but at the end of the day there is nothing VRChat can do to make a room full of avatars with 20 materials and 500k triangles run fast. All your suggestions will make a superficial difference at most. You are overstating the seriousness of the performance impact these systems have.

Mock it all you want, it remains true.

The only way I can see performance seriously improved is if either VRChat hard-enforces performance constraints, or VRChat or a third party comes up with some sort of specialised avatar creator that simply doesn’t allow its users to do these things. Blender and Unity are professional tools aimed heavily at industry, which assume the user already understands how to use them.

I don’t think you answered my question directly.
And I’ve been using Unity3D for years, since Unity 5; I’m not ignorant, okay?

I’ll tell you one thing for sure: VRChat doesn’t use texture streaming. The option exists in the SDK, but it’s never actually used.

And I know a lot more about how Unity3D actually works with memory.

You need to prove it to me with a profile, instead of me having to prove things to you over and over again; I’m losing patience.

RAM doesn’t really substitute for VRAM at all. What actually happens is that VRAM contents get reallocated and reloaded over and over. It’s pretty obvious that 12 GB is just enough to cover most scenes without lagging badly too often, with objects occluded from the camera being loaded in while others are evicted from VRAM.

You just need enough 150 MB+ avatars; they consume a lot. Find such a public world and turn your head back and forth to observe it.

The process doesn’t even go back to VRAM through RAM via PCIe; it just reloads the previous avatar’s VRAM content, which is fundamentally different.

That would mean RAM holds all of an avatar’s resources and reloads them when needed, which causes a lot of lag and costs more than sharing GPU memory and paging the non-hot data back into the GPU’s VRAM.
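For what it’s worth, the eviction and re-upload behavior being argued about here can be modeled as a simple LRU cache over a fixed VRAM budget. This is a conceptual sketch only, not how Unity’s allocator actually works; the sizes and capacity are made-up illustrative numbers.

```python
from collections import OrderedDict

# Toy LRU model of a fixed-size VRAM budget: touching an asset that was
# evicted forces a (slow) re-upload, which is the hitch described above.
class VramCache:
    def __init__(self, capacity_mb):
        self.capacity = capacity_mb
        self.resident = OrderedDict()  # name -> size in MB, in LRU order
        self.reuploads = 0

    def touch(self, name, size_mb):
        if name in self.resident:
            self.resident.move_to_end(name)    # hot: no transfer needed
            return
        self.reuploads += 1                    # cold: PCIe upload (the hitch)
        self.resident[name] = size_mb
        while sum(self.resident.values()) > self.capacity:
            self.resident.popitem(last=False)  # evict least recently used

cache = VramCache(capacity_mb=300)
for _ in range(3):                 # "turn your head back and forth":
    cache.touch("avatar_A", 150)   # two 150 MB avatars plus the world never
    cache.touch("world", 100)      # all fit at once, so every pass thrashes
    cache.touch("avatar_B", 150)
print(cache.reuploads)  # -> 9 (every touch after the first fit is a miss)
```

With a budget that fits the working set, the same access pattern produces only the three initial uploads, which is the difference the posters are arguing about.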

Also, stop exaggerating the cost of animation. I’ve observed animators with about 20-30 layers and the cost is not huge by any measure, yet the CPU frametime caused by a single avatar can exceed 2 ms or even 2.5 ms, far more than the animation accounts for.

If you don’t have the right tools, you won’t be able to study the features in depth.

In addition, on the CPU side, whether AMD or Intel, the profile in fact often shows only 3 or 4 busy threads, and that is already a crowded situation.
During loading you can see 5 to 6 threads utilized, but it’s not very common.

Because I use the profile, I know the root cause of the problem is not in the animation.

The core of the game engine is the cause of the heaviest load on the main thread, and what exactly is going on in the core needs to be analyzed by some C++ decompiler tools.

And I’ve reversed it many times, so I know part of the abnormal CPU overhead comes from the C# VM, combined with the impact of scene management. That is, animation may be a scapegoat rather than the real culprit; in experiments it’s easy to blame the wrong thing because of the complex dependencies between systems. Without injecting a profiling tool of this kind, you cannot be clear about the root cause of the problem or how it occurs.

Even now I still don’t know why you won’t put forward a profile. Anyone on the engineering side should understand this, instead of mentioning a bunch of concepts as if explaining them were an actual answer; it doesn’t solve the problem.


To avoid endless and pointless arguments, I’ll put it this way.

No world uses texture streaming, and if you see a change, it’s a LOD mesh swap, which doesn’t save any VRAM.

Seriously, there’s too much pointless debate and discussion, and no matter how much is said, no official action will be taken.

If they were going to do it, they would have done it already.

If you don’t appreciate the complexity behind the whole thing and haven’t done it yourself, you won’t be able to understand it, and I’d need to explain it over and over again; it’s too tiring.

I’m here to give advice, not to go on and on and on.

Ever since the update, I have been stuck on the first blue loading screen with no loading bar, right after the login screen. Everything was working before the update; I’m unsure what happened.
I tried reinstalling the game and using the open beta, but neither worked. I also tried Desktop Mode and deleting the folders under %appdata%.

I explained, suggested, and implied a great many things VRChat and specific asset creators can indeed do to improve that greatly, without expecting the impossible (changing the habits of the uneducated masses). I feel like you aren’t understanding something. It’s not true that all the burden rests on the Joe Blow avatar creator… Your average avatar creator is a variable factor that you need to account for and compensate for within its margin of error; if that gap is wide, then you need to put more effort into minimizing core system cost so that the common threshold of resource abuse is not as detrimental.

This is the same argument so many blame-shifting devs always make “just get a better computer and you will be able to run the game better”. How about you code your game better and it’ll be able to run on weaker systems. Have you ever looked at games like Red Faction or some other unreleased titles that existed back in the xbox/ps2 generation? They had realtime dynamic physics destruction that makes Battlefield BC2 and beyond look like nothing. They did that on that funky old freakin hardware. Because they put in the effort to make it happen.

As far as we know, VRChat has nobody in charge of bulk & core system optimization.



I’ve been using it since 2017, constantly and actively trying to understand all its workings, following GDC lectures, studying rasterization, etc. I discuss it in detail with a lot of people who understand it very deeply, and also read a lot of posts by our lord and saviour Ben Golus. Go do more research, discuss it with more technically able individuals, and collect a consensus. Anecdotes and single opinions are not very useful.

Sorry, but i’ve had dozens of individuals concur on these same issues and observe the same costs. People have profiled extensively things like having a lot of layers, having an interrupt transition to self off of Any State, etc. I started mecanim from places like this https://www.youtube.com/watch?v=8VgQ5PpTqjc which go over a lot of technicalities and performance results, not from vrchat’s SDK3 and the beliefs of this particular community.

Once again, i have confirmed many times, as have many others, that badly built animators are one of the largest performance hogs. Disabling animators or properly optimizing controllers massively decreases cpu frametime in vrchat.

Unity runs on C# not C++. Also, the core engine has only gotten more optimized, and yet steadily (not punctuated by unity upgrades) vrchat’s performance has gone down, usually punctuated by particular updates that vrchat makes to core systems or features.

I never said all the burden rests on the avatar. I said most of the burden does. That’s quite a different thing. I don’t feel you are understanding what I’m saying.

This is not a valid comparison because avatar creators are devs and not (just) players.

The core of the game engine is mostly C++.

Hey, sorry, I know this isn’t the right dev update, but I can’t seem to find one on the announcement that came out today and I was just wondering: for the content gating that’s in the works, will guest accounts also have the filters auto-enabled like accounts under 18 will? I think that makes sense logically, since I don’t remember whether guest accounts use a DOB on VRChat itself to create the account. If they don’t, it seems like a workaround for underage users to just use those accounts.

Would it be possible to have a toggleable option that prevents you from clicking on chairs? I can’t tell you how many times I find a great world with a great spot to chill at, but I accidentally click on a chair space, it snaps me into it, and I fall off the ledge when I get off of it.

I’m not saying turn off chairs in the world, i’m saying prevent me from being able to click on them.

Are you able to afford a project?
Do you have sufficient experience and extensive basic training in software design?
If you are not in this field, I suggest you stop talking about it.

Many things cannot be solved by so-called consensus. There are many problems in actual project use, and the problems Unity3D runs into are by no means explained by the animation alone.

I have even measured many of the historically recurring bugs, such as retransmissions caused by AnimatorOverrideController that spiked CPU usage.

This is why I emphasize “engineering”: profiling tools are very important for locating problems.

In addition, the heaviest load in VRChat always lies in two parts: the game engine core (UnityPlayer.dll) and the code VRChat developed itself (GameAssembly.dll).

Below is some of the data I collected: the consequences of one avatar with unusually high CPU overhead on the entire scene.

As a result, up to 55% of the load sits on the main thread, and overall utilization is reduced to the equivalent of 2-3 cores. Judging by affinity, it really only needs around 2 cores (VRChat’s affinity cannot be set manually while it is running).

After using the profiling tool, you must decompile against the DLL that matches the exact engine version (VRChat previously used 2019.4.40, not 2019.4.31).

Next, try to trace and collect more data through various methods, then feed the sampled addresses into the decompilation tool to locate the exact execution addresses, so that the instructions executed by specific fragments within a function can be analyzed.
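Mapping sampled instruction addresses back to decompiled functions is essentially an interval lookup over a sorted function table. A minimal sketch of that step; the function names and RVAs here are made up for illustration, not real symbols from GameAssembly.dll:

```python
import bisect

# Hypothetical function table recovered from the decompiler:
# (start RVA, name), one entry per function, sorted by start address.
functions = sorted([
    (0x1000, "ConstraintManager::Update"),
    (0x2400, "ConstraintManager::JobSetup"),
    (0x3100, "SortConstraintsByDependencies"),
    (0x5000, "Animator::Evaluate"),
])
starts = [start for start, _ in functions]

def symbolicate(address):
    """Map a sampled instruction address to the function containing it."""
    i = bisect.bisect_right(starts, address) - 1
    return functions[i][1] if i >= 0 else "<unknown>"

print(symbolicate(0x2410))  # -> ConstraintManager::JobSetup
```

Aggregating sample counts per resolved name is then what turns raw addresses into the "thirty to fifty functions with clearly different usage rates" described below.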

Generally speaking, the analysis turns up about thirty to fifty functions with clearly different usage rates, and those account for almost all the load on a given DLL; there is no need to analyze hundreds of functions.

Next, I spent a lot of time analyzing each one, and found that roughly every five or six fragments (sometimes more) actually belong to a single function. In the end, fewer than a handful of functions in a DLL account for the main workload.

In the end, I repeatedly tried to reproduce it in my own export project to find out what the real problem was (after all, Udon and PB are not officially public).

I honestly don’t think it’s useful to keep looking for information you think is relevant, you can easily fall into a trap.

Let me give you an example. If you suspect that X is guilty and only collect information related to X, then X will look guilty no matter what; but the problem will still exist after you “solve” it in the end.

So I always emphasize the importance of being a developer, because with enough experience you find that blind optimization is a sin and causes more problems, unless you have methods to accurately locate the problem.

To be honest, I don’t like too much preaching. Engineering projects don’t work on your suspicions.

Though the animated controller in the “avatar” is really questionable, it’s the focal point of the whole architectural design with tons of combinable parts.

However, there are some methods, such as turning off animations via VRChat’s safety level and then moving all the avatars close again to collect data. You will find that animation may not be the cause of the problem.

To be honest, after decompiling some things in VRchat, for example, there are many parts of the design in C# that I can complain about, but maybe it is just a design based on weighing the pros and cons. There are problems with the design of some components that rely on Unity.


In fact, use an IL2CPP tool to decompile VRChat’s GameAssembly and recover the C#.
After dnSpy parses it, click through to the RVA and look up the address obtained from the profile; then you know the name of the executed function, and the cause of the problem becomes very obvious.

Next, open IDA, press G to jump to the address from the profile, and compare them one by one.

Let’s discuss the rest privately.


Regarding mipmap streaming, here is a relevant discussion: Unity Mipmap Streaming in VRChat - #13 by dark You commented there so I assume you’ve read it, but I figured I’d link it anyway for others who haven’t. Hopefully they can get it enabled at some point.

shaders like poi and pretty much all the ones people use are pixel lit, that means they take realtime lights at the pixel level instead of the vertex level

Per-pixel lighting is the standard for modern games and has been for quite a while. Vertex lighting is honestly a pretty strange choice these days, except if you’re going for a retro aesthetic I guess.

poi’s poly discard isn’t discarding anything, it yeets them beyond the frustum and farplane so that the geometry surface isn’t running on them (i dunno if this actually works how they hope, i’d like to see actual tests of it rather than just claims)

It works. Triangles that are entirely outside of clip space, including ones beyond the far clip plane, are guaranteed to be culled and not rasterized. I haven’t read poiyomi shaders so I can’t speak to what they do specifically, but in general there are many potential reasons you could want to discard triangles in the vertex shader as opposed to some other part of the graphics pipeline.
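As a sanity check on that claim, here is a minimal sketch of the standard trivial-reject test (OpenGL-style clip volume; plane conventions vary by API): a triangle is guaranteed to be culled when all three vertices fail the same clip-plane test, which is exactly what happens when a vertex shader pushes them all past the far plane.

```python
# tri: three (x, y, z, w) clip-space positions.
# Visible region (GL convention): -w <= x <= w, -w <= y <= w, -w <= z <= w.
def outside_same_plane(tri):
    planes = [
        lambda v: v[0] < -v[3],  # left
        lambda v: v[0] >  v[3],  # right
        lambda v: v[1] < -v[3],  # bottom
        lambda v: v[1] >  v[3],  # top
        lambda v: v[2] < -v[3],  # near
        lambda v: v[2] >  v[3],  # far  <- the "yeet past the far plane" case
    ]
    # Guaranteed cull only if ALL vertices fail the SAME plane test.
    return any(all(p(v) for v in tri) for p in planes)

tri_far = [(0, 0, 10, 1), (1, 0, 10, 1), (0, 1, 10, 1)]  # z > w everywhere
print(outside_same_plane(tri_far))  # -> True
```

Note the same-plane condition matters: a triangle whose vertices are all outside clip space but against different planes can still cross the frustum, so it is not trivially rejected.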

Realtime LoD generation aside from distance impostors is an extremely taxing system and is one of the main reasons why UE5 struggles to run acceptable framerates without upscalers.

Nanite does not generate LoDs in realtime, they are pre-baked. And in any case that has no relevance to VRChat’s upcoming impostor system.

I feel like you don’t even know how Unity and C# work here. VRChat has many times done work on the very much necessary garbage collection, which purges unused memory; it doesn’t automatically expire, you have to tell it to do so. I’ve had many people I know test, and have also found it to be the case myself, that mesh renderers that are not active do offload themselves out of working memory. If your memory fills up, they will be evicted in favour of more readily needed memory, which can cause those meshes to hitch your system as it squeezes them back into VRAM off of your system RAM; and if your system RAM is also full, then it has to page them in from your pagefile.

Garbage collection, virtual memory (the pagefile), and whatever Unity does or doesn’t do w/r/t unloading inactive assets (not a Unity guy so idk), are 3 completely separate and unrelated things. You seem to be getting them mixed up here.

I don’t want to go over every little thing because I don’t think it’s that useful and it’s honestly not my intent to nitpick, but there are several other things you’ve written here that seem to be wrong or that just make no sense from a game developer’s perspective. In general, I get the impression that you have a lot of practice BSing your way through these kinds of conversations, but lack the necessary foundational knowledge to have a discussion that’s as productive as you want it to be. Just a hunch, coming from someone who has very much been there and done that. And from my experience I think you will be happier if you spend less time arguing about it online and more time mastering those fundamentals and being appropriately humble about the things you’re not totally sure of - assuming my vibe check was accurate.

Question about images that are displayed in-world on a canvas. Been working fine for a long time and then after the 2023.3.3 patch the images are reversed 180 deg. I can adjust it of course but don’t know if it is a bug and it will be changed / fixed in another patch.

Has anyone else noticed this? It could (I assume) affect lots of other worlds. Both Quest native and PCVR btw.

It’s a bug if you submit a report. I didn’t see it mentioned in the patch notes…

I’ve seen a prefab tablet for sharing images, so if you happen to link a bug report on Canny and don’t provide an example I can cough up the example picture.

I’ve also seen wonkiness in gogolocos newer flying method, it basically didn’t work until I was going to show someone. Basically 1.7.91 and newer uses a VRC_Station instead of… a collider?

The above debate has been going on for a long time. When I was bored I read tupper’s replies on reddit, so I’m going to give my opinion here.

I’m a bit curious whether you guys have actually tried specific experiments. Or are you just quoting data without trying to reproduce it: extracting as much of the avatars and the world’s Udon as possible from the game, stripping it down into a standalone version, exporting it, and using Unity3D’s analysis tools to monitor the exported build via the API?

Then I have to say something kind of funny: if you go by the colors in Unity3D’s profiler during analysis, it looks like animator overhead, but if you take the time to dig in, you’ll see that it’s actually the constraints.

The difference in overhead between editor and export doesn’t seem to be that big in the new Unity editor, but checking is really time-consuming, since I didn’t go as far as decompiling every UnityPlayer and comparing them one by one in detail.

Through stand-alone simulated tests of VRChat with the profiling tool, compared against actually being online, there doesn’t seem to be much difference in frame rate or in how scene changes behave, so we can assume there is no difference.

Then comparing the VRChat stand-alone simulation against running the process in the editor: I’m not VRChat staff, so I don’t know the implementation differences in more detail than Unity3D’s tools show.

It just seems to be close in frame rate and number of people (I’m using a single repetitive avatar), so I’m guessing the difference won’t be too huge.

Because, on average, the frametime each avatar accumulates due to constraints is about the same.

Let’s say one avatar costs 0.33 ms; ten avatars would be 3.3 ms, which takes you from roughly 3000 fps down to 300 fps, ignoring the other costs of the scene.

Roughly speaking, at 0.25 ms / 50 MB / 1000 bones / 500 PhysBones / 250 MB VRAM per avatar, 40 avatars would bring you down to 100 fps, and adding world overhead would leave only 50 fps.
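The arithmetic above is easy to check: frame rate is the reciprocal of total frametime, and the per-avatar costs are assumed to add linearly. The world overhead figure below is my assumption chosen to reproduce the quoted 50 fps, not a measured number.

```python
# Back-of-envelope check of the numbers quoted above.
def fps_from_frametime_ms(ms):
    return 1000.0 / ms

per_avatar_ms = 0.25                           # constraint cost per avatar
avatars = 40
avatar_total = per_avatar_ms * avatars         # 10 ms total
world_ms = 10.0                                # assumed world overhead

print(fps_from_frametime_ms(avatar_total))             # -> 100.0
print(fps_from_frametime_ms(avatar_total + world_ms))  # -> 50.0
```

The same formula reproduces the earlier figure too: 0.33 ms is about 3000 fps, and ten such avatars (3.3 ms) is about 300 fps.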

Looking in the editor, you can see that 5.23 ms out of 6.45 ms is ConstraintManager::Update.
Within that, the job setup in ConstraintManager::Update is 3.63 ms,
and SortConstraintsByDependencies is 3.57 ms.
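The SortConstraintsByDependencies cost above suggests a per-update dependency sort: each constraint must be evaluated after the transforms it reads from. As a rough illustration (not Unity's actual algorithm), that ordering is a topological sort of the dependency graph. The constraint names here are made up for the example.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical constraint dependency graph: node -> set of predecessors
# (the things whose transforms must be written before this one runs).
deps = {
    "chest":     set(),
    "hips":      set(),
    "neck":      {"chest"},
    "head":      {"neck"},
    "hair_tail": {"head"},
    "skirt_L":   {"hips"},
    "skirt_R":   {"hips"},
}
order = list(TopologicalSorter(deps).static_order())

# Every constraint appears after everything it depends on:
print(order.index("hips") < order.index("skirt_L"))  # -> True
```

Re-deriving an ordering like this every frame for hundreds of nested constraints per avatar is one plausible reading of why this function dominates the profile, though that is an inference, not something confirmed from Unity's source.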

Note that this is only a two or three level animator.
What if it’s a 30+ layer animator?

My test condition measures the CPU overhead avatars cause even when they are not visible.


Based on my understanding of the constraints, I’m guessing this is an issue caused either by a C# script written by VRChat or by Unity3D’s implementation. (The load is similar in both, which is suspicious. Moreover, in other people’s testing the constraints should be much less expensive, yet the editor results are still higher than the actual runtime.) In fact, there are too many issues to complain about; I just haven’t dug into them in detail. I only decompiled the source and took a look.

Let me explain why I think it is a VRChat script problem.
Theoretically, the CPU overhead of the editor is higher than that of an exported build. However, the measured frametime is lower than expected; in other words, the fps is higher.
The fps with the same 4 avatars (plus one insignificant avatar, five in total), none of them visible, can be compared with the same scenario run stand-alone in the official VRChat build. The editor’s fps is much higher, especially in a world that isn’t running UdonSharp.
These results were measured while ensuring there were no multi-core CPU bottlenecks or other bottlenecks.

(However, there are animators and AnimatorOverrideController during the actual execution, so the overhead is higher)

[Screenshot: intel_0001]

[Screenshot: Unity_CPU_fps]

What if I deliberately changed everything to Always Animate?

[Screenshot: anime_Always Animate]

[Screenshot: update transforms]

[Screenshot: avatar 16x Always Animate (an update-transforms version is above)]

Generally, other people’s avatars are in the culled “update transforms” state while your own is Always Animate, but in some cases other people’s avatars end up close to the latter state.

But no matter what, the animator’s overhead is always much less compared to constraints.

In order to better understand the problem, I switched to the base model of this avatar, going from 68 constraints down to 14.
Then I doubled the number of avatars.

(There are some minor issues corrected here; the comparison is update transforms versus the Always Animate version.)



[Screenshots: avatar 16x → 32x]

Seriously, there shouldn’t be many people who add 20 animation layers to a stock Booth base avatar.
And it has to be in the Always Animate state; besides, many PhysBones and bones weren’t added here, so the cost is disproportionate.

In addition, the number of animation layers seems to increase the overhead of constraints? (I’m not sure, but it does seem higher in comparison.)
However, compared to the effect of Always Animate, the increase from layer count alone is not so huge.

In addition, here is 20 layers versus 3 layers, all Always Animate, 32 avatars in total.


The animator in the editor seems to be less efficient? (If you have your own data, you can compare.)
Comparison machine: 5600X.
(0.255 ms vs 1.17 ms)
If six times the number of layers increases the frametime by about 4.5 times, then 120 layers might add about 5.265 ms in the editor. Don’t take this figure too seriously.
(In fact, it grows approximately linearly up to about 100 layers, and quadratically after that.)

As far as personal observation is concerned, it is unlikely that most avatars have such an exaggerated number of animation layers, and before this, this number of avatars has already caused the fps to be quite exaggeratedly low.

However, it is actually necessary to consider that an avatar has more than one animator operating, so the cost may be higher but not much higher. After all, the main increase in animation layers comes from FX.

Then completely strip the constraints off the avatar.

Without constraints, you can see that the profiler is basically just rendering, without much animation cost.

But one thing is very important, because it is not uploaded in the editor, so the VRC avatar Descriptor is not used.

I’m quite skeptical about the colliders part.

As mentioned at the beginning, the profiler color attributed to animation includes the constraints.


The following are the results obtained by attaching external profilers to the editor.

And then there’s also the unusual CPU overhead data for some of these


2023/10/09 Very similar reproducible results

Seriously, I don’t really want to post this data. I would rather wait until others have studied it or even the official has dealt with it themselves.

I am very lazy.


You can find matching functions through their signatures

Seriously, the diagram above is not a very detailed reverse engineering; the load composition also varies a lot across scenarios, so I cannot guarantee it is 100% correct.

In fact, I want to see other people learn about it here. If more people do this, we can find the cause and solve it faster, clearer and more problem-free.

For ordinary people, under the load of many animators, the load might look like this:
about 33% of the main thread, equivalent to using three threads.

However, the biggest problem is that the signature obtained from the editor looks fundamentally different from the result reversed by IDA.

Compilation doesn’t make anything fundamentally different.

This is the reason I would deny it. Secondly, turning off custom animations, assuming no abnormal culling occurs, affects about 10-15% of CPU frametime or even less. Compared with the severe cases (the kind of avatar that seriously drags down CPU frame rate), animation cannot be considered the main cause; the difference in magnitude is too large.

However, officials believe the animators are the source of the problem and are looking to solve it through impostors.
So I started to wonder: have they actually done some very in-depth tracing analysis?

This was in a certain world called “Chinese Bar” in the early years

Fewer than ten people were captured, with every avatar feature fully active.

If you compare similar payloads one by one, can you get a close signature?
It’s very difficult to do…

This profile’s workload starts with one high-cost avatar: due to the design of its PhysBones and bones, everything is always active for the wearer, so compared to someone else wearing it, the frametime may increase several times over.
Second, in the Chinese-speaking worlds, poorly optimized avatars are everywhere, so they are a good target for comparison.
In addition, many people in VRChat are accustomed to idling for long periods in public places.
This makes it an ideal environment to measure the problem.
Seriously, if VRChat could currently be profiled freely, without EAC in the way, I’m afraid there would already be enough data to easily understand the problem.

In fact, this article is not meant to blame anyone; the actual problems of VRChat are just more complicated than you can imagine.
The free combination of content is hard to reason about; although it’s not as free as I imagined, it’s still so complicated that it’s difficult to find the cause with limited data.
Collecting a large amount of data and sorting out the matching and comparison requires a lot of foundational training, though I think that level is not too difficult for someone with a college degree or above.
The current hardware cost of running VRChat at acceptable performance is far too exaggerated, and I don’t know how to solve it effectively.

Even with the high cost of VR equipment, we can’t assume that all VRchat players are reasonably wealthy and have powerful enough accessories to solve any performance issues.
I hope the issue is resolved soon.


Add several animator objects with multiple layers to the avatar, each with complex animations.

This is my understanding of the official view of the high cost of animators.

It should be noted that the animations below are basically Always Animate.


[Screenshot: off_02]

Each avatar mounts four additional objects and four animators
There are 32 avatars in total, and the rest remain as they are.

Replace the animator on the avatar with a three-layer animation


[Screenshot: off_3_fps]

If that’s how you guys think animators are extremely high overhead… I don’t know what to say; switching custom animations off via safety mode doesn’t always give such a huge boost.


If we design according to this idea, we can get an fps that is very close to the results monitored by Intel.


[Screenshot: 4avatar_test_fps]

Based on this reasoning, the fps and overhead of 16 avatars:

[Screenshot: 16avatar_fps]


About 100fps at this time

Then double to 32 avatars

[Screenshot: 32avatar_fps]

This fps is very close to the real experience, and the udon on the world is ignored. If added, it is estimated to be less than 30fps.

I wonder if some people often monitor fps and get similar values

However, this pushes the animators further toward being a marginal factor.

And from the simple experiment above, it can be concluded that it does not require a high number of constraints to have a big impact; about 68 per avatar is not that many (though still on the high side).

Also compare the boost from replacing the animation layers, which is about 20% fps (however, I personally never actually see a boost that high).

[Screenshot: replacement animation boost]

In addition, the gap between vrchat and the editor for the four unmodified avatars above is actually not small.
It is best to spend time analyzing both GameAssembly and UnityPlayer.


Let me update my conclusion on the cause of the problem:
AnimatorOverrideController on the C# side, combined with full animation, causes the bones to be fully active in their own state. Combine that with maliciously structured (possibly unintentionally) constraint nesting over thousands of bones, and a few avatars can easily occupy 1-2 ms or even 3 ms of CPU frametime.


Also, I think the reason many people believe the animators are the root of the problem is that simply measuring massive avatar stacks degrades the structural hierarchy of the scene, causing a massive waste of CPU overhead; animation then inevitably gets cited as the main cause in reproduction attempts.
However, the constraints bring on the problem much more severely and much earlier than the animation does, so I think they are the main cause. A better solution, though, would be to find a way to improve the single-threaded scene API.


(You can also try setting the animator status of the objects in the avatar yourself (?))

I’m not going to write more than that; I’ve already gone on too long.


Good stuff. I would like to suggest an easier way to change your picture in VRC. If you guys could make that happen, I’m sure many people would be thankful for it.

How is it questionable? Animator heaviness is common knowledge, and many many people have profiled it and experienced it in person dealing with their own animators.

Not true at all. You can code the lighting to look virtually identical to pixel lighting, the only major difference is that the light coverage is slightly less accurate. I have my own shader as evidence for this. Most people can’t tell the difference between light that is calculated per surface vs calculated based on object origin, i have a toggle for this too; point lights not super close by are hard to distinguish from directional lights.

I was only saying how systems like nanite are not practical for vr due to how heavy it is; my point was that it was pointless.



We all already know that constraints are messed up, there was a big deal made about it previously and it was added to the performance counter because of it, they showed a graph too that the curve goes up. I’m talking about previous unity, i dunno if it’s better in 2022.

https://twitter.com/hfcredddd/status/1547958650899009546
&
Developer Update - 16 February 2023

Avatars existing =/= avatars being controlled by player update when it comes to cpu cost. There’s a lot more cpu stuff going on to manage networking and ik updates and all kinds of other things. You’ll notice that if you just have an avatar, or a world with a bunch of avatars that just stand there static, or even animate walking or something, it’s a lot less taxing than actual users in a room.

. . . The issue is that due to mods being swept away thanks to EAC we have no proper means of profiling anything in the client anymore, and i didn’t think to recruit some rigorous profiling from people at the time.

Uhh, this isn’t true at all. Maybe not the bulk masses, but a lot of ethot avatars for example, like the gumroad stuff that a lot of typical people who aren’t deep in the community use, are a disaster in their animators. Then the booth people have their own disasters because they keep piling prefabs on - i can feel it, i can see it when they show their graphs or i see the avatars myself if they’re public, they have so many prefabs jammed into them and each adds like 2-5 layers - outfits, erp systems, gogoloco, weapons and toys.

Tupper also supported the issues of animator cost on twitter.

The reality is you cannot expect everyone to change their habits. Nothing anyone can do can change their habits significantly in any direction, you can only nudge it a bit, hence like i said before, the responsibility falls upon those who make the main systems everyone uses. Even if the biggest costs are user incompetence, you have to mitigate that. This gets vastly worse in non-english speaking parts of the community whomst are not even part of the conversation.

Yes users could improve their avatars significantly, but will they, and will anything you or anyone say convince more than a handful individuals? No.

  • The only option is to make optimized content easier to make than unoptimized content.

You'd be shocked. 20 layers in an animator is like the starting point for a lot of avatars. Not that everyone and their dog has it, but it's common enough that you feel it in a majority of places you go.

Well, that's the same mistake almost every game dev makes: they blame the hardware for not being good enough before considering that they could just work harder, or hire more hands, and put full force into streamlining everything they program (while also making sure to avoid "clean code" dogma). As I've tried to say before, countless major companies have dedicated huge amounts of resources and time to notable performance improvements despite no observable UX changes besides speed.

Well, it's not just editor profiling. I've worked with a number of people who've had severe performance issues with their avatars (usually exacerbated locally for themselves), and who, upon reworking their animators, got mind-boggling performance improvements from fixing issues like having too many active layers. One of the quick-and-dirty tricks I tell people to use is to weight layers to 0 when they aren't in use - e.g., if you have a layer for a gun, keep it weighted off until you actually grab said gun. Setting animation speed to 0 has a similar effect, but it's not as clean, since the layer is still being checked.
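The layer-weight trick can be sketched as a toy cost model. This is purely illustrative Python, not Unity code: the per-layer microsecond costs and the `animator_cost_us` function are assumptions made up for the sketch, not measured Unity numbers.

```python
# Toy model (illustrative numbers, NOT measured Unity costs): animator CPU
# cost scales with how many layers are evaluated each frame. Weighting an
# idle layer to 0 lets it be skipped; setting its speed to 0 still leaves
# the layer checked every frame, so some residual cost remains.

COST_ACTIVE_LAYER_US = 20   # assumed cost per evaluated layer, in microseconds
COST_CHECKED_LAYER_US = 5   # assumed residual cost for a speed-0 layer

def animator_cost_us(total_layers, idle_layers, strategy):
    active = total_layers - idle_layers
    if strategy == "weight_zero":   # idle layers skipped entirely
        return active * COST_ACTIVE_LAYER_US
    if strategy == "speed_zero":    # idle layers still evaluated each frame
        return active * COST_ACTIVE_LAYER_US + idle_layers * COST_CHECKED_LAYER_US
    return total_layers * COST_ACTIVE_LAYER_US  # everything left running

# A 20-layer avatar with 15 layers idle (stowed gun, outfits, toys):
print(animator_cost_us(20, 15, "none"))         # 400
print(animator_cost_us(20, 15, "speed_zero"))   # 175
print(animator_cost_us(20, 15, "weight_zero"))  # 100
```

The exact numbers are invented, but the shape is the point: weighting unused layers off removes their cost entirely, while speed-0 only shaves it.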

People who have knowledge and know what they're doing can solve a lot of things, but most people don't and won't - unless there's some actual avatar-optimization school course, concise, enjoyable to watch, and easy to understand, that VRChat itself develops and publishes.




…All in all, I apologize for turning this into a huge bloated discussion, so I'll leave it here. I appreciate the efforts above; it's definitely respectable, and more stuff like this is what we all need - if anything, to pressure VRChat into considering the importance of building things like alternatives to constraints if Unity doesn't fix their own. Honestly, though, a constraint system shouldn't be too complicated/difficult to replicate. It's mostly just copying transform values/offsets of specified objects, or rotating to face specified objects (a little more complicated, of course). I'm not well versed in profiling these types of things, so I usually defer to others for that (which I do in many contexts, both directly like this, and anecdotally, to correlate).

My point about local slowdowns is that some systems, and PBs that use similar constraints, are the most noticeable.
Next is the huge number of bones.

I analyzed the animator system, and I don't think that reducing the number of layers alone would be a good fix; on the contrary, I think VRChat needs to make a comprehensive improvement.

If there is a real need, it should be split in two: one part that always runs and truly synchronizes with the whole world, and one part that can be transformed or culled entirely when out of camera.
An avatar should not be a single monolithic unit, especially one with complex bounds.

However, the official design generates a single type of animator that can't fulfill these conditions, and it still does a lot of work even when you can't see it.

Let me give you an example. There are cases where the animator cost is abnormal: say a large number of animated bones has a big impact at the structural level of the scene and is always running during the session - you just look at the avatar and it keeps on running.

Next, with VRChat's distance culler, when you can't see the avatar or you turn your head away, it's already hidden or not playing, yet the CPU overhead still remains.

This causes a lot of performance problems, which you can only clear manually by switching safety levels and reloading.

At this point, you can also compare the performance of the avatar with and without its custom animations.


There are definitely a lot of abuses of constraints, but I've observed that many of them are hidden behind IL2CPP, and I really don't want to spend too much time examining how VRChat actually uses the SDK to encapsulate that functionality.

There are many well-known Chinese public avatars with up to 1.5ms of CPU frametime overhead each, which I've personally measured several times.

(Most are about 0.5-0.8ms; it's not always this high, but a few can easily break 1ms.)

This results in a world with just over ten such avatars not even reaching 40 FPS.

In these strange designs, there are some functions in the SDK that take up 30-40% of the CPU main-thread overhead. Given that all threads add up to 100%, in an actual game environment the profile data of a few avatars shows a single function fragment taking 38 percentage points out of the main thread's 52% share (equivalent to 73% of the main thread).
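The normalization in that last figure works out as follows (a one-line sketch; the 52% and 38% figures are the ones quoted in this post):

```python
# If the main thread accounts for 52% of total CPU time across all threads,
# and one function fragment accounts for 38 percentage points of that total,
# then its share *of the main thread itself* is 38/52:
main_thread_share = 0.52
function_share_of_total = 0.38
share_of_main_thread = function_share_of_total / main_thread_share
print(round(share_of_main_thread * 100))  # 73
```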

Meanwhile, this avatar has over 60 FX layers - about 70 animation layers in total including the others - yet those barely show up here. (The second thread has 11% in UnityPlayer, but it's still not the main cost.)

To achieve this effect you don't need 5,000 bones and 120 constraints - just a few hundred to a thousand bones and 50-70 actively working constraints - and avatars with such numbers are frequent to the point of proliferation.

I even suspect it's possible to achieve the same effect with a much smaller total number of avatar parameters.

Comparisons made by different people can show a five-fold difference in CPU frametime for the same data, which is quite outrageous - and even a ten-fold difference is not impossible.

Profiling captures from real environments often reveal that unimaginable things are happening.


Some avatars have up to 120 FX layers, plus other animation layers, totaling 130-140 layers.
Execution time is about 1ms, but they still have a lot of constraint overhead (300-400 constraints, turned off when possible).

Considering animation layers alone, a high layer count may take up 30% or more of the main thread; but if the work is evenly distributed across many layers, it brings the main thread closer to 25%, with the overflow spilling into a parallel fourth thread.
These "avatars" take up about 30-40% or more of the main thread, and show characteristics of a mix between animation and constraints.

Some structures may be closer to the 25-30% range, which makes them harder to distinguish.

These already have a simple structure that doesn't use a blend tree (and are, in fact, harder to maintain without one).


The hazards are a bit high. Take an avatar that requires only 200 bones - which rates as Medium - and uses fewer than 15 constraints.

So even with about 20 or more animation layers, this is still only a small fraction of the total, of which about 50% is attributable to constraints.
Ultimately, the cost of 32 such medium avatars will be around 9ms, which could be reduced to nearly 4.5ms by eliminating constraints.
Proper management can reduce it to less than 5ms. Compared with the actually measured FPS, that's still almost a two-fold difference.

This means that in worlds like JHP and other party worlds, even if all 60 or so avatars that manage to appear are Medium, the FPS isn't much higher.

With 64 or so avatars it’s only 50fps (20ms), and that doesn’t even include the overhead of the world itself.
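The budget arithmetic in the last few posts can be sketched like this (a hypothetical Python model using only the figures quoted above; the per-avatar cost is derived from the 9ms / 32 avatars estimate, not measured directly):

```python
# Frame-budget arithmetic from the figures quoted above: ~9 ms of
# animator + constraint cost for 32 medium avatars, roughly halved if
# constraints are eliminated. FPS is the reciprocal of total frametime.

def fps(frametime_ms):
    return 1000.0 / frametime_ms

per_avatar_ms = 9.0 / 32            # ~0.28 ms per medium avatar (derived)
cost_64_avatars = 64 * per_avatar_ms  # 18 ms for 64 such avatars

print(round(per_avatar_ms, 2))      # 0.28
print(round(fps(20.0)))             # 50 -> a 20 ms frametime is 50 fps
print(round(cost_64_avatars))       # 18 -> and that's before the world's own cost
```

Under these assumed figures, 64 medium avatars alone nearly fill a 20ms (50 FPS) budget, matching the numbers reported above.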

Not true at all. You can code vertex lighting to look virtually identical to pixel lighting; the only major difference is that the light coverage is slightly less accurate.

It is absolutely true, and the reason vertex lighting is no longer widely used is not only because of its inaccuracy and inflexibility (normal mapping and deferred shading both require per-pixel lighting, for example), but also because of its poor performance scaling and the constraints it places on meshes and lighting setup.

The fundamental contradiction of vertex lighting is that it requires dense meshes to look good, but the denser the mesh, the worse it performs relative to per-pixel lighting. In the best case, you calculate lighting once per vertex; in the worst case, 3 times per triangle. So as a baseline, each vertex or triangle needs to be at least a few pixels large on-screen on average in order for vertex lighting to outperform per-pixel lighting.

Which doesn't seem so bad, since that's near-Nanite levels of triangle density… until you realize that, because the vertex shader is not subject to any of the clever rasterization shortcuts the pixel shader benefits from, you have to pay the lighting cost for all shaded vertices: degenerate triangles that cover 0 pixels, backfacing triangles pointing away from the camera, triangles that are entirely offscreen, and triangles behind an already-drawn mesh whose pixels will all fail the depth test. Add on top of that the fact that a dense mesh makes the pixel shader more expensive due to quad overdraw, and vertex lighting can easily become more expensive than per-pixel lighting - or at least not cheap enough to be worth the inflexibility and poor quality, particularly in a case like VRChat's where there are no LoDs.

And in general, the cost of vertex lighting scales with scene complexity, whereas per-pixel lighting tends to scale more with resolution - which is more desirable when the aim is consistent performance and fewer constraints on artists and designers.
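The crossover argument above can be sketched numerically. This toy Python model assumes one lighting evaluation costs the same per shaded vertex as per shaded pixel (an assumption for illustration), and it ignores the 3x-per-triangle worst case and quad overdraw, so it actually understates how badly dense meshes fare:

```python
# Toy crossover model: vertex lighting pays for ALL shaded vertices --
# including offscreen, backfacing, and depth-failed triangles -- while
# per-pixel lighting pays roughly once per pixel that survives the
# depth test. Equal cost per evaluation is an assumption for illustration.

def vertex_lighting_cost(total_vertices):
    return total_vertices       # every vertex is shaded, visible or not

def pixel_lighting_cost(covered_pixels):
    return covered_pixels       # ~once per pixel surviving the z-test

covered = 100_000               # pixels the mesh actually covers on screen
for verts in (10_000, 100_000, 1_000_000):
    cheaper = ("vertex" if vertex_lighting_cost(verts) < pixel_lighting_cost(covered)
               else "pixel")
    print(f"{verts:>9} vertices -> {cheaper} lighting cheaper")
```

Vertex lighting only wins while shaded vertices are scarcer than covered pixels; a dense mesh flips the comparison, which is the contradiction described above.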


Re: All of that long-winded performance discussion.

All it really boils down to is this:

If y’all read what Tupper says and look into how Unity works:

  • Animator Layers are expensive
  • Draw Calls are expensive
  • Constraints can be expensive, but aren’t super common on avatars en masse
  • Shaders can be expensive, but it depends on how they’re used + how many various materials there are
  • VRChat has basically zero way to instance avatars, so each one counts towards the budget

VRChat performance issues boil down to:

  • Dense instances will often be memory + CPU limited
  • This is because of laggy animators and draw calls, mostly
  • Go into a really dense instance with a good world and excellent/good avatars with 3 or fewer animator layers, and it should perform very well

Going into more detail:

What’s also evidenced by the X3D CPUs being so significant for VRChat performance is that the game is heavily memory-latency limited - i.e., it can’t utilize CPU cache lines well. That’s likely why we often see cases where no cores are maxed out and the GPU isn’t maxed out, but you’re still dropping frames.

This talk by Mike Acton (who worked at Unity and implemented their DOTS/Jobs system) explains inefficient use of CPU cache lines and how most of the time is spent waiting on data to arrive from RAM. I believe this, mixed with the factors of VRChat outlined above, causes the scaling problem where the game simply cannot scale.

We have people abusing things such as animator layers, texture/mesh memory, and draw calls, causing poor performance; this scales across all avatars, causing the problems we see today. So not only are the core Unity systems struggling because all these non-instanced avatars make the CPU take forever on draw calls and animators - they're also soaking up system RAM and VRAM, which only adds latency until the system eventually crashes or nearly locks up.

These are things that could mostly be solved if people understood how to make better avatars, and if VRChat enforced it better or allowed users to block poor avatars more granularly. It would also help if VRChat moved to something like a multi-threaded animator system.

All of this to say:

Performance issues in VRChat could mostly be solved if people made worlds and avatars better - but there are multiple layers to this problem that can't really be solved without compromising the freedom people currently have with avatar and world creation.

Also, yes, before anyone says it: I agree that it’s dumb that people basically need to become game developers to make well-performing avatars. But that’s the reality we face. There’s a triangle between easy, high-quality, and performant: pick two.


Draw calls, skinned meshes, meshes, particle systems, and Cloth are memory-bound.
Animation layers, constraints, bones, and PBs are not memory-bound.

It seems like they would be, but once you profile them and understand the implementation, you'll see it mainly comes down to the data structures and algorithm details.

The processing is basically a fully sequential traversal - context-dependent and near-serial - requiring low memory overhead per unit of computation, so it hits the cache easily.

To simplify things a bit: the visible types consume more data and are denser, whereas the invisible persistent work behind the scenes always consumes a lot of checks and is serialized.

Once you can see an object, the CPU time spent updating the relevant skeleton to change it and make it visible to you is data-intensive, and X3D chips can be relied on to address that performance problem.

The other type relies on architectural IPC and frequency improvements. However, constraints see almost no IPC gains; PBs and skeletons have more room for IPC improvement via branch prediction and speculation; and then there are the animation layers and their related operations, which are bigger but not huge - their instruction-level parallelism is not high, so the gains are limited.

If you want to talk about architecture and cache, Cloth should benefit the most - but Cloth's performance is so poor as to be unacceptable compared with using PBs.


Incidentally, 2D particle systems are not a GPU bandwidth bottleneck but a fill-rate bottleneck.

In addition, the C# VM and Udon interpreter bottleneck on branch prediction and dispatch. Although you can combine speculation with out-of-order execution, the cost is completely disproportionate to the gain: out-of-order execution is meant to optimize back-end resources (including data access and memory), not to improve the effective efficiency of the CPU front end, or even of dispatch.

That’s great and all, but it's just overly specific for what people are concerned about. I honestly have not read any of your performance metrics, because it's a gigantic wall of text with seemingly no outline or summary to simplify it and make it easier to digest. Even what you said here was a lot to comprehend.

We know that draw calls and animator layers are significant for performance, and that animators running all the time (even with the avatar hider) also account for the performance issues in dense instances. So those are where we can likely focus our attention, besides RAM/VRAM usage, of course.

People need to start with using blend trees and reducing their texture memory. That’s what this comes down to.

Go into a really dense instance with a good world and excellent/good avatars with 3 or fewer animator layers, and it should perform very well

“Should” is doing a lot of work here. I’ve been to events with 40+ people which enforce avatar performance limits (e.g., Good or better, or Medium or better), and even with literally all avatars hidden by distance, the framerate is still terrible. So something is going wrong there that VRChat can definitely optimize, although I have not looked into what that might be.

It can’t utilize cache lines on the CPU well. That’s likely why we often see cases where you will not have any cores maxed out nor your GPU maxed out, but you’re still dropping frames.

The time spent waiting on main memory is still counted as CPU time, so that wouldn’t be a reason for low reported CPU utilization. It would be something else - for example, maybe CPU and GPU work are not sufficiently overlapped, so when the CPU is working the GPU is waiting, and when the GPU is working the CPU is waiting. I don’t know if that’s actually the case for VRChat, it’s just an example of one potential cause. There could be other reasons.
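The overlap hypothesis can be illustrated with a toy model. The 8ms/7ms figures below are made-up examples, not measurements of VRChat:

```python
# Toy illustration of the overlap hypothesis: if CPU and GPU work are
# fully serialized, the frame takes cpu + gpu; if fully overlapped
# (pipelined), it takes max(cpu, gpu). In the serialized case neither
# unit shows high utilization even though frames are slow -- the CPU is
# busy only 8/15 of the frame, and the GPU only 7/15.

def frametime_serialized(cpu_ms, gpu_ms):
    return cpu_ms + gpu_ms          # each unit idles while the other works

def frametime_overlapped(cpu_ms, gpu_ms):
    return max(cpu_ms, gpu_ms)      # frame N+1's CPU work hides frame N's GPU work

cpu, gpu = 8.0, 7.0                 # hypothetical per-frame workloads
print(frametime_serialized(cpu, gpu))   # 15.0 -> ~67 fps, low utilization on both
print(frametime_overlapped(cpu, gpu))   # 8.0  -> ~125 fps, CPU fully busy
```

This is only one candidate explanation for low reported utilization alongside dropped frames, as the post says; a real diagnosis would need a GPU/CPU timeline capture.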