How might VRS work with different types of eye tracking?
Let's say I get the Droolon Pi, the Pimax eye-tracking module for their 4K and 8K HMDs. Pimax offers dynamic foveated rendering by default that follows eye movement. Will VRChat work with this? Do I need to set up anything else, like OSC, when the feature gets implemented?
To be fair, I do not have deep knowledge of these subjects, but I have learned a bit. It seems odd that you are focusing so hard on physbones when you can check the profiler on any avatar and see that the animator is the #1 CPU bottleneck. Second to that are often draw calls and blendshapes (though blendshape handling apparently changed with Unity 2022). Mesh transformation from bones should already be a fixed cost to my knowledge, scaling with how many bones you have; physbones are a separate component.
In most cases, the bottlenecks in denser instances are CPU-related: often animators, Udon, and VRAM/RAM swapping.
What I understand @CyanMoon to be talking about is more specifically Udon-related, or how the fundamental systems behind animators and the like work.
They’re talking about how processors work, and how many of the designs within Unity are not optimized to take advantage of them. For example, Animators could at least be spread across multiple threads using Unity Jobs, which would significantly improve performance. I also know there is a nuance about draw calls between DX11 and DX12, though DX11 does support some level of multi-threaded draw-call submission/render threads.
But more succinctly, from my perspective, this includes things like Data-Oriented Design, which Mike Acton talked about here in 2014; he was later hired to work on the Jobs system and Entity Component System for Unity. The idea is to reduce your data structures down to tightly packed arrays so the working set is minimal and the CPU can spend as much time as possible in cache, since RAM access takes far too long (and is a HUGE PROBLEM with VRChat). It’s also why the AMD X3D processors work so well: data that would otherwise spill out to RAM fits within the larger CPU cache, which suggests there are severe bottlenecks in whatever data structures and transform systems are being used.
All of this is to say: VRChat could improve how Udon performs, especially if they can educate people on how costly EXTERN calls are, and could also look into ways to improve the performance of animators and the other current pain points.
I have to add something. If animators are currently based on blend trees, some animation effects cannot be produced at all, or equivalent effects become much more complicated to produce.
One feature of multiple animation layers is that the timing between different animations can be adjusted arbitrarily while each animation keeps its own independent time, which is difficult to achieve with a blend tree.
Logic that does not depend on time, or on timing differences, poses few problems; but once it does, it is difficult to optimize.
And the more animation layers there are, the more obvious the "contamination" of the cache becomes. If you had to rank the three invisible components on an avatar by impact, the Animator is the easiest pick.
In terms of data-access efficiency: Animator > Constraints > PhysBones. In terms of the amount of data carried: Animator > PhysBones > Constraints.
Although how well a system multithreads is related to how easily its data can be accessed, they are not the same thing; here the ranking is Animator > PhysBones > Constraints.
The L3 cache is usually the victim; the smaller caches are handled much more efficiently by the CPU hardware.
With a high L3 miss rate, compounded by DRAM bank design, the end result can be the animator costing an extra 30–40%, or even double, the avatar's own cost, on top of what Udon consumes.
As the share of bandwidth in use grows, DRAM latency shifts with parameters such as bank count and CL timing. Bandwidth is relatively plentiful and does not need to be saturated; at around half utilization the latency difference is already visible, and at that point DDR5 does better than DDR4.
(AIDA64's measurements are not precise and should only be used as a reference.)
To be honest, I don’t quite understand the update logic and thinking behind Unity3D.
A TL;DR (because this response is quite long, and fair enough if you don’t read it entirely): reasonably constructed avatars should not be running into significant caching and DRAM throughput problems. The problem is most avatars are not reasonably constructed.
I am not focusing on physbones. They are just an example of a simple thing creators can easily tweak to improve performance.
… I may have overused them as an example, I admit.
Everything I mentioned is an example to demonstrate a point and nothing more. Feel free to replace physbones with high triangle/material/constraint/blendtree count or whatever other builtin-system-abuse you want. Forest and the trees, and all that.
Try pulling apart some assets from published games (any game over the last 20 years will do) to get an idea of how to construct high-quality art assets – learn from the masters. They are constructed very differently to the usual approach to avatars.
I also understand this. I’m questioning the practical relevance of these things for most users and creators. Older games on weaker hardware manage better visuals on the same engine.
VRChat is not a AAA game on the bleeding technical edge of graphics. And it shouldn’t be – the focus has clearly been on creativity and ease of use.
My responses are limited to avatars and I have not addressed Udon issues as I don’t have a deep enough understanding on how Udon works to comment. As far as that goes: rtyog84 could be completely on point.
This is actually quite relevant, as I know many developers have been questioning not the performance but the pragmatism of such an approach. There are cases where it’s almost a necessity (IIRC TABS used a custom system with a similar approach to accomplish what it did), but there are also concerns it can make smaller and more straightforward games more technically complex and higher-risk to engineer. A legitimate concern for an engine that’s essentially the go-to for smaller and more straightforward games.
To answer what I think you are saying: systems like Mecanim are primarily designed for ease of artist use, not for raw data throughput. This is why they are not necessarily as performance optimal as they could be.
Unity’s whole claim to fame is its ease of use. It unified what used to require a large collection of complex tools. Engines before it had much higher technical barriers.
There is one thing I will agree with in the general sense: the way the animators are handled in VRChat could be improved on their end. At the very least, a more intuitive way to set up animators for content creators could lead to more efficient animator setups. I suspect most creators just want to swap a few animations out and sometimes end up with a spaghetti animator doing this. Improvements here could lead to both performance and usability improvements. That, I feel, is on VRChat.
There is another issue I feel may be at play. I suspect quite a few creators work by increasing numbers on things (like subdivision) until they run into performance problems, all under the misguided (but common) belief that art automatically looks better with higher-res textures and more triangles. This makes improving performance in VRChat a moving target. You could hyper-optimise all the systems and end up with a stunning work of software engineering, and people would still run into what are perceptually the same performance problems.
I should explain why I emphasize Udon: avatar problems are avatar problems, and world problems are world problems.
Udon’s heavy reliance on L3 size and DRAM performance greatly differentiates the experience across devices.
For example, take a train world where a single person wearing a well-optimized avatar can maintain only about 100 fps on an ordinary Zen 3 CPU; let’s assume a 5800X.
Now three people in total have entered the world: one on a 5800H laptop, one on a 5800X desktop, and one on a 5800X3D desktop.
Once the train starts moving, the three get roughly 20 fps, 50 fps, and 100+ fps respectively.
Then friends joined one after another, including more laptop users.
Now there are not just three people but three classes of hardware. By the time there are sixteen people in the world, the laptop users can no longer hold back their screams and leave one by one, shouting "10 fps" before disappearing.
(The above really happened.)
At this point, desktop Zen 3 still gets more than 30 fps, while the X3D is much higher.
The CPU-performance gap between different devices can be as much as five or six times. It comes from L3 cache size and from DDR versus LPDDR latency.
This gap is larger than the gap between Quest 2 and Quest 3, and larger than the gap between a flagship phone CPU and a desktop CPU.
Although optimization may never be "enough," a better foundation makes it possible to build worlds with more amazing features and gameplay. At the very least, creators would not have to face such an exaggerated gap between devices, unless they only want to build for desktop or X3D users.
Perhaps we should distinguish not just mobile, but also desktop, laptop, and X3D tiers.
It’s not that some people can’t afford a desktop; some can only use a laptop because of their circumstances. That is not their fault.
In other worlds things are much better, and the difference between devices may be within 30%.
On the other hand, I also hope the developers will focus on the key points of the design, rather than changing behavior quietly without communication or discussion; that has caused a large number of avatar problems, as well as some world bugs that still have not been fixed.
How long does it take for an avatar to be scanned? Mine has been stuck on the "?" for a while now and is showing the error robot. I'm trying to see if it's ready to put on my Gumroad.
What’s the current status on the Mac version of VRChat Creator Companion? The commandline stuff works sorta-okay but there’s a lot of stuff in the CLI that’s still pretty difficult to do (and poorly-documented).
This topic was automatically closed after 14 days. New replies are no longer allowed.