I might be reading this wrong, but just to be clear, SDK2 worlds are definitely not 80% of our existing worlds. At this point, they’re a minority both in raw volume and in active worlds.
In addition, I do not believe this change would greatly affect SDK2 worlds, although I could be wrong. Tau (again) would have a better idea.
I agree that we don’t want more broken worlds, but I think it’s worth weighing how much is already broken today and how niche the present issue is.
Even if the proposed future fix were applied unilaterally, normal instance master changes should still be detectable on player leave events. It’s only in the scenario where an Android/Quest instance master sleeps in those worlds and the world caches/requires isMaster logic that it could enter into a weird state, since the master would change without firing the player leave event. But such worlds are already entering into a weird state as is when the networking breaks because there’s no active instance master to handle the Udon, hence the need for this change at all.
So if my understanding is correct, most worlds shouldn’t simply break outright from the change. It’s only in the scenario above where behavior would be different than expected, which is already somewhat uncertain or simply broken for many worlds anyways (Fax’s examples of No Time Two Talk and Super VR Ball for example).
The present situation is (as I understand it):
Master Sleeps → No Networking Occurs → Master Eventually Times Out (or returns and we stop here as networking resumes) → Leave Event Triggers → Master Changes.
The end result is no guarantee the world still works after the long pause in networking, and in many cases we know it doesn’t.
The proposed change without fallback to old behavior (as I understand it):
Master Sleeps → Master Changes (would apply for all worlds, but no event listened to for it in old worlds).
In this scenario there’s still no guarantee the world still works, but at least we skip the lack of networking for the timeout period, and there’s a chance the world continues to function if it doesn’t rely on the master or caches them. In some scenarios, this may even fix existing worlds that break under the current paradigm.
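To make the two sequences above easier to compare, here is a minimal toy model in Python. It is only a sketch of the flows as described in this post, not the VRChat or Udon API: the `Instance` class, the function names, and the rule that the next-oldest player becomes master are all assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Toy model of an instance (hypothetical, not the VRChat API)."""
    players: list                      # join order, oldest first
    master: str = ""
    networking_active: bool = True
    events: list = field(default_factory=list)

    def __post_init__(self) -> None:
        # Assumption: the oldest player starts as instance master.
        self.master = self.players[0]


def master_sleeps_current(inst: Instance, master_returns: bool) -> None:
    """Current behavior as described: stall, then timeout, leave event, new master."""
    sleeper = inst.master
    inst.networking_active = False     # nobody handles master-side Udon
    if master_returns:
        inst.networking_active = True  # networking resumes, master unchanged
        return
    # Timeout: the sleeper is dropped, which fires a leave event.
    inst.players.remove(sleeper)
    inst.events.append(f"OnPlayerLeft({sleeper})")
    inst.master = inst.players[0]
    inst.networking_active = True


def master_sleeps_proposed(inst: Instance) -> None:
    """Proposed behavior without fallback: master moves, no leave event fires."""
    sleeper = inst.master
    # The sleeper stays in the player list; only the master role moves.
    inst.master = next(p for p in inst.players if p != sleeper)
```

In the first flow the new master can be detected through the leave event, but only after the stall. In the second flow the master changes right away and an old world has no event to notice it with.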
Yea you did read it wrong.
Don’t worry about it.
What I meant is that worlds that are not updated every year are the large majority.
And in some cases updating them is impossible because the creator has been dead for several years, yet people still go there because what they created is still regarded as a unique experience not found in other worlds, even years later.
One use case that needs to be considered with this weird state:
Say the world is designed so that each joining player is assigned a script/object that they are supposed to be the sole owner of, and that object is cleared/disabled when the owner leaves the world.
If it is, for example, designed to count the owner’s time spent in a zone, and the owner is a hibernating Quest user, then suddenly a different player than the intended owner is unknowingly making use of it, and the ownership is not given back when that person stops hibernating.
This would cause a persistent glitch in the world that might not even be resolved when the player reconnects. It all comes down to how it was coded, but the root issue is that most people are unaware of these things happening, meaning they are left wondering why it fails seemingly at random, and none of this is noticed in any testing they do.
I try to make my scripts as robust as imaginable, but if that imagination does not include a known and difficult to diagnose/spot issue, then nothing I do will ever fix it.
These changes will most likely break my Object Pool prefab.
It was written with the assumption that only Master will perform certain logic. Part of these assumptions, and why Owner was not used, were to avoid specific situations that would easily cause network race conditions when the person performing the logic could easily switch between players still in the instance. These new changes break that assumption. Looking at the code, there is a cached isMaster, so these changes would have some effect eventually if it isn’t obvious right away.
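A rough illustration of why a cached isMaster matters here, written as a toy Python model rather than Udon code. Everything in it (`ToyInstance`, `CachedMasterScript`, `reassign_master_silently`) is hypothetical and only mirrors the assumption described above: the script refreshes its cached flag on player leave events and nowhere else.

```python
class ToyInstance:
    """Hypothetical stand-in for an instance; not the VRChat API."""

    def __init__(self, players):
        self.players = list(players)       # join order, oldest first
        self.master = self.players[0]
        self.listeners = []

    def leave(self, player):
        # Normal case: a leave event accompanies any master change.
        self.players.remove(player)
        if self.master == player:
            self.master = self.players[0]
        for listener in self.listeners:
            listener.on_player_left(player)

    def reassign_master_silently(self, new_master):
        # Assumed shape of the proposed change for old worlds:
        # the master moves and no event reaches the script.
        self.master = new_master


class CachedMasterScript:
    """A world script that caches 'am I master?' and refreshes it on leave events."""

    def __init__(self, instance, local_player):
        self.instance = instance
        self.local_player = local_player
        self.is_master_cached = instance.master == local_player
        instance.listeners.append(self)

    def on_player_left(self, player):
        self.is_master_cached = self.instance.master == self.local_player

    def is_master_live(self):
        # Asking every time avoids the stale cache.
        return self.instance.master == self.local_player
```

In this model, after a silent reassignment the live check reports the new master immediately, while the cached flag stays stale until some later leave event refreshes it.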
The object pool has been used in multiple popular worlds, including the two worlds listed in the first post, and is breaking with the current quest sleep master issues. Along with needing to fix or rewrite the object pool after these changes, any world using an older version would break and the author would be forced to update. This is “normal” for a constantly changing platform, as much as I dislike how much work and stress is put on creators.
The issue here, which is a tangent, is that I am no longer a world creator, and have no desire to do major changes to a system that has been tested and verified working for multiple years. I also have no reasonable way to test changes. General networking is hard enough to test, but trying to test weird edge cases is even harder.
Now all of this is just a single former creator complaining. This thread should be used to see how creators are using “is Master checks”, and hopefully provide solutions for creators to stop using them. In my selfish case, these changes will mean I can no longer guarantee the quality of my prefab, putting any creator who uses it at risk of unusual bugs.
Potential solutions for my specific issue:
Provide better ways to test networking edge cases.
This doesn’t actually solve anything, but would help other people verify if something, like the pool prefab, does or does not have obvious bugs due to the changes. Hai proposed some, but making general networking testing easier should be prioritized at some point.
Replace the need for my object pool to use master/owner logic in the first place.
My object pool is a complicated system to solve a simple problem: assigning a synced, unique, unchanging index within the range of the world cap to every player in the instance. If VRChat provided this index value along with the playerId, it would hopefully be a simple replacement in my object pool, removing the need for master logic in the first place, and VRChat would then be responsible for testing it rather than users with few resources.
Replace the need for my third party object pool system entirely.
VRChat teased Player Programs years ago. This clearly is out of scope for this post…
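For readers unfamiliar with the "unique unchanging index within world cap" idea from the second solution above, here is a minimal single-machine sketch in Python. It is not the object pool's actual code and not a VRChat feature; `PlayerIndexAllocator` and the player ID values are made up for illustration. The hard part in a real world is deciding who runs this logic and syncing the result, which is exactly where the master assumption comes in.

```python
from typing import List, Optional


class PlayerIndexAllocator:
    """Assigns each player a slot index in [0, world_cap) that never changes while they stay."""

    def __init__(self, world_cap: int) -> None:
        # slots[i] holds the player ID using index i, or None if the slot is free.
        self.slots: List[Optional[int]] = [None] * world_cap

    def on_player_joined(self, player_id: int) -> int:
        # Take the lowest free slot; raises ValueError if the instance is full.
        index = self.slots.index(None)
        self.slots[index] = player_id
        return index

    def on_player_left(self, player_id: int) -> None:
        # Free the slot without shifting anyone else's index.
        self.slots[self.slots.index(player_id)] = None

    def index_of(self, player_id: int) -> int:
        return self.slots.index(player_id)
```

Freed slots are reused by later joiners, so indices stay within the world cap, and the players who remain keep the index they were first given.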
Since it was mentioned on Discord that someone else’s system also used master for assignment-related tasks, I am curious how many people use isMaster checks for object pools or player assignment. If enough people do, it could be worth prioritizing a solution to replace those alongside these master changes to ensure nothing breaks.
This wouldn’t be the case even with VRC’s proposed change though would it? Because all the proposed change affects is instance master transference being able to occur and made known to Udon outside of the player leave event. In the above scenario no item is refunded from the sleeping user because they are still in the world and haven’t triggered a leave event. The issue would mainly be if there’s some special logic that only the master is intended to perform, they may not know it until a leave event occurs and they can then detect they are now the master (even if they already had been).
Well, it might be simpler to fix than you’re imagining, so the rest of us could probably fix it for you.
So no worries.
But that depends on what the solution for this topic ends up being.
I don’t deny that it could be simple to fix. The problem is verifying it is actually fixed. I spent months verifying and testing my object pool before release, working closely with a few creators who found specific and edge case bugs. I don’t expect anyone to go through that process again, including myself.
Hmm, looking at your object pool code, that is indeed a problem. An easily fixable one, with an OnMasterChanged callback, which would also push the burden of making that event reliable and correctly timed onto us - but as you said, that’s sort of beside the point here. In the end, it is one more reason to implement this as opt-in only.
I don’t have a good answer to this part of your post, but from an engineer’s perspective, I feel your pain - it certainly applies to working on our end of networking as well :)
Implicitly master-owned objects will not call ownership requests as far as I can tell. Previously, this wouldn’t have made much sense, given that the master it was being stolen from was already gone, and the same applies here to some degree with the player unable to respond.
Can you explain what the player programs thing is? This is the first time I’ve heard of it.
Player programs were mentioned during the April 2021 dev stream as part of the persistence update. This is timestamped to the slide mentioning it, but you should go back about 4 minutes to hear the beginning of the persistence section.
Interesting read. Two thoughts. First, has any thought been given to “pre-event” events? As far as I can tell, we only get control after something has occurred (including simple actions like menu stuff). Having access to an event that could process/prepare and even cancel an action would be a nice feature. It seems to me that if we had an event signalling that a master transfer was about to occur, code could handle it in some world-specific way if needed. Obviously, ignoring it would make things work the default way.
The other thing is… I use the master check at the startup of almost anything that can be sync’d. Perhaps there is a better way but if the user is the master (i.e. first person in the world) then the initialization occurs (only once) and all subsequent joiners get the sync’d values.
If there is a better way, I’d be happy to discuss it in another thread.
You can just check Networking.IsOwner(this.gameObject) in most cases as a drop-in. That said, none of the changes discussed here would actually break specifically the “initialize once on first world join” part.
As for the “pre-events” thing, I’m not sure exactly what you mean? Or rather, how that would help?
Ah yes I could check for the owner of the object in Start and the owner would be the master at that time. I might change it to that, the important thing is to have it done once by the first person in.
As for “pre-events” (silly term), there are 2 components to my question. Our methods are called after an event has happened, right? So the player has jumped or the menu has opened, and then our code runs. Wouldn’t it be handy to have access prior to the default handler? Optionally, we could cancel the default action in some cases. Don’t take this as a perfect example, but we have very limited controller inputs and almost all are accounted for. If, for instance, I wanted to use the Jump button for something else and only jump on a condition I determined, can I do that? I believe that when I tried, I got the default action along with my menu opening. I didn’t want to jump in that case.
That’s actually part 2… part 1 was along the lines of “OnPlayerJoining” or whatever. Not joining necessarily (and I haven’t worked out what I would do with it) but probably just other hooks so we could be alerted before an event takes place.
If this breaks the pool table code I’ll probably fix it.
Feedback:
“Master does not change without someone leaving” has been a part of the networking specification since Udon launched. This is a change in specification and thus should be strongly versioned, not hotfixed.
It’s not fair to the unpaid development community to have to refactor their work on-the-fly to cope with shifting requirements when this is fundamentally a platform bug that is on VRChat’s end to fix.
That said, my stance on the VRChat networking system for Udon is that it is overall not fit for purpose as it stands; my feedback is by nature going to be highly negative. I would anticipate that the overall impact of this patch would be to replace one set of bugs with another. The upside is that these bugs would be fixable by the community; the downside is that the bugs would have to be fixed by the community.
I would encourage the team to consider that building on technical debt is a bad idea, and to consider methods of moving off of the current networking model to something that is more scalable and appropriate to a platform for creating reliable, compelling (and profitable, for professional devs) experiences.
Lots to unpack here, and I’m glad this talks about Quest timeouts and stuff.
But this post reveals a larger issue with the official VRChat Documentation especially on networking. There are a lot of nuances that NEED TO BE UPDATED. Please update the documentation.
A lot of people, including myself, are using VRChat to learn the basics of programming and making games using Udon. It is absolutely maddening when the official documentation doesn’t give the correct answers, or gives only half of them.
We don’t currently have a way of “versioning” network changes, but in this case it would technically be toggleable. This is the idea of making it opt-in via an SDK or web setting, and I think we’ve established most people want this by now :)
Working towards making certain features versionable is definitely good feedback that we can discuss.
As a response to not just this, but a few people sharing the same sentiment above (note that I am specifically talking about the issue at hand here - I know our communication and API stability in general has been a hot topic for a while, but I’m the wrong person to comment on that and this isn’t the place):
I think in this particular case it’s ignoring the technical reality of the situation. This bug exists now, and it is breaking worlds - if you have a solution that does not involve breaking content, I’m all ears! This is what this thread is for, we want to collect feedback in all different directions - but at the same time I don’t think it’s productive to put our heads in the sand and declare the issue unfixable. Be that because it changes APIs, or because it is (and I’m not denying this part) adding onto tech debt. IMHO, the more productive thing we can discuss is how things should change, or which parts should be added to help a transition instead.
It will get better as we fix the timeout limit bug, and potentially implement the PC-player prioritization (so switch to not-the-oldest player, but still only when current master leaves). This will get us most of the way there already. If master-transfer behaviour is then toggleable, it would be more of a feature, and less a breaking change.
Moving off our current networking model entirely would be a gargantuan task with even more breakage among older content, if not an (Udon) content wipe entirely. Pretty sure that one isn’t happening any time soon. I also don’t think the current model is that bad, all things considered - networking is hard, no matter how you do it.
As for @pikapetey’s post - improving documentation, and therein specifying more behaviour explicitly, is definitely on our roadmap :)