DIFF-based world downloads?

Hello,

I was curious whether there is any way to do diff-based world downloads. I know that git can already be used with Unity, but I wasn't sure how well it works or whether it can include objects. In other words, could the same concept git uses to store changes (diffs) be used for world downloads (or even avatar changes)?

I think it would be massively helpful to download only the changes to a world and apply them locally, rather than re-downloading the entire world. This would be great for larger worlds when there are updates (such as LS Media, which needs to download 217 MB every time there is a small change).

I will put this in a feature request, but I wanted to see if the idea made sense first.

diff is line based. A binary differential might work, but where you might run into trouble is handling compression. The most aggressive compression algorithms make a dictionary based on the file.

So to do differences the compression would need to be weakened.

I think VRChat is unlikely to implement this for two reasons. The first is that it would require more computing power to do the conversion and calculate the differential. The second is that they might have to rewrite how assets are bundled. But maybe there is a flag for that.

If you do file a feature request, please look into whether Unity itself can make asset bundles compatible with differential updates. It's possible that games split things up into separate bundles?

Linux AppImage might be worth a look. Those can be compressed in an updatable way.

Worlds are compiled into a single compressed asset bundle. Slight changes to a world can result in a completely different binary representation of the entire file, and thus not be meaningfully diff-able.
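To make that concrete, here is a rough Python sketch of the effect. It assumes zlib as a stand-in compressor and made-up "world" data; it does not use VRChat's or Unity's actual bundle format or compression. It changes eight bytes of the raw data and then counts how many byte positions differ before and after compression.

```python
import random
import zlib


def differing_bytes(a: bytes, b: bytes) -> int:
    """Count positions where a and b differ, plus any length mismatch."""
    common = min(len(a), len(b))
    mismatches = sum(1 for i in range(common) if a[i] != b[i])
    return mismatches + abs(len(a) - len(b))


# Made-up stand-in for world data: repetitive, compressible, fixed seed.
rng = random.Random(0)
words = [b"mesh", b"texture", b"shader", b"prefab", b"collider", b"light"]
old_raw = b" ".join(rng.choice(words) for _ in range(20000))

# A "small update": overwrite the first eight bytes.
new_raw = b"PATCHED!" + old_raw[8:]

# zlib is only an illustrative compressor here, not what VRChat uses.
old_packed = zlib.compress(old_raw, 9)
new_packed = zlib.compress(new_raw, 9)

raw_changed = differing_bytes(old_raw, new_raw)
packed_changed = differing_bytes(old_packed, new_packed)

print("raw bytes changed:", raw_changed)
print("compressed bytes changed:", packed_changed, "of", len(new_packed))
```

The raw data differs in only a few positions, while the compressed streams diverge over a much larger share of their length, which is why a byte-level diff of the compressed files saves little.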

Binary data in general doesn't diff well, which is one of the reasons Git LFS exists.

Uncompressed asset bundles might be diffable to a degree but would also significantly increase download size.

VRChat uses Unity’s assetbundle format, which is a “UnityFS” bundle. As far as I can tell the compression is at the individual file level so a binary diff would actually work pretty well. However, there are still a lot of little gotchas that you can run into with this stuff, and I can imagine that they wouldn’t want to keep around a billion versions of every file so that they can compute patches between arbitrary versions.

"Binary data in general doesn't diff well and is one of the reasons why Git LFS exists."

Not exactly… git does fine with binary files, but it wasn’t designed for large files. Hence why it’s called LFS (Large File Storage), not BFS (Binary File Storage).

In general, uncompressed binary files diff just fine. Compressed files don't diff well, regardless of whether they are text or binary. So if you do diffs, you want to diff the uncompressed data, and then compress the resulting diff. How feasible it is for VRChat to actually do that is a different question.
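As a rough illustration of "diff the uncompressed data, then compress the diff", here is a minimal Python sketch. The delta format is a deliberately naive one invented for this example (shared prefix length, shared suffix length, and the bytes in between); real binary diff tools such as bsdiff or xdelta are far smarter. The names make_patch and apply_patch are hypothetical, and the data is random bytes, not a real world file.

```python
import random
import zlib


def make_patch(old: bytes, new: bytes) -> bytes:
    """Naive delta: keep only the bytes between the shared prefix and suffix."""
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1
    middle = new[prefix:len(new) - suffix]
    header = prefix.to_bytes(8, "big") + suffix.to_bytes(8, "big")
    # Compress the delta itself, not the inputs.
    return zlib.compress(header + middle, 9)


def apply_patch(old: bytes, patch: bytes) -> bytes:
    """Rebuild the new data from the old data plus a patch from make_patch."""
    raw = zlib.decompress(patch)
    prefix = int.from_bytes(raw[:8], "big")
    suffix = int.from_bytes(raw[8:16], "big")
    middle = raw[16:]
    return old[:prefix] + middle + old[len(old) - suffix:]


# Made-up uncompressed "world": 200,000 pseudo-random bytes from a fixed seed.
rng = random.Random(1)
old_world = bytes(rng.getrandbits(8) for _ in range(200_000))
# A small edit in the middle of the data.
new_world = old_world[:120_000] + b"small change" + old_world[120_010:]

patch = make_patch(old_world, new_world)
rebuilt = apply_patch(old_world, patch)

print("full download:", len(new_world), "bytes; patch:", len(patch), "bytes")
```

This only handles a single contiguous edit well. Several scattered edits would make the "middle" span almost the whole file, which is where a proper binary diff algorithm would be needed.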

That said, I have no idea how common it is for players to load a world that’s in cache but out of date, but my guess is it’s probably not that common. I think most worlds you’re either visiting for the first time, or revisiting when they haven’t been updated.

Git handles large and mostly unchanging files fine too. “Large files” refers to files that are large, difficult to diff, and change frequently, i.e., files that will rapidly bloat a repo. These will usually be binary files.

So LFS is BFS :grinning:

The large files tend to be binary files only because text files tend to be small. If you have large text files, you’ll encounter the same problems you do with large binary files. You don’t put small binary files on LFS, because again, it’s LFS, not BFS.

It's probably only the movie worlds cycling links around that have so many world updates. The avatar search world downloads the list from GitHub on load.

Otherwise, most worlds I've visited only rarely need changes.

You absolutely put small binary files in LFS if they are large in number and change frequently, or else you run into the same problems you do with large files. Likewise, you generally don’t put large text files into LFS unless the entirety of the file changes frequently (Git by itself handles small changes to large files like this without issue).

LFS was created to deal with problems frequently faced by developers whose repos use large amounts of binary data. Though those problems can be created by text files (and not be created by binary files) in certain repos, the name of the extension itself is ultimately just semantics.

I looked up LFS, and from what I can tell it avoids diffing the designated files. Instead, the repo tracks references to them, possibly in the form of a hash.
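For what it's worth, the Git LFS spec describes a small text "pointer" file that is committed in place of the real content: a version line, a SHA-256 object ID, and the size in bytes. Here is a minimal Python sketch that builds such a pointer. The function name lfs_pointer and the sample blobs are made up for illustration; this is not the git-lfs tool itself.

```python
import hashlib


def lfs_pointer(content: bytes) -> str:
    """Build the text of a Git LFS style pointer for the given file content."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


# Hypothetical large binary blobs standing in for two versions of an asset.
blob_v1 = b"\x00" * 1_000_000
blob_v2 = b"\x00" * 999_999 + b"\x01"

pointer_v1 = lfs_pointer(blob_v1)
pointer_v2 = lfs_pointer(blob_v2)

print(pointer_v1)
```

The repo history then only records the tiny pointer changing from one hash to another, while the actual blobs are stored outside the normal Git object database.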

Looks like zsync is what Linux AppImage might use. See AppImageCommunity/zsync2 on GitHub, a rewrite of https://github.com/AppImage/zsync-curl in modern C++ that provides both a library and standalone tools.
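The general idea behind rsync-style tools like this is that the server publishes checksums for each block of the new file, and the client works out which blocks it already has locally so it only fetches the rest. Below is a much-simplified Python sketch of that idea. It is not zsync's actual format or algorithm: the block size, the function names, and the use of SHA-256 are assumptions for illustration, and it only matches blocks at aligned offsets.

```python
import hashlib
import random
from typing import List

BLOCK_SIZE = 4096  # illustrative value, not taken from zsync


def block_hashes(data: bytes) -> List[str]:
    """Hash each fixed-size, aligned block of the data."""
    return [
        hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
        for i in range(0, len(data), BLOCK_SIZE)
    ]


def plan_download(local: bytes, remote_hashes: List[str]) -> List[int]:
    """Return indexes of remote blocks that are not already present locally."""
    have = set(block_hashes(local))
    return [i for i, h in enumerate(remote_hashes) if h not in have]


# Made-up cached file: ten blocks of pseudo-random bytes from a fixed seed.
rng = random.Random(2)
cached = bytes(rng.getrandbits(8) for _ in range(BLOCK_SIZE * 10))

# The "updated" file flips one byte inside block 3, keeping the length the same.
updated = bytearray(cached)
updated[3 * BLOCK_SIZE + 5] ^= 0xFF
updated = bytes(updated)

# In a real setup the server would publish these hashes in a control file.
missing = plan_download(cached, block_hashes(updated))

print("blocks to fetch:", missing, "out of", len(block_hashes(updated)))
```

Note that an insertion or deletion would shift every later block and defeat this aligned-only version. Rsync-style tools deal with that by using a rolling checksum to find matching blocks at any offset.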