Why does PresSTORE not use Byte-Level Synchronization?
Posted by Sven Koester on 02 June 2008 10:48
Historically, byte-level differencing (byte-patching) was used to upgrade existing installations of large distributions without having to send the user the whole, possibly very large, file again.
For example, an OS vendor might release 10 GB of data for an operating system and later discover that parts of it need to be fixed. Instead of delivering 10 GB again, the vendor computes the byte-level difference between the old and the new distribution and supplies the customer with only that difference, which is much smaller and therefore easier to download or send by email. The customer then applies this patch to the existing 10 GB installation, which yields the new 10 GB image.
The technique was developed some 10-15 years ago, when networks were slow (modems, etc.) and the cost of transferring a file far exceeded the cost of calculating checksums on the source and applying them on the target.
This is how byte-level differencing came about.
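To make the idea concrete, here is a minimal Python sketch of block-based differencing. It assumes fixed 4 KiB blocks compared at the same offsets and SHA-256 as the checksum; the function names `signature`, `make_delta` and `apply_delta` are hypothetical. This is neither PresSTORE's code nor any vendor's patch format. Real tools such as rsync use a rolling weak checksum plus a stronger hash so that inserted bytes do not shift every following block.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size; real tools choose this differently


def signature(old: bytes, block: int = BLOCK_SIZE) -> list[bytes]:
    """Target side: checksum every fixed-size block of the file it already has."""
    return [hashlib.sha256(old[off:off + block]).digest()
            for off in range(0, len(old), block)]


def make_delta(sig: list[bytes], new: bytes, block: int = BLOCK_SIZE):
    """Source side: keep only the blocks of `new` whose checksum differs."""
    delta = []
    for i, off in enumerate(range(0, len(new), block)):
        chunk = new[off:off + block]
        if i >= len(sig) or hashlib.sha256(chunk).digest() != sig[i]:
            delta.append((off, chunk))
    return len(new), delta


def apply_delta(old: bytes, new_len: int, delta) -> bytes:
    """Target side: turn the old copy into the new one using the delta."""
    out = bytearray(old[:new_len])          # truncate if the file shrank
    out.extend(bytes(new_len - len(out)))   # zero-pad if the file grew
    for off, chunk in delta:
        out[off:off + len(chunk)] = chunk   # overwrite the changed blocks
    return bytes(out)


# Usage: flip one byte in the middle of a 25,600-byte file.
old = bytes(range(256)) * 100
new = bytearray(old)
new[5000] ^= 0xFF
new = bytes(new)

new_len, delta = make_delta(signature(old), new)
assert apply_delta(old, new_len, delta) == new
print(f"{len(delta)} of {-(-len(new) // BLOCK_SIZE)} blocks sent")  # 1 of 7 blocks sent
```

Only the signature (one hash per block) and the single changed block have to cross the network; everything else is reused from the copy the target already holds.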
Today we have fast networks, and large files. BUT... large files are usually _compressed_ media files or large datasets stored in a highly compressed form. If you change just a _single_ bit of the content of a highly compressed file, the compressed file changes almost entirely, so calculating checksums is very expensive AND ends up reporting that practically the whole file has changed. You spend a lot of time calculating checksums and still have to transfer (practically) the whole file. Byte-level differencing usually makes no sense in such scenarios.
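The effect on compressed files is easy to check. The sketch below is a hypothetical experiment, not a measurement of PresSTORE: it builds about 1.25 MB of synthetic, compressible text, changes a single byte near the start, and counts how many fixed 4 KiB blocks differ before and after compressing both versions with Python's `zlib` (DEFLATE). The block size, vocabulary and random seed are arbitrary choices.

```python
import random
import zlib

BLOCK = 4096  # comparison granularity in bytes (arbitrary choice)


def changed_blocks(a: bytes, b: bytes, block: int = BLOCK) -> tuple[int, int]:
    """Count fixed-offset blocks that differ between a and b; return (changed, total)."""
    total = (max(len(a), len(b)) + block - 1) // block
    changed = sum(
        a[i * block:(i + 1) * block] != b[i * block:(i + 1) * block]
        for i in range(total)
    )
    return changed, total


# Synthetic, compressible "document": random words from a small vocabulary.
rng = random.Random(42)
vocab = ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf", "hotel"]
original = " ".join(rng.choice(vocab) for _ in range(200_000)).encode("ascii")

# Change a single byte at the start ('#' never occurs in the vocabulary).
edited = b"#" + original[1:]

raw_changed, raw_total = changed_blocks(original, edited)
zip_changed, zip_total = changed_blocks(zlib.compress(original, 6),
                                        zlib.compress(edited, 6))

print(f"uncompressed: {raw_changed}/{raw_total} blocks changed")
print(f"compressed:   {zip_changed}/{zip_total} blocks changed")
```

The uncompressed comparison reports exactly one changed block. For the compressed pair, the exact numbers depend on the data and the zlib build, but one should expect most blocks to differ: the edit alters DEFLATE's Huffman tables and LZ77 matches, and because the output is a bit-packed stream, once the length of the compressed data changes, everything after the edit is shifted. A block-based delta would then have to resend practically the whole file.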
For small(er) files, the cost of transferring the file approaches the cost of calculating the checksums, so there is no benefit there either.
About the only case where byte-level differencing makes sense is handling very large uncompressed files that need to be copied over very slow networks. The only examples that come to mind are database files, or container files in which some applications store their data (such as the Entourage mail client, which keeps user mailboxes in one large, flat, uncompressed file).
On the Mac, Apple recommends moving away from large monolithic files, as they pose problems for its Spotlight and Time Machine applications. Hence, Mac applications usually store their output in discrete, small(er) files. Applications are thus moving away from stuffing everything into one file, which makes byte-level differencing pretty useless.
Having said all that: yes, there is a need for byte-level differencing, but that need gets smaller every day, which is why we opted to ignore it completely.