CRC32 stands for an algorithm: Cyclic Redundancy Check, and the 32 is how many
bits the final result has. The algorithm is basically a sped-up way of finding the
remainder of a division...
say we got a number: 123 and we divide it by 5... the answer is 24 with a remainder of 3...
whenever we divide by 5, we know the remainder is between 0 and 4
Say we got a number: 12342878729874120987 and we divide it by 7.. the remainder is 6..
now let's assume the contents of a file (a file is basically bits and bytes, which are
just numbers) are the number we want to divide... and we keep just the remainder..
That remainder will be your CRC number...
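Just to make the "keep only the remainder" idea concrete, here is a tiny Python sketch
(the file name "somefile.bin" is only an example, and this is NOT the real CRC32 math,
which works on polynomials rather than ordinary integers):

    # plain remainders, like the examples above
    print(123 % 5)                          # prints 3
    print(12342878729874120987 % 7)         # prints 6

    # now treat a whole file as one huge number and keep only a remainder of it
    data = open("somefile.bin", "rb").read()       # example file name, any file will do
    as_number = int.from_bytes(data, "big")        # the file's bytes read as one big integer
    print(as_number % 4294967296)                  # a toy 32-bit "checksum", just for the idea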
But you will say: "it's impossible to divide a number with 15,000,000 digits that fast,
otherwise we would be cracking all those credit card numbers..." well, we are not really
dividing, we are just trying to find out the remainder.. and we don't call SFV a
remainder algorithm but Cyclic Redundancy Check after all... This has to do with
HOW simple the CRC can be calculated (simple being a relative term).. The thing is
that it's just a simple set of XOR gates and shift-register operations..
That's why the CRC check is so popular, it can be easily implemented in hardware...
that's why RARs, Zips, ARJs, and network protocols like Ethernet use it... ;D
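To show how simple those XOR-and-shift operations really are, here is a minimal bit-by-bit
sketch in Python of the standard CRC32 used by SFV/ZIP (the reflected form with polynomial
0xEDB88320). Real implementations use a precomputed table or dedicated hardware, but the
math is the same:

    def crc32_bitwise(data: bytes) -> int:
        crc = 0xFFFFFFFF                            # start with all 32 bits set
        for byte in data:
            crc ^= byte                             # mix the next byte into the low bits
            for _ in range(8):                      # process the byte one bit at a time
                if crc & 1:
                    crc = (crc >> 1) ^ 0xEDB88320   # shift, then XOR with the polynomial
                else:
                    crc >>= 1                       # just shift
        return crc ^ 0xFFFFFFFF                     # final inversion

    # sanity check against Python's built-in zlib version
    import zlib
    assert crc32_bitwise(b"123456789") == zlib.crc32(b"123456789")   # both give 0xCBF43926

Notice there is no division anywhere: just one XOR and one shift per bit.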
Ahh.. the age-old question... Yes, there are better algorithms...
you see, CRC32 generates 32 bits... 32 bits are not enough to ensure that our 15meg file
is unique..
You see, 32 bits gives us 4,294,967,296 possible values, while the number of possible 15meg
files is 2^125,829,120 (in decimal that's a number with roughly 38 million digits)..
as you can see there are vastly more different 15meg files than CRC values, so it is
possible that our CRC32 will generate the same code for two completely different files..
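If you want to double-check those counts yourself, here's a quick back-of-the-envelope
in Python (15meg taken as 15*1024*1024 bytes):

    import math

    BITS = 15 * 1024 * 1024 * 8                      # 125,829,120 bits in a 15meg file
    crc_values = 2 ** 32                             # 4,294,967,296 possible CRC32 codes
    digits = math.floor(BITS * math.log10(2)) + 1    # decimal digits in 2**BITS

    print(f"possible CRC32 values: {crc_values:,}")
    print(f"possible 15meg files: 2**{BITS}, a number with about {digits:,} digits")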
However this is not really a problem.. Usually when files get corrupt they don't have
A LOT of changes made to them (just a few bits/bytes), and CRC32 is very good at catching
small, localized damage (any error burst shorter than 33 bits is guaranteed to be detected)..
Another thing that saves us is that we generate SFV files on relatively small files
(15meg parts), not complete ISOs (700meg)...
But as I said, better algorithms do exist... For example large files (ironically, often
Linux ISOs) are verified with the MD5 algorithm as opposed to SFV... MD5 produces 128 bits and
has better avalanching than CRC32 (meaning flipping even a single input bit changes roughly
half of the output bits)... Why is MD5 not used everywhere? SFV simply became popular first..
MD5 is catching on, but doesn't have as wide a use as SFV yet.
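A quick way to see the size difference and the avalanching for yourself, using nothing but
Python's standard library (the sample data here is made up, of course):

    import hashlib, zlib

    data    = b"just some example data"
    flipped = bytes([data[0] ^ 0x01]) + data[1:]     # same data with a single bit flipped

    print(f"CRC32: {zlib.crc32(data):08x}  vs  {zlib.crc32(flipped):08x}")   # 32-bit codes
    print(f"MD5:   {hashlib.md5(data).hexdigest()}")                         # 128-bit digest
    print(f"       {hashlib.md5(flipped).hexdigest()}")                      # nearly every hex digit changes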
Further reading recommended:
RFC 1320 is about the MD4 Message-Digest Algorithm.
RFC 1321 is about the MD5 one.
Also you might want to head to your local bookstore and grab some compression books (CRC checks
are used in compression a lot) and networking books (again, CRC is popular there).. Also books
on discrete math might be a good help (data integrity issues, how values spread over the
output space, error detection, automatic error correction). A local university/college is always
a plus... ;D