
[Closed] Compare two files prior to copy

Hello, I’m doing a simple resource collector where I copy textures from A to B. I would like to copy the texture only if the target file is different. If it’s the same, skip the copy.

How do I compare two files? I thought about comparing their hashes with getHashValue(), but it doesn't work on files. I can't rely on their size, since two different textures can have the same size.

5 Replies
 lo1

What is the point of creating a checksum and then comparing the checksums for every operation?
To create the checksums you have to read both files in their entirety!

Checksum comparison only pays off if you can precompute the checksums, and that is not the case here.

Here is a slightly faster method: a straight byte comparison that exits early as soon as a difference is found:

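fn areFilesIdentical a b =
(
	-- files of different size can never be identical
	local size = getFileSize a
	if size != (getFileSize b) do return false
	
	-- "b" = binary mode (no text-mode translation), "S" = hint for sequential access
	local fA = fOpen a "rbS"
	local fB = fOpen b "rbS"
	if fA == undefined or fB == undefined do
	(
		if fA != undefined do fClose fA
		if fB != undefined do fClose fB
		return false
	)
	
	local different = false
	local count = 0
	local numLongs = size / 8 -- number of complete 8-byte blocks
	
	-- compare 8 bytes at a time, stopping at the first difference
	while (not different) and (count < numLongs) do
	(
		if (readLongLong fA) != (readLongLong fB) do different = true
		count += 1
	)
	
	-- compare the 0-7 leftover bytes that do not fill a whole block
	count = 0
	while (not different) and (count < size - numLongs * 8) do
	(
		if (readByte fA) != (readByte fB) do different = true
		count += 1
	)
	
	fClose fA
	fClose fB
	
	not different
)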

But let's first think about the consequences of comparing two files before copying one over the other:

  • If the files are different, the comparison will be fast, because the files are only read up to the first differing byte. Your algorithm then performs the copy anyway, so the time spent comparing was wasted: it did not avoid the copy.

  • If the files are identical, both files are read and compared in full, and the copy can be skipped.
    The only case in which this could even theoretically be faster than not comparing at all is when two full reads are faster than one read plus one write (the overwrite).
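Putting rough symbols on this argument: let R be the time to read one texture in full, W the time to write it, and ε the short time spent reading up to the first difference (symbols introduced here for illustration, not measured values). Copying without any check always reads the source and writes the target:

```latex
\begin{aligned}
T_{\text{copy only}} &= R + W \\
T_{\text{compare, files differ}} &= \varepsilon + R + W \;>\; T_{\text{copy only}} \\
T_{\text{compare, files identical}} &= 2R, \qquad 2R < R + W \iff R < W
\end{aligned}
```

So the byte check only wins when identical files are common and reading is cheaper than writing; for every file that does differ it costs an extra ε.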

edit: fixed function

I would just compare the files' modification dates.

getFileModDate <filename_string>

You can also call

getFileSize <filename_string>

to get the file size in bytes.
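A minimal sketch of that date-and-size check using only the two built-in functions above. `filesLookSame` and the example paths are hypothetical names. It assumes that a copy keeps the source's modification time (Windows' own file copy does; I'm assuming MAXScript's `copyFile` behaves the same), and it cannot catch a target that was changed without its date or size changing.

```maxscript
-- Hypothetical helper: true when dst exists and matches src in size and modification date.
-- Assumes copying preserves the source's modification time.
fn filesLookSame src dst =
(
	-- a missing target always needs a copy
	if not (doesFileExist dst) do return false
	local sameSize = (getFileSize src) == (getFileSize dst)
	local sameDate = (getFileModDate src) == (getFileModDate dst)
	sameSize and sameDate
)

-- Usage inside a collector (paths are placeholders)
(
	local src = @"C:\textures\wood.tga"
	local dst = @"D:\collect\wood.tga"
	if not (filesLookSame src dst) do
	(
		-- copyFile fails if the target already exists, so remove the stale copy first
		if doesFileExist dst do deleteFile dst
		copyFile src dst
	)
)
```

Both calls only touch the file system metadata, so the cost does not grow with texture resolution the way a byte comparison does.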

In my code, I use this function which I think Laszlo Sebo wrote back at Prime Focus:

	fn getFileInfoDotNet theFileName =
	(
		-- list the files in the folder whose name matches the requested file name
		local fileLookup = dotnetobject "System.IO.DirectoryInfo" (getFileNamePath theFileName)
		local allMatchingFiles = #()
		try (
			allMatchingFiles = fileLookup.getFiles (filenamefrompath theFileName) 
		) catch() -- e.g. the folder does not exist
		if allMatchingFiles.count == 1 then
		(
			-- return #(full path, last write time as a string, size in bytes)
			local dotNetFile = allMatchingFiles[1]
			local fname = dotNetFile.FullName
			local date_ = dotNetFile.lastWriteTime.ToString()
			local size_ = dotNetFile.length
			#(fname, date_, size_)
		)
		else
			#() -- file not found (or the name matched several files)
	)

I call it on both files and compare the second and third elements (date and size); if either differs, I assume the files are different.

A CRC computed in a compiled C# assembly is the right solution for comparing files byte by byte.
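A sketch of the compiled-assembly idea, with one swap: instead of a CRC it compares the two files directly in 64 KB chunks, since (as argued earlier in the thread) a checksum still has to read both files in full. `FileComparer`, `AreIdentical` and `compileFileComparer` are hypothetical names; the in-memory compile uses the standard .NET CodeDom classes and assumes the .NET Framework runtime hosted by 3ds Max.

```maxscript
-- Hypothetical helper: compiles a small C# class in memory and returns an instance,
-- or undefined if compilation fails. The C# source must not contain double quotes
-- or backslashes, because it lives inside a MAXScript string literal.
fn compileFileComparer =
(
	local source = "
using System.IO;

public class FileComparer
{
	// Fills buf from the stream until it is full or the stream ends; returns the byte count.
	static int ReadFull(Stream s, byte[] buf)
	{
		int total = 0;
		int n;
		while (total < buf.Length && (n = s.Read(buf, total, buf.Length - total)) > 0)
			total += n;
		return total;
	}

	// True when both files exist and contain exactly the same bytes.
	public bool AreIdentical(string a, string b)
	{
		FileInfo fa = new FileInfo(a);
		FileInfo fb = new FileInfo(b);
		if (!fa.Exists || !fb.Exists || fa.Length != fb.Length) return false;
		byte[] bufA = new byte[65536];
		byte[] bufB = new byte[65536];
		using (FileStream sa = fa.OpenRead())
		using (FileStream sb = fb.OpenRead())
		{
			while (true)
			{
				int n = ReadFull(sa, bufA);
				if (n != ReadFull(sb, bufB)) return false;
				if (n == 0) return true;
				for (int i = 0; i < n; i++)
					if (bufA[i] != bufB[i]) return false;
			}
		}
	}
}"
	local provider = dotNetObject "Microsoft.CSharp.CSharpCodeProvider"
	local params = dotNetObject "System.CodeDom.Compiler.CompilerParameters"
	params.GenerateInMemory = true
	local results = provider.CompileAssemblyFromSource params #(source)
	if results.Errors.HasErrors do
	(
		print "FileComparer failed to compile"
		return undefined
	)
	results.CompiledAssembly.CreateInstance "FileComparer"
)

-- Usage: compile once, then reuse the instance for every texture (paths are placeholders)
(
	local comparer = compileFileComparer()
	if comparer != undefined do
		print (comparer.AreIdentical @"C:\textures\wood.tga" @"D:\collect\wood.tga")
)
```

Compiling is relatively slow, so it is worth doing once per session (for example into a global) rather than once per file. This removes the per-block MAXScript overhead, but when the files are identical it still has to read both of them in full.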

Thanks for all your replies,

Lo, your method is pretty slow for identical 2048×2048 maps (~800 ms), but that was expected.

I think I'll stick with the files' modification dates; that should be enough.