[Closed] doesFileExist = Slow
I have been playing around with a tool that needs to recurse through drives or client folders and copy files. Instead of copying every file, it checks each one with doesFileExist and then compares dates using a dateTime object. The problem: the tool recurses one of my larger client folders in a couple of minutes, but as soon as the doesFileExist check is added it takes 16 minutes. I have also tried fileIo.exists, which is slightly faster at around 14 minutes.
Does anyone know of a better or faster way to check whether a file exists?
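A minimal sketch of the copy-if-newer pass, written in Python purely for illustration (it is not MAXScript). It assumes the goal is to copy a file only when the destination copy is missing or older. One `os.stat` call on the destination answers both questions, so no separate existence check is needed. The function name `sync_tree` is hypothetical.

```python
import os
import shutil


def sync_tree(src_root, dst_root):
    """Copy files from src_root to dst_root when missing or older at the destination.

    Returns the relative paths of the files that were copied.
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        dst_dir = os.path.normpath(os.path.join(dst_root, rel_dir))
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(dst_dir, name)
            try:
                # A single stat gives both "does it exist?" and its modification time.
                dst_mtime = os.stat(dst).st_mtime
            except FileNotFoundError:
                dst_mtime = None  # destination missing, so it must be copied
            if dst_mtime is None or os.stat(src).st_mtime > dst_mtime:
                os.makedirs(dst_dir, exist_ok=True)
                shutil.copy2(src, dst)  # copy2 also copies the source timestamps
                copied.append(os.path.normpath(os.path.join(rel_dir, name)))
    return copied
```

Because `shutil.copy2` carries the source modification time over, a second run over an unchanged tree should copy nothing.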
Hi Paul, have you tried checking only the file size, using FileInfo.Length?
Or even:
getFileSize <filename_string>
Returns the size of the specified file in bytes.
Returns 0 if the file could not be found.
If the file doesn’t exist, I believe that will throw an error; I’ll test it to be sure.
Thanks
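One caveat with a "returns 0 if the file could not be found" convention is that an existing empty file also reports 0, so the two cases cannot be told apart. A small Python sketch (illustrative only; the helper name `file_size_or_none` is hypothetical) that uses `None` for a missing file instead:

```python
import os


def file_size_or_none(path):
    """Return the size of path in bytes, or None if it does not exist.

    Returning None (rather than 0) keeps a missing file distinguishable
    from an existing empty file.
    """
    try:
        return os.stat(path).st_size
    except FileNotFoundError:
        return None
```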
I’m having a similar issue with a render submission tool that uses DoesFileExist; it’s very slow.
According to a post here, PathFileExists in Shlwapi.dll should be much faster:
File.Exists also instantiates CLR permissioning before checking the file exists for the file. An alternative (though I haven’t tried for performance) is PathFileExists if you’re doing a lot of checks:
[DllImport("Shlwapi.dll", SetLastError = true, CharSet = CharSet.Auto)] -- some code sample cut ...
Reply: This is MUCH faster. I just ran a non-isolated performance test on some code I had that was using File.Exists. I then re-ran it after I updated my implementation to use your approach above. The original code (on the same usage scenario) was taking 5% of the stack trace time. Using this approach, that number dropped to about 0.5%. For our process which is checking for file existence repeatedly, this represents a vast improvement. Thanks! G-Mac Jun 23 '12 at 21:17
Here’s how it can be done in MXS:
fn CreateFileAssembly =
(
    -- C# source for a small class that P/Invokes PathFileExists from Shlwapi.dll
    source = ""
    source += "using System;\n"
    source += "using System.Text;\n"
    source += "using System.Runtime.InteropServices;\n"
    source += "class FileIO\n"
    source += "{\n"
    source += "    [DllImport(\"Shlwapi.dll\", SetLastError = true, CharSet = CharSet.Auto)]\n"
    source += "    [return: MarshalAs(UnmanagedType.Bool)]\n"
    source += "    public static extern bool PathFileExists([MarshalAs(UnmanagedType.LPTStr)]string path);\n"
    source += "}\n"

    -- compile the source into an in-memory assembly
    csharpProvider = dotnetobject "Microsoft.CSharp.CSharpCodeProvider"
    compilerParams = dotnetobject "System.CodeDom.Compiler.CompilerParameters"
    compilerParams.GenerateInMemory = on
    compilerResults = csharpProvider.CompileAssemblyFromSource compilerParams #(source)

    -- return an instance so the static method can be called from MAXScript
    compilerResults.CompiledAssembly.CreateInstance "FileIO"
)
global FileIO = CreateFileAssembly()
global pathFileExists = FileIO.PathFileExists
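For comparison outside 3ds Max, the same Shlwapi function can be bound from Python with `ctypes` instead of compiling C#. This is an illustrative sketch: the wrapper name `make_path_file_exists` is hypothetical, and because Shlwapi.dll only exists on Windows it falls back to `os.path.exists` elsewhere. Note that PathFileExists returns TRUE for folders as well as files, unlike .NET's File.Exists, which returns false for directories.

```python
import ctypes
import os


def make_path_file_exists():
    """Return a callable path -> bool backed by Shlwapi's PathFileExistsW on Windows.

    On other platforms there is no Shlwapi.dll, so os.path.exists is used instead.
    """
    if os.name == "nt":
        func = ctypes.windll.shlwapi.PathFileExistsW
        func.argtypes = [ctypes.c_wchar_p]  # LPCWSTR path
        func.restype = ctypes.c_int         # Win32 BOOL
        return lambda path: bool(func(path))
    return os.path.exists


path_file_exists = make_path_file_exists()
```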
It will be interesting to compare it with doesFileExist; my tests don’t show a huge difference.
Thanks, Denis. I wrote a small benchmark script that compares all three methods (attached to the post):
MXS: MAXScript’s doesFileExist()
NET: .NET System.IO.File.Exists()
SHL: shlwapi.dll’s PathFileExists() (via DenisT’s import code)
Essentially, the outcome is the following:
* Filesystem caching is a factor.
* When the files mostly DO exist: on the first run over a previously untraversed folder tree, SHL is fastest, closely followed by .NET, and MXS is slowest by a large margin. On a second run over the same tree, MXS becomes fastest (50% faster than SHL on the first run), followed at some distance by SHL, then closely by .NET.
* When the files mostly do NOT exist: on the first run, MXS is fastest, followed by SHL, and .NET is again slowest. A second run shows little difference from the first, possibly because non-existent filenames are not cached by the file system.
Quick results

Test scenario 1: 40K files, files DO exist
------------------------------------------
         Initial run    Second run
MXS:     2611           842
NET:     1425           1391
SHL:     1368           1369

Test scenario 2: 98.6K files, files do NOT exist
------------------------------------------
         Initial run    Second run
MXS:     1443           1444
NET:     3022           2936
SHL:     2823           2811
BTW: the attached script checks files in #maxroot. Be careful if you change that folder in the script: collecting the files in big folders (like system32) takes a very long time. I simply copied and pasted a recursive filename collection function from the MAXScript docs.
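The shape of such a benchmark can be sketched in Python (illustrative only, not the attached MAXScript; `CHECKS`, `benchmark` and `_stat_exists` are hypothetical names). It times several existence checks over the same path list and repeats each one so the warm-cache effect shows up in later runs. One caveat: whichever check runs first also warms the file-system cache for the checks after it, so a fair cold-cache comparison would need a freshly untraversed tree for each method.

```python
import os
import pathlib
import time


def _stat_exists(path):
    """Existence check via os.stat, treating any OSError as 'does not exist'."""
    try:
        os.stat(path)
        return True
    except OSError:
        return False


CHECKS = {
    "os.path.exists": os.path.exists,
    "pathlib.Path.exists": lambda p: pathlib.Path(p).exists(),
    "os.stat + except": _stat_exists,
}


def benchmark(paths, runs=2):
    """Return {check_name: [seconds for run 1, run 2, ...]} over the same paths."""
    results = {}
    for name, check in CHECKS.items():
        timings = []
        for _ in range(runs):
            start = time.perf_counter()
            for p in paths:
                check(p)
            timings.append(time.perf_counter() - start)
        results[name] = timings
    return results
```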
Thanks, Josef.
So the results are pretty close in the general case; the miracle didn’t happen. I’ve read somewhere that Directory.Exists works faster than File.Exists in .NET.
Maybe it’s faster to check the directory path first, and the file name after?
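A rough Python sketch of "directory first, then file name" (the helper `existing_files` is hypothetical). It groups the queries by parent directory, checks each directory once, and rejects every file under a missing directory without a per-file check. This can only pay off when whole directories are missing, as in the "files mostly do NOT exist" scenario; when the directories exist, every file still needs its own check.

```python
import os
from collections import defaultdict


def existing_files(paths):
    """Return the subset of paths that exist as files, checking each directory once."""
    by_dir = defaultdict(list)
    for p in paths:
        by_dir[os.path.dirname(p)].append(p)
    found = []
    for directory, files in by_dir.items():
        # An empty dirname means a bare filename relative to the current directory.
        if directory and not os.path.isdir(directory):
            continue  # directory missing: skip every file in it without checking
        found.extend(p for p in files if os.path.isfile(p))
    return found
```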
Have you considered an alternative approach altogether?
* Cache a dictionary of <directory, list of files in that directory>.
* For every new doesFileExist query, check whether the directory of that file is already cached.
* If it is not cached, call getFiles or its .NET equivalent, Directory.GetFiles (whichever proves faster), to cache the files in that directory.
* Check whether the file in question exists in that directory by looking it up in the cached filename list.
Memory use may be larger, but performance should benefit greatly.
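The steps above can be sketched in Python roughly as follows (illustrative only; the class name `DirListingCache` is hypothetical, and `os.scandir` stands in for getFiles/Directory.GetFiles). Using a set instead of a list makes each lookup constant time. Names are passed through `os.path.normcase` so lookups are case-insensitive on Windows, matching the file system there. The trade-off besides memory: files created or deleted after a directory was listed are not noticed.

```python
import os


class DirListingCache:
    """Answer "does this file exist?" from cached per-directory listings."""

    def __init__(self):
        self._listings = {}  # normalized directory path -> set of normalized file names

    def _names_in(self, directory):
        key = os.path.normcase(directory)
        if key not in self._listings:
            try:
                # List the directory once; later queries are set lookups.
                with os.scandir(directory or ".") as entries:
                    names = {os.path.normcase(e.name) for e in entries if e.is_file()}
            except (FileNotFoundError, NotADirectoryError):
                names = set()  # a missing directory contains no files
            self._listings[key] = names
        return self._listings[key]

    def exists(self, path):
        directory, name = os.path.split(path)
        return os.path.normcase(name) in self._names_in(directory)
```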