[Closed] Find and Replace in huge .txt files
fn test =
(
    start = timeStamp()
    f = "d:\\test.txt"
    fw = "d:\\test_w.txt"
    local contents = (dotNetClass "System.IO.File").ReadAllLines f
    for i = 1 to contents.count do
    (
        if (matchPattern contents[i] pattern:"*text*") then
        (
            print contents[i]
            --contents[i] = substituteString contents[i] "text" "bla2"
        )
    )
    (dotNetClass "System.IO.File").WriteAllLines fw contents
    end = timeStamp()
    format "Processing took % seconds\n" ((end - start) / 1000.0)
)
test()
Hello. I wrote this function to find and replace some words in my text files. Each of my text files is about 200 MB, which is why I need to speed up this process. But I seem to be stuck, so I’d be glad to hear any suggestions on how to speed it up.
Maybe it’s better to keep all the processing in .NET.
Or maybe even look at Python; it’s supposed to be really fast at processing text files.
-Johan
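Since Python came up: a minimal sketch of a streaming find-and-replace, assuming UTF-8 text and a plain substring search (not a pattern). It reads and writes one line at a time instead of loading the whole 200 MB file, so memory use stays flat regardless of file size. The function name `stream_replace` and the demo file names are illustrative, not taken from the thread.

```python
import os
import tempfile


def stream_replace(src_path, dst_path, old, new, encoding="utf-8"):
    """Copy src_path to dst_path line by line, replacing every `old` with `new`.

    Only one line is held in memory at a time. Returns the number of lines changed.
    """
    changed = 0
    # newline="" keeps the original line endings (\n or \r\n) untouched
    with open(src_path, "r", encoding=encoding, newline="") as src, \
         open(dst_path, "w", encoding=encoding, newline="") as dst:
        for line in src:
            if old in line:  # cheap substring test, similar in spirit to matchPattern "*text*"
                line = line.replace(old, new)
                changed += 1
            dst.write(line)
    return changed


if __name__ == "__main__":
    # Small demo on a temporary file (stand-in for d:\test.txt)
    folder = tempfile.mkdtemp()
    src = os.path.join(folder, "test.txt")
    dst = os.path.join(folder, "test_w.txt")
    with open(src, "w", encoding="utf-8", newline="") as f:
        f.write("some text here\r\nno match\r\ntext and text\r\n")
    print(stream_replace(src, dst, "text", "bla2"))  # prints 2
```

Because matching is per line, a search string that spans a line break will not be found; for single words, as in this thread, that is not an issue.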
Perl is very good at processing text files, and it’s pretty easy to use as well.
I’ve used it to automatically insert tags into VRML files.
fn test =
(
    start = timeStamp()
    f = "d:\\test.txt"
    fw = "d:\\test_w.txt"
    -- grow the MAXScript heap until heapFree is at least 4x the file size
    while (heapFree < ((getFileSize f) * 4)) do heapSize += 10000000
    local contents = (dotNetClass "System.IO.File").ReadAllLines f
    for i = 1 to contents.count do
    (
        if (matchPattern contents[i] pattern:"*text*") then
        (
            contents[i] = substituteString contents[i] "text" "bla2"
        )
    )
    (dotNetClass "System.IO.File").WriteAllLines fw contents
    -- release the array and run a light garbage collection
    contents = #()
    gc light:true
    end = timeStamp()
    format "Processing took % seconds\n" ((end - start) / 1000.0)
)
test()
If you make sure there’s enough memory in your heap, it’s not that slow IMO. I tried this on a 120 MB file with lots of occurrences of “text” in it, and it took 27 seconds, which I think is reasonable for MAXScript given the size of the file. How long is it taking you to execute with your 200 MB files?
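For comparison, a rough Python port of lo’s function, assuming UTF-8 text. It keeps the same shape (read every line into memory, substitute only the lines that contain the search string, write everything out, time the run), which shows why the heap has to be large enough: the whole file sits in memory at once. Unlike ReadAllLines/WriteAllLines, it keeps each line’s original ending. The name `replace_in_memory` is hypothetical.

```python
import os
import tempfile
import time


def replace_in_memory(src_path, dst_path, old, new, encoding="utf-8"):
    """Hypothetical port of lo's test(): whole file in memory, guarded replace."""
    start = time.perf_counter()  # plays the role of timeStamp()
    with open(src_path, "r", encoding=encoding, newline="") as f:
        contents = f.readlines()  # every line held in memory, like ReadAllLines
    for i, line in enumerate(contents):
        if old in line:  # cheap test first, like matchPattern "*text*"
            contents[i] = line.replace(old, new)  # like substituteString
    with open(dst_path, "w", encoding=encoding, newline="") as f:
        f.writelines(contents)
    elapsed = time.perf_counter() - start
    print("Processing took %.3f seconds" % elapsed)
    return elapsed


if __name__ == "__main__":
    folder = tempfile.mkdtemp()
    src = os.path.join(folder, "test.txt")
    dst = os.path.join(folder, "test_w.txt")
    with open(src, "w", encoding="utf-8", newline="") as f:
        f.write("some text\nnothing here\n")
    replace_in_memory(src, dst, "text", "bla2")
```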
You have to stay within a C#/.NET solution and avoid going back and forth between .NET and MAXScript.
global FileAssembly
fn CreateFileAssembly forceRecompile:on =
(
    if forceRecompile or not iskindof ::FileAssembly dotnetobject or (::FileAssembly.GetType()).name != "Assembly" do
    (
        source = ""
        source += "using System;\n"
        source += "using System.IO;\n"
        source += "using System.Text.RegularExpressions;\n"
        source += "public class FileIO\n"
        source += "{\n"
        source += "static public void ReplaceInFile(string fileIn, string searchText, string replaceText)\n"
        source += "{\n"
        source += "    StreamReader reader = new StreamReader(fileIn);\n"
        source += "    string content = reader.ReadToEnd();\n"
        source += "    reader.Close();\n"
        -- searchText is treated as a regular-expression pattern
        source += "    content = Regex.Replace(content, searchText, replaceText);\n"
        source += "    StreamWriter writer = new StreamWriter(fileIn);\n"
        source += "    writer.Write(content);\n"
        source += "    writer.Close();\n"
        source += "}\n"
        source += "}\n"
        csharpProvider = dotnetobject "Microsoft.CSharp.CSharpCodeProvider"
        compilerParams = dotnetobject "System.CodeDom.Compiler.CompilerParameters"
        compilerParams.ReferencedAssemblies.Add "System.dll"
        compilerParams.GenerateInMemory = true
        compilerResults = csharpProvider.CompileAssemblyFromSource compilerParams #(source)
        FileAssembly = compilerResults.CompiledAssembly
    )
    -- return an instance even when the cached assembly is reused
    FileAssembly.CreateInstance "FileIO"
)
global FileIO = CreateFileAssembly()
global replaceInFile = FileIO.ReplaceInFile
For a 210 MB file where every third word has to be replaced, it takes 15 seconds on my machine when the file is not cached and 5 seconds when it is.
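denisT’s routine does all the work in one call on the .NET side: read the whole file, run a single Regex.Replace, and overwrite the file. A sketch of the same shape in Python, assuming UTF-8 text, follows. Two differences worth knowing: `re.subn` also returns the number of substitutions, and Python replacement strings use `\1` rather than .NET’s `$1` for group references. The name `replace_in_file` and the demo file name are illustrative.

```python
import os
import re
import tempfile


def replace_in_file(file_in, search_pattern, replace_text, encoding="utf-8"):
    """Read the whole file, run one regex replacement, and overwrite the file.

    As with Regex.Replace, `search_pattern` is a regular expression, so a literal
    string containing '.', '(' etc. should be passed through re.escape first.
    Returns the number of substitutions made.
    """
    with open(file_in, "r", encoding=encoding, newline="") as f:
        content = f.read()  # like reader.ReadToEnd()
    content, count = re.subn(search_pattern, replace_text, content)
    with open(file_in, "w", encoding=encoding, newline="") as f:
        f.write(content)  # overwrite in place, like the StreamWriter on fileIn
    return count


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "scene.mi")  # hypothetical file
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write("one text two text\n")
    print(replace_in_file(path, r"\btext\b", "bla2"))  # prints 2
```

Like the C# version, this needs the whole file in memory once; the payoff is a single pass of the regex engine instead of one call per line.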
What can you store in a 200 MB text file and use with Max?!
I’m patching .mi files because of shortcomings in the 3ds Max exporter.
denisT: thank you for the example, I’ll try it.
lo: thank you. I’ll keep Max’s memory limitations in mind.