I don’t see any reason to use Python… I’m pretty sure C# (dotnet) can do it just as fast.
Denis, I can’t send you the text file, sorry. But the string generated by the maxscript code should be enough for performance testing. Serejah is using seed 12345, I used seed 123, but since last evening I also use 12345, so the generated string has to be the same.
Here is a text file, generated by the script(I have added —– in the wordsArr): https://drive.google.com/file/d/1CIzUFv_p_0ae9PX49AcuA-wcJu5W0AwS/view?usp=share_link
The task is to find lines that start with “kappa” or “KaPpA” (or any other combination of upper- and lowercase letters), regardless of any whitespace before the “kappa”.
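A minimal Python sketch of that task, assuming that "starts with" means the first whitespace-delimited word of the line equals the token (so “kappaz” is rejected). The function names and the sample text are hypothetical, made up for illustration:

```python
def starts_with_word(line, word="kappa"):
    """True if the first whitespace-delimited word equals `word`, ignoring case."""
    parts = line.split(None, 1)  # split() with no separator skips leading spaces and tabs
    return bool(parts) and parts[0].lower() == word.lower()


def find_lines(text, word="kappa"):
    """Return the 1-based numbers of the lines whose first word is `word`."""
    return [n for n, line in enumerate(text.splitlines(), 1)
            if starts_with_word(line, word)]


# Hypothetical sample: lines 1, 2 and 6 qualify, "kappaz" and "alpha kappa" do not.
sample = "kappa one\n  \tKaPpA two\nalpha kappa\nkappaz three\n\nKAPPA"
print(find_lines(sample))  # [1, 2, 6]
```

Line numbers are 1-based here so they line up with the indices the maxscript loops collect.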
When I saw the speed of the pure maxscript I decided to find another solution. Since I can use Python but not C#, I decided to try Python, and it proved to be much, much faster than maxscript. But then the problem of executing the Python inside maxscript and getting the data back arose.
As I said, how to execute Python inside maxscript is something that I want to learn.
(
t0 = timestamp()
h0 = heapfree
ss = filterstring strToCheck "\n"
pt = "kappa*"
ii = for k=1 to ss.count where matchpattern (trimleft ss[k] " \t") pattern:pt collect k
format "count:% time:% heap:%\n" ii.count (timestamp() - t0) (h0 - heapfree)
)
it’s very fast… I don’t see the reason to get it faster
It is the same as this one: Fast search through text file
What I get for finding only “kappa”:
time:76 heap:9919600L
kappaArr: 338
could you post (or send me) the file you are using for the test? (SampleText.txt)
to measure performance we have to test the same source
oops… I seem to have missed the point of the task. Do we only need to find lines that start with “kappa”?
Once again, what do we need?
as for the source string, I just set the random seed to 12345 to make sure it is always the same and used that for testing
so is it not as simple as:
(
t0 = timestamp()
h0 = heapfree
ss = filterstring strToCheck "\n"
rx = dotnetobject "System.Text.RegularExpressions.Regex" "^(kappa|omicron)"
ii = for k=1 to ss.count where rx.IsMatch ss[k] collect k
format "count:% time:% heap:%\n" ii.count (timestamp() - t0) (h0 - heapfree)
ii
)
the only trouble I see is that not every match is necessarily valid, since the “kappa*” pattern matches “kappaz” just as it matches “kappa ”; that’s why I dismissed startswith and switched to regex.
Of course an extra check for a whitespace character could be added for every candidate, but I bet regex will be faster
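To make the “kappa*” versus “kappaz” point concrete, here is a small Python sketch comparing a prefix test with a whole-word regex. It uses Python's `re` module rather than the .NET Regex class from the snippet above, and the leading `\s*`, the `\b` boundary and the case-insensitive flag are my assumptions about what a valid match should be:

```python
import re


def prefix_match(line, word="kappa"):
    """Prefix test: roughly what a "kappa*" wildcard does after trimming the left side."""
    return line.lstrip(" \t").lower().startswith(word)


# Whole-word test: optional leading whitespace, one of the tokens, then a word boundary.
TOKEN_RX = re.compile(r"^\s*(kappa|omicron)\b", re.IGNORECASE)


def word_match(line):
    """Return the matched token in lowercase, or None if the line does not qualify."""
    m = TOKEN_RX.match(line)
    return m.group(1).lower() if m else None


lines = ["kappa 1", "  KaPpA 2", "kappaz 3", "Omicron", "alpha kappa"]
print([prefix_match(l) for l in lines])  # [True, True, True, False, False]
print([word_match(l) for l in lines])    # ['kappa', 'kappa', None, 'omicron', None]
```

The prefix test accepts “kappaz 3”, the regex does not. Note that `\b` treats the underscore as a word character, so a line such as “kappa_1” would also be rejected.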
i have for “kappa*”:
count:1866 time:54 heap:10019252L
do you need this for many patterns?
I have:
time:88 heap:10017612L
kappaArr: 1860 – this difference is because I use the dynamically generated string.
Yes, it has to find all the words if needed: for each word, collect the line number and the text on that line.
I have to check it against this:
import os
import sys
import time
# raw string, so that backslashes such as "\5" are not read as escape sequences
txtFile = r"H:\E_Desktop\M1\50000LinesOfText.txt"
_alpha = "alpha"
_beta = "beta"
_gama = "gama"
_delta = "delta"
_Epsilon = "epsilon"
_Zeta = "zeta"
_Eta = "eta"
_Theta = "theta"
_Iota = "iota"
_kaPPa = "kappa"
_LamBda = "lambda"
_mU = "mu"
_Nu = "nu"
_xi = "xi"
_omicron = "omicron"
_pi = "pi"
_rHo = "rho"
_siGma = "sigma"
_Tau = "tau"
_UpSiLoN = "upsilon"
_pHi = "phi"
_chi = "chi"
_psi = "psi"
_omega = "omega"
tokensArr = [_alpha, _beta, _gama, _delta, _Epsilon, _Zeta, _Eta, _Theta, _Iota, _kaPPa, _mU, _Nu, _xi, _omicron, _pi, _siGma, _Tau, _UpSiLoN, _pHi, _chi, _psi, _omega, _LamBda, _rHo, ]
# define the arrays to store the data
defalphaArr = []
defbetaArr = []
defgamaArr = []
defdeltaArr = []
defEpsilonArr = []
defZetaArr = []
defEtaArr = []
defThetaArr = []
defIotaArr = []
defkaPPaArr = []
defmUArr = []
deNuArr = []
defxiArr = []
defomicronArr = []
defpiArr = []
defrHoArr = []
defsiGmaArr = []
defTauArr = []
defUpSiLoNArr = []
defpHiArr = []
defchiArr = []
defpsiArr = []
defomegaArr = []
defLamBdaArr = []
defalphaLineNumArr = []
defbetaLineNumArr = []
defgamaLineNumArr = []
defdeltaLineNumArr = []
defEpsilonLineNumArr = []
defZetaLineNumArr = []
defEtaLineNumArr = []
defThetaLineNumArr = []
defIotaLineNumArr = []
defkaPPaLineNumArr = []
defmULineNumArr = []
deNuLineNumArr = []
defxiLineNumArr = []
defomicronLineNumArr = []
defpiLineNumArr = []
defrHoLineNumArr = []
defsiGmaLineNumArr = []
defTauLineNumArr = []
defUpSiLoNLineNumArr = []
defpHiLineNumArr = []
defchiLineNumArr = []
defpsiLineNumArr = []
defomegaLineNumArr = []
defLamBdaLineNumArr = []
spaceStr = " "
st = time.time()
with open(txtFile, "r") as file:
    for num, line in enumerate(file, 1):
        if len(line) != 0:
            stringArr = line.split()
            if stringArr:
                first_word = stringArr[0]
                f1 = first_word.rstrip('\t\n')
                str1 = (f1.lstrip('\t')).lower()
                for token in tokensArr:
                    if token == str1:
                        if str1 == _alpha:
                            defalphaArr.append(first_word + spaceStr + stringArr[1])
                            defalphaLineNumArr.append(num)
                            break
                        elif str1 == _beta:
                            defbetaArr.append(first_word + spaceStr + stringArr[1])
                            defbetaLineNumArr.append(num)
                            break
                        elif str1 == _gama:
                            defgamaArr.append(first_word + spaceStr + stringArr[1])
                            defgamaLineNumArr.append(num)
                            break
                        elif str1 == _delta:
                            defdeltaArr.append(first_word + spaceStr + stringArr[1])
                            defdeltaLineNumArr.append(num)
                            break
                        elif str1 == _Epsilon:
                            defEpsilonArr.append(first_word + spaceStr + stringArr[1])
                            defEpsilonLineNumArr.append(num)
                            break
                        elif str1 == _Zeta:
                            defZetaArr.append(first_word + spaceStr + stringArr[1])
                            defZetaLineNumArr.append(num)
                            break
                        elif str1 == _Eta:
                            defEtaArr.append(first_word + spaceStr + stringArr[1])
                            defEtaLineNumArr.append(num)
                            break
                        elif str1 == _Theta:
                            defThetaArr.append(first_word + spaceStr + stringArr[1])
                            defThetaLineNumArr.append(num)
                            break
                        elif str1 == _Iota:
                            defIotaArr.append(line)
                            defIotaLineNumArr.append(num)
                            break
                        elif str1 == _kaPPa:
                            defkaPPaArr.append(line)
                            defkaPPaLineNumArr.append(num)
                            break
                        elif str1 == _LamBda:
                            defLamBdaArr.append(line)
                            defLamBdaLineNumArr.append(num)
                            break
                        elif str1 == _mU:
                            defmUArr.append(first_word + spaceStr + stringArr[1])
                            defmULineNumArr.append(num)
                            break
                        elif str1 == _Nu:
                            deNuArr.append(line)
                            deNuLineNumArr.append(num)
                            break
                        elif str1 == _xi:
                            defxiArr.append(first_word + spaceStr + stringArr[1])
                            defxiLineNumArr.append(num)
                            break
                        elif str1 == _omicron:
                            defomicronArr.append(first_word + spaceStr + stringArr[1])
                            defomicronLineNumArr.append(num)
                            break
                        elif str1 == _pi:
                            defpiArr.append(first_word + spaceStr + stringArr[1])
                            defpiLineNumArr.append(num)
                            break
                        elif str1 == _rHo:
                            defrHoArr.append(first_word + spaceStr + stringArr[1])
                            defrHoLineNumArr.append(num)
                            break
                        elif str1 == _siGma:
                            defsiGmaArr.append(line)
                            defsiGmaLineNumArr.append(num)
                            break
                        elif str1 == _Tau:
                            defTauArr.append(line)
                            defTauLineNumArr.append(num)
                            break
                        elif str1 == _UpSiLoN:
                            defUpSiLoNArr.append(line)
                            defUpSiLoNLineNumArr.append(num)
                            break
                        elif str1 == _pHi:
                            defpHiArr.append(line)
                            defpHiLineNumArr.append(num)
                            break
                        elif str1 == _chi:
                            defchiArr.append(first_word + spaceStr + stringArr[1])
                            defchiLineNumArr.append(num)
                            break
                        elif str1 == _psi:
                            defpsiArr.append(line)
                            defpsiLineNumArr.append(num)
                            break
                        elif str1 == _omega:
                            defomegaArr.append(first_word + spaceStr + stringArr[1])
                            defomegaLineNumArr.append(num)
                            break
et = time.time()
# get the execution time
elapsed_time = et - st
print('Execution time:', elapsed_time, 'seconds')
print ("defalphaArr", len(defalphaArr))
print ("defbetaArr", len(defbetaArr))
print ("defgamaArr", len(defgamaArr))
print ("defdeltaArr", len(defdeltaArr))
print ("defEpsilonArr", len(defEpsilonArr))
print ("defZetaArr", len(defZetaArr))
print ("defEtaArr", len(defEtaArr))
print ("defThetaArr", len(defThetaArr))
print ("defIotaArr", len(defIotaArr))
print ("defkaPPaArr", len(defkaPPaArr))
print ("defmUArr", len(defmUArr))
print ("defpHiArr", len(defpHiArr))
print ("deNuArr", len(deNuArr))
print ("defxiArr", len(defxiArr))
print ("defomicronArr", len(defomicronArr))
print ("defpiArr", len(defpiArr))
print ("defsiGmaArr", len(defsiGmaArr))
print ("defTauArr", len(defTauArr))
print ("defUpSiLoNArr", len(defUpSiLoNArr))
print ("defchiArr", len(defchiArr))
print ("defpsiArr", len(defpsiArr))
print ("defomegaArr", len(defomegaArr))
print ("defLamBdaArr", len(defLamBdaArr))
print ("defrHoArr", len(defrHoArr))
print ("defalphaArr", defalphaArr[0])
Result:
Execution time: 0.186000108719 seconds
defalphaArr 1852
defbetaArr 1856
defgamaArr 1896
defdeltaArr 1877
defEpsilonArr 1800
defZetaArr 1844
defEtaArr 1779
defThetaArr 1751
defIotaArr 1846
defkaPPaArr 1837
defmUArr 1855
defpHiArr 1790
deNuArr 1817
defxiArr 1840
defomicronArr 1859
defpiArr 1863
defsiGmaArr 1931
defTauArr 1865
defUpSiLoNArr 1948
defchiArr 1794
defpsiArr 1807
defomegaArr 1806
defLamBdaArr 1776
defrHoArr 1754
defalphaArr alpha delta
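For comparison, the same grouping can be sketched with a single dictionary keyed by token instead of one pair of lists per word. This is only an illustration under assumptions: the names `TOKENS` and `collect_by_first_word` are hypothetical, the sample input is invented, and the sketch stores the whole line for every token, whereas the script above keeps only the first two words for some of them. The token spellings follow the script (including “gama”).

```python
import io
from collections import defaultdict

# Tokens spelled exactly as in the script above.
TOKENS = {
    "alpha", "beta", "gama", "delta", "epsilon", "zeta", "eta", "theta",
    "iota", "kappa", "lambda", "mu", "nu", "xi", "omicron", "pi", "rho",
    "sigma", "tau", "upsilon", "phi", "chi", "psi", "omega",
}


def collect_by_first_word(lines, tokens=TOKENS):
    """Map each token to a list of (line number, line text) pairs.

    A line counts for a token when its first whitespace-delimited word,
    lowercased, equals that token. Line numbers are 1-based.
    """
    hits = defaultdict(list)
    for num, line in enumerate(lines, 1):
        words = line.split()
        if words and words[0].lower() in tokens:
            hits[words[0].lower()].append((num, line.rstrip("\n")))
    return hits


# Hypothetical in-memory sample; an open file object can be passed the same way.
sample = io.StringIO("alpha delta 1\n  KaPpA x\nnothing here\nkappa y\n")
result = collect_by_first_word(sample)
print(result["kappa"])                         # [(2, '  KaPpA x'), (4, 'kappa y')]
print({k: len(v) for k, v in result.items()})  # {'alpha': 1, 'kappa': 2}
```

The set lookup replaces both the loop over the token list and the long elif chain, so adding a new word only means adding it to the set.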