the pure MXS dotnet version is not bad either:
(
    t0 = timestamp()
    h0 = heapfree
    matches = (dotNetClass "System.Text.RegularExpressions.Regex").Matches strToCheck "kappa"
    k = for i = 0 to matches.count - 1 collect (matches.item i).index
    format " count:% == time:% heap:%\n" k.count (timestamp() - t0) (h0 - heapfree)
)
the well-known issue kills performance: iterating over dotnet collections.
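For reference, the same match-index collection can be done with Python's re module, which avoids iterating a .NET collection element by element; a minimal sketch (the sample text is an assumption, the thread's strToCheck is not shown):

```python
import re

# sample text (assumption)
strToCheck = "kappa one\ntwo kappa\nkappa"
# start index of every occurrence of "kappa"
k = [m.start() for m in re.finditer("kappa", strToCheck)]
```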
If anyone knows how to make a regex find "--" (or any other number of "-") at the beginning of a line, please help.
I will try to find the answer tomorrow evening. Now it is time to go to bed.
Thank you one more time.
the regex syntax is pretty simple, but I'd still suggest reading a good book about writing efficient expressions. It will make your life much easier many times over in the future.
^ -- beginning of the line
(dotNetClass "system.text.regularexpressions.regex").isMatch "----" "^[-]+" -- true
(dotNetClass "system.text.regularexpressions.regex").isMatch " ----" "^[-]+" -- false
(dotNetClass "system.text.regularexpressions.regex").isMatch " ----" "[-]+" -- true
(dotNetClass "system.text.regularexpressions.regex").isMatch " ----" "-+" -- true
(dotNetClass "system.text.regularexpressions.regex").isMatch " +-+-" "-[-+]+" -- true
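The same truth table holds in Python's re module, and re.MULTILINE is the usual way to make "^" anchor at the start of every line rather than only at the start of the whole string; a sketch:

```python
import re

# "^" anchors the pattern at the start of the string
starts_with_dashes = re.search(r"^[-]+", "----") is not None    # True
indented = re.search(r"^[-]+", " ----") is not None             # False: "^" sits before the space
anywhere = re.search(r"[-]+", " ----") is not None              # True: no anchor
# re.MULTILINE makes "^" match at the start of every line
per_line = re.search(r"^[-]+", "text\n----", re.MULTILINE) is not None  # True
```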
as I said above, it’s very easy to get… just find ends of all lines first
hmm… but then, for every match, it requires a linear search for the first end-of-line offset larger than the match offset, just to know the line number.
did you mean something like this?
(
    ss = strToCheck as StringStream
    seek ss 0
    ends_of_lines_offsets = #()
    while not eof ss do
    (
        if (skipToString ss "\n") != undefined do append ends_of_lines_offsets (filePos ss)
    )
    seek ss 0
    skipToString ss "kappa omicron"
    offset = filePos ss
    line_number = for i = 1 to ends_of_lines_offsets.count where ends_of_lines_offsets[i] > offset do exit with i
    seek ss 0
    for i = 1 to line_number - 1 do skipToNextLine ss
    readLine ss
)
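The linear scan per match can also be replaced by a binary search: precompute the end-of-line offsets once, then bisect them for each match offset. A sketch in plain Python (sample text is an assumption):

```python
import bisect
import re

# sample text (assumption)
text = "alpha\nbeta kappa omicron\ngamma\n"
# precompute every end-of-line offset once
line_ends = [m.start() for m in re.finditer("\n", text)]
offset = text.find("kappa omicron")
# the number of line ends before the offset is the zero-based line index; +1 for 1-based
line_number = bisect.bisect_right(line_ends, offset) + 1
```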
something like this:
(
    re = python.import "re"
    t0 = timestamp()
    h0 = heapfree
    r = re.finditer "\n" strToCheck
    k = re.finditer "kappa" strToCheck
    y = 0
    s = 0
    i = (r.next()).start()
    pos = #()
    while (x = try ((k.next()).start()) catch()) != undefined do
    (
        while x > i do
        (
            s = i
            y += 1
            i = (r.next()).start()
        )
        append pos [x, y, x - s]
    )
    format " PY count:% == time:% heap:%\n" pos.count (timestamp() - t0) (h0 - heapfree)
    pos
)
I don’t know how to suppress the “StopIteration” exception yet… it needs some wrapping in a Python try/except.
ok… found…
next(iterator, default)
Retrieve the next item from the iterator by calling its next() method. If default is given, it is returned if the
iterator is exhausted, otherwise StopIteration is raised.
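That two-argument form is exactly what the loop needs; a quick illustration:

```python
it = iter([10, 20])
first = next(it)        # 10
second = next(it)       # 20
# iterator exhausted: the two-argument form returns the default
tail = next(it, None)   # None instead of raising StopIteration
```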
so finally:
(
    re = python.import "re"
    next = (python.import "__builtin__").next
    t0 = timestamp()
    h0 = heapfree
    r = re.finditer "\n" strToCheck
    k = re.finditer "kappa" strToCheck
    i = (next r).start()
    y = 0
    s = 0
    pos = #()
    while (x = next k undefined) != undefined do
    (
        x = x.start()
        while x > i do
        (
            y += 1
            s = i
            i = (next r).start()
        )
        append pos [x, y, x - s]
    )
    format " PY count:% == time:% heap:%\n" pos.count (timestamp() - t0) (h0 - heapfree)
    pos
)
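The same offset/line/column merge loop can be checked against a plain-Python reference; the sample text and the guard for text after the last newline are assumptions:

```python
import re

strToCheck = "one kappa\nkappa two\nxx kappa"  # sample text (assumption)
r = re.finditer("\n", strToCheck)
k = re.finditer("kappa", strToCheck)

nl = next(r, None)
i = nl.start() if nl else len(strToCheck)  # offset of the next newline
y, s = 0, 0                                # current line index, offset of its preceding newline
pos = []
for m in k:
    x = m.start()
    while x > i:                           # advance the newline iterator past this match
        y += 1
        s = i
        nl = next(r, None)
        i = nl.start() if nl else len(strToCheck)
    pos.append((x, y, x - s))              # (match offset, line index, offset within line)
```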
ok, I see.
python is an order of magnitude slower for this expression: @"(^|\n|\r)\s*kappa", which matches lines starting with kappa.
I guess we have to write plain python code and execute it with python.execute to make it efficient. But how do we get the data back to MXS?
python:
 PY count:1896 == time:1.784 sec. heap:15298128L
c#:
 Find words 0.172 sec.
 1896
With python, going through all 50000 lines of a text file and finding all lines that start with “--”, skipping leading whitespace, takes 0.086 sec and finds 4422 lines. I am pretty sure the code can be optimized:
import time

txtPath = r"C:\SampleText.txt"
dashStr = "--"
dashArr = []
dashLineNumArr = []

st = time.time()
with open(txtPath, "r") as file:
    for num, line in enumerate(file, 1):
        if len(line) != 0:
            stringArr = line.split()
            if stringArr:
                first_word = stringArr[0]
                f1 = first_word.rstrip('\n')
                f2 = f1.rstrip('\t')
                str1 = (f2.lstrip('\t')).lower()
                if dashStr == str1:
                    dashArr.append(line)
                    dashLineNumArr.append(num)
et = time.time()
# get the execution time
elapsed_time = et - st
print('Execution time:', elapsed_time, 'seconds')
print("dash lines:", len(dashArr))
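The stripping in the inner loop can be reduced to one split: the first whitespace-delimited token from line.split() already has no '\n', '\t' or spaces around it. A sketch on an in-memory list (the sample lines are assumptions):

```python
# sample lines (assumption)
lines = ["-- comment", "   -- indented", "\t--\t", "code()", "--x trailing"]
dashLineNumArr = []
for num, line in enumerate(lines, 1):
    tokens = line.split()              # split() already strips \n, \t and spaces
    if tokens and tokens[0].lower() == "--":
        dashLineNumArr.append(num)
```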
This works:
(
    global PYTHON_RETURN

    fn CollectOldFiles dir threshold_days: =
    (
        local pyCmd = StringStream ""
        format "
import os, time
def collect_old_files(file_dir, threshold_days=None):
    if threshold_days == None:
        threshold_days = 0
    threshold_time = (time.time()) - (60 * 60 * 24 * threshold_days)
    file_list = [os.path.join(file_dir, f) for f in os.listdir(file_dir) if '.ps1' in f and (os.path.getmtime(os.path.join(file_dir, f)) <= threshold_time)]
    return file_list
files = collect_old_files(r'%', %)
arr = '#({0})'.format(','.join([str('@\"'+str(n)+'\"') for n in files]))
MaxPlus.Core.EvalMAXScript('PYTHON_RETURN = {0}'.format(arr))
" (TrimRight dir "\\") threshold_days to:pyCmd
        python.execute (pyCmd as string)
        ::PYTHON_RETURN
    )

    old_files = CollectOldFiles @"C:\M1\" threshold_days:1
    format "old_files: %\n" old_files[1]
)
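The Python side of that round-trip can be exercised on its own, without the MXS wrapper; a sketch using a temp directory (the .ps1 filter matches the snippet above, file names are assumptions):

```python
import os
import tempfile
import time

def collect_old_files(file_dir, threshold_days=0):
    # keep .ps1 files whose modification time is at least threshold_days old
    threshold_time = time.time() - 60 * 60 * 24 * threshold_days
    return [os.path.join(file_dir, f) for f in os.listdir(file_dir)
            if '.ps1' in f and os.path.getmtime(os.path.join(file_dir, f)) <= threshold_time]

d = tempfile.mkdtemp()
open(os.path.join(d, "a.ps1"), "w").close()
open(os.path.join(d, "b.txt"), "w").close()
found = collect_old_files(d)           # threshold 0: any existing file qualifies
```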