[Closed] Micro-Challenge. #Split a String
Are you bored, or do you want to practice a little MAXScript?
Here is a task for you:
Split a string by a list of tokens.
This is not the same as filterString (MXS) or String.Split (.NET).
The idea is to split the string so that the result contains both the words and the tokens.
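For contrast, this is roughly what the built-in filterString returns for the sample input. Every token character is treated as a delimiter and thrown away, so only the words survive (with the default splitEmptyTokens:false).

```maxscript
-- filterString discards the delimiters, so the tokens are lost
words = filterString "--> Hello, << World >>!" "-<>! "
format "%\n" words   -- #("Hello,", "World")
```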
Example:
-- splitString <string> <tokens>
splitString "--> Hello, << World >>!" "-<>! "
-- Result:
#("--", ">", " ", "Hello,", " ", "<<", " ", "World", " ", ">>", "!")
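A minimal pure-MXS sketch of the rules as they can be read from this example (an assumption, not Denis's reference solution): each character of the token string is a token character, a run of the same token character becomes one item ("--", "<<"), and a run of non-token characters becomes one word ("Hello,"). The name splitByTokens is hypothetical, chosen so it doesn't clash with the entries in this thread.

```maxscript
-- Sketch of the split rules implied by the example (assumed rules):
--   * a run of the SAME token character is one item
--   * a run of non-token characters is one word
fn splitByTokens str tokens =
(
	local result = #()
	local current = ""      -- item being accumulated
	local currentKey = 0    -- 0 = word, >0 = index of the token char in <tokens>
	for i = 1 to str.count do
	(
		local ch = str[i]
		local idx = findString tokens ch
		local key = if idx == undefined then 0 else idx
		-- a change of "kind" closes the current item
		if current != "" and key != currentKey do
		(
			append result current
			current = ""
		)
		current += ch
		currentKey = key
	)
	if current != "" do append result current   -- flush the last item
	result
)
```

Called as `splitByTokens "--> Hello, << World >>!" "-<>! "`, it should return the same 11 items as the result above. It makes a single findString call per character and no extra arrays, which is the kind of trade-off the speed and memory goals point toward.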
Do you think it's trivial?
As usual, try to make it [b]faster[/b] and [b]more memory-efficient[/b]. :)
PS. Let's use only [b]pure MXS[/b] (no C#[color=white], no .NET[/color]) to give a rookie and a pro an even chance of winning.
Hi Denis,
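Here is my basic attempt:
fn splitstring str tokens =
(
	local strArray = #()
	local lastIndex = 1    -- start of the word currently being scanned
	for i = 1 to str.count do
	(
		local found = findstring tokens str[i]
		if found != undefined do
		(
			if i == 1 then append strArray str[i] -- starts with a token char
			else
			(
				-- flush the word that precedes this token
				if (i - lastIndex) != 0 then append strArray (substring str lastIndex (i - lastIndex))
				-- merge with the previous item if it is a run of the same token char
				if matchpattern strArray[strArray.count] pattern:("*" + str[i]) then
					strArray[strArray.count] += str[i]
				else append strArray str[i]
			)
			lastIndex = i + 1
		)
	)
	-- flush a trailing word (the sample ends with a token, other inputs may not)
	if lastIndex <= str.count do append strArray (substring str lastIndex -1)
	strArray
)
st = timestamp()
for i = 1 to 1000 do (splitstring "--> Hello, << World >>!" "-<>! ")
format "% seconds\n" ((timestamp() - st) / 1000.0)
0.423 seconds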
Cheers
What was actually changed? Is it the restriction on .NET/C# use?
Well… you can use it if you want. Personally, I couldn't beat my MXS function with any .NET solution. Maybe you can…
Great! We have our first player!
Please use the string and tokens from my sample, and give the time for 10000 executions.
Thanks
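A small harness can keep the numbers comparable between entries. This is a sketch under assumptions: benchSplit is a hypothetical name, it takes any split function with the (str, tokens) signature used in this thread, and it reports elapsed milliseconds from timestamp() plus the drop in heapfree. The heap figure is only rough, because the garbage collector may run during the loop.

```maxscript
-- Hypothetical benchmark helper: runs <splitFn> <count> times on the
-- sample input, prints and returns #(milliseconds, heap bytes used).
fn benchSplit splitFn count:10000 =
(
	local str = "--> Hello, << World >>!"
	local tokens = "-<>! "
	gc light:true            -- start from a cleaner heap
	local h = heapfree
	local t = timestamp()
	for i = 1 to count do splitFn str tokens
	local ms = timestamp() - t
	local bytes = h - heapfree   -- rough: gc may run inside the loop
	format "% ms for % iterations, % bytes of heap used\n" ms count bytes
	#(ms, bytes)
)
```

For example, `benchSplit mySplit` runs an entry 10000 times with the sample string and tokens.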
ZBuffer:
1475 ms for 10000 iterations on my machine.
73073248 bytes of memory used.
Not bad!
Hi Denis, does this function have a practical purpose, or is it just an exercise?
OK, I'm still trying to decipher the definition of this challenge…
Could you explain the criterion for splitting into words and tokens?
Looking at the example result raises some questions.
In the example test string, the character after "Hello," is a tab, and in your result it is neither part of a word nor a token of its own: it is joined with the next token character, which is a space (" "). Is that part of the expected result? If so, I'm really lost here (!). Or maybe this is just a forum typo? I know that in a [ CODE ] block four spaces are replaced with a tab. If that is the case, sorry for bothering you (:
OK, I tested ZBuffer's function, and it returns the expected result if there is no tab in the string, so this part is clear. Logically, then, a 'word' is any non-empty string delimited by tokens. Now I'm off to sleep; maybe I'll think about this again tomorrow.
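If tabs should simply behave like spaces, one option (assuming the tab really is just a forum artifact) is to normalize the input once with the built-in substituteString before splitting, instead of checking every character inside the loop. splitTabsAsSpaces and echoSplit are hypothetical names; splitFn can be any split function with the (str, tokens) signature used in this thread.

```maxscript
-- Hypothetical wrapper: replace every tab with a space once, up front,
-- then hand the cleaned string to the split function.
fn splitTabsAsSpaces splitFn str tokens =
(
	local clean = substituteString str "\t" " "   -- replaces all occurrences
	splitFn clean tokens
)

-- trivial stand-in split function, only to show the call shape
fn echoSplit s t = #(s)
splitTabsAsSpaces echoSplit "Hello,\t<<" "-<>! "
```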
[EDIT]
OK, if I've understood the task correctly, here is my basic attempt:
fn splitString str tokens = (
fn isToken ch tk = (findString tk ch != undefined)
local aryOut = #(str[1]), index = 1, char
for i = 2 to str.count do (
char = str[i]
if isToken char tokens then (
if char == aryOut[index][1] then (aryOut[index] += char)
else (index += 1; aryOut[index] = char)
)
else (
if not isToken aryOut[index][1] tokens then (aryOut[index] += char)
else (index += 1; aryOut[index] = char)
)
)
aryOut
)
[EDIT#2]
…uh, why did I use findString instead of matchPattern?.. It's time to sleep, really.
This is my try:
(
	fn splitString str tokens =
	(
		local result = #()
		local last = undefined   -- token index of the current run (undefined = word)
		local i = 0
		local w = ""
		while i < str.count do (
			i += 1
			local s = str[i]
			if s == "\t" then s = " " -- treat a tab as a space, so that Denis won't complain :)
			local j = findString tokens s
			if j == last then
				w += str[i]
			else (
				if i > 1 then
					append result w
				last = j
				w = str[i]
			)
		)
		append result w
		result
	)
	local ts = timeStamp()
	local h = heapfree
	for i = 1 to 10000 do
		splitString "--> Hello, << World >>!" "-<>! "
	format "%; %\n" (timeStamp() - ts) (h - heapfree)
)
P.S.: If I remove the commented line and the line above it, and use str[i] instead of s, I get better results, but the output won't be exactly what Denis wanted…
My attempt:
fn mySplit str tok =
(
	local space = " "
	local tab = "\t"
	local res = #()
	local tmpStr = ""
	local wasToken = false
	local lastChar = ""
	local isToken
	local tokens = for i = 1 to tok.count collect tok[i]   -- token string as an array of chars
	local strArr = for i = 1 to str.count collect str[i]
	for s in strArr do
	(
		if lastChar == s then tmpStr += s   -- same char as before: extend the current item
		else
		(
			if s == tab then s = space      -- treat tabs as spaces
			lastChar = s
			isToken = findItem tokens s > 0
			if wasToken or isToken then
			(
				if tmpStr != "" do append res tmpStr
				tmpStr = s
			)
			else
			(
				tmpStr += s
			)
			wasToken = isToken
		)
	)
	append res tmpStr
	res
)
733 ms
34.08 MB
I'm trying to do this without findItem and matchPattern.
It takes a long time to execute.
fn mysplitstring str tokens =
(
	local result = #()
	local lastTokId   -- index of the previous token char (undefined after a word char)
	local getTok      -- true when str[i] matched a token
	local lastIsStr   -- true when the last item is a word
	for i = 1 to str.count do
	(
		getTok = false
		for j = 1 to tokens.count do
		(
			if (str[i] == tokens[j]) do
			(
				if j == lastTokId then (result[result.count] += tokens[j]) else (append result tokens[j])
				lastTokId = j ; getTok = true ; lastIsStr = false ; exit
			)
		)
		if getTok == false do
		(
			lastTokId = undefined
			if lastIsStr == true then (result[result.count] += str[i]) else (append result str[i] ; lastIsStr = true)
		)
	)
	result   -- return the array; the for loop itself does not return it
)
Your code is algorithmically correct but needs optimization.
#1:
Don't use the exit construct to break out of loops. It kills MXS function performance (see the MAXScript help for an explanation).
#2:
Replace the loop over the tokens in your code with the built-in MXS function findString.
These two changes will make your code much faster.
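Tip #1 can be illustrated with a minimal sketch (findTokenIndex is a hypothetical name): the early-out moves into the while condition, so the loop ends naturally instead of being broken with exit.

```maxscript
-- Search the token string without 'exit':
-- the loop condition stops the search as soon as a match is found.
fn findTokenIndex ch tokens =
(
	local j = 0
	local found = 0
	while found == 0 and j < tokens.count do
	(
		j += 1
		if tokens[j] == ch do found = j
	)
	found   -- 0 when ch is not a token
)
```

In practice findString performs the same search in a single built-in call, which is what tip #2 suggests.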
And as I said: thank you for your participation. We are all learning here.
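Applying both tips to the mysplitstring entry might look like the following sketch. It keeps the original structure and variable names, but the inner token loop is replaced by one findString call, so the exit disappears as well. No timings are claimed for it here.

```maxscript
-- mysplitstring with the token loop replaced by findString (tip #2),
-- which also removes the need for 'exit' (tip #1)
fn mysplitstring str tokens =
(
	local result = #()
	local lastTokId          -- index of the previous token char, undefined after a word char
	local lastIsStr = false  -- true when the last item is a word
	for i = 1 to str.count do
	(
		local ch = str[i]
		local j = findString tokens ch
		if j != undefined then
		(
			-- same token char as before: extend the run, else start a new item
			if j == lastTokId then result[result.count] += ch else append result ch
			lastTokId = j
			lastIsStr = false
		)
		else
		(
			lastTokId = undefined
			if lastIsStr then result[result.count] += ch else (append result ch; lastIsStr = true)
		)
	)
	result
)
```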