[Closed] Micro-Challenge. #Split a String
i rewrote the same splitString (my string version) function on c++/SDK
[b]str [/b]= "--> Hello, << World >>!"
tokens = "-<>! "
[b]stt [/b]= ""
for k=1 to 1000 do stt += str
10000 iterations for str:
252 ms
3982104L
10 iterations for stt (str+str+str+ …1000 times):
1340 ms
3599556L
long string is very slow… i can’t understand why. probably i need to look at it more carefully.
i used max SDK string methods… maybe they are slow.
i will try to use pure c++.
Denis: The only problem with your solution is that it doesn’t solve the problem correctly.
It returns the result:
#("--", ">", " ", "Hello, ", " ", "<<", " ", "World", " ", ">>", "!")
instead of
#("--", ">", " ", "Hello,", " ", "<<", " ", "World", " ", ">>", "!")
:rolleyes:
P.S: your solution is indeed very clever, it reminds me of Formal Grammar
when i copied the code i lost some piece:
fn easySplitString str tokens =
(
local sx = "" /* start of text symbol*/
local ex = "" /* end of text symbol*/
local xx = ""
local em = ""
for i=1 to tokens.count do
(
t = tokens[i]
sss = substitutestring str t (ex+t+ex)
if sss.count != str.count do
(
str = substitutestring sss xx em
str = substitutestring str ex sx
)
)
filterstring str sx
)
thank for your pointing to the problem…
Seeing as how the competition is over, let me pose another challenge: lex a string to data structures based on an array of tokens.
my naive lexing attempt:
( clearListener(); local start = timestamp()
struct codeNode
(nodeType, nodeValue, nodePosition, tokenLength, tokenEndPosition, on create do (this.tokenEndPosition = (nodePosition+tokenLength)))
local mySnippet = "fn doIt=(print \"doing it!\");fn doNothing=();for i=1 to 10 do print \"works!\"; if 1>2 & 2>3 then doIt() else doNothing()"
local mxsTokens = #("if","then","else","for","to","do","print","&","=",">","\"","\\\"","fn","(",")",";","--","while",".count","not","and",\
"by","random","collect","as array","function","+","-","/","*","floor","as integer","as string","as float","try","catch","
"," ",\
"angle", "slider", "spinner", "button", "checkbutton", "mapbutton", "materialbutton", "pickbutton", "checkbox",\
"colorpicker", "listbox", "multilistbox", "dropdownlist", "combobox", "edittext", "groupbox", "hyperlink", "label", "progressbar",\
"radiobuttons", "bitmap", "imgtag", "rollout", "subrollout", "curvecontrol", "popupmenu", "timer", "menuitem", "rcmenu")
local ast = #()
for j=1 to mySnippet.count do
(
for i=1 to mxsTokens.count do
(
match = true
for h=1 to mxsTokens[i].count do
(if mySnippet[j+h] != mxsTokens[i][h] then (match = false))
if match == true then
(
local myNode = codeNode nodeType:undefined nodeValue:mxsTokens[i] nodePosition:j tokenLength:mxsTokens[i].count
append ast myNode
)
)
)
local myTotalTime = ((timeStamp() - start) / 1000.0) as string
print ("time elapsed: " + myTotalTime)
print ast
)
no, no, no…
it doesn’t work this way. nobody will solve the task for you if it doesn’t look as abstract.
so let’s change the rules…
#1
we have a list of tokens. for example #(” “,”:”,”=”,”(“,”)”,”!”)
#2
we have a list of operators and conditions. e.g. #(“+”, “!=”, “+=”)
#3
we have a list of reserved words. e.g. #(“if”, “do”)
let’s split the string:
“if miff != (::doom) do life+= 1”
I’ll bite. Here’s another snippet:
(
clearListener()
local tokens = #(" ", ":", "=", "(", ")" ,"!", "+", "!=", "+=", "if", "do")
local myString = "if miff != (::doom) do life+= 1"
struct codeNode
(nodeType, nodeValue, nodePosition, tokenLength)
local ast = #()
local myString = " " + myString --padd a space @ beginning for line 17's [j+h]
--
local start = timestamp()
for j=1 to myString.count do
(
for i=1 to tokens.count do
(
local match = true
for h=1 to tokens[i].count do
(if myString[j+h] != tokens[i][h] then (match = false))
if match == true then
(
local myNode = codeNode nodeType:undefined nodeValue:tokens[i] nodePosition:j tokenLength:tokens[i].count
append ast myNode
)
)
)
local myTotalTime = ((timeStamp() - start) / 1000.0) as string
print ("time elapsed: " + myTotalTime)
print ast
)
On a dualcore 1ghz execution time is 0.008ms.
Not sure about the memory footprint, but it’s probably pretty bad considering all the struct instances.
Whats missing in this implementation is correctly identifying user defined tokens, example: “miff, doom, life” in “if miff != (::doom) do life+= 1”. I’m doing this in a second pass:
local endPos = (ast[1].nodePosition-1)
local userToken = substring myString 1 endPos
userToken = (trimleft userToken)
userToken = (trimRight userToken)
if userToken != "" then
(
local myNode = codeNode nodeType:undefined nodeValue:userToken nodePosition:1 tokenLength:userToken.count
append ast myNode
)
--get the rest of the missing tokens, skipping the first
for i=2 to ast.count do
(
local startPos = (ast[i-1].tokenEndPosition)
local endPos = (ast[i].nodePosition-startPos)
--
local userToken = substring myString startPos endPos
userToken = (trimleft userToken)
userToken = (trimRight userToken)
if not isSpace userToken AND userToken != "" then
(
local myNode = codeNode nodeType:undefined nodeValue:userToken nodePosition:startPos tokenLength:userToken.count
append ast myNode
)
)
The next step might be to organize the tokens in the ast array according to their nodePosition.
Or, another step would be to define the nodeType on each instance of codeNode, for example a codeNode with the property .nodeValue == “true” would have a nodeType of “boolean”.
honestly the issue is closed. after we know how to split a string (text) by tokens and keep the tokens the problem lost its fascination.