Notifications
Clear all

[Closed] Micro-Challenge. #Split a String

i thought about it. but it will not be free any more.

i rewrote the same splitString (my string version) function on c++/SDK


  [b]str [/b]= "--> Hello,	 << World >>!"
  tokens = "-<>! "
  [b]stt [/b]= ""
  for k=1 to 1000 do stt += str
  

10000 iterations for str:
252 ms
3982104L

10 iterations for stt (str+str+str+ …1000 times):
1340 ms
3599556L

long string is very slow… i can’t understand why. probably i need to look at it more carefully.
i used max SDK string methods… maybe they are slow.
i will try to use pure c++.

Denis: The only problem with your solution is that it doesn’t solve the problem correctly.
It returns the result:

#("--", ">", " ", "Hello,	", " ", "<<", " ", "World", " ", ">>", "!")

instead of

#("--", ">", " ", "Hello,", "	 ", "<<", " ", "World", " ", ">>", "!")

:rolleyes:

P.S: your solution is indeed very clever, it reminds me of Formal Grammar

1 Reply
(@denist)
Joined: 11 months ago

Posts: 0

when i copied the code i lost some piece:


fn easySplitString str tokens =
(
	local sx = ""		/* start of text symbol*/
	local ex = ""		/* end of text symbol*/
	local xx = ""
	local em = ""

	for i=1 to tokens.count do 
	(
		t = tokens[i]
		sss = substitutestring str t (ex+t+ex)
		if sss.count != str.count do
		(	
			str = substitutestring sss xx em
			str = substitutestring str ex sx
		)
	)
	filterstring str sx
)

thank for your pointing to the problem…

Seeing as how the competition is over, let me pose another challenge: lex a string to data structures based on an array of tokens.

my naive lexing attempt:


  (	clearListener(); local start = timestamp()
  	struct codeNode
  	(nodeType, nodeValue, nodePosition, tokenLength, tokenEndPosition,	on create do (this.tokenEndPosition = (nodePosition+tokenLength)))
  	local mySnippet = "fn doIt=(print \"doing it!\");fn doNothing=();for i=1 to 10 do print \"works!\"; if 1>2 & 2>3 then doIt() else doNothing()"
  	local mxsTokens = #("if","then","else","for","to","do","print","&","=",">","\"","\\\"","fn","(",")",";","--","while",".count","not","and",\
  			"by","random","collect","as array","function","+","-","/","*","floor","as integer","as string","as float","try","catch","
","	",\
  			"angle", "slider", "spinner", "button", "checkbutton", "mapbutton", "materialbutton", "pickbutton", "checkbox",\
  			"colorpicker", "listbox", "multilistbox", "dropdownlist", "combobox", "edittext", "groupbox", "hyperlink", "label", "progressbar",\
  			"radiobuttons", "bitmap", "imgtag", "rollout", "subrollout", "curvecontrol", "popupmenu", "timer", "menuitem", "rcmenu")	
  	local ast = #()
  	for j=1 to mySnippet.count do
  	( 
  		for i=1 to mxsTokens.count do
  		( 	
  			match = true
  			for h=1 to mxsTokens[i].count do
  				(if mySnippet[j+h] != mxsTokens[i][h] then (match = false))
  			if match == true then
  			(
  				local myNode = codeNode nodeType:undefined nodeValue:mxsTokens[i] nodePosition:j tokenLength:mxsTokens[i].count
  				append ast myNode
  			)
  		)
  	)
  	local myTotalTime = ((timeStamp() - start) / 1000.0) as string
  	print ("time elapsed: " + myTotalTime)
	print ast
  )
  
1 Reply
(@denist)
Joined: 11 months ago

Posts: 0

no, no, no…
it doesn’t work this way. nobody will solve the task for you if it doesn’t look as abstract.

so let’s change the rules…
#1
we have a list of tokens. for example #(” “,”:”,”=”,”(“,”)”,”!”)
#2
we have a list of operators and conditions. e.g. #(“+”, “!=”, “+=”)
#3
we have a list of reserved words. e.g. #(“if”, “do”)

let’s split the string:
“if miff != (::doom) do life+= 1”

I’ll bite. Here’s another snippet:


  (
  	clearListener()	
  	local tokens = #(" ", ":", "=", "(", ")" ,"!", "+", "!=", "+=", "if", "do")
  	local myString = "if miff != (::doom) do life+= 1"
  	struct codeNode
  	(nodeType, nodeValue, nodePosition, tokenLength)
  	local ast = #()
  	local myString = " " + myString --padd a space @ beginning for line 17's [j+h]
  	--
  	local start = timestamp()
  	for j=1 to myString.count do
  	( 
  		for i=1 to tokens.count do
  		( 	
  			local match = true
  			for h=1 to tokens[i].count do
  				(if myString[j+h] != tokens[i][h] then (match = false))
  			if match == true then
  			(
  				local myNode = codeNode nodeType:undefined nodeValue:tokens[i] nodePosition:j tokenLength:tokens[i].count
  				append ast myNode
  			)
  		)
  	)
  	local myTotalTime = ((timeStamp() - start) / 1000.0) as string
  	print ("time elapsed: " + myTotalTime)
  	print ast
  )
  

On a dualcore 1ghz execution time is 0.008ms.
Not sure about the memory footprint, but it’s probably pretty bad considering all the struct instances.

Whats missing in this implementation is correctly identifying user defined tokens, example: “miff, doom, life” in “if miff != (::doom) do life+= 1”. I’m doing this in a second pass:


 local endPos = (ast[1].nodePosition-1)
 local userToken = substring myString 1 endPos
 userToken = (trimleft userToken)
 userToken = (trimRight userToken)				
 if userToken != "" then
 (
 	local myNode = codeNode nodeType:undefined nodeValue:userToken nodePosition:1 tokenLength:userToken.count
 	append ast myNode
 )		
 --get the rest of the missing tokens, skipping the first
 for i=2 to ast.count do
 (		
 	local startPos = (ast[i-1].tokenEndPosition)
 	local endPos = (ast[i].nodePosition-startPos)
 	--
 	local userToken = substring myString startPos endPos
 	userToken = (trimleft userToken)
 	userToken = (trimRight userToken)				
 	if not isSpace userToken AND userToken != "" then
 	(
 		local myNode = codeNode nodeType:undefined nodeValue:userToken nodePosition:startPos tokenLength:userToken.count
 		append ast myNode
 	)					
 )
  

The next step might be to organize the tokens in the ast array according to their nodePosition.
Or, another step would be to define the nodeType on each instance of codeNode, for example a codeNode with the property .nodeValue == “true” would have a nodeType of “boolean”.

honestly the issue is closed. after we know how to split a string (text) by tokens and keep the tokens the problem lost its fascination.

Page 3 / 3