[Closed] Micro-Challenge. #Split a String

Are you bored, or do you want to practice a little MAXScript?
Here is a task for you:
Split a string by a list of tokens.
This is not like filterString (MXS) or String.Split (.NET).
The idea is to split up a string so that we get both the words and the tokens.

  Example:

      -- splitString <string> <tokens>
      easySplitString "--> Hello,	 << World >>!" "-<>! "
      -- Result:
      #("--", ">", " ", "Hello,", "	 ", "<<", " ", "World", " ", ">>", "!")
      
  Do you think it's trivial?
  As usual, try to make it faster and more memory-efficient. :)

PS. Let's use just pure MXS (no C#, no .NET) to even out the chances of winning for a rookie and a pro.
31 Replies

Hi Denis,
Here is my basic attempt


fn splitstring str tokens =
(
	local strArray=#()
	local lastIndex=1
	for i=1 to str.count do
	(
		local found = findstring tokens str[i]
		if found != undefined do
		(
			if i == 1 then append strArray str[i] -- starts with a token char
			else 
			(	
				if (i-lastIndex)!=0 then append strArray (substring str (lastIndex) (i-lastIndex))
				if matchpattern strArray[strArray.count] pattern:("*"+str[i]) then
				strArray[strArray.count]+=str[i]
				else append strArray str[i]
			)
			lastIndex = i+1
		)
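	)
	-- append the trailing word when the string does not end with a token
	if lastIndex <= str.count do append strArray (substring str lastIndex (str.count - lastIndex + 1))
	strArray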
)

st=timestamp()
for i=1 to 1000 do (splitstring "--> Hello,	 << World >>!" "-<>! ")
format "% seconds\n" ((timestamp() - st) / 1000.0)

   
 0.423 seconds

   Cheers

What really changed? Is it the restriction on .NET/C# use?
Well… you can use it if you want. Personally, I couldn’t beat my MXS function with any .NET solution. Maybe you can…

Great! We have a first player!

Please use the string and tokens from my sample, and give the time for 10000 executions.

thanks

ZBuffer:
1475 ms for 10000 iterations on my machine.
73073248 bytes of memory used.

not bad!

Hi Denis, does that function have a practical purpose, or is it just an exercise?
OK, I'm still trying to decipher the definition of this challenge…
Could you explain the split criterion for words and tokens?
Looking at the example result raises some questions.

In the example test string, the character after “Hello,” is a tab. In your result it is neither part of the word nor a token, and it is joined with the next token character, which is a space (“ ”). Is that part of the expected result? If so, I am really lost here. Ah(!), or maybe this is just a forum typo? I know that in a [ CODE ] block 4 spaces are replaced with a tab. If that is the case, then sorry to bother you (:

OK, I tested ZBuffer's function and it returns the expected result if there is no tab character in the string, so this part is clear, and… logically, a ‘word’ is any non-empty string delimited by tokens. Now I'm going to sleep; maybe I'll think about this again tomorrow.

[EDIT]
OK, if I’ve got the task right, then here is my basic attempt

fn splitString str tokens = (
	fn isToken ch tk = (findString tk ch != undefined)
	local aryOut = #(str[1]), index = 1, char
	for i = 2 to str.count do (
		char = str[i]
		if isToken char tokens then (
			if char == aryOut[index][1] then (aryOut[index] += char)
			else (index += 1; aryOut[index] = char)
		)
		else (
			if not isToken aryOut[index][1] tokens then (aryOut[index] += char)
			else (index += 1; aryOut[index] = char)
		)
	)
	aryOut
)

[EDIT#2]
…uh, why did I use findString instead of matchPattern?.. It's time to sleep, really.

This is my try:

(
	fn splitString str tokens =
	(
		local result = #()
		local last = undefined
		local i = 0
		local w = ""
		while i < str.count do (
			i += 1
			local s = str[i]
			if s == "	" then s = " "		-- this is so that Denis won't complain :)
			local j = findString tokens s
			if j == last then
				w += str[i]
			else (
				if i > 1 then
					append result w
				last = j
				w = str[i]
			)
		)
		append result w
		result
	)
	
	local ts = timeStamp()
	local h = heapfree
	for i = 1 to 10000 do
		splitString "--> Hello,	 << World >>!" "-<>! "
	
	format "%; %\n" (timeStamp() - ts) (h - heapfree)
)

P.S. If I remove the line of code with the comment and the line above it, and use str[i] instead of s, I get better results, but the output won’t be exactly what Denis wanted…
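
For reference, the variant described in the P.S. might look like the sketch below. The name splitStringNoTabFix is only a placeholder for illustration and is not from the post. With the sample string, the tab should stay attached to the preceding word ("Hello," plus the tab) instead of being grouped with the following space.

```maxscript
-- Sketch of the P.S. variant: same loop as above, without the tab-to-space substitution.
-- splitStringNoTabFix is a hypothetical name used only for this illustration.
fn splitStringNoTabFix str tokens =
(
	local result = #()
	local last = undefined	-- token index of the previous character (undefined for word characters)
	local i = 0
	local w = ""
	while i < str.count do (
		i += 1
		local j = findString tokens str[i]	-- look up the raw character, no tab handling
		if j == last then
			w += str[i]	-- same kind of character: extend the current piece
		else (
			if i > 1 then
				append result w	-- close the previous piece
			last = j
			w = str[i]	-- start a new piece
		)
	)
	append result w
	result
)

splitStringNoTabFix "--> Hello,\t << World >>!" "-<>! "
```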

 lo1

My attempt:

fn mySplit str tok =
(
	local space = " "
	local tab = "	"
	local res = #()
	local tmpStr = ""
	local wasToken = false
	local lastChar = ""
	local istoken
	local tokens = for i = 1 to tok.count collect tok[i]
	local strArr = for i = 1 to str.count collect str[i]	
	for s in strArr do
	(		
		if lastChar == s then tmpStr+=s
		else
		(
			if s == tab then s = space
			lastChar = s
			istoken = findItem tokens s > 0
			if wasToken or isToken then
			(
				if tmpStr!="" do append res tmpStr
				tmpStr = s
			)
			else 
			(
				tmpStr += s
			)
			wasToken = isToken
		)	
	)
	append res tmpStr
	res	
)

733ms
34.08MB

I’m trying to do this without findItem and matchPattern.

It takes a long time to execute.

fn mysplitstring str tokens =
(
	local result = #()
	local lastTokId
	local getTok
	local lastIsStr
	for i = 1 to str.count do
	(
		getTok = false
		for j = 1 to tokens.count do
		(
			if (str[i] == tokens[j]) do
			(
				if j == lastTokId then (result[result.count] += tokens[j]) else (append result tokens[j])
				lastTokId = j ; getTok = true ; lastIsStr = false ; exit
			)
		)
		if getTok == false do
		(
			lastTokId = undefined
			if lastIsStr == true then (result[result.count] += str[i]) else (append result str[i] ; lastIsStr = true)
		)
	)
	result	-- return the collected pieces
)

1 Reply
(@denist)

Your code is algorithmically correct but needs optimization.
#1:
Don’t use the exit construct for breaking out of loops. It kills MXS function performance (see the MXS help for an explanation).
#2:
Change the loop through the tokens in your code to use the MXS built-in function findString.

These two things will make your code much faster.

AND as I said: Thank you for your participation. We are all learning here.
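
As a rough sketch of those two suggestions applied to the function in the previous post (an illustration only, not the poster's or Denis's actual code): the inner loop over the tokens and its exit are replaced by a single findString call. The name mySplitStringFast is a placeholder, and no timing is claimed for it here.

```maxscript
-- Sketch: the previous function with the inner token loop replaced by findString.
-- mySplitStringFast is a hypothetical name used only for this illustration.
fn mySplitStringFast str tokens =
(
	local result = #()
	local lastTokId	-- index in tokens of the previous token character, or undefined
	local lastIsStr = false	-- true while the last element of result is a word
	for i = 1 to str.count do
	(
		local ch = str[i]
		local tokId = findString tokens ch	-- undefined when ch is not a token
		if tokId != undefined then
		(
			-- extend a run of the same token character, otherwise start a new element
			if tokId == lastTokId then result[result.count] += ch else append result ch
			lastIsStr = false
		)
		else
		(
			-- extend the current word, otherwise start a new one
			if lastIsStr then result[result.count] += ch else append result ch
			lastIsStr = true
		)
		lastTokId = tokId
	)
	result
)

mySplitStringFast "--> Hello,\t << World >>!" "-<>! "
```

Like the function it is based on, this sketch treats the tab as an ordinary word character, so the tab stays attached to "Hello," rather than forming a separate element with the following space.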
