Notifications
Clear all

[Closed] Bsearch to replace finditem with huge array?

I want to use bsearch to replace finditem function,blow is the origin way:

Fn Find_string_in_array arr str=
(
	count=0
	for i=1 to arr.count do
	(
		if findstring arr[i] str!=undefined then
		(
			count+=1
		)
	)
	count
)

key_string="C:\Users\Administrator\Personal\WeChat Files"
need_to_find=#(
"C:\Users\Administrator\Personal",
"C:\Users\Administrator\Personal\3ds Max 2020",
"C:\Users\Administrator\Personal\3ds Max 2021",
"C:\Users\Administrator\Personal\3dsMax",
"C:\Users\Administrator\Personal\Apowersoft",
"C:\Users\Administrator\Personal\Autodesk Application Manager",
"C:\Users\Administrator\Personal\WeChat Files",
"C:\Users\Administrator\Personal\WeChat Files\test.max",
"C:\Users\Administrator\Personal\Windows Live"
)

print (Find_string_in_array need_to_find key_string)

–results
–2 matches
/*
“C:\Users\Administrator\Personal\WeChat Files”,
“C:\Users\Administrator\Personal\WeChat Files\test.max”,
*/

It will too slow for a huge array,I am trying to use bsearch to do the same job,but I don’t know how to use it.

Fn bsearch_fn=
(
	--how to compare the string?
)
Fn Find_string_use_bsearch arr str=
(
	count=0
	compare=bsearch str need_to_find Comp_fn
	if compare!=undefined then count+=1
	count
)
print (Find_string_use_bsearch need_to_find key_string)

Any help would be appericate!

18 Replies

this is one way to do it, if you can guarantee the strings are already sorted you can forgo the qsort stage…

  (
    	
    	key_string="C:\Users\Administrator\Personal\WeChat Files";
    	strings = #(    "C:\Users\Administrator\Personal", "C:\Users\Administrator\Personal\3ds Max 2020",
    						"C:\Users\Administrator\Personal\3ds Max 2021", "C:\Users\Administrator\Personal\3dsMax",
    						"C:\Users\Administrator\Personal\Apowersoft", "C:\Users\Administrator\Personal\Autodesk Application Manager",
    						"C:\Users\Administrator\Personal\WeChat Files", "C:\Users\Administrator\Personal\WeChat Files\test.max",
    						"C:\Users\Administrator\Personal\Windows Live" );
    	
    	fn string_indexing_sort_compfn v1 v2 string_array: = stricmp string_array[v1] string_array[v2];
    	fn string_indexing_search_compfn v1 v2 string_array: = stricmp v1 string_array[v2];
    	indexing = for i = 1 to strings.count collect i;
    	
    	qsort indexing string_indexing_sort_compfn string_array:strings;
    	key_index = bsearch key_string indexing string_indexing_search_compfn string_array:strings;

    )

If the
key_string="C:\Users\Administrator\Personal\3ds Max"

then the key_index returns undefined,in my example with findstring function,returns

"C:\Users\Administrator\Personal\3ds Max 2020"
"C:\Users\Administrator\Personal\3ds Max 2021"

i’m afraid it’s difficult to do with besearch , the function just search the first element matched
If i need to do this , I will use regular expression , here is a reference

findstrings=@"C:\Users\Administrator\Personal\WeChat Files"
aa=#(
@"C:\Users\Administrator\Personal",
@"C:\Users\Administrator\Personal\3ds Max 2020",
@"C:\Users\Administrator\Personal\3ds Max 2021",
@"C:\Users\Administrator\Personal\3dsMax",
@"C:\Users\Administrator\Personal\Apowersoft",
@"C:\Users\Administrator\Personal\Autodesk Application Manager",
@"C:\Users\Administrator\Personal\WeChat Files",
@"C:\Users\Administrator\Personal\WeChat Files\test.max",
@"C:\Users\Administrator\Personal\Windows Live"
)
bb=aa as string
cc=substituteString bb "\\" "/"
ss=dotNetObject "System.Text.RegularExpressions.Regex" "" (dotnetclass "System.Text.RegularExpressions.RegexOptions").IgnoreCase
dd=ss.Split cc (substituteString findstrings "\\" "/")
sel=#()
for i = 2 to dd.count do 
if dd[i][1] ==  "\"" then append sel findstrings
	 else append sel (findstrings + substituteString (filterString dd[i] "\",")[1] "/"  "\\"  ) 
sel

how “huge” are your arrays?

do you sure you need “find a sub string” instead of “check matching a pattern”?

in many cases matchpattern works faster than findstring

Over 500,000 items ,I’ve tried matchpattern,it’s faster than findstring,I want to cost less time to finish the job if possible.

are you always looking from the beginning of path?

at least it looks like from your examples

(
	t0 = timestamp()
	h0 = heapfree

	x = "hjihfidjsfhjihfidjsfhjihfidjsfhjihfidjsfhjihfidjsfhjihfidjsfhjihfidjsfidjsfhjihfidjsfhjihfidjsfhjihfidjsfhjihfidjsf"
	
	for k=1 to 500000 do
	(
		(substring x 1 x.count) == x
	)

	format "time:% heap:%\n" (timestamp() - t0) (h0 - heapfree)
)

this is the worst scenario in our case… but it’s only ~1 sec. Is one sec a problem?

I don’t know how an exact string comparison is done in MXS … maybe using memcmp it might be faster … if we can’t use the SDK, at least .NET methods can give us comparable speed.

if we need to look for substring, then something simple that comes to mind is the Boyer-Moore Algorithm

it is somewhere in my archive, but… since it is in the archive, this means that it was not really in demand

Another method, which seems reasonable, is to organize everything like a TREE (which makes sense for paths). BRANCH search can be more efficient

Page 1 / 2