[Closed] quickly find duplicated names in array of objs

I have an array of objects, and some of the objects have the same name.
What is the fastest way to identify the duplicated names?

I am trying to understand if/how bsearch can be applied in this situation, but the documentation is unclear.

(@denist)
uniques = #()
dups = #()
-- appendIfUnique returns false when the name is already in "uniques",
-- so any name seen more than once gets collected into "dups"
for node in nodes where not (appendIfUnique uniques node.name) do appendIfUnique dups node.name
dups

(written blind)

i cannot guarantee that it is the fastest way, but it's fast enough, clear and compact

Nice Denis. My own solution for this is easily 3x longer. I think I need to start thinking differently.

Your solution works if I change ‘nodes’ to ‘objects’

(@denist)

nodes in my script means ‘a list of any specified objects which have a “name” property’. (selection, materials, modifiers, etc.)
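for example, any of these would work as the input list (just a sketch; 'obj' here is a placeholder for some node):

nodes = selection as array                       -- the current selection
nodes = for m in sceneMaterials collect m        -- or the scene's materials
nodes = for m in obj.modifiers collect m         -- or the modifiers of some node 'obj'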

the binary search solution would look a bit like this…


   strings = #("grandad", "table","immutable","mutable","immutable","Dictionaries","Vectors",
   			"mutable","immutable","Dictionaries","Vectors")
   
   struct hstr ( str, hash )
   	
   fn sortcomparefn i j values: =
   (
   	v1 = values[i].hash; 
   	v2 = values[j].hash;
   	if v1 < v2 then 1 else if v1 > v2 then -1 else 0;
   )	
   
   fn hashstring str =
   (
   	h = 2166136261;
   	for i = 1 to str.count do
   		h = bit.or (h * 16777619) (bit.charAsInt str[i]);
   	h;
   )	
   
   fn CollectHStr stringlist = for s in stringlist collect hstr s (hashstring s);
   
   fn FindUniqueBinaryStylee stringlist =
 (	
 	nStrs = stringlist.count;
 	
 -- hash the strings and create an indexing array	
 	
 	hashedstrings = CollectHStr stringlist;
 	indexes =  #{1..nStrs} as array;
 
 -- sort the indexes	
 	
 	qsort indexes sortcomparefn values:hashedstrings;
 
 -- collect uniques using binary search	
 	
 	for i = 1 to nStrs where i == bsearch i indexes sortcomparefn values:hashedstrings collect i
 )	

   uniques = FindUniqueBinaryStylee strings;
duplicates = (#{1..strings.count} - uniques as bitarray) as array; 

not as concise as denis’s “brute force” method, but it would win hands down when the count gets high.


(@denist)

calling methods from the BIT structure is slow

getting a struct member property is slow

but the hashing is the right idea. sorting and comparing integers is faster than sorting and comparing anything else
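
for what it's worth, a rough way to see that per-call overhead (sketch only, not a proper benchmark; timings will vary per machine):

-- many bit.* calls per string (as in the hash function above) vs one getHashValue call
s = "grandad"
t0 = timestamp()
for i = 1 to 100000 do (for k = 1 to s.count do bit.charAsInt s[k])
format "bit.charAsInt per character: % ms\n" (timestamp() - t0)

t0 = timestamp()
for i = 1 to 100000 do getHashValue s 73856093
format "single getHashValue call:    % ms\n" (timestamp() - t0)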

i’m absolutely sure that for long lists a solution like this:

names = for node in nodes collect node.name
sort names
dups = #()
-- after sorting, equal names are adjacent; append a name once, at the end of its run
for k = 2 to names.count where stricmp names[k-1] names[k] == 0 and (k == names.count or stricmp names[k] names[k+1] != 0) do append dups names[k-1]

will be faster

and my solution above can be made faster… half of the string comparisons are not necessary.
pseudo code is:

if the current item in the list is the same as the previous one, and the previous one was not added to the list of dups, add the current one
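
a sketch of that pseudo code in maxscript (assuming the same sorted names array as above):

names = for node in nodes collect node.name
sort names
dups = #()
prevWasDup = false
for k = 2 to names.count do
(
    if stricmp names[k-1] names[k] == 0 then
    (
        -- only the first hit of a run of equal names is appended,
        -- and only one stricmp is done per element
        if not prevWasDup do append dups names[k]
        prevWasDup = true
    )
    else prevWasDup = false
)
dups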

i have two functions written in c++ – find uniques and find dups.
they use almost the same algorithm:

1. from the list of items being processed, collect a list of their hash values

2. sort the linked lists by hash value

3. iterate linearly through the sorted hash values to find the hits

(roughly as in the sketch below)
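
something like this in maxscript (a rough sketch only; the function names and the getHashValue seed are placeholders, not the actual c++ code):

fn cmpByHash i j values: =
(
    if values[i] < values[j] then -1 else if values[i] > values[j] then 1 else 0
)

fn findDupIndexesByHash items =
(
    -- 1. collect a hash value for every item being processed
    hashes = for itm in items collect getHashValue itm 73856093

    -- 2. sort an index array by those hash values
    idx = #{1..items.count} as array
    qsort idx cmpByHash values:hashes

    -- 3. linear pass through the sorted hashes: equal neighbours are hits
    for k = 2 to idx.count where hashes[idx[k]] == hashes[idx[k-1]] collect idx[k]
)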

easily changed, but I’m not a big fan of max’s hash function

strings = #("grandad", "table", "immutable", "mutable", "immutable", "Dictionaries", "Vectors",
            "mutable", "immutable", "Dictionaries", "Vectors")

fn sortcomparefn i j values: =
(
    v1 = values[i]
    v2 = values[j]
    if v1 < v2 then 1 else if v1 > v2 then -1 else 0
)

fn FindUniqueBinaryStylee stringlist =
(
    nStrs = stringlist.count

    -- hash the strings and create an indexing array
    hashedstrings = for s in stringlist collect getHashValue s 73856093
    indexes = #{1..nStrs} as array

    -- sort the indexes
    qsort indexes sortcomparefn values:hashedstrings

    -- collect uniques using binary search; collects a unique value or the first value in a run of duplicates
    for i = 1 to nStrs where i == bsearch i indexes sortcomparefn values:hashedstrings collect i
)

uniques = FindUniqueBinaryStylee strings
duplicates = (#{1..strings.count} - (uniques as bitarray)) as array
(@denist)

in c++ i’m using std::hash

the problem with this method is that you can only search for exact matches. so if you want to search strings with case ignored, you have to hash their lower-case versions, for example… and if you want to search for close points (point2, point3, point4, color…), you have to round them first.
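
for example, a minimal sketch of normalising values before hashing (the seed and tolerance are arbitrary placeholders):

-- case-insensitive string hash: lower-case the string before hashing
fn hashNameCI s = getHashValue (toLower s) 73856093

-- "close enough" point hash: snap to a tolerance grid before hashing
fn hashPoint3 p tol:0.001 =
(
    q = [floor (p.x / tol), floor (p.y / tol), floor (p.z / tol)]
    getHashValue q 73856093
)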

I’ve not had any issues using this method with points, normals and colours. It’s the same method used in our exporter, and that hasn’t failed yet… he says, tempting fate. As for exact matches, you can vary the hash to account for those; I use my own hash functions for the most part. Besides, the OP did ask how binary search could be used in this situation!

i said if you want to find ‘close enough’ points you have to round them anyhow before hashing…
