
[Closed] Challenge: Find Duplicates in Array


fn findDups5 list =
(
	vals = #()	-- distinct values, in order of first appearance
	dups = #()	-- dups[i] = bitarray of indices where vals[i] occurs

	for k = 1 to list.count do
	(
		-- finditem returns 0 when the value hasn't been seen before
		if (i = finditem vals list[k]) != 0 then append dups[i] k else
		(
			append vals list[k]
			dups[vals.count] = #{k}
		)
	)
	dups
)
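For anyone who doesn't read MAXScript, the logic of findDups5 can be sketched in Python (a rough translation for illustration, not the original code; it uses plain lists where the MAXScript version uses bitarrays):

```python
def find_dups(items):
    """Group indices of equal items: dups[j] holds every index whose
    value equals the j-th distinct value seen (0-based here, where
    MAXScript is 1-based)."""
    vals = []   # distinct values, in order of first appearance
    dups = []   # dups[j] = list of indices where vals[j] occurs
    for k, item in enumerate(items):
        try:
            j = vals.index(item)      # like MAXScript's finditem
            dups[j].append(k)
        except ValueError:            # first time this value is seen
            vals.append(item)
            dups.append([k])
    return dups

groups = find_dups(["a", "b", "a", "c", "b", "a"])
# groups -> [[0, 2, 5], [1, 4], [3]]
```

The linear `vals.index` lookup mirrors `finditem`, which is why both versions slow down as the number of distinct values grows.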
 

a little optimized, and a little faster… for strings:

5 >> dups:100 time:242 memory:6664L

it will probably be hard to beat for speed, memory use, and universality

Very impressive. I'm struggling to understand how yours is so much lighter on memory; the functions seem so similar…

(@denist)

because it uses bitarrays instead of arrays
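The saving can be illustrated outside MAXScript. This Python sketch (an analogy, not the original code) compares a list of integer indices with the same indices packed one bit per position into a single int, which is roughly how a bitarray stores membership:

```python
import sys

n = 50_000
indices = list(range(0, n, 2))   # 25,000 duplicate positions

# Plain list of ints: the list holds 25,000 pointers, and every
# int object carries its own overhead on top of that.
list_bytes = sys.getsizeof(indices) + sum(sys.getsizeof(i) for i in indices)

# The same positions packed into one big int used as a bitset
# (one bit per position), loosely analogous to a MAXScript #{...} bitarray.
bits = 0
for i in indices:
    bits |= 1 << i
bitset_bytes = sys.getsizeof(bits)   # about n/8 bytes, a few KB

print(list_bytes, bitset_bytes)      # the bitset is far smaller
```

For dense index sets the bitset needs one bit per possible position instead of a full machine word (or object) per stored index, which matches the memory figures in the thread.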

here is multi-type array setup:


seed 0
-- 50,000 random values mixed across three types: integer, string, point2
list = for k = 1 to 50000 collect
(
	v = random 1 100
	case (random 1 3) of
	(
		1: v as integer
		2: v as string
		3: point2 v v
	)
)
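For readers following along in Python, a rough equivalent of this mixed-type setup could look like the sketch below (a tuple stands in for MAXScript's point2; the seed and random stream will not match MAXScript's):

```python
import random

random.seed(0)

def make_mixed_list(n=50_000):
    """Build a test list mixing integers, strings, and point2-like
    tuples, mimicking the MAXScript setup above."""
    out = []
    for _ in range(n):
        v = random.randint(1, 100)
        kind = random.randint(1, 3)
        if kind == 1:
            out.append(v)                      # integer
        elif kind == 2:
            out.append(str(v))                 # string
        else:
            out.append((float(v), float(v)))   # point2-like tuple
    return out

mixed = make_mixed_list()
```

Note that such a list cannot be sorted directly (an int and a str don't compare), which is the key constraint raised later in the thread.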
 

and the function still works very well:

5 >> dups:300 time:276 memory:19728L

more dups, more memory used… as expected

Just an idea:

wouldn't it help to first sort the array and then look for identical neighbours (in an optimized way)?
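That idea can be sketched in Python (a hypothetical illustration, assuming all elements are mutually comparable): sort index positions by value, then one linear pass collects runs of equal neighbours.

```python
def dups_by_sorting(items):
    """Sort index positions by value, then scan adjacent entries for
    runs of equal values: O(n log n) overall instead of a linear
    search per element. Assumes a non-empty list whose elements are
    all mutually comparable."""
    order = sorted(range(len(items)), key=lambda k: items[k])
    groups = [[order[0]]]
    for k in order[1:]:
        if items[k] == items[groups[-1][-1]]:
            groups[-1].append(k)     # same value: extend the current run
        else:
            groups.append([k])       # a new run starts
    return groups

print(dups_by_sorting([3, 1, 3, 2, 1]))
# -> [[1, 4], [3], [0, 2]]  (groups ordered by value, not first appearance)
```

Python's sort is stable, so indices within each group stay in their original order; the groups themselves come out in value order rather than order of first appearance.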

(@gazybara)

If sorting is included, do you think the bsearch method is the right choice for finding dups?

(@denist)

i've tried it. first of all, it only really pays off for very big arrays with rare dups, that is, in the situation where the search otherwise goes through almost all of the array's elements.
the second problem is that this method cannot be used for multi-type arrays, where there is no way to sort the elements with any built-in sorter.

of course we have to count the time of any pre-processing, including the sorting itself. bsearch? go to the mxs help and check what bsearch returns… could it really help? bsearch can answer the question "does the key exist?" but can't say where or how many times…
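For contrast, Python's bisect module (not MAXScript's bsearch) shows what a position-returning binary search would buy: `bisect_left`/`bisect_right` on a sorted list give both the first position and the count of a key, exactly the information a bare yes/no search lacks.

```python
import bisect

data = sorted([5, 3, 7, 3, 9, 3, 5])   # -> [3, 3, 3, 5, 5, 7, 9]

def locate(sorted_data, key):
    """Return (first index, count) of key in a sorted list.
    bisect_left finds the first position where key could go,
    bisect_right the position just past the last occurrence."""
    lo = bisect.bisect_left(sorted_data, key)
    hi = bisect.bisect_right(sorted_data, key)
    return lo, hi - lo

print(locate(data, 3))   # (0, 3)
print(locate(data, 5))   # (3, 2)
print(locate(data, 4))   # (3, 0) -- absent key: count is 0
```

A search primitive that only answers "does the key exist?" would still leave the grouping work to do, which supports the point above.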
