Notifications
Clear all

[Closed] Challenge: Find Duplicates in Array

aNums = #(1,2,3,4,5,6,7,8,9,10,11,12,4,5,6,5,13,14,15,16)
aStrings = #(“Test1”,“Test2”,“Test2”,“test2”,“Test3”)

Find all the duplicates for integers or strings, with optional case-sensitive search. Return the indexes for all Dupes.

Fastest method wins a smiley from DenisT

19 Replies

Not that’s what I call cheap outsourcing!

I’ll give this a proper go tonight, but I never do very well with speed. Probably all that profanity I insist on printing to the listener…

1 Reply
(@davewortley)
Joined: 10 months ago

Posts: 0

I already have my solution It’s not too slow but I just thought it’d be fun to setup a nice challenge, I always learn lots :0

@Denis, Yep, something like that can work.

what does the function has to return in case of #(“Test1”,“Test2”,“Test2”,“test2”,“Test3”)?
is it something like #(#{1}, #{2…4}, #{3})?

let’s make a test array… do it first for numbers:


seed 0
arr = for k=1 to 10000 collect (random 1 100)

1 Reply
(@denist)
Joined: 10 months ago

Posts: 0

the simplest function gives me this (max 2012/64):
dups:100 time:783 memory:15344L

i hope to beat myself and do the search for ~100 ms

let’s make initial array bigger:


  seed 0
  list = for k=1 to 50000 collect (random 1 100)
  

i have now two methods. there are results:
1 >> dups:100 time:3235 memory:13864L
2 >> dups:100 time:611 memory:19759624L

the first is 5 times slower than the second, but the second is too hungry for memory

what are your numbers for 50,000 test array?

1 Reply
(@davewortley)
Joined: 10 months ago

Posts: 0

dupes:100 time:208 memory:2782600L

It appears I have a memory hungry one too!

there’s very good number for speed. was it specifically optimized for integers?
here is numbers in case i know for sure that my list is integers:
3 >> dups:100 time:91 memory:6664L

My current function is only for numbers, not sure where float values will slow it at all, but working on your 50000 random number generator.

gc()
t1 = timestamp()
m1 = heapfree

seed 0
list = for k=1 to 50000 collect (random 1 100)

aDupes = #()
aDupePairs = #()

for i = 1 to list.count do
(
	a = finditem aDupes list[i]
	if a == 0 then
	(
		append aDupes list[i]
		append aDupePairs #(i)
	)
	else
	(
		append aDupePairs[a] i
	)
)

aDupePairs
format "dupes:%	time: %	memory:%
" aDupePairs.count (timestamp() - t1) (m1 - heapfree)

Actually it works for strings too… but even heavier/slower…

seed 0
list = #()

for i = 1 to 50000 do 
(
	append list ("Test" + (random 1 100) as string)
)

aDupes = #()
aDupePairs = #()

for i = 1 to list.count do
(
	a = finditem aDupes list[i]
	if a == 0 then
	(
		append aDupes list[i]
		append aDupePairs #(i)
	)
	else
	(
		append aDupePairs[a] i
	)
)

aDupePairs
format "dupes:%	time: %	memory:%
" aDupePairs.count (timestamp() - t1) (m1 - heapfree)

dupes:100 time: 700 memory:18580336L

here is my best function… pretty fast, no memory use, can work with any type of array elements including multiple types:


 seed 0
 list = for k=1 to 50000 collect ((random 1 100) as string)

fn findDups4 list = 
(
	vals = #()
	dups = #()
	
	for k=1 to list.count do 
	(
		if (i = finditem vals list[k]) == 0 do i = (append vals list[k]).count
		if dups[i] == undefined then dups[i] = #{k} else append dups[i] k
	)
	dups
)
 (
 	gc()
 	t1 = timestamp()
 	m1 = heapfree
 	dups = findDups4 list
 	format "4 >> dups:% time:% memory:%
" dups.count (timestamp() - t1) (m1 - heapfree)
 	
 	ok
 )
 

/*
4 >> dups:100 time:175 memory:6664L – integers
4 >> dups:100 time:281 memory:6696L – strings
*/

Page 1 / 2