[Closed] Challenge: Find Duplicates in Array
aNums = #(1,2,3,4,5,6,7,8,9,10,11,12,4,5,6,5,13,14,15,16)
aStrings = #(“Test1”,“Test2”,“Test2”,“test2”,“Test3”)
Find all the duplicates for integers or strings, with optional case-sensitive search. Return the indexes for all Dupes.
Fastest method wins a smiley from DenisT
Not that’s what I call cheap outsourcing!
I’ll give this a proper go tonight, but I never do very well with speed. Probably all that profanity I insist on printing to the listener…
I already have my solution It’s not too slow but I just thought it’d be fun to setup a nice challenge, I always learn lots :0
@Denis, Yep, something like that can work.
what does the function has to return in case of #(“Test1”,“Test2”,“Test2”,“test2”,“Test3”)?
is it something like #(#{1}, #{2…4}, #{3})?
let’s make a test array… do it first for numbers:
seed 0
arr = for k=1 to 10000 collect (random 1 100)
the simplest function gives me this (max 2012/64):
dups:100 time:783 memory:15344L
i hope to beat myself and do the search for ~100 ms
let’s make initial array bigger:
seed 0
list = for k=1 to 50000 collect (random 1 100)
i have now two methods. there are results:
1 >> dups:100 time:3235 memory:13864L
2 >> dups:100 time:611 memory:19759624L
the first is 5 times slower than the second, but the second is too hungry for memory
there’s very good number for speed. was it specifically optimized for integers?
here is numbers in case i know for sure that my list is integers:
3 >> dups:100 time:91 memory:6664L
My current function is only for numbers, not sure where float values will slow it at all, but working on your 50000 random number generator.
gc()
t1 = timestamp()
m1 = heapfree
seed 0
list = for k=1 to 50000 collect (random 1 100)
aDupes = #()
aDupePairs = #()
for i = 1 to list.count do
(
a = finditem aDupes list[i]
if a == 0 then
(
append aDupes list[i]
append aDupePairs #(i)
)
else
(
append aDupePairs[a] i
)
)
aDupePairs
format "dupes:% time: % memory:%
" aDupePairs.count (timestamp() - t1) (m1 - heapfree)
Actually it works for strings too… but even heavier/slower…
seed 0
list = #()
for i = 1 to 50000 do
(
append list ("Test" + (random 1 100) as string)
)
aDupes = #()
aDupePairs = #()
for i = 1 to list.count do
(
a = finditem aDupes list[i]
if a == 0 then
(
append aDupes list[i]
append aDupePairs #(i)
)
else
(
append aDupePairs[a] i
)
)
aDupePairs
format "dupes:% time: % memory:%
" aDupePairs.count (timestamp() - t1) (m1 - heapfree)
dupes:100 time: 700 memory:18580336L
here is my best function… pretty fast, no memory use, can work with any type of array elements including multiple types:
seed 0
list = for k=1 to 50000 collect ((random 1 100) as string)
fn findDups4 list =
(
vals = #()
dups = #()
for k=1 to list.count do
(
if (i = finditem vals list[k]) == 0 do i = (append vals list[k]).count
if dups[i] == undefined then dups[i] = #{k} else append dups[i] k
)
dups
)
(
gc()
t1 = timestamp()
m1 = heapfree
dups = findDups4 list
format "4 >> dups:% time:% memory:%
" dups.count (timestamp() - t1) (m1 - heapfree)
ok
)
/*
4 >> dups:100 time:175 memory:6664L – integers
4 >> dups:100 time:281 memory:6696L – strings
*/