[Closed] Distance vs squared distance
So, in a conversation with Dimitry we were discussing how distance() is a slow function because it uses a square root. So I did a test and wrote a squaredDistance function that looked something like…
fn squaredDistance v1 v2 =
(
    v3 = v1 - v2
    sDist = v3.x^2 + v3.y^2 + v3.z^2 -- squared length, no square root
)
Then I ran a test comparing this function against the distance function in MAXScript. The distance function was way faster, which I wouldn't have expected.
I then added sqrt sDist to the end of my squaredDistance function and ran the test again, and found that it slowed things down, but only by very little.
Can anyone shed any light on this?
Interestingly enough, it takes just as long to add 1+1+1 as it does to calculate distance [0,0,0] [1,1,1].
- So is a manual Pythagorean distance calculation actually faster than the distance function?
- How do you measure the speed? I assume you take a large number of calculations, rather than time a single calculation?
- Concerning your last remark, is this the same for other values?
(
fn Distance0 v1 v2 = --this is your original function
(
    v3 = v1 - v2
    sDist = sqrt (v3.x^2 + v3.y^2 + v3.z^2)
)
fn Distance1 v1 v2 = --this is the same without the last assignment -> a bit faster
(
    v3 = v1 - v2
    sqrt (v3.x^2 + v3.y^2 + v3.z^2)
)
--this version tries to avoid one variable creation but does 3 subtractions -> a bit slower
fn Distance2 v1 v2 =
(
    sqrt ((v1.x-v2.x)^2 + (v1.y-v2.y)^2 + (v1.z-v2.z)^2)
)
fn Distance3 v1 v2 = --this one uses the MXS length() call which does the sqrt and sqr calls in C++ code -> much faster
(
    length (v1-v2)
)
st = timestamp()
for i = 1 to 1000000 do Distance0 [1,2,3] [4,5,6]
format "Distance0: %ms\n" (timestamp()-st)

st = timestamp()
for i = 1 to 1000000 do Distance1 [1,2,3] [4,5,6]
format "Distance1: %ms\n" (timestamp()-st)

st = timestamp()
for i = 1 to 1000000 do Distance2 [1,2,3] [4,5,6]
format "Distance2: %ms\n" (timestamp()-st)

st = timestamp()
for i = 1 to 1000000 do Distance3 [1,2,3] [4,5,6]
format "Distance3: %ms\n" (timestamp()-st)

st = timestamp()
for i = 1 to 1000000 do distance [1,2,3] [4,5,6] --this is for comparison
format "MXS Distance: %ms\n" (timestamp()-st)
)
/*
--The following results are from 3 runs of each function with 1M iterations each:
Distance0: 7219ms
Distance0: 7000ms
Distance0: 7328ms
OK
Distance1: 6641ms
Distance1: 6703ms
Distance1: 6953ms
OK
Distance2: 7891ms
Distance2: 7844ms
Distance2: 8281ms
OK
Distance3: 2765ms
Distance3: 2625ms
Distance3: 2734ms
OK
MXS Distance: 1297ms
MXS Distance: 1360ms
MXS Distance: 1344ms
OK
*/
As you can see, assigning to an intermediate variable has a small time penalty, but the v3 assignment pays off: the subtraction of all 3 components is performed at once (in C++) and the result is then reused for the sqrt/^2 calls.
Removing the unnecessary assignment of the result to sDist shaves off a little more time.
Trying to avoid the v3 variable overhead does not pay off, because we have to do 3 float subtractions instead of just one Point3 subtraction, so the third function is actually slower.
When using the length() method, we use a C++ function to do the root and square calculations, which are of course much faster than the respective MXS calls. It is almost 3 times faster, but still twice as slow as the native distance() method, which does all of the above in C++.
Paul and I were talking about this because I ran into a problem where I had to calculate distances between huge numbers of verts on meshes to find whether any were too close to each other. After talking with some programmers, I was advised to avoid using distance because it needs to take a square root: |(x, y, z)| = sqrt(x^2 + y^2 + z^2). The square root is still a fairly slow operation, even when done in C++ where it goes straight through the CPU. That is why in C++ there is a function called sqrDist that computes distance^2 and avoids the square root, increasing the speed of the calculation. Thinking that this would make my script much faster, I attempted to use the same trick.
As I stumbled upon this, I told Paul, thinking this would be an awesome speed-up for many scripts. And suddenly we ran into this issue that even addition of integers in MXS is slower than the distance() done in C++. I am assuming this is due to the fact that the math for MXS is ultimately handled somewhere at the core of 3ds Max, but it would be quite interesting to find out exactly how. And maybe it would be possible to make most of the math fns in MXS derive from C++? It would definitely speed things up… Of course, at that point it becomes much easier to just start writing plug-ins :)
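For what it's worth, the kind of squared-distance comparison I had in mind looks roughly like this (a sketch only, the function name is made up):
fn closerThan v1 v2 threshold =
(
    local d = v1 - v2                            -- one Point3 subtraction
    local sqDist = d.x*d.x + d.y*d.y + d.z*d.z   -- squared length, no square root
    sqDist < threshold * threshold               -- compare against the squared threshold
)
-- e.g. closerThan [0,0,0] [0.05,0,0] 0.08 returns true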
Well, good to know that you get the same values that I get, Bobo. Interesting that addition is almost as slow as sqrt, which runs in C++.
Actually, for 1 million iterations in MAXScript, on my single CPU 2.1 GHz Athlon64,
(1+2) needs 781 ms.
(1-2) needs 813 ms.
(sqrt 123) needs 718 ms.
(123^6) needs 953 ms.
These vary a bit, but the averages are pretty much right.
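They were measured with loops along the lines of the ones above, i.e. something like this (a sketch of the pattern rather than the exact code I ran):
st = timestamp()
for i = 1 to 1000000 do (1 + 2)
format "add: %ms\n" (timestamp() - st)
st = timestamp()
for i = 1 to 1000000 do (sqrt 123)
format "sqrt: %ms\n" (timestamp() - st)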
Keep in mind that an addition has to put two values into memory, perform the operation and write the result back, while the SQRT has to deal with just one value. So the difference between 781 and 718 milliseconds could be the overhead of the additional memory operation.
Still, I think there are other factors that make mesh vertex distance calculations slower, like accessing the data to compare in the first place. That’s why using an acceleration structure like a voxel grid to check for the closest vertices to any vertex is the way to go – brute-force comparison scales poorly as the mesh density increases, while encoding all vertices into a grid scales more or less linearly and then the search results are almost instantaneous.
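To illustrate the idea, here is a rough sketch of such a grid in MAXScript (illustrative only – the function name findClosePairsGrid and the parallel-array "hash map" are made up for the example, and a real version would also check the 26 neighbouring cells for points near cell borders):
fn findClosePairsGrid verts cellSize =
(
    -- lower corner of the bounding box of all points
    local bbMin = copy verts[1]
    for v in verts do bbMin = [amin bbMin.x v.x, amin bbMin.y v.y, amin bbMin.z v.z]

    -- bin vertex indices by integer cell coordinates, using a string key
    -- and two parallel arrays as a poor man's hash map
    local keys = #(), cells = #()
    for i = 1 to verts.count do
    (
        local c = (verts[i] - bbMin) / cellSize
        local key = (((floor c.x) as integer) as string) + "_" + (((floor c.y) as integer) as string) + "_" + (((floor c.z) as integer) as string)
        local k = findItem keys key
        if k == 0 then (append keys key; append cells #(i)) else append cells[k] i
    )

    -- brute-force comparison only inside each cell
    local pairs = #()
    for cell in cells do
        for a = 1 to cell.count - 1 do
            for b = a + 1 to cell.count do
                if distance verts[cell[a]] verts[cell[b]] < cellSize do append pairs #(cell[a], cell[b])
    pairs
)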
Unfortunately I do not know exactly how to build voxel grids, so I found a workaround using arrays. The initial brute-force fn did not suit me, so I spent some time figuring out how to make it faster. I created a multidimensional array of mini arrays containing the vertex index, x pos, y pos and z pos, then qsorted them with a function that compared their vector lengths, and then used a for loop to go through the XYZ coordinate comparisons. If (Vec2.x - Vec1.x) is < 0.08 and > -0.08, move on to Vec.y, same procedure, then Vec.z. If all are true, we have a close encounter ;). The initial brute-force fn cost me a process of 10 secs and my approach reduced it to 0.063 sec. 167x faster :)
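Roughly, the approach looked something like the following sketch (simplified and with made-up names; the real script scans more than just the immediate neighbour in the sorted list):
fn compareByLength r1 r2 =
(
    -- r1 and r2 are #(index, x, y, z) records; sort by the length of the position vector
    local l1 = length [r1[2], r1[3], r1[4]]
    local l2 = length [r2[2], r2[3], r2[4]]
    if l1 < l2 then -1 else if l1 > l2 then 1 else 0
)

fn findCloseVertsSorted verts threshold:0.08 =
(
    -- build the #(index, x, y, z) records and sort them by vector length
    local recs = for i = 1 to verts.count collect #(i, verts[i].x, verts[i].y, verts[i].z)
    qsort recs compareByLength

    -- neighbours in the sorted list are candidates; check the per-axis deltas
    local pairs = #()
    for i = 1 to recs.count - 1 do
    (
        local r1 = recs[i], r2 = recs[i+1]
        if abs (r2[2] - r1[2]) < threshold and abs (r2[3] - r1[3]) < threshold and abs (r2[4] - r1[4]) < threshold do
            append pairs #(r1[1], r2[1])
    )
    pairs
)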
I apologize in advance for posting to an old thread, but you can’t deduce anything about C++/compiled code speed from profiling MAXScript code. The primary bottleneck in MAXScript is the length of the code itself. Also, it is not possible to profile C++ accurately using simple mathematical functions with fixed arguments such as a = 1 + 2; b = sqrt(3), because an optimizing compiler replaces a and b with the direct answer! Even if you had a function such as:
float someFunc(float arg){ return arg/sqrt(4); }
it is likely that the compiler would emit something like:
float someFunc(float arg){ return arg*0.5f; }
If you wanted to be sure your profile results were correct you would need to compare something like:
randCost=profile(twistRandf());
a=profile(twistRandf()+twistRandf())-(randCost*2);
b=profile(sqrt(twistRandf()))-(randCost);
This would produce a fairer comparison, since the optimizer can’t do strength reduction / function-value substitution because it doesn’t know the values at compile time. You will find that ‘sqrt’ is MUCH slower than ‘add’ on modern processors, even using speed-up techniques such as reciprocal-sqrt-based methods.
That said, I’ve had problems like yours before, where code using less expensive operations, intended as a fast approximation, was vastly slower than the full calcs. Confused by the situation, I ported the code to C++/Assembly and found the approximations to be faster than the baseline accurate calcs, regardless of what MAXScript profiling told me. In one situation, a SLERP approximation was 3x slower than the standard MAXScript slerp() built-in function, but a straight port of the code to C++ showed the approximation was ~10x faster than a full slerp()!
We were really not testing the speed of C++ but the speed of MAXScript. The code I was writing was in MAXScript and making MAXScript calls. I see what you mean, however, but I’m not sure it changes anything in the findings.