OK, back to the beginning, and it gets worse.
As I mentioned earlier, I found this “slow down” issue while working with real code and real models. The drop in performance was easily noticeable by eye, so that made me start measuring the performance.
To my surprise, things were running not 1.5x slower in Max 2017 but over 2x slower, and it happened with about 30-40 different pieces of code.
As I was using modifiers, callbacks, etc., it was really hard to profile the code, as I was unsure what was causing the slowdown; it could be anything when you have 10K lines of code. So I managed to reduce the problem to a single empty loop. The performance hit was not as bad as with real code, but it was enough to make me look further.
As stated before, it is logical to think that any overhead in the iterator itself would be diluted when doing real work inside the loop. For example, if we have 200 ms of overhead across 1 million iterations, doing some work that takes 10 seconds would “dissolve” those 200 ms and make them unnoticeable compared with the 10 seconds.
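As a sketch of that reasoning (the loop body and the expected proportions are illustrative assumptions, not measurements):

```maxscript
(
	-- pure iterator overhead: an empty loop
	local st = timestamp()
	for i = 1 to 1000000 do ()
	format "empty loop: %ms\n" (timestamp() - st)

	-- the same loop with some work in the body; the iterator overhead
	-- is still there but should become a small fraction of the total
	st = timestamp()
	for i = 1 to 1000000 do (sqrt (i as float))
	format "with work: %ms\n" (timestamp() - st)
)
```

If the first timing is ~200 ms and the work itself dominates the second, the iterator overhead all but disappears in the total; the point of this thread is that in Max 2017 it does not.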
Although I already had evidence that this is not always the case, I wanted the developers' word in order to understand whether this was caused by a bug or a feature. Unfortunately, according to the developers, this is not a bug.
I hope you trust me when I say that the following code was not created intentionally to make Max 2017 fail compared with previous versions. It is based on real code that, in fact, performs better than this example.
I was so surprised by the results that it took me some time to test it in all possible ways before posting, but no matter what I do, I get the same results.
These tests were performed on a fresh Max startup for every version and are the average of 3 consecutive runs (1 run should be enough, though).
The drop in performance in Max 2017 compared with any previous version tested is over 360%.
There is also over 1GB of memory leaked in any Max version, including Max 2017.
I won’t claim this happens on all machines or that it is conclusive, but these are real results from real code. Try it for yourself before drawing any conclusions.
I hope someone finds this useful.
(
	fn GetElements node =
	(
		local GetFaceVerts = polyop.getfaceverts
		local faces = #{1..node.numfaces}
		local verts = #{1..node.numverts}
		local geoverts = for v in verts collect #()
		for f in faces do append geoverts (GetFaceVerts node f)
		local elementsVerts = #()
		for face in faces do
		(
			farray = #(face)
			varray = #{}
			for f in farray where faces[f] do
			(
				for idx in (GetFaceVerts node f) do
				(
					if geoverts[idx] != undefined do
					(
						join farray geoverts[idx]
						append varray idx
						geoverts[idx] = undefined
					)
				)
				faces[f] = false
			)
			append elementsVerts (varray as bitarray)
		)
		return elementsVerts
	)
	delete objects
	obj = converttopoly (plane length:100 width:100 lengthsegs:50 widthsegs:50)
	polyop.breakverts obj #all
	gc(); st = timestamp(); sh = heapfree
	sysRAM = (sysinfo.getSystemMemoryInfo())[3]
	GetVertPos = polyop.getvert
	SetVertPos = polyop.setvert
	elements = GetElements obj
	undo "Randomize" on
	(
		for j in elements do
		(
			center = [0,0,0]
			for i in j do center += GetVertPos obj i
			tm = matrix3 1
			translate tm center
			prerotatez tm (random 0 180)
			pretranslate tm -center
			for i in j do
			(
				pos = GetVertPos obj i
				pos *= tm
				SetVertPos obj i pos -- This leaks over 1GB of RAM
			)
		)
	)
	format "3DS MAX %\n" (((maxversion())[1]/1000)+1998)
	format "time:%ms heap:% RAM:%MB\n" (timestamp()-st) (sh-heapfree) ((sysRAM-(sysinfo.getSystemMemoryInfo())[3]) / 1024.^2)
)
3DS MAX 2011
time:1429ms heap:3471592L RAM:1138.97MB
3DS MAX 2014
time:1360ms heap:3460208L RAM:1147.47MB
3DS MAX 2016
time:1312ms heap:3499760L RAM:1139.0MB
3DS MAX 2017
time:5015ms heap:3704136L RAM:1138.08MB
If you run this test, please publish the results so we can all see whether this happens in other scenarios as well and not just on my end.
This sounds like exactly my situation and thought process too!
Here are my results:
3DS MAX 2017
time:5077ms heap:3485764L RAM:1146.23MB
3DS MAX 2016
time:1415ms heap:3321464L RAM:1140.9MB
Have you broken this code down to identify if the slowdown is specific to a particular loop or operation?
I still don’t think the problem is related specifically to loop operations. We just associate it with loops because that’s the simplest way to run many iterations.
I’m pretty sure that the code:
(
	a = [0,0,0]
	( a += 1 )
	( a += 1 )
	( a += 1 )
	( a += 1 )
	-- ...the same block repeated another 1000 times
	( a += 1 )
)
will be slower in 2017.
My first hypothesis was that this was caused by the iterator.
I thought there was a gc() kicking in during the loop that caused the 50% slowdown, but Denis suggested it is due to a scope change, and I think he is right. This situation is obviously emphasized in loops, as nobody will unroll a loop just to make the code run in Max 2017 the way it does in previous versions.
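To make the scope hypothesis concrete, here is a minimal sketch (the variable names are mine, and whether a gap shows up depends on the Max version) comparing a loop body that is a single expression against a parenthesized body that opens a scope and declares a fresh local each pass:

```maxscript
(
	local x
	local st = timestamp()
	for i = 1 to 1000000 do x = i             -- single-expression body: assigns to an outer local
	format "no block: %ms\n" (timestamp() - st)

	st = timestamp()
	for i = 1 to 1000000 do ( local y = i )   -- block body: opens a scope with its own local each pass
	format "block with local: %ms\n" (timestamp() - st)
)
```

If the scope-change explanation is right, the second timing should show a noticeably larger regression in Max 2017 than the first.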
However, there is a worse problem that drops the performance in Max 2017 immensely.
Thanks to Vojtech’s suggestion, I started looking at where exactly the problem is, and I found an interesting situation.
The biggest problem does seem to be iterating over bitarrays.
How bad is the performance drop in Max 2017 in these situations? Bad enough, so far.
I wouldn’t have found this myself if it weren’t for the help of Denis and Vojtech, so thank you so much, guys, for your contributions and knowledge.
Is this a new feature or a bug? You tell me.
(
	st = timestamp(); sh = heapfree
	elements = for j = 1 to 10000 collect #{j}
	for j in elements do
	(
		for i in j do ()
	)
	format "time:% ram:%\n" (timestamp()-st) (sh-heapfree)
)
MAX 2016
time:261 ram:663624L
MAX 2017
time:6540 ram:680204L
I can confirm that the code above is written in very good style with good practices.
Any optimization of this code can only be done with tricks, which shouldn’t be normal practice.
I think it’s because of a change in how values are handled in different scopes.
Just guessing, but let’s play a game: what results do you get if you replace the line
for i in j do center += GetVertPos obj i
with a while loop, something along the lines of
k = j as array
count = 0
while count < k.count do center += GetVertPos obj k[count+=1]
It adds a bitarray-to-array conversion (which might or might not make it perform even worse) and some extra checking, but it doesn’t open a new scope, which seems to be the main issue here. I don’t have a Max machine to test on right now.
To answer my own question: in Jorge’s code, the while loop, even with the added overhead of the array conversion, is 1.5x as fast as the for loop (so still about 2x slower than the for loop in 2016). And to comment on what denisT suggested:
(
	local stream = stringStream ""
	local iter = "( a += 1 )\n"
	append stream "(\na = [0,0,0]\n"
	for i = 1 to 10000 do append stream iter
	append stream ")\n"
	t = timeStamp()
	execute (stream as string)
	format "time: %\n" (timeStamp() - t)
)
2016: 59 ms
2017: 83 ms
IMHO, if there were a for-loop syntax that used a predeclared iterator/loop variable and didn’t open a new scope, it would be at worst 1.5x slower no matter what you do in the loop body and how many loops you nest.
And with these changes (eliminating the conversion overhead in 2017):
varray = #()
...
count = 0
while count < j.count do center += GetVertPos obj j[count+=1]
3DS MAX 2016
time:1236ms heap:2643720L RAM:1165.67MB
3DS MAX 2017
time:1355ms heap:3033904L RAM:992.391MB
For me, even without the last example (which proves my thoughts very well), it was not a bug. It’s much worse: it’s a BAD feature. And it will be very hard to fix, because the developers would have to admit their mistake.
Yes, this is the thing. It’s not nested loops, as I thought, and it’s not passing a variable from one scope to another; it’s passing an iterable variable from one loop to another. Great find, Jorge!
No, it’s partly nested loops (and it might get worse when a variable from a different scope gets assigned to), but this is a whole different situation specific to bitarrays; see for example:
(
	st = timestamp(); sh = heapfree
	local almostEmptyBitarray = #{10000}
	for i = 1 to 10000 do
		for j in almostEmptyBitarray do ()
	format "time:% ram:%\n" (timestamp()-st) (sh-heapfree)
)
Max 2016 time:701 ram:184L
Max 2017 time:8996 ram:196L
Could anyone do this test:
a = #{1..1000}
for k in a do for i in a do ()
PS: it would be more interesting to change `a` somehow… let’s say:
a = #{1..1000}
for k in a do for i in a do (a[i] = on)
Just this loop, as it is, not nested?
2016: 187 ms
2017: 298 ms
With changing it:
2016: 292 ms
2017: 480 ms
It only gets extremely bad when it happens in the inner loop.
WOW! That’s a pretty commonly used code combination.
But this is different from Jorge’s finding.
Well, yeah; say you have a 1M-vert mesh and do ONLY 100 loops of something, so let’s have a look:
(
	st = timestamp(); sh = heapfree
	local emptyBitarray = #{}
	emptyBitarray.count = 1000000 -- so that it's like the bitarray returned from a 1M-vert mesh,
	                              -- for example the selected verts that you get once at the beginning
	for i = 1 to 100 do
		for j in emptyBitarray do ()
	format "time:% ram:%\n" (timestamp()-st) (sh-heapfree)
)
The result is exactly the same as before, i.e. a bit more than half a second in 2016 vs. eight seconds in 2017.
Don’t worry, those results are from testing in local scope; it’s a bit slower in global scope, but not by much.
Ahh, my bad, I forgot to put it in local scope:
( local a = #{1..1000}; for k in a do for i in a do (a[i] = on))