more morphic performance

Previously, in morphic performance, I benchmarked creating, hiding, showing and deleting morphs in Squeak Smalltalk.

On [squeak-dev] Herbert Konig wrote: “Juan Vuletich has done a great job at simplifying cleaning and speeding up his version of Morphic in Cuis.”

So let’s check that out, and compare some other flavours of Smalltalk at the same time. The code used to generate the graph data is the same “V1” code as used for morphic performance. The software versions tested here are:


Figure 1 - Interflavour Morphic Performance Testing - "v1" code (note x-scale is 10% of Figure 2)

Some interesting points are:

  • Cuis is the fastest for creating, hiding and showing morphs. For deleting morphs, Cuis is no faster than non-Cog Pharo/Squeak and follows the same exponential curve.
  • Pharo-CogVM is by far the fastest at deleting morphs, near the fastest at creating morphs, but much slower at hiding and showing morphs.
  • Non-CogVM Squeak and Pharo are very similar throughout the four graphs, with Pharo being slightly faster.   Both increase exponentially for creating & deleting morphs.  I’m not sure what to make of the smoother Pharo curves.

An update will follow to include the effect of Juan Vuletich’s “addAllMorphs” performance tweak of the original code.

Update 31 March 2011
I updated the original morphic performance article with Juan Vuletich’s  addAllMorphs performance tweak.  It will be interesting to see how that compares across flavours of Squeak.

In addition, for some flavours of Squeak the previous code didn’t display any morphs at all. It seemed that the tightness of the creation/deletion loops might have optimised away the actual drawing of the morphs. I don’t expect applications to operate that way, so the results would be skewed. I think it is fairer to force a redraw between each of the four tests, so I’ve added doOneCycle below. What do you think?

"Workspace script: rand, m, ellipses and the timing variables are undeclared workspace variables."
rand := Random new.

"Create one randomly coloured ellipse at a random position and remember it."
creating := [
    m := EllipseMorph new.
    ellipses add: m.
    m color: Color random.
    m position: ActiveWorld extent * (rand next @ rand next) ].
deleting := [ World removeAllMorphsIn: ellipses ].
hiding := [ ellipses do: [ :item | item hide ] ].
showing := [ ellipses do: [ :item | item show ] ].

Transcript show: 'Morphic Performance Test Code v3.0 - ##FLAVOUR## '; cr.
Transcript show: 'amount, createTime, hideTime, showTime, deleteTime'; cr.
#(1 10 100 200 300 400 500 600 700 800 900 1000 2000
  3000 4000 5000 6000 7000 8000 9000 10000 30000 100000) do: [ :amount |
    ellipses := OrderedCollection new.
    "Each timing ends with doOneCycle so that the world is actually redrawn."
    createTime := [
        amount timesRepeat: creating.
        World addAllMorphs: ellipses.
        ActiveWorld doOneCycle ] timeToRun.
    hideTime := [ hiding value. ActiveWorld doOneCycle ] timeToRun.
    showTime := [ showing value. ActiveWorld doOneCycle ] timeToRun.
    deleteTime := [ deleting value. ActiveWorld doOneCycle ] timeToRun.
    Transcript
        show: amount asString, ', ',
            createTime asString, ', ',
            hideTime asString, ', ',
            showTime asString, ', ',
            deleteTime asString;
        cr ]


Figure 2 - Interflavour Morphic Performance Testing - "v3" code with addAllMorphs

It is intriguing how very straight the lines are now in Figure 2 compared to Figure 1. Somehow I feel like something must be wrong. Anyhow, thanks to Juan’s suggestions, the code is now ten times faster and the graph shape is linear rather than exponential, a much better prospect for system scaling.

Also interesting is how similar the flavours of Squeak are now. For up to 30,000 morphs the createTime is identical for all four flavours. After that, performance does diverge, with Cog being fastest, Squeak & Cuis together in the middle, and Pharo the slowest at about half the speed of Cog.

Strangely, Cog has a hideTime four times slower than the other three.   For showTime, both versions of Pharo (Std & Cog) are together twice as fast as Cuis and Squeak.

Test platform was Windows 7 Professional 32-bit on an Intel Core2Duo P8700 2.53GHz with 3GB RAM and graphics from the Mobile Intel 4 Series Express Chipset.

I am interested in your observations.  Can you replicate these results?
(graphs can be attached to comments)



14 Responses to more morphic performance

  1. Pingback: performance testing spreadsheet | openInWorld

  2. Henrik Johansen says:

    Pharo does not draw Morphs which are fully covered by others, as do the new Cuis 3.2, if I read release notes correctly.

    Thus your draw timings will depend quite a lot on how large a part of the screen your workspace covers (as the Ellipses are rendered below). For example, on my machine with a stock VM:

    - Show with Workspace covering entire screen:
    10k objects: 253ms
    30k objects: 652ms
    100k objects: 4955ms

    - Show with Workspace covering just a tiny part:
    10k objects: 1318ms
    30k objects: 5883ms
    100k objects: 27241ms

    As can be seen, deciding not to draw them still takes time, but that’s probably the reason they are faster than Squeak and Cuis 3.0 in your tests.


    • admin says:

      That is really useful to know. I was wary of optimisation differences that sort-of-seemed to happen when morphs were hidden/deleted too soon after they were created – before it had time to draw on screen, but had not considered differences in culling algorithms. Subsequently I did not pay attention to the size of the Workspace (and I assume Transcript) windows between the different flavours.

      Interesting though that Pharo seems slower anyhow (for this particular test script anyway)


    • Andreas Raab says:

      “Pharo does not draw Morphs which are fully covered by others, as do the new Cuis 3.2, if I read release notes correctly.”

      I’m curious what you mean by this. Morphic has always had occlusion culling intended to avoid precisely this kind of overdraw. Can you elaborate what you changed?


      • Levente Uzonyi says:

        I guess Henrik is talking about the change of Canvas >> #drawMorph:

        drawMorph: aMorph
            "Changed to improve performance. Have seen a 30% improvement."

            (aMorph fullBounds intersects: self clipRect)
                ifFalse: [^self].
            self draw: aMorph

        But this change has no effect in Squeak. I even reverted Rectangle >> #intersect: to the previous version, but that didn’t make a difference either. According to MessageTally the difference comes from incremental GCs. For 100000 ellipses Pharo’s IGCs (689) take 3278ms while Squeak’s IGCs (1753) take 7824ms.
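
        A minimal sketch of how a measurement like this might be taken, assuming a Squeak or Pharo image of that era in which MessageTally spyOn: opens a report that includes garbage-collection statistics. The morph count and the variable names are illustrative and are not taken from the session described above.

        ```smalltalk
        "Profile showing 10000 ellipses; the tally report should list time spent in GCs.
         The count 10000 and all variable names are illustrative assumptions."
        | rand ellipses |
        rand := Random new.
        ellipses := OrderedCollection new.
        10000 timesRepeat: [
            | m |
            m := EllipseMorph new.
            m position: (ActiveWorld extent * (rand next @ rand next)).
            ellipses add: m ].
        World addAllMorphs: ellipses.
        "Hide first so that the profiled block measures only the show step."
        ellipses do: [ :item | item hide ].
        ActiveWorld doOneCycle.
        MessageTally spyOn: [
            ellipses do: [ :item | item show ].
            ActiveWorld doOneCycle ].
        "Clean up the test morphs afterwards."
        World removeAllMorphsIn: ellipses.
        ```

        Running the same snippet in each flavour and comparing the GC section of the two reports would be one way to check whether collector activity accounts for the gap.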


      • Henrik Johansen says:

        Sort of.
        I rewrote WorldState>>#drawWorld: aWorld submorphs: submorphs invalidAreasOn: aCanvas a long way back.

        The way I remember it, Squeak only stops painting Morphs beneath if any one Morph covers the entire damage rect.
        The change I made in Pharo makes it so only the part which is _not_ covered by the current Morph is considered for below Morphs.


        • Henrik Johansen says:

          The effect of this was larger in Pharo at the time btw,
          since the default UI Theme had SystemWindows with translucent dropshadows within their fullBounds, which led to horrible resizing performance when there were many windows open beneath, even if they were fully covered.


  3. Henrik Johansen says:

    The intersects check used for culling really dislikes floating point bounding boxes though, as it has to convert them to Fractions before comparing with integers.

    Using Cog/Pharo, changed creation to do
    m position: (ActiveWorld extent * ((rand next)@(rand next))) rounded

    with old float position (times in ms):
    objects/create/hide/show/delete
    5k/626/342/662/10
    10k/1258/680/1325/17
    30k/3804/2048/4027/46
    100k/13573/7129/14420/197

    with new integer position (times in ms):
    objects/create/hide/show/delete
    5k/295/18/250/9
    10k/570/33/491/11
    30k/1746/81/1490/28
    100k/6390/279/5318/170

    Obviously window size/covered area was different from my last example, but same for the two runs.
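
    A rough way to check the float-versus-integer cost in isolation is sketched below. It assumes that Rectangle>>intersects: and Time millisecondsToRun: are available in the image being tested; the rectangle coordinates and the iteration count are arbitrary choices for illustration.

    ```smalltalk
    "Micro-benchmark sketch: the same intersects: check with float and with integer bounds.
     Coordinates and the repeat count are arbitrary assumptions."
    | clip floatRect intRect floatTime intTime |
    clip := (0 @ 0) corner: (800 @ 600).
    floatRect := (10.5 @ 20.25) corner: (60.5 @ 70.25).
    "Round both corners so that the second rectangle has integer coordinates."
    intRect := floatRect origin rounded corner: floatRect corner rounded.
    floatTime := Time millisecondsToRun: [
        100000 timesRepeat: [ floatRect intersects: clip ] ].
    intTime := Time millisecondsToRun: [
        100000 timesRepeat: [ intRect intersects: clip ] ].
    Transcript
        show: 'float bounds: ', floatTime printString, ' ms, ';
        show: 'integer bounds: ', intTime printString, ' ms';
        cr.
    ```

    This only times the comparison itself, so it says nothing about how much of the overall create/show time the check accounts for.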


    • admin says:

      Thanks Henrik. Another really interesting tid-bit. Looks like Pharo runs twice as fast using integers. That may account for much of the performance difference in Levente’s results below. I wonder if a useful optimisation for Pharo would be to cache the float-to-integer conversion during the intersection check. I guess it would depend on how often morphs are redrawn compared to how often they move.


  4. admin says:

    On squeak-dev, Levente Uzonyi says: “Interesting benchmark. For better comparison I ran it on my pc using the same VM (Cog r2382) and recent images.” Thanks Levente. I have graphed your results.
    [img]http://blog.openinworld.com/wp-content/uploads/2011/04/morphic-flavour-performance-Levente-Uzonyi.png[/img]


  5. Ben Coman says:

    Levente, thanks for re-running the v3 code against your recent images – with the addition of Henrik’s “rounded” tip to get an integer bounding box.

    This shows a dramatic improvement for Pharo such that it now just edges out Squeak for createTime and showTime. Cuis being half the speed of Squeak & Pharo for createTime is notable, but I guess there is a lot in flux in development with Cuis, and there may be some additional fluff in there at the moment.

    While hideTime and deleteTime seem relatively insignificant to overall performance, expanding the vertical scale shows Squeak to be half the speed of Pharo & Cuis.

    However, in all cases where performance is much worse for one flavour than for the other two, there is a kink in the slope of the poor performer, occurring in each case around 30,000. This would indicate further care should be taken in digesting these graphs, and that averaging results from multiple runs would be beneficial. Though I’ll have to look into that later.
    [img]http://blog.openinworld.com/wp-content/uploads/2011/04/morphic-flavour-performance-Levente-Uzonyi-integer.png[/img]
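
    Averaging over multiple runs could be sketched as below, assuming five repetitions per measurement and a garbage collect before each one. Both choices are arbitrary, and the measured block is only a placeholder for one of the four tests.

    ```smalltalk
    "Average several timings of the same block.
     runs := 5 and the placeholder workload are arbitrary assumptions."
    | runs measured times average |
    runs := 5.
    measured := [ 1000 timesRepeat: [ EllipseMorph new ] ].
    times := (1 to: runs) collect: [ :i |
        "Start each run from a similar memory state."
        Smalltalk garbageCollect.
        Time millisecondsToRun: measured ].
    average := (times inject: 0 into: [ :sum :each | sum + each ]) / runs.
    Transcript show: 'average ms: ', average asFloat printString; cr.
    ```

    Keeping the individual timings in times, rather than only the mean, also makes it possible to see whether a kink comes from a single outlier run.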


  6. Ben Coman says:

    btw, as much as benchmarks need to be taken with a big grain of salt, this has been a good opportunity for me to learn some of the factors affecting performance, which will be important when I get to the practicalities of my application development. Thanks all for your contributions.


  7. Levente Uzonyi on [squeak-dev] says:

    Here’s the script I used for these measurements:

    rand := Random new.
    output := (String new: 100) writeStream.
    creating := [
        | m |
        m := EllipseMorph new.
        ellipses add: m.
        m color: Color random.
        m position: (ActiveWorld extent * (rand next @ rand next)) rounded ].
    deleting := [ World removeAllMorphsIn: ellipses ].
    hiding := [ ellipses do: [ :item | item hide ] ].
    showing := [ ellipses do: [ :item | item show ] ].

    morphs := World submorphs.
    World removeAllMorphs.
    World doOneCycle.
    Smalltalk garbageCollect.
    output
        nextPutAll: 'Morphic Performance Test Code v3.1 - ##FLAVOUR## '; cr;
        nextPutAll: 'amount, createTime, hideTime, showTime, deleteTime'; cr.
    #(1 10 100 200 300 400 500 600 700 800 900 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 30000 100000) do: [ :amount |
        ellipses := OrderedCollection new.
        createTime := [
            amount timesRepeat: creating.
            World addAllMorphs: ellipses.
            ActiveWorld doOneCycle ] timeToRun.
        hideTime := [ hiding value. ActiveWorld doOneCycle ] timeToRun.
        showTime := [ showing value. ActiveWorld doOneCycle ] timeToRun.
        deleteTime := [ deleting value. ActiveWorld doOneCycle ] timeToRun.
        output
            print: amount; nextPutAll: ', ';
            print: createTime; nextPutAll: ', ';
            print: hideTime; nextPutAll: ', ';
            print: showTime; nextPutAll: ', ';
            print: deleteTime; cr ].
    World addAllMorphs: morphs.
    output contents explore

    Then in the lower pane of the explorer window type self and print it (Alt+p, or Cmd+p on a Mac).
