Pixi.js: considered gl-matrix?

Created on 30 Jun 2017 · 39Comments · Source: pixijs/pixi.js

This may be heresy, but I'll throw an idea out here and let people scream bloody murder... have we ever considered standardizing on matrices and points that are array-based instead of object-based? WebGL expects arrays in most of its vertex/texture/etc. APIs. There are also a lot of existing ecosystem modules that standardize on plain old or native arrays (e.g., https://github.com/toji/gl-matrix). gl-matrix would be nice given that many browsers have hardware SIMD support behind flags, which is going to be available soon (in fact, gl-matrix internally supports this now).

I realize this would be a massive paradigm shift for pixi, and totally break backward compatibility...still, wanted to throw this out there and see what people think, and how much/little they hate the idea ;)

mreinstein

😄1 👍1

All 39 comments

1) SIMD is dead. https://github.com/rwaldron/tc39-notes/blob/a66df6740eec3358d5e24f81817db99d6ee41401/es8/2017-03/mar-21.md#10if-simdjs-status-update

2) The six simple fields "a,b,c,d,tx,ty" perform much better than even a Float32Array(9) matrix. I can't give links to the tests, but both @GoodBoyDigital and I have already tried to integrate it.

3) JS does double precision, which is critical for apps that work with big coordinates. "projection x transform" for graphics and sprites is better done on the JS side.

ivanpopelyshev on 1 Jul 2017

I'm using a Float32Array(9) in v5 which in my jsperf tests was similar if not the same perf, and prevents us from needing to do toArray and transpose operations.

https://github.com/pixijs/pixi.js/blob/455c059e8d155c1d9a05fc2ece2c02b3bdf8bf84/plugins/core/src/math/Matrix.js

gl-matrix is beneficial because it had SIMD (which as Ivan mentioned is a dead spec), but has downfalls as well in their implementation. We want a 3x3 matrix (Float32Array(9)) to hit the GPU, but do operations as if it was a 2D matrix to save computation time. gl-matrix doesn't have a good mechanism for that.

The v5 version uses storage that we can upload directly to the GPU as well as only operating on the 2D parts we care about. It also sets us up to be able to use SharedArrayBuffers and other optimizations that might let us put more work on webworkers. We'll see how far we can get with it.
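
For illustration, a minimal sketch of that idea: a 3x3 matrix stored in a Float32Array(9) that can go to the GPU as-is, while the multiply only touches the six 2D entries. The column-major layout and the names `createMatrix` and `multiply2d` are assumptions made for this example, not the actual v5 code.

```javascript
// Hypothetical helpers, not the v5 Matrix class.
// Assumed layout: column-major 3x3, so a,b,c,d,tx,ty sit at indices 0,1,3,4,6,7.
function createMatrix() {
  // Identity: [a, b, 0, c, d, 0, tx, ty, 1]
  return new Float32Array([1, 0, 0, 0, 1, 0, 0, 0, 1]);
}

// out = parent * local, computed on the 2D entries only.
// Indices 2, 5 and 8 are never written and keep their identity values.
function multiply2d(out, parent, local) {
  const pa = parent[0], pb = parent[1], pc = parent[3], pd = parent[4];
  const ptx = parent[6], pty = parent[7];
  const la = local[0], lb = local[1], lc = local[3], ld = local[4];
  const ltx = local[6], lty = local[7];

  out[0] = la * pa + lb * pc;
  out[1] = la * pb + lb * pd;
  out[3] = lc * pa + ld * pc;
  out[4] = lc * pb + ld * pd;
  out[6] = ltx * pa + lty * pc + ptx;
  out[7] = ltx * pb + lty * pd + pty;
  return out;
}

// Parent: scale by 2, translate by (10, 20). Local: translate by (3, 4).
const parent = createMatrix();
parent[0] = 2; parent[4] = 2; parent[6] = 10; parent[7] = 20;
const local = createMatrix();
local[6] = 3; local[7] = 4;

const world = multiply2d(createMatrix(), parent, local);
// world is already a 9-element Float32Array, so no toArray/transpose step
// is needed before handing it to a WebGL matrix uniform call.
```

With this layout the multiply does the same amount of arithmetic as a 3x2 object-based matrix; only the storage changes.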

englercj on 1 Jul 2017

@englercj I'm afraid we'll have to use Math.fround in many places to be consistent. Try a perf test with Float64Array.

ivanpopelyshev on 1 Jul 2017

I could try to make it Float64Array, but then we still need to downsize it to single-precision for upload. Keep in mind also, we are currently using a float32 when we upload to the GPU. So we do the math in double, then truncate to single. This may be more accurate than doing it all in single, but I'd like to try and be consistent with the data type we upload. Unfortunately that means single-precision until GL 4.0, and WebGL 2 is only GL ES 3.0 :(

englercj on 1 Jul 2017

Matrix upload was never our bottleneck. We use "uniform3fv" in those places; it is not a simple operation, and it's followed by a draw call and, in most cases, a big buffer upload. A heavy pixi app does only about 400 draw calls per frame.

After encountering users' problems with big coordinates, I prefer to store everything in double precision up until we upload it.

Also, the "a,b,c,d,tx,ty" notation is easier to write and read than "0,1,3,4,6,7". It's also used in spine, which has very sophisticated transforms on top of that. If we switch to matrices it won't be as easy to check our code later. For some people it's difficult to imagine matrix operations, but I read them easily.

UPD. I also think this one will help us more than converting matrices: https://github.com/gameofbombs/gobi/blob/master/src/core/math/FlatTransform2d.ts , that's "Flat" transform, contains all the fields needed for calculation of matrix.

UPD2. But for 3D transforms, Float32Array(16) is better and I won't speak against it.

ivanpopelyshev on 1 Jul 2017

My vote goes to the one that gives the best performance. Last I checked, using objects over arrays was faster. Although I'm not 100% sure if that's still true; that may well have changed!

For 3d I favour gl-matrix style, mainly as stuff gets uploaded to GPU a lot! With Pixi this is not generally the case. Most manipulation happens in js land (eg sprite batcher).

GoodBoyDigital on 1 Jul 2017

https://jsperf.com/obj-vs-array-view-access/1

Here is the testing I did. Object is slower than Float32Array, both of which are way slower than a normal Array. Since uniform3fv accepts a normal Array object, we should switch to that. Then we get double precision and the fastest CPU speed, and it can be directly uploaded.

Edit: Looks like that Array result was a fluke, I am unable to reproduce it?

englercj on 1 Jul 2017

My vote goes to the one that gives best performance.
Last I checked it using objects over arrays was faster.

That is very interesting. It seems counterintuitive that objects would be faster than a purely numeric array, because the underlying vm should be able to do a number of optimizations, knowing that object semantics could be largely stripped out. I'm curious if you have any ideas _why_ objects might be faster?

mreinstein on 1 Jul 2017

Relevant results from my comp (i7, Windows 10), in Chrome 59, Firefox 54.0.1 (32-bit), and Microsoft Edge 40.15063.0.0:

[benchmark result screenshots not preserved]

Seems like my chrome result of Array being faster was a fluke, not sure what that was but unable to reproduce it now. It is around 50M ops/s, but was 500M ops/s on one run I did before. Weird...

Either way, Float32Array member is consistently the same speed or faster than object in all browsers. Which is why I switched it over, it is the same speed as before (or faster) but we now avoid transposing and filling an array for uploads.

englercj on 1 Jul 2017

Can you do that with Float64 too please? :)

I care about code quality more than about one more matrix operation per drawcall.

Also, we don't use invert much. updateTransform() is our problem.

ivanpopelyshev on 1 Jul 2017

I care about code quality more than about one more matrix operation per drawcall.

How is Float64Array higher code quality? I understand we get more precision, but I'm not sure I follow why that is so important given we reduce to single-precision anyway.

Also, we don't use invert much. updateTransform() is our problem.

This is benchmarking the read/write of the data storage; the operation performed on the values in between is irrelevant.

englercj on 1 Jul 2017

I care about code quality more than about one more matrix operation per drawcall.

I agree with this too. Consider that "standardizing" on array-based matrices, points, & vectors might actually be more of a code quality improvement than a performance one... I would consider re-use of popular js matrix/vector libraries and more interoperability with popular rendering modules to be a big-time code quality win.

This is benchmarking the read/write of the data storage,
the operation performed on the values in between is irrelevant.

I think you're both right. This is a general benchmarking of data storage read/write. But Ivan has a really good point; if updateTransform() is the overwhelming operation, it would be compelling to see that.

I worry about micro benchmarks in general; with these static data sets I wonder if we inadvertently end up taking advantage of compiler optimizations in javascript vms in these tests. running real world tests would be a lot more enlightening (at the expense of being way more work to setup.)

mreinstein on 1 Jul 2017

I agree with this too. Consider that "standardizing" on array-based matrices, points, & vectors might actually be more of a code quality improvement than a performance one... I would consider re-use of popular js matrix/vector libraries and more interoperability with popular rendering modules to be a big-time code quality win.

I'm for a "(a,b),(c,d),(tx,ty)" double-precision matrix, with conversion to Float32Array for upload, backed by "posX,posY,pivotX,pivotY,scaleX,scaleY,shearX,shearY,rotZ". I will use it in my fork anyway, but I'd prefer not to deal with matrix arrays in master pixi either. That's my standard.

I also doubt that it is possible to make people use one or two standards for vec math in js.

How is Float64Array higher code quality? I understand we get more precision, but I'm not sure I follow why that is so important given we reduce to single-precision anyway.

We multiply the camera transform by the sprite transform (for scrolling) in updateTransform. The result fits on screen only if it's small, so the numbers are small in the end, but both the sprite position and the camera position can be big. User-side solutions:

1) Compute everything on the user's side, so PIXI deals only with relatively small coordinates.
2) Divide the world into chunks with big coords, where sprites have relatively small coords. However, that won't work for a subpixel camera, so the camera needs big and small coords too.

It's OK to force the user to do this on their side for a big project, but for small ones it's just one more headache.
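
A small worked example of the precision problem described here, using made-up coordinates just above 2^24 (16777216), the point where float32 can no longer represent every integer. `Math.fround` stands in for storing a value in single precision.

```javascript
// Illustrative numbers only.
const spriteX = 16777217.5; // sprite position in a big world
const cameraX = 16777216;   // camera scrolled almost as far

// Double precision all the way: the on-screen offset is exact.
const screenXDouble = spriteX - cameraX;

// Single-precision storage: each operand is rounded to float32
// before the subtraction, so the small difference is already damaged.
const screenXSingle = Math.fround(spriteX) - Math.fround(cameraX);

// Rounding to float32 only at the end (at upload time) keeps the
// small on-screen result intact.
const uploaded = Math.fround(screenXDouble);

console.log(screenXDouble, screenXSingle, uploaded); // 1.5 2 1.5
```

The sprite ends up half a pixel off when the operands are stored in single precision, even though the final on-screen value fits comfortably in a float32.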

ivanpopelyshev on 1 Jul 2017

I also doubt that it is possible to make people use one or two standards for vec math in js.

I don't understand what you mean, can you elaborate?

mreinstein on 1 Jul 2017

I don't understand what you mean, can you elaborate?

I can't, it's too much for me.

We have people with different experience in both low-level optimizations and language constructs, DSLs. We need a standard that satisfies us all on some level.

In the last two years, I made two forks of pixi (for v3 and v4) with different Transforms, I'm dealing with "pixi-spine", which has its own advanced transform, and I'm making a third fork. From "past Ivan's" point of view, array is the best because it's the simplest form and there's "gl-matrix".

ivanpopelyshev on 1 Jul 2017

I agree with @ivanpopelyshev that maintaining 64-bits is important. I'm trying to figure out how we do that, efficiently, without creating and copying buffers each frame.

Maybe we store it as a Float64Array, and when uploading copy it into a Float32Array. That at least lets us use a typed-array as the storage backing which should be easier to copy to/from a webworker.
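
A minimal sketch of that storage scheme, assuming a preallocated Float32Array that is reused for every upload so no buffer is created per frame. The names `worldTransform`, `uploadScratch` and `prepareUpload` are hypothetical.

```javascript
// Double-precision storage for the math (values are illustrative).
const worldTransform = new Float64Array([1, 0, 0, 0, 1, 0, 16777217.5, 0.1, 1]);

// Single-precision scratch buffer, allocated once and reused every frame.
const uploadScratch = new Float32Array(9);

function prepareUpload(source, scratch) {
  // TypedArray.prototype.set converts each double to float32 as it copies.
  scratch.set(source);
  return scratch;
}

const forGpu = prepareUpload(worldTransform, uploadScratch);
// forGpu is the same object as uploadScratch; worldTransform is untouched
// and still holds the full double-precision values.
```

The precision loss then happens once, at the copy, instead of accumulating through every intermediate matrix operation.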

englercj on 2 Jul 2017

👍1

Float64Array it is then, I'll just have to remember to use (0,1), (3,4), (6,7) as X,Y,translate

ivanpopelyshev on 2 Jul 2017

All interesting stuff guys, I would be keen for us to benchmark our proposed solution in a real pixi scenario - something like bunnymark.

My experience in this area is that when me and @ivanpopelyshev last switched to gl-matrix for bunny mark we found it ran significantly slower (like a third slower!)

That was a little while ago though and I really would prefer for me to be wrong here!
If the speed difference is now negligible then I think the proposed route above will be ace.

Pixi is SPEEED :P Let's make sure we test before committing fully to any solution.

GoodBoyDigital on 2 Jul 2017

We still need a benchmark of Float64Array. This implementation is acceptable to me, but personally I will use the old matrix in my fork. Also, don't forget to add properties just for compatibility.

One more thing: @mreinstein, one of the standards I do remember is PaperScript. It's a language specific to paper.js that adds point operations directly into the language. If JS had more syntactic sugar, we would use something like that too.

ivanpopelyshev on 2 Jul 2017

Is there anything I can help with re: this updated bunnymark? I'm happy to donate some time with this.

I'd be very interested in following along with the perf stuff happening here. Regardless of the results. If the array based stuff ends up still being a lot slower, I'm really curious about the _why_ behind it.

Something to note re: the perf testing... Chrome's V8 recently (in Chrome 59) shipped some pretty dramatic changes called TurboFan. Supposedly it brings some significant performance improvements.

https://v8project.blogspot.com/2017/05/launching-ignition-and-turbofan.html

might be interesting to run the updated bunnymark on a version prior to turbofan being present vs now, just for the heck of it.

mreinstein on 2 Jul 2017

@GoodBoyDigital I updated the bench to include Float64Array, and it is consistently faster to read/write to the array than the object. If it gets slower in pixi then something else changed, because read/write to a Float64Array backing store is faster than an object.

https://jsperf.com/obj-vs-array-view-access/1

Only on Edge does the object speed match/exceed Float64Array, and it is really close.
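
For anyone who wants to try this kind of read/write comparison outside jsperf, here is a rough stand-in. It is not the linked test: the field names, iteration count and timing helper are all assumptions, and a loop this simple is exactly the sort of micro benchmark a VM may optimize in misleading ways.

```javascript
// Hypothetical harness: time a function and keep its results in a sink
// so the work cannot be discarded entirely.
function timeIt(label, fn, iterations) {
  const start = Date.now();
  let sink = 0;
  for (let i = 0; i < iterations; i++) {
    sink += fn(i);
  }
  return { label, ms: Date.now() - start, sink };
}

const obj = { a: 1, b: 0, c: 0, d: 1, tx: 0, ty: 0 };
const f32 = new Float32Array(9);
const f64 = new Float64Array(9);

// Each variant writes two translation fields and reads them back.
const touchObject = (i) => { obj.tx = i; obj.ty = i + 1; return obj.tx + obj.ty; };
const touchF32 = (i) => { f32[6] = i; f32[7] = i + 1; return f32[6] + f32[7]; };
const touchF64 = (i) => { f64[6] = i; f64[7] = i + 1; return f64[6] + f64[7]; };

const iterations = 1e6;
const results = [
  timeIt("object", touchObject, iterations),
  timeIt("Float32Array", touchF32, iterations),
  timeIt("Float64Array", touchF64, iterations),
];

for (const r of results) {
  console.log(`${r.label}: ${r.ms} ms`);
}
```

Timings from a loop like this vary a lot between runs and engines, so they are only a starting point for a test inside a real pixi scene.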

englercj on 2 Jul 2017

👍1

I've tested this in my env and I get similar results...what the hell?! Why would Float64 array read/write be significantly faster than equivalent Float32 array operations? The only thing that comes to mind is maybe 64 bit floats align on word boundaries? I am perplexed.

mreinstein on 2 Jul 2017

Thanks for the offer @mreinstein ! If you could help us out with some perf tests that certainly would put to bed the whole debate with cold hard facts!

Best thing to do is fork pixi and then replace transforms with gl-matrix or @englercj matrix class. For this instance, we only really need to make the sprite batch work too - not the whole engine!

Then once we have a tweaked version we can test perf here: https://pixijs.github.io/bunny-mark/
We can then mess around with the different array types.

@englercj that's awesome dude! It's encouraging to see these results for sure. Is the v5 version near a state where we can give bunny mark a spin?

GoodBoyDigital on 2 Jul 2017

@mreinstein just a hunch, I think it may be to do with conversion from 64bit -> 32bit
as a number in js is 64bit right?
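
A tiny demonstration of the conversion this hunch refers to. It only shows that a Float32Array rounds on write and widens on read while a Float64Array stores the value unchanged; it does not by itself prove that this is what causes the speed difference.

```javascript
const single = new Float32Array(1);
const double = new Float64Array(1);
const value = 0.1; // a JS number is a 64-bit double

single[0] = value; // rounded to the nearest float32 on write
double[0] = value; // stored as-is, no conversion

const roundTripped32 = single[0]; // widened back to a double on read
const roundTripped64 = double[0];

console.log(roundTripped32 === value); // false
console.log(roundTripped64 === value); // true
console.log(roundTripped32 === Math.fround(value)); // true
```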

GoodBoyDigital on 2 Jul 2017

@GoodBoyDigital I think maybe you're right. resorting to my friend mrs. google, I stumbled upon this:

https://stackoverflow.com/questions/15823021/when-to-use-float32array-instead-of-array-in-javascript

the top rated answer (currently at 44 votes) seems reasonable and relevant.

mreinstein on 2 Jul 2017

👍2

If you could help us out with some perf tests

Alrighty, happy to help with this. :)

Best thing to do is fork pixi

@GoodBoyDigital @englercj what is the best branch to fork from at this point?

then replace transforms with gl-matrix or @englercj matrix class.
For this instance, we only really need to make the sprite batch work too - not the whole engine!

Can you elaborate on this a little more? I get not wanting to convert the whole engine to gl-matrix just to run the perf bench... I don't see any sprite batch stuff in https://github.com/pixijs/bunny-mark There appears to be a PIXI.Container as the root element, to which the bunnies are added. Are you saying I should start with pixi.container, pixi.sprite and work backwards from there up the dependency tree to find all places where the transform stuff needs replacement? Not saying I disagree, just want to ensure I've got the strategy right for minimizing unnecessary work.

mreinstein on 2 Jul 2017

updateTransform has two matrix operations inside:

composition from position, pivot, scale, rotation - can't be improved.
multiplication by parent matrix - can be improved.
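
The two operations can be sketched roughly as follows. This is a simplified illustration with plain objects and no skew, and the field and function names are assumptions rather than pixi's actual updateTransform code.

```javascript
// Step 1: compose a local matrix from position, pivot, scale and rotation.
function composeLocal(t, out) {
  const cos = Math.cos(t.rotation);
  const sin = Math.sin(t.rotation);
  out.a = cos * t.scaleX;
  out.b = sin * t.scaleX;
  out.c = -sin * t.scaleY;
  out.d = cos * t.scaleY;
  // Shift by the pivot so rotation and scale happen around it.
  out.tx = t.x - (t.pivotX * out.a + t.pivotY * out.c);
  out.ty = t.y - (t.pivotX * out.b + t.pivotY * out.d);
  return out;
}

// Step 2: world = parent * local.
function multiplyByParent(parent, local, out) {
  // Read everything first so `out` may alias `local`.
  const la = local.a, lb = local.b, lc = local.c, ld = local.d;
  const ltx = local.tx, lty = local.ty;
  out.a = la * parent.a + lb * parent.c;
  out.b = la * parent.b + lb * parent.d;
  out.c = lc * parent.a + ld * parent.c;
  out.d = lc * parent.b + ld * parent.d;
  out.tx = ltx * parent.a + lty * parent.c + parent.tx;
  out.ty = ltx * parent.b + lty * parent.d + parent.ty;
  return out;
}

const identity = () => ({ a: 1, b: 0, c: 0, d: 1, tx: 0, ty: 0 });

// Parent translated by (100, 50); child at (10, 20), scaled by 2.
const parentWorld = { a: 1, b: 0, c: 0, d: 1, tx: 100, ty: 50 };
const childLocal = composeLocal(
  { x: 10, y: 20, pivotX: 0, pivotY: 0, scaleX: 2, scaleY: 2, rotation: 0 },
  identity()
);
const childWorld = multiplyByParent(parentWorld, childLocal, identity());
```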

I suggest we make a matrix class with the old "a,b,c,d,tx,ty" props backed by a Float64Array, and rewrite updateTransform. Also, @mreinstein has contributed enough to be in the core; I suggest we add him, so he has access to branches and the build farm.
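
One way such a class could look, as a sketch only: the class name, the `array` field and the column-major layout (a,b,c,d,tx,ty at indices 0,1,3,4,6,7) are assumptions for this example, not an existing pixi API.

```javascript
// Hypothetical matrix class: named properties on top of typed-array storage.
class Matrix2d {
  constructor() {
    // Column-major 3x3 identity: [a, b, 0, c, d, 0, tx, ty, 1]
    this.array = new Float64Array([1, 0, 0, 0, 1, 0, 0, 0, 1]);
  }

  get a() { return this.array[0]; }
  set a(value) { this.array[0] = value; }

  get b() { return this.array[1]; }
  set b(value) { this.array[1] = value; }

  get c() { return this.array[3]; }
  set c(value) { this.array[3] = value; }

  get d() { return this.array[4]; }
  set d(value) { this.array[4] = value; }

  get tx() { return this.array[6]; }
  set tx(value) { this.array[6] = value; }

  get ty() { return this.array[7]; }
  set ty(value) { this.array[7] = value; }
}

const m = new Matrix2d();
m.a = 2;
m.tx = 16777217.5; // kept exactly, since the backing store is double precision
```

Existing code that reads and writes `m.a` or `m.tx` keeps working, while hot paths can index `m.array` directly. Whether the accessors cost anything measurable would need to be benchmarked.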

ivanpopelyshev on 2 Jul 2017

so he has access to branches and build farm.

Everyone has access to that, fork + PR does all of that.

englercj on 2 Jul 2017

I'm curious what kinds of results you folks get in bunny mark..when I try various branches (dev, release, others) from chrome with the default 100k bunnies I get 10-16fps consistently. Doing a perf analysis on the code running over 8.75s:

[screenshot, 2017-07-02 11:58:03 am; image not preserved]

Almost all of the time is spent in javascript calls.

[screenshot, 2017-07-02 11:57:50 am; image not preserved]

Of this time spent in javascript, the lion's share of time is actually spent in Sprite.calculateVertices(). 4x more than is spent in TransformStatic.updateTransform()

In firefox I get about twice the framerate, but the breakdown of time spent is similar; calculateVertices() takes up the bulk of time in both browsers. Is this expected? Are you getting similar results in your bunnymark runs?

mreinstein on 2 Jul 2017

This is expected for bunnies. The matrix operations that we are talking about are performed in two places: updateTransform() for multiplication, and flush() ONCE PER CALL. It doesn't matter at large scale.

ivanpopelyshev on 2 Jul 2017

I'm confused. :( How can I tell the performance of that benchmark? I was under the impression that it's mostly based on pure fps achieved. If that's true, then my frame rate is being _severely_ throttled by the vertices calculations. Looking at the first diagram I pasted above, rendering is a tiny fraction of the total frame time. vertex and transform updates are overwhelming the total frame time.

If that's false, and fps isn't the measurement of how performant my pixi build is, what criteria do I use to measure a given build?

mreinstein on 2 Jul 2017

It can be throttled on either the CPU or the GPU side (it's not shown).

If your FPS is much less than 60 but "idle" is big, it's the GPU.

If idle is small, then it's the CPU.

calculateVertices has a matrix operation inside: just multiplication of the four corners by the matrix.
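
As a rough picture of that work, here is a sketch that transforms the four corners of a sprite by its world matrix. It ignores texture trimming, and the function name and parameters are made up for the example; it is not pixi's calculateVertices.

```javascript
// Multiply the four corners of a width x height quad by a 2D matrix.
// anchorX/anchorY are in the 0..1 range, as with a sprite anchor.
function calculateCorners(m, width, height, anchorX, anchorY, out) {
  const x0 = -anchorX * width;
  const x1 = x0 + width;
  const y0 = -anchorY * height;
  const y1 = y0 + height;

  // top-left
  out[0] = m.a * x0 + m.c * y0 + m.tx;
  out[1] = m.b * x0 + m.d * y0 + m.ty;
  // top-right
  out[2] = m.a * x1 + m.c * y0 + m.tx;
  out[3] = m.b * x1 + m.d * y0 + m.ty;
  // bottom-right
  out[4] = m.a * x1 + m.c * y1 + m.tx;
  out[5] = m.b * x1 + m.d * y1 + m.ty;
  // bottom-left
  out[6] = m.a * x0 + m.c * y1 + m.tx;
  out[7] = m.b * x0 + m.d * y1 + m.ty;
  return out;
}

// A 10x20 sprite, centered anchor, scaled by 2 and placed at (100, 50).
const worldMatrix = { a: 2, b: 0, c: 0, d: 2, tx: 100, ty: 50 };
const vertexData = calculateCorners(worldMatrix, 10, 20, 0.5, 0.5, new Float32Array(8));
console.log(Array.from(vertexData)); // [90, 30, 110, 30, 110, 70, 90, 70]
```

That is sixteen multiplies per sprite per frame, which lines up with this function dominating the profile when there are 100k bunnies.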

ivanpopelyshev on 2 Jul 2017

Just a small idea for the TS fork: add a transform that inlines properties for matrix types. It's a shame there aren't transforms of that kind for TS yet :(

I think it can be done for babel too if we find a way to annotate matrix variables.

ivanpopelyshev on 3 Jul 2017

I can't understand what is going on anymore.

[image not preserved]

ivanpopelyshev on 4 Jul 2017

Something to keep in mind: the goal of this benchmark in the first place was to see if using objects vs gl-matrix would be a huge performance hit, as it was thought it might. Regardless of the differences in our benchmark results, we seem to show consistently that object perf is no worse than most of the array cases.

I also want to avoid drawing performance conclusions about pixi in general here because I've profiled the bunny mark. In Chrome, > 50% of the program's total time is spent in Sprite.calculateVertices() (40% ish), followed by TransformStatic.updateTransform() (11% ish). Firefox seems to run twice as fast, but the ratio of time spent in those 2 functions is still consistent.

I want to avoid going too far off topic, but I'll say that in doing this bunnymark profiling, I'm starting to think that our usage of es5 getters/setters might have something to do with the perf drop in chrome: https://jsperf.com/es5-getters-setters-versus-getter-setter-methods/10

Are any of you folks hanging out on slack here? I think it would be easier to chat in semi-real time rather than message passing here.

mreinstein on 4 Jul 2017

What's your email @mreinstein ? Will invite you to slack 👍

GoodBoyDigital on 4 Jul 2017

Based on what @englercj says apparently we are not doing this. Closing.

mreinstein on 4 Jul 2017

Just a small update: I have a 3x3 matrix instead of 3x2 in https://github.com/pixijs/pixi-projection/blob/master/src/Matrix2d.ts , based on Float64Array.

ivanpopelyshev on 31 Jul 2017

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.