
I finally renewed my App Hub account, meaning it was time for me to make sure everything is working on the actual Xbox hardware. When I went to see how my current build was running… let’s just say things were pretty bad.  So I’ve been in the process of getting the game up to snuff for its console home, and I thought it might be nice to outline a few of the specifics.

So glad I paid attention in Linear Algebra

Aeternum doesn’t use the built-in XNA SpriteBatch. My engine uses a custom group of batched primitive drawing tools, one reason being that the SpriteBatch simply isn’t powerful enough for the amount and types of drawing I’m doing.

The Bullet engine is one of these, although it actually uses a custom vertex and pixel shader combo, similar to the 3D particle effect demo, to move most of the drawing math to the GPU.

However, the bulk of the sprite drawing is done by filling a large dynamic vertex buffer with quads whose vertices are calculated by the CPU, just like the SpriteBatch does (to the best of my knowledge). My original implementation used the XNA Matrix structure to perform these calculations, since it’s easy to get going and already built to interact with Vector2 for transformations. However, the sheer bulk of math being done in the Matrix routines was starting to cause problems. I first tried using the ref/out overloads to pass pre-made Matrix objects around by reference, but that got unwieldy and wasn’t helping much.

My answer was to rip that out and write the transformation math myself, inline, from scratch. What’s nice in this case is that since my engine is purely 2D, the math is actually pretty simple: all I needed was a translation, a rotation, a scale, and a final translation to generate vertices. Very simplified, it looks a bit like this:

// Precompute the rotation terms once per sprite.
float c = (float)System.Math.Cos(sprite.Rotation);
float s = (float)System.Math.Sin(sprite.Rotation);

// Corner offsets relative to the sprite's origin (the first translation).
float x1 = -sprite.Origin.X;
float x2 = sprite.Texture.Width - sprite.Origin.X;
float y1 = -sprite.Origin.Y;
float y2 = sprite.Texture.Height - sprite.Origin.Y;

// Rotate each corner offset, then apply the scale and the final translation to the sprite's position.
Vector2 topLeft = new Vector2(
(x1 * c - y1 * s) * sprite.Scale.X + sprite.Position.X,
(y1 * c + x1 * s) * sprite.Scale.Y + sprite.Position.Y);

Vector2 topRight = new Vector2(
(x2 * c - y1 * s) * sprite.Scale.X + sprite.Position.X,
(y1 * c + x2 * s) * sprite.Scale.Y + sprite.Position.Y);

Vector2 bottomLeft = new Vector2(
(x1 * c - y2 * s) * sprite.Scale.X + sprite.Position.X,
(y2 * c + x1 * s) * sprite.Scale.Y + sprite.Position.Y);

Vector2 bottomRight = new Vector2(
(x2 * c - y2 * s) * sprite.Scale.X + sprite.Position.X,
(y2 * c + x2 * s) * sprite.Scale.Y + sprite.Position.Y);
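For the curious, here’s a rough sketch of how corners like these might then get packed into the dynamic vertex buffer I mentioned above, two triangles per quad, pushed to the GPU once per frame. The class and field names are just for illustration, not the actual batcher from my engine:

using Microsoft.Xna.Framework;
using Microsoft.Xna.Framework.Graphics;

class QuadBatcher
{
    // One quad = two triangles = six vertices (no index buffer in this sketch).
    private readonly VertexPositionColorTexture[] vertices;
    private readonly DynamicVertexBuffer vertexBuffer;
    private int vertexCount;

    public QuadBatcher(GraphicsDevice device, int maxQuads)
    {
        vertices = new VertexPositionColorTexture[maxQuads * 6];
        vertexBuffer = new DynamicVertexBuffer(device,
            typeof(VertexPositionColorTexture), maxQuads * 6, BufferUsage.WriteOnly);
    }

    // Appends one quad using corners that were already transformed on the CPU.
    public void AddQuad(Vector2 topLeft, Vector2 topRight,
                        Vector2 bottomLeft, Vector2 bottomRight, Color color)
    {
        vertices[vertexCount++] = new VertexPositionColorTexture(new Vector3(topLeft, 0f), color, new Vector2(0f, 0f));
        vertices[vertexCount++] = new VertexPositionColorTexture(new Vector3(topRight, 0f), color, new Vector2(1f, 0f));
        vertices[vertexCount++] = new VertexPositionColorTexture(new Vector3(bottomLeft, 0f), color, new Vector2(0f, 1f));

        vertices[vertexCount++] = new VertexPositionColorTexture(new Vector3(bottomLeft, 0f), color, new Vector2(0f, 1f));
        vertices[vertexCount++] = new VertexPositionColorTexture(new Vector3(topRight, 0f), color, new Vector2(1f, 0f));
        vertices[vertexCount++] = new VertexPositionColorTexture(new Vector3(bottomRight, 0f), color, new Vector2(1f, 1f));
    }

    // Copies the accumulated vertices to the GPU once per frame. Discard tells the
    // driver we don't need the old contents, so it won't stall on in-flight draws.
    // Assumes the effect's projection and render states are set up elsewhere.
    public void Flush(GraphicsDevice device, Effect effect)
    {
        if (vertexCount == 0)
            return;

        vertexBuffer.SetData(vertices, 0, vertexCount, SetDataOptions.Discard);
        device.SetVertexBuffer(vertexBuffer);

        foreach (EffectPass pass in effect.CurrentTechnique.Passes)
        {
            pass.Apply();
            device.DrawPrimitives(PrimitiveType.TriangleList, 0, vertexCount / 3);
        }

        vertexCount = 0;
    }
}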

Having said this, I would now like to say: you probably shouldn’t do this. This is something I did because I enjoy the challenge, and I wanted to see if it would really help. In my case, I think it did. But is it the best way to do this? I can’t really say. What I can say is that those linear algebra courses I took as part of my degree really paid off in their own way.

Scaling up is hard to do

One of my particle effects included a circular ring that was stretched to a large size while rotating. While it looked very cool and ran fine on PCs, with their beefy stretchy-rotaty power, the Xbox GPU was having a hard time coping. In this case it was mostly the stretching outward that was hurting my frame rates, so my response was to reverse my intuition.

While initially I had used a small texture to save memory and let processing make up the slack, I went back and made a much larger texture, enabled mip-mapping in the content processor, and rewrote my scale interpolators to be based on the new larger size. So instead of starting at 30% and scaling to 300%, I started at 10% and went to 100% of a texture that was three times bigger. I effectively traded more texture memory for frames per second, which I think is a good deal in this case.
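To make the trade concrete, here’s a minimal sketch of what rebasing the interpolators amounts to. The OnScreenSize helper and the example pixel sizes are just for illustration, not the engine’s actual API or numbers:

using Microsoft.Xna.Framework;

static class ScaleRework
{
    // On-screen size of a particle at normalized age t (0 at spawn, 1 at death),
    // with scale stored as a multiple of the source texture size.
    public static float OnScreenSize(int textureSize, float startScale, float endScale, float t)
    {
        return textureSize * MathHelper.Lerp(startScale, endScale, t);
    }
}

// Old ring: OnScreenSize(128, 0.30f, 3.00f, t)   (small source, no mip chain)
// New ring: OnScreenSize(384, 0.10f, 1.00f, t)   (3x bigger source, mip-maps enabled)
// Both produce identical pixel sizes for every t, but the larger texture gives
// the GPU pre-built mip levels to sample from instead of filtering on the fly.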

It’s full of bullets

This isn’t something I did recently, but I figured I would bring it up as an important optimization.

Aeternum as an engine generates no garbage. Its technical heart is a custom scripting engine specifically designed for optimized handling of complex actions on large numbers of objects, namely bullets, and I’ve written before about how hard I worked from the outset to make the whole system garbage-free. Right now it has a hard cap of 2000 simultaneous bullets on screen, although I’ve never written a usable pattern script that fills the screen up quite that much; I think it would be unplayable.

That’s not to say there isn’t garbage happening. The level scripts themselves allocate whatever they need, like new enemy instances. But the amounts are negligible, and I think I have everything working to where there won’t ever be a garbage collection in the middle of a stage.

The point here is that, thanks to the work I did in effectively managing object creation (most importantly, in this case, aggressive pooling), I don’t have to worry about garbage collection at all in keeping the Xbox build running smoothly.
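To give a rough idea of what I mean by pooling, it boils down to something like this. This is a simplified sketch; the names are made up for illustration and not the actual bullet system:

using Microsoft.Xna.Framework;

class Bullet
{
    public Vector2 Position;
    public Vector2 Velocity;

    public void Reset(Vector2 position, Vector2 velocity)
    {
        Position = position;
        Velocity = velocity;
    }
}

class BulletPool
{
    // Every bullet the game will ever use is allocated once, up front.
    private readonly Bullet[] bullets;
    private int activeCount;

    public BulletPool(int capacity)          // e.g. the 2000-bullet hard cap
    {
        bullets = new Bullet[capacity];
        for (int i = 0; i < capacity; i++)
            bullets[i] = new Bullet();
    }

    // Reuses a dormant bullet instead of calling new, so no garbage is created.
    public Bullet Spawn(Vector2 position, Vector2 velocity)
    {
        if (activeCount == bullets.Length)
            return null;                     // hard cap reached; drop the spawn

        Bullet bullet = bullets[activeCount++];
        bullet.Reset(position, velocity);
        return bullet;
    }

    // Swap-removal keeps the live bullets packed at the front of the array.
    public void Despawn(int index)
    {
        activeCount--;
        Bullet dead = bullets[index];
        bullets[index] = bullets[activeCount];
        bullets[activeCount] = dead;
    }
}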

Moving forward

With these bigger things out of the way, and the rest of the game appearing to run smoothly, I’m in the process of implementing the actual stage scripts and boss fights, and extensively testing them to make sure everything works properly on the Xbox without compromising frame rates. From there I’ll get into making sure the options and score table saves are working. That shouldn’t be too big of a jump since I’m already using the more up-to-date version of Nick Gravelyn’s EasyStorage library.

In all, I think it’s good progress. I probably should have gotten into making things work on the hardware earlier, but the financials didn’t line up until just recently.

  • Good article once again. I’m curious about why a larger texture made things faster for the GPU. You mention scale interpolators – perhaps these are the key to understanding this – tell me more :)

    • The interpolators I was referring to are just the scale values my particle engine uses to change the scale of each particle in an emitter over time. I mentioned having to modify them here because they’re stored as a multiple of the source texture size. So a 128×128 pixel texture scaled to 2f would draw at 256×256 on screen.

      I believe what was killing me here, and the solution, is a two-part problem. Shawn Hargreaves wrote quite a bit about texture filtering on his blog: http://blogs.msdn.com/b/shawnhar/archive/2009/09/08/texture-filtering.aspx. You’ll notice that the fastest drawing involves, among other things, no scaling and no rotation (or 90-degree-only rotation). In my case I was doing both, with linear sampling, which is comparatively inefficient.

      My solution of using a larger texture meant that, by utilizing mip-maps, the GPU had a much easier time because it was doing far less resampling of the source texture. To return to the interpolation example: instead of scripting a particle with a 128×128 texture to start at 1x and scale up to 4x, for a max draw size of 512×512 (requiring resampling and filtering every frame for every particle), I started with a 512×512 texture scripted to start at 0.25x and go to 1x. The visual result is largely the same, but the larger texture lets you specify mip-maps, which means the GPU has the original texture plus pre-computed 256×256, 128×128, 64×64, etc. versions to pick from, whichever is closest to the desired output, meaning less sampling is required.

      • Definitely in full agreement that mip-maps improve performance when drawing ‘smaller than texture’ areas – but the case of drawing larger areas has me puzzled, since it should use the largest mip-map level when upscaling the texture like this, I think? I appreciate you’ve seen a performance win; I’m curious as to why. You mention ‘much less resampling of the source texture’ – are you manually caching it perhaps? You also say ‘scripted to start at 0.25 and go to 1.0’ – is this a lerp across frames, or is there some caching going on? Or perhaps you’re manually specifying a lower mip-map level, as mentioned at the end? That would make better use of the texture cache, and I can certainly see it being faster.

        Gratefully

        • I’m honestly not entirely sure why the upscaling was killing my frame rates. It could stem from any number of things I’m doing that are specific to my engine environment and/or particle system. It ran pretty fast on any PC I tried it on, although that could just be a case of desktop graphics hardware being so much more powerful.

          I think once again, the answer could be found buried in Shawn Hargreaves’ wisdom:
          http://blogs.msdn.com/b/shawnhar/archive/2009/09/14/texture-filtering-mipmaps.aspx
          Or better yet:
          http://blogs.msdn.com/b/shawnhar/archive/2008/04/11/santa-s-production-line.aspx

          So my best guess is that I was simply doing something that was stalling the pipeline, whether it was the increased texture lookups per pixel, the increased need for on-the-fly sample interpolation and filtering, or thrashing around in the texture cache from drawing a rotated texture. When you multiply that by a large number of particles all doing the same thing simultaneously, the cost compounds pretty quickly into a frame rate problem.

          It would take some pretty specific testing to locate the exact cause of the problem, and to truly understand what was going on here I’d probably have to know a great deal more about what happens under the hood on the Xbox itself than I do. But thanks to the testing I could do, I managed to find the part of the system that was doing the most damage and reworked things around it. In that regard, I think this works as a pretty good example of the kind of thing that comes up in console platform development.
