Raspberry Pi 3D Performance Demo

I’ve recently gotten interested in 3D programming for the Raspberry Pi. I’m not sure why RPi 3D development interests me when desktop 3D mostly doesn’t – I guess there’s just something fun about coaxing as much 3D performance as possible from a little $35 device. Armed with 15-year-old OpenGL experience and only slightly newer video game console dev experience, I set out to push the limits of the Raspberry Pi’s 3D hardware.

 

Finding the Right Yardstick

The first question I faced was how best to measure 3D performance. Does 60 FPS rendering of a single cube represent better performance than 10 FPS rendering of a complex scene? What should I measure, exactly? After mulling over the problem for a day, I decided the only decent metric I could measure was triangles per second. A cube is made of 12 triangles, and at 60 FPS that’s 720 triangles per second. A complex scene might consist of 100,000 triangles, and at 10 FPS that’s 1 million triangles per second.
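As a quick check of that arithmetic, here is a minimal Python sketch. The function name is made up for illustration and is not taken from any real program.

```python
def triangles_per_second(triangles_per_frame: int, frames_per_second: float) -> float:
    """Throughput is simply triangles drawn per frame times frames per second."""
    return triangles_per_frame * frames_per_second


# The two examples from the text: a single cube and a complex scene.
cube = triangles_per_second(12, 60)
scene = triangles_per_second(100_000, 10)
print(cube, scene)
```

By this measure the 10 FPS scene does far more work per second than the 60 FPS cube, which is why frame rate alone is a poor yardstick.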

Many 3D programmers argue that triangles/second is a useless metric, fundamentally flawed in its conception, because it ignores the fact that some triangles are much more expensive to draw than others. A large triangle is more expensive than a small one. A triangle with lots of textures and complex lighting shaders applied to it is more expensive than a solid blue triangle. Triangles whose outside faces are oriented away from the viewer, or that lie behind other previously-drawn triangles, are almost free. Increasing the screen resolution makes all triangles more expensive. So many factors affect the cost of rendering a triangle that comparing triangles-per-second benchmarks between different hardware platforms or different programs is virtually meaningless.

In the context of the same program drawing the same triangles on the same hardware, however, triangles per second is still a useful measure of relative performance. If a change to the program can increase triangles/second without sacrificing image quality somehow, then it’s a win. In fact, unless the API can access specialized hardware performance counters in the graphics processor itself, triangles/second and frames/second are virtually the only things it’s even possible to measure.

 

Building the Program

Over the course of a few evenings, I put together a demo program called rasperf3d, which you can download from the BMOW web site. My twin goals were to create an OpenGL ES sample program more complex than Raspberry Pi’s hello_triangle example, and to make a tool to measure the impact of screen resolution, model complexity, shader type, and other rendering settings on frame rate. The program draws many copies of the same model in a grid layout on the screen, and shows real-time rendering performance data including a measure of triangles per second. I can use the keyboard to change the current rendering settings, and see how it affects overall 3D performance.
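One common way to produce a real-time triangles/second readout is to accumulate frame and triangle counts over a short window and report the averages when the window closes. The sketch below illustrates that idea only. It is not the rasperf3d implementation, and the class and method names are hypothetical.

```python
class ThroughputMeter:
    """Averages frames/second and triangles/second over a fixed time window."""

    def __init__(self, window_seconds: float = 1.0):
        self.window_seconds = window_seconds
        self._elapsed = 0.0    # seconds accumulated in the current window
        self._frames = 0       # frames drawn in the current window
        self._triangles = 0    # triangles submitted in the current window
        self.fps = 0.0         # last reported frames per second
        self.tris_per_sec = 0.0  # last reported triangles per second

    def add_frame(self, triangles_drawn: int, frame_seconds: float) -> None:
        """Record one finished frame; refresh the readout when the window closes."""
        self._elapsed += frame_seconds
        self._frames += 1
        self._triangles += triangles_drawn
        if self._elapsed >= self.window_seconds:
            self.fps = self._frames / self._elapsed
            self.tris_per_sec = self._triangles / self._elapsed
            self._elapsed = 0.0
            self._frames = 0
            self._triangles = 0


# Four frames of 1,000 triangles each, 0.25 seconds apart.
meter = ThroughputMeter(window_seconds=1.0)
for _ in range(4):
    meter.add_frame(triangles_drawn=1000, frame_seconds=0.25)
print(meter.fps, meter.tris_per_sec)
```

In a real render loop, frame_seconds would come from a monotonic clock sampled after each buffer swap. Averaging over a window keeps the on-screen numbers from jittering every frame.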

 

For Raspberry Pi Programmers

The source code compiles on a stock install of Raspbian Wheezy, with no extra libraries needed. X Windows is neither required nor used.

Some of the more interesting things demonstrated in the code are:

  • Basic vertex and fragment shaders for Phong (per-pixel lighting) and Gouraud (per-vertex lighting)
  • Using ETC1 compressed textures
  • Rendering with vertex buffer objects
  • Drawing text with a bitmap font
  • Dynamically changing the screen resolution
  • Dynamically enabling multi-sampling
  • Taking a screen shot

 

Performance Measurement

Press S to hide all but the first row of text – this improves performance slightly. Use the keyboard to modify these settings:

  • Number of objects rendered. Each object is 1 draw call.
  • Type of object: five choices, from a 12-triangle cube to a 55K-triangle robot.
  • Screen resolution. From 1920 x 1080 down to 568 x 320.
  • 4x Multi-sampling: on/off.
  • Shader: textured with per-pixel lighting, textured with per-vertex lighting, untextured with per-pixel lighting, untextured with per-vertex lighting, flat colored.
  • Texture filter: linear or nearest.
  • Mipmaps: on/off.
  • Backface culling: on/off.
  • Depth test: on/off.
  • Wireframe view: on/off.
  • Camera distance away from the “wall of objects”.
  • Camera yaw angle – view the wall of objects at an oblique angle, or edge-on.
  • Camera look away – rotate the camera 180 degrees, so that all triangles will get clipped out.
  • Secret feature: press R to take a screenshot.

For the curious: All the object models use one 256×256 compressed texture with mipmaps, and the same shader with one directional light, so the only performance difference between them is due to their geometry. All models have indexed verts. Each vertex has position XYZ, normal XYZ, and texture UV stored as 4-byte floats, for a total of 32 bytes per vertex. Rendering is done with VBOs (vertex buffer objects) and glDrawElements. Each copy of the model is a separate draw call to OpenGL.
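The 32-byte figure can be checked by packing one vertex by hand. The sketch below assumes the attributes are interleaved in the order position, normal, UV and stored little-endian. The text gives the attribute list and the float size, but the exact memory order is an assumption, and all names here are illustrative.

```python
import struct

# Assumed interleaved layout: position XYZ, normal XYZ, texture UV,
# each component a 4-byte little-endian float (8 floats in total).
VERTEX = struct.Struct("<3f3f2f")

STRIDE = VERTEX.size       # bytes from one vertex to the next
POSITION_OFFSET = 0        # position starts at the beginning of the vertex
NORMAL_OFFSET = 3 * 4      # normal follows three position floats
UV_OFFSET = 6 * 4          # UV follows position and normal


def pack_vertex(position, normal, uv) -> bytes:
    """Pack one vertex into the byte layout a vertex buffer would hold."""
    return VERTEX.pack(*position, *normal, *uv)


def vertex_buffer_bytes(vertex_count: int) -> int:
    """Size in bytes of a buffer holding vertex_count packed vertices."""
    return vertex_count * STRIDE


v = pack_vertex((0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (0.5, 0.5))
print(len(v), STRIDE, NORMAL_OFFSET, UV_OFFSET)
```

Under this layout, the stride and the three offsets are the values a program would hand to OpenGL when describing each vertex attribute.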

 

Conclusions

I was able to push the hardware as high as 16 million triangles per second, while rendering a dozen copies of a 19K triangle dinosaur model. This was on a stock Raspberry Pi, at a screen resolution of 1280 x 720, and with the texture and lighting settings mentioned above. Higher numbers are possible, but require using rendering settings that aren’t very realistic for a “real” 3D program. Absolute peak throughput was 27.3 million tris/sec with a very simple flat-colored shader, screen resolution of 568 x 320, and the camera pointed away from the objects so that all triangles were clipped.

19K triangle object models are probably heavier than any real Raspberry Pi 3D game would use, since the hardware is only able to draw about a dozen of them before the frame rate dips below 60 FPS. Using a more appropriate model – a 500 triangle frog – the hardware was able to reach just 4 million triangles/second, but could draw 132 frogs before the frame rate fell below 60.
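The frog numbers can be cross-checked by turning throughput into a per-frame triangle budget. This is a rough sketch with a made-up function name, using the rounded figures quoted above.

```python
def triangles_per_frame_budget(tris_per_second: float, target_fps: float) -> float:
    """How many triangles fit in one frame at the target frame rate."""
    return tris_per_second / target_fps


# Roughly 4 million triangles/second at 60 FPS, per the frog test.
frog_budget = triangles_per_frame_budget(4_000_000, 60)

# 132 frogs at 500 triangles each.
frog_scene = 132 * 500
print(frog_scene, round(frog_budget))
```

The 66,000 triangles in the frog scene sit just under the budget of about 66,667 triangles per frame, so the two quoted figures agree with each other.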

How does this compare to my PC or smartphone or Playstation? As mentioned earlier, direct comparison of triangles-per-second numbers between hardware platforms is generally meaningless. If I could run rasperf3d on a desktop PC or the Playstation 4, we might get a better relative comparison, but even that would be questionable. The program wasn’t designed to be a general purpose cross-platform 3D benchmark.

So I can’t put any specific numbers on it, but I feel comfortable in saying the comparison to modern 3D hardware is not favorable. That’s fine – that’s to be expected on a $35 computer the size of a credit card. It’s been a very long time since I did any real professional 3D development work, but my recollection is that the Gamecube, Xbox, and Playstation 2 games I worked on had scenes similar to my frog test. A typical scene in one of those games might have had something on the order of 100 objects, each of which consisted of a few hundred triangles, and used fairly basic textures and lighting. It looks like the Raspberry Pi should be capable of roughly the same.

Read more on this topic in the Raspberry Pi forums, here and here.

1 Comment so far

  1. john - April 5th, 2015 12:59 am

    Thank you very much for rasperf3d. It’s helping me & doubtless others, to understand more about gles on RPi. I’ll be showing this, along with OpenVG overlays, on a vga display attached to a RPi 2 & a separate hdmi screen to control what’s happening, at a raspberry jam in a few days.
