Rendering Framework: GDI via C#

When developing graphical applications in a Windows environment, one of the fundamental requirements for creating rich applications is understanding the Graphics Device Interface – more commonly known simply as GDI.

GDI is an abstraction layer for accessing video hardware that is built into the Windows operating system, residing primarily in GDI32.dll.

This abstraction layer is responsible for drawing geometric shapes, rendering text in various fonts and sizes, handling various color depths – most commonly 8-bpp (256 colors, either palette-indexed, monochrome or “real color”), 16-bpp (“high color”), 24-bpp (“true color”) and 32-bpp (“true color” with transparency) – and delivering the results to a Device Context, such as a screen or a printer.

Completing the cycle

In managed code we generally access a GDI surface through a wrapper object, such as the “Graphics” class. In my post on Real-time Drawing in C# I implemented a concrete rendering engine by inheriting from the RenderingCore base class in my rendering framework – that implementation’s render routine simply exposed a “Graphics” object as a display surface and rendered the results directly onto the framework form.

While effective, this approach has several flaws: there is no flipping chain (double buffering), so the render display constantly “flickers”; the color depth of the graphics object is fixed to whatever the current Windows display setting is; and there is no efficient way to plot a single pixel – which is the essence of many rendering routines.

Let’s examine each of these problems and see if we can come up with something a bit more ideal that will suit our needs.

Setting up a Flip-Chain

The most glaring problem with this application is the constant flicker. This is because we’re writing directly over the existing display surface each frame. To eliminate this nuisance we’re going to employ a technique known as double buffering, or page flipping. Essentially, we create an exact replica of the display surface (primary) in off-screen memory (secondary) and write directly to that.

When we complete a full render cycle, we do a memory copy from the secondary in-memory surface to the primary on-screen display.

I have created another abstract base class that derives from “RenderingCore” called “RenderingGDI”. This class is used as a foundation for the two concrete GDI rendering engines – “RenderingGraphics” and “RenderingMemory”. Inside the GDI base there is an instance field of type “Bitmap” called “buffer” which we’re going to use as our off-screen surface.
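In sketch form the relationship looks something like this (the class names are from the article, but the member details and the RenderingCore stub are my assumptions, not the framework’s actual definitions):

```csharp
using System.Drawing;

// Stub standing in for the framework's actual base class.
public abstract class RenderingCore
{
    protected abstract void Render();
}

// GDI-specific foundation for the concrete engines
// ("RenderingGraphics" and "RenderingMemory" derive from this).
public abstract class RenderingGDI : RenderingCore
{
    // Off-screen (secondary) surface that inheriting engines draw to.
    protected Bitmap buffer;

    protected RenderingGDI( int width, int height )
    {
        // Sized to mirror the on-screen (primary) surface.
        buffer = new Bitmap( width, height );
    }
}
```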

The render routine provided here is simply a wrapper for providing a flipping chain to inheriting classes. We still need to obtain the display surface on the framework form using the static method “FromHwnd” on the “Graphics” class – just as we did in the “RenderSimple” example. However, once we obtain a handle to this surface we’re going to need to invoke some native GDI32 methods to fully implement the flip-chain.

First, we need a handle (pointer) to the Device Context (HDC) of the primary surface. Using this, we can call the native “CreateCompatibleDC” function, which creates an in-memory DC compatible with our primary surface DC. Once we have our in-memory DC, we invoke the native “SelectObject” function to select our bitmap buffer into it – this makes the in-memory DC identical to our buffer surface.

From this point we’re free to draw on the buffer object at will – then, when we’re finished, we perform the off-screen to on-screen “memory copy” using the native “BitBlt” function. Once complete, the primary surface is updated with the changes we made to the secondary surface – so we just need to delete the temporary objects (“DeleteObject” for the bitmap handle, “DeleteDC” for the memory DC) and release the primary surface handle.

Here is the sample code for achieving our flip-chain:


protected override void Render()
{
    // Let inheriting classes draw on the managed buffer first;
    // GetHbitmap copies the bitmap at call time, so it must come
    // after rendering or the blit would be a frame stale.
    render.Invoke( buffer );

    using( Graphics display = Graphics.FromHwnd( FrameworkForm.Handle ) )
    {
        IntPtr hDC = display.GetHdc();
        IntPtr hMemDC = GDIMethods.CreateCompatibleDC( hDC );
        IntPtr hBitmap = buffer.GetHbitmap();

        // Select the buffer into the memory DC, remembering the
        // stock bitmap so it can be restored before cleanup.
        IntPtr hOldBitmap = GDIMethods.SelectObject( hMemDC, hBitmap );

        GDIMethods.BitBlt
        (
            hDC,
            0, 0,
            FrameworkForm.Width, FrameworkForm.Height,
            hMemDC,
            0, 0,
            TernaryRasterOperations.SRCCOPY
        );

        // Restore the original bitmap, then delete the temporaries –
        // a DC is freed with DeleteDC, not DeleteObject.
        GDIMethods.SelectObject( hMemDC, hOldBitmap );
        GDIMethods.DeleteObject( hBitmap );
        GDIMethods.DeleteDC( hMemDC );

        display.ReleaseHdc( hDC );
    }
}
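The “GDIMethods” class referenced above isn’t declared anywhere in this article; a minimal P/Invoke sketch that would satisfy the render routine (the signatures follow the documented Win32 API) could look like this:

```csharp
using System;
using System.Runtime.InteropServices;

// Win32 raster-operation codes; only SRCCOPY is needed here.
internal enum TernaryRasterOperations : uint
{
    SRCCOPY = 0x00CC0020
}

internal static class GDIMethods
{
    // Creates a memory DC compatible with the given device context.
    [DllImport( "gdi32.dll" )]
    internal static extern IntPtr CreateCompatibleDC( IntPtr hdc );

    // Selects a GDI object (e.g. an HBITMAP) into a DC, returning
    // the previously selected object.
    [DllImport( "gdi32.dll" )]
    internal static extern IntPtr SelectObject( IntPtr hdc, IntPtr hgdiobj );

    // Block-transfers pixels from the source DC to the destination DC.
    [DllImport( "gdi32.dll" )]
    internal static extern bool BitBlt( IntPtr hdcDest, int xDest, int yDest,
        int width, int height, IntPtr hdcSrc, int xSrc, int ySrc,
        TernaryRasterOperations rop );

    // Frees a GDI object such as a bitmap handle.
    [DllImport( "gdi32.dll" )]
    internal static extern bool DeleteObject( IntPtr hObject );

    // Frees a device context created with CreateCompatibleDC.
    [DllImport( "gdi32.dll" )]
    internal static extern bool DeleteDC( IntPtr hdc );
}
```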

The subtleties of color

Another aspect we can improve upon from the original rendering implementation is specifying the color depth. Color depth is an immensely important aspect of real-time rendering, affecting both computation time (the number of memory write operations required per pixel) and memory bandwidth (the more bits per pixel, the bigger the memory footprint of your surfaces).

The RenderingGDI object exposes an internal “PixelSize” enumeration that you can use to specify the bits per pixel – 8, 16 or 32. All three formats represent a combination of Red, Green and Blue values (RGB) – they just vary in size and in the number of possible colors.

These are really the only color formats you need to concern yourself with for real-time rendering: with anything less than 8-bpp you don’t have enough colors to display anything particularly interesting, and with 32-bpp the extra 8 bits are devoted to an alpha channel, which usually isn’t needed (and alpha blending is comparatively slow).

Here is a quick breakdown of the three supported RGB pixel formats:

  • 8 bit: This format is generally known as “palettized” color: with each pixel being 8 bits wide you have a maximum of 256 (2^8) colors. By default, the Windows GDI will use an “Indexed” palette if you specify 8-bpp, which contains a fixed set of around 20 “system colors” and a uniform distribution of other colors to fill the remaining slots. This is obviously not the desired palette to work with for rendering. The alternative to “Indexed” color is 8-bpp “real color”, where the 8 bits allocated to a pixel are broken down into 3 bits for red, 3 bits for green and 2 bits for blue – blue gets the fewest because the human eye is more sensitive to red and green than it is to blue. This is known as the RGB332 format.
  • 16 bit: This format is fairly dated, but it was the standard for rendering for a long time due to its relatively large color spectrum, and with properly optimized code it could yield roughly twice the performance of 32-bpp. There are three sub-types, distinguished by the bit distribution of the pixel: 565 (5 red, 6 green, 5 blue), 555 (5 red, 5 green, 5 blue, 1 bit padding) and 1555 (1 alpha, 5 red, 5 green, 5 blue). If you specify the 16-bpp format in this rendering framework it will use RGB565, because more colors are represented with this bit layout and we have no use for transparency. Once again, the extra bit is devoted to the green channel because the human eye is more sensitive to green than to red – and certainly more than to blue.
  • 24/32 bit: This is the standard pixel format for Managed GDI (GDI+) and all modern rendering. Both 24-bpp and 32-bpp represent each color channel with 8 bits, so you have 256 distinct levels per channel – yielding 16,777,216 total distinct colors. The distinction between the two is that 32-bpp uses the extra 8 bits for transparency, which is not needed here. When using the RGB888 format it simply isn’t worth specifying 24-bit color mode: since each pixel would be 3 bytes wide, you would have to multiply the x offset by 3 (which the CPU must expand into shift-and-add operations) to get the starting address of the desired pixel, whereas when the pixel size in bytes is a power of 2 you can use a single bit shift (32 bits = 4 bytes = shift left 2 places).

Here is some sample code for converting a Managed “Color” object into its bit representation for the RGB332, RGB565 and RGB888 formats.


static byte Format332FromColor( Color color )
{
    return ( byte )
        ( ( color.B >> 6 ) +
        ( ( color.G >> 5 ) << 2 ) +
        ( ( color.R >> 5 ) << 5 ) );
}

static ushort Format565FromColor( Color color )
{
    return ( ushort )
        ( ( color.B >> 3 ) +
        ( ( color.G >> 2 ) << 5 ) +
        ( ( color.R >> 3 ) << 11 ) );
}

static uint FormatColor888( Color color )
{
    return ( uint )
    (
        color.B +
        ( color.G << 8 ) +
        ( color.R << 16 ) );
}
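For debugging it can be handy to go the other way as well; here’s a small sketch (this helper isn’t part of the framework – the name is mine) that unpacks an RGB565 value back into a managed “Color”:

```csharp
using System.Drawing;

static Color ColorFromFormat565( ushort pixel )
{
    // Mask each channel out of the packed value, then shift it
    // back into the high bits of an 8-bit channel. The low bits
    // discarded during packing stay zero, so the round trip is lossy.
    int r = ( ( pixel >> 11 ) & 0x1F ) << 3;
    int g = ( ( pixel >> 5 ) & 0x3F ) << 2;
    int b = ( pixel & 0x1F ) << 3;

    return Color.FromArgb( r, g, b );
}
```

Round-tripping Color.White through Format565FromColor and back yields (248, 252, 248) – the truncation from the original pack step.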

Working with Pixels

Now that we have a flipping-chain and variable color format surfaces, all we need is a fast and efficient way to plot individual pixels on our in-memory surface to complete our fundamental requirements.

I have written a custom wrapper object for manipulating a GDI surface called “GraphicsMemory”. This is a sealed instance class, meaning it can’t be inherited from, and it only provides very basic functionality (although it could easily be expanded).

The constructor for this wrapper takes a Bitmap (surface to write to) and the desired ColorFormat enumeration. There are three instance fields (FillRoutine, PPFastRoutine, and PPSafeRoutine) used to hold delegates (function pointers) to the correct implementation (based on the pixel format of the bitmap passed in) that are invoked for each publicly accessible function.

The ColorFormat enumeration is used to specify “Monochrome” (8-bit grayscale), “Real” (8-bit RGB332), “High” (16-bit RGB565) and “True” (32-bit RGB888). The “GraphicsMemory” wrapper gets its main speed advantage over the “Graphics” class from the fact that inside the constructor we call “LockBits” on the Bitmap buffer; we can then use “unsafe” code blocks, i.e. pointers, to read and write directly to the surface of the buffer in our routines.

Since we’re using the “LockBits” method we must save its return value (a BitmapData object) and pass it into the “UnlockBits” method when we’re done manipulating the surface memory.
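In sketch form (assuming the “buffer” Bitmap field from earlier), the pairing looks like this:

```csharp
using System.Drawing;
using System.Drawing.Imaging;

// Lock the buffer's pixel memory for direct access. The returned
// BitmapData must be kept so it can be handed back to UnlockBits.
BitmapData bmpData = buffer.LockBits
(
    new Rectangle( 0, 0, buffer.Width, buffer.Height ),
    ImageLockMode.ReadWrite,
    buffer.PixelFormat
);

// ... manipulate the surface through bmpData.Scan0 / bmpData.Stride ...

buffer.UnlockBits( bmpData );
```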

The “LockBits” method returns a BitmapData object with two particularly interesting properties, “Scan0” and “Stride”. “Scan0” is an “IntPtr” holding the starting address of the bitmap memory, and “Stride” is the width, in bytes, of a single row of data. To visualize this, think of the surface as one linear block of memory with “Scan0” as its starting address. So, in order to set a pixel at a specific point (x, y) we need that pixel’s address and its size in bytes.

The starting address of the row containing the pixel is determined by multiplying the y value by the “Stride” property; the actual pixel address is then that row offset plus the x position, adjusted for pixel size. Working in typed-pointer (pixel) units: pixelStartAddress = Scan0 + ( y * ( Stride / PixelSize ) ) + x. For example, on a 640-wide 16-bpp surface the Stride is 1,280 bytes, so pixel (10, 3) lives 3 * 1280 + 10 * 2 = 3,860 bytes past Scan0.

Once you have the pixel’s starting address, you just cast it to a pointer of the appropriate width (byte = 8 bits, ushort = 16 bits, uint = 32 bits), then set the value like a normal write operation:


void PlotPixelFast332( int x, int y, Color color )
{
    // One byte per pixel: the row offset is y * Stride, in bytes.
    *( ( ( byte* ) bmpData.Scan0 ) + ( y * bmpData.Stride ) + x ) =
        Format332FromColor( color );
}

void PlotPixelFast565( int x, int y, Color color )
{
    // Two bytes per pixel: ( y * Stride ) >> 1 converts the byte
    // offset into ushort units.
    *( ( ( ushort* ) bmpData.Scan0 ) + ( y * bmpData.Stride >> 1 ) + x ) =
        Format565FromColor( color );
}

void PlotPixelFast888( int x, int y, Color color )
{
    // Four bytes per pixel: ( y * Stride ) >> 2 converts the byte
    // offset into uint units.
    *( ( ( uint* ) bmpData.Scan0 ) + ( y * bmpData.Stride >> 2 ) + x ) =
        FormatColor888( color );
}
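The “PPSafeRoutine” counterparts mentioned earlier aren’t reproduced in this article; a plausible sketch for the 565 case (my own guess at the implementation – the only difference is a bounds check, at the cost of one branch per call) would be:

```csharp
void PlotPixelSafe565( int x, int y, Color color )
{
    // Reject coordinates outside the locked surface instead of
    // writing into (or past) a neighboring row.
    if( x < 0 || y < 0 || x >= bmpData.Width || y >= bmpData.Height )
        return;

    PlotPixelFast565( x, y, color );
}
```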

Fast Memory Fills

There are also routines for “Filling” the entire area of the buffer with a single color. These routines are essentially the same as the PlotPixel routines except that they iterate the entire surface instead of a single pixel.

You’ll notice that in the public “Fill” routine I check whether the desired color is “Color.Black”, or RGB(0,0,0). Since black is just an empty (zeroed) bit block, all we need to do is get the starting address of the bitmap memory, create an empty byte array the size of the bitmap ( Stride * Height ), and invoke “Marshal.Copy” with the newly instantiated byte array.

void ClearBuffer()
{
    IntPtr ptr = bmpData.Scan0;

    // Total surface size in bytes; Stride can be negative for
    // bottom-up bitmaps, hence the Abs.
    int bytes =
        Math.Abs( bmpData.Stride ) * buffer.Height;

    // A freshly allocated array is already zeroed – i.e. black.
    byte[] colorData = new byte[ bytes ];

    Marshal.Copy( colorData, 0, ptr, bytes );
}

This trick works because the CLR automatically initializes new instances with empty, “zeroed-out” memory – i.e., {0} for an array – so the freshly allocated byte array is already all black. The potential performance gain is huge compared to the trivial cost of a boolean check. “Marshal.Copy” would technically “work” for a standard fill as well, but directly accessing the surface with pointers is much, much faster than marshaling the memory into a temporary array buffer, manually iterating through it to set each individual pixel value, and then marshaling the array back into that memory address.

For filling the surface with a color other than black, we can optimize our plot pixel routine a bit because we know we’re going to be plotting pixels in incrementing addresses – this means we can write them in succession. For example, in RGB332 format, each pixel is 8 bits so we can use a pointer to a “ulong” (64 bits) to write 8 pixels per memory set operation which will yield substantial speed increases. Here is the sample code for filling a surface in each of the supported color formats:


void FillBitmap332( Color color )
{
    byte pixelData =
        Format332FromColor( color );

    ulong pixelOctet = BitConverter.ToUInt64
    (
        new byte[]
        {
            pixelData, pixelData, pixelData, pixelData,
            pixelData, pixelData, pixelData, pixelData
        }, 0
    );

    for( int index = 0; index < ( bmpData.Height * ( SurfaceStride >> 3 ) ); index++ )
    {
        *( ( ( ulong* ) bmpData.Scan0 ) + index ) = pixelOctet;
    }
}

void FillBitmap565( Color color )
{
    ushort pixelData =
        Format565FromColor( color );

    byte[] pixelBytes = BitConverter.GetBytes( pixelData );

    ulong pixelQuad = BitConverter.ToUInt64
    (
        new byte[]
        {
            pixelBytes[ 0 ], pixelBytes[ 1 ], pixelBytes[ 0 ], pixelBytes[ 1 ],
            pixelBytes[ 0 ], pixelBytes[ 1 ], pixelBytes[ 0 ], pixelBytes[ 1 ]
        }, 0
    );

    for ( int index = 0; index < ( bmpData.Height * ( SurfaceStride >> 2 ) ); index++ )
    {
        *( ( ( ulong* ) bmpData.Scan0 ) + index ) = pixelQuad;
    }
}

void FillBitmap888( Color color )
{
    uint pixelData =
        FormatColor888( color );

    byte[] pixelBytes = BitConverter.GetBytes( pixelData );

    // Two 32-bit pixels packed into each 64-bit write.
    ulong pixelPair = BitConverter.ToUInt64
    (
        new byte[]
        {
            pixelBytes[ 0 ], pixelBytes[ 1 ], pixelBytes[ 2 ], pixelBytes[ 3 ],
            pixelBytes[ 0 ], pixelBytes[ 1 ], pixelBytes[ 2 ], pixelBytes[ 3 ]
        }, 0
    );

    for ( int index = 0; index < ( bmpData.Height * ( SurfaceStride >> 1 ) ); index++ )
    {
        *( ( ( ulong* ) bmpData.Scan0 ) + index ) = pixelPair;
    }
}

Sample Application and Examples

This article was originally written to be an independent post, however it has since become a sort of documentation for how my rendering framework project works. You’re more than welcome to go there and download the full project which includes the framework and some sample effects.
