Veronica – GPU Recap

March 4, 2013 Quinn Dunki

By popular demand.

Housekeeping Note: I’ve added a new column on the left that groups Veronica posts together. If you want to follow this entire project, they’re all there, with the oldest at the bottom.

So, now that the graphics board seems to be working (a good engineer never trusts anything completely), I can finally clean up the schematics and code a bit. People have been asking to see both for quite a while now, but you would not have wanted to see the pile of elephant vomit that was my project folder for the past 4 months. It’s still not pristine, but I think it’s at least constructive to share at this point.

First things first, the updated schematic for the VGA board (and the Eagle file):

Still cluttered, but you get what you pay for.

The primary difference from the earlier version of this board is the replacement of the 74HC573 latch with the 7200L FIFO. I do not have an updated PCB layout for this board, as I haven’t gone in and rerouted all the traces for this new version. I don’t plan to do that unless I need to re-etch this board for some reason. If you want the old BRD file as a starting point, it’s still available here.

Now, on to the firmware code for the GPU. It’s a single AVR assembly file, with a few parts broken out into .h files which are included for clarity. The makefile is included, which should work in your IDE of choice. It relies on GNU’s m4, so be aware of that if you plan to build this code. This code will not build as-is under AVR Studio, or other non-GNU AVR environments. If you’re on any flavor of un*x system, you’ll have m4 already, and assuming you have the GNU AVR toolchain installed, it should just build and go.

Here’s the complete code package. I’m not going to cover every line in gory detail- I refer you to the comments for that. I’ll just hit a few highlights here. Apologies for the funky indenting in the code samples below. I’ve yet to deduce the voodoo to get github-gists to indent the way I want.

The meat of the firmware is, of course, the VGA signal generator. That’s been covered in gory detail before, so I’m not going to go over that again. I will mention that I’ve changed my approach slightly. The VGA signals are still bit-banged out of PORTB, but instead of computing the various states, I now read them from a look-up table. The table looks like this:

As the scanline interrupts fire, the Z register is used to walk through that table line-by-line. Each line provides all the state needed for that pass- the state of the vertical blanking, the VRAM address of the line, and so on. This saves a ton of registers and computation, and makes the code quite tidy. Rather than saving and restoring the Z-register every time the interrupt fires, I just globally reserve it. No other code may use it. I do the same for a number of other registers. The ATMega has enough registers to get away with this, and it saves a lot of cycles in the interrupt service routine (which it sorely needs). Note that I have to use the Z register here, because I’m indexing into program memory (where the scanline table lives). Only the Z register can be used for indirect addressing into program space. If this becomes a major problem, I can copy the scanline table into SRAM (using the .data segment). I currently do this with the font data, so that I can indirect into it with the X register for rendering.
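As a rough illustration of the table-walking idea, here is a small Python model (not the actual AVR assembly). The field names, the table layout, and the tiny line counts are all assumptions made up for the sketch; the real table lives in program memory and is walked with the Z register.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanlineEntry:
    # All three fields are hypothetical stand-ins for "the state needed for that pass".
    vsync: bool      # state of the vertical sync line during this scanline
    visible: bool    # whether pixels are emitted on this scanline
    vram_addr: int   # VRAM address of the first pixel on this scanline


def build_table(total_lines, first_visible, visible_lines, sync_lines, bytes_per_line):
    """Precompute one entry per scanline so the interrupt never has to calculate state."""
    table = []
    for line in range(total_lines):
        visible = first_visible <= line < first_visible + visible_lines
        addr = (line - first_visible) * bytes_per_line if visible else 0
        table.append(ScanlineEntry(vsync=line < sync_lines, visible=visible, vram_addr=addr))
    return table


class ScanlineWalker:
    """Models the globally reserved pointer: only the interrupt handler touches `z`."""

    def __init__(self, table):
        self.table = table
        self.z = 0  # plays the role of the reserved Z register

    def on_scanline_interrupt(self):
        entry = self.table[self.z]
        # Advance to the next line, wrapping at the end of the frame.
        self.z = (self.z + 1) % len(self.table)
        return entry
```

Because the pointer is never shared with other code, the handler in this model has nothing to save or restore, which is the point of reserving the register.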

The second core component to the firmware is the command buffer handling. When the 6502 issues rendering commands (by writing to system memory address $EFFF), the VGA board intercepts those bytes and pushes them into the hardware FIFO. During vertical blanking, when the GPU isn’t doing much else, it pulls commands out of this FIFO and processes them. Here’s the code:

In a nutshell, it waits for the VBL, polls the inverted Empty Flag on the FIFO (“not empty” means at least one byte is in there) and springs into action if needed. To read the command byte, it takes control of the VRAM bus (which is shared with the FIFO), reads the byte by twiddling the control lines of the FIFO, and stores the byte in internal SRAM. When two bytes have been read (forming a two-byte render command packet), the command is processed.
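The polling logic described above can be sketched in Python as follows. This is a model under stated assumptions, not the firmware itself: the `Fifo` and `CommandReader` classes and their method names are hypothetical, the bus arbitration and control-line twiddling are collapsed into a single `pop()`, and reading at most one byte per call is a guess about pacing.

```python
from collections import deque


class Fifo:
    """Stand-in for the hardware FIFO: the 6502 side pushes, the GPU side pops."""

    def __init__(self):
        self._bytes = deque()

    def push(self, value):
        self._bytes.append(value & 0xFF)

    def not_empty(self):
        # Models the inverted Empty Flag: true when at least one byte is waiting.
        return len(self._bytes) > 0

    def pop(self):
        return self._bytes.popleft()


class CommandReader:
    def __init__(self, fifo, process):
        self.fifo = fifo
        self.process = process  # callback taking (command, parameter)
        self.packet = []        # models the bytes parked in internal SRAM

    def service(self, in_vblank):
        """Read at most one byte, and only during vertical blanking."""
        if not in_vblank or not self.fifo.not_empty():
            return
        self.packet.append(self.fifo.pop())
        if len(self.packet) == 2:
            command, parameter = self.packet
            self.packet = []
            self.process(command, parameter)
```

A partial packet simply stays in `packet` until the second byte arrives, so a command that straddles two blanking periods is still processed intact.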

The first byte of the command packet is an index into a jump table containing pointers to the various render routines. Because of how I’m doing the pointer math here, I’m ultimately limited to 127 different render commands. Surely that will be enough. As you can see, there are currently only four- filling the screen, plotting a character, changing the font color, and plotting a “string” (which just means plotting a character and advancing the cursor). One trick to note here- the ATMega has the opcode ‘ijmp’, intended for things like jump tables. However, it requires the Z register, which is reserved by the interrupt handler. Instead, I’m using a very old school trick to jump indirectly- I calculate the address I want to jump to, push it on the stack, then ‘ret’ (return from subroutine). The ‘ret’ code pulls a two-byte address off the stack and jumps to it. You can’t modify the program counter directly on AVRs, but you can easily fool it into modifying itself the way you want.
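To make the push-then-‘ret’ trick concrete, here is a toy Python machine with a stack and a program counter. Everything in it is illustrative: the routine addresses, the order of the four commands in the table, and the byte order used on the stack are assumptions for the sketch, not details taken from the firmware.

```python
class TinyMachine:
    """Toy CPU with just enough state to show an indirect jump via the stack."""

    def __init__(self, jump_table, routines):
        self.jump_table = jump_table  # command index -> routine address
        self.routines = routines      # routine address -> Python callable
        self.stack = []
        self.pc = 0

    def push_address(self, address):
        # Byte order here is a choice made for this model.
        self.stack.append(address & 0xFF)
        self.stack.append((address >> 8) & 0xFF)

    def ret(self):
        # 'ret' pops a two-byte address and loads it into the program counter.
        high = self.stack.pop()
        low = self.stack.pop()
        self.pc = (high << 8) | low

    def dispatch(self, command, parameter):
        target = self.jump_table[command]  # first packet byte indexes the table
        self.push_address(target)          # fake a return address...
        self.ret()                         # ...and "return" to it
        return self.routines[self.pc](parameter)


def make_demo_machine():
    """Build a machine with four hypothetical render routines and a call log."""
    log = []
    names = ["fill_screen", "plot_char", "set_font_color", "plot_string"]
    addresses = [0x0100, 0x0140, 0x0180, 0x01C0]  # made-up addresses
    routines = {
        addr: (lambda p, n=name: log.append((n, p)))
        for name, addr in zip(names, addresses)
    }
    return TinyMachine(addresses, routines), log
```

Calling `dispatch(1, 0x41)` on the demo machine leaves the stack empty and the program counter at the second routine's address, with no index register involved.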

The rest of the code is pretty boilerplate stuff, I think. I’ll show one rendering command, just so you can see what those look like. Here’s plotting a character:

That’s just an unrolled loop to iterate through the bits in a character’s bitmap, and store the “font color” byte into VRAM for every set bit. This version forces a black background, but I also have a transparent version which just skips rendering for 0 bits in the character. The latter is slower, since it has a lot more branching in it. I may add the ability to set a text background color instead.
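For readers without the assembly in front of them, the same idea can be written as a plain (not unrolled) loop in Python. This is a sketch under assumptions: one byte of VRAM per pixel, glyph rows stored as integers with the most significant bit as the leftmost pixel, and the function name, parameter names, and `BLACK` value are all hypothetical.

```python
BLACK = 0x00  # assumed color value for the forced background


def plot_char(vram, screen_width, x, y, glyph_rows, glyph_width, font_color,
              transparent=False):
    """Write one glyph into a byte-per-pixel frame buffer.

    With transparent=False every pixel in the cell is written (set bits get
    font_color, clear bits get BLACK). With transparent=True clear bits are
    skipped, leaving whatever was already in VRAM.
    """
    for row, bits in enumerate(glyph_rows):
        base = (y + row) * screen_width + x
        for col in range(glyph_width):
            mask = 1 << (glyph_width - 1 - col)
            if bits & mask:
                vram[base + col] = font_color
            elif not transparent:
                vram[base + col] = BLACK
```

The transparent variant takes an extra decision per clear bit, which matches the observation that it involves more branching.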

That’s all there is to it! It’s really not a lot of code. It will be interesting to see if this system holds up to real world use as I start to build out the other areas of Veronica, and start to write real software.

15 thoughts on “Veronica – GPU Recap”

K. Scharf says:

March 6, 2013 at 8:37 pm

Strange to ask at this late date but have you seen the uzebox http://belogic.com/uzebox/index.asp ? This gizmo sorta kinda does what you are doing with your VGA graphics generator, but by (grossly) overclocking the AVR he is able to generate higher-rez graphics and drive an SVGA generator chip. While overclocking isn’t generally a great idea, maybe you could pump out a full 640×400 image with a faster clock rate. There is also the option of the 33 MHz (and they overclock to at least 40 MHz) xmegas.
1. Quinn Dunki says:
  
  March 6, 2013 at 9:01 pm
  
  Yep, I have indeed seen it. A number of people seem to have had the same idea (bit-banging VGA from an AVR) around the same time, and uzebox is one of the results. LucidScience.com (which put me onto this idea) is or was also working on a game console or demo box of some sort based on the same premise.
  
  At some point, if I decide to step up to better graphics, I think an FPGA will be in order. This stunt with generating video signals in software is neat, but for a lot of reasons is not a great way to do things. It was (relatively) quick and easy to get going with the knowledge I had, which is the main reason it was the right choice for Veronica.
  1. Ken Scharf says:
    
    March 6, 2013 at 11:18 pm
    
    Looking back at your Veronica project made me realize I have a similar itch to scratch, i.e. to build a ‘retro’ computer from back in the days when it all began. I used to have both a KIM-1 and a homebrew 6502 machine with the ‘TIM’ monitor built on a set of OSI 400 series boards. I was one of those geeks standing in line at the MOS booth to buy a $25 6502 chip at the Atlantic City computer convention back in the mid ’70s. (Who knows, Steve Wozniak could have been in line with me.)
    
    Looking in my Junque box I have lots of Z80’s, a few 68000’s and some 80186’s that look tempting. The two ‘prizes’ though are an AMD AM9080A (8080 clone) and a D.E.C. T-11 chip. (The T-11 is a PDP-11 single chip processor that is a clone of the PDP-11/20).
    
    I could build an Altair/Imsai clone and use an ATmega to do the front panel. (Just think, the processor in the PANEL having 10X or more the power of the computer!). I’d also like to re-live my days working at DEC and build a PDP-11 around that T11 chip, again doing a front panel in software with an AVR talking to the T-11 via shared memory or a fifo chip.
JCCyC says:

March 10, 2013 at 10:21 pm

Do you plan to implement some way of reading the framebuffer content?
1. Quinn Dunki says:
  
  March 11, 2013 at 3:28 pm
  
  I have no plans to do that at the moment. I don’t think it will be necessary for the kinds of things I want to do, and it would be pretty difficult to implement. This whole system is predicated on the notion that data transfer is one-way. That allows a lot of assumptions that simplify the firmware and hardware design.
  1. JCCyC says:
    
    March 11, 2013 at 6:12 pm
    
    I see. So the fun resides in making code to implement “smart” commands at the AVR side. Draw lines, circles, bucket fill, patterns, scroll, bitblts, maintain sprites, do collision detection etc etc etc… all of that can be one-way.
    
    Ah. Apropos of absolutely nothing, I saw this and immediately thought of you: http://cdn.memegenerator.net/instances/250×250/36017046.jpg
    1. Quinn Dunki says:
      
      March 11, 2013 at 6:17 pm
      
      Indeed. It’s basically the model of modern graphics accelerators. You push rendering commands and data (vertices, textures, etc), and the card does the rest. Modern GPUs do usually support reading pixels from the frame buffer in a limited way, but it’s always very expensive and stalls the entire rendering pipeline. That feature is usually only present because the API standards (such as OpenGL) require it, and it’s quite kludgy on the hardware side. It’s sometimes useful for debugging, but is generally too slow to be of much real use. In the early days of accelerated rendering, it was sometimes used for side-buffer picking algorithms in isometric games, and as a workaround for systems that didn’t support render-to-texture for things like reflection mapping. Nowadays, there are better ways to do both, so reading pixels from the frame buffer is pretty much a no-no.
      1. JCCyC says:
        
        March 11, 2013 at 6:27 pm
        
        > That feature is usually only present because the API standards (such as OpenGL) require it
        
        …and because people like to be able to do a Print Screen.
        
        Quinn Dunki says:
        
        March 11, 2013 at 6:31 pm
        
        Yes, although generally you want to do your rendering to an offscreen buffer in that case. Screenshots (for example, for press) are usually rendered at resolutions higher than the frame buffer can hold anyway. There will usually be a 4x or 8x scale applied, and the screen will be rendered in tiles and composited offline in a very large image suitable for magazine or poster printing. Aside from debugging the rendering pipeline itself, there’s not much reason to take screenshots directly from the frame buffer.
kscharf says:

March 12, 2013 at 12:43 pm

While you’re working on your GPU command set and fonts here is an idea … variable width fonts. It seems that most characters can be rendered in a puny 3 pixel wide font, but some (N,M,W) look UGLY. OTOH an ‘i’ can get by in only ONE pixel wide. So if you make your font table a bit larger (to have pixel width per character info) you could implement a variable width font set which on the average might be something like 3.2 pixels per char wide over typical text. I used this technique when I developed the graphical text driver for Niles Audio’s Iremote which had a 160 pixel wide screen. Normally it would allow 20 characters per line (8×8 cell), but the variable width technique allowed something closer to 30.
ryemac3 says:

March 13, 2013 at 4:28 pm

This project is truly amazing. It’s just such a shame that you sealed it up in such a box! I’d put it in an acrylic cube so that the world can see the amount of effort that was involved. Sometimes, it’s what’s on the inside that counts too!
1. Quinn Dunki says:
  
  March 13, 2013 at 4:32 pm
  
  Well, I wanted to go for a retro feel. However, the guts do pull out and the machine can be run with everything sitting outside (the cables to the control panel are all long). If I was going to display it somewhere at a conference or something, that might be what I’d do. Still, I like your acrylic cube idea. It’s not too late… hmmm….
2. JCCyC says:
  
  March 13, 2013 at 8:13 pm
  
  Are you kidding? That tube radio is the most awesome computer case in the history of ever. Though the transparent acrylic box idea is cool.
  
  And now for something completely different: Just as we were talking about the “write only” video, Ed S of the G+ Retro Computing group was talking about Tektronix graphics terminals. Maybe (a subset of) those escape sequences could be implemented in the AVR? https://plus.google.com/u/0/107049823915731374389/posts/TqATnjsXK7T
Rhialto says:

March 15, 2013 at 12:08 am

In an acrylic cube you’d get vORACnica or something like it 🙂
tomballarino says:

September 16, 2013 at 2:15 pm

Hello,

I really dig your site, especially the Veronica-stuff. I had so much fun reading your posts. In fact, because of the inspiration I got from you, I started my own site about a DIY computer project I began a while back but never completed. It can be found at http://blog.ballarino.org
The only thing that partly works is the GPU, which I dedicated my first post to.

Cheers
Tom


Blondihacks