Bit-banging a VGA signal generator.
Computers aren’t very exciting until they have a display of some sort. Well, I suppose if you’re at Bletchley Park trying to crack Enigma codes, computers are pretty exciting without a display. World-saving cryptography aside, I want Veronica to show pretty pictures. The usual route for a display on home-brew computers is composite video, because that’s fairly simple to generate. However, I haven’t had a device in my house that can display a composite signal since the 1990s. Herein lies the rub with old computer hardware- it’s not really the machines themselves that stop being usable. It’s the peripherals. Mechanical disk drives wear out, magnetic media goes bad, and video standards fade into history (so to speak).
For Veronica, I’d like to have video output that is relatively modern, so that I have some hope of connecting it to things. VGA seems like a good choice, because it’s now sorta retro, but still widely supported. I didn’t know a damn thing about VGA when I started this, so it has been pretty educational. It turns out that it’s a really simple standard. You just need a horizontal sync pulse, and a vertical sync pulse. Everything in between is pixels (more or less), based on an analog voltage on the red, green, and blue signal lines. One catch is the data bandwidth. To get any sort of decent resolution, you need to be able to push pixels at a good rate. The standard 640×480 pixel clock is 25.175Mhz. That’s much too fast for Veronica’s 6502 to crank out, so we need another solution. A number of old computers and game consoles used a separate custom chip to generate video, and I can see why. Nowadays, we’d call this a GPU, and much like modern-day, Veronica’s GPU will be considerably more powerful than her CPU. No shame in that, right? Also, 15″ VGA LCD monitors can be had for less than the cost of a nice lunch over on eBay.
While generating a VGA signal is fairly straightforward, it still takes quite a bit of hardware to do it. Instead, I can “cheat” and generate the requisite signals in software on a microcontroller. Sneaky options like this are available to us these days. It will be a quick way to get up and running, and allow me to focus on other problems that I’m more interested in. Before I proceed, I’d like to give a shout-out to a few folks. Like all my hacking, this project didn’t spring forth from a vacuum. I’m standing on the shoulders of giants here. A lot of research has gone into this project, and the following resources have been really valuable:
- Wikipedia’s VGA standards page
- LucidScience’s ATmega VGA generator
- Alfersoft’s ATtiny VGA generator (this link may be broken, sadly)
- The TinyVGA project
The VGA signal is pretty simple. Each horizontal line consists of a “front porch” (high signal), a sync pulse (low signal), a “back porch” (high signal), and a section of active video (analog). Each of these sections needs to be a specific length for the monitor to lock to the image. The horizontal line is analog, and the resolution basically depends on how fast you can push pixels out during this period. The vertical scanning has the same front-porch-sync-pulse-back-porch structure, but is measured in lines instead of time. There are 480 vertical lines. The sync pulses and “porches” make up the “blanking periods” on both the horizontal and vertical portions of the signal, which is when the electron beam on an old CRT would reset back to the start of the line or screen. I won’t go into gory detail on this here, as others have done a good job of explaining it (see the resources above, for example).
Veronica’s VGA signal generator is based on an ATmega 324PA. I chose this chip because it can run fast (20MHz), and has wide I/O. At some point, I’ll need video memory, which will need a lot of pins. For now though, I’m just spitting out the VGA sync pulses and generating some test pixels procedurally. This method of generating pixel values on the fly rather than pulling them from video RAM is sometimes called “racing the beam”. This is how the Atari 2600 generated video, and is the reason those games had that distinctive look that we all remember fondly.
In summary, I like to use the smallest thing I can that will do the job adequately, and the ATmega 324PA seems to be it. It’s not all puppies and unicorns, though. While the chip can run at 20Mhz, the I/O bandwidth is only 10Mhz. This is far short of the 25.175Mhz required to generate the standard VGA horizontal resolution of 640. However, we can cheat here once again. If we push pixels at the ATmega’s maximum rate of 10Mhz, the monitor will average things out and we wind up with a horizontal resolution of 256. Furthermore, if we double up the vertical lines, we get a manageable vertical resolution of 240. This also means all screen coordinates will fit into a byte, which will be a huge advantage for the 6502 when it needs to talk to my new GPU. A resolution of 256×240 gives a slightly weird aspect ratio of 1.06, which means the pixels won’t be square on a standard VGA monitor (which has an aspect ratio of 1.333). That’s livable, and was actually pretty common on 1980s computers. Many of them had weirdly-shaped pixels, due to similar technical compromises to the ones I’m making here. Special thanks to LucidScience for working out this resolution. It’s a great approach for VGA on an ATmega.
First things first- before I can write any code, I need a programming header. I haven’t used one of the big 40-pin AVRs before, so a new Bread Head was in order.
Okay, on to the code! The primary technique here is to calculate how many clock cycles will make up the required length of each phase of the VGA signal at the clock speed of the micro controller. You find these by cross-multiplying the clock speed you’re running at with the standard VGA rate of 25.175Mhz, and the phase lengths specified in the standard. Here are the clock cycle counts that I’m using. This is partly based on the work of others (LucidScience in particular), and also on my own experiments with what worked well:
- Horizontal front porch: 12 cycles
- Horizontal sync pulse: 76 cycles
- Horizontal back porch: 33 cycles
- Horizontal pixel clocks: 512
- Vertical front porch: 11 lines
- Vertical sync pulse: 2 lines
- Vertical back porch: 32 lines
- Vertical pixel lines: 480
These values work for me to get a monitor to lock to the signal at 640×480 with a refresh of 60Hz. There is some wiggle room. If you’re a couple of cycles off one way or another, I found that it still works. If you get too far out, you’ll start to get artifacts or (more often) the monitor will refuse to lock to the image. So, we need to write code that generates the sync pulses with those exact timings, and generates some analog pixel values during the visible part of the scan. This has to be done in assembly language, because you have to know exactly how long every instruction will take, so that you can guarantee the timing. It also means using interrupts, as they are the only reliable way to get this kind of realtime precision. While I have experience with assembly and interrupts on other chips, I’d never used either on an AVR. Fortunately, assembly languages are all basically the same. The AVR has some pretty fancy interrupt techniques that required some quality time with the documentation, but it’s not too difficult to grok.
Developing this code is a bit tricky, because until it’s nearly perfect, the monitor won’t lock to the signal, and you’ll just see The Blackness Of Failure. It’s not like an old television which would display anything that was sort of close, even if it resulted in a lot of static, rolling, Bob Saget, or other unpleasant artifacts on the screen. To get the basics right, I started with the oscilloscope. Until I figure out the image-export feature on my scope, you get to suffer through photos taken of the screen.
The sync pulses are generated from an interrupt service routine that fires based on one of the ATmega’s built in timers, every 636 clock cycles (the length of a horizontal line at 20Mhz). After getting the timer and interrupt code set up, I wrote the code to generate the pulses, counting cycles to make sure everything lines up. The ATmega data sheet lists the cycle counts for every instruction, so it’s just a matter of doing a lot of arithmetic. It’s quite a fun challenge to write assembly code to hit a very specific cycle count. Especially when you get into branching and loops, in which case you have to make sure every possible code path uses the same number of cycles. I had a couple of bugs in my code where one branch would take more or less cycles than another branch, which would cause slight timing differences. The symptom was the pulses on the oscilloscope would jitter slightly as the timing variations were averaged out. This is enough to make the monitor refuse to lock to the signal. This is an example of where getting things right on the scope first was crucial to success. Plugging in a monitor will tell you very little about what is wrong.
Okay, now that the sync pulses are ready, let’s generate some pixel data! For that, we need more hardware than just a naked ATmega. The red, green, and blue signals are analog, and I have a 8-bits available for the colors. For now, I’m going to give two bits to each color component, and ignore the extra two bits. That gives me 64 colors, which is plenty, honestly. It’s certainly far far more than most 6502-era computers had. To convert the two-bits-per-component into three analog signals, I’m using three small resistor ladders.
Here’s the schematic (and Eagle file) that I ended up with:
When using a resistor ladder used for digital to analog conversion, the need for resistor precision increases exponentially with the number of bits. With only two bits in each ladder, it normally wouldn’t be too critical. However, because we’re generating three color components that will be mixed by the monitor, the relationship of the ladders to each other is important. If they don’t match well, the resulting image will have a slight red, green, or blue tint. I opted to use 3k and 1.5k resistors to make each 2-bit DAC, because I happen to have those values in my box. Almost any 2:1 ratio should work. Next, I measured all the resistors I had in those sizes with an ohmmeter, and tried to find three of each that were exactly the same size (within the limits of my ohmmeter). I managed to find three that were exactly 1.476k, and three that were exactly 2.951k. The ratio of the resistors to each other will affect overall vividness of the colours, but the relationship of each pair to each other pair will affect overall tint. I felt it was more important to get the tint right.
So, with that all wired up, I added a loop to my interrupt routine to spit out some RGB values during the portions of the frame that are active pixels. I verified this on the scope:
Looking good! The code in question should be rendering all my 64 colours, in an 8×8 grid of blocks. Let’s plug in a monitor and see what happens!
Okay, okay, it wasn’t that easy. It took a fair amount of debugging the code to get to this point. Once the sync pulses were working, I started by showing a solid color, then vertical stripes (by getting the code for one scan line to work), then the blocks you see above.
Interesting sidebar, I spent a long time debugging what seemed to be muddy-looking colours. My “white” in the upper left seemed kinda gray. I learned two things while debugging this:
- Our perception of a color is greatly influenced by what colors are around it. To really test changes to a color, you need to see it in isolation.
- The backlight on my cheapo eBay screen is shit. The colors vary quite a bit depending on where on the screen they are. The eBay seller was less than honest about the condition of this display, I’m afraid.
I determined all this by setting up a mask for a single color block in my test pattern. I could then move this mask around while changing the code to verify that colors in different locations look right.
Here’s the final version of the signal generator code. I’ve indicated the clock cycle counts of each line and section in parentheses. This is avr-as assembly, so you can build it with avr-gcc, or (I suppose) the Arduino environment. The only really tricky stuff is that branches take a different number of cycles depending on whether the branch is taken (2 cycles), or falls through (1 cycle). To compensate for this, you’ll often see a ‘nop’ following a branch, so that both code paths are the same length. You’ll also see stretches of ‘nops’ anywhere that I had extra time to fill. The “porch” areas are used to set up loop counters and such for rendering the test pattern. During the active pixels portion, there’s no time to spare. In order to get pixels pushed out at the ATmega’s maximum I/O rate of 10Mhz, we need to be updating the output pins every two clock cycles (since we’re running at 20Mhz).
So far so good! I guess the next thing will need to be video memory, because there isn’t enough time left to do much more with this old school “racing the beam” technique. Stay tuned!